Professor W.W. Sawyer


W. W. Sawyer

In school we learn classical mathematics, the mathematics known before the year 1900. It has a direct relation to the actual world. Arithmetic is related to counting and measurement, algebra to general truths about calculations, geometry and trigonometry to the sizes and shapes of objects, calculus to the velocities of moving objects and a host of other applications. Students have no difficulty in knowing what it is about and what it is for.

Students who go on to modern mathematics, the mathematics of the present century, either in university or by private study, find it considerably different. There is no immediate contact with reality, and they may find it hard to visualize what is being discussed. The difficulty of understanding is increased by the fact that textbooks are often written in a form admirably suited to the needs of the research worker, but offering little help to someone approaching the subject for the first time.

A student's difficulties can be greatly reduced or even eliminated if the formal study of the textbook is preceded by a preview of the subject and an account of the way in which the modern approach came into existence. The present article is an attempt to provide such assistance.

The Beginnings of Modern Mathematics.

Early in this century, mathematicians, looking back on the work of earlier centuries, were able to identify certain common ingredients in different branches of mathematics, in much the same way that modern chemists were able to recognize the same vitamin in many different foods. Like the chemists, they decided to isolate and study these ingredients.

Vectors, Old and New.

We can illustrate this by considering the modern use of the word "vector". Originally "vector" had a very definite meaning. A vector could be represented by a line segment, AB, in 2 or 3 dimensions, with an arrow to indicate the direction from A to B. It had applications to displacements, velocities, accelerations and forces.

Vectors had certain properties that made them interesting to study. In modern usage, the word "vector" is applied to anything that has these properties. Thus it does not indicate any particular type of object; it refers to a certain aspect of many situations.

The properties we require are very few. A vector can be multiplied by a number; two vectors can be added. Any collection of mathematical objects in which these two operations can be defined is called "a vector space". Of course the definitions of addition and multiplication cannot be completely random. They must be such that, when working with them, we can forget that they are not our usual addition and multiplication, and even so, we shall not be led to incorrect results. In a formal treatise this requirement would be spelt out in a number of axioms.

The position of a point is specified by the vector that goes from the origin to it. Often, when an origin has been agreed, we shall speak of "the point u", where u stands for a vector. This will mean the point at the end of the vector u, when that vector starts at the origin.

Optional Properties.

In the original usage of vectors, lengths and angles played a part. It would be possible to include in our description of what we meant by "vector space" the requirement that length and angle should be defined. Indeed we shall soon bring in such requirements, but these will then be regarded as singling out a special (though admittedly important) type of vector space. For the present, however, we are not doing this. We are agreeing to accept as a vector space a situation in which these concepts are lacking.

An extreme example would be if we supposed that marking the point (x,y) on graph paper indicated that we were thinking of x cats and y dogs. It would not matter if the graph paper was marked out in squares or in parallelograms. There is no reason for saying that a cat must be perpendicular to a dog. There might be some objection to calling this example a vector space, since the numbers multiplying vectors are supposed to include fractions and negative numbers, which are not appropriate when applied to animals. However perhaps this rather crude illustration will serve to emphasise that the basic concept of vector space includes systems with no resemblance at all to anything in Euclidean geometry.

Theorems on Vector Spaces.

It might seem that there is so little information in the definition that there will be nothing at all to say about the most general vector space, but this is not so. We can, for instance, prove that in such a space the diagonals of a parallelogram bisect each other.

To begin with, we can attach a meaning to two vectors having the same direction. We say that ku has the same direction as u, and that the line joining the origin to u consists of all the points ku. The line through v parallel to the line just mentioned consists of all the points v+ku. The points v, v+u, v+2u, v+3u, and so on, are evenly spaced along this line.

In particular, if w=v+2u, then v+u is the mid-point between v and w, and is given by (1/2)(v+w).

The points 0, a, b, and a+b are corners of a parallelogram. The mid-point of 0 and a+b is (1/2)(a+b), and this is also the mid-point of a and b, which proves the theorem mentioned above.
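The argument can be checked on a particular case. The following is only a sketch: the helper names `scale`, `add` and `midpoint` and the sample vectors are chosen for illustration, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def scale(k, u):
    """Multiply the vector u by the number k."""
    return tuple(k * x for x in u)

def add(u, v):
    """Add two vectors component by component."""
    return tuple(x + y for x, y in zip(u, v))

def midpoint(u, v):
    """The mid-point of the points u and v, that is (1/2)(u+v)."""
    return scale(Fraction(1, 2), add(u, v))

# Sample corners of a parallelogram: 0, a, b and a+b (values are arbitrary).
origin = (Fraction(0), Fraction(0))
a = (Fraction(3), Fraction(1))
b = (Fraction(1), Fraction(4))

mid_first_diagonal = midpoint(origin, add(a, b))   # diagonal from 0 to a+b
mid_second_diagonal = midpoint(a, b)               # diagonal from a to b

print(mid_first_diagonal == mid_second_diagonal)
```

Only multiplication by a number and addition are used; no length or angle appears anywhere in the calculation.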

Examples of Vector Spaces.

We now look at some examples of vector spaces. We begin with two very familiar examples, then we have a less familiar example, and after that something totally unexpected.

(i). Space of 2 dimensions.
A vector is specified by two numbers. If $u = (u_1, u_2)$ and $v = (v_1, v_2)$, then we define $ku$ as $(ku_1, ku_2)$ and $u+v$ as $(u_1+v_1, u_2+v_2)$.

(ii). Space of 3 dimensions.
In the same way we define $ku$ as $(ku_1, ku_2, ku_3)$ and $u+v$ as $(u_1+v_1, u_2+v_2, u_3+v_3)$.

(iii). Space of n dimensions.
For the physical space in which we live and move, the numbers 2 and 3 have special significance. However, in the mathematical patterns used in (i) and (ii), nothing is done that depends on the particular numbers 2 and 3. We can define a vector in space of n dimensions, specified by n numbers. To multiply such a vector by k, we multiply each of the numbers specifying it by k. To get the sum of two such vectors, we add the corresponding numbers in the two brackets.

(iv). Function space.
There are simple procedures for multiplying a function by a number and for adding two functions. These have the pattern we require in a vector space.

For example, if s stands for the function $s(x) = \sin x$, then it is natural to interpret $t = 3s$ as meaning that $t(x) = 3 \sin x$. If c stands for the cube function, with $c(x) = x^3$, then we interpret $f = s + c$ as meaning $f(x) = \sin x + x^3$.

Thus the procedures we use, almost unconsciously, when making calculations with functions are of the type appropriate for work with vectors. We may speak of functions as vectors, and all the functions defined on a given interval as forming a vector space.
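A small sketch may make this concrete. Here functions are treated as objects that can be multiplied by a number and added; the names `scale_function` and `add_functions` are hypothetical helpers introduced only for this illustration.

```python
import math

def scale_function(k, f):
    """Return the function k*f, whose value at x is k times f(x)."""
    return lambda x: k * f(x)

def add_functions(f, g):
    """Return the function f+g, whose value at x is f(x) + g(x)."""
    return lambda x: f(x) + g(x)

s = math.sin                 # s(x) = sin x
c = lambda x: x ** 3         # c(x) = x^3, the cube function

t = scale_function(3, s)     # t = 3s, so t(x) = 3 sin x
f = add_functions(s, c)      # f = s + c, so f(x) = sin x + x^3

print(t(0.5), f(0.5))
```

The two helpers never look inside the functions they are given; they use only the two vector-space operations.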

Functions that can be expanded in a power series are known as analytic. For such functions there is a very close analogy with the n-dimensional vectors considered in (iii).

For example

$$(1-x)^{-1} = 1 + x + x^2 + x^3 + x^4 + x^5 + \cdots$$

$$(1-x)^{-2} = 1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5 + \cdots$$

$$(1-x)^{-1} + (1-x)^{-2} = 2 + 3x + 4x^2 + 5x^3 + 6x^4 + 7x^5 + \cdots$$

In the top row we see the coefficients (1, 1, 1, 1, 1, 1, ....). In the second row we see (1, 2, 3, 4, 5, 6, ....). If we add these by the usual vector rule we get (2, 3, 4, 5, 6, 7, ...) as seen in the bottom row. Thus we may think of these series as specifying vectors in space of infinite dimensions, and as being added in the way usual for vectors.
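The addition of coefficient vectors can be tried out numerically. This sketch truncates each series after N terms (N = 60 is an arbitrary choice, as is the test value x = 0.1) and compares the sum of the added series with the value of the two functions themselves.

```python
def add_series(p, q):
    """Add two coefficient vectors by the usual vector rule."""
    return [a + b for a, b in zip(p, q)]

def evaluate(coefficients, x):
    """Value of the truncated power series at x."""
    return sum(c * x ** k for k, c in enumerate(coefficients))

N = 60                                   # number of terms kept (arbitrary)
first = [1] * N                          # coefficients of (1-x)^(-1)
second = [k + 1 for k in range(N)]       # coefficients of (1-x)^(-2)
total = add_series(first, second)

x = 0.1
exact = (1 - x) ** -1 + (1 - x) ** -2
approx = evaluate(total, x)

print(total[:6])
print(exact, approx)
```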


A decisive step in the development of mathematics was the publication in 1906 of the thesis of Maurice Frechet. It seems to have come about in the following way. In the 19th century and earlier, work had been done on the calculus of variations. This resembles ordinary calculus in that we are trying to make something a maximum or a minimum; it differs by the unknown being not a number but a function. For instance, a chain with its ends secured hangs in the way that makes its centre of gravity as low as possible. Its equation being y = f(x), our aim is to find the f(x) that will make this happen.

Hadamard pointed out that we can imagine real numbers as inhabiting a line and complex numbers as inhabiting a plane, and these geometrical pictures help us in our reasoning. When we are trying to pick out a particular function, we have no geometrical picture of the multitude of possible functions from which it is to be chosen. It would be very helpful if we could arrive at such a picture.

Frechet took up this question. He examined the classical work on real and complex numbers, and observed that distance played a decisive role. In analysis we are much concerned with questions about limits. For both real and complex numbers, a sequence of numbers $z_n$ is tending to a limit L if the distance of $z_n$ from L is tending to zero. He decided that, if it was possible to find a satisfactory definition of distance between two mathematical objects (of any kind), it would be possible to find theorems about these objects analogous to the theorems about real and complex numbers.

The first question then is: what is a satisfactory definition of distance? He looked at the traditional proofs and found that the only properties of distance used were the following very simple ones:

1. Distance is measured by a real number, which is never negative.

2. A distance is zero if, and only if, it is the distance between a point and itself.

3. The distance from A to B is the same as the distance from B to A.

4. You cannot shorten your journey by breaking it. If you go from A to C, and then from C to B, the total distance cannot be less than the distance from A to B. (It may of course be equal, if C lies on the direct route from A to B.) This is known as the triangle axiom. It corresponds to Euclid's remark, that the sum of the lengths of two sides of a triangle must exceed the length of the third side.

Frechet's investigation was extraordinarily fruitful. It proved possible to find a satisfactory definition for the distance between two matrices, two transformations, two functions, or two operations that may involve differentiation and integration. At one blow, this opened the door to a whole series of results concerning the most varied situations.

It is often possible to find more than one definition of distance for given objects. For instance, on a chessboard we can define distance as the minimum number of moves a king needs to get from one square to another. We get a different definition if we consider a rook instead of a king. Options can equally well arise in more serious mathematical contexts.
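The chessboard example can be tested directly against the four properties listed above. The sketch below assumes an empty 8 by 8 board, so that a king needs as many moves as the larger of the row and column differences, and a rook needs at most two moves. The function names are illustrative only.

```python
from itertools import product

# Squares of an empty 8 x 8 board, written as (row, column).
squares = list(product(range(8), repeat=2))

def king_distance(a, b):
    """Fewest king moves between squares a and b on an empty board."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def rook_distance(a, b):
    """Fewest rook moves between squares a and b on an empty board."""
    if a == b:
        return 0
    return 1 if (a[0] == b[0] or a[1] == b[1]) else 2

def is_metric(d, points):
    """Check properties 1 to 4 of a distance on a finite set of points."""
    for a, b in product(points, repeat=2):
        if d(a, b) < 0:                      # 1. never negative
            return False
        if (d(a, b) == 0) != (a == b):       # 2. zero only from a point to itself
            return False
        if d(a, b) != d(b, a):               # 3. symmetric
            return False
    for a, b, c in product(points, repeat=3):
        if d(a, c) + d(c, b) < d(a, b):      # 4. triangle axiom
            return False
    return True

print(is_metric(king_distance, squares), is_metric(rook_distance, squares))
print(king_distance((0, 0), (7, 7)), rook_distance((0, 0), (7, 7)))
```

Both definitions pass the test, although they give very different distances between opposite corners of the board.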

The distance from A to B is the length of AB. We now consider defining length. Of the many possible definitions we shall here consider only those that lead to our usual geometry, or to a geometry very similar to it. In all the spaces now to be listed, Pythagoras' Theorem is true in some sense. All these spaces are vector spaces.

The symbol $\|u\|$ will be used for the length of the vector u. As $v-u$ is the vector that goes from the point u to the point v, $\|v-u\|$ gives the distance of the point u from the point v.


1. Euclidean space of 2 dimensions.
If $u = (u_1, u_2)$, we define length by

$$\|u\|^2 = u_1^2 + u_2^2.$$

With the help of Pythagoras' Theorem we can define "perpendicular". We shall say v is perpendicular to u if the points 0, u, v form a right-angled triangle with the right angle at the origin, 0. Whether it is right-angled or not we test by using Pythagoras. The length of the hypotenuse is the distance from u to v, that is, $\|v-u\|$. As $v-u$ is $(v_1-u_1, v_2-u_2)$, the length c of the hypotenuse is given by

$$c^2 = (v_1-u_1)^2 + (v_2-u_2)^2 \qquad (1)$$

The sum of the squares on the other two sides is

$$\|u\|^2 + \|v\|^2 = (u_1^2 + u_2^2) + (v_1^2 + v_2^2) \qquad (2)$$

The condition for a right angle is that the expressions in (1) and (2) are equal. On writing out this equation, we find that all the squared terms cancel. Dividing what remains by -2 we reach the equation

$$u_1 v_1 + u_2 v_2 = 0.$$

We recognize the expression here as the dot product, u.v. Accordingly, in each geometry below we shall define u.v as the expression that turns up in this way. Thus, in each geometry, u.v = 0 will be the condition for u being perpendicular to v.

2. Euclidean space of 3 dimensions.
We follow essentially the same lines as for 2 dimensions. We define length by

$$\|u\|^2 = u_1^2 + u_2^2 + u_3^2.$$

The square on the hypotenuse will be

$$c^2 = (v_1-u_1)^2 + (v_2-u_2)^2 + (v_3-u_3)^2.$$

The sum of the squares on the other two sides is

$$(u_1^2 + u_2^2 + u_3^2) + (v_1^2 + v_2^2 + v_3^2).$$

Equating these, cancelling the squares and dividing by -2 we obtain the definition of u.v as

$$u.v = u_1 v_1 + u_2 v_2 + u_3 v_3.$$

3. Euclidean space of n dimensions.
The argument follows exactly the same lines as for 3 dimensions. The only difference is that, instead of letting the numbers run through 1,2,3, we let them run through 1,2,3,... up to n. At the end we arrive at the definition

$$u.v = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$$
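The route from lengths to the dot product can be followed numerically. In this sketch the dot product is computed twice: once from three squared lengths, as half of the sum of the squares on the two sides less the square on the hypotenuse, and once from the formula just reached. The function names and the sample vectors are illustrative only.

```python
def length_squared(u):
    """Square of the Euclidean length of u."""
    return sum(x * x for x in u)

def dot_from_lengths(u, v):
    """Dot product recovered from lengths alone, following the Pythagoras argument."""
    difference = [b - a for a, b in zip(u, v)]          # the vector v - u
    return (length_squared(u) + length_squared(v) - length_squared(difference)) / 2

def dot(u, v):
    """Dot product as the sum of products of corresponding components."""
    return sum(a * b for a, b in zip(u, v))

u = [1, 2, 3, 4]          # arbitrary vectors in 4 dimensions
v = [2, -1, 0, 5]

print(dot_from_lengths(u, v), dot(u, v))
print(dot([1, 1], [1, -1]))   # zero, so these two vectors are perpendicular
```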

4. Function space.
Suppose we have a continuous function, defined on the interval [p,q]. We could get a good idea of its nature by dividing the interval into a large number, n, of parts, taking $x_1, x_2, \ldots, x_n$ as the midpoints of the parts and looking at the values $f(x_1), f(x_2), \ldots, f(x_n)$. These would specify a vector in n dimensions; the square of its length would be

$$f(x_1)^2 + f(x_2)^2 + \cdots + f(x_n)^2.$$

The distances between the points so obtained for various functions might give us a useful way of measuring the distances between the functions. However, the method is rather rough and ready. How big should n be? Why choose the mid-point of each interval?

Now a sum resembling that just written would appear if we were making an estimate of the value of

$$\int_p^q f(x)^2 \, dx.$$

This suggests that we might define the length of the function f by

$$\|f\|^2 = \int_p^q f(x)^2 \, dx.$$
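The link between the sum and the integral can be seen in a small experiment. The sum of squares at the midpoints, multiplied by the width of each part, is the usual midpoint estimate of the integral. The sketch below takes f(x) = x on [0, 1], an example chosen here only because the exact value, 1/3, is easy to find.

```python
def midpoint_length_squared(f, p, q, n):
    """Midpoint estimate of the integral of f(x)^2 from p to q, using n parts."""
    width = (q - p) / n
    total = sum(f(p + (i + 0.5) * width) ** 2 for i in range(n))
    return total * width

identity = lambda x: x        # f(x) = x, for which the exact answer is 1/3

for n in (10, 100, 1000):
    print(n, midpoint_length_squared(identity, 0.0, 1.0, n))
```

As n grows the estimates settle down, so that the integral gives a length that does not depend on any particular choice of n.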

This leads to something quite novel. We can define a dot product for this space, and thus find a meaning for one function being perpendicular to another.

The work follows the same pattern as before, but with integrals instead of sums.

When we were finding the condition for u to be perpendicular to v, the length of the hypotenuse was $\|v-u\|$, so the condition was

$$\|v-u\|^2 = \|u\|^2 + \|v\|^2.$$

If u corresponds to f(x) and v to g(x), v-u will correspond to g(x) - f(x). With the definition of length just found this condition will become

$$c^2 = a^2 + b^2,$$

where

$$c^2 = \int_p^q [g(x) - f(x)]^2 \, dx,$$

$$a^2 = \int_p^q f(x)^2 \, dx, \qquad b^2 = \int_p^q g(x)^2 \, dx.$$

We now need to multiply out the bracket that appears in $c^2$. This gives

$$c^2 = \int_p^q [g(x)^2 - 2 f(x) g(x) + f(x)^2] \, dx.$$

The integrals of $f(x)^2$ and $g(x)^2$ appear on both sides of the equation $c^2 = a^2 + b^2$, and cancel just as the squares did in the earlier examples. Again we divide by -2 to arrive at the definition

$$f.g = \int_p^q f(x) g(x) \, dx.$$

We shall say that the functions are perpendicular if this dot product is zero.
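As an illustration, sin x and cos x turn out to be perpendicular on the interval from 0 to pi. The sketch below estimates the integrals by a midpoint sum; the number of parts (2000) and the choice of these two functions are assumptions made only for the example.

```python
import math

def dot(f, g, p, q, n=2000):
    """Midpoint estimate of the integral of f(x) g(x) from p to q."""
    width = (q - p) / n
    midpoints = [p + (i + 0.5) * width for i in range(n)]
    return sum(f(x) * g(x) for x in midpoints) * width

p, q = 0.0, math.pi
difference = lambda x: math.cos(x) - math.sin(x)      # plays the part of v - u

sin_dot_cos = dot(math.sin, math.cos, p, q)
hypotenuse_squared = dot(difference, difference, p, q)
sum_of_squares = dot(math.sin, math.sin, p, q) + dot(math.cos, math.cos, p, q)

print(sin_dot_cos)                            # very close to 0
print(hypotenuse_squared, sum_of_squares)     # both very close to pi
```

The dot product is zero, and the Pythagoras relation between the three lengths holds, to within rounding.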

"Orthogonal" is a synonym for "perpendicular" and it is the custom to-day to speak of orthogonal functions rather than perpendicular ones. I do not know the reason for this. Perhaps the idea of functions being perpendicular is felt to be rather shocking, and the more learned word is used to lessen the shock.

The usefulness of perpendicularity.

It is often found in attacking a problem that the axes of co-ordinates in which we have started are not the best for further work, and it becomes necessary to go over to some other system. If u, v, w are to be unit vectors in the directions of the new axes, the point with co-ordinates X,Y,Z for the new system will be Xu+Yv+Zw. By equating this to (x,y,z), the specification of the point in the old system, we obtain 3 equations, and by solving these we can find the new co-ordinates X,Y,Z. In 3 dimensions this may not be too bad; the corresponding problem in 10 dimensions, for instance, could be rather trying.

However, if the vectors along the axes are perpendicular in both the old and the new system of axes, a much simpler method is available.

Suppose, for example, that in 3 dimensions the new axes are to have the vectors u,v,w where u = (1,1,1), v = (0,1,-1), and w = (-2,1,1). It is easily verified from the dot products that these vectors are perpendicular to each other. Let the vector s be (x,y,z) in the old system. In the new system we are to have s = Xu + Yv + Zw.

Take the dot product of this equation with u.

On the right-hand side u.v and u.w are zero, so the result is simply

$$s.u = X \, (u.u) \qquad (3)$$

The dot products are easy to work out; s.u = x+y+z and u.u = 3. We thus find x+y+z = 3X, which gives X immediately. In the same way, by taking dot products with v and w we can single out Y and Z.
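Here is the method carried through for one particular point. The vectors u, v, w are those of the example; the point s = (1, 2, 3) and the function name `new_coordinates` are chosen only for illustration. Each new co-ordinate is a dot product with the axis vector, divided by the dot product of that vector with itself.

```python
from fractions import Fraction

def dot(a, b):
    """Dot product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

u, v, w = (1, 1, 1), (0, 1, -1), (-2, 1, 1)      # mutually perpendicular axes

def new_coordinates(s):
    """Co-ordinates X, Y, Z of s in the system with axis vectors u, v, w."""
    return tuple(Fraction(dot(s, axis), dot(axis, axis)) for axis in (u, v, w))

s = (1, 2, 3)
X, Y, Z = new_coordinates(s)

# Rebuild s as Xu + Yv + Zw to confirm the co-ordinates are right.
rebuilt = tuple(X * a + Y * b + Z * c for a, b, c in zip(u, v, w))

print(X, Y, Z)
print(rebuilt)
```

No system of simultaneous equations has been solved; each co-ordinate is found separately.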

Fourier Series.

For many problems it is important to expand a function in a series consisting of sines. We may be given a function in the interval $(0, \pi)$ and wish to find the constants $c_1, c_2, c_3, \ldots$ in the series

$$f(x) = c_1 \sin x + c_2 \sin 2x + c_3 \sin 3x + \cdots \qquad (4)$$

A device which has been known since about 1750 for finding these constants is the following. (There are certain logical difficulties in this procedure which will not be discussed here. It is not always allowable to integrate an infinite series term by term.) It was noticed that, if m and n are different whole numbers, then

$$\int_0^\pi \sin mx \, \sin nx \, dx = 0 \qquad (5)$$

To find, say, $c_3$, we multiply equation (4) by $\sin 3x$ and integrate from 0 to $\pi$. This will wipe out all the terms except that containing $c_3$, and we shall have

$$\int_0^\pi f(x) \sin 3x \, dx = c_3 \int_0^\pi \sin^2 3x \, dx \qquad (6)$$

As soon as we have worked out the integrals we shall have the value of $c_3$.

Here we have singled out $c_3$ in much the same way that we singled out X in the 3-dimensional problem. The analogy in fact is extremely close. If we take $u_n$ as the vector representing $\sin nx$, equation (5) can be written $u_m.u_n = 0$, which means that the sines are represented by mutually perpendicular vectors, while equation (6) says

$$f.u_3 = c_3 \, (u_3.u_3),$$

which is of exactly the same form as equation (3) earlier. Thus finding the coefficients in a Fourier series turns out to be just the problem of expressing a vector in a new system of perpendicular axes.
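The projection can be tried on a simple function. The sketch below takes the interval from 0 to pi, on which the sines sin nx are mutually perpendicular, and the function f(x) = x, chosen here because its sine coefficients are known to be 2, -1, 2/3, and so on. The integrals are estimated by a midpoint sum with 5000 parts, an arbitrary choice, and the function names are illustrative.

```python
import math

def integrate(h, a, b, n=5000):
    """Midpoint estimate of the integral of h from a to b."""
    width = (b - a) / n
    return sum(h(a + (i + 0.5) * width) for i in range(n)) * width

def sine_coefficient(f, k):
    """Coefficient of sin kx, found as (f . u_k) / (u_k . u_k) on (0, pi)."""
    f_dot_u = integrate(lambda x: f(x) * math.sin(k * x), 0.0, math.pi)
    u_dot_u = integrate(lambda x: math.sin(k * x) ** 2, 0.0, math.pi)
    return f_dot_u / u_dot_u

f = lambda x: x
coefficients = [sine_coefficient(f, k) for k in (1, 2, 3)]

# The dot product of two different sines should vanish.
sin2_dot_sin3 = integrate(lambda x: math.sin(2 * x) * math.sin(3 * x), 0.0, math.pi)

print(coefficients)
print(sin2_dot_sin3)
```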

Now the modern approach starts to work for us. Fourier theory appears in a new light, as a method using projection onto orthogonal axes. But this immediately suggests a thought; there are infinitely many ways of choosing a set of perpendicular directions. To each choice of axes, there will correspond a way of finding the coefficients in a series by a procedure much like that used with Fourier series. Many such choices have been studied. One of them is particularly simple, and could be used to compose problems in elementary calculus. In it the orthogonal functions are not sines, as with Fourier, but simply polynomials.

Orthogonal Polynomials.

With [-1,1] as the basic interval, let $F_0(x) = 1$,

$F_1(x) = x$, $F_2(x) = 3x^2 - 1$,

$F_3(x) = 5x^3 - 3x$, $F_4(x) = 35x^4 - 30x^2 + 3$.

You can check that these are orthogonal. The work can be reduced by using the following observation; the dot product involves the vectors linearly. For example, f.(au + bv + cw) = a(f.u) + b(f.v) + c(f.w). This means that, if f is perpendicular to u, to v and to w, it is bound to be perpendicular to au + bv + cw, for any a,b,c. This holds for any number of vectors in the bracket; if f is perpendicular to each of them, it is perpendicular to any linear mixture of them.

So, if we check that $F_4(x)$, for example, is perpendicular to 1, to x, to $x^2$ and to $x^3$, it is bound to be perpendicular to any linear combination of these, hence to any polynomial with degree less than 4, hence in particular to $F_0(x)$, to $F_1(x)$, to $F_2(x)$ and to $F_3(x)$. The same idea can be applied to $F_3(x)$.
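The check can also be left to a short program. In this sketch each polynomial is held as a list of coefficients, lowest power first, and the integral over [-1,1] is found exactly from the fact that the integral of x^k is 2/(k+1) for even k and zero for odd k. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# Coefficient lists, lowest power first.
F = [
    [1],                    # F0(x) = 1
    [0, 1],                 # F1(x) = x
    [-1, 0, 3],             # F2(x) = 3x^2 - 1
    [0, -3, 0, 5],          # F3(x) = 5x^3 - 3x
    [3, 0, -30, 0, 35],     # F4(x) = 35x^4 - 30x^2 + 3
]

def multiply(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integrate_minus1_to_1(p):
    """Exact integral over [-1, 1]; odd powers contribute nothing."""
    return sum(Fraction(2 * c, k + 1) for k, c in enumerate(p) if k % 2 == 0)

def dot(p, q):
    """Dot product of two polynomials on the interval [-1, 1]."""
    return integrate_minus1_to_1(multiply(p, q))

all_orthogonal = all(dot(F[i], F[j]) == 0 for i, j in combinations(range(5), 2))
print(all_orthogonal)
print(dot(F[2], F[2]))      # the squared length of F2, which is not zero
```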

Further polynomials in this sequence can be obtained, apart from a constant factor, by taking $F_n(x) = (d/dx)^n (x^2-1)^n$.

That $F_n(x)$ is perpendicular to $x^m$ if m<n can be proved by integration by parts. Observe that $(d/dx)^s (x^2-1)^n$ is zero for x = -1 and x = 1 if s<n; do not expand any of the powers of $x^2-1$.
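The derivative formula can be worked out mechanically. The sketch below expands (x^2-1)^n, differentiates n times, and prints the result as a list of coefficients, lowest power first. For n = 1 to 4 the output is a constant multiple (2, 4, 24, 48) of the polynomials listed earlier. The function names are illustrative only.

```python
def multiply(p, q):
    """Product of two polynomials given as coefficient lists, lowest power first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def differentiate(p):
    """Derivative of a polynomial given as a coefficient list."""
    return [k * c for k, c in enumerate(p)][1:]

def nth_derivative_polynomial(n):
    """Coefficients of (d/dx)^n (x^2 - 1)^n."""
    poly = [1]
    for _ in range(n):
        poly = multiply(poly, [-1, 0, 1])      # multiply by x^2 - 1
    for _ in range(n):
        poly = differentiate(poly)
    return poly

for n in range(1, 6):
    print(n, nth_derivative_polynomial(n))
```

The case n = 5 gives 30240x^5 - 33600x^3 + 7200x, which is 480 times 63x^5 - 70x^3 + 15x, the next polynomial of the sequence.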

If multiplied by certain constants, these polynomials give the Legendre polynomials, which play an important part in electromagnetic theory and other branches of science. They can be used to build series in much the same way that sines are for Fourier series. Like Fourier series, they are capable of representing functions that have discontinuities.


These notes have dealt with some parts of modern mathematics, showing the ideas that led to them and the applications that flow from them. It is hoped that these notes will make progress easier for anyone venturing into this unfamiliar region for the first time.

34 Pretoria Road,
Cambridge CB4 1HE.

Copyright W. W. Sawyer & Mark Alder 2000

Version: 26th  November 2020


