
Some Physics for Mathematicians: Mathematics 7120, Spring 2011

Len Gross
with the assistance of

Mihai Bailesteanu, Cristina Benea, Joe Chen, Nate Eldridge, Chi-Kwong (Alex) Fok, Igors Gorbovickis, Amanda Hood, John Hubbard, Yasemin Kara, Tom Kern, Janna Lierl, Yao Liu, Shisen Luo, Peter Luthy, Justin Moore, Maicol Ochoa Daza, Milena Pabiniak, Eyvindar Palsson, Ben Steinhurst, Baris Ugurcan, Hung Tran, Jim West, Daniel Wong, Tianyi Zheng, ...

June 23, 2011

These notes are based on a previous incarnation of this seminar:

Physics for Mathematicians: Mathematics 712, Spring 2003
by Len Gross

with the assistance of
Treven Wall, Roland Roeder, Pavel Gyra, Dmitriy Leykekhman, Will Gryc, Fernando Schwartz, Todd Kemp, Nelia Charalambous, ...

and the participation of
Kai-Uwe Bux, Martin Dindos, Peter Kahn, Bent Orsted, Melanie Pivarski, Jose Ramirez, Reyer Sjamaar, Sergey Slavnov, Brian Smith, Aaron Solo, Mike Stillman, ...


Contents

1 Introduction

2 Newtonian Mechanics
2.1 Work, energy and momentum for one particle
2.2 Work, energy and momentum for N particles
2.3 Angular Momentum
2.4 Rigid bodies and SO(3)
2.4.1 Angular velocity
2.4.2 Moment of Inertia and Angular Momentum
2.5 Configuration spaces and the Newtonian flow
2.6 Lagrangian Mechanics
2.6.1 Linear systems
2.6.2 General configuration spaces. (Nonlinear systems.)
2.7 Hamiltonian mechanics
2.7.1 The Legendre transform
2.7.2 From Lagrange to Hamilton via the Legendre transform
2.8 SUMMARY
2.9 References
2.10 Solutions to problems

3 Electricity and Magnetism
3.1 Lodestones and amber from antiquity till 1600
3.2 Production, transfer and storage of electrostatic charge
3.2.1 Production of electrostatic charge, 1704-1713. Hauksbee
3.2.2 Transfer of electrostatic charge, 1731. Grey and Dufay
3.2.3 Storage of electrostatic charge, 1746. Musschenbroek of Leyden
3.3 Quantitative measurement begins: COULOMB
3.3.1 How Coulomb did it
3.3.2 Mathematical consequences
3.4 The production of steady currents, 1780-1800
3.5 The connection between electricity and magnetism
3.5.1 Oersted, Ampere, Biot and Savart
3.5.2 FARADAY
3.6 MAXWELL puts it all together, 1861
3.7 Maxwell’s equations à la differential forms
3.8 Electromagnetic forces à la Newton, Lagrange and Hamilton
3.9 References

4 Quantum Mechanics
4.1 Spectroscope
4.2 100 years of spectroscopy: radiation measurements that defied Newton ∪ Maxwell
4.3 The rules of quantum mechanics
4.4 The Heisenberg commutation relations and uncertainty principle
4.5 Dynamics and stationary states
4.6 Hydrogen
4.7 Combined systems, Bosons, Fermions
4.8 Observables from group representations
4.8.1 Example: Angular momentum
4.9 Spin
4.10 Pictures: Heisenberg vs Schrodinger
4.11 Conceptual status of quantum mechanics
4.12 References

5 Quantum field theory
5.1 The harmonic oscillator
5.2 A quantized field; heuristics
5.3 The ground state transformation
5.4 Back to the quantized field
5.5 The time dependent field
5.6 Many, many particles: Fock space
5.6.1 Creation and annihilation operators
5.6.2 The canonical commutation relations
5.6.3 Occupation number bases
5.6.4 Time evolution on Fb and Ff
5.7 The Particle-Field isomorphism
5.8 Pre-Feynman diagrams

6 The electron-positron system
6.1 The Dirac equation
6.2 Dirac hole theory
6.3 Pair production

7 The Road to Yang-Mills Fields
7.1 Quantization of a particle in an electromagnetic field
7.2 Two classic debuts of connections in quantum mechanics
7.2.1 The Aharonov-Bohm experiment
7.2.2 The Dirac magnetic monopole

8 The Standard Model of Elementary Particles
8.1 The neutron and non-commutative structure groups. Timeline.

9 Appendices
9.1 The Legendre transform for convex functions
9.1.1 The Legendre transform for second degree polynomials
9.2 Poisson’s equation
9.3 Some matrix groups and their Lie algebras
9.3.1 Connection between SU(2) and SO(3)
9.3.2 The Pauli spin matrices
9.4 grad, curl, div and d
9.5 Hermite polynomials
9.6 Special relativity
9.7 Other interesting topics, not yet experimentally confirmed
9.7.1 The Higgs particle for the standard model
9.7.2 Supersymmetry
9.7.3 Conformal field theory
9.7.4 String theory
9.8 How to make your own theory

10 Yet More History
10.1 Timeline for electricity vs magnetism
10.2 Timeline for radiation measurements
10.3 Planck, Einstein and Bohr on radiation. Timeline.
10.4 List of elementary particles


1 Introduction

There have been a number of good books aimed at mathematicians, describing the mathematical structures that arise in quantum mechanics and quantum field theory. Here are just a few; most have the words “Quantum” and “Mathematicians” in the title: [21, 28, 3, 46, 57, 58, 59, 73, 74]. Some aim to explain how these mathematical structures build on those that represent an earlier physical theory. Some aim to give a mathematically precise exposition of various topics for mathematicians who want to understand the meaning of terms and ideas developed by physicists. It is no secret that the writing styles of mathematicians and physicists are not conducive to cross-communication.

This seminar is aimed at describing the experiments and observational data that led physicists to make up the successful theories that explained the observed phenomena. Of course we want to describe what these theories are and how they solved some of the experimentally produced problems. To this end we will make use of some of the common mathematical background of the participants (supplemented when necessary), and thereby save time which would otherwise have to be spent in a physics course. The listed prerequisites for this seminar were a) high school physics and b) two years of graduate mathematics courses. This tells you something.

There is really no substitute for going into a laboratory and doing experiments with real objects. But our room is not well equipped for this. Instead I’m going to try to substitute some history of electricity, magnetism and radiation, where possible, and describe the experimental facts that demanded explanation. There is, I think, some physical insight to be gained from understanding who tried to do what, who succeeded, who failed, and how and why. Whether a semi-historical exposition is really in the least bit helpful for conveying the notions of physics, even to a mathematically sophisticated audience, remains to be seen. It would be easy and quick just to write down Maxwell’s equations and give the definitions of the electric and magnetic fields. We will, however, repeat in class Faraday’s great experiment of 1831, which put the final touch on the experimental facts that Maxwell used 30 years later to bring the theory of electricity and magnetism into its final form. Fortunately it is possible to do this experiment with rather primitive equipment, which is all that we have.

One failed goal of this seminar was to reach a description, however primitive, of the current status of elementary particle physics. A list of omitted important topics on the straight road to this goal would very likely be longer than the table of contents. Maybe next time.

Here is the progression of topics necessary to understand in order to get at least a little background for current elementary particle theory.

[F = ma] → Lagrangian Mechanics → Hamiltonian Mechanics → Quantum Mechanics → Quantum Field Theory

Parallel to this one needs to understand two other “classical” theories and their relation.

Maxwell’s equations → Yang-Mills equations


2 Newtonian Mechanics

2.1 Work, energy and momentum for one particle

The objective of this section is to review the notions of linear momentum, angular momentum, energy and the conservation laws that they satisfy.

Recall Newton’s equation:

F = ma.

We are going to elaborate on the various forms that this equation takes in gradually more sophisticated systems. To begin, consider a single particle moving in space, which we, perhaps understandably, will take to be R3.

Denote the position of the particle at time t by x(t). Let m be a strictly positive constant. We will refer to m as the mass of the particle. The acceleration of the particle is by definition d2x/dt2. Newton’s equation then reads

F = m d2x(t)/dt2

where F is the force acting on the particle at time t.

In the preceding paragraph several mathematically undefined words have been used, namely “particle”, “time”, “mass” and “force”. These words acquire meaning only from experience with the physical world. I’m going to assume that the reader has some familiarity with their physical meaning and I’ll use them without further ado.

The force F is a vector in R3 and may depend on time, on the position x(t) of the particle, on the velocity dx(t)/dt and could, in principle, depend on higher derivatives of the position. But the existence theory for solutions of Newton’s equation is greatly affected if F depends on derivatives higher than the first. If the particle is charged and there are electromagnetic forces then the force on the particle can depend on t, on the position of the particle and also on its velocity. But the fundamental notions of mechanics are best understood in the simplest contexts first.

Definition 2.1 The work done on a particle by a constant force is

Work = force times distance.

Here “force” refers to the component of the force in the direction of motion.


Example 2.2 If you lift a 10 pound weight up 3 feet then you have done 30 foot-pounds of work. If you lower the ten pound weight 3 feet then you have done −30 foot-pounds of work. (The weight has done 30 foot-pounds of work on you.) If you move the weight 3 feet horizontally you have done no work, if you didn’t have to overcome some friction.

The extension of this notion of work in case the force is not constant and the path is not straight is given by

Definition 2.3 (Work done by a variable force) The work done by a force in moving a particle along a trajectory x(·) from t = a to t = b is

W = ∫_a^b F · dx(t).

This clearly reduces to “force times distance” if the force in the direction of motion is constant and the trajectory is a straight line.

Example 2.4 If one moves a ten pound weight around a vertical rectangle then the total work done on the weight is zero.

Although the force in Definition 2.3 can depend on the position and velocity of the particle and on the time t, we will consider, for a while, only forces that depend on the position of the particle and not on its velocity or explicitly on time. In this case the work done in moving a particle along a curve C is clearly given by a parametrization independent line integral

W = ∫_C F(x) · dx.

Definition 2.5 A force field F(·) defined in an open set U ⊂ R3 is called conservative if

∫_C F(x) · dx = 0

for every closed curve C in U.
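Definition 2.5 is easy to test numerically. The following sketch is not from the notes: the mass, g, the rectangle, and the little work() helper are all invented. It approximates the line integral of Definition 2.3 along a polygonal path and confirms Example 2.4 for the constant gravitational force of Example 2.7: the work around a closed vertical rectangle vanishes.

```python
# Numerical sanity check (a sketch with made-up constants, not from the notes)
m, g = 10.0, 1.0

def F(p):
    return (0.0, 0.0, -m * g)    # F = -grad(m g z) = (0, 0, -m g)

def work(path):
    """Approximate the line integral of F . dx along a polygonal path."""
    W = 0.0
    for a, b in zip(path, path[1:]):
        mid = tuple((ai + bi) / 2 for ai, bi in zip(a, b))
        dx = tuple(bi - ai for ai, bi in zip(a, b))
        W += sum(Fi * di for Fi, di in zip(F(mid), dx))
    return W

# a vertical rectangle, traversed once: over 3, up 3, back 3, down 3
rect = [(0, 0, 0), (3, 0, 0), (3, 0, 3), (0, 0, 3), (0, 0, 0)]
print(work(rect))  # 0.0
```

The midpoint rule used here is exact for a constant force; for a genuinely varying force one would subdivide each segment.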

Theorem 2.6 In a connected region U of R3 a force field F is conservative if and only if there is a function V : U → R such that

F = −grad V in U. (2.1)


Proof. Pick x0 ∈ U and define V(x) = −∫_{x0}^{x} F(y) · dy along any curve in U joining x0 to x. V is well defined because the integral is path independent. As you have proved many times before, the identity (2.1) holds for this function.

Terminology: V is called the potential of the force field F. V is clearly determined by F only up to an additive constant in the connected region U.

Example 2.7 (Gravitational force field of the earth in the interior of this room.) Choose coordinates x, y, z aligned with the left front corner of the room. Since the height of the room is rather small compared with the radius of the earth, the gravitational force is “pretty much” constant and is downward. Take V(x, y, z) = mgz where g is a constant to be determined by experiment and m is the mass of the particle acted upon. Then −grad V = −(0, 0, mg), which agrees with the assertion that the gravitational force is “locally” constant and downward.

Example 2.8 (Gravitational force field of the earth at a big distance, r, from the center of the earth.) Measurements suggest that the force on a particle of mass m at distance r from the center is proportional to 1/r2 and is pointed toward the center of the earth. The function V(x, y, z) = −mG/r gives such a force and is believed to be accurate in the absence of very huge gravitating masses.

Example 2.9 (Harmonic oscillator.) A particle constrained to move on a line is called a harmonic oscillator if it is subject to a force which, choosing the origin properly, always acts toward the origin and is proportional to the distance. I.e., F(x) = −kx for some constant k > 0. In this case the function V(x) = kx2/2 is a potential.

Definition 2.10 The energy of a particle, at a point x and having velocity v, in a conservative force field is

E(x, v) = (1/2)m|v|2 + V(x). (2.2)

Theorem 2.11 (Conservation of energy.) If a particle of mass m moves under the influence of a conservative force F(x) = −grad V(x) in accordance with Newton’s equations then its energy is constant along orbits.


Proof.

(d/dt)E(x(t), v(t)) = m(v(t), dv/dt) + (grad V, dx/dt)
                    = (v, ma − F)
                    = 0.
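Theorem 2.11 is also easy to watch on a computer. Here is a sketch, not part of the notes, for the harmonic oscillator of Example 2.9 with invented constants m and k: integrate Newton’s equation and check that E = (1/2)m|v|2 + V(x) stays put along the orbit.

```python
# Sketch (made-up constants): energy conservation for m x'' = -k x
m, k = 2.0, 3.0

def accel(x):
    return -k * x / m            # F(x) = -k x = -grad V, with V(x) = k x^2 / 2

def energy(x, v):
    return 0.5 * m * v * v + 0.5 * k * x * x   # (1/2) m |v|^2 + V(x)

x, v, dt = 1.0, 0.0, 1e-3
E0 = energy(x, v)
# velocity-Verlet steps; they track the energy surface very closely
for _ in range(10_000):
    v += 0.5 * dt * accel(x)
    x += dt * v
    v += 0.5 * dt * accel(x)

drift = abs(energy(x, v) - E0)
print(drift < 1e-5)  # True: E is (numerically) constant along the orbit
```

A naive Euler integrator would show E drifting steadily; the symmetric Verlet step is what keeps the drift this small.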

Definition 2.12 The momentum of a particle of mass m and velocity v is

p = mv.

Note: In terms of momentum Newton’s equations read

dp/dt = F.

Neat, huh? We’ll see better reasons for introducing the notion of momentum later.

2.2 Work, energy and momentum for N particles

Consider N particles of masses m1, ..., mN respectively. Let xj(t) denote the position of the jth particle at time t. Write vj = dxj/dt for the velocity of the jth particle.

Definition 2.13 The momentum of the jth particle is

pj = mjvj

The total momentum of the system is

P = ∑_{j=1}^N pj

Two body forces. A frequently arising kind of force in such a system is that in which the total force on the jth particle is a sum

Fj = ∑_{k≠j} Fjk


where Fjk is the force exerted on the jth particle by the kth particle, e.g., gravitational forces. These so-called “two body forces” are said to obey “action and reaction” if the jth particle “pulls back” on the kth particle by the same force. That is

Fjk(xj, xk) = −Fkj(xk, xj).

Theorem 2.14 (Conservation of momentum.) In a system with two body forces obeying action and reaction, the total momentum is conserved during the Newtonian flow.

Proof.

dP/dt = ∑_{j=1}^N dpj/dt
      = ∑_j Fj   (by Newton’s equations)
      = ∑_j ∑_{k≠j} Fjk
      = 0,

since Fjk and Fkj cancel in pairs by action and reaction.
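Theorem 2.14 in action, as a numerical sketch: the masses, the time step, and the particular linear action-reaction force law below are all my inventions, not the notes’.

```python
# Sketch (invented data): two particles on a line with F12 = -F21
m1, m2, k, dt = 1.0, 3.0, 2.0, 1e-3
x1, x2, v1, v2 = 0.0, 1.0, 0.5, -0.2
P0 = m1 * v1 + m2 * v2
for _ in range(5_000):
    F12 = k * (x2 - x1)          # force on particle 1 from particle 2
    v1 += dt * F12 / m1
    v2 += dt * (-F12) / m2       # F21 = -F12: action and reaction
    x1 += dt * v1
    x2 += dt * v2
print(abs(m1 * v1 + m2 * v2 - P0) < 1e-10)  # True: P is conserved
```

Note that P is conserved here for any step size: each step adds dt·F12 and −dt·F12 to the two momenta, mirroring the cancellation in the proof.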

{The following theorem is done in greater generality in the section on Lagrangians. So change the following later.}

Theorem 2.15 (Conservation of energy.) Assume that each of the forces Fjk is conservative and depends only on the relative position of the two particles. That is,

Fjk = −gradj V(xj − xk).

[For simplicity we take V the same for each pair.] Then the total energy of the system,

E = ∑_{j=1}^N (1/2)mj|vj|2 + (1/2)∑_{j≠k} V(xj − xk),

is conserved under the Newtonian flow.


Proof.

dE(t)/dt = ∑_{j=1}^N mj (dvj/dt) · vj + (1/2)∑_{j≠k} ((grad V)(xj − xk)) · (vj − vk)
         = ∑_{j=1}^N Fj · vj − (1/2)∑_{j≠k} Fjk · (vj − vk)
         = 0.

END OF DAY 1. 1/25/2011

SUMMARY. We defined momentum and energy and showed that for certain isolated systems momentum and energy are both invariant (conserved) under the Newtonian flow.

Exercise 2.16 Suppose that a cubical piece of wood of mass 5 grams is sitting on a table. From the center of the right side of the cube a strong but weightless rope is attached, which extends horizontally to a frictionless pulley at the right side of the table. The rope descends from the pulley down to a wooden ball of mass 3 grams, to which it is attached at the top. When the horizontal and vertical portions of the rope are taut and the system is stationary, the ball and cube are released. Find the acceleration of the ball as a function of time.

Hint: Although there are no point particles apparent in this problem you may apply the ideas from the preceding section by using your physical intuition and good judgement. (You’re welcome.)

2.3 Angular Momentum

We are going to have to review some elementary facts about the rotation group SO(3) because it plays a central role in understanding angular momentum and spin as well as the classification of elementary particles.

Definition 2.17 The angular momentum about the origin, of a particle at x moving with momentum p, is

L = x × p.

The torque about the origin of a force F acting on the particle is

N = x × F.


Lemma 2.18 (Newton’s equations for angular momentum)

dL/dt = N. [Note the similarity to dp/dt = F.]

Proof.

dL/dt = dx/dt × mv + x × dp/dt
      = 0 + x × F
      = N.

Theorem 2.19 The total angular momentum of a system of N particles is conserved under the Newtonian flow if the forces are 2-body forces that obey “action and reaction” and in addition act along the line joining the pair of particles.

Proof. Case N = 2:

dL/dt = (d/dt)(x1 × p1 + x2 × p2)
      = x1 × dp1/dt + x2 × dp2/dt   (because vj × pj = 0)
      = x1 × F1 + x2 × F2.

But by assumption F2 = −F1 and also these forces act along the line joining x1 and x2. Hence

dL/dt = (x1 − x2) × F1 = 0.

Case of general N:

dL/dt = ∑_{j=1}^N (d/dt)(xj × pj)
      = ∑_{j=1}^N (vj × pj + xj × dpj/dt)
      = ∑_{j=1}^N xj × Fj
      = ∑_{j=1}^N xj × ∑_{k≠j} Fjk
      = ∑_{unordered pairs {j,k}} (xj − xk) × Fjk   (as in the case N = 2)
      = 0.
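The same kind of numerical check works for Theorem 2.19. A sketch, not from the notes: two particles with an invented linear central force acting along x2 − x1, with L = x1 × p1 + x2 × p2 monitored along the flow. The helpers cross, add, scale and total_L are mine.

```python
# Sketch (invented data): conservation of total angular momentum
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def add(a, b):
    return tuple(ai + bi for ai, bi in zip(a, b))

def scale(c, a):
    return tuple(c * ai for ai in a)

m1, m2, k, dt = 1.0, 2.0, 1.5, 1e-4
x1, x2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
v1, v2 = (0.0, 0.3, 0.0), (0.0, -0.1, 0.2)

def total_L():
    # L = x1 x p1 + x2 x p2
    return add(cross(x1, scale(m1, v1)), cross(x2, scale(m2, v2)))

L0 = total_L()
for _ in range(5_000):
    F12 = scale(k, add(x2, scale(-1.0, x1)))  # on particle 1, along x2 - x1
    v1 = add(v1, scale(dt / m1, F12))
    v2 = add(v2, scale(-dt / m2, F12))        # action and reaction
    x1 = add(x1, scale(dt, v1))
    x2 = add(x2, scale(dt, v2))

drift = max(abs(a - b) for a, b in zip(total_L(), L0))
print(drift < 1e-3)  # True: total angular momentum survives the flow
```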


2.4 Rigid bodies and SO(3)

Definition 2.20 A rigid body is a discrete or continuous system of particles such that the distance between any two particles cannot vary with time. [This definition is taken from Goldstein, first edition, page 10.]

The discrete or continuous distribution of particles can be represented by a finite measure µ on R3, which we will take to have compact support, and which should be interpreted as the mass distribution of the body. The only motions that a rigid body can make are

a. a translation. This translates the measure µ.
b. a rotation. This rotates the measure µ.
c. a composition of these two (i.e. a proper Euclidean motion).
d. a continuously changing family of proper Euclidean motions.

Facts about the rotation group SO(3).

1. SO(3) is homeomorphic to Bπ := {v ∈ R3 : |v| ≤ π} with opposite points on the surface identified. “Proof:” Any rotation is a rotation around some axis v. Use the right hand rule to determine the direction of the rotation and choose |v| to be the amount of right-handed rotation. (If the right thumb points along v then the rotation is in the direction of one’s fingers.) Then make the obvious identifications.

2. The Lie algebra of SO(3) is so(3) ≡ {3 × 3 skew symmetric real matrices}.

3. so(3) is isomorphic as a Lie algebra to R3 in the cross product.

4. π1(SO(3)) = Z2.

5. SU(2) is the covering group of SO(3) by the covering map (spell this out).

To be precise, let E3 denote the proper Euclidean group of R3. Thus a map A : R3 → R3 is in E3 if it is of the form Ax = Lx + a where L ∈ SO(3) and a ∈ R3. (So E3 is the semidirect product of SO(3) with R3.)

Let g(t) be a smooth curve in the group E3. Then the one parameter family of measures g(t)∗µ is the typical allowed motion of the rigid body. In the special case that one point in the body is fixed in space, say at the origin, translation of this point cannot occur. In this case g(t) lies in the subgroup SO(3) for all t.
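To make E3 concrete, here is a sketch (the particular rotations and translations are arbitrary choices of mine): an element is a pair (L, a) acting by x → Lx + a, and composing two such maps works out to the semidirect product law (L1, a1)(L2, a2) = (L1L2, L1a2 + a1).

```python
# Sketch: the proper Euclidean group E3 as pairs (L, a), x -> L x + a
import math

def rot_z(t):
    # a rotation in SO(3) about the z-axis by angle t
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matvec(L, x):
    return tuple(sum(L[i][j] * x[j] for j in range(3)) for i in range(3))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def act(g, x):
    L, a = g
    return tuple(yi + ai for yi, ai in zip(matvec(L, x), a))

def compose(g1, g2):
    # first apply g2, then g1: (L1 L2, L1 a2 + a1)
    (L1, a1), (L2, a2) = g1, g2
    return (matmul(L1, L2),
            tuple(yi + ai for yi, ai in zip(matvec(L1, a2), a1)))

g1 = (rot_z(0.7), (1.0, 0.0, 2.0))
g2 = (rot_z(-1.1), (0.0, 3.0, 0.0))
x = (0.5, -2.0, 1.0)
lhs, rhs = act(compose(g1, g2), x), act(g1, act(g2, x))
print(max(abs(a - b) for a, b in zip(lhs, rhs)) < 1e-12)  # True
```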


What form do Newton’s equations take for this system? For a rigid body the analog of linear velocity is angular velocity. We need to digress for a moment to discuss angular velocity.

END OF DAY 2 1/27/2011

2.4.1 Angular velocity

Define Aωx = ω × x for ω and x in R3. Recall the famous identities a × (b × c) = b(a · c) − c(a · b) and a · (b × c) = (a × b) · c.

Theorem 2.21

A∗ω = −Aω (2.3)

and

Au×v = [Au, Av]. (2.4)

Proof. (Aωx, y) = (ω × x) · y = −(x × ω) · y = −x · (ω × y) = −(x, Aωy). To prove the second equality observe that

u × (v × x) = v(u · x) − x(u · v)
v × (u × x) = u(v · x) − x(u · v)
(u × v) × x = v(x · u) − u(x · v)
            = u × (v × x) − v × (u × x).

Hence Au×v x = Au(Av x) − Av(Au x) = [Au, Av]x.

Now choose coordinate axes such that ω = (0, 0, a) with a > 0. If x = (x1, x2, x3) then ω × x = (−ax2, ax1, 0). Hence

Aω = ( 0  −a  0 )
     ( a   0  0 )        (2.5)
     ( 0   0  0 )

and therefore

exp(tAω) = ( cos ta  −sin ta  0 )
           ( sin ta   cos ta  0 )        (2.6)
           (   0        0     1 )


So exp(tAω) is a rotation in the x, y plane by an amount θ = θ(t) = ta. Hence dθ/dt = a. Thus the one parameter group of rotations exp(tAω) rotates R3 around the axis ω with speed equal to |ω|. The direction of rotation is determined from ω by the right hand rule. (If the right thumb points along ω then the rotation is in the direction of one’s fingers.) Aω is the infinitesimal generator of this one parameter group of rotations and ω is called the ANGULAR VELOCITY of this motion if t is time. The previous theorem shows that the map ω → Aω is a Lie algebra homomorphism of R3 (in the cross product) to so(3). It is clearly an isomorphism because dim so(3) = 3. The map ω → exp(Aω) maps Bπ ≡ {ω : |ω| ≤ π} onto SO(3) because every rotation of R3 is a rotation around some unique axis by at most π radians. But if |ω| = π then exp(A±ω) are the same rotation. So this map is a diffeomorphism from Bπ with antipodal points identified onto SO(3).

The rotational motion of a point x ∈ R3 under this 1-parameter rotation group is given by x(t) = exp(tAω)x. The velocity of this motion is then

v(t) = (d/dt) exp(tAω)x = Aω x(t) = ω × x(t).

The expression

v = ω × x

is the usual form of this velocity used in the physics literature. It makes the angular velocity ω apparent. Otherwise put, the vector field

x → ω × x

is the vector field that generates the one parameter diffeomorphism group exp(tAω) of R3.
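The correspondence ω → Aω and the rotation exp(tAω) can be checked directly. In this sketch (not from the notes) the matrix exponential is a plain truncated Taylor series, my shortcut, which is accurate enough for these small matrices; the names A, matvec and expm are mine.

```python
# Sketch: A_w x = w x x, and exp(A_w) rotates by |w| about the axis w
import math

def A(w):
    # the skew-symmetric matrix with A(w) x = w x x
    return [[0.0, -w[2], w[1]],
            [w[2], 0.0, -w[0]],
            [-w[1], w[0], 0.0]]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(3)) for i in range(3)]

def expm(M, terms=30):
    # exp(M) = sum_k M^k / k!, truncated; fine for these small matrices
    E = [[float(i == j) for j in range(3)] for i in range(3)]
    P = [row[:] for row in E]
    for n in range(1, terms):
        P = [[sum(P[i][l] * M[l][j] for l in range(3)) / n for j in range(3)]
             for i in range(3)]
        E = [[E[i][j] + P[i][j] for j in range(3)] for i in range(3)]
    return E

w = (0.0, 0.0, math.pi / 2)   # angular velocity pi/2 about the z-axis
R = expm(A(w))                # exp(A_w): a rotation by pi/2 at time t = 1
y = matvec(R, [1.0, 0.0, 0.0])
err = max(abs(a - b) for a, b in zip(y, [0.0, 1.0, 0.0]))
print(err < 1e-9)  # True: (1, 0, 0) is carried a quarter turn, right-hand rule
```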

2.4.2 Moment of Inertia and Angular Momentum

Definition 2.22 Let µ be the measure representing the mass density of a rigid body. The moment of inertia about a point, which we take to be the origin, is the linear transformation M : R3 → R3 defined by

Mω = ∫_{R3} x × (ω × x) dµ(x).


Exercise 2.23
a. Show that M is a nonnegative symmetric operator.
b. Show that M is invertible if and only if the body does not lie in a plane through the origin. (I.e. µ is not supported in such a plane.)

Theorem 2.24 The kinetic energy of a body with one point fixed in space and with angular velocity ω is

T = (1/2)(Mω, ω)_{R3}.

Proof. (Take note of the similarity to E = (1/2)m|v|2.) The velocity of a point x in the rigid body is v = ω × x. The kinetic energy of the body is therefore

T = ∫_{R3} (1/2)|v|2 dµ(x) (2.7)
  = (1/2) ∫ |ω × x|2 dµ(x) (2.8)
  = (1/2) ∫ (ω × x, ω × x) dµ(x) (2.9)
  = (1/2) ∫ (x × (ω × x), ω) dµ(x) (2.10)
  = (1/2)(Mω, ω). (2.11)

In the next to the last line I used the fact that the cross product is skew-symmetric in the R3 inner product. This is a special case of the fact that ad Aω is skew symmetric on any Ad invariant inner product on so(3).
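For a body made of finitely many point masses the integral defining M becomes a finite sum, and Theorem 2.24 can be checked numerically. A sketch with made-up masses and positions (the discrete body and the helpers are mine, not the notes’):

```python
# Sketch: moment of inertia of a discrete body, and T = (1/2)(Mw, w)
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# a rigid body given as point masses (m_i, x_i): a discrete measure mu
body = [(1.0, (1.0, 0.0, 0.0)),
        (2.0, (0.0, 1.0, 1.0)),
        (0.5, (1.0, -1.0, 2.0))]

def M_apply(w):
    # Mw = integral of x x (w x x) dmu(x), here a finite sum
    out = (0.0, 0.0, 0.0)
    for m, x in body:
        term = cross(x, cross(w, x))
        out = tuple(o + m * t for o, t in zip(out, term))
    return out

w = (0.3, -1.0, 0.7)
T_operator = 0.5 * dot(M_apply(w), w)                               # (1/2)(Mw, w)
T_direct = 0.5 * sum(m * dot(cross(w, x), cross(w, x)) for m, x in body)
print(abs(T_operator - T_direct) < 1e-12)  # True
```

The same M_apply can be used to check the symmetry claimed in Exercise 2.23a on a few pairs of vectors.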

Definition 2.25 The angular momentum of a rigid body rotating with angular velocity ω about a point in space (which we take to be the origin) is

L = Mω,

where M is the moment of inertia about the same point.

Definition 2.26 The torque on a rigid body about the origin is the vector sum of the torques acting at each point of the body. (If the body has handles this can be a complicated force to describe.) Since the body is rigid and may even have very light handles or rods attached to the main mass, the torque need not act at the mass points.

N = ∫_{R3} x × F(dx) (2.12)

where F is a vector valued measure on R3, possibly unrelated to the mass density measure µ. But if the force is entirely gravitational then they are clearly related, as in the next Exercise.

Lemma 2.27 Newton’s equations for a rigid body with one point held fixed at the origin are

dL/dt = N

Proof. You do this one.

Exercise 2.28 Seesaw: A narrow seesaw of length 20 feet is supported at its center. The seesaw is made out of wood of density 3 lbs. per linear foot. Person A sits on the far right end of the seesaw (10 feet from the fulcrum). Person A weighs 40 lbs. Person B sits 8 feet to the left of the fulcrum. Person B weighs 60 lbs. The right side of the seesaw tilts upward by an angle θ. Assume that θ = 0 at t = 0. Find d2θ/dt2 at t = 0.

Exercise 2.29 Offset wheel on an axle. Compute the torque. Write a short essay on static versus dynamical balancing. You may consult with classmates, but not with local tire shops.

Exercise 2.30 Use the product rule for differentiation of SO(3) valued functions to explain Coriolis force. Ref. Goldstein, Chapter 5.

2.5 Configuration spaces and the Newtonian flow

So far we have discussed the form that Newton’s equations take for some simple mechanical systems: n particles free to move in R3 under various kinds of forces, and a rigid body. We saw that there are some “constants of the motion”: total energy, total momentum, total angular momentum, whose invariance under the Newtonian flow (under some circumstances on the forces) seems “reasonable” given our developed sense of intuition, 300 years after Newton.


We want to consider now the general structure of more complicated systems in order to force a better understanding of the general structure of Newtonian flows. Our goal is to provide a jumping off point to quantum mechanics.

Definition 2.31 The configuration space of a mechanical system is a C∞ manifold whose points are in one-to-one correspondence with the possible positions (alias configurations) of the system.

This is the kind of definition linking a precise mathematical notion to an intuitive physical notion that can only be understood by examples. Here are some examples.

Example 2.32 (Examples of configuration spaces.)

1. Physical system: a particle free to move in R3 under some forces. The configuration space C is R3.

2. Physical system: N particles, each free to move in R3 under some forces. C = R3N. (You could remove the coincident points from this product to avoid having two or more particles at the same point if you wish.)

3. Physical system: a particle constrained to move on a sphere of radius five feet (also known as a spherical pendulum). C is a sphere in R3 of radius five feet.

4. Physical system: a pendulum on an arm of length 5 feet allowed to move in a plane. C is a circle of radius five feet.

5. Physical system: a double pendulum (the arm of the second pendulum is attached to the mass of the first pendulum). C = S1 × S1. (Each S1 can have a different radius.)

6. Physical system: a rigid body. C = E3.

7. Physical system: a rigid body with one point fixed in space. C = SO(3).

END of DAY 3 2/1/2011

A curve t → q(t) in C is the mathematical object representing the trajectory of the system. Denote by T(C) the tangent bundle. Then the curve t → q̇(t) ∈ Tq(t)(C) is a map from R to T(C). Newton’s equations, being second order in q(t), are first order in the pair (q(t), q̇(t)). For example the equation of a particle moving in R3 in accordance with mẍ(t) = F(x) can be equivalently described as a solution to the first order system

(d/dt)(x(t), v(t)) = (v(t), F(x(t))/m) (2.13)

because the first component of this equation forces the identification v(t) = ẋ(t). Define a vector field X on T(R3) (which can be identified with R6, at some small risk) by

X(x,v) = (v, F(x)/m), x ∈ R3, v ∈ R3 = Tx(R3). (2.14)

Then we may rewrite (2.13) as

(d/dt)(x(t), v(t)) = X(x(t),v(t)) (2.15)

As in this example, in general systems the forces in the system, as well as the masses (or e.g. moments of inertia, if some rigid bodies are involved), can be encoded into a vector field X over T(C), after determining what manifold C represents the configuration space of the physical system. Newton’s equations then take the form

db(t)/dt = Xb(t), (2.16)

where b(t) = (q(t), q̇(t)) comprises the instantaneous position and velocity of the system. By the “state of the system” one means the pair q ∈ C and v ∈ Tq(C). Since the initial position and velocity of the system determine the solution to (2.16), the initial state of the system determines the state of the system for all time. Of course we are assuming here that Equation (2.16) has solutions which exist for all time and are unique. In this case the solution to (2.16) defines a 1-parameter group of diffeomorphisms φt of T(C) given by

(d/dt)φt(a) = Xφt(a) (2.17)

The group φt is the Newtonian flow on T(C) determined by the vector field X.

Identifying the vector field X on T(C) tends to be a messy business if the system (i.e. C and forces) is a little bit complicated. Our aim in the next section is to show that the vector field X can be deduced from a knowledge of the kinetic and potential energies, even for a quite general system. Both of these are functions on T(C) and encode the masses and forces.
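Here is a sketch of (2.13)-(2.17) for the oscillator (the constants and the Runge-Kutta integrator are my choices, not part of the notes): build the vector field X on T(R), approximate the flow φt, and check the one parameter group law φs(φt(b)) = φ(s+t)(b).

```python
# Sketch (invented data): the Newtonian flow as a one-parameter group
m, k = 1.0, 4.0

def X(b):
    # the vector field of (2.14) on T(R): X_(x,v) = (v, F(x)/m), F(x) = -k x
    x, v = b
    return (v, -k * x / m)

def rk4_step(b, h):
    # one classical Runge-Kutta step for db/dt = X_b
    def shifted(c, d):
        return tuple(bi + c * di for bi, di in zip(b, d))
    k1 = X(b)
    k2 = X(shifted(h / 2, k1))
    k3 = X(shifted(h / 2, k2))
    k4 = X(shifted(h, k3))
    return tuple(bi + h / 6 * (p + 2 * q + 2 * r + s)
                 for bi, p, q, r, s in zip(b, k1, k2, k3, k4))

def phi(t, b, n=2000):
    # approximate flow phi_t of (2.17)
    h = t / n
    for _ in range(n):
        b = rk4_step(b, h)
    return b

b0 = (1.0, 0.0)
one_shot = phi(0.7, b0)
two_legs = phi(0.4, phi(0.3, b0))
print(max(abs(a - c) for a, c in zip(one_shot, two_legs)) < 1e-8)  # True
```

The group law holds exactly for the true flow; numerically it holds up to the integrator’s error, which is why the comparison uses a tolerance.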


Example 2.33 (Sphere in R3.) Take C to be the centered sphere in R3 of radius R. Let m > 0. Define the kinetic energy to be Tq(v) = (1/2)m|v|2 for v ∈ Tq(C). The norm on v is the Euclidean norm on R3 restricted to the tangent space to C at q. Define V(q, v) = mg q3, where g is the gravitational constant in your favorite units and q = (q1, q2, q3). Both T and V are functions on T(C), although V doesn’t depend on v. The force specified by the potential V is then F(q) = −∇V = −mg(∂/∂q3), which is the constant, downward pointing, gravitational force.

Exercise 2.34 (Harmonic oscillator) Take C = R, T(v) = (1/2)mv2, and V(x) = 3x2. Compute the force and find the solution to Newton’s equation, given that the state of the system at time zero is (x0, v0).

2.6 Lagrangian Mechanics

Our goal here is to see how the kinetic and potential energy of a mechanical system determine the Newtonian flow. We really need to do this for a general configuration space so that the transition to Hamiltonian mechanics will be more understandable. But first we are going to practice with linear systems, where we can make computations easily, including an actual integration by parts.

2.6.1 Linear systems

Notation 2.35 Let k be a strictly positive integer, and for each integer j ∈ {1, . . . , k} let m_j be a strictly positive constant. Define a quadratic form T : R^k → [0, ∞) by

T(v) = (1/2) ∑_{j=1}^k m_j v_j^2,   v = (v_1, . . . , v_k) (2.18)

Let V ∈ C^2(R^k). We have met special cases of this two-function structure in Section 2.2. There we were concerned with the case in which k = 3N, N being the number of particles moving in R^3, the present constants m_1, m_2, m_3 being all equal to the mass of the first particle, whose velocity is given by the first three components of the k-vector v above. And so on. We want to free the computations in this section from the unnecessary notational complexity of that earlier case. To this end we will also write simply q = (q_1, . . . , q_k) for


the Cartesian coordinates in R^k. If the coordinates of the system of particles are given by the point q then V(q) is to be interpreted as the potential of the system. The jth component of the usual force F = −grad V (for one particle) may now be written F_j(q) = −∂V(q)/∂q_j for the system. To say that the system point q(t) moves in accordance with Newton's equations means

m_j q̈_j(t) = F_j(q(t)) = −(∂V/∂q_j)(q(t))   (Newton's equations) (2.19)

The total energy at time t along the orbit is by definition

E(t) = T(q̇(t)) + V(q(t)) (2.20)

Here we have written q̇(t) for the velocity v(t) of the system in the kinetic energy term.

Conservation of energy in the Newtonian flow holds in this seemingly more general context than that of Section 2.2, but has an equally easy proof:

Theorem 2.36
dE(t)/dt = 0 (2.21)

if (2.19) holds.

Proof. In view of (2.18) we see that

dE(t)/dt = ∑_{j=1}^k m_j q̇_j(t) q̈_j(t) + (d/dt)V(q(t))

         = ∑_{j=1}^k q̇_j(t) {m_j q̈_j(t) + (∂V/∂q_j)(q(t))}

         = 0

Be aware, however, that conservation of momentum does not hold unless the forces have some kind of special structure, such as two-body forces obeying action and reaction, discussed in Section 2.2.
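Theorem 2.36 is also easy to watch numerically. The sketch below (the masses and the quadratic potential are arbitrary choices of ours, purely for illustration) integrates Newton's equations (2.19) for k = 2 with the velocity Verlet scheme and checks that E = T + V stays constant along the orbit.

```python
# Numerical check of Theorem 2.36 for k = 2, with illustrative masses and potential.
m = (1.0, 2.0)

def V(q):
    return q[0]**2 + q[0]*q[1] + 2.0*q[1]**2

def F(q):
    # F_j = -(dV/dq_j), computed by hand for this V
    return (-(2.0*q[0] + q[1]), -(q[0] + 4.0*q[1]))

def energy(q, v):
    return 0.5 * sum(mj*vj*vj for mj, vj in zip(m, v)) + V(q)

def step(q, v, dt):
    # one velocity Verlet step of m_j q''_j = F_j(q)
    a = tuple(fj/mj for fj, mj in zip(F(q), m))
    vh = tuple(vj + 0.5*dt*aj for vj, aj in zip(v, a))
    q = tuple(qj + dt*vj for qj, vj in zip(q, vh))
    a = tuple(fj/mj for fj, mj in zip(F(q), m))
    v = tuple(vj + 0.5*dt*aj for vj, aj in zip(vh, a))
    return q, v

q, v = (1.0, -0.5), (0.0, 0.3)
E0 = energy(q, v)
for _ in range(5000):
    q, v = step(q, v, 1e-3)
print(abs(energy(q, v) - E0) < 1e-4)  # True: E is conserved up to scheme error
```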

Definition 2.37 The Lagrangian of the system specified in (2.18) is

L(q, v) = T(v) − V(q). (2.22)

22

It would be well to keep in mind that L is a function of two variables, q and v, each of which varies over R^k. In keeping with the usual conventions (namely f_x(x, y) means the partial derivative of f with respect to x, evaluated at (x, y)) we will write (L_q)(q, v) and (L_v)(q, v) for the two partial directional derivatives. If the q and v variables are linked then the usual chain rule applies. For example, if q(t) is a curve in R^k and v(t) = q̇(t) is its velocity, then a small variation of the curve q(·), giving the nearby curve q(t) + sh(t), will also vary the velocity: v(t) ↦ (d/dt)(q(t) + sh(t)) = v(t) + sḣ(t). Consequently, by the chain rule, at s = 0,

(d/ds)L(q(t) + sh(t), (d/dt)(q(t) + sh(t)))

  = (L_q)(q(t), v(t))⟨h(t)⟩ + (L_v)(q(t), v(t))⟨ḣ(t)⟩ (2.23)

  = −∑_{j=1}^k V_j(q(t)) h_j(t) + ∑_{j=1}^k m_j q̇_j(t) ḣ_j(t) (2.24)

for each t, where h(t) = (h_1(t), . . . , h_k(t)).

Theorem 2.38 Let t_0 < t_1 and let a_0 and a_1 be two points in R^k. Denote by P the set of C^2 paths in R^k over [t_0, t_1] joining a_0 to a_1. Let Y = {h ∈ C^2([t_0, t_1], R^k) : h(t_0) = h(t_1) = 0}. Then Y is a linear space (in fact a Banach space in the C^2 norm) and if q(·) ∈ P and h ∈ Y then t ↦ q(t) + sh(t) is in P for all real s. Suppose that q(·) ∈ P. Then the following are equivalent.

1) Newton’s equations:

m_j q̈_j(t) = −(∂V/∂q_j)(q(t)) (2.25)

2) Lagrange’s equations:

(d/dt)[(L_v)(q(t), q̇(t))] − (L_q)(q(t), q̇(t)) = 0   ∀ t ∈ [t_0, t_1] (2.26)

3) Least action principle: The "action" functional A : P → R given by

A(y) = ∫_{t_0}^{t_1} L(y(t), ẏ(t)) dt (2.27)

has a critical point at y(·) = q(·).

23

(N.B. Both terms in (2.26) are linear functionals on R^k.)
Proof. Inserting y(t) = q(t) + sh(t) into A(y) and using (2.23) we find

(d/ds)|_{s=0} A(q(·) + sh(·))

= ∫_{t_0}^{t_1} {(L_q)(q(t), v(t))⟨h(t)⟩ + (L_v)(q(t), v(t))⟨ḣ(t)⟩} dt

= ∫_{t_0}^{t_1} {(L_q)(q(t), v(t))⟨h(t)⟩ − ((d/dt)[(L_v)(q(t), v(t))])⟨h(t)⟩} dt

= ∫_{t_0}^{t_1} {(L_q)(q(t), v(t)) − (d/dt)[(L_v)(q(t), v(t))]}⟨h(t)⟩ dt (2.28)

To derive the third line we used an integration by parts along with h(t_0) = h(t_1) = 0 to get rid of the boundary terms. (If you feel queasy about dealing with integration by parts for such linear functionals, rewrite it in coordinates, as in (2.24).)

Thus q(·) is a critical point of A(·) if and only if the integral in (2.28) is zero for all of the allowed functions h(·). Since these constitute quite a hefty set of functions, the integral is zero for all such h if and only if the factor in braces in the integrand is zero, that is, if and only if Lagrange's equation (2.26) holds. Moreover, if we put L = T − V in (2.27) and write this equation in coordinates we find

0 = {(d/dt)[(L_v)(q(t), q̇(t))] + V_q(q(t))}⟨h(t)⟩

  = ∑_{j=1}^k {m_j q̈_j(t) + V_j(q(t))} h_j(t)

for all allowed functions h_j(t). This is equivalent to Newton's equations (2.25).
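The least action principle is pleasant to verify numerically. The sketch below (our own illustration, with m = 1 and the illustrative potential V(x) = x^2/2; the theorem holds for any V ∈ C^2) discretizes the action (2.27) on [0, 1]. The true solution of ẍ = −x with q(0) = 0, q(1) = sin(1) is q(t) = sin(t); its directional derivative of the action in an allowed direction h vanishes, while a non-solution path joining the same endpoints gives a clearly nonzero derivative.

```python
import math

N = 2000
ts = [i / N for i in range(N + 1)]

def action(path):
    """Discretized A(y) = sum L(y, y') dt with L = v^2/2 - x^2/2."""
    A = 0.0
    for i in range(N):
        v = (path[i+1] - path[i]) * N        # difference-quotient velocity
        x = 0.5 * (path[i] + path[i+1])      # midpoint position
        A += (0.5*v*v - 0.5*x*x) / N
    return A

def dA_ds(path, h, eps=1e-6):
    """Directional derivative (d/ds) A(path + s h) at s = 0."""
    plus  = [p + eps*hi for p, hi in zip(path, h)]
    minus = [p - eps*hi for p, hi in zip(path, h)]
    return (action(plus) - action(minus)) / (2 * eps)

h = [math.sin(math.pi * t) for t in ts]      # vanishes at both endpoints
true_path = [math.sin(t) for t in ts]        # solves x'' = -x
line_path = [math.sin(1.0) * t for t in ts]  # same endpoints, not a solution
print(abs(dA_ds(true_path, h)) < 1e-3)       # True: critical point
print(abs(dA_ds(line_path, h)) > 0.1)        # True: clearly not critical
```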

2.6.2 General configuration spaces. (Nonlinear systems.)

Notation 2.39 Let L : T(C) → R be any smooth function. In addition to the one-form dL on T(C) there is another natural one-form determined by L. Let π : T(C) → C be the natural projection. Then π_* : T(T(C)) → T(C). For each q ∈ C let L_q : T_q → R be the restriction of L to T_q(C), which is of course contained in T(C). Since T_q(C) is linear we may regard d(L_q) as a


linear functional on T_q(C) for each point v in T_q(C). If w ∈ T_{q,v}T(C) then π_*w ∈ T_q(C) and so

θ_L⟨w⟩ := d(L_q)⟨π_*w⟩

is a well defined real number. θ_L is therefore a well defined one-form on T(C). {Give its expression in local coordinates.}
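An answer to the bracketed prompt, worth verifying independently, in induced coordinates (q_1, . . . , q_k, v_1, . . . , v_k):

```latex
% For a tangent vector w = \sum_j a_j\,\partial/\partial q_j + b_j\,\partial/\partial v_j
% at the point (q,v), one has \pi_* w = \sum_j a_j\,\partial/\partial q_j, so that
\theta_L\langle w\rangle
  = d(L_q)\big|_v \langle \pi_* w\rangle
  = \sum_{j=1}^{k} \frac{\partial L}{\partial v_j}(q,v)\, a_j,
\qquad\text{i.e.}\qquad
\theta_L = \sum_{j=1}^{k} \frac{\partial L}{\partial v_j}(q,v)\, dq_j .
```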

Theorem 2.40 Let L : T(C) → R be a smooth function. Let π : T(C) → C be the natural projection and define the associated 1-form θ_L as in Notation 2.39. Assume that for each point q ∈ C the function L_q on T_q has no critical points. (I.e. its second derivative matrix in the linear coordinates is invertible at each point v ∈ T_q.) (This is in the spirit of saying that all masses are strictly positive.)

Then there exists a unique vector field X on T (C) such that

π∗Xq,v = v (2.29)

and

L_X θ_L − dL = 0 (2.30)

where L_X denotes the Lie derivative.
Moreover, in any local coordinate system (q_1, . . . , q_k) and induced coordinates (v_1, . . . , v_k) on T_q(C) over the coordinate patch, the flow t ↦ (q(t), q̇(t)) in T(C) induced by the vector field X satisfies Lagrange's equations

(d/dt)[(∂L/∂v_i)(q(t), q̇(t))] − (∂L/∂q_i)(q(t), q̇(t)) = 0,   i = 1, . . . , k (2.31)

Proof. This statement is based on a lecture by Roland Roeder (in Math. 712, 2003). Very likely there is a proof in either [39] or in [43].

Note: A converse of this theorem also holds in the sense that, in the presence of (2.29), the Lagrange equations, (2.31), imply (2.30), when properly formulated.

Significance: We saw in Theorem 2.38 that if C = R^k and if L = T − V on T(C) then Lagrange's equations, (2.31), are equivalent to Newton's equations. Therefore the vector field X in Theorem 2.40 gives the Newtonian flow on T(C) in this (linear) case.

Example 2.41 (Spherical pendulum again.) We saw in Example 2.33 how easy it was to write down the Lagrangian for the spherical pendulum:

L(q, v) = (1/2)m|v|2 −mg q3 (2.32)


But just try writing down Newton's equations, say in spherical coordinates! See John Hubbard's exposition of the spherical pendulum via both the Lagrangian and Hamiltonian approaches. It is available on the course website.

2.7 Hamiltonian mechanics

Let π : T^*(C) → C be the natural projection. So π_* : T(T^*(C)) → T(C). Let b be a point of T^*(C). We may write b = (q, p) where q ∈ C and p ∈ T^*_q(C). Although any point p ∈ T^*(C) is a point in T^*_q(C) for some unique point q, we will nevertheless be redundant and specify q also, as is customary in the physics literature. For any vector u ∈ T_b(T^*(C)) we have π_*u ∈ T_q(C). Hence ⟨p, π_*u⟩ is well defined and is linear in u. Thus the equation

α(u) = ⟨p, π_*u⟩ (2.33)

defines a one-form on T ∗(C). Let

ω = dα.

Lemma 2.42 If dim C = n then ω^n ≠ 0.

Proof. Aside from proving the assertion of the lemma, this proof is intended to give some insight into the two-form ω for future use. Choose a local coordinate system q_1, . . . , q_n in an open set U in C. A point p in T^*_q(C) may then be written uniquely in the form p = ∑_{j=1}^n p_j dq_j and consequently q_1, . . . , q_n, p_1, . . . , p_n form a coordinate system in π^{−1}(U). The vectors ∂/∂q_j, ∂/∂p_j, j = 1, . . . , n, form a basis of T_b(T^*(C)). Since π(q_1, . . . , q_n, p_1, . . . , p_n) = q_1, . . . , q_n we find

π_* ∂/∂p_j = 0

and

π_* ∂/∂q_j = ∂/∂q_j.

Be aware that ∂/∂q_j has a different meaning on the two sides of the last equation. It follows that

α⟨∑_{j=1}^n (a_j ∂/∂p_j + v_j ∂/∂q_j)⟩ = ∑_{j=1}^n p_j v_j. (2.34)


In other words, in terms of the local coordinates on π^{−1}(U) we have

α = ∑_{j=1}^n p_j dq_j. (2.35)

Hence

ω = ∑_{j=1}^n dp_j ∧ dq_j (2.36)

It now follows that

ω^n = n! (dp_1 ∧ dq_1) ∧ · · · ∧ (dp_n ∧ dq_n),

which is clearly nowhere zero.
The fundamental two-form ω sets up an isomorphism between T_b(T^*C) and T^*_b by means of the correspondence

T^*_b ∋ β ↦ z ∈ T_b if β⟨u⟩ = ω⟨u, z⟩ for all u ∈ T_b. (2.37)

The correspondence is an isomorphism because ω is nondegenerate.
Suppose then that H : T^*(C) → R is a smooth function. Then the correspondence (2.37) determines a vector field Y ≡ Y_H on T^*(C) from the one-form dH. Thus Y is the unique vector field on T^*(C) given by

⟨dH, u⟩ = ω⟨u, Y⟩   ∀ u ∈ T(T^*(C)) (2.38)

If ω were an inner product then it would be customary to write Y = ∇H. But in fact ω is a skew symmetric bilinear form. It is customary to write

Y = ∇_ω H (2.39)

The ω gradient of H is therefore defined by the identity

⟨dH, u⟩ = ω⟨u, ∇_ω H⟩   ∀ u ∈ T(T^*(C)). (2.40)
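In the local coordinates of Lemma 2.42 one can check directly from (2.36) and (2.40) (by pairing against ∂/∂q_i and ∂/∂p_i in turn) that

```latex
\nabla_\omega H
  = \sum_{j=1}^{n}\left(
      \frac{\partial H}{\partial p_j}\,\frac{\partial}{\partial q_j}
    - \frac{\partial H}{\partial q_j}\,\frac{\partial}{\partial p_j}\right),
% so the flow of \nabla_\omega H is given by Hamilton's equations
\dot q_j = \frac{\partial H}{\partial p_j}, \qquad
\dot p_j = -\,\frac{\partial H}{\partial q_j}, \qquad j = 1,\dots,n.
```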

We intend to show that if one chooses

H = T + V (2.41)

then the flow in T^*(C) determined by the ω-gradient ∇_ω H on T^*(C) "is" the Newtonian flow. However, since the kinetic energy T has not yet been defined on T^*(C), and the vector field X_L of the previous section, denoted


simply X there, is also not defined on T^*(C), it behooves us to explain the meaning of the word "is". To this end we will show that (under suitable conditions) the Lagrangian L itself sets up an isomorphism between T_q(C) and T^*_q(C), which extends to a diffeomorphism of T(C) with T^*(C). This diffeomorphism, in turn, interchanges X_L with ∇_ω H. In this sense the flow of ∇_ω H on T^*(C) is equivalent to the flow of X_L on T(C), which in turn is equivalent to the Newtonian flow when this statement makes sense, namely C = R^k (see Theorem 2.38) or C = configuration space for some constrained system in R^k (see the spherical pendulum in Examples 2.33 and 2.41).

It seems well worth pointing out that, quite aside from the connections with Lagrangian and Newtonian mechanics, the theory of flows over such a cotangent space T^*(C), as well as geometrical questions and applications to other parts of mathematics, is a highly developed subject in itself. In fact any even dimensional manifold carrying a symplectic form, analogous to ω, already has a rich structure that has been explored by some of your classmates.

2.7.1 The Legendre transform

Let Y be a finite dimensional real vector space. Denote by Y^* its dual space. We want to consider a class of functions on Y which will capture the typical velocity dependence of a Lagrangian L(q, v) at a fixed point q, in the presence of velocity dependent forces. Y should be interpreted as the tangent space to configuration space at the point q.

For a smooth function f : Y → R its derivative f′(v) ∈ Y^* is defined as usual, for each v ∈ Y, by the prescription

f′(v)⟨w⟩ = (d/dt) f(v + tw)|_{t=0}   for w ∈ Y. (2.42)

We will use brackets 〈w〉 to emphasize that the argument is linear in w.

Definition 2.43 (Legendre transform) Suppose that f : Y → R is a smooth function and that the map

Y ∋ v ↦ f′(v) ∈ Y^* (2.43)

is one-to-one and onto. Define a function

f^* : Y^* → R (2.44)


by the prescription

f^*(p) = ⟨p, v⟩ − f(v),   p ∈ Y^*, (2.45)

where v is the unique solution to the equation

p = f ′(v). (2.46)

f^* is called the Legendre transform of f.

Example 2.44 Take Y = R^n with its standard inner product. Let

f(v) = (1/2)m|v|^2. (2.47)

Then f′(v)⟨w⟩ = (mv, w). Upon identifying Y with Y^* via the inner product we may therefore write f′(v) = mv. Hence the map v ↦ f′(v) is one-to-one and onto. Solving the equation p = f′(v) for v in terms of p gives v = p/m. Therefore

f^*(p) = (p, p/m) − (m/2)|p/m|^2 = (1/2m)|p|^2 (2.48)

Example 2.45 Take Y = R^n with its standard inner product again. Let A be a vector in Y^*. Define

f(v) = (1/2)m|v|^2 + ⟨A, v⟩ (2.49)

Then

f^*(p) = (1/2m)|p − A|^2 (2.50)

Proof. f′(v)⟨w⟩ = (mv, w) + ⟨A, w⟩. Hence

f′(v) = mv + A (2.51)

where mv again denotes the element of Y^* gotten by identifying Y with Y^* as in the preceding example. Thus, solving the equation p = f′(v) = mv + A we find v = (p − A)/m. Therefore

f^*(p) = ⟨p, v⟩ − f(v)

       = ⟨p, (p − A)/m⟩ − ((m/2)|(p − A)/m|^2 + ⟨A, (p − A)/m⟩)

       = (1/2m)|p − A|^2

It is this example that is responsible for forcing connections on vector bundles into quantum mechanics in the presence of velocity dependent forces.


Example 2.46 Take Y = R and define f(v) = v^2 + v^4. Then f′(v)⟨w⟩ = (2v + 4v^3)w. That is, f′(v) = 2v + 4v^3. The range of this function is clearly all of R. Moreover f′′(v) = 2 + 12v^2 ≥ 2. Hence f′ is one-to-one and onto. Its Legendre transform is therefore well defined.
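Since no closed form is offered for this last transform, here is a numerical sketch of Definition 2.43 in one dimension (the function `legendre` and its bisection bracket are our own choices): given p, solve p = f′(v) by bisection (legitimate here because f′ is strictly increasing in both examples) and return ⟨p, v⟩ − f(v).

```python
def legendre(f, fprime, p, lo=-1e6, hi=1e6, tol=1e-12):
    """f*(p) = p*v - f(v), where v solves p = f'(v) (f' assumed increasing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fprime(mid) < p:
            lo = mid
        else:
            hi = mid
    v = 0.5 * (lo + hi)
    return p * v - f(v)

m = 3.0
# Example 2.44 (n = 1): f(v) = (1/2) m v^2 has f*(p) = p^2 / (2m).
assert abs(legendre(lambda v: 0.5*m*v*v, lambda v: m*v, 1.5) - 1.5**2/(2*m)) < 1e-9
# Example 2.46: f(v) = v^2 + v^4, f'(v) = 2v + 4v^3; no simple closed form.
val = legendre(lambda v: v*v + v**4, lambda v: 2*v + 4*v**3, 2.0)
print(round(val, 4))
```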

Remark 2.47 As a cultural matter you should be aware that Definition 2.43, which assumes that the map Y ∋ v ↦ f′(v) ∈ Y^* is one-to-one and onto, can be replaced by a very clean and general definition of the Legendre transform for convex functions f which need not be differentiable, nor even defined on all of Y, but only on some convex subset of Y. The Legendre transform is then another such function and one has the nice theorem that f^{**} = f. Such an extension of our present discussion is very useful in thermodynamics and other parts of mathematics, but is not needed by us for mechanics. A more careful survey of this extension is outlined in Appendix 9.1. Notice, by the way, that the previous three examples are all convex.

2.7.2 From Lagrange to Hamilton via the Legendre transform

Suppose that L : T(C) → R is a smooth function satisfying the nondegeneracy condition of Theorem 2.40: v ↦ L(q, v) has no critical points on T_q(C). Although this is all that's needed for the basic theory, we will assume more. Namely, we assume that for each q ∈ C the function v ↦ L(q, v) is quadratic plus linear and, furthermore, for each element q ∈ C the function T_q ∋ v ↦ L(q, v) has a nonsingular quadratic part, as in Section 9.1.1. In all of the examples that we've looked at so far the operator M that appears in Section 9.1.1 is positive definite, since it just comes from a kinetic energy. When we introduce electromagnetic forces later we will have to add on a linear term in v also. We are now ready to define the Hamiltonian function on T^*(C). It is, for each q ∈ C, the Legendre transform of the Lagrangian in the velocity variable. That is, if L_q = L|_{T_q(C)} then

H(q, p) = (Lq)∗(p). (2.52)

H is a function on T^*(C). This is the function to which we want to apply the theory of Section 2.7.
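For the classical case L(q, v) = T(v) − V(q) with T as in (2.18), this definition reproduces the total energy, in agreement with Example 2.44 and the intended choice (2.41):

```latex
p_j = \frac{\partial L}{\partial v_j} = m_j v_j
\quad\Longrightarrow\quad
H(q,p) = \sum_{j=1}^{k} p_j v_j - L(q,v)
       = \sum_{j=1}^{k} \frac{p_j^2}{2 m_j} + V(q) = T + V .
```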

The relation between the Newtonian, Lagrangian and Hamiltonian approaches to classical mechanics is developed further in John Hubbard's notes. These are now available on the course website. See also Goldstein, "Classical Mechanics" [23], for the standard approach used by physicists.


2.8 SUMMARY

Newton               Lagrange                            Hamilton
Forces and masses    L : T(C) → R                        H : T^*(C) → R
Second order equ.    First order system                  First order system
F = ma               L_X θ_L − dL = 0                    ω(Y, ∇_ω H) = (dH)(Y)
Local versions:      (d/dt)[∂L/∂v] − ∂L/∂q = 0           q̇_i = ∂H/∂p_i,  ṗ_i = −∂H/∂q_i
Newton's equations   Lagrange's equations                Hamilton's equations

Table 1: Newton to Lagrange to Hamilton.

We saw that all the Lagrangians L = T − V on T(C) in our examples have the property that the restriction, L_q, of L to T_q is a positive quadratic function (the kinetic energy) plus a constant (−V(q)) and is therefore smooth and strictly convex on the finite dimensional linear space T_q(C). Later we will add on a linear term to incorporate electromagnetic forces. The Lagrangian therefore sets up a diffeomorphism (actually an affine map) φ_q between T_q and T^*_q for each q ∈ C, in accordance with Section 9.1.1. Putting all these fiber maps together yields a diffeomorphism between T(C) and T^*(C). For each point q ∈ C the Hamiltonian function H(q, ·) is the conjugate function to L_q, as defined in (2.52).

See also John Hubbard’s notes on these topics. These are nowavailable on the course website.

We owe a debt of gratitude to Roland Roeder, who gave the initial lectures on the Lagrangian and Hamiltonian approaches in the 2003 manifestation of this seminar.

END of DAY 6, 2/10/2011

2.9 References

References:
V. I. Arnold, Mathematical methods of classical mechanics. [4]
Cushman and Bates, Global aspects of classical integrable systems. [14]
Gerald Folland, Quantum field theory: A tourist guide for mathematicians. [21] This book contains a chapter on classical mechanics, aimed at preparing the reader for quantum mechanics.
Goldstein, Classical mechanics. [23]
Brian Hall, An introduction to quantum theory for mathematicians. [28] This book contains a chapter on classical mechanics, aimed at preparing the reader for quantum mechanics. An updated version of this book can be downloaded from the 7120 website.
Jose and Saletan, Classical dynamics: A contemporary approach. [39]
Marsden and Ratiu, Introduction to mechanics and symmetry: A basic exposition of classical mechanical systems. [43]

2.10 Solutions to problems

Exercise 2.16. Solution was presented by Peter Luthy.

Exercise 2.23 Solution presented by Alex Fok

1. Show that M is a nonnegative symmetric operator.

2. Show that M is invertible iff the body does not lie in a line through the origin. (I.e. µ is not supported in such a line.)

Solution:

(a) It suffices to show that (Mω, ν) = (Mν, ω) and (Mω, ω) ≥ 0 for any ω ∈ R^3. Note that

(Mω, ν) = ∫_{R^3} (x × (ω × x), ν) dµ(x)

        = ∫_{R^3} (ν × x, ω × x) dµ(x)

        = ∫_{R^3} (x × (ν × x), ω) dµ(x)

        = (Mν, ω)

(Mω, ω) = ∫_{R^3} ‖ω × x‖^2 dµ(x) ≥ 0

Thus M is a nonnegative symmetric operator.


(b) Since M is a symmetric operator, M is not invertible iff there exists ω_0 ≠ 0 such that (Mω_0, ω_0) = 0. This is equivalent to saying that

∫_{R^3} ‖ω_0 × x‖^2 dµ(x) = 0,

which is equivalent to saying that ω_0 × x = 0 for all x ∈ supp(µ), because of continuity in x of the integrand. But

ω_0 × x = 0 for all x ∈ supp(µ)
⟺ x ∈ Rω_0 for all x ∈ supp(µ)
⟺ supp(µ) ⊂ Rω_0
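Both parts of (a) are easy to test numerically for a discrete mass distribution µ = ∑_i m_i δ_{x_i}, for which Mω = ∑_i m_i x_i × (ω × x_i). The sample masses, points, and test vectors below are arbitrary choices of ours.

```python
def cross(a, b):
    """Cross product in R^3."""
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def M(omega, masses, points):
    """Moment of inertia operator for mu = sum_i m_i delta_{x_i}."""
    out = (0.0, 0.0, 0.0)
    for mi, x in zip(masses, points):
        t = cross(x, cross(omega, x))
        out = tuple(o + mi * ti for o, ti in zip(out, t))
    return out

masses = (1.0, 2.0, 0.5)
points = ((1.0, 0.0, 0.2), (0.3, -1.0, 0.5), (0.0, 0.7, -0.4))
w, n = (0.2, -1.1, 0.4), (1.3, 0.5, -0.7)
print(abs(dot(M(w, masses, points), n) - dot(M(n, masses, points), w)) < 1e-12)  # True: symmetric
print(dot(M(w, masses, points), w) >= 0.0)  # True: nonnegative
```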


3 Electricity and Magnetism

3.1 Lodestones and amber from antiquity till 1600

History: Ref. Duane Roller, "The de Magnete of William Gilbert" (1959).
Lodestones were known at least as far back as 900 BC (Homer mentioned them). Aristotle, Plato, Pliny and the usual gang made reference to them in passing, in a way that made it clear that the reader was expected to be familiar with them. There were various theories about how they worked.

One of the early theories regarded the material attracted by the amber as "food for amber".

But the two most long lasting theories were the following.
Theory Number 1. A "sympathy" exists between the lodestone and the iron that it attracts.
Theory Number 2. The lodestone emits some stuff that removes air from around it. A piece of iron nearby will then fall into the empty space.
The first theory is anthropomorphic while the second theory in effect rejects action at a distance.

Some highlights:
1. Lucretius [c. 60 BC] describes Theory Number 2 for lodestones thus. It is from his poem "On the nature of things". [35, p464]

First, from this stone there must stream a shoal of seeds in a current
Driving away with its blows all the air 'twixt itself and the iron.
Soon as this space is made void and a vacuum fashioned between them,
Instantly into that void the atoms of iron tumble headlong
All in a lump; thus the ring itself moves bodily forward.

2. Plutarch [c. 50 AD]. Here is Plutarch's version of Theory Number 2 for amber. [35, p465]

In amber there is a flammeous and spirituous nature,
and this by rubbing on the surface is emitted by hidden passages,
and does the same that lodestone does.

3. Not much progress in understanding or using lodestones till c. 1200, when compasses were invented in China and the technology gradually spread to Europe. However, many medical and social applications were found for lodestone. One needed only to tie a piece of lodestone onto a diseased part of the body. This was particularly advised for curing gout and epilepsy. Here are some other known recommendations.

4. Marbode (11th century). To determine whether a wife is chaste or unchaste, apply a lodestone to her head. If she is unchaste she will fall out of bed. [50, p27]

5. St. Hildegard of Bingen (12th century). To cure insanity just tie on a lodestone and sing a suitable incantation. [50, p28]

6. Unidentified source, Ithaca, NY (21st century).
a. To cure carpal tunnel syndrome sleep with a magnet bound to your wrist.
b. To cure back pains sleep on a pad filled with magnets.

1600 AD. William Gilbert published the first organized hard data about lodestone-like and amber-like objects in 1600. He emphasized that magnetic phenomena and electric phenomena were different. He presented hard experimental data for both effects. The data was not quantitative. But he listed many more amber-like materials than were previously known (which he named "electrics"). He had invented a very sensitive detection device - a versorium (a needle carefully balanced at its center and allowed to rotate under the influence of weak forces). He had his own theory to explain these phenomena, the "effluvia theory". (Effluvia means emanations.) This was a modification of Theory Number 2 above. Here is Gilbert's own theory presented in his own words. [35, p468]

The effluvia spread in all directions: they are specific
and peculiar, and, sui generis, different from the common air;
generated from humor; called forth by calorific motion and rubbing,
and attenuation; they are as it were material rods - hold and take
up straws, chaff, twiggs, till their force is spent or vanishes;
and then these small bodies, being set free again, are attracted
by the earth itself and fall to the ground.

Some other events before electricity and magnetism really got off the ground:
1632-1642: Galileo is kept under house arrest.
1687: Newton publishes "Principia".


END of DAY 7 = 2/15/2011

3.2 Production, transfer and storage of electrostatic charge.

In order to do experiments with electricity, whatever that is, it's really handy to have a goodly supply of the stuff under your control. That means that you have to be able to produce it, store it, and transfer it to a useful place. You know what transfer means - conduction of electricity. The devices invented between 1700 and 1750 for these purposes were not suitable for quantitative measurements. But they paved the way. The inventors of these devices were concerned with the stuff (electrostatic charge) that pulls or pushes little pieces of this or that.

3.2.1 Production of electrostatic charge, 1704-1713. Hauksbee

I'm afraid that the hair combing method that we used in class to produce electrostatic charge, and which we successfully detected with our own electroscope, is not as useful for further experiments as one might hope. Francis Hauksbee (1666-1713) constructed a very efficient rubbing device for making charge. It was a glass globe on a horizontal axis with a crank at one end that you could turn, while holding something in contact with the glass globe, such as some silk, or leather, or your own hand. His most advanced model (version 1710.3) was constructed around 1710. (When you have free time you might try to identify all the objects in Figure 1.) He used this to make many observations of how charged, and sometimes uncharged, materials attract or repel. For example uncharged strings, held at one end, would all align themselves radially so as to point toward a charged sphere at the center. He also showed how two such spheres, one charged, produce a mysterious green light, first observed in mercury barometers. It was actually this mysterious light that first turned him on to these investigations. This "electric machine", as he called it, was used by generations of experimenters. See [49] Chapter 3 if you would like to know more about what he said he did and why he did it. He himself attempted to explain his observations in his 45 published papers. But, sadly, his extension of Gilbert's "effluvia" theory did not stand the test of time.


Figure 1: Hauksbee’s electrostatic generator

3.2.2 Transfer of electrostatic charge, 1731. Gray and Dufay

Some materials conduct the "electrical fire" quite readily and some don't - conductors and nonconductors, as we call them nowadays. It was Gray and Dufay who discovered this in 1731. Moreover they accumulated enough real data to enable Dufay to propose a reasonable theory of what charge is. It was called the two fluid theory, which captures some of our present theory of positive and negative charge. Of course no such proposal about things you can't see is likely to be accepted right away by any scientific community. Our own Ben Franklin later proposed a "one fluid" theory. This was before he got involved with politics and revolution. Physicists took sides on the "n-fluid" issue, of course. Coulomb, for example, after surviving the French revolution, argued for the two fluid theory.

3.2.3 Storage of electrostatic charge, 1746. Musschenbroek of Leyden.

The accidental discovery by Musschenbroek of Leyden, in 1746, of a method of storing large quantities of the "electrical fire" changed the landscape in electrical science.

We tend to think of the development of science in the seventeenth and eighteenth centuries as proceeding at a slow "gentlemanly" pace. The reason for this view is that the really significant developments, the ones that survived the test of time, were few and far between; the lesser supporting discoveries are not even brought to our attention, in spite of the fact that their discoverers were scrambling to discover, and their lesser discoveries were influential on those who finally put the pieces together.

The events leading up to and immediately following the accidental discovery of the Leyden jar are particularly fun to read about because the general public was getting into the swing of frontline research on electricity. University lectures on electricity were attended by the general public, even to the extent that the registered students couldn't find seats. (Paris, I suppose.) Theaters offered kisses from pretty (electrically charged) women swinging from wires on stage, a shocking experience for willing members of the audience.

In the 1750s a home without a charge generator prominently displayed on a coffee table could not claim to be cultured. More on this may be found in Chapter 6 of Roller's book [49] and in [35, Section 26.7].

Within months of Musschenbroek's discovery the news traveled across the Atlantic Ocean to Ben Franklin. He immediately began his own experiments with stored charge. He constructed a version of the Leyden jar which produced very big sparks. By November, 1747, just in time for a feast, he was able to kill a turkey.

Before you read what Musschenbroek actually did you might like to read the following extract from the popular writings of Henry Smith Williams (1863-1943), describing other effects of the discovery of the Leiden jar on public entertainment. This was written about 1904 in Harper's Magazine. For more by this author see
http://www.worldwideschool.org/library/books/sci/history/AHistoryofScienceVolumeII/chap49.html

"The advent of the Leyden jar, which made it possible to produce strong electrical discharges from a small and comparatively simple device, was followed by more spectacular demonstrations of various kinds all over Europe. These exhibitions aroused the interest of the kings and noblemen, so that electricity no longer remained a 'plaything of the philosophers' alone, but of kings as well. A favorite demonstration was that of sending the electrical discharge through long lines of soldiers linked together by pieces of wire, the discharge causing them to 'spring into the air simultaneously' in a most astonishing manner. A certain monk in Paris prepared a most elaborate series of demonstrations for the amusement of the king, among other things linking together an entire regiment of nine hundred men, causing them to perform simultaneous springs and contortions in a manner most amusing to the royal guests¹. But not all the experiments being made were of a purely spectacular character, although most of them accomplished little except in a negative way. The famous Abbe Nollet, for example, combined useful experiments with spectacular demonstrations, thus keeping up popular interest while aiding the cause of scientific electricity."

¹Perhaps this gives the reader some insight into the origin of the French Revolution.


1746: Musschenbroek of Leyden discovers the Leyden jar.
The accidental discovery of a method for storing (what we now call) charge was a shocking experience for its discoverer, Peter van Musschenbroek of Leyden. For the benefit of readers with a slightly sadistic streak I'm going to excerpt below the relevant portion of Roller's book on the development of the concept of electric charge. [49, p52]

In order to understand what Musschenbroek did it would be good to understand, at an intuitive level, how a condenser (≡ capacitor) works.

HOW A CAPACITOR WORKS: Envision a 3 volt battery with a wire sticking out of the negative side. (That's the casing.) The battery tries to push electrons out along this wire. But they have no place to go. A little more precisely, a few electrons do get pushed out along the wire, but since the electrons repel each other they will push back until no more electrons can get pushed onto the wire by the battery. (If it were a nine volt battery a few more electrons would be pushed onto the wire.) Similarly a wire attached to the positive end of the battery (that's the little dimple at the other end of the battery) will try to pull electrons into the battery. But once a few electrons are pulled out of the wire the remaining positive ions in the wire pull back on the electrons, stopping the very little current flow. Suppose now that one attaches these two wires to two big identical flat metal plates (say 5 inch by 5 inch squares) and puts the two plates parallel to each other with a thin piece of glass in between. If the glass were not there the two plates would be in contact all along their 25 square inch surface and current would flow. I.e. electrons would move from the negative side of the battery, through the plates, and into the positive side of the battery. But with the glass separating the two plates what actually happens is this: a few electrons get pushed onto plate A from the negative side of the battery and a few electrons get pulled off plate B into the positive end of the battery. But the extra electrons on plate A are very close to the positive ions on plate B and consequently each (partly) neutralizes the push or pull of the other, thereby allowing more electrons to get pushed onto plate A and more electrons to be pulled off plate B. The result is that some significant current flows out of the battery, but only for a very short time, until the charge built up on plate A, even after being partly neutralized by the positive ions on plate B, pushes back on new incoming electrons with a "force" of 3 volts. If you now cut the two wires the negative charges (the electrons) are trapped on plate A while the positive charges (the ions) are trapped on plate B. You have now


STORED CHARGE. Such an arrangement of plates is called a condenser, or equivalently, a capacitor. This is what Musschenbroek discovered. If now you separate the two plates the positive and negative charges no longer neutralize each other. The electrons on plate A now want to get off plate A very badly (yes, this is anthropomorphism). A voltmeter attached between plate A and plate B would have shown a push of 3 volts when the plates were near. But after being separated there is a much higher push. The voltmeter will show a much higher voltage between plate A and plate B than 3 volts. (Keep this in mind.) The arrangement of plates can be varied. For example if the piece of glass is actually a jar one could replace one of the plates by water in the jar (best to add a little salt to make it a good conductor) and one could replace the other plate by one's own hand (which is indeed a good conductor), in which one is holding the jar.
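To put rough numbers on this description (the plate size comes from the text; the 1 mm glass thickness, its relative permittivity ≈ 7, and the 1 cm final separation are our own assumptions), the parallel-plate formula C = ε_r ε_0 A/d gives:

```python
# Rough numbers for the parallel-plate condenser described above.
eps0 = 8.854e-12                # vacuum permittivity, F/m
A = (5 * 0.0254) ** 2           # plate area in m^2 (5 inches = 0.127 m)

C_glass = 7 * eps0 * A / 1e-3   # plates separated by 1 mm of glass (eps_r ~ 7)
Q = C_glass * 3.0               # charge stored by the 3 volt battery
C_apart = eps0 * A / 1e-2       # plates pulled 1 cm apart in air

print(round(C_glass * 1e9, 2), "nF")   # ~1 nF
print(round(Q / C_apart), "V")         # the same charge now reads ~210 V
```

This is exactly the effect described in the text: the trapped charge Q is fixed, so when the capacitance drops the voltage V = Q/C shoots up.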

Here is Pieter van Musschenbroek’s letter (of January 1746) to his friend J.A. Nollet, who included part of it in a paper of his own, published in the Memoires of the French Academy early in 1746. (Pretty short publication time, isn’t it?) The following is from [49], page 52.

“I am going to tell you about a new but terrible experiment which I advise you not to try for yourself... I was making some investigations on the force of electricity. For this purpose I had suspended by two threads of blue silk, a gun barrel, which received by communication the electricity of a glass globe that was turned rapidly on its axis while it was rubbed by the hands placed against it. From the other end of the gun barrel there hung freely a brass wire, the end of which passed into a glass flask, partly filled with water. This flask I held in my right hand, while with my left hand I attempted to draw sparks from the gun barrel. Suddenly my right hand was struck so violently that all my body was affected as if it had been struck by lightning ... The arm and all the body are affected in a terrible way that I cannot describe; in a word, I thought that it was all up with me ... ”

Pieter van Musschenbroek nevertheless didn’t give up these experiments. He went on to describe to Nollet variations of this experiment, in which, for example, he placed the jar on a metal plate sitting on a wooden table, while the experimenter stood on the wooden floor when drawing the spark. There was very little effect in this case unless the experimenter touched the metal plate at the same time. Further, if one person holds the jar while another draws the spark (by touching the gun barrel with hand or metal rod) then the shock is very small. Thus he writes


“The person who tries the experiment may simply stand on the floor, but it is important that the same man hold the flask in one hand and try to draw the spark with the other: the effect is very slight if these actions are performed by two different persons. If the flask is placed on a metal support on a wooden table, then the one who touches this metal even with the end of his finger and draws the spark with his other hand receives a great shock.”

For videos of a Leyden jar in action go to
http://www.magnet.fsu.edu/education/tutorials/java/electrostaticgenerator/index.html
http://www.magnet.fsu.edu/education/tutorials/java/leydenjar/index.html

The first shows how to charge up a Leyden jar. The second shows how to discharge it. Both are interactive. Be careful.

SUMMARY

Figure 2: Production - Transfer - Storage of charge

These discoveries of the first half of the 18th century were qualitative: if you do this you will get more of that. (E.g. if you rotate this faster you will get more attraction or repulsion and bigger sparks. If you connect a wire the attraction or repulsion will show up at the other end. If you make a capacitor (Leyden jar) you can store lots and lots of the electrical fire (charge).)

But none of this is quantitative. What can one measure anyway? What about spark length? Daniel Gralath made many such measurements of spark length (about 1770) but was unable to embed his measurements into any consistent theory. Ideas on what one might usefully measure evolved. See [35, Chapter 26] on the evolution of ideas from 1760 onward that finally culminated in the measurement of FORCE between charged particles, by Coulomb, which, for the first time, put electrical science on a firm quantitative footing.

A small timeline of this period.
1776: The Colonies revolt against King George.
1785: Coulomb measures the force between charges.
1789: The French revolution. [Coulomb survived the French Revolution.]

3.3 Quantitative measurement begins: COULOMB

CHARLES AUGUSTIN COULOMB (1736-1806) invented the torsion balance, a very sensitive instrument for measuring small forces. He had already used it for measuring frictional forces when, in 1785, he adapted it to the measurement of forces between charged particles.

3.3.1 How Coulomb did it.

There are available now on the web pictures of his instruments and explanations of how they worked. The University of Pavia website

http://ppp.unipv.it/Coulomb/

contains pictures of his actual instruments, modern versions, a biography of Coulomb, copies of his original articles (in French), translations into English, and finally an interactive version in which you can start his measurement yourself in dry air, normal air or wet air and see what happens. The URL for the latter is

http://ppp.unipv.it/Coulomb/Pages/e5StrFrm.htm

Upon arriving at this website click on the image of Coulomb’s torsion balance on the left side of the page.

Here is the first quantitative result in the theory of electricity and magnetism, Coulomb’s discovery. The following force between two charges at distance r was actually measured by Coulomb in the case q1 = q2. The formula itself can then be used to define units of charge in terms of already established units of force.

|F| = q1q2/r²    Coulomb’s law. (3.1)
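In Gaussian (CGS) units the constant of proportionality in (3.1) is exactly 1, so two unit charges (statcoulombs) one centimeter apart repel with a force of one dyne; this is how the formula can define the unit of charge. A small numerical sketch (the function name is ours, not the text’s):

```python
def coulomb_force(q1, q2, r):
    """|F| = q1*q2 / r**2, Gaussian units: charges in esu, r in cm, F in dynes."""
    return q1 * q2 / r ** 2

print(coulomb_force(1.0, 1.0, 1.0))   # 1.0: unit charges 1 cm apart, 1 dyne
print(coulomb_force(2.0, 3.0, 2.0))   # 1.5
```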

Coulomb also verified accurately that an earlier measurement by Michell was correct: magnetic poles also attract or repel by an inverse square law.

3.3.2 Mathematical consequences

In the context of Newtonian mechanics Lagrange had shown (1777) that the force exerted on a massive body by several other massive bodies could be expressed in the form F = −grad V for some function V. [We called V the potential of this force field in the earlier chapter. The name “potential” was given to it by George Green in 1828.] Laplace then showed (1782) that the function V satisfies the equation

∆V = 0 Laplace’s equation in empty space, [1782]. (3.2)

But after the discovery by Coulomb that the force on charges has a similar form to the force on masses [the 1/r² law], Simeon Denis Poisson (1781-1840) showed that the corresponding potential V satisfies, even in a charged region,

∆V = −4πρ Poisson’s Equation, [1812] (3.3)


To be precise:

Theorem 3.1 Let ρ be a distribution on R3 with compact support. Define

V(x) = ∫_{R³} ρ(y)/|x − y| dy (3.4)

Then (3.3) holds in the sense of distributions.

Proof. See Appendix 9.2 for a complete proof.
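One can at least spot-check the theorem numerically: away from the support of ρ, the potential of a unit point charge is V(x) = 1/|x|, and (3.3) reduces there to Laplace’s equation (3.2). A finite-difference sketch (the step size h and the stencil are our choices, not from the text):

```python
def V(x, y, z):
    # potential of a unit point charge at the origin, as in (3.4)
    return 1.0 / (x * x + y * y + z * z) ** 0.5

def laplacian(f, x, y, z, h=1e-3):
    # standard 7-point second-order stencil for Delta f at (x, y, z)
    return (f(x + h, y, z) + f(x - h, y, z)
            + f(x, y + h, z) + f(x, y - h, z)
            + f(x, y, z + h) + f(x, y, z - h) - 6 * f(x, y, z)) / h ** 2

print(abs(laplacian(V, 1.0, 2.0, 2.0)))   # ~0: V is harmonic away from the charge
```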

Let us jump ahead in time and interpret Coulomb’s law from the point of view that was later pushed by Faraday and Maxwell and is now generally accepted. It is field theory. One thinks of the charge q2 at the point y as generating an electric field E : R³ → R³ given by

E(x) = q2 (x − y)/|x − y|³ (3.5)

Then (3.1) says that a charged particle at x of size q1 experiences a force F = q1E(x). This equation clearly gives the direction of the force correctly as well as its magnitude. If you wish to give the vector field E some further physical meaning by thinking of it as measuring a stressed state of some kind of “aether” you will be in the company of many great physicists of the nineteenth century. But the implications of such a viewpoint disagree with later experiments (e.g. the Michelson-Morley experiment).

Now it is an assumption of linearity (to be checked by experiment) that the force on a charged particle at x produced by a number of charges is the sum of their separate forces. That is,

E(x) = ∑_{j=1}^{N} qj (x − yj)/|x − yj|³. (3.6)
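The superposition formula (3.6) is easy to compute with directly. A sketch (names and the test configuration are ours):

```python
def field(x, charges):
    """E at x from point charges, per (3.6); charges is a list of (q, y)."""
    E = [0.0, 0.0, 0.0]
    for q, y in charges:
        d = [x[i] - y[i] for i in range(3)]
        r3 = sum(c * c for c in d) ** 1.5
        for i in range(3):
            E[i] += q * d[i] / r3
    return tuple(E)

# a single charge reproduces (3.5); two equal charges cancel at the midpoint
print(field((0.0, 0.0, 2.0), [(1.0, (0, 0, 0))]))                     # (0, 0, 0.25)
print(field((0.0, 0.0, 0.0), [(1.0, (0, 0, 1)), (1.0, (0, 0, -1))])) # (0, 0, 0)
```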

For a continuously distributed charge of density ρ the analog (and indeed Riemann limit) is

E(x) = ∫_{R³} (x − y)/|x − y|³ ρ(y) dy (3.7)

The force exerted by this distribution of charge on a small “test” charge at x and of charge q is then

F = qE(x) (3.8)


Interchanging the gradient operator with the integral in (3.4) we see that E(x) = −grad V(x). Since ∆ = div grad, Poisson’s equation (3.3) immediately gives the following formula.

Theorem 3.2 (Gauss, 1777-1855)

div E = 4πρ Gauss’ law (1837) (3.9)

Equations (3.9) and (3.7) are equivalent in the sense that the only solution of (3.9) (that vanishes at infinity) is given by (3.7). Thus Gauss’ law is exactly the differential version of Coulomb’s law (3.1).

We are going to formulate all of the measured force laws as differential equations in this way in order to reach Maxwell’s theory.

END of DAY 8 = 2/17/2011

3.4 The production of steady currents, 1780-1800

So far, all the experimental results and theories that we have discussed deal with charges sitting still - so-called electrostatics. Never mind that there was movement of charge in sparks. These discharges couldn’t be controlled anyway. The next stage in the understanding of electrical phenomena began with yet another accident, culminating in Volta’s discovery of the battery, which could be used to produce a steady movement of charge - a current. As you may have heard somewhere, the ability to produce controlled currents was the vital ingredient in making the connection between electricity and magnetism and between electricity and chemistry.

1780: Luigi Galvani was an anatomist with an interest in electrical phenomena. For many years before the event that made him famous he had been studying the susceptibility of the nerves to irritation; and having been formerly a student of Beccaria, he was also interested in electrical experiments. In the latter part of the year 1780 he had, as he tells us, ‘dissected and prepared a frog, and laid it on a table, on which, at some distance from the frog, was an electric machine. It happened by chance that one of my assistants touched the inner crural nerve of the frog, with the point of a scalpel; whereupon at once the muscles of the limbs were violently convulsed.


‘Another of those who used to help me in electrical experiments thought he had noticed that at this instant a spark was drawn from the conductor of the machine. I myself was at the time occupied with a totally different matter; but when he drew my attention to this, I greatly desired to try it for myself, and discover its hidden principle. So I, too, touched one or other of the crural nerves with the point of the scalpel, at the same time that one of those present drew a spark; and the same phenomenon was repeated exactly as before.’

The preceding is taken from Whittaker [67, pp. 67-68], which contains the following footnote: {According to a story which has often been repeated, but which rests on no sufficient evidence, the frog was one of a number which had been procured for the Signora Galvani who, being in poor health, had been recommended to take a soup made of these animals as a restorative.}

The question whether the electrical effects discovered by Galvani were of animal origin or not sharply divided physicists. Alexander von Humboldt supported the animal-origin viewpoint and did experiments on himself to prove it. Unfortunately they didn’t work. Here is an excerpt from Humboldt’s description of his experiment on himself:

“I raised two blisters on my back, each the size of a crown piece and covering the trapezius and deltoid muscles respectively. Meanwhile I lay flat on my stomach. When the blisters were cut and contact was made with the zinc and silver electrodes, I experienced a sharp pain, which was so severe that the trapezius muscle swelled considerably, and the quivering was communicated upwards to the base of the skull and the spinous processes of the vertebra. Contact with silver produced three or four single throbbings which I could clearly separate. Frogs placed upon my back were observed to hop.”

A reader interested in gore can read the rest of his description in [8, pages 32-33]. The next few pages in this book contain interesting gossip about Goethe and Schiller.

1793-1800: Volta was one of the physicists who believed that the effect discovered by Galvani had nothing to do with living matter, but depended somehow only on the contact of two different metals. By 1800 he had discovered how to increase this effect. Whereas two discs, one copper and one zinc, placed in contact, were known to produce a small but detectable effect on an electroscope, a “pile” of these could produce a shock. (Pile: a repeating sequence of discs: copper, zinc, moistened pasteboard, copper, zinc, moistened pasteboard, etc.)

The question whether the shocks from Volta’s battery were of the same electrical origin as had been explored so intensely in the preceding 100 years was settled by Volta thus:

1) Water could be decomposed by currents produced the old way (rubbing amber, storing the charge in a Leyden jar, etc.) or by Volta’s battery.

2) A Leyden jar could be charged up by Volta’s battery with similarshocking effect.

3) Electricities produced by opposite ends of a pile attract, while electricities produced by corresponding ends of different piles repel.

Conclusion (1801): The electricity produced by Volta’s piles is the same stuff that’s produced by rubbing two materials together.

3.5 The connection between electricity and magnetism

What happened next really illustrates well how discoveries depend on previous discoveries. In the years after Volta invented the pile, the availability of a steady source of current made possible many discoveries concerning the relation between electrical charge and chemical reactions. But of interest for us was the accidental discovery in 1820 by Oersted that a current produces a magnetic field. As the story goes, he was showing his class that a current does not affect a compass. He had aligned the wire carrying the current perpendicular to the compass. No effect. A student suggested aligning the wire parallel to the compass. Big effect! The compass needle moved quickly to a perpendicular position. (Interesting story if true.) Here is a timeline of the following cross-channel communications.

3.5.1 Oersted, Ampere, Biot and Savart

1820: Oersted, Biot, Savart, Ampere. Immediately after the discovery by Oersted (July 1820) that a wire carrying current can influence a compass, Biot and Savart and independently Ampere made careful measurements to determine the law of force of a current on a magnetic pole and (Ampere) on another current-carrying loop. The dates and fast reaction of Biot and Savart and Ampere to Oersted’s discovery are interesting because they show how intense the competition was. A physicist by the name of Arago came back from a meeting in London and described at a meeting in Paris, on September 11, 1820, the results that Oersted had discovered. A week later, on September 18, Ampere presented his ideas on how to quantify Oersted’s discovery. On October 30 Biot and Savart gave their formula for the magnetic field produced by a wire carrying current. By October 30, 1820 all the main ideas were out on the table.

Biot and Savart gave a formula for the force that an infinitely long straight wire carrying a current would exert on one pole of a long bar magnet and therefore also on a compass needle. Ampere gave what could be regarded as a generalization of this. He gave the law of force by which one closed loop of current acts on another closed loop of current. For our purposes it’s best to break up their “action at a distance” laws of force into two parts, which reflects the later thinking of Faraday, Maxwell, and us. In the latter point of view the first loop generates a “magnetic field”. The magnetic field then acts on the second loop. Here is a precise definition.

Definition 3.3 The magnetic field B at a point x ∈ R3 is the force exerted on a unit magnetic pole at x. Equivalently, it is the torque exerted on a small magnet at x of unit magnetic moment.

It’s time for a little more precision. In measuring the strength of an electric field at a point x in space it’s important to place a very small test charge at the point x, because a large test charge could affect the distribution of charges whose field we want to measure. Moreover the test charge should be located “at x”, not just distributed in some small neighborhood of x. Otherwise one will be measuring an average force over the neighborhood. In other words one should use a test particle of “vanishingly small” charge and spatial extent. Now in reality neither of these requirements can be met. We know now that the smallest magnitude of a charge is the charge on an electron. Moreover even a single atom has nonzero spatial extent. Thus the concept of a “vanishingly small” point charge is an idealization. It’s a useful idealization when studying the classical theory of fields, as we are doing. Later, when we study the quantized theory of fields it will be both


conceptually and technically necessary to talk only about the “average” force of the field on a smoothly distributed test charge around x.

For the present, therefore, we should think of a measurement of the electric field at x as being carried out by a sequence of increasingly precise average measurements using smaller and smaller pith balls around x of smaller and smaller total charge. In practice this is just what one does. Measurements are never 100% accurate anyway. It’s convenient to separate the two limiting operations conceptually: first let the pith ball shrink to a point, thereby obtaining what is usually called a POINT CHARGE. Then imagine repeating measurements with point charges of smaller and smaller charge.

The in-principle method of measuring a magnetic field is similar but with an extra complication. Take a very small compass needle and suspend it from its center by a thread so that it can rotate in any direction. Since the magnetic field pushes one end of the needle one way and the other end the other way, the magnetic field will exert a torque on the compass needle so as to align the needle with the direction of the field. This determines the direction of B at the point x, where the center of the needle is located. The needle should be very short so that the field at one pole is “substantially” the same as at the other pole. Otherwise there will be a net force on the needle in addition to the torque and the needle will translate as well as rotate. The idealization of such a small compass needle (the analog of a point charge) is called a MAGNETIC DIPOLE. A magnetic dipole is prescribed by a vector µ in R3 and should be conceptualized as a limit of small compass needles of length ε pointing in the direction µ and of pole strength p > 0 (north pole at the tip of µ) and pole strength −p at the other end, chosen so that pε = |µ|. Thus p → ∞ as ε → 0. This relation between ε and p is necessary in order for the limiting torque of the magnetic field on the tiny compass needle to exist and not vanish identically. The vector µ is called the magnetic moment of the magnetic dipole.

Exercise 3.4 Show that in a constant magnetic field B the torque on the small compass needles µε about any point is independent of ε and of the point and is given by

N = µ×B
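A numerical sketch of Exercise 3.4: model the needle, as in the text, by poles of strength ±p at ±(ε/2)u, with the force on a pole of strength p in a field B taken to be pB (Definition 3.3). The torque about the center then comes out to µ × B for every ε with pε = |µ| held fixed (all names below are ours):

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def needle_torque(p, eps, u, B):
    """Torque about the center of a needle with poles +p, -p at +/-(eps/2)u."""
    north = tuple(eps / 2 * c for c in u)
    south = tuple(-eps / 2 * c for c in u)
    F_n = tuple(p * c for c in B)        # force on a pole of strength p is pB
    F_s = tuple(-p * c for c in B)
    return tuple(cross(north, F_n)[i] + cross(south, F_s)[i] for i in range(3))

u, Bfield = (1.0, 0.0, 0.0), (0.0, 0.0, 2.0)
for eps in (1.0, 0.1, 0.01):
    p = 3.0 / eps                        # keep |mu| = p * eps = 3 fixed
    print(needle_torque(p, eps, u, Bfield))   # mu x B = (0, -6, 0), up to rounding
```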

Exercise 3.5 Suppose that B(x) is a continuously differentiable vector field on R3. Show that the torque Nε on the compass needle µε described above satisfies

lim_{ε→0} Nε = N


But there is also a translational force. This underlies the Stern-Gerlach experiment. Find a formula showing that the translational force on a magnetic dipole located (exactly) at x in the presence of a non-constant magnetic field is proportional to the magnetic moment and to the first derivative of the magnetic field at x.

Exercise 3.6 The mathematical meaning of these limiting configurations is most easily defined in terms of generalized functions. Let φ ∈ C∞c (R3). A magnetic dipole µ at the origin is the linear functional

φ→ µφ ≡ −µ · grad φ(0).

Prove that

µφ = lim_{ε→0} p[φ(uε/2) − φ(−uε/2)]

where u is a unit vector in the direction of µ.

The formulas of Biot and Savart and Ampere

Let C and C′ be two oriented curves carrying currents i and i′ respectively in the direction of the orientation. Then the current in curve C produces a magnetic field B given by

B(x) = i ∫_C ds × (x − y)/|x − y|³ (3.10)

and the force on an element ds′ (at x) of the curve C′ is

F = (i′ ds′) × B(x). (3.11)
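The integral (3.10) can be checked numerically in a case computable by hand: for a circular loop of radius R carrying current i, the field at the center should be 2πi/R in these units (before the constant c below is inserted). A discretization sketch (segment count and names are our choices):

```python
import math

def loop_field_at_center(i, R, n=5000):
    """z-component of (3.10) at the center of a circular loop in the x-y plane."""
    Bz = 0.0
    for k in range(n):
        t0, t1 = 2 * math.pi * k / n, 2 * math.pi * (k + 1) / n
        y = (R * math.cos(t0), R * math.sin(t0), 0.0)          # point on the loop
        ds = (R * (math.cos(t1) - math.cos(t0)),
              R * (math.sin(t1) - math.sin(t0)), 0.0)          # segment of the loop
        d = (-y[0], -y[1], 0.0)                                # x - y, with x = 0
        r3 = (d[0] ** 2 + d[1] ** 2) ** 1.5
        Bz += i * (ds[0] * d[1] - ds[1] * d[0]) / r3           # (ds x d)_z / |d|^3
    return Bz

print(loop_field_at_center(1.0, 2.0), 2 * math.pi / 2.0)       # both ~ pi
```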

Just as Poisson’s equation (3.3) and Gauss’ law (3.9) follow from Coulomb’s law, so also Ampere’s formula gives both of the following equations.

Corollary 3.7

div B = 0 (3.12)

curl B = (4π/c) j(x) Ampere’s law (3.13)

where j(x) is the current density defined by

j(x) · n dA = total charge passing through the element of surface dA per second, (3.14)

and c is some constant to be determined by measurements on condensers and coils of wire.


3.5.2 FARADAY

As Oersted showed, a current in a loop of wire generates a magnetic field. It stands to reason then that a magnetic field should generate a current in a loop of wire. Does it not? Faraday carried out the experiment to test this, in 1824, but found no effect. In our class we carried out this experiment also. In Figure 4 you can see a coil of wire in series with a meter (50 µA full scale). The instructor (me) brought the magnet over to the coil of wire and held it there for several seconds. No movement of the meter was observed during all the time that I held it there. Thus we verified Faraday’s 1824 observation that a magnetic field need not generate a current in a loop of wire.

It happened that several alert students, sitting in the first few rows of our class, noticed that there actually was a movement of the meter as I moved the magnet into the vicinity of the coil and as I moved it away. By a remarkable coincidence, Faraday noticed this effect also. He repeated his experiment in 1831. But this time he tested the hypothesis that a changing magnetic field should produce a current in a loop of wire. His report on its successful outcome was received by the Royal Society on November 24, 1831. If you look back at all the experiments done up to that point in electricity and magnetism you will see that all dealt with stationary circumstances - stationary charge, stationary current, stationary magnetic field. It’s true that there were lots of sparks, etc. generated in the previous 130 years, and these represented sudden motion of charges. But these were not controlled enough to do quantitative experiments. Thus Faraday’s experiment was the first one to deal with time dependent phenomena in electricity and magnetism.

We repeated Faraday’s 1831 experiment in class and tested two hypotheses. Along with Faraday we assumed that the current induced in the coil of wire depends on the time derivative, Ḃ, of the magnetic field at the location of the wire. There are two obvious ways to get a number from combining Ḃ with the geometry of the coil which are invariant under Euclidean motions of the apparatus. Namely,

Current = const. ∫_C Ḃ_tangential dx (3.15)

and

Current = const. ∫_S Ḃ_normal dA (3.16)

where C denotes the closed curve which is the coil of wire and S is a surface with boundary C. The integral over the surface is independent of the choice of S because div B = 0 by (3.12).

By moving the magnet perpendicular to the axis of the coil one can expect to get a large current if (3.15) is correct and a small current if (3.16) is correct. By moving the magnet parallel to the axis of the coil one should expect to get a large current if (3.16) is correct and a small current if (3.15) is correct. (Picture these two cases thoughtfully.) Three independent teams of students and faculty made observations of meter needle motion under the two types of magnet movement as above. All three teams concluded, independently, and well within the limits of experimental accuracy, that (3.16) can be the only correct one among these two formulas. By a remarkable coincidence, Faraday reached the same conclusion.

{This discussion is repeated a little differently after Figure 3. Leave it there for now.}


Figure 3: Faraday’s experiment showing induction between coils of wire: The liquid battery (right) provides a current which flows through the small coil (A), creating a magnetic field. When the coils are stationary, no current is induced. But when the small coil is moved in or out of the large coil (B), the magnetic flux through the large coil changes, inducing a current which is detected by the galvanometer (G).

Here is our version of Faraday’s experiment. We tested two hypotheses: given the assumption that a changing magnetic field B(x, t) produces an electric field, which in turn produces a current in a loop of wire, does the current resulting from this electric field depend on the integral of the tangential component of the time derivative, Ḃ, around the loop C? Or does it depend on the integral of the normal component of Ḃ over some surface S with boundary C? In the latter case, Equation (3.12), together with Gauss’s theorem, shows that the integral is independent of which surface one chooses. These two kinds of formulas were the only simple, translation and rotation invariant formulas we could think of that combine a closed curve and a vector field to produce a number. Our three expert lab assistants tested hypothesis #1 by moving the magnet quickly, perpendicular to the coil near its center (to cancel non-tangential contributions). They also tested hypothesis #2 by moving one pole of the magnet parallel to the axis very quickly. They independently arrived at the same conclusion as each other: Hypothesis #2 is correct.

See Figure 4 for a picture of our equipment.


Figure 4: Differences from Faraday’s version, compare with Fig. 3:
1. We used a magnet instead of an electromagnet (A).
2. We used a (nearly) modern ammeter instead of a galvanometer (G).
3. We had more wires hanging out than Faraday.
4. We tested two hypotheses: Hyp. 1, Current = constant ∫_C Ḃ(x) · dx; Hyp. 2, Current = constant ∫_S Ḃ(x) · n d(Area), where S is a surface with boundary C.

By choosing the current loop C small, it now follows from our experimental verification of hypothesis #2 and from Stokes’ theorem that

curl E = −(1/c) ∂B/∂t Faraday’s law (3.17)

for some constant c.

To be perfectly honest, the appearance of the electric field E in (3.17), rather than some version of current, as in (3.16), needs explaining. But the connection between these two was just being explored by Ohm at the time that Faraday was doing his experiment. Ohm did some experiments suggesting that there is an “electromotive force” that pushes current through wires. Just how much current will depend on both the nature of the wire (resistance, as we would say nowadays) and the strength of the electromotive force. The electromotive force in the wire is in turn proportional to the electric field produced by the changing magnetic field, as in (3.17).

STATUS

From Coulomb, Oersted, Biot, Savart, Ampere and Faraday we now have relations between charge, electric field, current and magnetic field. The expression of these relations by means of differential equations is

div E = 4πρ Gauss’ law (3.18)

div B = 0 no magnetic monopoles (3.19)

curl B = (4π/c) j(x) Ampere’s law (3.20)

curl E = −(1/c) ∂B/∂t Faraday’s law (3.21)

3.6 MAXWELL puts it all together, 1861

1861: MAXWELL (1831-1879)

In truth, Ampere derived his force law from experiments involving steady currents. Neither the currents nor the charges were changing with time. There was no condenser on which charge might gradually accumulate. Consider a movement of the electric fluid (“charge”, as we would say today) whose flow is given by the vector field j(x) as defined in (3.14). But now allow the vector field to depend on time also. Denote by ρ(x, t) the charge density. In a bounded open set V (with smooth boundary) the total charge inside is ∫_V ρ(x, t)dx, while the rate of charge flow out of V is ∫_{∂V} j(x, t) · n dA. Under the assumption that no electric fluid (i.e. charge) can be created or destroyed, it follows that ∂/∂t ∫_V ρ(x, t)dx + ∫_{∂V} j(x, t) · n dA = 0. By Gauss’ theorem we find that ∫_V ∂ρ(x, t)/∂t dx + ∫_V div j(x, t) dx = 0. Since V is arbitrary we find

div j(x, t) + ∂ρ(x, t)/∂t = 0 conservation of charge. (3.22)

This equation has a different character from the equations of Gauss, Ampere and Faraday because it just reflects the assumption that there is some “electric fluid” which can move but not increase or decrease in total.

Now Ampere’s law, (3.13), is not really consistent with the equation (3.22) of conservation of charge when the charge density is changing. Indeed, taking the divergence of (3.13) and using the identity div curl = 0 we find

0 = div curl B = (4π/c) div j = −(4π/c) ∂ρ/∂t,

which is a contradiction unless ∂ρ/∂t = 0.
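The identity div curl = 0 used here holds for every smooth field. A finite-difference spot check on an arbitrary sample field B of our choosing (the discrete central-difference operators commute exactly, so the result is zero up to rounding):

```python
import math

def B(x):
    # an arbitrary smooth sample field
    return (math.sin(x[1] * x[2]), x[0] ** 2 * x[2], math.exp(x[0]) * x[1])

def partial(f, i, j, x, h=1e-3):
    """Central-difference approximation to d f_i / d x_j at the point x."""
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    return (f(xp)[i] - f(xm)[i]) / (2 * h)

def curl(f, x, h=1e-3):
    return (partial(f, 2, 1, x, h) - partial(f, 1, 2, x, h),
            partial(f, 0, 2, x, h) - partial(f, 2, 0, x, h),
            partial(f, 1, 0, x, h) - partial(f, 0, 1, x, h))

def div(g, x, h=1e-3):
    return sum(partial(g, i, i, x, h) for i in range(3))

print(abs(div(lambda x: curl(B, x), [0.3, 0.7, 1.1])))  # ~0, rounding only
```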


Maxwell resolved the inconsistency by simply adding another term to Ampere’s law, yielding the following internally consistent equations.

MAXWELL’S EQUATIONS

div E = 4πρ Gauss’ law (3.23)

div B = 0 no magnetic monopoles (3.24)

curl B = (4π/c) j(x) + (1/c) ∂E/∂t Ampere-Maxwell (3.25)

curl E = −(1/c) ∂B/∂t Faraday’s law (3.26)

This glib description of what Maxwell did and why must be taken cum grano salis. Maxwell in fact had some considerable physical reasons for adding on the extra term in (3.25). Imagine a closed loop of wire broken at two points. At one point insert a battery. At the other point insert a condenser. A current will flow for a while until the condenser is charged up. But is it “right” to say that a current is flowing across the gap in the condenser when in fact no charge is moving across this gap? (It’s best to consider the gap evacuated for this discussion so that one doesn’t digress onto polarization of some intermediate insulator between the condenser plates.) Maxwell said yes, the term (1/c)∂E/∂t “represents” a current, the so-called “displacement current”, and with it the circuit can now be considered closed. Many of Maxwell’s contemporaries did not look kindly on this viewpoint and continued to try to develop their own theory of forces on moving charges.
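The corrected system supports propagating solutions. Here is a minimal 1-D finite-difference sketch of the vacuum equations (ρ = 0, j = 0), with units chosen so c = 1 and one transverse component each of E and B on a staggered grid; the sign conventions and all parameters are our choices. An initial pulse should translate at speed c:

```python
import math

n, dx = 400, 1.0
dt = 0.5 * dx                       # units with c = 1; stability needs dt <= dx
Ez = [math.exp(-((i - 100) * dx) ** 2 / 50.0) for i in range(n)]
By = [math.exp(-((i + 0.5 - 100) * dx) ** 2 / 50.0) for i in range(n)]
                                    # choosing By = Ez selects the right-moving wave

for _ in range(160):
    for i in range(n - 1):          # dB/dt = -c dE/dx on the staggered grid
        By[i] -= dt * (Ez[i + 1] - Ez[i]) / dx
    for i in range(1, n):           # dE/dt = -c dB/dx
        Ez[i] -= dt * (By[i] - By[i - 1]) / dx

peak = max(range(n), key=lambda i: Ez[i])
print(peak)                         # pulse center moved from 100 to about 100 + 160*dt = 180
```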

Implications of Maxwell’s equations.

1. Electromagnetic influences propagate with a finite speed, c.

Proof: For simplicity consider Maxwell’s equations in a region with no charges or currents, i.e. ρ = 0 and j = 0. Then Maxwell’s equations yield

(1/c²) ∂²E/∂t² = ∂/∂t [(1/c) curl B]
             = −curl curl E
             = ∆E − grad div E
             = ∆E (3.27)


Thus E satisfies the wave equation (1/c²) ∂²E/∂t² = ∆E. A similar computation shows that B also satisfies the same wave equation. As is well known (just take the Fourier transform in the x variables), this equation implies that E is a superposition (Fourier transform) of waves that move at the speed c.

Now Faraday and some others before him suspected that light itself was a manifestation of electromagnetic waves. Maxwell confidently asserted this on the basis of his equations. Two simple (in-principle) laboratory experiments can be used to determine the constant c. Experiment No. 1 is purely electrical: Make a condenser of two parallel plates of measured area and measured distance. Connect a battery of voltage V and measure how much charge, Q, can be pushed onto one plate by the battery. Define ε = αQ/V where α is a constant that depends only on the geometric measurements made before. Experiment No. 2 is purely magnetic: Pass a current through a long straight wire and measure the magnetic field produced one inch away from the wire. Insert the data into the formula of Biot and Savart to find the constant µ needed in their formula (but omitted in (3.13)) to make the magnetic field strength come out in agreement with this measurement. Had we written Maxwell’s equations in terms of standard units, centimeters, Coulombs and seconds, these measured constants would have appeared in the equations. The constant c in (3.27) would have been replaced by c⁻² = εµ. In this way one can measure c by experiments involving only stationary charge and stationary current. Experimental results: Over the next 30 years (that is, into the 1890’s) more and more accurate measurements of ε, µ and the actual speed of light showed that the constant c is indeed (to within experimental error) the speed of light!
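In modern SI terms the two experiments measure the constants now called ε₀ (from the condenser) and µ₀ (from the current measurement), and the predicted wave speed is 1/√(ε₀µ₀). With the modern values:

```python
import math

eps0 = 8.8541878128e-12    # F/m, measurable with a condenser (Experiment No. 1)
mu0 = 1.25663706212e-6     # H/m, measurable with a current-carrying wire (No. 2)
c = 1.0 / math.sqrt(eps0 * mu0)
print(c)                   # ~2.998e8 m/s: the measured speed of light
```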

3.7 Maxwell’s equations à la differential forms

It is remarkable that after 170 years of electric machines, conductors, shocking jars, kites, dead turkeys, balances, accidental currents and cleverly moved coils of wire, the entire edifice can be summarized in a couple of very simple looking equations (thanks to Cartan’s calculus of differential forms) that one could almost guess at (in hindsight).

Choose coordinates x1, x2, x3 and x4 = ct for R4. The Minkowski metric on R4 is defined by its values on 1-forms by the definition

⟨dxj, dxk⟩ = gjk (3.28)

where gjk = diag(1, 1, 1, −1). We are going to rewrite Maxwell’s equations,

57

(3.23) - (3.26) in terms of differential forms. To this end we will define a1-form E(1) which is a form version of the vector field E and define a 2-formB(2) which is the correct form version of the vector field B as follows.

E(1)(x, y, z, t) = ∑_{i=1}^3 Ei(x, y, z, t) dxi (3.29)

B(2)(x, y, z, t) = ∑_{(i,j,k)} Bi(x, y, z, t) dxj ∧ dxk (3.30)

The sum over (i, j, k) refers to a sum over the three cyclic permutations of (1, 2, 3). Then E(1) is a time dependent 1-form on R3, which can just as well be interpreted as a 1-form on R4 with no dx4 component. B(2) is a time dependent 2-form on R3 which we may and will interpret as a 2-form on R4. The mapping from vectors to 1-forms or 2-forms is discussed further in Appendix 9.4, where the relations between the usual vector calculus operators div, curl, grad and the exterior derivative on forms are reviewed. Define

F = E(1) ∧ dx4 + B(2) (3.31)

and

J = ∑_{i=1}^3 ji dxi − ρ dx4. (3.32)
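For reference, the dictionary of Appendix 9.4 between vector calculus and the exterior derivative d on R3, used repeatedly below, amounts to the following standard identifications (a sketch; the exact sign conventions are those of Appendix 9.4, and u(1), u(2) denote the 1-form and 2-form built from a vector field u as in (3.29)-(3.30)):

```latex
% For a function f and a vector field u on R^3:
\begin{aligned}
  df         &= (\operatorname{grad} f)^{(1)}, \\
  d\,u^{(1)} &= (\operatorname{curl} u)^{(2)}, \\
  d\,u^{(2)} &= (\operatorname{div} u)\, dx_1 \wedge dx_2 \wedge dx_3 .
\end{aligned}
```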

Theorem 3.8 Denote by D the exterior derivative operator on forms over R4 and by D∗ its adjoint with respect to the Minkowski metric (3.28). Then Maxwell's equations

div B = 0 no magnetic monopoles (3.33)

curl E + (1/c) ∂B/∂t = 0 Faraday's law (3.34)

div E = 4πρ Gauss' law (3.35)

curl B − (1/c) ∂E/∂t = (4π/c) j Ampere-Maxwell (3.36)

hold if and only if

DF = 0, and (3.37)

D∗F = (4π/c)J (3.38)


Proof. Denote by d the exterior derivative on forms over R3. Then Dω = dω + dx4 ∧ (∂ω/∂x4) for any form ω over R4. Hence

DF = (dE(1)) ∧ dx4 + dB(2) + dx4 ∧ (∂B(2)/∂x4) (3.39)

Since dx4∧ commutes with the 2-form (∂B(2)/∂x4), it follows that DF = 0 if and only if dB(2) = 0 and (dE(1)) + (∂B(2)/∂x4) = 0. These two equations are the form versions of (3.33) and (3.34), respectively, as shown in Appendix 9.4. Hence the equation DF = 0 is equivalent to the pair of equations (3.33) and (3.34).

Now let φ = σ(1) + τ dx4 be a C∞c 1-form on R4, where σ(1) = ∑_{i=1}^3 σi dxi. Then, mindful of (3.28), we have

∫R4 ⟨D∗F, φ⟩ = ∫R4 ⟨F, Dφ⟩
= ∫R4 ⟨F, dσ(1) + (dτ − ∂σ(1)/∂x4) ∧ dx4⟩
= ∫R4 ⟨B(2), dσ(1)⟩ + ∫R4 ⟨E(1) ∧ dx4, (dτ − ∂σ(1)/∂x4) ∧ dx4⟩
= ∫R4 ⟨d∗B(2), σ(1)⟩ − ∫R4 ⟨E(1), dτ − ∂σ(1)/∂x4⟩
= ∫R4 ⟨d∗B(2), σ(1)⟩ − ∫R4 ((d∗E(1))τ + ⟨∂E(1)/∂x4, σ(1)⟩)
= ∫R4 ⟨d∗B(2) − ∂E(1)/∂x4, σ(1)⟩ − ∫R4 (d∗E(1))τ

On the other hand,

∫R4 ⟨J, φ⟩ = ∫R4 ⟨j(1), σ(1)⟩ + ∫R4 ρτ (3.40)

Therefore D∗F = (4π/c)J if and only if −d∗E(1) = 4πρ and d∗B(2) − (∂E(1)/∂x4) = (4π/c)j(1). (The missing factor of c can be traced to the definition (3.32): since x4 = ct, the charge density should enter there as cρ dx4.) These two equations are the form versions of the two remaining (inhomogeneous) equations (3.35) and (3.36), respectively, as shown in Appendix 9.4.

3.8 Electromagnetic forces ala Newton, Lagrange and Hamilton

In order to incorporate electromagnetic forces into quantum mechanics it will be necessary to reformulate the Newtonian mechanics for these forces in Hamiltonian form. The force on a charged particle of charge e in the presence of electric and magnetic fields is given by F = eE if the particle is standing still. This is the very definition of the electric field E. But if the particle is moving we have not yet written down the force. In Definition 3.3 we wrote down the force of a magnetic field on a single magnetic pole (with the other end of the magnet "far away") or, equivalently, the torque on a very small magnet (magnetic dipole), which we took as the definition of the magnetic field. Four years after Hertz' definitive experiments, showing that an electromagnetic wave from a spark really does propagate, all the way across a room, H.A. Lorentz proposed (1892) that the force on a moving charge is

F = eE + (e/c)v ×B, (3.41)

where B is the magnetic field at the position of the particle and v is the velocity of the particle. Take note of the fact that there is no magnet lurking in this definition even though the magnetic field is producing a force. The many consequences of Lorentz' proposed force law are consistent with experiment and with Maxwell's equations (provided quantum mechanics doesn't have to be taken into account). Lorentz' force law is now accepted.
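As a small numerical illustration of (3.41) (a sketch, not from the text; units with m = e = c = 1 are assumed), a charge in a uniform magnetic field with no electric field should simply circle at the cyclotron frequency eB/(mc):

```python
import math

# Numerical sketch: a charged particle under the Lorentz force (3.41)
# with E = 0, in units where m = e = c = 1.  With B = (0, 0, 1) the
# velocity rotates at cyclotron frequency eB/(mc) = 1, so after one
# period T = 2*pi it should return to its initial value.

B = (0.0, 0.0, 1.0)

def force(v):
    # F = (e/c) v x B with e = c = 1
    return (v[1]*B[2] - v[2]*B[1],
            v[2]*B[0] - v[0]*B[2],
            v[0]*B[1] - v[1]*B[0])

def rk4_step(v, dt):
    # classical 4th-order Runge-Kutta step for dv/dt = force(v)
    def add(a, b, s):
        return tuple(ai + s*bi for ai, bi in zip(a, b))
    k1 = force(v)
    k2 = force(add(v, k1, dt/2))
    k3 = force(add(v, k2, dt/2))
    k4 = force(add(v, k3, dt))
    return tuple(v[i] + dt/6*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(3))

v = (1.0, 0.0, 0.0)
dt = 1e-3
for _ in range(int(2*math.pi/dt)):  # integrate over one cyclotron period
    v = rk4_step(v, dt)

print(v)  # back near (1, 0, 0); the magnetic force does no work, so |v| is constant
```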

In order to incorporate the Lorentz force (3.41) into the quantum mechanics of a charged particle it will be necessary to carry out the following procedure:

1) Express forces in terms of a potential.
2) Express Newton's equations in terms of Lagrange's equations.
3) Express Lagrange's equations in terms of Hamilton's equations.
4) Convert from Hamilton's equations in classical mechanics to Schrodinger's equation in quantum mechanics.

Step 4) will be described in the next chapter for velocity independent forces, in the section on the rules of quantum mechanics. But for a velocity dependent force, such as the electromagnetic force (3.41), the transition to quantum mechanics will be described in detail in Chapter 7. In the remainder of this chapter we will carry out Steps 1), 2) and 3).

Since the Lorentz force is velocity dependent it does not fall among the types of examples we considered in Chapter 2. It is not necessarily true that every type of force law in Newtonian mechanics can be transcribed into an equivalent Lagrangian formulation along the lines we described in Chapter 2. For example friction forces, which typically also depend on the velocity in a linear way, require a substantial change in the Lagrangian formalism, a change which does not lend itself to quantization.

But the very special form of the Lorentz force, in combination with Maxwell's equations, yields a successful Hamiltonian formulation, which, as we will see later, leads to a resulting Schrodinger equation that begs for a geometric interpretation in terms of connections on vector bundles. In fact this is the classical jumping off point to the appearance of Yang-Mills fields and to the currently most widely accepted theory of elementary particles.

Here are the details for Steps 1), 2), and 3), for a particle in the presence of an electromagnetic field. Step 4) will be carried out in Chapter 7.

Step 1. The electromagnetic potentials.

When the force on a particle depends on the velocity of the particle the simple equation Force = −∇V can't hold for some function V of space only. Our discussion of the Lagrange formalism has to be modified. Since the 2-form F defined in the preceding section satisfies DF = 0 and R4 is cohomologically trivial, there is a 1-form A such that

F = DA (3.42)

The 1-form A is called an electromagnetic potential for the electromagnetic field E, B. A is highly non-unique: if f is any smooth function on R4 one may replace A by A + Df without changing E and B, because D^2 f = 0. We may write A = ∑_{j=1}^3 Aj dxj − φ dx4, where the components Aj and φ are real valued functions of x and t. (The sign and the use of dx4 = c dt make this consistent with (3.44) below.) The 2-form F has six independent components. The equation (3.42) may be written, as we know from Appendix 9.4,

B = curl A (3.43)

E = −gradφ− (1/c)∂A/∂t (3.44)

where A = (A1, A2, A3) is the vector field formed by the three spatial components of the 1-form A. This is the customary way to write the equation (3.42) in the physics literature. The 1-form A is going to replace our old potential V(x). Whereas the potential V was unique up to an additive constant, the electromagnetic potential is unique only up to an exact 1-form, Df.
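The gauge freedom A 7→ A + Df can be checked symbolically. The following sketch (sympy assumed; A, φ, f are arbitrary smooth functions) verifies that replacing A by A + grad f and φ by φ − (1/c)∂f/∂t leaves both B = curl A and E = −grad φ − (1/c)∂A/∂t unchanged:

```python
import sympy as sp

# Symbolic sketch: gauge invariance of E and B under A -> A + Df.
x, y, z, t, c = sp.symbols('x y z t c')
X = (x, y, z)
A = [sp.Function(f'A{i}')(x, y, z, t) for i in range(1, 4)]
phi = sp.Function('phi')(x, y, z, t)
f = sp.Function('f')(x, y, z, t)

def grad(g):
    return [sp.diff(g, xi) for xi in X]

def curl(F):
    return [sp.diff(F[2], y) - sp.diff(F[1], z),
            sp.diff(F[0], z) - sp.diff(F[2], x),
            sp.diff(F[1], x) - sp.diff(F[0], y)]

def E_field(phi, A):
    return [-sp.diff(phi, xi) - sp.diff(Ai, t)/c for xi, Ai in zip(X, A)]

A_new = [Ai + gi for Ai, gi in zip(A, grad(f))]  # spatial part of A + Df
phi_new = phi - sp.diff(f, t)/c                  # dx4 part of A + Df

assert all(sp.simplify(b0 - b1) == 0 for b0, b1 in zip(curl(A), curl(A_new)))
assert all(sp.simplify(e0 - e1) == 0
           for e0, e1 in zip(E_field(phi, A), E_field(phi_new, A_new)))
print("B and E are unchanged by the gauge transformation")
```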

Step 2. Newton to Lagrange.

Here is a Lagrangian that allows the Lorentz force (3.41) to be incorporated into Lagrange's form of Newtonian mechanics. Define

L(x, v, t) = (1/2)m|v|^2 − eφ + e(v/c) · A (3.45)


Theorem 3.9 With the force given by (3.41), Newton's equations are equivalent to the Lagrange equations

(d/dt) ∂L/∂vi = ∂L/∂xi, i = 1, 2, 3 (3.46)

Proof. Referring to (3.45) we see that ∂L/∂vi = mvi + (e/c)Ai(x, t). Hence

(d/dt) ∂L/∂vi = m v̇i + (e/c)(∂Ai/∂t + ∑_{j=1}^3 (∂Ai/∂xj) ẋj) (3.47)
= m v̇i + (e/c)(∂Ai/∂t + (v · ∇)Ai) (3.48)

Also

∂L/∂xi = −e ∂φ/∂xi + e(v/c) · ∂A/∂xi (3.49)
= −e ∂φ/∂xi + (e/c) ∂(v · A)/∂xi (3.50)

since xi and v are independent variables.

Therefore, taking into account (3.43) and (3.44), we find

(d/dt) ∂L/∂vi − ∂L/∂xi = m v̇i + (e/c)(∂Ai/∂t + (v · ∇)Ai) + e ∂φ/∂xi − (e/c) ∂(v · A)/∂xi
= m v̇i + e(∂φ/∂xi + (1/c) ∂Ai/∂t) + (e/c)((v · ∇)Ai − ∂(v · A)/∂xi)
= m v̇i − eEi + (e/c)((v · ∇)A − ∇(v · A))i
= m v̇i − eEi − (e/c)(v × (curl A))i
= m v̇i − eEi − (e/c)(v × B)i
= m v̇i − Fi

where F is the Lorentz force, given by (3.41). We have used the identity v × (∇ × A) = ∇(v · A) − (v · ∇)A. (Some readers might prefer to write this identity as −iv(dA) = d(ivA) − LvA, where Lv is the Lie derivative, as in [40, Proposition 3.10].) This proves the theorem.

Ref. Goldstein pages 19-21.
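The vector identity used in the proof can be confirmed symbolically. In this sketch (sympy assumed) v is held constant, matching the proof's use of xi and v as independent variables:

```python
import sympy as sp

# Symbolic check of  v x (curl A) = grad(v . A) - (v . grad) A
# with v constant and A an arbitrary smooth vector field.
x, y, z = sp.symbols('x y z')
X = (x, y, z)
v = sp.symbols('v1 v2 v3')
A = [sp.Function(f'A{i}')(x, y, z) for i in range(1, 4)]

curlA = [sp.diff(A[2], y) - sp.diff(A[1], z),
         sp.diff(A[0], z) - sp.diff(A[2], x),
         sp.diff(A[1], x) - sp.diff(A[0], y)]
lhs = [v[1]*curlA[2] - v[2]*curlA[1],
       v[2]*curlA[0] - v[0]*curlA[2],
       v[0]*curlA[1] - v[1]*curlA[0]]

vdotA = sum(vj*Aj for vj, Aj in zip(v, A))
rhs = [sp.diff(vdotA, X[i]) - sum(v[j]*sp.diff(A[i], X[j]) for j in range(3))
       for i in range(3)]

assert all(sp.expand(l - r) == 0 for l, r in zip(lhs, rhs))
print("identity verified componentwise")
```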

Step 3. Lagrange to Hamilton.


Next we must pass from Lagrangian mechanics to Hamiltonian mechanics. To this end we must compute the Legendre transform of the Lagrange function (3.45). Recall that one holds x and t fixed and carries out the Legendre transform on the fiber over x. In Example 2.45 we carried out the Legendre transform for the crucial part, (1/2)m|v|^2 + e(v/c) · A, and since the potential φ is independent of v it just goes along for the ride. Hence the Hamiltonian is

H(x, p, t) = (1/2m) ∑_{j=1}^3 (pj − (e/c)Aj(x, t))^2 + eφ(x, t), (3.51)

where I have written now H instead of L∗ in accordance with the customary respect for Hamilton.
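The Legendre transform computation can be replayed symbolically. This sketch (sympy assumed; the x and t dependence of Aj and φ plays no role in the fiberwise transform, so they appear as plain symbols) recovers (3.51) from (3.45):

```python
import sympy as sp

# Symbolic sketch: Legendre-transform the Lagrangian (3.45) in the
# velocity variables, holding A_j and phi fixed.
m, e, c = sp.symbols('m e c', positive=True)
phi = sp.Symbol('phi')
v = sp.Matrix(sp.symbols('v1 v2 v3'))
p = sp.Matrix(sp.symbols('p1 p2 p3'))
A = sp.Matrix(sp.symbols('A1 A2 A3'))

L = m*v.dot(v)/2 - e*phi + e*v.dot(A)/c

# p_j = dL/dv_j = m v_j + (e/c) A_j, hence v_j = (p_j - (e/c) A_j)/m
v_of_p = (p - e*A/c)/m
H = sp.expand(p.dot(v_of_p) - L.subs(list(zip(v, v_of_p))))

H_expected = sp.expand((p - e*A/c).dot(p - e*A/c)/(2*m) + e*phi)  # this is (3.51)
assert sp.simplify(H - H_expected) == 0
print("Legendre transform of (3.45) reproduces (3.51)")
```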

SUMMARY: Starting with Lorentz's force law (3.41) for a moving charged particle in an electromagnetic field, we carried out the transitions from Newtonian mechanics to Lagrangian mechanics to Hamiltonian mechanics. After we learn what the rules of quantum mechanics are, we will be ready to insert our Hamiltonian formulation of the electromagnetic force into quantum mechanics. This will be done in Chapter 7.

3.9 References

Modern electrodynamics: J.D. Jackson [37]
Maxwell's great treatise (1873) [44]

History books:
E.T. Whittaker (A History of the Theories of Aether and Electricity) [67, 68]
Roller and Roller (History just from the ancient Greeks to Coulomb) [49]
Holton and Roller [35]


4 Quantum Mechanics.

4.1 Spectroscope

The devices that were invented in the 18th century for producing, storing and measuring electric charge were simple enough for us to understand without much background on our part. But already by the time of Galileo the development of optical devices was well along (telescopes by Galileo, then microscopes later). We need to understand, albeit at a superficial level, how one particular optical device works, whose purpose was to analyze the "quality" of light. As you know (I assume), a beam of light hitting a surface of glass at an angle will change direction as it enters the glass. In fact blue light will bend more than red light, with the result that if the hunk of glass is shaped like a prism the light emerging from the prism will be split up according to its color. As a result a beam of white light, hitting a side of a prism made of glass, will split up, by the time it comes out, into its color components. See Figure 5.

Figure 5: Spectral analysis by prism

Not shown in this simple diagram are three technical features, one of which is really important for us to understand.

First, the beam of light from the star has to be put through some lenses so that it will head toward the prism in a well formed beam of parallel light rays. That's just optics.


Second, the star can be replaced by any other source of light you might like to use. For example you might wish to put some hydrogen in a jar and pass a spark through it so that it will emit light. Of course you have to pass this light through the lenses so that the light will head in a well formed beam toward the prism. Or instead of hydrogen, or a star, you might like to vaporize some pure sodium in a jar, pass a spark through it, and direct the resulting light through the lenses toward the prism.

Third, and most important for us to understand, is that between the lenses and the prism you must place a vertical plate with a narrow horizontal slit in it. The incoming beam of light from the lenses, which you can think of as having a circular cross section, will be mostly stopped by this plate. What little comes through the horizontal slit will have a cross section which is just a horizontal line segment. Now this beam with a linear cross section will pass through the prism and break up into a bunch of horizontal line segments of different colors. The six horizontal dark lines in Figure 5 represent missing colors in the incoming beam from the star. In Figure 6 you can see what would result if the source is just hydrogen, instead of a star.

Figure 6: The four visible lines in the hydrogen spectrum

There are four lines visible on this dark background. Each one is an image of the horizontal slit. By measuring the positions of these lines you can figure out their wavelengths and therefore their frequencies, since (frequency) times (wavelength) = speed of light. Because such line segments appear in this apparatus for measuring the frequencies in a light beam, it is customary to refer to them as spectral lines. Of course each such line is just an image of the horizontal slit. In Figure 5 all colors are present in the incoming starlight except the six colors given by the dark lines. The missing colors (equivalently frequencies) (equivalently wavelengths) (equivalently lines) are called absorption lines. The four lines in the hydrogen spectrum are called emission lines. Spectroscopy is the study of spectra of light. The simple spectroscope described above was invented by Newton in 1666 and already used by him to study the radiation from stars (i.e. starlight). But as we saw above, one can also use it to study the radiation from heated chemical elements. There is yet another kind of radiation that can be analyzed in this way (blackbody radiation, see the next section). In the next section we will describe the spectroscopic effects discovered in the 100 years before quantum mechanics finally explained them.

4.2 100 years of spectroscopy: radiation measurements that defied Newton ∪ Maxwell

While Faraday, Maxwell and friends were discovering the laws of interaction between electric charges, electric currents, magnetic fields and light, there was an independent line of discoveries being made concerning the interaction of matter with light. The emission and absorption of light by matter is what we mean by radiation. We saw in the preceding section how one can measure the 'quality' (spectral composition) of light emitted from various kinds of sources (stars, heated chemical elements). There is another kind of radiation source that we need to understand also, namely blackbody radiation, which is described in item 3 below. Here is a summary of spectroscopic knowledge accumulated over a hundred years.

1. A vaporized chemical element emits a spectrum peculiar to that element. The spectra of two different chemical elements are distinct.

2. If white light passes through a (cool) gas made of a chemical element then, after passing through the gas, the white light will have dark lines exactly where the (heated) chemical element had produced light lines as in item 1. The cool gas has absorbed radiation of the same wavelengths as the heated chemical element can emit.

3. A blackbody, e.g. a piece of soot covered iron, emits a red glow when heated a little bit (think electric stove range). If heated hotter it will turn orange, and if heated hot enough it will turn bright yellow (think lamp filament). A spectroscope shows that, actually, at any temperature, all frequencies are present in the emitted light. But at lower temperatures there is a preponderance of red light (i.e., low frequency light) while at higher temperatures the preponderance of light is more blueish (i.e., higher frequencies, equivalently, shorter wavelengths). So the distribution moves toward the blue end (i.e., higher frequencies) as the body is heated. In Figure 7 you can see four light intensity curves at four different temperatures. Notice that the maximum point on a curve is more bluish at higher temperatures. Thus at 6000 K the peak is in the yellow, which is more toward the blue end than the peak at 4000 K. The intensity distribution, as a function of frequency (or wavelength), depends only on the temperature and not on which "blackbody" is used. (Yes, it's one word.)

Figure 7: Blackbody radiation at different temperatures

Concerning the spectrum of individual chemical elements, it happens that some manipulations with the chemical element can change its spectrum. Thus:

4. (Zeeman effect, 1896.) The spectrum of a chemical element can be shifted by placing the test element in a magnetic field while examining its spectrum. For example, take an element that's easy to deal with, such as sodium. It has a very bright line (called the D line) in the yellow portion of its spectrum. This is why sodium vapor street lights cast a yellow light. This line has a wavelength (in vacuum) of approximately 5891.583264 Angstroms (1 Angstrom = 10^−8 cm).

Now place a weak magnetic field near the vaporized sodium. The D line splits into 3 distinct lines. Why? (The splitting increases as the magnetic field increases.)

5. (Stark effect, 1913.) Same effect as in item 4, except replace the magnetic field by an electric field.

6. Hydrogen spectrum. Most of the hydrogen spectrum is outside the visible spectrum. One needs a more sophisticated version of our spectroscope to measure the wavelengths of those lines that are not visible. But it can be done. Here are five of the wavelengths that follow a remarkable pattern; the wavelengths are in nanometers (nm). (1 nm = 10 A.)

λ3 = 656 nm

λ4 = 486 nm

λ5 = 434 nm

λ6 = 410 nm

λ7 = 397 nm

The first four are in the visible spectrum. These four lines are shown in Figure 6. The red line is very bright. This is why passing sparks through a tube with pure hydrogen in it will produce a red light. It just so happens that all five of these wavelengths can be described by a simple formula, namely

1/λn = R(1/4 − 1/n^2), n = 3, 4, . . . , 7 (4.1)

Is this simple formula just a fluke? After all, we could add on to the right side any function that vanishes on the set {3, 4, 5, 6, 7} and still have a valid formula. Well, there are a lot of other spectral lines of hydrogen that fit this pattern also. The general formula

1/λn′,n = R(1/(n′)^2 − 1/n^2), n > n′ ≥ 1 Balmer-Rydberg formula (4.2)

fits a large number of observed spectral lines of hydrogen. In Figure 8 you can see many more of the spectral lines of hydrogen. The four lines in the visible range are shown in black. Don't be fooled by the colors in Figure 8. Ultraviolet is on the far left. Infrared is on the far right. {By the way, it takes a lot of experimental effort to measure and analyze the hydrogen spectrum to find even one of these series (i.e. fixed n′). We are indebted for n′ = 2 to Balmer (1885), for n′ = 1 to Lyman (1906-1914), for n′ = 3 to Paschen (1908), for n′ = 4 to Brackett (1922), for n′ = 5 to Pfund (1924), and for n′ = 6 to Humphreys.}
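Formula (4.1) is easy to test numerically. With the modern value of the Rydberg constant for hydrogen, R ≈ 1.0968 × 10^7 m^−1 (a value assumed here, not given in the text), the five wavelengths listed above come out to within a nanometer:

```python
# Numeric check of (4.1) against the five measured wavelengths.
# The Rydberg constant below is an assumed modern value, not from the text.
R = 1.0968e7  # m^-1

def balmer_nm(n):
    # wavelength in nm from 1/lambda_n = R(1/4 - 1/n^2)
    return 1e9 / (R * (0.25 - 1.0/n**2))

for n, measured in zip(range(3, 8), [656, 486, 434, 410, 397]):
    print(n, round(balmer_nm(n), 1), measured)
```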

Figure 8: The four visible H lines are in black, starting from Ba-α. UV is to the left. IR is to the right.

SUMMARY: In order to understand radiation one must explain an awful lot of very different kinds of observations. We just scratched the surface in our description of the previous six kinds of experiments. At least hundreds (if not thousands) of unsuccessful attempts to explain these experiments on the basis of Newtonian mechanics and Maxwell's theory of electricity and magnetism were made between (say) 1850 and 1925. It would be well to dwell on this for a few minutes before going on to the next section.

4.3 The rules of quantum mechanics

One's physical intuition is based not only on experience but is also molded considerably by the mathematical models we make to explain our experience. For the motion of bodies that we are familiar with, under the influence of forces that we are familiar with, the mathematical model is classical mechanics. Table 2 describes some physical notions for which we have some intuitive ideas, based on our experience. In the second and third columns these intuitive notions are given mathematical representations in the two forms we've already studied, namely Lagrange's and Hamilton's formulations of Newtonian mechanics. We need to get the conceptual organization of these notions straight before passing to the corresponding mathematical representation of these notions in quantum mechanics.

Concerning line (i), the "instantaneous state" of the system is, by definition, the information necessary to determine the instantaneous state at all future times. Since Newton's equations are second order in time one needs to know, typically, the initial positions and velocities of all "particles" (e.g. including rigid bodies) in the system. Both Lagrange's and Hamilton's formulations include such data, since the instantaneous state is a point (q, v) ∈ T(C) or a point (q, p) ∈ T∗(C), respectively.


Physical Notion | Interpreted by Lagrange | Interpreted by Hamilton
(i) Instantaneous state | A point (q, v) ∈ T(C) | A point (q, p) ∈ T∗(C)
(ii) Observable | A function g : T(C) → R | A function f : T∗(C) → R
(iii) Value in a state | g(q, v) | f(q, p)
(iv) Dynamics | Newton ala Lagrange | Newton ala Hamilton

Table 2: Classical mechanics ala Lagrange and Hamilton.

Concerning line (ii), the mathematical objects corresponding to things we observe are (usually) given by functions on the instantaneous state space. See Table 3 for lots of examples in case configuration space is just R3. The "axiom" in line (iii) is pathetically self explanatory.

Concerning line (iv), the forces, masses, moments of inertia, etc. are needed to determine the motion of the system, given the instantaneous state. The motion is then determined by Newton's equations, or, equivalently, by Lagrange's equations or Hamilton's equations.

The first three lines in Table 2 merely describe what system is under our purview. This is called kinematics. Line (iv) is the only line dealing with the motion of the system. The information governing the motion of the system is called dynamics.

Example 4.1 Configuration space = R3: For a particle moving in R3 under the influence of some forces the configuration space is of course just R3. The tangent bundle and cotangent bundle are both R3 × R3, upon identifying R3 with its dual space. In Table 3 we have listed a bunch of observables and the functions on T∗(R3) that represent them in classical mechanics. Later we will make the same table over again, but giving the mathematical objects that represent these same observables in quantum mechanics. The following table contains examples for the third column in Table 2, line (ii).

Quantum mechanical analog of Table 2. Table 4 is very non-definitive because the complex Hilbert space H has not been specified. Nor have the physical meanings of those self-adjoint operators A in line (ii). Of course one must specify what the mechanical system under consideration is in order to determine H. Just as with our definition of "configuration space", one can only learn how the Hilbert space H is to be chosen by looking at examples.


The observable: | The function f : T∗(C) → R:
1. "the jth coordinate of position" | f((x, p)) = xj
2. "the jth coordinate of momentum" | f((x, p)) = pj
3. "the potential energy" | f((x, p)) = V(x)
4. "the kinetic energy" | f((x, p)) = T = (1/2m) ∑_{j=1}^3 pj^2
5. "the z component of angular momentum" | f((x, p)) = Lz = x1p2 − x2p1
6. "total angular momentum" | f((x, p)) = L^2 = Lx^2 + Ly^2 + Lz^2
7. "Question: Is the particle in the Borel set B ⊂ R3?" | f((x, p)) = χB(x)
8. "the total energy" | f((x, p)) = E = (1/2m) ∑_{j=1}^3 pj^2 + V(x)

Table 3: Some classical observables for a single free particle in R3.

Physical Notion | Quantum Mechanics
(i) Instantaneous state | Unit vector ψ in Hilbert space H
(ii) Observable | Self-adjoint operator A on H
(iii) Value in a state | Expected value: (Aψ, ψ)
(iv) Dynamics | Schrodinger equation

Table 4: Quantum Mechanics.

Example 4.2 Configuration space = R3: How to choose the entries in the second column of Table 4 for a particle free to move in R3 under a force given by F = −grad V.

First: Take H = L2(R3).

Second: Use the following table to determine which operators on H have what physical interpretation. Compare these operators with the functions given in Table 3. In particular note that if a function f((x, p)) depends only on x, say f(x, p) = g(x), then the quantum mechanical operator corresponding to that classical mechanical observable is just the operator of multiplication by g on L2(R3). Notation: (Mgψ)(x) = g(x)ψ(x). The observables in lines 1., 3. and 7. have this form. But all the others are very far from being multiplication operators. They are in fact differential operators. This is the time to dwell, for a few minutes, on these operators, especially the momentum operator Pj in line 2.

Third: In Table 4, line (iii) you see "Expected value" instead of "Value".


The observable: | The operator on H
1. "the jth coordinate of position" | Qj = Mxj
2. "the jth coordinate of momentum" | Pj = −i~ ∂/∂xj
3. "the potential energy" | MV
4. "the kinetic energy" | T = (1/2m) ∑_{j=1}^3 Pj^2 = (~^2/2m)(−∆)
5. "the z component of angular momentum" | Lz = −i~(x ∂/∂y − y ∂/∂x)
6. "total angular momentum" | L^2 = Lx^2 + Ly^2 + Lz^2
7. "Question: Is it in the Borel set B ⊂ R3?" | MχB
8. "the total energy" | H = −(~^2/2m)∆ + MV

Table 5: Some observables for a single free particle in R3 and the corresponding operators on H ≡ L2(R3).

This means that the theory does not actually predict what you will get in a measurement of the observable whose operator is A when the system is in the state ψ. Rather, it predicts that if you make the measurement many, many times and take the average you will get (Aψ, ψ).

Fourth: The dynamics for the quantum mechanical system is given by the equation

i~ dψ(t)/dt = Hψ(t) Schrodinger equation (4.3)

Here t 7→ ψ(t) is a function from R into H. Since H is a partial differential operator this ODE is really a PDE in disguise. We may write it explicitly as

i~ ∂ψ(t, x)/∂t = −(~^2/2m)∆ψ(t, x) + V(x)ψ(t, x) Schrodinger equation (4.4)
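A quick symbolic check (a sketch, sympy assumed): for V = 0 a plane wave e^{i(kx−ωt)} solves (4.4) in one dimension precisely when ~ω = ~^2k^2/(2m), the free-particle dispersion relation:

```python
import sympy as sp

# Symbolic sketch: plane wave solution of the free (V = 0), 1-d
# Schrodinger equation, with w chosen by the dispersion relation.
x, t, k, m, hbar = sp.symbols('x t k m hbar', positive=True)
w = hbar*k**2/(2*m)
psi = sp.exp(sp.I*(k*x - w*t))

lhs = sp.I*hbar*sp.diff(psi, t)            # i hbar dpsi/dt
rhs = -hbar**2/(2*m)*sp.diff(psi, x, 2)    # -(hbar^2/2m) d^2 psi/dx^2
assert sp.simplify(lhs - rhs) == 0
print("plane wave solves the free Schrodinger equation")
```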

Advisement on Table 4, rows (i) and (iii). Projective space. If c is a complex number of absolute value one and ψ is a unit vector in H then cψ is also a unit vector. Since (A(cψ), (cψ)) = (Aψ, ψ), the expected value of A in the state cψ is the same as in the state ψ. But according to Table 4, the only information about the state ψ that the theory predicts is given by such inner products (Aψ, ψ). (You can vary A.) Therefore ψ and cψ contain exactly the same physical information. So the set of instantaneous states is not really {unit vectors in H} but rather this set modulo the action of the circle group on H. This, of course, is exactly the infinite dimensional projective space, P(H), based on H, i.e., the set of one dimensional subspaces of H.

On the one hand, it would be a technical nuisance to deal with the projective space P(H) too extensively, instead of the nice Hilbert space H, while on the other hand it is essential for the physical interpretation to realize that ψ and cψ represent the same physical state.

Direct physical interpretation of ψ. Line (iii) in Table 4 provides the (only) basis for interpretation of the wave function ψ when we have for our use the choices for A given in Table 5, and their powers. For ease in reading let's drop down to one dimension from three. Thus we will consider a particle free to move on the line. In this case the state space is H = L2(R). We can drop some subscripts in Table 5 and just write Q = Mx and P = −i~ d/dx. Let n be a non-negative integer. According to the rule (iii) in Table 4, a measurement of the "observable" x^n should give on average the value (Q^nψ, ψ). Writing this out explicitly we find

(Q^nψ, ψ) = ∫R x^n |ψ(x)|^2 dx, n ≥ 0 (4.5)

Since ∫R |ψ(x)|^2 dx = 1, the measure |ψ(x)|^2 dx is a probability measure on the line and (Q^nψ, ψ) is just its nth moment.

Interpretation: |ψ(x)|^2 is the probability density for finding the particle at the point x under repeated measurements when the particle is in the state ψ. Neat, huh?

Generalization of this interpretation. We can easily abstract the preceding example, replacing our special Hilbert space L2(R) by an arbitrary Hilbert space H and replacing Q by an arbitrary self-adjoint operator A on H. In this case we may write, by the spectral theorem,

A = ∫_{−∞}^{∞} λ dE(λ) (4.6)

where E(·) is a projection valued measure on the Borel sets of R. Let ψ be a unit vector in H and define

P (B) = (E(B)ψ, ψ) B = Borel set in R. (4.7)


Then P (R) = (Iψ, ψ) = 1, so P is a probability measure on R. Moreover

(A^nψ, ψ) = ∫_{−∞}^{∞} λ^n P(dλ). (4.8)

Just as in the preceding example, we may interpret (4.8) to mean that a measurement of the observable "O" represented by the operator A will be found to lie in a Borel set B ⊂ R with probability (E(B)ψ, ψ).

For technical reasons (uniqueness of the moment problem) it is stronger to make an axiom based on the probability measure B 7→ (E(B)ψ, ψ) than to adopt line (iii) in Table 4. So if you really want to make some axiom system for quantum mechanics, you might as well replace line (iii) of Table 4 by the following.

Axiom of measurement. Given a self-adjoint operator A on the Hilbert state space, H, and a unit vector ψ in H, let

A = ∫_{−∞}^{∞} λ dE(λ) (4.9)

be the spectral resolution of A. Then any measurement of the observable "O" whose corresponding operator is A will lie in the Borel set B ⊂ R with probability

P(B) = (E(B)ψ, ψ) (4.10)

when the system is in the state ψ. (As noted above, P(B) ≥ 0 and P(R) = (Iψ, ψ) = 1. So P is a probability measure on the line.)
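For the position operator Q the spectral measure is E(B) = MχB, so the Axiom of measurement reduces to P(B) = ∫B |ψ(x)|^2 dx, as in (4.5). A small numeric illustration (a sketch; the Gaussian state is an assumption, not from the text):

```python
import math

# Numeric sketch: probability of finding the particle in B = [-1, 1]
# when |psi(x)|^2 is the standard Gaussian density.
def density(x):
    return math.exp(-x*x/2) / math.sqrt(2*math.pi)

a, b, n = -1.0, 1.0, 100_000
h = (b - a) / n
prob = h * sum(density(a + (i + 0.5)*h) for i in range(n))  # midpoint rule

print(prob)  # about 0.6827, i.e. erf(1/sqrt(2))
```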

Probability density for momentum. Let's return to the simple case of one particle moving on the line. We already have an interpretation of the wave function: |ψ(x)|^2 dx is the probability measure for finding the particle at a point x. We got this by applying the Axiom of measurement to the position operator Q. Now let's apply this axiom to the momentum operator A = P. According to the Axiom of measurement, the probability of finding the momentum of the particle in some Borel set B in R can be found from the spectral resolution of P. But the spectral resolution is known because the Fourier transform "diagonalizes" P.

Let

ψ̂(p) := (2π~)^{−1/2} ∫R ψ(x) e^{−ipx/~} dx

be the Fourier transform of ψ. (The factor ~ in the exponent makes the exponent dimensionless.) Integrating by parts we see that

(Pψ)̂(p) = p ψ̂(p) (4.11)

In other words, under the Fourier transform, which is of course a unitary map of L2(R) onto itself, the differential operator P goes over to the multiplication operator Mp. Explicitly, FPF^{−1} = Mp. We may therefore proceed exactly as in the case of Q, which was multiplication by x, and conclude that |ψ̂(p)|^2 is the probability density of finding a measurement of momentum to be p when the system is in the state ψ.

SUMMARY:

1. The rule (iii) in Table 4 is (informally) equivalent to the Axiom of measurement, (4.10).

2. For a particle moving on the line, and whose wave function is ψ:
a) the probability that a measurement of position lies in an interval [a, b] is

∫_a^b |ψ(x)|^2 dx (4.12)

and
b) the probability that a measurement of momentum lies in [a, b] is

∫_a^b |ψ̂(p)|^2 dp. (4.13)

4.4 The Heisenberg commutation relations and uncertainty principle

There is more to be learned from our simple case of one particle moving on the line under some, as yet unspecified, force, even before we get to dynamics. Suppose that the wave function is supported in a small interval [c, d] of length at most ε. In that case measurements of position will find the particle somewhere in this interval on average, because

c ≤ ∫R x|ψ(x)|^2 dx ≤ d (4.14)


How widely dispersed can these (many) measurements be? The usual quantitative measure of dispersion of measurements is the variance of the probability distribution. Let x0 = ∫R x|ψ(x)|^2 dx. Of course x0 lies in the interval [c, d] by (4.14). The variance of the distribution |ψ(x)|^2 dx is, by definition,

Var(Q, ψ) = ∫R (x − x0)^2 |ψ(x)|^2 dx. (4.15)

Since (x − x0)^2 ≤ ε^2 on the support of |ψ|^2 it follows that

Var(Q, ψ) ≤ ε^2. (4.16)

Undoubtedly you have been wondering whether quantum mechanics allows the existence of states for which repeated measurements of position will come out close to the same value every time, just as in classical mechanics. You see that for the wave function ψ supported in the small interval [c, d] the variance of these measurements will be small. In fact the higher moments ∫_R (x − x₀)^{2n}|ψ(x)|² dx ≤ ε^{2n} are also small. So here is a quantum state of the particle in which not only is the average value of many measurements equal to x₀, but the deviation of these measurements from x₀ is also small in the usual statistical sense. In fact, the probability that a measurement of position will lie outside the interval [c, d] is zero by the axiom of measurement, isn't it.
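The bound (4.16) is easy to check numerically. Here is a minimal sketch, assuming nothing beyond (4.14)-(4.16); the sine-bump profile of ψ is an arbitrary choice of a wave function supported in [c, d].

```python
import numpy as np

# A wave function supported in [c, d] with d - c = eps, normalized in L^2(R).
c, d = 0.3, 0.4                  # support interval
eps = d - c
x = np.linspace(0.0, 1.0, 20001)
dx = x[1] - x[0]
psi = np.where((x > c) & (x < d), np.sin(np.pi * (x - c) / eps), 0.0)
psi /= np.sqrt((psi**2).sum() * dx)          # enforce ||psi|| = 1

x0 = (x * psi**2).sum() * dx                 # mean position, lies in [c, d]
var_Q = ((x - x0)**2 * psi**2).sum() * dx    # variance (4.15)

assert c <= x0 <= d                          # (4.14)
assert var_Q <= eps**2                       # the bound (4.16)
```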

Now the dispersion of measurements of the momentum in this same state ψ can be computed the same way, using (4.13) instead of (4.12). The variance of momentum measurements in the state ψ is therefore

Var(P, ψ) = ∫_R (p − p₀)²|ψ̂(p)|² dp    (4.17)

where

p₀ = ∫_R p|ψ̂(p)|² dp    (4.18)

is the expected value of these measurements. We were able to choose ψ so that the variance Var(Q, ψ) was small. But having chosen ψ, its Fourier transform is no longer under our control. The next theorem shows that you can't measure both position and momentum with arbitrarily good accuracy.

Theorem 4.3 (Heisenberg uncertainty principle) For any unit vector ψ ∈ L²(R),

Var(Q, ψ) Var(P, ψ) ≥ ℏ²/4.    (4.19)
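Both variances can be computed numerically for a concrete state, and a Gaussian wave packet (nearly) saturates the bound (4.19). The following sketch sets ℏ = 1 and uses the fact that for a real-valued ψ the momentum mean p₀ is 0 and Var(P, ψ) = ℏ²∫(ψ′)² dx; the particular width σ is arbitrary.

```python
import numpy as np

hbar = 1.0
sigma = 0.7
x = np.linspace(-12, 12, 40001)
dx = x[1] - x[0]
# Normalized Gaussian wave packet; it saturates the bound (4.19).
psi = (2 * np.pi * sigma**2) ** (-0.25) * np.exp(-x**2 / (4 * sigma**2))

x0 = (x * psi**2).sum() * dx
var_Q = ((x - x0)**2 * psi**2).sum() * dx
# For real psi the momentum mean is 0 and Var(P, psi) = hbar^2 * int (psi')^2 dx.
dpsi = np.gradient(psi, dx)
var_P = hbar**2 * (dpsi**2).sum() * dx

product = var_Q * var_P
assert product >= hbar**2 / 4 - 1e-4       # Heisenberg bound (4.19)
assert abs(product - hbar**2 / 4) < 1e-3   # the Gaussian nearly saturates it
```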


This theorem looks like a theorem in Fourier analysis, which it is. But it is also a direct consequence of the following fundamental identity of Heisenberg, which itself follows immediately from the product rule for derivatives.

The Heisenberg commutation relations

PQ − QP = −iℏ (Identity operator on H).    (4.20)

Indeed, since P = −iℏ(d/dx) and (Qψ)(x) = xψ(x), this identity merely says that (−iℏ)((xψ(x))′ − xψ′(x)) = (−iℏ)ψ(x).
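The identity (4.20) can be checked symbolically; the following sketch just repeats the product-rule computation above with sympy.

```python
import sympy as sp

x = sp.symbols('x', real=True)
hbar = sp.symbols('hbar', positive=True)
psi = sp.Function('psi')(x)

P = lambda f: -sp.I * hbar * sp.diff(f, x)   # P = -i hbar d/dx
Q = lambda f: x * f                          # Q = multiplication by x

# (PQ - QP) psi: the product rule produces exactly one extra term.
commutator = sp.expand(P(Q(psi)) - Q(P(psi)))
assert sp.simplify(commutator + sp.I * hbar * psi) == 0   # equals -i hbar psi
```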

All formulations of quantum mechanics build in the Heisenberg commutation relations (4.20) in one way or another. It is precisely the non-commutativity encompassed by (4.20) that distinguishes classical mechanics from quantum mechanics, in the view of many.

We will show now that (4.20) implies (4.19). First let's get straight the general definition of variance.

Definition 4.4 Let A be a self-adjoint operator on a Hilbert space H and let ψ be a unit vector in the domain of A. The mean and variance of A in the state ψ are defined by

a = (Aψ, ψ)    (Mean of A in the state ψ).    (4.21)

Var(A, ψ) = ‖(A − a)ψ‖²    (Variance of A in the state ψ).    (4.22)

This is a good time to verify that this definition agrees with our previous definitions of Var(Q, ψ) and Var(P, ψ).

Theorem 4.5 Let A and B be self-adjoint operators on a Hilbert space H. Assume that for some complex number c and for all vectors ψ ∈ Dom(AB) ∩ Dom(BA) there holds

(AB − BA)ψ = cψ.    (4.23)

Then, for all unit vectors ψ ∈ Dom(AB) ∩ Dom(BA),

‖Aψ‖‖Bψ‖ ≥ |c|/2    (4.24)

and

Var(A, ψ) Var(B, ψ) ≥ |c|²/4.    (4.25)


Proof. {Reference: von Neumann [60, page 234].} If ψ ∈ Dom(AB) ∩ Dom(BA) then the following computation is justified:

2‖Aψ‖‖Bψ‖ ≥ 2 Im(Aψ, Bψ)
           = (−i)[(Aψ, Bψ) − (Bψ, Aψ)]
           = (i)((AB − BA)ψ, ψ)
           = (i)(cψ, ψ)
           = ic‖ψ‖².

Take absolute values to deduce (4.24) in case ‖ψ‖ = 1. By the way, notice that the last four equalities imply that c must be purely imaginary if (4.23) holds.

Now let a = (Aψ, ψ) and b = (Bψ, ψ). Then A − aI and B − bI also satisfy the commutation relations (4.23). Therefore (4.24) shows that

‖(A − aI)ψ‖‖(B − bI)ψ‖ ≥ |c|/2.    (4.26)

This proves (4.25).

Taking now A = P and B = Q and c = −iℏ we see that (4.19) is just a special case of (4.25). But while we're at it let's formulate the Heisenberg commutation relations for a system whose configuration space is Rⁿ. (E.g. n = 3, 6, . . . are interesting.) You probably don't need me to tell you that for a system of particles whose configuration space is Rⁿ, the Hilbert state space should be taken as H = L²(Rⁿ). Moreover the position operator corresponding to the jth coordinate, xⱼ, on Rⁿ should be taken to be Qⱼ = M_{xⱼ}, while the observable "jth component of momentum" is represented by the operator Pⱼ = −iℏ∂/∂xⱼ. Now the product rule for derivatives gives, just as before (compare (4.20)),

PⱼQₖ − QₖPⱼ = −iℏδⱼₖ I_H.    (4.27)

For any unit vector ψ we have therefore the joint uncertainty relations

Var(Qₖ, ψ) Var(Pⱼ, ψ) ≥ (ℏ²/4)δⱼₖ.    (4.28)

Of course if j ≠ k then (4.27) says that Pⱼ and Qₖ commute, while (4.28) says nothing.

END of DAY 14 = 3/10/2011


4.5 Dynamics and stationary states

We have so far discussed only lines (i), (ii) and (iii) in Table 4. These deal only with the description of the quantum analog of instantaneous states, i.e., kinematics. Now we want to discuss the rules by which quantum mechanics asserts that states change with time. Line (iv) in Table 4 specifies that the time evolution is guided by the Schrodinger equation.

iℏ dψ(t)/dt = Hψ(t),   ψ(0) = ψ₀    (Schrodinger equation)    (4.29)

where H is the total energy operator on the state space H. In our example of a particle moving in R³ under the force F = −grad V the total energy operator is

H = −(ℏ²/2m)∆ + M_V    (4.30)

as we see from Table 5.

It is appropriate to write d/dt in (4.29) because one should think of ψ(·), in the abstract, as a function of one variable, t, into H. But when H is itself a function space, as in Table 2, it is appropriate to write (4.29) as a partial differential equation:

iℏ ∂ψ/∂t = −(ℏ²/2m)∆ψ(x) + V(x)ψ(x),    (4.31)

which is the concrete form that the Schrodinger equation takes for our Examples 4.1 and 4.2.

The solution to (4.29) is simply given, by the spectral theorem, as

ψ(t) = e^{−i(t/ℏ)H}ψ₀.    (4.32)

The operators e^{−i(t/ℏ)H} are unitary operators on H for each real t, again by the spectral theorem. Of course it is one thing to cite a powerful theorem like the spectral theorem to ensure the existence of a solution to the Schrodinger equation and quite another to actually find the solution explicitly. But the abstract solution (4.32) allows us to understand some of the physical meaning of the dynamics.

Definition 4.6 An eigenfunction for the energy operator H is called a sta-tionary state of the system.


So why should an eigenfunction be called a stationary state? Well, if Hψ₀ = λψ₀ then the solution (4.32) reads

ψ(t) = e^{−i(t/ℏ)λ}ψ₀    (4.33)

because e^{−i(t/ℏ)H}ψ₀ = e^{−i(t/ℏ)λ}ψ₀ for such an eigenfunction. BUT, for each real t, e^{−i(t/ℏ)λ}ψ₀ is just a multiple of ψ₀ by a constant of absolute value one. Consequently ψ(t) and ψ₀ represent the same physical state, as we have already observed above. In other words although the vector ψ(t) changes with time, the state ψ(t) does not change with time. So we call it a stationary state. Isn't this a reasonable terminology? Yes.
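In finite dimensions the spectral theorem is just diagonalization, and both claims — the unitarity of e^{−itH} and the pure-phase evolution (4.33) of an eigenvector — can be verified directly. A toy sketch with ℏ = 1; the random 4×4 Hermitian matrix merely stands in for H.

```python
import numpy as np

rng = np.random.default_rng(0)
# A toy Hamiltonian: any Hermitian matrix stands in for H (hbar = 1).
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (B + B.conj().T) / 2
evals, evecs = np.linalg.eigh(H)

def evolve(psi0, t):
    # psi(t) = exp(-itH) psi0, built from the spectral decomposition of H
    return evecs @ (np.exp(-1j * t * evals) * (evecs.conj().T @ psi0))

# Unitarity: the norm of any state is preserved.
psi0 = rng.standard_normal(4) + 1j * rng.standard_normal(4)
psi0 /= np.linalg.norm(psi0)
assert abs(np.linalg.norm(evolve(psi0, 2.7)) - 1) < 1e-12

# A stationary state only picks up a phase: the same physical state.
phi0 = evecs[:, 0]                      # eigenvector: H phi0 = evals[0] phi0
phit = evolve(phi0, 2.7)
assert np.allclose(phit, np.exp(-1j * 2.7 * evals[0]) * phi0)
```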

Here is some physical interpretation of stationary states that relates to our classical view of one particle orbiting another (earth around sun, or electron around nucleus). A classical particle orbiting another in an elliptical orbit is of course not standing still. Its position is changing with time. But the orbit is not changing. It remains the same ellipse forever. What Heisenberg realized (or conjectured, or believed, or thought, or theorized) while he was on the island Helgoland during hay fever season in May, 1925 was that you shouldn't really talk about the orbit of the electron in too much detail because you can't measure the electron's position anyway, without destroying the orbit. E.g. if you "look at" the electron, i.e. bounce even just one photon off of it, you will change its velocity so much that it will no longer be in the same orbit. By contrast, if you "look at" a planet, i.e. reflect a few trillion trillion photons into your telescope, it will not change the orbit of the planet perceptibly. So you can see where the planet is at any instant. Heisenberg once complained that his school chemistry book explained the bonding of chemical elements by showing the atoms with hooks attached so that one atom would grab on to the hook of another atom. In his view this was, shall we say politely, misleading. But he viewed the classical picture of elliptic orbits for electrons as misleading also because you could not, even in principle, measure the position of the electron in such an orbit. Instead he took the whole orbit itself as the physical object to study rather than the position of the electron in the orbit. In his view the only thing that could actually be measured was the energy of these orbits and the rate of transition from one orbit to another (under the influence of bumping by photons).


4.6 Hydrogen

As we know, the potential that gives the force between two charged particles is proportional to their inverse distance. (See (??).) We want to consider the example which models the hydrogen atom. We are going to take the proton fixed at the origin and allow the electron to wander around R³ subject to the attractive Coulomb force exerted by the proton. The potential is given by

V(x) = −γ/|x|,   x ∈ R³    (4.34)

for some positive constant γ, which will be discussed some more later. You might like to check that this sign assures that the force, −∇V, always points toward the origin. This artifice of fixing the proton at the origin reduces our mechanical system to one whose Hilbert space is L²(R³) and allows us to apply the machinery of Table 4. Fixing the proton at the origin, instead of allowing the two particles to move freely, which would entail taking configuration space to be R⁶, forces a minor correction in the mass of the electron, which will be explained later. The Hamiltonian operator for the hydrogen atom is

H = −(ℏ²/2m)∆ − γ/|x|,    (4.35)

where m is the mass of the electron. (This will be corrected later.) H is a self-adjoint operator on L²(R³), when its domain is chosen right. The eigenvalue equation for this operator is the partial differential equation

−(ℏ²/2m)∆ψ − (γ/|x|)ψ = Eψ    (4.36)

where E is a real number and ψ is subject to the condition ψ ∈ Dom(H). The solutions to this equation may be found by using spherical coordinates (r, θ, φ). One seeks a solution, as usual, in the form ψ(r, θ, φ) = u(r) times functions of θ and φ. This leads to an ODE for each of the three functions. You can consult a standard quantum mechanics book, such as Schiff [51, pages 80-87], for details on how to carry this out, if you really want to know. Here is the result. Let

Eₙ = −κ/n²,   n = 1, 2, . . . ,    (4.37)

where κ = mγ²/(2ℏ²). Then there are functions of the form

ψ_{n,ℓ,m}(r, θ, φ) = u_{n,ℓ}(r) Y_{ℓ,m}(θ, φ),   ℓ = 0, 1, . . . , n − 1,   −ℓ ≤ m ≤ ℓ    (4.38)


such that

−(ℏ²/2m)∆ψ_{n,ℓ,m} − (γ/|x|)ψ_{n,ℓ,m} = E_{n+ℓ} ψ_{n,ℓ,m}.    (4.39)

(As a matter of culture you might like to know that the functions Y_{ℓ,m} are the spherical harmonics on S², while the functions u_{n,ℓ}(r) are essentially the Laguerre functions on (0,∞).) The set {E₁, E₂, . . . } is the entire point spectrum of H while the functions ψ_{n,ℓ,m} form an orthonormal basis of the subspace of L²(R³) spanned by all eigenfunctions. The remainder of the spectrum of H is the interval [0,∞).

What does this have to do with the line spectrum of hydrogen? The numbers, Eₙ, that we arrived at are energies, whereas the positions of the spectral lines are determined by frequencies (equivalently, wavelengths), as in the Rydberg formula, (4.2). Here is the connection.

Planck's hypothesis
a) Light comes in little packets called photons.
b) The energy of a photon of frequency ν is

E = hν,    (4.40)

for some constant h independent of ν and to be determined by experiment.

Now we know that a hydrogen atom, left to itself, and starting in a stationary state, such as ψ_{n,ℓ,m}, will remain in that state permanently, because that's what the rules in Table 4 tell us. But just suppose that, for some reason, the orbiting electron suddenly drops into a lower energy orbit, ψ_{n′,ℓ′,m′}. That is, E_{n′+ℓ′} < E_{n+ℓ}. What happens to the energy that it loses? Well, just suppose that the energy that it loses is somehow radiated away by the emission of a photon. By Planck's hypothesis the frequency of the emitted photon is given by (4.40). Therefore the frequency of the emitted photon is ν = h⁻¹(E − E′). In view of (4.37) and (4.39) the frequency of the emitted photon is

ν = (mγ²/(2hℏ²)) (1/(m′)² − 1/m²)    (4.41)

where m′ = n′ + ℓ′, m = n + ℓ and γ, which is defined in (4.34), is proportional to the product of the electric charges (electron and proton) in suitable units. Thus we have derived the Balmer-Rydberg formula, (4.2), including the value of Rydberg's constant R ! ! ! (Of course we have to know Planck's constant.)
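Plugging numbers into the derived formula reproduces the visible Balmer lines. The sketch below uses the standard measured value κ ≈ 13.606 eV for the Rydberg energy; this number is an experimental input here, not something computed in the text.

```python
# kappa = m gamma^2 / (2 hbar^2) is the Rydberg energy, about 13.606 eV
# for hydrogen (standard measured value, assumed here).
kappa_eV = 13.605693
h_eV_s = 4.135667696e-15       # Planck's constant in eV * s
c_m_s = 2.99792458e8           # speed of light in m/s

def transition_frequency(n_upper, n_lower):
    # nu = (kappa / h) * (1/n_lower^2 - 1/n_upper^2), the Rydberg formula
    return (kappa_eV / h_eV_s) * (1.0 / n_lower**2 - 1.0 / n_upper**2)

nu = transition_frequency(3, 2)        # the H-alpha line of the Balmer series
wavelength_nm = c_m_s / nu * 1e9
assert abs(wavelength_nm - 656.1) < 1.0    # observed H-alpha is ~656.3 nm
```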


SUMMARY: We have derived the line spectrum of hydrogen from the rules of quantum mechanics (Tables 4 and 5) in conjunction with Planck's hypothesis, (4.40).

Let's back up a little and admit that some things are just a little fishy with this "derivation" of the hydrogen spectrum.

First, where did Planck's hypothesis come from? Answer: After previous failed attempts (by Wien (1893) and Lord Rayleigh (1900)) to derive a formula for the experimentally observed graph, Figure 7, for blackbody radiation, Planck derived a formula that fit the data, using the hypothesis (4.40) as an input. The value of h that makes this formula fit the data can be deduced from the experimental data. Planck announced this in December of 1900, which some identify as the beginning of quantum mechanics. Moreover, in 1905, Einstein used Planck's hypothesis to show that a beam of light, shining on a piece of metal in an evacuated tube, will knock electrons out of the metal if and only if the frequency of the light is high enough. This is the photoelectric effect. Einstein's thinking was that the energy of little corpuscles of light, i.e. photons as we call them now, must be high enough to overcome the forces at the surface of the metal, which are trying to prevent electrons from escaping. But according to Planck (4.40), this means that the frequency of the impinging light must be high enough. Einstein's deductions were soon confirmed experimentally (by others) and Einstein won a Nobel Prize. Thus for this reason and other nice ways in which Planck's hypothesis was used successfully, the formula (4.40) was widely accepted by 1925, when our Tables 4 and 5 originated.

Second, why should the electron suddenly drop down from a higher energy orbit to a lower one? And if all the electrons in all the atoms keep dropping down to lower energy orbits there won't be any higher energy electrons left, soon, to drop down. There must be some mechanism operating that makes them jump up to a higher energy orbit. What is it?

Third, the mechanism for explaining why and how the atom emits a photon when jumping is entirely missing. The electromagnetic field that represents the photon mathematically must somehow enter the equations. But the electromagnetic field is nowhere in sight in Table 5 or in the computations at the beginning of this section. Furthermore, looking back at Chapter 3 on electricity and magnetism, where do you see any mathematical structure that suggests such a discrete notion as "one photon, two photons . . . "? Answer: You don't.

Resolution: Not only must our notion of quantization be applied to the electron orbiting the proton but also to the electromagnetic field that is produced by the electron and in turn influences the electron. Whereas a particle moving in R³ has only three (or six) degrees of freedom, any classical field theory has infinitely many degrees of freedom (e.g. the space of initial conditions). We can already expect, therefore, that the quantum Hilbert state space for a field theory will be something like L²(R^∞) instead of L²(Rⁿ) for some finite n. This is what we will start on in the next chapter.

Nevertheless the simple theory of the hydrogen atom developed above is a vital ingredient in much of chemistry and should not be looked down on just because we haven't incorporated the interaction with the electromagnetic field. In fact the sets {x ∈ R³ : |ψ_{n,ℓ,m}(x)|² ≥ a}, for various a > 0, are very important to understand for chemistry and are graphed extensively in chemistry books and on the web. Your classmate Yao Liu has suggested the following website, which contains lots of nice pictures of these sets, called orbitals.

http://winter.group.shef.ac.uk/orbitron/

Here is the orbital for ψ_{2,1,1}. The wave function is positive in the green zone and negative in the blue zone.

Figure 9: Orbital for ψ_{2,1,1}

END of DAY 15 = 3/15/11


4.7 Combined systems, Bosons, Fermions

There is a loose set of rules explaining how to construct the Hilbert space representing a system composed of two independent systems in terms of the Hilbert spaces associated to each subsystem.

In classical mechanics the rule is simple: if C₁ and C₂ are the configuration spaces for two systems then the Cartesian product C₁ × C₂ is the configuration space for the combined system. Moreover the total energy is, typically, the sum of the energies of the two subsystems plus another term that represents the energy of interaction. For example there may be forces acting between the two subsystems in addition to the original forces acting within each subsystem.

In quantum mechanics the rule is more or less similar: if H₁ and H₂ are the Hilbert state spaces for two systems then the Hilbert space for the combined system is the tensor product

H = H₁ ⊗ H₂.    (4.42)

This is consistent with the combining rule for classical mechanics because of the natural isomorphism

L²(C₁ × C₂) = L²(C₁) ⊗ L²(C₂),    (4.43)

which holds for whatever measures you use on the factors, if you use the product measure on the product. For example the Hilbert state space for a system of two particles, each free to move in R³, is L²(R⁶) because the configuration space of the system composed of two particles, each moving in R³, is R³ × R³ = R⁶. But by the natural isomorphism (4.43) this Hilbert space can be just as well described as L²(R³) ⊗ L²(R³).
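The isomorphism (4.43) can be seen in miniature by discretizing each factor: a tensor product of functions becomes a Kronecker product of vectors, and inner products factor accordingly. A sketch (the dimensions 5 and 7 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
# Discretize L2(C1) and L2(C2) as C^m and C^n; the tensor product
# H1 (x) H2 is then C^(m*n), and u (x) v becomes np.kron(u, v).
m, n = 5, 7
f1, g1 = rng.standard_normal((2, m))
f2, g2 = rng.standard_normal((2, n))

lhs = np.dot(np.kron(f1, f2), np.kron(g1, g2))   # inner product in H1 (x) H2
rhs = np.dot(f1, g1) * np.dot(f2, g2)            # product of the factor inner products
assert np.isclose(lhs, rhs)
```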

Here are the two exceptions to the rule (4.42). Suppose that we have two identical particles. In classical mechanics they are still distinguishable, even though identical: you can paint a 1 or a 2 on two identical billiard balls. (Use a thin layer of paint so as not to alter their mechanical properties.) But you can't paint anything on an electron or a photon. In quantum mechanics two identical particles are not only identical but also indistinguishable. If H is the quantum Hilbert space for each of the two identical particles then the Hilbert space for the pair is not H ⊗ H. Rather, denoting by P : H ⊗ H → H ⊗ H the interchange operator, which is the unique unitary operator extending the transposition map P(u ⊗ v) = v ⊗ u, one insists that the state space of the two particles is just the set of those vectors in H ⊗ H which are indistinguishable under this transposition. There are two known types of particles which are subject to this rule:

A Boson is a particle which, when combined with a particle identical to itself, has for its combined state space

{ψ ∈ H ⊗ H : Pψ = ψ}.    (Bosons)    (4.44)

A Fermion is a particle which, when combined with a particle identical to itself, has for its combined state space

{ψ ∈ H ⊗ H : Pψ = −ψ}.    (Fermions)    (4.45)
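On a finite-dimensional stand-in for H these two subspaces are easy to exhibit: the interchange operator P is a permutation matrix, and the projections (I ± P)/2 pick out the Pψ = ±ψ vectors. A sketch with dim H = 4; the dimension counts n(n+1)/2 and n(n−1)/2 are the standard ones for the symmetric and antisymmetric subspaces.

```python
import numpy as np

n = 4                      # dimension of the one-particle space H
# Interchange operator P on H (x) H: P(u (x) v) = v (x) u.
P = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        P[j * n + i, i * n + j] = 1.0   # sends e_i (x) e_j to e_j (x) e_i

assert np.allclose(P @ P, np.eye(n * n))   # P is a unitary involution

# Projections onto the bosonic (P psi = psi) and fermionic (P psi = -psi) subspaces.
sym = (np.eye(n * n) + P) / 2
anti = (np.eye(n * n) - P) / 2
assert round(np.trace(sym)) == n * (n + 1) // 2    # bosonic dimension
assert round(np.trace(anti)) == n * (n - 1) // 2   # fermionic dimension
```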

The fact that the particle interchange operator changes ψ to −ψ for Fermions should not cause you any worry as to whether interchanging particles gives you a different state, because, after all, ψ and −ψ define the same state: no measurement can distinguish them. You might want to review Table 4 and its "Advisement" following equation (4.4).

A very different example of this product rule arises when taking into account the idea that an electron, although regarded as a point, has some kind of "internal structure". The Hilbert state space for an electron with spin is

H = L2(R3)⊗ C2. (4.46)

The first factor reflects the fact that the classical configuration space for the electron is R³. The second factor, which is just two dimensional, does not have any classical analog. What is it doing there?² With it, you can explain the periodic table, the Zeeman effect, heat capacity of metals, etc. etc. ... . Without it ... .

The physical significance of the extra factor C² in (4.46) can be understood in a way that is predictive of things to come, concerning the classification of elementary particles, that we will address later. We will continue this discussion later in the section on spin.

² 1924-1926, Pauli, Goudsmit, Uhlenbeck (former father-in-law of our own Karen Uhlenbeck)


4.8 Observables from group representations

Symmetry is everywhere in physics. Symmetry usually means that there is a group acting on some set and that two points a and b are physically equivalent if there is a group element g such that g(a) = b. For example if the set consists of the positions and velocities of five particles moving in R³ then the induced action of SO(3) on this 30 dimensional space converts one such instantaneous state into another one with isomorphic physics because the laws of physics are rotation invariant. (If you want to include a gravitational force from the earth you must include the earth in the system.) The same invariance of classical mechanics holds also for the Euclidean group, which is six dimensional. You can also change the zero setting of your clock, thereby allowing the Galilean group, which is seven dimensional. If you want to do correct physics at very high speeds, however, you will have to replace the Galilean group by the Poincare group, which is ten dimensional. In this case you are into special relativity. See Section 9.6.

In quantum mechanics the isomorphisms are the unitary operators. Thus if H is the Hilbert state space for a system and A is the operator representing an observable of interest, and if U : H → K is a unitary operator, then the Hilbert space K can just as well be taken as the Hilbert state space of the system, provided you now represent the previous observable by the induced operator UAU⁻¹. The expected value (Aψ, ψ) = (UAU⁻¹(Uψ), (Uψ)) doesn't change in the corresponding state. Therefore quantum mechanical symmetry with respect to some group G usually means that there is a unitary representation of the group lurking nearby. Recall that a unitary representation of a group G is a homomorphism from G into the unitary group on some Hilbert space. If G is a Lie group then one has to assume that the homomorphism is continuous in some suitable sense.

The link between group representations and observables descends from the following theorem.

Theorem 4.7 (Stone-von Neumann theorem) Let t ↦ U(t) be a function from R into the group of unitary operators on a complex Hilbert space H. Assume

a) U(t + s) = U(t)U(s), for all t, s ∈ R
b) t ↦ U(t)ψ is a continuous function on R to H for each vector ψ ∈ H.

Then there exists a unique self-adjoint operator A on H such that

U(t) = e^{itA} for all t ∈ R.    (4.47)


Moreover A is given by

iAψ = (d/dt)|_{t=0} U(t)ψ    (4.48)

for all ψ in the domain of A.
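In finite dimensions the theorem is elementary, and both directions can be checked numerically: build U(t) = e^{itA} from a Hermitian matrix A via the spectral theorem, then recover iA by differentiating at t = 0 as in (4.48). A toy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = (B + B.conj().T) / 2                 # a self-adjoint generator

evals, evecs = np.linalg.eigh(A)
def U(t):
    # U(t) = e^{itA}, built from the spectral decomposition of A
    return evecs @ np.diag(np.exp(1j * t * evals)) @ evecs.conj().T

s, t = 0.4, 1.1
assert np.allclose(U(s + t), U(s) @ U(t))             # a) group law
assert np.allclose(U(t).conj().T @ U(t), np.eye(3))   # each U(t) is unitary

# (4.48): the derivative at t = 0 recovers iA, here by a forward difference.
h = 1e-6
psi = rng.standard_normal(3) + 1j * rng.standard_normal(3)
deriv = (U(h) @ psi - psi) / h
assert np.allclose(deriv, 1j * A @ psi, atol=1e-4)
```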

Proof. Find it in any book that has "quantum" and "mathematicians" in the title.

Many of the fundamental observables that arise in quantum mechanics and quantum field theory are produced naturally by this theorem. We are going to show first how the angular momentum operator arises in exactly this way. In the next section we will do the same for spin.

4.8.1 Example: Angular momentum

The rotation group SO(3) acts naturally on L2(R3) by the map

U_R : ψ ↦ ψ ∘ R⁻¹,   ψ ∈ L²(R³), R ∈ SO(3).    (4.49)

Since a rotation preserves Lebesgue measure, and U_R⁻¹ = U_{R⁻¹}, the map U_R is unitary. Moreover it is a simple exercise (if you pay a little attention to the definition) to show that U_{R₁R₂} = U_{R₁}U_{R₂}. Therefore the map R ↦ U_R is a unitary representation of SO(3) on L²(R³).

Consider the one-parameter group of rotations given by

R(θ) =
⎛ cos θ  −sin θ  0 ⎞
⎜ sin θ   cos θ  0 ⎟
⎝   0       0    1 ⎠    (4.50)

By (4.49) these rotations act naturally on L²(R³), giving the following function of θ into L²,

θ ↦ ψ ∘ R(θ)⁻¹,   θ ∈ R    (4.51)

for each function ψ ∈ L².

Let's see what observable we get from this one-parameter group of unitaries by the recipe (4.48). Of course we replace t by θ in (4.48). Here is the computation:

(∂/∂θ) ψ(R(θ)⁻¹(x, y, z)) = (∂/∂θ) ψ(x cos θ + y sin θ, −x sin θ + y cos θ, z)
                          = (−x sin θ + y cos θ)ψ_x + (−x cos θ − y sin θ)ψ_y


So at θ = 0 we find

(∂/∂θ)|_{θ=0} ψ(R(θ)⁻¹(x, y, z)) = (y ∂/∂x − x ∂/∂y)ψ.    (4.52)

Now look all the way back at Table 5, line 5, which describes the z component of the angular momentum operator. Back then we derived that operator from the classical angular momentum x × p by just making the usual replacement p → −iℏ∇. But now we see that we have arrived at the same operator from the action of the rotation group on L²(R³). Specifically, comparing with (4.52) we see that

(L_zψ)(x, y, z) = −iℏ (∂/∂θ)|_{θ=0} ψ(R(θ)⁻¹(x, y, z)).    (4.53)
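The computation leading to (4.52) and (4.53) can be sanity-checked numerically on a concrete test function: differentiate ψ(R(θ)⁻¹(x, y, z)) in θ by a finite difference and compare with (y ∂/∂x − x ∂/∂y)ψ. A sketch; the Gaussian-times-x test function is an arbitrary choice for which the right-hand side works out in closed form.

```python
import numpy as np

def psi(x, y, z):
    # a smooth test function on R^3
    return x * np.exp(-(x**2 + y**2 + z**2))

def Rz_inv(theta, x, y, z):
    # R(theta)^{-1} applied to (x, y, z), as in the computation above
    return (x * np.cos(theta) + y * np.sin(theta),
            -x * np.sin(theta) + y * np.cos(theta),
            z)

x, y, z = 0.3, -0.7, 0.5
h = 1e-5
# d/d theta at theta = 0 of psi(R(theta)^{-1}(x,y,z)), by central difference
num = (psi(*Rz_inv(h, x, y, z)) - psi(*Rz_inv(-h, x, y, z))) / (2 * h)

# (y d/dx - x d/dy) psi at the same point; for this psi it is y * exp(-r^2).
exact = y * np.exp(-(x**2 + y**2 + z**2))
assert abs(num - exact) < 1e-8
```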

Let's put these pieces together. The exponential of the derivative gives back the one parameter group of unitaries in accordance with (4.47). We therefore have

e^{iθ(L_z/ℏ)}ψ = ψ ∘ R(θ)⁻¹   ∀ θ ∈ R.    (4.54)

Furthermore, by (2.6), the one parameter group R(θ) itself is

R(θ) = e^{θA_ω},   ω = (0, 0, 1)    (4.55)

where A_ω is the element of the Lie algebra so(3) given by (2.5) with a = 1.
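The relation (4.55) and the formula A_ωx = ω × x can both be verified numerically for ω = (0, 0, 1). A sketch; a truncated power series stands in for the matrix exponential e^{θA_ω}.

```python
import numpy as np

def expm(M, terms=40):
    # truncated power series for the matrix exponential (stand-in for e^{theta A})
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

# A_omega for omega = (0, 0, 1): the matrix with A x = omega cross x.
A = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 0.0]])

theta = 0.9
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

assert np.allclose(expm(theta * A), R)   # (4.55): R(theta) = e^{theta A_omega}
x = np.array([0.2, -1.3, 0.7])
assert np.allclose(A @ x, np.cross([0.0, 0.0, 1.0], x))   # A_omega x = omega cross x
```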

SUMMARY:
a) If we write A_ω as the element (2.5) of the Lie algebra so(3) which "generates" rotation around the z axis, i.e., (4.55) holds, and if L_z is the operator on L²(R³) corresponding to the observable "z-component of angular momentum", then

e^{iθ(L_z/ℏ)}ψ = ψ ∘ e^{−θA_ω}   ∀ θ ∈ R    (4.56)

and

(i/ℏ)L_zψ = (∂/∂θ)|_{θ=0} ψ ∘ e^{−θA_ω}.    (4.57)

b) Starting now with an arbitrary vector ω ∈ R³, the previous computations show that the procedure of differentiating the one parameter group of unitaries θ ↦ (ψ → ψ ∘ e^{−θA_ω}) at θ = 0 gives a skew adjoint operator (i/ℏ)L_ω, where L_ω is the angular momentum operator in the direction ω (times |ω| of course). We thereby have a linear map from the Lie algebra of SO(3) to the (self-adjoint) angular momentum operators:

so(3) ∋ A_ω ↦ L_ω = angular momentum operator in direction ω,    (4.58)

where A_ωx = ω × x.

c) For any unitary representation of any Lie group G, Theorem 4.7 induces a map from g, the Lie algebra of G, to operators on the representation Hilbert space. We will see over and over that this map assigns to the elements of certain Lie algebras operators whose physical interpretation is central to quantum mechanics and quantum field theory. The next section will do this for electron spin.

4.9 Spin

We asserted in Section 4.7 that the Hilbert state space for an electron with spin is H ≡ L²(R³) ⊗ C². This is not a statement with content until one knows what operators on H have what physical interpretation. Here is the physical significance of the factor C². Let a ∈ SU(2). The homomorphism ρ : SU(2) → SO(3) described in Appendix 9.3 immediately gives us a representation of SU(2) on H by the formula

W(a) = U_{ρ(a)} ⊗ a : L²(R³) ⊗ C² → L²(R³) ⊗ C²    (4.59)

where U_R is the unitary operator on L²(R³) defined in (4.49). You should verify at your leisure (like now for example) that W(ab) = W(a)W(b) and that W(a) is actually unitary. So W is indeed a unitary representation of SU(2). We are going to apply Theorem 4.7 and see what we get.

Let

s_z = (i/2) ⎛ 1   0 ⎞
            ⎝ 0  −1 ⎠

as in Appendix 9.3. Let us compute the observable corresponding to the one parameter group of unitaries θ ↦ W(e^{θs_z}). We've already done the hard part in Section 4.8.1. The product rule for differentiation gives, in view of (4.57),

(∂/∂θ) W(e^{θs_z}) = (∂/∂θ) (U_{ρ(e^{θs_z})} ⊗ e^{θs_z}) = (i/ℏ)L_z ⊗ I_{C²} + I_{L²} ⊗ s_z    (4.60)

at θ = 0. Therefore

−iℏ (∂/∂θ)|_{θ=0} W(e^{θs_z}) = L_z + (−iℏ s_z).    (4.61)


CAUTION: I've omitted the identity operator factors in (4.61), as is common in the physics literature. But they should be there, just as in (4.60).

Define

J_z = L_z + (−iℏ s_z), which operates on L²(R³) ⊗ C².    (4.62)

Terminology: J_z is called the (z component of the) total angular momentum. It is a sum of L_z, which is called the (z component of the) orbital angular momentum, and −iℏ s_z, which is called the (z component of the) spin angular momentum. These names are themselves suggestive and therefore useful to know. By accepting J_z as the "(z component of) the quantum mechanical angular momentum operator", we acknowledge that the straightforward quantization procedure, p ↦ −iℏ∇, x ↦ M_x, which led us to define the angular momentum operator in Table 5 from the classical angular momentum x × p, may not capture the physically correct operator corresponding to a given classical observable. In our present example there is simply no classical analog of a point particle of finite mass for which spinning around its center makes sense.
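The spin operator −iℏs_z is a concrete 2×2 matrix, so its advertised properties — Hermiticity, eigenvalues ±ℏ/2 — can be checked directly, along with the tensor-product bookkeeping in (4.62). A sketch with ℏ = 1; the diagonal 3×3 matrix below is only a stand-in for L_z with a simple chosen spectrum, not the actual differential operator.

```python
import numpy as np

hbar = 1.0
sz = (1j / 2) * np.array([[1.0, 0.0], [0.0, -1.0]])   # s_z, as in Appendix 9.3
Sz = -1j * hbar * sz                                  # the observable -i hbar s_z

assert np.allclose(Sz, Sz.conj().T)                                # Hermitian
assert np.allclose(np.linalg.eigvalsh(Sz), [-hbar / 2, hbar / 2])  # spin down / up

# Tensor-product bookkeeping of (4.62): J_z = L_z (x) I + I (x) (-i hbar s_z).
# Lz_toy is a stand-in for L_z with spectrum {-1, 0, 1} (hypothetical choice).
Lz_toy = hbar * np.diag([-1.0, 0.0, 1.0])
Jz = np.kron(Lz_toy, np.eye(2)) + np.kron(np.eye(3), Sz)
expected = sorted(l + s for l in (-1.0, 0.0, 1.0) for s in (-0.5, 0.5))
assert np.allclose(np.linalg.eigvalsh(Jz), expected)   # orbital + spin eigenvalues
```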

Remarkably, the mechanism we illustrated in Section 4.8.1 for producing an angular momentum operator without spin, and in the present section for producing an angular momentum operator with spin, seems to apply to all of the fundamental observables in quantum mechanics and quantum field theory. One starts with some Lie group G and some unitary representation of G and uses the mechanism of Theorem 4.7 to map the Lie algebra of G to operators corresponding to some possibly non-classical observables. But which group G and which unitary representation? Ah! There's the rub. This brings us to the front lines of the classification of elementary particles. But we will develop this procedure in the context of relativity theory first, before going on to elementary particles.

SUMMARY:
Taking the viewpoint that the angular momentum operator should always be defined as the image of a Lie algebra element in some representation of the rotation group, or more generally its covering group SU(2), we saw that the natural representation of SU(2) on L²(R³) ⊗ C² produces an angular momentum operator which is a sum of the classical angular momentum operator, coming from the motion of the point electron around the origin,


plus a term which seems to come from the point itself spinning around its center! Go figure.

But it works: if you place an atom in a magnetic field then the interaction of the magnetic field with the current (produced of course by the moving electron) changes the Hamiltonian and therefore the spectrum. This shows up experimentally in that some of the spectral lines change when a magnetic field is turned on around the atom. In fact where there was one spectral line before, there are now three, or five, or more. The change of Hamiltonian produced by the interaction of the magnetic field with the orbital angular momentum is adequate to explain the split into three lines (Zeeman effect) but not into five or more lines (anomalous Zeeman effect). However the spinning of the electron does the trick: the electron becomes a little magnet independently of its orbital motion. This magnet interacts with the magnetic field, changing the Hamiltonian even more. Result: spectral lines split, according to the modified Hamiltonian, (almost) precisely in agreement with experiment. Why (almost)? Because we have not yet taken special relativity into account.

A final word on terminology in the physics literature: ignore the orbital motion of the electron for the moment and consider just its internal Hilbert state space, which is C². Each of the three Hermitian spin angular momentum operators −iℏs_x, −iℏs_y, −iℏs_z has two one dimensional eigenspaces in C². These eigenspaces are not in any sense parallel because these three operators don't commute. But in many computations it is convenient and customary to single out the z axis (e.g. one might take the magnetic field along the z axis). And then the z component of the spin angular momentum −iℏs_z plays a special role. Since this operator is proportional to the matrix

⎛ 1   0 ⎞
⎝ 0  −1 ⎠

its eigenvectors are (1, 0) and (0, 1), the first corresponding to the spin axis pointing up along the z axis and the second pointing down. It's customary to refer to the first as a "spin up" state and to the second as a "spin down" state. But in the absence of a computational or expositional reason to single out the z axis, all directions in C² are of equal physical significance.

4.10 Pictures: Heisenberg vs Schrodinger

If φ_t : T*(C) → T*(C) is the one-parameter diffeomorphism group giving the Newtonian flow and b is a point in T*(C), then a classical observable (which you may recall is specified by a function f on T*(C)) takes the value f(φ_t(b)) at time t if the state at time t = 0 was b. Mathematically there is no difference in saying that the observable itself is changing, that is, f → f ∘ φ_t, while the state b is fixed, because after all (f ∘ φ_t)(b) = f(φ_t(b)). There is a precise analog of these dual views in quantum mechanics. Suppose that H is the Hamiltonian operator on our Hilbert state space, H. Let's put ℏ = 1 for readability. The time evolution of a state ψ₀ under the Schrodinger evolution is ψ(t) = e^{−itH}ψ₀. Suppose that A is a self-adjoint operator on H corresponding to some observable. Then the expected value of A after time t is (Aψ(t), ψ(t)) = (Ae^{−itH}ψ₀, e^{−itH}ψ₀) = (e^{itH}Ae^{−itH}ψ₀, ψ₀). So define

At = eitHAe−itH . (4.63)

Then (Aψ(t), ψ(t)) = (A_tψ_0, ψ_0). Since these inner products determine all the information one can get about the physics, one could interpret this equation by saying that one might just as well regard the states as fixed while the observables change with time. A convenient mathematical nicety of this viewpoint is that the map A → A_t is a ∗ preserving automorphism of the algebra of all bounded operators on H. The flow of diffeomorphisms in classical mechanics is thereby replaced by a flow of automorphisms of a noncommutative algebra. This is a technically and conceptually useful point of view, both for mathematical purposes and for quantum statistical mechanics, as it happens. The view presented in Table 4, in which the states ψ change in accordance with Schrodinger's equation, is called the "Schrodinger picture". The view presented in this section, in which the states ψ remain fixed and the observables change, A → A_t, is called the "Heisenberg picture". These dual views of the time evolution are mathematically equivalent. But each viewpoint has advantages.

Let's consider a single particle free to move in one dimension, for simplicity. Suppose that it has mass m and is subject to a force F = −V′(x), as usual. The Hilbert state space is H = L²(R, dx) and the Hamiltonian is of course

H = (1/2m)P² + M_V.

Let's put ℏ = 1 for readability again. Then

H = −(1/2m) d²/dx² + M_V.

Define
Q(t) = e^{itH}Qe^{−itH}    (4.64)


and
P(t) = e^{itH}Pe^{−itH}.    (4.65)

Theorem 4.8 (Newton's equations in the Heisenberg picture.) When applied to a wave function ψ ∈ C_c^∞(R) the previous operators satisfy the following equations.

m dQ(t)/dt = P(t)    (4.66)

dP(t)/dt = −V′(Q(t))    (4.67)

[P(t), Q(t)] = −iI_H    (equal time CCR)    (4.68)

Proof.
dQ(t)/dt = e^{itH} i(HQ − QH) e^{−itH}    (4.69)

But, since multiplication operators commute,

(HQ − QH)ψ = −(1/2m)( d²/dx²(xψ) − x d²/dx² ψ )    (4.70)
= −(1/m) dψ/dx.    (4.71)

This proves (4.66). Similarly, dP(t)/dt = e^{itH} i(HP − PH) e^{−itH}. But, since P commutes with the kinetic energy term, we find

i(HP − PH)ψ = i( V(x)(−i d/dx) − (−i d/dx)V(x) )ψ(x)    (4.72)
= −V′(x)ψ(x).    (4.73)

This proves (4.67). Finally,

(P(t)Q(t) − Q(t)P(t))ψ = e^{itH}( (−i d/dx)x − x(−i d/dx) ) e^{−itH}ψ    (4.74)
= e^{itH}(−iI_H) e^{−itH}ψ    (4.75)
= −iI_H ψ    (4.76)

SUMMARY: As we know, with p = mv, Newton's equations take the form dp/dt = −V′(x) when the force is given as F(x) = −V′(x). The equations (4.66) and (4.67) are therefore precisely Newton's equations, but for operator valued functions of time. The (equal time) canonical commutation relations (4.68) show, however, that these are not identifiable with classical solutions. Some would say that the commutation relations, (4.68), are the real difference between classical mechanics and quantum mechanics.
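The equality (Aψ(t), ψ(t)) = (A_tψ_0, ψ_0) underlying the two pictures is easy to check numerically in a finite dimensional toy model. The sketch below uses randomly generated Hermitian matrices as stand-ins for H and A (nothing here comes from the text) and computes the propagator e^{−itH} via the spectral theorem:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random Hermitian "Hamiltonian" and "observable" on C^4 (toy stand-ins).
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (B + B.conj().T) / 2
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (C + C.conj().T) / 2

def propagator(t):
    # exp(-itH) via the spectral theorem, valid since H is Hermitian.
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * t * w)) @ V.conj().T

t = 0.7
psi0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
psi0 /= np.linalg.norm(psi0)

U = propagator(t)
psi_t = U @ psi0              # Schrodinger picture: the state moves
A_t = U.conj().T @ A @ U      # Heisenberg picture: the observable moves, (4.63)

schrodinger = np.vdot(psi_t, A @ psi_t)   # (A psi(t), psi(t))
heisenberg = np.vdot(psi0, A_t @ psi0)    # (A_t psi0, psi0)
assert abs(schrodinger - heisenberg) < 1e-10
print(schrodinger.real)
```

The two expectation values agree to machine precision, which is all the equivalence of the two pictures asserts at the level of measurable quantities.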


4.11 Conceptual status of quantum mechanics

Although quantum mechanics gives fantastically good agreement with experiment in all areas of physics, its conceptual foundations are regarded by many (mostly mathematicians) as incomplete and imprecise. In his classical book [60], von Neumann proposed a mathematical interpretation of the measurement process. If, for example, a wave function is a linear combination of two (normalized) eigenstates ψ_1 and ψ_2 of the Hamiltonian with distinct eigenvalues a and b, say ψ = 2^{−1/2}ψ_1 + 2^{−1/2}ψ_2, then a measurement of the system will find the system to be either in state ψ_1 or in state ψ_2, each with probability 1/2. (You must make the measurement many times, starting with the state ψ each time.) But after the measurement the system will remain in state ψ_1, if that's what you found. Thus in the measurement process the system has undergone a sudden change from ψ to ψ_1. This is called collapse of the wave function. To avoid possible inconsistencies associated with collapse of the wave function some leading physicists endorse the notion of "decoherence". A fast exposition of this notion and lots of references can be found on the Wikipedia website

http://en.wikipedia.org/wiki/Decoherence

There is a vast literature aiming to put quantum mechanics on a well motivated and clearly organized axiomatic foundation. The algebra of bounded observables is no longer commutative, as it is in classical mechanics. Moreover the lattice of "question observables" (those for which the answer is yes or no) is very different in classical mechanics from that in quantum mechanics. In classical mechanics the typical question is "Does the system lie in the Borel set B ⊂ T*(C)?" These questions form a lattice, that is, a partially ordered set in which any two elements have at least one upper bound: B_1 ≤ B_2 means B_1 ⊂ B_2, and clearly any two elements have a supremum, namely B_1 ∪ B_2. In quantum mechanics the question observables are specified by projections onto subspaces of the state Hilbert space H. (P_1 ≤ P_2 means the usual.) The properties of these two lattices are very different. One can impose one's favorite conditions on an abstract lattice and ask what conditions imply that the lattice is just like the quantum mechanical one. Axiomatic approaches based on such lattices are ubiquitous. One approach, containing some of this idea, but augmented by more concrete axioms, can be found in the paper of George Mackey [41].

The earliest attempt to formulate mechanics in terms of the algebra of observables, from which one could hope to show that the only examples are classical mechanics or quantum mechanics, is the C* algebra approach of I. E. Segal [54]. Subsequent work showed that examples other than these two could satisfy the axioms. But Segal's approach was later adapted to form a useful conceptual and technical tool in quantum field theory.

Another approach, which goes under the name of "geometric quantization", asks how one might map the space of functions on phase space, T*(C), into corresponding operators on the state Hilbert space H. The classic book on this approach is by Woodhouse [70], from which one can trace back to the fundamental work of B. Kostant. For a recent survey of this approach see B. Hall [29].

Perhaps quantum mechanics is just an "averaging" over some deeper theory in which there are quantities, "hidden variables", that we just don't know how to measure as yet. This is a very extensively investigated question. See for example the book [47] by E. Nelson on stochastic quantum mechanics.

There is recent work by John Conway and Simon Kochen which relates to hidden variables theory and connects these issues to the classical free will question! See for example

http://www.ams.org/notices/200902/rtx090200226p.pdf

Research on the conceptual and axiomatic foundations of quantum mechanics is an ongoing activity. For a lead-in to some of this research see, for example, J. M. Jauch [38] or [5, 6, 7]. And/or type in [Title] ⟨The interpretation of quantum mechanics⟩ on MathSciNet. This will bring up (at least) 44 references.

4.12 References

Treatments of the basic concepts of quantum mechanics, aimed at mathematicians, can be found in

Gerald Folland, Quantum field theory, A tourist guide for mathematicians. [21]
Brian Hall, An introduction to quantum theory for mathematicians. [28]
An updated version of Brian Hall's forthcoming book can be downloaded from the 7120 website. These two books give a more serious and detailed treatment of many of the topics we've covered.

Victor Guillemin (the father of our Victor Guillemin), "The story of quantum mechanics" [27]. This is a fun book to read.


B. L. van der Waerden has a source book in which some of the original papers of Heisenberg et al. are reprinted. van der Waerden also makes insightful commentaries of his own concerning what happened in those days, the 1920's. Source book: [61]

Hermann Weyl, in 1928, was already into quantum mechanics. His classic book "Gruppentheorie und Quantenmechanik" has been revised several times since the 1928 edition. It was translated into English in 1931 and is now available in a Dover edition. [65]

Jagdish Mehra and Helmut Rechenberg [45] have written 2000 pages of recollections, gossip, and a record of the changing viewpoints of the founders of quantum mechanics. This six volume set also contains detailed expositions of many of the fundamental papers. They explain the relation between the contributions of the founders and the influence of one on the other. This work is based on extensive interviews with the founders as well as on the papers themselves.

5 Quantum field theory

We are going to construct the simplest quantum field. Our aim is to illustrate not only what the typical structure of a quantum field is, but to show also how a "free" quantum field can be regarded as a sequence of harmonic oscillators. The heuristics behind this "equivalence" is itself extremely illuminating and still dominates a large part of the textbook literature on quantum field theory. The heuristics includes the first, and most easily understood, subtraction of an infinite quantity. Such subtractions (so-called renormalizations) are ubiquitous in quantum field theory. Many physicists regard them (at the present time, 2011) as vital ingredients of the theory, rather than an artifact of the present day formulation. In the preface to his reprint book of the classical papers, "Quantum electrodynamics" [53, page xv], Julian Schwinger points out that "it took the labors of more than a century to develop the methods that express fully the mechanical principles laid down by Newton." Now quantum field theory is barely 80 years old, and is a much more complicated mathematical system than Newtonian mechanics. Even ignoring the possibility of major fundamental modifications of the present version of quantum field theory, such as e.g. string theory, which may be required for understanding higher energy phenomena, 80 years is a pretty short time to get the mathematics straight. Moreover it may be impossible to


get the present versions of quantum field theory into a mathematically consistent form without including future modifications. Will another hundred years do the trick? Or is that too optimistic?

5.1 The harmonic oscillator

One harmonic oscillator. The restoring force for a harmonic oscillator on the line is given by Hooke's law F(x) = −kx, as already mentioned in Example 2.9. Newton's equation, F = ma, reduces in this case to

−kx = mẍ    (5.1)

where x is the oscillator position, ẋ = dx/dt, and k is the spring constant. The force F is −grad V, with V = (k/2)x². The energy of the oscillator is therefore

E = (1/2)mẋ² + (k/2)x²    (5.2)

or, equivalently,

E = (1/2m)p² + (k/2)x²,    (5.3)

where p = mv = mẋ is the momentum. Define ω = √(k/m). Then Newton's equation, (5.1), reads ẍ + ω²x = 0, for which the general solution is

x(t) = A cos ωt + B sin ωt.    (5.4)

So ν ≡ ω/(2π) is the frequency of this periodic motion. The angular frequency ω is of more interest to us than the spring constant. So we will always write harmonic oscillator equations in terms of ω. Thus, since k = mω², we will write the energy (5.3) as

E = (1/2m)p² + (mω²/2)x².    (5.5)

Quantization of this system yields the Hamiltonian

H = −(1/2m) d²/dx² + (mω²/2)x²,    (5.6)

which is to be interpreted as a self-adjoint operator on L²((−∞,∞); dx) (and which is obtained from (5.3) by the usual substitution p → (1/i)∂/∂x, x → multiplication by x). We are going to set ℏ = 1 in this section.
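With ℏ = 1 the spectrum of (5.6) is ω(n + 1/2), n = 0, 1, 2, . . . ; the lowest eigenvalue ω/2 is derived in Section 5.3 below, and the full ladder is the standard fact. A quick finite-difference sanity check (the grid parameters and the values m = ω = 1 are illustrative choices, not anything fixed by the text):

```python
import numpy as np

m, omega = 1.0, 1.0           # illustrative values, hbar = 1
N, L = 1000, 12.0             # grid points and half-width of the interval
x = np.linspace(-L, L, N)
h = x[1] - x[0]

# H = -(1/2m) d^2/dx^2 + (m omega^2 / 2) x^2, discretized with the standard
# three-point Laplacian and Dirichlet conditions at the (far away) endpoints.
main = 1.0 / (m * h**2) + 0.5 * m * omega**2 * x**2
off = -0.5 / (m * h**2) * np.ones(N - 1)
Hmat = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

evals = np.linalg.eigvalsh(Hmat)[:4]
print(evals)   # close to omega*(n + 1/2) for n = 0, 1, 2, 3
```

The lowest computed eigenvalues land within the O(h²) discretization error of 0.5, 1.5, 2.5, 3.5.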


n harmonic oscillators. Similarly the energy and Hamiltonian for a system consisting of n independent harmonic oscillators of masses m_j and spring constants k_j may be derived from the corresponding Newton's equations,

m_j d²x_j/dt² = −k_j x_j,    j = 1, . . . , n.    (5.7)

The total energy is

E = Σ_{j=1}^n ( (1/2m_j)p_j² + (m_jω_j²/2)x_j² ),    (5.8)

wherein we have again expressed the energy in terms of the angular frequency ω_j = √(k_j/m_j). The Hamiltonian is therefore given by

H = Σ_{j=1}^n ( −(1/2m_j)∂²/∂x_j² + (m_jω_j²/2)x_j² ).    (5.9)

H is to be interpreted as a self-adjoint operator on a suitable domain in L²(Rⁿ; dⁿx) (e.g., it can be defined as the closure of its restriction to C_c^∞(Rⁿ)).

5.2 A quantized field; heuristics.

We will show now how the quantization of a classical (noninteracting) field can be regarded as just an assembly of infinitely many harmonic oscillators.

To avoid technical problems that will only obscure the big ideas, we are going to consider only a field that has just one component (not six like the electromagnetic field), that lives on a one dimensional space (not on R³), and that lives in fact on just a finite interval (not all of R). In the absence of any charges or currents, which is the case of interest to us, any component of the electromagnetic field satisfies the wave equation ∂²u/∂t² = Δu. It is this feature, a hyperbolic wave equation, that must be preserved in any reasonable example. Whereas quantum mechanics arises from quantizing an ordinary differential equation, namely Newton's equation, quantum field theory arises from quantizing a partial differential equation (the field equation).

We are going to quantize the "field equation"

∂²u/∂t² = ∂²u/∂x²,    (5.10)


which is the equation for a vibrating string. Moreover we will take space to be the interval (0, π) and assume that the string is fixed at the endpoints. I.e.,

u(0, t) = 0,  u(π, t) = 0  for all t.    (5.11)

In accordance with the method of separation of variables a general solution of (5.10) and (5.11) can be written in the form

u(x, t) = Σ_{j=1}^∞ q_j(t)u_j(x)    (5.12)

where u_j(x) and q_j(t) each satisfy ordinary differential equations and u_j is zero at the endpoints. The ordinary differential equation for u_j is the eigenvalue equation d²u/dx² = λu(x) with u(0) = u(π) = 0. The eigenfunctions, normalized in L²((0, π)), are

u_j(x) = √(2/π) sin jx,    j = 1, 2, . . .    (5.13)

The eigenvalues are λ = −j², j = 1, 2, . . . . That is, u_j″ = −j²u_j. The functions u_j form an orthonormal basis of L²((0, π)) and any initial condition can therefore be written as

u_0(x) = Σ_{j=1}^∞ q_j u_j(x)    (5.14)

for some suitable choice of the real numbers q_j, which we leave unspecified for now. The time evolution of the initial data u_0 is found, as usual, by letting the coordinates q_j depend on time as in equation (5.12). Then the field equation (5.10) reads

Σ_{j=1}^∞ q̈_j(t)u_j(x) = ∂²u(x, t)/∂t² = ∂²u(x, t)/∂x² = Σ_{j=1}^∞ q_j(t)u_j″(x) = −Σ_{j=1}^∞ j²q_j(t)u_j(x).


Since the functions u_j are orthonormal it follows that

d²q_j/dt² = −j²q_j,    j = 1, 2, . . . .    (5.15)

Of course u(x, t) is determined by the values u(x, 0) and u̇(x, 0) which, in view of (5.12), are determined by the two sequences {q_j(0)} and {q̇_j(0)}.

Comparison of equations (5.15) with (5.7) shows that the boundary value problem (5.10), (5.11) is equivalent to a mechanical system consisting of infinitely many harmonic oscillators with masses m_j all equal to one and spring constants k_j = j². The canonical coordinates are {q_j}_{j=1}^∞ and {p_j = m_j q̇_j = q̇_j}_{j=1}^∞.
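The ingredient that makes this mode-by-mode decoupling work is the orthonormality of the u_j in L²((0, π)). It can be verified numerically by the trapezoid rule; the grid resolution and the number of modes checked below are arbitrary choices:

```python
import numpy as np

x = np.linspace(0.0, np.pi, 4001)
dx = x[1] - x[0]

def u(j):
    # the normalized eigenfunctions (5.13) on (0, pi)
    return np.sqrt(2.0 / np.pi) * np.sin(j * x)

def inner(f, g):
    # trapezoid-rule approximation of the L^2((0, pi)) inner product
    vals = f * g
    return float(np.sum(vals[:-1] + vals[1:]) * 0.5 * dx)

# Gram matrix of u_1, ..., u_5; it should be the 5x5 identity.
G = np.array([[inner(u(i), u(j)) for j in range(1, 6)] for i in range(1, 6)])
assert np.allclose(G, np.eye(5), atol=1e-5)
print(np.round(G, 6))
```

Orthonormality is what lets the coefficients of u_j on both sides of the field equation be equated, producing the uncoupled oscillator equations (5.15).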

According to the principles of quantum mechanics one should therefore quantize this system by taking the Hilbert space to be

K = L²(R^∞, dq_1 dq_2 · · · )    (5.16)

and the Hamiltonian to be (in analogy to (5.9) with all m_j = 1)

H = Σ_{j=1}^∞ (1/2)( −∂²/∂q_j² + ω_j²q_j² )    (5.17)

where ω_j = j. Moreover, if Q_j denotes the quantization of q_j, i.e. Q_j = multiplication by q_j on K, the principles of quantum mechanics assert, in view of (5.12), that the quantized field φ at time zero is given by

φ(x, 0) = Σ_{j=1}^∞ Q_j u_j(x).    (5.18)

The field is therefore an operator valued function on (0, π). That is to say, the field is an operator on K for each x ∈ (0, π).

There are a few little problems with the preceding discussion.

First, the definition of K in (5.16) is meaningless because of the appearance of infinite dimensional Lebesgue measure. Don't worry. We will change the Lebesgue measure to a well defined probability measure in a natural way.

Second, the series defining the operator H in (5.17) diverges to +∞ in every reasonable sense. Don't worry. We will subtract off an infinite constant that will make H a meaningful operator on a meaningful Hilbert space.

Third, the series (5.18) defining the quantum field φ(x) diverges for almost every x ∈ (0, π). Don't worry.

SUMMARY: The astute reader can very likely appreciate now the word "heuristics" in the title of this subsection. The heuristic discussion of the present section still remains today a good guide for the construction, and even definition, of what constitutes a quantum field. Precise and meaningful definitions of "quantum field" have been given and explored extensively. See e.g. Streater and Wightman [56]. Studies dealing with mathematically meaningful definitions of quantum fields go by the name "axiomatic quantum field theory". Unfortunately no axioms have yet been proposed which seem likely (to many) to be satisfied by the actual fields (gauge fields) that have been most successfully corroborated by experiment. For the time being, therefore, it's important to understand the heuristic basis for our current example as well as a meaningful version of it, to which we now turn. In order to rescue the preceding heuristic discussion of our quantized field we are going to return temporarily to harmonic oscillators. In one form or another the following rescue operation still appears in modern textbooks on quantum field theory for physicists.

5.3 The ground state transformation

One harmonic oscillator again. Consider a single harmonic oscillator with m = 1 and k = ω². Its Hamiltonian, according to (5.6), is

H = −(1/2m) d²/dq² + (1/2)mω²q²  on L²((−∞,∞); dq).    (5.19)

For any constant a > 0 the easily verified identity

H e^{−aq²/2} = (a/2m) e^{−aq²/2} + ( mω²/2 − a²/2m ) q² e^{−aq²/2}    (5.20)

shows that e^{−aq²/2} is an eigenfunction of H if and only if a = mω. Let

ψ_0(q) = (mω/π)^{1/4} e^{−mωq²/2}.    (5.21)

Then (5.20) shows that

a) Hψ_0 = (ω/2)ψ_0


Moreover an easy Gaussian integral shows that

b) ∫_{−∞}^∞ ψ_0(q)² dq = 1.

So ψ_0 is a normalized eigenfunction for H and, since ψ_0 is positive, ω/2 is the lowest eigenvalue of H and ψ_0 is the ground state of H. A direct computational proof that ω/2 is the lowest eigenvalue will be given in the next theorem.

Let

dμ(q) = ψ_0(q)² dq.    (5.22)

Then μ is a probability measure on R. Define

U : L²(R, dq) → L²(R, μ)

by

(Uψ)(q) = ψ(q)/ψ_0(q).    (5.23)

U is clearly unitary. We assert that

UHU^{−1} = ( −(1/2m) d²/dq² + ωq d/dq ) + (ω/2)I.    (5.24)

Indeed, let f(q) = (Uψ)(q) = ψ(q)/ψ_0(q). Then (U^{−1}f)(q) = ψ(q) = f(q)ψ_0(q) and therefore

(Hψ)(q) = f(q)Hψ_0 − (1/2m)( f″(q)ψ_0(q) + 2f′(q)ψ_0′(q) )    (5.25)
= f(q)(ω/2)ψ_0(q) − (1/2m)( f″(q) − 2f′(q)(mωq) )ψ_0(q)    (5.26)

Hence (UHU^{−1}f)(q) = (Hψ)(q)/ψ_0(q) = (ω/2)f(q) − (1/2m)f″(q) + ωq f′(q), which is (5.24).

Define

H̃ = −(1/2m) d²/dq² + ωq d/dq  on L²(R, μ)    (5.27)

(we write H̃, rather than H, for this transformed operator). Then (5.24) asserts that

H̃ = U(H − ω/2)U^{−1}.    (5.28)


Theorem 5.1

(H̃f, g)_{L²(μ)} = (1/2m) ∫_R f′(q)g′(q) dμ(q)    (5.29)

and

(H̃f, f)_{L²(μ)} ≥ 0 for all sufficiently smooth f ∈ L²(μ).    (5.30)

Moreover inf(spectrum H̃) = 0 and inf(spectrum H) = ω/2.

Proof. In view of the definition (5.21) an integration by parts gives

∫_R f′(q)g′(q) dμ(q) = ∫_R f′(q)g′(q)ψ_0(q)² dq
= ∫_R ( −f″(q) + 2mωq f′(q) ) g(q)ψ_0(q)² dq
= 2m(H̃f, g)_{L²(μ)}.

This proves (5.29). Put g = f in (5.29) to deduce (5.30). Since the constant functions are in L²(μ) and H̃1 = 0, the bottom of the spectrum of H̃ is indeed zero and is an eigenvalue. Finally (5.28) shows that H = U^{−1}H̃U + ω/2. Since U^{−1}H̃U has the same spectrum as H̃ this completes the proof.
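Identity (5.29) can be tested numerically. Since dμ = ψ_0² dq is a Gaussian probability measure (with m = 1 it is the measure called γ_ω below), Gauss–Hermite quadrature integrates polynomials against it exactly; the test polynomials f, g and the value of ω in this sketch are arbitrary choices:

```python
import numpy as np
from numpy.polynomial import Polynomial

omega = 2.0   # illustrative frequency; m = 1 as in this section

# Gauss-Hermite nodes/weights integrate polynomials against e^{-t^2} dt
# exactly; substituting t = sqrt(omega) q turns this into integration
# against dmu(q) = sqrt(omega/pi) e^{-omega q^2} dq.
t, w = np.polynomial.hermite.hermgauss(40)
q = t / np.sqrt(omega)

def integrate_mu(values):
    # integral against the probability measure mu
    return float(np.dot(w, values)) / np.sqrt(np.pi)

f = Polynomial([1.0, -2.0, 0.0, 1.0])   # arbitrary test polynomials
g = Polynomial([0.5, 1.0, 3.0])

# H~ f = -(1/2) f'' + omega q f', which is (5.27) with m = 1.
Hf = -0.5 * f.deriv(2) + omega * Polynomial([0.0, 1.0]) * f.deriv(1)

lhs = integrate_mu(Hf(q) * g(q))                            # (H~ f, g)_{L^2(mu)}
rhs = 0.5 * integrate_mu(f.deriv(1)(q) * g.deriv(1)(q))     # Dirichlet form (5.29)
assert abs(lhs - rhs) < 1e-10
print(lhs, rhs)
```

Both sides agree to rounding error, which is the integration-by-parts identity of the proof made concrete.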

Remark 5.2 The operator U commutes with multiplication operators, as is clear from the simple formula (5.23). For example

UM_qU^{−1} = M_q.    (5.31)

(The domain of M_q has changed however. But never mind that.) But U does not commute with differentiation. Thus if f(q) = ψ(q)/ψ_0(q) then (U^{−1}f)(q) = ψ(q) = f(q)ψ_0(q) and therefore

(U (d/dq) U^{−1}f)(q) = U( (d/dq)(f · ψ_0) )(q)    (5.32)
= ( f′ψ_0 + fψ_0′ )/ψ_0    (5.33)
= f′ − ωq f.    (5.34)

Therefore our momentum operator, which was given by P = −iℏ(d/dq) when we used Lebesgue measure to define the Hilbert state space, is now given by

P = −iℏ( d/dq − ωq )    (5.35)

as an operator in L²(R; γ_ω). That's what happens when you change the representation of the Hilbert state space. We saw a more drastic change in the formula for the momentum operator when we transformed the Hilbert state space by the Fourier transform: P just became a multiplication operator. There is a lesson here: in spite of the changed appearance of the formulas for P and Q, the all important commutation relations [P, Q] = −iℏI are unchanged.

There are now five naturally occurring operators in this "ground state" representation of the Hilbert state space for a harmonic oscillator. The relations between them dominate a large part of quantum field theory. Define

α = d/dq acting on L²(R; γ_ω)    (5.36)

Then an integration by parts shows, just as in the proof of (5.29), that

α* = −d/dq + 2ωq    (5.37)

Consequently, in view of (5.35) and (5.27), we have the relations

α + α* = 2ωQ,    (5.38)
α − α* = 2(i/ℏ)P, and    (5.39)
(1/2)α*α = H̃.    (5.40)

In short, all of the operators we have discussed so far can be expressed neatly in terms of the differentiation operator α and its adjoint in the Hilbert space L²(R; γ_ω).
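Relations (5.38) and (5.40) are pure operator algebra, so they can be checked exactly on polynomials. A minimal sketch (m = 1 and ℏ = 1 as in this section; ω and the test polynomial are arbitrary):

```python
import numpy as np
from numpy.polynomial import Polynomial

omega = 1.5                  # illustrative frequency
q = Polynomial([0.0, 1.0])   # the coordinate polynomial q

def alpha(f):
    # alpha = d/dq, as in (5.36)
    return f.deriv(1)

def alpha_star(f):
    # its adjoint in L^2(R; gamma_omega), as in (5.37)
    return -f.deriv(1) + 2.0 * omega * q * f

def H_tilde(f):
    # the ground-state Hamiltonian (5.27) with m = 1
    return -0.5 * f.deriv(2) + omega * q * f.deriv(1)

f = Polynomial([3.0, -1.0, 2.0, 0.5, 1.0])   # arbitrary test polynomial

# (5.40): (1/2) alpha* alpha = H~, checked coefficient by coefficient.
lhs = 0.5 * alpha_star(alpha(f))
rhs = H_tilde(f)
assert np.allclose((lhs - rhs).coef, 0.0)

# (5.38): alpha + alpha* = 2 omega Q, as operators applied to f.
assert np.allclose((alpha(f) + alpha_star(f) - 2.0 * omega * q * f).coef, 0.0)
print("relations (5.38) and (5.40) hold on the test polynomial")
```

Only the adjoint relation (5.37) itself requires the integration by parts; once it is granted, the remaining relations are identities of differential operators, as the check shows.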

SUMMARY: We see that ω/2 is the bottom of the spectrum of the original Hamiltonian H that we've been using in preceding sections. If we lower the entire spectrum by subtracting off the so called "zero point energy" ω/2, then the resulting operator is unitarily equivalent to the operator H̃. The quadratic form of H̃ has the really simple formula given in (5.29). An operator whose quadratic form has this nice first derivative look is called a Dirichlet form operator. The description (5.29) is equivalent to writing H̃ in the form (5.40). Such operators are especially easy to deal with. Now notice that subtracting the constant ω/2 from the Hamiltonian H does not change the physics because this just amounts to changing the potential mω²q²/2 by an additive constant. Moreover a unitary transformation is an isomorphism for quantum mechanics and therefore the operator H̃ represents the harmonic oscillator Hamiltonian just as well as our original choice H does.

n harmonic oscillators again. Let us return now to a system of n independent harmonic oscillators, all of mass one and with spring constants ω_j². By (5.9) we find the Hamiltonian in this case to be

H_n = (1/2) Σ_{j=1}^n ( −∂²/∂q_j² + ω_j²q_j² ) acting in L²(Rⁿ, dq)    (5.41)

Let

ψ_j(q_j) = (ω_j/π)^{1/4} e^{−ω_j q_j²/2}

and

ψ_0(q) = Π_{j=1}^n ψ_j(q_j),    q = (q_1, . . . , q_n)

Then

H_n ψ_0 = [ (1/2) Σ_{j=1}^n ω_j ] ψ_0    (5.42)

because H_n is a sum of independent differential operators. So ψ_0 is the ground state of H_n and the "zero point energy" (i.e. inf spectrum H_n) is

E_n ≡ (1/2) Σ_{j=1}^n ω_j.    (5.43)

Define

dμ_n(q) = Π_{j=1}^n [ ψ_j(q_j)² dq_j ] on Rⁿ    (5.44)

Then μ_n is a probability measure on Rⁿ and the map

ψ → U_n ψ = ψ/ψ_0    (5.45)

is again unitary from L²(Rⁿ, dq) onto L²(Rⁿ, μ_n). Just as for a single oscillator one sees that

U_n(H_n − E_n)U_n^{−1} = Σ_{j=1}^n ( −(1/2)∂²/∂q_j² + ω_j q_j ∂/∂q_j ).    (5.46)


So define

H̃_n = Σ_{j=1}^n ( −(1/2)∂²/∂q_j² + ω_j q_j ∂/∂q_j ) acting in L²(Rⁿ, μ_n).    (5.47)

Then

U_n H_n U_n^{−1} = H̃_n + E_n.    (5.48)

Let us simply record here the obvious extension of the formulas (5.35)–(5.40) to n harmonic oscillators. Define

α_j = ∂/∂q_j acting on L²(Rⁿ; μ_n)    (5.49)

Then

α_j* = −∂/∂q_j + 2ω_j q_j    (5.50)
α_j + α_j* = 2ω_j Q_j    (5.51)
α_j − α_j* = 2(i/ℏ)P_j    (5.52)
(1/2) Σ_{j=1}^n α_j*α_j = H̃_n    (5.53)

SUMMARY: H̃_n differs from our first choice of Hamiltonian, (5.41), in two ways. First, it incorporates the subtraction of the zero point energy E_n, which does not affect the physics because it just changes the potential by an additive constant. Second, it incorporates the unitary transform U_n, which also does not affect the physics. Thus the assembly of quantum harmonic oscillators is just as well served by H̃_n acting in L²(Rⁿ; μ_n) as by H_n acting in L²(Rⁿ; dⁿq).

We are ready to return to an infinite assembly of harmonic oscillators.

5.4 Back to the quantized field.

We can see now, from (5.41) and (5.43), that the operator H "defined" in (5.17) is bounded below by (1/2)Σ_{j=1}^∞ ω_j, which diverges because ω_j = j in that field theory. So we can say, informally, that H ≥ +∞, which shows that H is at best meaningless. The customary resolution of this problem goes like this. Any potential is defined only up to an additive constant because only grad V has direct physical meaning. Thus there is no change in the physics if we subtract the infinite constant (1/2)Σ_{j=1}^∞ ω_j from the operator H. We may therefore attempt to give the infinite sum in (5.17) a mathematical interpretation by passing to the limit n → ∞ in the system (5.47), which after all is physically equivalent to the system (5.41).

To this end note first that the measure μ_n in (5.44) is just a product of probability measures, an infinite product of which is a perfectly respectable measure. Define

dμ(q) = Π_{j=1}^∞ [ ψ_j(q_j)² dq_j ] on R^∞    (5.54)

Let

H̃ = Σ_{j=1}^∞ ( −(1/2)∂²/∂q_j² + ω_j q_j ∂/∂q_j ) acting in L²(R^∞, μ).    (5.55)

Since there are no zeroth order terms in H̃ (unlike (5.41)) the infinite sum makes perfectly good sense when applied to many functions on the infinite product space. For example if f : R^∞ → C depends on only finitely many coordinates, say f(q) = g(q_1, . . . , q_n), with g ∈ C_c^∞(Rⁿ), then there are only finitely many nonzero terms in the sum for H̃f. Such functions are dense in L²(R^∞, μ). So H̃ is densely defined by the formula (5.55). Thus by subtracting the (infinite) zero point energy from (5.17) and unitarily transforming to the ground state measure μ we have given a meaning to (5.17) as a densely defined operator in L²(R^∞, μ), namely as the operator H̃ defined in (5.55). Moreover we have also "given" a meaning to the Hilbert space K "defined" in (5.16), since K can just be replaced (informally) by the well defined Hilbert space L²(R^∞, μ). After all, the latter is just the image under the "unitary operator lim_{n→∞} U_n" (which however makes no sense because its domain is meaningless). (If you are getting the feeling that the boundary between heuristics and theorems is being stretched, you're right. But you have to get used to it. The infinite subtraction and concomitant change of Hilbert space description is among the mildest in this subject.) We have now solved the first two of the three little problems raised at the end of Section 5.2. Before addressing the third, let's understand the operator H̃ and the measure μ a little better.

The operator H̃ is particularly nicely related to the measure μ. Just as in the one harmonic oscillator case, (5.29), the operator H̃_n in (5.47) clearly satisfies

(H̃_n f, g)_{L²(μ_n)} = (1/2) ∫_{Rⁿ} (∇f · ∇g) dμ_n    (5.56)

With just a little thought you can see that this persists in the limit as n → ∞ for functions f and g which depend on only finitely many coordinates. Thus for such a dense set of functions we have

(H̃f, g)_{L²(R^∞,μ)} = (1/2) ∫_{R^∞} Σ_{j=1}^∞ (∂f/∂q_j)(∂g/∂q_j) dμ(q)    (5.57)

If one wished to pursue the functional analytic questions left untouched in this discussion, such as whether H̃ actually has a self-adjoint version in L²(μ), the description of H̃ as the operator associated to a Dirichlet form, as in (5.57), offers a quick and easy mechanism to prove such nice properties because Dirichlet forms in finite and infinite dimensions are well understood. But we won't carry this out here.

To connect with parts of the physics literature it is illuminating to deal in the following informal way with the measure μ defined in (5.54). Write Dq = Π_{j=1}^∞ dq_j for Lebesgue "measure" on R^∞. Then (5.54) gives

dμ(q) = [ Π_{j=1}^∞ (ω_j/π)^{1/2} ][ e^{−Σ_{j=1}^∞ ω_j q_j²} ] Dq.    (5.58)

In the case of primary interest to us we have ω_j = j. So the first of the three factors in (5.58) is infinite. The second factor happens to be zero a.e. with respect to μ. And the third factor is meaningless. It is therefore particularly edifying that the product of all three factors makes perfectly good sense as a measure on R^∞. The re-association of factors from (5.54) to (5.58) converts a meaningful probability measure μ into a meaningless expression (5.58). But get used to it. In the physics literature this measure is usually written as (5.58).

Returning now to the informally quantized field φ(x) defined in (5.18), here is how we will give it a well defined meaning. Recall that in defining the classical electric field E(x) one must, in principle, place a small charged piece of matter around the point x ∈ R³ and measure the force on the object. This gives the average force on the object. One must then, in principle, repeat this measurement with a smaller object, a ball centered at x say, and take a sequence of measurements as the radius of the ball goes to zero while the charge also goes to zero. This way you get the field value E exactly at x, uninfluenced by the change of the sources of the field that may be produced by the testing charge itself. This is an OK viewpoint in classical mechanics. But in the actual world you can't take the charge to zero because the minimum charge is the charge on an electron. And of course you can't take the radius of the ball smaller than the radius of an atom (or proton, or quark if you want to stretch it). One can only hope that the averages themselves are good enough to make a well defined theory. We are going to carry this out in our vibrating string example.

Let f ∈ C_c^∞((0, π)) be real valued. Think of f as the density of charge on the little test ball (test interval in our case). We want to give meaning to ∫_0^π φ(x)f(x) dx when φ is given by (5.18). Interchanging the integral with the sum we define

φ_f = Σ_{j=1}^∞ Q_j ∫_0^π f(x)u_j(x) dx    (5.59)

We will show that this series of operators on L²(R^∞; μ) converges in a reasonable sense. And then we may write, informally,

∫_0^π f(x)φ(x) dx = φ_f.    (5.60)

Since the series defining φ(x) does not actually converge, the integrand on the left is meaningless. But the operator on the right is well defined. This circumstance is to be interpreted by saying that the quantized field φ(x) is not an operator valued function after all but rather an operator valued distribution. This is typical for quantized fields.

Theorem 5.3 Let f_j be a sequence of real numbers. For ψ ∈ L^∞(R^∞; μ) define

(Q_j ψ)(q_1, q_2, . . . ) = q_j ψ(q_1, q_2, . . . ).    (5.61)

Then the series

Σ_{j=1}^∞ Q_j f_j ψ    (5.62)

converges in L²(R^∞; μ) for all ψ ∈ L^∞ if and only if

Σ_{j=1}^∞ f_j²/ω_j < ∞    (5.63)


In particular, if f ∈ C_c^∞((0, π)) then the series in (5.59) converges on the dense set L^∞(R^∞, μ).

Proof. Each coordinate function q_j is a mean zero Gaussian random variable with

‖q_j‖²_{L²(μ)} = 1/(2ω_j).    (5.64)

Since they are also independent we have

‖ Σ_{j=1}^n q_j f_j ‖²_{L²(μ)} = Σ_{j=1}^n f_j²/(2ω_j).    (5.65)

Since any series of orthogonal functions converges in L² if and only if the series of their square norms converges, it follows that the series Σ_{j=1}^∞ q_j f_j converges in L²(μ) if and only if (5.63) holds. But

Σ_{j=1}^∞ Q_j f_j ψ = ( Σ_{j=1}^∞ q_j f_j )ψ.    (5.66)

So for bounded ψ the series on the left converges in L² for each ψ if and only if the series Σ_{j=1}^∞ q_j f_j converges in L².

Now if f ∈ C_c^∞((0, π)) then it is also in L²((0, π)) and therefore Σ_{j=1}^∞ f_j² < ∞. Hence (5.63) holds because ω_j → ∞. This proves the last assertion of the theorem.
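Identity (5.65) is just the additivity of variances of independent random variables, and can be sanity-checked by Monte Carlo: under the jth Gaussian factor of μ the coordinate q_j is normal with mean zero and variance 1/(2ω_j), which is (5.64). The truncation to six modes, the coefficients f_j, and the sample size below are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 6
omega = np.arange(1, n + 1).astype(float)        # omega_j = j, truncated
f = np.array([1.0, -0.5, 2.0, 0.3, -1.0, 0.7])   # arbitrary coefficients

# Under the factor measure, q_j ~ N(0, 1/(2 omega_j)), independently -- (5.64).
samples = rng.normal(0.0, np.sqrt(1.0 / (2.0 * omega)), size=(400_000, n))
S = samples @ f                      # sum_j q_j f_j at each sample point

mc = float(np.mean(S**2))            # Monte Carlo estimate of ||sum q_j f_j||^2
exact = float(np.sum(f**2 / (2.0 * omega)))   # right side of (5.65)
assert abs(mc - exact) < 0.03
print(mc, exact)
```

The estimate matches the closed form to sampling accuracy; for ω_j = j the terms f_j²/(2ω_j) shrink relative to f_j², which is exactly why (5.63) is weaker than square summability of the f_j.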

Side remark: The functions s_j = √(2ω_j) q_j are orthonormal coordinates in L²(R^∞, μ), as is shown by (5.64).

SUMMARY: We have now followed a heuristic path, starting with analogies to infinite dimensional quantum mechanics and its attendant infinities and meaningless spaces, and ending with a well defined Hilbert space, a well defined Hamiltonian on it, and well defined field operators. In this way we have united the spirit of the vibrating string equation (5.10) with the spirit of quantum mechanics.

Putting all the heuristics behind us now, here is what we actually constructed. The following summarizes the notation of this section and the statement of theorems. It is self contained except for the proofs. There are no divergences and no infinities to subtract.


Notation 5.4 Define

u_j(x) = √(2/π) sin(jx),    0 ≤ x ≤ π,  j = 1, 2, . . .    (5.67)
γ_ω(dq) = √(ω/π) e^{−ωq²} dq,    q ∈ R,  ω > 0    (5.68)
ω_j = j    (5.69)
q = (q_1, q_2, . . . )    (5.70)
γ(dq) = Π_{j=1}^∞ γ_{ω_j}(dq_j)    (5.71)
H = L²(R^∞, γ)    (5.72)
(Q_j ψ)(q) = q_j ψ(q),  for ψ : R^∞ → C    (5.73)

Theorem 5.5 Let f ∈ C∞c((0, π)). Define

φ_f = ∑_{j=1}^∞ (f, u_j)_{L2((0,π))} Q_j (5.74)

This series of (unbounded) operators converges in L2(R∞, γ) when applied to a bounded function ψ. Moreover the map

C∞c((0, π)) ∋ f ↦ φ_f ψ (5.75)

is a linear map from C∞c((0, π)) into L2(R∞; γ) for each function ψ ∈ L∞(R∞; γ). Furthermore there is a densely defined, non-negative, essentially self-adjoint operator H on L2(R∞; γ) given by the series

Hψ = ∑_{j=1}^∞ (−(1/2) ∂^2/∂q_j^2 + ω_j q_j ∂/∂q_j) ψ (5.76)

which converges in L2(R∞; γ) for a dense set of ψ. (E.g., ψ that depend smoothly on only finitely many q_j.)
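Each summand in (5.76) acts in a single variable, so its spectrum is easy to inspect. A symbolic sketch (our assumption, consistent with the Gaussian measure γ_ω of (5.68): the eigenfunctions are physicists' Hermite polynomials H_n(√ω q)): each one-mode operator is an Ornstein-Uhlenbeck operator with eigenvalues nω, which is why H is non-negative and diagonalizable in a product basis.

```python
import sympy as sp

q, w = sp.symbols('q omega', positive=True)

# One-mode summand of (5.76): L = -(1/2) d^2/dq^2 + omega q d/dq.
# Assumption (not stated here in the notes, but consistent with
# gamma_omega in (5.68)): the physicists' Hermite polynomials
# H_n(sqrt(omega) q) are its eigenfunctions with eigenvalue n*omega.
for n in range(5):
    psi = sp.hermite(n, sp.sqrt(w) * q)
    Lpsi = -sp.Rational(1, 2) * sp.diff(psi, q, 2) + w * q * sp.diff(psi, q)
    assert sp.simplify(Lpsi - n * w * psi) == 0

print("L psi_n = n*omega*psi_n for n = 0,...,4")
```

This makes the non-negativity of H visible mode by mode: each eigenvalue nω_j is ≥ 0.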

Notice that this is a clear cut theorem with no divergences and no heuristics. The linear map f ↦ φ_f is a map from test functions to operators on H. So it is an operator valued distribution over (0, π). The Hilbert space H is a well defined Hilbert space and the Hamiltonian H is as nice as can be. The operator valued distribution f ↦ φ_f is the simplest example of a typical quantum field. Based on this example you are almost ready to make your own set of axioms for quantum field theory.

However we have not yet produced time dependent field operators φ(x, t) which can claim to be quantum analogs of solutions to the field equation (5.10). You can't make axioms for relativistic quantum field theory if you don't have time playing a role similar to that of space.


5.5 The time dependent field

As we know from Section 4.10, we have the option of letting the initial wave function ψ_0 evolve in time by the Schrodinger equation, giving ψ(t) = e^{−itH}ψ_0, or keeping ψ_0 fixed and letting the observables change with time (Heisenberg picture). In quantum field theory it is more insightful to use the Heisenberg picture. To this end let us put the time dependence into the quantized field φ_f. In accordance with the prescription (??), define

φ_f(t) = e^{itH} φ_f e^{−itH}, f ∈ C∞c((0, π)) (5.77)

This is the Heisenberg field operator. The use of the honest operator φ_f instead of its informal expression

φ_f = ∫_0^π φ(x) f(x) dx (5.78)

is unfortunately cumbersome and actually makes some simple equations hard to read. We have to learn how to write these equations in terms of such informal symbols as φ(x) and how then to give these equations precise meanings in the distribution sense. The time to do this is now.

To begin with, let us informally manipulate with the definition (5.77) thus:

φ_f(t) = e^{itH} ∫_0^π φ(x) f(x) dx e^{−itH} (5.79)
= ∫_0^π (e^{itH} φ(x) e^{−itH}) f(x) dx (5.80)
= ∫_0^π φ(x, t) f(x) dx (5.81)

where

φ(x, t) = e^{itH} φ(x) e^{−itH}, 0 ≤ x ≤ π, t ∈ R. (5.82)

Since x ↦ φ(x) is not really a function, what does (5.82) mean?

Answer #1. φ(x, t) is a distribution in x for each t. In other words ∫_0^π φ(x, t)f(x)dx is a well defined operator for each t, namely it is φ_f(t).

Answer #2. Since φ(x, t) depends on both x and t it would be reasonable to interpret it as an operator valued distribution over (0, π) × R. In this interpretation one would want to give meaning to

φ(g) = ∫_{(0,π)×R} φ(x, t) g(x, t) dx dt (5.83)


as an honest operator for each test function g ∈ C∞c((0, π) × R). This is easy enough to do in our case because we need only do the x integral first in (5.83) to find

φ(g) = ∫_R φ_{g_t}(t) dt, (5.84)

where g_t(x) = g(x, t). The integrand is a well defined operator valued function of t. So the integral might make sense. Why is there any question about this? Well, these are all unbounded operators. Their domains may change with t. Thus if one hopes to interpret (5.84) as meaning

φ(g)ψ = ∫_R φ_{g_t}(t) ψ dt (5.85)

for some set of ψ ∈ H then one has to be sure to use only those ψ which are in the domains of all the operators φ_{g_t}(t), t ∈ R. Fortunately, there does indeed exist a dense subspace D ⊂ H which not only lies in all of these domains but for which the function t ↦ φ_{g_t}(t)ψ is continuous as a function into H. The integral in (5.85) makes clear sense for such ψ and the equation (5.84) therefore has a meaningful interpretation. It would take us too far afield to go any further into these domain issues. But a reader who would like to pursue these technical problems herself might consider the functions ψ ∈ H ≡ L2(R∞; γ) such that ψ(q) depends only on finitely many coordinates, q_1, q_2, . . . , q_N and as a function of these coordinates is in C∞(R^N) with bounded derivatives. This dense subspace “works” for all choices of the test function g.

Having alerted the reader as to how to interpret the following definitions and equations, we can now state the most significant properties of our quantum field.

Theorem 5.6 Define φ(x) as in the preceding section. Let

φ(x, t) = e^{itH} φ(x) e^{−itH}, 0 < x < π, t ∈ R (5.86)

Define

π(x, t) = (∂/∂t)φ(x, t) (5.87)

Then

φ̈(x, t) = φ′′(x, t) and (5.88)

[π(x, t), φ(y, t)] = −iδ(x − y) I_H (5.89)


Remark 5.7 (More pep talk.) Before proving this theorem let's review the meaning of these equations yet again. Let g ∈ C∞c((0, π)×R). The equation (5.88) means this: multiply by g(x, t), integrate with respect to x and t, do two integrations by parts with respect to x and two with respect to t. Ignore boundary terms because g is zero near the boundary of (0, π)×R. Then we find

∫_{(0,π)×R} φ(x, t) g̈(x, t) dx dt = ∫_{(0,π)×R} φ(x, t) g′′(x, t) dx dt. (5.90)

We already know what this equation means because there are no derivatives of φ involved. By definition this equation is what (5.88) means. To interpret (5.89) let f and h be two real functions in C∞c((0, π)). Multiply (5.89) by f(x)h(y) and integrate with respect to x and y. We find

[π_f(t), φ_h(t)] = −i(f, h)_{L2((0,π))} I_H (5.91)

By definition, this is the meaning of (5.89).

Remark 5.8 (Yet more pep talk.) There are lots of real valued solutions to the vibrating string equation (5.10), which is the same equation as (5.88). But (5.89) forces φ to be operator valued, in order to get a non zero commutator. Remarkably, the pair of equations (5.88) and (5.89) together have only one solution, namely the one we constructed. Here is just a little more precision about this.

Hypotheses: a) Suppose that φ(x, t) and π(x, t) ≡ (∂/∂t)φ(x, t) are operator valued distributions over (0, π) × R, acting on some Hilbert space K and satisfying (5.88) and (5.89). b) K contains no closed subspace which is invariant under all the operators φ(x, t). c) technical conditions.

Conclusion: There is a unitary operator V : K → H such that V φ(x, t)V^{−1} = φ(x, t) for all x and t.

Proof of Theorem 5.6. Differentiating (5.86) with respect to t we find π(x, t) = ie^{itH}(Hφ(x) − φ(x)H)e^{−itH} because (d/dt)e^{itH} is equal both to e^{itH}iH and to iHe^{itH}. As far as the meaning of this equation is concerned, notice that if one multiplies this equation by f(x) and integrates over (0, π) one gets the well defined operator φ_f in two places. So this equation makes good sense. We are going to leave out this kind of observation henceforth, leaving such niceties to the reader.

If we differentiate (5.86) again we arrive at a second commutator. Thus we have

π(x, t) = φ̇(x, t) = i e^{itH}[H, φ(x)]e^{−itH} (5.92)

φ̈(x, t) = −e^{itH}[H, [H, φ(x)]]e^{−itH}. (5.93)

We will evaluate these commutators. Now φ(x) = ∑_{j=1}^∞ u_j(x)Q_j and H = ∑_{k=1}^∞ (−(1/2)∂_k^2 + ω_k q_k ∂_k). All of the terms in H commute with Q_j except the term k = j. It is straightforward to compute the one non-zero commutator because it is just a one dimensional computation. So also for the second commutator. Here is the result.

[H, Q_j] = −∂_j + ω_j Q_j (5.94)

[H, [H, Q_j]] = ω_j^2 Q_j. (5.95)

Therefore

[H, φ(x)] = ∑_{j=1}^∞ u_j(x)[H, Q_j]
= ∑_{j=1}^∞ u_j(x)(−∂_j + ω_j Q_j) (5.96)

and

[H, [H, φ(x)]] = ∑_{j=1}^∞ u_j(x) ω_j^2 Q_j
= −∑_{j=1}^∞ u_j′′(x) Q_j
= −φ′′(x), (5.97)

since u_j′′ + ω_j^2 u_j = 0 by (5.13). This proves (5.88).
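The one dimensional commutator computations behind (5.94) and (5.95) can be checked mechanically by applying the operators to a generic smooth function. A symbolic sketch (the function f below is just a placeholder for an arbitrary smooth function of one variable):

```python
import sympy as sp

q, w = sp.symbols('q omega', positive=True)
f = sp.Function('f')(q)

# One-mode operators from the proof: H1 = -(1/2) d^2/dq^2 + omega q d/dq,
# Q = multiplication by q, D = d/dq.
H1 = lambda g: -sp.Rational(1, 2) * sp.diff(g, q, 2) + w * q * sp.diff(g, q)
Q = lambda g: q * g
D = lambda g: sp.diff(g, q)

# (5.94): [H, Q] = -D + omega Q, applied to f
comm1 = H1(Q(f)) - Q(H1(f))
assert sp.simplify(comm1 - (-D(f) + w * Q(f))) == 0

# (5.95): [H, [H, Q]] = omega^2 Q; here [H, Q] acts as -D + omega Q
bracket = lambda g: -D(g) + w * Q(g)
comm2 = H1(bracket(f)) - bracket(H1(f))
assert sp.simplify(comm2 - w**2 * Q(f)) == 0

print("(5.94) and (5.95) hold on a generic smooth f")
```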

Since (5.89) contains a delta function we will have to adhere to the more precise meaning of (5.89), which is (5.91). Let f_j = (f, u_j)_{L2((0,π))} and h_j = (h, u_j)_{L2((0,π))}. As before, the conjugation by e^{itH} can be taken outside the commutator. Since e^{itH} commutes with the identity operator it suffices to prove (5.91) at t = 0. Thus we need to prove that

i[[H, φ_f], φ_h] = −i(f, h)_{L2((0,π))} I_H

By (5.96) we have

i[[H, φ_f], φ_h] = i[∑_{j=1}^∞ f_j(−∂_j + ω_j Q_j), ∑_{k=1}^∞ h_k Q_k] (5.98)
= i ∑_{j=1}^∞ [f_j(−∂_j + ω_j Q_j), h_j Q_j] (5.99)
= −i ∑_{j=1}^∞ f_j h_j I_H (5.100)
= −i (f, h)_{L2((0,π))} I_H (5.101)

We have of course used [(−∂_j + ω_j Q_j), Q_k] = 0 if k ≠ j. This proves (5.91) and therefore (5.89).

5.6 Many, many particles: Fock space

There are processes in which a particle suddenly appears that wasn't there before. For example when the electron in a hydrogen atom drops into a lower energy state (orbit), light is emitted in the form of a photon that wasn't around before. It can also happen that a photon wandering by is absorbed by the atom, putting the electron into a higher energy state. In this case the photon disappears. The interactions between particles can be described in this way in general: some particles disappear while other particles appear. The Hilbert state space appears to be changing suddenly, corresponding to the change of particles being described. This won't do. The dynamics of such transitions from one multi-particle state to another, with different particles and different numbers of them coming in and going out, has to be incorporated into the usual format: operators on a single Hilbert space. This will be a “big” Hilbert space, one that has subspaces corresponding to all the particles involved and all the possible numbers of them. There will be operators whose interpretations are to create or destroy particles. The mathematical structure that does the trick was implicit in Dirac's paper [?] and made explicit in V. A. Fock's paper [20]. A fully rigorous account of this mathematical structure was first given by J.M. Cook [13].

The mathematical substance of this structure is largely algebraic and a little bit functional analytic. Moreover much of the structure is highly specific to quantum field theory. Our exposition of this topic will therefore be self contained.

5.6.1 Creation and annihilation operators

Notation 5.9 Let K be a complex, separable Hilbert space. F_0 will denote the space of algebraic tensors over K. Thus an element β ∈ F_0 is of the form

β = ∑_{j=0}^∞ β_j (finite sum), where β_j ∈ K^{⊗alg j}. (5.102)

Here K^{⊗alg j} denotes the algebraic j tensors over K for j ≥ 1 and denotes C for j = 0. We will also write

F = ∑_{j=0}^∞ K^{⊗j} (5.103)

F is a Hilbert space and F_0 is dense in F.

Notation 5.10 (Left interior product) Let u ∈ K and let α = x_1 ⊗ · · · ⊗ x_n ∈ K^{⊗n}. Define

i_u α = (x_1, u) x_2 ⊗ · · · ⊗ x_n (5.104)

and extend i_u linearly to K^{⊗alg n}. Then

(i_u α, β) = (α, u ⊗ β) (5.105)

for decomposable α and therefore for all algebraic n-tensors α and all β ∈ K^{⊗(n−1)}. Hence

|(i_u α, β)| = |(α, u ⊗ β)| ≤ ‖α‖ ‖u‖ ‖β‖

Therefore i_u extends uniquely to a bounded operator i_u : K^{⊗n} → K^{⊗(n−1)} with norm at most ‖u‖.


Notation 5.11 (Permutations) Denote by S_n the group of permutations of {1, 2, . . . , n}. For σ ∈ S_n define

P_σ(u_1 ⊗ · · · ⊗ u_n) = u_{σ^{−1}(1)} ⊗ · · · ⊗ u_{σ^{−1}(n)}, u_j ∈ K, j = 1, . . . , n (5.106)

P_σ is easily seen to extend linearly to a unitary operator on K^{⊗n}. Moreover P_{στ} = P_σ P_τ. So σ ↦ P_σ is a unitary representation of S_n.

Let

P_b = (1/n!) ∑_{σ∈S_n} P_σ and (5.107)

P_f = (1/n!) ∑_{σ∈S_n} sgn(σ) P_σ, (5.108)

where sgn(σ) is one if σ is even and minus one if σ is odd.

Lemma 5.12 P_b and P_f are orthogonal projections on K^{⊗n}.

Proof. If φ : S_n → {−1, 1} is a homomorphism then

((1/n!) ∑_{σ∈S_n} φ(σ)P_σ)^2 = (1/n!^2) ∑_{σ,τ} φ(σ)φ(τ) P_σ P_τ
= (1/n!^2) ∑_{η∈S_n} ∑_{στ=η} φ(στ) P_{στ}
= (1/n!) ∑_{η∈S_n} φ(η) P_η.

In case φ(σ) ≡ 1 this asserts that P_b^2 = P_b. In case φ(σ) = sgn(σ) it asserts that P_f^2 = P_f. Since the adjoint of any summand is another summand (P_σ* = P_{σ^{−1}}) both of these projections are Hermitian.
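A finite dimensional sketch of Lemma 5.12 (the dimension d and rank n are our own illustrative choices): realize P_σ on (R^d)^{⊗n} as axis permutations of n-index arrays and check numerically that P_b and P_f are orthogonal projections.

```python
import itertools, math
import numpy as np

# P_b and P_f of (5.107)-(5.108) on K tensor K tensor K with K = R^3,
# realized as axis permutations of 3-index numpy arrays.
n, d = 3, 3
rng = np.random.default_rng(0)

def P(a, fermi=False):
    out = np.zeros_like(a)
    for perm in itertools.permutations(range(n)):
        # sgn(perm) from the determinant of the permutation matrix
        sgn = round(np.linalg.det(np.eye(n)[list(perm)]))
        out += (sgn if fermi else 1) * np.transpose(a, perm)
    return out / math.factorial(n)

alpha = rng.normal(size=(d,) * n)
beta = rng.normal(size=(d,) * n)
for fermi in (False, True):
    Pa = P(alpha, fermi)
    assert np.allclose(P(Pa, fermi), Pa)        # P^2 = P
    # self-adjointness: (P alpha, beta) = (alpha, P beta)
    assert np.isclose(np.sum(Pa * beta), np.sum(alpha * P(beta, fermi)))
print("P_b and P_f act as orthogonal projections on sample tensors")
```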

Lemma 5.13 Let 1 ≤ k ≤ n + 1 and denote by [1, k] the cyclic permutation 1 → k → k − 1 → · · · → 2 → 1. Let u ∈ K. Then, writing K^{⊗(n+1)} ≡ K ⊗ K^{⊗n}, we have

P_b^{(n+1)} = (1/(n+1)) ∑_{k=1}^{n+1} P_{[1,k]}^{(n+1)} (I ⊗ P_b^{(n)}), and (5.109)

P_f^{(n+1)} = (1/(n+1)) ∑_{k=1}^{n+1} (−1)^{k−1} P_{[1,k]}^{(n+1)} (I ⊗ P_f^{(n)}). (5.110)

Moreover

P_τ i_u = i_u (I ⊗ P_τ) for any τ ∈ S_n, and (5.111)

i_v P_{[1,j+1]}^{(n+1)}(u ⊗ α) = P_{[1,j]}^{(n)}(u ⊗ (i_v α)), for all α ∈ K^{⊗n}, 1 ≤ j ≤ n. (5.112)

Proof. Every permutation σ in S_{n+1} can be written uniquely in the form σ = [1, k]τ for some permutation τ that leaves 1 invariant. Just take τ = [1, k]^{−1}σ if σ(1) = k. Now sgn([1, k]) = (−1)^{k−1}. So sgn(σ) = (−1)^{k−1} sgn(τ). The identity (5.107), with n + 1 in place of n, gives

P_b^{(n+1)} = (1/(n+1)!) ∑_{k=1}^{n+1} P_{[1,k]}^{(n+1)} ∑_τ P_τ (5.113)

where the sum over τ runs over the permutation group of {2, 3, . . . , n + 1}. This is (5.109). Similarly, the identity (5.108) gives (5.110).

It suffices to prove (5.111) and (5.112) in case α = x_1 ⊗ · · · ⊗ x_{n+1}. But then the left side of (5.111) is P_τ (x_1, u)(x_2 ⊗ · · · ⊗ x_{n+1}) while the right side is (x_1, u)P_τ (x_2 ⊗ · · · ⊗ x_{n+1}). To prove (5.112), take α = x_1 ⊗ · · · ⊗ x_n and observe that for 1 ≤ k ≤ n + 1 we have

P_{[1,k]}^{(n+1)}(u ⊗ α) = x_1 ⊗ · · · ⊗ x_{k−1} ⊗ u ⊗ x_k ⊗ · · · ⊗ x_n (u in kth position)

Hence, writing k = j + 1, we find

i_v P_{[1,k]}^{(n+1)}(u ⊗ α) = (x_1, v) x_2 ⊗ · · · ⊗ x_{k−1} ⊗ u ⊗ x_k ⊗ · · · ⊗ x_n (u in jth position)
= (x_1, v) P_{[1,j]}^{(n)}(u ⊗ x_2 ⊗ · · · ⊗ x_n)
= P_{[1,j]}^{(n)}(u ⊗ (i_v α)).

Lemma 5.14 Let u and v be in K and let α ∈ K^{⊗n}. Denote by P_{(1,2)} the permutation operator on K^{⊗n} which permutes the first two factors. Then

i_u i_v P_{(1,2)} α = i_v i_u α (5.114)

Proof. Since all three operators in (5.114) are bounded it suffices to prove (5.114) in case α = x_1 ⊗ · · · ⊗ x_n. But in that case the left side of (5.114) is (x_1, u)(x_2, v) x_3 ⊗ · · · ⊗ x_n while the right side is (x_2, v)(x_1, u) x_3 ⊗ · · · ⊗ x_n.

Definition 5.15 A tensor α ∈ K^{⊗n} is symmetric if P_σα = α for all σ ∈ S_n. α is anti-symmetric if P_σα = sgn(σ)α for all σ ∈ S_n. Any element of K^{⊗0} ≡ C or of K is both symmetric and anti-symmetric.

Corollary 5.16 Suppose that α ∈ K^{⊗n}. Then

(i_u i_v − i_v i_u)α = 0 if α is symmetric (5.115)

(i_u i_v + i_v i_u)α = 0 if α is anti-symmetric (5.116)

Proof. Since the transposition (1, 2) is odd, the identity (5.114) proves both (5.115) and (5.116).

Definition 5.17 The Boson Fock space is the subspace F_b of F consisting of symmetric tensors. The Fermion Fock space is the subspace F_f of F consisting of anti-symmetric tensors. Thus if

α = ∑_{n=0}^∞ α_n, α_n ∈ K^{⊗n} (5.117)

then
α ∈ F_b if each α_n is symmetric.
α ∈ F_f if each α_n is anti-symmetric.

Definition 5.18 Let u ∈ K. The annihilation operator associated to u is given by

a(u)α = ∑_{n=1}^∞ √n i_u α_n (5.118)

when α is given by (5.117). The domain of a(u) consists of those α for which the series (5.118) converges. The interior product i_u is defined in Notation 5.10. The annihilation operators will be of interest for us only when operating on F_b or F_f. The creation operator associated to u is

c(u)α = ∑_{n=0}^∞ √(n + 1) P_b(u ⊗ α_n) for α ∈ F_b (5.119)

c(u)α = ∑_{n=0}^∞ √(n + 1) P_f(u ⊗ α_n) for α ∈ F_f (5.120)


As before, the domain of c(u) consists of all α in the respective space F_b or F_f for which the series converges. These two creation operators are called the Boson creation operator and Fermion creation operator, respectively.

Lemma 5.19 F_b and F_f are each invariant under a(u) for all u ∈ K. Moreover

c(u)* = a(u) on F_b or F_f, respectively. (5.121)

Equivalently,

(a(u)|_{F_b})* = c(u) on F_b (5.122)

(a(u)|_{F_f})* = c(u) on F_f (5.123)

Proof. If α is in K^{⊗n} and τ ∈ S_{n−1} then P_τ i_u α = i_u (I ⊗ P_τ)α by (5.111). Thus if α is symmetric then so is i_u α and if α is anti-symmetric then so is i_u α. So F_b and F_f are each invariant under a(u).

If β ∈ K^{⊗(n−1)} then (i_u α, β) = (α, u ⊗ β) by (5.105). Multiply by √n to find (a(u)α, β) = √n(α, u ⊗ β) = (α, √n P_b(u ⊗ β)) if α is symmetric. So if α and β are both symmetric then (a(u)α, β) = (α, c(u)β). This proves (5.122). This also proves (5.123) if we simply replace symmetric by anti-symmetric and P_b by P_f.

Since a(u) lowers rank by one while c(u) raises rank by one it's clear that the identity (a(u)α, β) = (α, c(u)β) holds for all finite rank symmetric tensors α and β and for all finite rank anti-symmetric tensors α and β. This proves (5.122) and (5.123) on the space of finite rank tensors. In particular a(u) and c(u) have densely defined adjoints and therefore their restrictions to finite rank tensors have closed extensions. It's clear from the definitions that the closures of these operators have the domains specified in Definition 5.18.

END of DAY 23 = 4/19/11

5.6.2 The canonical commutation relations

Theorem 5.20 (Boson commutation relations) For any vectors u and v in K we have, on F_b,

a(u)a(v) − a(v)a(u) = 0 (5.124)

c(u)c(v) − c(v)c(u) = 0 (5.125)

a(u)c(v) − c(v)a(u) = (v, u) I_{F_b} (5.126)

Proof. We are going to prove these commutation relations when applied just to finite rank vectors. To this end we need only consider a tensor α ∈ K^{⊗n} which is symmetric. In this case

(a(u)a(v) − a(v)a(u))α = √(n(n − 1)) (i_u i_v − i_v i_u)α = 0

by (5.115). This proves (5.124). (5.125) now follows from Lemma 5.19. (Say something about domains.)

Using (5.109) and the symmetry of α, we find

a(v)c(u)α = (√(n+1) i_v)(√(n+1) P_b^{(n+1)}(u ⊗ α))
= i_v ∑_{k=1}^{n+1} P_{[1,k]}^{(n+1)}(u ⊗ α)
= i_v(u ⊗ α) + ∑_{k=2}^{n+1} i_v P_{[1,k]}^{(n+1)}(u ⊗ α)
= (u, v)α + ∑_{j=1}^n P_{[1,j]}^{(n)}(u ⊗ (i_v α)),

wherein we have used (5.112) in the last step. On the other hand, by (5.109) with n instead of n + 1, we have

c(u)a(v)α = n P_b^{(n)}(u ⊗ (i_v α)) = ∑_{j=1}^n P_{[1,j]}^{(n)}(u ⊗ (i_v α)).

This proves (5.126) on symmetric tensors in K^{⊗n} and therefore on all symmetric tensors of finite rank.
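The relation (5.126) is easy to test numerically at a single rank. A sketch over K = R^4 with real vectors (so (v, u) is the ordinary dot product; the dimension and rank are our own choices), following Definition 5.18 restricted to a rank-2 symmetric tensor:

```python
import itertools, math
import numpy as np

# Check of (5.126): a(u)c(v) - c(v)a(u) = (v,u) I on a random symmetric
# 2-tensor over K = R^4.
d = 4
rng = np.random.default_rng(1)
u, v = rng.normal(size=d), rng.normal(size=d)

def sym(a):
    perms = itertools.permutations(range(a.ndim))
    return sum(np.transpose(a, p) for p in perms) / math.factorial(a.ndim)

def i_left(u, a):                       # interior product (5.104)
    return np.tensordot(u, a, axes=(0, 0))

def ann(u, a):                          # a(u) = sqrt(n) i_u on rank n
    return math.sqrt(a.ndim) * i_left(u, a)

def cre(v, a):                          # c(v) = sqrt(n+1) P_b(v tensor a)
    return math.sqrt(a.ndim + 1) * sym(np.tensordot(v, a, axes=0))

alpha = sym(rng.normal(size=(d, d)))
lhs = ann(u, cre(v, alpha)) - cre(v, ann(u, alpha))
assert np.allclose(lhs, np.dot(v, u) * alpha)
print("(5.126) verified on a random symmetric 2-tensor")
```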

Theorem 5.21 (Fermion commutation relations) For any vectors u and v in K we have, on F_f,

a(u)a(v) + a(v)a(u) = 0 (5.127)

c(u)c(v) + c(v)c(u) = 0 (5.128)

a(u)c(v) + c(v)a(u) = (v, u) I_{F_f} (5.129)


Proof. The proofs of (5.127) and (5.128) are the same as those of (5.124) and (5.125) if one uses (5.116) instead of (5.115). Next, using (5.110) and the anti-symmetry of α we find

a(v)c(u)α = (√(n+1) i_v)(√(n+1) P_f^{(n+1)}(u ⊗ α))
= i_v ∑_{k=1}^{n+1} (−1)^{k−1} P_{[1,k]}^{(n+1)}(u ⊗ α)
= i_v(u ⊗ α) + ∑_{k=2}^{n+1} (−1)^{k−1} i_v P_{[1,k]}^{(n+1)}(u ⊗ α)
= (u, v)α + ∑_{j=1}^n (−1)^j P_{[1,j]}^{(n)}(u ⊗ (i_v α)),

wherein we have used (5.112) in the last step. On the other hand, by (5.110) with n instead of n + 1, we have

c(u)a(v)α = n P_f^{(n)}(u ⊗ (i_v α)) = −∑_{j=1}^n (−1)^j P_{[1,j]}^{(n)}(u ⊗ (i_v α)).

This proves (5.129) on anti-symmetric tensors in K^{⊗n} and therefore on all anti-symmetric tensors of finite rank.

Mildly hidden in the commutation relations (5.124) - (5.126) are the Heisenberg canonical commutation relations for infinitely many P's and Q's. Indeed these identities immediately give

[c(u) + a(u), c(v) + a(v)] = ((v, u) − (u, v)) I = 2i Im(v, u) I (5.130)

So let {e_1, e_2, . . . } be an orthonormal basis of K and define

Q_j = (c(e_j) + a(e_j))/√2 and (5.131)

P_k = (c(ie_k) + a(ie_k))/√2 = i (c(e_k) − a(e_k))/√2 (5.132)

Then all of these operators are clearly symmetric (actually self-adjoint, but never mind that). Moreover (5.130) shows that

[P_k, Q_j] = iδ_{j,k} I, j, k = 1, 2, . . . (5.133)


Thus the space F_b supports a countable family of operators satisfying the Heisenberg commutation relations. We are going to see in the next section that this is just what one needs to pass from a system with finitely many degrees of freedom, i.e., mechanics, to a system with infinitely many degrees of freedom, i.e., a field.

5.6.3 Occupation number bases

The following bases of F_b and F_f are conceptually illuminating and a big step in understanding some of the notation common in the physics literature.

Theorem 5.22 (Occupation number basis of F_b) Let e_1, e_2, . . . be an orthonormal basis of K. For any vector u ∈ K let

u^n = u ⊗ u ⊗ · · · ⊗ u (n factors) (5.134)

Let n_1, n_2, . . . be a finitely nonzero sequence of non-negative integers and let n = ∑_{j=1}^∞ n_j. (This is a finite sum.) Define

|n_1, n_2, . . .⟩ = (n!/(n_1! n_2! · · · ))^{1/2} P_b(e_1^{n_1} ⊗ e_2^{n_2} ⊗ · · · ) (5.135)

Then the set of vectors {|n_1, n_2, . . .⟩ : n_j ≥ 0, ∑_{j=1}^∞ n_j < ∞} is an orthonormal basis of F_b.

Proof. Since P_b is a projection,

‖P_b(e_1^{n_1} ⊗ e_2^{n_2} ⊗ · · · )‖^2 = (P_b(e_1^{n_1} ⊗ e_2^{n_2} ⊗ · · · ), e_1^{n_1} ⊗ e_2^{n_2} ⊗ · · · ). (5.136)

Referring to (5.107), observe that if a permutation σ carries the subset {1, . . . , n_1} into itself and carries the set {n_1 + 1, . . . , n_1 + n_2} into itself and so on, then (P_σ(e_1^{n_1} ⊗ e_2^{n_2} ⊗ · · · ), e_1^{n_1} ⊗ e_2^{n_2} ⊗ · · · ) = ((e_1^{n_1} ⊗ e_2^{n_2} ⊗ · · · ), (e_1^{n_1} ⊗ e_2^{n_2} ⊗ · · · )). The number of permutations of this form is n_1! n_2! · · · . For any permutation not of this form the last inner product is clearly zero. Hence ‖P_b(e_1^{n_1} ⊗ e_2^{n_2} ⊗ · · · )‖^2 = (n_1! n_2! · · · )/n!. This shows that the vectors |n_1, n_2, . . .⟩ are normalized. Moreover ((e_1^{n_1} ⊗ e_2^{n_2} ⊗ · · · ), (e_1^{m_1} ⊗ e_2^{m_2} ⊗ · · · )) = 0 if any m_j ≠ n_j, no matter what order these factors are placed in. Hence the vectors |n_1, n_2, . . .⟩ are also orthogonal. Finally, if w is a symmetric n tensor which is orthogonal to all of the |n_1, n_2, . . .⟩ then (w, e_{j_1} ⊗ e_{j_2} ⊗ · · · ⊗ e_{j_n}) = (P_b w, e_{j_1} ⊗ · · · ⊗ e_{j_n}) = (w, P_b(e_{j_1} ⊗ · · · ⊗ e_{j_n})) = 0 because P_b(e_{j_1} ⊗ · · · ⊗ e_{j_n}) is a multiple of one of the basis elements in (5.135).

Terminology. One says that the vector |n_1, n_2, . . .⟩ is a state in which n_1 particles are present in state e_1, n_2 particles in state e_2, and so on. Otherwise put, the state |n_1, n_2, . . .⟩ is occupied by n_1 particles in state e_1, n_2 particles in state e_2, and so on.

Theorem 5.23 (Action of annihilation and creation operators.) For any orthonormal basis e_1, e_2, . . . of K we have

c(e_j)|n_1, n_2, . . .⟩ = √(n_j + 1) |n_1, n_2, . . . , n_j + 1, n_{j+1}, . . .⟩ (5.137)

a(e_j)|n_1, n_2, . . .⟩ = √(n_j) |n_1, n_2, . . . , n_j − 1, n_{j+1}, . . .⟩ (5.138)

wherein we have made the convention that |n_1, n_2, . . .⟩ = 0 if any n_i < 0.
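In the occupation basis, (5.137)-(5.138) become explicit matrices for a single mode. A truncated numerical sketch (the cutoff N is our own choice; the truncation necessarily leaves an artifact in the highest state):

```python
import numpy as np

# One-mode truncation of (5.137)-(5.138) on span{|0>, ..., |N-1>}.
N = 10
c = np.diag(np.sqrt(np.arange(1.0, N)), k=-1)   # c|n> = sqrt(n+1)|n+1>
a = c.T                                         # a|n> = sqrt(n)|n-1>

# Boson relation a c - c a = I ((5.126) with u = v = e_j) holds except
# in the highest state, which the cutoff destroys.
comm = a @ c - c @ a
assert np.allclose(comm[:N-1, :N-1], np.eye(N-1))
assert np.isclose(comm[N-1, N-1], 1 - N)        # truncation artifact

# The number operator c a is diagonal with eigenvalues 0, 1, ..., N-1.
assert np.allclose(c @ a, np.diag(np.arange(N, dtype=float)))
print("truncated one-mode relations check out")
```

The diagonal number operator c a is the finite-mode shadow of the eigenvalues nω_j appearing later in (5.160).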

For the Fermion occupation number basis our readers may find it more familiar to describe the Fermion Fock space as the completion of the exterior algebra over K in the direct sum norm that we have been using. In order to make algebraic manipulations with ease we will denote by Λ(K) the algebraic anti-symmetric tensors over the Hilbert space K.

Recall the standard notation, [40, pages 7 and 28],

u_1 ∧ · · · ∧ u_n = P_f(u_1 ⊗ · · · ⊗ u_n), u_j ∈ K (5.139)

α ∧ β = P_f(α ⊗ β), α, β ∈ Λ(K) (5.140)

Λ(K) is an associative algebra in this product, as follows from the identity (u_1 ∧ · · · ∧ u_k) ∧ (v_1 ∧ · · · ∧ v_j) = u_1 ∧ · · · ∧ u_k ∧ v_1 ∧ · · · ∧ v_j, which in turn follows from the easily verified identity P_f^{(k+j)}(P_f^{(k)} ⊗ P_f^{(j)}) = P_f^{(k+j)} (exercise).

The creation operator can be described in these simple algebraic terms thus: If α is an n-tensor in Λ(K) then

c(u)α = √(n + 1) u ∧ α, α ∈ Λ(K) ∩ K^{⊗n}, (5.141)

which follows from the definitions (5.120) and (5.140). In particular, induction now shows that

c(u_1) · · · c(u_n)1 = √(n!) u_1 ∧ · · · ∧ u_n (5.142)


The norm of this element is easy to compute if {u_1, . . . , u_n} is orthonormal because (P_σ(u_1 ⊗ · · · ⊗ u_n), P_τ(u_1 ⊗ · · · ⊗ u_n)) = 0 unless σ = τ, and in this case the value is one. So

n! ‖u_1 ∧ · · · ∧ u_n‖^2 = n! ‖P_f(u_1 ⊗ · · · ⊗ u_n)‖^2
= (1/n!) ∑_σ ∑_τ (sgn σ)(sgn τ)(P_σ(u_1 ⊗ · · · ⊗ u_n), P_τ(u_1 ⊗ · · · ⊗ u_n))
= 1

Hence

‖c(u_1) · · · c(u_n)1‖ = 1 if {u_1, . . . , u_n} is orthonormal (5.143)

An occupation number basis for Fermions can be readily defined. Let e_1, e_2, . . . be an orthonormal basis of K. Let (n_1, n_2, . . . ) be a finitely non-zero sequence of zeros and ones and let n = ∑_j n_j. Define

|n_1, n_2, . . .⟩ = √(n!) e_1^{n_1} ∧ e_2^{n_2} ∧ · · · (5.144)

where e_j^{n_j} should be simply omitted from the wedge product if n_j = 0. Then (5.142) and (5.143) show that these are unit vectors. Any two of them are clearly orthogonal and their span is dense in F_f. So they form an orthonormal basis of F_f. Moreover (5.142) shows that

c(e_j)|n_1, n_2, . . .⟩ = (−1)^{∑_{i=1}^{j−1} n_i} |n_1, n_2, . . . , n_j + 1, n_{j+1}, . . .⟩ if n_j = 0, and = 0 if n_j = 1. (5.145)

It is a simple exercise to deduce from this and from the commutation relations that

a(e_j)|n_1, n_2, . . .⟩ = (−1)^{∑_{i=1}^{j−1} n_i} |n_1, n_2, . . . , n_j − 1, n_{j+1}, . . .⟩ if n_j = 1, and = 0 if n_j = 0. (5.146)

Of course no more than one particle can occupy a state e_j because e_j ∧ e_j = 0. This is reflected in the choice of indexing sequences, namely zeros and ones, in contrast with the Boson case. The Pauli exclusion principle, which asserts that no more than one electron (or other Fermion) can occupy a given state, is also reflected in this choice of indexing sequences.
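The sign rules (5.145)-(5.146) can be implemented directly on sequences of zeros and ones. A small sketch (three modes, our own choice) checking that they reproduce the anticommutation relations of Theorem 5.21 on the resulting basis:

```python
import itertools
import numpy as np

M = 3
states = list(itertools.product((0, 1), repeat=M))   # occupation sequences
index = {s: k for k, s in enumerate(states)}

def op(create, j):
    """Matrix of c(e_j) (create=True) or a(e_j) (create=False),
    per (5.145)-(5.146); j is 1-based."""
    A = np.zeros((len(states), len(states)))
    for s in states:
        n = list(s)
        if create and n[j - 1] == 0:
            n[j - 1] = 1
        elif (not create) and n[j - 1] == 1:
            n[j - 1] = 0
        else:
            continue                                 # the operator gives 0
        sign = (-1) ** sum(s[:j - 1])                # (-1)^{n_1+...+n_{j-1}}
        A[index[tuple(n)], index[s]] = sign
    return A

c = [op(True, j) for j in range(1, M + 1)]
a = [op(False, j) for j in range(1, M + 1)]
I = np.eye(len(states))
for i in range(M):
    for j in range(M):
        assert np.allclose(a[i] @ c[j] + c[j] @ a[i], (i == j) * I)
        assert np.allclose(c[i] @ c[j] + c[j] @ c[i], 0 * I)
        assert np.allclose(a[i] @ a[j] + a[j] @ a[i], 0 * I)
print("CAR of Theorem 5.21 verified for 3 modes")
```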


5.6.4 Time evolution on Fb and Ff .

A Schrodinger evolution on K naturally induces a Schrodinger evolution on F_b and F_f as follows.

Let U be a unitary operator on K. Then U^{⊗n} ≡ U ⊗ · · · ⊗ U (n factors) is a unitary operator on K^{⊗n}. The direct sum

Γ(U) = I ⊕ U ⊕ U^{⊗2} ⊕ . . . (5.147)

is a unitary operator on F. It is clear that Γ(U) commutes with permutations, and therefore leaves F_b and F_f invariant. Moreover Γ is a group homomorphism in that Γ(UV) = Γ(U)Γ(V) for any two unitaries U and V on K. Furthermore, if A is a self-adjoint operator on K and e^{itA} is the one parameter group that it generates, then Γ(e^{itA}) is also a one parameter group on F. For any vectors x_1, . . . , x_n ∈ K each factor in (e^{itA}x_1) ⊗ · · · ⊗ (e^{itA}x_n) is a continuous function of t and therefore so is the product. Therefore Γ(e^{itA})w is a continuous function of t for a dense set of vectors w ∈ F and therefore for all w ∈ F because Γ(e^{itA}) is uniformly bounded. Thus the map t ↦ Γ(e^{itA}) is a strongly continuous one-parameter unitary group, and, by the Stone-von Neumann theorem, Theorem 4.7, has a self-adjoint generator γ(A). That is,

Γ(e^{itA}) = e^{itγ(A)}. (5.148)

If we simply differentiate (e^{itA}x_1) ⊗ · · · ⊗ (e^{itA}x_n) with respect to t at t = 0 we find a formula for γ(A) on n-tensors. Namely,

γ(A) = A ⊗ I ⊗ I ⊗ · · · ⊗ I + I ⊗ A ⊗ I ⊗ · · · ⊗ I + . . . (5.149)

on n-tensors which are in the span of products of vectors in the domain of A. Actually this domain is a core for the self-adjoint operator γ(A). But we will omit any discussion of this technical issue here. You could look up J.M. Cook's paper [13] on this point if you wish.
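The relation (5.148) can be tested on the two-particle sector alone. A numerical sketch with K = C^2 (the Hermitian matrix A and the time t are our own arbitrary choices):

```python
import numpy as np

# Check of (5.148)-(5.149) restricted to K tensor K with K = C^2:
# exp(it (A tensor I + I tensor A)) = e^{itA} tensor e^{itA}.
rng = np.random.default_rng(2)
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A = (B + B.conj().T) / 2                    # self-adjoint generator
t = 0.7

def expi(H, t):
    """e^{itH} for Hermitian H via the spectral theorem."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(1j * t * w)) @ V.conj().T

I2 = np.eye(2)
gamma_A = np.kron(A, I2) + np.kron(I2, A)   # (5.149) on 2-tensors
assert np.allclose(expi(gamma_A, t), np.kron(expi(A, t), expi(A, t)))
print("Gamma(e^{itA}) = e^{it gamma(A)} on the two-particle sector")
```

The identity holds because A ⊗ I and I ⊗ A commute, so the exponential of their sum factors.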

If A is the Hamiltonian for some system whose state space is K then A ⊗ I + I ⊗ A is the Hamiltonian for the system consisting of two particles “of type K”. Of course if this particle is a Boson you must restrict this operator to the symmetric tensors in K ⊗ K. And if the particle of type K is a Fermion you must restrict this operator to the anti-symmetric tensors in K ⊗ K. Clearly the operator A ⊗ I + I ⊗ A leaves both of these subspaces invariant. Similarly, the operator γ(A) is the Hamiltonian for the system consisting of an arbitrary number of identical particles, whether they be all Bosons or all Fermions.

There is an important physical point to be understood in this construction of a Hamiltonian for a family of identical particles from the Hamiltonian for one of them. The Hamiltonian γ(A) corresponds to adding together the energy of several particles, say n, without including any energy of interaction between the particles. The n particles move “freely” without interfering with each other. This is not really “reality”. In a realistic system particles interact, either directly or through some other intermediate particle. Typically, such interactions between otherwise independently moving particles are incorporated by adding an additional term to the Hamiltonian γ(A). We are going to explore simple cases of this in Section 5.8.

REFERENCES: Cook [13], Fock [20], Reed and Simon [48, Section X.7]. This last reference contains much more about Fock spaces than we have discussed.


5.7 The Particle-Field isomorphism.

In Section 5.5 we constructed a quantized field φ(x, t). On the one hand it was a solution to our field equation

φ̈(x, t) = φ′′(x, t)

On the other hand it was “quantized” in the sense that not only do φ(x, t) and φ(y, t′) not commute with each other, but the “position-like” operators φ(x, t) and the “momentum-like” operators φ̇(y, t) satisfy the continuum version of the usual Heisenberg commutation relations [P_j, Q_k] = iδ_{jk}I. Namely,

[φ̇(x, t), φ(y, t)] = −iδ(x − y)I

cf. (5.89). Of course the appearance of the delta function just reflects the fact that x ↦ φ(x, t) is an operator valued distribution on (0, π), not an operator valued function.

The structure of the space L2(R∞, γ) on which these field operators act contrasts sharply with that in the previous section, which was all about particles, many particles. In this section we are going to show that the field picture of Section 5.5 and the particle picture of Section 5.6 are isomorphic in a way that preserves all the quantum mechanical structures. A little more precisely, we will construct a unitary operator from the field Hilbert space L2(R∞, γ) onto the appropriate Boson Fock space F_b, which interchanges the field operators φ(x, t) with simple expressions in terms of the annihilation and creation operators. To this end we will apply the machinery of Section 5.6 to the appropriate Hilbert space K, which underlies the definition of the quantized field φ constructed in Section 5.5.

Definition 5.24 (The field space.) Denote by h_{1/2} the space of sequences q ≡ (q_1, q_2, . . . ) of real numbers such that

‖q‖^2_{h_{1/2}} ≡ 2 ∑_{j=1}^∞ ω_j q_j^2 < ∞. (5.150)

Denote by H_{1/2} the set of real valued functions u on (0, π) of the form

u(x) = ∑_{j=1}^∞ q_j u_j(x), x ∈ (0, π), (q_1, q_2, . . . ) = q ∈ h_{1/2} (5.151)

and define

‖u‖_{H_{1/2}} = ‖q‖_{h_{1/2}} (5.152)

Since ω_j → ∞, the definition (5.150) implies that h_{1/2} ⊂ ℓ^2. The series (5.151) therefore converges in L2((0, π)). It's worth noting that (5.152) gives

‖u_j‖_{H_{1/2}} = √(2ω_j). (5.153)

A reader familiar with Sobolev spaces might notice that H_{1/2} is exactly the Sobolev space of order 1/2 over (0, π) because the functions u_j satisfy u_j′′ = −ω_j^2 u_j with ω_j = j.

Preview: The following theorem has quite a few different proofs. Here are sketches of the main ones.

a) (not for us) The von Neumann algebra generated by the operators Q_j defined in Section 5.6 is a maximal abelian algebra. The vacuum state, 1 ∈ K^{⊗0} ≡ C, is a cyclic vector for this algebra and induces the measure γ on the spectrum of the algebra when the spectrum is identified with R∞.

b) (not for us) The infinite dimensional heat semigroup e^{t∆_∞} takes L2(R∞; γ) to functions with holomorphic extensions to C∞. The Taylor coefficients of such functions, computed at zero, are exactly the symmetric tensors in F_b. Wick ordering lurks in this approach.

c) (for us) We will map a natural orthonormal basis of L2(R∞, γ) onto the occupation number basis of F_b defined in Section 5.6 and extend the map unitarily.

Theorem 5.25 (Particle-Field isomorphism) Take K = H_{1/2} in Section 5.6 and denote by F_b the corresponding Boson Fock space over H_{1/2}. There is a unique unitary operator

U : L2(R∞, γ) → F_b (5.154)

such that

a) U1 = 1 (the zero rank tensor in F_b) and (5.155)

b) U(∂/∂q_j)U^{−1} = a(u_j), j = 1, 2, . . . (5.156)

Further, let

a_j = (2ω_j)^{−1/2} a(u_j), j = 1, 2, . . . (5.157)

denote the normalized annihilation operators. Then U induces the following maps on operators

U Q_j U^{−1} = ω_j^{−1/2} Q_j, (5.158)

U P_j U^{−1} = ω_j^{1/2} P_j and (5.159)

U H U^{−1} = ∑_{j=1}^∞ ω_j a_j* a_j, (5.160)

where the Q_j and P_j on the right hand sides are defined in (5.131) and (5.132).

Proof. Referring to the notation in Section 5.5, let e_j = (1/√(2ω_j)) u_j. Then e_j is an orthonormal basis of H_{1/2}, as we see from (??). This basis defines in turn the orthonormal basis |n_1, n_2, . . .⟩ defined in Theorem ??.

Referring now to Appendix 9.5 on Hermite polynomials, define functions ψ_{n,j} : R∞ → R by

ψ_{n,j}(q_1, q_2, . . . ) = (1/√(n!)) H_n(√(2ω_j) q_j) (5.161)

where H_n is the nth order Hermite polynomial. It was shown in Appendix 9.5 that for each fixed j the functions q_j ↦ (1/√(n!)) H_n(√(2ω_j) q_j), n = 0, 1, . . . form an orthonormal basis of L2(R, γ_{ω_j}). Since the functions q ↦ q_j are independent with respect to γ, the finite products Π_{j=1}^N ψ_{n_j,j}(q) are orthonormal with respect to γ. In fact they form an orthonormal basis of L2(R∞, γ). (Exercise for the reader.) Define

U(Π_{j=1}^N ψ_{n_j,j}) = |n_1, n_2, . . .⟩ (5.162)

Then U extends to a unitary operator from L2(R∞; γ) onto F_b(H_{1/2}). It remains to show that U has the asserted properties.

To this end note first that if n = 0 then ψ_{n,j} = 1 for all j. Therefore U(1) = |0, 0, . . .⟩ = 1. This proves (5.155).

Next, observe that (9.69) shows that

(∂/∂q_j) ψ_{n,j} = √(2ω_j) √n ψ_{n−1,j}. (5.163)

For any occupation number basis vector |n_1, n_2, . . .⟩ the application of U^{−1} yields a product of functions ψ_{n_k,k}(q) and since ∂/∂q_j differentiates just one factor, (5.163) shows that the effect of differentiation is to lower n_j by one and to multiply by √(2ω_j) √(n_j). In accordance with (5.138) this yields √(2ω_j) a_j |n_1, n_2, . . .⟩, which, by (5.157), equals a(u_j)|n_1, n_2, . . .⟩. Thus (5.156) holds when applied to any occupation number basis vector and therefore holds on a dense set in F_b(H_{1/2}). We omit the technicalities addressing the exact domain on which (5.156) holds.

The remaining three properties (5.158)-(5.160) are really just rewrites of the identities (5.49)-(5.53) proved in Section 5.3, along with the definitions (5.131) and (5.132), once one knows the defining relation (5.156). Thus (5.156) implies that $U(\partial/\partial q_j)^* U^{-1} = a(u_j)^*$. Therefore

$$U\big(\partial/\partial q_j + (\partial/\partial q_j)^*\big)U^{-1} = a(u_j) + c(u_j) \quad\text{and} \qquad (5.164)$$

$$U\big(\partial/\partial q_j - (\partial/\partial q_j)^*\big)U^{-1} = a(u_j) - c(u_j) \qquad (5.165)$$

Keeping in mind the definition (5.157), divide both of these equations by $2\sqrt{\omega_j}$ to find

$$U\Big(\frac{1}{2\sqrt{\omega_j}}\big(\partial/\partial q_j + (\partial/\partial q_j)^*\big)\Big)U^{-1} = \frac{a_j + c_j}{\sqrt{2}} \quad\text{and} \qquad (5.166)$$

$$U\Big(\frac{1}{2\sqrt{\omega_j}}\big(\partial/\partial q_j - (\partial/\partial q_j)^*\big)\Big)U^{-1} = \frac{a_j - c_j}{\sqrt{2}} \qquad (5.167)$$

Therefore, in view of (5.38), (5.39), (5.131) and (5.132), we have

$$U\big(\omega_j^{1/2} Q_j\big)U^{-1} = Q_j \quad\text{and} \qquad (5.168)$$

$$U\big(\omega_j^{-1/2} P_j\big)U^{-1} = P_j, \qquad (5.169)$$

which proves (5.158) and (5.159). Finally, the equation (5.53) yields (5.160) if one replaces $\alpha_j$, which, by (5.156), is the unitary transform of $a(u_j)$, by $a_j$, in accordance with (5.157).
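The algebra behind (5.160) is easy to make concrete in a finite truncation. The sketch below (a numerical illustration added here, not part of the notes; the truncation dimension $d$ and the sample frequencies are arbitrary choices) builds the matrix of a normalized annihilation operator with the same $\sqrt{n}$ lowering as in (5.163), and checks that $a^*a$ is the number operator and that a two-mode Hamiltonian $\sum_j \omega_j a_j^* a_j$ is diagonal in the occupation number basis.

```python
import numpy as np

# Truncated single-mode annihilation operator on span{|0>, ..., |d-1>}:
# a|n> = sqrt(n) |n-1>, the sqrt(n) lowering of (5.163) after normalization.
d = 8
a = np.diag(np.sqrt(np.arange(1, d)), k=1)
adag = a.conj().T

# The number operator a* a is diagonal with eigenvalues 0, 1, ..., d-1.
assert np.allclose(adag @ a, np.diag(np.arange(d)))

# The commutation relation [a, a*] = I holds away from the truncation corner.
comm = a @ adag - adag @ a
assert np.allclose(comm[:-1, :-1], np.eye(d - 1))

# A two-mode toy version of (5.160): H = w1 n_1 + w2 n_2 is diagonal in the
# occupation basis, with eigenvalues w1*n1 + w2*n2 (frequencies chosen freely).
w1, w2 = 1.0, 2.0
I = np.eye(d)
H = w1 * np.kron(adag @ a, I) + w2 * np.kron(I, adag @ a)
evals = np.diag(H)
assert np.allclose(sorted(evals)[:3], [0.0, 1.0, 2.0])
```

The off-diagonal entries of $H$ vanish identically, which is the finite-dimensional shadow of the statement that the occupation number basis diagonalizes $UHU^{-1}$.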

SUMMARY: In the past four sections we constructed an operator valued distribution $\phi(x, t)$ over $(0, \pi) \times \mathbb{R}$ acting on a Hilbert space $L^2(\mathbb{R}^\infty, \gamma)$, which satisfies our wave equation (5.88) and also the Heisenberg canonical commutation relations (5.89). It is the latter identity that distinguishes the quantum field $\phi$ from a classical (i.e. real valued) solution to the wave equation. Where are the "photons" associated to the quantum field $\phi$? In Section 5.6 we constructed a Hilbert space $F_b$ whose elements correspond to a variable number of particles, each of which has Hilbert state space $K$, a given Hilbert space. The idea that $F_b$ contains vectors representing two particles (of type $K$, let's call them) or three particles is very visible in the structure of $F_b$ because $F_b$ contains a subspace $K \otimes K$ and a subspace $K \otimes K \otimes K$ (actually just the symmetric tensors in these spaces). Such a particle structure is not visible in the Hilbert space $L^2(\mathbb{R}^\infty, \gamma)$, on which the quantum field $\phi(x, t)$ acts. However, as we saw in Theorem 5.25, there is a natural unitary map $U$ from $L^2(\mathbb{R}^\infty, \gamma)$ onto $F_b$ if we take $K = H_{1/2}$. Here the word "natural" means that $U$ induces a map on operators which interchanges $\phi(x, t)$ with certain easy to describe operators on $F_b$. In fact $U\phi(x, t)U^{-1}$ can be written as a linear combination of creation and annihilation operators on $F_b$. (Take this as an easy exercise based on Theorem 5.25.) Pulling back the particle structure from $F_b$ to $L^2(\mathbb{R}^\infty, \gamma)$, we should try to understand which functions $\psi : \mathbb{R}^\infty \to \mathbb{C}$ correspond to one photon states, two photon states, etc.

First, dismiss from your mind the idea that you may have, that the harmonic oscillators that we used to construct the space $L^2(\mathbb{R}^\infty, \gamma)$, back in Section 5.3, "are" themselves the particles (photons) in question. Let's look at some examples of what the Particle-Field isomorphism does to some simple wave functions in $L^2(\mathbb{R}^\infty; \gamma)$. Recall that the function $q \mapsto \psi_{n,j}(q)$ is a function of the $j$th coordinate $q_j$ of $q$ and, as a function of $q_j$, it is a Hermite polynomial of order $n$, with its argument adjusted appropriately for the frequency $\omega_j$, as in (5.161). The orthonormal basis of $H_{1/2}$ that we used for construction of the isomorphism is $e_j = (2\omega_j)^{-1/2}\, u_j$. The isomorphism takes e.g. $\psi_{1,j}$ to the 1-photon state $e_j$. So no matter which harmonic oscillator (these are indexed by $j$) we start with, we "get" only a one photon state. For different $j$ these photons differ only in their color (i.e. frequency), not in their number. Next consider the function $\psi_{2,5}$. This maps to $e_5 \otimes e_5$ under the isomorphism (5.162). This is a two photon state. Both photons have frequency $\omega_5$ (which is just 5 in our vibrating string example). Next, consider the function $\psi_{1,5}(q)\,\psi_{1,7}(q)$. This maps to $\mathrm{const.}\,(e_5 \otimes e_7 + e_7 \otimes e_5)$, which is also a two photon state. One of the photons has frequency $\omega_5$ and the other has frequency $\omega_7$.
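The Hermite machinery quoted from Appendix 9.5 can be checked numerically. The sketch below is an illustration, not part of the notes: it assumes the probabilists' convention $He_{n+1}(x) = x\,He_n(x) - n\,He_{n-1}(x)$ (the appendix may normalize differently), and verifies that the functions $(1/\sqrt{n!})\,He_n$ are orthonormal for the standard Gaussian measure, together with the derivative identity $He_n' = n\,He_{n-1}$, which is the lowering relation behind (5.163) after the substitution $q \mapsto \sqrt{2\omega_j}\, q$.

```python
import math
import numpy as np

# Probabilists' Hermite polynomials via the three-term recurrence
# He_{n+1}(x) = x He_n(x) - n He_{n-1}(x)  (a convention assumption).
def hermite(nmax):
    polys = [np.poly1d([1.0]), np.poly1d([1.0, 0.0])]
    for n in range(1, nmax):
        polys.append(np.poly1d([1.0, 0.0]) * polys[n] - n * polys[n - 1])
    return polys[: nmax + 1]

He = hermite(6)

# Gauss-Hermite nodes/weights for the weight e^{-x^2}; substituting
# x -> sqrt(2) x converts to integration against the standard Gaussian.
nodes, weights = np.polynomial.hermite.hermgauss(60)

def gauss_inner(p, q):
    vals = p(np.sqrt(2.0) * nodes) * q(np.sqrt(2.0) * nodes)
    return np.sum(weights * vals) / np.sqrt(np.pi)

# Orthonormality of psi_n = He_n / sqrt(n!) in L^2(R, gamma).
G = np.array(
    [[gauss_inner(He[m], He[n]) / math.sqrt(math.factorial(m) * math.factorial(n))
      for n in range(7)] for m in range(7)]
)
assert np.allclose(G, np.eye(7), atol=1e-8)

# The lowering identity He_n' = n He_{n-1}; after rescaling the argument it
# gives (d/dq) psi_{n,j} = sqrt(2 omega_j) sqrt(n) psi_{n-1,j}, as in (5.163).
for n in range(1, 7):
    assert np.allclose(He[n].deriv().coeffs, (n * He[n - 1]).coeffs)
```

The quadrature is exact here since all integrands are polynomials of degree well below twice the number of nodes.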

5.8 Pre-Feynman diagrams

Suppose that $A$ and $V$ are two operators on a Hilbert space. We are going to make some computations that are easily justified if $A$ and $V$ are bounded. But this is not the case of interest to us. Nevertheless we will proceed as if all steps can be justified, ignoring all technical issues. First we need to establish a simple identity. Let $B = A + V$. Then

$$e^{tB} = e^{tA} + \int_0^t e^{(t-s)A}\, V\, e^{sB}\, ds. \qquad (5.170)$$

Proof: The product rule for derivatives gives

$$\frac{d}{ds}\big(e^{-sA} e^{sB}\big) = e^{-sA}(B - A)e^{sB}$$

Integrate this identity from $s = 0$ to $s = t$ to find $e^{-tA}e^{tB} - I = \int_0^t e^{-sA} V e^{sB}\, ds$. Multiply both sides by $e^{tA}$ on the left to deduce (5.170).

We can iterate this identity now by replacing the factor $e^{sB}$ in the integrand by (5.170) itself, with $t$ replaced by $s$ and the $s$ in the integrand replaced by $\sigma$, say. We find

$$e^{tB} = e^{tA} + \int_0^t e^{(t-s)A}\, V\Big(e^{sA} + \int_0^s e^{(s-\sigma)A}\, V e^{\sigma B}\, d\sigma\Big)\, ds \qquad (5.171)$$

$$= e^{tA} + \int_0^t e^{(t-s)A}\, V e^{sA}\, ds + \int_{0 \le \sigma \le s \le t} e^{(t-s)A}\, V e^{(s-\sigma)A}\, V e^{\sigma B}\, d\sigma\, ds \qquad (5.172)$$

The exponential of $B$ does not occur in the first two terms, and we can obviously continue in this manner, replacing next $e^{\sigma B}$ in the last term by an expression based on use of (5.170) again. If, in fact, $A$ and $B$ are bounded then it is quite easy to see that the $(n+1)$st term of this series is bounded by $e^{t\|A\|}\,(t\|V\|)^n/n!$ and therefore the series converges in operator norm. But enough of this kind of technicality! Let us write out the three terms that we get by applying this procedure just once more. We find

$$e^{tB} = e^{tA} + \int_0^t e^{(t-s)A}\, V e^{sA}\, ds + \int_{0 \le \sigma \le s \le t} e^{(t-s)A}\, V e^{(s-\sigma)A}\, V e^{\sigma A}\, d\sigma\, ds \qquad (5.173)$$

$$\qquad + \text{ higher degree terms in } V$$
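The identity (5.170) driving this expansion is easy to test numerically for small matrices. In the sketch below (an illustration; the matrix sizes, random seed, scaling, and tolerance are arbitrary choices) the integral is approximated by the trapezoid rule and compared against $e^{tB}$.

```python
import numpy as np

# Matrix exponential via eigendecomposition; adequate for the generic
# (diagonalizable) matrices drawn here.
def expm(M):
    w, P = np.linalg.eig(M)
    return (P * np.exp(w)) @ np.linalg.inv(P)

rng = np.random.default_rng(0)
n = 4
A = 0.4 * rng.standard_normal((n, n))
V = 0.2 * rng.standard_normal((n, n))
B = A + V
t = 1.0

# Right side of (5.170): e^{tA} + \int_0^t e^{(t-s)A} V e^{sB} ds,
# with the integral computed by a uniform trapezoid rule.
s_grid = np.linspace(0.0, t, 2001)
integrand = np.array([expm((t - s) * A) @ V @ expm(s * B) for s in s_grid])
h = t / (len(s_grid) - 1)
w = np.full(len(s_grid), h)
w[0] = w[-1] = h / 2
rhs = expm(t * A) + np.tensordot(w, integrand, axes=1)

err = np.linalg.norm(expm(t * B) - rhs)
assert err < 1e-4
```

The identity is exact, so the residual is purely quadrature error and shrinks like the square of the step size.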

The case of interest for us is that in which a quantum system moves in accordance with a Hamiltonian consisting of an easy-to-deal-with part, $H_0$, and a not-so-easy-to-deal-with part $V$. That is

$$H = H_0 + V. \qquad (5.174)$$


Put $B = iH$ and $A = iH_0$ in (5.173). Then we have

$$e^{itH} = e^{itH_0} + \int_0^t e^{i(t-s)H_0}(iV)e^{isH_0}\, ds \qquad (5.175)$$

$$\qquad + \int_{0 \le \sigma \le s \le t} e^{i(t-s)H_0}(iV)e^{i(s-\sigma)H_0}(iV)e^{i\sigma H_0}\, d\sigma\, ds + \cdots$$

It is convenient to organize the ensuing computations by symbolizing the various terms in this series by a suggestive diagram. For the first term we will use

//////////////////////////////// (5.176)

which symbolizes the operator $e^{itH_0}$, customarily called the "free" propagator for the interval $[0, t]$. The integrand in the second term consists of "free" propagation for a time $s$, followed by the interaction operator $(iV)$, followed then by more free propagation for a time $t - s$. You can see that the total amount of free propagation time is $t$. And in the third term there is, similarly, propagation, interaction, propagation, another interaction and a final propagation. The total propagation time is again $t$. The time (or times) at which interaction occurs is integrated over. The diagram that symbolizes the second term in (5.175) is

//////////// X ///////////////// (5.177)

The third term is symbolized by

////////////// X /////////// X /////////// (5.178)

Both symbols include the understanding that the intermediate time ($s$) (or times, ($\sigma, s$)) is integrated over. The equation (5.175) can therefore be written symbolically as

$e^{itH}$ = ////// + ///// X ///// + ///// X ///// X ///// + $\cdots$ (5.179)

We want to see now how these diagrams actually look for a system consisting of a particle (call it an electron) interacting with a quantized field. (Call it the electromagnetic field, but our model will be much simpler.) The quantized field, as we saw in the preceding section, "can be regarded as" (of course we are talking isomorphism here) an assembly of photons whose number may alter in interactions. We will allow the electron to move in "space"


which we take to be the interval $(0, \pi)$. The quantized field must, of course, also be a field that exerts forces on particles moving in this space. We have just such a field at our fingertips, namely the quantized field constructed in Section 5.5. The Hilbert state space for the electron is $\mathcal{H}_e = L^2((0, \pi))$ while the Hilbert state space for the quantized field (representing photons), which will be denoted by $\mathcal{H}_p$, is $\mathcal{H}_p \equiv F_b$. The combined system, electron plus field, has state space

$$\mathcal{H} = \mathcal{H}_e \otimes \mathcal{H}_p, \qquad (5.180)$$

in accordance with the combining principle given in (4.42). We need to describe the Hamiltonian that guides the time evolution of the system. Denote by $H_e$ the Hamiltonian for the electron by itself. This is an operator on $\mathcal{H}_e$. For example, if there are no forces on the electron, other than those which are keeping it in the interval $(0, \pi)$, then the Hamiltonian is $H_e = -(1/2m)\, d^2/dx^2$ with Dirichlet boundary conditions. One could add on a potential $V$ if there are additional forces. But this will not matter for us. Let's just take $V = 0$. In any case the electron wave function moves in $\mathcal{H}_e$, as usual, by the 1-parameter unitary group $e^{itH_e}$. We have already constructed the Hamiltonian for the quantized field $\phi$. Denote it by $H_p$. You may recall that this Hamiltonian was determined from our assumption that the classical version of the field $\phi$ evolves by the vibrating string equation, (5.10). The Hamiltonian is given by (5.55) in the field space representation and by (5.160) in the particle space representation. We are going to stick to the particle space representation henceforth because we want to see how the mathematics results in photons being created and destroyed by the interaction with electrons. You may recall that when we studied the hydrogen atom back in Section 4.6, we had no mechanism for explaining how an electron, jumping from one orbit (= state) to another, emitted light, i.e. produced a photon. It is precisely this mechanism that we want to understand now. To this end we need first to transfer the quantized field operators $\phi(x, t)$ from operating on $L^2(\mathbb{R}^\infty; \gamma)$ to operating on $\mathcal{H}_p$. Equation (5.158) shows that

$$\phi(x) \equiv U\phi(x, 0)U^{-1} = \sum_{j=1}^{\infty} u_j(x)\, \omega_j^{-1/2}\, Q_j \qquad (5.181)$$

where $\phi(x, 0) = \phi(x)$ is given by the definition (5.18). We see also from the definition (5.131) that each $Q_j$ is a linear combination of $a_j \equiv a(e_j)$ and $c_j \equiv c(e_j)$. Thus we may write

$$\phi(x) = \sum_{j=1}^{\infty} u_j(x)\, b_j\, (c_j + a_j) \qquad (5.182)$$

where each $b_j > 0$. It is this expression of the field in terms of creation and annihilation operators that we need to keep in mind. We already know that this series doesn't converge pointwise in any simple sense. We are going to proceed informally in order to understand just what the diagrams are good for. A reader who feels queasy about this can just assume that the coefficients $b_j$ decrease to zero quickly enough to get convergence in her favorite sense, or are even zero after $n = 10$. Let

$$H_0 = H_e \otimes I + I \otimes H_p \quad\text{and} \qquad (5.183)$$

$$H = H_0 + H_I \qquad (5.184)$$

where $H_I$ denotes the interaction part of the total Hamiltonian $H$. We will define $H_I$ in a moment. But first observe that the combined system, electron plus field, would propagate, under the Hamiltonian $H_0$ by itself, without any interference between the electron and photons, because

$$e^{itH_0} = e^{itH_e} \otimes e^{itH_p} \qquad (5.185)$$

It is the term $H_I$ which will cause the presence of an electron to influence the time evolution of the photon field and vice versa.
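The factorization (5.185) is the statement that the exponential of a Kronecker sum factors, because the two summands of $H_0$ commute. A quick numerical check (an illustration with random Hermitian stand-ins for $H_e$ and $H_p$; the dimensions, seed, and time are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_herm(n):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

def prop(H, t):
    # e^{itH} for Hermitian H via the spectral theorem
    w, P = np.linalg.eigh(H)
    return (P * np.exp(1j * t * w)) @ P.conj().T

He_, Hp_ = rand_herm(3), rand_herm(2)        # stand-ins for H_e and H_p
Ie, Ip = np.eye(3), np.eye(2)
H0 = np.kron(He_, Ip) + np.kron(Ie, Hp_)     # H_e (x) I + I (x) H_p

t = 0.7
lhs = prop(H0, t)
rhs = np.kron(prop(He_, t), prop(Hp_, t))    # e^{itH_e} (x) e^{itH_p}
assert np.allclose(lhs, rhs)
```

The two Kronecker terms commute, so the exponential splits exactly; with an interaction term $H_I$ added, no such factorization survives.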

To describe the action of $H_I$ on $\mathcal{H}$ we need to use the natural isomorphism between $L^2((0, \pi)) \otimes \mathcal{H}_p$ and $L^2((0, \pi); \mathcal{H}_p)$. The latter space is the space of functions $\psi : (0, \pi) \to \mathcal{H}_p$ such that $\|\psi\|^2 = \int_0^\pi \|\psi(x)\|^2_{\mathcal{H}_p}\, dx < \infty$. It is easy to see that the map $f \otimes w \mapsto \psi$, where $\psi(x) = f(x)w$, extends linearly to a unitary map of $L^2((0, \pi)) \otimes \mathcal{H}_p$ onto $L^2((0, \pi); \mathcal{H}_p)$. We will identify these two spaces. Define

$$(H_I\psi)(x) = \phi(x)\psi(x) \qquad (5.186)$$

So $H_I$ is just a multiplication operator by an operator valued "function" $\phi$.

Let us see now what the diagrams in (5.179) signify. We will expand the total propagator $e^{itH}$ in accordance with (5.175), where now $V = H_I$. The diagram (5.176) just means "propagate the initial state by $e^{itH_0}$". Since the initial state always contains our one (and only) electron and may contain several photons (or none), and since $e^{itH_0}$ preserves the number of photons (and of course preserves the fact that there is one electron, even though (5.185) shows that the state of the electron can be changing), we can symbolize the evolution corresponding to (5.176) by a diagram such as in Figure 10. It shows that we started with one electron and two photons and by the end we still have one electron and two photons.

Figure 10: Feynman 1

Next, consider the diagram in (5.177). The initial state propagates freely as in the preceding diagram and then the interaction operates. The definitions (5.181) and (5.186) show that the interaction could increase the number of photons by one (these are the $c_j$ terms) or decrease the number of photons by one (these are the $a_j$ terms). In Figure 11a you see no photons coming in and one photon coming out. Of course the one electron comes in and goes out. By the time we get to the end of diagram (5.177) the electron has changed its state. But it's still just one electron. However a photon has been created if there was none there before. And if there was a photon present at the beginning then the interaction may destroy it. More politely, one says that the electron has absorbed the photon. Moreover, in the case that there is a photon present before the interaction begins, it may be destroyed as in Figure 11b, or the creation operator terms in (5.182) may create a second photon. This would be symbolized by the diagram in Figure 11c.

The two diagrams in Figures 11a and 11b determine all the things that can happen in this interaction. One need only repeat them for the higher order terms in the expansion (5.179).


Figure 11: Feynman 2a,b,c

Corresponding to the second order diagram (5.178) there are many combinations of creation and annihilation of photons. Here are just a few of the many possibilities. Let's start with some simple initial states, say zero or one photon. We read the diagrams from left to right in spite of the fact that the operators on the right in (5.175) act first.

Figure 12: Feynman 3a,b,c,d

MORAL: The complicated terms in (5.175) can be organized by these kinds of diagrams. It is believed by many that $H_I$ is "small" in some sense and that the first few terms therefore accurately predict the outcomes of experiments. (So far this seems to be right.) Thus an interaction between the electron and "electromagnetic field" consists of a repeated creation and annihilation of photons while the electron goes its merry way, changing its state dramatically every time it emits or absorbs a photon, and changing gradually in between.

So much for the simple hydrogen atom of Section 4.6. We have a theory now that explains how the electron emits a photon when it falls into a lower energy state. We also understand now why an electron sometimes jumps into a higher energy state: it absorbs a photon.

REFERENCES: The recent book by Folland, [21], goes much more deeply and honestly into the mathematical structure of quantum field theory than the version in these notes.

Here is a quick introduction to quantum field theory by Paul Federbush, aimed in part at mathematicians: [19].

For a book on quantum field theory by a real physicist with a mathematical orientation see the three volume exposition by Steven Weinberg, [62, 63, 64].

END of DAY 26 = 4/28/11

6 The electron-positron system

Within this section, I only did Section 6.1 in class. I added the remaining two subsections after the end of classes.

6.1 The Dirac equation

The simplest Lorentz invariant wave equation is the Klein-Gordon equation:

$$\partial^2 u/\partial t^2 - \Delta u + m^2 u = 0, \qquad m \ge 0 \qquad (6.1)$$

(Actually the case $m = 0$ is called the wave equation.) It was proposed to use the Klein-Gordon equation as a Lorentz invariant replacement of the Schrodinger equation (4.4). But the appearance of a second time derivative caused conceptual problems that could not be resolved. In his great paper [16], Dirac argued that one could still use Equ. (6.1) as a starting point for seeking a Lorentz invariant substitute for the Schrodinger equation by "factoring" (6.1) into two first order equations.


You may recall how dependent each step in the development of Maxwell's equations was on the immediately preceding experimental discovery. The final step, by Maxwell, followed thirty years after Faraday's great experiment of 1831 and was a mathematical implementation of Faraday's heuristic view of a magnetic field, based on his own experiment. By contrast, Dirac's discovery of the Dirac equation was based on a search for formal structure, namely Lorentz invariance plus first order in time. Thus:

We seek matrices $\beta$ and $\alpha_j$, $j = 1, 2, 3$, such that the first order differential operator

$$H = \sum_{j=1}^{3} \alpha_j \partial_j + \beta m \qquad (6.2)$$

is a square root of $m^2 - \Delta$. Now

$$H^2 = \Big(\sum_{j=1}^{3} \alpha_j^2 \partial_j^2 + \beta^2 m^2\Big) + \sum_{j<k} \big(\alpha_j\alpha_k + \alpha_k\alpha_j\big)\partial_j\partial_k + \sum_{j=1}^{3} m\big(\alpha_j\beta + \beta\alpha_j\big)\partial_j \qquad (6.3)$$

Therefore $H^2 = m^2 - \Delta$ if and only if

$$\beta^2 = 1, \qquad \alpha_j^2 = -1, \quad j = 1, 2, 3$$
$$\alpha_j\alpha_k + \alpha_k\alpha_j = 0, \quad j \ne k,$$
$$\alpha_j\beta + \beta\alpha_j = 0, \quad j = 1, 2, 3 \qquad (6.4)$$

Here is an example of $4 \times 4$ matrices which satisfy all of these conditions. In Appendix 9.3.2 we defined the Pauli spin matrices as

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad (6.5)$$

Define $4 \times 4$ matrices by

$$\beta = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix} \quad\text{and}\quad \alpha_j = \begin{pmatrix} 0 & \sigma_j \\ -\sigma_j & 0 \end{pmatrix}, \quad j = 1, 2, 3 \qquad (6.6)$$

Since each $\sigma_j^2 = I$, while $\sigma_j\sigma_k + \sigma_k\sigma_j = 0$ for $j \ne k$, the identities (6.4) are satisfied.

Concerning the uniqueness of this particular example we have the following theorem.
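The identities (6.4) for the choice (6.6) can be verified mechanically. A short Python sketch (an illustration added here, not part of the notes):

```python
import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# The 4x4 matrices of (6.6)
beta = np.block([[I2, Z2], [Z2, -I2]]).astype(complex)
alphas = [np.block([[Z2, s], [-s, Z2]]) for s in (sx, sy, sz)]

I4 = np.eye(4)
assert np.allclose(beta @ beta, I4)                     # beta^2 = 1
for j, aj in enumerate(alphas):
    assert np.allclose(aj @ aj, -I4)                    # alpha_j^2 = -1
    assert np.allclose(aj @ beta + beta @ aj, 0)        # alpha_j beta + beta alpha_j = 0
    for k, ak in enumerate(alphas):
        if j != k:                                      # anticommutation for j != k
            assert np.allclose(aj @ ak + ak @ aj, 0)
```

Every assertion is exactly the corresponding line of (6.4), so a clean run confirms that (6.6) is indeed a solution.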


Theorem 6.1 Suppose that $S$ is a finite dimensional vector space supporting operators $\alpha_j, \beta$, $j = 1, 2, 3$, satisfying (6.4). Then $\dim S$ is a multiple of 4. If $\dim S = 4$ then any solution to the identities (6.4) is similar to (6.6). That is, any other solution has the form $T\alpha_j T^{-1}$, $T\beta T^{-1}$ for some invertible linear transformation $T$. If $\dim S > 4$ then $S$ is a direct sum of invariant 4-dimensional subspaces on each of which the restrictions of the $\alpha_j$ and $\beta$ are similar to the example (6.6).

Proof. See Chevalley [12]. (There must be a better reference for this proof.)

The Dirac equation is

$$\frac{\partial\psi}{\partial t} = iH\psi \qquad \text{Dirac equation} \qquad (6.7)$$

where $H$ is given in (6.2) and the coefficients satisfy (6.4). Here $\psi(x, y, z, t)$ lies in the four dimensional complex vector space $S$ on which the matrices $\alpha_j, \beta$ act. In the case of the choice (6.6) we may of course take $S = \mathbb{C}^4$.

Notice that if $\psi$ is an $S$ valued solution to (6.7) then

$$(\partial^2/\partial t^2)\psi = (\partial/\partial t)(iH\psi) = (iH)^2\psi = -(m^2 - \Delta)\psi \qquad (6.8)$$

So $\psi$ is an $S$ valued solution to the Klein-Gordon equation (6.1).

Lemma 6.2 Choose the $\mathbb{C}^4$ inner product on $S$ and choose the Dirac matrices as in (6.6). Let $\mathcal{H} = L^2(\mathbb{R}^3; S)$. Then the operator $H$ (cf. (6.2)) is self-adjoint on its natural domain in $\mathcal{H}$. Moreover

$$\mathrm{spectrum}\, H = (-\infty, -m] \cup [m, \infty) \qquad (6.9)$$

Proof. Note first that $\beta^* = \beta$ and $\alpha_j^* = -\alpha_j$ for all $j$. Let $f$ and $g$ be in $C_c^\infty(\mathbb{R}^3; S)$. Then

$$(Hf, g)_{L^2(\mathbb{R}^3;S)} = \int_{\mathbb{R}^3} \langle (Hf)(x), g(x)\rangle_S\, dx \qquad (6.10)$$

$$= \int_{\mathbb{R}^3} \Big\langle \sum_{j=1}^{3} \alpha_j\partial_j f(x) + \beta m f(x),\ g(x)\Big\rangle_S\, dx \qquad (6.11)$$

$$= \int_{\mathbb{R}^3} \Big\langle f(x),\ -\sum_{j=1}^{3} \alpha_j^*\partial_j g(x) + \beta^* m g(x)\Big\rangle_S\, dx \qquad (6.12)$$

$$= (f, Hg)_{L^2(\mathbb{R}^3;S)} \qquad (6.13)$$

So $H$ is symmetric on this domain. We omit technical domain issues because we are going to Fourier transform $H$ anyway.

Let

$$\hat\psi(p) = \frac{1}{\sqrt{(2\pi)^3}} \int_{\mathbb{R}^3} e^{-ip\cdot x}\psi(x)\, dx \qquad (6.14)$$

denote the three dimensional $S$ valued Fourier transform of $\psi$. Then

$$(\widehat{H\psi})(p) = \Big(\sum_{j=1}^{3} \alpha_j(ip_j) + \beta m\Big)\hat\psi(p). \qquad (6.15)$$

Thus $H$ is unitarily equivalent to the operator of multiplication by the matrix valued function

$$H(p) = \sum_{j=1}^{3} \alpha_j(ip_j) + \beta m \qquad (6.16)$$

The same computation that gave (6.8) now shows that

H(p)2 =(m2 + |p|2

)IS (6.17)

The Hermitian 4 × 4 matrix H(p) has spectrum lying, therefore, in the set{±√m2 + |p|2}. In fact the spectrum consists of both of these numbers. To

see this let γ5 = α1α2α3β. Clearly γ5 anti-commutes with each αj and withβ. Consequently γ5H(p)γ−1

5 = −H(p). So the spectrum of H(p) contains aif it contains −a. This proves (6.9).
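The spectral computation in this proof is easy to reproduce numerically for a sample momentum (an illustration; the values of $m$ and $p$ below are arbitrary):

```python
import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
beta = np.block([[I2, Z2], [Z2, -I2]]).astype(complex)
alphas = [np.block([[Z2, s], [-s, Z2]]) for s in (sx, sy, sz)]

m, p = 1.5, np.array([0.3, -1.2, 0.7])
Hp = sum(a * (1j * pj) for a, pj in zip(alphas, p)) + m * beta
E = np.sqrt(m**2 + p @ p)

# H(p) is Hermitian: the alpha_j are anti-Hermitian, so i p_j alpha_j is Hermitian.
assert np.allclose(Hp, Hp.conj().T)

# Eigenvalues come out as +-sqrt(m^2 + |p|^2), each with multiplicity two.
assert np.allclose(np.linalg.eigvalsh(Hp), [-E, -E, E, E])
```

The double multiplicity of each sign, which the $\gamma_5$ argument predicts, is visible directly in the sorted eigenvalue list.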


Remark 6.3 Let $\gamma_0 = i\beta$ and $\gamma_j = \beta\alpha_j$, $j = 1, 2, 3$. Multiply (6.7) by $i\beta$ to find

$$\Big(\gamma_0\partial_t + \sum_{j=1}^{3} \gamma_j\partial_j\Big)\psi + m\psi = 0. \qquad \text{Dirac Equation} \qquad (6.18)$$

This equation looks more symmetric between $t$ and the spatial coordinates than does (6.7). In fact, there exists a representation of the covering group of the Lorentz group, $\rho : SL(2, \mathbb{C}) \to \mathrm{End}(S)$, whose induced action on the operators $\gamma_0, \dots, \gamma_3$ agrees with the action of the corresponding Lorentz transformation on a basis of $\mathbb{R}^4$. This is the basis of Lorentz invariance of the Dirac equation.

STATUS: On the one hand Dirac produced a Lorentz invariant substitute for the Schrodinger equation, namely equation (6.7), and, since $H$ is self-adjoint, the induced propagation $e^{itH}$ in $L^2(\mathbb{R}^3; S)$ is unitary, as it had better be. On the other hand the Hamiltonian $H$ has the very unpleasant feature of being unbounded below, as we see from (6.9). In fact on the negative energy subspace the energy as a function of momentum is $E(p) = -\sqrt{m^2 + |p|^2}$. Thus as the momentum goes up the energy goes down. No way can this be taken seriously. Dirac pointed this out in his first paper, [16], but didn't propose a fix till two years later, [17]. Dirac fixed this by inventing an infinite dimensional version of the Hodge star operator. This, in turn, suggested the existence of an as yet undiscovered particle, the positron. And this, in turn, was discovered a few years later in cosmic rays. Positrons are now available at your local hospital in large numbers if you should need some for a PET scan. The infinite dimensional Hodge star operator goes under the name of the Dirac hole theory.

6.2 Dirac hole theory

Dirac proposed a fix for the negative energy problem, mentioned above, in a second paper, [17], which he submitted two years after the first, [16]. His resolution of the problem could be described in modern terms as the invention of an infinite dimensional Hodge star operator.

Here is a slight variant of the Hodge star operator in finite dimensions, which we will later extend to infinite dimensions. Let $V$ be a finite dimensional complex vector space of dimension $n$. Let $u_1, \dots, u_n$ be a basis of $V$ and let $v_1, \dots, v_n$ be the dual basis of $V^*$. You may already recognize the map

$$R : \sqrt{k!}\; u_1 \wedge \cdots \wedge u_k \mapsto \sqrt{(n-k)!}\; v_{k+1} \wedge \cdots \wedge v_n \qquad (6.19)$$

as the Hodge star operator, aside from the numerical factors and the lack of an arbitrary choice of subset of the basis. The numerical factors are needed to make $R$ a unitary map from $\Lambda(V)$ onto $\Lambda(V^*)$. You can see from (5.142) and (5.143) that these factors are required if $R$ is to have a unitary extension. We are going first to characterize the desired map $R$ in a basis independent way, avoiding the use of an inner product on $V$, and will scrupulously avoid identifying $V$ with $V^*$. This is not just for the sake of mathematical purity; rather, in the application to the electron-positron system, it will be important to use the dual space to the negative energy subspace $\mathcal{H}_-$, discussed in Section 6.1, as the support space of the contragredient representation of the orthochronous Lorentz group, in order to explain why electrons and positrons have different "intrinsic parity".

To this end let us observe first that the annihilation operator $a(u)$ is a conjugate linear function of $u$ because $u$ appears in (5.104) on the right side in the inner product. (See also (5.118).) Had we defined the interior product $i_w$ in (5.104) using $\langle x, w\rangle$, with $w \in K^*$, instead of $(x, u)$ with $u \in K$, then $i_w$ as well as $a_w$ would be linear in $w$ and none of the computations would change. The commutation relations (5.129) would read $a_w c(v) + c(v) a_w = \langle v, w\rangle I_{F_f}$, which captures nicely the bilinearity in $v$ and $w$. In the following Proposition the roles of $K$ and $K^*$ are interchanged from the previous discussion.

Proposition 6.4 Let $V$ be a complex vector space of finite dimension $n$. Choose a non-zero vector $\omega \in \Lambda^n(V^*)$. There exists a unique linear transformation $R : \Lambda(V) \to \Lambda(V^*)$ such that

a) $R(1) = \omega$ and (6.20)

b) $a(u)R = Rc(u)$ for all $u \in V$ (6.21)

Moreover, if $V$ is an inner product space and $\|\omega\| = 1$ then $R$ is unitary.

Proof. Induction, together with a) and b), shows that

$$R\, c(u_1)\cdots c(u_r)\,1 = a(u_1)\cdots a(u_r)\,\omega \qquad (6.22)$$

for any vectors $u_1, \dots, u_r \in V$. In view of (5.142) this asserts that

$$\sqrt{r!}\; R(u_1 \wedge \cdots \wedge u_r) = a(u_1)\cdots a(u_r)\,\omega \qquad (6.23)$$


Given any basis of $V$ we may allow the set $\{u_1, \dots, u_r\}$ to run over all subsets of the basis. The identity (6.23) shows then that $R$ is uniquely determined on a basis of $\Lambda(V)$ by the conditions a) and b). We may use (6.23) to construct $R$ by defining it on that basis and extending it linearly to all of $\Lambda(V)$. The $R$ so constructed is easily shown to satisfy a) and b) on the basis elements of $V$ and therefore for all $u \in V$. Let us emphasize that we have shown that the map $R$ so constructed is basis independent.

Now if $V$ is an inner product space and $\|\omega\| = 1$ then we can choose any orthonormal basis of $V$, say $\{v_1, \dots, v_n\}$, and take $\omega = c\sqrt{n!}\; v_1 \wedge \cdots \wedge v_n$ for some constant $c$ of absolute value one. Without loss of generality we may take $c = 1$ because it can be absorbed into $v_1$. But in view of (5.144) and (5.146) the assertion (6.23) simply says that

$$R\big(\sqrt{r!}\; u_1 \wedge \cdots \wedge u_r\big) = \pm\sqrt{(n-r)!}\; u'_1 \wedge \cdots \wedge u'_{n-r} \qquad (6.24)$$

where the primed vectors form the complementary set in $\{v_1, \dots, v_n\}$ of the unprimed vectors. By (5.142) and (5.143), $R$ therefore takes an orthonormal basis to an orthonormal basis.

Lemma 6.5 Suppose that $A$ is a diagonalizable linear operator on $V$. Denote its transpose on $V^*$ by $A'$. Then

$$R\,\gamma(A)\,R^{-1} = \mathrm{trace}(A) - \gamma(A') \qquad (6.25)$$

Proof. Suppose that $x_1, \dots, x_n$ is a basis of $V$ with $Ax_j = \lambda_j x_j$ for $j = 1, \dots, n$. Then $A'y_j = \lambda_j y_j$ for the dual basis. Let $u = x_{i_1} \wedge \cdots \wedge x_{i_k}$ and $v = Ru$. Define $\lambda = \sum_{1}^{k} \lambda_{i_j}$. Then $\gamma(A)u = \lambda u$ and $\gamma(A')v = (\mathrm{trace}(A) - \lambda)v$. So $R\gamma(A)R^{-1}v = R\gamma(A)u = R\lambda u = \lambda v = \big(\mathrm{trace}(A) - \gamma(A')\big)v$.

How to use the identities (6.21) and (6.25). Suppose that $V$ is the one-particle Hilbert space for some Fermion and that $A$ is its Hamiltonian. We will temporarily take $V$ to be finite dimensional. Since the particle is a Fermion, the Hilbert space for an indefinite number of such particles is the exterior algebra $\Lambda(V)$. The total Hamiltonian is $\gamma(A)$ (see (5.149)), which acts on $\Lambda(V)$. As usual, without changing the physics, one can change the Hilbert state space by a unitary operator if one also transfers the observable operators. By Lemma 6.5 we can change the Hilbert space by the unitary operator $R$ to $\Lambda(V^*)$ and use for the total Hamiltonian $\gamma(-A') + \mathrm{trace}(A)\, I_{\Lambda(V^*)}$.


Since changing a Hamiltonian by an additive constant does not change the physics, we can drop the last term and take

$$H = \gamma(-A') \quad\text{acting on } \Lambda(V^*) \qquad (6.26)$$

as the Hamiltonian for our system of an indefinite number of Fermions. As will be explained shortly, the case of interest for us is that in which $A$ is a negative operator. In that case $-A'$ is a positive operator. Thus we have made a unitary transform, added a constant, and thereby transformed a negative Hamiltonian, $\gamma(A)$, into a positive Hamiltonian $\gamma(-A')$. At the same time we have also interchanged the creation and annihilation operators, as we see from (6.21) and the following exercise.

Exercise: For all $y \in V^*$ there holds

$$c(y)R = R\,a(y) \qquad (6.27)$$

Interpretation. The creation, annihilation and energy operators are automatically transformed by the unitary operator $R$ onto a new Hilbert space $\Lambda(V^*)$. The significance of the states in $\Lambda(V^*)$ needs interpretation. The zero rank tensor $1 \in \Lambda^0(V^*)$ is the image under $R$ of the top rank element $w \in \Lambda^n(V)$, as can be seen immediately from (6.24), with $r = n$. Now $w$ is an $n$ particle state for the Fermion whose state space is $V$, even though it is now represented, via $R$, by the zero rank tensor $1 \in \Lambda^0(V^*)$. In the state $w$ all possible energy levels of the one-particle Hamiltonian $A$ are filled.

Terminology. The $n$th rank tensor $w \in \Lambda^n(V)$ is called a "sea" of $V$ particles. Under the Hodge map $R$, the sea is mapped to the zero rank tensor $1 \in \Lambda^0(V^*)$. Thus the zero rank tensor $1 \in \Lambda^0(V^*)$ now represents the "sea" of particles whose 1-particle state space is $V$. Henceforth we will refer to this particle as a positron. So one says that the zero rank tensor $1 \in \Lambda^0(V^*)$ represents a "sea" of positrons, rather than a state with no particles in it. Similarly, a rank one tensor $y \in \Lambda^1(V^*)$ represents a state of $n - 1$ positrons, i.e., a missing positron, because it is the image under $R$ of a rank $n - 1$ tensor in $\Lambda(V)$. This is a "hole" in the sea. A second rank tensor in $\Lambda^2(V^*)$ is similarly not a two particle state but represents two holes in the sea of $n$ positrons.

Our case. We saw in our analysis of the Dirac equation that the Dirac Hamiltonian $H$, defined in (6.2), decomposes $L^2(\mathbb{R}^3; S)$ into two orthogonal subspaces $\mathcal{H}_-$ and $\mathcal{H}_+$ on which $H$ has spectrum $(-\infty, -m]$ or $[m, \infty)$, respectively. Denote by $H_-$ the restriction of $H$ to $\mathcal{H}_-$. This is a strictly negative operator and is unbounded below. It is unacceptable as a quantum mechanical Hamiltonian. But if we apply the procedure developed above to the vector space $V = \mathcal{H}_-$ and $A = H_-$ then we will get an (informally) equivalent description of the quantization of $H_-$ on $\mathcal{H}_-$ by using instead $-H'_-$ on $(\mathcal{H}_-)^*$. We have then a one "particle" Hamiltonian $-H'_-$ which is non-negative. So all is well. But the correct physical interpretation of the zero rank tensor $1 \in \Lambda^0((\mathcal{H}_-)^*)$ is now not that of a no particle state but that of a "sea" of infinitely many positrons (infinitely many because $\dim \mathcal{H}_- = \infty$). And a rank one tensor is a hole in that sea. The total Hamiltonian for the Hilbert space of holes, $\Lambda((\mathcal{H}_-)^*)$, "is" $\gamma(-H'_-)$, which acts on $\Lambda((\mathcal{H}_-)^*)$ (actually on its completion, $F_f$) and is a non-negative operator. In this discussion of "our case" we have evaded the fact that $\mathrm{trace}\, H_- = -\infty$. We used the finite dimensional heuristic to motivate our choice of state space as $\Lambda((\mathcal{H}_-)^*)$ and total Hamiltonian as $\gamma(-H'_-)$. In the physics literature the same conclusion is reached, but with the computation made directly in the infinite dimensional case. Thus one has to subtract an infinite "constant" from the total Hamiltonian at some point. Some readers may find that disturbing. But we have already been through a similar heuristic procedure back in Section 5.4, when we subtracted the infinite ground state energy. In both cases we developed a well defined finite dimensional approximation, made a meaningful subtraction of a finite constant from some Hamiltonian, and then took the informal limit as a definition of the desired structure. In our present case the change of Hilbert space to $\Lambda((\mathcal{H}_-)^*)$ entails a big conceptual change in the physical meaning of the state vectors. This change turns out to be a serious guide as to what happens physically when the full interaction is put into the theory. (See the next section on pair production.)

The infinite dimensional Hodge star operation of Dirac. For any real or complex Hilbert space $K$ denote by $\Lambda(K)$ the Fermion Fock space $F_f$ over $K$. As we know (make sure this is stated in the Fock space section), the creation and annihilation operators $c(u)$ and $a(u)$ are bounded operators on $\Lambda(K)$. Denote by $\mathcal{A}(K)$ the $C^*$ algebra of operators on $\Lambda(K)$ generated by all the operators $\{c(u), a(u) : u \in K\}$. Proposition 6.4 asserts that, in case $K$ is finite dimensional, the unitary operator $R$ constructed there interchanges the algebra $\mathcal{A}(K)$ with $\mathcal{A}(K^*)$ by mapping

$$c(u) \mapsto a(u) \equiv R\,c(u)\,R^{-1} \qquad (6.28)$$


If $\dim K = \infty$ then there is no unitary operator $R$ because there is no highest rank vector $\omega \in \Lambda^\infty(K^*)$. However the isomorphism of $C^*$ algebras induced by $R$, as in (6.28), continues to make sense in infinite dimensions. This is the isomorphism proposed by Dirac, [17]. Here is a precise statement.

Theorem 6.6 (Dirac-Hodge hole theorem) Let $K$ be a real or complex Hilbert space. Then there exists a unique algebra isomorphism

$$\beta : \mathcal{A}(K) \to \mathcal{A}(K^*) \qquad (6.29)$$

such that

$$\beta(B^*) = (\beta(B))^* \quad\text{for all } B \in \mathcal{A}(K) \qquad (6.30)$$

and

$$\beta(c(u)) = a(u) \quad\text{for all } u \in K \qquad (6.31)$$

In particular $\beta(c(u)^*) = a(u)^*$ for all $u \in K$.

Proof. See the appendix of [25]. Note: $a(u)$ is the annihilation operator on $\Lambda(K^*)$ defined by an element $u \in K$. See the discussion preceding Proposition 6.4.

Remark 6.7 The action of the isomorphism $\beta$ on a negative one particle Hamiltonian $A$, as in (6.25), is best understood in the context of a group representation on $K$ and its contragredient representation on $K^*$. For example, if $A$ is a negative self-adjoint operator on $K$ and $g(t) = e^{itA}$ is the one parameter group that it generates, then the contragredient representation of whatever Lie group $G$ might contain this time translation generator would map these operators to $(g(t)^{-1})^{\mathrm{transpose}}$, which equals $e^{-itA'}$. Thus, passing to the contragredient representation changes the sign of generators. The case of interest for the Dirac hole theory is that in which $G$ is the orthochronous Poincare group. It acts unitarily on Dirac's Hilbert space $\mathcal{H} \equiv L^2(\mathbb{R}^3; \mathbb{C}^4)$ and leaves invariant the positive and negative energy subspaces $\mathcal{H}_+$ and $\mathcal{H}_-$. We may restrict this representation to the negative energy subspace $\mathcal{H}_-$ and then take its contragredient representation on $(\mathcal{H}_-)^*$. The resulting representation of $G$ is now a positive energy representation. It is the representation that belongs to positrons (and actually characterizes positrons). It happens that this representation of $G$ is unitarily equivalent to the representation of $G$ gotten by restricting our original representation on $\mathcal{H}$ to the positive energy subspace $\mathcal{H}_+$, but only for the connected component of $G$, not for all of $G$.


The other component of G is generated by space reflection. Thus one says that a positron differs from an electron not only in charge but also in parity. See the Appendix of [25] for more details.

6.3 Pair production

The interchange of electron creation and annihilation operators described in the previous section on hole theory leads to an interaction Hamiltonian containing terms which represent the mutual destruction of a positron and electron and the replacement of such a pair by a photon. Other terms represent the annihilation of a photon and the creation of an electron-positron pair in its place. This greatly enlarges the family of Feynman diagrams that enter at any order. Compare the diagrams in Section 5.8 for simple versions of such diagrams. In Section 5.8 the interaction does not change the number of electrons. {In some future version of this seminar this section should be greatly enlarged.}


7 The Road to Yang-Mills Fields

Today’s question: Where do connections on vector bundles come from inquantum mechanics?

Today’s answer: (Outline)

1) The Lorentz force (3.41) on a charged particle depends on the velocity of the particle. In order to incorporate this force into quantum mechanics one must modify the rules of quantization that we previously discussed for velocity independent forces in Section 4.3, so as to include this force. As a first step one must express the time dependent force with the help of suitable potentials. We did this by

2) first expressing Maxwell's equations in the form DF = 0, D∗F = J, where F is the 2-form on space-time constructed from E and B as in (3.31). D denotes the exterior derivative on forms over R⁴ (≡ space-time). See Section 3.8 for this.

3) Since DF = 0 there is a 1-form A on space-time such that

F = DA. (7.1)

The 1-form A replaces the potential V, which we used earlier to express a velocity independent force, F = −grad V. Whereas V was unique up to an additive constant, the 1-form A is unique only up to an additive exact 1-form. The big mathematical problems which will be induced by this large degree of non-uniqueness were, admittedly, not even mentioned earlier.

4) Having now an electromagnetic analog of the potential V, we made the transitions from Newton to Lagrange to Hamilton for the Lorentz force in Section 3.8, with the resulting Hamiltonian function (cf. (3.51))

H(x, p, t) = (1/(2m)) ∑_{j=1}^{3} (p_j − (e/c)A_j(x, t))² + eφ(x, t),   (7.2)

where

A = ∑_{j=1}^{3} A_j dx_j + φ dt.   (7.3)

5) We will carry out the quantization procedure for the Hamiltonian func-tion (7.2) in Section 7.1.


6) We will then discuss the geometric meaning of the resulting Schrodinger equation. Specifically, we will see that the Schrodinger equation in the presence of electromagnetic forces begs for an interpretation of the electromagnetic potential A as a connection form for a complex line bundle C × R⁴ → R⁴ with structure group U(1).

7) Yet to come: Whereas the Schrodinger equation that we will derive in Section 7.1 lends itself to interpretation as an equation for sections of a complex line bundle over R⁴ with structure group U(1), the discovery of the neutron in 1932 by Chadwick and the ensuing attempts to understand the strongly attractive forces between n-n, p-p and n-p pairs culminated in the suggestion by Yang and Mills (1954) to replace the complex line bundle with U(1) structure group by a vector bundle with a more general compact structure group. The correct choice of structure group has to be determined by experiment. At the present time the choice SU(3) × SU(2) × U(1) seems to fit all the experimental data best, and is the basis for the standard model of elementary particles. I will elaborate a bit on this item 7) in Section 8. Some more of the history of the period 1932-1954 is in the timeline of Section 8.1.

7.1 Quantization of a particle in an electromagnetic field

We will apply the usual rules of quantization, as in Section 4.3, to a charged particle without spin, subject to electromagnetic forces. The electromagnetic field exerting the force is assumed to satisfy Maxwell's equations (3.33)-(3.36), or equivalently (3.37), (3.38). There is, therefore, an electromagnetic potential A, which is a 1-form on space-time. We will take over the notation from (7.2) and (7.3).

Let

H = L²(R³).   (7.4)

In accordance with the rules of Section 4.3 one should replace p_j in (7.2) by −iħ(∂/∂x_j). Then we obtain the time dependent Hamiltonian operator

H = (1/(2m)) ∑_{j=1}^{3} (−iħ(∂/∂x_j) − (e/c)A_j)² + eφ(·, t)   (7.5)

  = −(ħ²/(2m)) ∑_{j=1}^{3} (∂/∂x_j − i(e/(cħ))A_j)² + eφ(·, t)   (7.6)

The Schrodinger equation (4.3) may therefore be written

iħ(∂/∂t + i(e/ħ)φ)ψ = −(ħ²/(2m)) ∑_{j=1}^{3} (∂/∂x_j − i(e/(cħ))A_j)² ψ   (7.7)

This is the Schrodinger equation in the presence of an electromagnetic field.

Define

D_0 = ∂/∂t + i(e/ħ)φ   (7.8)

and

D_j = ∂/∂x_j − i(e/(cħ))A_j.   (7.9)

{Check signs.} In terms of these operators we may write the Schrodinger equation as

iħ D_0 ψ = −(ħ²/(2m)) ∑_{j=1}^{3} D_j² ψ   (7.10)

Don’t think that (7.10) is just a notationally slick way of writing (7.7). Noticethat the differential operators Dk, k = 0, 1, 2, 3 differ from the usual firstderivatives, ∂/∂t, ∂/∂xj by the addition of purely imaginary functions. So?Well, the circle group, U(1) ≡ {eiθ : θ ∈ R}, has Lie algebra iR. Thereforethe operators Dk can be interpreted as covariant derivatives for sections ofa complex line bundle over R4 with structure group U(1). That is, if oneregards ψ(x, t) not as a complex valued function but rather as a sectionof the bundle C × R4 → R4, then specification of an electromagnetic forcefield amounts to choosing a particular covariant derivative on sections of thisbundle - a covariant derivative different from the trivial one.


Is this interpretation just a stretch of one's imagination? Think back to the non-uniqueness inherent in the representation (7.1) of the fields E and B by the potential A. For any reasonable (say smooth) function f : R⁴ → R the 1-form Ã ≡ A + Df represents the same electromagnetic field as A because D²f = 0. Now the function e^{if(x,t)} is a U(1) valued function on R⁴. Therefore the map ψ ↦ ψ̃ = e^{if}ψ is a bundle automorphism. That is, ψ and ψ̃ represent the same section, but in different (global) trivializations of the bundle. Now look at this simple identity!

(∂_j − i(A_j + ∂_j f)) e^{if}ψ = e^{if} (∂_j − iA_j)ψ,   j = 0, 1, 2, 3   (7.11)

This says that the non-uniqueness of the potential A is equivalent to the non-uniqueness of the representative of a section. Otherwise said, it is the covariant derivative, ∂_j − iA_j, that has physical meaning rather than the potential A itself. The Lie algebra valued 1-form iA is the connection form that represents the covariant derivative in a particular (global) trivialization. Change the global trivialization and you change the connection form without changing the covariant derivative.
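The identity (7.11) is just the Leibniz rule, and it can also be checked symbolically. Below is a minimal sympy sketch (the library choice and all names in it are mine, not from the notes) verifying the identity in one coordinate; the same computation works for each j.

```python
import sympy as sp

x = sp.symbols('x', real=True)

# Arbitrary smooth data: gauge function f, one component A of the potential,
# and a wave function psi, all depending on a single coordinate x.
f = sp.Function('f', real=True)(x)
A = sp.Function('A', real=True)(x)
psi = sp.Function('psi')(x)

# Left side of (7.11): the covariant derivative built from the gauge-shifted
# potential A + f', applied to the transformed section e^{if} psi.
lhs = sp.diff(sp.exp(sp.I*f)*psi, x) - sp.I*(A + sp.diff(f, x))*sp.exp(sp.I*f)*psi

# Right side of (7.11): e^{if} times the original covariant derivative of psi.
rhs = sp.exp(sp.I*f)*(sp.diff(psi, x) - sp.I*A*psi)

assert sp.simplify(sp.expand(lhs - rhs)) == 0
```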

CONCLUSION: The quantum mechanical procedure for incorporating electromagnetic forces into the Schrodinger equation is commensurable with (or is suggestive of, or indeed forces) an interpretation of the electromagnetic potential iA as a connection form on R⁴ for a complex line bundle C × R⁴ → R⁴ with structure group U(1). In short, one could fairly say

Quantum Mech. + E&M = Covariant derivative.

Remark 7.1 Hermann Weyl [66] seems to have been the first to advance this view of electromagnetism. His objective was to find a way to unite general relativity with electromagnetism.

Reference: A useful source for background on connections on vector bundles is [15].

7.2 Two classic debuts of connections in quantum mechanics

Before leaving S¹ bundles there are two important examples to understand in which the underlying manifold is not topologically trivial. If the configuration space C is not R³ but, say, an open subset of R³, then space-time is C × R, a 4-manifold, and all of the preceding discussion about Maxwell's equations, gauge potentials and complex line bundles goes through if one does not insist on global trivializations but only local trivializations. In the following two examples C is, respectively, R³ minus a cylinder and R³ minus a point. Each case arises naturally in a physical experiment.

Before leaving this transition from electromagnetic fields to geometric objects, here is a summary of the terminology used in physics and the corresponding mathematical terminology. This is taken from the "dictionary paper" of Wu and Yang [71].

Dictionary
4-potential = gauge potential = connection form
Field strength = gauge field = curvature
Nonintegrable phase factor = path dependent parallel transport

7.2.1 The Aharonov-Bohm experiment

Denote by S the closed cylinder in R³ centered on the z axis and of radius a. That is, S = {(x, y, z) : x² + y² ≤ a²}. Take configuration space to be C = R³ − S. Let b be a real constant and define

A = b(−y, x, 0)/(x² + y²) outside S,
A = b(−y, x, 0)/a² inside S.   (7.12)

The magnetic field, B = curl A, is zero in C and is equal to (0, 0, 2ba⁻²) for all time inside the cylinder S, as one can readily compute. Moreover we will take φ = 0 on R⁴. Both E and B are time independent. So we will ignore time and focus just on the geometry over C.
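That computation is easy to do by hand, and also mechanically. Here is a small sympy sketch (my own check; the notes do not include it) confirming that curl A = 0 outside S and curl A = (0, 0, 2b/a²) inside:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
b, a = sp.symbols('b a', positive=True)

def curl(F):
    """Curl of a vector field F = (F1, F2, F3) in cartesian coordinates."""
    F1, F2, F3 = F
    return (sp.diff(F3, y) - sp.diff(F2, z),
            sp.diff(F1, z) - sp.diff(F3, x),
            sp.diff(F2, x) - sp.diff(F1, y))

r2 = x**2 + y**2
A_out = (-b*y/r2, b*x/r2, 0)        # the potential (7.12) outside S
A_in = (-b*y/a**2, b*x/a**2, 0)     # the potential (7.12) inside S

B_out = [sp.simplify(c) for c in curl(A_out)]
B_in = [sp.simplify(c) for c in curl(A_in)]

assert B_out == [0, 0, 0]                     # B vanishes on C = R^3 - S
assert B_in[:2] == [0, 0]
assert sp.simplify(B_in[2] - 2*b/a**2) == 0   # constant field inside S
```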

In the Aharonov-Bohm experiment one shoots a beam of electrons parallel to the x axis and observes the pattern on a vertical plate behind the cylinder and perpendicular to the x axis. Make the cylinder surface out of a (near) perfect conductor, such as gold, so that no electrons can get inside the cylinder. Those that hit the cylinder are conducted off to infinity. Thus all electrons stay in a region, C, where E = B = 0. Nevertheless one observes a diffraction pattern on the screen which varies with b. The Schrodinger equation (take e/(cħ) = 1)

−i∂ψ/∂t = −(1/(2m)) ∑_{j=1}^{3} (∂_j + iA_j)² ψ(x, t)

in C × R (Dirichlet boundary conditions on ∂S) accurately predicts the diffraction pattern! [Refs., Christianne Martin and Nintendo toy company]

Conclusion: Even though the magnetic and electric forces are zero in C, where the electrons are constrained to lie, they nevertheless sense the connection form itself. Geometrically speaking, the curvature of this connection is zero while the holonomy group of A in C is not trivial. In any ball in C one can change the local trivialization to reduce A to zero in the ball without changing the physics. Of course this does not change the holonomy group. Thus one could say that the electrons really sense the holonomy group of the connection. This is the observation made by Wu and Yang [71].

Experimental confirmation: The prediction made by Aharonov and Bohm [1] in 1959 was experimentally confirmed by R. G. Chambers in 1960 [11].

For some other interesting paradoxes see the book by Aharonov and Rohrlich [2].

Exercise 7.2 Show that the holonomy around a circle of radius r > a enclosing the cylinder is e^{2πib}.
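As a numerical sanity check of the exercise (not part of the notes; the values of b and r below are arbitrary), one can pull A back to the circle x = r cos t, y = r sin t and integrate. The loop integral of A is 2πb for every r > a, so the holonomy is e^{2πib}:

```python
import numpy as np

b, r = 0.37, 2.0                      # arbitrary sample values, with r > a
t = np.linspace(0.0, 2*np.pi, 4001)
x, y = r*np.cos(t), r*np.sin(t)
xdot, ydot = -r*np.sin(t), r*np.cos(t)          # dx/dt and dy/dt

# A pulled back to the loop: b(-y dx + x dy)/(x^2 + y^2) becomes b dt.
integrand = b*(-y*xdot + x*ydot)/(x**2 + y**2)

# Trapezoid rule for the loop integral of A.
loop_integral = float(np.sum(0.5*(integrand[:-1] + integrand[1:])*np.diff(t)))
holonomy = np.exp(1j*loop_integral)

assert abs(loop_integral - 2*np.pi*b) < 1e-9    # integral is 2*pi*b, any r > a
assert abs(holonomy - np.exp(2j*np.pi*b)) < 1e-9
```

Note that the integrand is identically b along the loop, which is why the answer is independent of r.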

7.2.2 The Dirac magnetic monopole

The equation defining electric charge as the source of an electric field is Gauss' law (3.9), div E = 4πρ, where ρ is the electric charge density. If magnetic charge exists, its density M is defined by div B = 4πM, because magnetic charge is, by definition, a source of the magnetic field B. We want to consider a point magnetic charge, i.e. a magnetic monopole, sitting still at the origin. Gauss' law would then be div B(x) = 4πpδ(x), where p is the pole strength (magnetic charge) of the monopole. The fundamental solution to this equation is B(x) = px/|x|³, x ∈ R³ − {0}. In order to formulate the Schrodinger equation for this force law we need to use, as usual, a corresponding gauge potential A. In order to avoid ambiguities generated by the singularity in B at the origin of R³ one is forced to take configuration space to be C = R³ − {0}. This immediately makes the topology of C nontrivial. As we will see, this nontrivial topology offers a way out of a serious technical and conceptual issue that will soon insert itself. The structure we want to investigate is time independent because the magnetic field B is time independent, the electric field is identically zero, and the magnetic monopole is sitting still at the origin. We will therefore ignore the time factor in C × R and focus only on the relevant geometry over the configuration space C. In particular we need to find a connection on the bundle C × C → C whose curvature is B. Now div B = 0 in C because the monopole is not in C. Hence locally there exist 1-forms A such that B = dA. But in fact there is no globally defined 1-form A on C for which B = dA. Here is the best we can do.

Let 0 < δ < π/2 and, in spherical coordinates, r ≥ 0, θ ∈ [0, π], φ ∈ [0, 2π], let

N = {(r, θ, φ) : r > 0, 0 ≤ θ < π/2 + δ} (7.13)

S = {(r, θ, φ) : r > 0, π/2− δ < θ ≤ π} (7.14)

Then N ∪ S = C gives an open cover of C by contractible sets. N ∩ S intersects any sphere around the origin in a band around the equator. Define

A_N = p(1 − cos θ) dφ on N   (7.15)

A_S = −p(1 + cos θ) dφ on S   (7.16)

We assert that B = dA_N on N and B = dA_S on S. To see this easily, observe that dA_N = p sin θ dθ ∧ dφ on N and dA_S = p sin θ dθ ∧ dφ on S. Each of these is therefore equal to p times the element of area on the sphere of radius 1. But B (after applying the Hodge star to convert it into a 2-form) is B = (p/r³){x dy ∧ dz + y dz ∧ dx + z dx ∧ dy}. At the north pole of the sphere of radius r we have x = y = 0, z = r. Consequently, near the north pole, the factor in braces is r × element of area = r³ times the element of area of the unit sphere. Thus, by spherical symmetry of B, A_N and A_S both have curvature B.

Let h = A_N − A_S on N ∩ S. Then h = 2p dφ on N ∩ S. Assume now that 2p is an integer and let g(r, θ, φ) = e^{i2pφ}. This is a well defined smooth function on N ∩ S because 2p is an integer. Moreover iA_N = iA_S + g⁻¹dg. Consequently iA_N and iA_S together with g define a connection on a complex line bundle over C with local trivializations N × C and S × C and with transition function g. Moreover the curvature of this connection is B. If we reinsert the physical constants e/(cħ) in front of the electromagnetic potential A, then the condition on the pole strength needed to make i(e/(ħc))A_N and i(e/(ħc))A_S into a well defined connection on a well defined complex line bundle over C as above is

2pe/(ħc) = integer.

This is Dirac’s quantization condition [18] relating pole strength p and elec-tron charge e. This condition is also necessary, as we shall see in the nexttheorem. But first, it may be illuminating to see why AN cannot be extendedto all of C. Convert AN to cartesian coordinates by using φ = (tan)−1(y/x),which implies dφ = (xdy− ydx)/(r2− z2), and multiply (7.15) by r/r to find

A_N = p(x dy − y dx)/(r(r + z)).

Thus A_N has a singularity along the entire negative z axis. A similar computation shows that, if one chooses δ = π/2, then A_S has a singularity along the entire positive z axis. Dirac's view of this (1931) was that the monopole forces the singularity in the electromagnetic vector potential to extend along a curve from the origin to infinity. But Wu and Yang [71] (1975) emphasized the viewpoint which we have described here: the complex line bundle over C, associated to a magnetic monopole, does not have a global trivialization.
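The two assertions used above, dA_N = dA_S = p sin θ dθ ∧ dφ and iA_N = iA_S + g⁻¹dg with g = e^{i2pφ}, can be checked symbolically. A sympy sketch (my own verification, not from the notes):

```python
import sympy as sp

theta, phi, p = sp.symbols('theta phi p', real=True)

# dphi-coefficients of A_N and A_S, equations (7.15) and (7.16).
AN = p*(1 - sp.cos(theta))
AS = -p*(1 + sp.cos(theta))

# For a form c(theta) dphi the exterior derivative is c'(theta) dtheta^dphi.
dAN = sp.diff(AN, theta)
dAS = sp.diff(AS, theta)
assert sp.simplify(dAN - p*sp.sin(theta)) == 0  # dA_N = p sin(theta) dtheta^dphi
assert sp.simplify(dAS - p*sp.sin(theta)) == 0  # same curvature on S

# Transition function g = e^{i 2p phi}: check i A_N = i A_S + g^{-1} dg,
# i.e. the dphi-coefficients satisfy i(AN - AS) = g^{-1} dg/dphi = 2ip.
g = sp.exp(sp.I*2*p*phi)
ginv_dg = sp.simplify(sp.diff(g, phi)/g)
assert sp.simplify(sp.I*(AN - AS) - ginv_dg) == 0
```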

The following theorem adds to the previous discussion, showing in what sense Dirac's quantization condition, 2p = integer, is necessary.

Theorem 7.3 There exists an S¹ connection over R³ − {0} whose curvature is (Hodge ∗)(px/|x|³) if and only if 2p is an integer.

Proof. The sufficiency has already been proved. Define N and S as above. Assume that there exists a connection over C whose curvature is B ≡ ∗(px/|x|³). Since N and S are contractible and dB = 0 on C, there exist 1-forms A_N on N and A_S on S satisfying dA_N = B and dA_S = B on N and S respectively. Since these forms represent the same connection on the presumed bundle over C, there exists a smooth function g : N ∩ S → S¹ such that A_N = A_S + g⁻¹dg on N ∩ S. Then

4πp = ∫_{|x|=1} p sin θ dθ ∧ dφ
    = ∫_{|x|=1} B
    = ∫_{0≤θ≤π/2} dA_N + ∫_{π/2≤θ≤π} dA_S
    = ∫_{θ=π/2} A_N − ∫_{θ=π/2} A_S
    = ∫_{θ=π/2} g(1, π/2, φ)⁻¹ ∂_φ g dφ
    = 2π × (winding number of φ ↦ g(1, π/2, φ))
    = 2π × integer.

Hence 2p = integer.

Remark 7.4 There is a huge literature on the experimental and theoretical aspects of magnetic monopoles. For a survey up to 1990 see [22]. An inquiry to MathSciNet with Anywhere = "magnetic monopoles" brings up 970 matches, as of June 7, 2011.

8 The Standard Model of Elementary Particles

Ran out of time, although this topic was the real goal of the seminar.

Fortunately, you have available Chapter 9 of Folland, [21], which gives tremendous perspective, both historical and technical, into the evolution and structure of the standard model.

Another excellent book describing the relation between group theory and, among other things, the classification of elementary particles is the book [55] by Shlomo Sternberg.

Maybe the next time this seminar is taught this topic can be squeezed in. It will require an exposition of connections on vector bundles. See for example [15].


The exposition of connections on vector bundles in [42] is long on the mathematics and its relation to important topics within mathematics, e.g., application to Donaldson polynomials, Jones-Witten invariants, etc. But it also explains the relation of this mathematics to physicists' gauge field theory.

Anyway, the main idea linking what we have done to the currently most successful classification of elementary particles is now simple to describe.

In Section 7.1 we saw that the combination of electromagnetic theory with quantum mechanics leads to an interpretation of the electromagnetic potential as a connection form for the vector bundle C × R⁴ → R⁴ with structure group U(1), the circle group. Consider the following simple generalization of this. Take your favorite compact Lie group K and an irreducible unitary representation of K on a finite dimensional inner product space V. The map V × R⁴ → R⁴ is then a vector bundle with structure group K. Denote by k the Lie algebra of K. Then a k valued 1-form A on R⁴ determines a covariant derivative for this bundle. Of course a change in the global trivialization of this bundle will induce a change in the 1-form A, if we interpret this 1-form as a connection form for this bundle, as we did in the electromagnetic case, and as we will do now. The associated covariant derivative on this bundle now represents some kind of force field, as it did in the electromagnetic case. In the electromagnetic case the particles that implemented the electromagnetic force were photons. But in this more general case they are some other kind of particles, called gluons. The interesting case is that in which K is non-commutative. The curvature of a covariant derivative can be expressed in terms of a connection form A for it as

R = dA+ A ∧ A. (8.1)

This is a k valued 2-form on R⁴. If k = iR then the second term in (8.1) is zero and the curvature reduces to the 2-form defining the electric and magnetic fields, as in (3.31). In the non-commutative case we should interpret the six independent k valued components of R as "non-commutative" "electric" and "magnetic" fields. These are the force fields associated to the connection form A. The Maxwell equations (3.37) are now replaced by the Bianchi identity, DR = 0, which is automatic when R is the curvature (8.1). And the Maxwell equations (3.38) are replaced by the equation D∗R = J, the Yang-Mills equation. In this geometric way we arrive at an extension of electromagnetic theory. At the present time these new fields seem to provide the best basis for a comprehensive theory of elementary particles.
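The role of the A ∧ A term in (8.1) can be seen in two lines of linear algebra: for a matrix-valued 1-form A = A₁dx¹ + A₂dx², evaluating A ∧ A on the coordinate pair (e₁, e₂) gives the commutator [A₁, A₂]. A small numpy sketch (the particular su(2) matrices below are my own illustrative choice, not from the notes):

```python
import numpy as np

# Two components of an su(2)-valued connection form A = A1 dx^1 + A2 dx^2,
# chosen here as i times Pauli matrices (an arbitrary illustration).
A1 = 1j*np.array([[0, 1], [1, 0]])       # i * sigma_x
A2 = 1j*np.array([[0, -1j], [1j, 0]])    # i * sigma_y

# (A ^ A)(e1, e2) = A1 A2 - A2 A1 = [A1, A2].
wedge = A1 @ A2 - A2 @ A1
assert not np.allclose(wedge, 0)   # non-abelian: A ^ A contributes to R

# For k = iR (electromagnetism) the components are commuting scalars,
# so the A ^ A term in (8.1) vanishes and R reduces to dA.
a1, a2 = 0.5j, -1.3j
assert a1*a2 - a2*a1 == 0
```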

What are these force fields acting on? They are acting on whatever particles are being represented by sections of the vector bundle V × R⁴ → R⁴. In the electromagnetic case V was one dimensional and the electric and magnetic fields were acting on one charged particle, say an electron. But if V is, e.g., eight dimensional then a section represents the state of any one of eight particles, one for each of your favorite basis vectors of V. Of course, given the group K, the vector space V is limited to those that support an irreducible unitary representation of K. Moreover the elements of the Lie algebra of K will transfer to observables, as we learned in Section 4.8. Here are the names of some of these observables: isotopic spin, hypercharge, strangeness, beauty, charm. {The last three are not quite right. See Griffiths' book, [24].} {Concerning spin, the previous discussion is oversimplified. Spin comes not from the group K but from a more geometric source, the action of the Lorentz group on Dirac wave functions, as in Section 6. From the point of view of the preceding discussion, one can just replace V by C⁴ ⊗ V. But K acts only on V.} In the standard model one takes K = SU(3) × SU(2) × U(1).

I’m reluctant to leave the reader with the impression that this simpleswitch from structure group U(1) to a general compact structure group wasinvented by someone just “thinking upon it”, as Newton claims to have donewith some of his own theories. In fact the evolution of this extension fromelectromagnetism to non-commutative structure group was driven by experi-ments. Even though Rutherford (1911) used alpha particles to determine thestructure of an atom (small nucleus inside big atom) he didn’t really knowwhat an alpha particle was. Yes, it had charge +2 and mass = 4 protonmasses. It could have been 4 protons stuck together with 2 electrons em-bedded. Or it could have been 2 protons + 2 neutrons (if they exist). Itwasn’t till 1932 that the British experimental physicist, Chadwick, showedthat neutrons exist. In any case the electric forces between any two protonsare pushing the protons apart with tremendous force because the two pro-tons are so close and have the same charge. So whats holding the nucleustogether? There must be another force between any of these nucleons (pro-tons or neutrons), an attractive force, even stronger than the electric force,that beats out the repulsive electric force. Heisenberg, immediately uponhearing of Chadwick’s discovery, made a (rather ad hoc, I would say) math-ematical formalism to allow the possibility that a proton and a neutron are

162

really the same particle, but in different states. The proton is in a chargedstate. The mysterious attractive forces between nucleons are independentof their charge, proposed Heisenberg. The charge only affects the repulsiveelectric force, [31]. See also Heisenberg’s related papers [32, 33, 34].

This hypothesis of charge independence of the nuclear force can be tested experimentally. It predicts that the isotopes N15 and O15 should have a similar radiation spectrum from their nuclei even though N15 has 7 protons and 8 neutrons while O15 has 8 protons and 7 neutrons. The prediction was experimentally confirmed between 1932 and 1936 (get dates). This was an impetus for Eugene Wigner to amend Heisenberg's simple formalism and promote invariance of nuclear forces under Heisenberg's interchange operator (proton ↔ neutron) to invariance under SU(2). You might like to read the extract from Wigner's paper, reproduced in the next section, to be sure that I'm not just putting words in his mouth, or thoughts in his mind.

It was another 17 years before Yang and Mills promoted Wigner's global SU(2) invariance to "local" invariance, which for us means vector bundles with non-commutative structure group.

8.1 The neutron and non-commutative structure groups. Timeline.

1911 (May): Rutherford determined the structure of an atom: small nucleus (radius = 10⁻¹³ cm) compared to the overall radius of the atom (10⁻⁸ cm).

1932 (May 10): Chadwick provided convincing evidence that neutrons exist, [10, 9].

1932: Heisenberg, motivated by Chadwick's discovery, made an ad hoc theory of nuclear forces. It predicts that the isotopes N15 and O15 should have similar radiation spectra even though N15 has 7 protons and 8 neutrons while O15 has 8 protons and 7 neutrons, [31]. See also Heisenberg's related papers [32, 33, 34]. The prediction was experimentally confirmed between 1932 and 1936 (get dates).

1937: Eugene Wigner, [69], promoted Heisenberg's invariance of neutron-proton interactions from invariance under Z2 (interchange of proton and neutron wave functions) to invariance under SU(2). Thereafter SU(2) is referred to as the isotopic spin group, and must be distinguished from the ordinary spin group SU(2) (which covers SO(3) and thereby acquires an understandable geometric meaning).

From the first paragraph of Wigner’s paper.

Recent [experimental] investigations [two references are given] appear to show that the forces between all pairs of constituents of the nucleus are approximately equal. This makes it desirable to treat the protons and neutrons on an equal footing. A scheme for this was devised in his original paper by W. Heisenberg, who considered protons and neutrons as different states of the same particle. Heisenberg introduced a variable τ which we shall call the isotopic spin. The value -1 of this variable can be assigned to the proton state of the particle, the value +1 to the neutron state. The assumption that the forces between all pairs of particles are equal is equivalent, then, to the assumption that the forces do not depend on τ, or that the Hamiltonian does not involve the isotopic spin.

In Heisenberg’s scheme τ is the 2 × 2 matrix diag (-1,1) acting on C2.Wigner gave arguments, based on the experimental evidence, for assertingthat a potential operator V on L2(R3) ⊗ C2 should be invariant not justunder I ⊗ Z2 but also under I ⊗ SU(2) if V is to represent nuclear forcesconsistently with experiment.

1954: Yang and Mills [72] promoted Wigner's SU(2) invariance to a local SU(2) invariance. This means that, whereas Wigner required his and Heisenberg's nuclear force theory to be invariant under the map ψ(x, t) ↦ gψ(x, t) for any fixed element g ∈ SU(2) (recall that ψ now takes values in C²), Yang and Mills allowed g to depend on space and time. This local SU(2) "gauge invariance" is linked to the existence of SU(2) "gauge fields" in the same way that the non-uniqueness of the electromagnetic potential A is linked to the electromagnetic field. See the discussion surrounding equation (7.11).

9 Appendices

9.1 The Legendre transform for convex functions

4/27/11 This appendix contains some redundancies. Compare with Section 2.7.1. I replaced these two sections by Section 2.7.1 on 4/27/11. So the appendix on the Legendre transform (in general and for quadratic polynomials) isn't needed at all for these notes.

As a cultural matter you should know that the formulas derived in Section 9.1.1 for quadratic plus linear functions of velocity fit into a very clean and general scheme of transforms. I will sketch this here, so that you don't think that the formulas of Section 9.1.1 are just ad hoc computations. However we will only need the results of Section 9.1.1 for our applications to Hamiltonian mechanics. The more general case is needed for thermodynamics and other parts of mathematics. You can skip this section without too much guilt.

The Legendre transform is a map from convex functions on a real vector space V to convex functions on the dual space. Here are some definitions. We take V to be finite dimensional. A function f with domain D(f) ⊂ V is convex if D(f) is convex and f(ax + (1 − a)y) ≤ af(x) + (1 − a)f(y) for all x, y ∈ D(f) and for all a ∈ [0, 1]. The epigraph of f is the set {(v, y) ∈ D(f) × R : y ≥ f(v)}. A convex function is closed if its epigraph is closed in V × R. Let p ∈ V∗. A hyperplane in V × R with slope p is the graph of the function v ↦ 〈p, v〉 + c for some constant c ∈ R. This hyperplane is a support plane for f if 〈p, v〉 + c ≤ f(v) for all v ∈ D(f) and c is the largest constant for which this inequality holds. In other words, c = inf{f(v) − 〈p, v〉 : v ∈ D(f)} = − sup{〈p, v〉 − f(v) : v ∈ D(f)} (which could turn out to be −∞). The hyperplane clearly intersects the y axis at c.

Definition 9.1 The Legendre transform of f is the function f ∗ given by

f ∗(p) = sup{〈p, v〉 − f(v) : v ∈ D(f)}, (9.1)

with domain D(f∗) consisting of all those p ∈ V∗ for which the sup is less than +∞. (In other words c > −∞.) f∗ is called the conjugate function to f.

These functions are related by the following easy to state theorem.

Theorem 9.2 If f is a closed convex function on its domain in V then f∗ is a closed convex function on its domain in V∗. Moreover

f∗∗ = f   (9.2)

This is a nice classical theorem with a nice geometric flavor. The proof is easy and we refer the reader to [26][page 146] for a proof. {A proof is in the Appendices, but commented out.} But it is not exactly the theorem we need because we are really interested only in smooth, strictly convex functions f which are defined on all of V. Of course the theorem applies to this restricted class of functions. But we can say a little more in this case. Draw a picture of a parabola as a model for a nice convex function on R and you will see that the support plane (a line in this case) touches the graph of f at exactly one point (v, f(v)), just because f is strictly convex. (If f had flat spots the support line could touch all along a flat spot.) At this unique point v we clearly have

f ′(v) = p. (9.3)

This equation determines v as a function of p, say v = φ(p), because f′ is strictly increasing, since f″ > 0. If one only had f″ ≥ 0 then there could be flat spots, and in this case f∗ is still well defined on V∗ but the diffeomorphism ψ⁻¹ constructed in Section 9.1.1 no longer exists. This failure is actually a conceptually important part of thermodynamics because it reflects the existence of phase transitions. But we will need to use the diffeomorphism ψ in our application to mechanics as well as the Legendre transform f ↦ f∗.

Example 9.3 Take V = R and let

f(v) = (1/2)mv² + av   (9.4)

for some constants m > 0 and a ∈ R. The equation (9.3) reduces to

mv + a = p. (9.5)

So φ(p) = m⁻¹(p − a). In accordance with (9.1) the conjugate function f∗ is given by

f∗(p) = [pv − f(v)]_{v=φ(p)} = (1/(2m))(p − a)²   (9.6)

Thus in this example the familiar relation p = mv fails and is replaced by (9.5). Indeed we will see later that (9.5) more accurately captures the relation between momentum and velocity than p = mv does when the particle is acted on by an electromagnetic field. You might like to know that (9.6) can be partly blamed for forcing the appearance of connections on vector bundles in physics.
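The sup definition (9.1) and the closed form (9.6) can be compared numerically. A short numpy sketch (the values of m, a and the sample momenta are arbitrary choices of mine, not from the notes):

```python
import numpy as np

m, a = 2.0, -0.7                       # sample constants, with m > 0
f = lambda v: 0.5*m*v**2 + a*v         # f(v) = (1/2) m v^2 + a v, as in (9.4)

v = np.linspace(-20.0, 20.0, 400001)   # grid on which to take the sup
for p in (-3.0, 0.0, 1.5):
    sup_value = np.max(p*v - f(v))     # brute-force version of (9.1)
    closed_form = (p - a)**2/(2*m)     # the formula (9.6)
    assert abs(sup_value - closed_form) < 1e-6
```

The maximizer on the grid sits near v = (p − a)/m, which is exactly the relation (9.5).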


All of this discussion leading to the construction of the function φ applies to higher dimensions also, under the strict convexity assumption f″(v)〈w, w〉 > 0 (for w ≠ 0). The equation (9.3) should now be interpreted, of course, to assert that the linear functional on the left equals the linear functional on the right. We are only going to consider functions f that are quadratic, in the sense of Example 9.3. This is done in Section 9.1.1. (The formula based version in Section 9.1.1 doesn't require convexity.) The function φ will therefore be affine, just as in the example, and in particular φ will always be a diffeomorphism from V∗ onto V. The formula (9.6) is clearly a special case of (9.13). If you are inclined to find a more general class of smooth, strictly convex functions for which φ is a global diffeomorphism, tell me what you find out.

9.1.1 The Legendre transform for second degree polynomials

This section computes the Legendre transform for a slightly more general class of quadratic functions than those given in the examples of Section 2.7.1. It is not needed for the rest of these notes.

Let Y be a finite dimensional real vector space and denote by Y∗ its dual space. We want to consider a class of functions on Y which will be typical for the velocity dependence of a Lagrangian L(q, v) at a fixed point q. Let M : Y → Y∗ be an invertible linear transformation and let α ∈ Y∗. Consider the function f : Y → R given by

f(v) = (1/2)〈Mv, v〉 + 〈α, v〉, v ∈ Y (9.7)

We may identify Y∗∗ with Y by the natural isomorphism and then, since M∗ maps Y∗∗ to Y∗, we may write M∗ : Y → Y∗. In (9.7) we can assume without loss of generality that M∗ = M because 〈Mv, v〉 = 〈v, M∗v〉 anyway.

The derivative of f in a direction u is a linear functional of u given by

〈p, u〉 ≡ ∂uf(v) = 〈Mv, u〉+ 〈α, u〉 ∀u ∈ Y. (9.8)

Here, p is an element of Y ∗ depending on v. It is given by

p = ψ(v) = Mv + α. (9.9)

Since we have assumed that M is invertible we can solve for v in terms of p, finding

v = ψ−1(p) = M−1(p− α). (9.10)


The Legendre transformation of f is the function f ∗ : Y ∗ → R given by

f ∗(p) = 〈p, v〉 − f(v), (9.11)

wherein it is understood that v should be replaced by the function of p derived in (9.10). Thus

f∗(p) = 〈p, M⁻¹(p − α)〉 − {(1/2)〈MM⁻¹(p − α), M⁻¹(p − α)〉 + 〈α, M⁻¹(p − α)〉}
      = (1/2)〈M⁻¹(p − α), (p − α)〉 (9.12)

SUMMARY: If f(v) = (1/2)〈Mv, v〉 + 〈α, v〉, v ∈ Y, then its Legendre transform f∗ : Y∗ → R is defined by (9.11) with the insertion of (9.10). It is given by

f ∗(p) = (1/2)〈M−1(p− α), (p− α)〉 for p ∈ Y ∗ (9.13)

Moreover the map ψ : Y → Y ∗ is a diffeomorphism.
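The formula (9.13) can also be checked numerically in dimension two, by brute-force maximization of 〈p, v〉 − f(v) over a grid. A sketch; the matrix M, the vector α, the search grid, and the tolerance are all ad hoc choices.

```python
# Brute-force check of (9.13) in dimension two (a sketch; M, alpha, the
# grid, and the tolerance are ad hoc).
M = [[2.0, 0.5], [0.5, 1.0]]          # symmetric, positive definite
alpha = [0.3, -0.7]

def f(v):
    Mv = [M[0][0] * v[0] + M[0][1] * v[1],
          M[1][0] * v[0] + M[1][1] * v[1]]
    return 0.5 * (Mv[0] * v[0] + Mv[1] * v[1]) + alpha[0] * v[0] + alpha[1] * v[1]

def f_star_exact(p):
    # (9.13): f*(p) = (1/2) <M^{-1}(p - alpha), p - alpha>
    b = [p[0] - alpha[0], p[1] - alpha[1]]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    Mb = [(M[1][1] * b[0] - M[0][1] * b[1]) / det,
          (-M[1][0] * b[0] + M[0][0] * b[1]) / det]
    return 0.5 * (Mb[0] * b[0] + Mb[1] * b[1])

def f_star_numeric(p, r=10.0, n=201):
    # sup of <p, v> - f(v) over a grid; attained at v = M^{-1}(p - alpha)
    grid = [-r + 2 * r * i / (n - 1) for i in range(n)]
    return max(p[0] * x + p[1] * y - f([x, y]) for x in grid for y in grid)

p = [1.2, -0.4]
assert abs(f_star_numeric(p) - f_star_exact(p)) < 1e-2
print("f*(p) agrees with (9.13) at p =", p)
```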

9.2 Poisson’s equation

We will write as usual r = |x| in R3.

Theorem 9.4

∆(1/r) = −4πδ. (9.14)

in the distribution sense. That is,

∆T_{1/r} = −4πT_δ

We will break the proof up into several small steps.

Lemma 9.5 At r ≠ 0, ∆(1/r) = 0.


Proof. ∂(1/r)/∂x = −x/r³ and ∂²(1/r)/∂x² = −1/r³ + 3x²/r⁵. So

∆(1/r) = −3/r³ + 3(x² + y² + z²)/r⁵ = −3/r³ + 3/r³ = 0.

QED.

In view of this lemma you can see that we have only to deal now with the singularity at r = 0. Our notion of weak derivative is just right for doing this.

The trick is to avoid the singularity until after one does some clever integration by parts (in the form of the divergence theorem). In case you forgot your vector calculus identities, a self-contained review is at the end of this section. I want to warn you that this is not the kind of proof that you are likely to invent yourself. But the techniques occur so frequently that there is some virtue in following it through at least once in one's life.

Recall the usual notation: D = C∞c (R3).

Lemma 9.6 Let φ ∈ D. Then

∫_{R³} (1/r)∆φ(x) dx = lim_{ε→0} ∫_{r≥ε} (1/r)∆φ dx

Proof: The difference between the left and the right sides before taking the limit is at most (use spherical coordinates in the next step)

|∫_{r≤ε} (1/r)∆φ d³x| ≤ max_{x∈R³}|∆φ(x)| ∫_{r≤ε} (1/r) d³x = max_{x∈R³}|∆φ(x)| · 2πε² → 0

QED.

Before really getting down to business let's apply the definitions.

∆T_{1/r}(φ) = Σ_{j=1}³ (∂²/∂x_j²) T_{1/r}(φ)
            = −Σ_{j=1}³ (∂/∂x_j) T_{1/r}(∂φ/∂x_j)
            = T_{1/r}(∆φ)
            = ∫_{R³} (1/r)∆φ(x) d³x
            = lim_{ε→0} ∫_{r≥ε} (1/r)∆φ(x) d³x.


So what we really need to do is show that this limit is −4πφ(0). To this end we are going to apply some standard integration by parts identities in the "OK" region r ≥ ε.

Cε := ∫_{r≥ε} (1/r)∆φ(x) d³x
    = ∫_{r≥ε} ∇·((1/r)∇φ − φ∇(1/r)) d³x        by identity (9.18)
    = ∫_{r=ε} ((1/r)∇φ·n − φ(∇(1/r))·n) dA     by the divergence theorem

where n is the unit normal pointing toward the origin. The other boundary term in this integration by parts identity is zero because we can take it over a sphere so large that φ is zero on and outside it.

Now

|∫_{r=ε} (1/r)(∇φ·n) dA| = (1/ε)|∫_{r=ε} (∇φ·n) dA| ≤ (1/ε)(max|∇φ|)·4πε² → 0

as ε ↓ 0. This gets rid of one of the terms in Cε in the limit. For the other one just note that (∇(1/r))·n = −∂(1/r)/∂r = 1/r². So

−∫_{r=ε} φ(∇(1/r))·n dA = −(1/ε²) ∫_{r=ε} φ(x) dA
    = −(1/ε²) ∫_{r=ε} φ(0) dA − (1/ε²) ∫_{r=ε} (φ(x) − φ(0)) dA
    = −4πφ(0) − (1/ε²) ∫_{r=ε} (φ(x) − φ(0)) dA

Only one more term to get rid of!

(1/ε²)|∫_{r=ε} (φ(x) − φ(0)) dA| ≤ max_{|x|=ε}|φ(x) − φ(0)| · 4π → 0

because φ is continuous at x = 0. This proves (9.14).
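For a radial test function the limit just proved can be watched numerically. In spherical coordinates the integral over r ≥ ε becomes Cε = 4π ∫_ε^∞ (1/r)(φ″(r) + (2/r)φ′(r)) r² dr, which should approach −4πφ(0). A sketch; the Gaussian stand-in is not compactly supported (so not literally in D), but it decays fast enough, and the quadrature parameters are ad hoc.

```python
# Numerical illustration of (9.14) for a radial test function (a sketch;
# phi(r) = exp(-r^2) is an ad hoc stand-in, not compactly supported).
import math

def phi(r):
    return math.exp(-r * r)

def lap_phi(r):
    # Laplacian of the radial function: phi''(r) + (2/r) phi'(r)
    return (4 * r * r - 6) * math.exp(-r * r)

def C(eps, R=8.0, n=200000):
    # midpoint rule for 4*pi * integral of (1/r) * lap_phi(r) * r^2 dr
    h = (R - eps) / n
    total = 0.0
    for i in range(n):
        r = eps + (i + 0.5) * h
        total += r * lap_phi(r) * h        # (1/r) * r^2 = r
    return 4 * math.pi * total

val = C(1e-3)
assert abs(val - (-4 * math.pi * phi(0.0))) < 1e-2
print("C_eps =", val, "vs -4*pi*phi(0) =", -4 * math.pi)
```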


Vector calculus identities.

If f is a real valued function and G is a vector field, both defined on some region in R³, then

∇·(fG) = (∇f)·G + f∇·G (9.15)

Application #1. Take f = 1/r and G = ∇φ. Then we get

∇·((1/r)∇φ) = (∇(1/r))·∇φ + (1/r)∆φ wherever r ≠ 0. (9.16)

Application #2. Take f = φ and G = ∇(1/r). Then we get

∇·(φ∇(1/r)) = (∇φ)·(∇(1/r)) + φ∆(1/r) wherever r ≠ 0 (9.17)

But ∆(1/r) = 0 wherever r ≠ 0. So subtracting (9.17) from (9.16) we find

(1/r)∆φ = ∇·((1/r)∇φ − φ∇(1/r)) wherever r ≠ 0. (9.18)

This is the identity we need in the proof of (9.14).

9.3 Some matrix groups and their Lie algebras

This appendix is an extraction of a very small part of the very readable book [30] by Brian Hall on Lie groups. The following is self-contained in the sense that the exercises, which are a guide to the facts, are easily doable by today's students.

Recall (or prove yourself):

If A is an n × n matrix with real or complex entries then the initial value problem

(O.D.E.) du(t)/dt = Au(t), t ∈ R, u(t) ∈ Rⁿ (9.19)

(Initial value) u(0) = v, v ∈ Rⁿ (9.20)

has a unique solution. It is given by

u(t) = e^{tA}v (9.21)


where

e^{tA} ≡ Σ_{n=0}^∞ (tA)ⁿ/n!. (9.22)

Check (at your leisure) that the series converges in operator norm and that the sum satisfies

d e^{tA}/dt = A e^{tA}, (9.23)

which ensures that (9.21) gives a solution to (9.19).

Exercise 9.7

a) If A is an n × n matrix show that

(d/dt) det(e^{tA})|_{t=0} = trace A (9.24)

Hint: By (9.22), e^{tA} = 1 + tA up to order two in t. Expand det(1 + tA) by its first column and use induction.

b) Prove, using a), that

(d/dt) det(e^{tA}) = (det e^{tA})(trace A) for all t. (9.25)

c) Prove, using b), that

det e^{tA} = 1 for all real t, if and only if trace A = 0. (9.26)
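Part b) integrates to det(e^{tA}) = e^{t·trace A}. Here is a numerical spot-check using the series (9.22) directly; the 2 × 2 sample matrix and the truncation order are ad hoc choices.

```python
# Spot-check of det(e^{tA}) = e^{t trace A}, a consequence of (9.25)
# (a sketch; the sample matrix and truncation order are ad hoc).
import math

A = [[0.3, -1.2], [0.7, -0.5]]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(t, terms=40):
    # partial sums of the series (9.22) for e^{tA}
    S = [[1.0, 0.0], [0.0, 1.0]]   # running sum, starts at the identity
    P = [[1.0, 0.0], [0.0, 1.0]]   # current term (tA)^n / n!
    tA = [[t * A[i][j] for j in range(2)] for i in range(2)]
    for n in range(1, terms):
        P = mat_mul(P, tA)
        P = [[P[i][j] / n for j in range(2)] for i in range(2)]
        S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return S

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

for t in (0.5, 1.0, 2.0):
    assert abs(det(expm(t)) - math.exp(t * (A[0][0] + A[1][1]))) < 1e-8
print("det(e^{tA}) = e^{t trace A} for a sample A")
```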

Exercise 9.8

a. Prove: e^{tA} is orthogonal for all t if and only if A∗ = −A.

b. Prove: e^{tA} is unitary for all t if and only if A∗ = −A.

Hint: Differentiate (e^{tA})∗ e^{tA}.

Notation 9.9

O(n) = {n× n real matrices T : T ∗T = 1} (9.27)

SO(n) = {T ∈ O(n) : det(T ) = 1} (9.28)

U(n) = {n× n complex matrices T : T ∗T = 1} (9.29)

SU(n) = {T ∈ U(n) : detT = 1} (9.30)


Notation 9.10

o(n) = {n× n real matrices A : A∗ = −A} (9.31)

so(n) = {A ∈ o(n) : trace A = 0} (9.32)

u(n) = {n× n complex matrices A : A∗ = −A} (9.33)

su(n) = {A ∈ u(n) : trace A = 0} (9.34)

Exercise 9.11 Prove

a) A ∈ o(n) ⇐⇒ e^{tA} ∈ O(n) ∀ t ∈ R
b) A ∈ so(n) ⇐⇒ e^{tA} ∈ SO(n) ∀ t ∈ R
c) A ∈ u(n) ⇐⇒ e^{tA} ∈ U(n) ∀ t ∈ R
d) A ∈ su(n) ⇐⇒ e^{tA} ∈ SU(n) ∀ t ∈ R
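Part a) can be illustrated numerically: for a skew-symmetric A the partial sums of (9.22) converge to an orthogonal matrix. A sketch; the 2 × 2 example and the truncation order are ad hoc choices.

```python
# Illustration of Exercise 9.11 a) (a sketch): for skew-symmetric A the
# series (9.22) gives an orthogonal matrix, here a plane rotation.
import math

A = [[0.0, -1.3], [1.3, 0.0]]          # A* = -A

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(t, terms=40):
    S = [[1.0, 0.0], [0.0, 1.0]]       # partial sum, starts at I
    P = [[1.0, 0.0], [0.0, 1.0]]       # current term (tA)^n / n!
    tA = [[t * A[i][j] for j in range(2)] for i in range(2)]
    for n in range(1, terms):
        P = mat_mul(P, tA)
        P = [[P[i][j] / n for j in range(2)] for i in range(2)]
        S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return S

E = expm(0.7)
Et = [[E[j][i] for j in range(2)] for i in range(2)]   # transpose
EtE = mat_mul(Et, E)
assert all(abs(EtE[i][j] - (1.0 if i == j else 0.0)) < 1e-10
           for i in range(2) for j in range(2))
# for this A, e^{tA} is rotation by the angle 1.3 t:
assert abs(E[0][0] - math.cos(1.3 * 0.7)) < 1e-10
print("e^{tA} is orthogonal; here it is rotation by 0.91 radians")
```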

Notation 9.12 For any two n× n matrices define

[A,B] = AB −BA (9.35)

Exercise 9.13 Show that each of the four sets defined in Notation 9.10 is closed under the commutator product, A, B ↦ [A,B]. Thus each of the four linear sets of matrices is an algebra with respect to the product (9.35). Unfortunately (in the view of some) this product is not associative.

Exercise 9.14 Although the commutator product (9.35) is not associative, we do have instead the following identity for any three n × n matrices.

[A, [B,C]] + [B, [C,A]] + [C, [A,B]] = 0. Jacobi’s identity. (9.36)

Prove this. Notice the easy to remember cyclicity in this identity.

Definition 9.15 A real Lie algebra is a real vector space L with a bilinear pairing L × L ∋ (A,B) ↦ [A,B] which is skew-symmetric (i.e. [A,B] = −[B,A]) and which satisfies Jacobi's identity (9.36).

You showed in Exercises 9.13 and 9.14 that each of the four linear spaces in Notation 9.10 is a Lie algebra with respect to the commutator product (9.35).

Terminology. The linear space o(n), with the product A, B ↦ [A,B], is the Lie algebra of O(n). Similarly so(n) is the Lie algebra of SO(n), while u(n) is the Lie algebra of U(n) and su(n) is the Lie algebra of (make a bold guess).


The adjoint representation.

If G is any one of the four groups defined in Notation 9.9 and g is its Lie algebra, define

R(g)A = gAg⁻¹ for g ∈ G and A ∈ g (9.37)

Exercise 9.16 a) Show that for each of the four groups in Notation 9.9 the map R(g) carries g back into itself for each g ∈ G.

b) Show that the map g ↦ R(g) is a homomorphism of G into the group of invertible operators on g.

Terminology. R is called the adjoint representation.

Status. You have now shown that for certain matrix groups G, namely those defined in Notation 9.9, the set L of matrices A for which e^{tA} ∈ G for all real t forms a real vector space which is closed under the commutator product [A,B].

Fact. This is true for all Lie groups of matrices.

9.3.1 Connection between SU(2) and SO(3).

Exercise 9.17 If x, y, z are real numbers then the matrix

A ≡ (   z      x + iy )
    ( x − iy     −z   )

is a Hermitian 2 × 2 matrix with trace zero. Moreover any Hermitian 2 × 2 matrix with trace zero is clearly of this form for some unique x, y, z.

a. Show that det(A) = −(x² + y² + z²).

b. Show that if U is a unitary 2 × 2 matrix and A′ = UAU∗ then A′ is Hermitian and has trace zero.

c. Write

A′ = (   z′      x′ + iy′ )
     ( x′ − iy′     −z′   )

where A′ is given in part b. Clearly (x′, y′, z′) depends linearly on (x, y, z). Denote this linear transformation by ρ(U). Show that ρ(·) is a homomorphism from SU(2) into O(3).

d. Show that if AB = BA for all n × n traceless Hermitian matrices A (where B is an n × n matrix) then B is a scalar multiple of the identity matrix.

e. Prove that the kernel of ρ (defined in part c.) consists of {I,−I}.
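The map ρ of part c. can be computed explicitly in coordinates: conjugate A(x, y, z) by a unitary U and read off (x′, y′, z′). The sketch below does this for one diagonal U ∈ SU(2) and checks that the length of (x, y, z) is preserved, so ρ(U) ∈ O(3); the particular U, sample vector, and tolerances are ad hoc choices.

```python
# A coordinate sketch of rho from Exercise 9.17 (sample U and vector are
# ad hoc): A' = U A U*, and (x', y', z') is read off from A'.
import cmath, math

def Amat(x, y, z):
    return [[z, x + 1j * y], [x - 1j * y, -z]]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(X):
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def rho(U, v):
    Ap = mul(mul(U, Amat(*v)), dagger(U))
    return (Ap[0][1].real, Ap[0][1].imag, Ap[0][0].real)   # (x', y', z')

theta = 0.8
U = [[cmath.exp(1j * theta / 2), 0], [0, cmath.exp(-1j * theta / 2)]]
v = (1.0, 2.0, 3.0)
vp = rho(U, v)

# rho(U) preserves x^2 + y^2 + z^2 = -det(A), so it lies in O(3) ...
assert abs(sum(c * c for c in vp) - sum(c * c for c in v)) < 1e-12
# ... and this particular U rotates the (x, y) plane by theta, fixing z:
assert abs(vp[0] - (math.cos(theta) * v[0] - math.sin(theta) * v[1])) < 1e-12
assert abs(vp[2] - v[2]) < 1e-12
print("rho(U) acts as a rotation by theta about the z axis")
```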


9.3.2 The Pauli spin matrices

Define

σx = ( 0  1 ),   σy = ( 0  −i ),   σz = ( 1   0 )
     ( 1  0 )         ( i   0 )         ( 0  −1 )      (9.38)

These are the Pauli spin matrices. Their significance lies in the fact that the following multiples of them form a very convenient basis of the Lie algebra su(2). Define

sx = (i/2)σx, sy = (i/2)σy, sz = (i/2)σz (9.39)

Observe first that the three Pauli matrices are Hermitian and have trace zero. Therefore the three s matrices are skew-adjoint and have trace zero. In view of Notation 9.10 the three s matrices lie in su(2). When you have a few seconds to spare you should check that su(2) is actually three dimensional (over R) and that the three s matrices actually span su(2). The reason for the factors of 1/2 is that the following nifty commutation relations hold.

[sx, sy] = −sz , [sy, sz] = −sx , [sz, sx] = −sy . (9.40)

(Check the signs: with the convention (9.39) one finds sxsy − sysx = (i/2)²[σx, σy] = −(i/2)σz = −sz.)
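The signs can be checked numerically, as the text suggests. With the convention sj = (i/2)σj of (9.39) the commutators close in su(2) with a minus sign; a pure-Python sketch:

```python
# Checking the signs of the commutation relations for s_j = (i/2) sigma_j
# (a sketch in pure Python): [s_x, s_y] = -s_z, and cyclically.
def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def comm(X, Y):
    XY, YX = mul(X, Y), mul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(2)] for i in range(2)]

def scale(c, X):
    return [[c * X[i][j] for j in range(2)] for i in range(2)]

def close(X, Y):
    return all(abs(X[i][j] - Y[i][j]) < 1e-12
               for i in range(2) for j in range(2))

sigma_x = [[0, 1], [1, 0]]
sigma_y = [[0, -1j], [1j, 0]]
sigma_z = [[1, 0], [0, -1]]
sx, sy, sz = (scale(0.5j, s) for s in (sigma_x, sigma_y, sigma_z))

assert close(comm(sx, sy), scale(-1, sz))
assert close(comm(sy, sz), scale(-1, sx))
assert close(comm(sz, sx), scale(-1, sy))
print("[sx, sy] = -sz, [sy, sz] = -sx, [sz, sx] = -sy")
```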

Theorem 9.18 Identify R³ with the set of 2 × 2 Hermitian matrices of trace zero by the map

(x, y, z) ↦ A ≡ (   z      x + iy )
                ( x − iy     −z   )

For U ∈ SU(2) define ρ(U) ∈ SO(3) as in Exercise 9.17. Then the map

SU(2) → SO(3) given by g ↦ ρ(g) (9.41)

is a homomorphism of SU(2) onto SO(3) with kernel {I, −I}.

Proof. Most of this has already been proven in Exercise 9.17. It remains only to prove the surjectivity. To this end it suffices to prove that any rotation around some axis is in the image of ρ. We will do this for rotation around the z axis. This computation will be very useful later in understanding spin.

We already know that for any real number θ the matrix exp(θsz) lies in SU(2) (by Exercise 9.11). Moreover, since sz is diagonal we can compute the exponential easily. We find

exp(θsz) = ( e^{iθ/2}      0      )
           (    0      e^{−iθ/2} )      (9.42)


and therefore

exp(θsz) (   z      x + iy ) exp(−θsz) = (        z           e^{iθ}(x + iy) )
         ( x − iy     −z   )             ( e^{−iθ}(x − iy)         −z        )      (9.43)

What is the meaning of this matrix? Just write x + iy = re^{iφ} and then you see that the upper right entry is re^{i(θ+φ)}. This represents a rotation in the x, y plane by an amount θ. Thus every rotation around the z axis lies in the range of the homomorphism ρ. I'll leave it to your trust in symmetry to conclude that every rotation around any axis is in the range of ρ. Therefore ρ is surjective.

It will be really useful to write the result of the computation in the preceding proof in the form

ρ(e^{θsz}) = e^{θAω}, ω = (0, 0, 1) (9.44)

where Aω is defined in (2.5). As we know, e^{θAω} is the matrix (2.6), which is rotation in the x, y plane through an angle θ.

Peculiarity. We see that as θ moves from 0 to 2π, the operator ρ(e^{θsz}) runs through rotations starting from the identity rotation and back to the identity rotation. But what is e^{θsz} doing? At θ = 2π we have, by (9.42),

exp(2πsz) = ( −1   0 ) = −I      (9.45)
            (  0  −1 )

In fact the curve θ ↦ exp(θsz) clearly returns to the identity operator in SU(2) only when θ reaches 4π. But then its image has returned to the identity operator in SO(3) twice. This is consistent, to say the least, with the assertion in Theorem 9.18 that the kernel of the homomorphism ρ is {±I}.

9.4 grad, curl, div and d

These days it's good to be able to switch back and forth between the vector differential operators gradient, curl and divergence on the one hand, and the exterior derivative d over R³ on the other. For a nice exposition of these relations see Hubbard and Hubbard [36], Section 6.8.

Here is a short summary. If v = (v1, v2, v3) ∈ R³ we may associate with it a 1-form and a 2-form:


v^{(1)} = Σ_{j=1}³ vj dxj, which is a 1-form, and (9.46)

v^{(2)} = Σ_{(i,j,k)} vi dxj ∧ dxk, which is a 2-form. (9.47)

The sum is over all three triples (i, j, k) which are cyclic permutations of (1, 2, 3). The maps v ↦ v^{(1)} and v ↦ v^{(2)} are isomorphisms of R³ with Λ¹(R³) and Λ²(R³) respectively.

The Hodge star operator ∗ : Λ(R³) → Λ(R³) is given by

∗a = a dx1 ∧ dx2 ∧ dx3 for a ∈ R, ∗v^{(1)} = v^{(2)}, ∗∗ = Identity. (9.48)

In the physics literature the coordinate vector v is referred to as a "polar" vector or "axial" vector according to how its coordinates transform under the inversion (x1, x2, x3) ↦ (−x1, −x2, −x3). Under this inversion it's clear that v^{(1)} ↦ −v^{(1)} while v^{(2)} ↦ v^{(2)}. To say that v is a polar vector means that under rotations the triple (v1, v2, v3) transforms like the 1-form v^{(1)}. To say that v is an axial vector means that the triple (v1, v2, v3) transforms like the 2-form v^{(2)}. Some would say that axial vectors should have been identified as 2-forms in the first place. Excuse: "polar" and "axial" were invented long before 2-forms.

Now suppose that each component vj is a C¹ function on R³. The usual definitions of curl and div are

(curl v(x))_i = ∂v_k/∂x_j − ∂v_j/∂x_k, (i, j, k) cyclic (9.49)

div v(x) = Σ_{i=1}³ ∂v_i(x)/∂x_i (9.50)

All three operations, grad, curl, div, are special cases of the exterior derivative d if one simply identifies a vector field v with a 1-form or 2-form correctly. Thus:

(∇φ)^{(1)} = dφ (9.51)

(curl v)^{(2)} = d v^{(1)} (9.52)

∗(div v) = d v^{(2)} (9.53)


Of course with the help of the Hodge ∗ operator we can also write

(curl v)^{(1)} = ∗d v^{(1)}, and (9.54)

div v = ∗d v^{(2)} (9.55)

{ Note: If I had defined a^{(3)} = ∗a then (9.55) would have read (div v)^{(3)} = d v^{(2)}. Cute. }
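The identity d∘d = 0 translates, through these dictionaries, into the familiar facts curl grad φ = 0 and div curl v = 0. A finite-difference spot-check; the sample fields, evaluation point, and step size are ad hoc choices.

```python
# Finite-difference spot-check of curl(grad phi) = 0 and div(curl v) = 0,
# the vector-calculus form of d(d(.)) = 0 (a sketch; fields are ad hoc).
import math

h = 1e-4

def phi(x, y, z):
    return math.sin(x * y) + z * z * x

def v(x, y, z):
    # a sample (hypothetical) vector field
    return (y * z, math.cos(x), x * y + z)

def partial(f, i, p):
    q1 = list(p)
    q1[i] += h
    q2 = list(p)
    q2[i] -= h
    return (f(*q1) - f(*q2)) / (2 * h)

def grad(f, p):
    return tuple(partial(f, i, p) for i in range(3))

def curl(F, p):
    def comp(j):
        return lambda *q: F(*q)[j]
    return (partial(comp(2), 1, p) - partial(comp(1), 2, p),
            partial(comp(0), 2, p) - partial(comp(2), 0, p),
            partial(comp(1), 0, p) - partial(comp(0), 1, p))

def div(F, p):
    return sum(partial(lambda *q, j=j: F(*q)[j], j, p) for j in range(3))

p = (0.3, -1.1, 0.7)
assert all(abs(c) < 1e-5 for c in curl(lambda *q: grad(phi, q), p))
assert abs(div(lambda *q: curl(v, q), p)) < 1e-5
print("curl grad = 0 and div curl = 0, numerically")
```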

9.5 Hermite polynomials

Definition 9.19 (Hermite polynomials). The entire function z ↦ e^{zx − z²/2} can be expanded in a power series in z with coefficients that depend on x, thus:

e^{zx − z²/2} = Σ_{n=0}^∞ Hn(x) zⁿ/n!. (9.56)

Since the coefficient of zⁿ/n! in the factor e^{zx} is xⁿ, the coefficient of zⁿ/n! in (9.56) is a polynomial in x of degree at most n. Moreover the highest order term in Hn(x) is exactly xⁿ and H0(x) = 1. Since the nth derivative of (9.56) with respect to z at z = 0 is exactly Hn(x), Cauchy's integral formula gives

Hn(x) = (n!/(2πi)) ∫_{|z|=1} z^{−(n+1)} e^{xz − z²/2} dz (9.57)

The polynomials Hn, n = 0, 1, 2, . . . are the Hermite polynomials.

Example 9.20 Comparing the coefficients of zⁿ in (9.56) for n = 0, 1, 2 one sees immediately that

H0(x) = 1, H1(x) = x, H2(x) = x2 − 1.
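The polynomials are convenient to compute from the three-term recursion Hn = xH_{n−1} − (n−1)H_{n−2} (equation (9.60) of Lemma 9.22 below). A sketch that reproduces Example 9.20 and spot-checks the orthogonality relation (9.58) by quadrature; the grid limits and tolerances are ad hoc choices.

```python
# Hermite polynomials via the recursion of (9.60), plus a quadrature
# spot-check of the orthogonality relation (9.58) (a sketch).
import math

def hermite(n, x):
    # H_k = x H_{k-1} - (k-1) H_{k-2}, with H_0 = 1, H_1 = x
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(2, n + 1):
        h0, h1 = h1, x * h1 - (k - 1) * h0
    return h1

# Example 9.20: H_2(x) = x^2 - 1
assert abs(hermite(2, 0.7) - (0.7 ** 2 - 1)) < 1e-12

def gauss_inner(m, n, lim=10.0, steps=20001):
    # integral of H_m H_n against gamma(dx) = (2 pi)^{-1/2} e^{-x^2/2} dx
    h = 2 * lim / (steps - 1)
    total = 0.0
    for i in range(steps):
        x = -lim + i * h
        w = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
        total += hermite(m, x) * hermite(n, x) * w * h
    return total

assert abs(gauss_inner(2, 3)) < 1e-8                        # orthogonality
assert abs(gauss_inner(3, 3) - math.factorial(3)) < 1e-6    # norm^2 = n!
print("H_0 .. H_3 check out against (9.58)")
```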

Lemma 9.21 Let γ(dx) = (1/√(2π)) e^{−x²/2} dx. Then

∫_{−∞}^{∞} Hn(x)Hk(x) γ(dx) = n! δ_{kn} (9.58)

Proof. From the identity

(zx − z²/2) + (ζx − ζ²/2) − x²/2 = −(1/2)(x − (z + ζ))² + zζ

it follows that

∫_R e^{xz − z²/2} e^{xζ − ζ²/2} γ(dx) = e^{zζ}.

Multiply by (n! k!/(2πi)²) z^{−(n+1)} ζ^{−(k+1)} and integrate over the product of unit circles to find

∫_R Hn(x)Hk(x) γ(dx) = (n! k!/(2πi)²) ∫_{|z|=1} ∫_{|ζ|=1} z^{−(n+1)} ζ^{−(k+1)} e^{zζ} dζ dz
                     = (n!/(2πi)) ∫_{|z|=1} z^{−(n+1)} z^k dz
                     = n! δ_{kn}

Lemma 9.22 (Identities)

H′n(x) = nH_{n−1}(x) (9.59)

Hn(x) = xH_{n−1}(x) − (n − 1)H_{n−2}(x) (9.60)

−H″n(x) + xH′n(x) = nHn(x) (9.61)

Proof. Differentiate (9.57) to arrive at (9.59). Differentiate (9.56) with respect to z to find

(x − z) e^{xz − z²/2} = Σ_{n=1}^∞ Hn(x) z^{n−1}/(n − 1)! (9.62)

and therefore

(x − z) Σ_{n=0}^∞ Hn(x) zⁿ/n! = Σ_{n=1}^∞ Hn(x) z^{n−1}/(n − 1)!. (9.63)

Comparing the coefficients of z^{n−1} on both sides of this identity we find

xH_{n−1}(x)/(n − 1)! − H_{n−2}(x)/(n − 2)! = Hn(x)/(n − 1)!.

Multiply by (n − 1)! to arrive at (9.60). Differentiate (9.59) once more and use (9.60) and (9.59) again to deduce

H″n = n{(n − 1)H_{n−2}} = n{xH_{n−1} − Hn} = xH′n − nHn,

which is (9.61).
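The identities (9.59) and (9.61) can be spot-checked with finite differences; a sketch where the sample n, the point x, and the step sizes are ad hoc choices.

```python
# Finite-difference spot-check of (9.59) and (9.61) (a sketch).
def hermite(n, x):
    # recursion (9.60): H_k = x H_{k-1} - (k-1) H_{k-2}
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(2, n + 1):
        h0, h1 = h1, x * h1 - (k - 1) * h0
    return h1

def d(f, x, h=1e-5):
    # central difference for f'
    return (f(x + h) - f(x - h)) / (2 * h)

n, x = 5, 0.9
H = lambda t: hermite(n, t)

# (9.59): H_n' = n H_{n-1}
assert abs(d(H, x) - n * hermite(n - 1, x)) < 1e-6

# (9.61): -H_n'' + x H_n' = n H_n
second = (H(x + 1e-4) - 2 * H(x) + H(x - 1e-4)) / 1e-8
assert abs(-second + x * d(H, x) - n * H(x)) < 1e-3
print("(9.59) and (9.61) hold numerically for n = 5")
```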


Corollary 9.23 (Conversion to frequency ω) Let ω > 0. Define

γω(dq) = (ω/π)^{1/2} e^{−ωq²} dq and ψn(q) = (1/√(n!)) Hn(√(2ω) q) (9.64)

Then

a) {ψn}_{n=0}^∞ is an orthonormal basis of L²(R, γω) (9.65)

b) (−(1/2) d²/dq² + ωq d/dq) ψn = (nω) ψn, n = 0, 1, . . . (9.66)

c) Spectrum(−(1/2) d²/dq² + ωq d/dq) = {nω : n = 0, 1, 2, . . . } (9.67)

d) ((−(1/2) d²/dq² + ωq d/dq) f, g)_{L²(γω)} = (1/2) ∫_R f′(q) g′(q) γω(dq) (9.68)

e) ψ′n(q) = √(2ω) √n ψ_{n−1}(q) (9.69)

f) √(2ω) q ψ_{n−1}(q) = √n ψn(q) + √(n − 1) ψ_{n−2}(q) (9.70)

g) ∫_R q² γω(dq) = 1/(2ω) (9.71)

Proof. Put x = √(2ω) q in (9.58) to deduce orthonormality of the ψn. Do the same to derive b) from (9.61), to derive e) from (9.59), and to derive f) from (9.60). Take completeness of the ψn as an exercise. Item c) follows from a) and b). Item d) is a straightforward integration by parts. Item g) is a simple Gaussian integral, but can also be deduced from Item a) by taking n = 1, since ψ1(q) = √(2ω) q.

9.6 Special relativity

Ran out of time. If you're concerned about the clock (≡ twin) paradox see Alfred Schild [52], 1959. There is no paradox.

9.7 Other interesting topics, not yet experimentally confirmed

9.7.1 The Higgs particle for the standard model

The conformal group of Minkowski space is a fifteen dimensional Lie group which contains the ten dimensional Poincaré group as a subgroup. Three of the dimensions of the Poincaré group correspond to changing to a coordinate system which is moving with constant velocity with respect to yours. Three of the dimensions of the conformal group correspond to changing to a coordinate system which is moving with constant acceleration with respect to yours. If a physical theory is invariant under the conformal group then there should be no difference between a system which is stationary and a system which is moving with constant acceleration. So how can Newton's equation F = ma have a meaning if a = 0 is physically equivalent to a ≠ 0? It can't, unless m = 0. Now the Yang-Mills equation is invariant under the conformal group, as are the various equations for matter fields associated with it. Consequently a theory based on these equations must include only particles of mass zero. This is problematical because there are particles of strictly positive mass. In a quantum theory, invariance under the conformal group, G let's call it, means that there is a unitary representation of G on the Hilbert state space of the system which commutes with some of the observables, including especially the Hamiltonian. But these unitary operators do not leave every vector invariant. Therefore even though the overall theory is conformally invariant, there may be subspaces which are not invariant under the unitary representation of G but are nevertheless invariant under the Hamiltonian. In that case one could conceivably restrict one's attention to such a subspace and hope that within this subspace the conformal invariance of the overall theory is sufficiently undermined so as not to force all masses to be zero.

The standard model includes a mechanism for implementing this idea. One hypothesizes that there exists a particle in the theory, the so-called Higgs particle, which has a non-unique ground state. Such a ground state need not be invariant under the conformal group, which will just take the ground state into a different ground state. The subspace of the Hilbert state space "generated" by this ground state will be a non-conformally invariant subspace of the kind we would like. One says that the conformal symmetry is broken in this subspace. The theory shows that the Higgs particle is quite a heavy particle and therefore not easy to produce. It is widely hoped that the Large Hadron Collider (LHC), which recently went into operation in Switzerland, will find it, thereby lending tremendous support to the standard model. As of the recent summer solstice at the latitude of Ithaca, the Higgs particle has not yet been found.


9.7.2 Supersymmetry

9.7.3 Conformal field theory

9.7.4 String theory

9.8 How to make your own theory

Quantum mechanics entails a vast conceptual revision of our understanding of matter, an understanding that we learned from classical mechanics. Many open minded citizens find quantum mechanics hard to accept. If you are among them, there is no reason why you shouldn't make your own theory. To help you with this, I've prepared a short list of observations and experiments whose outcomes must fit into your theory.

1. Hydrogen spectrum.
2. Blackbody radiation formula.
3. Dark lines from stars.
4. Zeeman effects. (atom in a magnetic field)
   a) normal Zeeman effect (one line splits into three lines)
   b) anomalous Zeeman effect (one line splits into five or more lines)
5. Stark effect. (atom in an electric field)
6. Stern-Gerlach experiment. (magnetic moment of a neutral atom comes in discrete multiples of a unit.)
7. Specific heat of metals (Pauli)

10 Yet More History

10.1 Timeline for electricity vs magnetism.

1731: Lightning struck a box of knives and forks and magnetized them.

1805: Hachette and Desormes attempted to determine whether an insulated voltaic pile, freely suspended, is oriented by terrestrial magnetism. It is not.

1820: Oersted, Biot, Savart, Ampere. The sequence of missteps taken by Oersted and his predecessors, prior to Oersted's discovery of what actually works, is very illuminating. Don't let anyone tell you that if you, the intelligent reader, will just sit and think about this, you will arrive at the theory that works. In fact Oersted tried at first to produce a magnetic field from a battery without closing the circuit!! It was years before he thought of closing the circuit and producing a magnetic field from a current rather than just from a battery by itself. And even then he oriented the compass incorrectly for seeing an effect. But he had company in this kind of misstep. Hachette and Desormes tried to make a compass needle by just suspending a battery from its middle by a thread (1805), hoping that it would orient to the earth's axis like a compass needle. It didn't. So much for the production of magnetic fields from electricity - till 1820. You can read more detail about these travails in Whittaker [W1], pages 80-85.

1824: FARADAY: First attempt to produce a current from a magnetic field. Failure. [Do the experiment.]

For Faraday's reasoning for doing this experiment you can read [W1] p170, line 3* to p171, line 6*. He already has the idea of "lines of magnetic force".

1831: Faraday's second attempt. Success. [Do the experiment.]

1861: MAXWELL (1831-1879) puts it all together.

1887: Hertz (1857-1894) settles it. See [W1] p318-325, but skip the mathematics.

Views and gossip.

1. Maxwell intended to find a mechanical model explaining all the previous discoveries. See [W1] p246 and footnote number 2.
2. Maxwell's mechanical view. See [W1] p250, 251.
3. Maxwell's view of the velocity of light. See [W1] p253, last paragraph.
4. Maxwell's disdainful view of Ampere's "too mathematical" method. See [44, Vol. 2], page 175 bottom to page 176 top.
5. Maxwell's theory was accepted with reluctance. See [W1] p254, last paragraph.
6. Lord Kelvin never accepted Maxwell's theory. See [W1] p266 "This was not .." to p267, line 4*.
7. The Michelson-Morley experiment (1887). The aether theory was abandoned after this experiment and along with it any mechanical significance of the electric and magnetic fields (such as "stress in the aether"). You have a choice of movies illustrating the Michelson-Morley experiment. Just enter Michelson-Morley into Google and take your pick. Wikipedia has, as usual, a good exposition of this experiment.

{More cute facts about the aftermath of Maxwell's discovery are commented out next.}


10.2 Timeline for radiation measurements

If you really want to know who did what and when, here is a chart. But you can skip it if this kind of history doesn't excite you. The thing to take away from this list is that a large number of physicists evolved a large collection of spectroscopic facts over a long period of time, and that's not even counting Galileo and Newton.

1752: Thomas Melvill noticed that the yellow light produced by throwing some table salt into a flame was of a "definite degree of refrangibility". That is, shining the light through a prism turned it a definite amount.

1802: Wollaston noticed that the spectrum of sunlight was crossed by seven dark lines.

1814: Fraunhofer also noticed dark lines in the spectrum of sunlight. In 1821 he measured their wavelength with a diffraction grating. He found that the sharpest dark line of this (yellow) sunlight had a wavelength of 5887.7 Angstroms. [One Angstrom = 10⁻⁸ cm.] He also found that many flames produced a bright line of this same wavelength. See Figure 5 for a sample of dark lines.

1826: W.H. Fox Talbot placed a slit between a flame and the prism and found that he could distinguish between different elements thrown into the flame by the different line structures of different elements. In this way he could distinguish between strontium and lithium, both of which give red flames.

1827: William Miller showed that light transmitted through a gas produced dark lines exactly where the gas would have produced bright lines if it had been heated. Conclusion: A gas that emits light at wavelength λ also absorbs light of wavelength λ passing through it. This explains the dark lines found in the sun's spectrum: gasses in the sun's outer layers absorb light produced deeper in by the hot sun. In this way one could now begin to analyze the chemical composition of stars.

1853: Angstrom showed that a gas radiates and absorbs light of the same wavelength.

1858: Balfour Stewart introduced the notion of equilibrium of radiation of each wavelength. (Blackbody radiation).

1885: Balmer found an empirical formula for the wavelengths of light emitted by heated hydrogen. Namely

λ⁻¹ = constant · (1/4 − 1/m²), m = 3, 4, 5, . . . (10.1)
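With the modern Rydberg constant (about 1.0968 × 10⁷ m⁻¹, an assumed value not given in the text) as the "constant", the m = 3 case of (10.1) lands near the red hydrogen-alpha line at roughly 6563 Angstroms. A quick sketch:

```python
# Balmer's formula (10.1) with the Rydberg constant as the "constant"
# (an assumed modern value): the m = 3 line should come out near the
# red hydrogen-alpha line, ~6563 Angstroms.
R = 1.0968e7          # m^-1 (assumed value)

def balmer_wavelength_angstrom(m):
    inv_lam = R * (1.0 / 4.0 - 1.0 / m ** 2)   # (10.1), in m^-1
    return 1e10 / inv_lam                       # meters -> Angstroms

w3 = balmer_wavelength_angstrom(3)
assert 6550 < w3 < 6580
print("m = 3 line: %.0f Angstroms" % w3)
```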


1890: Rydberg gave a general formula of a similar form for many other chemical elements.

1893: Wien derived a formula for the energy density of blackbody radiation from "fundamental" physical principles. But it agreed with experiment only for high frequencies. In particular it didn't agree with the curves in Figure 7.

1900: Lord Rayleigh derived another formula for the energy density of blackbody radiation from other "fundamental" physical principles. But it, too, disagreed with experiment and in particular with the curves in Figure 7. Moreover it was not even in L¹(0,∞)!!! (It falls off too slowly for large frequencies.)

1896: Zeeman discovered that the D line of sodium breaks into three lines when a magnetic field is imposed on the source of the sodium light. Lorentz gave a theoretical derivation of this effect. But Lorentz' theory couldn't explain the later discovery that the bright lines of other chemical elements break into 5, 7 or even 11 lines when a magnetic field is imposed on the source. This breakup is called the Zeeman effect. (Three lines is called the normal Zeeman effect. Five or more lines is called the anomalous Zeeman effect.)

1913: Johannes Stark discovered the electric analogue of the Zeeman effect: lines split when the element is placed in an electric field. This is now called the Stark effect.

Bottom line: To understand radiation requires explaining an awful lot of very different kinds of observations.

10.3 Planck, Einstein and Bohr on radiation. Timeline.

This needs revision and maybe different placement.

October 19, 1900: Planck (1858-1947), at a meeting of the Berlin Academy of Science, gave a formula for the energy density of black body radiation. Namely,

E(T, dν) = (8πh/c³) · ν³ dν / (e^{hν/kT} − 1)    Planck's formula (10.2)

where k is Boltzmann's constant (already well known at that time), and h is some other constant (which we call Planck's constant nowadays).


Ref. [W2, p. 78 ff.]

But the derivation of (10.2) that Planck presented was itself ad hoc: He just assumed that the entropy of the radiation field, as a function of its energy, satisfied a particular differential equation (chosen largely for simplicity) from which (10.2) followed. Although (10.2) agreed well with measurements it was still just an empirical formula.

December 14, 1900: Planck gave another derivation of (10.2). This time he based it on the physical assumption that the energy of a simple harmonic Hertzian oscillator of frequency ν is proportional to ν:

E = hν    Planck's hypothesis (10.3)

1905: Einstein used Planck's hypothesis (10.3) to explain why light has to be of a high enough frequency before it will knock electrons out of a metal. This is the photoelectric effect.

1911 (May): Rutherford determined the structure of an atom: small nucleus (10⁻¹³ cm) compared to the overall radius (10⁻⁸ cm).

1912: Millikan (the oil drop experiment) showed that charge comes in multiples of the charge on an electron.

1913: Niels Bohr (1885-1962). As we have seen, Planck's assumption amounts to the assertion that for a harmonic oscillator of frequency ν the only allowed amplitudes are those for which the energy is hν. Now Bohr applied this idea to atoms. He made the assumption that not all of the usual classical orbits (remember the elliptical orbits of Newton?) exist in an atom.

Bohr's hypothesis: In an atom only those closed orbits exist for which

∫_orbit α = nh, (10.4)

where α is the natural one form on T∗(R³) defined in (2.33), h is Planck's constant and n is an integer. (10.4) is called the quantization condition for Bohr orbits. (Actually Bohr considered only circular orbits. But Bohr's orbit condition was generalized to elliptical orbits by Sommerfeld (1915) and W. Wilson (1915) independently.)


Result: Bohr’s semi-geometric hypothesis (10.4) limits the allowed en-ergies of elliptical (classical) orbits to a computable sequence, {En}. Nowapply Planck’s hypothesis (10.3) to the energy differences {En−Em} to finda double sequence of allowed radiation frequencies,

En − Em = hνn,m (10.5)

These frequencies agree exactly with Balmer’s empirical formula.READ [W2] p. 109 ”The culmination ..” to p. 110.

1914-1918: World War I.

Other successes of Bohr's theory (after some modifications):

1924: Pauli (Exclusion principle). No two electrons can be in the same state at the same time. [We will elaborate on this later.]

1925: Pauli (electron spin) shows that an electron must be regarded as a little magnet in order to account for the Zeeman effect properly. This invalidates Sommerfeld's earlier explanation of the Zeeman effect. So in order to explain the Zeeman effect it is necessary to regard an electron as spinning around some internal axis. [We will elaborate on this later.]

Bose [W2 p. 219]

1926: Fermi [W2 p. 224]


10.4 List of elementary particles

Figure 13: Elementary particles


References

[1] Y. Aharonov and D. Bohm, Significance of electromagnetic potentials in the quantum theory, Phys. Rev. (2) 115 (1959), 485–491, {This is the famous paper describing the Aharonov-Bohm effect. See [11] for experimental verification.}. MR 0110458 (22 #1336)

[2] Yakir Aharonov and Daniel Rohrlich, Quantum paradoxes, Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, 2005, Quantum theory for the perplexed. MR 2327477 (2008i:81002) {This book addresses fundamental conceptual issues in quantum mechanics.}

[3] Huzihiro Araki, Mathematical theory of quantum fields, International Series of Monographs on Physics, vol. 101, Oxford University Press, Oxford, 2009, Translated from the 1993 Japanese original by Ursula Carow-Watamura, Reprint of the 1999 edition [MR1799198]. MR 2542202

[4] V. I. Arnold, Mathematical methods of classical mechanics, Springer-Verlag, New York, 1978, Translated from the Russian by K. Vogtmannand A. Weinstein, Graduate Texts in Mathematics, 60. MR 0690288 (57#14033b)

[5] Ph. Blanchard and A. Jadczyk, Theory of events, Nonlinear, deformedand irreversible quantum systems (Clausthal, 1994), World Sci. Publ.,River Edge, NJ, 1995, pp. 265–272. MR 1457551 (98f:81025)

[6] Ph. Blanchard and A. Jadczyk (eds.), Quantum future, Lecture Notes inPhysics, vol. 517, Berlin, Springer-Verlag, 1999, From Volta and Comoto the present and beyond. MR 1727766 (2000g:81010)

[7] Ph. Blanchard, L. Jakobczyk, and R. Olkiewicz, Measures of entangle-ment based on decoherence, J. Phys. A 34 (2001), no. 41, 8501–8516.MR 1876610 (2002k:81024)

[8] Douglas Botting, Humboldt and the cosmos, Prestel, Munich and NewYork, 1994.

[9] J. Chadwick, The existence of a neutron., Proc. Roy. Soc. London. 136(1932), 692–708, { +Q41 L84, (Refs. are from Schweber’s book “QEDand the men who made it.” ) }.


[10] , Possible existence of a neutron, Nature 129 (1932), 312–313, { Q1 N282 }.

[11] R. G. Chambers, Shift of an electron interference pattern by enclosed magnetic flux, Phys. Rev. Lett. 5 (1960), 3–5, {This is the paper that verified experimentally the Aharonov-Bohm effect. It was cited 150 times.}.

[12] Claude Chevalley, The algebraic theory of spinors and Clifford algebras, Springer-Verlag, Berlin, 1997, Collected works. Vol. 2, Edited and with a foreword by Pierre Cartier and Catherine Chevalley, With a postface by J.-P. Bourguignon. MR 1636473 (99f:01028)

[13] J. M. Cook, The mathematics of second quantization, Trans. Amer. Math. Soc. 74 (1953), 222–245. MR 0053784 (14,825h)

[14] Richard H. Cushman and Larry M. Bates, Global aspects of classical integrable systems, Birkhäuser Verlag, Basel, 1997. MR 1438060 (98a:58083)

[15] R. W. R. Darling, Differential forms and connections, Cambridge University Press, Cambridge, 1994. MR 1312606 (95j:53038)

[16] P. A. M. Dirac, The quantum theory of the electron, Proc. R. Soc. of London A117 (1928), 610–624, Received January 2, 1928. This is the great paper that introduces the Dirac equation.

[17] , A theory of electrons and protons, Proc. R. Soc. of London A126 (1929), 360–366, Received December 6, 1929. This is the great paper that introduces hole theory. Dirac believed, in this paper, that the negative energy particles were protons.

[18] , Quantized singularities in the electromagnetic field, Proc. of the London Math. Soc. A133 (1931), 60–72, This is the great paper on magnetic monopoles. [A search for “magnetic monopoles” anywhere brings up 857 references on MathSciNet, 4/19/06. This beats the bibliographies of Stevens and of Goldhaber and Trower.]

[19] Paul Federbush, Quantum field theory in ninety minutes, Bull. Amer. Math. Soc. (N.S.) 17 (1987), no. 1, 93–103. MR 888881 (88k:81106)


[20] V. A. Fock, Konfigurationsraum und zweite Quantelung, Zeitschrift für Physik 75 (1932), 622–647.

[21] Gerald B. Folland, Quantum field theory, Mathematical Surveys and Monographs, vol. 149, American Mathematical Society, Providence, RI, 2008, A tourist guide for mathematicians. MR 2436991 (2010a:81001)

[22] A. S. Goldhaber and W. P. Trower, Resource letter MM-1: Magnetic monopoles, Am. J. Phys. 58 (1990), 429–439.

[23] Herbert Goldstein, Classical mechanics, second ed., Addison-Wesley Publishing Co., Reading, Mass., 1980, Addison-Wesley Series in Physics. MR 575343 (81j:70001)

[24] David J. Griffiths, Introduction to elementary particles, second ed., Wiley-VCH, Weinheim, c2008.

[25] Leonard Gross, On the formula of Mathews and Salam, J. Functional Analysis 25 (1977), no. 2, 162–209. MR 0459403 (56 #17596)

[26] , Thermodynamics, statistical mechanics and random fields, Tenth Saint Flour Probability Summer School—1980 (Saint Flour, 1980), Lecture Notes in Math., vol. 929, Springer, Berlin, 1982, pp. 101–204. MR 665596 (84m:82005)

[27] Victor Guillemin, The story of quantum mechanics, first ed., Charles Scribner’s Sons, New York, 1968, {1. This fun book to read was written by the physicist Victor Guillemin, father of our Victor Guillemin.} {2. Contains no mathematics, but quotes Sir James Jeans (1930) (page 232): “... the Great Architect of the Universe ... is a pure mathematician.”}.

[28] Brian Hall, An introduction to quantum theory for mathematicians, in preparation, 2011.

[29] Brian C. Hall, Harmonic analysis with respect to heat kernel measure, Bull. Amer. Math. Soc. (N.S.) 38 (2001), no. 1, 43–78 (electronic). MR 1803077 (2002c:22015)

[30] , Lie groups, Lie algebras, and representations, Graduate Texts in Mathematics, vol. 222, Springer-Verlag, New York, 2003, An elementary introduction. MR 1997306 (2004i:22001)


[31] Werner Heisenberg, Über den Bau der Atomkerne, Zeits. Phys. 77 (1932), 1–11, {This is the paper that introduces isotopic spin.} [Reprinted in Heisenberg, Collected Works (1989)].

[32] , Über den Bau der Atomkerne, Zeits. Phys. 78 (1933), 156–164, [Reprinted in Heisenberg, Collected Works (1989)].

[33] , Über den Bau der Atomkerne, Zeits. Phys. 80 (1933), 587–596, {This is the paper that introduces isotopic spin?} [Reprinted in Heisenberg, Collected Works (1989)] Check these titles.

[34] , Gesammelte Werke. Abteilung A. Teil II, Springer-Verlag, Berlin, 1989, Wissenschaftliche Originalarbeiten [Original scientific papers], Edited and with a preface by W. Blum, H.-P. Dürr and H. Rechenberg. MR 1041794 (91h:01096)

[35] Gerald Holton and Duane H. D. Roller, Foundations of modern physical science, Addison-Wesley Pub. Co., Reading, Mass., 1958.

[36] John Hamal Hubbard and Barbara Burke Hubbard, Vector calculus, linear algebra, and differential forms, Prentice Hall Inc., Upper Saddle River, NJ, 1999, A unified approach. MR 1657732 (99k:00002)

[37] J. D. Jackson, Classical electrodynamics, third ed., 1999.

[38] Josef M. Jauch, Foundations of quantum mechanics, Addison-Wesley Publishing Co., Reading, Mass.-London-Don Mills, Ont., 1968. MR 0218062 (36 #1151)

[39] Jorge V. José and Eugene J. Saletan, Classical dynamics, Cambridge University Press, Cambridge, 1998, A contemporary approach. MR 1640663 (99g:70001)

[40] Shoshichi Kobayashi and Katsumi Nomizu, Foundations of differential geometry. Vol. I, Wiley Classics Library, John Wiley & Sons Inc., New York, 1996, Reprint of the 1963 original, A Wiley-Interscience Publication. MR 1393940 (97c:53001a)

[41] George W. Mackey, The mathematical foundations of quantum mechanics: A lecture-note volume, W. A. Benjamin, Inc., New York-Amsterdam, 1963. MR 0155567 (27 #5501)


[42] K. B. Marathe and G. Martucci, The mathematical foundations of gauge theories, Studies in Mathematical Physics, vol. 5, North-Holland Publishing Co., Amsterdam, 1992. MR 1173210 (93k:58002)

[43] Jerrold E. Marsden and Tudor S. Ratiu, Introduction to mechanics and symmetry, Texts in Applied Mathematics, vol. 17, Springer-Verlag, New York, 1994, A basic exposition of classical mechanical systems. MR 1304682 (95i:58073)

[44] James Clerk Maxwell, A treatise on electricity and magnetism, vols. 1 and 2 (Dover has its recent version), first edition 1873, second edition 1891.

[45] Jagdish Mehra and Helmut Rechenberg, The historical development of quantum theory, Springer-Verlag, New York, 1982–2001, About 2000 pages in 6 volumes. MR 676476 (85h:01025a)

[46] Mikio Nakahara, Geometry, topology and physics, second ed., Graduate Student Series in Physics, Institute of Physics, Bristol, 2003. MR 2001829 (2004e:58001)

[47] Edward Nelson, Dynamical theories of Brownian motion, Princeton University Press, Princeton, N.J., 1967. MR 0214150 (35 #5001)

[48] Michael Reed and Barry Simon, Methods of modern mathematical physics. II. Fourier analysis, self-adjointness, Academic Press [Harcourt Brace Jovanovich Publishers], New York, 1975. MR 0493420 (58 #12429b)

[49] Duane Roller and Duane H. D. Roller, The development of the concept of electric charge: Electricity from the Greeks to Coulomb, Harvard University Press, Cambridge, 1954, {QC507 .R74, Annex and Mann Libs.}.

[50] Duane H. D. Roller, The De Magnete of William Gilbert, Menno Hertzberger, Amsterdam, 1959.

[51] Leonard I. Schiff, Quantum mechanics, McGraw-Hill Book Co., New York, 1949.


[52] Alfred Schild, The clock paradox in relativity theory, Amer. Math. Monthly 66 (1959), 1–18. MR 0136417 (24 #B2455)

[53] Julian Schwinger, Selected papers on quantum electrodynamics, Dover Publications, New York, 1958.

[54] I. E. Segal, Postulates for general quantum mechanics, Ann. of Math. (2) 48 (1947), 930–948. MR 0022652 (9,241b)

[55] S. Sternberg, Group theory and physics, Cambridge University Press, Cambridge, 1994. MR 1287387 (95i:20001)

[56] R. F. Streater and A. S. Wightman, PCT, spin and statistics, and all that, Princeton Landmarks in Physics, Princeton University Press, Princeton, NJ, 2000, Corrected third printing of the 1978 edition. MR 1884336 (2003f:81154)

[57] F. Strocchi, An introduction to the mathematical structure of quantum mechanics, second ed., Advanced Series in Mathematical Physics, vol. 28, World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, 2008, A short course for mathematicians. MR 2484367 (2010e:81001)

[58] Leon A. Takhtajan, Quantum mechanics for mathematicians, Graduate Studies in Mathematics, vol. 95, American Mathematical Society, Providence, RI, 2008. MR 2433906 (2010c:81003)

[59] R. Ticciati, Quantum field theory for mathematicians, Encyclopedia of Mathematics and its Applications, vol. 72, Cambridge University Press, Cambridge, 1999. MR 1699269 (2000h:81002)

[60] John von Neumann, Mathematical foundations of quantum mechanics, Princeton Landmarks in Mathematics, Princeton University Press, Princeton, NJ, 1996, Translated from the German and with a preface by Robert T. Beyer, Twelfth printing, Princeton Paperbacks. MR 1435976 (98b:81006)

[61] B. L. (Bartel Leendert) van der Waerden (ed.), Sources of quantum mechanics, Dover Publications, New York, 1968 (c1967), Edited with a historical introduction by B. L. van der Waerden.


[62] Steven Weinberg, The quantum theory of fields. Vol. I, Cambridge University Press, Cambridge, 2005, Foundations. MR 2148466

[63] , The quantum theory of fields. Vol. II, Cambridge University Press, Cambridge, 2005, Modern applications. MR 2148467

[64] , The quantum theory of fields. Vol. III, Cambridge University Press, Cambridge, 2005, Supersymmetry. MR 2148468

[65] Hermann Weyl, The theory of groups and quantum mechanics, Dover Publications, Inc., 1st ed. 1928, 2nd ed. 1930, translated 1931. MR 0450450 (56 #8744)

[66] , What is the title?, Z. Phys. 56 (1929), 330–??, {This is the paper in which he introduces E and M into general relativity via connections. This ref. is from Dirac 1931 and also from Wu and Yang (1975). [List of publications by Hermann Weyl, Hermann Weyl, 1885–1985 (Eidgenössische Tech. Hochschule, Zürich, 1986), 109–119.]}.

[67] Edmund Whittaker, A history of the theories of aether and electricity. Vol. I: The classical theories, Harper & Brothers, New York, 1960. MR 0154779 (27 #4724)

[68] , A history of the theories of aether and electricity. Vol. II: The modern theories, 1900–1926, Harper & Brothers, New York, 1960. MR 0154780 (27 #4725)

[69] Eugene Wigner, On the consequences of the symmetry of the nuclear Hamiltonian on the spectroscopy of nuclei, Phys. Rev. 51 (1937), 106–119, {This is the paper that introduces isotopic spin.}.

[70] N. M. J. Woodhouse, Geometric quantization, second ed., Oxford Mathematical Monographs, The Clarendon Press, Oxford University Press, New York, 1992, Oxford Science Publications. MR 1183739 (94a:58082)

[71] T. T. Wu and C. N. Yang, Concept of nonintegrable phase factors and global formulation of gauge fields, Phys. Rev. D 12 (1975), 3845–3857, {This is the dictionary paper: Nonintegrable phase factor = path dependent parallel transport. Gauge field = connection. Field strength = curvature, etc.}.


[72] C. N. Yang and R. L. Mills, Conservation of isotopic spin and isotopic gauge invariance, Phys. Rev. 96 (1954), 191–195, {This is THE paper.}.

[73] Eberhard Zeidler, Quantum field theory. I. Basics in mathematics and physics, Springer-Verlag, Berlin, 2006, A bridge between mathematicians and physicists. MR 2257528 (2008b:81002)

[74] , Quantum field theory. II. Quantum electrodynamics, Springer-Verlag, Berlin, 2009, A bridge between mathematicians and physicists. MR 2456465 (2010a:81002)
