Author: Sternberg S.   Loomis L.H.  

Tags: mathematics   computational mathematics   calculus  

ISBN: 0-86720-122-3

Year: 1990

Text
                    LYNN H. LOOMIS and SHLOMO STERNBERG
Department of Mathematics, Harvard University
ADVANCED CALCULUS
REVISED EDITION
JONES AND BARTLETT PUBLISHERS
Boston London


Editorial, Sales, and Customer Service Offices: Jones and Bartlett Publishers, Inc. One Exeter Plaza Boston, MA 02116 Jones and Bartlett Publishers International PO Box 1498 London W6 7RS England Copyright © 1990 by Jones and Bartlett Publishers, Inc. Copyright © 1968 by Addison-Wesley Publishing Company, Inc. All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the copyright owner. Printed in the United States of America. 10 98765432 Library of Congress Cataloging-in-Publication Data Loomis, Lynn H. Advanced calculus / Lynn H. Loomis and Shlomo Sternberg. —Rev. ed. p. cm. Originally published: Reading, Mass. : Addison-Wesley Pub. Co., 1968. ISBN 0-86720-122-3 1. Calculus. I. Sternberg, Shlomo. II. Title. QA303.L87 1990 515-dc20 89-15620 CIP
PREFACE

This book is based on an honors course in advanced calculus that we gave in the 1960's. The foundational material, presented in the unstarred sections of Chapters 1 through 11, was normally covered, but different applications of this basic material were stressed from year to year, and the book therefore contains more material than was covered in any one year. It can accordingly be used (with omissions) as a text for a year's course in advanced calculus, or as a text for a three-semester introduction to analysis.

The prerequisites are a good grounding in the calculus of one variable from a mathematically rigorous point of view, together with some acquaintance with linear algebra. The reader should be familiar with limit and continuity type arguments and have a certain amount of mathematical sophistication. As possible introductory texts, we mention Differential and Integral Calculus by R. Courant, Calculus by T. Apostol, Calculus by M. Spivak, and Pure Mathematics by G. Hardy. The reader should also have some experience with partial derivatives.

In overall plan the book divides roughly into a first half which develops the calculus (principally the differential calculus) in the setting of normed vector spaces, and a second half which deals with the calculus of differentiable manifolds. Vector space calculus is treated in two chapters, the differential calculus in Chapter 3, and the basic theory of ordinary differential equations in Chapter 6. The other early chapters are auxiliary. The first two chapters develop the necessary purely algebraic theory of vector spaces. Chapter 4 presents the material on compactness and completeness needed for the more substantive results of the calculus, and Chapter 5 contains a brief account of the extra structure encountered in scalar product spaces. Chapter 7 is devoted to multilinear (tensor) algebra and is, in the main, a reference chapter for later use.
Chapter 8 deals with the theory of (Riemann) integration on Euclidean spaces and includes (in exercise form) the fundamental facts about the Fourier transform. Chapters 9 and 10 develop the differential and integral calculus on manifolds, while Chapter 11 treats the exterior calculus of E. Cartan. The first eleven chapters form a logical unit, each chapter depending on the results of the preceding chapters. (Of course, many chapters contain material that can be omitted on first reading; this is generally found in starred sections.)
On the other hand, Chapters 12, 13, and the latter parts of Chapters 6 and 11 are independent of each other, and are to be regarded as illustrative applications of the methods developed in the earlier chapters. Presented here are elementary Sturm-Liouville theory and Fourier series, elementary differential geometry, potential theory, and classical mechanics. We usually covered only one or two of these topics in our one-year course.

We have not hesitated to present the same material more than once from different points of view. For example, although we have selected the contraction mapping fixed-point theorem as our basic approach to the implicit-function theorem, we have also outlined a "Newton's method" proof in the text and have sketched still a third proof in the exercises. Similarly, the calculus of variations is encountered twice, once in the context of the differential calculus of an infinite-dimensional vector space and later in the context of classical mechanics. The notion of a submanifold of a vector space is introduced in the early chapters, while the invariant definition of a manifold is given later on.

In the introductory treatment of vector space theory, we are more careful and precise than is customary. In fact, this level of precision of language is not maintained in the later chapters. Our feeling is that in linear algebra, where the concepts are so clear and the axioms so familiar, it is pedagogically sound to illustrate various subtle points, such as distinguishing between spaces that are normally identified, discussing the naturality of various maps, and so on. Later on, when overly precise language would be more cumbersome, the reader should be able to produce for himself a more precise version of any assertions that he finds to be formulated too loosely. Similarly, the proofs in the first few chapters are presented in more formal detail.
Again, the philosophy is that once the student has mastered the notion of what constitutes a formal mathematical proof, it is safe and more convenient to present arguments in the usual mathematical colloquialisms. While the level of formality decreases, the level of mathematical sophistication does not. Thus increasingly abstract and sophisticated mathematical objects are introduced. It has been our experience that Chapter 9 contains the concepts most difficult for students to absorb, especially the notions of the tangent space to a manifold and the Lie derivative of various objects with respect to a vector field.
There are exercises of many different kinds spread throughout the book. Some are in the nature of routine applications. Others ask the reader to fill in or extend various proofs of results presented in the text. Sometimes whole topics, such as the Fourier transform or the residue calculus, are presented in exercise form. Due to the rather abstract nature of the textual material, the student is strongly advised to work out as many of the exercises as he possibly can.

Any enterprise of this nature owes much to many people besides the authors, but we particularly wish to acknowledge the help of L. Ahlfors, A. Gleason, R. Kulkarni, R. Rasala, and G. Mackey and the general influence of the book by Dieudonné. We also wish to thank the staff of Jones and Bartlett for their invaluable help in preparing this revised edition.

Cambridge, Massachusetts
1968, 1989
L.H.L.
S.S.
CONTENTS

Chapter 0 Introduction
1 Logic: quantifiers
2 The logical connectives
3 Negations of quantifiers
4 Sets
5 Restricted variables
6 Ordered pairs and relations
7 Functions and mappings
8 Product sets; index notation
9 Composition
10 Duality
11 The Boolean operations
12 Partitions and equivalence relations

Chapter 1 Vector Spaces
1 Fundamental notions
2 Vector spaces and geometry
3 Product spaces and Hom(V, W)
4 Affine subspaces and quotient spaces
5 Direct sums
6 Bilinearity

Chapter 2 Finite-Dimensional Vector Spaces
1 Bases
2 Dimension
3 The dual space
4 Matrices
5 Trace and determinant
6 Matrix computations
*7 The diagonalization of a quadratic form

Chapter 3 The Differential Calculus
1 Review in ℝ
2 Norms
3 Continuity
4 Equivalent norms
5 Infinitesimals
6 The differential
7 Directional derivatives; the mean-value theorem
8 The differential and product spaces
9 The differential and ℝⁿ
10 Elementary applications
11 The implicit-function theorem
12 Submanifolds and Lagrange multipliers
*13 Functional dependence
*14 Uniform continuity and function-valued mappings
*15 The calculus of variations
*16 The second differential and the classification of critical points
*17 The Taylor formula

Chapter 4 Compactness and Completeness
1 Metric spaces; open and closed sets
*2 Topology
3 Sequential convergence
4 Sequential compactness
5 Compactness and uniformity
6 Equicontinuity
7 Completeness
8 A first look at Banach algebras
9 The contraction mapping fixed-point theorem
10 The integral of a parametrized arc
11 The complex number system
*12 Weak methods

Chapter 5 Scalar Product Spaces
1 Scalar products
2 Orthogonal projection
3 Self-adjoint transformations
4 Orthogonal transformations
5 Compact transformations

Chapter 6 Differential Equations
1 The fundamental theorem
2 Differentiable dependence on parameters
3 The linear equation
4 The nth-order linear equation
5 Solving the inhomogeneous equation
6 The boundary-value problem
7 Fourier series

Chapter 7 Multilinear Functionals
1 Bilinear functionals
2 Multilinear functionals
3 Permutations
4 The sign of a permutation
5 The subspace 𝒜ⁿ of alternating tensors
6 The determinant
7 The exterior algebra
8 Exterior powers of scalar product spaces
9 The star operator

Chapter 8 Integration
1 Introduction
2 Axioms
3 Rectangles and paved sets
4 The minimal theory
5 The minimal theory (continued)
6 Contented sets
7 When is a set contented?
8 Behavior under linear distortions
9 Axioms for integration
10 Integration of contented functions
11 The change of variables formula
12 Successive integration
13 Absolutely integrable functions
14 Problem set: The Fourier transform

Chapter 9 Differentiable Manifolds
1 Atlases
2 Functions, convergence
3 Differentiable manifolds
4 The tangent space
5 Flows and vector fields
6 Lie derivatives
7 Linear differential forms
8 Computations with coordinates
9 Riemann metrics

Chapter 10 The Integral Calculus on Manifolds
1 Compactness
2 Partitions of unity
3 Densities
4 Volume density of a Riemann metric
5 Pullback and Lie derivatives of densities
6 The divergence theorem
7 More complicated domains

Chapter 11 Exterior Calculus
1 Exterior differential forms
2 Oriented manifolds and the integration of exterior differential forms
3 The operator d
4 Stokes' theorem
5 Some illustrations of Stokes' theorem
6 The Lie derivative of a differential form
Appendix I. "Vector analysis"
Appendix II. Elementary differential geometry of surfaces in 𝔼³

Chapter 12 Potential Theory in 𝔼ⁿ
1 Solid angle
2 Green's formulas
3 The maximum principle
4 Green's functions
5 The Poisson integral formula
6 Consequences of the Poisson integral formula
7 Harnack's theorem
8 Subharmonic functions
9 Dirichlet's problem
10 Behavior near the boundary
11 Dirichlet's principle
12 Physical applications
13 Problem set: The calculus of residues

Chapter 13 Classical Mechanics
1 The tangent and cotangent bundles
2 Equations of variation
3 The fundamental linear differential form on T*(M)
4 The fundamental exterior two-form on T*(M)
5 Hamiltonian mechanics
6 The central-force problem
7 The two-body problem
8 Lagrange's equations
9 Variational principles
10 Geodesic coordinates
11 Euler's equations
12 Rigid-body motion
13 Small oscillations
14 Small oscillations (continued)
15 Canonical transformations

Selected References
Notation Index
Index
CHAPTER 0 INTRODUCTION

This preliminary chapter contains a short exposition of the set theory that forms the substratum of mathematical thinking today. It begins with a brief discussion of logic, so that set theory can be discussed with some precision, and continues with a review of the way in which mathematical objects can be defined as sets. The chapter ends with four sections which treat specific set-theoretic topics. It is intended that this material be used mainly for reference. Some of it will be familiar to the reader and some of it will probably be new. We suggest that he read the chapter through "lightly" at first, and then refer back to it for details as needed.

1. LOGIC: QUANTIFIERS

A statement is a sentence which is true or false as it stands. Thus '1 < 2' and '4 + 3 = 5' are, respectively, true and false mathematical statements. Many sentences occurring in mathematics contain variables and are therefore not true or false as they stand, but become statements when the variables are given values. Simple examples are 'x < 4', 'x < y', 'x is an integer', '3x² + y² = 10'. Such sentences will be called statement frames. If P(x) is a frame containing the one variable 'x', then P(5) is the statement obtained by replacing 'x' in P(x) by the numeral '5'. For example, if P(x) is 'x < 4', then P(5) is '5 < 4', P(√2) is '√2 < 4', and so on.

Another way to obtain a statement from the frame P(x) is to assert that P(x) is always true. We do this by prefixing the phrase 'for every x'. Thus, 'for every x, x < 4' is a false statement, and 'for every x, x² − 1 = (x − 1)(x + 1)' is a true statement. This prefixing phrase is called a universal quantifier. Synonymous phrases are 'for each x' and 'for all x', and the symbol customarily used is '(∀x)', which can be read in any of these ways. One frequently presents sentences containing variables as being always true without explicitly writing the universal quantifiers.
For instance, the associative law for the addition of numbers is often written

x + (y + z) = (x + y) + z,

where it is understood that the equation is true for all x, y and z. Thus the actual statement being made is

(∀x)(∀y)(∀z)[x + (y + z) = (x + y) + z].

Finally, we can convert the frame P(x) into a statement by asserting that it is sometimes true, which we do by writing 'there exists an x such that P(x)'. This process is called existential quantification. Synonymous prefixing phrases here are 'there is an x such that', 'for some x', and, symbolically, '(∃x)'.

The statement '(∀x)(x < 4)' still contains the variable 'x', of course, but 'x' is no longer free to be given values, and is now called a bound variable. Roughly speaking, quantified variables are bound and unquantified variables are free. The notation 'P(x)' is used only when 'x' is free in the sentence being discussed.

Now suppose that we have a sentence P(x, y) containing two free variables. Clearly, we need two quantifiers to obtain a statement from this sentence. This brings us to a very important observation. If quantifiers of both types are used, then the order in which they are written affects the meaning of the statement; (∃y)(∀x)P(x, y) and (∀x)(∃y)P(x, y) say different things. The first says that one y can be found that works for all x: "there exists a y such that for all x . . .". The second says that for each x a y can be found that works: "for each x there exists a y such that . . .". But in the second case, it may very well happen that when x is changed, the y that can be found will also have to be changed. The existence of a single y that serves for all x is thus the stronger statement. For example, it is true that (∀x)(∃y)(x < y) and false that (∃y)(∀x)(x < y). The reader must be absolutely clear on this point; his whole mathematical future is at stake. The second statement says that there exists a y, call it y₀, such that (∀x)(x < y₀), that is, such that every number is less than y₀. This is false; y₀ + 1, in particular, is not less than y₀. The first statement says that for each x we can find a corresponding y. And we can: take y = x + 1.
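The dependence on quantifier order can be checked mechanically on a small finite domain. The following is only a sketch under stated assumptions: the domain {0, 1, 2} and the frame 'y is the successor of x modulo 3' are illustrative choices not taken from the text. (The frame 'x < y' itself would not serve on a finite domain, since the largest x has no larger y.) Python's all and any play the roles of the universal and existential quantifiers.

```python
# Hypothetical finite domain; the text quantifies over all numbers.
DOMAIN = range(3)

def P(x, y):
    """Illustrative frame P(x, y): 'y is the successor of x modulo 3'."""
    return y == (x + 1) % 3

# (∀x)(∃y)P(x, y): for each x some y works, and y may change with x.
forall_exists = all(any(P(x, y) for y in DOMAIN) for x in DOMAIN)

# (∃y)(∀x)P(x, y): one single y must work for every x.
exists_forall = any(all(P(x, y) for x in DOMAIN) for y in DOMAIN)

print(forall_exists)  # True
print(exists_forall)  # False
```

Here the first statement holds and the stronger second statement fails, which is the same pattern as in the example (∀x)(∃y)(x < y) over the numbers.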
On the other hand, among a group of quantifiers of the same type the order does not affect the meaning. Thus '(∀x)(∀y)' and '(∀y)(∀x)' have the same meaning. We often abbreviate such clumps of similar quantifiers by using the quantification symbol only once, as in '(∀x, y)', which can be read 'for every x and y'. Thus the strictly correct '(∀x)(∀y)(∀z)[x + (y + z) = (x + y) + z]' receives the slightly more idiomatic rendition '(∀x, y, z)[x + (y + z) = (x + y) + z]'. The situation is clearly the same for a group of existential quantifiers.

The beginning student generally feels that the prefixing phrases 'for every x there exists a y such that' and 'there exists a y such that for every x' sound artificial and are unidiomatic. This is indeed the case, but this awkwardness is the price that has to be paid for the order of the quantifiers to be fixed, so that the meaning of the quantified statement is clear and unambiguous. Quantifiers do occur in ordinary idiomatic discourse, but their idiomatic occurrences often house ambiguity. The following two sentences are good examples of such ambiguous idiomatic usage: "Every x is less than some y" and "Some y is greater than every x". If a poll were taken, it would be found that most men on the street feel that these two sentences say the same thing, but half will feel that the common assertion is false and half will think it true! The trouble here is that the matrix is preceded by one quantifier and followed by another, and the poor reader doesn't know which to take as the inside, or first applied, quantifier. The two possible symbolic renditions of our first sentence, '[(∀x)(x < y)](∃y)' and '(∀x)[(x < y)(∃y)]', are respectively false and true. Mathematicians do use hanging quantifiers in the interests of more idiomatic writing, but only if they are sure the reader will understand their order of application, either from the context or by comparison with standard usage. In general, a hanging quantifier would probably be read as the inside, or first applied, quantifier, and with this understanding our two ambiguous sentences become true and false in that order.

After this apology the reader should be able to tolerate the definition of sequential convergence. It involves three quantifiers and runs as follows: The sequence {xₙ} converges to x if

(∀ε)(∃N)(∀n)(if n > N then |xₙ − x| < ε).

In exactly the same format, we define a function f to be continuous at a if

(∀ε)(∃δ)(∀x)(if |x − a| < δ then |f(x) − f(a)| < ε).

We often omit an inside universal quantifier by displaying the final frame, so that the universal quantification is understood. Thus we define f to be continuous at a if for every ε there is a δ such that

if |x − a| < δ, then |f(x) − f(a)| < ε.

We shall study these definitions later. We remark only that it is perfectly possible to build up an intuitive understanding of what these and similar quantified statements actually say.

2. THE LOGICAL CONNECTIVES

When the word 'and' is inserted between two sentences, the resulting sentence is true if both constituent sentences are true and is false otherwise.
That is, the "truth value", T or F, of the compound sentence depends only on the truth values of the constituent sentences. We can thus describe the way 'and' acts in compounding sentences in the simple "truth table"

P   Q   P and Q
T   T      T
T   F      F
F   T      F
F   F      F

where 'P' and 'Q' stand for arbitrary statement frames. Words like 'and' are called logical connectives. It is often convenient to use symbols for connectives, and a standard symbol for 'and' is the ampersand '&'. Thus 'P & Q' is read 'P and Q'.
Another logical connective is the word 'or'. Unfortunately, this word is used ambiguously in ordinary discourse. Sometimes it is used in the exclusive sense, where 'P or Q' means that one of P and Q is true, but not both, and sometimes it is used in the inclusive sense that at least one is true, and possibly both are true. Mathematics cannot tolerate any fundamental ambiguity, and in mathematics 'or' is always used in the latter way. We thus have the truth table

P   Q   P or Q
T   T      T
T   F      T
F   T      T
F   F      F

The above two connectives are binary, in the sense that they combine two sentences to form one new sentence. The word 'not' applies to one sentence and really shouldn't be considered a connective at all; nevertheless, it is called a unary connective. A standard symbol for 'not' is '∼'. Its truth table is obviously

P   ∼P
T    F
F    T

In idiomatic usage the word 'not' is generally buried in the interior of a sentence. We write 'x is not equal to y' rather than 'not (x is equal to y)'. However, for the purpose of logical manipulation, the negation sign (the word 'not' or a symbol like '∼') precedes the sentence being negated. We shall, of course, continue to write 'x ≠ y', but keep in mind that this is idiomatic for 'not (x = y)' or '∼(x = y)'.

We come now to the troublesome 'if . . . , then . . .' connective, which we write as either 'if P, then Q' or 'P ⇒ Q'. This is almost always applied in the universally quantified context (∀x)(P(x) ⇒ Q(x)), and its meaning is best unraveled by a study of this usage. We consider 'if x < 3, then x < 5' to be a true sentence. More exactly, it is true for all x, so that the universal quantification (∀x)(x < 3 ⇒ x < 5) is a true statement. This conclusion forces us to agree that, in particular, '2 < 3 ⇒ 2 < 5', '4 < 3 ⇒ 4 < 5', and '6 < 3 ⇒ 6 < 5' are all true statements. The truth table for '⇒' thus contains the values entered below.

P   Q   P ⇒ Q
T   T     T
T   F     -
F   T     T
F   F     T
On the other hand, we consider 'x < 7 ⇒ x < 5' to be a false sentence, and therefore have to agree that '6 < 7 ⇒ 6 < 5' is false. Thus the remaining row in the table above gives the value 'F' for P ⇒ Q.

Combinations of frame variables and logical connectives such as we have been considering are called truth-functional forms. We can further combine the elementary forms such as 'P ⇒ Q' and '∼P' by connectives to construct composite forms such as '∼(P ⇒ Q)' and '(P ⇒ Q) & (Q ⇒ P)'. A sentence has a given (truth-functional) form if it can be obtained from that form by substitution. Thus 'x < y or ∼(x < y)' has the form 'P or ∼P', since it is obtained from this form by substituting the sentence 'x < y' for the sentence variable 'P'. Composite truth-functional forms have truth tables that can be worked out by combining the elementary tables. For example, '∼(P ⇒ Q)' has the table below, the truth value for the whole form being in the column under the connective which is applied last ('∼' in this example).

P   Q   ∼   (P ⇒ Q)
T   T   F      T
T   F   T      F
F   T   F      T
F   F   F      T

Thus ∼(P ⇒ Q) is true only when P is true and Q is false.

A truth-functional form such as 'P or (∼P)' which is always true (i.e., has only 'T' in the final column of its truth table) is called a tautology or a tautologous form. The reader can check that

(P & (P ⇒ Q)) ⇒ Q   and   ((P ⇒ Q) & (Q ⇒ R)) ⇒ (P ⇒ R)

are also tautologous. Indeed, any valid principle of reasoning that does not involve quantifiers must be expressed by a tautologous form.

The 'if and only if' form 'P ⇔ Q', or 'P if and only if Q', or 'P iff Q', is an abbreviation for '(P ⇒ Q) & (Q ⇒ P)'. Its truth table works out to be

P   Q   P ⇔ Q
T   T     T
T   F     F
F   T     F
F   F     T

That is, P ⇔ Q is true if P and Q have the same truth values, and is false otherwise.
Two truth-functional forms A and B are said to be equivalent if (the final columns of) their truth tables are the same, and, in view of the table for '⇔', we see that A and B are equivalent if A ⇔ B is tautologous, and conversely.
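Since a truth table has only finitely many rows, tautology and equivalence can be tested by brute force. The sketch below is an illustration only; the helper names implies, is_tautology, and equivalent are hypothetical and chosen here for readability. It enumerates every assignment of truth values and checks the two forms that the reader was invited to verify.

```python
from itertools import product

def implies(p, q):
    """Truth table of 'P ⇒ Q': false only when P is true and Q is false."""
    return (not p) or q

def is_tautology(form, nvars):
    """True if the form has only T in the final column of its truth table."""
    return all(form(*row) for row in product([True, False], repeat=nvars))

def equivalent(a, b, nvars):
    """Two forms are equivalent when their final truth-table columns agree."""
    return all(a(*row) == b(*row)
               for row in product([True, False], repeat=nvars))

# (P & (P ⇒ Q)) ⇒ Q
modus_ponens = lambda p, q: implies(p and implies(p, q), q)
# ((P ⇒ Q) & (Q ⇒ R)) ⇒ (P ⇒ R)
syllogism = lambda p, q, r: implies(implies(p, q) and implies(q, r),
                                    implies(p, r))

print(is_tautology(modus_ponens, 2))  # True
print(is_tautology(syllogism, 3))     # True
print(is_tautology(implies, 2))       # False: 'P ⇒ Q' alone is no tautology
# ∼(P ⇒ Q) has the same table as P & ∼Q
print(equivalent(lambda p, q: not implies(p, q),
                 lambda p, q: p and not q, 2))  # True
```

The enumeration visits 2ⁿ rows for n sentence variables, which is the number of rows in the corresponding truth table.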
Replacing a sentence obtained by substitution in a form A by the equivalent sentence obtained by the same substitutions in an equivalent form B is a device much used in logical reasoning. Thus to prove a statement P true, it suffices to prove the statement ∼P false, since 'P' and '∼(∼P)' are equivalent forms. Other important equivalences are

∼(P or Q) ⇔ (∼P) & (∼Q),
(P ⇒ Q) ⇔ Q or (∼P),
∼(P ⇒ Q) ⇔ P & (∼Q).

A bit of conventional sloppiness which we shall indulge in for smoother idiom is the use of 'if' instead of the correct 'if and only if' in definitions. We define f to be continuous at x if so-and-so, meaning, of course, that f is continuous at x if and only if so-and-so. This causes no difficulty, since it is clear that 'if and only if' is meant when a definition is being given.

3. NEGATIONS OF QUANTIFIERS

The combinations '∼(∀x)' and '(∃x)∼' have the same meanings: something is not always true if and only if it is sometimes false. Similarly, '∼(∃y)' and '(∀y)∼' have the same meanings. These equivalences can be applied to move a negation sign past each quantifier in a string of quantifiers, giving the following important practical rule:

In taking the negation of a statement beginning with a string of quantifiers, we simply change each quantifier to the opposite kind and move the negation sign to the end of the string.

Thus

∼(∀x)(∃y)(∀z)P(x, y, z) ⇔ (∃x)(∀y)(∃z)∼P(x, y, z).

There are other principles of quantificational reasoning that can be isolated and which we shall occasionally mention, but none seem worth formalizing here.

4. SETS

It is present-day practice to define every mathematical object as a set of some kind or other, and we must examine this fundamental notion, however briefly. A set is a collection of objects that is itself considered an entity. The objects in the collection are called the elements or members of the set.
The symbol for 'is a member of' is '∈' (a sort of capital epsilon), so that 'x ∈ A' is read "x is a member of A", "x is an element of A", "x belongs to A", or "x is in A".

We use the equals sign '=' in mathematics to mean logical identity; A = B means that A is B. Now a set A is considered to be the same object as a set B if and only if A and B have exactly the same members. That is, 'A = B' means that

(∀x)(x ∈ A ⇔ x ∈ B).
We say that a set A is a subset of a set B, or that A is included in B (or that B is a superset of A) if every element of A is an element of B. The symbol for inclusion is '⊂'. Thus 'A ⊂ B' means that (∀x)(x ∈ A ⇒ x ∈ B). Clearly,

(A = B) ⇔ (A ⊂ B) and (B ⊂ A).

This is a frequently used way of establishing set identity: we prove that A = B by proving that A ⊂ B and that B ⊂ A. If the reader thinks about the above equivalence, he will see that it depends first on the equivalence of the truth-functional forms 'P ⇔ Q' and '(P ⇒ Q) & (Q ⇒ P)', and then on the obvious quantificational equivalence between '(∀x)(R & S)' and '(∀x)R & (∀x)S'.

We define a set by specifying its members. If the set is finite, the members can actually be listed, and the notation used is braces surrounding a membership list. For example {1, 4, 7} is the set containing the three numbers 1, 4, 7, {x} is the unit set of x (the set having only the one object x as a member), and {x, y} is the pair set of x and y. We can abuse this notation to name some infinite sets. Thus {2, 4, 6, 8, . . .} would certainly be considered the set of all even positive integers. But infinite sets are generally defined by statement frames. If P(x) is a frame containing the free variable 'x', then {x : P(x)} is the set of all x such that P(x) is true. In other words, {x : P(x)} is that set A such that

y ∈ A ⇔ P(y).

For example, {x : x² < 9} is the set of all real numbers x such that x² < 9, that is, the open interval (−3, 3), and y ∈ {x : x² < 9} ⇔ y² < 9. A statement frame P(x) can be thought of as stating a property that an object x may or may not have, and {x : P(x)} is the set of all objects having that property.

We need the empty set ∅, in much the same way that we need zero in arithmetic. If P(x) is never true, then {x : P(x)} = ∅. For example, {x : x ≠ x} = ∅.
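Set formation by a frame and the proof of set identity by double inclusion have close analogues in Python's set comprehensions. The following is a small sketch, assuming a finite universe of integers from −5 to 5 in place of the real numbers of the text; the names UNIVERSE and is_subset are hypothetical.

```python
# Hypothetical finite universe standing in for the domain of the variable.
UNIVERSE = range(-5, 6)

# The set {x : x² < 9}, restricted to the finite universe.
A = {x for x in UNIVERSE if x * x < 9}
B = {-2, -1, 0, 1, 2}

def is_subset(S, T):
    """S ⊂ T means (∀x)(x ∈ S ⇒ x ∈ T)."""
    return all(x in T for x in S)

# Set identity established by proving inclusion in both directions.
print(is_subset(A, B) and is_subset(B, A))  # True
print(A == B)                               # True

# A frame that is never true defines the empty set.
print({x for x in UNIVERSE if x != x} == set())  # True
```

The built-in comparison A == B agrees with the double-inclusion test, as the equivalence (A = B) ⇔ (A ⊂ B) and (B ⊂ A) requires.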
When we said earlier that all mathematical objects are customarily considered sets, it was taken for granted that the reader understands the distinction between an object and a name of that object. To be on the safe side, we add a few words. A chair is not the same thing as the word 'chair', and the number 4 is a mathematical object that is not the same thing as the numeral '4'. The numeral '4' is a name of the number 4, as also are 'four', '2 + 2', and 'IV'. According to our present viewpoint, 4 itself is taken to be some specific set. There is no need in this course to carry logical analysis this far, but some readers may be interested to know that we usually define 4 as {0, 1, 2, 3}. Similarly, 2 = {0, 1}, 1 = {0}, and 0 is the empty set ∅.

It should be clear from the above discussion and our exposition thus far that we are using a symbol surrounded by single quotation marks as a name of that symbol (the symbol itself being a name of something else). Thus ' '4' ' is a name of '4' (which is itself a name of the number 4). This is strictly correct usage, but mathematicians almost universally mishandle it. It is accurate to write: let x be the number; call this number 'x'. However, the latter is almost always written: call this number x. This imprecision causes no difficulty to the reading mathematician, and it often saves the printed page from a shower of quotation marks. There is, however, a potential victim of such ambiguous treatment of symbols. This is the person who has never realized that mathematics is not about symbols but about objects to which the symbols refer. Since by now the present reader has safely avoided this pitfall, we can relax and occasionally omit the strictly necessary quotation marks.

In order to avoid overworking the word 'set', we use many synonyms, such as 'class', 'collection', 'family' and 'aggregate'. Thus we might say, "Let 𝒜 be a family of classes of sets". If a shoe store is a collection of pairs of shoes, then a chain of shoe stores is such a three-level object.

5. RESTRICTED VARIABLES

A variable used in mathematics is not allowed to take all objects as values; it can only take as values the members of a certain set, called the domain of the variable. The domain is sometimes explicitly indicated, but is often only implied. For example, the letter 'n' is customarily used to specify an integer, so that '(∀n)P(n)' would automatically be read "for every integer n, P(n)". However, sometimes n is taken to be a positive integer. In case of possible ambiguity or doubt, we would indicate the restriction explicitly and write '(∀n ∈ ℤ)P(n)', where 'ℤ' is the standard symbol for the set of all integers. The quantifier is read, literally, "for all n in ℤ", and more freely, "for every integer n". Similarly, '(∃n ∈ ℤ)P(n)' is read "there exists an n in ℤ such that P(n)" or "there exists an integer n such that P(n)". Note that the symbol '∈' is here read as the preposition 'in'. The above quantifiers are called restricted quantifiers.
In the same way, we have restricted set formation, both implicit and explicit, as in '{n : P(n)}' and '{n ∈ ℤ : P(n)}', both of which are read "the set of all integers n such that P(n)".

Restricted variables can be defined as abbreviations of unrestricted variables by

(∀x ∈ A)P(x) ⇔ (∀x)(x ∈ A ⇒ P(x)),
(∃x ∈ A)P(x) ⇔ (∃x)(x ∈ A & P(x)),
{x ∈ A : P(x)} = {x : x ∈ A & P(x)}.

Although there is never any ambiguity in sentences containing explicitly restricted variables, it sometimes helps the eye to see the structure of the sentence if the restricting phrases are written in superscript position, as in (∀ε^{>0})(∃n^{∈ℤ}).

Some restriction was implicit in Section 1. If the reader agreed that (∀x)(x² − 1 = (x − 1)(x + 1)) was true, he probably took x to be a real number.
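The three abbreviations for restricted variables can be compared with their unrestricted forms on a finite universe. This is a sketch under assumptions: the universe of integers from −4 to 4, the restricting set A of even numbers, and the frame 'x² is divisible by 4' are all illustrative choices made here.

```python
# Hypothetical finite universe and restricting set A (its even members).
UNIVERSE = range(-4, 5)
A = {x for x in UNIVERSE if x % 2 == 0}

def P(x):
    """Illustrative frame: 'x squared is divisible by 4'."""
    return (x * x) % 4 == 0

# (∀x ∈ A)P(x) versus (∀x)(x ∈ A ⇒ P(x)).
restricted_forall = all(P(x) for x in A)
unrestricted_forall = all((x not in A) or P(x) for x in UNIVERSE)

# (∃x ∈ A)P(x) versus (∃x)(x ∈ A & P(x)).
restricted_exists = any(P(x) for x in A)
unrestricted_exists = any((x in A) and P(x) for x in UNIVERSE)

# {x ∈ A : P(x)} versus {x : x ∈ A & P(x)}.
restricted_set = {x for x in A if P(x)}
unrestricted_set = {x for x in UNIVERSE if x in A and P(x)}

print(restricted_forall == unrestricted_forall)  # True
print(restricted_exists == unrestricted_exists)  # True
print(restricted_set == unrestricted_set)        # True
```

Note that the restricted universal quantifier unfolds with '⇒' while the restricted existential quantifier unfolds with '&'; exchanging the two connectives would change the meaning.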
6. ORDERED PAIRS AND RELATIONS

Ordered pairs are basic tools, as the reader knows from analytic geometry. According to our general principle, the ordered pair <a, b> is taken to be a certain set, but here again we don't care which particular set it is so long as it guarantees the crucial characterizing property:

<x, y> = <a, b> ⇔ x = a and y = b.

Thus <1, 3> ≠ <3, 1>.

The notion of a correspondence, or relation, and the special case of a mapping, or function, is fundamental to mathematics. A correspondence is a pairing of objects such that given any two objects x and y, the pair <x, y> either does or does not correspond. A particular correspondence (relation) is generally presented by a statement frame P(x, y) having two free variables, with x and y corresponding if and only if P(x, y) is true. Given any relation (correspondence), the set of all ordered pairs <x, y> of corresponding elements is called its graph.

Now a relation is a mathematical object, and, as we have said several times, it is current practice to regard every mathematical object as a set of some sort or other. Since the graph of a relation is a set (of ordered pairs), it is efficient and customary to take the graph to be the relation. Thus a relation (correspondence) is simply a set of ordered pairs. If R is a relation, then we say that x has the relation R to y, and we write 'xRy', if and only if <x, y> ∈ R. We also say that x corresponds to y under R. The set of all first elements occurring in the ordered pairs of a relation R is called the domain of R and is designated dom R or 𝒟(R). Thus

dom R = {x : (∃y) <x, y> ∈ R}.

The set of second elements is called the range of R:

range R = {y : (∃x) <x, y> ∈ R}.

The inverse, R⁻¹, of a relation R is the set of ordered pairs obtained by reversing those of R:

R⁻¹ = {<x, y> : <y, x> ∈ R}.
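Taking a relation to be a set of ordered pairs translates directly into a Python set of tuples. The sketch below is illustrative only: the finite universe {1, 2, 3, 4}, the frame 'x < y', and the function names dom, rng, and inverse are assumptions made for this example.

```python
# A relation presented by a frame P(x, y) on a hypothetical finite universe.
UNIVERSE = range(1, 5)

def P(x, y):
    """Illustrative frame: 'x is strictly less than y'."""
    return x < y

# The graph of the frame is taken to be the relation itself.
R = {(x, y) for x in UNIVERSE for y in UNIVERSE if P(x, y)}

def dom(R):
    """dom R = {x : (∃y) <x, y> ∈ R}."""
    return {x for (x, _) in R}

def rng(R):
    """range R = {y : (∃x) <x, y> ∈ R}."""
    return {y for (_, y) in R}

def inverse(R):
    """The inverse relation: the pairs of R reversed."""
    return {(y, x) for (x, y) in R}

print(sorted(dom(R)))             # [1, 2, 3]
print(sorted(rng(R)))             # [2, 3, 4]
print((1, 3) in R, (3, 1) in R)   # True False
print(inverse(inverse(R)) == R)   # True
```

Python tuples satisfy the characterizing property of ordered pairs, since (x, y) == (a, b) exactly when x == a and y == b.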
A statement frame $P(x, y)$ having two free variables actually determines a pair of mutually inverse relations $R$ and $S$, called the graphs of $P$, as follows:
$R = \{\langle x, y\rangle : P(x, y)\}$, $S = \{\langle y, x\rangle : P(x, y)\}$.
A two-variable frame together with a choice of which variable is considered to be first might be called a directed frame. Then a directed frame would have a uniquely determined relation for its graph. The relation of strict inequality on the real number system $\mathbb{R}$ would be considered the set $\{\langle x, y\rangle : x < y\}$, since the variables in '$x < y$' have a natural order.

The set $A \times B = \{\langle x, y\rangle : x \in A \ \&\ y \in B\}$ of all ordered pairs with first element in $A$ and second element in $B$ is called the Cartesian product of the
sets $A$ and $B$. A relation $R$ is always a subset of $\operatorname{dom} R \times \operatorname{range} R$. If the two "factor spaces" are the same, we can use exponential notation: $A^2 = A \times A$. The Cartesian product $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$ is the "analytic plane". Analytic geometry rests upon the one-to-one coordinate correspondence between $\mathbb{R}^2$ and the Euclidean plane $\mathbb{E}^2$ (determined by an axis system in the latter), which enables us to treat geometric questions algebraically and algebraic questions geometrically. In particular, since a relation between sets of real numbers is a subset of $\mathbb{R}^2$, we can "picture" it by the corresponding subset of the Euclidean plane, or of any model of the Euclidean plane, such as this page. A simple Cartesian product is shown in Fig. 0.1 ($A \cup B$ is the union of the sets $A$ and $B$).

[Fig. 0.1: $A \times B$ when $A = [1, 2] \cup [2\tfrac{1}{2}, 3]$ and $B = [1, 1\tfrac{1}{2}] \cup \{2\}$.] [Fig. 0.2]

If $R$ is a relation and $A$ is any set, then the restriction of $R$ to $A$, $R \restriction A$, is the subset of $R$ consisting of those pairs with first element in $A$:
$R \restriction A = \{\langle x, y\rangle : \langle x, y\rangle \in R \text{ and } x \in A\}$.
Thus $R \restriction A = R \cap (A \times \operatorname{range} R)$, where $C \cap D$ is the intersection of the sets $C$ and $D$.

If $R$ is a relation and $A$ is any set, then the image of $A$ under $R$, $R[A]$, is the set of second elements of ordered pairs in $R$ whose first elements are in $A$:
$R[A] = \{y : (\exists x)(x \in A \ \&\ \langle x, y\rangle \in R)\}$.
Thus $R[A] = \operatorname{range}(R \restriction A)$, as shown in Fig. 0.2.

7. FUNCTIONS AND MAPPINGS

A function is a relation $f$ such that each domain element $x$ is paired with exactly one range element $y$. This property can be expressed as follows:
$\langle x, y\rangle \in f$ and $\langle x, z\rangle \in f \Rightarrow y = z$.
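The restriction, the image, and the defining property of a function can all be tested on a small finite relation. This is a minimal sketch with relations again modeled as Python sets of tuples; the sample relation and the names `restrict`, `image`, and `is_function` are hypothetical helpers introduced for the illustration.

```python
R = {(1, 10), (1, 11), (2, 20), (3, 30)}

def restrict(rel, A):
    """R restricted to A: the pairs of rel whose first element lies in A."""
    return {(x, y) for (x, y) in rel if x in A}

def image(rel, A):
    """R[A]: second elements of pairs whose first element lies in A."""
    return {y for (x, y) in rel if x in A}

def is_function(rel):
    """True when <x, y> in rel and <x, z> in rel force y == z."""
    seen = {}
    for x, y in rel:
        # setdefault records the first y seen for x and returns it thereafter.
        if seen.setdefault(x, y) != y:
            return False
    return True
```

Here `R` itself is not a function, because 1 is paired with both 10 and 11, but its restriction to `{2, 3}` is one.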
The $y$ which is thus uniquely determined by $f$ and $x$ is designated $f(x)$:
$y = f(x) \iff \langle x, y\rangle \in f$.
One tends to think of a function as being active and a relation which is not a function as being passive. A function $f$ acts on an element $x$ in its domain to give $f(x)$. We take $x$ and apply $f$ to it; indeed we often call a function an operator. On the other hand, if $R$ is a relation but not a function, then there is in general no particular $y$ related to an element $x$ in its domain, and the pairing of $x$ and $y$ is viewed more passively.

We often define a function $f$ by specifying its value $f(x)$ for each $x$ in its domain, and in this connection a stopped arrow notation is used to indicate the pairing. Thus $x \mapsto x^2$ is the function assigning to each number $x$ its square $x^2$. If we want it to be understood that $f$ is this function, we can write "Consider the function $f: x \mapsto x^2$". The domain of $f$ must be understood for this notation to be meaningful.

[Fig. 0.3, with the point $\langle -2, 4\rangle$ labeled]

If $f$ is a function, then $f^{-1}$ is of course a relation, but in general it is not a function. For example, if $f$ is the function $x \mapsto x^2$, then $f^{-1}$ contains the pairs $\langle 4, 2\rangle$ and $\langle 4, -2\rangle$ and so is not a function (see Fig. 0.3). If $f^{-1}$ is a function, we say that $f$ is one-to-one and that $f$ is a one-to-one correspondence between its domain and its range. Each $x \in \operatorname{dom} f$ corresponds to only one $y \in \operatorname{range} f$ ($f$ is a function), and each $y \in \operatorname{range} f$ corresponds to only one $x \in \operatorname{dom} f$ ($f^{-1}$ is a function).

The notation $f: A \to B$ is read "a (the) function $f$ on $A$ into $B$" or "the function $f$ from $A$ to $B$". The notation implies that $f$ is a function, that $\operatorname{dom} f = A$, and that $\operatorname{range} f \subset B$. Many people feel that the very notion of function should include all these ingredients; that is, a function should be considered an ordered triple $\langle f, A, B\rangle$, where $f$ is a function according to our more limited definition, $A$ is the domain
of $f$, and $B$ is a superset of the range of $f$, which we shall call the codomain of $f$ in this context. We shall use the terms 'map', 'mapping', and 'transformation' for such a triple, so that the notation $f: A \to B$ in its totality presents a mapping. Moreover, when there is no question about which set is the codomain, we shall often call the function $f$ itself a mapping, since the triple $\langle f, A, B\rangle$ is then determined by $f$. The two arrow notations can be combined, as in: "Define $f: \mathbb{R} \to \mathbb{R}$ by $x \mapsto x^2$".

A mapping $f: A \to B$ is said to be injective if $f$ is one-to-one, surjective if $\operatorname{range} f = B$, and bijective if it is both injective and surjective. A bijective mapping $f: A \to B$ is thus a one-to-one correspondence between its domain $A$ and its codomain $B$. Of course, a function is always surjective onto its range $R$, and the statement that $f$ is surjective means that $R = B$, where $B$ is the understood codomain.

8. PRODUCT SETS; INDEX NOTATION

One of the characteristic habits of the modern mathematician is that as soon as a new kind of object has been defined and discussed a little, he immediately looks at the set of all such objects. With the notion of a function from $A$ to $S$ well in hand, we naturally consider the set of all functions from $A$ to $S$, which we designate $S^A$. Thus $\mathbb{R}^{\mathbb{R}}$ is the set of all real-valued functions of one real variable, and $S^{\mathbb{Z}^+}$ is the set of all infinite sequences in $S$. (It is understood that an infinite sequence is nothing but a function whose domain is the set $\mathbb{Z}^+$ of all positive integers.) Similarly, if we set $\bar{n} = \{1, \ldots, n\}$, then $S^{\bar{n}}$ is the set of all finite sequences of length $n$ in $S$.

If $B$ is a subset of $S$, then its characteristic function (relative to $S$) is the function on $S$, usually designated $\chi_B$, which has the constant value 1 on $B$ and the constant value 0 off $B$. The set of all characteristic functions of subsets of $S$ is thus $2^S$ (since $2 = \{0, 1\}$).
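For a small finite set the collection $2^S$ can be listed outright, and the passage from a subset to its characteristic function can be reversed. In this Python sketch functions on `S` are modeled as dictionaries; the set `S` and the helper names `chi` and `subset_of` are assumptions made for the illustration.

```python
from itertools import product

S = ["a", "b", "c"]

def chi(B):
    """Characteristic function of B relative to S, as a dict from S to {0, 1}."""
    return {s: (1 if s in B else 0) for s in S}

def subset_of(f):
    """The subset of S on which the 0/1-valued function f takes the value 1."""
    return {s for s in S if f[s] == 1}

# Every function from S to {0, 1}, that is, every element of 2^S.
all_functions = [dict(zip(S, values)) for values in product((0, 1), repeat=len(S))]
```

With three elements in `S` there are eight such functions, and `chi` and `subset_of` undo each other.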
But because this collection of functions is in a natural one-to-one correspondence with the collection of all subsets of $S$, $\chi_B$ corresponding to $B$, we tend to identify the two collections. Thus $2^S$ is also interpreted as the set of all subsets of $S$. We shall spend most of the remainder of this section discussing further similar definitional ambiguities which mathematicians tolerate.

The ordered triple $\langle x, y, z\rangle$ is usually defined to be the ordered pair $\langle\langle x, y\rangle, z\rangle$. The reason for this definition is probably that a function of two variables $x$ and $y$ is ordinarily considered a function of the single ordered pair variable $\langle x, y\rangle$, so that, for example, a real-valued function of two real variables is a subset of $(\mathbb{R} \times \mathbb{R}) \times \mathbb{R}$. But we also consider such a function a subset of Cartesian 3-space $\mathbb{R}^3$. Therefore, we define $\mathbb{R}^3$ as $(\mathbb{R} \times \mathbb{R}) \times \mathbb{R}$; that is, we define the ordered triple $\langle x, y, z\rangle$ as $\langle\langle x, y\rangle, z\rangle$.

On the other hand, the ordered triple $\langle x, y, z\rangle$ could also be regarded as the finite sequence $\{\langle 1, x\rangle, \langle 2, y\rangle, \langle 3, z\rangle\}$, which, of course, is a different object. These two models for an ordered triple serve equally well, and, again,
mathematicians tend to slur over the distinction. We shall have more to say on this point later when we discuss natural isomorphisms (Section 1.6). For the moment we shall simply regard $\mathbb{R}^3$ and $\mathbb{R}^{\bar{3}}$ as being the same; an ordered triple is something which can be "viewed" as being either an ordered pair of which the first element is an ordered pair or as a sequence of length 3 (or, for that matter, as an ordered pair of which the second element is an ordered pair).

Similarly, we pretend that Cartesian 4-space $\mathbb{R}^4$ is $\mathbb{R}^{\bar{4}}$, $\mathbb{R}^2 \times \mathbb{R}^2$, or $\mathbb{R}^1 \times \mathbb{R}^3 = \mathbb{R} \times ((\mathbb{R} \times \mathbb{R}) \times \mathbb{R})$, etc. Clearly, we are in effect assuming an associative law for ordered pair formation that we don't really have.

This kind of ambiguity, where we tend to identify two objects that really are distinct, is a necessary corollary of deciding exactly what things are. It is one of the prices we pay for the precision of set theory; in days when mathematics was vaguer, there would have been a single fuzzy notion.

The device of indices, which is used frequently in mathematics, also has ambiguous implications which we should examine. An indexed collection, as a set, is nothing but the range set of a function, the indexing function, and a particular indexed object, say $x_i$, is simply the value of that function at the domain element $i$. If the set of indices is $I$, the indexed set is designated $\{x_i : i \in I\}$ or $\{x_i\}_{i \in I}$ (or $\{x_i\}_{i=1}^{\infty}$ in case $I = \mathbb{Z}^+$). However, this notation suggests that we view the indexed set as being obtained by letting the index run through the index set $I$ and collecting the indexed objects. That is, an indexed set is viewed as being the set together with the indexing function. This ambivalence is reflected in the fact that the same notation frequently designates the mapping. Thus we refer to the sequence $\{x_n\}_{n=1}^{\infty}$, where, of course, the sequence is the mapping $n \mapsto x_n$. We believe that if the reader examines his idea of a sequence he will find this ambiguity present.
He means neither just the set nor just the mapping, but the mapping with emphasis on its range, or the range "together with" the mapping. But since set theory cannot reflect these nuances in any simple and graceful way, we shall take an indexed set to be the indexing function. Of course, the same range object may be repeated with different indices; there is no implication that an indexing is one-to-one. Note also that indexing imposes no restriction on the set being indexed; any set can at least be self-indexed (by the identity function).

Except for the ambiguous '$\{x_i : i \in I\}$', there is no universally used notation for the indexing function. Since $x_i$ is the value of the function at $i$, we might think of '$x_i$' as another way of writing '$x(i)$', in which case we designate the function '$x$' or '$\mathbf{x}$'. We certainly do this in the case of ordered $n$-tuplets when we say, "Consider the $n$-tuplet $\mathbf{x} = \langle x_1, \ldots, x_n\rangle$". On the other hand, there is no compelling reason to use this notation. We can call the indexing function anything we want; if it is $f$, then of course $f(i) = x_i$ for all $i$.

We come now to the general definition of Cartesian product. Earlier we argued (in a special case) that the Cartesian product $A \times B \times C$ is the set of all ordered triples $\mathbf{x} = \langle x_1, x_2, x_3\rangle$ such that $x_1 \in A$, $x_2 \in B$, and $x_3 \in C$. More generally, $A_1 \times A_2 \times \cdots \times A_n$, or $\prod_{i=1}^{n} A_i$, is the set of ordered $n$-tuples $\mathbf{x} = \langle x_1, \ldots, x_n\rangle$ such that $x_i \in A_i$ for $i = 1, \ldots, n$. If we interpret
an ordered $n$-tuplet as a function on $\bar{n} = \{1, \ldots, n\}$, we have: $\prod_{i=1}^{n} A_i$ is the set of all functions $\mathbf{x}$ with domain $\bar{n}$ such that $x_i \in A_i$ for all $i \in \bar{n}$. This rephrasal generalizes almost verbatim to give us the notion of the Cartesian product of an arbitrary indexed collection of sets.

Definition. The Cartesian product $\prod_{i \in I} S_i$ of the indexed collection of sets $\{S_i : i \in I\}$ is the set of all functions $f$ with domain the index set $I$ such that $f(i) \in S_i$ for all $i \in I$.

We can also use the notation $\prod\{S_i : i \in I\}$ for the product and $f_i$ for the value $f(i)$.

9. COMPOSITION

If we are given maps $f: A \to B$ and $g: B \to C$, then the composition of $g$ with $f$, $g \circ f$, is the map of $A$ into $C$ defined by
$(g \circ f)(x) = g(f(x))$ for all $x \in A$.
This is the function of a function operation of elementary calculus. If $f$ and $g$ are the maps from $\mathbb{R}$ to $\mathbb{R}$ defined by $f(x) = x^{1/3} + 1$ and $g(x) = x^2$, then $f \circ g(x) = (x^2)^{1/3} + 1 = x^{2/3} + 1$, and $g \circ f(x) = (x^{1/3} + 1)^2 = x^{2/3} + 2x^{1/3} + 1$. Note that the codomain of $f$ must be the domain of $g$ in order for $g \circ f$ to be defined. This operation is perhaps the basic binary operation of mathematics.

Lemma. Composition satisfies the associative law: $f \circ (g \circ h) = (f \circ g) \circ h$.

Proof. $(f \circ (g \circ h))(x) = f((g \circ h)(x)) = f(g(h(x))) = (f \circ g)(h(x)) = ((f \circ g) \circ h)(x)$ for all $x \in \operatorname{dom} h$. □

If $A$ is a set, the identity map $I_A: A \to A$ is the mapping taking every $x \in A$ to itself. Thus $I_A = \{\langle x, x\rangle : x \in A\}$. If $f$ maps $A$ into $B$, then clearly $f \circ I_A = f = I_B \circ f$. If $g: B \to A$ is such that $g \circ f = I_A$, then we say that $g$ is a left inverse of $f$ and that $f$ is a right inverse of $g$.

Lemma. If the mapping $f: A \to B$ has both a right inverse $h$ and a left inverse $g$, they must necessarily be equal.

Proof. This is just algebraic juggling and works for any associative operation. We have
$h = I_A \circ h = (g \circ f) \circ h = g \circ (f \circ h) = g \circ I_B = g$. □
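Composition and its associativity are easy to try out numerically. The sketch below is a spot check on a few integer inputs, not a proof; the helper `compose` and the sample maps `f`, `g`, `h`, `embed`, and `collapse` are illustrative choices that do not come from the text.

```python
def compose(g, f):
    """Return g o f, the map x |-> g(f(x))."""
    return lambda x: g(f(x))

f = lambda x: x + 1
g = lambda x: x * x
h = lambda x: 2 * x

left = compose(f, compose(g, h))     # f o (g o h)
right = compose(compose(f, g), h)    # (f o g) o h

# A left inverse on finite sets, with maps stored as dicts:
# embed sends {0, 1} into {0, 1, 2}; collapse sends 2 to 1, so collapse o embed is the identity.
embed = {0: 0, 1: 1}
collapse = {0: 0, 1: 1, 2: 1}
collapse_after_embed = {x: collapse[embed[x]] for x in embed}
```

The two groupings agree on every input tried, while `compose(g, f)` and `compose(f, g)` differ, so composition is associative but not commutative. Note that `collapse` is a left inverse of `embed` without being a right inverse, since `embed` is not surjective.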
In this case we call the uniquely determined map $g: B \to A$ such that $f \circ g = I_B$ and $g \circ f = I_A$ the inverse of $f$. We then have:

Theorem. A mapping $f: A \to B$ has an inverse if and only if it is bijective, in which case its inverse is its relational inverse $f^{-1}$.

Proof. If $f$ is bijective, then the relational inverse $f^{-1}$ is a function from $B$ to $A$, and the equations $f \circ f^{-1} = I_B$ and $f^{-1} \circ f = I_A$ are obvious. On the other hand, if $f \circ g = I_B$, then $f$ is surjective, since then every $y$ in $B$ can be written $y = f(g(y))$. And if $g \circ f = I_A$, then $f$ is injective, for then the equation $f(x) = f(y)$ implies that $x = g(f(x)) = g(f(y)) = y$. Thus $f$ is bijective if it has an inverse. □

Now let $\mathfrak{S}(A)$ be the set of all bijections $f: A \to A$. Then $\mathfrak{S}(A)$ is closed under the binary operation of composition and
1) $f \circ (g \circ h) = (f \circ g) \circ h$ for all $f, g, h \in \mathfrak{S}$;
2) there exists a unique $I \in \mathfrak{S}(A)$ such that $f \circ I = I \circ f = f$ for all $f \in \mathfrak{S}$;
3) for each $f \in \mathfrak{S}$ there exists a unique $g \in \mathfrak{S}$ such that $f \circ g = g \circ f = I$.
Any set $G$ closed under a binary operation having these properties is called a group with respect to that operation. Thus $\mathfrak{S}(A)$ is a group with respect to composition.

Composition can also be defined for relations as follows. If $R \subset A \times B$ and $S \subset B \times C$, then $S \circ R \subset A \times C$ is defined by
$\langle x, z\rangle \in S \circ R \iff (\exists y \in B)(\langle x, y\rangle \in R \ \&\ \langle y, z\rangle \in S)$.
If $R$ and $S$ are mappings, this definition agrees with our earlier one.

10. DUALITY

There is another elementary but important phenomenon called duality which occurs in practically all branches of mathematics. Let $F: A \times B \to C$ be any function of two variables. It is obvious that if $x$ is held fixed, then $F(x, y)$ is a function of the one variable $y$. That is, for each fixed $x$ there is a function $h^x: B \to C$ defined by $h^x(y) = F(x, y)$. Then $x \mapsto h^x$ is a mapping $\varphi$ of $A$ into $C^B$. Similarly, each $y \in B$ yields a function $g_y \in C^A$, where $g_y(x) = F(x, y)$, and $y \mapsto g_y$ is a mapping $\theta$ from $B$ to $C^A$.

Now suppose conversely that we are given a mapping $\varphi: A \to C^B$.
For each $x \in A$ we designate the corresponding value of $\varphi$ in index notation as $h^x$, so that $h^x$ is a function from $B$ to $C$, and we define $F: A \times B \to C$ by $F(x, y) = h^x(y)$. We are now back where we started. Thus the mappings $\varphi: A \to C^B$, $F: A \times B \to C$, and $\theta: B \to C^A$ are equivalent, and can be thought of as three different ways of viewing the same phenomenon. The extreme mappings $\varphi$ and $\theta$ will be said to be dual to each other.
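Programmers will recognize the passage between $F$, $\varphi$, and $\theta$ as currying. The following Python sketch makes the three views concrete; the sets `A` and `B`, the sample function `F`, and the helper `uncurry` are assumptions introduced only for this illustration.

```python
A = [0, 1, 2]
B = ["p", "q"]

def F(x, y):
    """An arbitrary function of two variables on A x B (illustrative)."""
    return f"{y}{x}"

def phi(x):
    """x |-> h^x, where h^x(y) = F(x, y)."""
    return lambda y: F(x, y)

def theta(y):
    """y |-> g_y, where g_y(x) = F(x, y)."""
    return lambda x: F(x, y)

def uncurry(mapping):
    """Rebuild a two-variable function from a mapping of A into C^B."""
    return lambda x, y: mapping(x)(y)

F_again = uncurry(phi)
```

All three objects carry the same information: `phi(x)(y)`, `theta(y)(x)`, and `F_again(x, y)` each return `F(x, y)`.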
The mapping $\varphi$ is the indexed family of functions $\{h^x : x \in A\} \subset C^B$. Now suppose that $\mathcal{F} \subset C^B$ is an unindexed collection of functions on $B$ into $C$, and define $F: \mathcal{F} \times B \to C$ by $F(f, y) = f(y)$. Then $\theta: B \to C^{\mathcal{F}}$ is defined by $g_y(f) = f(y)$. What is happening here is simply that in the expression $f(y)$ we regard both symbols as variables, so that $f(y)$ is a function on $\mathcal{F} \times B$. Then when we hold $y$ fixed, we have a function on $\mathcal{F}$ mapping $\mathcal{F}$ into $C$.

We shall see some important applications of this duality principle as our subject develops. For example, an $m \times n$ matrix is a function $t = \{t_{ij}\}$ in $\mathbb{R}^{\bar{m} \times \bar{n}}$. We picture the matrix as a rectangular array of numbers, where '$i$' is the row index and '$j$' is the column index, so that $t_{ij}$ is the number at the intersection of the $i$th row and the $j$th column. If we hold $i$ fixed, we get the $n$-tuple forming the $i$th row, and the matrix can therefore be interpreted as an $m$-tuple of row $n$-tuples. Similarly (dually), it can be viewed as an $n$-tuple of column $m$-tuples.

In the same vein, an $n$-tuple $\langle f_1, \ldots, f_n\rangle$ of functions from $A$ to $B$ can be regarded as a single $n$-tuple-valued function from $A$ to $B^n$,
$a \mapsto \langle f_1(a), \ldots, f_n(a)\rangle$.
In a somewhat different application, duality will allow us to regard a finite-dimensional vector space $V$ as being its own second conjugate space $(V^*)^*$.

It is instructive to look at elementary Euclidean geometry from this point of view. Today we regard a straight line as being a set of geometric points. An older and more neutral view is to take points and lines as being two different kinds of primitive objects. Accordingly, let $A$ be the set of all points (so that $A$ is the Euclidean plane as we now view it), and let $B$ be the set of all straight lines. Let $F$ be the incidence function: $F(p, l) = 1$ if $p$ and $l$ are incident ($p$ is "on" $l$, $l$ is "on" $p$) and $F(p, l) = 0$ otherwise. Thus $F$ maps $A \times B$ into $\{0, 1\}$.
Then for each $l \in B$ the function $g_l(p) = F(p, l)$ is the characteristic function of the set of points that we think of as being the line $l$ ($g_l(p)$ has the value 1 if $p$ is on $l$ and 0 if $p$ is not on $l$). Thus each line determines the set of points that are on it. But, dually, each point $p$ determines the set of lines $l$ "on" it, through its characteristic function $h^p(l)$. Thus, in complete duality we can regard a line as being a set of points and a point as being a set of lines. This duality aspect of geometry is basic in projective geometry.

It is sometimes awkward to invent new notation for the "partial" function obtained by holding a variable fixed in a function of several variables, as we did above when we set $g_y(x) = F(x, y)$, and there is another device that is frequently useful in this situation. This is to put a dot in the position of the "varying variable". Thus $F(a, \cdot)$ is the function of one variable obtained from $F(x, y)$ by holding $x$ fixed at the value $a$, so that in our beginning discussion of duality we have
$h^x = F(x, \cdot)$, $g_y = F(\cdot, y)$.
If $f$ is a function of one variable, we can then write $f = f(\cdot)$, and so express the
above equations also as $h^x(\cdot) = F(x, \cdot)$, $g_y(\cdot) = F(\cdot, y)$. The flaw in this notation is that we can't indicate substitution without losing meaning. Thus the value of the function $F(x, \cdot)$ at $b$ is $F(x, b)$, but from this evaluation we cannot read backward and tell what function was evaluated. We are therefore forced to some such cumbersome notation as $F(x, \cdot)|_b$, which can get out of hand. Nevertheless, the dot device is often helpful when it can be used without evaluation difficulties. In addition to eliminating the need for temporary notation, as mentioned above, it can also be used, in situations where it is strictly speaking superfluous, to direct the eye at once to the position of the variable.

For example, later on $D_\xi F$ will designate the directional derivative of the function $F$ in the (fixed) direction $\xi$. This is a function whose value at $\alpha$ is $D_\xi F(\alpha)$, and the notation $D_\xi F(\cdot)$ makes this implicitly understood fact explicit.

11. THE BOOLEAN OPERATIONS

Let $S$ be a fixed domain, and let $\mathcal{F}$ be a family of subsets of $S$. The union of $\mathcal{F}$, or the union of all the sets in $\mathcal{F}$, is the set of all elements belonging to at least one set in $\mathcal{F}$. We designate the union $\bigcup \mathcal{F}$ or $\bigcup_{A \in \mathcal{F}} A$, and thus we have
$\bigcup \mathcal{F} = \{x : (\exists A \in \mathcal{F})(x \in A)\}$, $\quad y \in \bigcup \mathcal{F} \iff (\exists A \in \mathcal{F})(y \in A)$.
We often consider the family $\mathcal{F}$ to be indexed. That is, we assume given a set $I$ (the set of indices) and a surjective mapping $i \mapsto A_i$ from $I$ to $\mathcal{F}$, so that $\mathcal{F} = \{A_i : i \in I\}$. Then the union of the indexed collection is designated $\bigcup_{i \in I} A_i$ or $\bigcup\{A_i : i \in I\}$. The device of indices has both technical and psychological advantages, and we shall generally use it.

If $\mathcal{F}$ is finite, and either it or the index set is listed, then a different notation is used for its union. If $\mathcal{F} = \{A, B\}$, we designate the union $A \cup B$, a notation that displays the listed names. Note that here we have $x \in A \cup B \iff x \in A$ or $x \in B$. If $\mathcal{F} = \{A_i : i = 1, \ldots, n\}$, we generally write '$A_1 \cup A_2 \cup \cdots \cup A_n$' or '$\bigcup_{i=1}^{n} A_i$' for $\bigcup \mathcal{F}$.
The intersection of the indexed family $\{A_i\}_{i \in I}$, designated $\bigcap_{i \in I} A_i$, is the set of all points that lie in every $A_i$. Thus
$x \in \bigcap_{i \in I} A_i \iff (\forall i \in I)(x \in A_i)$.
For an unindexed family $\mathcal{F}$ we use the notation $\bigcap \mathcal{F}$ or $\bigcap_{A \in \mathcal{F}} A$, and if $\mathcal{F} = \{A, B\}$, then $\bigcap \mathcal{F} = A \cap B$.

The complement, $A'$, of a subset $A$ of $S$ is the set of elements $x \in S$ not in $A$: $A' = \{x \in S : x \notin A\}$. The law of De Morgan states that the complement of an intersection is the union of the complements:
$\left(\bigcap_{i \in I} A_i\right)' = \bigcup_{i \in I} (A_i')$.
This is an immediate consequence of the rule for negating quantifiers. It is the
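equivalence between 'not always in' and 'sometimes not in': $[\sim(\forall i)(x \in A_i) \iff (\exists i)(x \notin A_i)]$ says exactly that
$x \in \left(\bigcap_i A_i\right)' \iff x \in \bigcup_i (A_i')$.
If we set $B_i = A_i'$ and take complements again, we obtain the dual form:
$\left(\bigcup_{i \in I} B_i\right)' = \bigcap_{i \in I} (B_i')$.
Other principles of quantification yield the laws
$B \cap \left(\bigcup_{i \in I} A_i\right) = \bigcup_{i \in I} (B \cap A_i)$
from $P \ \&\ (\exists x)Q(x) \iff (\exists x)(P \ \&\ Q(x))$, and
$B \cup \left(\bigcap_{i \in I} A_i\right) = \bigcap_{i \in I} (B \cup A_i)$,
$B \cap \left(\bigcap_{i \in I} A_i\right) = \bigcap_{i \in I} (B \cap A_i)$,
$B \cup \left(\bigcup_{i \in I} A_i\right) = \bigcup_{i \in I} (B \cup A_i)$.
In the case of two sets, these laws imply the following familiar laws of set algebra:
$(A \cup B)' = A' \cap B'$, $\quad (A \cap B)' = A' \cup B'$ (De Morgan),
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$,
$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
Even here, thinking in terms of indices makes the laws more intuitive. Thus $(A_1 \cap A_2)' = A_1' \cup A_2'$ is obvious when thought of as the equivalence between 'not always in' and 'sometimes not in'.

The family $\mathcal{F}$ is disjoint if distinct sets in $\mathcal{F}$ have no elements in common, i.e., if $(\forall X, Y \in \mathcal{F})(X \ne Y \Rightarrow X \cap Y = \emptyset)$. For an indexed family $\{A_i\}_{i \in I}$ the condition becomes $i \ne j \Rightarrow A_i \cap A_j = \emptyset$. If $\mathcal{F} = \{A, B\}$, we simply say that $A$ and $B$ are disjoint.

Given $f: U \to V$ and an indexed family $\{B_i\}$ of subsets of $V$, we have the following important identities:
$f^{-1}\left[\bigcup_i B_i\right] = \bigcup_i f^{-1}[B_i]$, $\quad f^{-1}\left[\bigcap_i B_i\right] = \bigcap_i f^{-1}[B_i]$,
and, for a single set $B \subset V$,
$f^{-1}[B'] = (f^{-1}[B])'$.
For example,
$x \in f^{-1}\left[\bigcap_i B_i\right] \iff f(x) \in \bigcap_i B_i \iff (\forall i)(f(x) \in B_i) \iff (\forall i)(x \in f^{-1}[B_i]) \iff x \in \bigcap_i f^{-1}[B_i]$.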
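De Morgan's law and the inverse-image computation in the example above can be spot-checked with Python's built-in sets. This is a finite sanity check under arbitrary choices (the domain `S`, the family `family`, the function `f`, and the sets `Bs` are all made up for the illustration), and it additionally checks the analogous statement for unions.

```python
S = set(range(10))                      # the fixed domain
family = [{0, 1, 2, 3}, {2, 3, 4}, {3, 5, 7}]

def complement(A):
    """Complement of A relative to S."""
    return S - A

# De Morgan: the complement of an intersection is the union of the complements.
lhs = complement(set.intersection(*family))
rhs = set.union(*(complement(A) for A in family))

# Inverse images under a function f on U.
U = range(-4, 5)

def f(x):
    return x * x

def preimage(B):
    """f^{-1}[B] = {x in U : f(x) in B}."""
    return {x for x in U if f(x) in B}

Bs = [{0, 1, 4}, {4, 9}, {1, 4, 16}]
```

Both `lhs == rhs` and the equalities between `preimage` of a union (or intersection) and the union (or intersection) of the preimages hold for this data.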
The first, but not the other two, of the three identities above remains valid when $f$ is replaced by any relation $R$. It follows from the commutative law, $(\exists x)(\exists y)A \iff (\exists y)(\exists x)A$. The second identity fails for a general $R$ because '$(\exists x)(\forall y)$' and '$(\forall y)(\exists x)$' have different meanings.

12. PARTITIONS AND EQUIVALENCE RELATIONS

A partition of a set $A$ is a disjoint family $\mathcal{F}$ of sets whose union is $A$. We call the elements of $\mathcal{F}$ 'fibers', and we say that $\mathcal{F}$ fibers $A$ or is a fibering of $A$. For example, the set of straight lines parallel to a given line in the Euclidean plane is a fibering of the plane. If '$\bar{x}$' designates the unique fiber containing the point $x$, then $x \mapsto \bar{x}$ is a surjective mapping $\pi: A \to \mathcal{F}$ which we call the projection of $A$ on $\mathcal{F}$. Passing from a set $A$ to a fibering $\mathcal{F}$ of $A$ is one of the principal ways of forming new mathematical objects.

Any function $f$ automatically fibers its domain into sets on which $f$ is constant. If $A$ is the Euclidean plane and $f(p)$ is the $x$-coordinate of the point $p$ in some coordinate system, then $f$ is constant on each vertical line; more exactly, $f^{-1}(x)$ is a vertical line for every $x$ in $\mathbb{R}$. Moreover, $x \mapsto f^{-1}(x)$ is a bijection from $\mathbb{R}$ to the set of all fibers (vertical lines). In general, if $f: A \to B$ is any surjective mapping, and if for each value $y$ in $B$ we set
$A_y = f^{-1}(y) = \{x \in A : f(x) = y\}$,
then $\mathcal{F} = \{A_y : y \in B\}$ is a fibering of $A$ and $\varphi: y \mapsto A_y$ is a bijection from $B$ to $\mathcal{F}$. Also $\varphi \circ f$ is the projection $\pi: A \to \mathcal{F}$, since $\varphi \circ f(x) = \varphi(f(x))$ is the set $\bar{x}$ of all $z$ in $A$ such that $f(z) = f(x)$.

The above process of generating a fibering of $A$ from a function on $A$ is relatively trivial. A more important way of obtaining a fibering of $A$ is from an equality-like relation on $A$ called an equivalence relation. An equivalence relation $\sim$ on $A$ is a binary relation which is reflexive ($x \sim x$ for every $x \in A$), symmetric ($x \sim y \Rightarrow y \sim x$), and transitive ($x \sim y$ and $y \sim z \Rightarrow x \sim z$).
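The fibering of a domain by a function is straightforward to compute for a finite set. In the sketch below the domain `A`, the function `f`, and the helper `fibers` are illustrative assumptions; the dictionary returned by `fibers` plays the role of the bijection $y \mapsto A_y$.

```python
from collections import defaultdict

A = range(12)

def f(x):
    return x % 4          # illustrative choice of a function on A

def fibers(func, domain):
    """Map each value y to the fiber {x in domain : func(x) == y}."""
    result = defaultdict(set)
    for x in domain:
        result[func(x)].add(x)
    return dict(result)

fib = fibers(f, A)

def projection(x):
    """pi(x): the unique fiber containing x."""
    return fib[f(x)]
```

The fibers cover `A`, they are pairwise disjoint, and `projection` is the composite of `f` with the lookup `y |-> fib[y]`.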
Every fibering $\mathcal{F}$ of $A$ generates a relation $\sim$ by the stipulation that $x \sim y$ if and only if $x$ and $y$ are in the same fiber, and obviously $\sim$ is an equivalence relation. The most important fact to be established in this section is the converse.

Theorem. Every equivalence relation $\sim$ on $A$ is the equivalence relation of a fibering.

Proof. We obviously have to define $\bar{x}$ as the set of elements $y$ equivalent to $x$, $\bar{x} = \{y : y \sim x\}$, and our problem is to show that the family $\mathcal{F}$ of all subsets of $A$ obtained this way is a fibering. The reflexive, symmetric, and transitive laws become
$x \in \bar{x}$, $\quad x \in \bar{y} \Rightarrow y \in \bar{x}$, $\quad$ and $\quad x \in \bar{y}$ and $y \in \bar{z} \Rightarrow x \in \bar{z}$.
Reflexivity thus implies that $\mathcal{F}$ covers $A$. Transitivity says that if $y \in \bar{z}$, then $x \in \bar{y} \Rightarrow x \in \bar{z}$; that is, if $y \in \bar{z}$, then $\bar{y} \subset \bar{z}$. But also, if $y \in \bar{z}$, then $z \in \bar{y}$ by
symmetry, and so $\bar{z} \subset \bar{y}$. Thus $y \in \bar{z}$ implies $\bar{y} = \bar{z}$. Therefore, if two of our sets $\bar{a}$ and $\bar{b}$ have a point $x$ in common, then $\bar{a} = \bar{x} = \bar{b}$. In other words, if $\bar{a}$ is not the set $\bar{b}$, then $\bar{a}$ and $\bar{b}$ are disjoint, and we have a fibering. □

The fundamental role this argument plays in mathematics is due to the fact that in many important situations equivalence relations occur as the primary object, and then are used to define partitions and functions. We give two examples.

Let $\mathbb{Z}$ be the integers (positive, negative, and zero). A fraction '$m/n$' can be considered an ordered pair $\langle m, n\rangle$ of integers with $n \ne 0$. The set of all fractions is thus $\mathbb{Z} \times (\mathbb{Z} - \{0\})$. Two fractions $\langle m, n\rangle$ and $\langle p, q\rangle$ are "equal" if and only if $mq = np$, and equality is checked to be an equivalence relation. The equivalence class $\overline{\langle m, n\rangle}$ is the object taken to be the rational number $m/n$. Thus the rational number system $\mathbb{Q}$ is the set of fibers in a partition of $\mathbb{Z} \times (\mathbb{Z} - \{0\})$.

Next, we choose a fixed integer $p \in \mathbb{Z}$ and define a relation $E$ on $\mathbb{Z}$ by $mEn \iff p$ divides $m - n$. Then $E$ is an equivalence relation, and the set $\mathbb{Z}_p$ of its equivalence classes is called the integers modulo $p$. It is easy to see that $mEn$ if and only if $m$ and $n$ have the same remainder when divided by $p$, so that in this case there is an easily calculated function $f$, where $f(m)$ is the remainder after dividing $m$ by $p$, which defines the fibering. The set of possible remainders is $\{0, 1, \ldots, p - 1\}$, so that $\mathbb{Z}_p$ contains $p$ elements.

A function on a set $A$ can be "factored" through a fibering of $A$ by the following theorem.

Theorem. Let $g$ be a function on $A$, and let $\mathcal{F}$ be a fibering of $A$. Then $g$ is constant on each fiber of $\mathcal{F}$ if and only if there exists a function $\bar{g}$ on $\mathcal{F}$ such that $g = \bar{g} \circ \pi$.

Proof. If $g$ is constant on each fiber of $\mathcal{F}$, then the association of this unique value with the fiber defines the function $\bar{g}$, and clearly $g = \bar{g} \circ \pi$. The converse is obvious. □
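Both examples can be reproduced on finite samples. The Python sketch below builds equivalence classes directly from a relation; the helper `classes`, the modulus `p = 3`, and the finite windows `Z_sample` and `fracs` are assumptions chosen for the illustration, since the full sets $\mathbb{Z}$ and $\mathbb{Z} \times (\mathbb{Z} - \{0\})$ are infinite.

```python
def classes(A, equivalent):
    """Partition A into the classes of the equivalence relation `equivalent`."""
    result = []
    for x in A:
        for cls in result:
            # One representative suffices, because the relation is transitive.
            if equivalent(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            result.append({x})
    return result

p = 3
Z_sample = range(-6, 7)                       # a finite window of the integers
mod_p = classes(Z_sample, lambda m, n: (m - n) % p == 0)

# Fractions <m, n> with n != 0; <m, n> ~ <r, q> iff m*q == n*r.
fracs = [(m, n) for m in range(-2, 3) for n in (1, 2)]
rationals = classes(fracs, lambda a, b: a[0] * b[1] == a[1] * b[0])
```

For `p = 3` the window splits into three classes, one per remainder, and the fractions `(1, 1)` and `(2, 2)` land in the same class, as they should if both represent the rational number 1.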
CHAPTER 1 VECTOR SPACES

The calculus of functions of more than one variable unites the calculus of one variable, which the reader presumably knows, with the theory of vector spaces, and the adequacy of its treatment depends directly on the extent to which vector space theory really is used. The theories of differential equations and differential geometry are similarly based on a mixture of calculus and vector space theory. Such "vector calculus" and its applications constitute the subject matter of this book, and in order for our treatment to be completely satisfactory, we shall have to spend considerable time at the beginning studying vector spaces themselves. This we do principally in the first two chapters. The present chapter is devoted to general vector spaces and the next chapter to finite-dimensional spaces.

We begin this chapter by introducing the basic concepts of the subject (vector spaces, vector subspaces, linear combinations, and linear transformations) and then relate these notions to the lines and planes of geometry. Next we establish the most elementary formal properties of linear transformations and Cartesian product vector spaces, and take a brief look at quotient vector spaces. This brings us to our first major objective, the study of direct sum decompositions, which we undertake in the fifth section. The chapter concludes with a preliminary examination of bilinearity.

1. FUNDAMENTAL NOTIONS

Vector spaces and subspaces. The reader probably has already had some contact with the notion of a vector space. Most beginning calculus texts discuss geometric vectors, which are represented by "arrows" drawn from a chosen origin $O$. These vectors are added geometrically by the parallelogram rule: The sum of the vector $\overrightarrow{OA}$ (represented by the arrow from $O$ to $A$) and the vector $\overrightarrow{OB}$ is the vector $\overrightarrow{OP}$, where $P$ is the vertex opposite $O$ in the parallelogram having $OA$ and $OB$ as two sides (Fig. 1.1).
Vectors can also be multiplied by numbers: $x(\overrightarrow{OA})$ is that vector $\overrightarrow{OB}$ such that $B$ is on the line through $O$ and $A$, the distance from $O$ to $B$ is $|x|$ times the distance from $O$ to $A$, and $B$ and $A$ lie on the same side of $O$ if $x$ is positive, and on opposite sides if $x$ is negative
(Fig. 1.2). These two vector operations satisfy certain laws of algebra, which we shall soon state in the definition. The geometric proofs of these laws are generally sketchy, consisting more of plausibility arguments than of airtight logic. For example, the geometric figure in Fig. 1.3 is the essence of the usual proof that vector addition is associative. In each case the final vector $\overrightarrow{OX}$ is represented by the diagonal starting from $O$ in the parallelepiped constructed from the three edges $OA$, $OB$, and $OC$. The set of all geometric vectors, together with these two operations and the laws of algebra that they satisfy, constitutes one example of a vector space. We shall return to this situation in Section 2.

[Fig. 1.3: $(\overrightarrow{OA} + \overrightarrow{OB}) + \overrightarrow{OC} = \overrightarrow{OP} + \overrightarrow{OC} = \overrightarrow{OX}$ and $\overrightarrow{OA} + (\overrightarrow{OB} + \overrightarrow{OC}) = \overrightarrow{OA} + \overrightarrow{OQ} = \overrightarrow{OX}$.]

The reader may also have seen coordinate triples treated as vectors. In this system a three-dimensional vector is an ordered triple of numbers $\langle x_1, x_2, x_3\rangle$ which we think of geometrically as the coordinates of a point in space. Addition is now algebraically defined,
$\langle x_1, x_2, x_3\rangle + \langle y_1, y_2, y_3\rangle = \langle x_1 + y_1, x_2 + y_2, x_3 + y_3\rangle$,
as is multiplication by numbers, $t\langle x_1, x_2, x_3\rangle = \langle tx_1, tx_2, tx_3\rangle$. The vector laws are much easier to prove for these objects, since they are almost algebraic formalities. The set $\mathbb{R}^3$ of all ordered triples of numbers, together with these two operations, is a second example of a vector space.
If we think of an ordered triple $\langle x_1, x_2, x_3\rangle$ as a function $\mathbf{x}$ with domain the set of integers from 1 to 3, where $x_i$ is the value of the function $\mathbf{x}$ at $i$ (see Section 0.8), then this vector space suggests a general type called a function space, which we shall examine after the definition. For the moment we remark only that we defined the sum of the triple $\mathbf{x}$ and the triple $\mathbf{y}$ as that triple $\mathbf{z}$ such that $z_i = x_i + y_i$ for every $i$.

A vector space, then, is a collection of objects that can be added to each other and multiplied by numbers, subject to certain laws of algebra. In this context a number is often called a scalar.

Definition. Let $V$ be a set, and let there be given a mapping $\langle \alpha, \beta\rangle \mapsto \alpha + \beta$ from $V \times V$ to $V$, called addition, and a mapping $\langle x, \alpha\rangle \mapsto x\alpha$ from $\mathbb{R} \times V$ to $V$, called multiplication by scalars. Then $V$ is a vector space with respect to these two operations if:
A1. $\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma$ for all $\alpha, \beta, \gamma \in V$.
A2. $\alpha + \beta = \beta + \alpha$ for all $\alpha, \beta \in V$.
A3. There exists an element $0 \in V$ such that $\alpha + 0 = \alpha$ for all $\alpha \in V$.
A4. For every $\alpha \in V$ there exists a $\beta \in V$ such that $\alpha + \beta = 0$.
S1. $(xy)\alpha = x(y\alpha)$ for all $x, y \in \mathbb{R}$, $\alpha \in V$.
S2. $(x + y)\alpha = x\alpha + y\alpha$ for all $x, y \in \mathbb{R}$, $\alpha \in V$.
S3. $x(\alpha + \beta) = x\alpha + x\beta$ for all $x \in \mathbb{R}$, $\alpha, \beta \in V$.
S4. $1\alpha = \alpha$ for all $\alpha \in V$.

In contexts where it is clear (as it generally is) which operations are intended, we refer simply to the vector space $V$.

Certain further properties of a vector space follow directly from the axioms. Thus the zero element postulated in A3 is unique, and for each $\alpha$ the $\beta$ of A4 is unique, and is called $-\alpha$. Also $0\alpha = 0$, $x0 = 0$, and $(-1)\alpha = -\alpha$. These elementary consequences are considered in the exercises.

Our standard example of a vector space will be the set $V = \mathbb{R}^A$ of all real-valued functions on a set $A$ under the natural operations of addition of two functions and multiplication of a function by a number. This generalizes the example $\mathbb{R}^{\{1,2,3\}} = \mathbb{R}^3$ that we looked at above.
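When $A$ is finite, an element of $\mathbb{R}^A$ can be stored as a dictionary, and the natural operations become pointwise arithmetic. The following Python sketch spot-checks a few of the laws on sample data; it uses exact rationals from the standard `fractions` module in place of real numbers, and the set `A`, the functions `f` and `g`, and the scalars `s` and `t` are arbitrary illustrative choices.

```python
from fractions import Fraction

A = ["u", "v", "w"]                      # any finite set will do for the sketch

def add(f, g):
    """(f + g)(a) = f(a) + g(a)."""
    return {a: f[a] + g[a] for a in A}

def scale(c, f):
    """(c f)(a) = c * f(a)."""
    return {a: c * f[a] for a in A}

zero = {a: Fraction(0) for a in A}
f = {"u": Fraction(1), "v": Fraction(-2), "w": Fraction(1, 2)}
g = {"u": Fraction(3), "v": Fraction(0), "w": Fraction(5)}
s, t = Fraction(2), Fraction(-1, 3)
```

Each law reduces to the corresponding law of arithmetic applied at every point of `A`, which is why the checks below succeed.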
Remember that a function $f$ in $\mathbb{R}^A$ is simply a mathematical object of a certain kind. We are saying that two of these objects can be added together in a natural way to form a third such object, and that the set of all such objects then satisfies the above laws for addition. Of course, $f + g$ is defined as the function whose value at $a$ is $f(a) + g(a)$, so that $(f + g)(a) = f(a) + g(a)$ for all $a$ in $A$. For example, in $\mathbb{R}^3$ we defined the sum $\mathbf{x} + \mathbf{y}$ as that triple whose value at $i$ is $x_i + y_i$ for all $i$. Similarly, $cf$ is the function defined by $(cf)(a) = c(f(a))$ for all $a$. Laws A1 through S4 follow at once from these definitions and the corresponding laws of algebra for the real number system. For example, the equation $(s + t)f = sf + tf$ means
that $((s + t)f)(a) = (sf + tf)(a)$ for all $a \in A$. But

$$((s + t)f)(a) = (s + t)(f(a)) = s(f(a)) + t(f(a)) = (sf)(a) + (tf)(a) = (sf + tf)(a),$$

where we have used the definition of scalar multiplication in $\mathbb{R}^A$, the distributive law in $\mathbb{R}$, the definition of scalar multiplication in $\mathbb{R}^A$, and the definition of addition in $\mathbb{R}^A$, in that order. Thus we have S2, and the other laws follow similarly.

The set $A$ can be anything at all. If $A = \mathbb{R}$, then $V = \mathbb{R}^{\mathbb{R}}$ is the vector space of all real-valued functions of one real variable. If $A = \mathbb{R} \times \mathbb{R}$, then $V = \mathbb{R}^{\mathbb{R} \times \mathbb{R}}$ is the space of all real-valued functions of two real variables. If $A = \{1, 2\} = \bar{2}$, then $V = \mathbb{R}^{\bar{2}} = \mathbb{R}^2$ is the Cartesian plane, and if $A = \{1, \ldots, n\} = \bar{n}$, then $V = \mathbb{R}^n$ is Cartesian $n$-space. If $A$ contains a single point, then $\mathbb{R}^A$ is a natural bijective image of $\mathbb{R}$ itself, and of course $\mathbb{R}$ is trivially a vector space with respect to its own operations.

Now let $V$ be any vector space, and suppose that $W$ is a nonempty subset of $V$ that is closed under the operations of $V$. That is, if $\alpha$ and $\beta$ are in $W$, then so is $\alpha + \beta$, and if $\alpha$ is in $W$, then so is $x\alpha$ for every scalar $x$. For example, let $V$ be the vector space $\mathbb{R}^{[a,b]}$ of all real-valued functions on the closed interval $[a, b] \subset \mathbb{R}$, and let $W$ be the set $\mathcal{C}([a, b])$ of all continuous real-valued functions on $[a, b]$. Then $W$ is a subset of $V$ that is closed under the operations of $V$, since $f + g$ and $cf$ are continuous whenever $f$ and $g$ are. Or let $V$ be Cartesian 2-space $\mathbb{R}^2$, and let $W$ be the set of ordered pairs $x = \langle x_1, x_2 \rangle$ such that $x_1 + x_2 = 0$. Clearly, $W$ is closed under the operations of $V$.

Such a subset $W$ is always a vector space in its own right. The universally quantified laws A1, A2, and S1 through S4 hold in $W$ because they hold in the larger set $V$. And since there is some $\beta$ in $W$, it follows that $0 = 0\beta$ is in $W$ because $W$ is closed under multiplication by scalars. For the same reason, if $\alpha$ is in $W$, then so is $-\alpha = (-1)\alpha$.
Therefore, A3 and A4 also hold, and we see that $W$ is a vector space. We have proved the following lemma.

Lemma. If $W$ is a nonempty subset of a vector space $V$ which is closed under the operations of $V$, then $W$ is itself a vector space.

We call $W$ a subspace of $V$. Thus $\mathcal{C}([a, b])$ is a subspace of $\mathbb{R}^{[a,b]}$, and the pairs $\langle x_1, x_2 \rangle$ such that $x_1 + x_2 = 0$ form a subspace of $\mathbb{R}^2$. Subspaces will be with us from now to the end.

A subspace of a vector space $\mathbb{R}^A$ is called a function space. In other words, a function space is a collection of real-valued functions on a common domain which is closed under addition and multiplication by scalars.

What we have defined so far ought to be called the notion of a real vector space or a vector space over $\mathbb{R}$. There is an analogous notion of a complex vector space, for which the scalars are the complex numbers. Then laws S1 through S4 refer to multiplication by complex numbers, and the space $\mathbb{C}^A$ of all complex-valued functions on $A$ is the standard example. In fact, if the reader knew what is meant by a field $F$, we could give a single general definition of a vector space over $F$, where scalar multiplication is by the elements of $F$, and the standard example is the space $V = F^A$ of all functions from $A$ to $F$. Throughout this book it will be understood that a vector space is a real vector space unless explicitly stated otherwise. However, much of the analysis holds as well for complex vector spaces, and most of the pure algebra is valid for any scalar field $F$.

EXERCISES

1.1 Sketch the geometric figure representing law S3, $x(\overrightarrow{OA} + \overrightarrow{OB}) = x(\overrightarrow{OA}) + x(\overrightarrow{OB})$, for geometric vectors. Assume that $x > 1$.
1.2 Prove S3 for $\mathbb{R}^3$ using the explicit displayed form $\langle x_1, x_2, x_3 \rangle$ for ordered triples.
1.3 The vector 0 postulated in A3 is unique, as elementary algebraic fiddling will show. For suppose that $0'$ also satisfies A3. Then $0' = 0' + 0$ (A3 for 0) $= 0 + 0'$ (A2) $= 0$ (A3 for $0'$). Show by similar algebraic juggling that, given $\alpha$, the $\beta$ postulated in A4 is unique. This unique $\beta$ is designated $-\alpha$.
1.4 Prove similarly that $0\alpha = 0$, $x0 = 0$, and $(-1)\alpha = -\alpha$.
1.5 Prove that if $x\alpha = 0$, then either $x = 0$ or $\alpha = 0$.
1.6 Prove S1 for a function space $\mathbb{R}^A$. Prove S3.
1.7 Given that $\alpha$ is any vector in a vector space $V$, show that the set $\{x\alpha : x \in \mathbb{R}\}$ of all scalar multiples of $\alpha$ is a subspace of $V$.
1.8 Given that $\alpha$ and $\beta$ are any two vectors in $V$, show that the set of all vectors $x\alpha + y\beta$, where $x$ and $y$ are any real numbers, is a subspace of $V$.
1.9 Show that the set of triples $x$ in $\mathbb{R}^3$ such that $x_1 - x_2 + 2x_3 = 0$ is a subspace $M$. If $N$ is the similar subspace $\{x : x_1 + x_2 + x_3 = 0\}$, find a nonzero vector $a$ in $M \cap N$. Show that $M \cap N$ is the set $\{xa : x \in \mathbb{R}\}$ of all scalar multiples of $a$.
1.10 Let $A$ be the open interval $(0, 1)$, and let $V$ be $\mathbb{R}^A$. Given a point $x$ in $(0, 1)$, let $V_x$ be the set of functions in $V$ that have a derivative at $x$. Show that $V_x$ is a subspace of $V$.
1.11 For any subsets $A$ and $B$ of a vector space $V$ we define the set sum $A + B$ by $A + B = \{\alpha + \beta : \alpha \in A \text{ and } \beta \in B\}$. Show that $(A + B) + C = A + (B + C)$.
1.12 If $A \subset V$ and $X \subset \mathbb{R}$, we similarly define $XA = \{x\alpha : x \in X \text{ and } \alpha \in A\}$. Show that a nonvoid set $A$ is a subspace if and only if $A + A = A$ and $\mathbb{R}A = A$.
1.13 Let $V$ be $\mathbb{R}^2$, and let $M$ be the line through the origin with slope $k$. Let $x$ be any nonzero vector in $M$. Show that $M$ is the subspace $\mathbb{R}x = \{tx : t \in \mathbb{R}\}$.
26 VECTOR SPACES 1.1 1.14 Show that any other line L with the same slope k is of the form Af + a for some a. 1.15 Let M be a subspace of a vector space F, and let a and fi be any two vectors in V. Given A = a+ M and B = fi+ M, show that either A=B or A r\B=0. Show also that A + B = (a + P) + M. 1.16 State more carefully and prove what is meant by "a subspace of a subspace is a subspace". 1.17 Prove that the intersection of two subspaces of a vector space is always itself a subspace. 1.18 Prove more generally that the intersection W = flie/ ^i of ^^J family {Wi :iG 1} of subspaces of F is a subspace of V. 1.19 Let V again be R^^.d^ ^nd let W be the set of all functions / in 7 such that/'(a;) exists for every x in (0, 1). Show that W is the intersection of the collection of subspaces of the form Vx that were considered in Exercise LIO. 1.20 Let 7 be a function space R"^, and for a point a in .1 let Wa be the set of functions such that /(a) = 0. Wa is clearly a subspace. For a subset B C -I let Wb be the set of functions / in 7 such that / = 0 on B. Show that Wb is the intersection f)aGB Wa- 1.21 Supposing again that X and Y are subspaces of V, show that if .Y + F = V and X n y = {0}, then for every vector f in F there is a unique pair of vectors ^ G X and rj G y such that f = ^ + r/. 1.22 Show that if .Y and F are subspaces of a vector space F, then the union A^ U F can only be a subspace if either Y" C F or F C X. Linear combinations and linear span. Because of the commutative and associative laws for vector addition, the sum of a finite set of vectors is the same for all possible ways of adding them. For example, the sum of the three vectors aay abj ac can be calculated in 12 ways, all of which give the same result: {aa + ab) -\- ac = ac-\- (^a + «&) = (^c + ^a) + a:6 = ^6 + (^c + ^a)? etc. Therefore, if 7 = {a, 6, c} is the set of indices used, the notation Y^i^i ai^ which indicates the sum without telling us how we got it, is unambiguous. 
In general, for any finite indexed set of vectors $\{\alpha_i : i \in I\}$ there is a uniquely determined sum vector $\sum_{i \in I} \alpha_i$ which we can compute by ordering and grouping the $\alpha_i$'s in any way.

The index set $I$ is often a block of integers $\bar{n} = \{1, \ldots, n\}$. In this case the vectors $\alpha_i$ form an $n$-tuple $\{\alpha_i\}_1^n$, and unless directed to do otherwise we would add them in their natural order and write the sum as $\sum_{i=1}^{n} \alpha_i$. Note that the way they are grouped is still left arbitrary.

Frequently, however, we have to use indexed sets that are not ordered. For example, the general polynomial of degree at most 5 in the two variables $s$ and $t$ is

$$\sum_{0 \le i + j \le 5} c_{ij} s^i t^j,$$

and the finite set of monomials $\{s^i t^j\}_{i+j \le 5}$ has no natural order.
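A sum over an unordered index set is easy to model. In the following sketch, offered only as an illustration, the index set $\{\langle i, j \rangle : 0 \le i + j \le 5\}$ is stored as a Python set, the coefficients $c_{ij}$ are hypothetical values chosen by an arbitrary formula, and the polynomial is evaluated at one sample point by running through the indices in two different orders. Exact rationals are used so that the two totals can be compared with `==`.

```python
from fractions import Fraction

# Index set {<i, j> : 0 <= i + j <= 5}; a set carries no order of its own.
index_set = {(i, j) for i in range(6) for j in range(6) if i + j <= 5}

# Hypothetical coefficients c_ij, chosen only for illustration.
c = {(i, j): Fraction(i - 2 * j, 3) for (i, j) in index_set}

def evaluate(s, t, order):
    """Add the terms c_ij * s^i * t^j in the order given."""
    total = Fraction(0)
    for (i, j) in order:
        total += c[(i, j)] * s**i * t**j
    return total

s, t = Fraction(1, 2), Fraction(-3)
order1 = sorted(index_set)                                   # by i, then j
order2 = sorted(index_set, key=lambda ij: (ij[1], -ij[0]))   # by j, then -i
value1 = evaluate(s, t, order1)
value2 = evaluate(s, t, order2)
```

There are 21 monomials, and `value1 == value2`, as the general principle requires; any other ordering would do equally well.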
* The formal proof that the sum of a finite collection of vectors is independent of how we add them is by induction. We give it only for the interested reader. In order to avoid looking silly, we begin the induction with two vectors, in which case the commutative law $\alpha_a + \alpha_b = \alpha_b + \alpha_a$ displays the identity of all possible sums. Suppose then that the assertion is true for index sets having fewer than $n$ elements, and consider a collection $\{\alpha_i : i \in I\}$ having $n$ members. Let $\beta$ and $\gamma$ be the sum of these vectors computed in two ways. In the computation of $\beta$ there was a last addition performed, so that

$$\beta = \Big(\sum_{i \in J_1} \alpha_i\Big) + \Big(\sum_{i \in J_2} \alpha_i\Big),$$

where $\{J_1, J_2\}$ partitions $I$ and where we can write these two partial sums without showing how they were formed, since by our inductive hypothesis all possible ways of adding them give the same result. Similarly,

$$\gamma = \Big(\sum_{i \in K_1} \alpha_i\Big) + \Big(\sum_{i \in K_2} \alpha_i\Big).$$

Now set

$$L_{jk} = J_j \cap K_k \quad \text{and} \quad \xi_{jk} = \sum_{i \in L_{jk}} \alpha_i,$$

where it is understood that $\xi_{jk} = 0$ if $L_{jk}$ is empty (see Exercise 1.37). Then $\sum_{i \in J_1} \alpha_i = \xi_{11} + \xi_{12}$ by the inductive hypothesis, and similarly for the other three sums. Thus

$$\beta = (\xi_{11} + \xi_{12}) + (\xi_{21} + \xi_{22}) = (\xi_{11} + \xi_{21}) + (\xi_{12} + \xi_{22}) = \gamma,$$

which completes our proof. *

A vector $\beta$ is called a linear combination of a subset $A$ of the vector space $V$ if $\beta$ is a finite sum $\sum x_i \alpha_i$, where the vectors $\alpha_i$ are all in $A$ and the scalars $x_i$ are arbitrary. Thus, if $A$ is the subset $\{t^n\}_0^\infty \subset \mathbb{R}^{\mathbb{R}}$ of all "monomials", then a function $f$ is a linear combination of the functions in $A$ if and only if $f$ is a polynomial function $f(t) = \sum_{i=0}^{n} x_i t^i$. If $A$ is finite, it is often useful to take the indexed set $\{\alpha_i\}$ to be the whole of $A$, and to simply use a 0-coefficient for any vector missing from the sum. Thus, if $A$ is the subset $\{\sin t, \cos t, e^t\}$ of $\mathbb{R}^{\mathbb{R}}$, then we can consider $A$ an ordered triple in the listed ordering, and the function $3 \sin t - e^t = 3 \cdot \sin t + 0 \cdot \cos t + (-1)e^t$ is the linear combination of the triple $A$ having the coefficient triple $\langle 3, 0, -1 \rangle$.
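The passage from a coefficient triple to a function can be written out directly. The sketch below is an illustration under our own naming (the helper `linear_combination` is hypothetical): it treats the ordered triple $\langle \sin, \cos, \exp \rangle$ as a tuple of Python functions and builds the function with coefficient triple $\langle 3, 0, -1 \rangle$.

```python
import math

# The ordered triple A = <sin, cos, exp> of functions in R^R.
A = (math.sin, math.cos, math.exp)

def linear_combination(coeffs, functions):
    """Return the function t -> sum_i coeffs[i] * functions[i](t)."""
    def combo(t):
        return sum(x * f(t) for x, f in zip(coeffs, functions))
    return combo

# The linear combination of A with coefficient triple <3, 0, -1>.
h = linear_combination((3, 0, -1), A)
```

Up to floating-point rounding, `h(t)` agrees with `3 * math.sin(t) - math.exp(t)` for every `t`; the zero coefficient simply records that cos is missing from the sum.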
Consider now the set $L$ of all linear combinations of the two vectors $\langle 1, 1, 1 \rangle$ and $\langle 0, 1, -1 \rangle$ in $\mathbb{R}^3$. It is the set of all vectors $s\langle 1, 1, 1 \rangle + t\langle 0, 1, -1 \rangle = \langle s, s + t, s - t \rangle$, where $s$ and $t$ are any real numbers. Thus $L = \{\langle s, s + t, s - t \rangle : \langle s, t \rangle \in \mathbb{R}^2\}$. It will be clear on inspection that $L$ is closed under addition and scalar multiplication, and therefore is a subspace of $\mathbb{R}^3$. Also, $L$ contains each of the two given vectors, with coefficient pairs $\langle 1, 0 \rangle$ and $\langle 0, 1 \rangle$, respectively. Finally, any subspace $M$ of $\mathbb{R}^3$ which contains each of the two given vectors will also contain all of their linear combinations, and so will include $L$. That is, $L$ is the smallest subspace of $\mathbb{R}^3$ containing $\langle 1, 1, 1 \rangle$ and $\langle 0, 1, -1 \rangle$. It is called the linear span of the two vectors, or the subspace generated by the two vectors. In general, we have the following theorem.
Theorem 1.1. If $A$ is a nonempty subset of a vector space $V$, then the set $L(A)$ of all linear combinations of the vectors in $A$ is a subspace, and it is the smallest subspace of $V$ which includes the set $A$.

Proof. Suppose first that $A$ is finite. We can assume that we have indexed $A$ in some way, so that $A = \{\alpha_i : i \in I\}$ for some finite index set $I$, and every element of $L(A)$ is of the form $\sum_{i \in I} x_i \alpha_i$. Then we have

$$\Big(\sum x_i \alpha_i\Big) + \Big(\sum y_i \alpha_i\Big) = \sum (x_i + y_i)\alpha_i$$

because the left-hand side becomes $\sum_i (x_i \alpha_i + y_i \alpha_i)$ when it is regrouped by pairs, and then S2 gives the right-hand side. We also have

$$c\Big(\sum x_i \alpha_i\Big) = \sum (c x_i)\alpha_i$$

by S1, S3, and mathematical induction. Thus $L(A)$ is closed under addition and multiplication by scalars and hence is a subspace. Moreover, $L(A)$ contains each $\alpha_i$ (why?) and so includes $A$. Finally, if a subspace $W$ includes $A$, then it contains each linear combination $\sum x_i \alpha_i$, so it includes $L(A)$. Therefore, $L(A)$ can be directly characterized as the uniquely determined smallest subspace which includes the set $A$.

If $A$ is infinite, we obviously can't use a single finite listing. However, the sum $\big(\sum_1^n x_i \alpha_i\big) + \big(\sum_1^m y_j \beta_j\big)$ of two linear combinations of elements of $A$ is clearly a finite sum of scalars times elements of $A$. If we wish, we can rewrite it as $\sum_1^{n+m} x_i \alpha_i$, where we have set $\beta_j = \alpha_{n+j}$ and $y_j = x_{n+j}$ for $j = 1, \ldots, m$. In any case, $L(A)$ is again closed under addition and multiplication by scalars and so is a subspace. □

We call $L(A)$ the linear span of $A$. If $L(A) = V$, we say that $A$ spans $V$; $V$ is finite-dimensional if it has a finite spanning set.

If $V = \mathbb{R}^3$, and if $\delta^1$, $\delta^2$, and $\delta^3$ are the "unit points on the axes", $\delta^1 = \langle 1, 0, 0 \rangle$, $\delta^2 = \langle 0, 1, 0 \rangle$, and $\delta^3 = \langle 0, 0, 1 \rangle$, then $\{\delta^i\}_1^3$ spans $V$, since

$$x = \langle x_1, x_2, x_3 \rangle = \langle x_1, 0, 0 \rangle + \langle 0, x_2, 0 \rangle + \langle 0, 0, x_3 \rangle = x_1\delta^1 + x_2\delta^2 + x_3\delta^3 = \sum_1^3 x_i \delta^i$$

for every $x$ in $\mathbb{R}^3$. More generally, if $V = \mathbb{R}^n$ and $\delta^j$ is the $n$-tuple having 1 in the $j$th place and 0 elsewhere, then we have similarly that $x = \langle x_1, \ldots, x_n \rangle = \sum_{i=1}^{n} x_i \delta^i$, so that $\{\delta^i\}_1^n$ spans $\mathbb{R}^n$. Thus $\mathbb{R}^n$ is finite-dimensional. In general, a function space on an infinite set $A$ will not be finite-dimensional. For example, it is true but not obvious that $\mathcal{C}([a, b])$ has no finite spanning set.

EXERCISES

1.23 Given $\alpha = \langle 1, 1, 1 \rangle$, $\beta = \langle 0, 1, -1 \rangle$, $\gamma = \langle 2, 0, 1 \rangle$, compute the linear combinations $\alpha + \beta + \gamma$, $3\alpha - 2\beta + \gamma$, $x\alpha + y\beta + z\gamma$. Find $x$, $y$, and $z$ such that $x\alpha + y\beta + z\gamma = \langle 0, 0, 1 \rangle = \delta^3$. Do the same for $\delta^1$ and $\delta^2$.
1.24 Given $\alpha = \langle 1, 1, 1 \rangle$, $\beta = \langle 0, 1, -1 \rangle$, $\gamma = \langle 1, 0, 2 \rangle$, show that each of $\alpha$, $\beta$, $\gamma$ is a linear combination of the other two. Show that it is impossible to find coefficients $x$, $y$, and $z$ such that $x\alpha + y\beta + z\gamma = \delta^1$.
1.25 a) Find the linear combination of the set $A = \langle t, t^2 - 1, t^2 + 1 \rangle$ with coefficient triple $\langle 2, -1, 1 \rangle$. Do the same for $\langle 0, 1, 1 \rangle$.
b) Find the coefficient triple for which the linear combination of the triple $A$ is $(t + 1)^2$. Do the same for 1.
c) Show in fact that any polynomial of degree $\le 2$ is a linear combination of $A$.
1.26 Find the linear combination $f$ of $\{e^t, e^{-t}\} \subset \mathbb{R}^{\mathbb{R}}$ such that $f(0) = 1$ and $f'(0) = 2$.
1.27 Find a linear combination $f$ of $\sin x$, $\cos x$, and $e^x$ such that $f(0) = 0$, $f'(0) = 1$, and $f''(0) = 1$.
1.28 Suppose that $a \sin x + b \cos x + c e^x$ is the zero function. Prove that $a = b = c = 0$.
1.29 Prove that $\langle 1, 1 \rangle$ and $\langle 1, 2 \rangle$ span $\mathbb{R}^2$.
1.30 Show that the subspace $M = \{x : x_1 + x_2 = 0\} \subset \mathbb{R}^2$ is spanned by one vector.
1.31 Let $M$ be the subspace $\{x : x_1 - x_2 + 2x_3 = 0\}$ in $\mathbb{R}^3$. Find two vectors $a$ and $b$ in $M$ neither of which is a scalar multiple of the other. Then show that $M$ is the linear span of $a$ and $b$.
1.32 Find the intersection of the linear span of $\langle 1, 1, 1 \rangle$ and $\langle 0, 1, -1 \rangle$ in $\mathbb{R}^3$ with the coordinate subspace $x_2 = 0$. Exhibit this intersection as a linear span.
1.33 Do the above exercise with the coordinate space replaced by $M = \{x : x_1 + x_2 = 0\}$.
1.34 By Theorem 1.1 the linear span $L(A)$ of an arbitrary subset $A$ of a vector space $V$ has the following two properties:
i) $L(A)$ is a subspace of $V$ which includes $A$;
ii) if $M$ is any subspace which includes $A$, then $L(A) \subset M$.
Using only (i) and (ii), show that
a) $A \subset B \Rightarrow L(A) \subset L(B)$;
b) $L(L(A)) = L(A)$.
1.35 Show that
a) if $M$ and $N$ are subspaces of $V$, then so is $M + N$;
b) for any subsets $A, B \subset V$, $L(A \cup B) = L(A) + L(B)$.
1.36 Remembering (Exercise 1.18) that the intersection of any family of subspaces is a subspace, show that the linear span $L(A)$ of a subset $A$ of a vector space $V$ is the intersection of all the subspaces of $V$ that include $A$. This alternative characterization is sometimes taken as the definition of linear span.
1.37 By convention, the sum of an empty set of vectors is taken to be the zero vector. This is necessary if Theorem 1.1 is to be strictly correct. Why? What about the preceding problem?

Linear transformations. The general function space $\mathbb{R}^A$ and the subspace $\mathcal{C}([a, b])$ of $\mathbb{R}^{[a,b]}$ both have the property that in addition to being closed under the vector operations, they are also closed under the operation of multiplication of two functions. That is, the pointwise product of two functions is again a function [$(fg)(a) = f(a)g(a)$], and the product of two continuous functions is continuous. With respect to these three operations, addition, multiplication,
and scalar multiplication, $\mathbb{R}^A$ and $\mathcal{C}([a, b])$ are examples of algebras. If the reader noticed this extra operation, he may have wondered why, at least in the context of function spaces, we bother with the notion of vector space. Why not study all three operations? The answer is that the vector operations are exactly the operations that are "preserved" by many of the most important mappings of sets of functions. For example, define $T: \mathcal{C}([a, b]) \to \mathbb{R}$ by $T(f) = \int_a^b f(t)\,dt$. Then the laws of the integral calculus say that $T(f + g) = T(f) + T(g)$ and $T(cf) = cT(f)$. Thus $T$ "preserves" the vector operations. Or we can say that $T$ "commutes" with the vector operations, since plus followed by $T$ equals $T$ followed by plus. However, $T$ does not preserve multiplication: it is not true in general that $T(fg) = T(f)T(g)$.

Another example is the mapping $T: x \mapsto y$ from $\mathbb{R}^3$ to $\mathbb{R}^2$ defined by $y_1 = 2x_1 - x_2 + x_3$, $y_2 = x_1 + 3x_2 - 5x_3$, for which we can again verify that $T(x + y) = T(x) + T(y)$ and $T(cx) = cT(x)$. The theory of the solvability of systems of linear equations is essentially the theory of such mappings $T$; thus we have another important type of mapping that preserves the vector operations (but not products).

These remarks suggest that we study vector spaces in part so that we can study mappings which preserve the vector operations. Such mappings are called linear transformations.

Definition. If $V$ and $W$ are vector spaces, then a mapping $T: V \to W$ is a linear transformation or a linear map if $T(\alpha + \beta) = T(\alpha) + T(\beta)$ for all $\alpha, \beta \in V$, and $T(x\alpha) = xT(\alpha)$ for all $\alpha \in V$, $x \in \mathbb{R}$.

These two conditions on $T$ can be combined into the single equation

$$T(x\alpha + y\beta) = xT(\alpha) + yT(\beta) \quad \text{for all } \alpha, \beta \in V \text{ and all } x, y \in \mathbb{R}.$$

Moreover, this equation can be extended to any finite sum by induction, so that if $T$ is linear, then

$$T\Big(\sum x_i \alpha_i\Big) = \sum x_i T(\alpha_i)$$

for any linear combination $\sum x_i \alpha_i$. For example, $\int_a^b \big(\sum_1^n c_i f_i\big) = \sum_1^n c_i \int_a^b f_i$.

EXERCISES

1.38 Show that the most general linear map from $\mathbb{R}$ to $\mathbb{R}$ is multiplication by a constant.
1.39 For a fixed $\alpha$ in $V$ the mapping $x \mapsto x\alpha$ from $\mathbb{R}$ to $V$ is linear. Why?
1.40 Why is this true for $\alpha \mapsto x\alpha$ when $x$ is fixed?
1.41 Show that every linear mapping from $\mathbb{R}$ to $V$ is of the form $x \mapsto x\alpha$ for a fixed vector $\alpha$ in $V$.
1.42 Show that every linear mapping from $\mathbb{R}^2$ to $V$ is of the form $\langle x_1, x_2 \rangle \mapsto x_1\alpha_1 + x_2\alpha_2$ for a fixed pair of vectors $\alpha_1$ and $\alpha_2$ in $V$. What is the range of this mapping?
1.43 Show that the map $f \mapsto \int_a^b f(t)\,dt$ from $\mathcal{C}([a, b])$ to $\mathbb{R}$ does not preserve products.
1.44 Let $g$ be any fixed function in $\mathbb{R}^A$. Prove that the mapping $T: \mathbb{R}^A \to \mathbb{R}^A$ defined by $T(f) = gf$ is linear.
1.45 Let $\varphi$ be any mapping from a set $A$ to a set $B$. Show that composition by $\varphi$ is a linear mapping from $\mathbb{R}^B$ to $\mathbb{R}^A$. That is, show that $T: \mathbb{R}^B \to \mathbb{R}^A$ defined by $T(f) = f \circ \varphi$ is linear.

In order to acquire a supply of examples, we shall find all linear transformations having $\mathbb{R}^n$ as domain space. It may be well to start by looking at one such transformation. Suppose we choose some fixed triple of functions $\{f_i\}_1^3$ in the space $\mathbb{R}^{\mathbb{R}}$ of all real-valued functions on $\mathbb{R}$, say $f_1(t) = \sin t$, $f_2(t) = \cos t$, and $f_3(t) = e^t = \exp(t)$. Then for each triple of numbers $x = \{x_i\}_1^3$ in $\mathbb{R}^3$ we have the linear combination $\sum_{i=1}^3 x_i f_i$ with $\{x_i\}$ as coefficients. This is the element of $\mathbb{R}^{\mathbb{R}}$ whose value at $t$ is $\sum_1^3 x_i f_i(t) = x_1 \sin t + x_2 \cos t + x_3 e^t$. Different coefficient triples give different functions, and the mapping $x \mapsto \sum_{i=1}^3 x_i f_i = x_1 \sin + x_2 \cos + x_3 \exp$ is thus a mapping from $\mathbb{R}^3$ to $\mathbb{R}^{\mathbb{R}}$. It is clearly linear. If we call this mapping $T$, then we can recover the determining triple of functions from $T$ as the images of the "unit points" $\delta^i$ in $\mathbb{R}^3$; $T(\delta^j) = \sum_i \delta^j_i f_i = f_j$, and so $T(\delta^1) = \sin$, $T(\delta^2) = \cos$, and $T(\delta^3) = \exp$. We are going to see that every linear mapping from $\mathbb{R}^3$ to $\mathbb{R}^{\mathbb{R}}$ is of this form.

In the following theorem $\{\delta^i\}_1^n$ is the spanning set for $\mathbb{R}^n$ that we defined earlier, so that $x = \sum_1^n x_i \delta^i$ for every $n$-tuple $x = \langle x_1, \ldots, x_n \rangle$ in $\mathbb{R}^n$.

Theorem 1.2. If $\{\beta_j\}_1^n$ is any fixed $n$-tuple of vectors in a vector space $W$, then the "linear combination mapping" $x \mapsto \sum_1^n x_i \beta_i$ is a linear transformation $T$ from $\mathbb{R}^n$ to $W$, and $T(\delta^j) = \beta_j$ for $j = 1, \ldots, n$.
Conversely, if $T$ is any linear mapping from $\mathbb{R}^n$ to $W$, and if we set $\beta_j = T(\delta^j)$ for $j = 1, \ldots, n$, then $T$ is the linear combination mapping $x \mapsto \sum_1^n x_i \beta_i$.

Proof. The linearity of the linear combination map $T$ follows by exactly the same argument that we used in Theorem 1.1 to show that $L(A)$ is a subspace. Thus

$$T(x + y) = \sum_1^n (x_i + y_i)\beta_i = \sum_1^n (x_i\beta_i + y_i\beta_i) = \sum_1^n x_i\beta_i + \sum_1^n y_i\beta_i = T(x) + T(y),$$

and

$$T(sx) = \sum_1^n (sx_i)\beta_i = \sum_1^n s(x_i\beta_i) = s\sum_1^n x_i\beta_i = sT(x).$$

Also $T(\delta^j) = \sum_{i=1}^n \delta^j_i \beta_i = \beta_j$, since $\delta^j_j = 1$ and $\delta^j_i = 0$ for $i \ne j$.
Conversely, if $T: \mathbb{R}^n \to W$ is linear, and if we set $\beta_j = T(\delta^j)$ for all $j$, then for any $x = \langle x_1, \ldots, x_n \rangle$ in $\mathbb{R}^n$ we have $T(x) = T\big(\sum_1^n x_i \delta^i\big) = \sum_1^n x_i T(\delta^i) = \sum_1^n x_i \beta_i$. Thus $T$ is the mapping $x \mapsto \sum_1^n x_i \beta_i$. □

This is a tremendously important theorem, simple though it may seem, and the reader is urged to fix it in his mind. To this end we shall invent some terminology that we shall stay with for the first three chapters. If $\alpha = \{\alpha_1, \ldots, \alpha_n\}$ is an $n$-tuple of vectors in a vector space $W$, let $L_\alpha$ be the corresponding linear combination mapping $x \mapsto \sum_1^n x_i \alpha_i$ from $\mathbb{R}^n$ to $W$. Note that the $n$-tuple $\alpha$ itself is an element of $W^n$. If $T$ is any linear mapping from $\mathbb{R}^n$ to $W$, we shall call the $n$-tuple $\{T(\delta^i)\}_1^n$ the skeleton of $T$. In these terms the theorem can be restated as follows.

Theorem 1.2'. For each $n$-tuple $\alpha$ in $W^n$, the map $L_\alpha: \mathbb{R}^n \to W$ is linear and its skeleton is $\alpha$. Conversely, if $T$ is any linear map from $\mathbb{R}^n$ to $W$, then $T = L_\beta$ where $\beta$ is the skeleton of $T$.

Or again:

Theorem 1.2''. The map $\alpha \mapsto L_\alpha$ is a bijection from $W^n$ to the set of all linear maps $T$ from $\mathbb{R}^n$ to $W$, and $T \mapsto \text{skeleton}(T)$ is its inverse.

A linear transformation from a vector space $V$ to the scalar field $\mathbb{R}$ is called a linear functional on $V$. Thus $f \mapsto \int_a^b f(t)\,dt$ is a linear functional on $V = \mathcal{C}([a, b])$.

The above theorem is particularly simple for a linear functional $F$: since $W = \mathbb{R}$, each vector $\beta_i = F(\delta^i)$ in the skeleton of $F$ is simply a number $b_i$, and the skeleton $\{b_i\}_1^n$ is thus an element of $\mathbb{R}^n$. In this case we would write $F(x) = \sum_1^n b_i x_i$, putting the numerical coefficient $b_i$ before the variable $x_i$. Thus $F(x) = 3x_1 - x_2 + 4x_3$ is the linear functional on $\mathbb{R}^3$ with skeleton $\langle 3, -1, 4 \rangle$. The set of all linear functionals on $\mathbb{R}^n$ is in a natural one-to-one correspondence with $\mathbb{R}^n$ itself; we get $b$ from $F$ by $b_i = F(\delta^i)$ for all $i$, and we get $F$ from $b$ by $F(x) = \sum b_i x_i$ for all $x$ in $\mathbb{R}^n$.
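The correspondence between a linear functional and its skeleton can be traced in a few lines. The sketch below is an illustration only, restricted to the case $W = \mathbb{R}$ so that vectors of $W$ are plain numbers; the function names `unit_point`, `skeleton`, and `L` are our own hypothetical choices, with `L(alpha)` playing the role of $L_\alpha$.

```python
def unit_point(j, n):
    """delta^j in R^n: 1 in the j-th place (counting from 1), 0 elsewhere."""
    return tuple(1 if i == j else 0 for i in range(1, n + 1))

def skeleton(T, n):
    """The n-tuple <T(delta^1), ..., T(delta^n)> of a map T on R^n."""
    return tuple(T(unit_point(j, n)) for j in range(1, n + 1))

def L(alpha):
    """Linear combination map x -> sum_i x_i * alpha_i (numeric alpha_i)."""
    return lambda x: sum(xi * ai for xi, ai in zip(x, alpha))

# The linear functional F(x) = 3 x_1 - x_2 + 4 x_3 from the text.
F = lambda x: 3 * x[0] - x[1] + 4 * x[2]

b = skeleton(F, 3)   # recover the skeleton from F
rebuilt = L(b)       # recover F from its skeleton
```

Here `b` is `(3, -1, 4)`, and `rebuilt` takes the same value as `F` at every triple, for instance `-3` at `(2, 5, -1)`. This is the statement $T = L_\beta$ of Theorem 1.2' in the simplest case.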
We next consider the case where the codomain space of $T$ is a Cartesian space $\mathbb{R}^m$, and in order to keep the two spaces clear in our minds, we shall, for the moment, take the domain space to be $\mathbb{R}^3$. Each vector $\beta_j = T(\delta^j)$ in the skeleton of $T$ is now an $m$-tuple of numbers. If we picture this $m$-tuple as a column of numbers, then the three $m$-tuples $\beta_j$ can be pictured as a rectangular array of numbers, consisting of three columns each of $m$ numbers. Let $t_{ij}$ be the $i$th number in the $j$th column. Then the doubly indexed set of numbers $\{t_{ij}\}$ is called the matrix of the transformation $T$. We call it an $m$-by-3 (an $m \times 3$) matrix because the pictured rectangular array has $m$ rows and three columns. The matrix determines $T$ uniquely, since its columns form the skeleton of $T$.

The identity $T(x) = \sum_1^3 x_j T(\delta^j) = \sum_1^3 x_j \beta_j$ allows the $m$-tuple $T(x)$ to be calculated explicitly from $x$ and the matrix $\{t_{ij}\}$. Picture multiplying the column $m$-tuple $\beta_j$ by the scalar $x_j$ and then adding across the three columns at
the $i$th row, as below:

$$x_1 \begin{bmatrix} t_{11} \\ \vdots \\ t_{i1} \\ \vdots \\ t_{m1} \end{bmatrix} + x_2 \begin{bmatrix} t_{12} \\ \vdots \\ t_{i2} \\ \vdots \\ t_{m2} \end{bmatrix} + x_3 \begin{bmatrix} t_{13} \\ \vdots \\ t_{i3} \\ \vdots \\ t_{m3} \end{bmatrix} = \begin{bmatrix} \vdots \\ \sum_{j=1}^3 x_j t_{ij} \\ \vdots \end{bmatrix}.$$

Since $t_{ij}$ is the $i$th number in the $m$-tuple $\beta_j$, the $i$th number in the $m$-tuple $\sum_{j=1}^3 x_j \beta_j$ is $\sum_{j=1}^3 x_j t_{ij}$. That is, if we let $y$ be the $m$-tuple $T(x)$, then

$$y_i = \sum_{j=1}^3 t_{ij} x_j \quad \text{for } i = 1, \ldots, m,$$

and this set of $m$ scalar equations is equivalent to the one-vector equation $y = T(x)$.

We can now replace three by $n$ in the above discussion without changing anything except the diagram, and thus obtain the following specialization of Theorem 1.2.

Theorem 1.3. Every linear mapping $T$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ determines the $m \times n$ matrix $t = \{t_{ij}\}$ having the skeleton of $T$ as its columns, and the expression of the equation $y = T(x)$ in linear combination form is equivalent to the $m$ scalar equations

$$y_i = \sum_{j=1}^n t_{ij} x_j \quad \text{for } i = 1, \ldots, m.$$

Conversely, each $m \times n$ matrix $t$ determines the linear combination mapping having the columns of $t$ as its skeleton, and the mapping $t \mapsto T$ is therefore a bijection from the set of all $m \times n$ matrices to the set of all linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$.

A linear functional $F$ on $\mathbb{R}^n$ is a linear mapping from $\mathbb{R}^n$ to $\mathbb{R}^1$, so it must be expressed by a $1 \times n$ matrix. That is, the $n$-tuple $b$ in $\mathbb{R}^n$ which is the skeleton of $F$ is viewed as a matrix of one row and $n$ columns.

As a final example of linear maps, we look at an important class of special linear functionals defined on any function space, the so-called coordinate functionals. If $V = \mathbb{R}^I$ and $i \in I$, then the $i$th coordinate functional $\pi_i$ is simply evaluation at $i$, so that $\pi_i(f) = f(i)$. These functionals are obviously linear. In fact, the vector operations on functions were defined to make them linear; since $sf + tg$ is defined to be that function whose value at $i$ is $sf(i) + tg(i)$ for all $i$, we see that $sf + tg$ is by definition that function such that $\pi_i(sf + tg) = s\pi_i(f) + t\pi_i(g)$ for all $i$!

If $V$ is $\mathbb{R}^n$, then $\pi_j$ is the mapping $x = \langle x_1, \ldots, x_n \rangle \mapsto x_j$. In this case we know from the theorem that $\pi_j$ must be of the form $\pi_j(x) = \sum_1^n b_i x_i$ for some $n$-tuple $b$. What is $b$?
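The two ways of computing $T(x)$ from the matrix, adding multiples of the columns or using the scalar equations row by row, can be compared on the map from $\mathbb{R}^3$ to $\mathbb{R}^2$ given earlier by $y_1 = 2x_1 - x_2 + x_3$, $y_2 = x_1 + 3x_2 - 5x_3$. The sketch below is an illustration under our own naming; the helpers `apply_by_rows` and `apply_by_columns` and the sample vector are hypothetical, and Python indices run from 0 rather than 1.

```python
def T(x):
    """The map y1 = 2 x1 - x2 + x3, y2 = x1 + 3 x2 - 5 x3 from R^3 to R^2."""
    x1, x2, x3 = x
    return (2 * x1 - x2 + x3, x1 + 3 * x2 - 5 * x3)

n, m = 3, 2
unit_points = [tuple(1 if i == j else 0 for i in range(n)) for j in range(n)]

# The skeleton of T: the images of the unit points, read as columns.
columns = [T(d) for d in unit_points]

# t[i][j] is the i-th number in the j-th column (0-based indices).
t = [[columns[j][i] for j in range(n)] for i in range(m)]

def apply_by_rows(t, x):
    """y_i = sum_j t_ij x_j, one scalar equation for each row."""
    return tuple(sum(t[i][j] * x[j] for j in range(len(x))) for i in range(len(t)))

def apply_by_columns(columns, x):
    """sum_j x_j * beta_j, the linear combination of the columns."""
    rows = len(columns[0])
    return tuple(sum(x[j] * columns[j][i] for j in range(len(x))) for i in range(rows))

x = (1, -2, 4)   # an arbitrary sample vector
```

For this `x`, all three of `T(x)`, `apply_by_rows(t, x)`, and `apply_by_columns(columns, x)` equal `(8, -25)`, and `t` is the $2 \times 3$ array with rows `[2, -1, 1]` and `[1, 3, -5]`.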
The general form of the linearity property, $T\big(\sum x_i \alpha_i\big) = \sum x_i T(\alpha_i)$, shows that $T$ and $T^{-1}$ both carry subspaces into subspaces.

Theorem 1.4. If $T: V \to W$ is linear, then the $T$-image of the linear span of any subset $A \subset V$ is the linear span of the $T$-image of $A$: $T[L(A)] = L(T[A])$. In particular, if $A$ is a subspace, then so is $T[A]$. Furthermore, if $Y$ is a subspace of $W$, then $T^{-1}[Y]$ is a subspace of $V$.

Proof. According to the formula $T\big(\sum x_i \alpha_i\big) = \sum x_i T(\alpha_i)$, a vector in $W$ is the $T$-image of a linear combination on $A$ if and only if it is a linear combination on $T[A]$. That is, $T[L(A)] = L(T[A])$. If $A$ is a subspace, then $A = L(A)$ and $T[A] = L(T[A])$, a subspace of $W$. Finally, if $Y$ is a subspace of $W$ and $\{\alpha_i\} \subset T^{-1}[Y]$, then $T\big(\sum x_i \alpha_i\big) = \sum x_i T(\alpha_i) \in L(Y) = Y$. Thus $\sum x_i \alpha_i \in T^{-1}[Y]$ and $T^{-1}[Y]$ is its own linear span. □

The subspace $T^{-1}(0) = \{\alpha \in V : T(\alpha) = 0\}$ is called the null space, or kernel, of $T$, and is designated $N(T)$ or $\mathcal{N}(T)$. The range of $T$ is the subspace $T[V]$ of $W$. It is designated $R(T)$ or $\mathcal{R}(T)$.

Lemma 1.1. A linear mapping $T$ is injective if and only if its null space is $\{0\}$.

Proof. If $T$ is injective, and if $\alpha \ne 0$, then $T(\alpha) \ne T(0) = 0$ and the null space accordingly contains only 0. On the other hand, if $N(T) = \{0\}$, then whenever $\alpha \ne \beta$, we have $\alpha - \beta \ne 0$, $T(\alpha) - T(\beta) = T(\alpha - \beta) \ne 0$, and $T(\alpha) \ne T(\beta)$; this shows that $T$ is injective. □

A linear map $T: V \to W$ which is bijective is called an isomorphism. Two vector spaces $V$ and $W$ are isomorphic if and only if there exists an isomorphism between them. For example, the map $\langle c_1, \ldots, c_n \rangle \mapsto \sum_{i=0}^{n-1} c_{i+1} x^i$ is an isomorphism of $\mathbb{R}^n$ with the vector space of all polynomials of degree $< n$.

Isomorphic spaces "have the same form", and are identical as abstract vector spaces. That is, they cannot be distinguished from each other solely on the basis of vector properties which they do or do not have.

When a linear transformation is from $V$ to itself, special things can happen.
One possibility is that $T$ can map a vector $\alpha$ essentially to itself, $T(\alpha) = x\alpha$ for some $x$ in $\mathbb{R}$. In this case $\alpha$ is called an eigenvector (proper vector, characteristic vector), and $x$ is the corresponding eigenvalue.

EXERCISES

1.46 In the situation of Exercise 1.45, show that $T$ is an isomorphism if $\varphi$ is bijective by showing that
a) $\varphi$ injective $\Rightarrow$ $T$ surjective,
b) $\varphi$ surjective $\Rightarrow$ $T$ injective.
1.47 Find the linear functional $l$ on $\mathbb{R}^2$ such that $l(\langle 1, 1 \rangle) = 0$ and $l(\langle 1, 2 \rangle) = 1$. That is, find $b = \langle b_1, b_2 \rangle$ in $\mathbb{R}^2$ such that $l$ is the linear combination map $x \mapsto b_1 x_1 + b_2 x_2$.
1.48 Do the same for $l(\langle 2, 1 \rangle) = -3$ and $l(\langle 1, 2 \rangle) = 4$.
1.49 Find the linear $T: \mathbb{R}^2 \to \mathbb{R}^{\mathbb{R}}$ such that $T(\langle 1, 1 \rangle) = t^2$ and $T(\langle 1, 2 \rangle) = t^3$. That is, find the functions $f_1(t)$ and $f_2(t)$ such that $T$ is the linear combination map $x \mapsto x_1 f_1 + x_2 f_2$.
1.50 Let $T$ be the linear map from $\mathbb{R}^2$ to $\mathbb{R}^3$ such that $T(\delta^1) = \langle 2, -1, 1 \rangle$, $T(\delta^2) = \langle 1, 0, 3 \rangle$. Write down the matrix of $T$ in standard rectangular form. Determine whether or not $\delta^1$ is in the range of $T$.
1.51 Let $T$ be the linear map from $\mathbb{R}^3$ to $\mathbb{R}^3$ whose matrix is

$$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 0 & -1 \\ 3 & -1 & 1 \end{bmatrix}.$$

Find $T(x)$ when $x = \langle 1, 1, 0 \rangle$; do the same for $x = \langle 3, -2, 1 \rangle$.
1.52 Let $M$ be the linear span of $\langle 1, -1, 0 \rangle$ and $\langle 0, 1, 1 \rangle$. Find the subspace $T[M]$ by finding two vectors spanning it, where $T$ is as in the above exercise.
1.53 Let $T$ be the map $\langle x, y \rangle \mapsto \langle x + 2y, y \rangle$ from $\mathbb{R}^2$ to itself. Show that $T$ is a linear combination mapping, and write down its matrix in standard form.
1.54 Do the same for $T: \langle x, y, z \rangle \mapsto \langle x - z, x + z, y \rangle$ from $\mathbb{R}^3$ to itself.
1.55 Find a linear transformation $T$ from $\mathbb{R}^3$ to itself whose range space is the span of $\langle 1, -1, 0 \rangle$ and $\langle -1, 0, 2 \rangle$.
1.56 Find two linear functionals on $\mathbb{R}^4$ the intersection of whose null spaces is the linear span of $\langle 1, 1, 1, 1 \rangle$ and $\langle 1, 0, -1, 0 \rangle$. You now have in hand a linear transformation whose null space is the above span. What is it?
1.57 Let $V = \mathcal{C}([a, b])$ be the space of continuous real-valued functions on $[a, b]$, also designated $\mathcal{C}^0([a, b])$, and let $W = \mathcal{C}^1([a, b])$ be those having continuous first derivatives. Let $D: W \to V$ be differentiation ($Df = f'$), and define $T$ on $V$ by $T(f) = F$, where $F(x) = \int_a^x f(t)\,dt$. By stating appropriate theorems of the calculus, show that $D$ and $T$ are linear, $T$ maps into $W$, and $D$ is a left inverse of $T$ ($D \circ T$ is the identity on $V$).
1.58 In the above exercise, identify the range of $T$ and the null space of $D$. We know that $D$ is surjective and that $T$ is injective. Why?
1.59 Let $V$ be the linear span of the functions $\sin x$ and $\cos x$. Then the operation of differentiation $D$ is a linear transformation from $V$ to $V$. Prove that $D$ is an isomorphism from $V$ to $V$. Show that $D^2 = -I$ on $V$.
1.60 a) As the reader would guess, $\mathcal{C}^3(\mathbb{R})$ is the set of real-valued functions on $\mathbb{R}$ having continuous derivatives up to and including the third. Show that $f \mapsto f'''$ is a surjective linear map $T$ from $\mathcal{C}^3(\mathbb{R})$ to $\mathcal{C}(\mathbb{R})$.
b) For any fixed $a$ in $\mathbb{R}$ show that $f \mapsto \langle f(a), f'(a), f''(a) \rangle$ is an isomorphism from the null space $N(T)$ to $\mathbb{R}^3$. [Hint: Apply Taylor's formula with remainder.]
1.61 An integral analogue of the matrix equations $y_i = \sum_j t_{ij} x_j$, $i = 1, \ldots, m$, is the equation

$$g(s) = \int_0^1 K(s, t) f(t)\,dt, \quad s \in [0, 1].$$

Assuming that $K(s, t)$ is defined on the square $[0, 1] \times [0, 1]$ and is continuous as a function of $t$ for each $s$, check that $f \mapsto g$ is a linear mapping from $\mathcal{C}([0, 1])$ to $\mathbb{R}^{[0,1]}$.
1.62 For a finite set $A = \{\alpha_i\}$, Theorem 1.1 is a corollary of Theorem 1.4. Why?
1.63 Show that the inverse of an isomorphism is linear (and hence is an isomorphism).
1.64 Find the eigenvectors and eigenvalues of $T: \mathbb{R}^2 \to \mathbb{R}^2$ if the matrix of $T$ is

$$\begin{bmatrix} 1 & -1 \\ -2 & 0 \end{bmatrix}.$$

Since every scalar multiple $x\alpha$ of an eigenvector $\alpha$ is clearly also an eigenvector, it will suffice to find one vector in each "eigendirection". This is a problem in elementary algebra.
1.65 Find the eigenvectors and eigenvalues of the transformations $T$ whose matrices are

$$\begin{bmatrix} 1 & 1 \\ -2 & 0 \end{bmatrix}, \quad \begin{bmatrix} -1 & -1 \\ -1 & -1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -1 \\ -2 & 2 \end{bmatrix}, \quad \begin{bmatrix} 2 & 1 \\ -4 & -2 \end{bmatrix}.$$

1.66 The five transformations in the above two exercises exhibit four different kinds of behavior according to the number of distinct eigendirections they have. What are the possibilities?
1.67 Let $V$ be the vector space of polynomials of degree $\le 3$ and define $T: V \to V$ by $f \mapsto t f'(t)$. Find the eigenvectors and eigenvalues of $T$.

2. VECTOR SPACES AND GEOMETRY

The familiar coordinate systems of analytic geometry allow us to consider geometric entities such as lines and planes in vector settings, and these geometric notions give us valuable intuitions about vector spaces. Before looking at the vector forms of these geometric ideas, we shall briefly review the construction of the coordinate correspondence for three-dimensional Euclidean space. As usual, the confident reader can skip it.

We start with the line. A coordinate correspondence between a line $L$ and the real number system $\mathbb{R}$ is determined by choosing arbitrarily on $L$ a zero point $O$ and a unit point $Q$ distinct from $O$.
Then to each point $X$ on $L$ is assigned the number $x$ such that $|x|$ is the distance from $O$ to $X$, measured in terms of the segment $OQ$ as unit, and $x$ is positive or negative according as $X$ and $Q$ are on the same side of $O$ or on opposite sides. The mapping $X \mapsto x$ is the coordinate correspondence.

Now consider three-dimensional Euclidean space $\mathbb{E}^3$. We want to set up a coordinate correspondence between $\mathbb{E}^3$ and the Cartesian vector space $\mathbb{R}^3$. We first choose arbitrarily a zero point $O$ and three unit points $Q_1$, $Q_2$, and $Q_3$ in such a way that the four points do not lie in a plane. Each of
the unit points $Q_i$ determines a line $L_i$ through $O$ and a coordinate correspondence on this line, as defined above. The three lines $L_1$, $L_2$, and $L_3$ are called the coordinate axes. Consider now any point $X$ in $\mathbb{E}^3$. The plane through $X$ parallel to $L_2$ and $L_3$ intersects $L_1$ at a point $X_1$, and therefore determines a number $x_1$, the coordinate of $X_1$ on $L_1$. In a similar way, $X$ determines points $X_2$ on $L_2$ and $X_3$ on $L_3$ which have coordinates $x_2$ and $x_3$, respectively. Altogether $X$ determines a triple $\mathbf{x} = \langle x_1, x_2, x_3 \rangle$ in $\mathbb{R}^3$, and we have thus defined a mapping $\theta: X \mapsto \mathbf{x}$ from $\mathbb{E}^3$ to $\mathbb{R}^3$ (see Fig. 1.4). We call $\theta$ the coordinate correspondence defined by the axis system. The convention implicit in our notation above is that $\theta(Y)$ is $\mathbf{y}$, $\theta(A)$ is $\mathbf{a}$, etc. Note that the unit point $Q_1$ on $L_1$ has the coordinate triple $\delta^1 = \langle 1, 0, 0 \rangle$, and similarly, that $\theta(Q_2) = \delta^2 = \langle 0, 1, 0 \rangle$ and $\theta(Q_3) = \delta^3 = \langle 0, 0, 1 \rangle$.

[Fig. 1.4]

There are certain basic facts about the coordinate correspondence that have to be proved as theorems of geometry before the correspondence can be used to treat geometric questions algebraically. These geometric theorems are quite tricky, and are almost impossible to discuss adequately on the basis of the usual secondary school treatment of geometry. We shall therefore simply assume them. They are:

1) $\theta$ is a bijection from $\mathbb{E}^3$ to $\mathbb{R}^3$.

2) Two line segments $AB$ and $XY$ are equal in length and parallel, and the direction from $A$ to $B$ is the same as that from $X$ to $Y$ if and only if $\mathbf{b} - \mathbf{a} = \mathbf{y} - \mathbf{x}$ (in the vector space $\mathbb{R}^3$). This relationship between line segments is important enough to formalize. A directed line segment is a geometric line segment, together with a choice of one of the two directions along it. If we interpret $\overrightarrow{AB}$ as the directed line segment from $A$ to $B$, and if we define the directed line segments $\overrightarrow{AB}$ and $\overrightarrow{XY}$ to be equivalent (and write $\overrightarrow{AB} \sim \overrightarrow{XY}$) if they are equal in length, parallel, and similarly directed, then (2) can be restated:
$$\overrightarrow{AB} \sim \overrightarrow{XY} \iff \mathbf{b} - \mathbf{a} = \mathbf{y} - \mathbf{x}.$$
3) If $X \ne O$, then $Y$ is on the line through $O$ and $X$ in $\mathbb{E}^3$ if and only if $\mathbf{y} = t\mathbf{x}$ for some $t$ in $\mathbb{R}$. Moreover, this $t$ is the coordinate of $Y$ with respect to $X$ as unit point on the line through $O$ and $X$.
[Fig. 1.5: $s^2 = x_1^2 + x_2^2$, $|OX|^2 = r^2 = s^2 + x_3^2$, $(\mathbf{y} - \mathbf{x}, \mathbf{y} - \mathbf{x}) = |XY|^2$, $(\mathbf{y}, \mathbf{y}) = |OY|^2$]

4) If the axis system in $\mathbb{E}^3$ is Cartesian, that is, if the axes are mutually perpendicular and a common unit of distance is used, then the length $|OX|$ of the segment $OX$ is given by the so-called Euclidean norm on $\mathbb{R}^3$, $|OX| = \left(\sum_1^3 x_i^2\right)^{1/2}$. This follows directly from the Pythagorean theorem. Then this formula and a second application of the Pythagorean theorem to the triangle $OXY$ imply that the segments $OX$ and $OY$ are perpendicular if and only if the scalar product $(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^3 x_i y_i$ has the value 0 (see Fig. 1.5).

In applying this result, it is useful to note that the scalar product $(\mathbf{x}, \mathbf{y})$ is linear as a function of either vector variable when the other is held fixed. Thus
$$(c\mathbf{x} + d\mathbf{y}, \mathbf{z}) = \sum_1^3 (c x_i + d y_i) z_i = c \sum_1^3 x_i z_i + d \sum_1^3 y_i z_i = c(\mathbf{x}, \mathbf{z}) + d(\mathbf{y}, \mathbf{z}).$$

Exactly the same theorems hold for the coordinate correspondence between the Euclidean plane $\mathbb{E}^2$ and the Cartesian 2-space $\mathbb{R}^2$, except that now, of course, $(\mathbf{x}, \mathbf{y}) = \sum_1^2 x_i y_i = x_1 y_1 + x_2 y_2$.

We can easily obtain the equations for lines and planes in $\mathbb{E}^3$ from these basic theorems. First, we see from (2) and (3) that if fixed points $A$ and $B$ are given, with $A \ne O$, then the line through $B$ parallel to the segment $OA$ contains the point $X$ if and only if there exists a scalar $t$ such that $\mathbf{x} - \mathbf{b} = t\mathbf{a}$ (see Fig. 1.6). Therefore, the equation of this line is
$$\mathbf{x} = t\mathbf{a} + \mathbf{b}.$$

[Fig. 1.6]

This vector equation is equivalent to the three numerical equations $x_i = a_i t + b_i$, $i = 1, 2, 3$. These are customarily called the parametric equations of the line, since they present the coordinate triple $\mathbf{x}$ of the varying point $X$ on the line as functions of the "parameter" $t$.
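The parametric equation of a line and the scalar-product test for perpendicularity can be checked numerically. The following is a minimal sketch in Python using exact integer coordinates; the helper names `dot` and `line_point` and all sample triples are illustrative assumptions, not notation from the text.

```python
def dot(x, y):
    """Scalar product (x, y) = x1*y1 + x2*y2 + x3*y3."""
    return sum(xi * yi for xi, yi in zip(x, y))

def line_point(a, b, t):
    """Coordinate triple x = t*a + b of a point on the line through B parallel to OA."""
    return tuple(t * ai + bi for ai, bi in zip(a, b))

a = (1, 2, 2)    # direction triple, a != 0 (sample data)
b = (3, 0, -1)   # a point on the line (sample data)

# Points on the line for several values of the parameter t.
points = [line_point(a, b, t) for t in (-1, 0, 1, 4)]

# Squared Euclidean norm |OA|^2 = (a, a).
norm_sq = dot(a, a)

# OX and OY are perpendicular exactly when (x, y) = 0.
perpendicular = dot((1, 2, 2), (2, -1, 0)) == 0

# Linearity in the first variable: (c*x + d*y, z) = c*(x, z) + d*(y, z).
c, d = 5, -2
x, y, z = (1, 0, 2), (3, -1, 4), (2, 2, 1)
lhs = dot(tuple(c * xi + d * yi for xi, yi in zip(x, y)), z)
rhs = c * dot(x, z) + d * dot(y, z)
```

Integer data keeps every comparison exact, so the equalities hold without any floating-point tolerance.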
Next, we know that the plane through $B$ perpendicular to the direction of the segment $OA$ contains the point $X$ if and only if $BX \perp OA$, and it therefore follows from (2) and (4) that the plane contains $X$ if and only if $(\mathbf{x} - \mathbf{b}, \mathbf{a}) = 0$ (see Fig. 1.7). But $(\mathbf{x} - \mathbf{b}, \mathbf{a}) = (\mathbf{x}, \mathbf{a}) - (\mathbf{b}, \mathbf{a})$ by the linearity of the scalar product in its first variable, and if we set $l = (\mathbf{b}, \mathbf{a})$, we see that the equation of the plane is
$$(\mathbf{x}, \mathbf{a}) = l \quad\text{or}\quad \sum_1^3 a_i x_i = l.$$
That is, a point $X$ is on the plane through $B$ perpendicular to the direction of $OA$ if and only if this equation holds for its coordinate triple $\mathbf{x}$. Conversely, if $\mathbf{a} \ne 0$, then we can retrace the steps taken above to show that the set of points $X$ in $\mathbb{E}^3$ whose coordinate triples $\mathbf{x}$ satisfy $(\mathbf{x}, \mathbf{a}) = l$ is a plane.

[Fig. 1.7]

The fact that $\mathbb{R}^3$ has the natural scalar product $(\mathbf{x}, \mathbf{y})$ is of course extremely important, both algebraically and geometrically. However, most vector spaces do not have natural scalar products, and we shall deliberately neglect scalar products in our early vector theory (but shall return to them in Chapter 5). This leads us to seek a different interpretation of the equation $\sum_1^3 a_i x_i = l$. We saw in Section 1 that $\mathbf{x} \mapsto \sum_1^3 a_i x_i$ is the most general linear functional $f$ on $\mathbb{R}^3$. Therefore, given any plane $M$ in $\mathbb{E}^3$, there is a nonzero linear functional $f$ on $\mathbb{R}^3$ and a number $l$ such that the equation of $M$ is $f(\mathbf{x}) = l$. And conversely, given any nonzero linear functional $f: \mathbb{R}^3 \to \mathbb{R}$ and any $l \in \mathbb{R}$, the locus of $f(\mathbf{x}) = l$ is a plane $M$ in $\mathbb{E}^3$. The reader will remember that we obtain the coefficient triple $\mathbf{a}$ from $f$ by $a_i = f(\delta^i)$, since then $f(\mathbf{x}) = f\left(\sum_1^3 x_i \delta^i\right) = \sum_1^3 x_i f(\delta^i) = \sum_1^3 a_i x_i$.

Finally, we seek the vector form of the notion of parallel translation. In plane geometry when we are considering two congruent figures that are parallel and similarly oriented, we often think of obtaining one from the other by "sliding
the plane along itself" in such a way that all lines remain parallel to their original positions. This description of a parallel translation of the plane can be more elegantly stated as the condition that every directed line segment slides to an equivalent one. If $X$ slides to $Y$ and $O$ slides to $B$, then $\overrightarrow{OX}$ slides to $\overrightarrow{BY}$, so that $\overrightarrow{OX} \sim \overrightarrow{BY}$ and $\mathbf{x} = \mathbf{y} - \mathbf{b}$ by (2). Therefore, the coordinate form of such a parallel sliding is the mapping $\mathbf{x} \mapsto \mathbf{y} = \mathbf{x} + \mathbf{b}$. Conversely, for any $\mathbf{b}$ in $\mathbb{R}^2$ the plane mapping defined by $\mathbf{x} \mapsto \mathbf{y} = \mathbf{x} + \mathbf{b}$ is easily seen to be a parallel translation. These considerations hold equally well for parallel translations of the Euclidean space $\mathbb{E}^3$.

It is geometrically clear that under a parallel translation planes map to parallel planes and lines map to parallel lines, and now we can expect an easy algebraic proof. Consider, for example, the plane $M$ with equation $f(\mathbf{x}) = l$; let us ask what happens to $M$ under the translation $\mathbf{x} \mapsto \mathbf{y} = \mathbf{x} + \mathbf{b}$. Since $\mathbf{x} = \mathbf{y} - \mathbf{b}$, we see that a point $\mathbf{x}$ is on $M$ if and only if its translate $\mathbf{y}$ satisfies the equation $f(\mathbf{y} - \mathbf{b}) = l$ or, since $f$ is linear, the equation $f(\mathbf{y}) = l'$, where $l' = l + f(\mathbf{b})$. But this is the equation of a plane $N$. Thus the translate of $M$ is the plane $N$.

It is natural to transfer all this geometric terminology from sets in $\mathbb{E}^3$ to the corresponding sets in $\mathbb{R}^3$ and therefore to speak of the set of ordered triples $\mathbf{x}$ satisfying $f(\mathbf{x}) = l$ as a set of points in $\mathbb{R}^3$ forming a plane in $\mathbb{R}^3$, and to call the mapping $\mathbf{x} \mapsto \mathbf{x} + \mathbf{b}$ the (parallel) translation of $\mathbb{R}^3$ through $\mathbf{b}$, etc. Moreover, since $\mathbb{R}^3$ is a vector space, we would expect these geometric ideas to interplay with vector notions. For instance, translation through $\mathbf{b}$ is simply the operation of adding the constant vector $\mathbf{b}$: $\mathbf{x} \mapsto \mathbf{x} + \mathbf{b}$. Thus if $M$ is a plane, then the plane $N$ obtained by translating $M$ through $\mathbf{b}$ is just the vector set sum $M + \mathbf{b}$.
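The computation $l' = l + f(\mathbf{b})$ for the translate of a plane can be confirmed on sample data. The sketch below is only an illustration: the coefficient triple, the constant `level` (playing the role of $l$), the translation vector, and the chosen points are all assumed values, and checking finitely many points does not replace the algebraic argument.

```python
A_COEFFS = (2, 1, -1)   # coefficient triple a of the functional f (sample data)

def f(x):
    """Linear functional f(x) = a1*x1 + a2*x2 + a3*x3."""
    return sum(ai * xi for ai, xi in zip(A_COEFFS, x))

def translate(x, b):
    """Parallel translation x -> x + b."""
    return tuple(xi + bi for xi, bi in zip(x, b))

level = 3                 # M is the plane f(x) = level
b = (1, 0, 4)             # translation vector (sample data)
M_points = [(1, 1, 0), (0, 3, 0), (0, 0, -3), (2, 0, 1)]

# Predicted constant for the translated plane N = M + b.
level_prime = level + f(b)

N_points = [translate(x, b) for x in M_points]
on_M = all(f(x) == level for x in M_points)
on_N = all(f(y) == level_prime for y in N_points)
```

Here `on_M` confirms that the sample points really lie on $M$, and `on_N` confirms that each translate satisfies the equation of $N$ with the predicted constant.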
If the equation of $M$ is $f(\mathbf{x}) = l$, then the plane $M$ goes through 0 if and only if $l = 0$, in which case $M$ is a vector subspace of $\mathbb{R}^3$ (the null space of $f$). It is easy to see that any plane $M$ is a translate of a plane through 0. Similarly, the line $\{t\mathbf{a} + \mathbf{b} : t \in \mathbb{R}\}$ is the translate through $\mathbf{b}$ of the line $\{t\mathbf{a} : t \in \mathbb{R}\}$, and this second line is a subspace, the linear span of the one vector $\mathbf{a}$. Thus planes and lines in $\mathbb{R}^3$ are translates of subspaces.

These notions all carry over to an arbitrary real vector space in a perfectly satisfactory way and with additional dimensional variety. A plane in $\mathbb{R}^3$ through 0 is a vector space which is two-dimensional in a strictly algebraic sense which we shall discuss in the next chapter, and a line is similarly one-dimensional. In $\mathbb{R}^3$ there are no proper subspaces other than planes and lines through 0, but in a vector space $V$ with dimension $n > 3$ proper subspaces occur with all dimensions from 1 to $n - 1$. We shall therefore use the term "plane" loosely to refer to any translate of a subspace, whatever its dimension. More properly, translates of vector subspaces are called affine subspaces.

We shall see that if $V$ is a finite-dimensional space with dimension $n$, then the null space of a nonzero linear functional $f$ is always $(n - 1)$-dimensional, and therefore it cannot be a Euclidean-like two-dimensional plane except when
$n = 3$. We use the term hyperplane for such a null space or one of its translates. Thus, in general, a hyperplane is a set with the equation $f(\mathbf{x}) = l$, where $f$ is a nonzero linear functional. It is a proper affine subspace (plane) which is maximal in the sense that the only affine subspace properly including it is the whole of $V$. In $\mathbb{R}^3$ hyperplanes are ordinary geometric planes, and in $\mathbb{R}^2$ hyperplanes are lines!

EXERCISES

2.1 Assuming the theorem $\overrightarrow{AB} \sim \overrightarrow{XY} \iff \mathbf{b} - \mathbf{a} = \mathbf{y} - \mathbf{x}$, show that $\overrightarrow{OC}$ is the sum of $\overrightarrow{OA}$ and $\overrightarrow{OB}$, as defined in the preliminary discussion of Section 1, if and only if $\mathbf{c} = \mathbf{b} + \mathbf{a}$. Considering also our assumed geometric theorem (3), show that the mapping $\mathbf{x} \mapsto \overrightarrow{OX}$ from $\mathbb{R}^3$ to the vector space of geometric vectors is linear and hence an isomorphism.

2.2 Let $L$ be the line in the Cartesian plane $\mathbb{R}^2$ with equation $x_2 = 3x_1$. Express $L$ in parametric form as $\mathbf{x} = t\mathbf{a}$ for a suitable ordered pair $\mathbf{a}$.

2.3 Let $V$ be any vector space, and let $\alpha$ and $\beta$ be distinct vectors. Show that the line through $\alpha$ and $\beta$ has the parametric equation
$$\xi = t\beta + (1 - t)\alpha, \qquad t \in \mathbb{R}.$$
Show also that the segment from $\alpha$ to $\beta$ is the image of $[0, 1]$ in the above mapping.

2.4 According to the Pythagorean theorem, a triangle with side lengths $a$, $b$, and $c$ has a right angle at the vertex "opposite $c$" if and only if $c^2 = a^2 + b^2$. Prove from this that in a Cartesian coordinate system in $\mathbb{E}^3$ the length $|OX|$ of a segment $OX$ is given by
$$|OX|^2 = \sum_1^3 x_i^2,$$
where $\mathbf{x} = \langle x_1, x_2, x_3 \rangle$ is the coordinate triple of the point $X$. Next use our geometric theorem (2) to conclude that
$$OX \perp OY \quad\text{if and only if}\quad (\mathbf{x}, \mathbf{y}) = 0, \quad\text{where}\quad (\mathbf{x}, \mathbf{y}) = \sum_1^3 x_i y_i.$$
(Use the bilinearity of $(\mathbf{x}, \mathbf{y})$ to expand $|X - Y|^2$.)

2.5 More generally, the law of cosines says that in any triangle labeled as indicated,
$$c^2 = a^2 + b^2 - 2ab \cos\theta.$$
Apply this law to the diagram [diagram] to prove that
$$(\mathbf{x}, \mathbf{y}) = |\mathbf{x}|\,|\mathbf{y}| \cos\theta,$$
where $(\mathbf{x}, \mathbf{y})$ is the scalar product $\sum_1^3 x_i y_i$, $|\mathbf{x}| = (\mathbf{x}, \mathbf{x})^{1/2} = |OX|$, etc.

2.6 Given a nonzero linear functional $f: \mathbb{R}^3 \to \mathbb{R}$, and given $k \in \mathbb{R}$, show that the set of points $X$ in $\mathbb{E}^3$ such that $f(\mathbf{x}) = k$ is a plane. [Hint: Find a $\mathbf{b}$ in $\mathbb{R}^3$ such that $f(\mathbf{b}) = k$, and throw the equation $f(\mathbf{x}) = k$ into the form $(\mathbf{x} - \mathbf{b}, \mathbf{a}) = 0$, etc.]

2.7 Show that for any $\mathbf{b}$ in $\mathbb{R}^3$ the mapping $X \mapsto Y$ from $\mathbb{E}^3$ to itself defined by $\mathbf{y} = \mathbf{x} + \mathbf{b}$ is a parallel translation. That is, show that if $X \mapsto Y$ and $Z \mapsto W$, then $\overrightarrow{XZ} \sim \overrightarrow{YW}$.

2.8 Let $M$ be the set in $\mathbb{R}^3$ with equation $3x_1 - x_2 + x_3 = 2$. Find triplets $\mathbf{a}$ and $\mathbf{b}$ such that $M$ is the plane through $\mathbf{b}$ perpendicular to the direction of $\mathbf{a}$. What is the equation of the plane $P = M + \langle 1, 2, 1 \rangle$?

2.9 Continuing the above exercise, what is the condition on the triplet $\mathbf{b}$ in order for $N = M + \mathbf{b}$ to pass through the origin? What is the equation of $N$?

2.10 Show that if the plane $M$ in $\mathbb{R}^3$ has the equation $f(\mathbf{x}) = l$, then $M$ is a translate of the null space $N$ of the linear functional $f$. Show that any two translates $M$ and $P$ of $N$ are either identical or disjoint. What is the condition on the ordered triple $\mathbf{b}$ in order that $N + \mathbf{b} = M$?

2.11 Generalize the above exercise to hyperplanes in $\mathbb{R}^n$.

2.12 Let $N$ be the subspace (plane through the origin) in $\mathbb{R}^3$ with equation $f(\mathbf{x}) = 0$. Let $M$ and $P$ be any two planes obtained from $N$ by parallel translation. Show that $Q = M + P$ is a third such plane. If $M$ and $P$ have the equations $f(\mathbf{x}) = l_1$ and $f(\mathbf{x}) = l_2$, find the equation for $Q$.

2.13 If $M$ is the plane in $\mathbb{R}^3$ with equation $f(\mathbf{x}) = l$, and if $r$ is any nonzero number, show that the set product $rM$ is a plane parallel to $M$.

2.14 In view of the above two exercises, discuss how we might consider the set of all parallel translates of the plane $N$ with equation $f(\mathbf{x}) = 0$ as forming a new vector space.

2.15 Let $L$ be the subspace (line through the origin) in $\mathbb{R}^3$ with parametric equation $\mathbf{x} = t\mathbf{a}$.
Discuss the set of all parallel translates of $L$ in the spirit of the above three exercises.

2.16 The best object to take as "being" the geometric vector $\overrightarrow{AB}$ is the equivalence class of all directed line segments $\overrightarrow{XY}$ such that $\overrightarrow{XY} \sim \overrightarrow{AB}$. Assuming whatever you need from properties (1) through (4), show that this is an equivalence relation on the set of all directed line segments (Section 0.12).

2.17 Assuming that the geometric vector $\overrightarrow{AB}$ is defined as in the above exercise, show that, strictly speaking, it is actually the mapping of the plane (or space) into itself that we have called the parallel translation through $\overrightarrow{AB}$. Show also that $\overrightarrow{AB} + \overrightarrow{CD}$ is the composition of the two translations.
3. PRODUCT SPACES AND HOM(V, W)

Product spaces. If $W$ is a vector space and $A$ is an arbitrary set, then the set $V = W^A$ of all $W$-valued functions on $A$ is a vector space in exactly the same way that $\mathbb{R}^A$ is. Addition is the natural addition of functions, $(f + g)(a) = f(a) + g(a)$, and, similarly, $(xf)(a) = x(f(a))$ for every function $f$ and scalar $x$. Laws A1 through S4 follow just as before and for exactly the same reasons. For variety, let us check the associative law for addition. The equation $f + (g + h) = (f + g) + h$ means that $(f + (g + h))(a) = ((f + g) + h)(a)$ for all $a \in A$. But
$$(f + (g + h))(a) = f(a) + (g + h)(a) = f(a) + (g(a) + h(a)) = (f(a) + g(a)) + h(a) = (f + g)(a) + h(a) = ((f + g) + h)(a),$$
where the middle equality in this chain of five holds by the associative law for $W$ and the other four are applications of the definition of addition. Thus the associative law for addition holds in $W^A$ because it holds in $W$, and the other laws follow in exactly the same way.

As before, we let $\pi_i$ be evaluation at $i$, so that $\pi_i(f) = f(i)$. Now, however, $\pi_i$ is vector valued rather than scalar valued, because it is a mapping from $V$ to $W$, and we call it the $i$th coordinate projection rather than the $i$th coordinate functional. Again these maps are all linear. In fact, as before, the natural vector operations on $W^A$ are uniquely defined by the requirement that the projections $\pi_i$ all be linear. We call the value $f(j) = \pi_j(f)$ the $j$th coordinate of the vector $f$. Here the analogue of Cartesian $n$-space is the set $W^{\bar{n}}$ of all $n$-tuples $\alpha = \langle \alpha_1, \dots, \alpha_n \rangle$ of vectors in $W$; it is also designated $W^n$. Clearly, $\alpha_j$ is the $j$th coordinate of the $n$-tuple $\alpha$.

There is no reason why we must use the same space $W$ at each index, as we did above. In fact, if $W_1, \dots, W_n$ are any $n$ vector spaces, then the set of all $n$-tuples $\alpha = \langle \alpha_1, \dots, \alpha_n \rangle$ such that $\alpha_j \in W_j$ for $j = 1, \dots, n$ is a vector space under the same definitions of the operations and for the same reasons.
That is, the Cartesian product $W = W_1 \times W_2 \times \cdots \times W_n$ is also a vector space of vector-valued functions. Such finite products will be very important to us. Of course, $\mathbb{R}^n$ is the product $\prod_1^n W_i$ with each $W_i = \mathbb{R}$; but $\mathbb{R}^n$ can also be considered $\mathbb{R}^m \times \mathbb{R}^{n-m}$, or more generally, $\prod_1^k W_i$, where $W_i = \mathbb{R}^{m_i}$ and $\sum_1^k m_i = n$. However, the most important use of finite product spaces arises from the fact that the study of certain phenomena on a vector space $V$ may lead in a natural way to a collection $\{V_i\}_1^n$ of subspaces of $V$ such that $V$ is isomorphic to the product $\prod_1^n V_i$. Then the extra structure that $V$ acquires when we regard it as the product space $\prod_1^n V_i$ is used to study the phenomena in question. This is the theory of direct sums, and we shall investigate it in Section 5.

Later in the course we shall need to consider a general Cartesian product of vector spaces. We remind the reader that if $\{W_i : i \in I\}$ is any indexed collection of vector spaces, then the Cartesian product $\prod_{i \in I} W_i$ of these vector spaces is
defined as the set of all functions $f$ with domain $I$ such that $f(i) \in W_i$ for all $i \in I$ (see Section 0.8).

The following is a simple concrete example to keep in mind. Let $S$ be the ordinary unit sphere in $\mathbb{R}^3$, $S = \{\mathbf{x} : \sum_1^3 x_i^2 = 1\}$, and for each point $\mathbf{x}$ on $S$ let $W_{\mathbf{x}}$ be the subspace of $\mathbb{R}^3$ tangent to $S$ at $\mathbf{x}$. By this we mean the subspace (plane through 0) parallel to the tangent plane to $S$ at $\mathbf{x}$, so that the translate $W_{\mathbf{x}} + \mathbf{x}$ is the tangent plane (see Fig. 1.8). A function $f$ in the product space $W = \prod_{\mathbf{x} \in S} W_{\mathbf{x}}$ is a function which assigns to each point $\mathbf{x}$ on $S$ a vector in $W_{\mathbf{x}}$, that is, a vector parallel to the tangent plane to $S$ at $\mathbf{x}$. Such a function is called a vector field on $S$. Thus the product set $W$ is the set of all vector fields on $S$, and $W$ itself is a vector space, as the next theorem states.

[Fig. 1.8]

Of course, the $j$th coordinate projection on $W = \prod_{i \in I} W_i$ is evaluation at $j$, $\pi_j(f) = f(j)$, and the natural vector operations on $W$ are uniquely defined by the requirement that the coordinate projections all be linear. Thus $f + g$ must be that element of $W$ whose value at $j$, $\pi_j(f + g)$, is $\pi_j(f) + \pi_j(g) = f(j) + g(j)$ for all $j \in I$, and similarly for multiplication by scalars.

Theorem 3.1. The Cartesian product of a collection of vector spaces can be made into a vector space in exactly one way so that the coordinate projections are all linear.

Proof. With the vector operations determined uniquely as above, the proofs of A1 through S4 that we sampled earlier hold verbatim. They did not require that the functions being added have all their values in the same space, but only that the values at a given domain element $i$ all lie in the same space. $\square$

Hom(V, W). Linear transformations have the simple but important properties that the sum of two linear transformations is linear and the composition of two linear transformations is linear.
These imprecise statements are in essence the theme of this section, although they need bolstering by conditions on domains and codomains. Their proofs are simple formal algebraic arguments, but the objects being discussed will increase in conceptual complexity.
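Before the formal statements, the two closure properties can be spot-checked on concrete maps. The sketch below, in Python, assumes two sample linear maps on $\mathbb{R}^2$ written as functions on pairs; the names `S`, `T`, `add_maps`, `compose`, `comb`, and `respects` are illustrative only. Agreement on a few sample inputs is evidence, not a proof of linearity.

```python
def S(v):
    """A sample linear map on R^2: <v1, v2> -> <v1 + 2*v2, 3*v2>."""
    return (v[0] + 2 * v[1], 3 * v[1])

def T(v):
    """Another sample linear map on R^2: <v1, v2> -> <-v2, v1>."""
    return (-v[1], v[0])

def add_maps(F, G):
    """The sum F + G, defined pointwise: (F + G)(v) = F(v) + G(v)."""
    return lambda v: tuple(p + q for p, q in zip(F(v), G(v)))

def compose(F, G):
    """The composition F o G: v -> F(G(v))."""
    return lambda v: F(G(v))

def comb(x, a, y, b):
    """The linear combination x*a + y*b of two vectors a and b."""
    return tuple(x * p + y * q for p, q in zip(a, b))

def respects(F, x, a, y, b):
    """Check F(x*a + y*b) == x*F(a) + y*F(b) for one choice of data."""
    return F(comb(x, a, y, b)) == comb(x, F(a), y, F(b))

# Each sample is (scalar x, vector a, scalar y, vector b).
samples = [(2, (1, 0), -3, (0, 1)),
           (5, (2, -1), 4, (3, 7)),
           (-1, (6, 6), 2, (-4, 9))]

sum_ok = all(respects(add_maps(S, T), *s) for s in samples)
comp_ok = all(respects(compose(S, T), *s) for s in samples)
```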
If $W$ is a vector space and $A$ is any set, we know that the space $W^A$ of all mappings $f: A \to W$ is a vector space of functions (now vector valued) in the same way that $\mathbb{R}^A$ is. If $A$ is itself a vector space $V$, we naturally single out for special study the subset of $W^V$ consisting of all linear mappings. We designate this subset $\operatorname{Hom}(V, W)$. The following elementary theorems summarize its basic algebraic properties.

Theorem 3.2. $\operatorname{Hom}(V, W)$ is a vector subspace of $W^V$.

Proof. The theorem is an easy formality. If $S$ and $T$ are in $\operatorname{Hom}(V, W)$, then
$$(S + T)(x\alpha + y\beta) = S(x\alpha + y\beta) + T(x\alpha + y\beta) = xS(\alpha) + yS(\beta) + xT(\alpha) + yT(\beta) = x(S + T)(\alpha) + y(S + T)(\beta),$$
so $S + T$ is linear and $\operatorname{Hom}(V, W)$ is closed under addition. The reader should be sure he knows the justification for each step in the above continued equality. The closure of $\operatorname{Hom}(V, W)$ under multiplication by scalars follows similarly, and since $\operatorname{Hom}(V, W)$ contains the zero transformation, and so is nonempty, it is a subspace. $\square$

Theorem 3.3. The composition of linear maps is linear: if $T \in \operatorname{Hom}(V, W)$ and $S \in \operatorname{Hom}(W, X)$, then $S \circ T \in \operatorname{Hom}(V, X)$. Moreover, composition is distributive over addition, under the obvious hypotheses on domains and codomains:
$$(S_1 + S_2) \circ T = S_1 \circ T + S_2 \circ T \quad\text{and}\quad S \circ (T_1 + T_2) = S \circ T_1 + S \circ T_2.$$
Finally, composition commutes with scalar multiplication:
$$c(S \circ T) = (cS) \circ T = S \circ (cT).$$

Proof. We have
$$S \circ T(x\alpha + y\beta) = S(T(x\alpha + y\beta)) = S(xT(\alpha) + yT(\beta)) = xS(T(\alpha)) + yS(T(\beta)) = x(S \circ T)(\alpha) + y(S \circ T)(\beta),$$
so $S \circ T$ is linear. The two distributive laws will be left to the reader. $\square$

Corollary. If $T \in \operatorname{Hom}(V, W)$ is fixed, then composition on the right by $T$ is a linear transformation from the vector space $\operatorname{Hom}(W, X)$ to the vector space $\operatorname{Hom}(V, X)$. It is an isomorphism if $T$ is an isomorphism.

Proof. The algebraic properties of composition stated in the theorem can be combined as follows:
$$(c_1 S_1 + c_2 S_2) \circ T = c_1 (S_1 \circ T) + c_2 (S_2 \circ T),$$
$$S \circ (c_1 T_1 + c_2 T_2) = c_1 (S \circ T_1) + c_2 (S \circ T_2).$$
The first equation says exactly that composition on the right by a fixed $T$ is a linear transformation. (Write $S \circ T$ as $\mathcal{T}(S)$ if the equations still don't look right.) If $T$ is an isomorphism, then composition by $T^{-1}$ "undoes" composition by $T$, and so is its inverse. $\square$
The second equation implies a similar corollary about composition on the left by a fixed $S$.

Theorem 3.4. If $W$ is a product vector space, $W = \prod_i W_i$, then a mapping $T$ from a vector space $V$ to $W$ is linear if and only if $\pi_i \circ T$ is linear for each coordinate projection $\pi_i$.

Proof. If $T$ is linear, then $\pi_i \circ T$ is linear by the above theorem. Now suppose, conversely, that all the maps $\pi_i \circ T$ are linear. Then
$$\pi_i(T(x\alpha + y\beta)) = \pi_i \circ T(x\alpha + y\beta) = x(\pi_i \circ T)(\alpha) + y(\pi_i \circ T)(\beta) = x\pi_i(T(\alpha)) + y\pi_i(T(\beta)) = \pi_i(xT(\alpha) + yT(\beta)).$$
But if $\pi_i(f) = \pi_i(g)$ for all $i$, then $f = g$. Therefore, $T(x\alpha + y\beta) = xT(\alpha) + yT(\beta)$, and $T$ is linear. $\square$

If $T$ is a linear mapping from $\mathbb{R}^n$ to $W$ whose skeleton is $\{\beta_j\}_1^n$, then $\pi_i \circ T$ has skeleton $\{\pi_i(\beta_j)\}_{j=1}^n$. If $W$ is $\mathbb{R}^m$, then $\pi_i$ is the $i$th coordinate functional $\mathbf{y} \mapsto y_i$, and $\beta_j$ is the $j$th column in the matrix $\mathbf{t} = \{t_{ij}\}$ of $T$. Thus $\pi_i(\beta_j) = t_{ij}$, and $\pi_i \circ T$ is the linear functional whose skeleton is the $i$th row of the matrix of $T$.

In the discussion centering around Theorem 1.3, we replaced the vector equation $\mathbf{y} = T(\mathbf{x})$ by the equivalent set of $m$ scalar equations $y_i = \sum_{j=1}^n t_{ij} x_j$, which we obtained by reading off the $i$th coordinate in the vector equation. But in "reading off" the $i$th coordinate we were applying the coordinate mapping $\pi_i$, or in more algebraic terms, we were replacing the linear map $T$ by the set of linear maps $\{\pi_i \circ T\}$, which is equivalent to it by the above theorem.

Now consider in particular the space $\operatorname{Hom}(V, V)$, which we may as well designate '$\operatorname{Hom}(V)$'. In addition to being a vector space, it is also closed under composition, which we consider a multiplication operation.
Since composition of functions is always associative (see Section 0.9), we thus have for multiplication the laws
$$A \circ (B \circ C) = (A \circ B) \circ C,$$
$$A \circ (B + C) = (A \circ B) + (A \circ C),$$
$$(A + B) \circ C = (A \circ C) + (B \circ C),$$
$$k(A \circ B) = (kA) \circ B = A \circ (kB).$$
Any vector space which has in addition to the vector operations an operation of multiplication related to the vector operations in the above ways is called an algebra. Thus,

Theorem 3.5. $\operatorname{Hom}(V)$ is an algebra.

We noticed earlier that certain real-valued function spaces are also algebras. Examples were $\mathbb{R}^A$ and $\mathcal{C}([0, 1])$. In these cases multiplication is commutative, but in the case of $\operatorname{Hom}(V)$ multiplication is not commutative unless $V$ is a trivial space ($V = \{0\}$) or $V$ is isomorphic to $\mathbb{R}$. We shall check this later when we examine the finite-dimensional theory in greater detail.
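A small experiment makes the failure of commutativity concrete. The following Python sketch assumes three sample linear maps on $\mathbb{R}^2$ (the names `A`, `B`, `C` and the helpers are illustrative). It compares $A \circ B$ with $B \circ A$ on the standard basis pairs and spot-checks one distributive law there; since a linear map on $\mathbb{R}^2$ is determined by its values on these two pairs, agreement on them is enough to compare linear maps.

```python
def A(v):
    """Sample linear map <v1, v2> -> <v2, 0>."""
    return (v[1], 0)

def B(v):
    """Sample linear map <v1, v2> -> <0, v1>."""
    return (0, v[0])

def C(v):
    """Sample linear map <v1, v2> -> <v1 + v2, v1 - v2>."""
    return (v[0] + v[1], v[0] - v[1])

def compose(F, G):
    """The product F o G in Hom(V): v -> F(G(v))."""
    return lambda v: F(G(v))

def add_maps(F, G):
    """The sum F + G, defined pointwise."""
    return lambda v: tuple(p + q for p, q in zip(F(v), G(v)))

basis = [(1, 0), (0, 1)]

# Values of A o B and B o A on the basis pairs.
AB = [compose(A, B)(e) for e in basis]
BA = [compose(B, A)(e) for e in basis]
commute = AB == BA

# Left distributive law A o (B + C) = (A o B) + (A o C), checked on the basis.
left_dist = all(
    compose(A, add_maps(B, C))(e)
    == add_maps(compose(A, B), compose(A, C))(e)
    for e in basis
)
```

With these sample maps `commute` comes out false while `left_dist` holds, in line with the laws listed above.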
Product projections and injections. In addition to the coordinate projections, there is a second class of simple linear mappings that is of basic importance in the handling of a Cartesian product space $W = \prod_{k \in K} W_k$. These are, for each $j$, the mapping $\theta_j$ taking a vector $\alpha \in W_j$ to the function in the product space having the value $\alpha$ at the index $j$ and 0 elsewhere. For example, $\theta_2$ for $W_1 \times W_2 \times W_3$ is the mapping $\alpha \mapsto \langle 0, \alpha, 0 \rangle$ from $W_2$ to $W$. Or if we view $\mathbb{R}^3$ as $\mathbb{R} \times \mathbb{R}^2$, then $\theta_2$ is the mapping $\langle x_2, x_3 \rangle \mapsto \langle 0, \langle x_2, x_3 \rangle \rangle = \langle 0, x_2, x_3 \rangle$. We call $\theta_j$ the injection of $W_j$ into $\prod_k W_k$. The linearity of $\theta_j$ is probably obvious.

The mappings $\pi_j$ and $\theta_j$ are clearly connected, and the following projection-injection identities state their exact relationship. If $I_j$ is the identity transformation on $W_j$, then
$$\pi_j \circ \theta_j = I_j \quad\text{and}\quad \pi_j \circ \theta_i = 0 \ \text{ if } i \ne j.$$
If $K$ is finite and $I$ is the identity on the product space $W$, then
$$\sum_{k \in K} \theta_k \circ \pi_k = I.$$
In the case $\prod_{i=1}^3 W_i$, we have $\theta_2 \circ \pi_2(\langle \alpha_1, \alpha_2, \alpha_3 \rangle) = \langle 0, \alpha_2, 0 \rangle$, and the identity simply says that $\langle \alpha_1, 0, 0 \rangle + \langle 0, \alpha_2, 0 \rangle + \langle 0, 0, \alpha_3 \rangle = \langle \alpha_1, \alpha_2, \alpha_3 \rangle$ for all $\alpha_1, \alpha_2, \alpha_3$. These identities will probably be clear to the reader, and we leave the formal proofs as an exercise.

The coordinate projections $\pi_j$ are useful in the study of any product space, but because of the limitation in the above identity, the injections $\theta_j$ are of interest principally in the case of finite products. Together they enable us to decompose and reassemble linear maps whose domains or codomains are finite product spaces.

For a simple example, consider the $T$ in $\operatorname{Hom}(\mathbb{R}^3, \mathbb{R}^2)$ whose matrix is
$$\begin{bmatrix} 2 & -1 & 1 \\ 1 & 1 & 4 \end{bmatrix}.$$
Then $\pi_1 \circ T$ is the linear functional whose skeleton $\langle 2, -1, 1 \rangle$ is the first row in the matrix of $T$, and we know that we can visualize its expression in equation form, $y_1 = 2x_1 - x_2 + x_3$, as being obtained from the vector equation $\mathbf{y} = T(\mathbf{x})$ by "reading off the first row".
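The projection-injection identities can be verified mechanically on a small product space. The sketch below assumes the sample product $W = W_1 \times W_2 \times W_3$ with $W_1 = \mathbb{R}$, $W_2 = \mathbb{R}^2$, $W_3 = \mathbb{R}$, models an element of $W$ as a tuple of tuples, and numbers the factors from 0 as Python does; the names `proj`, `inj`, `add`, and `ZERO` are illustrative.

```python
# The zero vector of each factor W1 = R, W2 = R^2, W3 = R (sample choice).
ZERO = ((0,), (0, 0), (0,))

def proj(j, w):
    """Coordinate projection pi_j: W -> W_j (factors are numbered from 0)."""
    return w[j]

def inj(j, a):
    """Injection theta_j: W_j -> W, with value a at index j and 0 elsewhere."""
    return tuple(a if k == j else ZERO[k] for k in range(len(ZERO)))

def add(u, w):
    """Addition in the product space, done factor by factor."""
    return tuple(tuple(p + q for p, q in zip(uk, wk)) for uk, wk in zip(u, w))

w = ((5,), (2, -1), (7,))   # a sample element of W

# pi_j o theta_j = I_j on each factor.
first = all(proj(j, inj(j, w[j])) == w[j] for j in range(3))

# pi_j o theta_i = 0 whenever i != j.
second = all(proj(j, inj(i, w[i])) == ZERO[j]
             for i in range(3) for j in range(3) if i != j)

# sum over k of theta_k o pi_k is the identity on the finite product.
total = ZERO
for k in range(3):
    total = add(total, inj(k, proj(k, w)))
third = total == w
```

The last check is exactly the statement that a triple is the sum of its three "one-slot" pieces.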
Thus we "decompose" $T$ into the two linear functionals $l_i = \pi_i \circ T$. Then, speaking loosely, we have the reassembly $T = \langle l_1, l_2 \rangle$; more exactly, $T(\mathbf{x}) = \langle 2x_1 - x_2 + x_3,\ x_1 + x_2 + 4x_3 \rangle = \langle l_1(\mathbf{x}), l_2(\mathbf{x}) \rangle$ for all $\mathbf{x}$. However, we want to present this reassembly as the action of the linear maps $\theta_1$ and $\theta_2$. We have
$$\langle l_1(\mathbf{x}), l_2(\mathbf{x}) \rangle = \theta_1(l_1(\mathbf{x})) + \theta_2(l_2(\mathbf{x})) = (\theta_1 \circ \pi_1 + \theta_2 \circ \pi_2)(T(\mathbf{x})) = T(\mathbf{x}),$$
which shows that the decomposition and reassembly of $T$ is an expression of the identity $\sum \theta_i \circ \pi_i = I$. In general, if $T \in \operatorname{Hom}(V, W)$ and $W = \prod_i W_i$, then $T_i = \pi_i \circ T$ is in $\operatorname{Hom}(V, W_i)$ for each $i$, and $T_i$ can be considered "the part of $T$ going into $W_i$", since $T_i(\alpha)$ is the $i$th coordinate of $T(\alpha)$ for each $\alpha$. Then we
can reassemble the $T_i$'s to form $T$ again by $T = \sum \theta_i \circ T_i$, for $\sum \theta_i \circ T_i = \left(\sum \theta_i \circ \pi_i\right) \circ T = I \circ T = T$.

Moreover, any finite collection of $T_i$'s on a common domain can be put together in this way to make a $T$. For example, we can assemble an $m$-tuple $\{T_i\}_1^m$ of linear maps on a common domain $V$ to form a single $m$-tuple-valued linear map $T$. Given $\alpha$ in $V$, we simply define $T(\alpha)$ as that $m$-tuple whose $i$th coordinate is $T_i(\alpha)$ for $i = 1, \dots, m$, and then check that $T$ is linear. Thus without having to calculate, we see from this assembly principle that $T: \mathbf{x} \mapsto \langle 2x_1 - x_2 + x_3,\ x_1 + x_2 + 4x_3 \rangle$ is a linear mapping from $\mathbb{R}^3$ to $\mathbb{R}^2$, since we have formed $T$ by assembling the two linear functionals $l_1(\mathbf{x}) = 2x_1 - x_2 + x_3$ and $l_2(\mathbf{x}) = x_1 + x_2 + 4x_3$ to form a single ordered-pair-valued map. This very intuitive process has an equally simple formal justification. We rigorize our discussion in the following theorem.

Theorem 3.6. If $T_i$ is in $\operatorname{Hom}(V, W_i)$ for each $i$ in a finite index set $I$, and if $W$ is the product space $\prod_{i \in I} W_i$, then there is a uniquely determined $T$ in $\operatorname{Hom}(V, W)$ such that $T_i = \pi_i \circ T$ for all $i$ in $I$.

Proof. If $T$ exists such that $T_i = \pi_i \circ T$ for each $i$, then
$$T = I_W \circ T = \left(\sum \theta_i \circ \pi_i\right) \circ T = \sum \theta_i \circ (\pi_i \circ T) = \sum \theta_i \circ T_i.$$
Thus $T$ is uniquely determined as $\sum \theta_i \circ T_i$. Moreover, this $T$ does have the required property, since then
$$\pi_j \circ T = \pi_j \circ \left(\sum_i \theta_i \circ T_i\right) = \sum_i (\pi_j \circ \theta_i) \circ T_i = I_j \circ T_j = T_j. \qquad \square$$

In the same way, we can decompose a linear $T$ whose domain is a product space $V = \prod_{j=1}^n V_j$ into the maps $T_j = T \circ \theta_j$ with domains $V_j$, and then reassemble these maps to form $T$ by the identity $T = \sum_{j=1}^n T_j \circ \pi_j$ (check it mentally!). Moreover, a finite collection of maps into a common codomain space can be put together to form a single map on the product of the domain spaces. Thus an $n$-tuple of maps $\{T_i\}_1^n$ into $W$ defines a single map $T$ into $W$, where the domain of $T$ is the product of the domains of the $T_i$'s, by the equation
$$T(\langle \alpha_1, \dots, \alpha_n \rangle) = \sum_1^n T_i(\alpha_i) \quad\text{or}\quad T = \sum_1^n T_i \circ \pi_i.$$
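Both assembly principles can be sketched in a few lines of Python. The code below is an illustration under stated assumptions: `assemble` builds the tuple-valued map of Theorem 3.6 from the functionals $l_1$ and $l_2$ used above, and `assemble_on_product` builds a map on $\mathbb{R}^3 = \mathbb{R} \times \mathbb{R} \times \mathbb{R}$ from three maps $t \mapsto t\,\mathbf{c}_i$ into $\mathbb{R}^2$, where the pairs in `cols` are sample data chosen here so that the two constructions agree. The helper names are not from the text.

```python
def l1(x):
    """The functional l1(x) = 2*x1 - x2 + x3."""
    return 2 * x[0] - x[1] + x[2]

def l2(x):
    """The functional l2(x) = x1 + x2 + 4*x3."""
    return x[0] + x[1] + 4 * x[2]

def assemble(maps):
    """Codomain side: the map a -> <T_1(a), ..., T_m(a)>."""
    return lambda a: tuple(Ti(a) for Ti in maps)

def assemble_on_product(columns):
    """Domain side: T(<t_1, ..., t_n>) = sum of t_i * columns[i]."""
    def T(x):
        out = (0,) * len(columns[0])
        for t, col in zip(x, columns):
            out = tuple(o + t * c for o, c in zip(out, col))
        return out
    return T

T_rows = assemble([l1, l2])

# Sample maps T_i: R -> R^2, t -> t * cols[i].
cols = [(2, 1), (-1, 1), (1, 4)]
T_cols = assemble_on_product(cols)

x = (1, 2, 3)
value_rows = T_rows(x)
value_cols = T_cols(x)

# Theorem 3.6 property: the i-th coordinate of T(x) is T_i(x).
reads_off = (value_rows[0] == l1(x)) and (value_rows[1] == l2(x))
```

The two constructions give the same value at `x` because the sample pairs in `cols` are the columns of the matrix whose rows are the skeletons of $l_1$ and $l_2$.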
For example, if $T_1: \mathbb{R} \to \mathbb{R}^2$ is the map $t \mapsto t\langle 2, 1 \rangle = \langle 2t, t \rangle$, and $T_2$ and $T_3$ are similarly the maps $t \mapsto t\langle -1, 1 \rangle$ and $t \mapsto t\langle 1, 4 \rangle$, then $T = \sum_1^3 T_i \circ \pi_i$ is the mapping from $\mathbb{R}^3$ to $\mathbb{R}^2$ whose matrix is
$$\begin{bmatrix} 2 & -1 & 1 \\ 1 & 1 & 4 \end{bmatrix}.$$
Again there is a simple formal argument, and we shall ask the reader to write out the proof of the following theorem.

Theorem 3.7. If $T_j$ is in $\operatorname{Hom}(V_j, W)$ for each $j$ in a finite index set $J$, and if $V = \prod_{j \in J} V_j$, then there exists a unique $T$ in $\operatorname{Hom}(V, W)$ such that $T \circ \theta_j = T_j$ for each $j$ in $J$.

Finally we should mention that Theorem 3.6 holds for all product spaces, finite or not, and states a property that characterizes product spaces. We shall
investigate this situation in the exercises. The proof of the general case of Theorem 3.6 has to get along without the injections $\theta_j$; instead, it is an application of Theorem 3.4.

The reader may feel that we are being overly formal in using the projections $\pi_i$ and the injections $\theta_i$ to give algebraic formulations of processes that are easily visualized directly, such as reading off the scalar "components" of a vector equation. However, the mappings
$$\langle x_1, \dots, x_n \rangle \mapsto x_j \quad\text{and}\quad x_j \mapsto \langle 0, \dots, 0, x_j, 0, \dots, 0 \rangle$$
are clearly fundamental devices, and making their relationships explicit now will be helpful to us later on when we have to handle their occurrences in more complicated situations.

EXERCISES

3.1 Show that $\mathbb{R}^m \times \mathbb{R}^n$ is isomorphic to $\mathbb{R}^{m+n}$.

3.2 Show more generally that if $\sum_1^k m_i = n$, then $\prod_{i=1}^k \mathbb{R}^{m_i}$ is isomorphic to $\mathbb{R}^n$.

3.3 Show that if $\{B, C\}$ is a partitioning of $A$, then $\mathbb{R}^A$ and $\mathbb{R}^B \times \mathbb{R}^C$ are isomorphic.

3.4 Generalize the above to the case where $\{A_i\}_1^n$ partitions $A$.

3.5 Show that a mapping $T$ from a vector space $V$ to a vector space $W$ is linear if and only if (the graph of) $T$ is a subspace of $V \times W$.

3.6 Let $S$ and $T$ be nonzero linear maps from $V$ to $W$. The definition of the map $S + T$ is not the same as the set sum of (the graphs of) $S$ and $T$ as subspaces of $V \times W$. Show that the set sum of (the graphs of) $S$ and $T$ cannot be a graph unless $S = T$.

3.7 Give the justification for each step of the calculation in Theorem 3.2.

3.8 Prove the distributive laws given in Theorem 3.3.

3.9 Let $D: \mathcal{C}^1([a, b]) \to \mathcal{C}([a, b])$ be differentiation, and let $S: \mathcal{C}([a, b]) \to \mathbb{R}$ be the definite integral map $f \mapsto \int_a^b f$. Compute the composition $S \circ D$.

3.10 We know that the general linear functional $F$ on $\mathbb{R}^2$ is the map $\mathbf{x} \mapsto a_1 x_1 + a_2 x_2$ determined by the pair $\mathbf{a}$ in $\mathbb{R}^2$, and that the general linear map $T$ in $\operatorname{Hom}(\mathbb{R}^2)$ is determined by a matrix
$$\mathbf{t} = \begin{bmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{bmatrix}.$$
Then $F \circ T$ is another linear functional, and hence is of the form $\mathbf{x} \mapsto b_1 x_1 + b_2 x_2$ for some $\mathbf{b}$ in $\mathbb{R}^2$. Compute $\mathbf{b}$ from $\mathbf{t}$ and $\mathbf{a}$.
Your computation should show you that $\mathbf{a} \mapsto \mathbf{b}$ is linear. What is its matrix?

3.11 Given $S$ and $T$ in $\operatorname{Hom}(\mathbb{R}^2)$ whose matrices are [...] and [...], respectively, find the matrix of $S \circ T$ in $\operatorname{Hom}(\mathbb{R}^2)$.
3.12 Given $S$ and $T$ in $\operatorname{Hom}(\mathbb{R}^2)$ whose matrices are
$$\mathbf{s} = \begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix} \quad\text{and}\quad \mathbf{t} = \begin{bmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{bmatrix},$$
find the matrix of $S \circ T$.

3.13 With the above answer in mind, what would you guess the matrix of $S \circ T$ is if $S$ and $T$ are in $\operatorname{Hom}(\mathbb{R}^3)$? Verify your guess.

3.14 We know that if $T \in \operatorname{Hom}(V, W)$ is an isomorphism, then $T^{-1}$ is an isomorphism in $\operatorname{Hom}(W, V)$. Prove that
$$S \circ T \text{ surjective} \Rightarrow S \text{ surjective}, \qquad S \circ T \text{ injective} \Rightarrow T \text{ injective},$$
and, therefore, that if $T \in \operatorname{Hom}(V, W)$, $S \in \operatorname{Hom}(W, V)$, and $S \circ T = I_V$, $T \circ S = I_W$, then $T$ is an isomorphism.

3.15 Show that if $S^{-1}$ and $T^{-1}$ exist, then $(S \circ T)^{-1}$ exists and equals $T^{-1} \circ S^{-1}$. Give a more careful statement of this result.

3.16 Show that if $S$ and $T$ in $\operatorname{Hom} V$ commute with each other, then the null space of $T$, $N = N(T)$, and its range $R = R(T)$ are invariant under $S$ ($S[N] \subset N$ and $S[R] \subset R$).

3.17 Show that if $\alpha$ is an eigenvector of $T$ and $S$ commutes with $T$, then $S(\alpha)$ is an eigenvector of $T$ and has the same eigenvalue.

3.18 Show that if $S$ commutes with $T$ and $T^{-1}$ exists, then $S$ commutes with $T^{-1}$.

3.19 Given that $\alpha$ is an eigenvector of $T$ with eigenvalue $x$, show that $\alpha$ is also an eigenvector of $T^2 = T \circ T$, of $T^n$, and of $T^{-1}$ (if $T$ is invertible) and that the corresponding eigenvalues are $x^2$, $x^n$, and $1/x$. Given that $p(t)$ is a polynomial in $t$, define the operator $p(T)$, and under the above hypotheses, show that $\alpha$ is an eigenvector of $p(T)$ with eigenvalue $p(x)$.

3.20 If $S$ and $T$ are in $\operatorname{Hom} V$, we say that $S$ doubly commutes with $T$ (and write $S \mathrel{cc} T$) if $S$ commutes with every $A$ in $\operatorname{Hom} V$ which commutes with $T$. Fix $T$, and set $\{T\}'' = \{S : S \mathrel{cc} T\}$. Show that $\{T\}''$ is a commutative subalgebra of $\operatorname{Hom} V$.

3.21 Given $T$ in $\operatorname{Hom} V$ and $\alpha$ in $V$, let $N$ be the linear span of the "trajectory of $\alpha$ under $T$" (the set $\{T^n \alpha : n \in \mathbb{Z}^+\}$). Show that $N$ is invariant under $T$.

3.22 A transformation $T$ in $\operatorname{Hom} V$ such that $T^n = 0$ for some $n$ is said to be nilpotent. Show that if $T$ is nilpotent, then $I - T$ is invertible.
[Hint: The power series 1 °° I — X 0 is a finite sum if x is replaced by T.] 3.23 Suppose that T is nilpotent, that iS commutes with T, and that S ^ exists, where iS, T G Hom 7. Show that (iS — T)-^ exists. 3.24 Let <p be an isomorphism from a vector space F to a vector space W. Show that T —^ (fo To (^-1 is an algebra isomorphism from the algebra Hom V to the algebra Hom W.
3.25 Show the $\pi_i$'s and $\theta_i$'s explicitly for $\mathbb{R}^3 = \mathbb{R} \times \mathbb{R} \times \mathbb{R}$ using the stopped arrow notation. Also write out the identity $\sum \theta_i \circ \pi_i = I$ in explicit form.

3.26 Do the same for $\mathbb{R}^5 = \mathbb{R}^2 \times \mathbb{R}^3$.

3.27 Show that the first two projection-injection identities ($\pi_i \circ \theta_i = I_i$ and $\pi_j \circ \theta_i = 0$ if $j \neq i$) are simply a restatement of the definition of $\theta_i$. Show that the linearity of $\theta_i$ follows formally from these identities and Theorem 3.4.

3.28 Prove the identity $\sum \theta_i \circ \pi_i = I$ by applying $\pi_j$ to the equation and remembering that $f = g$ if $\pi_j(f) = \pi_j(g)$ for all $j$ (this being just the equation $f(j) = g(j)$ for all $j$).

3.29 Prove the general case of Theorem 3.6. We are given an indexed collection of linear maps $\{T_i : i \in I\}$ with common domain $V$ and codomains $\{W_i : i \in I\}$. The first question is how to define $T: V \to W = \prod_i W_i$. Do this by defining $T(\xi)$ suitably for each $\xi \in V$ and then applying Theorem 3.4 to conclude that $T$ is linear.

3.30 Prove Theorem 3.7.

3.31 We know without calculation that the map
\[ T: \mathbf{x} \mapsto \langle 3x_1 - x_2 + x_3,\; x_2 + x_3,\; x_1 - 5x_3,\; 2x_1 \rangle \]
from $\mathbb{R}^3$ to $\mathbb{R}^4$ is linear. Why? (Cite relevant theorems from the text.)

3.32 Write down the matrix for the transformation $T$ in the above example, and then write down the mappings $T \circ \theta_i$ from $\mathbb{R}$ to $\mathbb{R}^4$ (for $i = 1, 2, 3$) in explicit ordered quadruplet form.

3.33 Let $W = \prod_1^n W_i$ be a finite product vector space and set $P_i = \theta_i \circ \pi_i$, so that $P_i$ is in $\operatorname{Hom} W$ for all $i$. Prove from the projection-injection identities that $\sum_1^n P_i = I$ (the identity map on $W$), $P_i \circ P_j = 0$ if $i \neq j$, and $P_i \circ P_i = P_i$. Identify the range $R_i = R(P_i)$.

3.34 In the context of the above exercise, define $T$ in $\operatorname{Hom} W$ as
\[ T = \sum_{i=1}^{n} i P_i. \]
Show that $\alpha$ is an eigenvector of $T$ if and only if $\alpha$ is in one of the subspaces $R_i$ and that then the eigenvalue of $\alpha$ is $i$.

3.35 In the same situation show that the polynomial
\[ \prod_{j=1}^{n} (T - jI) = (T - I) \circ \cdots \circ (T - nI) \]
is the zero transformation.

3.36 Theorems 3.6 and 3.7 can be combined if $T \in \operatorname{Hom}(V, W)$, where both $V$ and $W$ are product spaces:
\[ V = \prod_{1}^{n} V_j \quad\text{and}\quad W = \prod_{1}^{m} W_i. \]
State and prove a theorem which says that such a $T$ can be decomposed into a doubly indexed family $\{T_{ij}\}$, where $T_{ij} \in \operatorname{Hom}(V_j, W_i)$, and conversely that any such doubly indexed family can be assembled to form a single $T$ from $V$ to $W$.
3.37 Apply your theorem to the special case where $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$ (that is, $V_j = W_i = \mathbb{R}$ for all $i$ and $j$). Now $T_{ij}$ is from $\mathbb{R}$ to $\mathbb{R}$ and hence is simply multiplication by a number $t_{ij}$. Show that the indexed collection $\{t_{ij}\}$ of these numbers is the matrix of $T$.

3.38 Given an $m$-tuple of vector spaces $\{W_i\}_1^m$, suppose that there are a vector space $X$ and maps $p_i$ in $\operatorname{Hom}(X, W_i)$, $i = 1, \ldots, m$, with the following property:

P. For any $m$-tuple of linear maps $\{T_i\}$ from a common domain space $V$ to the above spaces $W_i$ (so that $T_i \in \operatorname{Hom}(V, W_i)$, $i = 1, \ldots, m$), there is a unique $T$ in $\operatorname{Hom}(V, X)$ such that $T_i = p_i \circ T$, $i = 1, \ldots, m$.

Prove that there is a "canonical" isomorphism from $W = \prod_1^m W_i$ to $X$ under which the given maps $p_i$ become the projections $\pi_i$. [Remark: The product space $W$ itself has property P by Theorem 3.6, and this exercise therefore shows that P is an abstract characterization of the product space.]

4. AFFINE SUBSPACES AND QUOTIENT SPACES

In this section we shall look at the "planes" in a vector space $V$ and see what happens to them when we translate them, intersect them with each other, take their images under linear maps, and so on. Then we shall confine ourselves to the set of all planes that are translates of a fixed subspace and discover that this set itself is a vector space in the most obvious way. Some of this material has been anticipated in Section 2.

Affine subspaces. If $N$ is a subspace of a vector space $V$ and $\alpha$ is any vector of $V$, then the set $N + \alpha = \{\xi + \alpha : \xi \in N\}$ is called either the coset of $N$ containing $\alpha$ or the affine subspace of $V$ through $\alpha$ and parallel to $N$. The set $N + \alpha$ is also called the translate of $N$ through $\alpha$. We saw in Section 2 that affine subspaces are the general objects that we want to call planes. If $N$ is given and fixed in a discussion, we shall use the notation $\bar{\alpha} = N + \alpha$ (see Section 0.12).

We begin with a list of some simple properties of affine subspaces. Some of these will generalize observations already made in Section 2, and the proofs of some will be left as exercises.

1) With a fixed subspace $N$ assumed, if $\gamma \in \bar{\alpha}$, then $\bar{\gamma} = \bar{\alpha}$. For if $\gamma = \alpha + \eta_0$, then $\gamma + \eta = \alpha + (\eta_0 + \eta) \in \bar{\alpha}$, so $\bar{\gamma} \subset \bar{\alpha}$. Also $\alpha + \eta = \gamma + (\eta - \eta_0) \in \bar{\gamma}$, so $\bar{\alpha} \subset \bar{\gamma}$. Thus $\bar{\alpha} = \bar{\gamma}$.

2) With $N$ fixed, for any $\alpha$ and $\beta$, either $\bar{\alpha} = \bar{\beta}$ or $\bar{\alpha}$ and $\bar{\beta}$ are disjoint. For if $\bar{\alpha}$ and $\bar{\beta}$ are not disjoint, then there exists a $\gamma$ in each, and $\bar{\alpha} = \bar{\gamma} = \bar{\beta}$ by (1). The reader may find it illuminating to compare these calculations with the more general ones of Section 0.12. Here $\alpha \sim \beta$ if and only if $\alpha - \beta \in N$.

3) Now let $\mathcal{A}$ be the collection of all affine subspaces of $V$; $\mathcal{A}$ is thus the set of all cosets of all vector subspaces of $V$. Then the intersection of any subfamily of $\mathcal{A}$ is either empty or itself an affine subspace. In fact, if $\{A_i\}_{i \in I}$ is an indexed collection of affine subspaces and $A_i$ is a coset of the vector subspace $W_i$ for each $i \in I$, then $\bigcap_{i \in I} A_i$ is either empty or a coset of the vector subspace $\bigcap_{i \in I} W_i$. For if $\beta \in \bigcap_{i \in I} A_i$, then (1) implies that $A_i = \beta + W_i$ for all $i$, and then $\bigcap A_i = \beta + \bigcap W_i$.

4) If $A, B \in \mathcal{A}$, then $A + B \in \mathcal{A}$. That is, the set sum of any two affine subspaces is itself an affine subspace.

5) If $A \in \mathcal{A}$ and $T \in \operatorname{Hom}(V, W)$, then $T[A]$ is an affine subspace of $W$. In particular, if $t \in \mathbb{R}$, then $tA \in \mathcal{A}$.

6) If $B$ is an affine subspace of $W$ and $T \in \operatorname{Hom}(V, W)$, then $T^{-1}[B]$ is either empty or an affine subspace of $V$.

7) For a fixed $\alpha \in V$ the translation of $V$ through $\alpha$ is the mapping $S_\alpha: V \to V$ defined by $S_\alpha(\xi) = \xi + \alpha$ for all $\xi \in V$. Translation is not linear; for example, $S_\alpha(0) = \alpha$. It is clear, however, that translation carries affine subspaces into affine subspaces. Thus $S_\alpha(A) = A + \alpha$ and $S_\alpha(\beta + W) = (\alpha + \beta) + W$.

8) An affine transformation from a vector space $V$ to a vector space $W$ is a linear mapping from $V$ to $W$ followed by a translation in $W$. Thus an affine transformation is of the form $\xi \mapsto T(\xi) + \beta$, where $T \in \operatorname{Hom}(V, W)$ and $\beta \in W$. Note that $\xi \mapsto T(\xi + \alpha)$ is affine, since
\[ T(\xi + \alpha) = T(\xi) + \beta, \quad\text{where } \beta = T(\alpha). \]
It follows from (5) and (7) that an affine transformation carries affine subspaces of $V$ into affine subspaces of $W$.

Quotient space. Now fix a subspace $N$ of $V$, and consider the set $W$ of all translates (cosets) of $N$. We are going to see that $W$ itself is a vector space in the most natural way possible. Addition will be set addition, and scalar multiplication will be set multiplication (except in one special case). For example, if $N$ is a line through the origin in $\mathbb{R}^3$, then $W$ consists of all lines in $\mathbb{R}^3$ parallel to $N$.
We are saying that this set of parallel lines will automatically turn out to be a vector space: the set sum of any two of the lines in $W$ turns out to be a line in $W$! And if $L \in W$ and $t \neq 0$, then the set product $tL$ is a line in $W$. The translates of $L$ fiber $\mathbb{R}^3$, and the set of fibers is a natural vector space.

During this discussion it will be helpful temporarily to indicate set sums by '$+_s$' and set products by '$\cdot_s$'. With $N$ fixed, it follows from (2) above that two cosets are disjoint or identical, so that the set $W$ of all cosets is a fibering of $V$ in the general case, just as it was in our example of the parallel lines. From (4) or by a direct calculation we know that $\bar{\alpha} +_s \bar{\beta} = \overline{\alpha + \beta}$. Thus $W$ is closed under set addition, and, naturally, we take this to be our operation of addition on $W$. That is, we define $+$ on $W$ by $\bar{\alpha} + \bar{\beta} = \bar{\alpha} +_s \bar{\beta}$. Then the natural map $\pi: \alpha \mapsto \bar{\alpha}$ from $V$ to $W$ preserves addition, $\pi(\alpha + \beta) = \pi(\alpha) + \pi(\beta)$, since
this is just our equation $\overline{\alpha + \beta} = \bar{\alpha} + \bar{\beta}$ above. Similarly, if $t \in \mathbb{R}$, then the set product $t \cdot_s \bar{\alpha}$ is either $\overline{t\alpha}$ or $\{0\}$. Hence if we define $t\bar{\alpha}$ as the set product when $t \neq 0$ and as $\bar{0} = N$ when $t = 0$, then $\pi$ also preserves scalar multiplication, $\pi(t\alpha) = t\pi(\alpha)$.

We thus have two vectorlike operations on the set $W$ of all cosets of $N$, and we naturally expect $W$ to turn out to be a vector space. We could prove this by verifying all the laws, but it is more elegant to notice the general setting for such a verification proof.

Theorem 4.1. Let $V$ be a vector space, and let $W$ be a set having two vectorlike operations, which we designate in the usual way. Suppose that there exists a surjective mapping $T: V \to W$ which preserves the operations: $T(s\alpha + t\beta) = sT(\alpha) + tT(\beta)$. Then $W$ is a vector space.

Proof. We have to check laws A1 through S4. However, one example should make it clear to the reader how to proceed. We show that $T(0)$ satisfies A3 and hence is the zero vector of $W$. Since every $\beta \in W$ is of the form $T(\alpha)$, we have
\[ T(0) + \beta = T(0) + T(\alpha) = T(0 + \alpha) = T(\alpha) = \beta, \]
which is A3. We shall ask the reader to check more of the laws in the exercises. □

Theorem 4.2. The set of cosets of a fixed subspace $N$ of a vector space $V$ themselves form a vector space, called the quotient space $V/N$, under the above natural operations, and the projection $\pi$ is a surjective linear map from $V$ to $V/N$.

Theorem 4.3. If $T$ is in $\operatorname{Hom}(V, W)$, and if the null space of $T$ includes the subspace $M \subset V$, then $T$ has a unique factorization through $V/M$. That is, there exists a unique transformation $S$ in $\operatorname{Hom}(V/M, W)$ such that $T = S \circ \pi$.

Proof. Since $T$ is zero on $M$, it follows that $T$ is constant on each coset $A$ of $M$, so that $T[A]$ contains only one vector. If we define $S(A)$ to be the unique vector in $T[A]$, then $S(\bar{\alpha}) = T(\alpha)$, so $S \circ \pi = T$ by definition. Conversely, if $T = R \circ \pi$, then $R(\bar{\alpha}) = R \circ \pi(\alpha) = T(\alpha)$, and $R$ is our above $S$. The linearity of $S$ is practically obvious. Thus
\[ S(\bar{\alpha} + \bar{\beta}) = S(\overline{\alpha + \beta}) = T(\alpha + \beta) = T(\alpha) + T(\beta) = S(\bar{\alpha}) + S(\bar{\beta}), \]
and homogeneity follows similarly. This completes the proof. □

One more remark is of interest here. If $N$ is invariant under a linear map $T$ in $\operatorname{Hom} V$ (that is, $T[N] \subset N$), then for each $\alpha$ in $V$, $T[\bar{\alpha}]$ is a subset of the coset $\overline{T(\alpha)}$, for
\[ T[\bar{\alpha}] = T[\alpha + N] = T(\alpha) +_s T[N] \subset T(\alpha) +_s N = \overline{T(\alpha)}. \]
There is therefore a map $S: V/N \to V/N$ defined by the requirement that $S(\bar{\alpha}) = \overline{T(\alpha)}$ (or $S \circ \pi = \pi \circ T$), and it is easy to check that $S$ is linear. Therefore,

Theorem 4.4. If a subspace $N$ of a vector space $V$ is carried into itself by a transformation $T$ in $\operatorname{Hom} V$, then there is a unique transformation $S$ in $\operatorname{Hom}(V/N)$ such that $S \circ \pi = \pi \circ T$.

EXERCISES

4.1 Prove properties (4), (5), and (6) of affine subspaces.

4.2 Choose an origin $O$ in the Euclidean plane $\mathbb{E}^2$ (your sheet of paper), and let $L_1$ and $L_2$ be two parallel lines not containing $O$. Let $X$ and $Y$ be distinct points on $L_1$ and $Z$ any point on $L_2$. Draw the figure giving the geometric sums $\overrightarrow{OX} + \overrightarrow{OZ}$ and $\overrightarrow{OY} + \overrightarrow{OZ}$ (parallelogram rule), and state the theorem from plane geometry that says that these two sum points are on a third line $L_3$ parallel to $L_1$ and $L_2$.

4.3 a) Prove the associative law for addition for Theorem 4.1.
b) Prove also laws A4 and S2.

4.4 Return now to Exercise 2.1 and reexamine the situation in the light of Theorem 4.1. Show, finally, how we really know that the geometric vectors form a vector space.

4.5 Prove that the mapping $S$ of Theorem 4.3 is injective if and only if $N$ is the null space of $T$.

4.6 We know from Exercise 4.5 that if $T$ is a surjective element of $\operatorname{Hom}(V, W)$ and $N$ is the null space of $T$, then the $S$ of Theorem 4.3 is an isomorphism from $V/N$ to $W$. Its inverse $S^{-1}$ assigns a coset of $N$ to each $\eta$ in $W$. Show that the process of "indefinite integration" is an example of such a map $S^{-1}$. This is the process of calculating an integral and adding an arbitrary constant, as in
\[ \int \sin x \, dx = -\cos x + c. \]

4.7 Suppose that $N$ and $M$ are subspaces of a vector space $V$ and that $N \subset M$. Show that then $M/N$ is a subspace of $V/N$ and that $V/M$ is naturally isomorphic to the quotient space $(V/N)/(M/N)$. [Hint: Every coset of $N$ is a subset of some coset of $M$.]

4.8 Suppose that $N$ and $M$ are any subspaces of a vector space $V$. Prove that $(M + N)/N$ is naturally isomorphic to $M/(M \cap N)$. (Start with the fact that each coset of $M \cap N$ is included in a unique coset of $N$.)

4.9 Prove that the map $S$ of Theorem 4.4 is linear.

4.10 Given $T \in \operatorname{Hom} V$, show that $T^2 = 0$ ($T^2 = T \circ T$) if and only if $R(T) \subset N(T)$.

4.11 Suppose that $T \in \operatorname{Hom} V$ and the subspace $N$ are such that $T$ is the identity on $N$ and also on $V/N$. The latter assumption is that the $S$ of Theorem 4.4 is the identity on $V/N$. Set $R = T - I$, and use the above exercise to show that $R^2 = 0$. Show that if $T = I + R$ and $R^2 = 0$, then there is a subspace $N$ such that $T$ is the identity on $N$ and also on $V/N$.
4.12 We now view the above situation a little differently. Supposing that $T$ is the identity on $N$ and on $V/N$, and setting $R = T - I$, show that there exists a $K \in \operatorname{Hom}(V/N, V)$ such that $R = K \circ \pi$. Show that for any coset $A$ of $N$ the action of $T$ on $A$ can be viewed as translation through $K(A)$. That is, if $\xi \in A$ and $\eta = K(A)$, then $T(\xi) = \xi + \eta$.

4.13 Consider the map $T: \langle x_1, x_2 \rangle \mapsto \langle x_1 + 2x_2, x_2 \rangle$ in $\operatorname{Hom} \mathbb{R}^2$, and let $N$ be the null space of $R = T - I$. Identify $N$ and show that $T$ is the identity on $N$ and on $\mathbb{R}^2/N$. Find the map $K$ of the above exercise. Such a mapping $T$ is called a shear transformation of $V$ parallel to $N$. Draw the unit square and its image under $T$.

4.14 If we remember that the linear span $L(A)$ of a subset $A$ of a vector space $V$ can be defined as the intersection of all the subspaces of $V$ that include $A$, then the fact that the intersection of any collection of affine subspaces of a vector space $V$ is either an affine subspace or empty suggests that we define the affine span $M(A)$ of a nonempty subset $A \subset V$ as the intersection of all affine subspaces including $A$. Then we know from (3) in our list of affine properties that $M(A)$ is an affine subspace, and by its definition above that it is the smallest affine subspace including $A$. We now naturally wonder whether $M(A)$ can be directly described in terms of linear combinations. Show first that if $\alpha \in A$, then $M(A) = L(A - \alpha) + \alpha$; then prove that $M(A)$ is the set of all linear combinations $\sum x_i \alpha_i$ on $A$ such that $\sum x_i = 1$.

4.15 Show that the linear span of a set $B$ is the affine span of $B \cup \{0\}$.

4.16 Show that $M(A + \gamma) = M(A) + \gamma$ for any $\gamma$ in $V$ and that $M(xA) = xM(A)$ for any $x$ in $\mathbb{R}$.

5. DIRECT SUMS

We come now to the heart of the chapter.
It frequently happens that the study of some phenomenon on a vector space $V$ leads to a finite collection of subspaces $\{V_i\}$ such that $V$ is naturally isomorphic to the product space $\prod_i V_i$. Under this isomorphism the maps $\theta_i \circ \pi_i$ on the product space become certain maps $P_i$ in $\operatorname{Hom} V$, and the projection-injection identities are reflected in the identities $\sum P_i = I$, $P_j \circ P_j = P_j$ for all $j$, and $P_i \circ P_j = 0$ if $i \neq j$. Also, $V_i = \operatorname{range} P_i$. The product structure that $V$ thus acquires is then used to study the phenomenon that gave rise to it. For example, this is the way that we unravel the structure of a linear transformation in $\operatorname{Hom} V$, the study of which is one of the central problems in linear algebra.

Direct sums. If $V_1, \ldots, V_n$ are subspaces of the vector space $V$, then the mapping $\pi: \langle \alpha_1, \ldots, \alpha_n \rangle \mapsto \sum_1^n \alpha_i$ is a linear transformation from $\prod_1^n V_i$ to $V$, since it is the sum $\pi = \sum_1^n \pi_i$ of the coordinate projections.

Definition. We shall say that the $V_i$'s are independent if $\pi$ is injective and that $V$ is the direct sum of the $V_i$'s if $\pi$ is an isomorphism. We express the latter relationship by writing $V = V_1 \oplus \cdots \oplus V_n = \bigoplus_1^n V_i$.

Thus $V = \bigoplus_{i=1}^n V_i$ if and only if $\pi$ is injective and surjective, i.e., if and only if the subspaces $\{V_i\}_1^n$ are both independent and span $V$. A useful restatement of the direct sum condition is that each $\alpha \in V$ is uniquely expressible as a sum $\sum_1^n \alpha_i$, with $\alpha_i \in V_i$ for all $i$; $\alpha$ has some such expression because the $V_i$'s span $V$, and the expression is unique by their independence.

For example, let $V = \mathcal{C}(\mathbb{R})$ be the space of real-valued continuous functions on $\mathbb{R}$, let $V_e$ be the subset of even functions (functions $f$ such that $f(-x) = f(x)$ for all $x$), and let $V_o$ be the subset of odd functions (functions such that $f(-x) = -f(x)$ for all $x$). It is clear that $V_e$ and $V_o$ are subspaces of $V$, and we claim that $V = V_e \oplus V_o$. To see this, note that for any $f$ in $V$, $g(x) = (f(x) + f(-x))/2$ is even, $h(x) = (f(x) - f(-x))/2$ is odd, and $f = g + h$. Thus $V = V_e + V_o$. Moreover, this decomposition of $f$ is unique, for if $f = g_1 + h_1$ also, where $g_1$ is even and $h_1$ is odd, then $g - g_1 = h_1 - h$, and therefore $g - g_1 = 0 = h_1 - h$, since the only function that is both even and odd is zero. The even-odd components of $e^x$ are the hyperbolic cosine and sine functions:
\[ e^x = \frac{e^x + e^{-x}}{2} + \frac{e^x - e^{-x}}{2} = \cosh x + \sinh x. \]

Since $\pi$ is injective if and only if its null space is $\{0\}$ (Lemma 1.1), we have:

Lemma 5.1. The independence of the subspaces $\{V_i\}_1^n$ is equivalent to the property that if $\alpha_i \in V_i$ for all $i$ and $\sum_1^n \alpha_i = 0$, then $\alpha_i = 0$ for all $i$.

Corollary. If the subspaces $\{V_i\}_1^n$ are independent, $\alpha_i \in V_i$ for all $i$, and $\sum_1^n \alpha_i$ is an element of $V_j$, then $\alpha_i = 0$ for $i \neq j$.

We leave the proof to the reader. The case of two subspaces is particularly simple.

Lemma 5.2. The subspaces $M$ and $N$ of $V$ are independent if and only if $M \cap N = \{0\}$.

Proof. If $\alpha \in M$, $\beta \in N$, and $\alpha + \beta = 0$, then $\alpha = -\beta \in M \cap N$. If $M \cap N = \{0\}$, this will further imply that $\alpha = \beta = 0$, so $M$ and $N$ are independent. On the other hand, if $0 \neq \beta \in M \cap N$, and if we set $\alpha = -\beta$, then $\alpha \in M$, $\beta \in N$, and $\alpha + \beta = 0$, so $M$ and $N$ are not independent. □

Note that the first argument above is simply the general form of the uniqueness argument we gave earlier for the even-odd decomposition of a function on $\mathbb{R}$.
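The even-odd decomposition can be checked numerically. The following is only a sketch: the helper names `even_part` and `odd_part` are illustrative choices, not notation from the text, and a finite set of sample points can suggest the identities but cannot prove them.

```python
import math

def even_part(f):
    """Return g with g(x) = (f(x) + f(-x)) / 2, the component of f in V_e."""
    return lambda x: (f(x) + f(-x)) / 2

def odd_part(f):
    """Return h with h(x) = (f(x) - f(-x)) / 2, the component of f in V_o."""
    return lambda x: (f(x) - f(-x)) / 2

g = even_part(math.exp)
h = odd_part(math.exp)

# Compare the components of exp with cosh and sinh at a few sample points.
samples = [-2.0, -0.5, 0.0, 0.3, 1.7]
components_match = all(
    math.isclose(g(x), math.cosh(x))
    and math.isclose(h(x), math.sinh(x))
    and math.isclose(g(x) + h(x), math.exp(x))
    for x in samples
)
print(components_match)
```

Applying `odd_part` to an even function such as `math.cos` returns a function that vanishes at every sample point, which is the computational counterpart of $V_e \cap V_o = \{0\}$.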
Corollary. $V = M \oplus N$ if and only if $V = M + N$ and $M \cap N = \{0\}$.

Definition. If $V = M \oplus N$, then $M$ and $N$ are called complementary subspaces, and each is a complement of the other.

Warning: A subspace $M$ of $V$ does not have a unique complementary subspace unless $M$ is trivial (that is, $M = \{0\}$ or $M = V$). If we view $\mathbb{R}^3$ as coordinatized Euclidean 3-space, then $M$ is a proper subspace if and only if $M$ is a plane containing the origin or $M$ is a line through the origin (see Fig. 1.9). If $M$ and $N$ are proper subspaces one of which is a plane and the other a line not lying in that plane, then $M$ and $N$ are complementary subspaces. Moreover, these are the only nontrivial complementary pairs in $\mathbb{R}^3$. The reader will be asked to prove some of these facts in the exercises and they all will be clear by the middle of the next chapter.

Fig. 1.9

The following lemma is technically useful.

Lemma 5.3. If $V_1$ and $V_0$ are independent subspaces of $V$ and $\{V_i\}_2^n$ are independent subspaces of $V_0$, then $\{V_i\}_1^n$ are independent subspaces of $V$.

Proof. If $\alpha_i \in V_i$ for all $i$ and $\sum_1^n \alpha_i = 0$, then, setting $\alpha_0 = \sum_2^n \alpha_i$, we have $\alpha_1 + \alpha_0 = 0$, with $\alpha_0 \in V_0$. Therefore, $\alpha_1 = \alpha_0 = 0$ by the independence of $V_1$ and $V_0$. But then $\alpha_2 = \alpha_3 = \cdots = \alpha_n = 0$ by the independence of $\{V_i\}_2^n$, and we are done (Lemma 5.1). □

Corollary. $V = V_1 \oplus V_0$ and $V_0 = \bigoplus_{i=2}^n V_i$ together imply that $V = \bigoplus_{i=1}^n V_i$.

Projections. If $V = \bigoplus_{i=1}^n V_i$, if $\pi$ is the isomorphism $\langle \alpha_1, \ldots, \alpha_n \rangle \mapsto \alpha = \sum_1^n \alpha_i$, and if $\pi_j$ is the $j$th projection map $\langle \alpha_1, \ldots, \alpha_n \rangle \mapsto \alpha_j$ from $\prod_{i=1}^n V_i$ to $V_j$, then $(\pi_j \circ \pi^{-1})(\alpha) = \alpha_j$.

Definition. We call $\alpha_j$ the $j$th component of $\alpha$, and we call the linear map $P_j = \pi_j \circ \pi^{-1}$ the projection of $V$ onto $V_j$ (with respect to the given direct sum decomposition of $V$).

Since each $\alpha$ in $V$ is uniquely expressible as a sum $\alpha = \sum_1^n \alpha_i$, with $\alpha_i$ in $V_i$ for all $i$, we can view $P_j(\alpha) = \alpha_j$ as "the part of $\alpha$ in $V_j$". This use of the word "projection" is different from its use in the Cartesian product situation, and each is different from its use in the quotient space context (Section 0.12). It is apparent that these three uses are related, and the ambiguity causes little confusion since the proper meaning is always clear from the context.

Theorem 5.1. If the maps $P_i$ are the above projections, then $\operatorname{range} P_i = V_i$, $P_i \circ P_j = 0$ for $i \neq j$, and $\sum_1^n P_i = I$.

Proof. Since $\pi$ is an isomorphism and $P_j = \pi_j \circ \pi^{-1}$, we have $\operatorname{range} P_j = \operatorname{range} \pi_j = V_j$. Next, it follows directly from the corollary to Lemma 5.1 that if $\alpha \in V_j$, then $P_i(\alpha) = 0$ for $i \neq j$, and so $P_i \circ P_j = 0$ for $i \neq j$. Finally,
\[ \sum_1^n P_i = \sum_1^n \pi_i \circ \pi^{-1} = \Bigl(\sum_1^n \pi_i\Bigr) \circ \pi^{-1} = \pi \circ \pi^{-1} = I, \]
and we are done. □

The above projection properties are clearly the reflection in $V$ of the projection-injection identities for the isomorphic space $\prod_1^n V_i$. A converse theorem is also true.

Theorem 5.2. If $\{P_i\}_1^n \subset \operatorname{Hom} V$ satisfy $\sum_1^n P_i = I$ and $P_i \circ P_j = 0$ for $i \neq j$, and if we set $V_i = \operatorname{range} P_i$, then $V = \bigoplus_{i=1}^n V_i$, and $P_i$ is the corresponding projection on $V_i$.

Proof. The equation $\alpha = I(\alpha) = \sum_1^n P_i(\alpha)$ shows that the subspaces $\{V_i\}_1^n$ span $V$. Next, if $\beta \in V_j$, then $P_i(\beta) = 0$ for $i \neq j$, since $\beta \in \operatorname{range} P_j$ and $P_i \circ P_j = 0$ if $i \neq j$. Then also $P_j(\beta) = (I - \sum_{i \neq j} P_i)(\beta) = I(\beta) = \beta$. Now consider $\alpha = \sum_1^n \alpha_i$ for any choice of $\alpha_i \in V_i$. Using the above two facts, we have $P_j(\alpha) = P_j(\sum_{i=1}^n \alpha_i) = \sum_{i=1}^n P_j(\alpha_i) = \alpha_j$. Therefore, $\alpha = 0$ implies that $\alpha_j = P_j(0) = 0$ for all $j$, and the subspaces $V_i$ are independent. Consequently, $V = \bigoplus_1^n V_i$. Finally, the fact that $\alpha = \sum P_i(\alpha)$ and $P_i(\alpha) \in V_i$ for all $i$ shows that $P_j(\alpha)$ is the $j$th component of $\alpha$ for every $\alpha$ and therefore that $P_j$ is the projection of $V$ onto $V_j$. □

There is an intrinsic characterization of the kind of map that is a projection.

Lemma 5.4. The projections $P_i$ are idempotent ($P_i^2 = P_i$), or, equivalently, each is the identity on its range. The null space of $P_i$ is the sum of the spaces $V_j$ for $j \neq i$.

Proof. $P_j^2 = P_j \circ (I - \sum_{i \neq j} P_i) = P_j \circ I = P_j$. Since this can be rewritten as $P_j(P_j(\alpha)) = P_j(\alpha)$ for every $\alpha$ in $V$, it says exactly that $P_j$ is the identity on its range.

Now set $W_i = \sum_{j \neq i} V_j$, and note that if $\beta \in W_i$, then $P_i(\beta) = 0$ since $P_i[V_j] = 0$ for $j \neq i$. Thus $W_i \subset N(P_i)$. Conversely, if $P_i(\alpha) = 0$, then $\alpha = I(\alpha) = \sum_1^n P_j(\alpha) = \sum_{j \neq i} P_j(\alpha) \in W_i$. Thus $N(P_i) \subset W_i$, and the two spaces are equal. □

Conversely:

Lemma 5.5. If $P \in \operatorname{Hom}(V)$ is idempotent, then $V$ is the direct sum of its range and null space, and $P$ is the corresponding projection on its range.

Proof. Setting $Q = I - P$, we have $PQ = P - P^2 = 0$, and similarly $QP = 0$. Therefore, $V$ is the direct sum of the ranges of $P$ and $Q$, and $P$ is the corresponding projection on its range, by the above theorem. Moreover, the range of $Q$ is the null space of $P$, by Lemma 5.4. □

If $V = M \oplus N$ and $P$ is the corresponding projection on $M$, we call $P$ the projection on $M$ along $N$. The projection $P$ is not determined by $M$ alone, since $M$ does not determine $N$. A pair $P$ and $Q$ in $\operatorname{Hom} V$ such that $P + Q = I$ and $PQ = QP = 0$ is called a pair of complementary projections.
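A small computation in $\mathbb{R}^2$ may make the phrase "projection on $M$ along $N$" concrete. This is a sketch under stated assumptions: the lines $M = \mathbb{R}\langle 1, 0 \rangle$ and $N = \mathbb{R}\langle 1, 1 \rangle$ are chosen only for illustration, and `projection_along` is a hypothetical helper that uses Cramer's rule to find the coefficient $a$ in $v = a\,m + b\,n$.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def projection_along(m, n):
    """Matrix of the projection of R^2 on span{m} along span{n}.

    If v = a*m + b*n, then a = det[v, n] / det[m, n], and the
    projection sends v to a*m.
    """
    det = Fraction(m[0] * n[1] - m[1] * n[0])
    if det == 0:
        raise ValueError("span{m} and span{n} must be complementary lines")
    w = (n[1], -n[0])  # v -> det[v, n] is the row vector (n1, -n0)
    return [[m[i] * w[j] / det for j in range(2)] for i in range(2)]

IDENTITY = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

P = projection_along((1, 0), (1, 1))        # on M along N
Q = [[IDENTITY[i][j] - P[i][j] for j in range(2)] for i in range(2)]
P_other = projection_along((1, 0), (0, 1))  # on M along the x2-axis

print(matmul(P, P) == P, matmul(P, Q) == ZERO, matmul(Q, P) == ZERO)
print(P == P_other)
```

Here `P` and `Q` form a pair of complementary projections, while `P_other` has the same range $M$ but a different null space, so the two matrices differ. That is the sense in which $M$ alone does not determine the projection.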
In the above discussion we have neglected another fine point. Strictly speaking, when we form the sum $\pi = \sum_1^n \pi_i$, we are treating each $\pi_j$ as though it were from $\prod_1^n V_i$ to $V$, whereas actually the codomain of $\pi_j$ is $V_j$. And we want $P_j$ to be from $V$ to $V$, whereas $\pi_j \circ \pi^{-1}$ has codomain $V_j$, so the equation $P_j = \pi_j \circ \pi^{-1}$ can't quite be true either. To repair these flaws we have to introduce the injection $\iota_j: V_j \to V$, which is the identity map on $V_j$, but which views $V_j$ as a subspace of $V$ and so takes $V$ as its codomain. If our concept of a mapping includes a codomain possibly larger than the range, then we have to admit such identity injections. Then, setting $\bar{\pi}_j = \iota_j \circ \pi_j$, we have the correct equations $\pi = \sum_1^n \bar{\pi}_i$ and $P_j = \bar{\pi}_j \circ \pi^{-1}$.

EXERCISES

5.1 Prove the corollary to Lemma 5.1.

5.2 Let $\alpha$ be the vector $\langle 1, 1, 1 \rangle$ in $\mathbb{R}^3$, and let $M = \mathbb{R}\alpha$ be its one-dimensional span. Show that each of the three coordinate planes is a complement of $M$.

5.3 Show that a finite product space $V = \prod_1^n V_i$ has subspaces $\{W_i\}_1^n$ such that $W_i$ is isomorphic to $V_i$ and $V = \bigoplus_1^n W_i$. Show how the corresponding projections $\{P_i\}$ are related to the $\pi_i$'s and $\theta_i$'s.

5.4 If $T \in \operatorname{Hom}(V, W)$, show that (the graph of) $T$ is a complement of $W' = \{0\} \times W$ in $V \times W$.

5.5 If $l$ is a linear functional on $V$ ($l \in \operatorname{Hom}(V, \mathbb{R}) = V^*$), and if $\alpha$ is a vector in $V$ such that $l(\alpha) \neq 0$, show that $V = N \oplus M$, where $N$ is the null space of $l$ and $M = \mathbb{R}\alpha$ is the linear span of $\alpha$. What does this result say about complements in $\mathbb{R}^3$?

5.6 Show that any complement $M$ of a subspace $N$ of a vector space $V$ is isomorphic to the quotient space $V/N$.

5.7 We suppose again that every subspace has a complement. Show that if $T \in \operatorname{Hom} V$ is not injective, then there is a nonzero $S$ in $\operatorname{Hom} V$ such that $T \circ S = 0$. Show that if $T \in \operatorname{Hom} V$ is not surjective, then there is a nonzero $S$ in $\operatorname{Hom} V$ such that $S \circ T = 0$.
5.8 Using the above exercise for half the arguments, show that $T \in \operatorname{Hom} V$ is injective if and only if $T \circ S = 0 \Rightarrow S = 0$ and that $T$ is surjective if and only if $S \circ T = 0 \Rightarrow S = 0$. We thus have characterizations of injectivity and surjectivity that are formal, in the sense that they do not refer to the fact that $S$ and $T$ are transformations, but refer only to the algebraic properties of $S$ and $T$ as elements of an algebra.

5.9 Let $M$ and $N$ be complementary subspaces of a vector space $V$, and let $X$ be a subspace such that $X \cap N = \{0\}$. Show that there is a linear injection from $X$ to $M$. [Hint: Consider the projection $P$ of $V$ onto $M$ along $N$.] Show that any two complements of a subspace $N$ are isomorphic by showing that the above injection is surjective if and only if $X$ is a complement of $N$.

5.10 Going back to the first point of the preceding exercise, let $Y$ be a complement of $P[X]$ in $M$. Show that $X \cap Y = \{0\}$ and that $X \oplus Y$ is a complement of $N$.

5.11 Let $M$ be a proper subspace of $V$, and let $\{\alpha_i : i \in I\}$ be a finite set in $V$. Set $L = L(\{\alpha_i\})$, and suppose that $M + L = V$. Show that there is a subset $J \subset I$ such that $\{\alpha_i : i \in J\}$ spans a complement of $M$. [Hint: Consider a largest possible subset $J$ such that $M \cap L(\{\alpha_i\}_J) = \{0\}$.]

5.12 Given $T \in \operatorname{Hom}(V, W)$ and $S \in \operatorname{Hom}(W, X)$, show that
a) $S \circ T$ is surjective $\Leftrightarrow$ $S$ is surjective and $R(T) + N(S) = W$;
b) $S \circ T$ is injective $\Leftrightarrow$ $T$ is injective and $R(T) \cap N(S) = \{0\}$;
c) $S \circ T$ is an isomorphism $\Leftrightarrow$ $S$ is surjective, $T$ is injective, and $W = R(T) \oplus N(S)$.

5.13 Assuming that every subspace of $V$ has a complement, show that $T \in \operatorname{Hom} V$ satisfies $T^2 = 0$ if and only if $V$ has a direct sum decomposition $V = M \oplus N$ such that $T = 0$ on $N$ and $T[M] \subset N$.

5.14 Suppose next that $T^3 = 0$ but $T^2 \neq 0$. Show that $V$ can be written as $V = V_1 \oplus V_2 \oplus V_3$, where $T[V_1] \subset V_2$, $T[V_2] \subset V_3$, and $T = 0$ on $V_3$. (Assume again that any subspace of a vector space has a complement.)

5.15 We now suppose that $T^n = 0$ but $T^{n-1} \neq 0$. Set $N_i = \text{null space}\,(T^i)$ for $i = 1, \ldots, n - 1$, and let $V_1$ be a complement of $N_{n-1}$ in $V$. Show first that $T[V_1] \cap N_{n-2} = \{0\}$ and that $T[V_1] \subset N_{n-1}$. Extend $T[V_1]$ to a complement $V_2$ of $N_{n-2}$ in $N_{n-1}$, and show that in this way we can construct subspaces $V_1, \ldots, V_n$ such that
\[ V = \bigoplus_1^n V_i, \qquad T[V_i] \subset V_{i+1} \text{ for } i < n, \qquad\text{and}\qquad T[V_n] = \{0\}. \]

On solving a linear equation. Many important problems in mathematics are in the following general form. A linear operator $T: V \to W$ is given, and for a given $\eta \in W$ the equation $T(\xi) = \eta$ is to be solved for $\xi \in V$. In our terms, the condition that there exist a solution is exactly the condition that $\eta$ be in the range space of $T$. In special circumstances this condition can be given more or less useful equivalent alternative formulations. Let us suppose that we know how to recognize $R(T)$, in which case we may as well make it the new codomain, and so assume that $T$ is surjective. There still remains the problem of determining what we mean by solving the equation.
The universal principle running through all the important instances of the problem is that a solution process calculates a right inverse to $T$, that is, a linear operator $S: W \to V$ such that $T \circ S = I_W$, the identity on $W$. Thus a solution process picks one solution vector $\xi \in V$ for each $\eta \in W$ in such a way that the solving $\xi$ varies linearly with $\eta$. Taking this as our meaning of solving, we have the following fundamental reformulation.

Theorem 5.3. Let $T$ be a surjective linear map from the vector space $V$ to the vector space $W$, and let $N$ be its null space. Then a subspace $M$ is a complement of $N$ if and only if the restriction of $T$ to $M$ is an isomorphism from $M$ to $W$. The mapping $M \mapsto (T \upharpoonright M)^{-1}$ is a bijection from the set of all such complementary subspaces $M$ to the set of all linear right inverses of $T$.

Proof. It should be clear that a subspace $M$ is the range of a linear right inverse of $T$ (a map $S$ such that $T \circ S = I_W$) if and only if $T \upharpoonright M$ is an isomorphism to $W$, in which case $S = (T \upharpoonright M)^{-1}$. Strictly speaking, the right inverse must be from $W$ to $V$ and therefore must be $R = \iota_M \circ S$, where $\iota_M$ is the identity injection from $M$ to $V$. Then $(R \circ T)^2 = R \circ (T \circ R) \circ T = R \circ I_W \circ T = R \circ T$, and $R \circ T$ is a projection whose range is $M$ and whose null space is $N$ (since $R$ is injective). Thus $V = M \oplus N$. Conversely, if $V = M \oplus N$, then $T \upharpoonright M$ is injective because $M \cap N = \{0\}$ and surjective because $M + N = V$ implies that $W = T[V] = T[M + N] = T[M] + T[N] = T[M] + \{0\} = T[M]$. □

Polynomials in $T$. The material in this subsection will be used in our study of differential equations with constant coefficients and in the proof of the diagonalizability of a symmetric matrix. In linear algebra it is basic in almost any approach to the canonical forms of matrices.

If $p_1(t) = \sum_0^m a_i t^i$ and $p_2(t) = \sum_0^n b_j t^j$ are any two polynomials, then their product is the polynomial
\[ p(t) = p_1(t)p_2(t) = \sum_0^{m+n} c_k t^k, \]
where $c_k = \sum_{i+j=k} a_i b_j = \sum_{i=0}^{k} a_i b_{k-i}$. Now let $T$ be any fixed element of $\operatorname{Hom}(V)$, and for any polynomial $q(t)$ let $q(T)$ be the transformation obtained by replacing $t$ by $T$. That is, if $q(t) = \sum_0^l c_k t^k$, then $q(T) = \sum_0^l c_k T^k$, where, of course, $T^l$ is the composition product $T \circ T \circ \cdots \circ T$ with $l$ factors. Then the bilinearity of composition (Theorem 3.3) shows that if $p(t) = p_1(t)p_2(t)$, then $p(T) = p_1(T) \circ p_2(T)$. In particular, any two polynomials in $T$ commute with each other under composition. More simply, the commutative law for addition implies that
\[ \text{if } p(t) = p_1(t) + p_2(t), \text{ then } p(T) = p_1(T) + p_2(T). \]
The mapping $p(t) \mapsto p(T)$ from the algebra of polynomials to the algebra $\operatorname{Hom}(V)$ thus preserves addition, multiplication, and (obviously) scalar multiplication. That is, it preserves all the operations of an algebra and is therefore what is called an (algebra) homomorphism.
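The multiplicative property $p(T) = p_1(T) \circ p_2(T)$ can be tried out on a matrix. The sketch below makes several assumptions that are not in the text: polynomials are stored as coefficient lists with the constant term first, $T$ is a particular $2 \times 2$ integer matrix standing in for an element of $\operatorname{Hom}(\mathbb{R}^2)$, and the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(a, b):
    """Entrywise sum of two square matrices."""
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]

def scaled_identity(c, n):
    """The n x n matrix c * I."""
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]

def poly_mul(p, q):
    """Coefficients c_k = sum over i + j = k of a_i * b_j."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_at(p, t):
    """Evaluate p at the matrix t by Horner's rule."""
    n = len(t)
    result = scaled_identity(0, n)
    for c in reversed(p):
        result = mat_add(mat_mul(result, t), scaled_identity(c, n))
    return result

T = [[2, 1], [0, 3]]
p1 = [-1, 1]       # t - 1
p2 = [2, 0, 1]     # t^2 + 2
p = poly_mul(p1, p2)

lhs = poly_at(p, T)
rhs = mat_mul(poly_at(p1, T), poly_at(p2, T))
print(lhs == rhs)
print(rhs == mat_mul(poly_at(p2, T), poly_at(p1, T)))
```

The second comparison checks that $p_1(T)$ and $p_2(T)$ commute, even though matrix multiplication in general does not.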
The word "homomorphism" is a general term describing a mapping $\theta$ between two algebraic systems of the same kind such that $\theta$ preserves the operations of the system. Thus a homomorphism between vector spaces is simply a linear transformation, and a homomorphism between groups is a mapping preserving the one group operation. An accessible, but not really typical, example of the latter is the logarithm function, which is a homomorphism from the multiplicative group of positive real numbers to the additive group of $\mathbb{R}$. The logarithm function is actually a bijective homomorphism and is therefore a group isomorphism.

If this were a course in algebra, we would show that the division algorithm and the properties of the degree of a polynomial imply the following theorem. (However, see Exercises 5.16 through 5.20.)
1.5 DIRECT SUMS 63 Theorem 5.4. If pi(^) and P2(0 ^^^ relatively prime polynomials, then there exist polynomials ai{t) and a2{t) such that ai{t)pi{t) + a2{t)p2{t) = 1. By relatively prime we mean having no common factors except constants. We shall assume this theorem and the results of the discussion preceding it in proving our next theorem. We say that a subspace M C F is invariant under T e Hom(F) if T[M] C M [that is, r \ M G Hom(M)]. Theorem 5.5. Let T be any transformation in Hom F, and let q be any polynomial. Then the null space N of q(T) is invariant under T, and if Q. ^^ 0.10.2 is any factorization of q into relatively prime factors and A^i and A'2 are the null space of qi{T) and q2{T), respectively, then A" = A^i 0 A'2- Proof. Since T o q{T) = q{T) o T, we see that if q{T){a) = 0, then q{T){Ta) = T{q{T)ia)) = 0, so ^[A^] C A". Note also that since q{T) = qi{T) o q^{T), it follows that any a in A'2 is also in N, so A'2 C N. Similarly, A^i C A". We can therefore replace F by A" and Thy T \ N; hence we can assume that T G Hom A" and qiT) = q^{T) o q^{T) = 0. Now choose polynomials ai and a2 so that aiqi + a2q2 = 1- Since p i-> p(r) is an algebraic homomorphism, we then have aiiT) o q,(T) + a2{T) o q2{T) = I. Set Ai == ai{T)j etc., so that Ai oQ^ + A2 0Q2 = /, Qi ^ Q2 == 0, and all the operators A^-, Qi commute with each other. Finally, set Pi = Ai o Q^ = Q^ o Ai for i = 1, 2. Then Pi + P2 - 7 and P1P2 = P2P1 = 0. Thus Pi and P2 are projections, and A" is the direct sum of their ranges: A" = Fi 0 F2. Since each range is the null space of the other projection, we can rewrite this as A" = A^i © ^2} where Ni = A"(Pi). It remains for us to show that A'(Pi) = N{Qi). Note first that since Qi ^ P2 = Qi o Q2 o A2 = 0, we have Qi = Qi o I = Qi ° {Pi + P2) = Qi ° Pi' Then the two identities Pi = Ai o Q^ and Qi = Qi o P^ show that the null space of each of Pi and Qi is included in the other, and so they are equal. This completes the proof of the theorem. D Corollary. 
Let $p(t) = \prod_{i=1}^{m} p_i(t)$ be a factorization of the polynomial $p(t)$ into relatively prime factors, let $T$ be an element of $\operatorname{Hom}(V)$, and set $N_i = N(p_i(T))$ for $i = 1, \ldots, m$ and $N = N(p(T))$. Then $N$ and all the $N_i$ are invariant under $T$, and $N = \bigoplus_{i=1}^{m} N_i$.

Proof. The proof is by induction on $m$. The theorem is the case $m = 2$, and if we set $q = \prod_{i=2}^{m} p_i(t)$ and $M = N(q(T))$, then the theorem implies that $N = N_1 \oplus M$ and that $N_1$ and $M$ are invariant under $T$. Restricting $T$ to $M$, we see that the inductive hypothesis implies that $M = \bigoplus_{i=2}^{m} N_i$ and that $N_i$ is invariant under $T$ for $i = 2, \ldots, m$. The corollary to Lemma 5.3 then yields our result. □
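Theorem 5.4 is assumed rather than proved in the text, so it may help to see the polynomials $a_1$ and $a_2$ produced explicitly. The following is a minimal sketch, assuming the standard extended Euclidean algorithm built on the division algorithm; the list representation (coefficients from lowest degree to highest) and all function names are illustrative choices, not notation from the text.

```python
from fractions import Fraction

# A polynomial is a list of coefficients, lowest degree first; [] is the zero polynomial.

def trim(p):
    """Convert to exact rationals and drop trailing zero coefficients."""
    p = [Fraction(a) for a in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def scale(c, p):
    return trim([c * a for a in p])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(p, d):
    """Division algorithm: (q, r) with p = d*q + r and deg r < deg d, or r = 0."""
    d = trim(d)
    if not d:
        raise ZeroDivisionError("division by the zero polynomial")
    q, r = [], trim(p)
    while len(r) >= len(d):
        # Cancel the leading term of r with a multiple of d.
        term = [Fraction(0)] * (len(r) - len(d)) + [r[-1] / d[-1]]
        q = add(q, term)
        r = add(r, scale(-1, mul(term, d)))
    return q, r

def bezout(p1, p2):
    """Return (a1, a2, g) with a1*p1 + a2*p2 = g, where g is a monic gcd.

    Assumes p1 and p2 are not both zero.
    """
    r0, r1 = trim(p1), trim(p2)
    s0, s1 = [Fraction(1)], []
    t0, t1 = [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, scale(-1, mul(q, s1)))
        t0, t1 = t1, add(t0, scale(-1, mul(q, t1)))
    lead = r0[-1]
    return scale(1 / lead, s0), scale(1 / lead, t0), scale(1 / lead, r0)

# Hypothetical example: p1 = t^2 + 1 and p2 = t - 2 are relatively prime.
p1 = [1, 0, 1]
p2 = [-2, 1]
a1, a2, g = bezout(p1, p2)
combination = add(mul(a1, p1), mul(a2, p2))
```

When the two inputs are relatively prime, `g` is the constant polynomial 1 and `combination` reproduces it, which is exactly the identity $a_1 p_1 + a_2 p_2 = 1$ used in the proof of Theorem 5.5.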
EXERCISES

5.16 Presumably the reader knows (or can see) that the degree $d(P)$ of a polynomial $P$ satisfies the laws $d(P + Q) \le \max\{d(P), d(Q)\}$ and $d(P \cdot Q) = d(P) + d(Q)$ if both $P$ and $Q$ are nonzero. The degree of the zero polynomial is undefined. (It would have to be $-\infty$!) By induction on the degree of $P$, prove that for any two polynomials $P$ and $D$, with $D \ne 0$, there are polynomials $Q$ and $R$ such that $P = DQ + R$ and $d(R) < d(D)$ or $R = 0$. [Hint: If $d(P) < d(D)$, we can take $Q$ and $R$ as what? If $d(P) \ge d(D)$, and if the leading terms of $P$ and $D$ are $ax^n$ and $bx^m$, respectively, with $n \ge m$, then the polynomial $P' = P - (a/b)x^{n-m}D$ has degree less than $d(P)$, so $P' = DQ' + R'$ by the inductive hypothesis. Now finish the proof.]

5.17 Assuming the above result, prove that $R$ and $Q$ are uniquely determined by $P$ and $D$. (Assume also that $P = DQ' + R'$, and prove from the properties of degree that $R' = R$ and $Q' = Q$.) These two results together constitute the division algorithm for polynomials.

5.18 If $P$ is any polynomial $P(x) = \sum_{k=0}^{n} a_k x^k$, and if $t$ is any number, then of course $P(t)$ is the number $\sum_{k=0}^{n} a_k t^k$. Prove from the division algorithm that for any polynomial $P$ and any number $t$ there is a polynomial $Q$ such that $P(x) = (x - t)Q(x) + P(t)$, and therefore that $P(x)$ is divisible by $(x - t)$ if and only if $P(t) = 0$.

5.19 Let $P$ and $Q$ be nonzero polynomials, and choose polynomials $A_0$ and $B_0$ such that among all the polynomials of the form $AP + BQ$ the polynomial $D = A_0 P + B_0 Q$ is nonzero and has minimum degree. Prove that $D$ is a factor of both $P$ and $Q$. (Suppose that $D$ does not divide $P$ and apply the division algorithm to get a contradiction with the choice of $A_0$ and $B_0$.)

5.20 Let $P$ and $Q$ be nonzero relatively prime polynomials. This means that if $E$ is a common factor of $P$ and $Q$ ($P = EP'$, $Q = EQ'$), then $E$ is a constant. Prove that there are polynomials $A$ and $B$ such that $A(x)P(x) + B(x)Q(x) = 1$. (Apply the above exercise.)
5.21 In the context of Theorem 5.5, show that the restriction of $q_2(T) = Q_2$ to $N_1$ is an isomorphism (from $N_1$ to $N_1$).

5.22 An involution on $V$ is a mapping $T \in \operatorname{Hom} V$ such that $T^2 = I$. Show that if $T$ is an involution, then $V$ is a direct sum $V = V_1 \oplus V_2$, where $T(\xi) = \xi$ for every $\xi \in V_1$ ($T = I$ on $V_1$) and $T(\xi) = -\xi$ for every $\xi \in V_2$ ($T = -I$ on $V_2$). (Apply Theorem 5.5.)

5.23 We noticed earlier (in an exercise) that if $\varphi$ is any mapping from a set $A$ to a set $B$, then $f \mapsto f \circ \varphi$ is a linear map $T_\varphi$ from $\mathbb{R}^B$ to $\mathbb{R}^A$. Show now that if $\psi: B \to C$, then $T_{\psi \circ \varphi} = T_\varphi \circ T_\psi$. (This should turn out to be a direct consequence of the associativity of composition.)

5.24 Let $A$ be any set, and let $\varphi: A \to A$ be such that $\varphi \circ \varphi(a) = a$ for every $a$. Then $T_\varphi: f \mapsto f \circ \varphi$ is an involution on $V = \mathbb{R}^A$ (since $T_{\varphi \circ \varphi} = T_\varphi \circ T_\varphi$). Show that the decomposition of $\mathbb{R}^{\mathbb{R}}$ as the direct sum of the subspace of even functions and the subspace of odd functions arises from an involution on $\mathbb{R}^{\mathbb{R}}$ defined by such a map $\varphi: \mathbb{R} \to \mathbb{R}$.

5.25 Let $V$ be a subspace of $\mathbb{R}^{\mathbb{R}}$ consisting of differentiable functions, and suppose that $V$ is invariant under differentiation ($f \in V \Rightarrow Df \in V$). Suppose also that on $V$ the linear operator $D \in \operatorname{Hom} V$ satisfies $D^2 - 2D - 3I = 0$. Prove that $V$ is the direct sum of two subspaces $M$ and $N$ such that $D = 3I$ on $M$ and $D = -I$ on $N$. Actually, it follows that $M$ is the linear span of a single vector, and similarly for $N$. Find these two functions, if you can. ($f' = 3f \Rightarrow f = {?}$)

*Block decompositions of linear maps. Given $T$ in $\operatorname{Hom} V$ and a direct sum decomposition $V = \bigoplus_{1}^{n} V_i$, with corresponding projections $\{P_i\}_1^n$, we can consider the maps $T_{ij} = P_i \circ T \circ P_j$. Although $T_{ij}$ is from $V$ to $V$, we may also want to consider it as being from $V_j$ to $V_i$ (in which case, strictly speaking, what is it?). We picture the $T_{ij}$'s arranged schematically in a rectangular array similar to a matrix, as indicated below for $n = 2$.
$$\begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix}$$

Furthermore, since $T = \sum_{i,j} T_{ij}$, we call the doubly indexed family the block decomposition of $T$ associated with the given direct sum decomposition of $V$. More generally, if $T \in \operatorname{Hom}(V, W)$ and $W$ also has a direct sum decomposition $W = \bigoplus_{i=1}^{m} W_i$, with corresponding projections $\{Q_i\}_1^m$, then the family $\{T_{ij}\}$ defined by $T_{ij} = Q_i \circ T \circ P_j$ and pictured as an $m \times n$ rectangular array is the block decomposition of $T$ with respect to the two direct sum decompositions. Whenever $T$ in $\operatorname{Hom} V$ has a special relationship to a particular direct sum decomposition of $V$, the corresponding block diagram may have features that display these special properties in a vivid way; this then helps us to understand the nature of $T$ better and to calculate with it more easily.
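The identity $T = \sum_{i,j} T_{ij}$ can be checked numerically in a small case. The sketch below assumes a hypothetical example: $V = \mathbb{R}^3$ decomposed as the span of the first two standard basis vectors plus the span of the third, with the coordinate projections as $P_1$ and $P_2$. The matrix `T` and the helper names are illustrative only.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matadd(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Projections for R^3 = V1 (+) V2, with V1 = span(e1, e2) and V2 = span(e3).
P1 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
P2 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
projections = {1: P1, 2: P2}

# An arbitrary transformation, written as a matrix.
T = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Block components T_ij = P_i o T o P_j.
blocks = {(i, j): matmul(matmul(Pi, T), Pj)
          for i, Pi in projections.items()
          for j, Pj in projections.items()}

# Adding all the blocks back together recovers T.
total = [[0] * 3 for _ in range(3)]
for block in blocks.values():
    total = matadd(total, block)
```

Each `blocks[(i, j)]` is still a $3 \times 3$ matrix (a map from $V$ to $V$), but its nonzero entries sit only in the rows belonging to $V_i$ and the columns belonging to $V_j$, which is why the schematic array of blocks looks like the matrix of `T` cut into rectangles.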
66 VECTOR SPACES 1.5 Forexample, if F=- Vi 0 F2, then Fi is invariant under T (i.e., T[Fi] C Fi) if and only if the block diagram is upper triangulaVy as shown in the following diagram. ^11 ^12 0 22 Suppose, next, that T^ = 0. Letting Vi be the range of T, and supposing that Vi has a complement V2, the reader should clearly see that the corresponding block diagram is ^ 0. 0 Tu ^0. This form is called strictly upper triangular; it is upper triangular and also zero on the main diagonal. Conversely, if T has some strictly upper-triangular 2X2 block diagram, then T^ = 0. If 72 is a composition product, 72 = ST, then its block components can be computed in terms of those of S and T. Thus 72a =- PiRPic = PiSTPk '•<,?/') TPk = E SijT, ijl jlc y=i We have used the identities / = E7=i ^j ^^^ ^j = ^1- The 2x2 case is pictured below. ^11^11 + ^12^21 ^21^11 + S22T2I S11T12 + S12T22 S21T12 + S22T22 From this we can read off a fact that will be useful to us later: If 7" is 2 X 2 upper triangular {T21 = 0), and if Ta is invertible as a map from Vi to Vi (i = 1, 2), then T is invertible and its inverse is Tu rn —1 /Trr n^ —1 "^11 ^ 12^ 22 0 We find this solution by simply setting the product diagram equal to h 0 0 and solving; but of course with the diagram in hand it can simply be checked to be correct. EXERCISES 5.26 Show that if T G Horn 7, if V = ®i Vi, and if {P^] i are the corresponding projections, then the sum of the transformation Tij = P^o To Py is T.
5.27 If $S$ and $T$ are in $\operatorname{Hom} V$ and $\{S_{ij}\}$, $\{T_{ij}\}$ are their block components with respect to some direct sum decomposition of $V$, show that $S_{ij} \circ T_{lk} = 0$ if $j \ne l$.

5.28 Verify that if $T$ has an upper-triangular block diagram with respect to the direct sum decomposition $V = V_1 \oplus V_2$, then $V_1$ is invariant under $T$.

5.29 Verify that if the diagram is strictly upper triangular, then $T^2 = 0$.

5.30 Show that if $V = V_1 \oplus V_2 \oplus V_3$ and $T \in \operatorname{Hom} V$, then the subspaces $V_i$ are all invariant under $T$ if and only if the block diagram for $T$ is

$$\begin{bmatrix} T_{11} & 0 & 0 \\ 0 & T_{22} & 0 \\ 0 & 0 & T_{33} \end{bmatrix}.$$

Show that $T$ is invertible if and only if $T_{ii}$ is invertible (as an element of $\operatorname{Hom} V_i$) for each $i$.

5.31 Supposing that $T$ has an upper-triangular $2 \times 2$ block diagram and that $T_{ii}$ is invertible as an element of $\operatorname{Hom} V_i$ for $i = 1, 2$, verify that $T$ is invertible by forming the $2 \times 2$ block diagram that is the product of the diagram for $T$ and the diagram given in the text as the inverse of $T$.

5.32 Supposing that $T$ is as in the preceding exercise, show that $S = T^{-1}$ must have the given block diagram by considering the two equations $T \circ S = I$ and $S \circ T = I$ in their block form.

5.33 What would strictly upper triangular mean for a $3 \times 3$ block diagram? What is the corresponding property of $T$? Show that $T$ has this property if and only if it has a strictly upper-triangular block diagram. (See Exercise 5.14.)

5.34 Suppose that $T$ in $\operatorname{Hom} V$ satisfies $T^n = 0$ (but $T^{n-1} \ne 0$). Show that $T$ has a strictly upper-triangular $n \times n$ block decomposition. (Apply Exercise 5.15.)

6. BILINEARITY

Bilinear mappings. The notion of a bilinear mapping is important to the understanding of linear algebra because it is the vector setting for the duality principle (Section 0.10).

Definition. If $U$, $V$, and $W$ are vector spaces, then a mapping $\omega: \langle \xi, \eta \rangle \mapsto \omega(\xi, \eta)$ from $U \times V$ to $W$ is bilinear if it is linear in each variable when the other variable is held fixed.
That is, if we hold $\xi$ fixed, then $\eta \mapsto \omega(\xi, \eta)$ is linear [and so belongs to $\operatorname{Hom}(V, W)$]; if we hold $\eta$ fixed, then similarly $\omega(\xi, \eta)$ is in $\operatorname{Hom}(U, W)$ as a function of $\xi$. This is not the same notion as linearity on the product vector space $U \times V$. For example, $\langle x, y \rangle \mapsto x + y$ is a linear mapping from $\mathbb{R}^2$ to $\mathbb{R}$, but it is not bilinear. If $y$ is held fixed, then the mapping $x \mapsto x + y$ is affine (translation through $y$), but it is not linear unless $y$ is 0. On the other hand, $\langle x, y \rangle \mapsto xy$ is a bilinear mapping from $\mathbb{R}^2$ to $\mathbb{R}$, but it is not linear. If $y$
is held fixed, then the mapping $x \mapsto yx$ is linear. But the sum of two ordered couples does not map to the sum of their images: $\langle x, y \rangle + \langle u, v \rangle = \langle x + u, y + v \rangle \mapsto (x + u)(y + v)$, which is not the sum of the images, $xy + uv$. Similarly, the scalar product $(\mathbf{x}, \mathbf{y}) = \sum_{1}^{n} x_i y_i$ is bilinear from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}$, as we observed in Section 2.

The linear meaning of bilinearity is partially explained in the following theorem.

Theorem 6.1. If $\omega: U \times V \to W$ is bilinear, then, by duality, $\omega$ is equivalent to a linear mapping from $U$ to $\operatorname{Hom}(V, W)$ and also to a linear mapping from $V$ to $\operatorname{Hom}(U, W)$.

Proof. For each fixed $\eta \in V$ let $\omega_\eta$ be the mapping $\xi \mapsto \omega(\xi, \eta)$. That is, $\omega_\eta(\xi) = \omega(\xi, \eta)$. Then $\omega_\eta \in \operatorname{Hom}(U, W)$ by the bilinear hypothesis. The mapping $\eta \mapsto \omega_\eta$ is thus from $V$ to $\operatorname{Hom}(U, W)$, and its linearity is due to the linearity of $\omega$ in $\eta$ when $\xi$ is held fixed:

$$\omega_{c\eta + d\zeta}(\xi) = \omega(\xi, c\eta + d\zeta) = c\,\omega(\xi, \eta) + d\,\omega(\xi, \zeta) = c\,\omega_\eta(\xi) + d\,\omega_\zeta(\xi),$$

so that $\omega_{c\eta + d\zeta} = c\,\omega_\eta + d\,\omega_\zeta$. Similarly, if we define $\omega^\xi$ by $\omega^\xi(\eta) = \omega(\xi, \eta)$, then $\xi \mapsto \omega^\xi$ is a linear mapping from $U$ to $\operatorname{Hom}(V, W)$. Conversely, if $\varphi: U \to \operatorname{Hom}(V, W)$ is linear, then the function $\omega$ defined by $\omega(\xi, \eta) = \varphi(\xi)(\eta)$ is bilinear. Moreover, $\omega^\xi = \varphi(\xi)$, so that $\varphi$ is the mapping $\xi \mapsto \omega^\xi$. □

We shall see that bilinearity occurs frequently. Sometimes the reinterpretation provided by the above theorem provides new insights; at other times it seems less helpful. For example, the composition map $\langle S, T \rangle \mapsto S \circ T$ is bilinear, and the corollary of Theorem 3.3, which in effect states that composition on the right by a fixed $T$ is a linear map, is simply part of an explicit statement of the bilinearity. But the linear map $T \mapsto$ composition by $T$ is a complicated object that we have no need for except in the case $W = \mathbb{R}$. On the other hand, the linear combination formula $\sum_{1}^{n} x_i \alpha_i$ and Theorem 1.2 do receive new illumination.

Theorem 6.2. The mapping $\omega(\mathbf{x}, \boldsymbol{\alpha}) = \sum_{1}^{n} x_i \alpha_i$ is bilinear from $\mathbb{R}^n \times V^n$ to $V$.
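The mapping $\boldsymbol{\alpha} \mapsto \omega_{\boldsymbol{\alpha}}$ is therefore a linear mapping from $V^n$ to $\operatorname{Hom}(\mathbb{R}^n, V)$, and, in fact, is an isomorphism.

Proof. The linearity of $\omega$ in $\mathbf{x}$ for a fixed $\boldsymbol{\alpha}$ was proved in Theorem 1.2, and its linearity in $\boldsymbol{\alpha}$ for a fixed $\mathbf{x}$ is seen in the same way. Then $\boldsymbol{\alpha} \mapsto \omega_{\boldsymbol{\alpha}}$ is linear by Theorem 6.1. Its bijectivity was implicit in Theorem 1.2. □

It should be remarked that we can use any finite index set $I$ just as well as the special set $\bar{n}$ and conclude that $\omega(\mathbf{x}, \boldsymbol{\alpha}) = \sum_{i \in I} x_i \alpha_i$ is bilinear from $\mathbb{R}^I \times V^I$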
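to $V$ and that $\boldsymbol{\alpha} \mapsto \omega_{\boldsymbol{\alpha}}$ is an isomorphism from $V^I$ to $\operatorname{Hom}(\mathbb{R}^I, V)$. Also note that $\omega_{\boldsymbol{\alpha}} = L_{\boldsymbol{\alpha}}$ in the terminology of Section 1.

Corollary. The scalar product $(\mathbf{x}, \mathbf{a}) = \sum_{1}^{n} x_i a_i$ is bilinear from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}$; therefore, $\mathbf{a} \mapsto \omega_{\mathbf{a}} = L_{\mathbf{a}}$ is an isomorphism from $\mathbb{R}^n$ to $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R})$.

Natural isomorphisms. We often find two vector spaces related to each other in such a way that a particular isomorphism between them is singled out. This phenomenon is hard to pin down in general terms but easy to describe by examples.

Duality is one source of such "natural" isomorphisms. For example, an $m \times n$ matrix $\{t_{ij}\}$ is a real-valued function of the two variables $\langle i, j \rangle$, and as such it is an element of the Cartesian space $\mathbb{R}^{\bar{m} \times \bar{n}}$. We can also view $\{t_{ij}\}$ as a sequence of $n$ column vectors in $\mathbb{R}^m$. This is the dual point of view where we hold $j$ fixed and obtain a function of $i$ for each $j$. From this point of view $\{t_{ij}\}$ is an element of $(\mathbb{R}^m)^n$. This correspondence between $\mathbb{R}^{\bar{m} \times \bar{n}}$ and $(\mathbb{R}^m)^n$ is clearly an isomorphism, and is an example of a natural isomorphism.

We review next the various ways of looking at Cartesian $n$-space itself. One standard way of defining an ordered $n$-tuplet is by induction. The ordered triplet $\langle x, y, z \rangle$ is defined as the ordered pair $\langle \langle x, y \rangle, z \rangle$, and the ordered $n$-tuplet $\langle x_1, \ldots, x_n \rangle$ is defined as $\langle \langle x_1, \ldots, x_{n-1} \rangle, x_n \rangle$. Thus we define $\mathbb{R}^n$ inductively by setting $\mathbb{R}^1 = \mathbb{R}$ and $\mathbb{R}^n = \mathbb{R}^{n-1} \times \mathbb{R}$.

The ordered $n$-tuplet can also be defined as the function on $\bar{n} = \{1, \ldots, n\}$ which assigns $x_i$ to $i$. Then $\langle x_1, \ldots, x_n \rangle = \{\langle 1, x_1 \rangle, \ldots, \langle n, x_n \rangle\}$, and Cartesian $n$-space is $\mathbb{R}^{\bar{n}} = \mathbb{R}^{\{1, \ldots, n\}}$.

Finally, we often wish to view Cartesian $(n + m)$-space as the Cartesian product of Cartesian $n$-space with Cartesian $m$-space, so we now take $\langle x_1, \ldots, x_{n+m} \rangle$ as $\langle \langle x_1, \ldots, x_n \rangle, \langle x_{n+1}, \ldots, x_{n+m} \rangle \rangle$ and $\mathbb{R}^{n+m}$ as $\mathbb{R}^n \times \mathbb{R}^m$.

Here again if we pair two different models for the same $n$-tuplet, we have an obvious natural isomorphism between the corresponding models for Cartesian $n$-space.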
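The passage from a matrix viewed as a function of the pair $\langle i, j \rangle$ to the same matrix viewed as an $n$-tuple of columns can be written out directly. The following is only an illustrative sketch: the dictionary-and-tuple representation and the names `to_columns` and `from_columns` are assumptions made for the example, and indices run from 1 to match the text.

```python
def to_columns(t, m, n):
    """View a function t on pairs (i, j) as an n-tuple of columns in R^m.

    Holding j fixed gives the j-th column, a function of i alone.
    """
    return tuple(tuple(t[(i, j)] for i in range(1, m + 1))
                 for j in range(1, n + 1))

def from_columns(columns):
    """Inverse direction: rebuild the function of two variables."""
    return {(i, j): entry
            for j, column in enumerate(columns, start=1)
            for i, entry in enumerate(column, start=1)}

# A hypothetical 2 x 3 matrix stored as a function of <i, j>.
t = {(1, 1): 1, (1, 2): 2, (1, 3): 3,
     (2, 1): 4, (2, 2): 5, (2, 3): 6}

columns = to_columns(t, 2, 3)
round_trip = from_columns(columns)
```

Neither direction involves any choice: the two functions simply regroup the same six numbers, which is the sense in which the correspondence is natural.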
Finally, the characteristic properties of Cartesian product spaces given in Theorems 3.6 and 3.7 yield natural isomorphisms. Theorem 3.6 says that an $n$-tuple of linear maps $\{T_i\}_1^n$ on a common domain $V$ is equivalent to a single $n$-tuple-valued map $T$, where $T(\xi) = \langle T_1(\xi), \ldots, T_n(\xi) \rangle$ for all $\xi \in V$. (This is duality again! $T_i(\xi)$ is a function of the two variables $i$ and $\xi$.) And it is not hard to see that this identification of $T$ with $\{T_i\}_1^n$ is an isomorphism from $\prod_i \operatorname{Hom}(V, W_i)$ to $\operatorname{Hom}(V, \prod_i W_i)$. Similarly, Theorem 3.7 identifies an $n$-tuple of linear maps $\{T_i\}_1^n$ into a common codomain $V$ with a single linear map $T$ of an $n$-tuple variable, and this identification is a natural isomorphism from $\prod_i \operatorname{Hom}(W_i, V)$ to $\operatorname{Hom}(\prod_i W_i, V)$.
An arbitrary isomorphism between two vector spaces identifies them in a transient way. For the moment we think of the vector spaces as representing the same abstract space, but only so long as the isomorphism is before us. If we shift to a different isomorphism between them, we obtain a new temporary identification. Natural isomorphisms, on the other hand, effect permanent identifications, and we think of paired objects as being two aspects of the same object in a deeper sense. Thus we think of a matrix as "being" either a sequence of row vectors, a sequence of column vectors, or a single function of two integer indices. We shall take a final look at this question at the end of Section 3 in the next chapter.

*We can now make the ultimate dissection of the theorems centering around the linear combination formula. Laws S1 through S3 state exactly that the scalar product $x\alpha$ is bilinear. More precisely, they state that the mapping $S: \langle x, \alpha \rangle \mapsto x\alpha$ from $\mathbb{R} \times W$ to $W$ is bilinear. In the language of Theorem 6.1, $x\alpha = \omega_\alpha(x)$, and from that theorem we conclude that the mapping $\alpha \mapsto \omega_\alpha$ is an isomorphism from $W$ to $\operatorname{Hom}(\mathbb{R}, W)$.

This isomorphism between $W$ and $\operatorname{Hom}(\mathbb{R}, W)$ extends to an isomorphism from $W^n$ to $(\operatorname{Hom}(\mathbb{R}, W))^n$, which in turn is naturally isomorphic to $\operatorname{Hom}(\mathbb{R}^n, W)$ by the second Cartesian product isomorphism. Thus $W^n$ is naturally isomorphic to $\operatorname{Hom}(\mathbb{R}^n, W)$; the mapping is $\boldsymbol{\alpha} \mapsto L_{\boldsymbol{\alpha}}$, where $L_{\boldsymbol{\alpha}}(\mathbf{x}) = \sum_{1}^{n} x_i \alpha_i$.

In particular, $\mathbb{R}^n$ is naturally isomorphic to the space $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R})$ of all linear functionals on $\mathbb{R}^n$, the $n$-tuple $\mathbf{a}$ corresponding to the functional $\omega_{\mathbf{a}}$ defined by $\omega_{\mathbf{a}}(\mathbf{x}) = \sum_{1}^{n} a_i x_i$.

Also, $(\mathbb{R}^m)^n$ is naturally isomorphic to $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$, and since $\mathbb{R}^{\bar{m} \times \bar{n}}$ is naturally isomorphic to $(\mathbb{R}^m)^n$, it follows that the spaces $\mathbb{R}^{\bar{m} \times \bar{n}}$ and $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ are naturally isomorphic. This is simply our natural association of a transformation $T$ in $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ to an $m \times n$ matrix $\{t_{ij}\}$.
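The association between a transformation in $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ and its matrix can be traced in both directions: the columns of the matrix are the images of the Kronecker functions, and the transformation is recovered from those columns as the linear combination map $L_{\boldsymbol{\alpha}}$. The sketch below assumes a hypothetical map `T` from $\mathbb{R}^3$ to $\mathbb{R}^2$; the function names are illustrative, and Python's 0-based indices replace the 1-based indices of the text.

```python
def delta(j, n):
    """The j-th Kronecker function on an index set of size n (0-based)."""
    return tuple(1 if i == j else 0 for i in range(n))

def skeleton(T, n):
    """The n-tuple of vectors T(delta^j): the columns of the matrix of T."""
    return [T(delta(j, n)) for j in range(n)]

def matrix_of(T, n):
    """Rows of the m x n matrix whose j-th column is T(delta^j)."""
    columns = skeleton(T, n)
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def linear_combination_map(columns):
    """L_alpha: x -> sum_j x_j * alpha_j, for an n-tuple alpha of vectors."""
    m = len(columns[0])
    def L(x):
        return tuple(sum(x[j] * columns[j][i] for j in range(len(columns)))
                     for i in range(m))
    return L

# A hypothetical element of Hom(R^3, R^2).
def T(x):
    x1, x2, x3 = x
    return (x1 + 2 * x2, 3 * x3 - x2)

matrix = matrix_of(T, 3)
L = linear_combination_map(skeleton(T, 3))
```

Here `L` agrees with `T` on every input, because a linear map is determined by its values on the Kronecker functions; only linearity of `T` is used, not its particular formula.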
CHAPTER 2

FINITE-DIMENSIONAL VECTOR SPACES

We have defined a vector space to be finite-dimensional if it has a finite spanning set. In this chapter we shall focus our attention on such spaces, although this restriction is unnecessary for some of our discussion. We shall see that we can assign to each finite-dimensional space $V$ a unique integer, called the dimension of $V$, which satisfies our intuitive requirements about dimensionality and which becomes a principal tool in the deeper explorations into the nature of such spaces. A number of "dimensional identities" are crucial in these further investigations. We shall find that the dual space of all linear functionals on $V$, $V^* = \operatorname{Hom}(V, \mathbb{R})$, plays a more satisfactory role in finite-dimensional theory than in the context of general vector spaces. (However, we shall see later in the book that when we add limit theory to our algebra, there are certain special infinite-dimensional vector spaces for which the dual space plays an equally important role.) A finite-dimensional space can be characterized as a vector space isomorphic to some Cartesian space $\mathbb{R}^n$, and such an isomorphism allows a transformation $T$ in $\operatorname{Hom} V$ to be "transferred" to $\mathbb{R}^n$, whereupon it acquires a matrix. The theory of linear transformations on such spaces is therefore mirrored completely by the theory of matrices. In this chapter we shall push much deeper into the nature of this relationship than we did in Chapter 1. We also include a section on matrix computations, a brief section describing the trace and determinant functions, and a short discussion of the diagonalization of a quadratic form.

1. BASES

Consider again a fixed finite indexed set of vectors $\boldsymbol{\alpha} = \{\alpha_i : i \in I\}$ in $V$ and the corresponding linear combination map $L_{\boldsymbol{\alpha}}: \mathbf{x} \mapsto \sum x_i \alpha_i$ from $\mathbb{R}^I$ to $V$ having $\boldsymbol{\alpha}$ as skeleton.

Definition. The finite indexed set $\{\alpha_i : i \in I\}$ is independent if the above mapping $L_{\boldsymbol{\alpha}}$ is injective, and $\{\alpha_i\}$ is a basis for $V$ if $L_{\boldsymbol{\alpha}}$ is an isomorphism (onto $V$).
In this situation we call $\{\alpha_i : i \in I\}$ an ordered basis or frame if $I = \bar{n} = \{1, \ldots, n\}$ for some positive integer $n$. Thus $\{\alpha_i : i \in I\}$ is a basis if and only if for each $\xi \in V$ there exists a unique indexed "coefficient" set $\mathbf{x} = \{x_i : i \in I\} \in \mathbb{R}^I$ such that $\xi = \sum x_i \alpha_i$. The
numbers $x_i$ always exist because $\{\alpha_i : i \in I\}$ spans $V$, and $\mathbf{x}$ is unique because $L_{\boldsymbol{\alpha}}$ is injective.

For example, we can check directly that $\mathbf{b}^1 = \langle 2, 1 \rangle$ and $\mathbf{b}^2 = \langle 1, -3 \rangle$ form a basis for $\mathbb{R}^2$. The problem is to show that for each $\mathbf{y} \in \mathbb{R}^2$ there is a unique $\mathbf{x}$ such that

$$\mathbf{y} = \sum_{1}^{2} x_i \mathbf{b}^i = x_1 \langle 2, 1 \rangle + x_2 \langle 1, -3 \rangle = \langle 2x_1 + x_2,\; x_1 - 3x_2 \rangle.$$

Since this vector equation is equivalent to the two scalar equations $y_1 = 2x_1 + x_2$ and $y_2 = x_1 - 3x_2$, we can find the unique solution $x_1 = (3y_1 + y_2)/7$, $x_2 = (y_1 - 2y_2)/7$ by the usual elimination method of secondary school algebra.

The form of these definitions is dictated by our interpretation of the linear combination formula as a linear mapping. The more usual definition of independence is a corollary.

Lemma 1.1. The independence of the finite indexed set $\{\alpha_i : i \in I\}$ is equivalent to the property that $\sum_I x_i \alpha_i = 0$ only if all the coefficients $x_i$ are 0.

Proof. This is the property that the null space of $L_{\boldsymbol{\alpha}}$ consist only of 0, and is thus equivalent to the injectivity of $L_{\boldsymbol{\alpha}}$, that is, to the independence of $\{\alpha_i\}$, by Lemma 1.1 of Chapter 1. □

If $\{\alpha_i\}_1^n$ is an ordered basis (frame) for $V$, the unique $n$-tuple $\mathbf{x}$ such that $\xi = \sum_{1}^{n} x_i \alpha_i$ is called the coordinate $n$-tuple of $\xi$ (with respect to the basis $\{\alpha_i\}$), and $x_i$ is the $i$th coordinate of $\xi$. We call $x_i \alpha_i$ (and sometimes $x_i$) the $i$th component of $\xi$. The mapping $L_{\boldsymbol{\alpha}}$ will be called a basis isomorphism, and its inverse $L_{\boldsymbol{\alpha}}^{-1}$, which assigns to each vector $\xi \in V$ its unique coordinate $n$-tuple $\mathbf{x}$, is a coordinate isomorphism. The linear functional $\xi \mapsto x_j$ is the $j$th coordinate functional; it is the composition of the coordinate isomorphism $\xi \mapsto \mathbf{x}$ with the $j$th coordinate projection $\mathbf{x} \mapsto x_j$ on $\mathbb{R}^n$. We shall see in Section 3 that the $n$ coordinate functionals form a basis for $V^* = \operatorname{Hom}(V, \mathbb{R})$.

In the above paragraph we took the index set $I$ to be $\bar{n} = \{1, \ldots, n\}$ and used the language of $n$-tuples.
The only difference for an arbitrary finite index set is that we speak of a coordinate function $\mathbf{x} = \{x_i : i \in I\}$ instead of a coordinate $n$-tuple.

Our first concern will be to show that every finite-dimensional (finitely spanned) vector space has a basis. We start with some remarks about indices. We note first that a finite indexed set $\{\alpha_i : i \in I\}$ can be independent only if the indexing is injective as a mapping into $V$, for if $\alpha_k = \alpha_l$, then $\sum x_i \alpha_i = 0$, where $x_k = 1$, $x_l = -1$, and $x_i = 0$ for the remaining indices. Also, if $\{\alpha_i : i \in I\}$ is independent and $J \subset I$, then $\{\alpha_i : i \in J\}$ is independent, since if $\sum_J x_i \alpha_i = 0$, and if we set $x_i = 0$ for $i \in I - J$, then $\sum_I x_i \alpha_i = 0$, and so each $x_j$ is 0.

A finite unindexed set is said to be independent if it is independent in some
(necessarily bijective) indexing. It will of course then be independent with respect to any bijective indexing. An arbitrary set is independent if every finite subset is independent. It follows that a set $A$ is dependent (not independent) if and only if there exist distinct elements $\alpha_1, \ldots, \alpha_n$ in $A$ and scalars $x_1, \ldots, x_n$ not all zero such that $\sum_{1}^{n} x_i \alpha_i = 0$. An unindexed basis would be defined in the obvious way. However, a set can always be regarded as being indexed, by itself if necessary!

Lemma 1.2. If $B$ is an independent subset of a vector space $V$ and $\beta$ is any vector not in the linear span $L(B)$, then $B \cup \{\beta\}$ is independent.

Proof. Otherwise there is a zero linear combination, $x\beta + \sum_{1}^{n} x_i \beta_i = 0$, where $\beta_1, \ldots, \beta_n$ are distinct elements of $B$ and the coefficients are not all 0. But then $x$ cannot be zero; if it were, the equation would contradict the independence of $B$. We can therefore divide by $x$ and solve for $\beta$, so that $\beta \in L(B)$, a contradiction. □

The reader will remember that we call a vector space $V$ finite-dimensional if it has a finite spanning set $\{\alpha_i\}_1^n$. We can use the above lemma to construct a basis for such a $V$ by choosing some of the $\alpha_i$'s. We simply run through the sequence $\{\alpha_i\}_1^n$ and choose those members that increase the linear span of the preceding choices. We end up with a spanning set since $\{\alpha_i\}_1^n$ spans, and our subsequence is independent at each step, by the lemma. In the same way we can extend an independent set $\{\beta_j\}_1^m$ to a basis by choosing some members of a spanning set $\{\alpha_i\}_1^n$. This procedure is intuitive, but it is messy to set up rigorously. We shall therefore proceed differently.

Theorem 1.1. Any minimal finite spanning set is a basis, and therefore any finite-dimensional vector space $V$ has a basis.
More generally, if $\{\beta_j : j \in J\}$ is a finite independent set and $\{\alpha_i : i \in I\}$ is a finite spanning set, and if $K$ is a smallest subset of $I$ such that $\{\beta_j\}_J \cup \{\alpha_i\}_K$ spans, then this collection is independent and a basis. Therefore, any finite independent subset of a finite-dimensional space can be extended to a basis.

Proof. It is sufficient to prove the second assertion, since it includes the first as a special case. If $\{\beta_j\}_J \cup \{\alpha_i\}_K$ is not independent, then there is a nontrivial zero linear combination $\sum_J y_j \beta_j + \sum_K x_i \alpha_i = 0$. If every $x_i$ were zero, this equation would contradict the independence of $\{\beta_j\}_J$. Therefore, some $x_k$ is not zero, and we can solve the equation for $\alpha_k$. That is, if we set $L = K - \{k\}$, then the linear span of $\{\beta_j\}_J \cup \{\alpha_i\}_L$ contains $\alpha_k$. It therefore includes the whole original spanning set and hence is $V$. But this contradicts the minimal nature of $K$, since $L$ is a proper subset of $K$. Consequently, $\{\beta_j\}_J \cup \{\alpha_i\}_K$ is independent. □

We next note that $\mathbb{R}^n$ itself has a very special basis. In the indexing map $i \mapsto \alpha_i$ the vector $\alpha_j$ corresponds to the index $j$, but under the linear combination map $\mathbf{x} \mapsto \sum x_i \alpha_i$ the vector $\alpha_j$ corresponds to the function $\delta^j$ which has the value 1 at $j$ and the value 0 elsewhere, so that $\sum_i \delta^j_i \alpha_i = \alpha_j$. This function
$\delta^j$ is called a Kronecker delta function. It is clearly the characteristic function $\chi_B$ of the one-point set $B = \{j\}$, and the symbol '$\delta^j$' is ambiguous, just as '$\chi_B$' is ambiguous; in each case the meaning depends on what domain is implicit from the context. We have already used the delta functions on $\mathbb{R}^n$ in proving Theorem 1.2 of Chapter 1.

Theorem 1.2. The Kronecker functions $\{\delta^j\}_{j=1}^{n}$ form a basis for $\mathbb{R}^n$.

Proof. Since $\sum_i x_i \delta^i(j) = x_j$ by the definition of $\delta^i$, we see that $\sum_{1}^{n} x_i \delta^i$ is the $n$-tuple $\mathbf{x}$ itself, so the linear combination mapping $L_\delta: \mathbf{x} \mapsto \sum_{1}^{n} x_i \delta^i$ is the identity mapping $\mathbf{x} \mapsto \mathbf{x}$, a trivial isomorphism. □

Among all possible indexed bases for $\mathbb{R}^n$, the Kronecker basis is thus singled out by the fact that its basis isomorphism is the identity; for this reason it is called the standard basis or the natural basis for $\mathbb{R}^n$. The same holds for $\mathbb{R}^I$ for any finite set $I$.

Finally, we shall draw some elementary conclusions from the existence of a basis.

Theorem 1.3. If $T \in \operatorname{Hom}(V, W)$ is an isomorphism and $\boldsymbol{\alpha} = \{\alpha_i : i \in I\}$ is a basis for $V$, then $\{T(\alpha_i) : i \in I\}$ is a basis for $W$.

Proof. By hypothesis $L_{\boldsymbol{\alpha}}$ is an isomorphism in $\operatorname{Hom}(\mathbb{R}^I, V)$, and so $T \circ L_{\boldsymbol{\alpha}}$ is an isomorphism in $\operatorname{Hom}(\mathbb{R}^I, W)$. Its skeleton $\{T(\alpha_i)\}$ is therefore a basis for $W$. □

We can view any basis $\{\alpha_i\}$ as the image of the standard basis $\{\delta^i\}$ under the basis isomorphism. Conversely, any isomorphism $\theta: \mathbb{R}^I \to V$ becomes a basis isomorphism for the basis $\alpha_j = \theta(\delta^j)$.

Theorem 1.4. If $X$ and $Y$ are complementary subspaces of a vector space $V$, then the union of a basis for $X$ and a basis for $Y$ is a basis for $V$. Conversely, if a basis for $V$ is partitioned into two sets, with linear spans $X$ and $Y$, respectively, then $X$ and $Y$ are complementary subspaces of $V$.

Proof. We prove only the first statement. If $\{\alpha_i : i \in J\}$ is a basis for $X$ and $\{\alpha_i : i \in K\}$ is a basis for $Y$, then it is clear that $\{\alpha_i : i \in J \cup K\}$ spans $V$, since its span includes both $X$ and $Y$, and so $X + Y = V$. Suppose then that $\sum_{J \cup K} x_i \alpha_i = 0$.
Setting $\xi = \sum_J x_i \alpha_i$ and $\eta = \sum_K x_i \alpha_i$, we see that $\xi \in X$, $\eta \in Y$, and $\xi + \eta = 0$. But then $\xi = \eta = 0$, since $X$ and $Y$ are complementary. And then $x_i = 0$ for $i \in J$ because $\{\alpha_i\}_J$ is independent, and $x_i = 0$ for $i \in K$ because $\{\alpha_i\}_K$ is independent. Therefore, $\{\alpha_i\}_{J \cup K}$ is a basis for $V$. We leave the converse argument as an exercise. □

Corollary. If $V = \bigoplus_{1}^{n} V_i$ and $B_i$ is a basis for $V_i$, then $B = \bigcup_{1}^{n} B_i$ is a basis for $V$.

Proof. We see from the theorem that $B_1 \cup B_2$ is a basis for $V_1 \oplus V_2$. Proceeding inductively we see that $\bigcup_{i=1}^{j} B_i$ is a basis for $\bigoplus_{i=1}^{j} V_i$ for $j = 2, \ldots, n$, and the corollary is the case $j = n$. □
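The intuitive construction described before Theorem 1.1 (run through a spanning sequence and keep each vector that enlarges the span of the vectors already chosen) is easy to carry out for subspaces of $\mathbb{R}^n$. The following is a minimal sketch, assuming that "enlarges the span" is tested by an exact rank computation with Gaussian elimination; the names `rank` and `sift` and the sample vectors are illustrative and do not come from the text.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors in R^n, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry here.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def sift(spanning, independent=()):
    """Extend an independent list by those spanning vectors that enlarge the span."""
    chosen = list(independent)
    for v in spanning:
        if rank(chosen + [v]) > rank(chosen):
            chosen.append(v)
    return chosen

# A redundant spanning sequence for R^3 (hypothetical data).
spanning = [(1, 1, 0), (2, 2, 0), (0, 1, 1), (1, 2, 1), (0, 0, 1)]
basis = sift(spanning)

# Extending the independent set {<1, 1, 1>} using the standard basis.
standard = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
extended = sift(standard, independent=[(1, 1, 1)])
```

By Lemma 1.2 the chosen list stays independent at every step, and it spans because every discarded vector already lies in the span of the earlier choices. This mirrors the informal argument in the text rather than the minimality argument used in the proof of Theorem 1.1.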
If we follow a coordinate isomorphism by a linear combination map, we get the mapping of the following existence theorem, which we state only in $n$-tuple form.

Theorem 1.5. If $\boldsymbol{\beta} = \{\beta_i\}_1^n$ is an ordered basis for the vector space $V$, and if $\{\alpha_i\}_1^n$ is any $n$-tuple of vectors in a vector space $W$, then there exists a unique $S \in \operatorname{Hom}(V, W)$ such that $S(\beta_i) = \alpha_i$ for $i = 1, \ldots, n$.

Proof. By hypothesis $L_{\boldsymbol{\beta}}$ is an isomorphism in $\operatorname{Hom}(\mathbb{R}^n, V)$, and so $S = L_{\boldsymbol{\alpha}} \circ (L_{\boldsymbol{\beta}})^{-1}$ is an element of $\operatorname{Hom}(V, W)$ such that $S(\beta_i) = L_{\boldsymbol{\alpha}}(\delta^i) = \alpha_i$. Conversely, if $S \in \operatorname{Hom}(V, W)$ is such that $S(\beta_i) = \alpha_i$ for all $i$, then $S \circ L_{\boldsymbol{\beta}}(\delta^i) = \alpha_i$ for all $i$, so that $S \circ L_{\boldsymbol{\beta}} = L_{\boldsymbol{\alpha}}$. Thus $S$ is uniquely determined as $L_{\boldsymbol{\alpha}} \circ (L_{\boldsymbol{\beta}})^{-1}$. □

It is natural to ask how the unique $S$ above varies with the $n$-tuple $\{\alpha_i\}$. The answer is: linearly and "isomorphically".

Theorem 1.6. Let $\{\beta_i\}_1^n$ be a fixed ordered basis for the vector space $V$, and for each $n$-tuple $\boldsymbol{\alpha} = \{\alpha_i\}_1^n$ chosen from the vector space $W$ let $S_{\boldsymbol{\alpha}} \in \operatorname{Hom}(V, W)$ be the unique transformation defined above. Then the map $\boldsymbol{\alpha} \mapsto S_{\boldsymbol{\alpha}}$ is an isomorphism from $W^n$ to $\operatorname{Hom}(V, W)$.

Proof. As above, $S_{\boldsymbol{\alpha}} = L_{\boldsymbol{\alpha}} \circ \theta^{-1}$, where $\theta$ is the basis isomorphism $L_{\boldsymbol{\beta}}$. Now we know from Theorem 6.2 of Chapter 1 that $\boldsymbol{\alpha} \mapsto L_{\boldsymbol{\alpha}}$ is an isomorphism from $W^n$ to $\operatorname{Hom}(\mathbb{R}^n, W)$, and composition on the right by the fixed coordinate isomorphism $\theta^{-1}$ is an isomorphism from $\operatorname{Hom}(\mathbb{R}^n, W)$ to $\operatorname{Hom}(V, W)$ by the corollary to Theorem 3.3 of Chapter 1. Composing these two isomorphisms gives us the theorem. □

*Infinite bases. Most vector spaces do not have finite bases, and it is natural to try to extend the above discussion to index sets $I$ that may be infinite. The Kronecker functions $\{\delta^i : i \in I\}$ have the same definitions, but they no longer span $\mathbb{R}^I$. By definition $f$ is a linear combination of the functions $\delta^i$ if and only if $f$ is of the form $\sum_{i \in I_1} c_i \delta^i$, where $I_1$ is a finite subset of $I$. But then $f = 0$ outside of $I_1$.
Conversely, if $f \in \mathbb{R}^I$ is 0 except on a finite set $I_1$, then $f = \sum_{i \in I_1} f(i)\delta^i$. The linear span of $\{\delta^i : i \in I\}$ is thus exactly the set of all functions of $\mathbb{R}^I$ that are zero except on a finite set. We shall designate this subspace $\mathbb{R}_I$.

If $\{\alpha_i : i \in I\}$ is an indexed set of vectors in $V$ and $f \in \mathbb{R}_I$, then the sum $\sum_{i \in I} f(i)\alpha_i$ becomes meaningful if we adopt the reasonable convention that the sum of an arbitrary number of 0's is 0. Then $\sum_{i \in I} f(i)\alpha_i = \sum_{i \in I_0} f(i)\alpha_i$, where $I_0$ is any finite subset of $I$ outside of which $f$ is zero. With this convention, $L_{\boldsymbol{\alpha}}: f \mapsto \sum_i f(i)\alpha_i$ is a linear map from $\mathbb{R}_I$ to $V$, as in Theorem 1.2 of Chapter 1. And with the same convention, $\sum_{i \in I} f(i)\alpha_i$ is an elegant expression for the general linear combination of the vectors $\alpha_i$. Instead of choosing a finite subset $I_1$ and numbers $c_i$ for just those indices $i$ in $I_1$, we define $c_i$ for all $i \in I$, but with the stipulation that $c_i = 0$ for all but a finite number of indices. That is, we take $\mathbf{c} = \{c_i : i \in I\}$ as a function in $\mathbb{R}_I$.
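The convention that a sum over an infinite index set is really a sum over a finite support has a direct computational counterpart: a function in $\mathbb{R}_I$ can be stored by recording only its nonzero values. The sketch below assumes a dictionary representation and takes $I$ to be the nonnegative integers; the helper names and the sample data are illustrative, not from the text.

```python
def delta(i):
    """The Kronecker function at index i, stored by its support."""
    return {i: 1}

def add(f, g):
    """Sum of two finitely supported functions, dropping zero values."""
    h = dict(f)
    for i, c in g.items():
        h[i] = h.get(i, 0) + c
    return {i: c for i, c in h.items() if c != 0}

def scale(c, f):
    """Scalar multiple of a finitely supported function."""
    return {i: c * a for i, a in f.items() if c * a != 0}

def expand(f):
    """Rebuild f as the finite sum of f(i) * delta^i over its support."""
    total = {}
    for i, c in f.items():
        total = add(total, scale(c, delta(i)))
    return total

def combine(f, alpha):
    """L_alpha(f): the sum of f(i) * alpha(i), taken over the support of f only."""
    return sum(c * alpha(i) for i, c in f.items())

# f is zero except at the indices 0, 3 and 10.
f = {0: 2, 3: -1, 10: 5}

# With alpha_i = t^i evaluated at t = 2, L_alpha(f) is the value at 2
# of the polynomial 2 - t^3 + 5 t^10.
value = combine(f, lambda i: 2 ** i)
```

Although the index set is infinite, every sum that is actually computed runs over finitely many indices, which is the sense in which the sums in this passage are finite despite appearances.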
We make the same definitions of independence and basis as before. Then $\{\alpha_i : i \in I\}$ is a basis for $V$ if and only if $L_{\boldsymbol{\alpha}}: \mathbb{R}_I \to V$ is an isomorphism, i.e., if and only if for each $\xi \in V$ there exists a unique $\mathbf{x} \in \mathbb{R}_I$ such that $\xi = \sum_i x_i \alpha_i$.

By using an axiom of set theory called the axiom of choice, it can be shown that every vector space has a basis in this sense and that any independent set can be extended to a basis. Then Theorems 1.4 and 1.5 hold with only minor changes in notation. In particular, if a basis for a subspace $M$ of $V$ is extended to a basis for $V$, then the linear span of the added part is a subspace $N$ complementary to $M$. Thus, in a purely algebraic sense, every subspace has complementary subspaces. We assume this fact in some of our exercises.

The above sums are always finite (despite appearances), and the above notion of basis is purely algebraic. However, infinite bases in this sense are not very useful in analysis, and we shall therefore concentrate for the present on spaces that have finite bases (i.e., are finite-dimensional). Then in one important context later on we shall discuss infinite bases where the sums are genuinely infinite by virtue of limit theory.

EXERCISES

1.1 Show by a direct computation that $\{\langle 1, -1 \rangle, \langle 0, 1 \rangle\}$ is a basis for $\mathbb{R}^2$.

1.2 The student must realize that the $i$th coordinate of a vector depends on the whole basis and not just on the $i$th basis vector. Prove this for the second coordinate of vectors in $\mathbb{R}^2$ using the standard basis and the basis of the above exercise.

1.3 Show that $\{\langle 1, 1 \rangle, \langle 1, 2 \rangle\}$ is a basis for $V = \mathbb{R}^2$. The basis isomorphism from $\mathbb{R}^2$ to $V$ is now from $\mathbb{R}^2$ to $\mathbb{R}^2$. Find its matrix. Find the matrix of the coordinate isomorphism. Compute the coordinates, with respect to this basis, of $\langle -1, 1 \rangle$, $\langle 0, 1 \rangle$, $\langle 2, 3 \rangle$.

1.4 Show that $\{\mathbf{b}^i\}_1^3$, where $\mathbf{b}^1 = \langle 1, 0, 0 \rangle$, $\mathbf{b}^2 = \langle 1, 1, 0 \rangle$, and $\mathbf{b}^3 = \langle 1, 1, 1 \rangle$, is a basis for $\mathbb{R}^3$.
1.5 In the above exercise find the three linear functionals $l_i$ that are the coordinate functionals with respect to the given basis. Since $\mathbf{x} = \sum_{1}^{3} l_i(\mathbf{x})\mathbf{b}^i$, finding the $l_i$ is equivalent to solving $\mathbf{x} = \sum_{1}^{3} y_i \mathbf{b}^i$ for the $y_i$'s in terms of $\mathbf{x} = \langle x_1, x_2, x_3 \rangle$.

1.6 Show that any set of polynomials no two of which have the same degree is independent.

1.7 Show that if $\{\alpha_i\}_1^n$ is an independent subset of $V$ and $T$ in $\operatorname{Hom}(V, W)$ is injective, then $\{T(\alpha_i)\}_1^n$ is an independent subset of $W$.

1.8 Show that if $T$ is any element of $\operatorname{Hom}(V, W)$ and $\{T(\alpha_i)\}_1^n$ is independent in $W$, then $\{\alpha_i\}_1^n$ is independent in $V$.
1.9 Later on we are going to call a vector space $V$ $n$-dimensional if every basis for $V$ contains exactly $n$ elements. If $V$ is the span of a single vector $\alpha$, so that $V = \mathbb{R}\alpha$, then $V$ is clearly one-dimensional. Let $\{V_i\}_1^n$ be a collection of one-dimensional subspaces of a vector space $V$, and choose a nonzero vector $\alpha_i$ in $V_i$ for each $i$. Prove that $\{\alpha_i\}_1^n$ is independent if and only if the subspaces $\{V_i\}_1^n$ are independent and that $\{\alpha_i\}_1^n$ is a basis if and only if $V = \bigoplus_1^n V_i$.
1.10 Finish the proof of Theorem 1.4.
1.11 Give a proof of Theorem 1.4 based on the existence of isomorphisms.
1.12 The reader would guess, and we shall prove in the next section, that every subspace of a finite-dimensional space is finite-dimensional. Prove now that a subspace $N$ of a finite-dimensional vector space $V$ is finite-dimensional if and only if it has a complement $M$. (Work from a combination of Theorems 1.1 and 1.4 and direct sum projections.)
1.13 Since $\{b^i\}_1^3 = \{\langle 1, 0, 0\rangle, \langle 1, 1, 0\rangle, \langle 1, 1, 1\rangle\}$ is a basis for $\mathbb{R}^3$, there is a unique $T$ in $\mathrm{Hom}(\mathbb{R}^3, \mathbb{R}^2)$ such that $T(b^1) = \langle 1, 0\rangle$, $T(b^2) = \langle 0, 1\rangle$, and $T(b^3) = \langle 1, 1\rangle$. Find the matrix of $T$. (Find $T(\delta^i)$ for $i = 1, 2, 3$.)
1.14 Find, similarly, the $S$ in $\mathrm{Hom}\,\mathbb{R}^3$ such that $S(b^i) = \delta^i$ for $i = 1, 2, 3$.
1.15 Show that the infinite sequence $\{t^n\}_0^\infty$ is a basis for the vector space of all polynomials.

2. DIMENSION

The concept of dimension rests on the fact that two different bases for the same space always contain the same number of elements. This number, which is then the number of elements in every basis for $V$, is called the dimension of $V$. It tells all there is to know about $V$ to within isomorphism: There exists an isomorphism between two spaces if and only if they have the same dimension. We shall consider only finite dimensions. If $V$ is not finite-dimensional, its dimension is an infinite cardinal number, a concept with which the reader is probably unfamiliar.

Lemma 2.1.
If $V$ is finite-dimensional and $T$ in $\mathrm{Hom}\,V$ is surjective, then $T$ is an isomorphism.

Proof. Let $n$ be the smallest number of elements that can span $V$. That is, there is some spanning set $\{\alpha_i\}_1^n$ and none with fewer than $n$ elements. Then $\{\alpha_i\}_1^n$ is a basis, by Theorem 1.1, and the linear combination map $\theta: x \mapsto \sum_1^n x_i\alpha_i$ is accordingly a basis isomorphism. But $\{\beta_i\}_1^n = \{T(\alpha_i)\}_1^n$ also spans, since $T$ is surjective, and so $T \circ \theta$ is also a basis isomorphism, for the same reason. Then $T = (T \circ \theta) \circ \theta^{-1}$ is an isomorphism. □

Theorem 2.1. If $V$ is finite-dimensional, then all bases for $V$ contain the same number of elements.

Proof. Two bases with $n$ and $m$ elements determine basis isomorphisms $\theta: \mathbb{R}^n \to V$ and $\varphi: \mathbb{R}^m \to V$. Suppose that $m < n$ and, viewing $\mathbb{R}^n$ as $\mathbb{R}^m \times \mathbb{R}^{n-m}$,
let $\pi$ be the projection of $\mathbb{R}^n$ onto $\mathbb{R}^m$,
$$\pi(\langle x_1, \ldots, x_m, \ldots, x_n\rangle) = \langle x_1, \ldots, x_m\rangle.$$
Since $T = \theta^{-1} \circ \varphi$ is an isomorphism from $\mathbb{R}^m$ to $\mathbb{R}^n$ and $T \circ \pi: \mathbb{R}^n \to \mathbb{R}^n$ is therefore surjective, it follows from the lemma that $T \circ \pi$ is an isomorphism. Then $\pi = T^{-1} \circ (T \circ \pi)$ is an isomorphism. But it isn't, because $\pi(\delta^n) = 0$, and we have a contradiction. Therefore no basis can be smaller than any other basis. □

The integer that is the number of elements in every basis for $V$ is of course called the dimension of $V$, and we designate it $d(V)$. Since the standard basis $\{\delta^i\}_1^n$ for $\mathbb{R}^n$ has $n$ elements, we see that $\mathbb{R}^n$ is $n$-dimensional in this precise sense.

Corollary. Two finite-dimensional vector spaces are isomorphic if and only if they have the same dimension.

Proof. If $T$ is an isomorphism from $V$ to $W$ and $B$ is a basis for $V$, then $T[B]$ is a basis for $W$ by Theorem 1.3. Therefore $d(V) = \#B = \#T[B] = d(W)$, where $\#A$ is the number of elements in $A$. Conversely, if $d(V) = d(W) = n$, then $V$ and $W$ are each isomorphic to $\mathbb{R}^n$ and so to each other. □

Theorem 2.2. Every subspace $M$ of a finite-dimensional vector space $V$ is finite-dimensional.

Proof. Let $\mathcal{A}$ be the family of finite independent subsets of $M$. By Theorem 1.1, if $A \in \mathcal{A}$, then $A$ can be extended to a basis for $V$, and so $\#A \le d(V)$. Thus $\{\#A : A \in \mathcal{A}\}$ is a finite set of integers, and we can choose $B \in \mathcal{A}$ such that $n = \#B$ is the maximum of this finite set. But then $L(B) = M$, because otherwise for any $\alpha \in M - L(B)$ we have $B \cup \{\alpha\} \in \mathcal{A}$, by Lemma 1.2, and $\#(B \cup \{\alpha\}) = n + 1$, contradicting the maximal nature of $n$. Thus $M$ is finitely spanned. □

Corollary. Every subspace $M$ of a finite-dimensional space $V$ has a complement.

Proof. Use Theorem 1.1 to extend a basis for $M$ to a basis for $V$, and let $N$ be the linear span of the added vectors. Then apply Theorem 1.4. □

Dimensional identities. We now prove two basic dimensional identities. We will always assume $V$ finite-dimensional.

Lemma 2.2.
If $V_1$ and $V_2$ are complementary subspaces of $V$, then $d(V) = d(V_1) + d(V_2)$. More generally, if $V = \bigoplus_1^n V_i$ then $d(V) = \sum_1^n d(V_i)$.

Proof. This follows at once from Theorem 1.4 and its corollary. □

Theorem 2.3. If $U$ and $W$ are subspaces of a finite-dimensional vector space, then
$$d(U + W) + d(U \cap W) = d(U) + d(W).$$
Proof. Let $V$ be a complement of $U \cap W$ in $U$. We start by showing that then $V$ is also a complement of $W$ in $U + W$. First,
$$V + W = V + ((U \cap W) + W) = (V + (U \cap W)) + W = U + W.$$
We have used the obvious fact that the sum of a vector space and a subspace is the vector space. Next,
$$V \cap W = (V \cap U) \cap W = V \cap (U \cap W) = \{0\},$$
because $V$ is a complement of $U \cap W$ in $U$. We thus have both $V + W = U + W$ and $V \cap W = \{0\}$, and so $V$ is a complement of $W$ in $U + W$ by the corollary of Lemma 5.2 of Chapter 1. The theorem is now a corollary of the above lemma. We have
$$d(U) + d(W) = (d(U \cap W) + d(V)) + d(W) = d(U \cap W) + (d(V) + d(W)) = d(U \cap W) + d(U + W). \qquad \square$$

Theorem 2.4. Let $V$ be finite-dimensional, and let $W$ be any vector space. Let $T \in \mathrm{Hom}(V, W)$ have null space $N$ (in $V$) and range $R$ (in $W$). Then $R$ is finite-dimensional and $d(V) = d(N) + d(R)$.

Proof. Let $U$ be a complement of $N$ in $V$. Then we know that $T \restriction U$ is an isomorphism onto $R$. (See Theorem 5.3 of Chapter 1.) Therefore, $R$ is finite-dimensional and $d(R) + d(N) = d(U) + d(N) = d(V)$ by our first identity. □

Corollary. If $W$ is finite-dimensional and $d(W) = d(V)$, then $T$ is injective if and only if it is surjective, so that in this case injectivity, surjectivity, and bijectivity are all equivalent.

Proof. $T$ is surjective if and only if $R = W$. But this is equivalent to $d(R) = d(W)$, and if $d(W) = d(V)$, then the theorem shows this in turn to be equivalent to $d(N) = 0$, that is, to $N = \{0\}$. □

Theorem 2.5. If $d(V) = n$ and $d(W) = m$, then $\mathrm{Hom}(V, W)$ is finite-dimensional and its dimension is $mn$.

Proof. By Theorem 1.6, $\mathrm{Hom}(V, W)$ is isomorphic to $W^n$, which is the direct sum of the $n$ subspaces isomorphic to $W$ under the injections $\theta_i$ for $i = 1, \ldots, n$. The dimension of $W^n$ is therefore $\sum_1^n m = mn$ by Lemma 2.2. □

Another proof of Theorem 2.5 will be available in Section 4.

EXERCISES

2.1 Prove that if $d(V) = n$, then any spanning subset of $n$ elements is a basis.
2.2 Prove that if $d(V) = n$, then any independent subset of $n$ elements is a basis.
2.3 Show that if $d(V) = n$ and $W$ is a subspace of the same dimension, then $W = V$.
2.4 Prove by using dimensional identities that if $f$ is a nonzero linear functional on an $n$-dimensional space $V$, then its null space has dimension $n - 1$.
2.5 Prove by using dimensional identities that if $f$ is a linear functional on a finite-dimensional space $V$, and if $\alpha$ is a vector not in its null space $N$, then $V = N \oplus \mathbb{R}\alpha$.
2.6 Given that $N$ is an $(n - 1)$-dimensional subspace of an $n$-dimensional vector space $V$, show that $N$ is the null space of a linear functional.
2.7 Let $X$ and $Y$ be subspaces of a finite-dimensional vector space $V$, and suppose that $T$ in $\mathrm{Hom}(V, W)$ has null space $N = X \cap Y$. Show that $T[X + Y] = T[X] \oplus T[Y]$, and then deduce Theorem 2.3 from Lemma 2.2 and Theorem 2.4. This proof still depends on the existence of a $T$ having $N = X \cap Y$ as its null space. Do we know of any such $T$?
2.8 Show that if $V$ is finite-dimensional and $S, T \in \mathrm{Hom}\,V$, then $S \circ T = I \Rightarrow T$ is invertible. Show also that $T \circ S = I \Rightarrow T$ is invertible.
2.9 A subspace $N$ of a vector space $V$ has finite codimension $n$ if the quotient space $V/N$ is finite-dimensional, with dimension $n$. Show that a subspace $N$ has finite codimension $n$ if and only if $N$ has a complementary subspace $M$ of dimension $n$. (Move a basis for $V/N$ back into $V$.) Do not assume $V$ to be finite-dimensional.
2.10 Show that if $N_1$ and $N_2$ are subspaces of a vector space $V$ with finite codimensions, then $N = N_1 \cap N_2$ has finite codimension and $\mathrm{cod}(N) \le \mathrm{cod}(N_1) + \mathrm{cod}(N_2)$. (Consider the mapping $\xi \mapsto \langle \bar\xi_1, \bar\xi_2\rangle$, where $\bar\xi_i$ is the coset of $N_i$ containing $\xi$.)
2.11 In the above exercise, suppose that $\mathrm{cod}(N_1) = \mathrm{cod}(N_2)$, that is, $d(V/N_1) = d(V/N_2)$. Prove that $d(N_1/N) = d(N_2/N)$.
2.12 Given nonzero vectors $\beta$ in $V$ and $f$ in $V^*$ such that $f(\beta) \ne 0$, show that some scalar multiple of the mapping $\xi \mapsto f(\xi)\beta$ is a projection. Prove that any projection having a one-dimensional range arises in this way.
2.13 We know that the choice of an origin $O$ in Euclidean 3-space $\mathbb{E}^3$ induces a vector space structure in $\mathbb{E}^3$ (under the correspondence $X \mapsto \overrightarrow{OX}$) and that this vector space is three-dimensional. Show that a geometric plane through $O$ becomes a two-dimensional subspace.
2.14 An $m$-dimensional plane $M$ is a translate $N + \alpha_0$ of an $m$-dimensional subspace $N$. Let $\{\beta_i\}_1^m$ be any basis of $N$, and set $\alpha_i = \beta_i + \alpha_0$. Show that $M$ is exactly the set of linear combinations $\sum_{i=0}^m x_i\alpha_i$ such that $\sum_0^m x_i = 1$.
2.15 Show that Exercise 2.14 is a corollary of Exercise 4.14 of Chapter 1.
2.16 Show, conversely, that if a plane $M$ is the affine span of $m + 1$ elements, then its dimension is $\le m$.
2.17 From the above two exercises concoct a direct definition of the dimension of an affine subspace.
2.18 Write a small essay suggested by the following definition. An $(m + 1)$-tuple $\{\alpha_i\}_0^m$ is affinely independent if the conditions $\sum_0^m x_i\alpha_i = 0$ and $\sum_0^m x_i = 0$ together imply that $x_i = 0$ for all $i$.
2.19 A polynomial on a vector space $V$ is a real-valued function on $V$ which can be represented as a finite sum of finite products of linear functionals. Define the degree of a polynomial; define a homogeneous polynomial of degree $k$. Show that the set of homogeneous polynomials of degree $k$ is a vector space $X_k$.
2.20 Continuing the above exercise, show that if $k_1 < k_2 < \cdots < k_n$, then the vector spaces $\{X_{k_i}\}_1^n$ are independent subspaces of the vector space of all polynomials. [Assume that a polynomial $p(t)$ of a real variable can be the zero function only if all its coefficients are 0. For any polynomial $P$ on $V$ consider the polynomials $p_\alpha(t) = P(t\alpha)$.]
2.21 Let $\langle \alpha, \beta\rangle$ be a basis for the two-dimensional space $V$, and let $\langle \lambda, \mu\rangle$ be the corresponding coordinate projections (dual basis in $V^*$). Show that every polynomial on $V$ "is a polynomial in the two variables $\lambda$ and $\mu$".
2.22 Let $\langle \alpha, \beta\rangle$ be a basis for a two-dimensional vector space $V$, and let $\langle \lambda, \mu\rangle$ be the corresponding coordinate projections (dual basis for $V^*$). Show that $\langle \lambda^2, \lambda\mu, \mu^2\rangle$ is a basis for the vector space of homogeneous polynomials on $V$ of degree 2. Similarly, compute the dimension of the space of homogeneous polynomials of degree 3 on a two-dimensional vector space.
2.23 Let $V$ and $W$ be two-dimensional vector spaces, and let $F$ be a mapping from $V$ to $W$. Using coordinate systems, define the notion of $F$ being quadratic and then show that it is independent of coordinate systems. Generalize the above exercise to higher dimensions and also to higher degrees.
2.24 Now let $F: V \to W$ be a mapping between two-dimensional spaces such that for any $u, v \in V$ and any $l \in W^*$, $l(F(tu + v))$ is a quadratic function of $t$, that is, of the form $at^2 + bt + c$.
Show that $F$ is quadratic according to your definition in the above exercises.

3. THE DUAL SPACE

Although throughout this section all spaces will be assumed finite-dimensional, many of the definitions and properties are valid for infinite-dimensional spaces as well. But for such spaces there is a difference between purely algebraic situations and situations in which algebra is mixed with hypotheses of continuity. One of the blessings of finite dimensionality is the absence of this complication. As the reader has probably surmised from the number of special linear functionals we have met, particularly the coordinate functionals, the space $\mathrm{Hom}(V, \mathbb{R})$ of all linear functionals on $V$ plays a special role.
Definition. The dual space (or conjugate space) $V^*$ of the vector space $V$ is the vector space $\mathrm{Hom}(V, \mathbb{R})$ of all linear mappings from $V$ to $\mathbb{R}$. Its elements are called linear functionals.

We are going to see that in a certain sense $V$ is in turn the dual space of $V^*$ ($V$ and $(V^*)^*$ are naturally isomorphic), so that the two spaces are symmetrically related. We shall briefly study the notion of annihilation (orthogonality) which has its origins in this setting, and then see that there is a natural isomorphism between $\mathrm{Hom}(V, W)$ and $\mathrm{Hom}(W^*, V^*)$. This gives the mathematician a new tool to use in studying a linear transformation $T$ in $\mathrm{Hom}(V, W)$; the relationship between $T$ and its image $T^*$ exposes new properties of $T$ itself.

Dual bases. At the outset one naturally wonders how big a space $V^*$ is, and we settle the question immediately.

Theorem 3.1. Let $\{\beta_i\}_1^n$ be an ordered basis for $V$, and let $\varepsilon_j$ be the corresponding $j$th coordinate functional on $V$: $\varepsilon_j(\xi) = x_j$, where $\xi = \sum_1^n x_i\beta_i$. Then $\{\varepsilon_j\}_1^n$ is an ordered basis for $V^*$.

Proof. Let us first make the proof by a direct elementary calculation.

a) Independence. Suppose that $\sum_1^n c_j\varepsilon_j = 0$, that is, $\sum_1^n c_j\varepsilon_j(\xi) = 0$ for all $\xi \in V$. Taking $\xi = \beta_i$ and remembering that the coordinate $n$-tuple of $\beta_i$ is $\delta^i$, we see that the above equation reduces to $c_i = 0$, and this for all $i$. Therefore, $\{\varepsilon_j\}_1^n$ is independent.

b) Spanning. First note that the basis expansion $\xi = \sum x_i\beta_i$ can be rewritten $\xi = \sum \varepsilon_i(\xi)\beta_i$. Then for any $\lambda \in V^*$ we have $\lambda(\xi) = \sum_1^n l_i\varepsilon_i(\xi)$, where we have set $l_i = \lambda(\beta_i)$. That is, $\lambda = \sum l_i\varepsilon_i$. This shows that $\{\varepsilon_j\}_1^n$ spans $V^*$, and, together with (a), that it is a basis. □

Definition. The basis $\{\varepsilon_j\}$ for $V^*$ is called the dual of the basis $\{\beta_i\}$ for $V$.

As usual, one of our fundamental isomorphisms is lurking behind all this, but we shall leave its exposure to an exercise.

Corollary. $d(V^*) = d(V)$.

The three equations
$$\xi = \sum \varepsilon_i(\xi)\beta_i, \qquad \lambda = \sum \lambda(\beta_i)\varepsilon_i, \qquad \lambda(\xi) = \sum \lambda(\beta_i)\cdot\varepsilon_i(\xi)$$
are worth looking at.
The first two are symmetrically related, each presenting the basis expansion of a vector with its coefficients computed by applying the corresponding element of the dual basis to the vector. The third is symmetric itself between $\xi$ and $\lambda$.

Since a finite-dimensional space $V$ and its dual space $V^*$ have the same dimension, they are of course isomorphic. In fact, each basis for $V$ defines an isomorphism, for we have the associated coordinate isomorphism from $V$ to $\mathbb{R}^n$, the dual basis isomorphism from $\mathbb{R}^n$ to $V^*$, and therefore the composite isomorphism from $V$ to $V^*$. This isomorphism varies with the basis, however, and there is in general no natural isomorphism between $V$ and $V^*$.

It is another matter with Cartesian space $\mathbb{R}^n$ because it has a standard basis, and therefore a standard isomorphism with its dual space $(\mathbb{R}^n)^*$. It is not hard to see that this is the isomorphism $a \mapsto L_a$, where $L_a(x) = \sum_1^n a_ix_i$, that we discussed in Section 1.6. We can therefore feel free to identify $\mathbb{R}^n$ with $(\mathbb{R}^n)^*$, only keeping in mind that when we think of an $n$-tuple $a$ as a linear functional, we mean the functional $L_a(x) = \sum_1^n a_ix_i$.

The second conjugate space. Despite the fact that $V$ and $V^*$ are not naturally isomorphic in general, we shall now see that $V$ is naturally isomorphic to $V^{**} = (V^*)^*$.

Theorem 3.2. The function $\omega: V \times V^* \to \mathbb{R}$ defined by $\omega(\xi, f) = f(\xi)$ is bilinear, and the mapping $\xi \mapsto \omega_\xi$ from $V$ to $V^{**}$ is a natural isomorphism.

Proof. In this context we generally set $\xi^{**} = \omega_\xi$, so that $\xi^{**}$ is defined by $\xi^{**}(f) = f(\xi)$ for all $f \in V^*$. The bilinearity of $\omega$ should be clear, and Theorem 6.1 of Chapter 1 therefore applies. The reader might like to run through a direct check of the linearity of $\xi \mapsto \xi^{**}$ starting with $(c_1\xi_1 + c_2\xi_2)^{**}(f)$.

There still is the question of the injectivity of this mapping. If $\alpha \ne 0$, we can find $f \in V^*$ so that $f(\alpha) \ne 0$. One way is to make $\alpha$ the first vector of an ordered basis and to take $f$ as the first functional in the dual basis; then $f(\alpha) = 1$. Since $\alpha^{**}(f) = f(\alpha) \ne 0$, we see in particular that $\alpha^{**} \ne 0$. The mapping $\xi \mapsto \xi^{**}$ is thus injective, and it is then bijective by the corollary of Theorem 2.4. □

If we think of $V^{**}$ as being naturally identified with $V$ in this way, the two spaces $V$ and $V^*$ are symmetrically related to each other. Each is the dual of the other. In the expression '$f(\xi)$' we think of both symbols as variables and then hold one or the other fixed for the two interpretations.
In such a situation we often use a more symmetric symbolism, such as $(\xi, f)$, to indicate our intention to treat both symbols as variables.

Lemma 3.1. If $\{\lambda_i\}$ is the basis in $V^*$ dual to the basis $\{\alpha_i\}$ in $V$, then $\{\alpha_i^{**}\}$ is the basis in $V^{**}$ dual to the basis $\{\lambda_i\}$ in $V^*$.

Proof. We have $\alpha_i^{**}(\lambda_j) = \lambda_j(\alpha_i) = \delta_j^i$, which shows that $\alpha_i^{**}$ is the $i$th coordinate projection. In case the reader has forgotten, the basis expansion $f = \sum c_j\lambda_j$ implies that $\alpha_i^{**}(f) = f(\alpha_i) = (\sum c_j\lambda_j)(\alpha_i) = c_i$, so that $\alpha_i^{**}$ is the mapping $f \mapsto c_i$. □

Annihilator subspaces. It is in this dual situation that orthogonality first naturally appears. However, we shall save the term 'orthogonal' for the latter context in which $V$ and $V^*$ have been identified through a scalar product, and shall speak here of the annihilator of a set rather than its orthogonal complement.
Definition. If $A \subset V$, the annihilator of $A$, $A^\circ$, is the set of all $f$ in $V^*$ such that $f(\alpha) = 0$ for all $\alpha$ in $A$. Similarly, if $A \subset V^*$, then
$$A^\circ = \{\alpha \in V : f(\alpha) = 0 \text{ for all } f \in A\}.$$
If we view $V$ as $(V^*)^*$, the second definition is included in the first.

The following properties are easily established and will be left as exercises:

1) $A^\circ$ is always a subspace.
2) $A \subset B \Rightarrow B^\circ \subset A^\circ$.
3) $(L(A))^\circ = A^\circ$.
4) $(A \cup B)^\circ = A^\circ \cap B^\circ$.
5) $A \subset A^{\circ\circ}$.

We now add one more crucial dimensional identity to those of the last section.

Theorem 3.3. If $W$ is a subspace of $V$, then $d(V) = d(W) + d(W^\circ)$.

Proof. Let $\{\beta_i\}_1^m$ be a basis for $W$, and extend it to a basis $\{\beta_i\}_1^n$ for $V$. Let $\{\lambda_j\}_1^n$ be the dual basis in $V^*$. We claim that then $\{\lambda_j\}_{m+1}^n$ is a basis for $W^\circ$. First, if $j > m$, then $\lambda_j(\beta_i) = 0$ for $i = 1, \ldots, m$, and so $\lambda_j$ is in $W^\circ$ by (3) above. Thus $\{\lambda_{m+1}, \ldots, \lambda_n\} \subset W^\circ$. Now suppose that $f \in W^\circ$, and let $f = \sum_{j=1}^n c_j\lambda_j$ be its (dual) basis expansion. Then for each $i \le m$ we have $c_i = f(\beta_i) = 0$, since $\beta_i \in W$ and $f \in W^\circ$; therefore, $f = \sum_{m+1}^n c_j\lambda_j$. Thus every $f$ in $W^\circ$ is in the span of $\{\lambda_j\}_{m+1}^n$. Altogether, we have shown that $W^\circ$ is the span of $\{\lambda_j\}_{m+1}^n$, as claimed. Then $d(W^\circ) + d(W) = (n - m) + m = n = d(V)$, and we are done. □

Corollary. $A^{\circ\circ} = L(A)$ for every subset $A \subset V$.

Proof. Since $(L(A))^\circ = A^\circ$, we have $d(L(A)) + d(A^\circ) = d(V)$, by the theorem. Also $d(A^\circ) + d(A^{\circ\circ}) = d(V^*) = d(V)$. Thus $d(A^{\circ\circ}) = d(L(A))$, and since $L(A) \subset A^{\circ\circ}$, by (5) above, we have $L(A) = A^{\circ\circ}$. □

The adjoint of T. We shall now see that with every $T$ in $\mathrm{Hom}(V, W)$ there is naturally associated an element of $\mathrm{Hom}(W^*, V^*)$ which we call the adjoint of $T$ and designate $T^*$. One consequence of the intimate relationship between $T$ and $T^*$ is that the range of $T^*$ is exactly the annihilator of the null space of $T$. Combined with our dimensional identities, this implies that the ranges of $T$ and $T^*$ have the same dimension.
And later on, after we have established the connection between matrix representations of $T$ and $T^*$, this turns into the very mysterious fact that the dimension of the linear span of the row vectors of an $m$-by-$n$ matrix is the same as the dimension of the linear span of its column vectors, which gives us our notion of the rank of a matrix. In Chapter 5 we shall study a situation (Hilbert space) in which we are given a fixed fundamental isomorphism between $V$ and $V^*$. If $T$ is in $\mathrm{Hom}\,V$, then of course $T^*$ is in $\mathrm{Hom}\,V^*$, and we can use this isomorphism to "transfer" $T^*$ into $\mathrm{Hom}\,V$. But now $T$ can be compared with its (transferred) adjoint $T^*$, and they may be equal. That is, $T$ may be self-adjoint. It turns out that the self-adjoint transformations are "nice" ones, as we shall see for ourselves in simple cases, and also, fortunately, that many important linear maps arising from theoretical physics are self-adjoint.

If $T \in \mathrm{Hom}(V, W)$ and $l \in W^*$, then of course $l \circ T \in V^*$. Moreover, the mapping $l \mapsto l \circ T$ ($T$ fixed) is a linear mapping from $W^*$ to $V^*$ by the corollary to Theorem 3.3 of Chapter 1. This mapping is called the adjoint of $T$ and is designated $T^*$. Thus $T^* \in \mathrm{Hom}(W^*, V^*)$ and $T^*(l) = l \circ T$ for all $l \in W^*$.

Theorem 3.4. The mapping $T \mapsto T^*$ is an isomorphism from the vector space $\mathrm{Hom}(V, W)$ to the vector space $\mathrm{Hom}(W^*, V^*)$. Also $(T \circ S)^* = S^* \circ T^*$ under the relevant hypotheses on domains and codomains.

Proof. Everything we have said above through the linearity of $T \mapsto T^*$ is a consequence of the bilinearity of $\omega(l, T) = l \circ T$. The map we have called $T^*$ is simply $\omega_T$, and the linearity of $T \mapsto T^*$ thus follows from Theorem 6.1 of Chapter 1. Again the reader might benefit from a direct linearity check, beginning with $(c_1T_1 + c_2T_2)^*(l)$.

To see that $T \mapsto T^*$ is injective, we take any $T \ne 0$ and choose $\alpha \in V$ so that $T(\alpha) \ne 0$. We then choose $l \in W^*$ so that $l(T(\alpha)) \ne 0$. Since $l(T(\alpha)) = (T^*(l))(\alpha)$, we have verified that $T^* \ne 0$.

Next, if $d(V) = m$ and $d(W) = n$, then also $d(V^*) = m$ and $d(W^*) = n$ by the corollary of Theorem 3.1, and $d(\mathrm{Hom}(V, W)) = mn = d(\mathrm{Hom}(W^*, V^*))$ by Theorem 2.5. The injective map $T \mapsto T^*$ is thus an isomorphism (by the corollary of Theorem 2.4).

Finally,
$$(T \circ S)^*l = l \circ (T \circ S) = (l \circ T) \circ S = S^*(l \circ T) = S^*(T^*(l)) = (S^* \circ T^*)l,$$
so that $(T \circ S)^* = S^* \circ T^*$. □

The reader would probably guess that $T^{**}$ becomes identified with $T$ under the identification of $V$ with $V^{**}$. This is so, and it is actually the reason for calling the isomorphism $\xi \mapsto \xi^{**}$ natural. We shall return to this question at the end of the section.
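The identity $(T \circ S)^* = S^* \circ T^*$ can also be checked on a concrete case. The following is only a minimal sketch in Python and is not part of the original text: linear maps and functionals are modeled as ordinary functions on tuples, and the particular maps `S`, `T` and the functional `l` are hypothetical choices made for illustration. The adjoint is taken literally as $T^*(l) = l \circ T$.

```python
def compose(f, g):
    """Return the composite f o g, i.e. the map x -> f(g(x))."""
    return lambda x: f(g(x))

def adjoint(T):
    """Adjoint T*: sends a functional l on W to the functional l o T on V."""
    return lambda l: compose(l, T)

# Hypothetical linear maps (assumptions for illustration only):
# S: R^2 -> R^3 and T: R^3 -> R^2.
def S(x):
    return (x[0] + x[1], 2 * x[0], x[1])

def T(y):
    return (y[0] - y[2], y[1] + 3 * y[2])

# A hypothetical linear functional on R^2, the codomain of T.
def l(z):
    return 5 * z[0] - 2 * z[1]

lhs = adjoint(compose(T, S))(l)            # (T o S)*(l)
rhs = compose(adjoint(S), adjoint(T))(l)   # (S* o T*)(l)

# Two linear functionals on R^2 agree if they agree on a spanning set;
# a third vector is included as an extra check.
samples = [(1, 0), (0, 1), (2, -3)]
values = [(lhs(x), rhs(x)) for x in samples]
print(values)
```

The two functionals agree on every sample because both unwind to $l \circ T \circ S$, which is exactly the associativity argument used in the proof.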
Meanwhile, we record an important elementary identity.

Theorem 3.5. $(R(T^*))^\circ = N(T)$ and $N(T^*) = (R(T))^\circ$.

Proof. The following statements are definitionally equivalent in pairs as they occur: $l \in N(T^*)$, $T^*(l) = 0$, $l \circ T = 0$, $l(T(\xi)) = 0$ for all $\xi \in V$, $l \in (R(T))^\circ$. Therefore, $N(T^*) = (R(T))^\circ$. The other proof is similar and will be left to the reader. [Start with $\alpha \in N(T)$ and end with $\alpha \in (R(T^*))^\circ$.] □

The rank of a linear transformation is the dimension of its range space.

Corollary. The rank of $T^*$ is equal to the rank of $T$.

Proof. The dimensions of $R(T)$ and $(N(T))^\circ$ are each $d(V) - d(N(T))$ by Theorems 2.4 and 3.3, and the second is $d(R(T^*))$ by the above theorem. Therefore, $d(R(T)) = d(R(T^*))$. □
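In matrix terms the corollary says that row rank equals column rank. The sketch below, which is not part of the original text, illustrates this on one hypothetical 3-by-4 matrix. It assumes the fact, established only later in the chapter, that the matrix of $T^*$ is the transpose of the matrix of $T$; the helper names `rank` and `transpose` are ours. Exact rational arithmetic is used so that no rounding can disturb the rank.

```python
from fractions import Fraction

def rank(rows):
    """Dimension of the row span, by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def transpose(rows):
    """Swap rows and columns."""
    return [list(col) for col in zip(*rows)]

# Hypothetical matrix of some T in Hom(R^4, R^3); the second row is twice the first.
t = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]

row_rank = rank(t)             # dimension of the span of the rows
col_rank = rank(transpose(t))  # dimension of the span of the columns, d(R(T))
nullity = len(t[0]) - col_rank # d(N(T)) = d(V) - d(R(T)) by Theorem 2.4
print(row_rank, col_rank, nullity)
```

Here the domain has dimension 4, the range has dimension 2, and the null space therefore has dimension 2, in agreement with Theorem 2.4.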
Dyads. Consider any $T$ in $\mathrm{Hom}(V, W)$ whose range $M$ is one-dimensional. If $\beta$ is a nonzero vector in $M$, then $x \mapsto x\beta$ is a basis isomorphism $\theta: \mathbb{R} \to M$ and $\theta^{-1} \circ T: V \to \mathbb{R}$ is a linear functional $\lambda \in V^*$. Then $T = \theta \circ \lambda$ and $T(\xi) = \lambda(\xi)\beta$ for all $\xi$. We write this as $T = \lambda(\cdot)\beta$, and call any such $T$ a dyad.

Lemma 3.2. If $T$ is the dyad $\lambda(\cdot)\beta$, then $T^*$ is the dyad $\beta^{**}(\cdot)\lambda$.

Proof. $(T^*(l))(\xi) = (l \circ T)(\xi) = l(T(\xi)) = l(\lambda(\xi)\beta) = l(\beta)\lambda(\xi)$, so that $T^*(l) = l(\beta)\lambda = \beta^{**}(l)\lambda$, and $T^* = \beta^{**}(\cdot)\lambda$. □

*Natural isomorphisms again. We are now in a position to illustrate more precisely the notion of a natural isomorphism. We saw above that among all the isomorphisms from a finite-dimensional vector space $V$ to its second dual, we could single one out naturally, namely, the map $\xi \mapsto \xi^{**}$, where $\xi^{**}(f) = f(\xi)$ for all $f$ in $V^*$. Let us call this isomorphism $\varphi_V$. The technical meaning of the word 'natural' pertains to the collection $\{\varphi_V\}$ of all these isomorphisms; we found a way to choose one isomorphism $\varphi_V$ for each space $V$, and the proof that this is a "natural" choice lies in the smooth way the various $\varphi_V$'s relate to each other. To see what we mean by this, consider two finite-dimensional spaces $V$ and $W$ and a map $T$ in $\mathrm{Hom}(V, W)$. Then $T^*$ is in $\mathrm{Hom}(W^*, V^*)$ and $T^{**} = (T^*)^*$ is in $\mathrm{Hom}(V^{**}, W^{**})$. The setting for the four maps $T$, $T^{**}$, $\varphi_V$, and $\varphi_W$ can be displayed in a diagram as follows:
$$\begin{array}{ccc} V & \xrightarrow{\ T\ } & W \\ {\scriptstyle\varphi_V}\downarrow & & \downarrow{\scriptstyle\varphi_W} \\ V^{**} & \xrightarrow{\ T^{**}\ } & W^{**} \end{array}$$
The diagram indicates two maps, $\varphi_W \circ T$ and $T^{**} \circ \varphi_V$, from $V$ to $W^{**}$, and we define the collection of isomorphisms $\{\varphi_V\}$ to be natural if these two maps are always equal for any $V$, $W$, and $T$. This is the condition that the two ways of going around the diagram give the same result, i.e., that the diagram be commutative. Put another way, it is the condition that $T$ "become" $T^{**}$ when $V$ is identified with $V^{**}$ (by $\varphi_V$) and $W$ is identified with $W^{**}$ (by $\varphi_W$). We leave its proof as an exercise.

EXERCISES

3.1 Let $\theta$ be an isomorphism from a vector space $V$ to $\mathbb{R}^n$.
Show that the functionals $\{\pi_i \circ \theta\}_1^n$ form a basis for $V^*$.
3.2 Show that the standard isomorphism from $\mathbb{R}^n$ to $(\mathbb{R}^n)^*$ that we get by composing the coordinate isomorphism for the standard basis for $\mathbb{R}^n$ (the identity) with the dual basis isomorphism for $(\mathbb{R}^n)^*$ is just our friend $a \mapsto l_a$, where $l_a(x) = \sum_1^n a_ix_i$. (Show that the dual basis isomorphism is $a \mapsto \sum_1^n a_i\pi_i$.)
3.3 We know from Theorem 1.6 that a choice of a basis $\{\beta_i\}$ for $V$ defines an isomorphism from $W^n$ to $\mathrm{Hom}(V, W)$ for any vector space $W$. Apply this fact and Theorem 1.3 to obtain a basis in $V^*$, and show that this basis is the dual basis of $\{\beta_i\}$.
3.4 Prove the properties of $A^\circ$ that are listed in the text.
3.5 Find (a basis for) the annihilator of $\langle 1, 1, 1\rangle$ in $\mathbb{R}^3$. (Use the isomorphism of $(\mathbb{R}^3)^*$ with $\mathbb{R}^3$ to express the basis vectors as triples.)
3.6 Find (a basis for) the annihilator of $\{\langle 1, 1, 1\rangle, \langle 1, 2, 3\rangle\}$ in $\mathbb{R}^3$.
3.7 Find (a basis for) the annihilator of $\{\langle 1, 1, 1, 1\rangle, \langle 1, 2, 3, 4\rangle\}$ in $\mathbb{R}^4$.
3.8 Show that if $V = M \oplus N$, then $V^* = M^\circ \oplus N^\circ$.
3.9 Show that if $M$ is any subspace of an $n$-dimensional vector space $V$ and $d(M) = m$, then $M$ can be viewed as being the linear span of an independent subset of $m$ elements of $V$ or as being the annihilator of (the intersection of the null spaces of) an independent subset of $n - m$ elements of $V^*$.
3.10 If $B = \{f_i\}_1^m$ is a finite collection of linear functionals on $V$ ($B \subset V^*$), then its annihilator $B^\circ$ is simply the intersection $N = \bigcap_1^m N_i$ of the null spaces $N_i = N(f_i)$ of the functionals $f_i$. State the dual of Theorem 3.3 in this context. That is, take $W$ as the linear span of the functionals $f_i$, so that $W \subset V^*$ and $W^\circ \subset V$. State the dual of the corollary.
3.11 Show that the following theorem is a consequence of the corollary of Theorem 3.3.
Theorem. Let $N$ be the intersection $\bigcap_1^m N_i$ of the null spaces of a set $\{f_i\}_1^m$ of linear functionals on $V$, and suppose that $g$ in $V^*$ is zero on $N$. Then $g$ is a linear combination of the set $\{f_i\}_1^m$.
3.12 A corollary of Theorem 3.3 is that if $W$ is a proper subspace of $V$, then there is at least one nonzero linear functional $f$ in $V^*$ such that $f = 0$ on $W$. Prove this fact directly by elementary means. (You are allowed to construct a suitable basis.)
3.13 An $m$-tuple of linear functionals $\{f_i\}_1^m$ on a vector space $V$ defines a linear mapping $\alpha \mapsto \langle f_1(\alpha), \ldots, f_m(\alpha)\rangle$ from $V$ to $\mathbb{R}^m$.
What theorem is being applied here? Prove that the range of this linear mapping is the whole of $\mathbb{R}^m$ if and only if $\{f_i\}_1^m$ is an independent set of functionals. [Hint: If the range is a proper subspace $W$, there is a nonzero $m$-tuple $a$ such that $\sum_1^m a_ix_i = 0$ for all $x \in W$.]
3.14 Continuing the above exercise, what is the null space $N$ of the linear mapping $\alpha \mapsto \langle f_1(\alpha), \ldots, f_m(\alpha)\rangle$? If $g$ is a linear functional which is zero on $N$, show that $g$ is a linear combination of the $f_i$, now as a corollary of the above exercise and Theorem 4.3 of Chapter 1. (Assume the set $\{f_i\}_1^m$ independent.)
3.15 Write out from scratch the proof that $T^*$ is linear [for a given $T$ in $\mathrm{Hom}(V, W)$]. Also prove directly that $T \mapsto T^*$ is linear.
3.16 Prove the other half of Theorem 3.5.
3.17 Let $\theta_i$ be the isomorphism $\alpha \mapsto \alpha^{**}$ from $V_i$ to $V_i^{**}$ for $i = 1, 2$, and suppose given $T$ in $\mathrm{Hom}(V_1, V_2)$. The loose statement $T = T^{**}$ means exactly that $T^{**} = \theta_2 \circ T \circ \theta_1^{-1}$ or $T^{**} \circ \theta_1 = \theta_2 \circ T$. Prove this identity. As usual, do this by proving that it holds for each $\alpha$ in $V_1$.
3.18 Let $\theta: \mathbb{R}^n \to V$ be a basis isomorphism. Prove that the adjoint $\theta^*$ is the coordinate isomorphism for the dual basis if $(\mathbb{R}^n)^*$ is identified with $\mathbb{R}^n$ in the natural way.
3.19 Let $\omega$ be any bilinear functional on $V \times W$. Then the two associated linear transformations are $T: V \to W^*$ defined by $(T(\xi))(\eta) = \omega(\xi, \eta)$ and $S: W \to V^*$ defined by $(S(\eta))(\xi) = \omega(\xi, \eta)$. Prove that $S = T^*$ if $W$ is identified with $W^{**}$.
3.20 Suppose that $f$ in $(\mathbb{R}^m)^*$ has coordinate $m$-tuple $a$ [$f(y) = \sum_1^m a_iy_i$] and that $T$ in $\mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ has matrix $t = \{t_{ij}\}$. Write out the explicit expression of the number $f(T(x))$ in terms of all these coordinates. Rearrange the sum so that it appears in the form $\sum_{j=1}^n b_jx_j$, and then read off the formula for $b$ in terms of $a$.

4. MATRICES

Matrices and linear transformations. The reader has already learned something about matrices and their relationship to linear transformations from Chapter 1; we shall begin our more systematic discussion by reviewing this earlier material.

By popular conception a matrix is a rectangular array of numbers such as
$$\begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mn} \end{bmatrix}.$$
Note that the first index numbers the rows and the second index numbers the columns. If there are $m$ rows and $n$ columns in the array, it is called an $m$-by-$n$ ($m \times n$) matrix. This notion is inexact. A rectangular array is a way of picturing a matrix, but a matrix is really a function, just as a sequence is a function. With the notation $\bar m = \{1, \ldots, m\}$, the above matrix is a function assigning a number to every pair of integers $\langle i, j\rangle$ in $\bar m \times \bar n$. It is thus an element of the set $\mathbb{R}^{\bar m \times \bar n}$. The addition of two $m \times n$ matrices is performed in the obvious place-by-place way, and is merely the addition of two functions in $\mathbb{R}^{\bar m \times \bar n}$; the same is true for scalar multiplication. The set of all $m \times n$ matrices is thus the vector space $\mathbb{R}^{\bar m \times \bar n}$, a Cartesian space with a rather fancy finite index set.
We shall use the customary index notation $t_{ij}$ for the value $t(i, j)$ of the function $t$ at $\langle i, j\rangle$, and we shall also write $\{t_{ij}\}$ for $t$, just as we do for sequences and other indexed collections.

The additional properties of matrices stem from the correspondence between $m \times n$ matrices $\{t_{ij}\}$ and transformations $T \in \mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$. The following theorem restates results from the first chapter. See Theorems 1.2, 1.3, and 6.2 of Chapter 1 and the discussion of the linear combination map at the end of Section 1.6.
Theorem 4.1. Let $\{t_{ij}\}$ be an $m$-by-$n$ matrix, and let $t^j$ be the $m$-tuple that is its $j$th column for $j = 1, \ldots, n$. Then there is a unique $T$ in $\mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ such that skeleton $T = \{t^j\}$, i.e., such that $T(\delta^j) = t^j$ for all $j$. $T$ is defined as the linear combination mapping $x \mapsto y = \sum_{j=1}^n x_jt^j$, and an equivalent presentation of $T$ is the collection of scalar equations
$$y_i = \sum_{j=1}^n t_{ij}x_j.$$
Each $T$ in $\mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ arises this way, and the bijection $\{t_{ij}\} \mapsto T$ from $\mathbb{R}^{\bar m \times \bar n}$ to $\mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ is a natural isomorphism.

The only additional remark called for here is that in identifying an $m \times n$ matrix with an $n$-tuple of $m$-tuples, we are making use of one of the standard identifications of duality (Section 0.10). We are treating the natural isomorphism between the really distinct spaces $\mathbb{R}^{\bar m \times \bar n}$ and $(\mathbb{R}^m)^n$ as though it were the identity.

We can also relate $T$ to $\{t_{ij}\}$ by way of the rows of $\{t_{ij}\}$. As above, taking $i$th coordinates in the $m$-tuple equation $y = \sum_{j=1}^n x_jt^j$, we get the equivalent and familiar system of numerical (scalar) equations $y_i = \sum_{j=1}^n t_{ij}x_j$ for $i = 1, \ldots, m$. Now the mapping $x \mapsto \sum_{j=1}^n c_jx_j$ from $\mathbb{R}^n$ to $\mathbb{R}$ is the most general linear functional on $\mathbb{R}^n$. In the above numerical equations, therefore, we have simply used the $m$ rows of the matrix $\{t_{ij}\}$ to present the $m$-tuple of linear functionals on $\mathbb{R}^n$ which is equivalent to the single $m$-tuple-valued linear mapping $T$ in $\mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ by Theorem 3.6 of Chapter 1.

The choice of ordered bases for arbitrary finite-dimensional spaces $V$ and $W$ allows us to transfer the above theorem to $\mathrm{Hom}(V, W)$. Since we are now going to correlate a matrix $t$ in $\mathbb{R}^{\bar m \times \bar n}$ with a transformation $T$ in $\mathrm{Hom}(V, W)$, we shall designate the transformation in $\mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ discussed above by $\overline{T}$.

Theorem 4.2. Let $\{\alpha_j\}_1^n$ and $\{\beta_i\}_1^m$ be ordered bases for the vector spaces $V$ and $W$, respectively. For each matrix $\{t_{ij}\}$ in $\mathbb{R}^{\bar m \times \bar n}$ let $T$ be the unique element of $\mathrm{Hom}(V, W)$ such that $T(\alpha_j) = \sum_{i=1}^m t_{ij}\beta_i$ for $j = 1, \ldots, n$.
Then the mapping $\{t_{ij}\} \mapsto T$ is an isomorphism from $\mathbb{R}^{m \times n}$ to $\operatorname{Hom}(V, W)$.

Proof. We simply combine the isomorphism $\{t_{ij}\} \mapsto \bar{T}$ of the above theorem with the isomorphism $\bar{T} \mapsto T = \psi \circ \bar{T} \circ \varphi^{-1}$ from $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ to $\operatorname{Hom}(V, W)$, where $\varphi$ and $\psi$ are the two given basis isomorphisms. Then $T$ is the transformation described in the theorem, for

$$T(\alpha_j) = \psi(\bar{T}(\varphi^{-1}(\alpha_j))) = \psi(\bar{T}(\delta^j)) = \psi(t^j) = \sum_{i=1}^m t_{ij} \beta_i.$$

The map $\{t_{ij}\} \mapsto T$ is the composition of two isomorphisms and so is an isomorphism. □

It is instructive to look at what we have just done in a slightly different way. Given the matrix $\{t_{ij}\}$, let $\tau_j$ be the vector in $W$ whose coordinate $m$-tuple is the $j$th column $t^j$ of the matrix, so that $\tau_j = \sum_{i=1}^m t_{ij} \beta_i$. Then let $T$ be the unique element of $\operatorname{Hom}(V, W)$ such that $T(\alpha_j) = \tau_j$ for $j = 1, \ldots, n$. Now we have obtained $T$ from $\{t_{ij}\}$ in the following two steps: $T$ corresponds to the $n$-tuple
$\{\tau_j\}_1^n$ under the isomorphism from $\operatorname{Hom}(V, W)$ to $W^n$ given by Theorem 1.6, and $\{\tau_j\}_1^n$ corresponds to the matrix $\{t_{ij}\}$ by extension of the coordinate isomorphism between $W$ and $\mathbb{R}^m$ to its product isomorphism from $W^n$ to $(\mathbb{R}^m)^n$.

Corollary. If $y$ is the coordinate $m$-tuple of the vector $\eta$ in $W$ and $x$ is the coordinate $n$-tuple of $\xi$ in $V$ (with respect to the given bases), then $\eta = T(\xi)$ if and only if $y_i = \sum_{j=1}^n t_{ij} x_j$ for $i = 1, \ldots, m$.

Proof. We know that the scalar equations are equivalent to $y = \bar{T}(x)$, which is the equation $y = \psi^{-1} \circ T \circ \varphi(x)$. The isomorphism $\psi$ converts this to the equation $\eta = T(\xi)$. □

Our problem now is to discover the matrix analogues of relationships between linear transformations. For transformations between the Cartesian spaces $\mathbb{R}^n$ this is a fairly direct, uncomplicated business, because, as we know, the matrix here is a natural alter ego for the transformation. But when we leave the Cartesian spaces, a transformation $T$ no longer has a matrix in any natural way, and only acquires one when bases are chosen and a corresponding $\bar{T}$ on Cartesian spaces is thereby obtained. All matrices now are determined with respect to chosen bases, and all calculations are complicated by the necessary presence of the basis and coordinate isomorphisms. There are two ways of handling this situation. The first, which we shall follow in general, is to describe things directly for the general space $V$ and simply to accept the necessarily more complicated statements involving bases and dual bases and the corresponding loss in transparency. The other possibility is first to read off the answers for the Cartesian spaces and then to transcribe them via coordinate isomorphisms.

Lemma 4.1. The matrix element $t_{kj}$ can be obtained from $T$ by the formula $t_{kj} = \mu_k(T(\alpha_j))$, where $\mu_k$ is the $k$th element of the dual basis in $W^*$.

Proof. $\mu_k(T(\alpha_j)) = \mu_k\left(\sum_{i=1}^m t_{ij} \beta_i\right) = \sum_i t_{ij}\, \mu_k(\beta_i) = \sum_i t_{ij}\, \delta_i^k = t_{kj}$.
□

In terms of Cartesian spaces, $\bar{T}(\delta^j)$ is the $j$th column $m$-tuple $t^j$ in the matrix $\{t_{ij}\}$ of $T$, and $t_{kj}$ is the $k$th coordinate of $t^j$. From the point of view of linear maps, the $k$th coordinate is obtained by applying the $k$th coordinate projection $\pi_k$, so that $t_{kj} = \pi_k(\bar{T}(\delta^j))$. Under the basis isomorphisms, $\pi_k$ becomes $\mu_k$, $\bar{T}$ becomes $T$, $\delta^j$ becomes $\alpha_j$, and the Cartesian identity becomes the identity of the lemma.

The transpose. The transpose of the $m \times n$ matrix $\{t_{ij}\}$ is the $n \times m$ matrix $\{t^*_{ij}\}$ defined by $t^*_{ij} = t_{ji}$ for all $i, j$. The rows of $t^*$ are of course the columns of $t$, and conversely.

Theorem 4.3. The matrix of $T^*$ with respect to the dual bases in $W^*$ and $V^*$ is the transpose of the matrix of $T$.
Proof. If $s$ is the matrix of $T^*$, then Lemmas 3.1 and 4.1 imply that

$$s_{ji} = \alpha_j^{**}(T^*(\mu_i)) = \alpha_j^{**}(\mu_i \circ T) = (\mu_i \circ T)(\alpha_j) = \mu_i(T(\alpha_j)) = t_{ij}.$$ □

Definition. The row space of the matrix $\{t_{ij}\} \in \mathbb{R}^{m \times n}$ is the subspace of $\mathbb{R}^n$ spanned by the $m$ row vectors. The column space is similarly the span of the $n$ column vectors in $\mathbb{R}^m$.

Corollary. The row and column spaces of a matrix have the same dimension.

Proof. If $T$ is the element of $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ defined by $T(\delta^j) = t^j$, then the set $\{t^j\}_1^n$ of column vectors in the matrix $\{t_{ij}\}$ is the image under $T$ of the standard basis of $\mathbb{R}^n$, and so its span, which we have called the column space of the matrix, is exactly the range of $T$. In particular, the dimension of the column space is $d(R(T)) = \operatorname{rank} T$. Since the matrix of $T^*$ is the transpose $t^*$ of the matrix $t$, we have, similarly, that $\operatorname{rank} T^*$ is the dimension of the column space of $t^*$. But the column space of $t^*$ is the row space of $t$, and the assertion of the corollary is thus reduced to the identity $\operatorname{rank} T^* = \operatorname{rank} T$, which is the corollary of Theorem 3.5. □

This common dimension is called the rank of the matrix.

Matrix products. If $T \in \operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ and $S \in \operatorname{Hom}(\mathbb{R}^m, \mathbb{R}^l)$, then of course $R = S \circ T \in \operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^l)$, and it certainly should be possible to calculate the matrix $r$ of $R$ from the matrices $s$ and $t$ of $S$ and $T$, respectively. To make this computation, we set $y = T(x)$ and $z = S(y)$, so that $z = (S \circ T)(x) = R(x)$. The equivalent scalar equations in terms of the matrices $t$ and $s$ are

$$y_i = \sum_{h=1}^n t_{ih} x_h \quad \text{and} \quad z_k = \sum_{i=1}^m s_{ki} y_i,$$

so that

$$z_k = \sum_{i=1}^m s_{ki} \sum_{h=1}^n t_{ih} x_h = \sum_{h=1}^n \left( \sum_{i=1}^m s_{ki} t_{ih} \right) x_h.$$

But $z_k = \sum_{h=1}^n r_{kh} x_h$ for $k = 1, \ldots, l$. Taking $x$ as $\delta^j$, we have

$$r_{kj} = \sum_{i=1}^m s_{ki} t_{ij} \quad \text{for all } k \text{ and } j.$$

We thus have found the formula for the matrix $r$ of the map $R = S \circ T : x \mapsto z$. Of course, $r$ is defined to be the product of the matrices $s$ and $t$, and we write $r = s \cdot t$ or $r = st$.
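The formula $r_{kj} = \sum_i s_{ki} t_{ij}$ can be checked on a small case. The following is a minimal sketch, assuming matrices are stored as lists of rows; the helper names `apply` and `matmul` and the sample entries are illustrative choices, not notation from the text.

```python
def apply(t, x):
    """Return y = T(x), where y_i = sum_j t_ij * x_j and t is a list of rows."""
    return [sum(t_ij * x_j for t_ij, x_j in zip(row, x)) for row in t]


def matmul(s, t):
    """Return r = s . t, where r_kj = sum_i s_ki * t_ij."""
    if any(len(row) != len(t) for row in s):
        raise ValueError("columns of the left factor must match rows of the right factor")
    return [[sum(s_ki * t[i][j] for i, s_ki in enumerate(row))
             for j in range(len(t[0]))]
            for row in s]


s = [[1, 0, 2], [-1, 3, 1]]    # matrix of S in Hom(R^3, R^2)
t = [[3, 1], [2, 1], [1, 0]]   # matrix of T in Hom(R^2, R^3)
x = [5, -2]

r = matmul(s, t)               # matrix of R = S o T
# Applying r directly agrees with applying t and then s.
assert apply(r, x) == apply(s, apply(t, x))
```

Here the product is formed so that applying it to a tuple gives the same result as composing the two maps, which is exactly how the text defines it.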
Note that in order for the product $st$ to be defined, the number of columns in the left factor must equal the number of rows in the right factor. We get the element $r_{kj}$ by going across the $k$th row of $s$ and simultaneously down the $j$th
column of $t$, multiplying corresponding elements as we go, and adding the resulting products. This process is illustrated in Fig. 2.1. In terms of the scalar product $(x, y) = \sum_i x_i y_i$ on $\mathbb{R}^m$, we see that the element $r_{kj}$ in $r = st$ is the scalar product of the $k$th row of $s$ and the $j$th column of $t$.

[Fig. 2.1: the $k$th row of $s$ ($l$ by $m$) and the $j$th column of $t$ ($m$ by $n$) combine to give the entry $r_{kj}$ of the product ($l$ by $n$).]

Since we have defined the product of two matrices as the matrix of the product of the corresponding transformations, i.e., so that the mapping $T \mapsto \{t_{ij}\}$ preserves products ($S \circ T \mapsto st$), it follows from the general principle of Theorem 4.1 of Chapter 1 that the algebraic laws satisfied by composition of transformations will automatically hold for the product of matrices. For example, we know without making an explicit computation that matrix multiplication is associative. Then for square matrices we have the following theorem.

Theorem 4.4. The set $M_n$ of square $n \times n$ matrices is an algebra naturally isomorphic to the algebra $\operatorname{Hom}(\mathbb{R}^n)$.

Proof. We already know that $T \mapsto \{t_{ij}\}$ is a natural linear isomorphism from $\operatorname{Hom}(\mathbb{R}^n)$ to $M_n$ (Theorem 4.1), and we have defined the product of matrices so that the mapping also preserves multiplication. The laws of algebra (for an algebra) therefore follow for $M_n$ from our observation in Theorem 3.5 of Chapter 1 that they hold for $\operatorname{Hom}(\mathbb{R}^n)$. □

The identity $I$ in $\operatorname{Hom}(\mathbb{R}^n)$ takes the basis vector $\delta^j$ into itself, and therefore its matrix $e$ has $\delta^j$ for its $j$th column: $e^j = \delta^j$. Thus $e_{ij} = \delta_i^j = 1$ if $i = j$ and $e_{ij} = \delta_i^j = 0$ if $i \neq j$. That is, the matrix $e$ is 1 along the main diagonal (from upper left to lower right) and 0 elsewhere. Since $I \mapsto e$ under the algebra isomorphism $T \mapsto t$, we know that $e$ is the identity for matrix multiplication. Of course, we can check this directly: $\sum_{j=1}^n t_{ij} e_{jk} = t_{ik}$, and similarly for multiplying by $e$ on the left.
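Both facts just mentioned, that $e$ is a two-sided identity and that the product is associative, can be spot-checked numerically. This is only a sketch on sample matrices chosen for illustration (it verifies instances, not the general laws); `matmul` and `identity` are hypothetical helper names.

```python
def matmul(s, t):
    """Return the product s . t of two matrices given as lists of rows."""
    return [[sum(s[k][i] * t[i][j] for i in range(len(t)))
             for j in range(len(t[0]))]
            for k in range(len(s))]


def identity(n):
    """Return the n-by-n matrix e with e_ij = 1 if i == j and 0 otherwise."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


r = [[1, 2], [0, 1]]            # 2-by-2
s = [[2, -1, 0], [4, 3, 5]]     # 2-by-3
u = [[1], [0], [-2]]            # 3-by-1

# e acts as the identity on either side (with the size that makes the product defined).
assert matmul(s, identity(3)) == s
assert matmul(identity(2), s) == s

# Associativity on this sample: (r s) u == r (s u).
assert matmul(matmul(r, s), u) == matmul(r, matmul(s, u))
```

Note that two different identity matrices appear, of sizes 3 and 2, depending on the side from which $s$ is multiplied.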
The symbol '$e$' is ambiguous in that we have used it to denote the identity in the space $\mathbb{R}^{n \times n}$ of square $n \times n$ matrices for any $n$.

Corollary. A square $n \times n$ matrix $t$ has a multiplicative inverse if and only if its rank is $n$.
Proof. By the theorem there exists an $s \in M_n$ such that $st = ts = e$ if and only if there exists an $S \in \operatorname{Hom}(\mathbb{R}^n)$ such that $S \circ T = T \circ S = I$. But such an $S$ exists if and only if $T$ is an isomorphism, and by the corollary to Theorem 2.4 this is equivalent to the dimension of the range of $T$ being $n$. But this dimension is the rank of $t$, and the argument is complete. □

A square matrix (or a transformation in $\operatorname{Hom} V$) is said to be nonsingular if it is invertible.

Theorem 4.5. If $\{\alpha_i\}_1^n$, $\{\beta_j\}_1^m$, and $\{\gamma_k\}_1^l$ are ordered bases for the vector spaces $U$, $V$, and $W$, respectively, and if $T \in \operatorname{Hom}(U, V)$ and $S \in \operatorname{Hom}(V, W)$, then the matrix of $S \circ T$ is the product of the matrices of $S$ and $T$ (with respect to the given bases).

Proof. By definition the matrix of $S \circ T$ is the matrix of $\overline{S \circ T} = \chi^{-1} \circ (S \circ T) \circ \varphi$ in $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^l)$, where $\varphi$ and $\chi$ are the given basis isomorphisms for $U$ and $W$. But if $\psi$ is the basis isomorphism for $V$, we have

$$\overline{S \circ T} = (\chi^{-1} \circ S \circ \psi) \circ (\psi^{-1} \circ T \circ \varphi) = \bar{S} \circ \bar{T},$$

and therefore its matrix is the product of the matrices of $\bar{S}$ and $\bar{T}$ by the definition of matrix multiplication. The latter are the matrices of $S$ and $T$ with respect to the given bases. Putting these observations together, we have the theorem. □

There is a simple relationship between matrix products and transposition.

Theorem 4.6. If the matrix product $st$ is defined, then so is $t^* s^*$, and $t^* s^* = (st)^*$.

Proof. A direct calculation is easy. We have

$$(st)^*_{jk} = (st)_{kj} = \sum_{i=1}^m s_{ki} t_{ij} = \sum_{i=1}^m t^*_{ji} s^*_{ik} = (t^* s^*)_{jk}.$$

Thus $(st)^* = t^* s^*$, as asserted. □

This identity is clearly the matrix form of the transformation identity $(S \circ T)^* = T^* \circ S^*$, and it can be deduced from the latter identity if desired.

Cartesian vectors as matrices. We can view an $n$-tuple $x = \langle x_1, \ldots, x_n \rangle$ as being alternatively either an $n \times 1$ matrix, in which case we call it a column vector, or a $1 \times n$ matrix, in which case we call it a row vector. Of course, these identifications are natural isomorphisms.
The point of doing this is, in part, that then the equations $y_i = \sum_{j=1}^n t_{ij} x_j$ say exactly that the column vector $y$ is the matrix product of $t$ and the column vector $x$, that is, $y = t \cdot x$. The linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ becomes left multiplication by the fixed matrix $t$ when $\mathbb{R}^n$ is viewed as the space of $n \times 1$ column vectors. For this reason we shall take the column vector as the standard matrix interpretation of an $n$-tuple $x$; then $x^*$ is the corresponding row vector.
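As an illustration of the column-vector convention, the sketch below treats an $n$-tuple as an $n \times 1$ matrix and also checks one instance of the transposition rule $(st)^* = t^* s^*$ with $x$ in place of the right factor. The helper names and sample numbers are assumptions made for the example.

```python
def matmul(s, t):
    """Return the product s . t of two matrices given as lists of rows."""
    return [[sum(s[k][i] * t[i][j] for i in range(len(t)))
             for j in range(len(t[0]))]
            for k in range(len(s))]


def transpose(t):
    """Return t*, whose rows are the columns of t."""
    return [list(col) for col in zip(*t)]


t = [[1, 2, 0], [3, -1, 4]]     # 2-by-3 matrix of T: R^3 -> R^2
x_col = [[2], [1], [-1]]        # the 3-tuple <2, 1, -1> as a 3-by-1 column vector

y_col = matmul(t, x_col)        # y = t . x, a 2-by-1 column vector
x_row = transpose(x_col)        # x*, the corresponding 1-by-3 row vector

# (t x)* = x* t*: the row form of y is x* multiplied on the right by t*.
assert transpose(y_col) == matmul(x_row, transpose(t))
```

Storing a vector as a list of one-element rows is a little clumsy, but it makes the identification with $n \times 1$ matrices literal, so the same `matmul` serves for both uses.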
In particular, a linear functional $F \in (\mathbb{R}^n)^*$ becomes left multiplication by its matrix, which is of course $1 \times n$ ($F$ being from $\mathbb{R}^n$ to $\mathbb{R}^1$), and therefore is simply the row matrix interpretation of an $n$-tuple in $\mathbb{R}^n$. That is, in the natural isomorphism $a \mapsto L_a$ from $\mathbb{R}^n$ to $(\mathbb{R}^n)^*$, where $L_a(x) = \sum_1^n a_i x_i$, the functional $L_a$ can now be interpreted as left matrix multiplication by the $n$-tuple $a$ viewed as the row vector $a^*$. The matrix product of the row vector ($1 \times n$ matrix) $a^*$ and the column vector ($n \times 1$ matrix) $x$ is a $1 \times 1$ matrix $a^* \cdot x$, that is, a number.

Let us now see what these observations say about $T^*$. The number $L_a(T(x))$ is the $1 \times 1$ matrix $a^* t x$. Since $L_a(T(x)) = (T^*(L_a))(x)$ by the definition of $T^*$, we see that the functional $T^*(L_a)$ is left multiplication by the row vector $a^* t$. Since the row vector form of $L_a$ is $a^*$ and the row vector form of $T^*(L_a)$ is $a^* t$, this shows that when the functionals on $\mathbb{R}^m$ are interpreted as row vectors, $T^*$ becomes right multiplication by $t$. This only repeats something we already know. If we take transposes to throw the row vectors into the standard column vector form for $n$-tuples, it shows that $T^*$ is left multiplication by $t^*$, and so gives another proof that the matrix of $T^*$ is $t^*$.

Change of basis. If $\varphi : x \mapsto \xi = \sum_1^n x_i \beta_i$ and $\theta : y \mapsto \xi = \sum_1^n y_i \beta'_i$ are two basis isomorphisms for $V$, then $A = \theta^{-1} \circ \varphi$ is the isomorphism in $\operatorname{Hom}(\mathbb{R}^n)$ which takes the coordinate $n$-tuple $x$ of a vector $\xi$ with respect to the basis $\{\beta_i\}$ into the coordinate $n$-tuple $y$ of the same vector with respect to the basis $\{\beta'_i\}$. The isomorphism $A$ is called the "change of coordinates" isomorphism. In terms of the matrix $a$ of $A$, we have $y = ax$, as above.

The change of coordinates map $A = \theta^{-1} \circ \varphi$ should not be confused with the similar looking $T = \theta \circ \varphi^{-1}$. The latter is a mapping on $V$, and is the element of $\operatorname{Hom}(V)$ which takes each $\beta_i$ to $\beta'_i$.

Fig.
2.2

We now want to see what happens to the matrix of a transformation $T \in \operatorname{Hom}(V, W)$ when we change bases in its domain and codomain spaces. Suppose then that $\varphi_1$ and $\varphi_2$ are basis isomorphisms from $\mathbb{R}^n$ to $V$, that $\psi_1$ and $\psi_2$ are basis isomorphisms from $\mathbb{R}^m$ to $W$, and that $t'$ and $t''$ are the matrices of $T$ with respect to the first and second bases, respectively. That is, $t'$ is the matrix of $T' = (\psi_1)^{-1} \circ T \circ \varphi_1 \in \operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$, and similarly for $t''$. The mapping $A = \varphi_2^{-1} \circ \varphi_1 \in \operatorname{Hom}(\mathbb{R}^n)$ is the change of coordinates transformation for $V$: if $x$ is the coordinate $n$-tuple of a vector $\xi$ with respect to the first basis [that is, $\xi = \varphi_1(x)$], then $A(x)$ is its coordinate $n$-tuple with respect to the second basis. Similarly, let $B$ be the change of coordinates map $\psi_2^{-1} \circ \psi_1$ for $W$. The diagram in Fig. 2.2 will help keep the various relationships of these spaces and
mappings straight. We say that the diagram is commutative, which means that any two paths between two points represent the same map. By selecting various pairs of paths, we can read off all the identities which hold for the nine maps $T$, $T'$, $T''$, $\varphi_1$, $\varphi_2$, $A$, $\psi_1$, $\psi_2$, $B$. For example, $T''$ can be obtained by going backward along $A$, forward along $T'$, and then forward along $B$. That is, $T'' = B \circ T' \circ A^{-1}$. Since these "outside maps" are all maps of Cartesian spaces, we can then read off the corresponding matrix identity

$$t'' = b\, t'\, a^{-1},$$

showing how the matrix of $T$ with respect to the second pair of bases is obtained from its matrix with respect to the first pair.

What we have actually done in reading off the above identity from the diagram is to eliminate certain retraced steps in the longer path which the definitions would give us. Thus from the definitions we get

$$B \circ T' \circ A^{-1} = (\psi_2^{-1} \circ \psi_1) \circ (\psi_1^{-1} \circ T \circ \varphi_1) \circ (\varphi_1^{-1} \circ \varphi_2) = \psi_2^{-1} \circ T \circ \varphi_2 = T''.$$

In the above situation the domain and codomain spaces were different, and the two basis changes were independent of each other. If $W = V$, so that $T \in \operatorname{Hom}(V)$, then of course we consider only one basis change and the formula becomes

$$t'' = a \cdot t' \cdot a^{-1}.$$

Now consider a linear functional $F \in V^*$. If $f'$ and $f''$ are its coordinate $n$-tuples considered as column vectors ($n \times 1$ matrices), then the matrices of $F$ with respect to the two bases are the row vectors $(f')^*$ and $(f'')^*$, as we saw earlier. Also, there is no change of basis in the range space since here $W = \mathbb{R}$, with its permanent natural basis vector 1. Therefore, $b = e$ in the formula $t'' = b\, t'\, a^{-1}$, and we have

$$(f'')^* = (f')^* a^{-1} \quad \text{or} \quad f'' = (a^{-1})^* f'.$$

We want to compare this with the change of coordinates of a vector $\xi \in V$, which, as we saw earlier, is given by

$$x'' = a\, x'.$$

These changes go in opposite directions (with a transposition thrown in).
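The change of basis formulas can be tried out on a $2 \times 2$ case. The sketch below assumes $W = V$ with a single change of coordinates matrix $a$; the particular matrices, the helper names, and the use of exact rational arithmetic are choices made for this example only.

```python
from fractions import Fraction


def matmul(s, t):
    """Return the product s . t of two matrices given as lists of rows."""
    return [[sum(s[k][i] * t[i][j] for i in range(len(t)))
             for j in range(len(t[0]))]
            for k in range(len(s))]


def transpose(t):
    """Return t*, whose rows are the columns of t."""
    return [list(col) for col in zip(*t)]


def inverse_2x2(a):
    """Return the inverse of a nonsingular 2-by-2 matrix, using exact fractions."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(a[1][1], det), Fraction(-a[0][1], det)],
            [Fraction(-a[1][0], det), Fraction(a[0][0], det)]]


a = [[2, 1], [1, 1]]      # change of coordinates: x'' = a x'
t1 = [[1, 2], [3, 4]]     # t', the matrix of T in the first basis
a_inv = inverse_2x2(a)
t2 = matmul(matmul(a, t1), a_inv)   # t'' = a t' a^{-1}

x1 = [[1], [-1]]          # coordinates x' of a vector in the first basis
x2 = matmul(a, x1)        # its coordinates x'' in the second basis

# Computing T in either basis gives the same vector, expressed in second-basis coordinates.
assert matmul(a, matmul(t1, x1)) == matmul(t2, x2)

f1 = [[5], [7]]                       # coordinates f' of a functional F
f2 = matmul(transpose(a_inv), f1)     # f'' = (a^{-1})* f'

# The number F(xi) = f* x does not depend on the basis used to compute it.
assert matmul(transpose(f1), x1) == matmul(transpose(f2), x2)
```

The last assertion shows why the two coordinate changes must go in opposite directions: the factor $a$ picked up by the vector is cancelled by the factor $a^{-1}$ picked up by the functional.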
For reasons largely historical, functionals $F$ in $V^*$ are called covariant vectors, and since the matrix for a change of coordinates in $V$ is the transpose of the inverse of the matrix for the corresponding change of coordinates in $V^*$, the vectors $\xi$ in $V$ are called contravariant vectors. These terms are used in classical tensor analysis and differential geometry.

The isomorphism $\{t_{ij}\} \mapsto T$, being from a Cartesian space $\mathbb{R}^{m \times n}$, is automatically a basis isomorphism. Its basis in $\operatorname{Hom}(V, W)$ is the image under the isomorphism of the standard basis in $\mathbb{R}^{m \times n}$, where the latter is the set of Kronecker functions $\delta^{kl}$ defined by $\delta^{kl}(i, j) = 0$ if $\langle k, l \rangle \neq \langle i, j \rangle$ and $\delta^{kl}(k, l) = 1$. (Remember that in $\mathbb{R}^A$, $\delta^a$ is that function such that $\delta^a(b) = 0$
if $b \neq a$ and $\delta^a(a) = 1$. Here $A = \bar{m} \times \bar{n}$ and the elements $a$ of $A$ are ordered pairs $a = \langle k, l \rangle$.) The function $\delta^{kl}$ is that matrix whose columns are all 0 except for the $l$th, and the $l$th column is the $m$-tuple $\delta^k$. The corresponding transformation $D_{kl}$ thus takes every basis vector $\alpha_j$ to 0 except $\alpha_l$ and takes $\alpha_l$ to $\beta_k$. That is, $D_{kl}(\alpha_j) = 0$ if $j \neq l$, and $D_{kl}(\alpha_l) = \beta_k$. Again, $D_{kl}$ takes the $l$th basis vector in $V$ to the $k$th basis vector in $W$ and takes the other basis vectors in $V$ to 0. If $\xi = \sum x_i \alpha_i$, it follows that $D_{kl}(\xi) = x_l \beta_k$.

Since $\{D_{kl}\}$ is the basis defined by the isomorphism $\{t_{ij}\} \mapsto T$, it follows that $\{t_{ij}\}$ is the coordinate set of $T$ with respect to this basis; it is the image of $T$ under the coordinate isomorphism. It is interesting to see how this basis expansion of $T$ automatically appears. We have

$$T(\xi) = T\Big(\sum_j x_j \alpha_j\Big) = \sum_j x_j T(\alpha_j) = \sum_{i,j} t_{ij} x_j \beta_i = \sum_{i,j} t_{ij} D_{ij}(\xi),$$

so that

$$T = \sum_{i,j} t_{ij} D_{ij}.$$

Our original discussion of the dual basis in $V^*$ was a special case of the present situation. There we had $\operatorname{Hom}(V, \mathbb{R}) = V^*$, with the permanent standard basis 1 for $\mathbb{R}$. The basis for $V^*$ corresponding to the basis $\{\alpha_i\}$ for $V$ therefore consists of those maps $D_l$ taking $\alpha_l$ to 1 and $\alpha_j$ to 0 for $j \neq l$. Then $D_l(\xi) = D_l\big(\sum x_j \alpha_j\big) = x_l$, and $D_l$ is the $l$th coordinate functional.

Finally, we note that the matrix expression of $T \in \operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ is very suggestive of the block decompositions of $T$ that we discussed earlier in Section 1.5. In the exercises we shall ask the reader to show that in fact $T_{kl} = t_{kl} D_{kl}$.

EXERCISES

4.1 Prove that if $\omega : V \times V \to \mathbb{R}$ is a bilinear functional on $V$ and $T : V \to V^*$ is the corresponding linear transformation defined by $(T(\eta))(\xi) = \omega(\xi, \eta)$, then for any basis $\{\alpha_i\}$ for $V$ the matrix $t_{ij} = \omega(\alpha_i, \alpha_j)$ is the matrix of $T$.

4.2 Verify that the row and column rank of the following matrix are both 1:

$$\begin{bmatrix} -5 & 2 & 3 \\ -10 & 4 & 6 \end{bmatrix}.$$

4.3 Show by a direct calculation that if the row rank of a $2 \times 3$ matrix is 1, then so is its column rank.
4.4 Let $\{f_i\}_1^3$ be a linearly dependent set of $C^2$-functions (twice continuously differentiable real-valued functions) on $\mathbb{R}$. Show that the three triples $\langle f_i(x), f_i'(x), f_i''(x) \rangle$ are dependent for any $x$. Prove therefore that $\sin t$, $\cos t$, and $e^t$ are linearly independent. (Compute the derivative triples for a well-chosen $x$.)
4.5 Compute [matrix expression not recoverable from the source].

4.6 Compute [matrix expression not recoverable from the source]. From your answer give a necessary and sufficient condition for

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1}$$

to exist.

4.7 A matrix $a$ is idempotent if $a^2 = a$. Find a basis for the vector space $\mathbb{R}^{2 \times 2}$ of all $2 \times 2$ matrices consisting entirely of idempotents.

4.8 By a direct calculation show that

$$\begin{bmatrix} 1 & 2 \\ -2 & 3 \end{bmatrix}$$

is invertible and find its inverse.

4.9 Show by explicitly solving the equation

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x & y \\ z & w \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

that the matrix on the left is invertible if and only if (the determinant) $ad - bc$ is not zero.

4.10 Find a nonzero $2 \times 2$ matrix whose square is zero.

4.11 Find all $2 \times 2$ matrices whose squares are zero.

4.12 Prove by computing matrix products that matrix multiplication is associative.

4.13 Similarly, prove directly the distributive law, $(r + s) \cdot t = r \cdot t + s \cdot t$.

4.14 Show that left matrix multiplication by a fixed $r$ in $\mathbb{R}^{m \times n}$ is a linear transformation from $\mathbb{R}^{n \times p}$ to $\mathbb{R}^{m \times p}$. What theorem in Chapter 1 does this mirror?

4.15 Show that the rank of a product of two matrices is at most the minimum of their ranks. (Remember that the rank of a matrix is the dimension of the range space of its associated $T$.)

4.16 Let $a$ be an $m \times n$ matrix, and let $b$ be $n \times m$. If $m > n$, show that $a \cdot b$ cannot be the identity $e$ ($m \times m$).
4.17 Let $Z$ be the subset of $2 \times 2$ matrices of the form

$$\begin{bmatrix} a & b \\ -b & a \end{bmatrix}.$$

Prove that $Z$ is a subalgebra of $\mathbb{R}^{2 \times 2}$ (that is, $Z$ is closed under addition, scalar multiplication, and matrix multiplication). Show that in fact $Z$ is isomorphic to the complex number system.

4.18 A matrix (necessarily square) which is equal to its transpose is said to be symmetric. As a square array it is symmetric about the main diagonal. Show that for any $m \times n$ matrix $t$ the product $t \cdot t^*$ is meaningful and symmetric.

4.19 Show that if $s$ and $t$ are symmetric $n \times n$ matrices, and if they commute, then $s \cdot t$ is symmetric. (Do not try to answer this by writing out matrix products.) Show conversely that if $s$, $t$, and $s \cdot t$ are all symmetric, then $s$ and $t$ commute.

4.20 Suppose that $T$ in $\operatorname{Hom} \mathbb{R}^2$ has a symmetric matrix and that $T$ is not of the form $cI$. Show that $T$ has exactly two eigenvectors (up to scalar multiples). What does the matrix of $T$ become with respect to the "eigenbasis" for $\mathbb{R}^2$ consisting of these two eigenvectors?

4.21 Show that the symmetric $2 \times 2$ matrix $t$ has a symmetric square root $s$ ($s^2 = t$) if and only if its eigenvalues are nonnegative. (Assume the above exercise.)

4.22 Suppose that $t$ is a $2 \times 2$ matrix such that $t^* = t^{-1}$. Show that $t$ has one of the forms [two matrix forms not recoverable from the source], where $a^2 + b^2 = 1$.

4.23 Prove that multiplication by the above $t$ is a Euclidean isometry. That is, show that if $y = t \cdot x$, where $x$ and $y \in \mathbb{R}^2$, then $\|x\| = \|y\|$, where $\|x\| = (x_1^2 + x_2^2)^{1/2}$.

4.24 Let $\{D_{kl}\}$ be the basis for $\operatorname{Hom}(V, W)$ defined in the text. Taking $W = V$, show that these operators satisfy the very important multiplication rules

$$D_{ij} \circ D_{kl} = 0 \ \text{ if } j \neq k, \qquad D_{ik} \circ D_{kl} = D_{il}.$$

4.25 Keeping the above identities in mind, show that if $l \neq m$, then there are transformations $S$ and $T$ in $\operatorname{Hom} V$ such that

$$S \circ T - T \circ S = D_{lm}.$$

Also find $S$ and $T$ such that

$$S \circ T - T \circ S = D_{ll} - D_{mm}.$$

4.26 Given $T$ in $\operatorname{Hom} \mathbb{R}^n$, we know from Chapter 1 that $T = \sum_{i,j} T_{ij}$, where $T_{ij} = P_i T P_j$ and $P_i = \theta_i \pi_i$. Now we also have $T = \sum_{k,l} t_{kl} D_{kl}$.
Show from the definition of $D_{kl}$ in the text that $P_i D_{ij} P_j = D_{ij}$ and that $P_i D_{kl} P_j = 0$ if either $i \neq k$ or $j \neq l$. Conclude that $T_{ij} = t_{ij} D_{ij}$.
5. TRACE AND DETERMINANT

Our aim in this short section is to acquaint the reader with two very special real-valued functions on $\operatorname{Hom} V$ and to describe some of their properties.

Theorem 5.1. If $V$ is an $n$-dimensional vector space, there is exactly one linear functional $\lambda$ on the vector space $\operatorname{Hom}(V)$ with the property that $\lambda(S \circ T) = \lambda(T \circ S)$ for all $S$, $T$ in $\operatorname{Hom}(V)$ and normalized so that $\lambda(I) = n$. If a basis is chosen for $V$ and the corresponding matrix of $T$ is $\{t_{ij}\}$, then $\lambda(T) = \sum_{i=1}^n t_{ii}$, the sum of the elements on the main diagonal.

Proof. If we choose a basis and define $\lambda(T)$ as $\sum_1^n t_{ii}$, then it is clear that $\lambda$ is a linear functional on $\operatorname{Hom}(V)$ and that $\lambda(I) = n$. Moreover,

$$\lambda(S \circ T) = \sum_{i=1}^n \Big( \sum_{j=1}^n s_{ij} t_{ji} \Big) = \sum_{i,j=1}^n s_{ij} t_{ji} = \sum_{i,j=1}^n t_{ji} s_{ij} = \lambda(T \circ S).$$

That is, each basis for $V$ gives us a functional $\lambda$ in $(\operatorname{Hom} V)^*$ such that $\lambda(S \circ T) = \lambda(T \circ S)$, $\lambda(I) = n$, and $\lambda(T) = \sum t_{ii}$ for the matrix representation of that basis.

Now suppose that $\mu$ is any element of $(\operatorname{Hom}(V))^*$ such that $\mu(S \circ T) = \mu(T \circ S)$ and $\mu(I) = n$. If we choose a basis for $V$ and use the isomorphism $\theta : \{t_{ij}\} \mapsto T$ from $\mathbb{R}^{n \times n}$ to $\operatorname{Hom} V$, we have a functional $\nu = \mu \circ \theta$ on $\mathbb{R}^{n \times n}$ ($\nu = \theta^* \mu$) such that $\nu(st) = \nu(ts)$ and $\nu(e) = n$. By Theorem 4.1 (or 3.1) $\nu$ is given by a matrix $c$, $\nu(t) = \sum_{i,j=1}^n c_{ij} t_{ij}$, and the equation $\nu(st - ts) = 0$ becomes

$$\sum_{i,j,k=1}^n c_{ij} (s_{ik} t_{kj} - t_{ik} s_{kj}) = 0.$$

We are going to leave it as an exercise for the reader to show that if $l \neq m$, then very simple special matrices $s$ and $t$ can be chosen so that this sum reduces to $c_{lm} = 0$, and, by a different choice, to $c_{ll} - c_{mm} = 0$. Together with the requirement that $\nu(e) = n$, this implies that $c_{lm} = 0$ for $l \neq m$ and $c_{mm} = 1$ for $m = 1, \ldots, n$. That is, $\nu(t) = \sum_1^n t_{mm}$, and $\nu$ is the $\lambda$ of the basis being used. Altogether this shows that there is a unique $\lambda$ in $(\operatorname{Hom} V)^*$ such that $\lambda(S \circ T) = \lambda(T \circ S)$ for all $S$ and $T$ and $\lambda(I) = n$, and that $\lambda(T)$ has the diagonal evaluation as $\sum t_{ii}$ in every basis.
□

This unique $\lambda$ is called the trace functional, and $\lambda(T)$ is the trace of $T$. It is usually designated $\operatorname{tr}(T)$.

The determinant function $\Delta(T)$ on $\operatorname{Hom} V$ is much more complicated, and we shall not prove that it exists until Chapter 7. Its geometric meaning is as follows. First, $|\Delta(T)|$ is the factor by which $T$ multiplies volumes. More precisely, if we define a "volume" $v$ for subsets of $V$ by choosing a basis and using the coordinate correspondence to transfer to $V$ the "natural" volume on $\mathbb{R}^n$, then, for any figure $A \subset V$, $v(T[A]) = |\Delta(T)| \cdot v(A)$. This will be spelled out in Chapter 8. Second, $\Delta(T)$ is positive or negative according as $T$ preserves or reverses orientation, which again is a sophisticated notion to be explained later. For the moment we shall list properties of $\Delta(T)$ that are related to this geometric interpretation, and we give a sufficient number to show the uniqueness of $\Delta$.
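Before turning to the properties of $\Delta$, the defining property of the trace can be spot-checked on sample matrices. This is a minimal sketch with illustrative values; `matmul` and `trace` are hypothetical helper names, and a check on two matrices is of course not a proof.

```python
def matmul(s, t):
    """Return the product s . t of two matrices given as lists of rows."""
    return [[sum(s[k][i] * t[i][j] for i in range(len(t)))
             for j in range(len(t[0]))]
            for k in range(len(s))]


def trace(t):
    """Return the sum of the main-diagonal entries of a square matrix."""
    return sum(t[i][i] for i in range(len(t)))


s = [[1, 2], [3, 4]]
t = [[0, 1], [5, -2]]
e = [[1, 0], [0, 1]]

st = matmul(s, t)
ts = matmul(t, s)

# The two products differ as matrices, yet have the same trace.
assert st != ts
assert trace(st) == trace(ts)

# The normalization lambda(I) = n, here with n = 2.
assert trace(e) == 2
```

The first assertion is the interesting one: the trace ignores the order of the factors even though the product itself does not.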
[Fig. 2.3: a figure and its image $T[A]$ under a shearing $T$.]

We assume that for each finite-dimensional vector space $V$ there is a function $\Delta$ (or $\Delta_V$ when there is any question about domain) from $\operatorname{Hom}(V)$ to $\mathbb{R}$ such that the following are true:

a) $\Delta(S \circ T) = \Delta(S)\, \Delta(T)$ for any $S$, $T$ in $\operatorname{Hom}(V)$.

b) If a subspace $N$ of $V$ is invariant under $T$ and $T$ is the identity on $N$ and on $V/N$ (that is, $T[\bar{\alpha}] = \bar{\alpha}$ for each coset $\bar{\alpha} = \alpha + N$ of $N$), then $\Delta(T) = 1$. Such a $T$ is a shearing of $V$ along the planes parallel to $N$. In two dimensions it can be pictured as in Fig. 2.3.

c) If $V$ is a direct sum $V = M \oplus N$ of $T$-invariant subspaces $M$ and $N$, and if $R = T \restriction M$ and $S = T \restriction N$, then $\Delta(T) = \Delta(R)\, \Delta(S)$. More exactly, $\Delta_V(T) = \Delta_M(R)\, \Delta_N(S)$.

d) If $V$ is one-dimensional, so that any $T$ in $\operatorname{Hom}(V)$ is simply multiplication by a constant $c_T$, then $\Delta(T)$ is that constant $c_T$.

e) If $V$ is two-dimensional and $T$ interchanges a pair of independent vectors, then $\Delta(T) = -1$. This is clearly a pure orientation-changing property.

The fact that $\Delta$ is uniquely determined by these properties will follow from our discussion in the next section, which will also give us a process for calculating $\Delta$. This process is efficient for dimensions greater than two, but for $T$ in $\operatorname{Hom}(\mathbb{R}^2)$ there is a simple formula for $\Delta(T)$ which every student should know by heart.

Theorem 5.2. If $T$ is in $\operatorname{Hom}(\mathbb{R}^2)$ and $\{t_{ij}\}$ is its $2 \times 2$ matrix, then $\Delta(T) = t_{11} t_{22} - t_{12} t_{21}$.

This is a special case of a general formula, which we shall derive in Chapter 7, that expresses $\Delta(T)$ as a sum of $n!$ terms, each term being a product of $n$ numbers from the matrix of $T$. This formula is too complicated to be useful in computations for large $n$, but for $n = 3$ it is about as easy to use as our row-reduction calculation in the next section, and for $n = 2$ it becomes the above simple expression.

There are a few more properties of $\Delta$ with which every student should be familiar. They will all be proved in Chapter 7.

Theorem 5.3. If $T$ is in $\operatorname{Hom} V$, then $\Delta(T^*) = \Delta(T)$.
If $\theta$ is an isomorphism from $V$ to $W$ and $S = \theta \circ T \circ \theta^{-1}$, then $\Delta(S) = \Delta(T)$.
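For $2 \times 2$ matrices, the formula of Theorem 5.2 makes several of the stated properties easy to test on examples. The sketch below checks instances of property (a), property (e), the shear case of property (b), and both parts of Theorem 5.3; the sample matrices and helper names are assumptions made for illustration, and the checks do not replace the proofs deferred to Chapter 7.

```python
def matmul(s, t):
    """Return the product s . t of two matrices given as lists of rows."""
    return [[sum(s[k][i] * t[i][j] for i in range(len(t)))
             for j in range(len(t[0]))]
            for k in range(len(s))]


def transpose(t):
    """Return t*, whose rows are the columns of t."""
    return [list(col) for col in zip(*t)]


def det2(t):
    """Return t_11 t_22 - t_12 t_21 for a 2-by-2 matrix t."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]


s = [[1, 2], [3, 4]]
t = [[0, 1], [5, -2]]

# (a) The determinant of a product is the product of the determinants.
assert det2(matmul(s, t)) == det2(s) * det2(t)

# (e) Interchanging the two standard basis vectors gives determinant -1.
assert det2([[0, 1], [1, 0]]) == -1

# (b) A shear, which fixes the first basis vector, has determinant 1.
assert det2([[1, 7], [0, 1]]) == 1

# Theorem 5.3, first part: transposition does not change the determinant.
assert det2(transpose(s)) == det2(s)

# Theorem 5.3, second part: conjugating by an invertible matrix leaves it unchanged.
a = [[2, 1], [1, 1]]
a_inv = [[1, -1], [-1, 2]]
assert matmul(a, a_inv) == [[1, 0], [0, 1]]
assert det2(matmul(matmul(a, s), a_inv)) == det2(s)
```

The conjugating matrix was chosen with determinant 1 so that its inverse has integer entries and every comparison is exact.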
Theorem 5.4. The transformation $T$ is nonsingular (invertible) if and only if $\Delta(T) \neq 0$.

In the next theorem we consider $T$ in $\operatorname{Hom} \mathbb{R}^n$, and we want to think of $\Delta(T)$ as a function of the matrix $t$ of $T$. To emphasize this we shall use the notation $D(t) = \Delta(T)$.

Theorem 5.5 (Cramer's rule). Given an $n \times n$ matrix $t$ and an $n$-tuple $y$, let $t \,|_j\, y$ be the matrix obtained by replacing the $j$th column of $t$ by $y$. Then

$$y = t \cdot x \implies D(t)\, x_j = D(t \,|_j\, y) \quad \text{for all } j.$$

If $t$ is nonsingular [$D(t) \neq 0$], this becomes an explicit formula for the solution $x$ of the equation $y = t \cdot x$; it is theoretically important even in those cases when it is not useful in practice (large $n$).

EXERCISES

5.1 Finish Theorem 5.1 by applying Exercise 4.25.

5.2 It follows from our discussion of trace that $\operatorname{tr}(T) = \sum t_{ii}$ is independent of the basis. Show that this fact follows directly from $\operatorname{tr}(t \cdot s) = \operatorname{tr}(s \cdot t)$ and the change of basis formula in the preceding section.

5.3 Show by direct computation that the function $d(t) = t_{11} t_{22} - t_{12} t_{21}$ satisfies $d(s \cdot t) = d(s)\, d(t)$ (where $s$ and $t$ are $2 \times 2$ matrices). Conclude that if $V$ is two-dimensional and $d(T)$ is defined for $T$ in $\operatorname{Hom} V$ by choosing a basis and setting $d(T) = d(t)$, then $d(T)$ is actually independent of the basis.

5.4 Continuing the above exercise, show that $d(T) = \Delta(T)$ in any of the following cases:

1) $T$ interchanges two independent vectors.
2) $T$ has two eigenvectors.
3) $T$ has a matrix of the form $\begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}$.

Show next that if $T$ has none of the above forms, then $T = R \circ S$, where $S$ is of type (1) and $R$ is of type (2) or (3). [Hint: Suppose $T(\alpha) = \beta$, with $\alpha$ and $\beta$ independent. Let $S$ interchange $\alpha$ and $\beta$, and consider $R = T \circ S$.] Show finally that $d(T) = \Delta(T)$ for all $T$ in $\operatorname{Hom} V$. ($V$ is two-dimensional.)

5.5 If $t$ is symmetric and $2 \times 2$, show that there is a $2 \times 2$ matrix $s$ such that $s^* = s^{-1}$, $\Delta(s) = 1$, and $s t s^{-1}$ is diagonal.

5.6 Assuming Theorem 5.2, verify Theorem 5.4 for the $2 \times 2$ case.
5.7 Assuming Theorem 5.2, verify Theorem 5.5 for the $2 \times 2$ case.
5.8 In this exercise we suppose that the reader remembers what a continuous function of a real variable is. Suppose that the $2 \times 2$ matrix function

$$a(t) = \begin{bmatrix} a_{11}(t) & a_{12}(t) \\ a_{21}(t) & a_{22}(t) \end{bmatrix}$$

has continuous components $a_{ij}(t)$ for $t \in (0, 1)$, and suppose that $a(t)$ is nonsingular for every $t$. Show that the solution $y(t)$ to the linear equation $a(t) \cdot y(t) = x(t)$ has continuous components $y_1(t)$ and $y_2(t)$ if the functions $x_1(t)$ and $x_2(t)$ are continuous.

5.9 A homogeneous second-order linear differential equation is an equation of the form

$$y'' + a_1 y' + a_0 y = 0,$$

where $a_1 = a_1(t)$ and $a_0 = a_0(t)$ are continuous functions. A solution is a $C^2$-function $f$ (i.e., a twice continuously differentiable function) such that $f''(t) + a_1(t) f'(t) + a_0(t) f(t) = 0$. Suppose that $f$ and $g$ are $C^2$-functions [on $(0, 1)$, say] such that the $2 \times 2$ matrix

$$\begin{bmatrix} f(t) & g(t) \\ f'(t) & g'(t) \end{bmatrix}$$

is always nonsingular. Show that there is a homogeneous second-order differential equation of which they are both solutions.

5.10 In the above exercise show that the space of all solutions is a two-dimensional vector space. That is, show that if $h(t)$ is any third solution, then $h$ is a linear combination of $f$ and $g$.

5.11 By a "linear motion" of the Cartesian plane $\mathbb{R}^2$ into itself we shall mean a continuous map $x \mapsto t(x)$ from $[0, 1]$ to the set of $2 \times 2$ nonsingular matrices such that $t(0) = e$. Show that $\Delta(t(1)) > 0$.

5.12 Show that if $\Delta(s) = 1$, then there is a linear motion whose final matrix $t(1)$ is $s$.

6. MATRIX COMPUTATIONS

The computational process by which the reader learned to solve systems of linear equations in secondary school algebra was undoubtedly "elimination by successive substitutions". The first equation is solved for the first unknown, and the solution expression is substituted for the first unknown in the remaining equations, thereby eliminating the first unknown from the remaining equations.
Next, the second equation is solved for the second unknown, and this unknown is then eliminated from the remaining equations. In this way the unknowns are eliminated one at a time, and a solution is obtained.

This same procedure also solves the following additional problems:

1) to obtain an explicit basis for the linear span of a set of $m$ vectors in $\mathbb{R}^n$; therefore, in particular,
2) to find the dimension of such a subspace;
3) to compute the determinant of an $m \times m$ matrix;
4) to compute the inverse of an invertible $m \times m$ matrix.
In this section we shall briefly study this process and the solutions to these problems. We start by noting that the kinds of changes we are going to make on a finite sequence of vectors do not alter its span.

Lemma 6.1. Let $\{\alpha_i\}_1^m$ be any $m$-tuple of vectors in a vector space, and let $\{\beta_i\}_1^m$ be obtained from $\{\alpha_i\}_1^m$ by any one of the following elementary operations:

1) interchanging two vectors;
2) multiplying some $\alpha_i$ by a nonzero scalar;
3) replacing $\alpha_i$ by $\alpha_i - x \alpha_j$ for some $j \neq i$ and some $x \in \mathbb{R}$.

Then $L(\{\beta_i\}_1^m) = L(\{\alpha_i\}_1^m)$.

Proof. If $\alpha'_i = \alpha_i - x \alpha_j$, then $\alpha_i = \alpha'_i + x \alpha_j$. Thus if $\{\beta_i\}_1^m$ is obtained from $\{\alpha_i\}_1^m$ by one operation of type (3), then $\{\alpha_i\}_1^m$ can be obtained from $\{\beta_i\}_1^m$ by one operation of type (3). In particular, each sequence is in the linear span of the other, and the two linear spans are therefore the same. Similarly, each of the other operations can be undone by one of the same type, and the linear spans are unchanged. □

When we perform these operations on the sequence of row vectors in a matrix, we call them elementary row operations.

We define the order of an $n$-tuple $x = \langle x_1, \ldots, x_n \rangle$ as the index of the first nonzero entry. Thus if $x_i = 0$ for $i < j$ and $x_j \neq 0$, then the order of $x$ is $j$. The order of $\langle 0, 0, 0, 2, -1, 0 \rangle$ is 4.

Let $\{a_{ij}\}$ be an $m \times n$ matrix, let $V$ be its row space, and let $n_1 < n_2 < \cdots < n_k$ be the integers that occur as orders of nonzero vectors in $V$. We are going to construct a basis for $V$ consisting of $k$ elements having exactly the above set of orders.

If every nonzero row in $\{a_{ij}\}$ has order $> p$, then every nonzero vector $x$ in $V$ has order $> p$, since $x$ is a linear combination of these row vectors. Since some vector in $V$ has the minimal order $n_1$, it follows that some row in $\{a_{ij}\}$ has order $n_1$. We move such a row to the top by interchanging two rows. We then multiply this row $x$ by a constant, so that its first nonzero entry $x_{n_1}$ is 1. Let $a^1, \ldots$
, $a^m$ be the row vectors that we now have, so that $a^1$ has order $n_1$ and $a^1_{n_1} = 1$. We next subtract multiples of $a^1$ from each of the other rows in such a way that the new ith row has 0 as its $n_1$-coordinate. Specifically, we replace $a^i$ by $a^i - a^i_{n_1} \cdot a^1$ for $i > 1$. The matrix that we thus obtain has the property that its jth column is the zero m-tuple for each $j < n_1$ and its $n_1$th column is $\delta^1$ in $\mathbb{R}^m$. Its first row has order $n_1$, and every other row has order $> n_1$. Its row space is still V. We again call it a.

Now let $x = \sum_1^m c_i a^i$ be a vector in V with order $n_2$. Then $c_1 = 0$, for if $c_1 \ne 0$, then the order of x is $n_1$. Thus x is a linear combination of the second
to the mth rows, and, just as in the first case, one of these rows must therefore have order $n_2$. We now repeat the above process all over again, keying now on this vector. We bring it to the second row, make its $n_2$-coordinate 1, and subtract multiples of it from all the other rows (including the first), so that the resulting matrix has $\delta^2$ for its $n_2$th column. Next we find a row with order $n_3$, bring it to the third row, and make the $n_3$th column $\delta^3$, etc.

We exhibit this process below for one $3 \times 4$ matrix. This example is dishonest in that it has been chosen so that fractions will not occur through the application of (2). The reader will not be that lucky when he tries his hand. Our defense is that by keeping the matrices simple we make the process itself more apparent.

\[
\begin{bmatrix} 0 & -1 & 2 & 3 \\ 2 & 2 & 4 & -2 \\ 2 & 4 & 0 & 3 \end{bmatrix}
\xrightarrow{(1)}
\begin{bmatrix} 2 & 2 & 4 & -2 \\ 0 & -1 & 2 & 3 \\ 2 & 4 & 0 & 3 \end{bmatrix}
\xrightarrow{(2)}
\begin{bmatrix} 1 & 1 & 2 & -1 \\ 0 & -1 & 2 & 3 \\ 2 & 4 & 0 & 3 \end{bmatrix}
\xrightarrow{(3)}
\begin{bmatrix} 1 & 1 & 2 & -1 \\ 0 & -1 & 2 & 3 \\ 0 & 2 & -4 & 5 \end{bmatrix}
\]
\[
\xrightarrow{(2)}
\begin{bmatrix} 1 & 1 & 2 & -1 \\ 0 & 1 & -2 & -3 \\ 0 & 2 & -4 & 5 \end{bmatrix}
\xrightarrow{(3)}
\begin{bmatrix} 1 & 0 & 4 & 2 \\ 0 & 1 & -2 & -3 \\ 0 & 0 & 0 & 11 \end{bmatrix}
\xrightarrow{(2)}
\begin{bmatrix} 1 & 0 & 4 & 2 \\ 0 & 1 & -2 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\xrightarrow{(3)}
\begin{bmatrix} 1 & 0 & 4 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

Note that from the final matrix we can tell that the orders in the row space are 1, 2, and 4, whereas the original matrix only displays the orders 1 and 2.

We end up with an $m \times n$ matrix having the same row space V and the following special structure:
1) For $1 \le j \le k$ the jth row has order $n_j$.
2) If $k < m$, the remaining $m - k$ rows are zero (since a nonzero row would have order $> n_k$, a contradiction).
3) The $n_j$th column is $\delta^j$.

It follows that any linear combination of the first k rows with coefficients $c_1, \ldots, c_k$ has $c_j$ in the $n_j$th place, and hence cannot be zero unless all the $c_j$'s are zero. These k rows thus form a basis for V, solving problems (1) and (2).

Our final matrix is said to be in row-reduced echelon form. It can be shown to be uniquely determined by the space V and the above requirements relating its rows to the orders of the elements of V. Its rows form the canonical basis of V.
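The reduction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `order` and `row_reduce` are ours, not the text's, and exact rational arithmetic (`fractions.Fraction`) is assumed so that operation (2) introduces no rounding. The example is the $3 \times 4$ matrix with rows (0, -1, 2, 3), (2, 2, 4, -2), (2, 4, 0, 3).

```python
from fractions import Fraction

def order(row):
    """1-based index of the first nonzero entry, or None for a zero row."""
    for j, x in enumerate(row, start=1):
        if x != 0:
            return j
    return None

def row_reduce(matrix):
    """Row-reduced echelon form via the elementary operations (1), (2), (3)."""
    a = [[Fraction(x) for x in row] for row in matrix]
    m = len(a)
    k = 0  # number of rows already placed
    while k < m:
        # Among the rows not yet placed, pick one of minimal order.
        candidates = [(order(a[i]), i) for i in range(k, m)
                      if order(a[i]) is not None]
        if not candidates:
            break  # the remaining rows are all zero
        n_k, i = min(candidates)
        a[k], a[i] = a[i], a[k]                  # operation (1): interchange
        lead = a[k][n_k - 1]
        a[k] = [x / lead for x in a[k]]          # operation (2): leading entry 1
        for r in range(m):                       # operation (3): clear the column
            if r != k and a[r][n_k - 1] != 0:
                c = a[r][n_k - 1]
                a[r] = [x - c * y for x, y in zip(a[r], a[k])]
        k += 1
    return a

example = [[0, -1, 2, 3],
           [2, 2, 4, -2],
           [2, 4, 0, 3]]
reduced = row_reduce(example)
orders = [order(row) for row in reduced if order(row) is not None]
print(reduced)
print(orders)
```

The nonzero rows of `reduced` are the canonical basis of the row space, and `orders` lists the integers $n_1 < \cdots < n_k$.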
A typical row-reduced echelon matrix is shown in Fig. 2.4. This matrix is $8 \times 11$, its orders are 1, 4, 5, 7, 10, and its row space has dimension 5. It is entirely 0 below the broken line. The dashes in the first five lines represent arbitrary numbers, but any change in these remaining entries changes the spanned space V.

[Fig. 2.4]

We shall now look for the significance of the row-reduction operations from the point of view of general linear theory. In this discussion it will be convenient to use the fact from Section 4 that if an n-tuple x in $\mathbb{R}^n$ is viewed as an $n \times 1$ matrix (i.e., as a column vector), then the system of linear equations $y_i = \sum_{j=1}^n a_{ij} x_j$, $i = 1, \ldots, m$, expresses exactly the single matrix equation $y = a \cdot x$. Thus the associated linear transformation $A \in \mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ is now viewed as being simply multiplication by the matrix a; $y = A(x)$ if and only if $y = a \cdot x$.

We first note that each of our elementary row operations on an $m \times n$ matrix a is equivalent to premultiplication by a corresponding $m \times m$ elementary matrix u. Supposing for the moment that this is so, we can find out what u is by using the $m \times m$ identity matrix e. Since $u \cdot a = (u \cdot e) \cdot a$, we see that the result of performing the operation on the matrix a can also be obtained by premultiplying a by the matrix $u \cdot e$. That is, if the elementary operation can be obtained as matrix multiplication by u, then the multiplier is $u \cdot e$. This argument suggests that we should perform the operation on e and then see if premultiplying a by the resulting matrix performs the operation on a.

If the elementary operation is interchanging the $i_0$th and $j_0$th rows, then performing it on e gives the matrix u with $u_{kk} = 1$ for $k \ne i_0$ and $k \ne j_0$, $u_{i_0 j_0} = u_{j_0 i_0} = 1$, and $u_{kl} = 0$ for all other indices. Moreover, examination of the sums defining the elements of the product matrix $u \cdot a$ will show that premultiplying by this u does just interchange the $i_0$th and $j_0$th rows of any $m \times n$ matrix a.

In the same way, multiplying the $i_0$th row of a by c is equivalent to premultiplying by the matrix u which is the same as e except that $u_{i_0 i_0} = c$. Finally, multiplying the $j_0$th row by x and adding it to the $i_0$th row is equivalent to premultiplying by the matrix u which is the identity e except that $u_{i_0 j_0}$ is x instead of 0.
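The recipe "perform the operation on e, then premultiply" can be checked numerically. The following is a minimal sketch, assuming 0-based row indices and helper names (`identity`, `matmul`, `interchange`, `scale`, `add_multiple`) that are ours and purely illustrative. For each of the three operations it compares premultiplication by the elementary matrix with the operation applied directly.

```python
from fractions import Fraction

def identity(m):
    """The m x m identity matrix e."""
    return [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]

def matmul(u, a):
    """Matrix product u . a for matrices stored as lists of rows."""
    return [[sum(u[i][k] * a[k][j] for k in range(len(a)))
             for j in range(len(a[0]))] for i in range(len(u))]

def interchange(a, i0, j0):
    """Operation (1): interchange rows i0 and j0."""
    b = [row[:] for row in a]
    b[i0], b[j0] = b[j0], b[i0]
    return b

def scale(a, i0, c):
    """Operation (2): multiply row i0 by the nonzero scalar c."""
    b = [row[:] for row in a]
    b[i0] = [c * x for x in b[i0]]
    return b

def add_multiple(a, i0, j0, x):
    """Operation (3): add x times row j0 to row i0."""
    b = [row[:] for row in a]
    b[i0] = [p + x * q for p, q in zip(b[i0], b[j0])]
    return b

a = [[Fraction(v) for v in row]
     for row in [[0, -1, 2, 3], [2, 2, 4, -2], [2, 4, 0, 3]]]
e = identity(3)

# Each elementary matrix u is the result of performing the operation on e.
checks = [
    matmul(interchange(e, 0, 1), a) == interchange(a, 0, 1),
    matmul(scale(e, 0, Fraction(1, 2)), a) == scale(a, 0, Fraction(1, 2)),
    matmul(add_multiple(e, 2, 0, -2), a) == add_multiple(a, 2, 0, -2),
]
print(checks)
```

All three comparisons hold for this sample matrix; the check is an illustration, not a proof, which is the examination of sums mentioned in the text.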
These three elementary matrices are indicated schematically in Fig. 2.5. Each has the value 1 on the main diagonal and 0 off the main diagonal except as indicated.

[Fig. 2.5]

These elementary matrices u are all nonsingular (invertible). The row interchange matrix is its own inverse. The inverse of multiplying the jth row by c is multiplying the same row by $1/c$. And the inverse of adding c times the jth row to the ith row is adding $-c$ times the jth row to the ith row.

If $u^1, u^2, \ldots, u^p$ is a sequence of elementary matrices, and if $b = u^p \cdot u^{p-1} \cdots u^1$, then $b \cdot a$ is the matrix obtained from a by performing the corresponding sequence of elementary row operations on a. If $u^1, \ldots, u^p$ is a sequence which row reduces a, then $r = b \cdot a$ is the resulting row-reduced echelon matrix.

Now suppose that a is a square $m \times m$ matrix and is nonsingular (invertible). Thus the dimension of the row space is m, and hence there are m different orders $n_1, \ldots, n_k$. That is, $k = m$, and since $1 \le n_1 < n_2 < \cdots < n_m \le m$, we must also have $n_i = i$, $i = 1, \ldots, m$. Remembering that the $n_i$th column in r is $\delta^i$, we see that now the ith column in r is $\delta^i$ and therefore that r is simply the identity matrix e. Thus $b \cdot a = e$ and b is the inverse of a.

Let us find the inverse of
\[ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]
by this procedure. The row-reducing sequence is
\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\xrightarrow{(3)}
\begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix}
\xrightarrow{(2)}
\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}
\xrightarrow{(3)}
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\]
The corresponding elementary matrices are
\[
\begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 \\ 0 & -\tfrac{1}{2} \end{bmatrix}, \quad
\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}.
\]
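As a hedged check of this procedure, the sketch below takes the $2 \times 2$ matrix with rows (1, 2) and (3, 4), writes down the three elementary matrices of its row-reducing sequence, and multiplies them in the order $b = u^3 \cdot u^2 \cdot u^1$. The names `matmul`, `u1`, `u2`, `u3` are illustrative and not from the text.

```python
from fractions import Fraction

def matmul(u, a):
    """Matrix product u . a for matrices stored as lists of rows."""
    return [[sum(u[i][k] * a[k][j] for k in range(len(a)))
             for j in range(len(a[0]))] for i in range(len(u))]

a = [[1, 2], [3, 4]]
u1 = [[1, 0], [-3, 1]]               # (3): subtract 3 times row 1 from row 2
u2 = [[1, 0], [0, Fraction(-1, 2)]]  # (2): multiply row 2 by -1/2
u3 = [[1, -2], [0, 1]]               # (3): subtract 2 times row 2 from row 1

# The operation performed first stands rightmost in the product.
b = matmul(u3, matmul(u2, u1))
print(b)             # the candidate inverse
print(matmul(b, a))  # should be the identity matrix e
```

Multiplying in the opposite order would correspond to performing the operations in reverse sequence, which in general does not reduce a.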
The inverse is therefore the product
\[
\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & -\tfrac{1}{2} \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 1 \\ 0 & -\tfrac{1}{2} \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}
=
\begin{bmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{bmatrix}.
\]
Check it if you are in doubt.

Finally, since $b \cdot e = b$, we see that we get b from e by applying the same row operations (gathered together as premultiplication by b) that we used to reduce a to echelon form. This is probably the best way of computing the inverse of a matrix. To keep track of the operations, we can place e to the right of a to form a single $m \times 2m$ matrix $a \mid e$, and then row reduce it. In echelon form it will then be the $m \times 2m$ matrix $e \mid b$, and we can read off the inverse b of the original matrix a.

Let us recompute the inverse of
\[ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]
by this method. We row reduce
\[ \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 3 & 4 & 0 & 1 \end{array}\right], \]
getting
\[
\xrightarrow{(3)}
\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & -2 & -3 & 1 \end{array}\right]
\xrightarrow{(2)}
\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & 1 & \tfrac{3}{2} & -\tfrac{1}{2} \end{array}\right]
\xrightarrow{(3)}
\left[\begin{array}{cc|cc} 1 & 0 & -2 & 1 \\ 0 & 1 & \tfrac{3}{2} & -\tfrac{1}{2} \end{array}\right],
\]
from which we read off the inverse to be
\[ \begin{bmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{bmatrix}. \]

Finally we consider the problem of computing the determinant of a square $m \times m$ matrix. We use two elementary operations (one modified) as follows:
1') interchanging two rows and simultaneously changing the sign of one of them;
3) as before, replacing some row $a_i$ by $a_i - x a_j$ for some $j \ne i$.

When applied to the rows of a square matrix, these operations leave the determinant unchanged. This follows from the properties of determinants listed in Section 5, and its proof will be left as an exercise. Moreover, these properties will be trivial consequences of our definition of a determinant in Chapter 7.

Consider, then, a square $m \times m$ matrix $\{a_{ij}\}$. We interchange the first and pth rows to bring a row of minimal order $n_1$ to the top, and change the sign of the row being moved down (the first row here). We do not make the leading
coefficient of the new first row 1; this elementary operation is not being used now. We do subtract multiples of the first row from the remaining rows, in order to make all the remaining entries in the $n_1$th column 0. The $n_1$th column is now $c_1 \delta^1$, where $c_1$ is the leading coefficient in the first row. And the new matrix has the same determinant as the original matrix.

We continue as before, subject to the above modifications. We change the sign of a row moved downward in an interchange, we do not make leading coefficients 1, and we do clear out the $n_j$th column so that it becomes $c_j \delta^j$, where $c_j$ is the leading coefficient of the jth row ($1 \le j \le k$). As before, the remaining $m - k$ rows are 0 (if $k < m$). Let us call this resulting matrix semireduced. Note that we can find the corresponding reduced echelon matrix from it by k applications of (2); we simply multiply the jth row by $1/c_j$ for $j = 1, \ldots, k$. If s is the semireduced matrix which we obtained from a using (1') and (3), then we shall show below that its determinant, and therefore the determinant of a also, is the product of the entries on the main diagonal: $\prod_{i=1}^m s_{ii}$.

Recapitulating, we can compute the determinant of a square matrix a by using the operations (1') and (3) to change a to a semireduced matrix s, and then taking the product of the numbers on the main diagonal of s. If we apply this process to
\[ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]
we get
\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\xrightarrow{(3)}
\begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix}
\xrightarrow{(3)}
\begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix},
\]
and the determinant is $1 \cdot (-2) = -2$. Our $2 \times 2$ determinant formula, applied to the same matrix, gives $1 \cdot 4 - 2 \cdot 3 = 4 - 6 = -2$.

If the original matrix $\{a_{ij}\}$ is nonsingular, so that $k = m$ and $n_i = i$ for $i = 1, \ldots, m$, then the jth column in the semireduced matrix is $c_j \delta^j$, so that $s_{jj} = c_j$, and we are claiming that the determinant is the product $\prod_{i=1}^m c_i$ of the leading coefficients.
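The recapitulated recipe can be sketched as follows. This is an illustration only: the names `order`, `semireduce`, and `determinant` are ours, exact rational arithmetic is assumed, and the code simply carries out operations (1') and (3) and multiplies the diagonal entries of the semireduced matrix.

```python
from fractions import Fraction

def order(row):
    """1-based index of the first nonzero entry, or None for a zero row."""
    for j, x in enumerate(row, start=1):
        if x != 0:
            return j
    return None

def semireduce(matrix):
    """Semireduced form of a square matrix, using only (1') and (3)."""
    s = [[Fraction(x) for x in row] for row in matrix]
    m = len(s)
    k = 0
    while k < m:
        candidates = [(order(s[i]), i) for i in range(k, m)
                      if order(s[i]) is not None]
        if not candidates:
            break  # the remaining rows are zero
        n_k, i = min(candidates)
        if i != k:
            # (1'): interchange rows k and i and change the sign
            # of the row that is moved down.
            s[k], s[i] = s[i], [-x for x in s[k]]
        lead = s[k][n_k - 1]  # leading coefficient c_k, deliberately not made 1
        for r in range(m):
            if r != k and s[r][n_k - 1] != 0:
                c = s[r][n_k - 1] / lead
                s[r] = [x - c * y for x, y in zip(s[r], s[k])]  # operation (3)
        k += 1
    return s

def determinant(matrix):
    """Product of the main-diagonal entries of the semireduced matrix."""
    s = semireduce(matrix)
    product = Fraction(1)
    for i in range(len(s)):
        product *= s[i][i]
    return product

print(semireduce([[1, 2], [3, 4]]))
print(determinant([[1, 2], [3, 4]]))
```

For a singular matrix the last row of the semireduced matrix is zero, so the product vanishes, in agreement with the claim being made here.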
To see this, note that if T is the transformation in $\mathrm{Hom}(\mathbb{R}^m)$ corresponding to our semireduced matrix, then $T(\delta^j) = c_j \delta^j$, so that $\mathbb{R}^m$ is the direct sum of m T-invariant, one-dimensional subspaces, on the jth of which T is multiplication by $c_j$. It follows from (c) and (d) of our list of determinant properties that $\Delta(T) = \prod_j c_j = \prod_j s_{jj}$. This is nonzero.

On the other hand, if $\{a_{ij}\}$ is singular, so that $k = d(V) < m$, then the mth row in the semireduced matrix is 0 and, in particular, $s_{mm} = 0$. The product $\prod_i s_{ii}$ is thus zero. Now, without altering the main diagonal, we can subtract multiples of the columns containing the leading row entries (the columns with
indices $n_j$) to make the mth column a zero column. This process is equivalent to postmultiplying by elementary matrices of type (3) and, therefore, again leaves the determinant unchanged. But now the transformation S of this matrix leaves $\mathbb{R}^{m-1}$ invariant (as the span of $\delta^1, \ldots, \delta^{m-1}$ in $\mathbb{R}^m$) and takes $\delta^m$ to 0, so that $\Delta(S) = 0$ by (c) in the list of determinant properties. So again the determinant is the product of the entries on the main diagonal of the semireduced matrix, zero in this case.

We have also found that a matrix is nonsingular (invertible) if and only if its determinant is nonzero.

EXERCISES

6.1 Compute the canonical basis of the row space of 1 3 1 0 1 1 1 2 2 -3 4 2 2 -2 1 3 0 -1 4 3 0 2 2 4 -3 5 4 2

6.2 Do the same for

6.3 Do the same for the above matrix but with a different first choice.

6.4 Calculate the inverse of
\[ \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 7 \end{bmatrix} \]
by row reduction. Check your answer by multiplication.

6.5 Row reduce
\[ \begin{bmatrix} 1 & 2 & 3 & y_1 \\ 2 & 3 & 4 & y_2 \\ 3 & 4 & 7 & y_3 \end{bmatrix}. \]
How does the fourth column in the row-reduced matrix compare with the inverse of
\[ \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 7 \end{bmatrix} \]
computed in the above exercise? Explain.

6.6 Check whether or not $\langle 1, 1, 1, 1 \rangle$, $\langle 1, 2, 3, 4 \rangle$, $\langle 0, 1, 0, 1 \rangle$, and $\langle 4, 3, 2, 1 \rangle$ are linearly independent by row reducing. Part of one of the row-reducing operations is unnecessary for this check. What is it?
6.7 Let us call a k-tuple of vectors $\{\alpha_i\}_1^k$ in $\mathbb{R}^n$ canonical if the $k \times n$ matrix a with $\alpha_i$ as its ith row for all i is in row-reduced echelon form. Supposing that an n-tuple $\xi$ is in the row space of a, we can read off what its coordinates are with respect to the above canonical basis. What are they? How then can we check whether or not an arbitrary n-tuple $\xi$ is in the row space?

6.8 Use the device of row reducing, as suggested in the above exercise, to determine whether or not $\delta^1 = \langle 1, 0, 0, 0 \rangle$ is in the span of $\langle 1, 1, 1, 1 \rangle$, $\langle 1, 2, 3, 4 \rangle$, and $\langle 2, 0, 1, -1 \rangle$. Do the same for $\langle 1, 2, 1, 2 \rangle$, and also for $\langle 1, 1, 0, 4 \rangle$.

6.9 Supposing that $a \ne 0$, show that
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]
is invertible if and only if $ad - bc \ne 0$ by reducing the matrix to echelon form.

6.10 Let a be an $m \times n$ matrix, and let u be the nonsingular matrix that row reduces a, so that $r = u \cdot a$ is the row-reduced echelon matrix obtained from a. Suppose that r has $m - k > 0$ zero rows at the bottom (the kth row being nonzero). Show that the bottom $m - k$ rows of u span the annihilator $(\text{range } A)^\circ$ of the range of A. That is, $y = ax$ for some x if and only if $\sum_1^m c_i y_i = 0$ for each m-tuple c in the bottom $m - k$ rows of u. [Hint: The bottom row of r is obtained by applying the bottom row of u to the columns of a.]

6.11 Remember that we find the row-reducing matrix u by applying to the $m \times m$ identity matrix e the row operations that reduce a to r. That is, we row reduce the $m \times (n + m)$ juxtaposition matrix $a \mid e$ to $r \mid u$. Assuming the result stated in the above exercise, find the range of $A \in \mathrm{Hom}(\mathbb{R}^3)$ as the null space of a functional if the matrix of A is
\[ \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 5 & 7 \end{bmatrix}. \]

6.12 Similarly, find the range of A if the matrix of A is 1 2 0 4 1 0 2 2 1 1 1 3

6.13 Let a be an $m \times n$ matrix, and let a be row reduced to r. Let A and R be the corresponding operators in $\mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ [so that $A(x) = a \cdot x$]. Show that A and R have the same null space and that $A^*$ and $R^*$ have the same range space.
6.14 Show that solving a system of m linear equations in n unknowns is equivalent to solving a matrix equation $k = tx$ for the n-tuple x, given the $m \times n$ matrix t and the m-tuple k. Let $T \in \mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ be multiplication by t. Review the possibilities for a solution from our general linear theory for T (range, null space, affine subspace).
6.15 Let $b = c \mid d$ be the $m \times (n + p)$ matrix obtained by juxtaposing the $m \times n$ matrix c and the $m \times p$ matrix d. If a is an $l \times m$ matrix, show that $a \cdot b = ac \mid ad$. State the similar result concerning the expression of b as the juxtaposition of n column m-tuples. State the corresponding theorem for the "distributivity" of right multiplication over juxtaposition.

6.16 Let a be an $m \times n$ matrix and k a column m-tuple. Let $b \mid l$ be the $m \times (n + 1)$ matrix obtained from the $m \times (n + 1)$ juxtaposition matrix $a \mid k$ by row reduction. Show that $a \cdot x = k$ if and only if $b \cdot x = l$. Show that there is a solution x if and only if every row that is zero in b is zero in l. Restate this condition in terms of the notion of row rank.

6.17 Let b be the row-reduced echelon matrix obtained from an $m \times n$ matrix a. Thus $b = u \cdot a$, where u is nonsingular, and B and A have the same null space (where $B \in \mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ is multiplication by b). We can read off from b a basis for a subspace $W \subset \mathbb{R}^n$ such that $B \restriction W$ is an isomorphism onto range B. What is this basis? We then know that the null space N of B is a complement of W. One complement of W, call it M, can be read off from W. What is M?

6.18 Continuing the above exercise, show that for each standard basis vector $\delta^i$ in M we can read off from the matrix b a vector $\alpha_i$ in W such that $\delta^i - \alpha_i \in N$. Show that these vectors $\{\delta^i - \alpha_i\}$ form a basis for N.

6.19 We still have to show that the modified elementary row operations leave the determinant of a square matrix unchanged, assuming the properties (a) through (e) from Section 5. First, show from (a), (c), (d), and (e) that if T in $\mathrm{Hom}\,\mathbb{R}^2$ is defined by $T(\delta^1) = \delta^2$ and $T(\delta^2) = -\delta^1$, then $\Delta(T) = 1$. Do this by a very simple factorization, $T = R \circ S$, where (e) can be applied to S. Conclude that a type (1') elementary matrix has determinant 1.

6.20 Show from the determinant property (b) that an elementary matrix of type (3) has determinant 1.
Show, therefore, that the modified elementary row operations on a square matrix leave its determinant unchanged.

*7. THE DIAGONALIZATION OF A QUADRATIC FORM

As we mentioned earlier, one of the crucial problems of linear algebra is the analysis of the "structure" of a linear transformation T in Hom V. From the point of view of bases, every theorem in this area asserts that with the choice of a special basis for V the matrix of T can be given the such-and-such simple form. This is a very difficult part of the subject, and we are only making contact with it in this book, although Theorem 5.5 of Chapter 1 and its corollary form a cornerstone of the structural results.

In this section we are going to solve a simpler problem. In the above language it is the problem of choosing a basis for V making simple the matrix of a transformation T in $\mathrm{Hom}(V, V^*)$. Such a transformation is equivalent to a bilinear functional on V (by Theorem 6.1 of Chapter 1 and Theorem 3.2 of this chapter); we shall tackle the problem in this setting.
Let V be a finite-dimensional real vector space, and let $\omega: V \times V \to \mathbb{R}$ be a bilinear functional. If $\{\alpha_i\}_1^n$ is a basis for V, then $\omega$ determines a matrix $t_{ij} = \omega(\alpha_i, \alpha_j)$. We know that if $\omega_\eta(\xi) = \omega(\xi, \eta)$, then $\omega_\eta \in V^*$ and $\eta \mapsto \omega_\eta$ is a linear mapping T from V to $V^*$. We leave it as an exercise for the reader to show that $\{t_{ij}\}$ is the matrix of T with respect to the basis $\{\alpha_i\}$ for V and its dual basis for $V^*$ (Exercise 4.1). If $\xi = \sum_1^n x_i \alpha_i$ and $\eta = \sum_1^n y_i \alpha_i$, then
\[ \omega(\xi, \eta) = \sum_{i,j} x_i y_j\, \omega(\alpha_i, \alpha_j) = \sum_{i,j} t_{ij} x_i y_j. \]
In particular, if we set $q(\xi) = \omega(\xi, \xi)$, then $q(\xi) = \sum_{i,j} t_{ij} x_i x_j$ is a homogeneous quadratic polynomial in the coordinates $x_i$.

For the rest of this section we assume that $\omega$ is symmetric: $\omega(\xi, \eta) = \omega(\eta, \xi)$. Then we can recover $\omega$ from the quadratic form q by
\[ \omega(\xi, \eta) = \frac{q(\xi + \eta) - q(\xi - \eta)}{4}, \]
as the reader can easily check. In particular, if the bilinear form $\omega$ is not identically zero, then there are vectors $\xi$ such that $q(\xi) = \omega(\xi, \xi) \ne 0$.

What we want to do is to show that we can find a basis $\{\alpha_i\}_1^n$ for V such that $\omega(\alpha_i, \alpha_j) = 0$ if $i \ne j$ and $\omega(\alpha_i, \alpha_i)$ has one of the three values 0, $\pm 1$. Borrowing from the standard usage of scalar product theory (see Chapter 5), we say that such a basis is orthonormal. Our proof that an orthonormal basis exists will be an induction on $n = \dim V$. If $n = 1$, then any nonzero vector $\beta$ is a basis, and if $\omega(\beta, \beta) \ne 0$, then we can choose $\alpha = x\beta$ so that $x^2 \omega(\beta, \beta) = \omega(\alpha, \alpha) = \pm 1$, the required value of x obviously being $x = |\omega(\beta, \beta)|^{-1/2}$. In the general case, if $\omega$ is the zero functional, then any basis will trivially be orthonormal, and we can therefore suppose that $\omega$ is not identically 0. Then there exists a $\beta$ such that $\omega(\beta, \beta) \ne 0$, as we noted earlier. We set $\alpha_n = x\beta$, where x is chosen to make $q(\alpha_n) = \omega(\alpha_n, \alpha_n) = \pm 1$.
The nonzero linear functional $f(\xi) = \omega(\xi, \alpha_n)$ has an $(n - 1)$-dimensional null space N, and if we let $\omega'$ be the restriction of $\omega$ to $N \times N$, then $\omega'$ has an orthonormal basis $\{\alpha_i\}_1^{n-1}$ by the inductive hypothesis. Also $\omega(\alpha_i, \alpha_n) = \omega(\alpha_n, \alpha_i) = 0$ if $i < n$, because $\alpha_i$ is in the null space of f. Therefore, $\{\alpha_i\}_1^n$ is an orthonormal basis for $\omega$, and we have reached our goal:

Theorem 7.1. If $\omega$ is a symmetric bilinear functional on a finite-dimensional real vector space V, then V has an $\omega$-orthonormal basis.

For an $\omega$-orthonormal basis the expansion $\omega(\xi, \eta) = \sum x_i y_j\, \omega(\alpha_i, \alpha_j)$ reduces to
\[ \omega(\xi, \eta) = \sum_{i=1}^n x_i y_i\, q(\alpha_i), \]
where $q(\alpha_i) = \pm 1$ or 0. If we let $V_1$ be the span of those basis vectors $\alpha_i$ for which $q(\alpha_i) = 1$, and similarly for $V_{-1}$ and $V_0$, then we see that $q(\xi) > 0$ for every nonzero $\xi$ in $V_1$, $q(\xi) < 0$ for every nonzero vector $\xi$ in $V_{-1}$, and $q = 0$
on $V_0$. Furthermore, $V = V_1 \oplus V_{-1} \oplus V_0$, and the three subspaces are $\omega$-orthogonal to each other (which means that $\omega(\xi, \eta) = 0$ if $\xi \in V_1$ and $\eta \in V_{-1}$, etc.). Finally, $q(\xi) \le 0$ for every $\xi$ in $V_{-1} \oplus V_0$.

If we choose another orthonormal basis $\{\beta_i\}$ and let $W_1$, $W_{-1}$, and $W_0$ be its corresponding subspaces, then $W_1$ may be different from $V_1$, but their dimensions must be the same. For $W_1 \cap (V_{-1} \oplus V_0) = \{0\}$, since any nonzero $\xi$ in this intersection would yield the contradictory inequalities $q(\xi) > 0$ and $q(\xi) \le 0$. Thus $W_1$ can be extended to a complement of $V_{-1} \oplus V_0$, and since $V_1$ is a complement, we have $d(W_1) \le d(V_1)$. Similarly, $d(V_1) \le d(W_1)$, and the dimensions therefore are equal. Incidentally, this shows that $W_1$ is a complement of $V_{-1} \oplus V_0$. In exactly the same way, we find that $d(W_{-1}) = d(V_{-1})$ and finally, by subtraction, that $d(W_0) = d(V_0)$.

It is conventional to reorder an $\omega$-orthonormal basis $\{\alpha_i\}_1^m$ so that all the $\alpha_i$'s with $q(\alpha_i) = 1$ come first, then those with $q(\alpha_i) = -1$, and finally those with $q(\alpha_i) = 0$. Our results above can then be stated as follows:

Theorem 7.2. If $\omega$ is a symmetric bilinear functional on a finite-dimensional space V, then there are integers n and p such that if $\{\alpha_i\}_1^m$ is any $\omega$-orthonormal basis in conventional order, and if $\xi = \sum_1^m x_i \alpha_i$, then
\[ q(\xi) = x_1^2 + \cdots + x_p^2 - x_{p+1}^2 - \cdots - x_{p+n}^2 = \sum_{1}^{p} x_i^2 - \sum_{p+1}^{p+n} x_i^2. \]

The integer $p - n$ is called the signature of the form q (or its associated symmetric bilinear functional $\omega$), and $p + n$ is its rank. Note that $p + n$ is the dimension of the column space of the above matrix of q, and hence equals the dimension of the range of the related linear map T. Therefore, $p + n$ is the rank of every matrix of q.

An inductive proof that an orthonormal basis exists doesn't show us how to find one in practice.
Let us suppose that we have the matrix $\{t_{ij}\}$ of $\omega$ with respect to some basis $\{\alpha_i\}_1^n$ before us, so that $\omega(\xi, \eta) = \sum x_i y_j t_{ij}$, where $\xi = \sum_1^n x_i \alpha_i$, $\eta = \sum_1^n y_i \alpha_i$, and $t_{ij} = \omega(\alpha_i, \alpha_j)$, and we want to know how to go about actually finding an orthonormal basis $\{\beta_i\}_1^n$. The main problem is to find an orthogonal basis; normalization is then trivial.

The first objective is to find a vector $\beta$ such that $\omega(\beta, \beta) \ne 0$. If some $t_{ii} = \omega(\alpha_i, \alpha_i)$ is not zero, we can take $\beta = \alpha_i$. If all $t_{ii} = 0$ and the form $\omega$ is not the zero form, there must be some $t_{ij} \ne 0$, say $t_{12} \ne 0$. If we set $\gamma_1 = \alpha_1 + \alpha_2$ and $\gamma_i = \alpha_i$ for $i > 1$, then $\{\gamma_i\}_1^n$ is a basis, and the matrix $s = \{s_{ij}\}$ of $\omega$ with respect to the basis $\{\gamma_i\}$ has
\[ s_{11} = \omega(\gamma_1, \gamma_1) = \omega(\alpha_1 + \alpha_2, \alpha_1 + \alpha_2) = t_{11} + 2t_{12} + t_{22} = 2t_{12} \ne 0. \]
Similarly, $s_{ij} = t_{ij}$ if both i and j are greater than 1. For example, if $\omega$ is the bilinear form on $\mathbb{R}^2$ defined by $\omega(x, y) = x_1 y_2 + x_2 y_1$, then its matrix $t_{ij} = \omega(\delta^i, \delta^j)$ is
\[ \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \]
and we must change the basis to get $s_{11} \ne 0$. According to the above scheme, we set $\gamma_1 = \delta^1 + \delta^2$ and $\gamma_2 = \delta^2$ and get the new matrix $s_{ij} = \omega(\gamma_i, \gamma_j)$, which works out to
\[ \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}. \]

The next step is to find a basis for the null space of the functional $\omega(\xi, \gamma_1) = \sum x_i s_{i1}$. We do this by modifying $\gamma_2, \ldots, \gamma_n$; we replace $\gamma_j$ by $\gamma_j + c\gamma_1$ and calculate c so that this vector is in the null space. Therefore, we want $0 = \omega(\gamma_j + c\gamma_1, \gamma_1) = s_{1j} + c s_{11}$, and so $c = -s_{1j}/s_{11}$. Note that we cannot take this orthogonalizing step until we have made $s_{11} \ne 0$. The new set still spans and thus is a basis, and the new matrix $\{r_{ij}\}$ has $r_{11} \ne 0$ and $r_{1j} = r_{j1} = 0$ for $j > 1$. We now simply repeat the whole procedure for the restriction of $\omega$ to this $(n - 1)$-dimensional null space, with matrix $\{r_{ij} : 2 \le i, j \le n\}$, and so on.

This is a long process, but until we normalize, it consists only of rational operations on the original matrix. We add, subtract, multiply, and divide, but we do not have to find roots of polynomial equations.

Continuing our above example, we set $\beta_1 = \gamma_1$, but we have to replace $\gamma_2$ by $\beta_2 = \gamma_2 - (s_{12}/s_{11})\gamma_1 = \gamma_2 - \tfrac{1}{2}\gamma_1$. The final matrix $r_{ij} = \omega(\beta_i, \beta_j)$ has
\[ r_{11} = s_{11} = 2, \]
\[ r_{12} = \omega(\beta_1, \beta_2) = \omega(\gamma_1, \gamma_2 - \tfrac{1}{2}\gamma_1) = s_{12} - \tfrac{1}{2}s_{11} = 0, \]
and
\[ r_{22} = \omega(\gamma_2 - \tfrac{1}{2}\gamma_1, \gamma_2 - \tfrac{1}{2}\gamma_1) = s_{22} - s_{12} + s_{11}/4 = -\tfrac{1}{2}, \]
so that
\[ \{r_{ij}\} = \begin{bmatrix} 2 & 0 \\ 0 & -\tfrac{1}{2} \end{bmatrix}. \]
The final basis is $\beta_1 = \gamma_1 = \delta^1 + \delta^2$ and $\beta_2 = \gamma_2 - \tfrac{1}{2}\gamma_1 = \delta^2 - \tfrac{1}{2}(\delta^1 + \delta^2) = (\delta^2 - \delta^1)/2$.

The steps we had to take above are reminiscent of row reduction, but since we are changing bases simultaneously in the domain and range spaces of the transformation $T: V \to V^*$ associated with $\omega$, each step involves simultaneously premultiplying and postmultiplying by an elementary matrix. That is, we are simultaneously row and column reducing. It should be intuitively clear that this has to be the case if we are to operate on a symmetric matrix in such a way as to keep it symmetric.
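The orthogonalizing scheme can be sketched in code. The following is a minimal illustration, not the text's own algorithm statement: the names `omega` and `orthogonal_basis` are ours, vectors are coordinate lists with respect to the original basis, and exact rational arithmetic is assumed. At each stage it looks for a vector $\gamma$ with $\omega(\gamma, \gamma) \ne 0$ (adding two basis vectors if every diagonal value vanishes) and then replaces each later $\gamma_j$ by $\gamma_j + c\gamma$.

```python
from fractions import Fraction

def omega(t, x, y):
    """Value of the bilinear form with matrix t on coordinate vectors x, y."""
    n = len(t)
    return sum(t[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def orthogonal_basis(t):
    """Coordinates of an omega-orthogonal basis for the symmetric matrix t."""
    n = len(t)
    basis = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        # Look for a vector among basis[k:] on which the quadratic form is nonzero.
        pivot = next((i for i in range(k, n)
                      if omega(t, basis[i], basis[i]) != 0), None)
        if pivot is None:
            # All diagonal values vanish: use gamma_i + gamma_j for a pair
            # with omega(gamma_i, gamma_j) != 0, if such a pair exists.
            pair = next(((i, j) for i in range(k, n) for j in range(i + 1, n)
                         if omega(t, basis[i], basis[j]) != 0), None)
            if pair is None:
                break  # omega is zero on the span of the remaining vectors
            i, j = pair
            basis[i] = [p + q for p, q in zip(basis[i], basis[j])]
            pivot = i
        basis[k], basis[pivot] = basis[pivot], basis[k]
        s_kk = omega(t, basis[k], basis[k])
        for j in range(k + 1, n):
            # Choose c so that gamma_j + c * gamma_k is orthogonal to gamma_k.
            c = -omega(t, basis[j], basis[k]) / s_kk
            basis[j] = [p + c * q for p, q in zip(basis[j], basis[k])]
    return basis

t = [[0, 1], [1, 0]]
basis = orthogonal_basis(t)
gram = [[omega(t, u, v) for v in basis] for u in basis]
print(basis)
print(gram)
```

Only rational operations are used; normalizing the basis afterwards would require the square roots $|q(\beta_i)|^{1/2}$.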
For additional information about quadratic forms, we go back to the change of basis formula for the matrix of a transformation: $t'' = b \cdot t' \cdot a^{-1}$. Here the transformation T associated with the form $\omega$ is from V to $V^*$, and so $b = (a^*)^{-1}$, according to our calculations in Section 4. Now one of the properties of the determinant function is that $\Delta(T^*) = \Delta(T)$, and so $\Delta(a^*) = \Delta(a)$. Therefore, if t and s are the matrices of a quadratic form with respect to a first and second basis in V, and if a is the change of basis matrix, then $s = (a^*)^{-1} \cdot t \cdot a^{-1}$ and $\Delta(s) = (\Delta(a^{-1}))^2 \Delta(t)$. Therefore, a quadratic form has parity. If it is nonsingular, then its determinant is either always positive or always negative, and
we can call it even or odd. In our continuing example, the beginning and final matrices
\[ \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 2 & 0 \\ 0 & -\tfrac{1}{2} \end{bmatrix} \]
both have determinant $-1$.

In the two-dimensional case, the determinant of a form with respect to an orthonormalized basis is $+1$ if the diagonal elements are both $+1$ or both $-1$, and $-1$ if they are of opposite sign. We can therefore read off the signature of a nonsingular form over a two-dimensional space without orthonormalizing. If the determinant $t_{11}t_{22} - (t_{12})^2$ is positive, the signature is $\pm 2$, and we can determine which by looking at $t_{11}$ (since $t_{11}$ is then unchanged by our orthogonalizing procedure). Thus the signature is $+2$ or $-2$ depending on whether $t_{11} > 0$ or $t_{11} < 0$. If the determinant is negative, then the signature is 0. Thus the signature of the form $\omega(x, y) = x_1 y_2 + x_2 y_1$, with matrix
\[ \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \]
is known to be 0, without any calculation.

Theorems 7.1 and 7.2 are important for the classification of critical points of real-valued functions on vector spaces. We shall see in Section 3.16 that the second differential of such a function F is a symmetric bilinear functional, and that the signature of its form has the same significance in determining the behavior of F near a point at which its first differential is zero that the sign of the second derivative has in the elementary calculus.

A quadratic form q is said to be definite if $q(\xi)$ is never zero except for $\xi = 0$. Then $q(\xi)$ must always have the same sign, and q is accordingly called positive definite or negative definite. Looking back to Theorem 7.2, it should be obvious that q is positive definite only if $p = d(V)$ and $n = 0$, and negative definite only if $n = d(V)$ and $p = 0$.

A symmetric bilinear functional whose associated quadratic form is positive definite is called a scalar product. This is a very important notion on general vector spaces, and the whole of Chapter 5 is devoted to developing some of its implications.
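The two-dimensional signature test described in this section fits in a few lines. This is a sketch under stated assumptions: the function name `signature_2d` and its argument names are ours, and the form is assumed nonsingular, so a zero determinant is rejected rather than classified.

```python
def signature_2d(t11, t12, t22):
    """Signature of a nonsingular symmetric form with matrix [[t11, t12], [t12, t22]]."""
    det = t11 * t22 - t12 * t12
    if det == 0:
        raise ValueError("singular form: the determinant test does not apply")
    if det < 0:
        return 0  # diagonal values of opposite sign
    # det > 0 forces t11 != 0, and its sign decides between +2 and -2.
    return 2 if t11 > 0 else -2

print(signature_2d(0, 1, 0))    # the form x1*y2 + x2*y1
print(signature_2d(2, 1, 3))
print(signature_2d(-2, 1, -3))
```

The first call treats the continuing example, whose matrix has determinant $-1$.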
CHAPTER 3

THE DIFFERENTIAL CALCULUS

Our algebraic background is now adequate for the differential calculus, but we still need some multidimensional limit theory. Roughly speaking, the differential calculus is the theory of linear approximations to nonlinear mappings, and we have to know what we mean by approximation in general vector settings. We shall therefore start this chapter by studying the notion of a measure of length, called a norm, for the vectors in a vector space V. We can then study the phenomenon suggested by the way in which a tangent plane to a surface approximates the surface near the point of tangency. This is the general theory of unique local linear approximations of mappings, called differentials. The collection of rules for computing differentials includes all the familiar laws of the differential calculus, and achieves the same goal of allowing complicated calculations to be performed in a routine way. However, the theory is richer in the multidimensional setting, and one new aspect which we must master is the interplay between the linear transformations which are differentials and their evaluations at given vectors, which are directional derivatives in general and partial derivatives when the vectors belong to a basis. In particular, when the spaces in question are finite-dimensional and are replaced by Cartesian spaces through a choice of bases, then the differential is entirely equivalent to its matrix, which is a certain matrix of partial derivatives called the Jacobian matrix of the mapping. Then the rules of the differential calculus are expressed in terms of matrix operations.

Maximum and minimum points of real-valued functions are found exactly as before, by computing the differential and setting it equal to zero. However, we shall neglect this subject, except in starred sections.
It also is much richer than its one-variable counterpart, and in certain infinite-dimensional situations it becomes the subject called the calculus of variations.

Finally, we shall begin our study of the inverse-mapping theorem and the implicit-function theorem. The inverse-mapping theorem states that if a mapping between vector spaces is continuously differentiable, and if its differential at a point $\alpha$ is invertible (as a linear transformation), then the mapping itself is invertible in the neighborhood of $\alpha$. The implicit-function theorem states that if a continuously differentiable vector-valued function G of two vector variables is set equal to zero, and if the second partial differential of G is invertible (as a linear mapping) at a point $\langle \alpha, \beta \rangle$ where $G(\alpha, \beta) = 0$, then the equation
$G(\xi, \eta) = 0$ can be solved for $\eta$ in terms of $\xi$ near this point. That is, there is a uniquely determined mapping $\eta = F(\xi)$ defined near $\alpha$ such that $\beta = F(\alpha)$ and such that $G(\xi, F(\xi)) = 0$ in the neighborhood of $\alpha$.

These two theorems are fundamental to the further development of analysis. They are deeper results than our work up to this point in that they depend on a special property of vector spaces called completeness; we shall have to put off part of their proofs to the next chapter, where we shall study completeness in a fairly systematic way.

In a number of starred sections at the end of the chapter we present some harder material that we do not expect the reader to master. However, he should try to get a rough idea of what is going on.

1. REVIEW IN $\mathbb{R}$

Every student of the calculus is presumed to be familiar with the properties of the real number system and the theory of limits. But we shall need more than familiarity at this point. It will be absolutely essential that the student understand the $\epsilon$-definitions and be able to work with them. To be on the safe side, we shall review some of this material in the setting of limits of functions; the confident reader can skip it. We suppose that all the functions we consider are defined at least on an open interval containing a, except possibly at a itself. The need for this exception is shown by the difference quotients of the calculus, which are not defined at the point near which their behavior is crucial.

Definition. $f(x)$ approaches l as x approaches a (in symbols, $f(x) \to l$ as $x \to a$) if for every positive $\epsilon$ there exists a positive $\delta$ such that
\[ 0 < |x - a| < \delta \implies |f(x) - l| < \epsilon. \]
We also say that l is the limit of $f(x)$ as x approaches a and write $\lim_{x \to a} f(x) = l$.

The displayed statement in the definition is understood to be universally quantified in x, so that the definition really begins with the three quantifiers $(\forall \epsilon^{>0})(\exists \delta^{>0})(\forall x)$.
These prefixing quantifiers make the definition sound artificial and unidiomatic when read as ordinary prose, but the reader will remember from our introductory discussion of quantification that this artificiality is absolutely necessary in order for the meaning of the sentence to be clear and unambiguous. Any change in the order of the quantifiers $(\forall \epsilon)(\exists \delta)(\forall x)$ changes the meaning of the statement.

The meaning of the inner universal quantification
\[ (\forall x)(0 < |x - a| < \delta \implies |f(x) - l| < \epsilon) \]
is intuitive and easily pictured (see Fig. 3.1).
For all $x$ closer to $a$ than $\delta$ the value of $f$ at $x$ is closer to $l$ than $\epsilon$. The definition begins by stating that such a positive $\delta$ can be found for each positive $\epsilon$. Of course, $\delta$ will vary with $\epsilon$; if $\epsilon$ is made smaller, we will generally have to go closer to $a$, that is, we will have to take $\delta$ smaller, before all the values of $f$ on $(a - \delta, a + \delta) - \{a\}$ become $\epsilon$-close to $l$. The variables '$\epsilon$' and '$\delta$' are almost always restricted to positive real numbers, and from now on we shall let this restriction be implicit unless there seems to be some special call for explicitness. Thus we shall write simply $(\forall \epsilon)(\exists \delta) \ldots$
The definition of convergence is used in various ways. In the simplest situations we are given one or more functions having limits at $a$, say, $f(x) \to u$ and $g(x) \to v$, and we want to prove that some other function $h$ has a limit $w$ at $a$. In such cases we always try to find an inequality expressing the quantity we wish to make small, $|h(x) - w|$, in terms of the quantities which we know can be made small, $|f(x) - u|$ and $|g(x) - v|$.
For example, suppose that $h = f + g$. Since $f(x)$ is close to $u$ and $g(x)$ is close to $v$, clearly $h(x)$ is close to $w = u + v$. But how close? Since $h(x) - w = (f(x) - u) + (g(x) - v)$, we have
$|h(x) - w| \le |f(x) - u| + |g(x) - v|$.
From this it is clear that in order to make $|h(x) - w|$ less than $\epsilon$ it is sufficient to make each of $|f(x) - u|$ and $|g(x) - v|$ less than $\epsilon/2$. Therefore, given any $\epsilon$, we can take $\delta_1$ so that $0 < |x - a| < \delta_1 \Rightarrow |f(x) - u| < \epsilon/2$, and $\delta_2$ so that $0 < |x - a| < \delta_2 \Rightarrow |g(x) - v| < \epsilon/2$, and we can then take $\delta$ as the smaller of these two numbers, so that if $0 < |x - a| < \delta$, then both inequalities hold. Thus
$0 < |x - a| < \delta \Rightarrow |h(x) - w| \le |f(x) - u| + |g(x) - v| < \epsilon/2 + \epsilon/2 = \epsilon$,
and we have found the desired $\delta$ for the function $h$.
Suppose next that $u \ne 0$ and that $h = 1/f$. Clearly, $h(x)$ is close to $w = 1/u$ when $f(x)$ is close to $u$, and so we try to express $h(x) - w$ in terms of $f(x) - u$.
Thus
$h(x) - w = \frac{1}{f(x)} - \frac{1}{u} = \frac{u - f(x)}{f(x)\,u}$,
and so $|h(x) - w| \le |f(x) - u| / |f(x)u|$. The trouble here is that the denominator is variable, and if it should happen to be very small, it might cancel the smallness of $|f(x) - u|$ and not force a small quotient. But the answer to this problem is easy. Since $f(x)$ is close to $u$ and $u$ is not zero, $f(x)$ cannot be close to zero. For instance, if $f(x)$ is closer to $u$ than $|u|/2$, then $f(x)$ must be farther from 0 than $|u|/2$. We therefore choose $\delta_1$ so that $0 < |x - a| < \delta_1 \Rightarrow |f(x) - u| < |u|/2$, from which it follows that $|f(x)| > |u|/2$. Then
$|h(x) - w| \le 2|f(x) - u| / |u|^2$
and now, given any $\epsilon$, we take $\delta_2$ so that $0 < |x - a| < \delta_2 \Rightarrow |f(x) - u| < \epsilon |u|^2 / 2$. Again taking $\delta$ as the smaller of $\delta_1$ and $\delta_2$, so that both inequalities will hold simultaneously when $0 < |x - a| < \delta$, we have
$0 < |x - a| < \delta \Rightarrow |h(x) - w| \le 2|f(x) - u| / |u|^2 < 2\epsilon |u|^2 / 2|u|^2 = \epsilon$,
and again we have found our $\delta$ for the function $h$.
We have tried to show how one would think about these situations. The actual proof that would be written down would only show the choice of $\delta$. Thus,
Lemma 1.1. If $f(x) \to u$ and $g(x) \to v$ as $x \to a$, then $f(x) + g(x) \to u + v$ as $x \to a$.
Proof. Given $\epsilon$, choose $\delta_1$ so that $0 < |x - a| < \delta_1 \Rightarrow |f(x) - u| < \epsilon/2$ (by the assumed convergence of $f$ to $u$ at $a$), and, similarly, choose $\delta_2$ so that $0 < |x - a| < \delta_2 \Rightarrow |g(x) - v| < \epsilon/2$. Take $\delta$ as the smaller of $\delta_1$ and $\delta_2$. Then
$0 < |x - a| < \delta \Rightarrow |(f(x) + g(x)) - (u + v)| \le |f(x) - u| + |g(x) - v| < \epsilon/2 + \epsilon/2 = \epsilon$.
Thus we have proved that for every $\epsilon$ there is a $\delta$ such that $0 < |x - a| < \delta \Rightarrow |(f(x) + g(x)) - (u + v)| < \epsilon$, and we are done. □
In addition to understanding $\epsilon$-techniques in limit theory, it is necessary to understand and to be able to use the fundamental property of the real number system called the least upper bound property. In the following statement of the property the semi-infinite interval $(-\infty, a]$ is of course the subset $\{x \in \mathbb{R} : x \le a\}$.
If $A$ is a nonempty subset of $\mathbb{R}$ such that $A \subset (-\infty, a]$ for some $a$, then there exists a uniquely determined smallest number $b$ such that $A \subset (-\infty, b]$.
A number $a$ such that $A \subset (-\infty, a]$ is called an upper bound of $A$; clearly, $a$ is an upper bound of $A$ if and only if every $x$ in $A$ is less than or equal to $a$. A set having an upper bound is said to be bounded above. The property says that a nonempty set $A$ which is bounded above has a least upper bound (lub). If we reverse the order relation by multiplying everything by $-1$, then we have the alternative formulation which asserts that a nonempty subset of $\mathbb{R}$ that is bounded below has a greatest lower bound (glb).
The least upper bound of the interval $(0, 1)$ is 1. The least upper bound of $[0, 1]$ is also 1. The greatest lower bound of $\{1/n : n \text{ a positive integer}\}$ is 0. Furthermore, lub $\{x : x \text{ is a positive rational number and } x^2 < 2\} = \sqrt{2}$, glb $\{e^x : x \in \mathbb{R}\} = 0$, and lub $\{e^x : x \text{ is rational and } x < \sqrt{2}\} = e^{\sqrt{2}}$.
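The third example can be made concrete by a small computation. The following is only a sketch under our own assumptions: the function name `squeeze_lub` and the bisection strategy are illustrative and do not come from the text. It traps the least upper bound of the set of positive rationals with square less than 2 between a member of the set and an upper bound of the set, using exact rational arithmetic.

```python
from fractions import Fraction

def squeeze_lub(steps=40):
    """Bisect toward lub {x : x positive rational, x*x < 2} with exact rationals.

    Invariant: lo is a member of the set (lo*lo < 2) and
    hi is an upper bound of the set (hi > 0 and hi*hi > 2).
    """
    lo, hi = Fraction(1), Fraction(2)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid   # mid belongs to the set
        else:
            hi = mid   # mid*mid > 2 (a rational square is never 2), so mid is an upper bound
    return lo, hi

lo, hi = squeeze_lub()
print(float(lo), float(hi))
```

Every member of the set lies below `hi` and no number below `lo` is an upper bound, so the least upper bound is confined to an interval whose length is halved at each step.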
EXERCISES

1.1 Prove that if $f(x) \to l$ and $f(x) \to m$ as $x \to a$, then $l = m$. We can therefore talk about the limit of $f$ as $x \to a$.
1.2 Prove that if $f(x) \to l$ and $g(x) \to m$ (as $x \to a$), then $f(x)g(x) \to lm$ as $x \to a$.
1.3 Prove that $|x - a| < |a|/2 \Rightarrow |x| > |a|/2$.
1.4 Prove (in detail) the greatest lower bound property from the least upper bound property.
1.5 Show that lub $A = x$ if and only if $x$ is an upper bound of $A$ and, for every positive $\epsilon$, $x - \epsilon$ is not an upper bound of $A$.
1.6 Let $A$ and $B$ be subsets of $\mathbb{R}$ that are nonempty and bounded above. Show that $A + B$ is nonempty and bounded above and that lub $(A + B)$ = lub $A$ + lub $B$.
1.7 Formulate and prove a correct theorem about the least upper bound of the product of two sets.
1.8 Define the notion of a one-sided limit for a function whose domain is a subset of $\mathbb{R}$. For example, we want to be able to discuss the limit of $f(x)$ as $x$ approaches $a$ from below, which we might designate $\lim_{x \uparrow a} f(x)$.
1.9 If the domain of a real-valued function $f$ is an interval, say $[a, b]$, we say that $f$ is an increasing function if $x < y \Rightarrow f(x) \le f(y)$. Prove that an increasing function has one-sided limits everywhere.
1.10 Let $[a, b]$ be a closed interval in $\mathbb{R}$, and let $f: [a, b] \to \mathbb{R}$ be increasing. Show that $\lim_{x \to y} f(x) = f(y)$ for all $y$ in $[a, b]$ ($f$ is continuous on $[a, b]$) if and only if the range of $f$ does not omit any subinterval $(c, d) \subset [f(a), f(b)]$. [Hint: Suppose the range omits $(c, d)$, and set $y = $ lub $\{x : f(x) \le c\}$. Then $f(x) \not\to f(y)$ as $x \to y$.]
1.11 A set that intersects every open subinterval of an interval $[s, t]$ is said to be dense in $[s, t]$. Show that if $f: [a, b] \to \mathbb{R}$ is increasing and range $f$ is dense in $[f(a), f(b)]$, then range $f = [f(a), f(b)]$. (For any $z$ between $f(a)$ and $f(b)$ set $y = $ lub $\{x : f(x) \le z\}$, etc.)
1.12 Assuming the results of the above two exercises, show that if $f$ is a continuous strictly increasing function from $[a, b]$ to $\mathbb{R}$, and if $r = f(a)$ and $s = f(b)$, then $f^{-1}$ is a continuous strictly increasing function from $[r, s]$ to $\mathbb{R}$. [A function $f$ is continuous if $f(x) \to f(y)$ as $x \to y$ for every $y$ in its domain; it is strictly increasing if $x < y \Rightarrow f(x) < f(y)$.]
1.13 Argue somewhat as in Exercise 1.11 above to prove that if $f: [a, b] \to \mathbb{R}$ is continuous on $[a, b]$, then the range of $f$ includes $[f(a), f(b)]$. This is the intermediate-value theorem.
1.14 Suppose the function $q: \mathbb{R} \to \mathbb{R}$ satisfies $q(xy) = q(x)q(y)$ for all $x, y \in \mathbb{R}$. Note that $q(x) = x^n$ ($n$ a positive integer) and $q(x) = |x|^r$ ($r$ any real number) satisfy this "functional equation". So does $q(x) \equiv 0$ ($r = -\infty$?). Show that if $q$ satisfies the functional equation and $q(x) > 1$ for $x > 1$, then there is a real number $r > 0$ such that $q(x) = x^r$ for all positive $x$.
1.15 Show that if $q$ is continuous and satisfies the functional equation $q(xy) = q(x)q(y)$ for all $x, y \in \mathbb{R}$, and if there is at least one point $a$ where $q(a) \ne 0, 1$, then $q(x) = x^r$ for all positive $x$. Conclude that if also $q$ is nonnegative, then $q(x) = |x|^r$ on $\mathbb{R}$.
1.16 Show that if $q(x) = |x|^r$, and if $q(x + y) \le q(x) + q(y)$, then $r \le 1$. (Try $y = 1$ and $x$ large; what is $q'(x)$ like if $r > 1$?)

2. NORMS

In the limit theory of $\mathbb{R}$, as reviewed briefly above, the absolute-value function is used prominently in expressions like '$|x - y|$' to designate the distance between two numbers, here between $x$ and $y$. The definition of the convergence of $f(x)$ to $u$ is simply a careful statement of what it means to say that the distance $|f(x) - u|$ tends to zero as the distance $|x - a|$ tends to zero. The properties of $|x|$ which we have used in our proofs are
1) $|x| > 0$ if $x \ne 0$, and $|0| = 0$;
2) $|xy| = |x|\,|y|$;
3) $|x + y| \le |x| + |y|$.
The limit theory of vector spaces is studied in terms of functions called norms, which serve as multidimensional analogues of the absolute-value function on $\mathbb{R}$. Thus, if $p: V \to \mathbb{R}$ is a norm, then we want to interpret $p(\alpha)$ as the "size" of $\alpha$ and $p(\alpha - \beta)$ as the "distance" between $\alpha$ and $\beta$. However, if $V$ is not one-dimensional, there is no one notion of size that is most natural. For example, if $f$ is a positive continuous function on $[a, b]$, and if we ask the reader for a number which could be used as a measure of how "large" $f$ is, there are two possibilities that will probably occur to him: the maximum value of $f$ and the area under the graph of $f$. Certainly, $f$ must be considered small if max $f$ is small. But also, we would have to agree that $f$ is small in a different sense if its area is small. These are two examples of norms on the vector space $V = \mathcal{C}([a, b])$ of all continuous functions on $[a, b]$:
$p(f) = \max \{|f(t)| : t \in [a, b]\}$ and $q(f) = \int_a^b |f(t)|\, dt$.
Note that $f$ can be small in the second sense and not in the first.
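The last remark can be checked numerically. The sketch below is an illustration under our own assumptions: the tent-shaped function `spike`, the helper names `sup_norm` and `integral_norm`, and the grid sizes are hypothetical choices, not part of the text. Both norms are only approximated, the first by sampling and the second by the midpoint rule.

```python
def sup_norm(f, a, b, samples=10_001):
    """Approximate max |f(t)| on [a, b] by sampling a uniform grid (endpoints included)."""
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step)) for i in range(samples))

def integral_norm(f, a, b, cells=10_000):
    """Approximate the integral of |f(t)| over [a, b] by the midpoint rule."""
    h = (b - a) / cells
    return h * sum(abs(f(a + (i + 0.5) * h)) for i in range(cells))

def spike(n):
    """Tent of height 1 supported on [0, 1/n]; its exact area is 1/(2n)."""
    return lambda t: max(0.0, 1.0 - n * t)

for n in (1, 10, 100):
    f = spike(n)
    print(n, sup_norm(f, 0.0, 1.0), integral_norm(f, 0.0, 1.0))
```

As $n$ grows, the maximum of the tent stays at 1 while its area shrinks like $1/(2n)$, so the function is small in the sense of $q$ but not in the sense of $p$.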
In order to be useful, a notion of size for a vector must have properties analogous to those of the absolute-value function on $\mathbb{R}$.
Definition. A norm is a real-valued function $p$ on a vector space $V$ such that
n1. $p(\alpha) > 0$ if $\alpha \ne 0$ (positivity);
n2. $p(x\alpha) = |x|\,p(\alpha)$ for all $\alpha \in V$, $x \in \mathbb{R}$ (homogeneity);
n3. $p(\alpha + \beta) \le p(\alpha) + p(\beta)$ for all $\alpha, \beta \in V$ (triangle inequality).
A normed linear space (nls), or normed vector space, is a vector space $V$ together with a norm $p$ on $V$. A normed linear space is thus really a pair
$\langle V, p \rangle$, but generally we speak simply of the normed linear space $V$, a definite norm on $V$ then being understood.
It has been customary to designate the norm of $\alpha$ by $\|\alpha\|$, presumably to suggest the analogy with absolute value. The triangle inequality n3 then becomes $\|\alpha + \beta\| \le \|\alpha\| + \|\beta\|$, which is almost identical in form with the basic absolute-value inequality $|x + y| \le |x| + |y|$. Similarly, n2 becomes $\|x\alpha\| = |x|\,\|\alpha\|$, analogous to $|xy| = |x|\,|y|$ in $\mathbb{R}$. Furthermore, $\|\alpha - \beta\|$ is similarly interpreted as the distance between $\alpha$ and $\beta$. This is reasonable since if we set $\alpha = \xi - \eta$ and $\beta = \eta - \zeta$, then n3 becomes the usual triangle inequality of geometry:
$\|\xi - \zeta\| \le \|\xi - \eta\| + \|\eta - \zeta\|$.
We shall use both the double bar notation and the "$p$"-notation for norms; each is on occasion superior to the other.
The most commonly used norms on $\mathbb{R}^n$ are $\|x\|_1 = \sum_1^n |x_i|$, the Euclidean norm $\|x\|_2 = (\sum_1^n x_i^2)^{1/2}$, and $\|x\|_\infty = \max \{|x_i| : 1 \le i \le n\}$. Similar norms on the infinite-dimensional vector space $\mathcal{C}([a, b])$ of all continuous real-valued functions on $[a, b]$ are
$\|f\|_1 = \int_a^b |f(t)|\, dt$, $\|f\|_2 = \left(\int_a^b |f(t)|^2\, dt\right)^{1/2}$, $\|f\|_\infty = \max \{|f(t)| : a \le t \le b\}$.
It should be easy for the reader to check that $\|\ \|_1$ is a norm in both cases above, and we shall take up the so-called uniform norms $\|\ \|_\infty$ in the next paragraph. The Euclidean norms $\|\ \|_2$ are trickier; their properties depend on scalar product considerations. These will be discussed in Chapter 5. Meanwhile, so that the reader can use the Euclidean norm $\|\ \|_2$ on $\mathbb{R}^n$, we shall ask him to prove the triangle inequality for it (the other axioms being obvious) by brute force in an exercise. On $\mathbb{R}$ itself the absolute value is a norm, and it is the only norm to within a constant multiple.
We can transfer the above norms on $\mathbb{R}^n$ to arbitrary finite-dimensional spaces by the following general remark.
Lemma 2.1.
If $p$ is a norm on a vector space $W$ and $T$ is an injective linear map from a vector space $V$ to $W$, then $p \circ T$ is a norm on $V$.
Proof. The proof is left to the reader.
Uniform norms. The two norms $\|\ \|_\infty$ considered above are special cases of a very general situation. Let $A$ be an arbitrary nonempty set, and let $\mathcal{B}(A, \mathbb{R})$ be the set of all bounded functions $f: A \to \mathbb{R}$. That is, $f \in \mathcal{B}(A, \mathbb{R})$ if and only if $f \in \mathbb{R}^A$ and range $f \subset [-b, b]$ for some $b \in \mathbb{R}$. This is the same as saying that range $|f| \subset [0, b]$, and we call any such $b$ a bound of $|f|$. The set $\mathcal{B}(A, \mathbb{R})$ is a
vector space $V$, since if $|f|$ and $|g|$ are bounded by $b$ and $c$, respectively, then $|xf + yg|$ is bounded by $|x|b + |y|c$. The uniform norm $\|f\|_\infty$ is defined as the smallest bound of $|f|$. That is,
$\|f\|_\infty = $ lub $\{|f(p)| : p \in A\}$.
Of course, it has to be checked that $\|\ \|_\infty$ is a norm. For any $p$ in $A$,
$|f(p) + g(p)| \le |f(p)| + |g(p)| \le \|f\|_\infty + \|g\|_\infty$.
Thus $\|f\|_\infty + \|g\|_\infty$ is a bound of $|f + g|$ and is therefore greater than or equal to the smallest such bound, which is $\|f + g\|_\infty$. This gives the triangle inequality. Next we note that if $x \ne 0$, then $b$ bounds $|f|$ if and only if $|x|b$ bounds $|xf|$, and it follows that $\|xf\|_\infty = |x|\,\|f\|_\infty$. Finally, $\|f\|_\infty \ge 0$, and $\|f\|_\infty = 0$ only if $f$ is the zero function.
We can replace $\mathbb{R}$ by any normed linear space $W$ in the above discussion. A function $f: A \to W$ is bounded by $b$ if and only if $\|f(p)\| \le b$ for all $p$ in $A$, and we define the corresponding uniform norm on $\mathcal{B}(A, W)$ by
$\|f\|_\infty = $ lub $\{\|f(p)\| : p \in A\}$.
If $f \in \mathcal{C}([0, 1])$, then we know that the continuous function $f$ assumes the least upper bound of its range as a value (that is, $f$ "assumes its maximum value"), so that then $\|f\|_\infty$ is the maximum value of $|f|$. In general, however, the definition must be given in terms of lub.
Balls. Remembering that $\|\alpha - \xi\|$ is interpreted as the distance from $\alpha$ to $\xi$, it is natural to define the open ball of radius $r$ about the center $\alpha$ as $\{\xi : \|\alpha - \xi\| < r\}$. We designate this ball $B_r(\alpha)$. Translation through $\beta$ preserves distance,
$\|T_\beta(\xi) - T_\beta(\eta)\| = \|(\xi + \beta) - (\eta + \beta)\| = \|\xi - \eta\|$,
and therefore $\xi \in B_r(\alpha)$ if and only if $\xi + \beta \in B_r(\alpha + \beta)$. That is, translation through $\beta$ carries $B_r(\alpha)$ into $B_r(\alpha + \beta)$: $T_\beta[B_r(\alpha)] = B_r(\alpha + \beta)$. Also, scalar multiplication by a positive number $c$ multiplies all distances by $c$, and it follows in a similar way that $cB_r(\alpha) = B_{cr}(c\alpha)$.
Although $B_r(\alpha)$ behaves like a ball, the actual set being defined is different for different norms, and some of them "look unspherelike".
The unit balls about the origin in $\mathbb{R}^2$ for the three norms $\|\ \|_1$, $\|\ \|_2$, and $\|\ \|_\infty$ are shown in Fig. 3.2.
A subset $A$ of a nls $V$ is bounded if it lies in some ball, say $B_r(\alpha)$. Then it also lies in a ball about the origin, namely $B_{r + \|\alpha\|}(0)$. This is simply the fact that if $\|\xi - \alpha\| < r$, then $\|\xi\| < r + \|\alpha\|$, which we get from the triangle inequality upon rewriting $\|\xi\|$ as $\|(\xi - \alpha) + \alpha\|$.
The radius of the largest ball about a vector $\beta$ which does not touch a set $A$ is naturally called the distance from $\beta$ to $A$. It is clearly glb $\{\|\xi - \beta\| : \xi \in A\}$ (see Fig. 3.3).
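The dependence of the ball on the norm, and the distance from a point to a set, can be sketched in a few lines. This is a minimal illustration under our own assumptions: the helper names (`norm1`, `norm2`, `norm_inf`, `in_open_ball`, `distance_to_set`) are hypothetical, vectors are plain tuples, and the set $A$ is taken to be finite so that the glb is simply a minimum.

```python
import math

def norm1(x):
    return sum(abs(c) for c in x)

def norm2(x):
    return math.sqrt(sum(c * c for c in x))

def norm_inf(x):
    return max(abs(c) for c in x)

def in_open_ball(x, center, r, norm):
    """True if x lies in the open ball of radius r about center for the given norm."""
    return norm([a - b for a, b in zip(x, center)]) < r

def distance_to_set(beta, points, norm):
    """glb of ||xi - beta|| over a finite set of points (a minimum in this case)."""
    return min(norm([a - b for a, b in zip(xi, beta)]) for xi in points)

origin = (0.0, 0.0)
x = (0.6, 0.6)
for name, norm in (("one", norm1), ("two", norm2), ("inf", norm_inf)):
    print(name, norm(x), in_open_ball(x, origin, 1.0, norm))

print(distance_to_set(origin, [(3.0, 4.0), (1.0, 1.0)], norm_inf))
```

The point $(0.6, 0.6)$ lies in the unit balls for $\|\ \|_2$ and $\|\ \|_\infty$ but not in the unit ball for $\|\ \|_1$, which matches the nesting of the three unit balls in $\mathbb{R}^2$.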
[Figures 3.2, 3.3 (labeled $\rho(\beta, A) = r$), and 3.4]
A point $\alpha$ is an interior point of a set $A$ if some ball about $\alpha$ is included in $A$. This is equivalent to saying that the distance from $\alpha$ to the complement of $A$ is positive (supposing that $A$ is not the whole of $V$), and should coincide with the reader's intuitive notion of what an "inside" point should be. A subset $A$ of a normed linear space is said to be open if every point of $A$ is an interior point.
If our language is to be consistent, an open ball should be an open set. It is: if $\alpha \in B_r(\beta)$, then $\|\alpha - \beta\| < r$, and then $B_\delta(\alpha) \subset B_r(\beta)$, provided that $\delta \le r - \|\alpha - \beta\|$, by virtue of the triangle inequality (see Fig. 3.4). The reader should write down the detailed proof. He has to show that if $\xi \in B_\delta(\alpha)$, then $\xi \in B_r(\beta)$. Our intuitions about distances are quite trustworthy, but they should always be checked by a computation. The reader probably can see by a mental argument that the union of any collection of open sets is open. In particular, the union of any collection of open balls is open (Fig. 3.5), and this is probably the most intuitive way of visualizing an open set. (See Exercise 2.9.)
[Figures 3.5 and 3.6]
A subset $C$ is said to be closed if its complement $C'$ is open. Our discussion above shows that a nonempty set $C$ is closed if and only if every point not in it is at a positive distance from it: $\alpha \notin C \Rightarrow \rho(\alpha, C) > 0$. The so-called closed ball of radius $r$ about $\beta$, $B = \{\xi : \|\xi - \beta\| \le r\}$, is a closed set. As Fig. 3.6 suggests, the proof is another application of the triangle inequality.
EXERCISES

2.1 Show that if $\|\xi - \alpha\| < \|\alpha\|/2$, then $\|\xi\| > \|\alpha\|/2$.
2.2 Prove in detail that $\|x\|_1 = \sum_1^n |x_i|$ is a norm on $\mathbb{R}^n$. Also prove that $\|f\|_1 = \int_a^b |f(t)|\, dt$ is a norm on $\mathcal{C}([a, b])$.
2.3 For $x$ in $\mathbb{R}^n$ let $|x|$ be the Euclidean length $|x| = [\sum_1^n x_i^2]^{1/2}$, and let $(x, y)$ be the scalar product $(x, y) = \sum_1^n x_i y_i$. The Schwarz inequality says that $|(x, y)| \le |x|\,|y|$ and that the inequality is strict if $x$ and $y$ are independent.
a) Prove the Schwarz inequality for the case $n = 2$ by squaring and canceling.
b) Now prove it for the general $n$ in the same way.
2.4 Continuing the above exercise, prove that the Euclidean length $|x|$ is a norm. The crucial step is the triangle inequality, $|x + y| \le |x| + |y|$. Reduce it to the Schwarz inequality by squaring and canceling. This is of course our two-norm $\|x\|_2$.
2.5 Prove that the unit balls for the norms $\|\ \|_1$ and $\|\ \|_\infty$ on $\mathbb{R}^2$ are as shown in Fig. 3.2.
2.6 Prove that an open ball is an open set.
2.7 Prove that a closed ball is a closed set.
2.8 Give an example of a subset of $\mathbb{R}^2$ that is neither open nor closed.
2.9 Show from the definition of an open set that any open set is the union of a family (perhaps very large!) of open balls. Show that any union of open sets is open. Conclude, therefore, that a set is open if and only if it is a union of open balls.
2.10 A subset $A$ of a normed linear space $V$ is said to be convex if $A$ includes the line segment joining any two of its points. We know that the line segment from $\alpha$ to $\beta$ is the image of $[0, 1]$ under the mapping $t \mapsto t\beta + (1 - t)\alpha$. Thus $A$ is convex if and only if $\alpha, \beta \in A$ and $t \in [0, 1] \Rightarrow t\beta + (1 - t)\alpha \in A$. Prove that every ball $B_r(\gamma)$ in a normed linear space $V$ is convex.
2.11 A seminorm is the same as a norm except that the positivity condition n1 is relaxed to nonnegativity:
n1'. $p(\alpha) \ge 0$ for all $\alpha$.
Thus $p(\alpha)$ may be 0 for some nonzero $\alpha$. Every norm is in particular a seminorm. Prove:
a) If $p$ is a seminorm on a vector space $W$ and $T$ is a linear mapping from $V$ to $W$, then $p \circ T$ is a seminorm on $V$.
b) $p \circ T$ is a norm if and only if $T$ is injective and $p$ is a norm on range $T$.
2.12 Show that the sum of two seminorms is a seminorm.
2.13 Prove from the above two exercises (and not by a direct calculation) that $q(f) = \|f'\|_\infty + |f(t_0)|$ is a seminorm on the space $\mathcal{C}^1([a, b])$ of all continuously differentiable real-valued functions on $[a, b]$, where $t_0$ is a fixed point in $[a, b]$. Prove that $q$ is a norm.
2.14 Show that the sum of two bounded sets is bounded.
2.15 Prove that the sum $B_r(\alpha) + B_s(\beta)$ is exactly the ball $B_{r+s}(\alpha + \beta)$.

3. CONTINUITY

Let $V$ and $W$ be any two normed linear spaces. We shall designate both norms by $\|\ \|$. This ambiguous usage does not cause confusion. It is like the ambiguous use of "0" for the zero elements of all the vector spaces under consideration. If we replace the absolute value sign $|\ |$ by the general norm symbol $\|\ \|$ in the definition we gave earlier for the limit of a real-valued function of a real variable, it becomes verbatim the corresponding definition of convergence in the general setting. However, we shall repeat the definition and take the occasion to relax the hypothesis on the domain of $f$. Accordingly, let $A$ be any subset of $V$, and let $f$ be any mapping from $A$ to $W$.
Definition. We say that $f(\xi)$ approaches $\beta$ as $\xi$ approaches $\alpha$, and write $f(\xi) \to \beta$ as $\xi \to \alpha$, if for every $\epsilon$ there is a $\delta$ such that
$\xi \in A$ and $0 < \|\xi - \alpha\| < \delta \Rightarrow \|f(\xi) - \beta\| < \epsilon$.
If $\alpha \in A$ and $f(\xi) \to f(\alpha)$ as $\xi \to \alpha$, then we say that $f$ is continuous at $\alpha$. We can then drop the requirement that $\xi \ne \alpha$ and have the direct $\epsilon,\delta$-characterization of continuity: $f$ is continuous at $\alpha$ if for every $\epsilon$ there exists a $\delta$ such that $\|\xi - \alpha\| < \delta \Rightarrow \|f(\xi) - f(\alpha)\| < \epsilon$. It is understood here that $\xi$ is universally quantified over the domain $A$ of $f$.
We say that $f$ is continuous if $f$ is continuous at every point $\alpha$ in its domain. If the absolute value of a number is replaced by the norm of a vector, the limit theorems that we sampled in Section 1 hold verbatim for normed linear spaces. We shall ask the reader to write out a few of these transcriptions in the exercises.
There is a property stronger than continuity at $\alpha$ which is much simpler to use when it is available. We say that $f$ is Lipschitz continuous at $\alpha$ if there is a constant $c$ such that $\|f(\xi) - f(\alpha)\| \le c\|\xi - \alpha\|$ for all $\xi$ sufficiently close to $\alpha$.
That is, there are constants $c$ and $r$ such that
$\|\xi - \alpha\| < r \Rightarrow \|f(\xi) - f(\alpha)\| \le c\|\xi - \alpha\|$.
The point is that now we can take $\delta$ simply as $\epsilon/c$ (provided $\epsilon$ is small enough so that this makes $\delta \le r$; otherwise we have to set $\delta = \min \{\epsilon/c, r\}$).
We say that $f$ is a Lipschitz function (on its domain $A$) if there is a constant $c$ such that $\|f(\xi) - f(\eta)\| \le c\|\xi - \eta\|$ for all $\xi, \eta$ in $A$. For a linear map $T: V \to W$ the Lipschitz inequality is more simply written as
$\|T(\zeta)\| \le c\|\zeta\|$
for all $\zeta \in V$; we just use the fact that now $T(\xi) - T(\eta) = T(\xi - \eta)$ and set $\zeta = \xi - \eta$. In this context it is conventional to call $T$ a bounded linear mapping rather than a Lipschitz linear mapping, and any such $c$ is called a bound of $T$.
We know from the beginning calculus that if $f$ is a continuous real-valued function on $[a, b]$ (that is, if $f \in \mathcal{C}([a, b])$), then $|\int_a^b f(x)\, dx| \le m(b - a)$, where $m$ is the maximum value of $|f(x)|$. But this is just the uniform norm of $f$, so that the inequality can be rewritten as $|\int_a^b f| \le (b - a)\|f\|_\infty$. This shows that if the uniform norm is used on $\mathcal{C}([a, b])$, then $f \mapsto \int_a^b f$ is a bounded linear functional, with bound $b - a$.
It should immediately be pointed out that this is not the same notion of boundedness we discussed earlier. There we called a real-valued function bounded if its range was a bounded subset of $\mathbb{R}$. The analogue here would be to call a vector-valued function bounded if its range is norm bounded. But a nonzero linear transformation cannot be bounded in this sense, because $\|T(x\alpha)\| = |x|\,\|T(\alpha)\|$. The present definition amounts to the boundedness in the earlier sense of the quotient $T(\alpha)/\|\alpha\|$ (on $V - \{0\}$).
It turns out that for a linear map $T$, being continuous and being Lipschitz are the same thing.
Theorem 3.1. Let $T$ be a linear mapping from a normed linear space $V$ to a normed linear space $W$. Then the following conditions are equivalent:
1) $T$ is continuous at one point;
2) $T$ is continuous;
3) $T$ is bounded.
Proof. (1) $\Rightarrow$ (3).
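Suppose $T$ is continuous at $\alpha_0$. Then, taking $\epsilon = 1$, there exists $\delta$ such that $\|\alpha - \alpha_0\| < \delta \Rightarrow \|T(\alpha) - T(\alpha_0)\| < 1$. Setting $\xi = \alpha - \alpha_0$ and using the additivity of $T$, we have $\|\xi\| < \delta \Rightarrow \|T(\xi)\| < 1$. Now for any nonzero $\eta$, $\xi = \delta\eta / 2\|\eta\|$ has norm $\delta/2$. Therefore, $\|T(\xi)\| < 1$. But $\|T(\xi)\| = \delta\|T(\eta)\| / 2\|\eta\|$, giving $\|T(\eta)\| < 2\|\eta\|/\delta$. Thus $T$ is bounded by $C = 2/\delta$.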
(3) $\Rightarrow$ (2). Suppose $\|T(\xi)\| \le C\|\xi\|$ for all $\xi$. Then for any $\alpha_0$ and any $\epsilon$ we can take $\delta = \epsilon/C$ and have
$\|\alpha - \alpha_0\| < \delta \Rightarrow \|T(\alpha) - T(\alpha_0)\| = \|T(\alpha - \alpha_0)\| \le C\|\alpha - \alpha_0\| < C\delta = \epsilon$.
(2) $\Rightarrow$ (1). Trivial. □
In the lemma below we prove that the norm function is a Lipschitz function from $V$ to $\mathbb{R}$.
Lemma 3.1. For all $\alpha, \beta \in V$, $|\,\|\alpha\| - \|\beta\|\,| \le \|\alpha - \beta\|$.
Proof. We have $\|\alpha\| = \|(\alpha - \beta) + \beta\| \le \|\alpha - \beta\| + \|\beta\|$, so that $\|\alpha\| - \|\beta\| \le \|\alpha - \beta\|$. Similarly, $\|\beta\| - \|\alpha\| \le \|\beta - \alpha\| = \|\alpha - \beta\|$. This pair of inequalities is equivalent to the lemma. □
Other Lipschitz mappings will appear when we study mappings with continuous differentials. Roughly speaking, the Lipschitz property lies between continuity and continuous differentiability, and it is frequently the condition that we actually apply under the hypothesis of continuous differentiability.
The smallest bound of a bounded linear transformation $T$ is called its norm. That is,
$\|T\| = $ lub $\{\|T(\alpha)\| / \|\alpha\| : \alpha \ne 0\}$.
For example, let $T: \mathcal{C}([a, b]) \to \mathbb{R}$ be the Riemann integral, $T(f) = \int_a^b f(x)\, dx$. We saw earlier that if we use the uniform norm $\|f\|_\infty$ on $\mathcal{C}([a, b])$, then $T$ is bounded by $b - a$: $|T(f)| \le (b - a)\|f\|_\infty$. On the other hand, there is no smaller bound, because $\int_a^b 1 = b - a = (b - a)\|1\|_\infty$. Thus $\|T\| = b - a$.
Other formulations of the above definition are useful. Since $\|T(\alpha)\| / \|\alpha\| = \|T(\alpha / \|\alpha\|)\|$ by homogeneity, and since $\beta = \alpha / \|\alpha\|$ has norm 1, we have
$\|T\| = $ lub $\{\|T(\beta)\| : \|\beta\| = 1\}$.
Finally, if $\|\gamma\| \le 1$, then $\gamma = x\beta$, where $\|\beta\| = 1$ and $|x| \le 1$, and $\|T(\gamma)\| = |x|\,\|T(\beta)\| \le \|T(\beta)\|$. We therefore have an inefficient but still useful characterization:
$\|T\| = $ lub $\{\|T(\gamma)\| : \|\gamma\| \le 1\}$.
These last two formulations are uniform norms. Thus, if $B_1$ is the closed unit ball $\{\xi : \|\xi\| \le 1\}$, we see that a linear $T$ is bounded if and only if $T \restriction B_1$ is bounded in the old sense, and then $\|T\| = \|T \restriction B_1\|_\infty$.
A linear map $T: V \to W$ is bounded below by $b$ if $\|T(\xi)\| \ge b\|\xi\|$ for all $\xi$ in $V$.
If $T$ has a bounded inverse and $m = \|T^{-1}\|$, then $T$ is bounded below by $1/m$, for $\|T^{-1}(\eta)\| \le m\|\eta\|$ for all $\eta \in W$ if and only if $\|\xi\| \le m\|T(\xi)\|$ for all $\xi \in V$.
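The two-sided estimate $(1/m)\|\xi\| \le \|T(\xi)\| \le \|T\|\,\|\xi\|$ can be spot-checked on a small example. The sketch below rests on assumptions of our own: the $2 \times 2$ matrix, its hand-computed inverse, and the helper names are hypothetical, and we take for granted the row-sum formula for the norm of a matrix under the uniform norm, which is the content of Exercises 3.14 and 3.15 of this section.

```python
import random

def norm_inf(x):
    return max(abs(c) for c in x)

def apply(t, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(tij * xj for tij, xj in zip(row, x)) for row in t]

def max_row_sum(t):
    """max_i sum_j |t_ij|, assumed here to equal ||T|| for the uniform norm."""
    return max(sum(abs(tij) for tij in row) for row in t)

t = [[2.0, 1.0], [1.0, 3.0]]
t_inv = [[0.6, -0.2], [-0.2, 0.4]]   # inverse of t, computed by hand (det t = 5)

norm_t = max_row_sum(t)    # bound of T
m = max_row_sum(t_inv)     # bound of the inverse of T

random.seed(0)
for _ in range(1000):
    xi = [random.uniform(-1, 1), random.uniform(-1, 1)]
    size = norm_inf(xi)
    image = norm_inf(apply(t, xi))
    assert image <= norm_t * size + 1e-12   # T is bounded by ||T||
    assert image >= size / m - 1e-12        # T is bounded below by 1/m

# The upper bound is attained at the vector (1, 1), so no smaller bound works.
print(norm_t, m, norm_inf(apply(t, [1.0, 1.0])))
```

A random search of this kind proves nothing, of course; it only illustrates the inequalities that the text derives.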
If $V$ is finite-dimensional, then it is true, conversely, that if $T$ is bounded below, then it is invertible (why?), but in general this does not follow.
If $V$ and $W$ are normed linear spaces, then Hom($V$, $W$) is defined to be the set of all bounded linear maps $T: V \to W$. The results of Section 2.3 all remain true, but require some additional arguments.
Theorem 3.2. Hom($V$, $W$) is itself a normed linear space if $\|T\|$ is defined as above, as the smallest bound for $T$.
Proof. This follows from the uniform norm discussion of Section 2 by virtue of the identity $\|T\| = \|T \restriction B_1\|_\infty$. □
Theorem 3.3. If $U$, $V$, and $W$ are normed linear spaces, and if $T \in$ Hom($U$, $V$) and $S \in$ Hom($V$, $W$), then $S \circ T \in$ Hom($U$, $W$) and $\|S \circ T\| \le \|S\|\,\|T\|$. It follows that composition on the right by a fixed $T$ is a bounded linear transformation from Hom($V$, $W$) to Hom($U$, $W$), and similarly for composition on the left by a fixed $S$.
Proof.
$\|(S \circ T)(\alpha)\| = \|S(T(\alpha))\| \le \|S\|\,\|T(\alpha)\| \le \|S\|(\|T\|\,\|\alpha\|) = (\|S\| \cdot \|T\|)\|\alpha\|$.
Thus $S \circ T$ is bounded by $\|S\| \cdot \|T\|$ and everything else follows at once. □
As before, the conjugate space $V^*$ is Hom($V$, $\mathbb{R}$), now the space of all bounded linear functionals.

EXERCISES

3.1 Write out the $\epsilon,\delta$-proofs of the following limit theorems.
1) Let $V$ and $W$ be normed linear spaces, and let $F$ and $G$ be mappings from $V$ to $W$. If $\lim_{\xi \to \alpha} F(\xi) = \mu$ and $\lim_{\xi \to \alpha} G(\xi) = \nu$, then $\lim_{\xi \to \alpha} (F + G)(\xi) = \mu + \nu$.
2) Given $F: V \to W$ and $g: V \to \mathbb{R}$, if $F(\xi) \to \mu$ and $g(\xi) \to b$ as $\xi \to \alpha$, then $(gF)(\xi) \to b\mu$ as $\xi \to \alpha$.
3.2 Prove that if $F(\xi) \to \mu$ as $\xi \to \alpha$ and $G(\eta) \to \lambda$ as $\eta \to \mu$, then $G \circ F(\xi) \to \lambda$ as $\xi \to \alpha$. Give a careful, complete statement of the theorem you have proved.
3.3 Suppose that $A$ is an open subset of a nls $V$ and that $\alpha_0 \in A$. Suppose that $F: A \to \mathbb{R}$ is such that $\lim_{\alpha \to \alpha_0} F(\alpha) = b \ne 0$. Prove that $1/F(\alpha) \to 1/b$ as $\alpha \to \alpha_0$ ($\epsilon,\delta$-proof).
3.4 The function $f(x) = |x|^r$ is continuous at $x = 0$ for any positive $r$. Prove that $f$ is not Lipschitz continuous at $x = 0$ if $r < 1$.
Prove, however, that $f$ is Lipschitz continuous at $x = a$ if $a > 0$. (Use the mean-value theorem.)
3.5 Use the mean-value theorem of the calculus and the definition of the derivative to show that if $f$ is a real-valued function on an interval $I$, and if $f'$ exists everywhere, then $f$ is a Lipschitz mapping if and only if $f'$ is a bounded function. Show also that then $\|f'\|_\infty$ is the smallest Lipschitz constant $C$.
3.6 The "working rules" for $\|T\|$ are
1) $\|T(\xi)\| \le \|T\|\,\|\xi\|$ for all $\xi$;
2) $\|T(\xi)\| \le b\|\xi\|$, all $\xi$ $\Rightarrow$ $\|T\| \le b$.
Prove these rules.
3.7 Prove that if we use the one-norm $\|x\|_1 = \sum_1^n |x_i|$ on $\mathbb{R}^n$, then the norm of the linear functional $l_a: x \mapsto \sum_1^n a_i x_i$ is $\|a\|_\infty$.
3.8 Prove similarly that if $\|x\| = \|x\|_\infty$, then $\|l_a\| = \|a\|_1$.
3.9 Use the above exercises to show that if $\|x\|$ on $\mathbb{R}^n$ is the one-norm, then $\|x\| = $ lub $\{|f(x)| : f \in (\mathbb{R}^n)^* \text{ and } \|f\| \le 1\}$.
3.10 Show that if $T$ in Hom($\mathbb{R}^n$, $\mathbb{R}^m$) has matrix $t = \{t_{ij}\}$, and if we use the one-norm $\|x\|_1$ on $\mathbb{R}^n$ and the uniform norm $\|y\|_\infty$ on $\mathbb{R}^m$, then $\|T\| = \|t\|_\infty$.
3.11 Show that the meaning of 'Hom($V$, $W$)' has changed by giving an example of a linear mapping that fails to be bounded. There is one in the text.
3.12 For a fixed $\xi$ in $V$ define the mapping ev$_\xi$: Hom($V$, $W$) $\to W$ by ev$_\xi(T) = T(\xi)$. Prove that ev$_\xi$ is a bounded linear mapping.
3.13 In the above exercise it is in fact true that $\|$ev$_\xi\| = \|\xi\|$, but to prove this we need a new theorem.
Theorem. Given $\xi$ in the normed linear space $V$, there exists a functional $f$ in $V^*$ such that $\|f\| = 1$ and $|f(\xi)| = \|\xi\|$.
Assuming this theorem, prove that $\|$ev$_\xi\| = \|\xi\|$. [Hint: Presumably you have already shown that $\|$ev$_\xi\| \le \|\xi\|$. You now need a $T$ in Hom($V$, $W$) such that $\|T\| = 1$ and $\|T(\xi)\| = \|\xi\|$. Consider a suitable dyad.]
3.14 Let $t = \{t_{ij}\}$ be a square matrix, and define $\|t\|$ as $\max_i (\sum_j |t_{ij}|)$. Prove that this is a norm on the space $\mathbb{R}^{n \times n}$ of all $n \times n$ matrices. Prove that $\|st\| \le \|s\| \cdot \|t\|$. Compute the norm of the identity matrix.
3.15 Let $V$ be the normed linear space $\mathbb{R}^n$ under the uniform norm $\|x\|_\infty = \max \{|x_i|\}$. If $T \in$ Hom $V$, prove that $\|T\|$ is the norm of its matrix $\|t\|$ as defined in the above exercise. That is, show that
$\|T\| = \max_i \left[\sum_{j=1}^n |t_{ij}|\right]$.
(Show first that $\|t\|$ is an upper bound of $T$, and then show that $\|T(x)\| = \|t\|\,\|x\|$ for a specially chosen $x$.) Does part of the previous exercise now become superfluous?
3.16 Assume the following fact: If $f \in \mathcal{C}([0, 1])$ and $\|f\|_1 = a$, then given $\epsilon$, there is a function $g \in \mathcal{C}([0, 1])$ such that $\|g\|_\infty = 1$ and $\int_0^1 fg > a - \epsilon$.
Let $K(s, t)$ be continuous on $[0, 1] \times [0, 1]$ and bounded by $b$. Define $T: \mathcal{C}([0, 1]) \to \mathcal{B}([0, 1])$ by $Th = k$, where
$k(s) = \int_0^1 K(s, t)\,h(t)\, dt$.
If $V$ and $W$ are the normed linear spaces $\mathcal{C}$ and $\mathcal{B}$ under the uniform norms, prove that
$\|T\| = $ lub$_s \int_0^1 |K(s, t)|\, dt$.
[Hint: Proceed as in the above exercise.]
3.17 Let $V$ and $W$ be normed linear spaces, and let $A$ be any subset of $V$ containing more than one point. Let $\mathcal{L}(A, W)$ be the set of all Lipschitz mappings from $A$ to $W$. For $f$ in $\mathcal{L}(A, W)$, let $p(f)$ be the smallest Lipschitz constant for $f$. That is,
$p(f) = $ lub$_{\xi \ne \eta}\ \|f(\xi) - f(\eta)\| / \|\xi - \eta\|$.
Prove that $\mathcal{L}(A, W)$ is a vector space $V$ and that $p$ is a seminorm on $V$.
3.18 Continuing the above exercise, show that if $\alpha$ is any fixed point of $A$, then $p(f) + \|f(\alpha)\|$ is a norm on $V$.
3.19 Let $K$ be a mapping from a subset $A$ of a normed linear space $V$ to $V$ which differs from the identity by a Lipschitz mapping with constant $c$ less than 1. We may as well take $c = \frac{1}{2}$, and then our hypothesis is that
$\|K(\xi) - K(\eta) - (\xi - \eta)\| \le \frac{1}{2}\|\xi - \eta\|$.
Prove that $K$ is injective and that its inverse is a Lipschitz mapping with constant 2.
3.20 Continuing the above exercise, suppose in addition that the domain $A$ of $K$ is an open subset of $V$ and that $K[C]$ is a closed set whenever $C$ is a closed ball lying in $A$. Prove that if $C = C_r(\alpha)$, the closed ball of radius $r$ about $\alpha$, is a subset of $A$, then $K[C]$ includes the ball $B = B_{r/7}(\gamma)$, where $\gamma = K(\alpha)$. This proof is elementary but tricky. If there is a point $\nu$ of $B$ not in $K[C]$, then since $K[C]$ is closed, there is a largest ball $B'$ about $\nu$ disjoint from $K[C]$ and a point $\eta = K(\xi)$ in $K[C]$ as close to $B'$ as we wish. Now if we change $\xi$ by adding $\nu - \eta$, the change in the value of $K$ will approximate $\nu - \eta$ closely enough to force the new value of $K$ to be in $B'$. If we can also show that the new value $\xi + (\nu - \eta)$ is in $C$, then this new value of $K$ is in $K[C]$, and we have our contradiction. Draw a picture. Obviously, the radius $\rho$ of $B'$ is at most $r/7$.
Show that if $\eta = K(\xi)$ is chosen so that $\|\nu - \eta\| \le \frac{3}{2}\rho$, then the above assertions follow from the triangle inequality, and the Lipschitz inequality displayed in Exercise 3.19. You have to prove that
$\|K(\xi + (\nu - \eta)) - \nu\| < \rho$ and $\|(\xi + (\nu - \eta)) - \alpha\| < r$.
3.21 Assume the result of the above exercise and show that $B_{r/8}(\gamma) \subset K[B_r(\alpha)]$.
Show, therefore, that $K[A]$ is an open subset of $V$. State a theorem about the Lipschitz invertibility of $K$, including all the hypotheses on $K$ that were used in the above exercises.

3.22 We shall see in the next chapter that if $V$ and $W$ are finite-dimensional spaces, then any continuous map from $V$ to $W$ takes bounded closed sets into bounded closed sets. Assuming this and the results of the above exercises, prove the following theorem.

Theorem. Let $F$ be a mapping from an open subset $A$ of a finite-dimensional normed linear space $V$ to a finite-dimensional normed linear space $W$. Suppose that there is a $T$ in $\operatorname{Hom}(V, W)$ such that $T^{-1}$ exists and such that $F - T$ is Lipschitz on $A$, with constant $1/(2m)$, where $m = \|T^{-1}\|$. Then $F$ is injective, its range $R = F[A]$ is an open subset of $W$, and its inverse $F^{-1}$ is Lipschitz continuous, with constant $2m$.

4. EQUIVALENT NORMS

Two normed linear spaces $V$ and $W$ are norm isomorphic if there is a bijection $T$ from $V$ to $W$ such that $T \in \operatorname{Hom}(V, W)$ and $T^{-1} \in \operatorname{Hom}(W, V)$. That is, an isomorphism is a linear isomorphism $T$ such that both $T$ and $T^{-1}$ are continuous (bounded). As usual, we regard isomorphic spaces as being essentially the same. For two different norms on the same space we are led to the following definition.

Definition. Two norms $p$ and $q$ on the same vector space $V$ are equivalent if there exist constants $a$ and $b$ such that $p \le aq$ and $q \le bp$.

Then $(1/b)q \le p \le aq$ and $(1/a)p \le q \le bp$, so that two norms are equivalent if and only if either can be bracketed by two multiples of the other. The above definition simply says that the identity map $\xi \mapsto \xi$ from $V$ to $V$, considered as a map from the normed linear space $\langle V, p \rangle$ to the normed linear space $\langle V, q \rangle$, is bounded in both directions, and hence that these two normed linear spaces are isomorphic. If $V$ is infinite-dimensional, two norms will in general not be equivalent.
For example, if $V = \mathcal{C}([0, 1])$ and $f_n(t) = t^n$, then $\|f_n\|_1 = 1/(n + 1)$ and $\|f_n\|_\infty = 1$. Therefore, there is no constant $a$ such that $\|f\|_\infty \le a\|f\|_1$ for all $f \in \mathcal{C}([0, 1])$, and the norms $\|\ \|_\infty$ and $\|\ \|_1$ are not equivalent on $V = \mathcal{C}([a, b])$. This is why the very notion of a normed linear space depends on the assumption of a given norm. However, we have the following theorem, which we shall prove in the next chapter by more sophisticated methods than we are using at present.

Theorem 4.1. On a finite-dimensional vector space $V$ all norms are equivalent.

We shall need this theorem and also the following consequence of it occasionally in the present chapter.

Theorem 4.2. If $V$ and $W$ are finite-dimensional normed linear spaces, then every linear mapping $T$ from $V$ to $W$ is necessarily bounded.
Proof. Because of the above theorem, it is sufficient to prove $T$ bounded with respect to some pair of norms. Let $\theta: \mathbb{R}^n \to V$ and $\varphi: \mathbb{R}^m \to W$ be any basis isomorphisms, and let $\{t_{ij}\}$ be the matrix of $\bar{T} = \varphi^{-1} \circ T \circ \theta$ in $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$. Then
$$\|\bar{T}x\|_\infty = \max_i \Bigl|\sum_j t_{ij} x_j\Bigr| \le \bigl(\max_{i,j} |t_{ij}|\bigr)\Bigl(\sum_j |x_j|\Bigr) = b\|x\|_1,$$
where $b = \max |t_{ij}|$. Now $q(\eta) = \|\varphi^{-1}(\eta)\|_\infty$ and $p(\xi) = \|\theta^{-1}(\xi)\|_1$ are norms on $W$ and $V$ respectively, by Lemma 2.1, and since
$$q(T(\xi)) = \|\bar{T}(\theta^{-1}\xi)\|_\infty \le b\|\theta^{-1}\xi\|_1 = b\,p(\xi),$$
we see that $T$ is bounded by $b$ with respect to the norms $p$ and $q$ on $V$ and $W$. $\square$

If we change to an equivalent norm, we are merely passing through an isomorphism, and all continuous linear properties remain unchanged. For example:

Theorem 4.3. The vector space $\operatorname{Hom}(V, W)$ remains the same if either the domain norm or the range norm is replaced by an equivalent norm, and the two induced norms on $\operatorname{Hom}(V, W)$ are equivalent.

Proof. The proof is left to the reader.

We now ask what kind of a norm we might want on the Cartesian product $V \times W$ of two normed linear spaces. It is natural to try to choose the product norm so that the fundamental mappings relating the product space to the two factor spaces, the two projections $\pi_i$ and the two injections $\theta_i$, should be continuous. It turns out that these requirements determine the product norm uniquely to within equivalence. For if $\|\langle \alpha, \xi \rangle\|$ has these properties, then
$$\|\langle \alpha, \xi \rangle\| = \|\langle \alpha, 0 \rangle + \langle 0, \xi \rangle\| \le \|\langle \alpha, 0 \rangle\| + \|\langle 0, \xi \rangle\| \le k_1\|\alpha\| + k_2\|\xi\| \le k(\|\alpha\| + \|\xi\|),$$
where $k_i$ is a bound of the injection $\theta_i$ and $k$ is the larger of $k_1$ and $k_2$. Also, $\|\alpha\| \le c_1\|\langle \alpha, \xi \rangle\|$ and $\|\xi\| \le c_2\|\langle \alpha, \xi \rangle\|$, by the boundedness of the projections $\pi_i$, and so $\|\alpha\| + \|\xi\| \le c\|\langle \alpha, \xi \rangle\|$, where $c = c_1 + c_2$. Now $\|\alpha\| + \|\xi\|$ is clearly a norm $\|\ \|_1$ on $V \times W$, and our argument above shows that $\|\langle \alpha, \xi \rangle\|$ will satisfy our requirements if and only if it is equivalent to $\|\ \|_1$. Any such norm will be called a product norm for $V \times W$.
The product norms most frequently used are the uniform (product) norm $\|\langle \alpha, \xi \rangle\|_\infty = \max\{\|\alpha\|, \|\xi\|\}$, the Euclidean (product) norm $\|\langle \alpha, \xi \rangle\|_2 = (\|\alpha\|^2 + \|\xi\|^2)^{1/2}$, and the above sum (product) norm $\|\langle \alpha, \xi \rangle\|_1$. We shall leave the verification that the uniform and Euclidean norms actually are norms as exercises. Each of these three product norms can be defined as well for $n$ factor spaces as for two, and we gather the facts for this general case into a theorem.
Theorem 4.4. If $\{\langle V_i, p_i \rangle\}_1^n$ is a finite set of normed linear spaces, then $\|\ \|_1$, $\|\ \|_2$, and $\|\ \|_\infty$, defined on $V = \prod_1^n V_i$ by
$$\|\alpha\|_1 = \sum_1^n p_i(\alpha_i), \qquad \|\alpha\|_2 = \Bigl(\sum_1^n p_i(\alpha_i)^2\Bigr)^{1/2}, \qquad \|\alpha\|_\infty = \max\{p_i(\alpha_i) : i = 1, \ldots, n\},$$
are equivalent norms on $V$, and each is a product norm in the sense that the projections $\pi_i$ and the injections $\theta_i$ are all continuous.

* It looks above as though all we are doing is taking any norm $\|\ \|$ on $\mathbb{R}^n$ and then defining a norm $|||\ |||$ on the product space $V$ by
$$|||\alpha||| = \|\langle p_1(\alpha_1), \ldots, p_n(\alpha_n) \rangle\|.$$
This is almost correct. The interested reader will discover, however, that $\|\ \|$ on $\mathbb{R}^n$ must have the property that if $|x_i| \le |y_i|$ for $i = 1, \ldots, n$, then $\|x\| \le \|y\|$ for the triangle inequality to follow for $|||\ |||$ in $V$. If we call such a norm on $\mathbb{R}^n$ an increasing norm, then the following is true. If $\|\ \|$ is any increasing norm on $\mathbb{R}^n$, then $|||\alpha||| = \|\langle p_1(\alpha_1), \ldots, p_n(\alpha_n) \rangle\|$ is a product norm on $V = \prod_1^n V_i$. However, we shall use only the 1-, 2-, and $\infty$-product norms in this book.*

The triangle inequality, the continuity of addition, and our requirements on a product norm form a set of nearly equivalent conditions. In particular, we make the following observation.

Lemma 4.1. If $V$ is a normed linear space, then the operation of addition is a bounded linear map from $V \times V$ to $V$.

Proof. The triangle inequality for the norm on $V$ says exactly that addition is bounded by 1 when the sum norm is used on $V \times V$. $\square$

A normed linear space $V$ is a (norm) direct sum $\bigoplus_1^n V_i$ if the mapping $\langle x_1, \ldots, x_n \rangle \mapsto \sum_1^n x_i$ is a norm isomorphism from $\prod_1^n V_i$ to $V$.
That is, the given norm on $V$ must be equivalent to the product norm it acquires when it is viewed as $\prod_1^n V_i$. If $V$ is algebraically the direct sum $\bigoplus_1^n V_i$, we always have
$$\Bigl\|\sum_1^n x_i\Bigr\| \le \sum_1^n \|x_i\|$$
by the triangle inequality for the norm on $V$, and the sum on the right is the one-norm for $\prod_1^n V_i$. Therefore, $V$ will be the norm direct sum $\bigoplus_1^n V_i$ if, conversely, there is an $n$-tuple of constants $\{b_i\}$ such that $\|x_i\| \le b_i\|x\|$ for all $x$. This is the same as saying that the projections $P_i: x \mapsto x_i$ are all bounded. Thus,

Theorem 4.5. If $V$ is a normed linear space and $V$ is algebraically the direct sum $V = \bigoplus_1^n V_i$, then $V = \bigoplus_1^n V_i$ as normed linear spaces if and only if the associated projections $\{P_i\}$ are all bounded.
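The equivalence asserted in Theorem 4.4 rests on the chain of inequalities $\|\alpha\|_\infty \le \|\alpha\|_2 \le \|\alpha\|_1 \le n\|\alpha\|_\infty$. As an informal illustration only, the following Python sketch spot-checks this chain on random data. The function names and the representation of a product vector by the list of its factor norms $p_i(\alpha_i)$ are assumptions made for this example, and a finite sample is of course no substitute for the proof.

```python
import math
import random

# A vector alpha of the product space is represented here only by the list
# of its factor norms p_i(alpha_i), which are nonnegative real numbers.

def sum_norm(parts):
    """One-norm: the sum of the factor norms."""
    return sum(parts)

def euclid_norm(parts):
    """Two-norm: the square root of the sum of squared factor norms."""
    return math.sqrt(sum(p * p for p in parts))

def max_norm(parts):
    """Uniform norm: the largest factor norm."""
    return max(parts)

def check_equivalence(parts, tol=1e-12):
    """Check max <= euclid <= sum <= n * max, up to rounding tolerance."""
    n = len(parts)
    m, e, s = max_norm(parts), euclid_norm(parts), sum_norm(parts)
    return m <= e + tol and e <= s + tol and s <= n * m + tol

def spot_check(trials=1000, n=5, seed=0):
    """Run the check on randomly generated lists of factor norms."""
    rng = random.Random(seed)
    return all(
        check_equivalence([rng.uniform(0.0, 10.0) for _ in range(n)])
        for _ in range(trials)
    )
```

Calling `spot_check()` exercises the three norms on a thousand random five-factor vectors. The constants 1, 1, and $n$ in the chain are exactly the bracketing constants required by the definition of equivalent norms.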
EXERCISES

4.1 The fact that $\operatorname{Hom}(V, W)$ is unchanged when norms are replaced by equivalent norms can be viewed as a corollary of Theorem 3.3. Show that this is so.

4.2 Write down a string of quite obvious inequalities showing that the norms $\|\ \|_1$, $\|\ \|_2$, and $\|\ \|_\infty$ on $\mathbb{R}^n$ are equivalent. Discuss what happens as $n \to \infty$.

4.3 Let $V$ be an $n$-dimensional vector space, and consider the collection of all norms on $V$ of the form $p \circ \theta$, where $\theta: V \to \mathbb{R}^n$ is a coordinate isomorphism and $p$ is one of the norms $\|\ \|_1$, $\|\ \|_2$, $\|\ \|_\infty$ on $\mathbb{R}^n$. Show that all of these norms are equivalent. (Use the above exercise and the reasoning in Theorem 4.2.)

4.4 Prove that $\|\langle \alpha, \xi \rangle\| = \max\{\|\alpha\|, \|\xi\|\}$ is a norm on $V \times W$.

4.5 Prove that $\|\langle \alpha, \xi \rangle\| = \|\alpha\| + \|\xi\|$ is a norm on $V \times W$.

4.6 Prove that $\|\langle \alpha, \xi \rangle\| = (\|\alpha\|^2 + \|\xi\|^2)^{1/2}$ is a norm on $V \times W$.

4.7 Assuming Exercises 4.4 through 4.6, prove by induction the corresponding part of Theorem 4.4.

4.8 Prove that if $A$ is an open subset of $V \times W$, then $\pi_1[A]$ is an open subset of $V$.

4.9 Prove $(\varepsilon, \delta)$ that $\langle T, S \rangle \mapsto S \circ T$ is a continuous map from $\operatorname{Hom}(V_1, V_2) \times \operatorname{Hom}(V_2, V_3)$ to $\operatorname{Hom}(V_1, V_3)$, where the $V_i$ are all normed linear spaces.

4.10 Let $\|\ \|$ be any increasing norm on $\mathbb{R}^n$; that is, $\|x\| \le \|y\|$ if $x_i \le y_i$ for all $i$. Let $p_i$ be a norm on the vector space $V_i$ for $i = 1, \ldots, n$. Show that $|||\alpha||| = \|\langle p_1(\alpha_1), \ldots, p_n(\alpha_n) \rangle\|$ is a norm on $V = \prod_1^n V_i$.

4.11 Suppose that $p: V \to \mathbb{R}$ is a nonnegative function such that $p(x\alpha) = |x|\,p(\alpha)$ for all $x$, $\alpha$. This is surely a minimum requirement for any function purporting to be a measure of length of a vector.
a) Define continuity with respect to $p$ and show that Theorem 3.1 is valid.
b) Our next requirement is that addition be continuous as a map from $V \times V$ to $V$, and we decide that continuity at 0 means that for every $\varepsilon$ there is a $\delta$ such that $p(\alpha) < \delta$ and $p(\beta) < \delta \implies p(\alpha + \beta) < \varepsilon$.
Argue again as in Theorem 3.1 to show that there is a constant $c$ such that $p(\alpha + \beta) \le c(p(\alpha) + p(\beta))$ for all $\alpha, \beta \in V$.

4.12 Let $V$ and $W$ be normed linear spaces, and let $f: V \times W \to \mathbb{R}$ be bounded and bilinear. Let $T$ be the corresponding linear map from $V$ to $W^*$. Prove that $T$ is bounded and that $\|T\|$ is the smallest bound to $f$, that is, the smallest $b$ such that $|f(\alpha, \beta)| \le b\|\alpha\|\,\|\beta\|$ for all $\alpha$, $\beta$.

4.13 Let the normed linear space $V$ be a norm direct sum $M \oplus N$. Prove that the subspaces $M$ and $N$ are closed sets in $V$. (The converse theorem is false.)
4.14 Let $N$ be a closed subspace of the normed linear space $V$. If $A$ is a coset $N + \alpha$, define $|||A|||$ as $\operatorname{glb}\{\|\xi\| : \xi \in A\}$. Prove that $|||A|||$ is a norm on the quotient space $V/N$. Prove also that if $\bar{\xi}$ is the coset containing $\xi$, then the mapping $\xi \mapsto \bar{\xi}$ (the natural projection $\pi$ of $V$ onto $V/N$) is bounded by 1.

4.15 Let $V$ and $W$ be normed linear spaces, and let $T$ in $\operatorname{Hom}(V, W)$ have a null space which includes the closed subspace $N$. Prove that the unique linear $S$ from $V/N$ to $W$ defined by $T = S \circ \pi$ (Theorem 4.3 of Chapter 1) is bounded and that $\|S\| = \|T\|$.

4.16 Let $N$ be a closed subspace of a normed linear space $V$, and suppose that $N$ has a finite-dimensional complement $M$ in the purely algebraic sense. Prove that then $V$ is the norm direct sum $M \oplus N$. (Use the above exercise and Theorem 4.2 to prove that if $P$ is the projection of $V$ onto $N$ along $M$, then $P$ is bounded.)

4.17 Let $N_1$ and $N_2$ be closed subspaces of the normed linear space $V$, and suppose that they have the same finite codimension. Prove that $N_1$ and $N_2$ are norm isomorphic. (Assume the results of the above exercise and Exercise 2.11 of Chapter 2.)

4.18 Prove that if $p$ is a seminorm on a vector space $V$, then its null set is a subspace $N$, $p$ is constant on the cosets of $N$, and $p$ factors: $p = q \circ \pi$, where $q$ is a norm on $V/N$ and $\pi$ is the natural projection $\xi \mapsto \bar{\xi}$ of $V$ onto $V/N$. Note that $\xi \mapsto \bar{\xi}$ is thus an isometric surjection from the seminormed space $V$ to the normed space $V/N$. An isometry is a distance-preserving map.

5. INFINITESIMALS

The notion of an infinitesimal was abused in the early literature of the calculus, its treatment generally amounting to logical nonsense, and the term fell into such disrepute that many modern books avoid it completely.
Nevertheless, it is a very useful idea, and we shall base our development of the differential upon the properties of two special classes of infinitesimals which we shall call "big oh" and "little oh" (and designate $\mathcal{O}$ and $o$, respectively).

Originally an infinitesimal was considered to be a number that "is infinitely small but not zero". Of course, there is no such number. Later, an infinitesimal was considered to be a variable that approaches zero as its limit. However, we know that it is functions that have limits, and a variable can be considered to have a limit only if it is somehow considered to be a function. We end up looking at functions $\varphi$ such that $\varphi(t) \to 0$ as $t \to 0$. The definition of derivative involves several such infinitesimals. If $f'(x)$ exists and has the value $a$, then the fundamental difference quotient $(f(x + t) - f(x))/t$ is the quotient of two infinitesimals, and, furthermore, $((f(x + t) - f(x))/t) - a$ also approaches 0 as $t \to 0$. This last function is not defined at 0, but we can get around this if we wish by multiplying through by $t$, obtaining
$$(f(x + t) - f(x)) - at = \varphi(t),$$
where $f(x + t) - f(x)$ is the "change in $f$" infinitesimal, $at$ is a linear infinitesimal, and $\varphi(t)$ is an infinitesimal that approaches 0 faster than $t$ (i.e., $\varphi(t)/t \to 0$ as $t \to 0$). If we divide the last equation by $t$ again, we see that this property of the infinitesimal $\varphi$, that it converges to 0 faster than $t$ as $t \to 0$, is exactly equivalent to the fact that the difference quotient of $f$ converges to $a$. This makes it clear that the study of derivatives is included in the study of the rate at which infinitesimals get small, and the usefulness of this paraphrase will shortly become clear.

Definition. A subset $A$ of a normed linear space $V$ is a neighborhood of a point $\alpha$ if $A$ includes some open ball about $\alpha$. A deleted neighborhood of $\alpha$ is a neighborhood of $\alpha$ minus the point $\alpha$ itself.

We define special sets of functions $\mathcal{I}$, $\mathcal{O}$, and $o$ as follows. It will be assumed in these definitions that each function is from a neighborhood of 0 in a normed linear space $V$ to a normed linear space $W$.

$f \in \mathcal{I}$ if $f(0) = 0$ and $f$ is continuous at 0. These functions are the infinitesimals.

$f \in \mathcal{O}$ if $f(0) = 0$ and $f$ is Lipschitz continuous at 0. That is, there exist positive constants $r$ and $c$ such that $\|f(\xi)\| \le c\|\xi\|$ on $B_r(0)$.

$f \in o$ if $f(0) = 0$ and $\|f(\xi)\|/\|\xi\| \to 0$ as $\xi \to 0$.

When the spaces $V$ and $W$ are not understood, we specify them by writing $\mathcal{O}(V, W)$, etc.

A simple set of functions from $\mathbb{R}$ to $\mathbb{R}$ makes the qualitative difference between these classes apparent. The function $f(x) = |x|^{1/2}$ is in $\mathcal{I}(\mathbb{R}, \mathbb{R})$ but not in $\mathcal{O}$, $g(x) = x$ is in $\mathcal{O}$ and therefore in $\mathcal{I}$ but not in $o$, and $h(x) = x^2$ is in all three classes (Fig. 3.7).

[Fig. 3.7]

It is clear that $\mathcal{I}$, $\mathcal{O}$, and $o$ are unchanged when the norms on $V$ and $W$ are replaced by equivalent norms.

Our previous notion of the sum of two functions does not apply to a pair of functions $f, g \in \mathcal{I}(V, W)$ because their domains may be different. However, $f + g$ is defined on the intersection $\operatorname{dom} f \cap \operatorname{dom} g$, which is still a neighborhood of 0. Moreover, addition remains commutative and associative when extended in this way. It is clear that then $\mathcal{I}(V, W)$ is almost a vector space.
The only trouble occurs in connection with the equation $f + (-f) = 0$; the domain of the function on the left is $\operatorname{dom} f$, whereas we naturally take 0 to be the zero function on the whole of $V$.
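The qualitative difference between the three classes can also be observed numerically on the sample functions $|x|^{1/2}$, $x$, and $x^2$ mentioned earlier in this section. The following Python sketch is an illustration only: the helper name `ratios` and the sample points $x = 10^{-k}$ are choices made for this example, and a finite table of values suggests, but does not prove, the limiting behavior.

```python
import math

def ratios(func, exponents=range(1, 7)):
    """Return |func(x)| / |x| at the sample points x = 10**-k."""
    out = []
    for k in exponents:
        x = 10.0 ** -k
        out.append(abs(func(x)) / x)
    return out

def f(x):
    """Continuous at 0 with f(0) = 0, but not Lipschitz at 0."""
    return math.sqrt(abs(x))

def g(x):
    """Lipschitz at 0 (big oh), but g(x)/x does not tend to 0."""
    return x

def h(x):
    """h(x)/x tends to 0, so h is little oh."""
    return x * x

# ratios(f) grows without bound, ratios(g) stays at 1,
# and ratios(h) shrinks toward 0 as x decreases.
```

The ratio $\|f(\xi)\|/\|\xi\|$ is the single quantity that separates the classes: membership in $\mathcal{O}$ asks that it stay bounded near 0, and membership in $o$ asks that it tend to 0.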
* The way out of this difficulty is to identify two functions $f$ and $g$ in $\mathcal{I}$ if they are the same on some ball about 0. We define $f$ and $g$ to be equivalent ($f \sim g$) if and only if there exists a neighborhood of 0 on which $f = g$. We then check (in our minds) that this is an equivalence relation and that we now do have a vector space. Its elements are called germs of functions at 0. Strictly speaking, a germ is thus an equivalence class of functions, but in practice one tends to think of germs in terms of their representing functions, only keeping in mind that two functions are the same as germs when they agree on a neighborhood of 0.*

As one might guess from our introductory discussion, the algebraic properties of the three classes $\mathcal{I}$, $\mathcal{O}$, and $o$ are crucial for the differential calculus. We gather them together in the following theorem.

Theorem 5.1
1) $o(V, W) \subset \mathcal{O}(V, W) \subset \mathcal{I}(V, W)$, and each of the three classes is closed under addition and multiplication by scalars.
2) If $f \in \mathcal{O}(V, W)$, and if $g \in \mathcal{O}(W, Z)$, then $g \circ f \in \mathcal{O}(V, Z)$, where $\operatorname{dom} g \circ f = f^{-1}[\operatorname{dom} g]$.
3) If either $f$ or $g$ above is in $o$, then so is $g \circ f$.
4) If $f \in \mathcal{O}(V, W)$ and $g \in \mathcal{I}(V, \mathbb{R})$, then $fg \in o(V, W)$, and similarly if $f \in \mathcal{I}$ and $g \in \mathcal{O}$.
5) In (4) if either $f$ or $g$ is in $o$ and the other is merely bounded on a neighborhood of 0, then $fg \in o(V, W)$.
6) $\operatorname{Hom}(V, W) \subset \mathcal{O}(V, W)$.
7) $\operatorname{Hom}(V, W) \cap o(V, W) = \{0\}$.

Proof. Let $\mathcal{L}_\varepsilon(V, W)$ be the set of infinitesimals $f$ such that $\|f(\xi)\| \le \varepsilon\|\xi\|$ on some ball about 0. Then $f \in \mathcal{O}$ if and only if $f$ is in some $\mathcal{L}_\varepsilon$, and $f \in o$ if and only if $f$ is in every $\mathcal{L}_\varepsilon$. Obviously, $o \subset \mathcal{O} \subset \mathcal{I}$.

1) If $\|f(\xi)\| \le a\|\xi\|$ on $B_t(0)$ and $\|g(\xi)\| \le b\|\xi\|$ on $B_u(0)$, then $\|f(\xi) + g(\xi)\| \le (a + b)\|\xi\|$ on $B_r(0)$, where $r = \min\{t, u\}$. Thus $\mathcal{O}$ is closed under addition. The closure of $o$ under addition follows similarly, or simply from the limit of a sum being the sum of the limits.
2) If $\|f(\xi)\| \le a\|\xi\|$ when $\|\xi\| < t$ and $\|g(\eta)\| \le b\|\eta\|$ when $\|\eta\| < u$, then
$$\|g(f(\xi))\| \le b\|f(\xi)\| \le ab\|\xi\|$$
when $\|\xi\| < t$ and $\|f(\xi)\| < u$, and so when $\|\xi\| < r = \min\{t, u/a\}$.
3) Now suppose that $f \in o$ in (2). Then, given $\varepsilon$, we can take $a = \varepsilon/b$ and have $\|g(f(\xi))\| \le \varepsilon\|\xi\|$ when $\|\xi\| < r$. Thus $g \circ f \in o$. The argument when $g \in o$ and $f \in \mathcal{O}$ is essentially the same.

4) Given $\|f(\xi)\| \le c\|\xi\|$ on $B_r(0)$ and given $\varepsilon$, we choose $\delta$ such that $|g(\xi)| \le \varepsilon/c$ on $B_\delta(0)$ and have $\|f(\xi)g(\xi)\| \le \varepsilon\|\xi\|$ when $\|\xi\| < \min(\delta, r)$. The other result follows similarly, as also does (5).

6) A bounded linear transformation is in $\mathcal{O}$ by definition.

7) Suppose that $f \in \operatorname{Hom}(V, W) \cap o(V, W)$. Take any $\alpha \ne 0$. Given $\varepsilon$, choose $r$ so that $\|f(\xi)\| \le \varepsilon\|\xi\|$ on $B_r(0)$. Then write $\alpha$ as $\alpha = x\xi$, where $\|\xi\| < r$. (Find $\xi$ and $x$.) Then
$$\|f(\alpha)\| = \|f(x\xi)\| = |x| \cdot \|f(\xi)\| \le |x| \cdot \varepsilon \cdot \|\xi\| = \varepsilon\|\alpha\|.$$
Thus $\|f(\alpha)\| \le \varepsilon\|\alpha\|$ for every positive $\varepsilon$, and so $f(\alpha) = 0$. Thus $f = 0$, proving (7). $\square$

Remark. The additivity of $f$ was not used in this argument, only its homogeneity. It follows therefore that there is no homogeneous function (of degree 1) in $o$ except 0.

Sometimes when more than one variable is present it is necessary to indicate with respect to which variable a function is in $\mathcal{O}$ or $o$. We then write "$f(\xi) = \mathcal{O}(\xi)$" for "$f \in \mathcal{O}$", where "$\mathcal{O}(\xi)$" is used to designate an arbitrary element of $\mathcal{O}$.

The following rather curious lemma will be useful later in our proof of the differentiability of an implicitly defined function. It is understood that $\eta = f(\xi)$, where $f$ is the function we are studying.

Lemma 5.1. If $\eta = \mathcal{O}(\xi) + o(\langle \xi, \eta \rangle)$ and also $\eta = \mathcal{I}(\xi)$, then $\eta = \mathcal{O}(\xi)$.

Proof. The hypotheses imply that there are numbers $b$, $r_1$, and $\rho$ such that $\|\eta\| \le b\|\xi\| + \tfrac12(\|\xi\| + \|\eta\|)$ if $\|\xi\| < r_1$ and $\|\xi\| + \|\eta\| < \rho$, and then that $\|\eta\| \le \rho/2$ if $\|\xi\|$ is smaller than some $r_2$. If $\|\xi\| < r = \min\{r_1, r_2, \rho/2\}$, then all the conditions are met and $\|\eta\| \le b\|\xi\| + \tfrac12(\|\xi\| + \|\eta\|)$. But this is the inequality $\|\eta\| \le (2b + 1)\|\xi\|$, and so $\eta = \mathcal{O}(\xi)$. $\square$

We shall also need the following straightforward result.

Lemma 5.2. If $f \in \mathcal{O}(V, X)$ and $g \in \mathcal{O}(V, Y)$, then $\langle f, g \rangle \in \mathcal{O}(V, X \times Y)$. That is, $\langle \mathcal{O}(\xi), \mathcal{O}(\xi) \rangle = \mathcal{O}(\xi)$.

Proof.
The proof is left to the reader.
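Parts (2) and (3) of Theorem 5.1 can be illustrated on a concrete pair of real functions. In the Python sketch below, the functions `f` and `g` are hypothetical examples chosen for this illustration (they do not come from the text): `f` is big oh with constant 4 on the unit ball, and `g` is little oh. The factorization used in the proof, $\|g(f(\xi))\|/\|\xi\| = (\|g(f(\xi))\|/\|f(\xi)\|)(\|f(\xi)\|/\|\xi\|)$, then forces the composite ratio toward 0.

```python
import math

def f(x):
    """Big oh: |f(x)| <= 4|x| whenever |x| <= 1."""
    return 3.0 * x + x * x

def g(y):
    """Little oh: |g(y)| / |y| = |y|**0.5, which tends to 0 with y."""
    return y * math.sqrt(abs(y))

def ratio(func, x):
    """The quotient |func(x)| / |x| for a nonzero real x."""
    return abs(func(x)) / abs(x)

def composite(x):
    """The composition g o f."""
    return g(f(x))

def composite_ratios(exponents=range(1, 9)):
    """Ratios |g(f(x))| / |x| at the sample points x = 10**-k."""
    return [ratio(composite, 10.0 ** -k) for k in exponents]
```

The values returned by `composite_ratios()` decrease steadily, consistent with $g \circ f \in o$. Each value is also bounded by 4 times `ratio(g, f(x))`, which is the estimate used in the proof with $a = 4$.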
EXERCISES

5.1 Prove in detail that the class $\mathcal{I}(V, W)$ is unchanged if the norms on $V$ and $W$ are replaced by equivalent norms.

5.2 Do the same for $\mathcal{O}$ and $o$.

5.3 Prove (5) of the $\mathcal{O}o$-theorem (Theorem 5.1).

5.4 Prove also that if in (4) either $f$ or $g$ is in $\mathcal{O}$ and the other is merely bounded on a neighborhood of 0, then $fg \in \mathcal{O}(V, W)$.

5.5 Prove Lemma 5.2. (Remember that $F = \langle F_1, F_2 \rangle$ is loose language for $F = \theta_1 \circ F_1 + \theta_2 \circ F_2$.) State the generalization to $n$ functions. State the $o$-form of the theorem.

5.6 Given $F_1 \in \mathcal{O}(V_1, W)$ and $F_2 \in \mathcal{O}(V_2, W)$, define $F$ from (a subset of) $V = V_1 \times V_2$ to $W$ by $F(\alpha_1, \alpha_2) = F_1(\alpha_1) + F_2(\alpha_2)$. Prove that $F \in \mathcal{O}(V, W)$. (First state the defining equation as an identity involving the projections $\pi_1$ and $\pi_2$ and not involving explicit mention of the domain vectors $\alpha_1$ and $\alpha_2$.)

5.7 Given $F_1 \in \mathcal{O}(V_1, W)$ and $F_2 \in \mathcal{O}(V_2, \mathbb{R})$, define precisely what you mean by $F_1F_2$ and show that it is in $\mathcal{O}(V_1 \times V_2, W)$.

5.8 Define the class $\mathcal{O}^n$ as follows: $f \in \mathcal{O}^n$ if $f \in \mathcal{I}$ and $\|f(\xi)\|/\|\xi\|^n$ is bounded in some deleted ball about 0. (A deleted neighborhood of $\alpha$ is a neighborhood minus $\alpha$.) State and prove a theorem about $f + g$ when $f \in \mathcal{O}^n$ and $g \in \mathcal{O}^m$.

5.9 State and prove a theorem about $f \circ g$ when $f \in \mathcal{O}^n$ and $g \in \mathcal{O}^m$.

5.10 State and prove a theorem about $fg$ when $f \in \mathcal{O}^n$ and $g \in \mathcal{O}^m$.

5.11 Define a similar class $o^n$. State and prove a theorem about $f \circ g$ when $f \in \mathcal{O}^n$ and $g \in o^m$.

6. THE DIFFERENTIAL

Before considering the notion of the differential, we shall review some geometric material from the elementary calculus. We do this for motivation only; our subsequent theory is independent of the preliminary discussion.

In the elementary one-variable calculus the derivative $f'(a)$ of a function $f$ at the point $a$ has geometric meaning as the slope of the tangent line to the graph of $f$ at the point $a$. (Of course, according to our notion of a function, the graph of $f$ is $f$.)
The tangent line thus has the (point-slope) equation $y - f(a) = f'(a)(x - a)$, and is the graph of the affine map $x \mapsto f'(a)(x - a) + f(a)$.

We ordinarily examine the nature of the curve $f$ near the point $\langle a, f(a) \rangle$ by using new variables which are zero at this point. That is, we express everything in terms of $s = y - f(a)$ and $t = x - a$. This change of variables is simply the translation $\langle x, y \rangle \mapsto \langle t, s \rangle = \langle x - a, y - f(a) \rangle$ in the Cartesian plane $\mathbb{R}^2$ which brings the point of interest $\langle a, f(a) \rangle$ to the origin. If we picture the situation in a Euclidean plane, of which the next page is a satisfactory local model, then this translation in $\mathbb{R}^2$ is represented by a choice of new axes, the $t$- and $s$-axes, with origin at the point of tangency. Since $y = f(x)$
if and only if $s = f(a + t) - f(a)$, we see that the image of $f$ under this translation is the function $\Delta f_a$ defined by $\Delta f_a(t) = f(a + t) - f(a)$. (See Fig. 3.8.) Of course, $\Delta f_a$ is simply our old friend the change in $f$ brought about by changing $x$ from $a$ to $a + t$.

[Fig. 3.8. Labels: $\Delta f_a(t)$, $t f'(a) = df_a(t)$, $df_a(t) - \Delta f_a(t) = o(t)$, $f(a + t)$, $a + t$.]

Similarly, the equation $y - f(a) = f'(a)(x - a)$ becomes $s = f'(a)t$, and the tangent line accordingly translates to the line that is (the graph of) the linear functional $l: t \mapsto f'(a)t$ having the number $f'(a)$ as its skeleton (matrix). Remember that from the point of view of the geometric configuration (curve and tangent line) in the Euclidean plane, all that we are doing is choosing the natural axis system, with origin at the point of tangency. Then the curve is (the graph of) the function $\Delta f_a$, and the tangent line is (the graph of) the linear map $l$.

Now it follows from the definition of $f'(a)$ that $l$ can also be characterized as the linear function that approximates $\Delta f_a$ most closely. For, by definition,
$$\frac{\Delta f_a(t)}{t} \to f'(a) \quad\text{as } t \to 0,$$
and this is exactly the same as saying that
$$\frac{\Delta f_a(t) - l(t)}{t} \to 0, \quad\text{or}\quad \Delta f_a - l \in o.$$
But we know from the $\mathcal{O}o$-theorem that the expression of the function $\Delta f_a$ as the sum $l + o$ is unique. This unique linear approximation $l$ is called the differential of $f$ at $a$ and is designated $df_a$. Again, the differential of $f$ at $a$ is the linear function $l: \mathbb{R} \to \mathbb{R}$ that approximates the actual change in $f$, $\Delta f_a$, in the sense that $\Delta f_a - l \in o$; we saw above that if the derivative $f'(a)$ exists, then the differential of $f$ at $a$ exists and has $f'(a)$ as its skeleton ($1 \times 1$ matrix).

Similarly, if $f$ is a function of two variables, then (the graph of) $f$ is a surface in Cartesian 3-space $\mathbb{R}^3 = \mathbb{R}^2 \times \mathbb{R}$, and the tangent plane to this surface at $\langle a, b, f(a, b) \rangle$ has the equation
$$z - f(a, b) = f_1(a, b)(x - a) + f_2(a, b)(y - b),$$
where $f_1 = \partial f/\partial x$ and $f_2 = \partial f/\partial y$. If, as above, we set
$$\Delta f_{\langle a, b \rangle}(s, t) = f(a + s, b + t) - f(a, b)$$
and $l(s, t) = s f_1(a, b) + t f_2(a, b)$, then $\Delta f_{\langle a, b \rangle}$ is the change in $f$ around $\langle a, b \rangle$ and $l$ is the linear functional on $\mathbb{R}^2$ with matrix (skeleton) $\langle f_1(a, b), f_2(a, b) \rangle$. Moreover, it is a theorem of the standard calculus that if the partial derivatives of $f$ are continuous, then again $l$ approximates $\Delta f_{\langle a, b \rangle}$, with error in $o$. Here also $l$ is called the differential of $f$ at $\langle a, b \rangle$ and is designated $df_{\langle a, b \rangle}$ (Fig. 3.9). The notation in the figure has been changed to show the value at $t = \langle t_1, t_2 \rangle$ of the differential $df_a$ of $f$ at $a = \langle a_1, a_2 \rangle$.

[Fig. 3.9]

The following definition should now be clear. As above, the local function $\Delta F_\alpha$ is defined by $\Delta F_\alpha(\xi) = F(\alpha + \xi) - F(\alpha)$.

Definition. Let $V$ and $W$ be normed linear spaces, and let $A$ be a neighborhood of $\alpha$ in $V$. A mapping $F: A \to W$ is differentiable at $\alpha$ if there is a $T$ in $\operatorname{Hom}(V, W)$ such that $\Delta F_\alpha(\xi) = T(\xi) + o(\xi)$.

The $\mathcal{O}o$-theorem implies then that $T$ is uniquely determined, for if also $\Delta F_\alpha = S + o$, then $T - S \in o$, and so $T - S = 0$ by (7) of the theorem. This uniquely determined $T$ is called the differential of $F$ at $\alpha$ and is designated $dF_\alpha$. Thus
$$\Delta F_\alpha = dF_\alpha + o,$$
where $dF_\alpha$ is the unique (bounded) linear approximation to $\Delta F_\alpha$.
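The defining condition $\Delta F_\alpha = dF_\alpha + o$ can be examined numerically in a simple case. The Python sketch below is an informal check, not part of the theory: the map $F(x, y) = \langle xy, x + y^2 \rangle$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ is a hypothetical example chosen for this illustration, the candidate differential is the linear map whose matrix consists of its partial derivatives, and the maximum norm is used on both spaces.

```python
def F(x, y):
    """Hypothetical sample map from R^2 to R^2 (an assumption for this sketch)."""
    return (x * y, x + y * y)

def delta_F(a, b, s, t):
    """The local function: F(<a,b> + <s,t>) - F(<a,b>)."""
    p = F(a + s, b + t)
    q = F(a, b)
    return (p[0] - q[0], p[1] - q[1])

def dF(a, b, s, t):
    """Candidate differential at <a,b>: the linear map with matrix [[b, a], [1, 2b]]."""
    return (b * s + a * t, s + 2.0 * b * t)

def max_norm(v):
    """Uniform norm on R^2."""
    return max(abs(c) for c in v)

def error_ratio(a, b, s, t):
    """||delta_F(xi) - dF(xi)|| / ||xi|| for the increment xi = <s,t>."""
    d = delta_F(a, b, s, t)
    l = dF(a, b, s, t)
    return max_norm((d[0] - l[0], d[1] - l[1])) / max_norm((s, t))
```

For this particular $F$ the remainder is exactly $\langle st, t^2 \rangle$, so `error_ratio(a, b, h, h)` is approximately $h$ (up to rounding) and shrinks with the increment, which is what membership of the remainder in $o$ requires.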
* Our preliminary discussion should make it clear that this definition of the differential agrees with standard usage when the domain space is $\mathbb{R}^n$. However, in certain cases when the domain space is an infinite-dimensional function space, $dF_\alpha$ is called the first variation of $F$ at $\alpha$. This is due to the fact that although the early writers on the calculus of variations saw its analogy with the differential calculus, they did not realize that it was the same subject.*

We gather together in the next two theorems the familiar rules for differentiation. They follow immediately from the definition and the $\mathcal{O}o$-theorem.

It will be convenient to use the notation $\mathcal{D}_\alpha(V, W)$ for the set of all mappings from neighborhoods of $\alpha$ in $V$ to $W$ that are differentiable at $\alpha$.

Theorem 6.1
1) If $F \in \mathcal{D}_\alpha(V, W)$, then $\Delta F_\alpha \in \mathcal{O}(V, W)$.
2) If $F, G \in \mathcal{D}_\alpha(V, W)$, then $F + G \in \mathcal{D}_\alpha(V, W)$ and $d(F + G)_\alpha = dF_\alpha + dG_\alpha$.
3) If $F \in \mathcal{D}_\alpha(V, \mathbb{R})$ and $G \in \mathcal{D}_\alpha(V, W)$, then $FG \in \mathcal{D}_\alpha(V, W)$ and $d(FG)_\alpha = F(\alpha)\, dG_\alpha + dF_\alpha\, G(\alpha)$, the second term being a dyad.
4) If $F$ is a constant function on $V$, then $F$ is differentiable and $dF_\alpha = 0$.
5) If $F \in \operatorname{Hom}(V, W)$, then $F$ is differentiable at every $\alpha \in V$ and $dF_\alpha = F$.

Proof
1) $\Delta F_\alpha = dF_\alpha + o = \mathcal{O} + o = \mathcal{O}$ by (1) and (6) of the $\mathcal{O}o$-theorem.
2) It is clear that $\Delta(F + G)_\alpha = \Delta F_\alpha + \Delta G_\alpha$. Therefore, $\Delta(F + G)_\alpha = (dF_\alpha + o) + (dG_\alpha + o) = (dF_\alpha + dG_\alpha) + o$ by (1) of the $\mathcal{O}o$-theorem. Since $dF_\alpha + dG_\alpha \in \operatorname{Hom}(V, W)$, we have (2).
3) $\Delta(FG)_\alpha(\xi) = F(\alpha + \xi)G(\alpha + \xi) - F(\alpha)G(\alpha) = \Delta F_\alpha(\xi)G(\alpha) + F(\alpha)\Delta G_\alpha(\xi) + \Delta F_\alpha(\xi)\Delta G_\alpha(\xi)$, as the reader will see upon expanding and canceling. This is just the usual device of adding and subtracting middle terms in order to arrive at the form involving the $\Delta$'s. Thus $\Delta(FG)_\alpha = (dF_\alpha + o)G(\alpha) + F(\alpha)(dG_\alpha + o) + \mathcal{O}\mathcal{O} = dF_\alpha\, G(\alpha) + F(\alpha)\, dG_\alpha + o$ by the $\mathcal{O}o$-theorem.
4) If $\Delta F_\alpha = 0$, then $dF_\alpha = 0$ by (7) of the $\mathcal{O}o$-theorem.
5) $\Delta F_\alpha(\xi) = F(\alpha + \xi) - F(\alpha) = F(\xi)$. Thus $\Delta F_\alpha = F \in \operatorname{Hom}(V, W)$. $\square$

The composite-function rule is somewhat more complicated.

Theorem 6.2.
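If $F \in \mathcal{D}_\alpha(V, W)$ and $G \in \mathcal{D}_{F(\alpha)}(W, Z)$, then $G \circ F \in \mathcal{D}_\alpha(V, Z)$ and $d(G \circ F)_\alpha = dG_{F(\alpha)} \circ dF_\alpha$.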
Proof. We have
$$\begin{aligned}
\Delta(G \circ F)_\alpha(\xi) &= G(F(\alpha + \xi)) - G(F(\alpha)) \\
&= G(F(\alpha) + \Delta F_\alpha(\xi)) - G(F(\alpha)) \\
&= \Delta G_{F(\alpha)}(\Delta F_\alpha(\xi)) \\
&= dG_{F(\alpha)}(\Delta F_\alpha(\xi)) + o(\Delta F_\alpha(\xi)) \\
&= dG_{F(\alpha)}(dF_\alpha(\xi)) + dG_{F(\alpha)}(o(\xi)) + o \circ \mathcal{O} \\
&= (dG_{F(\alpha)} \circ dF_\alpha)(\xi) + \mathcal{O} \circ o + o \circ \mathcal{O}.
\end{aligned}$$
Thus $\Delta(G \circ F)_\alpha = dG_{F(\alpha)} \circ dF_\alpha + o$, and since $dG_{F(\alpha)} \circ dF_\alpha \in \operatorname{Hom}(V, Z)$, this proves the theorem. The reader should be able to justify each step taken in this chain of equalities. $\square$

EXERCISES

6.1 The coordinate mapping $\langle x, y \rangle \mapsto x$ from $\mathbb{R}^2$ to $\mathbb{R}$ is differentiable. Why? What is its differential?

6.2 Prove that differentiation commutes with the application of bounded linear maps. That is, show that if $F: V \to W$ is differentiable at $\alpha$ and if $T \in \operatorname{Hom}(W, X)$, then $T \circ F$ is differentiable at $\alpha$ and $d(T \circ F)_\alpha = T \circ dF_\alpha$.

6.3 Prove that $F \in \mathcal{D}_\alpha(V, \mathbb{R})$ and $F(\alpha) \ne 0 \implies G = 1/F \in \mathcal{D}_\alpha(V, \mathbb{R})$ and
$$dG_\alpha = -\frac{dF_\alpha}{(F(\alpha))^2}.$$

6.4 Let $F: V \to \mathbb{R}$ be differentiable at $\alpha$, and let $f: \mathbb{R} \to \mathbb{R}$ be a function whose derivative exists at $a = F(\alpha)$. Prove that $f \circ F$ is differentiable at $\alpha$ and that
$$d(f \circ F)_\alpha = f'(a)\, dF_\alpha.$$
[Remember that the differential of $f$ at $a$ is simply multiplication by its derivative: $df_a(h) = hf'(a)$.] Show that the preceding problem is a special case.

6.5 Let $V$ and $W$ be normed linear spaces, and let $F: V \to W$ and $G: W \to V$ be continuous maps such that $G \circ F = I_V$ and $F \circ G = I_W$. Suppose that $F$ is differentiable at $\alpha$ and that $G$ is differentiable at $\beta = F(\alpha)$. Prove that $dG_\beta = (dF_\alpha)^{-1}$.

6.6 Let $f: V \to \mathbb{R}$ be differentiable at $\alpha$. Show that $g = f^n$ is differentiable at $\alpha$ and that $dg_\alpha = n f^{n-1}(\alpha)\, df_\alpha$. (Prove this both by an induction on the product rule and by the composite-function rule, assuming in the second case that $D_x x^n = nx^{n-1}$.)
6.7 Prove from the product rule by induction that if the $n$ functions $f_i: V \to \mathbb{R}$, $i = 1, \ldots, n$, are all differentiable at $\alpha$, then so is $f = \prod_1^n f_i$, and that
$$df_\alpha = \sum_{i=1}^n \Bigl(\prod_{j \ne i} f_j(\alpha)\Bigr)\, d(f_i)_\alpha.$$

6.8 A monomial of degree $n$ on the normed linear space $V$ is a product $\prod_1^n l_i$ of linear functionals ($l_i \in V^*$). A homogeneous polynomial of degree $n$ is a finite sum of monomials of degree $n$. A polynomial of degree $n$ is a sum of homogeneous polynomials $P_i$, $i = 0, \ldots, n$, where $P_0$ is a constant. Show from the above exercise and other known facts that a polynomial is differentiable everywhere.

6.9 Show that if $F_1: V \to W_1$ and $F_2: V \to W_2$ are both differentiable at $\alpha$, then so is $F = \langle F_1, F_2 \rangle$ from $V$ to $W = W_1 \times W_2$ (use the injections $\theta_1$ and $\theta_2$).

6.10 Show without using explicit computations, but using the results of earlier exercises instead, that the mapping $F: \mathbb{R}^2 \to \mathbb{R}^2$ defined by $\langle x, y \rangle \mapsto \langle (x - y)^2, (x + y)^3 \rangle$ is everywhere differentiable. Now compute its differential at $\langle a, b \rangle$.

6.11 Let $F: V \to X$ and $G: W \to X$ be differentiable at $\alpha$ and $\beta$ respectively, and define $K: V \times W \to X$ by $K(\xi, \eta) = F(\xi) + G(\eta)$. Show that $K$ is differentiable at $\langle \alpha, \beta \rangle$
a) by a direct $\Delta$-calculation;
b) by using the projections $\pi_1$ and $\pi_2$ to express $K$ in terms of $F$ and $G$ without explicit reference to the variable, and then applying the differentiation rules.

6.12 Now suppose given $F: V \to \mathbb{R}$ and $G: W \to X$, and define $K$ by $K(\xi, \eta) = F(\xi)G(\eta)$. Show that if $F$ and $G$ are differentiable at $\alpha$ and $\beta$ respectively, then $K$ is differentiable at $\langle \alpha, \beta \rangle$ in the manner of (b) in the above exercise.

6.13 Let $V$ and $W$ be normed linear spaces. Prove that the map $\langle \alpha, \beta \rangle \mapsto \|\alpha\|\,\|\beta\|$ from $V \times W$ to $\mathbb{R}$ is in $o(V \times W, \mathbb{R})$. Use the maximum norm on the product space. Let $f: V \times W \to \mathbb{R}$ be bounded and bilinear. Here boundedness means that there is some $b$ such that $|f(\alpha, \beta)| \le b\|\alpha\|\,\|\beta\|$ for all $\alpha$, $\beta$. Prove that $f$ is differentiable everywhere and find its differential.

6.14 Let $f$ and $g$ be differentiable functions from $\mathbb{R}$ to $\mathbb{R}$.
We know from the composite-function rule of the ordinary calculus that
$$(f \circ g)'(a) = f'(g(a))\, g'(a).$$
Our composite-function rule says that $d(f \circ g)_a = df_{g(a)} \circ dg_a$, where $df_x$ is the linear mapping $t \mapsto f'(x)t$. Show that these two statements are equivalent.
6.15 Prove that $f(x, y) = \|\langle x, y \rangle\|_1 = |x| + |y|$ is differentiable except on the coordinate axes (that is, $df_{\langle a, b \rangle}$ exists if $a$ and $b$ are both nonzero).

6.16 Comparing the shapes of the unit balls for $\|\ \|_1$ and $\|\ \|_\infty$ on $\mathbb{R}^2$, guess from the above the theorem about the differentiability of $\|\ \|_\infty$. Prove it.

6.17 Let $V$ and $W$ be fixed normed linear spaces, let $X_d$ be the set of all maps from $V$ to $W$ that are differentiable at 0, let $X_o$ be the set of all maps from $V$ to $W$ that belong to $o(V, W)$, and let $X_l$ be $\operatorname{Hom}(V, W)$. Prove that $X_d$ and $X_o$ are vector spaces and that $X_d = X_o \oplus X_l$.

6.18 Let $F$ be a Lipschitz function with constant $C$ which is differentiable at a point $\alpha$. Prove that $\|dF_\alpha\| \le C$.

7. DIRECTIONAL DERIVATIVES; THE MEAN-VALUE THEOREM

Directional derivatives form the connecting link between differentials and the derivatives of the elementary calculus, and, although they add one more concept that has to be fitted into the scheme of things, the reader should find them intuitively satisfying and technically useful.

A continuous function $f$ from an interval $I \subset \mathbb{R}$ to a normed linear space $W$ can have a derivative $f'(x)$ at a point $x \in I$ in exactly the sense of the elementary calculus:
$$f'(x) = \lim_{t \to 0} \frac{f(x + t) - f(x)}{t}.$$
The range of such a function $f$ is a curve or arc in $W$, and it is conventional to call $f$ itself a parametrized arc when we want to keep this geometric notion in mind. We shall also call $f'(x)$, if it exists, the tangent vector to the arc $f$ at $x$. This terminology fits our geometric intuition, as Fig. 3.10 suggests. For simplicity we have set $x = 0$ and $f(x) = 0$. If $f'(x)$ exists, we say that the parametrized arc $f$ is smooth at $x$. We also say that $f$ is smooth at $\alpha = f(x)$, but this terminology is ambiguous if $f$ is not injective (i.e., if the arc crosses itself). An arc is smooth if it is smooth at every value of the parameter.
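The tangent vector of a parametrized arc is defined by the same difference quotient as in the elementary calculus, now taken in the range space $W$. As an informal illustration, the Python sketch below uses the unit circle $t \mapsto \langle \cos t, \sin t \rangle$ in $\mathbb{R}^2$, a sample arc chosen for this example rather than taken from the text, and compares the difference quotient with the known tangent vector $\langle -\sin x, \cos x \rangle$.

```python
import math

def arc(t):
    """Sample parametrized arc in R^2: the unit circle (an assumption for this sketch)."""
    return (math.cos(t), math.sin(t))

def difference_quotient(f, x, t):
    """The vector (f(x + t) - f(x)) / t, computed coordinate by coordinate."""
    fx = f(x)
    fxt = f(x + t)
    return tuple((b - a) / t for a, b in zip(fx, fxt))

def tangent_error(x, t):
    """Uniform-norm distance between the difference quotient and <-sin x, cos x>."""
    q = difference_quotient(arc, x, t)
    exact = (-math.sin(x), math.cos(x))
    return max(abs(qi - ei) for qi, ei in zip(q, exact))
```

For a fixed parameter value such as $x = 0.7$, the values `tangent_error(0.7, 10.0 ** -k)` shrink roughly in proportion to $t$ for moderate $k$, in line with the limit that defines $f'(x)$. For very small $t$ floating-point cancellation eventually dominates, so the sketch should only be read for moderate step sizes.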
[Fig. 3.10: the arc $f$ and the difference quotient $\dfrac{f(0 + t) - f(0)}{t}$.]

We naturally wonder about the relationship between the existence of the tangent vector $f'(x)$ and the differentiability of $f$ at $x$. If $df_x$ exists, then, being a linear map on $\mathbb{R}$, it is simply multiplication "by" the fixed vector $\alpha$ that is its skeleton, $df_x(h) = h\,df_x(1) = h\alpha$, and we expect $\alpha$ to be the tangent vector
$f'(x)$. We showed this and also the converse result for the ordinary calculus in our preliminary discussion in Section 6. Actually, our argument was valid for vector-valued functions, but we shall repeat it anyway. When we think of a vector-valued function of a real variable as being an arc, we often use Greek letters like '$\lambda$' and '$\gamma$' for the function, as we do below. This of course does not in any way change what is being proved, but is slightly suggestive of a geometric interpretation.

Theorem 7.1. A parametrized arc $\gamma: [a, b] \to V$ is differentiable at $x \in (a, b)$ if and only if the tangent vector (derivative) $\alpha = \gamma'(x)$ exists, in which case the tangent vector is the skeleton of the differential, $d\gamma_x(h) = h\gamma'(x) = h\alpha$.

Proof. If the parametrized arc $\gamma: [a, b] \to V$ is differentiable at $x \in (a, b)$, then $d\gamma_x(h) = h\,d\gamma_x(1) = h\alpha$, where $\alpha = d\gamma_x(1)$. Since $\Delta\gamma_x - d\gamma_x \in \mathfrak{o}$, this gives $\|\Delta\gamma_x(h) - h\alpha\|/|h| \to 0$, and so $\Delta\gamma_x(h)/h \to \alpha$ as $h \to 0$. Thus $\alpha$ is the derivative $\gamma'(x)$ in the ordinary sense. By reversing the above steps we see that the existence of $\gamma'(x)$ implies the differentiability of $\gamma$ at $x$. □

Now let $F$ be a function from an open set $A$ in a normed linear space $V$ to a normed linear space $W$. One way to study the behavior of $F$ in the neighborhood of a point $\alpha$ in $A$ is to consider how it behaves on each straight line through $\alpha$. That is, we study $F$ by temporarily restricting it to a one-dimensional domain. The advantage gained in doing this is that the restricted $F$ is then simply a parametrized arc, and its differential is simply multiplication by its ordinary derivative.

For any nonzero $\xi \in V$ the straight line through $\alpha$ in the direction $\xi$ has the parametric representation $t \mapsto \alpha + t\xi$. The restriction of $F$ to this line is the parametrized arc $\gamma$: $\gamma(t) = F(\alpha + t\xi)$.
Its tangent vector (derivative) at the origin $t = 0$, if it exists, is called the derivative of $F$ in the direction $\xi$ at $\alpha$, or the derivative of $F$ with respect to $\xi$ at $\alpha$, and is designated $D_\xi F(\alpha)$. Clearly,

$$D_\xi F(\alpha) = \lim_{t \to 0} \frac{F(\alpha + t\xi) - F(\alpha)}{t}.$$

Comparing this with our original definition of $f'$, we see that the tangent vector $\gamma'(x)$ to a parametrized arc $\gamma$ is the directional derivative $D_1\gamma(x)$ with respect to the standard basis vector 1 in $\mathbb{R}$.

Strictly speaking, we are misusing the word "direction", because different vectors can have the same direction. Thus, if $\eta = c\xi$ with $c > 0$, then $\eta$ and $\xi$ point in the same direction, but, because $D_\xi F(\alpha)$ is linear in $\xi$ (as we shall see in a moment), their associated derivatives are different: $D_\eta F(\alpha) = cD_\xi F(\alpha)$.

We now want to establish the relationship between directional derivatives, which are vectors, and differentials, which are linear maps. We saw above that for an arc $\gamma$ differentiability is equivalent to the existence of $\gamma'(x) = D_1\gamma(x)$. In the general case the relationship is not as simple as it is for arcs, but in one direction everything goes smoothly.
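The limit defining $D_\xi F(\alpha)$, and the scaling relation $D_{c\xi}F(\alpha) = cD_\xi F(\alpha)$, can be checked on a concrete map. What follows is a rough numerical sketch only: the map `F`, the point `a`, the vector `xi`, the scalar `c` and the step `t` are hypothetical values picked for illustration, and a forward difference quotient stands in for the limit.

```python
def F(p):
    """Hypothetical example map F: R^2 -> R^2, chosen only for illustration."""
    x, y = p
    return (x * x * y, x + y ** 3)

def directional_derivative(F, a, xi, t=1e-6):
    """Approximate D_xi F(a) by the quotient (F(a + t*xi) - F(a)) / t."""
    base = F(a)
    moved = F(tuple(ai + t * s for ai, s in zip(a, xi)))
    return tuple((m - b) / t for m, b in zip(moved, base))

a = (1.0, 2.0)
xi = (3.0, -1.0)
c = 2.5

d_xi = directional_derivative(F, a, xi)
d_eta = directional_derivative(F, a, tuple(c * s for s in xi))

# For this F the partial derivatives at a give D_xi F(a) = (11, -9) exactly,
# so d_xi should be close to that, and d_eta close to c times it.
print(d_xi)
print(d_eta, tuple(c * v for v in d_xi))
```

The two vectors `xi` and `c * xi` point in the same direction, yet the computed derivatives differ by the factor `c`, which is the misuse of the word "direction" noted above.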
Theorem 7.2. If $F$ is differentiable at $\alpha$, and if $\lambda$ is any smooth arc through $\alpha$, with $\alpha = \lambda(x)$, then $\gamma = F \circ \lambda$ is smooth at $x$, and $\gamma'(x) = dF_\alpha(\lambda'(x))$. In particular, if $F$ is differentiable at $\alpha$, then every directional derivative $D_\xi F(\alpha)$ exists, and $D_\xi F(\alpha) = dF_\alpha(\xi)$.

Proof. The smoothness of $\gamma$ is equivalent to its differentiability at $x$ and therefore follows from the composite-function theorem. Moreover,

$$\gamma'(x) = d\gamma_x(1) = d(F \circ \lambda)_x(1) = dF_\alpha(d\lambda_x(1)) = dF_\alpha(\lambda'(x)).$$

If $\lambda$ is the parametrized line $\lambda(t) = \alpha + t\xi$, then it has the constant derivative $\xi$, and since $\alpha = \lambda(0)$ here, the above formula becomes $\gamma'(0) = dF_\alpha(\xi)$. That is, $D_\xi F(\alpha) = \gamma'(0) = dF_\alpha(\xi)$. □

It is not true, conversely, that the existence of all the directional derivatives $D_\xi F(\alpha)$ of a function $F$ at a point $\alpha$ implies the differentiability of $F$ at $\alpha$. The easiest counterexample involves the notion of a homogeneous function. We say that a function $F: V \to W$ is homogeneous if $F(x\xi) = xF(\xi)$ for all $x$ and $\xi$. For such a function the directional derivative $D_\xi F(0)$ exists because the arc $\gamma(t) = F(0 + t\xi) = tF(\xi)$ is linear, and $\gamma'(0) = F(\xi)$. Thus, all of the directional derivatives of a homogeneous function $F$ exist at 0 and $D_\xi F(0) = F(\xi)$. If $F$ is also differentiable at 0, then $dF_0(\xi) = D_\xi F(0) = F(\xi)$ and $F = dF_0$. Thus a differentiable homogeneous function must be linear. Therefore, any nonlinear homogeneous function $F$ will be a function such that $D_\xi F(0)$ exists for all $\xi$ but $dF_0$ does not exist. Taking the simplest possible situation, define $F: \mathbb{R}^2 \to \mathbb{R}$ by $F(x, y) = x^3/(x^2 + y^2)$ if $\langle x, y\rangle \ne \langle 0, 0\rangle$ and $F(0, 0) = 0$. Then $F(tx, ty) = tF(x, y)$, so that $F$ is homogeneous, but $F$ is not linear.

However, if $V$ is finite-dimensional, and if for each $\xi$ in a spanning set of vectors the directional derivative $D_\xi F(\alpha)$ exists and is a continuous function of $\alpha$ on an open set $A$, then $F$ is continuously differentiable on $A$.
The proof of this fact depends on the mean-value theorem, which we take up next, but we shall not complete it until Section 9 (Theorem 9.3).

The reader will remember the mean-value theorem as a cornerstone of the calculus, and this is just as true in our general theory. We shall apply it in the next section to give the proof of the general form of the above-mentioned theorem, and practically all of our more advanced work will depend on it. The ordinary mean-value theorem does not have an exact analogue here. Instead we shall prove a theorem that in the one-variable calculus is an easy consequence of the mean-value theorem.

Theorem 7.3. Let $f$ be a continuous function (parametrized arc) from a closed interval $[a, b]$ to a normed linear space, and suppose that $f'(t)$ exists and that $\|f'(t)\| \le m$ for all $t \in (a, b)$. Then $\|f(b) - f(a)\| \le m(b - a)$.

Proof. Fix $\epsilon > 0$, and let $A$ be the set of points $x \in [a, b]$ such that

$$\|f(x) - f(a)\| \le (m + \epsilon)(x - a) + \epsilon.$$
$A$ includes at least a small interval $[a, c]$, because $f$ is continuous at $a$. Set $l = \operatorname{lub} A$. Then $\|f(l) - f(a)\| \le (m + \epsilon)(l - a) + \epsilon$ by the continuity of $f$ at $l$. Thus $l \in A$, and $a < l \le b$. We claim that $l = b$. For if $l < b$, then $f'(l)$ exists and $\|f'(l)\| \le m$. Therefore, there is a $\delta$ such that $\|[f(x) - f(l)]/(x - l)\| < m + \epsilon$ when $|x - l| \le \delta$. It follows that

$$\|f(l + \delta) - f(a)\| \le \|f(l + \delta) - f(l)\| + \|f(l) - f(a)\| \le (m + \epsilon)\delta + (m + \epsilon)(l - a) + \epsilon = (m + \epsilon)(l + \delta - a) + \epsilon,$$

so that $l + \delta \in A$, a contradiction. Therefore, $l = b$. We thus have

$$\|f(b) - f(a)\| \le (m + \epsilon)(b - a) + \epsilon,$$

and, since $\epsilon$ is arbitrary, $\|f(b) - f(a)\| \le m(b - a)$. □

The following more general version of the mean-value theorem is the form in which it is ordinarily applied. As usual, $F$ and $G$ are from a subset of $V$ to $W$.

Theorem 7.4. If $F$ is differentiable in the ball $B_r(\alpha)$, and if $\|dF_\beta\| \le \epsilon$ for every $\beta$ in this ball, then $\|\Delta F_\beta(\xi)\| \le \epsilon\|\xi\|$ whenever $\beta$ and $\beta + \xi$ are in the ball. More generally, the same result holds if the ball $B_r(\alpha)$ is replaced by any convex set $C$.

Proof. The segment from $\beta$ to $\beta + \xi$ is the range of the parametrized arc $\lambda(t) = \beta + t\xi$ from $[0, 1]$ to $V$. If $\beta$ and $\beta + \xi$ are in the ball $B_r(\alpha)$, then this segment is a subset of the ball. Setting $\gamma(t) = F(\beta + t\xi)$, we then have $\gamma'(x) = dF_{\beta + x\xi}(\lambda'(x)) = dF_{\beta + x\xi}(\xi)$, from Theorem 7.2. Therefore, $\|\gamma'(x)\| \le \epsilon\|\xi\|$ on $[0, 1]$, and the mean-value theorem then implies that

$$\|\Delta F_\beta(\xi)\| = \|F(\beta + \xi) - F(\beta)\| = \|\gamma(1) - \gamma(0)\| \le \epsilon\|\xi\|(1 - 0) = \epsilon\|\xi\|,$$

which is the desired inequality. The only property of $B_r(\alpha)$ that we have used is that it includes the line segment joining any two of its points. This is the definition of convexity, and the theorem is therefore true for any convex set. □

Corollary. If $G$ is differentiable on the convex set $C$, if $T \in \operatorname{Hom}(V, W)$, and if $\|dG_\beta - T\| \le \epsilon$ for all $\beta$ in $C$, then $\|\Delta G_\beta(\xi) - T(\xi)\| \le \epsilon\|\xi\|$ whenever $\beta$ and $\beta + \xi$ are in $C$.

Proof. Set $F = G - T$, and note that $dF_\beta = dG_\beta - T$ and $\Delta F_\beta = \Delta G_\beta - T$.
□

We end this section with a few words about notation. Notice the reversal of the positions of the variables in the identity $(D_\xi F)(\alpha) = dF_\alpha(\xi)$. This difference has practical importance. We have a function of the two variables '$\alpha$' and '$\xi$' which we can convert to a function of one variable by holding the other variable fixed; it is convenient technically to put the fixed variable in subscript
position. Thus we think of $dF_\alpha(\xi)$ with $\alpha$ held fixed and have the function $dF_\alpha$ in $\operatorname{Hom}(V, W)$, whereas in $(D_\xi F)(\alpha)$ we hold $\xi$ fixed and have the directional derivative $D_\xi F: A \to W$ in the fixed direction $\xi$ as a function of $\alpha$, generalizing the notation for any ordinary partial derivative $(\partial F/\partial x_i)(\mathbf{a})$ as a function of $\mathbf{a}$. We can also express this implication of the subscript position of a variable in the dot notation (Section 0.10): when we write $D_\xi F(\alpha)$, we are thinking of the value at $\alpha$ of the function $D_\xi F(\cdot)$.

Still a third notation that we shall use in later chapters puts the function symbol in subscript position. We write $J_F(\alpha) = dF_\alpha$. This notation implies that the mapping $F$ is going to be fixed through a discussion and gets it "out of the way" by putting it in subscript position.

If $F$ is differentiable at each point of the open set $A$, then we naturally consider $dF$ to be the map $\alpha \mapsto dF_\alpha$ from $A$ to $\operatorname{Hom}(V, W)$. In the $J$-notation, $dF = J_F$. Later in this chapter we are going to consider the differentiability of this map at $\alpha$. This notion of the second differential $d^2F_\alpha = d(dF)_\alpha$ is probably confusing at first sight, and a preliminary look at it now may ease the later discussion. We simply have a new map $G = dF$ from an open set $A$ in a normed linear space $V$ to a normed linear space $X = \operatorname{Hom}(V, W)$, and we consider its differentiability at $\alpha$. If $dG_\alpha = d(dF)_\alpha$ exists, it is a linear map from $V$ to $\operatorname{Hom}(V, W)$, and there is something special now. Referring back to Theorem 6.1 of Chapter 1, we know that $dG_\alpha = d^2F_\alpha$ is equivalent by duality to a bilinear mapping $\omega$ from $V \times V$ to $W$: since $dG_\alpha(\xi)$ is itself a transformation in $\operatorname{Hom}(V, W)$, we can evaluate it at $\eta$, and we define $\omega$ by

$$\omega(\xi, \eta) = (dG_\alpha(\xi))(\eta).$$

The dot notation may be helpful here.
The mapping $\alpha \mapsto dF_\alpha$ is simply $dF_{(\cdot)}$, and we have defined $G$ by $G(\cdot) = dF_{(\cdot)}$. Later, the fact that $dG_\alpha(\xi)$ is a mapping can be emphasized by writing it as $dG_\alpha(\xi)(\cdot)$. In each case here we have a function of one variable, and the dot only reminds us of that fact and shows us where we shall put the variable when indicating an evaluation. In the case of $\omega$ we have the original use of the dot, as in $\omega(\xi, \cdot) = dG_\alpha(\xi)$.

EXERCISES

7.1 Given $f: \mathbb{R} \to \mathbb{R}$ such that $f'(a)$ exists, show that the "directional derivative" $D_b f(a)$ has the value $bf'(a)$, by a direct evaluation of the limit of the difference quotient.

7.2 Let $f$ be a real-valued function on an $n$-dimensional space $V$, and suppose that $f$ is differentiable at $\alpha \in V$. Show that the directions $\xi$ in which the derivative $D_\xi f(\alpha)$ is zero make up an $(n - 1)$-dimensional subspace of $V$ (or the whole of $V$). What similar conclusions can be drawn if $f$ maps $V$ to a two-dimensional space $W$?
7.3 a) Show by a direct argument on limits that if $f$ and $g$ are two functions from an interval $I \subset \mathbb{R}$ to a normed linear space $V$, and if $f'(x)$ and $g'(x)$ both exist, then $(f + g)'(x)$ exists and $(f + g)'(x) = f'(x) + g'(x)$.
b) Prove the same result as a corollary of Theorems 7.1 and 7.2 and the differentiation rules of Section 6.

7.4 a) Given $f: I \to V$ and $g: I \to W$, show by a direct limit argument that if $f'(x)$ and $g'(x)$ both exist, and if $F = \langle f, g\rangle: I \to V \times W$, then $F'(x)$ exists and $F'(x) = \langle f'(x), g'(x)\rangle$.
b) Prove the same result from Theorems 7.1 and 7.2 and the differentiation rules of Section 6, using the exact relation $F = \theta_1 \circ f + \theta_2 \circ g$.

7.5 In the spirit of the above two exercises, state a product law for derivatives of arcs and prove it as in the (b) proofs above.

7.6 Find the tangent vector to the arc $\langle e^t, \sin t\rangle$ at $t = 0$; at $t = \pi/2$. [Apply Exercise 7.4(a).] What is the differential of the above parametrized arc at these two points? That is, if $f(t) = \langle e^t, \sin t\rangle$, what are $df_0$ and $df_{\pi/2}$?

7.7 Let $F: \mathbb{R}^2 \to \mathbb{R}^2$ be the mapping $\langle x, y\rangle \mapsto$ <3x^yjX^y^>. Compute the directional derivative $D_{\langle 1, 2\rangle}F(3, -1)$
a) as the tangent vector at $\langle 3, -1\rangle$ to the arc $F \circ \lambda$, where $\lambda$ is the straight line through $\langle 3, -1\rangle$ in the direction $\langle 1, 2\rangle$;
b) by first computing $dF_{\langle 3, -1\rangle}$ and then evaluating at $\langle 1, 2\rangle$.

7.8 Let $\lambda$ and $\mu$ be any two linear functionals on a vector space $V$. Evaluate the product $f(\xi) = \lambda(\xi)\mu(\xi)$ along the line $\xi = t\alpha$, and hence compute $D_\alpha f(\alpha)$. Now evaluate $f$ along the general line $\xi = t\alpha + \beta$, and from it compute $D_\alpha f(\beta)$.

7.9 Work the above exercise by computing differentials.

7.10 If $f: \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\mathbf{a}$, we know that its differential $df_{\mathbf{a}}$, being a linear functional on $\mathbb{R}^n$, is given by its skeleton $n$-tuple $L$ according to the formula

$$df_{\mathbf{a}}(\mathbf{x}) = (L, \mathbf{x}) = \sum_{i=1}^{n} l_i x_i.$$

In this context we call the $n$-tuple $L$ the gradient of $f$ at $\mathbf{a}$.
Show from the Schwarz inequality (Exercise 2.3) that if we use vectors $\mathbf{y}$ of Euclidean length 1, then the directional derivative $D_{\mathbf{y}}f(\mathbf{a})$ is maximum when $\mathbf{y}$ points in the direction of the gradient of $f$.

7.11 Let $W$ be a normed linear space, and let $V$ be the set of parametrized arcs $\lambda: [-1, 1] \to W$ such that $\lambda(0) = 0$ and $\lambda'(0)$ exists. Show that $V$ is a vector space and that $\lambda \mapsto \lambda'(0)$ is a surjective linear mapping from $V$ to $W$. Describe in words the elements of the quotient space $V/N$, where $N$ is the null space of the above map.

7.12 Find another homogeneous nonlinear function. Evaluate its directional derivatives $D_\xi F(0)$, and show again that they do not make up a linear map.

7.13 Prove that if $F$ is a differentiable mapping from an open ball $B$ of a normed linear space $V$ to a normed linear space $W$ such that $dF_\alpha = 0$ for every $\alpha$ in $B$, then $F$ is a constant function.

7.14 Generalize the above exercise to the case where the domain of $F$ is an open set $A$ with the property that any two points of $A$ can be joined by a smooth arc lying in $A$.
Show by a counterexample that the result does not generalize to arbitrary open sets $A$ as the domain of $F$.

7.15 Prove the following generalization of the mean-value theorem. Let $f$ be a continuous mapping from the closed interval $[a, b]$ to a normed linear space $V$, and let $g$ be a continuous real-valued function on $[a, b]$. Suppose that $f'(t)$ and $g'(t)$ both exist at all points of the open interval $(a, b)$ and that $\|f'(t)\| \le g'(t)$ on $(a, b)$. Then

$$\|f(b) - f(a)\| \le g(b) - g(a).$$

[Consider the points $x$ such that $\|f(x) - f(a)\| \le g(x) - g(a) + \epsilon(x - a) + \epsilon$.]

8. THE DIFFERENTIAL AND PRODUCT SPACES

In this section we shall relate the differentiation rules to the special configurations resulting from the expression of a vector space as a finite Cartesian product. When dealing with the range, this is a trivial consideration, but when the domain is a product space, we become involved with a deeper theorem. These general product considerations will be specialized to the $\mathbb{R}^n$-spaces in the next section, but they also have a more general usefulness, as we shall see in the later sections of this chapter and in later chapters.

We know that an $m$-tuple of functions on a common domain, $F^i: A \to W_i$, $i = 1, \ldots, m$, is equivalent to a single $m$-tuple-valued function

$$F: A \to W = \prod_{1}^{m} W_i,$$

$F(\alpha)$ being the $m$-tuple $\{F^i(\alpha)\}_1^m$ for each $\alpha \in A$. We now check the obviously necessary fact that $F$ is differentiable at $\alpha$ if and only if each $F^i$ is differentiable at $\alpha$.

Theorem 8.1. Given $F^i: A \to W_i$, $i = 1, \ldots, m$, and $F = \langle F^1, \ldots, F^m\rangle$, then $F$ is differentiable at $\alpha$ if and only if all the functions $F^i$ are, in which case $dF_\alpha = \langle dF^1_\alpha, \ldots, dF^m_\alpha\rangle$.

Proof. Strictly speaking, $F = \sum_1^m \theta_i \circ F^i$, where $\theta_j$ is the injection of $W_j$ into the product space $W = \prod_1^m W_i$ (see Section 1.3). Since each $\theta_i$ is linear and hence differentiable, with $d(\theta_i)_\alpha = \theta_i$, we see that if each $F^i$ is differentiable at $\alpha$, then so is $F$, and $dF_\alpha = \sum_1^m \theta_i \circ dF^i_\alpha$. Less exactly, this is the statement $dF_\alpha = \langle dF^1_\alpha, \ldots, dF^m_\alpha\rangle$.
The converse follows similarly from $F^i = \pi_i \circ F$, where $\pi_j$ is the projection of $\prod_1^m W_i$ onto $W_j$. □

Theorems 7.1 and 8.1 have the following obvious corollary (which can also be proved as easily by a direct inspection of the limits involved).

Lemma 8.1. If $f_i$ is an arc from $[a, b]$ to $W_i$, for $i = 1, \ldots, n$, and if $f$ is the $n$-tuple-valued arc $f = \langle f_1, \ldots, f_n\rangle$, then $f'(x)$ exists if and only if $f_i'(x)$ exists for each $i$, in which case $f'(x) = \langle f_1'(x), \ldots, f_n'(x)\rangle$.

When the domain space $V$ is a product space $\prod_1^n V_j$ the situation is more complicated. A function $F(\xi_1, \ldots, \xi_n)$ of $n$ vector variables does not decompose
into an equivalent $n$-tuple of functions. Moreover, although its differential $dF_\alpha$ does decompose into an equivalent $n$-tuple of partial differentials $\{dF^i_\alpha\}$, we do not have the simple theorem that $dF_\alpha$ exists if and only if the partial differentials $dF^i_\alpha$ all exist.

Of course, we regard a function $F(\xi_1, \ldots, \xi_n)$ of $n$ vector variables as being a function of the single $n$-tuple variable $\xi = \langle \xi_1, \ldots, \xi_n\rangle$, so that in principle there is nothing new when we consider the differentiability of $F$. However, when we consider a composition $F \circ G$, the inner function $G$ must now be an $n$-tuple-valued function $G = \langle g^1, \ldots, g^n\rangle$, where $g^i$ is from an open subset $A$ of some normed linear space $X$ to $V_i$, and we naturally try to express the differential of $F \circ G$ in terms of the differentials $dg^i$. To accomplish this we need the partial differentials $dF^i_\alpha$ of $F$. For the moment we shall define the $j$th partial differential of $F$ at $\alpha = \langle \alpha_1, \ldots, \alpha_n\rangle$ as the restriction of the differential $dF_\alpha$ to $V_j$, considered as a subspace of $V = \prod_1^n V_i$. As usual, this really involves the injection $\theta_j$ of $V_j$ into $\prod_1^n V_i$, and our formal (temporary) definition, accordingly, is

$$dF^j_\alpha = dF_\alpha \circ \theta_j.$$

Then, since $\xi = \langle \xi_1, \ldots, \xi_n\rangle = \sum_1^n \theta_i(\xi_i)$, we have

$$dF_\alpha(\xi) = \sum_{1}^{n} dF^i_\alpha(\xi_i).$$

Similarly, since $G = \langle g^1, \ldots, g^n\rangle = \sum_1^n \theta_i \circ g^i$, we have

$$d(F \circ G)_\gamma = \sum_{1}^{n} dF^i_{G(\gamma)} \circ dg^i_\gamma,$$

which we shall call the general chain rule. There is ambiguity in the "$i$"-superscripts in this formula: to be more proper we should write $(dF)^i_\alpha$ and $d(g^i)_\gamma$.

We shall now work around to the real definition of a partial differential. Since

$$\Delta F_\alpha \circ \theta_j = (dF_\alpha + \mathfrak{o}) \circ \theta_j = dF_\alpha \circ \theta_j + \mathfrak{o} = dF^j_\alpha + \mathfrak{o},$$

we see that $dF^j_\alpha$ can be directly characterized, independently of $dF_\alpha$, as follows: $dF^j_\alpha$ is the unique element $T_j$ of $\operatorname{Hom}(V_j, W)$ such that $\Delta F_\alpha \circ \theta_j = T_j + \mathfrak{o}$. That is, $dF^j_\alpha$ is the differential at $\alpha_j$ of the function of the one variable $\xi_j$ obtained by holding the other variables in $F(\xi_1, \ldots, \xi_n)$ fixed at the values $\xi_i = \alpha_i$.
This is important because in practice it is often such partial differentiability that we come upon as the primary phenomenon. We shall therefore take this direct characterization as our definition of $dF^j_\alpha$, after which our motivating calculation above is the proof of the following lemma.

Lemma 8.2. If $A$ is an open subset of a product space $V = \prod_1^n V_i$, and if $F: A \to W$ is differentiable at $\alpha$, then all the partial differentials $dF^i_\alpha$ exist and $dF^i_\alpha = dF_\alpha \circ \theta_i$.
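As a small worked illustration of Lemma 8.2, consider the following sketch. The function is a hypothetical example chosen here, not one taken from the text: let $V_1 = V_2 = W = \mathbb{R}$ and $F(x, y) = x^2 y$, which is differentiable everywhere. Holding one variable fixed and differentiating in the other gives the two partial differentials, and they agree with the restrictions $dF_{\langle a, b\rangle} \circ \theta_i$ of the full differential:

```latex
\begin{align*}
dF_{\langle a,b\rangle}(h,k) &= 2ab\,h + a^{2}k,\\
dF^{1}_{\langle a,b\rangle}(h) &= \bigl(dF_{\langle a,b\rangle}\circ\theta_{1}\bigr)(h)
   = dF_{\langle a,b\rangle}(h,0) = 2ab\,h,\\
dF^{2}_{\langle a,b\rangle}(k) &= \bigl(dF_{\langle a,b\rangle}\circ\theta_{2}\bigr)(k)
   = dF_{\langle a,b\rangle}(0,k) = a^{2}k.
\end{align*}
```

Here $dF^1_{\langle a, b\rangle}$ is the differential at $a$ of $x \mapsto x^2 b$, and $dF^2_{\langle a, b\rangle}$ is the differential at $b$ of $y \mapsto a^2 y$, in line with the direct characterization above.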
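The question then occurs as to whether the existence of all the partial differentials $dF^i_\alpha$ implies the existence of $dF_\alpha$. The answer in general is negative, as we shall see in the next section, but if all the partial differentials $dF^i_\alpha$ exist for each $\alpha$ in an open set $A$ and are continuous functions of $\alpha$, then $F$ is continuously differentiable on $A$. Note that Lemma 8.2 and the projection-injection identities show us what $dF_\alpha$ must be if it exists: $dF^i_\alpha = dF_\alpha \circ \theta_i$ and $\sum \theta_i \circ \pi_i = I$ together imply that $dF_\alpha = \sum dF^i_\alpha \circ \pi_i$.

Theorem 8.2. Let $A$ be an open subset of the normed linear space $V = V_1 \times V_2$, and suppose that $F: A \to W$ has continuous partial differentials $dF^1_{\langle\alpha,\beta\rangle}$ and $dF^2_{\langle\alpha,\beta\rangle}$ on $A$. Then $dF_{\langle\alpha,\beta\rangle}$ exists and is continuous on $A$, and $dF_{\langle\alpha,\beta\rangle}(\xi, \eta) = dF^1_{\langle\alpha,\beta\rangle}(\xi) + dF^2_{\langle\alpha,\beta\rangle}(\eta)$.

Proof. We shall use the sum norm on $V = V_1 \times V_2$. Given $\epsilon$, we choose $\delta$ so that $\|dF^i_{\langle\mu,\nu\rangle} - dF^i_{\langle\alpha,\beta\rangle}\| < \epsilon$ for every $\langle\mu, \nu\rangle$ in the $\delta$-ball about $\langle\alpha, \beta\rangle$ and for $i = 1, 2$. Setting

$$G(\xi) = F(\alpha + \xi, \beta + \eta) - dF^1_{\langle\alpha,\beta\rangle}(\xi),$$

we have

$$\|dG_\xi\| = \|dF^1_{\langle\alpha+\xi,\,\beta+\eta\rangle} - dF^1_{\langle\alpha,\beta\rangle}\| < \epsilon$$

if $\|\langle\xi, \eta\rangle\| < \delta$, and the corollary of Theorem 7.4 implies that

$$\|F(\alpha + \xi, \beta + \eta) - F(\alpha, \beta + \eta) - dF^1_{\langle\alpha,\beta\rangle}(\xi)\| \le \epsilon\|\xi\|$$

when $\|\langle\xi, \eta\rangle\| < \delta$. Arguing similarly with

$$H(\eta) = F(\alpha, \beta + \eta) - dF^2_{\langle\alpha,\beta\rangle}(\eta),$$

we find that

$$\|F(\alpha, \beta + \eta) - F(\alpha, \beta) - dF^2_{\langle\alpha,\beta\rangle}(\eta)\| \le \epsilon\|\eta\|$$

when $\|\langle 0, \eta\rangle\| < \delta$. Combining the two inequalities, we have

$$\|\Delta F_{\langle\alpha,\beta\rangle}(\xi, \eta) - T(\langle\xi, \eta\rangle)\| \le \epsilon\|\langle\xi, \eta\rangle\|$$

when $\|\langle\xi, \eta\rangle\| < \delta$, where $T = dF^1_{\langle\alpha,\beta\rangle} \circ \pi_1 + dF^2_{\langle\alpha,\beta\rangle} \circ \pi_2$. That is, $\Delta F_{\langle\alpha,\beta\rangle} - T \in \mathfrak{o}$, and so $dF_{\langle\alpha,\beta\rangle}$ exists and equals $T$. □

The theorem for more than two factor spaces is a corollary.

Theorem 8.3. If $A$ is an open subset of $\prod_1^n V_i$ and $F: A \to W$ is such that for each $i = 1, \ldots, n$ the partial differential $dF^i_\alpha$ exists for all $\alpha \in A$ and is continuous as a function of $\alpha = \langle\alpha_1, \ldots, \alpha_n\rangle$, then $dF_\alpha$ exists and is continuous on $A$. If $\xi = \langle\xi_1, \ldots, \xi_n\rangle$, then $dF_\alpha(\xi) = \sum_1^n dF^i_\alpha(\xi_i)$.

Proof.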
The existence and continuity of $dF^1_\alpha$ and $dF^2_\alpha$ imply by the theorem that $dF^1_\alpha \circ \pi_1 + dF^2_\alpha \circ \pi_2$ is the differential of $F$ considered as a function of the first two variables when the others are held fixed. Since it is the sum of continuous
functions, it is itself continuous in $\alpha$, and we can now apply the theorem again to add $dF^3_\alpha$ to this sum partial differential, concluding that $\sum_1^3 dF^i_\alpha \circ \pi_i$ is the partial differential of $F$ on the factor space $V_1 \times V_2 \times V_3$, and so on (which is colloquial for induction). □

As an illustration of the use of these two theorems, we shall deduce the general product rule (although a direct proof based on $\Delta$-estimates is perfectly feasible). A general product is simply a bounded bilinear mapping $\omega: X \times Y \to W$, where $X$, $Y$, and $W$ are all normed linear spaces. The boundedness inequality here is $\|\omega(\xi, \eta)\| \le b\|\xi\|\,\|\eta\|$. We first show that $\omega$ is differentiable.

Lemma 8.3. A bounded bilinear mapping $\omega: X \times Y \to W$ is everywhere differentiable and $d\omega_{\langle\alpha,\beta\rangle}(\xi, \eta) = \omega(\alpha, \eta) + \omega(\xi, \beta)$.

Proof. With $\beta$ held fixed, $g_\beta(\xi) = \omega(\xi, \beta)$ is in $\operatorname{Hom}(X, W)$ and therefore is everywhere differentiable and equal to its own differential. That is, $d\omega^1$ exists and $d\omega^1_{\langle\alpha,\beta\rangle}(\xi) = \omega(\xi, \beta)$. Since $\beta \mapsto g_\beta$ is a bounded linear mapping, $d\omega^1_{\langle\alpha,\beta\rangle} = g_\beta$ is a continuous function of $\langle\alpha, \beta\rangle$. Similarly, $d\omega^2_{\langle\alpha,\beta\rangle}(\eta) = \omega(\alpha, \eta)$, and $d\omega^2$ is continuous. The lemma is now a direct corollary of Theorem 8.2. □

If $\omega(\xi, \eta)$ is thought of as a product of $\xi$ and $\eta$, then the product of two functions $g(\zeta)$ and $h(\zeta)$ is $\omega(g(\zeta), h(\zeta))$, where $g$ is from an open subset $A$ of a normed linear space $V$ to $X$ and $h$ is from $A$ to $Y$. The product rule is now just what would be expected: the differential of the product is the first times the differential of the second plus the second times the differential of the first.

Theorem 8.4. If $g: A \to X$ and $h: A \to Y$ are differentiable at $\beta$, then so is the product $F(\zeta) = \omega(g(\zeta), h(\zeta))$ and

$$dF_\beta(\zeta) = \omega(g(\beta), dh_\beta(\zeta)) + \omega(dg_\beta(\zeta), h(\beta)).$$

Proof. This is a direct corollary of Theorem 8.1, Lemma 8.3, and the chain rule. □

EXERCISES

8.1 Find the tangent vector to the arc $\langle \sin t, \cos t, t^2\rangle$ at $t = 0$; at $t = \pi/2$.
What is the differential of the above parametrized arc at the two given points? That is, if $f(t) = \langle \sin t, \cos t, t^2\rangle$, what are $df_0$ and $df_{\pi/2}$?

8.2 Give the detailed proof of Lemma 8.1.

8.3 The formula

$$dF_\alpha(\xi) = \sum_{1}^{n} dF^i_\alpha(\xi_i)$$
is probably obvious in view of the identity

$$\xi = \sum_{1}^{n} \theta_i(\xi_i)$$

and the definition of partial differentials, but write out an explicit, detailed proof anyway.

8.4 Let $F$ be a differentiable mapping from an $n$-dimensional vector space $V$ to a finite-dimensional vector space $W$, and define $G: V \times W \to W$ by $G(\xi, \eta) = \eta - F(\xi)$. Thus the graph of $F$ in $V \times W$ is the null set of $G$. Show that the null space of $dG_{\langle\alpha,\beta\rangle}$ has dimension $n$ for every $\langle\alpha, \beta\rangle \in V \times W$.

8.5 Let $F(\xi, \eta)$ be a continuously differentiable function defined on a product $A \times B$, where $B$ is a ball and $A$ is an open set. Suppose that $dF^2_{\langle\alpha,\beta\rangle} = 0$ for all $\langle\alpha, \beta\rangle$ in $A \times B$. Prove that $F$ is independent of $\eta$. That is, show that there is a continuously differentiable function $G(\xi)$ defined on $A$ such that $F(\xi, \eta) = G(\xi)$ on $A \times B$.

8.6 By considering a domain in $\mathbb{R}^2$ as indicated at the right, show that there exists a function $f(x, y)$ on an open set $A$ in $\mathbb{R}^2$ such that $\partial f/\partial y = 0$ everywhere and such that $f(x, y)$ is not a function of $x$ alone.

8.7 Let $F(\xi, \eta, \zeta)$ be any function of three vector variables, and for fixed $\gamma$ set $G(\xi, \eta) = F(\xi, \eta, \gamma)$. Prove that the partial differential $dF^1_{\langle\alpha,\beta,\gamma\rangle}$ exists if and only if $dG^1_{\langle\alpha,\beta\rangle}$ exists, in which case they are equal.

8.8 Give a more careful proof of Theorem 8.3. That is, state the inductive hypothesis and show that the theorem follows from it and Theorem 8.2. If you are meticulous in your argument, you will need a form of the above exercise.

8.9 Let $f$ be a differentiable mapping from $\mathbb{R}^2$ to $\mathbb{R}$. Regarding $\mathbb{R}^2$ as $\mathbb{R} \times \mathbb{R}$, show that the two partial differentials of $f$ are simply multiplication by its partial derivatives. Generalize to $n$ dimensions. Show that the above is still true for a map $F$ from $\mathbb{R}^2$ to a general vector space $V$, the partial derivatives now being vectors.

8.10 Give the details of the proof of Theorem 8.4.

9.
THE DIFFERENTIAL AND $\mathbb{R}^n$

We shall now apply the results of the last two sections to mappings involving the Cartesian spaces $\mathbb{R}^n$, the bread and butter spaces of finite-dimensional theory. We start with the domain.

Theorem 9.1. If $F$ is a mapping from (an open subset of) $\mathbb{R}^n$ to a normed linear space $W$, then the directional derivative of $F$ in the direction of the $j$th standard basis vector $\delta^j$ is just the partial derivative $\partial F/\partial x_j$, and the $j$th partial differential is multiplication by $\partial F/\partial x_j$: $dF^j_{\mathbf{a}}(h) = h(\partial F/\partial x_j)(\mathbf{a})$. More exactly, if any one of the above three objects exists at $\mathbf{a}$, then they all do, with the above relationships.
Proof. We have

$$\frac{\partial F}{\partial x_j}(\mathbf{a}) = \lim_{t \to 0} \frac{F(a_1, \ldots, a_j + t, \ldots, a_n) - F(a_1, \ldots, a_j, \ldots, a_n)}{t}.$$

Moreover, since the restriction of $F$ to $\mathbf{a} + \mathbb{R}\delta^j$ is a parametrized arc whose differential at 0 is by definition the $j$th partial differential of $F$ at $\mathbf{a}$ and whose tangent vector at 0 we have just computed to be $(\partial F/\partial x_j)(\mathbf{a})$, the remainder of the theorem follows from Theorem 7.1. □

Combining this theorem and Theorem 7.2, we obtain the following result.

Theorem 9.2. If $V = \mathbb{R}^n$ and $F$ is differentiable at $\mathbf{a}$, then the partial derivatives $(\partial F/\partial x_j)(\mathbf{a})$ all exist and the $n$-tuple of partial derivatives at $\mathbf{a}$, $\{(\partial F/\partial x_j)(\mathbf{a})\}_1^n$, is the skeleton of $dF_{\mathbf{a}}$. In particular,

$$D_{\mathbf{y}}F(\mathbf{a}) = \sum_{j=1}^{n} y_j \frac{\partial F}{\partial x_j}(\mathbf{a}).$$

Proof. Since $dF_{\mathbf{a}}(\delta^i) = D_{\delta^i}F(\mathbf{a}) = (\partial F/\partial x_i)(\mathbf{a})$, as we noted above, we have

$$D_{\mathbf{y}}F(\mathbf{a}) = dF_{\mathbf{a}}(\mathbf{y}) = dF_{\mathbf{a}}\Bigl(\sum_{1}^{n} y_i\delta^i\Bigr) = \sum_{1}^{n} y_i\,dF_{\mathbf{a}}(\delta^i) = \sum_{1}^{n} y_i \frac{\partial F}{\partial x_i}(\mathbf{a}).$$

All that we have done here is to display $dF_{\mathbf{a}}$ as the linear combination mapping defined by its skeleton $\{dF_{\mathbf{a}}(\delta^i)\}$ (see Theorem 1.2 of Chapter 1), where $T(\delta^i) = dF_{\mathbf{a}}(\delta^i)$ is now recognized as the partial derivative $(\partial F/\partial x_i)(\mathbf{a})$. □

The above formula shows the barbarism of the classical notation for partial derivatives: note how it comes out if we try to evaluate $dF_{\mathbf{a}}(\mathbf{x})$. The notation $D_{\delta^j}F$ is precise but cumbersome. Other notations are $F_j$ and $D_jF$. Each has its problems, but the second probably minimizes the difficulties. Using it, our formula reads $dF_{\mathbf{a}}(\mathbf{y}) = \sum_{j=1}^n y_j D_jF(\mathbf{a})$.

In the opposite direction we have the corresponding specialization of Theorem 8.3.

Theorem 9.3. If $A$ is an open subset of $\mathbb{R}^n$, and if $F$ is a mapping from $A$ to a normed linear space $W$ such that all of the partial derivatives $(\partial F/\partial x_j)(\mathbf{a})$ exist and are continuous on $A$, then $F$ is continuously differentiable on $A$.

Proof. Since the $j$th partial differential of $F$ is simply multiplication by $\partial F/\partial x_j$, we are (by Theorem 9.1) assuming the existence and continuity of all the partial differentials $dF^j_{\mathbf{a}}$ on $A$. Theorem 9.3 thus becomes a special case of Theorem 8.3.
□

Now suppose that the range space of $F$ is also a Cartesian space, so that $F$ is a mapping from an open subset $A$ of $\mathbb{R}^n$ to $\mathbb{R}^m$. Then $dF_{\mathbf{a}}$ is in $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$.
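The formula of Theorem 9.2, $D_{\mathbf{y}}F(\mathbf{a}) = \sum_j y_j (\partial F/\partial x_j)(\mathbf{a})$, lends itself to a quick numerical check for a map from $\mathbb{R}^2$ to $\mathbb{R}^2$. The sketch below is illustrative only: the point `a`, the vector `y`, the step `t` and the helper name `quotient` are assumptions made here, and forward difference quotients approximate the limits.

```python
def F(x):
    """Example map F: R^2 -> R^2, F(x) = <x1^2 - x2^2, 2*x1*x2>."""
    x1, x2 = x
    return (x1 * x1 - x2 * x2, 2.0 * x1 * x2)

def quotient(F, a, y, t=1e-6):
    """Approximate D_y F(a) by the quotient (F(a + t*y) - F(a)) / t."""
    base = F(a)
    moved = F(tuple(ai + t * yi for ai, yi in zip(a, y)))
    return tuple((m - b) / t for m, b in zip(moved, base))

a = (1.0, 2.0)
y = (0.5, -3.0)
basis = ((1.0, 0.0), (0.0, 1.0))  # standard basis vectors delta^1, delta^2

# Skeleton of dF_a: the partial derivatives, i.e. derivatives along delta^j.
skeleton = [quotient(F, a, e) for e in basis]

# Linear combination sum_j y_j * (dF/dx_j)(a), computed componentwise.
combo = tuple(sum(y[j] * skeleton[j][i] for j in range(2)) for i in range(2))

# Directional derivative computed directly along y.
direct = quotient(F, a, y)

# Both should be close to the exact value (13, -4) for this F, a and y.
print(combo, direct)
```

The two partial-derivative vectors in `skeleton` are exactly the data needed to recover the value of the differential on any vector `y`.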
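For computational purposes we want to represent linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$ by their matrices, and it is therefore of the utmost importance to find the matrix $\mathbf{t}$ of the differential $T = dF_{\mathbf{a}}$. This matrix is called the Jacobian matrix of $F$ at $\mathbf{a}$. The columns of $\mathbf{t}$ form the skeleton of $dF_{\mathbf{a}}$, and we saw above that this skeleton is the $n$-tuple of partial derivatives $(\partial F/\partial x_j)(\mathbf{a})$. If we write the $m$-tuple-valued $F$ loosely as an $m$-tuple of functions, $F = \langle f_1, \ldots, f_m\rangle$, then according to Lemma 8.1, the $j$th column of $\mathbf{t}$ is the $m$-tuple

$$\frac{\partial F}{\partial x_j}(\mathbf{a}) = \Bigl\langle \frac{\partial f_1}{\partial x_j}(\mathbf{a}), \ldots, \frac{\partial f_m}{\partial x_j}(\mathbf{a}) \Bigr\rangle.$$

Thus,

Theorem 9.4. Let $F$ be a mapping from an open subset of $\mathbb{R}^n$ to $\mathbb{R}^m$, and suppose that $F$ is differentiable at $\mathbf{a}$. Then the matrix of $dF_{\mathbf{a}}$ (the Jacobian matrix of $F$ at $\mathbf{a}$) is given by

$$t_{ij} = \frac{\partial f_i}{\partial x_j}(\mathbf{a}).$$

If we use the notation $y_i = f_i(\mathbf{x})$, we have

$$t_{ij} = \frac{\partial y_i}{\partial x_j}(\mathbf{a}).$$

If we also have a differentiable map $\mathbf{z} = G(\mathbf{y}) = \langle g_1(\mathbf{y}), \ldots, g_l(\mathbf{y})\rangle$ from an open set $B \subset \mathbb{R}^m$ into $\mathbb{R}^l$, then $dG_{\mathbf{b}}$ has, similarly, the matrix

$$\Bigl\{\frac{\partial g_k}{\partial y_i}(\mathbf{b})\Bigr\} = \Bigl\{\frac{\partial z_k}{\partial y_i}(\mathbf{b})\Bigr\}.$$

Also, if $B$ contains $\mathbf{b} = F(\mathbf{a})$, then the composite-function rule

$$d(G \circ F)_{\mathbf{a}} = dG_{\mathbf{b}} \circ dF_{\mathbf{a}}$$

has the matrix form

$$\frac{\partial z_k}{\partial x_j}(\mathbf{a}) = \sum_{i=1}^{m} \frac{\partial z_k}{\partial y_i}(\mathbf{b})\,\frac{\partial y_i}{\partial x_j}(\mathbf{a}),$$

or simply

$$\frac{\partial z_k}{\partial x_j} = \sum_{i} \frac{\partial z_k}{\partial y_i}\,\frac{\partial y_i}{\partial x_j}.$$

This is the usual form of the chain rule in the calculus. We see that it is merely the expression of the composition of linear maps as matrix multiplication.

We saw in Section 8 that the ordinary derivative $f'(a)$ of a function $f$ of one real variable is the skeleton of the differential $df_a$, and it is perfectly reasonable to generalize this relationship and define the derivative $F'(\mathbf{a})$ of a function $F$ of $n$ real variables to be the skeleton of $dF_{\mathbf{a}}$, so that $F'(\mathbf{a})$ is the $n$-tuple of partial derivatives $\{(\partial F/\partial x_i)(\mathbf{a})\}_1^n$, as we saw above. In particular, if $F$ is from an open subset of $\mathbb{R}^n$ to $\mathbb{R}^m$, then $F'(\mathbf{a})$ is the Jacobian matrix of $F$ at $\mathbf{a}$. This gives the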
matrix chain rule the standard form

$$(G \circ F)'(\mathbf{a}) = G'(F(\mathbf{a}))\,F'(\mathbf{a}).$$

Some authors use the word 'derivative' for what we have called the differential, but this is a change from the traditional meaning in the one-variable case, and we prefer to maintain the distinction as discussed above: the differential $dF_\alpha$ is the linear map approximating $\Delta F_\alpha$ and the derivative $F'(\mathbf{a})$ must be the matrix of this linear map when the domain and range spaces are Cartesian. However, we shall stay with the language of Jacobians.

Suppose now that $A$ is an open subset of a finite-dimensional vector space $V$ and that $H: A \to W$ is differentiable at $\alpha \in A$. Suppose that $W$ is also finite-dimensional and that $\varphi: V \to \mathbb{R}^n$ and $\psi: W \to \mathbb{R}^m$ are any coordinate isomorphisms. If $\bar{A} = \varphi[A]$, then $\bar{A}$ is an open subset of $\mathbb{R}^n$ and $\bar{H} = \psi \circ H \circ \varphi^{-1}$ is a mapping from $\bar{A}$ to $\mathbb{R}^m$ which is differentiable at $\mathbf{a} = \varphi(\alpha)$, with $d\bar{H}_{\mathbf{a}} = \psi \circ dH_\alpha \circ \varphi^{-1}$. Then $d\bar{H}_{\mathbf{a}}$ is given by its Jacobian matrix $\{(\partial \bar{h}_i/\partial x_j)(\mathbf{a})\}$, which we now call the Jacobian matrix of $H$ with respect to the chosen bases in $V$ and $W$. Change of bases in $V$ and $W$ changes the Jacobian matrix according to the rule given in Section 2.4.

If $F$ is a mapping from $\mathbb{R}^n$ to itself, then the determinant of the Jacobian matrix $(\partial f_i/\partial x_j)(\mathbf{a})$ is called the Jacobian of $F$ at $\mathbf{a}$. It is designated

$$\frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)}(\mathbf{a}), \quad\text{or}\quad \frac{\partial(y_1, \ldots, y_n)}{\partial(x_1, \ldots, x_n)}(\mathbf{a})$$

if it is understood that $y_i = f_i(\mathbf{x})$. Another notation is $J_F(\mathbf{a})$ (or simply $J(\mathbf{a})$ if $F$ is understood). However, this is sometimes used to indicate the differential $dF_{\mathbf{a}}$, and we shall write $\det J_F(\mathbf{a})$ instead.

If $F(\mathbf{x}) = \langle x_1^2 - x_2^2,\ 2x_1x_2\rangle$, then its Jacobian matrix is

$$\begin{pmatrix} 2x_1 & -2x_2 \\ 2x_2 & 2x_1 \end{pmatrix}$$

and $\det J_F(\mathbf{x}) = 4(x_1^2 + x_2^2) = 4(\|\mathbf{x}\|_2)^2$.

EXERCISES

9.1 By analogy with the notion of a parametrized arc, we define a smooth parametrized two-dimensional surface in a normed linear space $W$ to be a continuously differentiable map $F$ from a rectangle $I \times J$ in $\mathbb{R}^2$ to $W$.
Suppose that $I \times J = [-1, 1] \times [-1, 1]$, and invent a definition of the tangent space to the range of $F$ in $W$ at the point $F(0, 0)$. Show that the two vectors $\dfrac{\partial F}{\partial x}(0, 0)$ and $\dfrac{\partial F}{\partial y}(0, 0)$ are a basis for this tangent space. (This should not have been your definition.)
160 THE DIFFERENTIAL CALCULUS 3.9 9.2 Generalize the above exercise to a smooth parametrized n-dimensional surface in a normed linear space W. 9.3 Compute the Jacobian matrix of the mapping •<x,y> i-^ -Kx^^y^, (x-j- y)^>. Show that its rank is two except at the origin. 9.4 Let F = <f,P,f > from U^ to U^ be defined by fi{x, y,z) = x + y + z, f2{x, y, z) = x^ + ^/^ + z^, and fs(x,y,z) = x^ + y^ + z^. Compute the Jacobian of /^ at <a,h, c>. Show that it is nonsingular unless two of the three coordinates are equal. Describe the locus of its singularities. 9.5 Compute the Jacobian of the mapping F: <x,y> \-^ -< (x-{-y)^, y^ > from IR^tolR^at <1, —l>;at <l,0>;at <a, 6>. Compute the Jacobian of (?: <s,t>^ <s — t, s -{- t> at <w, t;>. 9.6 In the above exercise compute the compositions F o G and G o F. Compute the Jacobian oi F o (7 at <yyV>. Compute the corresponding product of the Jacobians of F and G. 9.7 Compute the Jacobian matrix and determinant of the mapping T defined by X = r cos 6, y = r sm 6, z = z. Composing a function f(x, y, z) with this mapping gives a new function: gf(r, ^, z) = /(r cos ^, r sin ^, z). That is, gf = / o T. This composition (substitution) is called the change to cylindrical coordinates in W*, 9.8 Compute the Jacobian determinant of the polar coordinate transformation <r^Q> \-^ <x^y> ^ where x = r cos d^y — r sin d, 9.9 The transformation to spherical coordinates is given by x = r sin <^ cos ^, y = r sin (^ sin d^z = r cos d. Compute the Jacobian ^(^, V, z) d(r,<p,d)' 9.10 Write out the chain rule for the following special cases: dw/dt = ?, where w = F(Xyy)y x = g(t), y = h(t). Find dw/dt when w = F{xi, . . . , xj and Xi = gi(t), i = 1, . . . , n. Find dw/du when w = F{x,y), X = g{Ujv), y = h(u,v). The special case where g{u,v) = w can be rewritten £F(a;,Mx,t;)). Compute it. 9.11 li w = f(x, y), X = r cos 6, and y = r sin 6, show that [^lyl , \dw\ \dw\ , r^iy I
10. ELEMENTARY APPLICATIONS

The elementary max-min theory from the standard calculus generalizes with little change, and we include a brief discussion of it at this point.

Theorem 10.1. Let $F$ be a real-valued function defined on an open subset $A$ of a normed linear space $V$, and suppose that $F$ assumes a relative maximum value at a point $\alpha$ in $A$ where $dF_\alpha$ exists. Then $dF_\alpha = 0$.

Proof. By definition $D_\xi F(\alpha)$ is the derivative $\gamma'(0)$ of the function $\gamma(t) = F(\alpha + t\xi)$, and the domain of $\gamma$ is a neighborhood of $0$ in $\mathbb{R}$. Since $\gamma$ has a relative maximum value at $0$, we have $\gamma'(0) = 0$ by the elementary calculus. Thus $dF_\alpha(\xi) = D_\xi F(\alpha) = 0$ for all $\xi$, and so $dF_\alpha = 0$. □

A point $\alpha$ such that $dF_\alpha = 0$ is called a critical point. The theorem states that a differentiable real-valued function can have an interior extremal value only at a critical point.

If $V$ is $\mathbb{R}^n$, then the above argument shows that a real-valued function $F$ can have a relative maximum (or minimum) at $a$ only if the partial derivatives $(\partial F/\partial x_i)(a)$ are all zero, and, as in the elementary calculus, this often provides a way of calculating maximum (or minimum) values. Suppose, for example, that we want to show that the cube is the most efficient rectangular parallelepiped from the point of view of minimizing surface area for a given volume $V$. If the edges are $x$, $y$, and $z$, we have $V = xyz$ and

$$A = 2(xy + xz + yz) = 2\left(xy + \frac{V}{y} + \frac{V}{x}\right).$$

Then from $0 = \partial A/\partial x = 2(y - V/x^2)$, we see that $V = yx^2$, and, similarly, $\partial A/\partial y = 0$ implies that $V = xy^2$. Therefore, $yx^2 = xy^2$, and since neither $x$ nor $y$ can be $0$, it follows that $x = y$. Then $V = yx^2 = x^3$, and $x = V^{1/3} = y$. Finally, substituting in $V = xyz$ shows that $z = V^{1/3}$. Our critical configuration is thus a cube, with minimum area $A = 6V^{2/3}$.

It was assumed above that $A$ has an absolute minimum at some point $\langle x, y, z \rangle$. The reader might enjoy showing that $A \to \infty$ if any of $x$, $y$, $z$ tends to $0$ or $\infty$, which implies that the minimum does indeed exist.
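The cube computation above can be spot-checked numerically. The following is only a sketch: the sample volume $V = 8$ and the helper names `area` and `grad_area` are choices made here for illustration and do not come from the text.

```python
def area(x, y, V):
    """Surface area of a box with edges x, y and z = V / (x * y)."""
    return 2.0 * (x * y + V / y + V / x)


def grad_area(x, y, V):
    """Partial derivatives (dA/dx, dA/dy) of the area formula above."""
    return (2.0 * (y - V / x**2), 2.0 * (x - V / y**2))


V = 8.0                       # sample volume (an arbitrary choice)
s = V ** (1.0 / 3.0)          # edge length of the cube with volume V
gx, gy = grad_area(s, s, V)
cube_area = area(s, s, V)

# Areas at a few nearby points, all of which should exceed the cube's area.
nearby = [area(s + dx, s + dy, V) for dx in (-0.1, 0.1) for dy in (-0.1, 0.1)]

print("gradient at the cube:", gx, gy)
print("area at the cube:", cube_area, "vs 6 V^(2/3):", 6.0 * V ** (2.0 / 3.0))
print("smallest nearby area:", min(nearby))
```

Both partial derivatives vanish at $x = y = V^{1/3}$ up to rounding, and the sampled neighbors all have larger area, which is consistent with (but of course does not prove) the minimality claimed in the text.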
We shall return to the problem of determining critical points in Sections 12, 15, and 16.

The condition $dF_\alpha = 0$ is necessary but not sufficient for an interior maximum or minimum. The reader will remember a sufficient condition from beginning calculus: If $f'(x) = 0$ and $f''(x) < 0$ ($> 0$), then $x$ is a relative maximum (minimum) point for $f$. We shall prove the corresponding general theorem in Section 16. There are more possibilities now; among them we have the analogous sufficient condition that if $dF_\alpha = 0$ and $d^2F_\alpha$ is negative (positive) definite as a quadratic form on $V$, then $\alpha$ is a relative maximum (minimum) point of $F$.

We consider next the notion of a tangent plane to a graph. The calculation of tangent lines to curves and tangent planes to surfaces is ordinarily considered a geometric application of the derivative, and we take this as sufficient justification for considering the general question here.
Let $F$ be a mapping from an open subset $A$ of a normed linear space $V$ to a normed linear space $W$. When we view $F$ as a graph in $V \times W$, we think of it as a "surface" $S$ lying "over" the domain $A$, generalizing the geometric interpretation of the graph of a real-valued function of two real variables in $\mathbb{R}^3 = \mathbb{R}^2 \times \mathbb{R}$. The projection $\pi_1: V \times W \to V$ projects $S$ "down" onto $A$, and the mapping $\xi \mapsto \langle \xi, F(\xi) \rangle$ gives the point of $S$ lying "over" $\xi$. Our geometric imagery views $V$ as the plane (subspace) $V \times \{0\}$ in $V \times W$, just as we customarily visualize $\mathbb{R}$ as the real axis $\mathbb{R} \times \{0\}$ in $\mathbb{R}^2$.

We now assume that $F$ is differentiable at $\alpha$. Our preliminary discussion in Section 6 suggested that (the graph of) the linear function $dF_\alpha$ is the tangent plane to (the graph of) the function $\Delta F_\alpha$ in $V \times W$, and that its translate $M$ through $\langle \alpha, F(\alpha) \rangle$ is the tangent plane at $\langle \alpha, F(\alpha) \rangle$ to the surface $S$ that is (the graph of) $F$. The equation of this plane is $\eta - F(\alpha) = dF_\alpha(\xi - \alpha)$, and it is accordingly (the graph of) the affine function $G(\xi) = dF_\alpha(\xi - \alpha) + F(\alpha)$. Now we know that $dF_\alpha$ is the unique $T$ in $\operatorname{Hom}(V, W)$ such that $\Delta F_\alpha(\zeta) = T(\zeta) + \mathfrak{o}(\zeta)$, and if we set $\zeta = \xi - \alpha$, it is easy to see that this is the same as saying that $G$ is the unique affine map from $V$ to $W$ such that

$$F(\xi) - G(\xi) = \mathfrak{o}(\xi - \alpha).$$

That is, $M$ is the unique plane over $V$ that "fits" the surface $S$ around $\langle \alpha, F(\alpha) \rangle$ in the sense of $\mathfrak{o}$-approximation.

However, there is one further geometric fact that greatly strengthens our feeling that this really is the tangent plane.

Theorem 10.2. The plane with equation $\eta - F(\alpha) = dF_\alpha(\xi - \alpha)$ is exactly the union of all the straight lines through $\langle \alpha, F(\alpha) \rangle$ in $V \times W$ that are tangent to smooth curves on the surface $S = \operatorname{graph} F$ passing through this point. In other words, the vectors in the subspace $dF_\alpha$ of $V \times W$ are exactly the tangent vectors to curves lying in $S$ and passing through $\langle \alpha, F(\alpha) \rangle$.

Proof. This is nearly trivial.
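If $\langle \xi, \eta \rangle \in dF_\alpha$, then the arc $\gamma(t) = \langle \alpha + t\xi,\ F(\alpha + t\xi) \rangle$ in $S$ lying over the line $t \mapsto \alpha + t\xi$ in $V$ has $\langle \xi, dF_\alpha(\xi) \rangle = \langle \xi, \eta \rangle$ as its tangent vector at $\langle \alpha, F(\alpha) \rangle$, by Lemma 8.1 and Theorem 8.2. Conversely, if $t \mapsto \langle \lambda(t), F(\lambda(t)) \rangle$ is any smooth arc in $S$ passing through $\langle \alpha, F(\alpha) \rangle$, with $\alpha = \lambda(t_0)$, then its tangent vector at $\langle \alpha, F(\alpha) \rangle$ is $\langle \lambda'(t_0),\ dF_\alpha(\lambda'(t_0)) \rangle$, a vector in (the graph of) $dF_\alpha$. □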
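The converse half of the proof can be illustrated numerically. The sketch below is not part of the text's argument: it takes the map $F(x) = \langle (x_1^2 - x_2^2)/2,\ x_1x_2 \rangle$ used as an example in this section, picks a hypothetical smooth curve `lam` through $\langle 1, 2 \rangle$, and compares a finite-difference tangent vector of the lifted arc with $\langle \lambda'(0),\ dF_a(\lambda'(0)) \rangle$.

```python
import math


def F(x1, x2):
    """The sample map F(x) = <(x1^2 - x2^2)/2, x1*x2> from R^2 to R^2."""
    return ((x1**2 - x2**2) / 2.0, x1 * x2)


def dF(a, xi):
    """Differential of F at a applied to xi, using the Jacobian [[a1, -a2], [a2, a1]]."""
    a1, a2 = a
    return (a1 * xi[0] - a2 * xi[1], a2 * xi[0] + a1 * xi[1])


def lam(t):
    """A hypothetical smooth curve in the domain with lam(0) = (1, 2)."""
    return (1.0 + math.sin(t), 2.0 * math.exp(t))


def lifted(t):
    """The arc t -> <lam(t), F(lam(t))> lying on the graph of F in R^4."""
    x = lam(t)
    return x + F(*x)


def central_difference(curve, t0, h=1e-6):
    """Finite-difference approximation to the tangent vector of curve at t0."""
    p, q = curve(t0 + h), curve(t0 - h)
    return tuple((pi - qi) / (2.0 * h) for pi, qi in zip(p, q))


a = lam(0.0)
velocity = central_difference(lam, 0.0)        # approximates lam'(0) = (1, 2)
tangent = central_difference(lifted, 0.0)      # tangent vector of the lifted arc
predicted = velocity + dF(a, velocity)         # <lam'(0), dF_a(lam'(0))>

print("finite-difference tangent:", tangent)
print("predicted by Theorem 10.2:", predicted)
```

Up to discretization error the two vectors agree, so the tangent vector of the lifted arc lies in the graph of $dF_a$, as the theorem asserts.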
As an example of the general tangent plane discussed above, let $F = \langle f_1, f_2 \rangle$ be the map from $\mathbb{R}^2$ to $\mathbb{R}^2$ defined by $f_1(x) = (x_1^2 - x_2^2)/2$, $f_2(x) = x_1x_2$. The graph of $F$ is a surface over $\mathbb{R}^2$ in $\mathbb{R}^4 = \mathbb{R}^2 \times \mathbb{R}^2$. According to our above discussion, the tangent plane at $\langle a, F(a) \rangle$ has the equation $y = dF_a(x - a) + F(a)$. At $a = \langle 1, 2 \rangle$ the Jacobian matrix of $dF_a$ is

$$\begin{bmatrix} x_1 & -x_2 \\ x_2 & x_1 \end{bmatrix}_{\langle 1, 2 \rangle} = \begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix},$$

and $F(a) = \langle -\tfrac{3}{2}, 2 \rangle$. The equation of the tangent plane $M$ at $\langle 1, 2 \rangle$ is thus

$$\langle y_1, y_2 \rangle = \begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix} \langle x_1 - 1,\ x_2 - 2 \rangle + \langle -\tfrac{3}{2}, 2 \rangle.$$

Computing the matrix product, we have the scalar equations

$$y_1 = x_1 - 2x_2 + (-1 + 4 - \tfrac{3}{2}) = x_1 - 2x_2 + \tfrac{3}{2},$$
$$y_2 = 2x_1 + x_2 + (-2 - 2 + 2) = 2x_1 + x_2 - 2.$$

Note that these two equations present the affine space $M$ as the intersection of the hyperplane in $\mathbb{R}^4$ consisting of all $\langle x_1, x_2, y_1, y_2 \rangle$ such that $x_1 - 2x_2 - y_1 = -\tfrac{3}{2}$, with the hyperplane having the equation $2x_1 + x_2 - y_2 = 2$.

EXERCISES

10.1 Find the maximum value of $f(x, y, z) = x + y + z$ on the ellipsoid $x^2 + 2y^2 + 3z^2 = 1$.

10.2 Find the maximum value of the linear functional $f(x) = \sum_1^n c_i x_i$ on the unit sphere $\sum_1^n x_i^2 = 1$.

10.3 Find the (minimum) distance between the two lines t8^ and $y = s\langle 1, 1, 1 \rangle + \langle 1, 0, -1 \rangle$.

10.4 Show that there is a uniquely determined pair of closest points on the two lines $x = ta + l$ and $y = sb + m$ in $\mathbb{R}^n$ unless $b = ka$ for some $k$. We assume that $a \neq 0 \neq b$. Remember that if $b$ is not of the form $ka$, then $|(a, b)| < \|a\|_2\,\|b\|_2$, according to the Schwarz inequality.

10.5 Show that the origin is the only critical point of $f(x, y, z) = xy + yz + zx$. Find a line through the origin along which $0$ is a maximum point for $f$, and find another line along which $0$ is a minimum point.
10.6 In the problem of minimizing the area of a rectangular parallelepiped of given volume $V$ worked out in the text, it was assumed that

$$A = 2\left(xy + \frac{V}{y} + \frac{V}{x}\right)$$

has an absolute minimum at an interior point of the first quadrant. Prove this. Show first that $A \to \infty$ if $\langle x, y \rangle$ approaches the boundary in any way: $x \to 0$, $x \to \infty$, $y \to 0$, or $y \to \infty$.

10.7 Let $F: \mathbb{R}^2 \to \mathbb{R}^2$ be the mapping defined by $y_1 = \sin(x_1 + x_2)$, $y_2 = \cos(x_1 - x_2)$. Find the equation of the tangent plane in $\mathbb{R}^4$ to the graph of $F$ over the point $a = \langle \pi/4, \pi/4 \rangle$.

10.8 Define $F: \mathbb{R}^3 \to \mathbb{R}^2$ by 3 3 E2 V^ 3 1 1 Find the equation of the tangent plane to the graph of $F$ in $\mathbb{R}^5$ over $a = \langle 1, 2, -1 \rangle$.

10.9 Let $\omega(\xi, \eta)$ be a bounded bilinear mapping from a product normed linear space $V \times W$ to a normed linear space $X$. Show that the equation of the tangent plane to the graph $S$ of $\omega$ in $V \times W \times X$ at the point $\langle \alpha, \beta, \gamma \rangle \in S$ is

$$\zeta = \omega(\xi, \beta) + \omega(\alpha, \eta) - \omega(\alpha, \beta).$$

10.10 Let $F$ be a bounded linear functional on the normed linear space $V$. Show that the equation of the tangent plane to the graph of $F^3$ in $V \times \mathbb{R}$ over the point $\alpha$ can be written in the form $\eta = F^2(\alpha)\,(3F(\xi) - 2F(\alpha))$.

10.11 Show that if the general equation for a tangent plane given in the text is applied to a mapping $F$ in $\operatorname{Hom}(V, W)$, then it reduces to the equation for $F$ itself [$\eta = F(\xi)$], no matter where the point of tangency. (Naturally!)

10.12 Continuing Exercise 9.1, show that the tangent space to the range of $F$ in $W$ at $F(0)$ is the projection on $W$ of the tangent space to the graph of $F$ in $\mathbb{R}^2 \times W$ at the point $\langle 0, F(0) \rangle$. Now define the tangent plane to range $F$ in $W$ at $F(0)$, and show that it is similarly the projection of the tangent plane to the graph of $F$.

10.13 Let $F: V \to W$ be differentiable at $\alpha$. Show that the range of $dF_\alpha$ is the projection on $W$ of the tangent space to the graph of $F$ in $V \times W$ at the point $\langle \alpha, F(\alpha) \rangle$.

11.
THE IMPLICIT-FUNCTION THEOREM The formula for the Jacobian of a composite map that we obtained in Section 9 is reminiscent of the chain rule for the differential of a composite map that we derived earlier (Section 8). The Jacobian formula involves numbers (partial derivatives) that we multiply and add; the differential chain rule involves linear maps (partial differentials) that we compose and add. (The similarity becomes a full formal analogy if we use block decompositions.) Roughly speaking, the
whole differential calculus goes this way. In the one-variable calculus a differential is a linear map from the one-dimensional space $\mathbb{R}$ to itself, and is therefore multiplication by a number, the derivative. In the many-variable calculus when we decompose with respect to one-dimensional subspaces, we get blocks of such numbers, i.e., Jacobian matrices. When we generalize the whole theory to vector spaces that are not one-dimensional, we get essentially the same formulas but with numbers replaced by linear maps (differentials) and multiplication by composition.

Thus the derivative of an inverse function is the reciprocal of the derivative of the function: if $g = f^{-1}$ and $b = f(a)$, then $g'(b) = 1/f'(a)$. The differential of an inverse map is the composition inverse of the differential of the map: if $G = F^{-1}$ and $F(\alpha) = \beta$, then $dG_\beta = (dF_\alpha)^{-1}$.

If the equation $g(x, y) = 0$ defines $y$ implicitly as a function of $x$, $y = f(x)$, we learn to compute $f'(a)$ in the elementary calculus by differentiating $g(x, f(x)) \equiv 0$, and we get

$$\frac{\partial g}{\partial x}(a, b) + \frac{\partial g}{\partial y}(a, b)\,f'(a) = 0,$$

where $b = f(a)$. Hence

$$f'(a) = -\frac{\partial g/\partial x}{\partial g/\partial y}.$$

We shall see below that if $G(\xi, \eta) = 0$ defines $\eta$ as a function of $\xi$, $\eta = F(\xi)$, and if $\beta = F(\alpha)$, then we calculate the differential $dF_\alpha$ by differentiating the identity $G(\xi, F(\xi)) \equiv 0$, and we get a formula formally identical to the above.

Finally, in exactly the same way, the so-called auxiliary variable method of solving max-min problems in the elementary calculus has the same formal structure as our later solution of a "constrained" maximum problem by Lagrange multipliers.

In this section we shall consider the existence and differentiability of functions implicitly defined. Suppose that we are given a (vector-valued) function $G(\xi, \eta)$ of two vector variables, and we want to know whether setting $G$ equal to $0$ defines $\eta$ as a function of $\xi$, that is, whether there exists a unique function $F$ such that $G(\xi, F(\xi))$ is identically zero.
Supposing that such an "implicitly defined" function $F$ exists and that everything is differentiable, we can try to compute the differential of $F$ at $\alpha$ by differentiating the equation $G(\xi, F(\xi)) = 0$, or $G \circ \langle I, F \rangle = 0$. We get

$$dG^1_{\langle\alpha,\beta\rangle} \circ dI_\alpha + dG^2_{\langle\alpha,\beta\rangle} \circ dF_\alpha = 0,$$

where we have set $\beta = F(\alpha)$. If $dG^2_{\langle\alpha,\beta\rangle}$ is invertible, we can solve for $dF_\alpha$, getting

$$dF_\alpha = -\left(dG^2_{\langle\alpha,\beta\rangle}\right)^{-1} \circ dG^1_{\langle\alpha,\beta\rangle}.$$

Note that this has the same form as the corresponding expression from the elementary calculus that we reviewed above. If $F$ is uniquely determined, then so is $dF_\alpha$, and the above calculation therefore strongly suggests that we are
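going to need the existence of $(dG^2_{\langle\alpha,\beta\rangle})^{-1}$ as a necessary condition for the existence of a uniquely defined implicit function around the point $\langle \alpha, \beta \rangle$. Since $\beta$ is $F(\alpha)$, we also need $G(\alpha, \beta) = 0$. These considerations will lead us to the right theorem, but we shall have to postpone part of its proof to the next chapter. What we can prove here is that if there is an implicitly defined function, then it must be differentiable.

Theorem 11.1. Let $V$, $W$, and $X$ be normed linear spaces, and let $G$ be a mapping from an open subset $A \times B$ of $V \times W$ to $X$. Suppose that $F$ is a continuous mapping from $A$ to $B$ implicitly defined by the equation $G(\xi, \eta) = 0$, that is, satisfying $G(\xi, F(\xi)) = 0$ on $A$. Finally, suppose that $G$ is differentiable at $\langle \alpha, \beta \rangle$, where $\beta = F(\alpha)$, and that $dG^2_{\langle\alpha,\beta\rangle}$ is invertible. Then $F$ is differentiable at $\alpha$ and

$$dF_\alpha = -\left(dG^2_{\langle\alpha,\beta\rangle}\right)^{-1} \circ dG^1_{\langle\alpha,\beta\rangle}.$$

Proof. Set $\eta = \Delta F_\alpha(\xi)$, so that $G(\alpha + \xi, \beta + \eta) = G(\alpha + \xi, F(\alpha + \xi)) = 0$. Then

$$0 = \Delta G_{\langle\alpha,\beta\rangle}(\xi, \eta) = dG^1_{\langle\alpha,\beta\rangle}(\xi) + dG^2_{\langle\alpha,\beta\rangle}(\eta) + \mathfrak{o}(\langle \xi, \eta \rangle).$$

Applying $T^{-1}$ to this equation, where $T = dG^2_{\langle\alpha,\beta\rangle}$, and solving for $\eta$, we get

$$\eta = -T^{-1}\!\left(dG^1_{\langle\alpha,\beta\rangle}(\xi)\right) + \mathfrak{o}(\langle \xi, \eta \rangle).$$

This equation is of the form $\eta = \mathcal{O}(\xi) + \mathfrak{o}(\langle \xi, \eta \rangle)$, and since $\eta = \Delta F_\alpha(\xi)$ is an infinitesimal $\mathcal{I}(\xi)$, by the continuity of $F$ at $\alpha$, Lemmas 5.1 and 5.2 imply first that $\eta = \mathcal{O}(\xi)$ and then that $\langle \xi, \eta \rangle = \mathcal{O}(\xi)$. Thus $\mathfrak{o}(\langle \xi, \eta \rangle) = \mathfrak{o}(\mathcal{O}(\xi)) = \mathfrak{o}(\xi)$, and we have $\Delta F_\alpha(\xi) = \eta = S(\xi) + \mathfrak{o}(\xi)$, where $S = -(dG^2_{\langle\alpha,\beta\rangle})^{-1} \circ dG^1_{\langle\alpha,\beta\rangle}$, an element of $\operatorname{Hom}(V, W)$. Therefore, $F$ is differentiable at $\alpha$ and $dF_\alpha$ has the asserted value. □

We shall show in the next chapter, as an application of the fixed-point theorem, that if $V$, $W$, and $X$ are finite-dimensional, and if $G$ is a continuously differentiable mapping from an open subset $A \times B$ of $V \times W$ to $X$ such that at the point $\langle \alpha, \beta \rangle$ we have both $G(\alpha, \beta) = 0$ and $dG^2_{\langle\alpha,\beta\rangle}$ invertible, then there is a uniquely determined continuous mapping $F$ from a neighborhood $M$ of $\alpha$ to $B$ such that $F(\alpha) = \beta$ and $G(\xi, F(\xi)) = 0$ on $M$.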
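The formula of Theorem 11.1 can be tried out on a small system that happens to be explicitly solvable. The sketch below is an illustration only: the map $G(x, y) = \langle y_1^2 - x,\ y_1y_2 - 1 \rangle$ from $\mathbb{R} \times \mathbb{R}^2$ to $\mathbb{R}^2$, the point $\alpha = 4$, and all helper names are hypothetical choices, not taken from the text. For $x > 0$ the equation $G(x, y) = 0$ is solved by $F(x) = \langle \sqrt{x},\ 1/\sqrt{x} \rangle$, so the value $-(dG^2)^{-1} \circ dG^1$ can be compared with a finite-difference derivative of $F$.

```python
import math


def F_explicit(x):
    """Explicit solution y = F(x) of G(x, y) = 0 for the sample G (valid for x > 0)."""
    return (math.sqrt(x), 1.0 / math.sqrt(x))


def partials_of_G(x, y):
    """Partial differentials of the sample map G(x, y) = (y1**2 - x, y1*y2 - 1).

    Returns dG1 (a column vector in R^2) and dG2 (a 2x2 matrix, row-major).
    """
    y1, y2 = y
    dG1 = (-1.0, 0.0)
    dG2 = ((2.0 * y1, 0.0), (y2, y1))
    return dG1, dG2


def solve_2x2(m, rhs):
    """Solve m v = rhs by Cramer's rule; m must be invertible."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("dG2 is not invertible at this point")
    return ((rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det)


alpha = 4.0
beta = F_explicit(alpha)                 # beta = F(alpha) = (2, 0.5)
dG1, dG2 = partials_of_G(alpha, beta)

v = solve_2x2(dG2, dG1)                  # v = (dG2)^{-1} dG1
dF_formula = (-v[0], -v[1])              # dF = -(dG2)^{-1} o dG1

h = 1e-6                                 # step for the finite-difference check
dF_numeric = tuple(
    (p - q) / (2.0 * h)
    for p, q in zip(F_explicit(alpha + h), F_explicit(alpha - h))
)

print("formula of Theorem 11.1:", dF_formula)
print("finite difference of F: ", dF_numeric)
```

Here the domain variable is one-dimensional, so $dG^1$ is a single column and $dF_\alpha$ is just the derivative vector $F'(\alpha)$; the same computation goes through with matrices when the first variable is a vector.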
The same theorem is true for the more general class of complete normed linear spaces which we shall study in the next chapter. For these spaces it is also true that if $T^{-1}$ exists, then so does $S^{-1}$ for all $S$ sufficiently close to $T$, and the mapping $S \mapsto S^{-1}$ is continuous. Therefore $dG^2_{\langle\mu,\nu\rangle}$ is invertible for all $\langle \mu, \nu \rangle$ sufficiently close to $\langle \alpha, \beta \rangle$, and the above theorem then implies that $F$ is differentiable on a neighborhood of $\alpha$. Moreover, only continuous mappings are involved in the formula given by the theorem for $dF$: $\mu \mapsto dF_\mu$, and it follows that $F$ is in fact continuously differentiable near $\alpha$. These conclusions constitute the implicit-function theorem, which we now restate.
Theorem 11.2. Let $V$, $W$, and $X$ be finite-dimensional (or, more generally, complete) normed linear spaces, let $A \times B$ be an open subset of $V \times W$, and let $G: A \times B \to X$ be continuously differentiable. Suppose that at the point $\langle \alpha, \beta \rangle$ in $A \times B$ we have both $G(\alpha, \beta) = 0$ and $dG^2_{\langle\alpha,\beta\rangle}$ invertible. Then there is a ball $M$ about $\alpha$ and a uniquely defined continuously differentiable mapping $F$ from $M$ to $B$ such that $F(\alpha) = \beta$ and $G(\xi, F(\xi)) = 0$ on $M$.

The so-called inverse-mapping theorem is a special case of the implicit-function theorem.

Theorem 11.3. Let $H$ be a continuously differentiable mapping from an open subset $B$ of a finite-dimensional (or complete) normed linear space $W$ to a normed linear space $V$, and suppose that its differential is invertible at a point $\beta$. Then $H$ itself is invertible near $\beta$. That is, there is a ball $M$ about $\alpha = H(\beta)$ and a uniquely determined continuously differentiable function $F$ from $M$ to $B$ such that $F(\alpha) = \beta$ and $H(F(\xi)) = \xi$ on $M$.

Proof. Set $G(\xi, \eta) = \xi - H(\eta)$. Then $G$ is continuously differentiable from $V \times B$ to $V$ and $dG^2_{\langle\alpha,\beta\rangle} = -dH_\beta$ is invertible. The implicit-function theorem then gives us a ball $M$ about $\alpha$ and a uniquely determined continuously differentiable mapping $F$ from $M$ to $B$ such that $F(\alpha) = \beta$ and $0 = G(\xi, F(\xi)) = \xi - H(F(\xi))$ on $M$. □

The inverse-mapping theorem is often given a slightly different formulation which we state as a corollary.

Corollary. Under the hypotheses of the above theorem there exists an open neighborhood $U$ of $\beta$ such that $H$ is injective on $U$, $N = H[U]$ is open in $V$, and $H^{-1}$ is continuously differentiable on $N$. (See Fig. 3.11.)

Fig. 3.11

Proof. The proof of the corollary is left as an exercise.

In practice we often have to apply the Cartesian formulations of these theorems. The student should certainly be able to write these down, but we shall state them anyway, starting with the simpler inverse-mapping theorem.

Theorem 11.4.
Suppose that we are given $n$ continuously differentiable real-valued functions $G_i(y_1, \dots, y_n)$, $i = 1, \dots, n$, of $n$ real variables defined on a neighborhood $B$ of a point $b$ in $\mathbb{R}^n$ and suppose that the Jacobian determinant

$$\frac{\partial(G_1, \dots, G_n)}{\partial(y_1, \dots, y_n)}(b)$$
is not zero. Then there is a ball $M$ about $a = G(b)$ in $\mathbb{R}^n$ and a uniquely determined $n$-tuple $F = \langle F_1, \dots, F_n \rangle$ of continuously differentiable real-valued functions defined on $M$ such that $F(a) = b$ and $G(F(x)) = x$ on $M$. That is,

$$G_i(F_1(x_1, \dots, x_n), \dots, F_n(x_1, \dots, x_n)) = x_i$$

for all $x$ in $M$ and for $i = 1, \dots, n$.

For example, if $x = \langle y_1^3 + y_2^3,\ y_1^2 + y_2^2 \rangle$, then at the point $b = \langle 1, 2 \rangle$ we have

$$\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = \det \begin{bmatrix} 3y_1^2 & 3y_2^2 \\ 2y_1 & 2y_2 \end{bmatrix}_{\langle 1, 2 \rangle} = \det \begin{bmatrix} 3 & 12 \\ 2 & 4 \end{bmatrix} = -12 \neq 0,$$

and we therefore know without trying to solve explicitly that there is a unique solution for $y$ in terms of $x$ near $x = \langle 1^3 + 2^3,\ 1^2 + 2^2 \rangle = \langle 9, 5 \rangle$. The reader would find it virtually impossible to solve for $y$, since he would quickly discover that he had to solve a polynomial equation of degree 6. This clearly shows the power of the theorem: we are guaranteed the existence of a mapping which may be very difficult if not impossible to find explicitly. (However, in the next chapter we shall discover an iterative procedure for approximating the inverse mapping as closely as we want.)

Everything we have said here applies all the more to the implicit-function theorem, which we now state in Cartesian form.

Theorem 11.5. Suppose that we are given $m$ continuously differentiable real-valued functions $G_i(x, y) = G_i(x_1, \dots, x_n, y_1, \dots, y_m)$ of $n + m$ real variables defined on an open subset $A \times B$ of $\mathbb{R}^{n+m}$ and an $(n + m)$-tuple $\langle a, b \rangle = \langle a_1, \dots, a_n, b_1, \dots, b_m \rangle$ such that $G_i(a, b) = 0$ for $i = 1, \dots, m$, and such that the Jacobian determinant

$$\frac{\partial(G_1, \dots, G_m)}{\partial(y_1, \dots, y_m)}(a, b)$$

is not zero. Then there is a ball $M$ about $a$ in $\mathbb{R}^n$ and a uniquely determined $m$-tuple $F = \langle F_1, \dots, F_m \rangle$ of continuously differentiable real-valued functions $F_j(x) = F_j(x_1, \dots, x_n)$ defined on $M$ such that $b = F(a)$ and $G_i(x, F(x)) = 0$ on $M$ for $i = 1, \dots, m$. That is, $b_i = F_i(a_1, \dots, a_n)$ for $i = 1, \dots, m$, and

$$G_i(x_1, \dots, x_n;\ F_1(x_1, \dots, x_n), \dots, F_m(x_1, \dots, x_n)) = 0$$

for all $x$ in $M$ and for $i = 1, \dots, m$.

For example, the equations

$$x_1^3 + x_2^3 - y_1^2 - y_2^2 = 0,$$
$$x_1^2 - x_2^2 - y_1^3 - y_2^3 = 0$$

can be solved uniquely for $y$ in terms of $x$ near $\langle x, y \rangle = \langle 1, 1, 1, -1 \rangle$,
because they hold at that point and because

$$\frac{\partial(G_1, G_2)}{\partial(y_1, y_2)} = \det \begin{bmatrix} -2y_1 & -2y_2 \\ -3y_1^2 & -3y_2^2 \end{bmatrix} = 6y_1y_2^2 - 6y_1^2y_2$$

has the value 12 there. Of course, we mean only that the solution functions exist, not that we can explicitly produce them.

EXERCISES

11.1 Show that $\langle x, y \rangle \mapsto \langle e^x + e^y,\ e^x + e^{-y} \rangle$ is locally invertible about any point $\langle a, b \rangle$, and compute the Jacobian matrix of the inverse map.

11.2 Show that $\langle u, v \rangle \mapsto \langle e^u + e^v,\ e^u - e^v \rangle$ is locally invertible about any point $\langle a, b \rangle$ in $\mathbb{R}^2$, by computing the Jacobian matrix. In this case the whole mapping is invertible, with an easily computed inverse. Make this calculation, compute the Jacobian matrix of the inverse map, and verify that the two matrices are inverses at the appropriate points.

11.3 Show that the mapping $\langle x, y, z \rangle \mapsto \langle \sin x,\ \cos y,\ e^z \rangle$ from $\mathbb{R}^3$ to $\mathbb{R}^3$ is locally invertible about $\langle 0, \pi/2, 0 \rangle$. Show that $\langle x, y, z \rangle \mapsto \langle \sin(x + y + z),\ \cos(x - y + z),\ e^{x + y - z} \rangle$ is locally invertible about $\langle \pi/4, -\pi/4, 0 \rangle$.

11.4 Express the second map of the above exercise as the composition of two maps, and obtain your answer a second way.

11.5 Let $F$: $\langle x, y \rangle \mapsto \langle u, v \rangle$ be the mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$ defined by $u = x^2 + y^2$, $v = 2xy$. Compute an inverse $G$ of $F$, being careful to give the domain and range of $G$. How many inverse mappings are there? Compute the Jacobian matrices of $F$ at $\langle 1, 2 \rangle$ and of $G$ at $\langle 5, 4 \rangle$, and show by multiplying them that they are inverse.

11.6 Consider now the mapping $F$: $\langle x, y \rangle \mapsto \langle x^3, y^3 \rangle$. Show that $dF_{\langle 0, 0 \rangle}$ is singular and yet that the mapping has an inverse $G$. What conclusion do we draw about the differentiability of $G$ at the origin?

11.7 Define $F: \mathbb{R}^2 \to \mathbb{R}^2$ by $\langle x, y \rangle \mapsto \langle e^x \cos y,\ e^x \sin y \rangle$. Prove that $F$ is locally invertible about every point.

11.8 Define $F: \mathbb{R}^3 \to \mathbb{R}^3$ by $x \mapsto y$, where yi = ^1 + ^2 + (^3 - 1)^, y2 = X1 + X2+ (x3 - 3x3), 2/3 = ^1 + ^2 + ^3. Prove that $x \mapsto y = F(x)$ is locally invertible about $x = \langle 0, 0, 1 \rangle$.
11.9 For a function $f: \mathbb{R} \to \mathbb{R}$ the proof of local invertibility around a point $a$ where $df_a$ is nonsingular is much simpler than the general case. Show first that the Jacobian matrix of $f$ at $a$ is the number $f'(a)$. We are therefore assuming that $f'(x)$ is continuous in a neighborhood of $a$ and that $f'(a) \neq 0$. Prove that then $f$ is strictly increasing (or decreasing) in an interval about $a$. Now finish the theorem. (See Exercise 1.12.)
11.10 Show that the equations t^ + x^ + y^ + z^ = 0, t + x^ + y^ + z^ = 4, t + x + y + z = 0 have differentiable solutions $x(t)$, $y(t)$, $z(t)$ around $\langle t, x, y, z \rangle = \langle 0, -1, 1, 0 \rangle$.

11.11 Show that the equations

$$e^x + e^{2y} + e^{3u} + e^{4v} = 4,$$
$$e^x + e^y + e^u + e^v = 4$$

can be uniquely solved for $u$ and $v$ in terms of $x$ and $y$ around the point $\langle 0, 0, 0, 0 \rangle$.

11.12 Let $S$ be the graph of the equation $xz + \sin(xy) + \cos(xz) = 1$ in $\mathbb{R}^3$. Determine whether in the neighborhood of $(0, 1, 1)$ $S$ is the graph of a differentiable function in any of the following forms: $z = f(x, y)$, $x = g(y, z)$, $y = h(x, z)$.

11.13 Given functions $f$ and $g$ from $\mathbb{R}^3$ to $\mathbb{R}$ such that $f(a, b, c) = 0$ and $g(a, b, c) = 0$, write down the condition on the partial derivatives of $f$ and $g$ that guarantees the existence of a unique pair of differentiable functions $y = h(x)$ and $z = k(x)$ satisfying $h(a) = b$, $k(a) = c$, and

$$f(x, y, z) = f(x, h(x), k(x)) = 0, \qquad g(x, y, z) = g(x, h(x), k(x)) = 0$$

around $\langle a, b, c \rangle$.

11.14 Let $G(\xi, \eta, \zeta)$ be a continuously differentiable mapping from $V = \prod_1^3 V_i$ to $W$ such that $dG^3_\alpha: V_3 \to W$ is invertible and $G(\alpha) = G(\alpha_1, \alpha_2, \alpha_3) = 0$. Prove that there exists a uniquely determined function $\zeta = F(\xi, \eta)$ defined around $\langle \alpha_1, \alpha_2 \rangle$ in $V_1 \times V_2$ such that $G(\xi, \eta, F(\xi, \eta)) = 0$ and $F(\alpha_1, \alpha_2) = \alpha_3$. Also show that

$$dF^1_{\langle\xi,\eta\rangle} = -\left[dG^3_{\langle\xi,\eta,\zeta\rangle}\right]^{-1} \circ dG^1_{\langle\xi,\eta,\zeta\rangle},$$

where $\zeta = F(\xi, \eta)$.

11.15 Let $F(\xi, \eta)$ be a continuously differentiable function from $V \times W$ to $X$, and suppose that $dF^2_{\langle\alpha,\beta\rangle}$ is invertible. Setting $\gamma = F(\alpha, \beta)$, show that there is a product neighborhood $L \times M \times N$ of $\langle \gamma, \alpha, \beta \rangle$ in $X \times V \times W$ and a unique continuously differentiable mapping $G: L \times M \to N$ such that on $L \times M$, $F(\xi, G(\zeta, \xi)) = \zeta$.

11.16 Suppose that the equation $g(x, y, z) = 0$ can be solved for $z$ in terms of $x$ and $y$. This means that there is a function $f(x, y)$ such that $g(x, y, f(x, y)) = 0$. Suppose also that everything is differentiable and compute $\partial z/\partial x$.

11.17 Suppose that the equations $g(x, y, z) = 0$ and $h(x, y, z) = 0$ can be solved for $y$ and $z$ as functions of $x$.
Compute $dy/dx$.

11.18 Suppose that $g(x, y, u, v) = 0$ and $h(x, y, u, v) = 0$ can be solved for $u$ and $v$ as functions of $x$ and $y$. Compute $\partial u/\partial x$.

11.19 Compute $dz/dx$ where x^ + y^ + z^ = 0 and x^ + y^ + z^ = 1.

11.20 If t^ + x^ + y^ + z^ = 0 and t^ + x^ + y^ + z^ = 1, then $\partial z/\partial x$ is ambiguous. We are obviously going to think of two of the variables as functions of the other two.
Also $z$ is going to be dependent and $x$ independent. But is $t$ or $y$ going to be the other independent variable? Compute $\partial z/\partial x$ under each of these assumptions.

11.21 We are given four "physical variables" $p$, $v$, $t$, and $\varphi$ such that each of them is a function of any two of the other three. Show that $\partial t/\partial \varphi$ has two quite different meanings, and make explicit what the relationship between them is by labeling the various functions that are relevant and applying the implicit differentiation process.

11.22 Again the "one-dimensional" case is substantially simpler. Let $G$ be a continuously differentiable mapping from $\mathbb{R}^2$ to $\mathbb{R}$ such that $G(a, b) = 0$ and $(\partial G/\partial y)(a, b) = G_2(a, b) > 0$. Show that there are positive numbers $\epsilon$ and $\delta$ such that for each $c$ in $(a - \delta, a + \delta)$ the function $g(y) = G(c, y)$ is strictly increasing on $[b - \epsilon, b + \epsilon]$ and $G(c, b - \epsilon) < 0 < G(c, b + \epsilon)$. Conclude from the intermediate-value theorem (Exercise 1.13) that there exists a unique function $F: (a - \delta, a + \delta) \to (b - \epsilon, b + \epsilon)$ such that $G(x, F(x)) = 0$.

11.23 By applying the same argument used in the above exercise a second time, prove that $F$ is continuous.

11.24 In the inverse-function theorem show that $dF_\alpha = (dH_\beta)^{-1}$. That is, the differential of the inverse of $H$ is the inverse of the differential of $H$. Show this
a) by applying the implicit-function theorem;
b) by a direct calculation from the identity $H(F(\xi)) = \xi$.

11.25 Again in the context of the inverse-mapping theorem, show that there is a neighborhood $M$ of $\beta$ in $B$ such that $F(H(\eta)) = \eta$ on $M$. (Don't work at this. Just apply the theorem again.)

11.26 We continue in the context of the inverse-mapping theorem. Assume the result (from the next chapter) that if $dH_\beta^{-1}$ exists, then so does $dH_\xi^{-1}$ for $\xi$ sufficiently close to $\beta$. Show that there is an open neighborhood $U$ of $\beta$ in $B$ such that $H$ is injective on $U$, $H[U]$ is an open set $N$ in $V$, and $H^{-1}$ is continuously differentiable on $N$.
11.27 Use Exercise 3.21 to give a direct proof of the existence of a Lipschitz continuous local inverse in the context of the inverse-mapping theorem. [Hint: Apply Theorem 7.4.]

11.28 A direct proof of the differentiability of an inverse function is simpler than the implicit-function theorem proof. Work out such a proof, modeling your arguments in a general way upon those in Theorem 11.1.

11.29 Prove that the implicit-function theorem can be deduced from the inverse-function theorem as follows. Set $H(\xi, \eta) = \langle \xi, G(\xi, \eta) \rangle$, and show that $dH_{\langle\alpha,\beta\rangle}$ has the block diagram

$$\begin{bmatrix} I & 0 \\ dG^1 & dG^2 \end{bmatrix}.$$

Then show that $(dH_{\langle\alpha,\beta\rangle})^{-1}$ exists from the block diagram results of Chapter 1. Apply the inverse-mapping theorem.
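The example following Theorem 11.4 guarantees a local inverse of $y \mapsto \langle y_1^3 + y_2^3,\ y_1^2 + y_2^2 \rangle$ near $b = \langle 1, 2 \rangle$ without producing it. As a rough illustration of how such an inverse can be approximated, the sketch below uses a Newton-type iteration with the Jacobian frozen at $b$. This particular scheme, the step count, and the names `G`, `apply_inverse`, and `local_inverse` are assumptions made here for illustration; the text defers its own iterative procedure to the next chapter, and that procedure may differ.

```python
def G(y):
    """The map y -> <y1^3 + y2^3, y1^2 + y2^2> from the example in the text."""
    y1, y2 = y
    return (y1**3 + y2**3, y1**2 + y2**2)


# Jacobian matrix of G at b = (1, 2), as computed in the text (determinant -12).
J_B = ((3.0, 12.0), (2.0, 4.0))


def apply_inverse(m, r):
    """Return m^{-1} r for an invertible 2x2 matrix m (Cramer's rule)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det)


def local_inverse(x, start=(1.0, 2.0), steps=60):
    """Approximate the y near b with G(y) = x, for x close to G(b) = (9, 5).

    Each step replaces y by y - J_B^{-1} (G(y) - x); the Jacobian is kept
    fixed at b, so this is only expected to converge for x near (9, 5).
    """
    y = start
    for _ in range(steps):
        gy = G(y)
        residual = (gy[0] - x[0], gy[1] - x[1])
        corr = apply_inverse(J_B, residual)
        y = (y[0] - corr[0], y[1] - corr[1])
    return y


target = (1.05, 1.95)        # a point near b, chosen arbitrarily
x = G(target)                # its image, which lies near (9, 5)
y = local_inverse(x)         # should recover the chosen point

print("recovered y:", y)
print("residual G(y) - x:", tuple(g - xi for g, xi in zip(G(y), x)))
```

The iteration never solves the degree-6 polynomial mentioned in the text; it only uses values of $G$ and the single matrix $J_G(b)$, whose invertibility is exactly the hypothesis of Theorem 11.4.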
12. SUBMANIFOLDS AND LAGRANGE MULTIPLIERS

If $V$ and $W$ are finite-dimensional spaces, with dimensions $n$ and $m$, respectively, and if $F$ is a continuous mapping from an open subset $A$ of $V$ to $W$, then (the graph of) $F$ is a subset of $V \times W$ which we visualize as a kind of "$n$-dimensional surface" $S$ spread out over $A$. (See Section 10.) We shall call $F$ an $n$-dimensional patch in $V \times W$. More generally, if $X$ is any $(n + m)$-dimensional vector space, we shall call a subset $S$ an $n$-dimensional patch if there is an isomorphism $\varphi$ from $X$ to a product space $V \times W$ such that $V$ is $n$-dimensional and $\varphi[S]$ is a patch in $V \times W$. That is, $S$ becomes a patch in the above sense when $X$ is considered to be $V \times W$. This means that if $\pi_1$ is the projection of $X = V \times W$ onto $V$, then $\pi_1[S]$ is an open subset $A$ of $V$, and the restriction $\pi_1 \restriction S$ is one-to-one and has a continuous inverse. If $\pi_2$ is the projection on $W$, then $F = \pi_2 \circ (\pi_1 \restriction S)^{-1}$ is the map from $A$ to $W$ whose graph in $V \times W$ is $S$ (when $V \times W$ is identified with $X$).

Now there are important surfaces that aren't such "patch" surfaces. Consider, for instance, the surface of the unit ball in $\mathbb{R}^3$, $S = \{x : \sum_1^3 x_i^2 = 1\}$. $S$ is obviously a two-dimensional surface in $\mathbb{R}^3$ which cannot be expressed as a graph, no matter how we try to express $\mathbb{R}^3$ as a direct sum. However, it should be equally clear that $S$ is the union of overlapping surface patches. If $\alpha$ is any point on $S$, then any sufficiently small neighborhood $N$ of $\alpha$ in $\mathbb{R}^3$ will intersect $S$ in a patch; we take $V$ as the subspace parallel to the tangent plane at $\alpha$ and $W$ as the perpendicular line through $0$. Moreover, this property of $S$ is a completely adequate definition of what we mean by a submanifold.

A subset $S$ of an $(n + m)$-dimensional vector space $X$ is an $n$-dimensional submanifold of $X$ if each $\alpha$ on $S$ has a neighborhood $N$ in $X$ whose intersection with $S$ is an $n$-dimensional patch.
We say that $S$ is smooth if all these patches $S_\alpha$ are smooth, that is, if the function $F: A \to W$ whose graph in $V \times W$ is the patch $S_\alpha$ (when $X$ is viewed as $V \times W$) is continuously differentiable for every such patch $S_\alpha$. The sphere we considered above is a two-dimensional smooth submanifold of $\mathbb{R}^3$.

Submanifolds are frequently presented as zero sets of mappings. For example, our sphere above is the zero set of the mapping $G$ from $\mathbb{R}^3$ to $\mathbb{R}$ defined by $G(x) = \sum_1^3 x_i^2 - 1$. It is obviously important to have a condition guaranteeing that such a null set is a submanifold.

Theorem 12.1. Let $G$ be a continuously differentiable mapping from an open subset $U$ of an $(n + m)$-dimensional vector space $X$ to an $m$-dimensional vector space $Y$ such that $dG_\alpha$ is surjective for every $\alpha$ in the zero set $S$ of $G$. Then $S$ is an $n$-dimensional submanifold of $X$.

Proof. Choose any point $\gamma$ of $S$. Since $dG_\gamma$ is surjective from the $(n + m)$-dimensional vector space $X$ to the $m$-dimensional vector space $Y$, we know that the null space $V$ of $dG_\gamma$ has dimension $n$ (Theorem 2.4, Chapter 2). Let $W$ be any
complement of $V$, and think of $X$ as $V \times W$, so that $G$ now becomes a function of two vector variables and $\gamma$ is a point $\langle \alpha, \beta \rangle$ such that $G(\alpha, \beta) = 0$. The restriction of $dG_{\langle\alpha,\beta\rangle}$ to $W$ is an isomorphism from $W$ to $Y$; that is, $(dG^2_{\langle\alpha,\beta\rangle})^{-1}$ exists. Therefore, by the implicit-function theorem, there is a product neighborhood $S_\delta(\alpha) \times S_r(\beta)$ of $\langle \alpha, \beta \rangle$ in $X$ whose intersection with $S$ is the graph of a function on $S_\delta(\alpha)$. This proves our theorem. □

If $S$ is a smooth submanifold, then the function $F$ whose graph is the patch of $S$ around $\gamma$ (when $X$ is viewed suitably as $V \times W$) is continuously differentiable, and therefore $S$ has a uniquely determined $n$-dimensional tangent plane $M$ at $\gamma$ that fits $S$ most closely around $\gamma$ in the sense of our $\mathfrak{o}$-approximations. If $\gamma = 0$, this tangent plane is an $n$-dimensional subspace, and in general it is the translate through $\gamma$ of a subspace $N$. We call $N$ the tangent space of $S$ at $\gamma$; its elements are exactly the vectors in $X$ tangent to parametrized arcs drawn in $S$ through $\gamma$.

What we are going to do later is to describe an $n$-dimensional manifold $S$ independently of any imbedding of $S$ in a vector space. The tangent space to $S$ at a point $\gamma$ will still be an invaluable notion, but we are not going to be able to visualize it by an actual tangent plane in a space $X$ carrying $S$. Instead, we will have to construct the vector space tangent to $S$ at $\gamma$ somehow. The clue is provided by Theorem 10.2, which tells us that if $S$ is imbedded as a submanifold in a vector space $X$, then each vector tangent to $S$ at $\gamma$ can be presented as the unique tangent vector at $\gamma$ to some smooth curve lying in $S$. This mapping from the set of smooth curves in $S$ through $\gamma$ to the tangent space at $\gamma$ is not injective; clearly, different curves can be tangent to each other at $\gamma$ and so have the same tangent vector there.
Therefore, the object in $S$ that corresponds to a tangent vector at $\gamma$ is an equivalence class of smooth curves through $\gamma$, and this will in fact be our definition of a tangent vector for a general manifold.

The notion of a submanifold allows us to consider in an elegant way a classical "constrained" maximum problem. We are given an open subset $U$ of a finite-dimensional vector space $X$, a differentiable real-valued function $F$ defined on $U$, and a submanifold $S$ lying in $U$. We shall suppose that the submanifold $S$ is the zero set of a continuously differentiable mapping $G$ from $U$ to a vector space $Y$ such that $dG_\gamma$ is surjective for each $\gamma$ on $S$. We wish to consider the problem of maximizing (or minimizing) $F(\gamma)$ when $\gamma$ is "constrained" to lie on $S$. We cannot expect to find such a maximum point $\gamma_0$ by setting $dF_\gamma = 0$ and solving for $\gamma$, because $\gamma_0$ will not be a critical point for $F$. Consider, for example, the function $g(\mathbf{x}) = \sum_1^3 x_i^2 - 1$ from $\mathbb{R}^3$ to $\mathbb{R}$ and $F(\mathbf{x}) = x_2$. Here the "surface" defined by $g = 0$ is the unit sphere $\sum_1^3 x_i^2 = 1$, and on this sphere $F$ has its maximum value 1 at $\langle 0, 1, 0 \rangle$. But $F$ is linear, and so $dF_\gamma = F$ can never be the zero transformation. The device known as Lagrange multipliers shows that we can nevertheless find such constrained critical points by solving $dL_\gamma = 0$ for a suitable function $L$.
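The sphere example just described can be checked numerically. What follows is only a minimal sketch, not part of the original text: the helper names (`grad_F`, `grad_g`, `multiplier_residual`) are hypothetical, the gradients are written out by hand, and the multiplier value $c = 1/2$ is an assumption read off from those gradients at $\langle 0, 1, 0 \rangle$. It illustrates that the gradient of $F$ is nonzero at the constrained maximum, while the gradient of $F - cg$ vanishes there.

```python
import math
import random

def F(x):
    """Objective F(x) = x_2 on R^3 (components are 0-indexed in the code)."""
    return x[1]

def g(x):
    """Constraint function g(x) = x_1^2 + x_2^2 + x_3^2 - 1; the sphere is g = 0."""
    return sum(t * t for t in x) - 1.0

def grad_F(x):
    # F is linear, so its gradient is the same at every point and never zero.
    return [0.0, 1.0, 0.0]

def grad_g(x):
    return [2.0 * t for t in x]

def multiplier_residual(x, c):
    """Components of grad F - c * grad g at the point x."""
    return [a - c * b for a, b in zip(grad_F(x), grad_g(x))]

def random_sphere_point():
    """A random point on the unit sphere, obtained by normalizing a Gaussian vector."""
    v = [random.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]

gamma = [0.0, 1.0, 0.0]   # the constrained maximum point <0, 1, 0>
c = 0.5                   # assumed multiplier, solving 1 - 2*c*x_2 = 0 at x_2 = 1

residual = multiplier_residual(gamma, c)

# Sample the sphere to confirm that F does not exceed F(gamma) = 1 on it.
random.seed(0)
max_sampled = max(F(random_sphere_point()) for _ in range(10000))
```

Here `residual` has all components zero although `grad_F(gamma)` does not, which is the situation the constrained problem has to cope with.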
Theorem 12.2. Suppose that $F$ has a maximum value on $S$ at the point $\gamma$. Then there is a functional $l$ in $Y^*$ such that $\gamma$ is a critical point of the function $F - (l \circ G)$.

Proof. By the implicit-function theorem we can express $X$ as $V \times W$ in such a way that the neighborhood of $S$ around $\gamma$ is the graph of a mapping $H$ from an open set $A$ in $V$ to $W$. Thus, expressing $F$ and $G$ as functions on $V \times W$, we have $G(\xi, \eta) = 0$ near $\gamma = \langle \alpha, \beta \rangle$ if and only if $\eta = H(\xi)$, and the restriction of $F(\xi, \eta)$ to this zero surface is thus the function $K: A \to \mathbb{R}$ defined by $K(\xi) = F(\xi, H(\xi))$. By assumption $\alpha$ is a critical point for this function. Thus

$$0 = dK_\alpha = dF^1_{\langle \alpha, \beta \rangle} + dF^2_{\langle \alpha, \beta \rangle} \circ dH_\alpha.$$

Also from the identity $G(\xi, H(\xi)) = 0$, we get

$$0 = dG^1_{\langle \alpha, \beta \rangle} + dG^2_{\langle \alpha, \beta \rangle} \circ dH_\alpha.$$

Since $dG^2_{\langle \alpha, \beta \rangle}$ is invertible, we can solve the second equation for $dH_\alpha$ and substitute in the first, thus getting, dropping the subscripts for simplicity,

$$dF^1 - dF^2 \circ (dG^2)^{-1} \circ dG^1 = 0.$$

Let $l \in Y^*$ be the functional $dF^2 \circ (dG^2)^{-1}$. Then we have $dF^1 = l \circ dG^1$ and, by definition, $dF^2 = l \circ dG^2$. Composing the first equation (on the right) with $\pi_1: V \times W \to V$ and the second with $\pi_2$, and adding, we get $dF_{\langle \alpha, \beta \rangle} = l \circ dG_{\langle \alpha, \beta \rangle}$. That is, $d(F - l \circ G)_\gamma = 0$. □

Nothing we have said so far explains the phrase "Lagrange multipliers". This comes out of the Cartesian expression of the theorem, where we have $U$ an open subset of a Cartesian space $\mathbb{R}^n$, $Y = \mathbb{R}^m$, $G = \langle g^1, \ldots, g^m \rangle$, and $l$ in $Y^*$ of the form $l_{\mathbf{c}}$: $l(\mathbf{y}) = \sum_1^m c_i y_i$. Then $F - l \circ G = F - \sum_1^m c_i g^i$ and $d(F - l \circ G)_{\mathbf{a}} = 0$ becomes

$$\frac{\partial F}{\partial x_j} - \sum_{i=1}^m c_i \frac{\partial g^i}{\partial x_j} = 0, \qquad j = 1, \ldots, n.$$

These $n$ equations together with the $m$ equations $G = \langle g^1, \ldots, g^m \rangle = 0$ give $m + n$ equations in the $m + n$ unknowns $x_1, \ldots, x_n, c_1, \ldots, c_m$.

Our original trivial example will show how this works out in practice. We want to maximize $F(\mathbf{x}) = x_2$ from $\mathbb{R}^3$ to $\mathbb{R}$ subject to the constraint $\sum_1^3 x_i^2 = 1$. Here $g(\mathbf{x}) = \sum_1^3 x_i^2 - 1$ is also from $\mathbb{R}^3$ to $\mathbb{R}$, and our method tells us to look for a critical point of $F - cg$ subject to $g = 0$.
Our system of equations is

$$0 - 2cx_1 = 0, \qquad 1 - 2cx_2 = 0, \qquad 0 - 2cx_3 = 0, \qquad x_1^2 + x_2^2 + x_3^2 = 1.$$
The first says that $c = 0$ or $x_1 = 0$, and the second implies that $c$ cannot be 0. Therefore, $x_1 = x_3 = 0$, and the fourth equation then shows that $x_2 = \pm 1$.

Another example is our problem of minimizing the surface area $A = 2(xy + yz + zx)$ of a rectangular parallelepiped, subject to the constraint of a constant volume, $xyz = V$. The theorem says that the minimum point will be a critical point of $A - \lambda V$ for some $\lambda$ (with $V$ here standing for the volume function $xyz$), and, setting the differential of this function equal to zero, we get the equations

$$2(y + z) - \lambda yz = 0, \qquad 2(x + z) - \lambda xz = 0, \qquad 2(x + y) - \lambda xy = 0,$$

together with the constraint $xyz = V$. The first three equations imply that $x = y = z$; the last then gives $V^{1/3}$ as the common value.

*13. FUNCTIONAL DEPENDENCE

The question, roughly, is this: If we are given a collection of continuous functions, all defined on some open set $A$, how can we tell whether or not some of them are functions of the rest? For example, if we are given three real-valued continuous functions $f_1$, $f_2$, and $f_3$, how can we tell whether or not some one of them is a function of the other two, say $f_3$ is a function of $f_1$ and $f_2$, which means that there is a function of two variables $g(x, y)$ such that $f_3(t) = g(f_1(t), f_2(t))$ for all $t$ in the common domain $A$? If this happens, we say that $f_3$ is functionally dependent on $f_1$ and $f_2$.

This is very nearly the same as asking when it will be the case that the range $S$ of the mapping $F: t \mapsto \langle f_1(t), f_2(t), f_3(t) \rangle$ is a two-dimensional submanifold of $\mathbb{R}^3$. However, there are differences in these questions that are worth noting. If $f_3$ is functionally dependent on $f_1$ and $f_2$, then the range of $F$ certainly lies on a two-dimensional submanifold of $\mathbb{R}^3$, namely, the graph of $g$. But this is no guarantee that it itself forms a two-dimensional submanifold. For example, both $f_2$ and $f_3$ might be functionally dependent on $f_1$, $f_2 = g \circ f_1$ and $f_3 = h \circ f_1$, in which case the range of $F$ lies on the curve $\langle s, g(s), h(s) \rangle$ in $\mathbb{R}^3$, which is a one-dimensional submanifold.
In the opposite direction, the range of $F$ can be a two-dimensional submanifold $M$ without $f_3$ being functionally dependent on $f_2$ and $f_1$. All we can conclude in this case is that locally one of the functions $\{f_i\}_1^3$ is a function of the other two, since locally $M$ is a surface patch, in the language of the last section. But if we move a little bit away on the curving surface $M$ to the neighborhood of another point, we may have to solve for a different one of the functions. Nevertheless, if $M = \operatorname{range} F$ is a subset of a two-dimensional manifold, it is reasonable to say that the functions $\{f_i\}_1^3$ are functionally dependent, and we are led to examine this more natural notion.
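As a concrete illustration of functional dependence, here is a minimal numerical sketch. It is not from the original text: the particular functions `f1`, `f2`, `g`, and `independent_f3`, the sample point, and the finite-difference step are all illustrative assumptions. When $f_3 = g(f_1, f_2)$, the chain rule makes the differential of $f_3$ a linear combination of those of $f_1$ and $f_2$, so the $3 \times 3$ Jacobian determinant of $\langle f_1, f_2, f_3 \rangle$ should vanish up to discretization error, while a generic third function should give a nonzero determinant.

```python
def f1(t):
    return t[0] + t[1] + t[2]

def f2(t):
    return t[0] * t[1] * t[2]

def g(x, y):
    """Hypothetical function of two variables expressing the dependence."""
    return x * x - 3.0 * y

def f3(t):
    # f3 is functionally dependent on f1 and f2 by construction.
    return g(f1(t), f2(t))

def independent_f3(t):
    # A third function chosen so that it is not a function of f1 and f2 near the sample point.
    return t[0] * t[0] + 2.0 * t[1]

def jacobian(funcs, t, h=1e-6):
    """Central-difference Jacobian matrix [d f_i / d t_j] at the point t."""
    rows = []
    for f in funcs:
        row = []
        for j in range(len(t)):
            t_plus = list(t)
            t_minus = list(t)
            t_plus[j] += h
            t_minus[j] -= h
            row.append((f(t_plus) - f(t_minus)) / (2.0 * h))
        rows.append(row)
    return rows

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

point = [0.7, -1.3, 2.1]
det_dependent = det3(jacobian([f1, f2, f3], point))
det_independent = det3(jacobian([f1, f2, independent_f3], point))
```

A vanishing determinant at one point is only a pointwise symptom; the discussion that follows in the text concerns the rank of the differential on a whole open set.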
If we assume that $F = \langle f_1, f_2, f_3 \rangle$ is continuously differentiable and that the rank of $dF_\alpha$ is 3 at some point $\alpha$ in $A$, then the implicit-function theorem implies that $F[A]$ includes a whole ball in $\mathbb{R}^3$ about the point $F(\alpha)$. Thus a necessary condition for $M = \operatorname{range} F$ to lie on a two-dimensional submanifold in $\mathbb{R}^3$ is that the rank of $dF_\alpha$ be everywhere less than 3. We shall see, in fact, that if the rank of $dF_\alpha$ is 2 for all $\alpha$, then $M = \operatorname{range} F$ is essentially a two-dimensional manifold. (There is still a tiny difficulty that we shall explain later.) Our tools are going to be the implicit-function theorem and the following theorem, which could well have come much earlier, that the rank of $T$ is a "lower semicontinuous" function of $T$.

Theorem 13.1. Let $V$ and $W$ be finite-dimensional vector spaces, normed in some way. Then for any $T$ in $\operatorname{Hom}(V, W)$ there is an $\epsilon$ such that

$$\|S - T\| < \epsilon \implies \operatorname{rank} S \ge \operatorname{rank} T.$$

Proof. Let $T$ have null space $N$ and range $R$, and let $X$ be any complement of $N$ in $V$. Then the restriction of $T$ to $X$ is an isomorphism to $R$, and hence is bounded below by some positive $m$. (Its inverse from $R$ to $X$ is bounded by some $b$, by Theorem 4.2, and we set $m = 1/b$.) Then if $\|S - T\| < m/2$, it follows that $S$ is bounded below on $X$ by $m/2$, for the inequalities $\|T(\alpha)\| \ge m\|\alpha\|$ and $\|(S - T)(\alpha)\| \le (m/2)\|\alpha\|$ together imply that $\|S(\alpha)\| \ge (m/2)\|\alpha\|$. In particular, $S$ is injective on $X$, and so $\operatorname{rank} S = d(\operatorname{range} S) \ge d(X) = d(R) = \operatorname{rank} T$. □

We can now prove the general local theorem.

Theorem 13.2. Let $V$ and $W$ be finite-dimensional spaces, let $r$ be an integer less than the dimension of $W$, and let $F$ be a continuously differentiable map from an open subset $A \subset V$ to $W$ such that the rank of $dF_\gamma = r$ for all $\gamma$ in $A$. Then each point $\gamma$ in $A$ has a neighborhood $U$ such that $F[U]$ is an $r$-dimensional patch submanifold of $W$.

Proof.
For a fixed $\gamma$ in $A$ let $V_1$ and $Y$ be the null space and range of $dF_\gamma$, let $V_2$ be a complement of $V_1$ in $V$, and view $V$ as $V_1 \times V_2$. Then $F$ becomes a function $F(\xi, \eta)$ of two variables, and if $\gamma = \langle \alpha, \beta \rangle$, then $dF^2_{\langle \alpha, \beta \rangle}$ is an isomorphism from $V_2$ to $Y$. At this point we can already choose the decomposition $W = W_1 \oplus W_2$ with respect to which $F[A]$ is going to be a graph (locally). We simply choose any direct sum decomposition $W = W_1 \oplus W_2$ such that $W_2$ is a complement of $Y = \operatorname{range} dF_{\langle \alpha, \beta \rangle}$. Thus $W_1$ might be $Y$, but it doesn't have to be. Let $P$ be the projection of $W$ onto $W_1$, along $W_2$. Since $Y$ is a complement of the null space of $P$, we know that $P \restriction Y$ is an isomorphism from $Y$ to $W_1$. In particular, $W_1$ is $r$-dimensional, and $\operatorname{rank} P \circ dF_{\langle \alpha, \beta \rangle} = r$.
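Moreover, and this is crucial, $P$ is an isomorphism from the range of $dF_{\langle \xi, \eta \rangle}$ to $W_1$ for all $\langle \xi, \eta \rangle$ sufficiently close to $\langle \alpha, \beta \rangle$. For the above rank theorem implies that $\operatorname{rank} P \circ dF_{\langle \xi, \eta \rangle} \ge \operatorname{rank} P \circ dF_{\langle \alpha, \beta \rangle} = r$ on some neighborhood of $\langle \alpha, \beta \rangle$. On the other hand, the range of $P \circ dF_{\langle \xi, \eta \rangle}$ is included in the range of $P$, which is $W_1$, and so $\operatorname{rank} P \circ dF_{\langle \xi, \eta \rangle} \le r$. Thus $\operatorname{rank} P \circ dF_{\langle \xi, \eta \rangle} = r$ for $\langle \xi, \eta \rangle$ near $\langle \alpha, \beta \rangle$, and since $\operatorname{rank} dF_{\langle \xi, \eta \rangle} = r$ by hypothesis, we see that $P$ is an isomorphism on the range of any such $dF_{\langle \xi, \eta \rangle}$.

Now define $H: W_1 \times A \to W_1$ as the mapping $\langle \zeta, \xi, \eta \rangle \mapsto P \circ F(\xi, \eta) - \zeta$. If $\mu = P \circ F(\alpha, \beta)$, then $dH^3_{\langle \mu, \alpha, \beta \rangle} = P \circ dF^2_{\langle \alpha, \beta \rangle}$, which is an isomorphism from $V_2$ to $W_1$. Therefore, by the implicit-function theorem there exists a neighborhood $L \times M \times N$ of $\langle \mu, \alpha, \beta \rangle$ and a uniquely determined continuously differentiable mapping $G$ from $L \times M$ to $N$ such that $H(\zeta, \xi, G(\zeta, \xi)) = 0$ on $L \times M$. That is,

$$\zeta = P \circ F(\xi, G(\zeta, \xi))$$

on $L \times M$.

The remainder of our argument consists in showing that $F(\xi, G(\zeta, \xi))$ is a function of $\zeta$ alone. We start by differentiating the above equation with respect to $\xi$, getting

$$0 = P \circ (dF^1 + dF^2 \circ dG^2) = P \circ dF \circ \langle I, dG^2 \rangle.$$

As noted above, $P$ is an isomorphism on the range of $dF_{\langle \xi, \eta \rangle}$ for all $\langle \xi, \eta \rangle$ sufficiently close to $\langle \alpha, \beta \rangle$, and if we suppose that $L \times M$ is also taken small enough so that this holds, then the above equation implies that

$$dF_{\langle \xi, \eta \rangle} \circ \langle I, dG^2 \rangle = 0$$

for all $\langle \zeta, \xi \rangle \in L \times M$. But this is just the statement that the partial differential with respect to $\xi$ of $F(\xi, G(\zeta, \xi))$ is identically 0, and hence that $F(\xi, G(\zeta, \xi))$ is a continuously differentiable function $K$ of $\zeta$ alone:

$$F(\xi, G(\zeta, \xi)) = K(\zeta).$$

Since $\eta = G(\zeta, \xi)$ and $\zeta = P \circ F(\xi, \eta)$, we thus have $F(\xi, \eta) = K(P \circ F(\xi, \eta))$, or

$$F = K \circ P \circ F,$$

and this holds on the open set $U$ consisting of those points $\langle \xi, \eta \rangle$ in $M \times N$ such that $P \circ F(\xi, \eta) \in L$.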
If we think of $W$ as $W_1 \times W_2$, then $F$ and $K$ are ordered pairs of functions, $F = \langle F^1, F^2 \rangle$ and $K = \langle I, k \rangle$, $P$ is the mapping $\langle \zeta, \nu \rangle \mapsto \zeta$, and the second component of the above equation is

$$F^2 = k \circ F^1.$$
Since $F^1[U] = P \circ F[U] = L$, the above equation says that $F[U]$ is the graph of the mapping $k$ from $L$ to $W_2$. Moreover, $L$ is an open subset of the $r$-dimensional vector space $W_1$, and therefore $F[U]$ is an $r$-dimensional patch manifold in $W = W_1 \times W_2$. □

The above theorem includes the answer to our original question about functional dependence.

Corollary. Let $F = \{f^i\}_1^m$ be an $m$-tuple of continuously differentiable real-valued functions defined on an open subset $A$ of a normed linear space $V$, and suppose that the rank of $dF_\alpha$ has the constant value $r$ on $A$, where $r$ is less than $m$. Then any point $\gamma$ in $A$ has a neighborhood $U$ over which $m - r$ of the functions are functionally dependent on the remaining $r$.

Proof. By hypothesis the range $Y$ of $dF_\gamma = \langle df^1_\gamma, \ldots, df^m_\gamma \rangle$ is an $r$-dimensional subspace of $\mathbb{R}^m$. We can therefore find a basis for a complementary subspace $W_2$ by choosing $m - r$ of the standard basis elements $\{\delta^i\}$, and we may as well renumber the functions $f^i$ so that these are $\delta^{r+1}, \ldots, \delta^m$. Then the projection $P$ of $\mathbb{R}^m$ onto $W_1 = L(\delta^1, \ldots, \delta^r)$ is an isomorphism from $Y$ to $W_1$ (since $Y$ is a complement of its null space), and by the theorem there is a neighborhood $U$ of $\gamma$ over which $(I - P) \circ F$ is a function $k$ of $P \circ F$. But this says exactly that $\langle f^{r+1}, \ldots, f^m \rangle = k \circ \langle f^1, \ldots, f^r \rangle$. That is, $k$ is an $(m - r)$-tuple-valued function, $k = \langle k^{r+1}, \ldots, k^m \rangle$, and $f^j = k^j \circ \langle f^1, \ldots, f^r \rangle$ for $j = r + 1, \ldots, m$. □

[Fig. 3.12]

We mentioned earlier in the section that there was a difficulty in concluding that if $F$ is a continuously differentiable map from an open subset $A$ of $V$ to $W$ whose differential has constant rank $r$ less than $d(W)$, then $S = \operatorname{range} F$ is an $r$-dimensional submanifold of $W$. The flaw can be described as follows. The definition of a submanifold $S$ of $X$ required that each point of $S$ have a neighborhood in $X$ whose intersection with $S$ is a patch. In the case before us, what we
can conclude is that if $\beta$ is a point of $S$, then $\beta = F(\alpha)$ for some $\alpha$ in $A$, and $\alpha$ has a neighborhood $U$ whose image under $F$ is a patch. But this image may not be a full neighborhood of $\beta$ in $S$, because $S$ may curve back on itself in such a way as to intrude into every neighborhood of $\beta$. Consider, for example, the one-dimensional curve $\Gamma$ imbedded in $\mathbb{R}^3$ suggested by Fig. 3.12. The curve begins in the $xz$-plane along the $z$-axis, curves over, and when it comes to the $xy$-plane it starts spiraling in to the origin in the $xy$-plane (the point of changeover from the $xz$-plane to the $xy$-plane is a singularity that we could smooth out). The origin is not a point having a neighborhood in $\mathbb{R}^3$ whose intersection with $\Gamma$ is a one-patch, but the full curve is the image of $(-1, 1)$ under a continuously differentiable injection. We would consider $\Gamma$ to be a one-dimensional manifold without any difficulty, but something has gone wrong with its imbedding in $\mathbb{R}^3$, so it is not a one-dimensional submanifold of $\mathbb{R}^3$.

*14. UNIFORM CONTINUITY AND FUNCTION-VALUED MAPPINGS

In the next chapter we shall see that a continuous function $F$ whose domain is a bounded closed subset of a finite-dimensional vector space $V$ is necessarily uniformly continuous. This means that given $\epsilon$, there is a $\delta$ such that

$$\|\xi - \eta\| < \delta \implies \|F(\xi) - F(\eta)\| < \epsilon$$

for all vectors $\xi$ and $\eta$ in the domain of $F$. The point is that $\delta$ depends only on $\epsilon$ and not, as in ordinary continuity, on the "anchor" point at which continuity is being asserted. This is a very important property. In this section we shall see that it underlies a class of theorems in which a point map is escalated to a function-valued map, and properties of the point map imply corresponding properties of the function-valued map. Such theorems have powerful applications, as we shall see in Section 15 and in Section 1 of Chapter 6.
An application that we shall get immediately here is the theorem on differentiation under the integral sign. However, it is only Theorem 14.3 that will be used later in the book.

Suppose first that $F(\xi, \eta)$ is a bounded continuous function from a product open set $M \times N$ to a normed linear space $X$. Holding $\eta$ fixed, we have a function $f_\eta(\xi) = F(\xi, \eta)$ which is a bounded continuous function on $M$, that is, an element of the normed linear space $Y = \mathcal{BC}(M, X)$ of all bounded continuous maps from $M$ to $X$. This function is also indicated $F(\cdot, \eta)$, so that $f_\eta = F(\cdot, \eta)$. We are supposing that the uniform norm is being used on $Y$:

$$\|f_\eta\| = \operatorname{lub} \{\|f_\eta(\xi)\| : \xi \in M\} = \operatorname{lub} \{\|F(\xi, \eta)\| : \xi \in M\}.$$

Theorem 14.1. In the above context, if $F$ is uniformly continuous, then the mapping $\eta \mapsto f_\eta$ (or $\eta \mapsto F(\cdot, \eta)$) is continuous, in fact, uniformly continuous, from $N$ to $Y$.
Proof. Given $\epsilon$, choose $\delta$ so that

$$\|\langle \xi, \eta \rangle - \langle \mu, \nu \rangle\| < \delta \implies \|F(\xi, \eta) - F(\mu, \nu)\| < \epsilon.$$

Taking $\mu = \xi$ and rewriting the right-hand side, we have

$$\|\eta - \nu\| < \delta \implies \|f_\eta(\xi) - f_\nu(\xi)\| < \epsilon \quad \text{for all } \xi.$$

Thus $\|\eta - \nu\| < \delta \implies \|f_\eta - f_\nu\|_\infty \le \epsilon$. □

We have proved that if a function of two variables is uniformly continuous, then the mappings obtained from it by the general duality principle are continuous. This phenomenon lies behind many well-known facts. For example:

Corollary. If $F(x, y)$ is a uniformly continuous real-valued function on the unit square $[0, 1] \times [0, 1]$ in $\mathbb{R}^2$, then $\int_0^1 F(x, y)\,dx$ is a continuous function of $y$.

Proof. The mapping $y \mapsto \int_0^1 F(x, y)\,dx$ is the composition of the bounded linear mapping $f \mapsto \int_0^1 f$ from $\mathcal{C}([0, 1])$ to $\mathbb{R}$ with the continuous mapping $y \mapsto F(\cdot, y)$ from $[0, 1]$ to $\mathcal{C}([0, 1])$, and is continuous as the composition of continuous mappings. □

We consider next the differentiability of the above duality-induced mapping.

Theorem 14.2. If $F$ is a bounded continuous mapping from an open product set $M \times N$ of a normed linear space $V \times W$ to a normed linear space $X$, and if $dF^2_{\langle \alpha, \beta \rangle}$ exists and is a bounded uniformly continuous function of $\langle \alpha, \beta \rangle$ on $M \times N$, then $\varphi: \eta \mapsto F(\cdot, \eta)$ is a differentiable mapping from $N$ to $Y = \mathcal{BC}(M, X)$, and

$$[d\varphi_\beta(\eta)](\xi) = dF^2_{\langle \xi, \beta \rangle}(\eta).$$

Proof. Given $\epsilon$, we choose $\delta$ by the uniform continuity of $dF^2$, so that

$$\|\mu - \nu\| < \delta \implies \|dF^2_{\langle \xi, \mu \rangle} - dF^2_{\langle \xi, \nu \rangle}\| < \epsilon$$

for all $\xi \in M$. The corollary to Theorem 7.4 then implies that

$$\|\eta\| < \delta \implies \|\Delta F^2_{\langle \xi, \beta \rangle}(\eta) - dF^2_{\langle \xi, \beta \rangle}(\eta)\| \le \epsilon \|\eta\|$$

for all $\xi \in M$, all $\beta \in N$, and all $\eta$ such that the line segment from $\beta$ to $\beta + \eta$ is in $N$. We fix $\beta$ and rewrite the right-hand side of the above inequality. This is the heart of the proof. First

$$\Delta F^2_{\langle \xi, \beta \rangle}(\eta) = F(\xi, \beta + \eta) - F(\xi, \beta) = [f_{\beta + \eta} - f_\beta](\xi) = [\varphi(\beta + \eta) - \varphi(\beta)](\xi) = [\Delta\varphi_\beta(\eta)](\xi).$$

Next we can check that if $\|dF^2_{\langle \mu, \nu \rangle}\| \le b$ for $\langle \mu, \nu \rangle \in M \times N$, then the mapping $T$ defined by the formula $[T(\eta)](\xi) = dF^2_{\langle \xi, \beta \rangle}(\eta)$ is an element of $\operatorname{Hom}(W, Y)$ of norm at most $b$.
We leave the detailed verification of this as an
exercise for the reader. The last displayed inequality now takes the form

$$\|\eta\| < \delta \implies \|[\Delta\varphi_\beta(\eta) - T(\eta)](\xi)\| \le \epsilon \|\eta\|,$$

and hence

$$\|\eta\| < \delta \implies \|\Delta\varphi_\beta(\eta) - T(\eta)\|_\infty \le \epsilon \|\eta\|.$$

This says exactly that the mapping $\varphi$ is differentiable at $\beta$ and $d\varphi_\beta = T$. □

The mapping $\varphi$ is in fact continuously differentiable, as can be seen by arguing a little further in the above manner. The situation is very close to being an application of Theorem 14.1.

The classical theorem on differentiability under the integral sign is a corollary of the above theorem. We give a simple case. Note that if $\eta$ is a real variable $y$, then the above formula for $d\varphi$ can be rewritten in terms of arc derivatives:

$$[\varphi'(y)](x) = \frac{\partial F}{\partial y}(x, y).$$

Corollary. If $F(x, y)$ is a continuous real-valued function on the unit square $[0, 1] \times [0, 1]$, and if $\partial F/\partial y$ exists and is a uniformly continuous function on the square, then $\int_0^1 F(x, y)\,dx$ is a differentiable function of $y$ and its derivative is $\int_0^1 (\partial F/\partial y)(x, y)\,dx$.

Proof. The mapping $T: y \mapsto \int_0^1 F(x, y)\,dx$ is the composition of the bounded linear mapping $f \mapsto \int_0^1 f(x)\,dx$ from $\mathcal{C}([0, 1])$ to $\mathbb{R}$ with the differentiable mapping $\varphi: y \mapsto F(\cdot, y)$ from $[0, 1]$ to $\mathcal{C}([0, 1])$, and is therefore differentiable by the composite-function rule. Then Theorem 7.2 and the fact that the differential of a bounded linear map is itself give

$$T'(y) = \int_0^1 [\varphi'(y)](x)\,dx = \int_0^1 \frac{\partial F}{\partial y}(x, y)\,dx. \qquad □$$

We come now to the situation of most importance to us, where a point-to-point map generates a function-to-function map by composition. Let $A$ be an open set in a normed linear space $V$, let $S$ be an arbitrary set, and let $\mathcal{A}$ be the set of bounded maps $f$ from $S$ to $A$. Then $\mathcal{A}$ is a subset of the normed linear space $\mathcal{B}(S, V)$ of all bounded functions from $S$ to $V$ under the uniform norm.
A function $f \in \mathcal{A}$ will be an interior point of $\mathcal{A}$ if and only if the distance from the range of $f$ to the boundary of $A$ is a positive number $\delta$, for this is clearly equivalent to saying that $\mathcal{A}$ includes a ball in $\mathcal{B}(S, V)$ about the point $f$.

Now let $g$ be any bounded mapping from $A$ to a normed linear space $W$, and let $G: \mathcal{A} \to \mathcal{B}(S, W)$ be composition by $g$. That is, $h = G(f)$ if and only if $f \in \mathcal{A}$ and $h = g \circ f$. We can consider both the continuity and differentiability of $G$, but we shall only work out the differentiability theorem.

Theorem 14.3. Let the function $g: A \to W$ be differentiable at each point $\alpha$ in $A$, and let $dg_\alpha$ be a bounded uniformly continuous function of $\alpha$. Then the mapping $G: \mathcal{A} \to \mathcal{B}(S, W)$ defined by $G(f) = g \circ f$ is differentiable at
any interior point $f$ in $\mathcal{A}$ and $dG_f: \mathcal{B}(S, V) \to \mathcal{B}(S, W)$ is defined by

$$[dG_f(h)](s) = dg_{f(s)}(h(s)) \quad \text{for all } s \in S.$$

Proof. Given $\epsilon$, choose $\delta$ by the uniform continuity of $dg$ so that $\|\alpha - \beta\| < \delta \implies \|dg_\alpha - dg_\beta\| < \epsilon$, and then apply the corollary to Theorem 7.4 once more to conclude that

$$\|\xi\| < \delta \implies \|\Delta g_\alpha(\xi) - dg_\alpha(\xi)\| \le \epsilon \|\xi\|,$$

provided the line segment from $\alpha$ to $\alpha + \xi$ is in $A$. Now choose any fixed interior point $f$ in $\mathcal{A}$, and choose $\delta' \le \delta$ so that $B_{\delta'}(f) \subset \mathcal{A}$. Then for any $h$ in $\mathcal{B}(S, V)$,

$$\|h\|_\infty < \delta' \implies \|\Delta g_{f(s)}(h(s)) - dg_{f(s)}(h(s))\| \le \epsilon \|h(s)\|$$

for all $s \in S$. Define a map $T: \mathcal{B}(S, V) \to \mathcal{B}(S, W)$ by $[T(h)](s) = dg_{f(s)}(h(s))$. Then the above displayed inequality can be rewritten as

$$\|h\|_\infty < \delta' \implies \|\Delta G_f(h) - T(h)\|_\infty \le \epsilon \|h\|_\infty.$$

That is, $\Delta G_f = T + o$. We will therefore be done when we have shown that $T \in \operatorname{Hom}(\mathcal{B}(S, V), \mathcal{B}(S, W))$. First, we have

$$(T(h_1 + h_2))(s) = dg_{f(s)}((h_1 + h_2)(s)) = dg_{f(s)}(h_1(s) + h_2(s)) = dg_{f(s)}(h_1(s)) + dg_{f(s)}(h_2(s)) = (T(h_1))(s) + (T(h_2))(s).$$

Thus $T(h_1 + h_2) = T(h_1) + T(h_2)$, and homogeneity follows similarly. Second, if $b$ is a bound to $\|dg_\alpha\|$ on $A$, then

$$\|T(h)\|_\infty = \operatorname{lub} \{\|(T(h))(s)\| : s \in S\} \le \operatorname{lub} \{\|dg_{f(s)}\| \cdot \|h(s)\| : s \in S\} \le b \|h\|_\infty.$$

Therefore, $\|T\| \le b$, and we are finished. □

In the above situation, if $g$ is from $A \times U$ to $W$, so that $G(f)$ is the function $h$ given by $h(t) = g(f(t), t)$, then nothing is changed except that the theorem is about $dg^1$ instead of $dg$. If, in addition, $V$ is a product space $V_1 \times V_2$, so that $f$ is of the form $\langle f_1, f_2 \rangle$ and $[G(f)](t) = g(f_1(t), f_2(t), t)$, then our rules about partial differentials give us the formula

$$[dG_f(h)](t) = dg^1_{\langle f_1(t), f_2(t), t \rangle}(h_1(t)) + dg^2_{\langle f_1(t), f_2(t), t \rangle}(h_2(t)).$$

*15. THE CALCULUS OF VARIATIONS

The problems of the calculus of variations are simply critical-point problems of a certain type with a characteristic twist in the way the condition $dF_\alpha = 0$ is used. We shall illustrate the subject by proving one of its standard theorems.
Since we want to solve a constrained maximum problem in which the domain is an infinite-dimensional vector space, a systematic discussion would start off
with a more general form of the Lagrange multiplier theorem. However, for our purpose it is sufficient to note that if $S$ is a closed plane $M + \alpha$, then the restriction of $F$ to $S$ is equivalent to a new function on the vector space $M$, and its differential at $\beta = \eta + \alpha$ in $S$ is clearly just the restriction of $dF_\beta$ to $M$. The requirement that $\beta$ be a critical point for the constrained function is therefore simply the requirement that $dF_\beta$ vanish on $M$.

Let $F$ be a uniformly continuous differentiable real-valued function of three variables defined on (an open subset of) $W \times W \times \mathbb{R}$, where $W$ is a normed linear space. Given a closed interval $[a, b] \subset \mathbb{R}$, let $V$ be the normed linear space $\mathcal{C}^1([a, b], W)$ of smooth arcs $f: [a, b] \to W$, with $\|f\|$ taken as $\|f\|_\infty + \|f'\|_\infty$. The problem is to maximize the (nonlinear) functional

$$G(f) = \int_a^b F(f(t), f'(t), t)\,dt,$$

subject to the restraints $f(a) = \alpha$ and $f(b) = \beta$. That is, we consider only smooth arcs in $W$ with fixed endpoints $\alpha$ and $\beta$, and we want to find that arc from $\alpha$ to $\beta$ which maximizes (or minimizes) the integral.

Now we can show that $G$ is a continuously differentiable function from (an open subset of) $V$ to $\mathbb{R}$. The easiest way to do this is to let $X$ be the space $\mathcal{C}([a, b], W)$ of continuous arcs under the uniform norm, and to consider first the more general functional $K$ from $X \times X$ to $\mathbb{R}$ defined by

$$K(f, g) = \int_a^b F(f(t), g(t), t)\,dt.$$

By Theorem 14.3 the integrand map $\langle f, g \rangle \mapsto F(f(\cdot), g(\cdot), \cdot)$ is differentiable from $X \times X$ to $\mathcal{C}([a, b])$ and its differential at $\langle f, g \rangle$ evaluated at $\langle h, k \rangle$ is the function

$$dF^1_{\langle f(t), g(t), t \rangle}(h(t)) + dF^2_{\langle f(t), g(t), t \rangle}(k(t)).$$

Since $f \mapsto \int_a^b f(t)\,dt$ is a bounded linear functional on $\mathcal{C}$, it is differentiable and equal to its differential. The composite-function rule therefore implies that $K$ is differentiable and that

$$dK_{\langle f, g \rangle}(h, k) = \int_a^b [dF^1(h(t)) + dF^2(k(t))]\,dt,$$

where the partial differentials in the integrand are at the point $\langle f(t), g(t), t \rangle$.
Now the pairs $\langle f, g \rangle$ such that $f'$ exists and equals $g$ form a closed subspace of $X \times X$ which is isomorphic to $V$. It is obvious that they form a subspace, but to see that it is closed requires the theory of the integral for parametrized arcs from Chapter 4, for it depends on the representation $f(t) = f(a) + \int_a^t f'(s)\,ds$ and the consequent norm inequality $\|f(t) - f(a)\| \le (t - a)\|f'\|_\infty$. Assuming this, we see that our original functional $G$ is just the restriction of $K$ to this subspace (isomorphic to) $V$, and hence is differentiable with

$$dG_f(h) = \int_a^b [dF^1(h(t)) + dF^2(h'(t))]\,dt.$$

This differential $dG_f$ is called the first variation of $G$ about $f$.

The fixed endpoints $\alpha$ and $\beta$ for the arc $f$ determine in turn a closed plane $P$ in $V$, for the evaluation maps (coordinate projections) $\pi_x: f \mapsto f(x)$ are bounded and $P$ is the intersection of the hyperplanes $\pi_a = \alpha$ and $\pi_b = \beta$. Since $P$ is a translate of the subspace $M = \{f \in V : f(a) = f(b) = 0\}$, our constrained
maximum equation is

$$dG_f(h) = \int_a^b [dF^1(h(t)) + dF^2(h'(t))]\,dt = 0$$

for all $h$ in $M$.

We come now to the special trick of the calculus of variations, called the lemma of Du Bois-Reymond. Suppose for simplicity that $W = \mathbb{R}$. Then $F$ is a function $F(x, y, t)$ of three real variables, the partial differentials are equivalent to ordinary partial derivatives, and our critical-point equation is

$$dG_f(h) = \int_a^b \left( \frac{\partial F}{\partial x} \cdot h + \frac{\partial F}{\partial y} \cdot h' \right) = 0.$$

If we integrate the first term in the integral by parts and remember that $h(a) = h(b) = 0$, we see that the equation becomes

$$\int_a^b \left( \frac{\partial F}{\partial y} - \int_a^t \frac{\partial F}{\partial x} \right) g = 0,$$

where $g = h'$. Since $h$ is an arbitrary continuously differentiable function except for the constraints $h(a) = h(b) = 0$, we see that $g$ is an arbitrary continuous function except for the constraint $\int_a^b g(t)\,dt = 0$. That is, $\partial F/\partial y - \int \partial F/\partial x$ is orthogonal to the null space $N$ of the linear functional $g \mapsto \int_a^b g(t)\,dt$. Since the one-dimensional space $N^\perp$ is clearly the set of constant functions,

$$\int_a^b g = 0 \implies \int_a^b Cg = 0,$$

our condition becomes

$$\frac{\partial F}{\partial y}(f(t), f'(t), t) = \int_a^t \frac{\partial F}{\partial x}(f(s), f'(s), s)\,ds + C.$$

This equation implies, in particular, that the left member is differentiable. This is not immediately apparent, since $f'$ is only assumed to be continuous. Differentiating, we conclude finally that $f$ is a critical point of the mapping $G$ if and only if it is a solution of the differential equation

$$\frac{d}{dt} \frac{\partial F}{\partial y}(f(t), f'(t), t) = \frac{\partial F}{\partial x}(f(t), f'(t), t),$$

which is called the Euler equation of the variational problem. It is an ordinary differential equation for the unknown function $f$; when the indicated derivative is computed, it takes the form

$$\frac{\partial^2 F}{\partial y^2} f'' + \frac{\partial^2 F}{\partial y\,\partial x} f' + \frac{\partial^2 F}{\partial y\,\partial t} - \frac{\partial F}{\partial x} = 0.$$

If $W$ is not $\mathbb{R}$, we get exactly the same result from the general form of the integration by parts formula (using Theorem 6.3) and a more sophisticated
version of the above argument. (See Exercises 10.14 and 10.15 of Chapter 4.) That is, the smooth arc $f$ with fixed endpoints $\alpha$ and $\beta$ is a critical point of the mapping $g \mapsto \int_a^b F(g(t), g'(t), t)\,dt$ if and only if it satisfies the Euler differential equation

$$\frac{d}{dt}\, dF^2_{\langle f(t), f'(t), t \rangle} = dF^1_{\langle f(t), f'(t), t \rangle}.$$

This is now a vector-valued equation, with values in $W^*$. If $W$ is finite-dimensional, with dimension $n$, then a choice of basis makes $W^*$ into $\mathbb{R}^n$, and this vector equation is equivalent to $n$ scalar equations

$$\frac{d}{dt} \frac{\partial F}{\partial y_i}(f(t), f'(t), t) = \frac{\partial F}{\partial x_i}(f(t), f'(t), t), \qquad i = 1, \ldots, n,$$

where $F$ is now a function of $2n + 1$ real variables, $F(\mathbf{x}, \mathbf{y}, t) = F(x_1, \ldots, x_n, y_1, \ldots, y_n, t)$.

Finally, let us see what happens to the simpler variational problem ($W = \mathbb{R}$) when the endpoints of $f$ are not fixed. Now the critical-point equation is $dG_f(h) = 0$ for all $h$ in $V$, and when we integrate by parts it becomes

$$\left. \frac{\partial F}{\partial y}\, h \right|_a^b + \int_a^b \left( \frac{\partial F}{\partial x} - \frac{d}{dt} \frac{\partial F}{\partial y} \right) h = 0$$

for all $h$ in $V$. We can reason essentially as above, but a little more closely, to conclude that a function $f$ is a critical point if and only if it satisfies the Euler equation

$$\frac{d}{dt} \left( \frac{\partial F}{\partial y} \right) = \frac{\partial F}{\partial x}$$

and also the endpoint conditions

$$\left. \frac{\partial F}{\partial y} \right|_{t=a} = \left. \frac{\partial F}{\partial y} \right|_{t=b} = 0.$$

This has been only a quick look at the variational calculus, and the interested reader can pursue it further in treatises devoted to the subject. There are many more questions of the general type we have considered. For example, we may want neither fixed nor completely free endpoints but freedom subject to constraints. We shall take this up in Chapter 13 in the special case of the variational equations of mechanics. Or again, $f$ may be a function of two or more variables and the integral may be a multiple integral. In this case the Euler equation may become a system of partial differential equations in the unknown $f$. Finally, there is the question of sufficient conditions for the critical function to give a maximum or minimum value to the integral.
This will naturally involve a study of the second differential of the functional $G$, or its second variation, as it is known in this subject.
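The fixed-endpoint condition $dG_f(h) = 0$ for all $h$ in $M$ can be probed numerically. The following is a minimal sketch under stated assumptions, not part of the original text: the integrand $F(x, y, t) = y^2$ on $[0, 1]$, the variation $h(t) = \sin \pi t$, the midpoint quadrature, and all function names are illustrative choices. For this integrand the Euler equation reduces to $f'' = 0$, so the straight line $f(t) = t$ should have vanishing first variation in the direction $h$, while the arc $f(t) = t^2$ with the same endpoints should not.

```python
import math

def integrate(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))

def G(f_prime, a=0.0, b=1.0):
    """G(f) for the sample integrand F(x, y, t) = y^2, which depends on f only through f'."""
    return integrate(lambda t: f_prime(t) ** 2, a, b)

def first_variation(f_prime, h_prime, eps=1e-4):
    """Central-difference estimate of dG_f(h), the derivative of G(f + eps*h) at eps = 0."""
    plus = G(lambda t: f_prime(t) + eps * h_prime(t))
    minus = G(lambda t: f_prime(t) - eps * h_prime(t))
    return (plus - minus) / (2.0 * eps)

def h_prime(t):
    # Derivative of h(t) = sin(pi t); h vanishes at t = 0 and t = 1, so h lies in M.
    return math.pi * math.cos(math.pi * t)

def line_prime(t):
    # f(t) = t satisfies f'' = 0 with f(0) = 0 and f(1) = 1.
    return 1.0

def parabola_prime(t):
    # f(t) = t^2 has the same endpoints but does not satisfy f'' = 0.
    return 2.0 * t

dG_line = first_variation(line_prime, h_prime)
dG_parabola = first_variation(parabola_prime, h_prime)
```

For the parabola the first variation can also be computed by hand as $\int_0^1 4\pi t \cos \pi t \, dt = -8/\pi$, which the numerical estimate should approximate. Testing a single direction $h$ is of course only a spot check, not a verification of the condition for all $h$ in $M$.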
*16. THE SECOND DIFFERENTIAL AND THE CLASSIFICATION OF CRITICAL POINTS

Suppose that $V$ and $W$ are normed linear spaces, that $A$ is an open subset of $V$, and that $F: A \to W$ is a continuously differentiable mapping. The first differential of $F$ is the continuous mapping $dF: \gamma \mapsto dF_\gamma$ from $A$ to $\operatorname{Hom}(V, W)$. We now want to study the differentiability of this mapping at the point $\alpha$. Presumably, we know what it means to say that $dF$ is differentiable at $\alpha$. By definition $d(dF)_\alpha$ is a bounded linear transformation $T$ from $V$ to $\operatorname{Hom}(V, W)$ such that $\Delta(dF)_\alpha(\eta) - T(\eta) = o(\eta)$. That is, $dF_{\alpha + \eta} - dF_\alpha - T(\eta)$ is an element of $\operatorname{Hom}(V, W)$ of norm less than $\epsilon\|\eta\|$ for $\eta$ sufficiently small. We set $d^2F_\alpha = d(dF)_\alpha$ and repeat: $d^2F_\alpha = d^2F_\alpha(\cdot)$ is a linear map from $V$ to $\operatorname{Hom}(V, W)$, $d^2F_\alpha(\eta) = d^2F_\alpha(\eta)(\cdot)$ is an element of $\operatorname{Hom}(V, W)$, and $d^2F_\alpha(\eta)(\zeta)$ is a vector in $W$. Also, we know that $d^2F_\alpha$ is equivalent to a bounded bilinear map $\omega: V \times V \to W$, where $\omega(\eta, \zeta) = d^2F_\alpha(\eta)(\zeta)$.

The vector $d^2F_\alpha(\eta)(\zeta)$ clearly ought to be some kind of second derivative of $F$ at $\alpha$, and the reader might even conjecture that it is the mixed derivative in the directions $\zeta$ and $\eta$.

Theorem 16.1. If $F: A \to W$ is continuously differentiable, and if the second differential $d^2F_\alpha$ exists, then for each fixed $\mu \in V$ the function $D_\mu F: \gamma \mapsto D_\mu F(\gamma)$ from $A$ to $W$ is differentiable at $\alpha$ and

$$D_\nu(D_\mu F)(\alpha) = (d^2F_\alpha(\nu))(\mu).$$

Proof. We use the evaluation-at-$\mu$ map $\operatorname{ev}_\mu: \operatorname{Hom}(V, W) \to W$ defined for a fixed $\mu$ in $V$ by $\operatorname{ev}_\mu(T) = T(\mu)$. It is a bounded linear mapping. Then

$$(D_\mu F)(\alpha) = dF_\alpha(\mu) = \operatorname{ev}_\mu(dF_\alpha) = (\operatorname{ev}_\mu \circ dF)(\alpha),$$

so that the function $D_\mu F$ is the composition $\operatorname{ev}_\mu \circ dF$. It is differentiable at $\alpha$ because $d(dF)_\alpha$ exists and $\operatorname{ev}_\mu$ is linear. Thus

$$(D_\nu(D_\mu F))(\alpha) = d(D_\mu F)_\alpha(\nu) = d(\operatorname{ev}_\mu \circ dF)_\alpha(\nu) = (\operatorname{ev}_\mu \circ d(dF)_\alpha)(\nu) = \operatorname{ev}_\mu[(d^2F_\alpha)(\nu)] = (d^2F_\alpha(\nu))(\mu).$$
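□

The reader must remember in going through the above argument that $D_\mu F$ is the function $(D_\mu F)(\cdot)$, and he might prefer to use this notation, as follows:

$$D_\nu((D_\mu F)(\cdot))\big|_\alpha = d((D_\mu F)(\cdot))_\alpha(\nu) = d(\operatorname{ev}_\mu \circ dF_{(\cdot)})_\alpha(\nu) = [\operatorname{ev}_\mu \circ d(dF_{(\cdot)})_\alpha](\nu) = \operatorname{ev}_\mu(d^2F_\alpha(\nu)).$$

If the domain space $V$ is the Cartesian space $\mathbb{R}^n$, then the differentiability of $(D_{\delta^j}F)(\cdot) = (\partial F/\partial x_j)(\cdot)$ at $\mathbf{a}$ implies the existence of the second partial derivatives $(\partial^2 F/\partial x_i\,\partial x_j)(\mathbf{a})$ by Theorem 9.2, and with $\mathbf{b}$ and $\mathbf{c}$ fixed, we then have

$$D_{\mathbf{c}}(D_{\mathbf{b}}F)(\mathbf{a}) = \sum_{i,j=1}^n b_i c_j \frac{\partial^2 F}{\partial x_j\,\partial x_i}(\mathbf{a}).$$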
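Thus,

Corollary 1. If $V = \mathbb{R}^n$ in the above theorem, then the existence of $d^2F_{\mathbf{a}}$ implies the existence of all the second partial derivatives $(\partial^2 F/\partial x_i\,\partial x_j)(\mathbf{a})$ and

$$d^2F_{\mathbf{a}}(\mathbf{c}, \mathbf{b}) = D_{\mathbf{c}}(D_{\mathbf{b}}F)(\mathbf{a}) = \sum_{i,j=1}^n b_i c_j \frac{\partial^2 F}{\partial x_j\,\partial x_i}(\mathbf{a}).$$

Moreover, from the above considerations and Theorem 9.3 we can also conclude that:

Theorem 16.2. If $V = \mathbb{R}^n$, and if all the second partial derivatives $(\partial^2 F/\partial x_i\,\partial x_j)(\mathbf{a})$ exist and are continuous on the open set $A$, then the second differential $d^2F_{\mathbf{a}}$ exists on $A$ and is continuous.

Proof. We have directly from Theorem 9.3 that each first partial derivative $(\partial F/\partial x_j)(\cdot)$ is differentiable. But $\partial F/\partial x_j = \operatorname{ev}_{\delta^j} \circ dF$, and the theorem is then a consequence of the following general principle. □

Lemma. If $\{S_i\}_1^k$ is a finite collection of linear maps on a vector space $W$ such that $S = \langle S_1, \ldots, S_k \rangle$ is invertible, then a mapping $F: A \to W$ is differentiable at $\alpha$ if and only if $S_i \circ F$ is differentiable at $\alpha$ for all $i$.

Proof. For then $S \circ F$ and $F = S^{-1} \circ S \circ F$ are differentiable, by Theorems 8.1 and 6.2. □

These considerations clearly extend to any number of differentiations. Thus, if $d^2F_{(\cdot)}: \gamma \mapsto d^2F_\gamma$ is differentiable at $\mathbf{a}$, then for fixed $\mathbf{b}$ and $\mathbf{c}$ the evaluation $d^2F_{(\cdot)}(\mathbf{c}, \mathbf{b})$ is differentiable at $\mathbf{a}$ and the formula

$$(D_{\mathbf{c}}(D_{\mathbf{b}}F))(\cdot) = d^2F_{(\cdot)}(\mathbf{c}, \mathbf{b}) = \sum_{i,j} b_i c_j \frac{\partial^2 F}{\partial x_j\,\partial x_i}(\cdot)$$

shows (for special choices of $\mathbf{b}$ and $\mathbf{c}$) that all the second partials $(\partial^2 F/\partial x_j\,\partial x_i)(\cdot)$ are differentiable at $\mathbf{a}$, with

$$(D_{\mathbf{d}} D_{\mathbf{c}} D_{\mathbf{b}} F)(\mathbf{a}) = D_{\mathbf{d}}((D_{\mathbf{c}} D_{\mathbf{b}} F)(\cdot))\big|_{\mathbf{a}} = \sum_{i,j,k} b_i c_j d_k \frac{\partial^3 F}{\partial x_k\,\partial x_j\,\partial x_i}(\mathbf{a}).$$

Conversely, if all the third partials exist and are continuous on $A$, then the second partials are differentiable on $A$ by Theorem 9.3, and then $d^2F_{(\cdot)}$ is differentiable by the lemma, since $(\partial^2 F/\partial x_i\,\partial x_j)(\cdot) = \operatorname{ev}_{\langle \delta^i, \delta^j \rangle} \circ d^2F_{(\cdot)}$.

As the reader will remember, it is crucially important in working with higher-order derivatives that $\partial^2 F/\partial x_i\,\partial x_j = \partial^2 F/\partial x_j\,\partial x_i$, and we very much need the same theorem here.

Theorem 16.3. The second differential is a symmetric function of its two arguments:

$$(d^2F_\alpha(\eta))(\zeta) = (d^2F_\alpha(\zeta))(\eta).$$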
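Before turning to the proof, the symmetry asserted in Theorem 16.3 can be illustrated with finite differences. The following is a minimal sketch under stated assumptions, not part of the original text: the polynomial `F`, the base point, the directions, the step `s`, and all helper names are illustrative choices, and the second partials in `hessian_form` were computed by hand for this particular `F`. The quantity $F(\alpha + \eta + \zeta) - F(\alpha + \eta) - F(\alpha + \zeta) + F(\alpha)$ is visibly symmetric in $\eta$ and $\zeta$, and for small increments it should approximate $d^2F_\alpha(\eta)(\zeta)$.

```python
def F(x):
    """Sample smooth function on R^2 (an illustrative choice): x^3 y + x y^2."""
    return x[0] ** 3 * x[1] + x[0] * x[1] ** 2

def add(u, v):
    return [p + q for p, q in zip(u, v)]

def scale(s, v):
    return [s * p for p in v]

def second_difference(func, a, eta, zeta):
    """func(a + eta + zeta) - func(a + eta) - func(a + zeta) + func(a)."""
    return (func(add(add(a, eta), zeta)) - func(add(a, eta))
            - func(add(a, zeta)) + func(a))

def hessian_form(a, eta, zeta):
    """Bilinear form built from the hand-computed second partials of the sample F."""
    x, y = a
    fxx = 6.0 * x * y
    fxy = 3.0 * x * x + 2.0 * y
    fyy = 2.0 * x
    return (fxx * eta[0] * zeta[0]
            + fxy * (eta[0] * zeta[1] + eta[1] * zeta[0])
            + fyy * eta[1] * zeta[1])

a = [1.0, 2.0]
eta = [1.0, -2.0]
zeta = [0.5, 3.0]
s = 1e-3  # small scaling applied to both increments

d_eta_zeta = second_difference(F, a, scale(s, eta), scale(s, zeta))
d_zeta_eta = second_difference(F, a, scale(s, zeta), scale(s, eta))
ratio = d_eta_zeta / (s * s)      # should be close to the bilinear form
exact = hessian_form(a, eta, zeta)
```

Dividing by `s * s` uses the bilinearity of the second differential; the remaining discrepancy between `ratio` and `exact` is of order `s`.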
Proof. By the definition of $d(dF)_\alpha$, given $\epsilon$, there is a $\delta$ such that

$$\|\Delta(dF)_\alpha(\eta) - d(dF)_\alpha(\eta)\| \le \epsilon\|\eta\|$$

whenever $\|\eta\| \le \delta$. Of course, $\Delta(dF)_\alpha(\eta) = dF_{\alpha+\eta} - dF_\alpha$. If we write down the same inequality with $\eta$ replaced by $\eta + \zeta$, then the difference of the transformations in the left members of the two inequalities is

$$(dF_{\alpha+\eta} - dF_{\alpha+\eta+\zeta}) - d^2F_\alpha(-\zeta),$$

and the triangle inequality therefore implies that

$$\|(dF_{\alpha+\eta} - dF_{\alpha+\eta+\zeta}) - d^2F_\alpha(-\zeta)\| \le \epsilon(\|\eta\| + \|\eta + \zeta\|) \le 2\epsilon(\|\eta\| + \|\zeta\|),$$

provided that both $\eta$ and $\eta + \zeta$ have norms at most $\delta$. We shall take $\|\zeta\| \le \delta/3$ and $\|\eta\| \le 2\delta/3$. If we hold $\zeta$ fixed, and if we set $T = d^2F_\alpha(-\zeta)$ and $G(\xi) = F(\xi) - F(\xi + \zeta)$, then this inequality becomes

$$\|dG_{\alpha+\eta} - T\| \le 2\epsilon(\|\eta\| + \|\zeta\|),$$

and since it holds whenever $\|\eta\| \le 2\delta/3$, we can apply the corollary to Theorem 7.4 and conclude that

$$\|\Delta G_{\alpha+\eta}(\xi) - T(\xi)\| \le 2\epsilon(\|\eta\| + \|\zeta\|)\|\xi\|,$$

provided that $\eta$ and $\eta + \xi$ have norms at most $2\delta/3$. This inequality therefore holds if $\eta$, $\xi$, and $\zeta$ all have norms at most $\delta/3$. If we now set $\xi = -\eta$, we have

$$T(\xi) = d^2F_\alpha(-\zeta)(-\eta) = \big(d^2F_\alpha(\zeta)\big)(\eta)$$

and

$$\Delta G_{\alpha+\eta}(-\eta) = F(\alpha + \eta + \zeta) - F(\alpha + \eta) - F(\alpha + \zeta) + F(\alpha).$$

This function of $\eta$ and $\zeta$ is called the second difference of $F$ at $\alpha$, and is designated $\Delta^2F_\alpha(\eta, \zeta)$. Note that it is symmetric in $\zeta$ and $\eta$. Our final inequality can now be rewritten as

$$\|\Delta^2F_\alpha(\eta, \zeta) - \big(d^2F_\alpha(\zeta)\big)(\eta)\| \le 2\epsilon(\|\eta\| + \|\zeta\|)\|\eta\|.$$

Reversing $\eta$ and $\zeta$, and using the symmetry of $\Delta^2F_\alpha$, we see that

$$\|\big(d^2F_\alpha(\zeta)\big)(\eta) - \big(d^2F_\alpha(\eta)\big)(\zeta)\| \le 2\epsilon(\|\eta\| + \|\zeta\|)^2,$$

provided $\eta$ and $\zeta$ have norms at most $\delta/3$. But now it follows by the usual homogeneity argument that this inequality holds for all $\eta$ and $\zeta$. Finally, since $\epsilon$ is arbitrary, the left-hand side is zero. □

The reader will remember from the elementary calculus that a critical point $a$ for a function $f$ [$f'(a) = 0$] is a relative extremum point if the second derivative $f''(a)$ exists and is not zero.
In fact, if $f''(a) < 0$, then $f$ has a relative maximum at $a$, because $f''(a) < 0$ implies that $f'$ is decreasing in a neighborhood of $a$ and the graph of $f$ is therefore concave down in a neighborhood of $a$. Similarly, $f$ has a relative minimum at $a$ if $f'(a) = 0$ and $f''(a) > 0$. If $f''(a) = 0$, nothing can be concluded.

If $f$ is a real-valued function defined on an open set $A$ in a finite-dimensional vector space $V$, if $\alpha \in A$ is a critical point of $f$, and if $d^2f_\alpha$ exists and is a nonsingular element of $\mathrm{Hom}(V, V^*)$, then we can draw similar conclusions about the behavior of $f$ near $\alpha$, only now there is a richer variety of possibilities. The reader is probably already familiar with what happens for a function $f$ from $\mathbb{R}^2$ to $\mathbb{R}$. Then $\mathbf a$ may be a relative maximum point (a "cap" point on the graph of $f$), a relative minimum point, or a saddle point as shown in Fig. 3.13 for the graph of the translated function $\Delta f_{\mathbf a}$. However, it must be realized that new axes may have to be chosen for the orientation of the saddle to the axes to look as shown. Replacing $f$ by $\Delta f_{\mathbf a}$ amounts to supposing that $0$ is the critical point and that $f(0) = 0$. Note that if $0$ is a saddle point, then there are two complementary subspaces, the coordinate axes in Fig. 3.13, such that $0$ is a relative maximum for $f$ when $f$ is restricted to one of them, and a relative minimum point for the restriction of $f$ to the other.

[Fig. 3.13]

We shall now investigate the general case and find that it is just like the two-dimensional case except that when there is a saddle point the subspace on which the critical point is a maximum point may have any dimension from $1$ to $n - 1$ [where $d(V) = n$]. Moreover, this dimension is exactly the number of $-1$'s in the standard orthonormal basis representation of the quadratic form $q(\xi) = d^2f_\alpha(\xi, \xi)$.

Our hypotheses, then, are that $f$ is a continuously differentiable real-valued function on an open subset $A$ of a finite-dimensional normed linear space $V$, that $\alpha \in A$ is a critical point for $f$ ($df_\alpha = 0$), and that the mapping $d^2f_\alpha: V \to V^*$ exists and is nonsingular. This last hypothesis is equivalent to assuming that the bilinear form $\omega(\xi, \eta) = d^2f_\alpha(\xi, \eta)$ has a nonsingular matrix with respect to any basis for $V$. We now use Theorem 7.1 of Chapter 2 to choose an $\omega$-orthonormal basis $\{\alpha_i\}_1^n$. Remember that this means that $\omega(\alpha_i, \alpha_j) = 0$ if $i \ne j$, $\omega(\alpha_i, \alpha_i) = 1$ for $i = 1, \ldots, p$, and $\omega(\alpha_i, \alpha_i) = -1$ for $i = p + 1, \ldots, n$.
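The count of $+1$'s and $-1$'s in an $\omega$-orthonormal basis can be computed without finding the basis itself. The sketch below is an illustration only, under stated assumptions: the helper names `inertia` and `classify` are hypothetical, the matrix is the symmetric matrix of $\omega$ in some basis, and the elimination is done without pivoting, so it assumes that no zero pivot is met. It relies on the standard fact (Sylvester's law of inertia) that the signs of the pivots produced by symmetric elimination agree with the signs appearing in any $\omega$-orthonormal basis.

```python
def inertia(matrix, tol=1e-12):
    """Return (p, q): the numbers of positive and negative pivots.

    Symmetric Gaussian elimination without pivoting; this simple sketch
    assumes every pivot encountered is nonzero.
    """
    a = [[float(entry) for entry in row] for row in matrix]
    n = len(a)
    positive = negative = 0
    for k in range(n):
        pivot = a[k][k]
        if abs(pivot) < tol:
            raise ValueError("zero pivot: a different basis order is needed")
        if pivot > 0:
            positive += 1
        else:
            negative += 1
        # Replace the trailing block by its Schur complement.
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    return positive, negative


def classify(second_differential_matrix):
    """Type of a nondegenerate critical point from the matrix of d^2 f."""
    p, q = inertia(second_differential_matrix)
    if q == 0:
        return "relative minimum"
    if p == 0:
        return "relative maximum"
    return "saddle (maximum on a subspace of dimension %d)" % q


# f(x, y) = x^2 - y^2 at the origin has second-partials matrix [[2, 0], [0, -2]].
print(classify([[2, 0], [0, -2]]))
```

Here `q`, the number of negative pivots, plays the role of the dimension of the subspace on which the critical point is a maximum.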
There cannot be any $0$ values for $\omega(\alpha_i, \alpha_i)$ because the matrix $t_{ij} = \omega(\alpha_i, \alpha_j)$ is nonsingular: if $\omega(\alpha_i, \alpha_i) = 0$, then the whole $i$th column is zero, the column space has dimension $\le n - 1$, and the matrix is singular. We can use the basis isomorphism $\varphi$ to replace $V$ by $\mathbb{R}^n$ (i.e., replace $f$ by $f \circ \varphi$), and we can therefore suppose that $V = \mathbb{R}^n$ and that the standard basis is
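$\omega$-orthonormal, with $\omega(\mathbf x, \mathbf y) = \sum_1^p x_iy_i - \sum_{p+1}^n x_iy_i$. Since

$$\omega(\delta^i, \delta^j) = d^2f_{\mathbf a}(\delta^i, \delta^j) = D_{\delta^i}D_{\delta^j}f(\mathbf a) = \frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf a),$$

our hypothesis of $\omega$-orthonormality is that $(\partial^2f/\partial x_i\,\partial x_j)(\mathbf a) = 0$ for $i \ne j$, $\partial^2f/\partial x_i^2 = 1$ for $i = 1, \ldots, p$, and $\partial^2f/\partial x_i^2 = -1$ for $i = p + 1, \ldots, n$. Since $p$ can have any value from $0$ to $n$, there are $n + 1$ possibilities. We show first that if $p = n$, then $\mathbf a$ is a relative minimum of $f$. In this case the quadratic form $q$ is said to be positive definite, since $q(\mathbf x) = \omega(\mathbf x, \mathbf x)$ is positive for every nonzero $\mathbf x$. We also say that the bilinear form $\omega(\mathbf x, \mathbf y) = d^2f_{\mathbf a}(\mathbf x, \mathbf y)$ is positive definite, and, in the language of Chapter 5, that $\omega$ is a scalar product.

Theorem 16.4. Let $f$ be a continuously differentiable real-valued function defined on an open subset $A$ of $\mathbb{R}^n$, and let $\mathbf a \in A$ be a critical point of $f$ at which $d^2f_{\mathbf a}$ exists and is positive definite. Then $f$ has a relative minimum at $\mathbf a$.

Proof. We suppose, as above, that the standard basis $\{\delta^i\}_1^n$ is $\omega$-orthonormal. By the definition of $d^2f_{\mathbf a}$, given $\epsilon$, there is a $\delta$ such that

$$|(df_{\mathbf a+\mathbf y}(\mathbf x) - df_{\mathbf a}(\mathbf x)) - d^2f_{\mathbf a}(\mathbf x, \mathbf y)| \le \epsilon\|\mathbf x\|\,\|\mathbf y\|$$

whenever $\|\mathbf y\| \le \delta$. Now $df_{\mathbf a} = 0$, since $\mathbf a$ is a critical point of $f$, and $d^2f_{\mathbf a}(\mathbf x, \mathbf y) = \sum_1^n x_iy_i$ by the assumption that $\{\delta^i\}_1^n$ is $\omega$-orthonormal. Therefore, if we use the two-norm on $\mathbb{R}^n$ and set $\mathbf y = t\mathbf x$, we have

$$|df_{\mathbf a+t\mathbf x}(\mathbf x) - t\|\mathbf x\|^2| \le \epsilon t\|\mathbf x\|^2.$$

Also, if $h(t) = f(\mathbf a + t\mathbf x)$, then $h'(s) = df_{\mathbf a+s\mathbf x}(\mathbf x)$, and this inequality therefore says that $(1 - \epsilon)t\|\mathbf x\|^2 \le h'(t) \le (1 + \epsilon)t\|\mathbf x\|^2$. Integrating, and remembering that $h(1) - h(0) = f(\mathbf a + \mathbf x) - f(\mathbf a) = \Delta f_{\mathbf a}(\mathbf x)$, we have

$$\Big(\frac{1 - \epsilon}{2}\Big)\|\mathbf x\|^2 \le \Delta f_{\mathbf a}(\mathbf x) \le \Big(\frac{1 + \epsilon}{2}\Big)\|\mathbf x\|^2$$

whenever $\|\mathbf x\| \le \delta$. This shows not only that $\mathbf a$ is a relative minimum point but also that $\Delta f_{\mathbf a}$ lies between two very close paraboloids when $\mathbf x$ is sufficiently small. □

The above argument will work just as well in general. If

$$q(\mathbf x) = \sum_1^p x_i^2 - \sum_{p+1}^n x_i^2$$

is the quadratic form of the second differential and $\|\mathbf x\|^2 = \sum_1^n x_i^2$, then replacing $\|\mathbf x\|^2$ inside the absolute values in the above inequalities by $q(\mathbf x)$, we conclude that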
$$\tfrac12\big(q(\mathbf x) - \epsilon\|\mathbf x\|^2\big) \le \Delta f_{\mathbf a}(\mathbf x) \le \tfrac12\big(q(\mathbf x) + \epsilon\|\mathbf x\|^2\big),$$

or

$$\frac12\Big(\sum_1^p (1 - \epsilon)x_i^2 - \sum_{p+1}^n (1 + \epsilon)x_i^2\Big) \le \Delta f_{\mathbf a}(\mathbf x) \le \frac12\Big(\sum_1^p (1 + \epsilon)x_i^2 - \sum_{p+1}^n (1 - \epsilon)x_i^2\Big).$$

This shows that $\Delta f_{\mathbf a}$ lies between two very close quadratic surfaces of the same type when $\|\mathbf x\| \le \delta$. If $1 \le p \le n - 1$ and $\mathbf a = 0$, then $f$ has a relative minimum on the subspace $V_1 = L(\{\delta^i\}_1^p)$ and a relative maximum on the complementary space $V_2 = L(\{\delta^i\}_{p+1}^n)$.

According to our remarks at the end of Section 2.7, we can read off the type of a critical point for a function of two variables without orthonormalizing by looking at the determinant of the matrix of the (assumed nonsingular) form $d^2f_{\mathbf a}$. This determinant is

$$t_{11}t_{22} - (t_{12})^2 = \frac{\partial^2f}{\partial x_1^2}\,\frac{\partial^2f}{\partial x_2^2} - \Big(\frac{\partial^2f}{\partial x_1\,\partial x_2}\Big)^2.$$

If it is positive, then $\mathbf a$ is either a relative minimum or a relative maximum. We can tell which by following $f$ along a single line, say the $x_1$-axis. Thus, if $\partial^2f/\partial x_1^2 < 0$, then $\mathbf a$ is a relative maximum point. On the other hand, if the above expression is negative, then $\mathbf a$ is a saddle point.

It is important for the calculus of variations that Theorem 16.4 remains true when the domain space is replaced by a space of the general type that we shall study in the next chapter, called a Banach space. The hypotheses now are that $\alpha$ is a critical point of $f$, that $q(\xi) = d^2f_\alpha(\xi, \xi)$ is positive definite, and that the scalar product norm $q^{1/2}$ (see Chapter 5) is equivalent to the given norm on $V$. The proof remains virtually unchanged.

*17. HIGHER ORDER DIFFERENTIALS. THE TAYLOR FORMULA

We have seen that if $V$ and $W$ are normed linear spaces, and if $F$ is a differentiable mapping from an open set $A$ in $V$ to $W$, then its differential $dF = dF_{(\cdot)}$ is a mapping from $A$ to $\mathrm{Hom}(V, W)$. If this mapping is differentiable on $A$, then its differential $d(dF) = d(dF)_{(\cdot)}$ is a mapping from $A$ to $\mathrm{Hom}(V, \mathrm{Hom}(V, W))$.
We remember that an element of $\mathrm{Hom}(V, \mathrm{Hom}(V, W))$ is equivalent by duality to a bilinear mapping from $V \times V$ to $W$, and if we designate the space of all such bilinear mappings by $\mathrm{Hom}^2(V, W)$, then $d(dF)$ can be considered to be from $A$ to $\mathrm{Hom}^2(V, W)$. We write $d(dF) = d^2F$, and call this mapping the second differential of $F$. In Section 16 we saw that $d^2F_\alpha(\xi, \eta) = D_\xi(D_\eta F)(\alpha)$, and that if $V = \mathbb{R}^n$, then

$$d^2F_{\mathbf a}(\mathbf b, \mathbf c) = D_{\mathbf b}D_{\mathbf c}F(\mathbf a) = \sum_{i,j} b_ic_j\,\frac{\partial^2F}{\partial x_i\,\partial x_j}(\mathbf a).$$

The differentials of higher order are defined in the same way. If $d^2F: A \to \mathrm{Hom}^2(V, W)$
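is differentiable on $A$, then its differential, $d(d^2F) = d^3F$, is from $A$ to $\mathrm{Hom}(V, \mathrm{Hom}^2(V, W)) = \mathrm{Hom}^3(V, W)$, the space of all trilinear mappings from $V^3 = V \times V \times V$ to $W$. Continuing inductively, we arrive at the notion of the $n$th differential of $F$ on $A$ as a mapping from $A$ to $\mathrm{Hom}(V, \mathrm{Hom}^{n-1}(V, W)) = \mathrm{Hom}^n(V, W)$, the space of all $n$-linear mappings from $V^n$ to $W$. The theorem that $d^2F_\alpha$ is a symmetric element of $\mathrm{Hom}^2(V, W)$ extends inductively to show that $d^nF_\alpha$ is a symmetric element of $\mathrm{Hom}^n(V, W)$. We shall omit this proof.

Our theorem on the evaluation of the second differential by mixed directional derivatives also generalizes by induction to give

$$D_{\xi_1}D_{\xi_2}\cdots D_{\xi_n}F(\alpha) = d^nF_\alpha(\xi_1, \ldots, \xi_n),$$

for starting from the left-hand term, we have

$$D_{\xi_1}\big((D_{\xi_2}\cdots D_{\xi_n}F)(\cdot)\big)\big|_\alpha = d\big(D_{\xi_2}\cdots D_{\xi_n}F(\cdot)\big)_\alpha(\xi_1) = d\big(\mathrm{ev}_{\langle\xi_2, \ldots, \xi_n\rangle} \circ d^{n-1}F_{(\cdot)}\big)_\alpha(\xi_1) = \big[\mathrm{ev}_{\langle\xi_2, \ldots, \xi_n\rangle} \circ d(d^{n-1}F_{(\cdot)})_\alpha\big](\xi_1)$$
$$= \mathrm{ev}_{\langle\xi_2, \ldots, \xi_n\rangle}\big(d^nF_\alpha(\xi_1)\big) = \big(d^nF_\alpha(\xi_1)\big)(\xi_2, \ldots, \xi_n) = d^nF_\alpha(\xi_1, \ldots, \xi_n).$$

If $V = \mathbb{R}^n$, then our conclusions about partial derivatives extend inductively in the same way to show that $F$ has continuous differentials on $A$ up through order $m$ if and only if all the $m$th-order partial derivatives $\partial^mF/\partial x_{i_1}\cdots\partial x_{i_m}$ exist and are continuous on $A$, with

$$d^mF_{\mathbf a}(\mathbf c^1, \ldots, \mathbf c^m) = \sum_{i_1, \ldots, i_m = 1}^{n} c^1_{i_1}\cdots c^m_{i_m}\,\frac{\partial^mF}{\partial x_{i_1}\cdots\partial x_{i_m}}(\mathbf a).$$

We now consider the behavior of $F$ along the line $t \mapsto \alpha + t\eta$, where, of course, $\alpha$ and $\eta$ are fixed. If $\lambda(t) = F(\alpha + t\eta)$, then we can prove by induction that

$$\frac{d^j\lambda}{dt^j} = (D_\eta)^jF(\alpha + t\eta).$$

We know this to be true for $j = 1$ by Theorem 7.2, and assuming it for $j = m$, we have, by the same theorem,

$$\frac{d^{m+1}\lambda}{dt^{m+1}} = \big(\lambda^{(m)}\big)'(t) = d(D_\eta^mF)_{\alpha+t\eta}(\eta) = D_\eta(D_\eta^mF)(\alpha + t\eta) = D_\eta^{m+1}F(\alpha + t\eta).$$

Now suppose that $F$ is real-valued ($W = \mathbb{R}$). We then have Taylor's formula:

$$\lambda(t) = \lambda(0) + t\lambda'(0) + \cdots + \frac{t^m}{m!}\lambda^{(m)}(0) + \frac{t^{m+1}}{(m+1)!}\lambda^{(m+1)}(kt)$$

for some $k$ between $0$ and $1$. Taking $t = 1$ and substituting from above, we have

$$F(\alpha + \eta) = F(\alpha) + D_\eta F(\alpha) + \cdots + \frac{1}{m!}D_\eta^mF(\alpha) + \frac{1}{(m+1)!}D_\eta^{m+1}F(\alpha + k\eta),$$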
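which is the general Taylor formula in the normed linear space context. In terms of differentials, it is

$$F(\alpha + \eta) = F(\alpha) + dF_\alpha(\eta) + \cdots + \frac{1}{m!}d^mF_\alpha(\eta, \ldots, \eta) + \frac{1}{(m+1)!}d^{m+1}F_{\alpha+k\eta}(\eta, \ldots, \eta).$$

If $V = \mathbb{R}^n$, then $D_{\mathbf y}G = \sum_1^n y_i\,\partial G/\partial x_i$, and so the general term in the Taylor expansion is

$$\frac{1}{m!}\Big(\sum_1^n y_i\frac{\partial}{\partial x_i}\Big)^mF = \frac{1}{m!}\sum_{i_1, \ldots, i_m = 1}^{n} y_{i_1}\cdots y_{i_m}\,\frac{\partial^mF}{\partial x_{i_1}\cdots\partial x_{i_m}}.$$

If $m = n = 2$, and if we use the notation $\mathbf x = \langle x, y\rangle$, $\mathbf s = \langle s, t\rangle$, then

$$\frac{1}{2!}D_{\mathbf s}^2F = \frac12\Big(s^2\frac{\partial^2F}{\partial x^2} + 2st\frac{\partial^2F}{\partial x\,\partial y} + t^2\frac{\partial^2F}{\partial y^2}\Big).$$

The above description is logically simple, but it is inefficient in that it repeats identical terms such as $y_1y_2(\partial^2F/\partial x_1\,\partial x_2)$ and $y_2y_1(\partial^2F/\partial x_2\,\partial x_1)$. We conclude by describing for the interested reader the modern "multi-index" notation for this very complicated situation. Remember that we are looking at the $m$th term of the Taylor formula for $F$, and that $F$ has $n$ variables. For any $n$-tuple $\mathbf k = \langle k_1, \ldots, k_n\rangle$ of nonnegative integers, we define $|\mathbf k|$ as $\sum_1^n k_i$, and for $\mathbf x \in \mathbb{R}^n$, we set $\mathbf x^{\mathbf k} = x_1^{k_1}x_2^{k_2}\cdots x_n^{k_n}$. Also we set $F_{\mathbf k} = F_{k_1k_2\cdots k_n}$, or better, if $D_jF = \partial F/\partial x_j$, we set $D^{\mathbf k}F = D_1^{k_1}D_2^{k_2}\cdots D_n^{k_n}F = F_{\mathbf k}$. Finally, we set $\mathbf k! = k_1!\,k_2!\cdots k_n!$, and if $p \ge |\mathbf k|$, we set $\binom{p}{\mathbf k} = p!/\mathbf k!\,(p - |\mathbf k|)!$. Then the $m$th term of the Taylor expansion of $F$ is

$$\frac{1}{m!}\sum_{|\mathbf k| = m}\binom{m}{\mathbf k}D^{\mathbf k}F(\mathbf a)\,\mathbf x^{\mathbf k},$$

which is surely a notational triumph.

The general Taylor formula is too cumbersome to be of much use in practice; it is principally of theoretical value. The Taylor expansions that we actually compute are generally found by other means, such as substitution of a polynomial (or power series) in a power series. For example,

$$\sin(x + y^2) = (x + y^2) - \frac{(x + y^2)^3}{3!} + \frac{(x + y^2)^5}{5!} - \cdots = x + y^2 - \frac{x^3}{3!} - \frac{x^2y^2}{2} + \Big(\frac{x^5}{5!} - \frac{xy^4}{2}\Big) + \Big(\frac{x^4y^2}{4!} - \frac{y^6}{3!}\Big) + \cdots.$$

A mapping from $A$ to $W$ which has continuous differentials of all orders through $k$ is said to be of class $C^k$ on $A$, and the collection of all such mappings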
is designated $C^k(A, W)$ or $\mathcal{C}^k(A, W)$. It is clear that $C^k(A, W)$ is a vector space (induction). Moreover, it can also be shown by induction that a composition of $C^k$-maps is itself of class $C^k$. This depends on recognizing the general form of the $m$th differential of a composition $F \circ G$ as being a finite sum, each term of which is a composition of functions chosen from $F, dF, \ldots, d^mF, G, dG, \ldots, d^mG$. Functions of many variables are involved in these calculations, and it is simplest to treat each as a function of a single $n$-tuplet variable and to apply the obvious corollary of Theorem 8.1 that if $G^1, \ldots, G^n$ are of class $C^k$, then so is $G = \langle G^1, \ldots, G^n\rangle$, with $d^kG = \langle d^kG^1, \ldots, d^kG^n\rangle$. As a special case of composition, we can conclude that a product of $C^k$-maps is of class $C^k$.

We shall see in the next chapter that $\varphi: T \mapsto T^{-1}$ is a differentiable map on the open set of invertible elements in $\mathrm{Hom}\,V$ (if $V$ is a Banach space) and that $d\varphi_T(H) = -T^{-1}HT^{-1}$. Since $\langle S, H, T\rangle \mapsto S^{-1}HT^{-1}$ then has continuous partial differentials, we can continue, and another induction shows that $\varphi$ is of class $C^k$ for every $k$ and that $d^m\varphi_T(H_1, \ldots, H_m)$ is a finite sum of finite products of $T^{-1}, H_1, \ldots, H_m$. It then follows that a function $F$ defined implicitly by a $C^k$-function $G$ is also of class $C^k$, for its differential, as computed in the implicit-function theorem, is then a composition of maps of class $C^{k-1}$.

A mapping $F$ which is of class $C^k$ for all $k$ is said to be of class $C^\infty$, and it follows from our remarks above that the family of $C^\infty$-maps is closed under all the operations that we have met in the calculus. If the domain of $F$ is an open set in $\mathbb{R}^n$, then $F \in \mathcal{C}^\infty(A, W)$ if and only if all the partial derivatives of $F$ exist and are continuous on $A$.
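The saving offered by the multi-index notation of this section can be seen in a small computation. The sketch below is illustrative only: the function names and the sample function are hypothetical, and the derivatives are supplied by a callable `partial(k)` that is assumed to return $D^{\mathbf k}F(\mathbf a)$ for a multi-index $\mathbf k$. It evaluates the $m$th Taylor term twice, once by the repeated sum over all index strings $i_1, \ldots, i_m$ and once by the sum over multi-indices with $|\mathbf k| = m$, in which $\frac{1}{m!}\binom{m}{\mathbf k}$ reduces to $1/\mathbf k!$.

```python
import math
from itertools import product


def multi_indices(n, m):
    """Yield all n-tuples of nonnegative integers whose entries sum to m."""
    if n == 1:
        yield (m,)
        return
    for first in range(m + 1):
        for rest in multi_indices(n - 1, m - first):
            yield (first,) + rest


def taylor_term_multi_index(partial, x, m):
    """Sum over |k| = m of D^k F(a) x^k / k!."""
    total = 0.0
    for k in multi_indices(len(x), m):
        k_factorial = math.prod(math.factorial(ki) for ki in k)
        x_power = math.prod(xi ** ki for xi, ki in zip(x, k))
        total += partial(k) * x_power / k_factorial
    return total


def taylor_term_repeated(partial, x, m):
    """(1/m!) times the sum over all index strings (i_1, ..., i_m)."""
    n = len(x)
    total = 0.0
    for indices in product(range(n), repeat=m):
        k = tuple(indices.count(j) for j in range(n))  # which partial this is
        total += partial(k) * math.prod(x[i] for i in indices)
    return total / math.factorial(m)


# Hypothetical sample: F(x, y) = exp(x + 2y) at the origin, so D^k F(0) = 2**k[1].
def partial(k):
    return 2.0 ** k[1]


point = (0.3, -0.2)
print(taylor_term_multi_index(partial, point, 3), taylor_term_repeated(partial, point, 3))
```

For this sample function the $m$th term is $(x + 2y)^m/m!$, which gives an independent check. The repeated sum visits $n^m$ index strings, while the multi-index sum visits only the distinct partial derivatives.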
CHAPTER 4

COMPACTNESS AND COMPLETENESS

In this chapter we shall investigate two properties of subsets of a normed linear space $V$ which are concerned with the fact that in a certain sense all the points which ought to be there really are there. These notions are largely independent of the algebraic structure of $V$, and we shall therefore study them in their own most natural setting, that of metric spaces. The stronger of these two properties, compactness, helps to explain why the theory of finite-dimensional spaces is so simple and satisfactory. The weaker property, completeness, is shared by important infinite-dimensional normed linear spaces, and allows us to treat these spaces in almost as satisfactory a way.

It is these properties that save the calculus from being largely a formal theory. They allow us to define crucial elements by limiting processes, and are responsible, for example, for an infinite series having a sum, a continuous real-valued function assuming a maximum value, and a definite integral existing. For the real number system itself, the compactness property is equivalent to the least upper bound property, which has already been an absolutely essential tool in our construction of the differential calculus in Chapter 3.

In Sections 8 through 10 we shall apply completeness to the calculus. The first of these sections is devoted to the existence and differentiability of functions defined by power series, and since we want to include power series in an operator $T$, we shall take the occasion to introduce and exploit the notion of a Banach algebra. Next we shall prove the contraction mapping fixed-point theorem, which is the missing ingredient in our unfinished proof of the implicit-function theorem in Chapter 3 and which will be the basis for the fundamental existence and uniqueness theorem for ordinary differential equations in Chapter 6.
In Section 10 we shall prove a simple extension theorem for linear mappings into a complete normed linear space and apply it to construct the Riemann integral of a parametrized arc.

1. METRIC SPACES; OPEN AND CLOSED SETS

In the preceding chapter we occasionally treated questions of convergence and continuity in situations where the domain was an arbitrary subset $A$ of a normed linear space $V$. In such discussions the algebraic structure of $V$ fades into the background, and the vector operations of $V$ are used only to produce the combination $\|\alpha - \xi\|$, which is interpreted as the distance from $\alpha$ to $\xi$. If we distill out of these contexts what is essential to the convergence and continuity arguments, we find that we need a space $A$ and a function $\rho: A \times A \to \mathbb{R}$, $\rho(x, y)$ being called the distance from $x$ to $y$, such that

1) $\rho(x, y) > 0$ if $x \ne y$, and $\rho(x, x) = 0$;
2) $\rho(x, y) = \rho(y, x)$ for all $x, y \in A$;
3) $\rho(x, z) \le \rho(x, y) + \rho(y, z)$ for all $x, y, z \in A$.

Any set $A$ together with such a function $\rho$ from $A \times A$ to $\mathbb{R}$ is called a metric space; the function $\rho$ is the metric. It is obvious that a normed linear space is a metric space under the norm metric $\rho(\alpha, \beta) = \|\alpha - \beta\|$ and that any subset $B$ of a metric space $A$ is itself a metric space under $\rho \restriction B \times B$.

If we start with a nice intuitive space, like $\mathbb{R}^n$ under one of its standard norms, and choose a weird subset $B$, it will be clear that a metric space can be a very odd object, and may fail to have almost any property one can think of.

Metric spaces very often arise in practice as subsets of normed linear spaces with the norm metric, but they come from other sources too. Even in the normed linear space context, metrics other than the norm metric are used. For example, $S$ might be a two-dimensional spherical surface in $\mathbb{R}^3$, say $S = \{\mathbf x : \sum_1^3 x_i^2 = 1\}$, and $\rho(\mathbf x, \mathbf y)$ might be the great circle distance from $\mathbf x$ to $\mathbf y$. Or, more generally, $S$ might be any smooth two-dimensional surface in $\mathbb{R}^3$, and $\rho(\mathbf x, \mathbf y)$ might be the length of the shortest curve connecting $\mathbf x$ to $\mathbf y$ in $S$.

In this chapter we shall adopt the metric space context for our arguments wherever it is appropriate, so that the student may become familiar with this more general but very intuitive notion. We begin by reproducing the basic definitions in the language of metrics. Because the scalar-vector dichotomy is not a factor in this context, we shall drop our convention that points be represented by Greek or boldface roman letters and shall use whatever letters we wish.

Definition.
If $X$ and $Y$ are metric spaces, then $f: X \to Y$ is continuous at $a \in X$ if for every $\epsilon$ there is a $\delta$ such that

$$\rho(x, a) < \delta \Rightarrow \rho(f(x), f(a)) < \epsilon.$$

Here we have used the same symbol '$\rho$' for metrics on different spaces, just as earlier we made ambiguous use of the norm symbol.

Definition. The (open) ball of radius $r$ about $p$, $B_r(p)$, is simply the set of points whose distance from $p$ is less than $r$:

$$B_r(p) = \{x : \rho(x, p) < r\}.$$

Definition. A subset $A \subset X$ is open if every point $p$ in $A$ is the center of some ball included in $A$, that is, if

$$(\forall p \in A)(\exists r > 0)(B_r(p) \subset A).$$
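The three metric axioms and the definition of a ball can be tried out on a finite sample of points. The following sketch is an illustration under stated assumptions, not a proof: the names `great_circle`, `satisfies_metric_axioms`, and `ball` are hypothetical, the sample points are arbitrary unit vectors in $\mathbb{R}^3$, and checking finitely many points can only refute the axioms, never establish them for the whole space. It uses the great circle distance on the unit sphere as the metric.

```python
import math
from itertools import product


def great_circle(x, y):
    """Angle between unit vectors x and y in R^3 (distance on the unit sphere)."""
    cross = (x[1] * y[2] - x[2] * y[1],
             x[2] * y[0] - x[0] * y[2],
             x[0] * y[1] - x[1] * y[0])
    dot = sum(a * b for a, b in zip(x, y))
    return math.atan2(math.sqrt(sum(c * c for c in cross)), dot)


def satisfies_metric_axioms(rho, points, tol=1e-12):
    """Check axioms (1)-(3) on a finite sample, allowing rounding error tol."""
    for x, y in product(points, repeat=2):
        d = rho(x, y)
        if x == y and abs(d) > tol:
            return False            # rho(x, x) must be 0
        if x != y and d <= 0:
            return False            # distinct points are a positive distance apart
        if abs(d - rho(y, x)) > tol:
            return False            # symmetry
    for x, y, z in product(points, repeat=3):
        if rho(x, z) > rho(x, y) + rho(y, z) + tol:
            return False            # triangle inequality
    return True


def ball(rho, p, r, points):
    """The sample points lying in the open ball B_r(p)."""
    return [x for x in points if rho(x, p) < r]


sphere_points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
                 (-1.0, 0.0, 0.0), (0.0, 0.6, 0.8)]
print(satisfies_metric_axioms(great_circle, sphere_points))
print(ball(great_circle, (1.0, 0.0, 0.0), 2.0, sphere_points))
```

A function such as the squared difference on the real line fails the same check, because it violates the triangle inequality.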
Lemma 1.1. Every ball is open; in fact, if $q \in B_r(p)$ and $\delta = r - \rho(p, q)$, then $B_\delta(q) \subset B_r(p)$.

Proof. This amounts to the triangle inequality. For, if $x \in B_\delta(q)$, then $\rho(x, q) < \delta$ and $\rho(x, p) \le \rho(x, q) + \rho(q, p) < \delta + \rho(p, q) = r$, so that $x \in B_r(p)$. Thus $B_\delta(q) \subset B_r(p)$. □

Lemma 1.2. If $p$ is held fixed, then $\rho(p, x)$ is a continuous function of $x$.

Proof. A symbol-by-symbol paraphrase of Lemma 3.1 of Chapter 3 shows that $|\rho(p, x) - \rho(p, y)| \le \rho(x, y)$, so that $\rho(p, x)$ is actually a Lipschitz function with constant 1. □

Theorem 1.1. The family $\mathcal{T}$ of all open subsets of a metric space $S$ has the following properties:

1) The union of any collection of open sets is open; that is, if $\{A_i : i \in I\} \subset \mathcal{T}$, then $\bigcup_{i \in I} A_i \in \mathcal{T}$.
2) The intersection of two open sets is open; that is, if $A, B \in \mathcal{T}$, then $A \cap B \in \mathcal{T}$.
3) $\emptyset, S \in \mathcal{T}$.

Proof. These properties follow immediately from the definition. Thus any point $p$ in $\bigcup_i A_i$ lies in some $A_j$, and therefore, since $A_j$ is open, some ball about $p$ is a subset of $A_j$ and hence of the larger set $\bigcup_i A_i$. □

Corollary. A set is open if and only if it is a union of open balls.

Proof. This follows from the definition of open set, the lemma above, and property (1) of the theorem. □

The union of all the open subsets of an arbitrary set $A$ is an open subset of $A$, by (1), and therefore is the largest open subset of $A$. It is called the interior of $A$ and is designated $A^{\mathrm{int}}$. Clearly, $p$ is in $A^{\mathrm{int}}$ if and only if some ball about $p$ is a subset of $A$, and it is helpful to visualize $A^{\mathrm{int}}$ as the union of all the balls that are in $A$.

Definition. A set $A$ is closed if $A'$ is open.

The theorem above and De Morgan's law (Section 0.11) then yield the following complementary set of properties for closed sets.

Theorem 1.2
1) The intersection of any family of closed sets is closed.
2) The union of two closed sets is closed.
3) $\emptyset$ and $S$ are closed.

Proof. Suppose, for example, that $\{B_i : i \in I\}$ is a family of closed sets.
Then the complement $B_i'$ is open for each $i$, so that $\bigcup_i B_i'$ is open by the above theorem. Also, $\bigcup_i B_i' = (\bigcap_i B_i)'$ by De Morgan's law (see Section 0.11). Thus $\bigcap_i B_i$ is the complement of an open set and is closed. □

Continuing our "complementary" development, we define the closure, $\bar A$, of an arbitrary set $A$ as the intersection of all closed sets including $A$, and we have from (1) above that $\bar A$ is the smallest closed set including $A$. De Morgan's law implies the important identity

$$(\bar A)' = (A')^{\mathrm{int}}.$$

For $F$ is a closed superset of $A$ if and only if its complement $U = F'$ is an open subset of $A'$. By De Morgan's law the complement of the intersection of all such sets $F$ is the union of all such sets $U$. That is, the complement of $\bar A$ is $(A')^{\mathrm{int}}$.

This identity yields a direct characterization of closure:

Lemma 1.3. A point $p$ is in $\bar A$ if and only if every ball about $p$ intersects $A$.

Proof. A point $p$ is not in $\bar A$ if and only if $p$ is in the interior of $A'$, that is, if and only if some ball about $p$ does not intersect $A$. Negating the extreme members of this equivalence gives the lemma. □

Definition. The boundary, $\partial A$, of an arbitrary set $A$ is the difference between its closure and its interior. Thus

$$\partial A = \bar A - A^{\mathrm{int}}.$$

Since $A - B = A \cap B'$, we have the symmetric characterization $\partial A = \bar A \cap \overline{(A')}$. Therefore, $\partial A = \partial(A')$; also, $p \in \partial A$ if and only if every ball about $p$ intersects both $A$ and $A'$.

Example. A ball $B_r(\alpha)$ is an open set. In a normed linear space the closure of $B_r(\alpha)$ is the closed ball about $\alpha$ of radius $r$, $\{\xi : \rho(\xi, \alpha) \le r\}$. This is easily seen from Lemma 1.3. The boundary $\partial B_r(\alpha)$ is then the spherical surface of radius $r$ about $\alpha$, $\{\xi : \rho(\xi, \alpha) = r\}$. If some but not all of the points of this surface are added to the open ball, we obtain a set that is neither open nor closed. The student should expect that a random set he may encounter will be neither open nor closed.

Continuous functions furnish an important source of open and closed sets by the following lemma.

Lemma 1.4.
If $X$ and $Y$ are metric spaces, and if $f$ is a continuous mapping from $X$ to $Y$, then $f^{-1}[A]$ is open in $X$ whenever $A$ is open in $Y$.

Proof. If $p \in f^{-1}[A]$, then $f(p) \in A$, and, since $A$ is open, some ball $B_\epsilon(f(p))$ is a subset of $A$. But the continuity of $f$ at $p$ says exactly that there is a $\delta$ such that $f[B_\delta(p)] \subset B_\epsilon(f(p))$. In particular, $f[B_\delta(p)] \subset A$ and $B_\delta(p) \subset f^{-1}[A]$. Thus for each $p$ in $f^{-1}[A]$ there is a ball about $p$ included in $f^{-1}[A]$, and this set is therefore open. □
Since $f^{-1}[A'] = (f^{-1}[A])'$, we also have the following corollary.

Corollary. If $f: X \to Y$ is continuous, then $f^{-1}[C]$ is closed in $X$ whenever $C$ is closed in $Y$.

The converses of both of these results hold as well. As an example of the use of this lemma, consider for a fixed $\alpha \in X$ the continuous function $f: X \to \mathbb{R}$ defined by $f(\xi) = \rho(\xi, \alpha)$. The sets $(-r, r)$, $[0, r]$, and $\{r\}$ are respectively open, closed, and closed subsets of $\mathbb{R}$. Therefore, their inverse images under $f$ (the ball $B_r(\alpha)$, the closed ball $\{\xi : \rho(\xi, \alpha) \le r\}$, and the spherical surface $\{\xi : \rho(\xi, \alpha) = r\}$) are respectively open, closed, and closed in $X$. In particular, the triangle inequality argument demonstrating directly that $B_r(\alpha)$ is open is now seen to be unnecessary by virtue of the triangle inequality argument that demonstrates the continuity of the distance function (Lemma 1.2).

It is not true that continuous functions take closed sets into closed sets in the forward direction. For example, if $f: \mathbb{R} \to \mathbb{R}$ is the arc tangent function, then $f[\mathbb{R}] = \text{range } f = (-\pi/2, \pi/2)$, which is not a closed subset of $\mathbb{R}$. The reader may feel that this example cheats and that we should only expect the $f$-image of a closed set to be a closed subset of the metric space that is the range of $f$. He might then consider $f(x) = 2x/(1 + x^2)$ from $\mathbb{R}$ to its range $[-1, 1]$. The set of positive integers $\mathbb{Z}^+$ is a closed subset of $\mathbb{R}$, but $f[\mathbb{Z}^+] = \{2n/(1 + n^2)\}_1^\infty$ is not closed in $[-1, 1]$, since $0$ is clearly in its closure.

The distance between two nonempty sets $A$ and $B$, $\rho(A, B)$, is defined as $\mathrm{glb}\,\{\rho(a, b) : a \in A \text{ and } b \in B\}$. If $A$ and $B$ intersect, the distance is zero. If $A$ and $B$ are disjoint, the distance may still be zero. For example, the interior and exterior of a circle in the plane are disjoint open sets whose distance apart is zero. The $x$-axis and (the graph of) the function $f(x) = 1/x$ are disjoint closed sets whose distance apart is zero.
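Both examples above can be explored numerically. The sketch below is only an illustration with hypothetical helper names (`set_distance`, `euclid`), and it works with finite samples, so the minimum it computes is an upper estimate of the greatest lower bound rather than the bound itself. It shows the values $2n/(1 + n^2)$ staying positive while approaching $0$, and the gap between sampled points of the $x$-axis and of the graph of $1/x$ shrinking as the sample extends to the right.

```python
import math


def f(x):
    return 2 * x / (1 + x * x)


def euclid(p, q):
    """Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def set_distance(rho, A, B):
    """Smallest rho(a, b) over the finite samples A and B."""
    return min(rho(a, b) for a in A for b in B)


# The image of the positive integers: never 0, but arbitrarily close to 0.
image = [f(n) for n in range(1, 1001)]
print(min(image))

# Sampled x-axis and sampled graph of 1/x for t = 1, ..., N.
for N in (10, 100):
    axis = [(float(t), 0.0) for t in range(1, N + 1)]
    graph = [(float(t), 1.0 / t) for t in range(1, N + 1)]
    print(N, set_distance(euclid, axis, graph))
```

For these samples the printed distance is $1/N$, attained at the right-hand end, which is consistent with the two closed sets having distance zero although they never meet.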
As we have remarked earlier, a set $A$ is closed if and only if every point not in $A$ is a positive distance from $A$. More generally, for any set $A$ a point $p$ is in $\bar A$ if and only if $\rho(p, A) = 0$.

We list below some simple properties of the distance between subsets of a normed linear space.

1) Distance is unchanged by a translation: $\rho(A, B) = \rho(A + \gamma, B + \gamma)$ (because $\|(\alpha + \gamma) - (\beta + \gamma)\| = \|\alpha - \beta\|$).
2) $\rho(kA, kB) = |k|\,\rho(A, B)$ (because $\|k\alpha - k\beta\| = |k|\,\|\alpha - \beta\|$).
3) If $N$ is a subspace, then the distance from $B$ to $N$ is unchanged when we translate $B$ through a vector in $N$: $\rho(N, B) = \rho(N, B + \eta)$ if $\eta \in N$ (because $N - \eta = N$).
4) If $T \in \mathrm{Hom}(V, W)$, then $\rho(T[A], T[B]) \le \|T\|\,\rho(A, B)$ (because $\|T(\alpha) - T(\beta)\| \le \|T\| \cdot \|\alpha - \beta\|$).

Lemma 1.5. If $N$ is a proper closed subspace and $0 < \epsilon < 1$, there exists an $\alpha$ such that $\|\alpha\| = 1$ and $\rho(\alpha, N) > 1 - \epsilon$.
Proof. Choose any $\beta \notin N$. Then $\rho(\beta, N) > 0$ (because $N$ is closed), and there exists an $\eta \in N$ such that

$$\|\beta - \eta\| < \rho(\beta, N)/(1 - \epsilon)$$

[by the definition of $\rho(\beta, N)$]. Set $\alpha = (\beta - \eta)/\|\beta - \eta\|$. Then $\|\alpha\| = 1$ and

$$\rho(\alpha, N) = \rho(\beta - \eta, N)/\|\beta - \eta\| = \rho(\beta, N)/\|\beta - \eta\| > \rho(\beta, N)(1 - \epsilon)/\rho(\beta, N) = 1 - \epsilon,$$

by (2), (3), and the definition of $\eta$. □

The reader may feel that we ought to be able to improve this lemma. Surely, all we have to do is choose the point in $N$ which is closest to $\beta$, and so obtain $\|\beta - \eta\| = \rho(\beta, N)$, giving finally a vector $\alpha$ such that $\|\alpha\| = 1$ and $\rho(\alpha, N) = 1$. However, this is a matter on which our intuition lets us down: if $N$ is infinite-dimensional, there may not be a closest point $\eta$! For example, as we shall see later in the exercises of Chapter 5, if $V$ is the space $\mathcal{C}([-1, 1])$ under the two-norm $\|f\| = (\int_{-1}^1 f^2)^{1/2}$, and if $N$ is the set of functions $g$ in $V$ such that $\int_0^1 g = 0$, then $N$ is a closed subspace for which we cannot find such a "best" $\alpha$. But if $N$ is finite-dimensional, we can always find such a point, and if $V$ is a Hilbert space (see Chapter 5) we can also.

EXERCISES

1.1 Write out the proof of Lemma 1.2.
1.2 Prove (2) and (3) of Theorem 1.1.
1.3 Prove (2) of Theorem 1.2.
1.4 It is not true that the intersection of a sequence of open sets is necessarily open. Find a counterexample in $\mathbb{R}$.
1.5 Prove the corollary of Lemma 1.4.
1.6 Prove that $p \in \bar A$ if and only if $\rho(p, A) = 0$.
1.7 Let $X$ and $Y$ be metric spaces, and let $f: X \to Y$ have the property that $f^{-1}[B]$ is open in $X$ whenever $B$ is open in $Y$. Prove that $f$ is continuous.
1.8 Show that $\rho(x, A) = \rho(x, \bar A)$.
1.9 Show that $\rho(x, A)$ is a continuous function of $x$. (In fact, it is Lipschitz continuous.)
1.10 Invent metric spaces $S$ (by choosing subsets of $\mathbb{R}^2$) having the following properties:
1) $S$ has $n$ points.
2) $S$ is infinite and $\rho(x, y) \ge 1$ if $x \ne y$.
3) $S$ has a ball $B_1(a)$ such that the closed ball $\{x : \rho(x, a) \le 1\}$ is not the same as the closure of $B_1(a)$.
1.11 Prove that in a normed linear space a closed ball is the closure of the corresponding open ball.
1.12 Show that if $f: X \to Y$ and $g: Y \to Z$ are continuous (where $X$, $Y$, and $Z$ are metric spaces), then so is $g \circ f$.
1.13 Let $X$ and $Y$ be metric spaces. Define the notion of a product metric on $Z = X \times Y$. Define a 1-metric $\rho_1$ and a uniform metric $\rho_\infty$ on $Z$ (showing that they are metrics) in analogy with the 1-norm and uniform norm on a product of normed linear spaces, and show that each is a product metric according to your definition above.
1.14 Do the same for a 2-metric $\rho_2$ on $Z = X \times Y$.
1.15 Let $X$ and $Y$ be metric spaces, and let $V$ be a normed linear space. Let $f: X \to \mathbb{R}$ and $g: Y \to V$ be continuous maps. Prove that $\langle x, y\rangle \mapsto f(x)g(y)$ is a continuous map from $X \times Y$ to $V$.

*2. TOPOLOGY

If $X$ is an arbitrary set and $\mathcal{T}$ is any family of subsets of $X$ satisfying properties (1) through (3) in Theorem 1.1, then $\mathcal{T}$ is called a topology on $X$. Theorem 1.1 thus asserts that the open subsets of a metric space $X$ form a topology on $X$. The subsequent definitions of interior, closed set, and closure were purely topological in the sense that they depended only on the topology $\mathcal{T}$, as were Theorem 1.2 and the identity $(\bar A)' = (A')^{\mathrm{int}}$. The study of the consequences of the existence of a topology is called general topology.

On the other hand, the definitions of balls and continuity given earlier were metric definitions, and therefore part of metric space theory. In metric spaces, then, we have not only the topology, but also our $\epsilon$-definitions of continuity and balls and the spherical characterizations of closure and interior.

The reader may be surprised to be told now that although continuity and convergence were defined metrically, they also have purely topological characterizations and are therefore topological ideas. This is easy to see if one keeps in mind that in a metric space an open set is nothing but a union of balls.
We have:

$f$ is continuous at $p$ if and only if for every open set $A$ containing $f(p)$ there exists an open set $B$ containing $p$ such that $f[B] \subset A$.

This local condition involving behavior around a single point $p$ is more fluently rendered in terms of the notion of neighborhood. A set $A$ is a neighborhood of a point $p$ if $p \in A^{\mathrm{int}}$. Then we have:

$f$ is continuous at $p$ if and only if for every neighborhood $N$ of $f(p)$, $f^{-1}[N]$ is a neighborhood of $p$.

Finally there is an elegant topological characterization of global continuity. Suppose that $S_1$ and $S_2$ are topological spaces. Then $f: S_1 \to S_2$ is continuous
(everywhere) if and only if $f^{-1}[A]$ is open whenever $A$ is open. Also, $f$ is continuous if and only if $f^{-1}[B]$ is closed whenever $B$ is closed. These conditions are not surprising in view of Lemma 1.4.

3. SEQUENTIAL CONVERGENCE

In addition to shifting to the more general point of view of metric space theory, we also want to add to our kit of tools the notion of sequential convergence, which the reader will probably remember from his previous encounter with the calculus. One of the principal reasons why metric space theory is simpler and more intuitive than general topology is that nearly all metric arguments can be presented in terms of sequential convergence, and in this chapter we shall partially make up for our previous neglect of this tool by using it constantly and in preference to other alternatives.

Definition. We say that the infinite sequence $\{x_n\}$ converges to the point $a$ if for every $\epsilon$ there is an $N$ such that

$$n > N \Rightarrow \rho(x_n, a) < \epsilon.$$

We also say that $x_n$ approaches $a$ as $n$ approaches (or tends to) infinity, and we call $a$ the limit of the sequence. In symbols we write $x_n \to a$ as $n \to \infty$, or $\lim_{n\to\infty} x_n = a$. Formally, this definition is practically identical with our earlier definition of function convergence, and where there are parallel theorems the arguments that we use in one situation will generally hold almost verbatim in the other. Thus the proof of Lemma 1.1 of Chapter 3 can be altered slightly to give the following result.

Lemma 3.1. If $\{\xi_i\}$ and $\{\eta_i\}$ are two sequences in a normed linear space $V$, then

$$\xi_i \to \alpha \text{ and } \eta_i \to \beta \Rightarrow \xi_i + \eta_i \to \alpha + \beta.$$

The main difference is that we now choose $N$ as $\max\{N_1, N_2\}$ instead of choosing $\delta$ as $\min\{\delta_1, \delta_2\}$. Similarly:

Lemma 3.2. If $\xi_i \to \alpha$ in $V$ and $x_i \to a$ in $\mathbb{R}$, then $x_i\xi_i \to a\alpha$.

As before, the definition begins with three quantifiers, $(\forall\epsilon)(\exists N)(\forall n)$. A somewhat more idiomatic form can be obtained by rephrasing the definition in terms of balls and the notion of "almost all $n$".
We say that $P(n)$ is true for almost all $n$ if $P(n)$ is true for all but a finite number of integers $n$, or equivalently, if $(\exists N)(\forall n^{>N})P(n)$. Then we see that $\lim x_n = a$ if and only if every ball about $a$ contains almost all the $x_n$.

The following sequential characterization provides probably the most intuitive way of viewing the notion of closure and closed sets.
Theorem 3.1. A point $x$ is in the closure $\bar{A}$ of a set $A$ if and only if there is a sequence $\{x_n\}$ in $A$ converging to $x$. Therefore, a set $A$ is closed if and only if every convergent sequence lying in $A$ has its limit in $A$.

Proof. If $\{x_n\} \subset A$ and $x_n \to x$, then every ball about $x$ contains almost every $x_n$, and so, in particular, intersects $A$. Thus $x \in \bar{A}$ by Lemma 1.3. Conversely, if $x \in \bar{A}$, then every ball about $x$ intersects $A$, and we can construct a sequence in $A$ that converges to $x$ by choosing $x_n$ as any point in $B_{1/n}(x) \cap A$. Since $A$ is closed if and only if $A = \bar{A}$, the second statement of the theorem follows from the first. □

There is also a sequential characterization of continuity which helps greatly in using the notion of continuity in a flexible way. Let $X$ and $Y$ be metric spaces, and let $f$ be any function from $X$ to $Y$.

Theorem 3.2. The function $f$ is continuous at $a$ if and only if, for any sequence $\{x_n\}$ in $X$, if $x_n \to a$, then $f(x_n) \to f(a)$.

Proof. Suppose first that $f$ is continuous at $a$, and let $\{x_n\}$ be any sequence converging to $a$. Then, given any $\epsilon$, there is a $\delta$ such that $\rho(x, a) < \delta \Rightarrow \rho(f(x), f(a)) < \epsilon$, by the continuity of $f$ at $a$, and for this $\delta$ there is an $N$ such that $n > N \Rightarrow \rho(x_n, a) < \delta$, because $x_n \to a$. Combining these implications, we see that given $\epsilon$ we have found $N$ so that $n > N \Rightarrow \rho(f(x_n), f(a)) < \epsilon$. That is, $f(x_n) \to f(a)$.

Now suppose that $f$ is not continuous at $a$. In considering such a negation it is important that implicit universal quantifiers be made explicit. Thus, formally, we are assuming that $\neg(\forall\epsilon)(\exists\delta)(\forall x)(\rho(x, a) < \delta \Rightarrow \rho(f(x), f(a)) < \epsilon)$, that is, that $(\exists\epsilon)(\forall\delta)(\exists x)(\rho(x, a) < \delta \ \&\ \rho(f(x), f(a)) \ge \epsilon)$. Such symbolization will not be necessary after the reader has had some practice in computing logical negations; the experienced thinker will intuit the correct negation without a formal calculation. In any event, we now have a fixed $\epsilon$, and for each $\delta$ of the form $\delta = 1/n$ we can let $x_n$ be a corresponding $x$.
We then have $\rho(x_n, a) < 1/n$ and $\rho(f(x_n), f(a)) \ge \epsilon$ for all $n$. The first inequality shows that $x_n \to a$; the second shows that $f(x_n) \not\to f(a)$. Thus, if $f$ is not continuous at $a$, then the sequential condition is not satisfied. □

The above type of argument is used very frequently and almost amounts to an automatic proof procedure in the relevant situations. We want to prove, say, that $(\forall x)(\exists y)(\forall z)P(x, y, z)$. Arguing by contradiction, we suppose this false, so that $(\exists x)(\forall y)(\exists z)\neg P(x, y, z)$. Then, instead of trying to use all numbers $y$, we let $y$ run through some sequence converging to zero, such as $\{1/n\}$, and we choose
one corresponding $z$, $z_n$, for each such $y$. We end up with $\neg P(x, 1/n, z_n)$ for the given $x$ and all $n$, and we finish by arguing sequentially.

The reader will remember that two norms $p$ and $q$ on a vector space $V$ are equivalent if and only if the identity map $\xi \mapsto \xi$ is continuous from $\langle V, p\rangle$ to $\langle V, q\rangle$ and also from $\langle V, q\rangle$ to $\langle V, p\rangle$. By virtue of the above theorem we now see that:

Theorem 3.3. The norms $p$ and $q$ are equivalent if and only if they yield exactly the same collection of convergent sequences.

Earlier we argued that a norm on a product $V \times W$ of two normed linear spaces should be equivalent to $\|\langle\alpha, \xi\rangle\|_1 = \|\alpha\| + \|\xi\|$. Now with respect to this sum norm it is clear that a sequence $\langle\alpha_n, \xi_n\rangle$ in $V \times W$ converges to $\langle\alpha, \xi\rangle$ if and only if $\alpha_n \to \alpha$ in $V$ and $\xi_n \to \xi$ in $W$. We now see (again by Theorem 3.2) that:

Theorem 3.4. A product norm on $V \times W$ is any norm with the property that $\langle\alpha_n, \xi_n\rangle \to \langle\alpha, \xi\rangle$ in $V \times W$ if and only if $\alpha_n \to \alpha$ in $V$ and $\xi_n \to \xi$ in $W$.

EXERCISES

3.1 Prove that a convergent sequence in a metric space has a unique limit. That is, show that if $x_n \to a$ and $x_n \to b$, then $a = b$.

3.2 Show that $x_n \to x$ in the metric space $X$ if and only if $\rho(x_n, x) \to 0$ in $\mathbb{R}$.

3.3 Prove that if $x_n \to a$ in $\mathbb{R}$ and $x_n \ge 0$ for all $n$, then $a \ge 0$.

3.4 Prove that if $x_n \to 0$ in $\mathbb{R}$ and $|y_n| \le x_n$ for all $n$, then $y_n \to 0$.

3.5 Give detailed $\epsilon$, $N$-proofs of Lemmas 3.1 and 3.2.

3.6 By applying Theorem 3.2, prove that if $X$ is a metric space, $V$ is a normed linear space, and $F$ and $G$ are continuous maps from $X$ to $V$, then $F + G$ is continuous. State and prove the similar theorem for a product $FG$.

3.7 Prove that continuity is preserved under composition by applying Theorem 3.2.

3.8 Show that (the range of) a sequence of points in a metric space is in general not a closed set. Show that it may be a closed set.
3.9 The fact that in a normed linear space the closure of an open ball includes the corresponding closed ball is practically trivial on the basis of Lemma 3.2 and Theorem 3.1. Show that this is so.

3.10 Show directly that if the maximum norm $\|\langle\alpha, \xi\rangle\| = \max\{\|\alpha\|, \|\xi\|\}$ is used on $V = V_1 \times V_2$, then it is true that $\langle\alpha_n, \xi_n\rangle \to \langle\alpha, \xi\rangle$ in $V$ if and only if $\alpha_n \to \alpha$ in $V_1$ and $\xi_n \to \xi$ in $V_2$.
3.11 Show that if $\|\ \|$ is any increasing norm on $\mathbb{R}^2$ (see the remark after Theorem 4.3 of Chapter 3), then $\rho(\langle x_1, y_1\rangle, \langle x_2, y_2\rangle) = \|\langle\rho(x_1, x_2), \rho(y_1, y_2)\rangle\|$ is a metric on the product $X \times Y$ of two metric spaces $X$ and $Y$.

3.12 In the above exercise show that $\langle x_n, y_n\rangle \to \langle x, y\rangle$ in $X \times Y$ if and only if $x_n \to x$ in $X$ and $y_n \to y$ in $Y$. This property would be our minimal requirement for a product metric.

3.13 Defining a product metric as above, use Theorem 3.2 to show that $\langle f, g\rangle: S \to X \times Y$ is continuous if and only if $f: S \to X$ and $g: S \to Y$ are both continuous.

3.14 Let $X$, $Y$, and $Z$ be metric spaces, and let $f: X \times Y \to Z$ be a mapping such that $f(x, y)$ is continuous in the variables separately. Suppose also that the continuity in $x$ is uniform over $y$. That is, suppose that given $\epsilon$ and $x_0$, there is a $\delta$ such that $\rho(x, x_0) < \delta \Rightarrow \rho(f(x, y), f(x_0, y)) < \epsilon$ for every value of $y$. Show that then $f$ is continuous on $X \times Y$.

3.15 Define the function $f$ on the closed unit square $[0, 1] \times [0, 1]$ by $f(0, 0) = 0$, $f(x, y) = \ldots$ [formula illegible in source] if $\langle x, y\rangle \ne \langle 0, 0\rangle$. Then $f$ is continuous as a function of $x$ for each fixed value of $y$, and conversely. Show, however, that $f$ is not continuous at the origin. That is, find a sequence $\langle x_n, y_n\rangle$ converging to $\langle 0, 0\rangle$ in the plane such that $f(x_n, y_n)$ does not converge to 0. This example shows that continuity of a function of two variables is a stronger property than continuity in each variable separately.

4. SEQUENTIAL COMPACTNESS

The reader is probably familiar with the idea of a subsequence. A subsequence of a sequence $\{x_n\}$ is a new sequence $\{y_m\}$ that is formed by selecting an infinite number, but generally not all, of the terms $x_n$ and counting them off in the order of the selected indices. Thus, if $n_1$ is the first selected $n$, $n_2$ the next, and so on, and if we set $y_m = x_{n_m}$, then we obtain the subsequence $\{y_1, y_2, \ldots, y_m, \ldots\} = \{x_{n_1}, x_{n_2}, \ldots, x_{n_m}, \ldots\}$, or $\{y_m\}_m = \{x_{n_m}\}_m$.

Strictly speaking, this counting off of the selected set of indices $n$ is a sequence $m \mapsto n_m$ from $\mathbb{Z}^+$ to $\mathbb{Z}^+$ which preserves order: $n_{m+1} > n_m$ for all $m$. And the subsequence $m \mapsto x_{n_m}$ is the composition of the sequence $n \mapsto x_n$ and the selector sequence. In order to avoid subscripts on subscripts, we may use the notation $n(m)$ instead of $n_m$. In either case we are being conventionally sloppy: we are using the same symbol '$n$' as an integer-valued variable, when we write $x_n$, and as the selector function, when we write $n(m)$ or $n_m$. This is one of the standard notational ambiguities which we tolerate in elementary calculus, because the cure is considered worse than the disease. We could say: let $f$ be a sequence, i.e., a function from $\mathbb{Z}^+$ to $\mathbb{R}$. Then a subsequence of $f$ is a composition $f \circ g$, where $g$ is a mapping from $\mathbb{Z}^+$ to $\mathbb{Z}^+$ such that $g(m + 1) > g(m)$ for all $m$.

If you have grasped the idea of subsequence, you should be able to see that any infinite sequence of 0's and 1's, say $\{0, 1, 0, 0, 1, 0, 0, 0, 1, \ldots\}$, can be obtained as a subsequence of $\{0, 1, 0, 1, 0, 1, \ldots, [1 + (-1)^n]/2, \ldots\}$.

If $x_n \to a$, then it should be clear that every subsequence also converges to $a$. We leave the details as an exercise. On the other hand, if the sequence $\{x_n\}$ does not converge to $a$, then there is an $\epsilon$ such that for every $N$ there is some larger $n$ at which $\rho(x_n, a) \ge \epsilon$. Now we can choose such an $n$ for every $N$, taking care that $n_{N+1} > n_N$, and thus choose a subsequence all of whose terms are at a distance at least $\epsilon$ from $a$. Then this sequence has no subsequence converging to $a$. Thus, if $\{x_n\}$ does not converge to $a$, then it has a subsequence no (sub)subsequence of which converges to $a$. Therefore,

Lemma 4.1. If the sequence $\{x_n\}$ and the point $a$ are such that every subsequence of $\{x_n\}$ has itself a subsequence that converges to $a$, then $x_n \to a$.

This is a wild and unlikely sounding lemma, but we shall use it to prove a most important theorem (Theorem 4.2).

Definition. A subset $A$ of a metric space is sequentially compact if every sequence in $A$ has a subsequence that converges to a point of $A$.

Here, so to speak, we create convergence out of nothing. One would expect a compact set to have very powerful properties, and perhaps suspect that there aren't many such sets. We shall soon see, however, that every bounded closed subset of $\mathbb{R}^n$ is compact, and it is in the theory of finite-dimensional spaces that we most frequently use this notion.
Sequential compactness in infinite-dimensional spaces is a much rarer phenomenon, but when it does occur it is very important, as we shall see in our brief look at Sturm-Liouville theory in Chapter 6.

We begin with a few simple but important general results.

Lemma 4.2. If $A$ is a sequentially compact subset of a metric space $S$, then $A$ is closed and bounded.

Proof. Suppose that $\{x_n\} \subset A$ and that $x_n \to b$. By the compactness of $A$ there exists a subsequence $\{x_{n(i)}\}_i$ that converges to a point $a \in A$. But a subsequence of a convergent sequence converges to the same limit. Therefore, $a = b$ and $b \in A$. Thus $A$ is closed.

Boundedness here will mean lying in some ball about a given point $b$. If $A$ is not bounded, for each $n$ there exists a point $x_n \in A$ such that $\rho(x_n, b) > n$. By compactness a subsequence $\{x_{n(i)}\}_i$ converges to a point $a \in A$, and $\rho(x_{n(i)}, b) \to \rho(a, b)$. This clearly contradicts $\rho(x_{n(i)}, b) > n(i) \ge i$. □
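The description of a subsequence as a composition $f \circ g$ with a strictly increasing selector $g$, given earlier in this section, can be made concrete for the claim about sequences of 0's and 1's. The following is a minimal sketch, not part of the original text: the function names `alternating` and `selector_for` are hypothetical, and only a finite prefix of the target sequence is handled.

```python
def alternating(n):
    """n-th term (n = 1, 2, ...) of 0, 1, 0, 1, ..., i.e. [1 + (-1)^n] / 2."""
    return (1 + (-1) ** n) // 2

def selector_for(target_bits):
    """Return indices g(1) < g(2) < ... such that
    alternating(g(m)) equals the m-th target bit for every m."""
    indices = []
    last = 0
    for bit in target_bits:
        n = last + 1
        # Odd indices carry 0 and even indices carry 1, so at most one
        # extra step is needed to reach an index with the wanted value.
        if alternating(n) != bit:
            n += 1
        indices.append(n)
        last = n
    return indices

target = [0, 1, 0, 0, 1, 0, 0, 0, 1]
g = selector_for(target)                      # the selector sequence
subsequence = [alternating(n) for n in g]     # the composition f o g
print(g)
print(subsequence == target)
```

Because the alternating sequence takes each of the values 0 and 1 at arbitrarily large indices, the selector can always move forward, which is exactly the order-preserving condition $g(m+1) > g(m)$.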
Continuous functions carry compact sets into compact sets. The proof of the following result is left as an exercise.

Theorem 4.1. If $f$ is continuous and $A$ is a sequentially compact subset of its domain, then $f[A]$ is sequentially compact.

A nonempty compact set $A \subset \mathbb{R}$ contains maximum and minimum elements. This is because lub $A$ is the limit of a sequence in $A$, and hence belongs to $A$ itself, since $A$ is closed. Combining this fact with the above theorem, we obtain the following well-known corollary.

Corollary. If $f$ is a continuous real-valued function and dom($f$) is nonempty and sequentially compact, then $f$ is bounded and assumes maximum and minimum values.

The following very useful result is related to the above theorem.

Theorem 4.2. If $f$ is continuous and bijective and dom($f$) is sequentially compact, then $f^{-1}$ is continuous.

Proof. We have to show that if $y_n \to y$ in the range of $f$, and if $x_n = f^{-1}(y_n)$ and $x = f^{-1}(y)$, then $x_n \to x$. It is sufficient to show that every subsequence $\{x_{n(i)}\}_i$ has itself a subsequence converging to $x$ (by Lemma 4.1). But, since dom($f$) is compact, there is a subsequence $\{x_{n(i(j))}\}_j$ converging to some $z$, and the continuity of $f$ implies that $f(z) = \lim_{j\to\infty} f(x_{n(i(j))}) = \lim_{j\to\infty} y_{n(i(j))} = y$. Therefore, $z = f^{-1}(y) = x$, which is what we had to prove. Thus $f^{-1}$ is continuous. □

We now take up the problem of showing that bounded closed sets in $\mathbb{R}^n$ are compact. We first prove it for $\mathbb{R}$ itself and then give an inductive argument for $\mathbb{R}^n$.

A sequence $\{x_n\} \subset \mathbb{R}$ is said to be increasing if $x_n \le x_{n+1}$ for all $n$. It is strictly increasing if $x_n < x_{n+1}$ for all $n$. The notions of a decreasing sequence and a strictly decreasing sequence are obvious. A sequence which is either increasing or decreasing is said to be monotone. The relevance of these notions here lies in the following two lemmas.

Lemma 4.3. A bounded monotone sequence in $\mathbb{R}$ is convergent.

Proof. Suppose that $\{x_n\}$ is increasing and bounded above.
Let $l$ be the least upper bound of its range. That is, $x_n \le l$ for all $n$, but for every $\epsilon$, $l - \epsilon$ is not an upper bound, and so $l - \epsilon < x_N$ for some $N$. Then $n > N \Rightarrow l - \epsilon < x_N \le x_n \le l$, and so $|x_n - l| < \epsilon$. That is, $x_n \to l$ as $n \to \infty$. □

Lemma 4.4. Any sequence in $\mathbb{R}$ has a monotone subsequence.

Proof. Call $x_n$ a peak term if it is greater than or equal to all later terms. If there are infinitely many peak terms, then they obviously form a decreasing
subsequence. On the other hand, if there are only finitely many peak terms, then there is a last one $x_{n_0}$ (or none at all), and then every later term is strictly less than some other still later term. We choose any $n_1$ greater than $n_0$, and then we can choose $n_2 > n_1$ so that $x_{n_1} < x_{n_2}$, etc. Therefore, in this case we can choose a strictly increasing subsequence. We have thus shown that any sequence $\{x_n\}$ in $\mathbb{R}$ has either a decreasing subsequence or a strictly increasing subsequence. □

Putting these two lemmas together, we have:

Theorem 4.3. Every bounded sequence in $\mathbb{R}$ has a convergent subsequence.

Now we can generalize to $\mathbb{R}^n$ by induction.

Theorem 4.4. Every bounded sequence in $\mathbb{R}^n$ has a convergent subsequence (using any product norm, say $\|\ \|_1$).

Proof. The above theorem is the case $n = 1$. Suppose then that the theorem is true for $n - 1$, and let $\{\mathbf{x}^m\}_m$ be a bounded sequence in $\mathbb{R}^n$. Thinking of $\mathbb{R}^n$ as $\mathbb{R}^{n-1} \times \mathbb{R}$, we have $\mathbf{x}^m = \langle\mathbf{y}^m, z_m\rangle$, and $\{\mathbf{y}^m\}_m$ is bounded in $\mathbb{R}^{n-1}$, because if $\mathbf{x} = \langle\mathbf{y}, z\rangle$, then $\|\mathbf{x}\|_1 = \|\mathbf{y}\|_1 + |z| \ge \|\mathbf{y}\|_1$. Therefore, there is a subsequence $\{\mathbf{y}^{n(i)}\}_i$ converging to some $\mathbf{y}$ in $\mathbb{R}^{n-1}$, by the inductive hypothesis. Since $\{z_{n(i)}\}$ is bounded in $\mathbb{R}$, it has a subsequence $\{z_{n(i(p))}\}_p$ converging to some $z$ in $\mathbb{R}$. Of course, the corresponding subsubsequence $\{\mathbf{y}^{n(i(p))}\}_p$ still converges to $\mathbf{y}$ in $\mathbb{R}^{n-1}$, and then $\{\mathbf{x}^{n(i(p))}\}_p$ converges to $\mathbf{x} = \langle\mathbf{y}, z\rangle$ in $\mathbb{R}^n = \mathbb{R}^{n-1} \times \mathbb{R}$, since its two component sequences now converge to $\mathbf{y}$ and $z$, respectively. We have thus found a convergent subsequence of $\{\mathbf{x}^m\}$. □

Theorem 4.5. If $A$ is a bounded closed subset of $\mathbb{R}^n$, then $A$ is sequentially compact (in any product norm).

Proof. If $\{x_n\} \subset A$, then there is a subsequence $\{x_{n(i)}\}_i$ converging to some $x$ in $\mathbb{R}^n$, by Theorem 4.4, and $x$ is in $A$, since $A$ is closed. Thus $A$ is compact. □

We can now fill in one of the minor gaps in the last chapter.

Theorem 4.6. All norms on $\mathbb{R}^n$ are equivalent.
Proof. It is sufficient to prove that an arbitrary norm $\|\ \|$ is equivalent to $\|\ \|_1$. Setting $a = \max\{\|\delta^i\|\}_1^n$, we have

$$\|x\| = \Big\|\sum_1^n x_i\delta^i\Big\| \le \sum_1^n |x_i|\,\|\delta^i\| \le a\|x\|_1,$$

so one of our inequalities is trivial. We also have $\big|\,\|x\| - \|y\|\,\big| \le \|x - y\| \le a\|x - y\|_1$, so $\|x\|$ is a continuous function on $\mathbb{R}^n$ with respect to the one-norm. Now the unit one-sphere $S = \{x : \|x\|_1 = 1\}$ is closed and bounded and so compact (in the one-norm). The restriction of the continuous function $\|x\|$ to this compact set $S$ has a minimum value $m$, and $m$ cannot be zero because $S$ does not contain the zero vector. We thus have $\|x\| \ge m\|x\|_1$ on $S$, and so $\|x\| \ge m\|x\|_1$ on $\mathbb{R}^n$, by homogeneity. Altogether we have found positive constants $a$ and $m$ such that $m\|\ \|_1 \le \|\ \| \le a\|\ \|_1$. □
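The two constants in this proof can be checked numerically in one simple case. The sketch below is an illustration only and rests on assumptions not in the text: the "arbitrary" norm is taken to be the maximum norm, the dimension is $n = 4$, and the value $m = 1/n$ (the minimum of the maximum norm on the unit one-sphere, attained when all coordinates have equal absolute value) is worked out by hand rather than derived from the proof.

```python
import random

def one_norm(x):
    return sum(abs(t) for t in x)

def max_norm(x):
    return max(abs(t) for t in x)

n = 4
# a = max of the compared norm over the standard basis vectors delta^i.
basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
a = max(max_norm(e) for e in basis)   # equals 1 for the maximum norm
m = 1.0 / n                           # assumed minimum on the unit one-sphere

random.seed(0)
tol = 1e-12                           # slack for floating-point rounding
ok = True
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(n)]
    lower = m * one_norm(x)
    upper = a * one_norm(x)
    # Check m * ||x||_1 <= ||x|| <= a * ||x||_1 on a random sample.
    ok = ok and (lower <= max_norm(x) + tol) and (max_norm(x) <= upper + tol)
print(ok)
```

A random sample cannot prove the inequalities, of course; it only illustrates what the constants $a$ and $m$ control.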
Composing with a coordinate isomorphism, we see that all norms on any finite-dimensional vector space are equivalent.

Corollary. If $M$ is a finite-dimensional subspace of the normed linear space $V$, then $M$ is a closed subspace of $V$.

Proof. Suppose that $\{\xi_n\} \subset M$ and $\xi_n \to \alpha \in V$. We have to show that $\alpha$ is in $M$. Now $\{\xi_n\}$ is a bounded subset of $M$, and its closure in $M$ is therefore sequentially compact, by the theorem. Therefore, some subsequence converges to a point $\beta$ in $M$ as well as to $\alpha$, and so $\alpha = \beta \in M$. □

EXERCISES

4.1 Prove by induction that if $f: \mathbb{Z}^+ \to \mathbb{Z}^+$ is such that $f(n + 1) > f(n)$ for all $n$, then $f(n) \ge n$ for all $n$.

4.2 Prove carefully that if $x_n \to a$ as $n \to \infty$, then $x_{n(m)} \to a$ as $m \to \infty$ for any subsequence. The above exercise is useful in this proof.

4.3 Prove that if $\{x_n\}$ is an increasing sequence in $\mathbb{R}$ ($x_{n+1} \ge x_n$ for all $n$), and if $\{x_n\}$ has a convergent subsequence, then $\{x_n\}$ converges.

4.4 Give a more detailed version of the argument that if the sequence $\{x_n\}$ does not converge to $a$, then there is an $\epsilon$ and a subsequence $\{x_{n(m)}\}_m$ such that $\rho(x_{n(m)}, a) \ge \epsilon$ for all $m$.

4.5 Find a sequence in $\mathbb{R}$ having no convergent subsequence.

4.6 Find a nonconvergent sequence in $\mathbb{R}$ such that the set of limit points of convergent subsequences consists exactly of the number 1.

4.7 Show that there is a sequence $\{x_n\}$ in $[0, 1]$ such that for any $y \in [0, 1]$ there is a subsequence $\{x_{n_m}\}$ converging to $y$.

4.8 Show that the set of limits of convergent subsequences of a sequence $\{x_n\}$ in a metric space $X$ is a closed subset of $X$.

4.9 Prove Theorem 4.1.

4.10 Prove that the Cartesian product of two sequentially compact metric spaces is sequentially compact. (The proof is essentially in the text.)

4.11 A metric space is boundedly compact if every closed bounded set is sequentially compact. Prove that the Cartesian product of two boundedly compact metric spaces is boundedly compact (using, say, the maximum metric on the product space).
4.12 Prove that the sum $A + B$ of two sequentially compact subsets of a normed linear space is sequentially compact.

4.13 Prove that the sum $A + B$ of a closed set and a compact set is closed.

4.14 Show by an example in $\mathbb{R}$ that the sum of two closed sets need not be closed.

4.15 Let $\{C_n\}$ be a decreasing sequence ($C_{n+1} \subset C_n$ for all $n$) of nonempty closed subsets of a sequentially compact metric space $S$. Prove that $\bigcap_1^\infty C_n$ is nonempty.

4.16 Give an example of a decreasing sequence $\{C_n\}$ of nonempty closed subsets of a metric space such that $\bigcap_1^\infty C_n = \emptyset$.
4.17 Suppose the metric space $S$ has the property that every decreasing sequence $\{C_n\}$ of nonempty closed subsets of $S$ has nonempty intersection. Prove that then $S$ must be sequentially compact. [Hint: Given any sequence $\{x_i\} \subset S$, let $C_n$ be the closure of $\{x_i : i > n\}$.]

4.18 Let $A$ be a sequentially compact subset of a normed linear space $V$, and let $B$ be obtained from $A$ by drawing all line segments from points of $A$ to the origin (that is, $B = \{t\alpha : \alpha \in A \text{ and } t \in [0, 1]\}$). Prove that $B$ is compact.

4.19 Show by applying a compactness argument to Lemma 1.5 that if $N$ is a proper closed subspace of a finite-dimensional vector space $V$, then there exists $\alpha$ in $V$ such that $\|\alpha\| = \rho(\alpha, N) = 1$.

5. COMPACTNESS AND UNIFORMITY

The word 'uniform' is frequently used as a qualifying adjective in mathematics. Roughly speaking, it concerns a "point" property $P(y)$ which may or may not hold at each point $y$ in a domain $A$ and whose definition involves an existential quantifier. A typical form for $P(y)$ is $(\forall c)(\exists d)Q(y, c, d)$. Thus, if $P(y)$ is '$f$ is continuous at $y$', then $P(y)$ has the form $(\forall\epsilon)(\exists\delta)Q(y, \epsilon, \delta)$. The property holds on $A$ if it holds for all $y$ in $A$, that is, if

$$(\forall y^{\in A})[(\forall c)(\exists d)Q(y, c, d)].$$

Here $d$ will, in general, depend both on $y$ and $c$; if either $y$ or $c$ is changed, the corresponding $d$ may have to be changed. Thus $\delta$ in the definition of continuity depends both on $\epsilon$ and on the point $y$ at which continuity is being asserted. The property is said to hold uniformly on $A$, or uniformly in $y$, if a value $d$ can be found that is independent of $y$ (but still dependent on $c$). Thus the property holds uniformly in $y$ if

$$(\forall c)(\exists d)(\forall y^{\in A})Q(y, c, d);$$

the uniformity of the property is expressed in the reversal of the order of the quantifiers $(\forall y^{\in A})$ and $(\exists d)$. Thus $f$ is uniformly continuous on $A$ if

$$(\forall\epsilon)(\exists\delta)(\forall y, z^{\in A})[\rho(y, z) < \delta \Rightarrow \rho(f(y), f(z)) < \epsilon].$$

Now $\delta$ is independent of the point at which continuity is being asserted, but still dependent on $\epsilon$, of course.
We saw in Section 14 of the last chapter how much more powerful the point condition of continuity becomes when it holds uniformly. In the remainder of this section we shall discuss some other uniform notions, and shall see that the uniform property is often implied by the point property if the domain over which it holds is sequentially compact.

The formal statement forms we have examined above show clearly the distinction between uniformity and nonuniformity. However, in writing an argument, we would generally follow our more idiomatic practice of dropping out
the inside universal quantifier. For example, a sequence of functions $\{f_n\} \subset W^A$ converges pointwise to $f: A \to W$ if it converges to $f$ at every point $p$ in $A$, that is, if for every point $p$ in $A$ and for every $\epsilon$ there is an $N$ such that $n > N \Rightarrow \rho(f_n(p), f(p)) < \epsilon$. The sequence converges uniformly on $A$ if an $N$ exists that is independent of $p$, that is, if for every $\epsilon$ there is an $N$ such that $n > N \Rightarrow \rho(f_n(p), f(p)) < \epsilon$ for every $p$ in $A$. When $\rho(\xi, \eta) = \|\xi - \eta\|$, saying that $\rho(f_n(p), f(p)) \le \epsilon$ for all $p$ is the same as saying that $\|f_n - f\|_\infty \le \epsilon$. Thus $f_n \to f$ uniformly if and only if $\|f_n - f\|_\infty \to 0$; this is why the norm $\|f\|_\infty$ is called the uniform norm.

Pointwise convergence does not imply uniform convergence. Thus $f_n(x) = x^n$ on $A = (0, 1)$ converges pointwise to the zero function but does not converge uniformly. Nor does continuity on $A$ imply uniform continuity. The function $f(x) = 1/x$ is continuous on $(0, 1)$ but is not uniformly continuous. The function $\sin(1/x)$ is continuous and bounded on $(0, 1)$ but is not uniformly continuous. Compactness changes the latter situation, however.

Theorem 5.1. If $f$ is continuous on $A$ and $A$ is compact, then $f$ is uniformly continuous on $A$.

Proof. This is one of our "automatic" negation proofs. Uniform continuity (UC) is the property

$$(\forall\epsilon^{>0})(\exists\delta^{>0})(\forall x, y^{\in A})[\rho(x, y) < \delta \Rightarrow \rho(f(x), f(y)) < \epsilon].$$

Therefore, $\neg\text{UC} \Leftrightarrow (\exists\epsilon)(\forall\delta)(\exists x, y)[\rho(x, y) < \delta \text{ and } \rho(f(x), f(y)) \ge \epsilon]$. Take $\delta = 1/n$, with corresponding $x_n$ and $y_n$. Thus, for all $n$, $\rho(x_n, y_n) < 1/n$ and $\rho(f(x_n), f(y_n)) \ge \epsilon$, where $\epsilon$ is a fixed positive number. Now $\{x_n\}$ has a convergent subsequence, say $x_{n(i)} \to x$, by the compactness of $A$. Since $\rho(x_{n(i)}, y_{n(i)}) < 1/n(i) \le 1/i$, we also have $y_{n(i)} \to x$. By the continuity of $f$ at $x$,

$$\rho(f(x_{n(i)}), f(y_{n(i)})) \le \rho(f(x_{n(i)}), f(x)) + \rho(f(x), f(y_{n(i)})) \to 0,$$

which contradicts $\rho(f(x_{n(i)}), f(y_{n(i)})) \ge \epsilon$. This completes the proof by negation. □

The compactness of $A$ does not, however, automatically convert the pointwise convergence of a sequence of functions on $A$ into uniform convergence.
The "piecewise linear" functions $f_n: [0, 1] \to [0, 1]$ defined by the graph shown in Fig. 4.1 converge pointwise to zero on the compact domain $[0, 1]$, but the convergence is not uniform. (However, see Exercise 5.4.)
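Since the figure is not reproduced here, the following sketch uses one possible family of piecewise linear functions with this behavior; the specific shape is an assumption for illustration, not necessarily the one drawn in Fig. 4.1. The hypothetical function `tent(n, x)` rises linearly from 0 at $x = 0$ to 1 at $x = 1/(2n)$, falls back to 0 at $x = 1/n$, and vanishes on the rest of $[0, 1]$.

```python
def tent(n, x):
    """Assumed piecewise linear f_n on [0, 1] with a peak of height 1 at 1/(2n)."""
    peak = 1.0 / (2 * n)
    if x <= peak:
        return x / peak          # rising edge from (0, 0) to (peak, 1)
    if x <= 2 * peak:
        return 2.0 - x / peak    # falling edge from (peak, 1) to (2*peak, 0)
    return 0.0                   # zero to the right of 1/n

# Pointwise convergence: at a fixed x > 0 the values are 0 once 1/n < x.
x0 = 0.01
values_at_x0 = [tent(n, x0) for n in (10, 100, 1000)]

# No uniform convergence: every f_n still reaches the value 1 at its peak,
# so the uniform norm of f_n - 0 equals 1 for every n.
peak_values = [tent(n, 1.0 / (2 * n)) for n in range(1, 51)]
print(values_at_x0, min(peak_values))
```

In the notation of the text, $\|f_n - 0\|_\infty = 1$ for every $n$ under this assumed shape, so $\|f_n\|_\infty$ does not tend to 0 even though $f_n(x) \to 0$ for each fixed $x$.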
[Fig. 4.2]

We pointed out earlier that the distance between a pair of disjoint closed sets may be zero. However, if one of the closed sets is compact, then the distance must be positive.

Theorem 5.2. If $A$ and $C$ are disjoint nonempty closed sets, one of which is compact, then $\rho(A, C) > 0$.

Proof. The proof is by automatic contradiction, and is left to the reader.

This result is again a uniformity condition. Saying that a set $A$ is disjoint from a closed set $C$ is saying that $(\forall x^{\in A})(\exists r^{>0})(B_r(x) \cap C = \emptyset)$. Saying that $\rho(A, C) > 0$ is saying that $(\exists r^{>0})(\forall x^{\in A}) \ldots$

As a last consequence of sequential compactness, we shall establish a very powerful property which is taken as the definition of compactness in general topology. First, however, we need some preparatory work. If $A$ is a subset of a metric space $S$, the $r$-neighborhood of $A$, $B_r[A]$, is simply the union of all the balls of radius $r$ about points of $A$:

$$B_r[A] = \bigcup\{B_r(a) : a \in A\} = \{x : (\exists a^{\in A})(\rho(x, a) < r)\}.$$

A subset $A \subset S$ is $r$-dense in $S$ if $S \subset B_r[A]$, that is, if each point of $S$ is closer than $r$ to some point of $A$.

A subset $A$ of a metric space $S$ is dense in $S$ if $\bar{A} = S$. This is the same as saying that for every point $p$ in $S$ there are points of $A$ arbitrarily close to $p$. The set $\mathbb{Q}$ of all rational numbers is a dense subset of the real number system $\mathbb{R}$, because any irrational real number $x$ can be arbitrarily closely approximated by rational numbers. Since we do arithmetic in decimal notation, it is customary to use decimal approximations, and if $0 < x < 1$ and the decimal expansion of $x$ is $x = \sum_1^\infty a_n/10^n$, where each $a_n$ is an integer and $0 \le a_n < 10$, then $\sum_1^N a_n/10^n$ is a rational number differing from $x$ by at most $10^{-N}$. Note that $A$ is a dense subset of $B$ if and only if $A$ is $r$-dense in $B$ for every positive $r$.

A set $B$ is said to be totally bounded if for every positive $r$ there is a finite set which is $r$-dense in $B$.
Thus for every positive $r$ the set $B$ can be covered by a finite number of balls of radius $r$. For example, the $n - 1$ numbers $\{i/n\}_1^{n-1}$ are $(1/n)$-dense in the open interval $(0, 1)$ for each $n$, and so $(0, 1)$ is totally bounded.
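The $(1/n)$-density claim for the points $i/n$ can be spot-checked on a finite sample of $(0, 1)$. This is a minimal sketch under stated assumptions: the helper `is_r_dense` is a hypothetical name, the check runs over a grid of sample points rather than the whole interval, and exact rational arithmetic is used to avoid rounding questions.

```python
from fractions import Fraction

def is_r_dense(points, samples, r):
    """True if every sample lies at distance < r from some point."""
    return all(any(abs(x - p) < r for p in points) for x in samples)

n = 7
points = [Fraction(i, n) for i in range(1, n)]            # 1/n, ..., (n-1)/n
samples = [Fraction(k, 1000) for k in range(1, 1000)]     # grid inside (0, 1)

dense = is_r_dense(points, samples, Fraction(1, n))       # radius 1/n
too_small = is_r_dense(points, samples, Fraction(1, 4 * n))  # radius 1/(4n)
print(dense, too_small)
```

With radius $1/(4n)$ the same $n - 1$ points no longer suffice (samples near 0 are too far from $1/n$), which is why the finite $r$-dense set in the definition of total boundedness is allowed to depend on $r$.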
Total boundedness is a much stronger property than boundedness, as the following lemma shows.

Lemma 5.1. If the normed linear space $V$ is infinite-dimensional, then its closed unit ball $B_1 = \{\xi : \|\xi\| \le 1\}$ cannot be covered by a finite number of balls of radius $\frac{1}{3}$.

Proof. Since $V$ is not finite-dimensional, we can choose a sequence $\{\alpha_n\}$ such that $\alpha_{n+1}$ is not in the linear span $M_n$ of $\{\alpha_1, \ldots, \alpha_n\}$, for each $n$. Since $M_n$ is closed in $V$, by the corollary of Theorem 4.6, we can apply Lemma 1.5 to find a vector $\xi_n$ in $M_n$ such that $\|\xi_n\| = 1$ and $\rho(\xi_n, M_{n-1}) > \frac{2}{3}$ for all $n > 1$. We take $\xi_1 = \alpha_1/\|\alpha_1\|$, and we have a sequence $\{\xi_n\} \subset B_1$ such that $\|\xi_m - \xi_n\| > \frac{2}{3}$ if $m \ne n$. Then no ball of radius $\frac{1}{3}$ can contain more than one $\xi_i$, proving the lemma. □

For a concrete example, let $V$ be $\mathcal{C}([0, 1])$, and let $f_n$ be the "peak" function sketched in Fig. 4.2, where the three points on the base are $1/(2n + 2)$, $1/(2n + 1)$, and $1/2n$. Then $f_{n+1}$ is "disjoint" from $f_n$ (that is, $f_{n+1}f_n = 0$), and we have $\|f_n\|_\infty = 1$ for all $n$ and $\|f_n - f_m\|_\infty = 1$ if $n \ne m$. Thus no ball of radius $\frac{1}{3}$ can contain more than one of the functions $f_n$, and accordingly the closed unit ball in $V$ cannot be covered by a finite number of balls of radius $\frac{1}{3}$.

Lemma 5.2. Every sequentially compact set $A$ is totally bounded.

Proof. If $A$ is not totally bounded, then there exists an $r$ such that no finite subset $F$ is $r$-dense in $A$. We can then define a sequence $\{p_n\}$ inductively by taking $p_1$ as any point of $A$, $p_2$ as any point of $A$ not in $B_r(p_1)$, and $p_n$ as any point of $A$ not in $B_r[\bigcup_1^{n-1} p_i] = \bigcup_1^{n-1} B_r(p_i)$. Then $\{p_n\}$ is a sequence in $A$ such that $\rho(p_i, p_j) \ge r$ for all $i \ne j$. But this sequence can have no convergent subsequence. Thus, if $A$ is not totally bounded, then $A$ is not sequentially compact, proving the lemma. □

Corollary. A normed linear space $V$ is finite-dimensional if and only if its closed unit ball is sequentially compact.

Proof.
This follows from Theorem 4.4 in one direction and from the above two lemmas in the other direction. □

Lemma 5.3. Suppose that $A$ is sequentially compact and that $\{E_i : i \in I\}$ is an open covering of $A$ (that is, $\{E_i\}$ is a family of open sets and $A \subset \bigcup_i E_i$). Then there exists an $r > 0$ with the property that for every point $p$ in $A$ the ball $B_r(p)$ is included in some $E_j$.

Proof. Otherwise, for every $r$ there is a point $p$ in $A$ such that $B_r(p)$ is not a subset of any $E_j$. Take $r = 1/n$, with corresponding sequence $\{p_n\}$. Thus $B_{1/n}(p_n)$ is not a subset of any $E_j$. Since $A$ is sequentially compact, $\{p_n\}$ has a convergent subsequence, $p_{n(m)} \to p$ as $m \to \infty$. Since $\{E_i\}$ covers $A$, some $E_j$ contains $p$,
and then $B_\epsilon(p) \subset E_j$ for some $\epsilon > 0$, since $E_j$ is open. Taking $m$ large enough so that $1/m < \epsilon/2$ and also $\rho(p_{n(m)}, p) < \epsilon/2$, we have

$$B_{1/n(m)}(p_{n(m)}) \subset B_\epsilon(p) \subset E_j,$$

contradicting the fact that $B_{1/n}(p_n)$ is not a subset of any $E_i$. The lemma has thus been proved. □

Theorem 5.3. If $\mathcal{F}$ is an open covering of a sequentially compact set $A$, then some finite subfamily of $\mathcal{F}$ covers $A$.

Proof. By the lemma immediately above there exists an $r > 0$ such that for every $p$ in $A$ the ball $B_r(p)$ lies entirely in some set of $\mathcal{F}$, and by Lemma 5.2 there exist $p_1, \ldots, p_n$ in $A$ such that $A \subset \bigcup_1^n B_r(p_i)$. Taking corresponding sets $E_i$ in $\mathcal{F}$ such that $B_r(p_i) \subset E_i$ for $i = 1, \ldots, n$, we clearly have $A \subset \bigcup_1^n E_i$. □

In general topology, a set $A$ such that every open covering of $A$ includes a finite covering is said to be compact or to have the Heine-Borel property. The above theorem says that in a metric space every sequentially compact set is compact. We shall see below that the reverse implication also holds, so that the two notions are in fact equivalent on a metric space.

Theorem 5.4. If $A$ is a compact metric space, then $A$ is sequentially compact.

Proof. Let $\{x_n\}$ be any sequence in $A$, and let $\mathcal{F}$ be the collection of open balls $B$ such that $B$ contains only finitely many $x_i$. If $\mathcal{F}$ were to cover $A$, then by compactness $A$ would be the union of finitely many balls in $\mathcal{F}$, and this would clearly imply that the whole of $A$ contains only finitely many $x_i$, contradicting the fact that $\{x_i\}$ is an infinite sequence. Therefore, $\mathcal{F}$ does not cover $A$, and so there is a point $x$ in $A$ such that every ball about $x$ contains infinitely many of the $x_i$. More precisely, every ball about $x$ contains $x_i$ for infinitely many indices $i$. It can now be safely left to the reader to see that a subsequence of $\{x_n\}$ converges to $x$. □

EXERCISES

5.1 Show that $f_n(x) = x^n$ does not converge uniformly on $(0, 1)$.

5.2 Show that $f(x) = 1/x$ is not uniformly continuous on $(0, 1)$.
5.3 Define the notion of a function $K: X \times Y \to Y$ being uniformly Lipschitz in its second variable over its first variable.

5.4 Let $S$ be a sequentially compact metric space, and let $\{f_n\}$ be a sequence of continuous real-valued functions on $S$ that decreases pointwise to zero (that is, $\{f_n(p)\}$ is a decreasing sequence in $\mathbb{R}$ and $f_n(p) \to 0$ as $n \to \infty$ for each $p$ in $S$). Prove that the convergence is uniform. (Try to apply Exercise 4.15.)

5.5 Restate the corollaries of Theorems 15.1 and 15.2 of Chapter 3, employing the weaker hypotheses that suffice by virtue of Theorem 5.1 of the present section.
5.6 Prove Theorem 5.2.

5.7 Prove that if $A$ is an $r$-dense subset of a set $X$ in a normed linear space $V$, and if $B$ is an $s$-dense subset of a set $Y \subset V$, then $A + B$ is $(r + s)$-dense in $X + Y$. Conclude that the sum of two totally bounded subsets of $V$ is totally bounded.

5.8 Suppose that the $n$ points $\{p_i\}_1^n$ are $r$-dense in a metric space $X$. Let $A$ be any subset of $X$. Show that $A$ has a subset of at most $n$ points that is $2r$-dense in $A$. Conclude that any subset of a totally bounded metric space is itself totally bounded.

5.9 Prove that the Cartesian product of two totally bounded metric spaces is totally bounded.

5.10 Show that if a metric space $X$ has a dense subset $A$ that is totally bounded, then $X$ is totally bounded.

5.11 Show that if two continuous mappings $f$ and $g$ from a metric space $X$ to a metric space $Y$ are equal on a dense subset of $X$, then they are equal everywhere.

5.12 Write out in explicit quantified form involving the existence of balls the statement that the interiors of the sets $\{A_i\}$ cover the metric space $A$. Then show that the conclusion of Lemma 5.3 is another uniformity assertion.

5.13 Reprove the theorem that a continuous function on a compact domain is bounded on the basis of Theorem 5.3.

5.14 Reprove the theorem that a continuous function on a compact domain is uniformly continuous from Theorem 5.3.

6. EQUICONTINUITY

The application of sequential compactness that we shall make in an infinite-dimensional context revolves around the notion of an equicontinuous family of functions. If $A$ and $B$ are metric spaces, then a subset $\mathcal{F} \subset B^A$ is said to be equicontinuous at $p_0$ in $A$ if all the functions of $\mathcal{F}$ are continuous at $p_0$ and if given $\epsilon$, there is a $\delta$ which works for them all, i.e., such that $\rho(p, p_0) < \delta \Rightarrow \rho(f(p), f(p_0)) < \epsilon$ for every $f$ in $\mathcal{F}$. The family $\mathcal{F}$ is uniformly equicontinuous if $\delta$ is also independent of $p_0$, and so is dependent only on $\epsilon$. Our quantifier string is thus $(\forall\epsilon)(\exists\delta)(\forall p, q^{\in A})(\forall f^{\in\mathcal{F}})$.
For example, given $m > 0$, let $\mathcal{F}$ be a collection of functions $f$ from $(0, 1)$ to $(0, 1)$ such that $f'$ exists and $|f'| \le m$ on $(0, 1)$. Then $|f(x) - f(y)| \le m|x - y|$, by the ordinary mean-value theorem. Therefore, given any $\epsilon$, we can take $\delta = \epsilon/m$ and have $|x - y| < \delta \Rightarrow |f(x) - f(y)| < \epsilon$ for all $x, y \in (0, 1)$ and all $f \in \mathcal{F}$. The collection $\mathcal{F}$ is thus uniformly equicontinuous.

Theorem 6.1. If $A$ and $B$ are totally bounded metric spaces, and if $\mathcal{F}$ is a uniformly equicontinuous subfamily of $B^A$, then $\mathcal{F}$ is totally bounded in the uniform metric.
Proof. Given $\epsilon > 0$, choose $\delta$ so that for all $f$ in $\mathcal{F}$ and all $p_1, p_2$ in $A$, $\rho(p_1, p_2) < \delta \Rightarrow \rho(f(p_1), f(p_2)) < \epsilon/4$. Let $D$ be a finite subset of $A$ which is $\delta$-dense in $A$, and let $E$ be a finite subset of $B$ which is $(\epsilon/4)$-dense in $B$. Let $G$ be the set $E^D$ of all functions on $D$ into $E$. $G$ is of course finite; in fact, $\#G = n^m$, where $m = \#D$ and $n = \#E$. Finally, for each $g \in G$ let $\mathcal{F}_g$ be the set of all functions $f \in \mathcal{F}$ such that $\rho(f(p), g(p)) < \epsilon/4$ for every $p \in D$.

We claim that the collections $\mathcal{F}_g$ cover $\mathcal{F}$ and that each $\mathcal{F}_g$ has diameter at most $\epsilon$. We will then obtain a finite $\epsilon$-dense subset of $\mathcal{F}$ by choosing one function from each nonempty $\mathcal{F}_g$, and the theorem will be proved.

To show that every $f \in \mathcal{F}$ is in some $\mathcal{F}_g$, we simply construct a suitable $g$. For each $p$ in $D$ there exists a $q$ in $E$ whose distance from $f(p)$ is less than $\epsilon/4$. If we choose one such $q$ in $E$ for each $p$ in $D$, we have a function $g$ in $G$ such that $f \in \mathcal{F}_g$.

The final thing we have to show is that if $f, h \in \mathcal{F}_g$, then $\rho(f, h) \le \epsilon$. Since $\rho(h, g) < \epsilon/4$ on $D$ and $\rho(f, g) < \epsilon/4$ on $D$, it follows that $\rho(f(p), h(p)) < \epsilon/2$ for every $p \in D$. Then for any $p' \in A$ we have only to choose $p \in D$ such that $\rho(p', p) < \delta$, and we have

$$\rho(f(p'), h(p')) \le \rho(f(p'), f(p)) + \rho(f(p), h(p)) + \rho(h(p), h(p')) < \epsilon/4 + \epsilon/2 + \epsilon/4 = \epsilon. \quad \square$$

The above proof is a good example of a mathematical argument that is completely elementary but hard. When referring to mathematical reasoning, the words 'sophisticated' and 'difficult' are by no means equivalent.

7. COMPLETENESS

If $x_n \to a$ as $n \to \infty$, then the terms $x_n$ obviously get close to each other as $n$ gets large. On the other hand, if $\{x_n\}$ is a sequence whose terms get arbitrarily close to each other as $n \to \infty$, then $\{x_n\}$ clearly ought to converge to a limit. It may not, however; the desired limit point may be missing from the space. If a metric space $S$ is such that every sequence which ought to converge actually does converge, then we say that $S$ is complete. We now make this notion precise.

Definition.
$\{x_n\}$ is a Cauchy sequence if for every $\epsilon$ there is an $N$ such that $m > N$ and $n > N \Rightarrow \rho(x_m, x_n) < \epsilon$.

Lemma 7.1. If $\{x_n\}$ is convergent, then $\{x_n\}$ is Cauchy.

Proof. Given $\epsilon$, we choose $N$ such that $n > N \Rightarrow \rho(x_n, a) < \epsilon/2$, where $a$ is the limit of the sequence. Then if $m$ and $n$ are both greater than $N$, we have $\rho(x_m, x_n) \le \rho(x_m, a) + \rho(a, x_n) < \epsilon/2 + \epsilon/2 = \epsilon$. $\square$
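The definition can be probed numerically. The following is only a sketch under stated assumptions: the sequence $x_n = \sum_{k=1}^{n} 1/k^2$ and the helper names `partial_sums` and `is_cauchy_witness` are illustrative choices, not taken from the text. It uses exact rational arithmetic and checks, over a finite range of indices only, that $N = \lceil 1/\epsilon \rceil$ works in the definition, as the telescoping bound $\sum_{k=n+1}^{m} 1/k^2 < 1/n$ predicts.

```python
from fractions import Fraction
from math import ceil

def partial_sums(count):
    """Return [x_1, ..., x_count] with x_n = sum_{k=1}^n 1/k^2 as exact rationals."""
    sums, total = [], Fraction(0)
    for k in range(1, count + 1):
        total += Fraction(1, k * k)
        sums.append(total)
    return sums

def is_cauchy_witness(xs, N, eps):
    """Check |x_m - x_n| < eps for all stored terms with indices m, n > N.

    xs[0] is x_1, so xs[N] is x_{N+1}, the first term with index > N.
    Only finitely many terms are examined, so this is evidence, not a proof.
    """
    tail = xs[N:]
    return all(abs(a - b) < eps for a in tail for b in tail)

xs = partial_sums(200)
eps = Fraction(1, 20)
N = ceil(1 / eps)  # hypothetical choice of witness, justified by the bound above

print(is_cauchy_witness(xs, N, eps))   # the chosen N works on this range
print(is_cauchy_witness(xs, 1, eps))   # N = 1 is too small for this eps
```

The second call shows that the quantifier order matters: a given $N$ need not work for every $\epsilon$, and the definition only asks that some $N$ exist for each $\epsilon$.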
Lemma 7.2. If $\{x_n\}$ is Cauchy, and if a subsequence is convergent, then $\{x_n\}$ itself converges.

Proof. Suppose that $x_{n(i)} \to a$ as $i \to \infty$. Given $\epsilon$, we take $N$ so that $m, n > N \Rightarrow \rho(x_n, x_m) < \epsilon$. Because $x_{n(i)} \to a$ as $i \to \infty$, we can choose an $i$ such that $n(i) > N$ and $\rho(x_{n(i)}, a) < \epsilon$. Thus if $m > N$, we have $\rho(x_m, a) \le \rho(x_m, x_{n(i)}) + \rho(x_{n(i)}, a) < 2\epsilon$, and so $x_m \to a$. $\square$

Actually, of course, if $m, n > N \Rightarrow \rho(x_m, x_n) < \epsilon$, and if $x_n \to a$, then for any $m > N$ it is true that $\rho(x_m, a) \le \epsilon$. Why?

Lemma 7.3. If $A$ and $B$ are metric spaces, and if $T$ is a Lipschitz mapping from $A$ to $B$, then $T$ carries Cauchy sequences in $A$ into Cauchy sequences in $B$. This is true in particular if $A$ and $B$ are normed linear spaces and $T$ is an element of $\operatorname{Hom}(A, B)$.

Proof. Let $\{x_n\}$ be a Cauchy sequence in $A$, and set $y_n = T(x_n)$. Given $\epsilon$, choose $N$ so that $m, n > N \Rightarrow \rho(x_m, x_n) < \epsilon/C$, where $C$ is a Lipschitz constant for $T$. Then $m, n > N \Rightarrow \rho(y_m, y_n) = \rho(T(x_m), T(x_n)) \le C\rho(x_m, x_n) < C\epsilon/C = \epsilon$. $\square$

This lemma has a substantial generalization, as follows.

Theorem 7.1. If $A$ and $B$ are metric spaces, $\{x_n\}$ is Cauchy in $A$, and $F\colon A \to B$ is uniformly continuous, then $\{F(x_n)\}$ is Cauchy in $B$.

Proof. The proof is left as an exercise.

The student should try to acquire a good intuitive feel for the truth of these lemmas, after which the technical proofs become more or less obvious.

Definition. A metric space $A$ is complete if every Cauchy sequence in $A$ converges to a limit in $A$. A complete normed linear space is called a Banach space.

We are now going to list some important examples of Banach spaces. In each case a proof is necessary, so the list becomes a collection of theorems.

Theorem 7.2. $\mathbb{R}$ is complete.

Proof. Let $\{x_n\}$ be Cauchy in $\mathbb{R}$. Then $\{x_n\}$ is bounded (why?) and so, by Theorem 4.3, has a convergent subsequence. Lemma 7.2 then implies that $\{x_n\}$ is convergent. $\square$

Theorem 7.3.
If $A$ is a complete metric space, and if $f$ is a continuous bijective mapping from $A$ to a metric space $B$ such that $f^{-1}$ is Lipschitz continuous, then $B$ is complete. In particular, if $V$ is a Banach space, and if $T$ in $\operatorname{Hom}(V, W)$ is invertible, then $W$ is a Banach space.
Proof. Suppose that $\{y_n\}$ is a Cauchy sequence in $B$, and set $x_i = f^{-1}(y_i)$ for all $i$. Then $\{x_i\}$ is Cauchy in $A$, by Lemma 7.3, and so converges to some $x$ in $A$, since $A$ is complete. But then $y_n = f(x_n) \to f(x)$, because $f$ is continuous. Thus every Cauchy sequence in $B$ is convergent and $B$ is complete. $\square$

The Banach space assertion is a special case, because the invertibility of $T$ means that $T^{-1}$ exists in $\operatorname{Hom}(W, V)$ and hence is a Lipschitz mapping.

Corollary. If $p$ and $q$ are equivalent norms on $V$ and $\langle V, p \rangle$ is complete, then so is $\langle V, q \rangle$.

Theorem 7.4. If $V_1$ and $V_2$ are Banach spaces, then so is $V_1 \times V_2$.

Proof. If $\{\langle \xi_n, \eta_n \rangle\}$ is Cauchy, then so are each of $\{\xi_n\}$ and $\{\eta_n\}$ (by Lemma 7.3, since the projections $\pi_i$ are bounded). Then $\xi_n \to \alpha$ and $\eta_n \to \beta$ for some $\alpha \in V_1$ and $\beta \in V_2$. Thus $\langle \xi_n, \eta_n \rangle \to \langle \alpha, \beta \rangle$ in $V_1 \times V_2$. (See Theorem 3.4.) $\square$

Corollary 1. If $\{V_i\}_1^n$ are Banach spaces, then so is $\prod_{i=1}^n V_i$.

Corollary 2. Every finite-dimensional vector space is a Banach space (in any norm).

Proof. $\mathbb{R}^n$ is complete (in the one-norm, say) by Theorem 7.2 and Corollary 1 above. We then impose a one-norm on $V$ by choosing a basis, and apply the corollary of Theorem 7.3 to pass to any other norm. $\square$

Theorem 7.5. Let $W$ be a Banach space, let $A$ be any set, and let $\mathcal{B}(A, W)$ be the vector space of all bounded functions from $A$ to $W$ with the uniform norm $\|f\|_\infty = \operatorname{lub}\{\|f(a)\| : a \in A\}$. Then $\mathcal{B}(A, W)$ is a Banach space.

Proof. Let $\{f_n\}$ be Cauchy, and choose any $a \in A$. Since $\|f_n(a) - f_m(a)\| \le \|f_n - f_m\|_\infty$, it follows that $\{f_n(a)\}$ is Cauchy in $W$ and so convergent. Define $g\colon A \to W$ by $g(a) = \lim f_n(a)$ for each $a \in A$. We have to show that $g$ is bounded and that $f_n \to g$. Given $\epsilon$, we choose $N$ so that $m, n > N \Rightarrow \|f_m - f_n\|_\infty < \epsilon$. Then

$$\|f_m(a) - g(a)\| = \lim_{n \to \infty} \|f_m(a) - f_n(a)\| \le \epsilon.$$

Thus, if $m > N$, then $\|f_m(a) - g(a)\| \le \epsilon$ for all $a \in A$, and hence $\|f_m - g\|_\infty \le \epsilon$.
This implies both that $f_m - g \in \mathcal{B}(A, W)$, and so $g = f_m - (f_m - g) \in \mathcal{B}(A, W)$, and that $f_m \to g$ in the uniform norm. $\square$

Theorem 7.6. If $V$ is a normed linear space and $W$ is a Banach space, then $\operatorname{Hom}(V, W)$ is a Banach space.

The method of proof is identical to that of the preceding theorem, and we leave it as an exercise. Boundedness here has a different meaning, but it is used
in essentially the same way. One additional fact has to be established, namely, that the limit map (corresponding to $g$ in the above theorem) is linear.

Theorem 7.7. A closed subset of a complete metric space is complete. A complete subset of any metric space is closed.

Proof. The proof is left to the reader.

It follows from Theorem 7.7 that a complete metric space $A$ is absolutely closed, in the sense that no matter how we extend $A$ to a larger metric space $B$, $A$ is always a closed subset of $B$. Actually, this property is equivalent to completeness, for if $A$ is not complete, then a very important construction of metric space theory shows that $A$ can be completed. That is, we can construct a complete metric space $B$ which includes $A$. Now, if $A$ is not complete, then the closure of $A$ in $B$, being complete, is different from $A$, and $A$ is not absolutely closed. See Exercises 7.16 through 7.18 for a construction of the completion of a metric space. The completion of a normed linear space is of course a Banach space.

Theorem 7.8. In the context of Theorem 7.5, let $A$ be a metric space, let $\mathcal{C}(A, W)$ be the space of continuous functions from $A$ to $W$, and set $\mathcal{BC}(A, W) = \mathcal{B}(A, W) \cap \mathcal{C}(A, W)$. Then $\mathcal{BC}$ is a closed subspace of $\mathcal{B}$.

[Fig. 4.3]

Proof. We suppose that $\{f_n\} \subset \mathcal{BC}$ and that $\|f_n - g\|_\infty \to 0$, where $g \in \mathcal{B}$. We have to show that $g$ is continuous. This is an application of a much used "up, over, and down" argument, which can be schematically indicated as in Fig. 4.3. Given $\epsilon$, we first choose any $n$ such that $\|f_n - g\|_\infty < \epsilon/3$. Consider now any $a \in A$. Since $f_n$ is continuous at $a$, there exists a $\delta$ such that $\rho(x, a) < \delta \Rightarrow \|f_n(x) - f_n(a)\| < \epsilon/3$.
Then

$$\rho(x, a) < \delta \Rightarrow \|g(x) - g(a)\| \le \|g(x) - f_n(x)\| + \|f_n(x) - f_n(a)\| + \|f_n(a) - g(a)\| < \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon.$$

Thus $g$ is continuous at $a$ for every $a \in A$, and so $g \in \mathcal{BC}$. $\square$

This important classical result is traditionally stated as follows: The limit of a uniformly convergent sequence of continuous functions is continuous.

Remark. The proof was slightly more general. We actually showed that if $f_n \to f$ uniformly, and if each $f_n$ is continuous at $a$, then $f$ is continuous at $a$.

Corollary. $\mathcal{BC}(A, W)$ is a Banach space.

Theorem 7.9. If $A$ is a sequentially compact metric space, then $A$ is complete.

Proof. A Cauchy sequence in $A$ has a subsequence converging to a limit in $A$, and therefore, by Lemma 7.2, itself converges to that limit. Thus $A$ is complete. $\square$

In Section 5 we proved that a compact set is also totally bounded. It can be shown, conversely, that a complete, totally bounded set $A$ is sequentially compact, so that these two properties together are equivalent to compactness. The crucial fact is that if $A$ is totally bounded, then every sequence in $A$ has a Cauchy subsequence. If $A$ is also complete, this Cauchy subsequence will converge to a point of $A$. Thus the fact that total boundedness and completeness together are equivalent to compactness follows directly from the next lemma.

Lemma 7.4. If $A$ is totally bounded, then every sequence in $A$ has a Cauchy subsequence.

Proof. Let $\{p_m\}$ be any sequence in $A$. Since $A$ can be covered by a finite number of balls of radius 1, at least one ball in such a covering contains infinitely many of the points $\{p_m\}$. More precisely, there exists an infinite set $M_1 \subset \mathbb{Z}^+$ such that the set $\{p_m : m \in M_1\}$ lies in a single ball of radius 1. Suppose that $M_1, \ldots, M_n \subset \mathbb{Z}^+$ have been defined so that $M_{i+1} \subset M_i$ for $i = 1, \ldots, n - 1$, $M_n$ is infinite, and $\{p_m : m \in M_i\}$ is a subset of a ball of radius $1/i$ for $i = 1, \ldots, n$.
Since $A$ can be covered by a finite family of balls of radius $1/(n + 1)$, at least one covering ball contains infinitely many points of the set $\{p_m : m \in M_n\}$. More precisely, there exists an infinite set $M_{n+1} \subset M_n$ such that $\{p_m : m \in M_{n+1}\}$ is a subset of a ball of radius $1/(n + 1)$. We thus define an infinite sequence $\{M_n\}$ of subsets of $\mathbb{Z}^+$ having the above properties. Now choose $m_1 \in M_1$, $m_2 \in M_2$ so that $m_2 > m_1$, and, in general, $m_{n+1} \in M_{n+1}$ so that $m_{n+1} > m_n$. Then the subsequence $\{p_{m_n}\}_n$ is Cauchy. For given $\epsilon$, we can choose $n$ so that $1/n < \epsilon/2$. Then

$$i, j > n \Rightarrow m_i, m_j \in M_n \Rightarrow \rho(p_{m_i}, p_{m_j}) < 2(1/n) < \epsilon.$$

This proves the lemma, and our theorem is a corollary. $\square$
Theorem 7.10. A metric space $S$ is sequentially compact if and only if $S$ is totally bounded and complete.

The next three sections will be devoted to applications of completeness to the calculus, but before embarking on these vital matters we should say a few words about infinite series. As in the ordinary calculus, if $\{\xi_n\}$ is a sequence in a normed linear space $V$, we say that the series $\sum \xi_i$ converges and has the sum $\alpha$, and write $\sum_1^\infty \xi_i = \alpha$, if the sequence of partial sums converges to $\alpha$. This means that $\sigma_n \to \alpha$ as $n \to \infty$, where $\sigma_n$ is the finite sum $\sum_1^n \xi_i$ for each $n$. We say that $\sum \xi_i$ converges absolutely if the series of norms $\sum \|\xi_i\|$ converges in $\mathbb{R}$. This is an abuse of language unless it is true that every absolutely convergent series converges, and the importance of the notion stems from the following theorem.

Theorem 7.11. If $V$ is a Banach space, then every absolutely convergent series in $V$ is convergent.

Proof. Let $\sum \xi_i$ be absolutely convergent. This means that $\sum \|\xi_i\|$ converges in $\mathbb{R}$, i.e., that the sequence $\{s_n\}$ converges in $\mathbb{R}$, where $s_n = \sum_1^n \|\xi_i\|$. If $m < n$, then

$$\|\sigma_n - \sigma_m\| = \Bigl\|\sum_{m+1}^{n} \xi_i\Bigr\| \le \sum_{m+1}^{n} \|\xi_i\| = s_n - s_m.$$

Since $\{s_i\}$ is Cauchy in $\mathbb{R}$, this inequality shows that $\{\sigma_i\}$ is Cauchy in $V$ and therefore, because $V$ is complete, that $\{\sigma_n\}$ is convergent in $V$. That is, $\sum \xi_i$ is convergent in $V$. $\square$

The reader will be asked to show in an exercise that, conversely, if a normed linear space $V$ is such that every absolutely convergent series converges, then $V$ is complete. This property therefore characterizes Banach spaces.

We shall make frequent use of the above theorem. For the moment we note just one corollary, the classical Weierstrass comparison test.

Corollary. If $\{f_n\}$ is a sequence of bounded real-valued (or $W$-valued, for some Banach space $W$) functions on a common domain $A$, and if there is a sequence $\{M_n\}$ of positive constants such that $\sum M_n$ is convergent and $\|f_n\|_\infty \le M_n$ for each $n$, then $\sum f_n$ is uniformly convergent.
Proof. The hypotheses imply that $\sum \|f_n\|_\infty$ converges, and so $\sum f_n$ converges in the Banach space $\mathcal{B}(A, W)$ by the theorem. But convergence in $\mathcal{B}(A, W)$ is uniform convergence. $\square$

EXERCISES

7.1 Prove that a Cauchy sequence in a metric space is a bounded set.

7.2 Let $V$ be a normed linear space. Prove that the sum of two Cauchy sequences in $V$ is Cauchy.

7.3 Show also that if $\{\xi_n\}$ is Cauchy in $V$ and $\{a_n\}$ is Cauchy in $\mathbb{R}$, then $\{a_n \xi_n\}$ is Cauchy in $V$.
7.4 Prove that if $\{\xi_n\}$ is a Cauchy sequence in a normed linear space $V$, then $\{\|\xi_n\|\}$ is a Cauchy sequence in $\mathbb{R}$.

7.5 Prove that if $\{x_n\}$ and $\{y_n\}$ are two Cauchy sequences in a metric space $S$, then $\{\rho(x_n, y_n)\}$ is a Cauchy sequence in $\mathbb{R}$.

7.6 Prove the statement made after the proof of Lemma 7.2.

7.7 The rational number system is an incomplete metric space. Prove this by exhibiting a Cauchy sequence of rational numbers that does not converge to a rational number.

7.8 Prove Theorem 7.1.

7.9 Deduce a strengthened form of Theorem 7.3 from Theorem 7.1.

7.10 Write out a careful proof of Theorem 7.6, modeled on the proof of Theorem 7.5.

7.11 Prove Theorem 7.7.

7.12 Let the metric space $X$ have a dense subset $Y$ such that every Cauchy sequence in $Y$ is convergent in $X$. Prove that $X$ is complete.

7.13 Show that the set $W$ of all Cauchy sequences in a normed linear space $V$ is itself a vector space and that a seminorm $p$ can be defined on $W$ by $p(\{\xi_n\}) = \lim \|\xi_n\|$. (Put this together from the material in the text and the preceding problems.)

7.14 Continuing the above exercise, for each $\xi \in V$, let $\xi^c$ be the constant sequence all of whose terms are $\xi$. Show that $\theta\colon \xi \mapsto \xi^c$ is an isometric linear injection of $V$ into $W$ and that $\theta[V]$ is dense in $W$ in terms of the seminorm from the above exercise.

7.15 Prove next that every Cauchy sequence in $\theta[V]$ is convergent in $W$. Put Exercises 4.18 of Chapter 3 and 7.12 through 7.14 of this chapter together to conclude that if $N$ is the set of null Cauchy sequences in $W$, then $W/N$ is a Banach space, and that the map sending $\xi$ to the coset of $\xi^c$ is an isometric linear injection from $V$ to a dense subspace of $W/N$. This constitutes the standard completion of the normed linear space $V$.

7.16 We shall now sketch a nonstandard way of forming the completion of a metric space $S$. Choose some point $p_0$ in $S$, and let $V$ be the set of real-valued functions $f$ on $S$ such that $f(p_0) = 0$ and $f$ is a Lipschitz function.
For $f$ in $V$ define $\|f\|$ as the smallest Lipschitz constant for $f$. That is, $\|f\| = \operatorname{lub}\{|f(p) - f(q)|/\rho(p, q) : p \ne q\}$. Prove that $V$ is a normed linear space under this norm. ($V$ actually is complete, but we do not need this fact.)

7.17 Continuing the above exercise, we know that the dual space $V^*$ of all bounded linear functionals on $V$ is complete by Theorem 7.6. We now want to show that $S$ can be isometrically imbedded in $V^*$; then the closure of $S$ as a subset of $V^*$ will be the desired completion of $S$. For each $p \in S$, let $\theta_p\colon V \to \mathbb{R}$ be "evaluation at $p$". That is, $\theta_p(f) = f(p)$. Show that $\theta_p \in V^*$ and that $\|\theta_p - \theta_q\| \le \rho(p, q)$.

7.18 In order to conclude that the mapping $\theta\colon p \mapsto \theta_p$ is an isometry (i.e., is distance-preserving), we have to prove the opposite inequality $\|\theta_p - \theta_q\| \ge \rho(p, q)$. To do this, choose $p$ and consider the special function $f(x) = \rho(p, x) - \rho(p, p_0)$. Show that $f$ is in $V$ and that $\|f\| = 1$ (from an early lemma in the chapter). Now apply the definition of $\|\theta_p - \theta_q\|$ and conclude that $\theta$ is an isometric injection of $S$ into $V^*$. Then $\overline{\theta[S]}$ is our constructed completion.
7.19 Prove that if a normed linear space $V$ has the property that every absolutely convergent series converges, then $V$ is complete. (Let $\{\alpha_n\}$ be a Cauchy sequence. Show that there is a subsequence $\{\alpha_{n_i}\}_i$ such that if $\xi_i = \alpha_{n_{i+1}} - \alpha_{n_i}$, then $\|\xi_i\| < 2^{-i}$. Conclude that the subsequence converges and finish up.)

7.20 The above exercise gives a very useful criterion for $V$ to be complete. Use it to prove that if $V$ is a Banach space and $N$ is a closed subspace, then $V/N$ is a Banach space (see Exercise 4.14 of Chapter 3 for the norm on $V/N$).

7.21 Prove that the sum of a uniformly convergent series of infinitesimals (all on the same domain) is an infinitesimal.

8. A FIRST LOOK AT BANACH ALGEBRAS

When we were considering the implicit-function theorem and the inverse-function theorem in the last chapter, we saw how useful it is to know that if a transformation $T$ has an inverse $T^{-1}$, then so does $S$ whenever $\|S - T\|$ is small enough, and that the mapping $T \mapsto T^{-1}$ is continuous on the open set of all invertible elements. When the spaces in question are finite-dimensional, these facts can be made to follow from the continuity of the determinant function $T \mapsto \Delta(T)$ from $\operatorname{Hom} V$ to $\mathbb{R}$. It is also possible to produce them by arguing directly in terms of upper and lower bounds for $T$ and its close approximations $S$. But the most natural, most elegant, and (in the case of Banach spaces) easiest way to prove these things is to show that if $V$ is a Banach space and $T$ in $\operatorname{Hom} V$ has norm less than one, then the sum of the geometric series $\sum_0^\infty T^n$ is the inverse of $I - T$, just as in the elementary calculus. But in making this argument, the fact that $T$ is a linear transformation has little importance, and we shall digress for a moment to explore this situation.

Let us summarize the norm and algebraic properties of $\operatorname{Hom} V$ when $V$ is a Banach space. First of all, we know that $\operatorname{Hom} V$ is also a Banach space. Second, it is an algebra.
That is, it possesses an associative multiplication operation (composition) that relates to the linear operations according to the following laws:

$$(S_1 + S_2)T = S_1T + S_2T, \qquad S(T_1 + T_2) = ST_1 + ST_2, \qquad c(ST) = (cS)T = S(cT).$$

Finally, multiplication is related to the norm by $\|ST\| \le \|S\|\,\|T\|$ and $\|I\| = 1$. This list of properties constitutes exactly the axioms for a Banach algebra. Just as we can see certain properties of functions most clearly by forgetting that they are functions and considering them only as elements of a vector space, now it turns out that we can treat certain properties of transformations in $\operatorname{Hom} V$ most simply by forgetting the complicated nature of a linear transformation and considering it merely as an element of an abstract Banach algebra $A$.
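As a concrete check of the axioms just listed, the following sketch works in $\operatorname{Hom}(\mathbb{R}^2)$ with integer $2 \times 2$ matrices. The norm used is the maximum absolute row sum, which is the operator norm induced by the max norm on $\mathbb{R}^2$; that choice of norm, the sample matrices, and the helper names `matmul`, `matadd`, and `norm` are assumptions made for illustration, not part of the text. Integer entries keep every comparison exact.

```python
def matmul(S, T):
    """Matrix product ST, i.e. composition of the corresponding linear maps."""
    n = len(S)
    return [[sum(S[i][k] * T[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(S, T):
    """Entrywise sum S + T."""
    return [[a + b for a, b in zip(rs, rt)] for rs, rt in zip(S, T)]

def norm(S):
    """Maximum absolute row sum (operator norm for the max norm on R^n)."""
    return max(sum(abs(a) for a in row) for row in S)

I = [[1, 0], [0, 1]]
S1 = [[1, -2], [3, 4]]     # illustrative sample matrices
S2 = [[0, 5], [-1, 2]]
T = [[2, 1], [-3, 0]]

# Distributive law (S1 + S2)T = S1 T + S2 T
print(matmul(matadd(S1, S2), T) == matadd(matmul(S1, T), matmul(S2, T)))

# Norm axioms: ||S1 T|| <= ||S1|| ||T||  and  ||I|| = 1
print(norm(matmul(S1, T)), "<=", norm(S1) * norm(T))
print(norm(I) == 1)

# Composition is not commutative in general
print(matmul(S1, T) == matmul(T, S1))
```

The last line is a reminder that $\operatorname{Hom} V$ is in general a noncommutative Banach algebra, a point that matters later for the exponential function.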
The most important simple thing we can do in a Banach algebra that we couldn't do in a Banach space is to consider power series. The following theorem shows that the geometric series, in particular, plays the same central role here that it plays in elementary calculus. Since we are not thinking of the elements of $A$ as transformations, we shall designate them by lower-case letters; $e$ is the identity of $A$.

Theorem 8.1. If $A$ is a Banach algebra, and if $x$ in $A$ has norm less than one, then $(e - x)$ is invertible and its inverse is the sum of the geometric series in $x$:

$$(e - x)^{-1} = \sum_0^\infty x^n.$$

Also, $\|e - (e - x)^{-1}\| \le r/(1 - r)$, where $r = \|x\|$.

Proof. Since $\|x^n\| \le \|x\|^n = r^n$, the series $\sum x^n$ is absolutely convergent when $\|x\| < 1$ by comparison with the ordinary geometric series $\sum r^n$. It is therefore convergent, and if $y = \sum_0^\infty x^n$, then

$$(e - x)y = \lim_{n \to \infty} (e - x)\sum_0^n x^i = \lim_{n \to \infty} (e - x^{n+1}) = e,$$

since $\|x^{n+1}\| \le r^{n+1} \to 0$. The same computation gives $y(e - x) = e$. That is, $y = (e - x)^{-1}$. Finally,

$$\|e - (e - x)^{-1}\| = \Bigl\|\sum_1^\infty x^n\Bigr\| \le \sum_1^\infty r^n = r/(1 - r). \quad \square$$

Theorem 8.2. The set $\mathcal{N}$ of invertible elements in a Banach algebra $A$ is open and the mapping $x \mapsto x^{-1}$ is continuous from $\mathcal{N}$ to $\mathcal{N}$. In fact, if $y^{-1}$ exists and $m = \|y^{-1}\|$, then $(y - h)^{-1}$ exists whenever $\|h\| < 1/m$ and

$$\|(y - h)^{-1} - y^{-1}\| \le m^2\|h\|/(1 - m\|h\|).$$

Proof. Set $x = y^{-1}h$. Then $(y - h) = y(e - x)$, where $\|x\| = \|y^{-1}h\| \le m\|h\|$, and so by the above theorem $y - h$ will be invertible, with $(y - h)^{-1} = (e - x)^{-1}y^{-1}$, provided $\|h\| < 1/m$. Then also $\|y^{-1} - (y - h)^{-1}\| \le \|e - (e - x)^{-1}\| \cdot m$, and, with $r = \|x\|$, this is bounded above by $mr/(1 - r) \le m^2\|h\|/(1 - m\|h\|)$, by the last inequality in the above theorem. $\square$

Corollary. If $V$ and $W$ are Banach spaces, then the invertible elements in $\operatorname{Hom}(V, W)$ form an open set, and the map $T \mapsto T^{-1}$ is continuous on this domain.

Proof. Suppose that $T^{-1}$ exists, and set $m = \|T^{-1}\|$.
Then if $\|T - S\| < 1/m$, we have $\|I - T^{-1}S\| \le \|T^{-1}\|\,\|T - S\| < 1$, and so $T^{-1}S = I - (I - T^{-1}S)$ is an invertible element of $\operatorname{Hom} V$. Therefore, $S = T(T^{-1}S)$ is invertible and $S^{-1} = (T^{-1}S)^{-1}T^{-1}$. The continuity of $S \mapsto S^{-1}$ is left to the reader. $\square$
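The geometric-series inverse of Theorem 8.1 can be checked in exact arithmetic. The sketch below is an illustration under stated assumptions: it takes $A$ to be the $2 \times 2$ rational matrices with the maximum-row-sum norm, picks one sample element `X` with $\|X\| = 3/8$, and uses hypothetical helper names (`matmul`, `partial_sum`, and so on). It verifies the identity $(e - x)\sum_0^n x^i = e - x^{n+1}$ used in the proof, the tail estimate $\|(e - x)^{-1} - \sum_0^n x^i\| \le r^{n+1}/(1 - r)$, and the bound $\|e - (e - x)^{-1}\| \le r/(1 - r)$.

```python
from fractions import Fraction as Fr

def matmul(S, T):
    n = len(S)
    return [[sum(S[i][k] * T[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(S, T):
    return [[a + b for a, b in zip(rs, rt)] for rs, rt in zip(S, T)]

def matsub(S, T):
    return [[a - b for a, b in zip(rs, rt)] for rs, rt in zip(S, T)]

def norm(S):
    """Maximum absolute row sum; a Banach algebra norm with norm(E) == 1."""
    return max(sum(abs(a) for a in row) for row in S)

E = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]              # identity e
X = [[Fr(1, 4), Fr(1, 8)], [Fr(0), Fr(1, 4)]]     # sample x with ||x|| < 1
Y = [[Fr(4, 3), Fr(2, 9)], [Fr(0), Fr(4, 3)]]     # inverse of e - x, computed by hand
r = norm(X)

def partial_sum(x, n):
    """Return (e + x + ... + x^n, x^(n+1))."""
    total, power = E, E
    for _ in range(n):
        power = matmul(power, x)
        total = matadd(total, power)
    return total, matmul(power, x)

S10, X11 = partial_sum(X, 10)

print(matmul(matsub(E, X), Y) == E)                      # Y really is (e - x)^(-1)
print(matmul(matsub(E, X), S10) == matsub(E, X11))       # telescoping identity
print(norm(matsub(Y, S10)) <= r**11 / (1 - r))           # tail estimate
print(norm(matsub(E, Y)), "<=", r / (1 - r))             # bound from Theorem 8.1
```

Rational entries are used so that every equality and inequality is tested exactly rather than up to rounding.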
We saw above that the map $x \mapsto (e - x)^{-1}$ from the open unit ball $B_1(0)$ in a Banach algebra $A$ to $A$ is the sum of the geometric power series. We can define many other mappings by convergent power series, at hardly any greater effort.

Theorem 8.3. Let $A$ be a Banach algebra. Let the sequence $\{a_n\} \subset A$ and the positive number $\delta$ be such that the sequence $\{\|a_n\|\delta^n\}$ is bounded. Then $\sum a_n x^n$ converges for $x$ in the ball $B_\delta(0)$ in $A$, and if $0 < s < \delta$, then the series converges uniformly on $B_s(0)$.

Proof. Set $r = s/\delta$, and let $b$ be a bound for the sequence $\{\|a_n\|\delta^n\}$. On the ball $B_s(0)$ we then have $\|a_n x^n\| \le \|a_n\|s^n = \|a_n\|\delta^n r^n \le br^n$, and the series therefore converges uniformly on this ball by comparison with the geometric series $b\sum r^n$, since $r < 1$. $\square$

The series of most interest to us will have real coefficients. They are included in the above argument because the product of the vector $x$ and the scalar $t$ is the algebra product $(te)x$. In addition to dealing with the above geometric series, we shall be particularly interested in the exponential function $e^x = \sum_0^\infty x^n/n!$. The usual comparison arguments of the elementary calculus show just as easily here that this series converges for every $x$ in $A$ and uniformly on any ball.

It is natural to consider the differentiability of the maps from $A$ to $A$ defined by such convergent series, and we state the basic facts below, starting with a fundamental theorem on the differentiability of a limit of a sequence.

Theorem 8.4. Let $\{F_n\}$ be a sequence of maps from a ball $B$ in a normed linear space $V$ to a normed linear space $W$ such that $F_n$ converges pointwise to a map $F$ on $B$ and such that $\{d(F_n)_\alpha\}$ converges for each $\alpha$ and uniformly over $\alpha$. Then $F$ is differentiable on $B$ and $dF_\beta = \lim d(F_n)_\beta$ for each $\beta$ in $B$.

Proof. Fix $\beta$ and set $T = \lim d(F_n)_\beta$. By the uniform convergence of $\{d(F_n)_\alpha\}$, given $\epsilon$, there is an $N$ such that $\|d(F_m)_\alpha - d(F_n)_\alpha\| \le \epsilon$ for all $m, n > N$ and for all $\alpha$ in $B$.
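It then follows from the mean-value theorem for differentials that

$$\|(\Delta(F_m)_\beta(\xi) - \Delta(F_n)_\beta(\xi)) - (d(F_m)_\beta(\xi) - d(F_n)_\beta(\xi))\| \le 2\epsilon\|\xi\|$$

for all $m, n > N$ and all $\xi$ such that $\beta + \xi \in B$. Letting $m \to \infty$ and regrouping, we have

$$\|(\Delta F_\beta(\xi) - T(\xi)) - (\Delta(F_n)_\beta(\xi) - d(F_n)_\beta(\xi))\| \le 2\epsilon\|\xi\|$$

for all such $\xi$. But, by the definition of $d(F_n)_\beta$, there is a $\delta$ such that

$$\|\Delta(F_n)_\beta(\xi) - d(F_n)_\beta(\xi)\| \le \epsilon\|\xi\|$$

when $\|\xi\| < \delta$. Putting these last two inequalities together, we see that

$$\|\xi\| < \delta \Rightarrow \|\Delta F_\beta(\xi) - T(\xi)\| \le 3\epsilon\|\xi\|.$$

Thus $F$ is differentiable at $\beta$ and $dF_\beta = T$. $\square$

The remaining proofs are left as a set of exercises.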
Lemma 8.1. Multiplication on a Banach algebra $A$ is differentiable (from $A \times A$ to $A$). If we let $p$ be the product function, so that $p(x, y) = xy$, then $dp_{\langle a, b \rangle}(x, y) = ay + xb$.

Lemma 8.2. Let $A$ be a commutative Banach algebra, and let $p$ be the monomial function $p(x) = ax^n$. Then $p$ is everywhere differentiable and $dp_y(x) = nay^{n-1}x$.

Lemma 8.3. If $\{\|a_n\|r^n\}$ is a bounded sequence in $\mathbb{R}$, then $\{n\|a_n\|s^n\}$ is bounded for any $0 < s < r$, and therefore $\sum na_n x^{n-1}$ converges uniformly on any ball in $A$ smaller than $B_r(0)$.

Theorem 8.5. If $A$ is a commutative Banach algebra and $\{a_n\} \subset A$ is such that $\{\|a_n\|r^n\}$ is bounded in $\mathbb{R}$, then $F(x) = \sum_0^\infty a_n x^n$ is defined and differentiable on the ball $B_r(0)$ in $A$, and

$$dF_y(x) = \Bigl(\sum_1^\infty na_n y^{n-1}\Bigr) \cdot x.$$

It is natural to call the element $\sum_1^\infty na_n y^{n-1}$ the derivative of $F$ at $y$ and to designate it $F'(y)$, although this departs from our rule that derivatives are vectors obtained as limits of difference quotients. The remarkable aspect of the above theorem is that for this kind of differentiable mapping from an open subset of $A$ to $A$ the linear transformation $dF_y$ is multiplication by an element of $A$: $dF_y(x) = F'(y) \cdot x$.

In particular, the exponential function $\exp(x) = e^x = \sum_0^\infty x^n/n!$ is its own derivative, since $\sum_1^\infty nx^{n-1}/n! = \sum_0^\infty x^m/m!$. From this fact (see the exercises) or from direct algebraic manipulation of the series in question, we can deduce the law of exponents $e^{x+y} = e^x e^y$. Remember, though, that this is on a commutative Banach algebra. The function $x \mapsto e^x = \sum_0^\infty x^n/n!$ can be defined just as easily on any Banach algebra $A$, but it is not nearly as pleasant when $A$ is noncommutative. However, one thing that we can always do, and often thereby save the day, is to restrict the exponential mapping to a commutative subalgebra of $A$, say that generated by a single element $x$.
For example, we can consider the parametrized arc $\gamma(t) = e^{tx}$ ($x$ fixed) into any Banach algebra $A$, and, because its range lies in the commutative subalgebra $X$ generated by $x$, we can apply Theorem 7.2 of Chapter 3 to conclude that $\gamma$ is differentiable and that $\gamma'(t) = d\exp_{tx}(x) = xe^{tx}$. This can also easily be proved directly from the law of exponents: $\Delta\gamma_t(h) = e^{(t+h)x} - e^{tx} = e^{tx}(e^{hx} - 1)$, and since it is clear from the series that $(e^{hx} - 1)/h \to x$ as $h \to 0$, we have that $\gamma'(t) = \lim_{h \to 0} \Delta\gamma_t(h)/h = xe^{tx}$.
EXERCISES

8.1 Finish the proof of the corollary of Theorem 8.2.

8.2 Let $A$ be a Banach algebra, and let $\{a_n\} \subset \mathbb{R}$ and $x \in A$ be such that $\sum a_i x^i$ converges. Suppose also that $x$ satisfies a polynomial identity $p(x) = \sum_0^n b_i x^i = 0$, where $\{b_i\} \subset \mathbb{R}$ and $b_n \ne 0$. Prove that the element $\sum_0^\infty a_i x^i$ is a polynomial in $x$ of degree $\le n - 1$. (Let $M$ be the linear span of $\{x^i\}_0^{n-1}$, and show first that $x^i \in M$ for all $i$.)

8.3 Let $A$ be any Banach algebra, let $x$ be a fixed element in $A$, and let $X$ be the smallest closed subalgebra of $A$ containing $x$. Prove that $X$ is a commutative Banach algebra. (The set of polynomials $p(x) = \sum_0^n a_i x^i$ is the smallest algebra containing $x$. Consider its closure in $X$.)

8.4 Prove Lemma 8.1. [Hint: $\langle x, y \rangle \mapsto xy$ is a bounded bilinear map.]

8.5 Prove Lemma 8.2 by making a direct $\Delta$-estimate from the binomial expansion, as in the elementary calculus.

8.6 Prove Lemma 8.2 by induction from Lemma 8.1.

8.7 Let $A$ be any Banach algebra. Prove that $p\colon x \mapsto x^3$ is differentiable and that $dp_a(x) = xa^2 + axa + a^2x$.

8.8 Prove by induction that if $q(x) = x^n$, then $q$ is differentiable and

$$dq_a(x) = \sum_{i=0}^{n-1} a^i x a^{n-1-i}.$$

Deduce Lemma 8.2 as a corollary.

8.9 Let $A$ be any Banach algebra. Prove that $r\colon x \mapsto x^{-1}$ is everywhere differentiable on the open set $U$ of invertible elements and that $dr_a(x) = -a^{-1}xa^{-1}$. [Hint: Examine the proofs of Theorems 8.1 and 8.2.]

8.10 Let $A$ be an open subset of a normed linear space $V$, and let $F$ and $G$ be mappings from $A$ to a Banach algebra $X$ that are differentiable at $\alpha$. Prove that the product mapping $FG$ is differentiable at $\alpha$ and that $d(FG)_\alpha = F(\alpha)\,dG_\alpha + dF_\alpha\,G(\alpha)$. Does it follow that $d(F^2)_\alpha = 2F(\alpha)\,dF_\alpha$?

8.11 Continuing the above exercise, show that if $X$ is a commutative Banach algebra, then $d(F^n)_\alpha = nF^{n-1}(\alpha)\,dF_\alpha$.

8.12 Let $F\colon A \to X$ be a differentiable map from an open set $A$ of a normed linear space to a Banach algebra $X$, and suppose that the element $F(\xi)$ is invertible in $X$ for every $\xi$ in $A$.
Prove that the map $G\colon \xi \mapsto [F(\xi)]^{-1}$ is differentiable and that $dG_\alpha(\xi) = -F(\alpha)^{-1}\,dF_\alpha(\xi)\,F(\alpha)^{-1}$. Show also that if $F$ is a parametrized arc ($A = I \subset \mathbb{R}$), then $G'(a) = -F(a)^{-1} \cdot F'(a) \cdot F(a)^{-1}$.

8.13 Prove Lemma 8.3.

8.14 Prove Theorem 8.5 by showing that Lemma 8.3 makes Theorem 8.4 applicable.

8.15 Show that in Theorem 8.4 the convergence of $F_n$ to $F$ needs only to be assumed at one point, provided we know that the codomain space $W$ is a Banach space.
8.16 We want to prove the law of exponents for the exponential function on a commutative Banach algebra. Show first that $(\exp(-x))(\exp x) = e$ by applying Exercise 7.13 of Chapter 3, the above Exercise 8.10, and the fact that $d\exp_a(x) = (\exp a)x$.

8.17 Show that if $X$ is a commutative Banach algebra and $F\colon X \to X$ is a differentiable map such that $dF_\alpha(\xi) = \xi F(\alpha)$, then $F(\xi) = \beta \exp \xi$ for some constant $\beta$. [Consider the differential of $F(\xi)\exp(-\xi)$.]

8.18 Now set $F(\xi) = \exp(\xi + \eta)$ and prove from the above exercise that $\exp(\xi + \eta) = \exp(\xi)\exp(\eta)$. You will also need the fact that $\exp 0 = 1$.

8.19 Let $z$ be a nilpotent element in a commutative Banach algebra $X$. That is, $z^p = 0$ for some positive integer $p$. Show by an elementary estimate based on the binomial expansion that if $\|x\| < 1$, then $\|(x + z)^n\| \le kn^p\|x\|^{n-p}$ for $n > p$. The series of positive terms $\sum n^p r^n$ converges for $r < 1$ (by the ratio test). Show, therefore, that the series for $\log(1 - (x + z))$ and for $(1 - (x + z))^{-1}$ converge when $\|x\| < 1$.

8.20 Continuing the above exercise, show that $F(y) = \log(1 - y)$ is defined and differentiable on the ball $\|y - z\| < 1$ and that $dF_a(x) = -(1 - a)^{-1} \cdot x$. Show, therefore, that $\exp(\log(1 - y)) = 1 - y$ on this ball, either by applying the inverse mapping theorem or by applying the composite function rule for differentiating. Conclude that for every nilpotent element $z$ in $X$ there exists a $u$ in $X$ such that $\exp u = 1 - z$.

8.21 Let $X_1, \ldots, X_n$ be Banach algebras. Show that the product Banach space $X = \prod_1^n X_i$ becomes a Banach algebra if the product $xy = \langle x_1, \ldots, x_n \rangle \langle y_1, \ldots, y_n \rangle$ is defined as $\langle x_1y_1, \ldots, x_ny_n \rangle$ and if the maximum norm is used on $X$.

8.22 In the above situation the projections $\pi_i$ have now become bounded algebra homomorphisms.
In fact, just as in our original vector definitions on a product space, our definition of multiplication on $X$ was determined by the requirement that $\pi_i(xy) = \pi_i(x)\pi_i(y)$ for all $i$. State and prove an algebra theorem analogous to Theorem 3.4 of Chapter 1.

8.23 Continuing the above discussion, suppose that the series $\sum a_n x^n$ converges in $X$, with sum $y$. Show that then $\sum (a_n)_i (x_i)^n$ converges in $X_i$ to $y_i$ for each $i$, where, of course, $y = \langle y_1, \ldots, y_n \rangle$. Conclude that $e^x = \langle e^{x_1}, \ldots, e^{x_n} \rangle$ for any $x = \langle x_1, \ldots, x_n \rangle$ in $X$.

8.24 Define the sine and cosine functions on a commutative Banach algebra, and show that $\sin' = \cos$, $\cos' = -\sin$, $\sin^2 + \cos^2 = e$.

9. THE CONTRACTION MAPPING FIXED-POINT THEOREM

In this section we shall prove the very simple and elegant fixed-point theorem for contraction mappings, and then shall use it to complete the proof of the implicit-function theorem. Later, in Chapter 6, it will be the basis of our proof of the fundamental existence and uniqueness theorem for ordinary differential equations. The section concludes with a comparison of the iterative procedure of the fixed-point theorem and that of Newton's method.
A mapping $K$ from a metric space $X$ to itself is a contraction if it is a Lipschitz mapping with constant less than 1; that is, if there is a constant $C$ with $0 < C < 1$ such that $\rho(K(x), K(y)) \le C\rho(x, y)$ for all $x, y \in X$. A fixed point of $K$ is, of course, a point $x$ such that $K(x) = x$. A contraction $K$ can have at most one fixed point, since if $K(x) = x$ and $K(y) = y$, then $\rho(x, y) = \rho(K(x), K(y)) \le C\rho(x, y)$, and so $(1 - C)\rho(x, y) \le 0$. Since $C < 1$, this implies that $\rho(x, y) = 0$ and $x = y$.

Theorem 9.1. Let $X$ be a nonempty complete metric space, and let $K: X \to X$ be a contraction. Then $K$ has a (unique) fixed point.

Proof. Choose any $x_0$ in $X$, and define the sequence $\{x_n\}_0^\infty$ inductively by setting $x_1 = K(x_0)$, $x_2 = K(x_1) = K^2(x_0)$, and $x_n = K(x_{n-1}) = K^n(x_0)$. Set $\delta = \rho(x_1, x_0)$. Then $\rho(x_2, x_1) = \rho(K(x_1), K(x_0)) \le C\rho(x_1, x_0) = C\delta$, and, by induction,
\[ \rho(x_{n+1}, x_n) = \rho(K(x_n), K(x_{n-1})) \le C\rho(x_n, x_{n-1}) \le C \cdot C^{n-1}\delta = C^n\delta. \]
It follows that $\{x_n\}$ is Cauchy, for if $m > n$, then
\[ \rho(x_m, x_n) \le \sum_{i=n}^{m-1} \rho(x_{i+1}, x_i) \le \sum_{i=n}^{m-1} C^i\delta \le C^n\delta/(1 - C), \]
and $C^n \to 0$ as $n \to \infty$, because $C < 1$. Since $X$ is complete, $\{x_n\}$ converges to some $a$ in $X$, and it then follows that $K(a) = \lim K(x_n) = \lim x_{n+1} = a$, so that $a$ is a fixed point. □

In practice, we meet mappings $K$ that are contractions only near some particular point $p$, and we have to establish that a suitable neighborhood of $p$ is carried into itself by $K$. We show below that if $K$ is a contraction on a ball about $p$, and if $K$ doesn't move the center $p$ very far, then the theorem can be applied.

Corollary 1. Let $D$ be a closed ball in a complete metric space $X$, and let $K: D \to X$ be a contraction which moves the center of $D$ a distance at most $(1 - C)r$, where $r$ is the radius of $D$ and $C$ is the contraction constant. Then $K$ has a unique fixed point and it is in $D$.

Proof. We simply check that the range of $K$ is actually in $D$.
If $p$ is the center of $D$ and $x$ is any point in $D$, then
\[ \rho(K(x), p) \le \rho(K(x), K(p)) + \rho(K(p), p) \le C\rho(x, p) + (1 - C)r \le Cr + (1 - C)r = r. \quad □ \]

Corollary 2. Let $B$ be an open ball in a complete metric space $X$, and let $K: B \to X$ be a contraction which moves the center of $B$ a distance less than $(1 - C)r$, where $r$ is the radius of $B$ and $C$ is the contraction constant. Then $K$ has a unique fixed point.

Proof. Restrict $K$ to any slightly smaller closed ball $D$ concentric with $B$, and apply the above corollary. □
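The iteration in the proof of Theorem 9.1 can be tried numerically. The following is a minimal sketch, not part of the text: the map $K = \cos$ on $X = [0, 1]$ and the helper name `fixed_point_iterates` are assumptions chosen for illustration. On $[0, 1]$ the cosine maps the interval into itself and is Lipschitz with constant $C = \sin 1 < 1$, and letting $m \to \infty$ in the Cauchy estimate of the proof gives the bound $\rho(x_n, a) \le C^n\delta/(1 - C)$, which the code checks.

```python
import math

def fixed_point_iterates(K, x0, n_steps):
    """Return [x_0, x_1, ..., x_n] with x_{k+1} = K(x_k)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(K(xs[-1]))
    return xs

# Assumed example: K = cos on X = [0, 1], where |K'(x)| = |sin x| <= sin 1 < 1.
C = math.sin(1.0)
xs = fixed_point_iterates(math.cos, 0.0, 200)
a = xs[-1]                    # numerically converged fixed point
delta = abs(xs[1] - xs[0])    # delta = rho(x_1, x_0)

# Bound obtained from the proof by letting m -> infinity:
#   rho(x_n, a) <= C^n * delta / (1 - C).
bounds_hold = all(
    abs(x - a) <= C**n * delta / (1 - C) + 1e-15
    for n, x in enumerate(xs[:60])
)
print(a, bounds_hold)
```

The bound is usually pessimistic: the observed rate is governed by the Lipschitz constant near the fixed point, which can be much smaller than the constant valid on the whole ball.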
Corollary 3. Let $K$ be a contraction on the complete metric space $X$, and suppose that $K$ moves the point $x$ a distance $d$. Then the distance from $x$ to the fixed point is at most $d/(1 - C)$, where $C$ is the contraction constant.

Proof. Let $D$ be the closed ball about $x$ of radius $r = d/(1 - C)$, and apply Corollary 1 to the restriction of $K$ to $D$. It implies that the fixed point is in $D$. □

We now suppose that the contraction $K$ contains a parameter $s$, so that $K$ is now a function of two variables $K(s, x)$. We shall assume that $K$ is a contraction in $x$ uniformly over $s$, which means that $\rho(K(s, x), K(s, y)) \le C\rho(x, y)$ for all $x$, $y$, and $s$, where $0 < C < 1$. We shall also assume that $K$ is a continuous function of $s$ for each fixed $x$.

Corollary 4. Let $K$ be a mapping from $S \times X$ to $X$, where $X$ is a complete metric space and $S$ is any metric space, and suppose that $K(s, x)$ is a contraction in $x$ uniformly over $s$ and is continuous in $s$ for each $x$. Then the fixed point $p_s$ is a continuous function of $s$.

Proof. Given $\epsilon$, we use the continuity of $K$ in its first variable around the point $\langle t, p_t \rangle$ to choose $\delta$ so that if $\rho(s, t) < \delta$, then the distance from $K(s, p_t)$ to $K(t, p_t)$ is at most $\epsilon$. Since $K(t, p_t) = p_t$, this simply says that the contraction with parameter value $s$ moves $p_t$ a distance at most $\epsilon$, and so the distance from $p_t$ to the fixed point $p_s$ is at most $\epsilon/(1 - C)$ by Corollary 3. That is, $\rho(s, t) < \delta \Rightarrow \rho(p_s, p_t) \le \epsilon/(1 - C)$, where $C$ is the uniform contraction constant, and the mapping $s \mapsto p_s$ is accordingly continuous at $t$. □

Combining Corollaries 2 and 4, we have the following theorem.

Theorem 9.2. Let $B$ be a ball in a complete metric space $X$, let $S$ be any metric space, and let $K$ be a mapping from $S \times B$ to $X$ which is a contraction in its second variable uniformly over its first variable and is continuous in its first variable for each value of its second variable.
Suppose also that $K$ moves the center of $B$ a distance less than $(1 - C)r$ for every $s$ in $S$, where $r$ is the radius of $B$ and $C$ is the uniform contraction constant. Then for each $s$ in $S$ there is a unique $p$ in $B$ such that $K(s, p) = p$, and the mapping $s \mapsto p$ is continuous from $S$ to $B$.

We can now complete the proof of the implicit-function theorem.

Theorem 9.3. Let $V$, $W$, and $X$ be Banach spaces, let $A \times B$ be an open subset of $V \times W$, and let $G: A \times B \to X$ be continuous and have a continuous second partial differential. Suppose that the point $\langle \alpha, \beta \rangle$ in $A \times B$ is such that $G(\alpha, \beta) = 0$ and $dG^2_{\langle \alpha, \beta \rangle}$ is invertible. Then there are open balls $M$ and $N$ about $\alpha$ and $\beta$, respectively, such that for each $\xi$ in $M$ there is a unique $\eta$ in $N$ satisfying $G(\xi, \eta) = 0$. The function $F$ thus uniquely defined near $\langle \alpha, \beta \rangle$ by the condition $G(\xi, F(\xi)) = 0$ is continuous.

Proof. Set $T = dG^2_{\langle \alpha, \beta \rangle}$ and $K(\xi, \eta) = \eta - T^{-1}(G(\xi, \eta))$. Then $K$ is a continuous mapping from $A \times B$ to $W$ such that $K(\alpha, \beta) = \beta$, and $K$ has a continuous second partial differential such that $dK^2_{\langle \alpha, \beta \rangle} = 0$. Because $dK^2_{\langle \mu, \nu \rangle}$ is a continuous function of $\langle \mu, \nu \rangle$, we can choose a product ball $M \times N$ about $\langle \alpha, \beta \rangle$ on which $dK^2_{\langle \mu, \nu \rangle}$ is bounded by $1/2$, and we can then decrease the ball $M$ if necessary so that for $\mu$ in $M$ we also have $\|K(\mu, \beta) - \beta\| < r/2$, where $r$ is the radius of the ball $N$. The mean-value theorem for differentials implies that $K$ is a contraction in its second variable with constant $1/2$. The preceding theorem therefore shows that for each $\xi$ in $M$ there is a unique $\eta$ in $N$ such that $K(\xi, \eta) = \eta$ and the mapping $F: \xi \mapsto \eta$ is continuous. Since $K(\xi, \eta) = \eta$ if and only if $G(\xi, \eta) = 0$, we are done. □

Theorems 8.2 and 9.3 complete the list of ingredients of the implicit-function theorem. (However, see Exercise 9.8.) We next show, in the other direction, that if a contraction depending on a parameter is continuously differentiable, then the fixed point is a continuously differentiable function of the parameter.

Theorem 9.4. Let $V$ and $W$ be Banach spaces, and let $K$ be a differentiable mapping from an open subset $A \times B$ of $V \times W$ to $W$ which satisfies the hypotheses of Theorem 9.2. Then the function $F$ from $A$ to $B$ uniquely defined by the equation $K(\xi, F(\xi)) = F(\xi)$ is differentiable.

Proof. The inequality $\|K(\xi, \eta') - K(\xi, \eta'')\| \le C\|\eta' - \eta''\|$ is equivalent to $\|dK^2_{\langle \alpha, \beta \rangle}\| \le C$ for all $\langle \alpha, \beta \rangle$ in $A \times B$. We now define $G$ by $G(\xi, \eta) = \eta - K(\xi, \eta)$, and observe that $dG^2 = I - dK^2$ and that $dG^2$ is therefore invertible by Theorem 8.1. Since $G(\xi, F(\xi)) = 0$, it follows from Theorem 11.1 of Chapter 3 that $F$ is differentiable and that its differential is obtained by differentiating the above equation. □

Corollary. If $K$ is continuously differentiable, then so is $F$.
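The construction in the proof of Theorem 9.3 can be followed in one dimension. The sketch below is an illustration under stated assumptions, not the authors' example: the function $G(\xi, \eta) = \eta + \tfrac14 \sin\eta - \xi$ with $\langle \alpha, \beta \rangle = \langle 0, 0 \rangle$ is hypothetical, as are the names `F` and `dF`. For this choice $T = 1.25$ and $|\partial K/\partial\eta| = 0.2(1 - \cos\eta) \le 0.4$, so $K$ happens to be a contraction for every $\xi$ and no shrinking of the balls is needed. The derivative formula comes from differentiating $G(\xi, F(\xi)) = 0$, as in Theorem 9.4.

```python
import math

def G(xi, eta):
    # Hypothetical example with G(0, 0) = 0 and dG/d(eta) = 1.25 at <0, 0>.
    return eta + 0.25 * math.sin(eta) - xi

T = 1.25  # T = second partial differential of G at <alpha, beta> = <0, 0>

def K(xi, eta):
    # K(xi, eta) = eta - T^{-1} G(xi, eta); here |dK/d(eta)| = 0.2*(1 - cos eta) <= 0.4.
    return eta - G(xi, eta) / T

def F(xi, n_steps=60):
    """Fixed point of K(xi, .), found by iterating from beta = 0."""
    eta = 0.0
    for _ in range(n_steps):
        eta = K(xi, eta)
    return eta

def dF(xi):
    """F'(xi) from differentiating G(xi, F(xi)) = 0: -1 + (1 + 0.25 cos F) F' = 0."""
    return 1.0 / (1.0 + 0.25 * math.cos(F(xi)))

h = 1e-6
for xi in (-0.5, 0.0, 0.1, 0.5):
    finite_difference = (F(xi + h) - F(xi - h)) / (2 * h)
    print(xi, F(xi), G(xi, F(xi)), finite_difference, dF(xi))
```

The printed residual $G(\xi, F(\xi))$ should be at rounding level, and the finite-difference slope should agree with `dF` to several digits.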
* We should emphasize that the fixed-point theorem not only has the implicit-function theorem as a consequence, but the proof of the fixed-point theorem gives an iterative procedure for actually finding the value of $F(\xi)$, once we know how to compute $T^{-1}$ (where $T = dG^2_{\langle \alpha, \beta \rangle}$). In fact, for a given value of $\xi$ in a small enough ball about $\alpha$ consider the function $G(\xi, \cdot)$. If we set $K(\xi, \eta) = \eta - T^{-1}G(\xi, \eta)$, then the inductive procedure $\eta_{i+1} = K(\xi, \eta_i)$ becomes
\[ \eta_{i+1} - \eta_i = -T^{-1}G(\xi, \eta_i). \qquad (9.1) \]
The meaning of this iterative procedure is easily seen by studying the graph of the situation where $V = W = \mathbb{R}^1$. (See Fig. 4.4.) As was proved above, under suitable hypotheses, the series $\sum \|\eta_{i+1} - \eta_i\|$ converges geometrically. It is instructive to compare this procedure with Newton's method of elementary calculus. There the iterative scheme (9.1) is replaced by
\[ \eta_{i+1} - \eta_i = -S_i^{-1}G(\xi, \eta_i), \qquad (9.2) \]
where $S_i = dG^2_{\langle \xi, \eta_i \rangle}$. (See Fig. 4.5.) As we shall see, this procedure (when it works) converges much more rapidly than (9.1), but it suffers from the disadvantage that we must be able to compute the inverses of an infinite number of linear transformations $S_i$.

[Fig. 4.4]

[Fig. 4.5]

Let us suppress the $\xi$ which will be fixed in the argument and consider a map $G$ defined in some neighborhood of the origin in a Banach space. Suppose that $G$ has two continuous differentials. For definiteness, we assume that $G$ is defined in the unit ball, $B$, and we suppose that for each $x \in B$ the map $dG_x$ is invertible and, in fact,
\[ \|dG_x^{-1}\| \le K, \qquad \|d^2G_x\| \le K. \]
Let $x_0 = 0$ and, assuming that $x_n$ has been defined, we set
\[ x_{n+1} = x_n - S_n^{-1}G(x_n), \]
where $S_n = dG_{x_n}$. We shall show that if $\|G(0)\|$ is sufficiently small (in terms of $K$), then the procedure is well defined (that is, $\|x_{n+1}\| < 1$) and converges rapidly. In fact, if $\tau$ is any real number between one and two (for instance $\tau = 3/2$), we shall show that for some $c$ (which can be made large if $\|G(0)\|$ is small)
\[ \|x_n - x_{n-1}\| \le e^{-c\tau^n}. \qquad (*) \]
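Before the estimate is proved, the difference between schemes (9.1) and (9.2) can be seen in a small experiment. This is only a sketch under assumptions: the scalar function $G(\eta) = \eta + \tfrac14\sin\eta - \xi$ with $\xi = 0.5$ is a hypothetical test case, and `run`, `fixed_T`, and `newton` are illustrative names. Scheme (9.1) reuses the single inverse $T^{-1}$ computed at the starting point, while scheme (9.2) recomputes $S_i^{-1}$ at every step.

```python
import math

XI = 0.5  # fixed parameter value (assumed for the illustration)

def G(eta):
    return eta + 0.25 * math.sin(eta) - XI

def dG(eta):
    return 1.0 + 0.25 * math.cos(eta)

def run(step, n):
    """Return the iterates [eta_0, ..., eta_n] of eta -> step(eta), starting at 0."""
    eta, out = 0.0, [0.0]
    for _ in range(n):
        eta = step(eta)
        out.append(eta)
    return out

T = dG(0.0)                                     # derivative frozen at the starting point
fixed_T = run(lambda e: e - G(e) / T, 8)        # scheme (9.1)
newton = run(lambda e: e - G(e) / dG(e), 8)     # scheme (9.2)

# Residuals |G(eta_i)|: geometric decay in the first column,
# roughly squared at each step in the second until rounding level is reached.
for a, b in zip(fixed_T, newton):
    print(f"{abs(G(a)):.3e}   {abs(G(b)):.3e}")
```

In this one-dimensional case inverting $S_i$ is a single division, so the extra cost of Newton's method is invisible; in a Banach space each $S_i^{-1}$ may be expensive, which is the trade-off described above.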
Note that if we can establish (*) for large enough $c$, then $\|x_n\| < 1$ follows. In fact, since $\tau^j \ge 1 + j(\tau - 1) > j(\tau - 1)$,
\[ \|x_n\| \le \sum_{j=1}^{n} \|x_j - x_{j-1}\| \le \sum_{j=1}^{\infty} e^{-c\tau^j} \le \sum_{j=1}^{\infty} e^{-cj(\tau - 1)} = \frac{e^{-c(\tau - 1)}}{1 - e^{-c(\tau - 1)}}, \]
which is $< 1$ if $c$ is large. Let us try to prove (*) by induction. Assuming it true for $n$, we have
\[ \|x_{n+1} - x_n\| = \|S_n^{-1}G(x_n)\| \le K\{\|G(x_{n-1}) - dG_{x_{n-1}}S_{n-1}^{-1}G(x_{n-1})\| + K\|x_n - x_{n-1}\|^2\} \]
by Taylor's theorem. Now the first term on the right of the inequality vanishes, and we have
\[ \|x_{n+1} - x_n\| \le K^2\|x_n - x_{n-1}\|^2 \le K^2 e^{-2c\tau^n}. \]
For the induction to work we must have
\[ K^2 e^{-2c\tau^n} \le e^{-c\tau^{n+1}}, \]
or
\[ K^2 \le e^{c(2 - \tau)\tau^n}. \qquad (**) \]
Since $\tau < 2$, this last inequality can be arranged by choosing $c$ sufficiently large. We must still verify (*) for $n = 1$. This says that
\[ \|S_0^{-1}G(0)\| \le e^{-c\tau}, \]
which holds if
\[ \|G(0)\| \le \frac{e^{-c\tau}}{K}. \qquad (***) \]
In summary, for $1 < \tau < 2$ choose $c$ so that $K^2 \le e^{c(2 - \tau)\tau}$ and
\[ \frac{e^{-c(\tau - 1)}}{1 - e^{-c(\tau - 1)}} \le 1. \]
Then if (***) holds, the sequence $x_n$ converges exponentially, that is, (*) holds. If $x = \lim x_n$, then $G(x) = \lim G(x_n) = \lim S_n(x_n - x_{n+1}) = 0$. This is Newton's method.

As a possible choice of $c$ and $\tau$, let $\tau = 3/2$, and let $c$ be given by $K^2 = e^{3c/4}$, so that (**) just holds. We may also assume that $K \ge 2^{3/4}$, so that $e^{3c/4} \ge 4^{3/4}$, or $e^c \ge 4$, which guarantees that $e^{-c/2} \le 1/2$, implying that $e^{-c/2}/(1 - e^{-c/2}) \le 1$. Then (***) becomes the requirement $\|G(0)\| \le K^{-5}$.

We end this section with an example of the fixed-point iterative procedure in its simplest context, that of the inverse-mapping theorem. We suppose that $H(0) = 0$ and that $dH_0^{-1}$ exists, and we want to invert $H$ near zero, i.e., solve the equation $H(\eta) - \xi = 0$ for $\eta$ in terms of $\xi$. Our theory above tells us that the $\eta$ corresponding to $\xi$ will be the fixed point of the contraction $K(\xi, \eta) = \eta - T^{-1}H(\eta) + T^{-1}(\xi)$, where $T = dH_0$. In order to make our example as
simple as possible, we shall take $H$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ and choose it so that $dH_0 = I$. Also, in order to avoid indices, we shall use the mongrel notation $\mathbf{x} = \langle x, y \rangle$, $\mathbf{u} = \langle u, v \rangle$. Consider the mapping $\mathbf{x} = H(\mathbf{u})$ defined by $x = u + v^2$, $y = u^3 + v$. The Jacobian matrix
\[ \begin{bmatrix} 1 & 2v \\ 3u^2 & 1 \end{bmatrix} \]
is clearly the identity at the origin. Moreover, in the expression $K(\mathbf{x}, \mathbf{u}) = \mathbf{x} + \mathbf{u} - H(\mathbf{u})$, the difference $H(\mathbf{u}) - \mathbf{u}$ is just the function $J(\mathbf{u}) = \langle v^2, u^3 \rangle$. This cancellation of the first-order terms is the practical expression of the fact that in forming $K(\xi, \eta) = \eta - T^{-1}G(\xi, \eta)$, we have acted to make $dK^2 = 0$ at the "center point" (the origin here). We naturally start the iteration with $\mathbf{u}_0 = 0$, and then our fixed-point sequence proceeds $\mathbf{u}_1 = K(\mathbf{x}, \mathbf{u}_0) = K(\mathbf{x}, 0), \ldots, \mathbf{u}_n = K(\mathbf{x}, \mathbf{u}_{n-1})$. Thus $\mathbf{u}_0 = 0$ and $\mathbf{u}_n = K(\mathbf{x}, \mathbf{u}_{n-1}) = \mathbf{x} - J(\mathbf{u}_{n-1})$, giving
\[ \begin{aligned}
u_1 &= x, & v_1 &= y, \\
u_2 &= x - y^2, & v_2 &= y - x^3, \\
u_3 &= x - (y - x^3)^2, & v_3 &= y - (x - y^2)^3, \\
u_4 &= x - [y - (x - y^2)^3]^2, & v_4 &= y - [x - (y - x^3)^2]^3.
\end{aligned} \]
We are guaranteed that this sequence $\mathbf{u}_n$ will converge geometrically provided the starting point $\mathbf{x}$ is close enough to 0, and it seems clear that these two sequences of polynomials are computing the Taylor series expansions for the inverse functions $u(x, y)$ and $v(x, y)$. We shall ask the reader to prove this in an exercise. The two Taylor series start out
\[ u(x, y) = x - y^2 + 2yx^3 + \cdots, \qquad v(x, y) = y - x^3 + 3x^2y^2 + \cdots. \]

EXERCISES

9.1 Let $B$ be a compact subset of a normed linear space such that $rB \subset B$ for all $r \in [0, 1]$. Suppose that $F: B \to B$ is a Lipschitz mapping with constant 1 (i.e., $\|F(\xi) - F(\eta)\| \le \|\xi - \eta\|$ for all $\xi, \eta \in B$). Prove that $F$ has a fixed point. [Hint: Consider first $G = rF$ for $0 < r < 1$.]

9.2 Give an example to show that the fixed point in the above exercise may not be unique.

9.3 Let $X$ be a compact metric space, and let $K: X \to X$ "reduce each nonzero distance". That is, $\rho(K(x), K(y)) < \rho(x, y)$ if $x \ne y$. Prove that $K$ has a unique fixed point.
(Show that otherwise glb $\{\rho(K(x), x)\}$ is positive and achieved as a minimum. Then get a contradiction.)

9.4 Let $K$ be a mapping from $S \times X$ to $X$, where $X$ is a complete metric space and $S$ is any metric space, and suppose that $K(s, x)$ is a contraction in $x$ uniformly over $s$ and is Lipschitz continuous in $s$ uniformly over $x$. Show that the fixed point $p_s$ is a Lipschitz continuous function of $s$. [Hint: Modify the $\epsilon,\delta$-beginning of the proof of Corollary 4 of Theorem 9.1.]

9.5 Let $D$ be an open subset of a Banach space $V$, and let $K: D \to V$ be such that $I - K$ is Lipschitz with constant $1/2$.
a) Show that if $B_r(\alpha) \subset D$ and $\beta = K(\alpha)$, then $B_{r/2}(\beta) \subset K[D]$. (Apply a corollary of the fixed-point theorem to a certain simple contraction mapping.)
b) Conclude that $K$ is injective and has an open range, and that $K^{-1}$ is Lipschitz with constant 2.

9.6 Deduce an improved version of the result in Exercise 3.20, Chapter 3, from the result in the above exercise.

9.7 In the context of Theorem 9.3, show that $dG^2_{\langle \mu, \nu \rangle}$ is invertible if $\|dK^2_{\langle \mu, \nu \rangle}\| < 1$. (Do not be confused by the notation. We merely want to know that $S$ is invertible if $\|I - T^{-1} \circ S\| < 1$.)

9.8 There is a slight discrepancy between the statements of Theorem 11.2 in Chapter 3 and Theorem 9.3. In the one case we assert the existence of a unique continuous mapping from a ball $M$, and in the other case, from the ball $M$ to the ball $N$. Show that the requirement that the range be in $N$ can be dropped by showing that two continuous solutions must agree on $M$. (Use the point-by-point uniqueness of Theorem 9.3.)

9.9 Compute the expression for $dF_\alpha$ from the identity $G(\xi, F(\xi)) = 0$ in Theorem 9.4, and show that if $K$ is continuously differentiable, then all the maps involved in the solution expression are continuous and that $\alpha \mapsto dF_\alpha$ is therefore continuous.

9.10 Going back to the example worked out at the end of Section 9, show by induction that the polynomials $u_n - u_{n-1}$ and $v_n - v_{n-1}$ contain no terms of degree less than $n$.

9.11 Continuing the above exercise, show therefore that the power series defined by taking the terms of degree at most $n$ from $u_n$ is convergent in a ball about 0 and that its sum is the first component $u(x, y)$ of the mapping inverse to $H$.
9.12 The above conclusions hold generally. Let $J = \langle K, L \rangle$ be any mapping from a ball about 0 in $\mathbb{R}^2$ to $\mathbb{R}^2$ defined by the convergent power series
\[ K(x, y) = \sum a_{ij}x^iy^j, \qquad L(x, y) = \sum b_{ij}x^iy^j, \]
in which there are no terms of degree 0 or 1. With the conventions $\mathbf{x} = \langle x, y \rangle$ and $\mathbf{u} = \langle u, v \rangle$, consider the iterative sequence $\mathbf{u}_0 = 0$, $\mathbf{u}_n = \mathbf{x} - J(\mathbf{u}_{n-1})$. Make any necessary assumptions about what happens when one power series is substituted in another, and show by induction that $\mathbf{u}_n - \mathbf{u}_{n-1}$ contains no terms of degree less than $n$, and therefore that the $\mathbf{u}_n$ define a convergent power series whose sum is the function $\mathbf{u}(x, y) = \langle u(x, y), v(x, y) \rangle$ inverse to $H$ in a neighborhood of 0. [Remember that $J(\eta) = H(\eta) - \eta$.]

9.13 Let $A$ be a Banach algebra, and let $x$ be an element of $A$ of norm less than 1. Show that
\[ \prod_{i=0}^{\infty} (1 + x^{2^i}) = (e - x)^{-1}. \]
This means that if $\pi_n$ is the partial product $\prod_{i=0}^{n} (1 + x^{2^i})$, then $\pi_n \to (e - x)^{-1}$. [Hint: Prove by induction that $(e - x)\pi_{n-1} = e - x^{2^n}$.] This is another example of convergence at an exponential rate, like Newton's method in the text.

10. THE INTEGRAL OF A PARAMETRIZED ARC

In this section we shall make our final application of completeness. We first prove a very general extension theorem, and then apply it to the construction of the Riemann integral as an extension of an elementary integral defined for step functions.

Theorem 10.1. Let $U$ be a subspace of a normed linear space $V$, and let $T$ be a bounded linear mapping from $U$ to a Banach space $W$. Then $T$ has a uniquely determined extension to a bounded linear transformation $S$ from the closure $\bar{U}$ to $W$. Moreover, $\|S\| = \|T\|$.

Proof. Fix $\alpha \in \bar{U}$ and choose $\{\xi_n\} \subset U$ so that $\xi_n \to \alpha$. Then $\{\xi_n\}$ is Cauchy and $\{T(\xi_n)\}$ is Cauchy (by the lemmas of Section 7), so that $\{T(\xi_n)\}$ converges to some $\beta \in W$. If $\{\eta_n\}$ is any other sequence in $U$ converging to $\alpha$, then $\xi_n - \eta_n \to 0$, $T(\xi_n) - T(\eta_n) = T(\xi_n - \eta_n) \to 0$, and so $T(\eta_n) \to \beta$ also. Thus $\beta$ is independent of the sequence chosen, and, clearly, $\beta$ must be the value $S(\alpha)$ at $\alpha$ of any continuous extension $S$ of $T$. If $\alpha \in U$, then $\beta = \lim T(\xi_n) = T(\alpha)$ by the continuity of $T$. We thus have $S$ uniquely defined on $\bar{U}$ by the requirement that it be a continuous extension of $T$.

It remains to be shown that $S$ is linear and bounded by $\|T\|$. For any $\alpha, \beta \in \bar{U}$ we choose $\{\xi_n\}, \{\eta_n\} \subset U$, so that $\xi_n \to \alpha$ and $\eta_n \to \beta$. Then $x\xi_n + y\eta_n \to x\alpha + y\beta$, so that
\[ S(x\alpha + y\beta) = \lim T(x\xi_n + y\eta_n) = x \lim T(\xi_n) + y \lim T(\eta_n) = xS(\alpha) + yS(\beta). \]
Thus $S$ is linear. Finally,
\[ \|S(\alpha)\| = \lim \|T(\xi_n)\| \le \|T\| \lim \|\xi_n\| = \|T\| \cdot \|\alpha\|. \]
Thus $\|T\|$ is a bound for $S$, and, since $S$ includes $T$, $\|S\| = \|T\|$.
□

The above theorem has many applications, but we shall use it only once, to obtain the Riemann integral $\int_a^b f(t)\,dt$ of a continuous function $f$ mapping a closed interval $[a, b]$ into a Banach space $W$ as an extension of the trivial integral for step functions. If $W$ is a normed linear space and $f: [a, b] \to W$ is a continuous function defined on a closed interval $[a, b] \subset \mathbb{R}$, we might expect to be able to define $\int_a^b f(t)\,dt$ as a suitable vector in $W$ and to proceed with the integral calculus of vector-valued functions of one real variable. We haven't done this until now because we need the completeness of $W$ to prove that the integral exists!

At first we shall integrate only certain elementary functions called step functions. A finite subset $A$ of $[a, b]$ which contains the two endpoints $a$ and $b$
will be called a partition of $[a, b]$. Thus $A$ is (the range of) some finite sequence $\{t_i\}_0^n$, where $a = t_0 < t_1 < \cdots < t_n = b$, and $A$ subdivides $[a, b]$ into a sequence of smaller intervals. To be definite, we shall take the open intervals $(t_{i-1}, t_i)$, $i = 1, \ldots, n$, as the intervals of the subdivision. If $A$ and $B$ are partitions and $A \subset B$, we shall say that $B$ is a refinement of $A$. Then each interval $(s_{j-1}, s_j)$ of the $B$-subdivision is included in an interval $(t_{i-1}, t_i)$ of the $A$-subdivision; $t_{i-1}$ is the largest element of $A$ which is less than or equal to $s_{j-1}$, and $t_i$ is the smallest greater than or equal to $s_j$. A step function is simply a map $f: [a, b] \to W$ which is constant on the intervals of some subdivision $A = \{t_i\}_0^n$. That is, there exists a sequence of vectors $\{\alpha_i\}_1^n$ such that $f(\xi) = \alpha_i$ when $\xi \in (t_{i-1}, t_i)$. The values of $f$ at the subdividing points may be among these values or they may be different.

For each step function $f$ we define $\int_a^b f(t)\,dt$ as $\sum_{i=1}^n \alpha_i \Delta t_i$, where $f = \alpha_i$ on $(t_{i-1}, t_i)$ and $\Delta t_i = t_i - t_{i-1}$. If $f$ were real-valued, this would be simply the sum of the areas of the rectangles making up the region between the graph of $f$ and the $t$-axis. Now $f$ may be described as a step function in terms of many different subdivisions. For example, if $f$ is constant on the intervals of $A$, and if we obtain $B$ from $A$ by adding one new point $s$, then $f$ is constant on the (smaller) intervals of $B$. We have to be sure that the value of the integral of $f$ doesn't change when we change the describing subdivision. In the case just mentioned this is easy to see. The one new point $s$ lies in some interval $(t_{i-1}, t_i)$ defined by the partition $A$. The contribution of this interval to the $A$-sum is $\alpha_i(t_i - t_{i-1})$, while in the $B$-sum it splits into $\alpha_i(t_i - s) + \alpha_i(s - t_{i-1})$. But this is the same vector. The remaining summands are the same in the two sums, and the integral is therefore unchanged.
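The definition of the integral of a step function, and its invariance when one point is added to the partition, can be spelled out in code. This is a minimal sketch: taking $W = \mathbb{R}^2$ with vectors stored as tuples is an assumption made for the illustration, and the function names `step_integral` and `refine` are hypothetical.

```python
def step_integral(partition, values):
    """Integral of a step function: the sum of alpha_i * (t_i - t_{i-1}).

    partition: [t_0, ..., t_n], strictly increasing
    values:    [alpha_1, ..., alpha_n], vectors given as tuples of equal length
    """
    dim = len(values[0])
    total = [0.0] * dim
    for (t_prev, t_next), alpha in zip(zip(partition, partition[1:]), values):
        for k in range(dim):
            total[k] += alpha[k] * (t_next - t_prev)
    return tuple(total)

def refine(partition, values, s):
    """Add one new point s strictly inside an interval; the step function is unchanged."""
    for i in range(1, len(partition)):
        if partition[i - 1] < s < partition[i]:
            new_partition = partition[:i] + [s] + partition[i:]
            # The value on (t_{i-1}, t_i) is used on both (t_{i-1}, s) and (s, t_i).
            new_values = values[:i] + [values[i - 1]] + values[i:]
            return new_partition, new_values
    raise ValueError("s must lie strictly inside an interval of the partition")

A = [0.0, 1.0, 3.0]
alphas = [(1.0, 2.0), (-1.0, 0.5)]
B, betas = refine(A, alphas, 2.0)
print(step_integral(A, alphas), step_integral(B, betas))
```

Both sums give the same vector, because the contribution $\alpha_i(t_i - t_{i-1})$ of the split interval is replaced by $\alpha_i(t_i - s) + \alpha_i(s - t_{i-1})$.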
In general, suppose that $f$ is a step function with respect to $A$ and also with respect to $C$. Set $B = A \cup C$, the "common refinement" of $A$ and $C$. We can pass from $A$ to $B$ in a sequence of steps at each of which we add one new point. As we have seen, the integral remains unchanged at each of these steps, and so it is the same for $A$ as for $B$. It is similarly the same for $C$ and $B$, and so for $A$ and $C$. We have thus shown that $\int_a^b f$ is independent of the subdivision used to define $f$.

Now fix $[a, b]$ and $W$, and let $\mathcal{E}$ be the set of all step functions from $[a, b]$ to $W$. Then $\mathcal{E}$ is a vector space. For, if $f$ and $g$ in $\mathcal{E}$ are step functions relative to partitions $A$ and $B$, then both functions are constant on the intervals of $C = A \cup B$, and therefore $xf + yg$ is also. Moreover, if $C = \{t_i\}_0^n$, and if on $(t_{i-1}, t_i)$ we have $f = \alpha_i$ and $g = \beta_i$, so that $xf + yg = x\alpha_i + y\beta_i$ there, then the equation
\[ \sum_1^n (x\alpha_i + y\beta_i)\,\Delta t_i = x\Bigl(\sum_1^n \alpha_i\,\Delta t_i\Bigr) + y\Bigl(\sum_1^n \beta_i\,\Delta t_i\Bigr) \]
is just $\int_a^b (xf + yg) = x\int_a^b f + y\int_a^b g$. The map $f \mapsto \int_a^b f$ is thus linear from $\mathcal{E}$ to $W$. Finally,
\[ \Bigl\|\int_a^b f\Bigr\| \le \sum_1^n \|\alpha_i\|\,\Delta t_i \le \|f\|_\infty (b - a), \]
where $\|f\|_\infty = \operatorname{lub}\{\|f(t)\| : t \in [a, b]\} \ge \max\{\|\alpha_i\| : 1 \le i \le n\}$. That is, if we use on $\mathcal{E}$ the uniform norm defined from the norm of $W$, then the linear mapping $f \mapsto \int_a^b f$ is bounded by $(b - a)$. If $W$ is complete, this transformation therefore has a unique bounded linear extension to the closure $\bar{\mathcal{E}}$ of $\mathcal{E}$ in $\mathcal{B}([a, b], W)$ by Theorem 10.1. But we can show that $\bar{\mathcal{E}}$ includes the space $\mathcal{C}([a, b], W)$ of all continuous functions from $[a, b]$ to $W$, and the integral of a continuous function is thus uniquely defined.

Lemma 10.1. $\mathcal{C}([a, b], W) \subset \bar{\mathcal{E}}$.

Proof. A continuous function $f$ on $[a, b]$ is uniformly continuous (Theorem 5.1). That is, given $\epsilon > 0$, there exists $\delta > 0$ such that $|s - t| < \delta \Rightarrow \|f(s) - f(t)\| < \epsilon$. Now take any partition $A = \{t_i\}_0^n$ on $[a, b]$ such that $\Delta t_i = t_i - t_{i-1} < \delta$ for all $i$, and take $\alpha_i$ as any value of $f$ on $(t_{i-1}, t_i)$. Then $\|f(t) - \alpha_i\| < \epsilon$ on $[t_{i-1}, t_i]$. Thus, if $g$ is the step function with value $\alpha_i$ on $(t_{i-1}, t_i]$ and $g(a) = \alpha_1$, then $\|f - g\|_\infty \le \epsilon$. Thus $f$ is in $\bar{\mathcal{E}}$, as desired. □

Our main theorem is a recapitulation.

Theorem 10.2. If $W$ is a Banach space and $V = \mathcal{C}([a, b], W)$ under the uniform norm, then there exists a $J \in \operatorname{Hom}(V, W)$ uniquely determined by setting $J(f) = \lim \int_a^b f_n$, where $\{f_n\}$ is any sequence in $\mathcal{E}$ converging to $f$ and $\int_a^b f_n$ is the integral on $\mathcal{E}$ defined above. Moreover, $\|J\| \le (b - a)$.

If $f$ is elementary from $[a, b]$ to $W$ and $c \in [a, b]$, then of course $f$ is elementary on each of $[a, c]$ and $[c, b]$. If $c$ is added to a subdivision $A$ used in defining $f$, and if the sum defining $\int_a^b f$ with respect to $B = A \cup \{c\}$ is broken into two sums at $c$, we clearly have $\int_a^b f = \int_a^c f + \int_c^b f$. This same identity then follows for any continuous function $f$ on $[a, b]$, since
\[ \int_a^b f = \lim \int_a^b f_n = \lim \Bigl(\int_a^c f_n + \int_c^b f_n\Bigr) = \lim \int_a^c f_n + \lim \int_c^b f_n = \int_a^c f + \int_c^b f. \]

The fundamental theorem of the calculus is still with us.

Theorem 10.3. If $f \in \mathcal{C}([a, b], W)$ and $F: [a, b] \to W$ is defined by $F(x) = \int_a^x f(t)\,dt$, then $F'$ exists on $(a, b)$ and $F'(x) = f(x)$.

Proof.
By the continuity of $f$ at $x_0$, for every $\epsilon$ there exists a $\delta$ such that
\[ \|f(x_0) - f(x)\| < \epsilon \]
whenever $|x - x_0| < \delta$. But then
\[ \Bigl\|\int_{x_0}^{x} (f(x_0) - f(t))\,dt\Bigr\| \le \epsilon|x - x_0|, \]
and since $\int_{x_0}^{x} f(x_0)\,dt = f(x_0)(x - x_0)$ by the definition of the integral for an elementary function, we see that
\[ \Bigl\|f(x_0) - \Bigl(\int_{x_0}^{x} f(t)\,dt\Bigr)\Big/(x - x_0)\Bigr\| \le \epsilon. \]
Since $\int_{x_0}^{x} f(t)\,dt = F(x) - F(x_0)$, this is exactly the statement that the difference quotient for $F$ converges to $f(x_0)$, as was to be proved. □
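The construction of this section can be imitated numerically: approximate a continuous arc by a step function on a fine partition (Lemma 10.1), integrate the step function, and compare the difference quotient of $F$ with $f$ (Theorem 10.3). The sketch below rests on assumptions made only for illustration: the arc $f(t) = \langle \cos t, t^2 \rangle$ in $W = \mathbb{R}^2$, a partition into $n$ equal subintervals, and the midpoint value as the chosen value $\alpha_i$ of $f$ on each subinterval.

```python
import math

def f(t):
    # Assumed sample arc in W = R^2; its exact integral from 0 to x is <sin x, x^3/3>.
    return (math.cos(t), t * t)

def integral(f, a, b, n=20000):
    """Integral of the step function equal to f(midpoint) on each of n equal subintervals."""
    h = (b - a) / n
    sx = sy = 0.0
    for i in range(n):
        vx, vy = f(a + (i + 0.5) * h)   # alpha_i: a value of f on (t_{i-1}, t_i)
        sx += vx * h
        sy += vy * h
    return (sx, sy)

# The step-function integrals approximate the integral of f over [0, 1].
I1 = integral(f, 0.0, 1.0)
print(I1, (math.sin(1.0), 1.0 / 3.0))

# Difference quotient (F(x0 + h) - F(x0)) / h, computed as the integral
# over [x0, x0 + h] divided by h, compared with f(x0).
x0, h = 1.0, 1e-3
dq = tuple(c / h for c in integral(f, x0, x0 + h))
print(dq, f(x0))
```

Shrinking `h` brings the difference quotient closer to $f(x_0)$, in line with the $\epsilon$, $\delta$ estimate of the proof.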
EXERCISES

10.1 Prove the following analogue of Theorem 10.1. Let $A$ be a subset of a metric space $B$, let $C$ be a complete metric space, and let $F: A \to C$ be uniformly continuous. Then $F$ extends uniquely to a continuous map from $\bar{A}$ to $C$.

10.2 In Exercises 7.16 through 7.18 we have constructed a completion of $S$, namely, $\overline{\theta[S]}$ in $V^*$. Prove that this completion is unique to within isometry. That is, supposing that $\varphi$ is some other isometric imbedding of $S$ in a complete space $X$, show that the identification of the two images of $S$ by $\varphi \circ \theta^{-1}$ (from $\theta[S]$ to $\varphi[S]$) extends to an isometric bijection from $\overline{\theta[S]}$ to $\overline{\varphi[S]}$. [Hint: Apply the above exercise.]

10.3 Suppose that $S$ is a normed linear space $X$ and that $X$ is a dense subset of a complete metric space $Y$. This means, remember, that every point of $Y$ is the limit of a sequence lying in the subset $X$. Prove that the vector space structure of $X$ extends in a unique way to make $Y$ a Banach space. Since we know from Exercise 7.18 that a metric space can be completed, this shows again that a normed linear space can always be completed to a Banach space.

10.4 In the elementary calculus, if $f$ is continuous, then
\[ \int_a^b f(t)\,dt = f(x)(b - a) \]
for some $x$ in $(a, b)$. Show that this is not true for vector-valued continuous functions $f$ by considering the arc $f: [0, \pi] \to \mathbb{R}^2$ defined by $f(t) = \langle \sin t, \cos t \rangle$.

10.5 Show that integration commutes with the application of linear transformations. That is, show that if $f$ is a continuous function from $[a, b]$ to a Banach space $W$, and if $T \in \operatorname{Hom}(W, X)$, where $X$ is a Banach space, then
\[ \int_a^b T(f(t))\,dt = T\Bigl(\int_a^b f(t)\,dt\Bigr). \]
[Hint: Make the computation directly for step functions.]

10.6 State and prove the theorem suggested by the following identity:
\[ \int_a^b \langle f(t), g(t) \rangle\,dt = \Bigl\langle \int_a^b f(t)\,dt, \int_a^b g(t)\,dt \Bigr\rangle. \]
(Apply the above exercise.)

10.7 Let $W$ be any normed linear space, $\{\alpha_i\}_1^n$ a finite set of vectors in $W$, and $\{f_i\}_1^n$ a corresponding set of real-valued continuous functions on $[a, b]$.
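Define the arc $\gamma$ by
\[ \gamma(t) = \sum_1^n f_i(t)\alpha_i. \]
Prove that $\int_a^b \gamma(t)\,dt$ exists and equals
\[ \sum_1^n \Bigl(\int_a^b f_i(t)\,dt\Bigr)\alpha_i. \]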
10.8 Let $f$ be a continuous function from $\mathbb{R}^2$ to a Banach space $W$. Describe how one might set up a theory of a double integral
\[ \iint_{I \times J} f(s, t)\,ds\,dt, \]
where $I \times J$ is a closed rectangle.

10.9 Prove that if $f_n$ converges uniformly to $f$, then
\[ \int_a^b f_n(t)\,dt \to \int_a^b f(t)\,dt. \]
This is trivial if you have understood the definition and properties of the integral.

10.10 Suppose that $\{f_n\}$ is a sequence of smooth arcs from $[a, b]$ to a Banach space $W$ such that $\sum_1^\infty f_n'(t)$ is uniformly convergent. Suppose also that $\sum_1^\infty f_n(a)$ is convergent. Prove that then $\sum f_n(t)$ is uniformly convergent, that $f = \sum_1^\infty f_n$ is smooth, and that $f' = \sum_1^\infty f_n'$. (Use the above exercise and the fundamental theorem of the calculus.)

10.11 Prove that even if $W$ is not a Banach space, if the arc $f: [a, b] \to W$ has a continuous derivative, then $\int_a^b f'$ exists and equals $f(b) - f(a)$.

10.12 Let $X$ be a normed linear space, and set $(\xi, l) = l(\xi)$ for $\xi \in X$ and $l \in X^*$. Now let $f$ and $g$ be continuously differentiable functions (arcs) from the closed interval $[a, b]$ to $X$ and $X^*$, respectively. Prove the integration by parts formula:
\[ (f(b), g(b)) - (f(a), g(a)) = \int_a^b (f(t), g'(t))\,dt + \int_a^b (f'(t), g(t))\,dt. \]
[Hint: Apply Theorem 8.4 from Chapter 3.]

10.13 State the generalization of the above integration by parts formula that holds for any bounded bilinear mapping $\omega: V \times W \to X$, where $X$ is a Banach space.

10.14 Let $t \mapsto l_t$ be a fixed continuous map from a closed interval $[a, b]$ to the dual $W^*$ of a Banach space $W$. Suppose that for any continuous map $g$ from $[a, b]$ to $W$
\[ \int_a^b g(t)\,dt = 0 \;\Rightarrow\; \int_a^b l_t(g(t))\,dt = 0. \]
Show that there exists a fixed $l \in W^*$ such that
\[ \int_a^b l_t(g(t))\,dt = l\Bigl(\int_a^b g(t)\,dt\Bigr) \]
for all continuous arcs $g: [a, b] \to W$. Show that it then follows that $l_t = l$ for all $t$.

10.15 Use the above exercise to deduce the general Euler equation of Section 3.15.

11. THE COMPLEX NUMBER SYSTEM

The complex number system $\mathbb{C}$ is the third basic number field that must be studied, after the rational numbers and the real numbers, and the reader surely has had some contact with it in the past.

Almost everybody views a complex number $\xi$ as being equivalent to a pair of real numbers, the "real and imaginary parts" of $\xi$, and the complex number system $\mathbb{C}$ is thus viewed as being Cartesian 2-space $\mathbb{R}^2$ with some further structure. In particular, a complex-valued function is simply a certain kind of vector-valued function, and is equivalent to an ordered pair of real-valued functions, again its real and imaginary parts.

What distinguishes the complex number system $\mathbb{C}$ from its vector substratum $\mathbb{R}^2$ is the presence of an additional operation, complex multiplication. The vector operations of $\mathbb{R}^2$ together with this complex multiplication operation make $\mathbb{C}$ into a commutative algebra. Moreover, it turns out that $\langle 1, 0 \rangle$ is the unique multiplicative identity in $\mathbb{C}$ and that every nonzero complex number $\xi$ has a multiplicative inverse. These additional facts are summarized by saying that $\mathbb{C}$ is a field, and they allow us to use $\mathbb{C}$ as a new scalar field in vector space theory. In fact, the whole development of Chapters 1 and 2 remains valid when $\mathbb{R}$ is replaced everywhere by $\mathbb{C}$. Scalar multiplication is now multiplication by complex numbers. Thus $\mathbb{C}^n$ is the vector space of ordered $n$-tuples of complex numbers $\langle \xi_1, \ldots, \xi_n \rangle$, and the product of an $n$-tuple by a complex scalar $\alpha$ is defined by $\alpha\langle \xi_1, \ldots, \xi_n \rangle = \langle \alpha\xi_1, \ldots, \alpha\xi_n \rangle$, where $\alpha\xi_i$ is complex multiplication.

It is time to come to grips with complex multiplication. As the reader probably knows, it is given by an odd looking formula that is motivated by thinking of an element $\xi = \langle x_1, x_2 \rangle$ as being in the form $x_1 + ix_2$, where $i^2 = -1$, and then using the ordinary laws of algebra. Then we have
\[ \xi\eta = (x_1 + ix_2)(y_1 + iy_2) = x_1y_1 + ix_1y_2 + ix_2y_1 + i^2x_2y_2 = (x_1y_1 - x_2y_2) + i(x_1y_2 + x_2y_1), \]
and thus our definition is
\[ \langle x_1, x_2 \rangle \langle y_1, y_2 \rangle = \langle x_1y_1 - x_2y_2,\; x_1y_2 + x_2y_1 \rangle. \]
Of course, it has to be verified that this operation is commutative and satisfies the laws for an algebra. A straightforward check is possible but dull, and we shall indicate a neater way in the exercises.

The mapping $x \mapsto \langle x, 0 \rangle$ is an isomorphic injection of the field $\mathbb{R}$ into the field $\mathbb{C}$.
It clearly preserves sums, and the reader can check in his mind that it also preserves products. It is conventional to identify $x$ with its image $\langle x, 0 \rangle$, and so to view $\mathbb{R}$ as a subfield of $\mathbb{C}$. The mysterious $i$ can be identified in $\mathbb{C}$ as the pair $\langle 0, 1 \rangle$, since then $i^2 = \langle 0, 1 \rangle \langle 0, 1 \rangle = \langle -1, 0 \rangle$, which we have identified with $-1$. With these identifications we have
\[ \langle x, y \rangle = \langle x, 0 \rangle + \langle 0, y \rangle = \langle x, 0 \rangle + \langle 0, 1 \rangle \langle y, 0 \rangle = x + iy, \]
and this is the way we shall write complex numbers from now on.

The mapping $x + iy \mapsto x - iy$ is a field isomorphism of $\mathbb{C}$ with itself. That is, it preserves both sums and products, as the reader can easily check. Such a self-isomorphism is called an automorphism. The above automorphism is called complex conjugation, and the image $x - iy$ of $\xi = x + iy$ is called the conjugate of $\xi$, and is designated $\bar{\xi}$. We shall ask the reader to show in an exercise
that conjugation is the only automorphism of $\mathbb{C}$ (except the identity automorphism) which leaves the elements of the subfield $\mathbb{R}$ fixed.

The Euclidean norm of $\xi = x + iy = \langle x, y \rangle$ is called the absolute value of $\xi$, and is designated $|\xi|$, so that $|\xi| = |x + iy| = (x^2 + y^2)^{1/2}$. This is reasonable because it then turns out that $|\xi\eta| = |\xi||\eta|$. This can be verified by squaring and multiplying, but it is much more elegant first to notice the relationship between absolute value and the conjugation automorphism, namely, $\xi\bar{\xi} = |\xi|^2$. [$(x + iy)(x - iy) = x^2 - (iy)^2 = x^2 + y^2$.] Then $|\xi\eta|^2 = (\xi\eta)(\overline{\xi\eta}) = (\xi\bar{\xi})(\eta\bar{\eta}) = |\xi|^2|\eta|^2$, and taking square roots gives us our identity. The identity $\xi\bar{\xi} = |\xi|^2$ also shows us that if $\xi \ne 0$, then $\bar{\xi}/|\xi|^2$ is its multiplicative inverse.

Because the real number system $\mathbb{R}$ is a subfield of the complex number system $\mathbb{C}$, any vector space over $\mathbb{C}$ is automatically also a vector space over $\mathbb{R}$: multiplication by complex scalars includes multiplication by real scalars. And any complex linear transformation between complex vector spaces is automatically real linear. The converse, of course, does not hold. For example, a real linear mapping $T$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ is not in general complex linear from $\mathbb{C}$ to $\mathbb{C}$, nor does a real linear $S$ in $\operatorname{Hom} \mathbb{R}^4$ become a complex linear mapping in $\operatorname{Hom} \mathbb{C}^2$ when $\mathbb{R}^4$ is viewed as $\mathbb{C}^2$. We shall study this question in the exercises.

The complex differentiability of a mapping $F$ between complex vector spaces has the obvious definition $\Delta F_\alpha = T + \mathfrak{o}$, where $T$ is complex linear, and then $F$ is also real differentiable, in view of the above remarks. But $F$ may be real differentiable without being complex differentiable. It follows from the discussion at the end of Section 8 that if $\{a_n\} \subset \mathbb{C}$ and $\{|a_n|\delta^n\}$ is bounded, then the series $\sum a_n\xi^n$ converges on the ball $B_\delta(0)$ in the (real) Banach algebra $\mathbb{C}$, and $F(\xi) = \sum_0^\infty a_n\xi^n$ is real differentiable on this ball, with $dF_\beta(\xi) = (\sum_1^\infty na_n\beta^{n-1}) \cdot \xi = F'(\beta) \cdot \xi$.
But multiplication by $F'(\beta)$ is obviously a complex linear operation on the one-dimensional complex vector space $\mathbb{C}$. Therefore, complex-valued functions defined by convergent complex power series are automatically complex differentiable. But we can go even further. In this case, if $\zeta \neq 0$, we can divide by $\zeta$ in the defining equation $\Delta F_\beta(\zeta) = F'(\beta)\zeta + \mathfrak{o}(\zeta)$ to get the result that $\Delta F_\beta(\zeta)/\zeta \to F'(\beta)$ as $\zeta \to 0$. That is, $F'(\beta)$ is now an honest derivative again, with the complex infinitesimal $\zeta$ in the denominator of the difference quotient.

The consequences of complex differentiability are incalculable, and we shall mostly leave them as future pleasures to be experienced in a course on functions of complex variables. See, however, the problems on the residue calculus at the end of Chapter 12 and the proof in Chapter 11, Exercise 4.3, of the following fundamental theorem of algebra.
Theorem. Every polynomial with complex coefficients is a product of linear factors.

A weaker but equivalent statement is that every polynomial has at least one (complex) root. The crux of the matter is that $x^2 + 1$ cannot be factored over $\mathbb{R}$ (i.e., it has no real root), but over $\mathbb{C}$ we have $x^2 + 1 = (x + i)(x - i)$, with the two roots $\pm i$.

For later use we add a few more words about the complex exponential function $\exp\zeta = e^\zeta = \sum_0^\infty \zeta^n/n!$. If $\zeta = x + iy$, we have $e^\zeta = e^{x+iy} = e^x e^{iy}$, and

$$e^{iy} = \sum_0^\infty \frac{(iy)^n}{n!} = \left(1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \cdots\right) + i\left(y - \frac{y^3}{3!} + \frac{y^5}{5!} - \cdots\right) = \cos y + i\sin y.$$

Thus $e^{x+iy} = e^x(\cos y + i\sin y)$. That is, the real and imaginary parts of the complex-valued function $\exp(x + iy)$ are $e^x\cos y$ and $e^x\sin y$, respectively.

EXERCISES

11.1 Prove the associativity of complex multiplication directly from its definition.

11.2 Prove the distributive law for complex numbers.

11.3 Show that scalar multiplication by a real number $a$, $a\langle x, y\rangle = \langle ax, ay\rangle$, in $\mathbb{C} = \mathbb{R}^2$ is consistent with the interpretation of $a$ as the complex number $\langle a, 0\rangle$ and the definition of complex multiplication.

11.4 Let $\theta$ be an automorphism of the complex number field leaving the real numbers fixed. Prove that $\theta$ is either the identity or complex conjugation. [Hint: $(\theta(i))^2 = \theta(i^2) = \theta(-1) = -1$. Show that the only complex numbers $x + iy$ whose squares are $-1$ are $\pm i$, and then finish up.]

11.5 If we remember that $\mathbb{C}$ is in particular the two-dimensional real vector space $\mathbb{R}^2$, we see that multiplying the elements of $\mathbb{C}$ by the complex number $a + ib$ must define a linear transformation on $\mathbb{R}^2$.
Show that its matrix is

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$

11.6 The above exercise suggests that the complex number system may be like the set $A$ of all $2 \times 2$ real matrices of the form

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$

Prove that $A$ is a subalgebra of the matrix algebra $\mathbb{R}^{2\times 2}$ (that is, $A$ is closed under multiplication, addition, and scalar multiplication) and that the mapping $\begin{bmatrix} a & -b \\ b & a \end{bmatrix} \mapsto a + ib$ is a bijection from $A$ to $\mathbb{C}$ that preserves all algebra operations. We therefore can conclude that the laws of an algebra automatically hold for $\mathbb{C}$. Why?
11.7 In the above matrix model of the complex number system show that the absolute value identity $|\zeta\gamma| = |\zeta|\,|\gamma|$ is a determinant property.

11.8 Let $W$ be a real vector space, and let $V$ be the real vector space $W \times W$. Show that there is a $\theta$ in $\operatorname{Hom} V$ such that $\theta^2 = -I$. (Think of $\mathbb{C}$ as being the real vector space $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$ under multiplication by $i$.)

11.9 Let $V$ be a real vector space, and let $\theta$ in $\operatorname{Hom} V$ satisfy $\theta^2 = -I$. Show that $V$ becomes a complex vector space if $i\alpha$ is defined as $\theta(\alpha)$. If the complex vector space $V$ is made from the real vector space $W$ as in this and the above exercise, we shall call $V$ the complexification of $W$. We shall regard $W$ itself as being a real subspace of $V$ (actually $W \times \{0\}$), and then $V = W \oplus iW$.

11.10 Show that the complex vector space $\mathbb{C}^n$ is the complexification of $\mathbb{R}^n$. Show more generally that for any set $A$ the complex vector space $\mathbb{C}^A$ is the complexification of the real vector space $\mathbb{R}^A$.

11.11 Let $V$ be the complexification of the real vector space $W$. Define the operation of complex conjugation on $V$. That is, show that there is a real linear mapping $\varphi$ such that $\varphi^2 = I$ and $\varphi(i\alpha) = -i\varphi(\alpha)$. Show, conversely, that if $V$ is a complex vector space and $\varphi$ is a conjugation on $V$ [a real linear mapping $\varphi$ such that $\varphi^2 = I$ and $\varphi(i\alpha) = -i\varphi(\alpha)$], then $V$ is (isomorphic to) the complexification of a real linear space $W$. (Apply Theorem 5.5 of Chapter 1 to the identity $\varphi^2 - I = 0$.)

11.12 Let $W$ be a real vector space, and let $V$ be its complexification. Show that every $T$ in $\operatorname{Hom} W$ "extends" to a complex linear $S$ in $\operatorname{Hom} V$ which commutes with the conjugation $\varphi$. By $S$ extending $T$ we mean, of course, that $S \restriction (W \times \{0\}) = T$. Show, conversely, that if $S$ in $\operatorname{Hom} V$ commutes with conjugation, then $S$ is the extension of a $T$ in $\operatorname{Hom} W$.

11.13 In this situation we naturally call $S$ the complexification of $T$.
Show finally that if $S$ is the complexification of $T$, then its null space $X$ in $V$ is the direct sum $X = N \oplus iN$, where $N$ is the null space of $T$ in $W$. Remember that we are viewing $V$ as $W \oplus iW$.

11.14 On a complex normed linear space $V$ the norm is required to be complex homogeneous: $\|\lambda\alpha\| = |\lambda| \cdot \|\alpha\|$ for all complex numbers $\lambda$. Show that the natural definitions of $\|\ \|_1$, $\|\ \|_2$, and $\|\ \|_\infty$ on $\mathbb{C}^n$ have this property.

11.15 If a real normed linear space $W$ is complexified to $V = W \oplus iW$, there is no trivial formula which converts the real norm for $W$ into a complex norm for $V$. Show that, nevertheless, any product norm on $V$ (which really is $W \times W$) can be used to generate an equivalent complex norm. [Hint: Given $\langle \xi, \eta\rangle \in V$, consider the set of numbers $\{\|(x + iy)\langle \xi, \eta\rangle\| : |x + iy| = 1\}$, and try to obtain from this set a single number that works.]

11.16 Show that every nonzero complex number has a logarithm. That is, show that if $u + iv \neq 0$, then there exists an $x + iy$ such that $e^{x+iy} = u + iv$. (Write the equation $e^x(\cos y + i\sin y) = u + iv$, and solve by being slightly clever.)

11.17 The fundamental theorem of algebra and Theorem 5.5 of Chapter 1 imply that if $V$ is a complex vector space and $T$ in $\operatorname{Hom} V$ satisfies $p(T) = 0$ for a polynomial
$p$, then there are subspaces $\{V_i\}$ of $V$, complex numbers $\{\lambda_i\}$, and integers $\{m_i\}$ such that $V = \bigoplus_i V_i$, $V_i$ is $T$-invariant for each $i$, and $(T - \lambda_i I)^{m_i} = 0$ on $V_i$ for each $i$. Show that this is so. Show also that if $V$ is finite-dimensional, then every $T$ in $\operatorname{Hom} V$ must satisfy some polynomial equation $p(T) = 0$. (Consider the linear independence or dependence of the vectors $I, T, T^2, \ldots, T^n, \ldots$ in the vector space $\operatorname{Hom} V$.)

11.18 Suppose that the polynomial $p$ in the above exercise has real coefficients. Use the fact that complex conjugation is an automorphism of $\mathbb{C}$ to prove that if $\lambda$ is a root of $p$, then so is $\bar\lambda$. Show that if $V$ is the complexification of a real space $W$ and $T$ is the complexification of $R \in \operatorname{Hom} W$, then there exists a real polynomial $p$ such that $p(T) = 0$.

11.19 Show that if $W$ is a finite-dimensional real vector space and $R \in \operatorname{Hom} W$ is an isomorphism, then there exists an $A \in \operatorname{Hom} W$ such that $R = e^A$ (that is, $\log R$ exists). This is a hard exercise, but it can be proved from Exercises 8.19 through 8.23, 11.12, 11.17, and 11.18.

*12. WEAK METHODS

Our theorem that all norms are equivalent on a finite-dimensional space suggests that the limit theory of such spaces should be accessible independently of norms, and our earlier theorem that every linear transformation with a finite-dimensional domain is automatically bounded reinforces this impression. We shall look into this question in this section. In a sense this effort is irrelevant, since we can't do without norms completely, and since they are so handy that we use them even when we don't have to.

Roughly speaking, what we are going to do is to study a vector-valued map $F$ by studying the whole collection of real-valued maps $\{l \circ F : l \in V^*\}$.

Theorem 12.1. If $V$ is finite-dimensional, then $\xi_n \to \xi$ in $V$ (with respect to any, and so every, norm) if and only if $l(\xi_n) \to l(\xi)$ in $\mathbb{R}$ for each $l$ in $V^*$.

Proof. If $\xi_n \to \xi$ and $l \in V^*$, then $l(\xi_n) \to l(\xi)$, since $l$ is automatically continuous.
Conversely, if $l(\xi_n) \to l(\xi)$ for every $l$ in $V^*$, then, choosing a basis $\{\beta_i\}_1^n$ for $V$, we have $\varepsilon_i(\xi_n) \to \varepsilon_i(\xi)$ for each functional $\varepsilon_i$ in the dual basis, and this implies that $\xi_n \to \xi$ in the associated one-norm, since $\|\xi_n - \xi\|_1 = \sum_1^n |\varepsilon_i(\xi_n) - \varepsilon_i(\xi)| \to 0$. □

Remark. If $V$ is an arbitrary normed linear space, so that $V^* = \operatorname{Hom}(V, \mathbb{R})$ is the set of bounded linear functionals, then we say that $\xi_n \to \xi$ weakly if $l(\xi_n) \to l(\xi)$ for each $l \in V^*$. The above theorem can therefore be rephrased to say that in a finite-dimensional space, weak convergence and norm convergence are equivalent notions.

We shall now see that in a similar way the integration and differentiation of parametrized arcs can all be thrown back to the standard calculus of real-valued functions of a real variable by applying functionals from $V^*$ and using the natural isomorphism of $V^{**}$ with $V$. Thus, if $f \in \mathcal{C}([a, b], V)$ and $\lambda \in V^*$, then
$\lambda \circ f \in \mathcal{C}([a, b], \mathbb{R})$, and so the integral $\int_a^b \lambda \circ f$ exists from standard calculus. If we vary $\lambda$, we can check that the map $\lambda \mapsto \int_a^b \lambda \circ f$ is linear, hence is in $V^{**}$, and therefore is given by a uniquely determined vector $\alpha \in V$ (by duality; see Chapter 2, Theorem 3.2). That is, there exists a unique $\alpha \in V$ such that $\lambda(\alpha) = \int_a^b \lambda \circ f$ for every $\lambda \in V^*$, and we define this $\alpha$ to be $\int_a^b f$. Thus integration is defined so as to commute with the application of linear functionals: $\int_a^b f$ is that vector such that

$$\lambda\left(\int_a^b f\right) = \int_a^b \lambda(f(t))\,dt \quad \text{for all } \lambda \in V^*.$$

Similarly, if all the real-valued functions $\{\lambda \circ f : \lambda \in V^*\}$ are differentiable at $x_0$, then the mapping $\lambda \mapsto (\lambda \circ f)'(x_0)$ is linear by the linearity of the derivative in the standard calculus:

$$((c_1\lambda_1 + c_2\lambda_2) \circ f)' = (c_1(\lambda_1 \circ f) + c_2(\lambda_2 \circ f))' = c_1(\lambda_1 \circ f)' + c_2(\lambda_2 \circ f)'.$$

Therefore, there is again a unique $\alpha \in V$ such that $(\lambda \circ f)'(x_0) = \lambda(\alpha)$ for all $\lambda \in V^*$, and if we define this $\alpha$ to be the derivative $f'(x_0)$, we have again defined an operation of the calculus by commutativity with linear functionals:

$$(\lambda \circ f')(x_0) = (\lambda \circ f)'(x_0).$$

Now the fundamental theorem of the calculus appears as follows. If $F(x) = \int_a^x f$, then $(\lambda \circ F)(x) = \int_a^x \lambda \circ f$ by the weak definition of the integral. The fundamental theorem of the standard calculus then says that $(\lambda \circ F)'$ exists and $(\lambda \circ F)'(x) = (\lambda \circ f)(x) = \lambda(f(x))$. By the weak definition of the derivative we then have that $F'$ exists and $F'(x) = f(x)$.

The one conclusion that we don't get so easily by weak methods is the norm inequality $\left\|\int_a^b f\right\| \le (b - a)\|f\|_\infty$. This requires a theorem about norms on finite-dimensional spaces that we shall not prove in this course.

Theorem 12.2. $\|\alpha^{**}\| = \|\alpha\|$ for each $\alpha \in V$.

What is being asserted is that $\operatorname{lub} |\alpha^{**}(\lambda)|/\|\lambda\| = \|\alpha\|$. Since $\alpha^{**}(\lambda) = \lambda(\alpha)$, and since $|\lambda(\alpha)| \le \|\lambda\| \cdot \|\alpha\|$ by the definition of $\|\lambda\|$, we see that $\operatorname{lub} |\alpha^{**}(\lambda)|/\|\lambda\| \le \|\alpha\|$. Our problem is therefore to find $\lambda \in V^*$ with $\|\lambda\| = 1$ and $|\lambda(\alpha)| = \|\alpha\|$.
If we multiply through by a suitable constant (replacing $\alpha$ by $c\alpha$, where $c = 1/\|\alpha\|$), we can suppose that $\|\alpha\| = 1$. Then $\alpha$ is on the unit spherical surface, and the problem is to find a functional $\lambda \in V^*$ such that the affine subspace (hyperplane) where $\lambda = 1$ touches the unit sphere at $\alpha$ (so that $\lambda(\alpha) = 1$) and otherwise lies outside the unit sphere (so that $|\lambda(\xi)| \le 1$ when $\|\xi\| = 1$, and hence $\|\lambda\| \le 1$). It is clear geometrically that such "tangent planes" must exist, but we shall drop the matter here.
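Although the general existence proof is omitted, the required functional can be written down explicitly in special cases. The following sketch takes one such case as an assumption: $V = \mathbb{R}^3$ with the one-norm, where the functional with coefficients $\operatorname{sign}(\alpha_i)$ satisfies $\lambda(\alpha) = \|\alpha\|_1$ and $|\lambda(\xi)| \le \|\xi\|_1$. The vector `alpha` and the helper names are illustrative only and do not come from the text.

```python
import random

def one_norm(x):
    """The one-norm of a coordinate vector."""
    return sum(abs(c) for c in x)

def supporting_functional(alpha):
    """Coefficients of a functional lam with lam(alpha) = ||alpha||_1."""
    return [1.0 if c >= 0 else -1.0 for c in alpha]

def apply(lam, x):
    """Evaluate the functional with coefficient list lam at the vector x."""
    return sum(l * c for l, c in zip(lam, x))

alpha = [0.5, -2.0, 1.5]            # hypothetical vector in R^3
lam = supporting_functional(alpha)
value_at_alpha = apply(lam, alpha)  # should equal the one-norm of alpha

# Sample many vectors and record the largest ratio |lam(x)| / ||x||_1.
random.seed(1)
samples = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(1000)]
worst_ratio = max(abs(apply(lam, x)) / one_norm(x) for x in samples)
```

Sampling does not prove that $\|\lambda\| \le 1$; it only illustrates the bound, which in this special case follows from the ordinary triangle inequality for real numbers.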
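If we assume this theorem, then, since

$$\left|\lambda\left(\int_a^b f\right)\right| = \left|\int_a^b \lambda(f(t))\,dt\right| \le (b - a)\max\{|\lambda(f(t))| : t \in [a, b]\} \le (b - a)\|\lambda\|\max\{\|f(t)\|\} = (b - a)\|\lambda\| \cdot \|f\|_\infty$$

(the second inequality coming from $|\lambda(\alpha)| \le \|\lambda\| \cdot \|\alpha\|$), we get

$$\left\|\int_a^b f\right\| = \left\|\left(\int_a^b f\right)^{**}\right\| = \operatorname{lub}\frac{\left|\lambda\left(\int_a^b f\right)\right|}{\|\lambda\|} \le (b - a)\|f\|_\infty,$$

the extreme members of which form the desired inequality.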
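The weak definition of the integral and the norm inequality can be checked numerically in a simple case. The sketch below assumes $V = \mathbb{R}^2$ with the Euclidean norm, a quarter-circle arc, and a midpoint rule in place of the exact integral; the arc, the functional `lam`, and the function names are hypothetical choices made for illustration.

```python
import math

def integrate(g, a, b, n=2000):
    """Midpoint-rule approximation to the integral of a real function g on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def weak_integral(f, a, b, dim):
    """Integrate an arc f: [a, b] -> R^dim by applying each coordinate functional."""
    return [integrate(lambda t, i=i: f(t)[i], a, b) for i in range(dim)]

def arc(t):
    # Hypothetical example arc: a quarter of the unit circle.
    return (math.cos(t), math.sin(t))

a, b = 0.0, math.pi / 2
alpha = weak_integral(arc, a, b, 2)   # approximately (1, 1)

# An arbitrary functional lam(x) = 3*x_0 - 2*x_1 should commute with the integral.
lam = lambda x: 3.0 * x[0] - 2.0 * x[1]
lhs = lam(alpha)
rhs = integrate(lambda t: lam(arc(t)), a, b)

# Norm inequality: here ||arc(t)|| = 1 for all t, so the uniform norm of the arc is 1.
norm_alpha = math.hypot(*alpha)
bound = (b - a) * 1.0
```

Here `lhs` and `rhs` agree up to rounding, and `norm_alpha` (about $\sqrt{2}$) stays below `bound` (about $\pi/2$), as the inequality requires.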
CHAPTER 5 SCALAR PRODUCT SPACES

In this short chapter we shall look into what is going on behind two-norms, and we shall find that a wholly new branch of linear analysis is opened up. These norms can be characterized abstractly as those arising from scalar products. They are the finite and infinite-dimensional analogues of ordinary geometric length, and they carry with them practically all the concepts of Euclidean geometry, such as the notion of the angle between two vectors, perpendicularity (orthogonality) and the Pythagorean theorem, and the existence of many rigid motions.

The impact of this extra structure is particularly dramatic for infinite-dimensional spaces. Infinite orthogonal bases exist in great profusion and can be handled about as easily as bases in finite-dimensional spaces, although the basis expansion of a vector is now a convergent infinite series, $\xi = \sum_1^\infty x_i\alpha_i$. Many of the most important series expansions in mathematics are examples of such orthogonal basis expansions. For example, we shall see in the next chapter that the Fourier series expansion of a continuous function $f$ on $[0, \pi]$ is the basis expansion of $f$ under the two-norm $\|f\|_2 = \left(\int_0^\pi f^2\right)^{1/2}$ for the particular orthogonal basis $\{\alpha_n\}_1^\infty = \{\sin nt\}_1^\infty$. If a vector space is complete under a scalar product norm, it is called a Hilbert space. The more advanced theory of such spaces is one of the most beautiful parts of mathematics.

1. SCALAR PRODUCTS

A scalar product on a real vector space $V$ is a real-valued function from $V \times V$ to $\mathbb{R}$, its value at the pair $\langle \xi, \eta\rangle$ ordinarily being designated $(\xi, \eta)$, such that

a) $(\xi, \eta)$ is linear in $\xi$ when $\eta$ is held fixed;
b) $(\xi, \eta) = (\eta, \xi)$ (symmetry);
c) $(\xi, \xi) > 0$ if $\xi \neq 0$ (positive definiteness).

If (c) is replaced by the weaker condition

c′) $(\xi, \xi) \ge 0$ for all $\xi \in V$,

then $(\xi, \eta)$ is called a semiscalar product.

Two important examples of scalar products are

$$(\mathbf{x}, \mathbf{y}) = \sum_1^n x_i y_i \quad \text{when } V = \mathbb{R}^n$$
and

$$(f, g) = \int_a^b f(t)g(t)\,dt \quad \text{when } V = \mathcal{C}([a, b]).$$

On a complex vector space (b) must be replaced by

b′) $(\xi, \eta) = \overline{(\eta, \xi)}$ (Hermitian symmetry),

where the bar denotes complex conjugation. The corresponding examples are $(\mathbf{z}, \mathbf{w}) = \sum_1^n z_i\bar{w}_i$ when $V = \mathbb{C}^n$ and $(f, g) = \int_a^b f\bar{g}$ when $V$ is the space of continuous complex-valued functions on $[a, b]$. We shall study only the real case.

It follows from (a) and (b) that a semiscalar product is also linear in the second variable when the first variable is held fixed, and therefore is a symmetric bilinear functional whose associated quadratic form $q(\xi) = (\xi, \xi)$ is positive definite or positive semidefinite [(c) or (c′); see the last section in Chapter 2]. The definiteness of the form $q$ has far-reaching consequences, as we shall begin to see at once.

Theorem 1.1. The Schwarz inequality

$$|(\xi, \eta)| \le (\xi, \xi)^{1/2}(\eta, \eta)^{1/2}$$

is valid for any semiscalar product.

Proof. We have $0 \le (\xi - t\eta, \xi - t\eta) = (\xi, \xi) - 2t(\xi, \eta) + t^2(\eta, \eta)$ for every $t \in \mathbb{R}$. Since this quadratic in $t$ is never negative, it cannot have distinct roots, and the usual $(b^2 - 4ac)$-formula implies that $4(\xi, \eta)^2 - 4(\xi, \xi)(\eta, \eta) \le 0$, which is equivalent to the Schwarz inequality. □

We can also proceed directly. If $(\eta, \eta) > 0$, and if we set $t = (\xi, \eta)/(\eta, \eta)$ in the quadratic inequality in the first line of the proof, then the resulting expression simplifies to the Schwarz inequality. If $(\eta, \eta) = 0$, then $(\xi, \eta)$ must also be 0 (or else the beginning inequality is clearly false for some $t$), and now the Schwarz inequality holds trivially.

Corollary. If $(\xi, \eta)$ is a scalar product, then $\|\xi\| = (\xi, \xi)^{1/2}$ is a norm.

Proof.

$$\|\xi + \eta\|^2 = (\xi + \eta, \xi + \eta) = \|\xi\|^2 + 2(\xi, \eta) + \|\eta\|^2 \le \|\xi\|^2 + 2\|\xi\|\,\|\eta\| + \|\eta\|^2 \ \text{(by Schwarz)} = (\|\xi\| + \|\eta\|)^2,$$

proving the triangle inequality. Also, $\|c\xi\| = (c\xi, c\xi)^{1/2} = (c^2(\xi, \xi))^{1/2} = |c|\,\|\xi\|$. □

Note that the Schwarz inequality $|(\xi, \eta)| \le \|\xi\|\,\|\eta\|$ is now just the statement that the bilinear functional $(\xi, \eta)$ is bounded by one with respect to the scalar product norm.
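The Schwarz inequality, including its validity for a merely semidefinite product, can be illustrated numerically. The sketch below assumes $V = \mathbb{R}^2$, the standard dot product, and a deliberately degenerate semiscalar product that ignores the second coordinate; these choices and the function names are illustrative assumptions, not part of the text.

```python
import math
import random

def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def semi(x, y):
    """A semiscalar product on R^2 that ignores the second coordinate."""
    return x[0] * y[0]

def schwarz_gap(prod, x, y):
    """Return (x,x)^(1/2) (y,y)^(1/2) - |(x,y)|, which should be nonnegative."""
    return math.sqrt(prod(x, x)) * math.sqrt(prod(y, y)) - abs(prod(x, y))

random.seed(0)
pairs = [([random.uniform(-1, 1) for _ in range(2)],
          [random.uniform(-1, 1) for _ in range(2)]) for _ in range(1000)]

min_gap_dot = min(schwarz_gap(dot, x, y) for x, y in pairs)
min_gap_semi = min(schwarz_gap(semi, x, y) for x, y in pairs)

# The quadratic in t from the proof, evaluated at t = (x,y)/(y,y).
x, y = [1.0, 2.0], [3.0, -1.0]
t = dot(x, y) / dot(y, y)
q = dot(x, x) - 2 * t * dot(x, y) + t * t * dot(y, y)
```

For the chosen `x` and `y` the quadratic takes the value $(x, x) - (x, y)^2/(y, y) = 5 - 1/10$ at this $t$, and its nonnegativity is exactly the Schwarz inequality, as in the direct argument above.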
A normed linear space V in which the norm is a scalar product norm is called a pre-Hilbert space. If V is complete in this norm, it is a Hilbert space.
The two examples of scalar products mentioned earlier give us the real explanation of our two-norms for the first time:

$$\|\mathbf{x}\|_2 = \left(\sum_1^n x_i^2\right)^{1/2} \text{ for } \mathbf{x} \in \mathbb{R}^n \quad \text{and} \quad \|f\|_2 = \left(\int_a^b f^2\right)^{1/2} \text{ for } f \in \mathcal{C}([a, b])$$

are scalar product norms.

Since the scalar product norm on $\mathbb{R}^n$ becomes Euclidean length under a Cartesian coordinate correspondence with Euclidean $n$-space, it is conventional to call $\mathbb{R}^n$ itself Euclidean $n$-space $\mathbb{E}^n$ when we want it understood that the scalar product norm is being used.

Any finite-dimensional space $V$ is a Hilbert space with respect to any scalar product norm, because its finite dimensionality guarantees its completeness. On the other hand, we shall see in Exercise 1.10 that $\mathcal{C}([a, b])$ is incomplete in the two-norm, and is therefore a pre-Hilbert space but not a Hilbert space in this norm. (Remember, however, that $\mathcal{C}([a, b])$ is complete in the uniform norm $\|f\|_\infty$.) It is important to the real uses of Hilbert spaces in mathematics that any pre-Hilbert space can be completed to a Hilbert space, but the theory of infinite-dimensional Hilbert spaces is for the most part beyond the scope of this book.

Scalar product norms have in some sense the smoothest possible unit spheres, because these spheres are quadratic surfaces.

It is orthogonality that gives the theory of pre-Hilbert spaces its special flavor. Two vectors $\alpha$ and $\beta$ are said to be orthogonal, written $\alpha \perp \beta$, if $(\alpha, \beta) = 0$. This definition gets its inspiration from geometry; we noted in Chapter 1 that two geometric vectors are perpendicular if and only if their coordinate triples $\mathbf{x}$ and $\mathbf{y}$ satisfy $(\mathbf{x}, \mathbf{y}) = 0$. It is an interesting problem to go further and to show from the law of cosines ($c^2 = a^2 + b^2 - 2ab\cos\theta$) that the angle $\theta$ between two geometric vectors is given by $(\mathbf{x}, \mathbf{y}) = \|\mathbf{x}\|\,\|\mathbf{y}\|\cos\theta$.
This would motivate us to define the angle $\theta$ between two vectors $\xi$ and $\eta$ in a pre-Hilbert space by $(\xi, \eta) = \|\xi\|\,\|\eta\|\cos\theta$, but we shall have no use for this more general formulation.

We say that two subsets $A$ and $B$ are orthogonal, and we write $A \perp B$, if $\alpha \perp \beta$ for every $\alpha$ in $A$ and $\beta$ in $B$; for any subset $A$ we set $A^\perp = \{\beta \in V : \beta \perp A\}$.

Lemma 1.1. If $\beta$ is orthogonal to the set $A$, then $\beta$ is orthogonal to $\bar{L}(A)$, the closure of the linear span of $A$. It follows that $B^\perp$ is a closed subspace for every subset $B$.

Proof. The first assertion depends on the linearity and continuity of the scalar product in one of its variables; it will be left to the reader. As for $A = B^\perp$, it includes the closure of its own linear span, by the first part, and so is a closed subspace. □
Lemma 1.2. In any pre-Hilbert space we have the parallelogram law,

$$\|\alpha + \beta\|^2 + \|\alpha - \beta\|^2 = 2(\|\alpha\|^2 + \|\beta\|^2),$$

and the Pythagorean theorem,

$$\alpha \perp \beta \quad \text{if and only if} \quad \|\alpha + \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2.$$

If $\{\alpha_i\}_1^n$ is a (pairwise) orthogonal collection of vectors, then

$$\left\|\sum_1^n \alpha_i\right\|^2 = \sum_1^n \|\alpha_i\|^2.$$

Proof. Since $\|\alpha + \beta\|^2 = \|\alpha\|^2 + 2(\alpha, \beta) + \|\beta\|^2$, by the bilinearity of the scalar product, we see that $\|\alpha + \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2$ if and only if $(\alpha, \beta) = 0$, which is the Pythagorean theorem. Writing down the similar expansion of $\|\alpha - \beta\|^2$ and adding, we have the parallelogram law. The last statement follows from the Pythagorean theorem and Lemma 1.1 by induction. Or we can obtain this statement directly by expanding the scalar product on the left and noticing that all "mixed terms" drop out by orthogonality. □

The reader will notice that the Schwarz inequality has not been used in this lemma, but it would have been silly to state the lemma before proving that $\|\xi\| = (\xi, \xi)^{1/2}$ is a norm.

If $\{\alpha_i\}_1^n$ are orthogonal and nonzero, then the identity $\left\|\sum_1^n x_i\alpha_i\right\|^2 = \sum_1^n x_i^2\|\alpha_i\|^2$ shows that $\sum_1^n x_i\alpha_i$ can be zero only if all the coefficients $x_i$ are zero. Thus,

Corollary. A finite collection of (pairwise) orthogonal nonzero vectors is independent. Similarly, a finite collection of orthogonal subspaces is independent.

EXERCISES

1.1 Complete the second proof of Theorem 1.1.

1.2 Reexamine the proof of Theorem 1.1 and show that if $\xi$ and $\eta$ are independent, then the Schwarz inequality is strict.

1.3 Continuing the above exercise, now show that the triangle inequality is strict if $\xi$ and $\eta$ are independent.

1.4 a) Show that the sum of two semiscalar products is a semiscalar product.
b) Show that if $(\mu, \nu)$ is a semiscalar product on a vector space $W$ and if $T$ is a linear transformation from a vector space $V$ to $W$, then $[\xi, \eta] = (T\xi, T\eta)$ is a semiscalar product on $V$.
c) Deduce from (a) and (b) that

$$(f, g) = f(a)g(a) + \int_a^b f'(t)g'(t)\,dt$$

is a semiscalar product on $V = \mathcal{C}^1([a, b])$. Prove that it is a scalar product.
1.5 If $\alpha$ is held fixed, we know that $f(\xi) = (\xi, \alpha)$ is continuous. Why? Prove more generally that $(\xi, \eta)$ is continuous as a map from $V \times V$ to $\mathbb{R}$.

1.6 Let $V$ be a two-dimensional Hilbert space, and let $\{\alpha_1, \alpha_2\}$ be any basis for $V$. Show that a scalar product $(\xi, \eta)$ has the form $(\xi, \eta) = ax_1y_1 + b(x_1y_2 + x_2y_1) + cx_2y_2$, where $b^2 < ac$. Here, of course, $\xi = x_1\alpha_1 + x_2\alpha_2$, $\eta = y_1\alpha_1 + y_2\alpha_2$.

1.7 Prove that if $\omega(\mathbf{x}, \mathbf{y}) = ax_1y_1 + b(x_1y_2 + x_2y_1) + cx_2y_2$ and $b^2 < ac$, then $\omega$ is a scalar product on $\mathbb{R}^2$.

1.8 Let $\omega(\xi, \eta)$ be any symmetric bilinear functional on a finite-dimensional vector space $V$, and let $q(\xi) = \omega(\xi, \xi)$ be its associated quadratic form. Show that for any choice of a basis for $V$ the equation $q(\xi) = 1$ becomes a quadratic equation in the coordinates $\{x_i\}$ of $\xi$.

1.9 Prove in detail that if a vector $\beta$ is orthogonal to a set $A$ in a pre-Hilbert space, then $\beta$ is orthogonal to $\bar{L}(A)$.

1.10 We know from the last chapter that the Riemann integral is defined for the set $\mathcal{E}$ of uniform limits of real-valued step functions on $[0, 1]$ and that $\mathcal{E}$ includes all the continuous functions. Given that $k$ is the step function whose value is 1 on $[0, \tfrac12]$ and 0 on $[\tfrac12, 1]$, show that $\|f - k\|_2 > 0$ for any continuous function $f$. Show, however, that there is a sequence of continuous functions $\{f_n\}$ such that $\|f_n - k\|_2 \to 0$. Show, therefore, that $\mathcal{C}([0, 1])$ is incomplete in the two-norm, by showing that the above sequence $\{f_n\}$ is Cauchy but not convergent in $\mathcal{C}([0, 1])$.

2. ORTHOGONAL PROJECTION

One of the most important devices in geometric reasoning is "dropping a perpendicular" from a point to a line or a plane and then using right triangle arguments. This device is equally important in pre-Hilbert space theory. If $M$ is a subspace and $\alpha$ is any element in $V$, then by "the foot of the perpendicular dropped from $\alpha$ to $M$" we mean that vector $\mu$ in $M$ such that $(\alpha - \mu) \perp M$, if such a $\mu$ exists. (See Fig.
5.1.) Writing $\alpha$ as $\mu + (\alpha - \mu)$, we see that the existence of the "foot" $\mu$ in $M$ for each $\alpha$ in $V$ is equivalent to the direct sum decomposition $V = M \oplus M^\perp$. Now it is precisely this direct sum decomposition that the completeness of a Hilbert space guarantees, as we shall shortly see. We start by proving the geometrically intuitive fact that $\mu$ is the foot of the perpendicular dropped from $\alpha$ to $M$ if and only if $\mu$ is the point in $M$ closest to $\alpha$.

Lemma 2.1. If $\mu$ is in the subspace $M$, then $(\alpha - \mu) \perp M$ if and only if $\mu$ is the unique point in $M$ closest to $\alpha$, that is, $\mu$ is the "best approximation" to $\alpha$ in $M$.

Proof. If $(\alpha - \mu) \perp M$ and $\xi$ is any other point in $M$, then $\|\alpha - \xi\|^2 = \|(\alpha - \mu) + (\mu - \xi)\|^2 = \|\alpha - \mu\|^2 + \|\mu - \xi\|^2 > \|\alpha - \mu\|^2$. Thus $\mu$ is the
unique point in $M$ closest to $\alpha$. Conversely, suppose that $\mu$ is a point in $M$ closest to $\alpha$, and let $\xi$ be any nonzero vector in $M$. Then $\|\alpha - \mu\|^2 \le \|(\alpha - \mu) + t\xi\|^2$, which becomes $0 \le 2t(\alpha - \mu, \xi) + t^2\|\xi\|^2$ when the right-hand scalar product is expanded. This can hold for all $t$ only if $(\alpha - \mu, \xi) = 0$ (otherwise let $t = $ ?). Therefore, $(\alpha - \mu) \perp M$. □

On the basis of this lemma it is clear that a way to look for $\mu$ is to take a sequence $\mu_n$ in $M$ such that $\|\alpha - \mu_n\| \to \rho(\alpha, M)$ and to hope to define $\mu$ as its limit. Here is the crux of the matter: We can prove that such a sequence $\{\mu_n\}$ is always Cauchy, but its limit may not exist if $M$ is not complete!

Lemma 2.2. If $\{\mu_n\}$ is a sequence in the subspace $M$ whose distance from some vector $\alpha$ converges to the distance $\rho$ from $\alpha$ to $M$, then $\{\mu_n\}$ is Cauchy.

Proof. By the parallelogram law, $\|\mu_n - \mu_m\|^2 = \|(\alpha - \mu_n) - (\alpha - \mu_m)\|^2 = 2(\|\alpha - \mu_n\|^2 + \|\alpha - \mu_m\|^2) - \|2\alpha - (\mu_n + \mu_m)\|^2$. Since the first term on the right converges to $4\rho^2$ as $n, m \to \infty$, and since the second term is always $\le -4\rho^2$ (factor out the 2), we see that $\|\mu_n - \mu_m\|^2 \to 0$ as $n, m \to \infty$. □

Theorem 2.1. If $M$ is a complete subspace of a pre-Hilbert space $V$, then $V = M \oplus M^\perp$. In particular, this is true for any finite-dimensional subspace of a pre-Hilbert space and for any closed subspace of a Hilbert space.

Proof. This follows at once from the last two lemmas, since now $\mu = \lim \mu_n$ exists, $\|\alpha - \mu\| = \rho(\alpha, M)$, and so $(\alpha - \mu) \perp M$. □

If $V = M \oplus M^\perp$, then the projection on $M$ along $M^\perp$ is called the orthogonal projection on $M$, or simply the projection on $M$, since among all the projections on $M$ associated with the various complements of $M$, the orthogonal projection is distinguished. Thus, if $M$ is a complete subspace of $V$, and if $P$ is the projection on $M$, then $P(\xi)$ is at once the foot of the perpendicular dropped from $\xi$ to $M$ (which is where the word "projection" comes from) and also the best approximation to $\xi$ in $M$ (Lemma 2.1).
Lemma 2.3. If $\{M_i\}_1^n$ is a finite collection of complete, pairwise orthogonal subspaces, and if for a vector $\alpha$ in $V$, $\alpha_i$ is the projection of $\alpha$ on $M_i$ for $i = 1, \ldots, n$, then $\sum_1^n \alpha_i$ is the projection of $\alpha$ on $\bigoplus_1^n M_i$.

Proof. We have to show that $\alpha - \sum_1^n \alpha_i$ is orthogonal to $\bigoplus_1^n M_i$, and it is sufficient to show it orthogonal to each $M_j$ separately. But if $\xi \in M_j$, then $(\alpha - \sum_1^n \alpha_i, \xi) = (\alpha - \alpha_j, \xi)$, since $(\alpha_i, \xi) = 0$ for $i \neq j$, and $(\alpha - \alpha_j, \xi) = 0$ because $\alpha_j$ is the projection of $\alpha$ on $M_j$. Thus $(\alpha - \sum_1^n \alpha_i, \xi) = 0$. □

Lemma 2.4. The projection of $\xi$ on the one-dimensional span of a single nonzero vector $\eta$ is $((\xi, \eta)/\|\eta\|^2)\eta$.

Proof. Here $\mu$ must be of the form $x\eta$. But $(\xi - x\eta) \perp \eta$ if and only if $0 = (\xi - x\eta, \eta) = (\xi, \eta) - x\|\eta\|^2$, or $x = (\xi, \eta)/\|\eta\|^2$. □
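Lemmas 2.3 and 2.4 together give a direct recipe for projecting on the span of finitely many orthogonal vectors. The sketch below assumes $V = \mathbb{R}^3$ with the standard dot product; the vectors `e1`, `e2`, `xi` and the function names are hypothetical choices made for illustration.

```python
def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def project_on_vector(xi, eta):
    """Lemma 2.4: projection of xi on the span of the nonzero vector eta."""
    c = dot(xi, eta) / dot(eta, eta)
    return [c * e for e in eta]

def project_on_span(xi, orthogonal_vectors):
    """Lemma 2.3: sum the projections on pairwise orthogonal nonzero vectors."""
    total = [0.0] * len(xi)
    for eta in orthogonal_vectors:
        p = project_on_vector(xi, eta)
        total = [s + q for s, q in zip(total, p)]
    return total

# Hypothetical data: two orthogonal vectors in R^3 and a vector to project.
e1, e2 = [1.0, 1.0, 0.0], [1.0, -1.0, 0.0]
xi = [2.0, 3.0, 5.0]
mu = project_on_span(xi, [e1, e2])
residual = [a - b for a, b in zip(xi, mu)]
```

The residual `xi - mu` is orthogonal to both `e1` and `e2`, so by Lemma 2.1 `mu` is the point of their span closest to `xi`. Note that `project_on_span` is only correct when the given vectors are pairwise orthogonal, which is the hypothesis of Lemma 2.3.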
We call the number $(\xi, \eta)/\|\eta\|^2$ the $\eta$-Fourier coefficient of $\xi$. If $\eta$ is a unit (normalized) vector, then this Fourier coefficient is just $(\xi, \eta)$. It follows from Lemma 2.3 that if $\{\varphi_i\}_1^n$ is an orthogonal collection of nonzero vectors, and if $\{x_i\}_1^n$ are the corresponding Fourier coefficients of a vector $\xi$, then $\sum_1^n x_i\varphi_i$ is the projection of $\xi$ on the subspace $M$ spanned by $\{\varphi_i\}_1^n$. Therefore, $\xi - \sum_1^n x_i\varphi_i \perp M$, and (Lemma 2.1) $\sum_1^n x_i\varphi_i$ is the best approximation to $\xi$ in $M$. If $\xi$ is in $M$, then both of these statements say that $\xi = \sum_1^n x_i\varphi_i$. (This can of course be verified directly, by letting $\xi = \sum_1^n a_i\varphi_i$ be the basis expansion of $\xi$ and computing $(\xi, \varphi_j) = \sum_1^n a_i(\varphi_i, \varphi_j) = a_j\|\varphi_j\|^2$.)

If an orthogonal set of vectors $\{\varphi_i\}$ is also normalized ($\|\varphi_i\| = 1$), then we call the set orthonormal.

Theorem 2.2. If $\{\varphi_i\}_1^\infty$ is an infinite orthonormal sequence, and if $\{x_i\}_1^\infty$ are the corresponding Fourier coefficients of a vector $\xi$, then

$$\sum_1^\infty x_i^2 \le \|\xi\|^2 \quad \text{(Bessel's inequality)},$$

and $\xi = \sum_1^\infty x_i\varphi_i$ if and only if $\sum_1^\infty x_i^2 = \|\xi\|^2$ (Parseval's equation).

Proof. Setting $\sigma_n = \sum_1^n x_i\varphi_i$ and $\xi = (\xi - \sigma_n) + \sigma_n$, and remembering that $\xi - \sigma_n \perp \sigma_n$, we have

$$\|\xi\|^2 = \|\xi - \sigma_n\|^2 + \|\sigma_n\|^2 = \|\xi - \sigma_n\|^2 + \sum_1^n x_i^2.$$

Therefore, $\sum_1^n x_i^2 \le \|\xi\|^2$ for all $n$, proving Bessel's inequality, and $\sigma_n \to \xi$ (that is, $\|\xi - \sigma_n\| \to 0$) if and only if $\sum_1^n x_i^2 \to \|\xi\|^2$, proving Parseval's identity. □

We call the formal series $\sum x_i\varphi_i$ the Fourier series of $\xi$ (with respect to the orthonormal set $\{\varphi_i\}$). The Parseval condition says that the Fourier series of $\xi$ converges to $\xi$ if and only if $\|\xi\|^2 = \sum_1^\infty x_i^2$.

An infinite orthonormal sequence $\{\varphi_i\}_1^\infty$ is called a basis for a pre-Hilbert space $V$ if every element in $V$ is the sum of its Fourier series.

Theorem 2.3. An infinite orthonormal sequence $\{\varphi_i\}_1^\infty$ is a basis for a pre-Hilbert space $V$ if (and only if) its linear span is dense in $V$.

Proof. Let $\xi$ be any element of $V$, and let $\{x_i\}$ be its sequence of Fourier coefficients.
Since the linear span of $\{\varphi_i\}$ is dense in $V$, given any $\epsilon$, there is a finite linear combination $\sum_1^n y_i\varphi_i$ which approximates $\xi$ to within $\epsilon$. But $\sum_1^m x_i\varphi_i$ is the best approximation to $\xi$ in the span of $\{\varphi_i\}_1^m$, by Lemmas 2.3 and 2.1, and so

$$\left\|\xi - \sum_1^m x_i\varphi_i\right\| \le \epsilon \quad \text{for any } m \ge n.$$

That is, $\xi = \sum_1^\infty x_i\varphi_i$. □

Corollary. If $V$ is a Hilbert space, then the orthonormal sequence $\{\varphi_i\}_1^\infty$ is a basis if and only if $\{\varphi_i\}^\perp = \{0\}$.
Proof. Let $M$ be the closure of the linear span of $\{\varphi_i\}_1^\infty$. Since $V = M \oplus M^\perp$, and since $M^\perp = \{\varphi_i\}^\perp$, by Lemma 1.1, we see that $\{\varphi_i\}^\perp = \{0\}$ if and only if $V = M$, and, by the theorem, this holds if and only if $\{\varphi_i\}$ is a basis. □

Note that when orthogonal bases only are being used, the coefficient of a vector $\xi$ at a basis element $\beta$ is always the Fourier coefficient $(\xi, \beta)/\|\beta\|^2$. Thus the $\beta$-coefficient of $\xi$ depends only on $\beta$ and is independent of the choice of the rest of the basis. However, we know from Chapter 2 that when an arbitrary basis containing $\beta$ is being used, then the $\beta$-coefficient of $\xi$ varies with the basis. This partly explains the favored position of orthogonal bases.

We often obtain an orthonormal sequence by "orthogonalizing" some given sequence.

Lemma 2.5. If $\{\alpha_i\}$ is a finite or infinite sequence of independent vectors, then there is an orthonormal sequence $\{\varphi_i\}$ such that $\{\alpha_i\}_1^n$ and $\{\varphi_i\}_1^n$ have the same linear span for all $n$.

Proof. Since normalizing is trivial, we shall only orthogonalize. Suppose, to be definite, that the sequence is infinite, and let $M_n$ be the linear span of $\{\alpha_1, \ldots, \alpha_n\}$. Let $\mu_n$ be the orthogonal projection of $\alpha_n$ on $M_{n-1}$, and set $\varphi_n = \alpha_n - \mu_n$ (and $\varphi_1 = \alpha_1$). This is our sequence. We have $\varphi_i \in M_i \subset M_{n-1}$ if $i < n$, and $\varphi_n \perp M_{n-1}$, so that the vectors $\varphi_i$ are mutually orthogonal. Also, $\varphi_n \neq 0$, since $\alpha_n$ is not in $M_{n-1}$. Thus $\{\varphi_i\}_1^n$ is an independent subset of the $n$-dimensional vector space $M_n$, by the corollary of Lemma 1.2, and so $\{\varphi_i\}_1^n$ spans $M_n$. □

The actual calculation of the orthogonalized sequence $\{\varphi_n\}$ can be carried out recursively, starting with $\varphi_1 = \alpha_1$, by noticing that since $\mu_n$ is the projection of $\alpha_n$ on the span of $\varphi_1, \ldots, \varphi_{n-1}$, it must be the vector $\sum_1^{n-1} c_i\varphi_i$, where $c_i$ is the Fourier coefficient of $\alpha_n$ with respect to $\varphi_i$.

Consider, for example, the sequence $\{x^n\}_0^\infty$ in $\mathcal{C}([0, 1])$. We have $\varphi_1 = \alpha_1 = 1$.
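Next, $\varphi_2 = \alpha_2 - \mu_2 = x - c \cdot 1$, where

$$c = (\alpha_2, \varphi_1)/\|\varphi_1\|^2 = \int_0^1 x \cdot 1 \Big/ \int_0^1 (1)^2 = \tfrac12.$$

Then $\varphi_3 = \alpha_3 - (c_2\varphi_2 + c_1\varphi_1)$, where

$$c_1 = \int_0^1 x^2 \cdot 1 \Big/ \int_0^1 (1)^2 = \tfrac13$$

and

$$c_2 = \int_0^1 x^2\left(x - \tfrac12\right) \Big/ \int_0^1 \left(x - \tfrac12\right)^2 = \left(\tfrac14 - \tfrac16\right) \Big/ \left(\tfrac1{12}\right) = 1.$$

Thus the first three terms in the orthogonalization of $\{x^n\}_0^\infty$ in $\mathcal{C}([0, 1])$ are $1$, $x - \tfrac12$, $x^2 - (x - \tfrac12) - \tfrac13 = x^2 - x + \tfrac16$. This process is completely elementary, but the calculations obviously become burdensome after only a few terms.

We remember from general bilinear theory that if for $\beta$ in $V$ we define $\theta_\beta\colon V \to \mathbb{R}$ by $\theta_\beta(\xi) = (\xi, \beta)$, then $\theta_\beta \in V^*$ and $\theta\colon \beta \mapsto \theta_\beta$ is a linear mapping from $V$ to $V^*$. If $(\xi, \eta)$ is a scalar product, then $\theta_\beta(\beta) = \|\beta\|^2 > 0$ if $\beta \neq 0$, and so $\theta$ is injective. Actually, $\theta$ is an isometry, as we shall ask the reader to show in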
an exercise. If $V$ is finite-dimensional, the injectivity of $\theta$ implies that $\theta$ is an isomorphism. But we have a much more startling result:

Theorem 2.4. $\theta$ is an isomorphism if and only if $V$ is a Hilbert space.

Proof. Suppose first that $V$ is a Hilbert space. We have to show that $\theta$ is surjective, i.e., that every nonzero $F$ in $V^*$ is of the form $\theta_\beta$. Given such an $F$, let $N$ be its null space, let $\alpha$ be a nonzero vector orthogonal to $N$ (Theorem 2.1), and consider $\beta = c\alpha$, where $c$ is to be determined later. Every vector $\xi$ in $V$ is uniquely a sum $\xi = x\beta + \eta$, where $\eta$ is in $N$. [This only says that $V/N$ is one-dimensional, which presumably we know, but we can check it directly by applying $F$ and seeing that $F(\xi - x\beta) = 0$ if and only if $x = F(\xi)/F(\beta)$.] But now the equations

$$F(\xi) = F(x\beta + \eta) = xF(\beta) = xcF(\alpha)$$

and

$$\theta_\beta(\xi) = (\xi, \beta) = (x\beta + \eta, \beta) = x\|\beta\|^2 = xc^2\|\alpha\|^2$$

show that $\theta_\beta = F$ if we take $c = F(\alpha)/\|\alpha\|^2$.

Conversely, if $\theta$ is surjective (and assuming that it is an isometry), then it is an isomorphism in $\operatorname{Hom}(V, V^*)$, and since $V^*$ is complete by Theorem 7.6, Chapter 4, it follows that $V$ is complete by Theorem 7.3 of the same chapter. We are finished. □

EXERCISES

2.1 In the proof of Lemma 2.1, if $(\alpha - \mu, \xi) \neq 0$, what value of $t$ will contradict the inequality $0 \le 2t(\alpha - \mu, \xi) + t^2\|\xi\|^2$?

2.2 Prove the "only if" part of Theorem 2.3.

2.3 Let $\{M_i\}$ be an orthogonal sequence of complete subspaces of a pre-Hilbert space $V$, and let $P_i$ be the (orthogonal) projection on $M_i$. Prove that $\{P_i\xi\}$ is Cauchy for any $\xi$ in $V$.

2.4 Show that the functions $\{\sin nt\}_{n=1}^\infty$ form an orthogonal collection of elements in the pre-Hilbert space $\mathcal{C}([0, \pi])$ with respect to the standard scalar product $(f, g) = \int_0^\pi f(t)g(t)\,dt$. Show also that $\|\sin nt\|_2 = \sqrt{\pi/2}$.

2.5 Compute the Fourier coefficients of the function $f(t) = t$ in $\mathcal{C}([0, \pi])$ with respect to the above orthogonal set. What then is the best two-norm approximation to $t$ in the two-dimensional space spanned by $\sin t$ and $\sin 2t$?
Sketch the graph of this approximating function, indicating its salient features in the usual manner of calculus curve sketching.

2.6 The "step" function $f$ defined by $f(t) = \pi/2$ on $[0, \pi/2]$ and $f(t) = 0$ on $(\pi/2, \pi]$ is of course discontinuous at $\pi/2$. Nevertheless, calculate its Fourier coefficients with respect to $\{\sin nt\}_{n=1}^\infty$ in $\mathcal{C}([0, \pi])$ and graph its best approximation in the span of $\{\sin nt\}_1$.

2.7 Show that the functions $\{\sin nt\}_{n=1}^\infty \cup \{\cos nt\}_{n=0}^\infty$ form an orthogonal collection of elements in the pre-Hilbert space $\mathcal{C}([-\pi, \pi])$ with respect to the standard scalar product $(f, g) = \int_{-\pi}^{\pi} f(t)g(t)\,dt$.
2.8 Calculate the first three terms in the orthogonalization of $\{x^n\}_0^\infty$ in $\mathcal{C}([-1, 1])$.

2.9 Use the definition of the norm of a bounded linear transformation and the Schwarz inequality to show that $\|\theta_\beta\| \le \|\beta\|$ [where $\theta_\beta(\xi) = (\xi, \beta)$]. In order to conclude that $\beta \mapsto \theta_\beta$ is an isometry, we also need the opposite inequality, $\|\theta_\beta\| \ge \|\beta\|$. Prove this by using a special value of $\xi$.

2.10 Show that if $V$ is an incomplete pre-Hilbert space, then $V$ has a proper closed subspace $M$ such that $M^\perp = \{0\}$. [Hint: There must exist $F \in V^*$ not of the form $F(\xi) = (\xi, \alpha)$.] Together with Theorem 2.1, this shows that a pre-Hilbert space $V$ is a Hilbert space if and only if $V = M \oplus M^\perp$ for every closed subspace $M$.

2.11 The isometry $\theta: \alpha \mapsto \theta_\alpha$ [where $\theta_\alpha(\xi) = (\xi, \alpha)$] imbeds the pre-Hilbert space $V$ in its conjugate space $V^*$. We know that $V^*$ is complete. Why? The closure of $V$ as a subspace of $V^*$ is therefore complete, and we can hence complete $V$ as a Banach space. Let $H$ be its completion. It is a Banach space including (the isometric image of) $V$ as a dense subspace. Show that the scalar product on $V$ extends uniquely to $H$ and that the norm on $H$ is the extended scalar product norm, so that $H$ is a Hilbert space.

2.12 Show that under the isometric imbedding $\alpha \mapsto \theta_\alpha$ of a pre-Hilbert space $V$ into $V^*$ orthogonality is equivalent to annihilation as discussed in Section 2.3. Discuss the connection between the properties of the annihilator $A^\circ$ and Lemma 1.1 of this chapter.

2.13 Prove that if $C$ is a nonempty complete convex subset of a pre-Hilbert space $V$, and if $\alpha$ is any vector not in $C$, then there is a unique $\mu \in C$ closest to $\alpha$. (Examine the proof of Lemma 2.2.)

3. SELF-ADJOINT TRANSFORMATIONS

Definition. If $V$ is a pre-Hilbert space, then $T$ in $\mathrm{Hom}\,V$ is self-adjoint if $(T\alpha, \beta) = (\alpha, T\beta)$ for every $\alpha, \beta \in V$. The set of all self-adjoint transformations will be designated SA.
Self-adjointness suggests that $T$ ought to become its own adjoint under the injection $\theta$ of $V$ into $V^*$. We check this now. Since $(\alpha, \beta) = \theta_\beta(\alpha)$, we can rewrite the equation $(T\alpha, \beta) = (\alpha, T\beta)$ as $\theta_\beta(T\alpha) = \theta_{T\beta}(\alpha)$, and again as $(T^*(\theta_\beta))(\alpha) = \theta_{T\beta}(\alpha)$ by the definition of $T^*$. This holds for all $\alpha$ and $\beta$ if and only if $T^*(\theta_\beta) = \theta_{T\beta}$ for all $\beta \in V$, or $T^* \circ \theta = \theta \circ T$, which is the asserted identification.

Lemma 3.1. If $V$ is a finite-dimensional Hilbert space and $\{\varphi_i\}_1^n$ is an orthonormal basis for $V$, then $T \in \mathrm{Hom}(V)$ is self-adjoint if and only if the matrix $\{t_{ij}\}$ of $T$ with respect to $\{\varphi_i\}$ is symmetric ($\mathbf{t} = \mathbf{t}^*$).

Proof. If we substitute the basis expansions of $\alpha$ and $\beta$ and expand, we see that $(\alpha, T\beta) = (T\alpha, \beta)$ for all $\alpha$ and $\beta$ if and only if $(\varphi_i, T\varphi_j) = (T\varphi_i, \varphi_j)$ for all $i$ and $j$. But $T\varphi_j = \sum_{k=1}^n t_{kj}\varphi_k$, and when this is substituted in these last scalar products, the equation becomes $t_{ij} = t_{ji}$. That is, $T$ is self-adjoint if and only if $\mathbf{t} = \mathbf{t}^*$. □

A self-adjoint $T$ is said to be nonnegative if $(T\xi, \xi) \ge 0$ for all $\xi$. Then $[\xi, \eta] = (T\xi, \eta)$ is a semiscalar product!
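As a concrete check of Lemma 3.1, the following minimal sketch tests the identity $(T\delta^i, \delta^j) = (\delta^i, T\delta^j)$ on the standard basis of $\mathbb{R}^n$. The helper names (`dot`, `mat_vec`, `is_self_adjoint_on_basis`) and the two sample matrices are illustrative assumptions, not part of the text.

```python
def dot(u, v):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(t, v):
    """Apply the matrix t (given as a list of rows) to the vector v."""
    return [dot(row, v) for row in t]

def is_self_adjoint_on_basis(t):
    """Check (T e_i, e_j) == (e_i, T e_j) for all standard basis vectors e_i, e_j."""
    n = len(t)
    basis = [[1 if k == i else 0 for k in range(n)] for i in range(n)]
    return all(
        dot(mat_vec(t, basis[i]), basis[j]) == dot(basis[i], mat_vec(t, basis[j]))
        for i in range(n)
        for j in range(n)
    )

# Hypothetical sample matrices: one symmetric, one not.
symmetric = [[2, 1, 0], [1, 3, -1], [0, -1, 5]]
not_symmetric = [[0, -1], [1, 0]]

print(is_self_adjoint_on_basis(symmetric))
print(is_self_adjoint_on_basis(not_symmetric))
```

Since $(T\delta^i, \delta^j) = t_{ji}$ and $(\delta^i, T\delta^j) = t_{ij}$, the check succeeds exactly when the matrix is symmetric, which is the content of the lemma for the standard basis.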
Lemma 3.2. If $T$ is a nonnegative self-adjoint transformation, then $\|T(\xi)\| \le \|T\|^{1/2}(T\xi, \xi)^{1/2}$ for all $\xi$. Therefore, if $(T\xi, \xi) = 0$, then $T\xi = 0$, and, more generally, if $(T\xi_n, \xi_n) \to 0$, then $T(\xi_n) \to 0$.

Proof. If $T$ is nonnegative as well as self-adjoint, then $[\xi, \eta] = (T\xi, \eta)$ is a semiscalar product, and so, by Schwarz's inequality,
$$|(T\xi, \eta)| = |[\xi, \eta]| \le [\xi, \xi]^{1/2}[\eta, \eta]^{1/2} = (T\xi, \xi)^{1/2}(T\eta, \eta)^{1/2}.$$
Taking $\eta = T\xi$, the left side becomes $\|T\xi\|^2$ and the factor on the right becomes $(T(T\xi), T\xi)^{1/2}$, which is less than or equal to $\|T\|^{1/2}\|T\xi\|$, by Schwarz and the definition of $\|T\|$. Dividing by $\|T\xi\|$, we get the inequality of the lemma. □

If $\alpha \neq 0$ and $T(\alpha) = c\alpha$ for some $c$, then $\alpha$ is called an eigenvector (proper vector, characteristic vector) of $T$, and $c$ is the associated eigenvalue (proper value, characteristic value).

Theorem 3.1. If $V$ is a finite-dimensional Hilbert space and $T$ is a self-adjoint element of $\mathrm{Hom}\,V$, then $V$ has an orthonormal basis consisting entirely of eigenvectors of $T$.

Proof. Consider the function $(T\xi, \xi)$. It is a continuous real-valued function of $\xi$, and on the unit sphere $S = \{\xi : \|\xi\| = 1\}$ it is bounded above by $\|T\|$ (by Schwarz). Set $m = \mathrm{lub}\,\{(T\xi, \xi) : \|\xi\| = 1\}$. Since $S$ is compact (being bounded and closed), $(T\xi, \xi)$ assumes the value $m$ at some point $\alpha$ on $S$. Now $m - T$ is a nonnegative self-adjoint transformation (Check this!), and $(T\alpha, \alpha) = m$ is equivalent to $((m - T)\alpha, \alpha) = 0$. Therefore, $(m - T)\alpha = 0$ by Lemma 3.2, and $T\alpha = m\alpha$. We have thus found one eigenvector for $T$. Now set $V_1 = V$, $\alpha_1 = \alpha$, and $m_1 = m$, and let $V_2$ be $\{\alpha_1\}^\perp$. Then $T[V_2] \subset V_2$, for if $\xi \perp \alpha_1$, then $(T\xi, \alpha_1) = (\xi, T\alpha_1) = m(\xi, \alpha_1) = 0$. We can therefore repeat the above argument for the restriction of $T$ to the Hilbert space $V_2$ and find $\alpha_2$ in $V_2$ such that $\|\alpha_2\| = 1$ and $T(\alpha_2) = m_2\alpha_2$, where $m_2 = \mathrm{lub}\,\{(T\xi, \xi) : \|\xi\| = 1 \text{ and } \xi \in V_2\}$. Clearly, $m_2 \le m_1$. We then set $V_3 = \{\alpha_1, \alpha_2\}^\perp$ and continue, arriving finally at an orthonormal basis $\{\alpha_i\}_1^n$ of eigenvectors of $T$. □

Now let $\lambda_1, \ldots, \lambda_r$ be the distinct values in the list $m_1$, ...
, $m_n$, and let $M_j$ be the linear span of those basis vectors $\alpha_i$ for which $m_i = \lambda_j$. Then the subspaces $M_j$ are orthogonal to each other, $V = \bigoplus_1^r M_j$, each $M_j$ is $T$-invariant, and the restriction of $T$ to $M_j$ is $\lambda_j$ times the identity. Since all the nonzero vectors in $M_j$ are eigenvectors with eigenvalue $\lambda_j$, if the $\alpha_i$'s spanning $M_j$ are replaced by any other orthonormal basis for $M_j$, then we still have an orthonormal basis of eigenvectors. The $\alpha_i$'s are therefore not in general uniquely determined. But the subspaces $M_j$ and the eigenvalues $\lambda_j$ are unique. This will follow if we show that every eigenvector is in an $M_j$.

Lemma 3.3. In the context of the above discussion, if $\xi \neq 0$ and $T(\xi) = x\xi$ for some $x$ in $\mathbb{R}$, then $\xi \in M_j$ (and so $x = \lambda_j$) for some $j$.
Proof. Since $V = \bigoplus_1^r M_j$, we have $\xi = \sum_1^r \xi_i$ with $\xi_i \in M_i$. Then
$$\sum_1^r x\xi_i = x\xi = T(\xi) = \sum_1^r T(\xi_i) = \sum_1^r \lambda_i\xi_i \quad\text{and}\quad \sum_1^r (x - \lambda_i)\xi_i = 0.$$
Since the subspaces $M_i$ are independent, every component $(x - \lambda_i)\xi_i$ is $0$. But some $\xi_j \neq 0$, since $\xi \neq 0$. Therefore, $x = \lambda_j$, $\xi_i = 0$ for $i \neq j$, and $\xi = \xi_j \in M_j$. □

We have thus proved the following theorem.

Theorem 3.2. If $V$ is a finite-dimensional Hilbert space and $T$ is a self-adjoint element of $\mathrm{Hom}\,V$, then there are uniquely determined subspaces $\{V_i\}_1^r$ of $V$, and distinct scalars $\{\lambda_i\}_1^r$, such that $\{V_i\}$ is an orthogonal family whose sum is $V$ and the restriction of $T$ to $V_i$ is $\lambda_i$ times the identity.

If $V$ is a finite-dimensional vector space and we are given $T \in \mathrm{Hom}\,V$, then we know how to compute related mappings such as $T^2$ and $T^{-1}$ (if it exists) and vectors $T\alpha$, $T^{-1}\alpha$, etc., by choosing a basis for $V$ and then computing matrix products, inverses (when they exist), and so on. Some of these computations, particularly those related to inverses, can be quite arduous. One enormous advantage of a basis consisting of eigenvectors for $T$ is that it trivializes all of these calculations.

To see this, let $\{\beta_n\}$ be a basis of $V$ consisting entirely of eigenvectors for $T$, and let $\{r_n\}$ be the corresponding eigenvalues. To compute $T\xi$, we write down the basis expansion for $\xi$, $\xi = \sum_1^n x_i\beta_i$, and then $T\xi = \sum_1^n r_ix_i\beta_i$. $T^2$ has the same eigenvectors, but with eigenvalues $\{r_i^2\}$. Thus $T^2\xi = \sum_1^n r_i^2x_i\beta_i$. $T^{-1}$ exists if and only if no $r_i = 0$, in which case it has the same eigenvectors with eigenvalues $\{1/r_i\}$. Thus $T^{-1}\xi = \sum_1^n (x_i/r_i)\beta_i$. If $P(t) = \sum_0^m a_nt^n$ is any polynomial, then $P(T)$ takes $\beta_i$ into $P(r_i)\beta_i$. Thus $P(T)\xi = \sum_1^n P(r_i)x_i\beta_i$. By now the point should be amply clear.

The additional value of orthonormality in a basis is already clear from the last section. Basically, it enables us to compute the coefficients $\{x_i\}$ of $\xi$ by scalar products: $x_i = (\xi, \beta_i)$.
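The eigenbasis computations just described can be sketched in a few lines. The example below is only an illustration under stated assumptions: the matrix $\mathbf{t} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, its eigenvectors $\langle 1, 1\rangle$ and $\langle 1, -1\rangle$ (eigenvalues 3 and 1), the vector $\xi$, and all function names are chosen here and do not come from the text. The eigenvectors are orthogonal but not normalized, so the coefficients are computed as $x_i = (\xi, \beta_i)/(\beta_i, \beta_i)$; exact rational arithmetic avoids rounding.

```python
from fractions import Fraction

def dot(u, v):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(m, v):
    """Apply the matrix m (a list of rows) to the vector v."""
    return [dot(row, v) for row in m]

# Assumed example: t = [[2, 1], [1, 2]] has the orthogonal eigenvectors
# (1, 1) and (1, -1) with eigenvalues 3 and 1.
t = [[2, 1], [1, 2]]
eigvecs = [[1, 1], [1, -1]]
eigvals = [3, 1]

def coefficients(xi):
    """Coefficients of xi in the eigenbasis: x_i = (xi, b_i) / (b_i, b_i)."""
    return [Fraction(dot(xi, b), dot(b, b)) for b in eigvecs]

def apply_function(func, xi):
    """Apply func(T) to xi by scaling each eigen-coefficient x_i by func(r_i)."""
    xs = coefficients(xi)
    return [
        sum(func(r) * x * b[k] for r, x, b in zip(eigvals, xs, eigvecs))
        for k in range(len(xi))
    ]

xi = [5, -1]
t_xi = apply_function(lambda r: r, xi)                   # T xi
t_inv_xi = apply_function(lambda r: Fraction(1, r), xi)  # T^{-1} xi
p_xi = apply_function(lambda r: r * r + 1, xi)           # P(T) xi with P(t) = t^2 + 1

print(t_xi, t_inv_xi, p_xi)
```

Each result can be compared with a direct matrix computation, for instance by checking that `mat_vec(t, t_inv_xi)` returns `xi`; no matrix inverse is ever formed.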
This is a good place to say a few words about the general eigenvalue problem in finite-dimensional theory. Our complete analysis above was made possible by the self-adjointness of $T$ (or the symmetry of the matrix $\mathbf{t}$). What we can say about an arbitrary $T$ in $\mathrm{Hom}\,V$ is much less satisfactory.

We first note that the eigenvalues of $T$ can be determined algebraically, for $\lambda$ is an eigenvalue if and only if $T - \lambda I$ is not injective, or, equivalently, is singular, and we know that $T - \lambda I$ is singular if and only if its determinant $\Delta(T - \lambda I)$ is $0$. If we choose any basis for $V$, the determinant of $T - \lambda I$ is the determinant of its matrix $\mathbf{t} - \lambda\mathbf{e}$, and our later formula in Chapter 7 shows that this is a polynomial of degree $n$ in $\lambda$. It is easy to see that this polynomial is independent of the basis; it is called the characteristic polynomial of $T$. Thus the eigenvalues of $T$ are exactly the roots of the characteristic polynomial of $T$.
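For a $2 \times 2$ matrix the characteristic polynomial can be written out by hand: $\det(\mathbf{t} - \lambda\mathbf{e}) = \lambda^2 - (t_{11} + t_{22})\lambda + (t_{11}t_{22} - t_{12}t_{21})$. The sketch below computes its coefficients and its real roots from the quadratic formula. The function names and the sample matrices are assumptions made for illustration, and the general $n \times n$ case (which needs the determinant formula of Chapter 7) is not attempted.

```python
import math

def characteristic_polynomial_2x2(t):
    """Coefficients (1, -trace, det) of det(t - x*e) = x^2 - trace*x + det."""
    (a, b), (c, d) = t
    return (1, -(a + d), a * d - b * c)

def real_eigenvalues_2x2(t):
    """Real roots of the characteristic polynomial, in increasing order."""
    _, p, q = characteristic_polynomial_2x2(t)
    disc = p * p - 4 * q
    if disc < 0:
        return []  # no real roots, hence no real eigenvalues
    root = math.sqrt(disc)
    return sorted({(-p - root) / 2, (-p + root) / 2})

# Hypothetical sample matrices.
symmetric = [[2, 1], [1, 2]]
triangular = [[1, 2], [0, 4]]

print(characteristic_polynomial_2x2(symmetric), real_eigenvalues_2x2(symmetric))
print(characteristic_polynomial_2x2(triangular), real_eigenvalues_2x2(triangular))
```

For a symmetric matrix the discriminant equals $(t_{11} - t_{22})^2 + 4t_{12}^2 \ge 0$, so the roots are always real, in agreement with Theorem 3.1.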
However, $T$ need not have any eigenvectors! Consider, for example, a 90° rotation in the Cartesian plane. This is the map $T: \langle x, y\rangle \mapsto \langle -y, x\rangle$. Thus $T(\delta^1) = \delta^2$ and $T(\delta^2) = -\delta^1$, so the matrix of $T$ is
$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
Then the matrix of $T - \lambda$ is
$$\begin{pmatrix} -\lambda & -1 \\ 1 & -\lambda \end{pmatrix},$$
and the characteristic polynomial of $T$ is the determinant of this matrix: $\lambda^2 + 1$. Since this polynomial is irreducible over $\mathbb{R}$, there are no eigenvalues.

Note how different the outcome is if we consider the transformation with the same matrix on complex 2-space $\mathbb{C}^2$. Here the scalar field is the complex number system, and $T$ is the map $\langle z_1, z_2\rangle \mapsto \langle -z_2, z_1\rangle$ from $\mathbb{C}^2$ to $\mathbb{C}^2$. But now $\lambda^2 + 1 = (\lambda + i)(\lambda - i)$, and $T$ has eigenvalues $\pm i$! To find the eigenvectors for $i$, we solve $T(z) = iz$, which is the equation $\langle -z_2, z_1\rangle = \langle iz_1, iz_2\rangle$, or $z_2 = -iz_1$. Thus $\langle 1, -i\rangle$ (or $i\langle 1, -i\rangle = \langle i, 1\rangle$) is the unique eigenvector for $i$ to within a scalar multiple.

We return to our real theory. If $T \in \mathrm{Hom}\,V$ and $n = d(V)$, so that $d(\mathrm{Hom}\,V) = n^2$, then the set of $n^2 + 1$ vectors $\{T^i\}_0^{n^2}$ in $\mathrm{Hom}\,V$ is dependent. But this is exactly the same as saying that $p(T) = 0$ for some polynomial $p$ of degree $\le n^2$. That is, any $T$ in $\mathrm{Hom}\,V$ satisfies a polynomial identity $p(T) = 0$. Now suppose that $r$ is an eigenvalue of $T$ and that $T(\xi) = r\xi$. Then $p(T)(\xi) = p(r)\xi = 0$, and so $p(r) = 0$. That is, every eigenvalue of $T$ is a root of the polynomial $p$. Conversely, if $p(r) = 0$, then we know from the remainder theorem of algebra that $t - r$ is a factor of the polynomial $p(t)$, and therefore $(t - r)^m$ will be one of the relatively prime factors of $p$. Now suppose that $p$ is the minimal polynomial of $T$ (see Exercise 3.5). Theorem 5.5, Chapter 1, tells us that $(T - rI)^m$ is zero on a corresponding subspace $N$ of $V$ and therefore, in particular, that $T - rI$ is not injective when restricted to $N$. That is, $r$ is an eigenvalue. We have proved:

Theorem 3.3.
The eigenvalues of $T$ are zeros (roots) of every polynomial $p(t)$ such that $p(T) = 0$, and are exactly the roots of the minimal polynomial.

EXERCISES

3.1 Use the defining identity $(T\xi, \eta) = (\xi, T\eta)$ to show that the set SA of all self-adjoint elements of $\mathrm{Hom}\,V$ is a subspace. Prove similarly that if $S$ and $T$ are self-adjoint, then $ST$ is self-adjoint if and only if $ST = TS$. Conclude that if $T$ is self-adjoint, then so is $p(T)$ for any polynomial $p$.

3.2 Show that if $T$ is self-adjoint, then $S = T^2$ is nonnegative. Show, therefore, that if $T$ is self-adjoint and $a > 0$, then $T^2 + aI$ cannot be the zero transformation.

3.3 Let $p(t) = t^2 + bt + c$ be an irreducible quadratic polynomial ($b^2 < 4c$), and let $T$ be a self-adjoint transformation. Show that $p(T) \neq 0$. (Complete the square and apply earlier exercises.)
3.4 Let $T$ be self-adjoint and nilpotent ($T^n = 0$ for some $n$). Prove that $T = 0$. This can be done in various ways. One method is to show it first for $n = 2$ and then for $n = 2^m$ by induction. Finally, any $n$ can be bracketed by powers of 2, $2^m \le n < 2^{m+1}$.

3.5 Let $V$ be any vector space, and let $T$ be an element of $\mathrm{Hom}\,V$. Suppose that there is a polynomial $q$ such that $q(T) = 0$, and let $p$ be such a polynomial of minimum degree. Show that $p$ is unique (to within a constant multiple). It is called the minimal polynomial of $T$. Show that if we apply Theorem 5.5 of Chapter 1 to the minimal polynomial $p$ of $T$, then the subspaces $N_i$ must both be nontrivial.

3.6 It is a corollary of the fundamental theorem of algebra that a polynomial with real coefficients can be factored into a product of linear factors $(t - r)$ and irreducible quadratic factors $(t^2 + bt + c)$. Let $T$ be a self-adjoint transformation on a finite-dimensional Hilbert space, and let $p(t)$ be its minimal polynomial. Deduce a new proof of Theorem 3.1 by applying to $p(t)$ the above remark, Theorem 5.5 of Chapter 1, and Exercises 3.1 through 3.4.

3.7 Prove that if $T$ is a self-adjoint transformation on a pre-Hilbert space $V$, then its null space is the orthogonal complement of its range: $N(T) = (R(T))^\perp$. Conclude that if $V$ is a Hilbert space, then a self-adjoint $T$ is injective if and only if its range is dense (in $V$).

3.8 Assuming the above exercise, show that if $V$ is a Hilbert space and $T$ is a self-adjoint element of $\mathrm{Hom}\,V$ that is bounded below (as well as bounded), then $T$ is surjective.

3.9 Let $T$ be self-adjoint and nonnegative, and set $m = \mathrm{lub}\,\{(T\xi, \xi) : \|\xi\| = 1\}$. Use the Schwarz inequality and the inequality of Lemma 3.2 to show that $m = \|T\|$.

3.10 Let $V$ be a Hilbert space, let $T$ be a self-adjoint element of $\mathrm{Hom}\,V$, and set $m = \mathrm{lub}\,\{(T\xi, \xi) : \|\xi\| = 1\}$. Show that if $a > m$, then $a - T$ ($= aI - T$) is invertible and $\|(a - T)^{-1}\| \le 1/(a - m)$.
(Apply the Schwarz inequality, the definition of $m$, and Exercise 3.8.)

3.11 Let $P$ be a bounded linear transformation on a pre-Hilbert space $V$ that is a projection in the sense of Chapter 1. Prove that if $P$ is self-adjoint, then $P$ is an orthogonal projection. Now prove the converse.

3.12 Let $V$ be a finite-dimensional Hilbert space, let $T$ in $\mathrm{Hom}\,V$ be self-adjoint, and suppose that $S$ in $\mathrm{Hom}\,V$ commutes with $T$. Show that the subspaces $M_j$ of Theorem 3.1 and Lemma 3.3 are invariant under $S$.

3.13 A self-adjoint transformation $T$ on a finite-dimensional Hilbert space $V$ is said to have a simple spectrum if all its eigenvalues are distinct. By this we mean that all the subspaces $M_j$ are one-dimensional. Suppose that $T$ is a self-adjoint transformation with a simple spectrum, and suppose that $S$ commutes with $T$. Show that $S$ is also self-adjoint. (Apply the above exercise.)

3.14 Let $H$ be a Hilbert space, and let $\omega[\xi, \eta]$ be a bounded bilinear form on $H \times H$. That is, there is a constant $b$ such that $|\omega[\xi, \eta]| \le b\|\xi\|\,\|\eta\|$ for all $\xi, \eta \in H$. Show that there is a unique $T$ in $\mathrm{Hom}\,H$ such that $\omega[\xi, \eta] = (\xi, T\eta)$. Show that $T$ is self-adjoint if and only if $\omega$ is symmetric.
4. ORTHOGONAL TRANSFORMATIONS

Assuming that $V$ is a Hilbert space and that therefore $\theta: V \to V^*$ is an isomorphism, we can of course replace the adjoint $T^* \in \mathrm{Hom}\,V^*$ of any $T \in \mathrm{Hom}\,V$ by the corresponding transformation $\theta^{-1} \circ T^* \circ \theta \in \mathrm{Hom}\,V$. In Hilbert space theory it is this mapping that is called the adjoint of $T$ and is designated $T^*$. Then, exactly as in our discussion of a self-adjoint $T$, we see that
$$(T\alpha, \beta) = (\alpha, T^*\beta) \quad\text{for all } \alpha, \beta \in V$$
and that $T^*$ is uniquely defined by this identity. Finally, $T$ is self-adjoint if and only if $T = T^*$.

Although it really amounts to the above way of introducing $T^*$ into $\mathrm{Hom}\,V$, we can make a direct definition as follows. For each $\eta$ the mapping $\xi \mapsto (T\xi, \eta)$ is linear and bounded, and so is an element of $V^*$, which, by Theorem 2.4, is given by a unique element $\beta_\eta$ in $V$ according to the formula $(T\xi, \eta) = (\xi, \beta_\eta)$. Now we check that $\eta \mapsto \beta_\eta$ is linear and bounded and is therefore an element of $\mathrm{Hom}\,V$ which we call $T^*$, etc.

The matrix calculations of Lemma 3.1 generalize verbatim to show that the matrix of $T^*$ in $\mathrm{Hom}\,V$ is the transpose $\mathbf{t}^*$ of the matrix $\mathbf{t}$ of $T$.

Another very important type of transformation on a Hilbert space is one that preserves the scalar product.

Definition. A transformation $T \in \mathrm{Hom}\,V$ is orthogonal if $(T\alpha, T\beta) = (\alpha, \beta)$ for all $\alpha, \beta \in V$.

By the basic adjoint identity above this is entirely equivalent to $(\alpha, T^*T\beta) = (\alpha, \beta)$ for all $\alpha, \beta$, and hence to $T^*T = I$. An orthogonal $T$ is injective, since $\|T\alpha\|^2 = \|\alpha\|^2$, and is therefore invertible if $V$ is finite-dimensional. Whether $V$ is finite-dimensional or not, if $T$ is invertible, then the above condition becomes $T^* = T^{-1}$.

If $T \in \mathrm{Hom}\,\mathbb{R}^n$, the matrix form of the equation $T^*T = I$ is of course $\mathbf{t}^*\mathbf{t} = \mathbf{e}$, and if this is written out, it becomes
$$\sum_{k=1}^n t_{ki}t_{kj} = \delta^i_j \quad\text{for all } i, j,$$
which simply says that the columns of $\mathbf{t}$ form an orthonormal set (and hence a basis) in $\mathbb{R}^n$. We thus have:

Theorem 4.1.
A transformation $T \in \mathrm{Hom}\,\mathbb{R}^n$ is orthogonal if and only if the image of the standard basis $\{\delta^i\}_1^n$ under $T$ is another orthonormal basis (with respect to the standard scalar product).

The necessity of this condition is, of course, obvious from the scalar-product-preserving definition of orthogonality, and the sufficiency can also be checked directly using the basis expansions of $\alpha$ and $\beta$.

We can now state the eigenbasis theorem in different terms. By a diagonal matrix we mean a matrix which is zero everywhere except on the main diagonal.
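The matrix criterion $\mathbf{t}^*\mathbf{t} = \mathbf{e}$ is easy to test directly. The following minimal sketch does so with exact rational arithmetic; the helper names and the two sample matrices (a rotation built from the 3-4-5 triangle and a shear) are assumptions chosen for illustration, not taken from the text.

```python
from fractions import Fraction

def transpose(t):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*t)]

def mat_mul(a, b):
    """Matrix product a*b for matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_orthogonal(t):
    """True when t* t = e, i.e. when the columns of t are orthonormal."""
    n = len(t)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return mat_mul(transpose(t), t) == identity

# Hypothetical examples: a rotation with rational entries, and a shear.
rotation = [[Fraction(3, 5), Fraction(-4, 5)],
            [Fraction(4, 5), Fraction(3, 5)]]
shear = [[1, 1], [0, 1]]

print(is_orthogonal(rotation))
print(is_orthogonal(shear))
```

The columns of `rotation` are the images of the standard basis vectors, so the first result is the statement of Theorem 4.1 for this particular map; the shear fails because its second column has length $\sqrt{2}$.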
Theorem 4.2. Let $\mathbf{t} = \{t_{ij}\}$ be a symmetric $n \times n$ matrix. Then there exists an orthogonal $n \times n$ matrix $\mathbf{b}$ such that $\mathbf{b}^{-1}\mathbf{t}\mathbf{b}$ is a diagonal matrix.

Proof. Since the transformation $T \in \mathrm{Hom}\,\mathbb{R}^n$ defined by $\mathbf{t}$ is self-adjoint, there exists an orthonormal basis $\{\mathbf{b}^i\}_1^n$ of eigenvectors of $T$, with corresponding eigenvalues $\{r_i\}_1^n$. Let $B$ be the orthogonal transformation defined by $B(\delta^j) = \mathbf{b}^j$, $j = 1, \ldots, n$. (The $n$-tuples $\mathbf{b}^j$ are the columns of the matrix $\mathbf{b} = \{b_{ij}\}$ of $B$.) Then $(B^{-1} \circ T \circ B)(\delta^j) = r_j\delta^j$. Since $(B^{-1} \circ T \circ B)(\delta^j)$ is the $j$th column of $\mathbf{b}^{-1}\mathbf{t}\mathbf{b}$, we see that $\mathbf{s} = \mathbf{b}^{-1}\mathbf{t}\mathbf{b}$ is diagonal, with $s_{jj} = r_j$. □

For later applications we are also going to want the following result.

Theorem 4.3. Any invertible $T \in \mathrm{Hom}\,V$ on a finite-dimensional Hilbert space $V$ can be expressed in the form $T = RS$, where $R$ is orthogonal and $S$ is self-adjoint and positive.

Proof. For any $T$, $T^*T$ is self-adjoint, since $(T^*T)^* = T^*T^{**} = T^*T$. Let $\{\varphi_i\}_1^n$ be an orthonormal eigenbasis, and let $\{r_i\}_1^n$ be the corresponding eigenvalues of $T^*T$. Then
$$0 < \|T\varphi_i\|^2 = (T^*T\varphi_i, \varphi_i) = (r_i\varphi_i, \varphi_i) = r_i$$
for each $i$. Since all the eigenvalues of $T^*T$ are thus positive, we can define a positive square root $S = (T^*T)^{1/2}$ by $S\varphi_i = (r_i)^{1/2}\varphi_i$, $i = 1, 2, \ldots, n$. It is clear that $S^2 = T^*T$ and that $S$ is self-adjoint. Then $A = ST^{-1}$ is orthogonal, for
$$(ST^{-1}\alpha, ST^{-1}\beta) = (T^{-1}\alpha, S^2T^{-1}\beta) = (T^{-1}\alpha, T^*TT^{-1}\beta) = (T^{-1}\alpha, T^*\beta) = (TT^{-1}\alpha, \beta) = (\alpha, \beta).$$
Since $T = A^{-1}S$, we set $R = A^{-1}$ and have the theorem. □

It is not hard to see that the above factorization of $T$ is unique. Also, by starting with $TT^*$, we can express $T$ in the form $T = SR$, where $S$ is self-adjoint and positive and $R$ is orthogonal. We call these factorizations the polar decompositions of $T$, since they function somewhat like the polar coordinate factorization $z = re^{i\theta}$ of a complex number.

Corollary. Any nonsingular $n \times n$ matrix $\mathbf{t}$ can be expressed as $\mathbf{t} = \mathbf{u}\mathbf{d}\mathbf{v}$, where $\mathbf{u}$ and $\mathbf{v}$ are orthogonal and $\mathbf{d}$ is diagonal.

Proof.
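From the theorem we have $\mathbf{t} = \mathbf{r}\mathbf{s}$, where $\mathbf{r}$ is orthogonal and $\mathbf{s}$ is symmetric. By Theorem 4.2, $\mathbf{s} = \mathbf{b}\mathbf{d}\mathbf{b}^{-1}$, where $\mathbf{d}$ is diagonal and $\mathbf{b}$ is orthogonal. Thus $\mathbf{t} = \mathbf{r}\mathbf{s} = (\mathbf{r}\mathbf{b})\mathbf{d}\mathbf{b}^{-1} = \mathbf{u}\mathbf{d}\mathbf{v}$, where $\mathbf{u} = \mathbf{r}\mathbf{b}$ and $\mathbf{v} = \mathbf{b}^{-1}$ are both orthogonal. □

EXERCISES

4.1 Let $V$ be a Hilbert space, and suppose that $S$ and $T$ in $\mathrm{Hom}\,V$ satisfy $(T\xi, \eta) = (\xi, S\eta)$ for all $\xi, \eta$. Write out the proof of the identity $S = \theta^{-1} \circ T^* \circ \theta$.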
4.2 Write out the analogue of the proof of Lemma 3.1 which shows that the matrix of $T^*$ is the transpose of the matrix of $T$.

4.3 Once again show that if $(\xi, \eta) = (\xi, \zeta)$ for all $\xi$, then $\eta = \zeta$. Conclude that if $S$, $T$ in $\mathrm{Hom}\,V$ are such that $(\xi, T\eta) = (\xi, S\eta)$ for all $\xi$, $\eta$, then $T = S$.

4.4 Let $\{\mathbf{a}, \mathbf{b}\}$ be an orthonormal basis for $\mathbb{R}^2$, and let $\mathbf{t}$ be the $2 \times 2$ matrix whose columns are $\mathbf{a}$ and $\mathbf{b}$. Show by direct calculation that the rows of $\mathbf{t}$ are also orthonormal.

4.5 State again why it is that if $V$ is finite-dimensional, and if $S$ and $T$ in $\mathrm{Hom}\,V$ satisfy $S \circ T = I$, then $T$ is invertible and $S = T^{-1}$. Now let $V$ be a finite-dimensional Hilbert space, and let $T$ be an orthogonal transformation in $\mathrm{Hom}\,V$. Show that $T^*$ is also orthogonal.

4.6 Let $\mathbf{t}$ be an $n \times n$ matrix whose columns form an orthonormal basis for $\mathbb{R}^n$. Prove that the rows of $\mathbf{t}$ also form an orthonormal basis. (Apply the above exercise.)

4.7 Show that a nonnegative self-adjoint transformation $S$ on a finite-dimensional Hilbert space has a uniquely determined nonnegative self-adjoint square root.

4.8 Prove that if $V$ is a finite-dimensional Hilbert space and $T \in \mathrm{Hom}\,V$, then the "polar decomposition" of $T$, $T = RS$, of Theorem 4.3 is unique. (Apply the above exercise.)

5. COMPACT TRANSFORMATIONS

Theorem 3.1 breaks down when $V$ is an infinite-dimensional Hilbert space. A self-adjoint transformation $T$ does not in general have enough eigenvectors to form a basis for $V$, and a more sophisticated analysis, allowing for a "continuous spectrum" as well as a "discrete spectrum", is necessary. This richer situation is why Hilbert space theory requires further study at the graduate level, and it is one of the sources of complexity in the mathematical structure of quantum mechanics. However, there is one very important special case in which the eigenbasis theorem is available, and which will have a startling application in the next chapter.

Definition.
Let $V$ and $W$ be any normed linear spaces, and let $S$ be the unit ball in $V$. A transformation $T$ in $\mathrm{Hom}(V, W)$ is compact if the closure of $T[S]$ in $W$ is sequentially compact.

Theorem 5.1. Let $V$ be any pre-Hilbert space, and let $T \in \mathrm{Hom}\,V$ be self-adjoint and compact. Then the pre-Hilbert space $R = \mathrm{range}(T)$ has an orthonormal basis $\{\varphi_i\}$ consisting entirely of eigenvectors of $T$, and the corresponding sequence of eigenvalues $\{r_n\}$ converges to $0$ (or is finite).

Proof. The proof is just like that of Theorem 3.1 except that we have to start a little differently. Set $m = \|T\| = \mathrm{lub}\,\{\|T(\xi)\| : \|\xi\| = 1\}$, and choose a sequence $\{\xi_n\}$ such that $\|\xi_n\| = 1$ for all $n$ and $\|T(\xi_n)\| \to m$. Then
$$((m^2 - T^2)\xi_n, \xi_n) = m^2 - \|T(\xi_n)\|^2 \to 0,$$
and since $m^2 - T^2$ is a nonnegative self-adjoint transformation, Lemma 3.2 tells us that $(m^2 - T^2)(\xi_n) \to 0$. But since $T$ is compact, we can suppose (passing to a subsequence if necessary) that $\{T\xi_n\}$ converges, say to $\beta$. Then $T^2\xi_n \to T\beta$, and so $m^2\xi_n \to T\beta$ also. Thus $\xi_n \to T\beta/m^2$ and $\beta = \lim T\xi_n = T^2(\beta)/m^2$. Since $\|\beta\| = \lim \|T(\xi_n)\| = m$, we have a nonzero vector $\beta$ such that $T^2(\beta) = m^2\beta$. Set $\alpha = \beta/\|\beta\|$.

We have thus found a vector $\alpha$ such that $\|\alpha\| = 1$ and $0 = (m^2 - T^2)(\alpha) = (m - T)(m + T)(\alpha)$. Then either $(m + T)(\alpha) = 0$, in which case $T(\alpha) = -m\alpha$, or $(m + T)(\alpha) = \gamma \neq 0$ and $(m - T)\gamma = 0$, in which case $T\gamma = m\gamma$. Thus there exists a vector $\varphi_1$ ($\alpha$ or $\gamma/\|\gamma\|$) such that $\|\varphi_1\| = 1$ and $T(\varphi_1) = r_1\varphi_1$, where $|r_1| = m$.

We now proceed just as in Theorem 3.1. For notational consistency we set $m_1 = m$, $V_1 = V$, and now set $V_2 = \{\varphi_1\}^\perp$. Then $T[V_2] \subset V_2$, since if $\alpha \perp \varphi_1$, then $(T\alpha, \varphi_1) = (\alpha, T\varphi_1) = r_1(\alpha, \varphi_1) = 0$. Thus $T \restriction V_2$ is compact and self-adjoint, and if $m_2 = \|T \restriction V_2\|$, there exists $\varphi_2$ with $\|\varphi_2\| = 1$ and $T(\varphi_2) = r_2\varphi_2$, where $|r_2| = m_2$. We continue inductively, obtaining an orthonormal sequence $\{\varphi_n\} \subset V$ and a sequence $\{r_n\} \subset \mathbb{R}$ such that $T\varphi_n = r_n\varphi_n$ and $|r_n| = \|T \restriction V_n\|$, where $V_n = \{\varphi_1, \ldots, \varphi_{n-1}\}^\perp$.

We suppose for the moment, since this is the most interesting case, that $r_n \neq 0$ for all $n$. Then we claim that $|r_n| \to 0$. For $|r_n|$ is decreasing in any case, and if it does not converge to $0$, then there exists a $b > 0$ such that $|r_n| \ge b$ for all $n$. Then
$$\|T(\varphi_i) - T(\varphi_j)\|^2 = \|r_i\varphi_i - r_j\varphi_j\|^2 = \|r_i\varphi_i\|^2 + \|r_j\varphi_j\|^2 = r_i^2 + r_j^2 \ge 2b^2$$
for all $i \neq j$, and the sequence $\{T(\varphi_i)\}$ can have no convergent subsequence, contradicting the compactness of $T$. Therefore $|r_n| \downarrow 0$.

Finally, we have to show that the orthonormal sequence $\{\varphi_i\}$ is a basis for $R$.
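If $\beta = T(\alpha)$, and if $\{b_n\}$ and $\{a_n\}$ are the Fourier coefficients of $\beta$ and $\alpha$, then we expect that $b_n = r_na_n$, and this is easy to check: $b_n = (\beta, \varphi_n) = (T(\alpha), \varphi_n) = (\alpha, T(\varphi_n)) = (\alpha, r_n\varphi_n) = r_n(\alpha, \varphi_n) = r_na_n$. This is just saying that $T(a_n\varphi_n) = b_n\varphi_n$, and therefore
$$\beta - \sum_1^n b_i\varphi_i = T\Big(\alpha - \sum_1^n a_i\varphi_i\Big).$$
Now $\alpha - \sum_1^n a_i\varphi_i$ is orthogonal to $\{\varphi_i\}_1^n$ and therefore is an element of $V_{n+1}$, and the norm of $T$ on $V_{n+1}$ is $|r_{n+1}|$. Moreover, $\|\alpha - \sum_1^n a_i\varphi_i\| \le \|\alpha\|$, by the Pythagorean theorem. Altogether we can conclude that
$$\Big\|\beta - \sum_1^n b_i\varphi_i\Big\| \le |r_{n+1}|\,\|\alpha\|,$$
and since $r_{n+1} \to 0$, this implies that $\beta = \sum_1^\infty b_i\varphi_i$. Thus $\{\varphi_i\}$ is a basis for $R(T)$. Also, since $T$ is self-adjoint, $N(T) = R(T)^\perp = \{\varphi_i\}^\perp = \bigcap_1^\infty V_i$.

If some $r_i = 0$, then there is a first $n$ such that $r_n = 0$. In this case $\|T \restriction V_n\| = |r_n| = 0$, so that $V_n \subset N(T)$. But $\varphi_i \in R(T)$ if $i < n$, since then $\varphi_i = T(\varphi_i)/r_i$, and so $N(T) = R(T)^\perp \subset \{\varphi_1, \ldots, \varphi_{n-1}\}^\perp = V_n$. Therefore, $N(T) = V_n$ and $R(T)$ is the span of $\{\varphi_i\}_1^{n-1}$. □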
CHAPTER 6

DIFFERENTIAL EQUATIONS

This chapter is not a small differential equations textbook; we leave out far too much. We are principally concerned with some of the theory of the subject, although we shall say one or two practical things. Our first goal is the fundamental existence and uniqueness theorem of ordinary differential equations, which we prove as an elegant application of the fixed-point theorem. Next we look at the linear theory, where we make vital use of material from the first two chapters and get quite specific about the process of actually finding solutions. So far our development is linked to the initial-value problem, concerning the existence of, and in some cases the ways of finding, a unique solution passing through some initially prescribed point in the space containing the solution curves. However, some of the most important aspects of the subject relate to what are called boundary-value problems, and our last and most sophisticated effort will be directed toward making a first step into this large area. This will involve us in the theory of Chapter 5, for we shall find ourselves studying self-adjoint operators. In fact, the basic theorem about Fourier series expansions will come out of recognizing a certain right inverse of a differential operator to be a compact self-adjoint operator.

1. THE FUNDAMENTAL THEOREM

Let $A$ be an open subset of a Banach space $W$, let $I$ be an open interval in $\mathbb{R}$, and let $F: I \times A \to W$ be continuous. We want to study the differential equation
$$d\alpha/dt = F(t, \alpha).$$
A solution of this equation is a function $f: J \to A$, where $J$ is an open subinterval of $I$, such that $f'(t)$ exists and
$$f'(t) = F(t, f(t))$$
for every $t$ in $J$. Note that a solution $f$ has to be continuously differentiable, for the existence of $f'$ implies the continuity of $f$, and then $f'(t) = F(t, f(t))$ is continuous by the continuity of $F$.
We are going to see that if $F$ has a continuous second partial differential, then there exists a uniquely determined "local" solution through any point $\langle t_0, \alpha_0\rangle \in I \times A$.
In saying that the solution $f$ goes through $\langle t_0, \alpha_0\rangle$, we mean, of course, that $\alpha_0 = f(t_0)$. The requirement that the solution $f$ have the value $\alpha_0$ when $t = t_0$ is called an initial condition.

The existence and continuity of $dF^2_{\langle t, \alpha\rangle}$ implies, via the mean-value theorem, that $F(t, \alpha)$ is locally uniformly Lipschitz in $\alpha$. By this we mean that for any point $\langle t_0, \alpha_0\rangle$ in $I \times A$ there is a neighborhood $M \times N$ and a constant $b$ such that $\|F(t, \xi) - F(t, \eta)\| \le b\|\xi - \eta\|$ for all $t$ in $M$ and all $\xi$, $\eta$ in $N$. To see this we simply choose balls $M$ and $N$ about $t_0$ and $\alpha_0$ such that $dF^2_{\langle t, \alpha\rangle}$ is bounded, say by $b$, on $M \times N$, and apply Theorem 7.4 of Chapter 3. This is the condition that we actually use below.

Theorem 1.1. Let $A$ be an open subset of a Banach space $W$, let $I$ be an open interval in $\mathbb{R}$, and let $F$ be a continuous mapping from $I \times A$ to $W$ which is locally uniformly Lipschitz in its second variable. Then for any point $\langle t_0, \alpha_0\rangle$ in $I \times A$, for some neighborhood $U$ of $\alpha_0$ and for any sufficiently small interval $J$ containing $t_0$, there is a unique function $f$ from $J$ to $U$ which is a solution of the differential equation passing through the point $\langle t_0, \alpha_0\rangle$.

Proof. If $f$ is a solution on $J$ through $\langle t_0, \alpha_0\rangle$, then an integration gives
$$f(t) - f(t_0) = \int_{t_0}^t F(s, f(s))\,ds,$$
so that
$$f(t) = \alpha_0 + \int_{t_0}^t F(s, f(s))\,ds$$
for $t \in J$. Conversely, if $f$ satisfies this "integral equation", then the fundamental theorem of the calculus implies that $f'(t)$ exists and equals $F(t, f(t))$ on $J$, so that $f$ is a solution of the differential equation which clearly goes through $\langle t_0, \alpha_0\rangle$. Now for any continuous $f: J \to A$ we can define $g: J \to W$ by
$$g(t) = \alpha_0 + \int_{t_0}^t F(s, f(s))\,ds,$$
and our argument above shows that $f$ is a solution of the differential equation if and only if $f$ is a fixed point of the mapping $K: f \mapsto g$. This suggests that we try to show that $K$ is a contraction, so that we can apply the fixed-point theorem.
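The mapping $K$ can be iterated explicitly in simple cases. The sketch below assumes the scalar equation $d\alpha/dt = \alpha$ with $t_0 = 0$ and $\alpha_0 = 1$, an example chosen here and not taken from the text, so that every iterate is a polynomial and $K(f)(t) = 1 + \int_0^t f(s)\,ds$ can be computed exactly on coefficient lists. The names `picard_step` and `evaluate` are hypothetical.

```python
from fractions import Fraction

def picard_step(coeffs, alpha0=Fraction(1)):
    """One application of K for da/dt = a with t0 = 0:
    K(f)(t) = alpha0 + integral from 0 to t of f(s) ds.
    A polynomial is a list of coefficients, lowest degree first."""
    # Integrating c*s^k from 0 to t gives c*t^(k+1)/(k+1).
    result = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]
    result[0] = alpha0
    return result

def evaluate(coeffs, t):
    """Value of the polynomial at t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))

# Start from the constant function with value alpha0 and apply K four times.
f = [Fraction(1)]
for _ in range(4):
    f = picard_step(f)

print(f)
print(evaluate(f, Fraction(1, 2)))
```

In this example the iterates are the Taylor partial sums of $e^t$, the fixed point of $K$; the proof shows in general that such iterates converge uniformly on a sufficiently short interval $J$.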
We start by choosing a neighborhood $L \times U$ of $\langle t_0, \alpha_0\rangle$ on which $F(t, \alpha)$ is bounded and Lipschitz in $\alpha$ uniformly over $t$. Let $J$ be some open subinterval of $L$ containing $t_0$, and let $V$ be the Banach space $\mathcal{BC}(J, W)$ of bounded continuous functions from $J$ to $W$. Our later calculation will show how small we have to take $J$. We assume that the neighborhood $U$ is a ball about $\alpha_0$ of radius $r$, and we consider the ball of functions $\mathcal{U} = B_r(\bar\alpha_0)$ in $V$, where $\bar\alpha_0$ is the constant function with value $\alpha_0$. Then any $f$ in $\mathcal{U}$ has its range in $U$, so that $F(t, f(t))$ is defined, bounded, and continuous. That is, $K$ as defined earlier maps the ball $\mathcal{U}$ into $V$.
We now calculate. Let $F$ be bounded by $m$ on $L \times U$ and let $\delta$ be the length of $J$. Then
$$\|K(\bar\alpha_0) - \bar\alpha_0\|_\infty = \mathrm{lub}\,\Big\{\Big\|\int_{t_0}^t F(s, \alpha_0)\,ds\Big\| : t \in J\Big\} \le \delta m \qquad (1)$$
by the norm inequality for integrals (see Section 10 of Chapter 4). Also, if $f_1$ and $f_2$ are in $\mathcal{U}$, and if $c$ is a Lipschitz constant for $F$ on $L \times U$, then
$$\|K(f_1) - K(f_2)\|_\infty = \mathrm{lub}\,\Big\{\Big\|\int_{t_0}^t \big[F(s, f_1(s)) - F(s, f_2(s))\big]\,ds\Big\| : t \in J\Big\} \le \delta\,\mathrm{lub}\,\{\|F(s, f_1(s)) - F(s, f_2(s))\|\} \le \delta c\,\mathrm{lub}\,\{\|f_1(s) - f_2(s)\|\} = \delta c\|f_1 - f_2\|_\infty. \qquad (2)$$
From (2) we see that $K$ is a contraction with constant $C = \delta c$ if $\delta c < 1$, and from (1) we see that $K$ moves the center $\bar\alpha_0$ of the ball $\mathcal{U}$ a distance less than $(1 - C)r$ if $\delta m < (1 - \delta c)r$. This double requirement on $\delta$ is equivalent to
$$\delta < \frac{r}{m + cr},$$
and with any such $\delta$ the theorem follows from a corollary of the fixed-point theorem (Corollary 2 of Theorem 9.1, Chapter 4). □

Corollary. The theorem holds if $F: I \times A \to W$ is continuous and has a continuous second partial differential.

We next show that any two solutions through $\langle t_0, \alpha_0\rangle$ must agree on the intersection of their domains (under the hypotheses of Theorem 1.1).

Lemma 1.1. Let $g_1$ and $g_2$ be any two solutions of $d\alpha/dt = F(t, \alpha)$ through $\langle t_0, \alpha_0\rangle$. Then $g_1(t) = g_2(t)$ for all $t$ in the intersection $J = J_1 \cap J_2$ of their domains.

Proof. Otherwise there is a point $s$ in $J$ such that $g_1(s) \neq g_2(s)$. Suppose that $s > t_0$, and set $C = \{t : t > t_0 \text{ and } g_1(t) \neq g_2(t)\}$ and $x = \mathrm{glb}\,C$. The set $C$ is open, since $g_1$ and $g_2$ are continuous, and therefore $x$ is not in $C$. That is, $g_1(x) = g_2(x)$. Call this common value $\alpha$ and apply the theorem to $\langle x, \alpha\rangle$. With $r$ such that $B_r(\alpha) \subset A$, we choose $\delta$ small enough so that the differential equation has a unique solution $g$ from $(x - \delta, x + \delta)$ to $B_r(\alpha)$ passing through $\langle x, \alpha\rangle$, and we also take $\delta$ small enough so that the restrictions of $g_1$ and $g_2$ to $(x - \delta, x + \delta)$ have ranges in $B_r(\alpha)$. But then $g_1 = g_2 = g$ on this interval by the uniqueness of $g$, and this contradicts the definition of $x$. Therefore, $g_1 = g_2$ on the intersection of their domains.
□

This lemma allows us to remove the restriction on the range of $f$ in the theorem.

Theorem 1.2. Let $A$, $I$, and $F$ be as in Theorem 1.1. Then for any point $\langle t_0, \alpha_0\rangle$ in $I \times A$ and any sufficiently small interval neighborhood $J$ of $t_0$, there is a unique solution from $J$ to $A$ passing through $\langle t_0, \alpha_0\rangle$.
Fig. 6.1
Global solutions. The solutions we have found for the differential equation $d\alpha/dt = F(t, \alpha)$ are defined only in sufficiently small neighborhoods of the initial point $t_0$ and are accordingly called local solutions. Now if we run along to a point $\langle t_1, \alpha_1 \rangle$ near the end of such a local solution and then consider the local solution about $\langle t_1, \alpha_1 \rangle$, first of all it will have to agree with our first solution on the intersection of the two domains, and secondly it will in general extend farther beyond $t_1$ than the first solution, so the two local solutions will fit together to make a solution on a larger $t$-interval than either gives separately. We can continue in this way to extend our original solution to what might be called a global solution, made up of a patchwork of matching local solutions. These notions are somewhat vague as described above, and we now turn to a more precise construction of a global solution.
Given $\langle t_0, \alpha_0 \rangle \in I \times A$, let $\mathcal{F}$ be the family of all solutions through $\langle t_0, \alpha_0 \rangle$. Thus $g \in \mathcal{F}$ if and only if $g$ is a solution on an interval $J \subset I$, $t_0 \in J$, and $g(t_0) = \alpha_0$. Lemma 1.1 shows exactly that the union† $f$ of all the functions $g$ in $\mathcal{F}$ is itself a function, for if $\langle t_1, \alpha_1 \rangle \in g_1$ and $\langle t_1, \alpha_2 \rangle \in g_2$, then $\alpha_1 = g_1(t_1) = g_2(t_1) = \alpha_2$. Moreover, $f$ is a solution, because around any $x$ in its domain $f$ agrees with some $g \in \mathcal{F}$. By the way $f$ was defined we see that $f$ is the unique maximal solution through $\langle t_0, \alpha_0 \rangle$. We have thus proved the following theorem.
Theorem 1.3. Let $F: I \times A \to W$ be a function satisfying the hypotheses of Theorem 1.1. Then through each $\langle t_0, \alpha_0 \rangle$ in $I \times A$ there is a uniquely determined maximal solution to the differential equation $d\alpha/dt = F(t, \alpha)$.
In general, we would have to expect a maximal solution to "run into the boundary of $A$" and therefore to have a domain interval $J$ properly included in $I$, as Fig. 6.1 suggests.
† Remember that we are taking a function to be a set of ordered pairs, so that the union of a family of functions makes precise sense.
However, if $A$ is the whole space $W$, and if $F(t, \alpha)$ is Lipschitz in $\alpha$ for each $t$, with a Lipschitz bound $c(t)$ that is continuous in $t$, then we can show that each maximal solution is over the whole of $I$. We shall shortly see that this condition is a natural one for the linear equation.
Theorem 1.4. Let $W$ be a Banach space, and let $I$ be an open interval in $\mathbb{R}$. Let $F: I \times W \to W$ be continuous, and suppose that there is a continuous function $c: I \to \mathbb{R}$ such that
$$\|F(t, \alpha_1) - F(t, \alpha_2)\| \le c(t)\|\alpha_1 - \alpha_2\|$$
for all $t$ in $I$ and all $\alpha_1, \alpha_2$ in $W$. Then each maximal solution to the differential equation $d\alpha/dt = F(t, \alpha)$ has the whole of $I$ for its domain.
Proof. Suppose, on the contrary, that $g$ is a maximal solution whose domain interval $J$ has right-hand endpoint $b$ less than that of $I$. We choose a finite open interval $L$ containing $b$ and such that $\bar{L} \subset I$ (see Fig. 6.2). Since $\bar{L}$ is compact, the continuous function $c(t)$ has a maximum value $c$ on $\bar{L}$. We choose any $t_1$ in $L \cap J$ close enough to $b$ so that $b - t_1 < 1/c$, and we set $\alpha_1 = g(t_1)$ and $m = \max \|F(t, \alpha_1)\|$ on $\bar{L}$. With these values of $c$ and $m$, and with any $r$, the proof of Theorem 1.1 gives us a local solution $f$ through $\langle t_1, \alpha_1 \rangle$ with domain $(t_1 - \delta, t_1 + \delta)$ for any $\delta$ less than $r/(m + rc) = 1/(c + (m/r))$. Since we now have no restriction on $r$ (because $A = W$), this bound on $\delta$ becomes $1/c$, and since we chose $t_1$ so that $t_1 + (1/c) > b$, we can now choose $\delta$ so that $t_1 + \delta > b$. But this gives us a contradiction; the maximal solution $g$ through $\langle t_1, \alpha_1 \rangle$ includes the local solution $f$, so that, in particular, $t_1 + \delta \le b$. We have thus proved the theorem. □
Fig. 6.2
Going back to our original situation, we can conclude that if the Lipschitz control of $F$ is of the stronger type assumed above, and if the domain $J$ of some maximal solution $g$ is less than $I$, then the open set $A$ cannot be the whole of $W$.
It is in fact true that the distance from $g(t)$ to the boundary of $A$ approaches zero as $t$ approaches an endpoint $b$ of $J$ which is interior to $I$. That is, it is now a theorem that $\rho(g(t), A') \to 0$ as $t \to b$. The proof is more complicated than our argument above, and we leave it as a set of exercises for the interested reader.
The $n$th-order equation. Let $A_1, A_2, \ldots, A_n$ be open subsets of a Banach space $W$, let $I$ be an open interval in $\mathbb{R}$, and let $G: I \times A_1 \times A_2 \times \cdots \times A_n \to W$ be continuous. We consider the differential equation
$$d^n\alpha/dt^n = G(t, \alpha, d\alpha/dt, \ldots, d^{n-1}\alpha/dt^{n-1}).$$
A function $f: J \to W$ is a solution to this equation if $J$ is an open subinterval of $I$, $f$ has continuous derivatives on $J$ up to the $n$th order, $f^{(i-1)}[J] \subset A_i$, $i = 1, \ldots, n$, and
$$f^{(n)}(t) = G(t, f(t), f'(t), \ldots, f^{(n-1)}(t))$$
for $t \in J$. An initial condition is now given by a point
$$\langle t_0, \beta_1, \beta_2, \ldots, \beta_n \rangle \in I \times A_1 \times \cdots \times A_n.$$
The basic theorem is almost the same as before. To simplify our notation, let $\boldsymbol{\alpha}$ be the $n$-tuple $\langle \alpha_1, \alpha_2, \ldots, \alpha_n \rangle$ in $W^n = V$, and set $A = \prod_1^n A_i$. Also let $\psi$ be the mapping $f \mapsto \langle f, f', \ldots, f^{(n-1)} \rangle$. Then the solution equation becomes $f^{(n)}(t) = G(t, \psi f(t))$.
Theorem 1.5. Let $G: I \times A \to W$ be as above and suppose, in addition, that $G(t, \boldsymbol{\alpha})$ is locally uniformly Lipschitz in $\boldsymbol{\alpha}$. Then for any $\langle t_0, \boldsymbol{\beta} \rangle$ in $I \times A$ and for any sufficiently small open interval $J$ containing $t_0$, there is a unique function $f$ from $J$ to $W$ such that $f$ is a solution to the above $n$th-order equation satisfying the initial condition $\psi f(t_0) = \boldsymbol{\beta}$.
Proof. There is an ancient and standard device for reducing a single $n$th-order equation to a system of first-order equations. The idea is to replace the single equation
$$d^n\alpha/dt^n = G(t, \alpha, d\alpha/dt, \ldots, d^{n-1}\alpha/dt^{n-1})$$
by the system of equations
$$d\alpha_1/dt = \alpha_2, \quad d\alpha_2/dt = \alpha_3, \quad \ldots, \quad d\alpha_{n-1}/dt = \alpha_n, \quad d\alpha_n/dt = G(t, \alpha_1, \alpha_2, \ldots, \alpha_n),$$
and then to recognize this system as equivalent to a single first-order equation on a different space. In fact, if we define the mapping $F = \langle F^1, \ldots, F^n \rangle$ from $I \times A$ to $V = W^n$ by setting $F^i(t, \boldsymbol{\alpha}) = \alpha_{i+1}$ for $i = 1, \ldots, n - 1$, and $F^n(t, \boldsymbol{\alpha}) = G(t, \boldsymbol{\alpha})$, then the above system becomes the single equation
$$d\boldsymbol{\alpha}/dt = F(t, \boldsymbol{\alpha}),$$
where $F$ is clearly locally uniformly Lipschitz in $\boldsymbol{\alpha}$. Now a function $\mathbf{f} = \langle f_1, \ldots, f_n \rangle$ from $J$ to $V$ is a solution of this equation if and only if
$$f_1' = f_2, \quad f_2' = f_3, \quad \ldots, \quad f_{n-1}' = f_n, \quad f_n' = G(t, f_1, \ldots, f_n),$$
that is, if and only if $f_1$ has derivatives up to order $n$, $\psi(f_1) = \mathbf{f}$, and $f_1^{(n)}(t) = G(t, \psi f_1(t))$. The $n$-tuplet initial condition $\psi f_1(t_0) = \boldsymbol{\beta}$ is now just $\mathbf{f}(t_0) = \boldsymbol{\beta}$. Thus the $n$th-order theorem for $G$ has turned into the first-order theorem for $F$, and so follows from Theorems 1.1 and 1.2. □
The local solution through $\langle t_0, \boldsymbol{\beta} \rangle$ extends to a unique maximal solution by Theorem 1.3 applied to our first-order problem $d\boldsymbol{\alpha}/dt = F(t, \boldsymbol{\alpha})$, and the domain of the maximal solution is the whole of $I$ if $G(t, \boldsymbol{\alpha})$ is Lipschitz in $\boldsymbol{\alpha}$ with a bound $c(t)$ that is continuous and if $A = W^n$, as in Theorem 1.4.
EXERCISES
1.1 Consider the equation $d\alpha/dt = F(t, \alpha)$ in the special case where $W = \mathbb{R}^2$. Write out the equation as a pair of equations involving real-valued functions and real variables.
1.2 Consider the system of differential equations
$$dx/dt = t + x^2 + y^2, \qquad dy/dt = \cos xy.$$
Define the function $F: \mathbb{R}^3 \to \mathbb{R}^2$ so that the above system becomes $d\alpha/dt = F(t, \alpha)$, where $\alpha = \langle x, y \rangle$.
1.3 In the above exercise show that $F$ is uniformly Lipschitz in $\alpha$ on $\mathbb{R} \times A$, where $A$ is any bounded open set in $\mathbb{R}^2$. Is $F$ uniformly Lipschitz on $\mathbb{R} \times \mathbb{R}^2$?
1.4 Write out the above system in terms of a solution function $f = \langle f_1, f_2 \rangle$. Write out for this system the integrated form used in proving Theorem 1.1.
1.5 The fixed-point theorem iteration sequence that we used in proving Theorem 1.1 starts off with $f_0$ as the constant function $\alpha_0$ and then proceeds by
$$f_n(t) = \alpha_0 + \int_{t_0}^{t} F(s, f_{n-1}(s))\,ds.$$
Compute this sequence as far as $f_4$ for the differential equation $dx/dt = t + x$ $[f'(t) = t + f(t)]$ with the initial condition $f(0) = 0$. That is, take $f_0 = 0$ and compute $f_1$, $f_2$, $f_3$, and $f_4$ from the formula. Now guess the solution $f$ and verify it.
1.6 Compute the iterates $f_0$, $f_1$, $f_2$, and $f_3$ for the initial-value problem $dy/dx = x + y^2$, $y(0) = 0$. Supposing that the solution $f$ has a power series expansion about 0, what are its first three nonzero terms?
1.7 Make the computation in the above exercise for the initial condition $f(0) = -1$.
1.8 Do the same for $f(0) = +1$.
1.9 Suppose that $W$ is a Banach space and that $F$ and $G$ are functions from $\mathbb{R} \times W^4$ to $W$ satisfying suitable Lipschitz conditions. Show how the second-order system [...] would be brought under our standard theory by making it into a single second-order equation.
1.10 Answer the above exercise by converting it to a first-order system and then to a single first-order equation.
1.11 Let $\theta$ be a nonnegative, continuous, real-valued function defined on an interval $[0, a] \subset \mathbb{R}$, and suppose that there are constants $b$ and $c > 0$ such that
$$\theta(x) \le c\int_0^x \theta(t)\,dt + bx \quad \text{for all } x \in [0, a].$$
a) Prove by induction that if $m = \|\theta\|_\infty$, then
$$\theta(x) \le \frac{m(cx)^n}{n!} + \frac{b}{c}\sum_{j=1}^{n} \frac{(cx)^j}{j!} \quad \text{for every } n.$$
b) Then prove that $\theta(x) \le \dfrac{b}{c}(e^{cx} - 1)$ for all $x$.
1.12 Let $W$ be a Banach space, let $I$ be an interval in $\mathbb{R}$, and let $F$ be a continuous mapping from $I \times W$ to $W$. Suppose that $\|F(t, \alpha_0)\| \le b$ for all $t \in I$ and that
$$\|F(t, \alpha) - F(t, \beta)\| \le c\|\alpha - \beta\|$$
for all $t$ in $I$ and all $\alpha, \beta$ in $W$. Let $f$ be the global solution through $\langle t_0, \alpha_0 \rangle$, and set $\theta(x) = \|f(t_0 + x) - \alpha_0\|$. Prove that
$$\theta(x) \le c\int_0^x \theta(t)\,dt + bx$$
for $x > 0$ and $t_0 + x$ in $I$. Then use the result in the above exercise to derive a much stronger bound than we have in the text on the growth of the solution $f(t)$ as $t$ goes away from $t_0$.
1.13 With the hypotheses on $F$ as in the above exercise, show that the iteration sequence for the solution through $\langle t_0, \alpha_0 \rangle$ converges on the whole of $I$ by showing inductively that if $f_0 = \alpha_0$ and
$$f_n(t) = \alpha_0 + \int_{t_0}^{t} F(s, f_{n-1}(s))\,ds,$$
then
$$\|f_n(t) - f_{n-1}(t)\| \le \frac{b\,c^{n-1}|t - t_0|^n}{n!}.$$
From these inequalities prove directly that the solution $f$ through $\langle t_0, \alpha_0 \rangle$ satisfies
$$\|f(t) - \alpha_0\| \le \frac{b}{c}\left(e^{c|t - t_0|} - 1\right).$$
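The iteration of Exercise 1.5 is mechanical enough to be carried out by machine. The following Python sketch is an illustration added here, not part of the text: it represents each iterate as a list of exact rational polynomial coefficients and applies $f_n(t) = \int_0^t (s + f_{n-1}(s))\,ds$ for the equation $dx/dt = t + x$ with $f(0) = 0$. The helper names `integrate`, `add`, and `picard_iterates` are hypothetical.

```python
from fractions import Fraction

def integrate(poly):
    """Antiderivative vanishing at 0; poly[k] is the coefficient of t**k."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(poly)]

def add(p, q):
    """Sum of two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def picard_iterates(n_steps):
    """Iterates f_0, ..., f_n for dx/dt = t + x with f(0) = 0."""
    t = [Fraction(0), Fraction(1)]   # the polynomial t
    f = [Fraction(0)]                # f_0 is the constant function 0
    iterates = [f]
    for _ in range(n_steps):
        # f_n(t) = 0 + integral from 0 to t of (s + f_{n-1}(s)) ds
        f = integrate(add(t, f))
        iterates.append(f)
    return iterates

iterates = picard_iterates(4)
for n, f in enumerate(iterates):
    print(n, [str(c) for c in f])
```

The coefficients of $f_4$ are those of $t^2/2! + t^3/3! + t^4/4! + t^5/5!$, which suggests the guess $f(t) = e^t - t - 1$ asked for in the exercise.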
2. DIFFERENTIABLE DEPENDENCE ON PARAMETERS
It is exceedingly important in some applications to know how the solution to the system
$$f'(t) = G(t, f(t)), \qquad f(t_1) = \alpha_1$$
varies with the initial point $\langle t_1, \alpha_1 \rangle$. In order to state the problem precisely, we fix an open interval $J$, set $\mathcal{U} = B_r(\alpha_0) \subset V = \mathcal{BC}(J, W)$ as in the previous section, and require a solution in $\mathcal{U}$ passing through $\langle t_1, \alpha_1 \rangle$, where $\langle t_1, \alpha_1 \rangle$ is near $\langle t_0, \alpha_0 \rangle$. Supposing that a unique solution $f$ exists, we then have a mapping $\langle t_1, \alpha_1 \rangle \mapsto f$, and it is the continuity and differentiability of this map that we wish to study.
Theorem 2.1. Let $L \times U$ be a neighborhood of $\langle t_0, \alpha_0 \rangle$ in the Banach space $\mathbb{R} \times W$, and let $F(t, \alpha)$ be a bounded continuous mapping from $L \times U$ to $W$ which is Lipschitz in $\alpha$ uniformly over $t$. Then there is a neighborhood $J \times N$ of $\langle t_0, \alpha_0 \rangle$ with the following property. For any $\langle t_1, \alpha_1 \rangle$ in $J \times N$ there is a unique function $f$ from $J$ to $U$ which is a solution of the differential equation $d\alpha/dt = F(t, \alpha)$ passing through $\langle t_1, \alpha_1 \rangle$, and the mapping $\langle t_1, \alpha_1 \rangle \mapsto f$ from $J \times N$ to $V$ is continuous.
Proof. We simply reexamine the calculation of Theorem 1.1 and take $\delta$ a little smaller. Let $K(t_1, \alpha_1, f)$ be the mapping of that theorem but with initial point $\langle t_1, \alpha_1 \rangle$, so that $g = K(t_1, \alpha_1, f)$ if and only if $g(t) = \alpha_1 + \int_{t_1}^{t} F(s, f(s))\,ds$ for all $t$ in $J$. Clearly $K$ is continuous in $\langle t_1, \alpha_1 \rangle$ for each fixed $f$. If $N$ is the ball $B_{r/2}(\alpha_0)$, then the inequality (1) in the proof shows that
$$\|K(t_1, \alpha_1, \alpha_0) - \alpha_0\| \le \|\alpha_1 - \alpha_0\| + \delta m < r/2 + \delta m.$$
The second inequality remains unchanged. Therefore, $f \mapsto K(t_1, \alpha_1, f)$ is a map from $\mathcal{U}$ to $V$ which is a contraction with constant $C = \delta c$ if $\delta c < 1$, and which moves the center $\alpha_0$ of $\mathcal{U}$ a distance less than $(1 - C)r$ if $r/2 + \delta m < (1 - \delta c)r$. This new double requirement on $\delta$ is equivalent to
$$\delta < \frac{r}{2(m + cr)},$$
which is just half the old value. With $J$ of length $\delta$, we can now apply Theorem 9.2 of Chapter 4 to the map $K(t_1, \alpha_1, f)$ from $(J \times N) \times \mathcal{U}$ to $V$, and so have our theorem.
□
If we want the map $\langle t_1, \alpha_1 \rangle \mapsto f$ to be differentiable, it is sufficient, by Theorem 9.4 of Chapter 4, to know in addition to the above that $K: (J \times N) \times \mathcal{U} \to V$ is continuously differentiable. And to deduce this, it is sufficient to suppose that $dF$ exists and is uniformly continuous on $L \times U$.
Theorem 2.2. Let $L \times U$ be a neighborhood of $\langle t_0, \alpha_0 \rangle$ in the Banach space $\mathbb{R} \times W$, and let $F(t, \alpha)$ be a bounded mapping from $L \times U$ to $W$ such that $dF$ exists, is bounded, and is uniformly continuous on $L \times U$. Then, in
the context of the above theorem, the solution $f$ is a continuously differentiable function of the initial value $\langle t_1, \alpha_1 \rangle$.
Proof. We have to show that the map $K(t_1, \alpha_1, f)$ from $(J \times N) \times \mathcal{U}$ to $V$ is continuously differentiable, after which we can apply Theorem 9.4 of Chapter 4, as we remarked above. Now the mapping $h \mapsto k$ defined by $k(t) = \int_{t_1}^{t} h(s)\,ds$ is a bounded linear mapping from $V$ to $V$ which clearly depends continuously on $t_1$, and by Theorem 14.3 of Chapter 3 the integrand map $f \mapsto h$ defined by $h(s) = F(s, f(s))$ is continuously differentiable on $\mathcal{U}$. Composing these two maps we see that $dK^3_{\langle t_1, \alpha_1, f \rangle}$ exists and is continuous on $J \times N \times \mathcal{U}$. Now
$$\Delta K^2_{\langle t_1, \alpha_1, f \rangle}(\xi) = \xi,$$
so that $dK^2 = I$, and
$$\Delta K^1_{\langle t_1, \alpha_1, f \rangle}(h) = -\int_{t_1}^{t_1 + h} F(s, f(s))\,ds,$$
from which it follows easily that $dK^1_{\langle t_1, \alpha_1, f \rangle}(h) = -hF(t_1, f(t_1))$. The three partial differentials $dK^1$, $dK^2$, and $dK^3$ thus exist and are continuous on $J \times N \times \mathcal{U}$, and it follows from Theorem 8.3 of Chapter 3 that $K(t_1, \alpha_1, f)$ is continuously differentiable there. □
Corollary. If $s$ is any point in $J$, then the value $f(s)$ of a solution at $s$ is a differentiable function of its value at $t_0$.
Proof. Let $f_\alpha$ be the solution through $\langle t_0, \alpha \rangle$. By the theorem, $\alpha \mapsto f_\alpha$ is a continuously differentiable map from $N$ to the function space $V = \mathcal{BC}(J, W)$. But $\pi_s: f \mapsto f(s)$ is a bounded linear mapping and thus trivially continuously differentiable. Composing these two maps, we see that $\alpha \mapsto f_\alpha(s)$ is continuously differentiable on $N$. □
It is also possible to make the continuous and differentiable dependence of the solution on its initial value $\langle t_0, \alpha_0 \rangle$ into a global affair. The following is the theorem. We shall not go into its proof here.
Theorem 2.3.
Let $f$ be the maximal solution through $\langle t_0, \alpha_0 \rangle$ with domain $J$, and let $[a, b]$ be any finite closed subinterval of $J$ containing $t_0$. Then there exists an $\epsilon > 0$ such that for every $\langle t_1, \alpha_1 \rangle \in B_\epsilon(\langle t_0, \alpha_0 \rangle)$ the domain of the global solution through $\langle t_1, \alpha_1 \rangle$ includes $[a, b]$, and the restriction of this solution to $[a, b]$ is a continuous function of $\langle t_1, \alpha_1 \rangle$. If $F$ satisfies the hypotheses of Theorem 2.2, then this dependence is continuously differentiable.
Finally, suppose that $F$ depends continuously (or continuously differentiably) on a parameter $\lambda$, so that we have $F(\lambda, t, \alpha)$ on $M \times I \times A$. Now the solution $f$ to the initial-value problem
$$f'(t) = F(\lambda, t, f(t)), \qquad f(t_1) = \alpha_1$$
depends on the parameter $\lambda$ as well as on the initial condition $f(t_1) = \alpha_1$, and if the reader has fully understood our arguments above, he will see that we can show in the same way that the dependence of $f$ on $\lambda$ is also continuous (continuously differentiable). We shall not go into these details here.
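The corollary to Theorem 2.2 can be observed numerically in a simple case. The sketch below is an illustration added here under stated assumptions: it takes $W = \mathbb{R}$ and $F(t, x) = t + x$, for which the solution through $\langle 0, \alpha \rangle$ is $(\alpha + 1)e^t - t - 1$, so that the derivative of $\alpha \mapsto f_\alpha(s)$ is $e^s$. The solution is approximated by a classical fourth-order Runge-Kutta scheme (a stand-in for the exact solution, not a method used in the text), and the derivative by a central difference; `rk4_flow` is a hypothetical helper name.

```python
import math

def rk4_flow(F, t0, alpha, s, steps=1000):
    """Approximate f_alpha(s), where f_alpha solves x' = F(t, x), x(t0) = alpha."""
    h = (s - t0) / steps
    t, x = t0, alpha
    for _ in range(steps):
        k1 = F(t, x)
        k2 = F(t + h / 2, x + h * k1 / 2)
        k3 = F(t + h / 2, x + h * k2 / 2)
        k4 = F(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

def F(t, x):
    # Assumed example right-hand side; Lipschitz in x with constant 1.
    return t + x

eps = 1e-3
s = 1.0
# Central difference in the initial value alpha, taken at alpha = 1.
derivative = (rk4_flow(F, 0.0, 1.0 + eps, s) - rk4_flow(F, 0.0, 1.0 - eps, s)) / (2 * eps)
print(derivative, math.exp(s))
```

The two printed numbers should agree to many decimal places, reflecting the fact that the value of the solution at $s$ depends differentiably on its value at $t_0 = 0$.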
3. THE LINEAR EQUATION
We now suppose that the function $F$ of Section 1 is from $I \times W$ to $W$ and continuous, and that $F(t, \alpha)$ is linear in $\alpha$ for each fixed $t$. It is not hard to see that we then automatically have the strong Lipschitz hypothesis of Theorem 1.4, which we shall in any case now assume. Here this is a boundedness condition on a linear map: we are assuming that $F(t, \alpha) = T_t(\alpha)$, where $T_t \in \operatorname{Hom} W$, and that $\|T_t\| \le c(t)$ for all $t$, where $c(t)$ is continuous on $I$. As one might expect, in this situation the existence and uniqueness theory of Section 1 makes contact with general linear theory. Let $X_0$ be the vector space $\mathcal{C}(I, W)$ of all continuous functions from $I$ to $W$, and let $X_1$ be its subspace $\mathcal{C}^1(I, W)$ of all functions having continuous first derivatives. Norms will play no role in our theorem.
Theorem 3.1. The mapping $S: X_1 \to X_0$ defined by setting $g = Sf$ if $g(t) = f'(t) - F(t, f(t))$ is a surjective linear mapping. The set $N$ of global solutions of the differential equation $d\alpha/dt = F(t, \alpha)$ is the null space of $S$, and is therefore, in particular, a vector space. For each $t_0 \in I$ the restriction to $N$ of the coordinate (evaluation) mapping $\pi_{t_0}: f \mapsto f(t_0)$ is an isomorphism from $N$ to $W$. The null space $M$ of $\pi_{t_0}$ is therefore a complement of $N$ in $X_1$, and so determines a right inverse $R$ of $S$. The mapping $f \mapsto \langle Sf, f(t_0) \rangle$ is an isomorphism from $X_1$ to $X_0 \times W$, and this fact is equivalent to all the above assertions.
Proof. For any fixed $g$ in $X_0$ we set $G(t, \alpha) = F(t, \alpha) + g(t)$ and consider the (nonlinear) equation $d\alpha/dt = G(t, \alpha)$. By Theorems 1.3 and 1.4 it has a unique maximal solution $f$ through any initial point $\langle t_0, \alpha_0 \rangle$, and the domain of $f$ is the whole of $I$. That is, for each pair $\langle g, \alpha \rangle$ in $X_0 \times W$ there is a unique $f$ in $X_1$ such that $\langle Sf, f(t_0) \rangle = \langle g, \alpha \rangle$. The mapping $\langle S, \pi_{t_0} \rangle: f \mapsto \langle Sf, f(t_0) \rangle$ is thus bijective, and since it is clearly linear, it is an isomorphism. In particular, $S$ is surjective.
The null space $N$ of $S$ is the inverse image of $\{0\} \times W$ under the above isomorphism; that is, $\pi_{t_0} \upharpoonright N$ is an isomorphism from $N$ to $W$. Finally, the null space $M$ of $\pi_{t_0}$ is the inverse image of $X_0 \times \{0\}$ under $\langle S, \pi_{t_0} \rangle$, and the direct sum decomposition $X_1 = M \oplus N$ simply reflects the decomposition $X_0 \times W = (X_0 \times \{0\}) \oplus (\{0\} \times W)$ under the inverse isomorphism. This finishes the proof of the theorem. □
The problem of finding, for a given $g$ in $X_0$ and a given $\alpha_0$ in $W$, the unique $f$ in $X_1$ such that $S(f) = g$ and $f(t_0) = \alpha_0$ is called the initial-value problem. At the theoretical level, the problem is solved by the above theorem, which states that the uniquely determined $f$ exists. At the practical level of computation, the problem remains important.
The fact that $M = M_{t_0}$ is a complement of $N$ breaks down the initial-value problem into two independent subproblems. The right inverse $R$ associated with
$M_{t_0}$ finds $h$ in $X_1$ such that $S(h) = g$ and $h(t_0) = 0$. The inverse of the isomorphism $f \mapsto f(t_0)$ from $N$ to $W$ selects that $k$ in $X_1$ such that $S(k) = 0$ and $k(t_0) = \alpha_0$. Then $f = h + k$. The first subproblem is the problem of "solving the inhomogeneous equation with homogeneous initial data", and the second is the problem of "solving the homogeneous equation with inhomogeneous initial data". In a certain sense the initial-value problem is the "direct sum" of these two independent problems.
We shall now study the homogeneous equation $d\alpha/dt = T_t(\alpha)$ more closely. As we saw above, its solution space $N$ is isomorphic to $W$ under each projection map $\pi_t: f \mapsto f(t)$. Let $\varphi_t$ be this isomorphism (so that $\varphi_t = \pi_t \upharpoonright N$). We now choose some fixed $t_0$ in $I$ (we may as well suppose that $I$ contains 0 and take $t_0 = 0$) and set $K_t = \varphi_t \circ \varphi_0^{-1}$. Then $\{K_t\}$ is a one-parameter family of linear isomorphisms of $W$ with itself, and if we set $f_\beta(t) = K_t(\beta)$, then $f_\beta$ is the solution of $d\alpha/dt = T_t(\alpha)$ passing through $\langle 0, \beta \rangle$. We call $K_t$ a fundamental solution of the homogeneous equation $d\alpha/dt = T_t(\alpha)$.
Since $f_\beta'(t) = T_t(f_\beta(t))$, we see that $d(K_t)/dt = T_t \circ K_t$ in the sense that the equation is true at each $\beta$ in $W$. However, the derivative $d(K_t)/dt$ does not necessarily exist as a norm limit in $\operatorname{Hom} W$. This is because our hypotheses on $T_t$ do not imply that the mapping $t \mapsto T_t$ is continuous from $I$ to $\operatorname{Hom} W$. If this mapping is continuous, then the mapping $\langle t, A \rangle \mapsto T_t \circ A$ is continuous from $I \times \operatorname{Hom} W$ to $\operatorname{Hom} W$, and the initial-value problem
$$dA/dt = T_t \circ A, \qquad A_0 = I$$
has a unique solution $A_t$ in $\mathcal{C}^1(I, \operatorname{Hom} W)$. Because evaluation at $\beta$ is a bounded linear mapping from $\operatorname{Hom} W$ to $W$, $A_t(\beta)$ is a differentiable function of $t$ and
$$dA_t(\beta)/dt = (dA_t/dt)(\beta) = T_t(A_t(\beta)).$$
This implies that $A_t(\beta) = K_t(\beta)$ for all $\beta$, so $K_t = A_t$. In particular, the fundamental solution $t \mapsto K_t$ is now a differentiable map into $\operatorname{Hom} W$, and $dK_t/dt = T_t \circ K_t$. We have proved the following theorem.
Theorem 3.2.
Let $t \mapsto T_t$ be a continuous map from an interval neighborhood $I$ of 0 to $\operatorname{Hom} W$. Then the fundamental solution $t \mapsto K_t$ of the differential equation $d\alpha/dt = T_t(\alpha)$ is the parametrized arc from $I$ to $\operatorname{Hom} W$ that is the solution of the initial-value problem $dA/dt = T_t \circ A$, $A_0 = I$.
In terms of the isomorphisms $K_t = K(t)$, we can now obtain an explicit solution for the inhomogeneous equation $d\alpha/dt = T_t(\alpha) + g(t)$. We want $f$ such that
$$f'(t) - T_t(f(t)) = g(t).$$
Now $K'(t) = T_t \circ K(t)$, so that $T_t = K'(t) \circ K(t)^{-1}$, and it follows from Exercise 8.12 of Chapter 4 and the general product rule for differentiation
(Theorem 8.4 of Chapter 3) that the left side of the equation above is exactly
$$K(t)\left(\frac{d}{dt}\left[K(t)^{-1}(f(t))\right]\right).$$
The equation we have to solve can thus be rewritten as
$$\frac{d}{dt}\left[K(t)^{-1}(f(t))\right] = K(t)^{-1}(g(t)).$$
We therefore have an obvious solution, and even if the reader has found our motivating argument too technical, he should be able to check the solution by differentiating.
Theorem 3.3. In the context of Theorem 3.2, the function
$$f(t) = K_t\left[\int_0^t K_s^{-1}(g(s))\,ds\right]$$
is the solution of the inhomogeneous initial-value problem
$$d\alpha/dt = T_t(\alpha) + g(t), \qquad f(0) = 0.$$
This therefore is a formula for the right inverse $R$ of $S$ determined by the complement $M_0$ of the null space $N$ of $S$.
The special case of the constant coefficient equation, where the "coefficient" operator $T_t$ is a fixed $T$ in $\operatorname{Hom} W$, is extremely important. The first new fact to be observed is that if $f$ is a solution of $d\alpha/dt = T(\alpha)$, then so is $f'$. For the equation $f'(t) = T(f(t))$ has a differentiable right-hand side, and differentiating, we get $f''(t) = T(f'(t))$. That is:
Lemma 3.1. The solution space $N$ of the constant coefficient equation $d\alpha/dt = T(\alpha)$ is invariant under the derivative operator $D$.
Moreover, we see from the differential equation that the operator $D$ on $N$ is just composition with $T$. More precisely, the equation $f'(t) = T(f(t))$ can be rewritten $\pi_t \circ D = T \circ \pi_t$, and since the restriction of $\pi_t$ to $N$ is the isomorphism $\varphi_t$ from $N$ to $W$, this equation can be solved for $T$. We thus have the following lemma.
Lemma 3.2. For each fixed $t$ the isomorphism $\varphi_t$ from $N$ to $W$ takes the derivative operator $D$ on $N$ to the operator $T$ on $W$. That is, $T = \varphi_t \circ D \circ \varphi_t^{-1}$.
The equation for the fundamental solution $K_t$ is now $dS/dt = TS$. In the elementary calculus this is the equation for the exponential function, which leads us to expect and immediately check that $K_t = e^{tT}$. (See the end of Section 8 of Chapter 4.) The solution of $d\alpha/dt = T(\alpha)$ through $\langle 0, \beta \rangle$ is thus the function
$$f(t) = e^{tT}\beta = \sum_{j=0}^{\infty} \frac{t^j}{j!}\,T^j(\beta).$$
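The formula of Theorem 3.3 can be checked numerically in the simplest constant coefficient case. The following Python sketch is an added illustration under stated assumptions: it takes $W = \mathbb{R}$, so that $T$ is multiplication by a number and $K_t = e^{tT}$ is a scalar, and it evaluates the integral by the composite Simpson rule (a numerical stand-in, not something used in the text). The helper names `simpson` and `particular_solution` are hypothetical. For $T = 1$ and $g(s) = s$ the formula gives $f(t) = e^t \int_0^t e^{-s} s\,ds = e^t - t - 1$, which indeed satisfies $f' = f + t$ and $f(0) = 0$.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson approximation of the integral of func over [a, b]."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def particular_solution(T, g, t):
    """f(t) = K_t * integral_0^t K_s^{-1} g(s) ds with scalar K_t = exp(t*T)."""
    K = lambda u: math.exp(u * T)
    return K(t) * simpson(lambda s: g(s) / K(s), 0.0, t)

value = particular_solution(1.0, lambda s: s, 1.0)
exact = math.e - 2.0          # e^t - t - 1 evaluated at t = 1
print(value, exact)
```

In a higher-dimensional $W$ the same computation would need a matrix exponential and a vector-valued quadrature; the scalar case is chosen here only to keep the sketch self-contained.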
If $T$ satisfies a polynomial equation $p(T) = 0$, as we know it must if $W$ is finite-dimensional, then our analysis can be carried significantly further. Suppose for now that $p$ has only real roots, so that its relatively prime factorization is $p(x) = \prod_1^k (x - \lambda_i)^{m_i}$. Then we know from Theorem 5.5 of Chapter 1 that $W$ is the direct sum $W = \bigoplus_1^k W_i$ of the null spaces $W_i$ of the transformations $(T - \lambda_i I)^{m_i}$, and that each $W_i$ is invariant under $T$. This gives us a much simpler form for the solution curve $e^{tT}\alpha$ if the point $\alpha$ is in one of the null spaces $W_i$. Taking such a subspace $W_i$ itself as $W$ for the moment, we have $(T - \lambda I)^m = 0$, so that $T = \lambda I + R$, where $R^m = 0$, and the factorization $e^{tT} = e^{t\lambda}e^{tR}$, together with the now finite series expansion of $e^{tR}$, gives us
$$e^{tT}\alpha = e^{t\lambda}\left[\alpha + tR(\alpha) + \cdots + \frac{t^{m-1}}{(m-1)!}R^{m-1}(\alpha)\right].$$
Note that the number of terms on the right is the degree of the factor $(t - \lambda)^m$ in the polynomial $p(t)$. In the general situation where $W = \bigoplus_1^k W_i$, we have $\alpha = \sum_1^k \alpha_i$, $e^{tT}(\alpha) = \sum_1^k e^{tT}(\alpha_i)$, and each $e^{tT}(\alpha_i)$ is of the above form. The solution of $f'(t) = T(f(t))$ through the general point $\langle 0, \alpha \rangle$ is thus a finite sum of terms of the form $t^j e^{t\lambda_i}\beta_{ij}$, the number of terms being the degree of the polynomial $p$.
If $W$ is a complex Banach space, then the restriction that $p$ have only real roots is superfluous. We get exactly the same formula but with complex values of $\lambda$. This introduces more variety into the behavior of the solution curves since an outside exponential factor $e^{t\lambda} = e^{t\mu}e^{it\nu}$ now has a periodic factor if $\nu \ne 0$. Altogether we have proved the following theorem.
Theorem 3.4. If $W$ is a real or complex Banach space and $T \in \operatorname{Hom} W$, then the solution curve in $W$ of the initial-value problem $f'(t) = T(f(t))$, $f(0) = \beta$, is
$$f(t) = e^{tT}\beta = \sum_{j=0}^{\infty} \frac{t^j}{j!}\,T^j(\beta).$$
If $T$ satisfies a polynomial equation $(T - \lambda)^m = 0$, then
$$f(t) = e^{t\lambda}\left[\beta + tR(\beta) + \cdots + \frac{t^{m-1}}{(m-1)!}R^{m-1}(\beta)\right],$$
where $R = T - \lambda I$.
If $T$ satisfies a polynomial equation $p(T) = 0$ and $p$ has the relatively prime factorization $p(x) = \prod_1^k (x - \lambda_i)^{m_i}$, then $f(t)$ is a sum of $k$ terms of the above type, and so has the form
$$f(t) = \sum_{i,j} t^j e^{t\lambda_i}\beta_{ij},$$
where the number of terms on the right is the degree of the polynomial $p$, and each $\beta_{ij}$ is a fixed (constant) vector.
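The finite formula of Theorem 3.4 can be compared with the exponential series in a small example. The Python sketch below is an added illustration, with all numerical values chosen arbitrarily: it takes $W = \mathbb{R}^2$ and $T = \lambda I + R$ with $R^2 = 0$, so that $m = 2$ and the theorem predicts $e^{tT}\beta = e^{t\lambda}[\beta + tR(\beta)]$. The helper names `mat_vec` and `exp_series_apply` are hypothetical, and the series is simply truncated after 40 terms.

```python
import math

def mat_vec(M, v):
    """Apply the matrix M (list of rows) to the vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def exp_series_apply(T, beta, t, terms=40):
    """Partial sum of sum_j (t^j / j!) T^j(beta)."""
    result = [0.0] * len(beta)
    term = list(beta)                       # the j = 0 term
    for j in range(terms):
        result = [r + x for r, x in zip(result, term)]
        # next term: multiply by t*T and divide by (j + 1)
        term = [t / (j + 1) * x for x in mat_vec(T, term)]
    return result

lam, t = -0.5, 2.0
T = [[lam, 1.0], [0.0, lam]]                # T = lam*I + R
R = [[0.0, 1.0], [0.0, 0.0]]                # R*R = 0, so m = 2
beta = [1.0, 3.0]

series = exp_series_apply(T, beta, t)
R_beta = mat_vec(R, beta)
closed = [math.exp(lam * t) * (b + t * rb) for b, rb in zip(beta, R_beta)]
print(series, closed)
```

Since $\lambda < 0$ in this example, both components decay as $t \to +\infty$ in spite of the polynomial factor $t$, in line with the discussion of asymptotic behavior that follows.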
It is important to notice how the asymptotic behavior of $f(t)$ as $t \to +\infty$ is controlled by the polynomial roots $\lambda_i$. We first restrict ourselves to the solution through a vector $\alpha$ in one of the subspaces $W_i$, which amounts to supposing that $(T - \lambda)^m = 0$. Then if $\lambda$ has a positive real part, so that $e^{t\lambda} = e^{t\mu}e^{it\nu}$ with $\mu > 0$, then $\|f(t)\| \to \infty$ in exponential fashion. If $\lambda$ has a negative real part, then $f(t)$ approaches zero as $t \to \infty$ (but its norm becomes infinite exponentially fast as $t \to -\infty$). If the real part of $\lambda$ is zero, then $\|f(t)\| \to \infty$ like $t^{m-1}$ if $m > 1$. Thus the only way for $f$ to be bounded on the whole of $\mathbb{R}$ is for the real part of $\lambda$ to be zero and $m = 1$, in which case $f$ is periodic. Similarly, in the general case where $p(T) = \prod_1^k (T - \lambda_i)^{m_i} = 0$, it will be true that all the solution curves are bounded on the whole of $\mathbb{R}$ if and only if the roots $\lambda_i$ are all pure imaginary and all the multiplicities $m_i$ are 1.
EXERCISES
3.1 Let $I$ be an open interval in $\mathbb{R}$, and let $W$ be a normed linear space. Let $F(t, \alpha)$ be a continuous function from $I \times W$ to $W$ which is linear in $\alpha$ for each fixed $t$. Prove that there is a function $c(t)$ which is bounded on every closed interval $[a, b]$ included in $I$ and such that $\|F(t, \alpha)\| \le c(t)\|\alpha\|$ for all $\alpha$ and $t$. Then show that $c$ can be made continuous. (You may want to use the Heine-Borel property: If $[a, b]$ is covered by a collection of open intervals, then some finite subcollection already covers $[a, b]$.)
3.2 In the text we omitted checking that $f \mapsto f^{(n)} - G(t, f, f', \ldots, f^{(n-1)})$ is surjective from $X_n$ to $X_0$. Prove that this is so by tracking down the surjectivity through the reduction to a first-order system.
3.3 Suppose that the coefficients $a_i(t)$ in the operator
$$Tf = \sum_0^n a_i f^{(i)}$$
are all themselves in $\mathcal{C}^1$. Show that the null space $N$ of $T$ is a subspace of $\mathcal{C}^{n+1}$. State a generalization of this theorem and indicate roughly why it is true.
3.4 Suppose that $W$ is a Banach space, $T \in \operatorname{Hom} W$, and $\beta$ is an eigenvector of $T$ with eigenvalue $r$. Show that the solution of the constant coefficient equation $d\alpha/dt = T(\alpha)$ through $\langle 0, \beta \rangle$ is $f(t) = e^{rt}\beta$.
3.5 Suppose next that $W$ is finite-dimensional and has a basis $\{\beta_i\}_1^n$ consisting of eigenvectors of $T$, with corresponding eigenvalues $r_i$. Find a formula for the solution through $\langle 0, \alpha \rangle$ in terms of the basis expansion of $\alpha$.
3.6 A very important special case of the linear equation $d\alpha/dt = T_t(\alpha)$ is when the operator function $T_t$ is periodic. Suppose, for example, that $T_{t+1} = T_t$ for all $t$. Show that then $K_{t+n} = K_t(K_1)^n$ for all $t$ and $n$. Assume next that $K_1$ has a logarithm, and so can be written $K_1 = e^A$ for some $A$ in $\operatorname{Hom} W$. (We know from Exercise 11.19 of Chapter 4 that this is always possible if $W$ is finite-dimensional.) Show that now $K_t$ can be written in the form
$$K_t = B(t)e^{tA},$$
where $B(t)$ is periodic with period 1.
3.7 Continuing the above exercise, suppose now that $W$ is a finite-dimensional complex vector space. Using the analysis of $e^{tA}$ given in the text, show that the differential equation $d\alpha/dt = T_t(\alpha)$ has a periodic solution (with any period) only if $K_1$ has an eigenvalue of absolute value 1. Show also that if $K_1$ has an $n$th root of unity as an eigenvalue, then the differential equation has a periodic solution with period $n$.
3.8 Write out the special form that the formula of Theorem 3.3 takes in the constant coefficient situation.
3.9 It is interesting to look at the facts of Theorem 3.1 from the point of view of Theorem 5.3 of Chapter 1. Assume that $S: X_1 \to X_0$ is surjective and that its null space $N$ is isomorphic to $W$ under the coordinate (evaluation) map $\pi_{t_0}$. Prove that if $M$ is the null space of $\pi_{t_0}$ in $X_1$, then $S \upharpoonright M$ is an isomorphism onto $X_0$ by applying this theorem.
4. THE nTH-ORDER LINEAR EQUATION
The $n$th-order linear differential equation is the equation
$$d^n\alpha/dt^n = G(t, \alpha, d\alpha/dt, \ldots, d^{n-1}\alpha/dt^{n-1}),$$
where $G(t, \boldsymbol{\alpha}) = G(t, \alpha_1, \ldots, \alpha_n)$ is now linear from $V = W^n$ to $W$ for each $t$ in $I$. We convert this to a first-order equation $d\boldsymbol{\alpha}/dt = F(t, \boldsymbol{\alpha})$ just as before, where now $F$ is a map from $I \times V$ to $V$ that is linear in its second variable $\boldsymbol{\alpha}$, $F(t, \boldsymbol{\alpha}) = T_t(\boldsymbol{\alpha})$. Our proof of Theorem 1.5 showed that a function $f$ in $\mathcal{C}^n(I, W)$ is a solution of the $n$th-order equation $d^n\alpha/dt^n = G(t, \alpha, \ldots, d^{n-1}\alpha/dt^{n-1})$ if and only if the $n$-tuplet $\psi f = \langle f, f', \ldots, f^{(n-1)} \rangle$ is a solution of the first-order equation $d\boldsymbol{\alpha}/dt = F(t, \boldsymbol{\alpha}) = T_t(\boldsymbol{\alpha})$. We know that the latter solutions form a vector subspace $\mathbf{N}$ of $\mathcal{C}^1(I, W^n)$, and since the map $\psi: f \mapsto \langle f, f', \ldots, f^{(n-1)} \rangle$ is linear from $\mathcal{C}^n(I, W)$ to $\mathcal{C}^1(I, W^n)$, it follows that the set $N$ of solutions of the $n$th-order equation is a subspace of $\mathcal{C}^n(I, W)$ and $\psi \upharpoonright N$ is an isomorphism from $N$ to $\mathbf{N}$.
Since the coordinate evaluation $\varphi_t = \pi_t \upharpoonright \mathbf{N}$ is an isomorphism from $\mathbf{N}$ to $W^n$ for each $t$ (Theorem 3.1), it follows that the map
$$\varphi_t \circ \psi: f \mapsto \langle f(t), f'(t), \ldots, f^{(n-1)}(t) \rangle$$
takes $N$ isomorphically to $W^n$. Its null space $M_t$ is a complement of $N$ in $\mathcal{C}^n(I, W)$, as before. Here $M_t$ is the set of functions $f$ in $\mathcal{C}^n(I, W)$ such that $f(t) = \cdots = f^{(n-1)}(t) = 0$.
We now consider the special case $W = \mathbb{R}$. For each fixed $t$, $G$ is now a linear map from $\mathbb{R}^n$ to $\mathbb{R}$, that is, an element of $(\mathbb{R}^n)^*$, and its coordinate set with respect to the standard basis is an $n$-tuple $k = \langle k_1, \ldots, k_n \rangle$. Since the linear map varies continuously with $t$, the $n$-tuple $k$ varies continuously with $t$. Thus, when we take $t$ into account, we have an $n$-tuple $k(t) = \langle k_1(t), \ldots, k_n(t) \rangle$ of continuous real-valued functions on $I$ such that
$$G(t, x_1, \ldots, x_n) = \sum_1^n k_i(t)x_i.$$
The solution space $N$ of the $n$th-order differential equation
$$d^n\alpha/dt^n = G(t, \alpha, \ldots, d^{n-1}\alpha/dt^{n-1})$$
is just the null space of the linear transformation $L: \mathcal{C}^n(I, \mathbb{R}) \to \mathcal{C}^0(I, \mathbb{R})$ defined by
$$(Lf)(t) = f^{(n)}(t) - k_n(t)f^{(n-1)}(t) - \cdots - k_1(t)f(t).$$
If we shift indices to coincide with the order of the derivative, and if we let $f^{(n)}$ also have a coefficient function, then our $n$th-order linear differential operator $L$ appears as
$$(Lf)(t) = a_n(t)f^{(n)}(t) + \cdots + a_0(t)f(t).$$
Giving $f^{(n)}$ a coefficient function $a_n$ changes nothing provided $a_n(t)$ is never zero, since then it can be divided out to give the form we have studied. This is called the regular case. The singular case, where $a_n(t)$ is zero for some $t$, requires further study, and we shall not go into it here.
We recapitulate what our general linear theory tells us about this situation.
Theorem 4.1. $L$ is a surjective linear transformation from the space $\mathcal{C}^n(I)$ of all real-valued functions on $I$ having continuous derivatives through order $n$ to the space $\mathcal{C}^0(I) = \mathcal{C}(I)$ of continuous functions on $I$. Its null space $N$ is the solution space of our original differential equation. For each $t_0$ in $I$ the restriction to $N$ of the mapping $\varphi_{t_0} \circ \psi: f \mapsto \langle f(t_0), \ldots, f^{(n-1)}(t_0) \rangle$ is an isomorphism from $N$ to $\mathbb{R}^n$, and the set $M_{t_0}$ of functions $f$ in $\mathcal{C}^n(I)$ such that $f(t_0) = \cdots = f^{(n-1)}(t_0) = 0$ is therefore a complement of $N$ in $\mathcal{C}^n(I)$, and determines a linear right inverse of $L$.
The practical problem of "solving" the differential equation $L(f) = g$ for $f$ when $g$ is given falls into two parts. First we have to find the null space $N$ of $L$, that is, we have to solve the homogeneous equation $L(f) = 0$. Since $N$ is an $n$-dimensional vector space, the problem of delineating it is equivalent to finding a basis, and this is clearly the efficient way to proceed. Our first problem therefore is to find $n$ linearly independent solutions $\{u_i\}_1^n$ of $L(f) = 0$.
Our second problem is to find a right inverse of $L$, that is, a linear way of picking one $f$ such that $L(f) = g$ for each $g$. Here the obvious thing to do is to try to make the formula of Theorem 3.3 into a practical computation. If $v$ is one solution of $L(f) = g$, then of course the set of all solutions is the affine subspace $N + v$.
We shall start with the first problem, that of finding a basis $\{u_i\}_1^n$ of solutions to $L(f) = 0$. Unfortunately, there is no general method available, and we have to be content with partial success. We shall see that we can easily solve the first-order equation directly, and that if we can find one solution of the $n$th-order equation, then we can reduce the problem to solving an equation of order $n - 1$. Moreover, in the very important special case of an operator $L$ with constant coefficients, Theorem 3.4 gives a complete explicit solution.
The first-order homogeneous linear equation can be written in the form $y' + a(t)y = 0$, where the coefficient of $y'$ has been divided out. Dividing by $y$
and remembering that $y'/y = (\log y)'$, we see that, formally at least, a solution is given by $\log y = -\int a(t)\,dt$ or $y = e^{-\int a(t)\,dt}$, and we can check it by inspection. Thus the equation $y' + y/t = 0$ has a solution $y = e^{-\log t} = 1/t$, as the reader might have noticed directly.

Suppose now that $L$ is an $n$th-order operator and that we know one solution $u$ of $Lf = 0$. Our problem then is to find $n - 1$ solutions $v_1, \ldots, v_{n-1}$ independent of each other and of $u$. It might even be reasonable to guess that these could be determined as solutions of an equation of order $n - 1$. We try to find a second solution $v(t)$ in the form $c(t)u(t)$, where $c(t)$ is an unknown function. Our motivation, in part, is that such a solution would automatically be independent of $u$ unless $c(t)$ turns out to be a constant. Now if $v(t) = c(t)u(t)$, then $v' = cu' + c'u$, and generally
$$v^{(j)} = \sum_{i=0}^{j} \binom{j}{i} c^{(i)} u^{(j-i)}.$$
If we write down $L(v) = \sum_0^n a_j(t)v^{(j)}(t)$ and collect those terms involving $c(t)$, we get
$$L(v) = c(t)\sum_0^n a_j u^{(j)} + \text{terms involving } c', \ldots, c^{(n)} = cL(u) + S(c') = S(c'),$$
where $S$ is a certain linear differential operator of order $n - 1$ which can be explicitly computed from the above formulas. We claim that solving $S(f) = 0$ solves our original problem. For suppose that $\{g_i\}_1^{n-1}$ is a basis for the null space of $S$, and set $c_i(t) = \int_0^t g_i$. Then $L(c_i u) = S(c_i') = S(g_i) = 0$ for $i = 1, \ldots, n - 1$. Moreover, $u, c_1u, \ldots, c_{n-1}u$ are independent, for if $u = \sum_1^{n-1} k_i c_i u$, then $1 = \sum_1^{n-1} k_i c_i(t)$ and $0 = \sum_1^{n-1} k_i c_i'(t) = \sum_1^{n-1} k_i g_i(t)$, contradicting the independence of the set $\{g_i\}$.

We have thus shown that if we can find one solution $u$ of the $n$th-order equation $Lf = 0$, then its complete solution is reduced to solving an equation $Sf = 0$ of order $n - 1$ (although our independence argument was a little sketchy).
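The reduction just described can be tried numerically in the second-order case. The following is only a sketch under stated assumptions: the names `simpson` and `reduce_order` are ours, not the text's, the operator is taken to be $a_2 y'' + a_1 y' + a_0 y$ with $a_2$ and $u$ nonvanishing on the interval, and the integrals are approximated by Simpson's rule. Collecting terms as above gives $S(c') = a_2 u\,c'' + (2a_2u' + a_1u)\,c'$, a first-order equation for $g = c'$ that is solved by the exponential formula for first-order equations.

```python
import math

def simpson(fn, lo, hi, steps=400):
    """Composite Simpson approximation of the integral of fn over [lo, hi]."""
    h = (hi - lo) / steps
    total = fn(lo) + fn(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * fn(lo + k * h)
    return total * h / 3

def reduce_order(a2, a1, u, du, t0):
    """Given one solution u of a2 y'' + a1 y' + a0 y = 0 (du is its derivative),
    return a second solution v = c * u, where g = c' solves the first-order
    equation a2 u g' + (2 a2 u' + a1 u) g = 0 with g(t0) = 1 and c(t0) = 0.
    The coefficient a0 is not needed because the terms in c cancel."""
    def ratio(t):
        return (2 * a2(t) * du(t) + a1(t) * u(t)) / (a2(t) * u(t))

    def g(t):
        # first-order formula: g = exp(-integral of the coefficient)
        return math.exp(-simpson(ratio, t0, t))

    def v(t):
        return simpson(g, t0, t) * u(t)

    return v

# Hypothetical test case: y'' - 2y' + y = 0 with the known solution u = e^t.
v = reduce_order(lambda t: 1.0, lambda t: -2.0, math.exp, math.exp, 0.0)
```

In this test case the coefficient of $c'$ vanishes identically, so $c' = 1$ and the second solution should agree with $te^t$.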
This reduction procedure does not combine with the solution of the first-order equation to build up a sequence of independent solutions of the $n$th-order equation because, roughly speaking, it "works off the top instead of off the bottom". For the combination to be successful, we would have to be able to find from a given $n$th-order operator a first-order operator $S$ such that $N(S) \subset N(L)$, and we can't do this in general. However, we can do it when the coefficient functions in $L$ are all constants, although we shall in fact proceed differently.

Meanwhile it is valuable to note that a second-order equation $Lf = 0$ can be solved completely if we can find one solution $u$, since the above argument reduces the remaining problem to a first-order equation which can then be solved by an integration, as we saw earlier. Consider, for instance, the equation $y'' - 2y/t^2 = 0$ over any interval $I$ not containing 0, so that $a_0(t) = -2/t^2$ is continuous on $I$. We see by inspection that $u(t) = t^2$ is one solution. Then we
know that we can find a solution $v(t)$ independent of $u(t)$ in the form $v(t) = t^2c(t)$ and that the problem will become a first-order problem for $c'$. We have, in fact, $v' = t^2c' + 2tc$ and $v'' = t^2c'' + 4tc' + 2c$, so that $L(v) = v'' - 2v/t^2 = t^2c'' + 4tc'$, and $L(v) = 0$ if and only if $(c')' + (4/t)c' = 0$. Thus $c' = e^{-4\log t} = 1/t^4$ and $c = 1/t^3$ (to within a scalar multiple; we only want a basis!), and $v = t^2c(t) = 1/t$. (The reader may wish to check that this is the promised solution.) The null space of the operator $L(f) = f'' - 2f/t^2$ is thus the linear span of $\{t^2, 1/t\}$.

We now turn to an important tractable case, the differential operator
$$Lf = a_nf^{(n)} + a_{n-1}f^{(n-1)} + \cdots + a_0f,$$
where the coefficients $a_i$ are constants and $a_n$ might as well be taken to be 1. What makes this case accessible is that now $L$ is a polynomial in the derivative operator $D$. That is, if $Df = f'$, so that $D^jf = f^{(j)}$, then $L = p(D)$, where $p(x) = \sum_0^n a_ix^i$.

The most elegant, but not the most elementary, way to handle this equation is to go over to the equivalent first-order system $d\mathbf{x}/dt = T(\mathbf{x})$ on $\mathbb{R}^n$ and to apply the relevant theory from the last section.

Theorem 4.2. If $p(t) = (t - b)^n$, then the solution space $N$ of the constant coefficient $n$th-order equation $p(D)f = 0$ has the basis
$$\{e^{bt}, te^{bt}, \ldots, t^{n-1}e^{bt}\}.$$
If $p(t)$ is a polynomial which has a relatively prime factorization $p(t) = \prod_1^k p_i(t)$ with each $p_i(t)$ of the above form, then the solution space of the constant coefficient equation $p(D)f = 0$ has the basis $\bigcup_1^k B_i$, where $B_i$ is the above basis for the solution space $N_i$ of $p_i(D)f = 0$.

Proof. We know that the mapping $\psi\colon f \mapsto \langle f, f', \ldots, f^{(n-1)}\rangle$ is an isomorphism from the null space $N$ of $p(D)$ to the null space $\mathbf{N}$ of $d\mathbf{x}/dt - T(\mathbf{x})$. It is clear that $\psi$ commutes with differentiation, $\psi(Df) = \langle f', \ldots, f^{(n)}\rangle = D\psi(f)$, and since we know that $\mathbf{N}$ is invariant under $D$ by Lemma 3.1, it follows (and can easily be checked directly) that $N$ is invariant under $D$.
By Lemma 3.2 we have $T = \varphi_t \circ D \circ \varphi_t^{-1}$, which simply says that the isomorphism $\varphi_t\colon \mathbf{N} \to \mathbb{R}^n$ takes the operator $D$ on $\mathbf{N}$ into the operator $T$ on $\mathbb{R}^n$. Altogether $\varphi_t \circ \psi$ takes $D$ on $N$ into $T$ on $\mathbb{R}^n$, and since $p(D) = 0$ on $N$, it follows that $p(T) = 0$ on $\mathbb{R}^n$. We saw in Theorem 3.4 that if $p(T) = 0$ and $p = (t - b)^n$, then the solution space $\mathbf{N}$ of $d\mathbf{x}/dt = T(\mathbf{x})$ is spanned by vectors of the form $e^{bt}\mathbf{x}, \ldots, t^{n-1}e^{bt}\mathbf{x}$. The first coordinates of the $n$-tuple-valued functions $\mathbf{g}$ in $\mathbf{N}$ form the space $N$ (under the isomorphism $f = \psi^{-1}\mathbf{g}$), and we therefore see that $N$ is spanned by the functions $e^{bt}, \ldots, t^{n-1}e^{bt}$. Since $N$ is $n$-dimensional, and since there are $n$ of these functions, the spanning set forms a basis.
The remainder of the theorem can be viewed as the combination of the above and the direct application of Theorem 5.5 of Chapter 1 to the equation $p(D) = 0$ on $N$, or as the carry-over to $N$ under the isomorphism $\psi^{-1}$ of the facts already established for $\mathbf{N}$ in the last section. $\square$

If the roots of the polynomial $p$ are not all real, then we have to resort to the complexification theory that we developed in the exercises of Section 11, Chapter 4. Except for one final step, the results are the same. The one extra fact that has to be applied is that the null space of a real operator $T$ acting on a real vector space $Y$ is exactly the intersection with $Y$ of the null space of the complexification $S$ of $T$ acting on the complexification $Z = Y \oplus iY$ of $Y$. This implies that if $p(t)$ is a polynomial with real coefficients, then we get the real solutions of $p(D)f = 0$ as the real parts of the complex solutions.

In order to see exactly what this means, suppose that $q(x) = (x^2 - 2bx + c)^m$ is one of the relatively prime factors of $p(x)$ over $\mathbb{R}$, with $x^2 - 2bx + c$ irreducible over $\mathbb{R}$. Over $\mathbb{C}$, $q(x)$ factors into $(x - \lambda)^m(x - \bar\lambda)^m$, where $\lambda = b + i\omega$ and $\omega^2 = c - b^2$. It follows from our general theory above that the complex $2m$-dimensional null space of $q(D)$ is the complex span of
$$\{e^{\lambda t}, te^{\lambda t}, \ldots, t^{m-1}e^{\lambda t}, e^{\bar\lambda t}, te^{\bar\lambda t}, \ldots, t^{m-1}e^{\bar\lambda t}\}.$$
The real parts of the complex linear combinations of these $2m$ functions form a $2m$-dimensional real vector space spanned by the real parts of the above functions and the real parts of $i$ times the above functions. That is, the null space of the real operator $q(D)$ is a $2m$-dimensional real space spanned by
$$\{e^{bt}\cos\omega t, te^{bt}\cos\omega t, \ldots, t^{m-1}e^{bt}\cos\omega t;\ e^{bt}\sin\omega t, \ldots, t^{m-1}e^{bt}\sin\omega t\}.$$
Since there are $2m$ of these functions, they must be independent and must form a basis for the real solution space of $q(D)f = 0$. Thus,

Theorem 4.3.
If $p(t) = (t^2 + 2bt + c)^m$ and $b^2 < c$, then the solution space of the constant coefficient $2m$th-order equation $p(D)f = 0$ has the basis
$$\{t^ie^{-bt}\cos\omega t\}_{i=0}^{m-1} \cup \{t^ie^{-bt}\sin\omega t\}_{i=0}^{m-1},$$
where $\omega^2 = c - b^2$. For any polynomial $p(t)$ with real coefficients, if $p(t) = \prod_1^k p_i(t)$ is its relatively prime factorization into powers of linear factors and powers of irreducible quadratic factors, then the solution space $N$ of $p(D)f = 0$ has the basis $\bigcup_1^k B_i$, where $B_i$ is the basis for the null space of $p_i(D)$ that we displayed above if $p_i(t)$ is a power of an irreducible quadratic, and $B_i$ is the basis of Theorem 4.2 if $p_i(t)$ is a power of a linear factor.

Suppose, for example, that we want to find a basis for the null space of $D^4 - 1$. Here $p(x) = x^4 - 1 = (x - 1)(x + 1)(x - i)(x + i)$. The basis for the complex solution space is therefore $\{e^t, e^{-t}, e^{it}, e^{-it}\}$. Since $e^{it} = \cos t + i\sin t$, the basis for the real solution space is $\{e^t, e^{-t}, \cos t, \sin t\}$.
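The bookkeeping in Theorems 4.2 and 4.3 can be sketched in a few lines of Python. This is an illustration only, under the assumption that the roots of $p$ and their multiplicities are already known and are supplied by the caller; the function names `real_basis` and `evaluate` and the tuple encoding are ours. A real root $b$ of multiplicity $m$ contributes $t^ke^{bt}$ for $k < m$, and a conjugate pair $b \pm i\omega$ contributes $t^ke^{bt}\cos\omega t$ and $t^ke^{bt}\sin\omega t$.

```python
import math

def real_basis(roots):
    """roots: list of (root, multiplicity) pairs for the polynomial p.
    Returns descriptors (k, b, omega, kind) standing for t^k e^{bt} times
    1, cos(omega t), or sin(omega t) according to kind."""
    basis = []
    for root, mult in roots:
        root = complex(root)
        b, omega = root.real, root.imag
        if omega < 0:
            continue  # covered by the conjugate root with omega > 0
        for k in range(mult):
            if omega == 0:
                basis.append((k, b, 0.0, "exp"))
            else:
                basis.append((k, b, omega, "cos"))
                basis.append((k, b, omega, "sin"))
    return basis

def evaluate(descriptor, t):
    """Value at t of the basis function named by a descriptor."""
    k, b, omega, kind = descriptor
    value = t ** k * math.exp(b * t)
    if kind == "cos":
        value *= math.cos(omega * t)
    elif kind == "sin":
        value *= math.sin(omega * t)
    return value

# The roots of x^4 - 1, as in the example above.
basis = real_basis([(1, 1), (-1, 1), (1j, 1), (-1j, 1)])
```

For these roots the four descriptors stand for $e^t$, $e^{-t}$, $\cos t$, and $\sin t$.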
The same problem for $D^3 - 1$ gives us $p(x) = x^3 - 1 = (x - 1)(x^2 + x + 1)$, so that the basis for the complex solution space is
$$\{e^t, e^{(-1 + i\sqrt{3})t/2}, e^{(-1 - i\sqrt{3})t/2}\}$$
and the basis for the real solution space is
$$\{e^t, e^{-t/2}\cos(\sqrt{3}\,t/2), e^{-t/2}\sin(\sqrt{3}\,t/2)\}.$$

*Our results above suggest that the collection $\mathcal{A}$ of all real-valued solutions of constant coefficient homogeneous linear differential equations contains the functions $t^i$, $e^{rt}$, $\cos\omega t$, $\sin\omega t$ for all $i$, $r$, and $\omega$, and is closed under addition and multiplication, and is in fact the algebra generated by these functions. We can easily prove this conjecture. We first consider sums. Suppose that $T(f) = 0$ and that $S(g) = 0$, where $S$ and $T$ are two such constant coefficient operators. Then $f + g$ is in the null space of $S \circ T$ because $S$ and $T$ commute:
$$(S \circ T)(f + g) = (S \circ T)(f) + (S \circ T)(g) = S(Tf) + T(Sg) = 0 + 0 = 0.$$
We know that $S$ and $T$ commute because they are both polynomials in $D$.

In order to treat products, we first have to recognize that the linear span of all the trigonometric functions $\sin at$, $\cos bt$ is an algebra. In other words, any finite product of such functions is a linear combination of such functions. This is the role of a certain class of trigonometric identities, such as $2\sin x\cos y = \sin(x + y) + \sin(x - y)$, which the reader has undoubtedly had to struggle with. (And again the mystery disappears when we are allowed to treat them as complex exponentials.) Then we observe that any function in the algebra $\mathcal{A}$ is a finite sum of terms each of which is of the form $t^ie^{rt}\sin\omega t$ or $t^ie^{rt}\cos\omega t$ for some $i$, $r$, and $\omega$. We can exhibit an operator $T$ having such a function in its null space, and our finite sum of such terms will then be in the null space of the composition of these operators $T$ by our first argument.

We are tempted to say one more thing.
The functions $t^i$, $e^{rt}$, $\sin\omega t$, $\cos\omega t$, and sums of their products can be shown to be exactly the continuous functions $f\colon \mathbb{R} \to \mathbb{R}$ such that the set of translates of $f$ has a finite-dimensional span. That is, if we define translation through $x$, $K_x$, by $(K_xf)(t) = f(t - x)$, then for exactly the above functions $f$ the linear span of $\{K_xf : x \in \mathbb{R}\}$ is finite-dimensional. This second characterization of exactly the same class of functions cannot be accidental. Part of the secret lies in the fact that the constant coefficient operators $T$ are exactly those linear differential operators that commute with translation. That is, if $T$ is a linear differential operator, then $T \circ K_x = K_x \circ T$ for all $x$ if and only if $T$ has constant coefficients. Now we have noted in an early chapter that if $T \circ S = S \circ T$, then the null space of $T$ is invariant under $S$. Therefore, the null space $N$ of a constant coefficient operator $T$ is invariant under all translations: $K_x[N] \subset N$ for all $x$. Now we know that $N$ is finite-dimensional
from our differential equation theory. Therefore, the functions in $N$ are such that their translates have a finite-dimensional span!

This device of gaining additional information about the null space $N$ of a linear operator $T$ by finding operators $S$ that commute with $T$, so that $N$ is $S$-invariant, is much used in advanced mathematics. It is especially important when we have a group of commuting operators $S$, as we do in the above case with the operators $S = K_x$.

What we have not shown is that if a continuous function $f$ is such that its translates generate a finite-dimensional vector space, then $f$ is in the null space of some constant coefficient operator $p(D)$. This is delicate, and it depends on showing that if $\{K_t\}$ is a one-parameter family of linear transformations on a finite-dimensional space such that $K_{s+t} = K_s \circ K_t$ and $K_t \to I$ as $t \to 0$, then there is an $S$ in $\operatorname{Hom} V$ such that $K_t = e^{tS}$.*

EXERCISES

Find solutions for the following equations.
4.1 $x'' - 3x' + 2x = 0$
4.2 $x'' + 2x' - 3x = 0$
4.3 $x'' + 2x' + 3x = 0$
4.4 $x'' + 2x' + x = 0$
4.5 $x''' - 3x'' + 3x' - x = 0$
4.6 $x''' - x = 0$
4.7 $x^{(6)} - x'' = 0$
4.8 $x''' = 0$
4.9 $x''' - x'' = 0$
4.10 Solve the initial-value problem $x'' + 4x' - 5x = 0$, $x(0) = 1$, $x'(0) = 2$.
4.11 Solve the initial-value problem $x''' + x' = 0$, $x(0) = 0$, $x'(0) = -1$, $x''(0) = 1$.
4.12 Find one solution $u$ of the equation $4t^2x'' + x = 0$ by trying $u(t) = t^r$, and then find a second solution as in the text by setting $v(t) = c(t)u(t)$.
4.13 Solve $t^3x''' - 3tx' + 3x = 0$ by trying $u(t) = t^r$.
4.14 Solve $tx'' + x' = 0$.
4.15 Solve $t(x''' + x') + 2(x'' + x) = 0$.
4.16 Knowing that $e^{-bt}\cos\omega t$ and $e^{-bt}\sin\omega t$ are solutions of a second-order linear differential equation, and observing that their values at 0 are 1 and 0, we know that they are independent. Why?
4.17 Find constant coefficient differential equations of which the following functions are solutions: $t^2$, $\sin t$, $t^2\sin t$.
4.18 If $f$ and $g$ are independent solutions of a second-order linear differential equation $u'' + a_1u' + a_2u = 0$ with continuous coefficient functions, then we know that the vectors $\langle f(x), f'(x)\rangle$ and $\langle g(x), g'(x)\rangle$ are independent at every point $x$. Show conversely that if two functions have this latter property, then they are solutions of a second-order differential equation.
4.19 Solve the equation $(D - a)^3f = 0$ by applying the order-reducing procedure discussed in the text starting with the obvious solution $e^{at}$.
5. SOLVING THE INHOMOGENEOUS EQUATION

We come now to the problem of solving the inhomogeneous equation $L(f) = g$. We shall briefly describe a practical method which works easily some of the time and a theoretical method which works all the time, but which may be hard to apply. The latter is just the translation of Theorem 3.3 into matrix language.

We first consider the constant coefficient equation $L(f) = g$ in the special case where $g$ itself is in the null space of a constant coefficient operator $S$. A simple example is $y' - ay = e^{bt}$ (or $y' - ay = \sin bt$), where $g(t) = e^{bt}$ is in the null space of $S = (D - b)$. In such a situation a solution $f$ must be in the null space of $S \circ L$, for $S \circ L(f) = S(g) = 0$. We know what all these functions are, and our problem is to select $f$ among them such that $L(f)$ is the given $g$. For the moment suppose that the polynomials $L$ and $S$ (polynomials in $D$) have no factors in common. Then we know that $L$ is an isomorphism on the null space $N_S$ of $S$ and therefore that there exists an $f$ in $N_S$ such that $Lf = g$. Since we have a basis for $N_S$, we could construct the matrix for the action of $L$ on $N_S$ and find $f$ by solving a matrix equation, but the simplest thing to do is take a general linear combination of the basis, with unknown coefficients, let $L$ act on it, and see what the coefficients must be to give $g$. For example, to solve $y' - ay = e^{bt}$, we try $f(t) = ce^{bt}$ and apply $L$:
$$(D - a)(ce^{bt}) = (b - a)ce^{bt} = e^{bt},$$
and we see that $c = 1/(b - a)$.

Again, to solve $y' - ay = \cos bt$, we observe that $\cos bt$ is in the null space of $S = D^2 + b^2$ and that this null space has the basis $\{\sin bt, \cos bt\}$. We therefore set $f(t) = c_1\sin bt + c_2\cos bt$ and solve $(D - a)f = \cos bt$, getting
$$(-ac_1 - bc_2)\sin bt + (bc_1 - ac_2)\cos bt = \cos bt,$$
$$-ac_1 - bc_2 = 0, \qquad bc_1 - ac_2 = 1,$$
and
$$f(t) = \frac{b}{a^2 + b^2}\sin bt - \frac{a}{a^2 + b^2}\cos bt.$$

When $L$ and $S$ do have factors in common, the situation is more complicated, but a similar procedure can be proved to work.
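The second example above amounts to solving a two-by-two linear system, and a short Python sketch can confirm the result. The function names `particular_solution` and `residual` are our own choices for illustration; the system is solved by Cramer's rule, whose determinant is $a^2 + b^2$, and the derivative of the trial function is written out by hand rather than approximated.

```python
import math

def particular_solution(a, b):
    """Coefficients (c1, c2) of f = c1 sin bt + c2 cos bt with f' - a f = cos bt.
    Matching the sin and cos terms gives
        -a c1 - b c2 = 0
         b c1 - a c2 = 1
    which Cramer's rule solves with determinant a^2 + b^2."""
    det = a * a + b * b
    if det == 0:
        raise ValueError("a and b must not both be zero")
    return b / det, -a / det

def residual(a, b, t):
    """f'(t) - a f(t) - cos(bt) for the trial solution; should be about zero."""
    c1, c2 = particular_solution(a, b)
    f = c1 * math.sin(b * t) + c2 * math.cos(b * t)
    df = b * c1 * math.cos(b * t) - b * c2 * math.sin(b * t)
    return df - a * f - math.cos(b * t)
```

A call such as `residual(2.0, 3.0, 0.7)` evaluates how well the trial solution satisfies the equation at one point; it should vanish up to rounding error.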
Now an extra factor $t^i$ must be introduced, where $i$ is the number of occurrences of the common factor in $L$. For example, in solving $(D - r)^2f = e^{rt}$, we have $S \circ L = (D - r)^3$, and so we must set $f(t) = ct^2e^{rt}$. Our equation then becomes
$$(D - r)^2ct^2e^{rt} = 2ce^{rt} = e^{rt},$$
and so $c = \tfrac12$. For $(D^2 + 1)f = \sin t$ we have to set $f(t) = t(c_1\sin t + c_2\cos t)$, and after we work it out we find that $c_1 = 0$ and $c_2 = -\tfrac12$, so that $f = -\tfrac12 t\cos t$.
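Both computations can be checked mechanically. The sketch below rests on the identity $(D - r)\bigl(q(t)e^{rt}\bigr) = q'(t)e^{rt}$ for a polynomial $q$, so that $(D - r)^m$ applied to $t^me^{rt}$ leaves $m!\,e^{rt}$ and the undetermined coefficient is $1/m!$. The names `apply_D_minus_r`, `resonance_coefficient`, and `residual_sin` are assumptions made for this illustration, and the second derivative in the trigonometric check is worked out by hand.

```python
import math

def apply_D_minus_r(poly):
    """(D - r) applied to q(t) e^{rt} gives q'(t) e^{rt}.
    poly lists the coefficients of q, lowest degree first; the result lists q'."""
    return [k * coeff for k, coeff in enumerate(poly)][1:]

def resonance_coefficient(m):
    """The c for which (D - r)^m (c t^m e^{rt}) = e^{rt}."""
    poly = [0.0] * m + [1.0]          # q(t) = t^m
    for _ in range(m):
        poly = apply_D_minus_r(poly)  # ends as the constant m!
    return 1.0 / poly[0]

def residual_sin(t):
    """f'' + f - sin t for f = -(1/2) t cos t, using f'' = sin t + (1/2) t cos t."""
    f = -0.5 * t * math.cos(t)
    d2f = math.sin(t) + 0.5 * t * math.cos(t)
    return d2f + f - math.sin(t)
```

Here `resonance_coefficient(2)` reproduces the value $c = \tfrac12$ found above, and `residual_sin` vanishes up to rounding at every $t$.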
This procedure, called, naturally, the method of undetermined coefficients, violates our philosophy about a solution process being a linear right inverse. Indeed, it is not a single process, applicable to any $g$ occurring on the right, but varies with the operator $S$. However, when it is available, it is the easiest way to compute explicit solutions.

We describe next a general theoretical method, called variation of parameters, that is a right inverse to $L$ and does therefore apply to every $g$. Moreover, it inverts the general (variable coefficient) linear $n$th-order operator $L$:
$$(Lf)(t) = \sum_0^n a_i(t)f^{(i)}(t).$$
We are assuming that we know the null space $N$ of $L$; that is, we assume known $n$ linearly independent solutions $\{u_i\}_1^n$ of the homogeneous equation $Lf = 0$. What we are going to do is to translate into this context our formula $K_t\int_0^t K_s^{-1}(\mathbf{g}(s))\,ds$ for the solution to $d\alpha/dt = T_t(\alpha) + \mathbf{g}(t)$. Since $\psi\colon f \mapsto \langle f, f', \ldots, f^{(n-1)}\rangle$ is an isomorphism from the solution space $N$ of the $n$th-order equation $L(f) = 0$ to the solution space $\mathbf{N}$ of the equivalent first-order system $d\mathbf{x}/dt = T_t(\mathbf{x})$, it follows that if we have a basis $\{u_i\}_1^n$ for $N$, then the columns of the matrix $w_{ij} = u_j^{(i-1)}$ form a basis for $\mathbf{N}$. Let $\mathbf{w}(t)$ be the matrix $w_{ij}(t) = u_j^{(i-1)}(t)$. Since evaluation at $t$ is the isomorphism $\varphi_t$ from $\mathbf{N}$ to $\mathbb{R}^n$, the columns of $\mathbf{w}(t)$ form a basis for $\mathbb{R}^n$, for each $t$. But $K_t(\alpha)$ is the value at $t$ of the solution of $d\mathbf{x}/dt = T_t(\mathbf{x})$ through the initial point $\langle 0, \alpha\rangle$, and it follows that the linear transformation $K_t$ takes the columns of the matrix $\mathbf{w}(0)$ to the corresponding columns of $\mathbf{w}(t)$.
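The matrix for $K_t$ is therefore $\mathbf{w}(t)\cdot\mathbf{w}(0)^{-1}$, and the matrix form of our formula
$$\mathbf{f}(t) = K_t\int_0^t K_s^{-1}(\mathbf{g}(s))\,ds$$
is therefore
$$\mathbf{f}(t) = \mathbf{w}(t)\cdot\mathbf{w}(0)^{-1}\cdot\int_0^t \mathbf{w}(0)\cdot\mathbf{w}(s)^{-1}\cdot\mathbf{g}(s)\,ds.$$
Moreover, since integration commutes with the application of a constant linear transformation (here multiplication by a constant matrix), the middle $\mathbf{w}(0)$ factors cancel, and we have the result that
$$\mathbf{f}(t) = \mathbf{w}(t)\cdot\int_0^t \mathbf{w}(s)^{-1}\cdot\mathbf{g}(s)\,ds$$
is the solution of $d\mathbf{x}/dt = T_t(\mathbf{x}) + \mathbf{g}(t)$ which passes through $\langle 0, 0\rangle$. Finally, set $\mathbf{k}(s) = \mathbf{w}(s)^{-1}\cdot\mathbf{g}(s)$, so that this solution formula splits into the pair
$$\mathbf{f}(t) = \mathbf{w}(t)\cdot\int_0^t \mathbf{k}(s)\,ds, \qquad \mathbf{w}(s)\cdot\mathbf{k}(s) = \mathbf{g}(s).$$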
Now we want to solve the inhomogeneous $n$th-order equation $L(f) = g$, and this means solving the first-order system with $\mathbf{g} = \langle 0, \ldots, 0, g\rangle$. Therefore, the second equation above is equivalent to
$$\sum_j w_{ij}(s)k_j(s) = 0, \quad i < n,$$
$$\sum_j w_{nj}(s)k_j(s) = g(s).$$
Moreover, the solution $f$ of the $n$th-order equation is the first component of the $n$-tuple $\mathbf{f}$ (that is, $f = \psi^{-1}\mathbf{f}$), and so we end up with
$$f(t) = \sum_{j=1}^n w_{1j}(t)\int_0^t k_j(s)\,ds = \sum_1^n u_j(t)c_j(t),$$
where $c_i(t)$ is the antiderivative $\int_0^t k_i(s)\,ds$. Any other antiderivative would do as well, since the difference between the two resulting formulas is of the form $\sum_i a_iu_i(t)$, a solution of the homogeneous equation $L(f) = 0$. We have proved the following theorem.

Theorem 5.1. If $\{u_i(t)\}_1^n$ is a basis for the solution space of the homogeneous equation $L(h) = 0$, and if $f(t) = \sum_1^n c_i(t)u_i(t)$, where the derivatives $c_i'(t)$ are determined as the solutions of the equations
$$\sum_i c_i'(t)u_i^{(j)}(t) = 0, \quad j = 0, \ldots, n - 2,$$
$$\sum_i c_i'(t)u_i^{(n-1)}(t) = g(t),$$
then $L(f) = g$.

We now consider a simple example of this method. The equation $y'' + y = \sec x$ has constant coefficients, and we can therefore easily find the null space of the homogeneous equation $y'' + y = 0$. A basis for it is $\{\sin x, \cos x\}$. But we can't use the method of undetermined coefficients, because $\sec x$ is not a solution of a constant coefficient equation. We therefore try for a solution
$$v(x) = c_1(x)\sin x + c_2(x)\cos x.$$
Our system of equations to be solved is
$$c_1'\sin x + c_2'\cos x = 0,$$
$$c_1'\cos x - c_2'\sin x = \sec x.$$
Thus $c_2' = -c_1'\tan x$ and $c_1'(\cos x + \sin x\tan x) = \sec x$, giving
$$c_1' = 1, \quad c_2' = -\tan x, \quad c_1 = x, \quad c_2 = \log\cos x,$$
and
$$v(x) = x\sin x + (\log\cos x)\cos x.$$
(Check it!)
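The check can also be made numerically. The following sketch carries out the second-order case of Theorem 5.1 under stated assumptions: the leading coefficient of $L$ is 1, the base point is 0, the functions `simpson` and `variation_of_parameters` are names of our own, and the antiderivatives are approximated by Simpson's rule. The two equations for $c_1'$ and $c_2'$ are solved by Cramer's rule with determinant $W = u_1u_2' - u_2u_1'$.

```python
import math

def simpson(fn, lo, hi, steps=2000):
    """Composite Simpson approximation of the integral of fn over [lo, hi]."""
    h = (hi - lo) / steps
    total = fn(lo) + fn(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * fn(lo + k * h)
    return total * h / 3

def variation_of_parameters(u1, du1, u2, du2, g):
    """Particular solution f = c1 u1 + c2 u2 of y'' + a1 y' + a0 y = g, where
        c1' u1  + c2' u2  = 0
        c1' u1' + c2' u2' = g
    and c1(0) = c2(0) = 0.  u1, u2 must span the homogeneous solutions."""
    def wronskian(s):
        return u1(s) * du2(s) - u2(s) * du1(s)

    def c1_prime(s):
        return -u2(s) * g(s) / wronskian(s)

    def c2_prime(s):
        return u1(s) * g(s) / wronskian(s)

    def f(x):
        return simpson(c1_prime, 0.0, x) * u1(x) + simpson(c2_prime, 0.0, x) * u2(x)

    return f

# The example from the text: y'' + y = sec x with basis {sin x, cos x}.
f = variation_of_parameters(
    math.sin, math.cos,
    math.cos, lambda s: -math.sin(s),
    lambda s: 1.0 / math.cos(s),
)
```

For $x$ in $(-\pi/2, \pi/2)$ the value `f(x)` should agree with $x\sin x + (\log\cos x)\cos x$ to within the accuracy of the numerical integration.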
This is all we shall say about the process of finding solutions. In cases where everything works we have complete control of the solutions of $L(f) = g$, and we can then solve the initial-value problem. If $L$ has order $n$, then we know that the null space $N$ is $n$-dimensional, and if for a given $g$ the function $v$ is one solution of the inhomogeneous equation $L(f) = g$, then the set of all solutions is the $n$-dimensional plane (affine subspace) $M = N + v$. If we have found a basis $\{u_i\}_1^n$ for $N$, then every solution of $L(f) = g$ is of the form $f = \sum_1^n c_iu_i + v$. The initial-value problem is the problem of finding $f$ such that $L(f) = g$ and
$$f(t_0) = \alpha_1^0,\ f'(t_0) = \alpha_2^0,\ \ldots,\ f^{(n-1)}(t_0) = \alpha_n^0,$$
where $\langle\alpha_1^0, \ldots, \alpha_n^0\rangle = \alpha^0$ is the given initial value. We can now find this unique $f$ by using these $n$ conditions to determine the $n$ coefficients $c_i$ in $f = \sum c_iu_i + v$. We get $n$ equations in the $n$ unknowns $c_i$. Our ability to solve this problem uniquely again comes back to the fact that the matrix $w_{ij}(t_0) = u_j^{(i-1)}(t_0)$ is nonsingular, as did our success in carrying out the variation of parameters process.

We conclude this section by discussing a very simple and important example. When a perfectly elastic spring is stretched or compressed, it resists with a "restoring" force proportional to its deformation. If we picture a coiled spring lying along the $x$-axis, with one end fixed and the free end at the origin when undisturbed (Fig. 6.3), then when the coil is stretched a distance $x$ (compression being negative stretching), the force it exerts is $-cx$, where $c$ is a constant representing the stiffness, or elasticity, of the spring, and the minus sign shows that the force is in the direction opposite to the displacement. This is Hooke's law.

Fig. 6.3

Suppose that we attach a point mass $m$ to the free end of the spring, pull the spring out to an initial position $x_0 = a$, and let go.
The reader knows perfectly well that the system will then oscillate, and we want to describe its vibration explicitly. We disregard the mass of the spring itself (which amounts to adjusting $m$), and for the moment we suppose that friction is zero, so that the system will oscillate forever. Newton's law says that if the force $F$ is applied to the mass $m$, then the particle will accelerate according to the equation
$$m\frac{d^2x}{dt^2} = F.$$
Here $F = -cx$, so the equation combining the laws of Newton and Hooke is
$$m\frac{d^2x}{dt^2} + cx = 0.$$
This is almost the simplest constant coefficient equation, and we know that the general solution is
$$x = c_1\sin\Omega t + c_2\cos\Omega t,$$
where $\Omega = \sqrt{c/m}$. Our initial condition was that $x = a$ and $x' = 0$ when $t = 0$. Thus $c_2 = a$ and $c_1 = 0$, so $x = a\cos\Omega t$. The particle oscillates forever between $x = -a$ and $x = a$. The maximum displacement $a$ is called the amplitude $A$ of the oscillation. The number of complete oscillations per unit time is called the frequency $f$, so $f = \Omega/2\pi = \sqrt{c}/2\pi\sqrt{m}$. This is the quantitative expression of the intuitively clear fact that the frequency will increase with the stiffness $c$ and decrease as the mass $m$ increases.

Other initial conditions are equally reasonable. We might consider the system originally at rest and strike it, so that we start with an initial velocity $v$ and an initial displacement 0 at time $t = 0$. Now $c_2 = 0$ and $x = c_1\sin\Omega t$. In order to evaluate $c_1$, we remember that $dx/dt = v$ at $t = 0$, and since $dx/dt = c_1\Omega\cos\Omega t$, we have $v = c_1\Omega$ and $c_1 = v/\Omega$, the amplitude for this motion. In general, the initial condition would be $x = a$ and $x' = v$ when $t = 0$, and the unique solution thus determined would involve both terms of the general solution, with amplitude to be calculated.

The situation is both more realistic and more interesting when friction is taken into account. Frictional resistance is ideally a force proportional to the velocity $dx/dt$ but again with a negative sign, since its direction is opposite to that of the motion. Our new equation is thus
$$m\frac{d^2x}{dt^2} + k\frac{dx}{dt} + cx = 0,$$
and we know that the system will act in quite different ways depending on the relationship among the constants $m$, $k$, and $c$. The reader will be asked to explore these equations further in the exercises.

It is extraordinary that exactly the same equation governs a freely oscillating electric circuit. It is now written
$$L\frac{d^2x}{dt^2} + R\frac{dx}{dt} + \frac{x}{C} = 0,$$
where $L$, $R$, and $C$ are the inductance, resistance, and capacitance of the circuit, respectively, and $dx/dt$ is the current. However, the ordinary operation of such a circuit involves forced rather than free oscillation. An alternating (sinusoidal) voltage is applied as an extra, external, "force" term, and the equation is now
$$L\frac{d^2x}{dt^2} + R\frac{dx}{dt} + \frac{x}{C} = a\sin\omega t.$$
This shows the most interesting behavior of all. Using the method of undetermined coefficients, we find that the solution contains transient terms that die away, contributed by the homogeneous equation, and a permanent part of frequency $\omega/2\pi$, arising from the inhomogeneous term $a\sin\omega t$. New phenomena called phase and resonance now appear, as the reader will discover in the exercises.
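For the frictionless spring treated earlier in this section, the determination of $c_1$ and $c_2$ from general initial conditions is easily mechanized. The sketch below is illustrative only: the function name `spring_motion` and its argument names are ours, and the amplitude is taken to be $\sqrt{c_1^2 + c_2^2}$, the standard amplitude of a combination $c_1\sin\Omega t + c_2\cos\Omega t$.

```python
import math

def spring_motion(m, c, a, v):
    """Solution of m x'' + c x = 0 with x(0) = a and x'(0) = v.
    Returns the function x(t), the amplitude, and the frequency."""
    omega = math.sqrt(c / m)          # Omega = sqrt(c/m)
    c1 = v / omega                    # from x'(0) = c1 * Omega = v
    c2 = a                            # from x(0) = c2 = a
    amplitude = math.hypot(c1, c2)    # sqrt(c1^2 + c2^2)
    frequency = omega / (2 * math.pi)

    def x(t):
        return c1 * math.sin(omega * t) + c2 * math.cos(omega * t)

    return x, amplitude, frequency

# Hypothetical values: unit mass, stiffness 4, released at x = 3 with velocity 8.
x, amplitude, frequency = spring_motion(m=1.0, c=4.0, a=3.0, v=8.0)
```

With these values $\Omega = 2$, so $c_1 = 4$ and $c_2 = 3$, and the two special cases in the text are recovered by setting $v = 0$ or $a = 0$.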
5.1 5.4 5.6 5.7 x" — X = t"^ 5.2 x'' — X = sin x^^ -\- X = sin t 5.5 y^^ — y^ = x^ y" -y' = e- Consider the equation y" -^ y = sec x thj 6.5 SOLVING THE INHOMOGENEOUS EQUATION 293 EXERCISES Find particular solutions of the following equations. 5.3 x'' — X = sin t + t"^ (Here y^ = dy/dx.) sec X that was solved in the text. To what interval / must we limit our discussion? Check that the particular solution found in the text is correct. Solve the initial-value problem for f\x)+f{x) = sec:c, /(O) = 1, f{0) = -1. Solve the following equations by variation of parameters. 5.8 x" +x = tan t 5.9 x'" -^ x' = t 5.10 y" -^ y = I 5.11 2/^^^ — 2/ = cos X 5.12 y" +42/ = sec 2x 5.13 y" + 4^/ = sec x 5.14 Show that the general solution Ci sin ^t + C2 cos ^t of the frictionless elastic equation m(dH/dt^) -\- ex = 0 can be rewritten in the form A sin {^t — a). (Remember that sin (x — y) = sin x cos y — cos x sin y.) This type of motion along a line is called simple harmonic motion. 5.15 In the above exercise express A and a in terms of the initial values dx/dt = v and X = a when t = 0. 5.16 Consider now the freely vibrating system with friction taken into account, and therefore having the equation m(dH/dt^) + k (dx/dt) + ex = 0, all coefficients being positive. Show that if k^ < 4mc, then the system oscillates forever, but with amplitude decreasing exponentially. Determine the frequency of oscillation. Use Exercise 5.14 to simplify the solution, and sketch its graph. 5.17 Show that if the frictional force is sufficiently large (k^ > 4mc), then a freely vibrating system does not in fact vibrate. Taking the simplest case k^ = 4mc, sketch the behavior of the system for the initial condition dx/dt = 0 and x = a when ^ = 0. Do the same for the initial condition dx/dt = v and a; = 0 when ^ = 0. 5.18 Use the method of undetermined coefficients to find a particular solution of the equation of the driven electric circuit d X dx X L-Z-Z-+ R-^+- = asinco^. 
dt^ dt c Assuming that R > 0, show by a general argument that your particular solution is in fact the steady-state part (the part without exponential decay) of the general solution.
5.19 In the above exercise show that the "current" $dx/dt$ for your solution can be written in the form
$$\frac{dx}{dt} = \frac{a}{\sqrt{R^2 + X^2}}\sin(\omega t - \alpha),$$
where $X = L\omega - 1/\omega C$. Here $\alpha$ is called the phase angle.
5.20 Continuing our discussion, show that the current flowing in the circuit will have a maximum amplitude when the frequency of the "impressed voltage" $a\sin\omega t$ is $1/2\pi\sqrt{LC}$. This is the phenomenon of resonance. Show also that the current is in phase with the impressed voltage (i.e., that $\alpha = 0$) if and only if $X = 0$.
5.21 What is the condition that the phase $\alpha$ be approximately $90°$? $-90°$?
5.22 In the theory of a stable equilibrium point in a dynamical system we end up with two scalar products $(\xi, \eta)$ and $((\xi, \eta))$ on a finite-dimensional vector space $V$, the quadratic form $q(\xi) = \tfrac12((\xi, \xi))$ being the potential energy and $p(\xi') = \tfrac12(\xi', \xi')$ being the kinetic energy. Now we know that $dq_\alpha(\xi) = ((\alpha, \xi))$ and similarly for $p$, and because of this fact it can be shown that the Lagrangian equations can be written
$$\frac{d}{dt}\Bigl(\frac{d\xi}{dt}, \eta\Bigr) = -((\xi, \eta)).$$
Prove that a basis $\{\beta_i\}_1^n$ can be found for $V$ such that this vector equation becomes the system of second-order equations
$$\frac{d^2x_i}{dt^2} = -\lambda_ix_i, \quad i = 1, \ldots, n,$$
where the constants $\lambda_i$ are positive. Show therefore that the motion of the system is the sum of $n$ linearly independent simple harmonic motions.

6. THE BOUNDARY-VALUE PROBLEM

We now turn to a problem which seems to be like the initial-value problem but which turns out to be of a wholly different character. Suppose that $T$ is a second-order operator, which we consider over a closed interval $[a, b]$. Some of the most important problems in physics require us to find solutions to $T(f) = g$ such that $f$ has given values at $a$ and $b$, instead of $f$ and $f'$ having given values at a single point $t_0$. This new problem is called a boundary-value problem, because $\{a, b\}$ is the boundary of the domain $I = [a, b]$.
The boundary-value problem, like the initial-value problem, breaks neatly into two subproblems if the set
$$M = \{f \in \mathcal{C}^2([a, b]) : f(a) = f(b) = 0\}$$
turns out to be a complement of the null space $N$ of $T$. However, if the reader will consider this general question for a moment, he will realize that he doesn't have a clue to it from our initial-value development, and, in fact, wholly new tools have to be devised.

Our procedure will be to forget that we are trying to solve the boundary-value problem and instead to speculate on the nature of a linear differential
operator $T$ from the point of view of scalar products and the theory of self-adjoint operators. That is, our present study of $T$ will be by means of the scalar product $(f, g) = \int_a^b f(t)g(t)\,dt$, the general problem being the usual one of solving $Tf = g$ by finding a right inverse $S$ of $T$. Also, as usual, $S$ may be determined by finding a complement $M$ of $N(T)$. Now, however, it turns out that if $T$ is "formally self-adjoint", then suitable choices of $M$ will make the associated right inverses $S$ self-adjoint and compact, and the eigenvectors of $S$, computed as those solutions of the homogeneous equation $Tf - rf = 0$ which lie in $M$, then allow (relatively) the same easy handling of $S$, by virtue of Theorem 5.1 of Chapter 5, that they gave us earlier in the finite-dimensional situation.

We first consider the notion of "formal adjoint" for an $n$th-order linear differential operator $T$. The ordinary formula for integration by parts,
$$\int_a^b f'g = fg\Big]_a^b - \int_a^b fg',$$
allows the derivatives of $f$ occurring in the scalar product $(Tf, g)$ to be shifted one at a time to $g$. At the end, $f$ is undifferentiated and $g$ is acted on by a certain $n$th-order linear differential operator $R$. The endpoint evaluations, like the above $fg]_a^b$, that accumulate step by step can be described as $B(f, g)]_a^b$ with
$$B(f, g) = \sum_{0 \le i+j < n} k_{ij}(x)f^{(i)}(x)g^{(j)}(x),$$
where the coefficient functions $k_{ij}(x)$ are linear combinations of the coefficient functions $a_i(x)$ and their derivatives. Thus
$$(Tf, g) = (f, Rg) + B(f, g)\Big]_a^b.$$
The operator $R$ is called the formal adjoint of $T$, and if $R = T$, we say that $T$ is formally self-adjoint.

Every application of the integration by parts formula introduces a sign change, and the reader may be able to see that the leading coefficient of $R$ is $(-1)^n$ times the leading coefficient of $T$. Assuming this, we see that a necessary condition for formal self-adjointness is that $n$ be even, so that $R$ and $T$ have the same first terms.
Supposing that $T$ is formally self-adjoint, we seek a complement $M$ of the null space $N$ of $T$ in $\mathcal{C}^n([a, b])$ with the further property that $S$, the associated right inverse of $T$, is self-adjoint as a mapping from the pre-Hilbert space $\mathcal{C}^0([a, b])$ to itself. Let us see what this further requirement amounts to. For any $u, v \in \mathcal{C}^0$, set $f = Su$ and $g = Sv$, so that $f$ and $g$ are in $M$ and $u = Tf$, $v = Tg$. Then
$$(u, Sv) = (Tf, g) = (f, Tg) + B(f, g)\Big]_a^b = (Su, v) + B(f, g)\Big]_a^b.$$
We thus have:

Lemma 6.1. If $T$ is a formally self-adjoint differential operator and $M$ is a complement of the null space of $T$, then the right inverse of $T$ determined by $M$ is self-adjoint if and only if
$$f, g \in M \implies B(f, g)\Big]_a^b = 0.$$
From now on we shall consider only the second-order case. However, almost everything that we are going to do works perfectly well for the general case, the price of generality being only additional notational complexity. We start by computing the formal adjoint of the second-order operator $Tf = c_2 f'' + c_1 f' + c_0 f$. We have
$$(Tf, g) = \int_a^b c_2 f''g + \int_a^b c_1 f'g + \int_a^b c_0 fg,$$
$$\int_a^b c_1 f'g = c_1 fg\,\Big]_a^b - \int_a^b f(c_1 g)',$$
$$\int_a^b c_2 f''g = c_2 f'g\,\Big]_a^b - \int_a^b f'(c_2 g)' = c_2 f'g\,\Big]_a^b - f(c_2 g)'\,\Big]_a^b + \int_a^b f(c_2 g)'',$$
giving
$$Rg = (c_2 g)'' - (c_1 g)' + c_0 g$$
and
$$B(f, g) = c_2(f'g - g'f) + (c_1 - c_2')fg.$$
Thus $Rg = c_2 g'' + (2c_2' - c_1)g' + (c_2'' - c_1' + c_0)g$, and $R = T$ if and only if $2c_2' - c_1 = c_1$ (and $c_2'' - c_1' = 0$), that is, $c_2' = c_1$. We have proved:

Lemma 6.2. The second-order differential operator $T$ is formally self-adjoint if and only if
$$Tf = c_2 f'' + c_2' f' + c_0 f = (c_2 f')' + c_0 f,$$
in which case $B(f, g) = c_2(f'g - g'f)$.

A constant coefficient operator is thus formally self-adjoint if and only if $c_1 = 0$.

Supposing that $T$ is formally self-adjoint, we now try to find a complement $M$ of its null space $N$ such that $f, g \in M \implies B(f, g)]_a^b = 0$. Since $N$ is two-dimensional, any complement $M$ can be described as the intersection of the null spaces of two linear functionals $l_1$ and $l_2$ on $X_2 = \mathcal{C}^2([a, b])$. For example, the "one-point" complement $M_{t_0}$ that we had earlier in connection with the initial-value problem is the intersection of the null spaces of the two functionals $l_1(f) = f(t_0)$ and $l_2(f) = f'(t_0)$. Here, however, the vanishing of $l_1$ and $l_2$ for two functions $f$ and $g$ must imply that $B(f, g)]_a^b = c_2(f'g - g'f)]_a^b = 0$, and the functionals $l_i(f)$ must therefore involve the values of $f$ and $f'$ at $a$ and at $b$. We would naturally guess, and it can be proved, that each of $l_1$ and $l_2$ must be of the form
$$l(f) = k_1 f(a) + k_2 f'(a) + k_3 f(b) + k_4 f'(b).$$
Our problem can therefore be restated as follows. We must find two linear functionals $l_1$ and $l_2$ of the above general form such that if $M$ is the intersection
of their null spaces, then

a) $M$ is a complement of $N$, and
b) $f, g \in M \implies c_2(f'g - g'f)]_a^b = 0$,

in which case we call the boundary condition $l_1(f) = l_2(f) = 0$ self-adjoint.

Lemma 6.3. We can replace (a) by

a') $T$ is injective on $M$.

Proof. If $T$ is injective on $M$, then $M \cap N = \{0\}$, so that the map $f \mapsto \langle l_1(f), l_2(f) \rangle$ is injective on $N$, and therefore, because $N$ is two-dimensional, is an isomorphism from $N$ to $\mathbb{R}^2$ (by the corollary of Theorem 2.4, Chapter 2). Then $M$ is a complement of $N$ by Theorem 5.3 of Chapter 1. □

Now we can easily write down various pairs $l_1$ and $l_2$ that form a self-adjoint boundary condition. We list some below.

1) $f \in M \iff f(a) = f(b) = 0$ [that is, $l_1(f) = f(a)$ and $l_2(f) = f(b)$].
2) $f \in M \iff f'(a) = f'(b) = 0$.
3) More generally, $f'(a) = kf(a)$, $f'(b) = cf(b)$. (In fact, $l_1$ can be any $l$ that depends only on the values at $a$, and $l_2$ can be any $l$ that depends only on $b$. Thus $l_1(f) = k_1 f(a) + k_2 f'(a)$, and if $l_1(f) = l_1(g) = 0$, then the pairs $\langle f(a), f'(a) \rangle$ and $\langle g(a), g'(a) \rangle$ are dependent, since both lie in the one-dimensional null space of $l_1$, and so $(f'g - g'f)(a) = 0$. The same holds for $l_2$ and $b$, so that this split pair of endpoint conditions makes $B(f, g)]_a^b = 0$ by making the values of $B$ at $a$ and at $b$ separately 0.)
4) If $c_2(a) = c_2(b)$, then take $f \in M \iff f(a) = f(b)$ and $f'(a) = f'(b)$. That is, $l_1(f) = f(a) - f(b)$ and $l_2(f) = f'(a) - f'(b)$.

We now show that in every case but (3) the condition (a') also holds if we replace $T$ by $T - \lambda$ for a suitable $\lambda$. This is true also for case (3), but the proof is harder, and we shall omit it.

Lemma 6.4. Suppose that $M$ is defined by one of the self-adjoint boundary conditions (1), (2), or (4) above, that $c_2(t) \ge m > 0$ on $[a, b]$, and that $\lambda \ge c_0(t) + 1$ on $[a, b]$. Then
$$|((T - \lambda)f, f)| \ge m\|f'\|_2^2 + \|f\|_2^2$$
for all $f \in M$. In particular, $M$ is a complement of the null space of $T - \lambda$ and hence defines a self-adjoint right inverse of $T - \lambda$.
Proof. We have
$$((\lambda - T)f, f) = -\int_a^b (c_2 f')'f + \int_a^b (\lambda - c_0)f^2 = -c_2 f'f\,\Big]_a^b + \int_a^b c_2 (f')^2 + \int_a^b (\lambda - c_0)f^2.$$
Under any of conditions (1), (2), or (4), $c_2 f'f\,]_a^b = 0$, and the two integral terms are clearly bounded below by $m\|f'\|_2^2$ and $\|f\|_2^2$, respectively. Lemma 6.3 then implies that $M$ is a complement of the null space of $T - \lambda$. □

We come now to our main theorem. It says that the right inverse $S$ of $T - \lambda$ determined by the subspace $M$ above is a compact self-adjoint mapping of the pre-Hilbert space $\mathcal{C}^0([a, b])$ into itself, and is therefore endowed with all the rich eigenvalue structures of Theorem 5.1 of the last chapter. First we present some classical terminology. A Sturm-Liouville system on $[a, b]$ is a formally self-adjoint second-order differential operator $Tf = (c_2 f')' + c_0 f$ defined over the closed interval $[a, b]$, together with a self-adjoint boundary condition $l_1(f) = l_2(f) = 0$ for that interval. If $c_2(t)$ is never zero on $[a, b]$, the system is called regular. If $c_2(a)$ or $c_2(b)$ is zero, or if the interval $[a, b]$ is replaced by an infinite interval such as $[a, \infty)$, then the system is called singular.

Theorem 6.1. If $T$; $l_1$, $l_2$ is a regular Sturm-Liouville system on $[a, b]$, with $c_2$ positive, then the subspace $M$ defined by the homogeneous boundary condition is a complement of $N(T - \lambda)$ if $\lambda$ is taken sufficiently large, and the right inverse of $T - \lambda$ thus determined by $M$ is a compact self-adjoint mapping of the pre-Hilbert space $\mathcal{C}^0([a, b])$ into itself.

Proof. The proof depends on the inequality of the above lemma. Since we have proved this inequality only for boundary conditions (1), (2), and (4), our proof will be complete only for those cases. Set $g = (T - \lambda)f$. Since $\|g\|_2\|f\|_2 \ge |((T - \lambda)f, f)|$ by the Schwarz inequality, we see from the lemma first that $\|f\|_2^2 \le \|g\|_2\|f\|_2$, so that $\|f\|_2 \le \|g\|_2$, and then that $m\|f'\|_2^2 \le \|g\|_2\|f\|_2 \le \|g\|_2^2$, so that $\|f'\|_2 \le \|g\|_2/\sqrt{m}$.
We have already checked that the right inverse $S$ of the formally self-adjoint $T - \lambda$ defined by $M$ is self-adjoint, and it remains for us to show that the set $S[U] = \{f : \|g\|_2 \le 1\}$ has compact closure. For any such $f$ the Schwarz inequality and the above inequality imply that
$$|f(y) - f(x)| \le \int_x^y |f'| = \int_x^y |f'| \cdot 1 \le \|f'\|_2\,|y - x|^{1/2} \le \frac{|y - x|^{1/2}}{\sqrt{m}}.$$
Thus $S[U]$ is uniformly equicontinuous. Since the common domain of the functions in $S[U]$ is the compact set $[a, b]$, we will be able to conclude from Theorem 6.1 of Chapter 4 that the set $S[U]$ is totally bounded if we can show that there is a constant $C$ such that all the functions in $S[U]$ have their ranges in $[-C, C]$. Taking $y$ and $x$ in the last inequality where $|f|$ assumes its maximum and minimum values, we have $\|f\|_\infty - \min|f| \le (b - a)^{1/2}/\sqrt{m}$. But
$(\min|f|)(b - a)^{1/2} \le \|f\|_2 \le \|g\|_2 \le 1$, and therefore
$$\|f\|_\infty \le C = \frac{1}{(b - a)^{1/2}} + \frac{(b - a)^{1/2}}{\sqrt{m}}.$$
Thus $S[U]$ is a uniformly equicontinuous set of functions mapping the compact set $[a, b]$ into the compact set $[-C, C]$, and is therefore totally bounded in the uniform norm. Since $\mathcal{C}([a, b])$ is complete in the uniform norm, every sequence in $S[U]$ has a subsequence uniformly converging to some $f \in \mathcal{C}$, and since $\|f\|_2 \le (b - a)^{1/2}\|f\|_\infty$, this subsequence also converges to $f$ in the two-norm. We have thus shown that if $H$ is the pre-Hilbert space $\mathcal{C}([a, b])$ under the standard scalar product, then the image $S[U]$ of the unit ball $U \subset H$ under $S$ has the property that every sequence in $S[U]$ has a subsequence converging in $H$. This is the property we actually used in proving Theorem 5.1 of Chapter 5, but it is not quite the definition of the compactness of $S$, which requires us to show that the closure of $S[U]$ is compact in $H$. However, if $\{\xi_n\}$ is any sequence in this closure, then we can choose $\{\zeta_n\}$ in $S[U]$ so that $\|\xi_n - \zeta_n\| < 1/n$. The sequence $\{\zeta_n\}$ has a convergent subsequence $\{\zeta_{n(m)}\}_m$ as above, and then $\{\xi_{n(m)}\}_m$ converges to the same limit. Thus $S$ is a compact operator. □

Theorem 6.2. There exists an orthonormal sequence $\{\varphi_n\}$ consisting entirely of eigenvectors of $T$ and forming a basis for $M$. Moreover, the Fourier expansion of any $f \in M$ with respect to the basis $\{\varphi_n\}$ converges uniformly to $f$ (as well as in the two-norm).

Proof. By Theorem 5.1 of Chapter 5 there exists an eigenbasis for the range of $S$, which is $M$. Since $S\varphi_n = r_n\varphi_n$ for some nonzero $r_n$, we have $(T - \lambda)(r_n\varphi_n) = \varphi_n$ and $T\varphi_n = ((1 + \lambda r_n)/r_n)\varphi_n$. The uniformity of the series convergence comes out of the following general consideration.

Lemma 6.5. Suppose that $T$ is a self-adjoint operator on a pre-Hilbert space $V$ and that $T$ is compact as a mapping from $V$ to $\langle V, q \rangle$, where $q$ is a second norm on $V$ that dominates the scalar product norm $p$ ($p \le cq$).
Then $T$ is compact (from $p$ to $p$), and the eigenbasis expansion $\sum b_n\varphi_n$ of an element $\beta$ in the range of $T$ converges to $\beta$ in both norms.

Proof. Let $U$ be the unit ball of $V$ in the scalar product norm. By the hypothesis of the lemma, the $q$-closure $B$ of $T[U]$ is compact. $B$ is then also $p$-compact, for any sequence in it has a $q$-convergent subsequence which also $p$-converges to the same limit, because $p \le cq$. We can therefore apply the eigenbasis theorem. Now let $\alpha$ and $\beta = T(\alpha)$ have the Fourier series $\sum a_i\varphi_i$ and $\sum b_i\varphi_i$, and let $T(\varphi_i) = r_i\varphi_i$. Then $b_i = r_i a_i$, because
$$b_i = (T(\alpha), \varphi_i) = (\alpha, T(\varphi_i)) = (\alpha, r_i\varphi_i) = r_i(\alpha, \varphi_i) = r_i a_i.$$
Since the sequence of partial sums $\sum_1^n a_i\varphi_i$ is $p$-bounded (Bessel's inequality), the sequence $\{\sum_1^n b_i\varphi_i\} = \{T(\sum_1^n a_i\varphi_i)\}$ is totally $q$-bounded. Any subsequence of it therefore has a subsubsequence $q$-converging to some element $\gamma$ in $V$. Since it then $p$-converges to $\gamma$, $\gamma$ must be $\beta$. Thus every subsequence has a subsubsequence $q$-converging to $\beta$, and so $\{\sum_1^n b_i\varphi_i\}$ itself $q$-converges to $\beta$ by Lemma 4.1 of Chapter 4. □
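The relation between the eigenvalues $r_n$ of $S$ and those of $T$ in the proof of Theorem 6.2 can be checked on a concrete system. The sketch below assumes $T = D^2$ on $[0, \pi]$ with boundary condition $f(0) = f(\pi) = 0$, whose eigenfunctions $\sin nx$ are worked out in the Fourier series section that follows. The value $\lambda = 1$, the step size, and the sample point are choices made here for illustration.

```python
import math

lam = 1.0  # Lemma 6.4 asks for lam >= c_0 + 1; here c_0 = 0 because T = D^2

def r(n):
    """Eigenvalue of S on sin(nx): (T - lam) sin(nx) = -(n^2 + lam) sin(nx)."""
    return -1.0 / (n * n + lam)

def residual(n, x=0.7, h=1e-3):
    """((T - lam) S phi_n)(x) - phi_n(x), using a central second difference for D^2."""
    s_phi = lambda t: r(n) * math.sin(n * t)          # claimed value of S(phi_n)
    second_difference = (s_phi(x - h) - 2.0 * s_phi(x) + s_phi(x + h)) / (h * h)
    return second_difference - lam * s_phi(x) - math.sin(n * x)

for n in (1, 2, 5, 10):
    eigenvalue_of_T = (1.0 + lam * r(n)) / r(n)       # formula from Theorem 6.2
    print(n, r(n), eigenvalue_of_T, residual(n))
```

The printed eigenvalues of $T$ should be close to $-n^2$, the residuals close to zero, and the $r_n$ should tend to 0, which is the behaviour expected of the eigenvalues of a compact operator.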
EXERCISES

6.1 Given that $Tf(x) = xf'(x) + f(x)$ and $Sf(x) = f'(x)$, compute $T \circ S$ and $S \circ T$.

6.2 Show that the differential operators $T = aD$ and $S = bD$ commute if and only if the functions $a(x)$ and $b(x)$ are proportional.

6.3 Show that the differential operators $T = aD^2$ and $S = bD$ commute if and only if $b(x)$ is a first-degree polynomial $b(x) = cx + d$ and $a(x) = k(b(x))^2$.

6.4 Compute the formal adjoint $S$ of $T$ if a) $Tf = f'$, b) $Tf = f''$, c) $Tf = f'''$, d) $(Tf)(x) = xf'(x)$, e) (Tf)(x) = x^f\x).

6.5 Let $S$ and $T$ be linear differential operators of orders $m$ and $n$, respectively. What are the coefficient conditions for $S \circ T$ to be a linear differential operator of order $m + n$?

6.6 Let $T$ be the second-order linear differential operator
$$(Tf)(t) = a_2(t)f''(t) + a_1(t)f'(t) + a_0(t)f(t).$$
What are the conditions on its coefficient functions for its formal adjoint to exist? What are these conditions for $T$ of order $n$?

6.7 Let $S$ and $T$ be linear differential operators of order $m$ and $n$, respectively, and suppose that all coefficients are $\mathcal{C}^\infty$-functions (infinitely differentiable). Prove that $S \circ T - T \circ S$ is of order $\le m + n - 1$.

6.8 A $\delta$-blip is a continuous nonnegative function $\varphi$ such that $\varphi = 0$ outside of $[-\delta, \delta]$ and $\int_{-\delta}^{\delta} \varphi = 1$ (Fig. 6.4). We assume that there exists an infinitely differentiable 1-blip $\varphi$. Show that there exists an infinitely differentiable $\delta$-blip for every $\delta > 0$. Define what you would mean by a $\delta$-blip centered at $x$, and show that one exists.

Fig. 6.4

6.9 Let $f$ be a continuous function on $[a, b]$ such that $(f, g) = \int_a^b fg = 0$ whenever $g$ is an infinitely differentiable function which vanishes near $a$ and $b$. Show that $f = 0$. (Use the above exercise.)

6.10 Let $\mathcal{C}^\infty([a, b])$ be the vector space of infinitely differentiable functions on $[a, b]$, and let $T$ be a second-order linear differential operator with coefficients in $\mathcal{C}^\infty$:
$$(Tf)(t) = a_2(t)f''(t) + a_1(t)f'(t) + a_0(t)f(t).$$
Let $S$ be a linear operator on $\mathcal{C}^\infty([a, b])$ such that $(Tf, g) - (f, Sg) = K(f, g)$ is a bilinear functional depending only on the values of $f$, $g$, $f'$, and $g'$ at $a$ and $b$. Prove that $S$ is the formal adjoint of $T$. [Hint: Take $f$ to be a $\delta$-blip centered at $x$. Then $K(f, g) = 0$. Now try to work the assertion to be proved into a form to which the above exercise can be applied.]
6.11 Prove an $n$th-order generalization of the above exercise.

6.12 Let $X$ be the space of linear differential operators with $\mathcal{C}^\infty$-coefficients, and let $A_T$ be the formal adjoint of $T$. Prove that $T \mapsto A_T$ is an isomorphism from $X$ to $X$. Prove that $A_{T \circ S} = A_S \circ A_T$.

7. FOURIER SERIES

There are not many regular Sturm-Liouville systems whose associated orthonormal eigenbases have proved to be important in actual calculations. Most orthonormal bases that are used, such as those due to Bessel, Legendre, Hermite, and Laguerre, arise from singular Sturm-Liouville systems and are therefore beyond the limitations we have set for this discussion. However, the most well-known example, Fourier series, is available to us.

We shall consider the constant coefficient operator $Tf = D^2 f$, which is clearly both formally self-adjoint and regular, and either the boundary condition $f(0) = f(\pi) = 0$ on $[0, \pi]$ (type 1) or the periodic boundary condition $f(-\pi) = f(\pi)$, $f'(-\pi) = f'(\pi)$ on $[-\pi, \pi]$ (type 4).

To solve the first problem, we have to find the solutions of $f'' - \lambda f = 0$ which satisfy $f(0) = f(\pi) = 0$. If $\lambda > 0$, then we know that the two-dimensional solution space is spanned by $\{e^{rx}, e^{-rx}\}$, where $r = \lambda^{1/2}$. But if $c_1 e^{rx} + c_2 e^{-rx}$ is 0 at both 0 and $\pi$, then $c_1 = c_2 = 0$ (because the pairs $\langle 1, 1 \rangle$ and $\langle e^{r\pi}, e^{-r\pi} \rangle$ are independent). Therefore, there are no solutions satisfying the boundary conditions when $\lambda > 0$. If $\lambda = 0$, then $f(x) = c_1 x + c_0$ and again $c_1 = c_0 = 0$. If $\lambda < 0$, then the solution space is spanned by $\{\sin rx, \cos rx\}$, where $r = (-\lambda)^{1/2}$. Now if $c_1 \sin rx + c_2 \cos rx$ is 0 at $x = 0$ and $x = \pi$, we get, first, that $c_2 = 0$ and, second, that $r\pi = n\pi$ for some integer $n$. Thus the eigenfunctions for the first system form the set $\{\sin nx\}_1^\infty$, and the corresponding eigenvalues of $D^2$ are $\{-n^2\}_1^\infty$.

At the end of this section we shall prove that the functions in $\mathcal{C}^2([a, b])$ that are zero near $a$ and $b$ are dense in $\mathcal{C}([a, b])$ in the two-norm.
Assuming this, it follows from Theorem 2.3 of Chapter 5 that a basis for $M$ is a basis for $\mathcal{C}^0$, and we now have the following corollary of the Sturm-Liouville theorem.

Theorem 7.1. The sequence $\{\sin nx\}_1^\infty$ is an orthogonal basis for the pre-Hilbert space $\mathcal{C}^0([0, \pi])$. If $f \in \mathcal{C}^2([0, \pi])$ and $f(0) = f(\pi) = 0$, then the Fourier series for $f$ converges uniformly to $f$.

We now consider the second boundary problem. The computations are a little more complicated, but again if $f(x) = c_1 e^{rx} + c_2 e^{-rx}$, and if $f(-\pi) = f(\pi)$ and $f'(-\pi) = f'(\pi)$, then $f = 0$. For now we have
$$c_1 e^{-r\pi} + c_2 e^{r\pi} = c_1 e^{r\pi} + c_2 e^{-r\pi},$$
giving $c_1 = c_2$, and
$$c_1 r e^{-r\pi} - c_2 r e^{r\pi} = c_1 r e^{r\pi} - c_2 r e^{-r\pi},$$
giving $c_1(e^{r\pi} - e^{-r\pi}) = 0$, and so $c_1 = 0$. Again $f(x) = c_1 x + c_0$ with $c_1 \ne 0$ is ruled out, so that only the constants survive when $\lambda = 0$. Finally, if $f(x) = c_1 \sin rx + c_2 \cos rx$, our boundary conditions become $2c_1 \sin r\pi = 0$ and $2rc_2 \sin r\pi = 0$, so that again $r = n$, but this time the full solution space of $(D^2 + n^2)f = 0$ satisfies the boundary condition.

Theorem 7.2. The set $\{\sin nx\}_1^\infty \cup \{\cos nx\}_0^\infty$ forms an orthogonal basis for the pre-Hilbert space $\mathcal{C}^0([-\pi, \pi])$. If $f \in \mathcal{C}^2([-\pi, \pi])$ and $f(-\pi) = f(\pi)$, $f'(-\pi) = f'(\pi)$, then the Fourier series for $f$ converges to $f$ uniformly on $[-\pi, \pi]$.

Remaining proof. This theorem follows from our general Sturm-Liouville discussion except for the orthogonality of $\sin nx$ and $\cos nx$. We have
$$\int_{-\pi}^{\pi} \sin nt \cos nt\,dt = \frac{1}{2}\int_{-\pi}^{\pi} \sin 2nt\,dt = -\frac{1}{4n}\cos 2nt\,\Big]_{-\pi}^{\pi} = 0.$$
Or we can simply remark that the first integrand is an odd function and therefore its integral over any symmetric interval $[-a, a]$ is necessarily zero. The orthogonality of eigenvectors having different eigenvalues follows of course, as in the proof of Theorem 3.1 of Chapter 5. □

Finally, we prove the density theorem we needed above. There are very slick ways of doing this, but they require more machinery than we have available, and rather than taking the time to make the machines, we shall prove the theorem with our bare hands. It is standard notation to let a subscript zero on a symbol denoting a class of functions pick out those functions in the class that are zero "on the boundary" in some suitable sense. Here $\mathcal{C}_0([a, b])$ will denote the functions in $\mathcal{C}([a, b])$ that are zero in neighborhoods of $a$ and $b$, and similarly for $\mathcal{C}_0^2([a, b])$.

Theorem 7.3. $\mathcal{C}^2([a, b])$ is dense in $\mathcal{C}([a, b])$ in the uniform norm, and $\mathcal{C}_0^2([a, b])$ is dense in $\mathcal{C}([a, b])$ in the two-norm.

Proof. We first approximate $f \in \mathcal{C}([a, b])$ to within $\epsilon$ by a piecewise "linear" function $g$ by drawing straight line segments between the adjacent points on the graph of $f$ lying over a subdivision $a = x_0 < x_1 < \cdots < x_n = b$ of $[a, b]$.
If $f$ varies by less than $\epsilon$ on each interval $(x_{i-1}, x_i)$, then $\|f - g\|_\infty \le \epsilon$. Now $g'(t)$ is a step function which is constant on the intervals of the above subdivision. We now alter $g'(t)$ slightly near each jump in such a way that the new function $h(t)$ is continuous there. If we do it as sketched in Fig. 6.5, the total
integral error at the jump is zero, $\int_{x_i - \delta}^{x_i + \delta} (h - g') = 0$, and the maximum error $\left|\int_{x_i - \delta}^{x_i} (h - g')\right|$ is $\delta\Delta/4$. This will be less than $\epsilon$ if we take $\delta = \epsilon/\|g'\|_\infty$, since $\Delta \le 2\|g'\|_\infty$. We now have a continuous function $h$ such that
$$\left|\int_a^x h(t)\,dt - (f(x) - f(a))\right| < 2\epsilon.$$
In other words, we have approximated $f$ uniformly by a continuously differentiable function.

Fig. 6.5    Fig. 6.6

Now choose $g$ and $h$ in $\mathcal{C}^1([a, b])$ so that first $\|f - g\|_\infty < \epsilon/2$ and then $\|g' - h\|_\infty < \epsilon/2(b - a)$. Then
$$\left|g(x) - g(a) - \int_a^x h\right| < \epsilon/2,$$
and so $H(x) = \int_a^x h + g(a)$ is a twice continuously differentiable function such that $\|f - H\|_\infty < \epsilon$. In other words, $\mathcal{C}^2([a, b])$ is dense in $\mathcal{C}([a, b])$ in the uniform norm. It is then also dense in the two-norm, since
$$\|f\|_2 = \left(\int_a^b f^2\right)^{1/2} \le \left(\int_a^b \|f\|_\infty^2\right)^{1/2} = (b - a)^{1/2}\|f\|_\infty.$$
But now we can do something which we couldn't do for the uniform norm: we can alter the approximating function to one that is zero on neighborhoods of $a$ and $b$, and keep the two-norm approximation good. Given $\delta$, let $e(t)$ be a nonnegative function on $[a, b]$ such that $e(t) = 1$ on $[a + 2\delta, b - 2\delta]$, $e(t) = 0$ on $[a, a + \delta]$ and on $[b - \delta, b]$, $e''$ is continuous, and $\|e\|_\infty = 1$. Such an $e(t)$ clearly exists, since we can draw it. We leave it as an interesting exercise to actually define $e(t)$. Here is a hint: Show somehow that there is a fifth-degree polynomial $p(t)$ having a graph between 0 and 1 as shown in Fig. 6.6, with a zero second derivative at 0 and at 1, and then use a piece of the graph, suitably translated, compressed, rotated, etc., to help patch together $e(t)$. Anyway, then $\|g - eg\|_2 \le \|g\|_\infty(4\delta)^{1/2}$ for any $g$ in $\mathcal{C}([a, b])$, and if $g$ has continuous derivatives up to order 2, then so does $eg$. Thus, if we start with $f$ in $\mathcal{C}$ and approximate it by $g$ in $\mathcal{C}^2$, and then approximate $g$ by $eg$, we have altogether the second approximation of the theorem. □
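The cutoff function $e(t)$ used at the end of the proof can be sketched concretely. The code below assumes one particular quintic, $p(s) = 10s^3 - 15s^4 + 6s^5$, which has the vanishing first and second derivatives at 0 and 1 that the hint asks for. The names `p`, `cutoff`, `two_norm` and the sample function $g$ are introduced here for illustration only, and other constructions of $e$ are possible.

```python
import math

def p(s):
    """Quintic with p(0) = p'(0) = p''(0) = 0, p(1) = 1, p'(1) = p''(1) = 0."""
    return s ** 3 * (10.0 - 15.0 * s + 6.0 * s * s)

def cutoff(t, a, b, delta):
    """e(t): 0 within delta of an endpoint, 1 on [a + 2*delta, b - 2*delta]."""
    s = min(t - a, b - t) / delta - 1.0   # rescaled distance to the nearer endpoint
    if s <= 0.0:
        return 0.0
    if s >= 1.0:
        return 1.0
    return p(s)

def two_norm(func, a, b, n=20000):
    """Two-norm of func on [a, b], approximated by the midpoint rule."""
    h = (b - a) / n
    return math.sqrt(sum(func(a + (k + 0.5) * h) ** 2 for k in range(n)) * h)

a, b, delta = 0.0, 1.0, 0.05
g = lambda t: math.cos(3.0 * t) + 2.0     # sample g with sup norm 3 on [0, 1]
err = two_norm(lambda t: g(t) - cutoff(t, a, b, delta) * g(t), a, b)
bound = 3.0 * math.sqrt(4.0 * delta)      # ||g||_inf * (4*delta)^(1/2)
print(err, bound)                         # err should not exceed bound
```

The estimate holds because $g - eg$ vanishes outside the two end strips of total length $4\delta$ and is bounded there by $\|g\|_\infty$.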
EXERCISES

7.1 Convert the orthogonal basis $\{\sin nx\}_1^\infty$ for the pre-Hilbert space $\mathcal{C}([0, \pi])$ to an orthonormal basis.

7.2 Do the same for the orthogonal basis $\{\sin nx\}_1^\infty \cup \{\cos nx\}_0^\infty$ for $\mathcal{C}([-\pi, \pi])$.

7.3 Show that $\{\sin nx\}_1^\infty$ is an orthogonal basis for the vector space $V$ of all odd continuous functions on $[-\pi, \pi]$. (Be clever. Do not calculate from scratch.) Normalize the above basis.

7.4 State and prove the corresponding theorem for the even functions on $[-\pi, \pi]$.

7.5 Prove that the derivative of an odd function is even, and conversely.

7.6 We now want to prove the following stronger theorem about the uniform convergence of Fourier series.

Theorem. Let $f$ have a continuous derivative on $[-\pi, \pi]$, and suppose that $f(-\pi) = f(\pi)$. Then the Fourier series for $f$ converges to $f$ uniformly.

Assume for convenience that $f$ is even. (This only cuts down the number of calculations.) Show first that the Fourier series for $f'$ is the series obtained from the Fourier series for $f$ by term-by-term differentiation. Apply the above exercises here. Next show from the two-norm convergence of its Fourier series to $f'$ and the Schwarz inequality that the Fourier series for $f$ converges uniformly.

7.7 Prove that $\{\cos nx\}_0^\infty$ is an orthogonal basis for the space $M$ of $\mathcal{C}^2$-functions on $[0, \pi]$ such that $f'(0) = f'(\pi) = 0$.

7.8 Find a fifth-degree polynomial $p(x)$ such that $p(0) = p'(0) = p''(0) = 0$, $p'(1) = p''(1) = 0$, $p(1) = 1$. (Forget the last condition until the end.) Sketch the graph of $p$.

7.9 Use a "piece" of the above polynomial $p$ to construct a function $e(x)$ such that $e'$ and $e''$ exist and are continuous, $e(x) = 0$ when $x < a + \delta$ or $x > b - \delta$, $e(x) = 1$ on $[a + 2\delta, b - 2\delta]$, and $\|e\|_\infty = 1$.

7.10 Prove the Weierstrass theorem given below on $[0, \pi]$ in the following steps. We know that $f$ can be uniformly approximated by a $\mathcal{C}^2$-function $g$.

1) Show that $c$ and $d$ can be found so that $g(t) - ct - d$ is 0 at 0 and $\pi$.
2) Use the Fourier series expansion of this function and the Maclaurin series for the functions $\sin nx$ to show that the polynomial $p(x)$ can be found.

Theorem (The Weierstrass approximation theorem). The polynomials are dense in $\mathcal{C}([a, b])$ in the uniform norm. That is, given any continuous function $f$ on $[a, b]$ and any $\epsilon$, there is a polynomial $p$ such that $|f(x) - p(x)| < \epsilon$ for all $x$ in $[a, b]$.
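The uniform convergence asserted in Theorem 7.1, which is also the starting point of Exercise 7.10, can be observed numerically. The sketch below assumes the sample function $f(x) = x(\pi - x)$, which lies in $\mathcal{C}^2([0, \pi])$ and vanishes at both endpoints. Its sine coefficients are computed by Simpson's rule and should agree with the closed form $8/(\pi n^3)$ for odd $n$ and 0 for even $n$. The function names and grid sizes are choices made for this illustration.

```python
import math

def f(x):
    """Sample function in C^2([0, pi]) with f(0) = f(pi) = 0."""
    return x * (math.pi - x)

def sine_coefficient(n, steps=4000):
    """b_n = (2/pi) * integral over [0, pi] of f(x) sin(nx), by composite Simpson."""
    h = math.pi / steps
    total = 0.0                      # the integrand vanishes at both endpoints
    for k in range(1, steps):
        x = k * h
        total += (4 if k % 2 else 2) * f(x) * math.sin(n * x)
    return (2.0 / math.pi) * total * h / 3.0

coeffs = [sine_coefficient(n) for n in range(1, 26)]   # b_1, ..., b_25
grid = [k * math.pi / 400 for k in range(401)]

def sup_error(N):
    """Largest grid value of |f - (sum of the first N terms of the sine series)|."""
    def partial(x):
        return sum(coeffs[n - 1] * math.sin(n * x) for n in range(1, N + 1))
    return max(abs(f(x) - partial(x)) for x in grid)

for N in (1, 5, 25):
    print(N, sup_error(N))           # the sup error should shrink as N grows
```

The maximum is taken over a finite grid only, so the printed values are a sampled estimate of the uniform error rather than the uniform norm itself.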
CHAPTER 7

MULTILINEAR FUNCTIONALS

This chapter is principally for reference. Although most of the proofs will be included, the reader is not expected to study them. Our goal is a collection of basic theorems about alternating multilinear functionals, or exterior forms, and the determinant function is one of our rewards.

1. BILINEAR FUNCTIONALS

We have already studied various aspects of bilinear functionals. We looked at their duality implications in Section 6, Chapter 1, we considered the "canonical forms" of symmetric bilinear functionals and their equivalent quadratic forms in Section 7, Chapter 2, and, of course, the whole scalar product theory of Chapter 5 is the theory of a still more special kind of bilinear functional. In this chapter we shall restrict ourselves to bilinear and multilinear functionals over finite-dimensional spaces, and our concerns are purely algebraic. We begin with some material related to our earlier algebra.

If $V$ and $W$ are finite-dimensional vector spaces, then the set of all bilinear functionals on $V \times W$ is pretty clearly a vector space. We designate it $V^* \otimes W^*$ and call it the tensor product of $V^*$ and $W^*$. Our first theorem simply states something that was implicit in Theorem 6.1 of Chapter 1.

Theorem 1.1. The vector spaces $V^* \otimes W^*$, $\mathrm{Hom}(V, W^*)$, and $\mathrm{Hom}(W, V^*)$ are naturally isomorphic.

Proof. We saw in Theorem 6.1 of Chapter 1 that each $f$ in $V^* \otimes W^*$ determines a linear mapping $\alpha \mapsto f_\alpha$ from $W$ to $V^*$, where $f_\alpha(\xi) = f(\xi, \alpha)$, and we also noted that this correspondence from $V^* \otimes W^*$ to $\mathrm{Hom}(W, V^*)$ is bijective. All that the present theorem adds is that this bijective correspondence is linear and so constitutes a natural isomorphism, as does the similar one from $V^* \otimes W^*$ to $\mathrm{Hom}(V, W^*)$. To see this, let $f_T$ be the bilinear functional corresponding to $T$ in $\mathrm{Hom}(V, W^*)$. Then $f_{T+S} = f_T + f_S$, for
$$f_{T+S}(\alpha, \beta) = ((T + S)(\alpha))(\beta) = (T(\alpha) + S(\alpha))(\beta) = (T(\alpha))(\beta) + (S(\alpha))(\beta) = f_T(\alpha, \beta) + f_S(\alpha, \beta).$$
We can do the same for homogeneity.
The isomorphism of $V^* \otimes W^*$ with $\mathrm{Hom}(W, V^*)$ follows in exactly the same way by reversing the roles of the variables. We are thus finished with the proof. □
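In coordinates the correspondence of Theorem 1.1 is easy to compute with. The sketch below assumes $V = \mathbb{R}^m$ and $W = \mathbb{R}^n$ with their standard bases, so that a bilinear functional is stored as an $m \times n$ matrix $t$ with $f(x, y) = \sum_{i,j} t_{ij}x_iy_j$. The helper names and the sample matrices are introduced here for illustration.

```python
def bilinear(t):
    """The bilinear functional f(x, y) = sum_ij t[i][j] * x[i] * y[j]."""
    return lambda x, y: sum(t[i][j] * x[i] * y[j]
                            for i in range(len(x)) for j in range(len(y)))

def to_hom(t):
    """The map T in Hom(V, W*): T(x) is returned as its coordinate row x^T t."""
    return lambda x: [sum(x[i] * t[i][j] for i in range(len(x)))
                      for j in range(len(t[0]))]

def add(s, t):
    """Entrywise sum of two matrices of the same shape."""
    return [[s[i][j] + t[i][j] for j in range(len(t[0]))] for i in range(len(t))]

# Sample data (assumed): m = 2, n = 3.
T = [[1, 2, 0], [0, -1, 3]]
S = [[4, 0, 1], [2, 2, -5]]
x, y = [3, -1], [1, 2, 5]

f_T, f_S, f_sum = bilinear(T), bilinear(S), bilinear(add(T, S))

# Additivity of the correspondence: f_{T+S} = f_T + f_S.
print(f_sum(x, y), f_T(x, y) + f_S(x, y))

# The functional T(x) applied to y recovers f_T(x, y).
print(sum(c * yj for c, yj in zip(to_hom(T)(x), y)), f_T(x, y))
```

Each printed pair should consist of two equal numbers.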
Before looking for bases in $V^* \otimes W^*$, we define a bilinear functional $\gamma \otimes \lambda$ for any two functionals $\gamma \in V^*$ and $\lambda \in W^*$ by $(\gamma \otimes \lambda)(\xi, \eta) = \gamma(\xi)\lambda(\eta)$. We call $\gamma \otimes \lambda$ the tensor product of the functionals $\gamma$ and $\lambda$ and call any bilinear functional having this form elementary. It is not too hard to see that $f \in V^* \otimes W^*$ is elementary if and only if the corresponding $T \in \mathrm{Hom}(V, W^*)$ is a dyad.

If $V$ and $W$ are finite-dimensional, with dimensions $m$ and $n$, respectively, then the above isomorphism of $V^* \otimes W^*$ with $\mathrm{Hom}(V, W^*)$ shows that the dimension of $V^* \otimes W^*$ is $mn$. We now describe the basis determined by given bases in $V$ and $W$.

Theorem 1.2. Let $\{\alpha_i\}_1^m$ and $\{\beta_j\}_1^n$ be any bases for $V$ and $W$, and let their dual bases in $V^*$ and $W^*$ be $\{\mu_i\}_1^m$ and $\{\nu_j\}_1^n$. Then the $mn$ elementary bilinear functionals $\{\mu_i \otimes \nu_j\}$ form the corresponding basis for $V^* \otimes W^*$.

Proof. Since $\mu_i \otimes \nu_j(\xi, \eta) = \mu_i(\xi)\nu_j(\eta) = x_iy_j$, the matrix expansion $f(\xi, \eta) = \sum_{i,j} t_{ij}x_iy_j$ becomes $f(\xi, \eta) = \sum_{i,j} t_{ij}(\mu_i \otimes \nu_j)(\xi, \eta)$ or
$$f = \sum_{i,j} t_{ij}(\mu_i \otimes \nu_j).$$
The set $\{\mu_i \otimes \nu_j\}$ thus spans $V^* \otimes W^*$. Since it contains the same number of elements ($mn$) as the dimension of $V^* \otimes W^*$, it forms a basis. □

Of course, independence can also be checked directly. If $\sum_{i,j} t_{ij}(\mu_i \otimes \nu_j) = 0$, then for every pair $\langle k, l \rangle$, $t_{kl} = \sum_{i,j} t_{ij}(\mu_i \otimes \nu_j)(\alpha_k, \beta_l) = 0$.

We should also remark that this theorem is entirely equivalent to our discussion of the basis for $\mathrm{Hom}(V, W)$ at the end of Section 4, Chapter 2.

2. MULTILINEAR FUNCTIONALS

All the above considerations generalize to multilinear functionals $f: V_1 \times \cdots \times V_n \to \mathbb{R}$. We change notation, just as we do in replacing the traditional $\langle x, y \rangle \in \mathbb{R}^2$ by $x = \langle x_1, \ldots, x_n \rangle \in \mathbb{R}^n$. Thus we write $f(\alpha_1, \ldots, \alpha_n) = f(\alpha)$, where $\alpha = \langle \alpha_1, \ldots, \alpha_n \rangle \in V_1 \times \cdots \times V_n$. Our requirement now is that $f(\alpha_1, \ldots, \alpha_n)$ be a linear functional of $\alpha_j$ when $\alpha_i$ is held fixed for all $i \ne j$. The set of all such functionals is a vector space, called the tensor product of the dual spaces $V_1^*$, . . .
, $V_n^*$, and is designated $V_1^* \otimes \cdots \otimes V_n^*$.

As before, there are natural isomorphisms between these tensor product spaces and various Hom spaces. For example, $V_1^* \otimes \cdots \otimes V_n^*$ and $\mathrm{Hom}(V_1, V_2^* \otimes \cdots \otimes V_n^*)$ are naturally isomorphic. Also, there are additional isomorphisms of a variety
not encountered in the bilinear case. However, it will not be necessary for us to look into these questions.

We define elementary multilinear functionals as before. If $\lambda_i \in V_i^*$, $i = 1, \ldots, n$, and $\xi = \langle \xi_1, \ldots, \xi_n \rangle$, then
$$(\lambda_1 \otimes \cdots \otimes \lambda_n)(\xi) = \lambda_1(\xi_1) \cdots \lambda_n(\xi_n).$$
To keep our notation as simple as possible, and also because it is the case of most interest to us, we shall consider the question of bases only when $V_1 = V_2 = \cdots = V_n = V$. In this case $(V^*)^{\otimes n} = V^* \otimes \cdots \otimes V^*$ ($n$ factors) is called the space of covariant tensors of order $n$ (over $V$).

If $\{\alpha_j\}_1^m$ is a basis for $V$ and $f \in (V^*)^{\otimes n}$, then we can expand the value $f(\xi) = f(\xi_1, \ldots, \xi_n)$ with respect to the basis expansions of the vectors $\xi_i$ just as we did when $f$ was bilinear, but now the result is notationally more complex. If we set $\xi_i = \sum_{j=1}^m x_j^i\alpha_j$ for $i = 1, \ldots, n$ (so that the coordinate set of $\xi_i$ is $x^i = \{x_j^i\}_j$) and use the linearity of $f(\xi_1, \ldots, \xi_n)$ in its separate variables one variable at a time, we get
$$f(\xi_1, \ldots, \xi_n) = \sum x_{p_1}^1 x_{p_2}^2 \cdots x_{p_n}^n\, f(\alpha_{p_1}, \alpha_{p_2}, \ldots, \alpha_{p_n}),$$
where the sum is taken over all $n$-tuples $p = \langle p_1, \ldots, p_n \rangle$ such that $1 \le p_i \le m$ for each $i$ from 1 to $n$. The set of all these $n$-tuples is just the set of all functions from $\{1, \ldots, n\}$ to $\{1, \ldots, m\}$. We have designated this set $\overline{m}^{\,\overline{n}}$, using the notation $\overline{n} = \{1, \ldots, n\}$, and the scope of the above sum can thus be indicated in the formula as follows:
$$f(\xi_1, \ldots, \xi_n) = \sum_{p \in \overline{m}^{\,\overline{n}}} x_{p_1}^1 \cdots x_{p_n}^n\, f(\alpha_{p_1}, \ldots, \alpha_{p_n}).$$
A strict proof of this formula would require an induction on $n$, and is left to the interested reader. At the inductive step he will have to rewrite a double sum $\sum_{p \in \overline{m}^{\,\overline{n}}} \sum_{j \in \overline{m}}$ as the single sum $\sum_{q \in \overline{m}^{\,\overline{n+1}}}$ using the fact that an ordered pair $\langle p, j \rangle$ in $\overline{m}^{\,\overline{n}} \times \overline{m}$ is equivalent to an $(n + 1)$-tuple $q \in \overline{m}^{\,\overline{n+1}}$, where $q_i = p_i$ for $i = 1$, . . .
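, $n$ and $q_{n+1} = j$.

If $\{\mu_i\}_1^m$ is the dual basis for $V^*$ and $q \in \overline{m}^{\,\overline{n}}$, let $\mu_q$ be the elementary functional $\mu_{q_1} \otimes \cdots \otimes \mu_{q_n}$. Thus $\mu_q(\alpha_{p_1}, \ldots, \alpha_{p_n}) = \prod_{i=1}^n \mu_{q_i}(\alpha_{p_i}) = 0$ unless $p = q$, in which case its value is 1. More generally,
$$\mu_q(\xi_1, \ldots, \xi_n) = \mu_{q_1}(\xi_1) \cdots \mu_{q_n}(\xi_n) = x_{q_1}^1 \cdots x_{q_n}^n.$$
Therefore, if we set $c_q = f(\alpha_{q_1}, \ldots, \alpha_{q_n})$, the general expansion now appears as
$$f(\xi_1, \ldots, \xi_n) = \sum_{p \in \overline{m}^{\,\overline{n}}} c_p\mu_p(\xi_1, \ldots, \xi_n)$$
or $f = \sum c_p\mu_p$, which is the same formula we obtained in the bilinear case, but with more sophisticated notation. The functionals $\{\mu_p : p \in \overline{m}^{\,\overline{n}}\}$ thus span $(V^*)^{\otimes n}$. They are also independent. For, if $\sum c_p\mu_p = 0$, then for each $q$, $c_q = \sum c_p\mu_p(\alpha_{q_1}, \ldots, \alpha_{q_n}) = 0$. We have proved the following theorem.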
Theorem 2.1. The set $\{\mu_p : p \in \overline{m}^{\,\overline{n}}\}$ is a basis for $(V^*)^{\otimes n}$. For any $f$ in $(V^*)^{\otimes n}$ its coordinate function $\{c_p\}$ is defined by $c_p = f(\alpha_{p_1}, \ldots, \alpha_{p_n})$. Thus $f = \sum c_p\mu_p$ and $f(\xi_1, \ldots, \xi_n) = \sum c_p\mu_p(\xi_1, \ldots, \xi_n) = \sum c_p\, x_{p_1}^1 \cdots x_{p_n}^n$ for any $f \in (V^*)^{\otimes n}$ and any $\langle \xi_1, \ldots, \xi_n \rangle \in V^n$.

Corollary. The dimension of $(V^*)^{\otimes n}$ is $m^n$.

Proof. There are $m^n$ functions in $\overline{m}^{\,\overline{n}}$, so the basis $\{\mu_p : p \in \overline{m}^{\,\overline{n}}\}$ has $m^n$ elements. □

3. PERMUTATIONS

A permutation on a set $S$ is a bijection $f: S \to S$. If $\mathcal{S}(S)$ is the set of all permutations on $S$, then $\mathcal{S} = \mathcal{S}(S)$ is closed under composition ($\sigma, \rho \in \mathcal{S} \implies \sigma \circ \rho \in \mathcal{S}$) and inversion ($\sigma \in \mathcal{S} \implies \sigma^{-1} \in \mathcal{S}$). Also, the identity map $I$ is in $\mathcal{S}$, and, of course, the composition operation is associative. Together these statements say exactly that $\mathcal{S}$ is a group under composition.

The simplest kind of permutation other than $I$ is one which interchanges a pair of elements of $S$ and leaves every other element fixed. Such a permutation is called a transposition.

We now take $S$ to be the finite set $\overline{n} = \{1, \ldots, n\}$ and set $\mathcal{S}_n = \mathcal{S}(\overline{n})$. It is not hard to see that then any permutation can be expressed as a product of transpositions, and in more than one way.

A more elementary fact that we shall need is that if $\rho$ is a fixed element of $\mathcal{S}_n$, then the mapping $\sigma \mapsto \sigma \circ \rho$ is a bijection $\mathcal{S}_n \to \mathcal{S}_n$. It is surjective because any $\sigma'$ can be written $\sigma' = (\sigma' \circ \rho^{-1}) \circ \rho$, and it is injective because $\sigma_1 \circ \rho = \sigma_2 \circ \rho \implies (\sigma_1 \circ \rho) \circ \rho^{-1} = (\sigma_2 \circ \rho) \circ \rho^{-1} \implies \sigma_1 = \sigma_2$. Similarly, the mapping $\sigma \mapsto \rho \circ \sigma$ ($\rho$ fixed) is bijective.

We also need the fact that there are $n!$ elements in $\mathcal{S}_n$. This is the elementary count from secondary school algebra. In defining an element $\sigma \in \mathcal{S}_n$, $\sigma(1)$ can be chosen in $n$ ways. For each of these choices $\sigma(2)$ can be chosen in $n - 1$ ways, so that $\langle \sigma(1), \sigma(2) \rangle$ can be chosen in $n(n - 1)$ ways. For each of these choices $\sigma(3)$ can be chosen in $n - 2$ ways, etc. Altogether $\sigma$ can be chosen in $n(n - 1)(n - 2) \cdots 1 = n!$ ways.
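The two facts just stated, the count $n!$ and the bijectivity of $\sigma \mapsto \sigma \circ \rho$ and $\sigma \mapsto \rho \circ \sigma$, can be confirmed by brute force for small $n$. The sketch below assumes $n = 4$ and stores a permutation $\sigma$ as the tuple $\langle \sigma(1), \ldots, \sigma(n) \rangle$. The names `compose` and `inverse` and the choice of $\rho$ are made here for illustration.

```python
from itertools import permutations
from math import factorial

n = 4
S_n = list(permutations(range(1, n + 1)))   # sigma stored as (sigma(1), ..., sigma(n))

def compose(sigma, rho):
    """(sigma o rho)(i) = sigma(rho(i)); values are 1-based, tuple indices 0-based."""
    return tuple(sigma[rho[i] - 1] for i in range(len(rho)))

def inverse(sigma):
    """The permutation tau with tau(sigma(i)) = i for every i."""
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma, start=1):
        inv[s - 1] = i
    return tuple(inv)

rho = (2, 3, 1, 4)                                  # an arbitrary fixed element
right = {compose(sigma, rho) for sigma in S_n}      # image of sigma -> sigma o rho
left = {compose(rho, sigma) for sigma in S_n}       # image of sigma -> rho o sigma

print(len(S_n) == factorial(n), right == set(S_n), left == set(S_n))
```

Since each image set has the same number of elements as $\mathcal{S}_n$ and equals it, both mappings are bijections, as the argument in the text shows in general.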
In the sequel we shall often write '$\rho\sigma$' instead of '$\rho \circ \sigma$', just as we occasionally wrote '$ST$' instead of '$S \circ T$' for the composition of linear maps.

If $\xi = \langle \xi_1, \ldots, \xi_n \rangle \in V^n$ and $\sigma \in \mathcal{S}_n$, then we can "apply $\sigma$ to $\xi$", or "permute the elements of $\langle \xi_1, \ldots, \xi_n \rangle$ through $\sigma$". We mean, of course, that we can replace $\langle \xi_1, \ldots, \xi_n \rangle$ by $\langle \xi_{\sigma(1)}, \ldots, \xi_{\sigma(n)} \rangle$, that is, we can replace $\xi$ by $\xi \circ \sigma$.

Permuting the variables changes a functional $f \in (V^*)^{\otimes n}$ into a new such functional. Specifically, given $f \in (V^*)^{\otimes n}$ and $\sigma \in \mathcal{S}_n$, we define $f^\sigma$ by
$$f^\sigma(\xi) = f(\xi \circ \sigma^{-1}) = f(\xi_{\sigma^{-1}(1)}, \ldots, \xi_{\sigma^{-1}(n)}).$$
The reason for using $\sigma^{-1}$ instead of $\sigma$ is, in part, that it gives us the following formula.
Lemma 3.1. $f^{\sigma_1\sigma_2} = (f^{\sigma_1})^{\sigma_2}$.

Proof. $f^{\sigma_1\sigma_2}(\xi) = f(\xi \circ (\sigma_1 \circ \sigma_2)^{-1}) = f(\xi \circ (\sigma_2^{-1} \circ \sigma_1^{-1})) = f((\xi \circ \sigma_2^{-1}) \circ \sigma_1^{-1}) = f^{\sigma_1}(\xi \circ \sigma_2^{-1}) = (f^{\sigma_1})^{\sigma_2}(\xi)$. □

Theorem 3.1. For each $\sigma$ in $\mathcal{S}_n$ the mapping $T_\sigma$ defined by $f \mapsto f^\sigma$ is a linear isomorphism of $(V^*)^{\otimes n}$ onto itself. The mapping $\sigma \mapsto T_\sigma$ is an antihomomorphism from the group $\mathcal{S}_n$ to the group of nonsingular elements of $\mathrm{Hom}((V^*)^{\otimes n})$.

Proof. Permuting the variables does not alter the property of multilinearity, so $T_\sigma$ maps $(V^*)^{\otimes n}$ into itself. It is linear, since $(af + bg)^\sigma = af^\sigma + bg^\sigma$. And $T_{\rho\sigma} = T_\sigma \circ T_\rho$, because $f^{\rho\sigma} = (f^\rho)^\sigma$. Thus $\sigma \mapsto T_\sigma$ preserves products, but in the reverse order. This is why it is called an antihomomorphism. Finally,
$$T_{\sigma^{-1}} \circ T_\sigma = T_{\sigma\sigma^{-1}} = T_I = I,$$
so that $T_\sigma$ is invertible (nonsingular, an isomorphism). □

The mapping $\sigma \mapsto T_\sigma$ is a representation (really an antirepresentation) of the group $\mathcal{S}_n$ by linear transformations on $(V^*)^{\otimes n}$.

Lemma 3.2. Each $T_\sigma$ carries the basis $\{\mu_p\}$ into itself, and so is a permutation on the basis.

Proof. We have $(\mu_p)^\sigma(\xi) = \mu_p(\xi \circ \sigma^{-1}) = \prod_{i=1}^n \mu_{p_i}(\xi_{\sigma^{-1}(i)})$. Setting $j = \sigma^{-1}(i)$ and so having $i = \sigma(j)$, this product can be rewritten $\prod_{j=1}^n \mu_{p_{\sigma(j)}}(\xi_j) = \mu_{p \circ \sigma}(\xi)$. Thus $(\mu_p)^\sigma = \mu_{p \circ \sigma}$, and since $p \mapsto p \circ \sigma$ is a permutation on $\overline{m}^{\,\overline{n}}$, we are done. □

4. THE SIGN OF A PERMUTATION

We consider now the special polynomial $E$ on $\mathbb{R}^n$ defined by
$$E(x) = E(x_1, \ldots, x_n) = \prod_{1 \le i < j \le n} (x_i - x_j).$$
This is the product over all pairs $\langle i, j \rangle \in \overline{n} \times \overline{n}$ such that $i < j$. This set of ordered pairs is in one-to-one correspondence with the collection $P$ of all pair sets $\{i, j\} \subset \overline{n}$ such that $i \ne j$, the ordered pair being obtained from the unordered pair by putting it in its natural order. Now it is clear that for any permutation $\sigma \in \mathcal{S}_n$, the mapping $\{i, j\} \mapsto \{\sigma(i), \sigma(j)\}$ is a permutation of $P$. This means that the factors in the polynomial $E^\sigma(x) = E(x \circ \sigma)$ are exactly the same as in the polynomial $E(x)$ except for the changes of sign that occur when $\sigma$ reverses the order of a pair.
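Therefore, if $k$ is the number of these reversals, we have $E^\sigma = (-1)^k E$. The mapping $\sigma \mapsto (-1)^k$ is designated 'sgn' (and called "sign"). Thus sgn is a function from $S_n$ to $\{1, -1\}$ such that
$$E^\sigma = (\operatorname{sgn} \sigma) E$$
for all $\sigma \in S_n$. It follows that
$$\operatorname{sgn} \rho\sigma = (\operatorname{sgn} \rho)(\operatorname{sgn} \sigma),$$
for $(\operatorname{sgn} \rho\sigma) E = E^{\rho\sigma} = (E^\sigma)^\rho = (\operatorname{sgn} \sigma) E^\rho = (\operatorname{sgn} \rho)(\operatorname{sgn} \sigma) E$, and we can evaluate $E$ at any $n$-tuple $\mathbf{x}$ such that $E(\mathbf{x}) \neq 0$ and cancel the factor $E(\mathbf{x})$. Also $\operatorname{sgn} \sigma = -1$ if $\sigma$ is a transposition. This is clear if $\sigma$ interchanges adjacent numbers because it then changes the sign of just one factor in $E(\mathbf{x})$; we leave the general case as an exercise for the interested reader.

5. THE SUBSPACE $\mathcal{A}^n$ OF ALTERNATING TENSORS

Definition. A covariant tensor $f \in (V^*)^{\otimes n}$ is symmetric if $f^\sigma = f$ for all $\sigma \in S_n$.

If $f$ is bilinear [$f \in (V^*)^{\otimes 2}$], this is just the condition $f(\xi, \eta) = f(\eta, \xi)$ for all $\xi, \eta \in V$.

Definition. A covariant tensor $f \in (V^*)^{\otimes n}$ is antisymmetric or alternating if $f^\sigma = (\operatorname{sgn} \sigma) f$ for all $\sigma \in S_n$.

Since each $\sigma$ is a product of transpositions, this can also be expressed as the fact that $f$ just changes sign if two of its arguments are interchanged. In the case of a bilinear functional it is the condition $f(\xi, \eta) = -f(\eta, \xi)$ for all $\xi, \eta \in V$. It is important to note that if $f$ is alternating, then $f(\boldsymbol{\xi}) = 0$ whenever the $n$-tuple $\boldsymbol{\xi} = \langle \xi_1, \ldots, \xi_n \rangle$ is not injective ($\xi_i = \xi_j$ for some $i \neq j$).

The set of all symmetric elements of $(V^*)^{\otimes n}$ is clearly a subspace, as is also the (for us) more important set $\mathcal{A}^n$ of all alternating elements. There is an important linear projection from $(V^*)^{\otimes n}$ to $\mathcal{A}^n$ which we now describe.

Theorem 5.1. The mapping $f \mapsto (1/n!) \sum_{\sigma \in S_n} (\operatorname{sgn} \sigma) f^\sigma$ is a projection $\Omega$ from $(V^*)^{\otimes n}$ to $\mathcal{A}^n$.

Proof. We first check that $\Omega f \in \mathcal{A}^n$ for every $f$ in $(V^*)^{\otimes n}$. We have $(\Omega f)^\rho = (1/n!) \sum_\sigma (\operatorname{sgn} \sigma) f^{\sigma\rho}$. Now $\operatorname{sgn} \sigma = (\operatorname{sgn} \sigma\rho)(\operatorname{sgn} \rho)$. Setting $\sigma' = \sigma \circ \rho$ and remembering that $\sigma \mapsto \sigma'$ is a bijection, we thus have
$$(\Omega f)^\rho = \frac{\operatorname{sgn} \rho}{n!} \sum_{\sigma'} (\operatorname{sgn} \sigma') f^{\sigma'} = (\operatorname{sgn} \rho)(\Omega f).$$
Hence $\Omega f \in \mathcal{A}^n$. If $f$ is already in $\mathcal{A}^n$, then $f^\sigma = (\operatorname{sgn} \sigma) f$ and $\Omega f = (1/n!) \sum_{\sigma \in S_n} f$. Since $S_n$ has $n!$ elements, $\Omega f = f$. Thus $\Omega$ is a projection from $(V^*)^{\otimes n}$ to $\mathcal{A}^n$. □

Lemma 5.1. $\Omega(f^\rho) = (\operatorname{sgn} \rho)\, \Omega f$.

Proof. The formula for $\Omega(f^\rho)$ is the same as that for $(\Omega f)^\rho$ except that $\rho\sigma$ replaces $\sigma\rho$. The proof is thus the same as the one for the theorem above. □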
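The sign function and the projection $\Omega$ can be tried out numerically. The following is a minimal sketch, not part of the text: it assumes permutations are stored as 0-based tuples, computes sgn by counting reversed pairs as in Section 4, and represents a functional as an ordinary Python function of $n$ arguments. The names `sgn`, `compose`, `act`, and `alternate` are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations


def sgn(sigma):
    """Sign of a permutation given as a tuple of 0-based images."""
    n = len(sigma)
    # Count the pairs i < j whose order sigma reverses.
    reversals = sum(1 for i in range(n) for j in range(i + 1, n)
                    if sigma[i] > sigma[j])
    return -1 if reversals % 2 else 1


def compose(rho, sigma):
    """The composition rho o sigma, i.e. i -> rho(sigma(i))."""
    return tuple(rho[s] for s in sigma)


def inverse(sigma):
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma):
        inv[s] = i
    return tuple(inv)


def act(f, sigma):
    """f^sigma, defined by f^sigma(xi) = f(xi o sigma^{-1})."""
    inv = inverse(sigma)
    return lambda *xi: f(*(xi[inv[i]] for i in range(len(xi))))


def alternate(f, n):
    """Omega f = (1/n!) * sum over sigma of sgn(sigma) * f^sigma (exact arithmetic)."""
    perms = list(permutations(range(n)))
    return lambda *xi: sum(Fraction(sgn(s)) * act(f, s)(*xi) for s in perms) / len(perms)


# A bilinear functional on pairs of vectors in R^2 that is not alternating.
def f(x, y):
    return x[0] * y[1]


omega_f = alternate(f, 2)
```

With these definitions `omega_f((1, 2), (3, 4))` and `omega_f((3, 4), (1, 2))` differ only in sign, which is the alternating property of $\Omega f$ in the bilinear case.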
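Theorem 5.2. The vector space $\mathcal{A}^n$ of alternating $n$-linear functionals over the $m$-dimensional vector space $V$ has dimension $\binom{m}{n}$.

Proof. If $f \in \mathcal{A}^n$ and $f = \sum_{\mathbf{p}} c_{\mathbf{p}} \mu_{\mathbf{p}}$, then since $f^\sigma = (\operatorname{sgn} \sigma) f$, we have $\sum_{\mathbf{p}} c_{\mathbf{p}} \mu_{\mathbf{p} \circ \sigma} = \sum_{\mathbf{p}} (\operatorname{sgn} \sigma) c_{\mathbf{p}} \mu_{\mathbf{p}}$ for any $\sigma$ in $S_n$. Setting $\mathbf{p} \circ \sigma = \mathbf{q}$, the left sum becomes $\sum_{\mathbf{q}} c_{\mathbf{q} \circ \sigma^{-1}} \mu_{\mathbf{q}}$, and since the basis expansion is unique, we must have $c_{\mathbf{q} \circ \sigma^{-1}} = (\operatorname{sgn} \sigma) c_{\mathbf{q}}$, or $c_{\mathbf{p}} = (\operatorname{sgn} \sigma) c_{\mathbf{p} \circ \sigma}$ for all $\mathbf{p} \in \overline{m}^{\,\overline{n}}$. Working backward, we see, conversely, that this condition implies that $f^\sigma = (\operatorname{sgn} \sigma) f$. Thus $f \in \mathcal{A}^n$ if and only if its coordinate function $c_{\mathbf{p}}$ satisfies the identity
$$c_{\mathbf{p}} = (\operatorname{sgn} \sigma)\, c_{\mathbf{p} \circ \sigma}$$
for all $\mathbf{p} \in \overline{m}^{\,\overline{n}}$ and all $\sigma \in S_n$.

This has many consequences. For one thing, $c_{\mathbf{p}} = 0$ unless $\mathbf{p}$ is one-to-one (injective). For if $p_i = p_j$ and $\sigma$ is the transposition interchanging $i$ and $j$, then $\mathbf{p} \circ \sigma = \mathbf{p}$, $c_{\mathbf{p}} = (\operatorname{sgn} \sigma) c_{\mathbf{p} \circ \sigma} = -c_{\mathbf{p}}$, and so $c_{\mathbf{p}} = 0$. Since no $\mathbf{p}$ can be injective if $n > m$, we see that in this case the only element of $\mathcal{A}^n$ is the zero functional. Thus $n > m \Rightarrow \dim \mathcal{A}^n = 0$.

Now suppose that $n \le m$. For any injective $\mathbf{p}$, the set $\{\mathbf{p} \circ \sigma : \sigma \in S_n\}$ consists of all the (injective) $n$-tuples with the same range set as $\mathbf{p}$. There are clearly $n!$ of them. Exactly one $\mathbf{q} = \mathbf{p} \circ \sigma$ counts off the range set in its natural order, i.e., satisfies $q_1 < q_2 < \cdots < q_n$. We select this unique $\mathbf{q}$ as the representative of all the elements $\mathbf{p} \circ \sigma$ having this range. The collection $C$ of these canonical (representative) $\mathbf{q}$'s is thus in one-to-one correspondence with the collection of all (range) subsets of $\overline{m} = \{1, \ldots, m\}$ of size $n$. Each injective $\mathbf{p} \in \overline{m}^{\,\overline{n}}$ is uniquely expressible as $\mathbf{p} = \mathbf{q} \circ \sigma$ for some $\mathbf{q} \in C$, $\sigma \in S_n$. Thus each $f$ in $\mathcal{A}^n$ is the sum $\sum_{\mathbf{q} \in C} \sum_{\sigma \in S_n} c_{\mathbf{q} \circ \sigma} \mu_{\mathbf{q} \circ \sigma}$. Since $c_{\mathbf{q} \circ \sigma} = (\operatorname{sgn} \sigma) c_{\mathbf{q}}$, this sum can be rewritten $\sum_{\mathbf{q} \in C} c_{\mathbf{q}} \sum_\sigma (\operatorname{sgn} \sigma) \mu_{\mathbf{q} \circ \sigma} = \sum_{\mathbf{q} \in C} c_{\mathbf{q}} \nu_{\mathbf{q}}$, where we have set $\nu_{\mathbf{q}} = \sum_\sigma (\operatorname{sgn} \sigma) \mu_{\mathbf{q} \circ \sigma} = n!\, \Omega(\mu_{\mathbf{q}})$. We are just about done.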
Each $\nu_{\mathbf{q}}$ is alternating, since it is in the range of $\Omega$, and the expansion
$$f = \sum_{\mathbf{q} \in C} c_{\mathbf{q}} \nu_{\mathbf{q}},$$
which we have just found to be valid for every $f \in \mathcal{A}^n$, shows that the set $\{\nu_{\mathbf{q}} : \mathbf{q} \in C\}$ spans $\mathcal{A}^n$. It is also independent, since $\sum_{\mathbf{q} \in C} c_{\mathbf{q}} \nu_{\mathbf{q}} = \sum_{\mathbf{p} \in \overline{m}^{\,\overline{n}}} c_{\mathbf{p}} \mu_{\mathbf{p}}$ and the set $\{\mu_{\mathbf{p}}\}$ is independent. It is therefore a basis for $\mathcal{A}^n$.

Now the total number of injective mappings $\mathbf{p}$ from $\overline{n} = \{1, \ldots, n\}$ to $\overline{m} = \{1, \ldots, m\}$ is $m(m-1) \cdots (m-n+1)$, for the first element can be chosen in $m$ ways, the second in $m - 1$ ways, and so on down through $n$ choices, the last element having $m - (n-1) = m - n + 1$ possibilities. We have seen above that the number of these $\mathbf{p}$'s with a given range is $n!$. Therefore, the number of different range sets is
$$\frac{m(m-1) \cdots (m-n+1)}{n!} = \frac{m!}{n!\,(m-n)!} = \binom{m}{n}.$$
And this is the number of elements $\mathbf{q} \in C$. □
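The counting at the end of the proof can be checked by brute force for small $m$ and $n$. This is only an illustrative sketch: it enumerates all index tuples $\mathbf{p} \in \overline{m}^{\,\overline{n}}$, keeps the injective ones, and keeps the strictly increasing ones as the canonical representatives. The function names are assumptions made for the example.

```python
from itertools import product
from math import comb, factorial


def injective_indices(m, n):
    """All injective n-tuples p with entries in {1, ..., m}."""
    return [p for p in product(range(1, m + 1), repeat=n) if len(set(p)) == n]


def canonical_indices(m, n):
    """The representatives q with q_1 < q_2 < ... < q_n."""
    return [q for q in product(range(1, m + 1), repeat=n)
            if all(q[i] < q[i + 1] for i in range(n - 1))]


m, n = 5, 3
number_injective = len(injective_indices(m, n))   # m(m-1)...(m-n+1)
number_canonical = len(canonical_indices(m, n))   # the dimension of the space
```

Each canonical tuple stands for the $n!$ injective tuples with the same range, so `number_injective` equals `number_canonical * factorial(n)`, and `number_canonical` equals `comb(m, n)`.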
The case $n = m$ is very important. Now $C$ contains only one element, the identity $I$ in $S_m$, so that
$$f = c_I \nu_I = c_I \sum_\sigma (\operatorname{sgn} \sigma) \mu_\sigma$$
and
$$f(\xi_1, \ldots, \xi_m) = c_I \sum_\sigma (\operatorname{sgn} \sigma)\, \mu_{\sigma(1)}(\xi_1) \cdots \mu_{\sigma(m)}(\xi_m) = c_I \sum_\sigma (\operatorname{sgn} \sigma)\, x_{\sigma(1),1} \cdots x_{\sigma(m),m}.$$
This is essentially the formula for the determinant, as we shall see.

6. THE DETERMINANT

We saw in Section 5 that the dimension of the space $\mathcal{A}^m$ of alternating $m$-forms over an $m$-dimensional $V$ is $\binom{m}{m} = 1$. Thus, to within scalar multiples there is only one alternating $m$-linear functional $D$ over $V = \mathbb{R}^m$, and we can adjust the constant so that $D(\delta^1, \ldots, \delta^m) = 1$. This uniquely determined $m$-form is the determinant functional, and its value $D(\mathbf{x}^1, \ldots, \mathbf{x}^m)$ at the $m$-tuple $\langle \mathbf{x}^1, \ldots, \mathbf{x}^m \rangle$ is the determinant of the matrix $\mathbf{x} = \{x_{ij}\}$ whose $j$th column is $\mathbf{x}^j$ for $j = 1, \ldots, m$.

Lemma 6.1. $D(\mathbf{t}) = D(\mathbf{t}^1, \ldots, \mathbf{t}^m) = \sum_{\sigma \in S_m} (\operatorname{sgn} \sigma)\, t_{\sigma(1),1} \cdots t_{\sigma(m),m}$.

Proof. This is just the last remark of the last section, with the constant $c_I = 1$, since $D(\delta^1, \ldots, \delta^m) = 1$, and with the notation changed to the usual matrix form $t_{ij}$. □

Corollary 1. $D(\mathbf{t}^*) = D(\mathbf{t})$.

Proof. If we reorder the factors of the product $t_{\sigma_1,1} \cdots t_{\sigma_m,m}$ in the order of the values $\sigma_i$, the product becomes $t_{1,\rho_1} \cdots t_{m,\rho_m}$, where $\rho = \sigma^{-1}$. Since $\sigma \mapsto \sigma^{-1}$ is a bijection from $S_m$ to $S_m$, and since $\operatorname{sgn}(\sigma^{-1}) = \operatorname{sgn} \sigma$, the sum in the lemma can be rewritten as $\sum_{\rho \in S_m} (\operatorname{sgn} \rho)\, t_{1,\rho_1} \cdots t_{m,\rho_m}$. But this is $\sum_\rho (\operatorname{sgn} \rho)\, t^*_{\rho_1,1} \cdots t^*_{\rho_m,m} = D(\mathbf{t}^*)$. □

Corollary 2. $D(\mathbf{t})$ is an alternating $m$-linear functional of the rows of $\mathbf{t}$.

Now let $\dim V = m$, and let $f$ be any nonzero alternating $m$-form on $V$. For any $T$ in $\mathrm{Hom}\, V$ the functional $f_T$ defined by $f_T(\xi_1, \ldots, \xi_m) = f(T\xi_1, \ldots, T\xi_m)$ also belongs to $\mathcal{A}^m$. Since $\mathcal{A}^m$ is one-dimensional, $f_T = k_T f$ for some constant $k_T$. Moreover, $k_T$ is independent of $f$, since if $g_T = k'_T g$ and $g = cf$, we must have $c f_T = k'_T c f$ and $k'_T = k_T$. This unique constant is called the determinant of $T$; we shall designate it $\Delta(T)$. Note that $\Delta(T)$ is defined independently of any basis for $V$.
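The permutation-sum formula of Lemma 6.1 translates directly into code. The sketch below is an illustration under stated assumptions, not an efficient algorithm: matrices are lists of rows with 0-based indices, and the sum runs over all $m!$ permutations, so it is only practical for small $m$. The helper names are hypothetical.

```python
from itertools import permutations


def sgn(sigma):
    """Sign of a permutation, computed from its number of reversed pairs."""
    n = len(sigma)
    reversals = sum(1 for i in range(n) for j in range(i + 1, n)
                    if sigma[i] > sigma[j])
    return -1 if reversals % 2 else 1


def det(t):
    """D(t) = sum over sigma of sgn(sigma) * t[sigma(1)][1] * ... * t[sigma(m)][m]."""
    m = len(t)
    total = 0
    for sigma in permutations(range(m)):
        term = sgn(sigma)
        for j in range(m):
            term *= t[sigma[j]][j]   # row sigma(j), column j
        total += term
    return total


def transpose(t):
    return [list(row) for row in zip(*t)]


t = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
```

Here `det(t)` and `det(transpose(t))` agree, as Corollary 1 asserts, and interchanging two columns of `t` changes the sign of the result.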
Theorem 6.1. $\Delta(S \circ T) = \Delta(S)\, \Delta(T)$.
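Proof.
$$\Delta(S \circ T) f(\xi_1, \ldots, \xi_m) = f((S \circ T)(\xi_1), \ldots, (S \circ T)(\xi_m)) = f(S(T(\xi_1)), \ldots, S(T(\xi_m))) = \Delta(S) f(T(\xi_1), \ldots, T(\xi_m)) = \Delta(T)\, \Delta(S) f(\xi_1, \ldots, \xi_m).$$
Now divide out $f$. □

Theorem 6.2. If $\theta$ is an isomorphism from $V$ to $W$, and if $T \in \mathrm{Hom}\, V$ and $S = \theta \circ T \circ \theta^{-1}$, then $\Delta(S) = \Delta(T)$.

Proof. If $f$ is any nonzero alternating $m$-form on $W$, and if we define $g$ by $g(\xi_1, \ldots, \xi_m) = f(\theta\xi_1, \ldots, \theta\xi_m)$, then $g$ is a nonzero alternating $m$-form on $V$. Now $f(S \circ \theta\xi_1, \ldots, S \circ \theta\xi_m) = \Delta(S) f(\theta\xi_1, \ldots, \theta\xi_m) = \Delta(S) g(\xi_1, \ldots, \xi_m)$, and also $f(S \circ \theta\xi_1, \ldots, S \circ \theta\xi_m) = f(\theta \circ T\xi_1, \ldots, \theta \circ T\xi_m) = g(T\xi_1, \ldots, T\xi_m) = \Delta(T) g(\xi_1, \ldots, \xi_m)$. Thus $\Delta(S) g = \Delta(T) g$ and $\Delta(S) = \Delta(T)$. □

The reader will expect the two notions of determinant we have introduced to agree; we prove this now.

Corollary 1. If $\mathbf{t}$ is the matrix of $T$ with respect to some basis in $V$, then $D(\mathbf{t}) = \Delta(T)$.

Proof. If $\theta$ is the coordinate isomorphism, then $\bar{T} = \theta \circ T \circ \theta^{-1}$ is in $\mathrm{Hom}\, \mathbb{R}^m$ and $\Delta(\bar{T}) = \Delta(T)$ by the theorem. Also, the columns of $\mathbf{t}$ are the $m$-tuple $\bar{T}(\delta^1), \ldots, \bar{T}(\delta^m)$. Thus $D(\mathbf{t}) = D(\mathbf{t}^1, \ldots, \mathbf{t}^m) = D(\bar{T}(\delta^1), \ldots, \bar{T}(\delta^m)) = \Delta(\bar{T})\, D(\delta^1, \ldots, \delta^m) = \Delta(\bar{T})$. Altogether we have $D(\mathbf{t}) = \Delta(T)$. □

Corollary 2. If $\mathbf{s}$ and $\mathbf{t}$ are $m \times m$ matrices, then $D(\mathbf{s} \cdot \mathbf{t}) = D(\mathbf{s})\, D(\mathbf{t})$.

Proof. $D(\mathbf{s} \cdot \mathbf{t}) = \Delta(S \circ T) = \Delta(S)\, \Delta(T) = D(\mathbf{s})\, D(\mathbf{t})$. □

Corollary 3. $D(\mathbf{t}) = 0$ if and only if $\mathbf{t}$ is singular.

Proof. If $\mathbf{t}$ is nonsingular, then $\mathbf{t}^{-1}$ exists and $D(\mathbf{t})\, D(\mathbf{t}^{-1}) = D(\mathbf{t}\mathbf{t}^{-1}) = D(I) = 1$. In particular, $D(\mathbf{t}) \neq 0$. If $\mathbf{t}$ is singular, some column, say $\mathbf{t}^1$, is a linear combination of the others, $\mathbf{t}^1 = \sum_{i=2}^m c_i \mathbf{t}^i$, and $D(\mathbf{t}^1, \ldots, \mathbf{t}^m) = \sum_{i=2}^m c_i D(\mathbf{t}^i, \mathbf{t}^2, \ldots, \mathbf{t}^m) = 0$, since each term in the sum evaluates $D$ at an $m$-tuple having two identical elements, and so is 0 by the alternating property. □

We still have to show that $\Delta$ has all the properties we ascribed to it in Chapter 2. Some of them are in hand. We know that $\Delta(S \circ T) = \Delta(S)\, \Delta(T)$, and the one- and two-dimensional properties are trivial.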
Thus, if $T$ interchanges independent vectors $\alpha_1$ and $\alpha_2$ in a two-dimensional space, then its matrix with respect to them as a basis is $\mathbf{t} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, and so $\Delta(T) = D(\mathbf{t}) = -1$. The following lemma will complete the job.

Lemma 6.2. Consider $D(\mathbf{t}) = D(\mathbf{t}^1, \ldots, \mathbf{t}^m)$ under the special assumption that $\mathbf{t}^m = \delta^m$. If $\mathbf{s}$ is the $(m-1) \times (m-1)$ matrix obtained from the $m \times m$ matrix $\mathbf{t}$ by deleting its last row and last column, then $D(\mathbf{s}) = D(\mathbf{t})$.
Proof. This can be made to follow from an inspection of the formula of Lemma 6.1, but we shall argue directly. If $\mathbf{t}$ has $\delta^m$ also as its $j$th column for some $j \neq m$, then of course $D(\mathbf{t}) = 0$ by the alternating property. This means that $D(\mathbf{t})$ is unchanged if the $j$th column is altered in the $m$th place, and therefore $D(\mathbf{t})$ depends only on the values $t_{ij}$ in the rows $i \neq m$. That is, $D(\mathbf{t})$ depends only on $\mathbf{s}$. Now $\mathbf{t} \mapsto \mathbf{s}$ is clearly a surjective mapping to $\mathbb{R}^{(m-1) \times (m-1)}$, and, as a function of $\mathbf{s}$, $D(\mathbf{t})$ is alternating $(m-1)$-linear. It therefore is a constant multiple of the determinant $D$ on $\mathbb{R}^{(m-1) \times (m-1)}$. To see what the constant is, we evaluate at $\mathbf{s} = \langle \delta^1, \ldots, \delta^{m-1} \rangle$. Then $D(\mathbf{s}) = 1 = D(\mathbf{t})$ for this special choice, and so $D(\mathbf{s}) = D(\mathbf{t})$ in general. □

In order to get a hold on the remaining two properties, we consider an $m \times m$ matrix $\mathbf{t}$ whose last $m - n$ columns are $\delta^{n+1}, \ldots, \delta^m$, and we apply the above lemma repeatedly. We have, first, $D(\mathbf{t}) = D((\mathbf{t})^{mm})$, where $(\mathbf{t})^{mm}$ is the $(m-1) \times (m-1)$ matrix obtained from $\mathbf{t}$ by deleting the last row and the last column. Since this matrix has $\delta^{m-1}$ as its last column ($\delta^{m-1}$ being now an $(m-1)$-tuple), the same argument shows that its determinant is the same as that of the $(m-2) \times (m-2)$ matrix obtained from it in the same way. We can keep on going as long as the $\delta$-columns last, and thus see that $D(\mathbf{t})$ is the determinant of the $n \times n$ matrix that is the upper left corner of $\mathbf{t}$. If we interpret this in terms of transformations, we have the following lemma.

Lemma 6.3. Suppose that $V$ is $m$-dimensional and that $T$ in $\mathrm{Hom}\, V$ is the identity on an $(m-n)$-dimensional subspace $X$. Let $Y$ be a complement of $X$, and let $p$ be the projection on $Y$ along $X$. Then $p \circ (T \upharpoonright Y)$ can be considered an element of $\mathrm{Hom}\, Y$ and $\Delta(T) = \Delta_Y(p \circ (T \upharpoonright Y))$.

Proof. Let $\alpha_1, \ldots, \alpha_n$ be a basis for $Y$, and let $\alpha_{n+1}, \ldots, \alpha_m$ be a basis for $X$. Then $\{\alpha_i\}_1^m$ is a basis for $V$, and since $T(\alpha_i) = \alpha_i$ for $i = n+1, \ldots, m$, the matrix for $T$ has $\delta^i$ as its $i$th column for $i = n+1, \ldots, m$. The lemma will therefore follow from our above discussion if we can show that the matrix of $p \circ (T \upharpoonright Y)$ in $\mathrm{Hom}\, Y$ is the $n \times n$ upper left corner of $\mathbf{t}$. The student should be able to see this if he visualizes what vector $(p \circ T)(\alpha_i)$ is for $i \le n$. □

Corollary. In the above situation if $Y$ is also invariant under $T$, then $\Delta(T) = \Delta_Y(T \upharpoonright Y)$.

Proof. The proof follows immediately since now $p \circ (T \upharpoonright Y) = T \upharpoonright Y$. □

If the roles of $X$ and $Y$ are interchanged, both being invariant under $T$ and $T$ being the identity on $Y$, then this same lemma tells us that $\Delta(T) = \Delta_X(T \upharpoonright X)$. If we only know that $X$ and $Y$ are $T$-invariant, then we can factor $T$ into a commuting product $T = T_1 \circ T_2 = T_2 \circ T_1$, where $T_1$ and $T_2$ are of the two more special types discussed above, and so have the rule
$$\Delta(T) = \Delta(T_1)\, \Delta(T_2) = \Delta_X(T \upharpoonright X)\, \Delta_Y(T \upharpoonright Y),$$
another of our properties listed in Chapter 2.
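The matrix statements behind Lemma 6.3 and the product rule can be spot-checked on small examples. The sketch below is illustrative only: it reuses a brute-force permutation-sum determinant (an assumption made for the example, not the text's method of proof) and two hand-picked $4 \times 4$ integer matrices.

```python
from itertools import permutations


def det(t):
    """Determinant by the permutation-sum formula (small matrices only)."""
    m = len(t)
    total = 0
    for sigma in permutations(range(m)):
        reversals = sum(1 for i in range(m) for j in range(i + 1, m)
                        if sigma[i] > sigma[j])
        term = -1 if reversals % 2 else 1
        for j in range(m):
            term *= t[sigma[j]][j]
        total += term
    return total


def upper_left(t, n):
    """The n x n upper left corner of t."""
    return [row[:n] for row in t[:n]]


# Last two columns are delta^3 and delta^4; the lower left block is arbitrary.
t = [[1, 2, 0, 0],
     [3, 4, 0, 0],
     [5, 6, 1, 0],
     [7, 8, 0, 1]]

# Block diagonal matrix: both coordinate subspaces are invariant.
u = [[1, 2, 0, 0],
     [3, 4, 0, 0],
     [0, 0, 2, 1],
     [0, 0, 1, 3]]

corner_rule_holds = det(t) == det(upper_left(t, 2))
product_rule_holds = det(u) == det([[1, 2], [3, 4]]) * det([[2, 1], [1, 3]])
```

Both flags come out true for these matrices; the first illustrates the repeated use of Lemma 6.2, the second the rule $\Delta(T) = \Delta_X(T \upharpoonright X)\, \Delta_Y(T \upharpoonright Y)$ in matrix form.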
The final rule is also a consequence of the above lemma. If $T$ is the identity on $X$ and also on $V/X$, then it isn't too hard to see that $p \circ (T \upharpoonright Y)$ is the identity, as an element of $\mathrm{Hom}\, Y$, and so $\Delta(T) = 1$ by the lemma.

We now prove the theorem concerning "expansion by minors (or cofactors)". Let $\mathbf{t}$ be an $m \times m$ matrix, and let $(\mathbf{t})^{pr}$ be the $(m-1) \times (m-1)$ submatrix obtained from $\mathbf{t}$ by deleting the $p$th row and $r$th column. Then,

Theorem 6.3. $D(\mathbf{t}) = \sum_{i=1}^m (-1)^{i+r}\, t_{ir}\, D((\mathbf{t})^{ir})$.

That is, we can evaluate $D(\mathbf{t})$ by going down the $r$th column, multiplying each element by the determinant of the $(m-1) \times (m-1)$ matrix associated with it, and adding. The two occurrences of '$D$' in the theorem are of course over dimensions $m$ and $m - 1$, respectively.

Proof. Consider $D(\mathbf{t}) = D(\mathbf{t}^1, \ldots, \mathbf{t}^m)$ under the special assumption that $\mathbf{t}^r = \delta^p$. Since $D(\mathbf{t})$ is an alternating linear functional both of the columns of $\mathbf{t}$ and of the rows of $\mathbf{t}$, we can move the $r$th column and $p$th row to the right-bottom border, and apply Lemma 6.2. Thus
$$D(\mathbf{t}) = (-1)^{m-r}(-1)^{m-p} D((\mathbf{t})^{pr}) = (-1)^{p+r} D((\mathbf{t})^{pr}),$$
assuming that the $r$th column of $\mathbf{t}$ is $\delta^p$. In general, $\mathbf{t}^r = \sum_{i=1}^m t_{ir}\, \delta^i$, and if we expand $D(\mathbf{t}^1, \ldots, \mathbf{t}^m)$ with respect to this sum in the $r$th place, and if we use the above evaluation of the separate terms of the resulting sum, we get $D(\mathbf{t}) = \sum_{i=1}^m (-1)^{i+r}\, t_{ir}\, D((\mathbf{t})^{ir})$. □

Corollary 1. If $s \neq r$, then $\sum_{i=1}^m (-1)^{i+r}\, t_{is}\, D((\mathbf{t})^{ir}) = 0$.

Proof. We now have the expansion of the theorem for a matrix with identical $s$th and $r$th columns, and the determinant of this matrix is zero by the alternating property. □

For simpler notation, set $c_{ij} = (-1)^{i+j} D((\mathbf{t})^{ij})$. This is called the cofactor of the element $t_{ij}$ in $\mathbf{t}$. Our two results together say that
$$\sum_{i=1}^m c_{ir}\, t_{is} = \delta_{rs}\, D(\mathbf{t}),$$
where $\delta_{rs}$ is 1 if $r = s$ and 0 otherwise. In particular, if $D(\mathbf{t}) \neq 0$, then the matrix $\mathbf{s}$ whose entries are $s_{ri} = c_{ir}/D(\mathbf{t})$ is the inverse of $\mathbf{t}$.

This observation gives us a neat way to express the solution of a system of linear equations. We want to solve $\mathbf{t} \cdot \mathbf{x} = \mathbf{y}$ for $\mathbf{x}$ in terms of $\mathbf{y}$, supposing that $D(\mathbf{t}) \neq 0$.
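Since $\mathbf{s}$ is the inverse of $\mathbf{t}$, we have $\mathbf{x} = \mathbf{s} \cdot \mathbf{y}$. That is, $x_j = \sum_{i=1}^m s_{ji}\, y_i = \left(\sum_{i=1}^m y_i\, c_{ij}\right)/D(\mathbf{t})$ for $j = 1, \ldots, m$. According to our expansion theorem, the numerator in this expression is exactly the determinant $d_j$ of the matrix obtained from $\mathbf{t}$ by replacing its $j$th column by the $m$-tuple $\mathbf{y}$. Hence, with $d_j$ defined this way, the solution to $\mathbf{t} \cdot \mathbf{x} = \mathbf{y}$ is the $m$-tuple
$$\mathbf{x} = \left\langle \frac{d_1}{D(\mathbf{t})}, \ldots, \frac{d_m}{D(\mathbf{t})} \right\rangle.$$
This is Cramer's rule. It was stated in slightly different notation in Section 2.5.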
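Expansion by cofactors, the cofactor formula for the inverse, and Cramer's rule can be sketched as follows. This is a minimal illustration assuming square matrices stored as lists of lists with integer entries and 0-based indices, with exact arithmetic from the standard `fractions` module. The function names are illustrative.

```python
from fractions import Fraction


def minor(t, i, j):
    """The submatrix of t with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(t) if k != i]


def det(t):
    """Expansion by cofactors down the first column (Theorem 6.3 with r = 1)."""
    if len(t) == 1:
        return t[0][0]
    return sum((-1) ** i * t[i][0] * det(minor(t, i, 0)) for i in range(len(t)))


def cofactor(t, i, j):
    return (-1) ** (i + j) * det(minor(t, i, j))


def inverse(t):
    """The matrix s with entries s[r][i] = c_{ir} / D(t); requires D(t) != 0."""
    d = det(t)
    m = len(t)
    return [[Fraction(cofactor(t, i, r), d) for i in range(m)] for r in range(m)]


def cramer(t, y):
    """Solve t . x = y by Cramer's rule: x_j = d_j / D(t); requires D(t) != 0."""
    d = det(t)
    solution = []
    for j in range(len(t)):
        # Replace the j-th column of t by y and take the determinant d_j.
        replaced = [row[:j] + [y[i]] + row[j + 1:] for i, row in enumerate(t)]
        solution.append(Fraction(det(replaced), d))
    return solution


t = [[2, 1],
     [1, 3]]
y = [3, 5]
x = cramer(t, y)   # exact rational solution of 2*x1 + x2 = 3, x1 + 3*x2 = 5
```

Both routines cost far more than elimination for large $m$; they are meant only to mirror the formulas in the text.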
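7. THE EXTERIOR ALGEBRA

Our final job is to introduce a multiplication operation between alternating $n$-linear functionals (also now called exterior $n$-forms). We first extend the tensor product operation that we have used to fashion elementary covariant tensors out of functionals.

Definition. If $f \in (V^*)^{\otimes n}$ and $g \in (V^*)^{\otimes l}$, then $f \otimes g$ is that element of $(V^*)^{\otimes (n+l)}$ defined as follows:
$$f \otimes g(\xi_1, \ldots, \xi_{n+l}) = f(\xi_1, \ldots, \xi_n)\, g(\xi_{n+1}, \ldots, \xi_{n+l}).$$

We naturally ask how this operation combines with the projection $\Omega$ of $(V^*)^{\otimes (n+l)}$ onto $\mathcal{A}^{n+l}$.

Theorem 7.1. $\Omega(f \otimes g) = \Omega(f \otimes \Omega g) = \Omega(\Omega f \otimes g)$.

Proof. We have
$$\Omega(f \otimes \Omega g) = \frac{1}{(n+l)!} \sum_\sigma (\operatorname{sgn} \sigma)(f \otimes \Omega g)^\sigma = \frac{1}{(n+l)!} \sum_\sigma (\operatorname{sgn} \sigma) \Big[ f \otimes \frac{1}{l!} \sum_\rho (\operatorname{sgn} \rho)\, g^\rho \Big]^\sigma.$$
We can regard $\rho$ as acting on the full $n + l$ places of $f \otimes g$ by taking it as the identity on the first $n$ places. Then $(f \otimes g^\rho)^\sigma = (f \otimes g)^{\rho\sigma}$. Set $\rho\sigma = \sigma'$. For each $\sigma'$ there are exactly $l!$ pairs $\langle \rho, \sigma \rangle$ with $\rho\sigma = \sigma'$, namely, the pairs $\{\langle \rho, \rho^{-1}\sigma' \rangle : \rho \in S_l\}$. Thus the above sum is
$$\frac{1}{(n+l)!} \sum_{\sigma'} (\operatorname{sgn} \sigma')(f \otimes g)^{\sigma'} = \Omega(f \otimes g).$$
The proof for $\Omega(\Omega f \otimes g)$ is essentially the same. □

Definition. If $f \in \mathcal{A}^n$ and $g \in \mathcal{A}^l$, then $f \wedge g = \binom{n+l}{n}\, \Omega(f \otimes g)$.

Lemma 7.1. $f_1 \wedge f_2 \wedge \cdots \wedge f_k = (n!/n_1!\, n_2! \cdots n_k!)\, \Omega(f_1 \otimes \cdots \otimes f_k)$, where $n_i$ is the order of $f_i$, $i = 1, \ldots, k$, and $n = \sum_1^k n_i$.

Proof. This is simply an induction, using the definition of the wedge operation $\wedge$ and the above theorem. □

Corollary. If $\lambda_i \in V^*$, $i = 1, \ldots, n$, then $\lambda_1 \wedge \cdots \wedge \lambda_n = n!\, \Omega(\lambda_1 \otimes \cdots \otimes \lambda_n)$. In particular, if $q_1 < \cdots < q_n$ and $\{\mu_i\}_1^m$ is a basis for $V^*$, then $\mu_{q_1} \wedge \cdots \wedge \mu_{q_n} = n!\, \Omega(\mu_{\mathbf{q}}) =$ the basis element $\nu_{\mathbf{q}}$ of $\mathcal{A}^n$.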
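The corollary gives a formula for a wedge product of functionals that is easy to evaluate: $\lambda_1 \wedge \cdots \wedge \lambda_n$ applied to $\langle \xi_1, \ldots, \xi_n \rangle$ is $\sum_\sigma (\operatorname{sgn} \sigma) \prod_i \lambda_i(\xi_{\sigma^{-1}(i)})$. The sketch below is a hedged illustration: functionals on $\mathbb{R}^m$ are assumed to be given by coefficient tuples, and `functional` and `wedge` are hypothetical helper names.

```python
from itertools import permutations
from math import prod


def sgn(sigma):
    n = len(sigma)
    reversals = sum(1 for i in range(n) for j in range(i + 1, n)
                    if sigma[i] > sigma[j])
    return -1 if reversals % 2 else 1


def inverse(sigma):
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma):
        inv[s] = i
    return tuple(inv)


def functional(coeffs):
    """The linear functional x -> sum_k coeffs[k] * x[k]."""
    return lambda x: sum(c * xk for c, xk in zip(coeffs, x))


def wedge(*lams):
    """lambda_1 ^ ... ^ lambda_n computed as n! * Omega(lambda_1 (x) ... (x) lambda_n)."""
    n = len(lams)

    def form(*xi):
        total = 0
        for sigma in permutations(range(n)):
            inv = inverse(sigma)
            # Value of (lambda_1 (x) ... (x) lambda_n)^sigma at xi.
            total += sgn(sigma) * prod(lams[i](xi[inv[i]]) for i in range(n))
        return total

    return form


lam1 = functional((1, 0, 2))
lam2 = functional((0, 1, 1))
xi1, xi2 = (1, 2, 3), (4, 5, 6)
value = wedge(lam1, lam2)(xi1, xi2)
```

For two functionals the value is $\lambda_1(\xi_1)\lambda_2(\xi_2) - \lambda_1(\xi_2)\lambda_2(\xi_1)$, the determinant of the $2 \times 2$ matrix of values $\lambda_i(\xi_j)$; the same sketch returns 0 for `wedge(lam1, lam1)`.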
Theorem 7.2. If $f \in \mathcal{A}^n$ and $g \in \mathcal{A}^l$, then $g \wedge f = (-1)^{ln} f \wedge g$. In particular, $\lambda \wedge \lambda = 0$ for $\lambda \in V^*$.

Proof. We have $g \otimes f = (f \otimes g)^\sigma$, where $\sigma$ is the permutation moving each of the last $l$ places over each of the first $n$ places. Thus $\sigma$ is the product of $ln$ transpositions, $\operatorname{sgn} \sigma = (-1)^{ln}$, and
$$\Omega(g \otimes f) = \Omega((f \otimes g)^\sigma) = (\operatorname{sgn} \sigma)\, \Omega(f \otimes g) = (-1)^{ln}\, \Omega(f \otimes g).$$
We multiply by $\binom{n+l}{n}$ and have the theorem. □

Corollary. If $\{\lambda_i\}_1^n \subset V^*$, then $\lambda_1 \wedge \cdots \wedge \lambda_n = 0$ if and only if the sequence $\{\lambda_i\}_1^n$ is dependent.

Proof. If $\{\lambda_i\}$ is independent, it can be extended to a basis for $V^*$, and then $\lambda_1 \wedge \cdots \wedge \lambda_n$ is some basis vector $\nu_{\mathbf{q}}$ of $\mathcal{A}^n$ by the above corollary. In particular, $\lambda_1 \wedge \cdots \wedge \lambda_n \neq 0$. If $\{\lambda_i\}$ is dependent, then one of its elements, say $\lambda_1$, is a linear combination of the rest, $\lambda_1 = \sum_2^n c_i \lambda_i$, and $\lambda_1 \wedge \lambda_2 \wedge \cdots \wedge \lambda_n = \sum_{i=2}^n c_i \lambda_i \wedge (\lambda_2 \wedge \cdots \wedge \lambda_n)$. The $i$th of these terms repeats $\lambda_i$, and so is 0 by the lemma and the above corollary. □

Lemma 7.2. The mapping $\langle f, g \rangle \mapsto f \wedge g$ is a bilinear mapping from $\mathcal{A}^n \times \mathcal{A}^l$ to $\mathcal{A}^{n+l}$.

Proof. This follows at once from the obvious bilinearity of $f \otimes g$. □

We conclude with an important extension theorem.

Theorem 7.3. Let $\theta$ be the alternating $n$-linear map $\langle \lambda_1, \ldots, \lambda_n \rangle \mapsto \lambda_1 \wedge \cdots \wedge \lambda_n$ from $(V^*)^n$ to $\mathcal{A}^n$. Then for any alternating $n$-linear functional $F(\lambda_1, \ldots, \lambda_n)$ on $(V^*)^n$, there is a uniquely determined linear functional $G$ on $\mathcal{A}^n$ such that $F = G \circ \theta$. The mapping $G \mapsto F$ is thus a canonical isomorphism from $(\mathcal{A}^n)^*$ to $\mathcal{A}^n(V^*)$.

Proof. The straightforward way to prove this is to define $G$ by establishing its necessary values on a basis, using the equation $F = G \circ \theta$, and then to show from the linearity of $G$, the alternating multilinearity of $\langle \lambda_1, \ldots, \lambda_n \rangle \mapsto \lambda_1 \wedge \cdots \wedge \lambda_n$, and the alternating multilinearity of $F$ that the identity $F = G \circ \theta$ holds everywhere. This computation becomes notationally complex. Instead, we shall be devious. We shall see that by proving more than the theorem asserts we get a shorter proof of the theorem.
We consider the space $\mathcal{A}^n(V^*)$ of all alternating $n$-linear functionals on $(V^*)^n$. We know from Theorem 5.2 that $d(\mathcal{A}^n(V^*)) = \binom{m}{n}$, since $d(V^*) = d(V) = m$.
Now for each functional $G$ in $(\mathcal{A}^n)^*$, the functional $G \circ \theta$ is alternating and $n$-linear, and so $G \mapsto F = G \circ \theta$ is a mapping from $(\mathcal{A}^n)^*$ to $\mathcal{A}^n(V^*)$ which is clearly linear. Moreover it is injective, for if $G \neq 0$, then $F(\mu_{q(1)}, \ldots, \mu_{q(n)}) = G(\nu_{\mathbf{q}}) \neq 0$ for some basis vector $\nu_{\mathbf{q}} = \mu_{q(1)} \wedge \cdots \wedge \mu_{q(n)}$ of $\mathcal{A}^n$. Since $d(\mathcal{A}^n(V^*)) = \binom{m}{n} = d(\mathcal{A}^n) = d((\mathcal{A}^n)^*)$, the mapping is an isomorphism (by the corollary of Theorem 2.4, Chapter 2). In particular, every $F$ in $\mathcal{A}^n(V^*)$ is of the form $G \circ \theta$. □

It can be shown further that the property asserted in the above theorem is an abstract characterization of $\mathcal{A}^n$. By this we mean the following. Suppose that a vector space $X$ and an alternating mapping $\varphi$ from $(V^*)^n$ to $X$ are given, and suppose that every alternating functional $F$ on $(V^*)^n$ extends uniquely to a linear functional $G$ on $X$ (that is, $F = G \circ \varphi$). Then $X$ is isomorphic to $\mathcal{A}^n$, and in such a way that $\varphi$ becomes $\theta$. To see this we simply note that the hypothesis of unique extensibility is exactly the hypothesis that $\Phi : G \mapsto F = G \circ \varphi$ is an isomorphism from $X^*$ to $\mathcal{A}^n(V^*)$. The theorem gave an isomorphism $\Theta$ from $(\mathcal{A}^n)^*$ to $\mathcal{A}^n(V^*)$, and the adjoint $(\Phi^{-1} \circ \Theta)^*$ is thus an isomorphism from $X^{**}$ to $(\mathcal{A}^n)^{**}$, that is, from $X$ to $\mathcal{A}^n$. We won't check that $\varphi$ "becomes" $\theta$.

By virtue of Corollary 1 of Theorem 6.2, the identity $D(\mathbf{t}) = D(\mathbf{t}^*)$ is the matrix form of the more general identity $\Delta(T) = \Delta(T^*)$, and it is interesting to note the "coordinate free" proof of this equation. Here, of course, $T \in \mathrm{Hom}\, V$. We first note that the identity $(T^*\lambda)(\xi) = \lambda(T\xi)$ carries through the definitions of $\otimes$ and $\wedge$ to give
$$T^*\lambda_1 \wedge \cdots \wedge T^*\lambda_n(\xi_1, \ldots, \xi_n) = \lambda_1 \wedge \cdots \wedge \lambda_n(T\xi_1, \ldots, T\xi_n). \qquad (*)$$
Also, for each $\boldsymbol{\xi} \in V^n$, the mapping $\mathrm{ev}_{\boldsymbol{\xi}} : \langle \lambda_1, \ldots, \lambda_n \rangle \mapsto \lambda_1 \wedge \cdots \wedge \lambda_n(\xi_1, \ldots, \xi_n)$ is an alternating $n$-linear functional on $(V^*)^n$, that is, an element of $\mathcal{A}^n(V^*)$. The left member of (*) is thus $\mathrm{ev}_{\boldsymbol{\xi}}(T^*\lambda_1, \ldots, T^*\lambda_n)$, and, if $n = \dim V$, this is $\Delta(T^*)\, \mathrm{ev}_{\boldsymbol{\xi}}(\lambda_1, \ldots, \lambda_n)$ by the definition of $\Delta$.
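By the same definition the right side of (*) becomes
$$\Delta(T)[\lambda_1 \wedge \cdots \wedge \lambda_n(\xi_1, \ldots, \xi_n)] = \Delta(T)\, \mathrm{ev}_{\boldsymbol{\xi}}(\lambda_1, \ldots, \lambda_n).$$
Thus (*) implies the identity $\Delta(T^*)\, \mathrm{ev}_{\boldsymbol{\xi}} = \Delta(T)\, \mathrm{ev}_{\boldsymbol{\xi}}$. Since $\mathrm{ev}_{\boldsymbol{\xi}} \neq 0$ if $\boldsymbol{\xi} = \{\xi_i\}_1^n$ is independent, we have proved that $\Delta(T^*) = \Delta(T)$.

We call a wedge product $\lambda_1 \wedge \cdots \wedge \lambda_n$ of functionals $\lambda_i \in V^*$ a multivector. We saw above that $\lambda_1 \wedge \cdots \wedge \lambda_n \neq 0$ if and only if $\{\lambda_i\}_1^n$ is independent, in which case $\{\lambda_i\}_1^n$ spans an $n$-dimensional subspace of $V^*$. The following lemma shows that this geometric connection is not accidental.

Lemma 7.3. Two independent $n$-tuples $\{\lambda_i\}_1^n$ and $\{\mu_i\}_1^n$ in $V^*$ have the same linear span if and only if $\mu_1 \wedge \cdots \wedge \mu_n = k(\lambda_1 \wedge \cdots \wedge \lambda_n)$ for some $k$.

Proof. If $\{\mu_j\}_1^n \subset L(\{\lambda_i\}_1^n)$, then each $\mu_j$ is a linear combination of the $\lambda_i$'s, and if we expand $\mu_1 \wedge \cdots \wedge \mu_n$ according to these basis expansions, we get $k(\lambda_1 \wedge \cdots \wedge \lambda_n)$. If, furthermore, $\{\mu_j\}_1^n$ is independent, then $k$ cannot be zero.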
7.8 EXTERIOR POWERS OF SCALAR PRODUCT SPACES 319 Now suppose, conversely, that /jli A • • • A iJLn = k(\i A • • • A X^), where k 7^ 0. This implies first that {juj i is independent, and then that My A (Ai A • • • A \n) = 0 for each j, so that each /jlj is dependent on {X^}^. Together, these two consequences imply that the set {jUz} i has the same linear span as {Xj^. □ This lemma shows that a multivector has a relationship to the subspace it determines like that of a single vector to its span. 8. EXTERIOR POWERS OF SCALAR PRODUCT SPACES Let y be a finite-dimensional vector space, and let ( , ) be a nondegenerate (nonsingular) symmetric bilinear form on V. In this and the next section we shall call any such bilinear form a scalar product, even though it may not be positive definite. We know that the bilinear form ( , ) induces an isomorphism of V with y* sending ?/ G F into y G F*, where y{x) — (x, "y) = (x, y) for all X gV. We then get a nondegenerate form (scalar product), which we shall continue to denote by ( , ), on F* by setting (i7, v) = (t6, v). We also obtain a nondegenerate scalar product on (2,^ by setting {ui A ' ' ' A Uq, vi A ' ' ' A Vq) = det {{Ui, vj)). (8.1) To check that (8.1) makes sense, we first remark that for fixed zJi, . . . , y^ G F*, the right-hand side of (8.1) is an antisymmetric multilinear function of the vectors Ui, . . . jUq, and therefore extends to a linear function on (2^(F*) by Theorem 7.3. Similarly, holding the u's fixed determines a linear function on (^^(F*), and (8.1) is well defined and extends to a bilinear function on (2^(F*). The right-hand side of (8.1) is clearly symmetric in u and z;, so that the bilinear form we get is indeed symmetric. To see that it is nondegenerate, let us choose a basis Uij . . . J Un so that {ui, Uj) = ±. 8ij. (8.2) (We can always find such a basis by Theorem 7.1 of Chapter 2.) We know that {ui} = {ui^ A • • • A Ui] forms a basis for (2,^, where i = <iij . . . 
^ iq> ranges over all g-tuplets of integers such that I < ii < - - - < iq < rij and we claim that (i7i, i7j) = ±5ij. (8.3) In fact, if i 5^ j, then ir 9^ js for some value of r between 1 and q and for all s. In this case one whole row of the matrix (ui^, Uj^) vanishes, namely, the rth row. Thus (8.1) gives zero in this case. If i = j, then (8.2) says that the matrix has ±1 down the diagonal and zeros elsewhere, establishing (8.3), and thus the fact that ( , ) is nondegenerate on (2^. In particular, we have {ui A ' ' ' A Un, ui A ' ' ' A Un) = (—1)*, (8.4) where # is the number of minus signs occurring in (8.3).
320 MULTILINEAR FUNCTIONALS 7.9 9. THE STAR OPERATOR Let y be a finite-dimensional vector space endowed with a nondegenerate scalar product as in Section 8. The space (2,^ is one-dimensional if n is the dimension of V. The induced scalar product on (2^ is nondegenerate, so that (u, u) is either always positive or always negative for all nonzero i7 G (2^. In particular, there are exactly two tI's in (2,^ with {%u) = ±1. Let us choose one of them and hold it fixed for the remainder of this section. Geometrically, this amounts to choosing an orientation on V. We thus have picked a i7 G a^ with (i7, u) = (-1)^. (9.1) Let V be some fixed element of d^. Then for any y G (2^~^, v A y G (2^, and so we can write v A y = fv{y)u, where fy{y) depends linearly on y. Since the induced scalar product ( , ) on Gi^~^ is nondegenerate, there is a unique element *v G (i^~^ such that {y, *v) = My). To repeat, we have assigned a *2J G (2^~^ to each y G (2^ by setting (^, ^v)u = V A y. (9.2) We have thus defined a map, *, from d^ to a^~^. It is clear from (9.2) that this map is linear. Let t^i, . . . , t6^ be a basis for V satisfying (8.2) and also u = Ui A ' ' ' A Unf and construct the corresponding bases for the spaces (2^ and Q^n-3 Then Ui A Uj = 0 if any ii occurring in the g-tuplet i also occurs in j. If no ii occurs in j then Ui A Uj ^ ^i^...i^j^...j^_qUj where e^ = sgn k. If we compare this with (9.2) and (8.3), we see that *27i = ±€,i...,y^...y^_^l7j, (9.3) where the sign is the same as that occurring in (8.3), i.e., the sign is positive or negative according as the number of ji with {Uj^^ Uj^ = — 1 which appear in j is even or odd. Applying * to (9.3), we see that **2j = (-l)«(^-«)+#z; for all v^a^. (9.4) Let V and lo be elements of (2^. Then (*z;, ^w)u = V A "^w = { — iy^^~^^^w A V = {—iy^^~^\^^w, v)u. If we apply (9.4), we see that (*z;^ ^w) = (—l)*(y, w). (9.5)
CHAPTER 8

INTEGRATION

1. INTRODUCTION

In this chapter we shall present a theory of integration in $n$-dimensional Euclidean space $\mathbb{E}^n$, which the reader will remember is simply Cartesian $n$-space $\mathbb{R}^n$ together with the standard scalar product. Our main item of business is to introduce a notion of size for subsets of $\mathbb{E}^n$ (area in two dimensions, volume in three, ...). Before proceeding to the formal definitions, let us see what properties we would like our notion of size to have. We are looking for a function $\mu$ which assigns a number $\mu(A)$ to bounded subsets $A \subset \mathbb{E}^n$.

i) We would like $\mu(A)$ to be a nonnegative real number.
ii) If $A \subset B$, we would expect to have $\mu(A) \le \mu(B)$.
iii) If $A$ and $B$ are disjoint (that is, $A \cap B = \emptyset$), then we would expect to have $\mu(A \cup B) = \mu(A) + \mu(B)$.
iv) Let $T$ be any Euclidean motion.* For any set $A$ let $TA$ be the set of all points of the form $T\mathbf{x}$, where $\mathbf{x} \in A$. We then would expect to have $\mu(TA) = \mu(A)$. (Thus we want "congruent" sets to have the same size.)
v) We would expect a "lower-dimensional set" (where this is suitably defined) to have zero size. Thus points in the line, curves in the plane, surfaces in three-space, etc., should all have zero size.
vi) By the same token, we would expect open sets to have positive size.

In the above discussion we did not specify what kind of sets we were talking about. One might be ambitious and try to assign a size to every subset of $\mathbb{E}^n$. This proves to be impossible, however, for the following reason: Let $U$ and $V$ be any two bounded open subsets of $\mathbb{E}^3$. It can be shown† that we can find decompositions
$$U = \bigcup_{i=1}^k U_i \quad \text{and} \quad V = \bigcup_{i=1}^k V_i$$
with $U_i \cap U_j = \emptyset = V_i \cap V_j$ for $i \neq j$, and Euclidean motions $T_i$ with $T_i U_i = V_i$. In other words, we can break up $U$ into finitely many pieces, move these pieces around, and then recombine them to get $V$. Needless to say, the sets $U_i$ will have to look very bad.

* Recall that a Euclidean motion is an isometry of $\mathbb{E}^n$ and can thus be represented as the composition of a translation and an orthogonal transformation.
† S. Banach and A. Tarski, Sur la décomposition des ensembles de points en parties respectivement congruentes, Fund. Math. 6, 244-277 (1924). R. M. Robinson, On the decomposition of spheres, Fund. Math. 34, 246-260 (1947).

A moment's reflection shows that if we wish to assign a size to all subsets (including those like $U_i$), we cannot satisfy (ii), (iii), (iv), and (vi). In fact, (iii) [repeated $(k - 1)$ times] implies that
$$\mu(U) = \sum \mu(U_i) \quad \text{and} \quad \mu(V) = \sum \mu(V_i),$$
and (iv) implies that $\mu(U_i) = \mu(V_i)$. Thus $\mu(U) = \mu(V)$, or the size of any two open sets would coincide. Since any open set contains two disjoint open sets, this implies, by (ii), that $\mu(U) \ge 2\mu(U)$, so $\mu(U) = 0$.

We are thus faced with a choice. Either we dispense with some of requirements (i) through (vi) above, or we do not assign a size to every subset of $\mathbb{E}^n$. Since our requirements are reasonable, we prefer the second alternative. This means, of course, that now, in addition to introducing a notion of size, we must describe the class of "good" sets we wish to admit. We shall proceed axiomatically, listing some "reasonable" axioms for a class of subsets and a function $\mu$.

2. AXIOMS

Our axioms will concern a class $\mathcal{D}$ of subsets of $\mathbb{E}^n$ and a function $\mu$ defined on $\mathcal{D}$. (That is, $\mu(A)$ is defined if $A$ is a subset of $\mathbb{E}^n$ belonging to our collection $\mathcal{D}$.)

I. $\mathcal{D}$ is a collection of subsets of $\mathbb{E}^n$ such that:
$\mathcal{D}$1. If $A \in \mathcal{D}$ and $B \in \mathcal{D}$, then $A \cup B \in \mathcal{D}$, $A \cap B \in \mathcal{D}$, and $A - B \in \mathcal{D}$.
$\mathcal{D}$2. If $A \in \mathcal{D}$ and $T$ is a translation, then $TA \in \mathcal{D}$.
$\mathcal{D}$3. The set $\square_{\mathbf{0}}^{\mathbf{1}} = \{\mathbf{x} : 0 \le x^i < 1\}$ belongs to $\mathcal{D}$.

II. The real-valued function $\mu$ has the following properties:
$\mu$1. $\mu(A) \ge 0$ for all $A \in \mathcal{D}$.
$\mu$2. If $A \in \mathcal{D}$, $B \in \mathcal{D}$, and $A \cap B = \emptyset$, then $\mu(A \cup B) = \mu(A) + \mu(B)$.
$\mu$3. For any $A \in \mathcal{D}$ and any translation $T$, we have $\mu(TA) = \mu(A)$.
$\mu$4. $\mu(\square_{\mathbf{0}}^{\mathbf{1}}) = 1$.

Before proceeding, some remarks about our axioms are in order.
Axiom $\mathcal{D}$1 will allow us to perform elementary set-theoretical operations with the elements of $\mathcal{D}$. Note that in Axioms $\mathcal{D}$2 and $\mu$3 we are only allowing translations, but in our list of desired properties we wanted proper behavior with respect to all Euclidean motions in (iv). The reason for this is that we shall show that for "good" choices of $\mathcal{D}$, the axioms, as they stand, uniquely determine $\mu$. It will then turn out that $\mu$ actually satisfies the stronger condition (iv), while we assume the weaker condition $\mu$3 as an axiom.

Fig. 8.1

Axiom $\mathcal{D}$3 guarantees that our theory is not completely trivial, i.e., the collection $\mathcal{D}$ is not empty. Axiom $\mu$4 has the effect of normalizing $\mu$. Without it, any $\mu$ satisfying $\mu$1, $\mu$2, and $\mu$3 could be multiplied by any nonnegative real number, and the new function $\mu'$ so obtained would still satisfy our axioms. In particular, $\mu$4 guarantees that we do not choose $\mu$ to be the trivial function assigning to each $A$ the value zero.

Fig. 8.2

Our program for the next few sections is to make some reasonable choices for $\mathcal{D}$ and to show that for the given $\mathcal{D}$ there exists a unique $\mu$ satisfying $\mu$1 through $\mu$4.

An important elementary consequence of the $\mathcal{D}$, $\mu$-axioms that we shall frequently use without comment is:

$\mu$5. If $A \subset \bigcup_1^k A_i$ and all the sets are in $\mathcal{D}$, then $\mu(A) \le \sum_1^k \mu(A_i)$.

Our beginning work will be largely combinatorial. We will first consider (generalized) rectangles, which are just Cartesian products of intervals, and the way in which a point inside a rectangle determines a splitting of the rectangle into a collection of smaller rectangles, as indicated in Fig. 8.1. This is associated with the fact that the intersection of any two rectangles is a rectangle and the difference of two rectangles is a finite disjoint union of rectangles (see Fig. 8.2).
Fig. 8.3

We call a set $A$ paved if it can be expressed as the union of a finite disjoint collection $\mathcal{P}$ of rectangles (a paving of $A$). It will follow from our combinatorial considerations that the collection $\mathcal{D}_{\min}$ of all the paved sets satisfies Axioms $\mathcal{D}$1 through $\mathcal{D}$3 and is the smallest family that does: any other collection $\mathcal{D}$ satisfying the axioms includes $\mathcal{D}_{\min}$. It will then follow that if $\mu$ satisfies $\mu$1 through $\mu$4 on $\mathcal{D}_{\min}$, then it must have the natural value (the product of the lengths of the sides) for a rectangle. This implies that $\mu$ is uniquely defined on $\mathcal{D}_{\min}$ by requirements $\mu$1 through $\mu$4, since the value $\mu(A)$ for any paved set $A$ must be the sum of the natural values for the rectangles in a paving of $A$.

The existence of $\mu$ on $\mathcal{D}_{\min}$ thus depends on the crucial lemma that two different pavings of the set $A$ give the same sum. (See Fig. 8.3.) This comes down to the fact that the "intersection" of two pavings of $A$ is a third paving "finer" than either, and the fact that when a single rectangle is broken up, the natural values of $\mu$ for the pieces add up to $\mu$ for the fragmented rectangle.

All these considerations are elementary but exceedingly messy in detail. We give the proofs below for the reader to refer to in case of doubt, but he may prefer to study only the definitions and statements of results and then to proceed to Section 6.

3. RECTANGLES AND PAVED SETS

We first introduce some notation and terminology. Let $\mathbf{a} = \langle a^1, \ldots, a^n \rangle$ and $\mathbf{b} = \langle b^1, \ldots, b^n \rangle$ be elements of $\mathbb{E}^n$. By the rectangle $\square_{\mathbf{a}}^{\mathbf{b}}$ we shall mean the set of all $\mathbf{x} = \langle x^1, \ldots, x^n \rangle$ in $\mathbb{E}^n$ with $a^i \le x^i < b^i$. Thus
$$\square_{\mathbf{a}}^{\mathbf{b}} = \{\mathbf{x} : a^i \le x^i < b^i,\ i = 1, \ldots, n\}. \qquad (3.1)$$
Note that in order for $\square_{\mathbf{a}}^{\mathbf{b}}$ to be nonempty, we must have $a^i < b^i$ for all $i$. In other words,
$$\square_{\mathbf{a}}^{\mathbf{b}} = \emptyset \quad \text{if } a^i \ge b^i \text{ for some } i. \qquad (3.2)$$
In the plane ($n = 2$) for instance, our rectangles $\square_{\mathbf{a}}^{\mathbf{b}}$ correspond to ordinary Euclidean rectangles whose sides are parallel to the axes.
(We should perhaps use an additional adjective and call our sets level rectangles, braced rectangles, or something else, but for simplicity we shall just call them rectangles.) Note that in the plane our rectangles include the left-hand and lower edges but not the right-hand and upper ones (see Fig. 8.4).
For general $n$, if we set $\mathbf{1} = \langle 1, \dots, 1\rangle$, then our notation coincides with that of $\mathfrak{D}3$.

We now collect some elementary facts about rectangles. It follows immediately from the definition (3.1) that if $a = \langle a^1, \dots, a^n\rangle$, $b = \langle b^1, \dots, b^n\rangle$, etc., then

$$\square_a^b \cap \square_c^d = \square_e^f, \tag{3.3}$$

where $e^i = \max(a^i, c^i)$ and $f^i = \min(b^i, d^i)$, $i = 1, \dots, n$. (The reader should draw various different instances of this equation in the plane to get the correct geometrical feeling.) Note that the case where $\square_a^b \cap \square_c^d = \emptyset$ is included in (3.3) by (3.2). Another immediate consequence of the definition (3.1) is

$$T\square_a^b = \square_{Ta}^{Tb} \quad\text{for any translation } T. \tag{3.4}$$

We will now establish some elementary results which will imply that any $\mathfrak{D}$ satisfying Axioms $\mathfrak{D}1$ through $\mathfrak{D}3$ must contain all rectangles.

Lemma 3.1. Any rectangle $\square_a^b$ can be written as the disjoint union

$$\square_a^b = \bigcup_{r=1}^{k} \square_{a_r}^{b_r}, \quad\text{where } b_r - a_r \in \square_{\mathbf 0}^{\mathbf 1}.$$

(What this says is that any "big" rectangle can be written as a finite union of "small" ones.)

Proof. We may assume that $\square_a^b \ne \emptyset$ (otherwise take $k = 0$ in the union). Thus $b^i > a^i$. In particular, if we choose the integer $m$ sufficiently large, $(1/2^m)(b - a)$ will lie in $\square_{\mathbf 0}^{\mathbf 1}$. By induction, it therefore suffices to prove that we can decompose $\square_a^b$ into the disjoint union

$$\square_a^b = \bigcup_{s=1}^{2^n} \square_{c_s}^{d_s} \quad\text{with } d_s - c_s = \tfrac12(b - a). \tag{3.5}$$

(For then we can continue to subdivide until the rectangles we get are small enough.) We get this subdivision in the obvious way by choosing the vertex "in the middle" of the rectangle and considering all rectangles obtained by cutting $\square_a^b$ through this point by coordinate hyperplanes. To write down an explicit
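formula, it will be convenient to use the set of all subsets of $\{1, \dots, n\}$ as an indexing set, rather than the integers $1, \dots, 2^n$. Let $J$ denote an arbitrary subset of $\{1, 2, \dots, n\}$. Let $a_J = \langle a_J^1, \dots, a_J^n\rangle$ and $b_J = \langle b_J^1, \dots, b_J^n\rangle$ be given by

$$a_J^i = \begin{cases} a^i & \text{if } i \in J,\\ a^i + \tfrac12(b^i - a^i) & \text{if } i \notin J,\end{cases} \qquad\text{and}\qquad b_J^i = \begin{cases} a^i + \tfrac12(b^i - a^i) & \text{if } i \in J,\\ b^i & \text{if } i \notin J.\end{cases}$$

Then any $x \in \square_a^b$ lies in one and only one $\square_{a_J}^{b_J}$. In other words, $\square_{a_J}^{b_J} \cap \square_{a_K}^{b_K} = \emptyset$ if $J \ne K$, and $\bigcup_{\text{all } J} \square_{a_J}^{b_J} = \square_a^b$. (The case where $n = 2$ is shown in Fig. 8.5.) Since $b_J - a_J = \tfrac12(b - a)$ for all $J$, we have proved the lemma. □

We now observe that for any $c \in \square_{\mathbf 0}^{\mathbf 1}$ we have, by (3.3),

$$\square_{\mathbf 0}^{c} = \square_{\mathbf 0}^{\mathbf 1} \cap \square_{c - \mathbf 1}^{c}. \tag{3.6}$$

Let $T_v$ denote translation through the vector $v$. Then $\square_{c-\mathbf 1}^{c} = T_{c - \mathbf 1}\square_{\mathbf 0}^{\mathbf 1}$ by (3.4). Thus by Axioms $\mathfrak{D}2$ and $\mathfrak{D}3$ the rectangle $\square_{c-\mathbf 1}^{c}$ must belong to $\mathfrak{D}$. By (3.6) and Axiom $\mathfrak{D}1$ we conclude that $\square_{\mathbf 0}^{c} \in \mathfrak{D}$ for any $c \in \square_{\mathbf 0}^{\mathbf 1}$. Observe that $T_a\square_{\mathbf 0}^{b-a} = \square_a^b$ by (3.4). Thus

$$\square_a^b \in \mathfrak{D} \quad\text{whenever } b - a \in \square_{\mathbf 0}^{\mathbf 1}. \tag{3.7}$$

If we now apply Lemma 3.1 we conclude that $\square_a^b \in \mathfrak{D}$ for all $a$ and $b$.

We make the following definition.

Definition 3.1. A subset $S \subset \mathbb{E}^n$ will be called a paved set if $S$ is the disjoint union of finitely many rectangles.

We can then assert:

Proposition 3.1. Any $\mathfrak{D}$ satisfying Axioms $\mathfrak{D}1$ through $\mathfrak{D}3$ must contain all paved sets. Let $\mathfrak{D}_{\min}$ denote the collection of all finite unions of rectangles; then $\mathfrak{D}_{\min}$ satisfies Axioms $\mathfrak{D}1$ through $\mathfrak{D}3$.

Proof. We have already proved the first part of this proposition. We leave the second part as an exercise for the reader.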
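The intersection rule (3.3) and the halving step (3.5) of Lemma 3.1 can be tried out on small cases. The following is a minimal sketch, not part of the text: a rectangle $\square_a^b$ is represented as a pair of coordinate tuples, and the function names `intersect`, `is_empty`, `contains`, and `halve` are hypothetical. The index set $J$ is encoded as a tuple of booleans, with `True` meaning $i \in J$.

```python
from itertools import product

def intersect(a, b, c, d):
    """Corners (e, f) of the intersection of [a, b) and [c, d), as in (3.3)."""
    e = tuple(max(x, y) for x, y in zip(a, c))
    f = tuple(min(x, y) for x, y in zip(b, d))
    return e, f

def is_empty(a, b):
    """A rectangle is empty as soon as a^i >= b^i for some i, as in (3.2)."""
    return any(ai >= bi for ai, bi in zip(a, b))

def contains(a, b, x):
    """Membership test for the half-open rectangle of (3.1)."""
    return all(ai <= xi < bi for ai, xi, bi in zip(a, x, b))

def halve(a, b):
    """The 2^n rectangles with corners a_J, b_J from the proof of Lemma 3.1."""
    n = len(a)
    mid = tuple(a[i] + (b[i] - a[i]) / 2 for i in range(n))
    pieces = []
    for J in product((False, True), repeat=n):
        lo = tuple(a[i] if J[i] else mid[i] for i in range(n))
        hi = tuple(mid[i] if J[i] else b[i] for i in range(n))
        pieces.append((lo, hi))
    return pieces
```

For example, `halve((0, 0), (2, 4))` returns four rectangles with sides 1 and 2, and any point of the original rectangle satisfies `contains` for exactly one of them, which is the disjointness claimed in the lemma.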
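4. THE MINIMAL THEORY

We are now going to see how far $\mu$ is determined by Axioms $\mu 1$ through $\mu 4$. In fact, we are going to show that $\mu(\square_a^b)$ is what it should be; i.e., if $a = \langle a^1, \dots, a^n\rangle$ and $b = \langle b^1, \dots, b^n\rangle$, then we must have

$$\mu(\square_a^b) = \begin{cases} 0 & \text{if } \square_a^b = \emptyset,\\ (b^1 - a^1)\cdots(b^n - a^n) & \text{if } \square_a^b \ne \emptyset.\end{cases} \tag{4.1}$$

Axiom $\mu 4$ says that (4.1) holds for the special case $a = \mathbf 0$, $b = \mathbf 1$. Examining the proof of Lemma 3.1 shows that $\square_{\mathbf 0}^{\mathbf 1}$ can be written as the disjoint union of $2^n$ rectangles, all congruent (via translation) to $\square_{\mathbf 0}^{\mathbf{1/2}}$, where $\mathbf{1/2} = \langle \tfrac12, \dots, \tfrac12\rangle$. Axioms $\mu 2$ and $\mu 3$ then imply that $\mu(\square_{\mathbf 0}^{\mathbf{1/2}}) = 1/2^n$. Repeating this argument inductively shows that

$$\mu\bigl(\square_{\mathbf 0}^{\mathbf{1}/2^r}\bigr) = \frac{1}{2^{nr}}. \tag{4.2}$$

We shall now use (4.2) to verify (4.1). The idea is to approximate any rectangle by unions of translates of cubes $\square_{\mathbf 0}^{\mathbf 1/2^r}$.

Observe that in proving (4.1) we need to consider only rectangles of the form $\square_{\mathbf 0}^{c}$. In fact, we take $c = b - a$ and observe that $T_{-a}(\square_a^b) = \square_{\mathbf 0}^{c}$, so Axiom $\mu 3$ implies that $\mu(\square_a^b) = \mu(\square_{\mathbf 0}^{c})$, and by definition $c^1 \cdots c^n = (b^1 - a^1)\cdots(b^n - a^n)$.

If $\square_{\mathbf 0}^{c} = \emptyset$, then (4.1) is trivially true (from Axiom $\mu 2$). Suppose that $\square_{\mathbf 0}^{c} \ne \emptyset$. Then $c = \langle c^1, \dots, c^n\rangle$ with $c^i > 0$ for all $i$. For each $r$ there are $n$ integers $N^1, \dots, N^n$ such that (Fig. 8.6)

$$\frac{N^i}{2^r} \le c^i < \frac{N^i + 1}{2^r}. \tag{4.3}$$

In what follows, let $k = \langle k_1, \dots, k_n\rangle$, $l = \langle l_1, \dots, l_n\rangle$, etc., denote vectors with integral coordinates (i.e., the $k_i$'s are integers). Let us write $k < l$ if $k_i < l_i$ for all $i$. If $N = \langle N^1, \dots, N^n\rangle$, then it follows from (4.3) and the definitions that

$$\square_{(1/2^r)k}^{(1/2^r)k + \mathbf 1/2^r} \subset \square_{\mathbf 0}^{c} \quad\text{whenever } \mathbf 0 \le k < N.$$

For any $k$ and $l$,

$$\square_{(1/2^r)k}^{(1/2^r)k + \mathbf 1/2^r} \cap \square_{(1/2^r)l}^{(1/2^r)l + \mathbf 1/2^r} = \emptyset \quad\text{if } k \ne l.$$

Since

$$\mu\Bigl(\square_{(1/2^r)k}^{(1/2^r)k + \mathbf 1/2^r}\Bigr) = \frac{1}{2^{nr}}$$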
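by (4.2) (and Axiom $\mu 3$) and

$$\bigcup_{\mathbf 0 \le k < N} \square_{(1/2^r)k}^{(1/2^r)k + \mathbf 1/2^r} \subset \square_{\mathbf 0}^{c},$$

we conclude that

$$\mu(\square_{\mathbf 0}^{c}) \ge \frac{1}{2^{nr}} \times (\text{the number of } k \text{ satisfying } \mathbf 0 \le k < N).$$

It is easy to see that there are $N^1 \cdot N^2 \cdots N^n$ such $k$, so that

$$\mu(\square_{\mathbf 0}^{c}) \ge \frac{1}{2^{nr}}\,(N^1 \cdots N^n) = \Bigl(\frac{N^1}{2^r}\Bigr)\cdots\Bigl(\frac{N^n}{2^r}\Bigr).$$

According to (4.3), $N^i/2^r > c^i - 1/2^r$, so we have

$$\mu(\square_{\mathbf 0}^{c}) \ge \Bigl(c^1 - \frac{1}{2^r}\Bigr)\cdots\Bigl(c^n - \frac{1}{2^r}\Bigr). \tag{4.4}$$

Similarly,

$$\square_{\mathbf 0}^{c} \subset \bigcup_{\mathbf 0 \le k < N + \mathbf 1} \square_{(1/2^r)k}^{(1/2^r)k + \mathbf 1/2^r},$$

and we conclude that

$$\mu(\square_{\mathbf 0}^{c}) \le \Bigl(c^1 + \frac{1}{2^r}\Bigr)\cdots\Bigl(c^n + \frac{1}{2^r}\Bigr). \tag{4.5}$$

Letting $r \to \infty$ in (4.4) and (4.5) proves (4.1).

In deriving (4.1) we made use of Axiom $\mu 4$. Examining our argument shows that if $\mu'$ satisfied $\mu 2$ and $\mu 3$ but not $\mu 4$, we could argue in the same manner except that we would have to multiply everything by the fixed constant $\mu'(\square_{\mathbf 0}^{\mathbf 1})$. To sum up, we have proved:

Proposition 4.1. If $\mu$ satisfies Axioms $\mu 1$ through $\mu 4$, then the value of $\mu$ on any rectangle is uniquely determined and is given by (4.1). If $\mu'$ satisfies $\mu 1$ through $\mu 3$, then for any rectangle $\square_a^b$,

$$\mu'(\square_a^b) = K\mu(\square_a^b), \quad\text{where } K = \mu'(\square_{\mathbf 0}^{\mathbf 1}).$$

5. THE MINIMAL THEORY (Continued)

We will now show that formula (4.1) extends to give a unique $\mu$ defined on $\mathfrak{D}_{\min}$ so as to satisfy Axioms $\mu 1$ through $\mu 4$. We must establish essentially two facts.

1) Every (finite) union of rectangles can be written as a disjoint union of rectangles.

This will then allow us to use Axiom $\mu 2$ to determine $\mu(A)$ for every $A \in \mathfrak{D}_{\min}$, namely $\mu(A) = \sum_i \mu(\square_{a_i}^{b_i})$ if $A$ is the disjoint union of the $\square_{a_i}^{b_i}$. Since $A$ might be written in another way as a disjoint union of rectangles, this formula is not well defined until we establish that:

2) If $A = \bigcup_i \square_{a_i}^{b_i} = \bigcup_j \square_{c_j}^{d_j}$ are two representations of $A$ as a disjoint union of rectangles, then $\sum_i \mu(\square_{a_i}^{b_i}) = \sum_j \mu(\square_{c_j}^{d_j})$.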
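The squeeze between (4.4) and (4.5) in Section 4 can be watched numerically. The sketch below is an illustration under stated assumptions, not the authors' construction: the function name `dyadic_bounds` is hypothetical, the corner $c$ is given with exact rational coordinates, and the two returned numbers are the counting bounds $\prod N^i / 2^{nr}$ and $\prod (N^i + 1)/2^{nr}$ obtained from the integers $N^i$ of (4.3).

```python
from fractions import Fraction
from math import floor, prod

def dyadic_bounds(c, r):
    """Lower and upper bounds for the content of the rectangle from 0 to c.

    N[i] is the integer of (4.3), i.e. N[i] / 2**r <= c[i] < (N[i] + 1) / 2**r.
    The lower bound counts the dyadic cubes of side 2**-r inside the rectangle,
    the upper bound counts the cubes needed to cover it.
    """
    scale = 2 ** r
    N = [floor(ci * scale) for ci in c]
    cube = Fraction(1, scale ** len(c))   # content of one cube, as in (4.2)
    lower = prod(N) * cube
    upper = prod(Ni + 1 for Ni in N) * cube
    return lower, upper

# Sides 3/2 and 1/3, so the natural value (4.1) is 1/2.
c = (Fraction(3, 2), Fraction(1, 3))
for r in (2, 5, 10, 20):
    lo, hi = dyadic_bounds(c, r)
    print(r, float(lo), float(hi))
```

Both columns approach $c^1 c^2 = 1/2$ as $r$ grows, which is the limiting step that proves (4.1).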
We first introduce some notation.

Definition 5.1. A paving $\mathfrak{p}$ of $\mathbb{E}^n$ is a finite collection of mutually disjoint rectangles. The floor of this paving, denoted by $|\mathfrak{p}|$, is the union of all rectangles belonging to $\mathfrak{p}$. If $\mathfrak{p} = \{\square_{a_i}^{b_i}\}$ and $T$ is a translation, we set $T\mathfrak{p} = \{T\square_{a_i}^{b_i}\}$. If $\mathfrak{p}$ and $\mathfrak{s}$ are two pavings, we say that $\mathfrak{s}$ is finer than $\mathfrak{p}$ (and write $\mathfrak{s} \prec \mathfrak{p}$) if every rectangle of $\mathfrak{p}$ is a union of rectangles of $\mathfrak{s}$.

It is clear that if $\mathfrak{p} \prec \mathfrak{r}$ and $\mathfrak{s} \prec \mathfrak{p}$, then $\mathfrak{s} \prec \mathfrak{r}$. Note also that $\mathfrak{s} \prec \mathfrak{p}$ implies $|\mathfrak{p}| \subset |\mathfrak{s}|$.

Proposition 5.1. Let $\mathfrak{p}$ and $\mathfrak{s}$ be any two pavings. There exists a third paving $\mathfrak{r}$ such that $\mathfrak{r} \prec \mathfrak{p}$ and $\mathfrak{r} \prec \mathfrak{s}$.

Proof. The idea of the proof is very simple. Each rectangle in $\mathfrak{p}$ or in $\mathfrak{s}$ determines $2n$ hyperplanes (each hyperplane containing a face of the rectangle). If we collect all these hyperplanes, they will "enclose" a number of rectangles. We let $\mathfrak{r}$ consist of those rectangles in this collection which do not contain any smaller rectangle. Figure 8.7 shows the case (for $n = 2$) where $\mathfrak{p}$ and $\mathfrak{s}$ each contain one rectangle. Here $\mathfrak{r}$ contains nine rectangles.

We now fill in the details of this argument. Let $c_1 = \langle c_1^1, \dots, c_1^n\rangle, \dots, c_k = \langle c_k^1, \dots, c_k^n\rangle$ be all the vectors that occur in the description of the rectangles of $\mathfrak{p}$ and $\mathfrak{s}$. (In other words, if $\square_a^b \in \mathfrak{p}$ or $\in \mathfrak{s}$, then $a$ and $b$ are among the $c$'s.) Let $d_1, \dots, d_{k^n}$ be the vectors of the form $\langle c_{i_1}^1, \dots, c_{i_n}^n\rangle$, where the $i_j$'s range independently from $1$ to $k$ (so that there are $k^n$ of them). (See Fig. 8.8 for the case where $n = 2$ and $\mathfrak{p}$ and $\mathfrak{s}$ consist of one rectangle each.) For each $d_i$ there is at most one smallest $d_{j(i)}$ such that $d_i < d_{j(i)}$. In fact, if $d_i = \langle c_{i_1}^1, \dots, c_{i_n}^n\rangle$, then set $d_{j(i)} = \langle c_{j_1}^1, \dots, c_{j_n}^n\rangle$, where

$$c_{j_l}^l = \min_{c_m^l > c_{i_l}^l} c_m^l.$$
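Let $\mathfrak{r} = \{\square_{d_i}^{d_{j(i)}}\}$. Then $\mathfrak{r}$ is finer than $\mathfrak{p}$ and $\mathfrak{s}$. In fact, if $\square_a^b \in \mathfrak{p}$, say, then $\square_a^b = \square_{d_\alpha}^{d_\beta}$ for suitable $\alpha$ and $\beta$, and

$$\square_{d_\alpha}^{d_\beta} = \bigcup_{\substack{d_\alpha \le d_i \\ d_{j(i)} \le d_\beta}} \square_{d_i}^{d_{j(i)}}. \tag{5.1}$$

To see this observe that if $x \in \square_{d_\alpha}^{d_\beta}$, then $d_\alpha \le x < d_\beta$. Choose a largest $d_i \le x$. Then $d_i \le x < d_{j(i)}$, so $x \in \square_{d_i}^{d_{j(i)}}$. This proves the proposition. We will later want to use the particular form of the $\mathfrak{r}$ we constructed to find additional information. □

We can now prove (1) and (2).

Lemma 5.1. Let $\mathfrak{p}_1, \dots, \mathfrak{p}_l$ be pavings. Then there exists a paving $\mathfrak{s}$ such that $|\mathfrak{s}| = |\mathfrak{p}_1| \cup \cdots \cup |\mathfrak{p}_l|$.

Proof. By repeated applications of Proposition 5.1 we can choose a paving $\mathfrak{r}$ which is finer than all the $\mathfrak{p}_i$'s. Then each $|\mathfrak{p}_i|$ is the union of suitable rectangles of $\mathfrak{r}$. Let $\mathfrak{s}$ be the collection of all these rectangles occurring in all the $\mathfrak{p}_i$'s. Then $|\mathfrak{s}| = |\mathfrak{p}_1| \cup \cdots \cup |\mathfrak{p}_l|$. □

In particular, we have proved (1). More generally, we have shown that every $A \in \mathfrak{D}_{\min}$ is of the form $A = |\mathfrak{p}|$ for a suitable paving $\mathfrak{p}$.

We now wish to turn our attention to (2).

Lemma 5.2. Let $c_1^1 < \cdots < c_{k_1}^1$, $c_1^2 < \cdots < c_{k_2}^2$, $\dots$, $c_1^n < \cdots < c_{k_n}^n$ be $n$ sequences of numbers. Then

$$\mu\Bigl(\square_{\langle c_1^1, \dots, c_1^n\rangle}^{\langle c_{k_1}^1, \dots, c_{k_n}^n\rangle}\Bigr) = \sum_{i_1, \dots, i_n} \mu\Bigl(\square_{\langle c_{i_1}^1, \dots, c_{i_n}^n\rangle}^{\langle c_{i_1+1}^1, \dots, c_{i_n+1}^n\rangle}\Bigr),$$

where each $i_l$ ranges from $1$ to $k_l - 1$.

Proof. In fact, $c_{k_i}^i - c_1^i = (c_2^i - c_1^i) + (c_3^i - c_2^i) + \cdots + (c_{k_i}^i - c_{k_i - 1}^i)$, so that the lemma follows from (4.1) when we multiply out all the factors. □

We now prove (2). Let $\mathfrak{p} = \{\square_{a_i}^{b_i}\}$ and $\mathfrak{s} = \{\square_{e_j}^{f_j}\}$, where $A = |\mathfrak{p}| = |\mathfrak{s}|$. Let $\mathfrak{r}$ be the paving we constructed in the proof of Proposition 5.1. Let $\mathfrak{t}$ be the collection of those rectangles $\square_{d_i}^{d_{j(i)}}$ of $\mathfrak{r}$ such that $\square_{d_i}^{d_{j(i)}} \subset |\mathfrak{p}| = |\mathfrak{s}|$. Then to prove (2) it suffices to show that

$$\sum_{\square \in \mathfrak{p}} \mu(\square) = \sum_{\square \in \mathfrak{t}} \mu(\square) = \sum_{\square \in \mathfrak{s}} \mu(\square). \tag{5.2}$$

Now each rectangle $\square_{a_i}^{b_i}$ of $\mathfrak{p}$ is decomposed into rectangles $\square_{d_i}^{d_{j(i)}}$ according to (5.1), that is, $a_i = d_\alpha$, $b_i = d_\beta$, etc. By construction of the $d$'s, this is exactly a decomposition of the type described in Lemma 5.2. Thus (5.1) implies that

$$\mu\bigl(\square_{d_\alpha}^{d_\beta}\bigr) = \sum_{\substack{d_\alpha \le d_i \\ d_{j(i)} \le d_\beta}} \mu\bigl(\square_{d_i}^{d_{j(i)}}\bigr).$$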
Summing over all the rectangles of $\mathfrak{p}$ (and doing the same for those of $\mathfrak{s}$) proves (5.2).

We can thus state:

Theorem 5.1. Every $A \in \mathfrak{D}_{\min}$ can be written as $A = |\mathfrak{p}|$. The number $\mu(A) = \sum_{\square \in \mathfrak{p}} \mu(\square)$ does not depend on the choice of $\mathfrak{p}$. We thus get a well-defined function $\mu$ on $\mathfrak{D}_{\min}$. It satisfies Axioms $\mu 1$ through $\mu 4$. If $\mu'$ is any other function on $\mathfrak{D}_{\min}$ satisfying $\mu 2$ and $\mu 3$, then $\mu'(A) = K\mu(A)$, where $K = \mu'(\square_{\mathbf 0}^{\mathbf 1})$.

Proof. The proof of the last two assertions of the theorem is easy and is left as an exercise for the reader.

6. CONTENTED SETS

Theorem 5.1 shows that our axioms are not vacuous. It does not provide us with a satisfactory theory, however, because $\mathfrak{D}_{\min}$ contains far too few sets. In particular, it does not fulfill requirement (iii), since $\mathfrak{D}_{\min}$ is not invariant under rotations, except under very special ones. We are now going to remedy this by repeating the arguments of Section 4; we are going to try to approximate more general sets by sets whose $\mu$'s we know, i.e., by sets contained in $\mathfrak{D}_{\min}$. This idea goes back to Archimedes, who used it to find the areas of figures in the plane.

Definition 6.1. Let $A$ be any subset of $\mathbb{E}^n$. We say that $\mathfrak{p}$ is an inner paving of $A$ if $|\mathfrak{p}| \subset A$. We say that $\mathfrak{s}$ is an outer paving of $A$ if $A \subset |\mathfrak{s}|$.

We list several obvious facts.

If $|\mathfrak{p}| \subset A \subset |\mathfrak{s}|$, then $\mu(|\mathfrak{p}|) \le \mu(|\mathfrak{s}|)$. (6.1)

If $|\mathfrak{p}| \subset A \subset |\mathfrak{s}|$, then $|T\mathfrak{p}| \subset TA \subset |T\mathfrak{s}|$. (6.2)

If $A_1 \cap A_2 = \emptyset$ and $|\mathfrak{p}_1| \subset A_1$, $|\mathfrak{p}_2| \subset A_2$, then $\mathfrak{p}_1 \cup \mathfrak{p}_2$ is an inner paving of $A_1 \cup A_2$. (6.3)

Definition 6.2. For any bounded subset $A$ of $\mathbb{E}^n$ let

$$\mu_*(A) = \operatorname{lub}_{|\mathfrak{p}| \subset A} \mu(|\mathfrak{p}|)$$

be called the inner content of $A$, and let

$$\bar\mu(A) = \operatorname{glb}_{A \subset |\mathfrak{s}|} \mu(|\mathfrak{s}|)$$

be called the outer content of $A$.

Note that since $A$ is bounded, there exists an $\mathfrak{s}$ with $A \subset |\mathfrak{s}|$. This shows that $\bar\mu(A)$ is defined. This together with (6.1) shows that $\mu_*(A)$ is defined and that

$$\mu_*(A) \le \bar\mu(A). \tag{6.4}$$
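Definition 6.2 can be made concrete for the closed unit disk in the plane. The sketch below is only an illustration under assumptions of our own: it uses pavings by grid squares of side $1/m$ (a special kind of paving, so it gives a lower bound for $\mu_*$ and an upper bound for $\bar\mu$ rather than the lub and glb themselves), and the function name `disk_content_bounds` is hypothetical. Corners are kept as integers in units of $1/m$ so that the inside/outside tests are exact.

```python
import math

def disk_content_bounds(m):
    """Total area of grid squares inside, and of grid squares meeting, the unit disk.

    Square (i, j) has corners i/m, (i+1)/m and j/m, (j+1)/m.  It belongs to the
    inner paving if its farthest corner is within distance 1 of the origin, and
    to the outer paving if its closure comes within distance 1 of the origin.
    """
    inner = outer = 0
    for i in range(-m - 1, m + 1):
        for j in range(-m - 1, m + 1):
            far = max(i * i, (i + 1) ** 2) + max(j * j, (j + 1) ** 2)
            near_x = 0 if i <= 0 <= i + 1 else min(abs(i), abs(i + 1))
            near_y = 0 if j <= 0 <= j + 1 else min(abs(j), abs(j + 1))
            near = near_x ** 2 + near_y ** 2
            if far <= m * m:
                inner += 1
            if near <= m * m:
                outer += 1
    return inner / m ** 2, outer / m ** 2

for m in (10, 50, 100):
    lo, hi = disk_content_bounds(m)
    print(m, lo, hi, math.pi)
```

The two bounds bracket $\pi$ and their gap shrinks as $m$ grows, which is the behavior that makes the disk contented in the sense of the definition that follows.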
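Definition 6.3. A set $A$ will be called contented if $\mu_*(A) = \bar\mu(A)$. We call $\mu_*(A) = \bar\mu(A)$ the content of $A$ and denote it by $\mu(A)$.

Observe that every $A \in \mathfrak{D}_{\min}$ is contented. In fact, if $A = |\mathfrak{r}|$, then $\mathfrak{r}$ is both an inner and an outer paving of $A$. Thus $\mu_*(A) = \bar\mu(A) = \mu(|\mathfrak{r}|)$, and the new definition of $\mu(A)$ coincides with the old one.

Our next immediate objective is to show that the collection of all contented sets fulfills Axioms $\mathfrak{D}1$ through $\mathfrak{D}3$.

Proposition 6.1. A set $A$ is contented if and only if its boundary is contented and has content zero.

Proof. Suppose $A$ is contented. For any $\delta > 0$ we can find an inner paving $\mathfrak{p}$ and an outer paving $\mathfrak{s}$ such that $\mu(|\mathfrak{s}|) - \mu(|\mathfrak{p}|) < \delta/2$. We want to replace $\mathfrak{p}$ by a close paving $\mathfrak{p}'$ with $|\mathfrak{p}'| \subset \operatorname{int} A$. To do this, we choose a small number $\eta$ and replace each rectangle $\square_a^b$ of $\mathfrak{p}$ by $\square_{a + \eta(b-a)}^{b - \eta(b-a)}$. We let $\mathfrak{p}_\eta$ be the collection of all these rectangles. Then $|\mathfrak{p}_\eta| \subset \operatorname{int} |\mathfrak{p}|$, so $|\mathfrak{p}_\eta| \subset \operatorname{int} A$. Furthermore, $\mu(|\mathfrak{p}_\eta|) = (1 - 2\eta)^n \mu(|\mathfrak{p}|)$, since the factor $(1 - 2\eta)$ is the decrease of each side of each rectangle of $\mathfrak{p}$. Similarly, we replace $\mathfrak{s}$ by a slightly larger $\mathfrak{s}_\eta$, with $\bar A \subset \operatorname{int} |\mathfrak{s}_\eta|$ and $\mu(|\mathfrak{s}_\eta|) \le (1 + 2\eta)^n \mu(|\mathfrak{s}|)$. By choosing $\eta$ sufficiently small, we can thus arrange that $\mu(|\mathfrak{s}_\eta|) - \mu(|\mathfrak{p}_\eta|) < \delta$.

Let $\mathfrak{r}$ be a paving which is finer than $\mathfrak{s}_\eta$ and $\mathfrak{p}_\eta$, with $|\mathfrak{r}| = |\mathfrak{s}_\eta|$. Let $\mathfrak{q} \subset \mathfrak{r}$ consist of those rectangles of $\mathfrak{r}$ lying in $\operatorname{int} A$. Then $|\mathfrak{r}| = |\mathfrak{s}_\eta| \supset |\mathfrak{q}| \supset |\mathfrak{p}_\eta|$, so $\mu(|\mathfrak{r}|) - \mu(|\mathfrak{q}|) < \delta$. But $\partial A \subset |\mathfrak{r} - \mathfrak{q}|$, so that $\bar\mu(\partial A) \le \mu(|\mathfrak{r} - \mathfrak{q}|) = \mu(|\mathfrak{r}|) - \mu(|\mathfrak{q}|) < \delta$. In other words, $\bar\mu(\partial A) = 0$.

Conversely, suppose that $\partial A$ has content zero. Let $\mathfrak{s}$ be an outer paving of $\partial A$ with $\mu(|\mathfrak{s}|) < \epsilon$. Let $\mathfrak{r}$ be a paving finer than $\mathfrak{s}$ and such that $A \subset |\mathfrak{r}|$. Let $\mathfrak{p} \subset \mathfrak{r}$ consist of those rectangles contained in $A$. Let $\mathfrak{q} \subset \mathfrak{r}$ consist of those rectangles lying in $|\mathfrak{p}| \cup |\mathfrak{s}|$. Then $\mu(|\mathfrak{q}|) \le \mu(|\mathfrak{p}|) + \mu(|\mathfrak{s}|) < \mu(|\mathfrak{p}|) + \epsilon$. Furthermore, $A \subset |\mathfrak{q}|$. In fact, let $x \in A$. Then $x \in \square$ for some $\square \in \mathfrak{r}$. If $\square \cap \partial A \ne \emptyset$, then $\square \cap |\mathfrak{s}| \ne \emptyset$, so $\square \subset |\mathfrak{s}|$, since $\mathfrak{r}$ is a refinement of $\mathfrak{s}$.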
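If $\square \cap \partial A = \emptyset$, then every point of $\square$ must lie in $A$, so that $\square \subset |\mathfrak{p}|$. We have thus constructed $\mathfrak{p}$ and $\mathfrak{q}$ with $|\mathfrak{p}| \subset A \subset |\mathfrak{q}|$ and $\mu(|\mathfrak{q}|) - \mu(|\mathfrak{p}|) < \epsilon$. Since we can do this for any $\epsilon$, this implies that $A$ is contented. □

Proposition 6.2. The union of any finite number of sets with content zero has content zero. If $A \subset B$ and $B$ has content zero, then so does $A$.

Proof. The proof is obvious.

Theorem 6.1. Let $\mathfrak{D}_{\mathrm{con}}$ denote the collection of all contented sets. Then $\mathfrak{D}_{\mathrm{con}}$ satisfies Axioms $\mathfrak{D}1$ through $\mathfrak{D}3$, and the $\mu$ given in Definition 6.3 satisfies $\mu 1$ through $\mu 4$. If $\mu'$ is any other function on $\mathfrak{D}_{\mathrm{con}}$ satisfying $\mu 1$ through $\mu 3$, then $\mu' = K\mu$, where $K = \mu'(\square_{\mathbf 0}^{\mathbf 1})$.

Proof. Let us verify the axioms.

$\mathfrak{D}1$. For any $A$ and $B$, $\partial(A \cup B) \subset \partial A \cup \partial B$ and $\partial(A \cap B) \subset \partial A \cup \partial B$.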
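By Proposition 6.1, if $A$ and $B$ are contented, then $\partial A$ and $\partial B$ have content zero. Thus so do $\partial A \cup \partial B$, $\partial(A \cup B)$, and $\partial(A \cap B)$, by Proposition 6.2. Hence $A \cup B$ and $A \cap B$ are contented.

$\mathfrak{D}2$. Follows immediately from (6.2).

$\mathfrak{D}3$. Is obvious.

$\mu 2$. If $A_1$ and $A_2$ are contented, we can find inner pavings $\mathfrak{p}_1$ and $\mathfrak{p}_2$ such that $\mu(A_1) - \mu(|\mathfrak{p}_1|) < \epsilon/2$ and $\mu(A_2) - \mu(|\mathfrak{p}_2|) < \epsilon/2$. If $A_1 \cap A_2 = \emptyset$, then $\mathfrak{p}_1 \cup \mathfrak{p}_2$ is an inner paving of $A_1 \cup A_2$, and so $\mu(A_1 \cup A_2) \ge \mu(|\mathfrak{p}_1|) + \mu(|\mathfrak{p}_2|) > \mu(A_1) + \mu(A_2) - \epsilon$. Since $\epsilon$ is arbitrary, $\mu(A_1 \cup A_2) \ge \mu(A_1) + \mu(A_2)$. On the other hand, let $\mathfrak{s}_1$ and $\mathfrak{s}_2$ be outer pavings of $A_1$ and $A_2$, respectively, with $\mu(|\mathfrak{s}_1|) \le \mu(A_1) + \epsilon/2$ and $\mu(|\mathfrak{s}_2|) \le \mu(A_2) + \epsilon/2$. Let $\mathfrak{r}$ be a paving with $|\mathfrak{r}| = |\mathfrak{s}_1| \cup |\mathfrak{s}_2|$. Then $\mathfrak{r}$ is an outer paving of $A_1 \cup A_2$ and $\mu(|\mathfrak{r}|) \le \mu(|\mathfrak{s}_1|) + \mu(|\mathfrak{s}_2|)$. Thus $\mu(A_1 \cup A_2) \le \mu(|\mathfrak{r}|) \le \mu(A_1) + \mu(A_2) + \epsilon$, or $\mu(A_1 \cup A_2) \le \mu(A_1) + \mu(A_2)$. These two inequalities together give $\mu 2$.

$\mu 1$. Is obvious.

$\mu 3$. Follows from (6.2) and Definition 6.3.

$\mu 4$. We already know.

The second part of the theorem follows from Theorem 5.1 and Definition 6.3. In fact, we know that $\mu'(|\mathfrak{p}|) = K\mu(|\mathfrak{p}|)$, and (6.1) together with Axiom $\mu 2$ implies that $\mu'(|\mathfrak{p}|) \le \mu'(A) \le \mu'(|\mathfrak{s}|)$. Since we can choose $\mathfrak{p}$ and $\mathfrak{s}$ to be arbitrarily close approximations to $A$ (relative to $\mu$), we are done. □

Remark. It is useful to note that we have actually proved a little more than what is stated in Theorem 6.1. We have proved, namely, that if $\mathfrak{D}$ is any collection of sets satisfying $\mathfrak{D}1$ through $\mathfrak{D}3$, such that $\mathfrak{D}_{\min} \subset \mathfrak{D} \subset \mathfrak{D}_{\mathrm{con}}$, and if $\mu' : \mathfrak{D} \to \mathbb{R}$ satisfies $\mu 1$ through $\mu 3$, then $\mu'(A) = K\mu(A)$ for all $A$ in $\mathfrak{D}$, where $K = \mu'(\square_{\mathbf 0}^{\mathbf 1})$.

7. WHEN IS A SET CONTENTED?

We will now establish some useful criteria for deciding whether a given set is contented. Recall that a closed ball $B_x^r$ with center $x$ and radius $r$ is given by

$$B_x^r = \{y : \|y - x\| \le r\}. \tag{7.1}$$

Note that

$$B_x^r \subset \square_{x - r\mathbf 1}^{x + (r + \epsilon)\mathbf 1} \quad\text{for any } \epsilon > 0, \tag{7.2}$$

and

$$\square_{x - (r/\sqrt n)\mathbf 1}^{x + (r/\sqrt n)\mathbf 1} \subset B_x^r. \tag{7.3}$$

(See Fig. 8.9.)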
If we combine (7.2) and (7.3), we see that any cube $C$ lies in a ball $B$ such that $\mu(B) \le 2^n(\sqrt n)^n \mu(C)$ and that any ball $B$ lies in a cube $C$ such that $\mu(C) \le 3^n(\sqrt n)^n \mu(B)$.

Lemma 7.1. Let $A$ be a subset of $\mathbb{E}^n$. Then $A$ has content zero if and only if for every $\epsilon > 0$ there exist a finite number of balls $\{B_i\}$ covering $A$ with $\sum \mu(B_i) < \epsilon$.

Proof. If we have such a collection of covering balls, then by the above remark we can enlarge each ball to a rectangle to get a paving $\mathfrak{p}$ such that $A \subset |\mathfrak{p}|$ and $\mu(|\mathfrak{p}|) < 3^n(\sqrt n)^n \epsilon$. Therefore, $\bar\mu(A) = 0$ if we can always find the $\{B_i\}$. Conversely, suppose $A$ has content $0$. Then for any $\delta$ we can find an outer paving $\mathfrak{p}$ with $\mu(|\mathfrak{p}|) < \delta$. For each rectangle $\square$ in the paving we can, by the arguments of Section 4, find a finite number of cubes which cover $\square$ and whose total content is as close as we like to $\mu(\square)$, say $< 2\mu(\square)$. By doing this for each $\square \in \mathfrak{p}$, we have a finite number of cubes $\{\square_i\}$ covering $A$ with total content less than $2\delta$. Then by our remark before the lemma each cube $\square_i$ lies in a ball $B_i$ such that $\mu(B_i) \le 2^n(\sqrt n)^n \mu(\square_i)$, and so we have a covering of $A$ by balls $B_i$ such that $\sum \mu(B_i) < 2^{n+1}(\sqrt n)^n \delta$. If we take $\delta = \epsilon / \bigl(2^{n+1}(\sqrt n)^n\bigr)$, we have the desired collection of balls, proving the lemma. □

Recall that a map $\varphi$ of $U \subset \mathbb{E}^n \to \mathbb{E}^n$ is said to satisfy a Lipschitz condition if there is a constant $K$ (called the Lipschitz constant) such that

$$\|\varphi(y) - \varphi(x)\| \le K\|y - x\|. \tag{7.4}$$

Proposition 7.1. Let $A$ be a set of content zero with $A \subset U$ and let $\varphi : U \to \mathbb{E}^n$ satisfy a Lipschitz condition. Then $\varphi(A)$ has content zero.

Proof. The proof consists of applying both parts of Lemma 7.1. Since $A$ has content zero, for any $\epsilon > 0$ we can find a finite number of balls covering $A$ whose total content is less than $\epsilon / K^n$. By (7.4), $\varphi(B_x^r) \subset B_{\varphi(x)}^{Kr}$, so that the images of the balls covering $A$ cover $\varphi(A)$ and have a total volume less than $\epsilon$.
□

Recall that if $\varphi$ is a (continuously) differentiable map of an open set $U$ into $\mathbb{E}^n$, then $\varphi$ satisfies a Lipschitz condition on any compact subset of $U$. As a consequence of Proposition 7.1, we can thus state:

Proposition 7.2. Let $\varphi$ be a continuously differentiable map defined on an open set $U$, and let $A$ be a bounded set of content zero with $\bar A \subset U$. Then $\varphi(A)$ has content zero.

Let $A$ be any compact subset of $\mathbb{E}^n$ lying entirely in the subspace given by $x^n = 0$. Then $A$ has content zero. In fact, for some sufficiently large fixed $r$, the set $A$ is contained in the rectangle $\square_{\langle -r, \dots, -r, 0\rangle}^{\langle r, \dots, r, \epsilon\rangle}$ for any $\epsilon > 0$, which has arbitrarily small volume.
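The covering argument behind Propositions 7.1 and 7.2 can be illustrated on the unit circle in the plane, which is the image of the content-zero segment $[0, 2\pi) \times \{0\}$ under the map $\langle t, s\rangle \mapsto \langle \cos t, \sin t\rangle$. The sketch below rests on assumptions of our own: the helper names `cover_circle`, `total_area`, and `covered` are hypothetical, and squares are used in place of balls, which changes the constants but not the conclusion. Since the map has Lipschitz constant $1$ in $t$, the arc over an interval of length `step` stays within distance `step` of the image of its left endpoint.

```python
import math

def cover_circle(N):
    """N squares, given as (center, half_side), whose union covers the unit circle."""
    step = 2 * math.pi / N
    squares = []
    for k in range(N):
        t = k * step
        # The arc over [t, t + step] lies within distance `step` of (cos t, sin t).
        squares.append(((math.cos(t), math.sin(t)), step))
    return squares

def total_area(squares):
    """Sum of the areas of the covering squares (each has side 2 * half_side)."""
    return sum((2 * h) ** 2 for _, h in squares)

def covered(squares, s):
    """Check that the point at angle s lies in the square assigned to its arc."""
    step = 2 * math.pi / len(squares)
    k = min(int(s / step), len(squares) - 1)
    (cx, cy), h = squares[k]
    return abs(math.cos(s) - cx) <= h + 1e-12 and abs(math.sin(s) - cy) <= h + 1e-12

for N in (10, 100, 1000):
    print(N, total_area(cover_circle(N)))
```

The total area equals $16\pi^2/N$, so it can be made smaller than any given $\epsilon$ by taking $N$ large, in line with the circle having content zero.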
Now let $\psi : V \subset \mathbb{E}^{n-1} \to \mathbb{E}^n$ be a continuously differentiable map given by

$$y^i = \psi^i(x^1, \dots, x^{n-1}), \quad i = 1, \dots, n.$$

Let $B$ be any bounded subset of $\mathbb{E}^{n-1}$ with $\bar B \subset V$. We can then write $\psi(B) = \varphi(A)$, where $A$ is the set of points in $\mathbb{E}^n$ of the form $(y, 0)$, where $y \in B$, and where $\varphi$ is a differentiable map such that

$$\varphi(x^1, \dots, x^n) = \langle \psi^1(x^1, \dots, x^{n-1}), \dots, \psi^n(x^1, \dots, x^{n-1})\rangle.$$

By Proposition 7.2 we see that $\mu(\psi(B)) = 0$. Thus,

Proposition 7.3. Let $\psi$ be a differentiable map of $V \subset \mathbb{E}^{n-1}$ into $\mathbb{E}^n$, and let $B$ be a bounded set such that $\bar B \subset V$. Then $\psi(B)$ has content zero.

We have thus recovered requirement (v) of Section 1. An immediate consequence of Propositions 7.3 and 6.1 is:

Proposition 7.4. Let $A \subset \mathbb{E}^n$ be such that $\partial A \subset \bigcup_i \psi_i(B_i)$, where each $\psi_i$ and $B_i$ is as in Proposition 7.3. Then $A$ is contented.

This shows that every set "we can draw" is contented.

Exercise. Show that every ball is contented.

8. BEHAVIOR UNDER LINEAR DISTORTIONS

We shall continue to derive consequences of Proposition 7.1.

Proposition 8.1. Let $\varphi$ be a one-to-one map of $U \to \mathbb{E}^n$ which satisfies a Lipschitz condition and is such that $\varphi^{-1}$ is continuous. If $A \subset U$ is contented, then so is $\varphi(A)$.

Proof. Since $A$ is contented, $\partial A$ has content zero. By the conditions on $\varphi$, we know that $\partial\varphi(A) = \varphi(\partial A)$. Thus $\partial\varphi(A)$ has content zero, and so $\varphi(A)$ is contented. □

An immediate consequence of Proposition 8.1 is:

Proposition 8.2. Let $L$ be a linear transformation of $\mathbb{E}^n$. Then $LA$ is contented whenever $A$ is contented.

Proof. If $L$ is nonsingular, Proposition 8.1 applies. If $L$ is singular, it maps all of $\mathbb{E}^n$ onto a proper subspace. Any such subspace is contained in the image of $\{x : x^n = 0\}$ by a suitable linear transformation, and so $\mu(LA) = 0$ for any contented $A$. □

Theorem 8.1. Let $L$ be a linear transformation of $\mathbb{E}^n$. Then for any contented $A$ we have

$$\mu(LA) = |\det L|\,\mu(A). \tag{8.1}$$

Proof. We can restrict our attention to nonsingular $L$, since we have already checked Eq. (8.1) for $\det L = 0$.
If L is nonsingular, then L carries the class of
contented sets into itself. Let us define $\mu'$ by $\mu'(A) = \mu(LA)$ for each $A \in \mathfrak{D}_{\mathrm{con}}$. We claim that $\mu'$ satisfies Axioms $\mu 1$ through $\mu 3$ on $\mathfrak{D}_{\mathrm{con}}$. In fact, $\mu 1$ and $\mu 2$ are obviously true; $\mu 3$ follows from the fact that for any translation $T_v$, we have $T_{Lv}L = LT_v$, so that

$$\mu'(T_vA) = \mu(LT_vA) = \mu(T_{Lv}LA) = \mu(LA) = \mu'(A).$$

By Theorem 6.1 we thus conclude that $\mu' = k_L\mu$, where $k_L$ is some constant depending on $L$. We must show that $k_L = |\det L|$.

We first observe that if $O$ is an orthogonal transformation, then $\mu(OA) = \mu(A)$. In fact, we know that $\mu(OA) = k_O\mu(A)$. If we take $A$ to be the unit ball $B_{\mathbf 0}^1$, then $OB_{\mathbf 0}^1 = B_{\mathbf 0}^1$, so $k_O = 1$.

Next we observe that $\mu(L_1L_2A) = k_{L_1}\mu(L_2A) = k_{L_1}k_{L_2}\mu(A)$, so that $k_{L_1L_2} = k_{L_1}k_{L_2}$.

Now we recall that any nonsingular $L$ can be written as $L = PO$, where $P$ is a positive self-adjoint operator and $O$ is orthogonal. Thus $k_L = k_P$ and $|\det L| = |\det P|\,|\det O| = |\det P|$, so we need only verify (8.1) for positive self-adjoint linear transformations. Any such $P$ can be written as $P = O_1DO_1^{-1}$, where $O_1$ is orthogonal and $D$ is diagonal. Since $P$ is positive, all the eigenvalues of $D$ are positive. Since $\det P = \det D$ and $k_P = k_D$, we need only verify (8.1) for the case where $L$ is given by a diagonal matrix with positive eigenvalues $\lambda_1, \dots, \lambda_n$. But then $L\square_{\mathbf 0}^{\mathbf 1} = \square_{\mathbf 0}^{\langle\lambda_1, \dots, \lambda_n\rangle}$, so that

$$\mu'(\square_{\mathbf 0}^{\mathbf 1}) = \mu\bigl(\square_{\mathbf 0}^{\langle\lambda_1, \dots, \lambda_n\rangle}\bigr) = \lambda_1 \cdots \lambda_n = |\det L|,$$

verifying (8.1). □

Exercise. Let $v_1, \dots, v_n$ be vectors of $\mathbb{E}^n$. By the parallelepiped spanned by $v_1, \dots, v_n$ we mean the set of all vectors of the form $\sum_{i=1}^n x^iv_i$, where $0 \le x^i \le 1$. Show that its content is $|\det((v_i, v_j))|^{1/2}$.

9. AXIOMS FOR INTEGRATION

So far we have shown that there is a unique $\mu$ defined for a large collection of sets in $\mathbb{E}^n$. However, we do not have an effective way to compute $\mu$, except in very special cases. To remedy this we must introduce a theory of integration. We first introduce some notation.

Definition 9.1. Let $f$ be any real-valued function on $\mathbb{E}^n$.
By the support of $f$, denoted by $\operatorname{supp} f$, we shall mean the closure of the set where $f$ is not zero; that is, $\operatorname{supp} f = \overline{\{x : f(x) \ne 0\}}$.
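Observe that

$$\operatorname{supp}(f + g) \subset \operatorname{supp} f \cup \operatorname{supp} g \tag{9.1}$$

and

$$\operatorname{supp} fg \subset \operatorname{supp} f \cap \operatorname{supp} g. \tag{9.2}$$

We shall say that $f$ has compact support if $\operatorname{supp} f$ is compact. Equation (9.1) [and Eq. (9.2) applied to constant $g$] shows that the set of all functions with compact support forms a vector space.

Let $T$ be any one-to-one transformation of $\mathbb{E}^n$ onto itself. For any function $f$ we denote by $Tf$ the function given by

$$(Tf)(x) = f(T^{-1}x). \tag{9.3}$$

Observe that if $T$ and $T^{-1}$ are continuous, then

$$\operatorname{supp} Tf = T\operatorname{supp} f. \tag{9.4}$$

Definition 9.2. Let $A$ be a subset of $\mathbb{E}^n$. By the characteristic function of $A$, denoted by $e_A$, we shall mean the function given by

$$e_A(x) = \begin{cases} 1 & \text{if } x \in A,\\ 0 & \text{if } x \notin A.\end{cases} \tag{9.5}$$

Note that

$$e_{A_1 \cap A_2} = e_{A_1} \cdot e_{A_2}, \tag{9.6}$$

$$e_{A_1 \cup A_2} = e_{A_1} + e_{A_2} - e_{A_1 \cap A_2}, \tag{9.7}$$

$$\operatorname{supp} e_A = \bar A, \tag{9.8}$$

and

$$Te_A = e_{TA} \tag{9.9}$$

for any one-to-one map $T$ of $\mathbb{E}^n$ onto itself.

By a theory of integration on $\mathbb{E}^n$ we shall mean a collection $\mathfrak{F}$ of functions and a rule $\int$ which assigns a real number $\int f$ to each $f \in \mathfrak{F}$, subject to the following axioms:

$\mathfrak{F}1$. $\mathfrak{F}$ is a vector subspace of the space of all bounded functions of compact support.

$\mathfrak{F}2$. If $f \in \mathfrak{F}$ and $T$ is a translation, then $Tf \in \mathfrak{F}$.

$\mathfrak{F}3$. $e_\square$ belongs to $\mathfrak{F}$ for any rectangle $\square$.

$\int 1$. $\int$ is a linear function on $\mathfrak{F}$.

$\int 2$. $\int Tf = \int f$ for any translation $T$.

$\int 3$. If $f \ge 0$, then $\int f \ge 0$.

$\int 4$. $\int e_{\square_{\mathbf 0}^{\mathbf 1}} = 1$.

Note that the axioms imply that $\mathfrak{F}$ contains all functions of the form $e_{\square_1} + e_{\square_2} + \cdots + e_{\square_k}$ for any rectangles $\square_1, \dots, \square_k$. In particular, for any paving $\mathfrak{p}$, the function $e_{|\mathfrak{p}|}$ must belong to $\mathfrak{F}$.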
Also note that from $\int 3$ we have at once the stronger version:

$\int 3'$. $f \le g \Rightarrow \int f \le \int g$, since then $g - f \ge 0$.

Proposition 9.1. Let $\mathfrak{F}$, $\int$ be a system satisfying Axioms $\mathfrak{F}1$ through $\mathfrak{F}3$ and $\int 1$ through $\int 4$. Then

$$\int e_A = \mu(A) \tag{9.10}$$

for every contented set $A$ such that $e_A \in \mathfrak{F}$, and

$\int 5$. $\bigl|\int f\bigr| \le \|f\|_\infty\,\bar\mu(\operatorname{supp} f)$ for every $f \in \mathfrak{F}$.

Proof. The axioms guarantee that $e_A \in \mathfrak{F}$ for every $A \in \mathfrak{D}_{\min}$ and that $\mu'(A) = \int e_A$ satisfies $\mu 1$ through $\mu 4$. Therefore, $\int e_A = \mu(A)$ for every $A \in \mathfrak{D}_{\min}$ by the uniqueness of $\mu$ (Proposition 4.1). It follows that if $A$ is a contented set such that $e_A \in \mathfrak{F}$, and if $\mathfrak{p}$ and $\mathfrak{s}$ are inner and outer pavings of $A$, then $\mu(|\mathfrak{p}|) = \int e_{|\mathfrak{p}|} \le \int e_A \le \int e_{|\mathfrak{s}|} = \mu(|\mathfrak{s}|)$. Therefore, $\int e_A$ lies between $\mu_*(A)$ and $\bar\mu(A)$, and so equals $\mu(A)$.

For any $f \in \mathfrak{F}$ and any $A \in \mathfrak{D}_{\min}$ such that $\operatorname{supp} f \subset A$, we have $-\|f\|_\infty e_A \le f \le \|f\|_\infty e_A$, and therefore $\bigl|\int f\bigr| \le \|f\|_\infty\mu(A)$ by $\int 3'$ and (9.10). Taking the greatest lower bound of the right side over all such sets $A$, we have $\int 5$. □

10. INTEGRATION OF CONTENTED FUNCTIONS

We will now proceed to deal with Axioms $\mathfrak{F}$ and $\int$ in the same way we dealt with Axioms $\mathfrak{D}$ and $\mu$. We will construct a "minimal" theory and then get a "big" one by approximating. According to Proposition 9.1, the class $\mathfrak{F}$ must contain the function $e_{|\mathfrak{p}|}$ for any paving $\mathfrak{p}$. By $\mathfrak{F}1$ it must therefore contain all linear combinations of such.

Definition 10.1. By a paved function we shall mean a function $f = f_{\mathfrak{p}}$ given by

$$f = \sum_{\square_i \in \mathfrak{p}} c_ie_{\square_i} \tag{10.1}$$

for some paving $\mathfrak{p}$.

It is easy to see that the collection of all paved functions satisfies Axioms $\mathfrak{F}1$ through $\mathfrak{F}3$. Furthermore, by Proposition 9.1 and Axiom $\int 1$ the integral, $\int$, is uniquely determined on the class of all paved functions by

$$\int f = \sum c_i\mu(\square_i) \tag{10.2}$$

if $f$ is given by (10.1).

The reader should verify that if we let $\mathfrak{F}_{\mathfrak{p}}$ be the class of all paved functions and let $\int$ be given by (10.2), then all our axioms are satisfied.
Don't forget to show that $\int$ is well defined: if $f$ is expressed as in (10.1) in two ways, then the sums given by (10.2) are equal.
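Formula (10.2) and the well-definedness remark can be tried on a small case. The following is a minimal sketch, assuming a paved function is stored as a list of triples `(a, b, c)` meaning "value `c` on the rectangle with corners `a` and `b`"; this representation and the names `content`, `integral`, and `evaluate` are illustrative choices of our own, not notation from the text.

```python
from math import prod

def content(a, b):
    """Natural value (4.1) of the rectangle with corners a and b."""
    return prod(max(bi - ai, 0) for ai, bi in zip(a, b))

def integral(paved):
    """Integral of a paved function by formula (10.2)."""
    return sum(c * content(a, b) for a, b, c in paved)

def evaluate(paved, x):
    """Value of the paved function (10.1) at the point x."""
    return sum(c for a, b, c in paved
               if all(ai <= xi < bi for ai, xi, bi in zip(a, x, b)))

# Two expressions of the same function: 3 on the rectangle from (0, 0) to (2, 1).
f1 = [((0, 0), (2, 1), 3)]
f2 = [((0, 0), (1, 1), 3), ((1, 0), (2, 1), 3)]
print(integral(f1), integral(f2))
```

Both calls give the same number, as the well-definedness claim requires; in general the equality follows by passing to a common refinement of the two pavings, as in Section 5.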
The paved functions obviously form too small a collection of functions. We would like to have an $\mathfrak{F}$ including all continuous functions with compact support and all characteristic functions of the form $e_A$ with $A$ contented, for example.

Definition 10.2. A bounded function $f$ with compact support is said to be contented if for any $\epsilon > 0$ and $\delta > 0$ there exists a paved function $g = g_{\epsilon,\delta}$ and a contented set $A = A_{\epsilon,\delta}$ such that

$$|f(x) - g(x)| < \epsilon \quad\text{for all } x \notin A \tag{10.3}$$

and

$$\mu(A) < \delta. \tag{10.4}$$

The pair $\langle g, A\rangle$ will be called a paved $\epsilon,\delta$-approximation to $f$.

Let us verify that the collection of all contented functions, $\mathfrak{F}_{\mathrm{con}}$, satisfies Axioms $\mathfrak{F}1$ through $\mathfrak{F}3$. It is clear that if $f$ is contented, so is $af$ for any constant $a$. If $f_1$ and $f_2$ are contented, let $\langle g_1, A_1\rangle$ and $\langle g_2, A_2\rangle$ be paved $\epsilon,\delta$-approximations to $f_1$ and $f_2$, respectively. Then

$$|(f_1 + f_2)(x) - (g_1 + g_2)(x)| < 2\epsilon \quad\text{for all } x \notin A_1 \cup A_2,$$

and $\mu(A_1 \cup A_2) < 2\delta$. Thus $\langle g_1 + g_2, A_1 \cup A_2\rangle$ gives a paved $2\epsilon, 2\delta$-approximation to $f_1 + f_2$.

To verify $\mathfrak{F}2$ we simply observe that if $\langle g, A\rangle$ is a paved $\epsilon,\delta$-approximation to $f$, then $\langle Tg, TA\rangle$ is one to $Tf$. A similar argument establishes the analogous result for multiplication:

Proposition 10.1. Let $f_1$ and $f_2$ be two contented functions. Then $f_1f_2$ is contented.

Proof. Let $M$ be such that $|f_1(x)| < M$ and $|f_2(x)| < M$ for all $x$. Recall that the product of two paved functions is a paved function. Using the same notation as before, we have

$$|f_1f_2(x) - g_1(x)g_2(x)| \le |f_1(x)|\,|f_2(x) - g_2(x)| + |g_2(x)|\,|f_1(x) - g_1(x)| \le M\epsilon + (M + \epsilon)\epsilon$$

for all $x \notin A_1 \cup A_2$. Thus $\langle g_1g_2, A_1 \cup A_2\rangle$ is a paved $(2M + \epsilon)\epsilon, 2\delta$-approximation to $f_1f_2$. □

As for $\mathfrak{F}3$, it is immediate that a stronger statement is true:

Proposition 10.2. If $B$ is a contented set, then $e_B$ is a contented function.

Proof. In fact, let $\mathfrak{p}$ be an inner paving of $B$ with $\mu(B) - \mu(|\mathfrak{p}|) < \delta$. Then $e_B(x) - e_{|\mathfrak{p}|}(x) = 0$ if $x \notin B - |\mathfrak{p}|$, and $\mu(B - |\mathfrak{p}|) < \delta$, so $\langle e_{|\mathfrak{p}|}, B - |\mathfrak{p}|\rangle$ is a paved $\epsilon,\delta$-approximation to $e_B$ for any $\epsilon > 0$. □
We now establish a useful alternative characterization of a contented function.

Proposition 10.3. A function $f$ is contented if and only if for every $\epsilon$ there are paved functions $h$ and $k$ such that $h \le f \le k$ and $\int(k - h) < \epsilon$.

Proof. If $f$ is contented, let $R$ be a rectangle including $\operatorname{supp} f$. Let $\langle g, A\rangle$ be an $\epsilon,\delta$-approximation to $f$. Let $P$ be a paved set including $A = A_{\epsilon,\delta}$ such that $\mu(P) < \delta$, and let $m$ be a bound of $|f|$. Then $g - \epsilon e_R - me_P \le f \le g + \epsilon e_R + me_P$, where the outside functions are clearly paved and the difference of their integrals is less than $2\epsilon\mu(R) + 2m\delta$. Since $\epsilon$ and $\delta$ are arbitrary, we have our $h$ and $k$. Conversely, if $h$ and $k$ are paved functions such that $h \le f \le k$ and $\int(k - h) < a$, then the set where $k - h \ge a^{1/2}$ is a paved set $A$. Furthermore, $a^{1/2}\mu(A) \le \int e_A(k - h) \le \int(k - h) < a$, so that $\mu(A) < a^{1/2}$. Given $\epsilon$ and $\delta$, we only have to choose $a \le \min(\epsilon^2, \delta^2)$ and take $g$ as either $k$ or $h$ to see that $f$ is contented. □

Corollary. A function $f$ is contented if for every $\epsilon$ there are contented functions $f_1$ and $f_2$ such that $f_1 \le f \le f_2$ and $\int(f_2 - f_1) < \epsilon$.

Proof. For then we can find paved functions $h \le f_1$ and $k \ge f_2$ such that $\int(f_1 - h) < \epsilon$ and $\int(k - f_2) < \epsilon$ and end up with $h \le f \le k$ and $\int(k - h) < 3\epsilon$. □

Theorem 10.1. Let $\mathfrak{F}$ be a class of functions satisfying Axioms $\mathfrak{F}1$ through $\mathfrak{F}3$ and such that $\mathfrak{F}_{\mathfrak{p}} \subset \mathfrak{F} \subset \mathfrak{F}_{\mathrm{con}}$. Then there exists a unique $\int$ satisfying Axioms $\int 1$ through $\int 4$ on $\mathfrak{F}$.

Proof. If $\int$ is any integral on $\mathfrak{F}$ satisfying Axioms $\int 1$ through $\int 4$, then we must have $\int f$ simultaneously equal to $\operatorname{lub} \int h$ for $h$ paved and $\le f$ and equal to $\operatorname{glb} \int k$ for $k$ paved and $\ge f$, by Proposition 10.3. The integral is thus uniquely determined on $\mathfrak{F}$. Moreover, it is easy to see that if the integral on $\mathfrak{F}$ is defined by $\int f = \operatorname{lub} \int h = \operatorname{glb} \int k$, then Axioms $\int 1$ through $\int 4$ follow from the fact that they hold for the uniquely determined integral on the paved functions. □

Exercise 10.1. Let $f$ and $g$ be contented functions such that $f(x) = g(x)$ for $x \notin A$, where $\mu(A) = 0$.
Then $\int f = \int g$. (This shows that for the purpose of integration we need to know a function only up to a set of content zero.)

Definition 10.3. Let $f$ be a contented function and $A$ a contented set. We call $\int e_Af$ the integral of $f$ over $A$ and denote it by $\int_A f$. Thus

$$\int_A f = \int e_Af. \tag{10.5}$$

An immediate consequence of Axiom $\int 1$ and (9.7) is

$$\int_{A_1 \cup A_2} f = \int_{A_1} f + \int_{A_2} f - \int_{A_1 \cap A_2} f. \tag{10.6}$$

An immediate consequence of Exercise 10.1 is

$$\Bigl|\int_A f\Bigr| \le \sup_{x \in A}|f(x)|\,\mu(A).$$
We close this section by giving another useful characterization of contented functions.

Proposition 10.4. Let $f$ be a bounded function with compact support. Then $f$ is contented if and only if to every $\epsilon > 0$ and $\delta > 0$ we can find an $\eta > 0$ and a contented set $A_\delta$ such that $\mu(A_\delta) < \delta$ and

$$|f(x) - f(y)| < \epsilon \quad\text{whenever } \|x - y\| < \eta \text{ and } x, y \notin A_\delta. \tag{10.7}$$

Proof. Suppose that for every $\epsilon, \delta$ we can find $\eta$ and $A_\delta$. Let $\mathfrak{p} = \{\square_i\}$ be a paving such that

i) $\operatorname{supp} f \subset |\mathfrak{p}|$;

ii) if $x, y \in \square_i$, then $\|x - y\| < \eta$;

iii) if $\mathfrak{s} = \{\square_i \in \mathfrak{p} : \square_i \cap A_\delta \ne \emptyset\}$, then $\mu(|\mathfrak{s}|) < 2\delta$.

Then let $f_{\epsilon,2\delta}(x) = f(x_i)$ when $x \in \square_i$, where $x_i$ is some point of $\square_i$. By (ii) and (iii), we see that $\langle f_{\epsilon,2\delta}, |\mathfrak{s}|\rangle$ is a paved $\epsilon, 2\delta$-approximation to $f$. Thus $f$ is contented.

Conversely, suppose that $f$ is contented, and let $\langle f_{\epsilon/2,\delta/2}, A_{\epsilon/2,\delta/2}\rangle$ be a paved approximation to $f$. Let $\mathfrak{p} = \{\square_i\}$ be the paving associated with $f_{\epsilon/2,\delta/2}$. Replace each $\square_i$ by the rectangle $\square_i'$ obtained by contracting $\square_i$ about its center by a factor $(1 - \theta)$. (See Fig. 8.10.) Thus $\mu(\square_i') = (1 - \theta)^n\mu(\square_i)$. For any $x, y \in \bigcup\square_i'$, if $\|x - y\| < \eta$, where $\eta$ is sufficiently small, then $x$ and $y$ belong to the same $\square_i$. If $x, y \in \bigcup\square_i'$, $x, y \notin A_{\epsilon/2,\delta/2}$, and $\|x - y\| < \eta$, then

$$|f(x) - f(y)| \le |f(x) - f_{\epsilon/2,\delta/2}(x)| + |f(y) - f_{\epsilon/2,\delta/2}(y)| + |f_{\epsilon/2,\delta/2}(x) - f_{\epsilon/2,\delta/2}(y)|.$$

But the third term vanishes, so that $|f(x) - f(y)| < \epsilon$. Now by first choosing $\theta$ sufficiently small, we can arrange that $\mu(|\mathfrak{p}| - \bigcup\square_i') < \delta/2$. Then we can choose $\eta$ so small that $\|x - y\| < \eta$ implies that $x, y$ belong to the same $\square_i$ if $x, y \in \bigcup\square_i'$. For this $\eta$ and for $A_\delta = A_{\epsilon/2,\delta/2} \cup (|\mathfrak{p}| - \bigcup\square_i')$, Eq. (10.7) holds, and $\mu(A_\delta) < \delta$. □

In particular, a bounded function which is continuous except at a set of content zero and has compact support is contented.
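The criterion of Proposition 10.3 can be seen at work on a simple example of our own choosing: $f(x) = x^2$ on $[0, 1)$ and $0$ elsewhere, which is continuous except at a single point. The sketch below is an illustration only, with a hypothetical function name `sandwich`. It builds the paved functions $h$ and $k$ that take the smallest and largest value of $x^2$ on each of $m$ equal subintervals, so that $h \le f \le k$, and returns their integrals computed by (10.2).

```python
from fractions import Fraction

def sandwich(m):
    """Integrals of paved functions h <= f <= k for f(x) = x**2 on [0, 1).

    On the interval from i/m to (i+1)/m the function x**2 is increasing, so
    h takes the value (i/m)**2 and k takes the value ((i+1)/m)**2 there.
    """
    width = Fraction(1, m)
    lower = sum(Fraction(i, m) ** 2 * width for i in range(m))
    upper = sum(Fraction(i + 1, m) ** 2 * width for i in range(m))
    return lower, upper

for m in (10, 100, 1000):
    lo, hi = sandwich(m)
    print(m, float(lo), float(hi), float(hi - lo))
```

The gap $\int(k - h)$ equals $1/m$ exactly, so it can be made smaller than any $\epsilon$, and both integrals close in on $1/3$, which is then the value forced on $\int f$ by Theorem 10.1.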
EXERCISES

10.2 Show that for any bounded set $A$, $e_A$ is a contented function if and only if $A$ is a contented set.

10.3 Let $f$ be a contented function whose support is contained in a cube $\square$. For each $\delta$ let $p_\delta = \{\square_{i,\delta}\}_{i \in I_\delta}$ be a paving with $|p_\delta| = \square$ and whose cubes have diameter less than $\delta$. Let $x_{i,\delta}$ be some point of $\square_{i,\delta}$. The expression
$$\sum_{i \in I_\delta} f(x_{i,\delta}) \, \mu(\square_{i,\delta})$$
is called a Riemann $\delta$-approximating sum for $f$. Show that for any $\varepsilon > 0$ there exists a $\delta_0$ [$= \delta_0(f)$] $> 0$ such that
$$\left| \int f - \sum_{i \in I_\delta} f(x_{i,\delta}) \, \mu(\square_{i,\delta}) \right| < \varepsilon$$
whenever $\delta < \delta_0$.

11. THE CHANGE OF VARIABLES FORMULA

This section will be devoted to the proof of the following theorem, which is of fundamental importance.

Theorem 11.1. Let $U$ and $V$ be bounded open sets in $\mathbb{R}^n$, and let $\varphi$ be a continuously differentiable one-to-one map of $U$ onto $V$ with $\varphi^{-1}$ differentiable. Let $f$ be a contented function with $\operatorname{supp} f \subset V$. Then $(f \circ \varphi)$ is a contented function, and
$$\int_V f = \int_U (f \circ \varphi) \, |\det J_\varphi|. \tag{11.1}$$

Recall that if the map $\varphi$ is given by $y^i = \varphi^i(x^1, \ldots, x^n)$, then $J_\varphi$ is the linear transformation whose matrix is $[\partial \varphi^i / \partial x^j]$.

Note that if $\varphi$ is a nonsingular linear transformation (so that $J_\varphi$ is just $\varphi$), then Theorem 11.1 is an easy consequence of Theorem 8.1. In fact, for functions of the form $e_A$ we observe that $e_A \circ \varphi = e_{\varphi^{-1}A}$, and Eq. (11.1) reduces, in this case, to (8.1). By linearity, (11.1) is valid for all paved functions. Furthermore, $f \circ \varphi$ is contented. Suppose $|f(x) - f(y)| < \varepsilon$ when $\|x - y\| < \eta$ and $x, y \notin A$, with $\mu(A) < \delta$. Then $|f \circ \varphi(u) - f \circ \varphi(v)| < \varepsilon$ when $\|u - v\| < \eta/M$, where $M = \|\varphi\|$, and $u, v \notin \varphi^{-1}(A)$, with $\mu(\varphi^{-1}A) < \delta/|\det \varphi|$. Now let $g_{\varepsilon,\delta}$, $A_{\varepsilon,\delta}$ be an approximating family of paved functions for $f$. Then $|f \circ \varphi(x) - g_{\varepsilon,\delta} \circ \varphi(x)| < \varepsilon$ for $x \notin \varphi^{-1}(A_{\varepsilon,\delta})$ and $\mu(\varphi^{-1}A_{\varepsilon,\delta}) < \delta/|\det \varphi|$. Thus $\int (g_{\varepsilon,\delta} \circ \varphi)|\det \varphi| \to \int (f \circ \varphi)|\det \varphi|$, and Eq. (11.1) is valid for all contented $f$.

The proof of Theorem 11.1 for nonlinear maps is a bit more tricky. It consists essentially of approximating $\varphi$ locally by linear maps, and we shall do it in several steps.
We shall use the uniform norm $\|x\|_\infty = \max_i |x^i|$ on $\mathbb{R}^n$. This is convenient because a ball in this norm is actually a cube, although this nicety isn't really necessary.
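Let $\psi$ be a (continuously) differentiable map defined on a convex open set $U$. If the cube $\square = \square_p^r$ lies in $U$, then the mean-value theorem (Section 7, Chapter 3) implies that for any $y \in \square$,
$$\|\psi(y) - \psi(p)\|_\infty \le \|y - p\|_\infty \sup_{z \in \square} \|J_\psi(z)\|.$$
Thus $\psi(\square) \subset \square_{\psi(p)}^{Kr}$, where $K = \sup_{z \in \square} \|J_\psi(z)\|$. Thus
$$\mu(\psi(\square)) \le \Big( \sup_{z \in \square} \|J_\psi(z)\| \Big)^n \mu(\square). \tag{11.2}$$

Lemma 11.1. Let $\varphi$ be as in Theorem 11.1. Then for any contented set $A$ with $\bar{A} \subset U$ we have
$$\mu(\varphi(A)) \le \int_A |\det J_\varphi|. \tag{11.3}$$

Proof. Let us apply Eq. (11.2) to the map $\psi = L^{-1}\varphi$, where $L$ is a linear transformation. Then
$$|\det L|^{-1} \mu(\varphi(\square)) = \mu(L^{-1}\varphi(\square)) \le \Big( \sup_{z \in \square} \|J_{L^{-1}\varphi}(z)\| \Big)^n \mu(\square).$$
Since $J_{L^{-1}\varphi} = L^{-1} J_\varphi$, we get
$$\mu(\varphi(\square)) \le |\det L| \Big( \sup_{z \in \square} \|L^{-1} J_\varphi(z)\| \Big)^n \mu(\square) \tag{11.4}$$
for any $\square$ contained in the domain of definition of $\varphi$ and for any linear transformation $L$.

For any $\varepsilon > 0$, let $\delta$ be so small that $\|J_\varphi(x)^{-1} J_\varphi(y)\| < 1 + \varepsilon$ for $\|x - y\|_\infty < \delta$ for all $x, y$ in a compact neighborhood of $\bar{A}$. (It is possible to choose such a $\delta$, since $J_\varphi(x)$ is a uniformly continuous function of $x$, so that $J_\varphi(x)^{-1} J_\varphi(y)$ is close to the identity matrix when $x$ is close to $y$; see Section 8, Chapter 4.) Choose an outer paving $s = \{\square_i\}$ of $A$, where the $\square_i$ are cubes all having edges of length less than $\delta$. Let $x_i$ be a point of $\square_i$. Then applying (11.4) to each $\square_i$, taking $L = J_\varphi(x_i)$, we get
$$\mu(\varphi(A)) \le \mu\Big(\varphi\Big(\bigcup \square_i\Big)\Big) \le \sum \mu(\varphi(\square_i)) \le \sum |\det J_\varphi(x_i)| (1 + \varepsilon)^n \mu(\square_i).$$
We can also suppose $\delta$ to have been taken small enough so that $|\det J_\varphi(z)| > (1 - \varepsilon)|\det J_\varphi(x_i)|$ for all $z \in \square_i$ and all $i$. Then we have
$$\int_{\square_i} |\det J_\varphi| \ge (1 - \varepsilon) |\det J_\varphi(x_i)| \, \mu(\square_i),$$
and so
$$\mu(\varphi(A)) \le \frac{(1 + \varepsilon)^n}{1 - \varepsilon} \int_{|s|} |\det J_\varphi|.$$
Since $\varepsilon$ is arbitrary and $s$ is an arbitrary outer paving of $A$, we get (11.3). □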
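We can now conclude that $f \circ \varphi$ is contented for any contented $f$ with $\operatorname{supp} f \subset V$. In fact, let $K$ be chosen so large that it is a Lipschitz constant for $\varphi$ on $\varphi^{-1}(\operatorname{supp} f)$, and so large that $K > |\det J_{\varphi^{-1}}(u)|$ for $u \in \operatorname{supp} f$. Now given $\varepsilon$ and $\delta$, we can find an $\eta$ such that $|f(u) - f(v)| < \varepsilon$ if $\|u - v\| < \eta$ and $u, v \notin A_\delta$ with $\mu(A_\delta) < \delta$. But this implies that
$$|f \circ \varphi(x) - f \circ \varphi(y)| < \varepsilon \quad \text{if } \|x - y\| < \eta/K \text{ and } x, y \notin \varphi^{-1}(A_\delta),$$
where $\mu(\varphi^{-1}(A_\delta)) < K\delta$, by (11.3). Since $K$ was chosen independently of $\varepsilon$ and $\delta$, this shows that $f \circ \varphi$ is contented.

Lemma 11.2. Let $\varphi$, $U$, and $V$ be as in Theorem 11.1. Let $f$ be a nonnegative contented function with $\operatorname{supp} f \subset V$. Then
$$\int_V f \le \int_U (f \circ \varphi) \, |\det J_\varphi|. \tag{11.5}$$

Proof. Let $\langle g, A \rangle$ be a paved $\varepsilon, \delta$-approximation to $f$ with $g(u) \le f(u)$ for all $u$. If $p = \{\square_i\}$ is the paving associated with $g$, we may assume that $\operatorname{supp} f \subset |p|$. Writing $g(\square_i)$ for the value of $g$ on $\square_i$, we then have
$$\int g = \sum g(\square_i) \, \mu(\square_i) \le \sum g(\square_i) \int_{\varphi^{-1}(\square_i)} |\det J_\varphi| \le \sum \int_{\varphi^{-1}(\square_i)} (f \circ \varphi) |\det J_\varphi| = \int_{\bigcup \varphi^{-1}(\square_i)} (f \circ \varphi) |\det J_\varphi| = \int (f \circ \varphi) |\det J_\varphi|.$$
Since we can choose $g$ so that $\int g \to \int f$, we obtain (11.5). □

Lemma 11.3. Let $\varphi$, $U$, $V$, and $f$ be as in Theorem 11.1. Let $f$ be a nonnegative function. Then Eq. (11.1) holds.

Proof. Let us apply (11.5) to the map $\varphi^{-1}$ and the function $(f \circ \varphi)|\det J_\varphi|$. Since $J_\varphi(x) \circ J_{\varphi^{-1}}(\varphi(x)) = \mathrm{id}$, we obtain
$$\int_U (f \circ \varphi) |\det J_\varphi| \le \int_V [(f \circ \varphi) \circ \varphi^{-1}] \, (|\det J_\varphi| \circ \varphi^{-1}) \, |\det J_{\varphi^{-1}}| = \int_V f.$$
Combining this with (11.5) proves the lemma. □

Completion of the proof of Theorem 11.1. Any real-valued contented function can be written as the difference of two nonnegative contented functions. If for all $x$, $f(x) > -M$ for some large $M$, we write $f = (f + M e_\square) - M e_\square$, where $\operatorname{supp} f \subset \square$. Since we have verified Eq. (11.1) for nonnegative functions, and since both sides of (11.1) are linear in $f$, we are done. Similarly, any bounded complex-valued contented function $f$ can be written as $f = f_1 + i f_2$, where $f_1$ and $f_2$ are bounded real-valued contented functions. □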
In practice, we sometimes may apply Eq. (11.1) to a situation where the hypotheses of Theorem 11.1 are not, strictly speaking, verified. For instance, in $\mathbb{R}^2$ we may want to introduce "polar coordinates". That is, we let $r, \theta$ be coordinates on $\mathbb{R}^2$; if $S$ is the set $0 \le \theta < 2\pi$, $0 \le r$, we consider the map $\varphi : S \to \mathbb{R}^2$ given by
$$x = r \cos \theta, \qquad y = r \sin \theta,$$
where $x, y$ are coordinates on a second copy of $\mathbb{R}^2$. Now this map is one-to-one and has positive Jacobian for $r > 0$. If we consider the open sets $U \subset S$ given by $0 < r$, $0 < \theta < 2\pi$ and $V \subset \mathbb{R}^2$ given by $V = \mathbb{R}^2 - \{\langle x, y \rangle : y = 0, x \ge 0\}$, the hypotheses of Theorem 11.1 are fulfilled, and we can write (since $\det J_\varphi = r$)
$$\int_V f = \int_U (f \circ \varphi) \, r \tag{11.6}$$
if $\operatorname{supp} f \subset V$.

However, Eq. (11.6) is valid without the restriction $\operatorname{supp} f \subset V$. In fact, if $D_\varepsilon$ is a strip of width $\varepsilon$ about the ray $y = 0$, $x \ge 0$, then $f = f e_{D_\varepsilon} + f e_{\mathbb{R}^2 - D_\varepsilon}$ and $\int f e_{D_\varepsilon} \to 0$ as $\varepsilon \to 0$ (Fig. 8.11). Similarly, $\int (f \circ \varphi) \, r \, (e_{D_\varepsilon} \circ \varphi) \to 0$, so that (11.6) is valid for all contented $f$ by this simple limit argument.

Fig. 8.11 (the strip $D_\varepsilon$ and its preimage $\varphi^{-1}(D_\varepsilon)$)

We will not state a general theorem covering all such useful extensions of Theorem 11.1. In each case the limit argument is usually quite straightforward and will be left to the reader.

EXERCISES

11.1 By the parallelepiped spanned by $v_1, \ldots, v_n$ we mean the set of all $x = \xi^1 v_1 + \cdots + \xi^n v_n$, where $0 \le \xi^i \le 1$. Show that the content of this parallelepiped is given by $|\det((v_i, v_j))|^{1/2}$.

11.2 Express the content of the ellipsoid
$$\Big\{ x : \sum_i \frac{(x^i)^2}{(a^i)^2} \le 1 \Big\}$$
in terms of the content of the unit ball.

11.3 Compute the Jacobian determinant of the map $\langle r, \theta \rangle \mapsto \langle x, y \rangle$, where $x = r \cos \theta$, $y = r \sin \theta$.

11.4 Compute the Jacobian determinant of the map $\langle r, \theta, \varphi \rangle \mapsto \langle x, y, z \rangle$, where $x = r \cos \varphi \sin \theta$, $y = r \sin \varphi \sin \theta$, $z = r \cos \theta$.

11.5 Compute the Jacobian determinant of the map $\langle r, \theta, z \rangle \mapsto \langle x, y, z \rangle$, where $x = r \cos \theta$, $y = r \sin \theta$, $z = z$.
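The polar-coordinate form (11.6) of the change of variables formula can be checked numerically. The following is only a sketch under stated assumptions: the test function $f(x, y) = x^2 + y^2$ cut off to the unit disk, the midpoint Riemann sums, and the grid size are illustrative choices, not part of the text. Both sums should approach $\pi/2$.

```python
import math

def disk_integral_cartesian(f, n=300):
    """Midpoint Riemann sum of f over the unit disk, using an n-by-n grid of squares."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:  # keep only squares whose center lies in the disk
                total += f(x, y) * h * h
    return total

def disk_integral_polar(f, n=300):
    """Midpoint Riemann sum of (f o phi) * r over 0 < r < 1, 0 < theta < 2*pi."""
    dr = 1.0 / n
    dt = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            t = (j + 0.5) * dt
            # the factor r is the Jacobian determinant of the polar map
            total += f(r * math.cos(t), r * math.sin(t)) * r * dr * dt
    return total

def f(x, y):
    # illustrative integrand (an assumption of this sketch)
    return x * x + y * y

cart = disk_integral_cartesian(f)
polar = disk_integral_polar(f)
print(cart, polar, math.pi / 2)
```

The Cartesian sum is less accurate because the squares only approximate the disk near its boundary, whereas in the $\langle r, \theta \rangle$-plane the domain is an exact rectangle.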
12. SUCCESSIVE INTEGRATION

In the case of one variable, i.e., the theory of integration on $\mathbb{R}^1$, the fundamental theorem of the calculus reduces the computation of the integral of a function to the computation of its antiderivative. The generalization of this theorem to $n$ dimensions will be presented in a later chapter. In this section we will show how, in many cases, the computation of an $n$-dimensional integral can be reduced to $n$ successive one-dimensional integrations.

Suppose we regard $\mathbb{R}^n$, in some fixed way, as the direct product $\mathbb{R}^n = \mathbb{R}^k \times \mathbb{R}^l$. We shall write every $z \in \mathbb{R}^n$ as $z = \langle x, y \rangle$, where $x \in \mathbb{R}^k$ and $y \in \mathbb{R}^l$.

Definition 12.1. We say that a contented function $f$ is contented relative to the decomposition $\mathbb{R}^n = \mathbb{R}^k \times \mathbb{R}^l$ if there exists a set $A_f \subset \mathbb{R}^k$ of content zero (in $\mathbb{R}^k$) such that
i) for each fixed $x \in \mathbb{R}^k$, $x \notin A_f$, the function $f(x, \cdot)$ is a contented function on $\mathbb{R}^l$;
ii) the function $\int_{\mathbb{R}^l} f$ which assigns to $x$ the number $\int_{\mathbb{R}^l} f(x, \cdot)$ is a contented function on $\mathbb{R}^k$.

It is easy to see that the set of all such functions satisfies Axioms $\mathfrak{F}1$ through $\mathfrak{F}3$. (The only axiom that is not immediate is $\mathfrak{F}2$. But this is an easy consequence of the fact that any translation $T$ can be rewritten as $T_1 T_2$, where $T_1$ is a translation in $\mathbb{R}^k$ and $T_2$ is a translation in $\mathbb{R}^l$.) It is equally easy to verify that the rule which assigns to any such $f$ the number
$$\int_{\mathbb{R}^k} \Big( \int_{\mathbb{R}^l} f \Big)$$
satisfies Axioms $\int 1$ through $\int 4$. The only one which isn't immediately obvious is $\int 3$. However, if $p$ is any paving with $\operatorname{supp} f \subset |p|$, then $|f| \le \|f\| \, e_{|p|}$, and for any rectangle $\square$,
$$\int_{\mathbb{R}^k} \Big( \int_{\mathbb{R}^l} e_\square \Big) = \mu(\square)$$
(direct verification). Thus, by the uniqueness part of Theorem 10.1, we have
$$\int_{\mathbb{R}^n} f = \int_{\mathbb{R}^k} \Big( \int_{\mathbb{R}^l} f(x, \cdot) \Big). \tag{12.1}$$
Note, in particular, that if $f$ is also contented relative to the decomposition $\mathbb{R}^n = \mathbb{R}^l \times \mathbb{R}^k$, then
$$\int_{\mathbb{R}^k} \Big( \int_{\mathbb{R}^l} f(x, \cdot) \Big) = \int_{\mathbb{R}^l} \Big( \int_{\mathbb{R}^k} f(\cdot, y) \Big).$$
In particular, for such $f$ the double integration is independent of the order.
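The independence of the order of integration can be seen in a small numerical experiment. This is a sketch only: the integrand $f(x, y) = x y^2$ on the rectangle $[0, 1] \times [0, 2]$ and the helper `midpoint` are assumptions introduced for illustration. Both iterated sums should approach the exact value $\tfrac{1}{2} \cdot \tfrac{8}{3} = \tfrac{4}{3}$.

```python
def midpoint(g, a, b, n=200):
    """Midpoint-rule approximation of the one-dimensional integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def f(x, y):
    # illustrative integrand (an assumption of this sketch)
    return x * y * y

def dy_then_dx():
    # integrate over y in [0, 2] for each fixed x, then over x in [0, 1]
    return midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, 2.0), 0.0, 1.0)

def dx_then_dy():
    # integrate over x in [0, 1] for each fixed y, then over y in [0, 2]
    return midpoint(lambda y: midpoint(lambda x: f(x, y), 0.0, 1.0), 0.0, 2.0)

print(dy_then_dx(), dx_then_dy(), 4.0 / 3.0)
```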
In practice, all the functions that we shall come across will be contented relative to any decomposition of $\mathbb{R}^n$. In particular, writing $\mathbb{R}^n = \mathbb{R}^1 \times \cdots \times \mathbb{R}^1$, we have
$$\int_{\mathbb{R}^n} f = \int_{\mathbb{R}^1} \Big( \cdots \Big( \int_{\mathbb{R}^1} f \Big) \cdots \Big). \tag{12.2}$$
In terms of the rectangular coordinates $x^1, \ldots, x^n$, this last expression is usually written as
$$\int \Big( \cdots \Big( \int \Big( \int f(x^1, \ldots, x^n) \, dx^1 \Big) dx^2 \Big) \cdots \Big) dx^n.$$
For this reason, the expression on the left-hand side of (12.2) is frequently written as
$$\int_{\mathbb{R}^n} f = \int \cdots \int f(x^1, \ldots, x^n) \, dx^1 \cdots dx^n.$$

Let us work out some simple examples illustrating the methods of integration given in the previous sections.

Example 1. Compute the volume of the intersection of the solid cone with vertex angle $\alpha$ (vertex at 0) with the spherical shell $1 \le r \le 2$ (Fig. 8.12). By a Euclidean motion we may assume that the axis of the cone is the $z$-axis. If we introduce polar coordinates, we see that the set in question is the image of the set
$$1 \le r \le 2, \qquad 0 \le \varphi \le 2\pi, \qquad 0 \le \theta \le \alpha/2$$
in the $\langle r, \theta, \varphi \rangle$-space (Fig. 8.13).

Fig. 8.12   Fig. 8.13

By the change of variables formula and Exercise 11.4 we see that the volume in question is given by
$$\int r^2 \sin \theta = \int_1^2 \int_0^{2\pi} \int_0^{\alpha/2} r^2 \sin \theta \, d\theta \, d\varphi \, dr = 2\pi \int_1^2 \int_0^{\alpha/2} r^2 \sin \theta \, d\theta \, dr = 2\pi \int_1^2 [1 - \cos(\alpha/2)] \, r^2 \, dr = 2\pi [1 - \cos(\alpha/2)] \left( \tfrac{8}{3} - \tfrac{1}{3} \right).$$
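The closed form obtained in Example 1 can be compared with a direct numerical evaluation of the iterated integral. The sketch below assumes a particular vertex angle ($\alpha = \pi/2$) and uses a midpoint rule; the function names are hypothetical. Because the integrand $r^2 \sin\theta$ does not depend on $\varphi$ and factors into a function of $r$ times a function of $\theta$, the triple integral is evaluated as $2\pi$ times a product of two one-dimensional integrals.

```python
import math

def midpoint(g, a, b, n=200):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def cone_shell_volume(alpha, n=200):
    """Integral of r^2 sin(theta) over 1 <= r <= 2, 0 <= phi <= 2*pi, 0 <= theta <= alpha/2."""
    theta_part = midpoint(math.sin, 0.0, alpha / 2.0, n)
    r_part = midpoint(lambda r: r * r, 1.0, 2.0, n)
    return 2.0 * math.pi * theta_part * r_part  # the phi-integration contributes 2*pi

def closed_form(alpha):
    """The value 2*pi*[1 - cos(alpha/2)]*(8/3 - 1/3) from Example 1."""
    return 2.0 * math.pi * (1.0 - math.cos(alpha / 2.0)) * (8.0 / 3.0 - 1.0 / 3.0)

alpha = math.pi / 2.0  # illustrative vertex angle
print(cone_shell_volume(alpha), closed_form(alpha))
```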
Example 2. Let $B$ be a contented set in the plane, and let $f_1$ and $f_2$ be two contented functions defined on $B$. Let $A$ be the set of all $\langle x, y, z \rangle \in \mathbb{E}^3$ such that $\langle x, y \rangle \in B$ and $f_1(x, y) \le z \le f_2(x, y)$. If $G$ is any contented function on $A$, we can express the integral $\int_A G$ as
$$\int_A G = \int_B \Big( \int_{f_1(x,y)}^{f_2(x,y)} G(x, y, z) \, dz \Big) dx \, dy.$$
For example, compute the integral $\int_A z$, where $A$ is the set of all points in the unit ball lying above the surface $z = x^2 + y^2$ (Fig. 8.14).

Fig. 8.14

Thus
$$A = \{\langle x, y, z \rangle : x^2 + y^2 + z^2 \le 1, \ z \ge x^2 + y^2\}.$$
We must have $x^2 + y^2 \le a$, where $a^2 + a = 1$ [so that $a = (\sqrt{5} - 1)/2$], in order for $\langle x, y, z \rangle$ to belong to $A$. Then $f_1(x, y) = x^2 + y^2$, $f_2(x, y) = \sqrt{1 - (x^2 + y^2)}$, and
$$\int_{f_1(x,y)}^{f_2(x,y)} z \, dz = \tfrac{1}{2} \big[ 1 - (x^2 + y^2) - (x^2 + y^2)^2 \big],$$
so that, using polar coordinates in the plane (and Exercise 11.3),
$$\int_A z = \tfrac{1}{2} \int_{x^2 + y^2 \le a} \big[ 1 - (x^2 + y^2) - (x^2 + y^2)^2 \big] = \pi \int_0^{\sqrt{a}} (1 - r^2 - r^4) \, r \, dr.$$

As we saw in the last example, part of the problem of computing an integral as an iterated integral is to determine a good description of the domain of integration in terms of the decomposition of the vector space. It is usually a great help in visualizing the situation to draw a figure.

Example 3. Compute the volume enclosed by a surface of revolution. Here we are given a nonnegative function $f$ of one variable, and we consider the surface obtained by rotating the curve $x = f(z)$, $z_1 \le z \le z_2$, around the $z$-axis (Fig. 8.15). We thus wish to compute $\mu(A)$, where
$$A = \{\langle x, y, z \rangle : x^2 + y^2 \le f(z)^2, \ z_1 \le z \le z_2\}.$$
Here it is obviously convenient to use cylindrical coordinates, and we see that $A$ is the image of the set
$$B = \{\langle r, \theta, z \rangle : r \le f(z), \ 0 \le \theta < 2\pi, \ z_1 \le z \le z_2\}$$
in the $\langle r, \theta, z \rangle$-space. By Exercise 11.5, we wish to compute
$$\int_B r = \int_0^{2\pi} \int_{z_1}^{z_2} \int_0^{f(z)} r \, dr \, dz \, d\theta = 2\pi \int_{z_1}^{z_2} \Big( \int_0^{f(z)} r \, dr \Big) dz = 2\pi \int_{z_1}^{z_2} \frac{f(z)^2}{2} \, dz.$$
Thus
$$\mu(A) = \pi \int_{z_1}^{z_2} f(z)^2 \, dz.$$
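Both reductions above end in one-dimensional integrals, which are easy to check numerically. The following sketch is illustrative only: it applies the formula of Example 3 to the unit sphere, taking $f(z) = \sqrt{1 - z^2}$ on $[-1, 1]$ (an assumed test case, with expected volume $4\pi/3$), and evaluates the final radial integral of Example 2. The comparison value $\pi(5\sqrt{5} - 7)/24$ is not in the text; it comes from integrating the polynomial $r - r^3 - r^5$ and using $a^2 = 1 - a$.

```python
import math

def midpoint(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def revolution_volume(f, z1, z2):
    """Volume enclosed by rotating x = f(z), z1 <= z <= z2, about the z-axis (Example 3)."""
    return math.pi * midpoint(lambda z: f(z) ** 2, z1, z2)

# Unit sphere as a surface of revolution (illustrative choice of f).
sphere = revolution_volume(lambda z: math.sqrt(1.0 - z * z), -1.0, 1.0)

# Example 2: pi times the integral of (1 - r^2 - r^4) r over [0, sqrt(a)], where a^2 + a = 1.
a = (math.sqrt(5.0) - 1.0) / 2.0
example2 = math.pi * midpoint(lambda r: (1.0 - r ** 2 - r ** 4) * r, 0.0, math.sqrt(a))
example2_closed = math.pi * (5.0 * math.sqrt(5.0) - 7.0) / 24.0

print(sphere, 4.0 * math.pi / 3.0)
print(example2, example2_closed)
```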
Fig. 8.15

EXERCISES

12.1 Compute the volume of the region between the surfaces $z = x^2 + y^2$ and $z = x + y$.

12.2 Find the volume of the region in $\mathbb{E}^3$ bounded by the plane $z = 0$, the cylinder $x^2 + y^2 = 2x$, and the cone $z = +\sqrt{x^2 + y^2}$.

12.3 Compute $\int_A (x^2 + y^2)^2 \, dx \, dy \, dz$, where $A$ is the region bounded by the plane $z = 2$ and the surface $x^2 + y^2 = 2z$.

12.4 Compute $\int_A x$, where $A = \{\langle x, y, z \rangle : x^2 + y^2 + z^2 \le a^2, \ x \ge 0, \ y \ge 0, \ z \ge 0\}$.

12.5 Compute
$$\int_A \Big( \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \Big)^{1/2},$$
where $A$ is the region bounded by the ellipsoid
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1.$$

Let $\rho$ be a nonnegative function (to be called the density of mass in the following discussion) defined on a domain $D$ in $\mathbb{E}^3$. The total mass of $\langle D, \rho \rangle$ is defined as
$$M = \int_D \rho(x) \, dx.$$
If $M \ne 0$, the center of gravity of $\langle D, \rho \rangle$ is the point $C = \langle C_1, C_2, C_3 \rangle$, where
$$C_1 = \frac{1}{M} \int_D x_1 \rho(x) \, dx, \qquad C_2 = \frac{1}{M} \int_D x_2 \rho(x) \, dx, \qquad C_3 = \frac{1}{M} \int_D x_3 \rho(x) \, dx.$$
12.6 A homogeneous solid (where $\rho$ is constant) is given by $x_1 \ge 0$, $x_2 \ge 0$, $x_3 \ge 0$, and
$$\frac{x_1^2}{a^2} + \frac{x_2^2}{b^2} + \frac{x_3^2}{c^2} \le 1.$$
Find its center of gravity.

12.7 The unit cube has density $\rho(x) = x_1 x_3$. Find its total mass and its center of gravity.

12.8 Find the center of mass of the homogeneous body bounded by the surfaces $x^2 + y^2 + z^2 = a^2$ and $x^2 + y^2 = ax$.

The notion of center of mass can, of course, be defined for a region in a Euclidean space of any dimension. Thus, for a region $D$ in the plane with density $\rho$, the center of mass will be the point $\langle x_0, y_0 \rangle$, where
$$x_0 = \frac{\int_D x \rho}{\int_D \rho} \qquad \text{and} \qquad y_0 = \frac{\int_D y \rho}{\int_D \rho}.$$

12.9 Let $D$ be a region in the $xz$-plane which lies entirely in the half-plane $x > 0$. Let $A$ be the solid in $\mathbb{E}^3$ obtained by rotating $D$ about the $z$-axis. Show that $\mu(A) = 2\pi d \, \mu(D)$, where $d$ is the distance of the center of mass of the region $D$ (with uniform density) from the $z$-axis. (Use cylindrical coordinates.) This is known as Guldin's rule.

Observe that in the definition of center of gravity we obtain a vector (i.e., a point in $\mathbb{E}^3$) as the answer by integrating each of its coordinates. This suggests the following definition: Let $V$ be a finite-dimensional vector space, and let $e_1, \ldots, e_k$ be a basis for $V$. Call a map $f$ from $\mathbb{E}^n$ to $V$ ($f$ is a vector-valued function on $\mathbb{E}^n$ with values in $V$) contented if when we write $f(x) = \sum f_i(x) e_i$, each of the (real-valued) functions $f_i$ is contented. Define the integral of $f$ over $D$ by
$$\int_D f = \sum \Big( \int_D f_i \Big) e_i.$$

12.10 Show that the condition that a function be contented and the value of its integral are independent of the choice of basis $e_1, \ldots, e_k$.

Let $\xi$ be a point not in the closed domain $D$, which has a mass distribution $\rho$. The gravitational force on a particle of unit mass situated at $\xi$ is defined to be the vector
$$\int_D \frac{\rho(x)(x - \xi)}{\|x - \xi\|^3} \, dx$$
(here $x - \xi$ is an $\mathbb{E}^3$-valued function on $\mathbb{E}^3$).

Fig. 8.16

12.11 Let $D$ be the spherical shell bounded by two concentric spheres $S_1$ and $S_2$ (Fig. 8.16), with center at the origin.
Let $\rho$ be a mass distribution on $D$ which depends only on the distance from the center, that is, $\rho(x) = f(\|x\|)$. Show that the gravitational force vanishes at any $\xi$ inside $S_1$.

12.12 $\langle D, \rho \rangle$ is as in Exercise 12.11. Show that the gravitational force on a point outside $S_2$ is the same as that due to a particle situated at the origin and whose mass is the total mass of $D$.
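Guldin's rule (Exercise 12.9) lends itself to a quick numerical check. The sketch below makes its own assumptions: $D$ is taken to be the disk of radius $b = 1$ centered at $\langle R, 0 \rangle = \langle 2, 0 \rangle$ in the $xz$-plane, so that $\mu(D) = \pi b^2$ and $d = R$, and the solid $A$ is a torus. In cylindrical coordinates the volume is $\int_D 2\pi x \, dx \, dz$, which is approximated here by a midpoint Riemann sum over a grid of squares.

```python
import math

def solid_of_revolution_volume(in_region, x_min, x_max, z_min, z_max, n=400):
    """Riemann-sum approximation of the integral of 2*pi*x over a planar region D."""
    hx = (x_max - x_min) / n
    hz = (z_max - z_min) / n
    total = 0.0
    for i in range(n):
        x = x_min + (i + 0.5) * hx
        for j in range(n):
            z = z_min + (j + 0.5) * hz
            if in_region(x, z):  # keep only cells whose center lies in D
                total += 2.0 * math.pi * x * hx * hz
    return total

R, b = 2.0, 1.0  # illustrative torus parameters (assumptions of this sketch)
torus = solid_of_revolution_volume(
    lambda x, z: (x - R) ** 2 + z ** 2 <= b * b, R - b, R + b, -b, b
)
guldin = 2.0 * math.pi * R * (math.pi * b * b)  # 2*pi*d*mu(D) with d = R, mu(D) = pi*b^2
print(torus, guldin)
```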
13. ABSOLUTELY INTEGRABLE FUNCTIONS

Thus far we have been dealing with bounded functions of compact support. In practice, we would like to be able to integrate functions which neither are bounded nor have compact support. Let $f$ be a function defined on $\mathbb{E}^n$, let $M$ be a nonnegative real number, and let $A$ be a (bounded) contented subset of $\mathbb{E}^n$. Let $f_A^M$ be the function
$$f_A^M(x) = \begin{cases} 0 & \text{if } x \notin A, \\ M & \text{if } x \in A \text{ and } |f(x)| > M, \\ f(x) & \text{if } x \in A \text{ and } |f(x)| \le M. \end{cases}$$
Thus $f_A^M$ is a bounded function of compact support. It is obtained from $f$ by cutting $f$ back to zero outside $A$ and cutting $f$ back to $M$ when $|f(x)| > M$.

We say that a function $f$ is absolutely integrable if
i) $f_A^M$ is a contented function for all $M > 0$ and contented sets $A$; and
ii) for any $\varepsilon > 0$ there is a bounded contented set $A_\varepsilon$ such that $e_{A_\varepsilon} \cdot f$ is bounded and for all $M > 0$ and all $B$ with $B \cap A_\varepsilon = \emptyset$,
$$\int |f_B^M| < \varepsilon.$$

It is easy to check that the sum of two absolutely integrable functions is again absolutely integrable. Thus the set of absolutely integrable functions forms a vector space. Note that if $f$ satisfies condition (i) and $|f(x)| \le |g(x)|$ for all $x$, where $g$ is absolutely integrable, then $f$ is absolutely integrable.

Let $f$ be an absolutely integrable function. Given any $\varepsilon$, choose a corresponding $A_\varepsilon$. Then for any numbers $M_1$ and $M_2 > \max_{x \in A_\varepsilon} |f(x)|$ and for any sets $A_1 \supset A_\varepsilon$ and $A_2 \supset A_\varepsilon$,
$$\left| \int f_{A_1}^{M_1} - \int f_{A_2}^{M_2} \right| \le \int |f_{A_1 - A_\varepsilon}^{M_1}| + \int |f_{A_2 - A_\varepsilon}^{M_2}| < 2\varepsilon.$$
If we let $\varepsilon \to 0$ and choose a corresponding family of $A_\varepsilon$, then the above inequality implies that the $\lim \int f_{A_\varepsilon}^{M}$ is independent of the choice of the $A_\varepsilon$. We define this limit to be $\int f$.

We now list some very crude sufficient criteria for a function to be absolutely integrable. We will consider the two different causes of trouble: nonboundedness and lack of compact support.

Let $f$ be a bounded function with $f_A$ contented for any contented set $A$. Suppose $|f(x)| \le C \|x\|^{-k}$ for large values of $\|x\|$. Let $B_r$ be the ball of radius $r$ centered at the origin.
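If $r_1$ is large enough so that the inequality holds for $\|x\| \ge r_1$, then for $r_2 > r_1$ we have
$$\int |f_{B_{r_2} - B_{r_1}}| \le C \int_{B_{r_2} - B_{r_1}} \|x\|^{-k} = C \Omega_n \int_{r_1}^{r_2} r^{n-1-k} \, dr,$$
where $\Omega_n$ is some constant depending on $n$ (in fact, it is the "surface area" of the unit sphere in $\mathbb{E}^n$). If $k > n$, this last integral becomes
$$\frac{C \Omega_n}{k - n} \big( r_1^{n-k} - r_2^{n-k} \big),$$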
which is $\le [C \Omega_n / (k - n)] \, r_1^{n-k}$, which tends to zero as $r_1 \to \infty$ if $k > n$. Thus we can assert:

Let $f$ be a bounded function such that $f_A$ is contented for any contented set $A$. Suppose that $|f(x)| \to 0$ as $\|x\| \to \infty$ in such a way that $\|x\|^k |f(x)|$ is bounded for some $k > n$. Then $f$ is absolutely integrable.

Now let us examine the situation when $f$ is of compact support but unbounded. Suppose first that there is a point $x_0$ such that $f$ is bounded in the complement of any neighborhood of $x_0$. Suppose, furthermore, that $|f(x)| \le C \|x - x_0\|^{-k}$ for some constants $C$ and $k$. Then if $|f(x)| > M$, $\|x - x_0\|^{-k} > M/C$ or
$$\|x - x_0\| < (C/M)^{1/k}.$$
Let $B_1$ be the ball of radius $(C/M_1)^{1/k}$ centered at $x_0$. Then $|f(x)| > M_1$ implies that $x \in B_1$. Furthermore, for $M_2 > M_1$ we have
$$\int_{B_1} |f^{M_2}| \le C \int_{B_1 - B_2} \|x - x_0\|^{-k} + M_2 \, \mu(B_2),$$
where $B_2$ is the ball of radius $(C/M_2)^{1/k}$ centered at $x_0$. Thus
$$\int_{B_1} |f^{M_2}| \le C \Omega_n \int_{(C/M_2)^{1/k}}^{(C/M_1)^{1/k}} r^{n-1-k} \, dr + M_2 V_n (C/M_2)^{n/k},$$
where $\Omega_n$ and $V_n$ depend only on $n$. If $k < n$, the integral on the right becomes
$$\frac{C^{(n-k)/k}}{n - k} \big( M_1^{(k-n)/k} - M_2^{(k-n)/k} \big) \le \frac{C^{(n-k)/k}}{n - k} M_1^{(k-n)/k}.$$
Thus
$$\int_{B_1} |f^{M_2}| \le \text{const} \, \big( M_1^{(k-n)/k} + M_2^{(k-n)/k} \big),$$
which can be made arbitrarily small by choosing $M_1$ large. Thus if $f$ has compact support and is such that $f^M$ is contented for all $M$ and $|f(x)| \le C \|x - x_0\|^{-k}$ with $k < n$, then $f$ is absolutely integrable.

More generally, let $S$ be a bounded subset of an $l$-dimensional subspace of $\mathbb{E}^n$. Let $d(x)$ denote the distance from $x$ to $S$. Let $f$ be a function of compact support with $f^M$ contented for all $M$. If $|f(x)| \le C \, d(x)^{-k}$ with $k < n - l$, then $f$ is absolutely integrable. The proof is similar to that given above and is left to the reader.

Let $\{f_k\}$ be a sequence of absolutely integrable functions. Under what conditions will the sequence $\int f_k \to \int f$ if the sequence $f_k(x) \to f(x)$? Even if the sequence converges uniformly, there is no guarantee that the integrals converge. For instance, if $f_k = (1/k^n) \, e_{\square_k}$, where $\square_k$ is a cube of side $k$, then $|f_k(x)| \le 1/k^n$, so that $f_k$ approaches zero uniformly. On the other hand, $\int f_k = 1$ for all $k$.
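The counterexample at the end of the preceding paragraph is easy to tabulate. The sketch below works in dimension $n = 1$ and, as an assumption made for illustration, takes the "cube" to be the interval $[0, k]$, so that $f_k = (1/k) \, e_{[0,k]}$. The supremum of $f_k$ tends to zero while the Riemann sums for $\int f_k$ stay at 1.

```python
def f_k(k, x):
    """f_k = (1/k) * indicator of [0, k]: a one-dimensional instance of the example."""
    return 1.0 / k if 0.0 <= x <= k else 0.0

def sup_norm(k):
    """sup |f_k|, which tends to 0 as k grows (uniform convergence to zero)."""
    return 1.0 / k

def integral(k, cells_per_unit=100):
    """Midpoint Riemann sum of f_k over [0, k]."""
    n = k * cells_per_unit
    h = k / n
    return sum(f_k(k, (i + 0.5) * h) for i in range(n)) * h

for k in (1, 10, 100):
    print(k, sup_norm(k), integral(k))
```

The mass of $f_k$ spreads out over ever larger sets instead of disappearing, which is exactly what the uniform condition introduced next is designed to exclude.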
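We say that a set of functions $\{f_k\}$ is uniformly absolutely integrable if for any $\varepsilon > 0$ there is an $A_\varepsilon$ which can be chosen independently of $k$ such that
$$\int |f_{k,B}^M| < \varepsilon$$
for all $M$ whenever $B \cap A_\varepsilon = \emptyset$. We frequently verify that $\{f_k\}$ is uniformly absolutely integrable by showing that there is an absolutely integrable function $g$ such that $|f_k(x)| \le |g(x)|$ for all $k$ and $x$.

Let $\{f_k\}$ be a uniformly absolutely integrable sequence of functions. Suppose that $f_k \to f$ uniformly. Suppose in addition that $f$ is absolutely integrable. Then $\int f_k \to \int f$. In fact, for any $\delta > 0$ we can find a $k_0$ such that $|f_k(x) - f(x)| < \delta$ for all $k > k_0$ and all $x$. We can also find $A_\varepsilon$ and $M_\varepsilon$ such that
$$\left| \int f_k - \int f \right| \le \left| \int f_{k,A_\varepsilon}^{M_\varepsilon} - \int f_k \right| + \left| \int f_{A_\varepsilon}^{M_\varepsilon} - \int f \right| + \left| \int f_{k,A_\varepsilon}^{M_\varepsilon} - \int f_{A_\varepsilon}^{M_\varepsilon} \right| < \varepsilon + \varepsilon + \delta \mu(A_\varepsilon),$$
which can be made arbitrarily small by first choosing $\varepsilon$ small (which then gives an $A_\varepsilon$) and then choosing $\delta$ small (which means choosing $k_0$ large).

The main applications that we shall make of the preceding ideas will be to the problems of computing iterated integrals and of differentiating under the integral sign.

Proposition 13.1. Let $f$ be a function on $\mathbb{R}^k \times \mathbb{R}^l$. Suppose that the set of functions $\{f(x, \cdot)\}$ is uniformly absolutely integrable, where $x$ is restricted to lie in a bounded contented set $K \subset \mathbb{R}^k$. Then the function $e_{K \times \mathbb{R}^l} f$ is absolutely integrable, and
$$\int_{K \times \mathbb{R}^l} f = \int_K \int_{\mathbb{R}^l} f(x, y) \, dy \, dx = \int_{\mathbb{R}^l} \int_K f(x, y) \, dx \, dy.$$

Proof. By assumption, for any $\varepsilon > 0$ we can find $M$ and $A_\varepsilon \subset \mathbb{R}^l$ such that
$$\int_{\mathbb{R}^l} |f_A^M(x, \cdot)| < \varepsilon \quad \text{if } A \cap A_\varepsilon = \emptyset. \tag{13.1}$$
Now for any set $B$ in $\mathbb{R}^n$,
$$\int |(e_{K \times \mathbb{R}^l} f)_B^M| \le \mu(K) \varepsilon \quad \text{if } B \cap (K \times A_\varepsilon) = \emptyset.$$
This shows that $e_{K \times \mathbb{R}^l} f$ is absolutely integrable on $\mathbb{R}^n$. Now choose a sufficiently large $\square = \square_1 \times \square_2$ and an $M$ such that
$$\left| \int_{K \times \mathbb{R}^l} f - \int f_{K \times \square_2}^M \right| < \varepsilon$$
and also such that
$$\left| \int_{\mathbb{R}^l} f(x, \cdot) - \int_{\mathbb{R}^l} f_{\square_2}^M(x, \cdot) \right| < \varepsilon \quad \text{for all } x \in K.$$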
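Then we have
$$\left| \int_K \int_{\mathbb{R}^l} f(x, y) \, dy \, dx - \int_K \int_{\mathbb{R}^l} f_{\square_2}^M(x, y) \, dy \, dx \right| \le \mu(K) \varepsilon$$
and
$$\int f_{K \times \square_2}^M = \int_K \int_{\mathbb{R}^l} f_{\square_2}^M(x, y) \, dy \, dx.$$
Thus
$$\left| \int_{K \times \mathbb{R}^l} f - \int_K \int_{\mathbb{R}^l} f(x, y) \, dy \, dx \right| < \varepsilon + \mu(K) \varepsilon,$$
so that
$$\int_{K \times \mathbb{R}^l} f = \int_K \int_{\mathbb{R}^l} f(x, y) \, dy \, dx.$$
Finally Eq. (13.1) shows that the function $F(y) = \int_K f(\cdot, y)$ is absolutely integrable. In fact, using the same $A$ and $M$ as in (13.1), we get
$$\int |F_A^M| \le \mu(K) \varepsilon.$$
Thus we get
$$\int_{K \times \mathbb{R}^l} f = \int_K \int_{\mathbb{R}^l} f(x, y) \, dy \, dx = \int_{\mathbb{R}^l} \int_K f(x, y) \, dx \, dy. \ \square$$

An extension of the same argument shows the following.

Proposition 13.2. Let $f$ be absolutely integrable on $\mathbb{R}^n$ and such that the functions $f(x, \cdot)$ are uniformly absolutely integrable for each $x \in \mathbb{R}^k$. Then
$$\int f = \int \int f(x, y) \, dy \, dx.$$

We now turn our attention to the problem of differentiating under the integral sign.

Proposition 13.3. Let $\langle t, x \rangle \mapsto F(t, x)$ be a function on $I \times \mathbb{R}^n$, where $I = [a, b] \subset \mathbb{R}^1$. Suppose that
i) $F$ and $\partial F / \partial t$ are continuous functions on $I \times \mathbb{R}^n$;
ii) $(\partial F / \partial t)(t, \cdot)$ is a uniformly absolutely integrable family of functions;
iii) $F(t, \cdot)$ is absolutely integrable for all $t \in I$.
Let $f(t) = \int F(t, \cdot)$. Then $f$ is a differentiable function of $t$ and $f'(t) = \int_{\mathbb{R}^n} (\partial F / \partial t)(t, \cdot)$.

Proof. Let $G(t) = \int_{\mathbb{R}^n} (\partial F / \partial t)(t, \cdot)$. Then $G(t)$ is continuous, since we can pass to the limit under the integral sign of a uniformly absolutely integrable family of functions. Furthermore,
$$\int_a^t G(s) \, ds = \int_{\mathbb{R}^n} \int_a^t (\partial F / \partial t)(s, \cdot) \, ds$$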
by Proposition 13.1. Thus
$$\int_a^t G(s) \, ds = \int_{\mathbb{R}^n} \big( F(t, \cdot) - F(a, \cdot) \big) = \int_{\mathbb{R}^n} F(t, \cdot) - \int_{\mathbb{R}^n} F(a, \cdot) = f(t) - f(a).$$
Differentiating this equation with respect to $t$ gives the desired result. □

Finally, let us state the change of variables formula for absolutely integrable functions.

Let $\varphi : U \to V$ be a differentiable one-to-one map with differentiable inverse, where $U$ and $V$ are two open sets in $\mathbb{R}^n$. Let $f$ be an absolutely integrable function defined on $V$. Then $(f \circ \varphi)|\det J_\varphi|$ is an absolutely integrable function on $U$ and
$$\int_V f = \int_U (f \circ \varphi) \, |\det J_\varphi|.$$

Proof. To show that $(f \circ \varphi)|\det J_\varphi|$ is absolutely integrable, let $\varepsilon > 0$ and choose an $A_\varepsilon \subset V$ such that (ii) holds. Then $A_\varepsilon$ is compact, and therefore so is $\varphi^{-1}(A_\varepsilon)$. In particular, $\varphi^{-1}(A_\varepsilon)$ is a bounded contented set and $|\det J_\varphi|$ is bounded on it. If $B \cap \varphi^{-1}(A_\varepsilon) = \emptyset$, where $B \subset U$ is bounded and contented, then
$$\int_B |\det J_\varphi| \, |(f \circ \varphi)^M| \le \int_{\varphi(B)} |f^M| < \varepsilon.$$
This shows that $(f \circ \varphi)|\det J_\varphi|$ is absolutely integrable. The rest of the proposition then follows from
$$\int_{A_\varepsilon} f = \int_{\varphi^{-1}(A_\varepsilon)} (f \circ \varphi) \, |\det J_\varphi|$$
by letting $\varepsilon \to 0$. □

EXERCISES

13.1 Evaluate the integral $\int_{-\infty}^{\infty} e^{-x^2} \, dx$. [Hint: Compute its square.]

13.2 Evaluate the integral $\int_0^{\infty} e^{-x^2} x^{2k} \, dx$.

13.3 Evaluate the volume of the unit ball in an odd-dimensional space. [Hint: Observe that the Jacobian determinant for "polar" coordinates is of the form $r^{n-1} \times j$, where $j$ is a function of the "angular variables". Thus the volume of the unit ball is of the form $C \int_0^1 r^{n-1} \, dr$, where $C$ is determined by integrating $j$ over the "angular variables". Evaluate $C$ by computing $\big( \int_{-\infty}^{\infty} e^{-x^2} \, dx \big)^n$.]

14. PROBLEM SET: THE FOURIER TRANSFORM

Let $\alpha = \langle \alpha_1, \ldots, \alpha_n \rangle$ be an $n$-tuple whose entries are nonnegative integers. By $D^\alpha$ we shall mean the differential operator
$$D^\alpha = \frac{\partial^{\alpha_1 + \cdots + \alpha_n}}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}.$$
Let $|\alpha| = \alpha_1 + \cdots + \alpha_n$. Let
$$Q(x, D) = \sum_{|\alpha| \le k} a_\alpha(x) D^\alpha$$
be the differential operator where each $a_\alpha$ is a polynomial in $x$. Thus if $f$ is a $C^k$-function on $\mathbb{R}^n$, we have
$$(Qf)(x) = \sum_{|\alpha| \le k} a_\alpha(x) D^\alpha f(x).$$
For any $f$ which is $C^\infty$ on $\mathbb{R}^n$ we set
$$\|f\|_Q = \sup_x |Qf(x)|.$$
We denote by $S$ the space of all $f \in C^\infty$ such that
$$\|f\|_Q < \infty \tag{14.1}$$
for all $Q$. To see what this means, let us consider those $Q$ with $k = 0$. Then (14.1) says that for any polynomial $a(\cdot)$ the function $a \cdot f$ is bounded. In other words, $f$ vanishes at infinity faster than the inverse of any polynomial; that is,
$$\lim_{\|x\| \to \infty} \|x\|^p f(x) = 0$$
for all $p$. To say that (14.1) holds means that the same is true for any derivative of $f$ as well.

If $f$ is a $C^\infty$-function of compact support, then (14.1) obviously holds, so $f \in S$. A more instructive example is provided by the function $n$ given by
$$n(x) = e^{-\|x\|^2}.$$
Since $\lim_{r \to \infty} r^p e^{-r^2} = 0$ for any $p$, it follows that $\lim_{\|x\| \to \infty} a(x) n(x) = 0$. On the other hand, it is easy to see (by induction) that $D^\alpha n(x) = P_\alpha(x) n(x)$ for some polynomial $P_\alpha$. Thus $Qn(x) = P_Q(x) n(x)$, where $P_Q$ is a polynomial. Thus $n \in S$.

It is easy to see that the space $S$ is a vector space. We shall introduce a notion of convergence on this space by saying that $f_n \to f$ if for every fixed $Q$,
$$\|f_n - f\|_Q \to 0.$$
(Note that the space $S$ is not a Banach space in that convergence depends on an infinity of different norms.)

EXERCISES

14.1 Let $\varphi$ be a $C^\infty$-function which grows slowly at infinity. That is, suppose that for every $\alpha$ there is a polynomial $P_\alpha$ such that $|D^\alpha \varphi(x)| \le |P_\alpha(x)|$ for all $x$. Show that if $f \in S$, then $\varphi f \in S$. Furthermore, the map of $S$ into itself sending $f \mapsto \varphi f$ is continuous, that is, if $f_n \to f$, then $\varphi f_n \to \varphi f$.
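For $x = \langle x^1, \ldots, x^n \rangle \in \mathbb{R}^n$ and $\xi = \langle \xi_1, \ldots, \xi_n \rangle \in \mathbb{R}^{n*}$ we denote the value of $\xi$ at $x$ by
$$\langle x, \xi \rangle = x^1 \xi_1 + \cdots + x^n \xi_n.$$
Also for any $\alpha = \langle \alpha_1, \ldots, \alpha_n \rangle$ and any $x \in \mathbb{R}^n$ we let
$$x^\alpha = (x^1)^{\alpha_1} \cdots (x^n)^{\alpha_n},$$
and similarly $\xi^\alpha = (\xi_1)^{\alpha_1} \cdots (\xi_n)^{\alpha_n}$, etc.

For any $f \in S$ we define its Fourier transform $\hat{f}$, which is a function on $\mathbb{R}^{n*}$, by
$$\hat{f}(\xi) = \int e^{-i \langle x, \xi \rangle} f(x) \, dx.$$
We note that $\hat{f}(0) = \int f$ and $|\hat{f}(\xi)| \le \int |f|$.

14.2 Show that $\hat{f}$ possesses derivatives of all orders with respect to $\xi$ and that
$$D_\xi^\alpha \hat{f}(\xi) = (-i)^{|\alpha|} \int e^{-i \langle x, \xi \rangle} x^\alpha f(x) \, dx;$$
in other words, $D^\alpha \hat{f}(\xi) = \hat{g}(\xi)$, where $g(x) = (-i)^{|\alpha|} x^\alpha f(x)$.

14.3 Show that
$$\widehat{\frac{\partial f}{\partial x^j}}(\xi) = i \xi_j \hat{f}(\xi).$$
[Hint: Write the integral as an iterated integral and use integration by parts with respect to the $j$th variable.]

14.4 Conclude that the map $f \mapsto \hat{f}$ sends $S(\mathbb{R}^n)$ into $S(\mathbb{R}^{n*})$ and that if $f_n \to 0$ in $S$, then $\hat{f}_n \to 0$ in $S(\mathbb{R}^{n*})$.

14.5 Show that
$$\widehat{T_\omega f}(\xi) = e^{-i \langle \omega, \xi \rangle} \hat{f}(\xi)$$
for any $\omega \in \mathbb{R}^n$. Recall that $T_\omega f(x) = f(x - \omega)$.

14.6 For any $f \in S$ define $\tilde{f}$ by $\tilde{f}(x) = \overline{f(-x)}$, where the bar denotes complex conjugation. Show that
$$\hat{\tilde{f}} = \overline{\hat{f}}.$$

14.7 Let $n = 1$, and let $f$ be an even real-valued function of $x$. Show that
$$\hat{f}(\xi) = \int \cos(x \xi) f(x) \, dx.$$

14.8 Let $n(x) = e^{-(1/2) x^2}$, where $x \in \mathbb{R}^1$. Show that
$$\frac{d \hat{n}}{d \xi} = -\xi \, \hat{n}(\xi),$$
and conclude that
$$\log \hat{n}(\xi) = -\tfrac{1}{2} \xi^2 + \text{const},$$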
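so that
$$\hat{n}(\xi) = \text{const} \times e^{-(1/2) \xi^2}.$$
Evaluate this constant as $\sqrt{2\pi}$ by setting $\xi = 0$ and using Exercise 13.1. Thus
$$\hat{n} = \sqrt{2\pi} \, n.$$

14.9 Show that the limit $\lim_{\varepsilon \to 0} \int_\varepsilon^{1/\varepsilon} (\sin x)/x \, dx$ exists. Let us call this limit $d$. Show that for any $R > 0$, $\lim_{\varepsilon \to 0} \int_\varepsilon^{1/\varepsilon} (\sin Rx)/x \, dx = d$.

If $f \in S$, we have seen that $\hat{f} \in S(\mathbb{R}^{n*})$. We can therefore consider the function
$$\frac{1}{(2\pi)^n} \int e^{i \langle y, \xi \rangle} \hat{f}(\xi) \, d\xi.$$
The purpose of the next few exercises is to show that
$$f(y) = \frac{1}{(2\pi)^n} \int e^{i \langle y, \xi \rangle} \hat{f}(\xi) \, d\xi. \tag{14.2}$$
We first remark that since all integrals involved are absolutely convergent, it suffices to show that
$$f(y) = \lim_{R_1 \to \infty} \cdots \lim_{R_n \to \infty} \frac{1}{(2\pi)^n} \int_{-R_n}^{R_n} \cdots \int_{-R_1}^{R_1} \hat{f}(\xi_1, \ldots, \xi_n) \, e^{i(y^1 \xi_1 + \cdots + y^n \xi_n)} \, d\xi_1 \cdots d\xi_n.$$
Substituting the definition of $\hat{f}$ into this formula and interchanging the order of integration with respect to $x$ and $\xi$, we get
$$\lim_{R_1 \to \infty} \cdots \lim_{R_n \to \infty} \Big( \frac{1}{2\pi} \Big)^n \int \int_{-R_n}^{R_n} \cdots \int_{-R_1}^{R_1} f(x^1, \ldots, x^n) \, e^{i[(y^1 - x^1) \xi_1 + \cdots + (y^n - x^n) \xi_n]} \, d\xi_1 \cdots d\xi_n \, dx.$$
It therefore suffices to evaluate this limit one variable at a time (provided the convergence is uniform, which will be clear from the proof). We have thus reduced the problem to functions of one variable. We must show that if $f \in S(\mathbb{R}^1)$, then
$$f(y) = \lim_{R \to \infty} \frac{1}{2\pi} \int \int_{-R}^{R} f(x) \, e^{i(y - x) \xi} \, d\xi \, dx.$$
We shall first show that
$$f(y) = \lim_{R \to \infty} \frac{1}{4d} \int \int_{-R}^{R} f(x) \, e^{i(y - x) \xi} \, d\xi \, dx,$$
where $d$ is given in Exercise 14.9.

14.10 Show that this last integral can be written as
$$\frac{1}{2d} \int_{-\infty}^{\infty} f(x) \, \frac{\sin R(y - x)}{y - x} \, dx = \frac{1}{d} \int_0^{\infty} \frac{f(y - u) + f(y + u)}{2} \, \frac{\sin Ru}{u} \, du.$$

14.11 Let
$$g(u) = \frac{f(y - u) + f(y + u)}{2} - f(y).$$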
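Show that $g(0) = 0$ and conclude that $g(u) = u h(u)$ for $0 \le u \le 1$, where $h \in C^1$. By integrating by parts, show that
$$\left| \int_0^1 g(u) \, \frac{\sin Ru}{u} \, du \right| + \left| \int_1^{\infty} g(u) \, \frac{\sin Ru}{u} \, du \right| \le \text{const} \cdot \frac{1}{R}.$$
Conclude that
$$\lim_{R \to \infty} \frac{1}{d} \int_0^{\infty} \frac{f(y - u) + f(y + u)}{2} \, \frac{\sin Ru}{u} \, du = f(y).$$
This proves that
$$f(y) = \lim_{R \to \infty} \frac{1}{4d} \int \int_{-R}^{R} f(x) \, e^{i(y - x) \xi} \, d\xi \, dx.$$

14.12 Using Exercise 14.8, conclude that $d = \pi/2$.

Let $f_1 \in S$ and $f_2 \in S$. Define the function $f_1 * f_2$ by setting
$$f_1 * f_2(x) = \int f_1(x - y) f_2(y) \, dy.$$
Note that this makes good sense, since the integral on the right clearly converges for each fixed value of $x$. We can be more precise. Since $f_i \in S$, we can, for any integer $p$, find a $K_p$ such that
$$|f_i(y)| \le \frac{K_p}{1 + \|y\|^p},$$
so that, for $p > n$,
$$\int_{\|y\| > R} |f_i(y)| \, dy \le \text{const} \cdot R^{n - p}.$$
Then
$$\int (1 + \|x\|^q) f_1(x - y) f_2(y) \, dy = \int_{\|y\| \le (1/2)\|x\|} (1 + \|x\|^q) f_1(x - y) f_2(y) \, dy + \int_{\|y\| > (1/2)\|x\|} (1 + \|x\|^q) f_1(x - y) f_2(y) \, dy.$$
The first integral is at most
$$C_n \big( \tfrac{1}{2} \|x\| \big)^n (1 + \|x\|^q) \, \max_z |f_2(z)| \cdot \frac{K_p}{1 + (\tfrac{1}{2} \|x\|)^p},$$
while the second is at most
$$(1 + \|x\|^q) \, \max_z |f_1(z)| \int_{\|y\| > (1/2)\|x\|} |f_2(y)| \, dy.$$
By choosing $p > q + n$, we see that both terms go to zero. Thus
$$\lim_{\|x\| \to \infty} (1 + \|x\|^q) \, f_1 * f_2(x) = 0.$$

14.13 Show that
$$D^\alpha (f_1 * f_2) = (D^\alpha f_1) * f_2 = f_1 * (D^\alpha f_2).$$
Conclude that $f_1 * f_2 \in S$.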
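The identity $\hat{n} = \sqrt{2\pi} \, n$ for $n(x) = e^{-(1/2)x^2}$ stated in Exercise 14.8 can be checked numerically. The following sketch uses the cosine form of the transform from Exercise 14.7 (valid because $n$ is even and real-valued); the truncation of the integral to $[-12, 12]$ and the step count are assumptions of the sketch, justified only by the rapid decay of the Gaussian.

```python
import math

def gaussian_hat(xi, half_width=12.0, n=4000):
    """Midpoint-rule approximation of the integral of cos(x*xi) * exp(-x^2/2) over the line."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        total += math.cos(x * xi) * math.exp(-0.5 * x * x)
    return total * h

def predicted(xi):
    """The claimed transform sqrt(2*pi) * exp(-xi^2/2)."""
    return math.sqrt(2.0 * math.pi) * math.exp(-0.5 * xi * xi)

for xi in (0.0, 1.0, 2.5):
    print(xi, gaussian_hat(xi), predicted(xi))
```

Taking $\xi = 0$ recovers $\int e^{-(1/2)x^2} \, dx = \sqrt{2\pi}$, which is the normalization used to fix the constant in Exercise 14.8.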
14.14 Show that if $\varphi$ is any bounded continuous function on $\mathbb{R}^n$, then
$$\int \int \varphi(x + y) f_1(x) f_2(y) \, dx \, dy = \int \varphi(u) (f_1 * f_2)(u) \, du.$$

14.15 Conclude that
$$\widehat{f_1 * f_2}(\xi) = \hat{f}_1(\xi) \hat{f}_2(\xi).$$

14.16 Show that
$$\tilde{f} * f(y) = \Big( \frac{1}{2\pi} \Big)^n \int |\hat{f}(\xi)|^2 e^{i \langle y, \xi \rangle} \, d\xi.$$

14.17 Conclude that for any $f \in S$,
$$\int |f|^2 = \Big( \frac{1}{2\pi} \Big)^n \int |\hat{f}|^2. \tag{14.3}$$
[Hint: Set $y = 0$ in Exercise 14.16.]

The following exercises use the Fourier transform to develop facts which are useful in the study of partial differential equations. We will make use of these facts at the end of the last chapter. The reader may prefer to postpone his study of these problems until then.

On the space $S$, define the norm $\| \ \|_s$ by setting
$$\|f\|_s^2 = (2\pi)^{-n} \int (1 + \|\xi\|^2)^s |\hat{f}(\xi)|^2 \, d\xi,$$
and the scalar product $(f, g)_s$ by
$$(f, g)_s = (2\pi)^{-n} \int (1 + \|\xi\|^2)^s \hat{f}(\xi) \overline{\hat{g}(\xi)} \, d\xi.$$

14.18 Let $s = R$ be a nonnegative integer. Show that
$$\|f\|_R^2 = \sum_{|\alpha| \le R} \frac{R!}{\alpha! \, (R - |\alpha|)!} \int |D^\alpha f(x)|^2 \, dx,$$
where $\alpha! = \alpha_1! \cdots \alpha_n!$. [Use the multinomial theorem, a repeated application of Exercise 14.3, and Eq. (14.3).]

We thus see that $\|f\|_R$ measures the size of $f$ and its derivatives to order $R$ in the square integral norm. It is helpful to think of $\| \ \|_s$ as a generalization of this notion of size, where now $s$ can be an arbitrary real number. Note that
$$\|f\|_s \le \|f\|_t \quad \text{if } s \le t.$$
For any real $s$ define the operator $K^s$ by setting
$$\widehat{K^s f}(\xi) = (1 + \|\xi\|^2)^s \hat{f}(\xi).$$

14.19 Show that the operator $K = K^1$ is given by
$$K = 1 - \sum_i \frac{\partial^2}{\partial (x^i)^2}.$$
14.20 Show that for any real numbers $s$ and $t$,
$$\|K^s f\|_t = \|f\|_{t + 2s}$$
and
$$(K^s f, g)_t = (f, K^s g)_t = (f, g)_{s + t}.$$

14.21 Show that $K^{s + t} = K^s \circ K^t$, so that, in particular, $K^s$ is invertible for all $s$.

We now define the space $H_s$ to be the completion of $S$ under the norm $\| \ \|_s$. The space $H_s$ is a Hilbert space with a scalar product $( \ , \ )_s$. We can think of the elements of $H_s$ as "generalized functions with generalized derivatives up to order $s$". By construction, the space $S$ is a dense subspace of $H_s$ in the norm $\| \ \|_s$. We note that Exercise 14.20 implies that the operator $K^s$ can be extended to an isometric map of $H_t$ into $H_{t - 2s}$. We shall also denote this extended map by $K^s$. By Exercise 14.21,
$$K^{-s} : H_{t - 2s} \to H_t$$
is the inverse of $K^s$, so that $K^s$ is a norm-preserving isomorphism of $H_t$ onto $H_{t - 2s}$.

14.22 Let $u \in H_s$ and $v \in H_{-s}$. Show that
$$|(u, v)_0| \le \|u\|_s \|v\|_{-s}.$$
Thus we can extend $\langle u, v \rangle \mapsto (u, v)_0$ to a function on $H_s \times H_{-s}$ which is linear in $u$ and antilinear in $v$ [that is, $(u, a v_1 + b v_2)_0 = \bar{a} (u, v_1)_0 + \bar{b} (u, v_2)_0$] and satisfies the above inequality. Thus any $v \in H_{-s}$ defines a bounded linear function, $l$, on $H_s$ by $l(u) = (u, v)_0$.

14.23 Conversely, let $l$ be a bounded linear function on $H_s$. Show that there is a $v \in H_{-s}$ with $l(u) = (u, v)_0$ for all $u \in H_s$. [Hint: Consider the linear form $v = K^s w$, where $w$ is a suitable element of $H_s$, using Theorem 2.4 of Chapter 5.]

14.24 Show that
$$\|v\|_{-s} = \sup_{u \in H_s} \frac{|(u, v)_0|}{\|u\|_s}.$$
(Exercise 14.22 gives an inequality. If $v \ne 0$, take $u = K^{-s} v$ and use $(K^{-s} v, v)_0 = \|v\|_{-s}^2 = \|K^{-s} v\|_s^2$ in order to get an equality.)

14.25 Let $2s > n$ (where our functions are defined on $\mathbb{R}^n$). Show that for any $f \in S$ we have
$$\sup_x |f(x)| \le \|f\|_s \Big[ (2\pi)^{-n} \int (1 + \|\xi\|^2)^{-s} \, d\xi \Big]^{1/2}$$
(Sobolev's inequality). (Use Eq. (14.2), Schwarz's inequality, and the fact that the integral on the right of the inequality is absolutely convergent.)
Sobolev's inequality shows that the injection of $\mathcal{S}$ into $C(\mathbb{R}^n)$ extends to a continuous injection of $H_s$ into $C(\mathbb{R}^n)$, where $C(\mathbb{R}^n)$ is given the uniform norm. We can thus regard the elements of $H_s$ as actual functions on $\mathbb{R}^n$ if $s > n/2$.
362 INTEGRATION 8.14 By induction on \a\ we can assert that for s > n/2, any/G -^ui+s has \a\ continuous derivatives and sup I D7(x)I < Call/Ill„|+.. (14.4) 14.26 Let 12 be a bounded open subset of R^. Let (^ E S satisfy supp cp C^. Show that \m)\ < fJii^y^HAo for all ^ 14.27 Let d' = lubxGn |^1, and let d« = (d^)"! • • • (d")«n. Show that 14.28 Show that and conclude that l(i+llff)V(?)l<M(Si)Hlf. 14.29 More generally, let ^ be a function in S which satisfies \l/(x) = 1 for all x G^, and let (^ E S satisfy supp cp C^. Show that where \l^^(x) = 4/{x) e~'^'''^\ and that \DiH^)\ < II^IMI^^II-., where \l/l(x) = x'^(x)e-'^'^'^\ Let us denote by H^ the completion under || ||s of the space of those functions in cS whose supports lie in 12. According to Exercise 14.29, any cp E: H^ defines an actual function <^ of ^ which is differentiable and satisfies where ||\^^(a:j||_s depends only on 12, a, ^, and —s, and is independent of (p. Further- more, ll^ll? = S(l + \\m'\H^)\^ d^. 14.30 Let s < t. Then the injection Ht —> i/s is a compact mapping. That is, it {(pi} is a sequence of elements of //? such that \\(Pi\\t ^ 1 for all i, then we can select a subsequence {cpij} which converges in || ||s. [Hint: By Exercise 14.29, the sequence ot functions cpiiO is bounded and equicontinuous on {.f : ||^|| < r} for only fixed r. Wv can thus choose a subsequence which converges uniformly and therefore a subsubsc- quence which converges on {^ : ||^|| < r} for all r (the uniformity possibly depending on r). Then if {cpij} is this subsubsequence, ikv - 'Pifs = /(I + ufrwijio - >PiM' d^ + /■ (i+ufy\<Pijm~<PiM'd^ J \\^\\>r J llf ll<r + (i+ufy-'{\\<Pijfi + \\<Pifi}.]
CHAPTER 9

DIFFERENTIABLE MANIFOLDS

Thus far our study of the calculus has been devoted to the study of properties of and operations on functions defined on (subsets of) a vector space. One of the ideas used was the approximation of possibly nonlinear functions at each point by linear functions. In this chapter we shall generalize our notion of space to include spaces which cannot, in any natural way, be regarded as open subsets of a vector space. One of the tools we shall use is the "approximation" of such a space at each point by a linear space.

Suppose we are interested in studying functions on (the surface of) the unit sphere in $\mathbb{E}^3$. The sphere is a two-dimensional object in the sense that we can describe a neighborhood of every point of the sphere in a bicontinuous way by two coordinates. On the other hand, we cannot map the sphere in a bicontinuous one-to-one way onto an open subset of the plane (since the sphere is compact and an open subset of $\mathbb{E}^2$ is not). Thus pieces of the sphere can be described by open subsets of $\mathbb{E}^2$, but the whole sphere cannot. Therefore, if we want to do calculus on the whole sphere at once, we must introduce a more general class of spaces and study functions on them.

Even if a space can be regarded as a subset of a vector space, it is conceivable that it cannot be so regarded in any canonical way. Thus the state of a (homogeneous ideal) gas in equilibrium is specified when one gives any two of the three parameters: temperature, pressure, or volume. There is no reason to prefer any two to the third. The transition from one set of parameters to the other is given by a one-to-one bidifferentiable map. Thus any function of the states of the gas which is a differentiable function in terms of one choice of parameters is differentiable in terms of any other. Thus it makes sense to talk of differentiable functions on the states of the gas.
However, a function which is linear in terms of one choice of parameters need not be linear in terms of the other. Thus it doesn't really make sense to talk of linear functions on the states of the gas. In such a situation we would like to know what properties of functions and what operations make sense in the space and are not artifacts of the description we give of the space.

Finally, even in a vector space it is sometimes convenient to introduce "nonlinear coordinates" for the solution of specific problems: for example, polar coordinates in Exercises 11.3 and 11.4, Chapter 8. We would therefore like to know how various objects change when we change coordinates and, if possible, to introduce notation which is independent of the coordinate system.
We will begin our formal discussion with the definition of differentiable manifolds. The basic idea is similar to the one that is used in everyday life to describe the surface of the earth. One gives a collection of charts describing small overlapping portions of the globe. We can piece the whole picture together by seeing how the charts match up.

[Fig. 9.1: the chart images $\varphi_i(U_i)$ and $\varphi_j(U_j)$]

1. ATLASES

Let $M$ be a set. Let $V$ be a Banach space. (For almost all our applications we shall take $V$ to be $\mathbb{R}^n$ for some integer $n$.) A $V$-atlas of class $C^k$ on $M$ is a collection $\mathcal{A}$ of pairs $(U_i, \varphi_i)$ called charts, where $U_i$ is a subset of $M$ and $\varphi_i$ is a bijective map of $U_i$ onto an open subset of $V$ subject to the following conditions (Fig. 9.1):

A1. For any $(U_i, \varphi_i) \in \mathcal{A}$ and $(U_j, \varphi_j) \in \mathcal{A}$ the sets $\varphi_i(U_i \cap U_j)$ and $\varphi_j(U_i \cap U_j)$ are open subsets of $V$, and the maps
$$\varphi_i \circ \varphi_j^{-1}\colon \varphi_j(U_i \cap U_j) \to \varphi_i(U_i \cap U_j)$$
are differentiable of class $C^k$.

A2. $\bigcup U_i = M$.

The functions $\varphi_i \circ \varphi_j^{-1}$ are called the transition functions of the atlas $\mathcal{A}$. The following are examples of sets with atlases.

Example 1. The trivial example. Let $M$ be an open subset of $V$. If we take $\mathcal{A}$ to consist of the single element $(U, \varphi)$, where $U = M$ and $\varphi\colon U \to V$ is the identity map, then Axioms A1 and A2 are trivially fulfilled.

Example 2. The sphere. Let $M = S^n$ denote the subset of $\mathbb{R}^{n+1}$ given by $(x^1)^2 + \cdots + (x^{n+1})^2 = 1$. Let the set $U_1$ consist of those points for which $x^{n+1} > -1$, and let $U_2$ consist of those points for which $x^{n+1} < 1$. Let $\varphi_1\colon U_1 \to \mathbb{R}^n$
be given by
$$y^i \circ \varphi_1(x^1, \ldots, x^{n+1}) = \frac{x^i}{1 + x^{n+1}}, \qquad i = 1, \ldots, n,$$
where $y^1, \ldots, y^n$ are coordinates on $\mathbb{R}^n$. Thus the map $\varphi_1$ is given by the projection from the "south pole", $\langle 0, \ldots, 0, -1\rangle$, to $\mathbb{R}^n$ regarded as the equatorial plane (see Fig. 9.2). Similarly, define $\varphi_2$ by
$$y^i \circ \varphi_2(x^1, \ldots, x^{n+1}) = \frac{x^i}{1 - x^{n+1}}.$$
Then $\varphi_1(U_1 \cap U_2) = \varphi_2(U_1 \cap U_2) = \{y \in \mathbb{R}^n : y \ne 0\}$. Now
$$\|\varphi_1(x)\|^2 = \frac{(x^1)^2 + \cdots + (x^n)^2}{(1 + x^{n+1})^2} = \frac{1 - (x^{n+1})^2}{(1 + x^{n+1})^2} = \frac{1 - x^{n+1}}{1 + x^{n+1}},$$
or
$$\varphi_2(x) = \frac{\varphi_1(x)}{\|\varphi_1(x)\|^2}.$$
In other words, the map $\varphi_2 \circ \varphi_1^{-1}$, defined for all $y \ne 0$, is given by
$$\varphi_2 \circ \varphi_1^{-1}(y) = \frac{y}{\|y\|^2}.$$
Thus conditions A1 and A2 are fulfilled.

[Fig. 9.2]

Note that the atlas we gave for the sphere contains only two charts (each given by polar projection). An atlas of the earth usually contains many more charts. In other words, many different atlases can be used to describe the same set. We shall return to this point later.
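The computation in Example 2 can be checked numerically. The following is a minimal sketch in Python, not part of the original text: the function names `phi1`, `phi2`, `phi1_inverse`, and `transition` are hypothetical, and the formula used for the inverse of the first chart is derived here from the definition of the chart rather than quoted from the book.

```python
import math

def phi1(x):
    """Chart on U1 = {x^{n+1} > -1}: projection from the south pole."""
    *head, last = x
    return [xi / (1.0 + last) for xi in head]

def phi2(x):
    """Chart on U2 = {x^{n+1} < 1}: projection from the north pole."""
    *head, last = x
    return [xi / (1.0 - last) for xi in head]

def phi1_inverse(y):
    """Point of the unit sphere that phi1 sends to y (derived, not quoted)."""
    r2 = sum(yi * yi for yi in y)
    return [2.0 * yi / (1.0 + r2) for yi in y] + [(1.0 - r2) / (1.0 + r2)]

def transition(y):
    """The transition function phi2 o phi1^{-1}, defined for y != 0."""
    return phi2(phi1_inverse(y))

# Compare with the closed form y / ||y||^2 at one sample point (n = 2).
y = [0.3, -1.2]
r2 = sum(yi * yi for yi in y)
assert all(math.isclose(c, yi / r2) for c, yi in zip(transition(y), y))
```

The check uses a single sample point, so it illustrates the identity rather than proving it.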
Example 3. The circle. The circle $S^1$ is a "one-dimensional sphere" and therefore has an atlas as described in Example 2. We wish to describe a different atlas on $S^1$. Regard $S^1$ as the unit circle $x_1^2 + x_2^2 = 1$, and consider the function $\theta_1$, defined in a neighborhood of $\langle 1, 0\rangle$ on the upper semicircle of $S^1$, which gives the angle from the point on $S^1$ to $\langle 1, 0\rangle$ (see Fig. 9.3). As we move counterclockwise around the circle, this function is well defined until we hit $\langle 1, 0\rangle$ again. We will take, as the first chart in our atlas, $(U_1, \theta_1)$, where $U_1 = S^1 - \{\langle 1, 0\rangle\}$ and $\theta_1$ is the function defined above. Let $U_2 = S^1 - \{\langle 0, 1\rangle\}$, and define $\theta_2$ to be $\pi/2$ plus the angle (measured counterclockwise) from $\langle 0, 1\rangle$ (see Fig. 9.4). Now
$$U_1 \cap U_2 = S^1 - \{\langle 1, 0\rangle, \langle 0, 1\rangle\},$$
and
$$\theta_1(U_1 \cap U_2) = (0, 2\pi) - \{\pi/2\}.$$
Also,
$$\theta_2(U_1 \cap U_2) = (\pi/2,\ 2\pi + \pi/2) - \{2\pi\}.$$
The map $\theta_2 \circ \theta_1^{-1}$ is given by
$$\theta_2 \circ \theta_1^{-1}(x) = \begin{cases} x + 2\pi & \text{if } 0 < x < \pi/2, \\ x & \text{if } \pi/2 < x < 2\pi. \end{cases}$$

[Figs. 9.3 and 9.4]

Example 4. The product of two atlases. Let $\mathcal{A} = \{(U_i, \varphi_i)\}$ be a $V_1$-atlas on a set $M$, and let $\mathcal{B} = \{(W_j, \psi_j)\}$ be a $V_2$-atlas on a set $N$, where $V_1$ and $V_2$ are Banach spaces. Then the collection $\mathcal{C} = \{(U_i \times W_j, \varphi_i \times \psi_j)\}$ is a $(V_1 \times V_2)$-atlas on $M \times N$. Here $\varphi_i \times \psi_j(p, q) = \langle \varphi_i(p), \psi_j(q)\rangle$ if $\langle p, q\rangle \in U_i \times W_j$. It is easy to check that $\mathcal{C}$ satisfies conditions A1 and A2. We shall call $\mathcal{C}$ the product of $\mathcal{A}$ and $\mathcal{B}$ and write $\mathcal{C} = \mathcal{A} \times \mathcal{B}$.

For instance, let $M = (0, 1) \subset \mathbb{R}^1$ and $N = S^1$. Then we can regard $M \times N$ as a cylinder or an annulus. If $M = N = S^1$, then $M \times N$ is a torus.

[Figure: cylinder, annulus, torus]
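The transition function of the circle atlas in Example 3 can also be verified directly. The sketch below is in Python and is only an illustration: the names `theta1`, `theta2`, `theta1_inverse`, and `transition` are hypothetical, and the two angle functions are implemented with `math.atan2` under the assumption that angles are measured counterclockwise as in the text.

```python
import math

TWO_PI = 2.0 * math.pi

def theta1(p):
    """Counterclockwise angle from <1, 0>, in (0, 2*pi); p must differ from <1, 0>."""
    x, y = p
    return math.atan2(y, x) % TWO_PI

def theta2(p):
    """pi/2 plus the counterclockwise angle from <0, 1>; p must differ from <0, 1>."""
    x, y = p
    angle_from_north = (math.atan2(y, x) - math.pi / 2.0) % TWO_PI
    return math.pi / 2.0 + angle_from_north

def theta1_inverse(t):
    """Point of the unit circle whose first chart coordinate is t."""
    return (math.cos(t), math.sin(t))

def transition(t):
    """Closed form of theta2 o theta1^{-1} stated in Example 3."""
    return t + TWO_PI if t < math.pi / 2.0 else t

# Sample points on both sides of the excluded value pi/2.
for t in (0.5, 1.0, 3.0, 5.0):
    assert math.isclose(theta2(theta1_inverse(t)), transition(t))
```

The jump by $2\pi$ across $\pi/2$ is what makes the transition function a map between two disconnected pieces, each piece being a translation and hence smooth.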
It is an instructive exercise to write down the atlases and transition functions explicitly in these cases.

Example 5. As a generalization of our first example, let $S$ be a submanifold of an $(n + m)$-dimensional vector space $X$, as defined in Section 12 of Chapter 3. For each neighborhood $N$ defined there, the set $S \cap N$, together with the map $\varphi$ which is defined as the projection $\pi_1$ restricted to $S$, provides a chart with values in $V$ (where $X$ is viewed as $V \times W$). In such a neighborhood $N$ the set $S$ is presented as a graph of a function $F$. In other words,
$$S \cap N = \{\langle x, F(x)\rangle \in V \times W : x \in \pi_1(S \cap N)\},$$
where $F$ is a smooth map of $A = \pi_1(S \cap N)$ into $W$. Let $N'$ be another such neighborhood with corresponding projection $\pi_1'$ (where now $X$ is identified with $V \times W$ in some other way). Then $\varphi' \circ \varphi^{-1}(x) = \pi_1'(x, F(x))$, which shows that $\varphi' \circ \varphi^{-1}$ is a smooth map. Thus every submanifold in the sense of Chapter 3 possesses an atlas.

Exercise. Let $P^n$ (projective $n$-space) denote the space of all lines through the origin in $\mathbb{R}^{n+1}$. Any such line is determined by a nonzero vector lying on the line. Two such vectors, $\langle x^1, \ldots, x^{n+1}\rangle$ and $\langle y^1, \ldots, y^{n+1}\rangle$, determine the same line if and only if they differ by a factor, that is, $y^i = \lambda x^i$ for all $i$, where $\lambda$ is some (nonzero) real number. We can thus regard an element of $P^n$ as an equivalence class of nonzero vectors. For each $i$ between 1 and $n + 1$, let $U_i \subset P^n$ be the set of those elements coming from vectors with $x^i \ne 0$. Define $\alpha_i\colon U_i \to \mathbb{R}^n$ by sending
$$\langle x^1, \ldots, x^{n+1}\rangle \mapsto \left\langle \frac{x^1}{x^i}, \ldots, \frac{x^{i-1}}{x^i}, \frac{x^{i+1}}{x^i}, \ldots, \frac{x^{n+1}}{x^i}\right\rangle.$$
Show that the map $\alpha_i$ is well defined and that $\{(U_i, \alpha_i)\}$ is an atlas on $P^n$.

2. FUNCTIONS, CONVERGENCE

Let $\mathcal{A}$ be a $V$-atlas of class $C^k$ on a set $M$. Let $f$ be a real-valued function defined on $M$.
For a chart $(U_i, \varphi_i)$ we obtain a function $f_i$ defined on $\varphi_i(U_i)$ by setting
$$f_i = f \circ \varphi_i^{-1}. \tag{2.1}$$
The function $f_i$ can be regarded as the "local expression of $f$" in terms of the chart $(U_i, \varphi_i)$. In general, the functions $f_i$ will look quite different from one another. For example, let $M = S^n$, let $\mathcal{A}$ be the atlas described, and let $f$ be the function on the sphere assigning to the point $\langle x^1, \ldots, x^{n+1}\rangle$ the value $x^{n+1}$. Then
$$f_1(y) = f \circ \varphi_1^{-1}(y) = \frac{2}{1 + \|y\|^2} - 1,$$
while
$$f_2(y) = f \circ \varphi_2^{-1}(y) = 1 - \frac{2}{1 + \|y\|^2},$$
as one can check by solving the equations.
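"Solving the equations" can be spot-checked numerically. The Python sketch below is an illustration under stated assumptions: the inverse charts `phi1_inverse` and `phi2_inverse` are hypothetical helper names whose formulas are derived here from the chart definitions of Example 2, and the final assertion relates the two local expressions through the transition map $y \mapsto y/\|y\|^2$ computed in that example.

```python
import math

def norm_sq(y):
    return sum(c * c for c in y)

def phi1_inverse(y):
    """Inverse of the south-pole projection (derived from Example 2)."""
    r2 = norm_sq(y)
    return [2.0 * c / (1.0 + r2) for c in y] + [(1.0 - r2) / (1.0 + r2)]

def phi2_inverse(y):
    """Inverse of the north-pole projection (derived from Example 2)."""
    r2 = norm_sq(y)
    return [2.0 * c / (1.0 + r2) for c in y] + [(r2 - 1.0) / (r2 + 1.0)]

def f(x):
    """The function on the sphere: the last coordinate x^{n+1}."""
    return x[-1]

def f1(y):
    """Local expression of f in the first chart, as stated in the text."""
    return 2.0 / (1.0 + norm_sq(y)) - 1.0

def f2(y):
    """Local expression of f in the second chart, as stated in the text."""
    return 1.0 - 2.0 / (1.0 + norm_sq(y))

y = [0.7, -0.4, 1.5]
assert math.isclose(f(phi1_inverse(y)), f1(y))
assert math.isclose(f(phi2_inverse(y)), f2(y))

# The two expressions differ by the change of chart y -> y / ||y||^2.
inverted = [c / norm_sq(y) for c in y]
assert math.isclose(f2(y), f1(inverted))
```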
Returning to the general discussion, we observe that the functions $f_i$ are not completely independent of one another. In fact, it follows from the definition (2.1) that we have
$$f_i \circ \varphi_i \circ \varphi_j^{-1} = f_j \quad\text{on } \varphi_j(U_i \cap U_j). \tag{2.2}$$
[Thus in the example cited above we indeed have $f_2(y) = f_1(y/\|y\|^2)$, as is required by (2.2).]

We now come to a simple but important observation. Suppose we start with a collection of functions $\{f_i\}$, each $f_i$ defined on $\varphi_i(U_i)$, and such that (2.2) holds. Then there exists a unique function $f$ on $M$ such that $f_i = f \circ \varphi_i^{-1}$. In fact, define $f$ by setting $f(p) = f_i(\varphi_i(p))$ if $p \in U_i$. For $f$ to be well defined, we must be sure that this definition is consistent, i.e., that if $p$ is also in $U_j$, then $f_i(\varphi_i(p)) = f_j(\varphi_j(p))$; but this is exactly what (2.2) says.

We can thus think of a real-valued function in two ways: as either

i) an object defined invariantly on $M$, i.e., a map from $M$ to $\mathbb{R}$, or

ii) a collection of objects (in this case functions) one defined for each chart and satisfying certain "transition laws", namely (2.2).

This dual way of looking at objects on $M$ will recur quite frequently in what follows.

Let $M$ be a set with an atlas of class $C^k$. We will say that a function $f$ is of class $C^l$ $(l \le k)$ if each of the functions $f_i$ defined by (2.1) is of class $C^l$. Note that since $l \le k$, this can happen without any interference from (2.2). If $f_i \in C^l$ and $\varphi_i \circ \varphi_j^{-1} \in C^k$ $(k \ge l)$, then $f_i \circ (\varphi_i \circ \varphi_j^{-1}) \in C^l$. If $l$ were larger than $k$, then in general $f_i$ would not be of class $C^l$ if $f_j$ were, and there would be very few functions of class $C^l$.

Since we will not wish to constantly specify degrees of differentiability of our atlas, from now on when we speak of an atlas we shall mean an atlas of class $C^\infty$.

Let $M$ be a set with an atlas $\mathcal{A}$. We shall say that a sequence of points $\{x_k \in M\}$ converges to $x \in M$ if

i) there exists a chart $(U_i, \varphi_i) \in \mathcal{A}$ and an integer $N_i$ such that $x \in U_i$ and for all $k > N_i$, $x_k \in U_i$;

ii) $\{\varphi_i(x_k)\}_{k > N_i}$ converges to $\varphi_i(x)$.
Note that if $(U_j, \varphi_j)$ is any other chart with $x \in U_j$, then there exists an $N_j$ such that $x_k \in U_j$ for $k > N_j$ and $\varphi_j(x_k) \to \varphi_j(x)$. In fact, choose $N_j$ so that $\varphi_i(x_k) \in \varphi_i(U_i \cap U_j)$ for all $k > N_j$. (This is possible since $\varphi_i(U_i \cap U_j)$ is open by A1.) The fact that the $\varphi_j(x_k)$ converge to $\varphi_j(x)$ follows from the continuity of $\varphi_j \circ \varphi_i^{-1}$. It thus makes good sense to say that $\{x_k\}$ converges to $x$.

Warning. It does not make sense to say that a sequence $\{x_k\}$ is a Cauchy sequence. Thus, for example, let $M = S^n$ with the atlas described above. If $\{x_k\}$ is a sequence of points converging to the north pole in $S^n$, then $\varphi_1(x_k) \to 0$, while $\varphi_2(x_k) \to \infty$. This example becomes even more sticky if we remove the north pole, i.e., let $M = S^n - \{\langle 0, \ldots, 0, 1\rangle\}$ and define the charts as before.
Then $\{x_k\}$ has no limit (in $M$). Clearly, $\{\varphi_1(x_k)\}$ is a Cauchy sequence, while $\{\varphi_2(x_k)\}$ is not.

Once we have a notion of convergence, we can talk about such things as open sets and closed sets. We could also define them directly. For instance, a set $U$ is open if $\varphi_i(U \cap U_i)$ is an open subset of $\varphi_i(U_i)$ for all charts $(U_i, \varphi_i)$, and so on.

EXERCISES

2.1 Show that the above definition of a set's being open is consistent, i.e., that there exist nonempty open sets. (In fact, each of the $U_i$'s is open.)

2.2 Show that a sequence $\{x_\alpha\}$ converges to $x$ if and only if for every open set $U$ containing $x$ there is an $N_U$ with $x_\alpha \in U$ for $\alpha > N_U$.

Let $\mathcal{A} = \{(U_i, \varphi_i)\}$ be an atlas on $M$, and let $U$ be an open subset of $M$ relative to this atlas. Let $\mathcal{A} \upharpoonright U$ be the collection of all pairs $(U_i \cap U, \varphi_i \upharpoonright U)$. It is easy to check that $\mathcal{A} \upharpoonright U$ is an atlas on $U$. We shall call it the restriction of $\mathcal{A}$ to $U$.

Let $f$ be a function defined on the open set $U$. We say that $f$ is of class $C^l$ on $U$ if it is of class $C^l$ relative to the atlas $\mathcal{A} \upharpoonright U$ on $U$. For later convenience we shall say that a function $f$ defined on a subset of $M$ is of class $C^l$ if

i) the domain of $f$ is some open set $U$ of $M$, and

ii) $f$ is of class $C^l$ on $U$.

3. DIFFERENTIABLE MANIFOLDS

In our discussion of the examples in Section 1, the particular choice of atlas that we made in each case was rather arbitrary. We could equally well have introduced a different atlas in each case without changing the class of differentiable functions, or the class of open sets, or convergent sequences, and so on. We therefore introduce an equivalence relation between atlases on $M$:

Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be atlases on $M$. We say that they are equivalent if their union $\mathcal{A}_1 \cup \mathcal{A}_2$ is again an atlas on $M$.

The crucial condition is that A1 still hold for the union.
This means that for any charts $(U_i, \varphi_i) \in \mathcal{A}_1$ and $(W_j, \psi_j) \in \mathcal{A}_2$ the sets $\varphi_i(U_i \cap W_j)$ and $\psi_j(U_i \cap W_j)$ are open and $\varphi_i \circ \psi_j^{-1}$ is a differentiable map of $\psi_j(U_i \cap W_j)$ onto $\varphi_i(U_i \cap W_j)$ with a differentiable inverse.

It is clear that the relation introduced is an equivalence relation. Furthermore, it is an easy exercise to check that if $f$ is a function of class $C^l$ with respect to a given atlas, it is of class $C^l$ with respect to any equivalent one. The same is true for the notions of open set and convergence.
Definition 3.1. A set $M$ together with an equivalence class of atlases on $M$ is called a differentiable manifold if it satisfies the "Hausdorff property": For any two points $x_1 \ne x_2$ of $M$ there are open sets $U_1$ and $U_2$ with $x_1 \in U_1$ and $x_2 \in U_2$ with $U_1 \cap U_2 = \emptyset$.

In what follows we shall (by abuse of the language) denote a differentiable manifold by $M$, where the equivalence class of atlases is understood. By an atlas of $M$ we shall then mean an atlas belonging to the given equivalence class, and by a chart of $M$ we shall mean a chart belonging to some atlas of $M$. We shall also adopt the notational convention that $V$ is the Banach space where the charts on $M$ take their values (and shall say that $M$ is a $V$-manifold). If there are several manifolds, $M_1$, $M_2$, etc., under discussion, we shall denote the corresponding vector spaces by $V_1$, $V_2$, etc. If $V = \mathbb{R}^n$, we say that $M$ is an $n$-dimensional manifold.

Let $M_1$ and $M_2$ be differentiable manifolds. A map $\varphi\colon M_1 \to M_2$ is called continuous if for any open set $U_2 \subset M_2$ the set $\varphi^{-1}(U_2)$ is an open subset of $M_1$.

Let $x_2 \in M_2$, and let $U_2$ be any open set containing $x_2$. If $\varphi(x_1) = x_2$, then $\varphi^{-1}(U_2)$ is an open set containing $x_1$. If $(W, \alpha)$ is a chart about $x_1$, then $W \cap \varphi^{-1}(U_2)$ is an open subset of $W$, and $\alpha(W \cap \varphi^{-1}(U_2))$ is an open set in $V_1$ containing $\alpha(x_1)$. Therefore, there exists an $\epsilon > 0$ such that $\varphi(x) \in U_2$ for all $x \in W$ with $\|\alpha(x) - \alpha(x_1)\| < \epsilon$. In this sense, all points "close to $x_1$" are mapped "close to $x_2$". Note that the choice of $\epsilon$ will depend on the chart $(W, \alpha)$ as well as on $x_1$, $x_2$, $U_2$, and $\varphi$.

If $M_1$, $M_2$, and $M_3$ are differentiable manifolds, and if $\varphi\colon M_1 \to M_2$ and $\psi\colon M_2 \to M_3$ are continuous maps, it is easy to see that their composition $\psi \circ \varphi$ is a continuous map from $M_1$ to $M_3$.

Let $\varphi$ be a continuous map from $M_1$ to $M_2$. Let $(W_1, \alpha_1)$ be a chart on $M_1$ and $(W_2, \alpha_2)$ a chart on $M_2$. We say that these charts are compatible (under $\varphi$) if $\varphi(W_1) \subset W_2$.
If $\mathcal{A}_2$ is an atlas on $M_2$ and $\mathcal{A}_1$ is an atlas on $M_1$, we say that $\mathcal{A}_1$ and $\mathcal{A}_2$ are compatible under $\varphi$ if for every $(W_1, \alpha_1) \in \mathcal{A}_1$ there exists a $(W_2, \alpha_2) \in \mathcal{A}_2$ compatible with it, i.e., such that $\varphi(W_1) \subset W_2$. (Note that the map $\alpha_2 \circ (\varphi \upharpoonright W_1) \circ \alpha_1^{-1}$ is then a continuous map of an open subset of $V_1$ into $V_2$.)

Given $\mathcal{A}_2$ and $\varphi$, we can always find an $\mathcal{A}_1$ compatible with $\mathcal{A}_2$ under $\varphi$. In fact, let $\mathcal{A}_1'$ be any atlas on $M_1$, and set
$$\mathcal{A}_1 = \{(W_1 \cap \varphi^{-1}(W_2),\ \alpha \upharpoonright (W_1 \cap \varphi^{-1}(W_2)))\},$$
where $(W_1, \alpha)$ ranges over all charts of $\mathcal{A}_1'$ and $(W_2, \beta)$ ranges over all charts of $\mathcal{A}_2$.

Definition 3.2. Let $M_1$ and $M_2$ be differentiable manifolds, and let $\varphi$ be a map: $M_1 \to M_2$. We say that $\varphi$ is differentiable if the following hold:

i) $\varphi$ is continuous.

ii) Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be compatible atlases under $\varphi$. Then for any compatible $(W_1, \alpha_1) \in \mathcal{A}_1$ and $(W_2, \alpha_2) \in \mathcal{A}_2$, the map
$$\alpha_2 \circ \varphi \circ \alpha_1^{-1}\colon \alpha_1(W_1) \to \alpha_2(W_2)$$
is differentiable (as a map of an open subset of a Banach space into a Banach space). (See Fig. 9.5.)
[Fig. 9.5]

In order to check that a continuous map $\varphi$ is differentiable, it suffices to check much less than (ii). Condition (ii) relates to any pair of compatible atlases and any pair of compatible charts. In fact, we can assert:

Proposition 3.1. Let $\varphi\colon M_1 \to M_2$ be continuous, and let $\mathcal{A}_1$ and $\mathcal{A}_2$ be compatible atlases under $\varphi$. Suppose that for every $(W_1, \alpha_1) \in \mathcal{A}_1$ there exists a $(W_2, \alpha_2) \in \mathcal{A}_2$ with $\varphi(W_1) \subset W_2$ and $\alpha_2 \circ \varphi \circ \alpha_1^{-1}$ differentiable. Then $\varphi$ is differentiable.

Proof. Let $(U_1, \beta_1)$ and $(U_2, \beta_2)$ be any charts on $M_1$ and $M_2$ with $\varphi(U_1) \subset U_2$. We must show that $\beta_2 \circ \varphi \circ \beta_1^{-1}$ is differentiable. It suffices to show that it is differentiable in the neighborhood of every point $\beta_1(x_1)$, where $x_1 \in U_1$. Choose $(W_1, \alpha_1) \in \mathcal{A}_1$ with $x_1 \in W_1$, and choose $(W_2, \alpha_2) \in \mathcal{A}_2$ with $\varphi(W_1) \subset W_2$. Then on $\beta_1(W_1 \cap U_1)$, we have
$$\beta_2 \circ \varphi \circ \beta_1^{-1} = (\beta_2 \circ \alpha_2^{-1}) \circ (\alpha_2 \circ \varphi \circ \alpha_1^{-1}) \circ (\alpha_1 \circ \beta_1^{-1}),$$
so that the left-hand side is differentiable. $\square$

In other words, it suffices to verify differentiability with one pair of atlases. We have as a consequence:

Proposition 3.2. Let $\varphi\colon M_1 \to M_2$ and $\psi\colon M_2 \to M_3$ be differentiable. Then $\psi \circ \varphi$ is differentiable.

Proof. Let $\mathcal{A}_3$ be an atlas on $M_3$. Choose $\mathcal{A}_2$ compatible with $\mathcal{A}_3$ under $\psi$, and then choose an atlas $\mathcal{A}_1$ on $M_1$ compatible with $\mathcal{A}_2$ under $\varphi$. For any $(W_1, \alpha_1) \in \mathcal{A}_1$ choose $(W_2, \alpha_2) \in \mathcal{A}_2$ and $(W_3, \alpha_3) \in \mathcal{A}_3$ with $\varphi(W_1) \subset W_2$ and $\psi(W_2) \subset W_3$. Then
$$\alpha_3 \circ \psi \circ \varphi \circ \alpha_1^{-1} = (\alpha_3 \circ \psi \circ \alpha_2^{-1}) \circ (\alpha_2 \circ \varphi \circ \alpha_1^{-1})$$
is differentiable. $\square$

Exercise 3.1. Let $M_1 = S^n$, let $M_2 = P^n$, and let $\varphi\colon M_1 \to M_2$ be the map sending each point of the unit sphere into the line it determines. (Note that two antipodal
points of $S^n$ go into the same point of $P^n$.) Construct compatible atlases for $\varphi$ and show that $\varphi$ is differentiable.

Note that if $f$ is any function on $M$ with values in a Banach space, then $f$ is differentiable as a function (in the sense of Section 2) if and only if it is differentiable as a map of manifolds.

In particular, let $\varphi\colon M_1 \to M_2$ be a differentiable map, and let $f$ be a differentiable function on $M_2$ (defined on some open subset, say $U_2$). Then $f \circ \varphi$ is a differentiable function on $M_1$ [defined on the open set $\varphi^{-1}(U_2)$]. Thus $\varphi$ "pulls back" a differentiable function on $M_2$ to $M_1$. From this point of view we can say that $\varphi$ induces a map from the collection of differentiable functions on $M_2$ to the collection of differentiable functions on $M_1$. We shall denote this induced map by $\varphi^*$. Thus
$$\varphi^*\colon \text{differentiable functions on } M_2 \to \text{differentiable functions on } M_1$$
is given by
$$\varphi^*[f] = f \circ \varphi.$$
If $\psi\colon M_2 \to M_3$ is a second differentiable map, then $(\psi \circ \varphi)^*$ goes from functions on $M_3$ to functions on $M_1$, and we have
$$(\psi \circ \varphi)^* = \varphi^* \circ \psi^* \tag{3.1}$$
(note the change of order). In fact, for $g$ on $M_3$,
$$(\psi \circ \varphi)^* g = g \circ (\psi \circ \varphi) = (g \circ \psi) \circ \varphi = \varphi^*[\psi^*[g]].$$

Observe that if $\varphi$ is any map from $M_1 \to M_2$ and $f$ is any function defined on a subset $S_2$ of $M_2$, then the "pullback" $\varphi^*[f] = f \circ \varphi$ is a function defined on the subset $\varphi^{-1}(S_2)$ of $M_1$. The fact that $\varphi$ is continuous allows us to conclude that if $S_2$ is open, then so is $\varphi^{-1}(S_2)$. The fact that $\varphi$ is differentiable implies that $\varphi^*[f]$ is differentiable whenever $f$ is.

The map $\varphi^*$ commutes with all algebraic operations whenever they are defined. More precisely, suppose $f$ and $g$ take values in the same vector space and have domains of definition $U_1$ and $U_2$.
Then $f + g$ is defined on $U_1 \cap U_2$ and $\varphi^*[f] + \varphi^*[g]$ is defined on $\varphi^{-1}(U_1 \cap U_2)$, and we clearly have
$$\varphi^*[f + g] = \varphi^*[f] + \varphi^*[g].$$

EXERCISES

3.2 Let $M_2$ be a finite-dimensional manifold, and let $\varphi\colon M_1 \to M_2$ be continuous. Suppose that $\varphi^*[f]$ is differentiable for any (locally defined) differentiable real-valued function $f$. Conclude that $\varphi$ is differentiable.

3.3 Show that if $\varphi$ is a bounded linear map between Banach spaces, then $\varphi^*$ as defined above is an extension of $\varphi^*$ as defined in Section 3, Chapter 2.
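The order reversal in Eq. (3.1) is easy to see when the pullback is modeled as a higher-order function. The following Python sketch is only an illustration: the helper names `pullback` and `compose` and the particular maps `phi`, `psi`, `g`, and `h` are hypothetical choices made here, with all manifolds taken to be Euclidean spaces under their identity charts.

```python
import math

def pullback(phi):
    """Return phi^*, the operator sending a function f on the target to f o phi."""
    def phi_star(f):
        return lambda x: f(phi(x))
    return phi_star

def compose(psi, phi):
    """The composite map psi o phi."""
    return lambda x: psi(phi(x))

# Hypothetical maps: phi from R to R^2, psi from R^2 to R^3, g and h on R^3.
phi = lambda t: (math.cos(t), math.sin(t))
psi = lambda p: (p[0], p[1], p[0] * p[1])
g = lambda q: q[0] + 2.0 * q[1] - q[2]
h = lambda q: q[0] * q[2]

lhs = pullback(compose(psi, phi))(g)    # (psi o phi)^* g
rhs = pullback(phi)(pullback(psi)(g))   # phi^*[psi^*[g]]
for t in (0.0, 0.4, 2.5):
    assert math.isclose(lhs(t), rhs(t))

# The pullback also commutes with addition of functions.
sum_gh = lambda q: g(q) + h(q)
for t in (0.0, 0.4, 2.5):
    assert math.isclose(pullback(phi)(pullback(psi)(sum_gh))(t),
                        pullback(compose(psi, phi))(g)(t)
                        + pullback(compose(psi, phi))(h)(t))
```

Note that `pullback(phi)` is applied last even though `phi` acts first, which is the change of order recorded in (3.1).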
4. THE TANGENT SPACE

In this section we are going to construct an "approximating vector space" to a differentiable manifold at each point of the manifold. This will allow us to formulate most of the notions of the differential calculus on manifolds.

Let $M$ be a differentiable manifold, and let $x$ be a point of $M$ (Fig. 9.6). Let $I \subset \mathbb{R}$ be an interval containing the origin. Let $\varphi$ be a differentiable map of $I$ into $M$ such that $\varphi(0) = x$. We will call $\varphi$ a (differentiable) curve through $x$. Let $f$ be any differentiable real-valued function on $M$ defined in a neighborhood of $x$. Then $\varphi^*[f]$ is differentiable on a neighborhood of the origin in $\mathbb{R}$, and we can consider its derivative at the origin. Define the operator $D_\varphi$ by
$$D_\varphi(f) = \left.\frac{d\,\varphi^*[f]}{dt}\right|_{t=0}.$$
In view of the linearity of $\varphi^*$, the map $f \mapsto D_\varphi(f)$ is linear:
$$D_\varphi(af + bg) = a\,D_\varphi(f) + b\,D_\varphi(g).$$
Similarly, we have Leibnitz's rule:
$$D_\varphi(fg) = f(x)\,D_\varphi(g) + g(x)\,D_\varphi(f),$$
which can easily be checked.

[Fig. 9.6]

The functional $D_\varphi$ depends on the curve $\varphi$. If $\psi$ is a second curve, then, in general, $D_\psi \ne D_\varphi$. If, however, $D_\psi = D_\varphi$, then we say that the curves $\varphi$ and $\psi$ are tangent at $x$, and we write $\varphi \sim \psi$. Thus $\varphi \sim \psi$ if and only if
$$D_\varphi(f) = D_\psi(f)$$
for all differentiable functions $f$. It is easy to check that $\sim$ is an equivalence relation.

An equivalence class of curves through $x$ will be called a tangent vector at $x$. If $\xi$ is a tangent vector at $x$ and $\varphi \in \xi$, we say that $\xi$ is tangent to $\varphi$ at $x$. For any differentiable function $f$ defined about $x$ and any tangent vector $\xi$, we set
$$\xi(f) = D_\varphi(f), \quad\text{where } \varphi \in \xi.$$
Thus $\xi$ gives us a functional on differentiable functions defined about $x$. We have
$$\xi(af + bg) = a\,\xi(f) + b\,\xi(g), \tag{4.1}$$
$$\xi(fg) = f(x)\,\xi(g) + g(x)\,\xi(f). \tag{4.2}$$

Let us examine what the equivalence relation $\sim$ says in terms of a chart $(W, \alpha)$ about $x$. The functional $D_\varphi(f)$ can be written as
$$D_\varphi(f) = \left.\frac{d(f \circ \varphi)}{dt}\right|_{t=0} = \left.\frac{d\big((f \circ \alpha^{-1}) \circ (\alpha \circ \varphi)\big)}{dt}\right|_{t=0}.$$
If we set $\Phi = \alpha \circ \varphi$ and $F = f \circ \alpha^{-1}$, then $\Phi$ is a parametrized curve in a Banach space and $F$ is a differentiable function there. We can thus write
$$D_\varphi(f) = dF(\Phi'(0)) = D_{\Phi'(0)} F.$$
From this expression we see (setting $\Psi = \alpha \circ \psi$) that $\psi \sim \varphi$ if and only if $\Phi'(0) = \Psi'(0)$. We thus see that in terms of a chart $(W, \alpha)$, every tangent vector $\xi$ at $x$ corresponds to a unique vector $\xi_\alpha \in V$ given by
$$\xi_\alpha = (\alpha \circ \varphi)'(0), \quad\text{where } \varphi \in \xi.$$
Conversely, given any $v \in V$, there is a tangent vector $\xi$ with $\xi_\alpha = v$. In fact, define $\varphi$ by setting $\varphi(t) = \alpha^{-1}(\alpha(x) + tv)$. Then $\varphi$ is defined in a small enough interval about 0, and $(\alpha \circ \varphi)'(0) = v$.

In short, a choice of chart allows us to identify the set of all tangent vectors at $x$ with $V$. Let $(U, \beta)$ be a second chart about $x$. Then
$$\xi_\beta = (\beta \circ \varphi)'(0) = \big((\beta \circ \alpha^{-1}) \circ (\alpha \circ \varphi)\big)'(0).$$
By the chain rule we thus have
$$\xi_\beta = J_{\beta \circ \alpha^{-1}}(\alpha(x))\,\xi_\alpha, \tag{4.3}$$
where $J_\psi(p)$ is the differential $d\psi_p$ of $\psi$ at $p$.

Since $J_{\beta \circ \alpha^{-1}}(\alpha(x))$ is a linear map of $V$ into itself, Eq. (4.3) says that the set of all tangent vectors at $x$ can be identified with $V$, the identification being determined up to an automorphism of $V$. In particular, we can make the set of all tangent vectors at $x$ into a vector space by defining
$$a\xi + b\eta = \zeta,$$
where $\zeta$ is determined by
$$\zeta_\alpha = a\,\xi_\alpha + b\,\eta_\alpha$$
for some chart $\alpha$. Equation (4.3) shows that this definition is independent of $\alpha$.

We shall denote the space of tangent vectors at $x$ by $T_x(M)$ and shall call it the tangent space (to $M$) at $x$.

Let $\psi$ be a differentiable map of $M_1$ to $M_2$, and let $\varphi$ be a curve passing through $x \in M_1$ (see Fig. 9.7). Then $\psi \circ \varphi$ is a curve passing through $\psi(x) \in M_2$. It is easy to check that if $\varphi \sim \bar\varphi$, then $\psi \circ \varphi \sim \psi \circ \bar\varphi$. Thus the map $\psi$ induces a mapping of $T_x(M_1)$ into $T_{\psi(x)}(M_2)$, which we shall denote by $\psi_{*x}$. To repeat,

[Fig. 9.7]
if $\xi \in T_x(M_1)$, then $\psi_{*x}(\xi) = \eta$ is determined by
$$\psi \circ \varphi \in \eta \quad\text{for all } \varphi \in \xi.$$
Let $(U, \alpha)$ be a chart about $x$, and let $(W, \beta)$ be a chart about $\psi(x)$. Then
$$\xi_\alpha = (\alpha \circ \varphi)'(0)$$
and
$$\eta_\beta = (\beta \circ \psi \circ \varphi)'(0) = \big((\beta \circ \psi \circ \alpha^{-1}) \circ (\alpha \circ \varphi)\big)'(0).$$
By the chain rule we can thus write
$$\eta_\beta = J_{\beta \circ \psi \circ \alpha^{-1}}(\alpha(x))\,\xi_\alpha.$$
This says that if we identify $T_x(M_1)$ with $V_1$ via $\alpha$ and identify $T_{\psi(x)}(M_2)$ with $V_2$ via $\beta$, then $\psi_{*x}$ becomes identified with the linear map $J_{\beta \circ \psi \circ \alpha^{-1}}(\alpha(x))$. In particular, the map $\psi_{*x}$ is a continuous linear mapping from $T_x(M_1)$ to $T_{\psi(x)}(M_2)$.

If $\varphi\colon M_1 \to M_2$ and $\psi\colon M_2 \to M_3$ are two differentiable mappings, then it follows immediately from the definitions that
$$(\psi \circ \varphi)_{*x} = \psi_{*\varphi(x)} \circ \varphi_{*x}. \tag{4.4}$$

We have seen that the choice of chart $(U, \alpha)$ identifies $T_x(M)$ with $V$. Now suppose that $M$ is actually $V$ itself (or an open subset of $V$) regarded as a differentiable manifold. Then $M$ has a distinguished chart, namely $(M, \mathrm{id})$. Thus on an open subset of $V$ the identity chart gives us a distinguished way of identifying $T_x(M)$ with $V$. It is sometimes convenient to picture $T_x(M)$ as a copy of $V$ whose origin has been translated to $x$. We would then draw a tangent vector at $x$ as an arrow originating at $x$. (See Fig. 9.8.)

Now suppose that $M$ is a general manifold and that $\psi$ is a differentiable map of $M$ into a vector space $V_1$. Then $\psi_*(T_x(M))$ is a subspace of $T_{\psi(x)}(V_1)$. If we regard $\psi_*(T_x(M))$ as a subspace of $V_1$ and consider the corresponding hyperplane through $\psi(x)$, we get the "plane tangent to $\psi(M)$ at $\psi(x)$" in the intuitive sense (Fig. 9.9).

[Figs. 9.8 and 9.9]
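In identity charts the induced map $\psi_{*x}$ is just the Jacobian applied to the velocity of a curve, and this can be illustrated numerically. The Python sketch below rests on assumptions made here for illustration: the map `psi`, the two curves, and the helper `velocity_at_zero` (a central finite difference) are hypothetical choices, not taken from the text.

```python
import math

def psi(p):
    """A hypothetical smooth map from R^2 to R^3 (a paraboloid graph)."""
    u, v = p
    return (u, v, u * u + v * v)

def jacobian_psi(p):
    """Differential of psi at p, written out by hand as a 3 x 2 matrix."""
    u, v = p
    return [[1.0, 0.0], [0.0, 1.0], [2.0 * u, 2.0 * v]]

def curve_a(t):
    """A curve through x = (1, -1) with velocity (2, 1) at t = 0."""
    return (1.0 + 2.0 * t, -1.0 + t + t * t)

def curve_b(t):
    """A different curve through the same point with the same velocity."""
    return (1.0 + 2.0 * t + 5.0 * t * t, -1.0 + t)

def velocity_at_zero(curve, h=1e-6):
    """Central-difference approximation to curve'(0)."""
    plus, minus = curve(h), curve(-h)
    return [(a - b) / (2.0 * h) for a, b in zip(plus, minus)]

def apply(matrix, vector):
    return [sum(m * c for m, c in zip(row, vector)) for row in matrix]

xi = [2.0, 1.0]                                  # common velocity of both curves
eta = apply(jacobian_psi((1.0, -1.0)), xi)       # chain-rule prediction
for curve in (curve_a, curve_b):
    pushed = velocity_at_zero(lambda t: psi(curve(t)))
    assert all(math.isclose(a, b, abs_tol=1e-6) for a, b in zip(pushed, eta))
```

Both curves are tangent at $x$, and the check shows their images under `psi` have the same velocity, which is why the induced map on tangent vectors is well defined.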
It is very convenient to think of tangent vectors in this way, that is, to regard them as vectors tangent to $M$ if $M$ were mapped into a vector space.

If $f$ is a real-valued differentiable function defined in a neighborhood $U$ of $x \in M$, then we can regard it as a map of the manifold $U$ to the manifold $\mathbb{R}^1$. We therefore get a map $f_{*x}\colon T_x(M) \to T_{f(x)}(\mathbb{R}^1)$. Recall that we identify $T_y(\mathbb{R}^1)$ with $\mathbb{R}^1$ for any $y \in \mathbb{R}^1$. Therefore, $f_{*x}$ can be viewed as a map from $T_x(M)$ to $\mathbb{R}^1$. The reader should check that this map is indeed given by
$$f_*(\xi) = \xi(f) \quad\text{for } \xi \in T_x(M). \tag{4.5}$$
In particular, if we take $M_3 = \mathbb{R}$ and $\psi = f$ in (4.4), we can assert:

Let $\psi$ be a differentiable map of $M_1$ to $M_2$, and let $f$ be a differentiable function on $M_2$ defined in a neighborhood of $\psi(x)$. Then for any $\xi \in T_x(M_1)$,
$$\psi_{*x}(\xi)(f) = \xi(\psi^*[f]).$$

From now on, we shall frequently drop the subscript $x$ in $\psi_{*x}$ when it can be understood from the context. Thus we would write (4.4) as $(\psi \circ \varphi)_* = \psi_* \circ \varphi_*$.

Some authors call the mapping $\psi_{*x}$ the differential of $\psi$ at $x$ and designate it $d\psi_x$. If $M_1$ and $M_2$ are open subsets of Banach spaces $V_1$ and $V_2$ (and hence are differentiable manifolds under their identity charts), then $\psi_{*x}$ as defined above does reduce to the differential $d\psi_x$ when $T_x(M_i)$ is identified with $V_i$. This reduction does depend on the identification, however.

5. FLOWS AND VECTOR FIELDS

Let $M_1$ and $M_2$ be differentiable manifolds. A map $g$ from $M_1 \to M_2$ is called a diffeomorphism if $g$ is a differentiable one-to-one map of $M_1$ onto $M_2$ such that $g^{-1}$ is also differentiable.

Let $M$ be a differentiable manifold. A map $\varphi\colon M \times \mathbb{R} \to M$ is called a one-parameter group if

i) $\varphi$ is differentiable;

ii) $\varphi(x, 0) = x$ for all $x \in M$;

iii) $\varphi(\varphi(x, s), t) = \varphi(x, s + t)$ for all $x \in M$ and $s, t \in \mathbb{R}$.

We can express conditions (ii) and (iii) a little differently. Let $\varphi_t\colon M \to M$ be given by
$$\varphi_t(x) = \varphi(x, t).$$
For each $t \in \mathbb{R}$ the map $\varphi_t$ is differentiable.
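In fact, $\varphi_t = \varphi \circ \iota_t$, where $\iota_t$ is the differentiable map of $M \to M \times \mathbb{R}$ given by $\iota_t(x) = (x, t)$. Then condition (ii) says that $\varphi_0 = \mathrm{id}$. Condition (iii) says that
$$\varphi_t \circ \varphi_s = \varphi_{s+t}.$$
If we take $s = -t$ in this equation, we get $\varphi_t \circ \varphi_{-t} = \mathrm{id}$. Thus for each $t$ the map $\varphi_t$ is a diffeomorphism and $(\varphi_t)^{-1} = \varphi_{-t}$.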
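Conditions (ii) and (iii) and the resulting inverse formula can be tested on a concrete flow. The Python sketch below uses a hypothetical nonlinear example chosen here for illustration, namely the flow on $M = \mathbb{R}$ obtained by conjugating translation with the hyperbolic sine; it is not one of the book's examples.

```python
import math

def flow(x, t):
    """A hypothetical one-parameter group on R: translate by t in the
    coordinate asinh(x), then map back with sinh."""
    return math.sinh(math.asinh(x) + t)

def close(a, b):
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)

x, s, t = 0.4, 1.1, -2.3

# (ii): the time-zero map is the identity.
assert close(flow(x, 0.0), x)

# (iii): flowing for time s and then for time t equals flowing for s + t.
assert close(flow(flow(x, s), t), flow(x, s + t))

# Consequence: the time-(-t) map inverts the time-t map.
assert close(flow(flow(x, t), -t), x)
```

Any flow of the form $h^{-1}(h(x) + t)$ with $h$ a diffeomorphism of $\mathbb{R}$ satisfies the same three checks, since it is translation viewed in a different coordinate.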
We now give some examples of one-parameter groups.

Example 1. Let $M = V$ be a vector space, and let $w \in V$. Let $\varphi\colon V \times \mathbb{R} \to V$ be given by
$$\varphi(v, t) = v + tw.$$
It is easy to check that (i), (ii), and (iii) are satisfied. (See Fig. 9.10.)

Example 2. Let $M = V$ be a finite-dimensional vector space, and let $A$ be a linear transformation $A\colon V \to V$. Recall that the linear transformation $e^{tA}$ is defined by
$$e^{tA} = 1 + tA + \frac{t^2 A^2}{2!} + \frac{t^3 A^3}{3!} + \cdots,$$
i.e., for any $v \in V$,
$$e^{tA} v = \sum_{j=0}^{\infty} \frac{t^j A^j v}{j!}.$$
(See Figs. 9.11 and 9.12.) Since the convergence of the series is uniform on any
compact set of $\langle v, t\rangle$, the map $\varphi\colon M \times \mathbb{R} \to M$ given by
$$\varphi(v, t) = e^{tA} v$$
is easily seen to be differentiable and to satisfy (ii) and (iii) as well.

Example 3. Let $M$ be the circle $S^1$, and let $a$ be any real number. Let $\varphi_t$ be the diffeomorphism consisting of rotation through angle $ta$. In terms of the atlas $\mathcal{A} = \{(U_1, \theta_1), (U_2, \theta_2)\}$, the map $\varphi$ is given by
$$\theta_1(\varphi(x, t)) = \begin{cases} \theta_1(x) + ta & \text{if } x \in U_1,\ \theta_1(x) < 2\pi - ta, \\ \theta_1(x) + ta - 2\pi & \text{if } x \in U_1,\ \theta_1(x) > 2\pi - ta, \end{cases}$$
$$\theta_2(\varphi(x, t)) = \begin{cases} \theta_2(x) + ta & \text{if } x \in U_2,\ \theta_2(x) < 2\pi + \pi/2 - ta, \\ \theta_2(x) + ta - 2\pi & \text{if } x \in U_2,\ \theta_2(x) > 2\pi + \pi/2 - ta. \end{cases}$$
(Strictly speaking, this doesn't quite define $\varphi$ for all values of $\langle x, t\rangle$. If $x = \langle 1, 0\rangle$ and $ta = \pi/2$, then $x \notin U_1$ and $\varphi(x, t) \notin U_2$. This is easily remedied by the introduction of a third chart.) It is easy to see that $\varphi$ is a one-parameter group.

Example 4. Let $M = S^1 \times S^1$ be the torus, and let $a$ and $b$ be real numbers. Write $x \in M$ as $x = \langle x_1, x_2\rangle$, where $x_i \in S^1$. Define $\varphi^{\langle a,b\rangle}$ by
$$\varphi^{\langle a,b\rangle}(x_1, x_2, t) = \langle \varphi^a_t(x_1), \varphi^b_t(x_2)\rangle,$$
where $\varphi^a$ and $\varphi^b$ are given in Example 3. Then $\varphi^{\langle a,b\rangle}$ is a one-parameter group and indeed a rather instructive one. The reader should check to see that essentially different behavior arises according to whether $b/a$ is rational or irrational.

[The construction of Example 4 from Example 3 can be generalized as follows. If $\varphi\colon M \times \mathbb{R} \to M$ and $\psi\colon N \times \mathbb{R} \to N$ are one-parameter groups, then we can construct a one-parameter group on $M \times N$ given by $\varphi_t \times \psi_t$. The map of $M \times N \times \mathbb{R} \to M \times N$ sending $\langle x, y, t\rangle \mapsto \langle \varphi_t(x), \psi_t(y)\rangle$ is differentiable because it can be written as the composite $(\varphi \times \psi) \circ \Delta$, where
$$\varphi \times \psi\colon M \times \mathbb{R} \times N \times \mathbb{R} \to M \times N,$$
and
$$\Delta\colon M \times N \times \mathbb{R} \to M \times \mathbb{R} \times N \times \mathbb{R}$$
is given by $\Delta(x, y, t) = \langle x, t, y, t\rangle$.]

In each of the four preceding examples we started out with an "infinitesimal generator" to construct the one-parameter group, namely, the vector $w$ in Example 1, the linear transformation $A$ in Example 2, the real number $a$ in Example 3, and the pair $\langle a, b\rangle$ in Example 4.
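For Example 2 the relation between the group and its generator can be illustrated numerically. The Python sketch below is a rough illustration under assumptions made here: `exp_tA` sums a truncated power series (40 terms, a hypothetical cutoff that is ample for small matrices and moderate $t$), the matrix `A` is a sample choice generating plane rotations, and the generator is recovered by a central difference at $t = 0$.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_vec(A, v):
    return [sum(a * c for a, c in zip(row, v)) for row in A]

def exp_tA(A, t, terms=40):
    """Truncated series 1 + tA + t^2 A^2 / 2! + ... for a square matrix A."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for j in range(1, terms):
        # term now holds t^j A^j / j!
        term = [[t * entry / j for entry in row] for row in mat_mul(term, A)]
        result = [[r + s for r, s in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result

A = [[0.0, -1.0], [1.0, 0.0]]     # sample generator: rotation of the plane
s, t = 0.7, -1.9

# Group law: e^{(s+t)A} = e^{sA} e^{tA}.
lhs = exp_tA(A, s + t)
rhs = mat_mul(exp_tA(A, s), exp_tA(A, t))
assert all(math.isclose(a, b, abs_tol=1e-12)
           for r1, r2 in zip(lhs, rhs) for a, b in zip(r1, r2))

# Generator: the velocity of t -> e^{tA} v at t = 0 is A v.
v, h = [2.0, -1.0], 1e-5
plus, minus = mat_vec(exp_tA(A, h), v), mat_vec(exp_tA(A, -h), v)
velocity = [(p - m) / (2.0 * h) for p, m in zip(plus, minus)]
assert all(math.isclose(a, b, abs_tol=1e-8) for a, b in zip(velocity, mat_vec(A, v)))
```

The truncated series is used only to mirror the definition in the text; it is not a recommended way to compute matrix exponentials in general.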
We will now show that associated with any one-parameter group on a manifold, there is a nice object which we can regard as the infinitesimal generator of the one-parameter group. Let $\varphi\colon M \times \mathbb{R} \to M$ be a one-parameter group. For each $x \in M$ consider the map $\varphi_x$ of $\mathbb{R} \to M$ given by
$$\varphi_x(t) = \varphi(x, t).$$
In view of condition (ii), we know that $\varphi_x(0) = x$. Thus $\varphi_x$ is a curve passing through $x$ (see Fig. 9.13). Let us denote the tangent to this curve at $t = 0$ by $X(x)$. We thus get a mapping $X$ which assigns to each $x \in M$ a vector $X(x) \in T_x(M)$. Any such map, i.e., any rule assigning to each $x \in M$ a vector in $T_x(M)$, will be called a vector field. We have seen that every one-parameter group gives rise to a vector field which we shall call the infinitesimal generator of the one-parameter group.

[Fig. 9.13]

Let $Y$ be a vector field on $M$, and let $(U, \alpha)$ be a chart on $M$. For each $x \in U$ we get a vector $Y(x)_\alpha \in V$. We can regard this as defining a $V$-valued function $Y_\alpha$ on $\alpha(U)$:
$$Y_\alpha(v) = Y(\alpha^{-1}(v))_\alpha \quad\text{for } v \in \alpha(U). \tag{5.1}$$
Let $(W, \beta)$ be a second chart, and let $Y_\beta$ be the corresponding $V$-valued function on $\beta(W)$. If we compare (5.1) with (4.3), we see that
$$Y_\beta(\beta \circ \alpha^{-1}(v)) = J_{\beta\circ\alpha^{-1}}(v)\,Y_\alpha(v) \quad\text{if } v \in \alpha(U \cap W). \tag{5.2}$$
Equation (5.1) gives the "local expression" of a vector field with respect to a chart, and Eq. (5.2) describes the "transition law" from one chart to another.

Conversely, let $\mathcal{A}$ be an atlas of $M$, and let $Y_\alpha$ be a $V$-valued function defined on $\alpha(U)$ for each chart $(U, \alpha) \in \mathcal{A}$. Suppose that the $Y_\alpha$ satisfy (5.2). Then for each $x \in M$ we can let $Y(x) \in T_x(M)$ be defined by setting
$$Y(x)_\alpha = Y_\alpha(\alpha(x))$$
for some chart $(U, \alpha)$ about $x$. It follows from the transition law given by (4.3) and (5.2) that this definition does not depend on the choice of $(U, \alpha)$.

Observe that $J_{\beta\circ\alpha^{-1}}$ is a $C^\infty$-function (linear transformation-valued function) on $\alpha(U \cap W)$. Therefore, if $Y$ is a vector field and $Y_\alpha$ is a $V$-valued $C^\infty$-function on $\alpha(U)$, the function $Y_\beta$ will be $C^\infty$ on $\beta(U \cap W)$. In other words, it is consistent to require that the functions $Y_\alpha$ be of class $C^\infty$. We shall therefore say that $Y$
is a $C^\infty$-vector field if the function $Y_\alpha$ is $C^\infty$ for every chart $(U, \alpha)$. As in the case of functions and mappings, in order to verify that $Y$ is $C^\infty$, it suffices to check that the $Y_\alpha$ are $C^\infty$ for all charts $(U, \alpha)$ belonging to some atlas of $M$.

Let us check that the infinitesimal generator $X$ of a one-parameter group $\varphi$ is a $C^\infty$-vector field. In fact, if $(U, \alpha)$ is a chart and $v = \alpha(x)$, then
$$X_\alpha(v) = (\alpha \circ \varphi_x)'(0),$$
where $\varphi_x(t) = \varphi(x, t)$. We can write $\alpha \circ \varphi_x(t) = \Phi_\alpha(v, t)$, where
$$\Phi_\alpha(v, t) = \alpha \circ \varphi(\alpha^{-1}(v), t).$$
Let $U' \subset U$ be a neighborhood of $x$ such that $\varphi(y, t) \in U$ for $y \in U'$ and $|t| < \epsilon$. Then $\Phi_\alpha$ is a differentiable map of $\alpha(U') \times I \to \alpha(U)$, where $I = \{t : |t| < \epsilon\}$. In terms of this representation, we can write
$$X_\alpha(v) = \frac{\partial \Phi_\alpha}{\partial t}(v, 0). \tag{5.3}$$
This shows that $X$ is a $C^\infty$-vector field.

If we evaluate (5.3) in the case of Example 1, we get $\Phi_{\mathrm{id}}(v, t) = v + tw$, so that $X_{\mathrm{id}} = w$. In the case of Example 2 we get $X_{\mathrm{id}}(v) = Av$.

There are various algebraic operations that can be performed with vector fields. The set of all vector fields on $M$ forms a vector space in the obvious way. If $X$ and $Y$ are $C^\infty$-vector fields, then so is $aX + bY$ ($a$ and $b$ are constants), where
$$(aX + bY)(x) = aX(x) + bY(x), \quad x \in M.$$
Similarly, we can multiply a vector field by a function. If $f$ is a function and $X$ is a vector field, we define $fX$ by
$$(fX)(x) = f(x)X(x), \quad x \in M.$$
It is easy to see that if $f$ and $X$ are differentiable, then so is $fX$. It is also easy to check the various associative laws for this multiplication.

We have seen that any one-parameter group defines a smooth vector field. Let us examine the converse. Does any $C^\infty$-vector field define a one-parameter group? The answer to the question as stated is "no". In fact, let $X = \partial/\partial x^1$ be the vector field corresponding to translation in the $x^1$-direction in $\mathbb{R}^2$. Let $M = \mathbb{R}^2 - C$, where $C$ is some nonempty closed set of $\mathbb{R}^2$. Then if $p$ is any point of $M$ that lies on a line parallel to the $x^1$-axis which intersects $C$ (Fig.
9.14), then $\varphi_t(p)$ will not be defined (will not lie in $M$) for every $t$. The reader may object that $M$ "has some points missing" and that is why $X$ does not generate a one-parameter group.

[Fig. 9.14]

But we can construct a counterexample on $\mathbb{R}^2$ itself. In fact, if we consider the vector field $X$ on $\mathbb{R}^2$ given by
$$X_{\mathrm{id}}(x^1, x^2) = (1, -(x^2)^2),$$
then (5.3) shows that $\varphi$, if defined, satisfies
$$\frac{\partial \Phi}{\partial t}(x, t) = X_{\mathrm{id}}(\Phi(x, t)),$$
where $\Phi = \Phi_{\mathrm{id}}$. If we let $y^i(t, x) = x^i \circ \Phi(x, t)$, then
$$\frac{dy^1}{dt} = 1,\quad y^1(0) = x^1; \qquad \frac{dy^2}{dt} = -(y^2)^2,\quad y^2(0) = x^2.$$
If $x^2 \neq 0$, then the unique solution of the second equation is given by
$$y^2(t) = \frac{1}{t + 1/x^2},$$
which is not defined for all values of $t$. Of course, the trouble is that we only have a local existence theorem for differential equations. We must therefore give up on the requirement that $\varphi$ be defined on all of $M \times \mathbb{R}$.

Definition 5.1. A flow on $M$ is a map $\varphi$ of an open set $U \subset M \times \mathbb{R} \to M$ such that
i) $M \times \{0\} \subset U$;
ii) $\varphi$ is differentiable;
iii) $\varphi(x, 0) = x$;
iv) $\varphi(\varphi(x, s), t) = \varphi(x, s + t)$ whenever both sides of this equation are defined.

For $x$ fixed, $\varphi_x(t) = \varphi(x, t)$ is defined for sufficiently small $t$, so that $\varphi$ gives rise to a vector field $X$ as before. We shall call $X$ the infinitesimal generator of the flow $\varphi$. As the previous examples show, there may be no $t \neq 0$ such that $\varphi(x, t)$ is defined for all $x$, and there may be no $x$ such that $\varphi(x, t)$ is defined for all $t$.

Proposition 5.1. Let $X$ be a smooth vector field on $M$. Then there exists a neighborhood $U$ of $M \times \{0\}$ in $M \times \mathbb{R}$ and a flow $\varphi\colon U \to M$ having $X$ as its infinitesimal generator.

Proof. We shall first construct the curve $\varphi_x(t)$ for any $x \in M$, and shall then verify that $\langle x, t\rangle \mapsto \varphi(x, t)$ is indeed a flow.

Let $x$ be a point of $M$, and let $(U, \alpha)$ be a chart about $x$. Then $X_\alpha$ gives us an ordinary differential equation in $\alpha(U)$, namely,
$$\frac{dv}{dt} = X_\alpha(v), \quad v \in \alpha(U).$$
By the fundamental existence theorem for ordinary differential equations, there exists an $\epsilon > 0$, an open set $O$ containing $\alpha(x)$, and a map
$$\Phi_\alpha\colon O \times \{t : |t| < \epsilon\} \to \alpha(U)$$
such that
$$\Phi_\alpha(v, 0) = v$$
and
$$\frac{\partial \Phi_\alpha(v, t)}{\partial t} = X_\alpha(\Phi_\alpha(v, t)).$$
Here the choice of the open set $O$ and of $\epsilon$ depends on $\alpha(x)$ and $\alpha(U)$. The uniqueness part of the theorem asserts that $\Phi_\alpha$ is uniquely determined up to the domain of definition; i.e., if $\Phi_v$ is any curve defined for $|t| < \epsilon'$ with $\Phi_v(0) = v$ and
$$\frac{d\Phi_v}{dt} = X_\alpha(\Phi_v(t)), \tag{5.4}$$
then $\Phi_v(t) = \Phi_\alpha(v, t)$. This implies that
$$\Phi_\alpha(\Phi_\alpha(v, s), t) = \Phi_\alpha(v, s + t)$$
whenever both sides are defined. (Just hold $s$ fixed in the equation.)

Consider the curve $\varphi_x(\cdot)$ defined by
$$\varphi_x(t) = \alpha^{-1}(\Phi_\alpha(\alpha(x), t)). \tag{5.5}$$
It is defined for $|t| < \epsilon$, and is a continuous, in fact differentiable map of this interval into $M$. Furthermore, if we write $\psi = \varphi_x(\cdot)$ then (5.4) asserts that the tangent vector to the curve $\psi(t + \cdot)$ is $X(\psi(t))$, the value of the vector field at the point $\psi(t)$. We will write this condition as
$$\psi'(t) = X(\psi(t)). \tag{5.6}$$
Equation (5.6) is the way we would write the "first order differential equation" on $M$ corresponding to the vector field $X$. A differentiable curve $\psi$ satisfying (5.6) is called an integral curve of $X$. We now can formulate a manifold version of the uniqueness theorem of differential equations:

Lemma 5.1. Let $\psi_1\colon I \to M$ and $\psi_2\colon I \to M$ be two integral curves of $X$ defined on the same interval $I$. If $\psi_1(s) = \psi_2(s)$ at some point $s \in I$ then $\psi_1 = \psi_2$, i.e. $\psi_1(t) = \psi_2(t)$ for all $t \in I$.

Proof. We wish to show that the set where $\psi_1(t) \neq \psi_2(t)$ is empty. Let
$$A = \{t : t > s \text{ and } \psi_1(t) \neq \psi_2(t)\}.$$
We wish to show that $A$ is empty, and similarly that the set $B = \{t : t < s \text{ and } \psi_1(t) \neq \psi_2(t)\}$ is empty. Suppose that $A$ is not empty, and let
$$t_+ = \operatorname{glb} A = \operatorname{glb}\{t : t > s \text{ and } \psi_1(t) \neq \psi_2(t)\}.$$
We will derive a contradiction by
i) using the uniqueness theorem for differential equations to show that $\psi_1(t_+) \neq \psi_2(t_+)$, and
ii) using the Hausdorff property of manifolds to show that $\psi_1(t_+) = \psi_2(t_+)$.
Details: i) Suppose that $\psi_1(t_+) = \psi_2(t_+) = y \in M$. We can find a coordinate chart $(W, \beta)$ about $y$, and then $\beta \circ \psi_1$ and $\beta \circ \psi_2$ are solutions of the same system of first order ordinary differential equations, and they both take on the value $\beta(y)$ at $t = t_+$. Hence, by uniqueness for differential equations, $\beta \circ \psi_1$ and $\beta \circ \psi_2$ must be equal in some interval about $t_+$, and hence $\psi_1(t) = \psi_2(t)$ for all $t$ in this interval. This is impossible since there must be points arbitrarily close to $t_+$ where $\psi_1(t) \neq \psi_2(t)$ by the glb property of $t_+$. This proves i).

Now suppose that $\psi_1(t_+) \neq \psi_2(t_+)$. We can find neighborhoods $U_1$ of $\psi_1(t_+)$ and $U_2$ of $\psi_2(t_+)$ such that $U_1 \cap U_2 = \emptyset$. But then the continuity of the $\psi_i$ implies that $\psi_1(t) \in U_1$ and $\psi_2(t) \in U_2$ for $t$ close enough to $t_+$, and hence that $\psi_1(t) \neq \psi_2(t)$ for $t$ in some interval about $t_+$. This once again contradicts the glb property of $t_+$, proving ii). The same argument with glb replaced by lub shows that $B$ is empty, proving Lemma 5.1.

The above argument is typical of a "connectedness argument." We showed that the set where $\psi_1(t) = \psi_2(t)$ is both open and closed, and hence must be the whole interval $I$.

Lemma 5.1 shows that (5.5) defines a solution curve of $X$ passing through $x$ at time $t = 0$, and is independent of the choice of chart in any common interval of definition about 0. In other words it is legitimate to define the curve $\varphi_x(\cdot)$ by
$$\varphi_x(t) = \alpha^{-1}(\Phi_\alpha(\alpha(x), t)),$$
which defines $\varphi_x(t)$ for $|t| < \epsilon$. Unfortunately the $\epsilon$ depends not only on $x$ but also on the choice of chart. We use Lemma 5.1 and extend the definition of $\varphi_x(\cdot)$ as far as possible, much as we did for ordinary differential equations on a vector space in Chapter 6: For any $s$ with $|s| < \epsilon$ we let $y = \varphi_x(s)$ and obtain a curve $\varphi_y(\cdot)$ defined for $|t| < \epsilon'$. By Lemma 5.1
$$\varphi_y(t) = \varphi_x(s + t) \quad\text{if } |s + t| < \epsilon. \tag{5.7}$$
It may happen that $|s| + \epsilon' > \epsilon$. Then there will exist a $t$ with $|t| < \epsilon'$ and $|s + t| > \epsilon$.
Then the right hand side of (5.7) is not defined, but the left is. We then take (5.7) as the definition of $\varphi_x(s + t)$, extending the domain of definition of $\varphi_x(\cdot)$. We then continue: Let $I_x^+$ denote the set of all $s > 0$ for which there exists a finite sequence of real numbers $s_0 = 0 < s_1 < \cdots < s_k = s$ and points $x_0, \ldots, x_{k-1} \in M$ with $x_0 = x$, $s_1$ in the domain of definition of $\varphi_{x_0}(\cdot)$, $x_1 = \varphi_{x_0}(s_1)$ and, inductively, $s_{i+1} - s_i$ in the domain of definition of $\varphi_{x_i}(\cdot)$ and $x_{i+1} = \varphi_{x_i}(s_{i+1} - s_i)$. If $s \in I_x^+$, so is $s'$ for $0 < s' < s$, and so is $s + \eta$ for sufficiently small positive $\eta$. Thus $I_x^+$ is an interval, half open on the right. By repeated use of (5.7) we define $\varphi_x(s)$ for $s \in I_x^+$. We construct $I_x^-$ in a similar fashion and set $I_x = I_x^+ \cup I_x^-$. Then $\varphi_x(s)$ is defined for $s \in I_x$, and $I_x$ is the maximal interval for which our construction defines $\varphi_x$.

For each $x \in M$ we obtain an open interval $I_x$ in which the curve $\varphi_x(\cdot)$ is defined. Let
$$U = \bigcup_{x \in M} \{x\} \times I_x.$$
Then $U$ is an open subset of $M \times \mathbb{R}$. To verify this, let $(\bar{x}, \bar{s}) \in U$. We must show that there is a neighborhood $W$ of $\bar{x}$ and an $\epsilon > 0$ such that $s \in I_x$ for all $|s - \bar{s}| < \epsilon$ and $x \in W$. By definition, there is a finite
sequence of points $\bar{x} = \bar{x}_0, \bar{x}_1, \ldots, \bar{x}_k$ and charts $(U_1, \alpha_1), \ldots, (U_k, \alpha_k)$ with $\bar{x}_{i-1} \in U_i$ and $\bar{x}_i \in U_i$ and such that
$$\alpha_i(\bar{x}_i) = \Phi_{\alpha_i}(\alpha_i(\bar{x}_{i-1}), t_i),$$
where $t_1 + \cdots + t_k = \bar{s}$. It is now clear from the continuity properties of the $\Phi_\alpha$ that if we choose $x_0$ such that $\alpha_1(x_0)$ is close enough to $\alpha_1(\bar{x}_0)$, then the points $x_i$ defined inductively by
$$\alpha_i(x_i) = \Phi_{\alpha_i}(\alpha_i(x_{i-1}), t_i)$$
will be well defined. [That is, $\alpha_j(x_{j-1})$ will be in the domain of definition of $\Phi_{\alpha_j}(\cdot, t_j)$.] This means that $\bar{s} \in I_{x_0}$ for all such points $x_0$. The same argument shows that $\bar{s} + \eta \in I_{x_0}$ for $\eta$ sufficiently small and $x_0$ sufficiently close to $\bar{x}$. This shows that $U$ is open.

Now define $\varphi$ by setting
$$\varphi(x, t) = \varphi_x(t) \quad\text{for } (x, t) \in U.$$
That $\varphi$ is differentiable near $M \times \{0\}$ follows from the fact that $\varphi$ is given (in terms of a chart) as the solution of an ordinary differential equation. The fundamental existence theorem then guarantees the differentiability. Near the point $(\bar{x}, \bar{t})$ we can write
$$\varphi(x, t) = \varphi(\varphi(\cdots\varphi(\varphi(x, t_1), t_2)\cdots), t_k), \quad t = t_1 + t_2 + \cdots + t_k,$$
and so $\varphi$ is differentiable because it is the composite of differentiable maps. □

6. LIE DERIVATIVES

Let $\varphi$ be a one-parameter group on a manifold $M$, and let $f$ be a differentiable function on $M$. Then for each $t$ the function $\varphi_t^*[f]$ is differentiable, and for $t \neq 0$ we can form the function
$$\frac{\varphi_t^*[f] - f}{t}. \tag{6.1}$$
We claim that the limit of this expression as $t \to 0$ exists. In fact, for any $x \in M$,
$$\varphi_t^*[f](x) = f \circ \varphi_t(x) = f \circ \varphi_x(t)$$
and, therefore,
$$\lim_{t \to 0} \frac{\varphi_t^*[f] - f}{t}(x) = \lim_{t \to 0} \frac{f \circ \varphi_x(t) - f \circ \varphi_x(0)}{t} = (f \circ \varphi_x)'(0) = X(x)f. \tag{6.2}$$
Here $X(x)$ is a tangent vector at $x$ and we are using the notation introduced in Section 4.

We shall call the limit of (6.1) the derivative of $f$ with respect to the one-parameter group $\varphi$, and shall denote it by $D_X f$. More generally, for any smooth vector field $X$ and differentiable function $f$ we define $D_X f$ by
$$D_X f(x) = X(x)f \quad\text{for all } x \in M, \tag{6.3}$$
and call it the Lie derivative of $f$ with respect to $X$.
In terms of the flow generated by $X$, we can, near any $x \in M$, represent $D_X f$ as the limit of (6.1),
where, in general, (6.1) will only be defined for a sufficiently small neighborhood of $x$ and for sufficiently small $|t|$.

Our notation generalizes the notation in Chapter 3 for directional derivative. In fact, if $M$ is an open subset of $V$ and $X$ is the "constant vector field" of Example 1, $X_{\mathrm{id}} = w \in V$, then
$$(D_X f)_{\mathrm{id}} = D_w f_{\mathrm{id}},$$
where $D_w$ is the directional derivative with respect to $w$.

Note that $D_X f$ is linear in $X$; that is,
$$D_{aX + bY} f = aD_X f + bD_Y f$$
if $X$ and $Y$ are vector fields and $a$ and $b$ are constants.

Let $\psi$ be a diffeomorphism of $M_1$ onto $M_2$, and let $X$ be a vector field on $M_2$. We define the "pullback" vector field $\psi^*[X]$ on $M_1$ by setting
$$\psi^*[X](x) = \psi_{*x}^{-1} X(\psi(x)) \quad\text{for all } x \in M_1. \tag{6.4}$$
Note that $\psi$ must be a diffeomorphism for (6.4) to make sense, since $\psi^{-1}$ enters into the definition. This is in contrast to the "pullback" for functions, which made sense for any differentiable map. Equation (6.4) does indeed define a vector field, since
$$\psi_{*x}^{-1} = (\psi^{-1})_{*\psi(x)}\colon T_{\psi(x)}(M_2) \to T_x(M_1)$$
and $X(\psi(x)) \in T_{\psi(x)}(M_2)$.

Let us check that $\psi^*[X]$ is a smooth vector field if $X$ is. To this effect, let $\mathcal{A}_1$ and $\mathcal{A}_2$ be compatible atlases on $M_1$ and $M_2$, and let $(U, \alpha) \in \mathcal{A}_1$ and $(W, \beta) \in \mathcal{A}_2$ be compatible charts. Then (6.4) says that
$$\psi^*[X]_\alpha(v) = J_{\alpha\circ\psi^{-1}\circ\beta^{-1}}(\beta \circ \psi \circ \alpha^{-1}(v)) \cdot X_\beta(\beta \circ \psi \circ \alpha^{-1}(v)) \quad\text{for } v \in \alpha(U),$$
which is a differentiable function of $v$. Since, by the chain rule,
$$J_{\alpha\circ\psi^{-1}\circ\beta^{-1}}(\beta \circ \psi \circ \alpha^{-1}(v)) = (J_{\beta\circ\psi\circ\alpha^{-1}}(v))^{-1},$$
we can rewrite the last expression more simply as
$$\psi^*[X]_\alpha(v) = (J_{\beta\circ\psi\circ\alpha^{-1}}(v))^{-1} X_\beta(\beta \circ \psi \circ \alpha^{-1}(v)) \quad\text{for } v \in \alpha(U). \tag{6.5}$$
Thus $\psi^*[X]_\alpha$ is the product of a smooth $\operatorname{Hom}(V_2, V_1)$-valued function and a smooth $V_2$-valued function, which shows that $\psi^*[X]$ is a smooth vector field.

Exercise. Let $\varphi$ be the flow generated by $X$ on $M_2$. Show that the flow generated by $\psi^*[X]$ is given by
$$\langle x, t\rangle \mapsto \psi^{-1}(\varphi(\psi(x), t)). \tag{6.6}$$
If $\varphi$ is a one-parameter group, then we can write (6.6) as
$$\langle x, t\rangle \mapsto \psi^{-1} \circ \varphi_t \circ \psi(x). \tag{6.6$'$}$$
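As a sanity check on (6.4) and (6.6), and not as a solution of the exercise, one can work a one-dimensional illustration by hand. The choices $M_1 = M_2 = \mathbb{R}$, $\psi(x) = 2x$, and $X = d/dy$ below are assumptions made only for this example.

```latex
% Illustrative data (assumed for this example only):
\[
  \psi(x) = 2x, \qquad X_{\mathrm{id}}(y) = 1, \qquad \varphi(y, t) = y + t .
\]
% Pullback field by (6.4): \psi_{*x} is multiplication by 2, so its inverse is multiplication by 1/2.
\[
  \psi^*[X]_{\mathrm{id}}(x) = \psi_{*x}^{-1}\, X(\psi(x)) = \tfrac{1}{2} .
\]
% The flow of the constant field 1/2 is x \mapsto x + t/2, and (6.6) gives the same map:
\[
  \psi^{-1}\bigl(\varphi(\psi(x), t)\bigr) = \tfrac{1}{2}\,(2x + t) = x + \tfrac{t}{2} .
\]
```

The two computations agree, as (6.6) predicts for this particular $\psi$ and $X$.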
[Fig. 9.15. Panels include the vector field $X$ ($Xf = \partial f/\partial x$), $X(u, v) = \langle 1, 0\rangle = \delta_1$, and the vector field $Y$ ($Yf = x\,\partial f/\partial y$), $Y(u, v) = \langle 0, u\rangle$; panels (c) through (h) are described in the text.]

It is easy to check that if $\psi_1\colon M_1 \to M_2$ and $\psi_2\colon M_2 \to M_3$ are diffeomorphisms and $Y$ is a vector field on $M_3$, then
$$(\psi_2 \circ \psi_1)^* Y = \psi_1^* \psi_2^* Y.$$
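The composition rule for pullbacks of vector fields can be checked directly from (6.4). The following is a short sketch of that check; it uses only (6.4) and the chain rule for the induced maps on tangent spaces.

```latex
\begin{align*}
(\psi_2\circ\psi_1)^*[Y](x)
  &= \bigl((\psi_2\circ\psi_1)_{*x}\bigr)^{-1}\, Y\bigl(\psi_2(\psi_1(x))\bigr)
     && \text{by (6.4)}\\
  &= (\psi_{1*x})^{-1}\,(\psi_{2*\psi_1(x)})^{-1}\, Y\bigl(\psi_2(\psi_1(x))\bigr)
     && \text{chain rule: } (\psi_2\circ\psi_1)_{*x} = \psi_{2*\psi_1(x)}\circ\psi_{1*x}\\
  &= (\psi_{1*x})^{-1}\,\psi_2^*[Y]\bigl(\psi_1(x)\bigr)
     && \text{by (6.4) applied to } \psi_2\\
  &= \psi_1^*\bigl[\psi_2^*[Y]\bigr](x)
     && \text{by (6.4) applied to } \psi_1 .
\end{align*}
```

Note the reversal of order: pulling back along $\psi_2 \circ \psi_1$ means pulling back along $\psi_2$ first and then along $\psi_1$.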
[Fig. 9.15 (cont.)]

If $f$ is a differentiable function on $M_2$, then
$$D_{\psi^*[X]}(\psi^*[f]) = \psi^*(D_X f). \tag{6.7}$$
In fact, by (6.3) and (4.6) we have, for $x \in M_1$,
$$\begin{aligned}
D_{\psi^*[X]}\psi^*[f](x) &= \psi^*[X](x)\,\psi^*[f] && \text{by (6.3)}\\
&= \psi_*^{-1} X(\psi(x))\,\psi^*[f] && \text{by (6.4)}\\
&= \bigl(\psi_*\psi_*^{-1} X(\psi(x))\bigr) f && \text{by (4.6)}\\
&= X(\psi(x)) f\\
&= (D_X f)(\psi(x))\\
&= \psi^*(D_X f)(x).
\end{aligned}$$

Let $\varphi$ be a one-parameter group on $M$ with infinitesimal generator $X$, and let $Y$ be another smooth vector field on $M$. For $t \neq 0$ we can form the vector field
$$\frac{\varphi_t^*[Y] - Y}{t} \tag{6.8}$$
and investigate its limit as $t \to 0$, which we shall call $D_X Y$. In Fig. 9.15 we have shown the calculation of $D_Y X$ and $D_X Y$ for two very simple fields on the Cartesian plane $\mathbb{R}^2$. The field $X$ is the constant field $X_{\mathrm{id}} = \delta_1$, so that $Xf = \partial f/\partial x$ in terms of Cartesian coordinates $x$, $y$. The corresponding flow is given by $\varphi_t(x, y) = \langle x + t, y\rangle$. Thus $\varphi_{t*} = \mathrm{id}$ if we identify the tangent space at each point of the plane with the plane itself. Then $Y \mapsto \varphi_t^* Y$ consists of "moving" the vector field $Y$ to the left by $t$ units. Here we have taken $Y = x\delta_2$, so that $Yf = x(\partial f/\partial y)$. In Fig. 9.15(c) we have pictured $\varphi_t^* Y$, and have superimposed $Y$ and $\varphi_t^* Y$ in Fig. 9.15(d). Figure 9.15(e) represents $\varphi_t^* Y - Y$ and Fig. 9.15(f) is $(1/t)\{\varphi_t^* Y - Y\}$, which coincides with its limit, $D_X Y$, since the expression is independent of $t$. The one-parameter group generated by $Y$ is $\psi_t$ where $\psi_t(x, y) = \langle x, y + tx\rangle$. Here at any $p \in \mathbb{R}^2$ we have $\psi_{t*}\delta_1 = \delta_1 + t\delta_2$, so that $\psi_t^* X = \psi_{-t*} X(\psi_t(x)) = \delta_1 - t\delta_2$. In Fig. 9.15(g) we have drawn $\psi_t^* X$ and in Fig. 9.15(h) we have superimposed it on $X$.

Note that $D_X Y = -D_Y X$. However, these two derivatives are nonzero for quite different reasons. The field $\varphi_t^* Y$ varies with $t$ because the field $Y$ is not constant. The field $\psi_t^* X$ varies with $t$ because of "distortion" in the flow $\psi_t$. See Fig. 9.15(g) and (h). In the general case, $D_X Y$ will result from a superposition of these two effects. We now make the general calculation.
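Let $(U, \alpha)$ be a chart on $M$, and for $v \in \alpha(U)$ let $O$ be a sufficiently small open set containing $v$, and let $\epsilon > 0$ be sufficiently small, so that $\Phi_\alpha$ given by
$$\Phi_\alpha(w, t) = \alpha \circ \varphi(\alpha^{-1}(w), t)$$
is defined for $w \in O$ and $|t| < \epsilon$. Then, for $|t| < \epsilon$, Eq. (6.5) implies that
$$\varphi_t^*[Y]_\alpha(v) = \bigl(J_{\Phi_\alpha(\cdot, t)}(v)\bigr)^{-1}\, Y_\alpha(\Phi_\alpha(v, t)). \tag{6.9}$$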
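The right-hand side of this equation is of the form $A_t^{-1} Z_t$, where $A_t$ and $Z_t$ are differentiable functions of $t$ with $A_0 = I$. Therefore, its derivative with respect to $t$ exists and
$$\begin{aligned}
\left.\frac{d(A_t^{-1} Z_t)}{dt}\right|_{t=0} &= \lim_{t \to 0} \frac{A_t^{-1} Z_t - Z_0}{t} = \lim_{t \to 0} A_t\,\frac{A_t^{-1} Z_t - Z_0}{t} = \lim_{t \to 0} \frac{Z_t - A_t Z_0}{t}\\
&= \lim_{t \to 0} \left(\frac{Z_t - Z_0}{t} - \frac{A_t Z_0 - Z_0}{t}\right) = Z_0' - A_0' Z_0.
\end{aligned}$$
Now in (6.9) $Z_t = Y_\alpha(\Phi_\alpha(v, t))$, so
$$Z_0' = dY_\alpha\!\left(\frac{\partial \Phi_\alpha}{\partial t}(v, 0)\right) = dY_\alpha(X_\alpha(v)).$$
Here $Y_\alpha$ is a $V$-valued function, so $dY_\alpha$ is its differential at the point $\Phi_\alpha(v, 0) = v$. Hence $dY_\alpha(X_\alpha(v))$ is the value of this differential at $X_\alpha(v)$. The transformation $A_t = J_{\Phi_\alpha(\cdot, t)}(v) = d(\Phi_\alpha)_{(v,t)}$, so
$$\left.\frac{dA_t}{dt}\right|_{t=0} = \frac{d}{dt}\,d(\Phi_\alpha) = d\!\left(\frac{\partial \Phi_\alpha}{\partial t}\right) = d(X_\alpha)_v.$$
Thus the derivative of (6.9) at $t = 0$ can be written as
$$d(Y_\alpha)_v(X_\alpha(v)) - d(X_\alpha)_v(Y_\alpha(v)) = D_{X_\alpha(v)} Y_\alpha - D_{Y_\alpha(v)} X_\alpha.$$
We have thus shown that the limit in (6.8) exists. If we denote it by $D_X Y$, we can write
$$(D_X Y)_\alpha(v) = D_{X_\alpha(v)} Y_\alpha - D_{Y_\alpha(v)} X_\alpha. \tag{6.10}$$
As before, we can use (6.10) as the definition of $D_X Y$ for arbitrary vector fields $X$ and $Y$. Again, this represents the derivative of $Y$ with respect to the flow generated by $X$, that is, the limit of (6.8), where now (6.8) is only locally defined.

From (6.10) we derive the surprising result that
$$D_X Y = -D_Y X.$$
For this reason it is convenient to introduce a notation which expresses the antisymmetry more clearly, and we shall write
$$D_X Y = [X, Y].$$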
The expression on the right-hand side is called the Lie bracket of $X$ and $Y$. We have
$$[X, Y] = -[Y, X]. \tag{6.11}$$
Let us evaluate the Lie bracket for some of the examples listed in the beginning of Section 5. Let $M = \mathbb{R}^n$.

Example 1. If $X_{\mathrm{id}} = w_1$ and $Y_{\mathrm{id}} = w_2$ are "constant" vector fields, then (6.10) shows that $[X, Y] = 0$.

Example 2. Let $X_{\mathrm{id}}(v) = Av$, where $A$ is a linear transformation, and let $Y_{\mathrm{id}} = w$. Then (6.10) says that
$$[X, Y]_{\mathrm{id}}(v) = -Aw,$$
since the directional derivative of the linear function $Av$ with respect to $w$ is $Aw$.

Example 3. Let $X_{\mathrm{id}}(v) = Av$ and $Y_{\mathrm{id}}(v) = Bv$, where $A$ and $B$ are linear transformations. Then by (6.10),
$$[X, Y]_{\mathrm{id}}(v) = BAv - ABv = (BA - AB)v. \tag{6.12}$$
Thus in this case $[X, Y]$ again comes from a linear transformation, namely, $BA - AB$. In this case the antisymmetry in $A$ and $B$ is quite apparent.

We now return to the general case. Let $\varphi$ be a one-parameter group on $M$, let $Y$ be a smooth vector field on $M$, and let $f$ be a differentiable function on $M$. According to (6.7),
$$D_{\varphi_t^*[Y]}(\varphi_t^*[f]) = \varphi_t^*(D_Y f).$$
Then
$$\begin{aligned}
\frac{\varphi_t^*(D_Y f) - D_Y f}{t} &= \frac{D_{\varphi_t^*[Y]}(\varphi_t^*[f]) - D_Y(\varphi_t^*[f])}{t} + \frac{D_Y(\varphi_t^*[f]) - D_Y f}{t}\\
&= D_{(\varphi_t^*[Y] - Y)/t}\,\varphi_t^*[f] + D_Y\!\left(\frac{\varphi_t^*[f] - f}{t}\right).
\end{aligned}$$
Since the functions $\varphi_t^*[f]$ are uniformly differentiable, we may take the limit as $t \to 0$ to obtain
$$D_X(D_Y f) = D_{D_X Y} f + D_Y(D_X f) = D_{[X,Y]} f + D_Y(D_X f).$$
In other words,
$$D_{[X,Y]} f = D_X(D_Y f) - D_Y(D_X f). \tag{6.13}$$
In view of its definition as a derivative, it is clear that $D_X Y$ is linear in $Y$:
$$D_X(aY_1 + bY_2) = aD_X Y_1 + bD_X Y_2$$
if $a$ and $b$ are constants and $X$, $Y_1$, and $Y_2$ are vector fields. By the antisymmetry, it must therefore also be linear in $X$. That is,
$$D_{aX_1 + bX_2} Y = [aX_1 + bX_2, Y] = a[X_1, Y] + b[X_2, Y] = aD_{X_1} Y + bD_{X_2} Y.$$
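As a check on (6.10) and (6.13), here is a short worked computation for the two plane fields of Fig. 9.15. The only assumption beyond the text is that $f$ is twice continuously differentiable, so that mixed partial derivatives commute.

```latex
% Fields of Fig. 9.15 on the plane, with Cartesian coordinates (x, y):
\[
  X_{\mathrm{id}} = \delta_1 = \langle 1, 0\rangle, \qquad
  Y_{\mathrm{id}}(x, y) = x\,\delta_2 = \langle 0, x\rangle .
\]
% Formula (6.10): differentiate each field in the direction of the other.
\[
  D_{X_{\mathrm{id}}} Y_{\mathrm{id}} = d(Y_{\mathrm{id}})(\delta_1) = \langle 0, 1\rangle = \delta_2,
  \qquad
  D_{Y_{\mathrm{id}}} X_{\mathrm{id}} = 0,
  \qquad\text{so}\qquad [X, Y]_{\mathrm{id}} = \delta_2 .
\]
% Formula (6.13), with Xf = \partial f/\partial x and Yf = x\,\partial f/\partial y:
\begin{align*}
  D_X(D_Y f) &= \frac{\partial}{\partial x}\Bigl(x\,\frac{\partial f}{\partial y}\Bigr)
              = \frac{\partial f}{\partial y} + x\,\frac{\partial^2 f}{\partial x\,\partial y},\\
  D_Y(D_X f) &= x\,\frac{\partial^2 f}{\partial y\,\partial x},\\
  D_X(D_Y f) - D_Y(D_X f) &= \frac{\partial f}{\partial y} = D_{[X,Y]} f .
\end{align*}
```

Both routes give $[X, Y] = \delta_2$, which matches the limit $(1/t)\{\varphi_t^* Y - Y\}$ pictured in Fig. 9.15(f).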
Let $X$ and $Y$ be vector fields on a manifold $M_2$, and let $\psi$ be a diffeomorphism of $M_1$ onto $M_2$. Then
$$\psi^*[X, Y] = [\psi^* X, \psi^* Y]. \tag{6.14}$$
In fact, suppose $X$ generates the flow $\varphi$. Then
$$\psi^*[X, Y] = \psi^* D_X Y = \psi^* \lim_{t \to 0} \frac{\varphi_t^* Y - Y}{t} = \lim_{t \to 0} \frac{(\varphi_t \circ \psi)^* Y - \psi^* Y}{t} = \lim_{t \to 0} \frac{(\psi^{-1} \circ \varphi_t \circ \psi)^* \psi^* Y - \psi^* Y}{t}.$$
Since $\psi^{-1} \circ \varphi_t \circ \psi$ is the flow generated by $\psi^* X$, we conclude that the last limit is $D_{\psi^* X}\psi^* Y$, which proves (6.14).

Now let $Y$ and $Z$ be smooth vector fields on $M$, and let $X$ be the infinitesimal generator of $\varphi$. Then, by (6.14), $\varphi_t^*[Y, Z] = [\varphi_t^* Y, \varphi_t^* Z]$, so
$$\begin{aligned}
D_X[Y, Z] &= \lim_{t \to 0} \frac{\varphi_t^*[Y, Z] - [Y, Z]}{t} = \lim_{t \to 0} \frac{[\varphi_t^* Y, \varphi_t^* Z] - [Y, Z]}{t}\\
&= \lim_{t \to 0} \left(\left[\frac{\varphi_t^* Y - Y}{t}, \varphi_t^* Z\right] + \left[Y, \frac{\varphi_t^* Z - Z}{t}\right]\right)\\
&= [D_X Y, Z] + [Y, D_X Z].
\end{aligned}$$
Thus
$$[X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]]. \tag{6.15}$$
In view of the antisymmetry of the Lie bracket, Eq. (6.15) can be rewritten as
$$[X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0. \tag{6.16}$$
Equation (6.15), or (6.16), is known as Jacobi's identity.

7. LINEAR DIFFERENTIAL FORMS

Let $M$ be a differentiable manifold. We have attached to each $x \in M$ a vector space $T_x(M)$. Its dual space, $(T_x(M))^*$, is called the cotangent space to $M$ at $x$, and will be denoted by $T_x^*(M)$. Thus an element of $T_x^*(M)$ is a continuous linear function on $T_x(M)$; it is called a covector.

Some explanation of the word "continuous" is in order. In the case where $M$ [and hence $T_x(M)$] is finite-dimensional, all linear functions on $T_x(M)$ are continuous, so no further comment is necessary. We shall be concerned primarily with this case. More generally, let $l$ be a linear function on $T_x(M)$. For any
chart $(U, \alpha)$ about $x$ we have identified $T_x(M)$ with $V$, thus identifying $\xi \in T_x(M)$ with $\xi_\alpha \in V$. Then $l$ determines a linear function $l_\alpha$ on $V$ by
$$\langle \xi_\alpha, l_\alpha\rangle = \langle \xi, l\rangle. \tag{7.1}$$
If $(W, \beta)$ is a second chart, then
$$\langle \xi_\beta, l_\beta\rangle = \langle J_{\alpha\circ\beta^{-1}}(\beta(x))\,\xi_\beta, l_\alpha\rangle.$$
Since $J_{\alpha\circ\beta^{-1}}(\beta(x))$ is a continuous map of $V$ into $V$, we see that $l_\alpha$ is continuous if and only if $l_\beta$ is. We shall therefore say that $l$ is continuous if $l_\alpha$ is continuous for some (and hence any) $\alpha$. In this case we see that (7.1) gives us an identification of $T_x^*(M)$ with $V^*$ sending $l$ into $l_\alpha$. The last equation says that the rule for change of charts is given by
$$l_\beta = (J_{\alpha\circ\beta^{-1}}(\beta(x)))^*\, l_\alpha. \tag{7.2}$$

Let $f$ be a differentiable function on $M$, and let $x \in M$. Then the function on $T_x(M)$ sending each $\xi \in T_x(M)$ into $\xi(f)$ will be denoted by $df(x)$. Thus
$$\langle \xi, df(x)\rangle = \xi f.$$
It is easy to see that $df(x) \in T_x^*(M)$. In fact, in terms of a chart $(U, \alpha)$ about $x$,
$$\langle \xi, df(x)\rangle = \xi f = d(f_\alpha)_{\alpha(x)}(\xi_\alpha).$$
Note that $f$ assigns an element $df(x)$ of $T_x^*(M)$ to each $x \in M$. A map which assigns to each $x \in M$ an element of $T_x^*(M)$ will be called a covector field or a linear differential form. The linear differential form determined by the function $f$ will be denoted simply by $df$.

Let $\omega$ be a linear differential form. Thus $\omega(x) \in T_x^*(M)$ for each $x \in M$. Let $\mathcal{A}$ be an atlas of $M$. For each $(U, \alpha) \in \mathcal{A}$ we obtain the $V^*$-valued function $\omega_\alpha$ on $\alpha(U)$ defined by
$$\omega_\alpha(v) = (\omega(\alpha^{-1}(v)))_\alpha \quad\text{for } v \in \alpha(U). \tag{7.3}$$
If $(W, \beta) \in \mathcal{A}$, then (7.2) says that
$$\omega_\beta(\beta \circ \alpha^{-1}(v)) = (J_{\alpha\circ\beta^{-1}}(\beta \circ \alpha^{-1}(v)))^*\,\omega_\alpha(v) = (J_{\beta\circ\alpha^{-1}}(v))^{-1*}\,\omega_\alpha(v) \quad\text{for } v \in \alpha(U \cap W). \tag{7.4}$$
As before, Eq. (7.4) shows that it makes sense to require that $\omega$ be smooth. We say that $\omega$ is a $C^k$-differential form if $\omega_\alpha$ is a $V^*$-valued $C^k$-function for every chart $(U, \alpha)$. By (7.4) it suffices to check this for all charts in an atlas. Also, if we are given $V^*$-valued functions $\omega_\alpha$, each defined on $\alpha(U)$, $(U, \alpha) \in \mathcal{A}$, and satisfying (7.4), then they define a linear differential form $\omega$ on $M$ via (7.3).
If $\omega$ is a differential form and $f$ is a function, we define the form $f\omega$ by $f\omega(x) = f(x)\omega(x)$. Similarly, we define $\omega_1 + \omega_2$ by
$$(\omega_1 + \omega_2)(x) = \omega_1(x) + \omega_2(x).$$
Let $M_1$ and $M_2$ be differentiable manifolds, and let $\varphi\colon M_1 \to M_2$ be a differentiable map. For any $x \in M_1$ we have the map $\varphi_{*x}\colon T_x(M_1) \to T_{\varphi(x)}(M_2)$. It therefore defines a dual map
$$(\varphi_{*x})^*\colon T_{\varphi(x)}^*(M_2) \to T_x^*(M_1).$$
(The reader can check that if $l \in T_{\varphi(x)}^*(M_2)$, then $\xi \mapsto \langle \varphi_*(\xi), l\rangle$ is a continuous linear function of $\xi$, by verification in terms of a chart.)

Now let $\omega$ be a differential form on $M_2$. It assigns $\omega(\varphi(x)) \in T_{\varphi(x)}^*(M_2)$ to $\varphi(x)$, and thus assigns an element $(\varphi_{*x})^*(\omega(\varphi(x))) \in T_x^*(M_1)$ to $x \in M_1$. We thus "pull back" the form $\omega$ to obtain a form on $M_1$ which we shall call $\varphi^*\omega$. Thus
$$\varphi^*\omega(x) = (\varphi_{*x})^*\,\omega(\varphi(x)). \tag{7.5}$$
Note that $\varphi^*$ is defined for any differentiable map as in the case of functions, not only for diffeomorphisms (and in contrast to the situation for vector fields).

It is easy to give the expression for $\varphi^*$ in terms of compatible charts $(U, \alpha)$ of $M_1$ and $(W, \beta)$ of $M_2$. In fact, from the local expression for $\varphi_*$ we see that
$$(\varphi^*\omega)_\alpha(v) = (J_{\beta\circ\varphi\circ\alpha^{-1}}(v))^*\,\omega_\beta(\beta \circ \varphi \circ \alpha^{-1}(v)), \quad v \in \alpha(U). \tag{7.6}$$
From (7.6) we see that $\varphi^*\omega$ is smooth if $\omega$ is.

It is clear that $\varphi^*$ preserves algebraic operations:
$$\varphi^*(\omega_1 + \omega_2) = \varphi^*\omega_1 + \varphi^*\omega_2 \tag{7.7}$$
and
$$\varphi^*(f\omega) = \varphi^*[f]\,\varphi^*(\omega). \tag{7.8}$$
If $\varphi\colon M_1 \to M_2$ and $\psi\colon M_2 \to M_3$ are differentiable maps, then (4.4) and (7.5) show that
$$(\psi \circ \varphi)^*\omega = \varphi^*\psi^*\omega. \tag{7.9}$$
Let $\psi\colon M_1 \to M_2$ be a differentiable map, and let $f$ be a differentiable function on $M_2$. Then (4.6) and the definition of $df$ show that
$$d(\psi^*[f]) = \psi^*\,df. \tag{7.10}$$

Let $\varphi$ be a flow on $M$ with infinitesimal generator $X$, and let $\omega$ be a smooth linear differential form on $M$. Then the form $\varphi_t^*\omega$ is locally defined and, as in the case of functions and vector fields, the limit as $t \to 0$ of
$$\frac{\varphi_t^*\omega - \omega}{t}$$
exists. We can verify this by using (7.6) and proceeding as we did in the case of vector fields. The limit will be a smooth covector field which we shall call $D_X\omega$. We could give an expression for $D_X\omega$ in terms of a chart, just as we did for vector fields.
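The chart expression alluded to here can be sketched by repeating the vector field computation with (7.6) in place of (6.5). The following is a hedged outline rather than the authors' own derivation; the symbol $A_t$ and the component notation $X^j$, $\omega_i$ in the last line (which assumes $V = \mathbb{R}^n$) are introduced only for this sketch.

```latex
% With \Phi_\alpha(v,t) = \alpha\circ\varphi(\alpha^{-1}(v), t) and A_t = J_{\Phi_\alpha(\cdot,t)}(v),
% formula (7.6), applied to \varphi_t with the same chart on both sides, reads
\[
  (\varphi_t^*\omega)_\alpha(v) = A_t^{*}\,\omega_\alpha\bigl(\Phi_\alpha(v,t)\bigr),
  \qquad A_0 = I .
\]
% Differentiate at t = 0 by the product rule, using
% dA_t/dt|_{t=0} = d(X_\alpha)_v and \partial\Phi_\alpha/\partial t\,(v,0) = X_\alpha(v):
\[
  (D_X\omega)_\alpha(v)
    = d(\omega_\alpha)_v\bigl(X_\alpha(v)\bigr) + \bigl(d(X_\alpha)_v\bigr)^{*}\,\omega_\alpha(v) .
\]
% In components, when V = \mathbb{R}^n (notation assumed for this sketch):
\[
  (D_X\omega)_i
    = \sum_{j} X^j\,\frac{\partial \omega_i}{\partial x^j}
    + \sum_{j} \omega_j\,\frac{\partial X^j}{\partial x^i} .
\]
```

For $\omega = df$, where $\omega_i = \partial f/\partial x^i$, the component formula collapses to $\partial(\sum_j X^j\,\partial f/\partial x^j)/\partial x^i$, that is, to the $i$th component of $d(D_X f)$.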
If $f$ is a differentiable function, $\omega$ a smooth differential form, and $X$ the infinitesimal generator of $\varphi$, then
$$D_X(f\omega) = (D_X f)\omega + fD_X\omega. \tag{7.11}$$
In fact,
$$D_X(f\omega) = \lim_{t \to 0} \frac{\varphi_t^*(f\omega) - f\omega}{t} = \lim_{t \to 0} \left(\frac{\varphi_t^*[f] - f}{t}\,\varphi_t^*(\omega) + f\,\frac{\varphi_t^*\omega - \omega}{t}\right) = (D_X f)\omega + fD_X\omega.$$
If $g$ is a differentiable function on $M$, then
$$\frac{\varphi_t^*\,dg - dg}{t} = \frac{d\varphi_t^*[g] - dg}{t} = d\!\left(\frac{\varphi_t^*[g] - g}{t}\right).$$
An easy verification in terms of a chart shows that the limit of this last expression exists and is indeed $d(D_X g)$. Thus
$$D_X(df) = d(D_X f). \tag{7.12}$$
Equations (7.11) and (7.12) show that if
$$\omega = f_1\,dg_1 + \cdots + f_k\,dg_k,$$
then
$$D_X\omega = (D_X f_1)\,dg_1 + \cdots + (D_X f_k)\,dg_k + f_1\,d(D_X g_1) + \cdots + f_k\,d(D_X g_k). \tag{7.12$'$}$$

Let $\omega$ be a smooth linear differential form, and let $X$ be a smooth vector field. Then $\langle X, \omega\rangle$ is a smooth function given by
$$\langle X, \omega\rangle(x) = \langle X(x), \omega(x)\rangle.$$
Note that $\langle X, \omega\rangle$ is linear in both $X$ and $\omega$. Also observe that for any smooth function $f$ we have
$$\langle X, df\rangle = D_X f. \tag{7.13}$$

8. COMPUTATIONS WITH COORDINATES

For the remainder of this chapter we shall assume that our manifolds are finite-dimensional. Let $M$ be a differentiable manifold whose $V = \mathbb{R}^n$. If $(U, \alpha)$ is a chart of $M$, then we define the function $x_\alpha^i$ on $U$ by setting
$$x_\alpha^i(x) = i\text{th coordinate of } \alpha(x). \tag{8.1}$$
If $f$ is any differentiable function on $U$, then we can write Eq. (2.1) as
$$f(x) = f_\alpha(x_\alpha^1(x), \ldots, x_\alpha^n(x)),$$
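which we shall write as
$$f = f_\alpha(x_\alpha^1, \ldots, x_\alpha^n). \tag{8.2}$$
We define the vector field $\partial/\partial x_\alpha^i$ on $U$ by
$$\left(\frac{\partial}{\partial x_\alpha^i}\right)_{\!\alpha}(v) = \delta_i = \langle 0, \ldots, 1, \ldots, 0\rangle \quad (1 \text{ in the } i\text{th position}). \tag{8.3}$$
If $X$ is any vector field on $U$, then we have
$$X = X_\alpha^1\,\frac{\partial}{\partial x_\alpha^1} + \cdots + X_\alpha^n\,\frac{\partial}{\partial x_\alpha^n}, \tag{8.4}$$
where the functions $X_\alpha^i$ are defined by
$$(X)_\alpha(\alpha(x)) = \langle X_\alpha^1(x), \ldots, X_\alpha^n(x)\rangle. \tag{8.5}$$
Equation (8.4) allows us to regard the vector field $X$ as a "differential operator". In fact, it follows from the definitions that
$$D_X f = X_\alpha^1\,\frac{\partial f}{\partial x_\alpha^1} + \cdots + X_\alpha^n\,\frac{\partial f}{\partial x_\alpha^n}. \tag{8.6}$$
Since $x_\alpha^i$ is a differentiable function on $U$, $dx_\alpha^i$ is a differential form on $U$ and
$$(dx_\alpha^i)_\alpha(v) = \delta^i \quad\text{for all } v \in \alpha(U). \tag{8.7}$$
In particular,
$$\left\langle \frac{\partial}{\partial x_\alpha^j}, dx_\alpha^i\right\rangle = \delta_j^i. \tag{8.8}$$
If $\omega$ is a differential form on $U$, then
$$\omega = a_{1\alpha}\,dx_\alpha^1 + \cdots + a_{n\alpha}\,dx_\alpha^n, \tag{8.9}$$
where the functions $a_{i\alpha}$ are defined by
$$\omega_\alpha(\alpha(x)) = \langle a_{1\alpha}(x), \ldots, a_{n\alpha}(x)\rangle \in \mathbb{R}^{n*}. \tag{8.10}$$
It then follows from the definitions that
$$df = \frac{\partial f}{\partial x_\alpha^1}\,dx_\alpha^1 + \cdots + \frac{\partial f}{\partial x_\alpha^n}\,dx_\alpha^n. \tag{8.11}$$
Equation (8.11) has built into it the transition law for differential forms under a change of charts. In fact, if $(W, \beta)$ is a second chart, then on $U \cap W$ we have, by (8.11),
$$dx_\beta^j = \frac{\partial x_\beta^j}{\partial x_\alpha^1}\,dx_\alpha^1 + \cdots + \frac{\partial x_\beta^j}{\partial x_\alpha^n}\,dx_\alpha^n. \tag{8.12}$$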
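If we write $\omega = a_{1\beta}\,dx_\beta^1 + \cdots + a_{n\beta}\,dx_\beta^n$ and substitute (8.12), we get
$$a_{i\alpha} = \sum_j \frac{\partial x_\beta^j}{\partial x_\alpha^i}\,a_{j\beta}.$$
Now $\left[\partial x_\beta^j/\partial x_\alpha^i\right]$ is the matrix of $J_{\beta\circ\alpha^{-1}}$. If we compare with (8.10), we see that we have recovered (7.4).

Since the subscripts $\alpha$, $\beta$, etc., clutter up the formulas, we shall frequently use the following notational conventions: Instead of writing $x_\alpha^i$ we shall write $x^i$, and instead of writing $x_\beta^i$ we shall write $y^i$. Thus
$$x^i = x_\alpha^i, \qquad y^i = x_\beta^i, \qquad z^k = x_\gamma^k, \quad\text{etc.}$$
Similarly, we shall write $X^i$ for $X_\alpha^i$, $Y^i$ for $X_\beta^i$, $a_i$ for $a_{i\alpha}$, $b_i$ for $a_{i\beta}$, and so on. Then Eqs. (8.1) through (8.12) can be written as
$$x^i(x) = i\text{th coordinate of } \alpha(x), \tag{8.1$'$}$$
$$f = f_\alpha(x^1, \ldots, x^n), \tag{8.2$'$}$$
$$\left(\frac{\partial}{\partial x^i}\right)_{\!\alpha} = \delta_i, \tag{8.3$'$}$$
$$X = X^1\,\frac{\partial}{\partial x^1} + \cdots + X^n\,\frac{\partial}{\partial x^n}, \tag{8.4$'$}$$
$$(X)_\alpha(\alpha(x)) = \langle X^1(x), \ldots, X^n(x)\rangle, \tag{8.5$'$}$$
$$D_X f = X^1\,\frac{\partial f}{\partial x^1} + \cdots + X^n\,\frac{\partial f}{\partial x^n}, \tag{8.6$'$}$$
$$(dx^i)_\alpha(v) = \delta^i, \tag{8.7$'$}$$
$$\left\langle \frac{\partial}{\partial x^j}, dx^i\right\rangle = \delta_j^i, \tag{8.8$'$}$$
$$\omega = a_1\,dx^1 + \cdots + a_n\,dx^n, \tag{8.9$'$}$$
$$\omega_\alpha(\alpha(x)) = \langle a_1(x), \ldots, a_n(x)\rangle, \tag{8.10$'$}$$
$$df = \frac{\partial f}{\partial x^1}\,dx^1 + \cdots + \frac{\partial f}{\partial x^n}\,dx^n, \tag{8.11$'$}$$
$$dy^j = \frac{\partial y^j}{\partial x^1}\,dx^1 + \cdots + \frac{\partial y^j}{\partial x^n}\,dx^n. \tag{8.12$'$}$$

The formulas for "pullback" also take a simple form. Let $\psi\colon M_1 \to M_2$ be a differentiable map, and suppose that $M_1$ is $m$-dimensional and $M_2$ is $n$-dimensional. Let $(U, \alpha)$ and $(W, \beta)$ be compatible charts. Then the map $\beta \circ \psi \circ \alpha^{-1}$ is given by
$$y^i(\psi(x)) = y^i(x^1, \ldots, x^m), \quad i = 1, \ldots, n, \tag{8.13}$$
that is, by $n$ functions of $m$ real variables. If $f$ is a function on $M_2$ with
$$f = f_\beta(y^1, \ldots, y^n) \quad\text{on } W,$$
then
$$\psi^*[f] = f_\alpha(x^1, \ldots, x^m) \quad\text{on } U,$$
where
$$f_\alpha(x^1, \ldots, x^m) = f_\beta(y^1(x^1, \ldots, x^m), \ldots, y^n(x^1, \ldots, x^m)). \tag{8.14}$$
The rule for "pulling back" a differential form is also very easy. In fact, if
$$\omega = a_1\,dy^1 + \cdots + a_n\,dy^n \quad\text{on } W,$$
then $\psi^*\omega$ has the same form on $U$, where we now regard the $a$'s and $y$'s as functions of the $x$'s and expand by using (8.12). Thus
$$\psi^*\omega = \sum_{i,j} a_i\,\frac{\partial y^i}{\partial x^j}\,dx^j,$$
where $a_i = a_i(y^1(x^1, \ldots, x^m), \ldots, y^n(x^1, \ldots, x^m))$.

Let $x \in U$. Then
$$\psi_*\!\left(\frac{\partial}{\partial x^i}\right)(x) = \sum_j \frac{\partial y^j}{\partial x^i}(x)\,\frac{\partial}{\partial y^j}(\psi(x)) \tag{8.15}$$
gives the formula for $\psi_{*x}$.

EXERCISES

8.1 Let $x$ and $y$ be rectangular coordinates on $\mathbb{E}^2$, and let $(r, \theta)$ be polar "coordinates" on $\mathbb{E}^2 - \{0\}$. Express the vector fields $\partial/\partial r$ and $\partial/\partial\theta$ in terms of rectangular coordinates. Express $\partial/\partial x$ and $\partial/\partial y$ in terms of polar coordinates.

8.2 Let $x$, $y$, $z$ be rectangular coordinates on $\mathbb{E}^3$. Let
$$X = y\,\frac{\partial}{\partial z} - z\,\frac{\partial}{\partial y}, \qquad Y = z\,\frac{\partial}{\partial x} - x\,\frac{\partial}{\partial z}, \qquad\text{and}\qquad Z = x\,\frac{\partial}{\partial y} - y\,\frac{\partial}{\partial x}.$$
Compute $[X, Y]$, $[X, Z]$, and $[Y, Z]$. Note that $X$ represents the infinitesimal generator of the one-parameter group of rotations about the $x$-axis. We sometimes call $X$ the "infinitesimal rotation about the $x$-axis". We can do the same for $Y$ and $Z$.

8.3 Let
$$A = y\,\frac{\partial}{\partial z} + z\,\frac{\partial}{\partial y}, \qquad B = x\,\frac{\partial}{\partial z} + z\,\frac{\partial}{\partial x}, \qquad\text{and}\qquad C = x\,\frac{\partial}{\partial y} - y\,\frac{\partial}{\partial x}.$$
Compute $[A, B]$, $[A, C]$, and $[B, C]$. Show that $Af = Bf = Cf = 0$ if $f(x, y, z) = x^2 + y^2 - z^2$. Sketch the integral curves of each of the vector fields $A$, $B$, and $C$.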
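8.4 Let
$$D = u\,\frac{\partial}{\partial v} + v\,\frac{\partial}{\partial u}, \qquad E = u\,\frac{\partial}{\partial v} - v\,\frac{\partial}{\partial u}, \qquad\text{and}\qquad F = u\,\frac{\partial}{\partial u} - v\,\frac{\partial}{\partial v}.$$
Compute $[D, E]$, $[D, F]$, and $[E, F]$.

8.5 Let $P_1, \ldots, P_n$ be polynomials in $x^1, \ldots, x^n$ with no constant term, that is, $P_i(0, \ldots, 0) = 0$. Let
$$I = x^1\,\frac{\partial}{\partial x^1} + \cdots + x^n\,\frac{\partial}{\partial x^n} \qquad\text{and}\qquad X = P_1\,\frac{\partial}{\partial x^1} + \cdots + P_n\,\frac{\partial}{\partial x^n}.$$
Show that $[I, X] = 0$ if and only if the $P_i$'s are linear. [Hint: Consider the expansion of the $P_i$'s into homogeneous terms.]

8.6 Let $X$ and the $P_i$'s be as in Exercise 8.5, and suppose that the $P_i$'s are linear. Let
$$A = \lambda_1 x^1\,\frac{\partial}{\partial x^1} + \cdots + \lambda_n x^n\,\frac{\partial}{\partial x^n},$$
and suppose that $\lambda_i \neq \lambda_j$ for $i \neq j$. Show that $[A, X] = 0$ if and only if $P_i = \mu_i x^i$, that is,
$$X = \mu_1 x^1\,\frac{\partial}{\partial x^1} + \cdots + \mu_n x^n\,\frac{\partial}{\partial x^n}$$
for some $\mu_1, \ldots, \mu_n$.

8.7 Let $A$ be as in Exercise 8.6, and suppose, in addition, that $\lambda_i \neq \lambda_j + \lambda_r$ for any $i$, $j$, $r$. Show that if the $P_i$'s are at most quadratic, then $[A, X] = 0$ if and only if $P_i = \mu_i x^i$. Generalize this result to the case where $P_i$ can be a polynomial of degree at most $m$.

9. RIEMANN METRICS

Let $M$ be a finite-dimensional differentiable manifold. A Riemann metric, $\mathbf{m}$, on $M$ is a rule which assigns a positive definite scalar product $(\ ,\ )_{\mathbf{m},x}$ to the vector space $T_x(M)$ for each $x \in M$. We shall usually drop the subscripts $\mathbf{m}$ and $x$ when they are understood from the context. Thus if $\mathbf{m}$ is a Riemann metric on $M$, $x \in M$, and $\xi, \eta \in T_x(M)$, we shall write the scalar product of $\xi$ and $\eta$ as
$$(\xi, \eta) = (\xi, \eta)_{\mathbf{m},x}.$$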
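Let $(U, \alpha)$ be a chart of $M$. Define the functions $g_{ij}$ on $U$ by setting
$$g_{ij}(x) = \Bigl(\frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x)\Bigr), \tag{9.1}$$
so that $g_{ij} = g_{ji}$. If $\xi, \eta \in T_x(M)$ with
$$\xi = \sum_i \xi^i \frac{\partial}{\partial x^i}(x) \qquad\text{and}\qquad \eta = \sum_j \eta^j \frac{\partial}{\partial x^j}(x),$$
then
$$(\xi, \eta) = \sum_{i,j} g_{ij}(x)\, \xi^i \eta^j.$$
Since $dx^1(x), \dots, dx^n(x)$ is the basis of $T_x^*(M)$ dual to the basis $\partial/\partial x^1(x), \dots, \partial/\partial x^n(x)$, we have
$$\xi^i = \langle \xi, dx^i(x) \rangle, \qquad \eta^j = \langle \eta, dx^j(x) \rangle,$$
so that the last equation can be written as
$$(\xi, \eta)_{m,x} = \sum_{i,j} g_{ij}(x)\, \langle \xi, dx^i \rangle \langle \eta, dx^j \rangle. \tag{9.2}$$
Equation (9.2) is usually written more succinctly as
$$m \restriction U = \sum_{i,j} g_{ij}(x)\, dx^i\, dx^j. \tag{9.3}$$
[Here (9.3) is to be interpreted as a short way of writing (9.2).] Let $(W, \beta)$ be a second chart with
$$h_{kl}(x) = \Bigl(\frac{\partial}{\partial y^k}(x), \frac{\partial}{\partial y^l}(x)\Bigr), \qquad x \in W,$$
that is,
$$m \restriction W = \sum_{k,l} h_{kl}\, dy^k\, dy^l. \tag{9.4}$$
Then for $x \in U \cap W$, we have
$$\frac{\partial}{\partial x^i}(x) = \sum_k \frac{\partial y^k}{\partial x^i}(x)\, \frac{\partial}{\partial y^k}(x),$$
so
$$g_{ij}(x) = \Bigl(\frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x)\Bigr) = \sum_{k,l} \frac{\partial y^k}{\partial x^i}(x)\, \frac{\partial y^l}{\partial x^j}(x)\, h_{kl}(x),$$
that is,
$$g_{ij} = \sum_{k,l} h_{kl}\, \frac{\partial y^k}{\partial x^i}\, \frac{\partial y^l}{\partial x^j}. \tag{9.5}$$
Note that (9.5) is the answer we would get if we formally substituted (8.12) for the $dy^k$ in (9.4) and collected the coefficients of $dx^i\, dx^j$. In any event, it is clear from (9.5) that if the $h_{kl}$ are all smooth functions on $W$, then the $g_{ij}$ are smooth on $U \cap W$. In view of this we shall say that a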
Riemann metric is smooth if the functions $g_{ij}$ given by (9.3) are smooth for any chart $(U, \alpha)$ belonging to an atlas $\mathcal{A}$ of $M$. Also, if we are given functions $g_{ij} = g_{ji}$ defined for each $(U, \alpha) \in \mathcal{A}$, such that
i) $\sum g_{ij}(x)\, \xi^i \xi^j > 0$ unless $\xi = 0$, for all $x \in U$,
ii) the transition law (9.5) holds,
then the $g_{ij}$ define a Riemann metric on $M$. In the following discussion we shall assume that our Riemann metrics are smooth.

Let $\psi\colon M_1 \to M_2$ be a differentiable map, and let $m$ be a Riemann metric on $M_2$. For any $x \in M_1$ define $(\ ,\ )_{\psi^*(m),x}$ on $T_x(M_1)$ by
$$(\xi, \eta)_{\psi^*(m),x} = (\psi_*(\xi), \psi_*(\eta))_{m,\psi(x)}. \tag{9.6}$$
Note that this defines a symmetric bilinear function of $\xi$ and $\eta$. It is not necessarily positive definite, however, since it is conceivable that $\psi_*(\xi) = 0$ with $\xi \neq 0$. Thus, in general, (9.6) does not define a Riemann metric on $M_1$. For certain $\psi$ it does. A differentiable map $\psi\colon M_1 \to M_2$ is called an immersion if $\psi_{*x}$ is an injection (i.e., is one-to-one) for all $x \in M_1$. If $\psi\colon M_1 \to M_2$ is an immersion and $m$ is a Riemann metric on $M_2$, then we define the Riemann metric $\psi^*(m)$ on $M_1$ by (9.6).

Let $(U, \alpha)$ and $(W, \beta)$ be compatible charts of $M_1$ and $M_2$, and let
$$m \restriction W = \sum_{k,l} h_{kl}\, dy^k\, dy^l.$$
Then
$$\psi^*(m) \restriction U = \sum_{i,j} g_{ij}\, dx^i\, dx^j,$$
where
$$g_{ij}(x) = \Bigl(\psi_*\frac{\partial}{\partial x^i}(x), \psi_*\frac{\partial}{\partial x^j}(x)\Bigr) = \sum_{k,l} \frac{\partial y^k}{\partial x^i}(x)\, \frac{\partial y^l}{\partial x^j}(x)\, h_{kl}(\psi(x)),$$
which is just (9.5) again (with a different interpretation). Or, more succinctly,
$$\psi^*(m) \restriction U = \sum_{k,l} \psi^*(h_{kl})\, \psi^*(dy^k)\, \psi^*(dy^l).$$
Let us give some examples of these formulas. If $M = \mathbb{R}^n$, then the identity chart induces a Riemann metric on $\mathbb{R}^n$ given by
$$(dx^1)^2 + \cdots + (dx^n)^2.$$
Let us see what this looks like in terms of polar coordinates in $\mathbb{R}^2$ and $\mathbb{R}^3$.
In $\mathbb{R}^2$ if we write
$$x^1 = r\cos\theta, \qquad x^2 = r\sin\theta,$$
then
$$dx^1 = \cos\theta\, dr - r\sin\theta\, d\theta, \qquad dx^2 = \sin\theta\, dr + r\cos\theta\, d\theta,$$
so
$$(dx^1)^2 + (dx^2)^2 = dr^2 + r^2\, d\theta^2. \tag{9.7}$$
Note that (9.7) holds whenever the forms $dr$ and $d\theta$ are defined, i.e., on all of $\mathbb{R}^2 - \{0\}$. (Even though the function $\theta$ is not well defined on all of $\mathbb{R}^2 - \{0\}$, the form $d\theta$ is. In fact, we can write
$$d\theta = \frac{x^1\, dx^2 - x^2\, dx^1}{(x^1)^2 + (x^2)^2}.) \tag{9.8}$$
In $\mathbb{R}^3$ we introduce
$$x^1 = r\cos\varphi\sin\theta, \qquad x^2 = r\sin\varphi\sin\theta, \qquad x^3 = r\cos\theta.$$
Then
$$dx^1 = \cos\varphi\sin\theta\, dr - r\sin\varphi\sin\theta\, d\varphi + r\cos\varphi\cos\theta\, d\theta,$$
$$dx^2 = \sin\varphi\sin\theta\, dr + r\cos\varphi\sin\theta\, d\varphi + r\sin\varphi\cos\theta\, d\theta,$$
$$dx^3 = \cos\theta\, dr - r\sin\theta\, d\theta.$$
Thus
$$(dx^1)^2 + (dx^2)^2 + (dx^3)^2 = dr^2 + r^2\sin^2\theta\, d\varphi^2 + r^2\, d\theta^2. \tag{9.9}$$
Again, (9.9) is valid wherever the forms on the right are defined, which this time means when $(x^1)^2 + (x^2)^2 \neq 0$.

Let us consider the map $\iota$ of the unit sphere $S^2 \to \mathbb{R}^3$, which consists of regarding a point of $S^2$ as a point of $\mathbb{R}^3$. We then get an induced Riemann metric on $S^2$. Let us set $d\theta = \iota^*\, d\theta$ and $d\varphi = \iota^*\, d\varphi$, so the forms $d\theta$ and $d\varphi$ are defined on $U = S^2 - \{\langle 0, 0, 1\rangle, \langle 0, 0, -1\rangle\}$. Then on $U$ we can write (since $r = 1$ on $S^2$)
$$\iota^*\bigl((dx^1)^2 + (dx^2)^2 + (dx^3)^2\bigr) = d\theta^2 + \sin^2\theta\, d\varphi^2. \tag{9.10}$$
We now return to general considerations. Let $M$ be a differentiable manifold and let $C\colon I \to M$ be a differentiable map, where $I$ is an interval in $\mathbb{R}^1$. Let $t$ denote the coordinate of the identity chart on $I$. We shall set
$$C'(s) = C_*\Bigl(\frac{\partial}{\partial t}\Bigr)(s), \qquad s \in I,$$
so that $C'(s) \in T_{C(s)}(M)$ is the tangent vector to the curve $C$ at $s$. If $(U, \alpha)$ is a chart on $M$ and $x^1, \dots, x^n$ are the coordinate functions of $(U, \alpha)$, then if
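$C(I') \subset U$ for some $I' \subset I$,
$$\alpha \circ C = \langle x^1 \circ C, \dots, x^n \circ C \rangle,$$
so that
$$C'(t)_\alpha = \Bigl\langle \frac{d(x^1 \circ C)}{dt}, \dots, \frac{d(x^n \circ C)}{dt} \Bigr\rangle.$$
When there is no possibility of confusion, we shall omit the $\circ\, C$ and simply write $\alpha \circ C(t) = \langle x^1(t), \dots, x^n(t) \rangle$ and $C'(t)_\alpha = \langle x^{1\prime}(t), \dots, x^{n\prime}(t) \rangle$.

Now let $m$ be a Riemann metric on $M$. Then $\|C'(t)\| = (C'(t), C'(t))^{1/2}$ is a continuous function. In fact, in terms of a chart, we can write
$$\|C'(t)\| = \sqrt{\sum_{i,j} g_{ij}(C(t))\, x^{i\prime}(t)\, x^{j\prime}(t)}.$$
The integral
$$\int_I \|C'(t)\|\, dt \tag{9.11}$$
is called the length of the curve $C$. It will be defined if $\|C'(t)\|$ is integrable over $I$. This will certainly be the case if $I$ and $\|C'(t)\|$ are both bounded, for instance.

Note that the length is independent of the parametrization. More precisely, let $\varphi\colon J \to I$ be a one-to-one differentiable map, and let $C_1 = C \circ \varphi$. Then at any $\tau \in J$ we have
$$C_{1*}\Bigl(\frac{\partial}{\partial \tau}\Bigr) = C_*\Bigl(\varphi_*\frac{\partial}{\partial \tau}\Bigr) = \frac{d\varphi}{d\tau}\, C_*\Bigl(\frac{\partial}{\partial t}\Bigr),$$
that is,
$$C_1'(\tau) = \frac{d\varphi}{d\tau}\, C'(t), \qquad t = \varphi(\tau).$$
Thus
$$\|C_1'(\tau)\| = \|C'(\varphi(\tau))\|\, \Bigl|\frac{d\varphi}{d\tau}\Bigr|. \tag{9.12}$$
On the other hand, by the law for change of variable in an integral we have
$$\int_I \|C'(t)\|\, dt = \int_J \|C'(\varphi(\tau))\|\, \Bigl|\frac{d\varphi}{d\tau}\Bigr|\, d\tau = \int_J \|C_1'(\tau)\|\, d\tau$$
by (9.12).

More generally, we say that a curve $C$ defined on an interval $I$ is piecewise differentiable if
i) $C$ is continuous;
ii) $I = I_1 \cup \cdots \cup I_r$ and $C$, on each $I_j$, is the restriction of a differentiable curve defined on some interval $I_j'$ strictly containing $I_j$.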
(Thus a piecewise differentiable curve is a curve with a finite number of "corners".) If $C$ is piecewise differentiable, then $\|C'(t)\|$ is defined and continuous except at a finite number of $t$'s, where it may have a jump discontinuity. In particular, the integral (9.11) will exist and the curve will have a length.

Exercise. Let $C$ be a curve mapping $I$ onto a straight line segment in $\mathbb{R}^n$ in a one-to-one manner. Show that the length of $C$ is the same as the length of the segment.

Let $C\colon [0, 1] \to \mathbb{R}^2$ be a curve with $C(0) = 0$ and $C(1) = v \in \mathbb{R}^2$. If we use the expression (9.7), we see that
$$\int_0^1 \|C'(t)\|\, dt = \int_0^1 \sqrt{(r'(t))^2 + (r(t)\theta'(t))^2}\, dt \geq \int_0^1 r'(t)\, dt = \|v\|,$$
with equality holding if and only if $\theta' \equiv 0$ and $r' \geq 0$. Thus among all curves joining $0$ to $v$, the straight line segment has shortest length.

Similarly, on the sphere, let $C$ be any curve $C\colon [0, 1] \to S^2$ with $C(0) = \langle 0, 0, 1\rangle$ and $C(1) = p \neq \langle 0, 0, -1\rangle$, and let $\theta_1 = \theta(C(1))$. Then
$$\int_0^1 \|C'(t)\|\, dt = \int_0^1 \sqrt{(\theta'(t))^2 + \sin^2\theta\, (\varphi'(t))^2}\, dt \geq \int_0^1 |\theta'(t)|\, dt.$$
If we let $t_1$ denote the first point in $[0, 1]$ where $\theta = \theta_1$, then
$$\int_0^1 \|C'(t)\|\, dt \geq \int_0^1 |\theta'(t)|\, dt \geq \int_0^{t_1} |\theta'(t)|\, dt \geq \int_0^{t_1} \theta'(t)\, dt = \theta_1,$$
with equality only if $\varphi' \equiv 0$ and $t_1 = 1$. Thus the shortest curve joining any two points on $S^2$ is the great circle joining them.

In both examples above we were aided by a very fortuitous choice of coordinates (polar coordinates in the plane and a kind of polar coordinates on the sphere). We shall see in Section 11, Chapter 13, that this is not accidental. We shall see that on any Riemann manifold one can introduce local coordinates in terms of which it is easy to describe the curves that locally minimize length.
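The comparison between a straight segment and a competing curve can be checked numerically. The following is a minimal sketch, not part of the original text: it approximates the length integral (9.11) in polar coordinates using the form $dr^2 + r^2\, d\theta^2$ from (9.7). The function name `polar_length`, the step count, and the two sample curves are illustrative assumptions.

```python
import math

def polar_length(r, theta, n=20000):
    """Approximate the length (9.11) of t -> (r(t), theta(t)), 0 <= t <= 1,
    using the polar form dr^2 + r^2 dtheta^2 of the Euclidean metric (9.7).
    Derivatives are replaced by finite differences on n subintervals."""
    total = 0.0
    h = 1.0 / n
    for k in range(n):
        t0, t1 = k * h, (k + 1) * h
        tm = 0.5 * (t0 + t1)
        dr = (r(t1) - r(t0)) / h           # approximates r'(t)
        dth = (theta(t1) - theta(t0)) / h  # approximates theta'(t)
        total += math.sqrt(dr ** 2 + (r(tm) * dth) ** 2) * h
    return total

# Straight segment from 0 to a point v with ||v|| = 2: theta is constant.
straight = polar_length(lambda t: 2.0 * t, lambda t: 0.3)

# A hypothetical competitor with the same endpoints whose angle varies.
wiggly = polar_length(lambda t: 2.0 * t,
                      lambda t: 0.3 + math.sin(math.pi * t))

print(straight, wiggly)
```

Both curves have the same radial function, so the extra length of the second curve comes entirely from the $r^2\, d\theta^2$ term, in line with the equality condition $\theta' \equiv 0$.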
CHAPTER 10

THE INTEGRAL CALCULUS ON MANIFOLDS

In this chapter we shall study integration on manifolds. In order to develop the integral calculus, we shall have to restrict the class of manifolds under consideration. In this chapter we shall assume that all manifolds $M$ that arise satisfy the following two conditions:
1) $M$ is finite-dimensional.
2) $M$ possesses an atlas $\mathcal{A}$ containing (at most) a countable number of charts; that is, $\mathcal{A} = \{(U_i, \alpha_i)\}_{i = 1, 2, \dots}$.
Before getting down to the business of integration, there are several technical facts to be established. The first two sections will be devoted to this task.

1. COMPACTNESS

A subset $A$ of a manifold $M$ is said to be compact if it has the following property:
i) If $\{U_\iota\}$ is any collection of open sets with
$$A \subset \bigcup_\iota U_\iota,$$
there exist finitely many of the $U_\iota$, say $U_{\iota_1}, \dots, U_{\iota_r}$, such that
$$A \subset U_{\iota_1} \cup \cdots \cup U_{\iota_r}.$$
Alternatively, we can say:
ii) A set $A$ is compact if and only if for any family $\{F_\iota\}$ of closed sets such that
$$A \cap \bigcap_\iota F_\iota = \emptyset,$$
there exist finitely many of the $F_\iota$ such that
$$A \cap F_{\iota_1} \cap \cdots \cap F_{\iota_r} = \emptyset.$$
The equivalence of (i) and (ii) can be seen by taking $U_\iota$ equal to the complement of $F_\iota$.

In Section 5 of Chapter 4 we established that if $M = U$ is an open subset of $\mathbb{R}^n$, then $A \subset U$ is compact if and only if $A$ is a closed bounded subset of $\mathbb{R}^n$.
We make some further trivial remarks about compactness:
iii) If $A_1, \dots, A_r$ are compact, so is $A_1 \cup \cdots \cup A_r$.
In fact, if $\{U_\iota\}$ covers $A_1 \cup \cdots \cup A_r$, it certainly covers each $A_j$. We can thus choose for each $j$ a finite subcollection which covers $A_j$. The union of these subcollections forms a finite subcollection covering $A_1 \cup \cdots \cup A_r$.
iv) If $\psi\colon M_1 \to M_2$ is continuous and $A \subset M_1$ is compact, then $\psi[A]$ is compact.
In fact, if $\{U_\iota\}$ covers $\psi[A]$, then $\{\psi^{-1}(U_\iota)\}$ covers $A$. If the $U_\iota$ are open, so are the $\psi^{-1}(U_\iota)$, since $\psi$ is continuous. We can thus choose $\iota_1, \dots, \iota_r$ so that
$$A \subset \psi^{-1}(U_{\iota_1}) \cup \cdots \cup \psi^{-1}(U_{\iota_r}),$$
which implies that $\psi[A] \subset U_{\iota_1} \cup \cdots \cup U_{\iota_r}$.

We see from this that if $A = A_1 \cup \cdots \cup A_r$, where each $A_j$ is contained in some $W_i$, where $(W_i, \beta_i)$ is a chart, and $\beta_i(A_j)$ is a compact subset of $\mathbb{R}^n$, then $A$ is compact. In particular, the manifold $M$ itself may be compact. For instance, we can write $S^n$ as the union of the upper and lower hemispheres:
$$S^n = \{x : x^{n+1} \geq 0\} \cup \{x : x^{n+1} \leq 0\}.$$
Each hemisphere is compact. In fact, the upper hemisphere is mapped onto $\{y : \|y\| \leq 1\}$ by the map $\varphi_1$ of Section 8.1, and the lower hemisphere is mapped onto the same set by $\varphi_2$. Thus the sphere is compact.

On the other hand, a nonempty open subset of $\mathbb{R}^n$ is not compact. However, it can be written as a countable union of compact sets. In fact, if $U \subset \mathbb{R}^n$ is an open set, let
$$A_n = \{x \in U : \|x\| \leq n \text{ and } \rho(x, \partial U) \geq 1/n\}.$$
It is easy to check that $A_n$ is compact and that $\bigcup A_n = U$. In view of condition (2), we can say the same for any manifold $M$ under consideration:

Proposition 1.1. Any manifold $M$ satisfying (1) and (2) can be written as
$$M = \bigcup_{i=1}^{\infty} A_i,$$
where each $A_i \subset M$ is compact.

Proof. In fact, by (2) $M = \bigcup_j U_j$, and by the preceding discussion each $U_j$ can be written as the countable union of compact sets. Since the countable union of a countable union is still countable, we obtain Proposition 1.1. □
An immediate corollary is:

Proposition 1.2. Let $M$ be a manifold [satisfying (1) and (2)], and let $\{U_\iota\}$ be an open covering of $M$. Then we can select a countable subcollection $\{U_j\}$ such that $\bigcup U_j = M$.

Proof. Write $M = \bigcup A_r$, where $A_r$ is compact. For each $r$ we can choose finitely many $U_{r,1}, U_{r,2}, \dots, U_{r,k_r}$ so that
$$A_r \subset U_{r,1} \cup \cdots \cup U_{r,k_r}.$$
The collection $\{U_{r,k}\}_{r = 1, 2, \dots;\ k = 1, \dots, k_r}$ is a countable subcollection covering $M$. □

2. PARTITIONS OF UNITY

In the following discussion it will be convenient for us to have a method of "breaking up" functions, vector fields, etc., into "little pieces". For this purpose we introduce the following notation:

Definition 2.1. A collection $\{g_i\}$ of $C^\infty$-functions is said to be a partition of unity if
i) $g_i \geq 0$ for all $i$;
ii) $\operatorname{supp} g_i$† is compact for all $i$;
iii) each $x \in M$ has a neighborhood $V_x$ such that $V_x \cap \operatorname{supp} g_i = \emptyset$ for all but a finite number of $i$; and
iv) $\sum_i g_i(x) = 1$ for all $x \in M$.

Note that in view of (iii) the sum occurring in (iv) is actually finite, since for any $x$ all but a finite number of the $g_i(x)$ vanish. Note also that:

Proposition 2.1. If $A$ is a compact set and $\{g_i\}$ is a partition of unity, then $A \cap \operatorname{supp} g_i = \emptyset$ for all but a finite number of $i$.

Proof. In fact, each $x \in A$ has a neighborhood $V_x$ given by (iii). The sets $\{V_x\}_{x \in A}$ form an open covering of $A$. Since $A$ is compact, we can select a finite subcollection $\{V_1, \dots, V_r\}$ with $A \subset V_1 \cup \cdots \cup V_r$. Since each $V_k$ has a nonempty intersection with only finitely many of the $\operatorname{supp} g_i$, so does their union, and so a fortiori does $A$. □

† Recall that $\operatorname{supp} g$ is the closure of the set $\{x : g(x) \neq 0\}$.
Definition 2.2. Let $\{U_\iota\}$ be an open covering of $M$, and let $\{g_j\}$ be a partition of unity. We say that $\{g_j\}$ is subordinate to $\{U_\iota\}$ if for every $j$ there exists an $\iota(j)$ such that
$$\operatorname{supp} g_j \subset U_{\iota(j)}. \tag{2.1}$$

Theorem 2.1. Let $\{U_\iota\}$ be any open covering of $M$. There exists a partition of unity $\{g_j\}$ subordinate to $\{U_\iota\}$.

The proof that we shall present below is due to Bonic and Frampton.† First we introduce some preliminary notions.

The function $f$ on $\mathbb{R}$ defined by
$$f(u) = \begin{cases} e^{-1/u} & \text{if } u > 0, \\ 0 & \text{if } u \leq 0 \end{cases}$$
is $C^\infty$. For $u \neq 0$ it is clear that $f$ has derivatives of all orders. To check that $f$ is $C^\infty$ at $0$, it suffices to show that $f^{(k)}(u) \to 0$ as $u \to 0$ from the right. But $f^{(k)}(u) = P_k(1/u)\, e^{-1/u}$, where $P_k$ is a polynomial of degree $2k$. So
$$\lim_{u \to 0} f^{(k)}(u) = \lim_{s \to \infty} P_k(s)\, e^{-s} = 0,$$
since $e^s$ goes to infinity faster than any polynomial. Note that $f(u) > 0$ if and only if $u > 0$.

Now consider the function $g_a^b$ on $\mathbb{R}$ defined by
$$g_a^b(x) = f(x - a)\, f(b - x).$$
Then $g_a^b$ is $C^\infty$ and nonnegative and $g_a^b(x) \neq 0$ if and only if $a < x < b$. More generally, if $a = \langle a^1, \dots, a^k \rangle$ and $b = \langle b^1, \dots, b^k \rangle$, define the function $g_a^b$ on $\mathbb{R}^k$ by setting
$$g_a^b(x) = g_{a^1}^{b^1}(x^1)\, g_{a^2}^{b^2}(x^2) \cdots g_{a^k}^{b^k}(x^k),$$
where $x = \langle x^1, \dots, x^k \rangle$. Then $g_a^b \geq 0$, $g_a^b \in C^\infty$, and
$$g_a^b(x) > 0 \quad\text{if and only if}\quad a^1 < x^1 < b^1, \dots, a^k < x^k < b^k. \tag{2.2}$$

Lemma. Let $f_1, \dots, f_k$ be $C^\infty$-functions on a manifold $M$, and let
$$W = \{x : a^1 < f_1(x) < b^1, \dots, a^k < f_k(x) < b^k\}.$$
There exists a nonnegative $C^\infty$-function $g$ such that $W = \{x : g(x) > 0\}$.

In fact, if we define $g$ by
$$g(x) = g_a^b(f_1(x), \dots, f_k(x)),$$
then it is clear that $g$ has the desired properties.

† Smooth functions on Banach manifolds, J. Math. and Mech. 15, 877-898 (1966).
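The functions $f$ and $g_a^b$ just described are easy to evaluate directly. The following is a minimal sketch, not taken from the text; the names `f`, `g_interval`, and `g_box` are illustrative. One caveat of floating-point arithmetic is noted in the comments: for very small positive $u$ the value $e^{-1/u}$ underflows to `0.0`, although it is positive mathematically.

```python
import math

def f(u):
    """f(u) = exp(-1/u) for u > 0 and 0 for u <= 0.
    Note: for tiny positive u, exp(-1/u) underflows to 0.0 in floating point."""
    return math.exp(-1.0 / u) if u > 0 else 0.0

def g_interval(a, b, x):
    """One-dimensional bump f(x - a) f(b - x): nonzero exactly on a < x < b."""
    return f(x - a) * f(b - x)

def g_box(a, b, x):
    """Product of one-dimensional bumps over the coordinates of x.
    It is positive exactly on the open box a^i < x^i < b^i, as in (2.2)."""
    value = 1.0
    for ai, bi, xi in zip(a, b, x):
        value *= g_interval(ai, bi, xi)
    return value

print(g_interval(0.0, 1.0, 0.5))                 # inside the interval: positive
print(g_interval(0.0, 1.0, 1.5))                 # outside: 0.0
print(g_box((0.0, 0.0), (1.0, 1.0), (0.5, 0.5)))  # inside the square: positive
print(g_box((0.0, 0.0), (1.0, 1.0), (0.5, 1.0)))  # on the boundary: 0.0
```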
We now turn to the proof of Theorem 2.1.

Proof. For each $x \in M$ choose a $U_\iota$ containing $x$ and a chart $(U, \alpha)$ about $x$. Then $\alpha(U \cap U_\iota)$ is an open set containing $\alpha(x)$ in $\mathbb{R}^n$. Choose $a$ and $b$ such that
$$\alpha(x) \in \operatorname{int} \square_a^b \qquad\text{and}\qquad \overline{\square_a^b} \subset \alpha(U \cap U_\iota).$$
Let $W_x = \alpha^{-1}(\operatorname{int} \square_a^b)$. Then
$$\overline{W}_x \subset U_\iota \quad\text{and}\quad \overline{W}_x \text{ is compact.} \tag{2.3}$$
Also, if $x^1, \dots, x^n$ are the coordinates given by $\alpha$,
$$W_x = \{y : a^1 < x^1(y) < b^1, \dots, a^n < x^n(y) < b^n\}.$$
By our lemma we can find a nonnegative $C^\infty$-function $f_x$ such that
$$W_x = \{y : f_x(y) > 0\}.$$
Since $x \in W_x$, the $\{W_x\}$ cover $M$. By Proposition 1.2 we can select a countable subcovering $\{W_i\}$. Let us denote the corresponding functions by $f_i$, that is, if $W_i = W_x$, we set $f_i = f_x$. Let
$$V_1 = W_1 = \{x : f_1(x) > 0\},$$
$$V_2 = \{x : f_2(x) > 0,\ f_1(x) < \tfrac12\},$$
$$\vdots$$
$$V_r = \{x : f_r(x) > 0,\ f_1(x) < 1/r, \dots, f_{r-1}(x) < 1/r\}.$$
It is clear that $V_j$ is open and that $V_j \subset W_j$, so that, by (2.3),
$$\overline{V}_j \text{ is compact and } \overline{V}_j \subset U_\iota \tag{2.4}$$
for some $\iota = \iota(j)$.

For each $x \in M$ let $q(x)$ denote the first integer $q$ for which $f_q(x) > 0$. Thus $f_p(x) = 0$ if $p < q(x)$ and $f_{q(x)}(x) > 0$. Let $V_x = \{y : f_{q(x)}(y) > \tfrac12 f_{q(x)}(x)\}$. Since $f_{q(x)}(x) > 0$, it follows that $x \in V_x$ and $V_x$ is open. Furthermore,
$$V_x \cap V_r = \emptyset \quad\text{if } r > q(x) \text{ and } 1/r < \tfrac12 f_{q(x)}(x). \tag{2.5}$$
According to the lemma, each set $V_i$ can be given as $V_i = \{x : \tilde g_i(x) > 0\}$, where $\tilde g_i$ is a suitable $C^\infty$-function. Let $g = \sum \tilde g_i$. In view of (2.5) this is really a finite sum in the neighborhood of any $x$. Thus $g$ is $C^\infty$. Now $\tilde g_{q(x)}(x) > 0$, since $x \in V_{q(x)}$. Thus $g > 0$. Set
$$g_j = \frac{\tilde g_j}{g}.$$
We claim that $\{g_j\}$ is the desired partition of unity. In fact, (i) holds by our construction, (ii) and (2.1) follow from (2.4), (iii) follows from (2.5), and (iv) holds by construction. □
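The final normalization step of the proof, dividing each bump by the sum of all of them, can be illustrated in one dimension. The following is a simplified sketch under stated assumptions, not the construction of the proof itself: the cover is a hypothetical finite family of open intervals covering $[0, 3]$, so local finiteness is automatic and the sets $V_r$ are not needed.

```python
import math

def f(u):
    """f(u) = exp(-1/u) for u > 0 and 0 for u <= 0."""
    return math.exp(-1.0 / u) if u > 0 else 0.0

def bump(a, b, x):
    """Smooth nonnegative function that is positive exactly on a < x < b."""
    return f(x - a) * f(b - x)

# Hypothetical finite open cover of the interval [0, 3].
cover = [(-0.5, 1.0), (0.5, 2.0), (1.5, 3.5)]

def partition(x):
    """Values g_1(x), g_2(x), g_3(x) of the normalized bumps at a point of [0, 3]."""
    raw = [bump(a, b, x) for a, b in cover]
    total = sum(raw)  # positive because the intervals cover [0, 3]
    return [value / total for value in raw]

for x in (0.0, 0.75, 1.25, 1.75, 3.0):
    values = partition(x)
    print(x, values, sum(values))
```

Each $g_j$ vanishes outside its own interval, which is the one-dimensional analogue of being subordinate to the cover.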
3. DENSITIES

If we regard $\mathbb{R}^n$ as a differentiable manifold, then the law for change of variables for an integral shows that the integrand does not have the same transition law as that of a function under change of chart. For this reason we cannot expect to integrate functions on a manifold. We now introduce the type of object that we can integrate.

Definition 3.1. A density $\rho$ is a rule which assigns to each chart $(U, \alpha)$ of $M$ a function $\rho_\alpha$ defined on $\alpha(U)$ subject to the following transition law: If $(W, \beta)$ is a second chart of $M$, then
$$\rho_\alpha(v) = \rho_\beta(\beta \circ \alpha^{-1}(v))\, |\det J_{\beta \circ \alpha^{-1}}(v)| \qquad\text{for } v \in \alpha(U \cap W). \tag{3.1}$$

If $\mathcal{A}$ is an atlas of $M$ and functions $\rho_{\alpha_i}$ are given for all $(U_i, \alpha_i) \in \mathcal{A}$ satisfying (3.1), then the $\rho_{\alpha_i}$ define a density $\rho$ on $M$. In fact, if $(U, \alpha)$ is any chart of $M$ (not necessarily belonging to $\mathcal{A}$), define $\rho_\alpha$ by
$$\rho_\alpha(v) = \rho_{\alpha_i}(\alpha_i \circ \alpha^{-1}(v))\, |\det J_{\alpha_i \circ \alpha^{-1}}(v)| \qquad\text{if } v \in \alpha(U \cap U_i).$$
This definition is consistent: If $v \in \alpha(U \cap U_i) \cap \alpha(U \cap U_j)$, then by (3.1),
$$\rho_{\alpha_j}(\alpha_j \circ \alpha^{-1}(v))\, |\det J_{\alpha_j \circ \alpha^{-1}}(v)| = \rho_{\alpha_i}\bigl(\alpha_i \circ \alpha_j^{-1}(\alpha_j \circ \alpha^{-1}(v))\bigr)\, |\det J_{\alpha_i \circ \alpha_j^{-1}}(\alpha_j \circ \alpha^{-1}(v))|\, |\det J_{\alpha_j \circ \alpha^{-1}}(v)|$$
$$= \rho_{\alpha_i}(\alpha_i \circ \alpha^{-1}(v))\, |\det J_{\alpha_i \circ \alpha^{-1}}(v)|$$
by the chain rule and the multiplicative property of determinants.

In view of (3.1) it makes sense to talk about local smoothness properties of densities. We will say that a density $\rho$ is $C^k$ if for any chart $(U, \alpha)$ the function $\rho_\alpha$ is $C^k$. As usual, it suffices to verify this for all charts $(U, \alpha)$ belonging to some atlas. Similarly, we say that a density $\rho$ is locally absolutely integrable if for any chart $(U, \alpha)$ the function $\rho_\alpha$ is absolutely integrable. By the last proposition of Chapter 8 this is again independent of the choice of atlases.

Let $\rho$ be a density on $M$, and let $x$ be a point of $M$. It does not make sense to talk about the value of $\rho$ at $x$. However, (3.1) shows that it does make sense to talk about the sign of $\rho$ at $x$. More precisely, we say that $\rho > 0$ at $x$ if
$$\rho_\alpha(\alpha(x)) > 0 \tag{3.2}$$
for a chart $(U, \alpha)$ about $x$.
Equation (3.1) shows that if $\rho_\alpha(\alpha(x)) > 0$, then $\rho_\beta(\beta(x)) > 0$ for any other chart $(W, \beta)$ about $x$. Similarly, it makes sense to say that $\rho < 0$ at $x$, $\rho \geq 0$ at $x$, or $\rho \neq 0$ at $x$.

Definition 3.2. Let $\rho$ be a density on $M$. By the support of $\rho$, denoted by $\operatorname{supp} \rho$, we shall mean the closure of the set of points of $M$ at which $\rho$ does not vanish. That is,
$$\operatorname{supp} \rho = \overline{\{x : \rho \neq 0 \text{ at } x\}}.$$
Let $\rho_1$ and $\rho_2$ be densities. We define their sum by setting
$$(\rho_1 + \rho_2)_\alpha = \rho_{1\alpha} + \rho_{2\alpha} \tag{3.3}$$
for any chart $(U, \alpha)$. It is immediate that the right-hand side of (3.3) satisfies the transition law (3.1), and so defines a density on $M$.

Let $\rho$ be a density, and let $f$ be a function. We define the density $f\rho$ by
$$(f\rho)_\alpha = f_\alpha\, \rho_\alpha. \tag{3.4}$$
Again, the verification of (3.1) is immediate in view of the transition laws for functions. It is clear that
$$\operatorname{supp}(\rho_1 + \rho_2) \subset \operatorname{supp} \rho_1 \cup \operatorname{supp} \rho_2 \tag{3.5}$$
and
$$\operatorname{supp}(f\rho) = \operatorname{supp} f \cap \operatorname{supp} \rho. \tag{3.6}$$
We shall write $\rho_1 \leq \rho_2$ at $x$ if $\rho_2 - \rho_1 \geq 0$ at $x$, and $\rho_1 \leq \rho_2$ if $\rho_1 \leq \rho_2$ at all $x \in M$.

Let $P$ denote the space of locally absolutely integrable densities of compact support. We observe that $P$ is a vector space and that the product $f\rho$ belongs to $P$ if $f$ is a (bounded) locally contented function and $\rho \in P$.

Theorem 3.1. There exists a unique linear function $\int$ on $P$ satisfying the following condition: If $\rho \in P$ is such that $\operatorname{supp} \rho \subset U$, where $(U, \alpha)$ is a chart of $M$, then
$$\int \rho = \int_{\alpha(U)} \rho_\alpha. \tag{3.7}$$

Proof. We first show that there is at most one linear function satisfying (3.7). Let $\mathcal{A}$ be an atlas of $M$, and let $\{g_j\}$ be a partition of unity subordinate to $\mathcal{A}$. For each $j$ choose an $i(j)$ so that $\operatorname{supp} g_j \subset U_{i(j)}$. Write $\rho = 1 \cdot \rho = \sum g_j \cdot \rho$. Since $\operatorname{supp} \rho$ is compact, only finitely many of the terms $g_j \rho$ are not identically zero. Thus the sum is finite. Since $\int$ is linear,
$$\int \rho = \int \sum g_j \rho = \sum \int g_j \rho.$$
By (3.7),
$$\int g_j \rho = \int_{\alpha_{i(j)}(U_{i(j)})} (g_j \rho)_{\alpha_{i(j)}}.$$
Thus
$$\int \rho = \sum_j \int_{\alpha_{i(j)}(U_{i(j)})} (g_j \rho)_{\alpha_{i(j)}}. \tag{3.8}$$
Thus $\int$, if it exists, must be given by (3.8). To establish the existence of $\int$,
we must show that (3.8) defines a linear function on $P$ satisfying (3.7). The linearity is obvious; we must verify (3.7). Suppose $\operatorname{supp} \rho \subset U$ for some chart $(U, \alpha)$. We must show that
$$\int_{\alpha(U)} \rho_\alpha = \sum_j \int_{\alpha_{i(j)}(U_{i(j)})} (g_j \rho)_{\alpha_{i(j)}}.$$
Since $\rho = \sum g_j \rho$ and therefore $\rho_\alpha = \sum (g_j \rho)_\alpha$, it suffices to show that
$$\int_{\alpha(U)} (g_j \rho)_\alpha = \int_{\alpha_i(U_i)} (g_j \rho)_{\alpha_i}, \tag{3.9}$$
where $\operatorname{supp} g_j \rho \subset U \cap U_i$. By (3.1),
$$(g_j \rho)_\alpha(v) = (g_j \rho)_{\alpha_i}(\alpha_i \circ \alpha^{-1}(v))\, |\det J_{\alpha_i \circ \alpha^{-1}}(v)|,$$
so that (3.9) holds by the transformation law for integrals in $\mathbb{R}^n$. □

We can derive a number of useful properties of the integral from the formula (3.8):
$$\text{if } \rho_1 \leq \rho_2, \text{ then } \int \rho_1 \leq \int \rho_2. \tag{3.10}$$
In fact, since $g_j \geq 0$, we have $(g_j \rho_1)_\alpha \leq (g_j \rho_2)_\alpha$ for any chart $(U, \alpha)$. Thus (3.10) follows from the corresponding fact on $\mathbb{R}^n$ if we use (3.8).

Let us say that a set $A$ has content zero if $A \subset A_1 \cup \cdots \cup A_p$, where each $A_i$ is compact, $A_i \subset U_i$ for some chart $(U_i, \alpha_i)$, and $\alpha_i(A_i)$ has content zero in $\mathbb{R}^n$. It is easy to see that the union of any finite number of sets of content zero has content zero. It is also clear that the function $e_A$ is contented.

Let us call a set $B \subset M$ contented if the function $e_B$ is contented. For any $\rho \in P$ we define $\int_B \rho$ by
$$\int_B \rho = \int e_B\, \rho. \tag{3.11}$$
It follows from (3.8) that
$$\int_A \rho = 0$$
for any $\rho \in P$ if $A$ has content zero. We can thus ignore sets of content zero for the purpose of integration. In practice, one usually takes advantage of this when computing integrals, rather than using (3.8). For instance, in computing an integral over $S^n$, we can "ignore" any meridian: for example, if
$$A = \{x \in S^n : x = (t, 0, \dots, 0, \pm\sqrt{1 - t^2}) \in \mathbb{R}^{n+1}\},$$
then
$$\int_{S^n} \rho = \int_{S^n - A} \rho$$
for any $\rho$. This means that we can compute $\int_{S^n} \rho$ by introducing polar coordinates (Fig. 10.1) and expressing $\rho$ in terms of them. Thus in $S^2$, if $U = S^2 - A$ and $\alpha$ is the polar coordinate chart on $U$, then
$$\int_{S^2} \rho = \int_0^{2\pi} \int_0^{\pi} \rho_\alpha\, d\theta\, d\varphi.$$
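As a numerical illustration of computing an integral over $S^2$ in a single polar chart, the sketch below applies a midpoint rule to the double integral over $0 < \theta < \pi$, $0 < \varphi < 2\pi$. It is not part of the original text. The sample density is an assumption chosen for the test: its local expression is taken to be $\rho_\alpha(\theta, \varphi) = \sin\theta$, which is $|\det(g_{ij})|^{1/2}$ for the metric $d\theta^2 + \sin^2\theta\, d\varphi^2$ of (9.10), so the result should be close to the area $4\pi$ of the unit sphere. The function name `integrate_polar` and the grid sizes are illustrative.

```python
import math

def integrate_polar(rho_alpha, n_theta=400, n_phi=400):
    """Midpoint rule for the double integral of rho_alpha(theta, phi)
    over 0 < theta < pi, 0 < phi < 2*pi. The omitted boundary of the
    parameter rectangle corresponds to a set of content zero on the sphere."""
    dth = math.pi / n_theta
    dph = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * dth
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            total += rho_alpha(th, ph)
    return total * dth * dph

# Assumed sample density: local expression sin(theta) in the polar chart.
area = integrate_polar(lambda th, ph: math.sin(th))
print(area, 4.0 * math.pi)
```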
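Fig. 10.1

It is worth observing that if $N$ is a differentiable manifold of dimension less than $\dim M$ and $\psi$ is a differentiable map of $N \to M$, then Proposition 7.3 of Chapter 8 implies that if $A$ is any compact subset of $N$, then $\psi(A)$ has content zero in $M$. In this sense, one can ignore "lower-dimensional sets" when integrating on $M$.

4. VOLUME DENSITY OF A RIEMANN METRIC

Let $M$ be a differentiable manifold with a Riemann metric $m$. We define the density $\sigma$ $[= \sigma(m)]$ as follows. For each chart $(U, \alpha)$ with coordinates $x^1, \dots, x^n$ let
$$\sigma_\alpha(\alpha(x)) = \Bigl|\det\Bigl(\Bigl(\frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x)\Bigr)\Bigr)\Bigr|^{1/2} = |\det(g_{ij}(x))|^{1/2}. \tag{4.1}$$
Here
$$\Bigl(\Bigl(\frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x)\Bigr)\Bigr)$$
is the matrix whose $ij$th entry is the scalar product of the vectors
$$\frac{\partial}{\partial x^i}(x) \qquad\text{and}\qquad \frac{\partial}{\partial x^j}(x),$$
so that (in view of Exercise 8.1 of Chapter 8)

$\sigma_\alpha(\alpha(x))$ = volume of the parallelepiped spanned by $(\partial/\partial x^i)(x)$ with respect to the Euclidean metric $(\ ,\ )_{m,x}$ on $T_x(M)$.

It is easy to see that (4.1) actually defines a density. Let $(W, \beta)$ be a second chart about $x$ with coordinates $y^1, \dots, y^n$. Then
$$\frac{\partial}{\partial y^k}(x) = \sum_i \frac{\partial x^i}{\partial y^k}(x)\, \frac{\partial}{\partial x^i}(x),$$
so that
$$\sigma_\beta(\beta(x)) = \Bigl|\det\Bigl(\Bigl(\frac{\partial}{\partial y^k}(x), \frac{\partial}{\partial y^l}(x)\Bigr)\Bigr)\Bigr|^{1/2}.$$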
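Now
$$\Bigl(\frac{\partial}{\partial y^k}, \frac{\partial}{\partial y^l}\Bigr) = \sum_{i,j} \frac{\partial x^i}{\partial y^k}\, \frac{\partial x^j}{\partial y^l}\, \Bigl(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\Bigr)$$
for all $k$ and $l$. We can write this as the matrix equation
$$\Bigl[\Bigl(\frac{\partial}{\partial y^k}, \frac{\partial}{\partial y^l}\Bigr)\Bigr] = \Bigl[\frac{\partial x^i}{\partial y^k}\Bigr]^{T} \Bigl[\Bigl(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\Bigr)\Bigr] \Bigl[\frac{\partial x^j}{\partial y^l}\Bigr],$$
where the first factor is the transpose of the last, so that
$$\sigma_\beta(\beta(x)) = \Bigl|\det\Bigl[\frac{\partial x^i}{\partial y^k}\Bigr]\, \det\Bigl[\Bigl(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\Bigr)\Bigr]\, \det\Bigl[\frac{\partial x^j}{\partial y^l}\Bigr]\Bigr|^{1/2} = \sigma_\alpha(\alpha(x))\, \Bigl|\det\Bigl[\frac{\partial x^i}{\partial y^k}\Bigr]\Bigr|,$$
which is the transition law (3.1).

If $M$ is an open subset of Euclidean space with the Euclidean metric, then the volume density, when integrated over any contented set, yields the ordinary Euclidean volume of that set. In fact, if $x^1, \dots, x^n$ are orthonormal coordinates corresponding to the identity chart, then $g_{ij}(x) = 0$ if $i \neq j$ and $g_{ii} = 1$, so that $\sigma_{\mathrm{id}} \equiv 1$ and thus
$$\int_A \sigma = \int_A 1 = \mu(A).$$
More generally, let $\varphi$ be an immersion of a $k$-dimensional manifold $M$ into $\mathbb{R}^n$ such that $\varphi(M)$ is an open subset of a $k$-dimensional hyperplane in $\mathbb{R}^n$, and let $m$ be the Riemann metric induced on $M$ by $\varphi$. Then, if $\sigma$ denotes the corresponding volume density, $\int_A \sigma$ is the $k$-dimensional Euclidean volume of $\varphi(A)$. In fact, by a Euclidean motion, we may assume that $\varphi$ maps $M$ into $\mathbb{R}^k \subset \mathbb{R}^n$. Then, since $\varphi$ is an immersion and $M$ is $k$-dimensional, we can use $x^1, \dots, x^k$ as coordinates on $M$ and conclude, as before, that $\sigma$ is given by the function $1$ in terms of these coordinates, and hence that $\int_A \sigma = \mu(\varphi(A))$.

Now let $\varphi_1$ and $\varphi_2$ be two immersions of $M \to \mathbb{R}^n$. Let $(U, \alpha)$ be a coordinate chart on $M$ with coordinates $y^1, \dots, y^k$. If $m_1$ is the Riemann metric induced by $\varphi_1$ and $m_2$ the one induced by $\varphi_2$, then
$$\Bigl(\frac{\partial}{\partial y^i}, \frac{\partial}{\partial y^j}\Bigr)_{m_1} = \Bigl(\frac{\partial \varphi_1}{\partial y^i}, \frac{\partial \varphi_1}{\partial y^j}\Bigr) \qquad\text{and}\qquad \Bigl(\frac{\partial}{\partial y^i}, \frac{\partial}{\partial y^j}\Bigr)_{m_2} = \Bigl(\frac{\partial \varphi_2}{\partial y^i}, \frac{\partial \varphi_2}{\partial y^j}\Bigr),$$
where the scalar product on the right is the Euclidean scalar product. Let $\sigma_1$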
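Fig. 10.2

and $\sigma_2$ be the volume densities corresponding to $m_1$ and $m_2$. Then
$$\sigma_{1\alpha} = \Bigl|\det\Bigl(\Bigl(\frac{\partial \varphi_1}{\partial y^i}, \frac{\partial \varphi_1}{\partial y^j}\Bigr)\Bigr)\Bigr|^{1/2} \qquad\text{and}\qquad \sigma_{2\alpha} = \Bigl|\det\Bigl(\Bigl(\frac{\partial \varphi_2}{\partial y^i}, \frac{\partial \varphi_2}{\partial y^j}\Bigr)\Bigr)\Bigr|^{1/2}.$$
In particular, given an $L > 0$, there is a $K = K(k, n, L)$ such that if
$$\Bigl\|\frac{\partial \varphi_1}{\partial y^i}\Bigr\| < L \qquad\text{and}\qquad \Bigl\|\frac{\partial \varphi_2}{\partial y^i}\Bigr\| < L \qquad\text{for all } i = 1, \dots, k,$$
then, by the mean-value theorem,
$$|\sigma_{1\alpha} - \sigma_{2\alpha}| < K\Bigl(\Bigl\|\frac{\partial \varphi_1}{\partial y^1} - \frac{\partial \varphi_2}{\partial y^1}\Bigr\| + \cdots + \Bigl\|\frac{\partial \varphi_1}{\partial y^k} - \frac{\partial \varphi_2}{\partial y^k}\Bigr\|\Bigr).$$
Roughly speaking, this means that if $\varphi_1$ and $\varphi_2$ are close, in the sense that their derivatives are close, then the densities they induce are close. We apply this remark to the following situation. We let $\varphi_1$ be an immersion of $M$ into $\mathbb{R}^n$ and let $(W, \alpha)$ be some chart of $M$ with coordinates $y^1, \dots, y^k$. We let $U = W - C = \bigcup_l U_l$, where $C$ is some closed set of content zero and such that $U_l \cap U_{l'} = \emptyset$ if $l \neq l'$. (See Fig. 10.2.) For each $l$ let $z_l$ be a point of $U_l$ whose coordinates are $\langle y_l^1, \dots, y_l^k \rangle$, and for $z = \langle y^1, \dots, y^k \rangle \in U_l$, define $\varphi_2$ by setting
$$\varphi_2(z) = \varphi_1(z_l) + \sum_{i=1}^{k} (y^i - y_l^i)\, \frac{\partial \varphi_1}{\partial y^i}(z_l).$$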
If the $U_l$'s are sufficiently small, then
$$\Bigl\|\frac{\partial \varphi_2}{\partial y^i} - \frac{\partial \varphi_1}{\partial y^i}\Bigr\|$$
will be small. More generally, we could choose $\varphi_2$ to be any affine linear map approximating $\varphi_1$ on each $U_l$. We thus see that the volume of $W$ in terms of the Riemann metric induced by $\varphi_1$ is the limit of the (surface) volume of polyhedra approximating $\varphi_1(W)$. Here the approximation must be in the sense of slope (i.e., the derivatives must be close) and not merely in the sense of position.

The construction of the volume density can be generalized and suggests an alternative definition of the notion of density. In fact, let $\rho$ be a rule which assigns to each $x$ in $M$ a function, $\rho_x$, on $n$ tangent vectors in $T_x(M)$ subject to the rule
$$\rho_x(A\xi_1, \dots, A\xi_n) = |\det A|\, \rho_x(\xi_1, \dots, \xi_n), \tag{4.2}$$
where $\xi_i \in T_x(M)$ and $A\colon T_x(M) \to T_x(M)$ is a linear transformation. Then we see that $\rho$ determines a density by setting
$$\rho_\alpha(\alpha(x)) = \rho_x\Bigl(\frac{\partial}{\partial u^1}(x), \dots, \frac{\partial}{\partial u^n}(x)\Bigr) \tag{4.3}$$
if $(U, \alpha)$ is a chart with coordinates $u^1, \dots, u^n$. The fact that (4.3) defines a density follows immediately from (4.2) and the transformation law for the $\partial/\partial u^i$ under change of coordinates. Conversely, given a density $\rho$ in terms of the $\rho_\alpha$, define $\rho(\partial/\partial u^1, \dots, \partial/\partial u^n)$ by (4.3). Since the vectors $\{\partial/\partial u^i\}_{i = 1, \dots, n}$ form a basis at each $x$ in $U$, any $\xi_1, \dots, \xi_n$ in $T_x(M)$ can be written as
$$\xi_i = B\, \frac{\partial}{\partial u^i}(x),$$
where $B$ is a linear transformation of $T_x(M)$ into itself. Then (4.2) determines $\rho(\xi_1, \dots, \xi_n)$ as
$$\rho(\xi_1, \dots, \xi_n) = |\det B|\, \rho_\alpha(\alpha(x)). \tag{4.4}$$
That this definition is consistent (i.e., doesn't depend on $\alpha$) follows from (4.2) and the transformation law (3.1) for densities.

EXERCISES

4.1 Let $M = S^1 \times S^1$ be the torus, and let $\varphi\colon M \to \mathbb{E}^4$ be given by
$$x^1 \circ \varphi(\theta^1, \theta^2) = \cos\theta^1, \qquad x^2 \circ \varphi(\theta^1, \theta^2) = \sin\theta^1,$$
$$x^3 \circ \varphi(\theta^1, \theta^2) = 2\cos\theta^2, \qquad x^4 \circ \varphi(\theta^1, \theta^2) = 2\sin\theta^2,$$
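where $x^1, \dots, x^4$ are the rectangular coordinates on $\mathbb{E}^4$ and $\theta^1, \theta^2$ are angular coordinates on $M$.
a) Express the Riemann metric induced on $M$ by $\varphi$ (from the Euclidean metric on $\mathbb{E}^4$) in terms of the coordinates $\theta^1, \theta^2$. [That is, compute the $g_{ij}(\theta^1, \theta^2)$.]
b) What is the volume of $M$ relative to this Riemann metric?

4.2 Consider the Riemann metric induced on $S^1 \times S^1$ by the immersion $\varphi$ into $\mathbb{E}^3$ given by
$$x \circ \varphi(u, v) = (a - \cos u)\cos v, \qquad y \circ \varphi(u, v) = (a - \cos u)\sin v, \qquad z \circ \varphi(u, v) = \sin u,$$
where $u$ and $v$ are angular coordinates and $a > 2$. What is the total surface area of $S^1 \times S^1$ under this metric?

4.3 Let $\varphi$ map a region $U$ of the $xy$-plane into $\mathbb{E}^3$ by the formula
$$\varphi(x, y) = (x, y, F(x, y)),$$
so that $\varphi(U)$ is the surface $z = F(x, y)$. (See Fig. 10.3.) Show that the area of this surface is given by
$$\int_U \sqrt{1 + \Bigl(\frac{\partial F}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial F}{\partial y}\Bigr)^2}.$$

Fig. 10.3

4.4 Find the area of the paraboloid $z = x^2 + y^2$ for $x^2 + y^2 \leq 1$.

4.5 Let $U \subset \mathbb{R}^2$, and let $\varphi\colon U \to \mathbb{E}^3$ be given by $\varphi(u, v) = (x(u, v), y(u, v), z(u, v))$, where $x, y, z$ are rectangular coordinates on $\mathbb{E}^3$. Show the area of the surface $\varphi(U)$ is given by
$$\int_U \sqrt{\Bigl(\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}\Bigr)^2 + \Bigl(\frac{\partial y}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial y}{\partial v}\frac{\partial z}{\partial u}\Bigr)^2 + \Bigl(\frac{\partial x}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial z}{\partial u}\Bigr)^2}.$$

4.6 Compute the surface area of the unit sphere in $\mathbb{E}^3$.

4.7 Let $M_1$ and $M_2$ be differentiable manifolds, and let $\sigma$ be a density on $M_2$ which is nowhere zero. For each density $\rho$ on $M_1 \times M_2$, each product chart $(U_1 \times U_2, \alpha_1 \times \alpha_2)$, and each $x_2 \in U_2$, define the function $\rho_{1\alpha_1}(\cdot, x_2)$ by
$$\rho_{1\alpha_1}(v_1, x_2)\, \sigma_{\alpha_2}(\alpha_2(x_2)) = \rho_{\alpha_1 \times \alpha_2}(v_1, \alpha_2(x_2))$$
for all $v_1 \in \alpha_1(U_1)$.
a) Show that $\rho_{1\alpha_1}(v_1, x_2)$ is independent of the chart $(U_2, \alpha_2)$.
b) Show that for each fixed $x_2 \in M_2$ the functions $\rho_{1\alpha_1}(\cdot, x_2)$ define a density on $M_1$. We shall call this density $\rho_1(x_2)$.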
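c) Show that if $\rho$ is a smooth density of compact support on $M_1 \times M_2$ and $\sigma$ is smooth, then $\rho_1(x_2)$ is a smooth density of compact support on $M_1$.
d) Let $\rho$ be as in (c). Define the function $F_\rho$ on $M_2$ by
$$F_\rho(x_2) = \int_{M_1} \rho_1(x_2).$$
Sketch how you would prove the fact that $F_\rho$ is a smooth function of compact support on $M_2$ and that
$$\int_{M_1 \times M_2} \rho = \int_{M_2} F_\rho\, \sigma.$$

5. PULLBACK AND LIE DERIVATIVES OF DENSITIES

Let $\varphi\colon M_1 \to M_2$ be a diffeomorphism, and let $\rho$ be a density on $M_2$. Define the density $\varphi^*\rho$ on $M_1$ by
$$\varphi^*\rho(\xi_1, \dots, \xi_n) = \rho(\varphi_*\xi_1, \dots, \varphi_*\xi_n) \tag{5.1}$$
for $\xi_i \in T_x(M_1)$ and $\varphi_* = \varphi_{*x}$. To show that $\varphi^*\rho$ is actually a density, we must check that (4.2) holds for any linear transformation $A$ of $T_x(M_1)$. But
$$\varphi^*\rho(A\xi_1, \dots, A\xi_n) = \rho(\varphi_* A\xi_1, \dots, \varphi_* A\xi_n)$$
$$= \rho(\varphi_* A \varphi_*^{-1} \varphi_*\xi_1, \dots, \varphi_* A \varphi_*^{-1} \varphi_*\xi_n)$$
$$= |\det \varphi_* A \varphi_*^{-1}|\, \rho(\varphi_*\xi_1, \dots, \varphi_*\xi_n)$$
$$= |\det A|\, \varphi^*\rho(\xi_1, \dots, \xi_n),$$
which is the desired identity.

Let $(U, \alpha)$ and $(W, \beta)$ be compatible charts on $M_1$ and $M_2$ with coordinates $u^1, \dots, u^n$ and $w^1, \dots, w^n$, respectively. Then for all points of $U$ we have, by (4.3),
$$(\varphi^*\rho)_\alpha(\alpha(x)) = \rho\Bigl(\varphi_*\frac{\partial}{\partial u^1}, \dots, \varphi_*\frac{\partial}{\partial u^n}\Bigr) = \Bigl|\det\Bigl(\frac{\partial w^i}{\partial u^j}\Bigr)\Bigr|\, \rho\Bigl(\frac{\partial}{\partial w^1}, \dots, \frac{\partial}{\partial w^n}\Bigr) = \Bigl|\det\Bigl(\frac{\partial w^i}{\partial u^j}\Bigr)\Bigr|\, \rho_\beta(\beta \circ \varphi(x)).$$
In other words, we have
$$(\varphi^*\rho)_\alpha = |\det J_{\beta \circ \varphi \circ \alpha^{-1}}|\, \rho_\beta(\beta \circ \varphi \circ \alpha^{-1}(\cdot)). \tag{5.2}$$
The density $\varphi^*\rho$ is called the pullback of $\rho$ by $\varphi$. It is clear that
$$\varphi^*(\rho_1 + \rho_2) = \varphi^*(\rho_1) + \varphi^*(\rho_2)$$
and that
$$\varphi^*(f\rho) = \varphi^*(f)\, \varphi^*(\rho)$$
for any function $f$. It follows directly from the definition that
$$\operatorname{supp} \varphi^*\rho = \varphi^{-1}[\operatorname{supp} \rho].$$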
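Formula (5.2) can be tried out in one dimension, where the Jacobian determinant is simply the derivative. The sketch below is an illustration under assumed data, not an example from the text: the diffeomorphism $\varphi(u) = u + u^2$ from $(0, 1)$ onto $(0, 2)$ and the density with local expression $\rho_\beta(w) = w^2$ are hypothetical choices, and both manifolds are intervals with their identity charts. The pulled-back density and the original one should have the same integral, as the change-of-variables law predicts.

```python
def midpoint_integral(func, a, b, n=20000):
    """Midpoint rule for the integral of func over (a, b)."""
    h = (b - a) / n
    return sum(func(a + (k + 0.5) * h) for k in range(n)) * h

# Hypothetical diffeomorphism of (0, 1) onto (0, 2) and its derivative.
def phi(u):
    return u + u * u

def dphi(u):
    return 1.0 + 2.0 * u

# Hypothetical density on (0, 2), written in the identity chart.
def rho_beta(w):
    return w * w

# Local expression of the pullback, following (5.2) with n = 1.
def pullback(u):
    return abs(dphi(u)) * rho_beta(phi(u))

lhs = midpoint_integral(pullback, 0.0, 1.0)  # integral of the pullback over (0, 1)
rhs = midpoint_integral(rho_beta, 0.0, 2.0)  # integral of rho over (0, 2)
print(lhs, rhs)
```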
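Proposition 5.1. Let $\varphi\colon M_1 \to M_2$ be a diffeomorphism, and let $\rho$ be a locally absolutely integrable density with compact support on $M_2$. Then
$$\int \varphi^*\rho = \int \rho. \tag{5.3}$$

Proof. It suffices to prove (5.3) for the case $\operatorname{supp} \rho \subset \varphi(U)$ for some chart $(U, \alpha)$ of $M_1$ with $\varphi(U) \subset W$, where $(W, \beta)$ is a chart of $M_2$. In fact, the set of all such $\varphi(U)$ is an open covering of $M_2$, and we can therefore choose a partition of unity $\{g_j\}$ subordinate to it. If we write $\rho = \sum g_j \rho$, then the sum is finite and each $g_j \rho$ has the property just described. Since both sides of (5.3) are linear, we conclude that it suffices to prove (5.3) for each term.

Now if $\operatorname{supp} \rho \subset \varphi(U)$, then
$$\int \rho = \int_{\beta(W)} \rho_\beta = \int_{\beta \circ \varphi(U)} \rho_\beta$$
and
$$\int \varphi^*\rho = \int_{\alpha(U)} (\varphi^*\rho)_\alpha = \int_{\alpha(U)} |\det J_{\beta \circ \varphi \circ \alpha^{-1}}|\, \rho_\beta(\beta \circ \varphi \circ \alpha^{-1}(\cdot)) = \int_{\beta \circ \varphi(U)} \rho_\beta,$$
thus establishing (5.3). □

Now let $\varphi_t$ be a one-parameter group on $M$ with infinitesimal generator $X$. Let $\rho$ be a density on $M$, let $(U, \alpha)$ be a chart, and let $W$ be an open subset of $U$ such that $\varphi_t(W) \subset U$ for all $|t| < \epsilon$. Then
$$(\varphi_t^*\rho)_\alpha(v) = \rho_\alpha(\Phi_\alpha(v, t))\, \Bigl|\det\Bigl(\frac{\partial \Phi_\alpha}{\partial v}\Bigr)(v, t)\Bigr|$$
for $v \in \alpha(W)$, where $\Phi_\alpha(v, t) = \alpha \circ \varphi_t \circ \alpha^{-1}(v)$ and $(\partial \Phi_\alpha/\partial v)(v, t)$ is the Jacobian of $v \mapsto \Phi_\alpha(v, t)$. We would like to compute the derivative of this expression with respect to $t$ at $t = 0$. Now $\Phi_\alpha(v, 0) = v$, and so
$$\det\Bigl(\frac{\partial \Phi_\alpha}{\partial v}\Bigr)(v, 0) = 1.$$
Consequently, we can conclude that
$$\det\Bigl(\frac{\partial \Phi_\alpha}{\partial v}\Bigr)(v, t) > 0$$
for $t$ close to zero. We can therefore omit the absolute-value sign and write
$$\frac{d(\varphi_t^*\rho)_\alpha}{dt}\Bigr|_{t=0} = \frac{d\rho_\alpha(\Phi_\alpha)}{dt}\Bigr|_{t=0} + \rho_\alpha\, \frac{d}{dt}\Bigl(\det \frac{\partial \Phi_\alpha}{\partial v}\Bigr)\Bigr|_{t=0}.$$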
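We simply evaluate the first derivative on the right by the chain rule, and get
$$\frac{d\rho_\alpha(\Phi_\alpha(v, t))}{dt}\Bigr|_{t=0} = d\rho_\alpha(X_\alpha(v)).$$
In terms of coordinates $x^1, \dots, x^n$, we can write
$$\frac{d\rho_\alpha(\Phi_\alpha(v, t))}{dt}\Bigr|_{t=0} = \sum_i \frac{\partial \rho_\alpha}{\partial x^i}\, X_\alpha^i \qquad\text{if } X_\alpha = \langle X_\alpha^1, \dots, X_\alpha^n \rangle.$$
To evaluate the second term on the right, we need to make a preliminary observation. Let $A(t) = (a_{ij}(t))$ be a differentiable matrix-valued function of $t$ with $A(0) = \mathrm{id} = (\delta_{ij})$. Then
$$\frac{d(\det A(t))}{dt}\Bigr|_{t=0} = \lim_{t \to 0} \frac{1}{t}\bigl(\det A(t) - 1\bigr).$$
Now $a_{ii}(0) = 1$ and $a_{ij}(0) = 0$ $(i \neq j)$. To say that $A$ is differentiable means that each of the functions $a_{ij}(t)$ is differentiable. We can therefore find a constant $K$ such that, for small $|t|$, $|a_{ij}(t)| \leq K|t|$ $(i \neq j)$ and $|a_{ii}(t) - 1| \leq K|t|$. In the expansion of $\det A(t)$, the only term which will not vanish at least as $t^2$ is the diagonal product $a_{11}(t) \cdots a_{nn}(t)$. In fact, any other term in $\sum \pm a_{1 i_1}(t) \cdots a_{n i_n}(t)$ involves at least two off-diagonal terms and thus vanishes at least as $t^2$. Thus
$$\frac{d}{dt}\bigl(\det A(t)\bigr)\Bigr|_{t=0} = \lim_{t \to 0} \frac{1}{t}\bigl(a_{11}(t) \cdots a_{nn}(t) - 1\bigr) = a_{11}'(0) + \cdots + a_{nn}'(0) = \operatorname{tr} A'(0).$$
If we take $A = \partial \Phi_\alpha/\partial v$, we conclude that
$$\frac{d}{dt}\Bigl(\det \frac{\partial \Phi_\alpha}{\partial v}\Bigr)\Bigr|_{t=0} = \operatorname{tr} \frac{\partial X_\alpha}{\partial v} = \sum_i \frac{\partial X_\alpha^i}{\partial x^i}.$$
Thus
$$\frac{d(\varphi_t^*\rho)_\alpha}{dt}\Bigr|_{t=0} = \sum_i \frac{\partial \rho_\alpha}{\partial x^i}\, X_\alpha^i + \rho_\alpha \sum_i \frac{\partial X_\alpha^i}{\partial x^i} = \sum_i \frac{\partial(\rho_\alpha X_\alpha^i)}{\partial x^i}.$$
We repeat:

Proposition 5.2. Let $\varphi_t$ be a one-parameter group of diffeomorphisms of $M$ with infinitesimal generator $X$, and let $\rho$ be a differentiable density on $M$. Then
$$D_X \rho = \lim_{t \to 0} \frac{\varphi_t^*\rho - \rho}{t}$$
exists and is given locally by
$$(D_X \rho)_\alpha = \sum_i \frac{\partial(\rho_\alpha X_\alpha^i)}{\partial x^i}$$
if $X_\alpha = \langle X_\alpha^1, \dots, X_\alpha^n \rangle$ on the chart $(U, \alpha)$.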
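The preliminary observation used in the derivation, that the derivative of $\det A(t)$ at $t = 0$ equals $\operatorname{tr} A'(0)$ when $A(0) = \mathrm{id}$, can be checked with a difference quotient. The following is a minimal sketch with made-up matrices: `B` plays the role of $A'(0)$ and `C` is an arbitrary second-order term, both hypothetical. The small `det` helper uses Laplace expansion, which is adequate only for small matrices.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

# Hypothetical data: A(t) = I + t*B + t^2*C, so that A(0) = I and A'(0) = B.
B = [[0.5, 2.0, -1.0], [3.0, -1.5, 0.25], [0.0, 4.0, 2.0]]
C = [[1.0, 0.0, 2.0], [0.0, -1.0, 1.0], [3.0, 1.0, 0.0]]

def A(t):
    n = len(B)
    return [[(1.0 if i == j else 0.0) + t * B[i][j] + t * t * C[i][j]
             for j in range(n)] for i in range(n)]

t = 1e-6
quotient = (det(A(t)) - 1.0) / t          # difference quotient at t = 0
trace_B = sum(B[i][i] for i in range(len(B)))
print(quotient, trace_B)
```

The large off-diagonal entries of `B` affect the quotient only at order $t$, which reflects the remark that every non-diagonal term of the determinant expansion vanishes at least as $t^2$.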
The density $D_X\rho$ is sometimes called the divergence of $\langle X, \rho\rangle$ and is denoted by $\operatorname{div}\langle X, \rho\rangle$. Thus $\operatorname{div}\langle X, \rho\rangle = D_X\rho$ is the density given by
\[ (\operatorname{div}\langle X, \rho\rangle)_\alpha = \sum_i \frac{\partial(X^i_\alpha\rho_\alpha)}{\partial x^i} \quad\text{on } (U, \alpha). \]
Now let $\rho$ be a differentiable density, and let $A$ be a compact contented set. Then
\[ \int_{\varphi_t(A)}\rho = \int_M e_{\varphi_t(A)}\rho = \int_M \varphi_t^*(e_{\varphi_t(A)}\rho) = \int_M (\varphi_t^* e_{\varphi_t(A)})(\varphi_t^*\rho) = \int_M e_A\,\varphi_t^*\rho = \int_A \varphi_t^*\rho. \]
Thus
\[ \frac{1}{t}\left(\int_{\varphi_t(A)}\rho - \int_A\rho\right) = \int_A \frac{\varphi_t^*\rho - \rho}{t}. \]
Using a partition of unity, we can easily see that the limit under the integral sign is uniform, and we thus have the formula
\[ \left.\frac{d}{dt}\left(\int_{\varphi_t(A)}\rho\right)\right|_{t=0} = \int_A D_X\rho = \int_A \operatorname{div}\langle X, \rho\rangle. \]

6. THE DIVERGENCE THEOREM

Let $\varphi$ be a flow on a differentiable manifold $M$ with infinitesimal generator $X$. Let $\rho$ be a density belonging to $P$, and let $A$ be a contented subset of $M$. Then for small values of $t$, we would expect the difference $\int_{\varphi_t(A)}\rho - \int_A\rho$ to depend only on what is happening near the boundary of $A$ (Fig. 10.4). In the limit, we would expect the derivative of $\int_{\varphi_t(A)}\rho$ at $t = 0$ (which is given by $\int_A \operatorname{div}\langle X, \rho\rangle$) to be given by some integral over $\partial A$. In order to formulate such a result, we must first single out a class of sets whose boundaries are sufficiently nice to allow us to integrate over them. We therefore make the following definition:

Definition. Let $M$ be a differentiable manifold, and let $D$ be a subset of $M$. We say that $D$ is a domain with regular boundary if for every $x \in M$ there is a chart $(U, \alpha)$ about $x$, with coordinates $x^1, \ldots, x^n$, such that one of the following three possibilities holds:
i) $U \cap D = \emptyset$;
ii) $U \subset D$;
iii) $\alpha(U \cap D) = \alpha(U) \cap \{v = \langle v^1, \ldots, v^n\rangle \in \mathbb{R}^n : v^n \ge 0\}$.
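As an illustration of possibility (iii), here is a short worked sketch for the closed unit disk in the plane. It is not taken from the text: the set $D$, the neighborhood $U$, and the chart $\alpha$ below are assumptions chosen only to make the definition concrete.

```latex
% Illustrative example (assumed, not from the text): D is the closed unit disk in R^2.
\[
D = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\}, \qquad
U = \{(r\cos\theta,\, r\sin\theta) : \tfrac12 < r < \tfrac32,\ 0 < \theta < 2\pi\}.
\]
% Chart on U: the first coordinate is the angle, the second is the inward distance from the circle.
\[
\alpha(r\cos\theta,\, r\sin\theta) = \langle v^1, v^2\rangle = \langle \theta,\ 1 - r\rangle,
\qquad
\alpha(U) = (0, 2\pi)\times(-\tfrac12, \tfrac12).
\]
% A point of U lies in D exactly when r <= 1, that is, when v^2 >= 0, which is condition (iii):
\[
\alpha(U\cap D) = \alpha(U)\cap\{\langle v^1, v^2\rangle : v^2 \ge 0\},
\qquad
\alpha(U\cap\partial D) = \{v\in\alpha(U) : v^2 = 0\}.
\]
```

Boundary points on the positive $x$-axis are handled by the same construction with the angle taken in $(-\pi, \pi)$. Points with $x^2 + y^2 < 1$ have charts of type (ii), and points with $x^2 + y^2 > 1$ have charts of type (i).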
Note that if $x \notin D$, we can always find a chart $(U, \alpha)$ about $x$ such that (i) holds. If $x \in \operatorname{int} D$, we can always find a chart $(U, \alpha)$ about $x$ such that (ii) holds. This imposes no restrictions on $D$. The crucial condition is imposed when $x \in \partial D$. Then we cannot find charts about $x$ satisfying (i) or (ii). In this case, (iii) implies that $\alpha(U \cap \partial D)$ is an open subset of $\mathbb{R}^{n-1}$ (Fig. 10.5). In fact,
\[ \alpha(U\cap\partial D) = \{v \in \alpha(U) : v^n = 0\} = \alpha(U)\cap\mathbb{R}^{n-1}, \]
where we regard $\mathbb{R}^{n-1}$ as the subspace of $\mathbb{R}^n$ consisting of those vectors with last component zero.

Let $\mathcal{A}$ be an atlas of $M$ such that each chart of $\mathcal{A}$ satisfies either (i), (ii), or (iii). For each $(U, \alpha) \in \mathcal{A}$ consider the map $\alpha\restriction\partial D\colon U\cap\partial D \to \mathbb{R}^{n-1} \subset \mathbb{R}^n$. [Of course, the maps $\alpha\restriction\partial D$ will have a nonempty domain of definition only for charts of type (iii).] We claim that $\{(U\cap\partial D,\ \alpha\restriction\partial D)\}$ is an atlas on $\partial D$. In fact, let $(U, \alpha)$ and $(W, \beta)$ be two charts in $\mathcal{A}$ such that $U\cap W\cap\partial D \ne \emptyset$. Let $x^1, \ldots, x^n$ be the coordinates of $(U, \alpha)$, and let $y^1, \ldots, y^n$ be those of $(W, \beta)$. The map $\beta\circ\alpha^{-1}$ is given by
\[ \langle x^1, \ldots, x^n\rangle \mapsto \langle y^1(x^1, \ldots, x^n), \ldots, y^n(x^1, \ldots, x^n)\rangle. \]
On $\alpha(U\cap W\cap\partial D)$, we have $x^n = 0$ and $y^n = 0$. In particular,
\[ y^n(x^1, \ldots, x^{n-1}, 0) \equiv 0, \]
and the functions $y^1(x^1, \ldots, x^{n-1}, 0), \ldots, y^{n-1}(x^1, \ldots, x^{n-1}, 0)$ are differentiable. This shows that $(\beta\restriction\partial D)\circ(\alpha\restriction\partial D)^{-1}$ is differentiable on $\alpha(U\cap W\cap\partial D)$. We thus get a manifold structure on $\partial D$. It is easy to see that this manifold structure is independent of the particular atlas of $M$ that was chosen.

We shall denote by $\iota$ the map of $\partial D \to M$ which sends each $x \in \partial D$, regarded as an element of $M$, into itself. It is clear that $\iota$ is a differentiable map. (In fact, $(U\cap\partial D,\ \alpha\restriction\partial D)$ and $(U, \alpha)$ are compatible charts in terms of which $\alpha\circ\iota\circ(\alpha\restriction\partial D)^{-1}$ is just the map of $\mathbb{R}^{n-1} \to \mathbb{R}^n$.)

Let $x$ be a point of $\partial D$ regarded as a point of $M$, and let $\xi$ be an element of $T_x(M)$. We say that $\xi$ points into $D$ if for every curve $C$ with $C'(0) = \xi$, we have $C(t) \in D$ for sufficiently small positive $t$ (Fig. 10.6). In
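terms of a chart $(U, \alpha)$ of type (iii), let $\xi_\alpha = \langle \xi^1, \ldots, \xi^n\rangle$. Then it is clear that $\xi$ points into $D$ if and only if $\xi^n > 0$. Similarly, a tangent vector $\xi$ points out of $D$ (obvious definition) if and only if $\xi^n < 0$. If $\xi^n = 0$, then $\xi$ is tangent to the boundary; that is, it lies in $\iota_*T_x(\partial D)$.

Let $\rho$ be a density on $M$ and $X$ a vector field on $M$. Define the density $\rho_X$ on $\partial D$ by
\[ \rho_X(\xi_1, \ldots, \xi_{n-1}) = \rho(\iota_*\xi_1, \ldots, \iota_*\xi_{n-1}, X(x)) \quad\text{for } \xi_i \in T_x(\partial D). \tag{6.1} \]
It is easy to check that (6.1) defines a density. (This is left as an exercise for the reader.) If $(U, \alpha)$ is a chart of type (iii) about $x$ and $X_\alpha = \langle X^1, \ldots, X^n\rangle$, then applying (4.3) to the chart $(U\cap\partial D,\ \alpha\restriction\partial D)$ and the density $\rho_X$, we see that
\[ (\rho_X)_{\alpha\restriction\partial D} = \rho\left(\frac{\partial}{\partial x^1}, \ldots, \frac{\partial}{\partial x^{n-1}}, X\right). \]
Let $A$ be the linear transformation of $T_x(M)$ given by
\[ A\frac{\partial}{\partial x^1} = \frac{\partial}{\partial x^1},\quad \ldots,\quad A\frac{\partial}{\partial x^{n-1}} = \frac{\partial}{\partial x^{n-1}},\quad A\frac{\partial}{\partial x^n} = X. \]
The matrix of $A$ is
\[ \begin{pmatrix} 1 & 0 & \cdots & 0 & X^1 \\ 0 & 1 & \cdots & 0 & X^2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & X^{n-1} \\ 0 & 0 & \cdots & 0 & X^n \end{pmatrix} \]
and therefore $|\det A| = |X^n|$. Thus we have
\[ (\rho_X)_{\alpha\restriction\partial D} = |X^n|\,\rho_\alpha \tag{6.2} \]
at all points of $\alpha(U\cap\partial D)$. We can now state our results.

Theorem 6.1 (The divergence theorem).† Let $D$ be a domain with regular boundary, let $\rho \in P$, and let $X$ be a smooth vector field on $M$. Define the function $\epsilon_X$ on $\partial D$ by
\[ \epsilon_X(x) = \begin{cases} \phantom{-}1 & \text{if } X(x) \text{ points out of } D, \\ \phantom{-}0 & \text{if } X(x) \text{ is tangent to } \partial D, \\ -1 & \text{if } X(x) \text{ points into } D. \end{cases} \]
Then
\[ \int_D \operatorname{div}\langle X, \rho\rangle = \int_{\partial D}\epsilon_X\,\rho_X. \tag{6.3} \]

Remark. In terms of a chart of type (iii), the function $\epsilon_X$ is given by
\[ \epsilon_X = -\operatorname{sgn} X^n. \tag{6.4} \]

† This formulation and proof of the divergence theorem was suggested to us by Richard Rasala.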
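Before turning to the proof, it may help to see (6.2), (6.3), and (6.4) at work in one case. The following sketch is an illustration only and is not from the text: it assumes $D$ is the closed unit disk in the plane, $\rho$ is the Euclidean area density, $X = x\,\partial/\partial x + y\,\partial/\partial y$, and it uses the chart $\langle v^1, v^2\rangle = \langle\theta, 1 - r\rangle$, which misses the origin and one ray (a set of content zero). It is a sanity check of the formula, not an application of the proof.

```latex
% Assumed setup: x = (1 - v^2) cos v^1,  y = (1 - v^2) sin v^1,
% where v^2 denotes the SECOND COORDINATE (not a square) and D corresponds to 0 <= v^2 < 1.
\[
\rho_\alpha(v) = \left|\det\frac{\partial(x,y)}{\partial(v^1,v^2)}\right| = 1 - v^2,
\qquad
X = r\frac{\partial}{\partial r} = -(1 - v^2)\frac{\partial}{\partial v^2},
\quad\text{so}\quad X^1_\alpha = 0,\ \ X^2_\alpha = -(1 - v^2).
\]
% Left-hand side of (6.3), using the local formula for the divergence:
\[
(\operatorname{div}\langle X,\rho\rangle)_\alpha
 = \frac{\partial}{\partial v^2}\Bigl(-(1 - v^2)^2\Bigr) = 2(1 - v^2),
\qquad
\int_D \operatorname{div}\langle X,\rho\rangle
 = \int_0^{2\pi}\!\!\int_0^1 2(1 - v^2)\,dv^2\,dv^1 = 2\pi .
\]
% Right-hand side of (6.3): on the boundary v^2 = 0 we have X^2_alpha = -1 and rho_alpha = 1.
\[
\epsilon_X = -\operatorname{sgn} X^2_\alpha = 1,
\qquad
(\rho_X)_{\alpha\restriction\partial D} = |X^2_\alpha|\,\rho_\alpha = 1,
\qquad
\int_{\partial D}\epsilon_X\,\rho_X = \int_0^{2\pi} 1\,dv^1 = 2\pi .
\]
```

Both sides equal $2\pi$, as expected: $X$ has Euclidean divergence 2, the disk has area $\pi$, and $X$ is the outward unit normal on the circle of length $2\pi$.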
Proof. Let $\mathcal{A}$ be an atlas of $M$ each of whose charts is one of the three types. Let $\{g_i\}$ be a partition of unity subordinate to $\mathcal{A}$. Write $\rho = \sum g_i\rho$. This is a finite sum. Since both sides of (6.3) are linear functions of $\rho$, it suffices to verify (6.3) for each of the summands $g_i\rho$. Changing our notation (replacing $g_i\rho$ by $\rho$), we reduce the problem to proving (6.3) under the additional assumption $\operatorname{supp}\rho \subset U$, where $(U, \alpha)$ is a chart of type (i), (ii), or (iii). There are therefore three cases to consider.

CASE I. $\operatorname{supp}\rho \subset U$ and $U\cap D = \emptyset$. (See Fig. 10.7.) Then both sides of (6.3) vanish, and so (6.3) is correct.

CASE II. $\operatorname{supp}\rho \subset U$ with $U \subset \operatorname{int} D$. (See Fig. 10.8.) Then the right-hand side of (6.3) vanishes. We must show that the left-hand side does also. But
\[ \int_D \operatorname{div}\langle X, \rho\rangle = \int_U \operatorname{div}\langle X, \rho\rangle = \int_{\alpha(U)}\sum_i\frac{\partial(X^i\rho_\alpha)}{\partial x^i} = \sum_i\int_{\alpha(U)}\frac{\partial(X^i\rho_\alpha)}{\partial x^i}. \]
Now each of the functions $X^i\rho_\alpha$ has its support lying inside $\alpha(U)$. We extend its domain of definition to all of $\mathbb{R}^n$ by setting it equal to zero outside $\alpha(U)$, and choose some large $R$ so that its support lies in the rectangle $\square_{-R}^{R}$. We can then replace $\int_{\alpha(U)}$ by $\int_{\square_{-R}^{R}}$. (See Fig. 10.9.) Writing the integral as an iterated integral and integrating with respect to $x^i$ first, we see that
\[ \int_{\alpha(U)}\frac{\partial(X^i\rho_\alpha)}{\partial x^i} = \int \bigl[X^i\rho_\alpha(\ldots, R, \ldots) - X^i\rho_\alpha(\ldots, -R, \ldots)\bigr]\,dx^1\cdots dx^{i-1}\,dx^{i+1}\cdots dx^n = 0. \]
This last integral vanishes, because the function $X^i\rho_\alpha$ vanishes outside $\alpha(U)$.
CASE III. $\operatorname{supp}\rho$ is contained in a chart of type (iii). (See Fig. 10.10.) Then
\[ \int_D \operatorname{div}\langle X, \rho\rangle = \int_{U\cap D}\operatorname{div}\langle X, \rho\rangle = \sum_i\int_{\alpha(U\cap D)}\frac{\partial(X^i\rho_\alpha)}{\partial x^i}. \]
Now $\alpha(U\cap D) = \alpha(U)\cap\{v : v^n \ge 0\}$. We can therefore replace the domain of integration by the rectangle $\square_{\langle -R, \ldots, -R, 0\rangle}^{\langle R, \ldots, R\rangle}$. (See Fig. 10.11.) For $1 \le i < n$ all the integrals in the sum vanish as before. For $i = n$ we obtain
\[ \int_D \operatorname{div}\langle X, \rho\rangle = \int_{\alpha(D\cap U)}\frac{\partial(X^n\rho_\alpha)}{\partial x^n} = -\int_{\mathbb{R}^{n-1}} X^n\rho_\alpha, \]
where the last integrand is evaluated at $x^n = 0$. If we compare this with (6.2) and (6.4), we see that this is exactly the assertion of (6.3). $\square$

If the manifold $M$ is given a Riemann metric, then we can give an alternative version of the divergence theorem. Let $dV$ be the volume density of the Riemann metric, so that
\[ dV(\xi_1, \ldots, \xi_n) = |\det((\xi_i, \xi_j))|^{1/2}, \qquad \xi_i \in T_x(M), \]
is the volume of the parallelepiped spanned by the $\xi_i$ in the tangent space (with respect to the Euclidean metric given by the scalar product on the tangent space). Now the map $\iota$ is an immersion, and therefore we get an induced Riemann metric on $\partial D$. Let $dS$ be the corresponding volume density on $\partial D$. Thus, if $\{\xi_i\}_{i=1,\ldots,n-1}$ are $n - 1$ vectors in $T_x(\partial D)$, $dS(\xi_1, \ldots, \xi_{n-1})$ is the $(n-1)$-dimensional volume of the parallelepiped spanned by $\iota_*\xi_1, \ldots, \iota_*\xi_{n-1}$ in $\iota_*T_x(\partial D) \subset T_x(M)$. For any $x \in \partial D$ let $n \in T_x(M)$ be the vector of unit length which is orthogonal to $\iota_*T_x(\partial D)$ and which points out of $D$ (Fig. 10.12). We
clearly have
\[ dS(\xi_1, \ldots, \xi_{n-1}) = dV(\iota_*\xi_1, \ldots, \iota_*\xi_{n-1}, n). \]
For any vector $X(x) \in T_x(M)$ (Fig. 10.13) the volume of the parallelepiped spanned by $\iota_*\xi_1, \ldots, \iota_*\xi_{n-1}, X(x)$ is $|(X(x), n)|\,dS(\xi_1, \ldots, \xi_{n-1})$. [In fact, write $X(x) = (X(x), n)\,n + m$, where $m \in \iota_*T_x(\partial D)$.] If we compare this with (6.1), we see that
\[ dV_X = |(X, n)|\,dS. \tag{6.5} \]
Furthermore, it is clear that $\epsilon_X(x) = \operatorname{sgn}\,(X(x), n)$. Let $\rho$ be any density on $M$. Then we can write $\rho = f\,dV$, where $f$ is a function. Furthermore, we clearly have $\rho_X = f\,dV_X$ and $\operatorname{div}\langle X, \rho\rangle = \operatorname{div}\langle X, f\,dV\rangle$. We can then rewrite (6.3) as
\[ \int_D \operatorname{div}\langle X, f\,dV\rangle = \int_{\partial D} f\cdot(X, n)\,dS. \tag{6.6} \]

7. MORE COMPLICATED DOMAINS

For many purposes, Theorem 6.1 is not quite sufficiently broad. The trouble is that we would like to apply (6.3) to domains whose boundaries are not completely smooth. For instance, we would like to apply it to a rectangle in $\mathbb{R}^n$. Now the boundary of a rectangle is regular at all points except those lying on an edge (i.e., the intersection of two faces). Since the edges form a set "of dimension $n - 2$", we would expect that their presence does not invalidate (6.3). This is in fact the case.

Let $M$ be a differentiable manifold, and let $D$ be a subset of $M$. We say that $D$ is a domain with almost regular boundary if to every $x \in M$ there is a chart $(U, \alpha)$ about $x$, with coordinates $x^1, \ldots, x^n$, such that one of the following four possibilities holds:
i) $U\cap D = \emptyset$;
ii) $U \subset D$;
iii) $\alpha(U\cap D) = \alpha(U)\cap\{v = \langle v^1, \ldots, v^n\rangle \in \mathbb{R}^n : v^n \ge 0\}$;
iv) $\alpha(U\cap D) = \alpha(U)\cap\{v = \langle v^1, \ldots, v^n\rangle \in \mathbb{R}^n : v^k \ge 0, \ldots, v^n \ge 0\}$.

The novel point is that we are now allowing for possibility (iv), where $k < n$. This, of course, is a new possibility only if $n > 1$. Let us assume $n > 1$ and see what (iv) allows. We can write $\alpha(U\cap\partial D)$ as the union of certain open subsets lying in $(n-1)$-dimensional subspaces of $\mathbb{R}^n$, together with a union of portions lying in subspaces of dimension $n - 2$.
In fact, for $k \le p \le n$ let
\[ H^k_p = \{v : v^k > 0, \ldots, v^p = 0, v^{p+1} > 0, \ldots, v^n > 0\}. \]
Thus $H^k_p$ is an open subset of the $(n-1)$-dimensional subspace given by $v^p = 0$. (See Fig. 10.14.) We can write
\[ \alpha(U\cap\partial D) \subset \alpha(U)\cap\{(H^k_k\cup H^k_{k+1}\cup\cdots\cup H^k_n)\cup S\}, \]
where $S$ is the union of the subspaces (of dimension $n - 2$) where at least two of the $v^p$ vanish.

Observe that if $x \in U\cap\partial D$ is such that $\alpha(x) \in H^k_p$ for some $p$, then there is a chart about $x$ of type (iii). In fact, simply renumber the coordinates so that $v^p$ becomes $v^n$; that is, map $\mathbb{R}^n \to \mathbb{R}^n$ by the map $\varphi$ sending $\langle v^1, \ldots, v^n\rangle \mapsto \langle w^1, \ldots, w^n\rangle$, where
\[ w^i = v^i \ \text{ for } i < p, \qquad w^i = v^{i+1} \ \text{ for } p \le i < n, \qquad w^n = v^p. \]
Then in a sufficiently small neighborhood $U_1$ of $x$ the chart $(U_1, \varphi\circ\alpha)$ is of type (iii). (See Fig. 10.15.)
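The simplest instance of possibility (iv) is a corner in the plane. The following sketch is an assumed example, not from the text, with $n = 2$, $k = 1$, and the identity chart on $\mathbb{R}^2$; it spells out the sets $H^k_p$ and $S$ and the renumbering of coordinates.

```latex
% Assumed example: D is the closed first quadrant of R^2 (n = 2, k = 1, identity chart).
\[
D = \{\langle v^1, v^2\rangle : v^1 \ge 0,\ v^2 \ge 0\}.
\]
% The two open "faces" and the exceptional set of dimension n - 2 = 0:
\[
H^1_1 = \{v : v^1 = 0,\ v^2 > 0\}, \qquad
H^1_2 = \{v : v^1 > 0,\ v^2 = 0\}, \qquad
S = \{\langle 0, 0\rangle\}.
\]
\[
\partial D = H^1_1\cup H^1_2\cup\{\langle 0,0\rangle\} \subset (H^1_1\cup H^1_2)\cup S .
\]
% Renumbering near a point of H^1_1 (here p = 1): w^1 = v^2, w^2 = v^1.
% In a small neighborhood of such a point v^2 > 0 automatically, so D is given by w^2 >= 0,
% which is a chart of type (iii).
\[
\langle v^1, v^2\rangle \mapsto \langle w^1, w^2\rangle = \langle v^2, v^1\rangle .
\]
```

Only the corner $\langle 0, 0\rangle$ has no chart of type (iii); every other boundary point is a regular point.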
We next observe that the set of $x \in \partial D$ having a neighborhood of type (iii) forms a differentiable manifold. The argument is just as before. The only difference is that this time these points do not exhaust all of $\partial D$. We shall denote this manifold by $\widetilde{\partial D}$. Thus $\widetilde{\partial D}$ is a manifold which, as a set, is not $\partial D$ but only the "regular" points of $\partial D$, that is, those having charts of type (iii).

Theorem 7.1 (The divergence theorem). Let $M$ be an $n$-dimensional manifold, and let $D \subset M$ be a domain with almost regular boundary. Let $\widetilde{\partial D}$ be as above, and let $\iota$ be the injection of $\widetilde{\partial D} \to M$. Then for any $\rho \in P$ we have
\[ \int_D \operatorname{div}\langle X, \rho\rangle = \int_{\widetilde{\partial D}}\epsilon_X\,\rho_X. \tag{7.1} \]

Proof. The proof proceeds as before. We choose an atlas consisting of charts of types (i) through (iv) and a partition of unity $\{g_j\}$ subordinate to the atlas. We write $\rho = \sum g_j\rho$ and now have four cases to consider. The first three cases have already been handled. The new case arises when $\rho$ has its support in $U$, where $(U, \alpha)$ is a chart of type (iv). We must evaluate
\[ \sum_i\int_{\alpha(U\cap D)}\frac{\partial(X^i\rho_\alpha)}{\partial x^i}. \]
The terms in the sum corresponding to $i < k$ make no contribution to the integral, as before. Let us extend $X^i\rho_\alpha$ to be defined on all of $\mathbb{R}^n$ by setting it equal to zero outside $\alpha(U)$, just as before. Then, for $k \le i \le n$ we have
\[ \int_{\alpha(U\cap D)}\frac{\partial(X^i\rho_\alpha)}{\partial x^i} = \int_B\frac{\partial(X^i\rho_\alpha)}{\partial x^i}, \]
where $B = \{v : v^k \ge 0, \ldots, v^n \ge 0\}$. Writing this as an iterated integral and integrating first with respect to $x^i$, we obtain
\[ \int_B\frac{\partial(X^i\rho_\alpha)}{\partial x^i} = -\int_{A_i} X^i\rho_\alpha, \]
where the set $A_i \subset \mathbb{R}^{n-1}$ is given by
\[ A_i = \{\langle v^1, \ldots, v^{i-1}, v^{i+1}, \ldots, v^n\rangle : v^k \ge 0, \ldots, v^n \ge 0\}. \]
Note that $A_i$ differs from $H^k_i$ by a set of content zero in $\mathbb{R}^{n-1}$ (namely, where at least one of the $v^l = 0$ for $k \le l \le n$). Thus we can replace the $A_i$ by the $H^k_i$ in the integral. Summing over $k \le i \le n$, we get
\[ \int_{\alpha(D\cap U)}\sum_i\frac{\partial(X^i\rho_\alpha)}{\partial x^i} = -\sum_{i=k}^{n}\int_{H^k_i} X^i\rho_\alpha, \]
which is exactly the assertion of Theorem 7.1 for case (iv). $\square$
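A rectangle is the standard example of a domain with almost regular boundary, so the theorem can be checked numerically there. The sketch below is not from the text. It assumes the Euclidean form of the theorem on the unit square $[0,1]^2$ (the integral of $\operatorname{div} X$ equals the outward flux of $X$ through the four open edges, the corners being ignored), and the vector field `X`, its hand-computed divergence `div_X`, and the midpoint-rule helpers are hypothetical choices made for illustration.

```python
def X(x, y):
    """A sample polynomial vector field on the plane (assumed for this check)."""
    return (x * x * y, x + y * y)


def div_X(x, y):
    """Divergence of X computed by hand: d/dx (x^2 y) + d/dy (x + y^2)."""
    return 2.0 * x * y + 2.0 * y


def midpoints(n):
    """Midpoints of n equal subintervals of [0, 1]."""
    return [(k + 0.5) / n for k in range(n)]


def volume_integral(n=200):
    """Midpoint-rule approximation of the integral of div X over [0, 1]^2."""
    pts = midpoints(n)
    return sum(div_X(x, y) for x in pts for y in pts) / (n * n)


def boundary_flux(n=200):
    """Midpoint-rule approximation of the outward flux of X through the four open edges."""
    pts = midpoints(n)
    total = 0.0
    for s in pts:
        total += X(1.0, s)[0]   # right edge,  outward normal (+1, 0)
        total -= X(0.0, s)[0]   # left edge,   outward normal (-1, 0)
        total += X(s, 1.0)[1]   # top edge,    outward normal (0, +1)
        total -= X(s, 0.0)[1]   # bottom edge, outward normal (0, -1)
    return total / n


print(volume_integral(), boundary_flux())
```

For this particular field both integrals can also be done by hand and equal $3/2$; the four corner points, where no chart of type (iii) exists, contribute nothing because they form a set of content zero in the boundary.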
We should point out that even Theorem 7.1 does not cover all cases for which it is useful to have a divergence theorem. For instance, in the plane, Theorem 7.1 does apply to the case where $D$ is a triangle. (See Fig. 10.16.) This is because we can "stretch" each angle to a right angle (in fact, we can do this by a linear change of variables of $\mathbb{R}^2$). (See Fig. 10.17.) However, Theorem 7.1 does not apply to a quadrilateral such as the one in Fig. 10.18, since there is no $C^1$-transformation that will convert an angle greater than $\pi$ into one smaller than $\pi$ (since its Jacobian at the corner must carry lines into lines). Thus Theorem 7.1 does not apply directly. However, we can write the quadrilateral as the union of two triangles, apply Theorem 7.1 to each triangle, and note that the contributions of each triangle coming from the common boundary cancel each other out. Thus the divergence theorem does apply to our quadrilateral. This procedure works in a quite general context. In fact, it works for all cases where we shall need the divergence theorem in this book, whether Theorem 7.1 applies directly or we can reduce to it by a finite subdivision of our domain, followed by a limiting argument. We shall not, however, formulate a general theorem covering all such cases; it is clear in each instance how to proceed.

EXERCISES

In Euclidean space we shall write $\operatorname{div} X$ instead of $\operatorname{div}\langle X, \rho\rangle$ when $\rho$ is taken to be the Euclidean volume density.

7.1 Let $x, y, z$ be rectangular coordinates on $\mathbb{E}^3$. Let the vector field $X$ be given by [formula missing in source], where $r^2 = x^2 + y^2 + z^2$. Show directly that
\[ \int_S (X, n)\,dA = \int_B \operatorname{div} X \]
by integrating both sides. Here $B$ is a ball centered at the origin and $S$ is its boundary.
7.2 Let the vector field $Y$ be given by
\[ Y = Y_r n_r + Y_\theta n_\theta + Y_\varphi n_\varphi \]
in terms of polar "coordinates" $r, \theta, \varphi$ on $\mathbb{E}^3$, where $n_r$, $n_\theta$, and $n_\varphi$ are the unit vectors in the directions $\partial/\partial r$, $\partial/\partial\theta$, and $\partial/\partial\varphi$ respectively. Show that
\[ \operatorname{div} Y = \frac{1}{r^2\sin\varphi}\left[\frac{\partial}{\partial r}(r^2\sin\varphi\,Y_r) + \frac{\partial}{\partial\theta}(r\,Y_\theta) + \frac{\partial}{\partial\varphi}(r\sin\varphi\,Y_\varphi)\right]. \]

7.3 Compute the divergence of a vector field in terms of polar coordinates in the plane.

7.4 Compute the divergence of a vector field in terms of cylindrical coordinates in $\mathbb{E}^3$.

7.5 Let $\sigma$ be the volume (area) density on the unit sphere $S^2$. Compute $\operatorname{div}\langle X, \sigma\rangle$ in terms of the coordinates $\theta, \varphi$ (polar coordinates) on the sphere.
CHAPTER 11

EXTERIOR CALCULUS

Let $M$ be a differentiable manifold and let $\omega$ be a linear differential form on $M$. For any differentiable curve $C\colon [a, b] \to M$ we can consider the integral
\[ \int_a^b (C'(t), \omega_{C(t)})\,dt. \]
Let $[c, d] \to [a, b]$ be a differentiable map given by $s \mapsto t(s)$. The curve $B\colon [c, d] \to M$ given by $B(s) = C(t(s))$ satisfies $B'(s) = t'(s)\,C'(t(s))$. Thus if $t'(s) > 0$ for all $s$,
\[ \int_c^d (B'(s), \omega_{B(s)})\,ds = \int_a^b (C'(t), \omega_{C(t)})\,dt. \]
Thus a linear differential form is something we can integrate over "oriented" curves of $M$, and the integral is independent of the parametrization. In this chapter we shall introduce objects which can be integrated over "oriented $k$-dimensional surfaces" of $M$ and study their properties.

1. EXTERIOR DIFFERENTIAL FORMS

We defined a linear differential form to be a rule which assigns an element of $T_x^*(M)$ to each $x \in M$. We can regard $T_x^*(M)$ as $\mathfrak{a}^1(T_x(M))$. In view of this, we make the following generalization of this definition. By an exterior differential form of degree $q$ on $M$ we mean a rule which assigns an element of $\mathfrak{a}^q(T_x(M))$ to each $x \in M$. If $\omega$ is an exterior form of degree $q$ and $(U, \alpha)$ is a chart, then, since $\alpha$ identifies each $T_x(M)$ with $V$ for $x \in U$, we obtain an $\mathfrak{a}^q(V)$-valued function, $\omega_\alpha$, on $\alpha(U)$ defined by
\[ \omega_\alpha(v)(\xi^1_\alpha, \ldots, \xi^q_\alpha) = \omega(x)(\xi^1, \ldots, \xi^q) \quad\text{if } v = \alpha(x) \text{ and } \xi^1, \ldots, \xi^q \in T_x(M). \]
It is easy to write down the transition laws. In fact, if $(W, \beta)$ is a second chart, we have
\[ \omega_\alpha(\alpha(x))(\xi^1_\alpha, \ldots, \xi^q_\alpha) = \omega_\beta(\beta(x))(\xi^1_\beta, \ldots, \xi^q_\beta) \]
or, since $\xi_\beta = J_{\beta\circ\alpha^{-1}}(\alpha(x))(\xi_\alpha)$ for $\xi \in T_x(M)$, we see that
\[ \omega_\alpha(v)(\xi^1_\alpha, \ldots, \xi^q_\alpha) = \omega_\beta(\beta\circ\alpha^{-1}(v))\bigl(J_{\beta\circ\alpha^{-1}}(v)\xi^1_\alpha, \ldots, J_{\beta\circ\alpha^{-1}}(v)\xi^q_\alpha\bigr). \tag{1.1} \]
In order to write (1.1) in a less cumbersome form, we introduce the following notation. Let $V_1$ and $V_2$ be vector spaces, and let $l\colon V_1 \to V_2$ be a linear map.
We define $\mathfrak{a}^p(l)$ to be the linear map of $\mathfrak{a}^p(V_2) \to \mathfrak{a}^p(V_1)$ given by
\[ \mathfrak{a}^p(l)w(v_1, \ldots, v_p) = w(l(v_1), \ldots, l(v_p)) \]
for all $w \in \mathfrak{a}^p(V_2)$ and $v_1, \ldots, v_p \in V_1$. Note that under the identification of $\mathfrak{a}^1(V)$ with $V^*$ the map $\mathfrak{a}^1(l)$ coincides with the map $l^*\colon V_2^* \to V_1^*$. Note also that if $w_1 \in \mathfrak{a}^p(V_2)$ and $w_2 \in \mathfrak{a}^q(V_2)$, then
\[ \mathfrak{a}^p(l)w_1\wedge\mathfrak{a}^q(l)w_2 = \mathfrak{a}^{p+q}(l)(w_1\wedge w_2). \tag{1.2} \]
This follows directly from the definitions. Also, if $l_1\colon V_1 \to V_2$ and $l_2\colon V_2 \to V_3$, then
\[ \mathfrak{a}^p(l_2\circ l_1) = \mathfrak{a}^p(l_1)\circ\mathfrak{a}^p(l_2). \tag{1.3} \]
It is clear that if $l$ depends differentiably on some parameters, then so does $\mathfrak{a}^p(l)$ for any $p$. We can now write (1.1) as
\[ \omega_\alpha(v) = \mathfrak{a}^q(J_{\beta\circ\alpha^{-1}}(v))\,\omega_\beta(\beta\circ\alpha^{-1}(v)). \tag{1.1$'$} \]
It is clear from (1.1$'$) that it is consistent to require that $\omega_\alpha$ be a smooth function. We therefore say that $\omega$ is a smooth differential form if all the functions $\omega_\alpha$ are $C^\infty$ on $\alpha(U)$ for all charts $(U, \alpha)$. As usual, it suffices to verify this for all charts in an atlas. We let $\bigwedge^q(M)$ denote the space of all smooth exterior forms of degree $q$.

Let $\omega_1 \in \bigwedge^p(M)$ and $\omega_2 \in \bigwedge^q(M)$. We define the exterior $(p+q)$-form $\omega_1\wedge\omega_2$ by
\[ (\omega_1\wedge\omega_2)(x) = \omega_1(x)\wedge\omega_2(x) \quad\text{for all } x \in M. \]
It is easy to check that $\omega_1\wedge\omega_2$ is a smooth $(p+q)$-form. We thus get a multiplication on exterior forms. To make the formalism complete, it is convenient to denote the space of differentiable functions on $M$ by $\bigwedge^0(M)$ and to denote the product of a function $f$ and a $p$-form $\omega$ by $f\omega$ or $f\wedge\omega$. This product is given by
\[ (f\wedge\omega)(x) = (f\omega)(x) = f(x)\,\omega(x) \quad\text{for all } x \in M. \]
We have thus defined, for all $0 \le p \le n$ and $0 \le q \le n$, a multiplication sending $\omega_1 \in \bigwedge^p(M)$ and $\omega_2 \in \bigwedge^q(M)$ into $\omega_1\wedge\omega_2 \in \bigwedge^{p+q}(M)$ (where $\omega_1\wedge\omega_2 = 0$ if $p + q > n = \dim M$). The rules for the $\wedge$-product on antisymmetric tensors carry over and thus, for instance,
\[ \omega_1\wedge\omega_2 = (-1)^{pq}\,\omega_2\wedge\omega_1 \quad\text{if } \omega_1 \in \textstyle\bigwedge^p(M) \text{ and } \omega_2 \in \bigwedge^q(M), \]
\[ \omega_1\wedge(\omega_2\wedge\omega_3) = (\omega_1\wedge\omega_2)\wedge\omega_3, \]
\[ \omega_1\wedge(\omega_2 + \omega_3) = \omega_1\wedge\omega_2 + \omega_1\wedge\omega_3, \]
and so on.

Let $M_1$ and $M_2$ be differentiable manifolds, and let $\varphi\colon M_1 \to M_2$ be a differentiable map.
For each $\omega \in \bigwedge^q(M_2)$ we define the form $\varphi^*\omega \in \bigwedge^q(M_1)$ by
\[ \varphi^*\omega(x) = \mathfrak{a}^q(\varphi_{*x})(\omega(\varphi(x))). \tag{1.4} \]
It is easy to check that $\varphi^*\omega$ is indeed an element of $\bigwedge^q(M_1)$, that is, it is a smooth $q$-form. Note also that (7.5) of Chapter 9 is a special case of (1.4), namely the case $q = 1$. (If we make the convention that $\mathfrak{a}^0(l) = \mathrm{id}$, then the case $q = 0$ of (1.4) is the rule for pullback of functions.) It follows from (1.4) that $\varphi^*$ is linear, that is,
\[ \varphi^*(\omega_1 + \omega_2) = \varphi^*(\omega_1) + \varphi^*(\omega_2), \tag{1.5} \]
and from (1.2) that
\[ \varphi^*(\omega_1\wedge\omega_2) = \varphi^*(\omega_1)\wedge\varphi^*(\omega_2). \tag{1.6} \]
If $\varphi$ is a one-parameter group on a manifold $M$ with infinitesimal generator $X$, then we can show that the limit
\[ \lim_{t\to 0}\frac{\varphi_t^*\omega - \omega}{t} = D_X\omega \]
exists for any $\omega \in \bigwedge^q(M)$. The proof of the existence of this limit is straightforward and will be omitted. We shall derive a useful formula allowing a simple calculation of $D_X\omega$ in Section 3.

Let us now see how to compute with the $\bigwedge^q(M)$ in terms of local coordinates. Let $(U, \alpha)$ be a chart of $M$ with coordinates $x^1, \ldots, x^n$. Then $dx^i \in \bigwedge^1(U)$ (where by $\bigwedge^q(U)$ we mean the set of differentiable $q$-forms defined on $U$). For any $i_1, \ldots, i_q$ the form $dx^{i_1}\wedge\cdots\wedge dx^{i_q}$ belongs to $\bigwedge^q(U)$, and for every $x \in U$ the forms $\{(dx^{i_1}\wedge\cdots\wedge dx^{i_q})(x)\}_{i_1 < \cdots < i_q}$ form a basis for $\mathfrak{a}^q(T_x(M))$. From this it follows that every exterior form $\omega$ of degree $q$ which is defined on $U$ can be written as
\[ \omega = \sum_{i_1 < \cdots < i_q} a_{i_1, \ldots, i_q}\,dx^{i_1}\wedge\cdots\wedge dx^{i_q}, \tag{1.7} \]
where the $a$'s are functions; that is,
\[ \omega(x) = \sum_{i_1 < \cdots < i_q} a_{i_1, \ldots, i_q}(x)\,(dx^{i_1}\wedge\cdots\wedge dx^{i_q})(x) \]
for all $x \in U$. It is easy to see that $\omega \in \bigwedge^q(U)$ if and only if all the functions $a_{i_1, \ldots, i_q}$ are $C^\infty$-functions on $U$. If $(W, \beta)$ is a second chart with coordinates $y^1, \ldots, y^n$ and
\[ \omega = \sum_{j_1 < \cdots < j_q} b_{j_1, \ldots, j_q}\,dy^{j_1}\wedge\cdots\wedge dy^{j_q}, \tag{1.8} \]
then it is easy to compute the transition law relating the $b$'s to the $a$'s on $U\cap W$. In fact, on $U\cap W$ we have
\[ dy^j = \sum_i\frac{\partial y^j}{\partial x^i}\,dx^i, \tag{1.9} \]
where $y^j = y^j(x^1, \ldots, x^n)$. Then all we have to do is to substitute (1.9) into (1.8) and collect the coefficients of $dx^{i_1}\wedge\cdots\wedge dx^{i_q}$. For instance, if $q = 2$,
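then we have
\[ \omega = \sum_{j_1 < j_2} b_{j_1 j_2}\,dy^{j_1}\wedge dy^{j_2} = \sum_{j_1 < j_2} b_{j_1 j_2}\left(\frac{\partial y^{j_1}}{\partial x^1}dx^1 + \cdots + \frac{\partial y^{j_1}}{\partial x^n}dx^n\right)\wedge\left(\frac{\partial y^{j_2}}{\partial x^1}dx^1 + \cdots + \frac{\partial y^{j_2}}{\partial x^n}dx^n\right). \]
If we collect the coefficients of $dx^{i_1}\wedge dx^{i_2}$ (remember that the $\wedge$-multiplication is anticommutative), we get
\[ \omega = \sum_{i_1 < i_2}\sum_{j_1 < j_2} b_{j_1 j_2}\left(\frac{\partial y^{j_1}}{\partial x^{i_1}}\frac{\partial y^{j_2}}{\partial x^{i_2}} - \frac{\partial y^{j_1}}{\partial x^{i_2}}\frac{\partial y^{j_2}}{\partial x^{i_1}}\right)dx^{i_1}\wedge dx^{i_2}. \]
Thus
\[ a_{i_1 i_2} = \sum_{j_1 < j_2} b_{j_1 j_2}\det\begin{pmatrix}\dfrac{\partial y^{j_1}}{\partial x^{i_1}} & \dfrac{\partial y^{j_1}}{\partial x^{i_2}} \\[2ex] \dfrac{\partial y^{j_2}}{\partial x^{i_1}} & \dfrac{\partial y^{j_2}}{\partial x^{i_2}}\end{pmatrix}. \tag{1.10} \]
Although (1.10) looks a little formidable, the point is that all one has to remember is (1.9) and the law for $\wedge$-multiplication. For general $q$ the same argument gives
\[ a_{i_1, \ldots, i_q} = \sum_{j_1 < \cdots < j_q} b_{j_1, \ldots, j_q}\det\begin{pmatrix}\dfrac{\partial y^{j_1}}{\partial x^{i_1}} & \cdots & \dfrac{\partial y^{j_1}}{\partial x^{i_q}} \\ \vdots & & \vdots \\ \dfrac{\partial y^{j_q}}{\partial x^{i_1}} & \cdots & \dfrac{\partial y^{j_q}}{\partial x^{i_q}}\end{pmatrix}. \tag{1.11} \]
The formula for pullback takes exactly the same form. Let $\varphi\colon M_1 \to M_2$ be a differentiable map, and suppose that $(U, \alpha)$ and $(W, \beta)$ are compatible charts, where $x^1, \ldots, x^m$ are the coordinates of $(U, \alpha)$ and $y^1, \ldots, y^n$ are those of $(W, \beta)$. Then the $y^j\circ\varphi$ are functions on $U$ and can thus be written as
\[ y^j\circ\varphi = y^j(x^1, \ldots, x^m). \]
Since $\varphi^*dy^j = d(y^j\circ\varphi)$, we have
\[ \varphi^*dy^j = \sum_i\frac{\partial y^j}{\partial x^i}\,dx^i. \tag{1.12} \]
If
\[ \omega = \sum_{j_1 < \cdots < j_q} b_{j_1, \ldots, j_q}\,dy^{j_1}\wedge\cdots\wedge dy^{j_q}, \]
then, by (1.5) and (1.6),
\[ \varphi^*\omega = \sum_{j_1 < \cdots < j_q}(b_{j_1, \ldots, j_q}\circ\varphi)\,(\varphi^*dy^{j_1})\wedge\cdots\wedge(\varphi^*dy^{j_q}). \tag{1.13} \]
The expression for (1.13) in terms of the $dx$'s can be computed by substituting (1.12) into (1.13) and collecting coefficients. The answer, of course, will look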
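just like it did before. If
\[ \varphi^*(\omega) = \sum_{i_1 < \cdots < i_q} a_{i_1, \ldots, i_q}\,dx^{i_1}\wedge\cdots\wedge dx^{i_q}, \]
then the $a$'s are given by
\[ a_{i_1, \ldots, i_q} = \sum_{j_1 < \cdots < j_q}(b_{j_1, \ldots, j_q}\circ\varphi)\det\begin{pmatrix}\dfrac{\partial y^{j_1}}{\partial x^{i_1}} & \cdots & \dfrac{\partial y^{j_1}}{\partial x^{i_q}} \\ \vdots & & \vdots \\ \dfrac{\partial y^{j_q}}{\partial x^{i_1}} & \cdots & \dfrac{\partial y^{j_q}}{\partial x^{i_q}}\end{pmatrix}. \tag{1.14} \]
Again, we emphasize that there is no need to remember a complicated-looking formula like (1.14); Eqs. (1.5), (1.6), and (1.12) (and of course the rules for $\wedge$-multiplication) are sufficient. In many cases, it is much more convenient to do the substitutions directly than to use (1.14).

2. ORIENTED MANIFOLDS AND THE INTEGRATION OF EXTERIOR DIFFERENTIAL FORMS

Let $M$ be an $n$-dimensional manifold. Let $(U, \alpha)$ and $(W, \beta)$ be two charts on $M$ with coordinates $x^1, \ldots, x^n$ and $y^1, \ldots, y^n$. Let $\omega$ be an exterior differential form of degree $n$. Then we can write
\[ \omega = a\,dx^1\wedge\cdots\wedge dx^n \quad\text{on } U \]
and
\[ \omega = b\,dy^1\wedge\cdots\wedge dy^n \quad\text{on } W, \]
where the functions $a$ and $b$ are related on $U\cap W$ by (1.11), which, in this case $(q = n)$, becomes
\[ a = b\det\begin{pmatrix}\dfrac{\partial y^1}{\partial x^1} & \cdots & \dfrac{\partial y^1}{\partial x^n} \\ \vdots & & \vdots \\ \dfrac{\partial y^n}{\partial x^1} & \cdots & \dfrac{\partial y^n}{\partial x^n}\end{pmatrix} \]
or
\[ a_\alpha(\alpha(x)) = b_\beta(\beta\circ\alpha^{-1}(\alpha(x)))\det J_{\beta\circ\alpha^{-1}}(\alpha(x)) \]
or, finally,
\[ a_\alpha(v) = b_\beta(\beta\circ\alpha^{-1}(v))\det J_{\beta\circ\alpha^{-1}}(v) \quad\text{for } v \in \alpha(U\cap W). \tag{2.1} \]
If $\rho$ is a density on $M$, then the transition laws for $\rho_\alpha$ are given by
\[ \rho_\alpha(v) = \rho_\beta(\beta\circ\alpha^{-1}(v))\,|\det J_{\beta\circ\alpha^{-1}}(v)|. \tag{2.2} \]
Note that (2.2) and (2.1) look almost the same; the difference is the absolute-value sign that occurs in (2.2) but not in (2.1). In particular, if $(U, \alpha)$ and $(W, \beta)$ were such that $\det J_{\beta\circ\alpha^{-1}} > 0$, then (2.2) and (2.1) would agree for this pair of charts.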
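The difference between the two transition laws shows up already for polar coordinates in the plane. The following is a short worked sketch, not from the text: it assumes $(W, \beta)$ is the identity chart on $\mathbb{R}^2$ with coordinates $x, y$, takes the form $\omega = dx\wedge dy$ (so $b = 1$) and the Euclidean area density $\rho$ (so $\rho_\beta = 1$), and compares the two possible orderings of the polar coordinates.

```latex
% Assumed charts: (W, beta) = identity with coordinates (x, y); polar coordinates x = r cos(theta), y = r sin(theta).
% Ordering (r, theta): the Jacobian determinant is positive.
\[
\det\frac{\partial(x,y)}{\partial(r,\theta)}
 = \det\begin{pmatrix}\cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta\end{pmatrix} = r ,
\qquad
dx\wedge dy = r\,dr\wedge d\theta ,
\qquad
\rho_\alpha = |r| = r .
\]
% Ordering (theta, r): the columns are swapped, so the determinant changes sign.
\[
\det\frac{\partial(x,y)}{\partial(\theta,r)} = -r ,
\qquad
dx\wedge dy = -r\,d\theta\wedge dr ,
\qquad
\rho_{\alpha'} = |-r| = r .
\]
```

With the ordering $(r, \theta)$ the coefficient of the form and the local expression of the density agree, as (2.1) and (2.2) predict for a positive Jacobian determinant. With the ordering $(\theta, r)$ they differ by a sign.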
This leads us to the following definition: An atlas $\mathcal{A}$ of $M$ is said to be oriented if for any pair of charts $(U, \alpha)$ and $(W, \beta)$ of $\mathcal{A}$ we have
\[ \det J_{\beta\circ\alpha^{-1}}(\alpha(x)) > 0 \quad\text{for all } x \in U\cap W. \]
There is no guarantee that there exists an oriented atlas on a given manifold $M$. In fact, it is not difficult to show that there does not exist an oriented atlas on certain manifolds. (An example of a manifold possessing no oriented atlas is the Möbius strip.) We say that a manifold $M$ is orientable if it has an oriented atlas.

Let $M$ be an orientable manifold, and let $\mathcal{A}_1$ and $\mathcal{A}_2$ be two oriented atlases. We say that $\mathcal{A}_1$ and $\mathcal{A}_2$ have the same orientation, and write $\mathcal{A}_1 \sim \mathcal{A}_2$, if $\mathcal{A}_1\cup\mathcal{A}_2$ is again an oriented atlas. To say that $\mathcal{A}_1 \sim \mathcal{A}_2$ means that for any $(U, \alpha) \in \mathcal{A}_1$ and any $(W, \beta) \in \mathcal{A}_2$ we have $\det J_{\beta\circ\alpha^{-1}}(v) > 0$ on $\alpha(U\cap W)$. It is clear that $\sim$ is an equivalence relation. An equivalence class of oriented atlases is called an orientation of $M$. An orientable manifold, together with a choice of orientation, will be called an oriented manifold. We shall denote an oriented manifold by $\mathbf{M}$. That is, $\mathbf{M}$ is a manifold $M$ together with a choice of orientation.

Thus an oriented one-dimensional manifold has a preferred direction at each point (Fig. 11.1); an oriented two-dimensional manifold has a notion of clockwise versus counterclockwise direction (Fig. 11.2); and at any point of an oriented three-dimensional manifold we can distinguish between right- and left-handedness.

In general, let $\mathbf{M}$ be an oriented manifold, and let $(U, \alpha)$ be a chart of $M$ with coordinates $x^1, \ldots, x^n$. We say that $(U, \alpha)$ is a positive chart if $\det J_{\beta\circ\alpha^{-1}} > 0$ for any chart $(W, \beta)$ belonging to any oriented atlas defining (i.e., belonging to) the orientation. (It suffices to check this, of course, for all $(W, \beta)$ belonging to one fixed atlas defining the orientation.) Note that if $U$ is connected and $(U, \alpha)$ is not positive, then the chart $(U, \alpha')$, where
\[ \alpha'(x) = \langle -v^1, v^2, \ldots, v^n\rangle \quad\text{if } \alpha(x) = \langle v^1, v^2, \ldots, v^n\rangle, \]
is a positive chart. We shall say that $(U, \alpha)$ is a negative chart if $\det J_{\beta\circ\alpha^{-1}} < 0$ for all $(W, \beta)$ belonging to an atlas defining the orientation. (Thus, if $U$ is connected, then $(U, \alpha)$ must be either positive or negative.)
We now return to our initial observation comparing (2.1) with (2.2).

Proposition 2.1. Let $\mathbf{M}$ be an oriented $n$-dimensional manifold. We can identify exterior forms of degree $n$ with densities by sending the form $\omega$ into the density $\rho^\omega$, where for any positive chart $(U, \alpha)$ with coordinates $x^1, \ldots, x^n$, the function $\rho^\omega_\alpha$ is determined by
\[ \omega = \rho^\omega_\alpha(\alpha(\cdot))\,dx^1\wedge\cdots\wedge dx^n \quad\text{on } U. \tag{2.3} \]
Another way of writing (2.3) is
\[ \omega\left(\frac{\partial}{\partial x^1}, \ldots, \frac{\partial}{\partial x^n}\right) = \rho^\omega\left(\frac{\partial}{\partial x^1}, \ldots, \frac{\partial}{\partial x^n}\right). \tag{2.3$'$} \]
In other words, if $\omega = a\,dx^1\wedge\cdots\wedge dx^n$ on $U$, then $\rho^\omega_\alpha(v) = a_\alpha(v)$. That $\rho^\omega$ is really a density follows from the fact that (2.2) reduces to (2.1) for all pairs of charts belonging to a positive atlas. It is clear that this identification is additive,
\[ \rho^{\omega_1 + \omega_2} = \rho^{\omega_1} + \rho^{\omega_2}, \tag{2.4} \]
and that for any function $f$,
\[ \rho^{f\omega} = f\rho^\omega. \tag{2.5} \]
Furthermore, if $\omega(x) = 0$, then $\rho^\omega = 0$ at $x$. By the support of a differential form we mean, as usual, the closure of the set of $x$ for which $\omega(x) \ne 0$. We say that an $n$-form $\omega$ is locally absolutely integrable if the density $\rho^\omega$ is locally absolutely integrable. Note that to say that $\omega$ is locally absolutely integrable means that for any chart $(U, \alpha)$, with coordinates $x^1, \ldots, x^n$, of some atlas $\mathcal{A}$, if $\omega = a\,dx^1\wedge\cdots\wedge dx^n$ on $U$, then the function $a_\alpha = a\circ\alpha^{-1}$ is an absolutely integrable function on $\alpha(U)$. Let $\Gamma(M)$ denote the space of absolutely integrable $n$-forms of compact support. It is clear that $\Gamma(M)$ is a vector space and that $f\omega \in \Gamma(M)$ if $f$ is a (bounded) contented function and $\omega \in \Gamma(M)$. As a consequence of Proposition 2.1 and Theorem 3.1 of Chapter 10, we can state:

Theorem 2.1. Let $\mathbf{M}$ be an oriented manifold. There exists a unique linear function $\int$ on $\Gamma(M)$ satisfying the following condition: If $\operatorname{supp}\omega \subset U$, where $(U, \alpha)$ is a positive chart with coordinates $x^1, \ldots, x^n$, and if $\omega = a\,dx^1\wedge\cdots\wedge dx^n$, then
\[ \int\omega = \int_{\alpha(U)} a_\alpha. \tag{2.6} \]
Observe that we can write
\[ \int\omega = \int\rho^\omega \tag{2.7} \]
for all $\omega \in \Gamma(M)$.

The recipe for computing $\int\omega$ is now very simple. We break $\omega$ up into small pieces such that each piece lies in some $U$. (We can ignore sets of content zero
in the process.) If $\operatorname{supp}\omega \subset U$, and if $(U, \alpha)$ is a positive chart, we express $\omega$ as $\omega = a\,dx^1\wedge\cdots\wedge dx^n$. And if $a$ is given as $a = a_\alpha(x^1, \ldots, x^n)$, we integrate the function $a_\alpha$ over $\mathbb{R}^n$. The computations are automatic. The one point that has to be checked is that the chart $(U, \alpha)$ is positive. If it is negative, then $\int\omega$ is given by $-\int a_\alpha$.

Let $\mathbf{M}_1$ be an oriented manifold of dimension $q$, let $\varphi\colon M_1 \to M_2$ be a differentiable map, and let $\omega \in \bigwedge^q(M_2)$. Then for any contented compact set $A \subset M_1$ the form $e_A\varphi^*(\omega)$ belongs to $\Gamma(M_1)$, so we can consider its integral. This integral is sometimes denoted by $\int_{\varphi(A)}\omega$; that is, we make the definition
\[ \int_{\varphi(A)}\omega = \int_{M_1} e_A\,\varphi^*\omega. \tag{2.8} \]
If we regard $\varphi(A)$ as an "oriented $q$-dimensional surface" in $M_2$, then we see that the elements of $\bigwedge^q(M_2)$ are objects that we can integrate over such "surfaces". (Of course, if $q = 1$, we say "curves".)

Let us illustrate by some examples. Suppose that $M_2 = \mathbb{R}^n$, and let $A \subset \mathbb{R}^1$ be the interval $a \le t \le b$. Let $x^1, \ldots, x^n$ be the coordinates of $\mathbb{R}^n$, and let $\omega = a_1\,dx^1 + \cdots + a_n\,dx^n$. We regard $\mathbb{R}^1$ as an oriented manifold on which the identity chart is positive (and its coordinate is $t$). If $C\colon \mathbb{R}^1 \to \mathbb{R}^n$ is a differentiable curve (Fig. 11.3), then
\[ \int_{C([a,b])}\omega = \int e_{[a,b]}\,C^*(\omega) = \int_a^b (C'(t), \omega)\,dt. \tag{2.9} \]
From this last expression we see that $C$ does not have to be differentiable everywhere in order for $\int_{C([a,b])}\omega$ to make sense. In fact, if $C$ is differentiable everywhere on $\mathbb{R}$ except at a finite number of points, and if $C'(t)$ is always bounded (when regarded as an element of $\mathbb{R}^n$), then the function $(C'(\cdot), \omega)$ is defined everywhere except for a set of content zero and is bounded. Thus
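$C^*(\omega)$ is a contented density and (2.9) still makes sense. Now the curve can have corners. (See Fig. 11.4.) It should be observed that if $\omega = df$ (and if $C$ is continuous), then
\[ \int_{C([a,b])} df = \int_{[a,b]} d(f\circ C) = \int_a^b (f\circ C)' = f(C(b)) - f(C(a)). \tag{2.10} \]
In this case the integral depends not on the particular curve $C$ but only on the endpoints. In general, $\int_C\omega$ depends on the curve $C$. We will obtain conditions for it to be independent of $C$ in Section 5.

In the next example let $M_2 = \mathbb{R}^3$ and $M_1 = U \subset \mathbb{R}^2$, where $(u, v)$ are Euclidean coordinates on $\mathbb{R}^2$ and $x, y, z$ are Euclidean coordinates on $\mathbb{R}^3$. Let
\[ \omega = P\,dx\wedge dy + Q\,dx\wedge dz + R\,dy\wedge dz \]
be an element of $\bigwedge^2(\mathbb{R}^3)$. If $\varphi\colon U \to \mathbb{R}^3$ is given by the functions $x(u, v)$, $y(u, v)$, and $z(u, v)$, then for $A \subset U$,
\[ \int_{\varphi(A)}\omega = \int e_A\,\varphi^*\omega = \int e_A\,\varphi^*(P\,dx\wedge dy + Q\,dx\wedge dz + R\,dy\wedge dz) \]
\[ = \int_A\left[(P\circ\varphi)\left(\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}\right) + (Q\circ\varphi)\left(\frac{\partial x}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial z}{\partial u}\right) + (R\circ\varphi)\left(\frac{\partial y}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial y}{\partial v}\frac{\partial z}{\partial u}\right)\right]. \]

We conclude this section with another look at the volume density of Riemann metrics, this time for an oriented manifold. If $\mathbf{M}$ is an oriented manifold with a Riemann metric, then the volume density $\sigma$ corresponds to an $n$-form $\Omega$. By our rule for this correspondence, if $(U, \alpha)$ is a positive chart with coordinates $x^1, \ldots, x^n$, then
\[ \Omega = a\,dx^1\wedge\cdots\wedge dx^n, \]
where, by (4.1) of Chapter 10,
\[ a(x) = |\det(g_{ij}(x))|^{1/2} \]
is the volume in $T_x(M)$ of the parallelepiped spanned by
\[ \frac{\partial}{\partial x^1}(x), \ldots, \frac{\partial}{\partial x^n}(x). \]
Let $e_1(x), \ldots, e_n(x)$ be an orthonormal basis of $T_x(M)$ (relative to the scalar product given by the Riemann metric). Then
\[ |\det(g_{ij})|^{1/2} = \left|\det\left(\left(\frac{\partial}{\partial x^i}, e_j\right)\right)\right| = |\det A|, \]
where $A = \bigl((\partial/\partial x^i, e_j)\bigr)$ is the matrix of the linear transformation carrying $e_j \to \partial/\partial x^j$. If $\omega^1(x), \ldots, \omega^n(x)$ is the dual basis of the $e$'s, then
\[ \omega^1(x)\wedge\cdots\wedge\omega^n(x) = \det A\;dx^1(x)\wedge\cdots\wedge dx^n(x). \]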
Now $\omega^1(x), \ldots, \omega^n(x)$ can be any orthonormal basis of $T_x^*(M)$. [$T_x^*(M)$ has a scalar product, since it is the dual space of the scalar product space $T_x(M)$.] We thus get the following result: If $\omega^1, \ldots, \omega^n$ are linear differential forms such that for each $x \in M$, $\omega^1(x), \ldots, \omega^n(x)$ is an orthonormal basis of $T_x^*(M)$, then
\[ \Omega = \pm\,\omega^1\wedge\cdots\wedge\omega^n. \]
We can write
\[ \Omega = \omega^1\wedge\cdots\wedge\omega^n \tag{2.11} \]
if we know that $\omega^1\wedge\cdots\wedge\omega^n$ is a positive multiple of $dx^1\wedge\cdots\wedge dx^n$.

Can we always find such forms $\omega^1, \ldots, \omega^n$ on $U$? The answer is "yes": we can do it by applying the orthonormalization procedure to $dx^1, \ldots, dx^n$. That is, we set
\[ \omega^1 = \frac{dx^1}{\|dx^1\|}, \quad\text{where } \|dx^1\|(x) = \|dx^1(x)\| > 0 \text{ is a } C^\infty\text{-function on } U, \]
\[ \omega^2 = \frac{dx^2 - (dx^2, \omega^1)\,\omega^1}{\|dx^2 - (dx^2, \omega^1)\,\omega^1\|}, \]
and so on. The matrix which relates the $dx$'s to the $\omega$'s is composed of $C^\infty$-functions, so that the $\omega^i \in \bigwedge^1(U)$. Furthermore, it is a triangular matrix with positive entries on the diagonal, so its determinant is positive. We have thus constructed the desired forms $\omega^1, \ldots, \omega^n$, so (2.11) holds. For instance, it follows from Eq. (9.10), Chapter 9, that $d\theta$, $\sin\theta\,d\varphi$ form an orthonormal basis for $T_x^*(S^2)$ at all $x \in S^2$ (except the north and south poles). If we choose the orientation on $S^2$ so that $\theta, \varphi$ form a positive chart, then the volume form is given by
\[ \Omega = \sin\theta\,d\theta\wedge d\varphi. \]

3. THE OPERATOR d

With every function $f$ we have associated a linear differential form $df$. We can thus regard $d$ as a map from $\bigwedge^0(M)$ to $\bigwedge^1(M)$. As such, it is linear and satisfies
\[ d(f_1 f_2) = f_2\,df_1 + f_1\,df_2. \]
We now seek to define a $d\colon \bigwedge^k(M) \to \bigwedge^{k+1}(M)$ for $k > 0$ as well. We shall require that $d$ be linear and satisfy some identity with regard to multiplication, generalizing the above formula for $d(f_1 f_2)$. The condition we will impose is that
\[ d(\omega_1\wedge\omega_2) = d\omega_1\wedge\omega_2 + (-1)^p\,\omega_1\wedge d\omega_2 \]
if $\omega_1$ is a form of degree $p$. The factor $(-1)^p$ accounts for the anticommutativity of $\wedge$.
The reader should check that $d$ is consistent with this law, at least to the extent that $d(\omega_1\wedge\omega_2) = (-1)^{pq}\,d(\omega_2\wedge\omega_1)$ if $\omega_1$ is of degree $p$ and $\omega_2$ is of degree $q$.

We are going to impose one further condition on $d$ which will uniquely determine it. This condition (which lies at the heart of the matter) requires
some introduction. Let $f$ be a differentiable function, and let $C\colon I \to M$ be a differentiable curve. For any points $a, b \in I$, the fundamental theorem of the calculus implies that
\[ f(C(b)) - f(C(a)) = \int_a^b \frac{d(f\circ C)}{dt}\,dt = \int_{[a,b]} C^*\,df. \]
We can regard $b$ and $a$ (with $\pm$ signs attached) as the "oriented boundary" of the interval $[a, b]$. Let us make the convention that "integrating" an element of $\bigwedge^0(p)$ is just evaluating the function at the point $p$. As such, the equation above says that the integral of the "pullback" of $f$ over the "boundary", that is, $f(C(b)) - f(C(a))$, equals the integral of the "pullback" of $df$ over $[a, b]$. In some sense, we would like to be able to say that if $\omega$ is a form of degree $k$, then the integral of the "pullback" of $\omega$ over the "$k$-dimensional boundary" of a $(k+1)$-dimensional region is equal to the integral of the pullback of $d\omega$ over the $(k+1)$-dimensional region.

Without trying to make this requirement precise, let us see what it says for the case where $k = 1$ and the region is a triangle in the plane. Let $\varphi$ be a smooth map of some neighborhood of the triangle $\Delta \subset \mathbb{R}^2$ into $M$, and let the vertices of $\Delta$ be mapped by $\varphi$ into $x$, $y$, and $z$ (see Fig. 11.5). The boundary of $\Delta$ consists of three curves (segments) $C_1$, $C_2$, and $C_3$ (with the proper orientations). Let $\omega$ be a linear differential form on $M$. We would then expect that
\[ \int_\Delta \varphi^*\,d\omega = \int C_1^*\varphi^*\omega + \int C_2^*\varphi^*\omega + \int C_3^*\varphi^*\omega. \]
If $\omega = df$, then the three integrals on the right become (by the fundamental theorem of the calculus)
\[ f(y) - f(x) + f(z) - f(y) + f(x) - f(z) = 0. \]
Thus $\int_\Delta \varphi^*\,d(df) = 0$. Since the triangle was arbitrary, we expect that $d(df) = 0$. We now assert:

Theorem 3.1. There exists a unique linear map $d\colon \bigwedge^k(M) \to \bigwedge^{k+1}(M)$ such that on $\bigwedge^0$ it coincides with the old $d$ and such that
\[ d(\omega_1\wedge\omega_2) = d\omega_1\wedge\omega_2 + (-1)^p\,\omega_1\wedge d\omega_2 \quad\text{if } \omega_1 \in \textstyle\bigwedge^p(M) \tag{3.1} \]
and
\[ d(df) = 0 \quad\text{if } f \in \textstyle\bigwedge^0(M). \tag{3.2} \]
Proof. We first establish the uniqueness of $d$. To do this we observe that (3.1) implies that $d$ is local, in the sense that if $\omega = \omega'$ on some open set $U$, then $d\omega = d\omega'$ on $U$. In fact, let $W$ be an open set with $\overline{W} \subset U$, and let $\varphi$ be a $C^\infty$-function such that $\varphi(x) \equiv 1$ for $x \in W$ and $\operatorname{supp}\varphi \subset U$. Then $\varphi\omega = \varphi\omega'$ everywhere on $M$, and thus $d(\varphi\omega) = d(\varphi\omega')$. But, by (3.1),
\[ d(\varphi\omega) = \varphi\,d\omega + d\varphi\wedge\omega = d\omega \quad\text{on } W, \]
since $\varphi \equiv 1$ and $d\varphi = 0$ there. Thus $d\omega = d\omega'$ on $W$. Since $W$ can be arbitrary, we conclude that $d\omega = d\omega'$ on $U$.

Let $(U, \alpha)$ be a chart with coordinates $x^1, \dots, x^n$. Every $\omega \in \bigwedge^k(M)$ can be written as
\[ \omega = \sum a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k} \quad\text{on } U. \]
Now [by induction on $k$, using (3.1) and (3.2)] $d(dx^{i_1}\wedge\cdots\wedge dx^{i_k}) = 0$. Thus (3.1) implies that
\[ d\omega = \sum da_{i_1,\dots,i_k}\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_k} \quad\text{on } U. \tag{3.3} \]
Equation (3.3) gives a local formula for $d$. It also shows that $d$ is unique. In fact, we have shown that there is at most one operator $d$ on any open subset $O \subset M$ mapping $\bigwedge^k(O) \to \bigwedge^{k+1}(O)$ and satisfying the hypotheses of the theorem (for $O$). On the set $O \cap U$ it must be given by (3.3).

We now claim that in order to establish the existence of $d$, it suffices to show that the $d$ given by (3.3) [in any chart $(U, \alpha)$] satisfies the requirement of the theorem on $\bigwedge^k(U)$. In fact, suppose we have shown this to be so. Let $\mathcal{A}$ be an atlas of $M$, and for each chart $(U, \alpha) \in \mathcal{A}$ define the operator $d_\alpha\colon \bigwedge^k(U) \to \bigwedge^{k+1}(U)$ by (3.3). We would like to set $d\omega = d_\alpha\omega$ on $U$. For this to be consistent, we must show that $d_\alpha\omega = d_\beta\omega$ on $U \cap W$ if $(W, \beta)$ is some other chart. But both $d_\alpha$ and $d_\beta$ satisfy the hypotheses of the theorem on $U \cap W$, and they must therefore coincide there.

Thus to prove the theorem, it suffices to check that the operator $d$, defined by (3.3), fulfills our requirements as a map of $\bigwedge^k(U) \to \bigwedge^{k+1}(U)$. It is obviously linear. To check (3.2), we observe that
\[ df = \sum_i \frac{\partial f}{\partial x^i}\,dx^i, \]
so
\[ d(df) = \sum_i d\Bigl(\frac{\partial f}{\partial x^i}\Bigr)\wedge dx^i = \sum_{i,j} \frac{\partial^2 f}{\partial x^j\,\partial x^i}\,dx^j\wedge dx^i = 0 \]
by the equality of mixed partials.
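As an aside before continuing the proof, the local formula (3.3) and the cancellation of mixed partials can be seen in a small case. The following worked equations are only an illustrative sketch: the choice of $\mathbb{R}^3$ with coordinates $x, y, z$, the coefficient names $P, Q, R$, and the function $f = x^2 y$ are assumptions made here, not taken from the text.

```latex
% Illustration of (3.3) for a 1-form on R^3 (P, Q, R are arbitrary smooth functions).
\begin{align*}
d(P\,dx + Q\,dy + R\,dz)
  &= dP\wedge dx + dQ\wedge dy + dR\wedge dz \\
  &= \Bigl(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\Bigr) dx\wedge dy
   + \Bigl(\frac{\partial R}{\partial x} - \frac{\partial P}{\partial z}\Bigr) dx\wedge dz
   + \Bigl(\frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z}\Bigr) dy\wedge dz .
\end{align*}
% Illustration of (3.2) for the sample function f = x^2 y.
\begin{align*}
df &= 2xy\,dx + x^2\,dy, \\
d(df) &= d(2xy)\wedge dx + d(x^2)\wedge dy
       = 2x\,dy\wedge dx + 2x\,dx\wedge dy = 0 .
\end{align*}
```

In the second computation the two surviving terms are exactly the mixed partials $\partial^2 f/\partial y\,\partial x$ and $\partial^2 f/\partial x\,\partial y$, which cancel because $dy\wedge dx = -dx\wedge dy$.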
Now we turn to (3.1). Since both sides of (3.1) are linear in $\omega_1$ and $\omega_2$ separately, it suffices to check (3.1) for $\omega_1 = a\,dx^{i_1}\wedge\cdots\wedge dx^{i_p}$ and $\omega_2 =$
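$b\,dx^{j_1}\wedge\cdots\wedge dx^{j_q}$. Now
\[ \omega_1\wedge\omega_2 = ab\,dx^{i_1}\wedge\cdots\wedge dx^{i_p}\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_q} \]
and $d(ab) = b\,da + a\,db$; therefore,
\begin{align*}
d(\omega_1\wedge\omega_2) &= b\,da\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_p}\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_q} \\
&\quad + a\,db\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_p}\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_q},
\end{align*}
while
\begin{align*}
d\omega_1\wedge\omega_2 &= (da\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_p})\wedge(b\,dx^{j_1}\wedge\cdots\wedge dx^{j_q}) \\
&= b\,da\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_p}\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_q}
\end{align*}
and
\begin{align*}
\omega_1\wedge d\omega_2 &= (a\,dx^{i_1}\wedge\cdots\wedge dx^{i_p})\wedge(db\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_q}) \\
&= (-1)^p\,a\,db\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_p}\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_q},
\end{align*}
so we see that (3.1) holds. This proves the theorem. □

We can draw a number of important corollaries from Eq. (3.3). First of all, it follows immediately that for $\omega \in \bigwedge^k(M)$, for any $k$, we have
\[ d(d\omega) = 0. \tag{3.4} \]
(Remember we merely assumed it for $k = 0$.) Secondly, let $\varphi\colon M_1 \to M_2$ be a differentiable map. Then for $\omega \in \bigwedge^k(M_2)$ we have
\[ d\varphi^*\omega = \varphi^*\,d\omega. \tag{3.5} \]
To check (3.5), it suffices to verify it for any pair of compatible charts. But if $x^1, \dots, x^n$ are coordinates on $M_2$ and, locally,
\[ \omega = \sum a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k}, \]
we have
\begin{align*}
d\varphi^*\omega &= d\Bigl(\sum \varphi^*(a_{i_1,\dots,i_k})\,\varphi^*dx^{i_1}\wedge\cdots\wedge\varphi^*dx^{i_k}\Bigr) \\
&= \sum d\varphi^*(a_{i_1,\dots,i_k})\wedge d\varphi^*x^{i_1}\wedge\cdots\wedge d\varphi^*x^{i_k} \\
&= \sum \varphi^*da_{i_1,\dots,i_k}\wedge d\varphi^*x^{i_1}\wedge\cdots\wedge d\varphi^*x^{i_k} \\
&= \varphi^*\Bigl(\sum da_{i_1,\dots,i_k}\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_k}\Bigr) = \varphi^*\,d\omega.
\end{align*}
In particular, if $X$ is a vector field on $M$, we conclude that
\[ D_X\,d\omega = d(D_X\omega). \tag{3.6} \]

EXERCISES

3.1 Compute $d$ of the following differential forms.
a) $\gamma = \sum_i (-1)^{i-1} x_i\,dx_1\wedge\cdots\wedge dx_{i-1}\wedge dx_{i+1}\wedge\cdots\wedge dx_n$
b) $r^{-n}\gamma$, where $\gamma$ is as in (a) and $r = \{x_1^2 + \cdots + x_n^2\}^{1/2}$
c) $\sum p_i\,dq^i$
d) $\sin(x^2 + y^2 + z^2)\,(x\,dx + y\,dy + z\,dz)$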
Let $V$ be a vector space equipped with a nonsingular bilinear form and an orientation. Then we can define the $*$-operator as in Chapter 7. Since we identify the tangent space $T_x(V)$ with $V$ for any $x \in V$, we can consider the $*$-operator as mapping $\bigwedge^k(V) \to \bigwedge^{n-k}(V)$. For instance, in $\mathbb{R}^2$, with the rectangular coordinates $(x, y)$, we have $*dx = dy$, $*dy = -dx$, and so on.

3.2 Show that
\[ d{*}df = \Bigl(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\Bigr)\,dx\wedge dy \]
for any function $f$ on $\mathbb{R}^2$.

3.3 Obtain a similar expression for $d{*}d$ in $\mathbb{R}^n$ with its usual scalar product. (Recall that $*\,dx^1\wedge\cdots\wedge dx^k = dx^{k+1}\wedge\cdots\wedge dx^n$ and, more generally, $*\,dx^{i_1}\wedge\cdots\wedge dx^{i_k} = \pm\,dx^{j_1}\wedge\cdots\wedge dx^{j_{n-k}}$, where $(i_1, \dots, i_k, j_1, \dots, j_{n-k})$ is a permutation of $(1, \dots, n)$ and the $\pm$ is the sign of the permutation.)

3.4 Let $x, y, z, t$ be coordinates on $\mathbb{R}^4$. Introduce a scalar product on the tangent space at each point so that
\[ (dx, dx) = (dy, dy) = (dz, dz) = 1, \]
\[ (dx, dy) = (dx, dz) = (dx, dt) = (dy, dz) = (dy, dt) = (dz, dt) = 0, \]
and $(c\,dt, c\,dt) = -1$, where $c$ is a positive constant. Let the two-form $\omega$ be given by
\[ \omega = c\,(E_1\,dx\wedge dt + E_2\,dy\wedge dt + E_3\,dz\wedge dt) + B_1\,dy\wedge dz + B_2\,dz\wedge dx + B_3\,dx\wedge dy. \]
Let the three-form $\gamma$ be given by
\[ \gamma = \rho\,dx\wedge dy\wedge dz - (J_1\,dy\wedge dz + J_2\,dz\wedge dx + J_3\,dx\wedge dy)\wedge dt. \]
Write the equations $d\omega = 0$, $d{*}\omega = 4\pi\gamma$ as equations involving the various coefficients and their partial derivatives.

4. STOKES' THEOREM

In this section we shall prove a theorem which will be a far-reaching generalization of the fundamental theorem of the calculus of one variable. It should, perhaps, be called the fundamental theorem of the calculus of several variables.

We first make some definitions. Let $D$ be a domain with regular boundary in a manifold $M$. We recall (page 419) that each point of $M$ lies in a chart $(U, \alpha)$ which is one of three types, (i), (ii), or (iii).
Let $(U, \alpha)$ and $(W, \beta)$ be two charts of $M$ of type (iii). Then, as on page 420, the matrix of $J_{\beta\circ\alpha^{-1}}$ is given by
\[ \begin{pmatrix}
\dfrac{\partial y^1}{\partial x^1} & \cdots & \dfrac{\partial y^1}{\partial x^{n-1}} & \dfrac{\partial y^1}{\partial x^n} \\
\vdots & & \vdots & \vdots \\
\dfrac{\partial y^{n-1}}{\partial x^1} & \cdots & \dfrac{\partial y^{n-1}}{\partial x^{n-1}} & \dfrac{\partial y^{n-1}}{\partial x^n} \\
0 & \cdots & 0 & \dfrac{\partial y^n}{\partial x^n}
\end{pmatrix}, \]
and so
\[ \det J_{\beta\circ\alpha^{-1}} = \frac{\partial y^n}{\partial x^n} \times \det J_{(\beta\upharpoonright\partial D)\circ(\alpha\upharpoonright\partial D)^{-1}}. \tag{4.1} \]
Furthermore, $y^n(x^1, \dots, x^n) > 0$ if $x^n > 0$, since $\alpha(U\cap W) \cap \{v : v^n > 0\} = \alpha(U \cap W \cap \operatorname{int} D)$. Thus $\partial y^n/\partial x^n > 0$ at a boundary point where $x^n = 0$.

Now suppose that $M$ is an oriented manifold, and let $D \subset M$ be a domain with regular boundary. We shall make $\partial D$ into an oriented manifold. We say that an atlas $\mathcal{A}$ is adjusted if each $(U, \alpha) \in \mathcal{A}$ is of type (i), (ii), or (iii) and, in addition, if each chart of $\mathcal{A}$ is positive. If $\dim M > 1$, we can always find an adjusted atlas. In fact, by choosing the $U$ connected, we find that every $(U, \alpha)$ is either positive or negative. If $(U, \alpha)$ is negative, we replace it by $(U, \alpha')$, where $x^1_{\alpha'} = -x^1_\alpha$.

If $\dim M = 1$, then $\partial D$ consists of a discrete set of points (which we can regard as a "zero-dimensional manifold"). Each $x \in \partial D$ lies in a chart of type (iii) which is either positive or negative. We assign a plus sign to $x$ if any chart (and hence all constricted charts) of type (iii) is negative. We assign a minus sign to $x$ if its charts of type (iii) are positive. In this way we "orient" $\partial D$, as shown in Fig. 11.6.

If $\dim M > 1$, we choose an adjusted (oriented) atlas on $M$. It then follows from (4.1) and the fact that $\partial y^n/\partial x^n > 0$ that $\det J_{(\beta\upharpoonright\partial D)\circ(\alpha\upharpoonright\partial D)^{-1}} > 0$. This shows that the charts $(U\upharpoonright\partial D,\ \alpha\upharpoonright\partial D)$ form an oriented atlas on $\partial D$. We thus get an orientation on $\partial D$. This is not quite the orientation we want on $\partial D$. For reasons that will soon become apparent, we choose the orientation on $\partial D$ so that
$(U\upharpoonright\partial D,\ \alpha\upharpoonright\partial D)$ has the same sign as $(-1)^n$. That is, $(U\upharpoonright\partial D,\ \alpha\upharpoonright\partial D)$ is a positive chart if $n$ is even, and we take the orientation opposite to that determined by $(U\upharpoonright\partial D,\ \alpha\upharpoonright\partial D)$ if $n$ is odd. We can now state our main theorem.

Theorem 4.1 (Stokes' theorem). Let $M$ be an $n$-dimensional oriented manifold, and let $D \subset M$ be a domain with regular boundary. Let $\boldsymbol{\partial}D$ denote the boundary of $D$ regarded as an oriented manifold. Then for any $\omega \in \bigwedge^{n-1}(M)$ with compact support we have
\[ \int_{\boldsymbol{\partial}D} \iota^*\omega = \int_D d\omega, \tag{4.2} \]
where, as usual, $\iota$ is the injection of $\partial D$ into $M$.

Proof. For $n = 1$ this is just the fundamental theorem of the calculus. For $n > 1$ our proof is almost exactly the same as the proof of Theorem 6.1 of Chapter 10. Choose an adjusted atlas $\mathcal{A}$ and a partition of unity $\{g_j\}$ subordinate to $\mathcal{A}$. Since $\omega$ has compact support, we can write
\[ \omega = \sum_j g_j\omega, \]
where the sum is finite. Since both sides of (4.2) are linear, it suffices to verify (4.2) for each of the summands $g_j\omega$. Since $\operatorname{supp} g_j\omega \subset U$, where $(U, \alpha) \in \mathcal{A}$, we must check the three possibilities: $(U, \alpha)$ satisfies (i), (ii), or (iii).

If $(U, \alpha)$ satisfies (i), then $\iota^*(g_j\omega) = 0$, since $\operatorname{supp} g_j\omega \cap \partial D = \emptyset$, and
\[ \int_D d(g_j\omega) = \int_M e_D\,d(g_j\omega) = 0, \]
since $D \cap \operatorname{supp} g_j\omega = \emptyset$. Thus both sides of (4.2) vanish.

If $(U, \alpha)$ satisfies (ii), the left-hand side of (4.2) vanishes. We must show that the same holds for the right-hand side. Let $x^1, \dots, x^n$ be the coordinates on $(U, \alpha)$, and write
\[ g_j\omega = a_1\,dx^2\wedge\cdots\wedge dx^n + a_2\,dx^1\wedge dx^3\wedge\cdots\wedge dx^n + \cdots + a_n\,dx^1\wedge\cdots\wedge dx^{n-1}. \]
Then
\[ d(g_j\omega) = \sum_i (-1)^{i-1}\frac{\partial a_i}{\partial x^i}\,dx^1\wedge\cdots\wedge dx^n, \]
and thus
\[ \int d(g_j\omega) = \sum_i (-1)^{i-1}\int_{\mathbb{R}^n} \frac{\partial a_i}{\partial x^i}. \]
Since $g_j\omega$ has compact support, the functions $a_i$ have compact support, and we can replace the integral over $\mathbb{R}^n$ by the integral over $\square_{-\mathbf{R}}^{\mathbf{R}}$, where $\mathbf{R} = \langle R, \dots, R\rangle$ and $R$ is chosen so large that $\operatorname{supp} a_i \subset \square_{-\mathbf{R}}^{\mathbf{R}}$. But writing the multiple integral as an iterated integral, we get
\[ \int_{\square_{-\mathbf{R}}^{\mathbf{R}}} \frac{\partial a_i}{\partial x^i} = \int_{\mathbb{R}^{n-1}} \bigl[a_i(\dots, R, \dots) - a_i(\dots, -R, \dots)\bigr] = 0, \]
since $a_i(\dots, R, \dots) = a_i(\dots, -R, \dots) = 0$.
We now examine $\int_D d(g_j\omega)$ in case (iii). The argument proceeds exactly as before, except that we must compute $\int_{\alpha(U\cap D)} \partial a_i/\partial x^i$ instead of $\int_{\alpha(U)} \partial a_i/\partial x^i$. (See Fig. 11.7.) We can now replace the region of integration by a rectangle of the form $\square_{\langle -R, \dots, -R, 0\rangle}^{\langle R, \dots, R, R\rangle}$ for large $R$. If $i < n$, $\int \partial a_i/\partial x^i = 0$ as before. If $i = n$, we get
\[ \int_{\square} \frac{\partial a_n}{\partial x^n} = -\int_{\mathbb{R}^{n-1}} a_n(\cdot, \dots, \cdot, 0), \]
so that
\[ \int_D d(g_j\omega) = (-1)^n \int_{\mathbb{R}^{n-1}} a_n(\cdot, \dots, \cdot, 0). \]
Now since $x^n \equiv 0$ on $U \cap \partial D$, we see that $\iota^*dx^n \equiv 0$. Thus
\[ \iota^*(g_j\omega) = (\iota^*a_n)\,(\iota^*dx^1)\wedge\cdots\wedge(\iota^*dx^{n-1}), \]
or if (by abuse of notation) we regard $x^1, \dots, x^{n-1}$ as the coordinates of $(U\upharpoonright\partial D,\ \alpha\upharpoonright\partial D)$, we get
\[ \iota^*(g_j\omega) = a_n(\cdot, \cdot, \dots, \cdot, 0)\,dx^1\wedge\cdots\wedge dx^{n-1}. \]
In view of the choice we made for the orientation of $\partial D$, we conclude that
\[ \int_{\boldsymbol{\partial}D} \iota^*(g_j\omega) = (-1)^n \int_{\mathbb{R}^{n-1}} a_n(\cdot, \cdot, \dots, \cdot, 0). \]
This completes the proof of the theorem. □

Theorem 4.1, like the divergence theorem, is not sufficiently broad for us to apply to more general domains. For this purpose, we will again use the notion of a domain with almost regular boundary. We have already seen that the set of $x \in \partial D$ having a neighborhood of type (iii) forms a differentiable manifold. (Recall that these points need not exhaust all of $\partial D$.) Similarly, if $M$ is an oriented manifold, then this collection of points becomes an oriented manifold (with $(-1)^n$ times the induced orientation, as before). By abuse of language we shall denote this oriented manifold by $\boldsymbol{\partial}D$. Thus $\boldsymbol{\partial}D$ is an oriented manifold which, as a set, is not $\partial D$ but only the "regular" points of $\partial D$, that is, the points having a neighborhood of type (iii).
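As a concrete check of (4.2) in the regular-boundary setting of Theorem 4.1, one can take the simplest planar case. The sketch below is an illustration only: the choice $M = \mathbb{R}^2$, $D$ the closed unit disk, and $\omega = x\,dy$ are assumptions made here. For $n = 2$ the sign $(-1)^n$ is $+1$, and with a chart of type (iii) such as $x^1 = \theta$, $x^2 = 1 - r$ near the boundary (positive, since $d\theta\wedge d(1-r) = dr\wedge d\theta$), the boundary coordinate $\theta$ is positive, so the circle is traversed counterclockwise.

```latex
% Both sides of (4.2) for omega = x dy on the unit disk D in R^2.
\begin{align*}
d\omega &= dx\wedge dy, &
\int_D d\omega &= \operatorname{area}(D) = \pi, \\
\iota^*\omega &= \cos\theta\; d(\sin\theta) = \cos^2\theta\,d\theta, &
\int_{\boldsymbol{\partial}D} \iota^*\omega &= \int_0^{2\pi} \cos^2\theta\,d\theta = \pi .
\end{align*}
```

Reversing the direction of traversal would change the sign of the boundary integral, which is why the orientation convention matters.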
Theorem 4.2 (Stokes' theorem). Let $M$ be an $n$-dimensional oriented manifold, and let $D \subset M$ be a domain with almost regular boundary. Let $\boldsymbol{\partial}D$ be as above, and let $\iota$ be the injection of $\boldsymbol{\partial}D \to M$. Then for any $\omega \in \bigwedge^{n-1}(M)$ with compact support we have
\[ \int_{\boldsymbol{\partial}D} \iota^*\omega = \int_D d\omega. \tag{4.2} \]

Proof. The proof proceeds as before. We choose an adjusted atlas and a partition of unity $\{g_j\}$ subordinate to the atlas. We write $\omega = \sum g_j\omega$ and now have four cases to consider. The first three cases have been handled already. The new case is where
\[ g_j\omega = \sum_j a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n, \]
where the $\widehat{\ }$ indicates that $dx^j$ is to be omitted, has its support contained in $U$, where $(U, \alpha)$ is a chart of type (iv). By linearity, it suffices to verify (4.2) for each summand on the right, i.e., for $a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n$. Now $\iota^*(a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n) = 0$ unless $j \ge k$, since $dx^p$ vanishes on the piece of $\boldsymbol{\partial}D \cap U$ whose image under $\alpha$ lies in $H_p$. If $j < k$, then all these $dx^p$ occur, and thus $\iota^*(a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n) = 0$. If $j \ge k$, then $\iota^*(a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n)$ vanishes everywhere except on the portion of $\boldsymbol{\partial}D$ which maps under $\alpha$ onto $H_j$. On the other hand,
\[ d(a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n) = (-1)^{j-1}\frac{\partial a_j}{\partial x^j}\,dx^1\wedge\cdots\wedge dx^n. \]
We can evaluate the integral $\int_D$ by integrating over the rectangle $\square_{\langle -R, \dots, -R, 0, \dots, 0\rangle}^{\langle R, \dots, R\rangle}$ (where the $-R$'s extend through the $(k-1)$th position). Integrating first with regard to $x^j$, we obtain
\[ \int_D d(a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n) = (-1)^j \int_{H_j} a_j. \]
On the other hand, the orientation on $H_j$ is such that this integral has the sign necessary to make (4.2) hold. This proves Theorem 4.2. □

As before, we can apply Theorems 4.1 and 4.2 to still more general domains by using a limit argument. For instance, Theorem 4.2, as stated, does not apply to the domain $D$ in Fig. 11.8, because the curves $C_1$ and $C_2$ are tangent at $P$.
It does apply, however, to the approximating domain $D'$ obtained by "breaking off a little piece" (Fig. 11.9), and it is clear that the values of both sides of (4.2) for $D'$ are close to those for $D$. We thus obtain (4.2) for $D$ by passing to the limit. As before, we will not state a more general theorem covering these cases. It will be clear in each instance how to apply a limit argument.
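The unit square is a simple domain with almost regular boundary (its corners are the non-regular points), so both sides of (4.2) can be compared numerically there. The following is a minimal sketch under stated assumptions: the 1-form $\omega = P\,dx + Q\,dy$ with $P = xy$ and $Q = x^2$, the midpoint rule, and the step sizes are all illustrative choices made here, and the boundary is traversed counterclockwise, which is the orientation the convention above yields when $n = 2$.

```python
# Numerical check of (4.2) on the unit square for a 1-form P dx + Q dy.
# P, Q and the grid sizes are illustrative choices, not taken from the text.

def P(x, y):
    return x * y

def Q(x, y):
    return x * x

def d_omega(x, y, h=1e-5):
    # Coefficient of dx ^ dy in d(P dx + Q dy), i.e. dQ/dx - dP/dy,
    # approximated by central differences.
    dQ_dx = (Q(x + h, y) - Q(x - h, y)) / (2 * h)
    dP_dy = (P(x, y + h) - P(x, y - h)) / (2 * h)
    return dQ_dx - dP_dy

def interior_integral(n=200):
    # Midpoint rule for the integral of d(omega) over [0,1] x [0,1].
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += d_omega((i + 0.5) * step, (j + 0.5) * step)
    return total * step * step

def boundary_integral(n=2000):
    # Midpoint rule along the four sides, traversed counterclockwise.
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * step
        total += P(s, 0.0) * step   # bottom: x runs from 0 to 1, dy = 0
        total += Q(1.0, s) * step   # right:  y runs from 0 to 1, dx = 0
        total -= P(s, 1.0) * step   # top:    x runs from 1 to 0, dy = 0
        total -= Q(0.0, s) * step   # left:   y runs from 1 to 0, dx = 0
    return total

lhs = boundary_integral()   # integral of omega over the oriented boundary
rhs = interior_integral()   # integral of d(omega) over the square
print(lhs, rhs)
```

For this choice $d\omega = x\,dx\wedge dy$, so both numbers should be close to $1/2$. The four corner points have measure zero on the boundary and do not affect either integral, which is the informal content of passing from regular to almost regular boundaries.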
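Since the statement and proof of Stokes' theorem are so close to those of the divergence theorem, the reader might suspect that one implies the other. On an oriented manifold, the divergence theorem is, indeed, a corollary of Stokes' theorem. To see this, let $\Omega$ be an element of $\bigwedge^n(M)$ corresponding to the density $\rho$. If $X$ is a vector field, then the $n$-form $D_X\Omega$ clearly corresponds to the density $D_X\rho = \operatorname{div}\langle X, \rho\rangle$. Anticipating some notation that we shall introduce in Section 6, let $X \lrcorner\, \Omega$ be the $(n-1)$-form defined by
\[ X \lrcorner\, \Omega\,(\xi^1, \dots, \xi^{n-1}) = (-1)^{n-1}\,\Omega(\xi^1, \dots, \xi^{n-1}, X). \]
In terms of coordinates, if $\Omega = a\,dx^1\wedge\cdots\wedge dx^n$, then
\[ X \lrcorner\, \Omega = a\,\bigl[X^1\,dx^2\wedge\cdots\wedge dx^n - X^2\,dx^1\wedge dx^3\wedge\cdots\wedge dx^n + \cdots + (-1)^{n-1}X^n\,dx^1\wedge\cdots\wedge dx^{n-1}\bigr]. \]
Note that
\[ d(X \lrcorner\, \Omega) = \Bigl(\sum_i \frac{\partial (aX^i)}{\partial x^i}\Bigr)\,dx^1\wedge\cdots\wedge dx^n, \]
which is exactly the $n$-form $D_X\Omega$, since it corresponds to the density $D_X\rho = \operatorname{div}\langle X, \rho\rangle$. Thus, by Stokes' theorem,
\[ \int_{\boldsymbol{\partial}D} \iota^*(X \lrcorner\, \Omega) = \int_D d(X \lrcorner\, \Omega) = \int_D \operatorname{div}\langle X, \rho\rangle. \]
We must compare $\iota^*(X \lrcorner\, \Omega)$ with the density $\rho_X$ on $\partial D$. By (2.2) they agree on everything up to sign. To check that the signs agree, it suffices to compare the values of $\rho_X$ and of $\iota^*(X \lrcorner\, \Omega)$ on the vectors $\partial/\partial x^1, \dots, \partial/\partial x^{n-1}$ at any $x \in \partial D$. Now
\[ \iota^*(X \lrcorner\, \Omega) = (-1)^{n-1}\,a\,X^n\,dx^1\wedge\cdots\wedge dx^{n-1} \]
and, according to our convention, $x^1, \dots, x^{n-1}$ is a positive or negative coordinate system according to the sign of $(-1)^n$. Thus the two coincide if and only if $X^n$ is negative, that is,
\[ \int_{\boldsymbol{\partial}D} \iota^*(X \lrcorner\, \Omega) = \int_{\partial D} \epsilon_X\,\rho_X. \]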
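In Euclidean three-space the formulas above reduce to the familiar divergence. The following worked equations are a sketch under assumptions made here for illustration: $M = \mathbb{R}^3$ with coordinates $x, y, z$, $\Omega = dx\wedge dy\wedge dz$ (so $a = 1$), and, in the second part, the sample field $X = x\,\partial/\partial x + y\,\partial/\partial y + z\,\partial/\partial z$ on the unit ball $B$.

```latex
% General X = X^1 d/dx + X^2 d/dy + X^3 d/dz with Omega = dx ^ dy ^ dz:
\begin{align*}
X \lrcorner\, \Omega &= X^1\,dy\wedge dz - X^2\,dx\wedge dz + X^3\,dx\wedge dy, \\
d(X \lrcorner\, \Omega) &= \Bigl(\frac{\partial X^1}{\partial x} + \frac{\partial X^2}{\partial y}
   + \frac{\partial X^3}{\partial z}\Bigr)\,dx\wedge dy\wedge dz .
\end{align*}
% Sample field X = (x, y, z) on the unit ball B:
\begin{align*}
d(X \lrcorner\, \Omega) &= 3\,dx\wedge dy\wedge dz, &
\int_B d(X \lrcorner\, \Omega) &= 3\cdot\tfrac{4}{3}\pi = 4\pi .
\end{align*}
```

The value $4\pi$ agrees with the outward flux of this field through the unit sphere, since $X$ is the unit outward normal there and the sphere has area $4\pi$.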
EXERCISES

4.1 Compute the following surface integrals both directly and by using Stokes' theorem. Let $\square$ denote the unit cube, and let $B$ be the unit ball in $\mathbb{R}^3$.
a) $\int_{\partial\square} x\,dy\wedge dz + y\,dz\wedge dx + z\,dx\wedge dy$
b) $\int_{\partial B} x^3\,dy\wedge dz$
c) $\int_{\partial\square} \cos z\,dx\wedge dy$
d) $\int_{\partial U} x\,dy\wedge dz$, where $U = \{(x, y, z) : x \ge 0,\ y \ge 0,\ z \ge 0,\ x^2 + y^2 + z^2 \le 1\}$

4.2 Let $\omega = yz\,dx + x\,dy + dz$. Let $\gamma$ be the unit circle in the plane oriented in the counterclockwise direction. Compute $\int_\gamma \omega$. Let
\[ A_1 = \{(x, y, z) : z = 0,\ x^2 + y^2 \le 1\}, \qquad A_2 = \{(x, y, z) : z = 1 - x^2 - y^2,\ x^2 + y^2 \le 1\}. \]
Orient the surfaces $A_1$ and $A_2$ so that $\partial A_1 = \partial A_2 = \gamma$. Verify that $\int_{A_1} d\omega = \int_{A_2} d\omega = \int_\gamma \omega$ by computing the integrals.

4.3 Let $S^1$ be the circle and define $\omega = (1/2\pi)\,d\theta$, where $\theta$ is the angular coordinate.
a) Let $\varphi\colon S^1 \to S^1$ be a differentiable map. Show that $\int \varphi^*\omega$ is an integer. This integer is called the degree of $\varphi$ and is denoted by $\deg\varphi$.
b) Let $\varphi_t$ be a collection of maps (one for each $t$) which depends differentiably on $t$. Show that $\deg\varphi_0 = \deg\varphi_1$.
c) Let us regard $S^1$ as the unit circle in the complex numbers. Let $f$ be some function on the complex numbers, and suppose that $f(z) \ne 0$ for $|z| = r$. Define $\varphi_{r,f}$ by setting $\varphi_{r,f}(e^{i\theta}) = f(re^{i\theta})/|f(re^{i\theta})|$. Suppose $f(z) = z^n$. Compute $\deg\varphi_{r,f}$ for $r \ne 0$.
d) Let $f$ be a polynomial of degree $n \ge 1$. Thus $f(z) = a_n z^n + a_{n-1}z^{n-1} + \cdots + a_0$, where $a_n \ne 0$. Show that there is at least one complex number $z_0$ at which $f(z_0) = 0$. [Hint: Suppose the contrary. Then $\varphi_{r,(1/a_n)f}$ is defined for all $0 \le r < \infty$ and $\deg\varphi_{r,(1/a_n)f} = \text{const}$, by (b). Evaluate $\lim_{r\to 0}$ and $\lim_{r\to\infty}$ of this expression.]

Let $X$ be a vector field defined in some neighborhood $U$ of the origin in $\mathbb{E}^2$, and suppose that $X(0) = 0$ and that $X(x) \ne 0$ for $x \ne 0$. Thus $X$ vanishes only at the origin. Define the map $\varphi_r\colon S^1 \to S^1$ by
\[ \varphi_r(e^{i\theta}) = \frac{X(re^{i\theta})}{\|X(re^{i\theta})\|}. \]
This map is defined for sufficiently small $r$. By Exercise 4.3(b) the degree of this map does not depend on $r$. This degree is called the index of the vector field $X$ at the origin.
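4.4 Compute the index of
a) $x\,\dfrac{\partial}{\partial x} + y\,\dfrac{\partial}{\partial y}$
b) $x\,\dfrac{\partial}{\partial x} - y\,\dfrac{\partial}{\partial y}$
c) $y\,\dfrac{\partial}{\partial x} - x\,\dfrac{\partial}{\partial y}$
d) Construct a vector field with index 2.
e) Show that the index of $-X$ is the same as the index of $X$ for any vector field $X$.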
4.5 Let $X$ be a vector field on an oriented two-dimensional manifold $M$, and suppose that $X(p) = 0$ for some $p \in M$ and that $X$ does not vanish at any other point in a small neighborhood of $p$. By choosing an oriented chart mapping $p$ into zero, we get a vector field on $\mathbb{E}^2$ vanishing at the origin. Show that the index of this vector field does not depend on the choice of charts. We can thus define the index of $X$ at $p$.

4.6 a) On the sphere $S^2$ let $X$ be a vector field which is tangent to the meridian circles everywhere and vanishes only at the north and south poles. What is its index at each pole?
b) Let $Y$ be a vector field which is tangent to the circles of latitude everywhere and vanishes only at the north and south poles. What is its index at each pole?

5. SOME ILLUSTRATIONS OF STOKES' THEOREM

As a simple but important corollary of Theorem 4.2, we state:

Theorem 5.1. Let $\varphi\colon M_1 \to M_2$ be a differentiable map of the oriented $k$-dimensional manifold $M_1$ into the $n$-dimensional manifold $M_2$. Let $\omega$ be a form of degree $k - 1$ on $M_2$, and let $D \subset M_1$ be a domain with almost regular boundary on $M_1$. Then we have
\[ \int_{\boldsymbol{\partial}D} \iota^*\varphi^*\omega = \int_D \varphi^*(d\omega). \tag{5.1} \]

Equation (5.1) follows directly from (4.2) and from the fact that $\varphi^*d = d\varphi^*$. We can regard the right-hand side of (5.1) as the integral of $d\omega$ over the "oriented $k$-dimensional hypersurface" $\varphi(D)$. Equation (5.1) says that this integral is equal to the integral of $\omega$ over the $(k-1)$-dimensional hypersurface $\varphi(\partial D)$.

We now give a simple application of Theorem 5.1. Let $C_0\colon [0, 1] \to M$ and $C_1\colon [0, 1] \to M$ be two differentiable curves with $C_0(0) = C_1(0) = p$ and $C_0(1) = C_1(1) = q$. (See Fig. 11.10.) We say that $C_0$ and $C_1$ are (differentiably) homotopic if there exists a differentiable map $\varphi$ of a neighborhood of the unit square $[0, 1] \times [0, 1] \subset \mathbb{R}^2$ into $M$ such that $\varphi(t, 0) = C_0(t)$, $\varphi(t, 1) = C_1(t)$, $\varphi(0, s) = p$, and $\varphi(1, s) = q$. (See Fig. 11.11.)
For each value of $s$ we get the curve $C_s$ given by $C_s(t) = \varphi(t, s)$. We think of $\varphi$ as providing a differentiable "deformation" of the curve $C_0$ into the curve $C_1$.
Proposition 5.1. Let $C_0$ and $C_1$ be differentiably homotopic curves, and let $\omega$ be a linear differential form on $M$ with $d\omega = 0$. Then
\[ \int_{C_0} \omega = \int_{C_1} \omega. \tag{5.2} \]

Proof. In fact, writing $\square = \square_{\langle 0,0\rangle}^{\langle 1,1\rangle}$ for the unit square,
\[ \int_{\boldsymbol{\partial}\square} \varphi^*\omega = \int_{\square} \varphi^*\,d\omega = 0. \]
But $\int_{\boldsymbol{\partial}\square}$ is the sum of the four terms corresponding to the four sides of the square. The two vertical sides ($t = 0$ and $t = 1$) contribute nothing, since $\varphi$ maps these curves into points. The top gives $-\int_{C_1}\omega$ (because of the counterclockwise orientation), and the bottom gives $\int_{C_0}\omega$. Thus $\int_{C_0}\omega - \int_{C_1}\omega = 0$, proving the proposition. □

It is easy to see that the proposition extends without difficulty to piecewise differentiable curves and piecewise differentiable homotopies. Let us say that two piecewise differentiable curves, $C_0$ and $C_1$, are (piecewise differentiably) homotopic if there is a continuous map $\varphi$ of $[0, 1] \times [0, 1] \to M$ such that
i) $\varphi(0, s) = p$, $\varphi(1, s) = q$;
ii) $\varphi(t, 0) = C_0(t)$, $\varphi(t, 1) = C_1(t)$;
iii) there are a finite number of points $t_0 < t_1 < \cdots < t_m$ such that $\varphi$ coincides with the restriction of a differentiable map defined in some neighborhood of each rectangle $[t_i, t_{i+1}] \times [0, 1]$. (See Fig. 11.12.)

To verify that Proposition 5.1 holds for the case of piecewise differentiable homotopies, we apply Stokes' theorem to each rectangle and observe that the contributions of the interior vertical lines cancel one another.

We say that a manifold $M$ is connected if every pair of points can be joined by a (piecewise differentiable) curve. Thus $\mathbb{R}^n$, for example, is connected. We say that $M$ is simply connected if all (piecewise differentiable) curves joining the same two points are (piecewise differentiably) homotopic. (Note that the circle $S^1$ is not simply connected.) Let us verify that $\mathbb{R}^n$ is simply connected. If $C_0$ and $C_1$ are two curves, let $\varphi\colon [0, 1] \times [0, 1] \to \mathbb{R}^n$ be given by
\[ \varphi(t, s) = (1 - s)\,C_0(t) + s\,C_1(t). \]
It is clear that $\varphi$ has all the desired properties.

Proposition 5.2.
Let $M$ be a connected and simply connected manifold, and let $O \in M$. Let $\omega \in \bigwedge^1(M)$ satisfy $d\omega = 0$. For any $x \in M$ let $f(x) = \int_C \omega$, where $C$ is some piecewise differentiable curve joining $O$ to $x$. The function $f$ is well defined and differentiable, and $df = \omega$.
Proof. It follows from Proposition 5.1 that $f$ is well defined. If $C_0$ and $C_1$ are two curves joining $O$ to $x$, then they are homotopic, and so $\int_{C_0}\omega = \int_{C_1}\omega$. It is clear that $f$ is continuous, since
\[ f(x) - f(y) = \int_D \omega, \]
where $D$ is any curve joining $y$ to $x$ (Fig. 11.13). To check that $f$ is differentiable, let $(U, \alpha)$ be a chart about $x$ with coordinates $\langle x^1, \dots, x^n\rangle$. Then
\[ f(x^1, \dots, x^i + h, \dots, x^n) - f(x^1, \dots, x^n) = \int_C \omega, \]
where $C$ is any curve joining $p$ to $q$, where $\alpha(p) = (x^1, \dots, x^i, \dots, x^n)$, and where $\alpha(q) = (x^1, \dots, x^i + h, \dots, x^n)$. We can take $C$ to be the curve given by
\[ \alpha\circ C(t) = (x^1, \dots, x^i + ht, \dots, x^n). \]
If $\omega = a_1\,dx^1 + \cdots + a_n\,dx^n$, then
\[ \int_C \omega = \int_0^1 h\,a_i\,dt = \int_0^h a_i(x^1, \dots, x^i + s, \dots, x^n)\,ds. \]
(See Fig. 11.14.) Thus
\[ \lim_{h\to 0} \frac{1}{h}\bigl[f(x^1, \dots, x^i + h, \dots, x^n) - f(x^1, \dots, x^n)\bigr] = a_i, \]
that is, $\partial f/\partial x^i = a_i$. This shows that $f$ is differentiable and that $df = \omega$, proving the proposition. □

We have thus established that every $\omega \in \bigwedge^1(\mathbb{R}^n)$ with $d\omega = 0$ is of the form $df$. More generally, it can be established that if $\Omega \in \bigwedge^k(\mathbb{R}^n)$ satisfies $d\Omega = 0$, then $\Omega = d\omega$ for some $\omega \in \bigwedge^{k-1}(\mathbb{R}^n)$.

*This is not true for an arbitrary manifold. For instance, every $\omega \in \bigwedge^1(S^1)$ satisfies $d\omega = 0$. Yet the element of angle form (which is, unfortunately, denoted by $d\theta$) is not the $d$ of any function.

The fact that $d^2 = 0$ shows that if $\Omega = d\omega$, then $d\Omega = 0$. Thus the space $d[\bigwedge^{k-1}(M)] \subset \bigwedge^k(M)$ is a subspace of the space $\ker_k d$ of elements in $\bigwedge^k(M)$ satisfying $d\Omega = 0$. The quotient space $\ker_k d / d[\bigwedge^{k-1}]$ is denoted by $H^k(M)$ and is called the $k$th cohomology group of $M$. If $M$ is compact, it can be shown that $H^k$ is finite-dimensional. It measures (roughly speaking) "how many" $k$-dimensional holes there are in $M$.*
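The failure of exactness on a manifold with a "hole" can be observed numerically. The sketch below does not use the circle $S^1$ of the text but the punctured plane $\mathbb{R}^2 \setminus \{0\}$, which is an assumption made here for convenience: there the angle form can be written as $(x\,dy - y\,dx)/(x^2 + y^2)$, it is closed, and its integral over a loop is $2\pi$ if the loop winds once around the origin but $0$ if the loop lies in a simply connected region avoiding the origin, in line with Proposition 5.1. The function name, the circles used, and the number of sample points are hypothetical choices.

```python
import math

def angle_form_integral(cx, cy, r, n=20000):
    # Midpoint rule for the integral of (x dy - y dx) / (x^2 + y^2)
    # around the circle of radius r centered at (cx, cy), counterclockwise.
    # The circle must not pass through the origin.
    total = 0.0
    dt = 2.0 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * dt
        x = cx + r * math.cos(t)
        y = cy + r * math.sin(t)
        dx = -r * math.sin(t) * dt   # pullback of dx along the circle
        dy = r * math.cos(t) * dt    # pullback of dy along the circle
        total += (x * dy - y * dx) / (x * x + y * y)
    return total

around_origin = angle_form_integral(0.0, 0.0, 1.0)     # loop encloses the origin
away_from_origin = angle_form_integral(3.0, 0.0, 1.0)  # loop does not enclose it
print(around_origin, away_from_origin)
```

If the form were $df$ for a function $f$ on the punctured plane, both integrals would vanish by (2.10); the nonzero value for the first loop shows that no such $f$ exists, which is the sense in which $H^1$ detects the hole.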
6. THE LIE DERIVATIVE OF A DIFFERENTIAL FORM

Let $M$ be a differentiable manifold, and let $\varphi$ be a flow on $M$ with infinitesimal generator $X$. For any $\omega \in \bigwedge^k(M)$ we can consider the expression
\[ \frac{\varphi_t^*\omega - \omega}{t}. \]
It is not difficult (using local expressions) to verify that the limit as $t \to 0$ exists and is again an element of $\bigwedge^k(M)$, which we denote by $D_X\omega$. The purpose of this section is to provide an effective formula for computing $D_X\omega$. For this purpose, we first collect some properties of $D_X$. First of all, we have that it is linear:
\[ D_X(\omega_1 + \omega_2) = D_X\omega_1 + D_X\omega_2. \tag{6.1} \]
Secondly, we have
\begin{align*}
\varphi_t^*(\omega_1\wedge\omega_2) - \omega_1\wedge\omega_2 &= (\varphi_t^*\omega_1)\wedge(\varphi_t^*\omega_2) - \omega_1\wedge\omega_2 \\
&= (\varphi_t^*\omega_1)\wedge(\varphi_t^*\omega_2) - (\varphi_t^*\omega_1)\wedge\omega_2 + (\varphi_t^*\omega_1)\wedge\omega_2 - \omega_1\wedge\omega_2.
\end{align*}
Dividing by $t$ and passing to the limit, we see that
\[ D_X(\omega_1\wedge\omega_2) = (D_X\omega_1)\wedge\omega_2 + \omega_1\wedge D_X\omega_2. \tag{6.2} \]
Finally, since $\varphi_t^*d = d\varphi_t^*$, we have
\[ D_X\,d\omega = d(D_X\omega). \tag{6.3} \]
Actually, these three formulas suffice for the computation. If
\[ \omega = \sum a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k}, \]
then
\begin{align*}
D_X\omega &= \sum D_X(a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k}) && \text{by (6.1)} \\
&= \sum \bigl[(D_Xa_{i_1,\dots,i_k})\,dx^{i_1}\wedge\cdots\wedge dx^{i_k} + a_{i_1,\dots,i_k}\,(D_X\,dx^{i_1})\wedge\cdots\wedge dx^{i_k} \\
&\qquad + \cdots + a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge(D_X\,dx^{i_k})\bigr] && \text{by repeated use of (6.2)} \\
&= \sum \bigl[(D_Xa_{i_1,\dots,i_k})\,dx^{i_1}\wedge\cdots\wedge dx^{i_k} + a_{i_1,\dots,i_k}\,d(D_Xx^{i_1})\wedge\cdots\wedge dx^{i_k} \\
&\qquad + \cdots + a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge d(D_Xx^{i_k})\bigr] && \text{by (6.3).}
\end{align*}
Since this expression is rather cumbersome (the $d(D_Xx^i)$ have to be expanded and the terms collected), we shall derive a simpler and more convenient expression for $D_X\omega$. In order to do this, we make an algebraic detour.

Recall that the operator $d\colon \bigwedge^k(M) \to \bigwedge^{k+1}(M)$ is linear and satisfies the identity
\[ d(\omega_1\wedge\omega_2) = d\omega_1\wedge\omega_2 + (-1)^k\,\omega_1\wedge d\omega_2 \tag{6.4} \]
if $\omega_1 \in \bigwedge^k(M)$. More generally, any (sequence of linear) maps $\theta$ of $\bigwedge^k(M) \to \bigwedge^{k+1}(M)$
satisfying the identity
\[ \theta(\omega_1\wedge\omega_2) = \theta\omega_1\wedge\omega_2 + (-1)^k\,\omega_1\wedge\theta\omega_2 \tag{6.4$'$} \]
and
\[ \operatorname{supp}\theta\omega \subset \operatorname{supp}\omega\,\dagger \tag{6.5} \]
will be called an antiderivation of the algebra $\bigwedge(M)$. It follows from (6.5) that if $\omega_1 = \omega_2$ on an open set $U$, then $\theta(\omega_1) = \theta(\omega_2)$ on $U$.

Now about every $x \in M$ we can find a neighborhood $U$ and functions $x^1, \dots, x^n$, so that $\omega \in \bigwedge^k(M)$ can be written as
\[ \omega = \sum a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k} \quad\text{on } U. \tag{6.6} \]
Then by repeated use of (6.4$'$) we have
\begin{align*}
\theta(\omega) = \sum \bigl[&\theta(a_{i_1,\dots,i_k})\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_k} + a_{i_1,\dots,i_k}\,\theta(dx^{i_1})\wedge\cdots\wedge dx^{i_k} \\
&+ \cdots + (-1)^{k-1}a_{i_1,\dots,i_k}\,dx^{i_1}\wedge\cdots\wedge\theta(dx^{i_k})\bigr]. \tag{6.7}
\end{align*}
We thus arrive at the important conclusion:

Proposition 6.1. Any antiderivation $\theta\colon \bigwedge^k(M) \to \bigwedge^{k+1}(M)$, $k = 0, \dots, n$, is uniquely determined by its action on $\bigwedge^0(M)$ and $\bigwedge^1(M)$. That is, if $\theta_1(\omega) = \theta_2(\omega)$ for all $\omega \in \bigwedge^0(M)$ and $\bigwedge^1(M)$, then $\theta_1(\Omega) = \theta_2(\Omega)$ for $\Omega \in \bigwedge^k(M)$ for any $k$.

Now suppose we are given maps $\theta\colon \bigwedge^0(M) \to \bigwedge^1(M)$ and $\theta\colon \bigwedge^1(M) \to \bigwedge^2(M)$ which satisfy (6.5) and (6.4$'$) where it makes sense, that is,
\[ \theta(f_1f_2) = \theta(f_1)\,f_2 + f_1\,\theta(f_2) \quad\text{and}\quad \theta(f\omega) = \theta(f)\wedge\omega + f\,\theta(\omega). \tag{6.8} \]
Then any chart $(U, \alpha)$ defines $\theta\colon \bigwedge^k(U) \to \bigwedge^{k+1}(U)$ by (6.7). This gives an antiderivation $\theta_U$ on $U$, as can easily be checked by the use of the argument on pp. 440-441. By the uniqueness argument, if $(W, \beta)$ is a second chart, the antiderivations $\theta_U$ and $\theta_W$ coincide on $U \cap W$. Therefore, Eq. (6.7) is consistent and yields a well-defined antiderivation on $\bigwedge(M)$. (Observe that we have just repeated about two-thirds of the proof of Theorem 3.1 for the more general context of any antiderivation.)

† This condition is actually a consequence of (6.4$'$). In fact, let $U$ be an open set containing $\operatorname{supp}\omega$. Since $\{U,\ M - \operatorname{supp}\omega\}$ is an open covering of $M$, we can find a partition of unity subordinate to it. In particular, we can find a $C^\infty$-function $\varphi$ which is identically one on $\operatorname{supp}\omega$ and vanishes outside $U$. Then $\omega = \varphi\omega$, so that $\theta(\omega) = \theta(\varphi\omega) = \theta(\varphi)\wedge\omega + \varphi\,\theta(\omega)$.
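Thus $\operatorname{supp}\theta(\omega) \subset \operatorname{supp}\omega \cup \operatorname{supp}\varphi \subset U$. Since $U$ is an arbitrary neighborhood of $\operatorname{supp}\omega$, we conclude that $\operatorname{supp}\theta(\omega) \subset \operatorname{supp}\omega$.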
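Also observe that in the above arguments, nothing changes if instead of $\theta\colon \bigwedge^k(M) \to \bigwedge^{k+1}(M)$ we have $\theta\colon \bigwedge^k(M) \to \bigwedge^{k-1}(M)$. [We take this to mean $\theta(f) = 0$ for $f \in \bigwedge^0(M)$.] In fact, the same argument works for $\theta\colon \bigwedge^k(M) \to \bigwedge^{k+r}(M)$ for any odd integer $r$. We can thus state:

Proposition 6.2. Let $\theta\colon \bigwedge^0(M) \to \bigwedge^r(M)$ and $\theta\colon \bigwedge^1(M) \to \bigwedge^{r+1}(M)$ be linear maps satisfying (6.5) and (6.8), where $r$ is odd. Then there exists one and only one way of extending $\theta$ to an antiderivation $\theta\colon \bigwedge^k(M) \to \bigwedge^{k+r}(M)$ satisfying (6.4$'$).

As an application of this proposition, we will attach an antiderivation $\theta(X)\colon \bigwedge^k(M) \to \bigwedge^{k-1}(M)$ to every smooth vector field $X$ on $M$. Since $r = -1$, for $f \in \bigwedge^0(M)$ we set $\theta(X)f = 0$. For $\omega \in \bigwedge^1(M)$ we set
\[ \theta(X)\omega = \langle X, \omega\rangle. \tag{6.9} \]
To verify (6.8) means to check that $\theta(X)(f\omega) = f\,\theta(X)\omega$, that is, that
\[ \langle X, f\omega\rangle = f\,\langle X, \omega\rangle, \]
which is obvious.

If $f$ is a function and $\theta$ is an antiderivation, we denote by $f\theta$ the map which sends $\omega \to f\theta(\omega)$. It is easy to check that this is again an antiderivation. We can assert the following as a consequence of the uniqueness theorem: Let $X$ and $Y$ be smooth vector fields, and let $f$ and $g$ be smooth functions. Then
\[ \theta(fX + gY) = f\,\theta(X) + g\,\theta(Y). \tag{6.10} \]
By the proposition, it suffices to check (6.10) on all $\omega \in \bigwedge^1(M)$. By (6.9), this is just
\[ \langle fX + gY, \omega\rangle = f\,\langle X, \omega\rangle + g\,\langle Y, \omega\rangle, \]
which is obvious. In particular, in a chart $(U, \alpha)$, if
\[ X = \sum_i X^i\,\frac{\partial}{\partial x^i}, \]
then
\[ \theta(X) = \sum_i X^i\,\theta\Bigl(\frac{\partial}{\partial x^i}\Bigr). \]
To evaluate $\theta(\partial/\partial x^i)$, we use (6.8) and the fact that
\[ \theta\Bigl(\frac{\partial}{\partial x^i}\Bigr)f = 0, \qquad \theta\Bigl(\frac{\partial}{\partial x^i}\Bigr)dx^j = \delta_i^j. \]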
Thus, for example, $\theta(\partial/\partial x^i)\,dx^p\wedge dx^q = 0$ if neither $p = i$ nor $q = i$, while $\theta(\partial/\partial x^i)(dx^i\wedge dx^j) = dx^j$, $\theta(\partial/\partial x^i)(dx^j\wedge dx^i) = -dx^j$, etc.

Let us call a (sequence of) map(s) $D\colon \bigwedge^k(M) \to \bigwedge^{k+s}(M)$, where $s$ is even, a derivation if it satisfies (6.5) and
\[ D(\omega_1\wedge\omega_2) = D\omega_1\wedge\omega_2 + \omega_1\wedge D\omega_2. \tag{6.11} \]
Since $s$ is even, this is consistent. The most important example is $D_X$, where $s = 0$. Then (6.11) is just (6.2). All the previous arguments about existence and uniqueness of extensions apply unchanged to derivations, as can easily be checked. We can therefore assert:

Proposition 6.3. Let $D\colon \bigwedge^0(M) \to \bigwedge^s(M)$ and $D\colon \bigwedge^1(M) \to \bigwedge^{s+1}(M)$, where $s$ is even, be maps satisfying (6.5) and (6.8) (with $\theta$ replaced by $D$). Then there exists one and only one way of extending $D$ to a derivation of $\bigwedge(M)$.

We need one further algebraic fact.

Proposition 6.4. Let $\theta_1\colon \bigwedge^k \to \bigwedge^{k+r_1}$ and $\theta_2\colon \bigwedge^k \to \bigwedge^{k+r_2}$ be antiderivations. Then $\theta_1\theta_2 + \theta_2\theta_1\colon \bigwedge^k \to \bigwedge^{k+r_1+r_2}$ is a derivation.

Proof. Since $r_1$ and $r_2$ are both odd, $r_1 + r_2$ is even. Equation (6.5) obviously holds. To verify (6.11), let $\omega_1 \in \bigwedge^k(M)$. Then
\begin{align*}
\theta_1\theta_2(\omega_1\wedge\omega_2) &= \theta_1\bigl[\theta_2\omega_1\wedge\omega_2 + (-1)^k\,\omega_1\wedge\theta_2\omega_2\bigr] \\
&= \theta_1\theta_2\omega_1\wedge\omega_2 + (-1)^{k+r_2}\,\theta_2\omega_1\wedge\theta_1\omega_2 + (-1)^k\,\theta_1\omega_1\wedge\theta_2\omega_2 + \omega_1\wedge\theta_1\theta_2\omega_2.
\end{align*}
Similarly,
\[ \theta_2\theta_1(\omega_1\wedge\omega_2) = \theta_2\theta_1\omega_1\wedge\omega_2 + (-1)^{k+r_1}\,\theta_1\omega_1\wedge\theta_2\omega_2 + (-1)^k\,\theta_2\omega_1\wedge\theta_1\omega_2 + \omega_1\wedge\theta_2\theta_1\omega_2. \]
Since $r_1$ and $r_2$ are both odd, the middle terms cancel when we add. Hence we get
\[ (\theta_1\theta_2 + \theta_2\theta_1)(\omega_1\wedge\omega_2) = (\theta_1\theta_2 + \theta_2\theta_1)\omega_1\wedge\omega_2 + \omega_1\wedge(\theta_1\theta_2 + \theta_2\theta_1)\omega_2. \quad\square \]

As a first application of Proposition 6.4, we observe that
\[ \theta(X)\circ\theta(Y) = -\theta(Y)\circ\theta(X). \tag{6.12} \]
In fact, by Proposition 6.4, $\theta(X)\theta(Y) + \theta(Y)\theta(X)$ is a derivation of degree $-2$, that is, it vanishes on $\bigwedge^0$ and $\bigwedge^1$. It must therefore vanish identically. We could, of course, directly verify (6.12) from the local description of $\theta(X)$ and $\theta(Y)$.

As a more serious use of Proposition 6.4, consider $\theta(X)\circ d + d\circ\theta(X)$, where $X$ is a smooth vector field.
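Since $d\colon \wedge^k \to \wedge^{k+1}$ and $\theta(X)\colon \wedge^k \to \wedge^{k-1}$, we conclude that $\theta(X) \circ d + d \circ \theta(X)\colon \wedge^k \to \wedge^k$. We now assert the main formula of this section:

$$D_X = \theta(X) \circ d + d \circ \theta(X). \qquad (6.13)$$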
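The local rule for $\theta(\partial/\partial x^i)$ on wedge products of coordinate differentials, and the anticommutation relation (6.12), can be checked mechanically. The following Python sketch is not from the text. It assumes that a form with constant coefficients is stored as a dictionary mapping increasing index tuples $(p_1 < \dots < p_k)$, standing for $dx^{p_1} \wedge \cdots \wedge dx^{p_k}$, to their coefficients; the names `interior`, `negate`, and `anticommutes` are illustrative only.

```python
from itertools import combinations


def interior(i, form):
    """Apply theta(d/dx^i) to a constant-coefficient form.

    `form` maps increasing index tuples (p1, ..., pk) to coefficients.
    Removing dx^i from slot number `pos` (counted from 0) costs the sign
    (-1)**pos, which is the antiderivation rule applied `pos` times.
    """
    result = {}
    for indices, coeff in form.items():
        if i in indices:
            pos = indices.index(i)
            rest = indices[:pos] + indices[pos + 1:]
            result[rest] = result.get(rest, 0) + (-1) ** pos * coeff
    return {k: c for k, c in result.items() if c != 0}


def negate(form):
    """Return the form with every coefficient multiplied by -1."""
    return {k: -c for k, c in form.items()}


def anticommutes(n):
    """Check theta_i theta_j = -theta_j theta_i on all basis monomials in n variables."""
    for k in range(n + 1):
        for indices in combinations(range(1, n + 1), k):
            monomial = {indices: 1}
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    left = interior(i, interior(j, monomial))
                    right = negate(interior(j, interior(i, monomial)))
                    if left != right:
                        return False
    return True


print(interior(1, {(1, 2): 1}))  # theta(d/dx^1) applied to dx^1 ^ dx^2
print(interior(2, {(1, 2): 1}))  # theta(d/dx^2) applied to dx^1 ^ dx^2
print(anticommutes(4))
```

Only constant coefficients are handled, so this illustrates the algebra of $\theta(X)$ and not the full formula (6.13), which also involves $d$ acting on coefficient functions.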
Since both sides of (6.13) are derivations, it suffices to check (6.13) for functions and linear differential forms. If $f \in \wedge^0(M)$, then $\theta(X)f = 0$. Thus, by (6.9), Eq. (6.13) becomes

$$D_X f = \langle X, df\rangle,$$

which we know holds. Next we must verify (6.13) for $\omega \in \wedge^1(M)$. By (6.5), it suffices to verify (6.13) locally. If we write $\omega = a_1\,dx^1 + \cdots + a_n\,dx^n$, it suffices, by linearity, to verify (6.13) for each term $a_i\,dx^i$. Since both sides of (6.13) are derivations, we have

$$D_X(a_i\,dx^i) = (D_X a_i)\,dx^i + a_i\,(D_X\,dx^i)$$

and

$$[\theta(X)d + d\theta(X)](a_i\,dx^i) = \bigl([\theta(X)d + d\theta(X)]a_i\bigr)\,dx^i + a_i\,[\theta(X)d + d\theta(X)]\,dx^i.$$

Since we have verified (6.13) for functions, we only have to check (6.13) for $dx^i$. Now $D_X\,dx^i = d\,D_X x^i$ and

$$[\theta(X)d + d\theta(X)]\,dx^i = d\theta(X)\,dx^i = d\langle X, dx^i\rangle = d\,D_X x^i.$$

This completes the proof of (6.13).

In many circumstances it will be convenient to free the letter $\theta$ for other uses. We shall therefore occasionally adopt the notation

$$X \lrcorner\, \omega = \theta(X)\omega.$$

The symbol $\lrcorner$ is called the interior product. $X \lrcorner\, \omega$ is the interior product of the form $\omega$ with the vector field $X$. If $\omega \in \wedge^k$, then $X \lrcorner\, \omega \in \wedge^{k-1}$. Equation (6.13) can then be rewritten as

$$D_X\omega = X \lrcorner\, d\omega + d(X \lrcorner\, \omega). \qquad (6.14)$$

Let us see what (6.14) says in some special cases in terms of local coordinates. If

$$\omega = a_1\,dx^1 + \cdots + a_n\,dx^n \quad\text{and}\quad X = \sum_i X^i \frac{\partial}{\partial x^i},$$

then

$$d\omega = \sum_i da_i \wedge dx^i.$$

Hence

$$X \lrcorner\, d\omega = \sum_i \bigl[(X \lrcorner\, da_i)\,dx^i - (X \lrcorner\, dx^i)\,da_i\bigr] = \sum_{i,j}\left(X^j\frac{\partial a_i}{\partial x^j}\,dx^i - X^i\frac{\partial a_i}{\partial x^j}\,dx^j\right),$$

while

$$X \lrcorner\, \omega = \sum_j a_j X^j,$$

so

$$d(X \lrcorner\, \omega) = \sum_{i,j}\left(\frac{\partial a_j}{\partial x^i}X^j + a_j\frac{\partial X^j}{\partial x^i}\right)dx^i.$$
Thus

$$D_X\omega = \sum_i\Bigl(\sum_j X^j\frac{\partial a_i}{\partial x^j} + a_j\frac{\partial X^j}{\partial x^i}\Bigr)dx^i,$$

which agrees with Eq. (7.12') of Chapter 9.

As a second illustration, let $\Omega = a\,dx^1 \wedge \cdots \wedge dx^n$, where $n = \dim M$. Then $d\Omega = 0$, so (6.14) reduces to $D_X\Omega = d(X \lrcorner\, \Omega)$. If $X = \sum X^i(\partial/\partial x^i)$, then

$$X \lrcorner\, \Omega = \sum_i aX^i\,\frac{\partial}{\partial x^i} \lrcorner\, (dx^1 \wedge \cdots \wedge dx^n) = \sum_i (-1)^{i-1} aX^i\,dx^1 \wedge \cdots \wedge dx^{i-1} \wedge dx^{i+1} \wedge \cdots \wedge dx^n,$$

which is merely the formula introduced at the end of Section 4. Then

$$D_X\Omega = d(X \lrcorner\, \Omega) = \Bigl(\sum_i \frac{\partial (aX^i)}{\partial x^i}\Bigr)\,dx^1 \wedge \cdots \wedge dx^n.$$

Since we can always locally identify a density with an $n$-form by identifying $\rho$ with $\rho_\alpha\,dx^1 \wedge \cdots \wedge dx^n$ on $(U, \alpha)$, we obtain another proof of Proposition 5.2 of Chapter 10.

Appendix I. "VECTOR ANALYSIS"

We list here the relationships between notions introduced in this chapter and various concepts found in books on "vector analysis", although we shall have no occasion to use them.

In oriented Euclidean three-space $E^3$, there are a number of identifications we can make which give a special form to some of the operations we have introduced in this chapter. First of all, in $E^3$, as in any Riemann space, we can (and shall) identify vector fields with linear differential forms. Thus for any function $f$ we can regard $df$ as a vector field. As such, it is called $\operatorname{grad} f$. Thus, in $E^3$, in terms of rectangular coordinates $x, y, z$,

$$\operatorname{grad} f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right\rangle,$$

where we have also identified vector fields on $E^3$ with $E^3$-valued functions.

Secondly, since $E^3$ is oriented, we can, via the $*$-operator (acting on each $T^*$), identify $\wedge^2(E^3)$ with $\wedge^1(E^3)$. Recall that $*$ is given by

$$*(dx \wedge dy) = dz, \qquad *(dx \wedge dz) = -dy, \qquad *(dy \wedge dz) = dx. \qquad (\text{I}.1)$$
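In particular, if $\omega_1 = \langle P, Q, R\rangle = P\,dx + Q\,dy + R\,dz$ and $\omega_2 = \langle L, M, N\rangle = L\,dx + M\,dy + N\,dz$, we can introduce the so-called "vector product" of $\omega_1$ with $\omega_2$. It is defined by

$$\omega_1 \times \omega_2 = *(\omega_1 \wedge \omega_2)$$

and is given [in view of (I.1)] by

$$\langle P, Q, R\rangle \times \langle L, M, N\rangle = \langle QN - RM,\; RL - PN,\; PM - QL\rangle.$$

Also we introduce the operator

$$\operatorname{curl}\omega = *d\omega.$$

Thus, if $\omega = \langle P, Q, R\rangle$, we have

$$\operatorname{curl}\omega = \left\langle \frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z},\; \frac{\partial P}{\partial z} - \frac{\partial R}{\partial x},\; \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right\rangle.$$

Consider an oriented surface in $E^3$; i.e., let $\varphi\colon S \to E^3$. Let $\Omega$ be the volume form on $S$ associated with the Riemann metric induced by $\varphi$. By definition, if $\xi_1, \xi_2 \in T_x(S)$, then

$$\Omega(\xi_1, \xi_2) = dV(\varphi_*\xi_1, \varphi_*\xi_2, n),$$

where $dV$ is the volume element of $E^3$ and $n$ is the unit normal vector. Another way of writing this is to say that

$$\Omega = \varphi^*\sigma,$$

where $\sigma = *n$ when we regard $n$ as a differential form. Now let $\omega$ be a form in $E^3$, and suppose that $\varphi^*\omega = f\Omega$ for some function $f$. Then $f(x) = (\omega, *n)(\varphi(x))$. Thus

$$\int_S \varphi^*(\omega) = \int_S f\,\Omega = \int_S (\omega, *n)\,\Omega = \int_S (*\omega, n)\,\Omega.$$

Applying this to $d\omega$, where $\omega = P\,dx + Q\,dy + R\,dz$, we can rewrite Stokes' theorem as

$$\int_C \omega = \int_C P\,dx + Q\,dy + R\,dz = \int_S (\operatorname{curl}\omega, n)\,\Omega,$$

where $S$ is some surface spanning the closed curve $C$. If we apply the remark to the case $*\omega$ and $S = \partial D$, we obtain, since $** = \mathrm{id}$ (for $n = 3$),

$$\int_{\partial D} (\omega, n)\,\Omega = \int_D d{*}\omega.$$

Note that

$$d{*}\omega = \left(\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}\right) dx \wedge dy \wedge dz,$$

which we write as $\operatorname{div}\omega$; that is, $\operatorname{div}\omega = d{*}\omega$.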
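The identity $\omega_1 \times \omega_2 = *(\omega_1 \wedge \omega_2)$ can be checked by direct computation. The sketch below is not from the text; it assumes that a one-form is stored as a triple of constant coefficients of $(dx, dy, dz)$ and a two-form as a triple of coefficients of $(dx \wedge dy,\ dx \wedge dz,\ dy \wedge dz)$. The function names are illustrative.

```python
def wedge(a, b):
    """Wedge of one-forms a = <P, Q, R> and b = <L, M, N> in E^3.

    Returns the coefficients of (dx^dy, dx^dz, dy^dz).
    """
    P, Q, R = a
    L, M, N = b
    return (P * M - Q * L, P * N - R * L, Q * N - R * M)


def star(two_form):
    """Star operator of (I.1): dx^dy -> dz, dx^dz -> -dy, dy^dz -> dx."""
    xy, xz, yz = two_form
    return (yz, -xz, xy)


def vector_product(a, b):
    """The vector product defined as *(a ^ b)."""
    return star(wedge(a, b))


def classical(a, b):
    """The component formula <QN - RM, RL - PN, PM - QL> quoted in the text."""
    P, Q, R = a
    L, M, N = b
    return (Q * N - R * M, R * L - P * N, P * M - Q * L)


a, b = (2, -3, 5), (7, 11, -13)
print(vector_product(a, b) == classical(a, b))
```

The agreement holds for every choice of coefficients, since both functions expand to the same three polynomials.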
APP. II. DIFFERENTIAL GEOMETRY OF SURFACES IN E^ 459 (It is in fact div {co, dV}, where dV is the volume element and we regard co as a vector field.) Thus we get the divergence theorem again. Note that and since d^ = 0. curl (grad f) = ^d df — 0 div (curl co) = G? ** G^co = c^^co = Appendix II. ELEMENTARY DIFFERENTIAL GEOMETRY OF SURFACES IN E^ For purposes of computation, it is convenient to introduce the notion of a vector-valued differential form. Let £' be a vector space, and let M be a differen- tiable manifold. By an E'-valued exterior differential form Q of degree p we shall mean a rule which assigns an element Qx to each x G M, where Q^ is an antisymmetric E'-valued multilinear function of degree p on Tx(M). For instance, if p = 0, then an E'-valued zero-form is just a function on M with values in E, An E'-valued one-form is a rule which assigns an element of E to each tangent vector ? at any point of M, and so on. Suppose that E is finite-dimensional and that {^i, . . . , e^} is a basis for E. Let 12i, . . . , 12iv be (real-valued) p-forms. We can then consider the E'-valued p-form Q = l^i^i + • • • + ^N^N, where, for any p vectors ?i, . . . , ?p in Tx{M)j we have ^x(?b ' ' ' , iv) = ^1x(?1j • • • j iv)^l + • • • + ^Nx{il, . . . , iv)^N' Conversely, if 12 is an E'-valued form, then real-valued forms 12i, . . . , 12iv can be defined by the above equation. In short, once a basis for an A^-dimensional vector space E has been chosen, giving an E'-valued differential form 12 is the same as giving N real-valued forms, and we can write N 12 = ^ ^iCi or 12 = <12i, . . . , 12iv>. 1 The rules for local description of ^E^-valued forms, as well as the transition laws, are similar to those of real-valued forms, so we won't describe them in detail. For the sake of simplicity, we shall restrict our attention to the case where E is finite-dimensional, although for the most part this assumption is unnecessary. 
If $\omega$ is a real-valued differential form of degree $p$, and if $\Omega$ is an $E$-valued form of degree $q$, then we can define the form $\omega \wedge \Omega$ in the obvious way. In terms of a basis, if $\Omega = \langle \Omega_1, \dots, \Omega_N\rangle$, then $\omega \wedge \Omega = \langle \omega \wedge \Omega_1, \dots, \omega \wedge \Omega_N\rangle$.

More generally, let $E$ and $F$ be (finite-dimensional) vector spaces, and let $\#$ be a bilinear map of $E \times F \to G$, where $G$ is a third vector space. Let $\{e_1, \dots, e_N\}$
be a basis for $E$, let $\{f_1, \dots, f_M\}$ be a basis for $F$, and let $\{g_1, \dots, g_L\}$ be a basis for $G$. Suppose that the map $\#$ is given by

$$\#(e_i, f_j) = \sum_k a^k_{ij}\,g_k.$$

Then if $\omega = \sum \omega_i e_i$ is an $E$-valued form and $\Omega = \sum \Omega_j f_j$ is an $F$-valued form, we define the $G$-valued form $\omega \wedge \Omega$ by

$$\omega \wedge \Omega = \sum_k \Bigl(\sum_{i,j} a^k_{ij}\,\omega_i \wedge \Omega_j\Bigr) g_k.$$

It is easy to check that this does not depend on the particular bases chosen.

We shall want to apply this notion primarily in two contexts. First of all, we will be interested in the case where $E = F$ and $G = \mathbb{R}$, so that $\#$ is a bilinear form on $E$. Suppose $\#$ is a scalar product and $e_1, \dots, e_N$ is an orthonormal basis. Then we shall write $(\omega \wedge \Omega)$ to remind us of the scalar product. If $\omega = \langle \omega_1, \dots, \omega_N\rangle$ and $\Omega = \langle \Omega_1, \dots, \Omega_N\rangle$, then

$$(\omega \wedge \Omega) = \sum_i \omega_i \wedge \Omega_i.$$

Note that in this case if $\omega$ is a $p$-form and $\Omega$ is a $q$-form, then

$$(\omega \wedge \Omega) = (-1)^{pq}(\Omega \wedge \omega),$$

as in the case of real-valued forms.

The second case we shall be interested in is where $F = G$ and $E = \operatorname{Hom}(F)$, and $\#$ is just the evaluation map evaluating a linear transformation on a vector of $F$ to give another element of $F$. This time, choosing a basis for $F$ determines a basis for $\operatorname{Hom}(F)$, so we can regard $\omega$ as a matrix of real-valued differential forms. If $\omega = (\omega_{ij})$ and $\Omega = \langle \Omega_1, \dots, \Omega_M\rangle$, then

$$\omega \wedge \Omega = \Bigl\langle \sum_j \omega_{1j} \wedge \Omega_j, \dots, \sum_j \omega_{Mj} \wedge \Omega_j \Bigr\rangle.$$

The operator $d$ makes sense for vector-valued forms just as it did for real-valued forms, and it satisfies the same rules. Thus, if $\Omega = \langle \Omega_1, \dots, \Omega_N\rangle$, then $d\Omega = \langle d\Omega_1, \dots, d\Omega_N\rangle$ and

$$d(\omega \wedge \Omega) = d\omega \wedge \Omega + (-1)^p\,\omega \wedge d\Omega$$

if $\omega$ is an $E$-valued form of degree $p$ and $\Omega$ is an $F$-valued form.

We shall apply the notion of vector-valued forms to develop (mostly in exercise form) some elementary facts about the geometry of oriented surfaces in $E^3$. Let $M$ be an oriented two-dimensional manifold, and let $\varphi$ be a differentiable map of $M$ into $E^3$. We shall assume that $\varphi_*$ is not singular at any point of $M$, i.e., that $\varphi$ is an immersion.
Thus at each point $p \in M$ the space $\varphi_*(T_p(M))$ is a two-dimensional subspace of $T_{\varphi(p)}(E^3)$. Since we can identify $T_{\varphi(p)}(E^3)$ with $E^3$, we can regard $\varphi_*(T_p(M))$ as a two-dimensional subspace of $E^3$. (See Fig. 11.15.) Since $M$ is oriented, so is the tangent plane $\varphi_*(T_p(M))$. Therefore, there is a unique unit vector orthogonal to the tangent plane which,
together with an oriented basis of the tangent plane, gives an oriented basis of $E^3$. This vector is called the normal vector and will be denoted by $n(p)$. We can consider $n$ an $E^3$-valued function on $M$. Since $\|n\| = 1$, we can regard $n$ as a mapping from $M$ to the unit sphere. Note that $\varphi(M)$ lies in a fixed plane of $E^3$ if and only if $n = \text{const}$ ($n$ = the normal vector to the plane). We therefore can expect the variation of $n$ to be useful in describing how the surface $\varphi(M)$ is "bending".

Let $\Omega$ be the (oriented) area form on $M$ corresponding to the Riemann metric induced by $\varphi$. Let $\Omega_S$ be the (oriented) area form on the unit sphere. Then $n^*(\Omega_S)$ is a two-form on $M$, and therefore we can write

$$n^*\Omega_S = K\Omega.$$

The function $K$ is called the Gaussian curvature of the surface $\varphi(M)$. Note that $K = 0$ if $\varphi(M)$ lies in a plane. Also, $K = 0$ if $\varphi(M)$ is a cylinder (see the exercises).

For any oriented two-dimensional manifold with a Riemann metric we let $\mathcal{F}$ denote the set of all oriented orthonormal bases in all tangent spaces of $M$. Thus an element of $\mathcal{F}$ is given by $\langle f_1, f_2\rangle$, where $\langle f_1, f_2\rangle$ is an orthonormal basis of $T_x(M)$ for some $x \in M$. Note that $f_2$ is determined by $f_1$, because of the orientation and the fact that $f_2 \perp f_1$. Thus we can consider $\mathcal{F}$ the space of all tangent vectors of unit length. For each $x \in M$ the set of all unit vectors is just a circle. We leave it to the reader to verify that $\mathcal{F}$ is, in fact, a three-dimensional manifold. We denote by $\pi$ the map that assigns to each $\langle f_1, f_2\rangle$ the point $x$ when $\langle f_1, f_2\rangle$ is an orthonormal basis at $x$. Again, the reader should verify that $\pi$ is a differentiable map of $\mathcal{F}$ onto $M$.

In the case at hand, where the metric comes from an immersion $\varphi$, we define several vector-valued functions $X$, $e_1$, $e_2$, and $e_3$ on $\mathcal{F}$ as follows:

$$X = \varphi \circ \pi, \qquad e_1(\langle f_1, f_2\rangle) = \varphi_* f_1, \qquad e_2(\langle f_1, f_2\rangle) = \varphi_* f_2, \qquad e_3 = n \circ \pi.$$
(In the middle two equations we regard $\varphi_* f_i$ as elements of $E^3$ via the identification of $T_{\varphi(x)}E^3$ with $E^3$.) Thus at any point $z$ of $\mathcal{F}$, the vectors $e_1(z), e_2(z), e_3(z)$ form an orthonormal basis of $E^3$, where $e_1(z)$ and $e_2(z)$ are tangent to the surface at $\varphi(\pi(z)) = X(z)$ and $e_3(z)$ is orthogonal to this surface. We can therefore write

$$(dX \wedge e_3) = (dX, e_3) = 0 \quad\text{and}\quad (e_i, e_j) = \delta_{ij}.$$

By the first equation we can write

$$dX = \omega_1 e_1 + \omega_2 e_2, \qquad (\text{II}.1)$$

where $\omega_1 = (dX, e_1)$ and $\omega_2 = (dX, e_2)$ are (real-valued) linear differential forms defined on $\mathcal{F}$. Similarly, let us define the forms $\omega_{ij}$ by setting

$$\omega_{ij} = (de_i, e_j).$$

Applying $d$ to the equation $(e_i, e_j) = \delta_{ij}$ shows that

$$\omega_{ij} = -\omega_{ji}. \qquad (\text{II}.2)$$

If we apply $d$ to (II.1), we get

$$0 = d\,dX = d\omega_1\,e_1 - \omega_1 \wedge de_1 + d\omega_2\,e_2 - \omega_2 \wedge de_2.$$

Taking the scalar product of this equation with $e_1$ and $e_2$, respectively, shows (since $\omega_{11} = 0$ and $\omega_{22} = 0$) that

$$d\omega_1 = \omega_2 \wedge \omega_{21} \quad\text{and}\quad d\omega_2 = \omega_1 \wedge \omega_{12}. \qquad (\text{II}.3)$$

If we apply $d$ to the equation

$$de_i = \sum_j \omega_{ij} e_j,$$

we get

$$0 = \sum_j (d\omega_{ij}\,e_j - \omega_{ij} \wedge de_j),$$

and if we take the scalar product with $e_k$, we get

$$d\omega_{ik} = \sum_j \omega_{ij} \wedge \omega_{jk}. \qquad (\text{II}.4)$$

If we apply $d$ to the equation $(dX, e_3) = 0$, we get

$$0 = d(dX, e_3) = -(dX \wedge de_3) = -(\omega_1 e_1 + \omega_2 e_2 \wedge \omega_{31} e_1 + \omega_{32} e_2),$$

which implies that

$$\omega_1 \wedge \omega_{31} + \omega_2 \wedge \omega_{32} = 0. \qquad (\text{II}.5)$$

We will now interpret these equations. Let $z = \langle f_1, f_2\rangle$ be a point of $\mathcal{F}$. For any $\xi \in T_z(\mathcal{F})$ we have
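$$\langle \xi, \omega_1\rangle = (\langle \xi, dX\rangle, e_1) = (\varphi_*\pi_*\xi, \varphi_* f_1).$$

Therefore,

$$\langle \xi, \omega_1\rangle = (\pi_*\xi, f_1), \qquad (\text{II}.6)$$

since the metric was defined to make $\varphi_*$ an isometry. In other words, $\langle \xi, \omega_1\rangle$ and $\langle \xi, \omega_2\rangle$ are the components of $\pi_*\xi$ with respect to the basis $\langle f_1, f_2\rangle$. If $\eta$ is another tangent vector at $z$, then $\omega_1 \wedge \omega_2(\xi, \eta)$ is the (oriented) area of the parallelogram spanned by $\pi_*\xi$ and $\pi_*\eta$. In other words,

$$\omega_1 \wedge \omega_2 = \pi^*\Omega, \qquad (\text{II}.7)$$

where $\Omega$ is the oriented area form on $M$. Similarly,

$$\langle \xi, de_3\rangle = n_*\pi_*\xi, \qquad (\text{II}.8)$$

and we have

$$n_*\pi_*\xi = \langle \xi, \omega_{31}\rangle e_1 + \langle \xi, \omega_{32}\rangle e_2 = \langle \xi, \omega_{31}\rangle \varphi_* f_1 + \langle \xi, \omega_{32}\rangle \varphi_* f_2. \qquad (\text{II}.9)$$

Since we can regard $e_1$ and $e_2$ as an orthonormal basis of the tangent space to the unit sphere, we conclude that $\omega_{31} \wedge \omega_{32}(\xi, \eta)$ is the oriented area on the unit sphere of the parallelogram spanned by $n_*\pi_*\xi$ and $n_*\pi_*\eta$. Thus

$$\omega_{31} \wedge \omega_{32} = \pi^* n^*\Omega_S = K\,\pi^*\Omega = K\,\omega_1 \wedge \omega_2.$$

Let

$$\begin{pmatrix} a & b \\ b' & c \end{pmatrix}$$

be the matrix of the linear transformation $n_*\colon T_x(M) \to T_{n(x)}(S^2)$ in terms of the basis $\langle f_1, f_2\rangle$ of $T_x(M)$ and $\langle e_1, e_2\rangle$ of $T_{n(x)}(S^2)$. Then comparing (II.6) with (II.9) shows that

$$\omega_{31} = a\omega_1 + b\omega_2 \quad\text{and}\quad \omega_{32} = b'\omega_1 + c\omega_2. \qquad (\text{II}.10)$$

If we substitute this into (II.5), we conclude that $b = b'$, i.e., that the matrix of $n_*$ is symmetric. This suggests that it corresponds to a symmetric bilinear form of some geometrical significance. In other words, we want to consider the quadratic form $a\omega_1^2 + 2b\omega_1\omega_2 + c\omega_2^2$ [where it is understood that this is the quadratic form on $T_z(\mathcal{F})$ which assigns the number $a\langle \xi, \omega_1\rangle^2 + 2b\langle \xi, \omega_1\rangle\langle \xi, \omega_2\rangle + c\langle \xi, \omega_2\rangle^2$ to any $\xi \in T_z(\mathcal{F})$].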
EXERCISES

II.1 Show that

II.2 The quadratic form which assigns to each $\xi \in T_x(M)$ the number $(n_*\xi, \varphi_*\xi)$ is called the second fundamental form of the surface. We shall denote it by $\mathrm{II}(\xi)$. (What is usually called the first fundamental form is just $\|\xi\|^2$ in our terminology.) Let $C$ be any smooth curve with $C'(0) = \xi$. Show that

$$\mathrm{II}(\xi) = -\left(\frac{d^2(\varphi \circ C)}{dt^2}(0),\; n(x)\right).$$

Thus $\mathrm{II}(\xi)$ measures how much the curve $\varphi \circ C$ is bending in the $n$-direction. Suppose we choose $C$ to be such that $\varphi \circ C$ lies in the plane spanned by $\varphi_*\xi$ and $n(x)$. [Geometrically, this amounts to considering the curve obtained on the surface by intersecting the surface with the plane spanned by $\varphi_*\xi$ and $n(x)$.] Show that $\mathrm{II}(\xi)$ is the curvature of this plane curve. In this sense, the second fundamental form $\mathrm{II}(\xi)$ tells us how much the surface is bending in the direction of $\xi$. Note that

Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of the matrix

$$\begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$

Thus $\lambda_1 = \max \mathrm{II}(\xi)$ and $\lambda_2 = \min \mathrm{II}(\xi)$ for $\|\xi\| = 1$. If $\lambda_1 \ne \lambda_2$, there are two orthogonal eigenvectors which are called the directions of principal curvature of the surface. (Note that they must be orthogonal, since they are eigenvectors of a symmetric matrix.)

If $A$ is a Euclidean motion of $E^3$, then $\psi = A \circ \varphi$ is another immersion of $M$ and it is easy to check that both the Riemann metric induced by $\psi$ and the second fundamental form associated with $\psi$ coincide with those attached to $\varphi$. What is not so obvious is the converse: If $\psi$ and $\varphi$ induce the same metric and the same second fundamental form, then $\psi = A \circ \varphi$ for some Euclidean motion $A$. We will not prove this fact, although it is a fairly easy consequence of what we have already established.

We have seen the meaning of $\omega_1$, $\omega_2$, $\omega_{31}$, and $\omega_{32}$ in geometric terms. Let us now interpret the one remaining form, $\omega_{12}$. Let $\gamma$ be a differentiable curve on $M$. A differentiable family of unit vectors $f_1(\cdot)$ along $\gamma$ [where $f_1(s) \in T_{\gamma(s)}(M)$] is the same as a curve $C$ in $\mathcal{F}$ with $\pi \circ C = \gamma$.
[Here $C(s) = \langle f_1(s), f_2(s)\rangle$.] Let us call the family $f_1(s)$ parallel if the unit vectors are all changing normally to the surface in three-space. In other words, $f_1(s)$ is parallel if the vector $\dfrac{d\,\varphi_* f_1(s)}{ds}$
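is normal to $\varphi(M)$ for all $s$. Let us see how to express this condition. Let $\xi_s$ be the tangent vector to the curve $C$ at $C(s)$. Then, by the definition of $de_1$,

$$\frac{d\,\varphi_*(f_1(s))}{ds} = \langle \xi_s, de_1\rangle.$$

Note that

$$(\langle \xi_s, de_1\rangle, e_1(C(s))) = 0 \quad\text{and}\quad (\langle \xi_s, de_1\rangle, e_2(C(s))) = \langle \xi_s, \omega_{12}\rangle.$$

Now $e_1$ and $e_2$ span the tangent space to $\varphi(M)$, so saying that $f_1(\cdot)$ is parallel is the same as saying that $\langle \xi_s, \omega_{12}\rangle = 0$. Thus $f_1(s)$ is parallel along $\gamma$ if and only if $\langle \xi_s, \omega_{12}\rangle = 0$.

Let $M$ and $\bar M$ be two-dimensional manifolds with Riemann metrics. Let $u\colon M \to \bar M$ be a differentiable map which is an isometry. Let $\mathcal{F}$ be the manifold of orthonormal bases of $M$, and let $\bar{\mathcal{F}}$ be the manifold of orthonormal bases of $\bar M$. Then $u$ induces a map $\bar u$ of $\mathcal{F} \to \bar{\mathcal{F}}$ by

$$\bar u(\langle f_1, f_2\rangle) = \langle u_* f_1, u_* f_2\rangle.$$

Let $\bar\omega_1$ be the differential form on $\bar{\mathcal{F}}$ given, as in (II.6), by

$$\langle \xi, \bar\omega_1\rangle = (\bar\pi_*\xi, \bar f_1) \quad\text{for } \xi \in T_z(\bar{\mathcal{F}}),$$

where $z = \langle \bar f_1, \bar f_2\rangle$, with the corresponding definition for $\bar\omega_2$, $\omega_1$, and $\omega_2$. Then for any $\xi \in T_z(\mathcal{F})$ we have, since $\bar\pi \circ \bar u = u \circ \pi$,

$$\langle \xi, \bar u^*\bar\omega_1\rangle = \langle \bar u_*\xi, \bar\omega_1\rangle = (\bar\pi_*\bar u_*\xi, u_* f_1) = (u_*\pi_*\xi, u_* f_1) = (\pi_*\xi, f_1) = \langle \xi, \omega_1\rangle.$$

In other words,

$$\bar u^*\bar\omega_1 = \omega_1 \quad\text{and}\quad \bar u^*\bar\omega_2 = \omega_2.$$

Now suppose that the metrics on $M$ and $\bar M$ come from immersions $\varphi$ and $\bar\varphi$. Then we get forms $\omega_{ij}$ and $\bar\omega_{ij}$. Now by (II.3) we have

$$\bar u^*(\bar\omega_2 \wedge \bar\omega_{21}) = \bar u^* d\bar\omega_1 = d(\bar u^*\bar\omega_1) = d\omega_1 = \omega_2 \wedge \omega_{21}.$$

Thus

$$\omega_2 \wedge \bar u^*\bar\omega_{21} = \omega_2 \wedge \omega_{21} \quad\text{and}\quad \omega_1 \wedge \bar u^*\bar\omega_{12} = \omega_1 \wedge \omega_{12},$$

or

$$(\bar u^*\bar\omega_{12} - \omega_{12}) \wedge \omega_1 = 0 \quad\text{and}\quad (\bar u^*\bar\omega_{12} - \omega_{12}) \wedge \omega_2 = 0.$$

Since the differential forms $\omega_1$ and $\omega_2$ are linearly independent, this can only happen if

$$\bar u^*\bar\omega_{12} = \omega_{12}.$$

In other words, if the two surfaces $\varphi(M)$ and $\bar\varphi(\bar M)$ are isometric, they have the "same" $\omega_{12}$, that is, the same notion of "parallel vector fields". Observe that a piece of a cylinder and a piece of the plane are isometric, even though they are not congruent by a Euclidean motion.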
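The remark about the cylinder and the plane can be made concrete. The sketch below is not from the text; it assumes the particular parametrizations $(u, v) \mapsto (u, v, 0)$ for the plane and $(u, v) \mapsto (\cos u, \sin u, v)$ for the unit cylinder, and compares the coefficients $E = (\varphi_u, \varphi_u)$, $F = (\varphi_u, \varphi_v)$, $G = (\varphi_v, \varphi_v)$ of the induced metrics together with the unit normals. All function names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_partials(u, v):
    """Partial derivatives of phi(u, v) = (u, v, 0)."""
    return (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)


def cylinder_partials(u, v):
    """Partial derivatives of phi(u, v) = (cos u, sin u, v)."""
    return (-math.sin(u), math.cos(u), 0.0), (0.0, 0.0, 1.0)


def metric(partials, u, v):
    """Coefficients (E, F, G) of the induced metric E du^2 + 2F du dv + G dv^2."""
    phi_u, phi_v = partials(u, v)
    return dot(phi_u, phi_u), dot(phi_u, phi_v), dot(phi_v, phi_v)


def normal(partials, u, v):
    """Unit normal phi_u x phi_v / |phi_u x phi_v|."""
    n = cross(*partials(u, v))
    length = math.sqrt(dot(n, n))
    return tuple(c / length for c in n)


for u, v in [(0.0, 0.0), (1.0, 2.0), (2.5, -1.0)]:
    print(metric(plane_partials, u, v), metric(cylinder_partials, u, v))
    print(normal(plane_partials, u, v), normal(cylinder_partials, u, v))
```

Both metrics have $E = G = 1$ and $F = 0$ at every point, so the map sending $(u, v, 0)$ to $(\cos u, \sin u, v)$ is a local isometry. The normal of the plane is constant while that of the cylinder turns with $u$, which is the sense in which the two immersions differ.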
In different terms, while the forms $\omega_{13}$ and $\omega_{23}$ depend on how the surface is immersed in $E^3$, the form $\omega_{12}$ depends only on the Riemann metric induced by the immersion. Now we have (II.4):

$$d\omega_{12} = \omega_{13} \wedge \omega_{32} = -\pi^* n^*\Omega_S = -K\,\pi^*\Omega.$$
From this we conclude that the Gaussian curvature $K$ also does not depend on the immersion, but only on the Riemann metric coming from the immersion.

Since $\omega_{12}$ does not depend on $\varphi$, we should be able to define it for an arbitrary two-dimensional manifold with a Riemann metric. Note that the preceding argument shows that $\omega_{12}$ is uniquely determined by Eq. (II.3). It therefore suffices to construct an $\omega_{12}$ on a coordinate neighborhood so as to satisfy (II.3). It will then follow from the uniqueness that any two such coincide to give a well-defined form.

Let $U$ be a coordinate neighborhood of $M$, and let $\psi\colon U \to \mathcal{F}$ be a differentiable map such that $\pi \circ \psi = \mathrm{id}$. Thus $\psi$ assigns a basis $\langle f_1, f_2\rangle$ to each $x \in U$, in a differentiable manner. (One possible way to construct $\psi$ is to apply the orthonormalization procedure to the vector fields $\langle \partial/\partial x^1, \partial/\partial x^2\rangle$.) Once we have chosen $\psi$, any basis of $T_x$ differs from $\psi(x)$ by a rotation. If we let $\tau$ denote the (angular) coordinate giving this rotation (so that $\tau$ is only defined mod $2\pi$), then we can use the local coordinates on $U$ together with $\tau$ as coordinates on $\pi^{-1}(U)$. More precisely, if $x^1$ and $x^2$ are local coordinates on $U$, we define $y^1, y^2, \tau$ by

$$y^1 = x^1 \circ \pi, \qquad y^2 = x^2 \circ \pi,$$

and $\tau(z)$ is given for $z = \langle e_1, e_2\rangle$ by

$$e_1 = \cos\tau(z)\,f_1 + \sin\tau(z)\,f_2, \qquad e_2 = -\sin\tau(z)\,f_1 + \cos\tau(z)\,f_2, \qquad (\text{II}.11)$$

where $\langle f_1, f_2\rangle = \psi(x)$ when $\langle e_1, e_2\rangle \in T_x(M)$.

Now let $\theta_1 = \psi^*(\omega_1)$ and $\theta_2 = \psi^*(\omega_2)$, so that $\theta_1$ and $\theta_2$ are forms defined on $U$ and are, in fact, the dual basis for $\psi(x)$ at each $x \in U$. If we set $\alpha_1 = \pi^*\theta_1$ and $\alpha_2 = \pi^*\theta_2$, then (II.11) gives

$$\omega_1 = \cos\tau\,\alpha_1 + \sin\tau\,\alpha_2 \quad\text{and}\quad \omega_2 = -\sin\tau\,\alpha_1 + \cos\tau\,\alpha_2.$$

Note that $\omega_1 \wedge \omega_2 = \alpha_1 \wedge \alpha_2$. Define the functions $l_1$ and $l_2$ on $U$ by

$$d\theta_1 = l_1\,\theta_1 \wedge \theta_2 \quad\text{and}\quad d\theta_2 = l_2\,\theta_1 \wedge \theta_2.$$

Let $k_1 = l_1 \circ \pi$ and $k_2 = l_2 \circ \pi$, so that

$$d\alpha_1 = k_1\,\alpha_1 \wedge \alpha_2 \quad\text{and}\quad d\alpha_2 = k_2\,\alpha_1 \wedge \alpha_2.$$
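Now

$$d\omega_1 = -\sin\tau\,d\tau \wedge \alpha_1 + \cos\tau\,d\tau \wedge \alpha_2 + (k_1\cos\tau + k_2\sin\tau)\,\alpha_1 \wedge \alpha_2,$$
$$d\omega_2 = -\cos\tau\,d\tau \wedge \alpha_1 - \sin\tau\,d\tau \wedge \alpha_2 + (k_2\cos\tau - k_1\sin\tau)\,\alpha_1 \wedge \alpha_2.$$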
Since $\omega_1 \wedge \omega_2 = \alpha_1 \wedge \alpha_2$, we can rewrite these equations as

$$d\omega_1 = \bigl(d\tau + (k_1\cos\tau + k_2\sin\tau)\,\omega_1\bigr) \wedge \omega_2,$$
$$d\omega_2 = -\bigl(d\tau + (k_2\cos\tau - k_1\sin\tau)\,\omega_2\bigr) \wedge \omega_1.$$

We thus see that the form

$$\omega_{12} = d\tau + (k_1\cos\tau + k_2\sin\tau)\,\omega_1 + (-k_1\sin\tau + k_2\cos\tau)\,\omega_2 = d\tau + k_1\alpha_1 + k_2\alpha_2$$

satisfies the desired equations.

As before, on any two-dimensional Riemann manifold we will call a family of unit vectors parallel along a curve $\gamma$ if $\langle \xi_s, \omega_{12}\rangle = 0$. With this definition of parallel translation we can state the following:

Theorem. Let $\gamma$ be any differentiable curve on $M$. Given the unit vector $g_1 \in T_{\gamma(0)}(M)$, there is a unique parallel family of unit vectors $g_1(s)$ along $\gamma$, with $g_1(0) = g_1$. If $g_1'$ is another unit vector of $T_{\gamma(0)}(M)$ differing from $g_1$ by an angle $\alpha$, then $g_1'(s)$ differs from $g_1(s)$ by the same angle $\alpha$ for all $s$.

Proof. It is clearly sufficient (by breaking $\gamma$ up into small pieces if necessary) to prove the theorem for curves $\gamma$ lying entirely in a coordinate chart. Then we can use the local expression for $\omega_{12}$. Let us rewrite the condition for parallel translation along $\gamma(s)$. In terms of local coordinates, the unit vector $g_1(s)$ is given by a function $\tau(s)$, where

$$g_1(s) = \cos\tau(s)\,f_1(\gamma(s)) + \sin\tau(s)\,f_2(\gamma(s)).$$

Then

$$\langle \xi_s, \omega_{12}\rangle = \langle \xi_s, d\tau\rangle + \langle \xi_s, k_1\alpha_1 + k_2\alpha_2\rangle = \frac{d\tau(s)}{ds} + \langle \xi_s, \pi^*(l_1\theta_1 + l_2\theta_2)\rangle = \frac{d\tau(s)}{ds} + \langle \pi_*\xi_s, l_1\theta_1 + l_2\theta_2\rangle.$$

But $\pi_*\xi_s = \gamma'(s)$ is the tangent vector to $\gamma$ at $\gamma(s)$. Thus

$$\langle \xi_s, \omega_{12}\rangle = \frac{d\tau}{ds} + F_\gamma(s),$$

where $F_\gamma(s) = \langle \gamma'(s), l_1\theta_1 + l_2\theta_2\rangle$ is a function depending only on $s$. In particular, $g_1(s)$ is parallel if and only if

$$\frac{d\tau}{ds} = -F_\gamma(s).$$

From this we see that given $g_1(0)$ there is a unique parallel family $g_1(s)$, starting with $g_1(0)$. Furthermore, if $g_1'(0)$ is a second unit vector at $\gamma(0)$, the angle
between $g_1(s)$ and $g_1'(s)$ is equal to the angle between $g_1(0)$ and $g_1'(0)$. Thus parallel translation preserves angles, which proves the theorem. $\square$

Note that if $M$ is (locally isometric to) Euclidean space, then we can choose

$$f_1 = \frac{\partial}{\partial x^1} \quad\text{and}\quad f_2 = \frac{\partial}{\partial x^2},$$

so that $\theta_1 = dx^1$ and $\theta_2 = dx^2$. In this case, $k_1 = k_2 = 0$ and $\tau$ is just the angle that $g_1$ makes with $\partial/\partial x^1$, that is, with the $x^1$-axis. Thus $\omega_{12} = d\tau$ in this case. Then the condition for parallel translation becomes $d\tau/ds = 0$, which coincides with the usual notion of parallelism in Euclidean geometry.

Note that in Euclidean space the parallelism does not depend on the curve $\gamma$. This is not true in general.

Exercise II.3. Let $\gamma_1$ and $\gamma_2$ be two arcs of great circles joining the north and south poles on $S^2$. Suppose that $\gamma_1$ and $\gamma_2$ are orthogonal at the poles. Let $\xi$ be a tangent vector at the north pole. Compare its translates to the south pole via $\gamma_1$ and $\gamma_2$.

Let $M$ be any two-dimensional Riemann manifold. For any curve $\gamma$ on $M$ there is an obvious way of choosing unit vectors along $\gamma$: just let $g_1(s)$ be the unit tangent vector to $\gamma$ at $\gamma(s)$. Thus for every curve $\gamma$ on $M$ we get a curve, which we shall call $\bar\gamma$, on $\mathcal{F}$. [Here $\bar\gamma = (\gamma(s), g_1(s), g_2(s))$ and $g_1(s)$ is the tangent to $\gamma(s)$.] We call the form $\bar\gamma^*(\omega_{12})$ the geodesic curvature form of $\gamma$. [In the Euclidean case this is just the ordinary curvature (see the exercises).]

Let us consider those curves whose geodesic curvatures vanish, i.e., those curves whose tangent vectors are parallel. We shall call such a curve a geodesic with respect to the given Riemann metric. Note that the condition that a curve be geodesic is given, in local coordinates, by a second-order differential equation. Therefore, a geodesic $C(\cdot)$ is uniquely specified by giving $C(t)$ and $C'(t)$ at any fixed value of $t$. In Chapter 13 we use the term "geodesic" to mean a curve which locally minimizes length.
It is the purpose of the next few exercises to show that geodesics in our present sense have this property.

EXERCISES

II.4 Let $x, y$ be local coordinates on $U \subset M$. Through each point of the curve $y = 0$ (that is, the $x$-axis in the local coordinates), construct the unique geodesic orthogonal to this curve. (See Fig. 11.16.) Let $s$ be the arc-length parameter along the geodesic, so that the geodesic passing through $(u, 0)$ is given by $(y(u, s), x(u, s))$. Show that the map $(u, s) \mapsto (y(u, s), x(u, s))$ has nonzero Jacobian at $(0, 0)$ and therefore defines a coordinate system in some open subset $U' \subset U$.
II.5 We are going to make a further change of coordinates. Let $Y$ be the vector field on $U'$ defined by the properties

$$\|Y\| = 1, \qquad \left(Y, \frac{\partial}{\partial s}\right) = 0, \qquad \langle Y, du\rangle > 0.$$

Thus $Y$ is orthogonal to the geodesics $u = \text{const}$ and points in the increasing $u$-direction. Let us consider the solution curves of this vector field parametrized by the initial position along the geodesic $u = 0$. That is, let $v$ be the arc-length parameter along the geodesic $u = 0$, and consider the map $(u, v) \mapsto (u, s(u, v))$, where $s(u, v)$ is the $s$-coordinate of the intersection of the solution curve of $Y$ passing through $(0, v)$ with the geodesic given by $u$. (See Fig. 11.17.) Again the existence theorem and smooth dependence on parameters, together with the fact that the curves $u = 0$ and $s = 0$ are already orthogonal, guarantee that we can find some neighborhood $W$ so that $(u, v)$ are coordinates on $W$. We have thus constructed coordinates such that the curves $u = \text{const}$ are geodesics and the curves $u = \text{const}$ and $v = \text{const}$ are orthogonal. Such a system of coordinates is called a geodesic parallel coordinate system.

II.6 Let $(u, v)$ be a coordinate system on $U \subset M$ for which $(\partial/\partial u, \partial/\partial v) = 0$, so that the metric takes the form

$$ds^2 = p^2\,du^2 + q^2\,dv^2.$$

Define the choice of frame $\psi$ by normalizing $\partial/\partial u$, $\partial/\partial v$ so that $\psi(x) = \langle f_1, f_2\rangle$, where $f_1 = (\partial/\partial u)/\|\partial/\partial u\|$ and $f_2 = (\partial/\partial v)/\|\partial/\partial v\|$. Show that the forms $\theta_1$ and $\theta_2$ are given by

$$\theta_1 = p\,du, \qquad \theta_2 = q\,dv,$$

and

$$\omega_{12} = d\tau + \left(-\frac{1}{q}\frac{\partial p}{\partial v}\,du + \frac{1}{p}\frac{\partial q}{\partial u}\,dv\right)$$

and

$$K = -\frac{1}{pq}\left[\frac{\partial}{\partial u}\left(\frac{1}{p}\frac{\partial q}{\partial u}\right) + \frac{\partial}{\partial v}\left(\frac{1}{q}\frac{\partial p}{\partial v}\right)\right].$$
II.7 Let $(u, v)$ be a geodesic parallel coordinate system, as in Exercise II.5. The curve $C_u$ given by $C_u(v) = (u, v)$ is a geodesic. Thus $\langle C_u'(v), \omega_{12}\rangle = 0$. But in terms of our local coordinates, $\langle C_u'(v), d\tau\rangle = 0$, since $C_u'(v)$ is always parallel to one of the base vectors $f_2$, and

$$\left\langle C_u'(v), \frac{1}{q}\frac{\partial p}{\partial v}\,du\right\rangle = \frac{1}{q}\frac{\partial p}{\partial v}\,\langle C_u'(v), du\rangle = 0,$$

since $u = \text{const}$ along $C_u$. Thus we conclude that $\partial q/\partial u = 0$, or $q = q(v)$. Let us replace the parameter $v$ by $w = \int_0^v q(t)\,dt$. Then $(u, w)$ is a geodesic parallel coordinate system for which we have

$$ds^2 = p^2\,du^2 + dw^2,$$

and now the arc length along any curve $u = \text{const}$ is $\int dw$.

II.8 Show that for $|w|$ sufficiently small, any curve joining $(0, 0)$ to $(0, w)$ must have arc length at least $|w|$. Conclude that (since the choice of our original curve $y = 0$ was arbitrary) the geodesics locally minimize length.

II.9 Let $\langle w, z\rangle$ be local coordinates on an open set $U$ of a Riemann manifold with the property that the curves $C_z$ given by $C_z(w) = (w, z)$ are geodesics parametrized according to arc length. Thus $z = \text{const}$ is a geodesic and $\|\partial/\partial w\| = 1$. Let

$$a = \left(\frac{\partial}{\partial w}, \frac{\partial}{\partial z}\right).$$

Show that $\partial a/\partial w = 0$. [Hint: Show that by orthonormalizing $\langle \partial/\partial w, \partial/\partial z\rangle$, we obtain a map $\psi$ whose associated forms $\theta_1$ and $\theta_2$ are given by

$$\theta_1 = dw + a\,dz, \qquad \theta_2 = b\,dz,$$

where $b = \|\partial/\partial z - a(\partial/\partial w)\|$. Then compute $l_1$ and $l_2$ and use the fact that $z = 0$ is a geodesic.]

II.10 Construct geodesic polar coordinates. That is, for a fixed $p \in M$ let $\theta$ be an angular coordinate on the unit circle in $T_p(M)$. For each $\theta$ let $C_\theta(\cdot)$ be the unique geodesic, parametrized by arc length, such that $C_\theta(0) = p$ and $C_\theta'(0)$ corresponds to angle $\theta$ in $T_p(M)$. Show that the map $(r, \theta) \mapsto C_\theta(r)$ gives "coordinates" on $U - \{p\}$, where $U$ is some neighborhood of $p$. By taking $\langle r, \theta\rangle$ to be the $\langle w, z\rangle$ of Exercise II.9 and passing to the limit $r = 0$, conclude that

$$\left(\frac{\partial}{\partial r}, \frac{\partial}{\partial\theta}\right) = 0.$$

Thus in terms of the "coordinates" $\langle r, \theta\rangle$ on $U - \{p\}$, the Riemann metric takes the form $ds^2 = dr^2 + A\,d\theta^2$.
These coordinates are the analogue, for a general two-dimensional Riemann manifold, of the polar coordinates introduced on the plane and on the sphere at the end of Chapter 9. The argument given there applies generally to give another proof of the fact that geodesics locally minimize arc length.

We now continue to study the consequences of the equation

$$d\omega_{12} = -K\,\pi^*\Omega.$$

Let $D$ be a domain with regular boundary on $M$, and let $\psi$ be a map of some neighborhood of $D \to \mathcal{F}$ satisfying $\pi \circ \psi = \text{identity}$. Then, by Stokes'
theorem,

$$\int_{\partial D} \psi^*(\omega_{12}) = -\int_D \psi^*\pi^* K\Omega = -\int_D K\Omega, \qquad (\text{II}.12)$$

where $\Omega$ is the area form on $M$. Let us apply this to the map $\psi$ constructed as follows: Let $Y$ be a vector field on a compact manifold $M$ which vanishes at only a finite number of points $p_1, \dots, p_r$. At any $x \in M$ where $Y_x \ne 0$, put $f_1(x) = Y_x/\|Y_x\|$, so $\psi(x) = \langle x, f_1(x), f_2(x)\rangle$. About each point $p_i$ choose a chart $(U_i, h_i)$ such that $U_i$ contains no other point $p_j$ and $h_i(p_i) = (0, 0)$ in the plane. Let $\gamma_{i,r}$ be $h_i^{-1}(C_r)$, where $C_r$ is the circle of radius $r$ in the plane. [Here $r$ is chosen so small that $C_r$ lies in $h_i(U_i)$.] Now let $D = M - \bigcup_i (\text{interiors of } \gamma_{i,r})$. We must compute $\int_{\gamma_{i,r}} \psi^*(\omega_{12})$, where $\gamma_{i,r}$ is oriented clockwise rather than counterclockwise. For this purpose, we introduce about $p_i$ the orthonormal frames coming from the coordinates $x^1, x^2$. Thus we have coordinates $x^1, x^2, \tau$, where $\tau(x, f_1, f_2)$ measures the angle that $f_1(x)$ makes with $\partial/\partial x^1|_x$. Thus (taking $\gamma_{i,r}$ clockwise)

$$-\int_{\gamma_{i,r}} \psi^* d\tau = 2\pi\,(\text{index of } Y \text{ at } p_i).$$

But $\psi^*\omega_{12} = \psi^* d\tau + \psi^*(k_1\alpha_1 + k_2\alpha_2)$, so

$$-\int_{\gamma_{i,r}} \psi^*\omega_{12} = 2\pi\,(\text{index of } Y \text{ at } p_i) - \int_{\gamma_{i,r}} \psi^*(k_1\alpha_1 + k_2\alpha_2).$$

Now as $r \to 0$, the second term on the right vanishes and $\int_D \to \int_M$ on the right of (II.12). Thus we have proved the following:

Let $Y$ be a vector field which vanishes at a finite number of points $p_1, \dots, p_r$. Then

$$\sum_i \operatorname{index}_{p_i}(Y) = \frac{1}{2\pi}\int_M K\,dA. \qquad (\text{II}.13)$$

In particular, the sum of the indices of a vector field is independent of the vector field, and the total integral of the curvature is independent of the Riemann metric. Thus, for instance, if $M$ is $S^2$, the vector field tangent to the meridian circles has zeros only at the north and south poles, and the index at each of these points is $+1$. Thus (II.13) says that the sum of indices of any vector field on $S^2$ must equal 2. (In particular, it is impossible to construct a vector field on $S^2$ which does not vanish anywhere.)
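The index appearing in (II.13) is the number of turns made by the field as its base point goes once counterclockwise around the zero. A rough numerical sketch, not from the text, follows. It assumes a planar vector field given as a Python function of $(x, y)$ with an isolated zero at the origin, and it sums the increments of the angle of the field around a small circle; the name `index_at_origin` and the sample fields are illustrative.

```python
import math


def index_at_origin(field, radius=0.5, steps=2000):
    """Winding number of `field` around a counterclockwise circle about (0, 0).

    `field(x, y)` returns a pair (vx, vy) that is nonzero on the circle.
    Each angle increment is reduced to the interval (-pi, pi] before summing.
    """
    total = 0.0
    previous = None
    for k in range(steps + 1):
        t = 2.0 * math.pi * k / steps
        vx, vy = field(radius * math.cos(t), radius * math.sin(t))
        angle = math.atan2(vy, vx)
        if previous is not None:
            delta = angle - previous
            while delta > math.pi:
                delta -= 2.0 * math.pi
            while delta <= -math.pi:
                delta += 2.0 * math.pi
            total += delta
        previous = angle
    return round(total / (2.0 * math.pi))


print(index_at_origin(lambda x, y: (x, y)))                    # radial field
print(index_at_origin(lambda x, y: (-y, x)))                   # rotational field
print(index_at_origin(lambda x, y: (x, -y)))                   # saddle
print(index_at_origin(lambda x, y: (x * x - y * y, 2 * x * y)))
```

In a chart centered at either pole of $S^2$, the field tangent to the meridians looks like the radial field or its negative, which is consistent with the index $+1$ quoted above.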
Furthermore, (II.13) says that no matter what Riemann metric we put on $S^2$, we have

$$\int_{S^2} K\,dA = 4\pi.$$

Similarly, if $M$ is the torus, we can put on it a vector field which does not vanish anywhere, so the integer in (II.13) is 0.
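The statement about $S^2$ can be tested numerically on a family of metrics, namely those induced on spheroids. The sketch below is not from the text. It assumes the parametrization $(a\sin t\cos s,\ a\sin t\sin s,\ -c\cos t)$ and the classical formula for the curvature of a surface of revolution, which for this profile gives $K\,dA = a c^2 \sin t\,(a^2\cos^2 t + c^2\sin^2 t)^{-3/2}\,dt\,ds$; neither is derived in this appendix. The integral over $t$ is approximated by the midpoint rule.

```python
import math


def total_curvature_spheroid(a, c, steps=20000):
    """Midpoint-rule value of the integral of K dA over the spheroid
    (a sin t cos s, a sin t sin s, -c cos t), 0 <= t <= pi, 0 <= s < 2 pi.

    Assumed integrand: K dA = a c^2 sin t / (a^2 cos^2 t + c^2 sin^2 t)^(3/2) dt ds.
    The integrand does not depend on s, so the s-integral contributes 2 pi.
    """
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        denom = (a * a * math.cos(t) ** 2 + c * c * math.sin(t) ** 2) ** 1.5
        total += a * c * c * math.sin(t) / denom * h
    return 2.0 * math.pi * total


for a, c in [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0)]:
    print(a, c, total_curvature_spheroid(a, c), 4.0 * math.pi)
```

For each choice of semi-axes the computed value agrees with $4\pi$ up to the quadrature error, although the curvature itself is far from constant when $a \ne c$.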
EXERCISES

Let $M$ denote the torus with angular coordinates $\varphi_1, \varphi_2$ (defined up to $2\pi$). The map $F$ will denote the immersion of $M \to$ Euclidean three-space given by

$$x_1 = (2 + \cos\varphi_1)\cos\varphi_2, \qquad x_2 = (2 + \cos\varphi_1)\sin\varphi_2, \qquad x_3 = \sin\varphi_1.$$

II.11 What is the Riemann metric on $M$ induced by $F$? That is, compute $(\partial/\partial\varphi_1, \partial/\partial\varphi_1)_p$, $(\partial/\partial\varphi_1, \partial/\partial\varphi_2)_p$ and $(\partial/\partial\varphi_2, \partial/\partial\varphi_2)_p$ for each $p \in M$. (These can be given as functions of $\varphi_1, \varphi_2$.) Let $f_1, f_2$ denote the vector fields obtained by orthonormalizing $\partial/\partial\varphi_1, \partial/\partial\varphi_2$. What is the explicit expression for $f_1, f_2$?

II.12 What is the total area of the torus relative to this Riemann metric?

II.13 In terms of the vector fields $f_1, f_2$ given in Exercise II.11 we can introduce (angular) coordinates $\varphi_1, \varphi_2, \tau$ on $\mathcal{F}$. In terms of these coordinates, what are the explicit expressions for the vector-valued functions $X, e_1, e_2, e_3$ on $\mathcal{F}$? (Each of these vector-valued functions should be given in terms of the fixed standard basis of Euclidean three-space, i.e., a triplet of functions of $\varphi_1, \varphi_2, \tau$.) Thus

$$X(\varphi_1, \varphi_2, \tau) = \langle (2 + \cos\varphi_1)\cos\varphi_2,\; (2 + \cos\varphi_1)\sin\varphi_2,\; \sin\varphi_1\rangle.$$

What are the corresponding expressions for $e_1$, $e_2$, and $e_3$?

II.14 What are the explicit expressions for the forms $\omega_1$, $\omega_2$, and $\omega_{12}$ on $\mathcal{F}$ given in terms of $d\varphi_1$, $d\varphi_2$, and $d\tau$?

II.15 What is the Gaussian curvature $K$ of $M$? (Again, $K$ can be given as a function of $\varphi_1, \varphi_2$.) For which points of $M$ is the Gaussian curvature positive, and for which points is it negative?

II.16 On any Riemann manifold there is an obvious way of identifying a linear differential form with a vector field [via the identification of $T_x^*(M)$ with $T_x(M)$ given by the scalar product in $T_x(M)$]. If $g$ is a function, the vector field associated to the form $dg$ is denoted by $\operatorname{grad} g$. Let $M$ be the torus with Riemann metric as above, and let $x_1$ be the function given by $x_1(\varphi_1, \varphi_2) = (2 + \cos\varphi_1)\cos\varphi_2$, as before.
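Show that the vector field $X = \operatorname{grad} x_1$ vanishes at exactly four points, and compute its index at each of these points.

Let $x^1, x^2$ be a local system of coordinates on $U \subset M$, and let $\psi: U \to \mathcal{F}(M)$ be given by orthonormalizing $\langle \partial/\partial x^1, \partial/\partial x^2 \rangle$. Thus coordinates on $\pi^{-1}(U)$ are $x^1 \circ \pi$, $x^2 \circ \pi$, $\tau$, where $\tau(z)$ is the angle that $f_1$ makes with $\partial/\partial x^1$ if $z = \langle f_1, f_2 \rangle$. Then

$$\omega_{12} = d\tau + k_1\alpha_1 + k_2\alpha_2 = d\tau + \pi^*\psi^*\omega_{12} \quad \text{on } \pi^{-1}(U).$$

Let $C$ be a curve in $U$ with $C'(t) \neq 0$, and let $\tilde{C}$ be the curve in $\pi^{-1}(U)$ assigning to each $t$ the unit tangent vector $C'(t)/\|C'(t)\|$. Then

$$\int_{\tilde{C}} \omega_{12} = \int \tilde{C}^*\omega_{12} = \int_C \psi^*\omega_{12} + \int_{\tilde{C}} d\tau,$$

where $\tau$ is the angle that $C'(t)$ makes with the $x$-axis.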
From this formula we can deduce a number of interesting consequences which we state in exercise form.

11.17 Show that the sum of the exterior angles of a geodesic triangle $D$ is given by $2\pi - \int_D K\Omega$. (A geodesic triangle is a domain whose boundary consists of three geodesics.)

11.18 Suppose that $D$ is a domain which in the local coordinates is given by a simple polygon. Thus $\partial D = C_1 \cup \cdots \cup C_k$, where each $C_i$ is a curve which is a straight line segment in the local coordinate system. Let $\alpha_1, \ldots, \alpha_k$ be the interior angles. (See Fig. 11.18.) Show that kTr+ Z^ = 27r — 2 L ^12-

11.19 This gives us another way of computing the integer in (11.13) if $M$ is a compact surface.

Definition. A cellulation of a compact differentiable two-dimensional manifold $M$ is a finite collection of closed subsets (called the cells) $F_1, \ldots, F_m$ such that $M = \bigcup_i F_i$ and such that the following hold:

1) For each $F_i$ there is a one-to-one bidifferentiable map $f_i$ of a neighborhood of $F_i$ carrying $F_i$ onto a polygon with $m$ edges ($m \geq 3$).

2) For $i \neq j$, either $F_i \cap F_j$ is empty or $f_i(F_i \cap F_j)$ is a fixed edge or vertex of the corresponding polygon.

Let $f$ be the number of faces, $e$ the number of edges, and $v$ the number of vertices of the cellulation of a two-dimensional Riemann manifold. Show that

$$f - e + v = \frac{1}{2\pi}\int_M K\Omega.$$

Thus $f - e + v$ does not depend on the cellulation. We have thus given three distinct ways of computing an integer attached to the manifold. This integer is called the Euler characteristic and is denoted by $\chi(M)$. Thus

$$\chi(M) = \frac{1}{2\pi}\int_M K\Omega = f - e + v = \sum_{x_i} \operatorname{index} Y.$$

11.20 By a regular cellulation we mean a cellulation such that each face has the same number of edges and each vertex lies on the same number of edges. Show that there are at most five possibilities for the number of faces in a regular cellulation of the sphere. Conclude that there are at most five "regular solids".
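As a way of checking an answer to the last exercise, the counting relations can be enumerated by machine. The sketch below is only an illustration under stated assumptions: every face has $p \geq 3$ edges, every vertex lies on $q \geq 3$ edges (so $pf = 2e$ and $qv = 2e$), and $f - e + v = 2$ for the sphere. The function name and the search bound are hypothetical choices, not part of the text.

```python
from fractions import Fraction


def regular_cellulations(max_pq=20):
    """List (p, q, f, e, v) with p*f = 2e, q*v = 2e and f - e + v = 2.

    Assumes p >= 3 edges per face and q >= 3 edges per vertex.
    Eliminating f and v gives e = 2pq / (2p + 2q - pq), which must be
    a positive integer.
    """
    found = []
    for p in range(3, max_pq + 1):
        for q in range(3, max_pq + 1):
            denom = 2 * p + 2 * q - p * q
            if denom <= 0:
                continue  # no positive solution for e
            e = Fraction(2 * p * q, denom)
            f = 2 * e / p
            v = 2 * e / q
            if e.denominator == f.denominator == v.denominator == 1:
                found.append((p, q, int(f), int(e), int(v)))
    return found


if __name__ == "__main__":
    for row in regular_cellulations():
        print(row)
```

Each row satisfies $f - e + v = 2$ by construction; the enumeration says nothing about whether a given combinatorial type is actually realized by a solid, which is the remaining part of the exercise.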
CHAPTER 12

POTENTIAL THEORY IN $E^n$

1. SOLID ANGLE

In what follows $x^1, \ldots, x^n$ will denote Euclidean coordinates on $E^n$. Let $r^2 = (x^1)^2 + \cdots + (x^n)^2$; then $dr^2 = 2r\,dr$, so that we have

$$r\,dr = \sum x^i\,dx^i,$$
$$*r\,dr = \sum (-1)^{i-1} x^i\,dx^1 \wedge \cdots \wedge \widehat{dx^i} \wedge \cdots \wedge dx^n,$$
$$d*r\,dr = n\,dx^1 \wedge \cdots \wedge dx^n.$$

Let $i$ denote the injection of $S^{n-1} \to E^n$ as the unit sphere. Let $V_n$ denote the volume of the unit ball and $A_{n-1}$ the volume of the unit $(n-1)$-sphere. The volume of the sphere of radius $r$ is thus $r^{n-1}A_{n-1}$, and so

$$V_n = \int_0^1 r^{n-1}A_{n-1}\,dr = \frac{1}{n}A_{n-1}.$$

Since $i^*(*r\,dr)$ is an $(n-1)$-form invariant under rotations, it is some multiple of the volume form on $S^{n-1}$. By Stokes' theorem,

$$\int_{S^{n-1}} i^*(*r\,dr) = \int_{B^n} d*r\,dr = n\int_{B^n} dx^1 \wedge \cdots \wedge dx^n.$$

Comparing this with the above, we conclude that $i^*(*r\,dr)$ is the volume form on the unit sphere.

Let $p$ denote the projection of $E^n - \{0\}$ onto the unit sphere. Set

$$\tau = p^*i^*(*r\,dr).$$

Then $\tau$ is called the element of solid angle. Integrated over any $(n-1)$-surface, it gives the volume (counting sign of orientation and multiplicity) of its projection on the unit $(n-1)$-sphere. We have $d\tau = 0$, since

$$d\tau = d\,p^*i^*(*r\,dr) = p^*\bigl(d\,i^*(*r\,dr)\bigr) = 0,$$

because $i^*(*r\,dr)$ is an $(n-1)$-form on the $(n-1)$-dimensional manifold $S^{n-1}$, and $d$ of any $(n-1)$-form on an $(n-1)$-dimensional manifold must vanish.
Let $i_R$ denote the injection of $S^{n-1}$ into $E^n$ as the sphere of radius $R$ (so that $i_1 = i$). It then follows directly from the definitions that

$$i_R^*\tau = \frac{dS_R}{R^{n-1}},$$

where $dS_R$ is the induced volume form. (See Fig. 12.1.)

Now the volume of the ball of radius $R$ is $R^nV_n$ and the surface volume (area) of the sphere of radius $R$ is $R^{n-1}A_{n-1} = R^{-1}(nR^nV_n)$. Now $i_R^*(*dr)$ is an $(n-1)$-form and thus $i_R^*(*dr) = f\,dS_R$ for some function $f$. Since $*dr$ and $dS_R$ are invariant under rotation, we conclude that $f$ is a constant. But

$$\int_{S_R} i_R^*(*r\,dr) = n\int_{B_R} dx^1 \wedge \cdots \wedge dx^n = nR^nV_n,$$

from which we conclude that

$$dS_R = i_R^*\left(\frac{*r\,dr}{r}\right) = i_R^*(*dr).$$

Thus

$$i_R^*\tau = i_R^*\left(\frac{*r\,dr}{r^n}\right).$$

Now let $x$ be any point in $E^n - \{0\}$, and let $\xi_1, \ldots, \xi_{n-1}$ be tangent vectors at $x$. Then

$$\left[\tau - \frac{*r\,dr}{r^n}\right](\xi_1, \ldots, \xi_{n-1})$$

will vanish if all the $\xi$'s are tangent to the sphere centered at the origin and passing through $x$, by the above equation. If one of the $\xi$'s, say $\xi_1$, is a multiple of $(\partial/\partial r)_x$, then again the expression will vanish, because

$$\tau(\xi_1, \ldots, \xi_{n-1}) = i^*(*r\,dr)(p_*\xi_1, \ldots, p_*\xi_{n-1})$$

and $p_*\xi_1 = 0$, and because $*dr(\xi_1, \ldots, \cdot) = 0$. By multilinearity, we conclude that the expression vanishes identically and therefore that

$$\tau = \frac{*dr}{r^{n-1}} = \frac{*r\,dr}{r^n} = \frac{\sum (-1)^{i-1} x^i\,dx^1 \wedge \cdots \wedge \widehat{dx^i} \wedge \cdots \wedge dx^n}{r^n},$$

where the $\widehat{\ }$ indicates that the corresponding term is omitted.
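The formula for the solid angle can be tried out numerically. The following sketch takes $n = 3$ and integrates the solid angle form over a square in the plane $x^3 = d$, where $dx^3 = 0$ and the form reduces to $d\,dx^1 \wedge dx^2 / r^3$. For the square with half-side $a = d = 1$ (one face of a cube seen from its center) the value should be one sixth of $A_2 = 4\pi$. The function name, the midpoint rule, and the grid size are illustrative assumptions.

```python
import math


def solid_angle_of_square(a=1.0, d=1.0, m=400):
    """Midpoint-rule integral of the solid angle form over the square
    |x1| <= a, |x2| <= a in the plane x3 = d (case n = 3).

    On that plane dx3 = 0, so only the dx1 ^ dx2 term survives, with
    coefficient x3 / r**3 = d / r**3.
    """
    h = 2.0 * a / m
    total = 0.0
    for i in range(m):
        x1 = -a + (i + 0.5) * h
        for j in range(m):
            x2 = -a + (j + 0.5) * h
            r = math.sqrt(x1 * x1 + x2 * x2 + d * d)
            total += d / r ** 3
    return total * h * h


if __name__ == "__main__":
    print(solid_angle_of_square(), 4.0 * math.pi / 6.0)
```

The two printed numbers should agree to several decimal places; the agreement reflects the statement that the form, integrated over a surface, gives the area of its projection on the unit sphere.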
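2. GREEN'S FORMULAS

Let $u$ and $v$ be smooth functions on $E^n$. Then

$$du = \sum \frac{\partial u}{\partial x^i}\,dx^i,$$

so

$$*du = \sum (-1)^{i-1}\frac{\partial u}{\partial x^i}\,dx^1 \wedge \cdots \wedge \widehat{dx^i} \wedge \cdots \wedge dx^n,$$

and therefore

$$d*du = \left(\sum \frac{\partial^2 u}{\partial (x^i)^2}\right) dx^1 \wedge \cdots \wedge dx^n = \Delta u\,dx^1 \wedge \cdots \wedge dx^n,$$

where the operator

$$\Delta = \sum \frac{\partial^2}{\partial (x^i)^2}$$

is called the Laplacian.

Let $U$ be an open subset with compact closure and almost regular boundary. (See Fig. 12.2.) Set

$$D_U[u, v] = \int_U du \wedge *dv = \int_U dv \wedge *du = \int_U \left(\sum \frac{\partial u}{\partial x^i}\frac{\partial v}{\partial x^i}\right) dx^1 \wedge \cdots \wedge dx^n.$$

Now $\int_{\partial U} u * dv = \int_U d(u * dv)$. But $d(u * dv) = du \wedge *dv + u \wedge d*dv$, so

$$\int_{\partial U} u * dv = D_U[u, v] + \int_U (u\,\Delta v)\,dx^1 \wedge \cdots \wedge dx^n. \tag{2.1}$$

Since $D_U$ is symmetric in $u$ and $v$, we have

$$\int_{\partial U} u * dv - v * du = \int_U (u\,\Delta v - v\,\Delta u)\,dx^1 \wedge \cdots \wedge dx^n. \tag{2.2}$$

Suppose $U$ contains the origin, and let $U_\epsilon = U - B_\epsilon$, where $B_\epsilon$ is the ball of radius $\epsilon$ centered at the origin. Let $v = r^{2-n}$. (For $n = 2$, set $v = \log r$.) Then $dv = (2 - n)r^{1-n}\,dr$ and $*dv = -(n - 2)\tau$, so that $d*dv = 0$. Substituting into (2.2) and taking $\epsilon$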
small enough so that $\partial U_\epsilon = \partial U - \partial B_\epsilon$, we get

$$-(n-2)\int_{\partial U}\left(u\tau + \frac{r^{2-n} * du}{n-2}\right) + (n-2)\int_{\partial B_\epsilon} u\tau + \epsilon^{2-n}\int_{\partial B_\epsilon} *du = -\int_{U_\epsilon} r^{2-n}\,\Delta u\,dx^1 \wedge \cdots \wedge dx^n.$$

Let us examine what happens as $\epsilon \to 0$. The first term on the left does not depend on $\epsilon$. For the second term, let us write $u = u(0) + O(\epsilon)$. Then the second term becomes $(n-2)A_{n-1}u(0) + O(\epsilon)$. For the third term on the left, we write

$$\int_{\partial B_\epsilon} *du = \int_{B_\epsilon} d*du = \int_{B_\epsilon} \Delta u\,dx^1 \wedge \cdots \wedge dx^n.$$

Thus the third term on the left vanishes to order $\epsilon^2$. Since $r^{2-n}$ is an integrable function on $E^n$, the right-hand side tends to $-\int_U r^{2-n}\,\Delta u\,dx^1 \wedge \cdots \wedge dx^n$. Therefore, if we let $\epsilon \to 0$, we get

$$A_{n-1}u(0) = \int_{\partial U}\left(u\tau + \frac{r^{2-n} * du}{n-2}\right) - \frac{1}{n-2}\int_U r^{2-n}\,\Delta u\,dx^1 \wedge \cdots \wedge dx^n. \tag{2.3}$$

In particular, if $u$ is a harmonic function, that is, $\Delta u = 0$, then

$$u(0) = \frac{1}{A_{n-1}}\int_{\partial U} u\tau + \frac{1}{A_{n-1}(n-2)}\int_{\partial U}\frac{*du}{r^{n-2}}. \tag{2.4}$$

In the special case $U = B_a$, the second term on the right-hand side of (2.4) becomes

$$\frac{1}{(n-2)A_{n-1}a^{n-2}}\int_{\partial B_a} *du = \frac{1}{(n-2)A_{n-1}a^{n-2}}\int_{B_a} d*du = 0.$$

Thus (by the definition of $A_{n-1}$ and $\tau$)

$$u(0) = \frac{1}{A_{n-1}a^{n-1}}\int_{S_a} u\,dS, \tag{2.5}$$

where $S_a$ is the sphere of radius $a$ and $dS$ is its volume element. In other words, if $u$ is a function that is harmonic in some domain, then the value of $u$ at any point is equal to its average value on any sphere centered at that point whose interior is completely contained in the domain.

3. THE MAXIMUM PRINCIPLE

Equation (2.5) has a number of startling consequences which we shall now develop. Let $u$ be a function that is harmonic in a domain $U$. Let $x_0$ be some point of $U$, and suppose that there is a neighborhood $W$ of $x_0$ such that $W \subset U$ and $u(x) \leq u(x_0)$ for all $x \in W$.
(See Fig. 12.3.) Let $S_a$ be a sphere of radius $a$ centered at $x_0$, where $a$ is so small that $B_a \subset W$. Then $u(x) \leq u(x_0)$ for all $x \in S_a$. Therefore,

$$\int_{S_a} u\,dS \leq u(x_0)\int_{S_a} dS.$$

If there were some point $x \in S_a$ with $u(x) < u(x_0)$, then $u(y)$ would be less than $u(x_0)$ for all $y$ sufficiently close to $x$, since $u$ is continuous. But then the above inequality would be a strict inequality, i.e.,

$$\int_{S_a} u\,dS < u(x_0)\int_{S_a} dS,$$

which contradicts (2.5). We must therefore have $u(x) = u(x_0)$ for all $x \in S_a$.

Now suppose that $W$ is an open set that is connected; that is, suppose that any two points of $W$ can be joined by a continuous curve lying entirely in $W$. Let $y$ be any point of $W$, and let $C$ be a curve joining $x_0$ to $y$. About each $x$ on the curve we can find a sufficiently small ball with center $x$ lying entirely in $W$. By the compactness of $C$ we can choose a finite number of these balls which cover $C$. We can therefore formulate the following: There are a finite number of spheres $S_{a_1}, \ldots, S_{a_k}$ such that each sphere and its interior lie entirely in $W$, $S_{a_1}$ has center $x_0$, the center $x_i$ of $S_{a_{i+1}}$ lies on $S_{a_i}$, and $y \in S_{a_k}$. (See Fig. 12.4.) But this implies that

$$u(x_0) = u(x_1) = \cdots = u(y).$$

In other words, we have established:

Proposition 3.1. Let $u$ be harmonic in a connected open set $W$, and suppose that $u$ achieves its maximum value at some $x_0 \in W$. Then $u$ is constant on $W$.

An immediate corollary of this result is:

Proposition 3.2. Let $U$ be a connected open set with $\bar{U}$ compact. Then if $u$ is a function that is continuous in $\bar{U}$ and harmonic in $U$,

$$u(x) < \max_{y \in \partial U} u(y) \tag{3.1}$$

unless $u$ is a constant.
Proof. In fact, since $\bar{U}$ is compact and $u$ is continuous, $u$ must achieve its maximum at some point $x_0$ of $\bar{U}$. If we could actually choose $x_0 \in U$, then $u$ would have to be a constant by Proposition 3.1. If $u$ is not a constant, then $x_0 \in \partial U$, and we have (3.1). □

From this we deduce:

Proposition 3.3. Let $U$ be a connected open set with $\bar{U}$ compact. Let $u$ and $v$ be functions that are continuous in $\bar{U}$ and harmonic in $U$. Suppose that $u(y) = v(y)$ for all $y \in \partial U$. Then $u(x) = v(x)$ for all $x \in U$.

Proof. In fact, $u - v$ satisfies the hypotheses of Proposition 3.2 and vanishes on $\partial U$. Thus $u(x) - v(x) \leq 0$ for $x \in U$. Similarly $v(x) - u(x) \leq 0$, which implies the proposition. □

An alternative way of formulating Proposition 3.3 is to say that on a domain $U$ a harmonic function is completely determined by its boundary values. This is a uniqueness theorem: there is at most one harmonic function with given boundary values. It suggests the problem of deciding whether the corresponding existence theorem is true. This problem is known as Dirichlet's problem.

Dirichlet's problem. Given a continuous function $f$ defined on $\partial U$, does there exist a function $u$ that is continuous in $\bar{U}$ and harmonic in $U$ and such that $u(y) = f(y)$ for all $y \in \partial U$?

We shall show in Section 9 that we can always solve Dirichlet's problem for domains with almost regular boundary.

4. GREEN'S FUNCTIONS

Suppose that $U$ is a domain for which the Dirichlet problem can be solved. We shall show in this section that the solution can be given "explicitly" in terms of a certain integral over the boundary. We first introduce some notation. For each $x \in E^n$ set

$$K_n(x, y) = -\frac{1}{(n-2)A_{n-1}\|x - y\|^{n-2}} \quad \text{for } y \in E^n - \{x\} \quad (n \geq 3).$$

(This is for $n \geq 3$. For $n = 2$, set $K_2(x, y) = (1/2\pi)\ln\|x - y\|$.) The sign is the one for which (for fixed $x$)

$$*dK_n(x, \cdot) = \frac{1}{A_{n-1}}\tau_x \quad \text{on } E^n - \{x\},$$

where $\tau_x$ denotes the solid angle about the point $x$. Then for $x \in U$, Eq. (2.3)
can be rewritten as

$$u(x) = \int_{\partial U}\bigl[u * dK_n(x, \cdot) - K_n(x, \cdot) * du\bigr] + \int_U K_n(x, \cdot)\,\Delta u\,dV, \tag{4.1}$$

where $K_n(x, y) = -1/\bigl((n-2)A_{n-1}\|x - y\|^{n-2}\bigr)$ and $dV = dx^1 \wedge \cdots \wedge dx^n$. Now for fixed $x \in U$ the function $K_n(x, y)$ is a differentiable function of $y$ as $y$ varies on $\partial U$. By assumption, we can therefore find a function $h(x, \cdot)$ that is harmonic in $U$ and continuous in $\bar{U}$ and such that

$$h(x, y) = -K_n(x, y) \quad \text{for all } y \in \partial U. \tag{4.2}$$

Furthermore, for fixed $y$, $h(x, y)$ is a continuous function of $x$. In fact, by the maximum principle,

$$|h(x_1, y) - h(x_2, y)| \leq \max_{z \in \partial U} |K_n(x_2, z) - K_n(x_1, z)|,$$

and $K_n(x, z)$ is clearly uniformly continuous in $x$ for all $z \in \partial U$ so long as $x$ stays a fixed distance away from $\partial U$. We have thus constructed a function $h_U$ such that

i) $h_U(x, y)$ is a continuous function of $x$ and $y$ for $x, y \in U$, and is differentiable in $y$ for $y \in U$;

ii) for each fixed $x$, $\Delta_y h_U = 0$; i.e., $\Delta h_U(x, \cdot) = 0$;

iii) $G_U(x, y) = h_U(x, y) + K_n(x, y) = 0$ for $y \in \partial U$.

The function $G_U$ is called the Green function of the domain $U$. Let us suppose for a moment that $G_U$ exists, and let us derive some of its properties. We write $G = G_U$ when there is no possibility of confusion.

We first show that for $x \in U$ and $y \in U$ we have

$$G(x, y) = G(y, x). \tag{4.3}$$

Let $B_{x,\epsilon}$ and $B_{y,\epsilon}$ be small balls about $x$ and $y$. Let $u = G(x, \cdot)$ and $v = G(y, \cdot)$ in (2.2), where $U$ is replaced by $U - B_{x,\epsilon} - B_{y,\epsilon}$. Since both functions are harmonic in the domain and vanish on $\partial U$, we obtain

$$\int_{\partial B_{x,\epsilon}} G(x, \cdot) * dG(y, \cdot) + \int_{\partial B_{y,\epsilon}} G(x, \cdot) * dG(y, \cdot) = \int_{\partial B_{x,\epsilon}} G(y, \cdot) * dG(x, \cdot) + \int_{\partial B_{y,\epsilon}} G(y, \cdot) * dG(x, \cdot).$$

We will show that the left-hand side approaches $G(x, y)$ and the right-hand side approaches $G(y, x)$ as $\epsilon \to 0$. By symmetry, it suffices to look at the left-hand side. Now

$$\int_{\partial B_{y,\epsilon}} G(x, \cdot) * dG(y, \cdot) = \int_{\partial B_{y,\epsilon}} G(x, \cdot) * dK_n(y, \cdot) + \int_{\partial B_{y,\epsilon}} G(x, \cdot) * dh(y, \cdot).$$

The first term on the right-hand side approaches $G(x, y)$ as in the proof of (2.3), since $A_{n-1} * dK$ is the solid angle about $y$. The second term tends to zero
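since $G(x, \cdot)$ and $h$ are smooth functions on $B_{y,\epsilon}$. On the other hand,

$$\int_{\partial B_{x,\epsilon}} G(x, \cdot) * dG(y, \cdot) = \int_{\partial B_{x,\epsilon}} K(x, \cdot) * dG(y, \cdot) + \int_{\partial B_{x,\epsilon}} h(x, \cdot) * dG(y, \cdot).$$

The second term tends to zero, as above. Since $K(x, \cdot)$ is constant on $\partial B_{x,\epsilon}$, the first term can be written (for $n \geq 3$) as

$$-\frac{1}{(n-2)A_{n-1}\epsilon^{n-2}}\int_{\partial B_{x,\epsilon}} *dG(y, \cdot) = -\frac{1}{(n-2)A_{n-1}\epsilon^{n-2}}\int_{B_{x,\epsilon}} d*dG(y, \cdot) = 0,$$

since $G(y, \cdot)$ is harmonic in $B_{x,\epsilon}$. A similar argument works for $n = 2$ with $\epsilon^{n-2}$ replaced by $\log \epsilon$. This proves (4.3).

Let $u$ be any smooth function on $\bar{U}$. Apply (2.2) to $u$ and $v = G(x, \cdot)$ on $U - B_{x,\epsilon}$. We get (since $G(x, \cdot) = 0$ on $\partial U$)

$$\int_{\partial U} u * dG(x, \cdot) - \int_{\partial B_{x,\epsilon}} u * dG(x, \cdot) + \int_{\partial B_{x,\epsilon}} G(x, \cdot) * du = -\int_{U - B_{x,\epsilon}} G(x, \cdot)\,\Delta u\,dx^1 \wedge \cdots \wedge dx^n.$$

The third integral on the left-hand side can be written as

$$\int_{\partial B_{x,\epsilon}} K_n(x, \cdot) * du + \int_{\partial B_{x,\epsilon}} h(x, \cdot) * du = -\frac{1}{A_{n-1}(n-2)\epsilon^{n-2}}\int_{\partial B_{x,\epsilon}} *du + \int_{\partial B_{x,\epsilon}} h(x, \cdot) * du = O(\epsilon^2) + O(\epsilon^{n-1})$$

and so tends to zero. (The usual modification works for $n = 2$.) The second integral tends to $u(x)$, as in the proof of (2.3). We get

$$u(x) = \int_U G(x, \cdot)\,\Delta u\,dx^1 \wedge \cdots \wedge dx^n + \int_{\partial U} u * dG(x, \cdot). \tag{4.4}$$

We observe that (4.4) shows that if we know that there exists a solution to $\Delta F = f$ with the boundary conditions $F = 0$, then it is given by

$$F(x) = \int_U G(x, y)f(y)\,dy.$$

Similarly, if we know that there exists a smooth solution to the problem

$$\Delta u = 0, \qquad u(x) = f(x) \quad \text{for } x \in \partial U, \tag{4.5}$$

then it is given by

$$u(x) = \int_{\partial U} u * dG(x, \cdot). \tag{4.6}$$

It is important to observe that these formulas are consequences of the existence of Green's function for $U$. Thus they are valid whenever we can find the function $h$ such that properties (ii) and (iii) hold.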
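Everything in this section rests on the kernel $K_n(x, \cdot)$ being harmonic away from $x$. A quick finite-difference experiment makes this plausible for $n = 3$, where $A_2 = 4\pi$. The sketch below assumes the normalization $K_3(x, y) = -1/(4\pi\|x - y\|)$ (the sign for which $A_{n-1} * dK$ is the solid angle; harmonicity does not depend on the sign). The helper names and the step size are hypothetical choices.

```python
import math


def K3(x, y):
    """Assumed kernel for n = 3: K_3(x, y) = -1 / (4*pi*|x - y|)."""
    return -1.0 / (4.0 * math.pi * math.dist(x, y))


def laplacian(func, y, h=1e-3):
    """Central-difference approximation to the Laplacian of func at y."""
    total = -2.0 * len(y) * func(y)
    for i in range(len(y)):
        for step in (h, -h):
            shifted = list(y)
            shifted[i] += step
            total += func(shifted)
    return total / (h * h)


if __name__ == "__main__":
    pole = (0.0, 0.0, 0.0)
    point = (0.3, -0.2, 0.5)
    # Close to zero: K_3(pole, .) is harmonic away from the pole.
    print(laplacian(lambda y: K3(pole, y), point))
    # Control: the Laplacian of |y|^2 in three dimensions is 6.
    print(laplacian(lambda y: sum(c * c for c in y), point))
```

The first number is zero only up to discretization and rounding error, so this is a sanity check rather than a proof; the control value shows that the difference scheme does detect a nonzero Laplacian.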
5. THE POISSON INTEGRAL FORMULA

In this section we shall explicitly construct the Green function for a ball of radius $R$. Let $B_R$ be the ball of radius $R$ centered at the origin. For any $x \in E^n - \{0\}$ let $x'$ be the image of $x$ under inversion with respect to the sphere of radius $R$:

$$x' = \frac{R^2}{\|x\|^2}\,x.$$

Define the function $G_R$ by

$$A_{n-1}(n-2)\,G_R(x, y) = \begin{cases} -\dfrac{1}{\|y - x\|^{n-2}} + \dfrac{R^{n-2}}{\|x\|^{n-2}\|y - x'\|^{n-2}} & \text{if } x \neq 0, \\[2ex] -\dfrac{1}{\|y\|^{n-2}} + \dfrac{1}{R^{n-2}} & \text{if } x = 0. \end{cases} \tag{5.1}$$

(If $n = 2$, set $2\pi G_R(x, y) = \log\|y - x\| - \log\bigl(\|x\|\,\|y - x'\|/R\bigr)$ if $x \neq 0$, and $2\pi G_R(x, y) = \log\|y\| - \log R$ if $x = 0$.)

If $x \in B_R$, then $x' \notin B_R$, and so the second terms on the right-hand side of (5.1) are continuous and harmonic on $B_R$. We must merely check that property (iii) holds. Now for $\|y\| = R$ we have, by similar triangles (or direct computation),

$$\frac{R}{\|x\|} = \frac{\|y - x'\|}{\|y - x\|}, \tag{5.2}$$

so that $G_R(x, y) = 0$ for $\|y\| = R$. (See Fig. 12.5.) This is (iii), so we have verified that $G_R$ is the Green function for the ball of radius $R$.

To apply (4.4) we must compute $*dG_R$ on the sphere of radius $R$. Writing $*dy^i$ for $(-1)^{i-1}\,dy^1 \wedge \cdots \wedge \widehat{dy^i} \wedge \cdots \wedge dy^n$, by (5.1) we have (for $x \neq 0$)

$$A_{n-1} * dG_R(x, \cdot) = \frac{\sum (y^i - x^i) * dy^i}{\|y - x\|^n} - \frac{R^{n-2}}{\|x\|^{n-2}}\,\frac{\sum (y^i - x'^i) * dy^i}{\|y - x'\|^n}.$$

But, by (5.2),

$$\frac{R^{n-2}}{\|x\|^{n-2}\|y - x'\|^n} = \frac{\|x\|^2}{R^2\|y - x\|^n} \quad \text{if } \|y\| = R.$$
We thus see that for $\|y\| = R$,

$$i_R^*(A_{n-1} * dG_R) = i_R^*\left[\frac{1}{\|y - x\|^n}\left(\sum (y^i - x^i) * dy^i - \frac{\|x\|^2}{R^2}\sum (y^i - x'^i) * dy^i\right)\right] = i_R^*\left[\frac{R^2 - \|x\|^2}{R^2\|y - x\|^n}\sum y^i * dy^i\right] = i_R^*\left[\frac{R^2 - \|x\|^2}{R^2\|y - x\|^n} * r\,dr\right],$$

where the terms in $x^i$ cancel because $(\|x\|^2/R^2)\,x' = x$. But $i_R^*(*r\,dr) = R\,dS_R$, where $dS_R$ is the volume element on the sphere of radius $R$. If we substitute into (4.6), we obtain

$$u(x) = \frac{R^2 - \|x\|^2}{RA_{n-1}}\int_{S_R}\frac{u(y)}{\|y - x\|^n}\,dS_R \quad \text{(the Poisson integral formula).} \tag{5.3}$$

In the proof of (5.3) we used the assumption that the function $u$ is differentiable in some neighborhood of the ball $B_R$ and is harmonic for $\|x\| < R$. Actually, all that we need to assume is that $u$ is differentiable and harmonic for $\|x\| < R$ and continuous on the closed ball $\|x\| \leq R$. In fact, for any $\|x\| < R$, Eq. (5.3) will be valid with $R$ replaced by $R_a$, where $\|x\| < R_a < R$. If we then let $R_a$ approach $R$, we recover (5.3) by virtue of the assumed continuity of $u$.

Equation (5.3) gives the solution of Dirichlet's problem for a ball provided we know that the solution exists. That is, if $u$ is any function that is harmonic on the open ball and continuous on the closed ball, it satisfies (5.3). Now let us show that (5.3) is actually a solution of Dirichlet's problem for prescribed boundary values. Thus suppose we are given a continuous function $u$ defined on the sphere $S_R$. Then we are given $u(y)$ for all $y \in S_R$. Define $u(x)$ for $\|x\| < R$ by (5.3). We must show that

a) $u$ is harmonic for $\|x\| < R$, and

b) $u(x) \to u(y_0)$ if $x \to y_0$ and $\|y_0\| = R$.

To prove (a) we observe that $G_R(x, y)$ is a differentiable function of $x$ and $y$ in the range $\|x\| < R_1 < R$, $R_1 < \|y\| < R^2/R_1$, and is, by construction, a harmonic function of $y$. For $\|x\| < R$ and $\|y\| < R$, we know, by (4.3), that $G_R(x, y) = G_R(y, x)$.
Thus for fixed $y$ with $\|y\| < R$, $G_R(\cdot, y)$ is a harmonic function on $B_R - \{y\}$. Letting $\|y\| \to R$, we see that $G_R(x, y)$ is a harmonic function of $x$ for $\|x\| < R_1 < R$ for each fixed $y \in S_R$. Thus $\partial G_R(x, y)/\partial y^i$ is a harmonic function of $x$ for each $y \in S_R$. In other words, all the coefficients of $*dG_R(\cdot, y)$ are harmonic functions of $x$ for each $y \in S_R$, and therefore so is each coefficient of $u(y) * dG_R(\cdot, y)$. It follows that the function

$$u(x) = \int_{S_R} u * dG_R(x, \cdot)$$

is a harmonic function of $x$, since the integral converges uniformly (as do the integrals of the various derivatives with respect to $x$) for $\|x\| < R_1 < R$. This proves (a).

To prove (b) we first remark that the constant one is a harmonic function everywhere, so Eq. (5.3) applies to it. We thus have

$$\frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{S_R}\frac{dS_R}{\|y - x\|^n} = 1 \quad \text{for any } \|x\| < R. \tag{5.4}$$

Now let $y_0$ be some point of $S_R$, and let $u$ be a continuous function on $S_R$. For any $\epsilon > 0$ we can find a $\delta > 0$ such that

$$|u(y) - u(y_0)| < \epsilon \quad \text{for } \|y - y_0\| < 2\delta \quad (y \in S_R).$$

Let $Z_1 = \{y \in S_R : \|y - y_0\| \geq 2\delta\}$ and $Z_2 = \{y \in S_R : \|y - y_0\| < 2\delta\}$. Then by (5.3) and (5.4) we have, for $\|x\| < R$,

$$u(x) - u(y_0) = \frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{S_R}\frac{u(y) - u(y_0)}{\|y - x\|^n}\,dS_R,$$

so

$$|u(x) - u(y_0)| \leq I_1 + I_2,$$

where

$$I_1 = \frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{Z_1}\frac{|u(y) - u(y_0)|}{\|y - x\|^n}\,dS_R$$

and

$$I_2 = \frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{Z_2}\frac{|u(y) - u(y_0)|}{\|y - x\|^n}\,dS_R.$$

Now if $\|y_0 - x\| < \delta$, then for all $y \in Z_1$ we have $\|y - x\| \geq \|y - y_0\| - \|x - y_0\|$, so that $\|y - x\| > \delta$. Thus for all $x$ such that $\|x - y_0\| < \delta$ the integral occurring in $I_1$ is uniformly bounded. Since $\|x\| \to R$ as $x \to y_0$, we conclude that $I_1 \to 0$ as $x \to y_0$. With respect to $I_2$, we know that $|u(y) - u(y_0)| < \epsilon$ for all $y \in Z_2$, so that

$$I_2 \leq \frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{Z_2}\frac{\epsilon\,dS_R}{\|y - x\|^n} \leq \epsilon\left(\frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{S_R}\frac{dS_R}{\|y - x\|^n}\right) = \epsilon$$

by (5.4). This proves (b).
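The Poisson integral formula is easy to try out in the plane. The sketch below specializes (5.3) to $n = 2$, where $A_1 = 2\pi$ and $dS_R = R\,dt$ on the circle $y = (R\cos t, R\sin t)$, and approximates the integral by the trapezoid rule. It then checks (5.4) and checks that the boundary values of the harmonic polynomial $(x^1)^2 - (x^2)^2 + 3x^1$ are carried back to the polynomial itself. The function name and the number of sample points are illustrative assumptions.

```python
import math


def poisson_disk(boundary, x, R=1.0, m=2000):
    """Trapezoid-rule approximation of (5.3) for n = 2 (A_1 = 2*pi).

    boundary(y1, y2) gives the prescribed values on the circle of
    radius R; x is a point with |x| < R.
    """
    x1, x2 = x
    factor = (R * R - (x1 * x1 + x2 * x2)) / (2.0 * math.pi * R)
    total = 0.0
    for j in range(m):
        t = 2.0 * math.pi * j / m
        y1, y2 = R * math.cos(t), R * math.sin(t)
        dist_n = (y1 - x1) ** 2 + (y2 - x2) ** 2  # |y - x|^n with n = 2
        total += boundary(y1, y2) / dist_n
    return factor * total * (2.0 * math.pi * R / m)  # dS_R = R dt


def harmonic_poly(y1, y2):
    """A harmonic polynomial used as test data."""
    return y1 * y1 - y2 * y2 + 3.0 * y1


if __name__ == "__main__":
    x = (0.3, 0.4)
    print(poisson_disk(lambda y1, y2: 1.0, x))       # compare with (5.4)
    print(poisson_disk(harmonic_poly, x), harmonic_poly(*x))
```

At $x = 0$ the kernel is constant on the circle, and the formula reduces to the mean value property (2.5).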
We have thus proved:

Theorem 5.1. Let $u$ be a continuous function defined on the sphere $S_R$. There is a unique continuous function defined for $\|x\| \leq R$ which coincides with the given function on the sphere $S_R$ and is harmonic for $\|x\| < R$. This function is given by (5.3) for all $\|x\| < R$.

6. CONSEQUENCES OF THE POISSON INTEGRAL FORMULA

In the proof of Theorem 5.1 we assumed that $u$ is continuous in the closed ball, possesses two continuous derivatives in the open ball, and is harmonic in the open ball. However, it is clear from (5.3) that $u$ possesses derivatives of all orders when $\|x\| < R$. In fact, if $\|x\| < R_1 < R$, we can differentiate (5.3) under the integral sign any number of times, since all the integrals we obtain will be uniformly integrable in $\|x\|$. Now if $u$ is harmonic in some open set $U$ and $x \in U$, we can choose $R$ sufficiently small so that the ball of radius $R$ centered at $x$ will be contained in $U$. (See Fig. 12.6.) We can then apply (5.3). We thus conclude:

Proposition 6.1. Let $u$ be a function defined on an open set $U$, possessing two continuous derivatives, and satisfying $\Delta u = 0$. Then $u$ has continuous derivatives of all orders.

We can improve Proposition 6.1 by examining (5.3) a little more closely. Let $\|y\| = R$, and let $R_1 < R$. Then, for all $\|x\| < R_1$, the multinomial theorem allows us to expand

$$\frac{1}{\|y - x\|^n} = \sum a_{i_1 \ldots i_n}(y)\,(x^1)^{i_1} \cdots (x^n)^{i_n}, \tag{6.1}$$

where the coefficients depend on $y$ but the series converges uniformly in $x$ for all $\|y\| = R$. Furthermore, if $D$ is any operator of partial differentiation with respect to $x$, then we obtain an analogous expansion

$$D\{\|y - x\|^{-n}\} = \sum b_{i_1 \ldots i_n}(y)\,(x^1)^{i_1} \cdots (x^n)^{i_n},$$

where this series is obtained from (6.1) by term-by-term differentiation and converges uniformly in the same region. If we now substitute (6.1) into (5.3), we can integrate the series term by term because of the uniform convergence in
$y$, and we conclude that

$$u(x) = \sum c_{\alpha_1, \ldots, \alpha_n}\,(x^1)^{\alpha_1} \cdots (x^n)^{\alpha_n},$$

where the series converges uniformly for $\|x\| < R_1$. Furthermore, we can differentiate the series term by term. Doing this and evaluating at $x = 0$, we see that

$$c_\alpha = \frac{1}{\alpha!}D^\alpha u(0),$$

where $\alpha = \langle \alpha_1, \ldots, \alpha_n \rangle$. We thus have proved:

Proposition 6.2. Let $u$ be harmonic in the ball $\|x\| < R$. Then

$$u(x) = \sum_\alpha \frac{1}{\alpha!}D^\alpha u(0)\,x^\alpha, \tag{6.2}$$

where the Taylor series (6.2) converges for all $\|x\| < R$ and converges uniformly for all $\|x\| < R_1 < R$. In particular, $u$ is determined throughout the ball by its value and the value of all its derivatives at the origin.

Let $U$ be some connected open subset of $E^n$, and let $u$ and $v$ be two harmonic functions on $U$. Suppose that $u$ and $v$ have the same value and the same derivatives of all orders at some point $x \in U$. Let $y$ be some other point of $U$. We can then connect $x$ to $y$ by a series of balls lying in $U$, where the center of each ball lies in the interior of the preceding ball, and such that $x$ is the center of the first ball and $y$ lies in the last ball. (See Fig. 12.7.) We thus conclude that $u(y) = v(y)$. Thus we have:

Proposition 6.3. Let $u$ and $v$ be harmonic functions defined on an open connected set $U$. Suppose that $u$ and $v$, together with all their derivatives, coincide at some point of $U$. Then $u \equiv v$ on $U$. In particular, if $u(x) = v(x)$ for all $x \in W$, where $W$ is some open subset of $U$, then $u(x) = v(x)$ for all $x \in U$.
We continue to examine the consequences of (5.3). Let $u$ be given by (5.3). Let $D^\alpha$ denote any operator of partial differentiation with respect to $x$. Thus

$$D^\alpha = \frac{\partial^{\alpha_1 + \cdots + \alpha_n}}{\partial (x^1)^{\alpha_1} \cdots \partial (x^n)^{\alpha_n}}.$$

Then $D^\alpha u(x)$ is given by an integral over $S_R$ of a function involving, at worst, inverse powers of $\|y - x\|$ for $y$ on $S_R$ and the function $u$ on $S_R$. In particular, if $\|x\| \leq R_1 < R$, then we can estimate the maximum absolute value of $D^\alpha u$ in terms of the values of $u$ on $S_R$ and the difference $R - R_1$. In short,

$$|D^\alpha u(x)| \leq c(D^\alpha, R, R_1)\max_{\|y\| = R}|u(y)| \quad \text{for } \|x\| \leq R_1, \tag{6.3}$$

where $c$ depends only on $\alpha$, $R$, and $R_1$.

Now suppose that $u$ is harmonic in some open set $U$, and let $K_1$ and $K_2$ be compact subsets of $U$ with $K_1 \subset \operatorname{int} K_2 \subset K_2 \subset U$. About each $x \in K_1$ we can draw an open ball $B_{R_x}$ such that $\bar{B}_{R_x} \subset \operatorname{int} K_2$, so that $S_{R_x} \subset K_2$. We can also draw a ball $B_{R_{1x}}$ of slightly smaller radius about $x$. Now the open balls $B_{R_{1x}}$ cover $K_1$. Since $K_1$ is compact, we can select a finite number of these balls which cover $K_1$. By applying (6.3) to each of these balls, replacing $|u(y)|$ by the larger $\max_{z \in K_2}|u(z)|$, and using the largest of the finitely many constants $c$, we conclude that

$$\max_{x \in K_1}|D^\alpha u(x)| \leq c(\alpha, K_1, K_2)\max_{z \in K_2}|u(z)|. \tag{6.4}$$

In particular, we have:

Proposition 6.4. Let $\{u_k\}$ be a sequence of harmonic functions defined on an open set $U$ that converges uniformly on any compact subset of $U$. Then the sequence of partial derivatives $\{D^\alpha u_k\}$ also converges uniformly on any compact subset. In particular, the limit of the sequence is again a harmonic function.

In fact, for any compact subset $K_1$ we can always find a compact subset $K_2$ such that $K_1 \subset \operatorname{int} K_2$. Applying (6.4) to the harmonic functions $u_k - u_j$ establishes the uniform convergence of $\{D^\alpha u_k\}$ on $K_1$. Since the sequence of partial derivatives converges uniformly, $\Delta u = \lim \Delta u_k = 0$.

7. HARNACK'S THEOREM

We continue to reap the consequences of Eq. (5.3). In addition to what we assumed in the preceding section, let us suppose that $u(y) \geq 0$ for all $y \in S_R$.
Now for $\|x\| = R_1 < R$ we have (see Fig. 12.8)

$$R - R_1 \leq \|y - x\| \leq R + R_1 \quad \text{for all } \|y\| = R.$$

Then, by (5.3),

$$\frac{R^2 - R_1^2}{A_{n-1}R(R + R_1)^n}\int_{S_R} u\,dS_R \leq u(x) \leq \frac{R^2 - R_1^2}{A_{n-1}R(R - R_1)^n}\int_{S_R} u\,dS_R.$$

Now according to (2.5), the integrals occurring on the right and left of this inequality are equal to $A_{n-1}R^{n-1}u(0)$. Thus

$$\frac{(R^2 - R_1^2)R^{n-2}}{(R + R_1)^n}\,u(0) \leq u(x) \leq \frac{(R^2 - R_1^2)R^{n-2}}{(R - R_1)^n}\,u(0). \tag{7.1}$$

Inequality (7.1) is known as Harnack's inequality. It has the following consequence. Suppose that $\{u_k\}$ is a sequence of functions satisfying (5.3) and such that

$$u_1(y) \leq u_2(y) \leq \cdots \leq u_k(y) \leq \cdots \quad \text{for all } y \in S_R.$$

If we apply (7.1) to the functions $u_j - u_i$ ($j > i$), we conclude that

$$|u_j(x) - u_i(x)| \leq d(R, R_1)\,|u_j(0) - u_i(0)| \quad \text{for all } \|x\| \leq R_1,$$

where $d(R, R_1)$ depends only on $R$ and $R_1$. In particular, if the sequence converges at the origin, it converges uniformly for $\|x\| \leq R_1 < R$. By applying our usual device of joining two points by a sequence of balls, we deduce:

Proposition 7.1 (Harnack's theorem). Let $\{u_k\}$ be a sequence of harmonic functions defined on a connected open set $U$ and such that

$$u_1(x) \leq u_2(x) \leq \cdots \quad \text{for all } x \in U. \tag{7.2}$$

Suppose that the sequence $\{u_k(p)\}$ converges for some $p \in U$. Then the sequence of functions $\{u_k\}$ converges uniformly on any compact subset of $U$ and, by Proposition 6.4, the limit function is again harmonic.
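Harnack's inequality (7.1) can be illustrated on a concrete positive harmonic function. The sketch below takes $n = 3$, $R = 1$, $R_1 = 1/2$ and $u(x) = 1/\|x - p\|$ with $p = (2, 0, 0)$, which is harmonic and positive on a neighborhood of the closed unit ball, and compares sampled values of $u$ on the sphere $\|x\| = R_1$ with the two bounds. The choice of test function, the helper names, and the sampling grid are assumptions made only for illustration.

```python
import math


def harnack_bounds(u0, R, R1, n):
    """Lower and upper bounds of (7.1), given u(0) = u0 >= 0."""
    common = (R * R - R1 * R1) * R ** (n - 2) * u0
    return common / (R + R1) ** n, common / (R - R1) ** n


def u(x, p=(2.0, 0.0, 0.0)):
    """Test function 1/|x - p|, harmonic away from p (case n = 3)."""
    return 1.0 / math.dist(x, p)


def sample_sphere(radius, m=60):
    """Points on the sphere of the given radius, on a latitude-longitude grid."""
    points = []
    for i in range(m + 1):
        theta = math.pi * i / m
        for j in range(2 * m):
            phi = math.pi * j / m
            points.append((radius * math.sin(theta) * math.cos(phi),
                           radius * math.sin(theta) * math.sin(phi),
                           radius * math.cos(theta)))
    return points


if __name__ == "__main__":
    lower, upper = harnack_bounds(u((0.0, 0.0, 0.0)), R=1.0, R1=0.5, n=3)
    values = [u(x) for x in sample_sphere(0.5)]
    print(lower, min(values), max(values), upper)
```

For this example the sampled values lie strictly between the bounds with room to spare, which is typical: (7.1) is sharp only for the Poisson kernel itself.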
A useful variation of Harnack's theorem is:

Proposition 7.2. Let $\{u_k\}$ be a sequence of harmonic functions defined on an open set $U$ and satisfying (7.2). Suppose that there is a constant $M$ such that $u_k(x) \leq M$ for all $k$ and all $x \in U$. Then the sequence of functions $\{u_k\}$ converges uniformly on any compact subset of $U$ to a harmonic function.

To prove Proposition 7.2 we remark that this time the convergence of $\{u_k(x)\}$ at each $x \in U$ is automatic because it is a monotone (nondecreasing) sequence of bounded real numbers. Proposition 7.1 guarantees that the convergence is uniform on compact subsets to a harmonic limit function.

8. SUBHARMONIC FUNCTIONS

In this section and the next we shall show that Dirichlet's problem can be solved for bounded open sets $U$ whose boundaries satisfy certain regularity conditions. The proof that we shall present (there are many others) is due to Perron and makes essential use of the concept of a subharmonic function.

Let $u$ be a function defined on the open set $U$. We say that $u$ is subharmonic if

a) $u$ is continuous; and

b) for any connected open subset $W$ of $U$ and any harmonic function $v$ defined on $W$, the function $u - v$ satisfies the maximum principle on $W$. In other words, for any such $W$ and $v$, if there is some $x_0 \in W$ with $u(x) - v(x) \leq u(x_0) - v(x_0)$ for all $x \in W$, then $u(x) - v(x) = u(x_0) - v(x_0)$ in $W$.

In order to understand condition (b) a little better, we study some of its consequences. First of all, we can let $v$ be the harmonic function that is identically zero. Then (b) says that $u$ satisfies the maximum principle on every open subset of $U$. Next we let $B_R$ be some ball with center $z$ whose closure is contained in $U$. In particular, its boundary, $S_R$, is contained in $U$, and the function $u$ is continuous on $S_R$. We can therefore find a harmonic function $v$ defined on the open ball and taking on the values $u(y)$ for $y \in S_R$. We take $W$ to be the open ball and $v$ to be the function we have just constructed.
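Then $u - v$ vanishes on $S_R$, so (b) implies that $u(x) \leq v(x)$ for all $x \in W$. In particular, $u(z) \leq v(z)$. But, by (2.5),

$$v(z) = \frac{1}{A_{n-1}R^{n-1}}\int_{S_R} v\,dS = \frac{1}{A_{n-1}R^{n-1}}\int_{S_R} u\,dS.$$

We have thus shown that if $u$ is a subharmonic function defined on $U$ and $S_R$ is a sphere with center $z$ lying in $U$, then

$$u(z) \leq \frac{1}{A_{n-1}R^{n-1}}\int_{S_R} u\,dS.$$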
In other words, a subharmonic function is always less than or equal to its average value over a sphere about a point. This property is frequently taken as the definition of a function's being subharmonic, and the hypothesis of continuity is somewhat relaxed. However, the definition we gave above is more suitable for our present purposes.

Let $w_1$ and $w_2$ be two subharmonic functions defined on an open set $U$. Define the function $w_1 \vee w_2$ by setting

$$(w_1 \vee w_2)(x) = \max[w_1(x), w_2(x)].$$

The function $w_1 \vee w_2$ is again subharmonic. The fact that $w_1 \vee w_2$ is continuous is established as follows: Let $x$ be any point of $U$. Then $w_1 \vee w_2(x)$ is either $w_1(x)$ or $w_2(x)$, say $w_1(x)$. Since $w_1$ is continuous, for any $\epsilon > 0$, we can find a $\delta > 0$ such that $|w_1(y) - w_1(x)| < \epsilon$ for $\|y - x\| < \delta$. Similarly, we can arrange to have $w_2(y) < w_2(x) + \epsilon$ for that same range of $y$, so that $w_2(y) < w_1(x) + \epsilon$, since $w_1(x) = \max[w_1(x), w_2(x)]$. For these values of $y$ we thus have

$$w_1(x) - \epsilon < w_1(y) \leq w_1 \vee w_2(y) < w_1(x) + \epsilon,$$

so that $w_1 \vee w_2$ is continuous. Now let $W$ be a connected open subset of $U$ and let $v$ be a harmonic function on $W$. Suppose that $w_1 \vee w_2 - v$ takes on its maximum value at some point $x_0$ of $W$. Suppose that $w_1 \vee w_2(x_0) = w_1(x_0)$. Then for all $x \in W$

$$w_1(x) - v(x) \leq w_1 \vee w_2(x) - v(x) \leq w_1(x_0) - v(x_0).$$

Since $w_1$ is subharmonic, the right and left sides of this inequality must be equal. Thus $w_1 \vee w_2 - v$ satisfies the maximum principle on $W$, which shows that $w_1 \vee w_2$ is subharmonic.

Let $w$ be a subharmonic function defined on an open set $U$, and let $B$ be a ball whose closure is contained in $U$. Let $u$ be the solution of Dirichlet's problem for the ball $B$ with boundary values $w$. Thus $u$ is the unique function continuous in the closed ball, harmonic in the open ball, and coinciding with $w$ on the boundary $S$ of $B$. As we have already observed, $w(x) \leq u(x)$ for $x \in B$. Now define the function $w_B$ by setting

$$w_B(x) = \begin{cases} w(x) & \text{for } x \in U - B, \\ u(x) & \text{for } x \in B. \end{cases}$$
We claim that the function $w_B$ is again subharmonic. The proof is just as before. The continuity is obvious, as before. If $W$ is a connected open subset of $U$ and $v$ is a harmonic function on $W$, then $w_B - v$ cannot achieve a strict maximum at any interior point: suppose $w_B(x) - v(x) \leq w_B(x_0) - v(x_0)$ for all $x \in W$. (See Fig. 12.9.) If $x_0 \in U - B$, then we have

$$w(x) - v(x) \leq w_B(x) - v(x) \leq w_B(x_0) - v(x_0) = w(x_0) - v(x_0).$$

Since $w$ is subharmonic, all these inequalities must be equalities. On the other hand, if $x_0 \in B$, then since $w_B$ is harmonic in the open ball, we must have $w_B(x) - v(x) = w_B(x_0) - v(x_0)$ for all $x \in W \cap B$. By continuity, this implies
that $w_B(y) - v(y) = w_B(x_0) - v(x_0)$ for $y \in W \cap S$. If $W \cap S \neq \emptyset$, we can take $y$ as a new $x_0$ and the problem is reduced to the previous case.

In short, to every subharmonic function $w$ defined on $U$ we have attached another subharmonic function $w_B$ such that $w_B$ is actually harmonic in the interior of $B$ and coincides with $w$ in $U - B$, and such that $w \leq w_B$ throughout $U$. It is clear from the method of construction that if $w_1$ and $w_2$ are two subharmonic functions defined on $U$ with $w_1 \leq w_2$, then $w_{1B} \leq w_{2B}$.

9. DIRICHLET'S PROBLEM

Let $U$ be an open subset of $E^n$ with $\bar{U}$ compact. We say that $U$ has a touchable boundary if for every $p \in \partial U$ there is a ball $B$ such that $\bar{B} \cap \bar{U} = \{p\}$. Thus Fig. 12.10(a) represents a domain with a touchable boundary, while Fig. 12.10(b) and 12.10(c) represent domains that have untouchable points on their boundaries.

Let $U$ be an open subset of $E^n$ with $\bar{U}$ compact and with touchable boundary. Let $f$ be a bounded function defined on $\partial U$, and suppose that $|f(p)| \leq M$ for all $p \in \partial U$. Let $W_f$ denote the class of all functions defined on $U$ which satisfy the following two conditions:

a) each $w \in W_f$ is subharmonic; and

b) for each $p \in \partial U$,

$$\limsup_{x \to p,\ x \in U} w(x) \leq f(p).$$
We can rephrase condition (b) as follows: For each $p \in \partial U$ and each $\epsilon > 0$ there is a $\delta > 0$ (which depends on $p$, $w$, and $\epsilon$) such that
$$w(x) < f(p) + \epsilon \quad \text{for } \|x - p\| < \delta.$$
Note that the family of functions $W_f$ is nonempty, since the constant function which is identically $-M$ clearly belongs to $W_f$. Also note that condition (b) implies that $\limsup w(x) \le M$ as $x \to \partial U$. Since $w \in W_f$ is subharmonic, we conclude from the maximum principle that $w \le M$ for all $w \in W_f$. Now define the function $u_f$ by setting
$$u_f(x) = \operatorname*{lub}_{w \in W_f} w(x).$$
In view of the preceding remarks, the function $u_f$ is well defined and, in fact, $|u_f(x)| \le M$ for all $x \in U$ (the lower bound holds because the constant $-M$ belongs to $W_f$). We shall show that if $f$ is continuous on $\partial U$, the function $u_f$ is the solution of the Dirichlet problem for $U$. We must thus show that $u_f$ is harmonic and takes on the boundary values $f$.

We first show, without any continuity restrictions on $f$, that $u_f$ is harmonic. To do this it is sufficient to show that $u_f$ is harmonic in any open ball $B$, where $\bar{B} \subset U$. Let $x$ be some point of $B$. Let $w_1, w_2, \ldots, w_k, \ldots$ be a sequence of functions belonging to $W_f$ such that $\lim w_i(x) = u_f(x)$. (Such a sequence exists by the definition of $u_f$.) Now define
$$w_1' = w_1, \quad w_2' = w_1 \vee w_2, \quad \ldots, \quad w_i' = w_1 \vee \cdots \vee w_i.$$
Then $w_1' \le w_2' \le \cdots \le w_i' \le \cdots$ is a monotone increasing sequence of functions belonging to $W_f$ with $\lim w_i'(x) = u_f(x)$. Now replace each $w_i'$ by $w_{iB}'$. The function $w_{iB}'$ belongs to $W_f$, since it is subharmonic and coincides with $w_i'$ near $\partial U$. Furthermore, since $w \le w_B$ for any subharmonic function, we can conclude that $\lim w_{iB}'(x) = u_f(x)$. Inside the ball $B$, each of the functions $w_{iB}'$ is harmonic, so that the sequence $\{w_{iB}'\}$ is a monotone nondecreasing sequence of harmonic functions in $B$. Since all the functions of $W_f$ are bounded above by $M$, we conclude from Proposition 7.2 that the sequence $\{w_{iB}'\}$ converges uniformly on any compact subset of $B$ to a function $u$ which is harmonic in $B$, and we have $u_f(x) = u(x)$.
Furthermore, by definition, we have $u(y) \le u_f(y)$ for any $y \in B$. We would like to show that we actually have $u(y) = u_f(y)$ for all such $y$. To this effect, we follow the same procedure at $y$, namely, we select a sequence $\{v_{iB}'\}$ where each $v_{iB}'$ is harmonic in $B$, belongs to $W_f$, and is such that the sequence is monotone nondecreasing and $\lim v_{iB}'(y) = u_f(y)$. In this way we obtain another function $v$ which is harmonic in $B$ and satisfies $v \le u_f$ throughout $B$, and $v(y) = u_f(y)$. Finally, let the functions $s_i$ be defined by
$$s_i = w_{iB}' \vee v_{iB}',$$
and consider the functions $s_{iB}$. On the one hand, they all belong to $W_f$ and are harmonic in $B$. Furthermore, we have $w_{iB}' \le s_{iB}$ and $v_{iB}' \le s_{iB}$. Finally, the sequence $\{s_{iB}\}$ is nondecreasing and therefore converges to a harmonic function $s$ in $B$. By construction, we have $u \le s$ and $v \le s$ throughout $B$, while
$$u(x) = s(x) = u_f(x) \quad \text{and} \quad v(y) = s(y) = u_f(y).$$
Applying the maximum principle to the harmonic functions $u - s$ and $v - s$, we conclude (since the maximum value 0 is achieved at $x$ and $y$, respectively) that $s = u = v$ in all of $B$. We thus see that $u_f(y) = u(y)$ for any $y \in B$, which shows that $u_f$ is harmonic in $U$.

Note that in proving that $u_f$ is harmonic we did not make use of any properties of the boundary of $U$ or any continuity assumptions on $f$.

Fig. 12.11

Now let $p \in \partial U$ be a touchable point of the boundary, and assume that $f$ is continuous at $p$. We shall show that $u_f(x) \to f(p)$ as $x \to p$. Let the ball $B_R$ of radius $R$ with center $z$ touch $\bar{U}$ at $p$, that is, $\bar{B}_R \cap \bar{U} = \{p\}$. (See Fig. 12.11.) Since $f$ is continuous at $p$, for any $\epsilon > 0$ we can find an $R_1 > R$ such that
$$|f(q) - f(p)| < \epsilon \quad \text{for all } q \in \partial U \cap B_{R_1}. \tag{9.1}$$
Define the function $b$ by setting
$$b(x) = \|x - z\|^{2-n} - R^{2-n}.\dagger$$
Note that $b$ is defined and harmonic on $\mathbb{E}^n - \{z\}$ and is negative for $\|x - z\| > R$. Furthermore, $b(x) < -K < 0$ if $\|x - z\| > R_1$, for some constant $K > 0$. In particular, for all $q \in \partial U$ we have
$$f(q) \le -\frac{2M}{K}\, b(q) + \epsilon + f(p). \tag{9.2}$$
In fact, this is just (9.1) if $\|q - z\| < R_1$, and it follows from $|f(q)| \le M$ if $\|q - z\| \ge R_1$. By the maximum principle for subharmonic functions we conclude that $w(x) \le -(2M/K)\,b(x) + \epsilon + f(p)$ for any $w \in W_f$ and any $x \in U$. We thus have
$$u_f(x) \le -\frac{2M}{K}\, b(x) + f(p) + \epsilon \quad \text{for any } x \in U.$$
On the other hand, we have, for the same reason as before,
$$f(p) - \epsilon + \frac{2M}{K}\, b(q) \le f(q) \quad \text{for any } q \in \partial U,$$

$\dagger$ $b(x) = \ln\,(R/\|x - z\|)$ if $n = 2$.
so that the harmonic function $f(p) - \epsilon + (2M/K)\,b$ actually belongs to $W_f$. In particular, we have
$$f(p) - \epsilon + \frac{2M}{K}\, b(x) \le u_f(x) \quad \text{for any } x \in U.$$
Putting the two inequalities together, we conclude that
$$|u_f(x) - f(p)| \le \frac{2M}{K}\, |b(x)| + \epsilon \quad \text{for all } x \in U. \tag{9.3}$$
Now the function $b$ is continuous and vanishes at $p$. Thus by choosing $x$ sufficiently close to $p$ we can arrange that the right-hand side of (9.3) is less than $2\epsilon$. Thus $u_f(x) \to f(p)$ as $x \to p$. In particular, if all points of $\partial U$ are touchable, and if $f$ is continuous at all points of the boundary, we have proved:

Theorem 9.1. Let $U$ be an open subset of $\mathbb{E}^n$ with compact closure and touchable boundary. Let $f$ be a continuous function defined on $\partial U$. Then there exists a unique function $u_f$ which is continuous on $\bar{U}$ and harmonic in $U$, and which coincides with $f$ on $\partial U$.

Some remarks concerning Theorem 9.1 and its proof are in order.

a) The only time we used the assumption that $\partial U$ is touchable was when we constructed a function $b$ which was subharmonic in $U$, vanished at $p$, and took on values less than some negative constant on all points of $\partial U$ outside some neighborhood of $p$. Now a more careful analysis will show that we can always construct such a function so long as we can touch $p$ from the outside by a cone. Thus we can solve the Dirichlet problem even at points like the $p$ in Fig. 12.12.

Fig. 12.12

b) On the other hand, some condition on the boundary is necessary. For instance, if $\partial U$ contains an isolated point $p$, then we cannot assign the value at $p$ arbitrarily. In fact, if $u$ is continuous in a neighborhood $W$ of $p$ and harmonic in $W - \{p\}$, then the Poisson integral formula implies that the various derivatives of $u$ will also be bounded in $W - \{p\}$. Therefore, the proof of the mean-value theorem applies to the point $p$ itself, and so $u(p)$ is determined and cannot be assigned arbitrarily.
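As a concrete instance of remark (b), the following worked example may help. The domain and the boundary values are our own illustrative choice and are not taken from the text; the argument uses only remark (b) and the maximum principle.

```latex
% Illustrative example (assumed data): the punctured unit ball.
\[
U = \{x \in \mathbb{E}^n : 0 < \|x\| < 1\}, \qquad
\partial U = \{x : \|x\| = 1\} \cup \{0\}.
\]
% Prescribe f = 0 on the unit sphere and f(0) = 1.
% Suppose u were continuous on \bar{U}, harmonic in U, and equal to f on \partial U.
% By remark (b), u is then harmonic in the whole open ball \|x\| < 1.
% Since u = 0 on the sphere \|x\| = 1, the maximum principle applied to u and -u gives
\[
u \equiv 0 \quad \text{on } \|x\| \le 1,
\]
% which contradicts u(0) = f(0) = 1. Hence no solution exists for these boundary values.
```

The isolated point $0$ is not touchable in the sense defined above, so this does not conflict with Theorem 9.1.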
More generally, a more delicate argument shows that the Dirichlet problem cannot be solved for domains whose boundaries contain spikes pointing inward or (in dim $\ge 3$) sufficiently sharp cusps (as in Fig. 12.13). This is the mathematical analogue of the physical fact that a very sharp conductor cannot hold a charge, but will spark. (The relation with electrostatics will be discussed in Section 12.)

Fig. 12.13

c) For the purpose of applying the results of Section 4, i.e., the construction of the Green function of the domain and the various identities, Theorem 9.1 is still not quite enough. We need to know not only that the Green function exists, but also that it has continuous derivatives up to the boundary. This fact requires further argument and additional assumptions about the nature of the boundary; it will be discussed in the next section.

10. BEHAVIOR NEAR THE BOUNDARY

The purpose of this section is to discuss the behavior of the solution of the Dirichlet problem near the boundary. In fact, we shall prove:

Theorem 10.1. Let $U$ be a domain with regular boundary and compact closure in $\mathbb{E}^n$. Let $f$ be a continuously differentiable function defined on $\partial U$, and let $u$ be the solution of the Dirichlet problem with boundary values $f$. Then the partial derivatives $\partial u/\partial x^i$ can be extended to continuous functions on $\bar{U}$. Furthermore, if $p \in \partial U$ and $\xi = \langle \xi^1, \ldots, \xi^n \rangle$ is tangent to the boundary at $p$, then
$$\langle \xi, df \rangle = \langle \xi, du \rangle = \sum_i \frac{\partial u}{\partial x^i}\, \xi^i.$$

The following proof was suggested to us by Professor Ahlfors and is reproduced here with his kind permission.

Fig. 12.14

Proof. Let $p$ be some point of $\partial U$, and let us arrange, by a Euclidean motion, that the tangent plane to $\partial U$ at $p$ is the plane $x^n = 0$. (See Fig. 12.14.) Then near $p$ the points of $\partial U$ are all points of the form
$$(x^1, \ldots, x^{n-1}, \varphi(x^1, \ldots, x^{n-1})),$$
where $\varphi$ is a function defined near the origin of $\mathbb{R}^{n-1}$ and vanishing at the origin, together with all its first derivatives. In particular, there is a constant $C > 1$ such that
$$|\varphi(x^1, \ldots, x^{n-1})| \le C\{(x^1)^2 + \cdots + (x^{n-1})^2\}$$
in some neighborhood of the origin of $\mathbb{R}^{n-1}$. We can therefore choose a sufficiently small $R < 1/2C$ so that the (open) ball of radius $R$ with center $(0, \ldots, 0, R)$ lies entirely within $U$ and the ball of radius $R$ with center $(0, \ldots, 0, -R)$ lies inside the complement of $U$. (See Fig. 12.15.) Let $z = (0, \ldots, 0, -R)$. Then for all $q \in \partial U$ sufficiently close to $p$ we have
$$\|q - z\|^2 = (x^1)^2 + \cdots + (x^{n-1})^2 + R^2 + 2R\varphi(q) + \varphi^2(q) \ge (1 - 2RC)\{(x^1)^2 + \cdots + (x^{n-1})^2\} + R^2 + \varphi^2(q),$$
so that
$$\|q - z\| - R \ge k\{(x^1)^2 + \cdots + (x^{n-1})^2\},$$
where $k$ is some positive constant. Let $\psi(r) = r^{2-n}$ ($= \ln r$ if $n = 2$). Then $\psi'(R) \ne 0$, and therefore $|\psi(\|q - z\|) - \psi(R)| \ge C'\,\bigl|\|q - z\| - R\bigr|$ for a suitable $C' > 0$. In particular, we have the inequality
$$\left| \frac{1}{\|q - z\|^{n-2}} - \frac{1}{R^{n-2}} \right| \ge K\{(x^1)^2 + \cdots + (x^{n-1})^2\}, \tag{10.1}$$
or
$$\left| \ln \frac{\|q - z\|}{R} \right| \ge K (x^1)^2 \quad \text{if } n = 2,$$
for some positive constant $K$.

Now let us replace the function $f$ by $f - [f(p) + \sum_{i=1}^{n-1} (\partial f/\partial x^i)(p)\, x^i]$. Then $u - [f(p) + \sum_{i=1}^{n-1} (\partial f/\partial x^i)(p)\, x^i]$ is the solution of the corresponding Dirichlet problem. In this way we may assume (changing our notation accordingly) that $f$, together with its first derivatives, vanishes at $p$. Since $f$ is assumed to have continuous first derivatives, we may apply Taylor's theorem to conclude that there exists a $c > 0$ such that for all $q$ on $\partial U$ sufficiently close to $p$ we have
$$|f(q)| \le c\{(x^1)^2 + \cdots + (x^{n-1})^2\}. \tag{10.2}$$
As in the last section, let
$$b(x) = \|x - z\|^{2-n} - R^{2-n} \qquad \left(= \ln \frac{R}{\|x - z\|} \text{ if } n = 2\right).$$
If we compare (10.2) with (10.1), we see that for all $q \in \partial U$ sufficiently close to $p$ we have
$$|f(q)| \le \frac{c}{K}\, |b(q)|.$$
On the other hand, since $b$ is strictly negative outside some neighborhood of $p$
12.10 BEHAVIOR NEAR THE BOUNDARY 497 ie«-^0 «(0,...,0, "20 Fig. 12.15 Fig. 12.16 (on dU)f and since / is bounded on dUj we can (using the above inequality for all q near p) find a constant A such that |/(^)i < A\biq)\ for all q E dU. (10.3) Since the function b is harmonic in C7, the maximum principle applied to m — Ab and Ab —■ u allows us to conclude that \u{x)\ < A\bix)\ for all xeU, (10.4) Now the function I? is a differentiable fimction of the distance from z which vanishes when this distance is R, In particular, there is a constant a > 0 such that \b{x)\ < ad{x), where d(x) denotes the distance from x to the sphere of radius R with center z. Now let B be the ball of radius R with center —z = (0^. . . ^ 0, U), so that B lies in U^ and let S = dB. Note that S is tangent to the sphere about z at the point p. (See Fig. 12.16.) Thus for points y on Sj d{y) < Ci\\y — p\\^. If we substitute this into (10.4) using the previous inequalityj we conclude that \u{y)\ < L\\y - p\\ for all y &S, (10.5) for some constant L. We apply the Poisson integral formula to the ball B and the fimction u and differentiate with respect to x* to obtain /u (y) ■dS + R' \\x + z\\ An~lR /w II iy)ix' - y') 2/11" +2 dS. (10.6)
498 POTENTIAL THEORY IN E^ 12.10 Now let X be on the normal to the boundary through p, so that x = (0,. . . , 0, x'^). If |x^| < R, then \\y -p\\ < \\y -x\\ + \x^\ < 2\\y ~x\\. Thus [ r^^in ds\ <2^f II '""^^^L dS < 2^2) f\\y - pf "^ dS. J \\x — 2/11^ I J \\p — 2/lh J Since the sphere is {n — 1)-dimensional, this last integral converges absolutely, and we thus see that the first integral in (10.6) is uniformly absolutely convergent. (Note that the term containing this integral vanishes if i < n.) We now show that the second term in (10.6) tends to zero as x'^ tends to zero. In fact, R^ — \\x + ^11^ = x^{2R — x^), so we can write the second term of (10.6) as An_iR J \\x-yt+^ This time, since x" < \\x — y\\ for all y & S, we can assert that the integrand is smaller in absolute value than \u{y)\W--y'\ \u{y)\ k(^)|2-+^^^ ^ m^+i/^ii,, ^ ^||3/2-n \\y _^ a:||^+3/2 - \\y _^ :r||n+l/2 - \\y _ p||n+l/2 ^ ^^ 11^ PW which is again absolutely integrable. Therefore, the integral occurring in the last expression is uniformly bounded for all values of x'^ > 0. Since (x^)^^^ -^ 0, we conclude that the second term of (10.6) approaches zero. If we now rephrase the result we have obtained independently of the special choice of coordinates, we see that we have proved the following: Let p be any point of dUj and let x be a point on the normal line to dU through p. If ^ is any vector parallel to a tangent vector at p, then (^, du) -^ (^, df) as X -^ p along the normal line. If d/dn denotes the unit vector in the normal direction (pointing into U), then ) _, _z^ [ <y) - y ds, (10.7) / An-i J y - pt E~ '^^ / A I II dn / An-i J \\y — p where the integral is taken over the sphere of radius R tangent to dU dhi p and lying inside U. So far, we have proved convergence only \ix -^ p along the normal direction. If we go back and examine the argument, we see that the radius R and the constant D in (10.5) can be chosen uniformly for all p G dU. 
(In fact, since $\partial U$ is compact, we can cover $\partial U$ by a finite number of neighborhoods, of type (iii), such that a ball of radius $R$ about each $p \in \partial U$ lies in one of these neighborhoods, where $R$ is sufficiently small. In such a situation it is clear that the choice of $R$ (still smaller, if necessary) depends only on the second derivatives of the change of charts from Euclidean coordinates to the adjusted charts. Also the choice of $D$ depends only on the constants involved in the Taylor expansion of $f$ and on $R$. Thus the choice of $R$ and $D$ can be made uniformly.) Since, by assumption, the partial derivatives of $f$ are continuous and the right-hand side of (10.7) is continuous in $p$, we conclude that the limits of the partial derivatives of $u$ exist as we approach the boundary in any direction, and that in fact we have proved Theorem 10.1. $\square$

We can now construct a Green function for any domain with regular boundary by the solution of Dirichlet's problem, as in Section 4. Theorem 10.1 tells us that it is differentiable in the closed set $\bar{U}$ in the sense that the partial derivatives are continuous up to the boundary. If we want to derive the various formulas of Section 4, we can do so by applying a limit argument: We simply replace $U$ by a smaller domain $U_\alpha$ such that $U_\alpha \to U$ and $\partial U_\alpha \to \partial U$ as $\alpha \to 0$.

(Actually, a more careful and delicate argument will show that if $f$ has two continuous derivatives, then the $\partial u/\partial x^i$ we obtain as limits on the boundary are in fact continuously differentiable. Since $\partial u/\partial x^i$ is the solution of the Dirichlet problem with these boundary values, we conclude that the second derivatives of $u$ can be extended so as to be continuous on $\bar{U}$. In this way one can prove that if $f$ is $C^\infty$, all the derivatives of $u$ can be extended so as to be continuous on $\bar{U}$. In particular, all the derivatives of the Green function $G(x, \cdot)$ can be extended to continuous functions at the boundary for any $x \in U$.)

11. DIRICHLET'S PRINCIPLE

The solution of Dirichlet's problem has an interesting and useful characterization as the solution of the problem of minimizing the Dirichlet integral
$$D[u, u] = \int \sum_i \left( \frac{\partial u}{\partial x^i} \right)^2$$
with given boundary values. More precisely,

Theorem 11.1 (Dirichlet's principle). Let $U$ be a domain with compact closure and regular boundary, and let $f$ be a continuously differentiable function on $\partial U$.
Let $u$ be the solution of the Dirichlet problem with boundary values $f$, and let $w$ be any function which is differentiable in $U$ and takes on the values of $f$ on the boundary, and whose derivatives can be extended to be continuous in $\bar{U}$. Then
$$D[u, u] \le D[w, w], \tag{11.1}$$
and equality holds if and only if $u = w$.

Proof. Let us write $w = u + v$, where now $v$ is continuously differentiable on $U$, continuous on $\bar{U}$, and vanishing on $\partial U$. Then
$$D[w, w] = D[u + v, u + v] = D[u, u] + D[v, v] + 2D[u, v], \tag{11.2}$$
since $D$ is bilinear and symmetric in its arguments.

Now suppose for the moment that $u$ possesses two continuous derivatives and $v$ possesses one continuous derivative in a neighborhood of $\bar{U}$. Then
$$D[u, v] = D_U[u, v] = \int_U dv \wedge * du = \int_{\partial U} v * du - \int_U v\, \Delta u = 0. \tag{11.3}$$
Now $D[v, v] = \int \sum_i (\partial v/\partial x^i)^2 \ge 0$ and vanishes if and only if all the derivatives of $v$ vanish, so that $v$ is a constant on each component of $U$. Since $v$ vanishes on $\partial U$, we conclude that $D[v, v] = 0$ if and only if $v = 0$.

In case $u$ and $v$ are not necessarily differentiable in a neighborhood of $\partial U$, we establish that $D[u, v] = 0$ by writing
$$D_U[u, v] = \lim D_{U_\alpha}[u, v],$$
where $U_\alpha$ is a sequence of domains which approach $U$ as $\alpha \to 0$ and $\partial U_\alpha \to \partial U$. Applying the previous argument to each of the domains $U_\alpha$, we conclude that $D[u, v] = 0$ and $D[v, v] = 0$ if and only if $v = 0$. This proves the theorem. $\square$

12. PHYSICAL APPLICATIONS

The study of Laplace's equation and its solutions plays an important role in many theories of classical physics. This is essentially due to the intimate connection between the Laplace operator and Euclidean geometry. Since this is not the place to elaborate on the physical applications, we refer the reader to any physics text for the details. (See, for instance, Feynman, Leighton, and Sands, The Feynman Lectures on Physics, Addison-Wesley, Reading, Mass., 1964, vol. II, especially Chapter 12.) We shall briefly mention some of the relevant physics in this section.

In electrostatics it is assumed that the electric field $E$ satisfies the equations
$$d * E = \rho, \qquad dE = 0,$$
where $\rho$ is the density of charge and we identify the vector field $E$ with a linear differential form via the Euclidean metric on $\mathbb{E}^3$. Here $\rho$ is assumed, for the moment, to be a smooth density. (It is also convenient to consider the limiting cases of "surface distributions" and "point distributions".) By the second of these equations, we know that $E$ can be written as $E = -d\varphi$, where $\varphi$ is a smooth function known as the potential of the electric field. The first equation then implies that
$$\Delta \varphi = d * d\varphi = -\rho.$$
If $\rho$ is given, then $\varphi$ is determined by the formula
$$\varphi(x) = \frac{1}{4\pi} \int \frac{\rho(y)}{\|y - x\|}\, dy$$
(since $A_2 = 4\pi$).
As limiting cases, we obtain, for charge distributed along a surface $S$ with density $\sigma\, dA$, the formulas
$$\varphi(x) = \frac{1}{4\pi} \int_S \frac{\sigma(y)\, dA}{\|y - x\|},$$
where $dA$ is the area density of the surface, and
$$\varphi(x) = \frac{1}{4\pi} \int_C \frac{\lambda(y)\, ds}{\|y - x\|},$$
where $\lambda\, ds$ is a linear distribution along a curve $C$. For a point charge located at $x$, we obtain
$$\varphi = \frac{e}{4\pi \|\cdot - x\|},$$
where $e$ is the magnitude of the charge.

There can be no electric field along a conductor (since this would result in a motion of the charge, which contradicts the assumption of a static field). Thus $d\varphi = 0$ and $\varphi = \text{const}$ along a conductor.

In most problems that arise the distribution of charge is not known in advance, but must be determined. For example, suppose that a unit charge is placed at a point $x$ inside a cavity whose boundary is a conductor which is grounded, i.e., kept at zero potential. (See Fig. 12.17.) We want to find the electric field inside the cavity. There will be a charge distribution induced on the boundary surface that we also wish to determine. Since there is no charge distribution anywhere inside the cavity, except at $x$, the potential $\varphi$ must be harmonic everywhere except at $x$, and, in fact, differs from
$$\frac{1}{4\pi \|\cdot - x\|}$$
by a harmonic function. Since we want $\varphi = 0$ on the boundary, we see that the desired potential $\varphi$ is exactly the Green function for the domain. The surface density can then be determined from $d\varphi$ along the boundary. (This is another reason why it is important to know that the solution of Dirichlet's problem is differentiable at the boundary.)

More frequently, we are interested in the electric field outside conducting surfaces. Here, strictly speaking, the theory we have developed of Dirichlet's problem does not apply, since we considered only bounded domains. We can handle this problem in one of two ways. First, we can reduce to the Dirichlet problem by considering everything contained inside a very large conducting sphere (at potential zero). Then we let the radius go to infinity to get the desired potential as a limit.
Second, we can modify our arguments in Sections 3 through 10 to include the case of unbounded domains, but restrict all our functions to vanishing as $c/r$ at infinity. The details are left to the reader.

In the theory of diffusion one assumes that material of some nature is flowing according to a vector field $X$. If the density of the material is $\rho\, dV$,
502 POTENTIAL THEORY IN E"* 12.12 then the amount of material flowing per unit time through an oriented piece of surface S is given by f p(X, Ti)dA=f *pX, Js Js where n is the unit normal vector to the surface, dA is the element of area, and, we are regarding pX on the right-hand side as a linear differential form via the Euclidean metric. Thus the net amount of material flowing out of any region is d * pX = div <X, p>. Now "material" may be produced in the region at a rate s dV (where s is the function describing the rate of productivity of the sources in the region). Thus the total change of density is given by ^dV = div <X,p> -» sdV. If, as we shall assume, the situation is stationary, i.e., the density does not change, we obtain the equations div <X,p> = sdV. On the other hand, it frequently happens that the flow is given by the gradient of some function, i.e., PidX = dN for some function N. Combining these equations, we obtain AN = s. In the theory of heat N is the temperature, pi^X is the rate of flow of heat, and s is the density of the sources of heat. The equation pi^X = dN says that the rate of flow of heat is proportional to the gradient of temperature. In the theory of diffusion of particles it is assumed that the rate of flow of the material is proportional to the change (i.e., gradient) of the density of the particles. (This says that the particles tend to flow from a region of higher density to a region of lower density.) Then N represents the density of the particles and s represents the rate of production of particles. Suppose, for example, that the boundary of a region D is kept at some fixed temperature /, where / is a given function on dD and there are no sources of heat inside D. Then the distribution of temperature in D is given by the function N which satisfies the differential equation AN = 0 and the boundary condition N = f on dD. In other words, the temperature is determined by the solution of Dirichlet's problem. 
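The statement that the steady temperature solves Dirichlet's problem can be illustrated numerically. The following is a minimal sketch, assuming a square region discretized on a uniform grid and the standard five-point averaging (Jacobi) iteration; the function name `solve_dirichlet`, the grid size, and the boundary data are our own illustrative choices and do not come from the text.

```python
def solve_dirichlet(boundary, n, sweeps=2000):
    """Approximate the steady temperature N on an n x n grid.

    boundary(i, j) gives the prescribed temperature f at boundary nodes.
    Interior nodes are repeatedly replaced by the average of their four
    neighbours, a discrete form of the mean-value property of harmonic
    functions (so the limit satisfies a discrete Laplace equation).
    """
    def on_edge(i, j):
        return i in (0, n - 1) or j in (0, n - 1)

    # Start with the boundary data in place and zero in the interior.
    grid = [[boundary(i, j) if on_edge(i, j) else 0.0 for j in range(n)]
            for i in range(n)]
    for _ in range(sweeps):
        new = [row[:] for row in grid]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                    + grid[i][j - 1] + grid[i][j + 1])
        grid = new
    return grid


if __name__ == "__main__":
    # Boundary temperature rising linearly from 0 to 1 across the plate.
    temps = solve_dirichlet(lambda i, j: i / 10.0, 11)
    print(round(temps[5][5], 6))
```

With linear boundary data the discrete solution is itself linear, so the interior values interpolate the boundary values, and in general every interior value stays between the minimum and maximum of the boundary data, in line with the maximum principle.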
In the theory of steady, incompressible, irrotational flow of a fluid it is assumed that a vector field $X$ is given which represents the flow of the liquid. By the incompressibility it follows that $d * X = 0$. It is assumed that there is no circulation, that is, $dX = 0$. Thus $X = d\varphi$ for some function $\varphi$ which is a solution of Laplace's equation. The natural boundary-value problems that arise in this case are different from those we have been discussing, so we simply refer the reader to standard works on fluid mechanics.
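A simple planar example of such a flow may make the two conditions concrete. The potential below is our own illustrative choice, not one given in the text.

```latex
% Illustrative example (assumed potential) in the plane with coordinates (x, y):
\[
\varphi(x, y) = x^2 - y^2, \qquad X = d\varphi = 2x\,dx - 2y\,dy .
\]
% Identifying X with the vector field <2x, -2y>:
\[
\operatorname{div} X = \frac{\partial (2x)}{\partial x} + \frac{\partial (-2y)}{\partial y} = 2 - 2 = 0,
\qquad
\frac{\partial (-2y)}{\partial x} - \frac{\partial (2x)}{\partial y} = 0 .
\]
% Hence the flow is incompressible and irrotational, and
\[
\Delta \varphi = \frac{\partial^2 \varphi}{\partial x^2} + \frac{\partial^2 \varphi}{\partial y^2} = 2 - 2 = 0 .
\]
```

This is the familiar flow toward a stagnation point at the origin, with streamlines along the hyperbolas $xy = \text{const}$.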
SUPPLEMENTARY EXERCISES

1. Let $V_1$ and $V_2$ be vector spaces with scalar products. A linear map $l\colon V_1 \to V_2$ is said to be conformal (or a similarity) if $(lv, lw) = c(v, w)$, where $c$ is a positive constant. (Note that if $c = 1$, then $l$ is an isometry.) Let $M_1$ and $M_2$ be manifolds each with a Riemann metric. A differentiable ($C^\infty$) map $\varphi$ is said to be conformal if $\varphi_{*x}$ is conformal for each $x \in M_1$. Thus, to say that $\varphi$ is conformal means that
$$(\varphi_{*x}\xi_1, \varphi_{*x}\xi_2) = c(x)(\xi_1, \xi_2) \quad \text{if } \xi_1, \xi_2 \in T_x(M_1)$$
for any $x \in M_1$. (Note that this time the $c$ can depend on $x$.)

a) Let $\varphi\colon U \to \mathbb{R}^2$ be a conformal map, where $U \subset \mathbb{R}^2$ and we use the Euclidean metric on $U$ and on $\mathbb{R}^2$. Suppose $\varphi$ is given by $\varphi(x, y) = (u, v)$, where $u = u(x, y)$ and $v = v(x, y)$, and where $(x, y)$ and $(u, v)$ are rectangular coordinates in the plane. Show that either the equations
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$$
hold or the equations
$$\frac{\partial u}{\partial x} = -\frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = \frac{\partial v}{\partial x}$$
hold.

b) Conclude that if $u$ and $v$ are as in (a), then they are both harmonic functions.

2. a) Let $u$ be a harmonic function defined on all of $\mathbb{R}^n$. Show that if $u$ is bounded then it must be a constant.

b) Using (a) and Exercise 1, show that there is no conformal map of $\mathbb{R}^2$ onto a bounded subset $U \subset \mathbb{R}^2$.

13. PROBLEM SET: THE CALCULUS OF RESIDUES

On a manifold $M$ we can consider complex-valued smooth functions, differential forms, etc. For instance, we say that the function $f = u + iv$ is a complex-valued $C^\infty$-function if each of the real-valued functions $u$ and $v$ is a $C^\infty$-function. Similarly, we consider the complex-valued linear differential form $\omega = \sigma + i\tau$, where $\sigma, \tau \in \wedge^1(M)$ are linear (real-valued) differential forms. We can then perform the usual operations, remembering that $i^2 = -1$. Thus
$$f\omega = (u\sigma - v\tau) + i(v\sigma + u\tau).$$
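Similarly, if $\omega_1 = \sigma_1 + i\tau_1$ and $\omega_2 = \sigma_2 + i\tau_2$, then
$$\omega_1 \wedge \omega_2 = (\sigma_1 \wedge \sigma_2 - \tau_1 \wedge \tau_2) + i(\sigma_1 \wedge \tau_2 + \tau_1 \wedge \sigma_2)$$
is a complex-valued exterior two-form. We similarly have the operator $d$ given by
$$df = du + i\,dv, \qquad d\omega = d\sigma + i\,d\tau, \quad \text{etc.}$$
If $X_1$ and $X_2$ are vector fields, we define the "complex vector field" $X_1 + iX_2$ as that differential operator on complex-valued functions which is given by
$$(X_1 + iX_2)f = X_1 f + iX_2 f = (X_1 u - X_2 v) + i(X_1 v + X_2 u).$$
From now on, let $M$ be an open subset of $\mathbb{R}^2$ with rectangular coordinates $(x, y)$. We set $z = x + iy$, so $dz = dx + i\,dy$. We let $\bar{z} = x - iy$, $d\bar{z} = dx - i\,dy$, and then define the complex vector fields $\partial/\partial z$ and $\partial/\partial \bar{z}$ by
$$\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\right), \qquad \frac{\partial}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right).$$
We then set
$$\frac{\partial f}{\partial z} = \frac{\partial}{\partial z} f, \qquad \frac{\partial f}{\partial \bar{z}} = \frac{\partial}{\partial \bar{z}} f$$
for any complex function $f$.

EXERCISES

13.1 Show that for any complex-valued smooth function $f$ we have
$$df = \frac{\partial f}{\partial z}\, dz + \frac{\partial f}{\partial \bar{z}}\, d\bar{z}.$$

13.2 Show that Leibnitz's rule holds for $\partial/\partial z$ and $\partial/\partial \bar{z}$; that is,
$$\frac{\partial (fg)}{\partial z} = \frac{\partial f}{\partial z}\, g + f\, \frac{\partial g}{\partial z}, \qquad \frac{\partial (fg)}{\partial \bar{z}} = \frac{\partial f}{\partial \bar{z}}\, g + f\, \frac{\partial g}{\partial \bar{z}}.$$

A function $f$ is called holomorphic (or complex analytic) if $\partial f/\partial \bar{z} = 0$. For a holomorphic function $f$ we write
$$f' = \frac{\partial f}{\partial z}.$$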
13.3 Show that $dz$ and $d\bar{z}$ are independent in the sense that if $a\,dz + b\,d\bar{z} = 0$, then $a = 0$ and $b = 0$.

13.4 Conclude that $f$ is holomorphic if and only if $df = h\,dz$ for some $h$.

13.5 Show that $d(z^n) = nz^{n-1}\,dz$ (on $\mathbb{R}^2 - \{0\}$ if $n < 0$) for all integers $n$, so that $z^n$ is holomorphic.

13.6 Conclude that every polynomial $p$ given by $p(z) = \sum_0^n a_k z^k$ is holomorphic and that
$$p'(z) = \sum_1^n k a_k z^{k-1}.$$

13.7 Define the function $e^z$ by setting $e^z = e^x(\cos y + i \sin y)$. Show that $de^z = e^z\,dz$.

13.8 Show that the product of two holomorphic functions is holomorphic.

13.9 Let $f = u + iv$ be a holomorphic function, and consider the map sending $(x, y) \to (u, v)$ as a map of $M \subset \mathbb{R}^2$ into $\mathbb{R}^2$. Show that this map is conformal and that its Jacobian determinant at the point $(x, y)$ is $|f'(z)|^2$ if $z = x + iy$.

13.10 Let $g$ be a holomorphic function defined on the image of $M$ under the map of $M \subset \mathbb{R}^2$ corresponding to $f$. Define $g \circ f$ by $g \circ f(x, y) = g(u(x, y), v(x, y))$ [which we can write, for short, as $g \circ f(z) = g(f(z))$]. Show that $g \circ f$ is holomorphic and that
$$(g \circ f)'(z) = g'(f(z))\, f'(z).$$

13.11 Let $U$ be a domain with almost regular boundary in the plane. By Stokes' theorem we have
$$\int_{\partial U} \nu = \int_U d\nu$$
for any complex-valued linear differential form $\nu$. (This is to be interpreted, as usual, as the equality of the real and imaginary parts of both sides.) Conclude that
$$\int_{\partial U} g\,dz = \int_U dg \wedge dz = \int_U \frac{\partial g}{\partial \bar{z}}\, d\bar{z} \wedge dz.$$

13.12 Show that $d\bar{z} \wedge dz = 2i\,dx \wedge dy$, so that
$$\int_{\partial U} g\,dz = 2i \int_U \frac{\partial g}{\partial \bar{z}}\, dx \wedge dy = 2i \int_U \frac{\partial g}{\partial \bar{z}}.$$

13.13 Conclude that if $f$ is a smooth function defined in a neighborhood of $\bar{U}$, and if $f$ is holomorphic in $U$, then
$$\int_{\partial U} f\,dz = 0.$$
506 POTENTIAL THEORY IN E^ 12.13 Fig. 12.18 Fig. 12.19 13.14 Let J = ^ -\- ir}, where (f, ??) E t/ and g is a smooth function defined in a neighborhood of U. (See Fig. 12.18.) Show that 9(0 = 1 2Tn JdU ^ — k Ju dU/dz dz A dz [Hint: Apply Exercise 13.11 to the function U{z)/{z — J) and the domain U — Bg, where B^ is a disk of radius e about J. Then let e —> 0.] 13.15 Conclude that if / is holomorphic in [7, then for any J E f/, M) 2« Jev m dz. IdU Z — ^ 13.16 Let / be holomorphic in U — {ai} U • • • U {ak} (and let / be smooth in IF - {ai} U--- U {ak}, where IF 3 C/). Suppose that in a neighborhood of each ai f(z) = BhU^ - a,)"'^' + • • • + Bi,i{z - a)-' + <pi{z), where cpi is holomorphic in the neighborhood. (See Fig. 12.19.) Show that if 27ri Jbu J{z)dz = B11 + B12-] \-Bik. [Hint: Apply Exercise 13.13 to U — U A.e where Di,g are small disks around the ai, and let e —> 0.] The result of Exercise 13.16 can frequently be used to evaluate definite integrals. For example, suppose we wish to evaluate /•2ir / R{cos e, sin 6) dS, Jo where R is some rational function of two variables. If we set z = e^^, then this integral becomes •i, 4 ^{z+z ').^(^-^' ■')]!■
12.13 PROBLEM set: the calculus of residues 507 where Di is the unit diak. To apply Exercise 13.15, we must merely find the points % and the coefficients Bij to evaluate the integral. For instance, if a > 0, Jo a + COS ^ 2 Jo a + cos ^ JdDi dz z^+2az+ 1 Now 2^+ 2a2+ 1 = (2 — ai)(2 — 02), where ai = —a+ (a^ — 1)^''^ and a2 —a — (a2 — 1)1/2^ Thus only ai E Di, and 1 ^2 + 2a5J + 1 ai so the integral is evaluated as Jo a-\- cos B /2 - 02 V — ai ^ — 02/ 13.17 Evaluate 13,18 Evaluate L dS 0 a + sin2 6 2jr (a > 0). / Jo ™'^ :de. /o a + 5 cos ^ Let P and Q be polynomials such that Q(x) 9^ 0 for any real x. Suppose that P(z) = a„„22;"~2 + a„^32:*--3 H \-aiz+ao and Q(2) = ft^^'^ + hn^iz^-^ H h 6i2 + &0, where hn 9^ 0. In other words, the degree of Q is at least two more than the degree of P. Then P/Q is absolutely integrable and l^dx= lim / ^. Fig. 12.20 Now consider §bu [P{z)/Q{z)] dz, where 17 is a semidisk of radius R. (See Fig. 12.20.) The integral over d U splits into two parts: r p{x) f p(z)
508 POTENTIAL THEORY IN E^ 12.13 where Tr is half the circle of radius R. For large values of R we have, for ||2;|| = R, \P(z)\ Q(z) ^ \an\ + \an-i\'-^-] i-ko|-^;r=^ ^ - ^7 ] Tx ^'r^' so (n-|6„-.|-| N-^) f 1^,11 - \-7?rEi^ KQiz)n - rJo Q{Re«>. n --R and thus lim/2_,oo Str = 0. We can then apply Exercise 13.16, since there will be only a finite number of zeros of Q in the upper half-plane. 13.19 Evaluate J —c dx (^2 _|_ ^2)2 13.20 Show that /■ J —i dx (a > 0). (2n-2)! (a;2+ 1)*^ 22('^-i) [(n — 1)!]2
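The residue evaluation of $\int_0^{2\pi} d\theta/(a + \cos\theta)$ worked out earlier in this problem set can be checked numerically. The sketch below is ours, not the book's: it assumes $a > 1$ (so that the roots $a_1, a_2$ of $z^2 + 2az + 1$ are real and only $a_1$ lies in the unit disk), and the function names are hypothetical. It compares a plain Riemann sum over one period with the value $(2/i) \cdot 2\pi i \cdot 1/(a_1 - a_2) = 2\pi/(a^2 - 1)^{1/2}$ given by the residue at $a_1$.

```python
import math


def integral_numeric(a, n=2000):
    """Equally spaced Riemann sum for the integral of 1/(a + cos t) over [0, 2*pi]."""
    h = 2.0 * math.pi / n
    return h * sum(1.0 / (a + math.cos(k * h)) for k in range(n))


def integral_by_residue(a):
    """Value predicted by the residue computation, assuming a > 1.

    With z = exp(i*t) the integral becomes (2/i) times the contour integral
    of dz / (z**2 + 2*a*z + 1) over the unit circle. The only pole inside
    the unit disk is a1, with residue 1 / (a1 - a2).
    """
    root = math.sqrt(a * a - 1.0)
    a1, a2 = -a + root, -a - root
    residue = 1.0 / (a1 - a2)
    return 4.0 * math.pi * residue  # (2/i) * (2*pi*i) * residue


if __name__ == "__main__":
    for a in (1.5, 2.0, 5.0):
        print(a, integral_numeric(a), integral_by_residue(a))
```

Because the integrand is smooth and periodic, the equally spaced sum converges very quickly, so the two columns should agree to many digits.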
CHAPTER 13 CLASSICAL MECHANICS

In this chapter we shall present a brief introduction to the study of classical theoretical mechanics. We emphasize that what we are studying is a branch of mathematics, idealized from the mathematical considerations common to many problems arising in the physics of mechanical systems. We will thus formulate, on an axiomatic basis, a mathematical model that describes features that arise in the study of the equations of mechanics. The model will apply to most situations arising in the mechanical applications. In order to simplify the mathematics, we have sacrificed a certain amount of generality, and therefore there will be some situations in classical mechanics where our model is inadequate.

Mechanics is devoted to the study of how a physical system evolves in time. The first fundamental assumption is that the system can be described (locally) by a collection of continuous parameters. More precisely, we assume that the set of all possible "positions" or configurations of the physical system is a differentiable manifold $M$ which is called the configuration space. For example, if the physical system consists of three particles free to move in space, we can describe the "position" of the system at any given instant by giving the positions of each of the three particles. Thus, in this case, the configuration space is $\mathbb{E}^3 \times \mathbb{E}^3 \times \mathbb{E}^3$. If we insist that no two particles be able to occupy the same position of space at the same time, then the configuration space $M$ is given by
$$M = \mathbb{E}^3 \times \mathbb{E}^3 \times \mathbb{E}^3 - S,$$
where
$$S = \{\langle v_1, v_2, v_3 \rangle : v_i \in \mathbb{E}^3 \text{ and } v_1 = v_2 \text{ or } v_1 = v_3 \text{ or } v_2 = v_3\}.$$
If the physical system is a rigid body (say a top) spinning about some point held fixed in space, then the position of the system is completely described by giving the position in space of three orthonormal vectors on the body (drawn through the fixed point, say).
Since these vectors can be placed in any position but must remain orthonormal and cannot change orientation (from right- to left-handedness), we see that $M$ is the set of all oriented orthonormal bases of $\mathbb{E}^3$. In this case $M$ is a three-dimensional manifold. If we fix some arbitrary initial orthonormal basis $\langle e_1, e_2, e_3 \rangle$, then any possible position of the system is
given by $\langle v_1, v_2, v_3\rangle$, where $v_i = Ae_i$ and $A$ is a rotation, that is, $A$ is an orthogonal linear transformation with positive determinant. Thus $M$ is diffeomorphic to $O^+(3)$, the space of all orthogonal three-by-three matrices with positive determinant.

The basic problem of mechanics is to describe how the configuration of the system changes in time. As the system evolves, the configuration at any instant $t$ will be given by some point $C(t) \in M$. Our problem is to give a reasonably simple description of the possible curves $C$ which can arise as actual changes in the configuration of the mechanical system.

The second fundamental assumption of classical mechanics is that the curve $C(t)$ can be determined from a knowledge of the "state" of the system at any given time. That is, we may (and, in general, will) have to know more about the system at a given time than its configuration in order to be able to predict its future configuration. However, if we do have enough such instantaneous information, we can determine the curve $C(t)$. The total amount of relevant information is called the state of the system. It is assumed that the set of all states is itself a differentiable manifold, $S$. Since the state of a system contains more information than the configuration, we can assign to every state $s$ the configuration $\pi(s) \in M$ of the system in the state $s$. In other words, we have a map $\pi: S \to M$. We assume that this map $\pi$ is differentiable.

It is assumed that if we know the state $s$ of the system at time $t_0$, we can predict its state at any future time $t$. Thus we are given a map $\varphi_{t,t_0}: S \to S$ such that if the system is in the state $s$ at time $t_0$, it will be in the state $\varphi_{t,t_0}(s)$ at time $t$. Now there is nothing special about the time $t_0$. If $t > t_1 > t_0$ and $s$ is the state at time $t_0$, then $\varphi_{t_1,t_0}(s)$ is the state at time $t_1$, and therefore $\varphi_{t,t_1}(\varphi_{t_1,t_0}(s))$ is the state at time $t$. In other words,
$$\varphi_{t,t_0} = \varphi_{t,t_1} \circ \varphi_{t_1,t_0}.$$
Now it turns out that in most (basic) mechanical systems, if the time is suitably parametrized, the function $\varphi_{t,t_0}$ really depends only on $t - t_0$. (This fails to hold in the so-called nonconservative systems. A typical nonconservative system is one involving friction. This is usually a consequence of not studying a sufficiently complete system. In the case of friction, for example, the heat loss must be taken into account.) Let us assume that $\varphi_{t,t_0}$ depends only on $t - t_0$, and write $\varphi_{t-t_0}$ for $\varphi_{t,t_0}$. Then, if we write $s_1 = t_1 - t_0$ and $s_2 = t - t_1$, the previous equation can be written as
$$\varphi_{s_1+s_2} = \varphi_{s_2} \circ \varphi_{s_1}.$$
This looks like the defining relation for a one-parameter group except that so far we have been restricting ourselves to nonnegative values of $s$. In point of fact, it is assumed that this restriction is unnecessary and that we are given a flow $\varphi$ on $S$.

To repeat, we are given a differentiable manifold $M$ representing the set of all possible configurations of our system, a differentiable manifold $S$ representing all possible states of the system, a differentiable map $\pi: S \to M$, and a flow $\varphi$ on $S$. Then the curves $C(t)$ are all assumed to be of the form $C(t) = \pi\varphi_t(x)$ for some $x \in S$. We must therefore describe $S$, $\pi$, and $\varphi$ for any given configuration space $M$.

Classical mechanics makes a very definite assumption about the nature of the space $S$ (and the map $\pi$). It asserts that the state of a system is completely determined by its (configuration and its) "momentum". What is the momentum of our abstract setup? As usual, the momentum should be something that resists change in velocity. It turns out that an appropriate object representing "infinitesimal resistance to change in instantaneous velocity" is a cotangent vector. A heuristic motivation for this, which the reader may choose to ignore, is the following: At any given configuration $x \in M$, the set of all possible velocity vectors is just $T_x(M)$. At any given $v \in T_x(M)$, a "resistance to change in $v$" would be some function defined near $v$ and vanishing at $v$. To first order we could replace such a function by its differential. Thus "infinitesimal resistance" is a linear function on $T_v(T_x(M))$. Since $T_x(M)$ is a vector space, we may identify $T_v(T_x(M))$ with $T_x(M)$ for each $v$. Thus all possible "momenta" become identified with all elements of $T_x^*(M)$. The set $S$ is thus taken to be the set of all momenta, i.e., the set of all cotangent vectors at all points of $M$. We must first show how to make this space into a differentiable manifold.

1. THE TANGENT AND COTANGENT BUNDLES

Let $M$ be a differentiable manifold, and let us consider the set $T(M)$ of all tangent vectors to all points of $M$. Thus
$$T(M) = \bigcup_{x \in M} T_x(M).$$
Let $\pi$ denote the map of $T(M)$ onto $M$ which assigns to each tangent vector the point of $M$ at which it is defined; that is, if $\xi \in T_x(M)$, then $\pi(\xi) = x$. We claim that $T(M)$ can be made into a differentiable manifold in a natural way, so that $\pi$ is a differentiable map. In fact, let $\mathcal{A}$ be an atlas on $M$. For any $(U, \alpha) \in \mathcal{A}$, define $T(\alpha): \pi^{-1}(U) \to V \oplus V$ by setting
$$T(\alpha)\xi = \langle \alpha(x), \xi_\alpha\rangle \quad \text{if } \xi \in T_x(M),\ x \in U. \tag{1.1}$$
We claim that the collection $(\pi^{-1}(U), T(\alpha))$ is an atlas on $T(M)$. To check that
$T(\alpha)$ is one-to-one, we observe that if $\xi \in T_x(M)$ and $\eta \in T_y(M)$ with $x \neq y$, then $\alpha(x) \neq \alpha(y)$; while if $\xi \neq \eta$ and both lie in $T_x(M)$, we have $\xi_\alpha \neq \eta_\alpha$. To check that the transition law is satisfied, we note that Eq. (4.3), Chapter 9 implies that if $(U, \alpha)$ and $(W, \beta)$ are two charts of $\mathcal{A}$,
$$T(\beta) \circ T(\alpha)^{-1}\langle v, \xi_\alpha\rangle = \langle \beta \circ \alpha^{-1}(v),\ J_{\beta\circ\alpha^{-1}}(v)\,\xi_\alpha\rangle \tag{1.2}$$
for $\langle v, \xi_\alpha\rangle \in T(\alpha)\pi^{-1}(U \cap W)$, that is, $v \in \alpha(U \cap W)$ and $\xi_\alpha$ arbitrary in $V$. The fact that the structure on $T(M)$ does not depend on the choice of $\mathcal{A}$ is obvious. That $\pi$ is differentiable is also clear; in fact, $(\pi^{-1}(U), T(\alpha))$ and $(U, \alpha)$ are compatible charts in terms of which
$$\alpha \circ \pi \circ T(\alpha)^{-1}\langle v, \xi_\alpha\rangle = v.$$
We call $T(M)$, together with its structure as a differentiable manifold, the tangent bundle of $M$.

If $M$ is finite-dimensional, let $(U, \alpha)$ be a chart of $M$ with coordinates $x^1, \ldots, x^n$, so that
$$\alpha(x) = \langle x^1(x), \ldots, x^n(x)\rangle \in \mathbb{R}^n.$$
We will denote the coordinates associated to the chart $T(\alpha)$ on $\pi^{-1}(U)$ by $\langle q^1, \ldots, q^n, \dot q^1, \ldots, \dot q^n\rangle$. Hence if $\xi \in T_x(M)$, where $x \in U$, we have
$$T(\alpha)(\xi) = \langle \alpha(x), \xi_\alpha\rangle = \langle q^1(\xi), \ldots, q^n(\xi), \dot q^1(\xi), \ldots, \dot q^n(\xi)\rangle,$$
so that
$$q^i = x^i \circ \pi \quad \text{and} \quad \xi = \sum_i \dot q^i(\xi)\left(\frac{\partial}{\partial x^i}\right)_x. \tag{1.3}$$
In other words, the $q^i$ are just the coordinates $x^i$ regarded as functions on $T(M)$ via $\pi$, and the $\dot q^i(\xi)$ are the components of $\xi$ relative to the basis
$$\left\{\left(\frac{\partial}{\partial x^1}\right)_x, \ldots, \left(\frac{\partial}{\partial x^n}\right)_x\right\}.$$

We can follow a similar procedure for
$$T^*(M) = \bigcup_{x \in M} T_x^*(M),$$
which we call the cotangent bundle. If $(U, \alpha)$ is a chart of an atlas $\mathcal{A}$ of $M$, define the map $T^*(\alpha): \pi^{-1}(U) \to V \oplus V^*$ by setting
$$T^*(\alpha)(l) = \langle \alpha(x), l_\alpha\rangle \tag{1.4}$$
for $l \in T_x^*(M)$ and $x \in U$. As before, this defines an atlas on $T^*(M)$ which, in turn, defines a differentiable structure on $T^*(M)$ which is independent of the choice of atlas of $M$. (Note that we have used the same letter, $\pi$, to denote the two projections: that of $T(M) \to M$ and that of $T^*(M) \to M$. Whenever there is any confusion, we shall denote these maps by $\pi_T$ and $\pi_{T^*}$.)
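If $M$ is finite-dimensional and $(U, \alpha)$ is a chart with coordinates $x^1, \ldots, x^n$, we shall denote the coordinates of $(\pi^{-1}(U), T^*(\alpha))$ by
$$\langle q^1, \ldots, q^n, p^1, \ldots, p^n\rangle.$$
Thus, if $l \in T_x^*(M)$, where $x \in U$, we have
$$T^*(\alpha)(l) = \langle \alpha(x), l_\alpha\rangle = \langle q^1(l), \ldots, q^n(l), p^1(l), \ldots, p^n(l)\rangle,$$
so that
$$q^i = x^i \circ \pi \quad \text{and} \quad l = \sum_i p^i(l)\,(dx^i)_x. \tag{1.5}$$

2. EQUATIONS OF VARIATION

Let $M_1$ and $M_2$ be two differentiable manifolds, and let $\varphi: M_1 \to M_2$ be a differentiable map. Then $\varphi$ induces a map $T(\varphi)$ of $T(M_1) \to T(M_2)$ when we set
$$T(\varphi)\xi = \varphi_{*x}(\xi) \quad \text{if } \xi \in T_x(M_1). \tag{2.1}$$
To check that $T(\varphi)$ is differentiable, let us choose compatible charts $(U, \alpha)$ on $M_1$ and $(W, \beta)$ on $M_2$. Then it follows from the definition of $T(\varphi)$ that $(T(U), T(\alpha))$ and $(T(W), T(\beta))$ are compatible charts, where $T(U) = \pi^{-1}(U)$ and $T(W) = \pi^{-1}(W)$. Furthermore,
$$T(\beta) \circ T(\varphi) \circ T(\alpha)^{-1}\langle v_1, v_2\rangle = \langle (\beta\circ\varphi\circ\alpha^{-1})v_1,\ J_{\beta\circ\varphi\circ\alpha^{-1}}(v_1)\,v_2\rangle. \tag{2.2}$$
This establishes that $T(\varphi)$ is differentiable. Observe that if $\psi: M_2 \to M_3$ is a second differentiable map, then it follows from Eq. (4.4) of Chapter 9 that
$$T(\psi \circ \varphi) = T(\psi) \circ T(\varphi). \tag{2.3}$$
Furthermore, it is clear from the definition that
$$\pi \circ T(\varphi) = \varphi \circ \pi. \tag{2.4}$$
In other words, the diagram
$$\begin{array}{ccc} T(M_1) & \xrightarrow{\ T(\varphi)\ } & T(M_2) \\ \downarrow{\scriptstyle \pi} & & \downarrow{\scriptstyle \pi} \\ M_1 & \xrightarrow{\ \ \varphi\ \ } & M_2 \end{array}$$
commutes in the sense that it doesn't matter which path one takes to get from $T(M_1)$ to $M_2$.

In particular, if $\{\varphi_t\}$ is a one-parameter group of diffeomorphisms of $M$, we get a one-parameter group $T(\varphi_t)$ on $T(M)$, where, by (2.4),
$$\pi \circ T(\varphi_t)\xi = \varphi_t(\pi(\xi)). \tag{2.5}$$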
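As a concrete check of (2.3) and (2.4) in the simplest setting, the following sketch takes $M_1 = M_2 = M_3 = \mathbb{R}^2$ with the identity chart, so that $T(\varphi)\langle v, w\rangle = \langle \varphi(v), J_\varphi(v)w\rangle$ as in (2.2). The maps `phi` and `psi`, the helper names, and the use of a finite-difference Jacobian are illustrative assumptions, not part of the text.

```python
import math

def jacobian_apply(f, v, w, h=1e-6):
    """Approximate J_f(v) w by a central difference along the direction w."""
    plus = f([v[i] + h * w[i] for i in range(len(v))])
    minus = f([v[i] - h * w[i] for i in range(len(v))])
    return [(a - b) / (2 * h) for a, b in zip(plus, minus)]

def tangent_map(f):
    """Return T(f): <v, w> -> <f(v), J_f(v) w>, the local form (2.2)."""
    def Tf(point):
        v, w = point
        return (f(v), jacobian_apply(f, v, w))
    return Tf

def projection(point):
    """The bundle projection pi: <v, w> -> v."""
    return point[0]

# Two hypothetical smooth maps of R^2 to itself.
def phi(v):
    x, y = v
    return [x + y * y, math.sin(x) + y]

def psi(v):
    x, y = v
    return [x * y, x - 2.0 * y]

def compose(g, f):
    return lambda v: g(f(v))

xi = ([0.3, -0.7], [1.0, 2.0])                   # a tangent vector <v, w>
lhs = tangent_map(compose(psi, phi))(xi)         # T(psi o phi) xi
rhs = tangent_map(psi)(tangent_map(phi)(xi))     # T(psi) T(phi) xi

# (2.3): both sides agree up to finite-difference error.
chain_rule_error = max(abs(a - b) for a, b in zip(lhs[0] + lhs[1], rhs[0] + rhs[1]))
# (2.4): pi o T(phi) = phi o pi.
projection_error = max(
    abs(a - b) for a, b in zip(projection(tangent_map(phi)(xi)), phi(projection(xi)))
)
print(chain_rule_error, projection_error)
```

The first printed number should be small (limited only by the finite-difference step), and the second should vanish, since the base point of $T(\varphi)\xi$ is computed as $\varphi(\pi(\xi))$.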
If $X$ is the infinitesimal generator of $\{\varphi_t\}$, let us denote by $T(X)$ the infinitesimal generator of $\{T(\varphi_t)\}$. It follows from (2.5) that if $\xi \in T_x(M)$, then
$$\pi_{*\xi}(T(X)_\xi) = X_x. \tag{2.6}$$
Let us obtain the expression for $T(X)$ in terms of a chart: Let $(U, \alpha)$ be a chart, and choose an open set $W$ such that $W \subset U$ and $\epsilon > 0$ such that $\varphi_t(W) \subset U$ for $|t| < \epsilon$. Then
$$T(\alpha) \circ T(\varphi_t) \circ T(\alpha)^{-1}\langle v, w\rangle = \langle (\alpha\circ\varphi_t\circ\alpha^{-1})v,\ J_{\alpha\circ\varphi_t\circ\alpha^{-1}}(v)\,w\rangle,$$
so that, differentiating with respect to $t$ at $t = 0$, we see by Eq. (4.9) of Chapter 9 that
$$T(X)_{T(\alpha)}\langle v, w\rangle = \langle X_\alpha(v),\ d(X_\alpha)_v(w)\rangle. \tag{2.7}$$
If $M$ is finite-dimensional, $x^1, \ldots, x^n$ are the coordinates of $(U, \alpha)$, and $X_\alpha = \langle X^1, \ldots, X^n\rangle$, then we can rewrite (2.7) as
$$T(X)_{T(\alpha)}\langle v, w\rangle = \left\langle X^1(v), \ldots, X^n(v),\ \sum_j \frac{\partial X^1}{\partial x^j}w^j, \ldots, \sum_j \frac{\partial X^n}{\partial x^j}w^j\right\rangle, \tag{2.7'}$$
where $w = \langle w^1, \ldots, w^n\rangle$. In other words, in terms of the local coordinates $\langle q^1, \ldots, q^n, \dot q^1, \ldots, \dot q^n\rangle$ the differential equations corresponding to $T(X)$ take the form
$$\frac{dq^i}{dt} = X^i(q^1, \ldots, q^n) \tag{2.8}$$
and
$$\frac{d\dot q^i}{dt} = \frac{\partial X^i}{\partial q^1}\dot q^1 + \cdots + \frac{\partial X^i}{\partial q^n}\dot q^n. \tag{2.9}$$
Note that Eqs. (2.8) are just the local equations corresponding to the vector field $X$. Since $q^i = x^i \circ \pi$, this is simply another way of writing (2.6).

Suppose that $\varphi_t(x) = \langle x^1(t), \ldots, x^n(t)\rangle$ is a solution curve of $X$ lying in $U$. Then we can regard the coefficients $\partial X^i/\partial q^j$, evaluated along this curve, as functions of $t$ alone. Thus (2.9) takes the form
$$\frac{dw}{dt} = A(t)w$$
of a linear differential equation for $w$. This linear differential equation is called the equation of variation of $X$ along the curve $\varphi_{\cdot}(x)$. Roughly speaking, it represents, to linear approximation, how solution curves of $X$ near $\varphi_{\cdot}(x)$ are deviating from $\varphi_{\cdot}(x)$.

We now go through a similar construction for the cotangent bundle. If $\varphi: M_1 \to M_2$ is a differentiable map, then $\varphi^*: T^*_{\varphi(x)}(M_2) \to T^*_x(M_1)$ is going in the wrong direction. We therefore restrict our attention to maps which are locally diffeomorphisms, and we define $T^*(\varphi)$ by setting
$$T^*(\varphi)l = (\varphi^{-1})^* l = (\varphi_x^*)^{-1} l \quad \text{if } l \in T_x^*(M_1). \tag{2.10}$$
Since $\varphi$ is a diffeomorphism, $(\varphi_x^*)^{-1}$ is well defined, and we have
$$\pi \circ T^*(\varphi) = \varphi \circ \pi. \tag{2.11}$$
If $(U, \alpha)$ and $(W, \beta)$ are compatible charts on $M_1$ and $M_2$, then
$$T^*(\beta) \circ T^*(\varphi) \circ T^*(\alpha)^{-1}\langle z_1, z_2\rangle = \langle (\beta\circ\varphi\circ\alpha^{-1})z_1,\ [J_{\beta\circ\varphi\circ\alpha^{-1}}(z_1)]^{*-1}z_2\rangle, \tag{2.12}$$
and so $T^*(\varphi)$ is differentiable. If $\varphi: M_1 \to M_2$ and $\psi: M_2 \to M_3$ are diffeomorphisms, then $(\psi\circ\varphi)^* = \varphi^* \circ \psi^*$, and hence
$$[(\psi\circ\varphi)^*]^{-1} = (\psi^*)^{-1} \circ (\varphi^*)^{-1},$$
so that
$$T^*(\psi \circ \varphi) = T^*(\psi) \circ T^*(\varphi). \tag{2.13}$$
In particular, if $\{\varphi_t\}$ is a flow on a manifold $M$, we obtain a flow $T^*(\varphi_t)$ on $T^*(M)$. It satisfies
$$\pi \circ T^*(\varphi_t)l = \varphi_t(\pi(l)). \tag{2.14}$$
If $X$ is the infinitesimal generator of $\{\varphi_t\}$, we denote by $T^*(X)$ the infinitesimal generator of $\{T^*(\varphi_t)\}$. It satisfies
$$\pi_* T^*(X)_l = X_x \quad \text{for } l \in T_x^*(M). \tag{2.15}$$

3. THE FUNDAMENTAL LINEAR DIFFERENTIAL FORM ON $T^*(M)$

Before returning to our study of mechanics, we study in some detail the geometry of the cotangent bundle. Let $M$ be a differentiable manifold, and let $z$ be a point of $T^*(M)$. Let $\xi$ be a tangent vector to $T^*(M)$ at the point $z$, so that $\xi \in T_z(T^*(M))$. Then $\pi_*\xi$ is an element of $T_{\pi(z)}(M)$. Since $z \in T^*_{\pi(z)}(M)$ is a linear function on $T_{\pi(z)}(M)$, we can consider the expression
$$(\pi_*\xi, z),$$
which depends linearly on $\xi$. We denote this linear function of $\xi$ by $\theta_z$. We have
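thus defined a linear differential form $\theta$ on $T^*(M)$ by setting
$$(\xi, \theta_z) = (\pi_{*z}\xi, z) \quad \text{for } \xi \in T_z(T^*(M)). \tag{3.1}$$
The form $\theta$ is called the fundamental linear form of $T^*(M)$.

Let us obtain the expression for $\theta$ in terms of a chart $(\pi^{-1}(U), T^*(\alpha))$. Since $T^*(\alpha)$ maps $\pi^{-1}(U)$ onto an open set, $O$, of $V \oplus V^*$, the expression $\theta_{T^*(\alpha)}$ should be a function from $O$ to $(V \oplus V^*)^*$, which can be identified with $V^* \oplus V$. Let us evaluate this function. In terms of the chart $(U, \alpha)$ on $M$ and $(\pi^{-1}(U), T^*(\alpha))$ on $T^*(M)$, we have
$$\xi_{T^*(\alpha)} = \langle v, w^*\rangle \in V \oplus V^* \quad \text{and} \quad (\pi_*\xi)_\alpha = v \in V,$$
so that $(\pi_*\xi, z) = (v, z_\alpha)$. Thus
$$(\langle v, w^*\rangle, (\theta_z)_{T^*(\alpha)}) = (v, z_\alpha);$$
that is,
$$(\theta_z)_{T^*(\alpha)} = \langle z_\alpha, 0\rangle \in V^* \oplus V = (V \oplus V^*)^*.$$
In other words, the local expression $\theta_\alpha$ for the differential form $\theta$ in terms of the chart $T^*(\alpha)$ is given by
$$\theta_\alpha\langle u_1, u_2\rangle = \langle u_2, 0\rangle. \tag{3.2}$$
If $M$ is finite-dimensional and we use the local coordinates $\langle q^1, \ldots, q^n, p^1, \ldots, p^n\rangle$, then
$$\pi_*\left(\frac{\partial}{\partial q^i}\right) = \frac{\partial}{\partial x^i} \quad \text{and} \quad \pi_*\left(\frac{\partial}{\partial p^i}\right) = 0, \quad \text{while} \quad z = \sum_i p^i(z)\,dx^i.$$
Thus
$$\left(\frac{\partial}{\partial q^i}, \theta\right) = p^i \quad \text{while} \quad \left(\frac{\partial}{\partial p^i}, \theta\right) = 0,$$
so that
$$\theta = \sum_i p^i\,dq^i. \tag{3.3}$$
Of course, (3.3) is just a way of writing (3.2) in terms of a basis of $V = \mathbb{R}^n$.

Let $\varphi: M_1 \to M_2$ be a diffeomorphism, let $\theta_1$ be the fundamental linear form on $T^*(M_1)$, and let $\theta_2$ be the fundamental linear form on $T^*(M_2)$. Let $\xi \in T_z(T^*(M_1))$ and $\pi(z) = x \in M_1$.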
Then, by (2.11), $\pi_* T^*(\varphi)_*\xi = \varphi_*\pi_*\xi$, and so, by (3.1) and (2.10),
$$(T^*(\varphi)_*\xi, \theta_2) = (\varphi_*\pi_*\xi, (\varphi^{-1})^*z) = (\pi_*\xi, z) = (\xi, \theta_1).$$
In other words,
$$(T^*(\varphi))^*\theta_2 = \theta_1. \tag{3.4}$$
In particular, if $\{\varphi_t\}$ is a flow on $M$, then $(T^*(\varphi_t))^*\theta = \theta$. If $X$ is a vector field on $M$, then the infinitesimal version of the last equation is
$$D_{T^*(X)}\theta = 0. \tag{3.5}$$
Note that any vector field $X$ on $M$ defines a function $f_X$ on $T^*(M)$ by
$$f_X(z) = (X_x, z) \quad \text{if } \pi(z) = x. \tag{3.6}$$
We also have
$$f_X = (T^*(X), \theta), \tag{3.7}$$
since $(T^*(X)_z, \theta_z) = (\pi_*T^*(X)_z, z) = (X_x, z)$ by (2.15). Finally, in view of (3.5) and Eq. (6.14), Chapter 11, we have
$$0 = D_{T^*(X)}\theta = d(T^*(X), \theta) + T^*(X) \lrcorner\, d\theta,$$
so that
$$df_X = -T^*(X) \lrcorner\, d\theta. \tag{3.8}$$
For reasons which will become clear later on, the function $f_X$ is sometimes called the momentum function associated to the vector field $X$.

4. THE FUNDAMENTAL EXTERIOR TWO-FORM ON $T^*(M)$

It turns out that the exterior two-form
$$\Omega = d\theta \tag{4.1}$$
plays a fundamental role in mechanics, and we therefore study some of its properties. First of all, since $d^2 = 0$, we have
$$d\Omega = 0. \tag{4.2}$$
We claim that $\Omega_z$ is a nonsingular bilinear form on $T_z(T^*(M))$ for each $z \in T^*(M)$. That is,
$$\text{if } \xi \in T_z(T^*(M)) \text{ is such that } \xi \lrcorner\, \Omega_z = 0, \text{ then } \xi = 0. \tag{4.3}$$
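For this purpose we compute the local expression for $\Omega$ in terms of a chart $(\pi^{-1}(U), T^*(\alpha))$ [where $z \in \pi^{-1}(U)$]. The map $T^*(\alpha)$ gives a diffeomorphism of $\pi^{-1}(U)$ with a subset of $V \oplus V^*$, and by (3.4) the form $\theta$ on $\pi^{-1}(U)$ carries over to the corresponding form $\theta_\alpha$ on $V \oplus V^* = T^*(V)$. Also, $\theta_\alpha$ can be regarded as a $(V^* \oplus V)$-valued function which is given by (3.2). Let us denote $\xi_{T^*(\alpha)}$ by $\xi_\alpha$, and let
$$\xi_\alpha = \langle X_1, X_2\rangle \in V \oplus V^*$$
be considered a constant vector field on $V \oplus V^*$. Then
$$(\xi_\alpha, \theta_\alpha)\langle u_1, u_2\rangle = (X_1, u_2),$$
and so $d(\xi_\alpha, \theta_\alpha)$ is the linear differential form given by
$$(\eta_\alpha, d(\xi_\alpha, \theta_\alpha)) = (X_1, Y_2) \quad \text{if } \eta_\alpha = \langle Y_1, Y_2\rangle \in V \oplus V^*.$$
On the other hand, since $\xi_\alpha$ is a constant vector field, the Lie derivative $D_{\xi_\alpha}\theta_\alpha$ reduces to the ordinary derivative of the linear function $\langle u_1, u_2\rangle \mapsto \langle u_2, 0\rangle$. This derivative is just the constant $\langle X_2, 0\rangle$. Thus the Lie derivative $D_{\xi_\alpha}\theta_\alpha$ is given by the constant linear form
$$D_{\xi_\alpha}\theta_\alpha = \langle X_2, 0\rangle \in V^* \oplus V;$$
that is,
$$(\eta_\alpha, D_{\xi_\alpha}\theta_\alpha) = (Y_1, X_2).$$
Now $\xi_\alpha \lrcorner\, \Omega_\alpha = \xi_\alpha \lrcorner\, d\theta_\alpha = D_{\xi_\alpha}\theta_\alpha - d(\xi_\alpha, \theta_\alpha)$, so we see that $\xi_\alpha \lrcorner\, \Omega_\alpha$ is the constant form
$$\xi_\alpha \lrcorner\, \Omega_\alpha = \langle X_2, -X_1\rangle \in V^* \oplus V.$$
To recapitulate, if $\xi \in T_z(T^*(M))$ is such that $\xi_{T^*(\alpha)}$ is given by $\xi_{T^*(\alpha)} = \langle X_1, X_2\rangle \in V \oplus V^*$, then
$$(\xi \lrcorner\, \Omega_z)_{T^*(\alpha)} = \langle X_2, -X_1\rangle \in V^* \oplus V. \tag{4.4}$$
Equation (4.3) clearly follows from this.

Since (4.3) is of fundamental importance, we present an alternative derivation for the case of a finite-dimensional manifold. If we use the coordinates $\langle q^1, \ldots, q^n, p^1, \ldots, p^n\rangle$, then
$$\theta = \sum_i p^i\,dq^i,$$
so that
$$\Omega = \sum_i dp^i \wedge dq^i. \tag{4.5}$$
If
$$X = \sum_i \left(A^i\frac{\partial}{\partial q^i} + B^i\frac{\partial}{\partial p^i}\right),$$
then
$$X \lrcorner\, \Omega = \sum_i (B^i\,dq^i - A^i\,dp^i). \tag{4.6}$$
Thus $(X \lrcorner\, \Omega)_z = 0$ if and only if $X_z = 0$.

This shows that on $T^*(M)$ we have a one-to-one correspondence, $X \mapsto X \lrcorner\, \Omega$, between vector fields and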
linear differential forms. Let us denote by $\omega_X$ the form corresponding to $X$, so that $\omega_X = X \lrcorner\, \Omega$; and let us denote by $X_\omega$ the vector field corresponding to the linear differential form $\omega$. Thus $\omega = X_\omega \lrcorner\, \Omega = \omega_{X_\omega}$. Observe that
$$d\omega = 0 \quad \text{if and only if} \quad D_{X_\omega}\Omega = 0. \tag{4.7}$$
In fact, by (4.2),
$$D_{X_\omega}\Omega = d(X_\omega \lrcorner\, \Omega) = d\omega.$$
In particular, there is a distinguished class of vector fields on $T^*(M)$: those corresponding to functions, i.e., the vector fields of the form $X_{dF}$, where $F$ is a function on $T^*(M)$. These vector fields are called Hamiltonian vector fields.

*In view of (4.7), if $D_X\Omega = 0$, then locally, at least, $X$ is of the form $X_{dF}$, since any form $\omega$ with $d\omega = 0$ can be written locally as $\omega = dF$. If we make the topological assumption that $T^*(M)$ has vanishing first cohomology group, then $D_X\Omega = 0$ is equivalent to $X$ being Hamiltonian. This assumption is really a restriction on the nature of the configuration space $M$. Since we do not wish to restrict $M$ in this manner, we will not take $D_X\Omega = 0$ as the definition of a Hamiltonian vector field.*

Note that if $X$ and $Y$ are Hamiltonian vector fields, so is $aX + bY$, where $a$ and $b$ are constants. In fact, if $X = X_{dF}$ and $Y = X_{dG}$, then $aX + bY = X_{d(aF+bG)}$. Furthermore, $[X, Y]$ is also a Hamiltonian vector field. In fact,
$$D_X(Y \lrcorner\, \Omega) = D_XY \lrcorner\, \Omega + Y \lrcorner\, D_X\Omega = D_XY \lrcorner\, \Omega,$$
since $D_X\Omega = 0$. Since $D_XY = [X, Y]$, we see that
$$[X, Y] \lrcorner\, \Omega = D_X\,dG = d\,D_XG.$$
In other words, we have
$$[X_{dF}, X_{dG}] = X_{d(X_{dF}G)}. \tag{4.8}$$
We thus see that we have a binary operation on functions corresponding to the Lie bracket on Hamiltonian vector fields. It is called the Poisson bracket and is denoted by $\{F, G\}$. In other words, we define
$$\{F, G\} = X_{dF}G, \tag{4.9}$$
so that we can rewrite (4.8) as
$$[X_{dF}, X_{dG}] = X_{d\{F,G\}}. \tag{4.8'}$$
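To make (4.9) and (4.8') concrete, the sketch below works on $T^*(\mathbb{R})$ with coordinates $\langle q, p\rangle$. Setting $X \lrcorner\, \Omega = dF$ in (4.6) gives $X_{dF} = (\partial F/\partial q)\,\partial/\partial p - (\partial F/\partial p)\,\partial/\partial q$, and hence $\{F, G\} = F_q G_p - F_p G_q$. The sample functions `F`, `G`, `K`, the evaluation point, and the finite-difference derivatives are assumptions made only for illustration.

```python
import math

def partial(f, index, h=1e-3):
    """Central-difference partial derivative of f(q, p) in q (index 0) or p (index 1)."""
    def df(q, p):
        if index == 0:
            return (f(q + h, p) - f(q - h, p)) / (2 * h)
        return (f(q, p + h) - f(q, p - h)) / (2 * h)
    return df

def poisson(F, G):
    """{F, G} = X_dF G = F_q G_p - F_p G_q in the coordinates <q, p>."""
    Fq, Fp = partial(F, 0), partial(F, 1)
    Gq, Gp = partial(G, 0), partial(G, 1)
    return lambda q, p: Fq(q, p) * Gp(q, p) - Fp(q, p) * Gq(q, p)

# Hypothetical sample functions on T*(R).
F = lambda q, p: q * q * p + p
G = lambda q, p: 0.5 * p * p + math.cos(q)
K = lambda q, p: q * p

q0, p0 = 0.4, -1.1

# Antisymmetry of the bracket.
antisymmetry_error = abs(poisson(F, G)(q0, p0) + poisson(G, F)(q0, p0))

# (4.8') applied to a test function K:
# [X_dF, X_dG] K = X_dF(X_dG K) - X_dG(X_dF K) should equal X_d{F,G} K.
commutator = poisson(F, poisson(G, K))(q0, p0) - poisson(G, poisson(F, K))(q0, p0)
bracket_field = poisson(poisson(F, G), K)(q0, p0)
commutator_error = abs(commutator - bracket_field)
print(antisymmetry_error, commutator_error)
```

Both printed numbers should be small; the second is limited by the accuracy of the nested finite differences rather than by any failure of (4.8').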
Note that
$$X_{dF}G = (X_{dF}, dG) = (X_{dF}, X_{dG} \lrcorner\, \Omega) = (X_{dG} \wedge X_{dF}, \Omega), \tag{4.10}$$
so that, in particular,
$$\{F, G\} = -\{G, F\}. \tag{4.11}$$
In terms of local coordinates $\langle q^1, \ldots, q^n, p^1, \ldots, p^n\rangle$, we have
$$dF = \sum_i \left(\frac{\partial F}{\partial q^i}\,dq^i + \frac{\partial F}{\partial p^i}\,dp^i\right),$$
so that by (4.4),
$$X_{dF} = \sum_i \left(\frac{\partial F}{\partial q^i}\frac{\partial}{\partial p^i} - \frac{\partial F}{\partial p^i}\frac{\partial}{\partial q^i}\right), \tag{4.12}$$
and therefore
$$\{F, G\} = \sum_i \left(\frac{\partial F}{\partial q^i}\frac{\partial G}{\partial p^i} - \frac{\partial F}{\partial p^i}\frac{\partial G}{\partial q^i}\right), \tag{4.13}$$
from which the antisymmetry (4.11) is apparent. A consequence of (4.11) is the following:

Proposition 4.1. If $F$ and $G$ are functions on $T^*(M)$ such that $X_{dF}G = 0$, then $X_{dG}F = 0$. In other words, if $G$ is constant along the solution curves of $X_{dF}$, then $F$ is constant along the solution curves of $X_{dG}$.

In fact, $X_{dF}G = \{F, G\} = -\{G, F\} = -X_{dG}F$.

It will turn out that Proposition 4.1 is the prototype of all the "conservation laws" of mechanics.

We close this section with the following observation. Let $Y$ be a vector field on $M$. Then the momentum function of $Y$ is a function $f_Y$ on $T^*(M)$. Equation (3.8) asserts that
$$-T^*(Y) = X_{df_Y}. \tag{4.14}$$

5. HAMILTONIAN MECHANICS

As we indicated in the introduction, the first fundamental assumption of mechanics is that the evolution of the system is determined by a flow on $T^*(M)$, where $M$ is the configuration space of the system. The second fundamental assumption concerns the character of the flow. It is assumed that the infinitesimal generator of the flow is a Hamiltonian vector field. That is, it is assumed that there is a function $H$ (called the energy) on $T^*(M)$ such that the vector field $X_{-dH}$ is the infinitesimal generator of the flow on $T^*(M)$ describing the
evolution of the system. (The minus sign is a consequence of certain standard conventions.) In order to see what this means, let us express the equations of motion in terms of $q$- and $p$-coordinates for a finite-dimensional system. By (4.12),
$$X_{-dH} = \sum_i \left(\frac{\partial H}{\partial p^i}\frac{\partial}{\partial q^i} - \frac{\partial H}{\partial q^i}\frac{\partial}{\partial p^i}\right).$$
Thus, if $\langle q^1(\cdot), \ldots, q^n(\cdot), p^1(\cdot), \ldots, p^n(\cdot)\rangle$ is an integral curve of the flow, it must satisfy the differential equations
$$\frac{dq^i}{dt} = \frac{\partial H}{\partial p^i}, \qquad \frac{dp^i}{dt} = -\frac{\partial H}{\partial q^i}. \tag{5.1}$$
A trivial consequence of (4.11) is that $X_{-dH}H = 0$. In other words, the function $H$ is constant along trajectories of the system. This principle is known as the law of conservation of energy. More generally, we can formulate Proposition 4.1 as:

Proposition 5.1. Let $X_{-dH}$ be the infinitesimal generator of a mechanical system with energy $H$. Let $F$ be a function such that $X_{dF}H = 0$. Then $F$ is constant on the trajectories, i.e., solution curves, of the flow generated by $X_{-dH}$.

Proposition 5.1 is the prototype of all the "momentum conservation" laws we shall derive later; see, for example, the discussion at the beginning of Section 6.

In order to specify the mechanical system, one must give the function $H$. It turns out (in many but not all cases) that the energy is the sum of two terms, $H = K + U$, where $K$ is called the kinetic energy and $U$ is called the potential energy. They each have a very special form which we now describe.

The kinetic energy is a function on $T^*(M)$ which is associated with a Riemann metric on $M$. Let $(\ ,\ )$ be a Riemann metric on $M$. It gives a scalar product $(\ ,\ )_x$ on each $T_x(M)$, and therefore induces an isomorphism of $T_x(M)$ with $T_x^*(M)$, and thus gives a scalar product on $T_x^*(M)$ which we will continue to denote by $(\ ,\ )$. The function $K$ is then defined by
$$K(l) = \tfrac{1}{2}(l, l). \tag{5.2}$$
To understand the relevance of a Riemann metric to mechanics, let us consider the most elementary case: the study of a single particle of mass $m$ in $E^3$.
The usual relation between velocity and momentum, $p = m\dot q$, can be formulated as follows: Consider the Riemann metric on $E^3$ which is $m \times$ the Euclidean metric. That is, if $\langle x, y, z\rangle$ are rectangular coordinates on
$E^3$ and $\langle q_x, q_y, q_z, \dot q_x, \dot q_y, \dot q_z\rangle$ are the corresponding coordinates on $T(E^3)$, then
$$\|\langle \dot q_x, \dot q_y, \dot q_z\rangle\|^2 = m\dot q_x^2 + m\dot q_y^2 + m\dot q_z^2.$$
Then the map of $T(E^3) \to T^*(E^3)$ sends
$$\langle q_x, q_y, q_z, \dot q_x, \dot q_y, \dot q_z\rangle \mapsto \langle q_x, q_y, q_z, p_x, p_y, p_z\rangle,$$
where
$$p_x = m\dot q_x, \quad p_y = m\dot q_y, \quad p_z = m\dot q_z, \tag{5.3}$$
and
$$K(q_x, q_y, q_z, p_x, p_y, p_z) = \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2). \tag{5.4}$$
Thus the passage from velocity to momentum depends on the choice of a Riemann metric, which determines a map from $T(M) \to T^*(M)$. This can be regarded as a generalized "choice of mass" for the configuration.

The function $U$ is assumed to be of the form $U = \bar U \circ \pi$, where $\bar U$ is a function on $M$. The form $F = -d\bar U$ is called the force field whose potential is $\bar U$. It can be regarded as a vector field on $M$ in view of the Riemann metric on $M$:
$$(\xi, F_x) = -(\xi, d\bar U) \quad \text{for any } \xi \in T_x(M).$$
In the special case of a single particle in $E^3$ with mass $m$, substituting (5.3) and (5.4) into Eq. (5.1) gives, when we write $H = K + U$ and $\mathbf{q} = \langle q_x, q_y, q_z\rangle$, $\mathbf{p} = \langle p_x, p_y, p_z\rangle$,
$$\frac{d\mathbf{q}}{dt} = \frac{1}{m}\mathbf{p}, \qquad \frac{d\mathbf{p}}{dt} = -d\bar U = F, \tag{5.5}$$
or, since $m$ is constant,
$$m\ddot{\mathbf{q}} = F, \tag{5.6}$$
which is the usual rule stating that force = mass $\times$ acceleration.

Returning to the general theory, we now formulate a useful corollary of Proposition 5.1.

Proposition 5.2. Let $H = K + U$ be the energy associated with the Riemann metric $(\ ,\ )$ and the function $\bar U$ on $M$. Let $Y$ be a vector field on $M$ which is an infinitesimal isometry of $(\ ,\ )$ and is such that $Y\bar U = 0$. Then the momentum function $f_Y$ is constant under the flow generated by $X_{-dH}$.

Let $\{\varphi_t\}$ be the flow generated by $Y$. Since $\varphi_t$ is a local isometry, $(\varphi_t^*l, \varphi_t^*l) = (l, l)$ wherever $\varphi_t^*l$ is defined, so that $K(T^*(\varphi_t)l) = K(l)$, and thus $T^*(Y)K = 0$.
Also,
$$T^*(Y)U = (T^*(Y), dU) = (T^*(Y), \pi^*d\bar U) = (\pi_*T^*(Y), d\bar U) = (Y, d\bar U) = 0,$$
so that $T^*(Y)H = 0$ and Proposition 5.1 applies [see Eq. (4.14)].

6. THE CENTRAL-FORCE PROBLEM

Before proceeding with the general theory, we illustrate the previous results in a simple but important case. We will study the motion of a particle in $E^3$ acted on by a "force centered at the origin". That is, our configuration space $M$ will be taken to be $E^3 - \{0\}$, and the Riemann metric is $m \times$ the Euclidean metric, where $m$ is the "mass" of the particle. We also assume that the function $U$ depends only on the distance to the origin. Thus
$$U(x, y, z) = P(r), \quad \text{where } r^2 = x^2 + y^2 + z^2.$$
Under these circumstances it is clear that any rotation about the origin is an isometry of the Riemann metric and preserves $U$. We can therefore apply Proposition 5.2 to the infinitesimal rotation $x(\partial/\partial y) - y(\partial/\partial x)$ to conclude that the corresponding momentum function $xp_y - yp_x$ is constant. (This momentum function is known as the angular momentum about the $z$-axis.) Similarly, the functions $xp_z - zp_x$ and $yp_z - zp_y$ must be constant on any trajectory of the flow. If we write $\mathbf{x} = \langle x, y, z\rangle$ and $\mathbf{p} = \langle p_x, p_y, p_z\rangle$, then these three conservation laws can be combined to read
$$\mathbf{x} \wedge \mathbf{p} = \text{const}, \tag{6.1}$$
which is known as the law of conservation of angular momentum. Here $\mathbf{x}$ and $\mathbf{p}$ are considered vector-valued functions on $T^*(M)$.

In order to study the implications of (6.1), let us first distinguish two cases: where the constant is nonzero and where the constant vanishes. If the constant occurring in (6.1) does not vanish, then (6.1) implies that the plane spanned by $\mathbf{x}$ and $\mathbf{p}$ does not change. In particular, the motion is such that $\mathbf{x}$ lies in a fixed plane. If $\mathbf{x} \wedge \mathbf{p} = 0$, we argue as follows: Since $\mathbf{p} = m\dot{\mathbf{x}}$, we have $\mathbf{x} \wedge \dot{\mathbf{x}} = 0$. Since $\mathbf{x} \neq 0$, this implies that $\dot{\mathbf{x}} = \lambda\mathbf{x}$ for some function $\lambda$ of time.
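Now $\|\mathbf{x}\| = (\mathbf{x}, \mathbf{x})^{1/2}$, so
$$\frac{d}{dt}\|\mathbf{x}\| = \frac{d}{dt}(\mathbf{x}, \mathbf{x})^{1/2} = \frac{(\dot{\mathbf{x}}, \mathbf{x})}{\|\mathbf{x}\|} = \lambda\|\mathbf{x}\|, \tag{6.2}$$
and therefore
$$\frac{d}{dt}\left(\frac{\mathbf{x}}{\|\mathbf{x}\|}\right) = \frac{\lambda\mathbf{x}}{\|\mathbf{x}\|} - \frac{\lambda\|\mathbf{x}\|\,\mathbf{x}}{\|\mathbf{x}\|^2} = 0,$$
so that $\mathbf{x}/\|\mathbf{x}\| = \text{const}$; that is, $\mathbf{x}$ lies on a fixed ray through the origin.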
If we differentiate (6.2) and make use of the fact that $\mathbf{x}/\|\mathbf{x}\|$ is constant, we get, by (5.6),
$$m\frac{d^2\|\mathbf{x}\|}{dt^2} = \left(\frac{\mathbf{x}}{\|\mathbf{x}\|},\ m\frac{d^2\mathbf{x}}{dt^2}\right) = -\left(\frac{\mathbf{x}}{\|\mathbf{x}\|},\ P'(\|\mathbf{x}\|)\frac{\mathbf{x}}{\|\mathbf{x}\|}\right);$$
that is,
$$m\frac{d^2r}{dt^2} = -P'(r), \tag{6.3}$$
where $r = \|\mathbf{x}\|$.

In any event, in all cases the particle $\mathbf{x}$ moves in a plane. We may therefore restrict our attention to the plane. That is, we can consider a "new" mechanical system where $M = E^2 - \{0\}$ with its Riemann metric, the function $U = P(r)$, etc. Let us introduce polar "coordinates" in the plane. Then $\partial/\partial\theta$ preserves the metric and the potential. If $\langle r, \theta, p_r, p_\theta\rangle$ are the corresponding coordinates on $T^*(M)$, then Proposition 5.2 implies that
$$p_\theta = \text{const}. \tag{6.4}$$
(Note that this is really just that part of (6.1) that we haven't yet used.) Now in terms of polar coordinates, the Euclidean metric has the form of Eq. (8.7) of Chapter 9, so that the Riemann metric associated to the mass $m$, which is $m \times$ the Euclidean metric, is given by
$$\|\langle \dot r, \dot\theta\rangle\|^2 = m\dot r^2 + mr^2\dot\theta^2.$$
In particular, the associated map from $T(M)$ to $T^*(M)$ is given by
$$p_r = m\dot r \quad \text{and} \quad p_\theta = mr^2\dot\theta. \tag{6.5}$$
Thus (6.4) says that
$$r^2\dot\theta = \text{const}. \tag{6.4'}$$
To understand the significance of (6.4'), consider a curve $\mathbf{x}(\cdot)$ in the plane. Consider the region $D_t$ bounded by the portion of the ray from $0$ to $\mathbf{x}(0)$, the portion of the ray from $0$ to $\mathbf{x}(t)$, and the curve $\mathbf{x}(\cdot)$ from $0$ to $t$. (See Fig. 13.1.) Then
$$\frac{1}{2}\int_{\partial D_t} r^2\,d\theta = \int_{D_t} r\,dr \wedge d\theta$$
is the area of $D_t$. On the other hand, since $d\theta = 0$ on the rays, we see that
$$\int_{\partial D_t} r^2\,d\theta = \int_0^t r^2\dot\theta\,dt.$$
Thus (6.4') is exactly the content of Kepler's second law: The particle sweeps out area at a constant rate. Thus we have seen that the spherical symmetry of the system implies that the motion is in a plane and that Kepler's second law holds.
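The following sketch illustrates conservation of angular momentum and Kepler's second law numerically for planar motion. It integrates the equations $d\mathbf{x}/dt = \mathbf{p}/m$, $d\mathbf{p}/dt = -P'(r)\,\mathbf{x}/r$ with a leapfrog scheme, a choice made here (not in the text) because each substep moves $\mathbf{x}$ along $\mathbf{p}$ or $\mathbf{p}$ along $\mathbf{x}$ and therefore leaves $xp_y - yp_x$ unchanged up to rounding. The potential $P(r) = -1/r$, the mass, and the initial data are assumed sample values.

```python
def leapfrog_central_force(m, dP, x, p, dt, steps):
    """Integrate dx/dt = p/m, dp/dt = -P'(r) x/r in the plane (leapfrog scheme).

    Returns the angular momenta x ^ p recorded at every step and the total
    signed area swept out, accumulated as triangles with vertices 0, x_k, x_{k+1}.
    """
    def force(pos):
        r = (pos[0] ** 2 + pos[1] ** 2) ** 0.5
        return (-dP(r) * pos[0] / r, -dP(r) * pos[1] / r)

    momenta, area = [x[0] * p[1] - x[1] * p[0]], 0.0
    for _ in range(steps):
        f = force(x)
        p = (p[0] + 0.5 * dt * f[0], p[1] + 0.5 * dt * f[1])   # half kick
        new_x = (x[0] + dt * p[0] / m, x[1] + dt * p[1] / m)   # drift
        area += 0.5 * (x[0] * new_x[1] - x[1] * new_x[0])      # signed triangle area
        x = new_x
        f = force(x)
        p = (p[0] + 0.5 * dt * f[0], p[1] + 0.5 * dt * f[1])   # half kick
        momenta.append(x[0] * p[1] - x[1] * p[0])
    return momenta, area

m, dt, steps = 2.0, 1e-3, 5000
dP = lambda r: 1.0 / r ** 2        # P(r) = -1/r, an assumed sample potential
momenta, area = leapfrog_central_force(m, dP, (1.0, 0.0), (0.0, 1.5), dt, steps)

A = momenta[0]                                        # initial angular momentum p_theta
angular_drift = max(abs(a - A) for a in momenta)      # deviation from (6.4)
area_error = abs(area - A * dt * steps / (2 * m))     # area swept vs. (A / 2m) * t
print(angular_drift, area_error)
```

Since $r^2\dot\theta = A/m$ by (6.5), the area swept out in time $t$ should be $(A/2m)\,t$; both printed discrepancies should be at the level of floating-point rounding.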
We have not yet made use of the conservation of energy, which in the present context reads
$$\tfrac{1}{2}m\dot r^2 + \tfrac{1}{2}mr^2\dot\theta^2 + P(r) = \text{const} = \frac{1}{2m}p_r^2 + \frac{1}{2mr^2}p_\theta^2 + P(r). \tag{6.6}$$
Let us examine a particular solution curve for which $p_\theta = A = \text{const}$. Then differentiating (6.6) gives
$$\frac{dp_r}{dt} = m\frac{d^2r}{dt^2} = \frac{A^2}{mr^3} - P'(r). \tag{6.7}$$
We can interpret (6.7) as the equations of motion of a one-dimensional mechanical system. The kinetic energy of this system is $\tfrac{1}{2}m\dot r^2$, and the potential energy $Q$ is given by
$$Q(r) = P(r) + \frac{A^2}{2mr^2}. \tag{6.8}$$
The second term in (6.8) is known as the centrifugal potential, and the corresponding term, $A^2/mr^3$, occurring in (6.7) is called the centrifugal force. Note that if $A = 0$, then (6.7) reduces to (6.3), which is what we would expect.

[Fig. 13.2]

We can now use the following procedure for solving the equations of motion: First, for a given angular momentum find the various solution curves $r(\cdot)$ of (6.7). For a solution $r(\cdot)$, determine $\theta(\cdot)$ by integrating $\dot\theta = A/mr^2$ to get
$$\theta(t) = \int_{t_0}^t \frac{A}{mr^2(s)}\,ds + \text{const}. \tag{6.9}$$
We can obtain a good bit of information about the nature of the solutions of (6.7) by using the law of conservation of energy. We draw a graph of the function $Q$, as shown in Fig. 13.2. Suppose that the trajectory $r(\cdot)$ has the constant energy $E_1 = \tfrac{1}{2}m\dot r^2 + Q(r)$. Then if the set $\{r : Q(r) \le E_1\}$ is bounded, $r(t)$ must always be in this set, since $\tfrac{1}{2}m\dot r^2 \ge 0$. In fact, suppose that the interval $a \le r \le b$ is such that $Q(a) = Q(b) = E_1$ and $Q(b + \epsilon) > E_1$ and $Q(a - \epsilon) > E_1$ for small $\epsilon$. Then if $a \le r(t_0) \le b$ for some value of $t_0$, it follows
that $a \le r(t) \le b$ for all $t$. Furthermore, if $Q(r) < E_1$ for $a < r < b$, then for $a < r(t_0) < b$, $\dot r(t_0)$ cannot vanish. The particle is thus moving to one of the limits $r = a$ and $r = b$. If we use the law of conservation of energy, we see that
$$\tfrac{1}{2}m\dot r^2 + Q(r) = E_1,$$
so that
$$\frac{dr}{dt} = \pm\left(\frac{2}{m}\right)^{1/2}\sqrt{E_1 - Q(r)}.$$
For $a < r < b$ we see that $r$ is a monotone function of $t$, and we can solve to obtain $t$ as a function of $r$ by the formula
$$t(r) = \pm\left(\tfrac{1}{2}m\right)^{1/2}\int_{r_0}^r \frac{dx}{\sqrt{E_1 - Q(x)}} + t_0 \quad \text{if } r(t_0) = r_0. \tag{6.10}$$
In particular, if $Q'(a) \neq 0$ and $Q'(b) \neq 0$, the integral in (6.10) converges, so that it takes a finite amount of time for $r$ to get to $a$ or to $b$. Thus (since $m\,d\dot r/dt = -Q'(r) \neq 0$), the function $r$ oscillates between $a$ and $b$, taking the time
$$\left(\tfrac{1}{2}m\right)^{1/2}\int_a^b \frac{dx}{\sqrt{E_1 - Q(x)}}$$
to get from one side to the other.

If $\{r : Q(r) \le E_2\}$ is not bounded, then the motion with energy $E_2$ need not be bounded. For instance, suppose that $Q(r) < E_2$ for $r > a$. Then if $r(t_0) > a$ and $\dot r(t_0) < 0$, $r$ will decrease until it hits $a$, at which time it will turn around and then go off to infinity; if $\dot r(t_0) > 0$, then $r$ will simply go to infinity. The trajectory in this case is that of $r$ coming in from infinity, turning around at $a$, and then going back to infinity.

We recall that this is a descriptive analysis of the function $r(\cdot)$. The curve $\theta(\cdot)$ is then to be determined by (6.9).

Frequently we are interested in the trajectories as curves in the $r\theta$-plane without reference to the time dependence. Suppose that the energy is $E$ and the angular momentum is $A$. For the range where $\dot r \neq 0$ we can substitute
$$\frac{d\theta}{dt} = \frac{A}{mr^2}$$
into (6.10) to obtain [since $d\theta/dr = (d\theta/dt)(dt/dr)$]
$$\theta(r) = \pm(2m)^{-1/2}\int_{r_0}^r \frac{A\,dx}{x^2\sqrt{E - Q(x)}} + \theta(r_0).$$
In the important case where $P(r) = -\alpha/r$, corresponding to the inverse-square law of gravitational attraction, we have
$$Q(x) = \frac{A^2}{2mx^2} - \frac{\alpha}{x},$$
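so that the integrand is
$$\frac{A}{(2m)^{1/2}}\,\frac{1}{x^2\left[E + \dfrac{\alpha}{x} - \dfrac{A^2}{2mx^2}\right]^{1/2}} = \frac{d}{dx}\arccos\frac{\dfrac{A}{x} - \dfrac{m\alpha}{A}}{\sqrt{2mE + \dfrac{m^2\alpha^2}{A^2}}}.$$
Thus
$$\theta(r) - \theta(r_0) = \arccos\frac{\dfrac{A}{r} - \dfrac{m\alpha}{A}}{\sqrt{2mE + \dfrac{m^2\alpha^2}{A^2}}} - \arccos\frac{\dfrac{A}{r_0} - \dfrac{m\alpha}{A}}{\sqrt{2mE + \dfrac{m^2\alpha^2}{A^2}}}.$$
Let us choose $r_0$ to be the minimum value of $r$, so that the second term vanishes, and let us choose $\theta(r_0) = 0$. Let us set
$$p = \frac{A^2}{m\alpha} \quad \text{and} \quad e = \sqrt{1 + \frac{2EA^2}{m\alpha^2}}.$$
We see that the equation for the orbit is
$$\frac{p}{r} = 1 + e\cos\theta. \tag{6.11}$$
This is the equation of a conic section (Kepler's first law). If $E < 0$, then $e < 1$ and (6.11) represents an ellipse, whose major and minor semiaxes are
$$a = \frac{p}{1 - e^2} = \frac{\alpha}{2|E|} \quad \text{and} \quad b = \frac{p}{\sqrt{1 - e^2}} = \frac{A}{\sqrt{2m|E|}}.$$
We leave to the reader the details of working out the hyperbolic and parabolic orbits.

For the elliptic orbit, Kepler's second law implies that the total area of the interior of the ellipse is swept out at the uniform rate $A/2m$. Thus the area, $\pi ab$, of the ellipse is given by $(A/2m)T$, where $T$ is the period of the motion. Thus
$$T = \frac{2m}{A}\pi ab = \pi a\,\frac{2m}{\sqrt{2m|E|}} = \pi a\sqrt{\frac{2m}{|E|}},$$
and hence, since $|E| = \alpha/2a$,
$$T = 2\pi a^{3/2}\sqrt{m/\alpha}.$$
In other words, the square of the period of motion is proportional to the cube of the linear dimension of the orbit (Kepler's third law).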
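As a numerical cross-check of the period formula, the sketch below evaluates twice the travel time between the turning points given after (6.10), for $P(r) = -\alpha/r$, and compares it with $2\pi a^{3/2}\sqrt{m/\alpha}$ and with $(2m/A)\pi ab$. The parameter values are assumed samples, and the substitution $x = c + d\sin u$ (with a midpoint rule in $u$) is a numerical device chosen here to tame the integrable singularities at the turning points; it is not taken from the text.

```python
import math

def radial_period(m, alpha, A, E, n=4000):
    """Twice (m/2)^(1/2) * integral of dx / sqrt(E - Q(x)) between the turning points."""
    Q = lambda x: A * A / (2 * m * x * x) - alpha / x
    p = A * A / (m * alpha)                                   # semi-latus rectum
    e = math.sqrt(1 + 2 * E * A * A / (m * alpha * alpha))    # eccentricity
    r_min, r_max = p / (1 + e), p / (1 - e)                   # turning points from (6.11)
    c, d = 0.5 * (r_min + r_max), 0.5 * (r_max - r_min)
    du = math.pi / n
    total = 0.0
    for k in range(n):
        u = -0.5 * math.pi + (k + 0.5) * du                   # midpoints avoid the endpoints
        x = c + d * math.sin(u)
        total += d * math.cos(u) / math.sqrt(E - Q(x)) * du
    return 2 * math.sqrt(0.5 * m) * total

# Assumed sample values describing a bound orbit (E < 0, 0 < e < 1).
m, alpha, A, E = 2.0, 1.0, 1.5, -0.4375

a = alpha / (2 * abs(E))                  # major semiaxis
b = A / math.sqrt(2 * m * abs(E))         # minor semiaxis

T_quadrature = radial_period(m, alpha, A, E)
T_formula = 2 * math.pi * a ** 1.5 * math.sqrt(m / alpha)
T_area = (2 * m / A) * math.pi * a * b
print(T_quadrature, T_formula, T_area)
```

The three printed values should agree closely. The function as written assumes a noncircular bound orbit; for $e = 0$ the interval of integration degenerates to a point.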
7. THE TWO-BODY PROBLEM

Let us consider the mechanical system consisting of $n$ particles, the $i$th having mass $m_i$, so that the configuration space is $E^3 \times \cdots \times E^3$ ($n$ times), and if $a = \langle a^1, \ldots, a^n\rangle \in M$, where $a^i = \langle x^i, y^i, z^i\rangle \in E^3$, then
$$\tfrac{1}{2}\|\dot a\|^2 = \tfrac{1}{2}m_1\|\dot a^1\|^2 + \cdots + \tfrac{1}{2}m_n\|\dot a^n\|^2 = \tfrac{1}{2}m_1[(\dot x^1)^2 + (\dot y^1)^2 + (\dot z^1)^2] + \cdots + \tfrac{1}{2}m_n[(\dot x^n)^2 + (\dot y^n)^2 + (\dot z^n)^2]$$
is the kinetic energy of the system. As we indicated in Section 1, we may wish to restrict the manifold $M$ to be that subset of $E^3 \times \cdots \times E^3$ for which $a^i \neq a^j$ for all $i \neq j$. Let us further assume that the potential energy depends only on the mutual distances between the particles. That is, let us assume that
$$U(a^1, \ldots, a^n) = P(\|a^2 - a^1\|, \|a^3 - a^1\|, \ldots, \|a^n - a^{n-1}\|).$$
Then if $A$ is a Euclidean motion of $E^3$ and we apply $A$ simultaneously to all the particles, the kinetic and potential energy are conserved. That is, the transformation
$$\langle a^1, \ldots, a^n\rangle \mapsto \langle Aa^1, \ldots, Aa^n\rangle$$
is an isometry of the Riemann metric on $M$ and preserves $U$.

Let $\langle x^1, y^1, z^1, \ldots, x^n, y^n, z^n\rangle$ be the Cartesian coordinates on $M$, and let
$$\langle q_{x^1}, q_{y^1}, q_{z^1}, q_{x^2}, \ldots, q_{y^n}, q_{z^n}, p_{x^1}, \ldots, p_{z^n}\rangle$$
be the corresponding coordinates on $T^*(M)$. Now $\partial/\partial x$ is the vector field representing the infinitesimal translation in the $x$-direction in $E^3$. Therefore,
$$\frac{\partial}{\partial x^1} + \frac{\partial}{\partial x^2} + \cdots + \frac{\partial}{\partial x^n}$$
is the infinitesimal generator of the one-parameter group
$$\langle x^1, y^1, z^1, x^2, \ldots, x^n, y^n, z^n\rangle \mapsto \langle x^1 + t, y^1, z^1, x^2 + t, y^2, z^2, \ldots, x^n + t, y^n, z^n\rangle.$$
We can therefore apply Proposition 5.2 to conclude that the function
$$p_{x^1} + p_{x^2} + \cdots + p_{x^n}$$
is constant on any trajectory. This function is known as the total linear momentum in the $x$-direction. Similarly, the total linear momentum in the $y$-direction, $p_{y^1} + \cdots + p_{y^n}$, and the total linear momentum in the $z$-direction, $p_{z^1} + \cdots + p_{z^n}$, must be conserved. If we define $p^i \in E^3$ by setting $p^i = \langle p_{x^i}, p_{y^i}, p_{z^i}\rangle$, then
we can say that the $\mathbb{E}^3$-valued function

$$p^1+\cdots+p^n$$

must be conserved. This is the law of conservation of total linear momentum. (For two particles the assertion that $p^1+p^2$ is conserved is just Newton's law of "equality of action and reaction". In our setup we see that this law is a reflection of the invariance of the physical situation under translations.)

The vector field $x(\partial/\partial y)-y(\partial/\partial x)$ represents infinitesimal rotation about the $z$-axis in $\mathbb{E}^3$. Therefore,

$$x^1\frac{\partial}{\partial y^1}-y^1\frac{\partial}{\partial x^1}+x^2\frac{\partial}{\partial y^2}-y^2\frac{\partial}{\partial x^2}+\cdots+x^n\frac{\partial}{\partial y^n}-y^n\frac{\partial}{\partial x^n}$$

represents simultaneous infinitesimal rotation of all the particles about the $z$-axis. Therefore, the function

$$q_{x1}p_{y1}-q_{y1}p_{x1}+\cdots+q_{xn}p_{yn}-q_{yn}p_{xn}$$

is conserved. This is the law of conservation of total angular momentum about the $z$-axis. Similarly, we obtain the law of conservation of total angular momentum about the $x$- and $y$-axes. If we set $q^i=\langle q_{xi},q_{yi},q_{zi}\rangle\in\mathbb{E}^3$, we can combine the three equations by saying that the function

$$q^1\wedge p^1+\cdots+q^n\wedge p^n$$

with values in $\bigwedge^2(\mathbb{E}^3)$ is constant on the trajectories of the motion.

Let us examine more closely the law of conservation of total linear momentum. In view of the fact that

$$p_{x1}=m_1\frac{dx^1}{dt},\quad\text{etc.},$$

we have $p^i=m_i(da^i/dt)$, and thus

$$\frac{d}{dt}\bigl(m_1a^1+m_2a^2+\cdots+m_na^n\bigr)=\text{const}.$$

In other words, the center of mass

$$C=\frac{m_1a^1+\cdots+m_na^n}{m_1+\cdots+m_n}$$

moves in a straight line with constant velocity.

Suppose there are only two particles. Then it is reasonable to introduce the center of mass

$$C=\frac{m_1a^1+m_2a^2}{m_1+m_2}$$

and the relative position vector

$$d=a^1-a^2$$
as new coordinates. If we solve for $a^1$ and $a^2$, we get

$$a^1=C+\frac{m_2}{m_1+m_2}\,d,\qquad a^2=C-\frac{m_1}{m_1+m_2}\,d,$$

and therefore the kinetic energy is given by

$$\tfrac12(m_1+m_2)\|\dot C\|^2+\tfrac12\,\frac{m_1m_2}{m_1+m_2}\,\|\dot d\|^2,$$

while $U(C,d)=P(\|d\|)$. Thus the motions of the system have the following description: The center of mass has constant linear motion, and the relative position vector $d=a^1-a^2$ satisfies the equations of motion of a single particle with mass

$$\frac{m_1m_2}{m_1+m_2}$$

in the central-force field with potential $U=P(\|d\|)$. We can thus apply the results of the preceding section to determine the motions of the particle. In particular, if $P(\|d\|)=\alpha/\|d\|$ is the inverse-square potential, the corresponding two-body problems can be completely solved. Note that if $m_2$ is very large compared to $m_1$, then $C$ is very close to $a^2$. In this case $d(\cdot)$ is a good approximation to the motion of the smaller particle relative to the larger one. This is the situation that arises, for example, in the study of the motion of the planets relative to the sun.

8. LAGRANGE'S EQUATIONS

Our discussion of mechanics has led us to a certain kind of vector field $X$ on $T^*(M)$, where $M$ is the configuration manifold. In the case where $H=K+U$, the Riemann metric giving $K$ induces an isomorphism $\mathcal{L}$ [that is, a diffeomorphism which is linear on each $T_x(M)$] from $T(M)$ to $T^*(M)$. We therefore obtain a vector field $Y$ on $T(M)$ such that $\mathcal{L}_*Y_v=X_{\mathcal{L}(v)}$ at any $v\in T(M)$. We can therefore inquire about the form of this vector field $Y$ in terms of local coordinates $\langle q^1,\dots,q^n,\dot q^1,\dots,\dot q^n\rangle$ on $T(M)$, associated with coordinates $\langle x^1,\dots,x^n\rangle$ on $M$. Suppose that the Riemann metric is given by

$$\sum g_{ij}\,dx^i\,dx^j.$$

If $\langle q^1,\dots,q^n,p^1,\dots,p^n\rangle$ are corresponding local coordinates on $T^*(M)$, then the diffeomorphism $\mathcal{L}$ is given by

$$q^i=q^i,\qquad p^i=\sum_j g_{ij}(q^1,\dots,q^n)\,\dot q^j. \tag{8.1}$$

We could proceed to use (8.1) to obtain the local expression for $\mathcal{L}_*$ and thereby the local expression for $Y$. However, it is more convenient to argue a little differently. Let $L$ be the function on $T(M)$ defined by

$$L(q^1,\dots,q^n,\dot q^1,\dots,\dot q^n)=\tfrac12\sum g_{ij}\,\dot q^i\dot q^j-U(q^1,\dots,q^n), \tag{8.2}$$
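that is, $L=K-U$, where $K(v)=\tfrac12\|v\|^2$ and $U(v)=U(\pi(v))$ are the kinetic and potential energy expressed as functions on $T(M)$. We can rewrite (8.1) as

$$q^i=q^i,\qquad p^i=\frac{\partial L}{\partial\dot q^i}. \tag{8.3}$$

Then for any $l\in T^*(M)$, we have

$$H(l)=\langle\mathcal{L}^{-1}(l),l\rangle-L(\mathcal{L}^{-1}(l)). \tag{8.4}$$

In fact, by definition, $\langle\mathcal{L}^{-1}(l),l\rangle=\|l\|^2=\|\mathcal{L}^{-1}(l)\|^2$, so that

$$\langle\mathcal{L}^{-1}(l),l\rangle-L(\mathcal{L}^{-1}(l))=\|l\|^2-\tfrac12\|l\|^2+U(\pi(l))=\tfrac12\|l\|^2+U(l)=H(l).$$

In terms of local coordinates we can write (8.4) as

$$H(q^1,\dots,q^n,p^1,\dots,p^n)=\sum p^i\dot q^i-L(q^1,\dots,q^n,\dot q^1,\dots,\dot q^n), \tag{8.5}$$

where the $\dot q^i$ are regarded as functions of the $q$'s and $p$'s via $\mathcal{L}^{-1}$. We can write the map $\mathcal{L}^{-1}$ as

$$q^i=q^i,\qquad \dot q^i=\frac{\partial H}{\partial p^i}, \tag{8.6}$$

since $H(l)=\tfrac12\|l\|^2+U(\pi(l))$. Furthermore, by (8.5),

$$\frac{\partial H}{\partial q^i}=-\frac{\partial L}{\partial q^i}\circ\mathcal{L}^{-1},$$

since the terms involving $\partial\dot q^j/\partial q^i$ cancel by (8.3).

Now let $v(\cdot)=\langle q^1(\cdot),\dots,q^n(\cdot),\dot q^1(\cdot),\dots,\dot q^n(\cdot)\rangle$ be a solution curve to the vector field $Y$, so that $\mathcal{L}\circ v(\cdot)$ is a solution curve of $X$ on $T^*(M)$. Then by (5.1),

$$\frac{dq^i}{dt}=\frac{\partial H}{\partial p^i}=\dot q^i$$

and

$$\frac{dp^i}{dt}=\frac{d(\partial L/\partial\dot q^i)}{dt}=-\frac{\partial H}{\partial q^i}=\frac{\partial L}{\partial q^i}.$$

In other words, Eqs. (5.1) are equivalent to the equations

$$\frac{dq^i}{dt}=\dot q^i,\qquad \frac{d(\partial L/\partial\dot q^i)}{dt}=\frac{\partial L}{\partial q^i} \tag{8.7}$$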
on $T(M)$. The first of Eqs. (8.7) says that we are dealing with a system of second-order differential equations which is given implicitly by the second of Eqs. (8.7). Equations (8.7) are known as Lagrange's equations.

For certain purposes Lagrange's equations are very convenient. We illustrate by establishing the "principle of mechanical similarity". Note that Eqs. (8.7) are unchanged if we replace $L$ by $cL$, where $c$ is any nonzero constant. Suppose that $M$ is an open subset of a vector space with linear coordinates $x^1,\dots,x^n$ and that the Riemann metric is given by $\sum g_{ij}\,dq^i\,dq^j$, where the $g_{ij}$ are constant. Let $m_\alpha$ denote the linear map consisting of multiplication by $\alpha>0$, that is, $m_\alpha(x^1,\dots,x^n)=\langle\alpha x^1,\dots,\alpha x^n\rangle$. Suppose that $U$ is a homogeneous function of degree $p$, so that

$$U(\alpha x^1,\dots,\alpha x^n)=\alpha^pU(x^1,\dots,x^n).$$

Now let us change our time scale by a factor $\beta$, replacing $t$ by $s=\beta t$. Then

$$\frac{d(\alpha q^i)}{ds}=\frac{\alpha}{\beta}\,\frac{dq^i}{dt}$$

and

$$\frac12\sum g_{ij}\,\frac{d(\alpha q^i)}{ds}\,\frac{d(\alpha q^j)}{ds}=\frac{\alpha^2}{\beta^2}\,\frac12\sum g_{ij}\,\frac{dq^i}{dt}\,\frac{dq^j}{dt}.$$

Let us choose $\beta$ so that $\alpha^2/\beta^2=\alpha^p$, that is,

$$\beta=\alpha^{1-(p/2)}.$$

Then

$$L\Bigl(\alpha q^1,\dots,\alpha q^n,\frac{\alpha}{\beta}\frac{dq^1}{dt},\dots,\frac{\alpha}{\beta}\frac{dq^n}{dt}\Bigr)=\alpha^pL\Bigl(q^1,\dots,q^n,\frac{dq^1}{dt},\dots,\frac{dq^n}{dt}\Bigr).$$

In other words, replacing $q^i$ by $\alpha q^i$ and $t$ by $\beta t$ carries solutions into solutions: if we change the linear scale by $\alpha$ and change the time scale by $\alpha^{1-(p/2)}$, we obtain an isomorphic situation. For instance, if $U$ is homogeneous of degree $-1$ (as in the case of an inverse-square law of attraction), then $\beta=\alpha^{3/2}$. In particular, the period of any periodic orbit is proportional to the $\tfrac32$-power of its linear dimension, which is just Kepler's third law. We thus see that Kepler's third law is an exact consequence of the inverse-square law of attraction.

Returning to the study of the general Lagrangian system, we observe again that it is a system of second-order differential equations.
We can therefore apply the fundamental existence and uniqueness theorem to it to conclude that for every $x\in M$ and for every $\xi\in T_x(M)$ there is a unique curve $C(\cdot)$ which is a trajectory of the system and for which $C(0)=x$ and $C'(0)=\xi$.

9. VARIATIONAL PRINCIPLES

The function $L$ plays a crucial role in the study of variational principles of mechanics. Consider the following problem: Let $p$ and $q$ be two points of $M$, and let $t_1<t_2$ be two real numbers. For any differentiable curve $C$ defined on
the interval $[t_1,t_2]$ we set

$$I[C]=\int_{t_1}^{t_2}L(C'(t))\,dt. \tag{9.1}$$

Note that $C'(t)\in T(M)$ and $L$ is a function on $T(M)$, so the integrand makes sense. Among all curves joining $p$ to $q$, find that curve for which $I[C]$ takes on its minimum value. We shall see that a necessary condition for $I[C]$ to be a minimum is that $C$ be a trajectory of a mechanical system. (In fact, if a suitable notion of neighborhood is introduced on the space of curves, it is also a necessary condition for $C$ to be even a local minimum.)

Before establishing this result, it is convenient to have another expression for $I[C]$. As before, let $\mathcal{L}$ be the map of $T(M)\to T^*(M)$ given by the Riemann metric. Let $\bar C$ be the curve in $T^*(M)$ given by $\bar C(t)=\mathcal{L}(C'(t))$. Then

$$I[C]=\int_{\bar C}\theta-\int_{t_1}^{t_2}H(\bar C(t))\,dt, \tag{9.2}$$

where $\theta$ is the fundamental linear form on $T^*(M)$. In fact, in terms of local coordinates,

$$\int_{\bar C}\theta=\int_{t_1}^{t_2}\sum p^i\,\frac{dq^i}{dt}\,dt,$$

since the curve $C'$ by definition is such that $\dot q^i(t)=\dfrac{dq^i}{dt}(t)$. But, by (8.5),

$$\int_{t_1}^{t_2}\sum p^i\,\frac{dq^i}{dt}\,dt=\int_{t_1}^{t_2}\sum(p^i\circ\mathcal{L})(C'(t))\,\dot q^i(t)\,dt=\int_{t_1}^{t_2}\bigl[H\circ\mathcal{L}(C'(t))+L(C'(t))\bigr]\,dt,$$

so (9.2) holds.

Let $Z$ be a vector field on $M$ which generates a flow $\varphi$. For all sufficiently small $s$ the curve $\varphi_s\circ C(\cdot)$ will be defined and

$$(\varphi_s\circ C)'(t)=\varphi_{s*}C'(t)=T(\varphi_s)(C'(t)),$$

so that

$$I[\varphi_s\circ C]=\int_{t_1}^{t_2}L\bigl(T(\varphi_s)C'(t)\bigr)\,dt=\int_{\mathcal{L}\circ(\varphi_s\circ C)'}\theta-\int_{t_1}^{t_2}H\bigl(\mathcal{L}\circ T(\varphi_s)(C'(t))\bigr)\,dt. \tag{9.3}$$

Since $I[C]$ is to be a minimum, we must have

$$\left.\frac{dI[\varphi_s\circ C]}{ds}\right|_{s=0}=0.$$

We will now compute this derivative so as to derive the consequences of the above equation.
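Now $\mathcal{L}\circ T(\varphi_s)\circ\mathcal{L}^{-1}$ is a flow on $T^*(M)$ which satisfies $\pi\circ\mathcal{L}\circ T(\varphi_s)\circ\mathcal{L}^{-1}=\varphi_s\circ\pi$. Let $\bar Z$ be the infinitesimal generator of this flow, so that at all points of $T^*(M)$ we have

$$\pi_*\bar Z=Z. \tag{9.4}$$

If we differentiate (9.3) with respect to $s$ at $s=0$, we obtain

$$\left.\frac{d}{ds}I[\varphi_s\circ C]\right|_{s=0}=\int_{\bar C}D_{\bar Z}\theta-\int_{t_1}^{t_2}D_{\bar Z}H(\bar C(t))\,dt.$$

Now $D_{\bar Z}\theta=d\langle\bar Z,\theta\rangle+\bar Z\mathbin{\lrcorner}d\theta$ and $\langle\bar Z,\theta\rangle_l=\langle\pi_*\bar Z,l\rangle=\langle Z,l\rangle=f_Z(l)$, so we get

$$\left.\frac{d}{ds}I[\varphi_s\circ C]\right|_{s=0}=\int_{\bar C}\bar Z\mathbin{\lrcorner}d\theta-\int_{t_1}^{t_2}D_{\bar Z}H(\bar C(t))\,dt+f_Z(\bar C(t_2))-f_Z(\bar C(t_1)). \tag{9.5}$$

Now suppose that the vector field $Z$ vanishes at $p$ and $q$. Then all the curves $\varphi_s\circ C$ join $p$ to $q$. If $C$ is to minimize the integral $I$, then the derivative $d(I[\varphi_s\circ C])/ds$ must vanish at $s=0$. Note that in this case the two last terms of (9.5) vanish and we must have

$$\int_{\bar C}\bar Z\mathbin{\lrcorner}d\theta-\int_{t_1}^{t_2}D_{\bar Z}H(\bar C(t))\,dt=0 \tag{9.6}$$

for all vector fields vanishing at $p$ and $q$. In particular, let us take

$$Z=\psi\,\frac{\partial}{\partial x^i}\ \text{ for } x\in U,\qquad Z=0\ \text{ for } x\notin U,$$

where $\psi$ is a $C^\infty$-function whose support lies in some coordinate neighborhood $U$ of $M$ with coordinates $\langle x^1,\dots,x^n\rangle$. Suppose further that $\psi(p)=\psi(q)=0$ if $p\in U$ or $q\in U$. Then by (2.7'),

$$\pi_*\frac{\partial}{\partial q^i}=\frac{\partial}{\partial x^i},$$

so that

$$\bar Z=\psi\,\frac{\partial}{\partial q^i}+\sum_j B^j\frac{\partial}{\partial p^j},$$

where $B^j$ are some functions on $T^*(M)$ which depend linearly on $Z$. [This, of course, is just a restatement of (9.4).] Then

$$\bar Z\mathbin{\lrcorner}d\theta=\bar Z\mathbin{\lrcorner}\sum_j dp^j\wedge dq^j=\sum_j B^j\,dq^j-\psi\,dp^i,$$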
while

$$D_{\bar Z}H=\psi\,\frac{\partial H}{\partial q^i}+\sum_j B^j\frac{\partial H}{\partial p^j}.$$

Thus if the curve $\bar C$ is given by

$$\bar C(t)=\langle q^1(t),\dots,q^n(t),p^1(t),\dots,p^n(t)\rangle,$$

Eq. (9.6) becomes

$$\int_{t_1}^{t_2}\Bigl[\sum_j B^j\Bigl(\frac{dq^j}{dt}-\frac{\partial H}{\partial p^j}\Bigr)-\psi\Bigl(\frac{dp^i}{dt}+\frac{\partial H}{\partial q^i}\Bigr)\Bigr]dt=0.$$

Now by construction, $\bar C(t)=\mathcal{L}\circ C'(t)$, so on $\bar C$ we have

$$\frac{dq^j}{dt}=\dot q^j=\frac{\partial H}{\partial p^j} \tag{9.7}$$

by (8.6). Thus the first sum occurring in the above integral vanishes and we must have

$$\int_{t_1}^{t_2}\psi\Bigl(\frac{dp^i}{dt}+\frac{\partial H}{\partial q^i}\Bigr)dt=0.$$

This must hold for all functions $\psi$ whose support lies in a coordinate neighborhood and which vanish at $p$ and $q$. Clearly, this can happen only if

$$\frac{dp^i}{dt}=-\frac{\partial H}{\partial q^i}. \tag{9.8}$$

This must hold for all $i$. Since (9.7) and (9.8) are exactly (5.2), we can assert:

Proposition 9.1. A necessary condition for $C$ to minimize the functional $I[C]$ is that $C$ is a trajectory of the corresponding dynamical system.

The question of when this necessary condition is also sufficient is a more complicated one. We shall not discuss it in any generality here, but refer the reader to any standard source on the calculus of variations.

Fig. 13.3

By the way, we can derive a bonus from (9.5). Suppose that we consider the following problem: Let $N_1$ and $N_2$ be two submanifolds (Fig. 13.3) of $M$, and suppose that we require that $C$ minimize $I$ among all curves joining $N_1$ to $N_2$ and not merely among those joining $p$ to $q$. In this case (9.5) will have to vanish for all vector fields $Z$ which are tangent to $N_1$ at $p$ and to $N_2$ at $q$. Now observe
that if $C$ is a solution to this minimum problem, it certainly is a solution to the problem of minimizing $I$ among all curves joining $p$ to $q$. In particular, $C$ must be a trajectory of the mechanical system. As the reader can easily check, this implies that

$$\int_{\bar C}\bar Z\mathbin{\lrcorner}d\theta-\int_{t_1}^{t_2}(D_{\bar Z}H)(\bar C(t))\,dt=0$$

for all vector fields $Z$. Thus, if $C$ solves the more difficult minimization problem, we must have

$$f_Z(\bar C(t_2))-f_Z(\bar C(t_1))=0.$$

Now $f_Z(\bar C(t_1))=\langle Z_p,\bar C(t_1)\rangle$ and $f_Z(\bar C(t_2))=\langle Z_q,\bar C(t_2)\rangle$. Since if $p\neq q$ we can choose $Z_p$ arbitrarily in $T_p(N_1)$ and $Z_q$ arbitrarily in $T_q(N_2)$, we conclude in this case that $C$ must also satisfy

$$\langle\xi,\bar C(t_1)\rangle=0\ \text{ for all } \xi\in T_p(N_1),\qquad \langle\eta,\bar C(t_2)\rangle=0\ \text{ for all } \eta\in T_q(N_2). \tag{9.9}$$

Since $\bar C(t_1)=\mathcal{L}C'(t_1)$, we have

$$\langle\xi,\bar C(t_1)\rangle=(\xi,C'(t_1)),$$

so we can write (9.9) as

$$(\xi,C'(t_1))=0\ \text{ for all } \xi\in T_p(N_1),\qquad (\eta,C'(t_2))=0\ \text{ for all } \eta\in T_q(N_2). \tag{9.10}$$

In other words, the curve $C$ must be orthogonal to the submanifolds $N_1$ and $N_2$.

Although our statement of Proposition 9.1 was couched in the framework of dynamical systems, it actually can be formulated in a more general context. Let $L$ be any function on $T(M)$, not necessarily of the type $K-U$. We can then define the integral $I$ as in (9.1) and again pose the minimization problem. This is the typical problem in the calculus of variations. We have already discussed this problem from a different point of view in Section 3.15. We leave to the reader the task of showing that our arguments carry over in this more general case if the matrix

$$\Bigl(\frac{\partial^2L}{\partial\dot x^i\,\partial\dot x^j}\Bigr)$$

is nowhere singular. (Here the map $\mathcal{L}$ of $T(M)\to T^*(M)$ is given by

$$\langle x^1,\dots,x^n,\dot x^1,\dots,\dot x^n\rangle\mapsto\Bigl\langle x^1,\dots,x^n,\frac{\partial L}{\partial\dot x^1},\dots,\frac{\partial L}{\partial\dot x^n}\Bigr\rangle,$$

and the nonsingularity guarantees that this map is locally a diffeomorphism.)
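Proposition 9.1 can be illustrated numerically in the simplest setting. The following is a minimal sketch, assuming $M=\mathbb{R}$, $K=\tfrac12\dot x^2$ and $U=\tfrac12x^2$ (a harmonic oscillator, chosen here only as an example and not taken from the text), with the integral (9.1) approximated by a midpoint rule. The names `action`, `trajectory` and `perturbed` are hypothetical. The trajectory $x(t)=\sin t$ on $[0,1]$ is compared with curves having the same endpoints; the interval is kept shorter than $\pi$, where for this example the trajectory is an actual minimum and not merely a stationary curve.

```python
import math


def action(curve, t1, t2, n=2000, mass=1.0, potential=lambda x: 0.5 * x * x):
    """Midpoint-rule approximation of I[C] = integral of L dt, L = m v^2/2 - U(x)."""
    h = (t2 - t1) / n
    total = 0.0
    for k in range(n):
        ta = t1 + k * h
        xa, xb = curve(ta), curve(ta + h)
        v = (xb - xa) / h            # velocity on the subinterval
        xm = 0.5 * (xa + xb)         # position at the midpoint
        total += (0.5 * mass * v * v - potential(xm)) * h
    return total


def trajectory(t):
    """A solution of x'' = -x, i.e. a trajectory of the assumed system."""
    return math.sin(t)


def perturbed(eps):
    """Same endpoints as `trajectory` at t = 0 and t = 1, displaced in between."""
    return lambda t: math.sin(t) + eps * math.sin(math.pi * t)


if __name__ == "__main__":
    base = action(trajectory, 0.0, 1.0)
    for eps in (-0.1, 0.05, 0.1):
        print(eps, action(perturbed(eps), 0.0, 1.0) - base)
```

For this example the excess action of the perturbed curve is, up to discretization error, $\varepsilon^2(\pi^2-1)/4$: the first-order term vanishes because the unperturbed curve is a trajectory, which is the content of the necessary condition.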
10. GEODESIC COORDINATES

In this section we depart momentarily from the study of mechanics in order to exhibit some applications of the results of the preceding paragraphs to the study of Riemann manifolds. Note that a Riemann manifold always defines a mechanical system by its kinetic energy if we set the potential energy equal to zero. It is this special kind of mechanical system that we wish to study in this section.

Let $M$ be a finite-dimensional manifold with a Riemann metric and, as above, define $L$ on $T(M)$ by setting $L(v)=\tfrac12\|v\|^2$. This then determines a vector field $Y$ on $T(M)$ which corresponds to the system of differential equations (8.7) in terms of local coordinates.

Let $p$ be a point of $M$. For every $\xi\in T_p(M)$ there is a unique trajectory $C_\xi(\cdot)$ such that

$$C_\xi(0)=p,\qquad C'_\xi(0)=\xi.$$

In terms of local coordinates, if $\xi=\langle\xi^1,\dots,\xi^n\rangle$, then

$$C_\xi(t)=\langle q^1_\xi(t),\dots,q^n_\xi(t)\rangle,$$

where $q^i_\xi$ are the unique solutions of the differential equations

$$\frac{dq^i}{dt}=\dot q^i,\qquad \frac{d(\partial L/\partial\dot q^i)}{dt}=\frac{\partial L}{\partial q^i}$$

with the initial conditions

$$q^i(0)=x^i(p),\qquad \dot q^i(0)=\xi^i.$$

By the fundamental existence and uniqueness theorem the functions $q^i_\xi$ are defined for sufficiently small $t$. We can regard $C_\xi(t)$ as dependent on both $\xi$ and $t$. In other words, we have a map $C_{\cdot}(\cdot)$ assigning to each $\xi\in T_p(M)$ and each sufficiently small $t$ a point of $M$. The map $C_{\cdot}(\cdot)$ is, in fact, defined in some neighborhood of $T_p(M)\times\{0\}\subset T_p(M)\times\mathbb{R}$.

The above is true for any Lagrangian function. In the case of no potential energy we can say a lot more. Let $s$ be a real number. Consider the curve $C_\xi(s\,\cdot)$; that is,

$$C_\xi(st)=\langle q^1_\xi(st),\dots,q^n_\xi(st)\rangle.$$

Then (suppressing the subscript $\xi$ which is to be understood in the following computations)

$$\frac{dq^i(st)}{dt}=s\,\dot q^i(st),$$

$$L\bigl(q^1(st),\dots,q^n(st),s\dot q^1(st),\dots,s\dot q^n(st)\bigr)=\tfrac12s^2\sum g_{ij}\,\dot q^i(st)\,\dot q^j(st),$$
and

$$\frac{d}{dt}\,\frac{\partial L}{\partial\dot q^i}\bigl(q^1(st),\dots,q^n(st),s\dot q^1(st),\dots,s\dot q^n(st)\bigr)=\frac{d}{dt}\,s\sum_k g_{ik}\bigl(q^1(st),\dots,q^n(st)\bigr)\dot q^k(st)=s^2\Bigl[\frac{d}{dt}\,\frac{\partial L}{\partial\dot q^i}\bigl(q^1(\cdot),\dots,\dot q^n(\cdot)\bigr)\Bigr]_{st}.$$

In other words,

$$\frac{d}{dt}\,\frac{\partial L}{\partial\dot q^i}\bigl(q^1(s\,\cdot),\dots,s\dot q^n(s\,\cdot)\bigr)-\frac{\partial L}{\partial q^i}\bigl(q^1(s\,\cdot),\dots,s\dot q^n(s\,\cdot)\bigr)=s^2\Bigl[\frac{d}{dt}\,\frac{\partial L}{\partial\dot q^i}\bigl(q^1(\cdot),\dots,\dot q^n(\cdot)\bigr)-\frac{\partial L}{\partial q^i}\bigl(q^1(\cdot),\dots,\dot q^n(\cdot)\bigr)\Bigr]_{st}=0.$$

Thus the curve $C_\xi(s\,\cdot)$ is again a trajectory of the system, and we clearly have

$$C_\xi(s\,\cdot)'\big|_0=s\,C'_\xi(\cdot)\big|_0=s\xi.$$

By the uniqueness theorem for differential equations we thus must have

$$C_\xi(st)=C_{s\xi}(t). \tag{10.1}$$

We are therefore led to define a map, which we shall call exp, from $T_p(M)\to M$ by setting

$$\exp(\xi)=C_\xi(1). \tag{10.2}$$

Note that the map exp is defined and differentiable in some neighborhood of the origin in $T_p(M)$. In fact, by (10.1),

$$\exp(\xi)=C_{\xi/\|\xi\|}(\|\xi\|),$$

where now $\xi/\|\xi\|$ lies on the unit sphere in $T_p(M)$ [the unit sphere with respect to the Euclidean metric given by the scalar product on $T_p(M)$]. Since the unit sphere is compact, there is some $\epsilon>0$ such that $C_\eta(t)$ is defined for all $\eta$ on the unit sphere and all $|t|<\epsilon$. Thus exp will be defined for all $\|\xi\|<\epsilon$.

The map exp is a differentiable map from some neighborhood of the origin in the vector space $T_p(M)$ into the manifold. Let us compute

$$\exp_{*0}\colon T_0[T_p(M)]\to T_p(M).$$

Let $\xi\in T_p(M)$, and let us consider the straight line through 0, $l_\xi$, in $T_p(M)$ given by $l_\xi(t)=t\xi$. Since $T_p(M)$ is a vector space, we can identify $T_0[T_p(M)]$ with $T_p(M)$ via the identity chart, in which case we identify $l'_\xi(0)$ with $\xi$. But

$$\exp[l_\xi(t)]=\exp(t\xi)=C_{t\xi}(1)=C_\xi(t), \tag{10.3}$$
and the tangent to this curve at 0 is just $\xi$. In other words,

$$\exp_{*0}[l'_\xi(0)]=\xi.$$

Thus, if we identify $T_0[T_p(M)]$ with $T_p(M)$, we can assert that

$$\exp_{*0}\colon T_0[T_p(M)]\to T_p(M)$$

is given by

$$\exp_{*0}(\xi)=\xi. \tag{10.4}$$

In particular, the map $\exp_{*0}$ is nonsingular, so that by the implicit-function theorem the map exp is a diffeomorphism in some neighborhood of the origin.

We have thus constructed a diffeomorphism exp from some neighborhood of the origin in $T_p(M)$ into $M$, which by (10.3) carries straight lines through the origin into trajectories through $p$. In the case of no potential energy the trajectories are called geodesics for reasons which will soon become apparent. If we identify $T_p(M)$ with $V$ by some chart $(U,\alpha)$, we can then use exp to introduce a new chart $(U',n_\alpha)$ by setting

$$n_\alpha^{-1}(\xi_\alpha)=\exp(\xi).$$

The chart $n_\alpha$ has the property that $n_\alpha(p)=0$ and $n_\alpha$ carries geodesics through $p$ into straight lines through the origin. The chart $(U',n_\alpha)$ is called a geodesic normal chart, and corresponding coordinates are called geodesic normal coordinates on $M$.

Let us consider the curve $C_\xi(\cdot)=\exp(\cdot\,\xi)$ which is defined for $0\le t\le1$ so long as $\|\xi\|<\epsilon$. We have

$$I[C_\xi(\cdot)]=\int_0^1L(C'_\xi(t))\,dt=\tfrac12\int_0^1\|C'_\xi(t)\|^2\,dt.$$

But, by the conservation of energy, for any trajectory of our system we have $H\circ\mathcal{L}(C'(t))=\text{const}$. In this case, since $U=0$,

$$H\circ\mathcal{L}(C'(t))=L(C'(t))=\tfrac12\|C'(t)\|^2=\text{const}.$$

Since $C'_\xi(0)=\xi$, we have $\|C'_\xi(t)\|=\|\xi\|$, and therefore

$$I[C_\xi(\cdot)]=\tfrac12\int_0^1\|\xi\|^2\,dt=\tfrac12\|\xi\|^2. \tag{10.5}$$

Now let $\{\beta_s\}$ be a one-parameter group of rotations in $T_p(M)$. Then $\varphi_s=\exp\circ\beta_s\circ\exp^{-1}$ defines a one-parameter group on the open set $U=\exp\{\xi:\|\xi\|<\epsilon\}\subset M$. If $\|\xi\|<\epsilon$, we have by (10.5)

$$I[\varphi_s\circ C_\xi(\cdot)]=I[C_{\beta_s\xi}(\cdot)]=\tfrac12\|\beta_s\xi\|^2=\tfrac12\|\xi\|^2,$$

so, by (9.10), we get

$$\frac{d}{ds}I[\varphi_s\circ C_\xi(\cdot)]=0=f_Z(\bar C_\xi(1))=(C'_\xi(1),Z_{C_\xi(1)}),$$
where $Z$ is the infinitesimal generator of $\varphi$. But

$$Z_{C_\xi(1)}=\exp_{*\xi}Y_\xi,$$

where $Y$ is the vector field on $T_p(M)$ which is the infinitesimal generator of the one-parameter group of rotations. Now we can choose $\beta$ arbitrarily. Therefore, $Y_\xi$ can be any vector tangent to the sphere of radius $\|\xi\|$ in $T_p(M)$. We thus conclude that

$$(\eta,C'_\xi(1))=0$$

for any $\eta$ which is tangent to $\exp S_{\|\xi\|}$, where $S_{\|\xi\|}$ is the sphere of radius $\|\xi\|$ in $T_p(M)$. (See Fig. 13.4.)

Fig. 13.4 (the map $\exp\colon T_p(M)\to M$ and the image $\exp S_{\|\xi\|}$ of a sphere)

In other words, not only are the rays through the origin orthogonal to the spheres in the Euclidean metric of $T_p(M)$, but also the image of a ray through the origin under exp is orthogonal to the image of a sphere about $p$ in the Riemann metric of $M$. In particular, we can transfer "polar coordinates" on $T_p(M)$ to $M$ so as to get "geodesic polar coordinates" on $M$. This has the following effect: Let $r$ be the "radial coordinate", that is, $r$ is the function defined in $U$ by

$$r(x)=\|\exp^{-1}x\|.$$

Then for any $x\in U$, $x\neq p$, and any $\zeta\in T_x(M)$ we have

$$\|\zeta\|\ge|\langle\zeta,dr\rangle| \tag{10.6}$$

with equality holding only if $\zeta$ is tangent to a geodesic through $p$. In fact, suppose that $\zeta\in T_x(M)$, where $x=\exp\xi$ for some $\xi\in T_p(M)$. Then we can write $\zeta=\zeta_1+\zeta_2$, where $\zeta_1$ is some multiple of $C'_\xi(1)$ and $\zeta_2$ is tangent to $\exp S_{\|\xi\|}$. By the above result, $(\zeta_1,\zeta_2)=0$, so

$$\|\zeta\|^2=\|\zeta_1\|^2+\|\zeta_2\|^2$$

and we obtain (10.6), with equality holding only if $\|\zeta_2\|=0$.

We are now in the same fortunate position we were in Section 9 of Chapter 9. Let $D$ be any curve joining $p$ to $x$, where $x=\exp\xi$. Let $t_1$ be the first time that $D(t)\in\exp S_{\|\xi\|}$. Then the length of

$$D=\int_0^1\|D'(t)\|\,dt\ge\int_0^{t_1}\|D'(t)\|\,dt\ge\int_0^{t_1}|\langle D'(t),dr\rangle|\,dt\ge\int_0^{t_1}\langle D'(t),dr\rangle\,dt=r[D(t_1)]=\|\xi\|.$$

Furthermore, equality holds only if $D'(t)$ is a nonnegative multiple of a tangent
to a fixed geodesic through $p$. Then $D$ must be the geodesic $C_\xi(\cdot)$. In short, we have proved:

Theorem 10.1. Let $\epsilon>0$ be so small that the map exp is a diffeomorphism on $B_\epsilon=\{\xi\in T_p(M):\|\xi\|<\epsilon\}$. Let $x=\exp\xi$ be a point of $U=\exp B_\epsilon$. Then the geodesic $C_\xi$ joining $p$ to $x$ has length $\|\xi\|$, and any other curve $D$ joining $p$ to $x$ is strictly longer unless $D$ differs from $C_\xi$ only in a (monotone) change of parameter.

In other words, loosely speaking, geodesics are locally the shortest curves joining two points.

Since we have come this far, let us show in addition that geodesics also locally minimize the energy. Let $D$ be any curve from $[0,1]$ to $M$. Then by Schwarz's inequality we have

$$\Bigl(\int_0^1\|D'(t)\|\,dt\Bigr)^2\le\int_0^1\|D'(t)\|^2\,dt\int_0^11\,dt=2I[D],$$

with equality holding only if $\|D'(t)\|$ is constant. If $C_\xi(\cdot)$ is the geodesic joining $p$ to $x=\exp\xi$, we thus have

$$I[D]\ge\tfrac12\Bigl(\int_0^1\|D'(t)\|\,dt\Bigr)^2\ge\tfrac12\Bigl(\int_0^1\|C'_\xi(t)\|\,dt\Bigr)^2=\tfrac12\|\xi\|^2.$$

Now equality holds in the second inequality only if $D'(t)$ is proportional to $C'(t)$, while equality holds in the first only if $\|D'(t)\|=\text{const}$. We thus conclude that $\|D'(t)\|=\|\xi\|$, that is, $D'(t)=C'(t)$. In short, we have proved:

Proposition 10.1. Under the hypotheses of Theorem 10.1 the curve $C_\xi(\cdot)$ is a strict absolute minimum for $I[C]$ among all curves $C\colon[0,1]\to M$ such that $C(0)=p$ and $C(1)=x$.

11. EULER'S EQUATIONS

Under certain circumstances, which we shall presently describe, the equations of motion take a particularly elegant form. The first special assumption that we shall make is that there is an isomorphism of $T(M)$ with $M\times V$. More precisely, we assume that there is a diffeomorphism, $\iota$, of $T(M)$ with $M\times V$ such that $\iota(\xi)=\langle m,v\rangle$, where $m=\pi(\xi)$, and for each $x\in M$ the map $\xi\mapsto v$ of $T_x(M)\to V$ is a linear isomorphism of vector spaces.

For example, if $M$ is an open subset of $V$, then the identity chart defines such an isomorphism $\iota(\xi)=\langle\pi(\xi),\xi_{\mathrm{id}}\rangle$.
A slightly less trivial example is furnished by the $n$-dimensional torus $T^n=S^1\times\cdots\times S^1$. Then we can introduce "angular variables" $\theta^1,\dots,\theta^n$, where $\theta^i$ is the angular variable on the $i$th circle. We thus obtain $n$ vector fields $\partial/\partial\theta^1,\dots,\partial/\partial\theta^n$ which are linearly independent at each point of $M$. We have a basis of $T_x(M)$ and therefore an isomorphism of $T_x(M)$ with $\mathbb{R}^n=V$ which
defines the desired map $\iota$. We shall encounter a more complicated example in the next section. We should point out that only very special kinds of manifolds admit such an isomorphism $\iota$ of $T(M)$ with $M\times V$.

For the rest of this section we shall identify $T(M)$ with $M\times V$ and $T^*(M)$ with $M\times V^*$ via the corresponding (adjoint) isomorphism. The rule which identifies each $T_x(M)$ with $V$ can be regarded as a $V$-valued linear differential form on $M$. Let us denote this form by $\omega$. In other words, the identification $\iota$ is given by $\iota(\xi)=\langle\xi,\omega_x\rangle$ if $\xi\in T_x(M)$.

We can therefore study the $V$-valued exterior two-form $d\omega$. For each pair of tangent vectors $\xi,\eta\in T_x(M)$ we obtain $\langle\eta,\xi\mathbin{\lrcorner}d\omega\rangle$ as an element of $V$. Now we can identify $\xi$ and $\eta$ with vectors of $V$. We thus obtain a $V$-valued antisymmetric bilinear form on $V$; if we call it $a_x$, we have

$$a_x(v,w)=\langle\eta,\xi\mathbin{\lrcorner}d\omega\rangle,$$

where $\langle\xi,\omega_x\rangle=v$, $\langle\eta,\omega_x\rangle=w$, and $\xi,\eta\in T_x(M)$. Note that, in general, the bilinear form $a_x$ depends on $x$.

Our second fundamental assumption about the identification $\iota$ is that $a_x$ is independent of $x$. That is, we assume that there is a $V$-valued bilinear form $a$ such that

$$\langle Y,X\mathbin{\lrcorner}d\omega\rangle=a(\langle X,\omega\rangle,\langle Y,\omega\rangle) \tag{11.1}$$

for all vector fields $X$ and $Y$ on $M$. In the examples given above (the open subset of $V$ or the torus) $d\omega=0$, so that (11.1) holds trivially. In the next section we shall come across a case where $d\omega\neq0$.

To understand (11.1) a little better, let us introduce the following notation: For any $v\in V$ let $\hat v$ be the vector field on $M$ given by $\langle\hat v,\omega\rangle_x=v$ for all $x\in M$. Then for any $v,w\in V$, we have

$$\langle\hat w,\hat v\mathbin{\lrcorner}d\omega\rangle=D_{\hat v}\langle\hat w,\omega\rangle-D_{\hat w}\langle\hat v,\omega\rangle-\langle[\hat v,\hat w],\omega\rangle.$$

Now $\langle\hat w,\omega\rangle=w$ and $\langle\hat v,\omega\rangle=v$ are constants, so the first two terms on the right disappear. Thus we can rewrite (11.1) as

$$a(v,w)=-\langle[\hat v,\hat w],\omega\rangle. \tag{11.1'}$$

For the kinetic energy of a mechanical system we need a Riemann metric on $M$. A Riemann metric on $M$ gives a scalar product on each $T_x(M)$.
This means giving a scalar product $(\ ,\ )_x$ on $V$ for each $x\in M$. Our third special assumption is that $(\ ,\ )_x$ does not depend on $x$. Thus we are given a scalar product on $V$ which gives the Riemann metric on $M$ via the identification of $T(M)$ with $M\times V$.

We wish to describe the vector field $X$ on $T^*(M)$ determined by $X\mathbin{\lrcorner}d\theta=-dH$, where $H=K+U$, $K$ being the kinetic energy of the Riemann metric and $U$ being some potential energy. Since $T^*(M)=M\times V^*$, a vector field $X$ on $T^*(M)$ can be uniquely written as $X=X_1+X_2$, where $X_1$ is tangent to $M$ and $X_2$ is tangent to $V^*$. Furthermore, we can regard $X_1$ as a $V$-valued function and $X_2$ as a $V^*$-valued
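function (identifying the tangent space to the vector space $V^*$ with $V^*$). Then

$$\langle X,\theta\rangle_{\langle x,l\rangle}=\langle X_1,l\rangle$$

at any $\langle x,l\rangle\in M\times V^*$. We should really write this as

$$\langle\,\cdot\,,\theta\rangle=\langle\langle\,\cdot\,,\pi^*\omega\rangle,\rho\rangle,$$

where $\omega$ is the form defined above and the $V^*$-valued function $\rho\colon M\times V^*\to V^*$ is the projection onto the second factor, $\rho(m,l)=l$. Then

$$\langle\,\cdot\,,X\mathbin{\lrcorner}d\theta\rangle=\langle\langle\,\cdot\,,\pi^*\omega\rangle,\langle X,d\rho\rangle\rangle-\langle\langle X,\pi^*\omega\rangle,\langle\,\cdot\,,d\rho\rangle\rangle-\langle\langle\,\cdot\,,X\mathbin{\lrcorner}\pi^*d\omega\rangle,\rho\rangle.$$

Substituting $Y=Y_1+Y_2$ and using (11.1), we obtain

$$\langle Y,X\mathbin{\lrcorner}d\theta\rangle=\langle Y_1,X_2\rangle-\langle X_1,Y_2\rangle-\langle a(X_1,Y_1),l\rangle.$$

Now $H=K+U$, where $K(x,l)=\tfrac12(l,l)$ and $U(x,l)=U(x)$. Thus $\langle Y,dH\rangle=(Y_2,l)+Y_1U$, and the equation

$$\langle Y,X\mathbin{\lrcorner}d\theta\rangle=-\langle Y,dH\rangle$$

becomes

$$\langle Y_1,X_2\rangle-\langle X_1,Y_2\rangle-\langle a(X_1,Y_1),l\rangle=-(Y_2,l)-\langle Y_1,dU\rangle, \tag{11.2}$$

which must hold for all choices of $Y$. Setting $Y_1=0$, we get

$$\langle X_1,\cdot\,\rangle=(\,\cdot\,,l). \tag{11.3}$$

In fact, the scalar product occurring in the last equation is on $V^*$. Transferring it to an equation on $V$, we get

$$(X_1,\cdot\,)=\langle\,\cdot\,,l\rangle. \tag{11.4}$$

Let $t\mapsto\langle C(t),v(t)\rangle$ be a solution curve of our system transferred to $T(M)=M\times V$ by the Riemann metric. Then this last equation says $C'(t)=v(t)$. If we set $Y_2=0$ in (11.2) and use (11.4), we get

$$\langle\,\cdot\,,X_2\rangle-(a(X_1,\cdot\,),X_1)=-\langle\,\cdot\,,dU\rangle.$$

Now for any solution $\langle C,v\rangle$ we have

$$\langle\,\cdot\,,X_2\rangle=\Bigl(\,\cdot\,,\frac{dv}{dt}\Bigr),$$

so that these last two equations can be rewritten as

$$C'(t)=v(t),\qquad\Bigl(\,\cdot\,,\frac{dv}{dt}\Bigr)=(a(v,\cdot\,),v)-\langle\,\cdot\,,dU\rangle, \tag{11.5}$$

which are known as Euler's equations.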
12. RIGID-BODY MOTION

We shall apply the results of the previous section to the study of the motion of a rigid body. For simplicity, we shall confine ourselves to the study of the motion of a rigid body with one point fixed. The more general case where the body as a whole is allowed to move can be handled by similar methods. (Frequently, by considering first the motion of the center of gravity, the more general case can be split into two parts: the motion of the center of gravity and the motion of the body relative to the center of gravity. This then reduces the problem to the one we are studying.) In order to exhibit the generality of our method, we shall consider the equations of motion of a rigid body in $\mathbb{E}^n$. Only at the end will we make use of the fact that $n=3$.

Let us fix some positive orthonormal system "drawn through the fixed point of the body". In other words, we fix some initial position $x_0=\langle b_1,\dots,b_n\rangle$ of the body. Any other position, $x_1$, of the body can be obtained from $x_0$ by a rotation: $x_1=R_1x_0$.

Let $R(t)=e^{At}$ be a one-parameter group of rotations. Then $R_1R(t)x_0=R_1R(t)R_1^{-1}x_1$ is a curve of possible positions of the body. The tangent to this curve at $x_1$ will be denoted by $A_{x_1}$. Thus $A_{x_1}\in T_{x_1}(M)$. If $x_2=R_2x_0=R_2R_1^{-1}R_1x_0=R_2R_1^{-1}x_1$, then $A_{x_2}$ is the tangent to the curve $R_2R(t)x_0=R_2R_1^{-1}R_1R(t)x_0$, so that

$$A_{x_2}=(R_2R_1^{-1})_*A_{x_1}.$$

It is clear from its definition that $A_{x_1}=0$ if and only if $A=0$. We can regard the map $A\mapsto A_{x_1}$ as a map from the space of skew adjoint linear transformations to $T_{x_1}(M)$. Let $V$ denote the space of skew adjoint linear transformations. Then since

$$\dim V=\dim M=\frac{n(n-1)}{2}$$

and the map $A\mapsto A_{x_1}$ is an injection, we conclude that it is an isomorphism. We thus have a trivialization $T(M)\to M\times V$. Consequently, we get a $V$-valued linear differential form $\omega$, as in Section 11.

Let us describe once more the meaning of $\omega_x\colon T_x(M)\to V$. If $\xi$
$\in T_x(M)$ represents an infinitesimal motion of the body, then since the body is rigid, we can regard $\xi$ as an infinitesimal rotation of the body relative to an observer situated outside the body (fixed in space). Thus $\xi$ corresponds to some $B\in V$, say. Then $\langle\xi,\omega\rangle$ is the corresponding infinitesimal rotation expressed in terms of the basis attached to the body; that is, $\langle\xi,\omega\rangle=R_1^{-1}BR_1$ if $x=R_1x_0$.

We denote by $\hat A$ the vector field $x\mapsto A_x$ corresponding to $A\in V$. Let $\varphi_t$ be the one-parameter group generated by $\hat B$ for some $B\in V$. Note that at any $x_2=R_2x_0$ we have $\varphi_tx_2=R_2e^{Bt}x_0$. Then at any $x_1=R_1x_0$ we have

$$\varphi_tR_1e^{As}x_0=R_1e^{As}e^{Bt}x_0=R_1e^{Bt}\bigl(e^{-Bt}e^{As}e^{Bt}\bigr)x_0,$$
so that

$$\varphi_{t*}\hat A_{x_1}=\bigl(e^{-Bt}Ae^{Bt}\bigr)^{\wedge}_{\varphi_tx_1}.$$

If we differentiate this equation with respect to $t$ at $t=0$, we conclude that

$$[\hat A,\hat B]=(AB-BA)^{\wedge}=-[A,B]^{\wedge},$$

so that, according to (11.1'), we have

$$a(A,B)=[A,B]=BA-AB. \tag{12.1}$$

We now show how a mass distribution on the body determines a Riemann metric on $M$. Let $p$ be a particle on the body with mass $m$. We will assume that the particle $p$ has coordinates $\langle p^1,\dots,p^n\rangle$ relative to the axes drawn on the body. Suppose that the body is at position $x=\langle b_1,\dots,b_n\rangle$. Then the particle $p$ will be situated at the point $p^1b_1+\cdots+p^nb_n\in\mathbb{E}^n$. If $R(t)=e^{At}$ is a one-parameter group of rotations, then when the body is at $R_1R(t)x$ the particle $p$ will be situated at

$$R_1\bigl[p^1R(t)b_1+\cdots+p^nR(t)b_n\bigr]=R_1R(t)p.$$

Thus the velocity of the particle $p$ as the body undergoes the motion generated by $A$ is $R_1Ap$, and the kinetic energy of the particle $p$ is $\tfrac12m\|R_1Ap\|^2=\tfrac12m\|Ap\|^2$, since $R_1$ is an orthogonal linear transformation. We define the kinetic energy of $A_x$ to be the total kinetic energy of all the particles of the body. Thus

$$\tfrac12\|A_x\|^2=\tfrac12\int_{\text{body}}m\|Ap\|^2. \tag{12.2}$$

Note that $\|A_x\|$ depends only on $A$, so that our third requirement of the last section is satisfied, provided (12.2) does indeed define a norm on $V$. (Note that the mass distribution could be such that $Ap=0$ for all $p$ in $\{p:m(p)>0\}$ does not imply that $A=0$. For example, suppose that all the mass were concentrated along a line $l$. If $A$ represents infinitesimal rotation about $l$, so that $Ap=0$ for $p\in l$, then $\|A\|=0$. However, it is clear that if the set of $p$ for which $m>0$ spans $\mathbb{E}^n$, then (12.2) defines a Riemann metric. In fact, $\|A\|=0\Rightarrow Ap=0$ for all $p$ belonging to a spanning set, and thus $A=0$.)

Let us examine the scalar product given in (12.2) a little more closely. Let $f$ denote the linear function on $\mathbb{E}^n\otimes\mathbb{E}^n$ corresponding to the scalar product on $\mathbb{E}^n$ (which is a bilinear form); in other words,

$$f(a_1\otimes b_1+\cdots+a_k\otimes b_k)=(a_1,b_1)+\cdots+(a_k,b_k).$$
Let $s$ be any element of $\mathbb{E}^n\otimes\mathbb{E}^n$. Then $s$ defines a bilinear form on $\operatorname{Hom}(\mathbb{E}^n)$ given by

$$s(A,B)=f\bigl((A\otimes B)s\bigr).$$

Note that if the tensor $s$ is symmetric, so is the corresponding bilinear form. In
our present context the scalar product (12.2) comes from the tensor $I\in\mathbb{E}^n\otimes\mathbb{E}^n$, where

$$I=\int_{\text{body}}m\,p\otimes p$$

is called the inertia tensor of the body.* Thus (12.2) can be written as

$$\|A\|^2=I(A,A).$$

* This differs slightly from that which is usually called the inertia tensor in physics texts. In terms of the coefficients $I_i$ introduced below, what is usually called $I_{ij}$ is $I_i+I_j$ in our setup.

Euler's equations in this case become

$$C'(t)=A(t)\qquad\text{and}\qquad I\Bigl(\,\cdot\,,\frac{dA}{dt}\Bigr)=I([\,\cdot\,,A],A)-\langle\hat{\,\cdot\,}_{C(t)},dU\rangle. \tag{12.3}$$

Now the tensor $I$ is symmetric. We can therefore find an orthonormal basis $e^1,\dots,e^n$ of $\mathbb{E}^n$ which diagonalizes $I$, so that

$$I=I_1e^1\otimes e^1+I_2e^2\otimes e^2+\cdots+I_ne^n\otimes e^n.$$

Let $E_{ij}$ $(i<j)$ be the antisymmetric matrix defined by

$$E_{ij}e^i=e^j,\qquad E_{ij}e^j=-e^i,\qquad E_{ij}e^l=0\quad(l\neq i,j).$$

Then $I(E_{ij},E_{kl})=0$ if $\langle i,j\rangle\neq\langle k,l\rangle$ and $I(E_{ij},E_{ij})=I_i+I_j$.

Now let us see what Eqs. (12.3) say for the case $n=3$, where we have

$$[E_{12},-E_{13}]=E_{23},\qquad[E_{12},E_{23}]=-E_{13},\qquad[E_{13},E_{23}]=E_{12}.$$

Suppose $A(t)=a_1(t)E_{23}-a_2(t)E_{13}+a_3(t)E_{12}$, and let $B=b_1E_{23}-b_2E_{13}+b_3E_{12}=\text{const}$. Then substituting $B$ into (12.3) and comparing the coefficients of $b_1$, $b_2$, and $b_3$, we get

$$(I_2+I_3)\frac{da_1}{dt}=(I_3-I_2)a_2a_3-\langle(\hat E_{23})_{C(t)},dU\rangle,$$
$$(I_1+I_3)\frac{da_2}{dt}=(I_1-I_3)a_1a_3-\langle(-\hat E_{13})_{C(t)},dU\rangle, \tag{12.4}$$
$$(I_1+I_2)\frac{da_3}{dt}=(I_2-I_1)a_1a_2-\langle(\hat E_{12})_{C(t)},dU\rangle.$$

A simple case of these equations arises when there is no potential, that is, $U=0$. First of all, suppose that the body has a spherically symmetric distribution of mass. Then $I_1=I_2=I_3$ and Eqs. (12.4) become

$$\frac{da_1}{dt}=\frac{da_2}{dt}=\frac{da_3}{dt}=0.$$

In other words, $A=\text{const}$. The motion of the body consists of a steady rotation about some axis fixed on the body. Of course, this means that for an observer in space also, the body undergoes a steady rotation about a fixed axis.

Next, let us consider the case of an axially symmetric rigid body moving freely; that is, $I_1=I_2$ and $U=0$. The equations of motion then become

$$\frac{da_1}{dt}=\kappa a_2a_3,\qquad\frac{da_2}{dt}=-\kappa a_1a_3,\qquad\frac{da_3}{dt}=0,$$

where

$$\kappa=\frac{I_3-I_2}{I_3+I_2}.$$

The solutions of these equations can be written down immediately: $a_3=s=\text{const}$ and

$$a_1=C_1\cos\kappa st+C_2\sin\kappa st,\qquad a_2=-C_1\sin\kappa st+C_2\cos\kappa st.$$

Thus for an observer situated on the body the instantaneous axis of rotation describes a circle around the axis of symmetry of the body. This motion is known as regular precession. (This motion should not be confused with the astronomical precession of the earth's axis, which is due to the gravitational action of the sun and moon.)

If no two of $I_1$, $I_2$, and $I_3$ are equal, then the equations of motion can still be solved in terms of integrals, although the expression is rather complicated. We refer the reader to any standard book on mechanics for the details.

So far we have been considering the motion with no potential. If $U\neq0$, then Eqs. (12.4) are usually not so easy to solve. Let us treat the case of a symmetrical top; that is, $I_1=I_2$ and $U$ is given by the gravitational potential. In order to solve this problem it is convenient to use the Euler angles of astronomy as the local coordinates on $M$. In order to avoid confusion, we reproduce the definitions here:

We let $\delta_x$, $\delta_y$, $\delta_z$ be the basis vectors of $\mathbb{E}^3$ corresponding to the rectangular coordinates $x$, $y$, $z$, where we take the origin of $\mathbb{E}^3$ to be the fixed point of the body. We may assume that the center of mass of the body is distinct from the fixed point; otherwise there will be no gravitational force acting on the body.
It is easy to see that the center of mass $c$ lies on the axis of symmetry. We shall assume that the vectors drawn on the body are such that $b_3$ points in the direction from the fixed point to the center of mass, i.e., from $0$ to $c$. Then
$0 < \theta < \pi$ is defined to be the angle between $b_3$ and $\delta_z$:
$$\cos\theta = (b_3, \delta_z).$$
The line of nodes is the intersection of the plane spanned by $b_1$ and $b_2$ with the $xy$-plane. (Thus, in order for it to be defined, we must restrict to the open set $b_3 \neq \pm\delta_z$.) We define the unit vector $n$ along the line of nodes to be the one that makes $\delta_z, b_3, n$ a positive (right-handed) basis of $E^3$.

Fig. 13.5

The angle $0 < \psi < 2\pi$ is defined to be the angle that $n$ makes with $\delta_x$, and $0 < \varphi < 2\pi$ is defined to be the angle that $n$ makes with $b_1$.

We now wish to find the transformation relating the basis of $T_x(M)$ given by $\partial/\partial\theta$, $\partial/\partial\psi$, $\partial/\partial\varphi$ with the orthogonal basis $E_{12}$, $-E_{13}$, $E_{23}$ introduced earlier. Suppose that $x$ has the coordinates $\langle\theta, \psi, \varphi\rangle$. (See Fig. 13.5.) Now $(\partial/\partial\theta)_x$ represents an infinitesimal rotation about the line of nodes and $n = (\cos\varphi)b_1 - (\sin\varphi)b_2$, so that
$$\left(\frac{\partial}{\partial\theta}\right)_x = (\cos\varphi)E_{23} - (\sin\varphi)(-E_{13}).$$
The vector $(\partial/\partial\varphi)_x$ represents infinitesimal rotation about $b_3$, so that
$$\left(\frac{\partial}{\partial\varphi}\right)_x = E_{12}.$$
Finally, $(\partial/\partial\psi)_x$ represents infinitesimal rotation about $\delta_z$. Now $(\delta_z, b_3) = \cos\theta$. Furthermore, since $(\delta_z, n) = 0$, the projection of $\delta_z$ onto the plane spanned by $b_1$ and $b_2$ must still be orthogonal to $n$, since $n$ lies in that plane. It is therefore easy to check that this projection lies along $(\sin\varphi)b_1 + (\cos\varphi)b_2$. Thus
$$\delta_z = \sin\theta\,[(\sin\varphi)b_1 + (\cos\varphi)b_2] + (\cos\theta)b_3,$$
and therefore
$$\left(\frac{\partial}{\partial\psi}\right)_x = (\sin\theta\sin\varphi)E_{23} + (\sin\theta\cos\varphi)(-E_{13}) + (\cos\theta)E_{12}.$$
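If $\xi \in T_x(M)$ is given by
$$\xi = \dot\theta\left(\frac{\partial}{\partial\theta}\right)_x + \dot\psi\left(\frac{\partial}{\partial\psi}\right)_x + \dot\varphi\left(\frac{\partial}{\partial\varphi}\right)_x,$$
we have
$$2K(\xi) = \|\xi\|^2 = \dot\theta^2\big((\cos^2\varphi)(I_2 + I_3) + (\sin^2\varphi)(I_1 + I_3)\big) + \dot\varphi^2(I_1 + I_2) + 2\dot\varphi\dot\psi(I_1 + I_2)\cos\theta + 2\dot\theta\dot\psi(I_2 - I_1)\sin\theta\sin\varphi\cos\varphi + \dot\psi^2\big((\sin^2\theta)[(I_2 + I_3)\sin^2\varphi + (I_1 + I_3)\cos^2\varphi] + (I_1 + I_2)\cos^2\theta\big).$$
Now, by assumption, $I_1 = I_2$. Let us set $M_1 = I_1 + I_3 = I_2 + I_3$ and $M_2 = I_1 + I_2$. Then the above expression simplifies to
$$K(\theta, \psi, \varphi; \dot\theta, \dot\psi, \dot\varphi) = \tfrac12 M_1(\dot\theta^2 + \dot\psi^2\sin^2\theta) + \tfrac12 M_2(\dot\varphi + \dot\psi\cos\theta)^2. \tag{12.5}$$
Let us now obtain the expression for the potential energy. It is proportional to the height of the center of gravity if we assume (as we shall) a uniformly vertical gravitational field. Thus
$$U(\theta, \psi, \varphi) = k\cos\theta, \tag{12.6}$$
where $k = mg\|c\|$, $m$ is the total mass of the body, $g$ is the acceleration of gravity, and $\|c\|$ is the distance of the center of gravity from the fixed point. Thus the Lagrangian is given by
$$L(\theta, \psi, \varphi; \dot\theta, \dot\psi, \dot\varphi) = \tfrac12 M_1(\dot\theta^2 + \dot\psi^2\sin^2\theta) + \tfrac12 M_2(\dot\varphi + \dot\psi\cos\theta)^2 - k\cos\theta. \tag{12.7}$$
Note that $\partial/\partial\varphi$ and $\partial/\partial\psi$ are both infinitesimal isometries of the Riemann metric which leave $U$ invariant. Thus the corresponding momenta are constants of the motion. In other words, for any motion of the system we have
$$P_{\partial/\partial\psi} = C_1 \quad\text{and}\quad P_{\partial/\partial\varphi} = C_2,$$
where $C_1$ and $C_2$ are constants. But
$$P_{\partial/\partial\psi} = \frac{\partial L}{\partial\dot\psi} = M_1\dot\psi\sin^2\theta + M_2\cos\theta\,(\dot\varphi + \dot\psi\cos\theta) = C_1$$
and
$$P_{\partial/\partial\varphi} = \frac{\partial L}{\partial\dot\varphi} = M_2(\dot\varphi + \dot\psi\cos\theta) = C_2.$$
Solving these equations gives
$$\dot\varphi + \dot\psi\cos\theta = \frac{C_2}{M_2}, \qquad M_1\dot\psi\sin^2\theta = C_1 - C_2\cos\theta. \tag{12.8}$$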
Let us substitute (12.8) into the expression for the energy,
$$E = K + U = \tfrac12 M_1(\dot\theta^2 + \dot\psi^2\sin^2\theta) + \tfrac12 M_2(\dot\varphi + \dot\psi\cos\theta)^2 + k\cos\theta,$$
to get, for a given value of $C_1$ and $C_2$, the expression
$$\frac{C_2^2}{2M_2} + \frac12 M_1\dot\theta^2 + \frac{(C_1 - C_2\cos\theta)^2}{2M_1\sin^2\theta} + k\cos\theta = \text{const}. \tag{12.9}$$
Thus, just as in our treatment of the central-force problem, for a fixed value of $P_{\partial/\partial\varphi}$ and $P_{\partial/\partial\psi}$, the motion of $\theta$ is determined by a one-dimensional mechanical system whose energy is given by (12.9). After solving this mechanical system for $\theta$, we can then obtain $\psi(\cdot)$ and $\varphi(\cdot)$ by integrations from (12.8).

In order to obtain qualitative information about the behavior of the solutions $\theta(\cdot)$, we can apply the method of Section 5 to our one-dimensional mechanical system. Note that if $C_1 \neq \pm C_2$ (as would be the case if the body were "spinning fast"), the kinetic energy tends to infinity as $\theta \to 0$ and as $\theta \to \pi$. We therefore conclude that $\theta$ oscillates between two values $0 < \theta_1 \le \theta \le \theta_2 < \pi$. In other words, the axis of symmetry of the body executes a periodic up and down motion (called nutation).

As $\theta$ oscillates, $\dot\psi$ satisfies (12.8). Let us graph the curve that $b_3$ traces out on the unit sphere. If $C_1 - C_2\cos\theta > 0$ for $\theta_1 \le \theta \le \theta_2$, then $\dot\psi > 0$ although it oscillates in magnitude. The motion is thus as given in Fig. 13.6. Another possibility is that $C_1 - C_2\cos\theta_1 < 0$ and $C_1 - C_2\cos\theta_2 > 0$, so that $\dot\psi$ is negative near $\theta = \theta_1$ and positive near $\theta = \theta_2$. In this case the average of $\dot\psi$ over a period is still positive, so that the motion is as shown in Fig. 13.7.

Fig. 13.6   Fig. 13.7   Fig. 13.8

A limiting case is where $C_1 - C_2\cos\theta_1 = 0$, where the motion of $b_3$ is as shown in Fig. 13.8. (This is the case that arises if the axis of a spinning top is held fixed at some position $\theta$, $\psi$ and then allowed to fall.) The motion of the axis of the body around the $z$-axis in all these cases is called precession. It should be remembered that all this time the top is spinning about its axis with constant angular momentum $C_2$.
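The qualitative argument above can be tried out numerically. The following is only a sketch under stated assumptions: it takes the $\theta$-dependent part of (12.9) to be $(C_1 - C_2\cos\theta)^2/(2M_1\sin^2\theta) + k\cos\theta$, and it locates the two turning points $\theta_1 < \theta_2$ of the nutation for a given energy level by bisection. The function names, the grid size, and the sample constants are illustrative choices, not part of the text.

```python
import math


def effective_potential(theta, c1, c2, m1, k):
    """Theta-dependent part of the energy (12.9), without the kinetic term."""
    s = math.sin(theta)
    return (c1 - c2 * math.cos(theta)) ** 2 / (2.0 * m1 * s * s) + k * math.cos(theta)


def bisect_root(g, lo, hi, tol=1e-13):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0.0) == (g_lo > 0.0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nutation_limits(level, c1, c2, m1, k, samples=4000):
    """Turning points theta1 < theta2 where the effective potential equals level.

    Assumes c1 != +-c2, so that the potential blows up at theta = 0 and pi.
    """
    def v(theta):
        return effective_potential(theta, c1, c2, m1, k)

    # Coarse scan for the bottom of the potential well on (0, pi).
    grid = [math.pi * i / samples for i in range(1, samples)]
    theta_min = min(grid, key=v)
    if v(theta_min) >= level:
        raise ValueError("level lies below the minimum of the effective potential")

    def g(theta):
        return v(theta) - level

    eps = 1e-9
    return bisect_root(g, eps, theta_min), bisect_root(g, theta_min, math.pi - eps)
```

For instance, `nutation_limits(level, 2.0, 3.0, 1.0, 1.0)` with a level slightly above the bottom of the well returns the two angles between which the axis of the top nutates; here `level` stands for the constant in (12.9) minus $C_2^2/(2M_2)$.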
13. SMALL OSCILLATIONS

Suppose we are given a mechanical system on a manifold $M$ with energy $H = K + U$. Suppose that the "force field" $dU$ vanishes at some $x_0 \in M$. Then the constant curve $C(t) \equiv x_0$ is a trajectory of the system. In fact, let us choose a chart $(W, \alpha)$ with coordinates $x^1, \dots, x^n$ such that $\alpha(x_0) = \langle 0, \dots, 0\rangle$. Then in the corresponding coordinates $\langle q^1, \dots, q^n, p_1, \dots, p_n\rangle$ on $\pi^{-1}(W)$ we have at the point $\langle 0, \dots, 0, \dots, 0\rangle$
$$\frac{\partial H}{\partial q^i} = \frac{\partial U}{\partial x^i} = 0 \quad\text{and}\quad \frac{\partial H}{\partial p_i} = \frac{\partial K}{\partial p_i} = 0.$$
It is therefore natural to expect that for small initial values of $q$ and $p$ and for small intervals of time, the solutions of the system should be well approximated by the following linear system: Replace the potential energy,
$$U(x^1, \dots, x^n) = \tfrac12\sum a_{ij}x^i x^j + U_3$$
[where $U_3 = O(\|x\|^3)$; that is, $U_3$ vanishes to third order at $x = 0$], by the quadratic potential energy,
$$U_2(x) = \tfrac12\sum a_{ij}x^i x^j,$$
and replace the kinetic energy corresponding to the given Riemann metric,
$$K(q, \dot q) = \tfrac12\sum g_{ij}(q)\,\dot q^i\dot q^j,$$
by the one corresponding to the Euclidean metric at $x_0$,
$$K_2(q, \dot q) = \tfrac12\sum g_{ij}(0, \dots, 0)\,\dot q^i\dot q^j.$$
We thus obtain a mechanical system $H_2$ whose corresponding equations (5.1) are actually linear. [The reader should check, as an exercise, that these equations are exactly the equations of variation (introduced in Section 2) of the vector field $X_{dH}$ along the curve $C(t) = \langle 0, \dots, 0\rangle$ in $T^*(M)$.]

Of course, as time increases, the values of $q^i$ and $p_i$ might become quite large and the linear approximations useless. However, under certain circumstances we can guarantee that $q^i$ and $p_i$ remain small for all time. In fact, suppose that the quadratic form $\sum a_{ij}x^i x^j$ is positive definite. Then $U$ has a strict minimum at $x_0$, say $U(x_0) = 0$. In particular, if we start at $x_0$ with a kinetic energy $K = E$, where $E$ is sufficiently small, then $x$ will be restricted to the neighborhood of $x_0$ defined by $U(x) \le E$, and the momenta will be restricted by the condition $K \le E$, since, by the conservation of energy, $K + U = E$. (See Fig. 13.9.)
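The confinement argument can be observed on a small numerical experiment. The sketch below is not from the text: it assumes a one-dimensional system with unit mass and the pendulum potential $U(x) = 1 - \cos x$, whose quadratic part is $U_2(x) = x^2/2$, and it integrates both the full and the linearized equations with a leapfrog scheme (an arbitrary choice of integrator). All names and step sizes are illustrative.

```python
import math


def potential(x):
    """Assumed example potential U(x) = 1 - cos x, with strict minimum U(0) = 0."""
    return 1.0 - math.cos(x)


def leapfrog(q, p, force, dt, steps):
    """Integrate dq/dt = p, dp/dt = force(q); return the list of (q, p) states."""
    path = [(q, p)]
    for _ in range(steps):
        p_half = p + 0.5 * dt * force(q)
        q = q + dt * p_half
        p = p_half + 0.5 * dt * force(q)
        path.append((q, p))
    return path


def compare(energy=0.01, dt=0.001, steps=20000, short_steps=1000):
    """Start at x0 = 0 with kinetic energy `energy`; compare full and linear flows."""
    p0 = math.sqrt(2.0 * energy)
    full = leapfrog(0.0, p0, lambda q: -math.sin(q), dt, steps)  # force = -U'(q)
    linear = leapfrog(0.0, p0, lambda q: -q, dt, steps)          # force = -U_2'(q)
    max_potential = max(potential(q) for q, _ in full)
    gaps = [abs(a[0] - b[0]) for a, b in zip(full, linear)]
    return max_potential, max(gaps[:short_steps]), max(gaps)
```

Calling `compare()` returns the largest value of $U$ reached along the full trajectory, which stays at or below the initial energy up to integration error, together with the largest gap between the two trajectories over a short initial interval and over the whole run.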
Thus the $q^i$ and the $p_i$ will remain small. This does not mean that the solutions to the original mechanical system with Hamiltonian $H$ will remain close to one fixed solution curve of the linearized system. It does mean that for any short interval of time (that is, a short interval near any time in the future), the trajectory will be close to some trajectory of the linearized system. It is therefore important to study the behavior of such mechanical systems.

Fig. 13.9

We are thus interested in the following kind of mechanical system: The configuration space $M$ is a vector space. The Riemann metric is given by a Euclidean metric on the manifold. The potential energy is a positive definite quadratic form on this vector space. Let us choose rectangular coordinates $x^1, \dots, x^n$ with respect to the Euclidean metric. Thus
$$K(\dot q) = \tfrac12\big[(\dot q^1)^2 + \cdots + (\dot q^n)^2\big]$$
and the map $\mathcal{L}$ is given by
$$p = \dot q.$$
Thus
$$L(q, \dot q) = K(\dot q) - U(q)$$
and Lagrange's equations become
$$\frac{dq^i}{dt} = p_i, \qquad \frac{dp_i}{dt} = -\sum_j a_{ij}q^j. \tag{13.1}$$
(Of course, these are just the Euler equations (11.15) for the case at hand where $a = 0$.)

Equations (13.1) can be written more suggestively as follows: Let $A$ be the linear transformation whose matrix is $(a_{ij})$. Stated invariantly, $A$ is the unique self-adjoint linear transformation such that
$$U(x) = \tfrac12(Ax, x). \tag{13.2}$$
Then the trajectories $v(\cdot)$ of the system are the solutions of the second-order differential equations
$$\frac{d^2v}{dt^2} = -Av. \tag{13.3}$$
To find the actual solutions of (13.1) and (13.3) we apply Theorem 3.1 of Chapter 5. According to that theorem, if $M$ is finite-dimensional, we can find an orthonormal basis $e_1, \dots, e_n$ of eigenvectors of $A$. In other words, if $z^i$ are the
rectangular coordinates corresponding to this basis,
$$U(z^1, \dots, z^n) = \tfrac12\big[\lambda_1^2(z^1)^2 + \cdots + \lambda_n^2(z^n)^2\big]$$
and Eqs. (13.1) and (13.3) become
$$\frac{d^2z^i}{dt^2} = -\lambda_i^2 z^i.$$
Thus the general solution of (13.3) is given by
$$v(t) = (a_1\cos\lambda_1 t + b_1\sin\lambda_1 t)e_1 + \cdots + (a_n\cos\lambda_n t + b_n\sin\lambda_n t)e_n, \tag{13.4}$$
where the constants $a_i$ and $b_i$ are determined by
$$v(0) = a_1e_1 + \cdots + a_ne_n \quad\text{and}\quad \dot v(0) = \lambda_1b_1e_1 + \cdots + \lambda_nb_ne_n.$$
Thus the general motion is a superposition of independent oscillations, the frequency of each oscillation being determined by the eigenvalues $\{\lambda_i^2\}$ of $A$. That is why the mechanical system (13.1) is called a system of "small oscillations".

14. SMALL OSCILLATIONS (Continued)

So far, we have been considering the linearized equations (13.1) as an approximation to a finite-dimensional mechanical system. The philosophy has been that the solutions to the actual mechanical system exist, but are hard to find. We use (13.3) as good approximating equations.

Fig. 13.10

It turns out that the method has more extensive applications, even to the case of infinite-dimensional systems where the very existence of solutions to the "actual" mechanical systems may be difficult to establish. Let us illustrate by the mechanical system consisting of a stretched string which is held fixed at two endpoints. For simplicity in illustration, we shall assume that the string is restricted to move in the $xy$-plane, although this is in no way essential to the argument. Let us assume that the string is homogeneous and that the two fixed points are $(0, 0)$ and $(1, 0)$. Then the configuration space should be all possible smooth curves joining $(0, 0)$ to $(1, 0)$. Thus the curve shown in Fig. 13.10 would be a possible element of our configuration space. In some sense,
the configuration space is an "infinite-dimensional manifold" and with suitable work this idea can be made precise. However, what is of interest to us is behavior near the "equilibrium" curve $C(t) = (t, 0)$. For such curves we will have $dx/dt > 0$, and so the curve can be described by using $x$ as independent variable, i.e., by giving a function $u(x)$. In other words, we are replacing the big configuration space by the approximating vector space $V$ of all functions $u$ of one variable, with $u(0) = u(1) = 0$. Thus $V$ is regarded as the "tangent space" to our system, in the sense that $u$ is the "tangent vector" to the curve $C_s(\cdot)$, where $C_s(\tau) = (\tau, su(\tau))$. (Remember that our configuration space is a collection of curves, so that a curve in our configuration space is a one-parameter family of curves.)

Now we expect the "kinetic energy" of $\dot u$ to be the total of the kinetic energy of all the particles on the string. The particle at $\tau$ has velocity $\dot u(\tau)$ and therefore kinetic energy $\tfrac12 m\,\dot u(\tau)^2$. If we assume that the mass density is constant, we thus get
$$K_2(\dot u) = \tfrac12 m\int_0^1 \dot u^2\,dx$$
as the expression for the kinetic energy. This makes $V$ into a pre-Hilbert space as in Section 6 of Chapter 6.

We expect that the potential energy depends on the stretching of the string, i.e., that it is some function of the length:
$$U(C) = F\left(\int_0^1 \|C'(t)\|\,dt\right) = F(L),$$
where $L$ is the length of the curve and $F$ is some smooth function with $F(1) = 0$. For curves parametrized by $x$, the length is given by
$$\int_0^1 \sqrt{1 + \left(\frac{du}{dx}\right)^2}\,dx.$$
Now $\sqrt{1 + d^2} = 1 + \tfrac12 d^2 + \text{higher-order terms}$. Thus using a Taylor expansion for $F$,
$$F(L) = F'(1)(L - 1) + \text{quadratic terms in } (L - 1)$$
and
$$L - 1 = \frac12\int_0^1 \left(\frac{du}{dx}\right)^2 dx + \text{higher-order terms},$$
we see that $U_2$, the quadratic approximation to $U$, is given by
$$U_2(u) = \frac{c}{2}\int_0^1 \left(\frac{du}{dx}\right)^2 dx, \qquad\text{where } c = F'(1).$$
By analogy with (13.3) we expect that the "small oscillations" of the string are solutions $u_t(\cdot)$ of the equations
$$\frac{d^2u_t}{dt^2} = -Au_t,$$
where $A$ is the self-adjoint linear operator such that
$$\tfrac12(Au, u) = \frac{c}{2}\int_0^1 \left(\frac{du}{dx}\right)^2 dx.$$
Now
$$(Au, u) = m\int_0^1 u\,(Au)\,dx.$$
Since $u(0) = u(1) = 0$, we have, by integration by parts,
$$\int_0^1 \frac{du}{dx}\,\frac{du}{dx}\,dx = -\int_0^1 u\left(\frac{d^2u}{dx^2}\right)dx,$$
so that
$$Au = -\frac{c}{m}\,\frac{d^2u}{dx^2}.$$
Equation (13.3) thus becomes
$$\frac{\partial^2u_t}{\partial t^2} = \frac{c}{m}\,\frac{\partial^2u_t}{\partial x^2}, \qquad u_t(0) = u_t(1) = 0. \tag{14.1}$$
Note that we have derived (14.1) by reasoning by analogy. We have not formulated the actual (nonlinear) infinite-dimensional mechanical system, nor have we any guarantee that there is one that can be solved. Furthermore, we don't actually know the function $F$. We only need to know its form and the value of $F'(1)$. Nevertheless, Eq. (14.1) gives a good explanation of observed physical phenomena.

The solution of (14.1) proceeds just as in the finite-dimensional case due to the fact that the operator $A$ with the boundary conditions $u(0) = u(1) = 0$ is a Sturm-Liouville system, so the results of Sections 6 and 7 of Chapter 6 apply. In fact, we can choose the functions $u_n(x) = \sin(n\pi x)$ as an orthogonal basis of eigenvectors of $A$, where $u_n$ has the eigenvalue $n^2(c/m)\pi^2$. Thus the general solution is given by
$$u_t(x) = (a_1\cos\alpha t + b_1\sin\alpha t)\sin(\pi x) + (a_2\cos 2\alpha t + b_2\sin 2\alpha t)\sin(2\pi x) + \cdots,$$
where $\alpha = (c/m)^{1/2}\pi$. In other words, the general solution is a superposition
(linear combination) of the "harmonics"
$$\sin n\alpha t\,\sin n\pi x, \qquad \cos n\alpha t\,\sin n\pi x.$$
These can be regarded as "standing waves", as, for example, in Fig. 13.11.

Fig. 13.11

As another illustration of this method, let us consider the "vibrating membrane" in $n$ dimensions. Here we are given a domain $D$ with almost regular boundary in $E^n$. We consider a stretched membrane in $E^{n+1} = E^n \times E^1$ which is fastened along $\partial D$ in $E^n \times \{0\}$. Again, as our linear approximation $V$ to the configuration manifold, we take the space of all functions $u$ on $D$ which vanish on $\partial D$. To be precise, we let $V$ be the space of functions which are defined and of class $C^2$ in some neighborhood of $\bar D$, and vanish on $\partial D$. Thus the membrane is the surface in $E^{n+1}$ whose points are of the form
$$\langle x^1, \dots, x^n, u(x^1, \dots, x^n)\rangle,$$
where $\langle x^1, \dots, x^n\rangle \in D$. As before, we define the kinetic energy as
$$K(\dot u) = \tfrac12 m\int_D \dot u^2,$$
while the potential energy is to be some function of the total volume (area) which vanishes at $\mu(D)$. Now the total volume (area) of the hypersurface is given by (see Exercise 4.3, Chapter 10)
$$\int_D \sqrt{1 + \sum_i \left(\frac{\partial u}{\partial x^i}\right)^2}.$$
Thus, as before,
$$U_2(u) = \frac{c}{2}\,D[u, u],$$
where $D$ is the Dirichlet integral introduced in Section 11 of Chapter 12. By Green's formula we have (since $u = 0$ on $\partial D$)
$$D[u, u] = -\int_D u\,\Delta u.$$
Thus the operator $A$ of (13.3) is given by $-(c/m)\Delta$, and Eq. (13.3) becomes
$$\frac{\partial^2u_t}{\partial t^2} = \frac{c}{m}\,\Delta u_t. \tag{14.2}$$
13.14 SMALL OSCILLATIONS (CONTINUED) 557 Note that this is exactly (14.1) if n — 1. In order to solve (14.2), we must find a complete set of eigenvalues for (14.2). If {un} is such a basis, where X^ is the eigenvalue associated to Uij then the general solution of (14.2) would be given by y^tix) = X) (O^n COS XJ + hn Slli \nt)Un{x). The problem of showing that —A has a complete set of eigenvectors is somewhat more difficult for n > 1, and will be discussed in the exercises. EXERCISES In order to study the eigenvalue problem it is convenient to replace the space of C^-functions vanishing on ^D by the space Hi (where we refer back to page 361 for the definition of the spaces H^.) To show that this replacement is legitimate, we have: 14.1 Show that if </? is a function which is C^ in a neighborhood of D, then ip G H^ if and only if ip vanishes on dD. 14.2 Let the operator K be defined as Kf = (1 — A)/ for smooth / and extended as in Section 14 of Chapter 8 to a map of Hg —> Hs-2' Let g G Hq. Since |(^, v)\ < ll^llollHIo < ll^llollHli for any v G H?, conclude that there is a bounded linear map L from Hq to H^ satisfying (Lg,v)i = {g,v)o for all vGH^. 14.3 Let/ and g be locally integrable functions on D. By this we mean that/</? and gcp are integrable for every test function (p E Co(D). We say that the differential equation Kf = g holds weakly on D if (/, Kcp) = (g, cp) for all test functions (pECo(D). Show that this generalizes the notion of being a solution by proving the following lemma. Lemma. If the functions / and g are respectively in the classes C^{D) and C^(D), then the equation Kf = g holds in the weak sense if and only if it holds in the classical sense. In proving this lemma you may assume that if /i is a continuous function on D such that johip = 0 for every test function </?, then h = 0. 14.4 Prove that the operator L of Exercise 14.2 is a right inverse of K in the weak sense. That is, show that \i g E. Hq and/ = Lg, then Kf = —g weakly on D. 
14.5 We now want to show that / is actually in C^{D) if g is suitably smooth. Roughly speaking, what we want to do is to "round off"/ and g near the boundary of C in such a way that we can consider the adjusted functions to be defined on the whole of U^, and can then apply Exercises 14.25 and 14.30 of Chapter 8. Our rounding off process will simply be multiplication by an arbitrary but fixed function \l/mCo{D). Prove, to begin with, that multiplication by such a i/' is a bounded linear mapping of Hg into itself for any s. 14.6 We know that Dj = d/dxj is a bounded linear mapping from Hg to Hg—i for every s. Combine this fact with the result of the above exercise to show that \i Kf = g
558 CLASSICAL MECHANICS 13.15 weakly on D, and if xp is any fixed element of Co(D), then there is a differential operator R of order 1 defined on the whole of U^ such that 1) K{\l/f) = \l/g+ Rg weakly on D; 2) h^-^ Rhis & bounded linear mapping from Hs to Hg-i for every s. In order to consider R to be defined on IR^ we have to extend rp to R^ in an obvious way. The proof is essentially an integration by parts. 14.7 We say that a function h defined on D is locally in Hg if (fh G Hg (when extended to U^) for every (p E:Co{D). Use the above exercise to prove the following lemma: Lemma. Suppose that Kf = g weakly in D, that / G Hj locally in D, and that g ^ Hm locally in D. Then/G //min(m+2,y+i) locally in D. [Hint: In order to prove this crucial lemma, show first that the weak differential equation of Exercise 14.6 holds for all test functions cp on IR^. The crux of the matter is that there is a function X EiCoiD) such that X = 1 on the support of yp. We extend X to R^ as above, and then for each test function (p on U^ we write cp =Xcp+ (1 -X)cp, where Xcp E:Cq{D). Now use the fact that the test functions cp on IR^ are dense in Hg for every s, that ypf G Hj, and that ypg G Hm-] 14.8 Suppose now that g G Hq^ that/ = Lg G Hi (Exercise 14.2), and that kf = g weakly on D (Exercise 14.4). Apply the above lemma repeatedly to show that if g G Hm locally in D, then / G Hm+2 locally in D. Conclude from Sobolev's lemma that if m > n/2 + j and g E Hm locally in D, then / = Lg E 0^^{D). 14.9 Show that \\Lg\\i < \\g\\o and conclude from Exercise 14.31 of Chapter 8 that if we regard L as an operator from Hq to Hq, it is compact, and all of its eigenvectors belong to Hf. Use Exercise 14.8 to show that every eigenvector belongs to C°^{D). 15. CANONICAL TRANSFORMATIONS In Sections 1 through 5 we formulated the notion of a mechanical system as a flow of a certain type on the cotangent bundle of the configuration space. 
The defining equations for the vector field $X$ generating this flow were $X \lrcorner\, \Omega = -dH$. Thus the basic property of the cotangent bundle used in singling out the class of flows is the existence of the two-form $\Omega$. It turns out that in studying the equations of mechanics it is sometimes convenient to forget that the flow is on the cotangent bundle and to concentrate on the form $\Omega$. For instance, we may be able to introduce charts that don't arise from the configuration manifold but in terms of which the vector field $X$ takes a particularly simple form.

We shall therefore want to consider a manifold $N$ which carries a two-form $\Omega$, subject to certain restrictions which we shall describe below. On such manifolds we shall study vector fields satisfying $X \lrcorner\, \Omega = -dH$. It will be convenient to allow $H$ to depend on the time $t$, as well as being a function on $N$, so that $X$ will be a time-dependent vector field. The reason for this is twofold. First of all, it allows
the consideration of "nonconservative" mechanical systems. Secondly, even in the study of the systems we have introduced so far, it is sometimes convenient to make a time-dependent change of coordinates to simplify the equations. This has the effect of changing a time-independent vector field into a time-dependent one. Now to the definitions:

Definition. A manifold $N$ is said to possess a Hamiltonian structure (or to be a Hamiltonian manifold) if there is an exterior two-form $\Omega$ defined on $N$ such that
i) $d\Omega = 0$; and
ii) $\Omega$ is of maximal rank in the sense that (4.3) holds.

Remarks
a) As we have seen, if $N = T^*(M)$, then $N$ is a Hamiltonian manifold where $\Omega$ is given by (4.1).
b) If $N$ is finite-dimensional, then it must be even-dimensional. In fact, condition (ii) says that $\Omega$ restricted to each tangent space is an antisymmetric bilinear form which is nonsingular. This can happen on a vector space only if it is even-dimensional.
c) It can be proved that if $N$ is a finite-dimensional Hamiltonian manifold, then one can always find local coordinates $q_1, \dots, q_n, p_1, \dots, p_n$ such that
$$\Omega = \sum dp_i \wedge dq_i.$$
[We know this to be the case if $N = T^*(M)$.] The point of this result (which we shall not prove here) is that locally all Hamiltonian manifolds of the same finite dimension look alike.

We shall now single out a class of vector fields on $N \times \mathbb{R}$.

Definition. The vector field $X$ is a Hamiltonian vector field if there is a function $H = H_X$ on $N \times \mathbb{R}$ such that
$$Xt = \langle X, dt\rangle \equiv 1, \tag{15.1}$$
where $t$ is the standard coordinate on $\mathbb{R}$, regarded as a function on $N \times \mathbb{R}$, and
$$X \lrcorner\, (\pi^*\Omega - dH \wedge dt) = 0, \tag{15.2}$$
where $\pi$ is the projection of $N \times \mathbb{R}$ onto $N$: $\pi(x, t) = x$.

Note that $H$ is determined up to a function of $t$ alone. Let us set $\omega = \pi^*\Omega$, so that (15.2) can be written as
$$X \lrcorner\, (\omega - dH \wedge dt) = 0. \tag{15.2'}$$
560 CLASSICAL MECHANICS 13.15 t f iw^O (It Fig. 13.12 If we consider the direct sum decomposition of the tangent space of iV X IR, then condition (15.1) says that we can write X (-•i)' where X is a time-dependent vector field on N; that is, X is a rule which assigns a tangent vector X(x, t) G Tx(N) to each x and t. Since co does not involve dt, we can write X J CO = X(-, t) J^ at any time t. (15.3) [Strictly speaking, this equality should be written as follows: Let it: N ^^ N X U be the map defined by it{x) = (x, t). Then Also, i*(XJco) = X(., 0 J^.] dH (15.30 (X, 6^^> = <X(-, 0, ^iy(-, 0) + -^ at any time t. Thus (15.2) can be split up into two equations. When we compare the terms not involving dt, we obtain Thus X{',t) JQ= -dH{', t) for every fixed t. X J CO = -dH + ^ dt (15.2a) (15.2b) Note that (15.2a) is just the condition stated at the beginning of Section 5 with the novelty that H (and therefore X) can now depend on time. Definition. A diffeomorphism, ^, of iV X IR ^ iV X IR is called a canonical transformation if i) ^*(co) = CO — dW A dt, where W = W-^ is some function depending on Jp] and
13.15 CANONICAL TRANSFORMATIONS 561 X Fig. 13.13 ii) Tp is time-preserving, i.e., 7p has the form ^(x, () — (<p{Xj 0, 0? where <^(-, 0 is a diffeomorphism of N for each t. Observe that if ^ is a canonical transformation, then so is ^~^ and TF^„i= -{ip^)*W^. (15.4) Also observe that if cp and ip are canonical transformations, then so is i?' o ^, where Wj^^ = ^*W^ + Wj, (15.5) These facts follow directly from the definitions and will be left as exercises for the reader. We note next that if ^ is a canonical transformation and Z is a Hamiltonian vector field, then ^*(X) is also a Hamiltonian vector field. In fact, <P*X J [co - d{W^ + cp*H) A dt] = cp*X J [^*co - ip*(dH A dt)] = <P*[X Jic^-dH A dt)] = 0. Thus we may take H^*x as H^*x -=W^+ <p*H, (15.6) Let Z be a Hamiltonian vector field, and let 7p be the map of iVxK^iVX U obtained by letting the system evolve from time t == 0 according to the flow generated by X. That is, let the map <p(-, t) be defined so that the curve t H-» (p(x, t) is a solution curve to the (time-dependent) vector field X which passes through x at time ^ = 0. To put it another way, the curve 11-^ (<^(x, t), t) is the solution curve to the vector field X which passes through (x, 0) at time zero. (See Figs. 13.12 and 13.13.) Note that it follows from the definition of ^ that ^*(l) ^<p(X,t)' We claim that ^ is a canonical transformation. In fact, it {<P co) = (Tp o ^) TT Q = (p{', t)*Q^
562 CLASSICAL MECHANICS 13.15 since Tp o {^(Xj t) = ((p{x, t)j t). But d §- i<p(', s)*12) = <^(., s)*Dx,.,n^ = 0 ds by (15.2a). Thus it cp 0) = 12, or, in other words, <^*co is of the form o) + 6 A dt To determine 6 it suffices to take the interior product with d/dt, since co doesn't involve dt. But (f^ J <p*7r''n\ = cp* U^ (j\ J 7r*12l =<p*(Xj 7r*12) = <p* (-dH + ^-§dt\. so 6 = 7p* dH. Thus ^ is a canonical transformation and T7- = -7p*Hx. (15.7) Note that (15.7) is just what we would expect from (15.6). In fact, <^*Z = d/dt, and we may take HdiQt = 0. Equations (15.6) and (15.7) are used in conjunction in the following way: Suppose that H = Hq + Hi, where we know how to solve the differential equations corresponding to ^o- In other words, we can find the map 7p corresponding to the vector field Xq, where Xq J (co — dHo A dt) = 0. If X J {c^ - dH A dt) = 0, then <^*Z is a vector field whose corresponding Hamiltonian function is ^*Hi by (15.6) This method was first introduced by Lagrange in the study of the n-body problem. We can let ^o be the Hamiltonian obtained by ignoring the terms in the potential energy coming from the interaction of the planets, and let Hi be the rest of the Hamiltonian H. The solution for ^o is then given by having the planets move about the sun according to Kepler's laws. For simplicity in discussion, let us restrict our attention to that portion of phase space where the motion is elliptical. Then the motion of the planets is specified by giving the various parameters of each ellipse (such as the plane of the ellipse, its major axis, its eccentricity, etc.) and telling the position of the planet on its ellipse at time t = 0. This corresponds to the use of the map (p. One then regards the equations of motion of the whole system as differential equations for the parameters of each ellipse. This corresponds to studying the vector field ip*X. 
This idea of introducing the parameters of the ellipses as "generalized" coordinates was one of the key steps leading to the notion of the invariant calculus on manifolds.

We have seen that solving the differential equations corresponding to a Hamiltonian $H$ is the same as looking for a map $\bar\varphi$ satisfying (15.7). Under certain circumstances this can be reduced to looking for the solution to a certain partial differential equation. Suppose that we have local coordinates $q_1, \dots, q_n$,
13.15 CANONICAL TRANSFORMATIONS 563 Ply . . . yPn such that 12 == I] ^Pi A %. Let V = V(qi, . . . , ^n, Pi, • • • , Pnj 0 be a function such that the maps <^i and <^2 are diffeomorphisms, where M^l, • • • , ^n, Pi, . . . , Pn, 0 ^ < ^1, • • * ' ^^' d^ ' * * * ' ^ ' ^/^ and <^2(gi, . . . , gn, Pl, . . . , Pn, t) = < ^ ' * * * ' ^ ' ^1' • • • ' ^^' Y We claim that ^ = <^i o <p~^ is a canonical transformation and that TT^-^^'*^- (15.8) Note that co — c?(X) Pi dq^) — —d(^ qi dpi). Thus <^*a) — CO -= cZ(^* ^ p^ c^g^ + ^ qi dpi) ^ ^<^2"^*(<^i 23 Vi dqi + <^2 2] ^z ^Pz) of = _,(,-..*!) ^,, If we substitute into (15.7) we see that (p solves our differential equations if and only if ^-1*^ + ^*^ = 0. But :p*H = (p^^*ip*H, so we can write this equation as ^ + ^t^ = 0 (15.9) or dV dt + ff(3x,...,3.|^.---.£./) = 0. (15.90 Equation (15.9) is known as the Hamilton-Jacohi equation. We therefore have a prescription for (locally) solving the equations of motion: Find a solution V of (15.9) which has the property that (pi and <^2 ^re diffeomorphisms. Under certain circumstances, a proper choice of coordinates allows us to solve (15.9) by the method of "separation of variables''. We illustrate this method in the following examples.
564 CLASSICAL MECHANICS 13.15 Example 1. Central-force motion again. Here when we use polar coordinates in the plane. Equation (15.9) becomes Since the variables t and 6 do not occur explicitly in this equation, we seek a solution of the form V= V,{t) + V2{d) + Vs{r) and conclude that both V[(t) and ¥2(0) depend only on r, and so must be constants. We may thus write V[{t) = ~E, V'2(d) - Ae, where E and Ae are constants. (This just reflects the conservation of energy and angular momentum.) We then get the equation '^ = Viir) = ^2m{E - t/(r)) - f Thus Aee + f Jo 2m{E - U(s)) 1/2 ds — Et is a solution of (15.9). Here we consider V a function of the variables r, 6, Ej Ae. Then the map <^2 is given by cp2(r, d;E,Ae) = -t + m ds [ [2m{E - Uis)) - j{\" Ae ds l^s^[2m(E-ms))-§\^'' ' E,Ae Now the map <^i o <^2 ^ takes the flow into the constant flow, so that we must have m ds t-to = and ^0- \2m(E - U(s] Ae ds ', .' [imiE - VM) - 0\ 1/2
13.15 CANONICAL TRANSFORMATIONS 565 where ^o and ^o are constants. Note that the second of these equations gives the orbit explicitly, which can then be solved to give r as a function of t. Example 2. The simple harmonic oscillator. Here so Eq. (15.9) becomes 2m\dqJ ^ 2 ^ dt Again, since time doesn't enter explicitly, we can write V = -Et + T7, where T7 is a function of q alone which must satisfy or /2 W ^ Vmk I [^ — s^ ) ds. Then Thus =(r/:(¥-r''-'- Solving for q in terms of t gives Example 3. The motion of a particle attracted by two fixed point masses. Here where ri and ?^2 are the distances to the two points and A and B are constants (determined by the masses of these two points). For the purpose of solving this problem, it is convenient to introduce so- called elliptical coordinates. Let us assume that the two fixed points lie on the X-axis, each at a distance c from the origin. We may take c — 1 for simplicity.
566 CLASSICAL MECHANICS 13.15 (-1,0) Fig. 13.14 (See Fig. 13.14.) In the xiy-plane define the local coordinates ? and rj by setting ? = Wi + ^^2), V = i{ri — r2). Thus the curves ? — const represent ellipses with semimajor axis ? and foci at the two fixed points, while the curves r) — const are hyperbolas with semimajor axis rj and the same foci. Note that 0<|r7|<l<?<oo. The equations of these two curves are 2 2 ^__(__ y = 1 and y = 1, so that Thus and therefore ^2 ' ^2 _ 1 ^ ^ ^2 I - r)^ x' ^ ^'v' and y' = U' - 1)(1 - ry^). ^d^ rjdrj dx __ d^ drj dy y ?2 1-772 If we now rotate about the x-axis to get the analogue of cylindrical coordinates in space, we have the coordinates <x,p,d> and <J, r;, ^> in space, and the Euclidean metric is given by dx^ + dy^ + dz^ = dx^ + dp^ + p^ dd^ - (?' - ^') {-^i + r^) + (?' - i)(i - ^') ^^'- Also, jj _A I B _Ar2 + Bri 1 ^2_,2 (a? + M,
13.15 CANONICAL TRANSFORMATIONS 567 where a = A + B and 0 ^ A — B. Then H takes the form H(^y rj, d, P^, P,, Pe, t) ^ 2m(?2 _ ^2) X [(?' - l)Pf + (ry" - l)Pl + (^^^ + r^) ^'' + «^ + ^^] • Since t and ^ do not occur explicitly in (15.9), we may write V = —Et + AeS + W, where now W must satisfy (£^ - .) (f)% tf - .) (15) + (^; + ^) ^1 + <+„ = 2m{f - v^)E. Note that if we set W = T7i(Q + ^72(^7), this equation separates into two, and we can explicitly solve each of them by quadratures. This gives the solution to the original equations of motion. We leave the details to the reader.
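The separation-of-variables recipe can be checked numerically in its simplest instance, the harmonic oscillator of Example 2. The sketch below is an illustration under assumptions, not the text's own computation: it takes $H = p^2/2m + kq^2/2$, the separated solution $V = -Et + W(q)$ with $W(q) = \sqrt{mk}\int_0^q (2E/k - s^2)^{1/2}\,ds$ written in closed form, and it uses central finite differences (with an arbitrary step `h`) to test that $\partial V/\partial t + H(q, \partial V/\partial q) = 0$ and that $\partial V/\partial E$ is constant along $q(t) = \sqrt{2E/k}\,\sin\big(\sqrt{k/m}\,(t - t_0)\big)$. All function names are hypothetical.

```python
import math


def hamiltonian(q, p, m, k):
    """Assumed oscillator Hamiltonian H = p^2 / (2m) + k q^2 / 2."""
    return p * p / (2.0 * m) + 0.5 * k * q * q


def generating_function(q, t, energy, m, k):
    """V = -E t + W(q), with W in closed form, valid for |q| < sqrt(2E/k)."""
    a2 = 2.0 * energy / k
    a = math.sqrt(a2)
    w = 0.5 * math.sqrt(m * k) * (q * math.sqrt(a2 - q * q) + a2 * math.asin(q / a))
    return -energy * t + w


def hj_residual(q, t, energy, m, k, h=1e-6):
    """Finite-difference value of dV/dt + H(q, dV/dq); should be close to zero."""
    dv_dt = (generating_function(q, t + h, energy, m, k)
             - generating_function(q, t - h, energy, m, k)) / (2.0 * h)
    dv_dq = (generating_function(q + h, t, energy, m, k)
             - generating_function(q - h, t, energy, m, k)) / (2.0 * h)
    return dv_dt + hamiltonian(q, dv_dq, m, k)


def dv_denergy(q, t, energy, m, k, h=1e-6):
    """Finite-difference value of dV/dE, constant along solutions of the motion."""
    return (generating_function(q, t, energy + h, m, k)
            - generating_function(q, t, energy - h, m, k)) / (2.0 * h)


def trajectory(t, t0, energy, m, k):
    """Candidate solution q(t) = sqrt(2E/k) sin(sqrt(k/m) (t - t0))."""
    return math.sqrt(2.0 * energy / k) * math.sin(math.sqrt(k / m) * (t - t0))
```

With, say, `m = 2`, `k = 3`, `energy = 1` and `t0 = 0.2`, evaluating `dv_denergy(trajectory(t, t0, energy, m, k), t, energy, m, k)` at several times `t` with $\sqrt{k/m}\,(t - t_0) < \pi/2$ gives values close to `-t0`, which is the statement that the new coordinate $\partial V/\partial E$ is a constant of the motion.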
SELECTED REFERENCES Chapters 1, 2, and 7 A more extensive treatment of linear algebra, over arbitrary coefficient fields, can be found in BiRKHOFF, G., and S. MacLane, A Survey of Modern Algebra, 3rd ed., Macmillan, New York, 1965. Hoffman, K., and R. Kunze, Linear Algebra, Prentice-Hall, Englewood Cliffs, N. J., 1961. For linear algebra over commutative rings that are not fields, see Lang, S., Algebra, Addison-Wesley, Reading, Mass., 1965. Zariski, 0., and P. Samuel, Commutative Algebra, 2 vols.. Van Nostrand, Princeton, 1959, 1960. Chapter 3 Difficult but rewarding is DiEUDONNE, J., Foundations of Modern Analysis, Academic Press, New York, 1960. The differential notation in this book is Df(a) (instead of dFa) and Df(a) • ^ (instead of dF«(^)). Chapters 4 and 5 Good books on general (point set) topology are Kelley, J., General Topology, Van Nostrand, Princeton, 1961. Simmons, G., Topology and Modern Analysis, McGraw-Hill, New York, 1963. The remaining standard examples of Banach spaces and Hilbert space require the Lebesgue integral and are therefore beyond our scope. However, the interested reader can pursue the abstract theory of Banach and Hilbert spaces in Simmons and in such books as HiLLE, E., and R. S. Phillips, Functional Analysis and Semigroups, American Mathematical Society, Providence, R. I., 1957. Murray, F. J., An Introduction to Linear Transformations in Hilbert Space, Princeton University Press, Princeton, 1941. RiESz, F., and B. Sz-Nagy, Functional Analysis, Ungar, New York, 1955. 569
Yosida, K., Functional Analysis, Springer, Berlin, 1965.

The books by Murray and Yosida are more advanced and harder.

Chapter 6

Standard books on ordinary differential equations are

Birkhoff, G., and G. C. Rota, Ordinary Differential Equations, Ginn, Boston, 1962.
Hurewicz, W., Lectures on Ordinary Differential Equations, M.I.T. Press, Cambridge, Mass., 1958.

Advanced treatises are

Coddington, E., and N. Levinson, Theory of Ordinary Differential Equations, McGraw-Hill, New York, 1955.
Hartman, P., Ordinary Differential Equations, Wiley, New York, 1964.

Chapter 8

This chapter has been devoted to the theory of content. For modern mathematics the more powerful theory of Lebesgue measure and integration is needed. For one-dimensional theory the reader can consult

Rudin, W., Principles of Mathematical Analysis, 2nd ed., McGraw-Hill, New York, 1964.

For the general theory the reader can consult

Halmos, P., Measure Theory, Van Nostrand, Princeton, 1961.

Another way of computing certain definite integrals, involving the "residue calculus", is discussed in a set of exercises at the end of Chapter 12, and can be read independently after Chapters 8 and 11. For the relationship of integration to "generalized functions" see

Gel'fand, I. M., and G. E. Shilov, Generalized Functions, vol. 1, Academic Press, New York, 1964.

A little complex variable theory is necessary to read this book.

Chapters 9, 10, and 11

For a more abstract treatment see

Lang, S., Introduction to Differentiable Manifolds, Interscience, New York, 1962.

For a less abstract treatment see

Flanders, H., Differential Forms with Applications to Physical Sciences, Academic Press, New York, 1963.
Fleming, W., Functions of Several Variables, Addison-Wesley, Reading, Mass., 1965.
Spivak, M., Calculus on Manifolds, Benjamin, New York, 1965.
A more extensive treatment of differential geometry will be found in

Willmore, T., An Introduction to Differential Geometry, Oxford University Press, London, 1959.
Chapter 12

The classical book on potential theory is

Kellogg, O., Foundations of Potential Theory, Springer, Berlin, 1929.

The relationship between harmonic functions on the plane and analytic functions is studied in standard texts on the latter subject, such as

Ahlfors, L., Complex Analysis, 2nd ed., McGraw-Hill, New York, 1966.
Hille, E., Analytic Function Theory, Ginn, Boston, 1959.

Chapter 13

A standard modern book on mechanics is

Goldstein, H., Classical Mechanics, Addison-Wesley, Reading, Mass., 1952.

For the classical astronomy-oriented aspect of mechanics see

Poincaré, H., Leçons de mécanique céleste, Gauthier-Villars, Paris, 1905-1910.
Whittaker, E. T., Analytical Dynamics, 4th ed., Cambridge University Press, Cambridge, England, 1937.

A leisurely geometrical study of classical mechanics will be found in

Lanczos, C., The Variational Principles of Mechanics, University of Toronto Press, Toronto, 1960.

For a book which treats classical mechanics in the spirit of this chapter, and which also studies statistical mechanics and quantum mechanics, see

Mackey, G., Mathematical Foundations of Quantum Mechanics, Benjamin, New York, 1965.

An elegant treatment of the geometrical applications of the calculus of variations will be found in

Milnor, J., Morse Theory, Princeton University Press, Princeton, 1965.
NOTATION INDEX

General Conventions

□ end of proof
ℝ the real number system
ℤ the integers
ℂ the complex number system
𝔼ⁿ Euclidean n-space
boldface letters for n-tuplets: a = <a_1, . . . , a_n>

Special Symbols

Symbol Page Symbol Page (∀x) 1 (∃x) 2 & 3 4 ⇒ 4 ^ 5 G 6 C 7 0 7 {} 7 < > 9 R-^ 9 AxB 9 \ 10 i^[A] 10 •-> 11 —> 5^ n Xa n o Fix, •) u, n, u, n A' T e([a, b]) 2Z ^iO^i L{A) 8' La Try 11, 12 12 12 14, 14 16 17 17 19, 24, 27 27 28, 32 33, 117, 202 43 53,56 121 31,74 47
Symbol Page NOTATION Symbol I, ^'"* dA piA, B) Br[A] Q{A, W) (m{A, w) (ii) A^ ^2.^, ALB V*®W* (7*)@ a" \ /\ n *M m(A) SD «*-'min Da^ M*, M «^con 5J c^(x) ffp 'J'"con «/ ^ JA S /,/ • //s (f^z, «z) 4: I>^ INDEX Page 197, 198 198 199 212 219 219 38, 248 250 250 305, 306 307 310 316 320 321 322 324, 326 324 331 332 333 337 338 339 150, 342 351 356 357 359 361 364 372 373 573 N{T), 'Sl{T) R{T), (R(r) Hom(F, W) dj © d{V) U V* A° jr* (* A(r) tr(r) Z)(t) lub, gib il II II 111, II lU, II lU BM) (S,{A, W) ^, e, © AF„ dF^ SD„(F, W) D^F d'F„ dFi 3(2/1, . . . , 2/„) , , 7} ;: (a) d(Xi, . . . ,X„) d''F„ class C*, e*([a, b]) c* P(x, y) Br{x) 34 34 45 47 56 78 78 82 84 85, 262 93 99, 312 99 101, 312 119 122, 128 122 123, 196 122, 123 138 141, 142 141, 142 143 147 150, 186 153 159 192 194 194 196 196
574 NOTATION Symbol m T.{M) ^*x f*x F« Dxf nx] IX,Y] Tt{M) <?, I) - K& dL dKx) d d dx' supp^ P,Pa supp p P cp*P Dxp div <X,p> INDEX Page 373 374 374 376 379 384 384 388 390 391, 83 391 394 157, 395 405 408 408 409 416 419 419 Symbol a^ A' the operator d Dxco X J CO curl CO div CO A Wi V W2 T{M) T{a) r*(a) e 12 fx (^x,X^ exp GixiVj W) V A 00 w^ Page 429, 430 430 438 452 456 458 458 476 490 511 511 512 516 517 517 519 538 542 542. 544 559 560
INDEX

absolute convergence, of an infinite series, 221
absolute value, 121, 242
absolutely integrable function, 351
adjoint of T, 85
affine independence, 81
affine span, 56
affine subspace, 40, 52
affine transformation, 53
algebra, 30, 46
  Banach, 223
  of matrices, 92
alternating tensor, 310
amplitude, 292
analytic plane ℝ², 10
angular momentum, 523
annihilator, 84
antisymmetric tensor, 310
associative law, for composition, 14
atlas, 364
ball, closed, 124
  open, 123, 196
Banach algebra, 223
Banach space, 217
Banach-Tarski paradox, 321
basis, 71
  dual, 82
  in Hom(V, W), 96
  infinite, 75
  for pre-Hilbert space, 254
basis isomorphism, 72
Bessel's inequality, 254
bijective, 12
bilinear, 67
bilinear functionals, 305ff
block decompositions, 65ff
bound variable, 2
boundary, 198
boundary-value problem, 294ff
bounded below, 128
bounded function, 122
bounded linear mapping, 127
bounded set, 123
braces, 7
calculus of variations, 182ff
canonical basis, 104
canonical transformation, 558, 560
Cartesian axes, 38
Cartesian product, of an indexed collection, 14
  of two sets, 9
Cauchy sequence, 216
cellulation, 473
center, of gravity, 349
  of mass, 529
central force, 523, 564
centrifugal force, 517
centrifugal potential, 517
chain rule, 153
change of basis, 94
change of variable formula, 342, 355
characteristic function, Fourier transform, 337
  of a set, 12
characteristic polynomial, 259
classical mechanics, 509
closed set, 124, 197
closure, 198
codimension, 80
codomain, 12
cofactor, 315
column space (of a matrix), 91
column vector, 93
commutative diagram, 86
compact transformation, 264
compactness, in general topology, 214
  on a manifold, 403
  sequential, 206
compatible atlases, 370
complement, 17
complementary subspaces, 57
completeness, 217
completion, 222
complex conjugation, 241
complex normed linear space, 244
complex numbers, 240ff
complex vector space, 24, 241
complexification, 244
component, 58
composite-function rule, 143
composite truth-functional forms, 5
composition, 14
configuration space, 509
conjugate space, 82
connectives, logical, 3
conservation, of angular momentum, 523
  of energy, 521
  laws, 521
constant coefficient equation, 278
constrained maximum, 173f
content, 332
contented function, 339
contented sets, 331f
continuity, 126, 196, 201, 203
contraction mapping, 229
contravariant vector, 95
convergence, of infinite series, 221
  sequential, 202
convex set, 125
coordinate, 43, 72
coordinate correspondence, 36f
coordinate functional, 33, 72
coordinate isomorphism, 72
coordinate n-tuple, 72
coordinate projection, 43
correspondence, 9
coset, 52
cotangent bundle, 511
covariant tensor, 307
covariant vector, 95
Cramer's rule, 101, 315
critical point, 161, 188ff
cylindrical coordinates, 345 (Ex. 11.5)
definite (form), 115
degree (of a polynomial), 64
De Morgan's law, 17
dense subset, 212
density, 408
dependent set of vectors, 73
derivative, 146
  in a Banach algebra, 226
  over ℂ, 242
  directional, 147
determinant, 99ff, 108, 312ff
diagonal matrix, 262
diffeomorphism, 376
differentiable manifolds, 363, 369ff
differential, 142
differential forms, 390f
differential geometry, 459
differentiation under the integral sign, 181
dimension, 78
dimensional identities, 78f, 84
direct sum, 56
directed frame, 9
directed line segment, 37
directional derivative, 147
disjoint (family of sets), 18
distance, in a metric space, 196
  between vectors, 122
divergence theorem, 421
division algorithm, 64
domain, with almost regular boundary, 424
  with regular boundary, 419
domain, of a relation, 9
  of a variable, 8
dual basis, 82
dual space, 82
duality, 15, 68
dyad, 86
echelon matrix, 104
eigenvalue, 34, 258
eigenvector, 34, 258
element (of a set), 6
elementary bilinear functionals, 306
elementary matrices, 105f
elementary row operations, 103
elliptical coordinates, 566
empty set, 7
equations of variation, 513f
equicontinuity, 215
equivalence relation, 19
equivalent directed line segments, 37
equivalent norms, 132, 204
equivalent truth-functional forms, 5
Euclidean norm, 38, 122
Euler angles, 548
Euler's equation(s), 184, 541, 543
existential quantification, 2
exponential function, 226, 228
exterior algebra, 316ff
exterior calculus, 429
exterior differential forms, 429
exterior forms, 316
fibering, 19
finite-dimensional, 28
first variation, 183
fixed-point theorem, 229ff
floor of a paving, 329
flow, 381
formal adjoint, 295
Fourier coefficient, 254
Fourier series, 254, 301ff
Fourier transform, 355, 357
frame, 71
free variables, 2
frequency, 292
function, 10f
function space, 24
functional dependence, 175ff
fundamental form on T*(M), 515
fundamental solution, 277
fundamental theorem, of algebra, 243
  of calculus, 238
  of ordinary differential equations, 267
geodesic, 539
geodesic coordinates, 537
geodesic curvature, 468
geodesic polar coordinates, 470
germ of functions, 138
global solutions, 269
graph, 9, 162
greatest lower bound, 119
group, 15
Hamilton-Jacobi equation, 563
Hamiltonian mechanics, 520
Hamiltonian structure, 559
Hamiltonian vector fields, 519, 559
Heine-Borel property, 214
Hermitian symmetry, 249
Hilbert space, 249
homogeneity, of a norm, 121
homogeneous equation, 277, 282ff
homogeneous function, 148
homogeneous polynomial on V, 145
hyperplane, 41
idempotent, 59
if . . . , then . . . , 4
image, 10
immersion, 399
implicit-function theorem, 167, 230
inclusion (set), 7
increasing norm on ℝⁿ, 134
increasing sequence, 207
independent set of vectors, 71f
independent subspaces, 56
indexed collection, 13
inertia tensor, 546
infinite series, 221, 224ff
infinitesimal, 137
infinitesimal generator, 379, 381
inhomogeneous equation, 277, 288ff
initial condition, 267
injection (product), 47
injective, 12
inner content, 331
inner paving, 331
integral, of a parametrized arc, 236ff
integration, 321, 336
interior, 197
interior point, 124
intermediate-value theorem, 120
intersection, 17
invariant subspace, 54, 63
inverse, of a function, 15
  of a relation, 9
inverse-mapping theorem, 167
isomorphism, 34
  between normed linear spaces, 132
iterated integrals, 346
Jacobian determinant, 159
Jacobian matrix, 158f
Kepler's first law, 527
Kepler's second law, 524
Kepler's third law, 527, 532
kernel, 34
kinetic energy, 521
Kronecker delta function, 74
Lagrange multipliers, 174
Lagrange's equations, 530
law of cosines, 41, 250
least upper bound property, 119
left inverse, 14
lemma of Du Bois-Reymond, 184
length of a curve, Riemannian manifold, 401
Lie bracket, 389
Lie derivative, of a density, 417
  of an exterior differential form, 431, 452
  of a function, 383f
  of a linear differential form, 392
  of a vector field, 385, 388
limit, of a function, 116, 126
line (straight), 38
line of nodes, 548
linear combination, 27
linear combination mapping, 31
linear differential equations, 276ff
linear functionals, 32, 82
linear transformation, 30
Lipschitz continuity, 126
local solution, 269
locally absolutely integrable density, 408
locally absolutely integrable n-form, 435
locally uniformly Lipschitz, 267
mapping, 12
matrix, 32, 88
maximal solution, 269
maximum, relative, 161
mean-value theorem, 148f
member (of a set), 6
metric space, 196
minimal polynomial, 261
momentum, 511
momentum function of a vector field, 517 (Eq. 3.6)
monomial on V, 145
monotone sequence, 207
multi-index notation, 193
multilinear functionals, 306ff
multivector, 318
name, 7
natural isomorphism, 69, 86
negation, 6
neighborhood, 137, 201
  deleted, 137
nilpotent, 50
nonnegative transformation, 257
nonsingular, 93
norm, 121
norm metric, 196
normal distributions, 355 (Ex. 13.1), 356
normed linear space, 121
nth-order equation, 270ff
null space, 34
nutation, 550
one-parameter group
one-to-one, 11
open set, 124, 196
or (connective), 4
ordered basis, 71
ordered n-tuple, 13
ordered pair, 9
orthogonal, 250
orthogonal projection, 253ff
orthogonal transformation, 262
orthonormal, 254
orthonormal basis, 112, 254
oscillating system, 291ff
outer content, 331
outer paving, 331
pair set, 7
parallel translation, 40
parallelogram law, 251
parametric equations (of a line), 38
parametrized arc, 146
Parseval's equation, 254
partial derivative, 156f
partial differential, 153
partition, 19
partition of unity, 405
  subordinate to an open covering, 406
patch manifold, 172
paved function, 338
paved set, 324, 326
paving, 324
permutations, 308
phase angle, 294
plane, 40
Poisson bracket, 519
polar coordinates, 345
polar decomposition, 263
polynomials, 62, 64
positive definite, 115, 190
precession, 547
pre-Hilbert space, 249
principle of mechanical similarity, 532
product, matrix, 91
product norm, 133
product rule (for differentials), 155
product vector space, 43
projection, 19, 58
projection-injection identities, 47
property, 7
pullback, of densities, 416
  of exterior differential forms, 392
  of functions, 372
  of linear differential forms, 392
  of vector fields, 384
Pythagorean theorem, 41, 251
quadratic form, 112
quotation marks, 7
quotient space, 54
range, of a linear transformation, 34
  of a relation, 9
rank, of a linear transformation, 85
  of a matrix, 91
real vector space, 23f
rectangles, 323f
regular differential equations, 282, 298
regular solid, 473
relation, 9
Rellich's lemma, 362 (Ex. 14.30)
resonance, 294
restricted variables, 8
restriction (of a relation), 10
Riemann metric, 397
right inverse, 14, 61
rigid-body motion, 544
row-reduced echelon form, 104
row space (of a matrix), 91
row vector, 93
scalar, 23
scalar product, 38, 248
Schwarz inequality, 125, 249
second conjugate space, 83
second difference, 188
second differential, 150, 186ff
self-adjoint, 257
seminorm, 125
semiscalar product, 248
separation of variables, 563f
sequential convergence, 202ff
set, 6
shear transformation, 56
sign of a permutation, 309
signature (of a quadratic form), 113
simple harmonic motion, 293
skeleton, 32
small oscillations, 551
smooth arc, 146
Sobolev's inequality, 361
solid angle, 474
span, linear, 28
spherical coordinates, 345 (Ex. 11.4)
standard basis, 74
star operator, 320
state, 510
statement, 1
statement frame, 1
step function, 237
Sturm-Liouville system, 298
submanifold, 172, 367
subsequence, 205
subset, 7
subspace (vector), 24
successive integration, 346
support, of a density, 408
  of a differential form, 425
  of a function, 336
surjective, 12
symmetric tensor, 310
tangent bundle, 511
tangent plane, 162
tangent space, 373
tangent vector, 146, 373
tautology, 5
Taylor's formula, 191ff
tensor product, 305f
topology, 201
total linear momentum, 528
totally bounded, 212
trace, 99
translation, 40, 53
transpose, 90
triangle inequality, for metrics, 196
  for norms, 121
truth-functional forms, 5
truth table, 3
two-body problem, 528
two-norm, 250
undetermined coefficients, 289
uniform conditions, 210ff
uniform continuity, 179, 210
uniform convergence, 211
uniform norm, 123
uniformly absolutely integrable, 353
union, 17
unit set, 7
universal quantifier, 1
upper triangular, 66
variation of parameters, 289
variational principles, 532
vector analysis, 457
vector field, 44
vector space, 23
volume density of a Riemann metric, 411
volume of an immersed hypersurface, 413, 415
weak sequential convergence, 245
wedge operation, 316
Weierstrass approximation theorem, 304