Author: Curtis Morton L.  

Tags: linear algebra  

ISBN: 0-387-97263-3

Year: 1990

Text
                    Universitext
Editors (North America): J.H. Ewing, F.W. Gehring, and P.R. Halmos
Berger: Geometry I, II (two volumes)
Bliedtner/Hansen: Potential Theory
Bloss/Bleeker: Topology and Analysis
Chandrasekharan: Classical Fourier Transforms
Charlap: Bierbach Groups and Flat Manifolds
Chern: Complex Manifolds Without Potential Theory
Cohn: A Classical Invitation to Algebraic Numbers and Class Fields
Curtis: Abstract Linear Algebra
Curtis: Matrix Groups
van Dalen: Logic and Structure
Devlin: Fundamentals of Contemporary Set Theory
Edwards: A Formal Background to Mathematics I a/b
Edwards: A Formal Background to Mathematics II a/b
Emery: Stochastic Calculus
Fukhs/Rokhlin: Beginner's Course in Topology
Frauenthal: Mathematical Modeling in Epidemiology
Gardiner: A First Course in Group Theory
Gar ding/Tambour: Algebra for Computer Science
Godbillon: Dynamical Systems on Surfaces
Goldblatt: Orthogonality and Spacetime Geometry
Humi/Miller: Second Course in Order Ordinary Differential Equations
Hurwitz/Kritikos: Lectures on Number Theory
Iverson: Cohomology of Sheaves
Kelly/Matthews: The Non-Euclidean Hyperbolic Plane
Kostrikin: Introduction to Algebra
Krasnoselskii/Pekrovskii: Systems with Hysteresis
Luecking/Rubel: Complex Analysis: A Functional Analysis Approach
Marcus: Number Fields
McCarthy: Introduction to Arithmetical Functions
Meyer: Essential Mathematics for Applied Fields
Mines/Richman/Ruitenburg: A Course in Constructive Algebra
Moise: Introductory Problem Courses in Analysis and Topology
Montesinos: Classical Tesselations and Three Manifolds
Nikulin/Shafarevich: Geometries and Group
Oskendal: Stochastic Differential Equations
Rees: Notes on Geometry
Reisel: Elementary Theory of Metric Spaces
Rey: Introduction to Robust and Quasi-Robust Statistical Methods
Rickart: Natural Function Algebras
Rotman: Galois Theory
Samelson: Notes on Lie Algebra
Smith: Power Series From a Computational Point of View
Smorynski: Self-Reference and Modal Logic
Stroock: An Introduction to the Theory of Large Deviation!
Sunder: An Invitation to von Neumann Algebras
Tondeur: Foliations on Riemannian Manifolds
Verhulst: Nonlinear Differential Equulioni und Dyilttfflliil IjfVlimi
Zaanen: Continuity, Integration mul I'ouilti Thtuiy


Morton L. Curtis Abstract Linear Algebra With Revisions by Paul Place Sprlngtr-Verlag New York Berlin Heidelberg London Purls Tokyo Hong Kong
Morton L. Curtis (1921-1989) Department of Mathematics Rice University Houston, TX 77251 USA Editorial Board (North America): J.H. Ewing F.W. Gehring Department of Mathematics Department of Mathematics Indiana University University of Michigan Bloomington, IN 47405 Ann Arbor, MI 48109 USA USA P.R. Halmos Department of Mathematics Santa Clara University Santa Clara, CA 95053 USA AMS Subject Classifications (1980): 15-01, 11R52 Library of Congress Cataloging-in-Publication Data Curtis, Morton Landers, 1921- Abstract linear algebra / Morton L. Curtis ; with revisions by Paul Place. p. cm — (Universitext) Includes bibliographical references. ISBN 0-387-97263-3 (alk. paper) 1. Algebras, Linear. I. Place, Paul. II. Title. QA184.C873 1990 512'.5-dc20 90-33058 Printed on acid-free paper ® 1990 Springer-Verlag New York Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone. Photocomposed copy prepared using the author's LaTeX file. Printed and bound by R.R. Donnelley & Sons, Harrisonburg, Virginia, Printed in the United States of America. 987654321 ISBN 0-387-97263-3 Springcr-Vcrlag New York Berlin Htldf lh»i* ISBN 3-540-97263-3 Springer-Verlag Berlin Hsldillt«f NfW Iff!
Preface The unexpected and untimely death of the author, and my good friend, Morton Curtis on February 4, 1989 occurred only a short time after the first complete version of this manuscript had been returned by the typist. His intentions with this work were clear: he saw a way of organizing the subject of Linear Algebra around a topic which was not in the general purview and in doing so enhancing both the topic and its audience. Indeed it was characteristic of his approach to mathematics to seek out the elegant • ideas, to make them clear and concise, and to teach them with enthusiasm to the widest group possible. It gives me satisfaction to have been able to help insure that this project be completed. That others might gain some enjoyment of mathematics through this book would be a most suitable monument to his memory. The goal to which this work leads is the Theorem of Hurwitz—that the only normed algebras over the real numbers are the real numbers, the complex numbers, the quaternions, and the octonions. It is, to my knowledge, the only presentation of this material at an "elementary level." It begins from scratch and develops the standard topics of Linear Algebra one would expect from a first course on the subject. It is intended as a text for such a course directed toward students with mathematical leanings; it stresses the complete logical development of the subject while suppressing mention of its many applications to other fields. I think it would also be a valuable reference for mathematicians in general. Much of the beginning presentation is standard; where options exist, the choice has tended toward the more mathematically sophisticated— particularly where this choice makes a good case for its superiority as, for example, in the treatment of determinants through the introduction of exterior algebras. The task of preparing the original draft for publication was taken on by my student, Paul Place. In consultation with me, he approached the job of reworking the text to insure clarity, consistency, and completeness with admirable dedication. His efforts include: the inclusion of exercises, rewriting of sectionH IF, HE, IIIC, HID, and large parts of Chapters IV and V,. addition of ioctioun HIE and IVA , and all the other details inherent
VI Preface in such a project. The reader will surely benefit from the care he put into polishing this book. Our thanks also go to Vivian Choi for her excellent work in Tj^K-ing all phases of this manuscript. John Hempel Houston, Texas December 1989
Introduction This book covers most of the basic results in linear algebra, but in some ways it is somewhat more ambitious. The main such feature is in Chapter V in which the Hurwitz Theorem is proved. This result has been known since early in this century, but the proofs have just recently been simplified enough for a textbook at this level. The treatment which gave me the courage to include it is due to Reese Harvey and Blaine Lawson ([3]). Our treatment is just a detailed amplification of their appendix. Another feature is the introduction of exterior algebras to use in defining determinants. This has been standard for years in more advanced courses, but is not yet standard in beginning linear algebra texts. It greatly simplifies the, usually quite messy, more standard treatments of determinants. Also, we prove the Cay ley-Hamilton Theorem in a general setting. This only necessitates a consideration of polynomials over a not-necessarily commutative ring (namely, the ring ofnxn matrices). Since noncommutative algebras will be a way of life for us in Chapter V, it doesn't hurt to make an acquaintance in Chapter III. All of this is written as a mathematics course—not an applied mathematics course. It is to be hoped that the reader can read and understand the mathematics without applications. Linear algebra has lots and lots of important and interesting applications to many subjects as well as to other things in mathematics. Try to accept it for now as interesting mathematics.
«-*
Contents Preface Introduction Chapter 0. Algebraic Preliminaries Chapter I. Vector Spaces and Linear Maps A. Vector Spaces B. Linear Maps C. Bases, Dimension D. Direct Sums, Quotients E. Eigenvectors and Eigenvalues (Part i) F. Dual Spaces Chapter II. Matrices and Determinants A. Matrices B. Algebras C. Determinants, the Laplace Expansion D. Inverses, Systems of Equations E. Eigenvalues (Part ii) Chapter III. Rings and Polynomials A. Rings B. Polynomials C. Cay ley-Hamilton Theorem D. Spectral Theorems E. Jordan Form Chapter IV. Inner Product Spaces A. Rn as a Model, Bilinear Forms B. Real Injur Product Spaces, Normed Vector Spaces , C. CompliM Imwr Product Spaces V vii 1 9 9 15 20 28 37 41 47 47 56 61 71 76 81 81 86 94 96 102 111 111 116 123
X Contents D. Orthogonal and Unitary Groups 129 E. Stable Subspaces for Unitary and Orthogonal Groups 133 Chapter V. Normed Algebras 141 A. The Normed Algebras R and C 141 B. Some General Results, Quaternions 145 C. Alternative and Division Algebras 154 D. Cayley-Dickson Process, Hurwitz Theorem 156 Bibliography 163 Index 165
Chapter 0 Algebraic Preliminaries A field is an algebraic object consisting of a set with two operations on it, addition and multiplication, which are required to satisfy certain conditions. To avoid a certain amount of repetition in stating these conditions (and thus to also make the conditions easier to remember), we first talk about sets with only one operation. Definition A group G is a set G with one operation ((a, b) —► ab) satisfying: (i) the operation is associative a(bc) = (ab)c, (ii) there exists an identity element e € G such that for each a G G ae = ea = a, and (iii) for each a G G there is an inverse element a"1 such that aa~x — e = a~xa. Note that, because of (ii), the empty set cannot be a group. But given a set with exactly one element (a singleton) 4t has precisely one operation on it, and that operation makes it into a group, a trivial or zero group. Now conditions (ii) and (iii) leave open certain possibilities. Namely, there may be more than one identity element, and an element of G may have more than one inverse element. Actually, neither of these can happen. Proposition 1 A group G has exactly one identity element and each a 6 G hm emctly om'invgrm.
2 0. Algebraic Preliminaries Proof: Suppose e and / are identity elements in G. Then fe = e since / is an identity element, and fe = / since e is an identity element. Next suppose that 6, c are both inverse elements for a. Then b — eb = (ca)b = c(ab) = ce = c. I Examples (1) The set Z of integers is a group under addition. The operation is associative, zero is the identity element, and —a is the inverse of a. (2) Z is not a group under multiplication. The operation is associative and 1 acts as identity, but only 1 and —1 have multiplicative inverses in Z. (3) The set Q of rational numbers is a group under addition. (4) The set Q — {0} (i.e., all nonzero rationals) is a group under multiplication. (Verify this.) (5) R+ : The set of all real numbers x > 0 forms a group under multiplication. (Verify.) (6) Let Rn be the set of all ordered n-tuples x = (x1,x2,...,xn) ' of real numbers. Given x = (xi,..., xn), y = (yi,..., xn), define x -f y = {x\ -f yi,..., xn + yn). This operation on Rn is associative, 0 = (0,..., 0) acts as identity, and the inverse of x = (a?i,..., xn) is —x = (—a?i,..., —xn). This makes Rn into a group. We will be primarily interested in this course in groups which are abelian. Definition A group G is abelian group if always ab = ba. In the examples above, all groups are abelian. Now we are ready to define field. Definition A set k with an operation of addition (a+ 6) and an operation of multiplication (ab) is a field if the following aoildltloni are Hatinfiecl. (i) k is an abelian group under + (ili it» Identity olcurumt for + is denoted by 0).
0. Algebraic Preliminaries 3 (ii) Multiplication distributes over addition a(b 4- c) = ab + ac. (iii) k — {0} is an abelian group under multiplication (the identity element is written as 1). Note that in a field k, the additive identity 0 is a "multiplicative anni- hilator." That is, for any a G k we have aO = 0. Proof: aO = a(0 + 0) (since 0 is the additive identity) = aO + aO (by (ii)). By (i), aO must have some additive inverse, and adding it to both sides of aO = aO + aO gives 0 = aO. I Recall that we could not have an empty group, but could have one with only one element. For a field we need at least two elements, 0 and 1. (These could not be equal since 0 annihilates 1, but 1 times 1 equals l.)1 There is a field with just these two elements: Addition Table Multiplication Table + 0 1 0 0 1 1 1 0 X 0 1 0 0 0 1 0 1 Exercise. Prove that these operations cannot be anything else. Proposition 2 Let k be a field. If x,y G k and x ^ 0 and y ^ 0; then xy ^ 0. Proof: We will show the contrapositive, that is, if x ^ 0 but xy — 0, then y = 0. Now x/0 implies (by (iii)) that x~x exists (with x~xx — 1). Then y = (x~1x)y = x~1(xy) = x_10 = 0. I The property described in Proposition 2 is "no divisors of zero"\ i.e., we cannot have x ^ 0 and 2/^0, but xy — 0. So a field has no divisors of zero. But, additivcly, a field can have "torsion." For example, the two- element field has 1 + 1 = 0. We can define a three-element field (elements: 0,1,2) by ^Show that If i = 0 wo are reduced to one trivial operation and really have just the ssero group.,
4 0. Algebraic Preliminaries Addition Table Multiplication Table X 0 1 2 0 0 0 0 1 0 1 2 2 0 2 1 + 0 1 2 0 0 1 2 1 1 2 0 2 2 0 1 (Verify that this is a field.) Note that in this field 1 + 1 + 1 = 2 + 1 = 0. Definition The characteristic A of a field k is the least natural number such that 1 + 1 + ... + 1 = 0. v v ' A If no such natural number exists, we say k has characteristic zero (strange terminology?). There will be cases in which we will want to have 1 + 1^0. So from now on when we say "a field fc" we will mean "a field k with 1 + 1 ^ 0." Definition If k is a field and s is a subset of fc, then s is a subfield of k if the operations of k make s into a field. Then k is called an extension field of s. Let p{x) = anxn + an-ixn~1 + ... + a\x + ao be a polynomial with coefficients ai in a field k. We say that r G k is a root of p(x) if p(r) = anrn + an_irn_1 + ... + axr + a0 = 0. The field k is algebraically closed if every polynomial with coefficients in k has a root r in fc. For example, the field R is not algebraically closed because p(x) = x2 + 1 has coefficients in R but there is no real number r such that p(r) = r2 + 1 = 0. However, the field C of complex numbers (which we are just now going to introduce) is algebraically closed. This fact is known as "the fundamental theorem of algebra" The remainder of this introductory chapter is devoted to a classical and very important extension field C of the field R of real numbers. The Field C on R2 Example (6) of a group was Rn. In particular, for n = 2 we have x = (xi,x2), y= (2/1,2/2) with x + y = (xi+yi,X2+1/2)1 identity 0 = (0,0) and the inverse -a? ■* (-&ij^$i) for x. Wo consider R C R2 by assigning to r € R the point (f,0) ill Wo easily hoc that this makes R into a subgroup of R3, (Wrlti out ft formal definition of
0. Algebraic Preliminaries 5 "subgroup.") Can we define multiplication on R2 so that R2 becomes a field with R C R^ as a subfield? Surely the most obvious try is xy= (zi2/i,X22fc). This operation does distribute over addition x(y + z) = {x1,x2)(y1+z1,y2 + z2) = (x1(y1+z1),x2(y2 + Z2)) = (xiyi + x1zllx2y2 + x2z2) — xy + xz. The operation is easily seen to be associative and commutative. It has an identity (1,1). But inverses are a problem. Indeed, if this gave a field then (by Proposition 2) we would have no divisors of zero, but (1,0)(0,1) = (0,0) = 0 (the additive zero of R2). Another way of stating the difficulty here is to note that R^ — {0} is not a group under this multiplication. (Because, for example, (1,0) ^ 0, but if (1,0) had a multiplicative inverse x = {x\,x2) we would have (l,0)(xi,x2) = (1,1). Whereas 0x2 = 0^1.) But there is a multiplication on R^ which makes it into a field. In Section" A of Chapter V we will show that the following formula is "forced on us." (*) (a, 6)(c, d) = (ac — bd, ad + be) In the following set of exercises one shows that (*) does the job. Exercises (1) A subset of H of a group G is a subgroup of G if the operation on G makes H into a group. Prove that H C G is a subgroup if and only if (i) e e H, and * (ii) if a, be H, then afc"1 € H.
0. Algebraic Preliminaries (2) Given three distinct objects {a,b,c} let G be the set of all permutations (= one-to-one maps) of {a, 6, c}, (e.g., a -> c, c —> a, b —> b is a permutation). Define an operation on G to be iteration (first one permutation and then the other). Show that this makes G into a group. Show that this group is not abelian. (3) For a positive integer ra, let Zm = {0,l,...,m-1}. Define: + : Zm x Zm -* Zm mult: Zm x Zm —> Zm by taking answers modulo m (e.g., Zg = {0,1,2,3,4,5} and (3) (5) = 15 modulo 6 = 3). Show that Zm has no divisors of zero <^misa prime. (4) Show that (1,0) acts as identity for (*). (5) Show that (*) extends the multiplication on R (R = {(r, 0)|r G R} C R2). (6) Show that (*) makes R into a field (the field C of complex numbers). (7) Write (a, b) as a + bi and treat these as polynomials in i with the condition that i2 = — 1. Show this gives (*). (8) Define the conjugate of a = a + bi e C to be a = a — bi. (Note that conjugation is the identity map on R.) Prove that for a, (3 G C we have (i) (a + /3) = a + /3, and (ii) (a/3) = a/3. Also show that aa is a nonnegative real number, and aa = 0 exactly when a = 0. (The real number |a| = y/afi. is called the absolute value of a.) Show that if a is real (i.e., 6 = 0), then |a| is just the usual absolute value of a real number. (The definition of \r\ for r G R is Vr2.) (9) Let p(x) be a polynomial in the indeterminate x with real coefficients. Show that for any a E C we have Use this to prove that if p(x) (rt&l eoefftalcmtii) hm a complex root a (i.e., p(a) = 0), then S li §lio ft root,
0. Algebraic Preliminaries 7 (10) A polynomial p(x) over a field k is monic if the highest power of x has coefficient 1. Let p(x) be monic and let r G k. Show that if p(x) is divided by x — r, then the remainder is p(r). (Hint: Write p(x) = xn + an-\xn~x + ... + a\x + ao and do long division by x — r.) Show that this remains true if p(x) has coefficients in R and r G C. (11) Prom (10), we see that if p(x) is monic with real coefficients and a G C is a root of p(#), then x—a divides p(x). Now prove that a monic polynomial p{x) with real coefficients can be factored into linear and quadratic factors. (Hint: Let «i,..., sm be all of the real roots of p(x). Then p(x) = (x-si)...(x- sm)q(x) where q(x) has real coefficients but no real roots. For each complex root a of q(x) we have factors (x — a) and {x — a). Show that (x — a){x — a) has real coefficients.) (12) Show that {a + 6\/3 | a, b G Q} is a field (a subfield of the field R), but {a + &\/3 | a, 6 G Z} is not a field.
Chapter I Vector Spaces and Linear Maps A. Vector Spaces We introduce the concept of a vector space by using the plane R2 as a model. R2 is the set of all ordered pairs (a, b) of real numbers. (The fact that we take ordered pairs means, for example, that (1,2) ^ (2,1).) We define an operation + of addition on R2 by (1) • (a,6) + (c,d) = (a + c,6 + d). This operation is associative (since addition of real numbers is associative). The ordered pair 0 = (0,0) acts as identity for +, and relative to this identity, (—a, —b) is the inverse of (a, b). Thus R2 becomes a group, and it is abelian since the operation is clearly commutative. We next define a second operation in which we use a real number and an element of R2 to produce another element of R2. Namely, if r G R and (a, b) G R we set (2) r(a,b) = (ra,rb). Terminology: "We call elements of R2 vectors. We call elements of R scalars. We call the operation (1) vector addition and the operation (2) scalar multiplication. Definition A real vector space V is a set V of elements (called vectors) mid two operation*:
10 I. Vector Spaces and Linear Maps vector + vector = vector, and (scalar) (vector) = vector. These operations must satisfy the following properties: (i) V is an abelian group under + (identity = 0), (ii) lv = v, and (iii) the two operations are related by (a) r(v + w) = rv + rw, (b) (r + s)v = rv + sv, and (c) (rs)v = r(sv), where v, w are vectors; and r, s are scalars. It is left as a routine exercise to verify that R^ with the operations (1) and (2) is a real vector space. Also, what we did for R , we can clearly do for R** ( = all ordered triples of real numbers) and, more generally, for Rn, n = 1,2,3,... . But these Rn spaces are not the only real vector spaces. For example, let a, b G R and let V consist of all twice differentiable functions / which satisfy T^ + 4 + V-o- dxA ax We define operations (/ +0)0*0 = f(x) + g(x), and (rf)(x) = r(f(x)) and easily verify that V is a real vector space. By replacing R by a field k we get the concept of a vector space over the field k. Examples (1) The field k is a vector space over itself. The vector addition is just addition in k and scalar multiplication is just multiplication in k. (2) kn = n-tuples of elements of k. This is a vector space over k just as Rn is a real vector space. (3) Let V be the set of all continuous functions from [0,1] to R. We define (f + 9)(t) = /(*)+f(t). Midfor reR, (rf)(t) - r/(t).
A. Vector Spaces 11 Since /, g continuous implies that / + g is continuous and rf is continuous, V becomes a real vector space. Definition If V is a vector space over k and r\,..., rm G k and v\,..., vm G V, then the vector r-iv-i + ... + rmvm is called a linear combination of v\,..., vm. Definition Let V be a vector space over a field k. A subset W of V is called a subspace (or linear subspace) if the operations of V make W into a vector space over k. Proposition 1 A subset W of a vector space V is a subspace <=$■ any linear combination of elements of W is in W. Proof: => If W is a subspace, it is closed under vector addition and scalar multiplication and hence closed under taking linear combinations. <= This is obvious. I Examples (1) Let D be all real-valued twice differentiate functions defined on [0,1]. Given P,Q in D, let W be all solutions y{x) of the differential equation y" + P(x)y' + Q(x)y = 0. Then W is a subspace of D. (2) Let R°° be the set of all sequences {a^} of real numbers. We add coordinatewise and multiply by a real number coordinate- wise. Let W C R°° be all sequences {a^} such that J2al *s a convergent series. Then W is a subspace (proved in §B, Chapter IV). (3) Let P be the set of all polynomials with real coefficients. Let Pn be those of degree < n. Then Pn is a subspace of P. Proposition 2IfV is a vector space and {Wa}ae^ is any collection of subspaces, then W= f]Wa is a subspace. Proof: Exercise. I Definition If V is a vector space and A is any subset of V, then Span (A) \h the intersection of all subspaces of V which contain A. By propositions 1 and 2 we see that Span (A) is a subspace of V equal to all finite? linear combinations of vectors in A
12 I. Vector Spaces and Linear Maps Definition ACV generates (or spans) V if Span (A) = V. Examples (1) A = {(n,0)|n EZ}CR2 does not generate R2. (2) A = {(1,0), (0,1)} C R2 does generate R2. (3) A = {(M2,*3)!* e R} C R3 generates R3. (1) and (2) should be easy. But, at this stage of our development of this subject, (3) should not be obvious. But we already do know how to prove it and we proceed to (somewhat lengthily) do so. Let (a,b,c) be an arbitrary element of R . We want to show that (a, 6, c) is some linear combination of elements of A. We will take u = (1,1,1) v = (2,4,8) w = (3,9,27) as elements of A and we assert that the arbitrary element (a, 6, c) of R** is a linear combination of these three elements of A. We seek real numbers a, 0,7 such that au + f3v + 7W = (a, 6, c). Now au = (a, a, a) f3v = (2/3,4/W) 7iy = (37,97,277). So we must have (1) a + 20 + 37 = a (2) a + 40 + 97 = 6 (3) a + 80 + 277 = a. You do know how to solve these equations for a, 0,7. For example, solve (1) for a (a = a - 20 - 37) and plug this into (2) and (3). This gives . (10 a + 20 + 67 = b (2') o + 60 + 247 = c.
A. Vector Spaces 13 Next we solve the first for /3 and plug that into the second. (2/3 = 6—0—67, and putting that in the second gives a + 3(6 - a - 67) + 247 = c .) This last equation is easily solved for 7 to give 67 = c — 36 + 2a or _ c - 36 + 2a 7 - 6 ' Thus for any (a, 6, c) € R we can find an appropriate 7. But then we can plug this 7 back into V and 2' to get (1") a + 2/3 + (c-36+2a) = 6 (2") a + 6/3 + 4(c-36 + 2a) = c . Prom (1") we have 2/3 = 46 — 3a — c and plugging this and the value for 7 into (1) will give a. This example was designed to show that this elementary method of "solve-and-plug-in" is really a powerful method in linear algebra. It will settle questions which are not at all intuitively obvious. Like: the set A = {(t,t2,t3)\t eR} cr3 does generate R . However powerful the "solve-and-plug-in" method is, it can be tedious if we keep doing it for every problem. So we will develop general theorems which will give us answers without each time resorting to "solve-and-plug- in." (The example A = {(£, £2, t3)\t G R} is called the twisted cubic, and our "nolve-and-plug-in" method is often called the "method of elimination") Exercises (1) Show that {/ : R —> R|/ continuous} with operations (f + g)(t) = f(t) + g(t) (rf)(t) = rf(t) is a vector space. Call it V. (i) Let to S R and let W = {fev\f(t0) = o}. Show that W is a subspace of V.
14 I. Vector Spaces and Linear Maps (ii) Let C/ = {/E V\f{t2) = (/(0)2 for all t e R}. Show that U is not a subspace of V. (iii) Let X — {/ : R —> R|/ differentiate} and show that X is a subspace of V. (2) If (7, W are subspaces of the vector space V, show that the sum of [/ and W U + w = {u + w\ue U, weW} is also a subspace of V. (3) If U and W are subspaces of V, show that UUW need not be a subspace. However, if U U W is a subspace, show that either UCW otWCU. (4) Let R°° be the vector space of all real sequences and let W C R°° be all sequences with only a finite number of nonzero com- ponents. Show that W is a subspace of R°°. (5) Show that the set {1, (t - 1), (t - l)2, (t - l)3} generates P3 the vector space of polynomials of degree < 3. (6) Suppose A and B are subsets of the vector space V; show that if A C B, then Span(A) C Span(B). (7) Consider 2x2 square arrays of real numbers (called matrices). We denote the totality of these by M2(R) = |7 ac bAUb,c,deR\. We make M2(R) into a vector space (over R) by denning e f \ ( a + e & + / a b \ c d J 1 g hi \ c + g d + h and a b \ _ ( ra rb c d J y re rd (You will note that this is just the vector space R4 of ordered quadruples of real numbers. But we will later find good reasons for sometimes writing these ordered quadruples as square arrays.) -05) € M2(R) is diagonal If 6 * 0 = c. Show that the set D C M2(R) of diagonal matrices ia a subspace of M2(R). Do the same for th© sst T of upper triangular matrices (c = 0).
B. Linear Maps 15 (8) Generalize (8) to 3 x 3 matrices M3(R). (9) A = I . 1 G M2 (R) is singular if ad — be = 0; otherwise it is nonsingular. Show that the set 5 of singular matrices in M2(R) is not a subspace of M2(R). Show that the set NS of nonsingular matrices in M2(R) is not a subspace of M2(R) . B. Linear Maps For two vector spaces V, W (over the same field fc), the kind of function from V to W of most importance to us is a linear map. Definition 0 : V —► W is linear if it respects linear combinations; i.e., for any a, 6 € fc and #, y e V we have (3) 0(ax + 6j/) = acj)(x) + 60(</). Exercise. We define the identity map I : V —> V by 7(f) = v for every v € V. Show that J is linear. Proposition 3 0 : V -> W is linear & (i) <f>(x + y) = 0(x) H- 0(j/), and (ii) 0(aai) = a<j>(x). Proof:. => (i) follows from (3) by taking a = 1 = b and (ii) follows from (3) by taking 6 = 0. <ss 0(ax + fa/) = 0(ax) + 0(fa/) by (i), and this equals acj){x) + b(j){y) by (ii). I Note that if <j> is linear, then (f)(0) = 0 (since <j>(0) = <j>(0 + 0) = </>(0) + 0(0)). (1) Suppose 0 : R —> R is linear. Then 0 is completely determined by knowing 0(1). (If 0(1) = a, then 0(6) = 0(61) = 60(1) = ba.) (2) The same proof shows that a linear map 0 of R into any (real) vector space is determined by knowing 0(1). (3) Let D be the set of all real-valued C°° functions defined on R (i.e., functions with derivatives of all orders). Then D is a real vector ipace and wo have a map a\ D-+D
16 I. Vector Spaces and Linear Maps defined by letting a(f) = /', the derivative of /. Then a is a linear map. (4) With D as in (3), we define 0: D-+D by 0(/)O) = /* f(t)dt. Then 9 is linear. Definition Let 0: V —> W be linear. The image of 0 is the set of all <j>(v) for v G V. We denote the image of 0 by 4>(V) or by im 0. The preimage of A C W, written 0-1(A), is the set of all vectors in V which map into A under 0. The kernel of 0, denoted by ker 0, is the set of all v G V such that <j>{v) = 0. The linear map 0 is surjective (or epic) if 0(F) = W; it is injective (or monic) if ker 0 = 0. A linear map 0 which is both surjective and injective is an isomorphism. If there is an isomorphism from V to W, we say that they are isomorphic and write V = W. Exercise. Show that the linear map 0 is injective O 0 is one-to-one (i.e., if 0(i>) = 0(u>), then v = w). Note that 0-1 is a function from subsets of W to subsets of V\ however, it is not a function from 4>(V) to V in the usual sense unless 0 is monic (i.e., if ker 0 ^ {0} and w G im 0 (so there is a v G V such that <j>(v) = u>), then (j)~1(w) is not well-defined since it consists of the set {v + v'\vf G ker 0}). Examples (1) 0 : R —> R2 defined by 0(x) = (x,x) is linear and is injective, but not surjective. (2) 0: R^ —> R defined by 0(x, y) = x+y is linear and surjective, but not injective. (3) Let 0 : R^ —* R^ be linear. Denote 0(1,0) by (a,c) and 0(0,1) by (ft, d), i.e., 0(1,0) = (a, c), 0(0,1) = (6, d). We assert that 0 in example (3) is an isomorphism 4=> ad — be ^ 0. We can prove this by our "solve-and-plug-in" method discussed in Section A. This is a bit tedious, but we will do it in hopes of motivating our development of the general theory of matrices and determinants. We have 0 : R2 —► R2 linear, and #1,0) « (a,c), 0(0,1)-(6,d).
B. Linear Maps 17 Suppose ad — bc — 0. We show that <j> is not injective (and hence can't be an isomorphism). If both a and c are zero, then 0(1,0) = (0,0) and we are done. If not, then one, say a, is nonzero. Then (—1,1) ^ (0,0), but we claim that </>(-£, 1) = (0,0). Well, (-^,1) = -|(1,0) + (0,1), so #-i,l) = -±^(l,0) + ^(0,l) = -i(a,c) + (6,d) = (-6,-^) + (6,d) = (0,=^) = (0,0). A similar argument works with 6, d replacing a,c. Thus we have proved <\> an isomorphism =>ad — bc^0 (by proving ad — bc = 0=>4> not an isomorphism). Next we need to show ad — be ^ 0 => (i) <j> is injective and (ii) 0 is iurjective. To prove (i), we suppose (4) 0(aM/) = (O,O) and prove that (x,y) = (0,0). As before, we have 0(1,0) = (a,c) and #0,1)-(M). Then (4) gives (1) ax-\-by = 0, and (2) cx + % = 0. Since ad — be ^ 0 one of a, 6 must be nonzero. Suppose a ^ 0. We solve (1) for # and plug into (2) to get / cb \ „ f-cb + ad\ y(--+d)=0 or ^_-_j=o. Since R has no divisors of 0 and ~cb+ad ^ 0, we conclude that y = 0. Similarly, 6 and d can't both be zero so we can solve (1) or (2) for y and plug into the other to get x — 0. Thus 0 is injective. The proof that 0 is surjective goes in a similar way. Given (z, w) G R we want (x,y) such that </>(x,y) — (z,w). We can solve ax + fa/ = z for either x or y and plug into ex + dy = w, solve and plug the answer back into ax + by = z. Proposition 4 If (f> : V —* W is a linear map which is an isomorphism, then 0""1 is also linear (and thus is also an isomorphism). Proof: Let a, b G k and x,y € W. Since 0 is one-to-one, there exist unique elements x',y' of V such that <j){x') = x,<f>(yf) — y. Then ^ox' 4- fa/) = ti,j' +.6y, so that <f>'l(ax + by) = ax' 4- by' = a^""1^) 4- b<p'l(y), proving linearity of </>~l. I W© conclude thin section with a brief discussion of the Algebra of En- domorphismB of a vector space.
18 I. Vector Spaces and Linear Maps The Algebra End(F) Definition An algebra is a vector space W which also has a multiplication WxW -> W (v,w) —> vw which distributes over vector addition u(v + w) = uv 4- UW, and (v 4- w)u = w + wii and satisfies v(aw) = a(vw) = (av)w for a E k. The algebra has a nm£ element if there is an element leW which acts as a left and right multiplicative identity. It is associative if its multiplication is associative. Examples (1) R is an algebra with unit element. (2) C is an algebra with unit element. Indeed, any field is an algebra with unit element, but an algebra with unit element need not be a field because some nonzero elements in an algebra may not have multiplicative inverses. This is true of the algebra End(F) which we are about to define. This algebra End(V) plays a central role in the study of linear algebra. Definition A linear map of a vector space V into itself is called an en- domorphism (or operator) of V. We denote by End(F) the set of all endo- morphisms of V. We first define operations on End(V) to make it into a vector space over k (V being a vector space over the field k). Let 0, ^ G End(V). We define <\> 4- ^ by (4-) 04-t/OO) = <l>(v) + ij)(v) forallveV. For a G k and (f) G End(V) we define acf) by (.) (a(f)){v) = a(<t>(v)) for all v € V. The operation (4-) is easily seen to be associative and commutative. * The zero element of End(V) is the endomorpfalim lending all v € V to the zero element in V. Relative to this zero in IndfK), tha additive inverse of cj) e End(V) is -<fi defined by (-0)(v) » *=0(v). Clearly 0 + (-0) in
II. Linear Maps 19 the zero endomorphism. The other properties of (+) and (•) are routinely verified so that End(F) is indeed a vector space over k. Next, we want to make End(V) into an algebra. For 0, i/j G End(F) their product 0-0 is defined by 4>^{v) = 0(,0(i>)). That is, we just take their composition v JUV A+y v —► xj}(v) —>(f)(tp(v)). N v ' (We will use the notation 0-0 and 0 o -0 interchangeably.) 0-0 is linear because {<p%jj){av + bw) = (f)(a^(v) 4- btp(w)) since i\) is linear = a<t>{^){v)) 4- b(j>(il)(w)) since 0 is linear. Alio our multiplication distributes over addition. For let 0,0,-0 £ End(V) and v £ V. Then {0(0 +VOX*) = 0((0 + ^)) = 0(0(u) 4- ^(v)) = 04>{v) 4- 0V>(^) = (00 + 0^). Thus End(F) is an algebra. This algebra does have a unit element, i.e., the identity map of V to V. We denote this map by 1, 1 : V -> V, and we clearly have 01 = 10 = 0 for all 0 G End(V). It in convenient to use 02 to denote 00, etc. Thus p(0) makes sense if p($) \n any polynomial with coefficients from k. Definition Let A be an algebra with unit element 1. Then any a G A is failed h ?/m£ if there exists b e A such that a& = 1 and ba = 1. (That is, fi r /I is a unit if it has a multiplicative inverse b.) Clearly then b is also a unit; furthermore, this b is unique. (Because if also ac = ca — 1, then b — lb = (ca)b = c(ab) = c\ — c.) Proposition 5 Let A be an associative algebra with unit element 1. Then thr 8ft U of all units in A is a group under multiplication. Proof; By hypothesis, multiplication is associative, 1 acts as identity and, by definition, each u € U has a multiplicative inverse. I Since our multiplication in End(V) is associative (composition is associativa by itn very definition) we have a group of unite* in End(V). Wo denote it by GL{V) aud call it the general linear group ov©r V.
20 I. Vector Spaces and Linear Maps Proposition 6 <\> e End(V) is a unit <$ <j) is an isomorphism. Proof: => <\> is a unit so there exists V € End(F) so that c/yip = 1 and ipcf) = 1. Now 0-0 = 1 implies 0 is surjective (and ip is injective) and i/j(f) = 1 implies 0 is injective (and i/j is surjective) (see exercises). <= If 0 is an isomorphism, take i/j = (p'1 to see that 0 is a unit. I Thus, GL(V) consists of those endomorphisms of V which are actually isomorphisms of V. Exercises (1) Let V —► W be linear. Show that ker 0 is a subspace of V and 4>{V) is a subspace of W. (2) Give specific linear maps 0, i/j : R2 —> R2 such that 0-0 ^ i/j(f). (3) Let F -^-> W -^-> F be linear maps such that V ^ V is an isomorphism. Show that <\> is injective and i\) is surjective. (4) A linear map V —► V is idempotent if pp = p. Show that if p is idempotent then p acts as the identity on p(V). (Such linear maps are called projections: p projects V onto p(V).) (5) Show that <\>: R2 —> R2 given by 0(x, ?/) = (x, x + y) is linear and so is ip : R3 —> R given by i/j(x,y,z) = z. Show that 0 : R2 -+ R given by 0(x, ?/) = xy is not linear. (6) Decide whether or not <\> : R2 —> R2 given by 0(x, y) = (x + ?/, 2x—y) is an isomorphism. If it is, find a formula for (f>~1(u^ v) and show that <\> o 0~1(u, v) = (u, v) and 0"1 o 0(x, y) = (x, y). (7) If 0 : V —> W is linear, show that (f)(—v) = —</)(v) for any (8) If 0:R2-+R2 is defined as in exercise (6), what is p(0) where p(x) = x2 - 2x + 1? C. Bases, Dimension Let V be a vector space over a field k. For a (finite) linear combination of vectors ai^i + «2^2 + ... + amt;m (ai € fc, vt E V) we make a convention: namely, if i ^ j then ^ ^ Vj. That is, vectors with different subscripts must be different vectori. W© can always put a finite
C. Bases, Dimension 21 linear combination in this form — if a given vector occurs two or more times, we sum up its coefficients and then write it just once with this sum as coefficient. Definition Let S be a subset of V. The set S is linearly independent if no two distinct finite linear combinations of elements of S can be equal vectors. For example, if S = {u, v, w} is linearly independent and au 4- bv + cw = a'u + b'v 4- cfw, then it must be that a = a', 6 = b', c = c'. That is, if we don't have a = a', b = 6', and c = c', then au -f bv 4- ciu and a'u 4- b'v 4- c'w must be different vectors in V. Lemma If S is linearly independent and aiVi 4-... 4- amVm = 0 (vi 's in 5), Men a\ = a2 = ... = am = 0. Proof: We have ai^i 4-... 4- amvm = 0 = (toi 4-... 4- 0vm. I Definition If S C V is not linearly independent, then it is said to be Hnmrly dependent Proposition 7 (i) IfOeS, then S is linearly dependent. (ii) If S is linearly dependent and S C T C V, £/ien T zs linearly dependent. (iii) // 5 25 linearly independent and U C S C V, then U is linearly independent. Proof: (i) 1 • 0 = 0 • 0 (two distinct, but equal, linear combinations). (ii) We have a\V\ + ... 4- amvm = 0, with v% £ S and not all a* being zero. Clearly we have the same relation in T since all Vi € T,
22 I. Vector Spaces and Linear Maps (iii) This is the contrapositive of (ii). I Suppose S = {^i,..., vp} is a finite set of vectors in V. How can we proceed to determine if S is linearly dependent or independent? If vi = 0, we are done (5 is linearly dependent). If V\ ^ 0, then its span is a nonzero subspace of V. Span(^i) = {tv\\t £ k}. If any of i?2,..., vp lie in this span (e.g., Vj = to^i), then {vi,Vj} is linearly dependent, and hence so is 5. If none of i?2,..., vp are in Span(vi), consider Span(vi,V2). Show that if one of V3,..., vp is in Span(vi, V2), then S is linearly dependent. If not, look at Span^i,^,^)- If any of V4,... , vp lie in this span, we have that S is linearly independent, etc. If vp is not in Span(vi,..., vp_i), we have that S is linearly independent. Definition A set T is infinite if T contains a subset U which can be put into a one-to-one correspondence with N = {1,2,3,...}. (Of course, we may have U = T.) If T is not infinite it is finite. Definition A vector space V is infinite-dimensional if it contains an infinite linearly independent subset. Otherwise V is finite-dimensional. Equivalently, we can define V to be finite-dimensional if it is spanned by some finite subset. Examples (1) R°° = all sequences of real numbers (Example (2) of IA after Proposition 1). This vector space is infinite-dimensional. For let S = {(1,0,0,...), (0,1,0,0,...), (0,0,1,0,...),...}. Then S is infinite (let (1,0,0,...) <-+ 1, (0,1,0,...) <-+ 2, (0,0,1,0,...) <-► 3, etc.) and linearly independent. (2) R^ is finite-dimensional. Indeed, any three vectors in R^ form a linearly dependent set. (See (i) of Theorem 1 later in this * section.) Definition A set S in a vector space V is a basi$ for V if: Span(S) = V; and S ie linearly indepsadent.
C. Bases, Dimension 23 Example Theset{ei = (l,0,...,0),e2 = (0,l,0,...,0),...,en = (0,0,...,l)}is a basis for Rn or, more generally, for kn (called the standard basis for kn). Proposition 8 S CV is a basis for V O Each element ofV is a unique linear combination of elements of S. Proof: => Since S spans F, any v G V is a linear combination of elements of S. Since S is linearly independent, this linear combination must be unique. <= The hypothesis implies both spanning and linear independence. I Proposition 9 Let {vi,..., vn} be a basis for V, (defined over a field k). If W is a vector space over k and w\,..., wn are any n vectors in W, then there exists exactly one linear map 4>:V-*W audi that 4>{v%) = wi for i = 1,..., n. Proof; v G V may be uniquely written as v = aiV\ + ... + anvn, m If <f> is to be linear we must have (6) . (ft(v) = ai(j)(vi) + ... + an(j)(vn) = aiwi + ... + anwn. Conversely, if 4> is defined by (5), it is linear and 4>(vi) = Wi. I Proposition 10 If A = {i?i,... ,i?m} spans V, then some subset of A is a hamx for V. Pivof: Let A be the set of all linearly independent sets in A. Let n be the maximal number of elements in any set in A. By changing notation, we rimy asnume that: {^i,..., vn} is linearly independent, but if n < i < m, ili^u {V\,..., vn, Vi} is linearly dependent. Thus we have a relation aivi + ... + anvn + alvl = 0 Willi not all coefficients zero. If a* = 0, we contradict the linear inde- * puitc'Umcn of {v[,...yVn}. So we can solve for V{ (vi = —^v\ — ... - |*Hj„) to nhow that v% € Span(vi,... ,vn). Thus vn+i,... ,vm are all in gpaii(ti] f,.., vn) so {vi,..., vn} is a basis for V. I
24 I. Vector Spaces and Linear Maps Theorem 1 Let {^i,..., vn} be a basis for V. Then (i) any wi,... ,wn, iun+i in V are linearly dependent, and (ii) any U\,..., itn-i fail to span V. Proof: (For (i) we really just need that {^i,..., vn} spans V.) We have w\ = anVi + ... + ainvn wn+! = an+i,ii;i 4-... 4- an+i)nvn. If w\ = 0, we are done. If w\ ^ 0, some a\j must be nonzero and we can solve the first equation for Vj (in terms of the other v's and wi) and plug this into the remaining n equations. The second equation has w\,W2 and all but one of the v's in it. If W2 = 0 or if wi and w<i have the only nonzero coefficients, then {wi, W2} is linearly dependent. If not we can solve for some v and plug this into the remaining n — 1 equations. By now the pattern of the argument should be clear. Either {wi,W2, ^3} is linearly dependent or we can solve for some v and plug that into the remaining equations. After n steps we run out of v's and have a nontrivial linear dependence relation among wi,... ,wn+i- The argument for (ii) is similar. If iti,... ,itn-i did span V we would have vi = cntti 4 ... 4ci5n_ittn_i Vn = Cni^i 4 ... + Cn>n_iUn_i. Now fi 7^ 0, so we can solve the first equation for some it and plug in the other equations. We wind up with a nontrivial linear dependence relation among vi,..., vn , contradicting the hypothesis. I Prom Proposition 10 and Theorem 1, we see that any finite-dimensional vector space V has a finite basis and that any two bases for V must have the same number of elements. In fact, any infinite-dimensional vector space V also has a basis (called a Hamel basis) and any two such bases for V can be put into one-to-one correspondence; we will show existence of bases after we have discussed Zorn's Lemma in section IF. Definition The number of elements in a basis for the finite-dimensional vector space V is the dimension of V, written dim V.
C. Bases, Dimension 25 Proposition 11 Let W be a subspace of the finite-dimensional vector space V. Then any basis for W can be extended to a basis for V. Proof: Let {wi,... ,wp} be a basis for W. (This must be finite or we would have an infinite linearly independent set in V.) Then {wi,..., wp} is linearly independent in V. If Span(u>i,..., wp) — V, we are done (since Span(u>i,..., wp) = W so V = W). If not, we take vi,..., Vk in V — W so that Span(wi,...,Wp,t;i,...,t;jfe) = V. We consider linearly independent subsets of {u>i,..., wp, t>i,..., Vk} and proceed, as in Proposition 10, to get a basis for V containing {u>i,..., wp}. i This clearly implies dim W < dim V. Now suppose V and W are finite-dimensional vector spaces over a field k) and 4>: V-+W is a linear map. We have seen that the kernel of <f> ker (f) = {v e V\<t>(v) = 0} 1§ a subspace of V and the image of 0 im cf) = {(f)(v)\v e V} i» H iubspace of W. We now prove an important relation involving dimen- llf)IIN. • Thaorem 2 (The Rank Theorem) dim V = dim (ker 0) + dim(im 0). Proof: Let {i>i,..., vr} Q V be a basis for ker 0, and {u>i,..., u>s} C W bp a, basis for im <\>. Choose tti,..., us e V such that <j>{ui) = wi (we can do thin by tho definition of im 0 = 4>(V)). We claim that {t>i,..., vr, tti,..., tts} ii a basis for V. This will prove the theorem. First we will show {i>i,..., vr, tti,..., its} is linearly independent. Sup- \umv ai?^ «f \~arvr + b\Ui -\ \-bsus = 0. Since vi,... ,vr are in ker 0, fhip Implies c/)(aifi + ... + arvr + 6itti + ... + bsus) = 0(0) = 0 0(&iui + ... + b8u8) b\<t>(ui) + ... + ba<l>(ua) II hw% + .,, + 6i^i.
26 I. Vector Spaces and Linear Maps Since wi,... ,ws are linearly independent in W, hiw\ + ... + b8w8 = 0 implies 61,..., bs are all zero. But then a\V\ + ... + arvr = 0 so a\,..., ar are all zero. This proves {vi,...,vr, uu...,us} is linearly independent. It remains to prove this set of vectors spans V. Let x G V. Then <t>(x) e 4>{V) so (f)(x) = tiWi + . . . + tsWs. Set u = t\Ui +... + tsus. Since (f>(ui) — W{, we see that (f)(x—u) = 0. Thus x — u is in ker 0 and is a unique linear combination of v\,..., vr x — u — rri\V\ + ... + mrvr , so x = miVi + ... + mrvr + t\Ui + ... + tsus. I The rank theorem has two important corollaries. Corollary 1 Let V he a finite-dimensional vector space and (j): V —> V be a linear map. If (f> is monic, then <\> is an isomorphism. If <\> is epic, then (f) is an isomorphism. Proof: If (f> is monic, the rank theorem implies that dim (f>(V) = dim V, so 4>(V) = v. If (f> is epic, 4>{V) = V, and the rank theorem implies that dim (ker (f>) = 0. Thus (f) is monic. I Corollary 2 Let V and W he finite-dimensional vector spaces of the same dimension. Then V = W. Proof: Let {vi,..., vn} be a basis for V and let {wi,..., wn} be a basis for W. By Proposition 9, there is a linear map 0 : V —+ W such that <t>(vi) = W{ for i = 1,... ,n. Note that 0 is epic since {wi,... ,wn} spans W. Now the rank theorem implies that ker (j) — {0}, so <\> is also monic. Hence (j) is an isomorphism. I Corollary 2 says that any n-dimensional vector space V over the field k is essentially kn since we get an isomorphism V^kn once we choose a basis for V (we already have the standard basis for fen).
C. Bases, Dimension 27 Exercises (1) Check that a: N-+N defined by a(n) = 2n gives a one-to-one correspondence between the set of all natural numbers and the set of all even natural numbers. Use this to prove that: If a set S is infinite, then it can be put in a one-to-one correspondence with a proper subset of itself. (2) Determine whether or not {(1,1,0), (2,0,-1), (-3,1,1)} is a basis for R . (3) Let (f> : V —> W be linear. Suppose that vi,..., vp G V are such that (f>{v\),..., 4>{vp) are linearly independent in W. Show that vi,..., vp are linearly independent. (4) Let (f> G End(V) for a finite-dimensional vector space V. Prove that cf) is monic <3> (f) is epic <3> <\> is an isomorphism. (5) Show that Pn = {polynomials with real coefficients of degree < n} is an (n + 1)-dimensional subspace of the infinite-dimen sional vector space of all real polynomials. (6) Let Vbea vector space over a field k and let £/, W be finite- dimensional subspaces of V. Prove that both U + W = {u + w\u G i7, w G W} and U D W are finite-dimensional subspaces of V, and dim(U + W) + dim(U n W) = dim U + dim W . (Mint: We can find bases: {vu...,vp} for t/HW {fi,... ,fp, ui,.. .,uq} for 17 {vi,...,vp, wi,...,iur} for W. ■ Show that {v\,... , i>p, ui,... , ug, u>i,... ,uv} is a basis for U + W.) (7) Show that the set of real numbers R is a vector space over the rational numbers Q. Show that R is not finite-dimensional as a vector space over Q.
28 I. Vector Spaces and Linear Maps (8) Show that the set of complex numbers C is a two-dimensional vector space over R. (9) Define two subspaces C/,y of a vector space W to be complementary if (i) U fl V = {0}, and (ii) U + V = W. Prove that [7,V are complementary <£=£► each w G W may be uniquely represented as w = u + v, u €U, v eV. (10) Let {ei,e2,e3} be the standard basis for R^ and defiine (f> : R3 -> R by <£(ei) = 1, 0(e2) = 2, <£(e3) = -1. Determine the subspaces ker 0 and im 0, and verify the rank theorem in this case. (11) (f) : V —> V is nilpotent of order 2 if (fxf) is the zero endomor- phism. Now composition of two such endomorphisms need not be nilpotent of order 2. Find <\>,ty : R^ —> R , each nilpotent of order 2, whose composition is idempotent. D. Direct Sums, Quotients In this section we discuss a generalization of the way we constructed the vector space R^ from R. If V, W are vector spaces over a field fc, we will construct a new vector space V 0 W called the (external) direct sum of V and W. The set of vectors in V 0 W is just the set of all ordered pairs (v,w) with v EV and w € W. We add two such pairs coordinatewise (v, w) + (x, j/) = (v + x, w + y) and for a scalar r £ k we define r(v, t/;) = (rv, rt/;). The fact that these operations make V 0 W into a vector space is trivial. It satisfies the axioms for a vector space since V and W do. The zero vector in V 0 W is (0,0), the first vector being the zero in V, the second that in W. Just as we can include R into R^ as either the x-axis or y-axis, and we can project R^ onto either axis, we have four naturally defined linear maps.
D. Direct Sums, Quotients 29 V V V®W w w k(v) = (v,0) i2(w) = (Q,ty) Pi(x,y) =x p2{x,y) =y W© call ii^2 inclusions and pi,P2 projections. Clearly pi o^ is the identity map on V and p^ o i2 is the identity map on W. It follows that i\,i*i are Itljocfcive (linear maps) and Pi,P2 are surjective (linear maps). Note also thai P2 oil and pi oi2 are zero maps. Actually, the image of i\ is the kernel of pa and the image of i<i is the kernel of pi. Proposition 12 If {vi,... ,vn} zs a feasis for V and {wi,... ,wm} is a buni* for W then {(vu 0),..., (vn, 0), (0, iyi),..., (0, wm)} U a basis forV ®W. Pwof; Given (v,w) <G V © W we have v = aivi + ... + anvn and w = biWi + ... + bmWm uniquely. Tlinn (u,u/) = ai(vi,0) + ... + an(vn,0) + bi(Q,wi) + ... + bm(Q,wm), also utihjudy I CWnllary dim(V © W) = dimF + dim W NpmI wn consider when a vector space is, "somehow," the direct sum of iWn of Itw Nuhnpaces. So suppose A and B are subspaces of the vector space V (6vt»r #). Then A and 5 are vector spaces over k and we can construct A$ B SB above. We use the operation of vector addition in V to define a Hi up rj: A&B-+V, tfUhb) — aH-6,
30 I. Vector Spaces and Linear Maps Proposition 13 (i) 7] is a linear map. (ii) rj is infective O Af)B = {0}. (iii) rj is surjective <3> A\J B spans V. Proof: (i) rj(r(ai,bi)+s(a2,b2)) = v(rai + sa2,r61 + 562) by definition of operations in A ® B. By definition of 77 this is rax + sa2 + rbi + sb2 = n/(ai, &i) + srj(a2,b2), proving rj is linear. (ii) <= We suppose Af)B = {0} and 77(0,6) = 0. That is a + 6 = 0. This implies a = —6 G B and 6 == —a G i so a = 0,5 = 0. => (Contrapositive proof) Suppose c e An B. If c 7^ 0, we have ry(c, 0) = c and 77(0, c) = c but (c,0) ^ (0,c). Thus 77 is not monic. (iii) <= Given v G V, it is a finite linear combination of ai,..., a^ in A and 61,..., 6m in £?. Add up (with coefficients) the part in A to get a G A and similarly get b e B. Then 77(a, 6) = a + 6 = 7j. => This is obvious. I Definition When 77 (above) is an isomorphism, we say that V is an internal direct sum of A and B. By Proposition 13, we conclude that V is the internal direct sum of A and B exactly when we can express every v G V uniquely as v = a + 6 where a G A, 6 G £?. Examples (1) R^ is the internal direct sum of A = {(x,0)GR2} and B = {(0,7/) GR2}. (2) R^ is the internal direct sum of A (as in (1)) and C = {(#, x) G R2} because A n C = {0} and A U C spans R2. (Given (p, (?) G R , we have (?,«) = (P-0.O) + («,g).) £,4 eC
D. Direct Sums, Quotients 31 There is some useful terminology for situations where we have three or more vector spaces and linear maps like, e.g., (6) U -^ V ^-* W. Definition We say the sequence (6) is exact at V if imp = ker a . Let us denote the trivial ( = one-element) vector space by 0. Then if wo assert that in exact at {/, we are simply asserting that p is monic. Ill Hilary, if we assert that 1« ixact at W, we are simply asserting that a is epic. Definition If (T) . 0->U-*>V-Z>W-+0 li M\ exact nequence of vector spaces (i.e., exact at {/, at V, and at W), we nay that (7) .spZite if there is a linear map 7 (called a splitting) v 2- w Stir11 that (7 © 7 in the identity on W. Theorem 3 For finite-dimensional vector spaces, the exact sequence (7) fjMf * MplU, Pft«»/; OIioohc a basis (2:1,..., zm} for W. By Proposition 9, given any m v« turn |/i,..., j^m in V there is exactly one linear map W —> V sending *i to fy( i -*= I,..., m, So if we choose Vi € flr"1^) i = l,...,m (which Wfc can do since a is epic) and we set 7(*«) s M »
32 I. Vector SpacciH and Lincwir Maps then the resulting linear map clearly has a o 7 = id . I Corollary Given the exact sequence of finite-dimensional vector spaces and any splitting 7 : W —► V, then V is represented as an internal direct sum V = p{U)®y{W). Proof: Let {a?i,..., xn} be a basis for U and {^i,..., zm} be a basis for W. Since p(U) f)j(W) = {0} (by exactness at V), it will suffice to show that {p(xi),..., p(xn),7(*i),..., 7(^m)} is a basis for V. Since p is injective, {p(a?i),... ,p(xn)} is linearly independent in V. Also {7(^1),...,7(^m)} is linearly independent in V (since their images under a are linearly independent in 7(W^)). Since p(U)r)-f(W) = {0}, {p(xi),...,p(xn), 7(zi),...,7(zm)} is linearly independent in V. By the rank theorem, dim V = n + m so that set is indeed a basis. I Quotients Let V be a vector space over a field k and let W be a subspace of V. We use W to define an equivalence relation on V. Namely, for v,w G V, we define v ~ w if v — w E W. Now (i) v ~ v, since v-u = 0G W. (ii) v ~ w => w ~ v, since v — w G W =^ w — v = — (v—w) G W. (hi) v ~ w and w ~ u => v ~ it, since v — u = (v — w) + (w — u) and VF is closed under vector addition. Thus ~ is an equivalence relation. The equivalence relation ~ divides V into disjoint equivalence classes which are called cosets. Proposition 14 Let C be any one of these equivalence classes and let v be any element of C. Then C = W + v = {w + v\w G W}.
D. Direct Sums, Quotients 33 Proof: By definition C = {u\u ~ v} = {u\u — ve W}. For such a u, u — v = w E W so that u = w 4- v. So the proposition is clear. I Thus each equivalence class is just UW translated by adding a fixed vector to each element of W." We have a schematic picture We denote the collection of equivalence classes by V/W and will now make V/W into a vector space, called the quotient space of V modulo W. Definition Let Ci, C% be two equivalence classes. Choose any v G C\ and any w G C% and define C\ + C<i to be the equivalence class containing the vector v + w. For any equivalence class C and any r G k we choose any u G C and define rC to be the equivalence class containing the vector ru. Proposition 15 These operations are well-defined, i.e., they are independent of our choices. Proof: Let v' G C\ and w' G C2. We need to show that the class containing v + w is also the class containing v' + w'. But v — v' = w\ G W and w — w' = W2 G W, so (y + w) — (v' 4-1//) = wi 4- W2 £ W"- Thus addition is well-defined. If v! (as well as u) is in C, then u — u' — w and ru — ru1 = r(ii-u') = ru; € W, so scalar multiplication is well-defined. 1
34 I. Vector Spaces and Linear Maps Proposition 16 With these operations V/W becomes a vector space over k. Proof: Exercise. I Exercise. If dimF = n and dimVF = m, show that d\mV/W = n — m. (Hint: Let {wi,... ,wm} be a basis for W and extend this to a basis {wi,..., wm, i>m+i, • • •, vn} for V. If Vi denotes the coset W + Vi, show that {vm+i,..., vn} is a basis for V/W.) Proposition 17 Let r\ : V —> V/W be the function obtained by assigning to v G V the equivalence class r](v) G V/W to which v belongs. Then r\ is a linear map (which is clearly epic). Proof: r](av 4- bw) is the equivalence class to which av 4- bw belongs. This is (by definition of 4- in V/W) the sum of the classes containing av and bw, and this equals (a times the class of v) 4- (b times the class of w). That is, r](av 4- bw) = arj(v) 4- br)(w), proving linearity. I Let U, V be vector spaces over k and 0 : U —> V be a linear map. We know that ker 0 is a subspace of U and (j>(U) is a subspace of V (and hence is a vector space). Theorem 4 ker < 4>(U). Proof: We have U V u ker <f> where i/j is defined as follows: for u € U,rj(u) =u + ker (f> and we set i/>(ri(u)) = (f>(u). We claim: if; is well-defined, ^ is linear, ^ is monic, and U $ ker ^ = 4>(U).
D. Direct Sums, Quotients 35 If u + ker (f> = u' 4- ker 0, then r](u) = ?/(?/) so ip(ri(u)) = ^(^(u')) and ^ is well-defined. i/j(ar](u) + bq(y!)) = ip(r](au + buf)) (since r/ is linear) = (j>(au 4- bu;) (by definition of xjj) = a<t>(u) 4- b<t>{v!) {(f) is linear) .j = ai/j(r](u)) 4- bi/j(r](u')) (definition of ip). Thus ip is linear. Clearly ^ is monic, for if 0 = i/j(r](u)) = (j>(u), then u G ker 0 and ^) = 0inKih- Finally, if v = 0(tt) then i/j(r](u)) = v, so that ipL U ) = </>(U). I Exercises (1) Show that V is the internal direct sum of the subspaces A and B <^ every v E V can be written uniquely as v — a 4- b where a G A, b G B (i.e., A and B are complementary). (2) Generalize the notions of external and internal direct sums to three or more summands. (Hint: The characterization of internal direct sum given in exercise (1) is easier to extend. Exercise (4), below, shows how to extend the definition given after Propositon 13.) (3) HU and W are subspaces of the vector space V, show that (u + w)/w^u/(unw). (4) Let Vi,...,Vn be subspaces of the vector space V and suppose that V = V-i 4- • • • 4- Vn. Show that if Vi n (Vi 4- ■ • ■ 4- Vi-x 4- Vi+i H V Vn) = {0} for i G {1,..., n}, then V is the internal direct sum of V\,..., Vn. (5) Let Wbea subspace of the finite-dimensional vector space V. Show that there is a subspace U of V such that V = U 0 W. (Hint: Consider the exact sequence 0 -► W -► V -► V/W -► 0 of finite-dimensional vector spaces.) (6) Let W bo the subspace of R3 spanned by {(1,1,0), (1,0, -1)}. Find a Huhnpnco U C R3 such that R3 a [/ © W.
36 I. Vector Spaces and Linear Maps (7) Let V be finite-dimensional and suppose Vi,..., Vm are sub- spaces of V. Furthermore, suppose that B{ is a basis of Vi for each i. Show that V = Vi 0 • • • 0 Vm <=> B = Bx U • • • U Bm is a basis of V. The Linear Geometry of R . In the real vector space R , a line is a coset of a 1-dimensional subspace of R3 (i.e., any line L is L = v + V where V is a one-dimensional subspace of R3 and v G R ). Similarly, a plane in R3 is a coset of a two-dimensional subset of R . A point is a coset of the (trivial) zero-dimensional subspace {0} of R . The dimension of a coset S is the dimension of the corresponding subspace. Note that the collection S of all cosets of R3 is partially ordered by inclusion. Proposition ( Prove) Let {Sa}aeA be a collection of cosets in R . Prove that Qj6Y4 g is either a coset of R3 or is empty. Definition Let Si,S2 be two cosets in R . Their join is the intersection of all cosets S G S such that Si C S and #2 Q S, written Si J#2- Prove the following. (1) Let V be a subspace of R . Then v + V = w + V<^wev + V<3>-v + weV. (2) Ifw + W = v + V, then W = V. (3) (w + W)J(v + V) =w + [-w + v] + W + V, where [-a + 6] denotes the one-dimensional subspace generated by the vector (-a + b). (4) (w + W)Ci(v + V)^fb&(-w + v)eW + V. Use (3) and (4) to prove. Proposition Let S = w + W, T = v + V. Then SnT = 0 <£> dim(5JT) = dim(TV + V) + l. Definition w 4- W and v + V are parallel if W C V or V C W. (Note that a point is parallel to every coset). Prove the following theorem. Theorem Let 5, T be two cosets in R . Then (1) S C T => dim 5 < dimT, and equality implies S m T.
E. Eigenvectors and Eigenvalues 37 (2) IfSnT is not empty, then dim(SJT) + dim(5 n T) = dim S + dim T . (3) If S fl T t^ 0, £/ien 5, T are parallel <3> one contains the other. (4) IfSHT = 0, tfien 5,T are para/ZeZ *» dim(5 JT) = max{dim 5, dim T} + 1. Now prove the following theorem. Theorem (1) The join of two distinct points is a line. (2) The intersection of two nonparallel planes is a line. (3) The join of two lines with a point of intersection is a plane. (4) The intersection of two coplanar nonparallel lines is a point. (5) The join of two distinct parallel lines is a plane. (6) The join of a point with a line not containing this point is a plane. (7) The intersection of a plane with a line not parallel to it is a point. E. Eigenvectors and Eigenvalues (Part i) In this section we are interested in linear maps of a vector space V into itself (i.e., elements of End(V)). Definition Let <\> : V —> V be linear. A subspace W of V is said to be 4>-stable if cj)(W) C W. Note that if W is 0-stable, then the restriction of 0 to W is a linear map (f>\ w '• W —> W.
38 I. Vector Spaces and Linear Maps Examples (1) Let D be the real vector space of C°° functions from [0,1] into R. The set E C D of elementary functions are those representable in closed form (i.e., start with #, logx, e^cosx, and arccos x, and then construct new functions from these by using the standard algebraic operations and composition). It is not hard to verify that E is a subspace of D. We looked earlier at the two linear maps e D ^ D a defined by a(f) = /' (i.e., differentiation) and *(/)(*)= fXf(t)dt. JO The subspace E is a-stable, but not 0-stable. By basic formulas for differentiation, if / is elementary, then so is its derivative /'. But if, for example, /(t)=y/2 sin2 t+cos2 t, then / is elementary, but 9(f)(x) = Jq f{t)dt is not an elementary function. (It is an elliptic function.) (2) A linear map p : V —> V is said to be idem/potent if pop = p. For such a linear map, p(V) is p-stable. Indeed suppose y G p{V), i.e., y = p(v) for some v G V. Then p(y) = P(p(v)) = p(v) = y so that p restricted to p(V) is the identity map. (3) Let (j): R2 -> R2 be denned by (j){x1,x2) = (xi + x2, xi + x2). Then the diagonal D = {(x,x) G R } is </>-stable because </)(x,x) = (2x,2x) eD. (4) Let D be the real vector space of C°° functions and let 0 : D -^ D be differentiation. Let W = {rex|r G R}. This is a one-dimensional linear subspace of D which is 0-stable. Indeed, (j) is the identity map on W. The idea of finding 0-stable subspaces W of V is to study (j> by observing its action on the 0-stable subspaces. The point is that it is easier to understand linear mappings on lower-dimensional spaces. In particular, we noted earlier that a linear map a : k —► k (k a field) is eomplH.t*ly
E. Eigenvectors and Eigenvalues 39 determined by knowing its value on one element of fc, e.g., a(l). So we make the following definition. Definition A nonzero vector v G V is an eigenvector for 0 : V —> V if Span(i>) is 0-stable. A more customary way of phrasing this definition is to say that v 7^ 0 is an eigenvector for <j> if there exists some A G k such that 4>(v) = Xv. Notice that in this statement the choice A = 0 is not excluded. Thus every nonzero vector v G ker <\> is an eigenvector for (/>. If v is an eigenvector for <j> then the (unique) A such that cf)(v) = Xv is called an eigenvalue for <j> corresponding to v. Proposition 18 Ifv is an eigenvector for (j), andr is any nonzero element of k, then rv is also an eigenvector for (j) with the same eigenvalue as v. Proof: Let A be the eigenvalue of v. Then (j)(rv) = r(j)(v) = r(Xv) = X(rv). I This suggests that, for a linear map <j>: V —+ V, the notion of eigenvalue may be more fundamental than the notion of eigenvector. So for A G k we let V{X) = {ve V\(j){v) = Xv}. Then each V(A) contains the zero vector, and each nonzero vector in V(A) is an eigenvector of 0 having A as eigenvalue. Note that V(0) = ker </>. Also note that each V(A) is a subspace of V called the eigenspace belonging to A. (Of course, we expect that most V(A) will consist of the zero vector only.) The dimension of V(A) is the geometric multiplicity of A. Proposition 19 J/A,/i G k and A^/i, then v(x)nv(v) = {o}. Proof: live V{X) n V(ijl), then (f){v) = Xv = fJLV ho that (A - fj)v = 0. Thus if A # & then v = 0. I
40 I. Vector Spaces and Linear Maps An Important Example: There exists a linear map <\>: Rr —> R^ such that V(X) = {0} for each AeR. Let ei = (1,0) and e2 — (0,1). Define <\> by <\>(e,\) = e2,4>(e2) = —e\. If v = aei + be2 is in V(A), then 4>(v) = a0(ei) 4- b(f>(e2) = Xv = X(ae1 4- &e2) so ae2 4-6(—ei) = Aaei 4-A6e2- Thus Afr = a and Aa = -6. So A(A6) = Aa = -b. Since A2 ^ -1 in R we must have 6 = 0. But then v = aei and 0(f) = a</>(ei) = ae2 = Aaei implying that a = 0 as well as 6 = 0. So t> = 0. The point is that <j> rotates the plane R^ by 90° and the only vector which goes to a scalar multiple of itself is the zero vector. Let (j) e End(F) and V 6 GL(V) (C End(F)). Theorem 5 If v is an eigenvector for (j) with eigenvalue X, then ij)(y) is an eigenvector for ijxfrip"1 with eigenvalue A. Proof: ^0^""1(^(t;)) = ip(j)(v) = i/j(Xv) = Xi/j(v). I Observation: Suppose V has a basis {t>i,..., vn} consisting of eigenvectors of <j>. If Ai,..., An are the corresponding eigenvalues, then <j> is simply described. If v = a\V\ 4-... 4- anvn then (j)(v) = aiAiVi 4-... 4- anXnvn. Exercises (1) Show that V(A) is, indeed, a subspace of V for each A G k. (2) Let {ei, e2} be the standard basis for R . Find the eigenvalues and corresponding eigenvectors for <j> : R^ —> R^ denned by (j>(e\) = 2ei 4- e2, 4>{e2) = 2ei 4- 3e2. What is the geometric multiplicity of each eigenvalue? (3) Let {ei, e2, 63} be the standard basis for R . Find the eigenvalues and corresponding eigenvectors for <\> : R** —> R** defined by 4>{ei) = ei, 4>{e2) = ei + e2, 0(e3) = e3. What is the geometric multiplicity of each eigenvalue? (4) Recall that we defined <j>: R —> R in the important example by <j>(e\) = e2, ^(^2) = —ei. Now suppose that we consider {ei, 62} as a basis for C2 and define <\> : C2 —> C2 by 0(ei) = 62, 0(^2) = —ei- Does this change matters? That is, does <f> have any eigenvalues now, and if so, what are the corresponding v{xys?
F. Dual Spaces 41 (5) Given 0 G End(F), show that 0 is an eigenvalue for 0 <=> ker 0 ^ {0}. (6) Suppose that A is an eigenvalue for the isomorphism 0 G End(F). Show that A-1 is an eigenvalue for (f)~1. (7) Given 0, i\) G End(F), show that 0-0 and -00 have the same eigenvalues. (Hint: consider the cases A = 0 and A 7^ 0 separately) (8) Suppose that the vector space V is the direct sum of the sub- spaces Vi,..., Vm. Also, suppose that we have endomorphisms 01, • • •, 0m where fa G End(V^). Given v G V, we can write it (uniquely) as v = V\ H h vm where Vi G Vi. If we define 0 = fa © • • • © 0m by fav) = fa(vi) + • • • + 4>m(vm), show that 0 is an endomor- phism of V. Furthermore, show that each Vi is 0-stable and that 0| v; = 0i- (9) Suppose we have 0 G End(F) and 0-stable subspaces Vi,..., V"m such that F = Vi © • • • © Vm. Show that 0(F) = faV{) © • • • © 0(^m). F. Dual Spaces Let V be a vector space over a field fc, and let V* be the set of all linear maps 0 : V -* ft. Then F* becomes a vector space by defining (rfa)(v) = r0(i>) and (0 + ^)(v) = 0(v)+^(v) for r G ft and 0, -0 G V*. Definition This vector space V* is called the dual space of V, and its elements are called junctionals. Examples (1) Let V = fcn and let 0 : fcn —> fc be projection on the ith factor; i.e., 0(a?i,...,a?n) = a:< • Than 0 in a functional.
42 I. Vector Spaces and Linear Maps (2) Let V be finite-dimensional and choose a basis {^i,..., vn} so that v G V can be written as v = £it>i H \- xnvn for xi G k. Then for any (ai,..., an) G fcn we get a functional defined by 0(v) = aixi H h anxn . In Proposition 20, we shall see that the converse is also true. That is, any <j> G V* can be written in this form once we choose a basis for V. (3) Let V = C([0,1]) be the vector space of real-valued continuous functions on [0,1]. Then 4>{f) = / f(t)dt Jo is a functional. (4) Let V be any vector space (possibly infinite-dimensional) and let B be a basis for V. If we choose v e B, then there is a unique functional v* such that v*(v) = 1 and v*(w) = 0 for any w e B — {v}. Exercise. Verify these examples. Proposition 20 If V is finite-dimensional, then dim(F*) = dim(F) so thatV^V*. Proof: Choose a basis a = {t>i,..., vn} for V and define vj,..., t>* G F* by <(^) = <^ . We claim that a* = {vj,... ,i>*} is a basis of V* (called the dual basis of a). First, a* spans V*. For if 0 G V*, then we have 0 = ^(viK + • • • + 0KX (just apply both sides of this expression to any vt to get (j){vl)). Next, a* is linearly independent. For if we have scalars ci,..., cn such that civj + • • • + cn< = 0eT, then 0 = {c\v\ + • • • + cnv^)(vi) = ca for each z. We get an isomorphism i/ja : V —> V* by defining
F. Dual Spaces 43 We remark that the isomorphism V = V* depends strongly on our choice of basis (i.e., a different basis /3 for V would lead to a different dual basis j3* and a different isomorphism -0/3)• In fact, there is no obvious way to identify V to V* unless we choose a basis. Since we use the term "dual space", it seems reasonable to ask if, somehow, V is also the dual of V*. We will see that when V is finite-dimensional, we have a natural way to identify V with the dual space (V*)* of V*. Here, "natural" means°"basis free." That is, our isomorphism of V onto (V*)* will not require us to choose a basis. We still have a natural map of V into (V*)* when V is infinite-dimensional, but it is only monic in this case. We now define our map p:V-+(V*)* as follows. Given v G V, we want p(v) to be a functional on V*. So given 4> G V*, we define p{v){(j)) = (j){v) . Exercise. Show that p(v) is, indeed, a functional on V*. Proposition 21 p : V —> (V*)* is a linear map. Proof: Choose v±, t>2 G V and r G k and </> G V*. Then we have p(rvi)(0) = (j){rvi) = r0(vi) = rp(vi)(0) and p(vi H- V2)(0) = 0(vi H- V2) = 0(vi) H- 0(^2) = p(vi)(0) + p(v2){(j)) = (P(V1) + P(V2))(0). Since 0 is arbitrary, we must have p(rvi) = rp(vi) and p(vi + V2) = p(vi) + p{v2) . I We are going to show that the linear map p : V -► (F*)* is monic (even when V is infinite-dimensional). This will use a logical principle known as Zorn's lemma, so we make a slight detour to discuss that. A nonempty set S is partially ordered if for some pairs (a, b) G S x 5, there is a relation, written a <b, which satisfies the properties (1) a < a, (2) if a < 6 and 6 < c, then a < c, and
44 I. Vector Spaces and Linear Maps (3) if a < b and b < a, then a = b. The elements a,b E S are said to be comparable if either a < b or b < a. Note that we do not require all pairs of elements to be comparable in a partially ordered set; however, if a, b G S are always comparable, then S is simply ordered (or linearly ordered or totally ordered). A partially ordered set S has a maximal element ao if a < ao for all a G S which are comparable to ao- An upper bound for the nonempty subset T C S is an element bo £ S such that b < bo for all b eT. Zorn's Lemma: Let S be a partially ordered set, and suppose that each simply ordered subset of S has an upper bound in S. Then S contains a maximal element. Now let V be a vector space over the field k. We will use Zorn's lemma to show that any nonzero element uG^is part of a basis for V. (In particular, V always has a basis, as claimed earlier.) Let S be the collection of all subsets of V which are linearly independent and contain v. We partially order S by inclusion: for a, b G S we write a < b to mean a C b. Now let T be a simply ordered subset of S, and let U be the union of all the elements (sets) in T. Clearly v G U, and we claim that U G S, which will show that U is an upper bound for T. So we just need to show that U is a linearly independent set. Suppose m,..., um are elements of U with a\U\ + • • • + dmUm = 0 where a\,..., am G k. Then, since all U\,..., um are elements of some r E T, and every r G T is a linearly independent set in 5, we must have all ai = 0. Thus 5 satisfies the hypotheses of Zorn's lemma, and we conclude that S has a maximal element So G 5. Now s0 is a linearly independent set in V and it contains v. If $o does not span V, we can choose w e V outside the span of So- But then Sq U {w} is a linearly independent set containing v, contradicting the maximality of s0. Thus we have proven Theorem 6 Let V be a vector space over k and suppose v G V is any nonzero element. Then V has a basis containing v as one element. Corollary Given the nonzero element v G V, there exists a linear map 4> :V —► k such that 4>(v) — 1. Proof: Take a basis for V containing v and apply example (4). I Proposition 22 p : V —► (V*)* is monic. Proof: Suppose p is not monic. Then for some nonzero v € V, we have p(v) = 0. This means p(v) is the zero map V* —> fc; that is, for any <f> € V* we have p{v){4>) = 0(v) = 0. But, by the corollary, some 0 mapi V to 1, a contradiction. 1
F. Dual Spaces 45 Proposition 23 IfVis finite-dimensional, then p is an isomorphism. Proof: We know that dim V = dim(V*) = dim((V*)*), so the monic map p must also be epic. I Exercises (1) Show that the set of integers Z equipped with the standard order is simply ordered but has no maximal element. (2) Consider the subset of R2 S = {(xux2)\x2<0} and define a relation on S by (^i,^) < (2/1,2/2) <=> #1 = V\ and X2 <V2- Show that S is partially ordered by this relation. Show also that S has infinitely many maximal elements. (3) Given the basis {(1,0,0), (1,-1,0), (2,0,1)} of R3, find its dual basis. (4) Generalize Theorem 6 as follows: let V be a vector space over k and let A be a linearly independent subset of V. Show that A can be extended to a basis for V. (5) Let W be a subspace of the vector space V. If 0 G W*, show that we can find a <j> G V* such that <\> = 0. w (6) Given the subset W of the vector space V, we say that </> G V* annihilates W if <j>(w) = 0 for every w G W. We call A(W) = {(j) G V* I (j) annihilates W} the annihilator of W. Show that A(W) is a subspace of V*. Show also that if W C W, then A(W0 C A(W')- (7) If W is a subspace of the finite-dimensional vector space V, show that W* ^ F*/A(W) . Conclude that dim(A(W)) = dimF - dim W\ (Hint: Define a map ip : V* -* W* by ^»(0) =01^ and show that ip is surjective with kcr ip = A(W), Now apply Theorem 4.)
46 I. Vector Spaces and Linear Maps (8) Use exercise (7) to show that W = A(A(W)). Note that we must identify V and (V*)* for this to make sense. (Hint: First show that W C A(A(W)).) (9) Let U and W be subspaces of the (not necessarily finite-dimensional) vector space V. Show that A(U + W) = A(U)f)A(W). If V is finite-dimensional, show also that A(U f) W) — A(U) + A(W). (10) If V = U © W, show that V* = A{W) 0 A(U).
Chapter II Matrices and Determinants A. Matrices We have already noticed that a linear map <j> : k —> k is completely determined by its value on one element of fc, e.g., on 1. Also we noted that a linear map (j): k2 -> k is determined by 0(1,0) and 0(0,1). We will now see how a linear map (j)'.V -+ W (V, W being vector spaces over k of dimensions n, m) is determined by nm elements of k. Let {t>i,..., vn} be a basis for V and {w\,..., wm} be a basis for W. Then (by Proposition 8 of Chapter I) each <j>{v\) is a unique linear combination of wi,..., wm. We write this as (1) <j>(v%) = anwi + ai2w2 H h aimwm . Also, we know that <\> is uniquely determined by the n vectors (in W) </>(i>i),..., <j>(vn). Thus the rectangular array (2) M{4>) of nm elements of k determines 0, as long as we know the bases {v\,..., vn }, {wi,..., wm). Thta rectangular array of elements of & is called a matrix ( an Q>21 \ dnl «12 • &22 • ^n2 • • • 0>lm \ • • A2m &nm /
48 II. Matrices and Determinants and the one shown has n rows and m columns', it is called annxm matrix; and M((/>) is called the matrix representation of <j> with respect to the bases {vi,..., vn} and {wi,..., wm}. We sometimes write M(4) = (<*y) meaning that we will denote the element in the ith row and jth column by aij, and we call it the i, j entry of (a^). With the given bases for V and W and the matrix M(<j>) we can describe <j> quite explicitly. Given x G V, we have x = X1V1 H h £n^n uniquely. Since 0 is linear, we have (j){x) = xi0(vi) H h x„0(v„). Then, by using (1), we have (j)(x) = xi(aniyi H h aimwm) H h x„(oniiui H h anmwm) = {xian + x2a2i H h xn«ni)^i 4- • • • 4" v^i^i7?T, 4 • • • 4* xnanrn)wrn . In particular, we see that the coefficient of Wj in the unique expression for (f)(x) is (3) yj = xxaxj 4- x2a2j H h xnanj. This is the element of k obtained by multiplying the row (a?i, x2,... #n) by / aij \ a2j , term-by-term and then adding up. the column \ anj J Examples First we give some simple examples where V = W = R2 and we us© the basis {ex = (1,0), e2 = (0,1)} for both V and W.
A. Matrices 49 (j) M((j>) Remarks (1) 0(ei) = 0,0(e2) = 0 ( n ) The zero linear map (2) 4>(ei) = e2,0(e2) = 0 ( ) A nilpotent map (3) 0(ci) = O,0(c2) = ci ( 1 0 ' NilP°tent (4) 0(ei) = 0,0(e2) = e2 ( ) Idempotent (5) 0(ei) = ei + e2, <£(e2) = ei - e2 0 1 1 1 1-1 (6) 0(ei) = e2,0(e2) = -ei ( ) rotation by 7r/2 (7) 0(ei) = e2,0(e2) = ei (in) reflection in line (8) 0(ei) = — ei,0(e2) = — e2 ( ] reflection in origin (9) 0(ei) = ei H- e2,0(e2) = ei H- e2 1 1 1 1 Exercise. Which of these nine are isomorphisms? An example in V = W = R3: let {vi = (1,0,0), t* = (1,1,1), v3 = (0,1,0)} be the basis for V and {w! = (1,0,0), w2 = (0,1,0), w3 = (0,0,1)} be the basis for W. Define <j> by (j){vi) = wi + w2, (j){v2) = w2, (j){v3) = u>i - iu3. Exercise. Write out M(</>) and try to decide if <j> is an isomorphism. We have seen that given bases for V and W, a linear map <\> determines a matrix M{(j>) (which will, in turn, determine <j>). Suppose now we are given bases {vi,... ,vn} for V and {u>i, ... ,wm} for W, and we are also given an n x m matrix Then (2^) determines a linear map i/j : V —► W by V>(v<) = *u^l + ' ' § + ZimWm.
50 II. Matrices and Determinants This leads to an interesting point. Suppose we have vector spaces V, W, U of dimensions n, m, p respectively and have linear maps Then the composition if; o <j> is a linear map from V toU. If we have chosen bases {vi,...,vn} for V {wi,...,wm} for W {tti,... ,ixp} for (7, then we have matrices M(</>), M(/0), and M(/0 o 0) relative to these bases. Since 4> and ^ determine i/jocf), it should be that M(</>) and M(^) determine M{il)o(j)). We now explore how this works. Let M(<f>) = (dij) M{i>) = (bij) M(^o(j)) = (cij). Then we have <f>(vi) = anW! H h aimi<;m il)(wt) = bnui H h b^pUp (lpO(f))(Vi) = CnUi-\ \-CipUp. Using the first two expressions we can calculate the coefficient Cij of Uj in the third expression. We have il>(<f>(vi)) = a^Oi) H h airn^(wm) and we see that (4) c^- = anhj + a^j H h ctimbmj • That is, we get the ij entry in Mfyotfii) by multiplying the i**1 row of M(</>) term-by-term with the jth column of M(ip) and summing the resulting m terms. So we use this as our definition of how to multiply annxm matrix with anmxp matrix to obtain annxp matrix, i.e., (a^) • (b^) = (cij) where Cij is given by (4). Example r 23\ in) - (-im) \0-4 7/ \-3 3/ \-33 21 / 2x3 3x2 2x2 n = 2, m = 3, p = 2
A. Matrices 51 We have also proven the following. Theorem With <\> and i/j as above, we have M{^ ocj)) = M{(j)) • M(V>) . Notice that formula (3) for calculating cj){x) is just a special case of matrix multiplication. We had bases {vi,...,vn} for V and {wi,... , wm} for W. Using this basis for V we can think of x = XiVx H h xnvn as a 1 x n matrix (xi,X25...,xn) called the coordinate vector of x with respect to the basis {vi,... , wn}- Then (3) tells us we get <j>(x) (relative to {wu...,wm}) as the 1 xm matrix (2/1,2/2, ••• ,2/m) • If VF — fc, then 0 : V —> fc is simply a functional on V and we have (j){x) = (Xi,...,Xn) as in example (2) of section IF. / 0,1 An Important Observation: By now, it has probably become apparent that there is a slight discrepancy in our notation; i.e., we write functions on the left (e.g., x 1—► </>(#)), but when we represent a linear map by a matrix, we write it on the right (e.g., (a?i,..., xn) 1—► (a?i,..., xn)M((/>)). This difference is due to the fact that we will want to consider linear maps over the quaternions in Chapter V. We will see that the quaternions are noncommutative, so our choice to define scalar multiplication on the left (i.e., (scalar)(vector)) forces us to write a linear map as matrix multiplication on the right. The only real difficulty caused by this choice arises when we compose maps: if we want to find M(ip o 0), we must compute M(</>) • M(^) instead ofM(V;)-M(0). It is left as an exercise to show that this difficulty disappears if we write vectors as column matrices and multiply by the matrix of a linear map on the loft.
52 II. Matrices and Determinants Square Matrices Let V be an n-dimensional vector space. We will be considering linear maps V —> V, i.e., elements of End(F); these correspond to square (n x n) matrices. Proposition 1 Let {vi,..., vn} and {wi,..., wn} be any two bases for V. Then the linear map (j): V —> V defined by (j){vi) = Wi is an isomorphism. Proof: By the rank theorem, if <j> is monic, it will also be epic and hence an isomorphism. So suppose <j>(y) = 0. Write v = a\V\ + ... + vnvn and then 0 = (j)(v) = ai(j){vi) H h an(j){vn) — a\wx H V anwn. Since {wi,..., wn} is a basis, each ai = 0 and thus v = 0. I Notation: If we use the same basis a = {vi,..., vn} for V as domain and V as range, we denote the matrix for p G End(F) by Ma(p). If / : V —> V is the identity map, then for any basis a we have Ma(I) = (6ij) where 6ij = 1 if i = j , and 6ij =0 if i^j . This is called the identity matrix I. Clearly Iop = poI = pior any p e End(V), and for any n xn matrix A, AI = IA = A. Definition The n x n matrix A is a nonsingular if there exists annxn matrix B such that AB = £?A = /; B is called the inverse of A and is usually written as A-1. Proposition 2 If p E End(F) zs an isomorphism, then any Ma(p) has an inverse, namely, Ma(p~1). Proof: I = Ma(p~1op)=Ma(p)Ma(p~1). I Proposition 3 A matrix A is nonsingular <^ any corresponding (j) is an isomorphism. Proof: <= comes from Proposition 2. => If A = Ma(<j>) has an inverse A~x = B = (bij), where a = {i>i,..., vn} is some basis for V, we define i/; : V —> V by ^(^t) = &ilVl + • • • + fein^n and note that %j) o 0 and 0 o -0 are represented by L I
A. Matrices 53 Definition Two n x n matrices A, B are similar (or conjugate) if there exists a nonsingular n x n matrix C such that B = CAC~1. It is an exercise to show that similarity is an equivalence relation. (It seems that "conjugate" is a much better word than "similar" for this relationship.) An obvious question to ask at this point is the following: how does the matrix representing a given endomorphism of V change when we choose a different basis for VI Fortunately, this is easy to answer. Proposition 4 Let (j) e End(Vr). Let A = Ma(0) and B = Mp(<j>) for two bases a = {vi,..., vn}, (3 = {wi,..., wn}. Then A and B are similar matrices. Proof: If A = (a^) and B = (^), then by (1) we have n <l>(wi) = Ylbi3WJ n fc=i Define the matrix C = (cZJ) by n Wj = 2_.CjkVk (change-of-basis matrix), fc=i and notice that C is nonsingular by Proposition 3 (show this). We calculate <j)(w%) two ways: n n n </>K) = Ylb%3W3 =^2b%3^2C3kVk = Yl [ Ylb%3C3k I Vk 3 = 1 3 = 1 k=l k=l \j=l n I n sr v ^(^t) = <\> (Yl Cz3vj) = Yl^iYl aikVk U = l / 3 = 1 \k=l / n i n YL Y2cvaJk)Vk k=l \j=l This shows BC = CA or that B = CAC-1 . |
54 II. Matrices and Determinants Suppose / is a function whose domain is the set ofnxn matrices. If / is constant on equivalence classes of similar matrices (i.e., f(A) = f(CAC~1) for every nonsingular matrix C), then we get an induced function on End(V) by Proposition 4. That is, given 0 G End(V), we define /(</>) to be f(Ma(4>)) where a is any basis for V. An important example of such a function is the determinant function, which we shall study shortly. Another such is the trace function. Example We define the trace of the n x n matrix A = (a^-) to be the sum of its diagonal elements, i.e., trace A = an + a22 H h ctnn • It is left as an exercise to show that similar matrices have the same trace. Exercises (1) Verify that similarity is an equivalence relation. (Note that if X and Y are nonsingular, then so is their product XY.) (2) Given the linear map 4> : V —> W and bases a = {vi,..., vn} for V and (3 = {u>i,..., wm} for W, we have (j){vi) = anW! H h airnwm as before; let *M(0) / an «2i ••* am \ «12 «22 * * * «n2 \ Q>lm Q>2m ' ' ' Q"nm / and note that the ith column of lM{(j)) is the ith row of M(</>) (see equation (2)). Given the bases a and ft, we will see that *M(</>) is the matrix associated to </> when we write vectors as column matrices. (i) Given x — x\V\ H \-xnvn G V, we can write this as the n x 1 matrix : . Similarly, we can \ %n / (Vl write <f)(x) = y = yiWi + • • • + ymwm m \ \ Vm
A. Matrices 55 Now show that / xi lM{(t>) : \ xn (ii) Given the linear maps V» show that we have *M(^ o 0) = *M(^) • *M(0) once we have chosen bases for V, W, and J7. (3) Define <\> : R^ —> R^ by </>(xi,X2) = {2%i — 3^2, —x\ — 8x2), and determine Ma (</>), Mp{cj)) where a is the standard basis and j3 = {Vl = (1,1),V2 = (-1,2)}. Find C such that M^(0) = The Rank of a Matrix. Given the nxm matrix A = (a^) with entries form the field fc, we denote the rows of A by Ai,..., An and the columns of A by A1,..., Am, i.e., Ai is the vector (oji, ..., a»m) G fcm and A-7 is the vector \aij,..., anj-) G fcn. Definition The row space of A is the subspace of km generated by {Ai,.. 0 An}, and the column space of A is the subspace of kn generated by {A1,..., Am}. The row (column) rank of A is the dimension of the row (column) space of A. An interesting fact is that the row and column ranks of A are equal. Thus we simply refer to the rank of A. It is also true that similar matrices have the same rank, so we define the rank of 0 G End(F) to be the rank of any matrix which represents <j>. More Exercises (4) Show that the row rank and column rank of any matrix A are equal. (Hint: Choose a maximal linearly independent subset {Si,..., Br} of {Ai,..., An} and write (*){ M = cn-Bi H h clrBr ^ An — cnii?i + • • • + cnrt>r for sealant cy € fc. Now note that the jth components on the Mt hand nido of (*) combine to give A7.)
56 II. Matrices and Determinants (5) Show that if the square matrices A, B are similar, then they have the same rank. (Hint: Think of the nxn matrix A as an element of End(fcn) and observe that the rank of A is just the dimension of im (A).) What is the rank of the following matrices? (i) A=\ (ii) A=\ (-: 'I 0 N 3, 0 1 -1 i i) (7) Given the matrix A, if we delete some (or possibly none) of its rows and/or columns, then we obtain a submatrix of A. For example, <-(!!) is the square submatrix that we get from A in exercise (6i) when we delete its second row. Show that r is the rank of the matrix A o A has an r x r square submatrix which is nonsingular, but any larger square submatrix of A is singular. (Note that this will be easy to prove once we have developed the theory of elementary row operations.) (8) Given nxn matrices A and £?, show that trace(AB) = trace(£?A). Now show that similar matrices have the same trace. B. Algebras Let V be a vector space over a field k. If V also has a multiplication on it which behaves well with respect to the vector space operations, then V is called an algebra (see Section B of Chapter I). Precisely, the multiplication must distribute over vector addition (5) v(w + u) = vw + vu and (w + u)v = wv + uv and for scalar multiplication we need (6) v(rw) = r(vw) = (rv)w for r G k. Note that the zero vector 0 of V is a "multiplicative annihilator" because Ov = (0 + 0)v = Ov + CH;
B. Alegbras 57 and, since V is an additive group, this implies Ov = 0. The algebra V is associative if we always have v(wu) = (vw)u. It is commutative if we always have vw = wv. The algebra has a unit element if there is a 1 G V such that lv = vl = v for each v eV. (Note that such an element must be unique. See Proposition 1 of Chapter 0.) Example ° Let Mn(k) denote the set of all n x n matrices over the field k. We first make Mn(k) into a vector space by defining: for A = (dij) and B = (bij) in Mn(fc), let A + B = (atj 4- 6»j) , and for A = (a^-) and r G fc, we set rA = (ra,ij) . 2 A little thought will make it clear that this is just the vector space kn of all ordered n2~tuples of elements of k. We are just writing the n2-tuples as square arrays. In the previous section we saw how to multiply n x n matrices and we readily verify A(B + C) = AB + AC (B + C)A = BA + CA A(rB) = r(AB) = (rA)B, so that Mn(k) becomes an algebra. Also it is associative since matrix multiplication corresponds to composition of linear maps which is manifestly associative. But Mn{k) is not commutative (for n > 1). For example, whereas (??)(:!!)-(! J)- This algebra has a unit element, namely, the identity matrix /l ... 0 \o ... i (*«) % = {I if if i = J i + 3
58 II. Matrices and Determinants This clearly satisfies IA = AI = A for any A G Mn(k). The algebras Mn(k) are matrix algebras over k. We have seen that once we choose a basis a = {t>i,... ,i>n} for the n-dimensional vector space V over the field fc, we get an isomorphism V^kn which takes x = x\Vi-\ \-xnvn GFto its coordinate vector (xi,..., xn) G kn. This choice of basis also gives a one-to-one correspondence between endomorphisms <j> G End(F) and n x n matrices Ma((/>) G Mn(fc); this correspondence is an (algebra) isomorphism by exercise (2) of section A. Unfortunately, this isomosphism is not "natural" since it depends on a. (By Proposition 3, we know how a different choice of basis affects this isomorphism.) Henceforth, we will sometimes use <j> and Ma((/>) interchangeably (especially when V = kn and a is the standard basis). We have already observed the potential difficulty in doing this (see the "important observation" in section A). Example We have already made the plane Rr into a vector space. Now we make it into an algebra by defining (7) (a,6)(c,d) = (ac,W). It is easy to verify that (7) does make R2 into an algebra. Also this algebra is associative and commutative and it has unit element (1,1). But (7) does not make R^ into a field, because we have seen that fields have no divisors of zero whereas (7) implies (1,0)(0,1) = (0,0). We saw in Chapter 0 that a "better" multiplication exists on R2; namely (8) (a, b)(c, d) = (ac — bd,ad + bc). This is associative and commutative, has unit element (1,0), and is a field. Example: The exterior algebra on R** We will introduce here the exterior algebra A on R . (In the next chapter we will define it for all Rn.) As a vector space, A is going to be the direct sum of four vector spaces (9) A = A°0A1eA20A3. A0 is just the vector space R with basis {1}. A1 is the vector space R** with the standard basis {ei,e2,e3}. Now A2 is going to be another three-dimensional real vector space and we are going to write the basis elements as {ei Ac2,ci Ae3,e2 Ac3}.
B. Alegbras 59 Finally, A3 is going to be a one-dimensional real vector space with basis element {ei A e2 A e3}. Our clumsy notation for basis elements (at least for A2 and A3) is in preparation for making the vector space A = A0 0 A1 e A2 © A3 into an algebra. Now A is an eight-dimensional real vector space with basis (10) {l,ei,e2,e3,ei Ae2,ei Ae3,e2 Ae3,ei Ae2 A e3} consisting of "four kinds" of elements. We are going to define a multiplication on A which will distribute over vector addition and thus it will suffice to define multiplication of basis elements. Our algebra is also going to be associative. (i) First we decree that the basis element 1 for A°(= R) is to be the unit element for the multiplication. (ii) Next we decide that the product of two elements in A1 is to be in A2. Precisely, we take: the product of e\ and e2 is to be e\ A e2, the product of e\ and e3 is to be e\ A e3, and the product of e2 and e3 is to be e2 A e3. (iii) We also need to know the product of e2 and ei, e\ and ei, etc. We set (11) ej Aei = -(e{ Ae,) for all i,j — even for i = j. In particular, e$ A ei = — (ei A ei) and this must be the zero vector (in A2) since 2 ^ 0 in R. So the multiplication of basis elements in A1 is given by ei e2 ^3 ei 0 -(ei Ae2) -(ei Ae3) e2 ei Ae2 0 -(e2 Ae3) ^3 ei Ae3 e2 Ae3 0 and using distributivity we can calculate the product of any two elements of A1. The general rule (11) also gives us guidance in determining other products. For example, suppose we want to multiply e3 G A1 with e\ Ae2 G A2. We write the product as e3 A (e\ Ae2), and we are insisting on associativity, so we can write this as e$ A eA A e2. By (11), this equals -ei A 63 A eg = ei A 62 A 63,
60 II. Matrices and Determinants As another example consider (ei A 62) A (e2 A 63). This can be written ei A (e2 A e2) A e% = 0, since e2 A e2 = 0. Finally, let's do a fairly general example: (a0 4- axei + a3e3 + ai3ei Ae3 + a123ei A e2 A e3) A(62e2 4- bi2ei A e2 4- 613ei A e3) = a>ob2e2 4- ^0^1261 A e2 4- ao^i3ei A e3 4- aib2ei A e2 4 0 4 0 4- a362e3 A e2 4 a3&i2e3 A ex A e2 4- 0 4- a13&2ei A e3 A e2 4 0 4- 0 4-04-04-0 = a>ob2e2 4- (a0&i2 4- ai^)ei A e2 4- ao&i3<3i A e3 - a3b2e2 A e3 4 {a3b12 - ai362)ei A e2 A e3. You are asked to do a few more in the exercises. Note that the product of an element in Ap and an element in Aq will be an element in hP+v. (Of course Ap+q = {0} for p 4 q > 3.) Note that in our definition of algebra we did not require the multiplication to be associative or commutative. Given elements x,y, z of an algebra A, we define their associator by (12) [x,y,z] = {xy)z-x(yz). Clearly A is associative <=> all associators are zero. For two elements x, y of A we define their commutator by (13) [x,y]=xy-yx. A is commutative <=> all commutators are zero. Recall that the algebra End(V) is associative but not commutative. From End(F) we construct a new kind of algebra, a Lie Algebra, by defining the product of x,y G End(F) to be their commutator [x,y] = xy-yx. Clearly (14) [y,x] = -[x,y]. So, in particular, [x, x] = 0 for all x G End(F). This algebra fails to be associative, but satisfies the Jacobi identity (15) [x, [?/, z] ] + [?/, [z, x) ] + [z, [x, y] ] = 0, the proof of which is a routine exercise. Theorem 1 Let A be an associative algebra. Then A with the new multiplication [x, y] = xy - yx is a Lie Algebra (i.e., [x,y] = — [l/i$c] arad Wie Jacobi identity is aatiiifird).
C. Determinants, the Laplace Extension 61 Example There is an important three-dimensional Lie Algebra. Let x = (1,0,0), y = (0,1,0), h = (0,0,1) in R3. Define [h, x] = 2x, [/i, 2/] = -2?/, [x, 2/] = ft, etc. (This is the Lie Algebra associated with the Lie group whose elements are quaternians of unit length.) Exercises (1) Compute the following products: (i) (e2 A ei) A (e2 A 63) (Note that this can be done two different ways: either using (11) or the fact that A4 = WO (ii) (ei 4- e2 A e3) A (ei + e2 A e3) (iii) (3e2 - 2ex A e2 + 4e2 A e3) A (2 - 5ex) (2) Prom (11), we concluded that e$ A e$ = 0. Given any uieA, can we conclude that w A w = 0? (Hint: See (ii) above.) (3) Prove Theorem 1, i.e., show that the new multiplication satisfies (5), (6), and (15). C. Determinants, the Laplace Expansion We will now define the exterior algebra A "over" Rn (for n = 3, see Section B). We fix n. As a vector space, A will be a direct sum (16) A = A0 © A1 © ... © An. Just as before, A0 = R with basis {1}. Similarly, A1 = Rn with basis {ei, e2,..., en}. Then A2 will be a real vector space with basis {el A e311 < i < j < n}. A3 will have basis {e^ A ej A e^|l < i < j < k < n}, etc. Finally, An will have basis {ei A e2 A ... A en}. n\ We note that the dimension of Ap (p = 0,1,..., n) is —-—-—r-. p\(n — p)\ Exercise. Show that dim(A) = 2n. For the algebra structure we insist (just as we did for n — 3) that: (i) multiplication is to be associative, (ii) multiplication is to distribute over addition, and
62 II. Matrices and Determinants (iii) for all i, j, e$ A e3- = —(e^ A e^). Just as for n = 3, we see that these conditions suffice to define the multiplication. If x G Ap and y € A9, then x Ay E Ap+q (of course if p + g > n, then A*>+<* = {0}). If in e^ A ei2 A ... A eim any two are equal, then this product is zero. For by (iii), we can move the two equal ones together by sign changes and then (again by (iii)) the product is zero. Proposition 5 Ifx,y G A1 are linearly dependent, then x A y = 0. Proof: For any x,y € A1 we can write x = x\e\ + ••• + xnen, y — yi^i H h 2/nen and, using (i), (ii), and (iii), we have xAy = ^2xiVjei A eo = 5Z(^2/j ~ ^jZ/tJet A e,. hj i<3 Now suppose that for some r E R, y — rx. Then each yi — rxi and x Ay = 2_l(xirx3 ~~ xjrxi)ei A ej =0. I Corollary For xgA1,xAx = 0. Proposition 6 J/t>i, t>2,..., t>m G A1 are linearly dependent, then fi Af2 A ... A vm = 0. Proof: We have 0 = ait>i H + ami>m with not all a; zero. We may as well say a\ ^ 0. Then vi = b2f2 H h ^m^m (with 62 = , etc.). ai So we have vi A V2 A - • • A vm = &2i>2 A V2 A v% A • • • A vm + 631>3 A 1>2 A V3 A • • • A Vm + &m^m A V2 A V3 A • • • A Vm . Each term has a repeated Vj and is thus zero. I It is important to notice that the contrapositive of Proposition 6 is a criterion for linear independence: if Vi,... ,vm € Rn(= A1) satisfy v\ A • • • A vm -fi 0, then vi,..., vm are linearly independent.
C. Determinants, the Laplace Extension 63 We are now ready to discuss determinants of matrices. In Rn, ei,..., en are ordered n-tuples and can be considered to be 1 x n matrices (or "row vectors"). So given annxn real matrix A, the matrix product eiA is again a 1 x n matrix which we see is just the ith row of the matrix A. Thus each a A e Rn = A1. It follows that e\A A • • • A enA lies in the one-dimensional vector space An and thus is some real multiple of the standard basis vector e\ A e2 A • • • A en for An, i.e., (17) e-iA A • • • A enA = (det A)e\ A • • • A en. Definition The real number det A is called the determinant of the real n x n matrix A. Example We calculate det A for A = e\A = ae\ + 6e2, e2A = cei + de2 exANe^A = (aei -\-be2) A (cei +^e2) = ad ei A e2 + 6c e2 A ei = (ad — 6c)ei A e2. So det I , I = ad — be. \c d J Notice that (17) not only defines det A, but (using (i), (ii), and (iii)) also furnishes a means for calculating det A. Proposition 7 (i) If I is the n x n identity matrix, then det/= 1 . (ii) If the rows of A are linearly dependent, then det A = 0 . ::) (iii) For r € R, det (rA) = rn(det ^4).
64 II. Matrices and Determinants Proof: (i) The rows of I are e\,..., en so ei A • • • A en = (det X)e\ A • • • A en. (ii) By Proposition 6, e\ A A • • • A enA = 0. (iii) The rows of rA are re\A,..., renA and reiA A • • • A renA = rn(eiA A • • • A enA) = rn(det A)e\ A • • • A e„. I Let C be an n x n real matrix. Then we know how C operates on A1 and we now define an operation of C on a product v\ A • • • A vp (^ E A1) by (18) (vi A ■ • ■ A vp)C = v-iC A • • • A vpC . Now we get a very important property of det : Mn(R) -> R. Proposition 8 Let A,B be n x n real matrices. Then (19) det(AB) = (det A) (det B). Proof: By definition e-i(AB) A • • • A en(AB) = det(AB)ei A • • • A en. Also ei(AB) A • • • A en(A£) = {erA A • • • A enA)£ (by (18)) = (det A)(ci A • • • A en)£ (by (17)) = (det A)(eiJ5 A • • • A enB) = (det A)(det B)ei A■••A en . I A very useful property of det is: Proposition 9 (20) A is nonsingular <=> det A -=f=- 0. Proof: =$> By hypothesis A"1 exists. So 1 = det I = det(^L4-1) = (det A)(det A"1), proving det A ^ 0. <= By (ii) of Proposition 7, the rows of A (which are the images of ei,..., en under the linear map A) are linearly independent. Thus A is an isomorphism of Rn and A"1 exists. | Recall that two nxn matrices A, B are similar if there is a nonsingular nx n matrix C such that B = CAC~l.
C. Determinants, the Laplace Extension 65 Proposition 10 det : Mn(R) —> R is constant on equivalence classes of similar matrices. Proof: If B = CAC~\ then det (B) = det {CAC-1) = det (C) det {A) det (C"1) and (det C)(detC-1) = det (CC'1) = det J = 1. Thus det £ = det A . I Observation: Notice that our development of the determinant function has only used the fact that R is a field — we have not used any properties unique to the real numbers. Thus, for any field fc, we get a determinant function det : Mn(k) —> k which is constant on equivalence classes of similar matrices, etc. Exercises (1) A matrix D = (d^) G Mn(k) is diagonal if dtj = 0 whenever i^j. Show that if D is diagonal, then det D = d\\d^2 • • • dnn. (2) A matrix A = (a^) is (weakly) upper triangular if a^- = 0 whenever i> j. Show that this implies det A = ai 1*222 •. • ann- A is lower triangular if a^ = 0 whenever z < j. Show that this also implies det A = an ... ann. (A is called triangular if it is either upper or lower triangular.) (3) The transpose lA of matrix A = (a^) is lA = (a^). (That is, the rows of lA are the columns of A.) Show that t(tA) = A and that det* A = det A. (4) Define the (real) general linear group to be all nonsingular matrices A e Mn(R); i.e., GL(n,R) = {Ae Afn(R) | det A 7^ 0} . Show that GL(n, R) is, indeed, a group. Note that GL(n, R) is simply GL(Rn). (5) Let ^11 &12 ai3 i4 = I tt21 &22 &23 &31 &32 a33
66 II. Matrices and Determinants Show that det A = an det (°22 °23 )- fll2 det ( °21 °23 ) V a32 033 J \ a3i a33 ) + a13detfa21 a22V (This is a special case of the "Laplace expansion.") The Laplace Expansion Recall that our definition of det A for an n x n matrix A is (21) e-iA A • • • A enA = (det A)er A • • • A en . We can rearrange things in this product to get some formulas for det A in terms of some determinants of (n — 1) x (n — 1) matrices. This can be useful sometimes in calculating det A, and can be very useful in getting inductive proofs of certain theorems about determinants. First off, let us rewrite (21) as (anei H h alnen) A • • • A (anle-i H h annen) = (det A)e-i A • • • A en Next we do the case n = 3 to make the procedure clear without too much notation. (anei + ai2e2 + ^13^3) A (a21ei + ^22^2 + ^23^3) A (031^1 + ^32^2 + a33e3) = anei A (a2iei + a22e2 + ^23^3) A (031^1 + ^32^2 + a33e3) + a\2e2 A (a2iei + a22e2 + 023^3) A (a31ei + ^32^2 + a33e3) + ai3e3 A (a2\e\ + a22e2 + a2^e^) A (a31ei + ^32^2 + a33e3) = anei A (a22e2 + ^23^3) A {a<$2e2 + a33e3) + ai2e2 A (021^1 + ^23^3) A (031^1 + ^33^3) + ai3e3 A (a2iei + ^22^2) A (a3iei + a32e2) = ^11^22^33 ei A e2 A e3 + 011023^32 ei A e3 A e2 + ^12^21^33 e2 A ei A e3 + ^12^23^31 e2 A e3 A ei + ai3a2ia32 e3 A ei A e2 + ^13^22^31 e3 A e2 A ex = {o^ii{a22a33 - a23a32) - ct12(a2iass - a2Sa3i) + ^13(021^32 ~~ a22fl3i)} ei A e2 A e3 > (*)
C. Determinants, the Laplace Extension 67 Let Bij be the 2x2 matrix obtained from A by deleting the ith row and jth column. Then we have proved that det A = an det Bn — a\2 det B\2 + ai3 det £?i3 . For annxn matrix A, the definition of B^ (an (n — 1) x (n — 1) matrix) is similar. Proposition 11 For an n x n matrix A, we have (22) det A = an det Bn - ai2 det B12 + • • • + (-l)n+1am det £in. Proof: Same as for n = 3. I Equation (22) is the Laplace expansion about the first row of A. More generally, we can calculate det A by computing the Laplace expansion about the ith row of A, i.e., det A = {-lY^an det Ba + • • • + {-l)i+nain det £in . Proposition 12 For an n x n matrix A we have (23) det A = an det Bn - a2i det B21 + • • • + (-l)n+1ani det J3nl. (This is the Laplace expansion about the first column of A. We can also calculate det A using the Laplace expansion about the jth column of A. Exercise. Write out the appropriate formula.) Proof:. Again the general case should be clear from the computation for n = 3. We rewrite (*) as (anei A (a22e2 + ^23^3) A (a32e2 + a33e3) + (^12^2 + ai3e3) A (a2iei) A (a32e2 + a33e3) + (^12^2 + ai3e3) A {a22e2 + a23e3) A (a3iei) and this equals ^11^22^33 ei A e2 A e3 + aiia23a32 ei A e3 A e2 + ^2iai2a33 e2 A ei A e3 + a2iai3a32 e3 A ei A e2 + «3i«i2^23 e2 A e3 A ei + a3iai3a22 e3 A e2 A e\ = {an(a22a33 - a23as2) - a21(a12a33 - a13a32) + a3i(ai2a23 - a13a22)} e\ A e2 A e3 . Thus deti4 = an dot JBu — a2i detB2i + a3i detB3i. I
II. Matrices and Determinants Exercises (1) Recall that we define the transpose lA = faij) of an n x n matrix A = (a^) by *a^- = a^. (Definition: A is symmetric if A =tA.) Prove the following. (i) *(S4) = A, *(A + B) =*A +*B. (ii) A +lA is symmetric. (iii) *(AB) =*BS4. (iv) A$4 is symmetric. (v) det 54 = det A. (Hint: Use the Laplace expansion about the first row of A and the first column of 54. Then use induction.) (vi) If the nonsingular matrix A is symmetric, then A-1 is also symmetric. { A C \ (2) Show that the determinant of ( R J is (det A) (det B), where A and B and square submatrices. Generalize this to calculating the determinant of Mi \ V 0 Am J where Ai,..., Am are square submatrices. Elementary Row Operations. (3) Let A = (a^) be an element of Mn(k). The following three operations are called elementary row operations. E2 E'3 Interchange the ith row and jth row Multiply a row by a nonzero scalar r £ k Replace ith row by (ith row) + (jth row) Two matrices M, AT are row equivalent if AT can be obtained from M by a finite sequence of elementary row operations. Show that this is an equivalence relation. (4) Let £3 be the operation: E2 and then E'z. Show that Eu E%,E^ still give the same equivalence relation.
C. Determinants, the Laplace Extension 69 (5) Show that row equivalent matrices have the same row space. (6) An n x n matrix A = (a^) is in echelon form if the number of zeros preceding the first nonzero entry in a row increases row by row, stopping this increase only when arriving at an all zero row (then succeeding rows must be all zeros). The first nonzero elements in rows of an echelon matrix are called distinguished elements. Prove that any matrix can be reduced to echelon form by an appropriate sequence of elementary row operations ^1,^2,^3- (7) Show that r is the rank of the n x n matrix A ^ A has a nonsingular r x r submatrix, but any larger square submatrix of A is singular. (8) An echelon matrix A is row reduced if its distinguished elements satisfy: (i) each is the only nonzero element in its column, and (ii) each is equal to 1. Prove that any echelon matrix can be put in row reduced form by an appropriate sequence of Ei,E2,E%. (9) Let A be an n x n matrix. Show that det A is changed by Ei,E2, and E% in the following manner: (i) det(E1(A)) = -detA, (ii) det(E2(A)) = r(det A), and (hi) det (ES(A)) = detA (10) An nxn matrix e is an elementary matrix if e can be obtained from the nxn identity matrix I by a single Ei £ {Ei,E2,E$}. Show that we can perform an elementary row operation on a matrix by multiplying by the appropriate elementary matrix. Now show that if A is an invertible (= nonsingular) nxn matrix and A is row reduced to the echelon matrix B by elementary row operations (i.e., there are elementary nxn matrices ei, e2,..., eq such that B = eq ... e2£\A), then we must have B = I. Conclude that invertible nxn matrices are precisely those which are products of elementary matrices. Given A € Mr,(fe), note that Ei,E2,E% make evaluating det A easier, since we can simplify A to the matrix B with £q, I?2, E3 (being careful to keep track of how those changes relate det A to det B): we can either row reduce A in which cblhg B is a diagonal matrix (and det B is simply the
70 II. Matrices and Determinants product of its diagonal elements); or we can make all (but possibly one) elements in a column of B equal to 0, and then calculate det B by using the Laplace expansion about this column. Example / 2 0 If A = \ 12 V-l 4 third row to obtain , then subtract twice the second row from the B We have applied the operation E% to obtain £?, so det A = det B. We can calculate det B by expanding about the second column: det B = 2 det 2 1 -3 -8 = 2(-16-(-3)) = -26. More Exercises (11) Compute det A for the following matrices. (i) A- (ii) A: (iii) A = (12) If we replace "row" by "column" in exercise (3), we get the elementary column operations on A £ Mn(k). Give appropriate versions of exercises (3)-(10) where the notions of row equivalence, etc., have been replaced by the notions of column equivalence, etc. Now prove these revised exercises. (Hint for (9): recall that det*A = det A.) (13) Some of the exercises (3)-(10) generalize to the case ofnxm matrices. Decide which of these exercise can be generalized, give appropriate versions, and than prove them,
D. Inverses, Systems of Equations 71 D. Inverses, Systems of Equations For A = {cbij) € Mn(fc), we know that A has an inverse A-1 <3> det A ^ 0. But this does not tell us how to calculate A-1 when we know it exists. In this section we determine a way to calculate A"1. For the i,j entry a^ of A, we define its cofactor to be ^•-(-l^'det^ where, as before, Bij is the submatrix of A that we get when we delete the ith row and jth column of A. Then we define an n x n matrix C — (cij) by C ='(6«) . that is, c%j = bji. This matrix C is called the classical adjoint of A, and we will denote it by adj A. Example n = 2 ^ = / aH a12 \ V a21 «22 / ■Bn = a22, #12=021, B21 = ai2, -822=011. Then adj A = ( 22 12 1. Consider the product ' A(adjA)=( aiia22~ai2a21 ° V V ° -021012 + 011022 So A(adj A) = (det A)/. Similarly, (adj A)A = (det A)I. So, at least in the case n — 2, when det i^Owe get A-1 easily from adj A. This is true in general. Theorem 2 A (adj A) = (adj A)A = (det A)/. Corollary //det A 7^ 0, tfien A"1 - (h^Wj ^- Proof: Now Theorem 2 makes an assertion about the matrix product AC = A (adj A), and formula (22) says that the 1,1 entry in this product is det A. To prove the theorem, we must show that the remaining diagonal entries in AC are det A and all nondiagonal entries are zero. First we calculate the Laplace expansion about the ith row of A to get det A = (-l),+1Oii det Ba + • • • 4- {-l)i+nain det B,{ (24) = aubu +--- + ainbin in
72 II. Matrices and Determinants which is just the i, i entry in AC. Now suppose i 7^ fc, and we write an expression similar to the right hand side of (24): (25) {-l)k+1an det Bkl + • • • + {-l)k+nain det Bkn. We recognize (25) as the i, k entry of AC. But it is also det D where D is the matrix obtained by replacing the kth (^ ith) row of A by the ith row (and leaving the ith row intact). Of course, since D has two identical rows, det D = 0 by (ii) of Proposition 7. The proof that CA = I is almost exactly the same (starting off with (23) instead of (22)). So we have proven Theorem 2, and the corollary tells us A~x whenever det A ^ 0. I Exercise. Calculate (25) for n — 3, i = 1, k = 2. Show it gives the 1,2 term of AC and it also gives an «12 «13 det | an ai2 ai3 «31 &32 «33 Systems of Equations Let fcbea field of characteristic zero. Let A be an n x m matrix over k and let y G km. We are concerned with the questions: (26) (a) Does there exist xGfcn such that xA = yl (b) When such an x does exist, is it unique? We can write (26) as ' an (x1,x2,...,xn) &nl or even (*) an^i + fllm^l + &lra (2/l,2/2,...,2/m) + «nl^n = 2/1 ~r OjnrnXn — Z/r? The a jo are the coefficients and the unknowns. We have m equations in n unknowns. Note that if y = 0, then we always have the trivial solution x = 0.
D. Inverses, Systems of Equations 73 Now, using standard bases for kn and fcm, we have (26r) A:kn -+km linear. The answer to (a) is: yes <=> y G A(kn). (Notice that in (26), we have matrix multiplication so we write A on the right; but in (26'), we are considering A to be a linear map so we write A on the left, as in A{kn).) So if n < m, theje will be some t/'s not in the image A(kn) and xA = y will have no solution for such y's. The answer to (b) is: yes <3> ker A = {0}. That is, if ker A ^ {0}, then no solution can be unique; for if xA = y and v G ker A, then (x + v)A = xA + 0 = y. Conversely, if x, x' G kn are distinct elements such that xA = x'A = ?/, then (x — x')A = y — y = 0 so that x — x' e ker A. In particular, if n > m, we must have ker A ^ {0}. So the interesting case is n = m. In this case, if det A ^ 0, then A is an isomorphism and there is a unique solution. If det A = 0, then the answer to both questions is no. The previous discussion is interesting from a theoretical point of view, but it sheds little light on how to actually solve the system (*), if a solution exists. Recall that in section IA we briefly discussed the method of elimination when we solved a system of three equations in three unknowns. We now extend this method to the general system (*) by using elementary row operations to simplify the system. (The elementary row operations E\, E<i, E% are discussed at length in the exercises of the previous section.) We begin by writing (*) as an m x (n + 1) matrix /an ... ani \ &lra • • • Q"nm and then putting this matrix into row reduced form using E\,E2, and E$ (by exercise (13), we can always do this). Prom this new matrix, we can simply "read off" the solution, if it exists. Some examples should suffice to illustrate this technique. Examples (1) Starting with the system 3xi + 2x2 = 2 2xi + x2 = -1 , w© write this as / 3 2 | 2 \ \,2 1 |-1 j V\ \ Vm J
II. Matrices and Determinants Applying the elementary row operations to this matrix yields the row reduced matrix 1 0 |-4 0 1 | 7 Thus the (unique) solution is (xi,x2) = (—4,7). Note that we expected this situation since det ( I j]=3-4 = -1^0. (2) Starting with the system xi + x2 = 3 2xi + 3x2 = 4 x\ — 2x2 = -2 , we write this as which is row equivalent to The third row of this matrix tells us that 0 = Oxi + 0^2 = 1; since this is clearly impossible, we conclude that no solution exists. (3) Starting with the system xi - x2 + 2x3 = 1 2xi - x2 + 3x3 = -1 , we obtain the row reduced matrix / 1 0 1 | -2 ^ 0 1 -1 | -3 Thus we seek (xi,X2,#3) such that X\ + xs = -2 x2 - a?3 = -3 ,
D. Inverses, Systems of Equations 75 or equivalently, xi = -x3 - 2 X2 = ^3 — 3 . So any choice of xs E R will determine xi and X2 so that (xi,x2,x3) is a solution. It is an interesting fact that these examples exhaust all possibilities for the number of solutions to (*); i.e., (*) either has zero, one, or infinitely many solutions. Exercises (1) Find the inverses of the following matrices by using the corollary to Theorem 2. (i) A -(-ii) 1 0 -1 (ii) A= ( 2 1 0 0 -1 1 (2) If A is a nonsingular n x n matrix, show that A~x can also be computed as follows. Form the n x 2n matrix (A | /) and row reduce this to (/ | B) . Then B = A~1. (Hint: We know that A is the product of elementary matrices by a previous exercise.) (3) Find A'1 where A = (4) Solve the following systems of equations or indicate that no solution exists. ( 2 -2 1 V-i 5 -3 3 -6 -3 2 -2 4 -2\ -5 2 3/ ,,, 2x[ + 3a?2 = -1 1 ' hxi + 7x2 = 0
76 II. Matrices and Determinants Xi — 2^2 + 3^3 = 1 2xx + x2 + 5x3 = -2 xi- x2 = 4 2xi + x2 = 5 —xi — 3x2 = —6 Homogeneous Systems. Let A be an n x m matrix over k and consider the homogeneous system of linear equations xA = 0 where x G kn. We know that x = 0 is a solution; what else can we say about the solution space S = {x G kn | xA = 0} ? (5) Show that S is a subsapce of kn. (6) Let W denote the row space of A, and show how we can consider x G S to be an element of the annihilator of W. Now show that S ^ A(W). (7) Show that dimS = dimfcn - dimW = n - rank(A). (Hint: Recall exercise (7) of section IF.) (8) What is the dimension of the solution space for the following homogeneous system? 3x + 2y - z = 0 x +2 = 0 - 2x -32/ + 2z = 0 E. Eigenvalues (Part ii) Let V be a finite-dimensional vector space over a field k and let <\> G End(V). Recall that A G fc is an eigenvalue for <\> if there exists a nonzero vector v G V such that 4>(v) = At>. Another way of saying this is to define i/j\ :V —> V by i/)\(v) = </>(i>) — A^. Then ^ is an endomorphism of V and clearly (27) A is an eigenvalue of <j) <=> ker ip\ ^ 0. (ii) (iii)
E. Eigenvalues 77 Examples (df=r2, ,=(; j). For A E R and v = a e\ + 6 e2 consider ^a(v) = 0(f) - Av. Since </>(ei) = e2 and </>(e2) = 0, we get i/>\(v) = oe2- X(aex + be2) = (-Aa)ei + (a - A6)e2. If ^a(^) is to be 0, we must have — Xa — 0 and hence a = 0 or A = 0. If A ^ 0, then a = 0 and A6 = 0 so 6 = 0, and we conclude that v = 0. Thus 0 has no nonzero eigenvalues. This is not surprising because <j> is nilpotent (of order two) (i.e., cf) o cf) = (j)2 — I J the zero endomorphism). So if (f)(v) = Xv, O = 0o (f)(v) = 0(Av) = X(/)(v) = A2t> and this cannot happen for v ^ 0 unless A = 0. The same argument works for nilpotent endomorphisms of higher order.1 Exercise. Show that A = 0 is, indeed, an eigenvalue for <\> (in example (1)) and that e2 is the only corresponding eigenvector (up to scalar multiplication). (2)V = K?, ,= (g J). Now </>(ei) = 0 and </>(e2) = e2. So if A ^ 0 and v = a e\ + 6 e2 and we evaluate i/j\(v), we get ^a(^ ) — (—Aa)ei + (6 — A6)e2. If this is to be zero, we must have a = 0. If v ^ 0, then 6^0 and (1 — X)b = 0 so we must have A = 1. The eigenvector is any nonzero scalar times e2. Here <j> is idempotent (<j> o </> = </>) and does have exactly one nonzero eigenvalue. Exercise. Show that A = 0 is also an eigenvalue for <j> (in example (2)) and find a corresponding eigenvector. (3) y = R2, *=(_J J). Proceeding as in (1) and (2), we get t/)\(v) = (~b - Xa)ei + (a — A6)e2. lt,g., if $a ■ 0 ftnd ${v) = At;, than 0 = $3(w) as X^v again implying A ■ 0.
78 II. Matrices and Determinants If this is to be zero, then a = Xb and so 0 = b + Xa = b + X2b = 6(1 + A2). Thus 6 = 0 and a = 0, and <\> has no (real) eigenvalue. This is not surprising since 0 is a rotation of \ radians. Exercise. If we pass to the extension field C of R, then we have V = C2 and A G C. For <\> (as in example (3)), find two distinct eigenvalues in this extension field. What are two corresponding eigenvectors? f X a b (4) F = R3, AgR*=R-{0}, 0=1 0 A c | . We also \ 0 0 A assume a, 6, c G R*. / 0 a b \ Now ip\ = 4> — XI = I 0 0 c J, and we see that e3 e \ 0 0 0 / ker i/j\. Thus by (27), A is an eigenvalue for (/>. Note that tf = and thus, although ip\(e2) = (0,0, c) ^ (0,0,0), we have ifr^fa) = (0,0,0). So ker Va c ker ^l- We have 0 £ ker Va c ker ^a c ker V>a = r3 • Span(e3) Span(e2, e3) Span(ei, e2, e3) Exercise. Show that if we allow A = 0 (in example (4)), then <j> is nilpotent of order 3. In example (2) above I V — R , 0=1 n ) ) we have that the only nonzero eigenvalue is 1, and we find that if v = aeiH-6e2, then ip\{v) = —ae\ and thus v G ker i/ji <& a = 0. Now = 0(-aei)-(-aei) = 0-faei. So in example (2), ker ipi = ker V>i(= Span (e2)) and the dimensions of the kernels of ip™ stabilize at 1. In example (4), the dimensions Increased right up to th© dimension of F(= Br).
E. Eigenvalues 79 Definition The algebraic multiplicity of an eigenvalue A of an endomor- phism (f>: V —> V is the maximum dimension of the subspaces 0 C ker i/>\ C ker ifc C • •.. Since we are assuming V is finite-dimensional, this will be an integer < dim V. If the maximum dimension is zero, this just means that A is not an eigenvalue of <j>. Recall that back in Section E of Chapter I, for each A G k we defined a subspace V(A) of V consisting of all v G V which had A as an eigenvalue. We proved that If /i ^ A, then V(/i) D V(X) = {0}. In our present notation, V(X) = ker i/j\, and now we sometimes have bigger spaces (i.e., ker ip\) associated with A. We will now show that for A ^ /i, these spaces also intersect only in zero. To simplify our notation, we fix </>, A, and /i (where A ^ /i). Then we write Ki for ker i/>lx and Lj for ker ift where i,j = 1,2, Note that we have 0 C Kx C K2 C • • • C Ki C • • • and 0 C Lx C L2 C • • • C Lj C • •.. We now claim that Ki fl Lj = {0} for each i,j. Proof: Since i/>\ = (j) — AI and ^a IS ^ne zero endomorphism on Ki, we have that (0 — A J)2 = 0 on Ki. Similarly, (0 — /i/)-7 =0on Lj. Note that (since A ^ /i) the polynomials (x — X)1 and (x — /i)-7 have no common factor. We will see in section B of the next chapter that the previous statement has the following consequence: we can find polynomials a(x), b(x) such that (28) 1 = a(x)(x - Xy + b(x){x - fx)j . Now suppose we have v G KiC\L3. We wish to show that v must be 0. By (28), we have V = I(V) = [a(0)(0-Air + K0)(0-^)J]W = a(4>)(4>-\iy(v) + b(4>)(4>-iiiy(v) = 0 + 0 = 0. I Note that everything we have said in this section about 0 G End(F) applies equally to A G Mn(k); e.g., A G k is an eigenvalue of the n x n matrix A if there is a nonzero x G kn such that xA = Ax. We will soon see an easy way to calculate the eigenvalues of A. Exercises (1) Given any A e k, show that </> commutes with x/j\ = 4> - XL (2) Show thai Ki - ker ^^ is 0-stable.
II. Matrices and Determinants (3) Let Vi,...,vm be nonzero eigenvectors for <\> G End(V) corresponding to the distinct eigenvalues Ai,..., Am. Show that Vi,..., vm are linearly independent. (4) For V = R and A as below, find the eigenvalues of A along with the geometric and algebraic multiplicities of each. Vi 1 0 (i) A = I 0 Ax 0 I , Ax ^ A2 (ii) A 0 0 3 2 1 Ai 0 1 4 1 0 A2 1 2 3
Chapter III Rings and Polynomials A. Rings A ring R is a generalization of a field. It has operations of addition and multiplication, and R must be an abelian group under addition (just as for a field). However, the only requirement for multiplication is that it distribute over addition: (1) a(b + c) = ab + ac, and (a + b)c = ac + bc . Example If G is an abelian group (with identity element 0), we can make G into a ring by using the operation on G for addition and defining ab = 0 for all a,beG. Definition A ring R is associative if its multiplication is associative; it is commutative if its multiplication is commutative; it has a unit element if it has a 2-sided multiplicative identity (i.e., a 1 G R such that for any x G R we have xl = lx = x). If R has a unit element, then a G R is called a unit if there exists b G R such that ab = 1 = ba. If R has a unit element, it is unique. Examples (1) The ring Z of integers is associative and commutative and has a unit element. (2) The ring M2(R) of 2 x 2 real matrices is associative and has a unit element, but it is not commutative. The same is true of Mn(k), for any n > 1 and any field k. In Chapter V we will encounter some non-associative rings. But for this chapter wt will assume that ring means associative ring.
82 III. Rings and Polynomials Definition A subset 5 of a ring R is a subring of R if the operations on R induce operations on S so that S becomes a ring. Example In the ring Z of integers, the subset 2Z of all even integers is a subring. Definition A nonempty subset 5 of a ring R is an ideal if: (i) r G R and s G S => rs G S and sr G 5, and (ii) «,t G 5=> s-t G 5. Let 5 be an ideal in i?. Since 5 is nonempty, there is some s G 5 and thus s — 5 = 0 G 5. Thus every ideal contains zero. We also note that the single element {0} is itself an ideal, ((ii) is obvious and (i) is just the proof in Chapter 0 that a0 = 0.) Suppose R has a unit element 1. Then any ideal 5 in R containing 1 must equal R. (By (i), r G R => rl{= r) G 5.) For a field &, only {0} and k are ideals. Because if I is an ideal in &, either / = {0} or / contains a nonzero element x. In the latter case, x has an inverse x~l G k and (by (i)) x~xx = 1 G / implying I = k. Definition If i?, T are rings, a map (j):R->T is a (ring) homomorphism if for all r, s G R we have (i) 0(r + 5) = 0(r) + 0(5), (ii) 0(rs) = 0(r)0(s), and (iii) if i?, T have unit elements 1,1/ we also require that (f)(1) = V. Note that a ring homomorphism sends the zero of R to the zero of T. Proposition 1 If (f) \ R—±T is a ring homomorphism, then (i) </>(R) is a subring of T, and (ii) ker</> is an ideal in R. Proof: If a, b G 0(iJ), then there exist elements x, y in R such that </>(#) = a and </>(y) = b. Then 0(x ± y) = a ± b implying a ± 6 G 0(i2), and </>(a?y) = a& implying ab € $(J?).
A. Rings 83 This proves part (i). For (ii), suppose x G ker </> and r G R. Then </>(rx) = (j)(r)0 = 0, so rx G ker </>. Similarly, xr G ker </>. For x, y G ker 0, clearly x — y G ker 0. i Just as we did for vector spaces, we say that two rings i?, T are isomorphic if there is a ring homomorphism <j> : R—* T which is both surjective (i.e., (j){R) = T) and injective (i.e., ker <\> = {0}). Note that in afty ring i?, any element a G R commutes with itself, so a2 is well-defined. By associativity, aa2 = a2a so a3 is well-defined, etc., so all powers of a a are well-defined. In particular, for A G Mn(k), all powers A? are well-defined. The ring Mn{k) is also a vector space over the field &, and we have seen that the dimension of this vector space is n2. Thus, of the vectors {I, A, A2, A3,...} at most n2 of them can be linearly independent. This means that there is a nontrivial linear combination of /, A, A2,..., An which equals the zero matrix 0. That is, there exist ao,ai,... ,an2 G &, which are not all zero, such that (2) a0I + aiA + a2A2 -\ h aniAn2 = 0. We rephrase this as follows: given A G Mn(k) there exists a nonzero polynomial (with coefficients in k) of degree (at most) n2 which is satisfied by A. For n = 2 and A = I n 12 ) this means there is a nonzero vector V a2i a22 J (&0,&1,&2,&3,&4) € k4 such that b0I + 6i A + 62A2 + 63A3 + 64A4 = Actually, it is possible to find a polynomial of degree n which A G Mn(k) satisfies. Definition The characteristic polynomial of A G Mn(k) is (3) a(x) = det(xl - A) which is a monic polynomial of degree n. Strictly speaking, we have not defined the determinant of C G Mn{R) unless R is a field; however, we can formally calculate detC, using the Laplace expansion, whenever C has entries from a commutative ring. In particular, deiC i« defined for C G Mn(k[x]). Taking C = xl - A, we see that (3) mikes mmo for A € Mn(k). (Actually, (3) makes sense for 0 0 0 0
84 III. Rings and Polynomials A G Mn{R), where R is a commutative ring with unit element, but we will not need this much generality until Section C.) Now the Cay ley-Hamilton Theorem states that &(A) is the n x n zero matrix. In more detail, xI-A ( x - an -ai2 ... -aln \ —a2i x — a22 • • • — «2n \ —Q>n\ —a>n2 • • • x — ann J and the determinant a(x) of this matrix is of the form (4) a(x) = Xn + fcn-lZ71"1 + &n-2*n~2 + • • • + hx + b0 , and the assertion of the theorem becomes An + &n_i A""1 + 6n_2An"2 + • • • + h A + 60/ = 0. For n = 2 and A = ( n 12 ) the assertion is V a2i «22 / A2 - (an +CL22)A + (a11a22 - a12a2i)I = I q 0 ) ' which can be checked directly. Note that <r(0) = det(-A) = (-l)ndetA, but cr(0) = b0 is also the constant term of a(x). It is also an interesting fact that 6n_i = —trace A Exercise. Use induction to show that 6n_i = —trace A and that a(x) is, indeed, a monic polynomial of degree n. Proposition 2 x — r is a factor of a(x) O r is an eigenvalue of A. Proof: x — r is a factor of a(x) <^> a(r) = 0 <=> det(rl - A) = 0 <& rl - A is singular <^> there is a v ^ 0 in kn such that v(r/ — A) = 0 <^> vA = rv. 1 Proposition 2 provides us with a convenient method for calculating the eigenvalues of a square matrix. Examples ( 2 2 \ (1) The eigenvalues of A = I I are the roots of a(x) = det ( :E_22 s +2j ) = 0z-2)(z+l)-4 = (*-3)(z+2) which are Ai = 3 and A2 = — 2. To get an eigenvector for A corresponding to Ai = 3, note that we are looking for (rci, £2) £ R2 such that (&i,&aM = 3(01,32)
A. Rings 85 or equivalently (x1,x2)(3I-A) =0. Thus (xi,x2) must satisfy (xllx2)( _2 4 J = {xi -2x2,-2xi +4x2) = (0,0) or, mpre simply, x\ = 2x2. If we choose x2 = 1, then we get the eigenvector (X!,X2) = (2,1). Exercise. Find an eigenvector corresponding to the eigenvalue A2 = -2. (2) The eigenvalues of A = I n J are the roots of a(x)=det(X1 'l^x' + l. This polynomial does not have any real roots, but if we extend to the complex numbers (i.e., think of A as an element of M2(C)), then we see that the eigenvalues of A are Ai = i and X2 = — i. Unfortunately, the eigenvectors for A must also be complex. Since any real polynomial can be factored completely over C, the technique used in example (2) shows that every A e Mn(R) has an eigenvalue (although it may be a complex number). It is easy to show that similar matrices have equal characteristic polynomials; so if V is a finite-dimensional vector space, then we define the characteristic polynomial of <j> G End(T^) to be the characteristic polynomial of Ma (</>), where a is any basis for V. This allows us to calculate the eigenvalues of </> by calculating them for any Ma((f>). Exercises (1) Recall the definition of Zm (with the operations + and •) from the exercises of Chapter 0. Show that Zm is a ring. Show that this ring is a field <^misa prime number. (2) Let R be a ring and choose any r, 5 E R. Show that r(—s) = — (rs) = (-r)s and (—r)(—s) = rs. If R has a unit element, show that it is unique and that (—l)r = —r, (—1)(—1) = 1. (3) A commutative ring is called an integral domain if it has no diviners of zoro. Show that a finite integral domain is a field.
86 III. Rings and Polynomials (4) Define a map (j> : Z —► Zm by </>(n) = remainder after dividing n by m. Show that 0 is a surjective (ring) homomorphism. What is the kernel of </>? (5) Find the eigenvalues of the following matrices. Also find a basis for each associated eigenspace. 1 0 1 (ii) A = | 0 2 0 0 0 1 (iii) A (6) Show that similar matrices have the same characteristic polynomial. (7) Show that the n x n matrices A and lA have the same characteristic polynomial. If one of the square matrices A, B is nonsingular, show that AB and BA have the same characteristic polynomial. Is this still true if we allow them to both be singular? (8) Let V = R3 and define 0 G End(F) by <£(ei) = 2ei + 2e2-e3 0(e2) = ei + e2 + e3 <Ke3) = ei - 2e2 + ^4 • Find the eigenvalues of </> and find a basis for each associated eigenspace. B. Polynomials First we will consider polynomials with coefficients in a commutative (and associative) ring R with unit element. Then we will remove the commuta- tivity assumption (since we will want to use R = Mn(k)) and see how this modifies our results. Definition A function p:R~+ R
B. Polynomials 87 is a polynomial function if it is given by a formula of the sort (5) p(x) = anxn + an_ixn-1 H f- axx + a0 where an,..., ao are elements of R which we call the coefficients of p. By writing (5) as we have, we mean to imply that an ^ 0. The n for this highest nonzero coefficient is the degree of the polynomial p and we write this as degp. The polynomial p is monic if an = 1. The x is referred to as the indeterminate (since it may be chosen to be any element of R) and the aiX% are the terms of p. Definition An element x0 G R is a rootoip if p(x0) = anXQ-\ }-aiXo + a0 = 0. Let R[x] denote the set of all polynomials in x with coefficients in R. We will make R[x] into a commutative ring with unit element. The unit element 1 in R[x] will be the polynomial in x with all coefficients except ao being zero and ao = 1 G R. The zero 0 G R[x] is the polynomial with all coefficients zero. (Note that we do not define the degree of the zero polynomial.) To add p(x) and q(x) in R[x] we simply add up all of their terms and then put the results in the form (5). For example, if p(x) = 2x3 + x — 1 and q(x) = 7x4 - 3xs + x + 1, we get p(x) + q(x) = 7x4 - xs + 2x. This operation is associative and commutative, the polynomial 0 acts as identity element, and — p(x) is the additive inverse of p(x). Thus R[x] becomes an abelian group under addition. Since multiplication is to distribute over addition, we just need to define the product of a term a^x1 in p(x) and a term bjX^ in q(x). We set (6) (aix^ibjx*) = (oibj)^* and extend by distributivity. This multiplication is associative and is commutative (since R is). Clearly 1 G R[x] is the unit element. Properties of Degree: (a) deg(/ + g) < max(deg /, deg#) (b) deg(fg) < degf + deg# If the coefficient of the highest power is a unit, for / or for g, then (c) deg(fg) = degf + deg# Proof: (a) and (b) are straightforward. For (c), let f(x) = ao + a\x + V anxn (an ^ 0) and g(x) = b0 + b\x H f- 6mxm (bm ^ 0). Then we just need to show that the coefficient anbm of a?n+m is nonzero. But one of an, bm is a unit. Say a,n m a unit. Then anbm = 0 gives a~lanbm = 0 or bm = 0, a contradiction, I
88 III. Rings and Polynomials Proposition 3 (Division Theorem) Given f(x),g(x) G R[x], with the highest coefficient in g(x) being a unit, then there exist unique polynomials q{x) and r(x) such that (7) f(x) = q(x)g(x)+r(x) with degr < deg# or r(x) = 0. Proof: (Note that if deg / < deg #, the conclusion is trivial — take q(x) = 0 and r(x) = f(x).) Let 5 = {f(x) — h(x)g(x)\h(x) G i?[x]}. If 0 G 5, then we are done; otherwise, from S we select r(x) = f(x) — q{x)g{x) with the least possible degree and suppose r(x) = Co + c\X H h cpxp1 cp ^ 0, and g(x) = b0 + bix H h 6mxm, bm a unit. We will show that p>m contradicts the minimality of p = deg r. So suppose p > m, and let *(*) = r(a;) - cpb^x^mg{x). Then s(x) = /(a;) - q(x)g(x) - cpb^xp-mg(x) = f(x)-(q(x)+cpb^xP-m)g(x) showing s(x) G S. Now degs < degr since s(x) = (c0 + cxx + • • • + cpxp) - cpb^l1xp-rn(bo + • • • + 6mxm) and the pth powers cancel. Thus deg r < deg g, and we have also found q. It remains to prove that q and r are unique. So suppose qg+r = q'g+r' where q ^ g'. (Note that q ^ qf implies q — q' ^ 0 so that deg(g — g') > 0.) Then (q - q')g = r' - r. So (8) deg((<? - q')g) = deg(r' - r) < degg. By property (c) of degree, (9) deg((q - q')g) = deg(q - q') + deg g, and (8) and (9) together force deg(g — q') < 0, giving a contradiction. Thus q = q', and also r = r'. I Proposition 4 Let f(x) = a0 + aixH \-anxn, g(x) = 60 + 6ia?H + 6mxm, and c £ R,
B. Polynomials 89 Then 0) (f9)(c) = a>o9(c) + aig(c)c H h ang(c)cn, and (ii) (gf)(c) = g{c)a0 + cg(c)a1 -\ h cng(c)an. Proof: Using distributivity we get f(x)$x) = a0g(x) + a1g(x)x -\ h ang(x)xn, and g(x)f(x) = g(x)a0 + xg{x)ax + • • • + zn#0z)an, so the proposition follows. (We have written these in a somewhat strange way in preparation for the case in which R is not necessarily commutative.) I Corollary 1 If g(c) = 0, then (fg)(c) = 0 and (gf)(c) = 0. Corollary 2 If f{x) can be written as f{x) = g(x)(x — c)-\-r with c,r £ R, then f{c) = r. Corollary 3 (Factor Theorem) Let f(x) G R[x\. Then x — c is a factor of f(x) * /(c) = 0. Case of Noncommutative R Suppose now that R is not necessarily commutative, and let p(x) = ao + a>\x H f- anxn G R[x\. We define operations on R[x] as before and R[x] becomes a ring which, of course, may not be commutative. Now for c G R we define the value of p at c to be (10) p(c) = a$ + aic^ h ancn, but in addition we define a left value p\{c) by (11) p\(c) = a0 + cai + c2a2 H h cnan. (Alternatively, we could write p\(x) = a0 + ccai H f- xnan and evaluate Px(x) at c.) Now Proposition 4 becomes Proposition 4/ Let f{x) = ao + aicc H f- anxn and g{x) G i?[cc]. Tften for c £ R, we have (0 (/#)(c) = ao#(c) + ai^(c)c + • • • + ang{c)cn, and (ii) {9f)x{c) = £A(c)a0 + c^A(c)ai + • • • + cngx(c)an. Proof: Again, wi have
90 III. Rings and Polynomials f(x)g(x) = aog(x) + a\g(x)x -\ f- ang(x)xn (recall our definition of the product of terms of polynomials. Thus g(x)x = xg(x), etc.) and g{x)f(x) = g{x)a0 + xg(x)a1 H h xng(x)an. Thus the proposition follows from (10) and (11) I Corollary 1' (*) 9(c)=0^(/<7)(c) = 0(by(i)of4'). (**) 9\(c) = 0 =» (gf)x(c) = 0 (by (ii) of 4'). Corollary 2' (Remainder Theorem) (i) If f(x) = g{x)(x -c) + r, then /(c) = r. (ii) If f(x) = {x - c)g(x) + r, then fx(c) = r. Proof: For (i), apply (*) to the product g{x){x — c). For (ii), apply (**) to the product (x — c)g(x). I Corollary 3' (Factor Theorem) Let f(x) G R[x] and c G R. Then (12) (x - c) is a right factor of f(x) O /(c) = 0. (13) (x - c) is a left factor of f(x) <^> f\(c) = 0. Polynomials Over a Field k Definition If p(x),q(x) G k[x] we say that q divides p if there is some nonconstant h(x) G k[x] such that p = qh. Given two polynomials f(x),g(x) in k[x] we now find their greatest common divisor. Proposition 5 Given f(x),g(x) G k[x] (not both zero), there is a unique polynomial d(x) satisfying: (i) d(x) — a{x)f{x) H- b(x)g(x) (for some a(x),b(x) G k[x}), (ii) d(x) is monic, (iii) d(x) divides both f(x) andg(x), and
B. Polynomials 91 (iv) if s(x) divides both f{x) andg(x), then s(x) divides d(x). Proof: Since we do not have both / and g zero, we have a nonzero polynomial of the form (i), and we may require it to be monic since A; is a field. Let d{x) be one of minimal degree. We claim that it then also satisfies (iii) and (iv). If d fails to divide /, we have f(x) = q(x)d(x) + r(x) with degr < degd, and r ^ 0. For some c £ &, cr is monic. We claim it is of the form (i) and it has degree less than deg d, contradicting the minimality of deg d. Now r = f — qd and d = af + bg, so •cr = cf- cqd = cf - cq(af + bg) = (c - cqa)f - cqbg , and so cr is of the form (i). This proves d satisfies (iii). (iv) follows easily from (i). Finally, if we have d! with these same properties, then d and d! are monic polynomials each dividing the other (by part (iv)) and thus they are equal. I Definition f(x),g(x) are relatively prime if their greatest common divisor is 1. A polynomial / is irreducible if the only monic nonconstant polynomial dividing / is a constant multiple of /. Lemma 1 Suppose f is irreducible and f divides the product gh. Then f divides g or f divides h. Proof: If / does not divide g, then 1 = a(x)f(x)-\-b(x)g(x) (by Proposition 5). But then h = afh -f- bgh, and since / divides gh we conclude that / divides h. I As a corollary to this lemma, if an irreducible f(x) £ k[x] divides a multiple product of elements of k[x], then it divides some one of them. Proposition 6 Letp(x) be a nonconstant monic polynomial in k[x\. Then p(x) is a unique (up to order) product of irreducible polynomials. Proof: Suppose monic nonconstant polynomials which cannot be factored into irreducible polynomials do exist. Let / be such a one of minimal degree. If / is irreducible, then it is already factored; so we must have f = gh with g and h monic and nonconstant. But deg/ = deg# -f deg ft (by (c) of properties of degree) and thus g and h can be factored into irreducible polynomials. But then / can also be factored. Now suppose that there are polynomials which admit nonunique factorizations. Then we have (14) p s= ai.. .ar =s b\.. .b9
92 III. Rings and Polynomials with each a* and bj monic and irreducible. Choose the least r for which this happens. Since a\ divides b\... bSl it divides one of the bj; since bj is irreducible, they are equal and can be cancelled and r was not minimal. Thus r = s. Finally, if the 6/s are not just a permutation of the a^s, we can choose r(= s) minimal for this. Again, we can cancel a\ with some bj to obtain a contradiction. I Definition Given A G Mn(fc), we know that there is a monic polynomial p(x) G k[x] such that p(A) = 0 (e.g., let p be the characteristic polynomial of A) so there is such a polynomial of minimal degree. We call this the minimal polynomial of A and denote it by m(x). (If V is finite-dimensional, we define the minimal polynomial of </> G End(F) to be the minimal polynomial of any matrix which represents </>. This is well-defined by property (e) below.) Properties of m(x): (a) deg m < n (b) m is unique (c) m divides any polynomial which is satisfied by A (so, in particular, m divides the characteristic polynomial a of A) (d) m and a have the same irreducible factors (e) similar matrices have the same minimal polynomial Proof: (a) is obvious. For (b), suppose we have another such m'{x). Since m and mf are both monic, m — mf must be of smaller degree; but (m — m'){A) — m(A) — m'{A) = 0 which contradicts the minimality of deg m = deg m' (unless m' = m). For (c), suppose we have a nontrivial polynomial p(x) G k[x] such that p(A) = 0. By the division theorem, there are polynomials g(x),r(x) such that p(x) = g{x)m{x) -f r{x) where r = 0 or degr < degm. If r ^ 0, we get a contradiction since r(A) = p(A) - g(A)m(A) = 0. To prove (d), the following lemma will be useful. (Its proof is left as an exercise.) Lemma 2 cr(x) divides (m(x))n. Now suppose that f(x) G k[x] is an irreducible polynomial. If / divides m, then it also divides a (since m divides a by part (c)). On the other hand, suppose that / divides a. Then / divides mn by the previous lemma; but / is irreducible and so it must also divide m by Lemma 1. The proof of (e) is left as exercise (2). 1
B. Polynomials 93 Exercises (1) Prove Lemma 2. (2) Given A G Mn{k) and f(x) G k[x], show that f{CAC'1) = Cf(A)C~1 where C is any n x n invertible matrix. Conclude that miCAC-1) =0&m{A) = 0. (3) Show^hat x — 1 divides xn — 1. (4) Given A G Mn(fc), let S C fc[x] be the set of all polynomials f(x) such that /(A) = 0. Show that S is an ideal of k[x]. (5) Let f(x) G R[cc] be a monic irreducible polynomial of degree two. Show that / can be written in the form f(x) = (x-a)2 + b2 where a, b are real numbers with 6^0. Conversely, show that any such / is irreducible. (6) Suppose f(x) = anxn -f • • • + a\x -f ao is a polynomial with integer coefficients, and suppose m/£ is a root of /, where m and I are relatively prime integers. Show that m divides ao and I divides an. (7) Show that the polynomials x2 — x -f 4 and x3 — x -f 1 are irreducible over the rational numbers. What about over R? (8) Show that if R is an integral domain with unit element, then any unit in R[x] is also a unit in R. (9) Find the minimal polynomial of the following matrices: (i) A = (ii) A = 2 0 0 0 0 2 0/ (10) Suppose A = B 0 0 C where B, C are square submatrices. Show that the minimal polynomial of A is the least common multiple of the minimal polynomials for B and C. Now generalize to the case where / A1 0 \ A = \ 0 Am )
94 III. Rings and Polynomials C. Cay ley-Hamilton Theorem Let R be a commutative ring with unit element. Then R[x] is also commutative. We consider two rings. First consider Mn(R[x]), that is, the ring of all n x n matrices with entries from the ring R[x], For example, / 1 -f x 2x2 \ ( „ 3 ] G M2(Z[x]). Next consider Mn(R)[x], that is, the ring of polynomials with coefficients from the ring Mn(R) of n x n matrices with entries from R. We claim that these rings are essentially the same. Theorem 1 Mn(R[x]) ^ Mn(R)[x] . Proof: First we need some notation: for f(x) = ao + a\X + f- anxn G R[x], let rp{f{x)) = ap, where it is understood that rp(f(x)) = 0 if p > n. Clearly rp{f{x) + g(x)) = rp(f(x)) + rp(g(x)); since we want a ring isomorphism from Mn(R[x]) to Mn(R)[x], we also need to know rp{f{x)g{x)). Lemma rp(f(x)(g(x)) = Y%=0ri(f(x))rp-i(g(x)). The proof of this lemma is left as an exercise. Now for F = {fij(x)) G Mn(R[x}), we define rp(F) = (r^fijix))) which is an element of Mn(R). Clearly rp(F + G) — rp(F) + rp(G) for F,G e Mn(R[x]). We are now ready to define a map from Mn{R[x\) to Mn(R)[x]: given F G Mn(R[x\), define a : Mn(R[x])-^ Mn(R)[x] by a(F) = r0(F) + n(F)x + r2{F)x2 + ... (a finite sum). It is plain that a(F+G) = a(F)+a(G), since rp{F+G) = rp{F)+rp{G). To see that a(FG) = a(F)a(G) requires some work, however. We will show that the (matrix) coefficient of xp is the same in a(FG) and a(F)a(G). The coefficient of xp in a(F)a(G) is C = (cy) = f>(J>p_,(G) by the obvious extension of our lemma to the polynomial ring Mn(R)[x\. The coefficient of xp in a(FG) is D = (dy) = rp(FG) .
C. Cay ley-Hamilton Theorem 95 If F = (fij(x)) and G = (gij(x)), then the i,j entry of FG is (FG)ij = /•i0ij 4- • ■ • 4- /in^nj. Thus dij = (rp(FG))ij = rp(fngij H h /*n^nj) = Yl(n{fn)rp-i{gij) 4- • • • 4- ri(fin)rp-i(gnj)) £=0 where the last equality follows from our lemma. Now note that v Cij = Yl(ith row of T^F^ x tith column of rp-t(G)) £=0 P = ^2{r£(fn)rp-£(gij) 4- • • • 4- r£(fin)rp-£(gnj)) £=0 which is just dij. We have shown that a is a ring homomorphism; the proofs that it is surjective and injective are left as an exercise. I We continue with R (and hence R[x]) commutative and with unit element. Recall that we defined the characteristic polynomial of A G Mn(R) to be (15) a(x) = det(xl- A). We note that many important properties of the determinant function still hold in this general setting; in particular, Theorem 2 of Chapter II is still true, That is, if adj C denotes the classical adjoint of C G Mn(R), then (16) (adj C)C = (detC)I. Applying this to C = xl — A gives (adj (xl - A))(xl -A)= det(xl - A)I = a(x)I. Now we can consider (by Theorem 1) the relation (adj (xl - A))(xl -A) = a(x)I to be in Mn(R)[x}. There, it says that xl — A is a right factor of a(x)I. Thus, by Corollary 3', the value of a(x)I at x = A is zero. We have proven the following important theorem. Theorem 2 (Cayley-Hamilton Theorem) For A e Mn(R), we have . (17) (j(yl) = ao/4-a1A+... + an_1An-1+An = 0 where ar(x) s* ao4-aj£-l \ran-ixrhm~i+xn is the characteristic polynomial of A.
96 III. Rings and Polynomials Exercises (1) Prove the lemma in this section. (Hint: Try induction on deg(fg).) (2) Verify by direct computation that the following matrices are, indeed, roots of their characteristic polynomials. (i) A = 2 -1 0 0 0 1 0 4 3 2 1 0 0 0 -1 0 1 0 0 0 1 \ / 0 1 0 0 \ / (ii) (3) Prove that a (defined in the proof of Theorem 1) is surjective and injective. D. Spectral Theorems (an application of the Cayley-Hamilton Theorem) Let A G Mn(k) and let a(x) = det(x/ - A) G k[x] be its characteristic polynomial. We will see how to obtain information about A (considering it as an element of End(/cn)) from information about a. Case I: Suppose a(x) = {x- n)(x - r2)... {x - rn) where Ti G k and no two of the T{ equal. First we note that by Cayley-Hamilton we have (18) 0 = (A - rJ)(A - r2I) • • • (A - rnI) . Next, since the ?Vs are distinct, we can find ai,a2,... ,an G fe* such that (for x not equal to any of the ?Vs) we have (19) -7-r = + ••• + , or a(x) x — r\ x - rn
D. Spectral Theorems 97 (20) 1 = ax ]J(x - rj) + • • • + an JJ (x - rd). (This is just the technique of partial fractions from calculus.) Plugging in A for x, we have (21) / = oi H(A - rjI) + • • • + an JJ(^ - rjl). O Setting £?i = aiTT(A - r^J), we have (22) 7 = ^1 + ^ + ... + ^. Proposition 7 (Properties of the Ei) (i) EiA = AEi (Each Ei commutes with A), (ii) EiA = riEi, (iii) For i ^ j, EiEj = 0 fTfte Ei's form an "orthogonal set"), and (iv) Ef = £?j f!Sacft £?i is idempotent,). Proof: (i) This follows since A(A — Vjl) = (A — TjI)A for each j. .(ii) ^(A - r,7) = (aiH(A - rjI)){A - rj) = 0 by (18). (iii) Note that for i ^ j, the product a, ]J(A - r£I) J j aj H (A - rm/) contains the right hand side of (18) as a factor, (iv) El = ElI = Et(Ei + ••• + £„) by (22) = Ef by (iii). | For i = 1,... ,n we let V$ = £?i(fcn) . Since Ei is idempotent we know that it acts on Vi as the identity. Note that Ei ^ 0 since the minimal polynomial for A must contain the factor x-Vi. Henm V5 f* {0} and so dim V^ > 0.
98 III. Rings and Polynomials Proposition 8 (Spectral Theorem) Each Vi is one-dimensional, kn = vi e • • • e vn , and the action of A onVi is multiplication by r^. Furthermore, if we denote the restriction of A to Vi by A{, then the minimal polynomial of Ai is x—ri. Proof: First off, the Vi do span kn. For let v G kn. Then v = vl = v(E1 + • • • + En) e Vi + V2 + • • • + Vn. Next let w = vEi and consider the action of A on w;: wA = vEiA = vriEi = uvEi = riw. It follows that Vi is an eigenspace for A with eigenvalue r$. Now suppose we have w £ V± Pi (V2 + - - + Vn) so that w = V1E1 and w = V2E2 H -f vnEn. But then w = WE1 = (v2E2 + • • • + vnEn)Ei = 0 by (iii) of Proposition 7. Similarly, we see that Vi f) (Vi H h VJ_i H- V^+i H- • • • + Vn) = {0} for each i. It follows that kn = V\ 0 • • • 0 Vn by exercise (4) of Section ID. Since dim Vi > 0 for each i, we must have dim VJ = 1. The fact that x — r» is the minimal polynomial for A* follows from (ii) of Proposition 7. I So we see that Case I is very favorable. When a(x) splits into linear factors x — Vi with no r» occurring more than once, we get a clear and simple picture of A through the idempotents Ei,...,En. In Case I it is also easy to calculate any polynomial p{x) in A because we have Proposition 9 p(A) = p(r\)E\ H hp(rn)En. Proof: By the division theorem and the remainder theorem, for each i we have p(x) — p{vi) -f (x — ri)qi(x). By plugging A in for x and multiplying on the right by Et, we have p(A)Ei = p(ri)El + (A - r<J)(fc(A)Ei . But (A - TiI)Ei = 0 (by (ii) of Proposition 7). So p(A)Et = p(rt)Ei. Thus p(A) = p(A)(BiH-...+Sn) = p(ri)(£i) + .--+p(rn)(£n) as asserted. I Case II: Suppose (23) a(x) = det(a:/ - A) = (x - rx)mi ... (a? - f|)mi
D. Spectral Theorems 99 with ri,..., re distinct elements of the field k. So we are assuming again that a(x) splits into linear factors but now there may be repetitions. (For k = C we always have Case II and may sometimes have the subcase, Case By Cayley-Hamilton, we have (24) 0 = (A - nJ)mi ...(A- reI)m'. O Just as for Case I, we want to find idempotents Ei,...,E£ associated with A by breaking up l/a(x) into an appropriate sum. The difference now is that we need polynomials as numerators instead of just elements of k. Proposition 10 If p(x),q(x) are in k[x] and are (nonconstant and) relatively prime, then there exist unique polynomials a(x),b(x) in k[x] such that ap -f bq = 1 and deg a < deg q and deg b < degp. Proof: Since p, q are relatively prime, there exist ai, b\ G k[x] such that axp + hq = 1. Divide a\ by q to get a\ = qh-\- a with deg a < deg q. Set b = b\ -f hp and then 1 = a\p -f b\q = (qh -f a)p -f (b — /ip)g = ap -f 6g. So for the existence part of the proof, we are done if we can show deg b < degp. Since we are in k[x], with k a field, we always have deg(fg) = deg / -f deg#. So deg(bq) = deg 6 -f degq. But we also have deg(bq) = deg(l — ap) < deg a -f degp < deg q -f degp It follows that deg b < degp, proving existence. Suppose we also have a'p -f b'q = 1 with deg a' < deg q and deg 6' < degp. Then (a — a')p = — (b — 6')g and, since p and g are relatively prime, we have that p divides b — b'. Since deg(6 — b') < degp, this forces b — b' = 0. Similarly a — a' = 0 and Proposition 10 is proved. I We are still considering Case II; i.e., a(x) = det(xl - A) = (x - n)™1 ... (x - r£)m* with ri,..., r£ distinct elements of k. Proposition 11 We have (25) — = Ql(x) +... + Qi(x) 1 ; ct(x) (x-n)™* (x-r{)me where 0 < dega^a?) < m* - 1 ^/or t = 1,... J),
100 III. Rings and Polynomials Proof: We will establish (25) in the equivalent form (26) 1 = aiOr) ]\{x - rj)m> + • • • + a£(x) JJ(a: - rj)m> or equivalently, that h(x) = ai(x) ]\(x - rj)m> +... + a£{x) Y[(x - rj)m> - 1 is the zero polynomial. Now we choose i £ {1,2,..., £} and set p[x) = Y[(x - rj)m> and q(x) = (x - r*)m\ These are nonconstant relatively prime polynomials so we can apply Proposition 10. This gives unique polynomials a,i(x),bi(x) in k[x] such that (27) ai(x)p(x) + bi(x)q(x) = 1 with degeii < rrii and degfr* < degp = deg cr — rrii. We use the di(x) in (27) to define h(x). Note that degh < deg a. Also note that q(x) divides a,i(x)p(x) — 1 (by (27)). Now i was arbitrary so we conclude that for (x — ri)m% divides h(x) for each i. But this implies a(x) divides h(x). Since degh < deg cr, we conclude that h(x) = 0 as we desired to prove. I We can now substitute A for x in (26) to obtain I = a1(A)]l(A-rjir>+... + a£(A)]l(A-rjir^ or letting Ei — ai{A)V\{A — rjl)™3, we again have (28) / = £! + •••+£/ with Z?iZ?j = 0 for i ^ j, Z^A = AE*, and Ef = Ei. So we have again decomposed I into a sum of "orthogonal" idempotents using A. However, condition (ii) in Proposition 7 (namely, that (A — TiI)Ei = 0) must now be replaced by (29) (A-riI)m*Ei=Q. So if we define V* = Ei(kn) as before, we have that VJ ^ {0} since the minimal polynomial of A must contain the factor x - r*.
D. Spectral Theorems 101 Proposition 12 (Spectral Theorem) Each Vi is an mi-dimensional sub- space which is A-stable, kn = Vi 0 • • • 0 Vt , and the minimal polynomial of Ai — A\v is (x — Ti)Pt where 1 < pi < ra^. In fact, the minimal polynomial of A is m(x) = (x — v\)Vl • • • (x — ri)Pe. o Proof: By (29), we conclude that the minimal polynomial of A{ must be (x — ri)Pt with a < pi < m*. That m(x) = (x - ri)Pl • • • (x - rn)Vi is the minimal polynomial of A is left as an exercise. The proof that kn = Vi 0 • • • 0 Vi is almost identical to the proof in case I. To see that Vi is A-stable, choose w £ V^ and note that there is some v ekn such that w = vEi. Thus wA = (vEi)A = {vA)Ei G V*. We will defer the proof that dim Vi = rrii until the next section. (It is an easy consequence of the fact that we can put A into a "block diagonal form.") | Exercises (1) Compute the idempotents for the following matrices. What is the minimal polynomial of each Ai = A\v? f 1 o \° (2 0 0 V ° 0 2 0 0 3 0 0 0 ^ 0 -1 t 0 0 -2 0 \ / 0 ^ 0 0 3) (i) (ii) (2) Based on the results of exercise (1), state a general result about the idempotents of a diagonal matrix A. (3) Let A G Mn(k) be the matrix with A G k on its diagonal, 1 on its superdiagonal, and zeros elsewhere. For example, with n = 3 / A 1 0 A= 0 A 1 \ 0 0 A What arc the characteristic and minimal polynomials of AI Show thfid A has exactly one eigenvector. What is the algebraic multiplicity of the eigenvalue A?
102 III. Rings and Polynomials (4) Let V be a finite-dimensional vector space and suppose V = Vi 0 V2 where Vi, V2 are </>-stable subspaces for <\> € End(V). If m,i(x) is the minimal polynomial of <j>\v , show that the minimal polynomial of <j> is the least common multiple of m\(x) and m2(x). Generalize. E. Jordan Form Given </> £ End(V) where V is finite-dimensional, it is to our advantage to choose a basis a of V so that Ma(</>) is as simple as possible; this will make the task of understanding </> easier. We will look at this primarily as a matrix problem; i.e., given A £ Mn(k), we seek a matrix B which is similar to A and is as nice as possible. Now suppose that the characteristic polynomial a of A factors completely over k so that (30) a(x) = (x~ n)mi • • • (x - r£)m* as in case II of the previous section. As we observed before, this is always true when k = C. By Proposition 12, there are A-stable subspaces Vi,..., Vt such that kn = Vx 0 • • • 0 Vt , and the minimal polynomial of A% — A\v is [x — ri)Pt where 1 < pi < ra$. Also, the minimal polynomial of A is m(x) = (x — ri)Pl • • • (x — ri)Vi. First we show that we can consider A{ to be a square submatrix of A. Proposition 13 Suppose that (f) is an endomorphism of the vector space V; also suppose that Wi,...,Wm are (f)-stable subspaces such that V = W\ 0 • • • 0 Wm. Then we can find a basis aofV such that the matrix of (j) is of the form (31) Ma((f>) = f Bl 0 \ Bo \ 0 Bm J where jE?i, ..., Bm are square submatrices. Furthermore, the matrix of </>\w is B{. Proof: Choose a basis {vj,..., v^} for W\. Since W\ is ^-stable, we have (j)(v\) = bUv{ + - - + bin^ tlVnJ = fcnilv} + --- + 6nln1Vn
E. Jordan Form 103 So let B\ be the n\ x n\ matrix (6^). Similarly, choose a basis {v\, ...,^J for M^2? etc. Then a = {vj,..., v™ } is the desired basis for V. I Corollary If A € Mn(k) satisfies (30), then there is a nonsingular matrix C such that B = CAC'1 is of the form (31). Exercise. Finish the proof of Proposition 12; i.e., show that dim Vi = m*. (Hint: Show that the characteristic polynomial of A{ must be (x — r^)771*.) The matrix B in the preceding corollary is simpler than A since it is a block diagonal matrix (i.e., it has square submatrices on its diagonal and zeros elsewhere), but we can do better. Definition The matrix Jn(A)= (X 1 0 A 0 \ 1 A/ € Mn(k) with A on its diagonal, 1 on its superdiagonal, and zeros elsewhere is called the n x n Jordan block corresponding to A. Theorem 3 Suppose that <j> € End(V) has (x — r)s as its minimal polynomial. Then there is a basis for V in which the matrix of (j> is a block diagonal matrix of the form (32) / JsAt) \ 0 JsAr) 0 \ Jsk(r) J where s = si > s2 > • • • > Sfc > 1. Furthermore, this matrix is uniquely determined by (j). The proof of Theorem 3 is somewhat long-winded, so we will present it in an appendix after the exercises. Returning to the case of a matrix A £ Mn(k) whose characteristic polynomial is cr(x) = (x — ri)mi • • • (x — re)171* and whose minimal polynomial is m(x) = (x — ri)Pl • • • [x — 7y)p*, we see that a consequence of Proposition 13 and Theorem 3 is the following. Corollary Th&re is a nonsingular matrix C mwh that B = CAC l is the
104 III. Rings and Polynomials unique block diagonal matrix (33) B ( Bx \ 0 #2 ° \ Bi J where Bi is the rriix ^» block diagonal matrix of the form (32) corresponding to the eigenvalue r^; furthermore, Bi has a pi x pi block (in its upper left hand corner) and none of its other blocks is larger. Definition The matrix B in the preceding corollary is called the Jordan form of A. Examples (1) A = I 1 j has characteristic polynomial o{x) — (x — l)2 and minimal polynomial m{x) = (x — l)2; so the Jordan form of A must have exactly one 2x2 Jordan block corresponding to the eigenvalue 1, i.e., B = (2) A is said to be diagonalizable if its Jordan form is a diagonal matrix. If we are in case I of the previous section, then A is diagonalizable. (3) Suppose the minimal polynomial of A £ M^R) is (x — 2)2. Then A has two possible Jordan forms, namely, /2 0 V \ i 2/ or /2 0 V 2/ (Here, a blank space indicates a zero.) In order to decide which is actually the Jordan form of A, we need more information about A. We will now use the Jordan form of A to draw some interesting (and important) conclusions. Since rz is a root of the characteristic polynomial a(x) of A, it is also an eigenvalue of A. Recall that we defined the algebraic multiplicity of Ti to be the maximum dimension of the subspaces 0 C ker (A — r,/) C ker (A-r?;J)2C....
E. Jordan Form 105 Proposition 14 The algebraic multiplicity of r^ is equal to rrii (the multiplicity of ri as a root of a(x)). Proof: Let B = CAC"1 be the Jordan form of A (as in (33)) so that J3* is the rrii x rrii block corresponding to r^. Now the only vectors which get mapped to 0 by B — r{I are precisely the ones which get mapped to 0 by Bi — ul. (Exercise. Show this.) Similarly, ker (B — r^/)-7 =Jker (Bi — r^/)-7 for all j > 1. But Bi — rj is nilpotent of order ^ (i.e., (Bi — riI)Pt = 0) so the dimensions of the kernels of (B — Tiiy stabilize at rrii. Since dim(ker (A — rilY) = dim(ker (J3 —r^J)-7') for all j > 1, our result follows. | Recall that we defined the geometric multiplicity of the eigenvalue r^ to be the dimension of the eigenspace belonging to 7> Proposition 15 If Bi is the block corresponding to ri in the Jordan form of A, then the geometric multiplicity of ri is equal to the number of Jordan blocks occurring on the diagonal of Bi. Proof: Exercise. (Hint: An equivalent formulation of this problem is to show that the dimension of ker (Bi — r^J) is equal to the number of Jordan blocks on the diagonal of Bi.) I Note that if B = (bij) is a matrix in Jordan form, then its determinant is easy to calculate: detB = 611622 • • *6nn since B is triangular. For A G Mn(C), let Ai,...,An G C be its eigenvalues (there may be repeats in this list). If B = CAC"1 is the Jordan form of A, then the diagonal elements of B are precisely (some permutation of) Ai,..., An. Since det A = det £?, we see that the determinant of A is simply the product of its eigenvalues. In fact, for A G Mn(R), the same is true: thinking of A as an element of Mn(C), we can put it in Jordan form (over C), etc. Similar arguments show that trace A = Ai H h An for A with real or complex entries. We summarize these paragraphs as Proposition 16 Let Ai,...,An G C be the eigenvalues of A € Mn(k) where k = Rork = C. Then det A = Ai • • • An and trace A = AiH hAn. Exercises (1) Find all possible Jordan forms for a real matrix whose characteristic and minimal polynomials are as follows. (Could Proposition 15 possibly be used to single out a particular Jordan form?)
III. Rings and Polynomials (i) a(x) = (x-l)\x-2)2 , m(x) = (x-l)2(x-2)2 (ii) a{x) = (x + 3)4(x-4)4 , m(x) = (x + 3)2(x-4)2 (iii) a(x) = (x + l)6 , m(x) = (x + l)2 (2) Find the Jordan form of the following matrices. Also calculate det A and trace A. (iv) A = (v) A = (3) Show that A £ Mn{k) is diagonahzable <^> A has n linearly independent eigenvectors. (4) Show that A £ Mn{k) is diagonahzable <^> the minimal polynomial m(x) of A is a product of distinct linear factors (i.e., m(x) — (x — ri) • • • (x — ri) where ri?..., ra are distinct). (5) Show that the Jordan block Jn(r) is not diagonahzable. Which of the matrices in exercise (2) are diagonahzable? (6) If A £ Mn(k) is nilpotent and nonzero, show that A is not diagonahzable. (7) If A £ Mn(C) satisfies Am — /, show that A is diagonahzable. (Hint: Show that Jn(r)m is not diagonal unless n = 1.) (8) Given the characteristic and minimal polynomials of some A £ Af3(C), show that these completely determine the Jordan form of A
Appendix: Proof of Theorem 3 107 (9) Suppose that A £ Mn{k) is diagonalizable, and let Q be the matrix whose rows are the n linearly independent eigenvectors of A. Show that B = QAQ~~l is the Jordan form of A. Appendix: Proof of Theorem 3 First we consider^ the special case where dim V = s and the minimal polynomial of (f) is {x — r)s. Since {(f) — r)s_1 ^ 0, there is a vector v eV such that (<l>-r)a-\v)±Q. Observe that the set of s vectors a = {v,{cj>-r){v),...,{4>-ry'\v)} is linearly independent. For if (34) a0v + ax{(f) - r){v) + • • • + as_i(</> - r)3'1^) = 0 , then applying {(f) — r)s_1 to both sides of (34) yields ao(^-r)*-1(t;) = 0. But then a$ = 0 since {(f) — r)s~l{v) ^ 0; and so (34) becomes (340 ai{(f) - r)(v) + • • • + as_!(0 - r)5"1^) = 0 . Applying {(f) — r)s~2 to both sides of (34') yields a\ = 0, etc. Hence ao = • • • = as-i — 0 and so a is linearly independent. So a is a basis for V. What is the matrix of (f) in this basis? Since (f){v) — (r + (f) — r){v) = rv -f {(f) — r){v) (/>{(/> — r){v) = {r -f (f> — r)(</> — r){v) — r(</> — r){v) + {(f) — r)2{v) (j){(j) - r)*-1^) = r{(j) - r)*-1^) + r{(f) - r)s{v) = r{(j) - r)5"1^) we have Ma{(f>) — Js{r). Now we return to the general case where dimF = n and the minimal polynomial of (f) is (x — r)s. Suppose that we can decompose V as a direct sum of 0-stable subspaces (35) V = Vx 0 • • • © Vk where dim V* = S{ (remember that s\ must be s) and the minimal polynomial of $\v is (a? - r)"'. Then by the simple case above and Proposition 13, we will d© done,
108 III. Rings and Polynomials For convenience, we write -0 — (j) — r . We start off by finding V\ as in the simple case. Since -0s-1 ^ 0, there is a ^i G V such that i^s~l{vi) ^ 0. Let V\ be the «i(= s)-dimensional ^-stable subspace Vi =Span(vi,^(t;i),...,^a-1(wi)) and note that the minimial polynomial of (j>\v is (x — r)Sl. If we can find a ^-stable subspace W such that V = Vl®W , then (noting that the minimal polynomial of </>\w is (x — r)f where t < s) by induction on n = dim V, we can find the appropriate </>-stable subspaces V2,..., "54 such that w = v2 e... e vk. This will then give us (35). To get W, we use induction on s. If 5 = 1, there is nothing to prove. Our induction hypothesis is: Suppose p G End(X) has minimal polynomial (x — r)s_1. If we let Xi = Span(xi, (p — r)(xi),..., (p — r)s~~2(xi)) where (p — r)s~2(xi) 7^ 0, then there is a p-stable subspace Y such that X = Xi 0 Y. Now let V = ip(V) and note that V is </>-stable, because for any t? G V we have 4>{%l>{v)) = </>(</> - r)(v) = ((t>- r)(f)(v) = i/>(<l>(v)) G V' . Also, the minimal polynomial of <j>\v, is (x — r)s-1. If we let u\ = ^(vi), then V/ - SpanK,VK),... ^s~2(u1)) = ^(Vi) is a ^-stable subspace of V', so we have a </>-stable subspace V{ such that V7 = V[ 0 V2' by induction. Now let W = ^~1(^) = {^ e V I tl>{v) g V2'} and note that (36) V = Vi + W .
Appendix: Proof of Theorem 3 109 To see this, choose t? G V and observe that we can write i/>(v) as l/;(v) =Wi+W2 where Wi G V/. But V{ = Span(^(vi),... ,-0s-1(vi)) so we can write wi as Wl = ai^(vi) H ha^-i^5-1^!) o = ^i+-- + as_if-2(t;i)) = #*i) where zi eVi. Thus ^(v) = -0(^i) + W2 so that i/j(v — Zi) — W2 G V^. But this implies that v — Z\ G TV'; so we have v = zi + (v - zi) eVi + W as claimed. Unfortunately, we do not have Vi n W7 = {0}, so the sum (36) is not direct. However, we can say that (37) VinV2 = {0}. Exercise. Show this. (Hint: If v G Vi n V2', then ^(v) G V/ n V2' = {0}. Show that t/;(v) = 0 and v G Vi together imply that v G V7.) So W contains the subspaces Vi H IV' and V2' which intersect trivially (by (37)). Thus we can find another subspace V7 such that w' = (Vi n W) e v2' e v3'. Exercise. Show this. Finally, we let w = vi ® vi and show that this is the subspace we desire. Clearly, V\ D W = {0} since V2 and V7 are both disjoint from Vi. Furthermore, Vi + W contains both Vi and W so that V = Vi+W. Thus v = Vi e w. To see that TV is ^-stable, choose w G W and note that i/>(w) G V2. But then 0(u/) = (</> - r + r)(w) = i/;(w) + rw eVj + W = W. We have shown that <\> can be represented by the matrix (32), but what about uniqueness? Toward this end, the following lemma will be helpful. Lemma Suppose that U is t-dimensional and that p e End(U) has minimal polynomial a?*. Thm p^(U) has dimension t - j for each j <t.
110 III. Rings and Polynomials Proof: Choose wG[/so that {u, p(u),..., pt-1(ix)} is a basis for [/. Then {p>(u),... ,pt_1(ix)} is a basis for fP(U). I Continuing the proof of uniqueness in Theorem 3, suppose that </> has another block diagonal representation where the blocks are Jtl (r),..., Jte (r) with ti > • • • > te > 1. (These blocks correspond to subspaces Wi,..., Wt such that V = W\ 0 • • • 0 W^.) We claim that k = I and s» = U for each i. If not, there is a least index j such that S3 T1 tj ? without loss of generality, we assume that Sj < tj. As before, let i/j = </> — r and note that the minimal polynomial at ij)\w is xt% (and for i/j\v it is xSt). By the previous lemma, dim^Sj (Vi) = Si- Sj and dim^ (Wi) = U - Sj for i < j. By exercise (9) of Section IE, we have i/>*> (V) = ^ (W1 0 • • • 0 We) = ^ (Wi) 0 • • • 0 V>Sj (W€) and so , . dim^Sj (V) > dim^Sj (Wi) + • • • + dim^Sj (Wj) [ j = (ti-*;) + -" + (t;-*;). Also ^ (Vi) = {0} for i > j, so we have ^ (y) = ^ (^ 0... 0 vi) = r° (vi) ©... 0 v>Sj ovj which implies that (39) dim^Sj (V) = (*i - Sj) + • • • + (^-i - *,-) . Now Si = ti,..., 5^-i = tj-i by our choice of j, so we can rewrite (39) as (39') dim^ (V) = (ti - Sj) + • • • + (t,_i - s,) . Comparing (38) and (39'), we conclude that 0 > tj - Sj which contradicts the fact that Sj < tj. This completes the proof of Theorem 3. I
Chapter IV Inner Product Spaces A. Rn as a Model, Bilinear Forms For x = (£i,...,£n), y = (yi>--->yn) two points in Rn we define the distance from x to y by d(x, y) = y/{xx - yi)2 + • • • + (xn - yn)2. Two obvious properties of d are: (1) d(y,x) =d{x,y), . (2) d(x, y) > 0, and d(x, y) = 0 <=> x = y. A less obvious property is: (3) For any x,y,z £ Rn we have d(x,y) + d(y,z) > d(x,z). Property (3) is called the triangle property. A function d : X x X -*H satisfying (1), (2), and (3) is a metric on the set X. To prove (3), it is convenient to introduce some related definitions. The norm of x G Rn is ||z|| = d(3,0) = yjxl + --- + X*. The function (,):RnxRn^R defined by (1) {x,y) =x1y1 + ... + a;nyn is called an inner product We will see that d and (,) are related by (2) d(ar, y) = y/(x-y,x-y).
112 IV. Inner Product Spaces Proposition 1 (,) has the following properties: (i) (,) is symmetric; i.e., (x,y) = (y,x), (ii) (x,y + z) = (x,y) + (x,z), (ii') (x + y,z) = (x,z) + (y,z), (iii) (x, ay) = a(x, y) for a G R, (iii7) (ax, y) = a(x, y) for a G R, (iv) ( ,) is positive definite; i.e., (x,x) is always > 0, and (x,x) = 0<^x = 0 = (0,...,0), (v) if {ei,..., en} is the standard basis for Rn, then [el,ej) oy-j0 .f t^j and (vi) (,) is nondegenerate; i.e., if (x, y) = 0 /or a// y, then x = 0. Proof: (i), (ii), (ii7), (iii), (m7)? and (iv) are all routine to prove. Note that (ii) and (iii) together say that (,) is linear in the second variable when the first variable is held fixed. We have a similar statement from (ii7) and (iii7). These four together mean that (,) is bilinear. (v) is obvious. For (vi), suppose (x,y) = 0 for all y. Then, letting y — x and using (iv), we see that x = 0. I Now we prove (3) d(x,y) = yj(x-y,x-y) = \\x - y\\. Using bilinearity of (,), we have (x-y,x-y) = (x,x)- (y,x) - (x,y) + (y,y) = (%i - yi)2 + • • • + {xn - yn)2 = \\x - y\\2 = (d(x,y))*. Now, using properties of (,), we will prove the Schwarz Inequality. This will then give the triangle inequality for the metric d. Proposition 2 (Schwarz Inequality) For any x,y G Rn, we have (4) (x,y)2 < (x,x)(y,y).
A. Rn as a Model, Bilinear Forms 113 Proof: By property (iv) of (,), for any real number t we have (x + ty,x + ty) > 0. Using bilinearity and symmetry, this is (5) (x,x) + 2(x,y)t+(yiy)t2>0. This quadratic polynomial in t with real coefficients can have only one minimum, so (5) shows it cannot have two distinct roots. It follows that the discriminant cannot be > 0. That is, (2(x,y))2-4(y,y)(x,x)<0. This is equivalent to (4) and the Schwarz Inequality is proved. I Now take x,y,z G Rn. We claim that d(cc, y) + d(y, z) > d(cc, z) holds. Proof: We apply the Schwarz Inequality to x — y and y — z to get (6) (x-y,y-z) < ^/(x -y,x-y) y/(y - z,y - z) = d(x, y)d(y, z) We will show that (6) is equivalent to (7) (d(x,y) + d(y,z))2>(d(x,z))2. Rewriting (7) as (x-y,x-y) + 2d(x, y)d(y, z) + (y - z,y - z) > (x - z,x - z), and then expanding out and cancelling some terms gives -2(3, y) + 2(y, y) - 2(y, z) + 2d(s, y)d{y, z) > -2(3, z) or - (s, y) + (y, y) - (y, *) + (&, z) + d(z, y)d(y, *) > o or d(x, y)d(y, z)>(x-y,y-z) which is (6). I Bilinear Forms Definition Let V be a real vector space. A bilinear form on V is a function f ;V x V —+R which satisfies (i) f(u + v, w) = /(w,w) + /(v, w), (!') /(u, v + u/) - /(u, v) + /(u, w), and
114 IV. Inner Product Spaces (iii) f(au,v) = af(u,v) = f(u,av) for a G R. The bilinear form is symmetric if f(u,v) = f{v,u) for all u,vGV; and the symmetric bilinear form / is positive definite if f(v,v) > 0 for v ^0 . Exercise. Show that /(?;, 0) = 0 = /(0, v) for any v €V. Examples (1) The inner product (,) on Rn is a positive definite symmetric bilinear form. (2) Given A = (a^) G Mn(R), we define a bilinear form on Rn as follows: /an ... aln \ ffay) = xA(ty) = (xll...1xn) n — / v aij%iyj - \ ani t^nn I ( yi \ \Vn J The fact that / is bilinear follows from the fact that matrix multiplication distributes over matrix addition. We get the inner product of example (1) by taking A — I. Note that / is symmetric *=> A is a symmetric matrix. If we choose a basis a = {t?i,... ,t?n} for the finite-dimensional vector space V, then we can say a great deal about any bilinear form / on V. First we express u, v G V in terms of this basis (u = x\V\ -\ h xnvni v = y\V\ + h ynvn) and then we use bilinearity of / to compute n f(u,v) = ^T Xiyjfiv^Vj) . If we set azj = /(t^, Vj), then we see that / is completely determined by the n2 numbers {a^}. Letting A be the n x n matrix (a^), an equivalent way to write f(u,v) is f{u,v) = (xi,...,xn)A f Vi \ \ Vn J
A. Rn as a Model, Bilinear Forms 115 So once we choose a basis for V, all bilinear forms on V can be written as in example (2). A is called the matrix representation of / with respect to the basis {t?i,..., vn}. A natural question to ask at this point concerns our choice of basis; i.e., if we had picked a different basis f3 = {w±,..., wn} for V, how would that affect the matrix representation of /? We might expect the answer to be the same as for endomorphisms (namely, that M#(</>) = CMa((f))C~1 for some nonsingular>matrix C), but this is not quite right. If we let B be the matrix representation of / with respect to the basis {wi,..., wn}, then there is a nonsingular matrix C such that B = CAlC . A proof of this relationship is sketched in the exercises. Exercises (1) One can make Rn x Rn into a vector space by taking for (x, y) the 2n-tuple (cci,..., xn, yi,..., yn)- Show that if this is done, then (,) : Rn x Rn -+ R is not a linear map. (2) Show that (x, y) = 0 for all x implies y = 0. (3) Show that equality holds in the Schwarz Inequality exactly when x and y are linearly dependent; i.e., (x, y)2 = (x, x) (y, y) ^ there is a real number r such that x = ry. (4) Prove the following refinement of the Schwarz Inequality: for any x = (zi,..., xn), y = (yi,..., yn) in Rn, we have n \(x,y)\ <^2\xtyi\ < \\x\\ \\y\\ . (Hint: Observe that 2af3 < a2 + /?2 for any a, f3 G R, and set a"iwr p~\\vw Now sum these inequalities over i.) (5) The rank of a bilinear form / is defined to be the rank of any matrix representation of /. Show that this definition makes aenso (i.e., if B = CAlC (C nonsingular), then A and B have fche same rank),
116 IV. Inner Product Spaces (6) Given a symmetric bilinear form / on the real vector space V, the quadratic form associated to / is the function g-.V-^R defined by g(v) = f(v, v). Show that / and g are related by f(u,v) = \(g(u + v)-g(u-v)) . (7) If we define a quadratic form onF = R^ by g(xi, X2) = x\ + 3xix2 - 2x\ , find the matrix of its associated symmetric bilinear form. (8) Show that the real matrix A represents a symmetric bilinear form <=> A is a symmetric matrix. (9) Prove the relationship B = CAlC (where A, B are matrix representations of the bilinear form / with respect to the bases {vi,..., vn}, {wi,..., wn}) by filling in the details of the following steps. (i) Define C = (cij) by n i=i and notice that this is just the change-of-basis matrix occuring in the proof of Proposition 4 (Chapter ii). (ii) Express u^G^as u = x1v1 H h xnvn = x[wi -\ h x'nwn v = yivi H h ynvn = y[wi -\ h y'nwn and show that x = (xi,... ,ccn) is equal to x'C = (si,..., x'n)C. Similarly, y = y'C. (iii) Use the result of (ii) to show that /(u, v) = xA^y) = ^(CA'CXV). Conclude that B = CAfC. B. Real Inner Product Spaces, Normed Vector Spaces Let V be a real vector space. (V, (,)) is called a real inner product apace if (,) is a positive definite symmetric bilinear form on V". As in section A, we
B. Real Inner Product Spaces, Normed Vector Spaces 117 define a norm on V by ||v|| = yj(v,v). Note that || ||2 is just the quadratic form associated to (,). If it is understood that the field in question is R, we will simply refer to (V, (,)) as an inner product space. It is worth observing that our proof of the Schwarz Inequality holds for any inner product space, finite-dimensional or not. Example If V is finiterjlimensional, it is easy to define an inner product on V. Take any basis {v\,... ,vn} f°r V, define (vi,Vj) = Sij, and extend bi- linearly. If x = x1v1 + • • • + xnvn, y = y1v1 + • • • + ynvn, then we get (x,y) = X\y\ + • • • + xnyn. This is clearly positive definite, bilinear, and symmetric. Example We will now examine an infinite-dimensional inner product space in some detail. Let V be the set of all real sequences x = {xl\ such that Yjx\ exists; i.e., V consists of all square-summable real sequences. We add sequences and multiply by scalars in the obvious way; to see that the sum of two square-summable sequences is square-summable, we define fay) = ^%iVi , i=l and show that this series converges absolutely. By the refinement of the Schwarz Inequality occuring in the exercises of section A, we have \xiyi\ H h \xnyn\ < £*? £»? < A / ^ i \ / j *i — \\ / j i \ E- i=l which holds for all n. But then the monotonic sequence of partial sums Sn = |#i2/i | H 1- |^n2/n| is bounded and so converges. Now suppose that {x{} and {yt} are square-summable. Then Efo + y.)2 = E(^2 + 2xtyi+yt2) and so {xz}-\- {yl} is also square-summable. It is trivial to see that if r G R and {x{} G V, then {rxl} is square-summable. Hence V is a real vector space. Exercise. Show that (,) defined above for square-summable sequences is positive definite, bilinear, and symmetric. Thus wo havo an infinite-dimensional vector space with an inner product on it. It is called f3*tipacr or Hilberi Space.
118 IV. Inner Product Spaces Definition Let (V, ( ,)) be an inner product space. We say that two vectors u^GV are orthogonal if (u, v) = 0. A set S of vectors in V is an orthnormal set if each v G S is a nni£ vector (i.e., ||v|| = 1) and any two vectors in S are orthogonal. JScarapZe The standard basis for Rn is an orthonormal set by (v) of Proposition 1. Proposition 3 Any nontrivial finite-dimensional inner product space (V, (,)) has an orthonormal basis. Proof: First, we choose any basis {vi,...,vn} f°r V, and then we set v[ = v\ and Note that v[ and v'2 are orthogonal since (v[,v>2) = (v^-^v'J = (v'1,v2)-^(v'1,v'1)=0. Also note that Vi,vf2 are nonzero (since {vi,v2} is linearly independent) and v\,v2 are in the span of {v[, v2}. More generally, define (Vm,v'm-l) , _ _ (Vm,Vl) / (^-1,^-1) m"1 '" K^i) * and note that t?^ is orthogonal to v[,... 1v'm_1. Furthermore, v^ is not the zero vector and vm G Span(v/1,..., vfm). If we let Un = TRJI Vn » then {ixi,... ,wn} is an orthnormal set of vectors which span V. This is our desired basis. I The technique used in the preceding proof to generate the orthonormal basis {ixi,...,un} is called the Gram-Schmidt orthonormalization process. Definition Let (V, (,)) be an inner product space. A linear map 0 : V —► V is said to be orthogonal if for all u, v G V we have (0(u),0(v)) = (u,v) . Equivalently (when V is finite-dimensional), A G Mn(R) is an orthogonal matrix if {xA,yA)** {x,y)
B. Real Inner Product Spaces, Normed Vector Spaces 119 for all x,y e Rn. We denote the set ofnxn orthogonal matrices by 0(n). Clearly an orthogonal map preserves lengths (= norms) of vectors, because \\<Kv)\\2 = {<Kv),4>(v)) = (v,v) = \\v\\*. Strangely enough, the converse is true. Proposition 4 If</>:V—>V preserves lengths, then it is orthogonal o Proof: Our hypothesis is: (a) (<t>(v), <t>(v)) = (v,v) for all v. Our conclusion is: (b) (<l>(u),(/)(v)) = (u,v) for all u,v. Statement (b) appears to be more general. But we will prove that (a) => (b) by "polarizing the identity (a)" (a procedure we will also use in our study of normed algebras). We apply (a) to the vector u + v: (cj)(u + v),</)(u + v)) = (u + v, u 4- v) = (u, v) + 2(u, v) + (v, v) = {cj>{u),cj>{u)) + 2(u, v) + <0(t;), 0(t;)>, and note that also (<l>(u + v),<l>(u + v)) = (<l>(u)+<l>(v),<l>(u) + <l>(v)) = (d>(u), d>(u)) + 2{cj>{u)^{v)) + {cj>{v),cj>{v)) proving (b). I We have discussed two different concepts of orthogonality (namely, orthogonal maps and orthogonal vectors). These notions are quite closely related. Given the matrix A G Mn(R), note that Ri = eiA G Rn is the ith row of A. If A is orthogonal, then (Ri,Rj) = (eiA,ejA) = {e^ej) = • 6ij. Said differently, the rows of A form an orthonormal basis for Rn. (Exercise. Show that if S C Rn is any orthonormal set, then S is linearly independent.) The converse is also true (namely, that a matrix whose rows form an orthonormal basis for Rn is orthogonal), but a proof of this fact will have to wait until we know more about orthogonal matrices. Normed Vector Spaces We saw in section A that an inner produce (,) on Rn defines a norm by i (8) HxHv'M.
120 IV. Inner Product Spaces Similarly, given the inner product space (V, (,)), we can define a norm on V by (8). Notice that this norm inherits some properties from ( ,). For example, for r G R and v G V, we have (9) ||rv|| = yj(rv,rv) = Vr2 \/(v,v) = \r\ \\v\\ . Also, positive definiteness of (,) tells us that (10) IMI>0, and ||t;||=0^t; = 0. Finally, the Schwarz Inequality gives (ii) i|w+u|I<iHi + ihi for any u, v G V. (To see this, square both sides to get \\u + v\\2 = (u + v, u + v) = \\u\\2 + \\v\\2 + 2(<u, v) and ( ll«ll + ll^ll)2 = ll~ll2 + ll^ll2 + 2||~|| ||V|| , and note that the Schwarz Inequality is just \(u, v)\ < \\u\\ \\v\\.) Definition Let V be a real vector space and suppose there is a function || || : V -* R satisfying properties (9), (10), and (11). Then we call (V, || ||) a normed vector space. Note that || || is a continuous function on V\ so if we have a sequence of vectors {vn} converging to the vector v, then the sequence of numbers {||vn||} converges to ||v||. We have shown that every inner product space yields a normed space by using (8). Conversely, given a norm on V, can we define an inner product which will give us back our norm (via (8))? Unfortunately, the answer is "no" in general; however, we do have the following Lemma Let (V, || ||) be a normed vector space. Then || || comes from an inner product <^> the norm satisfies the parallelogram law \\u + u||2 - ||u - u||2 - 2( \\u\\2 + IMI2) for all u, v G V . Proof: => Exercise. <= If there is an inner product which induces || ||, then it must satisfy the polar form (12) <u,«> = i(||u + i,||2-||u-t;||2). (This is just exercise (6) from the previous section.) With this in mind, we define a function on V x V by (12) and show that this is, indeed, an inner product which induces || ||. It is clear that (,) is symmetric, positive definite, and that it gives back || || since <«,«> = i(||t; + i;||a-||0||a) = J(2a|M|a) = H|a.
B. Real Inner Product Spaces, Normed Vector Spaces 121 Now we show that (13) (u + v,w) = (u, w) + (v, w) (and hence we get (u,v + w) = (u,v) + (%w) by symmetry). Expanding the right hand side of (13) by using (12) gives (u,w) + (v,w) = J(||u + w\\2 - \\u - w\\2) + j(||t; + w\\2 - ||v - w\\2) ^= \(\\U + U)\\2 + \\V + -L^H2) - l(\\u ~ U)\\2 + \\V - W\\2) = l(\\u + w + v + w\\2 -\\u + w — v- w\\2) — |(||w — w + v — w\\2 — \\u — w — v + w||2) = U\\u + v + 2w\\2-\\u + v-2w\\2) (u,w) + (v,w) = \(u + v,2w) where the third equality holds by the parallelogram law, and the last holds by (12). Setting v = 0 in this last equation yields (14) (u,w) = \(u,2w) , and substituting u + v for u in (14) gives (13). To show bilinearity of (,), we still must show that (15) * (ru,v) = r(u,v) for any r £ R. First suppose r £ N = {1,2,...} and observe that (ru, v) = (u-\ + u , v) = (u, v) H + (u,v) — r(u, v) , v v ' v v ' r r by using (13) repeatedly. Furthermore, note that 0 = (u + (—u), v) = (u, v) + {—% v) so that (—u,v) — —(u,v). Hence (15) holds when r is an integer. Now suppose that r = £ is any rational number (p, q integers with qj^O). Then <fu,i;> =p(±u,v) = (f) g^ti,*) = f^ti),*) = § <u,t;> and so (15) holds for the rationals. Finally, choose? any r € R and let {ri} be a sequence of rational numbers converging to f. (Wo know such a sequence exists since the rationals are
122 IV. Inner Product Spaces dense in the reals.) Note that {riu} is a sequence of vectors converging to ru. Then we have (ru, v) = (lim (ryu), v) = lim (r^, v) = (lim Vi) (u, v) = r{u, v) . Note that we can "pull the limit out" of (,) since || || is continuous. | Now we consider an example of a normed vector space (V, || ||) in which || || does not come from any inner product on V. Example Let V = R2 and define || || by ||(xi,x2)|| = |o?i| + \x2\ . It is left as an exercise to show that this is actually a norm on R . If we let x = (1,0) andy = (0,1), then we see that ||a?+y||2- ||#-y||2 = (|1| + |1|)2 - (|1| + | - 1|)2 = 0 whereas 2(||*||2 + ||y||2) = 2(|1| + |1|) = 4. So this norm does not satisfy the parallelogram law. Exercises (1) Given any set of vectors W from the inner product space (V, (,)), we define the orthogonal complement of W to be all vectors in V which are orthogonal to every w € W; i.e., WL — {v G V | (v, w) = 0 for all w G W}. Show that WL is a subspace of V. If V is finite-dimensional and if W is a subspace of V, show that V^ W® W±. (2) Let W be the subspace of R3 spanned by u = (1,1,0) and v = (1,0, —2). Find orthonormal bases for W and WL. Verify that the union of these two bases is an orthonormal basis for R». (3) Let <j) be an orthogonal endomorphism on the finite-dimensional inner product space (V, (,)), and suppose that W is a </>-stable subspace of V. Show that (i) <j>\w is orthogonal, and (ii) WL is </>-stable. (Hint: <j) is nonsingular — a fact we will prove later — so 0(W) = W.) (4) Show that || || : R2 -> R defined by ||(xi,x2)|| = |a?i| + \x2\ is a norm. (5) Draw a picture in R^ to justify the name "parallelogram law,"
C. Complex Inner Product Spaces 123 (6) For V = Rn, show that the following define norms on Rn. (i) ||(a:i,...,xn)\\p = (\Xl\* + • • • + |ajn|p)*, where 1 < p < oo. (ii) ||(a?i,...,a;n)||00 = max{|xi|,..., \xn\}. Can you decide which of these norms comes from an inner product? (7) Let V be the infinite-dimensional vector space of real-valued continuous functions on [—7r, tt]. Show that (f,g)= f" f(t)g(t)dt J—7T defines an inner product on V. What is ||/|| if f(t) = t + 3? Show that {1, sin t, sin(2t),..., cos t, cos(2t),...} is an orthogonal subset of V. C. Complex Inner Product Spaces A complex vector space is simply a vector space whose scalar field is C. An inner product on a complex vector space is much the same as the real inner products we have already studied; however, there is one important difference from the real case which arises because conjugation is a nontrivial operation on C. First we define a Hermitian inner product (,) on Cn: for x = (ai,..., «n),2/ = (/?i,...,/?n)GCnwelet (x,y) =aiJ3i-\ han/3n. Proposition 5 (,) has the following properties. (i) (,) is Hermitian symmetric; i.e., (x,y) = (y,x), (ii) (x + y,z) = (x,z) + (y,z), (ii') (x,y + z) = {x,y) + {x,z), (iii) (ax,y) = a(x,y), (Hi') (x,ay) = a(x,y), (iv) (x,y) = (x,y), (v) (,) is a positive definite; i.e., (x,x) is a real number > 0, and {», x) » 0 «» x = (0,..., 0),
124 IV. Inner Product Spaces (vi) (ei,ej) = 6ij where {ei,...,en} is the standard basis for Cn, and (vii) (,) is nondegenerate. Proof: Exercise. (Hint for (v): Recall that if a = a + bi G C, then aa = a2 + b2 G R.) I Properties (ii) and (iii) say together that (,) is linear in the first variable, while (ii') and (iii') say that (,) is conjugate linear in the second variable. So our inner product on Cn is said to be Hermitian linear. We could have defined our inner product on Cn by formula (1), and then it would still be bilinear. Unfortunately, our inner product would no longer be positive definite because aa is not always a real number if a G C. Since positive definiteness is crucial for defining norms, we are willing to sacrifice bilinearity. Definition Let Fbea complex vector space. A Hermitian form on V is a function / : V x V —► C which is Hermitian symmetric and Hermitian linear; i.e., f(u,v) = f(v,u) , and f(au + 0v, w) = a/(u, w) + /3f(v, w) where u,v,w eV and a,/? G C. If V is a complex vector space, then (V, (,)) is called a complex inner product space if ( ,) is a positive definite Hermitian form. The Schwarz Inequality still holds in complex inner product spaces, but we need to modify its statement and proof for the present case. Proposition 6 (Schwarz Inequality) For any u,v in the complex inner product space (V, (,)), we have \{u,v)\2 < (u,u)(v,v) . Proof: We will use the fact that aa = \a\2. First, observe that if (u, v) is real, then our previous proof works without any problems. If a = (u,v) is not real, then we have u 1 (u,v) (-,V) = -{U,V) = r = 1 a a (w,v) so that (^,v) is real. Thus we have a a a aa so that aa = (u}u)(v}v). Since a = (u,t>), we have our result. I
C. Complex Inner Product Spaces 125 Given two vectors u, v from the complex inner product space (V, (,)), we say they are orthogonal if (u^v) = 0 (just as we did in the real case). The Gram-Schmidt process works exactly the same, too, so every finite- dimensional complex inner product space has an orthonormal basis. We define a norm on V by (16) |M| = y/{^) . O It is left as an exercise to show that this norm inherits the same properties from (,) as in the real case; i.e., (17) \\av\\ = |a| \\v\\ for a € C , v G V , (18) N|>0, and ||v|| = 0 O v = 0 , and (19) ||w + v|| < Ml + H . Hence the definition of a (complex) normed vector space is exactly the same as before. Furthermore, it is still true that a norm on V comes from an inner product (via (16)) O the norm satisfies the parallelogram law. Definition Let (V, (,)) be a complex inner product space. A linear map (j): V —► V is unitary if for any u, v G V we have {(j){u),(f){v)) = (u,v). Equivalently (when V is finite-dimensional), A G Mn(C) is a unitary matrix if (xA,yA) = (x,y) for all x,y G Cn. We denote the set of n x n unitary matrices by U(n). Proposition 7 U(n) is a group under matrix multiplication. Proof: Matrix multiplication is associative, and clearly the identity matrix is unitary. If A, B G U(ri), then (xAB,yAB) = (xA,yA) = (x,y), showing that AB is also unitary. Note that if A = (al3) G Mn(C), then we define the conjugate of A to be A = (ai:j) . Since the operations of conjugation and taking transposes commute, the symbol lA is unambiguous. It remains to prove that if A is unitary, then A is an isomorphism and A"1 is also unitary. For this, and other purposes, the following lemma is important. Lemma 1 For A £ Mn(C) and x,y G Cn we have (a?i4,y) = {x,y%A) .
126 IV. Inner Product Spaces Proof: Let A = (a^) and calculate xAj= (xiau H h xnanll..., xxaln H h xnann) y *A = (yian H h ynain,..., yiani H 1- 2/nflnn) • Thus we see that (xA, y) is equal to )2/i H 1- (ziain H h xnann)yn , whereas the (x,y *A) is #i0ii2/i H 1- aint/n) H 1- Snfanif/i H h annyn). Inspection shows these two expressions contain exactly the same terms. i Corollary For A G Mn(R) and a?,y G Rn, Lemma 1 gives (xA,y) = Continuing the proof of Proposition 7, we use Lemma 1 to show that A G Mn(C) is unitary «=> *A = A-1. By Lemma 1, (eiA^ejA) = (e*, (e^A) *A). Thus (e*A, e^ A) = (e», e,) = <% <£> A *A = J. So if A is unitary, then A-1 exists and is equal to tA. Furthermore, (#A~~1,yA~"1) = (xA~1A,?/A"~1A) = (x^y). This shows A-1 is also unitary and completes the proof that U(n) is a group. I In the course of proving Proposition 7, we also showed that the rows of a unitary matrix form an orthonormal basis for Cn. Next we want to prove a result for unitary matrices like we did for orthogonal matrices. Proposition 8 If A G Mn(C) and A preserves lengths, then A is unitary. Proof: To prove this theorem, the following lemma will be useful. Lemma 2 If A preserves lengths, then (etA,ejA) = -(ejA,e»A) for i^j. Proof : ((e< + Cj)-A,(e< + ej)-A) = (ei + ej,ei + ej) = (e^e*) + (e^ty) » 2.
C. Complex Inner Product Spaces 127 But the left hand side also equals (e» A + ej A, e{A + ejA) = (e»A, e* A) + (e* A, e. A) + (e. A, e*A) + (e3- A, e^-A) = 2 + (e^A, e^A) + (e^-A, e»A). This proves Lemma 2. I Now we can prove Proposition 8. For i ^ j, let u a: = aei + /Jej where a,/3 e C are not yet specified. Then (20) (x, x> = (aei + /%, ae* + /Je^) = aa + 0 4- 0 + /3/3. But, by Lemma 2, we also have (x, x) — (xA, xA) = (aeiA + /?ej A, ae^A + /3ejA) /91 x = aa(e*A, e»A) + a^e^A, e, A) + /?a(ej-A, e*A) 1 j +PP{e5A,e5A) = aa + /3/3 + (a/3 - /3a) (e» A, e, A) . Comparing (20) and (21) gives (a^-/3a)(eiA,eiA) =0. Letting a = i and /3 = 1 gives a/3 — /3a = 2i, and hence (e» A, ey A) = 0 fori^j. So for a: = xiei H h xnen? 2/ = 2/iei H h ynen we have (xA, yA> = ((xiei H h xnen)A, (y1e1 H h ynen)A) proving that A is unitary. I Exercises (1) Show that the norm defined by (16) inherits properties (17)- (19) from the Hermitian inner product (,). (2) Prove that a norm on a complex vector space comes from a Hermitian inner product (via (16)) <& the norm satisfies the parallelogram law. Give an example of a norm on C2 which dofi not come from any inner product.
128 IV. Inner Product Spaces (3) Let V be the complex vector space of all continuous complex- valued functions on [0,1], Show that </,»> = / f(tW)dt Jo defines an inner product on V. What is the norm of elt = cos t + i sin tl (4) Given A G Mn(C), show that *(A) = (U). (And so *A is unambiguous.) (5) Let (V, ( ,)) be a complex inner product space, and suppose </> G End(V') is unitary. If W is a 0-stable subspace of V, show that W-1 is also 0-stable. (6) The matrix A G Mn(C) is called Hermitian if (Note that the set of Hermitian matrices contains the set of symmetric matrices as a subset.) Show that the eigenvalues of a Hermitian matrix are real numbers. (Hint: Consider (xA, x) where x is an eigenvector for A.) (7) The matrix A G Mn(C) is called skew-Hermitian if tA = -A. Show that the eigenvalues of a skew -Hermitian matrix are purely imaginary. (We say a G C is purely imaginary if a = bi where b G R.) (8) The matrix A G Mn(C) is called positive definite if there is a nonsingular matrix B G Mn(C) such that A = BtB . Show that the eigenvalues of a positive definite matrix are real and positive. (Hint: Consider (xB,xB) where x is an eigenvector for A.) Note that, in particular, this holds for A G Mn(R). (9) Given the Hermitian matrix A G Mn(C), show that the function / : Cn x Cn -» C defined by f(x)y)^xA(1y)
D. Orthogonal and Unitary Groups 129 is a Hermitian form on Cn. Conversely, given the basis {?;i,..., vn} for the complex vector space V, and given the Hermitian form / on V, we get a matrix A — (a^) associated to / as above (i.e., dij = f(vi,Vj)). Show that A is a Hermitian matrix. How does A change if we choose a different basis for VI For exercises (10)-(14), let (V, ( ,)) be an inner product space (real or complex). 0 (10) Show that each v G V determines a functional </>v G V* defined by (j)v(w) = (w,v) . However, i\)v defined by ipv(w) — (v,w) is not linear unless V is a real vector space. (11) If V is finite-dimensional and if <j> G V* is any functional, show that there is a unique v G V such that <j>(w) = (w,v) for all w G V. (12) Suppose V is finite-dimensional and let ip G End(V). Use exercise (11) to show that there is a unique endomorphism rp* of V such that (il>(u),v) = (u,il>*(v)) for all u, v G V. We call ip* the adjoint of ip. (Hint: Fix v ^V so that (f) defined by </>(u) = (ip(u),v) is a functional on V. Then there is a unique v' G V such that 0(u) = (u, v') for all u eV. Define ip* by ^*(^) = ^.) (13) Continuing exercise (12), choose an orthonormal basis a for V. Show that if A = Ma(^), then *A = Ma(^*). (14) Show that A G Mn(C) represents a unitary map (j) G End(V') (with respect to an orthonormal basis of V) <=>• A is a unitary matrix. State and prove an analogous result for orthogonal maps/matrices. (Thus we see that if (j) is orthogonal or unitary, then it is nonsingular.) D. Orthogonal and Unitary Groups Recall that 0{n) is the sot of n x n orthogonal matrices. Because of the corollary to Lemma I, our proof that U(n) is a group (whose operation is
130 IV. Inner Product Spaces matrix multiplication) carries over almost verbatim to show that 0(n) is also a multiplicative group. In fact, we can think of 0(n) as a subgroup of U(n) where the inclusion is induced by the natural inclusion of R into C. Proposition 9 For A £ Mn(H), the following statements are equivalent: (i) A is orthogonal, (ii) (eiA,ejA) = % (iii) A maps an orthonormal basis (of Hn) to an orthonormal basis, (iv) the rows of A form an orthonormal basis (v) the columns of A form an orthonormal basis, and (vi) *A = A-1. Proof: Exercise. I In Proposition 9, if we make the obvious adjustments, then we get an analogous characterization of unitary matrices. From the equivalence of (iv) and (v), we conclude that A is orthogonal (unitary) &> *A is orthogonal (unitary). Proposition 10 If A € U(ri), then detA is a complex number of unit ength. If A € 0(ri), then detA G {1, -1}. Proof: For A unitary, we know that A-1 — tA. Thus 1 = det (J) = det{AA~x) = det Adet( *A) = detA detA so that | det A\ = Vdet AdelTA = 1. For A orthogonal, we know that A is also unitary (by the inclusion 0(n) C U(n)) so that | det A\ = 1. But det A is a real number, so we must have det A = 1 or det A = -1. I Definition The special orthogonal group is SO{n) = {A e 0{n) | det A = 1} (also called the rotation group). Similarly, the special unitary group is SU(n) = {Ae U(n) | det A = 1} . The fact that these are, indeed, groups is an easy exercise. Definition We say that the matrix A G Mn(R) is skew-symmetric if %A = -A.
D. Orthogonal and Unitary Groups 131 The matrix A G Mn(C) is skew-Hermitian if *A = -A . The set of all n x n skew-symmetric matrices is denoted by o{n), and the skew-Hermitian matrices are denoted by u{n). Note that o{n) and u{n) are not multiplicative groups. For example, if then A is skew-symmetric, but A2 = I 1 J is not. However, o{n) is a subspace of the n2-dimensional real vector space Mn(R). (Exercise. Show this.) It is not true that u{n) is a subspace of the complex vector space Mn(C) (since a complex multiple of a skew-Hermitian matrix need not be skew- Hermitian); but if we think of Mn(C) as a (2n)2-dimensional real vector space, then u{n) is a (real) subspace. For example, if then A is skew-Hermitian, and so is rA for any r G R, but is not skew-Hermitian. Proposition 11 The dimension of o(n) is n^n~ \ and the dimension of u{n) is n2. Proof: Exercise. (Hint: Bases for both spaces are easy to write down.) i Recall the definition of a Lie Algebra from Section IIB (i.e., an algebra whose multiplication is anticommutative and satisfies the Jacobi identity). We make the real vector spaces o(n) and u(n) into Lie Algebras by defining [A, B] = AB-BA, where AB denotes matrix multiplication. To see that [A,B] is skew- symmetric whenever A, B € o(n), we observe that *[A,B] = t{AB-BA)=\AB)-\BA) = tBtA- WB = -(*A*B -lB *A) - -((-A)(-B)-(-B)(-A)) = -[A,B\.
132 IV. Inner Product Spaces Similarly, [A,B] E u(n) whenever 4,BG u{ri). The groups 0{ri), SO(n), f/(n), and SU(n) are examples of Lie Groups. (The definition of a Lie Group would take us too far afield, but we simply remark that they are important in various branches of mathematics.) Now o(n) is the Lie Algebra associated to the Lie Group 0(n) (and also to SO(n)), and u{ri) is the Lie Algebra associated to U(n). The Lie Algebra associated to SU(n) is su(n) — {A E u(n) | trace A = 0} . The idea of associating a Lie Algebra to a Lie Group is akin to associating /' to / in calculus. We do this because linear objects are easier to study than nonlinear ones, but we still get useful information about the associated nonlinear objects. Exercises (1) State and prove a proposition (analogous to Proposition 9) for unitary matrices. (2) Show that if A = (a^) is an orthogonal or unitary matrix which is triangular, than A is diagonal. (3) Show that SO(n) and SU(n) are, indeed, subgroups of 0{n) and f/(n), respectively. (4) Show that SO(2) = [ ( _^f^ ™ * \ I 0 E r|; also, show that J7(l) = {eie = cos0 + i sin# | 0 E R}. Now show that the map <$> : 50(2) - U(l) denned by cf> ( _C^ ™* ) = e" is a group isomorphism (i.e., </>(AB) = </>(A)</>(B), and there is a map ^ : 17(1) -► SO(2) such that ^(e^e*') - ^(^)^(^2) and ip o 0, 0 o t/j are identity maps). (Hint: Use the standard trigonometric identities to show that eieiel°2 = el(ei+°2\) (5) Show that if A E o{ri) (A E w(n)), then the diagonal elements of A are zero (purely imaginary). (6) Show how we can think of Mn(C) as a (2n)2-dimensional real vector space. (Recall that we already did this for n = 1.) Now show that u(n) is a subspace. (7) Show that su(n) is a Lie Algebra and that its dimension as a (real) vector space is n2 - 1.
E. Stable Subspaces for Unitary and Orthogonal Groups 133 (8) The special linear group is the set SL{n, R) = {A G Mn(R) | det A = 1} whose group operation is matrix multiplication. Show that this is, indeed, a group and find a matrix A G SL(n, R) — 0(ri). (9) Show^that sl(n, R) = {Ae Mn(R) | trace A = 0} is a Lie Algebra. (This is the Lie Algebra associated to the Lie Group SX(n,R).) Curves in Matrix Groups. A curve in the real normed vector space V is a continuous map 7 : 'vfc ~f~ h) — /"y(c) (a, b) —» V. For c G (a, 6), we say 7 is differentiable at c if lim - exists, and we denote this limit by 7;(c). Given the subset W C V, we say that 7 is a curve in W if j(t) G W for every t G (a, 6). If G is a multiplicative group in V (e.g., V = Mn(R) or Mn(C), and G = O(n) or C/(n), and if 7, a are two curves in G, then we define their product curve in G by (1a)(t)=1(t)a(t). (10) If 7 and a are both differentiable at c G (a, 6), show that 7a is also differentiable at c, and (jcry(c)=j(c)a'(c)+j'(c)a(c). (11) Let /? be a differentiable curve (3 : (—a, a) —► O(n) with (3(0) = I. Show that /?'(()) is skew-symmetric. For (3 : (—a, a) —► f/(n) with /?(0) = /, show that /?'(0) is skew-Hermitian. (Exercise (11) should provide a hint as to how we associate a Lie Algebra to a Lie Group.) E. Stable Subspaces for Unitary and Orthogonal Groups We begin with two easy observations about the unitary matrix A e U(n). (1) If th® aubapace W C Cn is A-stabk, then A\w is unitary.
134 IV. Inner Product Spaces (2) If W CCn is A-stable, then the orthogonal complement of W WL = {z e Cn | (z,y) = 0 for all yeW} is also A-stable. Proof: Note that A\w € End(W) since W is A-stable. Also (xA,yA) = (x,y) for all x,y € W so that A\w is unitary on W. Now choose z € W1- and note that for any y 6 W we have (zA,y) = (z,y'A) = (z,j/A-1). But A is an isomorphism of W onto W so yA""1 € W. Thus (zA,y) = 0 for all y € W, showing that 2A € W-1-. I Proposition 12 Given A € U{n), there exists an orthonormal basis for Cn consisting of eigenvectors for A. Proof: We show this by induction on n. For n = 1, the result is trivial; so suppose n > 1. Since the characteristic polynomial of A factors completely, A has at least one eigenvalue and hence an eigenvector v\. Then 1 IMI is a unit vector. Let W = Span(ui), so that Cn = ^® W1- by exercise (1) of Section IV B. Note that dim WL = n — 1 (since dim W = 1), and A|wX is unitary on W^ (by our observations). By induction, W^ has an orthonormal basis {iX2, • • •, un} of eigenvectors for A\w±. Thus {u\,..., wn} is the desired basis. I Corollary A G 17(ra) is diagonalizable. Exercise. Given A G U{n), show that there is a B G 17(n) such that BAB"1 is diagonal. It is tempting to assert that Proposition 12 should also hold for A G O(n) since the observations (1) and (2) still hold for an orthogonal matrix. But we have seen that A G 0(n) need not have an eigenvalue (in R). For example, *-(-? ;H2) is not diagonalizable. However, if A € Mn(R) is symmetric, then our observations still hold.
E. Stable Subspaces for Unitary and Orthogonal Groups 135 Exercise. State and prove appropriate versions of observations (1) and (2) for a real symmetric matrix. So if we can show that a symmetric matrix always has an eigenvector, then the proof of Proposition 12 can easily be adjusted to prove the following. Proposition 13 If A £ Mn(R) is symmetric, then there is an orthonor- mal basis for R/i> consisting of eigenvectors for A. We show that a symmetric matrix has an eigenvector in the next two propositions. Proposition 14 Let A £ Mn(R) be symmetric and suppose A £ C is an eigenvalue of A. (Thinking of A as an element of Mn(C), we know such a X exists.) Then A £ R. Proof: Let x £ Cn be an eigenvector for A corresponding to A. Using the standard inner product on Cn, we see that A(x, x) = (Ax, x) = (xA, x) = (cc, xlA) . But A is real and symmetric, so lA = A. Thus (x, x *A) = (x, xA) = (x, Xx) = A(cc, x) and we see that A(x, x) = X(x,x) . Since x is an eigenvector, it is nonzero and so (x, x) ^ 0. Hence A = A implying that A is real. I Even though this proposition shows that the eigenvalues of A are all real, we cannot immediately conclude that the corresponding eigenvectors are in Rn (because we had to work in Cn). Proposition 15 The symmetric matrix A has an eigenvector in Rn. Proof: Let A £ R be an eigenvalue of A and let z £ Cn be a corresponding eigenvector. We can write z uniquely as z — x -f iy where x,y € Rn. (Exercise. Show this.) And so Xx 4- iXy = Xz = zA = xA + iyA .
136 IV. Inner Product Spaces Since A is real and since the real and imaginary parts of a vector are unique, we must have xA = Xx and yA = Xy . But z is nonzero, so one of x or y must be nonzero. I Exercise. Given the symmetric matrix A G Mn(R), show that there is a B G 0(n) such that BAB"1 is diagonal. Now we return to orthogonal matrices and show that even though A=(cos9 -sin0\ \ sin^ cos0 J V ' is not diagonalizable (unless 0 = kit for k G Z), this is about the worst case. That is Proposition 16 Given A G 0(n), Rn contains A-stable subspaces of dimensions one and/or two such that Rn is the internal direct sum of these subspaces. Proof: Again, we use induction on n. For n = 1, there is nothing to prove; so assume n > 1. Lemma There exists an A-stable subspace W C Rn with dimW G {1,2}. Proof: B = A + M. is symmetric, so B has an eigenvector x G Rn (by Proposition 15). Let A G R be the corresponding eigenvalue, and consider the two vectors x, xA. If x and xA are linearly dependent, then xA = rx for some r G R. Thus W = {tx 11 G R} is A-stable and dim W = 1. If x and xA are linearly independent, let W = Span(x,xA) . Note that Xx — xB = xA + x lA = xA + xA'1 and so XJ\. ~==- \XJ\. X • Now choose w G W and write this as w = ax + bxA for some a, b G R. But then wA = (ax + focA)A = ax A + 6xA2 = axA + b(XxA - x) = (-&)# + (a + 6A)xA € W* so that W is A-stabl©.
E. Stable Subspaces for Unitary and Orthogonal Groups 137 This proves the lemma. I Returning to the proof of Proposition 16, we let W be an A-stable subspace as in the lemma, and note that Rn = W ® W±. By our observations, A\w± is orthogonal on W1- (which is of dimension < n); so by induction, WL can be written as the direct sum of A-stable subspaces of dimensions one and/or two. I Given A G 0(n), Proposition 16 allows us to restrict our attention to the A-stable subspace W C Rn which is of dimension one or two. So by Proposition 13 of Section HIE, we only have to look at 1 x 1 and 2x2 orthogonal matrices. A G O(l) must be (1) or (-1) since | detA\ = 1. What about A G 0(2)? Exercise. Show that A G 0(2) can be written as cos# sin# — sin# cos# )«( — cos 9 sin 9 sin 9 cos 9 for some 9 G R. Note that A — cos9 sm# \ . ,. Tii. ..i . n /i is diagonahzable since its charac- sm# cos 9 J ° teristic polynomial is x2 — 1. (Exercise. Show this.) So A is similar to (J -?)■ Combining Proposition 16 with these observations gives the following. Corollary Given A G 0(ri), there is a nonsingular matrix C such that CAC~X is a block diagonal matrix of the form (22) where M\ Mk = I, M2 cos Ok — sin 9k sin Ok cos Ok V 0 -I,M3 Mo M. 0 \ Mk J cos 9s — sin 9s sin 9s cos 9s (1) Show that ( C0Sf kn for k € Z. Exercises -sin0 CGS0 )' is not diagonalizable unless 9 =
138 IV. Inner Product Spaces (2) Show that 2 1 7? -1 3^2 1 3 0 4 3\/2 2 A V2 -1 3^ is orthogonal and find a matrix of the form (22) which is similar to A (3) A G Mn(C) is said to be a normal matrix if A fA = lAA (or A G Mn(R) is normal if A1 A = MA). Note that Hermitian and unitary matrices are special cases of normal matrices. Show that A~ \ 1 3 + 2* ) is normal. (4) Suppose A G Mn(C) is normal. Show that (i) A is Hermitian o the eigenvalues of A are real, and (ii) A is unitary o the eigenvlaues of A are all of absolute value 1. (5) Given A G Mn(C) and an A-stable subspace W C Cn, show that W-1 is ^-stable. (6) Let A G Mn(C) be normal. Show that (i) ^ = 0^^1 = 0, (ii) A — XI is normal for any A G C, and (iii) if xA = Ax, then xtA = \x. (7) If A G Mn(C) is normal, show that there is an orthonormal basis for Cn consisting of eigenvectors for A. (Hint: Use exercises (5) and (6) to prove appropriate versions of the observations (1) and (2).) (8) Show that any endomorphism of a finite-dimensional complex inner product space can be represented by a triangular matrix. (9) Suppose the nxn matrix A is normal. If x and y are eigenvectors belonging to distinct eigenvalues of A, show that x and y are orthogonal.
E. Stable Subspaces for Unitary and Orthogonal Groups 139 (10) Recall that GL(n,R) = {A e Mn(R) | det >1 ^ 0}. If P(n) denotes the set oinxn positive definite matrices (see exercise (8) of Section IV C), show that GL{n,R) = 0(n) x P(n) . That is, given A G GX(n,R), there exist B G 0(n) and C G P(n)Dsuch that A = BC. Furthermore, B and C are unique. (Hint: TV — A lA is symmetric, so there is an orthogonal matrix Q such that QNQ~X is diagonal; but the diagonal elements are positive since N is positive definite. Thus there is a (real) diagonal matrix S such that S2 = QNQ-1. Let C = Q~lSQ.) (11) Show that GL{n, C) = U(n) x P(n).
Ill I I I i. I Hi I I ||i< |.. mil iiiIMM M, HP i »l> II «||l||li lill'n ill . liiiUlli. PlUli I HUM INllUll |i||ilillM I'MBwin HHtfliiu ii'f i N if ffS^f|WHHi^WWWiWljlHWtltef,.f||l1P4l(M|*!ffll|iNhilHf ' I I
Chapter V Normed Algebras A. The Normed Algebras R and C We have seen that normed vector spaces are the same as inner product spaces precisely when the norm satisfies the parallelogram law. We now begin our search for normed algebras. Let A be a finite- dimensional real vector space and choose a basis {t?i,..., vn} for A. Suppose that A has a multiplication which makes it an algebra (which is not necessarily commutative or associative) with unit element v\ = 1. If we define an inner product on A by (vi^Vj) = <5;j, then we get the associated norm on A as usual; i.e. We say that A is a normed algebra if (1) \\uv\\ = |H| \\v\\ for all u,v e A . Note that unlike the case of a normed vector space, we require the norm on an algebra to come from an inner product. We begin with the observation that R is a normed algebra. Its norm is ||z|| = y/x? and we easily check that we do have \\xy\\ = \\x\\ \\y\\. So R is our first example. Next we claim that the vector space R2 with its usual norm ||x|| = \/x\ + x\ can be made into a normed algebra by an appropriate definition of multiplication. Indeed, the multiplication will be unique in the following sense. We consider R={(x,0)}CR2 and we insist that our multiplication on R2 extend that on R; i.e., (a, 0)(ft,0) = (aft, 0), and (1,0) is the unit element for the multiplication on R2. Then there in only on© such multiplication making R2 into a normed algebra.
142 V. Normed Algebras To see this, we take the standard basis {(1,0), (0,1)} for R^ and denote these vectors by 1 = (1,0) and i = (0,1). Then any (a, b) G R can be uniquely written as al + bi or, since 1 is the multiplicative identity, as a + bi. We always want multiplication to distribute over addition so we must have (2) (a + bi)(c + di) = ac + bci + adi + bdi2 = ac+ (be + ad)i + bdi2 so we just need to define i2 so that (1) is satisfied. We have ||i|| = Vl2 + 02 = 1 so that ||i|| ||i|| = ||i2|| = 1 by (1). This shows we can write i2 = u + wi with w, w G R and u2 + w2 = 1. Then 1 = ||i3||2 = || (^ + wi)i\\2 = \\ui + w(u + wi)\\2 = \\uw + (u + w2)i\\2 = u2w2 + u2 + 2wif;2 + if;4 , and putting in u2 = 1 — w2 gives 1 = (l-w2)w2+ (l-w2) + 2uw2+wA = it;2 - w4 + 1 - w2 + 2wk;2 + tf;4 = 1 + 2uw2 . Thus we must have 2m*;2 = 0 so that u — 0 or w = 0. If u = 0, then it; = 1 or it; = —1. If w = 1 (and w = 0), then i2 = i which implies i = 1 (see Exercise (3)); i.e., (1,0) = (0,1), which is false. If u = 0 and w = —1, we get i2 = —i, implying that i = — 1, also a contradiction. Thus we cannot have u = 0. So W = 0 and i2 = 1 or i2 = —1. If i2 = 1, we get 0=||i2-l|| = ||i + l||||i-l|| = x/2V/2 = 2, a contradiction. Thus i2 = — 1 and (2) becomes (2') (a + bi)(c + di) = (ac - bd) + (ad + be) i. Exercise. Show that this multiplication is associative, commutative, extends that of R = {a + 0i} C R , and preserves norms. (Hint: Recall the exercises of Chapter 0.) The previous example is typical of the situation in which we are interested. That is, if A is a normed algebra, can we extend A to a larger normed algebra B such that the multiplication of B extends that of A and the norm on B restricts to that on Al If such a B exists, we say that it has A as a subalgebra. An application of (2') to number theory is
A. The Normed Algebras R and C 143 Theorem Let S C Z be those integers which may be written as a sum of two squares (e.g., 29 G S since 29 = (2)2 + (5)2>). Then S is multiplicatively closed. Proof: Suppose p — a2 + b2 and a = c2 -f d2 are in S. We need to show pa G S. Set a = a + bi, f3 = c + di. Then we have pa = (a2 + 62)(c2+d2) = ||a||2||/3||2 = ||a/3||2 o= (ac-6d)2 + (ad+6c)2. I Of course, the normed algebra we found on R2 is just the field C of complex numbers. So now we ask if we can extend this normed algebra on C to a normed algebra on R . The answer is a resounding NO. In fact, we cannot even extend the multiplication on C to an associative multiplication on R . (Later we will get this result without assuming associativity.) Suppose we can extend the multiplication on C to R . We extend the basis {l,i} of C to the basis {1-(1,0,0), i = (0,l,0), j = (0,0,1)} for R3 and note that we must have ij = a + bi + cj with a,b,c G R. If we assume that the extension of C to R** is an associative algebra, then —J — ^3 — *(u) — ai — b + c(a + bi + cj) = (ac — b) + (a + bc)i + c2j and equating coefficients of j gives — 1 = c2, contradicting c G R. Next we turn our attention to R . We shall see that we can make R4 into a normed algebra which is associative but not commutative. Our proof of this will be constructive; that is, we will use some properties of a general normed algebra to decide how we should define a multiplication on R . We use the basis {1 = (1,0,0,0), t = (0,1,0,0), j = (0,0,1,0), fc = (0,0,0,1)} along with the standard norm (i.e., if x = x\ + x2i + x3j + x±k then \\x\\ = \Jx\ + x\ + x\ + x\ ). Also, we want C to be a subalgebra, so we insist that i2 = —1. Let R = Span(l) so that its orthogonal complement is Rx = Span(i,j, k) . Then any q 6 R can be uniquely written as q = a + (3 with a G R and 0 £ R1, Wa call a = Re(q) the real part of q and {3 — Im(g) the imaginary part of q,
144 V. Normed Algebras We define the conjugate of q = a + /? to be q = a- f3 . Then clearly q = q,P = -/3, Re(q) = ^, and Im(q) = ^ Also, for <ji, g2 £ R we have <?i + 92 = qi + <?2 • Exercise. Show that these conjugation formulas hold. We want to prove that for any q G R , we have (3) qq=\\q |2 It turns out to be fairly easy to prove (3) for any normed algebra, so we will do this in the next section. After proving (3) in the general setting, we will finish making R4 into a normed algebra. We call this normed algebra the quaternions and denote it by H. (The H is in honor of W.R. Hamilton, who invented the quaternions in 1843.) But first, we observe that if /? G Rx is any unit vector, then we have w = M2 = i by (3). But /3$ = /?(-/?) = -/32 so that (4) /32 = -1 . In particular, we must have j2 = k2 = — 1. Next, we note that ij must be in Rx (we will see in Proposition 3 that (xw,y) = (x,yw) for any x,y,w G R4, so (ij, 1) = (i,j) = -(i,j) = 0; so by (4), we also have (5) ijij = (ij)2 = -1 since ij is a unit vector. (At this stage, we are assuming that we can construct an associative multiplication.) Multiplying (5) on the left by i and on the right by j gives ji = -ij . Similarly, we see that jk = —kj and ki = —ik. Consequently, H is not commutative.
B. Some General Results, Quartenions 145 Exercises (1) Show that the unit element of any normed algebra is a unit vector. (2) Suppose that A is a normed algebra and that a G A is a unit (i.e., there is a (unique) a"1 G A such that aa~l = a~~la = 1). Show°that iia-iHMr1- (3) Suppose a and b are elements of a normed algebra. Show that if ab = 0, then either a = 0 or b = 0. Use this to show that if i2 = i, then i = 1 (hence i2 ^ i). (4) What is wrong with the following "proof" that j and k are linearly dependent? Since j2 = k2 = — 1, we have j2 — k2 = 0. But x2—y2 factors as {x-{-y){x—y), so we get (j+k)(j — k) = 0. By exercise (3), either j — —k or j = k. (5) Read "Hamilton's Discovery of Quaternions" by B.L. van der Waerden in Mathematics Magazine, vol. 49, #5 (1976), pp. 227-234. B. Some General Results, Quaternions Let A be a normed algebra with basis {t>i,... ,t>n} such that V\ = 1 and the inner product on A is (x,y) =Xiyi-\ \-xnyn where x = ori^x H h xnvn, y = Z/i^i H h ynVn. Recall that the commutator of # and y is [x, y] = xy - yx, and the associator of #, y, z is [x,y,z] = (xy)z-x(yz). As before, R = Span(l) and its orthogonal complement is R-1 = Span(t>2 ,..., vn). Also, we can write x £ A uniquely as x — a + /3 with real part a — Re(a?) € R and imaginary part (3 = Im(#) G Rx. We define the conjugate of x to be #
146 V. Normed Algebras and it is easy to show that n n X+X X-X _ x = x, p = -/?, a = , p = , and x + y = x + y . Notice that if x G R (<£> Im(:r) = 0), then we can consider x to be a scalar. In fact, A need not be normed for this to hold. Exercise. Let B be any real algebra with unit element. Show that a G Span(l) can be considered to be a scalar. Proposition 1 Suppose B is a real algebra with unit element (not necessarily normed). (i) For x,y G B, if one of them is real, then [x, y] = 0. (ii) For x,y,z G B, if any one of them is real, then [x, y, z] = 0. Proof: (i) and (ii) both follow from the preceding exercise and the definition of an algebra. I Prom now on, our algebras will be normed unless we state otherwise. Proposition 2 For any x,y,w G A, we have (i) (xw,yw) = (x,y)\\w\\2 {= (x,y)(w,w)), and (ii) (wx,wy) = (x,y)|H|2. Proof: We prove (i); (ii) can be proved similarly. First we obtain an identity in x and w. We have ||aru;||2 = \\xw\\ \\xw\\ = \\x\\ \\w\\ \\x\\ \\w\\ = \\x\\2\\w\\2 so that (6) (xw,xw) = (x,x)(w,w) . Now we prove (i) by polarizing the identity (6). We put x + y in (6) in place of x to get ((x + y)w, (x + y)w) = (x + y, x + y) (w, w) . We expand both inner products (by bilinearity and symmetry) to get (xw,xw) +2(xw,yw) + (yw,yw) = ({x,x) +2(x,y) + (y,y)){w,w) . Using (6) to cancel some terms in this last equation gives 2(xw,yw) = 2(x,y)(w,w) , proving (i). Exercise. Prove (ii). I Proposition 3 For any x,y,w € A} we have
B. Some General Results, Quartenions 147 (i) (xw,y) = (x,yw), and (ii) (wx,y) = (x,wy). Proof: Again, we prove (i). Let w = c + 6 with c G R and 6 G Rx. Then 0 (xw,y) = (cx,y) + (x6,y), and (x,yw) = {x,cy) + (x,yS). Since (cx,y) = (x,cy), it suffices to prove (i) in the case that w G R-1 (so that w = —w). With the assumption that w G R-1 we claim (7) (x, y)(l + \\w\\2) = (x(l + w), y(l + w)) . By Proposition 2, the right hand side of (7) equals (x,y)(l + w, l + w) = (a:,y)((l, 1) + (1, w) + (w, 1) + («;,«;» . But (l,w) = (w, 1) = 0, so this equals (x,y)(l + ||w||2) as claimed. By proposition 2, we can rewrite (7) as {x, y) + {xw, yw) = (x, y) + (xw, y) + (x, yw) + (xw, yw) . It follows that (xw,y) = —(x^yw) = {x,y{—w)) = (x,yw), proving (i). Exercise. Prove (ii). I Next "we establish an important property of conjugation. Proposition 4 For x,y G A, we have xy = yx. Proof: For any x,y,z,w G A we use (i) and (ii) of Proposition 3 repeatedly to get (zw,xy) = (z,xyw) = ((xy)z,w) = (xy,wz) = (y,x(wz)) = (y(wz),x) = (Wz,yx). Now take u> = 1 to get (z,xy) = (z,yx). Since (,) is nondegenerate, this proves Proposition 4. I We are finally ready to prove formula (3) (i.e., xx = ||x||2 for any x G A). * Proposition 5 For x,y € A, we have (x,y) = Re(xy) = ^(xy + yx) and 80 XX = \\X\\2 .
148 V. Normed Algebras Proof: Recall that {t>i,..., vn} is our basis of A, and we write x = x\V\ + h xnvni y = ytvt H h ynvn- First we consider v^j when i ^ j. If j = 1, then t^- = vi € R1- ; so assume j > 1. By Proposition 3, (1^,1) = <Vi,t;j> = (vu-Vj) = -(vuvj) = 0 . Thus v^- € R"1" whenever i ^ j. Next we note that INI2 = (Vi,Vi) = (IjViVi) = (Vi,Vi) = 1 by Proposition 3, so ^ is a unit vector. But then vfii is also a unit vector; furthermore, v^i is real since ww = ww = ww for any w e A by Proposition 4. Thus v^ G {1, —1} for all i For i > 1, we have £i = — u» so that — u? = t>^ G {1, —1}. If u? = 1, then 0 = v? - 1 = (t* + 1)(^ - 1) 1 I 1 which contradicts the fact that neither V{ + 1 nor u$ — 1 is zero. Hence v? = —1 if i > 1. I Now we use these observations to calculate f # zy = {x1-\-x2v2-\ \-xnvn)(yi - y2v2 ynvn) f = a?iyi + #22/2 H h #n2/n + Vi(x2v2 H h xnvn) - ^T Xiy^v^ Z so that Re(#y) = xiyi H h o:nyn. But this is precisely (#,2/). The fact J that Re(xy) = \{xy + yx) follows from the fact that Re(i<;) = ^(w + w) 5 and Proposition 4. 3 Finally, we see that 5 ||#||2 = (x,x) — Rje(xx) — xx . I 5 Corollary If f5 G R-1 is a txmt vector, then 01 = — 1. Consequently, when || A is associative, V{Vj = — t^ /or distinct Vi,Vj £ {v2,... ,t>n}- Proof: Exercise. I jl Quaternions z We have the basis {1, i, j, fc} for R4 and we know that = ji as -y, jft as -ftj, and fci = -ifc
B. Some General Results, Quartenions 149 (assuming that our multiplication will be associative). But we still need to define ij, jk, and ki. Choose w G R and consider the map Lw : R4 -+ R4 defined by Lw(x) = wx . Lw is called left multiplication by w. This is a linear map since o Lw(ax + by) = w(ax + by) — awx + bwy = aLw(x) + bLw(y) . That is, Lw G End(R4). Letting w = i, we see that £(-*) is the inverse of Li since £(.<) o Li(x) = L(_i){ix) = (~i)(ix) = x and ^ oi(.i)(x) = a:; so Li is an isomorphism of R . By Proposition 2, we have (Li(x),Li(y)) = (x,y)\\i\\2 = (x,y) so that Li is an orthogonal map (i.e., Li takes {l,a,j, &} to another or- thonormal basis of R4). Now Li sends 1 to i and it sends i to —1. Thus Li must take {j, &} to an orthonormal basis for Span(j, k). That is, Li(j) — ij — aj -f bk with a, 6 G R such that a2 + 62 = 1 and 6^0. (If 6 = 0, then we have aj G {j, — j} which implies that i is real.) Similarly, Li(k) = ik = cj + dA; with c, d G R , c2 + d2 — 1, and c ^ 0. Then we have iL;(j) = -j = aij + ta& = a(aj + &&) -f b(cj + dfe) = (a2 + 6c)j H- (aft + bd)k . Equating coefficients of j and A; yields (8) a2 + bc = -l and (9) 6(a + d) = 0 . Since 6^0, (9) becomes
150 V. Normed Algebras Now Li is orthogonal, so we have 0 = (Li(j),Li{k)) = (aj + 6fc, cj + dk) which implies (10) ac + bd = 0 . Substituting a — —d into (10) gives (11) a(c-b) = 0. Now we multiply (8) by c and multiply (10) by a to obtain (80 a2c + 6c2 = -c (100 a2c + a6d = 0. Subtracting (100 from (80 yields 6(c2 - ad) = -c or, since a = — d, 6(c2 + d2) = -c . But c2 + d2 = 1, so we see that (12) b = -c . Prom (11), we know that either a — 0 or c = 6. If c = 6, then we get c = 6 = — c by (12); but then c must be zero, a contradiction. Hence 0 = a — —d and so b — — c G {1, -1}. We choose b = 1 so that we have ij — k and i/c = — j . Multiplying ij = k on the right by j gives so we have defined ij = k, jk = i, and ki = j. These last thr6e relations area easy to remember using the diagram below. i k ^—^ j Thus wa have the multiplication table
B. Some General Results, Quartenions 151 1 i 3 k 1 1 i 3 k i i -1 -k 3 3 3 k -1 —i k k -3 i -1 and using this table (along with distributivity), we can multiply any two elements of R . ° For example, if u = a 4- bi 4- cj 4- dk and v = x 4- yi 4- zj 4- w&, then f 1 ^ uv ~ (ax ~by — cz - dw) 4- (a$/ 4- &£ 4- cw — c?^)i ^ ' 4- (a^ 4- car 4- dy — 6ty)j 4- (aw + dx -i-bz — cy)k . As we noted before, R4 equipped with this multiplication is denoted by H and called the quaternions. Exercise. Show that H is an associative normed algebra (which is not commutative) if its multiplication is defined by the preceding table; also, verify formula (13). Would we still get a normed algebra if we had decided to define ij as —k instead of kl (Yes. In fact, this algebra is isomorphic to H-) So far, we have constructed the normed algebras RCCCH. Can we extend H to a larger normed algebra? In fact we can, but we lose associativity; we can make R8 into a normed algebra having H as a subalgebra. This new normed algebra is called the octonions (or Cayley numbers) and is denoted by O. We will forgo a careful development of the octonians and simply define an appropriate structure on R . (We remark that there are several such structures, but they give isomorphic normed algebras.) We use the standard basis {1 = ei, e2,..., eg} where 1 = (1,0,..., 0), etc., and the standard norm on R . By the corollary after Proposition 5, we know that we must have e\ = -1 for i > 2 ; also, we require that e% ej — ej e<i for i,j G {2,..., 8} with i ^ j. (This property does not follow from the corollary. Why?) We define e2e3 = e4, e2e5 = ee, e2e7 = -e8, e3e5 = e7, ese® = e8, e4e5 = e%, e4e6 = -e>j and we get 14 mora relations by cyclically permuting each set of three subscript! (©.§., 0§§4 » H and e4e2 = e3, etc:.),
152 V. Normed Algebras Thus we know how to multipy any two basis elements, so we can multiply two elements of R° using distributivity. Obviously, O is not commutative; to see that it is not even associative, we calculate ^5 * e7 = e6e7 = —e4 and compare this to ^2 * e5e7 = e2e3 = e4 . Exercises (1) Show that the commutator [, ] : A x A —* A is a bilinear map on the algebra A. Now show that the associator [,, ] is trilinear on A. (2) Show that if we represent R4 as C 0 C = {(a, (3) | a, /? G C} and define (a,P)(>y96) = (cn-6P,6a + py), then this also gives H. (3) Now represent R8 as H 0 H and use the same definition of multiplication as in exercise (2). Show that this also gives O. (4) Write out a formula, analogous to (13), for the product of two octonions. (5) Show that O is a normed algebra which contains H as a subal- gebra. (Showing that O is normed will be easier after we have established the Cayley-Dickson formula.) (6) Let FCZbe those integers which may be represented as a sum of four squares, and let E C Z be those integers which may be represented as a sum of eight squares (e.g., 10 = l2 +12 -f 22 -f 22 GF). Show tht F and E are closed under multiplication. (7) Given q = a -f bi 4- cj -f dk G H where a, b,c,d e R, show that q and q are both roots of the real polynomial p(x) = x2 — 2ax + \\q\\ . (8) Consider the set of n-tuples of quaternions This is not a vector space since H is not a field, but we can still define the notions of scalar multiplication, addition, linear
B. Some General Results, Quartenions 153 maps, etc. For example, if a G H and q = (qi,... ,qn) G Hn, then we define aq = (aqi,...,aqn) . Given a matrix of quaternions A G Mn(H), we define a map <$>: Hn -♦ Hn by <l>(Q) = (Qi,'-,Qn)A. o Show that this map is linear. Also, show that the map \ Qn is not linear since we decided to multiply by scalars on the left. (9) Define an inner product on Hn by (x,y) =xiyi + --+xnyn where x = (xi,..., xn) and y = (yi,..., yn). Show that (,) satisfies the following properties. (i) (x, y) = (y, x) (ii) (x + y, *) = (x, z) + (y, z) (ii') (x, y 4- 2) = (x, y) 4- (x, 2) (iii) (ax,y) = a(x,y) (iii7) (x, ay) = (x, y)a (iv) (x, x) is a real number > 0, and (x, x) = 0 O x = 0 (v) If {ei,..., en} is the standard "basis" for Hn (i.e., ei = (1,0,..., 0), etc.), then (e., ej) = Sy. (vi) (,) is nondegenerate. (10) For x,y G Hn and A G Afn(H), show that (Xi4,y) = (x,y*A) . (11) The matrix ^4 G Mn(H) is said to be symplectic if (xA, yA> = (x, y> for all x, y G Hn. Show that Sp(n) = {A € Mn(H)|j4 is symplectic} is a group under matrix multiplication. Sp(n) is called the symplectic group. (12) Show that A is symplectic <$> (Ax, Ax) = (x, x) for all x € H.
154 V. Normed Algebras C. Alternative and Division Algebras Now we show that normed algebras are rather special. First they satisfy a weak kind of associativity called "alternative," and second they are division algebras. Definition An algebra A is alternative if the associator [x, y, z] = (xy)z — x(yz) is zero whenever two of {x,y, z} are equal. Proposition 6 Any normed algebra A is alternative. Proof: We need to show that [x, w, w] = 0, [w, x, w] = 0 and [w, w, x] = 0. First we prove that for any x, w in A, we have (14) [x,w,w] = 0 . To see this, note that [x, w, w] = (xw)w — x(ww) = (xw)w — x\\w\\2 by Proposition 5. For any yGi, note that ((xw)w, y) = (xw, yw) by Proposition 3 = (x, y) 11 w 112 by Proposition 2 = <*|MI2,i/>. But (,) is nondegenerate, so (xw)w = x\\w\\2 proving (14). Quite similarly, one proves (15) [w^w^x] = 0. Now write w = c 4- 8 with c G R and 8 G Br1. Then [x, w, w] = [x, c 4- <5, c 4- 8] = [x, c, c] 4- [x, c, 8} 4- [x, <5, c] 4- [x, <5,6] since [, , ] is trilinear. The first three terms contain c G R and thus they are zero by Proposition 1; thus [x, w, w] = 0 ^ [x, 6,6] = 0 . But 6 = —6, so that [x,6,8] = —[x,6,6] which is zero by (14). Similarly, (15) implies that [w,w,x] = 0. It remains to prove that [w, a?, w] = 0. Since [w, w) x] = 0, we know that 0 = [x + w}x + w,w]
C. Alternative and Division Algebras 155 and the right hand side is [x, x, w] -f [x, w, w] 4- [w, w, w] 4- [w, x, w] . But the first three terms are zero by (14) and (15). This proves [w, x, w] = 0 and completes the proof of Proposition 6. I Definition An algebra A with unit element is a division algebra if each nonzero a € A lias a unique left and right inverse (i.e., there is a unique a~l G A such that aa~l = a~la = 1). Proposition 7 Any normed algebra A is a division algebra. In fact, given a, 6 G A with a ^ 0, the equations ax = b and xa = b can be solved uniquely for x. Proof: For a ^ 0, we have a-1 = tt^W since aa = \\a\\2 (by Proposition 5). Now note that {ba)a = b(aa) by (14), so the solution of xa = b is x = jj^w. Similarly, the solution of ax = b is x = v^m. I We conclude this section with some important formulas for a general normed algebra A. (Proofs are outlined in the exercises.) Suppose x,y G A are orthogonal, i.e., (x,y) = 0. Then (16) xy = -yx , (17) x(yw) = —y(xw) , and (18) (wy)x — ~{wx)y for an arbitrary w G A. Exercises (1) Using the fact that 2(x, y) = xy -f yx in a normed algebra A, show that 2(x, y)w = [x, y, w] + [y, x, w] + x(yw) + y(xw) for any x,y,w G A. (2) Show that the formula in exercise (1) reduces to 2{x, y)w = x(yw) 4- y(xw) by considering the associator [x 4- y, x + t/, u;]. (3) Now show that 2{#, j/)w = (wy)£ 4- (wx)y for all a1,2/, w € X.
156 V. Normed Algebras (4) Assuming that x and y are orthogonal, prove formulas (16), (17), and (18). (5) If the (not necessarily normed) algebra A is alternative, show that [w, w, x] = 0 for any w, x G A. Similarly, [x, w, w] = 0 and [w, x, w] = 0. D. Cayley-Dickson Process, Hurwitz Theorem Let B be a normed algebra and suppose A is a subalgebra with 1 G A. (Note that A is also a normed algebra.) Then RCA and so A1- C R-1. Let e be a unit vector in A^. Then e G R-1 so that e2 = —1. Proposition 8 Ae is orthogonal to A; i.e., Ae C A1-. Proof: Since A is a normed algebra, we know that xG A&xG A . Now any element of Ae is of the form ae for some a G A. Thus for any b G A, we have (ae, 6) = (e, a&) = 0 since ab G A. Hence ae G A-1. I Note that Ae is a subspace of the vector space B, so we can form the sum A + Ae . By the previous proposition, AC\Ae = {0} so this sum is direct. That is, A® Ae is a subspace of B. In fact, we can make A © Ae into a subalgebra. Theorem 1 A ® Ae is a subalgebra of B and the multiplication in A 0 Ae is given by (19) (a + be)(c + de) = (ac - db) + (da + bc)e Proof: Clearly (19) implies that A ® Ae is a subalgebra, so we just need to prove (19). Now (a + be)(c + de) = ac-{- (be)(de) + a(de) + (6e)c , and the first term ac is already in the desired form. By Proposition 8, we have (6, e) = 0, (d, e) = 0, (d, be) = 0, etc. So we can use the formulas (16), (17), (18). Recall (17) says that for (x,y) = 0 x(gw) = -i/(£it;),
D. Cayley-Dickson Process, Hurwitz Theorem 157 Let x = be and y = d and w = e. Then (6e)(de) = -d{(be)e) = ~d((-be)e) = -d(b(ee)) = -db where the second equality follows from the fact that be G Ae C R-1, and the third is true because B is alternative. Next, by (16) and (17), we have a(def= a(—de) = a(—ed) = a(ed) = —e(ad) = e(da) = —(da)e = (da)e ; and finally, by (18), (be)c = —(bc)e = (bc)e . I Exercise. Show that the conjugation on A 0 Ae is given by a + 6e = a — be . We now use Theorem 1 to motivate the Cayley-Dickson Process. Let A be a finite-dimensional real algebra with unit element and a conjugation such that § = x and xy = yx. We are not assuming that A is normed, but we can define the real and imaginary parts of x G A by Re(x) = -{x + x) and Im (x) = -(x — x) . Zi Zi (See exercise (1).) We make the (external) direct sum B = A 0 A into an algebra with unit element by defining (a, 6)(c, d) = (ac — d&, da + 6c) ; and we define conjugation on B by (a7&) = (a, -b) . Exercise. Show that this conjugation on B satisfies x — x and xy = yx. (Thus we can repeat this construction, forming B 0 £?, etc.) What are the real and imaginary parts of (a, 6) G Bl We say that I? is obtained from A by the Cayley-Dickson Process. So far, we have C = R0R, H = C0C, and O = H0H by using the Cayley-Dickson Process (see (2') and the exercises of section B). Now we regard A as a subalgebra of B = A 0 A by including it as the first factor. If we choose a unit vector e from the second factor, then we can write B as the internal direct sum B = A 0 Ae with multiplication given by (19).
158 V. Normed Algebras Theorem 2 Suppose B = A(&Ae is obtained from A by the Cayley-Dickson Process. Then (i) B is commutative 4=> A = R, (ii) B is associative ^ A is associative and commutative, and (hi) B is alternative ^ A is associative. Proof: Suppose x = a + ae, y = b + /?e, and z = c + ^e are elements of B with a, a, 6, /?, c, 7 G A. We will state some rather incredible-looking formulas, show that these prove the theorem, and then show that they are true by using the Cayley- Dickson formula (19). (20) [x, y] = [a, b] + 2 Im (a/?) + 2(/3 Im (a) - a Im (b))e (21][x,y,z] = [a,7/?] + [&,7a] + M<*] + (a [6,c] +/? [a,c] +7 [a, 6] +a [A 7] + [a, 7] £ + 7 [a,0] )e (22}x, x, y] = [a, /3, a] + [a, 6, a]e Suppose A = R so that all of the terms in the right hand side of (20) zero. Thus B is commutative. (i) =>• (by contrapositive) Suppose A^Rso that dim A > 2. Now R = Span(l) C A so that A contains C = R0R by Theorem 1. But then B must contain H = C®C so that B is not commutative. (ii)«= If A is commutative, then any commutator with both entries from A will be zero. This is precisely the case in the right hand side of (21), so B must be associative. If B is associative, then A certainly is; furthermore, the associator [x, 2/, z] is always zero. In particular, the A-part in (21) [a,7/?] + [&,7a] + [c,0a] is always zero. Let a,/3 G A be arbitrary, let 7 = 1, and let b = c = 0. Then we see that [a, /?] = 0 so A is commutative. (in) <= If A is associative, then the right hand side of (22) vanishes. Thus [x, x, y] = 0 for all x,y e B, Looking at the proof of Proposition 6, we see that this is sufficient to prove that B is alternative,
D. Cayley-Dickson Process, Hurwitz Theorem 159 (iii) =* If B is alternative, then [x, #, y] = 0 (see exercise (5) of the previous section). By (22), we have [a,J3,a] = 0 for any a, /?, a G A so A is associative. Now we show that the formulas (20), (21), and (22) hold. Using (19), we calculate xy =& (a + ae)(b + /3e) = (ab - J3a) + (/3a + a6)e and yx = (b + /3e)(a + ae) = (6a - a/3) + (a6 + /?a)e . Thus xy - yx = ab - ba + (a/3 — /3a) + (/3(a — a) — a(b — b))e . Recall that 2 Im (w) = w — ?D, so this last equation becomes (20). To establish (21), we must assume that A is associative. (This is no restriction since A is always associative in case (ii).) Now [x,y,z] = (xy)z — x(yz) and using (19), we calculate (xy)z = {(ab -jia) + (/3a + ab)_e}(c + je) = (a6 - /?a)c — 7(/?a + a6) + {j(ab - j3a) + (/?a + ab)c}e and #(yz) = a(bc - j/3) - (fry + c/?)a + {(76 + f3c)a + a(c6 - /?7)}e . So we have (23) r 1 _ / (ab)c ~ (Pa)c ~ 7(/?a) ~ 7M0 J 7(a6) - 7(/?a) + (/?a)c_+ (a6)c + 1 -(76)a-()9c)a-a(a)+a(j97) '6 Since A is associative, (ab)c — a(bc) = 0 in the A-part. Also, in the A-part, we have c(j3a) - (f3a)c = [c, (Ja] (terms 8 and 2) a(j/3) — (7/3)a = [a, 7/?] (terms 6 and 3) 6(7(2) — (7(2)6 = [6,7a] (terms 7 and 4) . Hence the ^4-part of (23) is the same as in (21). For the Ae-part, we have 7(a6) - 7(6a) = 7[a, 6] (terms 1 and 5) /3(ad) - 0(da) = /3[a, c] (terms 3 and 6) a(ls) - a(Sb) « a[g, c] (terms 4 and 7) .
160 V. Normed Algebras We are left with the terms a(/?7) — 7(f3a) which we rewrite as afij — aj(3 + aj(3 — ja(3 + jaj3 — jfia = a[&7] + [a,7]j9 + 7[a,j9], so (21) holds. To establish (22), we must assume that A is alternative (which is the case in (iii)). We calculate xx = (a + ae)(a — ae) = aa-haa and so (24) (xx)y = (aa + aa)b + (aa + aa)(3e . Also, xy = (a — ae)(b + /?e) = (a& + /?a) + (/?a - ab)e so that x(icy) = (a + ae){(a6 + /3a) + (/?a — ab)e} = a(ab + J3a) — (a/5 — ba)a + {(/3a - a6)a + a(ba + a/?)}e = a(a&) + a(J3a) — (aj3)a + (ba)a + {(/?a)a - (a6)a +_a(6a) -f a(a/?)}e = (aa)6 + b(aa) — [a, /?, a] + {/?(aa) + (act)fi — [a, 6, a] }e where the last equality follows from the fact that A is alternative. Now note that ww G R for any w € A. (Exercise. Show this.) So we have /25x x(xy) = (aa + aa)b - [a, /?, a]_ ^ ' + {(aa + aa)^ — [a, 6, a] }e . Subtracting (25) from (24) gives (22) which finishes the proof of Theorem 2. 1 An important corollary of Theorem 2 is Theorem 3 (Hurwitz Theorem) The only real normed algebras are R, C, H; and O. Proof: Let Bbea normed algebra and let R = Span(l). If B = R, then we are done. If not, then Rx 7^ {0} so we can choose a unit vector i G Rx. By Theorem 1, C = R 0 Hi is a subalgebra of B. If B = C, then we are done; otherwise, we choose a unit vector j G C1. Then H = C 0 Cj is a subalgebra of B. If B = H, we are done. If not, we choose a unit vector e € H1 so that O = H 0 He is a subalgebra of B.
D. Cayley-Dickson Process, Hurwitz Theorem 161 If O is not all of B, then B contains the subalgebra O 0 Or where r G 0^~ is a unit vector. But B is a normed algebra so that O 0 Or is also normed. Hence O 0 Or is alternative (by Proposition 6) so that O must be associative by Theorem 2. Since O is not associative, we must have B = O. I Exercises (1) Let A%e a finite-dimensional real algebra with unit element, and choose a basis {^i,... ,vn} for A such that vi = 1. For x = cci^i H + xnvn G A, define X = Xi — X2V1 XnVn and show that this conjugation satisfies x = x and xy = yx. Also, show that Re(cc) = \{x + x) is, indeed, in Span(l) = R. (2) Show that the Cayley-Dickson process works. That is, show that the multiplication defined on B = A 0 A makes B into an algebra with unit element. (3) Consider the algebra B = O 0 O obtained via the Cayley- Dickson process. Show that ||x||2 = xx still holds, where || || is the standard norm on R16. Conclude that B is a division algebra. However, given a, b G B, the equation ax = b cannot always be solved uniquely for x since B is not alternative. (4) Let A be the subalgebra (with unit element) which is generated by any two elements of O. Show that A is associative. (Hint: If x, y G O generate A, write these as x = X\ + #2, y = yi + 2/2 where x\ = Re(#) and X2 = Im (x), etc. If A ^ R, consider R 0 Re C A where e = pjir.) (5) We have shown that xx = \\x\\2 and xy = yx hold in a general normed algebra. Prove these directly for O (without assuming that O is normed). (6) Use exercises (3) and (4) to give a proof that O is a normed algebra (i.e., satisfies (1)). This also shows that R, C, and H ar© normed.
162 V. Normed Algebras (7) Given cc, y, z G O, prove the identities (xyx)z = x(y(xz)) , z(xyx) = ((zx)y)x , and (xy)(zx) = x(yz)x . Note that the expression xyx makes sense since O is alternative. (Hint: If any two of {#,?/, z} are equal, then these equations reduce to zero by exercise (3). Show that these formulas are linear in y and z, so we can assume x,y, and z are orthogonal. Now apply exercises (2) and (3) from the previous section.)
Bibliography [1] Morton L. Curtis, Matrix Groups, second edition, Universitext Series, Springer-Verlag New York, Inc. (1984). [2] Paul R. Halmos, Finite-Dimensional Vector Spaces, second edition, Un- degraduate Texts in Mathematics Series, Springer-Verlag New York, Inc. (1974). [3] F. Reese Harvey and H. Blaine Lawson, Jr., "Calibrated Geometries," Acta. Math. 148 (1982), 47-157. [4] I.N. Herstein, Topics in Algebra, second edition, John Wiley & Sons (1975). [5] Serge Lange, Linear Algebra, second edition, Addison -Wesley Publishing Co., Inc. (1971). [6] Seymour Lipschutz, Linear Algebra, Schaum's Outline Series, McGraw- Hill Book Co. (1968).
Index, abelian group, 2 absolute value, 6 adjoint, 129 algebra, 18, 56 alternative, 154 associative, 18, 57 commutative, 57 division, 155 normed, 155 algebraic multiplicity, 79, 105 algebraically closed field, 4 annihilator, 45 associator, 60, 145 basis, 22 bilinear form, 113 block diagonal matrix, 103 Cayley numbers, 151 Cayley-Dickson formula, 156 Cayley-Dickson Process, 157 Cayley-Hamilton Theorem, 84, 95 change-of-basis matrix, 53, 116 characteristic of field, 4 characteristic polynomial, 83, 95 classical adjoint, 71 coefficients of a polynomial, 87 coefficients of linear system, 72 cofactor, 71 column, 48 column space, 35 commutator, 60, 145 comparable, 44 complementary subspaces, 28 complex numbers, 6 conjugate, 53 in normed algebra, 145 of complex number, 6 of a matrix, 125 of quartenion, 144 coordinate vector, 51 coset, 32 curve, 133 differentiate, 133 product, 133 degree of polynomial, £7 determinant, 63 diagonal matrix, 14, 65 diagonalizable, 104, 106 dimension, 24, 36 direct sum external, 28 internal, 30 distance, 111 divide, 90 Division Theorem, 88 divisor of zero, 3 dual basis, 42 dual space, 41 echelon form, 69
166 eigenspace, 39 eigenvalue, 39, 76. 84 eigenvector, 39 elementary column operations, 70 elementary matrix, 69 elementary row operations, 68 elimination, 73 endomorphism, 18 End(V0, 18 epic, 16 exact sequence, 31 extension field, 4 exterior algebra on IR3, 58 on IRn, 61 Factor Theorem, 89, 90 field, 2 finite-dimensional, 22 finite set, 22 functional, 41, 120 fundamental theorem of algebra, 4 general linear group, 19, 65 generates, 12 geometric multiplicity, 39, 105 Gram-Schmidt orthnormalization process, 118 greatest common divisor, 90 group, 1 Hamel basis, 24 Hermitian form, 124 Hermitian inner product, 123 Hermitian linear, 124 Hermitian matrix, 128 Hermitian symmetric, 124 Hilbert Space, 117 Index homogeneous system of equations, 76 homomorphism, 82 Hurwitz Theorem, 160 i,j entry, 48 ideal, 82 idempotent, 77, 97 idempotent map, 20 identity map, 15 identity matrix, 52, 57 image, 16 imaginary part, 143, 145 indeterminate, 87 infinite-dimensional, 22 infinite set, 22 injective, 16 inner product, 111 inner product space complex, 124 real, 116 integral domain, 85 inverse, 52 irreducible, 91 isomorphism of vector spaces, 16 Jacobi identity, 60 join, 36 Jordan block, 103 Jordan form, 104 kernel, 16 ^2-space, 117 Laplace expansion, 67 left multiplication, 149 left value, 89
Index 167 Lie Algebra, 60, 131 Lie Group, 132 line, 36 linear combination, 11 linear geometry of H3, 36 linear map, 15 linearly dependent, 21 linearly independent, 21 matrix, 14, 47 matrix algebra, 58 matrix multiplication, 50 matrix representation of a bilinear form, 115 of a linear map, 48 maximal element, 44 method of elimination, 13 metric, 111 minimal polynomial, 92 monic map, 16 monic polynomial, 7, 87 natural map, 43 nilpotent, 28, 77 nondegenerate, 112 nonsingular matrix, 15 nonsingular, 52 norm, 111, 125 normal matrix, 138 normed algebra, 141 normed vector space, 120, 125 oct onions, 151 operator, 18 orthogonal, 118, 125 orthogonal complement, 122 orthogonal group, 130 orthogonal map, 118 orthogonal matrix, 118, 130 orthonormal, 118 orthonormal basis, 118 parallel, 36 parallelogram law, 120 partially ordered set, 43 permutation, 6 plane, 36 point, 36 polar form, 120 polynomial function, 87 positive definite bilinear form, 114 positive definite matrix, 128 . preimage, 16 projection, 20 quadratic form, 116 quaternions, 144, 151 quotient space, 33 rank, 55, 115 Rank Theorem, 25 real inner product space, 116 real part, 143, 145 relatively prime, 91 Remainder Theorem, 90 ring, 81 associative, 81 commutative, 81 homomorphism, 82 root, 4, 87 rotation group, 130 row, 48 row equivalent, 68 row reduced, 69 row space, 55 scalar, 9
168 Index Schwarz Inequality, 112, 115, 124 similar matrices, 53 simply ordered set, 44 singular matrix, 15 skew-Hermitian matrix, 128, 131 skew-symmetric matrix, 130, 131 span, 11, 12 special linear group, 133 special orthogonal group, 130 special unitary group, 130 Spectral Theorem, 98, 101 split exact sequence, 31 splitting, 31 square matrix, 52, 57 stable subspace, 37 standard basis, 23 subfield, 4 subgroup, 5 submatrix, 56 subring, 82 subspace, 11 sum of subspaces, 14 superdiagonal, 101 surjective, 16 symmetric bilinear form, 114 symmetric matrix, 68 symplectic group, 153 symplectic matrix, 153 system of equations, 72 terms of a polynomial, 87 torsion, 3 trace, 54 transpose, 65 triangle property, 111 triangular matrix, 65 trivial solution, 72 unit, 19, 81 unit element, 18, 57, 81 unit vector, 118 unitary group, 125 unitary map, 125 unitary matrix, 125, 130, 134 upper bound, 44 upper triangular, 14, 65 value, 89 vector, 9 vector space over fc, 10 real, 9 Zorn's Lemma, 44
o