Text
                    1 ‘ Н «1
LINEAR ALGEBRA
L’1 ? fforu-[: ’]
Everyone
Л= » o|
GILBERT STRANG


LINEAR ALGEBRA FOR EVERYONE GILBERT STRANG Massachusetts Institute of Technology WELLESLEY-CAMBRIDGE PRESS Box 812060 Wellesley MA 02482
Copynght 02020 by Gilbert Sum* ISBN Г78-1-733М**-3"® be reproduced or stored or transmitted AU righureaerved Nopejof^nra ^„„«0 from by my tneam. 1,к11мЬп* «у language >t strictly prohibited — Wellesley - Cambridge Pres. . „blisber. author red traasbbom ле arranged by , HTfcX lypewtung by AiMey 987854321 QAI84 2 5773 21ИО|ООС512/3-<1с23 Other teat, from WeUesley • Cambridge Press Linear Algebra ami Learmng from IMU. 2019. Gilbert Strang Introduction to Linear Algebra. Sth ErL. 2016. G.lbet Strang Wavelets and Filar Валка. Gilbert Strang and Truong Nguyen Introduction to Applied Mathematics Gilbert Strang Calculus Third Edition. Gilbert Strang ISBN 978-O-692I963-8-0 ISBN 978-0-9802327-7-6 ISBN 978-0-9614088-1-7 ISBN 978-0-9614088-7-9 ISBN 978-0-9614088-0-0 ISBN 978-0-9802327-5.2 Чц.-.,!,». Г«е i:i-u..l Pooimniny. Kai Bone ft Gilbert Strang ISBN 978-0-9802327.3.8 Essays Й1 linear Algebra, Gilbert Strang ISBN 978-0-9802327-6-9 DUTervaUal Fquatiom and linear Algebra. Gilbert Strang ISBN 978-0-9802327-9-0 An Analysis of the Finite Element Method. 2008 edition. Gilbert Strang and George Fix ISBN 978-0-9802327-0-7 Wellesley Cambridge Free. Bos 812010. Wellntey MA 02482 USA WWWwelledeytainbridgr.com I. AFFxreryoneW gmail.com Gilbert Strang « page math.niit.edu/ - gs Fororden math.mit.edu/wTborder.php Outude USACanada: w ww.cambridge.org Select books. India: www.w ellrdcy publishers .com The website for this book (with Solution Manual) is math.mit.edu/evcryone 2019 book Linear Algebra and I-earning from Data (math.mit.edu/leamingfnimdiita) 2016 book Introduction to Linear Algebra. Sth Edition I math.mit.edu/llnearalgchra 1 2014 book Differential Equations and linear Algebra (mathmitcdu/dda) Linear Algebra is included in МП". OpenCouneWare ocw.mit.edu/courvrUmatheinatlcs Those videos (including 18 06SC and 18 065) are aim on www.youtube.com/mitoc w 1H 06 Linear Algebra 18 06SC with problem solution. 18.065 Learning from Data MATLAB* is a repuered trademark of The Math Works. Inc The cover design was created by Gad Corbett and Lot. Seilers lseHcrMiesign.com
Table of Contents Preface T 1 Vectors and Matrices I l.t Linear Combination* of Vector*..................................... 2 1.2 Length* and Angie* from Dot Product* ........................... 11 IJ Matrices and Column Space* ..................................... 20 1.4 Matnx Multiplication and Л m CJ2 ................................ 29 2 Solving Linear Equations Ax = b 39 2.1 The Idea of Elimination........................................... 40 2.2 Elimination Matnce* and Inverse Maine** ......................... 49 2.3 Matri* Computation* and A m LU.................................... 57 2.4 Permutation* and Transpose* ....................................... M 3 The Four Fundamental Subspaces 74 3.1 Vector Space* and Subapace* ...................................... 75 3.2 The Nullapac* of A: Solving Ax - 0................................ 83 3.3 The Complete Solution to Ax - 6................................... 96 3.4 Independence. Baai*. and Dtmenuoo ............................... 107 3.5 Dimension* of the Four Sdbtpacc*................................. 121 4 Orthogonality 154 4.1 Orthogonality of the Four Subspace* .............................. 13$ 4.2 Projection* onto Subspace*...................................... 143 43 Least Square* Approximations....................................... 153 4.4 Orthogonal Matrices and Gram-Schmidt.............................. 165 5 Determinants and Linear Transformations 177 5.1 3 by 3 Determinant*.......................................... 173 5.2 Propertie* and Application* of Determinants..................... 184 5.3 Linear Transformations........................................ 192 6 Eigenvalues and Eigenvectors 201 6.1 Introduction to Eigenvalue* .................................... 202 6.2 Diagonaliting a Matnx........................................... 215 6.3 Symmetric Positive Definite Matrices............................ 227 6.4 System* of Differential Equations............................... 243 iii
Table of Conient^ 8 7.1 Sm^ValursaodSm^**»' _ 7.1 72 Compre»"* l®<e* 7.4 Learning from Data IJ Muuminn|bn»by<Jf*1,CT*Dofe*“ 158 259 269 274 280 286 289 299 306 321 Appendix 1 The Ranks of A В and A + В 334 Appendix 2 Eigenvalues and Singular Values: Rank Ono 335 Appendix 3 Counting Parameters in the Basic Factorizations 336 Appendix 4 Codes and Algorithms lor Numerical Linear Algebra 337 Appendix 5 Matrix Factorizations 338 Appendix 6 The Column-Row Factorization of a Matrix 340 Appendix 7 The Jordan Form of a Square Matrix 343 Appendix 8 Tensors 344 Appendix 9 The Condition Number 345 Appendix 10 Markov Matrices and Perron-Frobenius 346 Index of Symbols Six Great Theorems Linear Algebra In a Nutshell 356
Preface This i* a linear algebra textbook with a new Mart Chapter 1 begin* a* usual with vector*. We rec (heir linear combination* and dot product*. Then the new idea* come with matrices. Let me illustrate those idea* right away by an «ample Suppose we are given* 3by 3 mam* A with column* at.*, *!; There column* are threedimensional vecton The tint vector* nt and a* connect the center point (0.0,0) io the pouu* (1,3.4) and (2,4,2). The picture show* those point* In 3-dimen>*onal apace (apt space) The key to this num* u the third vector a* going to the point (3,7,6). When I look at thow vector*. I tee something exceptional Adding column* 1 and 2 produce* column 3 In other word* O| + o> « a*. In а 3 dimensional picture, a, and a, go from the center point (0,0,0) Io the point* (l.3,4)and(2.4.2). The picture shows how to add those vector*. It i* normal that all crwnbtaatxwi* of two vector* will fill up a plane (The plane ia actually uifiruse. we juu drew the part between the vector* I What I* really exceptional i* that the third point o* =(3.7. •) lie* on thi* planr of a, and 03. Mom points don't lie on that plane Mint vector* a* are nor combination* of O| and Oj. Mont 3 by 3 matrsces have independent columns Then the malm will be invertible. But there three column* are dependent because they lie on the «ante plane: o> + <*i - o> at» (1.3.4) (3,7.0) -oi+ u3 Three vectors sharing a plane inside 3-dimensional space оз = (2.4.3) (0,0.0)
1Jnear Algebra for Everyone vi fact about th°« ,hreC veC,OTS a‘ °2,a3. TKu pawre raveab Ле ««> ™a Ю их Ле >dca. and to get better ButweneedthengbtlangiupeloJocn ‘ djrection and better at expresung it Here are three sleps tn ago» of column 1 and column 2 Idea in words Idea In symbols 2 3 Matri, times vector Step 3 shows how a mana C mntapbes a rector x. The columns a, and <4 in C multiply the numbers a, - 1 and и lax Пе output C* U в combination of the columns Here that аЛшш cooteaaboa Bill + ®j One more creaal sup allows several combustions al once This is the way forward. We cannot take this step with paeans or worts A matrix multiplies a matrix 4 Mitrn time, main, A • CR a «1 “1 10 1 Oil Column 1: lai + Ota, a( Column J; 0о> ♦ log - e> Colman 3 le, * lo, . (l,g,4) + (3,4.3) - (З.Т.в) > а. That matns nwikspbcatioo A CR display, mayor information; A has dependent columns the amtnaahoo of column, 1 + 2-3 gives (0,0,0) 4 also has dependent row, none conbaatsoo of its rows wiQ give (0.0,0) I The foi""4>«er_ofth«s A wooly «plane and not the whole 3D space The -row tpotr- of this л aho a plane and not the whole 3D space The «pare mama A haanotorem It, drrermtnaat n aero. It is unuwml. In Chapter, I IO 4. the orgmu». idem — -------' — - Ktor 'Paras The columns of A arc in «n-dmieneoMj sp«x R- But the action is in b mtrXT X”* lPOCe “d **“ a W Лх"“" " • row space pan and a nullspace part: f fmeor ,,Ujmwi a* = b ___
Preface **' The Plan for the Book Thai example is pan of a new start 1 believe it is a better start (for the reader and the course). By working with specific matrices to introduce the algebra, the subject unfolds. Chapter 1 develops the maths equation A “ C times Я C lakes independent vectors like в! and a? from the column space of А Я takes independent vectors from the row space of A. Those two “vector spaces" an at the center of linear algebra. We meet them properly in Chapter 3. But you will know about independence from examples in Chapter 1. Die big sup is io factor A uuo C wrus R Matrix muhipbcauoa is a crucial operation, and Chapter I ends with four different ways to do rt—seen on the back cover of the book This sets the path to all the great factorizalioas of linear algebra A « LU Chapter 2 solves n equations Ax » b in n unknowns: A is square A = CR Chapter 3 reduces to r independent columns and r independent rows A — QR Chapter I change* the columns of A into perpendicular columns of Q 3 “ QAQr Chapter 6 has eigenvectors in Q The eigenvalues of S are in A A > UE VT Chapter 7 has singular vectors in I/ and V and singular values in E The column* of A an tn rn dimensional space R"*. the rows are in R' The m by n matrix multiplies vecton x in R“ to produce Ar а в' But the real action of A Is seen in the four fundamental subspaces Chapter 2 only allows one solution x„ The matrix A is square and invertible Chapter 3 find* every solution to Аж - b by adding every Chapter 4 deals with equa- bom that don't have a solution (because b has a piece from the mysterious fourth subspace) 1 hope you will like the “big picture of linear algebra" on page 124 all four nibspace* Those hve factorization* are a perfect way to organize and remember linear algebra The eigenvalues in the matrix A and the singular values in E come from S and A tn a beautiful way—but not a simple way. Those numbers in 5x - Xx and Ao ж tru see deeply into the symmetric matrix S and the m by n matrix A. Often 5 appears in engineer- ing and physics Often A is a matrix of data. And data is now coming from everywhere. Please don't miss Tim Baumann's page 272 on compressing photographs by the SVD Chapter 5 explains the amazing formulas produced by determinants Amazing but unfortunately difficult to compute' We solve equations Ax = b before (not after) we find the determinant of A Those equations ask a* Io produce b from the columns of A. Ax > s, (column 1) + Жа (column 2) + •• • + r, column я) - right aide b In principle, determinants lead to eigenvalues (Chapter в) and singular values (Chapter 7). In practice, we look for other ways to find and use those important numbers And yet determinants tell us about geometry too—like the volume of a tilted box in n dimensions. A short course can go directly from dimensions in 35 to eigenvalues for 2 x 2 matrices.
Lmear Ale»*™ ,or Everyone viii № added Chapter 8 oa Final Chapter: Learning from Dau ^ofte»coa«'«»re^ul"n“‘IW* number n is large, and the number of . coclied linear algebra has to find out underfunding that leads to a decision, functioo of the input. Deep learning In machine learning the output в a produces (often with giant compuU- mrm to find that furutioo from the ««Й» fel|urcs of Hons) a learning function Ffa,»)^ «*“*, * ц* waghtt assigned to those the rapou and the weight*. (*> = ^^O^y.hrutapfitaatoa. Optimizing the weights so learn from die trammg ttata v .. the btg computation. When that u well done. « ca. mp- new sampio that the lystem lus never Seen The BMxesa of deep learning ta that » <Л» ck* ю correcl OUW The system Ьм identified aa uwsy or translated a sentence, or chosen a winning move. This ujsivcrwurtowar algefvu. a mutate of weight matrices and ReLLl It is included with no expectatson of testing stoderes Thn is a chapter to use late, in any way you want Yo« could experiment with the webene plas ground.lrmorflow.org Machine learning has become important and powerful based on linear algebra and calculus i optimizing the weights) and on suusnci (controlling the mean and variance). Snr « u wor nrpmnrd or rvrn apetled to be pan of Ле roans Whai I hope is lhai the faster start ailoen you to reach ngemotors and smgafar Mines—chose are true highlights of this subject. Chapters 2 m 7 roeifinn the jump of intuition near the end of Chapter 1; If all columns ot A Ise oo an r~dimems<»al plane, then all rows of A also lie on a (usually different) r-dunemamal plane That fact has far-reaching consequences Thu й a textbook for a normal linear algebra dasa—to explain the key ideas of this beautiful subject to mryvrw As deasly as I can Thank you. Gilbert Strang **4 “Wk That 3 by 3 matrix I*™ “! •X «cpcnuem ПЖ1 ine numbers to show this are -S and 3; “5 (row 1) + 3 (row 2) » row 3 of A -S(i.a.») + 3(S,4,7)e(4.2,e).
Preface lx Websites for Linear Algebra The dedicated website foe this textbook is malhmiLcdu.rirnooe Seven! icy sections of the book can be downloaded. That sue also has bncf solutions So the problem sets (For homework , the instructor may ask for more detailed solutions.) Every class will find a balance between learning the essential ideas of this subject and practicing the steps on small matrices. The most beautiful website for linear algebra u 3Bluc I Brownxum. created by Grant Sanderson. He chose that name baaed on an unusual genetic feature of his eyes. For blackboard lectures you could go to the OpenCourse Ware site ocw.mll.edu created by МГГ The videos for Math 18.06 and Math 18 06SC and Math 18.065 have had millions of viewers, very often accessed through YouTube 18.08 is MITs large linear algebra course The 18.06SC videos include problem solutions by the instructors 18.065 is the newer course that leads to Chapter 8 of this book and to math.mil.rduAearnlngf rumdata Important to add The "new sasrT in this book was hrM tested ia 18 065—which begins with a substantial review of linear algebra. Two more online materials were added in 2020 to the 18.08 she: A 70- minute video outlines the new sun tn teaching and learning linear algebra That "2020 vision" has guided Secoons I 3 and 1.4 of this new book I am convinced tliat working with the columns of actual matnees is a direct route Io understanding linear independence and column spaces and mama mstiliplicabon The column-row factorualion A w CR is al the heart of solving linear equations Лг 6 Professor Raj Rao has developed a very successful course on computational linear algebra srul machine teaming at the Umvemry of Michigan The key idea is to im- plement (in class 1) what you learn The website mynerva.io describes the course and the online textbook and future plans. That textbook is complementary Io this one. Property used, the Web has become truly valuable Io students and all readers. Il pro- vides a different way Io explain linear algebra, and k is alive' Please use the video lectures to see the flow of the course And please use the book to capture that flow and hold it and practice it and understand it. This book begins with independent vectors from the column space and row space: A = CR The book ends with orrAogcmof vectors from those spaces. the u's and v's in the SVD with Av w ma. In between this we have the central ideas of linear algebra.
X Computational Linear Algebra nuly Linear algebra is often the key to .^ment and finance. This page aims to We need to computations in engineering and science r_toraputers and math provide an updase O<1 hard power and son pm«r^ ^^^ws The leaders are Hie 500 fastest computers m the !a(j „ can give their locations: Kobe, mostly paid foe and controlled by *°^ПГОС xlI -^p includes Italy. Switzerland, Oak Ridge, Lnermore. *uxi. Russia. Spain, Saudi Arabia, Germany, France, Korea. , д ,s in then speed and special processors. Finland. Norway, and BrwL The scientific interest is “ u^jy.askwl.qoesllo|B Agood source; big matrix into L time, The bench. тИ1Ги1и) and then solving a system of linear equations Ax = b XT^aH^ns in the High Performance UNPACK benchmark. Those prob- lems are a the Mart of Chapters 2 »d3. no.speeded up by parallel processing. The top machine aduesrn 415 petaflops = 415 tiroes 10" double precision floating point operations per second. This is with extremely careful coding for special hardware. The important point is that ordinary composers have also seen a tremendous increase in Numerical Linear Algebra This is the subject at major research: fast algorithms for matrix computations, The first was thousands at yean ago with “elimination". For that idea and any new one. part of the test is ю count the steps: in this case n’/3 for n Hnear equations with n unknowns To compute eigenvalues and now singular values, big progress brought that also to cn1. Favorite textbooks among many good ones are * L. N Trefethen and David Ban. Numerical Linear Algebra, SIAM (1997) • Gene Golub and Charles van Loan. Matrix Compulations, fohns Hopkins (2013). For the mathematics of linear algebra (not focused on compulation) we mention • Roger Horn and Charles Johnson. Matrix Analysis. Cambridge (2013) • R. Bhatia. Matrix Analysis. Springer and Posttrie Defin.tr Matrices, Princeton Randomized Numerical Linear Algebra We can hardly fell u, about the ТЫ, Й f"*"'”' Randonl “mPles —^srewre-
Preface XI Gratitude for Help This book was written during the months of lockdown for the corona* irus Life was limited but time for writing was nearly unlimited. Difficuit for oar society but perfect far an author. The idea for a new and more active start had just gone into a new video for Math 18.06 on OpenCourscWare. (The matnx multiplication A = CR in Section 1.4 is pari of the idea, with independent columns of A going into C.) Developing that idea into this textbook has been exciting every morning. The time was right but help was needed. It came in the best possible way My good friend Ashley C. Fernandes in Mumbai received handwritten pages every day. Then he returned IfTgX pages overnight Those pages went back and forth many times Working with Ashley has made Linear Algebra for Everyone possible; this is our eighth book. I am truly grateful for these happy months. Another good fortune has been help from Daniel Drucker. He is the most careful reader I know. Let me leave you to decide on this one; To be really ркку. in the Preface you say that the three columns lie on the same plane, bat in the figure you say that the three vectors are in a plane " 1 won't do that again. It made my day when Dan liked the small matrices on the front cover. The goals of the text are clarity and simplicity; The basic ideas of matrix multiplication evolve step by step in Chapter 1. Columns of CR are combinations of columns of C There are four different ways to multiply AB (see rhe hack cover) The key property is AB times C = A times BC This is the tenth cover created by two artists: Gail Corbett and Lois Sellers. You might have seen the rectangles for the four subspaces on Introduction to Linear Algebra. Before that came Essays on Linear Algebra with Alberto Giacomettis “Walking Man“ and Calculus with a famous curve painted by Jasper Johns. Perhaps the most beautiful was the photograph on the finite element book. These are very happy memories for an author The whole idea of helping students is beautiful. My greatest gratitude is to ary rrife JUL and our seat David and John and Robert This booh is dedicated to them.
Line» Algebra for Everyone Dictionary of Matrices xH A good way to left you Thu wide variety of шайке» ias a K> name the matrices you will me;, Identity mama I Column haul C Row bests R Rank 1 лшпж uv Chapur2 Я.т1гмпод num» E Lower triangular L Upper triangular U Chapter 3 Echelon matrix Я Free matrix F Special solutions 5 Mixing matrix M « W~} Tniupote num» A* Incidence matrix A Ратлявоа P laplacian matrix L Fourier matrix F Pscudomverse A* Chapter 4 (Mnfonal matrix Q fnpaot matnx P Least txpiam ЛТЛ Chapterh Cofactor matnx C Change of hast» Я TihedboxE Upper triangular R Ноше matnx H Reflection matnx H Chapter в Symmetric matrix S Eigenvector» X Eigenvalue» A Fibonacci matrix F Jordan matnx J SttMarmoanBAB' Exponential ел' Chapter 1 Singular value» £ Left ungular vecton U ^‘(htuntulttnaonV Compreaaed matrix Л» S®np*e<»vanao«AAT Hi»>en matrix Я Chapter П Weight matrix A Convolution C Jacobian matrix J Hessian matrix H Covanance matnx V Shift matrix S
1 Vectors and Matrices 1.1 Linear Combinations of Vectors 1.2 Lengths and Angles from Dot Products 13 Matrices and Column Spaces 1.4 Matrix Multiplication and A — CR The heart of linear algebra u in two operation*—both with vccux* We add vector* to get V + w. We multiply them by numbers c and d to pct cv and dw Combining those two operation* (adding cv Io dw) give* the liaeur combtaonoa cv + dw "♦*•[! I Linear combination* are all-important m thn subject' Sometime* we want one partic- ular combination, the specific choice c “ 1 and d “ 1 that produce* cv + dw “ (4.5). Other time* we want ail Йе combination* of V and w Combination* that produce the tcro vector have special importance Of count Ov + Ow i* always the tcro vector The vector* cv lie along a line WTicn w n not on that line, the combination* eV ♦ dw HU a complete two-dimensional plane Starting from three vector* u.v.w in three- dimensional ipace. their combtnaOom r*t + do + ew art likely Io Hll the whole apace— but not aiway* The vector* and their combtnatwnt could lie la a plane or on a line Thi* ia a key problem describe all combination* of n given vector*. Next «ер: Put two vector* into the column* at a mam* A or B. Then multiplying a matrix by a vector z exactly produce* a linear combmatiori of the column* Again thou combination* Ax fill a plane The output* from Bz only fill a line. The first example had "independent columns" The second example ha* “dependent columns" Chapter 1 explain* these central idea», on which everything build* Linear algebra move* from 2 column* in 2 dimension* to n column* in m dimension*. Your mental picture stay* correct- and we end by multiplying matricn 1.1 Vri’torodditton v 4-wandfiacurcoatbiaaetoa* cv 4-dw. 1.2 The dot product r w of two vectors end the length |,v| = y/v v IJ Matrix A timet vector z is a euatMtoltoa of the colnmm of A 1-4 Matrix A timet matrix В в | ЛЬ] • • • ЛЬ»]. Mirlnph A timet each column of В 1
Chapter I Vectors and Mairice, 2 1.1 Linear Combinations of Vectors Z----------———~T + of the vectors» w > s' 1 i ic a сзвввовлл j я... ’»‘°H 5] i ; ] .<[; | »«»'<*" ; ]. till a plane la тух space Same plane for Arithmetic starts with numbers We operateoa those numben in two essential ways: Addition 2 + 1-5 Multiplication (4) (5)-20 Subtracting 3 is just the inverse of adding 3 Subtract S - 3 lo recover 2. Dividing by 5 is ум the inverse of ndnptysag by 5 Divide 20 by 5 Io recover 4. Combining addition and taakepbeanoa leads lo (2) (3 ♦ 4) (2) (3) -I- (2) (4). Linear algebra moves addition and nailnpbcatioa into higher dimensions Instead of working with angle numbers, we wort with ««cion The vector о - (3,1,7) is a string of three numbers к a a *l-dime<Moaal vector" The good way is lo write о as a column vector Thea we can add two column vectors v and w: I add each pair of components) Subtracting sc is ум the inverse of adding Sa. so dial v + SO - w recovers V Vector wblrwtiim (.4-M) - w - । J j - 5 2 (») (4) ♦ (1) (S) + (T) (2) - 12 + 5 + M . 3 j Ik>t product = 31 (I) (3) The dos product v - ш u , _ 7~ ------- ** ’• “d ’ • • |v^bL’m;leF2l^°r‘ °"'scctlon ° •»«veals ™lt,plKMwa m Im~ i(gehri b = w. But a more important
3 The output from Av is a ««ctor not a number: Matnx A limes vector v equal» vector Av. The matrix A is a rectangle of numbers: m rows and n columni. A 2 by 3 matrix multiplies a vector v with n = 3 components. —- -4: hi Please notice: A has 2 rows. A lima v involved 2 dot products The first component in Av used row 1 of A The result was 31. The second component of Ac used row 2 of At (row 2of A) • (column vector e) = 1-3+ 2-1+ 1-7= 12 This is the usual way to multiply A times •: dot products of the rows of A with c. But Section 1.3 will explain a better way to understand Av Computing row • column is fine, but understanding Av becomes clearer with linear combmanona of column vectors. Let me show otic "linear combtnaooo" because this is the fundamental operation on vectors. Multiply veclon by numbenlAe 2 and i and add rite retulu : 1.Inrar combination co + d so = 2v > 4w (S) Those combtnalioas go into lhe big step Multipiy a matrix by a matrix I would like to save that step for Section 14 We have explained three ways lo multiply, involving numbers and vectors and matrices: 1. Number times vector (co) 2. Vector vector (•• to) 3. Matnx times vector (Av) Those are tn Sections 11 and 1.2 and 13 Then AU is matnx multiplication in Section 1.4. Let me also say: A limn v can me the rows of a mama A or the columns of A. There are m row vecton in A and there are n column vectors. Both «rays use the same rrm numbers A major key to linear algebra comes from the connections between two ideas. lint products with rows of A Combinaliom of columns of A
Chapter 1. Vectors and Matrices Linear Combinations Combining addition wnh scalar multiplication proluces a “linear combination of о and u- Multiply* bye and inutapiytt» by* Tbenaddrv + dw Theuatpfcvanddwua Imearcombiaanoa cv + dw. Foor Special bnev combinations are sum. difference, zero, and a scalar multiple cv • 1*4 1» - wm of sectors (4.2)+ (-1.2) = (3.4) I* - Iw = difference of sectors (4,2) — (—1,2) = (6,0) Ov + Ow = am «сиг (0,0) cv + Ow W sector cv m (he direction of * The zero sector is always a possible combsaatioo from c = d = 0. Every lime we sec a “space” of sectors, that uro vector will be included Thu big vic*, taking all lhe combi- aaiumt of v and w. n linear algebra at wort The figures show bow you can visualize vectors For algebra, we just need the com- ponents dike 4 and 2k Thai sector * is represented by an arrow. The arrow goes vi w 4 units io die right and t>a > 2 unru up li ends al the point whose z. p coordinates art 4,2. This point la another repmentaunn of the sector—to we have three ways Io describe в; Represent sector о Two numbers Arrow from (0.0) Point in the plane We add using the numbers We vuiuluc v + w using arrows for о and w and V + w lector addition (head (o tad) Alike nd of v.place the Hart of w
1.1. Linear Combinatiui» of Vectors 5 Vectors in Three Dimensions A rector withlwo components correspond»» a point in the .rp plane The component» of v are the coordinates of the point: z = v> and p The trra* end» at chit point (vi.m), when it start» from (0,0) Now we allow rector» to have three component» (Vt.bj.Os). The zy plane is replaced by three-dimensional ryi »pace Here are typical rector» (still column vecton but with three components): The vector ю corresponds to an arrow in 3-space Usually the arrow Mans al the “origin", where the rpa axes meet and the coordinates are (0.0,0). The arrow ends at the potm with coordinates U). uj. v>. There u a perfect match between the column rector and the я/TOS' from the origin and the pour where the arm emit The vector (a, p) in the plane («nth 2 numbers) is different from (r, p. 0) in 3-space I correspond Io points (z.p) and (r.p, a) in 2D and 3D. и abo written ai vw (I,|,-|). The reason for the row form (in parentheses) is to save space. But t> « (1,1,-1) is not a row vector* h is tn actuality a column rector, just temporarily lying down The row vector [ 1 1 -1 ] is absolutely different, even though n has the same three components. That 1 by 3 row rector vT w 11 1 -1 ] is the “transpose" of the 3 by I column vector v. In three dimensions, v + w is still found a component al a time The sum has components t'i + иц and uj + vi and I'l + srj. You see how to add vectors in 4 or 5 or n dimensions. When w starts at the end of v. the third side is v 4- to. The other way around the parallelogram is w + V. Do the four rider all tit in the юте plane.’ Yet. And lhe sum v + w - и — w goes completely around to produce the _________vector.
Chapter I Vectors and Matrices 6 The Important Question: All Combinations Foe one vector u. the only linear combtnaboM » multiples «*• ,wo vectors, the combiaatiOM Me co 4 do For three vectors. Ле combinations are cu + do + eto. Will you take the big мер from one coMbuuboo io all combinations ? Every c and d and e are allowed. Suppose Ле vectors M. e. to are la three-dimensional space: I. What is the picture of eil combinations cw? 1 What в Ле picture of ad combusuticns css ♦ de? 3. What u lhe picture tri ail combmabons cu ♦ de ♦ ew? The answers depend on the particular vectors st. v. and w If they are typical nonzero vectors (chosen at random 1. here are the three answers Thu u a key tn our subject: I ThecombinationsruMibuAnagk (0.0,0). 1 The combinations ru + de til a pine thenar* (0,0,0). 3. The combinations CM + de ♦ ew hu rbree-duneasiona/ ipact The uro rector (0.0.0) b oa Ле line because c can be иго It u on Ле plane because e and d could both be rero The tire tri vretors щ „ ш6пйе|у long (forward and backward) Il is the plane of all eu + do (combining two vectors ts, e in threectimensioiial space) that I especially aak you to ton* about Addutf oZZeuoatrerhnrtoadduoutbr Mbrr tire filli in ike plane in Figure 1.3. tblrtX^LtL^l^V'~'4** "₽’'**“< '« Suppose that tills up the wh.de ,hr^.'** ₽U",rfu » Then combmmgall ew with all cu+dt> up the Whole Лгее-Лтеткта! spree a. ♦ do 4 ew matches every point in 3D. When w hapJn?fc ₽4,n* Й*П‘p,°' B“‘olher P°«ibilities exist The co^bZX . Л .X Р11ПС °f •** fi™ '*° three-dimensKwul spree Please thmk *?* P'"* ** ПО< gC' ful1 •₽~e Pteree Лтк about Ле spec»! ceci ln p^^ , Qf
7 1.1. Linear CombinaUons of Vector» Line containing all cu figure 1.3 -u/2 Plane from all ru + dv • WORKED EXAMPLES 1.1 A Die linear combmatioas of v = (1,1.0) and w - (0.1,1) fill a plane in R’. Detcribt that plant, find a vector dial it not a combination of v and w—not on the plane Solution The plane of v and w contain» all combination* cv + dw. The vector* in that plane allow any c and d. The plane of figure IJ tills in between the two line* ['1 [0] Combinations cv + dw = e I 1 I + d I 1 I • to] L»J 611 a plane Four vectors in that plane are (0.0.0) and (2,3.1) and (3.7.2) and (v.2a,<). The second component c + d is always the sum of the first and third components Like most vector*. (1,2,3) il not ui rhe plane, btcaatt 2 dart not rquo/ 1 + 3 Another description of this plane through (0,0.0) ts to know that n (1,-1,1) is perpendicular to the plane Section 12 will confirm that 90* angle by testing dot products 1.1 В find two equations for г and d so that the linear combination rw + dw equals 6: -l-‘l -[;] ЧЯ Solution In applying mathematics, many problems have two parts Here we are asked for the modeling part (the equations) Chapter 2 is devoted lo the solution part (finding c and d) Our example tits into a fundamental model for linear algebra Vector equation find 2 numbers c and d so that cv + dw = b For n - 2 we will soon find a formula foe c and d The "elimination method" in Chapter 2 succeeds far beyond n - 1000 column vectors Al that point we must use matrices < Vector equation 2 -1 1 2c- d=l c«2 cv + dw = b 5 + d -3 = 1 Se-3d.i d-3 Vector addition produces two equations The graph of each equation produces a line. Two lines cross at the solution. Why not see this also as a matrix equation Az = b. since that is where we are going: 2 5 2 by 2 matrix Az - b -1 -3
в Chapter 1 Vecton and Mairice, Problem Set 1.1 1 Underwhmeomto-0-.Лс.-й [ 1 .n^uptemof [ °fc ]?sunwilhme П.О eouanom e » me and d - m*. By ehmmaxmg m. fuxi one equaUon connecting me - «го. »the- number. 1 G™. around a uuorte from (0.0)»(5.0)»(0. И)»(0.°). •*- « Отои three г±т?;лс“.."imi " IM’Tbeterittfl ujuaredof*™*»" (!.»)» INI f+Ч- Problems 5-9 are about addition <d .ector. and linear combination*. 3 Deicnbe ceometncaUy (line, plane, or all of R1) all linear combtnatrona of If v ♦ w - and v - w p . compute and draw the vecton v and w. From v - , * 1 md w “ J ' , And the component, of 3o + w andrv + dw Compute u->a*waad2u*2o4w How do you know u, v, w lie in a plane? Erory combmatmn of a - (|. -J, 1) «d w - (0.1, -1) ha. component, that add »-------Rod e and d ю ttm e. ♦ dw . (X 3. -в). Why u (J.3, в) impoatible? la Ле ip plane mart a0 mne of them haear comhinadoru: в[*1 W”h e“0’1’2 “d <f“0,l,2.
9 1.1. LinearCombinationsofVectors к = (0,0,1) j + k ---------------f 2:00 i -(0.1.0) NotKt Ла Шамая b (0,0,0) al the i-(1.1,0) Figure 1.4: Unit cube from «./ к and twelve dock vector»: ail lengths - 1. Problems 10—14 are about special sectors on cubes and docks In Figure 14. 10 If three comers of a parallelogram an (1,1), (4.2). and (1,3), what an all three of the possible fourth corners? Draw two of them 11 Four corners of this unit cube art (0,0,0), (1,0,0), (0,1,0), (0,0,1). What are the other four comers'’ Find the coordinates of the center point of the cube The center points of the sis faces are________The cube has how many edges'’ 12 Review Question. in xya space, when is the plane of all linear combinations of i-(1.0,0)and t+J-(1,1,0)? 13 (a) What is the sum V of the twelve vectors that go from the center of a dock lo the hours 1.00.200....12:00? <b) If the 200 vector is removed, why do the 11 remaining vectors add lo 8 00? (c) What an the», p components of that 200 vector •- (oosf.ainf)? 14 Suppose the twelve vectors Mart from 600 al the bottom instead of (0.0) al the center. The vector to 12 00 is doubled to (0,2). The new twelve vectors add to , 15 Draw vectors u. e. u> to that their combmatiom cat + de + ew fill only a line Find vectors u. e. w to that their combinataom cu ♦ de ♦ rw fill only a plane equations for the coefficients c and d in the linear combination. Problems 17-18 go further with linear combinations of r and w I Figure 15a). 17 Figure I 5a shows | e + | w Mark the points | e +j tn and | e + ;tn and n + tii. Draw the line of all combinations co + dir that have c + d = I. IB Restricted by 0 < c < 1 and 0 < d < 1. shade tn all the combinations co •+ dir Restricted only by c > 0 and d > 0 draw the “cone" of all combinations cu - dw
РгоЫепи IM2 dm! и. -. - - .ЬгеечЬтепсй«.1 Ф.« <-* ngur. 1.5Ъ). 16 iw«di«+4«“R«urel 56 Challenge problem: Under .tut res’tncoom on ! d.e. wtO Ле combmiUom cw + *> + « fill in the dashed mingle1 To oay Ле mangle. one requirement u c > 0, d > 0. e > 0. 20 The three dashed hues tn the mangle are v u and w - t> and u - u> Their sum is _________ Draw the head u> tail addiuon wound plane mingle of (3,1) plus (-1,1) plus (-2,-2). 21 Shade n Ле pyramid of canbmatiom я» ♦ de +eu> with c > 0, d > 0. e > 0 and c+d + t$ 1. Mart the vector|(« + u+w) as inside or ouuide this pyramid. 22 If you look ar nffcombsnatioMof tho« u ». and w. n there any vector that can't be produced from cv + du ♦ etc? Different answer if «,«. w are all in Challenge Problems 23 How many corners does a cube have in 4 dimensions? How many 3D faces? How many edges? A typical corner to (0,0.1.0). A typical edge goes io (0,1,0.0). 24 Find two iigrntu сапМаоПти of die three vectors si ” (1,3) and v m (2,7) and w > (1,5) that produce i • (0,1). Slightly delicate question: If I take any three vectors u. v. v in the plane, will there always be two different combinations that produceb. (0.1)? 29 The linear cnmbinaUms of (n. 5) and (c,d) fill the plane unless__ Find four vectors u. o. w. * with four components each so that their combinations rai + du + rw + /a produce all vectors (bi.5>.bi.b«) in foor-dimensiontd space. * ^“d^'",hree7**,"B,«'--<tf»,’«‘rai*de + ew = b. Can you somehow П1Ы1 td л Сайт iff- r L 4 w г -1 0 -1 2 -I 0 -I 2
1.2. Lengths and Angle* from Doi Products 11 1.2 Lengths and Angles from Dot Products 1 The "dot product" of e - * j and so = ' is e • v - (1)(4) + (2)(6) -4 + 12-10. 2 The length squared ofc = (1,3.2)u co - 1 + 9 + 4= 14 The length b ||e|| . /14. J П. »««».- . -fa . IMP. 1.1 4 v - (1,3,2) is perpendicular to • (4, -4,4) because v • w = 0. S The angle в • 45" between c= [ * 1 and w - [ * 1 ttoo.1 = .----L—. I 0 I L « J IMI IMI ЙЙЛ) Schwars inequality Triangle inequality 6 All angles have | сов d| < 1. AU vectors have |v. w|SI|e(| ||w|| | ||v+u>||$||v||+||w|| The dot product и • c tells tn Ле squared length ||ej|’ of a vector u. The dot product c w tells us Ле angle between two vecton v and so The length ||v|| b ghen by ||v||’ о - в « ej + c’ + • • • + oj. (I) In two dimensions, this is Ле Pythagoras formula a’ ♦ b* - r1 for a nghi triangle The sides have a1 - of and 5* - Ц The hypotenuse has ||r||* - ₽} + t«J - a’ + b*. That formula for length squared matches ordinary plane geometry To reach n dimensions, we can add one Лшеаыоа at a time Figure IЛ Лтп w • (1.2,3) in three dimcnuom Now the nght tnangle has sides (1.2.0) and (0.0,3). Those vectors add to so The fins side в tot the ay piaac. the second side goes up the perpen dicular r axis . For ЛЬ tnangle in 30 wuh bypolemnc w a (1,2.3). Ле law a’+ b* w c* becomes (1’ + 23) ♦ (31) - 14 -
Chapter 1. Vectors and Matrices 12 The length of a foor^intensmoa) «» would be +-^ J 'S .< Thus the vector (1.1*U1) has length 71’+1’ + ^ = \ W’ “ ** “iT.h г""'8'’ a unit cube in four^unenuonal space That thagonal m n dimensions has length Wte»e the words «it «ctor when the length of the veetorBl.D.vuievby||»||. A uni. «ctor u has length |f»|| = L If •# 0. then» = Ba unit vector. Example 1 The standard unit vectors along the r and у axes are written t and j. In the zy plane, the unit vector that makes an angle "theta" with the s axis is (con 9, sin 9); l ull «dor. and /-[®| and « - |"J]. When 9 - 0. the horuonol «ctor u is t When « - 90*.the vertical vector j* j. Por any angle, u - (cos 9, sin 9) В a unit vector because u • u « con’ 9 + ain’ в ж 1. In three dimensions .he standard паи vectors are i — (1,0,0) and j — (0,1,0) and к (0,0,1). In tout dimenuona, one example of a unit vector it st - (j, j, J). Or you could sun with the vector »-(l,S. 5,7). Then I'vll’ - 1 + 25 + 25 + 49- 100. So v has length It) and u »/10 В a unit vector The word unit" n always uvdwxtmg that some measurement equals "one". The unit price В the price for one пет А пая cube has tides of length one. A uni. circle is a circle with rattan one. Now we tee the meaning of a "unit vector": length • 1. Perpendicular Vectors Suppose the angle between в and w n SO*. Its coune is Meo That produces a valuable leM V . w = 0 for perpendicular vectors Perpendicular .«toe, hn« »• w - 0, ТЪеп Ц» 4-»||» ж ||,||» 4- ||w||’. (2) i** T^.,m₽Ort" c-r " b“ brought ut tmek to 90* angle, and lengths " + “ e*. The algebra far perpendicular vectors (v to - 0 - w - v) it easy; II» ♦ toll1 -(• + •). (• + •)-«.• + •. w + t,..4w.w.||v|(i + ||w||> (3) Two terms were tern Please notice that ||n - w|p h al» equal lo )|»||> + 11w||>. Example 2 The vector * 3 (1.1) is n a 45* angle with the x uh The vector w 3 (1, -1) is al a -45’ angle with the x axis So the angle between (1.1) and (1,-1) is 90*. Their dot product ts v .» „ i t _ n ** •**• “ « -1 - ».«. i м „г?,?:;; : J
1.2 Lengths and Angles tram Doi Products 13 Example 3 The vectors v = (4.2) and w = (-1.2) have a jrro dot product: Dot product is zero [41 [-11 ... = 0 Vectors are perpendicular [2] [ 2j Put a weight of 4 al the point z = -1 (left of zero) and a weight ol 2 al lhe point r = 2 (right of zero). The z axis will balance on the center point like a see-saw The weights balance because lhe dot product is <4>(—1) + (2)(2) = 0. This example is typical of engineering and science. The vector of weights is (trj, w,) = (4,2). The vector of distances from the center is (ft, ft) « (-1,2). The weights times lhe distances. iftft and tftft. give die “moments’* The equation for the see saw to balance is W • tl = WtVi + Iftft w 0. Example 4 The unit vecton v » (1,0) and u = (соав.ипв) have v • u = «М0. Now we are connecting the dot product to the angle between vectors Cosine of the angle в The cosine formula is easy to remember for unit vectors: If ||v|| w 1 and I|u|| - 1. the angle * between v and u has coed a v •«. In mathematics, zero is always a special number For dot products, it means that (here two vreton are perprndicaiar The angle between them is 90* The clearest example of perpendicular vecton is t w (1,0) along the z axis and j “ (0.1) up lhe у axis. Again the dot product u i j - 0 + 0 - ft The coune of «Г is zero Figure 1.7: The coordinau: vectors i and j. The unit vector u at angle 45" divides v - (1,1) by its length Ци|| - i/2 .The amt vector u = (<хи в. sin*) it al angle 0 Example 5 Dot products enter m economics and business We have three goods to buy Their prices are (pt.Pi.Pi) for each unit—this is the price vector p The quantities we buy are (ql,qI,qa). Buying q, mu al the prin pi brings in ftPi The total com becomes quantities q times prices p This is (hr def product q - p in three dimensions : Cert = (qi.ft.ft) (pi.pi.ps) = qtpi + ftft + ftps > dot product A zero dot product means that The books balance". Total sales equal total purchases if q • p = 0. Then p is perpendicular to q (in three-dimensional space). A supermarket with thousands of goods goes quickly into high dimensions. Spreadsheets have become essential in management They compute linear combi ru- lions and dot products. What you see on the screen is a matrix.
Chap» 1 Vector» and Maine» 14 The Angle Between Two Vectors U C. = o. The dot product IS zero when the We know thal perpendKular weW have e „gles The dot product „ . togteisWTOtottoxtiiepistoCMneciJIdotpreducUto s find, lhe angle between anytwo полито vector* r ano _ __ г™л onal aod w » (oo»lS h‘n^) ^e « • W = Ехатрй 4 The ““ «^’=^*°л“"“^Х,иЬ for «.(3 - o). Rgure U ramrod + smasu>3 In tnjooometry show, that lhe angle between the unit vector* V and U ' IS The dor ,^uc, w. . rg-h » w 1* «det of v W makes no difference. в fl-a figure 1.1 Um vector* «• U - a»t. The angle between the vector* ii в. Suppose v w t* aot иго 11 may be pouttv», it may be negative The (ign of V . w immediately lellv whether wc are below or above a right angle. The angle i* lew than UlT when vw i* positive. The angle и above W when «• to is negative The right aide of Figure 19 ihow* a typreal vector»- (3.1). The angle with w - (1,3) la lew than 90“ became v • w - 6 кремне». The borderline n when vector* an perpendicular to v. On that dividing line between plus and minus. »1 (1. -3) i* perpendicular to V - (3.1). Their dot product is zero. Then w । goes beyond а 91Г angle with v The lest become* V • w> < 0: ntfalivt Figure 1.9 Sowll angle » w, > 0 Right angle ». w, ж 0 Large angle v • w, < 0 The dot product reveals the e*act angle в To repeat: For unit vector* u and U. thr Jot pr^iKi U • u h tht cosine tf 9. Thn remain* true in n dimension*. umt vector*uand Vm angle 9 hareu -U. mt Certainlv |u-U|<l
I .2. Length* and Angie* front Dut Product* 15 Wbor v and w are not unit renon? Divide by their length* to get u - v/||o|| and U = w/|jw||. Then the dot product of those unit vectors si and U give» cm»®. COSINE FORMULA If в and w are nonzero vector* then ———j HI M an®. (4> Whatever the angle, thi* dot product of u/|ut »nh u>/| w| never exceeds one Thai it the "Schwarz intqnaluy" v • w| < |u| |w| for dot product»—or more correctly the Cauchy-Schwarz-Buruakowsky inequality. It seat found in France and Germany and Russia (and maybe elsewhere—it i* the mas important inequality tn mathematic»). Since |coa® | never exceeds 1. the соыпе formula in (4) give» two great inequalities SCHWARZ INEQUALITY |o. w| < |o| Iv| TRIANGLE INEQUALITY |u A v| < M A |to| Example 7 Find coo* for в w j 1 and w « | ' and check both inequalities The dot product it v • w - 4 Both u and ir hare length »/& So | |ul| ||v|| 8 |b||wH v^s/5 5 The Schwarz inequality is 4 < 5. By the triangle inequality, side 3 -Jo A to flit les* lhan side 1 4 ode 2. With о + w - (3.3) the three tide» lit/is < Л Square thi* inequality to get 18 < 20. Thia confirm* the mangle inequality. Example 8 Thu dot product of e - (u.k) and w - (6. n| it 2nb. Their length* arc IM - I IM - A1 A M The Schwart inequality в ir < ||u|| ||w|| it 2nb £ a* + b3 For any numbers o’ and b3. grewrnc mean ab < arittunrtic ятя } (a1 A b’). The triangle inequality comes directly from the Schwarz inequality • Finally, here to a proof of the Schwart inequality that doesn’t « angles. Every vector u ha* 0 < u • u. We apply this to the vectors u m ||b\w ± ||w|>: 0 < u • u = ||u|l’w • w ±2|)u1|||ib|| w-u + |)«е||’в-е means that 2IMI,l|w|(,>2|MllMI»-M (5) Divide by 2||u!| |Ьв||. Then >u • ml < |)u|| |)ar]l is the Schwarr inequality It lead* to ||« + mH1 - B- B + B.» + wB + w- m< ЦвЦ1 + 2||b(|11m|| + ||w||*. (6) The square root it |)u A toil < |]v|’ A tc 1 Side 3 cannot exceed Side 1 + Side 2.
Chapter 1. Vecton tod Matrices 16 WORKED EXAMPLES « al «1 w = (4,3) lest the Schwarz inequality on t>. w A For the vectors » - (3J) # for lnglc between t> and to and lhe triangle inequahry on fa + «I r,D0 Solution The dot product в « • w = (3X4> + J4"3 ||e|| = #йб = 5 *nd also |H = 5- The «nn • + “ Schwarz inequality « ». • = (3X-*) + H)<3> = 24Thc ‘«'вл <rf v 15 - • -i = (7,7) has length 7^2 < jq !••(< MM 24<25 (• + •!< M + M « 7s/2<5 + 5. Thin angle frame = (3,4) tow = (4,3) 25 Count of angle 13 В Which e and w grw «ywdirv u -w| = |»l l»l tod ||« + w|| = |v|| + ||w||? £quo/iry: One vector isamultipie of the other as in w = ce. Then the angle is 0° or 180®. In thiscase IcoaBi = 1 and |»-1»|вфпяЬ|п| |w||. Iftbc angleis O’, as in w = 2u. then |v + w||=|v| + |wf (both odes give 3>J). This v. 2o. 3c triangle is extremely thin. 13 C Find a unit vector u in the direction of e = (3,4). Find a unit vector V that is perpendicular to u There are two possibilities for U. Solution For a unit vector u. divide e by its length ||v|| = 5. For a perpendicular vector V we can choose (-4,3) or (4, -3). For a ml veaot V, divide V by its length | V | = 5. Problem Set U 2 з 5 W tod M of those vecton Chock the Schwarz inequalmes |a • w| < f«l M and (e w| < |e| |„||. the angle 9 Choose ихЗ “ РгоЫав L and find the cosine of gte Chooie vectors a. b.e that nuke O’. <W. and 180“ angles with to. F« any and vectors n and w. find dto d„ of - (b) . + (c) Find Unit VCClOn Hi And ♦», in rl^ л;, - , » F.nd.nntv^a.a.to^thu.eperpeX^LZ/idLr*'0 = (2Л'2)'
17 1.2 Ixngth* and Angies from Dot Pnxkjcts 6 (a) Describe every vector» = (»>,»») that is perpendicular toe = (2.-1). (b) AU vectors perpendicular to V (1.1.1) lie on a------in 3 dimensions (c) The vectors perpendicular to both (1,1.1) and (1.2,3) lie on a----. 7 Find the angle 9 (from its cosine) between these pairs of vectors: (a) If u = (1,1,1) is perpendicular to v and w. then v b parallel to ». (b) If u is perpendicular too and w. then u is perpendicular to » + 2». (c) If u and v are perpendicular unit vectors then ,« — e|| — -fi. Yes'. 9 The slopes of the arrows from (0.0) to (vi,u>) and (»i,wi) are vj/vi and irj/ti'i. Suppose the product t4iq/r|U| of those slopes is -1. Show that v • » = 0 and the vectors are perpendicular. (The line у — 4z is perpendicular to » = — |z.) 10 Draw arrows from (0,0) Io the points v = (1.2) and » = (—2.1). Multiply their slopes. That answer is a signal that v - w = 0 and the arrows are________. 11 If v w is negative, what does this say about the angle between e and »? Draw a 3-dimensional vector v (an arrow), and show where to find all w's with r - w < 0. 12 With v = (1,1) and » = (1,5) choose a number c so thal w — cv is perpendicular to v Then find the formula fore starting from any nonzero v and w. 13 Find nonzero vectors u, v. w that are perpendicular to (1.1,1,1) and to each other. 14 The geometric mean of z • 2 and у = 8 is ^zy — 4. The arithmetic mean is larger j(z+p) =_______________. This would come from the Schwarz inequality for о = (s/2. i/§) and w « (V'S. v/2). Find cos 9 for this v and ». 15 How long is the vector e (1,1, —, 1) in 9 dimensions? find a unit vector u in the same direction as v and a trait vector »that is perpendicular to v. 16 What are the cosines of the angles ci, S.9 between the vector (1,0,-1) and the unit vectors i, j, к along the axes? Check the formula cos3 a + cos? 0 + cw3 9 = 1.
Cfapiet 1. Vectors and Matr,^ 18 Г^« «Ьоы tenetbs “d *ns,“in ,ri“ne,cs- Problems 17-25 lead to the main facts about > 8 n _ _________A^.„ = (4.J)mdw = (-».2)U*recUn»le Check Ле rywagons тсспьшл ат» ,^-Г.1» + (ta-hof .)* - О"***» + w)’. 18 18 (Rules for do. products) These equaooos are sunpie but useful: (!)«.» = «•- <3><«)-w-c(v.w) Use(2)w,<b. = .4 - -ор«~1- * -I’—* + 2V W + W W The “Law of Cossnes" comes from (• - •) -(•-») = »• ® ~ 2v • • + W • «: Cute U. t» - •!’ - M* " 2M M c“* + M’ Drawatriangleenthudeseandwandn-w Whichoftheanglests87 20 The tnaaglt iar^aatay says (length of • + ») < (length of u) + (length of w). Problem 18 found |o + eef’ - M* * 2’ 1 » ♦ lncre»“ U>« v • w to || o|| || w| to show that side 3| cannot esceed [Лк 1Ц + llalde 2|: 21 The Schwart inequality |u • te| < |tr|| |w| by algebra instead of trigonometry (a) Multiply out ЬоЛ »des at (o>»i + и1ю1)’< (v? + t^)(w? + w’). (b) Show chai the difference between those two sides equals (щюз - t>2wi)2. This cannot be negative since it is a square—so the inequality is true 22 One-Ime proof of die inequality fss-ff | < 1 for unit vectors (u,,ui) and ((Д, (/,): I» • U| < Ь1|KZ,| + Inal |СЛ»| < = !. Pot (ui.ua) - (Д Л) and (Ut.Ut) = (Л, .6) in that whole line and find cos8. 23 Why is I cor8, never greater than 17 End con» in an equilateral triangle.
1.2 Length» and Angles Iron Doi Product» 19 24 Show thal the squared diagonal lengths j® + w|’ + |v - v|1 add to Use cum of lour squared side lengths 21 v||’ + 2|tri’- 25 (Recommended) If |MI - 5 and ||»|| = 3. what are lhe smallest and largest possible values of Це - w||? What are the smallest and largest possible values of и • to? Challenge Problem» 26 Сал three vectors tn the ry plane have u-o<0andew<0anduw<0? I don't know how many vectors in xyz space can have all negative dot products (Four of those vectors in the plane would certainly be impossible .. 27 Find 4 perpendicular unit vectors of lhe form Choose + or 28 Using v - randn(3,1) in MATLAB, create a random unit vector u a »/||»|. Using V a randn(3,30) create 30 more random unit vectors U, What is the average size of the dot products |u Ц|? In calculus, the average is f* |«atf|dP/e a 2/rr. 29 In the rg plane, when could four vectors v(, ₽>. »j, v4 not be the four sides of a quadrilateral 7
Chaffer 1. Vectors and Мшпсе» 20 13 Matrices and Column Spaces 2 columns. Rank 2. , a а- of the 3 rows of A with lhe vector a: 2 The Зсотрооепв of Ax are dot products oltnc a ro- • » J- о. u *>•> 1-7 + 2-8 23 53 -1 4 Thccoiumntpoceof A contain» ad rombinaliom Ax-г, о i+ff3a3 of the columns J Rank one natrim All column at A (and all combinations Ax) lie on one line. J Sections 1.1 and 1.2 explained lhe mechanics of vector»—linear combituurons. dot product». lengths. aad angle» We have vecton in R'and R and ever) R". Section 1 3 begins lhe algebra of m by n matncea our tree grail A typical matrix A la a rectangle of m tuna n number»—m rows and n columns If m equals n then A is a 2 1 -J 1 4 7 -3 7 5 Identity IHagonal Triangular Symmetric matnx mama matrix matrix We often think of the column» of A as vecton O|,a31... ,aM Each of (hose n vectors ia in m-dimcnuooal space. la this example the a's have rn • 3 components each: m - 3 пип n — 4 column» 3 by 4 matrix This example is a “difference matnx" because multiplying A limes x produces a vector Ax of differences How doer an mbjn malm A multiply on n by 1 vector x ? There are two ways to lhe same answer—we wort »«h the rows of A or we svort with the columns. The row picture of Ax will come from dot products of x with lhe rows of A. The column picture will come from linear combinations of the columns of A.
21 IJ. Машею and Column Space Row picture of Ax Each row of A multiplies lhe column vector x. Those multiplications row timet column are doc products! The tint dot product comes from row 1 of A: (row l)-x = (-1,1,0,0) •(xi.xj.xs.xa) = *> — »s. It takes m limes n email multiplications to find the m “ 3 dot products that go into Ax. Three dot products -1 0 0 1 -I 0 Notice well that each row of A has the same number of components as lhe vector X. Four columns multiply X| to x« Otherwise msdtiplying Ax would be impossible Column picture of Ax The mains A times the vector x is a combination of lhe columns of A The n columns are multiplied by the n numbers in x Then add those column vectors X|Ot..x.a. to find the vector Ax: Ax w Xi (column a() 4- xj(column <si) + xilcolumn щ) 4 x«(cotamn a«) (2) This combination of n columns involves exactly the same multipiscalninv as dot products of x with lhe rn rows Bui it is higher level' We have a vector equation instead of three dot products. You see the same Ax in equations (I) and (J>. Combination of columns [-1] Г 11 Г 01 fol [c«-«tl 0 -I ll + r(lo|alt|-ril|j) Oj I oj l-lj ll J l«a-».J Let me admit something right away. If I have numbers in A and x. and I want to compute Ax, then I tend io use dot products the row picture But if I want to undrrttand Ax. the column picture is better "The column vector Ax is a combination of the columns of A." We are aiming for a picture of not just one combination Ax of the columns (from a particular z), What we really warn is a picture of all combinations of lhe columns (from multiplying A by all vectors z). This figure shows one combination 2ai ♦ 07 and then it tries to show the plane of all combinations X|«| 4- гга] (for every xt and Xj). Figure 1.10: A linear combination of oi and n} All linear combinations fill a plane. The next important words are independence, dependence, and column space
Chapter 1. Vectors and Matrices 22 . contribute anything new. They might 1 sidy included). Examples 1 and 2 show • • _ rsf nn»viniK rnlniv,». . Coiunms^^ nugbin°l.ui * columns (which »« ^bmations of previous columns. • т a new direction. Their combinations fill 3D space R3. „_• all vectors (bi, bj, Ьз): 3D space. number bi. Then x3 (0,4,5) leaves b1 dirertron. and columns (1 0 0 1 Each column gives 2 4 0 3 5 6 J If »e took at all combinations of the columns, we see -ь 6. b and allows us to match any 4,. We have found zj, xj. xs *0 «bat A।X - Independence means: The only combination of columns that produces Ax = (0,0,0) < x = (0,0,0). The columns are independent when each new column is a vector that we don't already have as a -/«wious columns. That word "independent" will be important columns that pse a or* Independent Ai = columns 12 3 1 4 5 6 0 6 1 + 2 «= 3 Column 1 + column 2 « column 3 Their combinations don’t fill 3D space ’. ' * ~ " r 6+0=6 The opposite of independent is -de/vndrnr" These three columns of A3 are dependent Column 3 is in the plane of columns 1 and 2. Nothing new from column 3. I usually lest independence going from left lo right. The column (1,1,6) is no problem. Column 2 м nor a multiple of column 1 and (2,4,0) gives a new direction. But column 3 is the sum of columns 1 and 2. The third column vector (3,5,6) is not independent of (1.1.6) and (2,4,0). We only have two independent columns. If I went from nght to left. I would start with independent columns 3 and 2. Then column 1 is a combination (column 3 minus column 2). Either way we find that the three columns are in the same plane: two independent columns produce a plane in 3D. That plane is the column space of this matrix: Plane = all combinations of the columns. Example з ’ I 2 5 3 6 15 4 И 20 column 1 + column 2 — column 3 is (0,0,0). Now в] is 3 times at. And a3 is 4 times at. Every pair of columns is dependent. This example is important. You could call it an extreme case. All three columns of A3 lie on the same line in .^dimensional space That line consists of all column vectors (c, 2c, 5c)— all the multiples of (1.2.5). Notice that e = 0 gives the point (0,0,0). That line In 3D b the column space for this matrix As The line contains all vectors AjZ. By allowing every vector z. we fill in the column space of A3—and here we only filled one line. That is almost the smallest possible column space pbe column space of A is the set of all vectors Az: All combinations of the columns )
1.3. Matrices and Column Spaces 23 Thinking About the Column Space of A “Vector spaces” are a central topic. Examples are coming unusually early. They give you a chance to see what linear algebra is about. The combinations of all columns produce the column space, but you only need r independent columns. So we start with column 1. and go from left to right in identifying independent columns. Here are two examples A< and Aj. Aa = 1111’ 0 111 0 0 11 0 0 0 1 Aj = 110 0' 0 110 0 0 11 10 0 1 At has four independent columns. For example, column 4 is not a combination of columns 1,2,3. There are no dependent columns in A«. Triangular matrices like A< are easy provided the main diagonal has no zeros Here the diagonal is 1,1,1,1. A$ is not so easy. Columns 1 and 2 and 3 are independent. The big question is whether column 4 is independent—or is it a combination of columns 1,2,3? To match the final 1 in column 4, that combination will have to start with column 1. To cancel the I in the top left comer of Aj. we need minus the second column. Then we need plus column 3 so that -1 and +1 in row 2 will also cancel Now we see what is true about this matrix А»: Column 4 of AB г Column I — Column 2 4- Column 3. (4) So column 4 of Aj is a combination of columns 1,2,3. A® has only 3 independent columns. The next step is to "visualize" the column space—all combinations of the four columns. That word is in quotes because the task may be impossible. I don't think that drawing a 4-dimensional figure would help (possibly this is wrong). The first matrix A* is a good place to start, because its column space is the full 4-dimensional space R4. Do you see why C(Aa) « R4 ? If we look to algebra, we see that every vector v in R4 is a combination of the columns. By writing v as (vj, vj, vj, v*). we can literally show the exact combination that produces every vector v from A«: the combination is. We have solved the four equations A^x = v! The four unknowns in x = (xi, xj, хз, z«) are now known in the four parentheses of equation (5). Geometrically, every vector v is a combination of the 4 columns of Ад. Here is one way to look at A«. The first column (1,0,0,0) is responsible for a line in 4-dimensional space. That line contains every vector (ci,0,0,0). The second column is responsible for another line, containing every vector (cj,Cj,0,0). /f you add every vector (ci,0,0,0) to every vector (cj, Cj, 0,0), you get a 2-dimensional plane inside 4-dimensional space.
Chapter 1. Vectors and Matrices 24 _ num rule of linear algebra is keep going. Thc ты w as the first two columns. The independent of the first two. Ы! two columns give two more «hrect^ 44linxtbional vector is a combination of the At the end, equation (5) 4 of Aa is «И of tour columns of Aa The co u first 3 columns cooperate. But If we attempt the same plan for *** j 2 3. Those three columns combine to column 4 of As is a combination of 4 happens t0 be in that subspace. give a three-dunetuionai whote ^4^ space C(A5). We can only solve Thai three-dimensional subspa*. three indcpcndent co|u Ajz - v when v is in C(As)- tne maui» ... ,„llimn soace of A. When A has m rows, the columns are ' R^^Tumn space might fill *11 of R"‘ or it might not. vectors ш m-dimcnsRinaJ spj*.e к , irwwiu**** r—-w For m = 1 here are all four possibilities for column spaces in 3-dtmensional space: 3 independent columns 2 independent columns 1 independent column 3. 1. The whole space R3 2. A plane in R3 going through (0,0,0) A line in R3 gotng through (0,0,0) The single point (0,0,0) in R3 (when A is a matrix of zeros I) Here are simple matrices to show those four possibilities for the column space C(A); 1 0 0 0 1 0 ° ° 1 C(A) = R3 = iy: space 10 0 0 1 0 0 0 0 C(A) = ry plane 10 0 0 0 0 ° 0 0 C(A)=xaxis 0 0 0' 0 0 0 ° 0 0 C(A)=one point (0,0,0) Author’s note The words “column space" have not appeared in Chapter 1 of my previous books 1 thought the idea of a space was loo important to come so soon. Now I think that the best way to understand such an important idea is to see it early and often. It is examples more than definitions that make ideas dear-in mathematics as in life. 0» 4e •“*»1 >»“«“* the keys to the answers They give a real understanding of any matrix A l. ’ T •"*' '"k₽en*"' '» •. «*«- Ч-. 5 (Amazing) The r rows of Л are a bad» f™ .1, c row s₽ace °f Л: combinations of rows. Sectio* 14 will eiphin how to multiply ах», n C contains columns from A. Please n^tke Z C and «• The result is A = СЯ. “«>« the row, of Я do w come directly from A.
1.3. Matrices and Column Spaces 25 Matrices of Rank One Now we come to the building blocks for all matrices. Every matrix of rank r is the sum ofr matrices of rank one. For a rank one matrix, all column vectors lie along the same line. That line through (0,0,0) is the whole column space of the rank one matrix. 1 4 2 Example As = 3 12 6 -2 -8 -4 has rank r = 1. All columns: same direction 1 Columns 2 and 3 are multiples of the first column ai = (1,4,2). Column 2 is 3a, and col- umn 3 is— 2aj. The column space C(Ag) is only the line of all vectors cai = (c,4c, 2cj. Here is a wonderful fact about any rank one matrix. You may have noticed the rows of A«. All the rows are multiples of one row. When lhe column space is a single line in rn-dimensional space, the row space is a single line in n-dimensional space. All rows of this matrix A« are multiples of_____. An example like A« raises a basic question. If all columns are in the same direction, why does it happen that all rows are in the same direction? To find an answer, look first at this 2 by 2 matrix. Column 2 is m times column 1 so the column rank is 1. a ma b mb Is row 2 a multiple of row 1 ? Yes I The second row (b,mb) is | times lhe first row (a, ma). If the column rank is 1, then the row rank is 1. To cover every possibility we have to check the case when a 0. Then the first row [ 0 0 ] is 0 times row 2. So the row space is the line through row 2. Our 2 by 2 proof is complete. Let me look next at this 3 by 3 matrix of rank 1: ma pa mb pb me pc Column 2 is m times column 1 Column 3 is p times column 1 Rows 2 and 3 are b/a and c/a times row 1 This matrix does not have two independent columns. Is it the same for the rows of A ? Is row 2 in the same direction as row 1 ? Yes. Is row 3 in the same direction as row 1 ? Mrs. The rule still holds. The row rank of this A is also 1 (equal to the column rank). Let me jump from rank one matrices to all matrices. At this point we could make a guess: It looks possible that row rank equals column rank for every matrix. If A has r independent columns, then A has r independent rows. A wonderful fact I I believe that this is the first great theorem in linear algebra. So far we have only seen the case of rank one matrices. The next section 1.4 will explain matrix multiplication AB and lead us toward an understanding of "row rank = column rank” for all matrices.
Chapter I. Vectors and Matrices 26 Problem Set 1-3 But we don't yet have a computational system to This chapter introouces , „Iumn vectors. So these problems stay with whole decide independence or dependence of colum numbers and small matrices. 1 0 o' 1 1 о 1 1 1 Лз = Лз 1 5' 2 10 1 5 0 Л< = 0 0 0‘ 0 0 Find a combine of the columns that produces (0,0.0): column space = plane. Dependent columns Describe the column spaces in R ’ of В and C: 12 2 1 3 3 C- В -В (3 by 4 block matrix) Multiply Az and By and lx using dot products as in (rows of Л) z: 2 1 2]fl 4 2 4 0 1 0 1 0 01Г 4* By = 1 10 1 1 1 10 °1Г*1 0 1 J *» . 1 2 2 1 1 5 6 2 I 2 3 4 5 6 7 8 9 1 4 7 2 5 8 3 6 9 3 в 2 5 1 0 0 1 0 0 *2 5 6 8 9 Multiply the same A times z and В times у and I times * using combinations of the columns of AandBand/.asinAz = 1 (column 1) + 2(column 2) + 5(column 3). JnolXkb m"> ,nd^a”co,u>™«<‘‘« Л have ? How many independent columns in Д? How many independent columns in A + Bl st. co,umns) w ,hat л+в h“ WNoradeP«>dent columns (c) 4 independent columns the column spaces in R3 oMandBand C°m^’nat*ons of *** columns. Describe 2 4’ 1 2 2 4 1 0 0 0‘ 0 I 0 1 0 10 12' 0 2 2 4 .0 2 2 4 0 = C = R*>d a 3 by 3 matrix A with x — • (What is the maximum poss.bk'n^^'i "" "ine Cntries = 1 or 2
1.3. Matricc* and Column Space* 27 10 Complete A and В so that they arc rank one matrices. What are the column space* of A and В1 What arc the row spaces of A and В ? A = 3 5 15 B- 1 2 -5 4 11 Suppose A is a 5 by 2 matrix with columns ai and aj. We include one more column to produce В (5 by 3). Do A and В have the same column space if (a) the new column is the zero vector ? (b) the new column is (1,1,1) ? (c) the new column is the difference aj — ai ? 12 Explain this important sentence. It connects column spaces Io linear equations. Ax = b has a solution vector x if the vector b is in the column space of A. The equation Ax = b looks for a combination of columns of A that produces b. What vector will solve Ax = b for these right hand sides b? 13 Find two 3 by 3 matrices A and В with lhe same column space the plane of all vectors perpendicular to (1,1,1). Whal is the column space of A + В ? 14 Which numbers q would leave A with two independent columns ? 1 4 7 ‘ 1 3 0 1 2 9 2 4 2 5 8 3 6 q 5 8 2 0 2 4 0 4 5 0 9 15 Suppose A limes x equals b. If you add b as an extra column of A, explain why the rank r (number of independent columns) stays the same. 16 True or false (a) If the 5 by 2 matrices A and В have independent columns, so docs A 4- B. (b) If the m by n matrix A has independent columns, then tn > n. (c) A random 3 by 3 matrix almost surely has independent columns. 17 If A and В have rank 1, what are the possible ranks of A + В ? Give an example of each possibility. 18 Find the linear combination 3s 1 4- 4sa 4- 5вз = b. Then write b as a matrix-vector multiplication Sx. with 3,4,5 in z. Compute the three dot products (row of S) • z: 1 1 1 «2 = 1 1 0 я.з = 0 go into the columns of S 1
Chapter 1. Vectors and Matrices 28 19 Solve these equations Sy - fl 0 01 Г Vx 110 in 1 1 1J I». bwith«b*X’Js in ГИ _ 1 and 1 1J I1 the columns of the sum matrix S; The sum of the first 10 is_________ The sum of the first 3 odd numbers is----• Solve these three equations for tn. Jt2. P3 in tcrmS°f 12 10 0 1 1 0 1 1 1 V3 Wnte the solution V as a matnx A times the vector c. A is the "inverse matrix" S'1. Are the columns of S independent or dependent. 21 The three rows of this square matrix A are dependent- Then linear algebra says that the three columns must also be dependent. Find x in Ax = 0: 12 3 3 5 6 4 7 9 Row 1 + row 2 = row 3 Two independent rows Then only two independent columns 22 Which numbers c give dependent columns ? Then a combination of columns is zero. 1 1 0 3 2 1 7 4 c 1 0 c 1 1 0 0 1 1 c c c 2 1 5 3 3 6 [' и L4 CJ 23 If the columns combme mto Ax = 0 then each row of A has row x = 0: If ai Oj аз H ff = 0 0 then by rows 1 *1 1 M *- H H 1 Xs J 0 r3 • X The ta. Л. fc „ , b р1те ю ’O' 0 0
1.4. Matrix Multiplication and A = CR 29 1.4 Matrix Multiplication and A = CR f \ To multiply AB we need tow length for A = column length for B. 2 The number in row i. column j of AB is (row • of A) «(column j of B). 3 By columns: A times column j of В produces column j of AB. 4 Usually AB is different from BA. But always (AB) C = A (BC). If A has r independent columns, then A = CR = (m by r) (r by n). At this point we can multiply a matrix A times a vector z to produce Ax. Remember the row way and the column way. The output is a vector. Row way Dot products of z with each row of A Column way Ax = ziai + • • • + zna„ combination of the columns of A Now we come to the higher level operation in linear algebra: Multiply two matrices We can multiply AB if their shapes are right. When A has n columns. В must have n rows. If A ism by n and В is n by p, then AB ismbyp.m columns and p rows. The rules for AB will be an extension of the rules for Ax. We can think of the vector z as a matrix В with only one column. What we did for Az we will now do for AB. The columns of В are vectors [ z у x ]. The columns of AB are vectors [ Az Ay Ax ] In other words, multiply A times each column of B. There are two ways to multiply A times a column vector, and those give two ways to multiply AB: Dot products (row t of A) • (column j of B) goes into row». column j of AB Combinations of columns of A Use the numbers from each column of В We have dot products (numbers) or linear combinations of columns of A (vectors). For computing by hand. I would use the row way to find each number in AB. I “think” the column way to see the big picture: Columns of AB are combinations of columns of A. Example 1 Multiply AB =1 ! I 3 el **У* How many steps ? I 4 17 О I The dot product (row 1 of A) • (column 1 of B) is (1,2) • (5,7) = 5 + 14 = 19 , (Rows of A)-(columns of B) Г row 1 -col 1 - [ row 2 • col 1 row 1 • col 2 1 _ Г19 row 2 • col 2 J “ [43 221 50 J Abi and Ab? are Г 1 1 Г 2 1 „ 1 11 Г 2 11 Г 19 22 1 combinations of AB = 5 3 +7 4 6 3 +8 4 = аз 50 J the columns of A L J J L 4 J л J I 4 J J L4 J
2 b) 2 "UInCeS Mutaptytng s 61ви'ПрЬ*>10-' в h*s P columns, we have to nm]Up. * я by n mnp multiplications for (m by n) Tba __ j.9.2 = 8multiplications. Iж = Ps 1 л and В be multiplied with fewer than . Слм я by * pouiblf (allowing extra additions) “** ° noTesponert E in the multiplication count nE. k.i E = 2 0001 may be impossible. питх mulnpbcaoon. Explain why every vector да!ияда of AB—и also in the column space of A Еъ*-<яе2 тье identity matrix / has A7 » A and 7fi = Я if matrix sizes are right ?]-[“-]_х ,°"ayA 0 1 1 0 wiD exchange columns or exchange rows. Ехжпр» 3 The matrix E » Exchange columns of A (E is on the right) Exchange rows of В (E is on the left) AE * ЕЛ for moo matrices Exchange columns or exchange rows. ! !1 «. ts not the same as EA = j 2 ] „ “ "ПрППЛЯ f": AB “n faMlv b* different from BA We must keep matrices AB ot ABC in order. . examples where AB « BA. hit those special cases are not typical. I. Why is this true ? J"“ л Оив ВС. Matnx multiplication is “associative ‘«prod, became n u sounportant; (AB)C = A(BC). "-n® mvM oxy the same! But we can multiply AB fir* res T"*1 m 11ПСИ 11?ebri depend on this simple fact и asvKien/i mM m“Ch (,n n> * (n by p) x (p ЬУ te the^l^ *°”И ** (£Л)£ = Е(Л£)’ Exchan₽e A hnx The triple product EAE does both
1.4. Matnx Multiplication and A = CR 31 Rank One Matrices and A — CR All columns of a rank one matrix lie on the same line. That line is the column space C( A). Examples in Section 1.3 pointed to a remarkable fact: 77ie rows also lie on a line. When all the columns are in the same column direction, then all the rows are in the same row direction. Here is an example: 1 3 2 2 6 4 10 100 rank one matrix 30 300 = one independent column 20 200 one independent row ! All columns are multiples of (1,3,2). All rows are multiples of [ 1 2 10 100 ]. Only one independent row when there is only one independent column. Why is this true ? Our approach is through matrix multiplication. We factor A into C times R. For this special matrix, C has one column and R has one row. CR is (3 x 1)(1 x 4). AmCR 1 2 10 100 3 6 30 300 2 4 20 200 = 1’ 3 2 [1 2 10 100 ] (D The dot products (row of О • (column of R) arc just multiplications like 3 times 10. This is multiplication of thin matrices CR Only 12 small multiplications. The rows of A are numbers 1,3,2 tunes the (only) row [ 1 2 10 100 ] of A. By factoring this special A into one column times one row. the conclusion jumps out: If the column space of A is a line, the row space of A is also a line. One column in C, one row in A. That is beautiful, but we are certainly not finished. Our big goal is to allow r columns in C and to find r rows in A And to see A ” CR C Contains Independent Columns Suppose we go from left to right, looking for independent columns in any matrix A: If column 1 of A is not all zero, put it into the matrix C If column 2 of A is not a multiple of column 1. put it into C If column 3 of A is not a combination of columns 1 and 2, put it into C. Continue. At the end C will have r columns taken from A. That number r is the rank of A The n columns of A might be dependent The r columns of C will surely be independent Independent No column of C is a combination of previous columns columns No combination of columns gives Cx = 0 except x = all zeros When those independent columns combine to give all columns, we have a basis
Chapter 1. Vectors and Matrices 32 , + «(column2ofC) + ••• = zero vector. Cx = 0 means that г,(соЬиппЫ^ With independent columns, th.s onlyhД w0Uld be a comb.nat.on of the earlier by the last nonzero coefficients and ma columns—which our construction forbids. ’ 2 Example 7 Л = 6 4 4 12 8 1 3 5 Column 1 goes leads to C - 2 4 1 4 8 5 C. Column 2 ° “““3 & Matrix Multiplication C times R ._rn n tells how to produce the columns of A from t^cXTns column Of A is actually in C so the first column of R just has 1 and 0. The third column of A is also in C. so lhe third column of R just has 0 and 1. Rank 2 Notice I inside R 2 6 4 1 2 12 8 = 4 3 5 J 1 4 8 5 [ 1 ? 0 [ 0 ? 1 = СЛ (2) Two columns of A went straight into C, so part ofRis lhe identity matrix. The question marks are in column 2 because column 2 of A is not in C. It was not an independent column. Column 2 of A is 3 limes column 1. That number 3 goes into R. Then R shows bow to combine the two columns of C to get all three columns of the original A. A is m x n C is m x r R is r x n A = CR is ’2 6 4 4 12 8 1 3 5 = 2 4 4 8 1 5 f 1 3 ° [OOl This completes A = CR. The magic is now seen in the rows. All the rows of A come from the rows of R. This fact follows immediately from matrix multiplication CR: RowlofAis 2 (row 1 of Л) + 4 (row 2 of R) Row2ofAis 4(row 1 ofЯ) +8(row2ofR) Row3ofAis 1 (row 1 of 7?) + 5 (row 2 of R) I» O.I, г .«lopo— w „3 33,O in л combjne w 8ive of л 3 6 9 Multiply CR using rows of R Second example of A = CR from the front cover When a column of Л goes into C, a < 1 4 7 2 5 8 1 4 7 2 ’ 5 8 1 0 -1 0 1 2 (4) of R tells us how to produce the depend™? 8°es into R. The “free” column -1,2 tn . Column 3 of A is —j (14 7) >9 (9 r ° ^гот l^e independent columns Column j rfX - CUm. “ *>= <* * «Г Я c times column j of R. R • f ,w,of4 = rows of Ctimes R
1.4. Matrix Multiplication and A = CR 33 Question If all n columns of A are independent, then C = A. What matrix to Я ? Answer This case of n independent columns has R = I (identity matrix). The rank is n. How to find R. Start with t independent columns of A going into C. If column 3 of A = 2nd independent column in C, then column 3 of Я is = CR All three ranks = 2 Dependent: If column 4 of A « columns 1 + 2 of C, then column 4 of Я is J Я tells how to recover all columns of A from the independent columns in C. 1 2 3 4 1 _ [ 1 3 1 Г 1 2 0 11 1 2 4 5 j [ 1 4 ] I 0 0 1 1] Here is an informal proof that row rank of A equals column rank of A 1. The r columns of C are independent (by their construction) 2. Every column of A is a combination of those r columns of C (because A » CR) 3. The r rows of Я are independent (they contain the r by r matrix I) 4. Every row of A is a combination of those r rows of Я (because A = CR) Key facts The r columns of C are a basis for the column space of A: dimension r The r rows of Я are a basis for the row space of A: dimension r Those words “basis” and “dimension” are properly defined in Section 3.41 Section 3.2 will show how the same row matnx Я can be constructed directly from the "reduced row echelon form” of A, by deleting any zero rows. Chapter 1 starts with independent columns of A, placed in C. Chapter 3 starts with nows of A and combines them into R. We are emphasizing CR because both matrices are important. C contains r indepen- dent columns of А. Я tells how to combine those columns to give all columns of A. (Я contains I. when columns of A are already in C.) Chapter 3 will produce Я directly from A by elimination, the most used algorithm in computational mathematics. This will be the key to a fundamental problem: solving linear equations Ax = b Why is Matrix Multiplication AB Defined This Way ? The definition of AB was chosen to produce this crucial equation: (AB) times x is equal to A times Bx. This leads to the all-important law (AB)C = A(BC). We had no other reasonable choice for AB ! Linear algebra will use these laws over and over. Let me show in three steps why that crucial equation (AB)x = A(Bx) is correct: Bx is a combination xibi + x2bj + • • • + x»bn of the columns of B. Matrix-vector multiplication is linear: A(Bx) = xj Abj + xjAbj + • • • + xn(Abn). We want this to agree with (AB)x = xi(cohimn 1 of AB) + ••• + xn(column n of AB). Compare lines 2 and 3. Column I of AB absolutely must equal A times column 1 of B. This is our rule: When В = [ x у x ] the columns of AB are [ Ax Ay Az].
Chapter 1- Vectors and Matrices 34 (AB)x = A(Bx) to erase the parentheses. _[.;i ,-[•:] ЛВ‘ 1 AB: Example 8 A — When we show that (AB* ____ (AB)z = 4(Ях)-[3 4 J L 7 The parentheses don’t matter but the ВАС and ACВ almost always give differ» order ABC certainly does matter. The multiplications answers. In fact ВАС may be impossible. Columns of A times Rows of В . . in Wid this message. There is another way to multiply Before this chapter ends. always) This way is not so well known, but it - - - “ bi AB = b‘n columns ak rows bt columns a* times rows b’k (5) Those matrices akb[ are called outer products We recognize that they have rank one: column times row They are entirely different from dot products (rows times columns, also known as inner products). If A is an m by n matrix and В is an n by p matrix, adding columns times rows gives the same answer AB as rows times columns. Actually they involve the same mnp small multiplications but in a different order I (Row) -(Column) mp doc products, n multiplications each total mnp (Column) (Row) n rank one matrices, mp multiplications each total mnp Columns x Rows 1 4 [ 7 8 9 1 T [7 8 9] '4' [10 11 12] for A times В 2 5 [10 11 12] = 2 + 5 3 6 3 6 Rank 1 -bRank 1 7 14 21 16 24 9 18 a io 60 44 48' 47 52 57' 55 60 = 64 71 78 66 72 81 90 99 (6) 18 multholication* <ПЛ₽ = sUn °f second line you see the 3 * 3 ТЪСЛ 9 «“*«“ ₽*« ^e correct answer AB. Two independent rowr ' T)^ °f ,s 2- Г*° "dependent columns, not three, inverse matrix it is not imemhl s яТ” ChafXCr *111 use different words. AB has no • uaonmvmWc. And laterinthe book: The determinant of AB is zen>.
1.4. Matrix Multiplication and A = CR 35 Note about the “echelon matrices*’ R and Rq We were amazed to learn that the row matrix R in A - CR is already a famous matrix in linear algebra! It is essentially the “reduced row echelon form” of the original A. MATLAB calls it rref (A) and includes m — r zero rows. With the zero rows, we call it Ro The factorization A = CR is a big step in linear algebra. The Problem Set will look closely at the matrix R, its form is remarkable. R has the identity matrix in г columns. Then C multiplies each column of Я to produce a column of A. Ro comes in Chapter 3. Example 9 A = a, a2 3a, + 4a2 » la, a2 * ° ® = CR. Here a, and a? are the independent columns of A. The third column is dependent— a combination of a, and a?. Therefore it is in the plane produced by columns 1 and 2. Al) three matrices A, C, R have rank r = 2. We can try that new way (columns x rows) to quickly multiply CR in Example 9: ColumnsofC CR= , 31 + aj[0 1 41»(a, a2 3a,+4a3]-A times rows of Я 1 J • l i i • * j Four Ways to Multiply AB = C (Row i of A) * (Column fc of B) = Number Си, t = lto3 k=lto4 12 numbers A times (Column к of B) = Column к of C к- 1Ю4 4 columns (Row t of A) times В » Row i of C I = 1 to3 3 rows (Column j of A) (Row j of B) = Rank 1 Matrix j = lto2 2 matrices Problem Set 1.4 1 Construct this four-way table when A is m by n and В is n by p. How many dot products and columns and rows and rank one matrices go into AB ? In all four cases the total count of small multiplications is mnp. 2 If all columns of A = [ a a a ] contain the same a / 0. what are C and R ?
Chapter I. Vectors and Matrices 36 3 4 5 6 7 в а ю и 12 Multiply A bmes В (3 examples) using dot products: each row times each column ' " f4](1 2 7 10 0 1 I 0 I 1 О о I 0 5 6 1 6 1 -1 1J Test the truth of the associative _ 1 ivelaw (AB)C = A(BC). ill (b) 3 1 1 0 1 2 0 1 Why is it impossible for a maim A with 7 columns and 4 rows to have 5 independent columns ? This is not a trivia) or useless question. Going from left to right, put each column of A into the matrix C if that column is not a combination of earlier columns: [2-216 1 -1 3 -3 2 1 3 0 0 1 C = a = 0 2 0 6 Find R in Problem 6 so that A — CR. If your C has r columns, then R has r rows. The 5 columns of R tell how to produce the 5 columns of A from the columns in C. This matrix A has 3 independent columns. So C has the same 3 columns as A What is the 3 by 3 matrix R so that A CR? What is different about В ? [ 2 2 2’ Upper triangular A - | 0 4 4 0 6 2 2 2' 0 0 4 0 0 6 0 Suppose A is a random 4 by 4 matrix. The probability is 1 that the columns of A are “independent" In that case, what art the matricesC and R in A = CR? Note Random matrix theory has become an important pari of applied linear algebra— especially for very large matrices when even multiplication AB is too expensive. An example of “probability 1” is choosing two whole numbers at random. The probability is 1 that they are different. But they could be the same ! Problem 10 is another example of this type. Suppose A is a random 4 by 5 matrix. With probability 1, what can you say about C and Я in A = СЯ ? in particular, which columns of A (going into C) are probably independent of previous columns, going from left to right ? Г-7Г ^7^77** * * 4 * 4 * of rank r = 2. Then factor A into CR = (4 by 2)(2by 4). Factor these matrices into A-CflsfmbvrMrk--I n .. tm oy r) (r by n). all ranks equal to r. Г i n «1 »- A,- “11 ranks equal to r. o| A4=[1 0 0 4 l0 2 2 0 0 12 3 0 13 5
1.4. Matrix Multiplication and A = CR 37 13 Starting from C= and /? = [2 4 ] compute CR and RC and CRC and RCR. 14 Complete these 2 by 2 matrices to meet the requirements printed underneath: 3 6 1 Г 6 l Г 2 If3 4 I 5 7 L 3 6 J [ -3 ] rank one orthogonal columns rank 2 A2 = I 15 Suppose A = CR with independent columns in C and independent rows in R. Explain how each of these logical steps follows from A = CR = (m by r) (r by n). 1. Every column of A is a combination of columns of C. 2. Every row of A is a combination of rows of R. What combination is row I ? 3. The number of columns of C = the number of rows of R (needed for CR ?). 4. Column rank equals row rank. The number of independent columns of A equals the number of independent rows in A. 16 (a) The vectors ABx produce the column space of AB. Show why this vector ABx is also in the column space of A. (Is ABx = Ay for some vector у 7) Conclusion: The column space of A contains the column space of AB. (b) Choose nonzero matrices A and В so the column space of AB contains only the zero vector. This is the smallest possible column space. 17 True or false, with a reason (not easy): (а) ИЗ by 3 matrices A and В have rank 1. then AB will always have rank 1. (b) If 3 by 3 matrices A and В have rank 3, then AB will always have rank 3. (c) Suppose AB — BA for every 2 by 2 matrix B. Then A = [ £ j cl for some number c. Only those matrices A commute with every B. 1 2 ‘ 3 4 1 0 18 Example 6 in this section mentioned a special case of the law (AB)C = A(BC). A = C « exchange matrix (a) First compute AB (row exchange) and also BC (column exchange). (b) Now compute the double exchanges: (AB)C with rows first and A(BC) with columns first. Verify that those double exchanges produce the same ABC. 19 Test the column-row multiplication in equation (5) to find AB and BA: ’10 0' 111' 111' 10 0' AB = 1 1 0 0 1 1 BA = 0 1 1 1 1 0 1 1 1 0 0 1 0 0 1 1 1 1 20 How many small multiplications for (AB)C and A(BC) if those matrices have sizes ABC = (4 x 3)(3 x 2) (2 x 1)? That choice affects the operation count.
Chapter 1. Vectors and Matrices 38 Thoughts on Chapter 1 . a. .he author s thoughts. But a lot of decisions go into Most textbooks don’t have a place f<* _ jumped nght int0 the subject, with sorting a new textbook This chapter g(xxj ideas ahcaj and discussion *^Pcndc“" , Herc two questions that influenced the writing, time to absorb, so why not get started ' Mere are । ч What makes this subject easy? AH the equations are linear. What makes this subject hard? So many equations and unknowns and ideas. Book rumples are small size But if we want the temperature at many points of an engine, there is an equation at every point: easily n = 1000 unknowns. I believe the key is to work right away with matrices Ax = b is a perfect format to accept problems of all sizes. The linearity is built into the symbols Ax and the rule is A(x + y) « Ax + Ay. Each of the m equations in Ax “ b represents a flat surface: 2x + by - 4z - 6 isa plane in three-dimensional space 2z + 5y - 4z + 7w 9 is a 3D plane (hyperp lane ?) in four-dimensional space Linearity is on our side, but there is a serious problem in visualizing 10 planes meeting in 11-dimensional space. Hopefully they meet along a line: dimension 11 - 10 - 1. An 11th plane should cut through that line at one point (which solves all 11 equations). Whai the textbook and the notation must do is to keep the counting simple Here is what we expect for a random m by n matrix A: m < n Many solution» or no solutions to the m equations Ax = b man Probably one solution to the n equations Ax = b _ m > n Probably no solation. too many equations with only n unknowns in x Л ca" •* combinations of The rank r teUs us the real size of our nmht r CWT,blM'>on of previous equations. The beautiful formula is A - f в / from independent columns and rows. The same ,s true for every column of C П—r ° = A (Bc)' C Tberefore(ДВ)СвЛ(ВС)
2 Solving Linear Equations Ax = b 2.1 The Idea of Elimination 2.2 Elimination Matrices and Inverse Matrices 2.3 Matrix Computations and A = LU 2.4 Permutations and Transposes The matrices in this chapter are square: n by n. Then Ax b gives n equations (one from each row of A). Those equations have n unknowns in the vector x. Often but not always there is one solution x for each b. In this case A has an inverse A"1 with A“*A = I and AA~l = 1. Multiplying Ax = b by A-1 produces the symbolic solution sc = A**b. This chapter aims to find that solution x, but not by computing A~ *. (That would solve Ax = b for every possible b.) We go forward column by column, assuming that A has independent columns. We only stop if this proves wrong. At the end we have triangular matrices L, U and x is easy to find. Ax м b is a universal problem in science and engineering and every quantitative sub- ject. There might be n = 10 equations—this is already beyond hand calculations. Many problems have n " 1000 or more—and we certainly don't want to find A-*. What we do need is an efficient way to compute the solution vector x. Here is an idea that goes back thousands of years (to China). Each step of “elimination" will produce a zero in the matrix. The original A gradually changes into an upper triangular U. Half of this matrix will be zero. A simple elimination matrix Etj produces one zero where row i meets column j. This is not exciting, it is just the natural way to simplify A. To describe all these steps we need matrices. This is the point of linear algebra! There are elimination matrices like E to reach U. And we multiply U by an inverse matrix I s f'1 to come back to A. Here are key matrices in this chapter of the book: Coefficient matrix A Elimination matrix Ey Permutation matrix P Upper triangular U Overall elimination E Transpose matrix AT Lower triangular L Inverse matrix A-1 Symmetric matrix S = Sr Our goal is to explain the elimination steps from A to EA = U to A = E — LU. (If the steps fail, this signals that Ax = b has no solution.) Every computer system has a code to find U and then x. Those codes are used so often that elimination adds up to the greatest cost in all of scientific computing. But the codes are highly engineered and we don’t know a better way to find x. 39
2. Solving Linear Equations Ax = b 40 2.1 The Idea of Elimination_______________________________ /1 Elimination subtracts tit times row ; from row i. Io turn А,у into zero. 2 Ax = bbecomes Ux = c(orelse Ax = b is proved to have no solution). 3 Ux = c is solved by back substitution and possible row exchanges. This chapter explains a systematic way lo solve Ax = b: n equations for n unknowns. The n by n matrix A is given and the n by 1 column vector b is given. There may be no vector x « (xi, xj........x.) that solves Ax = b, or there may be exactly one solution, or there may be infinitely many solution vectors x. Our job is lo decide among these three possibilities and to find all solutions Here are examples with n = 2. 1 2 3 Exactly one solution to Ax = b. In this case A has independent columns. The rank of A is 2. The only solution to Ax * 0 is x * 0. A has an inverse matrix A" *. The best case has a square matrix A (m = n) with independent columns. Then there is one solution x (one combination of lhe columns of A) for every vector b. E^P»* whh one .elution («,»)» (1,1) 2x + 3p - 5 Independent columns (2,4) and (3,2) 4т + 2y = 6 In other . in ib Л s*сж* b '* no* * combination of the columns of A. <n other word, b >. mx in the column qiace of A. The rank of A i, 1. Example with no solution _ „ Dependent column. (2,4) and (3,6) £ “ ° Subtract 2 times the first equation from ib V ° «I atmn from lhe second to get 0 - 3. No solution ^ert will be Infinitely many solution. a. a v - independent This is the meaning of d * = ° Whcn tlle columns of Л arc not 'he zero vector b > 0 Every cX co*un,ns ,П;||’У ways to produce . S,Tes Л'сД)»0. If there is one solution to Ax - A All the vector. z + cX solve Ihexam/" ** *“ Wy K>lu,ion ,o AX = 0 д, e*,ua,lons- so we have many solutions. ***"*~?*^ Ь+W- e m be added to the ro ° T**n = (6 -4) |U* ** more •options because r^to^^'*roluuonx L;0 2 JZ!VM = 0. AH vectors + ^^'^“ce-noresoludons: ""'ofso/urt^ w,rh'weqi<ariOTIJ Ax = b.
2.1 The Iilea of III ruination 41 This chapter will give a systematic way lo decide between those possibilities 1,2.3 One solution. no solution, infinitely nun у solutions this system IS called rllmtiiHlion It simplifies the matrix Л without changing any volution x to the equation lx b We do the same operations to both sides of the equation, and those opeiations ate reversible Elimination keeps all solutions x and creates no new ones Let me show you the ideal case, when elimination produces an uppet liiangulai matrix That matrix is culled //. Then /lx b leads lo Cx - c. which we easily solve Elimination reaches U Back substitution finds x 2 3 I О П 0 II () 7 Н» 17 II r That letter U stands for upper triangular The matrix has all zeros below its diagonal Highly important: The “pivots" 2. Г>. 7 on that main diagonal are not zero Then we can solve the equations by going from bottom lo lop ftrul t ।thru i г thru 1। Back substitution Work upwards Upwards again Conclusion Special note The last equation 7r । - 11 gives x* « 2 The next equation 'ix, + fit2) - 17 gives xj • I The first equation 2xt 4-3(1) ♦ 4(2) 19 gives X| I The only solution lo this example Гх cisx -(4.1.2). In solving for X|, x3. xj we needed to divide by live pivots 2, A, 7, These pivots were probably not on the diagonal of the original matrix A (which we haven’t seen). The pivots 2,5,7 were discovered when "elimination" produced lire lower triangular zeros in U. This crucial step from A to U I» still to be csplumed I Note We would not allow the number zero to be a pivot That would destroy our plnn because an equation like Oxi = 2orQrj « 5or0rj • Я has no solution Bink substitution will break down with a zero in any pivot position (on the diagonal of ft) the test lot independent columns (when Ca A in Chapter 1. and R*l, and Л = (‘H becomes I Л11 is n nonzero pivots. Every square matrix Л with independent columns (full rank) can he reduced lo a Irian gular matrix U with nonzero pivots. This is our job. It is possible that wc may need to put the equations Лх = b in a different order. We start with the usual case when elimination goes from A to U and back substitution finds the one and only solution vector r to Лх = b
Chapter 2. Solving Linear Equations Ax = A 42 Elimination in Each Column First comes . matrix A (independent columns) that will require no row exchanges. ’ 2 3 4 ' The starting matrix is A д _ 4 Ц 14 . The first pivot is 2 2 8 17 The first pivot row is [ 2 3 4 ]. Multiply that row by 2 and subtract from row 2: First step: Eji^is The multiplier was 4/2 - 2 ‘2 3 4 0 5 6 2 8 17 (3) This produced the desired zero in column 1. To produce another zero, we subtract row 1 from row 3. This completes elimination in column 1: Second step : The multiplier was 2/2 = 1 2 3 4 ‘ 0 5 6 0 5 13 (4) 2 and ™ 2 * * (lhe lecond pivo< row) The Piv0‘« 5- on ‘he diagonal. To eliminate the 5 below it, multiply row 2 by the number 1 and subtract from row 3 Phial : E31 EjiEhA is triangular The multiplier was 5/5 и 1 U = 3 5 0 4 6 7 (5) 2 0 0 on i» dnjuX wTk'nT» ”h« tai Fo™ml i complete Since U had 2.5,7 ongmal 4 were independent, as we will «ее/т^СП1 (“nd ,hereforc ,hc columns of the Wr can summarize the elimination tt, l ma,r*ce8 a”d U have full rank 3. -------------------------- 1 и fn no row exchanges are involved. Use the first equation Г U« the new second equation to Um" ’ be’°W ‘hC P'V°‘' Continue to column 3. The exoe t7 °U' ? Ь'1°* Р'Ш 2 TOW 2 L ~~-----------------------CC resull,san upper triangular matrix U. Elimination on A produces U The UXXT-X n"" ,id" appl'ed» "» "8"' ** " *"d - c (equivalent lo the old uon Th« gives the solution x.
2.1. Ilic Idea of Iiliininalк>n 43 Possible Breakdown of Elimination Elimination can fail. Zero <«« appear in a pivot position Subliiu ting that /cm from lower rows will not clear out the column below the unwanted zero Here is an example Zero In pivot 2 from 2 ’* J 2 1 ... Л . i . A«4 0l4-»0 0fl II elimination In column 1 i Mil II Ij I •> The cure is simple if it works. Exchange row 2 with the zero for row .4 with the A. Then the second pivot is 5 and we can clear out the second column lx-low that plvol So elimination continues as normal after the row exchange by the matrix Row exchange PH - I (It»' 0 0 I 0 I 0 3 4 1 Г 2 0 fl - 0 5 13 0 3 4 A 13 0 0 - u. 2 0 0 For this small example, the row exchange is all we need It produced lhe upper triangular U with nonzero pivots 2,5,6. Normally there are more columns anil rows lo work on. before we reach U. Caution! That row exchange was a success. This is what we hope for, lo reach U with no zeros on its main diagonal. (The pivots 2,5,0 are on the diagonal.) But a slightly different matrix Л* would lead to a bad situation: no pivot Is available In column 2 A* = 4 3 fl 3 14 17 (J 0 3 4 ' 0 fl 0 13 2 2 4 2 - V At this point elimination is helpless in column 2. No pivot is available. This misfortune tells us that the matrix A* did not have full rank. Column 2 of U* is in lhe same direction as column 1 of U‘. So column 2 of A* is in the same direction as column I of A’ You see how dependent columns are systematically identified by elimination There are nonzero solutions X to A’X » 0. The columns are not independent. This example has column 2 = | column 1. The solution vector X is (3. 2,0), The equation A'x = b may or may not be solvable, depending on b'. probably not Dependent or Independent Column* This A* looks like a failure of elimination: No second pivot. But it was a success because the problem was identified: dependent columns. The beauty of aiming for a triangular matrix U or U* is that the diagonal entries tell us everything. A triangular matrix U has full rank exactly when its main diagonal has no zeros. In that case (square matrix with nonzero pivots) the columns of U are independent. Also the rows are independent. We can see this directly because elimination has simplified the original A to the triangular U. How do we know that a zero on the diagonal ofU* leads to dependent columns 7
Chapter 1 Solving Linear Equations Ax = b 44 = upper triangular with an extra zero on its diagonal 0 0 0 * 1. The first three columns are dependent. 2. The last two rows are dependent. The Row Picture and the Column Picture The next pictures will show those three possibilities for Ax = b : No solution or one solution or infinitely many solutions. There are two ways to see this. We start with the rows of A and we graph the two equations: the row picture. r-2jt = -l z —2y = l Figure 2.1: Parallel lines mean no solution. Top line twice means many solutions. Intersecting lines give one solution. The solution is where the lines meet. If we had three equations for z.y, and z, those two lines would change to three planes. Each plane like 2x + + 3z = 9 would be in 3-dimensiona) space. This row picture becomes hard to draw The column picture is much easier in three or more dimensions. The column picture just shows column vectors: columns of A and also the vector b. We are not looking foe points where these vectors meet The goal of Ax = b is to combine lhe columns of A so as to produce the vector b This is always possible when the columns of A (n vectors in n-dimensional space) are mdependent Then the column space of A contains all vectors b in R". There is exactly one combination Ax of the columns that equals b Elimination finds that x The columns of A are independent Column 1 + Column 2 = b Then the solution is x = 1, у — 1 Construct b from the columns I combination Az of the columns of A.
2.1. The Idea of Elimination 45 Examples of Elimination and Permutation This chapter will go on to express the whole process using matrices. An elimination matrix E will act on A. In case zero appears in a pivot position. a permutation matnx P is needed The result is an upper triangular U and a new right hand side c. Then Ux = c is solved by back substitution. In reality a computer takes those steps (x = A\b in MATLAB». But it is good to solve a few examples—not too many—by hand. Then you sec lhe steps to Ux = c and not only the solution X. This page contains a variety of examples, hopefully lo show the way. 2 4 -2 4 9 2 0 -3 7 Those elimination steps Ел and E3i and Ek produced zeros in positions (2,1) and (3.1) and (3,2). The matrices E have —2 and +1 and —1 in those positions. The same steps must be applied to the right hand side b, to keep the equations correct. ’ 2 8 —> Eli b = 10 2' 4 —»E31E2ib = 10 ’ 2'1 4 —♦ E32E31 Ел b = Eb = c = 12 2 4 8 There is a better way to make sure that every operation on lhe matrix A (left side of equations) is also executed on b (right side of equations). The good way is to include b as an extra column with A. The combination [A b ] is called an augmented matrix. и 'i- (7) Now we include an example that requires a permutation matrix P. It will exchange equations and avoid zero in the pivot. This example needs P in column 2. Exchange rows 2 and 3 In the final description PA = LU of elimination on A. all the E’s will be moved to the right side. Each matrix in Еи^з1^и в inverted. Those inverses come in reverse order in L = E^1 E^i Ей'. The overall equation is PA = LU. Often no permutations are needed and elimination produces A = LU: the best equation of all. That permutation P23 exchanged rows 2 and 3 when it was needed to avoid a zero pivot But we could have exchanged rows 2 and 3 at the start. (Then Ел and £jt have to change places) Section 2.4 will return to understand all the possible permutations of n rows. There are n I possible matrices P. including P = I for norvw exchanges.
гъдмег 1 Solving Linear Equations Ax = b 46 Problem Set 2.1 Problems 1-10 are about riiminauoo on 2 by 2 systems. 1 2 3 6 Whar multiple in of equation 1 should be subtracted from equation 2 ? 2x + 3y = 1 lQj + 9y = ll. After elimination. write down the upper triangular system and circle the two pivots. Use back substitution to find X and у (and check that solution). It equation I is »AVri to equation 2, which of these are changed: the planes in the row picture, the vectors in lhe column picture, the coefficient matrix, the solution ? What multiple of equation 1 should be wbtrocted from equation 2 ? 2x - 4y - 6 -X + 5y = 0. After this elimination step, solve lhe triangular system. If the right side changes to (-6,0). what is the new solution? What multiple / of equation I should be subtracted from equation 2 to remove c? ax + by = j a + dy = g. The first pivot is a (assumed nonzero) Elimination produces what formula for the second pivot ? What is у 7 The second pivot is missing when ad-be: singular. Choose a nght side which gives no solution and another right side which gives infinitely many solutions What arc two of those solutions 7 Singular system 3x + 2y=|0 and 6x + 4g= 2“ “**'•• ни» « nai maxes it solvable Fmd (wo solutions in thal singular case. b + 6gs 16 Cr + 8y = g. <u + 3y = -3 Cr + бу = g For which three numbers 1- docs cilm exchange .’ Is the number of solutions n^T d°Wn ? Which ’» fixed by a row Ь + Зу« б
2.1. I he l<k«i Я HiiiiidjIi'M 47 9 What ICM Oil 6| jiuj (q decides Wilcdxi tllCSC I WO КЦииШть ell'/» U WvluUoO? Ho», шилу solutions will they have? Praw Ox column риши» for b - (| 2) w-4 H 4) if «• 2y = hi 4f 4 у - by. 10 Draw the lines j 4 // ; fj and t t 2</ - <> and die equation у - that comes from elimination Which line 5i 4(f < goes through t|n solution of lh*x equations ? Problems 11-20 study elimination on J by J systems (and poe&ible failure). 11 (Recommended) A system of linear equations c an't have exactly two solutions Why 7 (a) If (z, у, t) and (X, Y. Z] ate two solution», wliai is another solution? (b) If 25 planes meet al two points, where else do they meet? 12 Reduce to upper triangular form by row operations Then lind z, и 2x + Зр + г — И 2f - Uy - 3 4z + 7у + ht 20 4ж - by f < “ 7 - 2y + 2г « 0 2x - у - 'it « 5 13 Which number d forces a row exchange, and what is the li (angular system (not sin gular) for that </? Which d makes this system singular < no thud pivot) ? 2f + 5p + г = 0 4z + dy + x « 2 U - z “ 3. 14 Which number b leads later lo a row exchange? Which b leads to a missing pivot 7 In that singular case find a nonzero solution z, y, t. x + by -0 x - 2 у - x = 0 y + x = 0. 15 (a) Construct a 3 by 3 system that needs 2 row exchanges to become triangular (b) Construct a system that needs a row exchange and breaks down later 16 If rows I and 2 are the same, how far can you get with elimination (allowing row exchange)? If columns I and 2 are the same, which pivot is missing? Equal 2z - у + г - 0 2x + 2y + z = 0 Equal rows 2x - у 4- z = 0 4z + 4y + z = 0 columns 4z + у + x = 2 6x + 6y + z « 2. 17 Construct a 3 by 3 example that has 9 different coefficients on the left side, but rows 2 and 3 become zero in elimination. How many solutions to your system with b = (1,10,100) and how many with b = (0,0,0)?
аир» г Solving Linear Equations Ax = b 48 18 Which number q makes this system smguu many solutions? Find the solution that has side t gives it infinitely 3y + flZ = t 19 20 For which two numbers a will elimination fail on A = (j’j? For which three numbers a will elimination fail to give three pivots? Гп 2 3] • - —* - 4 is singular for three values of a. tai row sums 4 and 8, and column sums 2 and a: . Find two matrices with the Look for a matrix that The four equations are solvable only if t = _______ correct row and column sums. Write down the 4 by 4 system Ax = b with x = (a, b, c, d) and make A triangular by elimination Matnx a +1 = I а + c = 2 4 equations e + d « 8 b + d = i 4 unknowns 22 Create a MATLAB command A(2. ;) ... for the new row 2. to subtract 3 times row 1 from the existing row 2 if the 3 by 3 matrix A is already known. 23 Find experimentally the average 1 a and 2nd and 3rd pivot sizes from MATLAB s [L.17] ta(raad(3)) with random entnes between 0 and 1. The average of 1/(1,1) is above | because hi picks the largest pivot in column 1. 24 If the last corner entry is A(5,5) = 11 and lhe last pivot of A is 17(5,5) = 4. what different entry A(5,5) would have made A singular ? 25 Suppose elimination takes A to U without row exchanges. Then row j of U is « combination of which rows of A? If Ax = 0, is Ux = 0? If Ax = b. is Ux = b? If A starts out lower triangular, what is the upper triangular 17? 26 Start with 100 equations Az » 0 for 100 unknowns x = (zj,..., z10q). Suppose elimination reduces the 100th equation lo 0 = 0, so the system is “singular". (a) sywems Az = 0 have infinitely many solutions. This means that some linear combination of the 100 column, of A is__________ (b) Invent a 100 by 100 singular matnx with no zero entries. (c) Describe in words the row picture and column picture of your Ax = 0.
2.2. Elimination Malnccs and Inverse Matrices 49 2.2 Elimination Matrices and Inverse Matrices Elimination multiplies /I by Ел.En\ lhen Ей,..., En2,... as .4 becomes EA = I/. 2 In reverse order the inverses of the E’s multiply U lo recover A = E~lU. This is LU. ^3 A-1 A = / and (LU)~l = U~lL~l. Then Ax = b becomes ж — A' 'b = U~'L~'b All the steps of elimination can be done with matrices. Those steps can also be undone (inverted) with matrices. For a 3 by 3 matrix we can write out each step in detail—almost word for word. But for real applications, matrices are a much better way. The basic elimination step is to subtract a multiple ft) of equation j from equation t. We always speak about subtractions as elimination proceeds. If the first pivot is atl = 3 and below it is ал “ —3. we could just add equation 1 to equation 2. Thai produces zero. But we stay with subtraction: subtract Gt = -1 times equation 1 from equation 2. Same result. The inverse step is addition. Compare equation (10) to (I I) to see it all. Here is the matrix that subtracts 2 times row 1 from row 3: Rows 1 and 2 stay lhe same. Elimination matrix E,. _ Em = 0 10 Row 3, column 1, multiplier 2 q । If no row exchanges are needed, then three elimination matrices Ell *nd Em and Era will produce three zeros below the diagonal. This changes A to the triangular U: A is 3 by 3 U is upper triangular E33E31 Ej>A = U (I) The number is affected by the fai and f31 that came first We subtract times row 2ofU (the final second row. not the original second row of A). This is the step that produces zero in row 3. column 2 of U. E3? gives the last step of 3 by 3 elimination. Example 1 En and Em subtract multiples of row 1 from rows 2 and 3 of A: 10 0' 1 0 O' 3 1 O' 3 1 o' two new EmEji A = 0 1 0 1 1 0 -3 1 1 = 0 2 1 zeros in (2) -2 0 1 0 0 1 6 8 4 0 6 4 column 1 To produce a zero in column 2. En subtracts 132 = 3 times the new row 2 from row 3: 1 0 0 3 1 o' 3 1 o' U has zeros (Ем)(Е31Е31А) = 0 1 0 0 2 1 = 0 2 1 = U below the (3) 0 -3 1 0 6 4 0 0 1 main diagonal Notice again: En is subtracting 3 times the row 0,2,1 and not the original row of A. At the end. the pivots 3,2,1 are on the main diagonal of U: zeros below. Example 4 will show the "inverse” of each elimination matrix Ey. This leads to the inverse of their product E = E»Em£m- That inverse of E is special. We call it L.
nnptrf 2. Solving Linear Equations Az - Inverse Matrices 50 „ a- Wr kwi for ш'кгеяеждХгьг'Л'1 of the satne size, such Suppose .4 is a square matrix. we кхж io> . . . th^1 times A equals I Whatever A does. A’* undoes Their product is the identity mamx—which does nothing to a vector, so A”* Az = X- But A~ mighlnot «Ш. The square matrix A needs independent columns to be invertible. Then A1 A = /. What a mams mostly does is to multiply a vector. Multiplying Ax = b by A~* gives A~lAx A"1*. Thu h x = A"1*- The product A 1A is like multiplying by a number and then dividing by thai number Numbers have inverses if they arc not zero. Matrices arc more complicated and more interesting The matrix A"1 is called "A inverse." DEFINITION The matrix A is invertible if there exists a matrix A 1 that “inverts" A: Two-sided inverse A-1 A = I and A A1 = /. (4) Not all matrices have invenn This is the first question we ask about a square matrix: Is A invertible? Its columns must be independent We don’t mean that we actually calculate A’1. In most problems we never compute it! Here are seven “notes" about A-1. Note I The inverse exists if and only if elimination produces n pivots (row exchanges are allowed). Elimination solves Ax = b without explicitly using the matrix A"'. 2 a^tn*A c“n0‘ *“** ,wo Afferent inverses Suppose BA = I and also AC « I Then В = C. according to this "proof by parentheses": 0(AC)-(SA)C gives BI-IC or В-C. (5) (multiplying А ' mu^JPb’’ng A from the left) and a rightinverse C oiupiymg A from the ngta to give AC - I) must be the some matrix Then ж = А-‘Ах A has dependent columns, ft fonno. h“ ’U"'''fro rector x such that Ax = 0. Then If A is invertible, then Ax - n ' N° П“‘Г’Х bring ° b“* *° *' N«“ , “'’"'““'“""«independen.. (6) ** " 01 1 lc и r,nunanto(A Amar' ” Pivots is usually decial.'nvert’^e *ts determinant is not 1 before the determinant appears-
2.2. Elimination Malnc.es and Imerse Мжпсс* 51 Note 7 A triangular matrix has an unenc pros ided no diagonal entries d, are zero: If A = f1 с e о о о • X о • x X X X X к I then A"1 = 1/dj XX X 0 • x x 0 0 • x 0 0 0 1/dn Example 2 The 2 by 2 matnx A = } J is nor invertible, it fails the test in Note G. because ad = be. it also fails the test in Note 4. because Ax « 0 when x — (2, — 1). It fails to have two pivots as required by Note 1. its columns arc dependent Elimination turns the second row of this matnx A into a zero row. No pivot Example 3 Three of these matrices are invertible, and three arc singular Find the im erve when it exists. Give reasons for noninvertibihty (zero determinant. too few pivots, nonzero solution to Ax = 0) for the other three The matrices are in the order A.B.C.D.S.T: A is not invertible because its determinant is 4 • 6 — 3 • 8 • 24 - 24 • 0. D is not invertible because there is only one pivot; the second row becomes zero when the first row is subtracted T has two equal rows (and the second column minus ihe first column is zero), in other words Tx » 0 has the nonzero solution x = (-1.1.0). Not invertible. The Inverse of a Product AB For two nonzero numbers a and 6. the sum a + b might or might not be invertible The numbers a = 3 and b = -3 have inverses | and —Their sum a + b = 0 has no inverse. But lhe product ab = -9 does have an inverse, which is | times For matrices A and B, the situation is similar. Their product AB has an inverse if and only if A and В are separately invertible (and the same size). The important point is that A-1 and B~l come in reverse order: If A and В are invertible (same size) then the inverse of AB is (AB)-1 = B-'A-1 (АВ)(В-'А~1) = A IA~' = AA-1 = / (7)
52 CMptrr г Solving Linear Equations Ax = b м BB-'«. Si«l«1y В-Л-' limes ЛВ equsls I. 0-1 а-1 . basic гак of mathematics: Inverses come m reverse order. й ‘ 1 if vou put 00 socks and then shoes, the first to be taken off It is also common sense, u sou рш vu aretbe_____. The same reverse order applies to three or more matrices: (ABC)'1 = (8) Revtrse order Example 4 Inverse of ал ehmumtun matrix If E subtracts 5 times row 1 from row 2, then E"1 adds 5 times row 1 to row 2: E subtracts E-'adds E 1 0 O' 5 1 0 0 0 1 Multiply EE 1 to get the identity matrix /. Also multiply E~lE to get /. We are adding and subtracting the same 5 times row 1. If AC-I then automatically CA = I. For square matrices, an inverse on one tide it automatically an inverse on the other side. Example 5 Suppose F subtracts 4 times row 2 from row 3, and F~1 adds it back: 0 0 1 1 0 o' 0 1 о 0 4 1 roZ^EHFE A,l° ти|‘Ф'У E'1 I,mcs F 0 1 0 F£« -s 1 20 -4 1 0 O' 5 1 0 ° 4 !. inverse doesn't. order pf wbtracts 4 times the new row 2 (changed W an effect of size 20 from row 1. First F"1 adds 4 times row 2 to _____________________________________ 1 ло rffectfrom row 1. •be triangular U to the original A. ,a triangular L: Equation (11) below. (9) In the order £-1^-1 л _ .............................................................. 4 umcs row £ и/ ge again In this order E'tp-i ^сгс is no 20, because row 3 doesn’t ^“bwhywedxxne A 1n’ ' 'ffM fnm L "”,,tlpl‘CT' f*“ -mo Place'Л,10 the tnamtular U th* ^ein.l A. 1 is special.
2.2. Elimination Matrices and Inverse Matrices 53 L is the Inverse of E E is the product of all the elimination matrices Et), taking tn from A to its upper triangular form EA = U. We arc assuming for now that no row exchanges arc involved (thus P = /). The difficulty with E is that multiplying all the separate elimination steps Et) docs not produce a good formula. But the inverse matrix E~* becomes beautiful when wc multiply the inverse steps E~^. Remember that those steps come in the opposite order. With n = 3, the complication for E = EyjEsi Ell is in the bottom comer: .(10) 1 Watch how that confusion disappears for E 1 = L. Reverse order is the good way: E"‘ = ’ 1 61 1 ’ 1 0 1 1 0 1 = ’ 1 4i 1 I. (ID 0 0 >. 4» 0 1 0 1 4i 4a 1 All the multipliers 1ц appear in their correct positions in L. The next section w ill show that this remains true for all matrix sizes. Then EA = U becomes A = LU. Equation (11) is the key to this chapter: each t„ in its place. Problem Set 2.2 Problems 1-11 art about elimination matrices. 1 Write down the 3 by 3 matrices that produce these elimination steps: (a) Eh subtracts 5 times row 1 from row 2. (b) Ем subtracts -7 times row 2 from row 3. (с) P exchanges rows 1 and 2. then rows 2 and 3. 2 In Problem 1. applying E31 and then Ем «о b = (1,0,0) gives ЕмЕ^Ь =___________. Applying E32 before Eji gives ЕцЕ^Ь = . When Ejj comes first, row_______feels no effect from row____. 3 Which three matrices Eii, Ем. Em put Л into triangular form U ? Multiply those Es to get one elimination matrix E. What is E-1 = L?
Chapter 1 Solving Linear Equation* Ax « (> 54 4 5 6 7 8 9 10 11 12 13 14 IS n n 01 m a fourth column tn Problem 3 to produce | A b]. Carry out Include b - (1< Ou)*5 matrix to solve Ax = b. the elimination steps on this augmented m 7 MW1 the third pivot is 5. if you change nM to 11. the third pivot is _____И you unangc ujj --------- If every column of A is a muluple of (1.1.1). then Ax is always a multiple of (1,1,1). Do a 3 by 3 example. How many pivots are produced by elimination? Suppose E subtracts 7 times row 1 from row 3. (a) To invert that step you should_7 times row---------to row----------- (b) What “inverse matrix" E~1 takes that reverse step (so E~1E = IYf (c) If the reverse step is applied first (and then E) show that EE~1 - /. The determinant of M = [; *] is det Af = ad - be. Subtract f times row 1 from row 2 to produce a new Af*. Show that detU‘ = det Af for every t. When t • c/a, the product af pnoa equals the determinant: (a}(d - fb) equals ad - be. (a) Eji subtracts row I from row 2 and then Ри exchanges rows 2 and 3. What matnx Af = РпЕц does both steps at once? (b) Pn exchanges rows 2 and 3 and then Ejt subtracts row I from row 3. What rrutn* = ^lPa *« h«h «П» M once? Explain why the Af s are the tame but the Es art different hat matrix addt row 1 to row 3 and at the samf llme row 3 to row । 7 (b) row ) to row 3 and rben adds row 3 to row I? Create a matrix that has an «n^ =. .... phots without row exchanges (The 2 C ,mina,,on produces two negative Fw‘he*“perm«a6on matnets" . nd P by trial and error (with 1 's and 0’s): _ [0 0 1] 0 1 01 and P = 0 0 11 • 1 0 0 L Pe 0 1 о .1 0 0 “lurnn (t,z) of A-«. Check АЛ'1. t—- •'’’J 1*1 I-1 Find an upper triangular U (j»t diagonal) with lP = I. Then U-1 e (a) If A ts urvernble and AB = AC. prove quickly that В = C- (b) И А = Щ]. find two differ------------
2.2. Elimination Matrices and Inverse Matrices   55

16  (Important) If A has row 1 + row 2 = row 3, show that A is not invertible:
    (a) Explain why Ax = (0, 0, 1) cannot have a solution. Add equation 1 + equation 2.
    (b) Which right sides (b1, b2, b3) might allow a solution to Ax = b?
    (c) In the elimination process, what happens to equation 3?

17  If A has column 1 + column 2 = column 3, show that A is not invertible:
    (a) Find a nonzero solution x to Ax = 0. The matrix is 3 by 3.
    (b) Elimination keeps column 1 + column 2 = column 3. Explain why there is no third pivot.

18  Suppose A is invertible and you exchange its first two rows to reach B. Is the new matrix B invertible? How would you find B^{-1} from A^{-1}?

19  (a) Find invertible matrices A and B such that A + B is not invertible.
    (b) Find singular matrices A and B such that A + B is invertible.

20  If the product C = AB is invertible (A and B are square) then A is invertible. Find a formula for A^{-1} that involves C^{-1} and B.

21  If the product M = ABC of three square matrices is invertible, then B is invertible. (So are A and C.) Find a formula for B^{-1} that involves M^{-1} and A and C.

22  If you add row 1 of A to row 2 to get B, how do you find B^{-1} from A^{-1}?

23  Prove that a matrix with a column of zeros cannot have an inverse.

24  Multiply [ a b ; c d ] times [ d -b ; -c a ]. What is the inverse of each matrix if ad ≠ bc?

25  (a) What 3 by 3 matrix E has the same effect as these three steps? Subtract row 1 from row 2, subtract row 1 from row 3, then subtract row 2 from row 3.
    (b) What single matrix L has the same effect as these three reverse steps? Add row 2 to row 3, add row 1 to row 3, then add row 1 to row 2.

26  If B is the inverse of A², show that AB is the inverse of A.

27  Show that A = 4*eye(4) - ones(4,4) is not invertible: Multiply A*ones(4,1).

28  There are sixteen 2 by 2 matrices whose entries are 1's and 0's. How many of them are invertible?

29  Change I into A^{-1} as elimination reduces A to I (the Gauss-Jordan idea).

30  Could a 4 by 4 matrix A be invertible if every row contains the numbers 0, 1, 2, 3 in some order? What if every row of B contains 0, 1, 2, -3 in some order?
П^рит г Solving Linear Equations Ax = Ь 2 1 11 2 "J Л=121 and -1 2 1 1 1 2j l-l "I 2J Use Gauss-Jordan elimination on |U /] to find the upper triangular I/-1: UWl-I 0 O' 1 0 0 1 1 a b 0 1 c 0 0 1 True or false (with a counterexample if false and a reason if true): (a) A 4 by 4 matnx with a row of zeros is not invertible. (b) Every matnx with l's down the main diagonal is invertible. (c) If A is invertible then A-1 and A2 are invertible. (Recommended) Prove that A is invertible ifo#Oanda#b (find the pivots or A"*). Then find three numbers c so that C is not invertible: a b b a a b a a a 2 c c c 8 7 c c c C = ‘ Wdtaj^do.on IЛ 1|. Exlend “permutation matnccs" Show th» p / but in any order. They are “™»> Ky Ml 1 Леи block meinee»; 00 • 3 by 3 rneinx ml) you if л jj jnvenibk ?
2.3. Matnx Computation and A = LU 57 2.3 Matrix Computations and A = LU 1 The elimination steps from Л to U cost |n3 multiplications and subtractions. 2 Each right side b costs only n3: forward to Ux = c. then back-substitution for x. 3 Elimination without row exchanges factors A into LU (two proofs of Л « LU}. How would you compute the inverse of an n by n matnx Л ? Before answering that ques- tion I have to ask: Do you really want to know Л * 1 ? It is true that the solution to Ax = b (which we do want) is given by x = Л“’Ь. Computing Л-’ and multiplying Л"'Ь is a very slow way to find x. We should understand Л-* even if we don’t use it. Here is a simple idea for Л-*. That matnx is the solution to ЛЛ-’ = I. The identity matrix has n columns ei.ej,Then ЛЛ_| = I is really n equations Лх* e* for the n columns хц of A~ *. We have three equations if the matrices are 3 by 3: We are solving n equations and they have the same coefficient matnx A. So we can solve them together. This is called “Gauss-Jordan elimination”. Instead of a matrix [ A b j augmented by one right hand side b, we have a matrix [ A I I augmented by n right hand sides (the columns of /). And elimination produces [ I Л*1 ]. The key point is that the elimination steps on A only have to be done once. The same steps are applied to the right band side—but now АЛ_| = I has n right hand sides. The n solutions x, to Ax, = e, jo into the n columns of A*1. Then Gauss-Jordan takes [ A I ] into [ I A-1 ]. Here elimination is multiplication by A-1. In this example A subtracts rows and A-1 adds. This is linear algebra's version of the Fundamental Theorem of Calculus: Derivative of integral of f(z) equals f(x). The Cost of Elimination A very practical question is cost—or computing time. We can solve 1000 equations on a PC. What if n = 100,000? (Is A dense or sparse?) Large systems come up all the time in scientific computing, where a three-dimensional problem can easily lead to a million unknowns. We can let the calculation run overnight, but we can’t leave it for 100 years.
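A rough timing experiment makes the point. This sketch uses NumPy's dense solver; the factor of about 8 when n doubles is what the operation count below predicts, though the actual times depend on the computer and the library:

    import time
    import numpy as np

    rng = np.random.default_rng(0)
    for n in (500, 1000, 2000):              # doubling n should multiply the time by about 8
        A = rng.standard_normal((n, n))      # a full (dense) random matrix
        b = rng.standard_normal(n)
        start = time.perf_counter()
        x = np.linalg.solve(A, b)            # elimination: about n**3/3 multiplications
        print(n, round(time.perf_counter() - start, 4), "seconds")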
58   Chapter 2. Solving Linear Equations Ax = b

The first stage of elimination produces zeros below the first pivot in column 1. To find each new entry below the pivot requires one multiplication and one subtraction. We will count this first stage as n² multiplications and n² subtractions. It is actually less, n² - n, because row 1 does not change.

The next stage clears out the second column below the second pivot. The working matrix is now of size n - 1. Estimate this stage by (n - 1)² multiplications and subtractions. The matrices are getting smaller as elimination goes forward. The rough count to reach U is the sum of squares n² + (n - 1)² + ··· + 2² + 1².

There is an exact formula (1/3)n(n + 1/2)(n + 1) for this sum of squares. When n is large, the 1/2 and the 1 are not important. The number that matters is (1/3)n³. The sum of squares is like the integral of x²! The integral from 0 to n is (1/3)n³:

Elimination on A requires about (1/3)n³ multiplications and (1/3)n³ subtractions.

What about the right side b? Going forward, we subtract multiples of b1 from the lower components b2, ..., bn. This is n - 1 steps. The second stage takes only n - 2 steps, because b1 is not involved. The last stage of forward elimination (b to c) takes one step.

Now start back substitution. Computing xn uses one step (divide by the last pivot). The next unknown uses two steps. When we reach x1 it will require n steps (n - 1 substitutions of the other unknowns, then division by the first pivot). The total count on the right side, from b to c to x (forward to the bottom and back to the top) is exactly n²:

[(n - 1) + (n - 2) + ··· + 1] + [1 + 2 + ··· + (n - 1) + n] = n².      (2)

To see that sum, pair off (n - 1) with 1 and (n - 2) with 2. The pairings leave n terms, each equal to n. That makes n². The right side costs a lot less than the left side!

Solve      Each right side needs n² multiplications and n² subtractions.

How long does it take to solve Ax = b? For a random matrix of order n = 1000, a typical time on a PC is 1 second. The time is multiplied by about 8 when n is multiplied by 2. For professional codes go to netlib.org.

According to this n³ rule, matrices that are 10 times as large (order 10,000) will take a thousand seconds. Matrices of order 100,000 will take a million seconds. This is too expensive without a supercomputer, but remember that these matrices are full. Most large matrices in practice are sparse (many zero entries). In that case A = LU is much faster.

Proving A = LU

Elimination is expressed by EA = U and inverted by A = LU. Equation (11) in Section 2.2 showed how the multipliers ℓij fall into their right positions in E^{-1}, which is L. Why should we want to find a proof? We have just seen that pattern and believed it. A proof means that we have not only seen it, but understood it.
2.3. Matrix Computations and A = LU   59

The Great Factorization A = LU

Let me review the forward steps of elimination. They start with a matrix A and they end with an upper triangular matrix U. Every elimination step E_ij produces a lower triangular zero. Those steps E_ij subtract ℓij times equation j from equation i below it. Row exchanges (permutations) are coming soon but not yet. To invert one elimination step E_ij, we add instead of subtracting: E_ij^{-1} has +ℓij where E_ij has -ℓij.

Equation (10) in Section 2.2 multiplied E32 E31 E21 with a messy result:

E = E32 E31 E21 = [ 1 0 0 ; -ℓ21 1 0 ; ℓ32ℓ21 - ℓ31   -ℓ32   1 ]

Equation (11) showed how the inverses (in reverse order E21^{-1} E31^{-1} E32^{-1}) produced perfection:

E^{-1} = E21^{-1} E31^{-1} E32^{-1} = [ 1 0 0 ; ℓ21 1 0 ; ℓ31 ℓ32 1 ] = L

Then elimination EA = U becomes A = E^{-1}U = LU if we run it backward from U to A. These pages aim to show the same result for any matrix size n. The formula A = LU is one of the great matrix factorizations in linear algebra. Here is one way to understand why L has all the ℓij in position, with no mix-up.

The key reason why A equals LU: Ask yourself about the pivot rows that are subtracted from lower rows. Are they the original rows of A? No, elimination probably changed them. Are they rows of U? Yes, the pivot rows never change again. When computing the third row of U, we subtract multiples of earlier rows of U (not rows of A!):

Row 3 of U = (Row 3 of A) - ℓ31 (Row 1 of U) - ℓ32 (Row 2 of U).      (3)

Rewrite this equation to see that the row [ ℓ31  ℓ32  1 ] is multiplying the matrix U:

(Row 3 of A) = ℓ31 (Row 1 of U) + ℓ32 (Row 2 of U) + 1 (Row 3 of U).      (4)

This is exactly row 3 of A = LU. That row of L holds ℓ31, ℓ32, 1. All rows look like this, whatever the size of A. With no row exchanges, we have A = LU.
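Here is a minimal sketch of that elimination in Python with NumPy. It assumes no row exchanges are needed (every pivot is nonzero), and the 3 by 3 matrix is my own small example, not one from the text. Each multiplier ℓij is stored in L as it is used, and L times U rebuilds A:

    import numpy as np

    def lu_no_exchanges(A):
        """Factor A = LU by elimination, assuming every pivot is nonzero."""
        n = A.shape[0]
        L = np.eye(n)
        U = A.astype(float).copy()
        for j in range(n - 1):                  # pivot column j
            for i in range(j + 1, n):           # rows below the pivot
                L[i, j] = U[i, j] / U[j, j]     # the multiplier l_ij
                U[i, :] -= L[i, j] * U[j, :]    # subtract l_ij times the pivot row of U
        return L, U

    A = np.array([[2., 3, 4], [4, 11, 14], [2, 8, 17]])
    L, U = lu_no_exchanges(A)
    print(L)                        # multipliers 2, 1, 1 below the diagonal of L
    print(U)                        # upper triangular U
    print(np.allclose(L @ U, A))    # True: A = LU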
60   Chapter 2. Solving Linear Equations Ax = b

Second Proof of A = LU: Multiply Columns Times Rows

I would like to present another proof of A = LU. The idea is to see elimination as removing one column of L times one row of U from A. The problem becomes one size smaller.

Elimination begins with pivot row = row 1 of A. We multiply that pivot row by the numbers ℓ21 and ℓ31 and eventually ℓn1. Then we subtract from row 2 and row 3 and eventually row n of A. By choosing ℓ21 = a21/a11 and ℓ31 = a31/a11 and eventually ℓn1 = an1/a11, the subtraction leaves zeros in column 1.

Step 1 removes   ℓ1 (row 1 of A)   with   ℓ1 = (1, ℓ21, ..., ℓn1),   to leave a matrix A2 whose first row and first column are all zeros.

Now we face a similar problem for A2. And we take a similar step to reach A3:

Step 2 removes   ℓ2 (row 2 of A2)   with   ℓ2 = (0, 1, ℓ32, ..., ℓn2),   to leave A3 with zeros in its first two rows and first two columns.

In the first step we removed a column ℓ1 times a pivot row u1* of U. The second step removed a column ℓ2 times a pivot row u2*. Continuing in the same way, every step removes a column ℓj times a pivot row uj* of U. Now put those pieces back together:

A = ℓ1 u1* + ℓ2 u2* + ··· + ℓn un* = [ ℓ1 ℓ2 ··· ℓn ] [ u1* ; u2* ; ··· ; un* ] = LU.      (5)

This is the column times row way to multiply L times U (columns of L times rows of U) that was introduced in Section 1.4. A later section will review this important way to multiply matrices. Note that U is upper triangular: its row k begins with k - 1 zeros. And L is lower triangular with 1's on its main diagonal: its column k also begins with k - 1 zeros.
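The column-times-row proof can also be checked numerically. This sketch reuses the same small example as the previous sketch (my own, not the book's): the sum of the rank-one pieces, column j of L times row j of U, rebuilds LU exactly.

    import numpy as np

    L = np.array([[1., 0, 0], [2, 1, 0], [1, 1, 1]])     # factors from the sketch above
    U = np.array([[2., 3, 4], [0, 5, 6], [0, 0, 7]])

    pieces = [np.outer(L[:, j], U[j, :]) for j in range(3)]   # column of L times row of U
    print(pieces[0])                          # the rank-one piece removed by step 1
    print(np.allclose(sum(pieces), L @ U))    # True: adding the pieces gives LU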
2.3. Matrix Computations and A = LU   61

Elimination Without Row Exchanges

The next section is going to allow row exchanges P. They are necessary to move zeros out of the pivot positions. Before we go there, we can answer this basic question: When is A = LU possible with no row exchanges (and no zeros in the pivots)?

Answer   All upper left k by k submatrices of A must be invertible (sizes k = 1, ..., n).

The reason is that elimination is also factoring every one of those submatrices (k by k corners of A). All those corner matrices A_k agree with L_k U_k (k by k corners of L and U):

A = LU = [ L_k  0 ; *  * ] [ U_k  * ; 0  * ]   tells us that   A_k = L_k U_k.

Problem Set 2.3

Problems 1-8 compute the factorization A = LU (and also A = LDU).

1   (Important) Forward elimination changes [ 1 1 ; 1 2 ] x = b to a triangular [ 1 1 ; 0 1 ] x = c:

    x + y = 5          x + y = 5          [ 1 1 5 ]      [ 1 1 5 ]
    x + 2y = 7         y = 2              [ 1 2 7 ]  →   [ 0 1 2 ]

    That step subtracted ℓ21 = ______ times row 1 from row 2. The reverse step adds ℓ21 times row 1 to row 2. The matrix for that reverse step is L = ______. Multiply this L times the triangular system [ 1 1 ; 0 1 ] x = [ 5 ; 2 ] to get ______ = ______. In letters, L multiplies Ux = c to give ______.

2   Write down the 2 by 2 triangular systems Lc = b and Ux = c from Problem 1. Check that c = (5, 2) solves the first one. Find x that solves the second one.

3   What matrix E puts A into triangular form EA = U? Multiply by E^{-1} = L to factor A into LU:

    A = [ 2 1 0 ; 0 4 2 ; 6 3 5 ]

4   What two elimination matrices E21 and E32 put A into upper triangular form E32 E21 A = U? Multiply by E32^{-1} and E21^{-1} to factor A into LU = E21^{-1} E32^{-1} U:

    A = [ 1 1 1 ; 2 4 5 ; 0 4 0 ]
Chapter 2. Solving Linear Equations Ax = b 62 5 What three elimination matrices P* Л «*> * UPP" “Wul*- form ЕззЕз^Л = 1Л Multiply by e£. Ej> and E2l to factor A into L times У: 1 0 I Л= 2 2 2 3 4 5 L = £^1Е31'Е32*. 6 A and В are symmetric across the diagonal (because 4 = 4). Find their triple factor- izations LDU and say how I/ is related to L for these symmetric matrices: Symmetric and В = 4 12 4 o' 4 0 1 0 IRecommended) Compute L and U for the symmetric matrix Л: a a b b a b 6 a b c abed Find four condition» on a. 5, c, d lo get A LU with four pivots. Thh mmsymmetric matrix will have the same L as in Problem 10: Find L and U for a a a a b b b c c t d 9 fi“l л . II/ wil, piroB. Sd’'‘*“««*>4«»u.sBtal<.neiobe|/B_<.(i)en(|e! 10 and l/ = K* safety multiply LU and solve 4, » a r , . b “ Circle c when you see it. S°**e f*C - Ь to find c. Then solve// *пю|уе1/Ж = с1оЫх Wha(was4? and b = 11 I 0 0 *> 1 1 0 Л i i 1 1 1 “d l/s о 1 i W'»* ««01 a, 4 5 6 and b = 0 0 1 steps to L. what matrix do you reach? r * 0 0 L~ fn 1 0 /» f» 1 1
2.3. Main* Computation.' and A = LU 63 (b) When you apply the same steps to /. what matnx do you get ? (c) When you apply the same steps to LU, what matrix do you get ? 12 If A = LDU and also A = L\D\U\ with all factors invertible, then L - L\ and D = D\ and U = Ui. "The three factor* are unique." Derive lhe equation L\XLD = DiUtU~x. Are the two sides triangular or diagonal? Deduce L = Li and U = Ut (they all have diagonal l’»J. Then D = Dt. 13 Tridiagonal matrices have zero entries except on lhe main diagonal and the two ad- jacent diagonals. Factor these into A = LU and A = LDL': 1 1 O' a a 0 A = 1 2 1 and A = a a + 6 b 0 1 2 0 b b + e 14 If A and В have nonzeros in the positions marked by x. which zeros (marked by 0) slay zero in their factors L and U1 'x X X x' A- X 1 X ° л 0 x x x 0 0 x x x x x O' n_ z * 0 * S" x 0 x x ° X X X 15 Easy but important. If A has pivots 5.9.3 with no row exchanges, w hat are the pivots for the upper left 2 by 2 submairix Aa (without row 3 and column 3 of A) ? Following the second proof of A - LU. what three rank I matrices add to A ? 0 ‘ 1 4 2 5 6 €|U| + £jua + €з**э e LU! columns multiply rows 17 Multiply LrL and LLT by columns times rows when the 3 by 3 lower triangular L has six l’s.
Chapter 2. Solving Linear Equations Ax = (, 64 2.4 Permutations and Transposes "1^ A permutation matrix P has the same rows as / (in any order). There are n! differentaJa' 2 Then Pz puts the components-п.-Л.x. in that new order. And PT equals /»-*. 3 iimns of A art rows of AT. The transposes of Az and AB are x1 A 1 and BTAT. 4 The idea behind AT is that Ax • у equals x A1 у because (Az)1 у = rTATy = XT(ATy). A symmetric matrix has ST = S. The product S = AT A is always symmetric. Permutations Permutation matrices have a 1 in every row and a 1 in every column. All other entries are zero. When this matrix P multiplies a vector, it changes the order of its components: ’ 0 0 1 z> 1 0 Circular shift of z 1,2,3U>3,1,2 Pz- о 0 1 0 xj *3 ®a pl м,а 11 и ““ * “ ““м" ₽'°”d < H<nantpccifc°*** 3' and 4' “ 24 peonuulions otsi» У permutations, when they multiply a vector x: Reverse the order Circular shift 0 0 0 I 0 0 1 0 0 1 0 0 Г о о 0 Even zo.zj before odd Z|,xj inthc Fast Fourier Transform 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 Г о о о Exchange rows 2 and 3 Exchange again 10 ««t 1,2,3,4 0 0 0 1 0 0 1 0 0 0 0 0 0 0 Half of the n I [^7 T?L- ““ “« -«KT. An pennutation “ fi”' example (exebn ° thc ma,ri* /• The last example ge 1 and 4. exchange 2 and 3) was even. ₽en"“‘«<Km»of«enaft « «change) was odd. Therowsofp,,, the columns of p~i P = lr>nsposeofp 0 0 1 1 0 0 0 1 0 0 0 1’ 1 0 0 J L ° i о = / 1
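The statements P^T = P^{-1} and "Px puts the components in the new order" are easy to test. A short sketch in Python with NumPy (the particular ordering is my own example, not one from the text):

    import numpy as np

    I = np.eye(4)
    P = I[[2, 0, 3, 1]]                  # rows of I in a new order: a permutation matrix

    x = np.array([10., 20, 30, 40])
    print(P @ x)                         # components of x in that same order: 30, 10, 40, 20
    print(np.allclose(P.T @ P, I))       # True: P^T equals P^{-1}
    print(np.allclose(P.T, np.linalg.inv(P)))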
2.4. Permutations and Transposes   65

Properties of Permutation Matrices

1. The n 1's appear in n different rows and n different columns of P.
2. The columns of P are orthogonal: dot products between different columns are all zero.
3. The product P1 P2 of permutations is also a permutation.
4. If A is invertible, there is a permutation P to order its rows in advance so that elimination on PA meets no zeros in the pivot positions. Then PA = LU.

The PA = LU Factorization: Row Exchanges from P

An example will show how elimination can often succeed, even when a zero appears in the pivot position. Suppose elimination starts with 1 as the first pivot. Subtracting 2 times row 1 produces 0 as an unacceptable second pivot:

A = [ 1 2 a ; 2 4 b ; 3 7 c ]  →  [ 1 2 a ; 0 0 b-2a ; 0 1 c-3a ]  →  exchange rows 2 and 3  →  U = [ 1 2 a ; 0 1 c-3a ; 0 0 b-2a ]

In spite of this zero, A is probably invertible. To rescue elimination, P will exchange row 2 with row 3. That brings 1 into the second pivot as shown. So we can continue. This matrix A is invertible if and only if b - 2a is not zero in the third pivot. Notice that if b = 2a, then row 2 of A equals 2 (row 1 of A). In that case A is surely not invertible.

We can exchange rows 2 and 3 first to get PA. Then LU factorization becomes PA = LU. The matrix PA sails through elimination without seeing that zero pivot:

P = [ 1 0 0 ; 0 0 1 ; 0 1 0 ]      PA = [ 1 2 a ; 3 7 c ; 2 4 b ] = LU   with   L = [ 1 0 0 ; 3 1 0 ; 2 0 1 ].

In principle we might need several row exchanges. Then the overall permutation P includes them all, and still produces PA = LU.

Daniel Drucker showed me a neat way to keep track of P, by adding a special column to the matrix A. That column tracks the original row numbers, as rows are exchanged. If we do exchanges on that column also, the final permutation P is easy to see. The same example has one row exchange in P:

[ 1 2 a | 1 ; 2 4 b | 2 ; 3 7 c | 3 ]  →  [ 1 2 a | 1 ; 0 0 b-2a | 2 ; 0 1 c-3a | 3 ]  →  [ 1 2 a | 1 ; 0 1 c-3a | 3 ; 0 0 b-2a | 2 ]

The tracking column ends in the order 1, 3, 2, so P has rows 1, 3, 2 of the identity matrix:

P = [ 1 0 0 ; 0 0 1 ; 0 1 0 ].
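Library code reaches the same kind of factorization. SciPy's lu returns P, L, U with A = PLU, so P^T A = LU and SciPy's P^T plays the role of the P above; it also performs the partial pivoting described next, so it takes 3 as the first pivot. A sketch with a, b, c chosen so that b - 2a is nonzero:

    import numpy as np
    from scipy.linalg import lu

    a, b, c = 5., 7., 11.                   # any numbers with b != 2a keep A invertible
    A = np.array([[1., 2, a], [2, 4, b], [3, 7, c]])

    P, L, U = lu(A)                         # SciPy's convention: A = P @ L @ U
    print(L)                                # 1's on the diagonal, every entry at most 1 in size
    print(U)                                # upper triangular, no zero pivots
    print(np.allclose(P.T @ A, L @ U))      # True: a "PA = LU" statement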
66   Chapter 2. Solving Linear Equations Ax = b

"Partial Pivoting" to Reduce Roundoff Errors

Even good code for elimination allows for extra row exchanges that add safety. Small pivots are unsafe! The code does not blindly accept a small pivot when a larger number appears below it (in the same column). The computation is more stable if we exchange those rows, to produce the largest possible number in the pivot.

This example had first pivot equal to 1, but column 1 offered larger numbers 2 and 3. The code will choose the largest number 3 as the first pivot: exchange rows 1 and 3. The order of rows is tracked by the last column; that column is not part of the matrix. All entries of L are ≤ 1 when each pivot is larger than all the numbers below it.

The Fast Fourier Transform

Our early example of an "evens before odds" permutation comes in the Fast Fourier Transform (FFT). The Discrete Fourier Transform is multiplication by the Fourier matrix F. The FFT may be the most important algorithm in computer science. Step 1 reduces a multiplication by F_1024 (with 1024² nonzeros) to two multiplications by F_512 (half size):

F_1024 = [ I  D ; I  -D ] [ F_512  0 ; 0  F_512 ] [ rows 0, 2, 4, 6, ... of I ; rows 1, 3, 5, 7, ... of I ]

Those zero submatrices cut the work in half (plus the small work of a diagonal matrix D and the even-odd permutation). Then the key idea is recursion: transform the evens and transform the odds. Each F_512 splits again into two copies of F_256, and onwards. From then on, the only multiplications will involve the diagonal D's, and the permutations combine into one overall P = product of an even-odd permutation at every step.

The FFT has log2(1024) = 10 steps. Every step costs (1/2)(1024) multiplications by a diagonal D. The total is (1/2)(1024)(10) multiplications instead of (1024)². That difference makes the FFT fast.
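Here is the recursion in code form: a minimal radix-2 FFT sketch in Python, for lengths that are powers of 2. It uses NumPy's sign convention exp(-2πik/n) for the Fourier matrix, so the result can be checked against np.fft.fft; it illustrates the even-odd idea and is not the text's own program.

    import numpy as np

    def fft(x):
        """Split into evens and odds, transform each half, combine with the diagonal D."""
        n = len(x)
        if n == 1:
            return x
        evens = fft(x[0::2])                 # transform of x0, x2, x4, ...
        odds = fft(x[1::2])                  # transform of x1, x3, x5, ...
        d = np.exp(-2j * np.pi * np.arange(n // 2) / n)    # the diagonal D
        return np.concatenate([evens + d * odds, evens - d * odds])

    x = np.random.default_rng(1).standard_normal(1024)
    print(np.allclose(fft(x), np.fft.fft(x)))              # True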
2.4. PemuiUtions and Transp/ve* 67 The I ranspose of A Wc need one more main», and fortunately it is much simpler than the inverse It is the “transpost" of A. which is denoted by A1. The columns of A1 are the nms of A. When A is an m by n matrix, the transpose is n by m. 3 by 2 becomes 2 by 3 Transpose If A = You can write the rows of A into the columns of AT. Or you can write the columns of A into the rows of Лт. The matrix “flips over" its main diagonal. The entry in row I. column j of AT comes from row j, column i of the original A: Exchange rows and columns The transpose of a lower triangular matrix is upper triangular. The transpose of Лт is A. Note MATLAB’s symbol for AT is A*. Typing [1 2 3 gives a row vector and the column vector is v = [1 2 3]*. The matrix Af with second column w = [ 4 5 6 |* is M = [ v w ]. Quicker to enter by rows and transpose: Af = [ 1 2 3; 4 5 6 ] *. The rules for transposes are very direct. We can transpose A + li to get (A 4- B)1. Or we can transpose A and В separately, and then add Л1 + В'—with the same result. The serious questions are about the transpose of a product A В and an inverse A 1: Sum The transpose of A + B is Ar + BT. (1) Product The transpose of AB is (AB)T = BTAT. (2) Inverse The transpose of Л"* is (Л-,)Т = (ЛТГ|. (3) Notice especially how BTAT comes in reverse order. For inverses, this reverse order was quick to check: B~lA~l times AB produces / because A~*A = B'lB = I. To understand (ЛВ)Т = BTAT. start with (Лх)т = xTAT when В is just a vector: Ax combines the columns of A while xrAr combines the rows of AT. It is the same combination of the same vectors! In A they are columns, in Лт they are rows. So the transpose of the column Ax is the row хтЛт. That fits our formula (Лт)т = xTAT. Now we can prove (ЛВ)Т = BTAr. when В has several columns. If В has two columns Xi and x2, apply the same idea to each column. The columns of AB are Л®! and Ax?. Their transposes appear correctly in the rows of BTAr : Transposing AB Ax\ Axj gives which is BTAT . (4)
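A quick numerical check of rules (1), (2), (3) with random matrices (a sketch in Python with NumPy; the random A here is invertible, which rule (3) assumes):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3))

    print(np.allclose((A + B).T, A.T + B.T))                      # sum rule (1)
    print(np.allclose((A @ B).T, B.T @ A.T))                      # product rule (2): reverse order
    print(np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T)))    # inverse rule (3)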
68 AB (5) __ 2. Solving Linear Equations A® = b HOT«"»"*enin(''B|T “ в’Л 41(I q r6 ,1 I [5 0] [5 °] and BTAT=|0 1 |o 1] |0 1] 4 if I9 'J t io three or more factors*. (ABC)T equals C1 В ’ A1. The reverse order role extends to A-»A = I. On one side. ,. A-‘A-I is transposed io A (A ) Transpose of inverse Л . ..пт ат _ r we can invert the transpose or we can SMbd, AX-‘ - I j „ .„,ПЫ. "«'I, Hl« л “ transpose the inverse Notice especially л - The inverse of A = . The transpose is AT - The Meaning of Inner Products The dot product (inner product) of z and у is the sum of numbers х,у(- Now *c тм. better way to wnte * • y. without using that unprofessional dot. Use matrix notation T binside Пй dor product or iiuw product u zT у (1 X n)(n X 1) s 1 x ' T b outside Theron* one product or outer product if zyT(n x 1)(1 x n) — n x n zTy is a number. zyT is a matrix Quantum mechanics would write those as < ®llf (inner) and |z><y| (ouier). Probably our universe is governed by linear algebra Here are three more examples where the inner product has meaning Work = (Movements) (Forces) = x 1 f Heal loss = (Voltage drops) (Currents) = e ’ V Income = (Quantities) (Prices) = qT p From mechanics From circuits From economics We ate redly ckne to the heart of mathematics, and there is one more point to We defined dT k'T” C°”'*ct,on be1wtcn inner products and the transpose of A There baX vJ? *’«*iU miinThat’s not mathematics There is a better i«y . AT ц (A»)Ty = «T(ATf) lnw = inner prwjuct of x ATV
24. Permutations .ind Transposes 69 Example 1 Sun with A - On one side we have Ax multiplying у lo produce (r2 x()yt +Ui x2)in Thai is the same as x> (—) + x2 (gi — щ) + *s (pH No* * •* multiplying A1 у Example 2 Will you allow a little cakulus? It is important or I wouldn'l leave linear algebra. (This is linear algebra for functions.) Change lhe matrix to a derivative: A = d/dt. The transpose of d/dt comes from (Ax/y - xT(A'y). First, the dot product хту changes from X! jn + ••••* x,y„ lo an uiirgrulof x(t)y(t). Inner product of functions x and у x1y»(x. y) = / x(f)y(f)dt by definition Transpose rule for functions (Ax)'y = xT(4Ty) / / 4-(f) J Ш J (6) I hope you recognize "integration by parts" The dens alive moves from the first function x(t) to the second function y(t). Dun ng that move, a minus sign appears This tells us that the transpost of A — d/dt iiAT = -A= —d/dL The derivative is anti symmetric. Symmetric matrices have Л’ Л. anti-symmetric matrices have AT — A. In some way. lhe 2 by 3 difference matnx in Example I followed this pattern. The 3 by 2 matrix Лт was minus a difference matnx. Il produced i/( - y2 in the middle component of ATy instead of the difference щ - yi- Integration by parts is deceptively important and not just a tnck. Symmetric Matrices For a symmetric matrix, transposing A to Лт produces no change. Inthiscasc A’ equals A. Its (J>>) entry across the main diagonal equals its (i.j) entry. In my opinion, these are lhe most important matrices of all. We give symmetric matrices the special letter S. A symmetric matrix has S* = S. This means that every a2< = ao. Symmetric matrices S» J2 5] “ Jo io] ~ • The inverse of a symmetric matrix is a symmetric matrix. The transpose of S-1 is (S~*)T = (5T)_| = S~l. When S is invertible, this says that S-1 is symmetric . Symmetric inverses S~l = J_2 jj and ' = Jo 0 1]' Now we produce a symmetric matrix S by multiplying any matrix A by AT.
Oupier 2. Solving Linear Equations A® = b 70 PnxlueuA^ and A AT and LDL^ л — xrлт A ^T^A is automatically a square symmetric m • “ ',т<лТ)Т *** , °’ —’а 1 о -1 1 -1 1 0 0 -1 1 in both orders. Example 3 Multiply A - q and ЛТА ’ 1 -1 -1 2 0 -1 o' -I are both symmetric matrices. 1 The product AAT b m by m. In the opposite order. ATA is n by n. Both are symmetric, with positive diagonal (why?). But even if m = n. it is very likely that A1 A / A A . Symmetric matrices tn elimination S^ = S makes elimination twice as fast, because we can work with half the matrix (plus the diagonal) The rymmrtry is in the triple product S = LDLT. The diagonal matrix D of pivots can be divided out, to leave I/ * /Л L U misses the symmetry of S Divide the pivots 1,3 out of U S = LDLr captures the symmetry A'ow U и the transpose of L For a rectangular A th» saddU-pouu matrix S u symmetric and important: Block matrix [/ д! from least squares AT 0 । S has size m + n. S = ShinvertiMe A* A it invertible Block elimination Subtract AT(row 1) The block pivot matrix D <==> Ax /0 whenever x / 0 isU. __ M ~AtA. Then L and LT contain AT and A: l.lW.ll 01 p 0 1 Г/ Л Iх /J [о -ЛТЛ о I
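A short sketch of symmetric elimination S = LDL^T in Python with NumPy. It assumes the pivots are nonzero and no row exchanges are needed, and the 3 by 3 matrix S is my own example. Each step removes (pivot) times (column of L)(column of L)^T, the symmetric version of column times row:

    import numpy as np

    def ldl_no_exchanges(S):
        """Symmetric factorization S = L D L^T, assuming nonzero pivots."""
        n = S.shape[0]
        L = np.eye(n)
        D = np.zeros((n, n))
        work = S.astype(float).copy()
        for j in range(n):
            D[j, j] = work[j, j]                        # the pivot
            L[j+1:, j] = work[j+1:, j] / work[j, j]     # multipliers: one column of L
            work[j:, j:] -= D[j, j] * np.outer(L[j:, j], L[j:, j])   # remove a rank-one piece
        return L, D

    S = np.array([[2., 4, -2], [4, 9, -3], [-2, -3, 7]])
    L, D = ldl_no_exchanges(S)
    print(np.diag(D))                       # the pivots 2, 1, 4
    print(np.allclose(L @ D @ L.T, S))      # True: the symmetry is in S = L D L^T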
2.4. Permutation* and Transpincs 71 Problem Set 2.4 Question* 1-7 are about the rule* for transpose matrices. 1 Find Л1 and Л 1 and (Л *)T and (Лт) 1 for Л = 2 Verify that (ЛB)T equals В1 A1 but those arc different front A1 BT : I e c 0 Show also that ЛЛТ is different from A1 A. But both of those matrices are 3 (a) The matrix ((AB)~1 )T comes from (Л'1)1 and (B~l)r. In what order! (b) Iff/is upper triangular then (I/-1 )T is__ triangular. 4 Show that Л2 = 0 is possible but Л1 A = 0 is not possible (unless A ® zero matrix). [12 3 4 5 6 (b) This is the row хтЛ _______times the column у (0,1.0). (c) This is the row xT = (0 11 times the column Ay = 6 The transpose of a block matrix А/ = [ * d I *s AfT " __ Test an example. Under what conditions on А, В, C, D is the block matrix symmetric ? 7 True or false: (a) The block matrix [ X о ] *s automatically symmetric. (b) If A and В are symmetric then their product AB is symmetric. (c) If Л is not symmetric then Л-1 is not symmetric. (d) When А, В, C are symmetric, the transpose of ABC is CBA. Questions 8-15 are about permutation matrices. 8 Why are there n! permutation matrices of order n? 9 If Pj and Pj are pennutation matrices, so is P\Pi This still has lhe rows of I in some order. Give examples with Pi Pa / Pi Pi and P3P4 = P4P3. 10 There are 12 “even” permutations of (1,2,3,4), with an even number of exchanges. Two of them are (1,2,3,4) with zero exchanges and (4,3,2,1) with 2 exchanges. List the other ten. Instead of writing each 4 by 4 matrix, just order the numbers. 11 If P has l’s on the antidiagonal from (l,n) to (n, 1), describe PAP. Note P = PT.
Chapter 2. Solving Linear Equations Az = (, 72 12 Explain why the dot product of X and у equals the dot product of Px and Py. ПепТрхАру) = *rV tells us that PTP = I for any permutation. x = (1,2,3) and у = (1,4,2) choose P to show that Px • у is not always x. Py 13 Which permutation makes PA upper triangular? Which permutations make P{ AP2 lower triangular? Multiplying .4 оя the right by P2 exchanges the___of Д 0 0 1 2 0 4 6' 3 5 14 find a 3 by 3 permutation matrix with P3 = 1 (but not P = I). Why can‘t P be >he____________Find a 4 by 4 permutation P with P* ji I. 15 'ШПСа "* symmctnc pT = P Then Р'Р = I becomes r = I. Other permutation matrices may or may not be symmetric. 19 17 18 (i) ">*<•*« PT *nds row_____ to row_____ P the row exchanges come in pain with no overlap. |Ь)^.<Ь><атр|.иа,рт.рам|п<|>о||||Го<дгта| ind Г1с|ог1и1(ою Л “ВВшВ,-*ил^Яштюкатстич1у,уттстс1 У'~В' (сМВЛ (d)ABAB (С) How many entries са h. J *'*' number of choic“ in LDLr ? v v«*u»ci can be chosen if j * W) Why doe. A*A h^e “ г^^Пс ЦА* = -A). F«ctor these symmetn numbcrs °"'t* diagonal ? rnetnc matrices into S » LDLT Th r • Tne pivot matrix D is diagonal: S. and S and s 19 2 -1 0 -1 2 -1 O' -1 2 (and ch*k them) for A 0 1 11 Ла 1 0 1 1 2 2 4 ,1 1 ^"Z^^^lhAlth , ’ what are^'f^ 3 exchanges to reach the torsPii, and (/? 2 3 and O' 1 1
2.4. Permutation* and Transposes 73 21 Prove that the identity matrix cannot be the product of three row exchanges (or five). It can be the product of two exchanges (or four). 22 If every row of a 4 by 4 matrix contains the numbers 0,1,2.3 in some order, can the matrix be symmetric? 23 Start with 9 entries in a 3 by 3 matrix A. Prove that no reordering of rows and reordering of columns can produce AT. (Watch the diagonal entries.) 24 Wires go between Boston, Chicago, and Seattle. Those cities are at voltages хц.хс. xg. With unit resistances between cities, the currents between cities are in y: VBC 1 -1 o' *B у « Ax is yes = 0 1 -1 *C VBS, 1 0 -1 (a) Find the total currents Лт у out of the three cities. (b) Verify that (Ax)Ty agrees with zT(ATy)—six terms in both. 25 The matrix P that multiplies (x.y.z) to give (*,x,y) is also a roution matrix. Find P and P3. The rotation axis a (1,1.1) doesn't move, it equals Pa. What is the angle of rotation from v = (2,3, -5) to Pv « (-5,2,3)? 26 Here is a new factorization A - LS = triangular rimes symmetric: Start from A LDU. Then A equals L times S = U* DU. Why is L (Ur)~x triangular ? Why is UrDU symmetric ? 27 In algebra, a group of matrices includes AB and A~x if it includes A and B. "Products and inverses stay in the group " Which of these sets are groups? Lower triangular matrices L with l’s on lhe diagonal, symmetric matrices S, positive matrices M, diagonal invertible matrices D. permutation matrices P, orthogonal matrices with QT « Q~x Invent two more matrix groups. Challenge Problems 28 If you take powers of a permuUtion matrix, why is some P* eventually equal lo /7 Find a 5 by 5 permutation P so that the smallest power to equal I is P*. 29 (a) Write down any 3 by 3 matrix M. Split M into S + A where S = S1" is symmetric and A = — AT is anti-symmetric. (b) Find formulas for S and A involving M and AfT. We want Af — S + A. 30 Suppose QT equals Q~1 (the transpose equals the inverse, so QTQ = Г). (a) Show that the columnsqj,...,qn are unit vectors: ||q,||2 = 1. (b) Show that every two columns of Q are perpendicular qjq2 = 0. (c) Find a 2 by 2 example with first entry qn = coe 0.
3 The Four Fundamental Subspaces 3.1 Vector Spaces and Subspaces 3 J The Nullspace of A: Solving Ax = 0 33 The Complete Solution to Ax = b ЗА Independence. Basis, and Dimension 33 Dimensions of the Four Subspaces Column 1 h'5* the picture tha Section 3 1 ojieni with a pure algebra question How do we define a "vector space" 1 Looking at R . the key operations are v + w and cv. They are connected by simple laws like c(v + w) = or + on. We must be able to add о + w. and multiply by c. Section 3 I will give eight rules that the vectors t> and the scalars c must satisfy produ“',h' -*• 4,40 «olutions: The nullspace is a subspace. Linear algebra gives us a way to solve 4т - o n. u . Simplify the equations to ftr = 0 TK,n к J “ °’ ?* *“* lystem IS eliminalion - column Taking all their combinations .< m. *. 4*C“I so,ut‘Ofl" for “ch dependent Rnallv r «“«Лесгиа.1 «ер to produce the nullspacc. anally comes the idea of a bash- А nt Their combinations give one and onlv ° VK’°rs Л,< Р^есЧу describes the space. Tl* r independent columns of A >r,. *?У *° Prtx*uce evcry vector in the space. ** • 0 are a basis for N(4). °r C(4). The n - r special solutions to Chapter 2 was ahn> ^*Ve r and n - r. ч.>м|лсг z was about souarr had full rank r = m 3 All four of the matrices in PA = LU Chapter 3 moves to a hither t«. '* °* 'P** °f A WCK thc ful1 spaCC R* Every rn by n matnx is allowed, and thcrT^ |T °* mOS’inlponan, chapter in thc book. Z * k n°nzem to Ax = 0 e ’tarts with equations Ax = T *** co,umn space and row space. al of this chapter ’ П<М Co*umns or rows ^rom conn«* the four wbL3^T"'a/ ntorrm of Unear Algebra ". rowspac^ Ле1ГШтеп^: *’«h it makes ^e "“^P8" °f A' nu«space of AT. ndamental Theorem easy to remember.
3.1. Vector Space* and Sub*pacc* 75 3.1 Vector Spaces and Subspaces 1 Al! linear combination* rv + dw must stay in the vector space 2 The row space of А is "spanned" by the rows of Л. The columns span C( A). ^3 Matrices Л/1 to My and functions ft to JN span matrix spaces and function spaces^ Start with the vector spaces R . R;. R *,... The space R" contains all column vectors v of length n. The components i ( to ц, are real numbers. (When complex numbers like t»i = 2 + 3i arc allowed, the spaces become С1, С2, C3....). We know how to add vectors v and w in R”. We know how to multiply a vector by a number г or d to get rv or dw. So we can find linear combinations rv + dw in the vector space R". This operation of "linear combinations" is fundamental for any vector space. It must satisfy eight rules. Those eight rules are listed at the start of Problem Set 3.1 — they start with v + w = w + v and they are easy to check in R". They don’t need lo be memorized! One important requirement: All linear combinations tv + dw must stay in lhe vector space. The set of positive vectors (vi,...,i>r) with every r, > 0 is not a vector space. The set of solutions to Ax (1,1,...,!) i* n<* a vector space. A line in R" is not a vector space unless it goes through (0,0.0). If the line does go through 0, we can multiply points on the line by any number c and we can add points on the line—without leaving the line. That line in R" shows the idea of a subspace: A vector space imide another vector space Examples of Vector Spaces This book is mainly about the vector spaces R” and their subspaces like lines and planes. The space Z that only contains the zero vector 0 = (0,0...0) counts as a subspace' Combinations cO + dO are still 0 (inside the subspace). Z is the smallest vector space. We often see Z as the nullspace of an invertible matnx: If the only solution to Лх 0 is the zero vector x “ 0. then the nullspace of A is Z. We can certainly accept vector spaces of matrices. The space R3"3 contains all 3 by 3 matrices. We can take combi nations cA + dB of those matrices. They easily satisfy the eight rules. One subspace would be the 3 by 3 matrices with all 9 entries equal— a “line of matrices”. Note that Z “ (zero matrix) and S = symmetric 3 by 3 matrices are also subspaces: A + В stays symmetric. But the invertible matrices are nor a subspace. We can also accept vector spaces of functions. The line of functions у = ce“ (any c) would be a “line in function space". That line contains all the solutions to the differential equation dy/dx = y. Another function space contains all quadratics у = a + bx + ex2. Those are the solutions to rPy/dx? « 0. You see how linear differentia] equations replace linear algebraic equations Ax = 0 when we move to function space. In some way the space of 3 by 3 matrices is essentially the same as R9. The space of functions f(x) = a + bx + a2 is essentially R3.
Chapter 3. The Four Fundamental Subspaces 76 u. .____ r lh, mainces and functions arc safely in those spaces. “,uran veThand n0‘functions T^ Ld-spice” means that all linear combinations of the vectors or matrices or functions stay inside the space. Subspaces of Vector Spaces At different times, we will ask you to think of matrices and functions as vectors. But at all times, the vectors that we need most are ordinary column vectors. They are vectors with n components—but maybe not all of the vecton with n components There are important vector spaces inside R" Those are subspaces of R". Start with the usual three-dimensional space R3. Choose a plane through the origin (0,0,0). That plane is a vector space in its own right If we add two vectors in the plane, their sum is in lhe plane. If we multiply an m-plane vector by 2 or -5, it stays in the plane. A plane in three-dimensional space is not R3 (even if it looks like R2). The vecton have three components and they belong to R3. The plane is a vector space inside R3. This illustrates one of the most fundamental ideas in linear algebra. The plane going through (0,0,0) is a subspace of the full vector space R3. DEFINITION A subspace of a vector space is a set of vectors (including 0) that satisfies two requirements: If v and w are vectors in the subspace and cis any scalar, then (I) v + w is in the subspace (11) cv is in the subspace ,n <>,hcr w‘*d‘-the set o( vectors is 'closed” under addition v + w and multiplication cv । ik. <5*ra,’o» lease us in the subspace. We can also subtract, because - w is in »U space a its sum with v is v - w. AU linear combinations slay in the subspace. are sutJ^iK Гу>'е^ии 4*C’’10 cighl "4“^ conditions check lhe linear combinations requirement for a subspace. (0.0.0). We thilTpIre’ehT’ ^С,0Г planC R’ has t0 g° ,hroUgh •«. -d uk * fram n‘le“° U»n AnugK A, JX'X'Tfc"' teiU ₽lanei “e 1,01 s',bsPatel vectors on the line, we stay on the lin/ R., When ** ти,1'Р,У ЬУ 5- or add tW° Another «Пирке is .)| of Rs T llne 8° through (0,0,0). of all the possible subspaces of R> 4>acc “ a subsP»ce (of itself). Here is a list (L) Any line through (0,0,0) (P) Any plane through (0,0,0) (R1) The whole space I (Z) The single vector (0,0,0) J plane or line, the requirements for a subspace don 1 they are not suh««~—
3.1. Vector Spate» and Subpaces 77 Example 1 Keep only the vector» (г, whose component» are positive or zero Ithi» is a quarter-plane). The vector (2.3)» included but (-2. -3) n not So rule (ii) is isolated when we try to multiply by c = -1. The quarter-plane и Me subspace. Example 2 Include also the vectors whose components are both negative. Now we have two quarter-planes. Requirement liii is satisfied, we can multiply by any c. But rule (it now fails. The sum of t? = (2.3) and w = (-3. -2) is (-1.1). which is outside the quarter-planes. Two quarter-planes doe 'I make a subspace Rules (i) and (ii) involve vector addition г + ic and muluplicatioa by scalar» c and d The rules can be combined into a single requirement—lhe rule for subspacer A subspace containing v and w must contain ell linear combinations cv •+ dw. Example 3 Inside the vector space M of all 2 by 2 matrices, here are two subspaces: (U) All upper triangular matrices (D) All diagonal matrices i'(‘ . Add any upper triangular matrices in U. and the sum is in U. Add diagonal matrices, and the sum is diagonal. In this case D is also a subspace of U! Of course lhe zero matrix is in these subspaces, when a. b. and d all equal zero. Z is always a subspace. Multiples of the identity matrix also form a subspace of Af. Those matrices <7 form a “line of matrices" inside M and U and D. Is the matrix I a subspace by itself? Certainly not. Only lhe zero matnx is. Your mind will invent more subspaces of 2 by 2 matrices—write them down for Problem 5. The Column Space of A The most important subspaces are tied directly to a matnx A. We are trying to solve Ax = b. If A is not invertible, the system is solvable for some b and not solvable for other b. We want to describe the good right sides b—the vectors that can be wntten as A times some vector x. Those b’s form the “column space" of A. Remember that Az is a combination of the columns of A. To get every possible b. we use every possible x. Stan with the columns of A and take all their linear combinations. Thu produces the column space of A. It is a vector space made up of column vectors. DEFINITION The column space consists of all linear combinations of the columns. Those combinations are all possible vectors Ax. They till the column space C( A). This column space is crucial to the whole book, and here is why. To solve Ax = bis to express basa combination of the columns. The right side b has to be in the column space produced by A, or Ax = b has no solution! The equations Ax = bare solvable if and only if bis in the column space of A When b is in the column space C(A), it is a combination of the columns of A. The coefficients in that combination will solve Ax = b. The word “space" is justified by taking all combinations of the columns. The column space is a subspace of Rm.
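The solvability test "b is in the column space C(A)" can be carried out by comparing ranks: Ax = b is solvable exactly when [ A b ] has the same rank as A. A small sketch in Python with NumPy, using my own 3 by 2 example whose column space is a line in R³:

    import numpy as np

    A = np.array([[1., 2], [2, 4], [3, 6]])    # column 2 = 2(column 1): C(A) is a line in R^3
    b_good = np.array([2., 4, 6])              # a combination of the columns
    b_bad = np.array([1., 0, 0])               # not on that line

    for b in (b_good, b_bad):
        same_rank = np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)
        print(same_rank)                       # True then False: Ax = b is solvable only for b_good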
I — Chapter 3. The Four Fundamental Subspaces 78 Caution • The columns of A do txx form a subspace! The invertible matnccs do no form a suZce The W«hr matnees do not fonn a subspace. You have to include all hnear combinations The columns of A “span” « subspace when we take thetr combinations. The Row Space of A The rows of A are the columns of AT. the n by m transpose matrix. Since we prefer to work with column vectors, we welcome AT: The row space of A is the column space C(AT) of the transpose matrix AT This row space is a subspace of R" It contains m column vectors from AT and all their combinations. The equations ATy = e are solvable exactly when the vector c is in the subspace С(ЛТ) = row space of A. Chapter 1 explained why С(Л) and C(AT) both contain r independent vectors and no more. Then r rank of A = rank of Лт. A new proof is in Section 3.5. W *P*Ce °f ,he 1ma,n* Л = ut,T “,hc ,mc °f a" c°lumn vectors row an«J* nnr '°luinn Л ~ vuT “ • multiple of v. One vector v spans the row space, one vector u spans the column space The Columns of A Span the Vector Space C(A) tuns only it **' f °f VCC,ors in R"‘ ,f S СОП’ of the vectors in S. then we h>v W *Pace ®ut 'nc'udc combinations In fact V is the smallest vector snV I₽,Ce V‘ ,n lh‘l case ,he sel S sPans V combinations to produce a vector c*”*,ainin8 $ (because we are forced to include all Thu is exactly what we , column space С(Л) « all combinations JiiT', °f Л Th0SC n colutnns sPan ,he у the word span. In the same wav the m columns. Independence is not required ™ question. Show that the * У the row space C(AT). ._______ “тЫегь>2"«пе«,р„к1.< Next comes the пцц^ 3 3 mMric« span R3*’. (equations and nm and that reni.,~ I» й a vector space be«°\ ‘O,ut>ons^ t0 rJT* Wc Start with Лх = ° have io work to find livl.USe = 0 and Ab - n 7°* e4uat>°ns give the nullspace, ‘’hndthoseso,^ 4V-01eadtoA(cz + dv) =0. But we ^^«^Whendonj^ V V> Vec*0R sPan R5 ? Th»» • ThtS,$ very possible.
3.1. Vector Space» and Subspaces Problem Set 3.1 79 The first problems 1-7 are about vector spaces in general. The vectors in those space* are not necessarily column vectors. In the definition of a vector space, vector addition x + у and scalar multiplication ex must obey the following eight rules: (I) x + у = у + x (2) x + (у + a) = (х + у) + ж (3) There is a unique “zero vector" such that X 4- 0 = X for all X (4) For each x there is a unique vector -x such that x + (-ж) = 0 (5) I times x equals x (6) (cica)x ж C|(cj«) (l)lo(4) about x + у (7) c(x + y) = ex + ey (5) to (6) about ex (8) (ci + ca)x = Cix + Cjx. (7) to (8) connect* them 1 Suppose (xi.xa) + (уг.уг) '* defined to be (xi + yj.xj + pi). With lhe usual multiplication ex ж (cri.cri), which of the eight conditions are not satisfied ? 2 Suppose the multiplication ex is defined to produce (cx|,0) instead of (cT|,CXj). With the usual addition in R2. are the eight conditions satisfied ? 3 (a) Which rules are broken if we keep only the positive numbers x > 0 in R1? Every c must be allowed. The half-line is not a subspace. (b) The positive numbers with x + у and ex redefined to equal lhe usual ту and x* do satisfy the eight rules. Test rule 7 when c — 3, x « 2, у 1. (Then x + у = 2 and ex — 8.) Which number acts as the "zero vector" 7 4 The matrix A [, Za ] is • “vector" in the space M of all 2 by 2 matrices. Write down the zero vector in this space, the vector | A. and the vector —A. What matrices are in the smallest subspace containing A (the subspace spanned by Л)? 5 (a) Describe a subspace of M that contains A = [ J g] but not В = [J (b) If a subspace of M does contain A and B, must it contain /? (c) Describe a subspace of M that contains no nonzero diagonal matrices.
Cfaptrr 3. The Four Fundamental Subspaces 80 6 7 TV A nr\ = I2 and o(x) = 5x are “vectors" in F. This is the vector space of all real functions (The functions are defined for -oo < x < oo.) The combination 3/(x) - 4у(х) в the function h(x) = -------• Which rule is broken if multiplying /(x) by c gives the function /(ex)? Keep the usual addition /(x) + <?( x). Questions 8-15 are about the “subspace requirements”: x + у and ex (and (hen all linear combinations ex 4 dp) stay in the subspace. 8 One subspace requirement can be met while the other fails. Show this by finding (a) A set of vectors in R2 for which x + у stays in the set but ’ x may be outside. (b) A set of vectors in R2 (other than two quarter-planes) for which every ex stays in the set but x + у may be outside. 10 11 12 13 14 Which of these subsets of R1 are actually subspaces ? They all span subspaces! (a) Theplaneof vectors (bi.bj.bj) with b] = bj. (b) The plane of vectors with bi = 1. (c) The vectors with bib^bj a 0. (d) All linear combinations of»» (1,4,0) and w a (2,2,2). (e) AU vectors that satisfy bi + b, + b, a 0. (0 AU vectors with b, < bi < bj. “К H:] h;;] <»>[::]-[j ?]• Let P be the plane in R5 with _ P' Find two vectors u> P and checkth f ь.?.~ 2* ~ 4 The on8ln (0,0,0) is not in t«o their sum is not in P. ue’ *o be the plane through (0 n m *чимкж fw ₽o? Find two vectors tn P^110 Ше prtV|ous plane P. What is the S-Р^Р..Ы к Лгои^ (0,0 0) and T • .. ’*’** containing both P and L is athe’ ’ ”* ‘hr°Ugh (°- °- 0)- The smallest (a) Show that the set of -----*---------
3.1. Vector Spaces and Subspaces 81 15 True or false (check addition in each case by an example): (a) The symmetric matrices in M (with Лт = Л) form a subspace. (b) The skew-symmetric matrices in M (with Лт = - Л) form a subspace. (c) The unsymmctric matrices in M (with Лт # ЛI span a subspace. Questions 16-26 are about column spaces C( A) and the equation Ax = b. 16 Describe the column spaces (lines or planes) of these particular matrices: 17 For which right sides (find a condition on b|, bj. bj) are these systems solvable? 18 Adding row I of Л to row 2 produces B. Adding column I to column 2 produces C. A combination of the columns of (B or C1) is also a combination of the columns of Л. Which two matrices have the same column___________? 19 20 (Recommended) If we add an extra column b to a matrix A. then the column space gets larger unless_________. Give an example where the column space gets larger and an example where it doesn't Why is Лх = b solvable exactly when the column space doesn't get larger ? Then it is the same for A and [ A b ]. 21 The columns of AB are combinations of the columns of A. This means: The column space of AB is contained in (possibly equal to) the column space of A. Give an example where the column spaces of Л and AB are not equal.
Chapter 3. The Four Fundamental Subspaces 82 22 23 24 25 26 27 28 29 30 31 - b* are both solvable. Then Az = b + b is solvable. Suppose Ax = b and Ay coiumn space C(A), then What is a? This translates into. Lt b + b’is in Wn>at is a requirement for a vect p ил«ад5Ь,5»«^"-^'Ь“'““|ш,шч>“е“-------------------------------- W? True or false (with a counterexample if false I (a) The vectors b that are not in the column space C( A) form a subspace. (b) If C(A) contains only the zero vector, then .4 is the zero matrix. (c) The column space of 2.4 equals the column space of Л. (d) The column space of A -1 equals the column space of A (test this). Construct a 3 by 3 matrix whose column space contains (1,1,0) and (1,0,1) but not (1,1,1). Construct a 3 by 3 matrix whose column space is only a line. If the 9 by 12 system Ax = b is solvable for every b. then C( A) =. Challenge Problems Suppose S and T are two subspaces of a vector space V. (a) Definition The sum S + T contains all sums a + t of a vector a in S and a ***** t in T Show that S + T satisfies the requirements for a vector space. Addition and scalar multiplication stay inside S + T . Cb) И S and T are lines in R*. what is the difference between S + T and S U T? t union contains all, vectors from S or T or both. Explain this statement: The span ofSuTbS + T. (Section 3.5 returns to this word “span".) what matrix W°* * 7^ T **,hcn S + T is the column space of XiLvT.^ » «nd M are all in R-. I don’t think A + В « always a correct M We want the columns of Af to span S + T. Show that thc rcistrifes A And Г А A n 1 " ”" - c“> - Я- *. л ь „ _ Rnd an°<h« independent solution (after « - equation tPy/dx3 = u . v ) to the second order differential w 2^ XCZ4* “"biinuon, у =-----. th two subspaces of Dn Tv • T**”*’” are in both subsoac« m*. “'nto*ction" V П W contains *“* vn* *r •• “*> vector is in V and W.) yacK + ,iVmVnwl ghe rc4Uircment: If z and у are in VO W.
3.2. The Nullspace of A: Solving Ax = 0 63 3 .2 The Nullspace of A: Solving Ax = 0 I The nullspacc N(A) in R contains all solutions x to Ax = 0. This includes x — 0. 2 Elimination from A to U to Ho docs not change the nullspace: N(A) - N(U) = N(H«). 3 The reduced row echelon form Ho = rref(A) has I in r columns and F in n - r columns. 4 If column j of Ho is free (no pivot), there is a “special solution" to Ax = 0 with Xj = 1. \5 Every short wide matrix with rn < n has nonzero solutions to Ax = 0 in its nullspace. This section is about the nullspace containing all solutions to Ax = 0. The m by n matrix A can be square or rectangular. The right hand side is b = 0. One immediate solution is x - 0. For square invertible matrices this is the only solution. For other matrices, we find n - r special solutions to Ax - 0. Each solution x belongs to the nullspace of A. Elimination will find all solutions and identify this very important subspace. The nullspace N(A) consists of all solutions to Ax = 0. These vectors x are in R". Check that those vectors form a subspace. Suppose x and у are in the nullspacc (this means Ax — 0 and Ay — 0). The rules of matrix multiplication give A(x + y) = 0 + 0. The rules also give A(cx) = cO. The right sides are still zero. Therefore x + у and ex arc also in the nullspace N(A), and the test for a subspace is passed. To repeat: The solution vectors x have n components. They are vectors in R", so the nullspace is a subspace of R". The column space C(A) is a subspace of R”'. [1 2] j gI. This matrix is singular! Solution Apply elimination to change the linear equations Ax = 0 to Rox = 0; Xi + 2xj = 0 + 2хз = О Г 1 2 1 „ f 1 2 1 _ Г / Fl 3x, +6x3-0 0 = 0 L 3 6 J ' L 0 0 J “ I ° 0 J There is really only one equation. The second equation is the first equation multiplied by 3. In the row picture, the line xi + 2x2 = 0 is the same as the line 3x> + 6x3 = 0. That line is the nullspace N(A). It contains all solutions (x|,xa) = (-2c,c) = c(-2,1). To describe the solutions to Ax = 0, here is an efficient way. Choose one “special solution”. Then all solutions are multiples of this one. We choose the second component to be x2 = 1 (a special choice). From the equation xj + 2x2 = 0, the first component must be xi = -2. The special solution is а = (—2,1). Special A 0 nu|lspace of A = R ~ contains all multiples of a = ~2 . solution LJ ®J L This is the best way to describe the nullspace. The solution з is special because the free variable is 1. Simple formulas for H and в come at the end of this Section 3.2.
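SymPy reproduces this nullspace directly. A sketch: nullspace() returns the special solutions and rref() returns the reduced form with its pivot columns.

    import sympy as sp

    A = sp.Matrix([[1, 2], [3, 6]])      # the singular matrix of Example 1
    print(A.rref())                      # (Matrix([[1, 2], [0, 0]]), (0,)): one pivot column
    print(A.nullspace())                 # [Matrix([[-2], [1]])]: every solution is a multiple of (-2, 1)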
Chip«er3. The Four Fundamental Subspace, п»И«5 Г W 84 special solutions to Ax = q jk, i by 3 matrix A = [ 1 2 3]. Then F.amole 2 x + 2y + *» = 0 COmeS^n the pla* PcrPcnd,cular to (1.2,3). Ax = 0 produces a plane A1'free variables у and z: Set to 0 and 1. The plane is the mdlspace of A. 1h« [123] » and «2 = -3‘ 0 Oj 1 . , 4.2» + 3z = 0. All vectors on the plane are 2 3] = (/ Fj. eomlunaiwo* of b »tomponrnls an ~fnt~ and wt >“» •“ “ ir'.tL'of 6« «»»«“"“ -2 «о -3 № dem,. choose them specialty as 1,0 and U,1- u mined by the equation x + 2y + 3a = 0. The solutions to x + 2y + За = в also lie on a plane, but that plane is not a subspace. The vector x « 0 u only a solution if b = 0. Section 3.3 will show how lhe solutions to Ax = b (if there are any solutxms) are shifted away from zero by one particular solution. TWo key steps (1) reducing A to iU row echelon forms R^ and R in this section (2) finding the n - r special solutions a to Rx = 0 Section 3.3 has the final step (3) finding a particular solution to Ax - b Example Л Я is connected to A by A CR. As in Chapter 1. C contains r independent columns. Elimination (row operations) will now take us directly from A to Ro to R. without C. 12 1 2 4 5 . We can see that column 2 is 2 times column 1. Then 3 6 9 columns 1 and 3 are independent and the rank is r - 2. But we don’t want to use this information I We want a systenubc way to find dependent columns for any matrix A. Thatsystetnabc way is a sequence of row operations on A that will lead directly to R. (urJi't r°* l,ke е11ГП1М1|оп «ер* in Chapter 2. leading from A to U <Wt ЙО₽ * U WiU COnt,nuc '° * and R We discovering Я before C. That matrix R will reveal the nullspace of A. ”'4te °"”1 “ ““ ~ I IK- 4» Um Pi»,. Column 2 Г 5 9 1 2 3 2 6 1 2 Г ооз 3 6 9 12 1' 0 0 3 0 0 6 • о Step2 Divide row 2 by 1 ю produce second pivot-t n , P vt _ 1. Use it to eliminate 6 and 1: 1 0 0 2 0 0 1 3 6 1 2 Г 0 0 1 . 0 0 6 1 0 0 2 0 0 0 ‘ 0
3.2. The Nullspace of A: Solving Ax = 0 85 That matrix Яо is the reduced row echelon form It has the same rank as A (rank 2). The word echelon means that lhe l’s in Яо go steadily down, left to right. До has the same row space as A (all our row operations were invertible). Яо has the same nullspace as A. The equations Hqx = 0 are linear combinations of the equations Ax = 0. Notice the zero row in Яо. We can and will remove it—no change in the row space or nullspace. Яо becomes Я with no aero rows Thu u the R we wanted in Chapter 1. 12 0' 0 0 1 0 0 0 f1 >11 2 5 I 3 9 C contains the first r independent columns of A (columns 1 and 3) Я has the identity matrix in columns 1 and 3 and F in column 2; rank r = 2 The special solution to Rx = 0 is a = (-2,1,0) with free variable = 1 The nullspace of A and Яо and Я contains all multiples of that solution a Ho = = CH is 1 2 Г 2 4 5 3 6 9 1 2 0 0 0 1 This is the same A CR = (m X r)(r x n) that Section 1.4 would produce by looking for independent columns in C. Now we have a good computational system: Elimination steps from A to Яо and H. then look for the r by r identity matrix inside Я. By creating Я, we know the correct columns of A that go into C. Those columns give the identity matrix I in Я. Then A " CR is the result of elimination on any matrix. going far beyond A = LU to allow every matrix A. Pivot Columns and Free Columns оГ R and A If Rx = 0 then Ax = CRx = 0. There is a special solution x = a for every column of A without a pivot. The r pivots are the Га in I. leaving n - r free columns of Я. Here is the result of elimination on a 4 by 5 matrix when the rank is r = 3 and the independent columns of C are oi and 03 and as- The free columns aj and a« are not in C. 0 9 O' 1 r 0 0 0 1 С Я 4x3 3x5 You see the 3 by 3 identity matrix in R. Elimination on the 4 by 5 matrix A led to a 4 by 5 matrix Яо- With rank r = 3. the fourth row of Яо was all zeros Removing that zero row from R« produced the perfect factorization A = CR. Elimination on A is complete and it reached Я. The remaining step is to read off the 5 — 3 = 2 special solutions to Rx = 0.
86 Chapter у The Four Fundamental Subsp^ . 3 = 2 special solubons >, and * ? Those vectors solve What are the n — r — o •> ^s0 solve = 0 and Аз2 e r. ________________ °- rhe combinations cjai т To find the special solutk We arc assigning the vali correspond to the colum Rei — 0 and R»i=0 tel Special solutions = to Rx - 0 >ns. stai tes 1.0 ns 1,3. ushov -P 1 0 0 0 •к. -( 1. ,0,_)апЬв2 = (—.0,—,1 i and 0,1 to the n - r = 5 — 3 = 2 positions that don’t 5 containing the identity matrix in R. The equations v Ю fill in the rest of those special solutions a, and a2: The nullspace , >,= -r N(A) = N(«) and а? contains all 0 X’Ci^ + c^ » Those three numbers -p and -g and -r are pst negatives of three numbers in ft Elimination has led systematically ion - r 2 independent vectors in the nullspace of R. Those are the two special solutions a. and aj to Rx = 0 and Ax = 0. The free components correspond to columns with no pivots. The special choice (one or zero) is only for the free variables in lhe special solutions. Exampin 3 Find the nullspaces of А. В. M and the two special solutions to Mx = 0. [И] 2 н 4 16 M - [Л 2Л]= 12 2 4 3 8 6 1б] ’ Solution The equation Ax • 0 has only the zero solution x = 0. The nullspace is Z. It contains only lhe single point x « 0 in R3. This fact comes from elimination: Лх“[з e] “* [o 2] [о 1]“Яж/ Nofn*variahl‘“s A is invertible. There are no special solutions. Both columns of this matrix have pivots. The rectangular matrix В has the same nullspace Z. The first two equations in Bx - 0 again require z ж 0. The last two equations would also force x = 0. When we add extra equations (giving extra rows), the nullspace certainly cannot become larger. The extra rows impose more conditions on the vectors x in the nullspace. The rectangular matnx Af is different. It has extra columns instead of extra rows. The solution vector z has/™, components Elimination will produce pivots in the first two columns of Af . The last two columns of M are “free”. They don’t have pivots, -[и:.:] «'-[ни] t t 11 pivot columns free columns
For the free variables x3 and x4, we make special choices of ones and zeros. First x3 = 1, x4 = 0 and second x3 = 0, x4 = 1. The pivot variables x1 and x2 are determined by the equation Ux = 0 (or Rx = 0). We get two special solutions in the nullspace of M. This is also the nullspace of U and R:

R = [1 0 2 0; 0 1 0 2]      s1 = (-2, 0, 1, 0)  and  s2 = (0, -2, 0, 1)      Rs1 = 0 and Rs2 = 0

The first two components of each s are the pivot variables; the last two components (1 and 0, then 0 and 1) are the free variables.

The Reduced Row Echelon Form R

1. Produce zeros above the pivots. Use pivot rows to eliminate upward.
2. Produce ones in the pivots. Divide the whole pivot row by its pivot.

Those steps don't change the zero vector on the right side of the equation. The nullspace stays the same: N(A) = N(U) = N(R). This nullspace becomes easiest to see when we reach the reduced row echelon form. The pivot columns of R contain I.

U = [1 2 2 4; 0 2 0 4]      Reduced form  R = [1 0 2 0; 0 1 0 2]

I subtracted row 2 of U from row 1. Then I multiplied row 2 by 1/2 to get pivot = 1. Now (free column 3) = 2 (pivot column 1), so -2 appears in s1 = (-2, 0, 1, 0). The special solutions are much easier to find from the reduced system Rx = 0: in each free column of R, change all the signs to find s. Second special solution s2 = (0, -2, 0, 1).

Before moving to m by n matrices A and their nullspaces N(A) and special solutions, allow me to repeat one comment. For many matrices, the only solution to Ax = 0 is x = 0. Their nullspaces N(A) = Z contain only that zero vector: no special solutions. The only combination of the columns that produces b = 0 is then the "zero combination". This case of a zero nullspace Z is of the greatest importance. It says that the columns of A are independent. No combination of columns gives the zero vector (except x = 0). But this can't happen if n > m. We can't have n independent columns in Rᵐ.

Important  Suppose A has more columns than rows. With n > m there is at least one free variable. The system Ax = 0 has at least one nonzero solution.

Suppose Ax = 0 has more unknowns than equations (n > m). There must be at least n - m free columns. Ax = 0 has nonzero solutions in N(A).

The nullspace is a subspace. Its "dimension" is the number of free variables. This central idea, the dimension of a subspace, is defined and explained in this chapter.
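A quick numerical check of Example 3 (my own sketch, not part of the book): SymPy's nullspace method returns exactly these special solutions for M.

```python
# Special solutions of M = [A 2A] from Example 3 (illustrative check).
from sympy import Matrix

M = Matrix([[1, 2, 2, 4],
            [3, 8, 6, 16]])

R, pivots = M.rref()
print("R =", R)                   # [1 0 2 0; 0 1 0 2], pivots in columns 1 and 2
print("pivot columns:", pivots)   # (0, 1)

for s in M.nullspace():           # a basis of N(M): the special solutions
    print("special solution:", s.T, " M*s =", (M * s).T)
# Expect s1 = (-2, 0, 1, 0) and s2 = (0, -2, 0, 1), both giving M*s = 0.
```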
Pivot Columns and Free Columns in the Echelon Matrix R

A has 3 pivot columns and 2 free columns: rank r = 3, to be revealed by R. R has I in its pivot columns and F in its free columns. The special solutions satisfy Rs1 = 0 and Rs2 = 0, and the numbers in those solutions come from F with signs reversed. The same must be true for A, because Ax = CRx = 0: each special solution records how a free column of A combines the pivot columns. Nullspace of A = Nullspace of R.

On the next page you will see simple formulas for the echelon matrix R (I in pivot columns, F in free columns, rank r = 3) and the n - r special solutions to Ax = 0.

Example  This 4 by 7 reduced row echelon matrix R0 has 3 pivots. Delete row 4 to find R.

R0 = [1 0 x x x 0 x; 0 1 x x x 0 x; 0 0 0 0 0 1 x; 0 0 0 0 0 0 0]

Three pivot variables x1, x2, x6.   Four free variables x3, x4, x5, x7.
Four special solutions s in N(R0) = N(R).   The pivot rows and columns contain I.

R = [I F] P has row 4 removed. The permutation P puts column 3 of I into column 6.

Question  What are the column space and the nullspace for this matrix R0?

Answer  The columns of R0 have four components, so they lie in R⁴. (Not in R³!) The fourth component of every column is zero. The column space of R0 consists of all vectors of the form (b1, b2, b3, 0). The nullspace N(R) = N(R0) is a subspace of R⁷. The solutions to R0 x = 0 are combinations of the four special solutions, one for each free variable:

1. Columns 3, 4, 5, 7 have no pivots. So the four free variables are x3, x4, x5, x7.
2. Set one free variable to 1 and set the other three free variables to zero.
3. To find each s, solve Rs = 0 for the pivot variables x1, x2, x6.   Four special solutions.

To repeat: A short wide matrix (n > m) always has nonzero vectors in its nullspace. There must be at least n - m free variables, since the number of pivots cannot exceed m.
The Echelon Form and Special Solutions in Matrix Language

From the examples you see the steps to R0 and R. Chapter 2 produced zeros below the pivots in U. Chapter 3 also has zeros above the pivots in R. All pivots are 1. We now have a systematic way to identify independent columns in A and to reach A = CR. This row echelon form is famous, but its simple matrix formula is seldom given. This page will give formulas for R0 and R, along with the special solutions to Ax = 0. Those n - r special solutions combine to give the nullspace: all solutions to Ax = 0.

R0 comes from elimination (down and up) on A. Here are the basic formulas:

R0 = [I F; 0 0] P        R = [I F] P        A = CR = [C CF] P        (1)

That column permutation P puts the columns of I and F into their correct positions in R. F tells how the independent columns in A combine into the dependent columns.

Special solutions to Ax = 0   Since A has rank r, we expect n - r independent solutions. Ax = 0 gives Rx = [I F] Px = 0. Here I is r by r and F is r by n - r. Thanks to the simplicity of I, and the fact that P Pᵀ = I, we know immediately the matrix S whose n - r columns are the special solutions s1, ..., s_{n-r}:

S = Pᵀ [-F; I]

S has n rows and n - r columns (special solutions). The identity matrix in S has size n - r. Each column has a 1, as special solutions always do. The other nonzeros in that column come directly from F, with signs reversed to -F. The role of Pᵀ is to move the 1's into the right positions (free positions) in these special solutions. If the r independent columns of A come first, then P is the identity matrix and S is truly simple: RS = [I F][-F; I] = -F + F = 0.

Here is a magic factorization that treats rows and columns of A in the same way. C contains the first r independent columns of A as always. Suppose R* contains the first r independent rows of A. (We know that row rank = column rank.) The rows of R* will meet the columns of C in an r by r matrix W. Then A factors into C W⁻¹ R*. The first columns of W⁻¹ R* will be W⁻¹ W = I. The remaining columns will be the free part F. The permutation is just P = I, since the independent rows and columns came first in A. W⁻¹ R* is the same matrix as R = [I F].
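Formula (1) and S = Pᵀ[-F; I] can be checked on the small example from the start of this section (A with pivot columns 1 and 3). This is my own sketch, not the book's computation; the permutation P below simply swaps columns 2 and 3 so that I lands in the pivot positions.

```python
# Checking R = [I F]P and S = P^T [-F; I] on the 3 by 3 example (illustrative).
from sympy import Matrix, eye

A = Matrix([[1, 2, 1], [2, 4, 5], [3, 6, 9]])
r = 2
Ir = eye(r)
F = Matrix([[2], [0]])                          # how the free column combines the pivot columns
P = Matrix([[1, 0, 0], [0, 0, 1], [0, 1, 0]])   # puts column 2 of [I F] into position 3

R = (Ir.row_join(F)) * P                        # [1 2 0; 0 0 1]
S = P.T * (-F).col_join(eye(3 - r))             # special solutions, one per free column

print("R =", R)
print("S =", S.T)                               # (-2, 1, 0)
print("R*S =", (R * S).T)                       # zero: RS = -F + F = 0 after the permutation
print("A*S =", (A * S).T)                       # zero: the special solutions solve Ax = 0
```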
Three Identical Factorizations A = CR

This very optional page completes the presentation of A = CR, always with the same C and R. C contains the first r independent columns of A, and R* contains the first r independent rows of A. C and R* meet in an r by r "mixing matrix" W, and the small example from page 32 has grown into the factorization A = C W⁻¹ R*:

A = C W⁻¹ R*        (m x n) = (m x r)(r x r)(r x n)

A  = any matrix of rank r
C  = first r independent columns of A
R* = first r independent rows of A
W  = intersection of C and R*  (r by r)
R  = W⁻¹ R* = [I F]

Theorem  The r by r matrix W also has rank r.  Write A in block form with W in the corner: A = [W H; J K].

1. Combinations V of the rows of R* = [W H] must produce the dependent rows [J K]. Then A = [W H; VW VH] for some matrix V, and C = [W; VW].

2. Combinations T of the columns of C must produce the dependent columns [H; VH]. Then H = WT and VH = VWT for some matrix T, and R* = [W WT] = W [I T].

3. A = [W WT; VW VWT] = [W; VW][I T] = C W⁻¹ W [I T] = C W⁻¹ R*.

Since A has rank r, its factors must have rank at least r. From their shapes that means rank r. If C and R* were not in the first r columns and rows of A, then permutations P_R of the rows and P_C of the columns give P_R A P_C, and the proof goes through.

I.   Find C and R* and W and W⁻¹ and R = W⁻¹ R* for the transpose of A above.

II.  Explain these statements about the rank of the augmented matrices [A b] and [C D]:
     The rank of A equals the rank of [A b] if and only if Ax = b is solvable.
     The rank of C equals the rank of [C D] if and only if CT = D is solvable.

III. If A = C M R* has sizes (m x r)(r x r)(r x n) and rank A = r, show that rank M = r.
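The identity A = C W⁻¹ R* is easy to test numerically. The matrix below is my own rank-2 example (not the book's page-32 matrix, which did not survive scanning); any matrix whose first r rows and columns are independent will do.

```python
# A = C W^{-1} R*  on an illustrative rank-2 matrix (my own example, not the book's).
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [4, 5, 6],
            [5, 7, 9]])      # row 3 = row 1 + row 2, so rank r = 2

r  = A.rank()
C  = A[:, :r]                # first r independent columns
Rs = A[:r, :]                # first r independent rows  (R* in the text)
W  = A[:r, :r]               # their r by r intersection

print("rank of W:", W.rank())                        # r, as the theorem claims
print("W^(-1) R* =", W.inv() * Rs)                   # equals R = [I F]
print("A == C W^(-1) R*:", A == C * W.inv() * Rs)    # True
```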
91 3.2. The Nullspace of A: Solving Ax = 0 problem Set 3.2 1 Why do A and R = EA have the same nullspace ? We know that E is invertible. 2 Find the row reduced form R and the rank r of A and В (those depend on c). Which are the pivot columns? Find the special solutions to Ax = 0 and Bx = 0. Find special solutions ' 1 2 1 ' 3 6 3 4 8c and В = Create a 2 by 4 matrix R whose special solutions to Rx = 0 arc «i and ej: -3 1 0 0 pivot columns 1 and 3 free variables x? and Xj and £4 are 1,0 and 0,1 Describe all 2 by 4 matrices with this nullspace N(A) spanned by and tj. Reduce A and В to their echelon forms R. Which variables are free? 1 2 2 4 6* 1 2 3 6 9 0 0 12 3 (a) A = (b) В *2 4 2 0 4 4 0 8 8 A = 3 5 For the matrix A in Problem 4, find a special solution to Rx 0 for each free vari- able. Set the free variable to 1. Sei the other free variables to zero. Solve Rx = 0. 6 True or false (with reason if true or example to show it is false): (a) A square matrix has no free variables. (b) An invertible matrix has no free variables. (c) An rn by n matrix has no more than n pivot variables. (d) An m by n matrix has no more than m pivot variables. 7 Put as many l's as possible in a 4 by 7 echelon matrix U whose pivot columns arc (a) 2,4,5 (b) 1,3,6,7 (c) 4 and 6. 8 Put as many l's as possible in a 4 by 8 reduced echelon matrix R so that the free columns are (a) 2,4,5,6 or (b) 1,3,6,7,8. 9 Suppose column 4 of a 3 by 5 matrix is all zero. Then £4 is certainly a _______________ variable. The special solution for this variable is the vector x = _______. 10 Suppose the first and last columns of a 3 by 5 matrix are the same (not zero). Then is a free variable. Find the special solution for this free variable. 11 The nullspace of a 5 by 5 matrix contains only x = 0 when the matrix has -------------- pivots. In that case the column space is R '. Explain why.
92 12 13 14 15 16 17 18 19 20 21 23 24 25 26 Chapter 3. 1 he fundamental Suh ’14, The number of special solutions is by n main* has r pn» . contains only x = 0 when r = Suppose an n m The n £ is r _ -------------------. hv tbc СоипШ V when me tUi—***” 3, - ' 12 * para"'[“ > ’ Iй ' - 0. <v . тъи olane x ' •’» _ ли noints on the plane have the f— 7 he cuiuo... (Recommended) The plane r - 3y - : = i. .. r '«nojnt on this plane is (12,0,0). All points on the p|ane ц -- r, r7 particular point x У t + !/ 11 +* l°l • 0 0 0 , mn з + column 5 = 0 in a 4 by 5 matrix with f0Ur Djv Suppose column 1 + «’1UI” * ц specia| solution? Describe N( Д). P V°u- Which column has no pt vo . w» ‘°"”" ,p“ (l'I,5> <0,3'11 “nd *« nulhpace contains (1,1,2), Construct a matrix whose column space contains (1,1,0) and (0,1,1) and nullspace contains (1,0. !)• Construct a 2 by 2 matrix whose nullspace equals its column space. This is possible Why does no 3 by 3 matrix have a nullspace that equals its column space? If AR = 0 then the column space of В is contained in the------of A. Why? The reduced form R of a 3 by 3 matrix with randomly chosen entries is almost sure to be______. What R is virtually certain if the random A is 4 by 3? If N( A) = all multiples of x (2,1,0,1). what is R and what is its rank? If the special solutions to Rx 0 are in the columns of these nullspace matrices N go backward to find the nonzero rows of the reduced matrices R: Г2 3' 0 1 iV = I 0 and W 0‘ 0 I and N 1 (empty 3 by 1). (a) What are the five 2 by 2 reduced matrices R whose entries arc all 0’s and I’i? (b) What are the eight 1 by 3 matrices containing only 0’s and 1 ’s? Are all eight of them reduced echelon matrices R ? If A is 4 by 4 and invertible, describe the nullspace of the 4 by 8 matrix В = [А А]. Explain why A and -A always have the same reduced echelon form R.
3.2. The Nullspace of A. Solving Ax = 0 93 27 How is the nullspacc N(C) related to the spaces N(A) and N(B). if C = j ? 28 Find the reduced Ro and Я for each of these matrices: 29 Suppose the 2 pivot variables come last instead of first. Describe the reduced matrix R (3 columns) and the nullspace matrix N containing the special solutions. 30 If A has r pivot columns, bow do you know that AT has r pivot columns? Give a 3 by 3 example with different pivot column numbers for A and AT. 31 Fi 11 out these matrices so that they have rank 1: a b c 32 If A is a rank one matrix, the second row of Я is___. Do an example. // A has rank r. then it has an r by r submatrix S that is invertible. Remove m - r rows and n — r columns to find an invertible submatrix S inside A. B. and C. You could keep the pivot rows and pivot columns: 1 0' о 0 0 1 U 5 5] (! i Л 34 Suppose A and В have the same reduced row echelon form Я (a) Show that A and В have the same nullspace and the same row space (b) We know Ei A = R and E?B = R. So A equals an___________matrix times B. 35 Kirchhoff’s Current Law ATy « 0 says that current in - current out at every node. At node 1 this is yj = y( + y« (arrows show the positive direction of each y). Reduce AT lo Я (3 rows) and find three special solutions in the nullspace of AT. -1 0 1-1 0 0 1-1 0 0-10 0 1-1 0 0 1 0 0 0 1 1 1 3
з. The Four Fundamental Subspac^ C contains the r pivxM columns of A. Find the r pivo< columns of CT >r . Transpose r bv r * '* —hmatrix 5 inside A: * Ъ). 94 36 37 36 39 40 41 Г1 2 3 ЛгЛ» 2 4 6 2 4 7J find C (3 by 2) thcn “*,nvcrtible S (2 by 2). Why is the column space C(AB) • subspace of C(A) ? Then rank(.4B) < Suppose column j of В is a «unb.nat.on of previous columns of В Show XT, «< ЛВ U °'AB л" cannot have new pivot columns, so rank(AB) < rank(B) 'Important) Suppose .4 and Я are n by n matrices. and ЛВ = / Prove fron, nnk(AB) < гапк(Л) that the rank of A is n. So A is invertible and В must be it, inverse Therefore BA - I <»htch is not so obvions!). If A is 2 by 3 and В is 3 by 2 and А В ml. show from its rank that В A * I. Givc an example of A and В with AB = I For m < n, a right inverse is not a left inverse. What is the nullspace mains N (containing the special solutions) for A. В, C ? Г - 2 by 2 Nocks Л«(/ /] and В and C-[/ 1 /). 42 Suppose Л is an m by n mains of rank r lb reduced echelon form (including any zero rows) is Яо Describe exactly the matrix Z (its shape and all its entries) that comes from transposing the redisced ton echelon form of Rq . Z (rref (Д r))T 43 (Recommended) Suppose Яо = j is m by n of rank r. Pivot columns first: (a) W hat are lhe shapes of those four blocks, based on m and n and r? (b) Find a right inverse В with RtlB = I if r “ m. The zero blocks are gone, (c) Find a left inverse C with CRo ~ J if r m n. The F and 0 column is gone, (d) What is the reduced row echelon form of R° (with shapes)? (e) What is the reduced row echelon form af RqRq (with shapes)? Suppose you allow elementary column operations on A as well as elementary row- operations (which get to Ro). What is the "row-and-column reduced form” for an rn by n matrix A of rank r? Verify that equation (I) on page 89 is correct: IV is invertible and IV' The magk factorization is easy if the first r rows and columns of Л are independent. What multiple of block row 1 will equal block row 2 of this matrix ? [ W 1[ IV‘ ][ IV Я ] Г IV Я 1 I J ] = [ J JW~'H I
Elimination: The Big Picture

This page explains elimination at the vector level and subspace level, when A is reduced to R. You know the steps and I won't repeat them. Elimination starts with the first pivot. It moves a column at a time (left to right) and a row at a time (top to bottom) for U. Then upward elimination produces R0 and R. Elimination answers two questions:

Question 1  Is this column a combination of previous columns?
If the column contains a pivot, the answer is no. Pivot columns are "independent" of previous columns. If column 4 has no pivot, it is a combination of columns 1, 2, 3.

Question 2  Is this row a combination of previous rows?
If the row contains a pivot, the answer is no. Pivot rows are independent of previous rows, and their first nonzero is 1 from I. Rows that are all zero in R0 were and are not independent, and they disappear in R.

It is amazing to me that one pass through the matrix answers both questions 1 and 2. Elimination acts on the rows but the result tells us about the columns! The identity matrix in R locates the first r independent columns in A. Then the free columns F in R tell us the combinations of those independent columns that produce the dependent columns in A. This is easy to miss without seeing the factorization A = CR.

R tells us the special solutions to Ax = 0. We could reach R from A by different row exchanges and elimination steps, but it will always be the same R. (This is because the special solutions are decided by A. The formula comes before Problem Set 3.2.) In the language coming soon, R reveals a "basis" for three of the fundamental subspaces:

The column space of A: choose the columns of A that produce pivots in R.
The row space of A: choose the rows of R as a basis.
The nullspace of A: choose the special solutions to Rx = 0 (and Ax = 0).

For the left nullspace N(Aᵀ), we look at the elimination step EA = R0. The last m - r rows of R0 are zero. The last m - r rows of E are a basis for the left nullspace! In reducing [A I] to [R0 E], the matrix E keeps a record of elimination that is otherwise lost.

Suppose we fix C and R* (m by r and r by n, both rank r). Choose any invertible r by r mixing matrix M. All the matrices C M R* (and only those) have the same four fundamental subspaces as the original A.
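The remark about reducing [A I] to [R0 E] can be checked directly. Here is my own sketch, again using the 3 by 3 example A from this section: the rref of [A I] shows R0 in its first columns and one valid record E of the row operations in its last columns, and the last row of that E multiplies A to give the zero row.

```python
# Recovering a valid E from [A I] -> [R0 E] and checking the left nullspace (illustrative).
from sympy import Matrix, eye

A = Matrix([[1, 2, 1], [2, 4, 5], [3, 6, 9]])
m, n = A.shape

aug, _ = A.row_join(eye(m)).rref()    # eliminate on [A I]
R0 = aug[:, :n]                       # reduced row echelon form of A
E  = aug[:, n:]                       # record of the row operations: E*A = R0

print("E*A == R0:", E * A == R0)          # True
print("last row of E:", E[m - 1, :])      # a basis vector for the left nullspace N(A^T)
print("(last row of E)*A =", E[m - 1, :] * A)   # the zero row
```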
3.3  The Complete Solution to Ax = b

1  Complete solution to Ax = b:  x = (one particular solution x_p) + (any x_n in the nullspace).
2  Elimination on [A b] leads to [R0 d]. Ax = b is solvable exactly when the zero rows of R0 have zeros in d.
3  When R0 x = d is solvable, one very particular solution x_p has all free variables equal to zero.
4  A has full column rank r = n when its nullspace N(A) = zero vector: no free variables.
5  A has full row rank r = m when its column space C(A) is Rᵐ: Ax = b is always solvable.

The last section totally solved Ax = 0. Elimination converted the problem to R0 x = 0. The free variables were given special values (one and zero). Then the pivot variables were found by back substitution. We paid no attention to the right side b because it stayed at zero. Then zero rows in R0 were no problem.

Now b is not zero. Row operations on the left side must act also on the right side. Ax = b is reduced to a simpler system R0 x = d with the same solutions (if any). One way to organize that is to add b as an extra column of the matrix. I will "augment" A with the right side (b1, b2, b3) = (1, 6, 7) to produce the augmented matrix [A b]:

A = [1 3 0 2; 0 0 1 4; 1 3 1 6]   has the augmented matrix   [A b] = [1 3 0 2 1; 0 0 1 4 6; 1 3 1 6 7].

When we apply the usual elimination steps to A, reaching R0, we also apply them to b. In this example we subtract row 1 from row 3. Then we subtract row 2 from row 3. This produces a row of zeros in R0, and it changes b to a new right side d = (1, 6, 0):

R0 = [1 3 0 2; 0 0 1 4; 0 0 0 0]   has the augmented matrix   [R0 d] = [1 3 0 2 1; 0 0 1 4 6; 0 0 0 0 0].

That very last row is crucial. The third equation has become 0 = 0. So the equations can be solved. In the original matrix A, the first row plus the second row equals the third row. To solve Ax = b, we need b1 + b2 = b3 on the right side too. The all-important property of b was 1 + 6 = 7. That led to 0 = 0 in the third equation. This was essential. Here are the same augmented matrices for a general b = (b1, b2, b3):

[A b] = [1 3 0 2 b1; 0 0 1 4 b2; 1 3 1 6 b3]   ->   [R0 d] = [1 3 0 2 b1; 0 0 1 4 b2; 0 0 0 0 b3 - b1 - b2]

Now we get 0 = 0 in the third equation only if b3 - b1 - b2 = 0. This is b1 + b2 = b3.
One Particular Solution  Ax_p = b

For an easy solution x_p, choose the free variables to be zero: x2 = x4 = 0. Then the two nonzero equations give the two pivot variables x1 = 1 and x3 = 6. Our particular solution to Ax = b (and also R0 x = d) is x_p = (1, 0, 6, 0). This particular solution is my favorite: free variables are zero, pivot variables come from d. For a solution to exist, zero rows in R0 must also have zeros in d. Since I sits in the pivot rows and pivot columns of R0, the pivot variables in x_p come from d:

R0 x_p = [1 3 0 2; 0 0 1 4; 0 0 0 0] (1, 0, 6, 0) = (1, 6, 0) = d
Pivot variables 1, 6.   Free variables 0, 0.   Solution x_p = (1, 0, 6, 0).

Notice how we choose the free variables (as zero) and solve for the pivot variables. After the row reduction to R0, those steps are quick. When the free variables are zero, the pivot variables for x_p are already seen in the right side vector d.

x_particular   The particular solution solves   Ax_p = b
x_nullspace    The n - r special solutions solve   Ax_n = 0

That particular solution is (1, 0, 6, 0). The two special (nullspace) solutions to R0 x = 0 come from the two free columns of R0, by reversing signs of 3, 2, and 4. Please notice how I write the complete solution x_p + x_n to Ax = b:

Complete solution: one x_p + many x_n        x = x_p + x_n = (1, 0, 6, 0) + x2 (-3, 1, 0, 0) + x4 (-2, 0, -4, 1)
x_p : free variables = 0        x_n : special solutions

Question  Suppose A is a square invertible matrix, m = n = r. What are x_p and x_n?

Answer  The particular solution is the one and only solution x_p = A⁻¹b. There are no special solutions or free variables. R0 = I has no zero rows. The only vector in the nullspace is x_n = 0. The complete solution is x = x_p + x_n = A⁻¹b + 0.

We didn't mention the nullspace in Chapter 2, because A was invertible. It was reduced all the way to I. [A b] went to [I A⁻¹b]. Then Ax = b became x = A⁻¹b, which is d. This is a special case here, but square invertible matrices are the best. So they got their own chapter before this one.

For small examples we can reduce [A b] to [R0 d]. For a large matrix, MATLAB does it better. One particular solution (not necessarily ours) is x = A\b from backslash.

Here is an example with full column rank. Both columns have pivots.

Example 1  Find the condition on (b1, b2, b3) for Ax = b to be solvable, if b = (b1, b2, b3).
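Here is the same computation in code (my own sketch): rref of the augmented matrix gives [R0 d], the particular solution copies the entries of d into the pivot positions, and the nullspace supplies the special solutions.

```python
# x_p plus nullspace for A x = b with A = [1 3 0 2; 0 0 1 4; 1 3 1 6], b = (1, 6, 7).
from sympy import Matrix, zeros

A = Matrix([[1, 3, 0, 2], [0, 0, 1, 4], [1, 3, 1, 6]])
b = Matrix([1, 6, 7])

aug, pivots = A.row_join(b).rref()      # [R0 d]
d = aug[:, -1]

xp = zeros(A.cols, 1)                   # free variables = 0
for i, col in enumerate(pivots):        # pivot variables copied from d
    xp[col] = d[i]

print("x_p =", xp.T)                    # (1, 0, 6, 0)
print("A*x_p =", (A * xp).T)            # (1, 6, 7) = b
for s in A.nullspace():                 # special solutions (-3, 1, 0, 0) and (-2, 0, -4, 1)
    print("special solution:", s.T)
```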
Solution  Use the augmented matrix, with its extra column b. Subtract row 1 of [A b] from row 2, and add 2 times row 1 to row 3, to reach [R0 d]:

[A b] = [1 1 b1; 1 2 b2; -2 -3 b3]  ->  [1 1 b1; 0 1 b2 - b1; 0 -1 b3 + 2b1]  ->  [1 0 2b1 - b2; 0 1 b2 - b1; 0 0 b3 + b1 + b2] = [R0 d]

The last equation is 0 = 0 provided b3 + b1 + b2 = 0. This is the condition to put b in the column space. Then Ax = b will be solvable. The rows of A add to the zero row. So for consistency (these are equations!) the entries of b must also add to zero.

This example has no free variables since n - r = 2 - 2. Therefore no special solutions: the nullspace solution is x_n = 0. The particular solution to Ax = b and R0 x = d sits at the top of the final column d:

One solution to Ax = b        x = x_p + x_n = (2b1 - b2, b2 - b1) + (0, 0).

If b3 + b1 + b2 is not zero, there is no solution to Ax = b (x_p and x don't exist).

This example is typical of an extremely important case: A has full column rank. Every column has a pivot. The rank is r = n. The matrix is tall and thin (m > n). Row reduction puts I at the top, when A is reduced to R0 with rank n:

Full column rank r = n        R0 = [I; 0] = [n by n identity matrix; m - n rows of zeros]

There are no free columns or free variables. The nullspace is Z = {zero vector}.

We will collect together the different ways of recognizing this type of matrix.

Every matrix A with full column rank (r = n) has all these properties:

1. All columns of A are pivot columns. No free variables.
2. The nullspace N(A) contains only the zero vector x = 0.
3. If Ax = b has a solution (it might not) then it has only one solution.

In the essential language of the next section, this A has independent columns: Ax = 0 only happens when x = 0. Later we will add one more fact to the list above: the rows of A span Rⁿ when the rank is r = n. A may have many rows.

In this case the nullspace of A has shrunk to the zero vector. The solution to Ax = b, when it exists, is unique.
Full Row Rank and the Complete Solution

The other extreme case is full row rank. Now Ax = b has one or infinitely many solutions. In this case A must be short and wide (m <= n). A matrix has full row rank if r = m. The rows are independent and every row has a pivot. Here is an example.

Example 2  This system Ax = b has n = 3 unknowns but only m = 2 equations:

Full row rank          x + y + z = 3
(rank r = m = 2)       x + 2y - z = 4

These are two planes in xyz space. The planes are not parallel, so they intersect in a line. This line of solutions is exactly what elimination will find. The particular solution will be one point on the line. Adding the nullspace vectors x_n will move us along the line in Figure 3.1. Then x = x_p + all x_n gives the whole line of solutions.

Figure 3.1: Complete solution = one particular solution x_p + all nullspace solutions x_n.

We find x_p and x_n by elimination downwards and then upwards on [A b]:

[1 1 1 3; 1 2 -1 4]  ->  [1 1 1 3; 0 1 -2 1]  ->  [1 0 3 2; 0 1 -2 1] = [R d].

The particular solution (2, 1, 0) has free variable x3 = 0. It comes directly from d. The special solution s has x3 = 1. Then -x1 and -x2 come from the free column of R. It is wise to check that x_p and s satisfy the original equations Ax_p = b and As = 0:

2 + 1 = 3        -3 + 2 + 1 = 0
2 + 2 = 4        -3 + 4 - 1 = 0

The nullspace solution x_n is any multiple of s. It moves along the line of solutions, starting at x_particular. Please notice again how to write the answer:

Complete solution        x = x_p + x_n = (2, 1, 0) + x3 (-3, 2, 1)        Particular + nullspace

This line of solutions is drawn in Figure 3.1. Any point on the line could have been chosen as the particular solution. We chose x_p as the point with x3 = 0.
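Example 2 in code (my own sketch): rref of [A b] gives x_p = (2, 1, 0), the nullspace gives s = (-3, 2, 1), and every point x_p + t s on the line really does solve Ax = b.

```python
# The line of solutions for Example 2: x = x_p + t*s (illustrative check).
from sympy import Matrix, zeros

A = Matrix([[1, 1, 1], [1, 2, -1]])
b = Matrix([3, 4])

aug, pivots = A.row_join(b).rref()
xp = zeros(3, 1)
for i, col in enumerate(pivots):
    xp[col] = aug[i, -1]                # x_p = (2, 1, 0), free variable x3 = 0

s = A.nullspace()[0]                    # s = (-3, 2, 1)

print("x_p =", xp.T, " s =", s.T)
for t in [0, 1, -2]:                    # every point x_p + t*s lies on the solution line
    print("A*(x_p + %d*s) =" % t, (A * (xp + t * s)).T)   # always (3, 4) = b
```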
The particular solution is not multiplied by an arbitrary constant! The special solution is: the nullspace needs that constant, and you understand why.

Now we summarize this short wide case of full row rank. With m < n, Ax = b is underdetermined (many solutions).

Every matrix A with full row rank (r = m) has all these properties:

1. All rows have pivots, and R0 has no zero rows: R0 = R.
2. Ax = b has a solution for every right side b.
3. The column space is the whole space Rᵐ.
4. There are n - r = n - m special solutions in the nullspace of A.

In this case with m pivots, the rows are linearly independent. So the columns of Aᵀ are linearly independent. The nullspace of Aᵀ contains only the zero vector. And that nullspace N(Aᵀ) will be the fourth fundamental subspace.

We are ready for the definition of linear independence, as soon as we summarize the four possibilities, which depend on the rank. Notice how r, m, n are the critical numbers.

The four possibilities for linear equations depend on the rank r:

r = m and r = n    Square and invertible    Ax = b has 1 solution
r = m and r < n    Short and wide           Ax = b has infinitely many solutions
r < m and r = n    Tall and thin            Ax = b has 0 or 1 solution
r < m and r < n    Not full rank            Ax = b has 0 or infinitely many solutions

The reduced R0 will fall in the same category as the matrix A. In case the pivot columns happen to come first, we can display these four possibilities. For R0 x = d and Ax = b to be solvable, d must end in m - r zeros. F is the free part of R0.

Four types for R0:   [I]   [I F]   [I; 0]   [I F; 0 0]     Their ranks:   r = m = n,   r = m < n,   r = n < m,   r < m and r < n

Cases 1 and 2 have full row rank r = m. Cases 1 and 3 have full column rank r = n.

REVIEW OF THE KEY IDEAS

1. The rank r is the number of pivots. The matrix R0 has m - r zero rows.
2. Ax = b is solvable if and only if the last m - r equations reduce to 0 = 0.
3. One particular solution x_p has all free variables equal to zero.
4. The pivot variables are determined after the free variables are chosen.
5. Full column rank r = n means no free variables: one solution or none.
6. Full row rank r = m means one solution if m = n or infinitely many if m < n.
WORKED EXAMPLES

3.3 A  This question connects elimination (pivot columns and back substitution) to column space-nullspace-rank-solvability (the higher level picture). A has rank 2:

Ax = b  is    x1 + 2x2 + 3x3 + 5x4 = b1
              2x1 + 4x2 + 8x3 + 12x4 = b2          A = [1 2 3 5; 2 4 8 12; 3 6 7 13]
              3x1 + 6x2 + 7x3 + 13x4 = b3

1. Reduce [A b] to [U c], so that Ax = b becomes a triangular system Ux = c.
2. Find the condition on b1, b2, b3 for Ax = b to have a solution.
3. Describe the column space of A. Which plane in R³?
4. Describe the nullspace of A. Which special solutions in R⁴?
5. Reduce [U c] to [R0 d]: special solutions from R0, particular solution from d.
6. Find a particular solution to Ax = (0, 6, -6) and then the complete solution.

Solution

1. The multipliers in elimination are 2 and 3 and -1. They take [A b] into [U c]:

[1 2 3 5 b1; 2 4 8 12 b2; 3 6 7 13 b3]  ->  [1 2 3 5 b1; 0 0 2 2 b2 - 2b1; 0 0 -2 -2 b3 - 3b1]  ->  [1 2 3 5 b1; 0 0 2 2 b2 - 2b1; 0 0 0 0 b3 + b2 - 5b1]

2. The last equation shows the solvability condition b3 + b2 - 5b1 = 0. Then 0 = 0.

3. First description: The column space is the plane containing all combinations of the pivot columns (1, 2, 3) and (3, 8, 7). The pivots are in columns 1 and 3. Second description: The column space contains all vectors with b3 + b2 - 5b1 = 0. That makes Ax = b solvable, so b is in the column space. All columns of A pass this test b3 + b2 - 5b1 = 0. This is the equation for the plane in the first description.

4. The special solutions have free variables x2 = 1, x4 = 0 and then x2 = 0, x4 = 1:

Special solutions to Ax = 0:   s1 = (-2, 1, 0, 0)  and  s2 = (-2, 0, -1, 1)
(Back substitution in Ux = 0, or change signs of 2, 2, 1 in R0.)

The nullspace N(A) in R⁴ contains all x_n = c1 s1 + c2 s2.

5. In the reduced form R0, the third column changes from (3, 2, 0) in U to (0, 1, 0). The right side c = (0, 6, 0) becomes d = (-9, 3, 0), showing -9 and 3 in

[U c] = [1 2 3 5 0; 0 0 2 2 6; 0 0 0 0 0]  ->  [R0 d] = [1 2 0 2 -9; 0 0 1 1 3; 0 0 0 0 0]

6. x_p = (-9, 0, 3, 0) is the very particular solution with free variables = zero. The complete solution to Ax = (0, 6, -6) is x = x_p + x_n = x_p + c1 s1 + c2 s2.
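A compact check of steps 2 to 6 (my own sketch): the solvability condition, the special solutions, and the particular solution for b = (0, 6, -6) all come out of elimination and rref.

```python
# Worked Example 3.3 A in code (illustrative check of the hand computation).
from sympy import Matrix, zeros, symbols

A = Matrix([[1, 2, 3, 5], [2, 4, 8, 12], [3, 6, 7, 13]])
b1, b2, b3 = symbols('b1 b2 b3')

# Step 2: eliminate on [A b] with a symbolic right side to see the condition on b.
aug = A.row_join(Matrix([b1, b2, b3]))
r1, r2, r3 = aug[0, :], aug[1, :], aug[2, :]
r2 = r2 - 2 * r1
r3 = r3 - 3 * r1 + r2
print("last row of [U c]:", r3)            # (0, 0, 0, 0, b3 + b2 - 5*b1): the condition on b

# Steps 4-6 for the solvable right side b = (0, 6, -6).
b = Matrix([0, 6, -6])
R0d, pivots = A.row_join(b).rref()
xp = zeros(4, 1)
for i, col in enumerate(pivots):
    xp[col] = R0d[i, -1]
print("x_p =", xp.T)                                          # (-9, 0, 3, 0)
print("special solutions:", [s.T for s in A.nullspace()])     # (-2,1,0,0) and (-2,0,-1,1)
```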
3.3 B  Suppose you know this much about the solutions to Ax = b for one specific right side b. What does that tell you about m and n and r, and about A itself?

1. There is exactly one solution.
2. All solutions to Ax = b have the form x = (1, 1, 0) + c (1, 0, 1).
3. There are no solutions.

Solution  In case 1 there are no free variables: the nullspace of A contains only x = 0 and the rank is r = n. Necessarily m >= n: A has n independent columns.

In case 2, A must have n = 3 columns (and m is arbitrary). With (1, 0, 1) in the nullspace of A, column 3 is -(column 1). The rank is 3 - 1 = 2, and b is column 1 + column 2 because (1, 1, 0) is a particular solution.

In case 3 we only know that b is not in the column space of A. The rank of A must be less than m. Surely b is not zero, otherwise x = 0 would be a solution.

3.3 C  Find the complete solution x = x_p + x_n by forward elimination on [A b]:

[A b] = [1 2 1 0 4; 2 4 4 8 2; 4 8 6 8 10]

Solution

[1 2 1 0 4; 2 4 4 8 2; 4 8 6 8 10]  ->  [1 2 1 0 4; 0 0 2 8 -6; 0 0 2 8 -6]  ->  [1 2 1 0 4; 0 0 2 8 -6; 0 0 0 0 0]  ->  [R0 d] = [1 2 0 -4 7; 0 0 1 4 -3; 0 0 0 0 0]

For the nullspace part x_n with b = 0, set the free variables x2, x4 to 1, 0 and also 0, 1:

Special solutions  s1 = (-2, 1, 0, 0)  and  s2 = (4, 0, -4, 1)        Particular  x_p = (7, 0, -3, 0)

Then the complete solution to Ax = b is x_complete = x_p + c1 s1 + c2 s2.

The rows of A produced the zero row from 2(row 1) + (row 2) - (row 3) = (0, 0, 0, 0). Thus y = (2, 1, -1). The same combination for b = (4, 2, 10) gives 2(4) + (2) - (10) = 0. If a combination of the rows (on the left side) gives the zero row, then the same combination must give zero on the right side. Of course! Otherwise no solution.

Later we will say this again in different words: If every column of A is perpendicular to y = (2, 1, -1), then any combination b of those columns must also be perpendicular to y. Otherwise b is not in the column space and Ax = b is not solvable.

And again: If y is in the nullspace of Aᵀ then y must be perpendicular to every b in the column space of A. Just looking ahead...
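That "looking ahead" remark can be checked in one line each (my own sketch): y = (2, 1, -1) multiplies A to give the zero row, y is perpendicular to b, and the particular solution really hits b.

```python
# Checking the left-nullspace vector y for Worked Example 3.3 C (illustrative).
from sympy import Matrix

A = Matrix([[1, 2, 1, 0], [2, 4, 4, 8], [4, 8, 6, 8]])
b = Matrix([4, 2, 10])
y = Matrix([2, 1, -1])

print("y^T A =", (y.T * A))        # the zero row: y is in N(A^T)
print("y . b =", (y.T * b)[0])     # 0: b satisfies the same combination, so Ax = b is solvable
print("x_p check:", (A * Matrix([7, 0, -3, 0])).T)    # (4, 2, 10) = b
```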
103 3.3. The Complete Solution to Ax = b Problem Set 33 3 (Recommended) Execute the six steps of Worked Example 3.3 A to describe the column space and nullspace of A and the complete solurion to Ax = b 2 4 6 4 2 5 7 6 2 3 5 2 Г I b= 5, [ Ьз J 4 3 5 A = 2 Cany out the same six steps for this matrix A with rank one. You will find two conditions on bi, 63,63 for Ax — 6 to be solvable. Together these two conditions put b into the------------space (two planes give a line): |2 I 3) = bi bj bj 4 2 10 311 n Questions 3-15 are about the solution of Ax = b. Follow the steps in lhe text to x,, and xn. Start from the augmented matrix with last column 6. 3 Write lhe complete solution as xf plus any multiple of a in lhe nullspace: x + 3y = 7 2x + 6y=l4 x + Зу + За = 1 2x + 6p + 9t - 5 —x - 3jr + 3a = 5 4 Find the complete solution x — x,+ any z„ (also called the general solution) to 1 2 0 3 1 2‘ 6 4 8 0 2 4 5 Underwhatconditionsonbi.bj.bjarethesesystemssolvable? Include b as a fourth column in elimination. Find all solutions when that condition holds: x + 2p — 2a = bi 2x + 2a ” b> 2х + 5у-4г = Ьз 4-r + 4y = t>j 4x + 9y — 8a = 63 8x + 8y = 6з 6 What conditions on bi, 6j, 63, 64 make each system solvable? Find x in that case: 1 2 2 3 2' 4 5 9 bi bi bj b4 12 3 2 4 6 2 5 7 3 9 12
104 10 11 12 13 The ranker = Copter 3. The Four Fundamental Sub^ . . is in the column space if h - 2^ + 4/>] Г1 3 1 3 8 2 . 2 4 0. . s.i«in the column space of Л? Which combinations of the Which vectors (•>!• 03; rows of Л give zero? 1 2 1 2 6 3 0 2 3 - form z. + *» ю d** ful1 rank systcms: Find the complete solution in the form z, (b) A 1 1 1' 1 2 4 2 4 8 (b) Construct a 2 by 3 Az - b wit.^particular so.ution x, - (2,4,0) and homogeneous solution x. - »У mull,Pkr " <*• *• 1,1 Why can’t a I by 3 system have « (2.4.0) and x. - any multiple of (1,1,1)? (a) If Az - b has two solutions z, and x2. find two solutions to Ax = 0. (b) Then find another solution to Az - 0 and another solution to Ax - b Explain why these are all false: (a) The complete solution is any linear combination of aj, and xn. (b) A system Az - b has at most one particular solution. This is true if A is 7 8 9 14 15 W A (c) The solution z, with all free variables zero is the shortest solution (minimum length ||z||). Find a 2 by 2 counterexample. (d) If A is invertible there is no solution x„ in the nullspace. Suppose column 5 of (/’has no pivot Then x5 is a____ variable. The zero vector (is) (is not) the only solution to Az = 0. If Az = b has a solution, then it has solutions. Suppose row 3 of U has no pivot. Then that row is___The equation Ux = e 1^ли°,*Л’е pro*'<W-----------C4uatlon Ax ’ 6 (й) (“ no,) not
33. The Complete Solution to Ax = b 105 16 The largest possible rank of a 3 by 5 matrix is_______. Then there is a pivot in every -— and R. The solution to Ax = b (always exists) (is unique). The column space of A is________An example is A =__________ 57 The largest possible rank of a 6 by 4 matrix is _________. Then there is a pivot in every ------of U and R. The solution to Az = b (ofweyr exists) (is unique). The nullspace of A is_______An example is A =__________. 18 Find by elimination the rank of A and also the rank of AT: 1 4 O' 2 11 5 11 5 2 10 1 0 Г and A = 1 1 2 (rank depends on q). .1 1 Я. 19 If Az = b has infinitely many solutions, why is it impossible for Az » В (new right side) to have only one solution? Could Az “ В have no solution ? 20 Choose the number q so that (if possible) the ranks are (a) I (b)2 (c)3: 6 4 2 -3 -2 -1 9 6 q 21 Give examples of matrices A for which the number of solutions to Az = b is (a) 0 or 1, depending oo b (b) oc. regardless of b (c) 0 or oo, depending on b (d) I, regardless of b. 22 Write down all known relations between r and m and n if Az b has (a) no solution for some b (b) one solution for some b. no solution for other b (c) infinitely many solutions for every b (d) exactly one solution for every b. Questions 23-27 are about the reduced echelon matrices Ro and R. 23 Divide rows by pivots. Then produce zeros above those pivots to reach Ro »nd R- 2 4 4' 0 3 6 0 0 0 and U= 2 4 4' 0 3 6 0 0 5 24 If A is a triangular matrix, when is Ro - rref(A) equal to I ? 25 Apply elimination to Ux = 0 and Ux = c. Reach R«z = 0 and Roz = d : l«'»l-[J o’ °i “d l" •=!-[' 1 2 3 51 0 0 4 Solve Roz = 0 to find xn with z2 = 1. Solve Roz = d to find xr with za = 0.
Chapter 3. The Four Fundamental Subsp^ = - c . to - 0 - - d Wl“’m •» Яо1=<*? 106 26 27 28 29 30 31 32 33 '3 0 о 0 о 0 6 9' 0 4 . 2 5 3 0 6 0 0 0 U с and 0 0 0 2 _ ,,x _ c (Gaussian elimination) and then R»x = d. и о 0 0 Find a panicuia, and all homogeneous solutions zn. 10 2 3 13 2 0 2 0 4 9 ’ 2 5 10 = b. Hnd matrices A and В with the gisen property or explain why you can’t: 4 "[!]• (a) The only solution of Ar - 0 1 . (b) The only solution of Bz = j I t isz = 1 2 . 3 Find the LU factorization of A and all solutions to Ax = b: The complete solution to Az = 0 1 1 0 0 0 . Find A. (Recommended!) Suppose you know that the 3 by 4 matrix A has the vector s = (2,3.1,0) as the only special solution to Ax = 0. (a) What is the rank of A and the complete solution to Ax = 0? (b) What is the exact row reduced echelon form Яо of A? (c) H°* do you know that Ax = Ь can be solved for all b ? to AequAC?* = 6 ,he Mme <comPle,e) solutions for every b. Describe the column snace nf . j . Removing any zero rouT a- . educed row echelon matrix Rq with rank r. g any zero rows, descnbe the column space of R.
3.4  Independence, Basis, and Dimension

1  Independent vectors: The only zero combination c1 v1 + ... + ck vk = 0 has all c's = 0.
2  The vectors v1, ..., vk span the space S if S = all combinations of the v's.
3  The vectors v1, ..., vk are a basis for S if (1) they are independent and (2) they span S.
4  The dimension of a space S is the number k of vectors in every basis for S.

This important section is about the true size of a subspace. There are n columns in an m by n matrix. But the true "dimension" of the column space is not necessarily n. The dimension of C(A) is measured by counting independent columns. We will see again that the true dimension of the column space is the rank r.

The idea of independence applies to any vectors v1, ..., vn in any vector space. Most of this section concentrates on the subspaces that we know and use, especially the column space and the nullspace of A. In the last part we also study "vectors" that are not column vectors. They can be matrices and functions; they can be linearly independent or dependent. First come the key examples using column vectors.

The goal is to understand a basis: independent vectors that "span the space". Every vector in the space is a unique combination of the basis vectors. We are at the heart of our subject, and we cannot go on without a basis. The four essential ideas in this section (with first hints at their meaning) are:

1. Independent vectors  (no extra vectors)
2. Spanning a space  (enough vectors to produce the rest)
3. Basis for a space  (not too many and not too few)
4. Dimension of a space  (the number of vectors in every basis)

Linear Independence

Our first definition of independence is not so conventional, but you are ready for it.

DEFINITION  The columns of A are linearly independent when the only solution to Ax = 0 is x = 0. No other combination Ax of the columns gives the zero vector.

The columns are independent when the nullspace N(A) contains only the zero vector.

Let me illustrate linear independence (and dependence) with three vectors in R³.
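That definition translates directly into a computation (my own sketch, not part of the book): put the vectors into the columns of A and ask whether N(A) contains only the zero vector, i.e. whether the rank equals the number of columns.

```python
# Testing linear independence of column vectors (illustrative sketch).
from sympy import Matrix

def independent(vectors):
    """Columns are independent exactly when rank = number of columns (nullspace = {0})."""
    A = Matrix.hstack(*[Matrix(v) for v in vectors])
    return A.rank() == A.cols

print(independent([(1, 0, 0), (1, 1, 0), (1, 1, 1)]))    # True: three independent vectors in R^3
print(independent([(1, 2, 3), (2, 4, 6)]))               # False: the second is twice the first
print(independent([(1, 0), (0, 1), (1, 1)]))             # False: three vectors in R^2 must be dependent
```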
1. If three vectors v1, v2, v3 are not in the same plane, they are independent. No combination of v1, v2, v3 gives zero except 0v1 + 0v2 + 0v3.

2. If three vectors w1, w2, w3 are in the same plane in R³, they are dependent.

Figure 3.2: Independent: only 0v1 + 0v2 + 0v3 gives 0. Dependent: w1 - w2 + w3 = 0.

This idea of independence applies to 7 vectors in 12-dimensional space. If they are the columns of A, and independent, the nullspace only contains x = 0. None of the vectors is a combination of the other six vectors. Now we choose different words to express the same idea in any vector space.

DEFINITION  The sequence of vectors v1, ..., vn is linearly independent if the only combination that gives the zero vector is 0v1 + 0v2 + ... + 0vn.

Linear independence:  x1 v1 + x2 v2 + ... + xn vn = 0  only happens when all x's are zero.

If a combination gives 0, when the x's are not all zero, the vectors are dependent.

Correct language: "The sequence of vectors is linearly independent." Acceptable shortcut: "The vectors are independent." Unacceptable: "The matrix is independent."

A sequence of vectors is either dependent or independent. They can be combined to give the zero vector (with nonzero x's) or they can't. So the key question is: Which combinations of the vectors give zero? We begin with some small examples in R²:

(a) The vectors (1, 0) and (1, 0.00001) are independent.
(b) The vectors (1, 1) and (-1, -1) are dependent.
(c) The vectors (1, 1) and (0, 0) are dependent because of the zero vector.
(d) In R², any three vectors (a, b), (c, d), (e, f) are dependent.

Dependent columns:  (1, 1) is in the nullspace.  [1 -1; 1 -1][x1; x2] = 0 for x1 = 1 and x2 = 1.

Three vectors in R² cannot be independent. One way to see this: the matrix A with those three columns must have a free variable and then a special solution to Ax = 0. Now move to three vectors in R³. If one of them is a multiple of another one, these vectors are dependent. But the complete test involves all three vectors at once. We put them in a matrix and try to solve Ax = 0.
Example 1  The columns of this A are dependent: Ax = 0 has a nonzero solution. The rank is only r = 2. Independent columns would produce full column rank r = n = 3. For a square matrix, dependent columns imply dependent rows and vice versa.

Question  How to find that solution to Ax = 0? The systematic way is elimination.

Full column rank  The columns of A are independent exactly when the rank is r = n. There are n pivots and no free variables and A = C. Only x = 0 is in the nullspace.

One case is of special importance because it is clear from the start. Suppose seven columns have five components each (m = 5 is less than n = 7). Then the columns must be dependent. Any seven vectors from R⁵ are dependent. The rank of A cannot be larger than 5. There cannot be more than five pivots in five rows. Ax = 0 has at least 7 - 5 = 2 free variables, so it has nonzero solutions, which means that the columns are dependent.

Any set of n vectors in Rᵐ must be linearly dependent if n > m.

This type of matrix has more columns than rows: it is short and wide. The columns are certainly dependent if n > m, because Ax = 0 has a nonzero solution. The columns might be dependent or might be independent if n < m. Elimination will reveal the r pivot columns. It is those r pivot columns that are independent in C.

Note  Another way to describe linear dependence is this: "One vector is a combination of the other vectors." That sounds clear. Why don't we say this from the start? Our definition was longer: "Some combination gives the zero vector, other than the trivial combination with every x = 0." We must rule out the easy way to get the zero vector. The point is, our definition doesn't pick out one particular vector as guilty. All columns of A are treated the same. We look at Ax = 0, and it has a nonzero solution or it hasn't.
Vectors that Span a Subspace

No single word describes the column space C(A) better: the column space consists of all combinations Ax of the columns. The columns of a matrix span its column space. They might be dependent.

DEFINITION  A set of vectors spans a space if their combinations fill that space.

Example 2  Describe the column space and the row space of A (m = 3, n = 2):

A = [1 4; 2 7; 3 5]        Aᵀ = [1 2 3; 4 7 5]

The column space of A is the plane in R³ spanned by the two columns of A. The row space of A is spanned by the three rows of A (which are the columns of Aᵀ); it is all of R².

Remember  The rows are in Rⁿ spanning the row space. The columns are in Rᵐ spanning the column space. Same numbers, different vectors, different spaces.

A Basis for a Vector Space

Two vectors can't span all of R³, even if they are independent. Four vectors can't be independent, even if they span R³. We want enough independent vectors to span the space (and not more). A "basis" is just right.

DEFINITION  A basis for a vector space is a sequence of vectors with two properties: the basis vectors are linearly independent and they span the space.

This combination of properties is fundamental to linear algebra. Every vector v in the space is a combination of the basis vectors, because they span the space. More than that, the combination that produces v is unique, because the basis vectors v1, ..., vn are independent:

There is one and only one way to write v as a combination of the basis vectors.

Reason: Suppose v = a1 v1 + ... + an vn and also v = b1 v1 + ... + bn vn. By subtraction, (a1 - b1) v1 + ... + (an - bn) vn is the zero vector. From the independence of the v's, each ai - bi = 0. Hence ai = bi, and there are not two ways to produce v.

Example 3  The columns of I = [1 0; 0 1] produce the "standard basis" for R². The basis vectors i = (1, 0) and j = (0, 1) are independent. They span R².

Everybody thinks of this basis first. The vector i goes across and j goes straight up. The columns of the n by n identity matrix give the "standard basis" for Rⁿ.
Now we find many other bases (infinitely many). The basis is not unique!

Example 4 (Important)  The columns of every invertible n by n matrix give a basis for Rⁿ:

Invertible matrix A: independent columns, column space is R³.
Singular matrix B = [1 0 1; 1 1 2; 1 1 2]: dependent columns, column space is not R³.

The only solution to Ax = 0 is x = A⁻¹0 = 0. The columns are independent. They span the whole space R³ because every vector b is a combination of the columns: Ax = b can always be solved by x = A⁻¹b. Do you see how everything comes together for invertible matrices? Here it is in one sentence:

The vectors v1, ..., vn are a basis for Rⁿ exactly when they are the columns of an n by n invertible matrix. Thus Rⁿ has infinitely many different bases.

When the columns are dependent, we keep only the pivot columns: the first two columns of B above. Those two columns are independent and they span the column space.

Every set of independent vectors can be extended to a basis. Every spanning set of vectors can be reduced to a basis.

Example 5  This matrix is not invertible. Its columns are not a basis for anything! There is one pivot column and one pivot row (r = 1).

Example 6  Find bases for the column and row spaces of this rank two matrix:

R0 = [1 2 0 3; 0 0 1 4; 0 0 0 0]

Columns 1 and 3 are the pivot columns. They are a basis for the column space of R0. The column space is the "xy plane" inside xyz space R³. That plane is not R², it is a subspace of R³. Columns 2 and 3 are also a basis for the same column space. Which pair of columns of R0 is not a basis for its column space?

The row space is a subspace of R⁴. The simplest basis for that row space is the two nonzero rows of R0. The zero vector is never in a basis.

Question  Given five vectors in R⁷, how do you find a basis for the space they span?

First answer   Make them the rows of A, and eliminate to find the nonzero rows in R.
Second answer  Put the five vectors into the columns of A. Eliminate to find the pivot columns. Those pivot columns in C are a basis for the column space.

Could another basis have more vectors, or fewer? This is a crucial question with a good answer: No. All bases for a vector space contain the same number of vectors. The number of vectors in any and every basis is the "dimension" of the space.
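The one-sentence test (a basis of Rⁿ comes from the columns of an invertible matrix) is a one-line computation. A sketch with example vectors of my own choosing:

```python
# Columns form a basis for R^n exactly when the square matrix is invertible (illustrative).
from sympy import Matrix

def is_basis_of_Rn(vectors):
    A = Matrix.hstack(*[Matrix(v) for v in vectors])
    return A.is_square and A.det() != 0      # n vectors in R^n, and they are independent

print(is_basis_of_Rn([(1, 0, 0), (1, 1, 0), (1, 1, 1)]))    # True: invertible, a basis for R^3
print(is_basis_of_Rn([(1, 1, 1), (1, 1, 2), (2, 2, 3)]))    # False: column 3 = column 1 + column 2
```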
Dimension of a Vector Space

We have to prove what was just stated. There are many choices for the basis vectors, but the number of basis vectors doesn't change.

If v1, ..., vm and w1, ..., wn are both bases for the same vector space, then m = n.

Proof  Suppose that there are more w's than v's. From n > m we will reach a contradiction. The v's are a basis, so w1 must be a combination of the v's. If w1 = a11 v1 + ... + am1 vm, this is the first column of a matrix multiplication V A = W:

W = [w1 w2 ... wn] = [v1 v2 ... vm] A        Each w is a combination of the v's.

We don't know each a_ij, but we know the shape of A (it is m by n). The second vector w2 is also a combination of the v's. The coefficients in that combination fill the second column of A. The key is that A has a row for every v and a column for every w. A is a short wide matrix, since we assumed n > m. So Ax = 0 has a nonzero solution.

Ax = 0 gives V Ax = 0 which is Wx = 0. A combination of the w's gives zero! Then the w's could not be a basis, so our assumption n > m is not possible for two bases.

If m > n we exchange the v's and w's and repeat the same steps. The only way to avoid a contradiction is to have m = n. This completes the proof that m = n. The number of basis vectors is the dimension. So the dimension of Rⁿ is n.

We now define the important word dimension.

DEFINITION  The dimension of a space is the number of vectors in every basis.

The dimension matches our intuition. The line through v = (1, 5, 2) has dimension one. It is a subspace with this one vector v in its basis. Perpendicular to that line is the plane x + 5y + 2z = 0. This plane has dimension 2. To prove it, we find a basis (-5, 1, 0) and (-2, 0, 1). The dimension is 2 because the basis contains two vectors. The plane is the nullspace of the matrix A = [1 5 2], which has two free variables. Our basis vectors (-5, 1, 0) and (-2, 0, 1) are the "special solutions" to Ax = 0. The n - r special solutions always give a basis for the nullspace: dimension n - r.

Note about the language of linear algebra  We never say "the rank of a space" or "the dimension of a basis" or "the basis of a matrix". Those terms have no meaning. It is the dimension of the column space that equals the rank of the matrix.
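The plane example in code (my own sketch): the one-row matrix A = [1 5 2] has n - r = 3 - 1 = 2 special solutions, and they are exactly the basis (-5, 1, 0) and (-2, 0, 1).

```python
# Dimension of the plane x + 5y + 2z = 0 as the nullspace of A = [1 5 2] (illustrative).
from sympy import Matrix

A = Matrix([[1, 5, 2]])
basis = A.nullspace()                  # special solutions, one per free variable

print("n - r =", A.cols - A.rank())    # 3 - 1 = 2 = dimension of the plane
for v in basis:
    print(v.T, " A*v =", (A * v)[0])   # (-5, 1, 0) and (-2, 0, 1), both in the plane
```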
Bases for Matrix Spaces and Function Spaces

The words "independence" and "basis" and "dimension" are not limited to column vectors. We can ask whether three matrices A1, A2, A3 are independent. When they are 3 by 4 matrices, some combination might give the zero matrix. We can also ask the dimension of the full 3 by 4 matrix space. (It is 12.)

In differential equations, d²y/dx² = y has a space of solutions. One basis is y = e^x and y = e^(-x). Counting the basis functions gives the dimension 2 for this solution space. (The dimension is 2 because of the second derivative.)

Matrix spaces  The vector space M contains all 2 by 2 matrices. Its dimension is 4.

One basis is  A1 = [1 0; 0 0],  A2 = [0 1; 0 0],  A3 = [0 0; 1 0],  A4 = [0 0; 0 1].

Those matrices are linearly independent. We are not looking at their columns, but at the whole matrix. Combinations of those four matrices can produce any matrix in M. Every A combines the basis matrices:

c1 A1 + c2 A2 + c3 A3 + c4 A4 = [c1 c2; c3 c4] = A.

A is zero only if the c's are all zero; this proves independence of A1, A2, A3, A4.

The three matrices A1, A2, A4 are a basis for a subspace: the upper triangular matrices. Its dimension is 3. A1 and A4 are a basis for the diagonal matrices. What is a basis for the symmetric matrices? Keep A1 and A4, and throw in A2 + A3.

The dimension of the whole n by n matrix space is n². The dimension of the subspace of upper triangular matrices is n²/2 + n/2. The dimension of the subspace of diagonal matrices is n. The dimension of the subspace of symmetric matrices is n²/2 + n/2 (why?).

Function spaces  The equations d²y/dx² = 0 and d²y/dx² = -y and d²y/dx² = y involve the second derivative. In calculus we solve to find the functions y(x):

y'' = 0   is solved by any linear function  y = cx + d
y'' = -y  is solved by any combination  y = c sin x + d cos x
y'' = y   is solved by any combination  y = c e^x + d e^(-x).

That solution space for y'' = -y has two basis functions: sin x and cos x. The space for y'' = 0 has x and 1. It is the "nullspace" of the second derivative! The dimension is 2 in each case (these are second-order equations).

The solutions of y'' = 2 don't form a subspace; the right side b = 2 is not zero. A particular solution is y(x) = x². The complete solution is y(x) = x² + cx + d. All those functions satisfy y'' = 2. Notice the particular solution plus any function cx + d in the nullspace. A linear differential equation is like a linear matrix equation Ax = b.

We end here with the space Z that contains only the zero vector. The dimension of this space is zero. The empty set (containing no vectors) is a basis for Z. We can never allow the zero vector into a basis, because then linear independence is lost.
REVIEW OF THE KEY IDEAS

1. The columns of A are independent if x = 0 is the only solution to Ax = 0.
2. The vectors v1, ..., vr span a space if their combinations fill that space.
3. A basis consists of linearly independent vectors that span the space. Every vector in the space is a unique combination of the basis vectors.
4. All bases for a space contain the same number of vectors. That number is the dimension.
5. The pivot columns are one basis for the column space. The dimension is r.

WORKED EXAMPLES

3.4 A  Suppose v1, ..., vn is a basis for Rⁿ and the n by n matrix A is invertible. Show that Av1, ..., Avn is also a basis for Rⁿ.

Solution  In matrix language: Put the basis vectors v1, ..., vn in the columns of an invertible (!) matrix V. Then Av1, ..., Avn are the columns of AV. Since A is invertible, so is AV. Its columns give a basis.

In vector language: Suppose c1 Av1 + ... + cn Avn = 0. This is Av = 0 with v = c1 v1 + ... + cn vn. Multiply by A⁻¹ to reach v = 0. By linear independence of the v's, all ci = 0. This shows that the Av's are independent. To show that the Av's span Rⁿ, solve c1 Av1 + ... + cn Avn = b, which is the same as c1 v1 + ... + cn vn = A⁻¹b. Since the v's are a basis, this must be solvable.

3.4 B  Start with the vectors v1 = (1, 2, 0) and v2 = (2, 3, 0).
(a) Are they linearly independent?
(b) Are they a basis for any space V?
(c) What is the dimension of V?
(d) Which matrices A have V as their column space?
(e) Which matrices have V as their nullspace?
(f) Describe all vectors v3 that complete a basis v1, v2, v3 for R³.

Solution
(a) v1 and v2 are independent: the only combination to give 0 is 0v1 + 0v2.
(b) Yes, they are a basis for the space V they span: all vectors (x, y, 0).
(c) The dimension of V is 2, since the basis contains two vectors.
(d) This V is the column space of any 3 by n matrix A of rank 2, if every column is a combination of v1 and v2. In particular A could just have columns v1 and v2.
(e) This V is the nullspace of any m by 3 matrix B of rank 1, if every row is a multiple of (0, 0, 1). In particular take B = [0 0 1]. Then Bv1 = 0 and Bv2 = 0.
(f) Any third vector v3 = (a, b, c) will complete a basis for R³ provided c is not zero.
3.4 C  Start with three independent vectors w1, w2, w3. Take combinations of those vectors to produce v1, v2, v3. Write the combinations in matrix form as V = W B:

v1 = w1 + w2
v2 = w1 + 2w2 + w3        which is        [v1 v2 v3] = [w1 w2 w3] [1 1 0; 1 2 1; 0 1 c]
v3 = w2 + c w3

What is the test on B to see if V = W B has independent columns? If c is not 1, show that v1, v2, v3 are linearly independent. If c = 1, show that the v's are linearly dependent.

Solution  For independent columns, the nullspace of V must contain only the zero vector: Vx = 0 requires x = (0, 0, 0).

If c = 1 in our problem, we can see dependence in two ways. First, v1 + v3 will be the same as v2. Then v1 - v2 + v3 = 0, which says that the v's are dependent. The other way is to look at the nullspace of B. If c = 1, the vector x = (1, -1, 1) is in that nullspace, and Bx = 0. Then certainly W Bx = 0, which is the same as Vx = 0. So the v's are dependent: v1 - v2 + v3 = 0.

Now suppose c is not 1. Then the matrix B is invertible. So if x is any nonzero vector, we know that Bx is nonzero. Since the w's are given as independent, we further know that W Bx is nonzero. Since V = W B, this says that x is not in the nullspace of V. In other words, v1, v2, v3 are independent.

The general rule is "independent v's from independent w's when B is invertible". And if these vectors are in R³, they are not only independent: they are a basis for R³. "Basis of v's from basis of w's when the change of basis matrix B is invertible."
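A symbolic check of 3.4 C (my own sketch): det B = c - 1, so B is invertible exactly when c is not 1, and at c = 1 the vector (1, -1, 1) is in the nullspace of B.

```python
# Independence test for the v's = W*B in Worked Example 3.4 C (illustrative).
from sympy import Matrix, symbols

c = symbols('c')
B = Matrix([[1, 1, 0],
            [1, 2, 1],
            [0, 1, c]])

print("det B =", B.det())                       # c - 1: B is invertible exactly when c != 1

B1 = B.subs(c, 1)
x = Matrix([1, -1, 1])
print("c = 1:  B*x =", (B1 * x).T)              # the zero vector, so v1 - v2 + v3 = 0
print("c = 1:  nullspace of B:", [v.T for v in B1.nullspace()])
```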
Copter з. The Four Fundamental Sub^ 116 Problem Set 3. |incar dcpen<lence> boutiin«rin,kpe^ Questions l-Ю arc a V4 dcpcndcn(. Show that n. T о о V| = Solve C|»i (Recommended) Find the U| • I -1 0 0 1’2 = Ci = t»4 = + ri,.4 = Oor.Ax = ° 1116 ” s 8°in the columns of д largest possible number of independent vectors among 0 -I 0 1' 0 0 -1 V4 e o' I -1 0 o' 1 0 -1 «в » O' 0 1 -1 1 2 vt “ T i о 1 1 1 2 3 4 V8 » I,., f oor d = Oor / e 0(3 cases), the columns of I are dependent: 3 5 6 a b e [/«Ode. 0 0 /. If „ d / in Question 3 are nonzero, show that the only solution to Ux = 0 is x - 0. An upi*r triangular U with no diagorul zeros ha* independent columns. Decide the dependence or independence of (a) the vectors (1,3.2) and (2.1.3) and (3,2.1) (b) the vectors (1.-3,2) and (2,1,-3) and (-3,2,1). Choose three independent columns of U. Then make two other choices. Do the same for A. 7 8 If W|. to2, w3 are independent vectors, show that the differences Vi = w2 - w3 and vj = u>| - w3 and »з = Wj - wj are dependent. Find a combination of the v’s that gives zero Which matrix A in [ v\ v2 Vj ] = [ Wj w2 w3 ] A is singular? If W|. w2. w3 are independent sectors, show that the sums = w2 4- w3 and v2 = wi + wj and th = wj + Wj are independent. (Write Ci V| +C2V2 + C3V3 = 0 in terms of lhe w’s. Find and solve equations for the c's, to show they arc zero.)
3,4. Independence. Basis, .nd Dimension 117 9 Suppose Vi.Vj.Vj,t>4 ие vectors in R1 (a) These tour vectors arc dependent because (b) Thc two vectors V| and v2 will be dependent if (c) The vectors v, and (0,0.0) are dependent because _. 1 о Find two independent vectors on the plane t + 2y - 3a -1 = 0 in R* Then find three independent vectors. hy not four? This plane is the nullspace of what mains? Questions 11-14 are about the space ipannrd by a set of vectors. Take all linear com- binations of the vectors. 11 Describe thc subspace of R’ (is it a line or plane or R1?) spanned by (a) the two vectors (1,1,-1) and (-1,-1,1) (b) the three vectors (0.1,1) and (1,1,0) and (0,0,0) (c) all vectors in RJ with whole number components (d) all vectors with positive components 12 The vector b is in the subspace spanned by the columns of A when has a solution. Thc vector c is in the row space of A when_______has a solution. True or false: If the zero vector is in the row space, thc rows are dependent 13 Find the dimensions of these 4 spaces. Which twoofthe spaces are thc same? (a) col- umn space of 4, (b) column space of U. (c) row space of A. (d) row space of U: A = 1 1 0 1 3 1 3 1 -1 and U " 1 1 O' 0 2 1 0 0 0 14 v + w and v - w are combinations of v and w. Write v and w as combinations of v + w and v - w. Tbe two pairs of vectors the same space When are they a basis for the same space? Questions 15-25 are about the requirements for a basis. 15 If vt..... »„ are linearly independent the space they span has dimension These vectors are a for that space. If the vectors are the columns of an m by n matrix, then m is_____than n. If m = n. that matrix is___. 16 Find a basis for each of these subspaces of R4: (a) All vectors whose components are equal. (b) All vectors whose components add to zero. (c) All vectors that are perpendicular to (1.1,0,0) and (1,0,1,1). (d) The column space and the nullspace of / (4 by 4).
118 Chapter 3. The Four Fundamental Subspace 17 Find three different bases for the column space of U- [ J ? J ? J]. Then find two different bases for the row space of U. 18 Suppose vi.vj......vt are six vectors in R . (a) Those vectors (doXdo not X might not) span R . (b) Those vectors (areXare notXmight be) linearly independent. (c) Any four of those vectors (areXare notXmight be) a basis for R"*. 19 The columns of A are n vectors from Rm. If they are linearly independent, what is the rank of A? If they span Rm. what is the rank? If they are a basis for Rm, what then? Looking ahead The rank r counts the number of____________columns. 20 Find a basis for the plane x—2y+3a = 0 in RJ. Then find a basis for the intersection of that plane with the xy plane. Then find a basis for all vectors perpendicular to the plane 21 Suppose the columns ofa 5 by 5 matrix A are a basis for R&. (a) The equation Ax = 0 has only the solution x = 0 because ___, (b) If b is in R' then Ax b is solvable because the basis vectors _R5. Conclusion: A is invertible. Ils rank is 5. Its rows are also a basis for R&. 22 Suppose S is a 5-dimensional subspace of R" True or false (example if false): (a) Every basis for S can be extended to a basts for R® by adding one more vector. (b) Every basis for R6 can be reduced lo a basis for S by removing one vector. 23 U comes from A by subtracting row I from row 3: 1 3 2 0 1 I 3 2 A- and U I 1 3 2‘ 0 1 1 0 0 0 24 25 f^^f7nhe'WOtS’lnSP*C‘ F,nd bases for the two row spaces. Find bases or the two nullspaces. Which spaces stay fixed in elimination? True or false (give a good reason) (a) If the columns of . matnx are dependent, so are the rows £ T" T ” ‘ 2 » " ”> >»“' >™ red““ •»». basis? Suppose t»| Suppose t>t. a basis ?
3 4. Independence, Basis, and Dimension 119 26 For which numbers c and d do these matrices have rank 2? Questions 27-31 are about spaces where the “vectors" are matrices. 27 Find a basis (and the dimension) for each of these subspaces of 3 by 3 matrices: (a) All diagonal matrices. (b) All symmetric matrices (AT = A). (c) All skew-symmetric matrices (Лт ж -Д). 28 Construct six linearly independent 3 by 3 echelon matrices l/|,..., l/e. 29 Find a basis for the space of all 2 by 3 matrices whose columns add lo zero. Find a basis for the subspace whose rows also add to zero. 30 What subspace of 3 by 3 matrices is spanned (take all combinations) by (a) lhe invertible matrices? (b) the rank one matrices? (c) the identity matrix? 31 Find a basis for the space of 2 by 3 matrices whose nullspace contains (2,1,1). Questions 32-36 are about spaces where the “vectors" are functions. 32 (a) Find all functions that satisfy jJ = 0. (b) Choose a particular function that satisfies = 3. (c) Find all functions that satisfy jJ = 3. 33 The cosine space Fj contains all combinations y(x) “ A cos x+ В сое 2x+C cos 3x. Find a basis for the subspace with t/(0) 0. 34 Find a basis for the space of functions that satisfy (a) £-2v = ° <b) й-;=о- 35 36 37 Suppose j/i (x), Jf2(x), кз(х) are three different functions of x. The vector space they span could have dimension 1. 2. or 3. Give an example of щ, уз. уз to show each possibility. Find a basis for the space of polynomials p(x) of degree < 3. Find a basis for the subspace with p(l) = 0. Find a basis for the space S of vectors (a, b, c. d) with a + e + d - 0 and also for the space T with a + b = 0 and c = 2d. What is the dimension of the intersection S ПТ7
I '< •I < I Chapter 3. The Four Fundamental Subspilc 120 C4 . н chow that ^as no soIution whcn the 5 bv ч 38 X* »| « Л. = ” » <—* M b|» ™eu ’ ^../,/r2 = u(x)and then = ~v(x). 39 Find bases for all solutions to d y/dx Challenge Problems 40 41 42 Write the 3 by 3 identity matrix as a combination of the other five permu(alion matrices^ Then show that those five matrices are hneary independent. This U , basis for the subspace of 3 by 3 matrices with row and column sums all equal. Choose x = (x„z3.x3.z4) in R4. It has 24 rearrangements like {x2>xUXi^} and (хд.хз.хьx2). Those 24 vectors, including x itself, span a subspace S. Find specific vectors x so that the dimension of S is: (a) zero, (b) one, (c) three, (d) f0Ur Intersections and sums have dim(V) + dim(W) — dim(V П W) + dim(V 4 W) Start with a basis ui,.. ., ur for the intersection V О W. Extend with Vj...,, v, to a basis for V, and separately W|,..., to a basis for W. Prove that the u’s, v’s and w'i together arc independent: dimensions (r + e) + (r-f-t) » (r) + (r 4- a 4. j 43 Inside R”, suppose dimension (V) + dimension (W) > n. Show that some nonzero vector is in both V and W 44 Suppose A is 10 by 10 and A1 0 (zero matrix). So A multiplies each column of A to give the zero vector. Then the column space of A is contained in the If A has rank r, those subspaces have dimension r < 10 - r. So the rank is r < 5.
3.5. Dimensions of the Four Subspaces   121

3.5  Dimensions of the Four Subspaces

1  The column space C(A) and the row space C(AT) have dimension r (the rank of A).
2  The nullspace N(A) has dimension n - r. The left nullspace N(AT) has dimension m - r.
3  Elimination often changes C(A) and N(AT) (but their dimensions don't change).

The main theorem in this chapter connects rank and dimension. The rank of a matrix counts independent columns. The dimension of a subspace is the number of vectors in a basis. We can count pivots or basis vectors. The rank of A reveals the dimensions of all four fundamental subspaces.

Here are the subspaces, including the new one. Two subspaces come directly from A, and the other two come from AT.

Four Fundamental Subspaces                                Dimensions
1. The row space is C(AT), a subspace of Rn.              r
2. The column space is C(A), a subspace of Rm.            r
3. The nullspace is N(A), a subspace of Rn.               n - r
4. The left nullspace is N(AT), a subspace of Rm.         m - r

We know C(A) and N(A) pretty well. Now C(AT) and N(AT) come forward. The row space contains all combinations of the rows. This row space is the column space of AT.

For the left nullspace we solve ATy = 0; that system is n by m. In Example 2 this produces one of the great equations of applied mathematics, Kirchhoff's Current Law. The currents flow around a network, and they can't pile up at the nodes. The four subspaces come from nodes and edges and loops and trees. Those subspaces are connected in an absolutely beautiful way.

Part 1 of the Fundamental Theorem finds the dimensions of the four subspaces. One fact stands out: The row space and column space have the same dimension r. This number r is the rank of A (Chapter 1). The other important fact involves the two nullspaces: N(A) and N(AT) have dimensions n - r and m - r, to make up the full n and m.

Part 2 of the Fundamental Theorem will describe how the four subspaces fit together: Nullspace perpendicular to row space, and N(AT) perpendicular to C(A). That completes the "right way" to understand Ax = b. Stay with it - you are doing real mathematics.
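Those four dimension counts are easy to confirm on a computer. Here is a short Python sketch (NumPy's matrix_rank stands in for counting pivots), applied to the 3 by 5 echelon matrix that opens the example on the next page:

import numpy as np

A = np.array([[1, 3, 5, 0, 7],
              [0, 0, 0, 1, 2],
              [0, 0, 0, 0, 0]])
m, n = A.shape                        # m = 3, n = 5
r = np.linalg.matrix_rank(A)          # r = 2

print("dim row space      =", r)      # subspace of Rn
print("dim column space   =", r)      # subspace of Rm
print("dim nullspace      =", n - r)  # 3
print("dim left nullspace =", m - r)  # 1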
122   Chapter 3. The Four Fundamental Subspaces

The Four Subspaces for R0

Suppose A is reduced by elimination to its row echelon form R0. For that special form, the four subspaces are easy to identify. We will find a basis for each subspace and check its dimension. Then we look back at A and R0 to see which subspaces change and which stay the same. The main point will be that the four dimensions are the same for A and R0.

As a specific 3 by 5 example, look at the four subspaces for this echelon matrix R0:

        [ 1  3  5  0  7 ]     pivot rows 1 and 2
   R0 = [ 0  0  0  1  2 ]
        [ 0  0  0  0  0 ]     pivot columns 1 and 4

The rank of this matrix is r = 2 (two pivots). Take the four subspaces in order.

1. The row space of R0 has dimension 2, matching the rank.

Reason: The first two rows are a basis. The row space contains combinations of all three rows, but the third row (the zero row) adds nothing to the row space.

The pivot rows 1 and 2 are independent. That is obvious for this example, and it is always true. If we look only at the pivot columns, we see the r by r identity matrix. There is no way to combine its rows to give the zero row (except by the combination with all coefficients zero). So the r pivot rows (the nonzero rows of R0) are a basis for the row space.

   The dimension of the row space is the rank r. The nonzero rows of R0 form a basis.

2. The column space of R0 also has dimension r = 2.

Reason: The pivot columns 1 and 4 form a basis. They are independent because they contain the r by r identity matrix. No combination of those pivot columns can give the zero column (except the combination with all coefficients zero). And they also span the column space. Every other (free) column is a combination of the pivot columns. Actually the combinations we need are the three special solutions!

   Column 2 is 3 (column 1). The special solution is (-3, 1, 0, 0, 0).
   Column 3 is 5 (column 1). The special solution is (-5, 0, 1, 0, 0).
   Column 5 is 7 (column 1) + 2 (column 4). That solution is (-7, 0, 0, -2, 1).

The pivot columns are independent, and they span C(R0), so they are a basis for C(R0).

   The dimension of the column space is the rank r. The pivot columns form a basis.
3.5. Dimensions of the Four Subspaces   123

3. The nullspace of R0 has dimension n - r = 5 - 2. The 3 free variables x2, x3, x5 give 3 special solutions to R0 x = 0. Set each free variable to 1 and the others to 0:

        [ -3 ]        [ -5 ]        [ -7 ]
        [  1 ]        [  0 ]        [  0 ]      R0 x = 0 has the
   s2 = [  0 ]   s3 = [  1 ]   s5 = [  0 ]      complete solution
        [  0 ]        [  0 ]        [ -2 ]      x = x2 s2 + x3 s3 + x5 s5
        [  0 ]        [  0 ]        [  1 ]

The nullspace has dimension 3.

Reason: There is a special solution for each free variable. With n variables and r pivots, that leaves n - r free variables and special solutions. The special solutions are independent, because they contain the identity matrix in rows 2, 3, 5.

   The nullspace N(A) has dimension n - r. The special solutions form a basis.

4. The nullspace of R0T (left nullspace of R0) has dimension m - r = 3 - 2.

Reason: R0 has r independent rows and m - r zero rows. Then R0T has r independent columns and m - r zero columns. So y in the nullspace of R0T can have nonzeros only in its last m - r entries. The example has m - r = 1 zero column in R0T and 1 nonzero in y:

           [ 1  0  0 ]          [ 0 ]
           [ 3  0  0 ] [ y1 ]   [ 0 ]                         [ 0 ]
   R0T y = [ 5  0  0 ] [ y2 ] = [ 0 ]    is solved by   y =   [ 0 ]        (1)
           [ 0  1  0 ] [ y3 ]   [ 0 ]                         [ 1 ]
           [ 7  2  0 ]          [ 0 ]

Because of zero rows in R0 and zero columns in R0T, it is easy to see the dimension (and even a basis) for this fourth fundamental subspace:

   If R0 has m - r zero rows, its left nullspace has dimension m - r.

Why is this a "left nullspace"? Because we can transpose R0T y = 0 to yT R0 = 0T. Now yT is a row vector to the left of R0. This subspace came fourth, and some linear algebra books omit it, but that misses the beauty of the whole subject.

   In Rn the row space and nullspace have dimensions r and n - r (adding to n).
   In Rm the column space and left nullspace have dimensions r and m - r (total m).

We have a job still to do. The four subspace dimensions for A are the same as for R0. The job is to explain why. A is now any matrix that reduces to R0 = rref(A).

        [ 1  3  5  0  7 ]
   A =  [ 0  0  0  1  2 ]     This A reduces to R0.   Same row space as R0.
        [ 1  3  5  1  9 ]                             Different column space! But same dimension!
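Here is the same computation in Python with SymPy (exact arithmetic). nullspace() returns exactly the special solutions above, and the left nullspace is the nullspace of the transpose:

from sympy import Matrix

R0 = Matrix([[1, 3, 5, 0, 7],
             [0, 0, 0, 1, 2],
             [0, 0, 0, 0, 0]])

print(R0.rref()[1])            # pivot columns (0, 3): columns 1 and 4
for s in R0.nullspace():       # special solutions: a basis of N(R0)
    print(list(s))             # (-3,1,0,0,0), (-5,0,1,0,0), (-7,0,0,-2,1)
for y in R0.T.nullspace():     # basis of the left nullspace N(R0^T)
    print(list(y))             # (0, 0, 1)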
Figure 3.3: The dimensions of the Four Fundamental Subspaces (for R0 and for A).

The Four Subspaces for A

1  A has the same row space as R0 and R. Same dimension r and same basis.

Reason: Every row of A is a combination of the rows of R0. Also every row of R0 is a combination of the rows of A. Elimination changes rows, but not row spaces.

Since A has the same row space as R0, the first r rows of R0 are still a basis. Or we could choose r suitable rows of the original A. They might not always be the first r rows of A, because those could be dependent. The good r rows of A are the ones that end up as pivot rows in R0 and R.

2  The column space of A has dimension r. The column rank equals the row rank.

   The number of independent columns = the number of independent rows.

Wrong reason: "A and R0 have the same column space." This is false. The columns of R0 often end in zeros. The columns of A don't often end in zeros. Then C(A) is not C(R0).

Right reason: The same combinations of the columns are zero (or not) for A and R0. Say that another way: Ax = 0 exactly when R0 x = 0. The column spaces are different, but their dimensions are the same, equal to the rank r.

Conclusion   The r pivot columns of A are a basis for its column space C(A).
3.5. Dimensions of the Four Subspaces   125

3  A has the same nullspace as R0. Same dimension n - r and same basis.

Reason: The elimination steps don't change the solutions. The special solutions are a basis for this nullspace (as we always knew). There are n - r free variables, so the dimension of the nullspace is n - r. This is the Counting Theorem: r + (n - r) equals n.

   (dimension of column space) + (dimension of nullspace) = dimension of Rn.

4  The left nullspace of A (the nullspace of AT) has dimension m - r.

Reason: AT is just as good a matrix as A. When we know the dimensions for every A, we also know them for AT. Its column space was proved to have dimension r. Since AT is n by m, the "whole space" is now Rm. The counting rule for A was r + (n - r) = n. The counting rule for AT is r + (m - r) = m. We have all details of a big theorem:

   Fundamental Theorem of Linear Algebra, Part 1
   The column space and row space both have dimension r.
   The nullspaces have dimensions n - r and m - r.

By concentrating on spaces of vectors, not on individual numbers or vectors, we get these clean rules. You will soon take them for granted; eventually they begin to look obvious. But if you write down an 11 by 17 matrix with 187 nonzero entries, I don't think most people would see why these facts are true:

   Two key facts     dimension of C(A) = dimension of C(AT) = rank of A
                     dimension of C(A) + dimension of N(A) = 17.

Every vector Ax = b in the column space comes from exactly one x in the row space! (If we also have Ay = b then A(x - y) = b - b = 0. So x - y is in the nullspace as well as the row space, which forces x = y.) From its row space to its column space, A is like an r by r invertible matrix. It is the nullspaces that will force us to define a "pseudoinverse of A": page 133.

                 [ 1  2  3 ]
Example 1   A =  [ 2  4  6 ]    has m = 2 with n = 3. The rank is r = 1.

The row space is the line through (1, 2, 3). The nullspace is the plane x1 + 2x2 + 3x3 = 0. The line and plane dimensions still add to 1 + 2 = 3.

The column space and left nullspace are perpendicular lines in R2. Dimensions 1 + 1 = 2.

   Column space = line through [ 1 ]       Left nullspace = line through [  2 ]
                               [ 2 ]                                     [ -1 ]

Final point: The y's in the left nullspace combine the rows of A to give the zero row.
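A quick SymPy check of Example 1 (exact arithmetic). The rank is 1, the four dimensions are 1, 1, 2, 1, and the left nullspace is the line through (2, -1); SymPy returns the multiple (-2, 1) of that vector.

from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6]])
m, n = A.shape
r = A.rank()
print(r, n - r, m - r)     # 1 2 1
print(A.nullspace())       # two vectors spanning the plane x1 + 2x2 + 3x3 = 0
print(A.T.nullspace())     # one vector on the line through (2, -1)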
126   Chapter 3. The Four Fundamental Subspaces

Example 2   This matrix A comes from a graph with m = 5 edges and n = 4 nodes (Figure 3.4). A is its incidence matrix: every row has a -1 and a 1, because every edge has a start node and an end node. The equations in Ax = b are differences of the unknowns x1, x2, x3, x4 (one unknown for every node).

Differences Ax = b                   [ -1   1   0   0 ]
across edges 1, 2, 3, 4, 5      A =  [ -1   0   1   0 ]       m = 5 and n = 4
between nodes 1, 2, 3, 4             [  0  -1   1   0 ]
                                     [  0  -1   0   1 ]
                                     [  0   0  -1   1 ]

If you understand the four fundamental subspaces for this matrix (the column spaces and the nullspaces for A and AT) you have captured a central idea of linear algebra.

Figure 3.4: A "graph" with 5 edges and 4 nodes. A is its 5 by 4 incidence matrix.

The nullspace N(A)   To find the nullspace we set b = 0. Then the first equation says x1 = x2. The second equation says x1 = x3. Equation 4 is x2 = x4. All four unknowns x1, x2, x3, x4 have the same value c. The vectors x = (c, c, c, c) fill the nullspace of A.

That nullspace is a line in R4. The special solution x = (1, 1, 1, 1) is a basis for N(A): one vector in the basis, so the dimension of N(A) is 1. The rank of A must be 3, since n - r = 4 - 3 = 1. We now know the dimensions of all four subspaces.

The column space C(A)   The rank is r = 3, so there are 3 independent columns. The fast way is to see them directly in A. The systematic way is to reduce A to R0 = rref(A):

                    [ -1   1   0 ]                                  [ 1  0  0  -1 ]
Columns 1, 2, 3     [ -1   0   1 ]      reduced row echelon form    [ 0  1  0  -1 ]
of A                [  0  -1   1 ]      R0 = rref(A) =              [ 0  0  1  -1 ]
                    [  0  -1   0 ]                                  [ 0  0  0   0 ]
                    [  0   0  -1 ]                                  [ 0  0  0   0 ]

From R0 we see again that the first 3 columns are basic: columns 1, 2, 3. For a basis of the column space C(A) we must go back to columns 1, 2, 3 of A itself.
3.5. Dimensions of the Four Subspaces   127

The row space C(AT)   The dimension must again be r = 3. But the first 3 rows of A are not independent: row 3 = row 2 - row 1. So row 3 became zero in elimination, and row 3 was exchanged with row 4. The first three independent rows are rows 1, 2, 4. Those three rows are a basis (one possible basis) for the row space.

   Edges 1, 2, 3 form a loop in the graph: Dependent rows 1, 2, 3.
   Edges 1, 2, 4 form a tree. Trees have no loops! Independent rows 1, 2, 4.

The left nullspace N(AT)   Now we solve ATy = 0. Combinations of the rows give zero. We already noticed that row 3 = row 2 - row 1, so one solution is y = (1, -1, 1, 0, 0). I would say: That y comes from following the upper loop in the graph. Another y comes from going around the lower loop and it is y = (0, 0, -1, 1, -1): row 3 = row 4 - row 5. Those two y's are independent, they solve ATy = 0, and the dimension of N(AT) is m - r = 5 - 3 = 2. So we have a basis for the left nullspace.

You may ask how "loops" and "trees" got into this problem. That didn't have to happen. We could have used elimination to solve ATy = 0. The 4 by 5 matrix AT would have three pivot columns 1, 2, 4 and two free columns 3, 5. There are two special solutions and the nullspace of AT has dimension two: m - r = 5 - 3 = 2. But loops and trees identify dependent rows and independent rows in a beautiful way for every incidence matrix.

The equations Ax = b give "voltages" x1, x2, x3, x4 at the four nodes. The equations ATy = 0 give "currents" y1, y2, y3, y4, y5 on the five edges. These two equations are Kirchhoff's Voltage Law and Kirchhoff's Current Law. Those laws apply to an electrical network. But the ideas behind the words apply all over engineering and science and economics and business. Linear algebra connects the laws to the four subspaces.

Graphs are the most important model in discrete applied mathematics. You see graphs everywhere: roads, pipelines, blood flow, the brain, the Web, the economy of a country or the world. We can understand their matrices A and AT. Here is a summary.

The incidence matrix A comes from a connected graph with n nodes and m edges. The row space and column space have dimensions r = n - 1. The nullspaces of A and AT have dimensions 1 and m - n + 1:

N(A)    The constant vectors (c, c, ..., c) make up the nullspace of A: dim = 1.
C(AT)   The edges of any tree give r independent rows of A: r = n - 1.
C(A)    Voltage Law: The components of Ax add to zero around all loops: dim = n - 1.
N(AT)   Current Law: ATy = (flow in) - (flow out) = 0 is solved by loop currents. There are m - r = m - n + 1 independent small loops in the graph.

For every graph in a plane, linear algebra yields Euler's formula: Theorem 1 in topology!

   (nodes) - (edges) + (small loops) = (n) - (m) + (m - n + 1) = 1
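Here is a SymPy check of those statements, with the incidence matrix entered as it is read from Figure 3.4 (edges 1 to 2, 1 to 3, 2 to 3, 2 to 4, 3 to 4; each row has -1 at the start node and +1 at the end node):

from sympy import Matrix

A = Matrix([[-1,  1,  0,  0],
            [-1,  0,  1,  0],
            [ 0, -1,  1,  0],
            [ 0, -1,  0,  1],
            [ 0,  0, -1,  1]])

print(A.rank())                             # r = 3 = n - 1
print([list(v) for v in A.nullspace()])     # [(1,1,1,1)]: the constant vectors, dim 1
print(len(A.T.nullspace()))                 # 2 = m - n + 1 independent loop currents

loop_upper = Matrix([1, -1, 1, 0, 0])       # row 3 = row 2 - row 1 (edges 1, 2, 3)
loop_lower = Matrix([0, 0, -1, 1, -1])      # row 3 = row 4 - row 5 (edges 3, 4, 5)
print(list(A.T * loop_upper), list(A.T * loop_lower))   # both zero: left-nullspace vectors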
128   Chapter 3. The Four Fundamental Subspaces

Rank Two Matrices = Rank One plus Rank One

Rank one matrices have the form uvT. Here is a matrix A of rank r = 2. We can't see r immediately from A. So we reduce the matrix by row operations to R0. R0 has the same row space as A. Throw away its zero row to find R, also with the same row space.

                 [ 1   0   3 ]         [ 1  0  3 ]                                [ 1  0 ]
Rank two    A =  [ 1   1   7 ]    R0 = [ 0  1  4 ]    and   A = CR   with    C =  [ 1  1 ]   R = [ 1  0  3 ]      (3)
                 [ 4   2  20 ]         [ 0  0  0 ]                                [ 4  2 ]       [ 0  1  4 ]

Now look at columns. The pivot columns of R are clearly (1, 0) and (0, 1). Then the pivot columns of A are also in columns 1 and 2: u1 = (1, 1, 4) and u2 = (0, 1, 2). Notice that C has those same first two columns! That was guaranteed, since multiplying by two columns of the identity matrix (in R) won't change the pivot columns u1 and u2.

When you put in letters for the columns and rows, you see rank 2 = rank 1 + rank 1. The zero row of R0 contributes nothing:

Matrix A                         [ v1T ]
Rank two       A = CR = [ u1 u2 ][ v2T ]  =  u1 v1T + u2 v2T        Columns of C times rows of R.

Every rank r matrix is a sum of r rank one matrices.

• WORKED EXAMPLES •

3.5 A   Put four 1's into a 5 by 6 matrix of zeros, keeping the dimension of its row space as small as possible. Describe all the ways to make the dimension of its column space as small as possible. Then describe all the ways to make the dimension of its nullspace as small as possible. How to make the sum of the dimensions of all four subspaces small?

Solution   The rank is 1 if the four 1's go into the same row, or into the same column. They can also go into two rows and two columns (so aii = aij = aji = ajj = 1). Since the column space and row space always have the same dimensions, this answers the first two questions: Dimension 1.

The nullspace has its smallest possible dimension 6 - 4 = 2 when the rank is r = 4. To achieve rank 4, the 1's must go into four different rows and four different columns.

You can't do anything about the sum r + (n - r) + r + (m - r) = n + m. It will be 6 + 5 = 11 dimensions, no matter how the 1's are placed. The sum is 11 even if there aren't any 1's.

If all the other entries of A are 2's instead of 0's, how do these answers change?
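Going back to equation (3), here is a short NumPy check that A = CR and that A really is the sum of two rank one matrices: columns of C times rows of R.

import numpy as np

A = np.array([[1, 0,  3],
              [1, 1,  7],
              [4, 2, 20]])
C = A[:, :2]                    # pivot columns u1 = (1,1,4), u2 = (0,1,2)
R = np.array([[1, 0, 3],
              [0, 1, 4]])       # the nonzero rows v1^T, v2^T of R0

print(np.array_equal(A, C @ R))                                              # True
print(np.array_equal(A, np.outer(C[:, 0], R[0]) + np.outer(C[:, 1], R[1])))  # True: u1 v1^T + u2 v2^T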
3.5. Dimensions of the Four Subspaces   129

3.5 B   All rows of AB are combinations of the rows of B. So the row space of AB is contained in (possibly equal to) the row space of B. Rank(AB) ≤ rank(B).

All columns of AB are combinations of the columns of A. So the column space of AB is contained in (possibly equal to) the column space of A. Rank(AB) ≤ rank(A).

If we multiply A by an invertible matrix B, the rank will not change. The rank can't drop, because when we multiply by the inverse matrix the rank can't jump back up.

Appendix 1 collects the key facts about the ranks of matrices.

Problem Set 3.5

1  (a) If a 7 by 9 matrix has rank 5, what are the dimensions of the four subspaces? What is the sum of all four dimensions?
   (b) If a 3 by 4 matrix has rank 3, what are its column space and left nullspace?

2  Find bases and dimensions for the four subspaces associated with A and B:

3  Find a basis for each of the four subspaces associated with A:

        [ 0  1  2  3  4 ]   [ 1  0  0 ] [ 0  1  2  3  4 ]
   A =  [ 0  1  2  4  6 ] = [ 1  1  0 ] [ 0  0  0  1  2 ]
        [ 0  0  0  1  2 ]   [ 0  1  1 ] [ 0  0  0  0  0 ]

4  Construct a matrix with the required property or explain why this is impossible:
   (a) Column space contains (1, 1, 0) and (0, 0, 1), row space contains (1, 2) and (2, 5).
   (b) Column space has basis (1, 1, 3), nullspace has basis (3, 1, 1).
   (c) Dimension of nullspace = 1 + dimension of left nullspace.
   (d) Nullspace contains (1, 3), column space contains (3, 1).
   (e) Row space = column space, nullspace ≠ left nullspace.

5  If V is the subspace spanned by (1, 1, 1) and (2, 1, 0), find a matrix A that has V as its row space. Find a matrix B that has V as its nullspace. Multiply AB.

6  Without using elimination, find dimensions and bases for the four subspaces for

        [ 0  3  3  3 ]               [ 1 ]
   A =  [ 0  0  0  0 ]    and    B = [ 4 ]
        [ 0  1  0  1 ]               [ 5 ]

7  Suppose the 3 by 3 matrix A is invertible. Write down bases for the four subspaces for A, and also for the 3 by 6 matrix B = [ A  A ]. (The basis for Z is empty.)
130 8 9 10 11 12 13 14 18 18 17 18 м iimensrnmof .he four subspaces Й»ЛД and C, if / is lhe 3 by 3 identity matnx and и ик J > and С = [О]. onv for these matrices of different sizes? Which subspaces are the same for rncse r ГЛ [A .41 (a) M| and [*] 0» [л] «“» [Л A]’ prove that all three of those matrices have the same wik г. If the entries of a 3 by 3 matnx are chosen randomly between 0 and I. what are the том likely dimensions of the four subspaces" What if the random matnx is 3 by 5? (Important) A is an m by n matnx of rank r. Suppose there are right sides b for which Az = b has no solution (a) What are all inequalities « or <) that must be true between m, n, and r? (b) How do you know that ATy = 0 has solutions other than у » 0? Construct a matrix with (1.0,1) and (1,2,0) as a basis for its row space and its column space. Why can’t this be a basis for the row space and nullspace ? True or false (with a reason or a counterexample): (a) If m « n then the row space of A equals the column space. (b) The matrices A and - A share the same four subspaces. (c) If A and В share the same four subspaces then A is a multiple of B. Without computing A. find bases for its four fundamental subspaces: 1 0 0 6 1 0 9 8 1 A- 1 2 3 41 0 12 3 0 0 I 2 J If you exchange the first two rows of A. which of the four subspaces stay the same? If» = (1.2.3,4) is in the left nullspace of A write down a vector in the left nullspace of the new matrix after the row exchange. Explain why v — (1,0, -1) cannot be a row of A and also in the nullspace. Describe the four subspaces of RJ associated with 0 1 о Л= о о 1 and J ooo ^COmpkled (5 ones and 4 nerther side passed up a winning move ? ’1 0 0 o‘ 1 i zcros in A) so that rank (A) = 2 but 1 1 0
3 5 Dimensions of the Four Subspaces 131 19 (Left nullspace) Add the extra column b and reduce A to echelon form: 12 3 b| 4 5 6 bj 7 8 9 bj 20 1 2 3 bi t>2 - 4b, 0 0 0 A combination of the rows of A has produced the zero row. What combination is it? (Look at Ьз - 2bi + bi on the nght side.) Which vecton are in the nullspace of Ar and which vectors are in the nullspace of A? (a) Check that the solutions to Ax = 0 are perpendicular lo the rows of A: 21 22 23 1 2 1 0 1 0 0] [4 2 0 0 0 13= ERo = CR 3 4 11 [O 0 0 0 (b) How many independent solutions to ATy = O’ Why does yT = row 3 of £'1 ? Suppose A is the sum of two matrices of rank one: A = uvT + w:T. (a) Which vectors span the column space of A ? (b) Which vectors span lhe row space of A? (c) The rank is less than 2 if_or if_____. (d) Compute A and its rank if u = z = (1,0,0) and v = w = (0,0,1). Construct A = ut»T + wxT whose column space has basis (1,2,4),(2,2,1) and whose row space has basis (1,0), (1,1). Write A as (3 by 2) times (2 by 2). Without multiplying matrices, find bases for the row and column spaces of A: 24 How do you know from these shapes that A cannot be invertible? (Important) ATy = d is solvable when d is in which of the four subspaces? The solution у is unique when the _______contains only the zero vector. True or false (with a reason or a counterexample): A and AT have the same number of pivots. A and AT have the same left nullspace. If the row space equals the column space then AT = A. (b) (c) (d) If AT = -A then the row space of A equals the column space.
I 132 26 27 Chapter 3. The Four Fundamental SubSpact И а Ь c ж to» a / 0.1»* Cl”°“ d " I “ a ] h“ ra"k R,d to of to . by 8 tocketo«d man. В and to toy. C. 1? it It '1 0 1 0 10 10 1 10 10 10 0 10 10 1 and 10 10 10 b 4 к b n P P P p p four zero rows P P P P p b q k b n I Ц В = 0 1 0 C = P P 0 1 '4 n P P n P P The numbers r.n.UM « 11,1 differenr Find baSC* r0W Space a"d left nullspace of В and C. Challenge problem: Find a basts for the nullspace of C. Challenge Problems 2B 29 If A uvT is a 2 by 2 matrix of rank 1, redraw Figure 3.5 to show clearly the Four Fundamental Subspaces. If В produces those same four subspaces, what is the exact relation of В to Al M is the space of 3 by 3 matrices. Multiply every matrix X in M by A 1 -1 0 0 1 -1 -1 0 1 . Notice: A O' 0 0 J 1 1 1 (a) Which matrices X lead to AX « zero matrix? (b) Which matrices have tbe form AX for some matrix XI 30 (a) finds the "nullspace" of that operation AX and (b) finds the "column space”. What are the dimensions of those two subspaces of M 7 Why do they add to 9 ? Suppose the rn by n matrices A and В have the same four subspaces. If they are both in row reduced echelon form, prove that F must equal G: ll Л = B = и n
3.5. Dimensions of the Four Subspaces   133

Every Matrix A has a Pseudoinverse A+

If the columns of A are independent, then A+ = (ATA)^-1 AT is a left-inverse: A+A = I.
If the rows of A are independent, then A+ = AT(AAT)^-1 is a right-inverse: AA+ = I.
This page allows dependent columns and dependent rows, and creates the pseudoinverse A+.

Here is the key idea for A+b. Split b into p and e. The part p in the column space equals Ax+ for one vector x+ in the row space (see page 125). The part e is in the nullspace of AT (the fourth subspace). Then the best possible inverse A+ has A+p = x+ and A+e = 0 on the two parts. By linearity A+b = A+(p + e) = x+.

In short, A takes its row space to its column space. A+ inverts that invertible part.

Figure 3.5: Vectors p = Ax+ in the column space of A go back to x+ in the row space.

Notice!   Suppose you start from a vector b that is not in the column space. Then A+b = x+ is in the row space and AA+b = Ax+ = p is in the column space. So AA+ ≠ I. But p is as close to b as possible. Actually p is the "projection" of b onto the column space, the subject of Chapter 4. Symmetrically, A+Ax is the projection of x onto the row space.

Examples   If A = CR = (m x r)(r x n)   then   A+ = R+C+.
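A small NumPy sketch of that splitting, using numpy.linalg.pinv for A+. The 2 by 2 rank one matrix here is my own example, not one from the text:

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])              # rank 1: dependent rows and dependent columns
b = np.array([3.0, 1.0])

A_plus = np.linalg.pinv(A)
x_plus = A_plus @ b                     # x+ lies in the row space of A
p = A @ x_plus                          # the part of b in the column space
e = b - p                               # the part of b in N(A^T)

print(x_plus, p, e)                     # (0.2, 0.4)   (1, 2)   (2, -1)
print(A.T @ e)                          # zero: e is orthogonal to the column space
print(np.allclose(A @ A_plus @ A, A))   # A A+ A = A, even though A A+ is not I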
4   Orthogonality

4.1  Orthogonality of the Four Subspaces
4.2  Projections onto Subspaces
4.3  Least Squares Approximations
4.4  Orthogonal Matrices and Gram-Schmidt

Two vectors are orthogonal when their dot product is zero: v · w = vTw = 0. This chapter moves to orthogonal subspaces and orthogonal bases and orthogonal matrices. The vectors in two subspaces, and the vectors in a basis, and the column vectors in Q: all pairs will be orthogonal. Think of a² + b² = c² for a right triangle with sides v and w.

   Orthogonal vectors      vTw = 0      ||v||² + ||w||² = ||v + w||²

The right side is (v + w)T(v + w). This equals vTv + wTw when vTw = wTv = 0.

Subspaces entered Chapter 3 to throw light on Ax = b. Right away we needed the column space and the nullspace. Then the light turned onto AT, uncovering two more subspaces. Those four fundamental subspaces reveal what a matrix really does.

A matrix multiplies a vector: A times x. At the first level this is only numbers. At the second level Ax is a combination of column vectors. The third level shows subspaces. But I don't think you have seen the whole picture until you study Figure 4.2. Those fundamental subspaces are orthogonal:

   The nullspace N(A) contains all vectors orthogonal to the row space C(AT).
   The nullspace N(AT) contains all vectors orthogonal to the column space C(A).
   Ax = 0 makes x orthogonal to each row. ATy = 0 makes y orthogonal to each column.

A key idea in this chapter is projection. If b is outside the column space of A, find the closest point p that is inside. The line from b to p shows the error e. That line is perpendicular to the column space. The least squares equation ATAx = ATb produces the best possible solution x when Ax = b is unsolvable. That best x makes ||Ax - b||² = ||e||² as small as possible.

Everything becomes easy when ATA = I. Then A has orthonormal columns, and we call it Q. Orthogonality of the columns won't happen by accident, but we can make it happen: Gram-Schmidt orthogonalizes the columns to end with q1, ..., qn. Then QTQ = I, and Q connects to A by A = QR. That matrix Q is a special favorite for computations. The whole of Section 7.4 will highlight Q in many ways. A = QR is better than A = LU, and this chapter shows why.

134
4.1. Orthogonality of the Four Subspaces   135

4.1  Orthogonality of the Four Subspaces

1  Orthogonal vectors have vTw = 0. Then ||v||² + ||w||² = ||v + w||².
2  Subspaces V and W are orthogonal when vTw = 0 for every v in V and every w in W.
3  The row space of A is orthogonal to the nullspace. The column space is orthogonal to N(AT).
4  The dimensions add to r + (n - r) = n and r + (m - r) = m: Orthogonal complements.
5  If n vectors in Rn are independent, they span Rn. If n vectors span Rn, they are independent.

Chapter 1 connected dot products vTw to the angle between v and w. For 90° angles we have vTw = 0. This chapter moves up from orthogonal vectors v and w to orthogonal subspaces V and W. The subspaces fit together to show the hidden reality of A times x. The 90° angles between subspaces are new, and we can say now what those right angles mean.

The row space is perpendicular to the nullspace. Every row of A is perpendicular to every solution of Ax = 0. That gives the 90° angle on the left side of the figure. This perpendicularity of subspaces is Part 2 of the Fundamental Theorem of Linear Algebra.

The column space is perpendicular to the nullspace of AT. When we want to solve Ax = b and can't do it, this nullspace of AT contains the error e = b - Ax in the "least-squares" solution x. A key application of linear algebra.

DEFINITION   Two subspaces V and W of a vector space are orthogonal if every vector v in V is perpendicular to every vector w in W:

   Orthogonal subspaces      vTw = 0   for all v in V and all w in W.

Example 1   Two walls of a room look perpendicular but those two subspaces are not orthogonal! The meeting line is in both V and W, and this line is not perpendicular to itself. Two planes in R3 (dimensions 2 and 2) cannot be orthogonal subspaces.

Example 2   The floor and a vertical line do give orthogonal subspaces. They have dimensions 2 + 1 = 3. Perpendicular lines through 0 also give orthogonal subspaces.

When a vector is in two orthogonal subspaces, it must be zero. It is perpendicular to itself. It is v and it is w, so vTv = 0. This has to be the zero vector.

The crucial examples for linear algebra come from the four fundamental subspaces. Zero is the only point where the nullspace meets the row space. More than that, the nullspace and row space of A meet at 90°. This key fact comes directly from Ax = 0:

Every vector x in the nullspace is perpendicular to every row of A, because Ax = 0. The nullspace N(A) and the row space C(AT) are orthogonal subspaces of Rn.
136 orthogonal plane V and line W F.₽. 4.1: Onbop^^ »i«V«»bk •»- <“» V + di"' W > To « . »и»**™«»•«* * * °' m"l“pl“’ *= 01 <— (row 1) • X is «го : (I) Oj 1— (row m) • x is zero The fin» equation says that row I u perpendicular to x. The last equation says that row m is perpendicular to x. Every row has a zero dot product with x. Then x is also perpendicular to every combination of the rows. Tbe whole row space C(AT) is orthogonal to N( A). Here is a second proof of that orthogonality for readers who like matrix shorthand. The vectors in the row space of A art combinations A ' у of the rows. Nullspace orthogonal to row space xT(ATy) (Ax)Ty = O ' v 0. (2) We like the first proof. You can see those rows of A multiplying x to produce zeros in equa- tion (I). The second proof shows why A and AT are both in the Fundamental Theorem. Рал I of the Fundamental Theorem gave the dimensions of the four subspaces. The row and column spaces have the same dimension r (they are drawn the same size). The two nullspaces have the remaining dimensions n - r and m - r. Now we know that the row space and nulhpace are orthogonal subspaces inside R”. Example 3 The rows of A are perpendicular to x = (1,1, -1) in the nullspace: gives the dot products 1 + 2 3 1 ° 5+2-7=0 Now turn to the column space of A and the nullspace of AT: the other pair. Every vector у in the nullspace of 4T >< T~ TbeUfinulUpace^ anJJ^^^ V0 column of A. ---------are orthogonal in Rrn. 2е 7"’“ °*41»»к - «' ‘ л »«« column space of A Q.E.D.
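A quick Python check of that orthogonality. The 2 by 3 matrix below matches the dot products displayed in Example 3 (rows (1,2,3) and (5,2,7) with x = (1,1,-1) in the nullspace), as far as those numbers can be read; any matrix and any vector in its nullspace behave the same way.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [5.0, 2.0, 7.0]])
x = np.array([1.0, 1.0, -1.0])

print(A @ x)                              # zero vector: x is in N(A)
for i, row in enumerate(A, start=1):
    print("row", i, "dot x =", row @ x)   # each dot product is 0
y = np.array([2.0, -3.0])                 # any combination of the rows
print((y @ A) @ x)                        # still 0: the whole row space is orthogonal to x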
4.1. Orthogonality of the Four Subspaces 137 For a visual proof, look at A'y = 0. Each column of A multiplies у to give 0: column of A multiplies у to give 0: lupuesymg Figure 4.2: Two pairs of orthogonal subspaces Tbe dimensions add to n and add to m. This is the Big Picture—two subspaces in R" and two subspaces in Rw. Orthogonal Complements Important The fundamental subspaces are more than just orthogonal (in pain). Their dimensions are also right. Two lines could be perpendicular in R3. but those lines could not be the row space and nullspace of a 3 by 3 matrix The lines have dimensions I and I, adding to 2. But the correct dimensions г and n - r must add to n = 3. Figure 4.1 showed two walls of a room. Dimensions 2 + 2 # 3 must fail. The fundamental subspaces of a 3 by 3 matrix have dimensions 2 and I. or 3 and 0. Those pairs of subspaces are not only orthogonal, they are orthogonal complements. DEFINITION The orthogonal complement Vх of a subspace V contains every vector that it perpendicular to V. The dimensions of V and V* add to (dimension of the whole space). By this definition, the nullspace is the orthogonal complement of the row space. Every x that is perpendicular to the rows satisfies Ax = 0. and lies in the nullspace. The reverse is also true. If vis orthogonal to the nullspace, it must be in the row space In the same way, the left nullspace and column space are orthogonal in Rm, and they are orthogonal complements. Their dimensions r and m - r add to the full dimension m.
138

Fundamental Theorem of Linear Algebra, Part 2
N(A) is the orthogonal complement of the row space C(AT)   (in Rn).
N(AT) is the orthogonal complement of the column space C(A)   (in Rm).

Part 1 gave the dimensions of the four subspaces. Part 2 gives their right angles. Every x in Rn splits into a row space part xr plus a nullspace part xn. Multiplying by A, the nullspace part gives Axn = 0 and the row space part gives Axr = Ax: everything lands in the column space. Every vector b in the column space comes from one and only one vector xr in the row space.

Proof: If Axr = Ax'r, the difference xr - x'r is in the nullspace. It is also in the row space, where xr and x'r came from. This difference must be the zero vector, because the nullspace and row space are perpendicular. Therefore xr = x'r.

Figure 4.3: This update of Figure 4.2 shows the true action of A on x = xr + xn. A times xr is in the column space. A times xn is the zero vector.

There is an r by r invertible matrix hiding inside A, if we throw away the two nullspaces. From row space to column space, A is invertible (page 127). The pseudoinverse A+ will invert that part of A (page 133).

Example   Every matrix of rank r has an r by r invertible submatrix B, found in its pivot rows and pivot columns. Here A has rank 2:

   A = [ 1  2  3  4  5 ]    contains the invertible submatrix   B = [ 1  3 ]    from pivot rows 1, 2
       [ 1  2  4  5  6 ]                                            [ 1  4 ]    and pivot columns 1, 3.

A will be diagonalized when we choose the right orthogonal bases for the row space and the column space. I hope you reach that amazing fact: the Singular Value Decomposition of A.

Let me repeat: The only vector in two orthogonal subspaces is the zero vector.
41. Orthogonality of the Four Subspaces 139 Combining Bases from Subspaces A basis contains linearly independent vectors that span the space Normally we have to check both properties of a basis When the count is right, one property implies the other: Every vector is a combination of the basis vectors in exactly one way. Any n independent vectors in Rn must span R". So they are a basis. Any n vectors that span R" must be independent. So they are a basis. Starting with thc correct number of vectors, one property of a basis produces the other. This is true in any vector space, but we care most about Rn. When the vectors go into the columns of an n by n square matrix A, here are the same two facts: If the n columns of A are independent, they span Rn So Ax = b is solvable If the n columns span Rn, they are independent So Ax = b has only one solution. If AB = I for square matrices, then BA = I. Uniqueness implies existence and existence implies uniqueness. Then A is invertible. If (here are no free variables, the solution X is unique There must be n pivot columns Then back substitution solves Ax = b (the solution exists) Starting in the opposite direcuon, suppose that Ax = b can be solved for every b (existence of solutions). Then elimination produced no zero rows. There are n pivots and no free variables The nullspace contains only x = 0 (uniqueness of solutions). With bases for the row space and the nullspace, we have r + (n - r) - n vectors. This is the right number. Those n vectors are independent.1 Therefore they span R”. Each x is the sum xr + zn of a row space vector zr and a nullspace vector zn. The splitting xr + z„ in Figure 4.3 shows the key point of orthogonal complements— the dimensions add to n and all vectors are fully accounted for. Example 5 For A = * g j split z = j into z, + z„ j + ^ _ j j. The vector (2,4) is in the row space. The orthogonal vector (2, -1) is in the nullspace. The next section will compute this splitting by a projection matrix P. Example 6 Suppose S is a six-dimensional subspace of nine-dimensional space R*. (a) What are the possible dimensions of subspaces orthogonal to S ? 0,1,2,3 (b) What are the possible dimensions of the orthogonal complement Sx of S ? 3 (c) What is the smallest possible size of a matrix A that has row space S ? в by 9 (d) What is the smallest possible size of a matrix В that has nullspace S1 ? в by 9 a If a combination of all n vectors gives zr+*n = 0. then *r = -x,. is in both subspaces So Xr = Xn = 0. All coefficients of the row space basis and of the nullspace basis must be zero. This proves independence of the n vectors together.
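The splitting in Example 5 can be computed with two one-dimensional projections (projection matrices arrive in Section 4.2). This sketch assumes A = [1 2; 3 6], a 2 by 2 matrix whose row space is the line through (1, 2) and whose nullspace is the line through (2, -1), with x = (4, 3):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 6.0]])
x = np.array([4.0, 3.0])
r = np.array([1.0, 2.0])        # direction of the row space
n = np.array([2.0, -1.0])       # direction of the nullspace (A @ n = 0)

x_r = (r @ x) / (r @ r) * r     # projection of x onto the row space: (2, 4)
x_n = (n @ x) / (n @ n) * n     # projection of x onto the nullspace: (2, -1)

print(x_r, x_n, x_r + x_n)      # the two parts add back to x = (4, 3)
print(A @ x_n, A @ x - A @ x_r) # both zero: only x_r reaches the column space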
Chap,Cf4^nahty 140 2 Problem Set 4.1 n u , ^unx of rank one. Copy fi₽« 42 and one vec>°r in each Construct any 2 by 3 matru oi orthogonal? subspace (and put two in the nullspace). a 1 for a 3 by 2 matnx of rank r = 2- Which subspace is Z .л К “ “y wb>’ “ “ impo“ible: 3 . [ 11 and 1-3 . nullspace contains [11 (a) Column space contains [JJ ,J’ Li J [11 and -3 nullspace contains [ 11 jjand^ sj.nuus,— LiJ (c) Ax = [i] has a solution and AT [g] » [g] (d) Every rot ia orthogonal to every column (A is not the zero matrix) (e) Columns add up to a column of zeros, rows add to a row of l’s. 4 If -LB = 0 then the columns of В are in the---- of A. The rows of A are in the of B. With AB - 0, why can’t A and В be 3 by 3 matrices of rank 2? (a) If Ax = b has a solution and ATp = 0. is (yTx = 0) or (yTb = 0)? (b) If ATy «(1,1,1) has a solubon and Ax = 0, then------. 6 This system of equations Ax = b has no solution (they lead to 0 = 1): x + 2y + 2z = 5 2z + 2y + 3z = 5 3x + 4y + 5z = 9 Find numbers уьуьуз to multiply the equations so they add to 0 = 1. You have found a vector у in which subspace? Its dot product у rb is 1, so no solution x. 7 Every system Ax = b with no solution is like the one in Problem 6. There are num- bers yi .........у», that multiply the m equations so they add up to 0 = 1. This is called Fredholm’s Alternative If b ts not in C(A), then part of b is in N(AT). Exactly one problem has a solution: Ax = b OR Ary = 0 with yTb = l, Multiply tbe equations xj - x2 = 1 and Xj — Z3 = 1 and Xi — X3 = 1 by numbers Vi-Уз.Уз chosen so that the equations add up to 0 = 1. 8 wrinr • v. "°* ^Zr B еЯиа1 t0 How do we know that this vector is in the column space’If A — 11 U _ in «. • « >p«x.uA-[11jandx = |i] what is xr?
4.1. Orthogonality of the Four Subspaces 141 9 Jh A Ax ofVanrt IR1 0 Reason: » the nullspace of AT and also in А ГГТ4»“> «--------------------0 and toA.ce «те ли pace аз Л. Thu key faa a repeated in lhe nett tertian. Ю Suppose A is a symmetric matrix (Лт = Д) (a) Why is its column space perpendicular to its nullspace? (b) If Ax — 0 and Az = 5z, which subspaces contain these “eigenvectors” x and z Symmetric matrices have perpendicular eigenvectors xTz = 0. 1 0 3 0 12 Find xr and xn and draw Figure 43 property if A = I1. X| and x = Questions 13-23 are about orthogonal subspaces. 13 Put bases for the subspaces V and W into the columns of matrices V and W. Explain why the test for orthogonal subspaces can be written VT1V = zero matrix. This matches WT w = 0 for orthogonal vectors. 14 The floor V and the wall W are not orthogonal subspaces, because they share a nonzero vector (along the line where they meet). No planes V and W in R3 can be orthogonal! Find a vector in the column spaces of both matnces. 1 2 1 3 1 2 5 4 6 3 5 1 This will be a vector Ax and also Bx. Think 3 by 4 with the matrix [ A В ]. 15 Extend Problem 14 to a p-dimensional subspace V and a g-dimensional subspace W of Rn. What inequality on p + q guarantees that V intersects W in a nonzero vector? These subspaces cannot be orthogonal. 16 Prove that every у in N(XT) is perpendicular to every Ax in the column space, using the matrix shorthand of equation (2). Start from ATy = 0. 17 If S is the subspace of R3 containing only the zero vector, what is Sx ? If S is spanned by (1,1,1), what is Sx ? If S is spanned by (1,1,1) and (1,1,-1). what is a basis for Sx ? 18 Suppose S only contains two vectors (1,5,1) and (2,2,2) (not a subspace). Then Sx is the nullspace of the matrix A =__________. Sx is a subspace even if S is not. 19 Suppose L is a one-dimensional subspace (a line) in R3. Its orthogonal complement Iх is the____________perpendicular to L. Then (Lx)x is a------ perpendicular to L . In fact (Lx)x is the same as--------
Chapter 4.0^,.^ 142 20 21 22 23 <nart.Rt Then Vх contains only the vector_ е11ПЛЛ, V в the whole K « ------Thea Suppose v (Vх Iх is the same as --• n (Vx)x is__•’°' ' t hv the vectors (1.2.2,3) and (1.3,3,2). Find two vector ,l S«P<K~S«^2^»ta«A* = »fa*h'ch'’’ “* span Sx. This в the same as soiling л !f P в the plane of vectors in R4 satisfying z, + *,+«,+ «4 =• 0. write a JL ^ cJXt a matrix that has P as its nuilspace. If, subspace S is contained to a subspace V. explain why Sx contains Vх. about perpendicular columns and rows. Suppose an n by n matrix is invertible: AA~l = I. Then the first column of л orthogonal to the space spanned by which rows of A? is Questions 24-23 are 24 25 Find ArA if the columns of A are umt vectors, perpendicular to each other. 25 Construct a 3 by 3 matnx A with no zero entries whose columns arc mutual] pendicular. Compute ArA. Why is it a diagonal matrix? У **r‘ 27 The lines 3r+ p = b| and 6r + 2y = bj are_____________They are the same line if In that case (bj, bj) и perpendicular to the vector_______The nullspace of the п^ы is the line За 4 у »______. One particular vector in that nullspace is 25 Why is each of these statements false? (a) (1.1,1) is perpendicular to (1,1, -2) so the planes z + у + z = 0 and z + y- 2a = 0 are orthogonal subspaces (b) The subspace spanned by (1,1,0,0,0) and (0,0,0,1,1) is the orthogonal com- plement of the subspace spanned by (1,-1,0,0,0) and (2, -2,3,4, -4). (c) Two subspaces that meet only in the zero vector are orthogonal. 29 Find a matnx A with v (1,2,3) in the row space and column space. Find В with v in the nullspace and column space. Which pairs of subspaces can't share v ? 30 Suppose A is 3 by 4 and R is 4 by 5 and AB = 0. So N(A) contains C(B). Prove from the dimensions of N(A) and C(B) that rank(A) + rank(B) < 4. 31 The command .V = ntal(A) will produce a basis for the nullspace of A. Then the command В = пи1(ЛГ) will produce a basis for the__ of A. 32 What are the conditions for nonzero vectors r, n, с, I in R2 to be bases for the four fundamental subspaces C(AT),N(A),C(A),N(AT) of a 2 by 2 matrix? 33 Whcncan the vectors ri.rj.m.ni,C|,C2,f i,l2 in R4 be bases for thc four funda- mental subspaces of a 4 by 4 matrix ? What is one possible A ?
4 2. Projections onto Subspaces 143 4.2 Projections onto Subspaces \The projection of a vector b onto the line through a it the closest point p = a(aTb/ aTa). 2 The error e = b - p is perpendicular to a: Right inangle bpe has ||p||2 + ||e||a = ||b||a. 3 The projection of b onto a subspace S is the closest vector p in S; b - p is orthogonal to S. 4 Then the projection of b onto the column space of A is the vector p = Л(ЛТЛ)“‘ Лтb. 5 The projection nutria onto С(Л) is P ~ А(ЛТЛ)-’AT.| Then p = Pb and P3 = P. This section of lhe book is about closest points We have a point b that is not in a subspace S (both are in rn dimensions). What point p in the subspace is closest lo b ? A picture of lhe problem suggests the key to tbe solution: The line from b to p is perpendicular to the subspace That line in Figure 4.5 shows us the error e = b - p. Our first examples are "projecting" b onto special subspaces like the xy plane. There is a projection matrix P that multiplies b and produces its projection p = Pb 1 What are the projections of b = (2,3,4) onto tbe z axis and onto lhe xу plane ? 2 What matrices P1 and Pi produce those projections Pb onto a line and a plane ? When b is projected onto a line, its projection p it the part of b along that line. If b is projected onto a plane, p is the part in that plane. The projection p it Pb. The projection onto the z axis we call pt. The second projection drops straight down to the xy plane. The picture in your mind should be Figure 4.4. Start with b a (2,3,4). The г-projection gives pt (0,0,4). The projection down gives p2 = (2,3,0). Those are the parts of b along the z axis and in the xy plane. The projection matrices P\ and Pi are 3 by 3. They multiply b with 3 components to produce p with 3 components. Projection onto a line comes from a rank one matrix. Projection onto a plane comes from a rank two matrix: „ , , Го о 01 Г1 о ol Projection matrix p 0 0 0 Onto the xy plane: 0 1 0 . Onto the z axis: о 0 1] [о 0 0 Pi picks out the г component of every vector. Pi picks out the x and у components. To find the projections pt and p7 of b, multiply b by Pi and Pi (small p for the vector, capital P for the matrix that multiplies b to produce p):
144 * ты rv pla* and lhe z axis are orthogonal spaces, like the noor 01 v. Projection ₽! Figure4.4: The projectionsp, » fib Pt = ** ont° Лс * “ls “d *V plane. More than just orthogonal, the line and plane are orthogonal complements Thejf dimensions add to 1 + 2 « 3. Every vector 6 in the whole space is the sum of fa pans in the two subspaces The projections p, and p, are exactly those two parts of b: The vectors give pj+ft-b The matrices give + P, . J. (J) This is perfect. Our goal is reached—for this example We have lhe same goal for Wy line and any plane and any n-dimensional subspace of R . The object is to find the pan p in each subspace, and also the projection matrix P that produces that part p = pfo Every subspace of R" has its own m by m projection matrix P The best description of a subspace is a basis. We put lhe basis vectors into the columns of A. Nov we art projecting onto the column space of Л! Certainly the z axis is the column space of lhe 3 by 1 matrix A।. The xy plane is the column space of Л2. That plane is also the column space of Л3 (a subspace has many bases). So p2 = p3 and = Pj. Дз has the same column space as A? Our problem is to project any b onto the column space of an m by n matrix A. Start with a line (dimension n = 1). The matnx A will have only one column. Call it a. Projection Onto a Line A line goes through the origin in lhe direction of a = (a.,. . am) Alone that line we want the point p closest to b = lb, h t tv 1, ' 1 8 ’ The line from b top is perpendicular to A^' рПуеС1,0П ,S orthog°"ali>y: f figure 4.5. We now compute p by algebra.
4 2. Projections onto Subspace* 145 The projection p win be some multiple of a. Call ц p = io = hat tunes a. C°?Pff £ ^i^mV p'^ P ТЪсП frora for ₽ *e read ° th find the "t* * J*** lhree S|CP' will lead co all projection matnces: ftnd x, then find the vector p = Ax. then find tbe matrix P The dotted line 6 - p i. the “error" e = b - ia. h b perpend.cular to а-this will determine x. Use the fact that b - xa is perpendicular to a when their dot product is zero: Projecting b onto a with error e = b - z a . Tl • a*b о о (2) fl.(b-za) = Q or a.b = 2a.a 9 a^a s 7^ The multiplication aTb is the same nab. Using the transpose is better, because it applies also to matnces. Our formula z ж oTb/aTa gives the projection p = xa. Figure 4.5: The projection p of b onto a line (left) and onto S = column space of A. The projection of b onto the line through a is the vector p = xa ж °т в. Special сне I: If b = a then i = I. The projection of a onto a is itself. Pa - a Special case 2: If b is perpendicular to a then aTb ж 0. The projection is p = 0. ’ 1 ' 2 2 Solution The number x is the ratio of aTb = 5 to aTa = 9. So the projection is p = Ja The error vector between b and p is e = b - p. Those vectors p and e will add to b. Example 1 Project b onto a = to find p = xa = | a in Figure 4 J 5 P=9 5 10 10 9’ 9 ’ 9 1 1 1 1 9 The error e should be perpendicular to a = (1,2,2) and it is: e a
146 . Of b P. and e. The vector b is split into two pan_ Look at the right tnang popendicular part is e. Those two sides p M - W **• T"f"''v*•"«pr»dut,' have length ||p|l — - «=£ws has length W = “ПЙ5 Hl = l|b|l (3) aTa ы ampler than gening involved with coefl and the length Of s. The dot product is а lot (|Ьц = yj 8® Of 6. The example has square * j$ sqUilre roots in the projection p=5o/!Jlne«wu ' ' Now comes the projection matrix In the formula for p what matrix is multiply, 6, You c^ 2Z JXer if the number x „ on the nght stde of a: Projection matrix P s Pb "hen the matrix is P * tsar ara aTa' Solution Multiply column a rimes row aT and divide by a1 a 1 9 P i, . column rimes a row! Tbe column is a. the row is a Tben dmde by the number aTo The projectton matnx P is m by m. but its rank is one We are projecting onto a one-dunensional subspace, the line through a. That hne ss the column space of P. Ex.mpte2 Fmd the projection matrix P » onto the line through a = [j], 9: 1 2 2 И 2 4 2 1 9 Projection matrix 2 4 This matrix projects any vector bonto a. Check p ” Pb for b = (1,1,1) in Example I: P-Pb-I 1 1 1 2 21 Г1 2 4 4 2 4 4 1 9 5 10 10 > • which is correct. If the vector a is doubled, the matrix P stays the same! It still projects onto the same line. If P is squared. P3 equals P. Projecting a second time doesn't change anything. so P2 - P. The diagonal entries of P add to | (1 + 4 +• 4) = 1 = dimension of line. P3 = = P when you cancel the number e*a aTe aTo The matrix I - P should be a projection too. It produces the other side e of the triangle— the perpendicular part of b. Note that (I - P)b equals b - p which is e in the left nullspace. When P projects onto one subspace. I — P projects onto the perpendicular subspace. Now we move beyond lines and planes in R3. Projecting onto an n-dimensional KxJfiuj-n p • Cff°rt T** cn,cial formulas will be collected in equations (5H6H7). Bastoally you need to remember thou-.b--------
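The three formulas for a line (the number aTb/aTa, the projection p, and the matrix P = aaT/aTa) are easy to try in Python. This sketch reproduces Examples 1 and 2 with a = (1, 2, 2) and b = (1, 1, 1):

import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([1.0, 1.0, 1.0])

x_hat = (a @ b) / (a @ a)          # 5/9
p = x_hat * a                      # (5/9, 10/9, 10/9)
e = b - p                          # error, perpendicular to a
P = np.outer(a, a) / (a @ a)       # rank one projection matrix

print(x_hat, p, e @ a)             # e . a = 0
print(P @ b)                       # the same projection p
print(np.allclose(P @ P, P))       # True: projecting twice changes nothing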
4.2. Projections onto Subspaces 147 Projection Onto a Subspace Start with n vectors <ц,..., a„ m R Assume that these a's are linearly independent. Problem: Find the combination p = z,a, + ... + * a glyfn „аог b We are projee mg eac m R onto the n-dimensional subspace spanned by the a's. With n = I (one vector a() this is projection onto a line. The line is lhe column space of A, which has just one column. In general the matnx A has n columns a,...a„. The combinations in R”1 are the vectors Az in the column space. We are looking for the particular combination p = Ax (the projection) that is closest lo b. The hat over z indicates the best choice z. to give the closest vector in the column space. That choice is £ = aTb/a a when n = 1. Forn > 1. the best z = (z,........£„) is to be found now. We compute projections onto n-dimensional subspaces in three steps as before Find the vector x. Find the projection p = Az. Find the projection matrix P. The key is in the geometry ' The dotted line in Figure 4.5 goes from b to the nearest point Ax in the subspace. This error vector b — Ax is perpendicular to lhe subspace. The error b - Ax makes a right angle with all the vectors al,...,an in the base. Those n right angles give the n equations for x: af (b - Az) = 0 or aj(b- Az) ж 0 .T (4) 0 The matrix with those rows af is AT. The n equations are exactly AT(b - Ax) = 0. Rewrite AT(b - Az) = 0 in its famous form ATAx = ATb. This is the equation for z, and the coefficient matrix is AT A. Now we can find z and p and P. in that order. The combination p = X|Oi+ ••• +znOn that is closest to b is p = Az: Findz(nxl) AT(b-Az)-0 or ATAz = ATb (5) This symmetric matrix ATA is n by n. It is invertible if the a's are independent. The solution is x = (ATA)~lATb.Theprvjecribnofbontotbesubspaceisp: Findp(mxl) p = Az = A(ATA)* ATb. The next formula picks out the projection matrix that is multiplying b in (6): Find P (m x m) (6) (7)
148 Compare with pr®Jft'uon For n = 1 и», ЛЬ»0** °°ta,,n “d ATA = »Ta <1 by ъ - -S* .»d Р=«Й •"« aTa a a a . n[ical with (5)and <« and nUmbcr ®Tq become. ,k Those formulas are Л divide by 1L When it is a matrix. we in ,he Л -.................................°- лТл » *5£ The linear ^dependence = Q used gcometry (e is orthogonal to Cach The key step was A ( _ .. g very quick and beautifu] * a). Unear algebra gives thu normal egaanon way, 1. Our subspace is the column space of Л 2. The error vector e = b - Лх is perpendicular to that column space. I N» « Ь i. *1* ..llx*» °f f! ТЬеь ЛТ(Ь - Л») « 0 and ЛТЛ£ « лть Th, left » -pod»* '** Г*0*"”1" J- Пи1!1Р“е °'»* «го, vector е = b - Лх. The vector b is split into the projection p and the error e = b ₽ Projection produces a right mangle with sides p. «. and b Example 3 If Л - [} ?] «d b =[?] find £ and p and P. Solution Compute the square matnx ЛТЛ and the vector ATb. Solve ЛТЛ2 » лт6: Equations ЛТЛ£ = ЛТЬ : The combination p = Ai is the projection of b onto the column space of A: Two checks on the calculation. First, the error e = (1, -2,1) is perpendicular to both columns (1.1.1) and (0.1,2) of Л. Second, the matrix P times b = (6.0,0) correctly gives p = (5.2, -1). That solves the problem for one particular b. The projection matrix is P = Л(ЛТЛ)-*ЛТ. The determinant of ATA is 15 - 9 = 6. Then multiply A times (ЛТЛ)~’ limes Лт to reach P: We must have Р» = p. because a second and P=1 6 5 2-1 2 2 2 -12 5 (10) projection doesn’t change the first projection!
4.2- Projections onto Subspaces 149 The ™. P . 4(4’4)-UT » Уоо (ЛТЛ)-. *» A ,-ujtw It '.' ** »>*«* ii imo P. you will H~l p = A A {A ) Л . Apparently everything cancels. This looks like P = I. the identity matrix. We want to say why this is wrong. fhe matrix A is rectangular. It has no (wn, We н, (лт A)-1 inlo A"1 times (A ) because there is no A -»in the first place In our expencnce, a problem that involves a rectangular matnx almost always leads to ArA- When A has independent columns. ATA is inveruble This fact is so crucial thal we state it clearly and give a proof. A A Is invertible if and only If A has linearly independent columns, j Proof ЛТА is a square matrix (n by n). For every matrix A, we will now show that A1 A has the same nullspace as A. When the columns of A art linearly independent, its nullspace contains only the zero vector. Then ATA. with this same nullspace, is invertible. Let A be any matrix. If x is in its nullspace, then Ax • 0. Multiplying by AT gives AT Ax = 0. So x is also in the nullspace of ATA. Now start with the nullspace of ATA. From ATAx « 0 we must prove Ax 0. We can't multiply by (AT)'1, which generally doesn't exist Just multiply by xT: (xT)ATAx = 0 or (Ax)T(Ax)-0 or |Ax|2 = 0. (II) We have shown: If A1 Ax “ 0 then Ax has length zero Therefore Ax = 0. Every vector x in one nullspace is in the other nullspace. If ATA has dependent columns, so has A. If ATA has independent columns, so has A. This is the good case: ATA is invertible. When A has independent columns, AT A is square and symmetric and invertible. AT A ATA дТ A ATA [110 I 2' [2 41 [1 1 01 1 2‘ [2 41 [2 2 0 1 2 0 0 4 *1 [2 2 1] 1 2 .° >. ’1 в Л dependent singular indep. invertible Very brief summary To find the projection p = Zia1 + • • • + zna„. solve ATAx = A1 b. This gives x. The projection of b is p = Ax and the error is e = b~p = b Ax. The projection matrix P = A(ATA)_,AT gives p = Pb This matrix satisfies P2 = P. The distance from b to the subspace C(A) is ||e|| = ||b — p|| (p = closest point). Example Suppose your pulse is measured at x = 70 beats per minute, then z = 80. then x = 120. Those three equations Ax = b in one unknown have AT = (1 1 1) and b = (70,80,120). Thenx = 90° is the average of 70,80,120 Use calculus or algebra: I. Minimize E = (x - 70)a + (x - 80)’ + (x - 120)1 by solving dE/dx = 6z - 540 = 0. , . «Tfc 70 + 80+120 2. Project b = (70,80.120) onto a = (1,1,1) to find x= =-----------=90.
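Here is a NumPy check of Example 3 and of the formula P = A(ATA)^-1 AT. Solving the normal equations directly is the textbook route; numpy.linalg.lstsq would do the same job more stably on larger problems.

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations: x_hat = (5, -3)
p = A @ x_hat                               # projection p = (5, 2, -1)
e = b - p                                   # error e = (1, -2, 1)
P = A @ np.linalg.inv(A.T @ A) @ A.T        # projection matrix onto C(A)

print(x_hat, p, e)
print(A.T @ e)                              # zero: e is perpendicular to both columns
print(np.allclose(P @ b, p), np.allclose(P @ P, P))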
150 Copter 4. Onho^^ Problem Set 4.2 Questions 1-9 ask for projections p onto lines. Abo errors e = b - p and niatricts p Project lhe vector b onto the line through a. Check that e и perpendicular to a; T (b) b= 3 1 and Dm the projection of b onto a and also compute it from p - xa: In Problem I. find the projection matrix P « aaT/ara onto the line through vector a. Verify in both cases that P3 = P. Multiply Pb in each rw l0 co Mc" the projection p Projection matrices onto lines have rank 1. PUIe 4 Construct the projection matrices Pi and Pi onto the lines through the ai in Prob- lem 2. Is it true that (ft + ft)’ - ft + ft? ‘П>« *<*« be true if P, ft - 0. 5 Compute the projection matnees aaT/a'!a onto the lines through a! - (-1,2,2) and oj = (2.2. -1) Multiply those projection matrices and explain why their prod- uct Pt Pj is what it is. 6 Project b a (1,0,0) onto lhe lines through Oi and a? in Problem 5 and also onto ej a (2.-1,2). Add up the three projections p, +p, +p3. 7 Continuing Problems 5-6. find the projection matrix P3 onto aj = (2, — 1,2). Verify that ft + ft + ft a /. This is because the basis a], aj. a3 is orthogonal I Questions 5-6-7: orthogonal Questions 8-9-10: not orthogonal
42 projections onto Subspace» 151 project thc vector b = (1,1) onU) Draw thc projection» p, and p, and add p because thc a’» are not orthogonal. through at . (1,0) and a2 = (1,2). + Pj. Thc projections do not add to b 9 10 In Problem 8, the projection of b p = A(ATA)-1 AT for A = (a( project ai = (1,0) ontoa2 = (1,2). Then these projections and multiply the projection onto the plane of a( and a2 will equal b. Find °2' = [ol] = invertible main*. project the result back onto a,. Draw matnces Pt Pj; Is this a projection? Questions 11-20 ask for projections, and projection matrices, onto »u tn paces. 11 Project b onto the column space of A by solving ATAi -ЛЧшАр-Аж: 1 1 2 1 1' ’4‘ (a) A = 0 1 and b a 3 (b) A = 1 1 and b “ 4 0 0 4 0 1 6 Find e = b - p. It should be perpendicular to the columns of A. 12 Compute the projection matrices P, and Pj onto the column spaces in Problem 11. Verify that P^b gives the first projection p,. Also verify P22 Pj. 13 (Quick and Recommended) Suppose A is the 4 by 4 identity main* with its last column removed. A is 4 by 3. Project b « (1,2,3,4) onto the column space of A. What shape is the projection matrix P and what is P? 14 Suppose b equals 2 times the first column of A. What is the projection of b onto the column space of A? Is P = I for sure in this case? Compute p and P when b = (0,2,4) and the columns of A are (0,1,2) and (1,2,0). 15 If A is doubled, then P = 2A(4ATA)* *2AT. This is the same as A(ATA)_,AT. The column space of 2A is the same as______. Is x the same for A and 2A? 16 What linear combination of (1,2, -1) and (1,0,1) is closest to b - (2,1,1)? 17 (Important) If P2 = P show that (/ - P)2 « I - P. When P projects onto thc column space of A, / — P projects onto the_________. 18 (a) If P is the 2 by 2 projection matnx onto the line through (1,1), then / - P is the projection matrix onto_____. (b) If P is the 3 by 3 projection matrix onto the line through (1.1,1). then I - P is the projection matrix onto______. 19 To find the projection matrix onto the plane x — у — 2z = 0. choose two vectors in that plane and make them the columns of A. The plane will be tbe column space of A! Then compute P — A(ArA)"*AT. 20 To find the projection matrix P onto the same plane X - у - 2z = 0, write down a vector e that is perpendicular to that plane. Compute tbe projection Q - ее / e e and then P = I — Q.
152 21 24 26 28 _ Л/АтА)~'Ат by iee^- Cancel ,0 Prove that pa n “in“lumn W««’ „T „ p™. ,ы p - “ »”“«by “mpu"”8 ' Remcmb''««u, Х“.^"“"«“!уп,лллс- .. ,JTJ , и л i, М|««е and mvcrt.ble. (he warning 'P'T‘J( n™ Wt " = /• When A В invertible, why is P - П Whath^ 1 Then AA *(Л ) л n nf Ar it to 'hc column space С(Л)- So if ATb - 0 ik The nullspace of A it —-— Check that P - л/ лт ’ ,'|e projection of b onto C(A) should be p = -• Check that _ A(A A)~>Лт gives this answer. The projection matrix P onto an n-dimensional subspace of R”' has rank r . n. Keaton: The projections Pb fill the subspace S. So S is the-of P. If an m by rn matrix has Л3 = Л and its rank is m, prove that A = I. The important fact that ends lhe section it this: If ATAx = 0 then Ax = <j New Proof. The vector Ax is in lhe nullspace of-. Ax is always in the column space of ___. To be in both of those perpendicular spaces, Ax must be zero. Use PT « P and P2 = P to prove that the length squared of column 2 always equals the diagonal entry Pjj. This number i* £ = j|j + jjj + ;& for ‘52-1 29 If I) has rank rn (full row rank, independent rows) show that В В1 is invertible. 30 (a) Find the 2 by 2 projection matrix Pc onto the column space of A (after looking closely at the matrix!) . _ [ 3 6 6 1 I 4 8 8 J 31 32 33 (b) Find the 3 by 3 projection matrix Pw onto the row space of A. Multiply В = Pc Al ft. Your answer В should be a little surprising—can you explain it? In R™. suppose I give you b and also a combination p of щ.an. How would you test to see if p is the projection of b onto the subspace spanned by the a’s? SnjWeynhkn.n.te.ve^^ „Г4..4,............ w When Z,,™ „rives, check “•al + (610U0 - IoM)/1(Kjo. That step updates fotd to xww. Suppose P, and P, are projection matrices (P'2 = pi = /rT) Provc (his fac,. A P3 is a projection matrix if and only if Pt p3 = p2pl.
4.3. Least Squares Approximations   153

4.3  Least Squares Approximations

It often happens that Ax = b has no solution. The usual reason is: too many equations. The matrix A has more rows than columns. There are more equations than unknowns (m is greater than n). The n columns span a small part of m-dimensional space. Unless all measurements are perfect, b is outside that column space of A. Elimination reaches an impossible equation and stops. But we can't stop just because measurements include noise!

To repeat: We cannot always get the error e = b − Ax down to zero. When e is zero, x is an exact solution to Ax = b. When the error e is as small as possible, x̂ is a least squares solution. The words "least squares" mean that ||b − A x̂||² is a minimum. Our goal in this section is to compute x̂ and use it. These are real problems that need answers.

Note   In statistics this problem is linear regression: x and b often become Y and X.

The previous section emphasized p (the projection). This section emphasizes x̂ (the least squares solution). They are connected by p = A x̂. The fundamental equation is still A^T A x̂ = A^T b. Here is a short unofficial way to reach this "normal equation":

When Ax = b has no solution, multiply by A^T and solve A^T A x̂ = A^T b.

Example 1   A crucial application of least squares is fitting a straight line to m points. Start with three points: Find the closest line to the points (0, 6), (1, 0), and (2, 0).

No straight line b = C + Dt goes through those three points. We are asking for two numbers C and D that satisfy three equations: n = 2 and m = 3 and m > n. Here are the three equations at t = 0, 1, 2 to match the given values b = 6, 0, 0:

t = 0   The first point is on the line b = C + Dt if C + D · 0 = 6
t = 1   The second point is on the line b = C + Dt if C + D · 1 = 0
t = 2   The third point is on the line b = C + Dt if C + D · 2 = 0

This 3 by 2 system has no solution: b = (6, 0, 0) is not a combination of the columns (1, 1, 1) and (0, 1, 2). Read off A and x and b from those equations:

A = [1 0; 1 1; 1 2]   x = [C; D]   b = [6; 0; 0]   Ax = b is not solvable: the system is overdetermined.
154   Chapter 4. Orthogonality

The same numbers were in Example 3 in the last section. We computed x̂ = (5, −3), so 5 − 3t will be the best line for the 3 points. Those numbers are the best C and D. We must explain why A^T A x̂ = A^T b produces them.

In practical problems, there could easily be m = 100 points instead of m = 3. They don't exactly match any straight line C + Dt. Our numbers 6, 0, 0 exaggerate the errors so you can see e1, e2, and e3 in Figure 4.6.

Minimizing the Error

How do we make the error e = b − Ax as small as possible? This is an important question with a beautiful answer. The best x (called x̂) can be found by geometry (the projection p), by algebra (A^T A x̂ = A^T b), or by calculus (set the derivatives of ||b − Ax||² to zero).

By geometry   Every Ax lies in the plane of the columns (1, 1, 1) and (0, 1, 2). In that plane, we look for the point closest to b. The nearest point is the projection p. The best choice for A x̂ is p. The smallest possible error is e = b − p, perpendicular to the columns. The three points at heights (p1, p2, p3) do lie on a line, because p is in the column space of A. The best line C + Dt comes from x̂ = (C, D).

By algebra   Every vector b splits into two parts. The part in the column space is p. The perpendicular part is e. There is an equation we cannot solve (Ax = b). There is an equation A x̂ = p we can and do solve (by removing e and solving A^T A x̂ = A^T b):

Ax = b = p + e is impossible.   A x̂ = p is solvable.   x̂ is (A^T A)^{-1} A^T b.   (1)

The solution to A x̂ = p leaves the least possible error (which is e):

Squared length for any x:   ||Ax − b||² = ||Ax − p||² + ||e||².   (2)

This is the law c² = a² + b² for a right triangle. The vector Ax − p in the column space is perpendicular to e in the left nullspace. We reduce Ax − p to zero by choosing x = x̂. That leaves the smallest possible error e = (e1, e2, e3), which we can't reduce.

Notice what "smallest" means. The squared length of Ax − b is minimized: The least squares solution x̂ makes E = ||Ax − b||² as small as possible.

Figure 4.6a shows the closest line. It misses by distances e1, e2, e3 = 1, −2, 1. Those are vertical distances. The least squares line minimizes E = e1² + e2² + e3².

Figure 4.6b shows the same problem in 3-dimensional space (b, p, e space). The vector b is not in the column space of A. That is why we could not solve Ax = b. No line goes through the three points. The smallest possible error is the perpendicular vector e. This is e = (1, −2, 1), the vector of errors in the three equations. Those are the distances from the best line. Behind both figures is the fundamental equation A^T A x̂ = A^T b.
4.3. Least Squares Approximations   155

Figure 4.6: Best line and projection: two pictures, same problem. (In the first picture the errors are the vertical distances to the line.) The line has heights p = (5, 2, −1) with errors e = (1, −2, 1). The equations A^T A x̂ = A^T b give x̂ = (5, −3). The best line is b = 5 − 3t and the closest point is p = 5 a1 − 3 a2. Same answer!

Notice that the errors 1, −2, 1 add to zero. Reason: The error e = (e1, e2, e3) is perpendicular to the first column (1, 1, 1) in A. The dot product gives e1 + e2 + e3 = 0.

By calculus   Most functions are minimized by calculus! The graph of E bottoms out and the derivative in every direction is zero. Here the error function E to be minimized is a sum of squares e1² + e2² + e3² (the square of the error in each equation):

E = ||Ax − b||² = (C + D · 0 − 6)² + (C + D · 1)² + (C + D · 2)².   (3)

The unknowns C and D tell us the closest line C + Dt. With two unknowns there are two derivatives, both zero at the minimum. They are "partial derivatives" because ∂E/∂C treats D as constant and ∂E/∂D treats C as constant:

∂E/∂C = 2(C + D · 0 − 6) + 2(C + D · 1) + 2(C + D · 2) = 0
∂E/∂D = 2(C + D · 0 − 6)(0) + 2(C + D · 1)(1) + 2(C + D · 2)(2) = 0.

∂E/∂D contains the extra factors 0, 1, 2 from the chain rule. (The last derivative from (C + 2D)² was 2 times C + 2D times that extra 2.) Those factors are 1, 1, 1 in ∂E/∂C. It is no accident that those factors 1, 1, 1 and 0, 1, 2 in the derivatives of ||Ax − b||² are the columns of A.

Now cancel 2 from every term and collect all C's and all D's:

The C derivative is zero:  3C + 3D = 6
The D derivative is zero:  3C + 5D = 0        This matrix [3 3; 3 5] is A^T A!   (4)
156   Chapter 4. Orthogonality

These equations from calculus are the same as the normal equations from linear algebra. These are the key equations of least squares: The partial derivatives of ||Ax − b||² are zero when A^T A x̂ = A^T b. The best C and D are the components of x̂; they solve the "normal equations" of linear regression.

The solution is C = 5 and D = −3. Therefore b = 5 − 3t is the best line: it comes closest to the three points. At t = 0, 1, 2 this line goes through p = 5, 2, −1. It could not go through b = 6, 0, 0. The errors are 1, −2, 1. This is the vector e!

The Big Picture for Least Squares

The key figure of this book shows the four subspaces and the true action of a matrix. The vector x on the left side went to b = Ax on the right side. In that picture x was split into its row space and nullspace parts, and there were many solutions to Ax = b.

In this section the situation is just the opposite. There are no solutions to Ax = b. Instead of splitting up x we are splitting up b. Figure 4.7 shows the big picture for least squares. Instead of Ax = b (not solvable, because b is not in the column space) we solve A x̂ = p (solvable, because p is in the column space). The error e = b − p is unavoidable.

Figure 4.7: The projection p = A x̂ is closest to b, so x̂ minimizes E = ||b − Ax||². In the figure, A has independent columns, so the nullspace N(A) is only the zero vector; b is outside the column space, and p = Pb is the nearest point inside it.

Notice how the nullspace N(A) is very small: just one point. With independent columns, the only solution of Ax = 0 is x = 0. Then A^T A is invertible, and A^T A x̂ = A^T b fully determines the best vector x̂. The error has A^T e = 0.
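A minimal MATLAB check of Example 1 (my sketch, not from the text). It reproduces C = 5, D = −3 and the heights and errors in Figure 4.6:

    t = [0; 1; 2];  b = [6; 0; 0];
    A = [ones(3,1) t];              % columns (1,1,1) and (0,1,2)
    xhat = (A'*A) \ (A'*b);         % C = 5, D = -3
    p = A*xhat;                     % heights on the best line: (5, 2, -1)
    e = b - p;                      % vertical errors (1, -2, 1); they add to zero
    xls = A \ b;                    % backslash gives the same least squares solution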
4.3. Least Squares Approximations   157

Fitting a Straight Line

Fitting a line is the clearest application of least squares. It starts with m points, hopefully near a straight line. At times t1, ..., tm those m points are at heights b1, ..., bm. The best line C + Dt misses the points by vertical errors e1, ..., em, and the least squares line minimizes E = e1² + ··· + em².

The first example in this section had three points in Figure 4.6. Now we allow m points (and m can be large). The two components of x̂ are still C and D.

A line goes through the m points when we exactly solve Ax = b. Generally we can't do it. Two unknowns C and D determine a line, so A has only n = 2 columns. To fit the m points, we are trying to solve m equations (and we only have two unknowns!):

C + Dt1 = b1
C + Dt2 = b2        with   A = [1 t1; 1 t2; ... ; 1 tm]   and   x = [C; D].   (5)
...
C + Dtm = bm

The column space of A is so thin that almost certainly the vector b is outside of it. When b happens to lie in the column space, the points happen to lie on a line. That case b = p is very unusual. Then Ax = b is solvable and e = (0, ..., 0).

The best line C + Dt has heights p1, ..., pm with errors e1, ..., em. Solve A^T A x̂ = A^T b for x̂ = (C, D). The errors are ei = bi − C − Dti.

Fitting points by a straight line is so important that we now find the two equations A^T A x̂ = A^T b, once and for all. The two columns of A are independent (unless all of the times ti are the same). So we turn to least squares and solve A^T A x̂ = A^T b.

Dot-product matrix   A^T A = [1 ··· 1; t1 ··· tm] [1 t1; ... ; 1 tm] = [m  Σ ti; Σ ti  Σ ti²].   (6)

On the right side of the normal equation is the 2 by 1 vector A^T b:

A^T b = [1 ··· 1; t1 ··· tm] [b1; ... ; bm] = [Σ bi; Σ ti bi].   (7)

In a specific problem, the t's and b's are given. The best x̂ = (C, D) is (A^T A)^{-1} A^T b. The line C + Dt minimizes e1² + ··· + em² = ||Ax − b||² when A^T A x̂ = A^T b:

A^T A x̂ = A^T b      [m  Σ ti; Σ ti  Σ ti²] [C; D] = [Σ bi; Σ ti bi].   (8)
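Equation (8) needs only four sums. A short MATLAB sketch (with made-up data, just to illustrate the formula) builds those sums and checks against the matrix form:

    t = [1; 2; 3; 4; 5];                    % example times (my own data)
    b = [2.1; 2.9; 4.2; 4.8; 6.1];          % heights roughly on a line
    m = length(t);
    N = [m sum(t); sum(t) sum(t.^2)];       % A'A from equation (6)
    r = [sum(b); sum(t.*b)];                % A'b from equation (7)
    CD = N \ r;                             % best C and D from equation (8)
    A = [ones(m,1) t];
    CD_check = (A'*A) \ (A'*b);             % same answer from the full matrix A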
158   Chapter 4. Orthogonality

The vertical errors at the m points on the line are the components of e = b − p. This error vector (the residual) b − A x̂ is perpendicular to the columns of A (geometry). The error is in the nullspace of A^T (linear algebra). The best x̂ = (C, D) minimizes the total error E, the sum of squares (calculus):

E(x) = ||Ax − b||² = (C + Dt1 − b1)² + ··· + (C + Dtm − bm)².

Calculus sets the derivatives ∂E/∂C and ∂E/∂D to zero, and produces A^T A x̂ = A^T b.

Other least squares problems have more than two unknowns. Fitting by the best parabola has n = 3 coefficients C, D, E (see below). In general we are fitting m data points by n parameters x1, ..., xn. The matrix A has n columns and n < m. The derivatives of ||Ax − b||² give the n equations A^T A x̂ = A^T b. The derivative of a square is linear. This is why the method of least squares is so popular.

Example 2   A has orthogonal columns when the measurement times ti add to zero. Suppose b = 1, 2, 4 at times t = −2, 0, 2. Those times add to zero. The columns of A have zero dot product: (1, 1, 1) is orthogonal to (−2, 0, 2):

C + D(−2) = 1
C + D(0) = 2        A = [1 −2; 1 0; 1 2]   b = [1; 2; 4].
C + D(2) = 4

When the columns of A are orthogonal, A^T A will be a diagonal matrix (this is good):

A^T A x̂ = A^T b   is   [3 0; 0 8] [C; D] = [7; 6].   (9)

Main point: Since A^T A is diagonal, we can solve separately to find C = 7/3 and D = 6/8. The zeros in A^T A are dot products of perpendicular columns in A. The diagonal matrix A^T A, with entries 3 and 8, is almost as simple as the identity matrix.

Orthogonal columns are worth the small effort of subtracting away the average time t̄ = (t1 + ··· + tm)/m. If the original times were 1, 3, 5 then their average is t̄ = 3. The shifted times T = t − t̄ add to zero!

T1 = 1 − 3 = −2,  T2 = 3 − 3 = 0,  T3 = 5 − 3 = 2   give   A = [1 −2; 1 0; 1 2]   and   A^T A = [3 0; 0 8].

Now C and D come from the easy equation (9). The best straight line uses C + DT, which is C + D(t − t̄). We even get a formula for C and D. That was a perfect example of the idea coming in the next section: Make the columns orthogonal in advance.
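A MATLAB sketch of Example 2 (assuming the same heights b = 1, 2, 4 are measured at the original times 1, 3, 5): shifting the times makes A^T A diagonal, so C and D separate.

    t = [1; 3; 5];  b = [1; 2; 4];
    T = t - mean(t);                 % shifted times -2, 0, 2 add to zero
    A = [ones(3,1) T];
    N = A'*A                         % diagonal matrix [3 0; 0 8]
    xhat = N \ (A'*b);               % C = 7/3 and D = 6/8 come out separately
    % the best line in the original variable is C + D*(t - mean(t))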
4.3. Least Squares Approximations   159

Dependent Columns in A: Which x̂ is best?

From the start, this chapter assumed independent columns in A. Then A^T A is invertible and A^T A x̂ = A^T b produces the only least squares solution to Ax = b. Which x̂ is best if A has dependent columns?

In the figure for this example, the measurements b1 = 3 and b2 = 1 are at the same time T, and all the dashed lines through the data have the same errors e = (1, −1). A straight line C + Dt cannot go through both points. I think we are right to project b = (3, 1) to p = (2, 2) in the column space of A. That changes the equation Ax = b to the equation A x̂ = p. An equation with no solution has become an equation with infinitely many solutions. The problem is that A has dependent columns and x̂1 + x̂2 = 2 has many solutions.

Which solution x̂ should we choose? All the dashed lines in the figure have the same two errors 1 and −1 at time T. Those errors (1, −1) = e = b − p are as small as possible. But this doesn't tell us which dashed line is best. My instinct is to go for the horizontal line at height 2.

The "pseudoinverse" of A will choose the shortest solution x⁺ = A⁺b to A x̂ = p. Here, that shortest solution will be x⁺ = (1, 1). This is the particular solution in the row space of A, and x⁺ has length √2. (Both solutions x̂ = (2, 0) and (0, 2) have length 2.) We are choosing the nullspace component of the solution x⁺ to be zero.

When A has independent columns, the nullspace only contains the zero vector and the pseudoinverse is our usual left inverse L = (A^T A)^{-1} A^T. When I write it that way, the pseudoinverse sounds like the best way to choose x̂. The shortest solution x⁺ is often called the minimum norm solution: its nullspace component is zero.

Comment   MATLAB experiments with singular matrices produced either Inf or NaN (Not a Number) or a huge number like 10^16 (a bad answer). There is a warning in every case! I believe that Inf and NaN and 10^16 come from the possibilities 0x = b and 0x = 0 and 10^{-16}x = 1. Those are three small examples of three big difficulties: singular with no solution, singular with many solutions, and very very close to singular. Try more experiments.
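A MATLAB sketch of this example (my choice of the common time, T = 1, so the two columns of A are equal): pinv picks the shortest solution.

    A = [1 1; 1 1];  b = [3; 1];         % two measurements at the same time T = 1
    xplus = pinv(A) * b;                 % shortest solution (1, 1), length sqrt(2)
    p = A * xplus;                       % projection of b onto C(A): p = (2, 2)
    e = b - p;                           % smallest possible error (1, -1)
    x1 = [2; 0];  x2 = [0; 2];           % other solutions of A*x = p, both of length 2
    norm(A*x1 - b) - norm(A*xplus - b)   % zero: every solution leaves the same error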
160   Chapter 4. Orthogonality

Fitting by a Parabola

If we throw a ball, it would be crazy to fit the path by a straight line. A parabola b = C + Dt + Et² allows the ball to go up and come down again (b is the height at time t). The actual path is not a perfect parabola, but the whole theory of projectiles starts there.

When Galileo dropped a stone from the Leaning Tower of Pisa, it accelerated. The distance fallen contains a quadratic term in t. (Galileo's point was that the stone's mass is not involved.) Without that t² term we could never send a satellite into its orbit. But even with a nonlinear function like t², the unknowns C, D, E still appear linearly! Fitting points by the best parabola is still a problem in linear algebra.

Problem   Fit heights b1, ..., bm at times t1, ..., tm by a parabola C + Dt + Et².

Solution   With m > 3 points, the m equations for an exact fit are generally unsolvable:

C + Dt1 + Et1² = b1
...                     is Ax = b with the m by 3 matrix   A = [1 t1 t1²; ... ; 1 tm tm²]   and   x = (C, D, E).   (10)
C + Dtm + Etm² = bm

Least squares   The closest parabola C + Dt + Et² chooses x̂ = (C, D, E) to solve the three normal equations A^T A x̂ = A^T b.

May I ask you to convert this to a problem of projection? The column space of A has dimension ______. The projection of b is p = A x̂, which combines the three columns using the coefficients C, D, E. The error at the first data point is e1 = b1 − C − Dt1 − Et1². The total squared error is e1² + ______. If you prefer to minimize by calculus, take the partial derivatives of E with respect to ______, ______, ______. These three derivatives will be zero when x̂ = (C, D, E) solves the 3 by 3 system of equations A^T A x̂ = A^T b.

Example 3   For a parabola b = C + Dt + Et² to go through the three heights b = 6, 0, 0 when t = 0, 1, 2, the equations for C, D, E are

C + D · 0 + E · 0² = 6
C + D · 1 + E · 1² = 0        (11)
C + D · 2 + E · 2² = 0.

Those three points give three equations and a square invertible matrix. The solution is x = (C, D, E) = (6, −9, 3), and the parabola b = 6 − 9t + 3t² goes through the three points exactly. The matrix has three columns, which span the whole space R³. The projection matrix is the identity. The projection of b is b. The error is zero. We didn't need A^T A x̂ = A^T b, we just solved Ax = b. If there are m = 4 data points (still with 3 unknowns), then we need A^T A and least squares.
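A small MATLAB sketch of Example 3 (a check, not from the text). The 3 by 3 system is solved exactly; adding a fourth point (made up here) forces least squares:

    t = [0; 1; 2];  b = [6; 0; 0];
    A = [ones(3,1) t t.^2];                  % square 3 by 3 matrix
    x = A \ b;                               % exact fit: C = 6, D = -9, E = 3
    t4 = [0; 1; 2; 3];  b4 = [6; 0; 0; 2];   % one extra (hypothetical) measurement
    A4 = [ones(4,1) t4 t4.^2];
    xhat = (A4'*A4) \ (A4'*b4);              % least squares parabola: m = 4 > n = 3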
43. Least Squares Approximations 161 Three Ways to Measure Error Start with nine measurements bi to b». all urn, at times t = The tenth measurement b10 = 40 is an outlier. Find the best horizontal line у = C to fit the ten points (1,0),(2,0),...,(9,0),(10,40) using three options for the error E: (1) Least squares Ei = ej + • • • + ejo (then the normal equation for C is linear) (2) Least maximum error Eoo = | emaj | (3) Least sum of errors E\ = |«il+ ” ' +le»o| Solution (1) The least squares fit to 0,0,...,0,40 by a horizontal line is C = 4: A = column of l's ЛтА = 10 ATb = sum of b, = 40. So 10C - 40. (2) The least maximum error requires C = 20. halfway between 0 and 40. (3) The least sum requires C = 0 (!!). The sum of errors 9|C| + |40 - C| would increase if C moves up from zero. The least sum comes from the median measurement (the median of 0.....0,40 is zero). Many statisticians feel that the least squares solution is too heavily influenced by outliers like bio ~ 40. and they prefer least sum. But the equations become nonlinear. Now find the least squares line C + Dt through those ten points (1,0) to (10,40): Those come from equation (8). Then ArAx = ATb gives C « -8 and D “ 24/11. Problem Set 43 Problems 1-11 use four data points b = (0.8,8,20) to bring out the key ideas.
162 a’»pt«4.otthogonah(y Wkh b = 0,8,8,20 at t = 0,1.3,4. set up and solve the normal eqUations 4t 4i = лтЬ- Rx the best straight line m Figure 4.8a. find its four heights P1 and*four errors e,. What is the minimum value E - e1 + e2 + e3 + e<? 1 2 3 4 5 6 7 8 9 10 11 (Line C + Dt does go through p’s) With b = 0,8,8,20 at times t = о 1 a write down the four equations Az = b (unsolvable). Change the measuremen p = 1,5,13,17 and find an exact solution to Az = p. nts 10 Check that e = b - P = (-1.3, -5,3) is perpendicular to both columns of the same matrix A. What is tbe shortest distance H from b to the column space of Л? (By calculus) Write down E = ||Az - b(|2 as a sum of four squares—the last is (C + 4D - 20)2. Find the derivative equations dE/dC = 0 and dE/dD ~° o' Divide by 2 to obtain tbe normal equations A1 Az = ATb. Find the height C of the best horizontal line to fit b = (0,8,8,20). An exact fit would solve the unsolvable equations C = 0, C = 8, C = 8, C = 20. Find the 4 by 1 matrix A in these equations and solve ATAz = ATb. Draw the horizontal line at height x = C and the four errors in e. Project b = (0,8,8,20) onto the line through a = (1,1,1,1). Find z = aTb/aTa and the projection p = xa. Check that e = b - p is perpendicular to a, and find the shortest distance |elf from b to the line through a. Find the closest line b = Dt, through the origin, to the same four points. An exact fit would solve D • 0 = 0, D • 1 = 8, D • 3 = 8, D • 4 = 20. Find the 4 by 1 matrix and solve ATAz = ATb. Redraw Figure 4.8a showing the best line b = Dt and the e’s. Project b = (0,8,8,20) onto the line through a = (0,1,3,4). Find X = D and p = xa. The best C in Problems 5-6 and the best D in Problems 7-8 do not agree with the best (C, D) in Problems 1-4. That is because (1,1,1,1) and (0,1,3,4) are ______perpendicular For the closest parabola b = C + Dt + Et2 to the same four points, write down the unsolvable equations Az = b in three unknowns z = (C, D, E). Set up the three normal equations AT Az = ATb (solution not required). In Figure 4.8a you arc now fitting a parabola to 4 points—what is happening in Figure 4.8b? For the closest cubic b = C + Dt + Et2 + Ft3 to the same four points, write down the four equations Az = b. Solve them by elimination. In Figure 4.8a this cubic now goes exactly through the points. What are p and e? The average of the four times is t = {(0 + 1 + 3 + 4) = 2. The average of the four b’s is b = |(0 + 8 + 8 + 20) = 9. (a) Verify that the best line goes through the center point (F, b) = (2,9). (b) Explain why C + Dt = b comes from the first equation in ATAx = A^b.
4.3. Least Squares Approximation* 163 Questions 12-16 introduce basic ideas of statistics—the foundation for least squares. 12 (Recommended) This problem projects b = (b,.......b_) onto the line through a = (1,... t !)• We solve m equations ax = b in one unknown x (by least squares). (a) Solve aTax = a Yb to show that x is the mean (the average) of the b’s. (b) Find e = b - ax and tbe variance ||e|2 and the standard deviation ||e||. (c) Thehorizontal line b = 3isclosest tob = (1,2,6). Check that p = (3.3,3) is perpendicular toeandfindthe3by3 projection matrix P. 13 First assumption behind least squares: Ax = b - (noise e with mean zero). Multiply the error vector e - b - Ax by (ATA)-IAT to get x - x on lhe right. The estimation errors x — x also average to zero. The estimate x is unbiased. 14 Second assumption behind least squares: The m errors e, are independent with vari- ance a2, so the average of (b - Ax)(b - Ax)T is a2!. Multiply on lhe left by (ATA)-1AT and on the right by A(ATA)-1 to show that tbe average matrix (x — x)(x — x)T is <72(ATA)~*. This is the covariance matrix in Section 8.4. 15 A doctor takes 4 readings of your heart rate. The best solution to x = b|,...,x = b« is the average x of bi,...,bt. The matrix A is a column of Га. Problem I4 gives the expected error (x - x)2 as o2(ATA)~l = . By averaging, the variance drops from a2 to a2 /4. 16 If you know the average xg of 9 numbers b|,..., 6». how can you quickly find the average хщ with one more number Ью? Tbe idea of recursive least squares is to avoid adding 10 numbers. What number multiplies x# in computing гщ ? *io = i^^io +-------i» = + ••• + 6ю) Questions 17-24 give more practice with x and p and e. 17 Write down three equations for the line b = C + Dt to go through b = 7 at t = — 1, b = 7 at t = 1. and b = 21 at t = 2. Find the least squares solution x = (C, D) and draw the closest line. 18 Find the projection p = Ax in Problem 17. This gives the three heights of the closest line. Show that the error vector is e = (2. -6,4). Why is Pe = 0? 19 Suppose the measurements at t = -1,1.2 are the errors 2.-6.4 in Problem 18. Compute x and the closest line to these new measurements. Explain the answer: b = (2, -6.4) is perpendicular to__________so the projection is p = 0. 20 Suppose the measurements at t = -1,1.2 are b = (5.13,17). Compute x and the closest line and e. The error is e = 0 because this b is-----. 21 Which of the four subspaces contains the error vector e? Which contains p? Which contains x? What is the nullspace of A?
164 СЬдр,сг4°пЬор)Па1йу 22 HndihetatltaeC + »iofiifc-4.2.-1.0.0mtiIne#r = -2,-|,0,i,2> 23 к the error vector e orthogonal to b or p or e or z? Show that ||e||2 equals which equals brb - pTb. This is the smallest total error E. 24 The partial derivatives of | Ax|2 with respect Ю Ц.z„ fill the vector 2AтЛа. The derivatives of 2bJ Ax fill the vector 2ATb So lhe derivatives of || Ax _ 6#a zero when_____. Challenge Problems 25 What condition on (f|,bi).(fj.^2).(f3.fo) Puls >hose three points onto a straight line’! A column space answer is: (bt.bj.bj) must be a combination of (1, 1, i) an<J (ft. h- <»)• Try to reach a specific equation connecting the f's and b’s. I should have thought of this question sooner! 26 Hnd the plane that gives the best fit lo the 4 values b - (0,1,3,4) at the corners (1.0) and (0,1) and (-1.0) and (0. -1) of a square. The equations C+Dx + Ey* b al those 4 points are Ax • b with 3 unknowns x — (C,D, E). What is A? At lhe center (0.0) of the square, show thal C + Dx + Ey = average of the b's. 27 (Distance between lines) The points P (x, x, x) and Q (y, 3y, -1) arc on two lines in space that don't meet. Choose x and у to minimize the squared distance ||P - QH1. The line connecting the closest P and Q is perpendicular to___________ 28 Suppose the columns of A arc not independent. How could you find a matrix В so thal P B(BrB)~'Br does give lhe projection onto the column space of A? (The usual formula will fail when ATA is not invertible.) 29 Usually there will be exactly one hyperplane in R” that contains the n given points x = O o..............*.-!• (Example for n = 3: There will be one plane containing 0, o,, aj unless------) What is the test to have exactly one plane in R"? Example 2 shifted the times t, to make them add to zero. We subtracted away the average time t « (t, +... + tm)/m to gc, T( _ t( _ f -p,^ Tj add (0 With the columns (I,., entries are m and 7? + ., 1) and (7j,...,Tm) now orthogonal, A1A is diagonal. Ils ' ” + T^. Show that the best C and D have direct formulas: Tfat-t Cg*l+' +*! W)d blr,q--.. + 6m7'BI is an example*^ t^GmmVhmidt «2- ,ha‘ diagOna' advance. This и in Section 4 4 process, orthogonalize the columns of A in
4.4. Orthogonal Matrices and Gram-Schmidt   165

4.4  Orthogonal Matrices and Gram-Schmidt

This section has two goals, why and how. The first is to see why orthogonality is good. If Q has orthonormal columns, then Q^T Q = I. Least squares becomes easy. The second goal is to convert independent vectors in A to orthonormal vectors in Q. You will see how Gram-Schmidt combines the columns of A to produce right angles between columns of Q.

From Chapter 3, a basis consists of independent vectors that span the space. The basis vectors could meet at any angle (except 0° and 180°). But every time we visualize axes, they are perpendicular. In our imagination, the coordinate axes are practically always orthogonal. This simplifies the picture and it greatly simplifies the computations.

The vectors q1, ..., qn are orthogonal when their dot products qi · qj are zero. More exactly, qi^T qj = 0 whenever i ≠ j. With one more step, dividing each vector by its length, the vectors become orthogonal unit vectors. Their lengths are all 1 (normal). Then the basis is orthonormal.

DEFINITION   The vectors q1, ..., qn are orthonormal if their dot products are 0 or 1:

qi^T qj = 0 when i ≠ j   and   qi^T qi = 1 (unit vectors).

A matrix with orthonormal columns is assigned the special letter Q. The matrix Q is easy to work with because Q^T Q = I. This repeats in matrix language that the columns q1, ..., qn are orthonormal. Q is not always required to be square.
Chapter 4. 166 lily When row i of (?T multiplies column jof Q. the dot product is qj 9j< 0(r (i # j) that dot product is zero by orthogonality. On the diagonal (i = jj йе u .<U4o^ give qrq = |9(|’ = ]. ^ can be rectangular (m > n) or square (rn = " veqOrs Hhen Qis square, QTQ = I means that QT= Q~\ transpose == /n„erj When Q is not square, its rank is n < m. So QQT cannot equal /m If the columns are only orthogonal (not unit vectors), dot products still gjve matrix QTQ (but not the identity matrix). This diagonal matrix is almost as “ dia8on4| The important thing is orthogonality—then it is easy to produce unit vectors ^°°d To repeat QTQ ж 1 even when Q is rectangular. In that case QT js On( from the left For square matrices we also have QQT = /, so QT is the t *nvcr”: »ene of Q. The rows of a square Q are orthonormal like the columns. The .•*°'s‘<,ed b- transpose In this square case we call Q an orthogonal matrix 1 ">ersr is the Here are three examples of orthogonal matrices—rotation and permutation . bon. The quickest test is QrQ /. at>d ^flec- Exampte 1 (Rotation) Q rotates every vector in the plane by the angle # and Q-t к Q= Геш# -sin# - V" I sin# COS# sin# - sin в cos # ' cos# The columns of Q are orthogonal (take their dot product). They are unit vecton because sin29 + cos’ 9 « I. Those columns give an orthonormal basis for the plane R2, Example 2 (Permutation) These matrices change the order to (y, z, x) and (y,*); V t and 1 V Inverse = transpose: |0 0 [1 0 Oj [aj [xj l* ',J lMJ L*J All columns of these Q't are unit vecton (their lengths are obviously 1). They are also orthogonal (the l’s appear tn different places). The inverse of a permutation matrix is its transpose Q~l QT. The inverse puls the components back into their original order: ° 0 11 Гу] [ж] rn ,i г.л r_s 10 0 z 0 10 x JU. Every permutation matrix is an orthogonal matrix. Example 3 (Reflection) If u is any unit vector, set Q = I - 2uuT. Notice that uu is a matrix while uTu is the number |u|2 = 1. Then QT and Q~* both equal Q\ <3T*/-’wT’Q QTQ = f _ 4UUT + 4uuTuu and V (1) a better name for Q, but it’» not used. Any matnx with *e only call it an orthogonal matrix when It is square-
4.4. Orthogonal Matrices and Gram-Schmidl 167 Reflection matrices 1 - 2uu are symmetric and also orthogonal. If you them, you get thc identity Q = Q Q - I Reflecting twice through a mirror bring* back the original, like (-1) = 1. Notice uTu = I inside4uuTuuT inequauonflj. Figure 4.9: Rotation by Q = [J •] and reflection across 45° by Q - [J J). As an example choose the direction u = (-l/>/2,1/^2). Compute 2uuT (column times row) and subtract from I to get the reflection matnx Q across thc 45° line. aaw»» j] [»;][;]-[;]. When (*, y) goes to (y, x), a vector like (3,3) doesn't move. It is on the mirror line. Rotations preserve the length of every vector. So do reflections. So do permutations. If Q has orthonormal columns (QTQ = I), it leaves lengths unchanged: \|Qx|| - ||x|| Same length for Qx (Qx)T(Qx) = xrQrQx - xTlx xTx (2) Same dot product: (<?x)T(Qу) = xTQTQy « xTy. Just use QTQ / Projections Using Orthonormal Bases: Q Replaces A Orthogonal matrices are excellent for computations—numbers can never grow loo large when lengths of vectors are fixed. Stable computer codes use Q's as much as possible. For projections onto subspaces, all formulas involve ATA. The entries of AT A arc the dot products between the columns of A. Usually AT A is not diagonal. Suppose those columns are actually orthonormal Tbe a's become the q's. Then AT A simplifies to QTQ = I. Look at the improvemenu in x and p and the projection matrix P = QQT. Instead of QrQ we print a blank for the identity matrix: _____x = QTb and p-Qx and P = Q--------------Qr. (3) The least squares solution of Qx = b is x = QTb. There are no matrices to invert. This is the point of an orthonormal tmis. The best x = QTb just has dot products of q,,...,qn With b. We have 1-dimensional projections' A A is now Q Q - /.There is no coupling. When A is Q, with orthonormal columns, here is p = Qx - QQ »:
168   Chapter 4. Orthogonality

Projection onto q's   p = q1 (q1^T b) + q2 (q2^T b) + ··· + qn (qn^T b).   (4)

Important case when Q is square: If m = n, the subspace is the whole space. Then Q^T = Q^{-1} and x̂ = Q^T b is the same as x = Q^{-1} b. The solution is exact! The projection of b onto the whole space is b itself. In this case p = b and P = QQ^T = I.

You may think that projection onto the whole space is not worth mentioning. But when Q is square and p = b, our formula assembles b out of its 1-dimensional projections. If q1, ..., qn is an orthonormal basis for the whole space, then b equals QQ^T b. Every b is the sum of its components along the q's:

b = q1 (q1^T b) + q2 (q2^T b) + ··· + qn (qn^T b).   (5)

Example 4   The columns of this orthogonal Q are orthonormal vectors q1, q2, q3:

m = n = 3   Q = (1/3) [−1 2 2; 2 −1 2; 2 2 −1]   has   Q^T Q = QQ^T = I.

The separate projections of b = (0, 0, 1) onto q1 and q2 and q3 are p1 and p2 and p3:

q1 (q1^T b) = (2/3) q1   and   q2 (q2^T b) = (2/3) q2   and   q3 (q3^T b) = −(1/3) q3.

The sum of the first two is the projection of b onto the plane of q1 and q2. The sum of all three is the projection of b onto the whole space, which is b itself:

Reconstruct b   p1 + p2 + p3 = (2/3) q1 + (2/3) q2 − (1/3) q3 = (1/9) [−2 + 4 − 2; 4 − 2 − 2; 4 + 4 + 1] = [0; 0; 1] = b.

Transforms   QQ^T = I is the foundation of Fourier series and all the great "transforms" of applied mathematics. They break vectors b or functions f(x) into perpendicular pieces. Then by adding the pieces in (5), the inverse transform puts b and f(x) back together.

Fourier series   f(x) = a0 + a1 cos x + b1 sin x + a2 cos 2x + b2 sin 2x + ···

Only two differences: those are functions, and the sine-cosine basis is infinite: m = n = ∞.
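A quick MATLAB check of Example 4 (my sketch): the three one-dimensional projections add back to b, because the square matrix Q has orthonormal columns.

    Q = (1/3) * [-1 2 2; 2 -1 2; 2 2 -1];
    b = [0; 0; 1];
    c = Q' * b;                                    % components along q1, q2, q3: (2/3, 2/3, -1/3)
    p = Q(:,1)*c(1) + Q(:,2)*c(2) + Q(:,3)*c(3);   % sum of the 1-D projections
    disp(p - b)                                    % zero vector: the pieces rebuild b
    disp(norm(Q'*Q - eye(3)))                      % zero: the columns really are orthonormal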
4.4. Orthogonal Matrices and Gran-Schmidt 169 The Gram-Schmidt Process The point of т“тТи^ “ lhjI “onho8°nal “ g(wd"- Projections and least squares always involve A A. When this matrix becomes QTQ = /. ihe inverse is no problem. The one-dimensional projections are uncoupled. The best z is Qrb Gust n separate dot products). For this to be true, we had to say “If the vectors are orthonormal". Now we explain the “Gram-Schmidl way" to create orthonormal vectors. Start with three independent vectors a, b. c. We intend to construct three orthogonal vectors А, В, C. Sooner or later we will divide А. В, C by their lengths. That produces three orthonormal vectors q, = A/|A|. q2 = <j3 = C/JC||. Gram-Schmidt Begin by choosing Ажа. This first direction is accepted as it comes. The next direction В must be perpendicular to A. Start with b and subtract its projection along A. This leaves the perpendicular part, which is the orthogonal vector B: First Gram-Schmidt step B = b- (6) A and В are orthogonal in Figure 4.10. Multiply equation (6) by AT to verify that ATB A1 b - A b = 0. This vector В is what we have called lhe error vector e. perpendicular to A. Notice that В in equation (6) is not zero (otherwise a and b would be dependent). The directions A and В are now set. The third direction starts with c. This is not a combination of A and В (because c is not a combination of a and b). But most likely c is not perpendicular to A and B. So subtract off its components in those two directions to get a perpendicular direction C: Next Gram-Schmidt step (7) Subtract the projection p to get В = b — p Figure 4.10: First project b onto the line through a and find the orthogonal В as b - p. Then project c onto the AB plane and find C as c - p,. Divide by ЦАЦ, ||B||. ||C||. This is the one and only idea of the Gram-Schmidl process Subtract from every new vector its projections in the directions already set. That idea is repeated at every step. A = a b
Chapter 4. 170 Illy , К d ue would subtract three projections onto А. В, C t0 lc. n ~ «.л-* “»"""«»»; •=«»„ A ° E^dCM-SdvM SW~lbc'»^^"^-onto«0""l»«ooa,(, ’-care and ’ 2 0 -2 and c = ’ 3 -3 3 b = 0 Then A = a has ATA = 2 and ATb = 2. Subtract from b its projection p along A; В 2 1 First step 2 Check: AJ В » 0 as required. Now subtract the projections of c on A and В to get C. T 1 . Next step c Brc 6 6 —A-VB = £-^ + ;B = A BrB 2 6 Check; C = (1,1,1) is perpendicular to both A and B. Finally convert A,B,C to unit vecton (length I. orthonormal). The lengths of А. В, C are >/2 and ч/б and vj Divide by those lengths for an orthonormal basis q,, q2 Qi 1 '1 if1] 1 pl Usually А. В. C contain fractions Almost always q(, q2, q3 contain square roots. The Factorization A = QR We started with a matrix A, whose columns were a, b, c. We ended with a matrix Q whose columns are q^.q^ How are those matrices related? Since the vectors a b c °* lhe ’ ‘ VWM)’ “* ** 1 n,atnx connecting 'a third matnx is the tnangular R in A - QR. (Nol the R ln chapier 1.) e4uatK,n Г*” 'nWlVCd) Thc S,eP Was Z ed Si ’°" °f Л аП<1 B A‘ 'hat Ma«e C and «3 *c« This non-mvolvement of later vecton is the key point of Gntm-Schm.dt: • The vecton a and A and q, « all along a single line. • The vectona.band А Я »wi ~ _ ... ам Л. В and q,, q2 are all in the same plane. * vecton g. b, c and ABC and 9i>9j.q3 are in one subspace (dimension 3).
Orthogonal Matrice» and Gram-Schmidt 171 At every step ai,...,a* are combination*of q,.....„ Later q * arc not involved. The connecting matrix R is triangular, and we have A = QR a b rfa 0 0 «1 4i q3 fljb 0 4ic flic flje, or A = QR <8> A • QR « Gram-Schmidt in a nutshell. Multiply by QT ю recognize that R = QT A (Gram-Schmidt) From independent vectors a.... a., Gram-Schmidt construct* orthonormal vectors flp ••., fln. The matrices with these columns satisfy A — QR- Then R — Q1A is upper triangular because later q\ are orthogonal to earlier a's. Here are the original as and the final q't from the example. The i,j entry of R - QT Л is row 1 of Qr times column j of A. The dot products go into R. Then A — QR: 2 0 -2 3' -3 3 l/v/2 = -l/s/2 l/s/6 l/x/6 -2/^6 0 1/»/3] 1/V3 l/s/5j 0 -QR 0 Look closely at Q and R. The lengths of A. B.C are */2. s/б. s/3 on lhe diagonal of R. The columns of Q are orthonormal. Because of lhe square roots. QR might look harder than LU. Both factorizations are absolutely central to calculations tn linear algebra Any rn by n matrix A with independent columns can be factored into A — QR- The rn by n matrix Q has orthonormal columns, and the square matnx R is upper tnangular with positive diagonal. We must not forget why this is useful for least squares. ATA = (QR)TQR = RrQTQR = RT R. The least squares equation A1 Ax — Arb simplifies to RT Rx = RrQrb. Then finally we reach Rx = QTb: success. Least squares Rr Rx — RTQrb or Ri - Q'b or x = R~'QJb (9) Instead of solving Ax = b. which is impossible, we solve Rx = Q1 b by back substitu- tion—which is very fast. The real cost is the mn1 multiplications needed by Gram-Schmidt. The next page has an informal code. It projects each new column v = a} onto the known orthonormal columns q(,...,q,_*. After subtracting those projections from a7. the last line divides the new orthogonalized vector (still called t>) by its length r„. This produces the next orthonormal vector q;.
172 Starting from a. b.c 4l =ai/|oill subtracts all projections at once. -"-«onulity = ei. aj. аз the code will construct 91. B, q2, C\ C, q3. В = 02- (41a^i 42 ~ (qT<j)91 C = C--(^)«2 9з = С/||С|| nmiM-tion al a time in C* and then C. That change j m “* ”«<» m (Ю) ;e *s celled kxj = l:n v = tari = l:J-l R(i.j) = <?(:• »/*« v=0-R(iJ)»Q(:J); end R(j,j) = norni(u); Q(:.» = »/ЯиЛ end % modified Gram-Schmidt % v begins as column j of thc original A 4c columns d| to are already settled in Q Як compute R,} = which is q}v Як subtract the projection (q/'vjq, Як v is now perpendicular to all of q,,..., q % the diagonal entries RJ} are lengths Як divide v by its length to get the next Як the loop “for j = 1: n” produces all of the qj To recover column j of A, undo the last step and the middlesteps of the code- Л(у, j)qj = (v minus its projections) = (column j of A) - . (Ц) =i Moving the sum to the far left, this is column j in the multiplication QR = A. Note Good software like LAPACK. used m good systems like MATLAB and Julia and Python, will offer alternative ways to factor A = QR. “Householder reflections” act on A to produce the upper triangular R. This happens one column at a time in thc same way that elimination produces the upper triangular U in LU. Those reflection matrices I - 2uuT will be described in Section 7.4. If A is tridiagonal we can simplify even more to use 2 by 2 rotations. The result is always A = QR and the MATLAB command to orthogonalize A is [Q, Я] = qr(A). Here is a further way to reduce roundoff error: Allow reordering of thc columns of A. When each new 4j is found, subtract its projections from all remaining columns j + 1 to n. Then choose the largest of the resulting vectors as aj+1 (leading to q... ,). We are exchanging columns just as elimination exchanged rows. So a permutation P is allowed on the column side of A, and AP = QR or rot^iT Ih^‘1Gram-S<;uhm•<1, “ ®" «>* g<»d process to understand, even if reflections or rotations or column exchanges lead to a more perfect Q.
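The built-in qr command packages all of this. A short MATLAB sketch (using the matrix and right side from Example 1 of Section 4.3, my choice) that solves least squares through R, as in equation (9):

    A = [1 0; 1 1; 1 2];  b = [6; 0; 0];
    [Q, R] = qr(A, 0);                   % economy size: Q is 3 by 2, R is 2 by 2
    xhat = R \ (Q'*b);                   % back substitution gives (5, -3)
    disp(norm(A - Q*R))                  % A = QR up to roundoff (qr chooses the signs)
    disp(norm(Q'*Q - eye(2)))            % orthonormal columns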
4 4. Orthogonal Matrices and Gram-Schmidt 173 REVIEW OF THE KEY IDEAS 1, If the orthonormal vectors оn . T n . T , , . are the columns of Q. then q,'q. = 0 and 4» 4i = * •tanslate into the matrix multiplication QTQ = / 2. If Q is square (an orthogonal matrix) then Qr = Q- >: = inrer^e 3. The length of Qx equals the length of z; |QZ|| _ |хц 4. The projection onto the column space of Q spanned by the q't is P = QQT. 5. If Q is square then P = QQ? = / b = (flTfr) +... + qjrfb). 6. Gram-Schmidt produces orthonormal vectors qi.q2,q3 from independent a,b,c. In matrix form this is the QR factorization A = (orthogonal Q)(triangular R). WORKED EXAMPLE Add two more rows and columns with all entries 1 or — 1, so the columns of this 4 by 4 Hadamard matrix are orthogonal. How do you turn Hi into an orthogonal matrix Q ? 1 -1 x x. The projection of b = (6,0,0,2) onto the first column of Hi is px = (2,2,2,2). The projection onto the second column is pj = (1,—1,1,—1). What is the projection pt 2 of b onto the 2-dimensional space spanned by the first two columns? Solution Hi is built from H2 just as Ht is built from Ht: Я4 = H2 H2 1 I 1 1' Я21 _ -я2] - 1 -1 1 1 1 -1 1 -1 -1 -1 -1 1 has orthogonal columns. Then Q = Я/2 has orthonormal columns. Dividing by 2 gives unit vectors in Q. A 5 by 5 Hadamard matrix is impossible because the dot product of columns would have five l's and/or — l’s and could not add to zero. Я8 has orthogonal columns of length s/8. „т„ [Ят ЯТ1[Я Я1_[2НТЯ 0 1 _ [87 01 Яа «8'7в= [ят _ят] [я _я] " [ о 2ЯТЯ]"[О 8/J 4’-^ What is the key point of orthogonal columns? Answer ATA is diagonal and easy to invert. We can project onto lines and just add. The axes are orthogonal.
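A MATLAB check of the worked example (my sketch): because the columns of H4 are orthogonal, the projection onto the plane of the first two columns is just p1 + p2.

    H = [1 1 1 1; 1 -1 1 -1; 1 1 -1 -1; 1 -1 -1 1];   % the Hadamard matrix H4 above
    disp(H'*H)                           % 4I: orthogonal columns of length 2
    Q = H/2;                             % orthonormal columns
    b = [6; 0; 0; 2];
    p1 = H(:,1) * (H(:,1)'*b) / 4;       % projection on column 1: (2, 2, 2, 2)
    p2 = H(:,2) * (H(:,2)'*b) / 4;       % projection on column 2: (1, -1, 1, -1)
    p12 = p1 + p2;                       % projection onto the plane of those two columns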
174 ChaptW4O^onallty Problem Set 4.4 Problems 1-12 are about orthogonal sectors and orthogonal matrices. 1 Are these pairs of vecton orthonormal or only orthogonal or only independent? йН-з] (c) Change the second vector when necessary to produce orthonormal vectors. 2 The vecton (2,2,-l)and (-1,2,2) are orthogonal. Divide them by their len find orthonormal vecton qt and q2 Put those into the columns of О and . ’° QTQtndQQT v multiply 3 (a) If A has three orthogonal columns each of length 4, what is AT A ? (b) If A has three orthogonal columns of lengths 1,2,3. what is ATA ? 4 Give an example of each of the following: (a) A matrix Q that has orthonormal columns but QQT J. (b) Two orthogonal vecton that are not linearly independent. (c) An orthonormal basis for RJ, including the vector q, ж(1,1,1)/у/з S Find two orthogonal vecton in the plane * + v + 2x = 0. Make them orthonormal • If Qi «»d Qj are orthogonal matrices. show that thetr product Q,n, thogonal matrix. (Use QTQ * I.) V t Vj is also an or- 7 8 9 If Q has orthonormal columns, what is the least squares solution x to Qx = b? If <?] and q2 are orthonormal vectors in Rs. what combination__+___________q2 is closest to a given vector b? (a) Compute P • QQT when q, « (.8, .6,0) and q2 « (-.6, .8,0). Verify that Dl _ D 10 11 (b) Prove that always (<JQT)2 » QQT by using QrQ « I. Then P « QQT is the projection matnx onto the column space of Q. Orthonormal vectors q{,q2.q2 are automatically linearly independent. (a) Vector proof: When c> + cjfl2 -t-Cjflj = 0, what dot product leads to ci =0? Similarly cj = 0 and cj = 0. Thus the q's are independent. (bl Matnx proof: Show that Qx = 0 leads to x = 0. Since Q may be rectangular, you can use QT but not Q"1. Find orthonormal vectors q, and q2 in the plane spanned by a = (1,3,4,5,7) and b = (-6,6,8,0,8). Which combination is closest to (1,0,0,0,0) ?
4.4. 12 13 14 15 16 17 18 19 20 Orthogonal Matrices and Gram-Schmidl 175 If ai, aa, a3 is a basis for R3. any vector u. - ” ^t r ° can be written as b = х1а|+гааа+.тз<*з. (a) Suppose the as are orthonormal. Show that r, = вт6 <Ь> Suppose 4» «> are onhog.uul. Show Um », . «f <c> 1Г Ле L11 - [J) ЛтЛеЛе^и!. В orthogonal to a? Sketch a figure to show a, b, and В. loJ Complete theGram-Schmidt process in Problem 13 by computing q, - а/||a|| and q, = B/IIBII and factoring into QR: Find orthonormal vectors q\,q2.q2 such that qt,q2 span the column space of A. Which “fundamental subspace" contains q, ? Solve Ax « (1,2,7) by least squares. Г -I 4 What multiple of а - (4,5,2,2) is closest to b - (1,2,0,0)? Find orthonormal vectors q( and q2 in the plane of a and b. Project b — (1,3,5) onto the line through а — (1,1,1). Then find p and e. Compute the orthonormal vectors q( = а/|а|| and q2 - e/|e||. (Recommended) Find orthogonal vectors А. В. C by Gram-Schmidt from a. b. c a -(1,-1,0,0) b-(0,1,-1,0) c = (0,0,1,-1). A, B,C and a, b,c are bases for the vectors perpendicular tod — (1,1,1,1). If A — QR then A1 A = RTR = ____________ triangular times ___ triangular. Gram-Schmidt on A corresponds to elimination on ЛТЛ. The pivots for Лт A must be the squares of diagonal entries of R. Find Q and R by Gram-Schmidl for this A: -1 1' 2 1 2 4 Find an orthonormal basis for the column space of A. Then project b on С(Л).
Chapter 4. Orthog^,^ 176 21 22 Hndortho?’"*1 vectc** A В ' f n = 1 2 and bs ’ f -1 0 and •“•Jit в» Then write л 4 finances of a,b,c (independent colum Г1 2 4 - 0 0 5 • 0 3 6. ns). c = T о 4 23 24 25 26 27 28 29 30 31 bl m> 23-2* use the Qfl code above equation (11). It executes Gram.Schmidt Shownhy C (found Via C* in (Ю)) b equd to C in equation (7). Equate (7) subtracts from c its components along A and В Why not subtract components along a and along b WTtern « dr rnn’ small multiplscation* in executing Gram-Schmidl ? Apply the MATLAB qr code to a - (2,2.-1). b = (0,-3,3),c« (1,0,0). Whai are the д’»? If u is a unit rector, then Q - I - 2«uT b • reflection matrix (Example 3). Find Q. from u - (0.1) and Qj from u (0. V2/2. V2/2). Draw the reflections when Qt and Qj multiply the vectors (1.2) and (1,1,1). Find all matrices that are both orthogonal and lower tnangular Q-I- 2uuT is a reflection matrix when uTu - 1. Two reflections give QJ / (a) Show that Qv - -a The mirror is perpendicular lo u. (b| Find Qv when uTv - 0 The mirror contains v. It reflects to itself. Challenge Problems (MATLAB) Factor |Q. Я) = qr(A) if A has columns (1, -1,0,0) and (0,1, -1,0) and (0.0,1. -1) and (0,0,0.1). Cao you scale the orthogonal columns of Q to get nice integer components ? If A is m by n with rank n. then qr( A) produces a square Q and zeros below R: The factors from MATLAB are (m by m)(m by n) A = f о ‘ The n columns of Q( are an orthonormal basis for which fundamental subspace? The m-n columns of Q2 are an orthonormal basis for which fundamental subspace? 32 We know that P = QQT is the projection onto the column space of Q(m by n). Now add another column a to produce A = (Q a]. Gram-Schmidt replaces a by what vector q? Start with a. subtract_.divide by_to find q.
5 Determinantsand Linear Transformations 5.1 3 by 3 Determinants 5j Properties and Application, of Determinanls 5.3 Linear Transformations The determinant of a square matnx i« >n ..._. that the column vectors are dependent and A ' П8 numbeT И det A = 0. this signals Za for A - * Will have a divXt iZre™ T”'"' " ** A “ ™ f°" and the "cofactor formula" for det A A ™™" 5’1 finds 3 b> 3 determinants Section 5.2 begins with algebra; Cramer s Rule fnr , _ a-tx geometry: the volume of a tilted box. The edecs of the hn - Tbc" “ nWVCS ‘° When the box is flat. A is singular. In ,11 c^s Z * Г г”1Т "* t в «• in an cases, the volume of the box is |dctA|. When we multiply matrices AB ue j mllipl,ln, boxes .nd die., volumes. X." are parallelograms and | det Al = Mc. * Л , Жс Ьо*“ к 11 ca Section 53 multiplies matrices for det AB. This link to volumes leads naturally to line» . в3 hikrtnciibe intn. tii>~i k- .r linear transformations. A linear transformation ;:*e^.o7^^ then you know T(.) to, every ““ Wmfani™. Tdoe. lo. bon. There are three useful formulas for dot л тк. — r Pivots^Ube triangular matnx U So the product of pivots in U gives ± det A. This is usually the fastest way A second formula uses determmants of size n - I; the "cofactors" of A Thev rive the best formula for A"1. The entry of A~* « th, , . t л j j i. j * 5 _. . . , „ , л ’1 < °> л и the j, i cofactor divided by det A. The Ing formula for det A adds up n! terms^oe for every path down the matnx. Each path chooses one entry from every row and column, and we multiply the n entries on the path. Reverse the sign if the path has an odd permutation of column numbers. A 3 by 3 matnx has six paths and the big formula has six terms—one for every 3 by 3 permutation. This chapter could easily become fuU of formulas! The connection to linear transfor- mations shows how an n by n matrix acts on a shape in n dimensions It produces another shape. And the ratio of the two volumes is | det A|. Determinants could tel) us everything, if only they were not so hard to compute.
Chapter 5. Ddermmanis and Linear Trantf^ 178 5.1 3 by 3 Determinants ______________________ H is ea — be. The singular matrix [“ has 4^7 al L J U, life H _ Гс П has det PA = be - ad- а o] [c d] [o 6J "detд 1 The determinant of A = 2 Rowexchange рА=Г0 reverses signs [ 1 "'["t 9a Ib * -uB 1 is x(ad - be)+ V( Ad- Be). De‘ b lineaf.. n>wlbyW then 1.2.3 remain true: det = 0 when A is singular det •------v iIr„ _ ^~L...lre.Vcrs«Ssi| 3 4 If A is n by n then 1.2.3 remain «к.. —--------- _ when rows are exchanged, det is linear in row 1. Also, det = product kr.T***® s'gn ‘ <' —• jT = det A is an am . ^pivots "et BA =7de7B)(det A) and det AT = det A. Hus is an ama^ , formuias. But often they are not practical for computing. Determinants lead 10 ‘ ‘ZsTo see how determinants produce the matrix A~>. -pus да-иоп will focus on 3 by 3 j 4J Rm wiU come 2 by 2 matnces: ГПЬГошиЬГогЛ-'»'”»"”»”1^ , d. |i «1 i «kJ0 11—i < H'ic drt • ‘ =Ьс'^ det _ j I"1 detl 1 0] Iе “J L J L J „ matrices. Their determinants change sign when the We «art *iA™ sign change appears for any matnx. This rule becomes a rows are exchanged invfne when det A - 0. key to determinants ofЯ o/c = b/d. The rows are parallel. For n by 2^«fet A^neam that tbe columns of A are not independent. A combmation n matnces. det A_ лх = о with ж # 0. A is not invertible. °f CTh7p^ti« and more will follow after we define the determinant. 3 by 3 Determinants 2 by 2 matrices are easy: ad - be. 4 by 4 matrices are hard. Better to use elimination (or a laptop). For 3 by 3 matrices, the determinant has 3! = 6 terms, and often you can compute it by hand. We will show how. Start with the identity matrix (det I = 1) and exchange two rows (then det = -1). Exchanging again brings back det = +1. You quickly have all six permutation matrices. Each row exchange will reverse the sign of the determinant (+1 to -1 or else —1 to +1): Notice! If I exchange two rows—say row 1 and row 2—each determinant changes sign. Permutations 1 and 2 exchange. 3 and 4 exchange. 5 and 6 exchange. This will cany over to al) determinants: Row exchange multiplies det A by -1.
5 I. 3 by 3 Determinants 179 When you multiply a row by a number ihi< ~ i . Suppose the throe rows arc a b e and „ , r^t ₽ dc,erTn,nanl b> thal number canaP4ruuixyz. Those шпе nufnbCT5 mu|upiy ±1. ' a Я z det = +aqz -bpz +ЬгЖ -cgx +cpp —ary Finally we use the most powerful property we have. The determinant of A is linear in each row separately. As equation (3) will show, we can add those six determinants. To remember the plus and minus signs, I follow the arrows in this picture of the matrix. Combine 6 simple determinants into det A + aqz 4- brx + cpy — ary — bpz — cqx (I) Hotice! Those six terms all have one entry from each row of the matrix. They also have one entry from each column of the matnx. There are 6 = 3 ! terms because there are six 3x3 permutation matrices. A 4 x 4 determinant will have 4! = 24 terms. This guides us to the big formula' for the determinant of an n by n matrix. That formula has n . terms ац а2* .. «ц,,. one for every n by n permutation. Each permutation matrix P picks out n numbers in A (one number from every row and column). Multiply those n numbers by det P= 1 or —1. Then add the results like On а-&—uijoji =ad—be. Det P is 1 for even permutations like 231 and —1 for odd permutations like 213. Those are reached from the identity matnx I by an even or an odd number of exchanges Each permutation P reorders the column numbers 1,2,...,n into some order j.k. z. The determinant is the sumof n ! simple determinants like (-1)а13а31ам = -bpz. det A = sum over all n! column permutations P = (j, k,.z) = £ (det P) aij а2к...апя= BIG FORMULA (2) So every term in the big formula picks out one number a,} from each row of A and at the same time one number from each column of A. Multiply those n numbers times 1 or —1. Permutations P and their plus-minus signs are the keys to determinants I Let me return for a minute to that powerful property: det A is linear in each row separately. We can split row 1 into (a,0,0) + (0,6,0) + (0,0,c). We can split row 2 and row 3 in the same way. This gives us a lot of pieces (33 = 27 different pieces). But only 6 of those pieces are important and 21 of them are zero automatically (a zero column). 3! = 6 ways to use every row and column once, 33 = 27 ways if columns could repeat det 0 q aqz is important det 0 0 = 0 automatically z 21 like this a 0 0 0 a 0 P o
180   Chapter 5. Determinants and Linear Transformations

Cofactors and a Formula for A^{-1}

I can explain the "cofactor formula" for det A. It starts from the big formula (1) with 6 terms and reduces from 3 by 3 to 2 by 2. Factor out a and b and c from row 1. Two of the six terms go with each factor:

Cofactor formula   det A = a(qz − ry) + b(rx − pz) + c(py − qx).   (3)

We have factored the entries a, b, c of row 1 from their "cofactors". Each cofactor is a 2 by 2 determinant, using the rows and columns that its factor does not use. The cofactor of a in the 1,1 position comes from rows 2, 3 and columns 2, 3: it is qz − ry. Notice that the cofactor of b in the 1,2 position is rx − pz and not pz − rx (the actual 2 by 2 determinant). There is a rule for plus and minus signs:

Cofactor C_ij of a_ij   C_ij = (−1)^{i+j} times the determinant of the remaining matrix (size n − 1) when row i and column j are removed.

The cofactor formula along row 1 is   det A = a11 C11 + a12 C12 + ··· + a1n C1n.   (4)

It collects all the terms in det A that are multiplied by a11, by a12, ..., by a1n.

A 2 by 2 matrix shows the idea. The cofactors of A = [a b; c d] go into C^T = [d −b; −c a]. Then A times C^T is det A times the identity matrix:

A C^T = [a b; c d] [d −b; −c a] = [ad − bc  0; 0  ad − bc] = (det A) I.   (5)

Dividing by det A, cofactors give our first and best formula for the inverse matrix:

Inverse matrix formula   A^{-1} = C^T / det A.   (6)

This formula shows why it is impossible to divide by a singular matrix: we would be dividing C^T by det A = 0. Every entry of A^{-1} is a ratio of two determinants (size n − 1 for the cofactor divided by size n for A). This example has determinant 1, so the inverse is exactly C^T (the cofactors):

Example of A^{-1}   A = [1 1 1; 0 1 1; 0 0 1]   has determinant 1 and   A^{-1} = C^T = [1 −1 0; 0 1 −1; 0 0 1].   (7)

Notice how C32 removes row 3 and column 2. That leaves a 2 by 2 matrix with determinant 1. Since (−1)^{3+2} = −1, this becomes −1 in C^T.

The diagonal entries of A C^T are always det A. That is exactly the cofactor formula. Problem 24 will show why the off-diagonal entries of A C^T are always zero. Those numbers turn out to be determinants of matrices with two equal rows. Automatically zero.

A typical cofactor C31 removes row 3 and column 1. In our 3 by 3 example, that leaves a 2 by 2 matrix of 1's, with determinant 0. This is the zero in the corner of A^{-1}.

If we change A to 2A, the determinant is multiplied by (2)(2)(2) = 8. All cofactors C are multiplied by (2)(2) = 4. Then A^{-1} = C^T/det A is divided by 2. Of course (2A)^{-1} = A^{-1}/2. Section 5.2 will solve Ax = b and Section 5.3 will find volumes from det A.
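A MATLAB sketch (my own check, not from the text) that builds every cofactor of the example matrix and confirms A C^T = (det A) I and A^{-1} = C^T/det A:

    A = [1 1 1; 0 1 1; 0 0 1];
    n = size(A,1);  C = zeros(n);
    for i = 1:n
      for j = 1:n
        M = A;  M(i,:) = [];  M(:,j) = [];     % remove row i and column j
        C(i,j) = (-1)^(i+j) * det(M);          % the i,j cofactor
      end
    end
    disp(A*C')                                 % (det A) times I, here just I since det A = 1
    disp(C'/det(A) - inv(A))                   % zero matrix: A^{-1} = C^T / det A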
5.1. 3 by 3 Determinants 181

The diagonal entries of A C^T are always det A. That is exactly the cofactor formula. Problem 24 will show why the off-diagonal entries of A C^T are always zero. Those numbers turn out to be determinants of matrices with two equal rows. Automatically zero.

A typical cofactor C31 removes row 3 and column 1. In our 3 by 3 example, that leaves a 2 by 2 matrix of 1's, with determinant = 0. This is the bold zero in A^-1.

If we change A to 2A, the determinant is multiplied by (2)(2)(2) = 8. All cofactors C are multiplied by (2)(2) = 4. Then A^-1 = C^T/det A is divided by 2. Of course. Section 5.2 will solve Ax = b and will find volumes from det A.

Problem Set 5.1

Questions 1-5 are about the rules for determinants.

1   If a 4 by 4 matrix has det A = 1/2, find det(2A) and det(-A) and det(A^2) and det(A^-1).

2   If a 3 by 3 matrix has det A = -1, find det(A/2) and det(-A) and det(A^2) and det(A^-1). What are those answers if det A = 0?

3   True or false, with a reason if true or a counterexample if false:
    (a) The determinant of I + A is 1 + det A.
    (b) The determinant of ABC is |A| |B| |C|.
    (c) The determinant of 4A is 4|A|.
    (d) The determinant of AB - BA is zero. Try an example with a 2 by 2 matrix A.

4   Which row exchanges show that these "reverse identity matrices" J3 and J4 have |J3| = -1 but |J4| = +1?

    det [0 0 1; 0 1 0; 1 0 0] = -1   but   det [0 0 0 1; 0 0 1 0; 0 1 0 0; 1 0 0 0] = +1

5   For n = 5, 6, 7, count the row exchanges to permute the reverse identity Jn to the identity matrix In. Propose a rule for every size n and predict whether J101 has determinant +1 or -1.

6   Find the six terms in equation (1) like +aqz (the main diagonal) and -cqx (the anti-diagonal). Combine those six terms into the determinants of A, B, C:

    A = [2 -1 0; -1 2 -1; 0 -1 2]     B = [1 2 3; 2 4 6; 3 8 12]     C = [1 2 3; 4 5 6; 7 8 9]
Chapter 5. Determinants and Linear Transf, 182 , 4 ’]«oge‘[P + a 7 + b r + cl ’ sho*'^11* in row row 1 rowl+r0*2 ro*3 8 9 10 11 12 13 14 15 16 ’ row 1 row 2 row 3 ’ row 1 row 1 +det row3 jT _ ,tr>t A because both of those 3 by 3 determinants come fris Do these matrices have determinant 0,1,2, or 3? [0 0 1 1 0 0 0 1 0 det A = = det 0 1 1 B — 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 — O+det row 1 ’ row 2 . row3 Г1 1 0 1 L1 i = 0 to prove det A D = 1 0 1 If the entries tn every row of A add to zero, solve Ax ~ If those entries add to one. show that det(.4 - I) - 0. Does this mean det A Why doesdet(PiA) = (det A) times (det A) for permutations ? If p ne- row exchanges and needs 3 row exchanges to reach I, why does P} p2 rc ? 2 from 2 + 3 exchanges ? Then their determinants will be (-1)2(—1)3 _ Explain why half of all 5 by 5 permutations are even (with det P = 1). Reduce .4 to U and find det A = product of the pivots: 1 2 3 A- 1 1 1 1 2 2 1 2 3 A = 2 2 3 3 3 3 By applying row operations to produce an upper triangular U, compute 0’ 0 -1 2 det 1 2 -1 0 2 6 0 2 3 6 0 0 0 1 3 ? and det 2 -1 0 0 2 -1 0 0 -1 2 -1 Use row operations lo simplify and compute these determinants: 101 201 301 det 102 202 302 103 203 303 0. 1? t2 t 1 1 t t 1 t2 t Rnd the determinants of a rank one matrix and a skew-symmetric matrix: ГП and det A = 3 and A = Г 0 1 -1 0 -3 -4 3 4 0
5 ! 3 by 3 Determinants 183 If the i,j entry of A is i times j, show that det A = 0. (Exception when A = [ 1 ]•) 1 If the», j entry of A is i + j, show that det A = 0. (Exception when n = 1 or 1в Use row operations to show that the 3 by 3 “Vandermonde determinant" is 1 a2 b2 c2 - (6 - o)(c - a)(c - b). a b c 19 Place the smallest number of zeros in a 4 by 4 matrix that will guarantee det A = 0. Place as many zeros as possible while still allowing det A / 0. 20 (a) If «и = «» =азз = 0. how many of the six terms in det A will be zero? (b) If an — «22 - <133 = 044=0, how many of the 24 products Oi .a2jb«3<«4m are sure to be zero? 2i If all the cofactors are zero, how do you know that A has no inverse? If none of the cofactors are zero, is A sure to be invertible? 22 The big formula has n! terms. But if an entry of A is zero, (n - 1)! terms disappear. If A has only three nonzero diagonals (in the center of A). bow many terms are left ? For n = 1,2,3,4 that tridiagonal determinant has 1,2,3,5 terms. Those are Fibonacci numbers in Section 6.2! Show why a tridiagonal 5 by 5 determinant has 5 + 3 = 8 nonzero terms (Fibonacci again). Use the cofactors of щ > and an- 23 Cofactor formula when two rows are equal. Write out the 6 terms in det A when a 3 by 3 matrix has row 1 = row 2 = a, b, c. The determinant should be zero. 24 Why is a matrix that has two equal rows always singular? Then det A = 0. If we combine the cofactors from one row with the numbers in another row. we will be computing det A' when A* has equal rows. Then det A* = 0—this is what produces the off-diagonal zeros in AC1" = (det A) I. 25 The Big Formula has 24 terms if A is 4 by 4. How many terms (a) include Ou? (b) include о 13 and aa? (c)are left if an= 022 = 033 = 044=0?
184 Chapter 5. Determinants and Linear Transformations

5.2 Properties and Applications of Determinants

1   Useful properties: det A^T = det A and det AB = (det A)(det B).
2   Cramer's Rule finds x = A^-1 b from ratios of determinants (a slow way).
3   The volume of the box (parallelogram in 2D) with edges e1 to en is |det E|.

The determinant of a square matrix is an amazing number. First of all, an invertible matrix has det A not equal to 0. A singular matrix has det A = 0. When we come to eigenvalues lambda and eigenvectors x with Ax = lambda x, we will write that eigenvalue equation as (A - lambda I)x = 0. This tells us that A - lambda I is singular and det(A - lambda I) = 0. We have an equation for lambda.

Overall, the formulas are most useful for small matrices, and the properties of determinants can make those formulas simpler. If the matrix is triangular or diagonal, we just multiply the diagonal entries to find the determinant:

Triangular matrix   Diagonal matrix     det [a b c; 0 q r; 0 0 z] = det [a 0 0; 0 q 0; 0 0 z] = aqz   (1)

If we transpose A, the same formula still takes one number from each row and column:

Transpose the matrix     det(A^T) = det(A)   (2)

If we multiply AB, we just multiply determinants (a wonderful fact):

Multiply two matrices     det(AB) = (det A)(det B)   (3)

A proof by algebra can get complicated. We will give a simple proof of (3) by geometry. When we add matrices, we do not just add determinants! (Try I + I.) Here are two good consequences of equations (2) and (3):

Orthogonal matrices Q have determinant 1 or -1
We know that Q^T Q = I. Then (det Q)^2 = (det Q^T)(det Q) = 1. Therefore det Q is +1 or -1.

Invertible matrices have det A = +- (product of the pivots)
If A = LU then det A = (det L)(det U) = det U. Triangular U: multiply the pivots. If PA = LU because of row exchanges, then det P = 1 or -1. Permutation matrix!

Multiplying the pivots U11 U22 ... Unn on the diagonal reveals the determinant of A. This is how determinants are computed by MATLAB and by all computer systems for linear algebra. The cost to find U in Chapter 2 was only n^3/3 multiplications. Notice: The "Big Formula" for det A would have a much larger cost. It is the sum of n! terms.
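A quick MATLAB sketch of that pivot computation (lu and prod are standard; the sign of the permutation has to be included, and the test matrix here is just my own choice):

    % det A = (det P) * product of the pivots, from PA = LU
    A = [2 1 1; 4 -6 0; -2 7 2];      % a 3 by 3 test matrix (row exchanges may occur)
    [L, U, P] = lu(A);                % PA = LU with a permutation matrix P
    pivots = diag(U);
    d = det(P) * prod(pivots);        % det P is +1 or -1, det L is 1
    disp([d, det(A)])                 % the two values agree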
5.2. Properties and Applications of Determinants 185

We know that exchanging two rows will reverse the sign of det A. And a matrix with two equal rows has det A = 0. Linearity in each row allows us to check the key operation in elimination: subtracting a multiple of one row from another row does not change the determinant:

Row operation     det [a b; c - la, d - lb] = det [a b; c d] - l det [a b; a b] = det [a b; c d]   (4)

This was "linearity in row 2 with row 1 fixed". It means (again) that our elimination steps from the original matrix A to an upper triangular U do not change the determinant:

det A = det U = U11 U22 ... Unn = product of the pivots   (5)

Cramer's Rule to Solve Ax = b

Start with the solution vector x = (x1, x2, x3) to Ax = b. Replace the first column of the identity matrix by x. When you multiply that matrix by A, the first column becomes Ax, which is b. The other columns of B1 are copied from A:

Key idea     A [x1 0 0; x2 1 0; x3 0 1] = [b1 a12 a13; b2 a22 a23; b3 a32 a33] = B1.   (6)

We multiplied a column at a time. Take determinants of the three matrices in (6) to find x1:

Product rule     (det A)(x1) = det B1     or     x1 = det B1 / det A.   (7)

This is the first component of x in Cramer's Rule. Changing a column of A gave B1. To find x2 and B2, put the vectors x and b into the second columns of I and A:

Same idea     A [1 x1 0; 0 x2 0; 0 x3 1] = [a11 b1 a13; a21 b2 a23; a31 b3 a33] = B2.   (8)

Take determinants to find (det A)(x2) = det B2. This gives x2 = (det B2)/(det A).

Example 1     Solving 3x1 + 4x2 = 2 and 5x1 + 6x2 = 4 needs three determinants:

Put 2 and 4 into each B     det A = det [3 4; 5 6]     det B1 = det [2 4; 4 6]     det B2 = det [3 2; 5 4]

The determinants of B1 and B2 are -4 and 2. Those are divided by det A = -2:

Find x = A^-1 b     x1 = -4/-2 = 2     x2 = 2/-2 = -1     Check  [3 4; 5 6] [2; -1] = [2; 4]
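Example 1 translates directly into MATLAB. A minimal sketch, replacing a column of A by b as in (6)-(7) and comparing with elimination (the backslash solver):

    % Cramer's Rule for 3x1 + 4x2 = 2, 5x1 + 6x2 = 4
    A = [3 4; 5 6];   b = [2; 4];
    B1 = A;  B1(:, 1) = b;            % replace column 1 of A by b
    B2 = A;  B2(:, 2) = b;            % replace column 2 of A by b
    x1 = det(B1) / det(A);            % -4 / -2 = 2
    x2 = det(B2) / det(A);            %  2 / -2 = -1
    disp([x1; x2])
    disp(A \ b)                       % elimination gives the same answer (and is faster)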
186 Chapter 5. Determinants and Linear Transformations

CRAMER'S RULE     If det A is not zero, Ax = b is solved by determinants:

x1 = det B1 / det A     x2 = det B2 / det A     ...     xn = det Bn / det A   (9)

The matrix Bj has the jth column of A replaced by the vector b.

To solve an n by n system, Cramer's Rule evaluates n + 1 determinants (A and the n different B's). When each one is the sum of n! terms--applying the "big formula" with all permutations--this makes a total of (n + 1)! terms. It would be crazy to solve equations that way. But we do finally have an explicit formula for the solution to Ax = b.

Example 2     Cramer's Rule is inefficient for numbers but it is well suited to letters. For n = 2, find the columns x and y of A^-1 by solving A A^-1 = A [x y] = I:

[a b; c d] [x1; x2] = [1; 0]     [a b; c d] [y1; y2] = [0; 1]

Those share the same matrix A. We need |A| and four determinants for x1, x2, y1, y2:

|A| = det [a b; c d]     det [1 b; 0 d]     det [a 1; c 0]     det [0 b; 1 d]     det [a 0; c 1]

The last four determinants are d, -c, -b, and a. (They are the cofactors!) Here is A^-1:

x1 = d/|A|,  x2 = -c/|A|,  y1 = -b/|A|,  y2 = a/|A|,   so   A^-1 = [d -b; -c a] / (ad - bc).

I chose 2 by 2 so that the main points could come through clearly. The key idea is:

A^-1 involves the cofactors. When the right side b is a column of the identity matrix I, as in A A^-1 = I, the determinant of each Bj in Cramer's Rule is a cofactor of A.

You can see those cofactors for n = 3. Solve Ax = (1, 0, 0) to find column 1 of A^-1:

Determinants of B's are cofactors of A
det B1 = det [1 a12 a13; 0 a22 a23; 0 a32 a33]     det B2 = det [a11 1 a13; a21 0 a23; a31 0 a33]     det B3 = det [a11 a12 1; a21 a22 0; a31 a32 0]

That determinant of B1 is the cofactor C11 = a22 a33 - a23 a32. Then det B2 is the cofactor C12. Notice that the correct minus sign appears in -(a21 a33 - a23 a31). This cofactor C12 goes into column 1 of A^-1. When we divide by det A we have computed the inverse.

FORMULA FOR A^-1     (A^-1)ij = Cji / det A     A^-1 = C^T / det A
5.2. Properties and Applications of Determinants 187

Areas and Volumes

We start in two dimensions, with a parallelogram--or with half of the parallelogram, which is a triangle. The problem is: Find the area of a triangle. We all learned that the area is (1/2)bh, half the base times the height. A parallelogram contains two triangles with equal area, so we omit the 1/2. Then

parallelogram area = bh = base times height   (10)

That formula is easy to remember, but we have a different problem, because all we know is the corners. For the triangle, those corner points are (0,0), (a,b), and (c,d). For the parallelogram (twice as large) the fourth corner will be (a+c, b+d).

To find the height, we could create a line from (c,d) that is perpendicular to the baseline. The length h of that line involves square roots. But ad - bc does not involve square roots, and it has a beautiful formula:

Area of parallelogram = Determinant of matrix = +- det [a b; c d] = |ad - bc|.   (11)

Our goal is to find that formula by linear algebra: no square roots or negative volumes. We also have a more difficult goal. We need to move into 3 dimensions and eventually into n dimensions. We start with four corners (0,0,0) and (a,b,c) and (p,q,r) and (x,y,z) of a box. (The box is not rectangular. Every side is a parallelogram. It will look lopsided.) If we use the area formula (11) as a guide, we could guess the correct volume formula:

Volume of box = Determinant of matrix = +- det [a b c; p q r; x y z]   (12)

Our first effort stays in a plane. For this case we use geometry. Figure 5.1 shows how adding pieces to a parallelogram can produce a rectangle. When we subtract the areas of those six pieces, we arrive at the correct parallelogram area ad - bc (no square roots). The picture is not very elegant, but in two dimensions it succeeds.
188 Chapter 5. Determinants and Linear Transformations

Area of parallelogram = (a+c)(b+d) - 2bc - ab - cd = ad - bc
Figure 5.1: Adding six simple pieces to a parallelogram produces a rectangle.

Would a similar construction be possible in three dimensions? Following Figure 5.1, I believe we could add simple pieces to make a tilted box into a rectangular box, but it doesn't look easy. And there is a much better way: use linear algebra.

Areas and Volumes by Linear Algebra

In working on this problem, I came to an understanding. If we do more algebra, then we need less geometry. Very often, linear algebra comes down to factoring a matrix. We will look there for ideas.

A box in n dimensions has n edges e1, e2, ..., en going out from the origin. The parallelogram in two dimensions had two vectors e1 = (a, b) and e2 = (c, d). Those vectors e give two corners, or n corners, of the "box". In the 2-dimensional picture, the fourth corner was e1 + e2. In the n-dimensional picture, the other corners of the box would be sums of the e's. The box is totally determined by the n edges in the matrix E:

Edge matrix     E2 = [a b; c d]     and     En = the n by n matrix whose rows are e1, ..., en

Our goal is to prove that the volume of the box is |det E|. We considered three possible factorizations of E to reach this goal. They are taken from Chapters 2 and 4 and 7. The third factorization is called the Singular Value Decomposition of E: the SVD.

Lower times upper triangular          E = LU
Orthogonal times upper triangular     E = QR
Orthogonal - Diagonal - Orthogonal    E = U Sigma V^T
5.2. Properties and Applications of Determinants 189

The problem is to connect the volume of the box to the determinant of E. Those factors of E are square matrices because E is square. The determinant of L is 1 (all ones on its diagonal). The determinant of any orthogonal matrix Q or U or V is 1 or -1. Then |det L| = |det Q| = |det U| = |det V| = 1. We will certainly depend on the multiplication formula for the determinant of a product, which now tells us that

|det E| = |det U| = |det R| = |det Sigma|   (13)

Multiplying by L or Q or U or V does not change the volume. Let me understand this first for E = Q times R.

Multiply by any matrix: straight lines stay straight. Multiply by an orthogonal Q: x^T x and x^T y are the same as (Qx)^T (Qx) and (Qx)^T (Qy). Then lengths and angles and box shapes and volumes are not changed by Q. This remains true for curved regions. We divide them into many small cubes plus thin curved pieces. The total volume of those curved pieces can approach zero. The volumes of the cubes are not changed by Q. The boxes for R and for E = QR have the same volume.

R is a triangular matrix! Its box has a volume we can compute. For a parallelogram in the xy plane, the base and height are exactly the diagonal entries of R:

R = [u v; 0 w]     base = u,  height = w     |area| = |u times w| = |det R|

The key point is: The main diagonal of R shows the height in each new dimension. When we multiply those numbers on the diagonal of R, we get the volume of the box and also the determinant of the triangular matrix R. The volume formula |det E| is now proved in all dimensions, because |det Q| = 1 and |det E| = |det R|.

Final comment: The Singular Value Decomposition E = U Sigma V^T has two orthogonal matrices U and V. The number |det E| is equal to |det Sigma|. And this matrix Sigma is diagonal. It gives a perfectly normal rectangular box in R^n.

This SVD approach by U Sigma V^T looks simpler than QR, which had a triangular matrix R producing a tilted box. But that tilted figure shows a clear geometric meaning for the diagonal entries of R: base and height. The geometry of the SVD will be seen in Chapter 7. It is beautifully clear for ellipses in n dimensions. But the singular values are not so clear for boxes. Sigma gives the lengths of the axes of an ellipse but not the sides of a rectangular box. For a box with straight sides, E = QR leads directly to volume = |det R|. The next section will allow any shape: not just boxes.
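Equation (13) and the diagonal-of-R argument can be checked numerically. A short MATLAB sketch with an arbitrary 3 by 3 edge matrix (qr is the standard factorization used above):

    % Volume of the box = |det E| = |det R| = |product of the diagonal of R|
    E = [1 2 0; 0 1 3; 2 0 1];        % edge matrix: its rows (or columns) are the edges
    [Q, R] = qr(E);                   % E = QR with Q orthogonal, R triangular
    volume1 = abs(det(E));
    volume2 = abs(prod(diag(R)));     % base times height times height ...
    disp([volume1, volume2])          % equal, since |det Q| = 1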
190 Problem Set 5.2 If<MA = 2.*hJ«arc Compute the determinants 1 2 о 1 1 1 1 1 О О 1 drt A'* and det A" and det AT ? of А. В, C, D. Are their columns independent? fl 2 3 B= 4 5 6 7 8 9 С = „««л •« "«»*“ rf««<ь« 3 0 0 x 0 0 x Whal are lhe cofactors of row 1 ? What is the rank of A? What are the 6 terms in det A? 4 (a) IfD. - «И(Л-). could Л oo even if all |XV| < 1 ? fl>) Could D. -»0 esen if all |Aj| > 1 ? Problems 5-9 art about Cramer’s Rule for x - A b. 5 Solve these linear equations by Cramer’s Rule Xj = det Д2/det A: (.1 2r|+Sx’”1 ,B) x, + 4xa = 2 2xi + x2 =1 (b) x, + 2xj + xj 0 xj + 2xj “ 0. 6 Use Cramer s Rule to solve for у (only). Call the 3 by 3 determinant D. (a) «x + by = I rx + dy = 0 ax + by + a - 1 (b) dr + ey + ft “ 0 gr + hy + it = 0. 7 Cramer’s Rule breaks down when det A - 0. Example (a) has no solution while (b) has infinitely many. What are lhe ratios x, = det B,/det A in these two cases? (P*alWlinc$) 0,1 S+te’Zi («««line) 8 Quick pmofof Cramer'i rule The determinant is a linear function of column I. ft is zero if two columns are equal. When b = Ax = X]Oi + x2a2 + x3a3 goes into the first column of A. lhe determinant of this matrix B\ is lb a3 a3| = |х|О| + xjOj + xjtij o2 a3| = xj |а2 a? a3| = z, det A. (a) What formula forxt comes from left side = right side? (b) Whal steps lead lo lhe middle equation? If the right side b is the first column of A. solve the 3 by 3 system Ax = b. How does each determmant in Cramer's Rule lead lo this solution x?
5-1. Ю 11 12 13 14 15 16 17 18 19 20 21 properties and Applications of Detemunanu 191 ГЫ .be _ { w _ (1#4). The c«nen ol.u,„,k„ft I|ed (a.4)-4(0 <« FigdtlKirel ™ ош„. И ta, «Ли,^ ta> b . , 1 1 1 -I 1 1 1 -1 What is |Я| 1 1 I 1 I I I 1 » volume of a hypercube in R‘? The sides have length 2 An n dimensional cube has how many comers"’ How many edges? How many (n - irnensonJ aces The cube in R* whose edges are the rows of 2/ has volume ------ • A hypercube computer has parallel processors at the comers with connections along the edges. The triangle with comers (0.0). (1.0), (0.1) has area 1. The pyramid in R1 with four comers (0,0,0), (1,0,0), (0,1.0), (0,0.1) has volume _______What is the vol- ume of a pyramid in R with five comers al (0,0,0,0) and the four columns of / ? Suppose E„ is the determinant of the tridiagonal 1,1,1 matnx of order n. By cofac- tors of row 1 show that En > En_, - Starting from E, - 1 and Ej - 0 find E3, By noticing how the Es repeat, find EIOo- Ез.Еч, Ft = IM Е,- £s- 1 1 1 1 1 1 1 1 0 1 From the cofactor formula AC1 = (det A)/ show that det C = (det A)—’. Suppose det A 1 and you know all the cofactors in C. How can you find A? If a 3 by 3 matrix has entries 1,2,3,4.9. what is the maximum determinant ? If the edge matrix E is orthogonal, the box has volume___. If the edge matrix E is singular, the box has volume___. If the volume in Rn is V. the box for 2E has volume____.. Draw parallelograms for ’jand^ ] j Can you see any reason for equal areas ? Transposing the edge matrix | J gives a matrix with the same determinant and a new parallelogram with the same area. Can you draw it and recompute its area ?
192 Chapter 5. Determinants and Linear Transformations

5.3 Linear Transformations

1   Linear transformations T obey the rule T(cv + dw) = cT(v) + dT(w).
2   Derivatives and integrals are linear transformations in function space.
3   Volumes of all shapes are multiplied by |det A| when every x goes to Ax.

Transformations T follow the same idea as functions. In goes a number x or a vector v, out comes f(x) or T(v). For one vector v or one number x, we apply the transformation T or we evaluate f(x). The deeper goal is to see all vectors v at once. We are transforming the whole space V.

Start again with a matrix A. It transforms v to Av. It transforms w to Aw. Then we know what happens to u = v + w. There is no doubt about Au: it has to equal Av + Aw. Matrix multiplication T(v) = Av is an example of a linear transformation.

A transformation T assigns an output T(v) to each input vector v in V. The transformation is linear if it meets these requirements for all v and w:

(a) T(v + w) = T(v) + T(w)     (b) T(cv) = cT(v) for all c.   (1)

Those rules tell us: If the input is v = 0, then the output must be T(0) = 0. No shift. T(0 + w) = T(0) + T(w). Removing T(w) from both sides leaves 0 = T(0). Combining the two rules tells us about linear combinations of v and w:

T(cv + dw) = T(cv) + T(dw) = cT(v) + dT(w)

Example 1     T1 rotates the whole xy plane by 90 degrees around the center point (0,0). This is a linear transformation! Straight lines will rotate into straight lines. A square will rotate into the same size square. The center point (0,0) does not move: T1(0) = 0. Requirement (1) for linear combinations cv + dw is satisfied.

The likable part of that example is: No matrix was needed. We can visualize linear geometry without linear algebra. If we have another linear transformation T2 of the xy plane, then T2 can follow T1 to produce T2T1: first find T1(v) and then apply T2.

Example 2     T2 reflects each vector (x, y) to its mirror image (x, -y) across the x axis. This is another linear transformation that doesn't need a matrix. Notice that T2T1 differs from T1T2: Reflecting the rotated vector is not the same as rotating the reflected vector. (1,0) rotates to (0,1) and reflects to (0,-1). But (1,0) reflects to (1,0) and rotates to (0,1).
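Example 4 (next page) will produce the matrix for a rotation; assuming that standard matrix with theta = 90 degrees, and the standard reflection matrix across the x axis, a short MATLAB check shows that T2T1 and T1T2 really differ:

    % Rotation by 90 degrees and reflection across the x axis do not commute
    T1 = [0 -1; 1 0];         % rotation by 90 degrees: (1,0) -> (0,1)
    T2 = [1 0; 0 -1];         % reflection across the x axis: (x,y) -> (x,-y)
    v = [1; 0];
    disp(T2 * (T1 * v))       % reflect the rotated vector: (0,-1)
    disp(T1 * (T2 * v))       % rotate the reflected vector: (0, 1)
    disp(T2*T1 - T1*T2)       % a nonzero matrix: the order matters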
5.3. Linear Transformations 193

Example 3     The length T(v) = ||v|| is not linear. Requirement (a) for linearity would be ||v + w|| = ||v|| + ||w||. Requirement (b) would be ||cv|| = c||v||. Both are false!

Not (a): The sides of a triangle satisfy an inequality ||v + w|| <= ||v|| + ||w||.
Not (b): The length ||-v|| is ||v|| and not -||v||. For negative c, linearity fails.

T (every vector) from T (basis vectors)

The rule of linearity extends to combinations of three vectors or n vectors:

Linearity     u = c1 v1 + c2 v2 + ... + cn vn   must transform to   T(u) = c1 T(v1) + c2 T(v2) + ... + cn T(vn)   (2)

The 2-vector rule starts the 3-vector proof: T(cu + dv + ew) = T(cu) + T(dv + ew). Then linearity applies to both of those parts, to give cT(u) + dT(v) + eT(w).

The n-vector rule (2) leads to the most important fact about linear transformations:

Suppose you know T(v) for all vectors v1, ..., vn in a basis. Then you know T(u) for every vector u in the space.

You see the reason: Every u in the space is a combination of the basis vectors vj. Then linearity tells us that T(u) must be the same combination of the outputs T(vj).

A key point about linear transformations: If we choose bases for the input space and the output space, then T can be specified by a matrix A. The rule for constructing A must use the two bases (which can be the same if output space = input space).

Step 1.   Apply the transformation T to each input basis vector vj.
Step 2.   Write the output T(vj) as a combination of the output basis vectors wi.
Step 3.   The coefficients Aij in that combination T(vj) = sum of Aij wi go into column j of A.

This matrix A finds the output T(v) for any input v. If the vector c = (c1, ..., cn) gives the input coefficients in v = c1 v1 + ... + cn vn, then b = Ac gives the output coefficients in T(v) = b1 w1 + ... + bm wm. T becomes multiplication by A.

Example 4     T is rotation by theta of the xy plane with basis vectors i and j.

T [1; 0] = [cos theta; sin theta]   and   T [0; 1] = [-sin theta; cos theta]   produce   A = [cos theta, -sin theta; sin theta, cos theta]
194 Chapter 5. Determinants and Linear Transformations

The Derivative is a Linear Transformation

It is linearity that allows us to find the derivative of u(x) = 6 - 4x + 3x^2. Start with the derivatives of 1 and x and x^2. Those functions are the basis vectors. Their derivatives are 0 and 1 and 2x. Then use linearity for the derivative of any combination like u = 6 - 4x + 3x^2:

du/dx = 6 (derivative of 1) - 4 (derivative of x) + 3 (derivative of x^2) = -4 + 6x.

All of calculus depends on linearity! Precalculus finds a few essential derivatives like x^n and sin x and cos x and e^x. Then linearity applies to all their combinations. I would say that the only rule special to calculus is the chain rule. That produces the derivative of a chain of functions f(g(x)) or f(g(h(x))). Needed in deep learning!

Nullspace of T(u) = du/dx     For the nullspace we solve T(u) = 0. The derivative is zero when u is a constant function. So the nullspace of d/dx is a line in function space--all multiples of the special solution u = 1. The derivative operator is not invertible.

Column space of T(u) = du/dx     In our example the input space contains all quadratics a + bx + cx^2. Their derivatives (the column space of T) are all linear functions b + 2cx. Notice that the Counting Theorem is still true:

dimension (column space) + dimension (nullspace) = 2 + 1 = 3 = dimension (input space)

What is the matrix for d/dx? I can't leave derivatives without asking for a matrix. T = d/dx is a linear transformation from the 3-dimensional input space V (= quadratics, basis 1, x, x^2) to the 2-dimensional output space W (= linear functions, basis 1, x). We know what T does to those basis functions: 1, x, x^2 go to 0, 1, 2x. If v1, v2, v3 were vectors, we would know the matrix A:

Matrix form of the derivative T = d/dx     A = [0 1 0; 0 0 2]     Input v = (6, -4, 3) for u = 6 - 4x + 3x^2     Multiplication Av = (-4, 6)     Output = -4 + 6x = du/dx   (3)
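The matrix in (3) can be applied in MATLAB exactly as described; a minimal sketch with the coefficients of u = 6 - 4x + 3x^2:

    % d/dx on quadratics a + b*x + c*x^2, input basis {1, x, x^2}, output basis {1, x}
    A = [0 1 0;
         0 0 2];            % derivative matrix: 1 -> 0, x -> 1, x^2 -> 2x
    v = [6; -4; 3];         % coefficients of u(x) = 6 - 4x + 3x^2
    dudx = A * v;           % coefficients of du/dx in the basis {1, x}
    disp(dudx)              % [-4; 6]  means  du/dx = -4 + 6x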
5.3. Linear Transformations 195

The Integral is a Linear Transformation

Next we look at integrals from 0 to x. They give the "pseudoinverse" of the derivative. The input space is now W (linear functions v = D + Ex) and the output space is V (quadratics). Integration from 0 to x is also linear, and it has a 3 by 2 matrix A^+:

Input v = D + Ex     Output = integral of v     T^+(v) = Dx + (1/2)E x^2     A^+ = [0 0; 1 0; 0 1/2]

A^+ inverts A where possible:

A A^+ = [1 0; 0 1]     but     A^+ A = [0 0 0; 0 1 0; 0 0 1]   (4)

The Fundamental Theorem of Calculus says that integration is the pseudoinverse of differentiation. For linear algebra, the matrix A^+ is the pseudoinverse of A.

The derivative of a constant function is zero. That zero is on the diagonal of A^+ A. Calculus wouldn't be calculus without that 1-dimensional nullspace of T = d/dx.

Example 6     Suppose A is an invertible matrix. Certainly T(v + w) = Av + Aw = T(v) + T(w). Another linear transformation is multiplication by A^-1. This produces the inverse transformation T^-1, which brings every vector T(v) back to v:

T^-1(T(v)) = v   matches the matrix multiplication   A^-1(Av) = v.

If T(v) = Av and S(u) = Bu, then T(S(u)) matches the product ABu.

We are reaching an unavoidable question. Are all linear transformations from V to W produced by matrices? When a linear T is described as a "rotation" or a "projection", is there always a matrix A hiding behind T? Is T(v) always Av? The answer is yes! This is an approach to linear algebra that doesn't start with matrices, but every linear T still ends up with a matrix, once bases are chosen.

A shift of the whole plane is not linear (it moves the zero vector). Still, computer graphics has found a way to use a matrix--but it is 3 by 3. Every point (x, y) is given the "homogeneous coordinates" (x, y, 1). Then you can shift a page by using matrices. A diagonal matrix rescales a page and an orthogonal matrix rotates a page.
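MATLAB's pinv recovers exactly that integration matrix, and the two products in (4) show the one-sided inverse; a quick sketch with the A and A^+ above:

    % Integration is the pseudoinverse of differentiation on these small spaces
    A     = [0 1 0; 0 0 2];          % derivative matrix (2 by 3)
    Aplus = [0 0; 1 0; 0 1/2];       % integration matrix from 0 to x (3 by 2)
    disp(pinv(A))                    % same as Aplus
    disp(A * Aplus)                  % 2 by 2 identity: derivative of the integral of v is v
    disp(Aplus * A)                  % not the identity: the constant was lost (the nullspace)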
196 Chapter 5. Determinants and Linear Transformations

The Geometry of Linear Transformations

Suppose T is a linear transformation of the xy plane. Figure 5.2 shows what happens: straight lines are transformed into straight lines, and equal spacing goes to equal spacing. The vectors v1 and v2 are a basis. If we know T(v1) and T(v2), then we know T(v) for all other points v = c1 v1 + c2 v2, because T is linear. A triangle with corners 0, v1, v2 transforms into a triangle with corners 0, T(v1), T(v2). The areas of the original and transformed triangles are connected by the determinant formula:

area of transformed triangle / area of original triangle = |det A|

What is the determinant of a linear transformation T of the xy plane? Use its matrix!

Figure 5.2: Lines to lines, equal spacing to equal spacing, u = 0 to T(u) = 0.

One key point is about areas, when a linear T transforms one part of a plane to another part in Figure 5.2. The area of the big triangle is multiplied by a number. The area of every small triangle is also multiplied by that number. More is true: Every circle and square and every shape whatsoever has its area multiplied by that same number. We just fill the shape with small squares, as closely as we want. Their areas are all multiplied the same way. Section 5.2 discovered that the area multiplier is the determinant of the matrix E. Then E takes squares into parallelograms and circles into ellipses.

Why Do Determinants Multiply?

This valuable property |det AB| = |det A| |det B| looks messy to prove. It is buried somehow in the big formula for the determinant. Here are two very different proofs, one from geometry using volumes, and one from logic using properties of the determinant.

Geometry     Start with a standard cube in n dimensions. Each edge has length 1. The n edges going out from (0, ..., 0) are the rows of the n by n identity matrix I. The volume of the cube is 1, which is the determinant of I.

Now multiply every point in the cube by the matrix B. This gives a box with sides from B. We know that volume of box = determinant of B. No problem so far, except for some risk that our logic might become circular.
s j. Linear Transformations 197 Now multiply every point in that B-boX bv th. Altogether we have multiplied the columns of hTJ*?? A ’П,'‘ giv*‘ * nc* Л/?Ьо* haS volume = |det AB |. •denuty matnx by AB. so the Л B-box But also we have multiplied the B-box bv A u-x. volume is multiplied by | det A |. Since the В L к . Wy *** “ mul,'Pl'cd by \‘te v >|ume = |dct A | |dct В |. •>*» *®'ume | del В |. the Л B-box has ( ) we give three properties of all determinants. 1. The determinant of A = I is j. 2. The determinant reverses sign if two row. of A are exchanged 3. The determinant is a linear function of each row separately From those wrtb 3 we can make any matnx tnangular and find its determinant. R“ в ? *. a multiX 7Га"ОП$ 'ke cl,m"ut,ofl R“le 2 allows us to exchange rows Ru)e 1 sets a multiplying factor to 1. for the volume of a unit cube The product rule comes fn>m cbeckmg tKa, R three properties. Then that ratio must equal det A. Thus det AB = (det A) (det B). Change of Bases Suppose the input space and output space are both R2. Suppose v2 is the input has.* Md w,, W, .. the output bmis. What is the matnx В (us.ng these bases) for the Entity transformation T(o) = v?No, always /! В is the “change of bas.s matnx”. WhenV=[vt Ua]andlVa[W| w3 ], the change of basis matrix is В = W~ ‘V. For any linear transformation, its matrix A (In the old bases) changes to W~lAV. I see a clear way to understand those rules Suppose the same vector u is written in the input basis of v’s and the output basis of w’s. ThenT = /. Its matnx gives d - etl Г 1141 U = C|V! +•• +c„v„ u = diWt + • •• + dnWn ,S »l ••• On Wi w„ and Vc = W d The coefficients d in lhe new basis of w'sare d = W~*Vc. Then В is W~lV. For a transformation T between spaces V and W. we insert the matrix for T to get W~l AV: Change of basis leads to W *4V not И’Л V. Larger vectors w have smaller coefficients d!
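Returning to the product rule det AB = (det A)(det B) just proved: a numerical spot-check takes only a few lines of MATLAB (one fixed pair of matrices and one random pair; any square matrices of equal size would do):

    % Determinants multiply
    A = [2 1; 1 2];   B = [0 1; -1 3];
    disp([det(A*B), det(A)*det(B)])            % 3 and 3
    A = rand(4);      B = rand(4);             % a random pair
    disp(det(A*B) - det(A)*det(B))             % zero up to roundoff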
198 Chapter 5. Determinants and Linear Transformations

Looking ahead:
Chapter 6     Eigenvectors of A (they are orthogonal when A is symmetric): in that basis the matrix becomes diagonal.
Chapter 7     Singular vectors u and v for every A: the orthogonal form A = U Sigma V^T.

Linear Transformations of a House

When a 2 by 2 matrix A transforms the plane, it is more interesting to watch a whole shape than a few single vectors Av. So we use a "house" with eleven corners. The house matrix H is 2 by 12: its columns are the corner points of the house, and the twelfth column repeats the first corner to close up the outline. A multiplies all the points at once: the matrix multiplication AH produces the new house. Straight lines stay straight, so each new house is drawn by connecting the transformed corners (the columns of AH) in order. The figures will show four houses AH for four different matrices A.

House matrix     H = [-6 -6 -7 0 7 6 6 -3 -3 3 3 -6; -7 2 1 8 1 2 -7 -7 -2 -2 -7 -7]
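Here is a minimal MATLAB sketch that draws the house H above and one transformed house AH. The particular A (a shear) is my own choice--any 2 by 2 matrix can be tried:

    % Draw the house and a transformed house A*H
    H = [-6 -6 -7  0  7  6  6 -3 -3  3  3 -6;
         -7  2  1  8  1  2 -7 -7 -2 -2 -7 -7];   % 2 by 12 house matrix
    A = [1 0.3; 0 1];                            % try any 2 by 2 matrix here (this one shears)
    AH = A * H;                                  % every corner is multiplied by A
    plot(H(1,:), H(2,:), 'b-', AH(1,:), AH(2,:), 'r-')
    axis equal                                   % straight lines stay straight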
199 jj. Linear Transformation! problem Set 5.3 1 2 3 4 5 б 7 8 9 10 A linear transformation must leave the »» — - , T(v + w) = T(tt) + T(w) by с»юомп^ T(0) = 0 Provc lh'4 fr°m alsofromT(cv) L ^(.J * Suppose a linear T transforms (1,1) to (2,2) (2fl) * (() fl) RwJ r(v). (.)v = (2,2) (b) (c) (d) v = (efc) Which of these transformations are not linear? The input i. v - (v,. v,): (a) T(v)-(«>,„) (b) T(V),(VliVl} (c) r(r)_(0.ri) (d)TO-(o,i) (e) nvj.v.-.. (f) T(e) = VlP2. If S and 7 are linear transformations, is T(S(»)) linear or quadratic ? (a) (Special case) If S(v) - v and 7(e) > ®. then T(S(®)) - v ot v11 (b) (Generalcase)S(®1+Vj) . ,+»,) - 7(v,) + 7(v2) combine into T(S(v> 4-V])) ш T(_____)= \ __ Sup’TSe^(w)x " ® «сеР“»ш 7(0,4,) . (0,0). Show that this transformation satisfies 7(cv) = cT(v) but does not satisfy T(v + w) - 7(t>) + 7(w). True or False: If we know 7(«) for n different nonzero vectors in R". then we know 7(v) for every vector v in R". Which of these transformations satisfy T(v + w) - T(v) + T(w) and which satisfy T(ct>) •= cT(w) ? (a) T(w) - t>/|M| (b) T(v) . v, +vj+vj (c) 7(„) . (₽1,2vjt (d) T(w) = largest component of ®. How can you tell from the picture of 7 (house) that A is (a) a diagonal matrix ? The house expands or contracts along each axis. (b) a rank-one matrix ? (c) a lower triangular matrix ? Draw a picture of 7 (house) for these matrices: D=[o i] «d л=[1 л] tf-[J i] • What are the conditions on A = [" $] to ensure that 7 (house) will (a) sit straight up? (b) expand the house by 3 in all directions? (c) rotate the house with no change in its shape?
Chapter 5. Determinants and Linear Transfo^ 200 11 12 13 nuter sketch the bouses .4 . H for these matrices Д: Without a computer sxc< [i.:] - '] “* [-’ J “ N- .... .. „„ ()K 4 = ad - be ensure that the output house AH will What conditions on <iet t (a) be squashed onto a line. <b) keep its endpoints in clockwise order (not reflected)? (c) base the same area as the original house? This code creates a vector theta of 50 angles It draws the unit circle and it draws T (circle) = ellipse. The multiplication Av takes circles to ellip^ A-[21;12] % Мэи can change A theta • [О* * I*502 • PH: * 50 an0*es arete (cosfthete); s«(theU)); % 50 points ellipse-A.drete;* arete to ethpse axis(H 4 -4 4[). aaisCsquare’) ptot(oecte(1. ). drete(2.:). еде!.:). ««PH2.:)) 14 Suppose the spaces V and W have lhe same basis B|,va. (a) Describe a transformation T (not /) that is its own inverse. (b) Describe a transformation T (not /) that equals Г3. (c) Why can’t the same T be used for both (a) and (b)? Questions 15-18 are about changing the basis. 15 (a) What matrix transforms (1,0) into (2,5) and transforms (0,1) to (1,3)7 (b) What matrix C transforms (2,5) lo (1,0) and (1,3) to (0,1)? (c) Why does no matrix transform (2,6) to (1,0) and (1,3) to (0,1)? 16 (a) What matrix .If transforms (1,0) and (0,1) to (r, t) and (a, u)? (b) What matnx .V transforms (a,c) and (b,d) lo (1,0) and (0,1)? (c) What condition on c, b, e, d will make part (b) impossible? 17 (a) How do .If and N in Problem 16 yield lhe matrix thal transforms (a, c) to (r t) and (6. d) to (a,«)? 7 (b) What matrix transforms (2,5) to (1,1) and (1,3) to (0,2)? 16 If you keep the same basis vectors but put them in a different order, lhe change of .. "Г? ir *------------------niatnx. If you keep the basis vectors in order but change their lengths. В is a___matrix. 19 Why is integration not the inverse of differentiation ?
6 Eigenvalues and Eigenvectors

6.1 Introduction to Eigenvalues
6.2 Diagonalizing a Matrix
6.3 Symmetric Positive Definite Matrices
6.4 Systems of Differential Equations

Eigenvalues lambda and eigenvectors x obey the equation Ax = lambda x. Here A is a square matrix and lambda is a number. The vector Ax is in the same direction as x--that is unusual. If we find n of these eigenvectors, we can turn an n-dimensional problem for A into n simple one-dimensional problems, one for each eigenvector. Here is an example: solve u_{k+1} = A u_k, starting from u_0. Each eigenvector is simply multiplied at every step by its own eigenvalue:

Input  u_0 = c1 x1 + ... + cn xn     Output  u_k = A^k u_0 = c1 (lambda_1)^k x1 + ... + cn (lambda_n)^k xn

The differential equation du/dt = Au is solved at time t in exactly the same way. The numbers lambda^k change to e^(lambda t).

In matrix language, the matrix X of eigenvectors turns A into X^-1 A X = Lambda. This is the diagonal matrix of eigenvalues. A diagonal matrix means that the system is uncoupled into n easy equations like du/dt = lambda u, solved by exponentials e^(lambda t).

Sections 6.1 and 6.2 present the key ideas of eigenvalues. Then Section 6.3 goes from general matrices A to symmetric matrices S. The eigenvectors of S are orthogonal. Their matrix X becomes Q with Q^T Q = I. And with positive eigenvalues lambda > 0, we have the best matrices in pure and applied mathematics: symmetric and positive definite. Positive definiteness can be tested five ways--by positive pivots and determinants and eigenvalues and energy (and by S = A^T A). The central ideas of this book are coming together for those best matrices.

201
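The input-output display above can be checked in a few lines of MATLAB. The matrix and starting vector here are arbitrary choices; eig supplies the eigenvectors X and eigenvalues Lambda:

    % u_k = A^k u0 = c1*lambda1^k*x1 + ... + cn*lambdan^k*xn
    A  = [.8 .3; .2 .7];
    u0 = [1; 0];
    [X, Lam] = eig(A);            % columns of X are eigenvectors, Lam is diagonal
    c  = X \ u0;                  % expansion coefficients: u0 = X*c
    k  = 10;
    uk_direct = A^k * u0;
    uk_eigen  = X * Lam^k * c;    % each eigenvector multiplied by its own lambda^k
    disp([uk_direct, uk_eigen])   % the two columns agree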
202 Chapter 6. Eigenvalues and Eigenvectors

6.1 Introduction to Eigenvalues

1   An eigenvector x lies along the same line as Ax: Ax = lambda x. The eigenvalue is lambda.
2   If Ax = lambda x then A^2 x = lambda^2 x and A^-1 x = lambda^-1 x and (A + cI)x = (lambda + c)x: the same x.
3   If Ax = lambda x then (A - lambda I)x = 0 and A - lambda I is singular: det(A - lambda I) = 0. n eigenvalues.
4   Check lambda's by det A = (lambda_1)(lambda_2)...(lambda_n) and by diagonal sum a11 + a22 + ... + ann = sum of lambda's.
5   Projections have lambda = 1 and 0. Reflections have 1 and -1. Rotations have e^(i theta) and e^(-i theta): complex!

This chapter enters a new part of linear algebra. The first part was about Ax = b: linear equations for a steady state. Now the second part is about change. Time enters the picture--continuous time in a differential equation du/dt = Au, or time steps in a difference equation u_{k+1} = A u_k. Those equations are NOT solved by elimination.

The key idea is to find solutions u(t) that stay in the direction of a fixed vector x. We want "eigenvectors" that don't change direction when you multiply by A. The eigenvector-eigenvalue equation is Ax = lambda x. We look for n eigenvectors x and their eigenvalues lambda. Then A^2 also has those eigenvectors: A^2 x = A(lambda x) = lambda(Ax) = lambda^2 x.

A good model comes from the powers A, A^2, A^3, ... of a matrix. Suppose you need the hundredth power A^100. Its columns are very close to the eigenvector x1 = (.6, .4):

A, A^2, A^3 = [.8 .3; .2 .7]   [.70 .45; .30 .55]   [.650 .525; .350 .475]     A^100 is close to [.6000 .6000; .4000 .4000]

A^100 was found by using the eigenvalues of A, not by multiplying 100 matrices. Again: each eigenvector x has an eigenvalue lambda with Ax = lambda x. Then A^100 x = lambda^100 x.

To explain eigenvalues, we first explain eigenvectors. Almost all vectors change direction when they are multiplied by A. Certain exceptional vectors x are in the same direction as Ax. Those are the eigenvectors. Multiply an eigenvector by A, and the vector Ax is a number lambda times the original x.

The basic equation is Ax = lambda x. This leads to (A - lambda I)x = 0.

The eigenvalue lambda tells whether the eigenvector x is stretched or shrunk or reversed or left unchanged--when it is multiplied by A. We may find lambda = 2 or 1/2 or -1 or 1. If lambda = 0 then Ax = 0x means that this eigenvector x is in the nullspace of A.

If A is the identity matrix, every vector has Ax = x. All vectors are eigenvectors of I. All eigenvalues "lambda" are lambda = 1. This is unusual to say the least. Most 2 by 2 matrices have two eigenvector directions and two eigenvalues: Ax1 = lambda_1 x1 and Ax2 = lambda_2 x2.

This section will explain how to compute the x's and lambda's. It can come early in the course because we only need the determinant of a 2 by 2 matrix A - lambda I.
6.1. Introduction to Eigenvalues 203

Example 1     A = [.8 .3; .2 .7]     det(A - lambda I) = det [.8-lambda .3; .2 .7-lambda] = lambda^2 - (3/2)lambda + 1/2 = (lambda - 1)(lambda - 1/2).

I factored the quadratic into lambda - 1 times lambda - 1/2, to see the two eigenvalues lambda = 1 and lambda = 1/2. For those numbers, the matrix A - lambda I becomes singular (zero determinant). The eigenvectors x1 and x2 are in the nullspaces of A - I and A - (1/2)I:

(A - 1I)x1 = [-.2 .3; .2 -.3] x1 = [0; 0]   and the first eigenvector is x1 = (.6, .4)
(A - (1/2)I)x2 = [.3 .3; .2 .2] x2 = [0; 0]   and the second eigenvector is x2 = (1, -1)

If x1 is multiplied again by A, we still get x1. Every power of A will give A^n x1 = x1. Multiplying x2 by A gave (1/2)x2, and if we multiply again we get (1/2)^2 times x2.

When A is squared, the eigenvectors stay the same. The eigenvalues are squared.

This pattern keeps going, because the eigenvectors stay in their own directions (Figure 6.1). They never get mixed. The eigenvectors of A^100 are the same x1 and x2. The eigenvalues of A^100 are 1^100 = 1 and (1/2)^100 = a very small number.

Figure 6.1: The eigenvectors of A are also eigenvectors of A^2: eigenvalue = lambda^2.

Other vectors like (.8, .2) do change direction. But all other vectors are combinations of the two eigenvectors x1 = (.6, .4) and x2 = (1, -1). The first column of A is x1 + (.2)x2:

Separate other vectors into eigenvectors     [.8; .2] = x1 + (.2)x2 = [.6; .4] + [.2; -.2]   (1)
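A quick MATLAB check of this example (eig may scale and reorder the eigenvectors, so compare directions rather than exact columns):

    % Eigenvalues 1 and 1/2 of A, and the approach of A^100 to the steady state
    A = [.8 .3; .2 .7];
    [X, Lam] = eig(A);
    disp(diag(Lam)')              % 0.5 and 1 (possibly in either order)
    disp(X)                       % columns are multiples of (1,-1) and (.6,.4)
    disp(A^100)                   % every column is very close to (.6, .4)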
Chapter 6. Eigenvalues and bigCnVcc tar. 204 When we multiply Xi + Multiply Xi •» by A = 1 and Л = Fw molo^io. Ь, A » b by (J). Tbe »m eiseoree.», b b, l.Tben99«4»F«<D"sma"<’> lppem“Л'”: This is the first column of A’00- The number we originally wrote as .6000 was not exact. We left out (.2) (J)" whkh wouldn’t show up for 30 decimal places. The eigenvector x( is a "steady state” that doesn’t change (because A, = 1). The eigenvector x2 is "decaying” and virtually d.sappears (because A, - |). The higher lhe power of A. the more closely its columns approach the steady state. This particular A is a Markor matnx: Columns add to I. Its largest eigenvalue is A 1. Its eigenvector X| (.6, .4) is the Heady state—which all columns of A* will approach, A giant Markov matrix is the key to Google’s search algorithm—which is truly fat. Other matnces have other eigenvalues. Projection matrices have A s 1 for vectors in the column space and A = 0 for vectors in thc nullspace (projected to the zero vector). Then Pxt - *i and Px2 - 0. We have P3 = P because 1’ = I and 02 - 0. Example 2 The projection matrix /’ • $ j has eigenvalues A « 1 and А ж 0. Its eigenvectors are x> “ (1,1) and zj « (1,-1). For those vectors. Px} equals «i (steady state) and Pz2 • 0 (nullspace). Then P2x1 “ Xi and P2x2 » 0 and P2 = P. Our examples illustrate Markov matrices and singular matrices and symmetric matrices: Those matrices have special A’s and special eigenvectors: I. Markov matrix: Each column adds to 1. This makes A = 1 an eigenvalue. 2. Singular matrix: Some vector has Ax = 0. Then A = 0 is an eigenvalue. 3. Symmetric matrix The eigenvectors (1,1) and (1, -1) are perpendicular. The only eigenvalues of a projection matrix are 0 and 1. The nullspace is projected to zero. The column space projects onto itself. The projection keeps the column space and destroys thc nullspace. so A = I and A = 0: [11 Г21 fol -1 + 2 projects onto Pv = q + The next matrix R is a reflection matrix and also a permutation matrix.
I. Introduction to Eigenvalues 205 еитриз ТЪ.г.Пееи.пт.ж.я, [?.| ta>«i|wmle,I1„t -1. The eigenvector (1,1) is unchanged by Л The are reversed by R. A matnx with no negatived e'8“vectw “ ~1 Ils «£ ° „„vm-tnr» for n .i 8a,,ve entries can suit have a negative eigenvalue! The eigenvectors for R are the same as for P, because reflection = 2(pr»jection) - I. Eigenvalues of R = 2P - I A =2(1)- bl md Аж 2(0) _ J _ -1 (2) whe„ a matrix is shifted by I, each A is shifted by 1. No change tn it* eigenvectors. The Equation for the Eigenvalues For projection matrices we found A’s and xi by geometry: Px - x and Px - 0. For other matrices we use determinants and linear algebra This is the key calculation in the chapter—almost every application Mans by solving Ax = Xx First move Ax to the left side Write the equafon Ax = Ax a* (A - A/)x - 0. rA< eigenvector, «« tn the nullspace of A - AZ When we know an eigenvalue A. we find x by solving (A - AZ)x a 0. Eigenvalues first. If (Л - AZ)x ж 0 has a nonzero solution. A - Af is not invertible. The determinant of A — XI must be zero. This is bow to recognize an eigenvalue A: Eigenvalues A к an eigenvalue of A о A - AZ is singular Equation for A det (A - AZ) = 0 (3) This '‘characteristic polynomiaP det(A - AZ) involves only A. not x. When A is n by n, equation (3) contains A". So A has n eigenvalues (repeats possible!) Each A leads to x: For each eigenvalue A solve (A - AZ)x = 0 or Ax = Ax to find an eigenvector x. Example 4 A = * * j is already singular (zero determinant). Find its A's and x's. When A is singular, A = 0 is one of the eigenvalues The equation Ax = Ox has solutions. They are the eigenvectors for A = 0. Solving det (A - AZ) » 0 is the way to find all A’s and x’s. Always subtract AZ from A: 1 - A 2 Subtract A from the diagonal of A to find A - XI = ? 4 - A ‘ Take the determinant uad — be” of this 2 by 2 matrix. From 1 - A times 4 - A, the “ad" part is A2 - 5A + 4. The “be" part without A is 2 times 2. Subtract: 4*a]»(1-AX4-A)-(2X2) = A’-5A. (5)
206 Chapter 6. Eigenvalues and E,8cnv«*t<n 1» - 5A to -его One solution is Aj -°- This was expected this ^.nont X fX д dmes A - 5. the other eigenvalue is д2 Л' A is singular. Factoring л . •* - 5; toM-AI)-A’-»-» A, = 0 >nd A> = S Now find the eigenvectors. Sohr (A - AZ)z = 0 separately for At =0 and A — e 2 5. M _ 0/)x = |J Jj Щ = fJj Vld(fa „ eigenvector j = 2j forA2 = 5 The matrices A - 01 and .4 - 51 are singular (because A - 0 and 5 are eigenvalues). The eigenvectors (2.-1) and (1,2) are in the nullspaces: (A - XI)x - 0 is Ax = We need to emphasize: There is nothing exceptional about A = 0. Like every other number, zero might be an eigenvalue and it might not If A is singular, the eigenvectors for A = 0 fill the nullspace: Ax — Ox - 0 If A is invertible, zero is not an eigenvalue. We shift A by a multiple of I to make it angular. In Example 4 the shifted matrix A - 51 is singular. Then 5 is the other eigenvalue of A. Summary To find the eigenvalues of an n by n matrix, follow these steps: best if n s 2 1. Compute the determinant of A - AZ. With A subtracted along the diagonal. this determinant starts with A" or -A". It is a polynomial in A of degree n. 2. Find the roots of this polynomial, by solving det (A - AZ) = 0. The n roots are the n eigenvalues of A. Those n numbers make A — AZ singular. 3, For each eigenvalue A: Solve (A - AZ)z = 0 to find an eigenvector: Ax = Az. 4. The eigenvalues of a triangular matrix are the numbers on its diagonal. A note on the eigenvectors of 2 by 2 matrices. When A - AZ is singular, both rows are multiples of a vector (a, b). The eigenvector direction is (6, —a). Tbe example had A = 0: rows of A - 0Z in the direction (1,2); eigenvector in the direction (2, -1) A = 5 : rows of A - 5Z in the direction (-4.2); eigenvector in the direction (2,4) Previously we wrote that last eigenvector as (1,2). Both (1,2) and (2,4) are correct There is a whole line of eigenvectors—any nonzero multiple of z is as good as x. MATLAB's dg(A) divides by the length of z. to make the eigenvector into a unit vector.
5.1. Introduction lo Eigenvalues 207 Determinant and Trace Bad news first: If you add a row of A to amwK- eigenvalues usually change. Elimination does T U has its eigenvalues sitting along the diaeonSl? «** А к The upper triangular eigenvalues of U but not of A ! Eigenvalues are Г u pnXXs b r. _i gtnvatues are changed when row 1 is added lo row 2: U has A = Oand A = 1; д ^XXX^s^^ and ,he ™ A«+* - «•*** h ' m The Slim nf ’ P10**0* “ 0 times 4. That agrees with the determinant (Wh' J?’, “ 0 + 4 ТЫ agrees with the sum down the main has A = 0 and A = 4. The product of the n eigenvalues equals the determinant. The sum of the n eigenvalues equals the sum of the n diagonal entries. The sum of the entries of A along the main diagonal is called the truce of A: 6 (6) Those checks are very useful. They are proved in PSct 6.1 and again in Section 6.2. They don’t remove the pain of computing A’s. But when the computation is wrong, they generally tell us so. To compute the correct A’s, go back to det (A - Af) = 0. The trace and determinant do tell everything when the matrix is 2 by 2. We never want to get those wrong! Here trace = 3 and det = 2. so these matrices have A = 1 and A = 2: (7) And here is a question about the easiest matrices for finding eigenvalues: triangular A. Why do the eigenvalues of a triangular matrix A lie along its diagonal ? Imaginary' Eigenvalues One more possibility, not too terrible. The eigenvalues might not be real numbers. Example 5 The 90° rotation Q = [? ~o] ^as 1,0 rea^ eigenvectors. Its eigenvalues are Aj = t and Aj = —i. Then Ai + Aj = trace = 0 and AjAj = determinant = 1. After a rotation, no real vector Qx stays in the same direction as x (x — 0 is useless). There cannot be an eigenvector, unless we go to imaginary numbers. Which we do. To see how i = y/^T can help, look at a rotation Q through 90°. Then Q2 is rotation through 180°. Its eigenvalues are —1 and —1 because —Jx = —lx. Squaring Q will square each A. so we must have A2 = -1. The eigenvalues of the 90° rotation matrix Q are +i and —i, because i2 = —1.
Chapter 6. Eigenvalues and Eii 208 Those A’s come as usual from det(Q - AZ) = 0. This equation is A2 Its roots are i and -i. We meet the imaginary number i also in the eigenvectors cpu. rc *UJ ”* l* №‘l!]- eigenvectors L , = (1 i) and x2 = (*. 1) keeP ±е1г dlrccti<>n as they Somehow the complex vectors ,mportant poinl that real matrices are rotated. Don't ask me bow- • The particular eigenvalues i and •“ »»™ «. p. p malnces Q ta„ 1|(W| _ )M 1. The absolute value of each = X is pure imaginary. 2. This Q is a skew-symmetric matnx (У ,ct _ Cl can be compared to a real number: A is real. A symmetric matnx (5 - h) can * to. qmntok man. (Лт = -Л) is Mu ” "“P"”’ "umb": A “ « «togtol man. (9’0 - П ‘ ”“тЬ" W "1 , cneeial matrices S and A and Q are perpendicular. Somehow Eigenvalues of AB and A+B The first guess about the eigenvalues of AB is not true. An eigenvalue A of A times an eigenvalue 3 of В usually does not give an eigenvalue of AB: False proof ABx = Apx = pAx = pXx. (8) When x is an eigenvector for A and B. this equation is correct. The mistake is to expect that A anti В automatically share the same eigenvector x. Usually they don't. Eigenvectors of A are not generally eigenvectors of B. These singular matrices A and В have all zero eigenvalues while 1 is an eigenvalue of AB and A + В: and В then AB and A + В = qJ • The eigenvalues of A + В are generally not X + P. Here A + p = 0 while A + В has eigenvalues 1 and -1: trace = 0 and determinant = -1. The false proof suggests what is true. Suppose x really is an eigenvector for A and B. Then we do have ABx = XPx and В Ax = XPx. When all n eigenvectors are shared by A and B, we гол multiply eigenvalues. The test AB = BA for shared eigenvectors is important in quantum mechanics—time out to mention this application of linear algebra.
6 I. Introducuon to Eigenvalues 209 |7^d В share the same n independeni'^^ if and only if AB = BA | Heisenberg’s uncertainty principle In quantum mechanics, the position matrix P and the momentum matrix Q do not commute. In fact QP - PQ = / (these are infinite matrices). To have Px - 0 at the same time as Qx = о would require x = lx = 0. But if we knew the posibon exactly, we could not also know the momentum exactly. Heisenberg’s uncertainty principle JPxfi ||Qx|| > l|xp b ln Problcm WORKED EXAMPLES 6.1 A Find the eigenvalues and eigenvectors of Л and A2 and A~1 and A + 4Z: and A2 Check that the trace Ai + Aj = 2 + 2 = 4 and the determinant is AjAj =4-1=3. We don’t need to compute A2 to find its eigenvalues A2. Solution The eigenvalues of A come from det(X - AZ) = 0: det( A - AZ) = 2-Л -1 1 I = A2 - 4A+ 3 = 0. 2 This factors into (A — 1)(A — 3) = 0 so the eigenvalues of A are Ai = 1 and Aj = 3. For the trace, the sum 2 + 2 agrees with 1+3. The determinant 3 agrees with the product Ai Aj. The eigenvectors come separately by solving (A — AZ)x = 0 which is Ax = Ax: A = 1: (A-Z)x gives the eigenvector Xj = 1 1 л=з: _;][;]-[;] gives the eigenvector Xj A2 and A-1 keep the same eigenvectors as A Their eigenvalues are A2 and A 1. A2 has eigenvalues I2 = 1 and 32 = 9 A 1 has — and - A + 4Zhas ^-^4 = 7 Notes for later sections: A has orthogonal eigenvectors (Section 63 on symmetric matrices). A can be diagonalized since Aj / Aj (Section 6.2). A is similar to any 2 by 2 matrix with eigenvalues 1 and 3 (Section 6.2). A is л positive definite matrix (Section 6.3).
210 Chapter 6. Eigenvalues and Eigenvector te the eigenvalues of any A ? Gershgorin gave this answer. 6.1 В How can you es onc of the entries^. on the main diagonal. Every eigenvalue of Л must be W » (han the sum Щ of all other |% j I«.,l» “““ °f • “""w ««... in that row. of the ma diagonal entries а« |o« - A| < R, Every A b m the circle = ° of x is x2. Then Fro# Suppose (- _ A| < |aal | |z2| + |aM| |Ia| . , - Ain + “»ls ~ ° 8 ajtxi + («и si p and A is inside the second Gershgonn circle. Dividing by |xj| leaves |o» — - "2 Example I. Every eigenvalue A of this A falls into one or both of the Gershgonn circl The centers аге a and d. the radii are Я> = |6f and R2 = |c|. И Га 6 1 First circle: |A - a| < |6| “ [ c d | Second circle: |A - d| < |c| Those are circles in the complex plane, since A could certainly be complex. Example 2. All eigenvalues of this A lie in a circle of radius R = 3 around one or mo of the diagonal entries d\,d2, d3: |A-di|<l+2 = fll |A - <Ы < 2 + 1 = R2 |A - d,| < 1 + 2 = R3 You see that “near" means not more than 3 away from dt or d2 or d3, for this example. 6.1 C Find the eigenvalues 0,1,3 and eigenvectors of this symmetric 3 by 3 matrix S: Symmetric matrix Singular matrix 1 -1 0 -1 2 -1 0‘ -1 1 All eigenvalues are in the Gershgonn circle |A - 2| < 1 + 1. Solution Since all rows of 5 add to zero, the vector x = (1,1,1) gives Sx = 0. This is an eigenvector for A = 0. To find Aj and Аз I will compute the 3 by 3 determinant: 1-A det(5-A/)= -1 0 Those three factors give A = Pl *1 = 1 1 Sx)= 0z[ -1 2-A -1 = (1 - A)(2 — A)(l — A) — 2(1 - A) = (1 — A)[(2 — A)(l — A) — 2] 0 -1 1-A| = (1 — А)(—Л)(3 — A). — 0,1,3. Each eigenvalue corresponds to an eigenvector: ’ 1 0 -1 Sx2 = lx2 x3 = 1‘ -2 1 Sx3 =Зхз. u J ’ ei8envecl<*s « perpendicular when S is symmetric. We were lucky to тъГа|'| ’ 1WOU*<1 use eig(A). and never touch determinants. u comma [ •£]=eig(A) will produce unit eigenvectors in the columns of X. S =
$.1. introduction to Eigenvalues 211 problem Set 6.1 1 The example ai the start of the chapter has powers of this matrix A: 4'"] - *-[« 5] - -=[:< :?] Find the eigenvalues of these matrices. All powers have lhe same eigenvectors. (a) Show from A how a row exchange can produce different eigenvalues. (b) Why is a zero eigenvalue not changed by the steps of elimination? 2 Find the eigenvalues and the eigenvectors of these two matrices: A=[j 3] and Л + /=[2 j], A +1 has the eigenvectors as A. Its eigenvalues are____by 1. 3 Compute the eigenvalues and eigenvectors of A and A-1. Check the trace ! i] A"1 has the eigenvectors as A. When A has eigenvalues A j and >2, its inverse has eigenvalues. 4 Compute the eigenvalues and eigenvectors of A and A2: - = o] “ *•[-» 4 A2 has the same____as A. When A has eigenvalues A! and Aj. A2 has eigenvalues . In this example, why is A2 + A| = 13? 5 Find the eigenvalues of A and В (easy for triangular matrices) and A + В: A=[J j] and B«[j J] and A + B=[{ ’] • Eigenvalues of A + В are / art not equal to eigenvalues of A + eigenvalues of B. 6 Find the eigenvalues of A and В and AB and BA: (a) Are the eigenvalues of AB equal to eigenvalues of A times eigenvalues of B? (b) Are the eigenvalues of AB equal to the eigenvalues of BA?
212 Chapter 6. Eigenvalues and Ei8««*aors - aneurin, nroduces -4 = UJ The eigenvalues of U are on its diagona|. 7 X the P The eigenvalues of £ are on its diagonal; they are all _ ’ eigenvalues of -4 are not tbe same as ------ 8 (a) If you know that z is an eigenvector, lhe way lo find A is to---. (b) If you know that A is an eigenvalue, the way to find X is to---- 9 What do you do to tbe equation Ax = Az. in order to prove (a), (b), and (с)? (a) A* is an eigenvalue of A2. as in Problem 4. (b) A*1 is an eigenvalue of A as in Problem 3. (c) A + 1 is an eigenvalue of A + I. as in Problem 2. 10 find the eigenvalues and eigenvectors for both of these Markov matrices A and A °° Explain from those answers why A100 is close to Ax: 11 Here is a strange fact about 2 by 2 matrices with eigenvalues Aj / A2; The columns of A - Aj I an multiples of the eigenvector x2. Any idea why this should be? 12 Find three eigenvectors for this matrix P (projection matrices have A= 1 and 0) Projection matrix f.2 .4 0] .4 .8 0 0 0 1] If two eigenvectors share lhe same A, so do all their linear combinations. Find an eigenvector of P with no zero components. 13 From the unit vector « = (1,1,3,5)/6 construct the rank one projection matrix ' = «« . This matrix has P2 = P because uTu = 1. (a> sfrh U to™s fmm <uuT)u - u(uTu) - u. Then и is an eigenvector with eigenvalue A = 1. In that case find />100u. (b) If v is perpendicular to и show that Pv ~ 0. Then A = 0. <« ejjel)vaJueA = “ - ад . w «.ы, и д . ± ,.л л Q _ — sin в] [ sin 8 casfil ’be ту plane by the angle 9. No real A’s. **fcb,= и,,,..!
Introduction to Eigenvalues 213 15 Every permutation matrix leaves x = (1,1....j, unrflan^ Then A = 1. Find two more A s (possibly complex) for these permuuuons. from <fct(P - AZ) = 0: _ Г0 1 °] ГО 0 1' P ~ I ° ° 1 I and P = о 1 0 . L1 0 0j 10 0 16 The determinant of A equals the product А» Aj • • - A,. Start with the polynomial det(A - AZ) separated into its n factors (always possible). Then set A = 0: det(A — AZ) = (At — A)(A2 — A) • • • (A, — A) so det A = __- Check this rule in Example I where the Markov matrix has A = 1 and |. 17 If A has Ai = 4 and A2 = 5 then det(A - AZ) = (A - 4)(A - 5) = A2 - 9A + 20. Find three matrices that have trace a + d = 9 and determinant 20 and A = 4,5. 18 A 3 by 3 matrix В is known to have eigenvalues 0.1,2- This information is enough to find three of these (give the answers where possible): (a) the rank of В (b) the determinant of BTB (c) the eigenvalues of BTB (d) the eigenvalues of (В2 + Z)_|. 19 Choose the last rows of A and C to give eigenvalues 4,7 and 1.2.3: Companion matrices 20 The eigenvalues of A equal the eigenvalues of AT. This is because det(A - AZ) equals det(AT - AZ). That is true because ____________. Show by an example that the eigenvectors of A and AT are not the same. 21 Find three 2 by 2 matrices that have Aj = Aj = 0. The trace is zero and the determinant is zero. A might not be the zero matrix but check that A =0. 22 This matrix is singular with rank one. Find three A’s and three eigenvectors 1 2 2 4 1 2 All eigenvalues are in the Gershgorin circle |A — 2| < 8. «tnnnncr. 4 and В have the same eigenvalues A,.A„ with the same independent Suppose A and В have tne same eigenvectors Xj,. . ..Xn- Then A " CjX, + • • • + c„x„. What is Axl What is Bx.
214 Chapter 6. Eigenvalues and Eigenve^ 24 Find the rank and the four eigenvalues of A and C: 25 (Review) Find thc eigenvalues of А, B, and C: 1 2 3 0 4 5 0 0 6 0 0 I* 0 2 0 3 0 0 2 2' 2 2 2 2 26 Suppose A has eigenvalues 0,3,5 with independent eigenvectors u. v, w (a) Give a basis for the nullspace and a basis for the column space. (b) Find a particular solution z lo Az v + w. Find al) solutions. (c) Az . u has no solution. If it did then____would be in lhe column space. Challenge Problems 27 28 28 30 eigenvalues of A. Check that A| + A, agrees with Die trace ut ti| + u2V3 = UT„ ’* ^m<,? Wh"‘ numhcni can •* ,hc "Г P? What fimr numbtn can he e.genvalue. of P, as in Problem 15? ».’££F rrt^“ «' <-* »• A Hz - zrHAz S 2 IIЛз>|| ||//X||. Then I* *• •mf"’»»iMc to get Hr positam emir *T*1 "”** 1|Яж|,/|,а'11 '* lc"'' J* I* ion emu and momentum error txMh very small. A ’ ' 1?*y can hT **/’’‘el*env"luw mu*, “,irfy . Whal are the trace and determinant ?
6.2 Diagonalizing a Matrix

1 The columns of AX = XΛ are Ax_k = λ_k x_k. The eigenvalue matrix Λ is diagonal.
2 n independent eigenvectors in X diagonalize A:  A = XΛX⁻¹  and  Λ = X⁻¹AX.
3 Solve u_{k+1} = Au_k by u_k = A^k u_0 = XΛ^k X⁻¹ u_0 = c₁(λ₁)^k x₁ + ··· + c_n(λ_n)^k x_n.
4 No equal eigenvalues: n independent eigenvectors in X. Then A can be diagonalized.
  Equal eigenvalues: A might have too few independent eigenvectors. Then X⁻¹ fails.
5 Every matrix C = B⁻¹AB has the same eigenvalues as A. These C's are "similar" to A.

When x is an eigenvector, multiplication by A is just multiplication by a number λ: Ax = λx. All the difficulties of matrices are swept away. Instead of an interconnected system, we can follow the eigenvectors separately. It is like having a diagonal matrix, with no off-diagonal interconnections. The 100th power of a diagonal matrix is easy.

The point of this section is very direct. A turns into a diagonal matrix Λ when we use the eigenvectors properly. Diagonalizing A is the matrix form of our key idea.

Suppose the n by n matrix A has n linearly independent eigenvectors x₁, ..., x_n. Put them into the columns of an eigenvector matrix X. We will prove AX = XΛ. Therefore X⁻¹AX is the eigenvalue matrix Λ:

Eigenvector matrix X
Eigenvalue matrix Λ        X⁻¹AX = Λ = the diagonal matrix with λ₁, ..., λ_n on its diagonal.   (1)

The matrix A is "diagonalized." We use a capital lambda for the eigenvalue matrix, because the small λ's (the eigenvalues of A) are on its diagonal.

Example 1  This A is triangular, so its eigenvalues 1 and 6 are on the main diagonal:

Eigenvectors (1,0) and (1,1) go into X
X⁻¹ A X = Λ :   [1 -1; 0 1] [1 5; 0 6] [1 1; 0 1] = [1 0; 0 6]

In other words A = XΛX⁻¹. Then A² = XΛX⁻¹XΛX⁻¹. So A² is XΛ²X⁻¹. A² has the same eigenvectors in X and its squared eigenvalues are in Λ².
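Example 1 is easy to check numerically. Here is a short MATLAB sketch (mine, not the book's): eig returns an eigenvector matrix X and a diagonal eigenvalue matrix Λ, and the two products below recover Λ and A.

% Check Example 1: A = [1 5; 0 6] has eigenvalues 1 and 6
A = [1 5; 0 6];
[X, Lambda] = eig(A);        % columns of X are (unit) eigenvectors, Lambda is diagonal
disp(diag(Lambda)')          % the eigenvalues 1 and 6
disp(X \ A * X)              % X^{-1} A X recovers Lambda
disp(X * Lambda / X)         % X Lambda X^{-1} recovers A

MATLAB scales each eigenvector to length 1, so its X differs from the X above by column scaling, but X⁻¹AX is the same Λ.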
216 Chapter 6. Eigenvalues and Ei, „.. . . Y _ у A ’ Л multiplies its eigenvectors, which are the columns of X Th ™ь>,«,.ЕкЬ.ЫиЯ.»ГХЬт«И|ЧЫ|,1иЫ|!я^ The key idea is to split this matrix AX into A’ times Л. Keep those matrices in lhe right order! Then Aj multiplies the first column xj, as shown The diagonalization is complete, and we can write AX = XK in two good ways: AA'=A'A is X"’AX=A or A = АЛХ» (2) The matrix X has an inverse, because its columns (the eigenvectors of A) were assumed to be linearly independent Without n independent eigenvectors, we can't diagonalize A. A and Л have the same eigenvalues Aj,..., An. Their eigenvectors are different. The job of the original eigenvectors Xj,..., xn was to diagonalize A. Those eigenvectors in X produce A = XAX~l. You will soon see their simplicity and importance and meaning The fcth power will be A* = XA*X-1 which is easy to compute: A* = (XAX^XXAX-1)... (AAA"1) = XAkX~* Example 1 A = 1 and 6 P 5Г = P И P 1P -i] _ Г1 [0 6] “ [0 1] [ 6*J [0 1J “ [o With к = 1 we get A With к = 0 we get A0 = / (and A0 = 1). With к = -1 we get A"1 with eigenvalues 1 and j. You see bow A2 = [1 35; 0 36] fits that formula when k = 2. Here are four small remarks before we use A again in Example 2. Remark 1 Suppose the eigenvalues A,,..., are all different Then it is automatic that the eigenvector x, .... xB are independent The eigenvector matrix X will be invertible, -very matnx that has no repealed eigenvalues can be diagonalized. In “"™е,оп b’ an’ constants. A(cx) = A(cx) is MATLAB 1 ** d,vide 1 ~ (1.1) by s/2 to produce a unit eigenvector. MATLAB and vmually all other codes produce eigenvectors of length ||x|| = 1.
6.2- Diagonalizing a Matrix 217 Л come “ «me order as the eigenvalues in A. To reverse the order tn Л. put the eigenvector (1,1) W<Me eigenvcctof (1(0) in X: New order 6,1 To diagonalize A we must use an eigenvector matrix. From X'AX = Л we know that AX - XA. Suppose the first column of X is x. Then the first columns of AX and XA arc Ax and Ац. For those to be equal, x must be an eigenvector. Remark 4 (repeated warning for repealed eigenvalues) Some matrices have too few eigenvectors. Those matrices cannot be diagonalized. Here are two examples: A and В are not diagonalizable They have only one eigenvector ““ B=[o o]- Their eigenvalues happen to be 0 and 0. Nothing is special about A * 0. thc problem is lhe repetition of A. All eigenvectors of the first matrix are multiples of (1,1): Only one line of eigenvectors Ax = Ox means [! -IH -Й and x = c There is no second eigenvector, so this unusual matnx A cannot be diagonalized. Those matrices are the best examples to test any statement about eigenvectors In many true-false questions, non-diagonalizable matrices lead to false. Remember that there is no connection between invertibility and diagonalizability: - Invertibility is concerned with tbe eigenvalues (A = 0 or A / 0). - Diagonalizability is concerned with the eigenvectors (too few or enough for X). Each eigenvalue has at least one eigenvector! A - Af is singular. If (Л - AI)x = 0 leads you to x = 0. A is not an eigenvalue. Look for a mistake in solving det (A - Af) = 0. Eigenvectors for n different A’s are independent. Then we can diagonalize A. Independent x from different A Eigenvectors xj,...,x7 that correspond to distinct (all different) eigenvalues are linearly independent An n by n matrix that has n different eigenvalues (no repeated A’s) must be diagonalizable. Proof Suppose cjxi+caxj =0. Multiply by A tofindciArXi + caAjXj = 0. Multiply by A2 to find CtAeXt + cjAjXa = 0. Now subtract ooe from the other to show c, = 0. Subtraction leaves (At - AjjciXj = 0. Then с» = 0 because A> # Aj. Similarly ca = 0. Only the combination of x’s with d = ca = 0 gives ctx, + Cjx2 = 0. So the eigenvectors x । and x2 must be independent.
This proof extends directly to 3 eigenvectors. Suppose that c₁x₁ + c₂x₂ + c₃x₃ = 0. Multiply by A − λ₃I and x₃ is gone. Multiply by A − λ₂I and x₂ is also gone:

(λ₁ − λ₂)(λ₁ − λ₃) c₁x₁ = 0   which forces c₁ = 0.   (3)

Similarly every cᵢ = 0. When the λ's are all different, the eigenvectors are independent. A full set of n eigenvectors can go into the columns of the eigenvector matrix X.

Example 2  Powers of A   The Markov matrix A = [.8 .3; .2 .7] has λ₁ = 1 and λ₂ = .5. Here is A = XΛX⁻¹:

Markov example   [.8 .3; .2 .7] = [.6 1; .4 -1] [1 0; 0 .5] [1 1; .4 -.6]

The eigenvectors (.6, .4) and (1, -1) are in the columns of X. They are also the eigenvectors of A². Watch how A² has the same X, and the eigenvalue matrix of A² is Λ²:

Same X for A²   A² = XΛX⁻¹XΛX⁻¹ = XΛ²X⁻¹.

Just keep going, and you see why the high powers A^k approach a steady state:

A^k = XΛ^kX⁻¹ = [.6 1; .4 -1] [1^k 0; 0 (.5)^k] [1 1; .4 -.6]   (4)

As k gets larger, (.5)^k gets smaller. In the limit it disappears completely. That limit is A^∞:

Limit k → ∞   A^∞ = [.6 1; .4 -1] [1 0; 0 0] [1 1; .4 -.6] = [.6 .6; .4 .4].

The limit has the eigenvector x₁ in both columns. We saw this A^∞ on the very first page of Chapter 6. Now we see it coming from powers like A^100 = XΛ^100X⁻¹.

Question  When does A^k → zero matrix?   Answer  All |λ| < 1.

Similar Matrices: Same Eigenvalues

Suppose the eigenvalue matrix Λ is fixed. As we change the eigenvector matrix X, we get a whole family of different matrices A = XΛX⁻¹, all with the same eigenvalues in Λ. All those matrices A (with the same eigenvalues in Λ) are called similar.

This idea extends to matrices C that can't be diagonalized. Again we look at the whole family of matrices A = BCB⁻¹, allowing all invertible matrices B. Again all those matrices A and C are similar.

We are using C instead of Λ because C might not be diagonal. We are using B instead of X because the columns of B might not be eigenvectors. We only require that B is invertible.

Similar matrices C and BCB⁻¹ have the same eigenvalues. All the matrices A = BCB⁻¹ are "similar" to each other and to C.
ьг pjagonalizing a Matnx 219 Supp0se Cx = Ax. Then BCB' also has the eigenvalue A The new eigenvector is Bx: Same A (BCB-*)(Bx) = BCx = BAx = A(Bx). (5) д fixed matrix C produces a family of similar matrices BCB*1, allowing all B. n £ is the identity matrix, the “family” is very small. The only member is BIB ' = I- The identity matrix is the only diagonalizable matrix with all eigenvalues A = 1. The family is larger when A = 1 and 1 with only one eigenvector (not diagonalizable). The simplest C in this family is called the Jordan form. Every matrix A in the family has determinant = 1 and trace = 2 and this special form with A = I excluded: J J = Jordan form for every A = BCB~1 = 1 For an important example I will take eigenvalues A = 1 and 0 (not repeated!). Now the whole family is diagonalizable with the same eigenvalue matrix A. We get every 2 by 2 matrix A that has eigenvalues 1 and 0. The trace of A is 1 and the determinant is zero: All . Г 1 0 Г 1 1 1 Г Л .5 1 . xyr similar Л“[о о] Л ’ [ 0 0 ] ” 4 ~ [ S .5 ] ' 7V The family contains all matrices with A2 = A. including A = A. When A is symmetric these are projection matrices P2 = P. Eigenvalues 1 and 0 make life easy. Fibonacci Numbers We present a famous example, where every new Fibonacci number is the sum of the two previous F’s. Then eigenvalues of A tell how fast the Fibonacci numbers grow. The sequence 0,1,1,2,3,5,8,13.... comes from = F*+i + F^. Problem: Find the Fibonacci number F100 The slow way is to apply the rule F*+2 = Fk+i + F* one step at a time. By adding Fe = 8 to F? = 13 we reach Fe = 21. Eventually we come to Fioo- Linear algebra gives a better way. The key is to begin with a matrix equation 11*4.1 — Au*. That is a one-step rule for vectors, while Fibonacci gave a two-step rule for scalars. We match those rules by putting two Fibonacci numbers into a vector u. Then you will see lhe matrix A.
Every step multiplies u_k = (F_{k+1}, F_k) by A = [1 1; 1 0]. After 100 steps we reach u₁₀₀ = A¹⁰⁰u₀:

u₀ = [1; 0],  u₁ = [1; 1],  u₂ = [2; 1],  u₃ = [3; 2],  ...,  u₁₀₀ = [F₁₀₁; F₁₀₀].

This problem is perfect for eigenvalues. Take the determinant of A − λI:

A − λI = [1−λ 1; 1 −λ]   leads to   det(A − λI) = λ² − λ − 1.

The equation λ² − λ − 1 = 0 is solved by the quadratic formula (1 ± √5)/2:

Eigenvalues   λ₁ = (1 + √5)/2 ≈ 1.618   and   λ₂ = (1 − √5)/2 ≈ −0.618.

These eigenvalues lead to eigenvectors x₁ = (λ₁, 1) and x₂ = (λ₂, 1). Step 2 finds the combination of those eigenvectors that gives the starting vector u₀ = (1, 0):

u₀ = [1; 0] = (x₁ − x₂)/(λ₁ − λ₂)   (9)

Step 3 multiplies u₀ by A¹⁰⁰ to find u₁₀₀. The eigenvectors x₁ and x₂ stay separate! They are multiplied by (λ₁)¹⁰⁰ and (λ₂)¹⁰⁰:

u₁₀₀ = ((λ₁)¹⁰⁰ x₁ − (λ₂)¹⁰⁰ x₂)/(λ₁ − λ₂)

We want F₁₀₀ = second component of u₁₀₀. The second components of x₁ and x₂ are 1. The difference between λ₁ = (1 + √5)/2 and λ₂ = (1 − √5)/2 is √5. And (λ₂)¹⁰⁰ ≈ 0.

100th Fibonacci number = ((λ₁)¹⁰⁰ − (λ₂)¹⁰⁰)/(λ₁ − λ₂) = nearest integer to (λ₁)¹⁰⁰/√5.   (10)

Every F_k is a whole number. The ratio F₁₀₁/F₁₀₀ must be very close to the limiting ratio (1 + √5)/2. The Greeks called this number the "golden mean". For some reason a rectangle with sides 1.618 and 1 looks especially graceful.

Matrix Powers A^k

Fibonacci's example is a typical difference equation u_{k+1} = Au_k. Each step multiplies by A. The solution is u_k = A^k u₀. We want to make clear how diagonalizing the matrix gives a quick way to compute A^k and find u_k in three steps.

The eigenvector matrix X produces A = XΛX⁻¹. This is a factorization of the matrix, like A = LU or A = QR. The new factorization is perfectly suited to computing powers, because every time X⁻¹ multiplies X we get I:

Powers of A   A^k u₀ = (XΛX⁻¹) ⋯ (XΛX⁻¹) u₀ = XΛ^kX⁻¹ u₀

I will split XΛ^kX⁻¹u₀ into three steps. Equation (11) will show how the eigenvalues work.
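Before carrying out those three steps in general, here is a quick numerical check of the Fibonacci formula (10), sketched in MATLAB (the code is mine, not the book's); the three general steps continue right after this sketch. One caution: F₁₀₀ has 21 digits, so double precision only matches it to about 16 significant figures. The point is that the eigenvalue formula and the step-by-step rule agree.

% Fibonacci two ways: eigenvalue formula (10) vs. repeated multiplication by A = [1 1; 1 0]
lambda1 = (1 + sqrt(5))/2;                 %  1.618...
lambda2 = (1 - sqrt(5))/2;                 % -0.618...
F100_eig = (lambda1^100 - lambda2^100) / sqrt(5);     % formula (10), about 3.54e20

A = [1 1; 1 0];
u = [1; 0];                                % u_0 = (F_1, F_0)
for k = 1:100
    u = A*u;                               % u_k = (F_{k+1}, F_k)
end
F100_iter = u(2);                          % second component of u_100 is F_100

fprintf('formula (10): %.4e   iteration: %.4e\n', F100_eig, F100_iter)
fprintf('relative difference: %.1e\n', abs(F100_eig - F100_iter)/F100_iter)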
6 2. Diagonalizing a Matrix 221 wn» « — + Then с - X -«. 2. Multiply each eigenvector x, by (A,)* Now we have Д* 3. Add up the pieces c,(A, )*x, lo find the solution щ = A'uo. This is XA*X-‘uo Solution for = Aut ut = ^t«o=c1(AI)‘x,+... + Ce(Ae)‘xn. (II) In matrix language Ak equals (ХЛХ *)* which is X times A* times X~l 2. In Step 1. the eigenvectors in X lead to the c’s in the combination uo = С|х, + — + c„x„: Step 1 This says that uo = Ac. (12) The coefficients in Step 1 аге c = X'1^. Then Step 2 multiplies each c* by A*. The final result u* = 52с,(А,)кх, in Step 3 is the product of X and A* and c = X-1uo: A*uo = XA*X-‘uo=XA*c= xt RAi)‘ . (В) (A„)fcJ |< This result is exactly u* = Ci(Ai)fcXi +••• + c„(A„)kxn It solves ut+l = Aut. Nondiagonalizable Matrices (Optional) Suppose Л is an eigenvalue of A. We discover thal fact in two ways: 1. Eigenvectors (geometric) There are nonzero solutions lo Ax = Ax. 2. Eigenvalues (algebraic) The determinant of A - AI is zero. The number A may be a simple eigenvalue or a multiple eigenvalue, and we want to know its multiplicity. Most eigenvalues have multiplicity Af = 1 (simple eigenvalues). Then there is a single line of eigenvectors, and det( A — Af) does not have a double factor. For exceptional matrices, an eigenvalue can be repeated Then there are two different ways to count its multiplicity. Always GM < AM for each A: 1. (Geometric Multiplicity = GM) Count the independent eigenvectors for A. Then GM is the dimension of the nullspace of A - Af. 2. (Algebraic Multiplicity = AM) AM counts the repetitions of A among the eigenvalues of A. Look at the n roots of det, A — Af) — 0.
CWer6.ElgenvJ1luesandEigcnvBc^ 222 If A has A = 4,4.4. then that eigenvalue has AM = 3 and GM = 1 or 2 or 3 The following num* A is an example of trouble. Its eigenvalue A = n It is a double eigenvalue (AM = 2) with only one independent eigenvector '* rcPeated. = (1,0). AM = 2 A=[? 11 has <fet(A — Af) — | 0 _д| A- 1 eigenvea^ GM =1 1° when GM is below AM means that A is not diagonal^ This shortage of eigenvectors WORKED EXAMPLES 6.2 A Find the inverse and the e.genvalues and tbe determinant of this matrix A. A = 5 • eye(4) - ooes(4) 4 -1 -1 -1 -1 4 -1 -1 -1 -1 4 -1 -1 -1 -1 4 Describe an eigenvector matrix X that gives X AX-A. Solution What are the eigenvalues of the all-ones matrix ? Its rank is certainly 1 so three eigenvalues are A = 0,0,0. Its trace is 4, so the other eigenvalue is A = 4 Subtract this all-ones matrix from 5/ to get our matrix A: Subtract the eigenvalues 4.0,0,0 from 5,5,5,5. The eigenvalues of A are 1,5,5,5. The determinant of A is 125, the product of those four eigenvalues. The eigenvector for A = 1 is x = (1,1,1,1) or (c,c,c,c). The other eigenvectors are perpendicular to x (since A is symmetric). The nicest eigenvector matrix X is the symmetric orthogonal Hadamard matrix H The factor 5 produces unit column vectors = eigenvectors of A. Г 1 1 1 11 Tbe eigenvalues of A*1 are 1, |, |. The eigenvectors are not changed so A-1 = The inverse matrix is surprisingly neat: A-1 = (/ + all ones)/5. A is a rank-one change from 51. So A-1 is a rank-one change from I/5. In a graph with 5 nodes, the determinant 125 counts the “spanning trees”. Those trees have no loops and they touch all 5 nodes.In a graph with 5 nodes, lhe determinant 125 counts the “spanning trees". Those trees have no loops and they touch all 5 nodes. W5lh 6 nodes, the matrix 6 • eye(5) - ones(5) has the five eigenvalues 1,6,6,6,6.
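A short MATLAB check of Worked Example 6.2 A (a sketch; the command 5*eye(4) - ones(4) is the one given above):

% Worked Example 6.2 A: eigenvalues of A = 5*eye(4) - ones(4) are 1, 5, 5, 5
A = 5*eye(4) - ones(4);
disp(sort(eig(A))')                        % 1  5  5  5
disp(det(A))                               % 125 = product of the eigenvalues
disp(norm(inv(A) - (eye(4) + ones(4))/5))  % A^{-1} = (I + all-ones)/5, difference ~ 0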
& 2. Diag°nali,jn*>a MaUu problem Set 6.2 Questions 1-7 are about the eigenvalue and eigenvector matrices Л and X. 1 (a) Factor these two matrices into A = XXX (b)IfA = XAX 'thenA3 = ( X X )MdA-‘=( X )( )• 2 If A has Ai = 2 with eigenvector Zj = [J] and A2 = 5 with x2 = [}]. useXAX to find A. No other matrix has the same A’s and z’s. 3 Suppose A = XAX’-1. What is the eigenvalue matrix for A + 21? What is the eigenvector matrix? Check that A + 2/ = ( )( )( )-*. 4 True or false: If the columns of X (eigenvectors of A) are linearly independent, then (a) A is invertible (b) A is diagonalizable (с) X is invertible (d) X is diagonalizable. 5 If the eigenvectors of A are the columns of I. then A is a_matrix. If the eigen- vector matrix X is triangular, then X~l is triangular. Prove that A is also triangular. 6 Describe all matrices X that diagonalize this matrix A (find all eigenvectors): A—2} • Why does this X also diagonize A-1 = | 7 Diagonalize the Fibonacci matrix by completing X-1: 1 11 [Ai Ajl [Ai 01 [ 1 0] ” I 1 1J L ° a2J L Do the multiplication XA*X-1 [ J] to find its second component. This is the ith Fibonacci number F* = (A* — A|)/(Ai — Aj). 8 Suppose Gt+2 is the average of lhe two previous numbers Gk+i and G*: Gk+a = |Ga+i + fGk+2l = [ A 1 Gk+i=Gk+l I HG‘ (a) Find the eigenvalues and eigenvectors of A. (b) Find the limit as n -> oc of the matrices A" = XA"X . (c) If Go = 0 and Gi = 1 show that the Gibonacci numbers approach |.
224 Chapter 6. Eigenvalues and j. , eenvtet,^ Prove that every third Fibonacci number in 0,1,1,2,3.... is even. Write down the most general matrix that has eigenvectors [ I ] and [ J j 10 Questions I l-M are 11 12 13 True or false: If the etgem^trs of Лапе 2.2.5then tbe matrix is ccnain)y (a) invertible (b) diagonal.zabic (O no. diagonalizable. True or false: If the only eigenvectors of Л arc multiples of (1,4) then A has (a) no inverse (b) a repeated eigenvalue (c) no diagonalization XAX-1 Complete these matnces so that drt Л = 25- Then check that A = 5 is repeated^ the trace is 10 so the determinant of Л - Af •* (A •>) . Hind an eigenvector with 4x = 5». These matrices will not be diagonalizable because there is no second line of eigenvectors. and Л and 4 Tbe matrix Л = [’ J] is not diagonalizable because thc rank of A - 3/ is_______ Change one entry to make Л diagonalizable. U Inch entries could you change? Questions 15-19 are about powers of matrices. Л* = XA*X'1 approaches the zero matrix as к -» oo if and only if every A has absolute ' aloe less than___. Which of these matrices has Л* -+ 0? .6 .91 •6 | ’ 14 15 and .1 9 Л = 16 (Recommended) Find A and A' to diagonalize Л1 in Problem 15. What is the limit of A* as к -+ oo? What is the limit of XA*X-1? In tbe columns of this limiting matrix you see the______. 17 Find A and X to diagonalize Л2 in Problem 15. What is (Лг)10ио for these Uo? and u0 = 6' 0 ' 18 Diagonalize Л and compute XA*X-1 to prove this formula for Л*: has Л‘=’Г1 + 3‘ 1-3* 2 11-3* 1 +3* '
6.2- Diagonalizing “ Mattn 225 19 20 21 22 23 24 25 26 27 28 29 Diagonalize В and compute XA‘X- to prove this formula tor B‘: 5* 5* -4*1 0 Suppose A = XAX 'Jake determinants io prove det A = det A = A, A2- A„. This quick proof only works when A can be___ Show that trace X Y = trace YX. by adding the diagonal entries of X Y and YX: 13 = has 4» X И and Now choose Y to be AX *. Then XAX' has the same trace as AX~'X = A. This proves that the trace of A equals the truce of A = sum of the eigenvalues. When is a matrix A similar to its eigenvalue matrix A ? A and A always have the same eigenvalues. But similarity requires a matrix В with A = BAB~ 1. Then В is the ________ matrix and A must have n independent. If A = X AX ~1, diagonalize the block matrix В = [ J ] Find its eigenvalue and eigenvector (block) matrices. Consider all 4 by 4 matrices A that are diagonalized by the same fixed eigenvector matrix X. Show that the A’s form a subspace (cA and A> + Aj have this same X). What is this subspace when X = /? What is its dimension? Suppose A2 = A. On tbe left side A multiplies each column of A. Which of our four subspaces contains eigenvectors with A = 1? Which subspace contains eigenvectors with A = 0? From the dimensions of those subspaces. A has a full set of independent eigenvectors. So a matrix with A2 = A can be diagonalized. (Recommended) Suppose Ax — Ax. If A = 0 then z is in the nullspacc. If A # 0 then z is in the column space. Those spaces have dimensions (n - r) + г = n. So why doesn’t every square matrix have n linearly independent eigenvectors? The eigenvalues of A are 1 and 9. and the eigenvalues of В are — 1 and 9: A and В Find a matrix square root of A from R — Xv/ЛХ'1. Why is there no real matrix square root of B? If A and В have the same A’s with the same independent eigenvectors, their factor- izations into ____are the same. So A = B. Suppose the same X diagonalizes both A and B. They have the same eigenvectors in A = XA,X"1 and В = XA2X*. Prove «hat AB = BA. 1
Chapter 6. Eigenvalues 226 30 31 32 33 34 35 36 37 38 and Ei»_^ '8cn*eCl0^ • _ l« bl then the determinant of A - »(A - a)(A - <f). (a)^-H^i,on‘n*orem,'tha,( * h Cavley-Hamilton Theorem on Fibonacci » A -[, 1]. Thc (b) Test the Cayley ; = 0 М1КС the polynomial det(A - A/) is д2 °*т ^TT^btotbe^H-At/XX-A^-..^.^'? “Xto д»,м ”di If Д - p 0] and AB = BA. show that В = [j “ al*° a diagonal matrix И Л - [° ai A different etgen----------. These diagonal matri B ^^si^aJ subspace of matnx space. AB - BA ~ 0 gi^ -«to-»»»**1 to nnk of to 4 by 4 mart,. Ш ,4‘ Wtob if •" W < I”?,0"» “7 if *"> W > L PeterTax gives these striking examples in his book Linear Algebra: c- \ 5 r ~ L~3 ~4 c1024 = -c D = [ 5 6.91 1-3 -4 H^10M|| < 10-П В A ||A,024|| > 10™ В1024 = I Find the eigenvalues A = e** of В and C to show В4 = I and C3 = — ] The nth power of rotation through в is rotation through n0: A„ _ Г cos6 -sin# 1" _ Г cosn# -sinn0 1 ~ [ sin0 cos в j [ sinn0 cosn# I' Prove that neat formula by diagonalizing A = XAX-1. The eigenvectors (colum of X) are (1. i) and (i, 1). You need to know Euler’s formula <* = cos 0 + i Sjn $ * The transpose of A = XAX~l is AT = (X-1)TAAfT. The eigenvectors in ATy = Ay are the columns of that matrix (X”*)T. They are often called left eigenvectors of A. because yTA = AyT. How do you multiply matrices to find this formula for A? [ Sum of rank-1 matrices A = XAX"1 = AjXiy^ + - • • + AnXnyJ. The inverse of A = eyt{n) + ones(n) is A-1 = eye(n) + C • ones(n). Multiply AA~* to find that number C (depending on n). Suppose Ai and A3 are n by n invertible matrices. What matrix В shows that A2A) = B(A|A2)B~* ? Then A2Ai is similar to A) A2: same eigenvalues. (Pavel Gnnfcld) Without writing down any calculations, can you find the eigenvalues of this matrix ? Can you find the 2020th power A2020 ? 110 55 —164 A = 42 21 -62 88 -131
6.3 Symmetric Positive Definite Matrices

1 A symmetric matrix S = Sᵀ has n real eigenvalues λ and n orthonormal eigenvectors q₁, ..., q_n.
2 Then S is diagonalized by an orthogonal matrix Q:  S = QΛQ⁻¹ = QΛQᵀ.
3A S positive definite: all λ > 0, every pivot > 0, all upper left determinants > 0.
3B The energy test is xᵀSx > 0 for all x ≠ 0. Then S = AᵀA with independent columns in A.
Positive semidefinite allows λ = 0: pivot = 0, determinant = 0, energy xᵀSx = 0, any A in S = AᵀA.

Symmetric matrices S = Sᵀ deserve all the attention they get. Looking at their eigenvalues and eigenvectors, you see why they are special:

1 All n eigenvalues λ of a symmetric matrix S are real numbers.
2 The n eigenvectors q can be chosen orthogonal (perpendicular to each other).

The identity matrix S = I is an extreme case. All its eigenvalues are λ = 1. Every nonzero vector x is an eigenvector: Ix = 1x. This shows why we wrote "can be chosen" in Property 2 above. With repeated eigenvalues like λ₁ = λ₂ = 1, we have a choice of eigenvectors. We can choose them to be orthogonal. And we can rescale them to be unit vectors (length 1). Then those eigenvectors q₁, ..., q_n are not just orthogonal, they are orthonormal. The eigenvector matrix for S has QᵀQ = I: orthonormal columns in Q.

We write Q instead of X for the eigenvector matrix of S, to emphasize that these eigenvectors are orthonormal: QᵀQ = I and Qᵀ = Q⁻¹. This eigenvector matrix is an orthogonal matrix. The usual A = XΛX⁻¹ becomes S = QΛQᵀ:

S = QΛQᵀ   with real eigenvalues in Λ and orthonormal eigenvectors in the columns of Q.

Every matrix of that form is symmetric: Transpose QΛQᵀ to get (Qᵀ)ᵀΛᵀQᵀ = QΛQᵀ.

Quick Proofs: Orthogonal Eigenvectors and Real Eigenvalues

Suppose first that Sx = λx and Sy = 0y. The symmetric matrix S has a nonzero eigenvalue λ and a zero eigenvalue. Then y is in the nullspace of S and x is in the column space of S (x = Sx/λ is a combination of the columns of S). But S is symmetric: column space = row space! Since the row space and nullspace are always orthogonal, we have proved that x is orthogonal to y.
Chapter 6. Eigenvalues and Ei, is not zero, we have Sy = «У In this case we i^. to = OK «»<S - “/)l Г <Л 0)1 ’**' * - О »«»s -•' Л »t» »'»” (' г ’ * S ' « №• » “ Й S»»r*’0: " „TMP<too<«toSoto.ly«rs»ntortoeigen4|uei to «W~ ,”,°1" Г4*' M“"S dPmto«>«'S- T”"" «tor IT ctoiges и -i). Tb, J’ ’'^^^’„.„«rtoptanrmbto^tonpte.tomeeslortoendorto^ „«и. B« poto« "“into art ю bt,„„( Positive Definite Matrices 7^ti^definite matrii has all positive eigenvalues ] i .^^mr-inc matnces S « S7. All their eigenvalues are i».i "m Ьг* “,““тp7'rfu' р,<ч*п> H.„ i. ТЫ1 7 podtTredefinite matrixhas all positive eigenvalues ] We would ita to check for positive eigenvalues without ^computing those numbers A. You will see four more tests for positive definite matrices, after these five examples. is positive definite. Its eigenvalues 2 and 6 are both positive 2 S Q J p g j Qr “ positive definite if Qr - Q~1: same A 2 and 6 3 S = C 2 ° C1" is positive definite if C is invertible (not obvious) [ 0 и J 4 S » [ “ j is positive definite exactly when a > 0 and ac > b2 5 ~ о о ** Р0*1^** semidefinite: it has all A > 0 but not A > 0 Try Test I on these examples. The other tests may give faster answers. No, No, Yes. I 2 2 I 5 = vvT(rank 1) ’ 2 1 0 1 2 1
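Test 1 is easy to try on a computer. Here is a small MATLAB sketch (the two matrices are my own illustrations, not necessarily the examples above): eig gives the λ's, and the factorization S = QΛQᵀ can be confirmed at the same time.

% Test 1 (all lambda > 0) and the factorization S = Q*Lambda*Q' for symmetric S
S = [2 4; 4 9];                    % symmetric; this S returns in the energy test below
[Q, Lambda] = eig(S);              % Q has orthonormal eigenvectors, Lambda is diagonal
disp(diag(Lambda)')                % both eigenvalues positive: S passes Test 1
disp(norm(Q'*Q - eye(2)))          % Q is orthogonal: Q'Q = I
disp(norm(S - Q*Lambda*Q'))        % spectral theorem S = Q Lambda Q'
B = [1 2; 2 1];                    % symmetric but indefinite
disp(eig(B)')                      % eigenvalues -1 and 3: Test 1 fails for B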
The Energy-based Definition

May I bring forward the most important idea about positive definite matrices? This new approach doesn't directly involve eigenvalues, but it turns out to be a perfect test for λ > 0. This is a good definition of positive definite matrices:

Test 2 is the energy test   S is positive definite if the energy xᵀSx is positive for all vectors x ≠ 0.   (1)

Of course S = I is positive definite: all λᵢ = 1. The energy xᵀIx = xᵀx is positive if x ≠ 0.

Let me show you the energy in a 2 by 2 matrix. It depends on x = (x₁, x₂):

xᵀSx = [x₁ x₂] [2 4; 4 9] [x₁; x₂]      Energy = 2x₁² + 8x₁x₂ + 9x₂²

Is this positive for every x₁ and x₂ except (x₁, x₂) = (0, 0)? Yes, it is a sum of squares:

xᵀSx = 2x₁² + 8x₁x₂ + 9x₂² = 2(x₁ + 2x₂)² + x₂² = positive energy.

We must connect positive energy xᵀSx > 0 to positive eigenvalues λ > 0:

If Sx = λx then xᵀSx = λxᵀx.  So λ > 0 leads to energy xᵀSx > 0.

That line tested xᵀSx for each separate eigenvector x. But more is true. If every eigenvector has positive energy, then all nonzero vectors x have positive energy:

If xᵀSx > 0 for the eigenvectors of S, then xᵀSx > 0 for every nonzero vector x.

Here is the reason. Every x is a combination c₁x₁ + ··· + c_nx_n of the eigenvectors. Those eigenvectors can be chosen orthogonal because S is symmetric. We will now show: xᵀSx is a positive combination of the energies λ_k x_kᵀx_k > 0 in the separate eigenvectors.

xᵀSx = (c₁x₁ + ··· + c_nx_n)ᵀ S (c₁x₁ + ··· + c_nx_n)
     = (c₁x₁ᵀ + ··· + c_nx_nᵀ)(c₁λ₁x₁ + ··· + c_nλ_nx_n)
     = c₁²λ₁x₁ᵀx₁ + ··· + c_n²λ_nx_nᵀx_n > 0   if every λᵢ > 0.

From line 2 to line 3 we used the orthogonality of the eigenvectors of S: xᵢᵀxⱼ = 0.

Here is a typical use for the energy test, without knowing any eigenvalues or eigenvectors:

If S₁ and S₂ are symmetric positive definite, so is S₁ + S₂.
Proof by adding energies:   xᵀ(S₁ + S₂)x = xᵀS₁x + xᵀS₂x > 0 + 0.

The eigenvalues and eigenvectors of S₁ + S₂ are not easy to find. Energies just add.
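The energy and the sum of squares above are easy to compare numerically. A MATLAB sketch (mine), using the same S = [2 4; 4 9]:

% Energy test for S = [2 4; 4 9]: x'Sx = 2x1^2 + 8x1x2 + 9x2^2 = 2(x1 + 2x2)^2 + x2^2
S = [2 4; 4 9];
x = randn(2, 1000);                          % 1000 random vectors (columns)
energy = sum(x .* (S*x));                    % x'Sx for each column of x
squares = 2*(x(1,:) + 2*x(2,:)).^2 + x(2,:).^2;
fprintf('smallest energy found: %.3f\n', min(energy))                % stays positive
fprintf('max |energy - sum of squares|: %.1e\n', max(abs(energy - squares)))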
230 Chapter 6. Eigenvalues and '8en*B<.tUr, Three More Equivalent Test . n nositive eigenvalues and positive energy. That Crtc~ . $ be «lu« 1 w .and prob.bly«be«.bu.»e.lopWi,|,u,^.'» ""“J'“lun™ £ 4 « = К-» °' ..S r' Toll ................ •t j why must columns of A be independent in thia test? Test 3 applies to S = А л. if w“" . Л. - Лт>1« - M«)T M-) - ||Л.||>, a s= A A ы* ж. Ax This cncrgy is positive provided Л, i. Бк cmupy » И» !<«'»ЛЛ„ О. Ле column of И mu,, be lndepenacn‘ л,л ь“ -* 2- ” 3 . r ] । i 1 Г 2 3 4 is not positive definite. -I 9 C 7 It i< nncitium .’ is not positive definite. It is positive semidefinite ||Л®||2 > 0 ,_________ • (1, -2,1) has zero energy 0. Then S » ArA is only positive semidefinite, 1 2 1 3 3 5 7 4 7 10 1» Л I3 - 2 P7T* It is an eigenvector of A A with л Equation (2)»ys 4ТЛ is al least semidefinite. because xrSx = ||Ae||« j, never negative. Semidefinite allows energy I eigenvalues I determinants /pivots of S to be ten Determinant Test and Pivot Test The determinant test is recommended for a small matnx. I will mark the four "leading determinants” Dt, Di. Dt, Dt in this 4 by 4 symmetric second difference matrix. Test 4 S -1 2 -1 2 -1 -1 -1 2 has 1st determinant 2nd determinant 3rd determinant 4th determinant D, =2 £>2 = 3 D3 = 4 D4 = 5 The determinant test is here passed! The energy xTSx must be positive loo. Leading determinants are closely related to pivots (the numbers on the diagonal after elimination). Here the first pivot is 2 The second pivot | appears when l(row 1) is added to row 2. The third pivot j appears when j(new row 2) is added to row 3. Those fractions 1 are ratios of determinants! The last pivot is The Ath pivot equals the ratio * of the leading determinants (sizes k and k - 1) ”fc—1
6 , Symmetric Positive Definite Мжпса 231 Tet 4 П» !»•"“ «" И* S П. ' C"n ’“^1 Sir “ S Л’Л. In («I elimination on b produces an important choice of Л. RcnicmbCT that elimination = iriaHKu^r fa‘ C . . P lo no* has had l's on its diagonal and U contained the pivots. But with symmetric matnees we can balance S as LDLT: 2 -1 -1 2 0 -i put pivots into О - “J Test5 ° "J [’I 41 S-LU (3) S = LDLT (4) Share those pivots between A7 and A Test 3 0 1 am sorry about those square roots—but the pattern 5 - ATA is beautiful: A = y/DLT. ^Elimination factors every positive definite S into Ar A (Ab upper triangular) j This is the Cholesky factorization S ATA with ^pivots on the main diagonal of A. To apply the S “ ATA test when 5 is positive definite, we must find al least one possible A. There are many choices foe A. including (1) symmetric and (2) triangular 1 If S ж QAQT. take square roots of those eigenvalues. Then A « Qy/XQ' “ AT. 2 If S = LU “ LDLr with positive pivots in D. then S (Li/D) (y/T)LT). Summary The 5 tests for positive definiteness of S = S* involve 5 different parts of linear algebra—pivots, determinants, eigenvalues. S « ATA. and energy. Each test gives a complete answer by itself: positive definite or semidefinite or neither. Positive energy xTSx > 0 fa tbe best definition. It connects them all.
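As a quick summary in code, here is a MATLAB sketch (mine) that applies the eigenvalue, determinant, pivot, and Cholesky tests to the 2 by 2 second difference matrix S = [2 -1; -1 2] factored above. The command chol is the one the book uses for S = AᵀA with triangular A.

% Four quick tests on S = [2 -1; -1 2]
S = [2 -1; -1 2];
disp(eig(S)')                     % Test 1: eigenvalues 1 and 3, both positive
disp([S(1,1) det(S)])             % Test 4: leading determinants 2 and 3
[L, U] = lu(S);                   % Test 5: the pivots appear on the diagonal of U
disp(diag(U)')                    % pivots 2 and 3/2, both positive
A = chol(S);                      % Test 3: S = A'A with upper triangular A (Cholesky)
disp(norm(S - A'*A))              % ~ 0, so S = A'A with independent columns in A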
232 Suppose 5 » • *• 5 = • ‘ b ej I will choose an Energy E - **S* The graph of that energy Chapter 6. Eigenvalue Positive Definite Matrices and Minimum ь ” rn>b|( mmetric positive definite 2 by 2 matrix. Apply four of the tests determinants a > O.ae- b2 >0 - л t-^0 =^+M+51/J>0 pivots a>0'(ac-bi\i energy ax2 + 2bry + '.J * 0 3 c = 5 and b = 4. This matrix S has A = 9 and A = ,° I» * function £(z.f) «• bon! opening upwards The bottom tne gmp..« — —- . . = 0. This connects minimum problem book describes numerical minimization^ For the beat problems, the --------------------------like / = r,or/-x2 + v2 Here is the convexity lcsl. ^murnx <^ *<*-*« « define a, all poinlx. We are in high dilw X. but linear algebra .denuf.es the crucial properties of the second denvat.ve matrix. calculus with positive Chapter 8 of this I function f(x) ь strictly cons ex—I The second derivatives of thc energy j zTSz are in thc matrix S For an ordinary function J(x) of one variable x, the lest for a minimum is farnou ... . First derivative <// n Second derivative d2/ Minimum 7- • 0 . . . —~ n ~ is zero di is positive <£ra u d ж = *o For f(x.y) with two variables, the second derivatives go into a matrix : positive definite! Minimum ilf _ 0 and — = 0 and at xo. vo Ox Oy (Pf/dz2 02f/&x0y'\ is positive definite e^f/didy fPf/Ov* at x0, Vo Thc graph of: ’ f(z.y) is flat at that point xo.vo because df/dx = df/dy = 0. The graph goes upwards because the second derivative matrix is positive definite. So we have a minimum point of lhe function f(x,y). Similarly for f(x,y,t). Positive Semidefinite Matrices Often we arc at the edge of positive definiteness. The determinant is zero. The smallest eigenvalue is A = 0. Thc energy in its eigenvector is zT5z = z1 Oz = 0. These matrices on the edge are “positm irmidefiniie". Here are two examples (not invertible): are positive semidefinite but not positive definite
Symmetric Positive Definite Matrices 233 S has eigenvalues 5 and Ota trace i. 1 + 4 = 5. !u upper , ani]0. The rank of S is only 1. This matnx S factors into A1.4 with depfluirn, columns in A: Dependent columns in А Г1 21 fl О] [1 21 Positive semidefinite S [2 41 * 12 0 0 0 = ^T,A‘ If 4 is increased by any small number, the matrix S will become positive definite. The cyclic T also has zero determinant. The eigenvector x - (1,1,1) has Tx = 0 and energy хлТх “ Sectors z in all other directions do give positive energy. Second differences T from first differences A Columns add to (0,0,0) 2 -1 -1 2 —1 -I -1] 1 -1 ж 0 2j [-1 -1 0‘ 1 -1 0 1 1 0-1’ -1 I 0 0 -1 1 Positive semidefinite matrices have all A > 0 and all x^ Sx > 0. Those weak inequalities (> instead of >) include positive definite S along with the singular matrices at the edge. If S is positive semidefinite. so is every matnx ATSA: If x^Sx > 0 for every vector x, then (Ax)TS( Ax) > 0 for every x. We can tighten this proof to show that A* SA is actually positive definite. But we have to guarantee that Az is not the zero vector—<0 be sure that (Ax)TS(Az) is ms zero. Suppose zTSx > 0 and Ax / 0 whenever z is not zero. Then ATSA is positive definite. Again we use the energy test. For every x # 0 we have Ax / 0. The energy in Ax is strictly positive: (Az)T5(Ax) > 0. The matnx ATSA is called "congruent to S. Example 1 The identity matrix S ” I is positive definite. Then we have proved: 1. AT A is positive semidefinite. 2. If A is invertible, then Ar A is positive definite. This was Test 3 for a positive definite matrix It is mentioned again because ATSA is such an important matrix in applied mathematics. We warn to be sure it is positive definite (not just semidefinite). Then the equations ATSAx f in engineering can be solved. Here is an extension called the Law of Inertia If S1" = S has P positive eigenvalues and N negative eigenvalues and Z zero eigenvalues, then the same is true for ATSA—provided A is invertible. The Ellipse ax2 + 2bxy 4- cy2 = 1 Think of a tilted ellipse xTSx = 1. Its center is (0,0). as in Figure 6.2a. Tum it to line up with the coordinate axes (X and Y axes). That is Figure 6.2b. These two pictures show the geometry behind the eigenvalues in Л and the eigenvectors in Q and in S = QAQT. The eigenvector matrix Q lines up the ellipse The tilted ellipse has xTSx = I. The lined-up ellipse has ХТЛХ = 1 with X = Qx.
Chapter 6. Eigenvalues and Ei( 234 rod the axes of this tilted ellipse 5x2 + 8xy + 5V2 = 1. Example 2 Ft nu(nx 5 [haI matches this equation: Solution Sort mth »e poe • . r. The equation is The matrix is S' The eigenvectors are i ] and [J J Divide by Л for unit vectors. Then $ a qAqt Eigenvalues 9 and 1 Now i multiply by [x у] on the left and ['] on the right to get xTSx » (»Tq)a'( r'dl‘ sum oi — .—, - - , , К >/2 / V y/2~J • (6) The coefficients are the eigenvalues 9 and 1 from Л. Inside the squares arc th. «, . (i. - (i. -I)M "“"««««Kn The axes of the tilted ellipse point along those eigenvectors. This ex S « QAQ* is called the “principal axis theorem"—it displays the axes No **’y axis directions (from lhe eigenvectors) but also the axis lengths (from th. * On,y *hc »<ne eigenvalue,) Figure 6.2: The ellipse *TS* = 5г2 + 8ry + 5y2 = 1. Lined up it is ЭХ2 + У* 1. To sec it all. use capital letters for the new coordinates that line up the ellipse: Lined up ЦД = Х and =Y and 9X2 + У2 = 1. y/2 Vi The largest value of X2 is 1/9. The endpoint of the shorter axis has X = 1/3 and Y = 0. Notice: The bigger eigenvalue A( gives lhe shorter axis, of half-length l/УХ? = 1/3. The smaller eigenvalue Aj = 1 gives the greater length 1 /V^a = 1. In the ry system, the axes are along the eigenvectors of S. In the XY system, lhe axes art along the eigenvectors of A—lhe coordinate axes. All comes from S = QAQJ.
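Here is Example 2 in MATLAB (a sketch, mine): the eigenvalues 9 and 1 of S = [5 4; 4 5] give the half-lengths 1/3 and 1 of the axes, and the unit eigenvectors give their directions.

% Axes of the tilted ellipse 5x^2 + 8xy + 5y^2 = 1
S = [5 4; 4 5];
[Q, Lambda] = eig(S);                       % eigenvalues 1 and 9 (ascending order)
half_lengths = 1 ./ sqrt(diag(Lambda));     % 1/sqrt(lambda): lengths 1 and 1/3
disp(half_lengths')
disp(Q)                                     % columns are +-(1,-1)/sqrt(2) and +-(1,1)/sqrt(2)
p = Q(:,2) / sqrt(Lambda(2,2));             % endpoint of the shorter axis (lambda = 9)
disp(p' * S * p)                            % equals 1: the point p lies on the ellipse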
Symmetric Positive Definite Matnees 235 Optimization and Machine Learning This book will end with gradient descent to пип.пии /(x). Ucp t0 Xt41 takes be steepest direction al (he current point «*. But that «еерем direction changes as we descend. This is where calculus meets linear algebra, at lhe minimum point z7 Calculus The partial derivatives of / are all zero at z’ : = 0 dx, Linear algebra The matrix 5 of second derivative, ^L- « posuive definite If S is positive definite (or semtdefinite) at all «u- function /(z) is convex. If the eigenvalues of «Г f ~ (*>•••• »")• «ben the then /(®) I» strictly convex Then are the beuV* ***’*' роы|,*е numbcr s- only one minimum, and gradient descent will find iL Г“Паюп$ to °P“mi“ They have Machine learning produces “loss functions" with hundreds of thousands of variables. They measure the error—which we minimize. But computing al) the second derivatives is barely possible. We use tint derivatives to tell us a direction to move—the error drops fastest in lhe steepest direction. Then we take another descent step in a new direction. This is the central computation in least squares and neural nets and deep learning. All Symmetric Matrices are Diagonalizable This section ends by returning to the proof of the (real) spectral theorem S QAQr- Every real symmetric matrix S can be diagonalized by a real orthogonal matrix Q When no eigenvalues of A are repeated, the eigenvectors are sure to be independent. Then A can be diagonalized. But a repeated eigenvalue can produce a shortage of eigenvectors. Thi, sometimes happens for nonsymmetric matrices. It never happens for symmetric matrices. There are always enough eigenvectors to diagonalize S = ST The proof come, from Schur’s Theorem: Every real square matrix factors into A = QTQ-1 « QTQT for some triangular matrix T If A is symmetric, then 7* = QrAQ is also symmetric. But a symmetric triangular matrix T is actually diagonal. Thu, Schur has found the diagonal matrix we want. If A = S is symmetric, then T = A and 5 = QAQ r as required The website math.mil.edu/esTryone has the proof of Schur*, Theorem. Here we just note that if A can be diagonalized (A = XAX"’) then we can see that triangular matrix T. Use Gram-Schmidt from Section 4.4 to factor X into QR: A = XAX-1 = QRAR~lQ~l = QTQ~' with triangular T = RAR'1.
236 Chapter 6. Eigenvalues and Eii Complex Vectors and \.ja This page allows for complex numbers like 3 + 4t in z and 5. The complex z, « 3 + dr is It = 3 - 4i. Then z, times z, is 32 + 42 = 25 (reaj,. nnjugaic of |z, I = y/25 = 5. Now suppose we have a complex vector x = (Xj, = ^nitu^ J Length squared ^3 _ yT, _ 25 + 2 * 27 II*» 4.v_ Before we move to nutnces, here are the key facts about complex numbers I The conjugate isz=a-ib x + 't ------° ’b. = 2a is real z times I = |x|2 = a2 + b2^ . 7~г~ - — — a ~ tb equals .i-з , - w - j--. I____________________ A real symmetric matrix has 5T = S. With complex numbers, we must change S'7 - Here is a mauix that has S T = S (Hermitian matrix). to S7 2 3-3» 3 + 3i 5 S=S xrSx = real number for complex i 8 and —1= real eigenvalues of S ‘ 2 —A 3-3» dct(S - A/) = det 3 4. Ji 5 - A A2 - 18 = (A - 8) (A +1) f_ _ ir » 3-3iirzil 2ziXi + 512*2 + ITS.-!1' 5 ][i;]'i,(3-3.>1+Il(3 + 3i)xrre*1 W xT5x is real: Equation (7) ends with complex numbers z+z = a+ib+a-ib = 2a (reef). Hermitian matrix ST = S is the complex analog of a real symmetric matrix SF = $. Unitary matrix QT=Q'1 is lhe complex analog of an orthogonal matrix with QT=Q-t The eigenvalues of S = S Г are real. Tbe eigenvalues of a unitary Q have |A| = 1. The eigenvectors q,.....q„ can be complex. Those eigenvectors are still orthogonal. The command S' in MATLAB or Julia automatically returns JT when S is complex. Thc dot product of complex vectors is xTy = *t»i + • • • + znj/n. Example to show two complex dot products xTx = 2 xTy = 1-1 = 0 Those orthogonal vectors z. p are thc eigenvectors of Q = 1' 0 and5= _® ’ .
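The Hermitian example is easy to check numerically; here is a MATLAB sketch (mine). The test vector x is chosen to have length squared 2 + 25 = 27 as above, and the prime in MATLAB is the conjugate transpose, as the text notes.

% Hermitian matrix: real eigenvalues, orthogonal (complex) eigenvectors
S = [2, 3-3i; 3+3i, 5];
disp(norm(S - S'))            % S equals its conjugate transpose: Hermitian
[Q, Lambda] = eig(S);
disp(diag(Lambda)')           % real eigenvalues -1 and 8
disp(abs(Q(:,1)' * Q(:,2)))   % the eigenvectors are orthogonal: q1-bar' * q2 = 0
x = [1+1i; 3-4i];             % a complex vector with |x|^2 = 2 + 25 = 27
disp(x' * x)                  % length squared 27, a real number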
_„,lnc Positive Definite Matrices 237 6 3. WORKED EXAMPLES Test these symmetric matrices S' and T for positive definiteness: вЛА [ 2 -1 01 r . • ’ Solution The pivots of S are 2 and 2 and < “ and 3 and 4. all positive. The eigenv^ucs of $ uppcr feft determinants are 2 That completes three tests. Any one test is । V 2 2 and 2 + x/2. all positive. 1 have three candidates Alt Aa, 4j to . a first difference matrix with 1 and -1 to “ л A “ Positive definite. At is to produce the second difference-1,2,-1 in S: S= A{Ai 2 -1 0 -1 2 -1 °] Г1 -1=0 2j [O -1 0 1 -1 0 1 O' 0 -1 1 -1 0 0 0 1 -1 0 o' 0 1 -1 The three columns of A, are independent Therefore S is positive definite. Aj comes from S = LDLT (the symmetric version of S = LVY Elimination gives the pivots 2, j in D and the multipliers -|,0, -1 in L. Just put A2 = Ly/D. LDLT = 1 “5 0 1 -j 01 1 -J = (L7D)(Lx/D)t = aJa2. This triangular matrix Aj has square roots (not so beautiful). A? is the “Cbolesky factor" of S and the MATLAB command is A = chol(S). In applications, the rectangular Ai is how we build S and this Aj is how elimination breaks it apart Eigenvalues give the symmetric choice A3 = Qy/A.QT. This succeeds because A3 A3 = QAQT = S. All tests show that the —1,2. — 1 matrix S is positive definite. The three choices Ai, A2. A3 give three different ways to split up the energy xTSx: xTSx = 2x^ - 2xi jj + 2x, - 2хэхз + 2xj l|Atx||2 = x? + (x2 - xi)2 + (x3 - xj)2 + x? ||A2x||2 = 2(x! - |xa)’ + |(x2 - §x3)2 + $ x§ ||A3x||a = Ai(q7x)2 + A2(q2 x)2 + As(gjx)2 Rewrite with squares S = A^Ai S — LDLT = AjAa S = QAQT = AjA3 For the second matrix T, the determinant test is easiest. Test on T det T = 4 + 2b - 2b2 = (1 + b) (4 — 2b) must be positive. At b = — 1 and b = 2 we get det T = 0. Between b = -1 and b — 2 this matrix T is positive definite. The comer entry b = 0 in the matrix S was safely between -1 and 2.
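A MATLAB sketch (mine) of two of the factor choices in Worked Example 6.3 A: chol produces the triangular A₂ (the book's A = chol(S)) and the eigenvalues produce the symmetric A₃ = Q√ΛQᵀ. Both satisfy S = AᵀA.

% Two ways to write the -1, 2, -1 matrix as S = A'A
S = [2 -1 0; -1 2 -1; 0 -1 2];
disp(eig(S)')                       % eigenvalues 2-sqrt(2), 2, 2+sqrt(2): all positive
A2 = chol(S);                       % upper triangular Cholesky factor
disp(norm(S - A2'*A2))              % ~ 0
[Q, Lambda] = eig(S);
A3 = Q * sqrt(Lambda) * Q';         % symmetric square root: A3 = A3' and A3*A3 = S
disp(norm(S - A3'*A3))              % ~ 0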
Chapter 6. Eigenvalues and Ei Benve^ Problem Set 6.3 Лс мгос cigenvaJucs * 1 Suppose S ₽ й syinnKlnc When В ------------------ (•) Tnubpo* -45B eigenvalues) when В =---------- (b) AS В » sinular to та1псе5 similar to S look like (_ W( Put (a) and (Ы together eigenvectors and the factors for Q\QT. the Cln 2 2 2 S« 2 0 0 2 0 OJ 0 21 and T- 0 -1 -2 -2 0 2 П 2 3 4 5 6 7 Find all orthogonal matnees that dugonanze J2 J6J. (.) Find a svmmetnc matnx [J ?] that 1ш a negative eigenvalue. <b) How do you known must have, ncgal.ve pivot? (c) How do you know и can t hm two negator eigenvalues? If C is symmetne prove .hat ArCA is also symmetric. (Transpose it.) When A it 6 by 3. what tn the shapes of C and A CA. Find an orthogonal matnx Q that diagonalizes S = [ ’ “j. What is Л ? If Л’ 0 then the eigenvalues of A must be ——• Give an example that hai A / 0 But if Л is symmetric, diagonalize it to prove that A must be a zero matnx Write 5 and 7 in the form A|»jxf + *2*1*1 of 'he spectral theorem QAQT. (keep||xi|| = ||x2|| = j). T- 5 = 8 Every 2 by 2 symmetne matnx is A|X|X^ + A2x2x2 “ A|ft + A2ft. Explain ft + ft = xixf + x2xj - I from columns times rows of Q. Why is ft ft - 0? 9 What arc the agenvalues of A = [_£ ,]? Create a 4 by 4 antisymmetric matrix (Лт = -Л) and verify that all its eigenvalues are imaginary. 10 (Recommended) This matrix Af is antisymmetric and also_________Then all its eigenvalues are pure imaginary and they also have |A| = 1. (||A/x|| = ||x|| for every x so jt Ax|| = |x| for eigenvectors.) Find all four eigenvalues from the trace of Af: 1 -1 0 I 1 1 -1 0 can only have eigenvalues i or - t.
6Л Symmetric Positive Detinue Matrices 239 . Show that this A (symmetric but complex i . 11 ° “Prex> has only one line of eigenvectors: Л*[1 -I] <|,Ч«и1.иЫе „8с<„11ие%х»o.O д! , X » M .ucb. Ч«х»1 prapcn, fa trapk> sroj profeny „ А = A. Then all A s are real and the eigenvectors are orthogonal 12 Find the eigenvector matrices Q for 5 and X for B. Show that X doesn’t collapse at ~ !•evcn lhou*h л = 1 is repeated. Are those eigenvectors perpendicular? 0 d o’ -d 0 1* 5 = d 0 0 Be 0 1 0 have A-l,d.-d. 0 0 1 0 0 d 13 Write a 2 by 2 complex matrix with 3T = S (a “Hermitian matrix"). Find A, and Aj for your complex matrix. Check that Sjzj = 0 (this is complex orthogonality). 14 True (with reason) or fake (with example). (a) A matrix with n real eigenvalues and n real eigenvectors is symmetric. (b) A matrix with n real eigenvalues and n orthonormal eigenvectors is symmetric. (c) The inverse of an invertible symmetric matrix is symmetric. (d) The eigenvector matrix Q of a symmetric matrix is symmetric. 15 (A paradox for instructors) If AAT = ATA then A and AT share the same eigen- vectors (true). A and AT always share lhe same eigenvalues. Find the flaw in this conclusion: A and AT must have the same X and same A. Therefore A equals A r. 16 Are A and В invertible, orthogonal, projections, permutations, diagonalizable ? Which of these factorizations are possible: LU.QFL XAX~1.QЛQT^ 0 A- 0 1 1’ I 1 . 1 I 0 1 1 0 0 0 17 What number 6 in A = [ ? j}] makes A = Q\QT possible? What number will make it impossible to diagonalize A? What number makes A singular? 18 Find all 2 by 2 matrices that are orthogonal and also symmetric. Which two numbers can be eigenvalues of those two matrices? 19 This A is nearly symmetric. But what is the angle between Ле eigenvectors ? Г1 IO"15 1 [0 i + io-*5] has eigenvectors and [?]
Chapter 6. Eigenvalues and & 240 21 22 23 If Аш„ i* u* --------~ larger than A^. What n the first entry о,, °> о - ----------- Suppose Лт = -Л (real aniitymmetnc matrix). Explain these facts abou( * **»• (а) x7 Az = 0 for every real vector z. (b) The eigenvalues of A are pure imaginary. (ct The determinant of A is positive or zero (not negative). For (a). multiply out an example of xT Ax and watch terms cancel n xr(Ax) to -(Ar)Tz. For(b). Ax = Az leads to tTAz = AlT* a дй J* г«*егч shows that zTAz - (*- iy)TA(x + ty) has zero real part. Then (bi hit1 Pa'4a> ncips wi(Jj i If S is symmetric and all its eigenvalues are A = 2, how do you know th о 2/ ? Key point S) mmetry guarantees that S « ^AQ7. What is that Д ? * 'T,UM ** Which tymmetnc maincei S art alto orthogonal? Show why S'2 « / \v possible eigenvalues A ? Then S must be QAQr for which A ? ’hat are thc Problems 24-49 are about tests for positive definiteness. Suppose the 2 by 2 tests a > 0 andac -1? > 0 arc passed by S * f • ь i (I) A| and A, have the tome tign because their product A, Aj equals * (i) That sign is positive because A( + A2 equals___So A. > n «nd Aj q Which of Si. Sj. Sj. S4 has two positive eigenvalues? Use a test don1. Also find an x so that xrStx < 0. so S, is not positive definite.’ C°n’PUte A’«. 24 25 Si - 26 For which numbers b and e is positive definite ? Factor S into LDLT. S-ff ‘I. S = S 27 Write f(x. p) - x2 + 4*v + Зу2 as a difference of squares and find a point (*, v) where / is negative. No minimum al (0,0) even though / has positive coefficients. The function /(*,») = 2zy certainly has a saddle point and not a minimum at (0,0). What symmetric matnx S produces [ г у ]S [' ] = 2ry ? What are its eigenvalues? 29 Test to see if ArA is positive definite in each case: A needs independent columns.
bi Syfflinctnc Positive Detinue Matrices 241 30 31 32 33 34 35 36 37 38 39 Which 3 by 3 symmetric matrices S and T з «1*1-z,z»). Why isTsemidefinite? Compute lhe three upper left determinants <rf c M • . • • Verify that their ratios give the second and ЛЛ P°'"1'e dd"”*ene4S 7 * Ond л,г<1 pivots in elimination. Pivots = ratios of determinants $ 2 2 O’ 2 0 5 3 3 8 For what number» e and d art S and T posiuve definite? Тем their 3 determinants: 1 Г 1 S = c 1 lie and 1 2 3‘ d 4 4 5 2 3 Find a matrix with a > 0 and c>0anda + c>26 that has a negative eigenvalue. If S is positive definite then S 1 is positive definite. Best proof: The eigenvalues of S~' are positive because________Can you use another lest ? A positive definite matrix cannot have a zero (or even worse, a negative number) on its main diagonal. Show that this matrix fails the energy lest zTSz > 0: 4 1 1 *1 0 2 zj is not positive when (zi.zj.za)" ( , , ). 2 & A diagonal entry »t) of a symmetric matnx cannot be smaller than all lhe A's. If it were, then S - в]}1 would have_________eigenvalues and would be positive definite. But S - / has a_______on the main diagonal. Give a quick reason why each of these statemenu is true: (a) Every positive definite matrix u invertible. (b) The only positive definite projection matrix is P = I. (c) A diagonal matrix with positive diagonal entries is positive definite (d) A symmetric matrix with a positive determinant might not be positive definite! For which s and t do S and T have all A > 0 (therefore positive definite)? From S = QAQT compute the positive definite symmetric square root Qy/\QT of each matrix. Check that this square root gives AT .4 = S:
242 40 Chapter 6 bgenvalues find the half-lengths Of iUa aXcs frOfn 41 find S. From S = 4 8 8 25 find С» _k , ‘Wj. From Ся In the Cbolesky factorization S - CTC. with C = >/DLT, the square root. p.HXs are oo the diagonal of C. Find C (upper triangular) for №o,s of ц,с 19 0 0 0 1 2 0 2 SJ [ci»0 Wrnhout multiplying S = К $ (a) the detenmnani of S (c) the eigenvectors of S The graph of a x3 + y3 is a bowl opening upward. The graph of r - xa raddle The graph of г - -X3 - у3 “ • bowl opening downward. What' o, 6, c for x a ox3-t-T&ry + cy3 to have a saddle point at (x.y) ж (o.q)?* Which values of e give a bowl and which c give a saddle point for th r 4т3 + I2xy + cy3? Describe this graph at lhe borderline value of r ' 8r*Ph * When S and T are symmetric positive definite, ST might not even be But stan from STx - Xx and take dot products with Tx Th™ n ,y,nn’cl'ic. ________ 1 ncn Prove A > 0 Suppose C is positive definite (so yTCy > 0 whenever v d 0) and 4 c . dent columns (so Ax / 0 whenever x # 0). Apply the enerjtv test t *T 'f^epen- - S - Importanl ’ Suppose S is positive definite with eigenvalue* A1 > д > (a) What are the eigenvalue* ofthematrixAi/ -$?!* it posn.vr c A” (b) How doe* «follow that AiXT* > xTSx for every x? ' ° П"С? (О Draw this conclusion. The nusimum value of x^Sx/x^x is __ Pur which a and ria this matrix positive definite > For «л к м-Tnidefinitr* /th» k л ' r which a and c is it positjy^ All 5 tests are possible. The energy xTSx equals e(*i+xl+lJ)2 + c(T2_;r3)2 42 43 45 47 45 49 5 = [1 1 I] 11 2 21 [1 2 7] 01 Г сове sin 01 япв cos0j[O 5][-sin0 coe0J'^nd (b) lhe eigenvalues of 5 (d) a reason why S is symmetric positive definite y3he atc«on and
6.4 Systems of Differential Equations

1 If Ax = λx then u(t) = e^{λt}x will solve du/dt = Au. Each λ and x give a solution e^{λt}x.
2 If A = XΛX⁻¹ then u(t) = e^{At}u(0) = Xe^{Λt}X⁻¹u(0) = c₁e^{λ₁t}x₁ + ··· + c_ne^{λ_nt}x_n.
3 Matrix exponential  e^{At} = I + At + ··· + (At)ⁿ/n! + ··· = Xe^{Λt}X⁻¹  if A = XΛX⁻¹.
4 A is stable and u(t) → 0 and e^{At} → 0 when all eigenvalues of A have real part Re λ < 0.
5 Second order  u″ + Bu′ + Cu = 0  is equivalent to a first order system  [u; u′]′ = [0 1; −C −B][u; u′].

Eigenvalues and eigenvectors and A = XΛX⁻¹ were perfect for matrix powers A^k. They are also perfect for differential equations du/dt = Au. This section is mostly linear algebra, but to read it you need one fact from calculus: the derivative of e^{λt} is λe^{λt}. The whole point of the section is this: Constant coefficient differential equations can be converted into linear algebra.

The ordinary equations du/dt = u and du/dt = λu are solved by exponentials:

du/dt = u  produces  u(t) = Ce^t        du/dt = λu  produces  u(t) = Ce^{λt}   (1)

At time t = 0 those solutions start from u(0) = C, because e⁰ = 1. This "initial value" tells us C. The solutions u(t) = u(0)e^t and u(t) = u(0)e^{λt} start from that number u(0).

This section solves du/dt = Au. The unknown is a vector u (now boldface). It starts from the initial vector u(0), which is given. The n equations contain a square matrix A, n by n. We expect n solutions u(t) = e^{λt}x from n eigenvalues and eigenvectors.

System of n equations   du/dt = Au   starting from the vector   u(0) = (u₁(0), ..., u_n(0))   at t = 0.   (2)

These differential equations are linear. If u(t) and v(t) are solutions, so is Cu(t) + Dv(t). We will need n constants like C and D to match the n components of u(0). Our first job is to find n "pure exponential solutions" u = e^{λt}x by using Ax = λx.

Notice that A is a constant matrix. In other linear equations, A changes as t changes. In nonlinear equations, A changes as u changes. We don't have those difficulties: du/dt = Au is "linear with constant coefficients". Those and only those are the differential equations that we will convert directly to linear algebra.
Solution of du/dt = Au

Our pure exponential solution will be e^{λt} times a fixed vector x. You may guess that λ is an eigenvalue of A and x is the eigenvector. Substitute u(t) = e^{λt}x into the equation du/dt = Au to prove you are right. The factor e^{λt} will cancel to leave λx = Ax:

Choose u = e^{λt}x   du/dt = λe^{λt}x   agrees with   Au = Ae^{λt}x   when   Ax = λx.   (3)

All components of this special solution u = e^{λt}x share the same e^{λt}. The solution grows when λ > 0. It decays when λ < 0. If λ is a complex number, its real part decides growth or decay. The imaginary part ω gives oscillation e^{iωt} like a sine wave.

Example 1   Solve du/dt = Au = [0 1; 1 0] u   starting from   u(0) = [4; 2].

This is a vector equation for u. It contains two scalar equations for the components y and z. They are "coupled together" because the matrix A is not diagonal:

du/dt = Au   means that   dy/dt = z   and   dz/dt = y.

The idea of eigenvectors is to combine those equations in a way that gets back to 1 by 1 problems. The combinations y + z and y − z will do it. Add and subtract equations:

d/dt (y + z) = z + y   and   d/dt (y − z) = −(y − z).

The combination y + z grows like e^t, because it has λ = 1. The combination y − z decays like e^{−t}, because it has λ = −1. Here is the point: We don't have to juggle the original equations du/dt = Au, looking for these special combinations. The eigenvectors and eigenvalues of A will do it for us.

This matrix A has eigenvalues 1 and −1. The eigenvectors x are (1, 1) and (1, −1). The pure exponential solutions u₁ and u₂ take the form e^{λt}x with λ₁ = 1 and λ₂ = −1:

u₁(t) = e^{λ₁t}x₁ = e^t [1; 1]   and   u₂(t) = e^{λ₂t}x₂ = e^{−t} [1; −1].   (4)

Complete solution
Combine u₁ and u₂   u(t) = C e^t [1; 1] + D e^{−t} [1; −1] = [Ce^t + De^{−t}; Ce^t − De^{−t}]   (5)

With these two constants C and D, we can match any starting vector u(0) = (u₁(0), u₂(0)). Set t = 0 and e⁰ = 1. Example 1 asked for the initial value to be u(0) = (4, 2):

u(0) decides C and D   C [1; 1] + D [1; −1] = [4; 2]   yields   C = 3 and D = 1.

With C = 3 and D = 1 in the solution (5), the initial value problem is completely solved.
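A quick check of Example 1 in MATLAB (a sketch, mine): the eigenvector solution (5) with C = 3 and D = 1 agrees with expm(A*t)*u(0), the matrix exponential that appears at the end of this section.

% Example 1: du/dt = [0 1; 1 0] u,  u(0) = (4, 2)
A = [0 1; 1 0];
u0 = [4; 2];
t = 0.7;                                        % any time t
u_eig = 3*exp(t)*[1; 1] + 1*exp(-t)*[1; -1];    % C = 3, D = 1 from equation (5)
u_exp = expm(A*t) * u0;                         % matrix exponential solution e^{At} u(0)
disp([u_eig u_exp])                             % the two columns agree
disp(norm(u_eig - u_exp))                       % ~ 0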
For n by n matrices we look for n eigenvectors. Then C and D become c₁, ..., c_n.

1. Write u(0) as a combination c₁x₁ + ··· + c_nx_n of the eigenvectors of A.
2. Multiply each eigenvector xᵢ by its growth factor e^{λᵢt}.
3. The solution to du/dt = Au is the same combination of those pure solutions e^{λt}x:

u(t) = c₁e^{λ₁t}x₁ + ··· + c_ne^{λ_nt}x_n.   (6)

Not included: If two λ's are equal, with only one eigenvector, another solution is needed. (It will be te^{λt}x.) Step 1 needs a basis of n eigenvectors to diagonalize A = XΛX⁻¹.

Example 2   Solve du/dt = Au knowing the eigenvalues λ = 1, 2, 3 of A:

Typical example
Equation for u
Initial condition u(0)
du/dt = [1 1 1; 0 2 1; 0 0 3] u   starting from   u(0) = [9; 7; 4].

The eigenvector matrix X has x₁ = (1, 0, 0) and x₂ = (1, 1, 0) and x₃ = (1, 1, 1).

Step 1   The vector u(0) = (9, 7, 4) is 2x₁ + 3x₂ + 4x₃. Then (c₁, c₂, c₃) = (2, 3, 4).
Step 2   The factors e^{λt} give exponential solutions e^t x₁ and e^{2t}x₂ and e^{3t}x₃.
Step 3   The combination that starts from u(0) is u(t) = 2e^t x₁ + 3e^{2t}x₂ + 4e^{3t}x₃.

The coefficients 2, 3, 4 came from solving the linear equation c₁x₁ + c₂x₂ + c₃x₃ = u(0):

[1 1 1; 0 1 1; 0 0 1] [c₁; c₂; c₃] = [9; 7; 4]   which is   Xc = u(0).   (7)

You now have the basic idea: how to solve du/dt = Au. The rest of this section goes further. We solve equations that contain second derivatives, because they arise so often in applications. We also decide whether u(t) approaches zero or blows up or just oscillates.

At the end comes the matrix exponential e^{At}. The short formula e^{At}u(0) solves the equation du/dt = Au in the same way that A^k u₀ solves the equation u_{k+1} = Au_k. Example 3 will show how "difference equations" help to solve differential equations.

All these steps use the λ's and the x's. This section solves the constant coefficient problems that turn into linear algebra. Those are the simplest but most important differential equations, whose solution is completely based on growth factors e^{λt}.

Second Order Equations

The most important equation in mechanics is my″ + by′ + ky = 0. The term my″ is the mass times the acceleration a = y″. Then by′ is damping and ky is force.
246 Chapter 6 f .ipenvaJues JrKj bgenvm,^ Thu a a seconder equation (it » Abrtaw'r F = ma). It conta.n* lhc denvauve p" = tfy, Л< fl is util linear »nh constant coefficient* m. b, k. In a differential equation* course. lhe method of *olution ia to substitute у = fAt Each derivative of p bring* down a factor A. We want p = c to solve thc equation: m + 6 — + ky = 0 become* (mA’ + 6A + fc) eA‘ о, (K. Everything depend* on rnA3 + ЬХ + к - 0. Thi* equation for A ha* two root* Д, an(j A,. Then the equation for v ha* two pure wilution* y, = e 1 and yy ж rA’'. combination* e>pi + ejpj five lhe complete wilution. Thii U not true if A| ж AJt In a linear algebra course we expect matrices and eigenvalue*. Therefore we turn the scalar equation (with p") into a rertor equation for у and y': First derivative only! Suppose lhe ma*» is m - I. Two equation* for u (y, p') give du/dt = Au: dy/dt = |/ dtf/dl = ~ky - by' converts to - IV1 - f ° ’ll»] л kJ 1-* -M kJ Ли. (9, "The first equation dy/dt - tf is trivial (but true). Thc second is equation (K) connecting y" lo g' and y. Together they connect u' to u Now we solve и' ж Au by eigenvalues of A • A - XI ж J * fc * A ] determinant X1 + ЬХ + к ж 0. The equation for the A’* Is the same aa (M)' It I* still A3 + bX 4- к я 0, since m = 1 The roots A| and Aj are now eigenvaluet of A. Thc eigenvector* and lhe solution arc Hie first component of u(t) ha* p - qr*'1 + r,rA>'-thc same solution a* before. Il can't be anything else. In the second component of u(f) you sec the velocity dy/dt. The 2 by 2 matrix A in (9) is called a companion matrix—a companion to thc equation (8). Example 3 Motion around a circle with y" + p = 0 and у = con t Thia is our master equation with mass rn 1 and stiffness к ~ 1 and d 0: no damping Substitute p r ‘* into if + p 0 lo reach A3 + 1 = 0. The root, are A = I and А я Then half of r'* f e'" gives lhe solution у ж «м|. As a first order system, the initial values p(0) ж |, y'(O) = Ogo into u(0) = (1,0): Use p" = — p The eigenvalues of A are again the same A - I .„d A = -( (no surprise). A is anti- L h (7 m e?rm'tan ? (and 0. -•). The combination that matches t*(0) (1.0) i. J (a-t+r,). Step 2 multiplies the bye" and e ".
Example 3   Motion around a circle with y'' + y = 0 and y = cos t.

This is our master equation with mass m = 1 and stiffness k = 1 and b = 0: no damping. Substitute y = e^(λt) into y'' + y = 0 to reach λ² + 1 = 0. The roots are λ = i and λ = -i. Then half of e^(it) + e^(-it) gives the solution y = cos t.

As a first order system, the initial values y(0) = 1, y'(0) = 0 go into u(0) = (1, 0):

   Use y'' = -y     du/dt = [0 1; -1 0] [y; y'] = Au.

The eigenvalues of A are again the same λ = i and λ = -i (no surprise). A is antisymmetric with eigenvectors x1 = (1, i) and x2 = (1, -i). The combination that matches u(0) = (1, 0) is (1/2)(x1 + x2). Step 2 multiplies the x's by e^(it) and e^(-it).

Step 3 combines the pure oscillations e^(it) and e^(-it) to find y = cos t as expected:

   u(t) = (1/2) e^(it) x1 + (1/2) e^(-it) x2 = [cos t; -sin t].

All good. The vector u = (cos t, -sin t) goes around a circle because cos²t + sin²t = 1. The radius is 1. The period is 2π when u completes a circle.

Figure 6.3: The exact solution u = (cos t, -sin t) stays on a circle. Forward differences Y1, Y2, . . . spiral out from the circle in 32 steps.

Difference Equations

To display a circle on a screen, replace y'' = -y by a difference equation. Here are three choices using Y(t + Δt) - 2Y(t) + Y(t - Δt). Divide by (Δt)² to approximate d²y/dt²:

   F  Forward from n-1      (Y_{n+1} - 2Y_n + Y_{n-1}) / (Δt)² = -Y_{n-1}    (11F)
   C  Centered at time n    (Y_{n+1} - 2Y_n + Y_{n-1}) / (Δt)² = -Y_n        (11C)
   B  Backward from n+1     (Y_{n+1} - 2Y_n + Y_{n-1}) / (Δt)² = -Y_{n+1}    (11B)

Figure 6.3 shows the exact y(t) = cos t completing a circle at t = 2π. The three difference methods don't complete a perfect circle in 32 time steps of length Δt = 2π/32. The spirals in those pictures will be explained by the eigenvalues λ for 11F, 11B, 11C:

   Forward |λ| > 1 (spiral out)    Centered |λ| = 1 (best)    Backward |λ| < 1 (spiral in)

The 2-step equations (11) reduce to 1-step systems U_{n+1} = A U_n. Instead of u = (y, y') the unknown is U_n = (Y_n, Z_n). We take n time steps of size Δt starting from U_0:

   Forward (11F)   Y_{n+1} = Y_n + Δt Z_n
                   Z_{n+1} = Z_n - Δt Y_n     becomes   U_{n+1} = [1 Δt; -Δt 1] U_n.    (12)

Those are like Y' = Z and Z' = -Y. They are first order equations. Now we have a matrix. Eliminating Z would bring back the second order equation (11F).
My question is simple: Do the points (Y_n, Z_n) stay on the circle Y² + Z² = 1? No, they are growing to infinity in Figure 6.3. We are taking powers A^n and not e^(At), so we test the magnitude |λ| and not the real part of λ. The key is the eigenvalues!

   Eigenvalues of A    λ = 1 ± iΔt.   Then |λ| > 1 and (Y_n, Z_n) spirals out.

The backward choice in (11B) will do the opposite in Figure 6.4. Notice the new A:

   Backward   Y_{n+1} = Y_n + Δt Z_{n+1}
              Z_{n+1} = Z_n - Δt Y_{n+1}    is    [1 -Δt; Δt 1] [Y_{n+1}; Z_{n+1}] = [Y_n; Z_n] = U_n.    (13)

That matrix has eigenvalues 1 ± iΔt. But we invert it to reach U_{n+1} from U_n. Then |λ| < 1 explains why the solution spirals in to (0, 0) for backward differences.

Figure 6.4: Backward differences (11B) spiral in. Leapfrog (11C) stays near the circle.

On the right side of Figure 6.4 you see 32 steps with the centered choice. The solution stays close to the circle (Problem 28) if Δt < 2. This is the leapfrog method, constantly used. The second difference Y_{n+1} - 2Y_n + Y_{n-1} "leaps over" the center value Y_n in (11C). This is the way a chemist follows the motion of molecules (molecular dynamics leads to giant computations). Computational science is lively because one differential equation can be replaced by many difference equations, some unstable, some stable, some neutral. Problem 26 has a fourth (very good) method that exactly completes the circle.

Real engineering and real physics deal with systems (not just a single mass at one point). The unknown y is a vector. The coefficient of y'' is a mass matrix M, with n masses. The coefficient of y is a stiffness matrix K, not a number k. The coefficient of y' is a damping matrix which might be zero. The vector equation My'' + Ky = f is a major part of computational mechanics.
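A small MATLAB sketch of the forward and backward steps in (12) and (13), taking 32 steps of size Δt = 2π/32 from (1, 0); the forward points drift outside the unit circle and the backward points drift inside:

    dt = 2*pi/32;  N = 32;
    F = [1 dt; -dt 1];                 % forward step (12)
    B = inv([1 -dt; dt 1]);            % backward step: invert the matrix in (13)
    Uf = [1; 0];  Ub = [1; 0];
    for n = 1:N
        Uf = F*Uf;  Ub = B*Ub;
    end
    [norm(Uf)  norm(Ub)]               % forward radius > 1, backward radius < 1
    abs(eig(F))                        % both eigenvalues of F have |lambda| = sqrt(1 + dt^2) > 1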
Stability of 2 by 2 Matrices

For the solution of du/dt = Au, there is a fundamental question. Does the solution approach u = 0 as t → ∞? Is the problem stable, as the energy dies out? A solution that includes e^t is unstable. Stability depends on the eigenvalues of A.

The complete solution u(t) is built from pure solutions e^(λt) x. We know exactly when e^(λt) will approach zero: the number λ must be negative. If the eigenvalue is a complex number λ = r + is, the real part r must be negative. When e^(λt) splits into e^(rt) e^(ist), the factor e^(ist) has absolute value fixed at 1:

   e^(ist) = cos st + i sin st   has   |e^(ist)|² = cos² st + sin² st = 1.

Then |e^(λt)| = e^(rt) controls the growth (r > 0) or the decay (r < 0). The question is: Which matrices have negative real parts for all their eigenvalues?

A is stable and u(t) → 0 when all eigenvalues λ of A have negative real parts. The 2 by 2 matrix A = [a b; c d] must pass two tests:

   λ1 + λ2 < 0    The trace T = a + d must be negative.          (14T)
   λ1 λ2 > 0      The determinant D = ad - bc must be positive.  (14D)

Reason   If the λ's are real and negative, their sum is negative. This is the trace T = a + d. Their product is positive. This is the determinant D. The argument also goes in reverse. If D = λ1 λ2 is positive, then λ1 and λ2 have the same sign. If T = λ1 + λ2 is negative, that sign will be negative.

If the λ's are complex numbers, they must have the form r + is and r - is. Otherwise T and D will not be real. The determinant D is automatically positive, since (r + is)(r - is) = r² + s². The trace T is r + is + r - is = 2r. So a negative trace T means that r < 0 and the matrix is stable. The two tests in (14) are correct.
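A two-line MATLAB check of the tests (14T) and (14D); the sample matrix is an illustrative choice, not from the text:

    A = [-2 3; -1 -1];                        % sample matrix
    stable = (trace(A) < 0) && (det(A) > 0)   % true exactly when both tests pass
    real(eig(A))                              % both real parts are negative (-1.5 and -1.5)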
The Exponential of a Matrix

We want to write the solution u(t) in a new form e^(At) u(0). First we have to say what e^(At) means, with a matrix in the exponent. To define e^(At) for matrices, we copy e^x for numbers.

The direct definition of e^x is by the infinite series 1 + x + (1/2)x² + (1/6)x³ + · · ·. When you change x to a square matrix At, this infinite series defines the matrix exponential e^(At):

   Matrix exponential    e^(At) = I + At + (1/2)(At)² + (1/6)(At)³ + · · ·    (15)
   Its t derivative is Ae^(At):    A + A²t + (1/2)A³t² + · · · = Ae^(At)
   Its eigenvalues are e^(λt):     (I + At + (1/2)(At)² + · · ·) x = (1 + λt + (1/2)(λt)² + · · ·) x

The series converges and its derivative is always Ae^(At). Therefore e^(At) u(0) solves the differential equation with one quick formula, even if there is a shortage of eigenvectors.

This chapter emphasizes how to find u(t) = e^(At) u(0) by diagonalization. Assume A does have n independent eigenvectors, so we can substitute A = XΛX^(-1) into the series for e^(At). Whenever XΛX^(-1) XΛX^(-1) appears, cancel X^(-1) X in the middle:

   Use the series             e^(At) = I + XΛX^(-1) t + (1/2)(XΛX^(-1) t)(XΛX^(-1) t) + · · ·
   Factor out X and X^(-1)           = X [I + Λt + (1/2)(Λt)² + · · ·] X^(-1)
   e^(At) is diagonalized!    e^(At) = X e^(Λt) X^(-1)

e^(At) has the same eigenvector matrix X as A. Then Λ is a diagonal matrix and so is e^(Λt). The numbers e^(λi t) are on the diagonal. Multiply X e^(Λt) X^(-1) u(0) to recognize u(t): X^(-1) u(0) produces the coefficients c, then e^(Λt) multiplies each c_k by its growth factor e^(λk t), and X combines the pure solutions. This solution e^(At) u(0) is the same answer that came in equation (6) from three steps.

Example 4   Use the infinite series to find e^(At) for A = [0 1; -1 0]. Notice that A⁴ = I; then A⁵, A⁶, A⁷, A⁸ will be a repeat of A, A², A³, A⁴. The top right corner has 1, 0, -1, 0 repeating over and over in powers of A. Then t - (1/6)t³ starts the infinite series for e^(At) in that top right corner, and 1 - (1/2)t² starts the top left corner:

   e^(At) = I + At + (1/2)(At)² + (1/6)(At)³ + · · · = [1 - (1/2)t² + · · ·   t - (1/6)t³ + · · · ;  -t + (1/6)t³ - · · ·   1 - (1/2)t² + · · ·]

That matrix e^(At) shows the infinite series for cos t and sin t!

   e^(At) = [cos t   sin t; -sin t   cos t]    (18)
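MATLAB's expm sums that series for us. This quick sketch confirms the rotation matrix in (18) and the fact that it is orthogonal (the value t = 1.3 is an arbitrary choice):

    A = [0 1; -1 0];  t = 1.3;
    E = expm(A*t);                                 % the matrix exponential e^(At)
    norm(E - [cos(t) sin(t); -sin(t) cos(t)])      % matches (18): near zero
    norm(E'*E - eye(2))                            % E is orthogonal: E'E = I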
A = [0 1; -1 0] is an antisymmetric matrix (A^T = -A). Its exponential e^(At) is an orthogonal matrix. The eigenvalues of A are i and -i. The eigenvalues of e^(At) are e^(it) and e^(-it):

1. The inverse of e^(At) is always e^(-At).
2. The eigenvalues of e^(At) are always e^(λt).
3. When A is antisymmetric, e^(At) is orthogonal. Inverse = transpose = e^(-At).

"Antisymmetric" is the same as "skew-symmetric". Those matrices have pure imaginary eigenvalues like i and -i. Then e^(At) has eigenvalues like e^(it) and e^(-it). Their absolute value is 1: neutral stability, pure oscillation, energy is conserved. So ||u(t)|| = ||u(0)||.

If A is triangular, the eigenvector matrix X is also triangular. So are X^(-1) and e^(At). The solution u(t) is a combination of eigenvectors. Its short form is e^(At) u(0).

Example 5   Solve du/dt = Au = [1 1; 0 2] u starting from u(0) at t = 0.

Solution   The eigenvalues 1 and 2 are on the diagonal of A (since A is triangular). The eigenvectors are (1, 0) and (1, 1). Then e^(At) produces u(t) for every u(0):

   u(t) = X e^(Λt) X^(-1) u(0) = [1 1; 0 1] [e^t 0; 0 e^(2t)] [1 -1; 0 1] u(0) = [e^t   e^(2t) - e^t; 0   e^(2t)] u(0).

That last matrix is e^(At). It is nice because A is triangular. The situation is the same as for Ax = b and inverses. We don't need A^(-1) to find x, and we don't need e^(At) to solve du/dt = Au. But as quick formulas for the answers, A^(-1) b and e^(At) u(0) are unbeatable.

Example 6   Solve y'' + 4y' + 3y = 0 by substituting e^(λt) and also by linear algebra.

Solution   Substituting y = e^(λt) yields (λ² + 4λ + 3) e^(λt) = 0. That quadratic factors into λ² + 4λ + 3 = (λ + 1)(λ + 3) = 0. Therefore λ1 = -1 and λ2 = -3. The pure solutions are y1 = e^(-t) and y2 = e^(-3t). The complete solution y = c1 y1 + c2 y2 approaches zero.

To use linear algebra we set u = (y, y'). Then the vector equation is u' = Au:

   dy/dt = y'
   dy'/dt = -3y - 4y'    converts to    du/dt = [0 1; -3 -4] u.

This A is a "companion matrix" and its eigenvalues are again λ1 = -1 and λ2 = -3:

   Same quadratic    det(A - λI) = det [-λ 1; -3 -4-λ] = λ² + 4λ + 3 = 0.

The eigenvectors of A are (1, λ1) and (1, λ2). Either way, the decay in y(t) comes from e^(-t) and e^(-3t). With constant coefficients, calculus leads to linear algebra Ax = λx.
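Looking back at Example 5, the formula for e^(At) is easy to verify (a MATLAB sketch; t = 0.7 is an arbitrary choice):

    A = [1 1; 0 2];  t = 0.7;
    X = [1 1; 0 1];                                % eigenvectors of A in its columns
    E1 = X * diag([exp(t) exp(2*t)]) / X;          % X e^(Lambda t) X^(-1)
    E2 = [exp(t)  exp(2*t)-exp(t); 0  exp(2*t)];   % the formula found in Example 5
    norm(E1 - expm(A*t)) + norm(E2 - expm(A*t))    % both differences are near zero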
Heat diffusion and waves   For three boxes in a row, with the outside held frozen at 0°, heat flows between neighbors at a rate proportional to the temperature difference. From box 2, heat flows left and right at the rates u2 - u1 and u2 - u3, and boxes 1 and 3 also lose heat to the frozen outside. The rate equation is du/dt = Au with the symmetric matrix

   A = [-2 1 0; 1 -2 1; 0 1 -2].

The eigenvectors are orthogonal (proved in Section 6.3 for all symmetric matrices): x1 = (1, √2, 1), x2 = (1, 0, -1), x3 = (1, -√2, 1). All three eigenvalues λ = -2 + √2, -2, -2 - √2 are negative. This A is negative definite and e^(At) decays to zero (stability). The starting temperature u(0) = (0, 2√2, 0) is x1 - x3. The solution is u(t) = e^(λ1 t) x1 - e^(λ3 t) x3. The temperature at the center starts at 2√2 and heat diffuses away from box 2.

But now the eigenvalues λ lead to oscillations when the same A enters the wave equation u'' = Au. The frequencies ω come from ω² = -λ:

   d²/dt² (e^(iωt) x) = A (e^(iωt) x)   becomes   -ω² x = λx   and   ω² = -λ.

Each of the three eigenvectors x gives two solutions e^(iωt) x and e^(-iωt) x. A combination of those six solutions will match the six components of u(0) and u'(0).

Figure 6.5: Heat diffuses away from box 2 (left). Waves travel from box 2 (right).

Example 7   Solve y'' - 2y' + y = 0 by e^(λt). Substituting gives (λ - 1)² e^(λt) = 0 with λ = 1, 1. A differential equations course would propose e^t and te^t as two independent solutions. Here we discover why. Linear algebra reduces y'' - 2y' + y = 0 to a vector equation for u = (y, y'):

   du/dt = [0 1; -1 2] u = Au.

A has a repeated eigenvalue λ = 1, 1 (with trace = 2 and det A = 1). The only eigenvectors are multiples of x = (1, 1). Diagonalization is not possible. This matrix A has only one line of eigenvectors. So we compute e^(At) from its definition as a series:

   Short series    e^(At) = e^(It) e^((A-I)t) = e^t [I + (A - I)t].    (21)

That "infinite" series for e^((A-I)t) ended quickly because (A - I)² is the zero matrix. You can see te^t in equation (21). The first component of e^(At) u(0) is our answer y(t):

   [y; y'] = e^t [I + [-1 1; -1 1] t] [y(0); y'(0)]   gives   y(t) = e^t y(0) - te^t y(0) + te^t y'(0).
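The short series (21) is easy to check in MATLAB (a sketch; t = 0.9 is an arbitrary choice):

    A = [0 1; -1 2];  t = 0.9;                     % repeated eigenvalue lambda = 1, 1
    E1 = exp(t) * (eye(2) + (A - eye(2))*t);       % e^t [ I + (A - I) t ] from (21)
    norm(E1 - expm(A*t))                           % near zero: (A - I)^2 = 0 ends the series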
In linear algebra the serious danger is a repeated eigenvalue: the eigenvectors (1, λ1) and (1, λ2) are the same if λ1 = λ2. Then our eigenvectors don't diagonalize A, and we don't yet have two independent solutions to du/dt = Au. In differential equations the danger is also a repeated λ. After e^(λt) x, a second solution has to be found. It turns out to be u = te^(λt) x.

Problem Set 6.4

1  Find two λ's and x's so that u = e^(λt) x solves

      du/dt = [4 3; 0 1] u.

   What combination u = c1 e^(λ1 t) x1 + c2 e^(λ2 t) x2 starts from u(0) = (5, -2)?

2  Solve Problem 1 for u = (y, z) by back substitution, z before y:

      Solve dz/dt = z from z(0) = -2. Then solve dy/dt = 4y + 3z from y(0) = 5.

   The solution for y will be a combination of e^(4t) and e^t. The λ's are 4 and 1.

3  (a) If every column of A adds to zero, why is λ = 0 an eigenvalue?
   (b) With negative diagonal and positive off-diagonal adding to zero, u' = Au will be a "continuous" Markov equation. Find the eigenvalues and eigenvectors, and the steady state as t → ∞. Solve du/dt = [-2 3; 2 -3] u with u(0) = (4, 1). What is u(∞)?

4  A door is opened between rooms that hold v(0) = 30 people and w(0) = 10 people. The movement between rooms is proportional to the difference v - w:

      dv/dt = w - v   and   dw/dt = v - w.

   Show that the total v + w is constant (40 people). Find the matrix in du/dt = Au and its eigenvalues and eigenvectors. What are v and w at t = 1 and t = ∞?

5  Reverse the diffusion of people in Problem 4 to du/dt = -Au:

      dv/dt = v - w   and   dw/dt = w - v.

   The total v + w still remains constant. How are the λ's changed now that A is changed to -A? But show that v(t) grows to infinity from v(0) = 30.

6  A = [a 1; 1 a] has real eigenvalues but B = [b 1; -1 b] has complex eigenvalues. Find the conditions on a and b (real) so that all solutions of du/dt = Au and dv/dt = Bv approach zero as t → ∞: Re λ < 0 for all eigenvalues.
Chapter 6. Eigenvalues and Ei, 254 8 9 Suppose P is the projection matrix onto the 45° line у = j jn rz eigenvalues? If du/dt = ~Pu (notice minus sign) can you find the limit f are ‘U t — oc slatting from u(0) = (3,1)? ° **(t) at The rabbit population shows fast growth (from 6r) but loss to wolves (f The wolf population always grows in this model (-tr2 would control и Г*”1 '2*") wt’IVcs). dr . dw — « 6r - 2r and — = 2r + и,. Find the eigenvalues and eigenvectors. If r(0) = «(()) = 30 what are the at time t? After a long time, what is the ratio of rabbits to wolves? ° P°pulal'«ns (a) Write (4.0) as a combination c,», + CjXj of these two eigenvectors of (b) The solutioo to du/dt Ли starting from (4.0) is + cye~ttx stituter*1 = cost + isint and e*'1 “ аж/ — tsinf to find u(t). 2 U*>‘ 10 Find Л to change f - 5/ + 4V into a vector equation for и - (у. y>); What are the eigenvalues <* Л? F,nd lhcm a,*° from “ bv> + 4v wi,h V = «*. 11 The solution io g” = 0 is a straight line у - C + Dt. Convert to a matrix equation: — f * = ° 1 * has the solution dt (VJ [0 0J l/J V = ,At v(0) И k(0). This matrix Л has A > 0.0 and it cannot be diagonalized. Find A2 and compute ел‘ = 1 + At + pl/a + •••. Multiply your e* times (y(0).1/(0)) to check the straight line y(t) = y(0) + /(0)t. 12 Substitute у = e* into y" = 6y* - 9y to show that A = 3 is a repeated root. This is trouble; we need a second solution after r3*- The matrix equation is d у _ 0 1 у dt И = Г» б] Show that this matnx has А ж 3.3 and only one line of eigenvectors. Trouble here too. Check that у = Ic31 is a second solution lo g" = 6j/ - 9g.
е 4 Sy»‘ems °*l>l,lCTtn,lal Циа1н>т 255 (a) Write down two familiar functions ' Иорл/л * (W . л.: Find u(t) by using the eigenvalues and eigenvectors of A: u(0) = (3,0). 14 The matrix in this question is skew-symmetric (Ат = -Ay. du 0 c ~b < cu> - bu3 dt “ ° ° ** °* “з “ aui ~ ™i . • ~° OJ t^ = bu|-eu2. (a) Tbe derivative of ||u(t)||2 . u’+u’+u? is 2uiu',+2u2«'2+2u1ui Substitute u'„ Uj, Uj ‘° 8е’ »« Then ||u(t)||2 stays equal to |u(0)||2. (b) A1 — -A maket Q = tAl orthogonal. Prove QT «e“^* from tbe senes fix Q. A particular solution to du/dt = Au - b is u, - A ~1 b. if A is invertible. Thc usual solutions to du/dt = Au give u„ Find the complete solution и = u,+ u,: Questions 16-25 are about lhe matrix exponential eAt. 16 Write five terms of the infinite series for e'*1. Take thc I derivative of each term. Show that you have four terms of AtM. Conclusion: e4,Uo solves u' = Au. 17 The matrix В “ [J ~J] has B2 = 0. Find r®’ from a (short) infinite series. Check that the derivative of e®‘ is Be®. 18 Starting from u(0) lhe solution at time T is eAI u(0). Go an additional time t to reach eAl eAT u(0). This solution at time t + T can also be written as ---------------. Conclusion: eAt times eAT equals________. 19 If A2 = A show that the infinite series produces c 41 =/ + («*- 1)A. 20 Generally eAeB / eBeA. They are both different from tA + ®. Check this for 21 Put A = [ J § ] into the infinite series to find tM. First compute A* and A":
256 Chapter 6. Eigenvalues and Eigci 22 23 24 (Recommended l Give two reasons why the matrix exponential eAt js ncv (a) Write down its inverse. (b) Why are its eigenvalues eXl nonzero? Г *’Пви,«г: find a solution x(t). y(t) thal gets large as t -f oo. To avoid this instabilit v exchanged the two equations to get A < 0. How can this be ? a ^'entist Кн.! - 2У. + Г.-i = -(А/)2Г. can be written as a one-step difference eqUatj Уя+| = Г« + Д/ Z. Г 1 01Г r.+11 r 1 Д/ 1 r Z.^i = Z. - Af r.+t [ Af 1 J [Ze+I J = 0 1 ? J I J Invert the matrix oo the left side to write this as 1/я4] ж AU,. Show that I Choose the large time step A/ = 1 and find the eigenvalues A< and A _ T * 1. 2 ® At of 4. 4= j * has|AI| = |AJ| = l.Showthat4e-JsoueeUoexactJy 25 That leapfrog method in Problem 24 is very successful for small time But find the eigenvalues of A for Af s/2 and 2. Any time step Д/ V*? A/' lead to |A| > I. and lhe powers int/„ A"U0 will explode. 2 W'H A 1 Л -Л -1J and A = borderline unstable 26 A very good idea for / - -Jf » «he trapezoidal method (half forward/half back). Thu may be lhe belt way lo keep (Y„,Zn) exactly on a circle. _ . f 1 1 [ K-+1 1 - ( ’ Д‘/2 1 [ Г" 1 Trapezoidal [ д,/2 , J [ 2Я>1 ] [ -At/2 1 J [ Z„ J ♦ (a) Invert the left matrix lo wnie this equation as l/„+i = AU,. Show that A it ал orthogonal matrix: A1 A I. These points U„ never leave the circle. A « (/ - В)"'(/ + B) is always an orthogonal matrix if BT ж -B. (b) (Optional MATLAB) Take 32 steps from l/0 (1.0)tof732 with At = 2ir/32. Is Un = Ue11 think there is a small error. 27 Explain one of these three proofs that lhe square of eA is c2A. I. Solving with r4 from t = 0 to 1 and then 1 lo 2 agrees with e2A from 0 to 2. 2. The squared series (I + A + + • • • )2 matches I + 2A + + • • • = e2A. 3. If A can be diagonalized then (XeA№,)(XeA№1) = Л>2ЛХ-1.
Notes on a Differential Equations Course

Constant coefficient linear equations are the simplest to solve. This Section 6.4 shows you part of a differential equations course, but there is more.

1. The second order equation mu'' + bu' + ku = 0 has major importance in applications. The exponents λ in the solutions u = e^(λt) solve mλ² + bλ + k = 0:

      Underdamping b² < 4mk    Critical damping b² = 4mk    Overdamping b² > 4mk

   This decides whether λ1 and λ2 are real or repeated or complex. With complex λ = r + iω the solution u(t) oscillates from e^(iωt) as it decays from e^(rt).

2. Our equations had no forcing term f(t). We were finding the "nullspace solution" u_n(t). To u_n(t) we need to add a particular solution u_p(t) that balances the force f(t). This solution can also be discovered and studied by Laplace transform:

      Input f(s) at time s    Growth factor e^(λ(t-s))    Add up outputs at time t.

3. In real applications, nonlinear differential equations are solved numerically. A method with good accuracy is Runge-Kutta. The constant solutions to du/dt = f(u) are u(t) = Y with f(Y) = 0 and du/dt = 0: no movement. Far from Y, the computer takes over.

This basic course is the subject of my textbook (a companion to this one) on Differential Equations and Linear Algebra: math.mit.edu/dela. Video lectures for that book are described there, and a parallel series about numerical solutions was prepared by Cleve Moler:

      ocw.mit.edu/resources/res-18-009-learn-differential-equations-up-close-with-gilbert-strang-and-cleve-moler-fall-2015/
      www.mathworks.com/academia/courseware/learn-differential-equations
7  The Singular Value Decomposition (SVD)

   7.1  Singular Values and Singular Vectors
   7.2  Compressing Images by the SVD
   7.3  Principal Component Analysis
   7.4  The Victory of Orthogonality (and a Revolution)

This chapter develops one idea. That idea applies to every matrix, square or rectangular. It is an extension of eigenvectors, and now we need two sets of vectors: input vectors v1 to vn and output vectors u1 to um. This is completely natural for an m by n matrix. The input vectors v1 to vr are in the row space and u1 to ur are in the column space. Then the matrix A splits into pieces of rank one, with r = rank(A):

   SVD    A = UΣV^T = σ1 u1 v1^T + σ2 u2 v2^T + · · · + σr ur vr^T.

The singular vectors v1 to vr are eigenvectors of A^T A. They give an orthonormal basis for the row space, and the u's give an orthonormal basis for the column space. The matrix A is diagonalized by these two bases: AV = UΣ. Each A vi = σi ui.

When A is a symmetric positive definite matrix, those singular vectors will be the eigenvectors q, and the singular values σi become the eigenvalues of A. If A is not square or not symmetric, then A = UΣV^T replaces S = QΛQ^T.

The SVD is a valuable way to understand a matrix of data. In that case AA^T is the sample covariance matrix, after centering the data and dividing by n - 1 (Section 7.3). Its eigenvalues are σ1² to σr². Its eigenvectors are the u's in the SVD.

Principal Component Analysis (PCA) is totally based on the singular vectors of the data matrix A. The SVD allows wonderful projects, by separating a photograph, a matrix of pixels, into its rank-one components. Each time you include one more piece σi ui vi^T, the picture becomes clearer. Section 7.2 shows examples and a link to an excellent website. Section 7.3 describes PCA and its connection to the covariance matrix in statistics. Section 7.4 shows how it all develops from and depends on one idea: orthogonality.
7.1  Singular Values and Singular Vectors

1  The Singular Value Decomposition of any matrix A is AV = UΣ, that is, A = UΣV^T.
2  The singular vectors in A vi = σi ui are orthonormal: V^T V = I and U^T U = I.
3  The diagonal matrix Σ contains the singular values σ1 ≥ σ2 ≥ · · · ≥ σr > 0.
4  The squares of those singular values are eigenvalues of A^T A and AA^T.

The m by n matrix A has two sets of singular vectors: one set of orthonormal vectors v1, . . . , vn in R^n and a second set u1, . . . , um in R^m. Instead of Sx = λx we will have A v = σ u.

Here is a 2 by 2 unsymmetric example with orthogonal inputs and orthogonal outputs:

   A v1 = [3 0; 4 5] [1; 1] = [3; 9]   and   A v2 = [3 0; 4 5] [-1; 1] = [-3; 1].    (1)

(1, 1) is orthogonal to (-1, 1). And (3, 9) is orthogonal to (-3, 1). Those are not unit vectors but that is easily fixed. The inputs (1, 1) and (-1, 1) need to be divided by √2. The outputs need to be divided by √10. That leaves the singular values 3√5 and √5:

   A v1 = σ1 u1    [3 0; 4 5] (1/√2) [1; 1] = 3√5 · (1/√10) [1; 3]
   A v2 = σ2 u2    [3 0; 4 5] (1/√2) [-1; 1] = √5 · (1/√10) [-3; 1]    (2)

Multiply the singular values σ1 = 3√5 and σ2 = √5 to get σ1 σ2 = 15 = det A.

We can move from vector formulas to a matrix formula. SQ = QΛ becomes AV = UΣ:

   AV = [3 0; 4 5] (1/√2) [1 -1; 1 1] = (1/√10) [1 -3; 3 1] [3√5 0; 0 √5] = UΣ.    (3)

V and U are orthogonal matrices. So if we multiply equation (3) by V^T, A will be alone:

   AV = UΣ   becomes   A = UΣV^T = σ1 u1 v1^T + σ2 u2 v2^T.    (4)

This says everything except how to find V and U and Σ, and what they mean. When equation (4) multiplies v1, orthogonality produces A v1 = σ1 u1.

Key point: Every matrix A is diagonalized by two sets of singular vectors, not one set of eigenvectors.

In the 2 by 2 example, the first piece is more important than the second piece because σ1 = 3√5 is greater than σ2 = √5. To recover A, add the pieces σ1 u1 v1^T + σ2 u2 v2^T:

   (3√5/√20) [1 1; 3 3] + (√5/√20) [3 -3; -1 1] = (1/2) [3 3; 9 9] + (1/2) [3 -3; -1 1] = [3 0; 4 5].
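MATLAB's svd command reproduces this example (a quick check; the signs of the computed singular vectors may differ from the text, which does not matter):

    A = [3 0; 4 5];
    [U, S, V] = svd(A);
    diag(S)'                          % 6.7082 and 2.2361, i.e. 3*sqrt(5) and sqrt(5)
    norm(A*V - U*S)                   % AV = U*Sigma: near zero
    norm(A - S(1,1)*U(:,1)*V(:,1)' - S(2,2)*U(:,2)*V(:,2)')   % the rank one pieces add back to A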
260 Chapter 7. The Singular Value Decomposition (SVD) The Reduced Form of the SVD Thai full form AV = l/E can have a lol of zeros in E when lhe rank of A is sntall and the nullspace of A is large. Those zero* contribute nothing to matrix multiplication. The heart of the SVD is in the first г v’s and u’s and rr’s. We can change AV = to AVr = l/rEr by removing the parts that are sure to produce zeros. This leaves the reduced SVD where Er is now square: (m x n) (n X r) = (m X r) (r x r). Reduced SVD AVr = l/rEK A V\ .. vr “ Ui .. Ur Av, = <r,u, fow spacc column space We still have VrT Vr — I, and Uj Ur = /, from those orthogonal unit vectors v’s and u’s. When Vr and Ur are not square, we can’t have full inverse*: Vr VrT / 1 and UT Uj / /, But A = l/rEr VrT is true. The other multiplications in A = BEVT give only zeros. Example? A=[l 2 2]-[1][3][1 2 2] /3 = l/rErVrT ha* r > 1 and rr, . 3. The rest of l/E VT contributes nothing to A, because of all the zeros in E. The key separation of A mto«7|U| v’ + ••• + o,u,»7 iloPs •*<»|U|»T because the rank is r 1. The Important Fact for Data Science Why is the SVD so important for thi* subject and this book ? Like the other factorizations A = LU and A = QR and S “ QAQT. il separates the matrix into rank one pieces. A special property of the SVD is that those piece* come in order of Importance. The first piece ff|U|»,r when at > a2 u the closest rank one matrix to A. More is true: For every k, the sum of the first к piece* is the rank к matrix that is closest to A. A* = <7|Ui»y + • • • + «raUfctiJ is the best rank к approximation to A “Eckart-Young" If В has rank к then 11A - A*|| < ||A — B||. (6) To interpret that statement you need lo know the meaning of the symbol ||A — B||. This is the "norm" of the matrix A — B. a measure of its size (like the absolute value of a number). The norm could be <T| or <t| + • • • + aT or the square root of tzj + • • • + <r,. Our first job is to discover how U and E and VT can be computed. For a small matrix they come from eigenvectors and eigenvalues of ATA and AAT. For a large matrix, multiplying A by AT is not wise. Two steps are much better: Reduce A to two nonzero diagonals and modify lhe QR algorithm that finds eigenvalues.
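Here is a short MATLAB sketch of the reduced SVD for the rank one example A = [1 2 2] above, using the economy-size option so that the zero parts of Σ never appear:

    A = [1 2 2];                      % rank r = 1, sigma_1 = 3
    [U, S, V] = svd(A, 'econ');       % U is 1 by 1, S is 1 by 1, V is 3 by 1
    S                                 % the single singular value 3
    norm(A - U*S*V')                  % near zero: A = U_r Sigma_r V_r'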
. I Singular Values and Singular Vectors 261 HrM Proof of the SVD The go»1 » A S Wc Wan‘'° “""У °* ,BO ** «< Mn8“lar vecKus, thc us and (hc v's. One way » «nd iho* vectors is lo form the symmetric matnces Лт A and AAT : ATA = (VETl/T) ((JEVT) = yj-Tj-yT P) AAT = (l/EVT) (VETl/7) . иЕЕТцт (S) th (7) and <*) P'oduccd *ynmetric matrices Usually ATA and АЛ1 are diilerent. Both °ht hand sides have thc special form QAQ1 Eigenvectors art In Q - V or Q = U. nB • know from (7) and (8) how V and U and E connect to those symmetric matrices S« wc . -T дТ A and AA • V contains orthonormal eigenvectors of AT A U contains orthonormal eigenvectors of AAT to art the nonzero eigenvalues of both AT A and AAT We are not quite finished, for this reason The SVD requires that Ao* - v*u*. It connects each right singular vector v* to a left singular vector «*. for к « I...r. When I choose the v’s, that choice will decide lhe signs of the u's. If Su Au then also S(-u) " A(-u) and I have lo know the sign to choose More than that, there is a whole plane of eigenvectors when A is a double eigenvalue. When I choose two v's in that plane, then Av « au will tell me both u’s. This is in equation (9). The plan is to start with the v's. Choose orthonormal eigenvectors vi..........v, of ATA. Then choose <r* = v^a- To determine the u’s we require that Av * <ru: v’s then u’s ATAv* = and then u* - — for к - 1 (9) This produces the SVD! Let me check that those vectors ut (10) A A u* ® Aa" i । — n । \ <r* / \ O* / <r* Thc v’s were chosen to be orthonormal. I must check that thc u’s are also orthonormal: т /Av^T/<^ = t^A7Av*) = a* T^= f 1 if7 = k (i|) Oj Ok O) * I ® >f J # was the key to equation (10). The law (AB)C = * in linear algebra. Moving the parentheses is a n;“”kTJ w Notice that (AAT)A = A(ATA) A(Z?C) is thc key to a great many proofs in powerful idea. This is the associative law.
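Equations (9) to (11) translate directly into MATLAB (a sketch using the 2 by 2 example of this section): choose the v's as eigenvectors of A^T A, set σ = √λ, and the vectors u = Av/σ come out orthonormal:

    A = [3 0; 4 5];
    [V, D] = eig(A'*A);
    [lam, idx] = sort(diag(D), 'descend');   % eigenvalues sigma^2, largest first
    V = V(:, idx);
    sigma = sqrt(lam);                       % 3*sqrt(5) and sqrt(5)
    U = [A*V(:,1)/sigma(1), A*V(:,2)/sigma(2)];
    norm(U'*U - eye(2))                      % the u's are orthonormal, as equation (11) proves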
262 Chapter 7. The Singular Value Decomposidcm (SVD) Finally we have to choose the last n — r vectors vr+t lo vn and the last rn — r vec- tors Ur+i to um. This is easy. These v’s and u’s are in the nullspaces of A and ДТ We can choose any orthonormal bases for those nullspaccs. They will automatically be orthogonal to the first v’s in the row space of A and the first u’s in the column space This is because the whole spaces are orthogonal: N(A) -L C(AT) and N(AT) j_ С(Д) The proof of the SVD is complete by that Fundamental Theorem of Linear Algebra Now we have U and V and E in the full sire SVD of equation (1): rn u’s, n v’s You may have noticed that the eigenvalues of ATA are in E1 E, and the same numbers Ox to a, are also eigenvalues of AAT in EET. An amazing fact: BA always has the same nonzero eigenvalues as AB. If В is invertible, BA = B(AB)B~* is similar to AB AB and BA: Equal Nonzero Eigenvalues If A is m by n and В is n by m. AB and BA have the same nonzero eigenvalues Start with ABx = Xx and A / 0. Multiply both sides by B. to get В ABx = АВж This says that Bx is an eigenvector of BA with the same eigenvalue A—exactly what we wanted. We needed A / 0 to be sure that Bx is truly a nonzero eigenvector. Notice that if В is square and invertible, then B_|(BA)B = AB. This says that BA is similar to AB: same eigenvalues. But our first proof allows A and В to be m by n and n by rn. This covers the important example of the SVD when В = AT. In that case AT A and AAT both lead to the r nonzero singular values of A. If m is larger than n. then AB has m — n extra zero eigenvalues compared to BA. Example 1 (completed) Find the matrices U and E and V for A = 3 0 1 4 5 J With rank 2. this A has two positive singular values ai and a3. We will sec that at is larger than Amax = 5, and a3 is smaller than A^,, = 3. Begin with AT A and AAT: Those have the same trace (50) and the same eigenvalues tr, = 45 and rr j = 5. The square roots are <7i = v/45 and o3 = -y/5. Then <7i<r2 = 15 and this is the determinant of A. The key step for V is to find the eigenvectors of ATA (with eigenvalues 45 and 5): [ 25 20 1 Г 1 1 _ Г 1 1 Г 25 20 1 Г -1 1 Г -1 1 I 20 25 J [ 1 J - 45 1 ] [ 20 25 J [ 1 J = 5 [ 1 J Then vi and v2 arc those orthogonal eigenvectors rescaled to length 1. Divide by \/2. Right singular vectors V] = Left singular vectors u, = — Oi
7 I. Singular Value, and Singular Vectors Now compute Av, and Av, which muy u 263 Av, Av'i 'io L i division by s/W makes ut and u2 orthonormal. Then O| = v« an a, expected. The Singular Value Decomposition of A is U times £ times VT. 1 -3 3 1 45 (12) .. ancj у contain orthonormal bases for the column space and the row space of A (both spaces are just R2). The real achievement is that those two bases diagonalize A: 1V equals l/E. The matrix A = UEV7 splits into two rank-one matrices, columns times rows, with x/2 v/10 = v 20. Their sun is Л with V5/v/26 = i ^Tio E = V = <72 Uj Every matrix is a sum of rank one matnees with orthogonal w's and orthogonal us. Orthogonal rows [ 1 1 I and [ 3 -3 ], orthogonal columns (1,3) and (3.-1). To say again: Good codes do not start with AT A and AAr. Instead we produce zeros in A by rotations that leave only two diagonals (and don't affect the singular values). The last page of this section describes a successful way to compute the SVD. Question: If 5 = QAQT is symmetric positive definite, what is its SVD ’ Answer: The SVD is exactly UY,V^ = QAQ1. The matrix U = V = Q is orthogonal. And the eigenvalue matrix Л becomes the singular value matrix £. Question: If S = QAQT has a negative eigenvalue (Sx = -ax), what is the singular value and what are the vectors v and tt ? Answer: The singular value will be a = +o (positive). One singular vector (either u or v) must be — x (reverse the sign). Then Sx = —ax is the same as Sv = <zu. The two sign changes cancel. Question: If A = Q is an orthogonal matrix, why does every singular value equal 1 ? Answer: All singular values are a = 1 because ATA = QrQ = 1- Tbw S — !• But 17 = Q and V = I is only one of many choices for the singular vectors u and v: Q = l/EVT can be Q = QUr <*•“* Q = (Wi
264 Chapter 7 Thc SinSul" Value Dcc°mposition (SVD) Question: Why are all eigenvalues of a square matrix A less than or equal to j Answer: Multiplying by orthogonal matrices U and V T does not change vector length,; ||Ax|| - ||f/EVTx|| = ||EVTx|| < <rt||VTx|| = «nllatll for all x. (|3) An eigenvector has ||Ax|| = |A| ||x||. Then (13) gives |A| ||x|| < <r( ||x|| and |Д| < Question; If A = zyT has rank 1. what are U| and »i andffi ? Check that |A,| < Answer: Thc singular vectors Ui = x/||x|| and Vi = у/ |y|I have length 1. Then «г, B ||x|| |; y|| is the only nonzero number in thc singular value matrix E. Here is thc SVD; 2 V1 Rank 1 matrix xyT = (||«|| llvll) - «1<И ef. Observation The only nonzero eigenvalue of A = xy r is A> «= yTx. Tbe eigenvector is x because (xyT)x = x(yTx) ” A|X. Then the key inequality |А, | < becomes exactly the Schwarz inequality |yTx| < ||x|| ||y 11 The Geometry of the SVD The SVD separates a matrix into Л = УEVT: (orthogonal) x (diagonal) x (orthogonal) In two dimensions we can draw those steps. The orthogonal matrices U and V rotate the plane. The diagonal matrix E stretches it along the axes Figure 7.1 shows rotation times stretching times rotation Vectors x on the unit circle go to Ax on an ellipse Figure 7.1: U and V are rotations and possible reflections. E stretches circle to ellipse. This picture applies to a 2 by 2 invertible matrix (because > 0 and rr2 > 0). First is a rotation of any x lo VTx Then E stretches that vector to EVTx. Then U rotates to Ax = t/EVTx. We kept all determinants positive to avoid reflections. Thc four numbers a, b, c, d in the matrix connect to two rotation angles 9, ф and two numbers oj, <z2 in E. ° ^ ] = [ сов® -тпв 1 Г Oj 1 Г сов* шпф 1 c d ] [а»пв cost? ] [ <7j J [ — sin^ сояф J * ' Question. If the matrix is symmetric then b = c. Now A has only 3 (not 4) parameters. How do tbe 4 numbers 9, ф. <Г|, a2 reduce to 3 numbers for a symmetric matrix? Question 2 If9 = 30°andol = 2 and a2 = 1 and ф = 60°. what is A?
The First Singular Vector v1

This page will establish a new way to look at σ1. The previous pages chose the v's as eigenvectors of A^T A. Certainly that remains true. But there is a valuable way to understand the singular vectors one at a time instead of all at once. We start with v1 and the largest singular value σ1.

   Maximize the ratio ||Ax|| / ||x||.   The maximum is σ1 at the vector x = v1.    (15)

The length of x comes from ||x||² = x1² + · · · + xn², and ||Ax||² is measured the same way.

The ellipse in Figure 7.1 showed why the maximizing x is v1. When you follow v1 across the page, it ends at A v1 = σ1 u1. The longer axis of the ellipse has length ||A v1|| = σ1.

But we aim for an independent approach to the SVD! We are not assuming that we already know U or Σ or V. How do we recognize that the ratio ||Ax|| / ||x|| is a maximum when x = v1? Calculus tells us that the first derivatives must be zero. The derivatives will be easier to compute if we square our function and work with S = A^T A:

   Problem: Find the maximum value λ of   ||Ax||² / ||x||² = x^T A^T A x / x^T x = x^T S x / x^T x.    (16)

This "Rayleigh quotient" depends on x1, . . . , xn. Calculus uses the quotient rule, so we need the partial derivatives of x^T x and x^T S x. They are 2x and 2Sx:

   ∂/∂xi (x^T x) = 2 xi    (17)
   ∂/∂xi (x^T S x) = 2 Σj Sij xj = 2 (Sx)i    (18)

Use the quotient rule for x^T S x / x^T x and set those n partial derivatives of (16) to zero:

   (x^T x) 2(Sx)i - (x^T S x) 2xi = 0   for i = 1, . . . , n.    (19)

Equation (19) says that (Sx)i = λ xi. The number λ is x^T S x / x^T x. Then Sx = λx, and the best x to maximize the ratio in (16) is an eigenvector of S!

   Sx = λx   and the maximum value of   ||Ax||² / ||x||² = x^T S x / x^T x   is an eigenvalue λ of S.

The search is narrowed to eigenvectors of S = A^T A. The eigenvector with the largest λ is x = v1. The eigenvalue is λ1 = σ1². Calculus has confirmed the solution (15) of the maximum problem. That problem has led to σ1 and v1 in the SVD.
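A quick numerical illustration (a sketch, again with the 2 by 2 example): random vectors never beat σ1, and x = v1 attains it:

    A = [3 0; 4 5];
    [U, S, V] = svd(A);
    r = zeros(1, 1000);
    for k = 1:1000
        x = randn(2, 1);
        r(k) = norm(A*x) / norm(x);     % the ratio in (15) for a random x
    end
    [max(r)  norm(A*V(:,1))  S(1,1)]    % sampled maximum <= sigma_1, attained at x = v_1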
For the full SVD, we need all the singular vectors and singular values. To find v2 and σ2, we adjust the maximum problem so it looks only at vectors x orthogonal to v1:

   Maximize ||Ax|| / ||x|| under the condition v1^T x = 0.   The maximum is σ2 at x = v2.

"Lagrange multipliers" were invented to deal with constraints on x like v1^T x = 0. And Problem 9 gives a simple direct way to work with this condition v1^T x = 0.

Every singular vector vk+1 gives the maximum ratio ||Ax|| / ||x|| over all vectors x that are perpendicular to the first v1, . . . , vk. The left singular vectors would come from maximizing ||A^T y|| / ||y||. We are always finding the axes of an ellipsoid and the eigenvectors of the symmetric matrices A^T A or AA^T: all at once or separately.

Computing Eigenvalues and Singular Values

There is a close parallel between computing the eigenvalues in the symmetric problem Sx = λx and computing the σ's and v's of A.

Eigenvalues are the same for S and Q^(-1) S Q = Q^T S Q when Q is orthogonal. So we have limited freedom to create zeros in Q^(-1) S Q (which stays symmetric). If we try for too many zeros in Q^(-1) S, the final Q will destroy them. The good Q^(-1) S Q will be tridiagonal: we can reduce S to three nonzero diagonals.

Singular values are the same for A and Q1^(-1) A Q2, even if Q1 is different from Q2. We have more freedom to create zeros in Q1^(-1) A Q2. With the right Q's, this will be bidiagonal (two nonzero diagonals). We can quickly find Q and Q1 and Q2 so that

   Q^(-1) S Q   = tridiagonal (entries a1, a2, . . . on the main diagonal, b1, b2, . . . beside it)   for the λ's
   Q1^(-1) A Q2 = bidiagonal (entries c1, c2, . . . on the main diagonal, d1, d2, . . . above it)     for the σ's    (20)

The reader will know that the singular values of A are the square roots of the eigenvalues of S = A^T A. And the singular values of Q1^(-1) A Q2 are the same as the singular values of A. Multiply (bidiagonal)^T (bidiagonal) to see tridiagonal.

This offers an option that we should not take. Don't multiply A^T A and find its eigenvalues. This is unnecessary work and the condition of the problem will be unnecessarily squared. The Golub-Kahan algorithm for the SVD works directly on A, in two steps:

1. Find Q1 and Q2 so that Q1^(-1) A Q2 is bidiagonal as in (20).
2. Adjust the shifted QR algorithm to preserve the singular values of this bidiagonal matrix.

Step 1 requires O(mn²) multiplications to put an m by n matrix A into bidiagonal form. Then later steps will work only with bidiagonal matrices. Normally it then takes O(n²) multiplications to find singular values (correct to nearly machine precision). The full algorithm is described in the 4th edition of Golub-Van Loan.

When A is truly large, we turn to random sampling. With very high probability, randomized linear algebra gives accurate results. Most gamblers would say that a good outcome from careful random sampling is certain.
7,|. Singular Value» and Singular Wctu»» problem Set 7.1 267 Find Ar A and A A1 and thc singular vector* г 0 I о 1 0 0 8 ООО ha* rank r . } The eigenvalue* are 0.0.0. Check the equation* zU| = tfiu, and .Ats, - If you remove row 3 of A (all zero*), show that "j«»j and A » niuiv[ e> and <ij don't change. + OjttjvJ. Find thc singular value* and also the eigenvalue* of Д; 0 10' 0 0 8 TuuB ° ° В = ha* rank r _ 3 and determinant —--. I (JOO Compared to A above. eigenvalue» have ch»,cd much more than singular value*. 3 T"* to U ‘ “ «*- 4** Transpose Д a (/EV to sec that A = VE l/T ,«* the opposite way. from n't to ATu* = <rbvfc fork = l............... . 0 ferfc = r + ,........m Multiply .‘tv* = <г*Щ by AT. Divide A1 Av* * njek in equation (9) by n*. Whal are thc u * and и * for thc transpose [3 4 ; 0 5] of our esample main*? 4 When Av* and ATu* »Mt. show that 5 ha* eigenvalue* i»» and --л*: S" [ ДТ о ] h»»«rn*«ton [ “‘J and | ] and tract - 0. The eigenvectors of this symmetric S tell tn the singular vector* of A. Find the eigenvalues and the singular values of this 2 by 2 rnatns A lhe eigenvectors (1,2) and (1, -2) of 4 are not orthogonal. How do you know the eigenvectors V|, Vj of ATA will be orthogonal? Nonce that ATA and AA1 hast thc same eigenvalues A> = 25 and Aj = 0. The two columns of А V - (7E are Art ’ ai«i and Avj = So hope that """"H ' JI * J I •> I “ l> 4I-.J —1-1J The first needs <Т| + 1 « e, and the second need* I - Arc those true? The MATLAB command* .4 « rand(20.401 and В = randn (20,40) produce 20 by 4(1 random matrices. The entries of .4 arc between 0 and I with uniform probability. Thc entries of В have a normal "bell-shaped" probability distribution, thing an svd command, find and graph their singular values to <гя. Why do they have 20 <r’s ?
268 В 9 10 11 12 13 14 15 16 17 Chapter 7. The Singular Value Оесотр<л|11()п (SVQj A symmetric matrix S = ATA has eigenvalues A. to A„and eigenvectors v, 10 Then any vector has the form x = и th + • • • + The Rayleigh quotient is x * Sx _ AiC| 4* * * * 4* Anc% Я(ж) = "x5®-= c? + -’- + c£ Which vector x gives the maximum of Я7 What are the numbers C| to c„ for lha( maximizing vector x 1 Which x gives lhe minimum of R1 To find a2 and v2 from maximizing that ratio Я(х), wc must rule out the first singu |ttr vectors V) by requiring XT«I = 0. What docs this mean for c, 7 Which c’s givc thc new maximum a2 at thc second eigenvector x = v2 "! Find ATA and thc singular vectors in Av> = <hUt and Av2 - o2u2: 2 2 . . [33 Л = [ -1 1 Ond A [ 4 4 ] 1 For this rectangular matrix find Vi,Va,«3 and U|,u3 and ffi,cra. Then write the SVD for A as UWr - (2 x 2)(2 x 3) (3 x 3). л f 1 1 °' A [ 0 1 1 lf(ATA)t> - ff’tt. multiply by A. Move the parentheses to get (AAT)Av «. 1Г v b an eigenvector of ATA. then___b an eigenvector of AAT. Find thc eigenvalues and unit eigenvectors v ।, va of ATA. Then find u, Avt/oi: Л- J 2] andATA-[jJ “] and AAT JJ . Verify that щ is a unit eigenvector of AA^. Complete thc matrices Г/, E, V. (a) Why is live trace of A1 A equal lo the sum of all 7 (b) For every rank-one matrix, why is af = sum of all o^7 If A t/EV1 is a square invertible matrix then A~l = , ______. Thc largest singular value of A*1 is therefore l/<7„,|n(A). The largest eigenvalue is l/|A(A)|mi„. Then equation (13) says that <zmin(A) < |Л(А)|тщ. Suppose A = U£Vr is 2 by 2 with > o2 > 0. Change A by as small a matrix as possible lo produce a singular matrix Ao. Hint: V and V do not change. Why doesn't thc SVD for A + / just use E + /?
7 X Compressing ,ma8« by the SVD 269 7 2 Compressing Images by the SVD 1 An image is a large matrix of gra>K4j 2 When nearby pixels are correlated (not rando(n) "" ₽’Wl J Hags often give simple images. Photograob « к. '°трГС"с4 '--------------------------------------<-an be compressed by the SVD Image processing and compression are тают conTZTT~~~~~~~ in1agcs often uses convolutional neural nets in decn 1^7'°* “** al*rtrx Reto8n ^present part of.be This section will beg.n with stylized images Uke fliM л Then we move to photographs with many more oixek t n *,lh со"Ч>1ги'У- ways to process and transmit signals The ши» к. re . “ ** wam eff,c’cnl «present light/dark and red/greetVblue fw every smJfl X? У * °’ The SVD offers one approach to matnx an»»;—-.' . . sum A of r rank one matrices o,u,WT c>n he геИ,ХеИ <> A by A*. The This section (plus online help) will consider the effect M"n Л* of fc ‘ennk oraphmograph. Section7 3 willexptorenw^exMtjto^whirtwencedioapprox^maic and understand a matnx of data. approximate Stan with flags. More than 30 countnes chore flags with three «npes Those flap have , particularly simple form: easy to compress I found a book called "Flags of the World" and the pictures range from one solid color (Libya*, flag was ent.rely green dunng the Gaddah years) to very completed images. How would those pictures be compressed with minimum loss? The linear algebra answer is: Use the SVD. Notice that 3 stripes still produce rank 1. France has blue-white-red vertical stripes and b w r in its columns. By coincidence lhe German flag is nearly its transpose with the coion Black-White-Red: bbwwrr bbw w rr bbwwrr bbwwrr = T 1 1 1 [bbwwrr] France В В В В В В' В В В В В В W IV IV И’ И’ IV IV IV IV IV IV IV = В в IV IV [111111] Germany bbwwrr 1 В R R R R R R bbwwrr t R R R R R R R Each matrix reduces to two vectors. To transmit those images we can replace № pixels by 2ЛГ pixels. Similarly. Italy is green-white-red and Iceland is green-white-orange. But many many countries make the problem infinitely more difficult by adding a small badge on top of those stripes. Japan has a red sun on a white background and the Maldives have an elegant white moon on a green rectangle oo a white rectangle. Those curved images have infinite rank—compression is still possible and necessary, but not to rank one.
270 Chapter 7. The Singular Value Decomposition (SVDj A few flags slay with finite rank but they add a cross to increase the rank. Here flags (Greece and Tonga) with rank 3 and rank 4. lw° I see four different rows in the Greek flag, but only three columns. Mistakenly, I thought lhe rank was 4. But now I think that row 2 + row 3 - row 1 and the rank of Greece is 3. On the other hand. Tonga’s flag does seem to have rank 4. Tbe left half has four rows: all while-short red-longer red-all red. We can’t produce any row from a linear combination of the other rows. The island kingdom of Tonga has the champion flag of finite rank I Singular Values with Diagonals Three countries have flags with only two diagonal lines Bahamas. Czech Republic, and Kuwait. Many countries have added in stars and multiple diagonals. From my book I can’t be sure whether Canada also has small curves. Il is interesting to find the SVD of this matrix with lower triangular I s—including the main diagonal—and upper triangular O s. Hag with a triangle 10 0 0 110 0 1110 1111 has A-1 « 1 -1 0 0 0 0‘ 1 oo -1 1 0 0-11 0 A has full rank r = AL All eigenvalues are 1. on the main diagonal. Then A has N singular values (all positive, but not equal to 1). The SVD will produce n pieces 0,14,0^ of rank one. Perfect reproduction needs all n pieces In compression lhe small o's can be discarded with no serious loss in image quality. We want to understand the singular values for n ж 4 and also to plot all a'l for large n. The graph on the next page will decide if A is compressed by the SVD. Working by hand, we begin with AAT (a computer would proceed differently): That -1,2,-1 inverse matrix is included because all its eigenvalues have thc form 2 — 2 cos в. We know those eigenvalues! So we know the singular values of A.
   Eigenvalues of AA^T    λ(AA^T) = 1 / (2 - 2 cos θ) = 1 / (4 sin²(θ/2))        Singular values of A    σ(A) = √λ = 1 / (2 sin(θ/2)).

The n different angles θ are equally spaced, which makes this example so exceptional:

   θ = π/(2n+1), 3π/(2n+1), . . . , (2n-1)π/(2n+1)     (n = 4 includes θ = 3π/9, with 2 sin(θ/2) = 1).

The important point is to graph the n singular values of A. They drop off (unlike the eigenvalues of A, which are all 1). But the dropoff is not steep. So the SVD gives only moderate compression of this triangular flag.

Figure 7.2: Singular values of the 40 by 40 triangle of 1's (it is not compressible). The evil Hilbert matrix H(i, j) = (i + j - 1)^(-1) has low effective rank: we must compress it.

The striking point about the graph is that the singular values for the triangular matrix never go below 1/2. Working with Alex Townsend, we have seen this phenomenon for 0-1 matrices with the 1's in other shapes (such as circular). This has not yet been explained.

Image Compression by the SVD

Compressing photographs and images is an exceptional way to see the SVD in action. The action comes by varying the number of rank one pieces σ u v^T in the display. By keeping more terms the image improves.

We hoped to find a website that would show this. By good fortune, Tim Baumann has achieved exactly what we hoped for, and we are grateful for his permission to use his work: https://timbaumann.info/svd-image-compression-demo/
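The computation behind such a demo is just the truncated SVD. Here is a sketch, with the 40 by 40 triangle of Figure 7.2 standing in for an image matrix (any grayscale image matrix could replace it):

    X = tril(ones(40));
    [U, S, V] = svd(X);
    k = 10;
    Xk = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';     % best rank k approximation (Eckart-Young)
    norm(X - Xk) / norm(X)                         % relative error = sigma_(k+1) / sigma_1
    semilogy(svd(X), 'o')                          % the slow dropoff seen in Figure 7.2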
[Two screenshots from the website. Left: the uncompressed image, 600 by 600 (360000 pixels), with the slider at 300 singular values; the compressed size is approximately proportional to 600 × 300 + 300 + 300 × 600 = 360300, a compression ratio of 360000/360300 = 1.00. Right: the same image with the slider at 20; the compressed size is approximately proportional to 600 × 20 + 20 + 20 × 600 = 24020, a compression ratio of 360000/24020 = 14.99. Each view offers a "Show singular values" option. Change the number of singular values using the slider. Click on one of the sample images to compress it, or compress your own images by using the file picker or by dropping them on the page.]
7 2. Compressing ln“₽* by the SVD This is onc of ,hc fivc ,ma8« directly available 273 compression rat». The best ratio depends on the °* °* *“« ttaerrnmes the Mondnan pamtmg arc less complex and allow higher^ “* girl and dK. or the cats. When the computatron of comp,^ Лап Ле city o, the tree we have 80 terms truv with vectors u andvtf d.m^^'fС‘’° * 80 + *• + «0 x 600. You can compress your own images by UMn. /Г* 4lda U *« “ *>• sample images provided on the site, or by droon.„L Z_T рккст" bw,o° below the six One beautiful feature of T.m Baumanns site в that й ’ Stunt results. This book's website malh.mit.edu/ev . “Perates in the browser, with in- please sec that edited site for questions and C*" mc,ude ,dci' from readers. '-,Jmfncnt\ and suggestions problem Set 7.2 1 We usually think that the identity matnx / is as um„L. difficult to compress? Create the matru fora rank ч л, “ ₽'AM*’k Bul wh> “ 1 cro$i ™ JJb/t»ith a hontonta/-vertical 2 These flags have rank 2. Write A and В in any way as .,»T + 12 1! Aswvdan — Apintand w 2 2 2 2 --[J i 11 .12 11 [ 1 3 3 J 3 Now find the trace and determinant of BBT and alio ВтВ in Problem 2. The singular values of В are close to erf = 28 - and - J,. Is В compressible or not? 4 Use [I/, S, V] « svd (A) to find two orthogonal pieces auvT of ASw^s.n 5 A matrix for the Japanese flag has a circle of ones surrounded by all zeros. Suppose the center line of lhe circle (the diameter) has 2.V ones. Then the circle will contain about ir№ ones. We think of the flag as a 1-0 matrix. Its rank will be approximately CN, proportional to What is that number C 2 Hint: Remove a big square submatnx of ones, with corners at ±45® and ± 135°. The rows above the square and the columns to the right of the square are independent Draw a picture and estimate lhe number cJV of those rows. Then C = 2c. 6 Here is one way to start with a function F(r, y) and construct a matrix A Set Aij = F(i/N,j/N). (The indices i and j go from 0 to A' or from -N lo /V.) The rank of A as JV increases should reflect the simplicity or complexity of F. Find the ranks of the matrices for the functions F( = ту and F2 = x + и and F3 = x3 + y3. Then find three singular values and singular vectors for F3. 7 In Problem 6. what conditions on F(x.y) will produce a symmetric matrix S? An antisymmetric matrix A ? A singular matrix Af ? A matrix of rank 2 ?
214 Chapter 7. The Singular Value Decomposition (SVD) 7.3 Principal Component Analysis The “principal components" of A are its singular vectors, thc orthogonal column» and Vj of the matrices U and V. This section aims to apply thc Singular Value Dccomposj' lion A = U'EV'*. Principal Component Analysis (PCA) uses the largest a’s connected to the first u's and v’t lo understand the information in a matrix of data. Wc are given a matrix A. and we extract its most important part Ak (largest tr’g). Лк - <7iUIe]’ + • • • + with rank (Afc) = *. solves a matrix optimization problem—and we start there The closest rank к matrix to A is A*. In statistics we are identifying the rank one pieces of A with largest variance This puts lhe SVD al the center of data science. In that world, PCA is “unsupervised" learning. Our only instructor is linear algebra— the SVD tells us to choose A*. When thc learning is supervised, wc have a big Kt of training data. Deep Learning constructs a (nonlinear!) function F to correctly classify most of that data. Then we apply this F to new data, as you will see in Chapter 8. Principal Component Analysis is based on matrix approximation by A*. The proof that A* is lhe best choice was begun by Schmidt (1907). He wrote about operators in function space; his ideas extend directly to matrices. Eckart and Young gave a new proof (using thc Frobenius norm to measure A - A*). Then Mirsky allowed any norm ||A|| that depends only on the singular values—as in the definitions (2), (3). and (4) below. Here is that key property of lhe special rank к matrix A* a। ut vf + • • • + vj. A* is closest to A If В has rank к then ЦЛ —Afc|| < ||A - B||. (|) Three choices for the matrix norm ||A|| have special importance and their own names: Spectral norm ||A|| = max« tri (often called the Z3 norm) (2) Frobenius norm ||A||r = v/a?+ •••+ (7) also defines ||A||jr (3) Nuclear norm ||A||/v = <7i + <7a + ••• + <7r (the trace norm) (4) These norms have different values already for thc n by n identity matrix: ll/lh-i l|/|k = n. (5) Replace I by any orthogonal matrix Q and the norms stay the same (because all at = 1): 1Ю111-1 H<?llF = t/n IIQII/V-n. (6) More than this, the spectral and Frobenius and nuclear norms of any matrix stay the same when A is multiplied (on either side) by an orthogonal matrix. So the norm of A =(/EV1 equals thc norm of £: ||A|| = ||S|| because I/and V are orthogonal matrices.
275 Norm of a Matrix Wc need a way to measure the size of a vector or. matnx . * is the usual length ||tr||. For. matnx, FmbeatnsexcILlT T*T’ S** H .x -K. in Л. ™. ........... 11,ll> = 4 + - + ^ 1М&-4>+-+<+...+4|1....+< Clearly |M 7 Iе! IMI- Similarly ||A||r > 0 and |lcA||r = lei ||A||r Equally essential is the tnanglc inequality for tr + w and A~+ fl: Triangle Inequalities ||v + w|| < ||v|| + ||w|| „j цл + дцг < цдц^ + (gj We use one more fact when we meet dot products vT w or matrix products AB : Schwarz inequalities |vTw| < ||tr||||w|| and ||АВ||Г < ||A||f||B|If W That Frobenius matrix inequality comes directly from the Schwarz vector inequality: |(AJ3)y|’ < Hrowiof A||2 Ucolumn j of B||2. Add for all 1 and) to see ||A0||^. This suggests that there could be a dot product of matrices. It is A • В trace! AT B). Note. The largest size |A| of the eigenvalues of A is nor on acceptable norm' We know that a nonzero matrix could have al) zero eigenvalues—but its norm ||A|| b not allowed to be zero. In this respect singular values are superior 10 eigenvalues The Eckart-Young Theorem The theorem was in equation (I): И В has rank к then ||A - A.|| < IIA - ВЦ- In all three norms ||A|| and ||A||F and ||A||a. we come closest to A by cutting off thc SVD after к terms. The closest matrix is A* оiat।t>] + • • • + • This to the fact to use in approximating A by a low rank matnxI We need an example and it can look extremely simple: a diagonal matnx A. The rank 2 matrix closest to A = 0 3 0 0 0 0 2 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ' 0 0 0 0 0 The difference A - A, is all zero except foe the z ano ь. ™ How could any other rank 2 matrix be closer io A than this Aj . Пез. »«"*9.T QiAQi. The norms and the rank are n 1ие.д 3 2,1. The best approximation So this example includes all matnces wit singu 2019 book Linear Algebra and A2 keeps 4 and 3. Several proofs art Li has simplified Mirsky's Learning from Data (Wellesley-Cambndge Pte^Ou К g proof that Ak is closest to A, for all norms that depend only on the
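The norms in (2), (3), (4) and the Eckart-Young error are easy to compute for the diagonal example above (a MATLAB sketch):

    A = diag([4 3 2 1 0]);
    s = svd(A);
    spectral  = max(s)                 % same as norm(A)
    frobenius = norm(s)                % same as norm(A, 'fro')
    nuclear   = sum(s)                 % the trace norm
    Ak = diag([4 3 0 0 0]);            % keep the two largest singular values
    [norm(A - Ak)  s(3)]               % the spectral error ||A - A_2|| equals sigma_3 = 2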
276 Chapter 7. The Singular Value Dccompositlon (SVD) Principal Component Analysis Now we start using the SVD. The matrix .4 is full of data. We have n samples. por each sample we measure m variables (like heigh! and weight). Thc data matrix Ao has n columns and rn rows. In many applications it is a very large matrix. The first step is lo find the average (lhe sample mean) along each row of Ло. Subtract that mean from all m entries in the row. Now each row of the centered matrix Д has mean zero. The columns of A are n points in R Because of centering, the sum of the n column vectors is zero. So lhe average column is thc zero vector. Often those n points are clustered near a line or a plane or another low-dimensional subspace of R™. Figure 73 shows a typical set of data points clustered along a line in R2 (after centering Ao to shift the points left-right and up-down to have mean (0,0) in A). How will linear algebra find that closest line through (0,0) 7 It is in the direction of the first singular vector u> of A. This is the key point of PCA ! A is 2 x n (large nullspace) AAT is 2 x 2 (small matrix) ATA is n x n (large matrix) Two singular values O\ > > 0 Figure 73: Data points (columns of A) are often close to a line in R2 or a subspace in R"'. The Geometry Behind PCA The best line in Figure 73 solves a problem in perpendicular least squares. This is also called orthogonal regression. It is different from the standard least squares fit to n data points, or thc least squares solution to a linear system Ax « b. That classical problem in Section 43 minimizes ||Ax — b||a. It measures distances up and down to the best line. Our problem minimizes perpendicular distances. Thc older problem leads to a linear equation ATAx - AJb lot the best x. Our problem leads to singular vectors tq (eigenvectors of AAT). Those are the two sides of linear algebra: not the same side. The sum of squared distances from the data points to the uj line is a minimum. To see this, separate each column a, of A into its components along U) and u2: E 11‘bll* = E l°tT“'l2 + Ё > । i The sum on the left is fixed by the data. The first sum on the right has terms u7°jo7u,‘ It adds to uf(AAr)U| So when we maximize that sum in PCA by choosing the top eigenvector U! of AA , we minimize the second sum. That second sum of squared distances from data points to the best line (or best subspace) is the smallest possible.
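For the age-height data just given, the whole PCA computation fits in a few MATLAB lines (a sketch; the sign of the computed u1 may come out reversed, which does not change the line):

    A = [3 -4 7 1 -4 -3; 7 -6 8 -1 -1 -7];     % already centered: each row adds to zero
    S = A*A' / (6 - 1)                         % sample covariance matrix: [20 25; 25 40]
    [U, Sig, V] = svd(A);
    u1 = U(:, 1)                               % first principal component, close to (0.6, 0.8)
    (Sig(1,1)^2 + Sig(2,2)^2) / 5              % total variance = trace of S = 60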
7 3. Principal Component Analyse 277 The Geometric Meaning of Eckart-Young c eure 7-3 *as *n lwo dimensions and it led to the closest line. Now suppose our data matrix Ao is 3 by n Three measurements like age. height, weight for each of n samples. A ain we center each row of the matrix, so all the rows of A add to zero And the points move into three dimensions. V^'e can still look for the nearest line. It will be revealed by the first singular vector ui , -pu best line will go through (0,0,0). But if the data points fan out compared to °' re 7 3. we really need to look for the best plane. The meaning of “best" is still this: Tlw sum of perpendicular distances squared to the best plane is a minimum That plane will be spanned by the singular sectors u> and Uj. This is the meaning of Eckart-Young. It leads to a neat conclusion: The best plane contains the best line. The Statistics Behind PCA The key numbers in probability and statistics are the mean and variance The “mean" is an average of the data (in each row of Ao) Subtracting those means from each row of Ao produced lhe centered A. The crucial quantities arc the “variances" and “covariances". The variances are sums of squares of distances from the mean—along each row of A. The variances are the diagonal entries of lhe matrix A AT. Suppose the columns of A correspond lo a child's age on the x-axis and its height on the у-axis. (Those ages and heights art measured from the avenge age and height.) We are looking for the straight line that slays closest to the data points in the figure. And wc have to account for the joint age-height distribution Of the data. The covariances are the off-diagonal entries of lhe matrix AAr. Those are dot products (row i of A) • (row j of A). High covariance means that increased height goes with increased age. (Negative covariance means that one variable increases when lhe other decreases.) Our first example has only two rows from age and height: the symmetric matrix AAT is 2 by 2. As the number n of sample children increases, we divide by n — 1 to give AAT its statistically correct scale. The factor is n — 1 because one degree of freedom has already been used for mean 0. This example with six ages and heights is already centered to make each row add to zero: 3-4 7 1-4-3 1 7-6 8-1 -I -7 j For this data, the sample covariance matrix S is easily computed. It is positive definite. Example Variances and covariances $ = 6 _ j
278   Chapter 7. The Singular Value Decomposition (SVD)

The two orthogonal eigenvectors of S are u1 and u2. Those are the left singular vectors (often called the principal components) of A. The Eckart-Young theorem says that the vector u1 points along the closest line in Figure 7.3. The second singular vector u2 will be perpendicular to that closest line.

Important note   PCA can be described using the symmetric S = AA^T/(n - 1) or the rectangular A. No doubt S is the nicer matrix. But given the data in A, computing S can be a computational mistake. For large matrices, a direct SVD of A is faster and more accurate. By going to AA^T we square sigma_1 and sigma_r and the condition number sigma_1/sigma_r.

In the example, S has eigenvalues near 57 and 3. Their sum is 20 + 40 = 60, the trace of S. The first rank one piece sqrt(57) u1 v1^T is much larger than the second piece. The leading eigenvector u1 ~ (0.6, 0.8) tells us that the closest line in the scatter plot has slope near 8/6. The direction in the graph nearly produces a 6 - 8 - 10 right triangle.

The Linear Algebra Behind PCA

Principal Component Analysis is a way to understand n sample points in m-dimensional space—the data. That data plot is centered: all rows of A add to zero. The crucial connection to linear algebra is in the singular values sigma_i and the left singular vectors u_i of A. Those come from the eigenvalues lambda_i = sigma_i^2 and the eigenvectors of the sample covariance matrix S = AA^T/(n - 1). The total variance in the data comes from the squared Frobenius norm of A:

   Total variance   T = ||A||_F^2 /(n - 1) = (||a_1||^2 + ... + ||a_n||^2)/(n - 1).   (11)

This is the trace of S—the sum down the diagonal. Linear algebra tells us that the trace equals the sum of the eigenvalues of the sample covariance matrix S.

The SVD is producing orthogonal singular vectors u_i that separate the data into uncorrelated pieces (with zero covariance). They come in order of decreasing variance, and the first pieces tell us what we need to know. The trace of S connects the total variance to the sum of variances of the principal components u_1, ..., u_r:

   Total variance   T = sigma_1^2 + ... + sigma_r^2.   (12)

The first principal component u1 accounts for (or "explains") a fraction sigma_1^2/T of the total variance. The next singular vector u2 of A explains the next largest fraction sigma_2^2/T. Each singular vector is doing its best to capture the meaning in a matrix—and all together they succeed.

The point of the Eckart-Young Theorem is that k singular vectors (acting together) explain more of the data than any other set of k vectors. So we are justified in choosing u1 to uk as a basis for the k-dimensional subspace closest to the n data points.

The "effective rank" of A and S is the number of singular values above the point where noise drowns the true signal in the data. Often this point is visible on a "scree plot" showing the dropoff in the singular values (or their squares sigma_i^2). Look for the "elbow" in the scree plot (Figure 7.2) where the signal ends and noise takes over.
7.3. Principal Component Analysis   279

Problem Set 7.3

1   Suppose A0 holds these 2 measurements of 5 samples:

       A0 = [  5   4   3   2   1 ]
            [ -1   1   0   1  -1 ]

    Find the average of each row of A0 and subtract it to produce the centered matrix A. Compute the sample covariance matrix S = AA^T/(n - 1) and find its eigenvalues lambda_1 and lambda_2. What line through the origin is closest to the 5 samples in the columns of A?

2   Take the steps of Problem 1 for this 2 by 6 matrix A0:

       A0 = [ 1  0  1  0  1  0 ]
            [ 1  1  2  3  3  2 ]

    The sample variances and the sample covariance s_12 are the entries of S. Find S after subtracting row averages from A0. What is sigma_1?

3   From the eigenvectors of S = AA^T, find the line (the u1 direction through the center point) and then the plane (the u1 and u2 directions) closest to these four points in three-dimensional space:

       A = [ 1  -1   0   0 ]
           [ 0   0   2  -2 ]
           [ 1   1  -1  -1 ]

4   Compare ordinary least squares (Section 4.3) with PCA (perpendicular least squares). They both give a closest line C + Dt to the symmetric data b = -1, 0, 1 at times t = -3, 1, 2.

       Least squares: A^T A x = A^T b        PCA: eigenvector of AA^T (singular vector u1 of A)

5   The idea of eigenfaces begins with N images: same size and alignment. Subtract the average image from each of the N images. Create the sample covariance matrix S = sum A_i A_i^T / (N - 1) and find its eigenvectors (= eigenfaces) with the largest eigenvalues. They don't look like faces, but their combinations come close to the N faces. Expressing each image in the eigenface basis gives a code for this dimension reduction.

6   What are the singular values of A - A3 if A has singular values 5, 4, 3, 2, 1 and A3 is the closest matrix of rank 3 to A?

7   If A has sigma_1 = 9 and B has sigma_1 = 4, what are upper and lower bounds to sigma_1 for A + B? Why is this true?
280   Chapter 7. The Singular Value Decomposition (SVD)

7.4 The Victory of Orthogonality (and a Revolution)

If I look back at the linear algebra in this book, orthogonal matrices have won. You could say that they deserved to win. The key to their success goes all the way back to Section 1.2 on lengths and dot products. Let me recall some of their victories and add new ones.

1   The length of Qx equals the length of x: (Qx)^T(Qx) = x^T Q^T Q x = x^T x = ||x||^2.
2   The dot product (Qx)^T(Qy) equals the dot product x^T y: x^T Q^T Q y = x^T y.
3   All powers Q^n and products Q1 Q2 ... Qk of orthogonal matrices are orthogonal.
4   The projection matrix onto the column space of Q (m by n) is QQ^T = (QQ^T)^2.
5   The least squares solution to Qx = b (m > n) is x = Q^T b = Q^+ b (pseudoinverse).
6   The eigenvectors of a symmetric matrix S can be chosen orthonormal: S = Q Lambda Q^T.
7   The singular vectors of every matrix are orthonormal: A = Q1 Sigma Q2^T = U Sigma V^T.
8   The pseudoinverse of U Sigma V^T is V Sigma^+ U^T. The nonzeros in Sigma^+ (n by m) are 1/sigma_1, ..., 1/sigma_r.

That list shows something important. The success of orthogonal matrices is tied to the sum of squares definition of length: ||x||^2 = x^T x. In least squares, the derivative of ||Ax - b||^2 leads to a symmetric matrix S = A^T A. Then S is diagonalized by an orthogonal matrix Q (the eigenvectors). A is diagonalized by two orthogonal matrices U and V. And here is more about orthogonal matrices: A = QS and A = QR.

9   Every invertible matrix equals an orthogonal Q times a positive definite S.

       Polar Decomposition   A = U Sigma V^T = (U V^T)(V Sigma V^T) = QS   (1)

S is like a positive number r, and Q is like a complex number e^{i theta} of magnitude 1. Every complex number x + iy can be written as e^{i theta} times r. Every matrix factors into Q times S. The square root of (x - iy)(x + iy) is r. The square root of A^T A = V Sigma^2 V^T is S = V Sigma V^T.

Example   Q = U V^T and S = V Sigma V^T come from the SVD of A in Section 7.1.

10  Every invertible matrix equals an orthogonal Q times an upper triangular R.

Example   Q and R come from the Gram-Schmidt algorithm in Section 4.4.
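As a check on the polar decomposition (1), here is a minimal sketch (not from the book) that builds Q = U V^T and S = V Sigma V^T from the SVD of an invented test matrix, assuming NumPy:

```python
import numpy as np

# Polar decomposition A = QS: Q = U V^T (orthogonal), S = V Sigma V^T (positive definite).
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))              # an invertible matrix (with probability 1)

U, sigma, VT = np.linalg.svd(A)
Q = U @ VT                               # orthogonal factor
S = VT.T @ np.diag(sigma) @ VT           # symmetric positive definite factor

print(np.allclose(A, Q @ S))             # True: A = QS
print(np.allclose(Q.T @ Q, np.eye(4)))   # True: Q is orthogonal
print(np.linalg.eigvalsh(S) > 0)         # all True: S is positive definite
```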
7.4. The Victory of Orthogonality (and a Revolution)   281

Householder Reflection Matrices

Here is a neat construction of orthogonal matrices. Instead of rotations (determinant = 1), these are reflections (determinant = -1). Each matrix H is determined by a unit vector u:

   Reflection matrix   H = I - 2uu^T

Clearly H^T = H. Verify that H is an orthogonal matrix: H^T H = I because u^T u = 1:

   H^T H = (I - 2uu^T)(I - 2uu^T) = I - 4uu^T + 4uu^T uu^T = I.   (2)

One eigenvector of H is u itself, with eigenvalue lambda = -1: Hu = u - 2u(u^T u) = -u. The other eigenvectors x fill the hyperplane u^T x = 0 that is orthogonal to u. Then u^T x = 0 leads to Hx = x - 2uu^T x = x, so lambda = 1.

Notice:  The eigenvalues 1 and -1 are real, since H is symmetric.
         The eigenvalues have |lambda| = 1, since H is orthogonal.
         The eigenvalues are lambda = 1 (n - 1 times) and lambda = -1 (one time).
         The determinant of H is -1, from multiplying the lambda's.

Examples   u = (1/sqrt(2)) [1, -1]^T leads to the permutation H = I - [ 1 -1 ; -1 1 ] = [ 0 1 ; 1 0 ].

           u = [cos theta, sin theta]^T leads to the reflection
           H = I - 2 [ cos^2 theta        cos theta sin theta ]  =  [ -cos 2theta  -sin 2theta ]
                     [ cos theta sin theta    sin^2 theta     ]     [ -sin 2theta   cos 2theta ]

Both examples have determinant -1. The neatest formula is the answer to this question: If ||a|| = ||r||, which matrix H reflects a into Ha = r? Choose the unit vector u = v/||v|| with v = a - r. Then Ha = (I - 2uu^T) a = r.

This leads to an error-free algorithm that factors A into QR: orthogonal times triangular. Q will be a product H_n ... H_2 H_1 of Householder reflections. One column at a time, we choose H_j to produce the desired column r_j in R. We keep a record of the vectors u_j that lead to each H_j (storing vectors, not matrices). When we need the matrices H_j (to apply Q or Q^T), we just use the u_j to recover them. This idea can replace Gram-Schmidt for A = QR.

Long ago Euler found another way to produce all orthogonal matrices. He rotated in some order around the x and y and z axes (three plane rotations). To an airline pilot those three rotations are roll and pitch and yaw.

Now we show that orthogonality also wins in "function space". The vectors q become functions q(x). The dot products become integrals of f g dx. The dimension becomes infinite.
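Returning to the Householder construction above, here is a minimal sketch of the QR idea (not the book's code), assuming NumPy. One reflection is applied per column, and only the vectors u_j are stored:

```python
import numpy as np

def householder_qr(A):
    """Factor A (m x n, m >= n) into Q R with Householder reflections.
    Only the reflection vectors u_j are stored; Q is rebuilt at the end."""
    m, n = A.shape
    R = A.astype(float).copy()
    us = []
    for j in range(n):
        x = R[j:, j].copy()
        v = x.copy()
        v[0] += np.sign(x[0]) * np.linalg.norm(x)          # reflect x onto a multiple of e1
        u = v / np.linalg.norm(v)
        us.append(u)
        R[j:, j:] -= 2.0 * np.outer(u, u @ R[j:, j:])      # apply H = I - 2uu^T to the block
    Q = np.eye(m)
    for j in reversed(range(n)):                           # recover Q from the stored u_j
        Q[j:, :] -= 2.0 * np.outer(us[j], us[j] @ Q[j:, :])
    return Q, np.triu(R)

A = np.random.default_rng(2).normal(size=(5, 3))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))   # True True
```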
282   Chapter 7. The Singular Value Decomposition (SVD)

Calculus: Vectors Become Functions

This is a book about linear algebra (for matrices). Orthogonality is just as important in calculus (for functions). Linear combinations of functions produce a function space. Lengths ||f|| are still the square roots of inner products. If the inner product of two vectors is a sum, the inner product of two functions f(x) and g(x) is an integral:

   Inner product   (f, g) = integral of f(x) g(x) dx      Length   ||f||^2 = (f, f) = integral of |f(x)|^2 dx   (3)

The two great inequalities of mathematics extend from vectors to functions:

   | integral of f(x) g(x) dx | <= ||f|| ||g||     and     ||f + g|| <= ||f|| + ||g||   (4)

The orthogonal basis from Gram-Schmidt now contains functions q_k(x) instead of vectors:

   Basis functions q(x)   f(x) = c_1 q_1(x) + c_2 q_2(x) + ...   (infinite series)   (5)

In a Fourier series, those q's are sines or cosines. Other series use Chebyshev functions: Chebfun.org computes with very high accuracy. It is orthogonality, the integral of q_j q_k dx = 0, that allows us to find each Fourier coefficient c_k. Just multiply the series by q_k and integrate:

   integral of f(x) q_k(x) dx = c_1 integral of q_1 q_k dx + ... + c_k integral of (q_k)^2 dx + ...   (6)

By orthogonality, all terms on the right side are zero except the kth term:

   Find c_k    integral of f(x) q_k(x) dx = c_k integral of (q_k(x))^2 dx   (7)

Basis functions like q_k = sin kx and cos kx are guaranteed to be orthogonal because they are eigenfunctions for a symmetric differential equation A^T A q_k = lambda_k q_k:

   A = d/dx    A^T = -d/dx    A^T A sin qx = -(d^2/dx^2) sin qx = q^2 sin qx.   (8)

Symmetric matrix equations A^T A x = b become symmetric differential equations. Here is Newton's Law using -A^T A y for acceleration (the second derivative):

   -(d/dt)(-dy/dt) = d^2 y/dt^2 = force / mass

The equations of physics and mechanics tell us about minimizing the energy (1/2) y^T S y—just as we saw for positive definite matrices. The important point is that the basic laws of physics (as presented by Feynman) produce equations from minimum principles. They lead to symmetry and positive definiteness. Then the eigenfunctions are orthogonal.
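Equation (7) turns directly into a numerical recipe. Here is a minimal sketch (not from the book) that recovers Fourier sine coefficients of an invented test function on [0, 2 pi] by numerical integration, assuming NumPy:

```python
import numpy as np

# Recover Fourier sine coefficients c_k from equation (7):
#   integral of f(x) q_k(x) dx = c_k * integral of q_k(x)^2 dx,   q_k(x) = sin(kx) on [0, 2*pi]
n = 200000
x = (np.arange(n) + 0.5) * (2*np.pi/n)       # midpoint-rule sample points
dx = 2*np.pi/n
f = 3*np.sin(x) - 2*np.sin(4*x)              # invented test function: c_1 = 3, c_4 = -2

for k in range(1, 6):
    qk = np.sin(k*x)
    ck = np.sum(f*qk)*dx / (np.sum(qk*qk)*dx)
    print(k, round(ck, 4))                   # 3.0, 0.0, 0.0, -2.0, 0.0 (up to rounding)
```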
7.4. The Victory of Orthogonality (and a Revolution)   283

A Revolution from Sparsity

The message from classical mechanics was that symmetric equations have orthogonal eigenfunctions. But the world of deep learning works with large rectangular matrices and all kinds of training data. We want to emphasize that smaller sums of squares are not necessarily the overriding goal. Sparse vectors with few nonzeros are the easiest to understand.

We would like to build that goal of sparsity into the optimization. Right away a sum of squares is inappropriate. If we minimize x_1^2 + ... + x_N^2 subject to x_1 + ... + x_N = 1, the best vector x* has N components all equal to 1/N. This is the exact opposite of sparse!

So we add a constraint that will push the solution x toward few nonzero components. The difficulty is that the cardinality of x (the number of nonzero components) is not a convex function of x. The set of vectors with a fixed cardinality is not convex, because the halfway vector (1/2)(x + X) can have twice as many nonzeros. We need a different convex function whose minimization encourages sparsity. It was not at all sure that a good convex function would be found.

In fact there is an excellent substitute for cardinality. It is the l1 norm of x:

   l1 norm   ||x||_1 = |x_1| + ... + |x_N|   with   ||x + y||_1 <= ||x||_1 + ||y||_1.   (9)

l1 is the first in a sequence of norms. The exponents in lp go between p = 1 and p = infinity:

   lp norm   ||x||_p = (|x_1|^p + ... + |x_N|^p)^(1/p)   has   ||x + y||_p <= ||x||_p + ||y||_p   (10)

p = 2 gives our usual sum of squares (the l2 norm). At the top end, p = infinity is the maximum norm ||x||_inf = max |x_i|. If we go below p = 1, the triangle inequality fails: convexity is lost. As p goes to 0, ||x||_p approaches the cardinality of x: not a norm.

Let me show how adding an l1 penalty to the l2 norm produces a sparser solution x*.

   No penalty   2E = (x - 1)^2 + (y - 1)^2 + (x + y)^2.
   The minimum at x = y = 1/3 (not sparse) is E = 2/3.

Now add a penalty based on ||v||_1 = |x| + |y|:

   2E = (x - 1)^2 + (y - 1)^2 + (x + y)^2 + 4|x| + 4|y|.

The minimizing solution (x*, y*) moves from (1/3, 1/3) to (0, 0): totally sparse.

There is a geometric way to see why an l1 minimization picks out a sparse solution. The vectors with |v_1| + |v_2| <= 1 fill a diamond in the v_1-v_2 plane. The l2 norm gives a circle v_1^2 + v_2^2 <= 1 and the l-infinity norm |v_1| <= 1, |v_2| <= 1 gives a square. It is the corners of the diamond that touch a line at a sparse point. One of those sharp corners will lead to the minimum in the following typical optimization.
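A quick numerical check of the penalty example above (a minimal sketch, not the book's code; it uses a brute-force grid rather than a proper solver), assuming NumPy:

```python
import numpy as np

# Minimize 2E = (x-1)^2 + (y-1)^2 + (x+y)^2, with and without the l1 penalty 4|x| + 4|y|,
# by brute force on a grid. The penalized minimizer lands exactly at (0, 0).
xs = np.linspace(-1, 1, 801)
X, Y = np.meshgrid(xs, xs)

E2_smooth = (X - 1)**2 + (Y - 1)**2 + (X + Y)**2
E2_penalized = E2_smooth + 4*np.abs(X) + 4*np.abs(Y)

i = np.unravel_index(np.argmin(E2_smooth), X.shape)
j = np.unravel_index(np.argmin(E2_penalized), X.shape)
print("no penalty :", X[i], Y[i])    # close to (1/3, 1/3)
print("l1 penalty :", X[j], Y[j])    # exactly (0.0, 0.0)
```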
284   Chapter 7. The Singular Value Decomposition (SVD)

The Minimum of ||v|| on the line a_1 v_1 + a_2 v_2 = 1

Which point on a diagonal line like 3v_1 + 4v_2 = 1 is closest to (0, 0)? The answer (and the meaning of "closest") will depend on the norm—the measure of distance. This is a nice way to see important differences between l1 and l2 and l-infinity.

To find the closest point to (0, 0), rescale the l1 diamond and the l2 circle and the l-infinity square (where the vectors have ||v||_1 <= 1 and ||v||_2 <= 1 and ||v||_inf <= 1). When the expanding diamond or circle or square first touches the line, that touching point is the minimizer v*.

Figure 7.4: The solutions v* to the l1 and l2 and l-infinity minimizations. The first is sparse.

The first figure displays a highly important property of the minimizing solution to the l1 problem. That solution v* has a zero component. The vector v* is "sparse". To repeat, this happened because a diamond will touch a line at a sharp point. The line (or the hyperplane in high dimensions) contains the vectors that satisfy the constraints Av = b. The diamond expands to meet the line at a sharp corner!

The essential point is that the solutions to l1 problems are sparse. They have few nonzero components, and those components have meaning. By contrast the least squares solution (using l2) has many small and non-interesting components. By squaring, those small components hardly affect the l2 distance, and they turn up in the l2 solution.

Minimizing with the l1 norm

The point of these pages is that computations are not all based on minimum energy. When sparsity is desirable, l1 comes in. We need new methods for new problems like these:

   Basis Pursuit           Minimize ||x||_1 subject to Ax = b
   LASSO with Penalty      Minimize ||Ax - b||^2 + lambda ||x||_1
   LASSO with Constraint   Minimize ||Ax - b||^2 with ||x||_1 <= T

LASSO was invented by Tibshirani to improve on ordinary regression (= least squares).
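For the line 3v_1 + 4v_2 = 1, the three closest points can be checked numerically. This is a minimal sketch (not from the book), assuming NumPy; it scans along the line rather than calling an optimization library:

```python
import numpy as np

# Closest point to (0,0) on the line 3*v1 + 4*v2 = 1, measured in three different norms.
# Parametrize the line by v2 and scan: v1 = (1 - 4*v2)/3.
v2 = np.linspace(-1.0, 1.0, 200001)
v1 = (1.0 - 4.0*v2) / 3.0

for name, norm in [("l1  ", np.abs(v1) + np.abs(v2)),
                   ("l2  ", np.sqrt(v1**2 + v2**2)),
                   ("linf", np.maximum(np.abs(v1), np.abs(v2)))]:
    k = np.argmin(norm)
    print(name, (round(v1[k], 4), round(v2[k], 4)), round(norm[k], 4))

# Expected: l1 picks the sparse corner (0, 1/4); l2 gives (3/25, 4/25) = a/||a||^2;
# linf gives the equal-components point near (1/7, 1/7).
```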
7.4. The Victory of Orthogonality (and a Revolution)   285

Numerical methods for l2 + l1 minimization proceed step by step, with each step an improvement in x. Lagrange's idea builds the constraints into the function to be minimized (by introducing Lagrange multipliers as unknowns). Those multipliers have meanings—they are derivatives of the minimum cost with respect to the constraints. In mathematical finance the multipliers are represented by Greek letters. They measure the risks in buying an option—the right to buy or sell a stock at a designated strike price.

Problem Set 7.4

1   If v is a complex vector, its squared length is ||v||^2 = |v_1|^2 + ... + |v_n|^2. For the given vectors v and w, find the conjugate transposes and the lengths ||v||^2 and ||w||^2.

2   Find the eigenvalues and eigenvectors of a rotation matrix and a reflection matrix:

       Q = [ cos theta  -sin theta ]        H = [ cos theta   sin theta ]
           [ sin theta   cos theta ]            [ sin theta  -cos theta ]

3   A permutation matrix has the same columns as the identity matrix (in some order). Explain why this permutation matrix and every permutation matrix is orthogonal:

       P = [ 0 0 0 1 ]
           [ 1 0 0 0 ]     has orthonormal columns, so P^T P = ______ and P^{-1} = ______.
           [ 0 0 1 0 ]
           [ 0 1 0 0 ]

4   When a matrix is symmetric or orthogonal, it will have orthogonal eigenvectors. This is the most important source of orthogonal vectors in applied mathematics. Four eigenvectors of that matrix P are x_1 = (1, 1, 1, 1), x_2 = (1, i, i^2, i^3), x_3 = (1, i^2, i^4, i^6), and x_4 = (1, i^3, i^6, i^9). Multiply P times each vector to find lambda_1, lambda_2, lambda_3, lambda_4. The eigenvectors are the columns of the 4 by 4 Fourier matrix F.

5   Show that

       Q = (1/2) [ 1   1   1   1 ]
                 [ 1   i  -1  -i ]     has orthonormal columns: conj(Q)^T Q = I.
                 [ 1  -1   1  -1 ]
                 [ 1  -i  -1   i ]

6   Haar wavelets are orthogonal vectors (columns of W) using only 1, -1, and 0:

       W = [ 1   1   1   0 ]
           [ 1   1  -1   0 ]     Find W^T W and W^{-1} and the eight Haar wavelets for n = 8.
           [ 1  -1   0   1 ]
           [ 1  -1   0  -1 ]
8   Learning from Data

8.1  Piecewise Linear Learning Functions
8.2  Convolutional Neural Nets
8.3  Minimizing Loss by Gradient Descent
8.4  Mean, Variance, and Covariance

This chapter describes a combination of linear algebra and calculus and machine learning. They produce an algorithm called "deep learning" that approximates a nonlinear function of many variables. That unknown function classifies the data, and recognizes the image, and translates the sentence, and finds the best move in Go. The learning function F(x, v) has to combine complexity with simplicity.

Simplicity comes from two key steps in each layer F_k of the overall learning function F:

   Layer k - 1 to layer k    v_k = F_k(v_{k-1}) = ReLU(A_k v_{k-1} + b_k)   (1)

That function F_k begins with a linear step to the vector A_k v_{k-1} + b_k. Then comes a fixed nonlinear function like ReLU (my pronunciation is RayLoo). That function acts on every component of every vector A_k v_{k-1} + b_k to give v_k:

   ReLU (any number x) = max(0, x)    (its graph is a ramp: zero for x <= 0, slope 1 for x >= 0)

It is amazing that this nonlinear function can achieve so much. The key is composition: functions of functions of functions. We have L + 1 layers 0, 1, ..., L (layer 0 is input, layer L is output). Composition produces v_L from v_{L-1} and eventually from the input v_0:

   v_L = F_L(v_{L-1}) = F_L(F_{L-1}(...(F_1(v_0)))) = chain of nonlinear functions F_k.   (2)

The "weights" x are all the entries in the matrices A_1 to A_L and the vectors b_1 to b_L. A deep network will have many weights from dense matrices A_k and fewer weights from convolution matrices (Section 8.2). The big computation is to choose those A's and b's in x so that v_L = F(x, v_0) is close to the known outputs w from the training data v_0.

More training data should give more accurate weights A_k and b_k—at a cost of extra computations. Those computations aim for weights that minimize the loss—the difference between v_L and w. Stochastic gradient descent is a favorite algorithm to find those weights. Backpropagation computes derivatives of F from derivatives of every F_k by the chain rule.

The design of F is a balance between computing cost and learning power. Amazingly, F can achieve accurate outputs on new test data that it has never seen.

Reference: Linear Algebra and Learning from Data, Gilbert Strang, Wellesley-Cambridge Press (2019).

286
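Equation (2) is only a few lines of code. Here is a minimal sketch (not the book's code) of the forward pass through the layers with ReLU, assuming NumPy; the layer sizes and weights are invented just to show the shapes:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(v0, As, bs):
    """Compose the layers of equation (2): v_k = ReLU(A_k v_{k-1} + b_k).
    The final layer here applies A_L and b_L without ReLU (one common choice)."""
    v = v0
    for A, b in zip(As[:-1], bs[:-1]):
        v = relu(A @ v + b)
    return As[-1] @ v + bs[-1]

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2]                     # widths: input 3, two hidden layers of 4, output 2
As = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.normal(size=m) for m in sizes[1:]]
print(forward(np.array([1.0, -2.0, 0.5]), As, bs))
```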
8. Learning from Data   287

The Functions of Deep Learning

Suppose one of the digits 0, 1, ..., 9 is drawn. How does a human recognize which digit it is? That neuroscience question is not answered here. How can a computer recognize which digit it is? This is a machine learning question. Probably both answers begin with the same idea: Learn from examples.

So we start with M different images (the training set). An image will be a set of p small pixels—or a vector v = (v_1, ..., v_p). The component v_i tells us the "grayscale" of the ith pixel in the image: how dark or light it is. So we have M images, each with p features: M vectors v in p-dimensional space. For every v in that training set we know the digit it represents.

In a way, we know a function. We have M inputs in R^p, each with an output from 0 to 9. But we don't have a "rule". We are helpless with a new input. Machine learning proposes to create a rule that succeeds on (most of) the training images. But we want much more than that: The rule should give the correct digit for a much wider set of test images, taken from the same population.

One answer might be: F(v) could be a linear function from R^p to R^10 (a 10 by p matrix). The 10 outputs would be probabilities of the numbers 0 to 9. We would have 10p entries and M training samples to get mostly right.

The difficulty is: Linearity is far too limited. Artistically, two zeros could make an 8. 1 and 0 could combine into a handwritten 9 or possibly 6. Images don't add. In recognizing faces instead of numbers, we will need a lot of pixels—and the input-output rule is nowhere near linear.

Artificial intelligence languished for a generation, waiting for new ideas. There is no claim that the absolutely best class of functions has now been found. That class needs to allow a great many parameters (called weights). And it must remain feasible to compute all those weights (in a reasonable time) from knowledge of the training set.

The choice that has succeeded beyond expectation—and has turned shallow learning into deep learning—is Continuous Piecewise Linear (CPL) functions. Linear for simplicity, continuous to model an unknown but reasonable rule, and piecewise to achieve the nonlinearity that is an absolute requirement for real images and data.

This leaves the crucial question of computability. What parameters will quickly describe a large family of CPL functions? Linear finite elements start with a triangular mesh. But specifying many individual nodes in R^p is expensive. Much better if those nodes are the intersections of a smaller number of lines (or hyperplanes). Please know that a regular grid is too simple.

Here is a first construction of a piecewise linear function of the data vector v, using a matrix A, a bias vector b, the input v, and the output w.
288   Chapter 8. Learning from Data

[Figure: A one-layer network. The input v goes through the components [(Av + b)_i]_+ of a hidden layer, with pq + 2q = 20 weights, producing r(4, 3) = 15 linear pieces in the output w = F(v).]

Actually the nonlinear function ReLU(x) = x_+ = max(x, 0) was originally smoothed into a logistic curve like 1/(1 + e^{-x}). It was reasonable to think that continuous derivatives would help in optimizing the weights A_1, b_1, A_2. That proved to be wrong.

The graph of each component of (A_1 v + b_1)_+ has two halfplanes (one is flat, from the zeros where A_1 v + b_1 is negative). If A_1 is q by p, the input space R^p is sliced by q hyperplanes into r pieces. We can count those pieces! This measures the "expressivity" of the overall function F(v). The formula from combinatorics uses the binomial coefficients (see Section 8.1):

   r(q, p) = (q choose 0) + (q choose 1) + ... + (q choose p).

This number gives an impression of the graph of F with a hidden layer. But our function is not yet sufficiently expressive, and one more idea is needed.

Here is the indispensable ingredient in the learning function F: The best way to create complex functions from simple functions is by composition. Each F_i is linear (or affine) followed by the nonlinear ReLU: F_i(v) = (A_i v + b_i)_+. Their composition is F(v) = F_L(F_{L-1}(...F_2(F_1(v)))). We now have L - 1 hidden layers before the final output layer. The network becomes deeper as L increases. That depth can grow quickly for convolutional nets (with banded Toeplitz matrices A: many zeros).

The great optimization problem of deep learning is to compute weights A_i and b_i that will make the outputs F(v) nearly correct—close to the digit w(v) that the image v represents. This problem of minimizing some measure of F(v) - w(v) is solved by following a gradient downhill. The derivatives of this complicated function are computed by backpropagation—the workhorse of deep learning that executes the chain rule.

A historic competition in 2012 was to identify the 1.2 million images collected in ImageNet. The breakthrough neural network in AlexNet had 60 million weights. Its accuracy (after 5 days of stochastic gradient descent) cut in half the next best error rate. Deep learning had arrived.

Our goal here was to identify continuous piecewise linear functions as powerful approximators. That family is also convenient—closed under addition and maximization and composition. The magic is that the learning function F(A_k, b_k, v) gives accurate results on images v that F has never seen.
8.1. Piecewise Linear Learning Functions   289

8.1 Piecewise Linear Learning Functions

Deep neural networks have evolved into a major approach to machine learning. The structure of the network has become more varied and more easily adapted to new applications. One way to begin is to describe the essential pieces in the structure. Those pieces come together into a learning function F(x, v), with weights x creating that function from the training data—ready to apply to new test data v. Here are important steps in creating that function F:

   Key operation      Composition F = F_3(F_2(F_1(v)))
   Key rule           Chain rule for the x-derivatives of F
   Key algorithm      Stochastic gradient descent to find the best weights x
   Key subroutine     Backpropagation to execute the chain rule
   Key nonlinearity   ReLU(y) = max(y, 0) = ramp function

Our first step is to describe the pieces F_1, F_2, ..., F_L. The weights x that connect the layers are optimized in the learning. Each input vector v_0 comes from the training set, and the function F produces the output at the final layer.

F_k is a Piecewise Linear Function of v_{k-1}

The input to F_k is a vector v_{k-1} of length N_{k-1}. The output is a vector v_k of length N_k, ready for input to F_{k+1}. This function F_k has two parts, first linear and then nonlinear:

1. The linear part of F_k yields A_k v_{k-1} + b_k (that bias vector b_k makes this "affine").
2. A fixed nonlinear function like ReLU is applied to each component of A_k v_{k-1} + b_k.

   Layer k    v_k = F_k(v_{k-1}) = ReLU(A_k v_{k-1} + b_k)   (1)

The training data for each sample is a feature vector v_0. The matrices A_k and the column vectors b_k are "weights" to be chosen—so that the final output v_L is close to the correct output w. Frequently stochastic gradient descent computes optimal weights x = (A_1, b_1, ..., A_L) in the central computation of deep learning. Minimizing a loss function of v_L - w relies on "backpropagation" to find the x-derivatives.

The activation function ReLU(y) = max(y, 0) gives flexibility and adaptability. Linear steps alone were of limited power and ultimately they were unsuccessful. ReLU is applied to every "neuron" in every internal layer. There are N_k neurons in layer k, containing the N_k outputs from equation (1). Notice that ReLU itself is continuous and piecewise linear, as its graph shows. (The graph is just a ramp with slopes 0 and 1. Its derivative is the usual step function.) When we choose ReLU, the composite function F = F_L(...(F_2(F_1(v)))) has an important and attractive property:

   The learning function F is continuous and piecewise linear in v.
Chapter 8. Learning from Dala 290 One Internal Layer (L = 2) Sunoco we have measured m = 3 feature» of one sample point in the training seu Tho« features are the 3 components of the input vector v - v0. Then the first function F, in the chain multiplies v0 by a matrix A, and adds an offset vector bj (bias vector). If 1, is 4 by 3 and the vector bj is 4 by 1. we h«*e 4 components of A0v0 + That step found 4 combmations of the 3 original features in v = v0. The 12 weights in the matrix .4, were optinuzed over many feature vectors »o tn the tnumng set, to choose a I by 3 matrix (and a 4 by 1 bias vector) that would find 4 insightful combinations. The final step to reach v, is to apply the nonlinear "activation function" to each of the 4 components of Л.оо + bi Historically, the graph of that nonlmcar function was often given by a smooth “S-arn*". Particular choices then and now are in Figure 8.1. Figure 8.1: The Rectified Linear Unit and a sigmoid option for nonlinearity. Previously it was thought that a sudden change of slope would be dangerous and pos- sibly unstable. But large scale numerical experiments indicated otherwise! A better result was achieved by lhe ramp function ReLU(y) - max(y.O). We will work with ReLU: Substitute A। t>o + bi into ReLU to find t>i (®i)fc “ max((AiVo + bt)s,0). (2) Now we have the components of V| at the four “neurons" in layer 1. The input layer held the three components of this particular sample of training data. We may have thousands of samples. The optimization algorithm found Ai and b(. probably by gradient descent. Suppose our neural net is shallow instead of deep. It only han this first layer of 4 neunms. Then the final step will multiply the 4-component vector by a 1 by 4 matrix A; (a row vector). A vector bj and the nonlinear ReLU are not applied to the output. Overall we compute Vj = F(x, i>q) for each feature vector t>o in the training set. The steps are v2 = Aj (ReLU (A|V0 + 6|)) “ F(®, ®o)- The goal in optimizing x = A|.b(.Aj is that the output values Vl = v2 at the last layer L “ 2 should correctly capture the important features of the training date t>0. At the beginning of machine learning the function F was linear—a severe limitation. Now F is certainly nonlinear. Just the inclusion of ReLU al each neuron in each layer has made a dramatic difference. It is the processing power of the computer that makes for fast operations on the data. For a deep network we depend on the speed of GPU’s (the Graphical Processing Units that were developed for computer games).
g ) piecewise Linear Learning Functions 291 ReLU ReLU д1Л3та,пхЛ1 Add4* ReLU Ы matrix Feature vector v0 Three components for each training sample ReLU Vi « Л|во + bj в| at layer 1 ViM layer 1 Ui«ReLU(»,) Four components of Vl and v, Output v3 Vj = 4jt>i True = w x: ’д,:. V"- For a classification problem each sample of the trammg data b asugned 1 or -1. We want the output v, to have that correct ogn (most of the time) For a regression problem we use the numerical value (not just the sign) of vj Depending on our choice of loss function of v, - w. thts problem can be like least squares or entropy minimization. We are choosing * - (weight matrices Л» and vectors bfc) to minimize the loss. Here are 3 possible loss functions: 1 1 Square loss £(z) = -£||F(x,»j) _ 7 . N training samples 1 N 2 Hinge loss L(z) = — m,x “ It F’(x)) for classification у = 1 or -1 1 ?. 3 Cross-entropy loss L(z)= £[jhlog b+(l-g,)k>g(l-p1)]fory,=Oar 1 Our hope is that the function F has “learned” the data This is machine learning. We aim for enough weights so that F has discovered what is important in recognizing dog versus cat—or identifying an oncoming car versus a turning car. Machine learning doesn't aim to capture every detail of the numbers 0,1,2...,9. It just aims to capture enough information to decide correctly <•*'<* number it is.
ChafXcr 8. Learning frum 292 The Initial Weights x0 in Gradient Descent * the form of Ле learning function F(®, v). The The architecture in a neural net ucciuc x Jn malnCes д and vcclors b шшйч аш VXX'Z*'"**•8 ” ” •“* Starting from «о.«« « тИЙ~« X2 and onward, aiming to imd w g ? Choosing Жо = о would be a disaster. ’ The question is: What "fW,“Xo f failure in deep learning. A proper choice of the Poor initialization is anXX X indcpenW «l«bu tlntt .«"Wen.,; 1. x„ha> a carefully chorea variattca »’• 2. The hidden layers in the neural net have enough neurons. , , , . , initial variance <ra controls the mean of the computed X!I iSX гX the variance of Ле weights. The key point is mis wcignts. I nc у on Ihe |ra|n|ng M(. Bm tf a j, wmng or & 5SU <M g— cm to С«»п,1 «/</.c weigta. The enqw. lion of x can explode to infinity or implode to zero. The danger controlled by the variance a* of x0 is exponentially large or exponentially tne uang у i. „2 s 2/fan-in. The fan-in is the maximum number of small weights, ic go< ' hc outpul). Software like Kerns makes a inputs to neurons (Figure 8.Z nas inn in good choice of Max-Pooling to Reduce Dimensions An image can have many pixels; the input tin can include many features. Then the size of our computations can grow out of control. We have to reduce the number of components (sometimes called neurons) in a typical layer. If you look at the architecture of AlexNet. you sec "pooling layers" lo reduce dimension. The most popular choice is simply max-pooling. Divide the image into blocks of 4 pixels. Replace each 2x2 block by 1 x 1: usually the largest of the four numbers. (Average pooling would keep the average of those numbers.) Here is AlexNet. Figure 8.3: AlexNet uses two GPU’s linked at certain layers. Pooling simply reduces the image dimensions. Most layers connect by convolution matrices A* (Section 8.2) but the (inal layers are fully connected by dense matrices A*. The input dimension is 150.528 and AlexNet had 00,000,000 weights—it won the 2012 competition to classify linageNet.
g.|. Piecewise Linear Learning Function» 293 Graph of the Learning Function F(v) The graph of F(v) is a огЬурегр1алС5 that fit together akxigaU рксех-they же plancs This is like ongami except that this р1р6 'produced a change of slope, graph might not be in RJ—Ле feature sector в = 4 P**'' going ю infinity. And Ле Л’« » m components. Part of the mathematic, of deep leanwtr and to visualize how they fit into one pteceuhe be numbcT r M ftal Р,лс' an example of a neural net w nh one intern^ 1»»„ ”***' *f,cr rn measurements like height, weight, age of a sample in tb r J^*,***** V“ COnU‘"' In Figure 8.2, F had three inputs in л, and one flat surface in 4-d.menMona) space The he.ght of £ "“j point Vo >n 3-ditnensional space, Limitations of .. hLs—Г —Л («hi), over the •« ™, T,»«»« Л. lb. p^. s^, „ 3 Note You actually see points on the graph of F when you run examples oo playgnxind.tensorflow.org This is a very instructive website That website offers four options for the training set of points r0 You choose the number of layers and neurons. Please choose the ReLU activation function ’ Then the program counts epochs as gradient descent optimizes the weights. (An epoch sees all samples on average once.) If you have allowed enough layers and neurons to correctly classify the blue and orange training samples, you will see a polygon separating them. That polygon shows where F = 0. It is the cross-season of the graph of : = F(o) ж height: «0 That polygon separating blue from orange (or phu from miiuu: this is classification) is the analog of a separating hyperplane in a Support Vector Machine, if we were limited to linear functions and a straight line between a blue ball and an orange nng around it. separation would be impossible But for the deep learning function F this is not difficult.. We will discuss experiments oo this plav ground.temorflow site tn the Problem Set Important Note: Fully Connected versus Convolutional Wc don’t want to mislead the reader -Fully connected" nets are often not Ae most ef- fective. If the weights around one pixel in an image can be repealed around *1 poeds (why not ?). then one row of .4 is all we need The row can asep> ™ pixel*. Local convolutional neura) nets <CNV.) же m .MexNetaod Sect- 8.2. The website math.mrt.eduTNNUI alkms the reader to creaie a CXX poo g Y™ e I— That is a useful insight into the power of to visualize in full detail the size and depth of the neural netwxrt make it оитк
Chapter 8. Learning from DftU 294 CounU»8 PiKeS ‘°,he G”₽h! °"' U»w .hr weicht matrices As and *»* bias vcc‘ore b* number h is easy to count * • 6far more mtcresling to count the number of flat picccs determine the function F. But ures the expressivity of the neural network in the graph of F- uon we fully understand (at least so far). F(x,v) is a more coroplicairo • without explicit approval of its •‘thinking". The system is deciding and * fairly soon. For driverless cars »> components. We have N functions Suppose vo has mcompone ^ * „ro g hypcrplane (dimension m - 1) of vo. Each of those linear fc becomet piecewise linear, with a in Rm. When we apply fold iu graph is sloping, on the other side fold along that hyperplane. this component of vt в «го piecewise linear functions of v0. so va Then the next matrix Ai conn' „ R- wordl describe each piecewise now has/оИг along 5r A, ReLU(Ai»o + &t)): thc output in our case. linear component о “ g ((he folds actually along N hyper- You could think of 5 ЙГИ*Ь , (old separates lhe plane in two pieces. The next planes in rn-dimensional space) _ fold morc dlfflcu|t t0 (ЫО <™. «ли km »-Лw but tbe rollowtn. H.0(e 4 .rraRRfmcnt-ur.d a theorem of Tom In combinatorial theory, wena V nled b Richard Stanley’s great textbook on Zaslavsky counts the pieces. 11* proo _ P u more complicated than we need. Enumerative Combinatorics ( nossible ways. We assume that the fold lines because it aikre.Ле <oM 1t~a» ~ » cm.». line» p.eee. it are re-geuerai [«««<« NretntlNet»»-*.<«».: 1606.05336). given by On the Expnutve Power of Deep Theorem For v in rn dimensions R". suppose the graph of F(v) has folds along N hyperplanes H|,...,Hjv. Those come from N linear equations ajv + b, = 0. in other words from ReLU at N neurons. Then the number of linear pieces of F and regions bounded by the hyperplanes is r(N, m): r(N,m)«(o ) + (7)+"‘+(m)‘ (4) These binomial coefficients are / N\ with 0! = 1 and I j = 1 and ( . 1 = 0 for t > N. N\ N\ » J ~ il(N-i)! Example The function F(x,p, z) = ReLU (z) + ReLU (y) + ReLU (z) has 3 folds along the 3 planes r = 0. p = 0, г = 0. Those planes divide R3 into r(3,3) = 8 pieces where F = x + у + z and x + x and r and 0 (and 4 more). Adding ReLU (x + у + z — 1) gives a fourth fold and r(4,3) = 15 pieces of R3. Not 16 because the new fold plane x + у + z = 1 does not meet the 8th original piece where x < 0, у < 0, z < 0.
g I. Piecewise Linear Learning FuntUum 295 George Polya'» famous YouTube video ь.,.. _ He helps lhe class to find r(5.3) = 26 Ole„ ?** G*“"« a One hyperplane in R"* produces ( 1V /Т will produce r(2,m) = 1 + 2 + 1 ?;/ J 'r two fokh in a line, which only ikr1- — * u** into г 2 11 -. •> The count r of linear pieces from .V g.li. ~ **“' '°Uo*fro«» the recursive formula «х/t Спай,, си a cake by 5 pbnes । s -------‘ "'-dimensional cakes. jJ=2rc^AndJVe2hyp«pllne% •egions provided m _> 1 u ----- “ L.. „ * 1 Whcn "i = 1 we have m- 1) r(N.m)=r(N_1,n,) + r<N_l (5) _ *ndr(N-l,m) regions m-1). The established N - | hyperplanes -------------------t one existing region - l.m); see Figure 8.4. To understand that recursion, start with .V -1 hypernb^ lnB- bdd one more hyperplane H (dimensjon r • - cut H in<o r(N - l.m - 1)^,on*. Eachof-tbore'p^^Xs' into two. adding r(N - l.m - 1) regions to the original r(N - ’ So the recursion is correct, and we now apply equauon (5) to comp^’^f ’ The count starts at r(l,0) = r(0, l) ж 1. Then (4)is proved by inducuonoo M + m: VKeKVH?;.*)] ' ' O' ' 9 ' ' The two terms in brackets (second line) became one term because of a useful identity: I . J + \, + l)eli + l) “^Леinductionucomplete. Mike Giles made that presentation clearer, and be suggested Figure 8.4 to show the effect of the last hyperplane H. There are r » 2* linear pieces of F(v) for N < ni and r » Nm/m\ pieces for N » m. when the hidden layer has many neurons 4 Stan with 2 planes \/ 4- r(2.2) - 4 la / \ 3a pc рЦпе H / 2a \ - н 4- H2.1) - 3 lb / 26 ' 36 Total r(3,2)e 7 |l •• n P • »• |i Figure 8.4: The r(2.1) = 3 pieces of H create ’ r(3,2) =4 + 3 = 73-^M’r(4 2) = n. fourth fold would cross all 3 existing folds and create 4 .
296   Chapter 8. Learning from Data

Flat Pieces of F(v) with More Hidden Layers

Counting the linear pieces of F(v) is much harder with 2 internal layers in the network. Again v_0 has m components. Now A_1 v_0 + b_1 will have N_1 components before ReLU. Each one is like the one-layer function described above. Then applying ReLU creates new folds in its graph. Those folds are along the lines where a component of A_1 v_0 + b_1 is zero.

Remember that each component of v_2 = ReLU(A_2 v_1 + b_2) is piecewise linear, not linear. So it crosses zero (if it does) along a piecewise linear surface, not a hyperplane. The straight fold lines of Figure 8.4 change to piecewise straight lines for the folds in v_2. So the count becomes variable, depending on the details of v_0, A_1, b_1, A_2, and b_2.

Still we can estimate the number of pieces created by the N_2 ReLU's at the second hidden layer. If those fold lines (or piecewise hyperplanes in R^m) were actually straight, we would have a total of N_1 + N_2 folds in each component of v_2 = F(v_0). The count of pieces for deeper networks is studied by Rolnick (arXiv: 1906.00904).

Composition F_3(F_2(F_1(v)))

The word "composition" would simply represent "matrix multiplication" if all our functions were linear: F_k(v) = A_k v. Then F(v_0) = A_3 A_2 A_1 v_0: just one matrix. For nonlinear F_k the meaning is the same: Compute v_1 = F_1(v_0), then v_2 = F_2(v_1), and finally v_3 = F_3(v_2). Now we get remarkable functions. This operation of composition F_3(F_2(F_1(v_0))) is far more powerful in creating functions than addition!

For a neural network, composition produces continuous piecewise linear functions F(v_0). The 13th problem on Hilbert's list of 23 unsolved problems in 1900 asked a question about all continuous functions. A famous generalization of his question was this: Is every continuous function F(x, y, z) of three variables the composition of continuous functions G_1, ..., G_n of two variables? The answer is yes.

Hilbert seems to have expected the answer no. But a positive answer was given in 1957 by Vladimir Arnold (age 19). His teacher Andrey Kolmogorov had previously created multivariable functions out of 3-variable functions. The 2-variable functions xy and x^y use the 1-variable functions exp and log, and you must allow addition:

   xy = exp(log x + log y)   and   x^y = exp(exp(log y + log log x)).   (7)

So much to learn from the Web. A chapter of Kolmogorov's Heritage in Mathematics (Springer, 2007) connects these questions explicitly to neural networks. Is the answer to Hilbert still yes for piecewise linear continuous functions? With enough layers and neurons, F can approximate any continuous function. This is universality. New research by Telgarsky and Townsend shows the power of rational functions.
g.|. Piecewise Linear Learning Function, 297 problem Set 8.1 In the example F = ReLU(x) + r,i ... , for r(N, rn). suppose the 4th fold Comes * **}-«(«) that follows fonnuU (4) amglc point (0,0,0)—™ aceptionof^ £ *** x “ °* = = 0 at . F = sum of these four ReLU’s. ^«nbe the 16 (not 15) hncar pieces of Suppose we have m = 2 input, and X neu(um it a linear combination of N ReLU's. W °" 1 h,dden byer, so F(z,y) that the count of linear pieces of F has leading tenr/ ^nrmu'i *** to show Suppose we have X = 18 lines in a ni.« rr <i how many pieces of lhe plane? CompL wnh ” ** position and no three lines meet. **n the lines arc in general What weight matrix Л, and bias vector b, will produce ReLU lx + 2v 41 >„d ’ ? ^,nd (2x+* - »> - oitii hidden ayer (Theinput layer!has 2 components x and y.) If the output u> is the sum of those three ReLU s, how many flat pieces ia the piecewise linear w(x,y)? Folding a line four times gives r (4.1) = 5 pieces Folding a plane four times gives r И’ “11 pieces. According to formula (4), how many flat subsets come from folding R four times ? The flat subsets of R1 meet al 2D planes (like a door frame). 6 The binomial theorem finds the coefficients ' j u(s + b)x e‘b*“*. Fora = b=l what does this reveal about those coefficients and r(N.m) form > X? 7 8 In Figure 8 4. one more fold will produce 11 flat pieces in the graph of : “ F(x, y). Check that formula (4) gives r (4,2) = 11. How many pieces after five folds ? Explain with words or show with graphs why each of these statements about Continuous Piecewise Linear functions (CPL functions) is true: M The maximum M(x, y) of two CPL functions Ft(x. y) and F»(x, y) is CPL S The sum S(x. y) of two CPL functions F|(x. у) and Fj(x, y) is CPL. C If the one-variable functions у = F((x) and i »Fs(y) are CPU so is the composition C(x) = z » (Fj(Fi(x)).
298 Chapter 8. Leaning from 0<ц How many weights and biases are in a network with m = A'o = 4 । . feature vector t>0 and N = 6 neurons on each of lhe 3 hidden layers *» н *П Cacl1 activation functions (ReLU) are in this network, before lhe final output ? * П,апУ 10 (Experimental) In a neural network with two internal layers and a total of should you pul more of those neurons in layer 1 or layer 2 ? 10 neurons, Problems 11-13 use the blue ball, orange ring example on playground.lensorflow with one hidden layer and activation by ReLU (not Tanh). When learning succ wf8 a white polygon separates blue from orange in lhe figure that the code create* 11 Does learning succeed for N = 4 ? What is lhe count r(N, 2) of flat pieces in F(x) 7 The white polygon shows where flat pieces in the graph of F(x) change sign as they go through the base plane z - 0. How many sides in the polygon ? 12 Reduce to .V = 3 neurons in one layer. Does F still classify blue and orange cor- rectly? How many flat pieces r(3,2) in the graph of F(v) and how many sides in the separating polygon ? 13 Reduce further loW 2 neurons in one layer. Does learning still succeed ? What is the count r(2,2) of flat pieces ? How many folds in the graph of F(v) ? How many sides in the while separator 7 14 Example 2 has blue and orange in two quadrants each. With one layer, do N 3 neurons and even N • 2 neurons classify that training data correctly ? How many flat pieces are needed for success ? Describe the unusual graph of F(v) when W = 2. 15 Example 4 with blue and orange spirals is much more difficult! With one hidden layer, can the network team this training data ? Describe the results as N increases. 16 Try that difficult example with two hidden layers. Start with 4 + 4and6 + 2and 2 + 6 neurons. Is 2 + 6 better or worse or more unusual than 6 + 2? 17 How many neurons bring complete separation of the spirals with two hidden layers ? Can three layers succeed with fewer neurons than two layers ? I found that 1 + 4 + 2 and 4 + 4 + 4 neurons give very unstable iterations for that spiral graph. There were spikes in the training loss until the algorithm stopped trying, playground tensorflow org was created by Daniel Smilkov. 18 What is the smallest number of pieces that 20 fold lines can produce in a plane ? 19 How many pieces are produced from 10 vertical and 10 horizontal folds ? 20 What is the maximum number of pieces from 20 fold lines in a plane ?
8.2. Convolutional Neural Nets   299

8.2 Convolutional Neural Nets

This section is about networks with a different architecture. Up to now, a fully connected matrix A connects a layer with n neurons to the next layer with m neurons: A had mn independent weights. Now we might have only E = 3 or E^2 = 9 independent weights in A.

The fully connected "dense net" will be expensive and inefficient for image recognition. First, the weight matrices A will be huge. If one image has 200 by 300 pixels, then its input layer has 60,000 components. The weight matrix for the first hidden layer would be far too large to compute and to store.

Almost always, the important connections in an image are local:

   Music has a 1D local structure
   Images have a 2D local structure (3 copies for red-green-blue)
   Video has a 3D local structure: images in a time series

More than this, the search for structure is essentially the same everywhere in the image. There is normally no reason to process one part of a text or image or video differently from its other parts. We can use the same weights in all parts: Share the weights. The neural net of local connections between pixels is shift-invariant: the same everywhere.

The result is a big reduction in the number of independent weights. Suppose each neuron is connected to only E neurons on the next layer, and those connections are the same for all neurons. Then the matrix A between those layers has only E independent weights x. The optimization of those weights becomes enormously faster. In reality we have time to create several different channels with their own E or E^2 weights. They can look for edges in different directions (horizontal, vertical, and diagonal).

In one dimension, a banded shift-invariant matrix is a Toeplitz matrix or a filter. Multiplication by that matrix A is a convolution x * v. The network of connections between the layers is a Convolutional Neural Net (CNN or ConvNet). Here E = 3:

   A = [ x1  x0  x-1  0   0   0  ]        v = (v0, v1, v2, v3, v4, v5)
       [ 0   x1  x0   x-1 0   0  ]        y = Av = (y1, y2, y3, y4)
       [ 0   0   x1   x0  x-1 0  ]
       [ 0   0   0    x1  x0  x-1]        N + 2 inputs v and N outputs y

It is valuable to see A as a combination of shift matrices L, C, R: Left, Center, Right. Each shift has a diagonal of 1's:

   A = x1 L + x0 C + x-1 R

Then the derivatives of y = Av = x1 Lv + x0 Cv + x-1 Rv are exceptionally simple:

   d(output)/d(weight)    dy/dx1 = Lv    dy/dx0 = Cv    dy/dx-1 = Rv   (1)
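A minimal sketch (not the book's code) that builds this 4 by 6 Toeplitz matrix from the three weights and checks it against a sliding-window convolution, assuming NumPy:

```python
import numpy as np

def conv_matrix(x1, x0, xm1, N):
    """Banded Toeplitz (convolution) matrix: N outputs from N + 2 inputs, E = 3 weights."""
    A = np.zeros((N, N + 2))
    for i in range(N):
        A[i, i:i+3] = [x1, x0, xm1]
    return A

x1, x0, xm1 = 2.0, 1.0, 2.0                         # the symmetric kernel (2, 1, 2)
A = conv_matrix(x1, x0, xm1, 4)
v = np.array([1.0, 0.0, 3.0, -1.0, 2.0, 5.0])
print(A @ v)                                        # the four outputs y
print(np.convolve([x1, x0, xm1], v, mode='valid'))  # same numbers from a moving window
# (np.convolve flips the kernel; with this symmetric kernel the flip does not matter)
```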
300   Chapter 8. Learning from Data

Convolution of Vectors  x * v

The convolution of two vectors is written x * v = (2, 1, 2) * (3, 3, 1). Computing the result x * v = (6, 9, 11, 7, 2) is like multiplying the numbers 212 and 331, without carrying:

            2  1  2
         x  3  3  1
         -----------
            2  1  2
         6  3  6
      6  3  6
      --------------
      6  9  11  7  2

Notice that we leave 6 + 3 + 2 = 11 as "11" (no carrying). The same steps multiply 2x^2 + x + 2 times 3x^2 + 3x + 1. That answer would be 6x^4 + 9x^3 + 11x^2 + 7x + 2.

The previous page just put the numbers (x1, x0, x-1) = (2, 1, 2) on three diagonals of A. Then ordinary multiplication 212 times 331 converts to matrix-vector multiplication Av. When x has length j + 1 and v has length k + 1, the convolution x * v has length j + k + 1.

Convolution as a Moving Window

Suppose we average each number with the next number in v = (1, 3, 5). The result is y = (2, 4). This is a typical convolution of v with the averaging vector x = (1/2, 1/2):

   y = Av = [ 1/2  1/2   0  ] [ 1 ]   =  [ 2 ]
            [  0   1/2  1/2 ] [ 3 ]      [ 4 ]
                              [ 5 ]

Notice the decision not to pad v with a zero at each end (and extend A to be 4 by 5). That would lead to 4 outputs y instead of 2. It would be consistent with multiplying numbers: 11 times 135 is 1485, and dividing by 2 gives 1/2, 2, 4, 5/2. Python and MATLAB offer both versions of convolution, padded or not (and a third option with three outputs). We will choose not to pad the input with zeros. Each row of A is a perfect shift of the previous row, as above.

Often the convolution process Av is seen as a moving window. The window starts at 1, 3 and moves to 3, 5. Averaging produces 2 in the first window and 4 in the second window. The whole point of "shift invariance" is that a convolution does the same thing in each window.

Windows in Two Dimensions

This approach is helpful in two dimensions where the window is a square or a rectangle. It is easy to see 2 by 2 overlapping windows filling an n by n square. There would be (n - 1)^2 windows and an average over each window. The matrix A has (n - 1)^2 outputs from n^2 inputs. Each row of A has (1/4, 1/4, 1/4, 1/4): four nonzeros to average over a 2 by 2 window.
8.2. Convolutional Neural Nets   301

Experiments have pointed to E = 3 as a good size for the moving window: an E by E = 3 by 3 filter has only 9 weights.

[Figure: 3 by 3 windows in a 5 by 5 square of pixels. Moving the window left/right/up/down and diagonally produces 9 windows, centered at the nine interior positions 1 to 9.]

2D Convolution by One Large Matrix

When the input v is an image, convolution becomes two-dimensional. The numbers x-1, x0, x1 change to E^2 = 3^2 = 9 independent weights. The inputs v_ij have two indices and v represents (N + 2)^2 pixels. The outputs have only N^2 pixels unless we pad v with zeros at the boundary. The 2D convolution Av is a linear combination of 9 shifts of v:

   Weights   x11   x01   x-11        Input image  v_ij   with i, j from (0, 0) to (N + 1, N + 1)
             x10   x00   x-10        Output image y_ij   with i, j from (1, 1) to (N, N)
             x1-1  x0-1  x-1-1       Shifts L, C, R and U, D: Left, Center, Right, Up, Down

   A = x11 LU + x01 CU + x-11 RU + x10 L + x00 C + x-10 R + x1-1 LD + x0-1 CD + x-1-1 RD

This expresses the convolution matrix A as a combination of 9 shifts. The derivatives of the output y = Av are again exceptionally simple. We use these same derivatives in (2) to create the gradients of the learning function and the loss function that are needed in gradient descent to improve the weights x_k. The next iteration x_{k+1} = x_k - s_k (gradient) has weights that better match the correct outputs from the training data.

Backpropagation finds these 9 derivatives of y = Av with respect to the 9 weights:

   dy/dx11 = LUv    dy/dx01 = CUv    ...    dy/dx-1-1 = RDv   (2)
302   Chapter 8. Learning from Data

CNN's can readily afford to have B parallel channels (and that number B can change as we go deeper into the net). The count of weights in x is so much reduced by weight sharing and weight locality that we don't need and we can't expect one set of E^2 weights to do all the work of a convolutional net. B convolutions give the next layer.

Two-dimensional Convolutional Nets

Now we come to the real success of CNN's: image recognition. ConvNets and deep learning have produced a small revolution in computer vision. The applications are to self-driving cars, drones, medical imaging, security, robotics—there is nowhere to stop. Our interest is in the algebra and geometry and intuition that makes all this possible.

In two dimensions (for images) the matrix A is block Toeplitz. Each small block is E by E. This is a familiar structure in computational engineering. The count E^2 of independent weights to be optimized is far smaller than for a fully connected network. The same weights are used around all pixels (shift invariance). The matrix A produces a 2D convolution x * v. Frequently A is called a filter.

To understand an image, look to see where it changes. Find the edges. Our eyes look for sharp cutoffs and steep gradients. The computer can do the same by creating a filter. The difficulty with two or more dimensions is that edges can have many directions. We will need horizontal and vertical and diagonal filters for the test images. And filters have many purposes, including smoothing and gradient detection and edge detection.

Smoothing   For functions, one smoother is convolution with a Gaussian e^{-x^2/2 sigma^2}. For vectors, we could convolve v with G = (1/16)(1, 4, 6, 4, 1).

Gradient detection   Image processing (as distinct from learning by a CNN) needs filters that detect the gradient. They contain specially chosen weights. We mention some simple filters just to indicate how they can find first derivatives of f.

   One dimension   E = 3   weights (1/2)(1, 0, -1)   give (Av)_i = (1/2) v_{i+1} - (1/2) v_{i-1}.

   Two dimensions  E = 3   These 3 by 3 Sobel operators approximate d/dx and d/dy:

      d/dx:  [ -1  0  1 ]          d/dy:  [ -1 -2 -1 ]
             [ -2  0  2 ]                 [  0  0  0 ]     (3)
             [ -1  0  1 ]                 [  1  2  1 ]

Edge detection   Those weights were created for image processing, to locate the most important features of a typical image: its edges. These would be candidates for E by E filters inside a 2D convolutional matrix A. But remember that in deep learning, weights like these are not chosen by the user. They are created from the training data.
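A minimal sketch (not the book's code) applying the two Sobel filters of (3) to a small synthetic image, assuming NumPy and SciPy; the image values are invented:

```python
import numpy as np
from scipy.signal import convolve2d

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

# Invented test image: dark on the left half, bright on the right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

gx = convolve2d(image, sobel_x, mode='valid')   # large magnitude at the vertical edge
gy = convolve2d(image, sobel_y, mode='valid')   # near zero: there is no horizontal edge
print(gx)
print(np.abs(gy).max())
```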
8.2. Convolutional Neural Nets   303

Stride and Padding

The filters described so far move one pixel at a time: the stride is S = 1. For a larger stride, the moving window takes longer steps. Here is a matrix A for a 1-dimensional 3-weight filter with stride S = 2. The output y = Av is about half as long as the input v:

   Stride S = 2   A = [ x1  x0  x-1  0   0   0   0  ]
                      [ 0   0   x1   x0  x-1 0   0  ]   (4)
                      [ 0   0   0    0   x1  x0  x-1]

Notice how the nonzero weights x1, x0, x-1 move two columns at a time (the stride S). In 2D, a stride S = 2 reduces each direction by half, so the output has about one quarter as many components as the input. Padding adds P zeros at each end of the input, to control the size of the output.

In a one-dimensional problem, suppose a layer has N neurons. We apply a convolutional matrix with E nonzero weights. The stride is S, and we pad the input with P zeros at each end. How many outputs (M numbers) does this filter produce?

   Karpathy's formula   M = (N - E + 2P)/S + 1   (5)

In a 2D or 3D problem, this 1D formula applies in each direction.

Suppose E = 3 and the stride is S = 1. If we use one zero of padding (P = 1) at each end, then M = N - 3 + 2 + 1 = N (input length = output length). This case 2P = E - 1 with stride S = 1 is the most common architecture for CNN's. If we don't pad the ends of the input with zeros, then P = 0 and M = N - 2 (as in the 4 by 6 matrix A at the start of this section). In 2 dimensions this becomes M^2 = (N - 2)^2. We lose neurons this way, but we avoid any artificial zero-padding.

Now suppose the stride is S = 2. Then N - E must be an even number. Otherwise the formula (5) produces a fraction. Here are two examples of success for stride S = 2, with N - E = 5 - 3 and padding P = 0 or P = 1 at both ends of the N = 5 inputs:

   P = 0 (M = 2 outputs)   A = [ x1  x0  x-1  0   0  ]
                               [ 0   0   x1   x0  x-1]

   P = 1 (M = 3 outputs)   A = [ x0  x-1  0   0   0  ]
                               [ 0   x1   x0  x-1 0  ]
                               [ 0   0    0   x1  x0 ]

A Deep Convolutional Network

Recognizing images is a major application of deep CNN's, and a major success came with the creation of deeper networks. This page describes a deep network for image recognition. We follow the prize-winning paper from ICLR 2015, which used small (3 by 3) filters. The network has a breadth B of parallel channels in each layer.
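Returning to formula (5), here it is in code: a minimal sketch (not from the book), checked on the cases discussed above.

```python
def output_size(N, E, S=1, P=0):
    """Number of outputs for a length-N input, filter size E, stride S, padding P: formula (5)."""
    assert (N - E + 2*P) % S == 0, "stride does not fit: formula (5) would give a fraction"
    return (N - E + 2*P)//S + 1

print(output_size(6, 3))               # 4 outputs: the 4 by 6 matrix at the start of the section
print(output_size(5, 3, S=2, P=0))     # 2 outputs
print(output_size(5, 3, S=2, P=1))     # 3 outputs
print(output_size(224, 3, S=1, P=1))   # 224: padding with 2P = E - 1 keeps the size
```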
304   Chapter 8. Learning from Data

If the breadth B were to stay the same at all layers, and all filters had E by E local weights, a straightforward formula would estimate the number W of weights in the net:

   W ~ L B E^2    (L layers, B channels, E by E local convolutions)   (6)

Notice that W does not depend on the count of neurons on each layer. This is because the E^2 weights are shared. Pooling will reduce the count of neurons. It is common to end a CNN with fully-connected layers. You see the last layers in AlexNet (Section 8.1). Those dense layers radically increased the count of weights to W ~ 135,000,000.

Softmax Outputs for Multiclass Networks

In recognizing digits, we have 10 possible outputs. For letters and other symbols, 26 or more. With multiple output classes, we need an appropriate way to decide the very last layer (the output layer w in the neural net that started with v). "Softmax" replaces the two-output case of logistic regression. We are turning n numbers into probabilities.

The outputs w_1, ..., w_n are converted to probabilities p_1, ..., p_n that add to 1:

   Softmax   p_j = e^{w_j} / (e^{w_1} + ... + e^{w_n})

Certainly softmax assigns the largest probability p_j to the largest output w_j. But e^w is a nonlinear function of w. So the softmax assignment is not invariant to scale: If we double all the outputs w_j, softmax will produce different probabilities p_j. For small w's softmax actually deemphasizes the largest number w_max.

In the CNN example of teachyourmachine.com that recognizes digits, you will see how softmax produces the probabilities displayed in a pie chart—an excellent visual aid.

Residual Networks (ResNets)

Networks are becoming seriously deeper with more and more hidden layers. Mostly these are convolutional layers with a moderate number of independent weights. But depth brings dangers. Information can jam up and never reach the output. The problem of "vanishing gradients" can be serious: so many multiplications in propagating so far, with the result that computed gradients are exponentially small.

When it is well designed, depth is a good thing—but you must create paths for learning to move forward. The remarkable thing is that those fast paths can be very simple: "skip connections" that go directly to the next layer—bypassing the usual step v_n = (A_n v_{n-1} + b_n)_+.

L layers could allow 2^L possible routes—fast or normal from each layer to the next. One result is that entire layers can be removed without significant impact. The nth layer is reached by 2^{n-1} possible paths. Many paths have length well below n, not counting the skips. By sending information far forward, features that are learned early don't get lost before the output. Residual networks have become highly successful deep networks.
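The softmax formula above in code: a minimal sketch (not the book's code), assuming NumPy. The max-subtraction is a standard numerical-stability detail, not part of the formula itself.

```python
import numpy as np

def softmax(w):
    """Convert n outputs w into n probabilities that are positive and add to 1."""
    e = np.exp(w - np.max(w))      # subtracting max(w) avoids overflow and leaves p unchanged
    return e / e.sum()

w = np.array([2.0, 1.0, 0.1])
print(softmax(w), softmax(w).sum())   # the largest w gets the largest probability; sum is 1
print(softmax(2*w))                   # doubling w changes the probabilities (not scale-invariant)
```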
g 2. Convolutional Neural Neu A fit 305 AS»mpleCNN., One Of the class projects м MIT Wls a _ . to Read |л(|т user begins by drawing multiple coptcs T“,,<ul *t ^h, «„mad, in™ _ learning thu data—creating a conumaum pX^*, 1Ъеа «he тумеп.ш\ uJot probability Ю lhe correct answer (the letter £ w4,1I^U"etM* F(b» «««“ high For learning lo read digit*. » ргпЫЫШЬ lnnr ' lhul too small a training set leads ю frequeot erZT w** chm You Ч“,Л|У or letters, and the test images are not centered the nJ- /11Лу>сл h*d ««e»cd numbers One purfHisc of teachyourmqchlne.com I* rtucXu?^'’‘‘°* emirs appear. The World Championship al the Game of Go A dramatic achievement by a deep conw|ullo<ul netw,»s champion al Go. This is a difficult game pbved <» . in u*“ l°de,e* “* <h“mani world p., J.»» -«one.- *>- color has no open space beside it (left, right, up « d.mn, 1Г * ,roup Ы °* the board. Wikipedia h« an an.rnaled galx tX^X* " ,ПИП AlphaGo defeated the lead.ng player Ue Sedol by 4 games ю 1 щ »!«. ft ^ned on thousands of human game, Jh.s w„ . P™ he neural network was deepened and unproved Google's new version AlplJ, Zero learned to play w.thout any human intenenoon-s.tnply by play.ng against itself. Now it defeated Us former self AlphaGo by 100 to a The key point about the new and better version is that the machine learned by Itself ft wus told the mlfi and nothing more. The first version had been fed earlier games, aiming lo discover why winners had won and losers had lost The outcome from the new approach was parallel to the machine translation of languages. To master a language, special cases from grammar seemed essential. How else to team ail those exceptions ? The translation team at Google was telling (he system what it needed to know. Meanwhile another small team was taking a different approach: Let the machine figure it out. In both cases, playing Go and translating languages, success came with a deeper neural net and more games and no coaching. it is interesting that the machine often makes opening moves thal have seldom or never been chosen by humans. The input to the network is a board position and its history. The output vector gives the probability of selecting each move—and also a scalar that estimates the probability of winning from thal position. Every step communicates with a Monte Carlo tree search, to produce reinforcement teaming.
Chapter 8. Learning from Dau 306 8.3 Minimizing Loss by Gradient Descent This section of lhe final chapter is about a fundamental problem: Minimise a function F(x x ) Cab nlits teaches us that all tbe first derivatives oF /дх, are zero at the minimum‘(when F is smooth). If we have n = 20 unknowns (a small number in deep learning) then minimizing one function F leads to 20 equat.ons OF/дх, » 0. • Gradient descent- uses the derivatives dF/dx, to find a direction that reduces F(x). The steepest direction, in which F(z) decreases fastest, is given by the gradient -V F. Gradient descent / learning rate at, (1) VF represents the vector (dF/dx\........dF/dx») of the n partial derivatives of F. So (1) is a vector equation for each step к = 1,2.3,... and ak is lhe stepsize or the learning rate. We hope to move toward lhe point z* where lhe graph of F(z) hits bottom. We are willing lo assume for now that 20 first derivatives exist and can be computed. We are not willing to assume that those 20 functions also have 20 convenient derivatives d/dx^dF/dx,). Those are thc 210 second derivatives of F which go into a 20 by 20 symmetric matnx H—the Hessian matrix. (Symmetry reduces n2 - 400 to |n2 + « 210 computations.) Those second derivatives would be very useful extra information, but in many problems we have to go without. You should know that 20 first derivatives and 210 second derivatives don't multiply the computing cost by 20 and 210. The neat idea of automatic differentiation rediscovered and extended as backpropagation in machine learning—makes those cost factors much smaller in practice. Backpropagation is a fast way to follow n chain rules at once. Return for a moment to equation (1). The step -s|VF(zk) includes a minus sign (to descend) and a factor a* (to control lhe the stepsize) and the gradient vector VF (containing lhe first derivatives of F computed at the current point zk). A lot of thought and computational experience has gone into lhe choice of stepsize and search direction. We start with the main facts about derivatives and gradient vectors VF. Please forgive me. this linear algebra book is ending with a touch of calculus. Multivariable Calculus Machine learning involves functions F(Z|,..., z„)of many variables. We need basic facts about the first and second derivatives of F. These are "partial derivatives" when n > 1. °,Tvu^x F<* + « F^ + VF + 5 (Az)T/f (Ar) (2) This is the beginning of a Taylor series—and we don’t often go beyond that second-order term. The first terms F(z) + (Az)T VF give ajirar order approximation lo F(x + Az), using information at z. Then the (Az)2 term makes it a second order approximation.
g j. Lo“ by Gradient Descent 307 :/2*“vr = as* *• 1"° "«nt Sit 2 by 2: Example 1 When S is symmetric, the EIJ1| To sec this vector, write out the function Fu*/' fxt ®a ] Г “ bl [i|l aH+d-»J r« * *i T- - • This is an important example! The minimum occurinhere 6m VF = ’ BF/дх, ' 0F/Bxn = ~ 0 = 0 at r* — e-i_ . _ 1 s a = arg min F. (3) « always ЧШ *4 «>>п Г Hands for Uk wbm __ vector! =S a than in the actual minimum value F^ = F(z®) at that point • ’ F«i. » 5(S'‘a)TS(S",e)-aT(S-,a)«l.Tr‘«-eT^->ee_l.TS-». The graph of F is a bowl passing through too at z = 0 and dipping to its minimum al a1 The Geometry of the Gradient Vector V / Start with a function /(z, y). It has n 2 variables. Its gradient is V/ = {dj/dx. дЦду\ This vector changes length as we move lhe point z.у where the derivatives are computed: /а/ dj\ , .. If8/\*. (9f\* Slope in the ~ \\dx) + steepest direction That length || V/|| tells us the steepness of the graph of : « /(z. *). The graph is normally a curved surface—like a mountain or a valley in zyz space At each point there is a slope df/дх in the z-direction and a slope df/dy in the y-duection The steepest slope is In the direction of Vf - grad /. The magnitude of that steepest slope is || V/Ц. Example 3 The gradient is the vector V/ That steepest z = constant has ax + by - was»!»- It is I The graph of a linear function/(z, y)«= az + by is the plane: «az + by. - । 11 ! ,jf partial denvanvrs The length of that vector is - Лч»"<«““”< л“”p““’"‘"'“°"v/- в у ="’v
308 Chapters. Learning frnni Figure 8J: The negative gradient - V/ gives the direction of steepest descent. Example 4 Thc gradient of the quadratic / = ar2 + by2isV/ = | j « ] That tells us lhe steepest direction, changing from point to point. We are on a curved surface (a bowl opening upward). Thc bottom of thc bowl is at x = у » 0 where the gradient vector is zero. The slope in thc steepest direction is '|V/||. At the minimum, V/ » (2«x. 2by) « (0.0) and ilopt - zero. The level direction has z — ax2 + by2 ~ constant height That plane t = constant cuts through the bowl in a level curve. In this example the level curve ax2 + by2 « c is an ellipse. Thc tangent line to the ellipse (level direction) is perpendicular to lhe gradient (steepest direction). But there is a serious difficulty for steepest descent: The steepest direction changes as you go down! Thc gradient doesn't point to the bottom I _____________Л___ steepest direction V/ up and down thc bowl ax2 4- by2 = z flat direction (V/)x along thc ellipse ax2 + by2 = constant V Thc steepest direction is perpendicular to the flat direction but the steepest direction is not aimed at thc minimum point (0,0) Figure 8.6 Steepest descent moves down the bowl in the gradient direction Let me repeat. At lhe point x0.Ste the gradient direction for / = ax3 + by2 is along V/ - (Zaxo.Zbyu). The steepest line through Xo.Jto is 2ru*0(y - J/o) — 2fejfo(r - r0)- But then the lowest point (x, y) = (0.0) does not lie on thc line! We will not find that minimum point in one step of "gradient descent**. The steepest direction does not lead to thc bottom of the bowl—except when b = о and the bowl is circular. Water changes direction as it goes down a mountain. Sooner or later, we must change direction 1<ю. In practice wc keep going in thc gradient direction and stop w hen our cost function J is not decreasing quickly (or starts upward). At that point Step 1 ends and we recompute the gradient V/. This gives a new descent direction for Step 2.
gj Minimizing Loss by Gradient Descent A I 309 Ш example /(*,!/) = Х(«з + *** ^*“4* with Zig-Zag Vf has two components df/dx = x «efrt for (j < b < , . That minimum is reached at the point (Г./ "^"um ‘( exact line search produces a slmple fofW||' [j M- Best of < down thc bowl toward (0,0). Starting^ (^^ ** <*•*) in the slow progress f-----------------:---------------- (M) we find these poi«u 2b I /(zo.ito) (4) Vk Zk If b = 1. you see immediate success in oneиептС .--------------- is perfectly circular with J = + P01" (*» • Vi I»(0,0». The bowl bowl, and it goes exactly through (0,0). Then йж firsi^^l^^’'0" d°WB correct minimizing point where J = 0. ₽ 8ГаЛа“<fcvxnl finds that The real purpose of this example is seen when b is tk. . equation (4) to r (1 - b)/(b + 1). For b this гию is r - 9/ц Rz ь™ "T the ratio is 99/10L ]hc ratio is approaching 1 urf p,^, /|ow|lrt (0 0)JJ virtually stopped when b is very small Figure 8.7 shows the frustrating zig-zag рлют ,Q Q, E is short and progress is very stow. This is a case where the stepsue at in »k*i - zt “ afcV/(Zk) was exactly chosen to minimize J (an exact line search). But the direction of - V f, even if steepest, is pointing far from the final answer (z*, /) - (0.0). The bowl has become a narrow valley when b is small. We are uselessly ernurng the valley instead of moving down the valley to the bottom The first descent step starts out perpendicular to lhe level set. As it crosses through loner level sets, the function /(z,y) is decreasing Eventually Us path is tangent to a level set L. Descent has slopped. Going further will increase /. The first step ends. The next step is perpen- dicular to L. So the tig-zag path took a 90 ° turn. Rg„ 17: Sto- .____. lh, .vseent is faster First-order convergence For b close to 1, the bowl » by . ^nsun. factor al each мер means that the distance to (ж , У > in (4) is (1 - M’/U + b^' For gradient descent and this f. the consergence factor
310 Оицмст 8. Learning fnxn I)aI1 Momentum and the Path of a Heavy Ball The slow zig-zag path of steepest descent is a real problem. Wc have lo improve it. Our model example / = ^ + hai ,wo vanab,CS and ,tS SCCOnd dcr'vatlVc matrix H is diagonal—constant entries f„ = 1 and - b and =0. But it shows the zig-zag problem very clearly when b = = b/1 •» «mall. Key idea: Zig-zag would not happen for a heavy ball rolling downhill. Its momentum carries it through the narrow valley-bumping the sides but moving mostly forward. So we add momentum with coefficient 0 to the gradient (this is Polyak's important idea). The direction z4 of the improved step remembers the previous direction Descent with momentum x*+i =x* — aza with zj, = V/(z*) +/3zfc j Now we have two coefficients to choose—the stepsize в and also 0. Most important, the step to za>i in equation (5) involves x*_|. Momentum has turned a one-step method (gradient descent) into a two-step method. To get back lo one step, we have to rewrite equation (5) as two coupled equations (one vector equation) for lhe stole al time к + 1: Vector equation with momentum x*+i = Xk -azk z*+i - V/(Xk*t) 0zk (6) With those two equations, we have recovered a one-step method. This is exactly like re- ducing a single second order differential equation to a system of two first order equations. Second order reduces to first order when dy/dt becomes a second unknown along with y. 2nd order equation 1st order system Interesting that this b is damping the motion while 0 adds momentum to encourage it. The Quadratic Model When f(x) - |xTSx is quadratic, its gradient V/ = Sx is linear. This is the model problem to understand: S is symmetric positive definite and V/(zk+t) becomes Sik+1. Our 2 by 2 supermodcl is included, when the matrix 5 is diagonal with entries 1 and b. For a bigger matrix 5. you will sec thal its largest and smallest eigenvalues determine the best choices for 0 and the stepsize s—so the 2 by 2 case actually contains the essence of the whole problem. To understand the steps of accelerated descent, we track each eigenvector q of S. Here we are using a key idea from linear algebra (Chapter G): Follow the eigenvectors. Suppose Sq = Aq and xt = c*<7 and zk = dtq and Vfk = Sxk = Actq. Then equatiixi (7) connects the numbers ct and d* al step к to r*+| and dk+t at step к + 1.
и 3. Min'n,izin8 1ли by Gr*1’«’t Descent 311 K»ll<»*inR,he ***’ ~ca-adk Г figenvectorq -Ack+l + dt + 1 = Now we invert the first matrix (-A becomes А)ю (Ъ *ee** descent мер clearly: Descent step multlpU« ЬУ R 1 —a A в-Ха After к steps thc starting vector is multiply bv P* (which is the minimum of f a 1zts °У«'- as sroa|! as possible. Clearly those etgenvalues <0(2^ That eigenvalue A could be anywhere between A * -fc₽end Choose a and 0 to minimize max ре,(Д)|, |еа(Д) For fast convergence to irtu eigenvalues r, and e2 of Я lo be I on the eigenvalue A of S. !,n(S) and AfnnfS). Our problem is: I f<* \ninW - * - *»($)• U seems a miracle that this problem has a beaunful sdunon The opumal » and 3 are Think of thc 2 by 2 supermodel, when S has eigenvalues A^ a 1 „rf Am„ . i: *"(г+я) “d *"(jT7s) "°’ These choices of stepsize and momentum give a convergence rate that looks like the rate in equation (4) for ordinary steepest descent (no momentum) But there is a crucial difference between (10) and (4): b is replaced by y/b. Ordinary / № fl “bl A - r «11 f t ->--t л cct i с га ico / r-\2 11 — v/b\ descent factor \1+Ь/ descent factor \i +vv (ID So similar but so different. The real lest comes when b is very small Then the ordinary descent factor is essentially 1 - 4b. very close lo 1. The accelerated descent factor is essentially 1 - 4y/b, much further below 1. To emphasize thc improvement that momentum bongs, suppose b = 1/100. Then y/b = 1/10 (ten times larger than b). The convergence factors in equation (11) are / 90 \2 ( .9 V Steepest descent ( pjyj- J ss .96 Accelerated descent j » .67 Ten steps of ordinary descent multiply the starting error by 0.67. This is matched by a single momentum step. Ten steps with momentum multiply lhe error by 0.018. Amax/Amin = V* = * b ,hf condition number of S.
312 Chafer 8. Loaming from DaIil Stochastic Gradient Desccn( Gradient descent is fundamental in training a deep neural network. It is based on a sten the form x^1 = xk - st VL(xk). That step should lead us downhill toward the J,, , x* where the loss function L(x) is minimized for lhe test data v. But for large network with many samples in the training set. this algorithm (as it stands) is not successful! It is important to recognize two different problems with classical steepest descent • 1. Computing VL at every descent step—the derivatives of the total loss L wj(h respect to ail the weights x in the network—is too expensive, Thai total loss add lhe individual losses t(x,v.) for every sample v, in the training set—potential! * millions of separate losses are computed and added in every computation of i 2. The number of weights is even larger. So VXL = 0 for many different choices »• of the weights. Some of those choices can give poor results on unseen test data The learning function F can fail to "generalize". But stochastic gradient descent (SGD) does find weights x* that generalize—weights that will succeed on unseen input vectors v from a similar population. Stochastic gradient descent uses only a “minibatch” of the training data at each step В samples will be chosen randomly. Replacing the full batch of all the training data bv a minibatch changes L(x) | £f.(x) lo a sum of only В losses. This resolves both difficulties at once. The success of deep learning rests on these two facta; 1. Computing VL by backpropagation on В samples is much faster. Often 1. 2. The stochastic algorithm produces weights x* that also succeed on unseen data. The first point is clear. The calculation per step is greatly reduced. The second point is a miracle. Generalizing well to new data is a gift that researchers work hard to explain. Stochastic Descent Using One Sample Per Step To simplify, suppose each minibatch contains only one sample v* (so В — 1). That sample is chosen randomly. The theory of stochastic descent usually assumes that the sample is replaced after use—in principle the sample could be chosen again at step к + 1. But replacement is expensive compared to starting with a random ordering of the samples. In practice, we often omit replacement and work through samples in a random order. Each pass through the training data is one epoch of the descent algorithm. Ordinary gradient descent computes one epoch per step (batch mode). Stochastic gradient descent needs many steps (for minibatches). The online advice is to choose В < 32. Stochastic descent is more sensitive to the stepsizes a* than full gradient descent. If we randomly choose sample o, al step k, then the fcth descent step sees only one loss: ~ ж* ~**Vx^(xa,p,)| Vj I = denvative of the loss term from sample v,
313 gj. Minimizing Loss by Grad^nt We are doing much less work ре ... training let). Bul we do not necessary CL ,DPW' ,nMe*« « all inou, f stochastic gradient descent is “semi-conve^^ A tyX'feXe ^^nteneemtheoan Early steps of SGD often converge то^Х^^’ П = 1 Wc admit immediately that later iterations of SGD г °* *’• at thc start changes to large oscillations near ibe rti^'b Coo'tr^n« One response is to stop early. And thereby we , 2/°“ RpUt ” W,U Uwu th“ 7 t owrntting the In lhe following example, the solution z" is u, . approximation x* is outside I, the next aowon™,,. “Г*'1* Л cuncnl Thnt gives semtconvergence-a good sun 8w evenly '! We learned from Suvrit Sra that the simplest examnle L one component x. The tih loss is (, . l(OjX - xp W1th . . .. - .**^2 * iu derivative at(a,x - 6,) It is zero and /, is minmzed at a= b /а а11 /V samples is Цх) = E(«.x^)’ Ьеам^Д^Х^ . * £aifei e.b, В ---------- Important If B/A is the largest ratio bja,. then the tree solution x* к below B/A. This follows from a row of four inequalities: bt В Ла<1,<^Ва' л(Е“<М<в(Е«?) *•- Similarly X* is above the smallest ratio 0/a. Conclusion: If z« is outside the interval / from в/a to B/A, then the hh gradient descent step will move nmwrf that interval I containing x*. Here is what we can expect from stochastic gradient descent: If xi, Is outside /, then хьм moves toward tbe interval 0/a < x < B/A. If Xh is inside I, then so is Za+i. Th* iterations can bounce around inside I. A typical sequence Xo.Xi.Xj,... from minimizing i|Az - b|J by stochastic gradient descent is graphed in Figure 8.8. Ком tee the fan uan and the otcillanng finish. This behavior is a perfect signal to think about early stopping or else averaging. t
314 iterations . -«nl but later iterations oscillate. For these four SGD paths. Figure 8.8: Early iterations v uuickly and then fluctuates instead of converging, the quadratic cost function decreases quit у Overfifting Here is an observation from experience. W<- may not want to fit the training data too perfectly That could be overfilling The function F becomes oversensitive. It memorized everything but it hasn't learned anything Generalization is the ability to give the correct classification for unseen lest data v. based on the weights x that were learned from the training data. I compare overfining with choosing a polynomial of degree 60 that fits exactly to 61 data points Ils 61 coefficients a« »«n will perfectly leam the data. But that high degree polynomial will oscillate wildly between the data points. For test data at a nearby point, the perfect-fit polynomial gives a totally wrong answer. But see Figure 8.9 for an unexpected result from severe overfitting. One fundamental strategy in training a neural network (which means finding a func- tion that кал» from the training data and generalizes well to lest data) is early stopping Machine learning needs to know when lo quit! Possibly this is true of human learning too. The Double Descent Curve This section is a report oo experiments (including ordinary least squares). Usually we are fitting a large problem with a smaller number of parameters. We may have m equations and n < rn unknowns: Ax *= b with a tall thin matnx A. The m measurements b arc not exact and loo few parameters (n = 2 for fitting by a straight line) cannot model the data. To improve our result, we allow more parameters. As n increases, the model begins to improve. "The bias is reduced But nothing is perfect, and the first descent in Figure 8.5 turns upward. The model begins to fail from overfitting. All this is for n < m. the usual situation. It was fully expected that deep karning would become deep suffering, when the num- ber of layers and matrix weights increased too far. The evidence pointed that way. until computers became so fast and powerful that n went past m: The model is overparame- terized Now we have many solutions to choose from—many different weights x minimize the loss function L(x).
315 gj. Minimizing Loss by Gradient m>n m<n ^Ex —int^poiati' Number of Weights Figure 8.9: Thi» is Belkin’» double dev «"«joins J<~" » »««,.шw, ’«'«no;, * J1’*1 «*!» ch.net! n Gradient descent (full batch or numbatch to good weight»! Apparently it doe». ,( Л"? “осЬ*Лс»»У> <nev U> converge The method generalize» well to new data by ch. ** *C°°d <*e4cw *" R*ure ” all the possible solutions. That process is п<и fuli?7"r!2 fMnKul"1> 8°°d “hitum among For the linear least squares equate “ Л“ *«* “ being written added to x. Then ATA(z + ж) М1ц _*/ u,lu‘'<,<ito Ax = 0 could be attoon м n > m (more unknowns than equations. r72“" keeps x in the row space of А H avwd, Xnr x from the nuT """"r 7?' ‘°‘и“"П would increase the norm: ||i + x||« e |lx7^|x|P^nd^ A neat observation by Poggio (arXiv 1912 06190» looks at the condition numh« rf AT A. As n increases, the graph of that number is ven, . r </?> number of , , v numoer is very close to Figure 8.9—the error goes down again for large n. Out of many solutmns („ n > m. Мк^ы other authors show that gradient descent somehow chooses a volution that generalizes well lo new data ADAM: Adaptive Methods Using Earlier Gradients For faster convergence of gradient descent and stochastic gradient descent, adaptive methods have been a major success. The idea is to uir gmdienu from rariirr aept. “Momentum” went one step back, to к - 1. These adaptive methods (ADAM) go all the way back. Memory partly guides the choice of search direction Dt and stepsize aa. We are searching for the vector z* that minimizes a specified loss function L(z). In the step to xa+t = z* - sD*. we are free to choose the ditraion Dt and tieptize a*. Dk = DCVLa.VLk-t........VL,) and a* = a(VLk. VL»-t....VLo). (12) For a standard iteration (not adaptive). Dt depends only oo the current gradient VLk (and sk can be s/y/k). That gradient is evaluated only on a random mimbatch В of the test data. Now, deep networks often have the option of averaging some or all of the gradients from earlier minibatches Success or fatlure wtll depend on D. and »*.
Chapter 8. Learning from 316 Епхжелгш/ mming mrmgez « ADAM have become .he favorites Recent gradients v/Se greater weight .han earlier gradients in both ч and .he step dtrcction Dk. 7^ exponential weights in D and a come from 8 <\ and 0 < 1 J^al valucs arc A a 0.9 and7 = 0.999 Small values of A and fl will effectively kill off lhe moving memory end lose die advantages of adaptive methods. The actual computation of D* and a* will be a recursive combination of old and new; A - /Da-! 4- (1 - - Я1 + О ~ Д) ||V£ (g Д (J3) For several class projects, adaptive methods clearly produced faster convergence. After fast convergence lo weights thal nearly solve VL(x) = 0 there is still t|le crucial issue: Why do those weights generalize well to unseen test data ? Randomized Kaczmarz for Ax — b is Stochastic (radicnl Descent Kaczmarz for Ax = b with random i(fc) - a, xk i№ ai (14) The Kaczmarz idea is simple. Choose row i of A at step k. Adjust x*+J to solve equation • in Ax b. (Multiply equation (14) by a? to verify that a,Tx*+i - b„ This is equation i in Ax - b.) Geometrically. x*+1 is lhe projection of x* onto one of the hyperplanes a*x •• b, that meet at x* “ A~'b. This algorithm resisted a close analysis for many years. The equations ofx 6b ajx bj... were taken in cyclic order I to n. 1 to n.... Then Strohmer and Vcrshynin proved fast convergence for random Kaczmarz. They used SGD with norm-squared sam- pling : Choose row 1 of A with probability p, proportional lo ||a,||a. A previous page described the Kaczmarz iterations for Ax — b when A is N by 1. The sequence xOlX|,xg,... moved toward the interval /. The least squares solution x* was in that interval. For an N by К matrix A. we expect the К by 1 vectors x, to move into a Л'-dimensional box around X*. Figure 8.8 showed this for К — 2. The next page will present numerical experiments for stochastic gradient descent.
g з. Minimizing Loss by Gradient Descent 317 Random Kaczrnarz and Iterated Projections SUPP?hCC t Гм' ГаП“”'П Кж/Лап "* emx - X- °n,° online mine mdw h i'i cho'cn ^^mily at step t (often with impor- tance ob-b.hu« proportional io ||a,|p). To see that projection matnx OiOl/a,1 a,, substitute о, «< a:’into the update мер: Olbogonsl k»pb Tbtmo<cM,«,|) aar„« Jl,™ norm ||xfc - x || decrease» steadily, even it the com function ||Лх* - b| doc* not. Bul convergence in usually slow! Suohmer-Venhynm eMimate the expected слот: E l||x* - x || I < H - — j ||х» - ж,||,1 e»= condition number of Л. (16) Thi» Й slow compared lo gradient descent (there c2 h replaced by r. and then y/r with momentum added). But (16) is independent of the size of A: attractive for large problem*. Our experiments converge slowly! The 100 by 10 matnx A is random with c * 400. The figures show random Kaczmarz for 600.000 steps We measure convergence by lhe angle 0k between x* - x* and the row a, chosen al step k. The error equation (IS) is ||x*>i - x*||2 (1 -соаа0й)||х* -x*||’ (П) The graph shows that those numbers 1 - сои2 Oy are very close ю I: slow convergence But the second graph confirms that convergence doe* occur. The Strohmcr-Venhynin bound becomes E[coa2®»] > 1/c2. Our example matnx has 1/c2 close to 10'* and experimentally con2 0* =s 2 • 10"*, confirming that bound. Figure 8.10. Convergence of the squared error for random Kaczmarz. Equation (17) with 1 - сов2 0,, close to 1 - 10's produces the slow convergence in lhe lower graph
Chapter 8. Learning from Data 318 Product of Matrices ABC: Which Order ? ^hblv efficient improvement on computing each OF/dz, scp. Backpropagauon » an «««*«*«* thc computations can make such arately. At fir* •<**"“ “ £ cnd (thc j^bter might say) you have to compute derivatives an enormous difference B||t reordcring for each small step in faJ Js than N times lhe cost of one derivative dF/dxx. d,ffere„„auo„ mat, ЛО). X, m VF Those are number i„ , bus Шау ле m.mces .» top ^.ag-wben each ,a,cr „ muluplkaao« son gives lire to~< d»h («« below). It is beautiful . t three matrices ABC. thc associative law offers ?“ «--- - ь- AB first or BC first ? Compute (AB) C or A(BC) 1 The result is the same but the number of ind.vidual multiplications can be very different mein. Л is » b, o. end В is » b, p. and C « p by AB = (m x n) (n x p) has mnp multiplications First way ^AB)C - (m x p) (p x q) has mpq multiplications BC = (n x p) (p x q) has npq multiplications Second way = (m x nj (n x g) has mnq multiplications So the comparison is between mp(n + q) and nq(m + p). Divide both by mnpq: The first way is faster when - + — is smaller than------f- —. q n m p Here is an extreme case (extremely important). Suppose C is a column vector: p by 1. Thus q = 1. Should you multiply BC to get another column vector (n by 1) and then A(BC) to find the output (m by 1)? Or should you multiply matrices AB first ? Thc question almost answers itself. The correct A(BC) produces a vector at each step. The matrix-vector multiplication BC has np steps. The next matrix-vector multiplication A(BC) has mn steps. Compare those matrix-vector steps to the cost of starting with the matrix-matrix option AB (mnp steps!). Nobody in their right mind would do that. In the application of A(BC) lo the chain rule, we start with the last layer C is the derivative of the last Fl and we go back to the first layer (A is the derivative of Fj).
8.3. Minimizing Loss by Gradient Dewem 319 11,6 Multivariable Chain Rule Suppose the vector v with n components v, is a fUIKtlon .. u.. The derivative of each u, with respect to each ’ vector “ *llh c°mponcnts (often called the Jacobian matrix J) 1ц ^anr lnto *** dcnvatlve matrix dv/du derivative matrix dw/dv of the vector functiX» --Г <П°* Ьу П) 5,тИяг1У-lhc components of v = (v,....M is a p by n matnx " ....*lth t0 the dw dv dwi dv\ dwp . tbt (18) Each w. depends on the v’s and each Bj depends oo the us. Therefore each function wt,..., wp depends on ult..., и„. The chain rale aims to find the derivatives dw,/duk. And the rule is exactly a dot product: (row i of dw/dv)-(column к of dv/du). 3w, = 3w, Эщ + + =/dw, dw,\ /dvj dv„\ duk &vi duk dv„ duk \dv}'‘“'dv^) \duk'’",dui,) Multivariable chain rules dw / dw \ / dv\ Multiply the matrices in (18) du = \ dv J \ du) (19) The key to matrix calculus is linear algebra—shown again in this chain rule. Problem Set 8.3 1 The rank-one matrix P = aoT/aTa is an orthogonal projection onto the line through a. Verify that P2 = P (projection) and that Px is oo that line and that x — Px is always perpendicular to a. (Why is aTz = aTPx ?) 2 Verify equation (15) which shows that Xk+i — x* в exactly P(xk - x ). 3 If A has only two rows at and aj. then Kaczmarz will produce the alternating projections in this figure. Starting from any error vector eo = xo - x*. why does ek approach zero ? How fast—if you know that angle в at 0 ? »> -fli
320 4 Chapter 8. Leaning from Dau Suppose we want to min.mize F(x,y) = V*+ (* " J1* «^nl minimum • al (x’.y*) = (0.0). Find thc gradient vector VF at thc starting pojn| (to <ai) = (I. 0- fr* ,u*' 4?ги^и'п* descent (no/ stochastic) with stepsize в = 1. where is (xi. Щ) • In minimizing F(x) = ||Ax - Ы12. stochastic gradient descent with minibatch size В = I will solve one equation ajx = b, at each step. Explain thc typical step for minibatch size В — 2. (Experiment) For a random A and b (20 by 4 and 20 by 1). try stochastic gradient descent with minibatch sizes В = 1 and В = 2. Compart thc convergence rales— the ratios r* = |[x4+i - хф(|/||х* — X*||. (Experiment)Try thc weight averaging proposed in arXiv: 1803.05407on page 365. Apply it to lhe minimization of || Ax - b\|2 with randomly chosen A (20 by 10) and b(20by 1). and minibatch В = 1. Do averages in stochastic descent converge faster than tbe usual iterates xk ?
8.4. Mean- Variance, and Covariance 8.4 Mean, Variance, and Covariance 321 *1 to z. are positive numbers adding to 1. The mean is simple and we will start there Ri.t» ~ We may have *, rexull, *— expected resulls (expec/ed rala«) (nxu (ulure итак. Sample values Five random freshmen have ages 18.17.18.19.17 Sample mean 5 (18 + 17 + 18 + 19 + 17) = 17.8 Probabilities Expected age E [x] of a random freshman = (0.2) 17 + (0.5) 18 + (0.3) 19 = 18.1 Both numbers 11.8 and 18.1 are correct averages. The sample mean starts with N samples Xi to XN from a completed trial. Their mean is the average of the .V observed samples: 5(18 + 17+ 18+19+17) = 17.8 Thc ages in a freshmen class are 17 (20%), 18 (50%). or 19(30%) „ 1 1 Sample mean m = и = — N The expected value of x starts with the probabilities pt Expected value m = E(z] = pi*i + Paxa + • • • + P»x„. (I) (2) This is p • x. Thc number m = Ejxj tells us what to expect, rn = p tells us what we got. A fair coin has probability po = | of tails and pi = | of heads. Then E[r] = (5) 0+| (1). The fraction of heads in N coin flips is lhe sample mean. Thc “Law of Large Numbers” says that with probability 1. the sample mean will converge to its expected value E[r] = 5 as the sample size N increases. This does nor mean that if we have seen more tails than heads, the next sample is likely to be heads. The odds remain 50-50. The first 1000 flips do affect the sample mean. But 1000 flips wll not affect its limit— because you are dividing by .V -» oc. Note Probability and statistics are essential for modem applied mathematics *ith mul- tiple experiments, the mean m is a vector. The variances/covariances go mto a matnx. Probabilities p(<) change with time in a master equation
322 Chapter 8. Variance (around the mean) The variance <ra measures expected distance (squared) from the expected mean 1-1 I The sample variance № measures actual diMance (squared) from thc actual sample nic The square nx« is the standard deviation a or S. Aller an exam. 1 email the results n tmrt 5 to the class. I don’t know thc expected m and <r* because 1 don’t know the probabilif Pt Ю pioo for each score. (After GO year*. I still have no idea what to expect.) The distance is always/гот the mean—sample or expected. We arc looking for the si of the "spread” around thc mean value x = m. Start with N samples. Sample variance S’ = [(®i - m)a + ... + (xN - m)’j (3) Thc sample agesr = 18,17,18,19,17base mean rn = 17.8. That sample has variance 0 7 • S’ = J [(-2)’ + (-.8)’ + (.2)’ + (1.2)’ + (—.8)’] = 1(2.8) = 0.7 Thc minus sign* disappear when we compute squares. Please notice ! Statisticians divide by ;V - I = ( (and not N = 5) so thal S’ is an unbiased estimate of o’. One degree of freedom is already accounted for in thc sample mean. An important identity comes from splitting each (x — m)’ into x2 - 2mx g. m2: sum of (x, - rn)’ (sum of x’) - 2m(sum of x<) + (sum of rn’) (sum of x2) - 2ni(/Vm) + Nrn2 sum of (x< — rn)’ = (sum of x2) — Nm2. This is an equivalent way to find (x( - m)2 + • • • + (xN - m2) by adding rf q.... g. j.3 Now start with probabilities p, (never negative I) instead of samples. We find expected values instead of sample values. The variance a2 is thc crucial number in statistics. Variance <т3 = E [(x - rn)’] =P1(r, - m)’ g.... g. Рп(Жп т)з | (J) Wc arc squaring thc distance from the expected value rn = E[r], We don’t have samples, only expectations. We know probabilities but wc don’t know the experimental outcomes' Equation (3) for thc sample variance № extends directly lo equation (6) for the variance o’: Sum of p,(xt-rn)2 = (Sum of p,x’)-(Sum of p,x,)2 or <r’ = E(®’] - (E[x]a) (6) Example 1 Coin flipping has outputs x = 0 and 1 with probabilities po = Pi = |. Mean rn = j(0) + ](1) = ] = average outcome = E|x] Variances = |(0 —i) + j(l —}) = 1 g-1 = 1 = average of (distance from m)2 For average distance (not squared) from rn. what do we expect ? E[x - m] is zero!
К.4. Mean. Variance, and Covwwncc 323 Find thc variance n2 of the The probabilities of ages r. Example 2 Solution «js» x, 17.18.19 were p, = 0.2 and 0.5 and 0 3. -fhc expected value was m “ 18.1. Thesariawceuveslhinc same probabilities; <r’ - (0Д)(И - «Л)’ + (O.5)(1B - IMf + (03)110 - 18.1)’ . (0.2Ц1.21) + (0.51(0.01) + (03)(0.81) . 0.49 Then в = 0.7. measures ,*1C 4PrcaJ °1 *8. V* around E|xj. weighted by prubabilitievO.2.0.5,03. Continuous Probability Distrilnitions Up to now we have allowed for ages 17 19. __________ „ stead of years, there will be loo many р^мЫе ares Г 17ond 20^8 continuum of possible ages. The^^^ ? *7 change to a probability distribution p(x) for a^^f Ж!”* The best way to explam probab.|1Iy dnmbutxxn n to gne you twn examples They wi|| be the uniform distribution and thc normal distribution The first fund Д) is easy The normal distribution is all-important. Uniform distribution Suppose ages are uniformly distributed between 17.0 and 20.0. All those ages are equally likely . Of course any one exact age has no chance at all I. ---------- ----------------.. . r x . п I orX - 17 4. V5; ------- —i agr less than < x: X < 17 won't happen x < 20 will happen There is zero probability that you will hit the exact number z ’ IT. But you can provide lhe chance F(z) that a random freshman has The chance of age less than x « 17 is F( 17) 0 The chance of age less than z 20 is F(20) 1 The chance of age less than x is F(x) = |(x - 17) 17 to 20 : F goes from 0 to 1 From 17 to 20. the cumulative distribution F(x) increases linearly where p is constant. You could say that p(x) dz is the probability of a sample falling in between z and x + dz. This is “infinitesimally true": p(z)dz is F(x + dr) - F(x). Here is calculus: F = Integral of p Probability of a < x < b = / p(x) dz = F(b) — F(a) (7) F(b) is the probability of z < b. Subtract F(a) to keep x > a . That leaves a < r < b.
324 Chapter H. Learning fnwi) Ьлц Mean and Variance of p(xj cumulative F(x) =. . probability that a sample !• below x F(x) = э(* “ 17) H "pdf P(z) . derivative of F probability that M sample is near ® _________ p(®) = -7- dr Figure К 11; /’(.г) is the cumulative distribution and its derivative p(x) = JF/d® is t|K probability density function (pdf). The urea up lo ж under the graph of p(®) is F(®). What arc (he mean in and variance o’ for a probability distribution 7 Previously we added juj:, to gel lhe mean (expected value). With a continuous distribution wc Integrate xp(x); Mean m E[x) = x p(a>) dx Variance a1 = E [(« - m)a] = I p(x) (x - m)a dx 3 When ages are uniform between 17 and 20, lhe mean is m ~ IN.5 with o2 = That is u typical example, and here is the complete picture for a uniform p(x), 0 to a. Uniform ford < x < a Density p(x) = — Cumulative F(z) = - a a Menn in a J x p(r)dx = - Variance 9a = Г - (ж - -Л dx e — J ' 2 J «\ 2/ 12 l or one random number between 0 and I (mean q ) lhe variance is a2 (K) Normal Distribution: Bcll-shttpcd Curve I he normal distribution is also called (he "Gaussian" distribution. It is the most important of all probability density functions р(ж). Hie reason for its overwhelming importance comes from repenting rm experiment and averaging lire outcomes. The experiments huve their own distribution (like heads and tails). The aventge approaches a normal distribution.
и 4. Mean. Variance, and Covwwnte 325 Figure 8.12: The standard norma) distribution p (x) ha, mean m - 0 and a - 1. The “Mandard normal distribution" p(x) is symmetric around x = 0. ю its mean value ii rn “ О- I* •» chosen to have a standard variance a2 1. It ia called N(0,l). I he graph ol p(®) = e 12 is the bell-shaped curve with variance а2 = I. By symmetry thc mean i« rn 0. The integral for a2 uses the idea in Problem 11 to reach 1. Figure 8.12 shows a graph of p(x) for N (0,<r) and also its cumulative distnbution F(x) “ integral of p(x). From F(x) you see a very important approximation for opinion polling: 2 'Die probability that a random sample falls between -a and a is F{a) - F(-tr) as -. 3 Similarly, thc probability that a random x lies between -2a and 2a ("less than two standard deviations from lhe mean") is F(2a) - F(-2a) « 0.95. If you have an experimental result further than 2a from the mean, it is fairly sure to be not accidental. The normal distribution with any mean rn and standard deviation a comes by shifting and stretching the standard N (0,1). Shift x to x - m. Stretch x — m to (x - m)/a. GauMlan density p(x) = 1 _ т)а/2<уэ (g) Normal distribution N(m, <r) <r s/2rr The integral of p(x) is F(x)—lhe probability that a random sample will fall below x. There it. no simple formula lo integrate e“* ^2, so I' (x) is computed very carefully.
326 СЬчмег 8. Learning faxn DWa N Coin Elips and N -> Exsmple3 SupP*"*xЬ 1 ** . ~. i _ ifn2 . i, na The variance is -5(1) + |(-1)2 = 1. The mean value is m = jt*> + jl A»> = (xi + •" + 1Ъс ,ndcPendcni т. The key queMhm “**“ e‘* ' ‘ by д’. The expected mean of AN is still zero, зге ±1 and we air k awrage approaches zero with probability 1. ,„e b. or Uge numberv>riance ,, , How fast docs .4.v approach zero. . 4. = .V — = — By linearity <r^. = д’1 + Д’’ № since a2 = 1. Here are the results from three numerical tests: random 0 or 1 averaged over trials. [4B I s from .V = IOC! [5035 i's from .V = 10000] (19967 l’s from N = 40000]. The standardized X = (x - m)/o = (As - 5) /2v/X was (-.40] (.70] (-.33]. The Central Limit Theorem says that the average of many coin flips will approach a normal distribution. Let us begin to see how that happens: binomial approaches normal. The “binomial" probabilities po,..., PN TOunt die number of heads in W coin flips. For each (fair) flip, thc probability of heads is |. For N = 3 flips, the probability of heads all three times is (|)’ = |. The probability of heads twice and tails once is from three sequences HHT and HTH and THH. These numbers | and | are pieces of /1 + I)3 = l + ’ + ’ + l = l. The average number of heads in 3 flips is 1.5. V 2 2' 8 8 8 8 1 3 . .3 3 6 3 Mean m = (3 heads)- + (2 heads)- + (1 bead)- +0= - + - + - = 1.5 heads Ь о о о о о With .V flips. Example 3 (or common sense) gives a mean of m = E XiPi = heads. The variance a2 is based 00 the squared distance from this mean N/2. With N = 3 the variance is a2 — j (which is X/4). To find a2 wc add (x< — m)2 p, with m = 1.5: a2 = (3-1.5)’1 + (2-1-5)» I + (1-1-5)’| + (0-1.5)’ | = ~*3 + 3 + 9 = ? о о о о oz 4 For any Л’. the variance for a binomial distribution is aj, = N/4. Then as = VN/2. Figure 8.13 shows how the probabilities of 0.1.2.3.4 heads in N = 4 flips come close to a bell-shaped Gaussian. Thai Gaussian is centered at thc mean value m = N/2 = 2.
Й.4. Menn. Variance. and Covar1ancc To reach the .standard Gau^,^., 327 If x is the number of heads in V n u **> variance i > . r biW by ll> me» m , N/2 м V Jt --------------to P”*1** the standard X • Shifted and scaled Subtracting m Is “centering" or “detrmdi ' ? Dividing by cr Is "normalizing" «г “« и"' "**" ** X h иго- L—~v.ri>rKe It IS fun to sec the Central I imir tt^Z-;----- X = 0. At that point, the factor e~x’/i «271 П?Ь‘ MM*CT 11 ccntCT P°inl flips is ff2 = M/4. The center of the bell-sha^ \kno*,hat *** vanani-c for ti coin What is thc height at the ^n<er of '**** distribution)? For X = 4. the probabilitiesf»n , о ',°П А» •e Px «he b.nomial Polities forO. 1,2,3.4 heads come from (1 + I)4. i + lV- 1 . 4 6 4 ! 2 2/ 16 16+1б+1б + 1б = 1' (9) g Center probability = — 16 p(x) = l ₽*/1*>/27тХ/'«'х ---------7 ' ' uniform-----------------------------------------z \ J binomial / approaches \ M heads area =1 1 1 Z G’ussian \ ti flips —i------------»-—L a 0 1 Af=O X/2 ti Figure 8.13: The probabilities p = (1,4,6.4,1)/16 for the number of heads in 4 flips. These p. approach a Gaussian distribution with variance a2 = X/4 centered al m = X/2. For X, the Central Limit Theorem gives convergence to the normal distribution N(0.1). The binomial (| -t-1)* tells us the probabilities for 0,1, , X ty»rk The center term is the probability of beads. - tails___________- ' ____________________________________2 2 2N (Х/2)! (Х/2)! ForX = 4. those factorials produce 4!/2! 2! = 24/4 = 6. For large .V. Stirling s formula x/2ttN(N/e)s is a close approximation to XL Use this formula for X and twice for .V/2: Limit of coin flip __ 1 s/2x.V(X/e)* _ y/2 1 Center probability ₽,v/2 "" 2* яХ(Х/2е) * “ ~ y/bta' The last step used the variance a1 = X/4 for coin-tossing. The result 1/'Jlxa matches the center value (above) for the Gaussian. The Central Limit Theorem is true: The centered binomial distribution approaches the normal distribution p(z) as N —» 00.
328 Chapter 8. Learning from Data Covariance Matrices and Joint Probabilities Linear ateebn enters when we mn Af different experiments at once. We might measure аве and height (Af = 2 measurements of .V children). Each experiment has Ha own mean value So we have a vector m = (rm.m,) containing two mean values. Those could be wm/i/r mrons of age and hc.ght. Or m. and rrr2 could be expec/erf ro/trw of age and height based on known probabilities. A matnx becomes involved when we look al variances. Each experiment will have a sample variance 5? or an expected o? » E[(r, -m.)-j based on thc squared distance from ils mean Those M numbers of............will go on the main diagonal of the "variance-covariance matrix’’. So far we have made no connection between the M parallel experiments. They measure different random variables, but thc experiments are not necessarily independent! If we measure age and height for children, the results will be strongly correlated. Older children are generally taller Suppose thc means rn. and rn* arc known. Then rrf and arc thc separate variances in age and height. The new number Is thc covariance <xo*, which measures the connection of each possible age to each possible height. Covariance <r.* = E [(age - mean age) (height - mean height)]. (j |) This definition needs a close look. To compute <r«*. it is not enough to know the probability of each age and lhe probability of each height. We have lo know thc joint probability p„* of each pair (age and height) This is because age is related to height. Pah probability that a random child has age = a and height = h: both at once pl} probability that experiment 1 produces m and experiment 2 produces pj Suppose experiment I (age) has mean rn(. Experiment 2 (height) has its own mean rna. The covariance between experiments 1 and 2 looks at all pairs of ages r, and heights y, We multiply by thc joint probability p4 of that age-height pair. Expected value of (x - rrii)(y - rna) Covariance <7la = £ £ ргДхг - mx)(Vi - ma) (12) alii, j To capture this idea of “joint probability p,/’ we begin with two small examples. Example 4 Hip two coins separately With I for heads and 0 for tails, the results can be (I, I) or (I. I)) or (0,1) or (0.0). Those four outcomes all have probability (I)2 = 1 lor independent experiments we multiply probabilities: The covariance is zero. pt) - Probability of (i, J) = (Probability of t) times (Probability of j).
8.4. Mean. Variance. and Covenant 329 Example 5 Glue the coins together (1,1) and (0,0) Those have probabtht^ i'X'i' £"* **У °“'У рохмЫте, are (1,0) and (0.1) won l happen because the coin, P,u m IQ*e®e* both heads or both tails. Joint probability matrices for Examples 1 and 2 1 and P2 Let me stay longer with P. to show it h, (heads, tails). Notice the row sum, p>. p, colunJi' Probability matrix matrix shows the •heads, heads) and (а,, й) . wm‘ Pi. Pi and the total sum - 1. first \ coin / Ptl + Р» = P2 4 entries add to 1 Pit P|2 Pit Pn (second coin) column sums Pt p* Those sum, p^ and />,.* are the marginals of the Join, pmbdahty matnx Pt P> = ₽“ + Pta - chance of heads from coin 1 (coin 2 can be heads or tatls) Pt -Ptt + P11 -chance of heads from coin 2 (coin lean be heads or tads) Example Hhowed independent random variables Every probably p„ equal, p. times p, (| times 5 gave ptJ - tn that example). In this сак the covariance al} will be aero Heads or tails from the first com gave no information about the second coin Zero covariance <7la for independent trial. «Л л= ~ diagonal covariance matrix V Independent experiments have <7 и 0 because every ptJ equals (p, )(p2) in equation (12). •Ha 52^)^И*‘“’"|)<»|-П1»)“ [&•)(*•-"«>'] p£(p,)(lb-"u)] -10]|0|. Example 6 The glued coin, show perfect correlation Head, on one means head, on the other. Thc covariance ffu moves from 0 to <T| times <ra This is the largest possible value of tfij. Here it is (|)(|) - <rla = (J), as a separate computation confirms: Means = - 2 ^’ = 5 (*-5) (’“I) + 0 + 0 + l(°"5)(°"l) Heads or tails from coin 1 gives complete information about heads or tails from coin 2: Glued coins give largest possible covariances Singular covariance matrix: determinant = 0 ' ci»« - <rj <7i<ra <7t<r3 <rj
330 Chapters. Learning from Dau Always <r3<r2 > (<Иа)2 Thus “ betnWn ‘Т,<Т2' Thc rnatri* V is positive definite (or in this singular case of glued coms. V is positive semidefinite). Those are important facts about all M by Л/ covanance matnees V for M experiments. Note that the sample covariance matrix S from A trials is certainly semidefinite. Every sample A' = (age. height) contributes to the sample mean X = (m.,пц). rank-one term (X, - X)(X - X)T is positive semidefimte and we just add to reach the matrix S. No probabilities in S. use the actual outcomes: The Covariance Matrix V is Positive Semidefinite Come back to the expected covariance »u between two expenments 1 and 2 (two coins); on = expected value of |(ошрш 1 - «гол 1) times (output 2 - mean 2)] (Пэ - E E Pa (** - m«) (Vi “ ma). Thc sum includes all ij. (’4) p,. > 0 is the probability of seeing outputs x, in experiment 1 and y, in experiment 2. Some pair of outputs must appear. Therefore the № joint probabilities p,} add lo 1. Total probability (ail pain) is 1 $2 " L (15) all i.J Here is another fact we need, fix on one particular output r, in experiment 1. Allow all outputs У) in experiment 2. Add the probabilities of (x,. pi). (x,. pj),..., (x,, t/„); n Row sum р,- of P £ Pij ~ probability p, of x, in experiment 1. (16) Some уj must happen in experiment 2! Whether the two coins are completely separate or glued, we get the same answer | for the probability рн “ Рни +Рнт that coin 1 is heads: (separate) Рнн + Pht - | | ~ (glued) Рнн + Рит = + 0 = i. That basic reasoning allows us to write one matrix formula that includes the covariance <rl2 along with the separate variances <rf and a2 for experiment 1 and experiment 2. We get the whole covariance matrix V by adding the matrices Цу for each pair (a. j): Covariance matrix _ Г (x, - mi)2 (x, — m, )(y, - mj ll V = sum of all *jj — P,] |<r, - mi)(yj — m2) (p, - mj)2 J
8.4. Mean. Variance. and Covari^e 0«1bedi4l.»al.1hi.i.tqo<l„M « ,eum» Ite «ta, «шм. *<•№. „ by using equation (16). Allowing all} jUM ? 1 Аи* ln detail how we get Ц, = — the probability Pi 0(,. in Mperinlen| Vit - > . - mJ2 = V (nn. . ..lift jr-,р,оЬЛ,,“> H|| | please look al that twice. It j( the . one formula (17). The beauty of that formula bT? *!“* cwwiancc тм,пх h> v„ h„di.s..n.l«n«k,p,,(Ij.„,). >„ „ N„rmi|1 Jo ( Thai matrix V(J ha. rank l.EquuionUTJmvl [(x(-mi)(^-m2) (gj-m,)2 ] [ю-т,]1 Ei-ery matrix рциит it positive semidefinite. So the whole matrix Г (thc шт of those rank 1 matnces) is at least semidefinite—and probably I’ is definite. (18) (19) The covariance matrix V is positive definite unless the experiments art dependent. Now we move from two variables r and g to Л/ viable» like age-height-weight The output from each tnal is a vector X with Л/ components (Each child has an agc- height-weight vector X with 3 components.) The covariance matrix V ix now Л/ by Af. The matrix V is created from the output vectors X and their average У = E ,’XJ: Covariance V=Ef(X-X) (X-Х)7 (X^-m,) (20) matrix i_________________________________________________ Remember that XXr and X XT = (column)(row) are M by M matnces. For Af = 1 (one variable) you see that У is the mean m and V is the variance a3. For Af -2 (two coins) you see that Г is (mt.m2) and V matches equation (7). Thc expectation always adds up outputs limes their probabilities. For age-height-weight thc output could be X = (5 years, 31 inches. 4S pounds) and its probability is ps.31.4e • Now comes a new idea. Take any linear combination с1 X = ci Xt + • • • + см Хм- With c = (6.2.5) this would be cTX = 6 x age + 2 x height + 5 x weight. By linearity wc know that its expected value E [cTX] is cTE [Xj = с1 X: E [cTX] = cTE (XJ - 6 (expected age) + 2(expected height) + 5 (expected weight).
332 Chapter 8. Learning from Data More than the mean of cTX. we also know iu variance tr2 - cT Vc: Variance of cTX = cTE ^(X - X) (X - X) j c — cT Vc Now lhe key point: The variance of cTX can never be negative. So Vc > о New proof: The covariance matrix V is positive semidefinite by the energy test cT Vc > o' Covariance matnees V open up the link between probability and linear algJbn>. V equals QAQT with eigenvalues A( > 0 and orthonormal eigenvectors qt to Diagonalizing the covariance matrix V means finding M independent experiments as combinations of tbe original M experiments. The Covariance Matrix for Z = AX Here is a good way to see oj when z = r + y. Think of (x,y) as a column vector X. Think of the 1 by 2 matrix A = [ 1 1 ] multiplying that vector X = (x. y). Then AX is lhe sum z = x + y. The variance a2 goes into matrix notation as •:-(* ч[Й. S’lR] •*“* (2J) Now for the main point. The vector X could have Af components coming from А/ experiments (instead of only 2). Those experiments will have an M by А/ covariance matrix Vx. The matrix A could be К by M. Then AX is a vector with К combinations of the Л/ outputs (instead of one combination x + у of 2 two outputs). That vector Z “ AX of length К has a К by К covariance matrix V% . Then the great rule for covariance matrices—of which equation (22) was only a 1 by 2 example—is this beautiful formula: The covariance matrix of AX is A (covariance matrix of X) AT; The covariance matrix of Z = AX Vz = AVXAT (23) To me. this neat formula shows the beauty of matrix multiplication. 1 won’t prove this formula, just admire it. It is constandy used in applications. The Correlation p Correlation p,v is closely related to covariance a^. They both measure dependence or independence. Start by rescaling or "standardizing" the random variables x and у The new X = х/<гж and У = y/av have variance <r2x = trj. = 1. This is just like dividing a vector t> by its length to produce a unit vector ю/||ю|| of length 1. Thc correlation of x and у is the covariance of X and Y. If the original covariance of x and у was then rescaling to X and У gives correlation pxy = <тяу/ая<ти. Correlation pxy =—— ж covariance of — and — tr* <rv Always —1 < p*v < 1
333 8.4. Mean, Variance. and Covariance problem Set 8.4 1 2 3 4 5 6 7 8 9 10 11 If all 24 samples froin sample mean p lhc ” 4 (hc _ ° finance & Wk » < * ж 20» * hat arc thc Add 7 to every outputWhat J *f ' = » « 21.12 tunes each 7 new sample mean, thc neu. - PP^ 10 °* n*an .k- “*’** «petted mean. and inancc 9 are thc We know: | of all lnte„en „. . . **lhc *• *«n»ce 7 What fraction of integers will к, ,'*'мЫе b7 3 and 1 o( lnu 8 *'П ** ^visible by 3 o, 7 t““?sen *" d,*‘“ble by 7. Suppose you sample from the number, , , " Whai k th‘hC PrDbab,l"le' Po «о Л that the 1^?^ ₽robab,l"*‘ 1/1000. What the expected mean m of that J* , **" °* 7ой' «mple is 0.......97 Cl . , •“ *1»M is ns Variance n3 7 Sample again from 1 to 1000 but lew м .h. i square could end with z = o, 1 4 j 6 d'«« <* 1* sample ^uarrd T^t finl dig.i (71 I 'iS)'ww*Т-•*К*I»«» variances11 Rtme„to S « “»-P* Equation (4) gave a second equivalent form fw (Ihe vanance using sample,) S’ = ^Ti,umof(xi-m),-~-y[(lumofzJ)-WmJ]. Verify lhe matching identity for the expected vanancc a3 (using m - £ p, r.): <ra =- sum of (z, - m)’ = (sum of p, *J) - m2. Computer experiment: Find the average Aioooow of a million random 0-1 samples I What is your value of the standardized vanable X = (Ajv - |) /2s/?f 7 For any function /(z) the expected value is E[/| = £> /fo) or f p(z) /(z)dr (discrete or continuous probability). The function can be i or (z - m)’ or z2. If the mean is E[z] = m and the variance is E[(x - m)2] ® o3. what is E[z2) 7 Show that the standard normal distnbution p(z) has total probability J p(r) dr = 1 as required. A famous trick multiplies Jp(x)dr by fp(ii)dy and computes thc integral over all z and all у (—ос to oo). The trick is to replace dr dy in that double integral by r dr dO (polar coordinates with r2 + g2 = r2). Explain each step: oo oo oo 2» 2тг Zpfz) dr Ml/) dy= /J e'^^dx dy= J —oo —oc —co 0 = 0 г -к Ir dr de = 2* = 0
A1 The Ranks of AB and A + В This page establishes kes facts about ranks: «ben we multiply matnees. the rank cannot increase You vu ill see this by looking at column spaces and row spaces. And there is one special situation alien the rank cannot decrease. Then you know thc rank of AB. Statement 4 will be tmporunt when data science factors a mains into UV or CR. Here are five key facts in one place mequal.ties and equal.tics for the rank. 1 Rank of AB < rank of A Rank of AB < rank of В 2 Rank of A + В < (rank of A) + (rank of Bi 3 Rank of A* A = rank of AAT = rank of A = rank of AT 4 If A is m by r and В is r by n—both with rank r—then AB also has rank r Statement 1 insolves the column space and row space of AB: C(AB) is contained in C(A) C((AB)T) i» contained in C(BT) Every column of A В is a combination of the columns of A (matrix multiplication) Every row of AB is a combination of the rows of В (matrix multiplication) Remember from Section 1.4 that row rank = column rank Wc can use rows or columns. The rank cannot grow when we multiply AB Statement 1 in the box is frequently used. Statement 2 Each column of Л + В is lhe sum of (column of A) + (column of B). rank (A + B) < rank (A) + rank (B) is always true It combines bases for C(A) and C(B) rank (A + B) - rank (A) + rank (B) is not always true. It is certainly false if A - В I. Statement 3 A and ATA both have n columns. They also have the same nullspace. (See Problem 4.1.9.) So n - r is the same for both, and the rank r is rhe same for both. Then rank(AT) > rank(ATA) - rank(A). Exchange A and AT to show their equal ranks. Statement 4 We are told that A and В have rank r. By Statement 3. A1A and BBT have rank r. Those arerby r matnees so they are invertible. So is their product A1 ABBT. Then r = rank of (ATABBT) < rank of (AB) by Statement 1 : AT, BTcan’t increase rank We also know rank (AB) < rank A = r. So we have proved that AB has rank exactly r. Note This does not mean that every product of rank r matrices will have rank r. Statement 4 assumes that A has exactly r columns and В has r rows. BA can easily fail. В = [ 1 2 —3 AB has rank 1 But BA is zero! 334
A2   Eigenvalues and Singular Values: Rank One

A rank one matrix has the simple form A = xyᵀ. Its singular vectors and its one nonzero singular value are incredibly easy to find: u₁ = x/||x||, v₁ = y/||y||, and σ₁ = ||x|| ||y||. You see immediately that A = xyᵀ = σ₁u₁v₁ᵀ. All other columns of U and V are orthogonal to u₁ and v₁, so the SVD of A is complete.

Eigenvalues and eigenvectors are not quite that easy. Of course the matrix A must be square. To make life simple we continue with a 2 by 2 matrix A = xyᵀ. Certainly x is an eigenvector:  Ax = xyᵀx = λ₁x.  λ₁ is the number yᵀx. The other eigenvalue is λ₂ = 0 since A is singular (rank = 1). The eigenvector x₂ = y⊥ must be perpendicular to y, so that Ax₂ = xyᵀy⊥ = 0. If y = (a, b) then y⊥ is its 90° rotation (b, −a).

The transpose matrix Aᵀ = yxᵀ has the same eigenvalues yᵀx and 0. Its eigenvectors are the left eigenvectors of A. They will be y and x⊥ (because yxᵀ has eigenvectors y and x⊥). The only question is the scaling that decides the eigenvector lengths. The requirement is (left eigenvector)ᵀ(right eigenvector) = 1. Then the left eigenvectors are the rows of X⁻¹ when the right eigenvectors are the columns of X: perfection! In our case those dot products of eigenvectors now stand at yᵀx and (x⊥)ᵀy⊥. Divide both left eigenvectors y and x⊥ by the number yᵀx, to produce X⁻¹X = XX⁻¹ = I.

Finally there is one more crucial possibility, that yᵀx = 0. Now the eigenvalues of A = xyᵀ are zero and zero. A has only one line of eigenvectors, because y⊥ is in the same direction as x. The diagonalization (2) breaks down because the eigenvector matrix X becomes singular. We cannot divide by its determinant yᵀx = 0. This shows how eigenvectors can go into a death spiral (or a fatal embrace x = y⊥). Of course the pairs of singular vectors x, x⊥ and y, y⊥ remain orthogonal.

Question   In equation (2), verify that A [x  y⊥] = (yᵀx) [x  0].
Question   When does A = xyᵀ have orthogonal eigenvectors?
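These rank-one facts are easy to check numerically. A small sketch, assuming NumPy, with an arbitrary choice of x and y (not from the text):

import numpy as np

x = np.array([2.0, 1.0])
y = np.array([3.0, -1.0])
A = np.outer(x, y)                    # A = x y^T, rank one

# Eigenvalues should be y^T x and 0; eigenvectors are x and the 90-degree rotation of y.
evals, evecs = np.linalg.eig(A)
print(np.sort(evals.real), "vs", [0.0, y @ x])

# The only nonzero singular value should be ||x|| ||y||.
print(np.linalg.svd(A, compute_uv=False), "vs", np.linalg.norm(x) * np.linalg.norm(y))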
A3   Counting Parameters in the Basic Factorizations

   A = LU     A = QR     S = QΛQᵀ     A = XΛX⁻¹     A = QS     A = UΣVᵀ

This is a review of key ideas in linear algebra. The ideas are expressed by those factorizations and our plan is simple: Count the parameters in each matrix. We hope to see that in each equation like A = LU, the two sides have the same number of parameters. For A = LU, both sides have n² parameters.

   L : triangular n × n matrix with 1's on the diagonal      n(n − 1)/2
   U : triangular n × n matrix with free diagonal            n(n + 1)/2
   Q : orthogonal n × n matrix                               n(n − 1)/2
   S : symmetric n × n matrix                                n(n + 1)/2
   Λ : diagonal n × n matrix                                 n
   X : n × n matrix of independent eigenvectors              n² − n

Comments are needed for Q. Its first column q₁ is a point on the unit sphere in Rⁿ. That sphere is an (n − 1)-dimensional surface, just as the unit circle x² + y² = 1 in R² has only one parameter (the angle θ). The requirement ||q₁|| = 1 has used up one of the n parameters in q₁. Then q₂ has n − 2 parameters—it is a unit vector and it is orthogonal to q₁. The sum (n − 1) + (n − 2) + ··· + 1 equals n(n − 1)/2 free parameters in Q.

The eigenvector matrix X has only n² − n parameters, not n². If x is an eigenvector then so is cx for any c ≠ 0. We could require the largest component of every x to be 1. This leaves n − 1 parameters for each eigenvector (and no free parameters for X⁻¹).

The count for the two sides now agrees in all of the first five factorizations. For the SVD, use the reduced form A(m×n) = U(m×r) Σ(r×r) Vᵀ(r×n) (known zeros are not free parameters!). Suppose that m ≤ n and A is a full rank matrix with r = m. The parameter count for A is mn. So is the total count for U, Σ, and V. The reasoning for orthonormal columns in U and V is the same as for orthonormal columns in Q.

   U has m(m − 1)/2     Σ has m     V has (n − 1) + ··· + (n − m) = mn − m(m + 1)/2

Finally, suppose that A is an m by n matrix of rank r. How many free parameters in a rank r matrix? We can count again for U(m×r) Σ(r×r) Vᵀ(r×n):

   U has (m − 1) + ··· + (m − r) = mr − r(r + 1)/2     V has nr − r(r + 1)/2     Σ has r

The total parameter count for rank r is (m + n − r)r.

We reach the same total for A = CR in Section 1.4. The r columns of C were taken directly from A. The row matrix R includes an r by r identity matrix (not free!). Then the count for CR agrees with the previous count for UΣVᵀ, when the rank is r:

   C has mr parameters     R has nr − r² parameters     Total (m + n − r)r.
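The counts are easy to tally by machine. A small sketch (ours, not from the text) that checks the totals for the square factorizations and for a rank r matrix:

def square_counts(n):
    L = n * (n - 1) // 2          # unit lower triangular
    U = n * (n + 1) // 2          # upper (or R) with free diagonal
    Q = n * (n - 1) // 2          # orthogonal
    S = n * (n + 1) // 2          # symmetric
    Lam = n                       # diagonal eigenvalue matrix
    X = n * n - n                 # eigenvectors, scaled
    assert L + U == n * n         # A = LU
    assert Q + U == n * n         # A = QR
    assert Q + Lam == S           # S = Q Lambda Q^T
    assert X + Lam == n * n       # A = X Lambda X^{-1}
    assert Q + S == n * n         # A = QS

def rank_r_count(m, n, r):
    U = m * r - r * (r + 1) // 2
    V = n * r - r * (r + 1) // 2
    Sigma = r
    assert U + Sigma + V == (m + n - r) * r       # total for a rank r matrix
    # A = CR gives the same total: C has m*r, R has n*r - r*r
    assert m * r + (n * r - r * r) == (m + n - r) * r

square_counts(7)
rank_r_count(9, 6, 4)
print("all parameter counts agree")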
A4   Codes and Algorithms for Numerical Linear Algebra

LAPACK is the first choice for dense linear algebra computations.
ScaLAPACK achieves high performance for very large problems.
COIN/OR collects high quality codes for the optimization problems of operations research.

Here are sources for specific algorithms.

Direct solution of linear systems
   Basic matrix-vector operations                   BLAS
   Elimination with row exchanges                   LAPACK
   Sparse direct solvers (UMFPACK)                  SuiteSparse, SuperLU
   QR by Gram-Schmidt and Householder               LAPACK

Eigenvalues and singular values
   Shifted QR method for eigenvalues                LAPACK
   Golub-Kahan method for the SVD                   LAPACK

Iterative solutions
   Preconditioned conjugate gradients for Sx = b    Trilinos
   Preconditioned GMRES for Ax = b                  Trilinos
   Krylov-Arnoldi for Ax = λx                       ARPACK, Trilinos, SLEPc
   Extreme eigenvalues of S                         see also BLOPEX

Optimization
   Linear programming                               CLP in COIN/OR
   Semidefinite programming                         CSDP in COIN/OR
   Interior point methods                           IPOPT in COIN/OR
   Convex optimization                              CVX, CVXR

Randomized linear algebra
   Randomized factorizations via pivoted QR         users.ices.utexas.edu/~pg…/main codes.html
   A = CMR   columns/mixing/rows
   Fast Fourier Transform                           FFTW.org

Repositories of high quality codes                  GAMS and Netlib.org
ACM Transactions on Mathematical Software           TOMS

Deep learning software
   Deep learning in Julia                           Flux: fluxml.ai/Flux.jl/stable
   Deep learning in MATLAB
   Deep learning in Python                          TensorFlow.org, Keras
   Deep learning in R and JavaScript                KerasR, TensorFlow.js
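Many of these packages are reachable from high-level languages. A minimal sketch (not from the text), assuming NumPy and SciPy, whose dense routines wrap BLAS and LAPACK and whose sparse eigensolver wraps ARPACK:

import numpy as np
from scipy.linalg import lu, qr, svd           # LAPACK-backed dense routines
from scipy.sparse.linalg import eigsh          # ARPACK: extreme eigenvalues

A = np.random.randn(6, 6)
S = A.T @ A                                    # symmetric positive definite

P, L, U = lu(A)                                # elimination with row exchanges
Q, R = qr(A)                                   # Householder QR
U2, s, Vt = svd(A)                             # Golub-Kahan SVD
lam_max = eigsh(S, k=1, which='LA')[0]         # largest eigenvalue of S
x_fft = np.fft.fft(np.random.randn(8))         # Fast Fourier Transform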
A5   Matrix Factorizations

1.  A = CR = (basis for column space of A) (basis for row space of A)
    Requirements: C is m by r and R is r by n. Columns of A go into C if they are not combinations of earlier columns of A. R contains the nonzero rows of the reduced row echelon form R₀ = rref(A). Those rows begin with an r by r identity matrix, so R equals [ I  F ] times a column permutation P.

2.  A = CMR′ :  C = first r independent columns of A,  W = first r by r invertible submatrix,  R′ = first r independent rows of A
    Requirements: C and R′ come directly from A. Those columns and rows meet in the r by r matrix W = M⁻¹ (Section 3.2): M = mixing matrix. The first r by r invertible submatrix W is the intersection of the r columns of C with the r rows of R′.

3.  A = LU = (lower triangular L with 1's on the diagonal) (upper triangular U with pivots on the diagonal)
    Requirements: No row exchanges as Gaussian elimination reduces square A to U.

4.  A = LDU = (lower triangular L, 1's on the diagonal) (pivot matrix D is diagonal) (upper triangular U, 1's on the diagonal)
    Requirements: No row exchanges. The pivots in D are divided out from rows of U to leave 1's on the diagonal of U. If A is symmetric then U is Lᵀ and A = LDLᵀ.

5.  PA = LU (permutation matrix P to avoid zeros in the pivot positions)
    Requirements: A is invertible. Then P, L, U are invertible. P does all of the row exchanges on A in advance, to allow normal LU. Alternative: A = L₁P₁U₁.

6.  S = CᵀC = (lower triangular) (upper triangular) with √D on both diagonals
    Requirements: S is symmetric and positive definite (all n pivots in D are positive). This Cholesky factorization C = chol(S) has Cᵀ = L√D, so S = CᵀC = LDLᵀ.

7.  A = QR = (orthonormal columns in Q) (upper triangular matrix R)
    Requirements: A has independent columns. Those are orthogonalized in Q by the Gram-Schmidt or Householder process. If A is square then Q⁻¹ = Qᵀ.

8.  A = XΛX⁻¹ = (eigenvectors in X) (eigenvalues in Λ) (left eigenvectors in X⁻¹)
    Requirements: A must have n linearly independent eigenvectors.

9.  S = QΛQᵀ = (orthogonal matrix Q) (real eigenvalue matrix Λ) (Qᵀ is Q⁻¹)
    Requirements: S is real and symmetric: Sᵀ = S. This is the Spectral Theorem.
10. A = BJB⁻¹ = (generalized eigenvectors in B) (Jordan blocks in J) (B⁻¹)
    Requirements: A is any square matrix. J has one Jordan block for each linearly independent eigenvector of A. Every block has only one eigenvalue.

11. A = UΣVᵀ = (orthogonal U, m × m) (m × n singular value matrix Σ with σ₁, ..., σᵣ on its diagonal) (orthogonal V, n × n)
    Requirements: None. This Singular Value Decomposition has the eigenvectors of AAᵀ in U and the eigenvectors of AᵀA in V; σᵢ = √λᵢ(AᵀA) = √λᵢ(AAᵀ). Those singular values are σ₁ ≥ σ₂ ≥ ··· ≥ σᵣ > 0. By column-row multiplication A = UΣVᵀ = σ₁u₁v₁ᵀ + ··· + σᵣuᵣvᵣᵀ. If S is symmetric positive definite then U = V = Q and Σ = Λ and S = QΛQᵀ.

12. A⁺ = VΣ⁺Uᵀ = (orthogonal, n × n) (n × m pseudoinverse of Σ with 1/σ₁, ..., 1/σᵣ on its diagonal) (orthogonal, m × m)
    Requirements: None. The pseudoinverse A⁺ has A⁺A = projection onto the row space of A and AA⁺ = projection onto the column space. A⁺ = A⁻¹ if A is invertible. The shortest least-squares solution to Ax = b is x⁺ = A⁺b. This solves AᵀAx⁺ = Aᵀb.

13. A = QS = (orthogonal matrix Q) (symmetric positive definite matrix S)
    Requirements: A is invertible. This polar decomposition has S² = AᵀA. The factor S is semidefinite if A is singular. The reverse polar decomposition A = KQ has K² = AAᵀ. Both have Q = UVᵀ from the SVD.

14. A = UΛU⁻¹ = (unitary U) (eigenvalue matrix Λ) (U⁻¹, which is Uᴴ = Ūᵀ)
    Requirements: A is normal: AᴴA = AAᴴ. Its orthonormal eigenvectors are the columns of U. Complex λ's unless S = Sᴴ: the Hermitian case.

15. A = QTQ⁻¹ = (unitary Q) (triangular T with λ's on the diagonal) (Q⁻¹ = Qᴴ)
    Requirements: Schur triangularization of any square A. There is a matrix Q with orthonormal columns that makes Q⁻¹AQ triangular: Section 6.3.

16. Fₙ = [ I  D ; I  −D ] [ F(n/2)  0 ; 0  F(n/2) ] [ even-odd permutation ]
    Requirements: Fₙ = Fourier matrix with entries wʲᵏ where wⁿ = 1. D has 1, w, ..., w^(n/2 − 1) on its diagonal. The recursive Fast Fourier Transform will compute Fₙx with only ½ n log₂ n multiplications.
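Several of the factorizations in this list can be computed directly with standard library calls. A minimal sketch, assuming NumPy and SciPy (the function names below are SciPy's, not the book's):

import numpy as np
from scipy.linalg import lu, qr, cholesky, schur, polar

A = np.random.randn(4, 4)
S = A.T @ A                                   # symmetric positive definite

P, L, U = lu(A)                               # 5.  row exchanges: A = P L U in SciPy's convention
C = cholesky(S, lower=False)                  # 6.  S = C^T C (Cholesky)
Q, R = qr(A)                                  # 7.  A = QR
lam, X = np.linalg.eig(A)                     # 8.  A = X Lambda X^{-1} (if diagonalizable)
lamS, QS = np.linalg.eigh(S)                  # 9.  S = Q Lambda Q^T
U2, s, Vt = np.linalg.svd(A)                  # 11. A = U Sigma V^T
Aplus = np.linalg.pinv(A)                     # 12. A^+ = V Sigma^+ U^T
Qp, Sp = polar(A)                             # 13. A = QS (polar decomposition)
T, Z = schur(A)                               # 15. real Schur form: A = Z T Z^T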
A6   The Column-Row Factorization of a Matrix

Abstract
The active ideas in linear algebra are often expressed by matrix factorizations: S = QΛQᵀ for symmetric matrices (the spectral theorem) and A = UΣVᵀ for all matrices (singular value decomposition). Far back near the beginning comes A = LU for successful elimination: lower triangular times upper triangular. This paper is one step earlier, with bases in A = CR for the column space and row space of any matrix—and a proof that column rank = row rank. The echelon form of A and the pseudoinverse A⁺ appear naturally. The "proofs" are mostly "observations".

Introduction
An introduction is hardly necessary for so short a paper. But I can explain the background. In teaching linear algebra, the course often begins slowly. The idea of a vector space waits until Chapter 3. The highly important topic of singular values is squeezed into the final week or completely omitted. A new plan is needed.

I now start the course in a different way. The multiplication Ax produces a combination of the columns of A. All combinations fill the column space of A—a key idea to visualize. Simple examples of Ax = 0 show the idea of linear dependence. Starting with column 1, we create a matrix C with a full set of independent columns—a basis for the column space. I believe that this "fast start" is also a better start.

Every column of A is a combination of the columns of C. Introducing matrix multiplication, that fact becomes A = CR. We have a natural factorization of A, to be followed by A = LU (elimination) and A = QR (Gram-Schmidt) and S = QΛQᵀ (eigenvalues in the spectral theorem) and A = UΣVᵀ (singular values in the SVD). The course has a structure that students can follow. A new textbook called "Linear Algebra for Everyone" is in preparation.

The key point for this paper is that the matrix R in A = CR is already famous. R is the reduced row echelon form of A, with any zero rows removed. It has a simple "formula" R = [ I  F ] P which the mechanics of elimination will execute. And it has a "meaning" that is hidden in those row operations on A. R tells us the combinations of independent columns in C which produce all the columns of A.

The Factorization A = CR
A is a real matrix with m rows and n columns. Some of those columns might be linear combinations of previous columns. Here is a natural way, working from left to right, to find a complete set of independent columns.

   If column 1 is not zero, put it into C.
   If column 2 of A is not a multiple of column 1, put it into C.
   If column 3 is not a linear combination of columns 1 and 2, put it into C. Continue.
At the end, C will have r independent columns. Those columns will be a basis for the column space of A. Every column of A is a combination of the columns of C, and the coefficients in those combinations go into the columns of R:

   A = [ 1 4 7 ; 2 5 8 ; 3 6 9 ] = [ 1 4 ; 2 5 ; 3 6 ] [ 1 0 −1 ; 0 1 2 ] = CR   with r = 2.

The matrix R contains an r by r identity matrix in the columns that correspond to independent columns of A. Column 3 of R tells us that column 3 of A is (−1)(column 1) + (2)(column 2). A first observation: R is the reduced row echelon form of A, without its m − r zero rows.

A second observation: Every row of A is a combination of the rows of R. This comes directly from the matrix multiplication A = CR. Row 1 of A is 1 times row 1 of R plus 4 times row 2 of R. The coefficients like 1 and 4 in those linear combinations are in the rows of C. And the rows of R are independent, because the r by r identity matrix is a submatrix of R. Then the rows of R are a basis for the row space of A. This matrix R is usually computed by row operations on A to reach the "echelon form". Here R appears after the column basis in C.

A third observation comes from A = CR = (m × r)(r × n):

   The column rank of A equals the row rank of A.

The same number r counts independent columns in C and independent rows in R.

A fourth observation is a "formula" for the reduced row echelon form R₀ = rref(A). Normally this matrix with m − r zero rows is constructed directly by row operations on A, and C does not appear. A direct description of R₀ could be as follows. Suppose the basic columns in C are columns n₁ < n₂ < ··· < nᵣ of A. The other n − r columns of A are combinations N = CF of those r basic columns (in order). Then the reduced row echelon form of A with m − r zero rows is R₀:

   R₀ = [ I  F ; 0  0 ] P.   The permutation P puts the n columns of C and N into their correct order in A, and I = r × r identity matrix.

Note that A = C [ I  F ] P = [ C  N ] P has the columns of A in their original order, thanks to P. In the formula above, R₀ is constructed directly from A and its uniqueness is clear.

Eric Grinberg has invented the name "gauche basis" for the columns of C—a brilliant suggestion that reinforces the left-to-right construction of this basis for the column space of A.

Uniqueness of the reduced row echelon form seems to be a moot question when there is an explicit formula for that matrix R₀. The formula cannot be new, but I don't know a reference. Observations 1-3 are definitely not new.
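A minimal computational sketch of this construction, assuming NumPy (the helper name cr_factorization is ours, not from the paper): it picks independent columns left to right and then solves CR = A for R.

import numpy as np

def cr_factorization(A, tol=1e-10):
    A = np.asarray(A, dtype=float)
    cols = []                                  # indices of independent columns
    for j in range(A.shape[1]):
        candidate = A[:, cols + [j]]
        if np.linalg.matrix_rank(candidate, tol=tol) > len(cols):
            cols.append(j)                     # column j is not a combination of earlier ones
    C = A[:, cols]                             # basis for the column space
    # Each column of A is C times a coefficient vector: solve C R = A.
    R = np.linalg.lstsq(C, A, rcond=None)[0]   # r by n, contains an identity in positions 'cols'
    return C, R

A = np.array([[1., 4., 7.],
              [2., 5., 8.],
              [3., 6., 9.]])
C, R = cr_factorization(A)
print(C)                      # columns 1 and 2 of A
print(np.round(R, 10))        # [[1, 0, -1], [0, 1, 2]]
print(np.allclose(C @ R, A))  # True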
A Mixing Matrix
Here is a variation on the matrix factorization A = CR. The matrix C contains actual columns of A, but the matrix R does not contain rows of A. For symmetric perfection, we might prefer the matrix R′ consisting of the r uppermost linearly independent rows, taken directly from A. Generally A = CR′ will not be true. To recover a correct factorization of A, we need to include a mixing matrix M between C and R′. Then A = CMR′.

(The idea of a mixing matrix has become widespread in numerical linear algebra. The symbol U is often chosen instead of M, but U is needed in the important factorizations LU and UΣVᵀ. We would like to nominate the letter M and the adjective mixing.)

Does M have a simple formula? Yes: M⁻¹ is the r by r submatrix at the intersection of the r columns of C with the r rows of R′. If those happen to be the first r columns and the first r rows, it is easy to see that MR′ will produce the familiar r by r identity matrix that begins the reduced row echelon form R. Then A = CMR′ is identified with A = CR.

The Pseudoinverse
Factorizations like A = CR are familiar to algebraists. It is not surprising that they connect to other constructions. An example in linear algebra is the pseudoinverse of A. We write the pseudoinverse as A⁺. It inverts the mapping (multiplication by A) from row space to column space. The pseudoinverse is zero on the nullspace of Aᵀ. Thus it inverts A where inversion is possible: AA⁺A = A, and A⁺ = (UΣVᵀ)⁺ = VΣ⁺Uᵀ.

The pseudoinverse connects perfectly to the rank r factors in A = CR. The pseudoinverse of C is its left inverse C⁺ = (CᵀC)⁻¹Cᵀ. The pseudoinverse of R is its right inverse R⁺ = Rᵀ(RRᵀ)⁻¹. Then the pseudoinverse of A = CR is A⁺ = R⁺C⁺, because all ranks are equal to r.

This short paper was submitted to the Journal of Convex Analysis. A further note about these factorizations is in progress with Daniel Drucker and Alexander Lin. Our plan is to post both papers on the arXiv website in 2020.
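A short numerical check of A⁺ = R⁺C⁺ for the 3 by 3 example above, assuming NumPy (a sketch, not from the paper):

import numpy as np

A = np.array([[1., 4., 7.],
              [2., 5., 8.],
              [3., 6., 9.]])
C = A[:, :2]                                   # the two independent columns of A
R = np.linalg.lstsq(C, A, rcond=None)[0]       # so that A = CR with rank r = 2

C_plus = np.linalg.inv(C.T @ C) @ C.T          # left inverse of C
R_plus = R.T @ np.linalg.inv(R @ R.T)          # right inverse of R
A_plus = R_plus @ C_plus                       # pseudoinverse A^+ = R^+ C^+

print(np.allclose(A_plus, np.linalg.pinv(A)))  # True
print(np.allclose(A @ A_plus @ A, A))          # A A^+ A = A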
A7   The Jordan Form of a Square Matrix

We know that some square matrices A do not have n independent eigenvectors. Therefore they cannot be diagonalized by XΛX⁻¹: there is no invertible eigenvector matrix X. Jordan established the closest possible approach to a diagonal form. When A has k independent eigenvectors, his form J has k Jordan blocks J₁, ..., J_k:

   J = B⁻¹AB has Jordan blocks Jᵢ with the eigenvalue λᵢ repeated on the diagonal and 1's just above the diagonal.

If A can be diagonalized, then k = n (all blocks are 1 by 1 and J = Λ). If A can't be diagonalized, then k < n. Each block Jᵢ of size nᵢ has only one eigenvalue λᵢ and only one eigenvector. The matrix B contains eigenvectors and "generalized eigenvectors" of A. Here is an example rather than a proof.

Example   A has λ = 3, 3, 3 with only two genuine eigenvectors. Its Jordan form has a 2 by 2 block and a 1 by 1 block, with eigenvectors (1, 0, 0) and (0, 0, 1) for J:

   J = [ 3 1 0 ; 0 3 0 ; 0 0 3 ]   and   A = BJB⁻¹.

This Jordan form makes Aⁿ = BJⁿB⁻¹ and e^(At) = B e^(Jt) B⁻¹ as simple as possible to compute. For powers and exponentials of J, we just compute block by block:

   [ 3 1 ; 0 3 ]ⁿ = [ 3ⁿ  n3ⁿ⁻¹ ; 0  3ⁿ ]   and   exp([ 3 1 ; 0 3 ]t) = [ e^(3t)  te^(3t) ; 0  e^(3t) ].

That exponential formula is telling us the missing solution to the differential equation dU/dt = JU (and also du/dt = Au). The usual solution has e^(3t). We can't just use that twice, when λ = 3 is repeated. The missing solution is te^(3t). And a triple eigenvalue λ = 3 with only one eigenvector (and one Jordan block) would involve t²e^(3t).

The Cayley-Hamilton Theorem
"Every matrix A satisfies its own characteristic equation p(λ) = 0." The determinant of A − λI is a polynomial p(λ), and the n solutions to p(λ) = 0 are the eigenvalues of A. Our example above has p(λ) = (λ − 3)³ with a triple eigenvalue λ = 3, 3, 3. Then Cayley-Hamilton says that p(A) = (A − 3I)³ has to be the zero matrix. Jordan makes this easy, because p(A) = B p(J) B⁻¹ and p(J) is certainly zero:

Example   p(J) = (J − 3I)³ = [ 0 1 0 ; 0 0 0 ; 0 0 0 ]³ = [ 0 0 0 ; 0 0 0 ; 0 0 0 ]   and then (A − 3I)³ = 0.
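A quick numerical check, assuming NumPy; the invertible matrix B below is an arbitrary choice (not from the text), used only to build some A = BJB⁻¹ with this Jordan form.

import numpy as np

J = np.array([[3., 1., 0.],
              [0., 3., 0.],
              [0., 0., 3.]])                 # Jordan form: eigenvalues 3, 3, 3
B = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])                 # any invertible matrix
A = B @ J @ np.linalg.inv(B)                 # A = B J B^{-1}

p_of_A = np.linalg.matrix_power(A - 3 * np.eye(3), 3)   # p(A) = (A - 3I)^3
print(np.allclose(p_of_A, 0))                # True: Cayley-Hamilton

# Powers come block by block: the 2 by 2 block gives [[3^n, n 3^(n-1)], [0, 3^n]].
n = 5
print(np.linalg.matrix_power(J[:2, :2], n))
print([[3**n, n * 3**(n - 1)], [0, 3**n]])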
A8   Tensors

In linear algebra, a tensor is a multidimensional array. (To Einstein, a tensor was a function that followed certain transformation rules.) A matrix is a 2-way tensor. A 3-way tensor T is a stack of matrices. Its elements have three indices: row number i and column number j and "tube number" k.

An example is a color image. It has 3 slices corresponding to red-green-blue. A slice of T shows the density of one of those primary colors RGB (k = 1 to 3) at each pixel (i, j) in the image.

[Figure: a vector (1-way array) and a tensor (3-way array, a stack of matrices).]

Another example is a joint probability tensor. Now p(i, j, k) is the probability that a random individual has (for example) age i and height j and weight k. The sum of all those numbers p(i, j, k) will be 1. For i = 9, the sum of all p(9, j, k) would be the fraction of individuals that have age 9—the sum over one slice of the tensor.

A fundamental problem—with tensors as with matrices—is to decompose the tensor T into simpler pieces. For a matrix A that was accomplished by the SVD. The pieces that add to A are matrices (we should now say 2-tensors), with the special property that each piece is a rank-one matrix uvᵀ. Linear algebra allowed us to require that the u's from different pieces were orthogonal, that the v's were also orthogonal, and that there were only r pieces (r ≤ m and r ≤ n).

Sad to say, this SVD format is not possible for a 3-way tensor. We can still ask for R rank-one pieces that approximately add to T:

   CP Decomposition   T ≈ a₁ ∘ b₁ ∘ c₁ + ··· + a_R ∘ b_R ∘ c_R.   (1)

Orthogonality of the a's and of the b's and of the c's is generally impossible. The number of pieces is not set by T (its "rank" is not well defined). But an approximate decomposition of this kind is still useful in computations with tensors. One option is to solve alternately for the aᵢ (with fixed bᵢ and cᵢ), then for the bᵢ (fixed aᵢ and cᵢ), and then for the cᵢ (fixed aᵢ and bᵢ). Those subproblems can be reduced to least squares. Other approximate decompositions of T are possible.

The theory of tensor decompositions (multilinear algebra) is driven by applications. We must be able to compute with T. So the algorithms are steadily improving, even without the orthogonality properties of an SVD.
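A small sketch, assuming NumPy, of a 3-way tensor assembled from rank-one pieces a ∘ b ∘ c as in the CP decomposition (1); the sizes are arbitrary and not from the text.

import numpy as np

I, J, K, R = 4, 5, 3, 2
a = [np.random.randn(I) for _ in range(R)]
b = [np.random.randn(J) for _ in range(R)]
c = [np.random.randn(K) for _ in range(R)]

# T[i, j, k] = sum over r of a_r[i] * b_r[j] * c_r[k]
T = sum(np.einsum('i,j,k->ijk', a[r], b[r], c[r]) for r in range(R))
print(T.shape)                            # (4, 5, 3): rows, columns, "tubes"

# Slice k = 0 is an ordinary matrix: sum over r of c_r[0] * a_r b_r^T
slice0 = sum(c[r][0] * np.outer(a[r], b[r]) for r in range(R))
print(np.allclose(T[:, :, 0], slice0))    # True: each slice has rank at most R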
A9   The Condition Number

The condition number measures the ratio of (change in solution) to (change in data). The most common problem is to solve n linear equations Ax = b for the n unknowns x. In this case the data is b and the solution is x = A⁻¹b. Suppose A is fixed. The change in the data is Δb and the change in the solution is Δx. We have to decide the meaning of the word "change". Do we compute the absolute change ||Δb|| or the relative change ||Δb||/||b||? That decision for the data b brings a matching decision for the solution x.

   Absolute condition = max over b, Δb of ||Δx|| / ||Δb|| = ||A⁻¹||   (1)

The absolute choice looks good but it has a problem. If we divide the matrix A by 10, we are multiplying A⁻¹ by 10. The absolute condition goes up by 10. But solving Ax = b is not 10 times harder. The relative condition number is the right choice.

   Relative condition = max over b, Δb of (||Δx||/||x||) / (||Δb||/||b||) = ||A|| ||A⁻¹|| = cond(A)   (2)

If A is the simple diagonal matrix Σ with entries σ₁ ≥ ··· ≥ σₙ, then its norm is ||A|| = σ₁ = σ_max. The norm of A⁻¹ is 1/σ_min. The orthogonal matrices U and V in the SVD leave the norms unchanged. So the ratio σ_max/σ_min is cond(A). We are using the usual measure of length ||x||² = x₁² + ··· + xₙ².

Notice that σ_min (not λ_min) measures the distance from A to the nearest singular matrix. At first we might expect to see A − λ_min I, bringing the smallest eigenvalue to zero. Wrong. The nearest singular matrix to A = UΣVᵀ is U(Σ − σ_min I)Vᵀ, because the orthogonal matrices U and V don't affect the norm. Bring the smallest singular value to zero.

The eigenvalues of A have different condition numbers. Suppose λ is a simple root (not a repeated root) of the equation det(A − λI) = 0. Then Ax = λx and Aᵀy = λy for unit eigenvectors ||x|| = ||y|| = 1. The condition number of λ is 1/|yᵀx|. In other words it is 1/|cos θ|, where θ is the angle between the right eigenvector x and the left eigenvector y. (The name comes from the equation yᵀA = λyᵀ, with yᵀ on the left side of A.)

Notice that a symmetric matrix A will have y = x with cos θ = 1. The eigenvalue problem is perfectly conditioned for symmetric matrices, just as Ax = b was perfectly conditioned for orthogonal matrices with ||Q|| ||Q⁻¹|| = 1. The formula 1/|yᵀx| comes from the change Δλ ≈ yᵀ(ΔA)x / yᵀx in the eigenvalue created by a small change ΔA in the matrix.
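A short sketch, assuming NumPy, that computes both condition numbers on this page: cond(A) = σ_max/σ_min for Ax = b, and 1/|yᵀx| for each eigenvalue. The 2 by 2 matrix is an arbitrary example, not from the text.

import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 0.01]])

sigma = np.linalg.svd(A, compute_uv=False)
print(sigma[0] / sigma[-1], np.linalg.cond(A))   # cond(A) = sigma_max / sigma_min

# Condition number of an eigenvalue: 1 / |y^T x| for unit right and left eigenvectors.
lam, X = np.linalg.eig(A)            # right eigenvectors (columns of X)
lamT, Y = np.linalg.eig(A.T)         # eigenvectors of A^T = left eigenvectors of A
for i in range(2):
    j = np.argmin(abs(lamT - lam[i]))            # match the same eigenvalue
    x = X[:, i] / np.linalg.norm(X[:, i])
    y = Y[:, j] / np.linalg.norm(Y[:, j])
    print(lam[i], 1.0 / abs(y @ x))  # large when right and left eigenvectors are nearly orthogonal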
A10   Markov Matrices and Perron-Frobenius

This appendix is about positive matrices (all aᵢⱼ > 0) and nonnegative matrices (all aᵢⱼ ≥ 0). Markov matrices M are important examples, when every column of M adds to 1. Positive numbers adding to 1 make you think immediately of probabilities.

A useful fact about any Markov matrix M: The largest eigenvalue is always λ = 1. We know that every column of M − I adds to zero. So the rows add to the zero row, and M − I is not invertible: λ = 1 is an eigenvalue. Here are two examples:

   A = [ 0.8 0.3 ; 0.2 0.7 ] has eigenvalues 1 and 0.5      B = [ 0 1 ; 1 0 ] has eigenvalues 1 and −1

That matrix A is typical of Markov. The eigenvectors are x₁ = (0.6, 0.4) and x₂ = (1, −1).

   [ 0.8 0.3 ; 0.2 0.7 ] [ 0.6 ; 0.4 ] = [ 0.6 ; 0.4 ]         x₁ is a steady state
   [ 0.8 0.3 ; 0.2 0.7 ] [ 1 ; −1 ] = ½ [ 1 ; −1 ]             x₂ is a "transient" that disappears

Our favorite example is based on rental cars in Chicago and Denver. We start with 100 cars in Chicago and no cars in Denver: y₀ = (100, 0). Every month we multiply the current vector yₙ by A to find yₙ₊₁, the numbers in Chicago and Denver after n + 1 months:

   y₀ = [ 100 ; 0 ]   y₁ = [ 80 ; 20 ]   y₂ = [ 70 ; 30 ]   y₃ = [ 65 ; 35 ]   ...   y_∞ = [ 60 ; 40 ]

That steady state (60, 40) is an eigenvector of A for λ = 1. If we had started with y₀ = (60, 40) then we would have stayed there forever. Starting at (100, 0) we needed to get 40 cars to Denver. You see that number 40 at time zero reduced to 20 at time 1, 10 at time 2, and 5 at time 3. That is the effect of the other eigenvalue λ = ½, dividing its eigenvector by 2 at every step.

This is yₙ = Aⁿy₀ coming from the single step equation yₙ₊₁ = Ayₙ. In matrix notation, Aⁿ approaches rank one!

   Aⁿ = XΛⁿX⁻¹ = X [ 1ⁿ 0 ; 0 (½)ⁿ ] X⁻¹   approaches   A^∞ = [ 0.6 0.6 ; 0.4 0.4 ]

You have now seen a typical Markov matrix with λ_max = 1. Its eigenvector (0.6, 0.4) is the survivor as time goes forward. All smaller eigenvalues have λⁿ → 0. But our second Markov example B has a second eigenvalue λ = −1. Now we don't approach a steady state:

   B = [ 0 1 ; 1 0 ] has eigenvalue λ₁ = 1 with x₁ = [ 1 ; 1 ] and λ₂ = −1 with x₂ = [ 1 ; −1 ]
The zeros in B allow the second eigenvalue λ₂ = −1 to be as large as λ₁ = 1: |λ₂| = 1. All cars switch cities every month. If we start with y₀ = (100, 0), then y₁ = (0, 100) and y₂ = (100, 0): the cars bounce back and forth forever. No steady state, because λ₂ = −1.

Perron proved that a strictly positive matrix (all aᵢⱼ > 0) has a largest eigenvalue λ_max > 0 with a strictly positive eigenvector, and every other eigenvalue has |λᵢ| < λ_max. Frobenius allowed zero entries aᵢⱼ ≥ 0.

Theorem (Perron)   All numbers in Ax = λ_max x are strictly positive.

Proof   The key idea is to look at all numbers t such that Ax ≥ tx for some nonnegative vector x (other than x = 0). We are allowing inequality in Ax ≥ tx in order to have many candidates t. For the largest value t_max (which is attained), we will show that equality holds: Ax = t_max x. Then t_max is our eigenvalue and x is its positive eigenvector—which we now prove.

If Ax ≥ t_max x is not an equality, multiply both sides by A. Because A is strictly positive, that produces a strict inequality A²x > t_max Ax. Therefore the positive vector y = Ax satisfies Ay > t_max y, and t_max could be increased. This contradiction forces the equality Ax = t_max x, and we have an eigenvalue. Its eigenvector x is positive, because every component of Ax on the left side combines the components of x with positive weights aᵢⱼ.

To see that no eigenvalue can be larger than t_max, suppose Az = λz. The eigenvector z may involve negative or complex numbers. Since A > 0 we always have |λ||z| = |Az| ≤ A|z| by the "triangle inequality". This |z| is a nonnegative vector, so |λ| is one of the possible candidates t. Therefore |λ| cannot exceed t_max—which must be λ_max.

Many Markov examples start with zeros in A, so Perron does not apply directly. But if a higher power Aᵏ is strictly positive, Perron applies to Aᵏ and the chain still approaches its one steady-state eigenvector for λ_max = 1. Google's PageRank matrix is built in this spirit: each webpage spreads its probability over its outgoing links so that column sums equal 1, and a final adjustment makes every entry of the Google matrix G strictly positive (Perron again). See Wikipedia and the book by Amy Langville and Carl Meyer (which is quickly found using Google).

Reference
Amy Langville and Carl Meyer, Google's PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press (2011).
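A minimal sketch, assuming NumPy, of the rental-car chain: repeated multiplication by A drives yₙ toward the steady state, which is the eigenvector for λ_max = 1 scaled to 100 cars.

import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])            # columns add to 1: a Markov matrix

y = np.array([100.0, 0.0])            # 100 cars in Chicago, 0 in Denver
for n in range(6):
    print(n, y)
    y = A @ y                         # y_{n+1} = A y_n

evals, evecs = np.linalg.eig(A)
steady = evecs[:, np.argmax(evals)]   # eigenvector for lambda_max = 1
print(100 * steady / steady.sum())    # scaled to 100 cars: [60. 40.]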
Index A Adaptive, 315 Add vectors, 1,2 AlexNet. 288.292,303 All combinations. 6,21 All-ones matrix. 222 AlphaGo, 305 Angle between vectors. 14 Antisymmetric. 69.238.240.251 Area. 187, 189. 196 Area of parallelogram. 188 Arg min, 307 Arrow. 5 Associative law, 30.38 Augmented matrix. 45.81.96 Average. 301,316 Axis, 234 В Back substitution. 41,42 Backpropagation, 286,288, 289, 306,318 Backslash. 97 Base, 189 Basis. 24.31.33.74.107,110.114.117. 118,122.139.193 Basis Pursuit, 284 Baumann, vii. 271.272 Bell-shaped. 267, 324-326 Best line. 155.277 Bidiagonal. 266 Big Formula. 179. 183,184 Big Picture. 124,137. 138,156 Binomial. 297.326.327 BLAS. 337 Block elimination. 70 Block matrix. 56.70.71.343 3bluelbrown.com, ix Bowl. 232.307,309 Box. 189 Breakdown. 43 C Calculus, 155.282,306 Cardinality, 283 Cayley-Hamilton. 226,343 Center point, 162 Centered, 247 Central Limit Theorem. 324. 326, 327 Chain rule, 289,319 Change of basis. 115,197,200 Characteristic polynomial, 205, 343 Chebfun.org, 282 Checkerboard. 132 Chess matrix. 132 China. 39 Cholesky, 231,237 Circle. 210.247 Classification. 291 Clock. 9 Closest line. 153.157.276 Closest point. 143. 147, 149 CNN. 293.299 Code. 172,176.200.263.267 Cofactor, 177,180,186 Coin flip. 321,327,328 COIN/OR. 337 Column picture. 21.44 Column rank. 25 Column rank = row rank, viii, 33,124 Column space, vi, 22,23,27,77,101 Column way, 29 Columns times rows, 34,35,60 Combination of columns. 3,21 Commutative, 30 Companion matrix, 213,251 Complete solution, 97,99,104 Complex matrix. 236.239 Complex number, 228,236 Complex vector. 236 Components. 5 Composition. 286,288,289,296 348
Index Compression by SVD. 269-272 Condition number. 311.315,345 Congruent. 233 Constant coefficient. 243 Convex. 232.235 Convolution. 292,299-301 Comers, 10,187 Coronavirus, xi Correct language, 108 Correlation. 332 Cosine, 13,15,17 Cosine Law, 18 Cost. 57 Counterrevolution, 283 Counting Theorem. 125 Covariance matrix. 258.277,328-332 Cramer's Rule. 185,186,190 Cross-entropy. 291 Cube, 9. 10.191 Cumulative, 323.324 Current, 127 D Damping, 245 Data science, viii, 260,291 Deep learning, 286. 337.356 Dependent, v, 22.40, 159.356 Derivative. 69.194 Determinant. 50.54.177,207.213,225 Diagonal matrix. 20.215,275 Diagonalizable, 217,223,224,235 Diagonalization. 216,343 Diamond. 283,284 Dictionary, xii Difference equation, 247 Difference matrix. 20 Differential equation. 243.257 Dimension, 33,87,107,111.112.121 Distance to singular, 345 Dot product, 1,2,13.21 Dot-product matrix, 157 Double descent. 314.315 Double eigenvalue. 222.343 Drucker, xi, 65 <lung. 260. 349 К 3.87.89.341 275.277 i-o Edge matnx. IWt E<genfaces. 279 Eigenvalue 202-210 Eigenvector. 202.311. 345 bgemector main x, 245 bgenvcctonof .4ТЛ. 261 Eight rules. 79 Elimination, 39.47,95 biminatmo matnx £. 49 Ellipse. 189.200.233.234 Empty basis. 113 Energy, 229.232 Epoch.312 Equation for A. 205 Error. 161 Euler s formula. 127 Even permutation. 64.71.182 Even-odd permutation. 66 Even/odd. 179 Exchange matnx. 30 Existence. 139 Expected value. 321.322 Exponential. 249.343 Exponential volunoo. 244 Expressivity. 288 F Factorial 250 Factorization. 59.89.188. 336.338 Factors are unique. 63 Fast Foaner Transform. 64.66. 337.339 Fast start viii. 313 Fibonacci 183.219.224 Filter. 299.302 Finance. 285 Flag. 270. 273 Flat pieces. 293. 294 Folds. 294.298 Four subspaces. 121.132.133 Four ways to multiply, vii. 35 Fourier matrix. 285. 339
350 Index Fourier series, 168, 282 Fredholm. 140 Free column. 85,86 Free variables. 88,91. 100 Frobenius, 274, 347 Full column rank. 43.98, 109 Full row rank, 99, 100 Function space. 75, 113, 119 Fundamental subspace, 74,121 Fundamental Theorem. 125, 138,262 Fundamental Theorem of Calculus, 195 G Gauss-Jordan, 56,57 Gaussian, 324-326 General solution. 103 Generalization, 287,314 Geometry of SVD, 264 Gershgonn, 210,213 Go. 305 Golub- Kuhan. 266,337 Golub-Van Loan. 266 Google. 204 Google matrix. 347 Gradient. 307, 308 Gradient descent. 235.289, 306,308.309, 312,317,320 Gram-Schmidt, 158, 164, 169-171,175, 176,280,337 Graph. 126 Grayscale, 269. 287 Greeks. 285 Group, 73 Growth factor, 245 H Hadamard, 173,191,222 Heat equation, 252 Height. 189 Heisenberg. 209.214 Hermitian matrix, 236 Hessian matrix. 306 Hidden layer. 287,296. 304 Hilbert matrix. 271.296 Homogeneous, 195 House, 198, 199 Householder, 172,281,337 Hypercube, 191 Hyperplane, 294,295 I Identity matrix, 20,30 111 conditioned, 345 Image recognition, 269,299, 302 Imaginary eigenvalue, 207 Incidence matrix, 126 Independence. 22. 107, 108,217,356 Independent columns, 1.31 Independent variables, 329 Inequality, 15 Infinitely many solutions, 40 Initial value, 243 Inner product, 34.68.282 Integral, 195 Integration by parts, 69 Inverse matrix, 50,180 Inverse of AD, 51 Inverse of E, 53 Invertible matrix, 55,93,138 J Jacobian matrix, 319 Jordan block. 343 Jordan fonn, 219,339,343 Julia, 172,236,337 К Kaczmarz, 316,317,319 Kirchhoff's Current Law. 93, 127 L Lagrange multiplier, 266,285 Language. 112 LAPACK. 337 LASSO. 284 Law of Inertia, 233 Law of Large Numbers, 321 Layer, 289 Leapfrog, 248 Learning from data, viii, ix, 356 Learning function, 289,293,301 Learning rate, 306
Index Least squares. 153.154.156, pi _ 291,315 -«'1.276. Left eigenvector, 226.335 Left inverse, 94,133 Left nullspace, 121.123,125 Length, 11, 167,282 Length ||v||, 11 Line, 31,99 Linear combination, 1,3 Linear dependence, 109 Linear in each row, 179 Linear pieces, 294-296 Linear transformation, 192,194 |y? Linearly independent, 107 174 ’ ’ ” Loop. 127,172 Loss function, 291,301,312 LU,58-60 351 N Ne8«4ve definite 251 n«tlib, 58 Neural net, 235.291 № P'vot, 43 Noi«. 153,163 Normalizable, 221 Nenn. 260.274.275,282,284 345 »rma 1 distribution.323-325 333 NoT e<,Ui"iOn' I48J«.I56 "ormal matrix, 326,339 Nullspace, 83,123 Nutshell. 356 0 M Machine learning, viii, 235,291 Magic factorization, 89,90 Magnitude, 248 Marginal, 329 Markov equation, 253 Markov matrix. 204,212,218.346 MATLAB. 19.45.159,172.236 237 300,337 Matrix, 20 Matrix exponential, 243,249,255 Matrix multiplication. 2,3,29,33 Matrix space, 77,113 Max-pooling, 292 Mean, 163,277,321,322,324.326 Median, 161 Minibntch. 312 Minimum, 232 Minimum norm solution, 159,3)5 Mixing matrix, 90,95,342 Modified Gram-Schmidt, 172 Momentum, 310 Multidimensional. 344 Multiplication, 30,35 Multiplicity, 221 Multiplier, 42,49, 52 Multivariable, 306,319 oew-mit.edu. ix Origami, 293 Orthogonal complement. 137.138 Orthogonal eigenvector. 227 Orthogonal matrix. 134.166. 208, 280 Orthogonal subspaces, 135.136 Orthogonal vectors. 134.285 Orthogonality. 258.280,282.336 Orthonormal. 165 Outer product. 34,68 Overdamping, 257 Overdetermined, 153 Overfilling, 3)4 P PageRank, 347 Parabola, 160 Paradox, 239 Parallel plane. 92 Parallelogram. 4.187 Parameters, 336 Partial pivoting. 66 Particular solution, 97,99,100 PCA. 274,276,278 Penalty. 283 Permutation, 64.166,178,285 Perpendicular, see Orthogonal Perron. 347
352 Piecewise, 296,297 Piecewise linear. 2X7 Pivot. 41-43. .156 Pivot column, Кб. X7 Pivot variable*. XX Pixel, 2X7 Plane, v, 6,7 playground.tensorflow.org. viii, 293. 29X Polar decompusilHM, 2X0. 339 Positive definite, 227-242 Positive matrix, 346 Positive pivot», 230 Positive semidefinite, 22X Principal axis theorem. 234 Principal Component. 274 Probability. 36. 321, 323. 330.333, 344 Probability density. 324 Probability matrix. 329 Product ADC, 318 Product of pivot*. 1X4 Projection, 133, 134. 143.147,156 Projection matrix. 143.146, 147,151, 152.204.212 Proof. 58.60.90 Properties of determinant*. 197 Pscudoinvene. 133,138,159, 195.280. 339.342 Pythagoras. 11.18 Python. 172,300.337 Q qr. tee Gram-Schmidt Quadratic. 310 Quantum mechanics. 209 Quarter-plane. 77 R Ramp function. 290 Random. 316,317 Random sampling. 266 Rank. 24.33.100,105.122.334 Rank 1 matrix. 25. 31, 264 Rank к approximation. 260 Rank r, 25.33. 122.336 Rank of [ A b ],90 Indci Rank of A1 A " rank ofA, 334 Rank of AD. 334 Rank one. 25,31,335.344 Rank two. 128. 131 Ratio of determinants. 184. 230 Rayleigh quotient. 265, 268 Real eigenvalue, 227, 228 Real part. 248 Reduced SVD. 260 Reflection matrix. 166. 167,205, 281 Regression. 153, 156, 276.291 Relative error. 345 ReLU. viii. 286.289.297.356 Repeated eigenvalue, 216. 221 Repeated root. 252, 254, 343 Residual nets. 304 Reverse identity, 181 Reverse order, 45. 51 Revolution. 280 Right inverse, 94,133 Rotation. 166, 207 Row echelon form, 85, 87,94. 341 Row exchange. 61,65.66 Row picture. 21.29.44 Row rank. 25 Row space. 24. 78. 356 Row-and-column reduced, 94 Runge-Kutta. 257 S Saddle-point matrix. 70. 240 Sample value, 321 Sample variance. 322 Schur's Theorem. 235.339 Schwarz inequality. 15. 18.214.282 Scree plot. 278 Second derivative. 232 Semi -convergence, 313 Semidefinite. 230,232,331 Sensitivity, 345 Shift matrix. 299 Shift-invariance, 299.302 Shifted QR. 266,337 Short wide matrix, 88 Shortest solution. 159.315
353 lode* Similar matnx. 218.219.225, М3 Singular matnx. 46.48. 80 Singular value. 189.271 Singular vector. 262.276 Skew-symmetric. 182.208.231 255 Slider. 272 Sobel operator. УГ2 Softmax. 304 Solvable. 77 Space of matrices. 75,119 Span. 78. 110 Spanning tree. 222 Sparse. 283.284 Special volution. 84. 86,89 Spectral norm. 274 Spectral Theorem. 227,235 Spiral. 247.298 Stability, 249 Standard basis. 110 Standard deviation. 322.325 Standardized. 326 Statistics, 163 Steady state, 204.253.346 Steepest descent. 308,311 Stepsize, 306 Stiffness matrix. 248 Stochastic. 289.312.320 Straight line fit. 153.155.157 Stride. 303 Submatrix. 138 Subspace, vi. 24.75.76.80 Sum matrix. 28 Sum of matrices. 229.258.263 Sum of squares. 234.237 SVD. 272 Symmetric matrix. 20.69.72,336 T Taylor series. 306 Tensor. 344 Theorem. 90.347.356 Tilted box. 177 Tocplitz matnx. 299. 302 Tonga, 270 T'XaJ varunce 278 Tunmead. 27| Trace. 207.225.268.275 Tr^rung dau. 2*6 Training Kt. 3|2 Transformation. 192 Transpose. 67.184 TranspuK uf ЛВ, 67 Transput of d/dt. 69 Tree. 127 Triangle. 10.187 Triangle inequality, 15. 275.282 Triangular reams. 20.42. 59. 189. 336 Tndiagonal. 183.266 T*o equal runs. 181 U Unbiased. 163 Undcrdamping. 257 Underdetermincd. 100 Uniform distribution. 323 Uniqueness. 139 Unit circle. 13 Unit vector. 12 Unitary matrix. 236 Upper triangular. 4|. 45.171 V Vandermonde. 183 Variance. 163.277.292.321-326.333 Vector, v. 4.75 Vector addition. 2 Vector space. 74.75.79 Victory. 280 Volume. 177.187. 189 W Wave equation. 252 Wavelets. 285 Weights, viii. 286.299 Window. 300. 301 z Zero mean. 277 Zero sector. 76.79 Zig-zag. 309.310
> I I •I Index of Symbols (ЛЛТ)Л Л(ЛТЛ). 261 (ЛН)С ж Л (НС). 30. 38 (ЛА)я - Л(Н®). 33 (ЛН)Т - ИТЛ ’.67 (ЛЯ)-’ = Д-'Л-'.З! Л’1.50.57 (Лх)1 у ®т(Лту).68 (Л-1)1'- (Л1) '.67 (Л-')м " С>(/<1нЛ. 1X0. 1X6 Л H.III '.339 Л СЛ/И’. 90.95.338,343 Л - СП. 32. Х5. ЗЗХ. 340 Л СП' '/Г. 89.90 Л I.DU. 63 Л = LU. 58-60. ЗЗХ Л С Н. 171,2X1.338 Л QS. 2X0 Л - НЕУТ. 259.339 Л - (ЛЕгЦТ. 260 Л = АЛА-1.215,338 ЛН. 29.34.35 АН and НА. 225.262 А НС. 3IX AVr = t/r£r. 260 Лх Ь. 27.40 Ах = Аят. 202 Ли. 3 Ли а <ти, 259 Л1 Л and ЛЛ1,262 Л' Л® = ЛТЬ. 147.148 Л+. 133.2X0 Л* = АЛ*№|,216 АН = O|6J + • • + a„b*n, 34 СР Decomposition, 344 / НЛх-ЬН*. 154. 155 £-• = /,,52.53 H.j.49 Р=Л(ЛТЛ)-'ЛТ. 147 P = QQr, 176 РА - LU. 65. ЗЗХ Р2 - Р. 146 QR, 171,2X0.338 QrQ - 165 (/' ’> '.166 |(?,Н]-Чг(Л).176 Я, 33 п = [ / F ) Р. XX. 89. 340 Но. 85 О /?<,а = d. 96 S = ЛТЛ. 231,265 Я - СТС, 338 ,S’ = LDLT, 231.237 ].89 S - QAQ'r. vii. 227.237.338 .S’T - S. 69 T(ev + dw) = cT(u) + dT(w). 192 T^d/dx. 194 Т0ь344 VW UT. 280 X-'AX = Л. 215 u(t) = e^'x, 244 F(x,v), viii. 289 Sx.141 354
Index of Symbol» 355 <s [°] ж Id' ° P s Ax. 147 « = QQrb, 167 ufc = Л*м<>. 221 ж+ = A+b. 159 x » Xf, + xn. 97 х = Л\Ь. 45,97 x ♦ v. 300 x'rSx/xrx, 265 J = fl“*Q'rb. 171 STU. 236 I/1'335 U = W/|H|.I2.28I v + w, 2,5 v-w/i|v||||w|| = «*0.15 v • w « v'w, 2,12 |v-w| < Ijyll l|w||. 15 ця||ав®+в.236 ||x||b2«3 ||v +w|| < |MI + I|w||. 15 ||v||3 II cv + dw, 3,7 det. AB (<lct Л) (det В), I«4.197 dotP" lor-1.179 <1(А(Л- A/) = 0,203 det.^T) = dct(4). 184 е!В(Л). 206 Л* norm. 283,284 ^+/‘.285 ANlogaJV,66 ReLU. viii. 286.289.290.297.356 null(X). |42 пеКЛ). 35.34) С(Л). 23 С(ЛТ),78 N(4), 83 N(4T), 121 N(4T)±C(4). 136 N(4)1C(4T). 136 N(4'4) = NM), 141.149.152 V1,137 cond = <T| /a„, 345 V +W. 120 VnW.82.120 VUW, 120 Z. 113 (Л b).81.96 <r* = 261 |Л® ЛИ Лх].29 Капк(ЛЙ) < гапк(Л), 129.334 Rank( АВ) < rank(B). 129.334 | n3 multiplication», 58 |A|< fft.264 ЦЛ - B||. 260.274 1И®||/||«||. 265 ЦЛЦ.274.345 Il4“4i 345 HQxll = ll«l|. 167.280 ел‘,249 е4сВ^гМ. 255 « XeA* №‘. 250 mnp steps. 34 LAPACK. 172 randn. 19.267
Six Great Theorems / Linear Algebra in a Nutshell

Six Great Theorems of Linear Algebra

   Dimension Theorem    All bases for a vector space have the same number of vectors.
   Counting Theorem     Dimension of column space + dimension of nullspace = number of columns.
   Rank Theorem         Dimension of column space = dimension of row space. This is the rank.
   Fundamental Theorem  The row space and nullspace of A are orthogonal complements in Rⁿ.
   SVD                  There are orthonormal bases (v's and u's for the row and column spaces) so that Avᵢ = σᵢuᵢ.
   Spectral Theorem     If Aᵀ = A there are orthonormal q's so that Aqᵢ = λᵢqᵢ and A = QΛQᵀ.

Linear Algebra in a Nutshell   ((The matrix A is n by n))

   Nonsingular                               |  Singular
   A is invertible                           |  A is not invertible
   The columns are independent               |  The columns are dependent
   The rows are independent                  |  The rows are dependent
   The determinant is not zero               |  The determinant is zero
   Ax = 0 has one solution x = 0             |  Ax = 0 has infinitely many solutions
   Ax = b has one solution x = A⁻¹b          |  Ax = b has no solution or infinitely many
   A has n (nonzero) pivots                  |  A has r < n pivots
   A has full rank r = n                     |  A has rank r < n
   The reduced row echelon form is R = I     |  R has at least one zero row
   The column space is all of Rⁿ             |  The column space has dimension r < n
   The row space is all of Rⁿ                |  The row space has dimension r < n
   All eigenvalues are nonzero               |  Zero is an eigenvalue of A
   AᵀA is symmetric positive definite        |  AᵀA is only semidefinite
   A has n (positive) singular values        |  A has r < n singular values

Linear Algebra and Learning from Data   See math.mit.edu/learningfromdata

This is the new textbook for the applied linear algebra course 18.065 at MIT. It starts with the basic factorizations of a matrix:

   A = CR     A = LU     A = QR     A = XΛX⁻¹     S = QΛQᵀ     A = UΣVᵀ

The goal of deep learning is to find patterns in the training data. Matrix multiplication is interwoven with the nonlinear ramp function ReLU(x) = max(0, x). The result is a learning function that can interpret new data. The textbook explains how and why this succeeds—even in the classroom. Linear algebra and student projects are the keys.