Text
                    1 ‘ Н «1
LINEAR ALGEBRA
L’1 ? fforu-[: ’]
Everyone
Л= » o|
GILBERT STRANG


LINEAR ALGEBRA FOR EVERYONE GILBERT STRANG Massachusetts Institute of Technology WELLESLEY-CAMBRIDGE PRESS Box 812060 Wellesley MA 02482
Copynght 02020 by Gilbert Sum* ISBN Г78-1-733М**-3"® be reproduced or stored or transmitted AU righureaerved Nopejof^nra ^„„«0 from by my tneam. 1,к11мЬп* «у language >t strictly prohibited — Wellesley - Cambridge Pres. . „blisber. author red traasbbom ле arranged by , HTfcX lypewtung by AiMey 987854321 QAI84 2 5773 21ИО|ООС512/3-<1с23 Other teat, from WeUesley • Cambridge Press Linear Algebra ami Learmng from IMU. 2019. Gilbert Strang Introduction to Linear Algebra. Sth ErL. 2016. G.lbet Strang Wavelets and Filar Валка. Gilbert Strang and Truong Nguyen Introduction to Applied Mathematics Gilbert Strang Calculus Third Edition. Gilbert Strang ISBN 978-O-692I963-8-0 ISBN 978-0-9802327-7-6 ISBN 978-0-9614088-1-7 ISBN 978-0-9614088-7-9 ISBN 978-0-9614088-0-0 ISBN 978-0-9802327-5.2 Чц.-.,!,». Г«е i:i-u..l Pooimniny. Kai Bone ft Gilbert Strang ISBN 978-0-9802327.3.8 Essays Й1 linear Algebra, Gilbert Strang ISBN 978-0-9802327-6-9 DUTervaUal Fquatiom and linear Algebra. Gilbert Strang ISBN 978-0-9802327-9-0 An Analysis of the Finite Element Method. 2008 edition. Gilbert Strang and George Fix ISBN 978-0-9802327-0-7 Wellesley Cambridge Free. Bos 812010. Wellntey MA 02482 USA WWWwelledeytainbridgr.com I. AFFxreryoneW gmail.com Gilbert Strang « page math.niit.edu/ - gs Fororden math.mit.edu/wTborder.php Outude USACanada: w ww.cambridge.org Select books. India: www.w ellrdcy publishers .com The website for this book (with Solution Manual) is math.mit.edu/evcryone 2019 book Linear Algebra and I-earning from Data (math.mit.edu/leamingfnimdiita) 2016 book Introduction to Linear Algebra. Sth Edition I math.mit.edu/llnearalgchra 1 2014 book Differential Equations and linear Algebra (mathmitcdu/dda) Linear Algebra is included in МП". OpenCouneWare ocw.mit.edu/courvrUmatheinatlcs Those videos (including 18 06SC and 18 065) are aim on www.youtube.com/mitoc w 1H 06 Linear Algebra 18 06SC with problem solution. 18.065 Learning from Data MATLAB* is a repuered trademark of The Math Works. Inc The cover design was created by Gad Corbett and Lot. Seilers lseHcrMiesign.com
Table of Contents Preface T 1 Vectors and Matrices I l.t Linear Combination* of Vector*..................................... 2 1.2 Length* and Angie* from Dot Product* ........................... 11 IJ Matrices and Column Space* ..................................... 20 1.4 Matnx Multiplication and Л m CJ2 ................................ 29 2 Solving Linear Equations Ax = b 39 2.1 The Idea of Elimination........................................... 40 2.2 Elimination Matnce* and Inverse Maine** ......................... 49 2.3 Matri* Computation* and A m LU.................................... 57 2.4 Permutation* and Transpose* ....................................... M 3 The Four Fundamental Subspaces 74 3.1 Vector Space* and Subapace* ...................................... 75 3.2 The Nullapac* of A: Solving Ax - 0................................ 83 3.3 The Complete Solution to Ax - 6................................... 96 3.4 Independence. Baai*. and Dtmenuoo ............................... 107 3.5 Dimension* of the Four Sdbtpacc*................................. 121 4 Orthogonality 154 4.1 Orthogonality of the Four Subspace* .............................. 13$ 4.2 Projection* onto Subspace*...................................... 143 43 Least Square* Approximations....................................... 153 4.4 Orthogonal Matrices and Gram-Schmidt.............................. 165 5 Determinants and Linear Transformations 177 5.1 3 by 3 Determinant*.......................................... 173 5.2 Propertie* and Application* of Determinants..................... 184 5.3 Linear Transformations........................................ 192 6 Eigenvalues and Eigenvectors 201 6.1 Introduction to Eigenvalue* .................................... 202 6.2 Diagonaliting a Matnx........................................... 215 6.3 Symmetric Positive Definite Matrices............................ 227 6.4 System* of Differential Equations............................... 243 iii
Table of Conient^ 8 7.1 Sm^ValursaodSm^**»' _ 7.1 72 Compre»"* l®<e* 7.4 Learning from Data IJ Muuminn|bn»by<Jf*1,CT*Dofe*“ 158 259 269 274 280 286 289 299 306 321 Appendix 1 The Ranks of A В and A + В 334 Appendix 2 Eigenvalues and Singular Values: Rank Ono 335 Appendix 3 Counting Parameters in the Basic Factorizations 336 Appendix 4 Codes and Algorithms lor Numerical Linear Algebra 337 Appendix 5 Matrix Factorizations 338 Appendix 6 The Column-Row Factorization of a Matrix 340 Appendix 7 The Jordan Form of a Square Matrix 343 Appendix 8 Tensors 344 Appendix 9 The Condition Number 345 Appendix 10 Markov Matrices and Perron-Frobenius 346 Index of Symbols Six Great Theorems Linear Algebra In a Nutshell 356
Preface This i* a linear algebra textbook with a new Mart Chapter 1 begin* a* usual with vector*. We rec (heir linear combination* and dot product*. Then the new idea* come with matrices. Let me illustrate those idea* right away by an «ample Suppose we are given* 3by 3 mam* A with column* at.*, *!; There column* are threedimensional vecton The tint vector* nt and a* connect the center point (0.0,0) io the pouu* (1,3.4) and (2,4,2). The picture show* those point* In 3-dimen>*onal apace (apt space) The key to this num* u the third vector a* going to the point (3,7,6). When I look at thow vector*. I tee something exceptional Adding column* 1 and 2 produce* column 3 In other word* O| + o> « a*. In а 3 dimensional picture, a, and a, go from the center point (0,0,0) Io the point* (l.3,4)and(2.4.2). The picture shows how to add those vector*. It i* normal that all crwnbtaatxwi* of two vector* will fill up a plane (The plane ia actually uifiruse. we juu drew the part between the vector* I What I* really exceptional i* that the third point o* =(3.7. •) lie* on thi* planr of a, and 03. Mom points don't lie on that plane Mint vector* a* are nor combination* of O| and Oj. Mont 3 by 3 matrsces have independent columns Then the malm will be invertible. But there three column* are dependent because they lie on the «ante plane: o> + <*i - o> at» (1.3.4) (3,7.0) -oi+ u3 Three vectors sharing a plane inside 3-dimensional space оз = (2.4.3) (0,0.0)
1Jnear Algebra for Everyone vi fact about th°« ,hreC veC,OTS a‘ °2,a3. TKu pawre raveab Ле ««> ™a Ю их Ле >dca. and to get better ButweneedthengbtlangiupeloJocn ‘ djrection and better at expresung it Here are three sleps tn ago» of column 1 and column 2 Idea in words Idea In symbols 2 3 Matri, times vector Step 3 shows how a mana C mntapbes a rector x. The columns a, and <4 in C multiply the numbers a, - 1 and и lax Пе output C* U в combination of the columns Here that аЛшш cooteaaboa Bill + ®j One more creaal sup allows several combustions al once This is the way forward. We cannot take this step with paeans or worts A matrix multiplies a matrix 4 Mitrn time, main, A • CR a «1 “1 10 1 Oil Column 1: lai + Ota, a( Column J; 0о> ♦ log - e> Colman 3 le, * lo, . (l,g,4) + (3,4.3) - (З.Т.в) > а. That matns nwikspbcatioo A CR display, mayor information; A has dependent columns the amtnaahoo of column, 1 + 2-3 gives (0,0,0) 4 also has dependent row, none conbaatsoo of its rows wiQ give (0.0,0) I The foi""4>«er_ofth«s A wooly «plane and not the whole 3D space The -row tpotr- of this л aho a plane and not the whole 3D space The «pare mama A haanotorem It, drrermtnaat n aero. It is unuwml. In Chapter, I IO 4. the orgmu». idem — -------' — - Ktor 'Paras The columns of A arc in «n-dmieneoMj sp«x R- But the action is in b mtrXT X”* lPOCe “d **“ a W Лх"“" " • row space pan and a nullspace part: f fmeor ,,Ujmwi a* = b ___
Preface **' The Plan for the Book Thai example is pan of a new start 1 believe it is a better start (for the reader and the course). By working with specific matrices to introduce the algebra, the subject unfolds. Chapter 1 develops the maths equation A “ C times Я C lakes independent vectors like в! and a? from the column space of А Я takes independent vectors from the row space of A. Those two “vector spaces" an at the center of linear algebra. We meet them properly in Chapter 3. But you will know about independence from examples in Chapter 1. Die big sup is io factor A uuo C wrus R Matrix muhipbcauoa is a crucial operation, and Chapter I ends with four different ways to do rt—seen on the back cover of the book This sets the path to all the great factorizalioas of linear algebra A « LU Chapter 2 solves n equations Ax » b in n unknowns: A is square A = CR Chapter 3 reduces to r independent columns and r independent rows A — QR Chapter I change* the columns of A into perpendicular columns of Q 3 “ QAQr Chapter 6 has eigenvectors in Q The eigenvalues of S are in A A > UE VT Chapter 7 has singular vectors in I/ and V and singular values in E The column* of A an tn rn dimensional space R"*. the rows are in R' The m by n matrix multiplies vecton x in R“ to produce Ar а в' But the real action of A Is seen in the four fundamental subspaces Chapter 2 only allows one solution x„ The matrix A is square and invertible Chapter 3 find* every solution to Аж - b by adding every Chapter 4 deals with equa- bom that don't have a solution (because b has a piece from the mysterious fourth subspace) 1 hope you will like the “big picture of linear algebra" on page 124 all four nibspace* Those hve factorization* are a perfect way to organize and remember linear algebra The eigenvalues in the matrix A and the singular values in E come from S and A tn a beautiful way—but not a simple way. Those numbers in 5x - Xx and Ao ж tru see deeply into the symmetric matrix S and the m by n matrix A. Often 5 appears in engineer- ing and physics Often A is a matrix of data. And data is now coming from everywhere. Please don't miss Tim Baumann's page 272 on compressing photographs by the SVD Chapter 5 explains the amazing formulas produced by determinants Amazing but unfortunately difficult to compute' We solve equations Ax = b before (not after) we find the determinant of A Those equations ask a* Io produce b from the columns of A. Ax > s, (column 1) + Жа (column 2) + •• • + r, column я) - right aide b In principle, determinants lead to eigenvalues (Chapter в) and singular values (Chapter 7). In practice, we look for other ways to find and use those important numbers And yet determinants tell us about geometry too—like the volume of a tilted box in n dimensions. A short course can go directly from dimensions in 35 to eigenvalues for 2 x 2 matrices.
Lmear Ale»*™ ,or Everyone viii № added Chapter 8 oa Final Chapter: Learning from Dau ^ofte»coa«'«»re^ul"n“‘IW* number n is large, and the number of . coclied linear algebra has to find out underfunding that leads to a decision, functioo of the input. Deep learning In machine learning the output в a produces (often with giant compuU- mrm to find that furutioo from the ««Й» fel|urcs of Hons) a learning function Ffa,»)^ «*“*, * ц* waghtt assigned to those the rapou and the weight*. (*> = ^^O^y.hrutapfitaatoa. Optimizing the weights so learn from die trammg ttata v .. the btg computation. When that u well done. « ca. mp- new sampio that the lystem lus never Seen The BMxesa of deep learning ta that » <Л» ck* ю correcl OUW The system Ьм identified aa uwsy or translated a sentence, or chosen a winning move. This ujsivcrwurtowar algefvu. a mutate of weight matrices and ReLLl It is included with no expectatson of testing stoderes Thn is a chapter to use late, in any way you want Yo« could experiment with the webene plas ground.lrmorflow.org Machine learning has become important and powerful based on linear algebra and calculus i optimizing the weights) and on suusnci (controlling the mean and variance). Snr « u wor nrpmnrd or rvrn apetled to be pan of Ле roans Whai I hope is lhai the faster start ailoen you to reach ngemotors and smgafar Mines—chose are true highlights of this subject. Chapters 2 m 7 roeifinn the jump of intuition near the end of Chapter 1; If all columns ot A Ise oo an r~dimems<»al plane, then all rows of A also lie on a (usually different) r-dunemamal plane That fact has far-reaching consequences Thu й a textbook for a normal linear algebra dasa—to explain the key ideas of this beautiful subject to mryvrw As deasly as I can Thank you. Gilbert Strang **4 “Wk That 3 by 3 matrix I*™ “! •X «cpcnuem ПЖ1 ine numbers to show this are -S and 3; “5 (row 1) + 3 (row 2) » row 3 of A -S(i.a.») + 3(S,4,7)e(4.2,e).
Preface lx Websites for Linear Algebra The dedicated website foe this textbook is malhmiLcdu.rirnooe Seven! icy sections of the book can be downloaded. That sue also has bncf solutions So the problem sets (For homework , the instructor may ask for more detailed solutions.) Every class will find a balance between learning the essential ideas of this subject and practicing the steps on small matrices. The most beautiful website for linear algebra u 3Bluc I Brownxum. created by Grant Sanderson. He chose that name baaed on an unusual genetic feature of his eyes. For blackboard lectures you could go to the OpenCourse Ware site ocw.mll.edu created by МГГ The videos for Math 18.06 and Math 18 06SC and Math 18.065 have had millions of viewers, very often accessed through YouTube 18.08 is MITs large linear algebra course The 18.06SC videos include problem solutions by the instructors 18.065 is the newer course that leads to Chapter 8 of this book and to math.mil.rduAearnlngf rumdata Important to add The "new sasrT in this book was hrM tested ia 18 065—which begins with a substantial review of linear algebra. Two more online materials were added in 2020 to the 18.08 she: A 70- minute video outlines the new sun tn teaching and learning linear algebra That "2020 vision" has guided Secoons I 3 and 1.4 of this new book I am convinced tliat working with the columns of actual matnees is a direct route Io understanding linear independence and column spaces and mama mstiliplicabon The column-row factorualion A w CR is al the heart of solving linear equations Лг 6 Professor Raj Rao has developed a very successful course on computational linear algebra srul machine teaming at the Umvemry of Michigan The key idea is to im- plement (in class 1) what you learn The website mynerva.io describes the course and the online textbook and future plans. That textbook is complementary Io this one. Property used, the Web has become truly valuable Io students and all readers. Il pro- vides a different way Io explain linear algebra, and k is alive' Please use the video lectures to see the flow of the course And please use the book to capture that flow and hold it and practice it and understand it. This book begins with independent vectors from the column space and row space: A = CR The book ends with orrAogcmof vectors from those spaces. the u's and v's in the SVD with Av w ma. In between this we have the central ideas of linear algebra.
X Computational Linear Algebra nuly Linear algebra is often the key to .^ment and finance. This page aims to We need to computations in engineering and science r_toraputers and math provide an updase O<1 hard power and son pm«r^ ^^^ws The leaders are Hie 500 fastest computers m the !a(j „ can give their locations: Kobe, mostly paid foe and controlled by *°^ПГОС xlI -^p includes Italy. Switzerland, Oak Ridge, Lnermore. *uxi. Russia. Spain, Saudi Arabia, Germany, France, Korea. , д ,s in then speed and special processors. Finland. Norway, and BrwL The scientific interest is “ u^jy.askwl.qoesllo|B Agood source; big matrix into L time, The bench. тИ1Ги1и) and then solving a system of linear equations Ax = b XT^aH^ns in the High Performance UNPACK benchmark. Those prob- lems are a the Mart of Chapters 2 »d3. no.speeded up by parallel processing. The top machine aduesrn 415 petaflops = 415 tiroes 10" double precision floating point operations per second. This is with extremely careful coding for special hardware. The important point is that ordinary composers have also seen a tremendous increase in Numerical Linear Algebra This is the subject at major research: fast algorithms for matrix computations, The first was thousands at yean ago with “elimination". For that idea and any new one. part of the test is ю count the steps: in this case n’/3 for n Hnear equations with n unknowns To compute eigenvalues and now singular values, big progress brought that also to cn1. Favorite textbooks among many good ones are * L. N Trefethen and David Ban. Numerical Linear Algebra, SIAM (1997) • Gene Golub and Charles van Loan. Matrix Compulations, fohns Hopkins (2013). For the mathematics of linear algebra (not focused on compulation) we mention • Roger Horn and Charles Johnson. Matrix Analysis. Cambridge (2013) • R. Bhatia. Matrix Analysis. Springer and Posttrie Defin.tr Matrices, Princeton Randomized Numerical Linear Algebra We can hardly fell u, about the ТЫ, Й f"*"'”' Randonl “mPles —^srewre-
Preface XI Gratitude for Help This book was written during the months of lockdown for the corona* irus Life was limited but time for writing was nearly unlimited. Difficuit for oar society but perfect far an author. The idea for a new and more active start had just gone into a new video for Math 18.06 on OpenCourscWare. (The matnx multiplication A = CR in Section 1.4 is pari of the idea, with independent columns of A going into C.) Developing that idea into this textbook has been exciting every morning. The time was right but help was needed. It came in the best possible way My good friend Ashley C. Fernandes in Mumbai received handwritten pages every day. Then he returned IfTgX pages overnight Those pages went back and forth many times Working with Ashley has made Linear Algebra for Everyone possible; this is our eighth book. I am truly grateful for these happy months. Another good fortune has been help from Daniel Drucker. He is the most careful reader I know. Let me leave you to decide on this one; To be really ркку. in the Preface you say that the three columns lie on the same plane, bat in the figure you say that the three vectors are in a plane " 1 won't do that again. It made my day when Dan liked the small matrices on the front cover. The goals of the text are clarity and simplicity; The basic ideas of matrix multiplication evolve step by step in Chapter 1. Columns of CR are combinations of columns of C There are four different ways to multiply AB (see rhe hack cover) The key property is AB times C = A times BC This is the tenth cover created by two artists: Gail Corbett and Lois Sellers. You might have seen the rectangles for the four subspaces on Introduction to Linear Algebra. Before that came Essays on Linear Algebra with Alberto Giacomettis “Walking Man“ and Calculus with a famous curve painted by Jasper Johns. Perhaps the most beautiful was the photograph on the finite element book. These are very happy memories for an author The whole idea of helping students is beautiful. My greatest gratitude is to ary rrife JUL and our seat David and John and Robert This booh is dedicated to them.
Line» Algebra for Everyone Dictionary of Matrices xH A good way to left you Thu wide variety of шайке» ias a K> name the matrices you will me;, Identity mama I Column haul C Row bests R Rank 1 лшпж uv Chapur2 Я.т1гмпод num» E Lower triangular L Upper triangular U Chapter 3 Echelon matrix Я Free matrix F Special solutions 5 Mixing matrix M « W~} Tniupote num» A* Incidence matrix A Ратлявоа P laplacian matrix L Fourier matrix F Pscudomverse A* Chapter 4 (Mnfonal matrix Q fnpaot matnx P Least txpiam ЛТЛ Chapterh Cofactor matnx C Change of hast» Я TihedboxE Upper triangular R Ноше matnx H Reflection matnx H Chapter в Symmetric matrix S Eigenvector» X Eigenvalue» A Fibonacci matrix F Jordan matnx J SttMarmoanBAB' Exponential ел' Chapter 1 Singular value» £ Left ungular vecton U ^‘(htuntulttnaonV Compreaaed matrix Л» S®np*e<»vanao«AAT Hi»>en matrix Я Chapter П Weight matrix A Convolution C Jacobian matrix J Hessian matrix H Covanance matnx V Shift matrix S
1 Vectors and Matrices 1.1 Linear Combinations of Vectors 1.2 Lengths and Angles from Dot Products 13 Matrices and Column Spaces 1.4 Matrix Multiplication and A — CR The heart of linear algebra u in two operation*—both with vccux* We add vector* to get V + w. We multiply them by numbers c and d to pct cv and dw Combining those two operation* (adding cv Io dw) give* the liaeur combtaonoa cv + dw "♦*•[! I Linear combination* are all-important m thn subject' Sometime* we want one partic- ular combination, the specific choice c “ 1 and d “ 1 that produce* cv + dw “ (4.5). Other time* we want ail Йе combination* of V and w Combination* that produce the tcro vector have special importance Of count Ov + Ow i* always the tcro vector The vector* cv lie along a line WTicn w n not on that line, the combination* eV ♦ dw HU a complete two-dimensional plane Starting from three vector* u.v.w in three- dimensional ipace. their combtnaOom r*t + do + ew art likely Io Hll the whole apace— but not aiway* The vector* and their combtnatwnt could lie la a plane or on a line Thi* ia a key problem describe all combination* of n given vector*. Next «ер: Put two vector* into the column* at a mam* A or B. Then multiplying a matrix by a vector z exactly produce* a linear combmatiori of the column* Again thou combination* Ax fill a plane The output* from Bz only fill a line. The first example had "independent columns" The second example ha* “dependent columns" Chapter 1 explain* these central idea», on which everything build* Linear algebra move* from 2 column* in 2 dimension* to n column* in m dimension*. Your mental picture stay* correct- and we end by multiplying matricn 1.1 Vri’torodditton v 4-wandfiacurcoatbiaaetoa* cv 4-dw. 1.2 The dot product r w of two vectors end the length |,v| = y/v v IJ Matrix A timet vector z is a euatMtoltoa of the colnmm of A 1-4 Matrix A timet matrix В в | ЛЬ] • • • ЛЬ»]. Mirlnph A timet each column of В 1
Chapter I Vectors and Mairice, 2 1.1 Linear Combinations of Vectors Z----------———~T + of the vectors» w > s' 1 i ic a сзвввовлл j я... ’»‘°H 5] i ; ] .<[; | »«»'<*" ; ]. till a plane la тух space Same plane for Arithmetic starts with numbers We operateoa those numben in two essential ways: Addition 2 + 1-5 Multiplication (4) (5)-20 Subtracting 3 is just the inverse of adding 3 Subtract S - 3 lo recover 2. Dividing by 5 is ум the inverse of ndnptysag by 5 Divide 20 by 5 Io recover 4. Combining addition and taakepbeanoa leads lo (2) (3 ♦ 4) (2) (3) -I- (2) (4). Linear algebra moves addition and nailnpbcatioa into higher dimensions Instead of working with angle numbers, we wort with ««cion The vector о - (3,1,7) is a string of three numbers к a a *l-dime<Moaal vector" The good way is lo write о as a column vector Thea we can add two column vectors v and w: I add each pair of components) Subtracting sc is ум the inverse of adding Sa. so dial v + SO - w recovers V Vector wblrwtiim (.4-M) - w - । J j - 5 2 (») (4) ♦ (1) (S) + (T) (2) - 12 + 5 + M . 3 j Ik>t product = 31 (I) (3) The dos product v - ш u , _ 7~ ------- ** ’• “d ’ • • |v^bL’m;leF2l^°r‘ °"'scctlon ° •»«veals ™lt,plKMwa m Im~ i(gehri b = w. But a more important
3 The output from Av is a ««ctor not a number: Matnx A limes vector v equal» vector Av. The matrix A is a rectangle of numbers: m rows and n columni. A 2 by 3 matrix multiplies a vector v with n = 3 components. —- -4: hi Please notice: A has 2 rows. A lima v involved 2 dot products The first component in Av used row 1 of A The result was 31. The second component of Ac used row 2 of At (row 2of A) • (column vector e) = 1-3+ 2-1+ 1-7= 12 This is the usual way to multiply A times •: dot products of the rows of A with c. But Section 1.3 will explain a better way to understand Av Computing row • column is fine, but understanding Av becomes clearer with linear combmanona of column vectors. Let me show otic "linear combtnaooo" because this is the fundamental operation on vectors. Multiply veclon by numbenlAe 2 and i and add rite retulu : 1.Inrar combination co + d so = 2v > 4w (S) Those combtnalioas go into lhe big step Multipiy a matrix by a matrix I would like to save that step for Section 14 We have explained three ways lo multiply, involving numbers and vectors and matrices: 1. Number times vector (co) 2. Vector vector (•• to) 3. Matnx times vector (Av) Those are tn Sections 11 and 1.2 and 13 Then AU is matnx multiplication in Section 1.4. Let me also say: A limn v can me the rows of a mama A or the columns of A. There are m row vecton in A and there are n column vectors. Both «rays use the same rrm numbers A major key to linear algebra comes from the connections between two ideas. lint products with rows of A Combinaliom of columns of A
Chapter 1. Vectors and Matrices Linear Combinations Combining addition wnh scalar multiplication proluces a “linear combination of о and u- Multiply* bye and inutapiytt» by* Tbenaddrv + dw Theuatpfcvanddwua Imearcombiaanoa cv + dw. Foor Special bnev combinations are sum. difference, zero, and a scalar multiple cv • 1*4 1» - wm of sectors (4.2)+ (-1.2) = (3.4) I* - Iw = difference of sectors (4,2) — (—1,2) = (6,0) Ov + Ow = am «сиг (0,0) cv + Ow W sector cv m (he direction of * The zero sector is always a possible combsaatioo from c = d = 0. Every lime we sec a “space” of sectors, that uro vector will be included Thu big vic*, taking all lhe combi- aaiumt of v and w. n linear algebra at wort The figures show bow you can visualize vectors For algebra, we just need the com- ponents dike 4 and 2k Thai sector * is represented by an arrow. The arrow goes vi w 4 units io die right and t>a > 2 unru up li ends al the point whose z. p coordinates art 4,2. This point la another repmentaunn of the sector—to we have three ways Io describe в; Represent sector о Two numbers Arrow from (0.0) Point in the plane We add using the numbers We vuiuluc v + w using arrows for о and w and V + w lector addition (head (o tad) Alike nd of v.place the Hart of w
1.1. Linear Combinatiui» of Vectors 5 Vectors in Three Dimensions A rector withlwo components correspond»» a point in the .rp plane The component» of v are the coordinates of the point: z = v> and p The trra* end» at chit point (vi.m), when it start» from (0,0) Now we allow rector» to have three component» (Vt.bj.Os). The zy plane is replaced by three-dimensional ryi »pace Here are typical rector» (still column vecton but with three components): The vector ю corresponds to an arrow in 3-space Usually the arrow Mans al the “origin", where the rpa axes meet and the coordinates are (0.0,0). The arrow ends at the potm with coordinates U). uj. v>. There u a perfect match between the column rector and the я/TOS' from the origin and the pour where the arm emit The vector (a, p) in the plane («nth 2 numbers) is different from (r, p. 0) in 3-space I correspond Io points (z.p) and (r.p, a) in 2D and 3D. и abo written ai vw (I,|,-|). The reason for the row form (in parentheses) is to save space. But t> « (1,1,-1) is not a row vector* h is tn actuality a column rector, just temporarily lying down The row vector [ 1 1 -1 ] is absolutely different, even though n has the same three components. That 1 by 3 row rector vT w 11 1 -1 ] is the “transpose" of the 3 by I column vector v. In three dimensions, v + w is still found a component al a time The sum has components t'i + иц and uj + vi and I'l + srj. You see how to add vectors in 4 or 5 or n dimensions. When w starts at the end of v. the third side is v 4- to. The other way around the parallelogram is w + V. Do the four rider all tit in the юте plane.’ Yet. And lhe sum v + w - и — w goes completely around to produce the _________vector.
Chapter I Vectors and Matrices 6 The Important Question: All Combinations Foe one vector u. the only linear combtnaboM » multiples «*• ,wo vectors, the combiaatiOM Me co 4 do For three vectors. Ле combinations are cu + do + eto. Will you take the big мер from one coMbuuboo io all combinations ? Every c and d and e are allowed. Suppose Ле vectors M. e. to are la three-dimensional space: I. What is the picture of eil combinations cw? 1 What в Ле picture of ad combusuticns css ♦ de? 3. What u lhe picture tri ail combmabons cu ♦ de ♦ ew? The answers depend on the particular vectors st. v. and w If they are typical nonzero vectors (chosen at random 1. here are the three answers Thu u a key tn our subject: I ThecombinationsruMibuAnagk (0.0,0). 1 The combinations ru + de til a pine thenar* (0,0,0). 3. The combinations CM + de ♦ ew hu rbree-duneasiona/ ipact The uro rector (0.0.0) b oa Ле line because c can be иго It u on Ле plane because e and d could both be rero The tire tri vretors щ „ ш6пйе|у long (forward and backward) Il is the plane of all eu + do (combining two vectors ts, e in threectimensioiial space) that I especially aak you to ton* about Addutf oZZeuoatrerhnrtoadduoutbr Mbrr tire filli in ike plane in Figure 1.3. tblrtX^LtL^l^V'~'4** "₽’'**“< '« Suppose that tills up the wh.de ,hr^.'** ₽U",rfu » Then combmmgall ew with all cu+dt> up the Whole Лгее-Лтеткта! spree a. ♦ do 4 ew matches every point in 3D. When w hapJn?fc ₽4,n* Й*П‘p,°' B“‘olher P°«ibilities exist The co^bZX . Л .X Р11ПС °f •** fi™ '*° three-dimensKwul spree Please thmk *?* P'"* ** ПО< gC' ful1 •₽~e Pteree Лтк about Ле spec»! ceci ln p^^ , Qf
7 1.1. Linear CombinaUons of Vector» Line containing all cu figure 1.3 -u/2 Plane from all ru + dv • WORKED EXAMPLES 1.1 A Die linear combmatioas of v = (1,1.0) and w - (0.1,1) fill a plane in R’. Detcribt that plant, find a vector dial it not a combination of v and w—not on the plane Solution The plane of v and w contain» all combination* cv + dw. The vector* in that plane allow any c and d. The plane of figure IJ tills in between the two line* ['1 [0] Combinations cv + dw = e I 1 I + d I 1 I • to] L»J 611 a plane Four vectors in that plane are (0.0.0) and (2,3.1) and (3.7.2) and (v.2a,<). The second component c + d is always the sum of the first and third components Like most vector*. (1,2,3) il not ui rhe plane, btcaatt 2 dart not rquo/ 1 + 3 Another description of this plane through (0,0.0) ts to know that n (1,-1,1) is perpendicular to the plane Section 12 will confirm that 90* angle by testing dot products 1.1 В find two equations for г and d so that the linear combination rw + dw equals 6: -l-‘l -[;] ЧЯ Solution In applying mathematics, many problems have two parts Here we are asked for the modeling part (the equations) Chapter 2 is devoted lo the solution part (finding c and d) Our example tits into a fundamental model for linear algebra Vector equation find 2 numbers c and d so that cv + dw = b For n - 2 we will soon find a formula foe c and d The "elimination method" in Chapter 2 succeeds far beyond n - 1000 column vectors Al that point we must use matrices < Vector equation 2 -1 1 2c- d=l c«2 cv + dw = b 5 + d -3 = 1 Se-3d.i d-3 Vector addition produces two equations The graph of each equation produces a line. Two lines cross at the solution. Why not see this also as a matrix equation Az = b. since that is where we are going: 2 5 2 by 2 matrix Az - b -1 -3
в Chapter 1 Vecton and Mairice, Problem Set 1.1 1 Underwhmeomto-0-.Лс.-й [ 1 .n^uptemof [ °fc ]?sunwilhme П.О eouanom e » me and d - m*. By ehmmaxmg m. fuxi one equaUon connecting me - «го. »the- number. 1 G™. around a uuorte from (0.0)»(5.0)»(0. И)»(0.°). •*- « Отои three г±т?;лс“.."imi " IM’Tbeterittfl ujuaredof*™*»" (!.»)» INI f+Ч- Problems 5-9 are about addition <d .ector. and linear combination*. 3 Deicnbe ceometncaUy (line, plane, or all of R1) all linear combtnatrona of If v ♦ w - and v - w p . compute and draw the vecton v and w. From v - , * 1 md w “ J ' , And the component, of 3o + w andrv + dw Compute u->a*waad2u*2o4w How do you know u, v, w lie in a plane? Erory combmatmn of a - (|. -J, 1) «d w - (0.1, -1) ha. component, that add »-------Rod e and d ю ttm e. ♦ dw . (X 3. -в). Why u (J.3, в) impoatible? la Ле ip plane mart a0 mne of them haear comhinadoru: в[*1 W”h e“0’1’2 “d <f“0,l,2.
9 1.1. LinearCombinationsofVectors к = (0,0,1) j + k ---------------f 2:00 i -(0.1.0) NotKt Ла Шамая b (0,0,0) al the i-(1.1,0) Figure 1.4: Unit cube from «./ к and twelve dock vector»: ail lengths - 1. Problems 10—14 are about special sectors on cubes and docks In Figure 14. 10 If three comers of a parallelogram an (1,1), (4.2). and (1,3), what an all three of the possible fourth corners? Draw two of them 11 Four corners of this unit cube art (0,0,0), (1,0,0), (0,1,0), (0,0,1). What are the other four comers'’ Find the coordinates of the center point of the cube The center points of the sis faces are________The cube has how many edges'’ 12 Review Question. in xya space, when is the plane of all linear combinations of i-(1.0,0)and t+J-(1,1,0)? 13 (a) What is the sum V of the twelve vectors that go from the center of a dock lo the hours 1.00.200....12:00? <b) If the 200 vector is removed, why do the 11 remaining vectors add lo 8 00? (c) What an the», p components of that 200 vector •- (oosf.ainf)? 14 Suppose the twelve vectors Mart from 600 al the bottom instead of (0.0) al the center. The vector to 12 00 is doubled to (0,2). The new twelve vectors add to , 15 Draw vectors u. e. u> to that their combmatiom cat + de + ew fill only a line Find vectors u. e. w to that their combinataom cu ♦ de ♦ rw fill only a plane equations for the coefficients c and d in the linear combination. Problems 17-18 go further with linear combinations of r and w I Figure 15a). 17 Figure I 5a shows | e + | w Mark the points | e +j tn and | e + ;tn and n + tii. Draw the line of all combinations co + dir that have c + d = I. IB Restricted by 0 < c < 1 and 0 < d < 1. shade tn all the combinations co •+ dir Restricted only by c > 0 and d > 0 draw the “cone" of all combinations cu - dw
РгоЫепи IM2 dm! и. -. - - .ЬгеечЬтепсй«.1 Ф.« <-* ngur. 1.5Ъ). 16 iw«di«+4«“R«urel 56 Challenge problem: Under .tut res’tncoom on ! d.e. wtO Ле combmiUom cw + *> + « fill in the dashed mingle1 To oay Ле mangle. one requirement u c > 0, d > 0. e > 0. 20 The three dashed hues tn the mangle are v u and w - t> and u - u> Their sum is _________ Draw the head u> tail addiuon wound plane mingle of (3,1) plus (-1,1) plus (-2,-2). 21 Shade n Ле pyramid of canbmatiom я» ♦ de +eu> with c > 0, d > 0. e > 0 and c+d + t$ 1. Mart the vector|(« + u+w) as inside or ouuide this pyramid. 22 If you look ar nffcombsnatioMof tho« u ». and w. n there any vector that can't be produced from cv + du ♦ etc? Different answer if «,«. w are all in Challenge Problems 23 How many corners does a cube have in 4 dimensions? How many 3D faces? How many edges? A typical corner to (0,0.1.0). A typical edge goes io (0,1,0.0). 24 Find two iigrntu сапМаоПти of die three vectors si ” (1,3) and v m (2,7) and w > (1,5) that produce i • (0,1). Slightly delicate question: If I take any three vectors u. v. v in the plane, will there always be two different combinations that produceb. (0.1)? 29 The linear cnmbinaUms of (n. 5) and (c,d) fill the plane unless__ Find four vectors u. o. w. * with four components each so that their combinations rai + du + rw + /a produce all vectors (bi.5>.bi.b«) in foor-dimensiontd space. * ^“d^'",hree7**,"B,«'--<tf»,’«‘rai*de + ew = b. Can you somehow П1Ы1 td л Сайт iff- r L 4 w г -1 0 -1 2 -I 0 -I 2
1.2. Lengths and Angle* from Doi Products 11 1.2 Lengths and Angles from Dot Products 1 The "dot product" of e - * j and so = ' is e • v - (1)(4) + (2)(6) -4 + 12-10. 2 The length squared ofc = (1,3.2)u co - 1 + 9 + 4= 14 The length b ||e|| . /14. J П. »««».- . -fa . IMP. 1.1 4 v - (1,3,2) is perpendicular to • (4, -4,4) because v • w = 0. S The angle в • 45" between c= [ * 1 and w - [ * 1 ttoo.1 = .----L—. I 0 I L « J IMI IMI ЙЙЛ) Schwars inequality Triangle inequality 6 All angles have | сов d| < 1. AU vectors have |v. w|SI|e(| ||w|| | ||v+u>||$||v||+||w|| The dot product и • c tells tn Ле squared length ||ej|’ of a vector u. The dot product c w tells us Ле angle between two vecton v and so The length ||v|| b ghen by ||v||’ о - в « ej + c’ + • • • + oj. (I) In two dimensions, this is Ле Pythagoras formula a’ ♦ b* - r1 for a nghi triangle The sides have a1 - of and 5* - Ц The hypotenuse has ||r||* - ₽} + t«J - a’ + b*. That formula for length squared matches ordinary plane geometry To reach n dimensions, we can add one Лшеаыоа at a time Figure IЛ Лтп w • (1.2,3) in three dimcnuom Now the nght tnangle has sides (1.2.0) and (0.0,3). Those vectors add to so The fins side в tot the ay piaac. the second side goes up the perpen dicular r axis . For ЛЬ tnangle in 30 wuh bypolemnc w a (1,2.3). Ле law a’+ b* w c* becomes (1’ + 23) ♦ (31) - 14 -
Chapter 1. Vectors and Matrices 12 The length of a foor^intensmoa) «» would be +-^ J 'S .< Thus the vector (1.1*U1) has length 71’+1’ + ^ = \ W’ “ ** “iT.h г""'8'’ a unit cube in four^unenuonal space That thagonal m n dimensions has length Wte»e the words «it «ctor when the length of the veetorBl.D.vuievby||»||. A uni. «ctor u has length |f»|| = L If •# 0. then» = Ba unit vector. Example 1 The standard unit vectors along the r and у axes are written t and j. In the zy plane, the unit vector that makes an angle "theta" with the s axis is (con 9, sin 9); l ull «dor. and /-[®| and « - |"J]. When 9 - 0. the horuonol «ctor u is t When « - 90*.the vertical vector j* j. Por any angle, u - (cos 9, sin 9) В a unit vector because u • u « con’ 9 + ain’ в ж 1. In three dimensions .he standard паи vectors are i — (1,0,0) and j — (0,1,0) and к (0,0,1). In tout dimenuona, one example of a unit vector it st - (j, j, J). Or you could sun with the vector »-(l,S. 5,7). Then I'vll’ - 1 + 25 + 25 + 49- 100. So v has length It) and u »/10 В a unit vector The word unit" n always uvdwxtmg that some measurement equals "one". The unit price В the price for one пет А пая cube has tides of length one. A uni. circle is a circle with rattan one. Now we tee the meaning of a "unit vector": length • 1. Perpendicular Vectors Suppose the angle between в and w n SO*. Its coune is Meo That produces a valuable leM V . w = 0 for perpendicular vectors Perpendicular .«toe, hn« »• w - 0, ТЪеп Ц» 4-»||» ж ||,||» 4- ||w||’. (2) i** T^.,m₽Ort" c-r " b“ brought ut tmek to 90* angle, and lengths " + “ e*. The algebra far perpendicular vectors (v to - 0 - w - v) it easy; II» ♦ toll1 -(• + •). (• + •)-«.• + •. w + t,..4w.w.||v|(i + ||w||> (3) Two terms were tern Please notice that ||n - w|p h al» equal lo )|»||> + 11w||>. Example 2 The vector * 3 (1.1) is n a 45* angle with the x uh The vector w 3 (1, -1) is al a -45’ angle with the x axis So the angle between (1.1) and (1,-1) is 90*. Their dot product ts v .» „ i t _ n ** •**• “ « -1 - ».«. i м „г?,?:;; : J
1.2 Lengths and Angles tram Doi Products 13 Example 3 The vectors v = (4.2) and w = (-1.2) have a jrro dot product: Dot product is zero [41 [-11 ... = 0 Vectors are perpendicular [2] [ 2j Put a weight of 4 al the point z = -1 (left of zero) and a weight ol 2 al lhe point r = 2 (right of zero). The z axis will balance on the center point like a see-saw The weights balance because lhe dot product is <4>(—1) + (2)(2) = 0. This example is typical of engineering and science. The vector of weights is (trj, w,) = (4,2). The vector of distances from the center is (ft, ft) « (-1,2). The weights times lhe distances. iftft and tftft. give die “moments’* The equation for the see saw to balance is W • tl = WtVi + Iftft w 0. Example 4 The unit vecton v » (1,0) and u = (соав.ипв) have v • u = «М0. Now we are connecting the dot product to the angle between vectors Cosine of the angle в The cosine formula is easy to remember for unit vectors: If ||v|| w 1 and I|u|| - 1. the angle * between v and u has coed a v •«. In mathematics, zero is always a special number For dot products, it means that (here two vreton are perprndicaiar The angle between them is 90* The clearest example of perpendicular vecton is t w (1,0) along the z axis and j “ (0.1) up lhe у axis. Again the dot product u i j - 0 + 0 - ft The coune of «Г is zero Figure 1.7: The coordinau: vectors i and j. The unit vector u at angle 45" divides v - (1,1) by its length Ци|| - i/2 .The amt vector u = (<хи в. sin*) it al angle 0 Example 5 Dot products enter m economics and business We have three goods to buy Their prices are (pt.Pi.Pi) for each unit—this is the price vector p The quantities we buy are (ql,qI,qa). Buying q, mu al the prin pi brings in ftPi The total com becomes quantities q times prices p This is (hr def product q - p in three dimensions : Cert = (qi.ft.ft) (pi.pi.ps) = qtpi + ftft + ftps > dot product A zero dot product means that The books balance". Total sales equal total purchases if q • p = 0. Then p is perpendicular to q (in three-dimensional space). A supermarket with thousands of goods goes quickly into high dimensions. Spreadsheets have become essential in management They compute linear combi ru- lions and dot products. What you see on the screen is a matrix.
Chap» 1 Vector» and Maine» 14 The Angle Between Two Vectors U C. = o. The dot product IS zero when the We know thal perpendKular weW have e „gles The dot product „ . togteisWTOtottoxtiiepistoCMneciJIdotpreducUto s find, lhe angle between anytwo полито vector* r ano _ __ г™л onal aod w » (oo»lS h‘n^) ^e « • W = Ехатрй 4 The ““ «^’=^*°л“"“^Х,иЬ for «.(3 - o). Rgure U ramrod + smasu>3 In tnjooometry show, that lhe angle between the unit vector* V and U ' IS The dor ,^uc, w. . rg-h » w 1* «det of v W makes no difference. в fl-a figure 1.1 Um vector* «• U - a»t. The angle between the vector* ii в. Suppose v w t* aot иго 11 may be pouttv», it may be negative The (ign of V . w immediately lellv whether wc are below or above a right angle. The angle i* lew than UlT when vw i* positive. The angle и above W when «• to is negative The right aide of Figure 19 ihow* a typreal vector»- (3.1). The angle with w - (1,3) la lew than 90“ became v • w - 6 кремне». The borderline n when vector* an perpendicular to v. On that dividing line between plus and minus. »1 (1. -3) i* perpendicular to V - (3.1). Their dot product is zero. Then w । goes beyond а 91Г angle with v The lest become* V • w> < 0: ntfalivt Figure 1.9 Sowll angle » w, > 0 Right angle ». w, ж 0 Large angle v • w, < 0 The dot product reveals the e*act angle в To repeat: For unit vector* u and U. thr Jot pr^iKi U • u h tht cosine tf 9. Thn remain* true in n dimension*. umt vector*uand Vm angle 9 hareu -U. mt Certainlv |u-U|<l
I .2. Length* and Angie* front Dut Product* 15 Wbor v and w are not unit renon? Divide by their length* to get u - v/||o|| and U = w/|jw||. Then the dot product of those unit vectors si and U give» cm»®. COSINE FORMULA If в and w are nonzero vector* then ———j HI M an®. (4> Whatever the angle, thi* dot product of u/|ut »nh u>/| w| never exceeds one Thai it the "Schwarz intqnaluy" v • w| < |u| |w| for dot product»—or more correctly the Cauchy-Schwarz-Buruakowsky inequality. It seat found in France and Germany and Russia (and maybe elsewhere—it i* the mas important inequality tn mathematic»). Since |coa® | never exceeds 1. the соыпе formula in (4) give» two great inequalities SCHWARZ INEQUALITY |o. w| < |o| Iv| TRIANGLE INEQUALITY |u A v| < M A |to| Example 7 Find coo* for в w j 1 and w « | ' and check both inequalities The dot product it v • w - 4 Both u and ir hare length »/& So | |ul| ||v|| 8 |b||wH v^s/5 5 The Schwarz inequality is 4 < 5. By the triangle inequality, side 3 -Jo A to flit les* lhan side 1 4 ode 2. With о + w - (3.3) the three tide» lit/is < Л Square thi* inequality to get 18 < 20. Thia confirm* the mangle inequality. Example 8 Thu dot product of e - (u.k) and w - (6. n| it 2nb. Their length* arc IM - I IM - A1 A M The Schwart inequality в ir < ||u|| ||w|| it 2nb £ a* + b3 For any numbers o’ and b3. grewrnc mean ab < arittunrtic ятя } (a1 A b’). The triangle inequality comes directly from the Schwarz inequality • Finally, here to a proof of the Schwart inequality that doesn’t « angles. Every vector u ha* 0 < u • u. We apply this to the vectors u m ||b\w ± ||w|>: 0 < u • u = ||u|l’w • w ±2|)u1|||ib|| w-u + |)«е||’в-е means that 2IMI,l|w|(,>2|MllMI»-M (5) Divide by 2||u!| |Ьв||. Then >u • ml < |)u|| |)ar]l is the Schwarr inequality It lead* to ||« + mH1 - B- B + B.» + wB + w- m< ЦвЦ1 + 2||b(|11m|| + ||w||*. (6) The square root it |)u A toil < |]v|’ A tc 1 Side 3 cannot exceed Side 1 + Side 2.
Chapter 1. Vecton tod Matrices 16 WORKED EXAMPLES « al «1 w = (4,3) lest the Schwarz inequality on t>. w A For the vectors » - (3J) # for lnglc between t> and to and lhe triangle inequahry on fa + «I r,D0 Solution The dot product в « • w = (3X4> + J4"3 ||e|| = #йб = 5 *nd also |H = 5- The «nn • + “ Schwarz inequality « ». • = (3X-*) + H)<3> = 24Thc ‘«'вл <rf v 15 - • -i = (7,7) has length 7^2 < jq !••(< MM 24<25 (• + •!< M + M « 7s/2<5 + 5. Thin angle frame = (3,4) tow = (4,3) 25 Count of angle 13 В Which e and w grw «ywdirv u -w| = |»l l»l tod ||« + w|| = |v|| + ||w||? £quo/iry: One vector isamultipie of the other as in w = ce. Then the angle is 0° or 180®. In thiscase IcoaBi = 1 and |»-1»|вфпяЬ|п| |w||. Iftbc angleis O’, as in w = 2u. then |v + w||=|v| + |wf (both odes give 3>J). This v. 2o. 3c triangle is extremely thin. 13 C Find a unit vector u in the direction of e = (3,4). Find a unit vector V that is perpendicular to u There are two possibilities for U. Solution For a unit vector u. divide e by its length ||v|| = 5. For a perpendicular vector V we can choose (-4,3) or (4, -3). For a ml veaot V, divide V by its length | V | = 5. Problem Set U 2 з 5 W tod M of those vecton Chock the Schwarz inequalmes |a • w| < f«l M and (e w| < |e| |„||. the angle 9 Choose ихЗ “ РгоЫав L and find the cosine of gte Chooie vectors a. b.e that nuke O’. <W. and 180“ angles with to. F« any and vectors n and w. find dto d„ of - (b) . + (c) Find Unit VCClOn Hi And ♦», in rl^ л;, - , » F.nd.nntv^a.a.to^thu.eperpeX^LZ/idLr*'0 = (2Л'2)'
17 1.2 Ixngth* and Angies from Dot Pnxkjcts 6 (a) Describe every vector» = (»>,»») that is perpendicular toe = (2.-1). (b) AU vectors perpendicular to V (1.1.1) lie on a------in 3 dimensions (c) The vectors perpendicular to both (1,1.1) and (1.2,3) lie on a----. 7 Find the angle 9 (from its cosine) between these pairs of vectors: (a) If u = (1,1,1) is perpendicular to v and w. then v b parallel to ». (b) If u is perpendicular too and w. then u is perpendicular to » + 2». (c) If u and v are perpendicular unit vectors then ,« — e|| — -fi. Yes'. 9 The slopes of the arrows from (0.0) to (vi,u>) and (»i,wi) are vj/vi and irj/ti'i. Suppose the product t4iq/r|U| of those slopes is -1. Show that v • » = 0 and the vectors are perpendicular. (The line у — 4z is perpendicular to » = — |z.) 10 Draw arrows from (0,0) Io the points v = (1.2) and » = (—2.1). Multiply their slopes. That answer is a signal that v - w = 0 and the arrows are________. 11 If v w is negative, what does this say about the angle between e and »? Draw a 3-dimensional vector v (an arrow), and show where to find all w's with r - w < 0. 12 With v = (1,1) and » = (1,5) choose a number c so thal w — cv is perpendicular to v Then find the formula fore starting from any nonzero v and w. 13 Find nonzero vectors u, v. w that are perpendicular to (1.1,1,1) and to each other. 14 The geometric mean of z • 2 and у = 8 is ^zy — 4. The arithmetic mean is larger j(z+p) =_______________. This would come from the Schwarz inequality for о = (s/2. i/§) and w « (V'S. v/2). Find cos 9 for this v and ». 15 How long is the vector e (1,1, —, 1) in 9 dimensions? find a unit vector u in the same direction as v and a trait vector »that is perpendicular to v. 16 What are the cosines of the angles ci, S.9 between the vector (1,0,-1) and the unit vectors i, j, к along the axes? Check the formula cos3 a + cos? 0 + cw3 9 = 1.
Cfapiet 1. Vectors and Matr,^ 18 Г^« «Ьоы tenetbs “d *ns,“in ,ri“ne,cs- Problems 17-25 lead to the main facts about > 8 n _ _________A^.„ = (4.J)mdw = (-».2)U*recUn»le Check Ле rywagons тсспьшл ат» ,^-Г.1» + (ta-hof .)* - О"***» + w)’. 18 18 (Rules for do. products) These equaooos are sunpie but useful: (!)«.» = «•- <3><«)-w-c(v.w) Use(2)w,<b. = .4 - -ор«~1- * -I’—* + 2V W + W W The “Law of Cossnes" comes from (• - •) -(•-») = »• ® ~ 2v • • + W • «: Cute U. t» - •!’ - M* " 2M M c“* + M’ Drawatriangleenthudeseandwandn-w Whichoftheanglests87 20 The tnaaglt iar^aatay says (length of • + ») < (length of u) + (length of w). Problem 18 found |o + eef’ - M* * 2’ 1 » ♦ lncre»“ U>« v • w to || o|| || w| to show that side 3| cannot esceed [Лк 1Ц + llalde 2|: 21 The Schwart inequality |u • te| < |tr|| |w| by algebra instead of trigonometry (a) Multiply out ЬоЛ »des at (o>»i + и1ю1)’< (v? + t^)(w? + w’). (b) Show chai the difference between those two sides equals (щюз - t>2wi)2. This cannot be negative since it is a square—so the inequality is true 22 One-Ime proof of die inequality fss-ff | < 1 for unit vectors (u,,ui) and ((Д, (/,): I» • U| < Ь1|KZ,| + Inal |СЛ»| < = !. Pot (ui.ua) - (Д Л) and (Ut.Ut) = (Л, .6) in that whole line and find cos8. 23 Why is I cor8, never greater than 17 End con» in an equilateral triangle.
1.2 Length» and Angles Iron Doi Product» 19 24 Show thal the squared diagonal lengths j® + w|’ + |v - v|1 add to Use cum of lour squared side lengths 21 v||’ + 2|tri’- 25 (Recommended) If |MI - 5 and ||»|| = 3. what are lhe smallest and largest possible values of Це - w||? What are the smallest and largest possible values of и • to? Challenge Problem» 26 Сал three vectors tn the ry plane have u-o<0andew<0anduw<0? I don't know how many vectors in xyz space can have all negative dot products (Four of those vectors in the plane would certainly be impossible .. 27 Find 4 perpendicular unit vectors of lhe form Choose + or 28 Using v - randn(3,1) in MATLAB, create a random unit vector u a »/||»|. Using V a randn(3,30) create 30 more random unit vectors U, What is the average size of the dot products |u Ц|? In calculus, the average is f* |«atf|dP/e a 2/rr. 29 In the rg plane, when could four vectors v(, ₽>. »j, v4 not be the four sides of a quadrilateral 7
Chaffer 1. Vectors and Мшпсе» 20 13 Matrices and Column Spaces 2 columns. Rank 2. , a а- of the 3 rows of A with lhe vector a: 2 The Зсотрооепв of Ax are dot products oltnc a ro- • » J- о. u *>•> 1-7 + 2-8 23 53 -1 4 Thccoiumntpoceof A contain» ad rombinaliom Ax-г, о i+ff3a3 of the columns J Rank one natrim All column at A (and all combinations Ax) lie on one line. J Sections 1.1 and 1.2 explained lhe mechanics of vector»—linear combituurons. dot product». lengths. aad angle» We have vecton in R'and R and ever) R". Section 1 3 begins lhe algebra of m by n matncea our tree grail A typical matrix A la a rectangle of m tuna n number»—m rows and n columns If m equals n then A is a 2 1 -J 1 4 7 -3 7 5 Identity IHagonal Triangular Symmetric matnx mama matrix matrix We often think of the column» of A as vecton O|,a31... ,aM Each of (hose n vectors ia in m-dimcnuooal space. la this example the a's have rn • 3 components each: m - 3 пип n — 4 column» 3 by 4 matrix This example is a “difference matnx" because multiplying A limes x produces a vector Ax of differences How doer an mbjn malm A multiply on n by 1 vector x ? There are two ways to lhe same answer—we wort »«h the rows of A or we svort with the columns. The row picture of Ax will come from dot products of x with lhe rows of A. The column picture will come from linear combinations of the columns of A.
21 IJ. Машею and Column Space Row picture of Ax Each row of A multiplies lhe column vector x. Those multiplications row timet column are doc products! The tint dot product comes from row 1 of A: (row l)-x = (-1,1,0,0) •(xi.xj.xs.xa) = *> — »s. It takes m limes n email multiplications to find the m “ 3 dot products that go into Ax. Three dot products -1 0 0 1 -I 0 Notice well that each row of A has the same number of components as lhe vector X. Four columns multiply X| to x« Otherwise msdtiplying Ax would be impossible Column picture of Ax The mains A times the vector x is a combination of lhe columns of A The n columns are multiplied by the n numbers in x Then add those column vectors X|Ot..x.a. to find the vector Ax: Ax w Xi (column a() 4- xj(column <si) + xilcolumn щ) 4 x«(cotamn a«) (2) This combination of n columns involves exactly the same multipiscalninv as dot products of x with lhe rn rows Bui it is higher level' We have a vector equation instead of three dot products. You see the same Ax in equations (I) and (J>. Combination of columns [-1] Г 11 Г 01 fol [c«-«tl 0 -I ll + r(lo|alt|-ril|j) Oj I oj l-lj ll J l«a-».J Let me admit something right away. If I have numbers in A and x. and I want to compute Ax, then I tend io use dot products the row picture But if I want to undrrttand Ax. the column picture is better "The column vector Ax is a combination of the columns of A." We are aiming for a picture of not just one combination Ax of the columns (from a particular z), What we really warn is a picture of all combinations of lhe columns (from multiplying A by all vectors z). This figure shows one combination 2ai ♦ 07 and then it tries to show the plane of all combinations X|«| 4- гга] (for every xt and Xj). Figure 1.10: A linear combination of oi and n} All linear combinations fill a plane. The next important words are independence, dependence, and column space
Chapter 1. Vectors and Matrices 22 . contribute anything new. They might 1 sidy included). Examples 1 and 2 show • • _ rsf nn»viniK rnlniv,». . Coiunms^^ nugbin°l.ui * columns (which »« ^bmations of previous columns. • т a new direction. Their combinations fill 3D space R3. „_• all vectors (bi, bj, Ьз): 3D space. number bi. Then x3 (0,4,5) leaves b1 dirertron. and columns (1 0 0 1 Each column gives 2 4 0 3 5 6 J If »e took at all combinations of the columns, we see -ь 6. b and allows us to match any 4,. We have found zj, xj. xs *0 «bat A।X - Independence means: The only combination of columns that produces Ax = (0,0,0) < x = (0,0,0). The columns are independent when each new column is a vector that we don't already have as a -/«wious columns. That word "independent" will be important columns that pse a or* Independent Ai = columns 12 3 1 4 5 6 0 6 1 + 2 «= 3 Column 1 + column 2 « column 3 Their combinations don’t fill 3D space ’. ' * ~ " r 6+0=6 The opposite of independent is -de/vndrnr" These three columns of A3 are dependent Column 3 is in the plane of columns 1 and 2. Nothing new from column 3. I usually lest independence going from left lo right. The column (1,1,6) is no problem. Column 2 м nor a multiple of column 1 and (2,4,0) gives a new direction. But column 3 is the sum of columns 1 and 2. The third column vector (3,5,6) is not independent of (1.1.6) and (2,4,0). We only have two independent columns. If I went from nght to left. I would start with independent columns 3 and 2. Then column 1 is a combination (column 3 minus column 2). Either way we find that the three columns are in the same plane: two independent columns produce a plane in 3D. That plane is the column space of this matrix: Plane = all combinations of the columns. Example з ’ I 2 5 3 6 15 4 И 20 column 1 + column 2 — column 3 is (0,0,0). Now в] is 3 times at. And a3 is 4 times at. Every pair of columns is dependent. This example is important. You could call it an extreme case. All three columns of A3 lie on the same line in .^dimensional space That line consists of all column vectors (c, 2c, 5c)— all the multiples of (1.2.5). Notice that e = 0 gives the point (0,0,0). That line In 3D b the column space for this matrix As The line contains all vectors AjZ. By allowing every vector z. we fill in the column space of A3—and here we only filled one line. That is almost the smallest possible column space pbe column space of A is the set of all vectors Az: All combinations of the columns )
1.3. Matrices and Column Spaces 23 Thinking About the Column Space of A “Vector spaces” are a central topic. Examples are coming unusually early. They give you a chance to see what linear algebra is about. The combinations of all columns produce the column space, but you only need r independent columns. So we start with column 1. and go from left to right in identifying independent columns. Here are two examples A< and Aj. Aa = 1111’ 0 111 0 0 11 0 0 0 1 Aj = 110 0' 0 110 0 0 11 10 0 1 At has four independent columns. For example, column 4 is not a combination of columns 1,2,3. There are no dependent columns in A«. Triangular matrices like A< are easy provided the main diagonal has no zeros Here the diagonal is 1,1,1,1. A$ is not so easy. Columns 1 and 2 and 3 are independent. The big question is whether column 4 is independent—or is it a combination of columns 1,2,3? To match the final 1 in column 4, that combination will have to start with column 1. To cancel the I in the top left comer of Aj. we need minus the second column. Then we need plus column 3 so that -1 and +1 in row 2 will also cancel Now we see what is true about this matrix А»: Column 4 of AB г Column I — Column 2 4- Column 3. (4) So column 4 of Aj is a combination of columns 1,2,3. A® has only 3 independent columns. The next step is to "visualize" the column space—all combinations of the four columns. That word is in quotes because the task may be impossible. I don't think that drawing a 4-dimensional figure would help (possibly this is wrong). The first matrix A* is a good place to start, because its column space is the full 4-dimensional space R4. Do you see why C(Aa) « R4 ? If we look to algebra, we see that every vector v in R4 is a combination of the columns. By writing v as (vj, vj, vj, v*). we can literally show the exact combination that produces every vector v from A«: the combination is. We have solved the four equations A^x = v! The four unknowns in x = (xi, xj, хз, z«) are now known in the four parentheses of equation (5). Geometrically, every vector v is a combination of the 4 columns of Ад. Here is one way to look at A«. The first column (1,0,0,0) is responsible for a line in 4-dimensional space. That line contains every vector (ci,0,0,0). The second column is responsible for another line, containing every vector (cj,Cj,0,0). /f you add every vector (ci,0,0,0) to every vector (cj, Cj, 0,0), you get a 2-dimensional plane inside 4-dimensional space.
Chapter 1. Vectors and Matrices 24 _ num rule of linear algebra is keep going. Thc ты w as the first two columns. The independent of the first two. Ы! two columns give two more «hrect^ 44linxtbional vector is a combination of the At the end, equation (5) 4 of Aa is «И of tour columns of Aa The co u first 3 columns cooperate. But If we attempt the same plan for *** j 2 3. Those three columns combine to column 4 of As is a combination of 4 happens t0 be in that subspace. give a three-dunetuionai whote ^4^ space C(A5). We can only solve Thai three-dimensional subspa*. three indcpcndent co|u Ajz - v when v is in C(As)- tne maui» ... ,„llimn soace of A. When A has m rows, the columns are ' R^^Tumn space might fill *11 of R"‘ or it might not. vectors ш m-dimcnsRinaJ spj*.e к , irwwiu**** r—-w For m = 1 here are all four possibilities for column spaces in 3-dtmensional space: 3 independent columns 2 independent columns 1 independent column 3. 1. The whole space R3 2. A plane in R3 going through (0,0,0) A line in R3 gotng through (0,0,0) The single point (0,0,0) in R3 (when A is a matrix of zeros I) Here are simple matrices to show those four possibilities for the column space C(A); 1 0 0 0 1 0 ° ° 1 C(A) = R3 = iy: space 10 0 0 1 0 0 0 0 C(A) = ry plane 10 0 0 0 0 ° 0 0 C(A)=xaxis 0 0 0' 0 0 0 ° 0 0 C(A)=one point (0,0,0) Author’s note The words “column space" have not appeared in Chapter 1 of my previous books 1 thought the idea of a space was loo important to come so soon. Now I think that the best way to understand such an important idea is to see it early and often. It is examples more than definitions that make ideas dear-in mathematics as in life. 0» 4e •“*»1 >»“«“* the keys to the answers They give a real understanding of any matrix A l. ’ T •"*' '"k₽en*"' '» •. «*«- Ч-. 5 (Amazing) The r rows of Л are a bad» f™ .1, c row s₽ace °f Л: combinations of rows. Sectio* 14 will eiphin how to multiply ах», n C contains columns from A. Please n^tke Z C and «• The result is A = СЯ. “«>« the row, of Я do w come directly from A.
1.3. Matrices and Column Spaces 25 Matrices of Rank One Now we come to the building blocks for all matrices. Every matrix of rank r is the sum ofr matrices of rank one. For a rank one matrix, all column vectors lie along the same line. That line through (0,0,0) is the whole column space of the rank one matrix. 1 4 2 Example As = 3 12 6 -2 -8 -4 has rank r = 1. All columns: same direction 1 Columns 2 and 3 are multiples of the first column ai = (1,4,2). Column 2 is 3a, and col- umn 3 is— 2aj. The column space C(Ag) is only the line of all vectors cai = (c,4c, 2cj. Here is a wonderful fact about any rank one matrix. You may have noticed the rows of A«. All the rows are multiples of one row. When lhe column space is a single line in rn-dimensional space, the row space is a single line in n-dimensional space. All rows of this matrix A« are multiples of_____. An example like A« raises a basic question. If all columns are in the same direction, why does it happen that all rows are in the same direction? To find an answer, look first at this 2 by 2 matrix. Column 2 is m times column 1 so the column rank is 1. a ma b mb Is row 2 a multiple of row 1 ? Yes I The second row (b,mb) is | times lhe first row (a, ma). If the column rank is 1, then the row rank is 1. To cover every possibility we have to check the case when a 0. Then the first row [ 0 0 ] is 0 times row 2. So the row space is the line through row 2. Our 2 by 2 proof is complete. Let me look next at this 3 by 3 matrix of rank 1: ma pa mb pb me pc Column 2 is m times column 1 Column 3 is p times column 1 Rows 2 and 3 are b/a and c/a times row 1 This matrix does not have two independent columns. Is it the same for the rows of A ? Is row 2 in the same direction as row 1 ? Yes. Is row 3 in the same direction as row 1 ? Mrs. The rule still holds. The row rank of this A is also 1 (equal to the column rank). Let me jump from rank one matrices to all matrices. At this point we could make a guess: It looks possible that row rank equals column rank for every matrix. If A has r independent columns, then A has r independent rows. A wonderful fact I I believe that this is the first great theorem in linear algebra. So far we have only seen the case of rank one matrices. The next section 1.4 will explain matrix multiplication AB and lead us toward an understanding of "row rank = column rank” for all matrices.
Chapter I. Vectors and Matrices 26 Problem Set 1-3 But we don't yet have a computational system to This chapter introouces , „Iumn vectors. So these problems stay with whole decide independence or dependence of colum numbers and small matrices. 1 0 o' 1 1 о 1 1 1 Лз = Лз 1 5' 2 10 1 5 0 Л< = 0 0 0‘ 0 0 Find a combine of the columns that produces (0,0.0): column space = plane. Dependent columns Describe the column spaces in R ’ of В and C: 12 2 1 3 3 C- В -В (3 by 4 block matrix) Multiply Az and By and lx using dot products as in (rows of Л) z: 2 1 2]fl 4 2 4 0 1 0 1 0 01Г 4* By = 1 10 1 1 1 10 °1Г*1 0 1 J *» . 1 2 2 1 1 5 6 2 I 2 3 4 5 6 7 8 9 1 4 7 2 5 8 3 6 9 3 в 2 5 1 0 0 1 0 0 *2 5 6 8 9 Multiply the same A times z and В times у and I times * using combinations of the columns of AandBand/.asinAz = 1 (column 1) + 2(column 2) + 5(column 3). JnolXkb m"> ,nd^a”co,u>™«<‘‘« Л have ? How many independent columns in Д? How many independent columns in A + Bl st. co,umns) w ,hat л+в h“ WNoradeP«>dent columns (c) 4 independent columns the column spaces in R3 oMandBand C°m^’nat*ons of *** columns. Describe 2 4’ 1 2 2 4 1 0 0 0‘ 0 I 0 1 0 10 12' 0 2 2 4 .0 2 2 4 0 = C = R*>d a 3 by 3 matrix A with x — • (What is the maximum poss.bk'n^^'i "" "ine Cntries = 1 or 2
1.3. Matricc* and Column Space* 27 10 Complete A and В so that they arc rank one matrices. What are the column space* of A and В1 What arc the row spaces of A and В ? A = 3 5 15 B- 1 2 -5 4 11 Suppose A is a 5 by 2 matrix with columns ai and aj. We include one more column to produce В (5 by 3). Do A and В have the same column space if (a) the new column is the zero vector ? (b) the new column is (1,1,1) ? (c) the new column is the difference aj — ai ? 12 Explain this important sentence. It connects column spaces Io linear equations. Ax = b has a solution vector x if the vector b is in the column space of A. The equation Ax = b looks for a combination of columns of A that produces b. What vector will solve Ax = b for these right hand sides b? 13 Find two 3 by 3 matrices A and В with lhe same column space the plane of all vectors perpendicular to (1,1,1). Whal is the column space of A + В ? 14 Which numbers q would leave A with two independent columns ? 1 4 7 ‘ 1 3 0 1 2 9 2 4 2 5 8 3 6 q 5 8 2 0 2 4 0 4 5 0 9 15 Suppose A limes x equals b. If you add b as an extra column of A, explain why the rank r (number of independent columns) stays the same. 16 True or false (a) If the 5 by 2 matrices A and В have independent columns, so docs A 4- B. (b) If the m by n matrix A has independent columns, then tn > n. (c) A random 3 by 3 matrix almost surely has independent columns. 17 If A and В have rank 1, what are the possible ranks of A + В ? Give an example of each possibility. 18 Find the linear combination 3s 1 4- 4sa 4- 5вз = b. Then write b as a matrix-vector multiplication Sx. with 3,4,5 in z. Compute the three dot products (row of S) • z: 1 1 1 «2 = 1 1 0 я.з = 0 go into the columns of S 1
Chapter 1. Vectors and Matrices 28 19 Solve these equations Sy - fl 0 01 Г Vx 110 in 1 1 1J I». bwith«b*X’Js in ГИ _ 1 and 1 1J I1 the columns of the sum matrix S; The sum of the first 10 is_________ The sum of the first 3 odd numbers is----• Solve these three equations for tn. Jt2. P3 in tcrmS°f 12 10 0 1 1 0 1 1 1 V3 Wnte the solution V as a matnx A times the vector c. A is the "inverse matrix" S'1. Are the columns of S independent or dependent. 21 The three rows of this square matrix A are dependent- Then linear algebra says that the three columns must also be dependent. Find x in Ax = 0: 12 3 3 5 6 4 7 9 Row 1 + row 2 = row 3 Two independent rows Then only two independent columns 22 Which numbers c give dependent columns ? Then a combination of columns is zero. 1 1 0 3 2 1 7 4 c 1 0 c 1 1 0 0 1 1 c c c 2 1 5 3 3 6 [' и L4 CJ 23 If the columns combme mto Ax = 0 then each row of A has row x = 0: If ai Oj аз H ff = 0 0 then by rows 1 *1 1 M *- H H 1 Xs J 0 r3 • X The ta. Л. fc „ , b р1те ю ’O' 0 0
1.4. Matrix Multiplication and A = CR 29 1.4 Matrix Multiplication and A = CR f \ To multiply AB we need tow length for A = column length for B. 2 The number in row i. column j of AB is (row • of A) «(column j of B). 3 By columns: A times column j of В produces column j of AB. 4 Usually AB is different from BA. But always (AB) C = A (BC). If A has r independent columns, then A = CR = (m by r) (r by n). At this point we can multiply a matrix A times a vector z to produce Ax. Remember the row way and the column way. The output is a vector. Row way Dot products of z with each row of A Column way Ax = ziai + • • • + zna„ combination of the columns of A Now we come to the higher level operation in linear algebra: Multiply two matrices We can multiply AB if their shapes are right. When A has n columns. В must have n rows. If A ism by n and В is n by p, then AB ismbyp.m columns and p rows. The rules for AB will be an extension of the rules for Ax. We can think of the vector z as a matrix В with only one column. What we did for Az we will now do for AB. The columns of В are vectors [ z у x ]. The columns of AB are vectors [ Az Ay Ax ] In other words, multiply A times each column of B. There are two ways to multiply A times a column vector, and those give two ways to multiply AB: Dot products (row t of A) • (column j of B) goes into row». column j of AB Combinations of columns of A Use the numbers from each column of В We have dot products (numbers) or linear combinations of columns of A (vectors). For computing by hand. I would use the row way to find each number in AB. I “think” the column way to see the big picture: Columns of AB are combinations of columns of A. Example 1 Multiply AB =1 ! I 3 el **У* How many steps ? I 4 17 О I The dot product (row 1 of A) • (column 1 of B) is (1,2) • (5,7) = 5 + 14 = 19 , (Rows of A)-(columns of B) Г row 1 -col 1 - [ row 2 • col 1 row 1 • col 2 1 _ Г19 row 2 • col 2 J “ [43 221 50 J Abi and Ab? are Г 1 1 Г 2 1 „ 1 11 Г 2 11 Г 19 22 1 combinations of AB = 5 3 +7 4 6 3 +8 4 = аз 50 J the columns of A L J J L 4 J л J I 4 J J L4 J
2 b) 2 "UInCeS Mutaptytng s 61ви'ПрЬ*>10-' в h*s P columns, we have to nm]Up. * я by n mnp multiplications for (m by n) Tba __ j.9.2 = 8multiplications. Iж = Ps 1 л and В be multiplied with fewer than . Слм я by * pouiblf (allowing extra additions) “** ° noTesponert E in the multiplication count nE. k.i E = 2 0001 may be impossible. питх mulnpbcaoon. Explain why every vector да!ияда of AB—и also in the column space of A Еъ*-<яе2 тье identity matrix / has A7 » A and 7fi = Я if matrix sizes are right ?]-[“-]_х ,°"ayA 0 1 1 0 wiD exchange columns or exchange rows. Ехжпр» 3 The matrix E » Exchange columns of A (E is on the right) Exchange rows of В (E is on the left) AE * ЕЛ for moo matrices Exchange columns or exchange rows. ! !1 «. ts not the same as EA = j 2 ] „ “ "ПрППЛЯ f": AB “n faMlv b* different from BA We must keep matrices AB ot ABC in order. . examples where AB « BA. hit those special cases are not typical. I. Why is this true ? J"“ л Оив ВС. Matnx multiplication is “associative ‘«prod, became n u sounportant; (AB)C = A(BC). "-n® mvM oxy the same! But we can multiply AB fir* res T"*1 m 11ПСИ 11?ebri depend on this simple fact и asvKien/i mM m“Ch (,n n> * (n by p) x (p ЬУ te the^l^ *°”И ** (£Л)£ = Е(Л£)’ Exchan₽e A hnx The triple product EAE does both
1.4. Matnx Multiplication and A = CR 31 Rank One Matrices and A — CR All columns of a rank one matrix lie on the same line. That line is the column space C( A). Examples in Section 1.3 pointed to a remarkable fact: 77ie rows also lie on a line. When all the columns are in the same column direction, then all the rows are in the same row direction. Here is an example: 1 3 2 2 6 4 10 100 rank one matrix 30 300 = one independent column 20 200 one independent row ! All columns are multiples of (1,3,2). All rows are multiples of [ 1 2 10 100 ]. Only one independent row when there is only one independent column. Why is this true ? Our approach is through matrix multiplication. We factor A into C times R. For this special matrix, C has one column and R has one row. CR is (3 x 1)(1 x 4). AmCR 1 2 10 100 3 6 30 300 2 4 20 200 = 1’ 3 2 [1 2 10 100 ] (D The dot products (row of О • (column of R) arc just multiplications like 3 times 10. This is multiplication of thin matrices CR Only 12 small multiplications. The rows of A are numbers 1,3,2 tunes the (only) row [ 1 2 10 100 ] of A. By factoring this special A into one column times one row. the conclusion jumps out: If the column space of A is a line, the row space of A is also a line. One column in C, one row in A. That is beautiful, but we are certainly not finished. Our big goal is to allow r columns in C and to find r rows in A And to see A ” CR C Contains Independent Columns Suppose we go from left to right, looking for independent columns in any matrix A: If column 1 of A is not all zero, put it into the matrix C If column 2 of A is not a multiple of column 1. put it into C If column 3 of A is not a combination of columns 1 and 2, put it into C. Continue. At the end C will have r columns taken from A. That number r is the rank of A The n columns of A might be dependent The r columns of C will surely be independent Independent No column of C is a combination of previous columns columns No combination of columns gives Cx = 0 except x = all zeros When those independent columns combine to give all columns, we have a basis
Chapter 1. Vectors and Matrices 32 , + «(column2ofC) + ••• = zero vector. Cx = 0 means that г,(соЬиппЫ^ With independent columns, th.s onlyhД w0Uld be a comb.nat.on of the earlier by the last nonzero coefficients and ma columns—which our construction forbids. ’ 2 Example 7 Л = 6 4 4 12 8 1 3 5 Column 1 goes leads to C - 2 4 1 4 8 5 C. Column 2 ° “““3 & Matrix Multiplication C times R ._rn n tells how to produce the columns of A from t^cXTns column Of A is actually in C so the first column of R just has 1 and 0. The third column of A is also in C. so lhe third column of R just has 0 and 1. Rank 2 Notice I inside R 2 6 4 1 2 12 8 = 4 3 5 J 1 4 8 5 [ 1 ? 0 [ 0 ? 1 = СЛ (2) Two columns of A went straight into C, so part ofRis lhe identity matrix. The question marks are in column 2 because column 2 of A is not in C. It was not an independent column. Column 2 of A is 3 limes column 1. That number 3 goes into R. Then R shows bow to combine the two columns of C to get all three columns of the original A. A is m x n C is m x r R is r x n A = CR is ’2 6 4 4 12 8 1 3 5 = 2 4 4 8 1 5 f 1 3 ° [OOl This completes A = CR. The magic is now seen in the rows. All the rows of A come from the rows of R. This fact follows immediately from matrix multiplication CR: RowlofAis 2 (row 1 of Л) + 4 (row 2 of R) Row2ofAis 4(row 1 ofЯ) +8(row2ofR) Row3ofAis 1 (row 1 of 7?) + 5 (row 2 of R) I» O.I, г .«lopo— w „3 33,O in л combjne w 8ive of л 3 6 9 Multiply CR using rows of R Second example of A = CR from the front cover When a column of Л goes into C, a < 1 4 7 2 5 8 1 4 7 2 ’ 5 8 1 0 -1 0 1 2 (4) of R tells us how to produce the depend™? 8°es into R. The “free” column -1,2 tn . Column 3 of A is —j (14 7) >9 (9 r ° ^гот l^e independent columns Column j rfX - CUm. “ *>= <* * «Г Я c times column j of R. R • f ,w,of4 = rows of Ctimes R
1.4. Matrix Multiplication and A = CR 33 Question If all n columns of A are independent, then C = A. What matrix to Я ? Answer This case of n independent columns has R = I (identity matrix). The rank is n. How to find R. Start with t independent columns of A going into C. If column 3 of A = 2nd independent column in C, then column 3 of Я is = CR All three ranks = 2 Dependent: If column 4 of A « columns 1 + 2 of C, then column 4 of Я is J Я tells how to recover all columns of A from the independent columns in C. 1 2 3 4 1 _ [ 1 3 1 Г 1 2 0 11 1 2 4 5 j [ 1 4 ] I 0 0 1 1] Here is an informal proof that row rank of A equals column rank of A 1. The r columns of C are independent (by their construction) 2. Every column of A is a combination of those r columns of C (because A » CR) 3. The r rows of Я are independent (they contain the r by r matrix I) 4. Every row of A is a combination of those r rows of Я (because A = CR) Key facts The r columns of C are a basis for the column space of A: dimension r The r rows of Я are a basis for the row space of A: dimension r Those words “basis” and “dimension” are properly defined in Section 3.41 Section 3.2 will show how the same row matnx Я can be constructed directly from the "reduced row echelon form” of A, by deleting any zero rows. Chapter 1 starts with independent columns of A, placed in C. Chapter 3 starts with nows of A and combines them into R. We are emphasizing CR because both matrices are important. C contains r indepen- dent columns of А. Я tells how to combine those columns to give all columns of A. (Я contains I. when columns of A are already in C.) Chapter 3 will produce Я directly from A by elimination, the most used algorithm in computational mathematics. This will be the key to a fundamental problem: solving linear equations Ax = b Why is Matrix Multiplication AB Defined This Way ? The definition of AB was chosen to produce this crucial equation: (AB) times x is equal to A times Bx. This leads to the all-important law (AB)C = A(BC). We had no other reasonable choice for AB ! Linear algebra will use these laws over and over. Let me show in three steps why that crucial equation (AB)x = A(Bx) is correct: Bx is a combination xibi + x2bj + • • • + x»bn of the columns of B. Matrix-vector multiplication is linear: A(Bx) = xj Abj + xjAbj + • • • + xn(Abn). We want this to agree with (AB)x = xi(cohimn 1 of AB) + ••• + xn(column n of AB). Compare lines 2 and 3. Column I of AB absolutely must equal A times column 1 of B. This is our rule: When В = [ x у x ] the columns of AB are [ Ax Ay Az].
Chapter 1- Vectors and Matrices 34 (AB)x = A(Bx) to erase the parentheses. _[.;i ,-[•:] ЛВ‘ 1 AB: Example 8 A — When we show that (AB* ____ (AB)z = 4(Ях)-[3 4 J L 7 The parentheses don’t matter but the ВАС and ACВ almost always give differ» order ABC certainly does matter. The multiplications answers. In fact ВАС may be impossible. Columns of A times Rows of В . . in Wid this message. There is another way to multiply Before this chapter ends. always) This way is not so well known, but it - - - “ bi AB = b‘n columns ak rows bt columns a* times rows b’k (5) Those matrices akb[ are called outer products We recognize that they have rank one: column times row They are entirely different from dot products (rows times columns, also known as inner products). If A is an m by n matrix and В is an n by p matrix, adding columns times rows gives the same answer AB as rows times columns. Actually they involve the same mnp small multiplications but in a different order I (Row) -(Column) mp doc products, n multiplications each total mnp (Column) (Row) n rank one matrices, mp multiplications each total mnp Columns x Rows 1 4 [ 7 8 9 1 T [7 8 9] '4' [10 11 12] for A times В 2 5 [10 11 12] = 2 + 5 3 6 3 6 Rank 1 -bRank 1 7 14 21 16 24 9 18 a io 60 44 48' 47 52 57' 55 60 = 64 71 78 66 72 81 90 99 (6) 18 multholication* <ПЛ₽ = sUn °f second line you see the 3 * 3 ТЪСЛ 9 «“*«“ ₽*« ^e correct answer AB. Two independent rowr ' T)^ °f ,s 2- Г*° "dependent columns, not three, inverse matrix it is not imemhl s яТ” ChafXCr *111 use different words. AB has no • uaonmvmWc. And laterinthe book: The determinant of AB is zen>.
1.4. Matrix Multiplication and A = CR 35 Note about the “echelon matrices*’ R and Rq We were amazed to learn that the row matrix R in A - CR is already a famous matrix in linear algebra! It is essentially the “reduced row echelon form” of the original A. MATLAB calls it rref (A) and includes m — r zero rows. With the zero rows, we call it Ro The factorization A = CR is a big step in linear algebra. The Problem Set will look closely at the matrix R, its form is remarkable. R has the identity matrix in г columns. Then C multiplies each column of Я to produce a column of A. Ro comes in Chapter 3. Example 9 A = a, a2 3a, + 4a2 » la, a2 * ° ® = CR. Here a, and a? are the independent columns of A. The third column is dependent— a combination of a, and a?. Therefore it is in the plane produced by columns 1 and 2. Al) three matrices A, C, R have rank r = 2. We can try that new way (columns x rows) to quickly multiply CR in Example 9: ColumnsofC CR= , 31 + aj[0 1 41»(a, a2 3a,+4a3]-A times rows of Я 1 J • l i i • * j Four Ways to Multiply AB = C (Row i of A) * (Column fc of B) = Number Си, t = lto3 k=lto4 12 numbers A times (Column к of B) = Column к of C к- 1Ю4 4 columns (Row t of A) times В » Row i of C I = 1 to3 3 rows (Column j of A) (Row j of B) = Rank 1 Matrix j = lto2 2 matrices Problem Set 1.4 1 Construct this four-way table when A is m by n and В is n by p. How many dot products and columns and rows and rank one matrices go into AB ? In all four cases the total count of small multiplications is mnp. 2 If all columns of A = [ a a a ] contain the same a / 0. what are C and R ?
Chapter I. Vectors and Matrices 36 3 4 5 6 7 в а ю и 12 Multiply A bmes В (3 examples) using dot products: each row times each column ' " f4](1 2 7 10 0 1 I 0 I 1 О о I 0 5 6 1 6 1 -1 1J Test the truth of the associative _ 1 ivelaw (AB)C = A(BC). ill (b) 3 1 1 0 1 2 0 1 Why is it impossible for a maim A with 7 columns and 4 rows to have 5 independent columns ? This is not a trivia) or useless question. Going from left to right, put each column of A into the matrix C if that column is not a combination of earlier columns: [2-216 1 -1 3 -3 2 1 3 0 0 1 C = a = 0 2 0 6 Find R in Problem 6 so that A — CR. If your C has r columns, then R has r rows. The 5 columns of R tell how to produce the 5 columns of A from the columns in C. This matrix A has 3 independent columns. So C has the same 3 columns as A What is the 3 by 3 matrix R so that A CR? What is different about В ? [ 2 2 2’ Upper triangular A - | 0 4 4 0 6 2 2 2' 0 0 4 0 0 6 0 Suppose A is a random 4 by 4 matrix. The probability is 1 that the columns of A are “independent" In that case, what art the matricesC and R in A = CR? Note Random matrix theory has become an important pari of applied linear algebra— especially for very large matrices when even multiplication AB is too expensive. An example of “probability 1” is choosing two whole numbers at random. The probability is 1 that they are different. But they could be the same ! Problem 10 is another example of this type. Suppose A is a random 4 by 5 matrix. With probability 1, what can you say about C and Я in A = СЯ ? in particular, which columns of A (going into C) are probably independent of previous columns, going from left to right ? Г-7Г ^7^77** * * 4 * 4 * of rank r = 2. Then factor A into CR = (4 by 2)(2by 4). Factor these matrices into A-CflsfmbvrMrk--I n .. tm oy r) (r by n). all ranks equal to r. Г i n «1 »- A,- “11 ranks equal to r. o| A4=[1 0 0 4 l0 2 2 0 0 12 3 0 13 5
1.4. Matrix Multiplication and A = CR 37 13 Starting from C= and /? = [2 4 ] compute CR and RC and CRC and RCR. 14 Complete these 2 by 2 matrices to meet the requirements printed underneath: 3 6 1 Г 6 l Г 2 If3 4 I 5 7 L 3 6 J [ -3 ] rank one orthogonal columns rank 2 A2 = I 15 Suppose A = CR with independent columns in C and independent rows in R. Explain how each of these logical steps follows from A = CR = (m by r) (r by n). 1. Every column of A is a combination of columns of C. 2. Every row of A is a combination of rows of R. What combination is row I ? 3. The number of columns of C = the number of rows of R (needed for CR ?). 4. Column rank equals row rank. The number of independent columns of A equals the number of independent rows in A. 16 (a) The vectors ABx produce the column space of AB. Show why this vector ABx is also in the column space of A. (Is ABx = Ay for some vector у 7) Conclusion: The column space of A contains the column space of AB. (b) Choose nonzero matrices A and В so the column space of AB contains only the zero vector. This is the smallest possible column space. 17 True or false, with a reason (not easy): (а) ИЗ by 3 matrices A and В have rank 1. then AB will always have rank 1. (b) If 3 by 3 matrices A and В have rank 3, then AB will always have rank 3. (c) Suppose AB — BA for every 2 by 2 matrix B. Then A = [ £ j cl for some number c. Only those matrices A commute with every B. 1 2 ‘ 3 4 1 0 18 Example 6 in this section mentioned a special case of the law (AB)C = A(BC). A = C « exchange matrix (a) First compute AB (row exchange) and also BC (column exchange). (b) Now compute the double exchanges: (AB)C with rows first and A(BC) with columns first. Verify that those double exchanges produce the same ABC. 19 Test the column-row multiplication in equation (5) to find AB and BA: ’10 0' 111' 111' 10 0' AB = 1 1 0 0 1 1 BA = 0 1 1 1 1 0 1 1 1 0 0 1 0 0 1 1 1 1 20 How many small multiplications for (AB)C and A(BC) if those matrices have sizes ABC = (4 x 3)(3 x 2) (2 x 1)? That choice affects the operation count.
Chapter 1. Vectors and Matrices 38 Thoughts on Chapter 1 . a. .he author s thoughts. But a lot of decisions go into Most textbooks don’t have a place f<* _ jumped nght int0 the subject, with sorting a new textbook This chapter g(xxj ideas ahcaj and discussion *^Pcndc“" , Herc two questions that influenced the writing, time to absorb, so why not get started ' Mere are । ч What makes this subject easy? AH the equations are linear. What makes this subject hard? So many equations and unknowns and ideas. Book rumples are small size But if we want the temperature at many points of an engine, there is an equation at every point: easily n = 1000 unknowns. I believe the key is to work right away with matrices Ax = b is a perfect format to accept problems of all sizes. The linearity is built into the symbols Ax and the rule is A(x + y) « Ax + Ay. Each of the m equations in Ax “ b represents a flat surface: 2x + by - 4z - 6 isa plane in three-dimensional space 2z + 5y - 4z + 7w 9 is a 3D plane (hyperp lane ?) in four-dimensional space Linearity is on our side, but there is a serious problem in visualizing 10 planes meeting in 11-dimensional space. Hopefully they meet along a line: dimension 11 - 10 - 1. An 11th plane should cut through that line at one point (which solves all 11 equations). Whai the textbook and the notation must do is to keep the counting simple Here is what we expect for a random m by n matrix A: m < n Many solution» or no solutions to the m equations Ax = b man Probably one solution to the n equations Ax = b _ m > n Probably no solation. too many equations with only n unknowns in x Л ca" •* combinations of The rank r teUs us the real size of our nmht r CWT,blM'>on of previous equations. The beautiful formula is A - f в / from independent columns and rows. The same ,s true for every column of C П—r ° = A (Bc)' C Tberefore(ДВ)СвЛ(ВС)
2 Solving Linear Equations Ax = b 2.1 The Idea of Elimination 2.2 Elimination Matrices and Inverse Matrices 2.3 Matrix Computations and A = LU 2.4 Permutations and Transposes The matrices in this chapter are square: n by n. Then Ax b gives n equations (one from each row of A). Those equations have n unknowns in the vector x. Often but not always there is one solution x for each b. In this case A has an inverse A"1 with A“*A = I and AA~l = 1. Multiplying Ax = b by A-1 produces the symbolic solution sc = A**b. This chapter aims to find that solution x, but not by computing A~ *. (That would solve Ax = b for every possible b.) We go forward column by column, assuming that A has independent columns. We only stop if this proves wrong. At the end we have triangular matrices L, U and x is easy to find. Ax м b is a universal problem in science and engineering and every quantitative sub- ject. There might be n = 10 equations—this is already beyond hand calculations. Many problems have n " 1000 or more—and we certainly don't want to find A-*. What we do need is an efficient way to compute the solution vector x. Here is an idea that goes back thousands of years (to China). Each step of “elimination" will produce a zero in the matrix. The original A gradually changes into an upper triangular U. Half of this matrix will be zero. A simple elimination matrix Etj produces one zero where row i meets column j. This is not exciting, it is just the natural way to simplify A. To describe all these steps we need matrices. This is the point of linear algebra! There are elimination matrices like E to reach U. And we multiply U by an inverse matrix I s f'1 to come back to A. Here are key matrices in this chapter of the book: Coefficient matrix A Elimination matrix Ey Permutation matrix P Upper triangular U Overall elimination E Transpose matrix AT Lower triangular L Inverse matrix A-1 Symmetric matrix S = Sr Our goal is to explain the elimination steps from A to EA = U to A = E — LU. (If the steps fail, this signals that Ax = b has no solution.) Every computer system has a code to find U and then x. Those codes are used so often that elimination adds up to the greatest cost in all of scientific computing. But the codes are highly engineered and we don’t know a better way to find x. 39
2. Solving Linear Equations Ax = b 40 2.1 The Idea of Elimination_______________________________ /1 Elimination subtracts tit times row ; from row i. Io turn А,у into zero. 2 Ax = bbecomes Ux = c(orelse Ax = b is proved to have no solution). 3 Ux = c is solved by back substitution and possible row exchanges. This chapter explains a systematic way lo solve Ax = b: n equations for n unknowns. The n by n matrix A is given and the n by 1 column vector b is given. There may be no vector x « (xi, xj........x.) that solves Ax = b, or there may be exactly one solution, or there may be infinitely many solution vectors x. Our job is lo decide among these three possibilities and to find all solutions Here are examples with n = 2. 1 2 3 Exactly one solution to Ax = b. In this case A has independent columns. The rank of A is 2. The only solution to Ax * 0 is x * 0. A has an inverse matrix A" *. The best case has a square matrix A (m = n) with independent columns. Then there is one solution x (one combination of lhe columns of A) for every vector b. E^P»* whh one .elution («,»)» (1,1) 2x + 3p - 5 Independent columns (2,4) and (3,2) 4т + 2y = 6 In other . in ib Л s*сж* b '* no* * combination of the columns of A. <n other word, b >. mx in the column qiace of A. The rank of A i, 1. Example with no solution _ „ Dependent column. (2,4) and (3,6) £ “ ° Subtract 2 times the first equation from ib V ° «I atmn from lhe second to get 0 - 3. No solution ^ert will be Infinitely many solution. a. a v - independent This is the meaning of d * = ° Whcn tlle columns of Л arc not 'he zero vector b > 0 Every cX co*un,ns ,П;||’У ways to produce . S,Tes Л'сД)»0. If there is one solution to Ax - A All the vector. z + cX solve Ihexam/" ** *“ Wy K>lu,ion ,o AX = 0 д, e*,ua,lons- so we have many solutions. ***"*~?*^ Ь+W- e m be added to the ro ° T**n = (6 -4) |U* ** more •options because r^to^^'*roluuonx L;0 2 JZ!VM = 0. AH vectors + ^^'^“ce-noresoludons: ""'ofso/urt^ w,rh'weqi<ariOTIJ Ax = b.
2.1 The Iilea of III ruination 41 This chapter will give a systematic way lo decide between those possibilities 1,2.3 One solution. no solution, infinitely nun у solutions this system IS called rllmtiiHlion It simplifies the matrix Л without changing any volution x to the equation lx b We do the same operations to both sides of the equation, and those opeiations ate reversible Elimination keeps all solutions x and creates no new ones Let me show you the ideal case, when elimination produces an uppet liiangulai matrix That matrix is culled //. Then /lx b leads lo Cx - c. which we easily solve Elimination reaches U Back substitution finds x 2 3 I О П 0 II () 7 Н» 17 II r That letter U stands for upper triangular The matrix has all zeros below its diagonal Highly important: The “pivots" 2. Г>. 7 on that main diagonal are not zero Then we can solve the equations by going from bottom lo lop ftrul t ।thru i г thru 1। Back substitution Work upwards Upwards again Conclusion Special note The last equation 7r । - 11 gives x* « 2 The next equation 'ix, + fit2) - 17 gives xj • I The first equation 2xt 4-3(1) ♦ 4(2) 19 gives X| I The only solution lo this example Гх cisx -(4.1.2). In solving for X|, x3. xj we needed to divide by live pivots 2, A, 7, These pivots were probably not on the diagonal of the original matrix A (which we haven’t seen). The pivots 2,5,7 were discovered when "elimination" produced lire lower triangular zeros in U. This crucial step from A to U I» still to be csplumed I Note We would not allow the number zero to be a pivot That would destroy our plnn because an equation like Oxi = 2orQrj « 5or0rj • Я has no solution Bink substitution will break down with a zero in any pivot position (on the diagonal of ft) the test lot independent columns (when Ca A in Chapter 1. and R*l, and Л = (‘H becomes I Л11 is n nonzero pivots. Every square matrix Л with independent columns (full rank) can he reduced lo a Irian gular matrix U with nonzero pivots. This is our job. It is possible that wc may need to put the equations Лх = b in a different order. We start with the usual case when elimination goes from A to U and back substitution finds the one and only solution vector r to Лх = b
Chapter 2. Solving Linear Equations Ax = A 42 Elimination in Each Column First comes . matrix A (independent columns) that will require no row exchanges. ’ 2 3 4 ' The starting matrix is A д _ 4 Ц 14 . The first pivot is 2 2 8 17 The first pivot row is [ 2 3 4 ]. Multiply that row by 2 and subtract from row 2: First step: Eji^is The multiplier was 4/2 - 2 ‘2 3 4 0 5 6 2 8 17 (3) This produced the desired zero in column 1. To produce another zero, we subtract row 1 from row 3. This completes elimination in column 1: Second step : The multiplier was 2/2 = 1 2 3 4 ‘ 0 5 6 0 5 13 (4) 2 and ™ 2 * * (lhe lecond pivo< row) The Piv0‘« 5- on ‘he diagonal. To eliminate the 5 below it, multiply row 2 by the number 1 and subtract from row 3 Phial : E31 EjiEhA is triangular The multiplier was 5/5 и 1 U = 3 5 0 4 6 7 (5) 2 0 0 on i» dnjuX wTk'nT» ”h« tai Fo™ml i complete Since U had 2.5,7 ongmal 4 were independent, as we will «ее/т^СП1 (“nd ,hereforc ,hc columns of the Wr can summarize the elimination tt, l ma,r*ce8 a”d U have full rank 3. -------------------------- 1 и fn no row exchanges are involved. Use the first equation Г U« the new second equation to Um" ’ be’°W ‘hC P'V°‘' Continue to column 3. The exoe t7 °U' ? Ь'1°* Р'Ш 2 TOW 2 L ~~-----------------------CC resull,san upper triangular matrix U. Elimination on A produces U The UXXT-X n"" ,id" appl'ed» "» "8"' ** " *"d - c (equivalent lo the old uon Th« gives the solution x.
2.1. Ilic Idea of Iiliininalк>n 43 Possible Breakdown of Elimination Elimination can fail. Zero <«« appear in a pivot position Subliiu ting that /cm from lower rows will not clear out the column below the unwanted zero Here is an example Zero In pivot 2 from 2 ’* J 2 1 ... Л . i . A«4 0l4-»0 0fl II elimination In column 1 i Mil II Ij I •> The cure is simple if it works. Exchange row 2 with the zero for row .4 with the A. Then the second pivot is 5 and we can clear out the second column lx-low that plvol So elimination continues as normal after the row exchange by the matrix Row exchange PH - I (It»' 0 0 I 0 I 0 3 4 1 Г 2 0 fl - 0 5 13 0 3 4 A 13 0 0 - u. 2 0 0 For this small example, the row exchange is all we need It produced lhe upper triangular U with nonzero pivots 2,5,6. Normally there are more columns anil rows lo work on. before we reach U. Caution! That row exchange was a success. This is what we hope for, lo reach U with no zeros on its main diagonal. (The pivots 2,5,0 are on the diagonal.) But a slightly different matrix Л* would lead to a bad situation: no pivot Is available In column 2 A* = 4 3 fl 3 14 17 (J 0 3 4 ' 0 fl 0 13 2 2 4 2 - V At this point elimination is helpless in column 2. No pivot is available. This misfortune tells us that the matrix A* did not have full rank. Column 2 of U* is in lhe same direction as column 1 of U‘. So column 2 of A* is in the same direction as column I of A’ You see how dependent columns are systematically identified by elimination There are nonzero solutions X to A’X » 0. The columns are not independent. This example has column 2 = | column 1. The solution vector X is (3. 2,0), The equation A'x = b may or may not be solvable, depending on b'. probably not Dependent or Independent Column* This A* looks like a failure of elimination: No second pivot. But it was a success because the problem was identified: dependent columns. The beauty of aiming for a triangular matrix U or U* is that the diagonal entries tell us everything. A triangular matrix U has full rank exactly when its main diagonal has no zeros. In that case (square matrix with nonzero pivots) the columns of U are independent. Also the rows are independent. We can see this directly because elimination has simplified the original A to the triangular U. How do we know that a zero on the diagonal ofU* leads to dependent columns 7
Chapter 1 Solving Linear Equations Ax = b 44 = upper triangular with an extra zero on its diagonal 0 0 0 * 1. The first three columns are dependent. 2. The last two rows are dependent. The Row Picture and the Column Picture The next pictures will show those three possibilities for Ax = b : No solution or one solution or infinitely many solutions. There are two ways to see this. We start with the rows of A and we graph the two equations: the row picture. r-2jt = -l z —2y = l Figure 2.1: Parallel lines mean no solution. Top line twice means many solutions. Intersecting lines give one solution. The solution is where the lines meet. If we had three equations for z.y, and z, those two lines would change to three planes. Each plane like 2x + + 3z = 9 would be in 3-dimensiona) space. This row picture becomes hard to draw The column picture is much easier in three or more dimensions. The column picture just shows column vectors: columns of A and also the vector b. We are not looking foe points where these vectors meet The goal of Ax = b is to combine lhe columns of A so as to produce the vector b This is always possible when the columns of A (n vectors in n-dimensional space) are mdependent Then the column space of A contains all vectors b in R". There is exactly one combination Ax of the columns that equals b Elimination finds that x The columns of A are independent Column 1 + Column 2 = b Then the solution is x = 1, у — 1 Construct b from the columns I combination Az of the columns of A.
2.1. The Idea of Elimination 45 Examples of Elimination and Permutation This chapter will go on to express the whole process using matrices. An elimination matrix E will act on A. In case zero appears in a pivot position. a permutation matnx P is needed The result is an upper triangular U and a new right hand side c. Then Ux = c is solved by back substitution. In reality a computer takes those steps (x = A\b in MATLAB». But it is good to solve a few examples—not too many—by hand. Then you sec lhe steps to Ux = c and not only the solution X. This page contains a variety of examples, hopefully lo show the way. 2 4 -2 4 9 2 0 -3 7 Those elimination steps Ел and E3i and Ek produced zeros in positions (2,1) and (3.1) and (3,2). The matrices E have —2 and +1 and —1 in those positions. The same steps must be applied to the right hand side b, to keep the equations correct. ’ 2 8 —> Eli b = 10 2' 4 —»E31E2ib = 10 ’ 2'1 4 —♦ E32E31 Ел b = Eb = c = 12 2 4 8 There is a better way to make sure that every operation on lhe matrix A (left side of equations) is also executed on b (right side of equations). The good way is to include b as an extra column with A. The combination [A b ] is called an augmented matrix. и 'i- (7) Now we include an example that requires a permutation matrix P. It will exchange equations and avoid zero in the pivot. This example needs P in column 2. Exchange rows 2 and 3 In the final description PA = LU of elimination on A. all the E’s will be moved to the right side. Each matrix in Еи^з1^и в inverted. Those inverses come in reverse order in L = E^1 E^i Ей'. The overall equation is PA = LU. Often no permutations are needed and elimination produces A = LU: the best equation of all. That permutation P23 exchanged rows 2 and 3 when it was needed to avoid a zero pivot But we could have exchanged rows 2 and 3 at the start. (Then Ел and £jt have to change places) Section 2.4 will return to understand all the possible permutations of n rows. There are n I possible matrices P. including P = I for norvw exchanges.
гъдмег 1 Solving Linear Equations Ax = b 46 Problem Set 2.1 Problems 1-10 are about riiminauoo on 2 by 2 systems. 1 2 3 6 Whar multiple in of equation 1 should be subtracted from equation 2 ? 2x + 3y = 1 lQj + 9y = ll. After elimination. write down the upper triangular system and circle the two pivots. Use back substitution to find X and у (and check that solution). It equation I is »AVri to equation 2, which of these are changed: the planes in the row picture, the vectors in lhe column picture, the coefficient matrix, the solution ? What multiple of equation 1 should be wbtrocted from equation 2 ? 2x - 4y - 6 -X + 5y = 0. After this elimination step, solve lhe triangular system. If the right side changes to (-6,0). what is the new solution? What multiple / of equation I should be subtracted from equation 2 to remove c? ax + by = j a + dy = g. The first pivot is a (assumed nonzero) Elimination produces what formula for the second pivot ? What is у 7 The second pivot is missing when ad-be: singular. Choose a nght side which gives no solution and another right side which gives infinitely many solutions What arc two of those solutions 7 Singular system 3x + 2y=|0 and 6x + 4g= 2“ “**'•• ни» « nai maxes it solvable Fmd (wo solutions in thal singular case. b + 6gs 16 Cr + 8y = g. <u + 3y = -3 Cr + бу = g For which three numbers 1- docs cilm exchange .’ Is the number of solutions n^T d°Wn ? Which ’» fixed by a row Ь + Зу« б
2.1. I he l<k«i Я HiiiiidjIi'M 47 9 What ICM Oil 6| jiuj (q decides Wilcdxi tllCSC I WO КЦииШть ell'/» U WvluUoO? Ho», шилу solutions will they have? Praw Ox column риши» for b - (| 2) w-4 H 4) if «• 2y = hi 4f 4 у - by. 10 Draw the lines j 4 // ; fj and t t 2</ - <> and die equation у - that comes from elimination Which line 5i 4(f < goes through t|n solution of lh*x equations ? Problems 11-20 study elimination on J by J systems (and poe&ible failure). 11 (Recommended) A system of linear equations c an't have exactly two solutions Why 7 (a) If (z, у, t) and (X, Y. Z] ate two solution», wliai is another solution? (b) If 25 planes meet al two points, where else do they meet? 12 Reduce to upper triangular form by row operations Then lind z, и 2x + Зр + г — И 2f - Uy - 3 4z + 7у + ht 20 4ж - by f < “ 7 - 2y + 2г « 0 2x - у - 'it « 5 13 Which number d forces a row exchange, and what is the li (angular system (not sin gular) for that </? Which d makes this system singular < no thud pivot) ? 2f + 5p + г = 0 4z + dy + x « 2 U - z “ 3. 14 Which number b leads later lo a row exchange? Which b leads to a missing pivot 7 In that singular case find a nonzero solution z, y, t. x + by -0 x - 2 у - x = 0 y + x = 0. 15 (a) Construct a 3 by 3 system that needs 2 row exchanges to become triangular (b) Construct a system that needs a row exchange and breaks down later 16 If rows I and 2 are the same, how far can you get with elimination (allowing row exchange)? If columns I and 2 are the same, which pivot is missing? Equal 2z - у + г - 0 2x + 2y + z = 0 Equal rows 2x - у 4- z = 0 4z + 4y + z = 0 columns 4z + у + x = 2 6x + 6y + z « 2. 17 Construct a 3 by 3 example that has 9 different coefficients on the left side, but rows 2 and 3 become zero in elimination. How many solutions to your system with b = (1,10,100) and how many with b = (0,0,0)?
аир» г Solving Linear Equations Ax = b 48 18 Which number q makes this system smguu many solutions? Find the solution that has side t gives it infinitely 3y + flZ = t 19 20 For which two numbers a will elimination fail on A = (j’j? For which three numbers a will elimination fail to give three pivots? Гп 2 3] • - —* - 4 is singular for three values of a. tai row sums 4 and 8, and column sums 2 and a: . Find two matrices with the Look for a matrix that The four equations are solvable only if t = _______ correct row and column sums. Write down the 4 by 4 system Ax = b with x = (a, b, c, d) and make A triangular by elimination Matnx a +1 = I а + c = 2 4 equations e + d « 8 b + d = i 4 unknowns 22 Create a MATLAB command A(2. ;) ... for the new row 2. to subtract 3 times row 1 from the existing row 2 if the 3 by 3 matrix A is already known. 23 Find experimentally the average 1 a and 2nd and 3rd pivot sizes from MATLAB s [L.17] ta(raad(3)) with random entnes between 0 and 1. The average of 1/(1,1) is above | because hi picks the largest pivot in column 1. 24 If the last corner entry is A(5,5) = 11 and lhe last pivot of A is 17(5,5) = 4. what different entry A(5,5) would have made A singular ? 25 Suppose elimination takes A to U without row exchanges. Then row j of U is « combination of which rows of A? If Ax = 0, is Ux = 0? If Ax = b. is Ux = b? If A starts out lower triangular, what is the upper triangular 17? 26 Start with 100 equations Az » 0 for 100 unknowns x = (zj,..., z10q). Suppose elimination reduces the 100th equation lo 0 = 0, so the system is “singular". (a) sywems Az = 0 have infinitely many solutions. This means that some linear combination of the 100 column, of A is__________ (b) Invent a 100 by 100 singular matnx with no zero entries. (c) Describe in words the row picture and column picture of your Ax = 0.
2.2. Elimination Malnccs and Inverse Matrices 49 2.2 Elimination Matrices and Inverse Matrices Elimination multiplies /I by Ел.En\ lhen Ей,..., En2,... as .4 becomes EA = I/. 2 In reverse order the inverses of the E’s multiply U lo recover A = E~lU. This is LU. ^3 A-1 A = / and (LU)~l = U~lL~l. Then Ax = b becomes ж — A' 'b = U~'L~'b All the steps of elimination can be done with matrices. Those steps can also be undone (inverted) with matrices. For a 3 by 3 matrix we can write out each step in detail—almost word for word. But for real applications, matrices are a much better way. The basic elimination step is to subtract a multiple ft) of equation j from equation t. We always speak about subtractions as elimination proceeds. If the first pivot is atl = 3 and below it is ал “ —3. we could just add equation 1 to equation 2. Thai produces zero. But we stay with subtraction: subtract Gt = -1 times equation 1 from equation 2. Same result. The inverse step is addition. Compare equation (10) to (I I) to see it all. Here is the matrix that subtracts 2 times row 1 from row 3: Rows 1 and 2 stay lhe same. Elimination matrix E,. _ Em = 0 10 Row 3, column 1, multiplier 2 q । If no row exchanges are needed, then three elimination matrices Ell *nd Em and Era will produce three zeros below the diagonal. This changes A to the triangular U: A is 3 by 3 U is upper triangular E33E31 Ej>A = U (I) The number is affected by the fai and f31 that came first We subtract times row 2ofU (the final second row. not the original second row of A). This is the step that produces zero in row 3. column 2 of U. E3? gives the last step of 3 by 3 elimination. Example 1 En and Em subtract multiples of row 1 from rows 2 and 3 of A: 10 0' 1 0 O' 3 1 O' 3 1 o' two new EmEji A = 0 1 0 1 1 0 -3 1 1 = 0 2 1 zeros in (2) -2 0 1 0 0 1 6 8 4 0 6 4 column 1 To produce a zero in column 2. En subtracts 132 = 3 times the new row 2 from row 3: 1 0 0 3 1 o' 3 1 o' U has zeros (Ем)(Е31Е31А) = 0 1 0 0 2 1 = 0 2 1 = U below the (3) 0 -3 1 0 6 4 0 0 1 main diagonal Notice again: En is subtracting 3 times the row 0,2,1 and not the original row of A. At the end. the pivots 3,2,1 are on the main diagonal of U: zeros below. Example 4 will show the "inverse” of each elimination matrix Ey. This leads to the inverse of their product E = E»Em£m- That inverse of E is special. We call it L.
nnptrf 2. Solving Linear Equations Az - Inverse Matrices 50 „ a- Wr kwi for ш'кгеяеждХгьг'Л'1 of the satne size, such Suppose .4 is a square matrix. we кхж io> . . . th^1 times A equals I Whatever A does. A’* undoes Their product is the identity mamx—which does nothing to a vector, so A”* Az = X- But A~ mighlnot «Ш. The square matrix A needs independent columns to be invertible. Then A1 A = /. What a mams mostly does is to multiply a vector. Multiplying Ax = b by A~* gives A~lAx A"1*. Thu h x = A"1*- The product A 1A is like multiplying by a number and then dividing by thai number Numbers have inverses if they arc not zero. Matrices arc more complicated and more interesting The matrix A"1 is called "A inverse." DEFINITION The matrix A is invertible if there exists a matrix A 1 that “inverts" A: Two-sided inverse A-1 A = I and A A1 = /. (4) Not all matrices have invenn This is the first question we ask about a square matrix: Is A invertible? Its columns must be independent We don’t mean that we actually calculate A’1. In most problems we never compute it! Here are seven “notes" about A-1. Note I The inverse exists if and only if elimination produces n pivots (row exchanges are allowed). Elimination solves Ax = b without explicitly using the matrix A"'. 2 a^tn*A c“n0‘ *“** ,wo Afferent inverses Suppose BA = I and also AC « I Then В = C. according to this "proof by parentheses": 0(AC)-(SA)C gives BI-IC or В-C. (5) (multiplying А ' mu^JPb’’ng A from the left) and a rightinverse C oiupiymg A from the ngta to give AC - I) must be the some matrix Then ж = А-‘Ах A has dependent columns, ft fonno. h“ ’U"'''fro rector x such that Ax = 0. Then If A is invertible, then Ax - n ' N° П“‘Г’Х bring ° b“* *° *' N«“ , “'’"'““'“""«independen.. (6) ** " 01 1 lc и r,nunanto(A Amar' ” Pivots is usually decial.'nvert’^e *ts determinant is not 1 before the determinant appears-
2.2. Elimination Malnc.es and Imerse Мжпсс* 51 Note 7 A triangular matrix has an unenc pros ided no diagonal entries d, are zero: If A = f1 с e о о о • X о • x X X X X к I then A"1 = 1/dj XX X 0 • x x 0 0 • x 0 0 0 1/dn Example 2 The 2 by 2 matnx A = } J is nor invertible, it fails the test in Note G. because ad = be. it also fails the test in Note 4. because Ax « 0 when x — (2, — 1). It fails to have two pivots as required by Note 1. its columns arc dependent Elimination turns the second row of this matnx A into a zero row. No pivot Example 3 Three of these matrices are invertible, and three arc singular Find the im erve when it exists. Give reasons for noninvertibihty (zero determinant. too few pivots, nonzero solution to Ax = 0) for the other three The matrices are in the order A.B.C.D.S.T: A is not invertible because its determinant is 4 • 6 — 3 • 8 • 24 - 24 • 0. D is not invertible because there is only one pivot; the second row becomes zero when the first row is subtracted T has two equal rows (and the second column minus ihe first column is zero), in other words Tx » 0 has the nonzero solution x = (-1.1.0). Not invertible. The Inverse of a Product AB For two nonzero numbers a and 6. the sum a + b might or might not be invertible The numbers a = 3 and b = -3 have inverses | and —Their sum a + b = 0 has no inverse. But lhe product ab = -9 does have an inverse, which is | times For matrices A and B, the situation is similar. Their product AB has an inverse if and only if A and В are separately invertible (and the same size). The important point is that A-1 and B~l come in reverse order: If A and В are invertible (same size) then the inverse of AB is (AB)-1 = B-'A-1 (АВ)(В-'А~1) = A IA~' = AA-1 = / (7)
52 CMptrr г Solving Linear Equations Ax = b м BB-'«. Si«l«1y В-Л-' limes ЛВ equsls I. 0-1 а-1 . basic гак of mathematics: Inverses come m reverse order. й ‘ 1 if vou put 00 socks and then shoes, the first to be taken off It is also common sense, u sou рш vu aretbe_____. The same reverse order applies to three or more matrices: (ABC)'1 = (8) Revtrse order Example 4 Inverse of ал ehmumtun matrix If E subtracts 5 times row 1 from row 2, then E"1 adds 5 times row 1 to row 2: E subtracts E-'adds E 1 0 O' 5 1 0 0 0 1 Multiply EE 1 to get the identity matrix /. Also multiply E~lE to get /. We are adding and subtracting the same 5 times row 1. If AC-I then automatically CA = I. For square matrices, an inverse on one tide it automatically an inverse on the other side. Example 5 Suppose F subtracts 4 times row 2 from row 3, and F~1 adds it back: 0 0 1 1 0 o' 0 1 о 0 4 1 roZ^EHFE A,l° ти|‘Ф'У E'1 I,mcs F 0 1 0 F£« -s 1 20 -4 1 0 O' 5 1 0 ° 4 !. inverse doesn't. order pf wbtracts 4 times the new row 2 (changed W an effect of size 20 from row 1. First F"1 adds 4 times row 2 to _____________________________________ 1 ло rffectfrom row 1. •be triangular U to the original A. ,a triangular L: Equation (11) below. (9) In the order £-1^-1 л _ .............................................................. 4 umcs row £ и/ ge again In this order E'tp-i ^сгс is no 20, because row 3 doesn’t ^“bwhywedxxne A 1n’ ' 'ffM fnm L "”,,tlpl‘CT' f*“ -mo Place'Л,10 the tnamtular U th* ^ein.l A. 1 is special.
2.2. Elimination Matrices and Inverse Matrices 53 L is the Inverse of E E is the product of all the elimination matrices Et), taking tn from A to its upper triangular form EA = U. We arc assuming for now that no row exchanges arc involved (thus P = /). The difficulty with E is that multiplying all the separate elimination steps Et) docs not produce a good formula. But the inverse matrix E~* becomes beautiful when wc multiply the inverse steps E~^. Remember that those steps come in the opposite order. With n = 3, the complication for E = EyjEsi Ell is in the bottom comer: .(10) 1 Watch how that confusion disappears for E 1 = L. Reverse order is the good way: E"‘ = ’ 1 61 1 ’ 1 0 1 1 0 1 = ’ 1 4i 1 I. (ID 0 0 >. 4» 0 1 0 1 4i 4a 1 All the multipliers 1ц appear in their correct positions in L. The next section w ill show that this remains true for all matrix sizes. Then EA = U becomes A = LU. Equation (11) is the key to this chapter: each t„ in its place. Problem Set 2.2 Problems 1-11 art about elimination matrices. 1 Write down the 3 by 3 matrices that produce these elimination steps: (a) Eh subtracts 5 times row 1 from row 2. (b) Ем subtracts -7 times row 2 from row 3. (с) P exchanges rows 1 and 2. then rows 2 and 3. 2 In Problem 1. applying E31 and then Ем «о b = (1,0,0) gives ЕмЕ^Ь =___________. Applying E32 before Eji gives ЕцЕ^Ь = . When Ejj comes first, row_______feels no effect from row____. 3 Which three matrices Eii, Ем. Em put Л into triangular form U ? Multiply those Es to get one elimination matrix E. What is E-1 = L?
Chapter 1 Solving Linear Equation* Ax « (> 54 4 5 6 7 8 9 10 11 12 13 14 IS n n 01 m a fourth column tn Problem 3 to produce | A b]. Carry out Include b - (1< Ou)*5 matrix to solve Ax = b. the elimination steps on this augmented m 7 MW1 the third pivot is 5. if you change nM to 11. the third pivot is _____И you unangc ujj --------- If every column of A is a muluple of (1.1.1). then Ax is always a multiple of (1,1,1). Do a 3 by 3 example. How many pivots are produced by elimination? Suppose E subtracts 7 times row 1 from row 3. (a) To invert that step you should_7 times row---------to row----------- (b) What “inverse matrix" E~1 takes that reverse step (so E~1E = IYf (c) If the reverse step is applied first (and then E) show that EE~1 - /. The determinant of M = [; *] is det Af = ad - be. Subtract f times row 1 from row 2 to produce a new Af*. Show that detU‘ = det Af for every t. When t • c/a, the product af pnoa equals the determinant: (a}(d - fb) equals ad - be. (a) Eji subtracts row I from row 2 and then Ри exchanges rows 2 and 3. What matnx Af = РпЕц does both steps at once? (b) Pn exchanges rows 2 and 3 and then Ejt subtracts row I from row 3. What rrutn* = ^lPa *« h«h «П» M once? Explain why the Af s are the tame but the Es art different hat matrix addt row 1 to row 3 and at the samf llme row 3 to row । 7 (b) row ) to row 3 and rben adds row 3 to row I? Create a matrix that has an «n^ =. .... phots without row exchanges (The 2 C ,mina,,on produces two negative Fw‘he*“perm«a6on matnets" . nd P by trial and error (with 1 's and 0’s): _ [0 0 1] 0 1 01 and P = 0 0 11 • 1 0 0 L Pe 0 1 о .1 0 0 “lurnn (t,z) of A-«. Check АЛ'1. t—- •'’’J 1*1 I-1 Find an upper triangular U (j»t diagonal) with lP = I. Then U-1 e (a) If A ts urvernble and AB = AC. prove quickly that В = C- (b) И А = Щ]. find two differ------------
2.2. Elimination Matrices and Inverse Matrices   55

16  (Important) If A has row 1 + row 2 = row 3, show that A is not invertible:
    (a) Explain why Ax = (0, 0, 1) cannot have a solution. Add equation 1 + equation 2.
    (b) Which right sides (b1, b2, b3) might allow a solution to Ax = b?
    (c) In the elimination process, what happens to equation 3?

17  If A has column 1 + column 2 = column 3, show that A is not invertible:
    (a) Find a nonzero solution x to Ax = 0. The matrix is 3 by 3.
    (b) Elimination keeps column 1 + column 2 = column 3. Explain why there is no third pivot.

18  Suppose A is invertible and you exchange its first two rows to reach B. Is the new matrix B invertible? How would you find B^{-1} from A^{-1}?

19  (a) Find invertible matrices A and B such that A + B is not invertible.
    (b) Find singular matrices A and B such that A + B is invertible.

20  If the product C = AB is invertible (A and B are square) then A is invertible. Find a formula for A^{-1} that involves C^{-1} and B.

21  If the product M = ABC of three square matrices is invertible, then B is invertible. (So are A and C.) Find a formula for B^{-1} that involves M^{-1} and A and C.

22  If you add row 1 of A to row 2 to get B, how do you find B^{-1} from A^{-1}?

23  Prove that a matrix with a column of zeros cannot have an inverse.

24  Multiply [ a b ; c d ] times [ d -b ; -c a ]. What is the inverse of each matrix if ad ≠ bc?

25  (a) What 3 by 3 matrix E has the same effect as these three steps? Subtract row 1 from row 2, subtract row 1 from row 3, then subtract row 2 from row 3.
    (b) What single matrix L has the same effect as these three reverse steps? Add row 2 to row 3, add row 1 to row 3, then add row 1 to row 2.

26  If B is the inverse of A², show that AB is the inverse of A.

27  Show that A = 4*eye(4) - ones(4,4) is not invertible: Multiply A*ones(4,1).

28  There are sixteen 2 by 2 matrices whose entries are 1's and 0's. How many of them are invertible?

29  Change I into A^{-1} as elimination reduces A to I (the Gauss-Jordan idea).

30  Could a 4 by 4 matrix A be invertible if every row contains the numbers 0, 1, 2, 3 in some order? What if every row of B contains 0, 1, 2, -3 in some order?
П^рит г Solving Linear Equations Ax = Ь 2 1 11 2 "J Л=121 and -1 2 1 1 1 2j l-l "I 2J Use Gauss-Jordan elimination on |U /] to find the upper triangular I/-1: UWl-I 0 O' 1 0 0 1 1 a b 0 1 c 0 0 1 True or false (with a counterexample if false and a reason if true): (a) A 4 by 4 matnx with a row of zeros is not invertible. (b) Every matnx with l's down the main diagonal is invertible. (c) If A is invertible then A-1 and A2 are invertible. (Recommended) Prove that A is invertible ifo#Oanda#b (find the pivots or A"*). Then find three numbers c so that C is not invertible: a b b a a b a a a 2 c c c 8 7 c c c C = ‘ Wdtaj^do.on IЛ 1|. Exlend “permutation matnccs" Show th» p / but in any order. They are “™»> Ky Ml 1 Леи block meinee»; 00 • 3 by 3 rneinx ml) you if л jj jnvenibk ?
2.3. Matnx Computation and A = LU 57 2.3 Matrix Computations and A = LU 1 The elimination steps from Л to U cost |n3 multiplications and subtractions. 2 Each right side b costs only n3: forward to Ux = c. then back-substitution for x. 3 Elimination without row exchanges factors A into LU (two proofs of Л « LU}. How would you compute the inverse of an n by n matnx Л ? Before answering that ques- tion I have to ask: Do you really want to know Л * 1 ? It is true that the solution to Ax = b (which we do want) is given by x = Л“’Ь. Computing Л-’ and multiplying Л"'Ь is a very slow way to find x. We should understand Л-* even if we don’t use it. Here is a simple idea for Л-*. That matnx is the solution to ЛЛ-’ = I. The identity matrix has n columns ei.ej,Then ЛЛ_| = I is really n equations Лх* e* for the n columns хц of A~ *. We have three equations if the matrices are 3 by 3: We are solving n equations and they have the same coefficient matnx A. So we can solve them together. This is called “Gauss-Jordan elimination”. Instead of a matrix [ A b j augmented by one right hand side b, we have a matrix [ A I I augmented by n right hand sides (the columns of /). And elimination produces [ I Л*1 ]. The key point is that the elimination steps on A only have to be done once. The same steps are applied to the right band side—but now АЛ_| = I has n right hand sides. The n solutions x, to Ax, = e, jo into the n columns of A*1. Then Gauss-Jordan takes [ A I ] into [ I A-1 ]. Here elimination is multiplication by A-1. In this example A subtracts rows and A-1 adds. This is linear algebra's version of the Fundamental Theorem of Calculus: Derivative of integral of f(z) equals f(x). The Cost of Elimination A very practical question is cost—or computing time. We can solve 1000 equations on a PC. What if n = 100,000? (Is A dense or sparse?) Large systems come up all the time in scientific computing, where a three-dimensional problem can easily lead to a million unknowns. We can let the calculation run overnight, but we can’t leave it for 100 years.
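A rough timing experiment makes the point. This sketch uses NumPy's dense solver; the factor of about 8 when n doubles is what the operation count below predicts, though the actual times depend on the computer and the library:

    import time
    import numpy as np

    rng = np.random.default_rng(0)
    for n in (500, 1000, 2000):              # doubling n should multiply the time by about 8
        A = rng.standard_normal((n, n))      # a full (dense) random matrix
        b = rng.standard_normal(n)
        start = time.perf_counter()
        x = np.linalg.solve(A, b)            # elimination: about n**3/3 multiplications
        print(n, round(time.perf_counter() - start, 4), "seconds")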
58   Chapter 2. Solving Linear Equations Ax = b

The first stage of elimination produces zeros below the first pivot in column 1. To find each new entry below the pivot requires one multiplication and one subtraction. We will count this first stage as n² multiplications and n² subtractions. It is actually less, n² - n, because row 1 does not change.

The next stage clears out the second column below the second pivot. The working matrix is now of size n - 1. Estimate this stage by (n - 1)² multiplications and subtractions. The matrices are getting smaller as elimination goes forward. The rough count to reach U is the sum of squares n² + (n - 1)² + ··· + 2² + 1².

There is an exact formula (1/3)n(n + 1/2)(n + 1) for this sum of squares. When n is large, the 1/2 and the 1 are not important. The number that matters is (1/3)n³. The sum of squares is like the integral of x²! The integral from 0 to n is (1/3)n³:

Elimination on A requires about (1/3)n³ multiplications and (1/3)n³ subtractions.

What about the right side b? Going forward, we subtract multiples of b1 from the lower components b2, ..., bn. This is n - 1 steps. The second stage takes only n - 2 steps, because b1 is not involved. The last stage of forward elimination (b to c) takes one step.

Now start back substitution. Computing xn uses one step (divide by the last pivot). The next unknown uses two steps. When we reach x1 it will require n steps (n - 1 substitutions of the other unknowns, then division by the first pivot). The total count on the right side, from b to c to x (forward to the bottom and back to the top) is exactly n²:

[(n - 1) + (n - 2) + ··· + 1] + [1 + 2 + ··· + (n - 1) + n] = n².      (2)

To see that sum, pair off (n - 1) with 1 and (n - 2) with 2. The pairings leave n terms, each equal to n. That makes n². The right side costs a lot less than the left side!

Solve      Each right side needs n² multiplications and n² subtractions.

How long does it take to solve Ax = b? For a random matrix of order n = 1000, a typical time on a PC is 1 second. The time is multiplied by about 8 when n is multiplied by 2. For professional codes go to netlib.org.

According to this n³ rule, matrices that are 10 times as large (order 10,000) will take a thousand seconds. Matrices of order 100,000 will take a million seconds. This is too expensive without a supercomputer, but remember that these matrices are full. Most large matrices in practice are sparse (many zero entries). In that case A = LU is much faster.

Proving A = LU

Elimination is expressed by EA = U and inverted by A = LU. Equation (11) in Section 2.2 showed how the multipliers ℓij fall into their right positions in E^{-1}, which is L. Why should we want to find a proof? We have just seen that pattern and believed it. A proof means that we have not only seen it, but understood it.
2.3. Matrix Computations and A = LU   59

The Great Factorization A = LU

Let me review the forward steps of elimination. They start with a matrix A and they end with an upper triangular matrix U. Every elimination step E_ij produces a lower triangular zero. Those steps E_ij subtract ℓij times equation j from equation i below it. Row exchanges (permutations) are coming soon but not yet. To invert one elimination step E_ij, we add instead of subtracting: E_ij^{-1} has +ℓij where E_ij has -ℓij.

Equation (10) in Section 2.2 multiplied E32 E31 E21 with a messy result:

E = E32 E31 E21 = [ 1 0 0 ; -ℓ21 1 0 ; ℓ32ℓ21 - ℓ31   -ℓ32   1 ]

Equation (11) showed how the inverses (in reverse order E21^{-1} E31^{-1} E32^{-1}) produced perfection:

E^{-1} = E21^{-1} E31^{-1} E32^{-1} = [ 1 0 0 ; ℓ21 1 0 ; ℓ31 ℓ32 1 ] = L

Then elimination EA = U becomes A = E^{-1}U = LU if we run it backward from U to A. These pages aim to show the same result for any matrix size n. The formula A = LU is one of the great matrix factorizations in linear algebra. Here is one way to understand why L has all the ℓij in position, with no mix-up.

The key reason why A equals LU: Ask yourself about the pivot rows that are subtracted from lower rows. Are they the original rows of A? No, elimination probably changed them. Are they rows of U? Yes, the pivot rows never change again. When computing the third row of U, we subtract multiples of earlier rows of U (not rows of A!):

Row 3 of U = (Row 3 of A) - ℓ31 (Row 1 of U) - ℓ32 (Row 2 of U).      (3)

Rewrite this equation to see that the row [ ℓ31  ℓ32  1 ] is multiplying the matrix U:

(Row 3 of A) = ℓ31 (Row 1 of U) + ℓ32 (Row 2 of U) + 1 (Row 3 of U).      (4)

This is exactly row 3 of A = LU. That row of L holds ℓ31, ℓ32, 1. All rows look like this, whatever the size of A. With no row exchanges, we have A = LU.
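Here is a minimal sketch of that elimination in Python with NumPy. It assumes no row exchanges are needed (every pivot is nonzero), and the 3 by 3 matrix is my own small example, not one from the text. Each multiplier ℓij is stored in L as it is used, and L times U rebuilds A:

    import numpy as np

    def lu_no_exchanges(A):
        """Factor A = LU by elimination, assuming every pivot is nonzero."""
        n = A.shape[0]
        L = np.eye(n)
        U = A.astype(float).copy()
        for j in range(n - 1):                  # pivot column j
            for i in range(j + 1, n):           # rows below the pivot
                L[i, j] = U[i, j] / U[j, j]     # the multiplier l_ij
                U[i, :] -= L[i, j] * U[j, :]    # subtract l_ij times the pivot row of U
        return L, U

    A = np.array([[2., 3, 4], [4, 11, 14], [2, 8, 17]])
    L, U = lu_no_exchanges(A)
    print(L)                        # multipliers 2, 1, 1 below the diagonal of L
    print(U)                        # upper triangular U
    print(np.allclose(L @ U, A))    # True: A = LU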
60   Chapter 2. Solving Linear Equations Ax = b

Second Proof of A = LU: Multiply Columns Times Rows

I would like to present another proof of A = LU. The idea is to see elimination as removing one column of L times one row of U from A. The problem becomes one size smaller.

Elimination begins with pivot row = row 1 of A. We multiply that pivot row by the numbers ℓ21 and ℓ31 and eventually ℓn1. Then we subtract from row 2 and row 3 and eventually row n of A. By choosing ℓ21 = a21/a11 and ℓ31 = a31/a11 and eventually ℓn1 = an1/a11, the subtraction leaves zeros in column 1.

Step 1 removes   ℓ1 (row 1 of A)   with   ℓ1 = (1, ℓ21, ..., ℓn1),   to leave a matrix A2 whose first row and first column are all zeros.

Now we face a similar problem for A2. And we take a similar step to reach A3:

Step 2 removes   ℓ2 (row 2 of A2)   with   ℓ2 = (0, 1, ℓ32, ..., ℓn2),   to leave A3 with zeros in its first two rows and first two columns.

In the first step we removed a column ℓ1 times a pivot row u1* of U. The second step removed a column ℓ2 times a pivot row u2*. Continuing in the same way, every step removes a column ℓj times a pivot row uj* of U. Now put those pieces back together:

A = ℓ1 u1* + ℓ2 u2* + ··· + ℓn un* = [ ℓ1 ℓ2 ··· ℓn ] [ u1* ; u2* ; ··· ; un* ] = LU.      (5)

This is the column times row way to multiply L times U (columns of L times rows of U) that was introduced in Section 1.4. A later section will review this important way to multiply matrices. Note that U is upper triangular: its row k begins with k - 1 zeros. And L is lower triangular with 1's on its main diagonal: its column k also begins with k - 1 zeros.
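The column-times-row proof can also be checked numerically. This sketch reuses the same small example as the previous sketch (my own, not the book's): the sum of the rank-one pieces, column j of L times row j of U, rebuilds LU exactly.

    import numpy as np

    L = np.array([[1., 0, 0], [2, 1, 0], [1, 1, 1]])     # factors from the sketch above
    U = np.array([[2., 3, 4], [0, 5, 6], [0, 0, 7]])

    pieces = [np.outer(L[:, j], U[j, :]) for j in range(3)]   # column of L times row of U
    print(pieces[0])                          # the rank-one piece removed by step 1
    print(np.allclose(sum(pieces), L @ U))    # True: adding the pieces gives LU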
2.3. Matrix Computations and A = LU   61

Elimination Without Row Exchanges

The next section is going to allow row exchanges P. They are necessary to move zeros out of the pivot positions. Before we go there, we can answer this basic question: When is A = LU possible with no row exchanges (and no zeros in the pivots)?

Answer   All upper left k by k submatrices of A must be invertible (sizes k = 1, ..., n).

The reason is that elimination is also factoring every one of those submatrices (k by k corners of A). All those corner matrices A_k agree with L_k U_k (k by k corners of L and U):

A = LU = [ L_k  0 ; *  * ] [ U_k  * ; 0  * ]   tells us that   A_k = L_k U_k.

Problem Set 2.3

Problems 1-8 compute the factorization A = LU (and also A = LDU).

1   (Important) Forward elimination changes [ 1 1 ; 1 2 ] x = b to a triangular [ 1 1 ; 0 1 ] x = c:

    x + y = 5          x + y = 5          [ 1 1 5 ]      [ 1 1 5 ]
    x + 2y = 7         y = 2              [ 1 2 7 ]  →   [ 0 1 2 ]

    That step subtracted ℓ21 = ______ times row 1 from row 2. The reverse step adds ℓ21 times row 1 to row 2. The matrix for that reverse step is L = ______. Multiply this L times the triangular system [ 1 1 ; 0 1 ] x = [ 5 ; 2 ] to get ______ = ______. In letters, L multiplies Ux = c to give ______.

2   Write down the 2 by 2 triangular systems Lc = b and Ux = c from Problem 1. Check that c = (5, 2) solves the first one. Find x that solves the second one.

3   What matrix E puts A into triangular form EA = U? Multiply by E^{-1} = L to factor A into LU:

    A = [ 2 1 0 ; 0 4 2 ; 6 3 5 ]

4   What two elimination matrices E21 and E32 put A into upper triangular form E32 E21 A = U? Multiply by E32^{-1} and E21^{-1} to factor A into LU = E21^{-1} E32^{-1} U:

    A = [ 1 1 1 ; 2 4 5 ; 0 4 0 ]
Chapter 2. Solving Linear Equations Ax = b 62 5 What three elimination matrices P* Л «*> * UPP" “Wul*- form ЕззЕз^Л = 1Л Multiply by e£. Ej> and E2l to factor A into L times У: 1 0 I Л= 2 2 2 3 4 5 L = £^1Е31'Е32*. 6 A and В are symmetric across the diagonal (because 4 = 4). Find their triple factor- izations LDU and say how I/ is related to L for these symmetric matrices: Symmetric and В = 4 12 4 o' 4 0 1 0 IRecommended) Compute L and U for the symmetric matrix Л: a a b b a b 6 a b c abed Find four condition» on a. 5, c, d lo get A LU with four pivots. Thh mmsymmetric matrix will have the same L as in Problem 10: Find L and U for a a a a b b b c c t d 9 fi“l л . II/ wil, piroB. Sd’'‘*“««*>4«»u.sBtal<.neiobe|/B_<.(i)en(|e! 10 and l/ = K* safety multiply LU and solve 4, » a r , . b “ Circle c when you see it. S°**e f*C - Ь to find c. Then solve// *пю|уе1/Ж = с1оЫх Wha(was4? and b = 11 I 0 0 *> 1 1 0 Л i i 1 1 1 “d l/s о 1 i W'»* ««01 a, 4 5 6 and b = 0 0 1 steps to L. what matrix do you reach? r * 0 0 L~ fn 1 0 /» f» 1 1
2.3. Main* Computation.' and A = LU 63 (b) When you apply the same steps to /. what matnx do you get ? (c) When you apply the same steps to LU, what matrix do you get ? 12 If A = LDU and also A = L\D\U\ with all factors invertible, then L - L\ and D = D\ and U = Ui. "The three factor* are unique." Derive lhe equation L\XLD = DiUtU~x. Are the two sides triangular or diagonal? Deduce L = Li and U = Ut (they all have diagonal l’»J. Then D = Dt. 13 Tridiagonal matrices have zero entries except on lhe main diagonal and the two ad- jacent diagonals. Factor these into A = LU and A = LDL': 1 1 O' a a 0 A = 1 2 1 and A = a a + 6 b 0 1 2 0 b b + e 14 If A and В have nonzeros in the positions marked by x. which zeros (marked by 0) slay zero in their factors L and U1 'x X X x' A- X 1 X ° л 0 x x x 0 0 x x x x x O' n_ z * 0 * S" x 0 x x ° X X X 15 Easy but important. If A has pivots 5.9.3 with no row exchanges, w hat are the pivots for the upper left 2 by 2 submairix Aa (without row 3 and column 3 of A) ? Following the second proof of A - LU. what three rank I matrices add to A ? 0 ‘ 1 4 2 5 6 €|U| + £jua + €з**э e LU! columns multiply rows 17 Multiply LrL and LLT by columns times rows when the 3 by 3 lower triangular L has six l’s.
Chapter 2. Solving Linear Equations Ax = (, 64 2.4 Permutations and Transposes "1^ A permutation matrix P has the same rows as / (in any order). There are n! differentaJa' 2 Then Pz puts the components-п.-Л.x. in that new order. And PT equals /»-*. 3 iimns of A art rows of AT. The transposes of Az and AB are x1 A 1 and BTAT. 4 The idea behind AT is that Ax • у equals x A1 у because (Az)1 у = rTATy = XT(ATy). A symmetric matrix has ST = S. The product S = AT A is always symmetric. Permutations Permutation matrices have a 1 in every row and a 1 in every column. All other entries are zero. When this matrix P multiplies a vector, it changes the order of its components: ’ 0 0 1 z> 1 0 Circular shift of z 1,2,3U>3,1,2 Pz- о 0 1 0 xj *3 ®a pl м,а 11 и ““ * “ ““м" ₽'°”d < H<nantpccifc°*** 3' and 4' “ 24 peonuulions otsi» У permutations, when they multiply a vector x: Reverse the order Circular shift 0 0 0 I 0 0 1 0 0 1 0 0 Г о о 0 Even zo.zj before odd Z|,xj inthc Fast Fourier Transform 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 Г о о о Exchange rows 2 and 3 Exchange again 10 ««t 1,2,3,4 0 0 0 1 0 0 1 0 0 0 0 0 0 0 Half of the n I [^7 T?L- ““ “« -«KT. An pennutation “ fi”' example (exebn ° thc ma,ri* /• The last example ge 1 and 4. exchange 2 and 3) was even. ₽en"“‘«<Km»of«enaft « «change) was odd. Therowsofp,,, the columns of p~i P = lr>nsposeofp 0 0 1 1 0 0 0 1 0 0 0 1’ 1 0 0 J L ° i о = / 1
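The statements P^T = P^{-1} and "Px puts the components in the new order" are easy to test. A short sketch in Python with NumPy (the particular ordering is my own example, not one from the text):

    import numpy as np

    I = np.eye(4)
    P = I[[2, 0, 3, 1]]                  # rows of I in a new order: a permutation matrix

    x = np.array([10., 20, 30, 40])
    print(P @ x)                         # components of x in that same order: 30, 10, 40, 20
    print(np.allclose(P.T @ P, I))       # True: P^T equals P^{-1}
    print(np.allclose(P.T, np.linalg.inv(P)))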
2.4. Permutations and Transposes   65

Properties of Permutation Matrices

1. The n 1's appear in n different rows and n different columns of P.
2. The columns of P are orthogonal: dot products between different columns are all zero.
3. The product P1 P2 of permutations is also a permutation.
4. If A is invertible, there is a permutation P to order its rows in advance so that elimination on PA meets no zeros in the pivot positions. Then PA = LU.

The PA = LU Factorization: Row Exchanges from P

An example will show how elimination can often succeed, even when a zero appears in the pivot position. Suppose elimination starts with 1 as the first pivot. Subtracting 2 times row 1 produces 0 as an unacceptable second pivot:

A = [ 1 2 a ; 2 4 b ; 3 7 c ]  →  [ 1 2 a ; 0 0 b-2a ; 0 1 c-3a ]  →  exchange rows 2 and 3  →  U = [ 1 2 a ; 0 1 c-3a ; 0 0 b-2a ]

In spite of this zero, A is probably invertible. To rescue elimination, P will exchange row 2 with row 3. That brings 1 into the second pivot as shown. So we can continue. This matrix A is invertible if and only if b - 2a is not zero in the third pivot. Notice that if b = 2a, then row 2 of A equals 2 (row 1 of A). In that case A is surely not invertible.

We can exchange rows 2 and 3 first to get PA. Then LU factorization becomes PA = LU. The matrix PA sails through elimination without seeing that zero pivot:

P = [ 1 0 0 ; 0 0 1 ; 0 1 0 ]      PA = [ 1 2 a ; 3 7 c ; 2 4 b ] = LU   with   L = [ 1 0 0 ; 3 1 0 ; 2 0 1 ].

In principle we might need several row exchanges. Then the overall permutation P includes them all, and still produces PA = LU.

Daniel Drucker showed me a neat way to keep track of P, by adding a special column to the matrix A. That column tracks the original row numbers, as rows are exchanged. If we do exchanges on that column also, the final permutation P is easy to see. The same example has one row exchange in P:

[ 1 2 a | 1 ; 2 4 b | 2 ; 3 7 c | 3 ]  →  [ 1 2 a | 1 ; 0 0 b-2a | 2 ; 0 1 c-3a | 3 ]  →  [ 1 2 a | 1 ; 0 1 c-3a | 3 ; 0 0 b-2a | 2 ]

The tracking column ends in the order 1, 3, 2, so P has rows 1, 3, 2 of the identity matrix:

P = [ 1 0 0 ; 0 0 1 ; 0 1 0 ].
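Library code reaches the same kind of factorization. SciPy's lu returns P, L, U with A = PLU, so P^T A = LU and SciPy's P^T plays the role of the P above; it also performs the partial pivoting described next, so it takes 3 as the first pivot. A sketch with a, b, c chosen so that b - 2a is nonzero:

    import numpy as np
    from scipy.linalg import lu

    a, b, c = 5., 7., 11.                   # any numbers with b != 2a keep A invertible
    A = np.array([[1., 2, a], [2, 4, b], [3, 7, c]])

    P, L, U = lu(A)                         # SciPy's convention: A = P @ L @ U
    print(L)                                # 1's on the diagonal, every entry at most 1 in size
    print(U)                                # upper triangular, no zero pivots
    print(np.allclose(P.T @ A, L @ U))      # True: a "PA = LU" statement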
66   Chapter 2. Solving Linear Equations Ax = b

"Partial Pivoting" to Reduce Roundoff Errors

Even good code for elimination allows for extra row exchanges that add safety. Small pivots are unsafe! The code does not blindly accept a small pivot when a larger number appears below it (in the same column). The computation is more stable if we exchange those rows, to produce the largest possible number in the pivot.

This example had first pivot equal to 1, but column 1 offered larger numbers 2 and 3. The code will choose the largest number 3 as the first pivot: exchange rows 1 and 3. The order of rows is tracked by the last column; that column is not part of the matrix. All entries of L are ≤ 1 when each pivot is larger than all the numbers below it.

The Fast Fourier Transform

Our early example of an "evens before odds" permutation comes in the Fast Fourier Transform (FFT). The Discrete Fourier Transform is multiplication by the Fourier matrix F. The FFT may be the most important algorithm in computer science. Step 1 reduces a multiplication by F_1024 (with 1024² nonzeros) to two multiplications by F_512 (half size):

F_1024 = [ I  D ; I  -D ] [ F_512  0 ; 0  F_512 ] [ rows 0, 2, 4, 6, ... of I ; rows 1, 3, 5, 7, ... of I ]

Those zero submatrices cut the work in half (plus the small work of a diagonal matrix D and the even-odd permutation). Then the key idea is recursion: transform the evens and transform the odds. Each F_512 splits again into two copies of F_256, and onwards. From then on, the only multiplications will involve the diagonal D's, and the permutations combine into one overall P = product of an even-odd permutation at every step.

The FFT has log2(1024) = 10 steps. Every step costs (1/2)(1024) multiplications by a diagonal D. The total is (1/2)(1024)(10) multiplications instead of (1024)². That difference makes the FFT fast.
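Here is the recursion in code form: a minimal radix-2 FFT sketch in Python, for lengths that are powers of 2. It uses NumPy's sign convention exp(-2πik/n) for the Fourier matrix, so the result can be checked against np.fft.fft; it illustrates the even-odd idea and is not the text's own program.

    import numpy as np

    def fft(x):
        """Split into evens and odds, transform each half, combine with the diagonal D."""
        n = len(x)
        if n == 1:
            return x
        evens = fft(x[0::2])                 # transform of x0, x2, x4, ...
        odds = fft(x[1::2])                  # transform of x1, x3, x5, ...
        d = np.exp(-2j * np.pi * np.arange(n // 2) / n)    # the diagonal D
        return np.concatenate([evens + d * odds, evens - d * odds])

    x = np.random.default_rng(1).standard_normal(1024)
    print(np.allclose(fft(x), np.fft.fft(x)))              # True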
2.4. PemuiUtions and Transp/ve* 67 The I ranspose of A Wc need one more main», and fortunately it is much simpler than the inverse It is the “transpost" of A. which is denoted by A1. The columns of A1 are the nms of A. When A is an m by n matrix, the transpose is n by m. 3 by 2 becomes 2 by 3 Transpose If A = You can write the rows of A into the columns of AT. Or you can write the columns of A into the rows of Лт. The matrix “flips over" its main diagonal. The entry in row I. column j of AT comes from row j, column i of the original A: Exchange rows and columns The transpose of a lower triangular matrix is upper triangular. The transpose of Лт is A. Note MATLAB’s symbol for AT is A*. Typing [1 2 3 gives a row vector and the column vector is v = [1 2 3]*. The matrix Af with second column w = [ 4 5 6 |* is M = [ v w ]. Quicker to enter by rows and transpose: Af = [ 1 2 3; 4 5 6 ] *. The rules for transposes are very direct. We can transpose A + li to get (A 4- B)1. Or we can transpose A and В separately, and then add Л1 + В'—with the same result. The serious questions are about the transpose of a product A В and an inverse A 1: Sum The transpose of A + B is Ar + BT. (1) Product The transpose of AB is (AB)T = BTAT. (2) Inverse The transpose of Л"* is (Л-,)Т = (ЛТГ|. (3) Notice especially how BTAT comes in reverse order. For inverses, this reverse order was quick to check: B~lA~l times AB produces / because A~*A = B'lB = I. To understand (ЛВ)Т = BTAT. start with (Лх)т = xTAT when В is just a vector: Ax combines the columns of A while xrAr combines the rows of AT. It is the same combination of the same vectors! In A they are columns, in Лт they are rows. So the transpose of the column Ax is the row хтЛт. That fits our formula (Лт)т = xTAT. Now we can prove (ЛВ)Т = BTAr. when В has several columns. If В has two columns Xi and x2, apply the same idea to each column. The columns of AB are Л®! and Ax?. Their transposes appear correctly in the rows of BTAr : Transposing AB Ax\ Axj gives which is BTAT . (4)
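A quick numerical check of rules (1), (2), (3) with random matrices (a sketch in Python with NumPy; the random A here is invertible, which rule (3) assumes):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3))

    print(np.allclose((A + B).T, A.T + B.T))                      # sum rule (1)
    print(np.allclose((A @ B).T, B.T @ A.T))                      # product rule (2): reverse order
    print(np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T)))    # inverse rule (3)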
68 AB (5) __ 2. Solving Linear Equations A® = b HOT«"»"*enin(''B|T “ в’Л 41(I q r6 ,1 I [5 0] [5 °] and BTAT=|0 1 |o 1] |0 1] 4 if I9 'J t io three or more factors*. (ABC)T equals C1 В ’ A1. The reverse order role extends to A-»A = I. On one side. ,. A-‘A-I is transposed io A (A ) Transpose of inverse Л . ..пт ат _ r we can invert the transpose or we can SMbd, AX-‘ - I j „ .„,ПЫ. "«'I, Hl« л “ transpose the inverse Notice especially л - The inverse of A = . The transpose is AT - The Meaning of Inner Products The dot product (inner product) of z and у is the sum of numbers х,у(- Now *c тм. better way to wnte * • y. without using that unprofessional dot. Use matrix notation T binside Пй dor product or iiuw product u zT у (1 X n)(n X 1) s 1 x ' T b outside Theron* one product or outer product if zyT(n x 1)(1 x n) — n x n zTy is a number. zyT is a matrix Quantum mechanics would write those as < ®llf (inner) and |z><y| (ouier). Probably our universe is governed by linear algebra Here are three more examples where the inner product has meaning Work = (Movements) (Forces) = x 1 f Heal loss = (Voltage drops) (Currents) = e ’ V Income = (Quantities) (Prices) = qT p From mechanics From circuits From economics We ate redly ckne to the heart of mathematics, and there is one more point to We defined dT k'T” C°”'*ct,on be1wtcn inner products and the transpose of A There baX vJ? *’«*iU miinThat’s not mathematics There is a better i«y . AT ц (A»)Ty = «T(ATf) lnw = inner prwjuct of x ATV
24. Permutations .ind Transposes 69 Example 1 Sun with A - On one side we have Ax multiplying у lo produce (r2 x()yt +Ui x2)in Thai is the same as x> (—) + x2 (gi — щ) + *s (pH No* * •* multiplying A1 у Example 2 Will you allow a little cakulus? It is important or I wouldn'l leave linear algebra. (This is linear algebra for functions.) Change lhe matrix to a derivative: A = d/dt. The transpose of d/dt comes from (Ax/y - xT(A'y). First, the dot product хту changes from X! jn + ••••* x,y„ lo an uiirgrulof x(t)y(t). Inner product of functions x and у x1y»(x. y) = / x(f)y(f)dt by definition Transpose rule for functions (Ax)'y = xT(4Ty) / / 4-(f) J Ш J (6) I hope you recognize "integration by parts" The dens alive moves from the first function x(t) to the second function y(t). Dun ng that move, a minus sign appears This tells us that the transpost of A — d/dt iiAT = -A= —d/dL The derivative is anti symmetric. Symmetric matrices have Л’ Л. anti-symmetric matrices have AT — A. In some way. lhe 2 by 3 difference matnx in Example I followed this pattern. The 3 by 2 matrix Лт was minus a difference matnx. Il produced i/( - y2 in the middle component of ATy instead of the difference щ - yi- Integration by parts is deceptively important and not just a tnck. Symmetric Matrices For a symmetric matrix, transposing A to Лт produces no change. Inthiscasc A’ equals A. Its (J>>) entry across the main diagonal equals its (i.j) entry. In my opinion, these are lhe most important matrices of all. We give symmetric matrices the special letter S. A symmetric matrix has S* = S. This means that every a2< = ao. Symmetric matrices S» J2 5] “ Jo io] ~ • The inverse of a symmetric matrix is a symmetric matrix. The transpose of S-1 is (S~*)T = (5T)_| = S~l. When S is invertible, this says that S-1 is symmetric . Symmetric inverses S~l = J_2 jj and ' = Jo 0 1]' Now we produce a symmetric matrix S by multiplying any matrix A by AT.
Oupier 2. Solving Linear Equations A® = b 70 PnxlueuA^ and A AT and LDL^ л — xrлт A ^T^A is automatically a square symmetric m • “ ',т<лТ)Т *** , °’ —’а 1 о -1 1 -1 1 0 0 -1 1 in both orders. Example 3 Multiply A - q and ЛТА ’ 1 -1 -1 2 0 -1 o' -I are both symmetric matrices. 1 The product AAT b m by m. In the opposite order. ATA is n by n. Both are symmetric, with positive diagonal (why?). But even if m = n. it is very likely that A1 A / A A . Symmetric matrices tn elimination S^ = S makes elimination twice as fast, because we can work with half the matrix (plus the diagonal) The rymmrtry is in the triple product S = LDLT. The diagonal matrix D of pivots can be divided out, to leave I/ * /Л L U misses the symmetry of S Divide the pivots 1,3 out of U S = LDLr captures the symmetry A'ow U и the transpose of L For a rectangular A th» saddU-pouu matrix S u symmetric and important: Block matrix [/ д! from least squares AT 0 । S has size m + n. S = ShinvertiMe A* A it invertible Block elimination Subtract AT(row 1) The block pivot matrix D <==> Ax /0 whenever x / 0 isU. __ M ~AtA. Then L and LT contain AT and A: l.lW.ll 01 p 0 1 Г/ Л Iх /J [о -ЛТЛ о I
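A short sketch of symmetric elimination S = LDL^T in Python with NumPy. It assumes the pivots are nonzero and no row exchanges are needed, and the 3 by 3 matrix S is my own example. Each step removes (pivot) times (column of L)(column of L)^T, the symmetric version of column times row:

    import numpy as np

    def ldl_no_exchanges(S):
        """Symmetric factorization S = L D L^T, assuming nonzero pivots."""
        n = S.shape[0]
        L = np.eye(n)
        D = np.zeros((n, n))
        work = S.astype(float).copy()
        for j in range(n):
            D[j, j] = work[j, j]                        # the pivot
            L[j+1:, j] = work[j+1:, j] / work[j, j]     # multipliers: one column of L
            work[j:, j:] -= D[j, j] * np.outer(L[j:, j], L[j:, j])   # remove a rank-one piece
        return L, D

    S = np.array([[2., 4, -2], [4, 9, -3], [-2, -3, 7]])
    L, D = ldl_no_exchanges(S)
    print(np.diag(D))                       # the pivots 2, 1, 4
    print(np.allclose(L @ D @ L.T, S))      # True: the symmetry is in S = L D L^T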
2.4. Permutation* and Transpincs 71 Problem Set 2.4 Question* 1-7 are about the rule* for transpose matrices. 1 Find Л1 and Л 1 and (Л *)T and (Лт) 1 for Л = 2 Verify that (ЛB)T equals В1 A1 but those arc different front A1 BT : I e c 0 Show also that ЛЛТ is different from A1 A. But both of those matrices are 3 (a) The matrix ((AB)~1 )T comes from (Л'1)1 and (B~l)r. In what order! (b) Iff/is upper triangular then (I/-1 )T is__ triangular. 4 Show that Л2 = 0 is possible but Л1 A = 0 is not possible (unless A ® zero matrix). [12 3 4 5 6 (b) This is the row хтЛ _______times the column у (0,1.0). (c) This is the row xT = (0 11 times the column Ay = 6 The transpose of a block matrix А/ = [ * d I *s AfT " __ Test an example. Under what conditions on А, В, C, D is the block matrix symmetric ? 7 True or false: (a) The block matrix [ X о ] *s automatically symmetric. (b) If A and В are symmetric then their product AB is symmetric. (c) If Л is not symmetric then Л-1 is not symmetric. (d) When А, В, C are symmetric, the transpose of ABC is CBA. Questions 8-15 are about permutation matrices. 8 Why are there n! permutation matrices of order n? 9 If Pj and Pj are pennutation matrices, so is P\Pi This still has lhe rows of I in some order. Give examples with Pi Pa / Pi Pi and P3P4 = P4P3. 10 There are 12 “even” permutations of (1,2,3,4), with an even number of exchanges. Two of them are (1,2,3,4) with zero exchanges and (4,3,2,1) with 2 exchanges. List the other ten. Instead of writing each 4 by 4 matrix, just order the numbers. 11 If P has l’s on the antidiagonal from (l,n) to (n, 1), describe PAP. Note P = PT.
Chapter 2. Solving Linear Equations Az = (, 72 12 Explain why the dot product of X and у equals the dot product of Px and Py. ПепТрхАру) = *rV tells us that PTP = I for any permutation. x = (1,2,3) and у = (1,4,2) choose P to show that Px • у is not always x. Py 13 Which permutation makes PA upper triangular? Which permutations make P{ AP2 lower triangular? Multiplying .4 оя the right by P2 exchanges the___of Д 0 0 1 2 0 4 6' 3 5 14 find a 3 by 3 permutation matrix with P3 = 1 (but not P = I). Why can‘t P be >he____________Find a 4 by 4 permutation P with P* ji I. 15 'ШПСа "* symmctnc pT = P Then Р'Р = I becomes r = I. Other permutation matrices may or may not be symmetric. 19 17 18 (i) ">*<•*« PT *nds row_____ to row_____ P the row exchanges come in pain with no overlap. |Ь)^.<Ь><атр|.иа,рт.рам|п<|>о||||Го<дгта| ind Г1с|ог1и1(ою Л “ВВшВ,-*ил^Яштюкатстич1у,уттстс1 У'~В' (сМВЛ (d)ABAB (С) How many entries са h. J *'*' number of choic“ in LDLr ? v v«*u»ci can be chosen if j * W) Why doe. A*A h^e “ г^^Пс ЦА* = -A). F«ctor these symmetn numbcrs °"'t* diagonal ? rnetnc matrices into S » LDLT Th r • Tne pivot matrix D is diagonal: S. and S and s 19 2 -1 0 -1 2 -1 O' -1 2 (and ch*k them) for A 0 1 11 Ла 1 0 1 1 2 2 4 ,1 1 ^"Z^^^lhAlth , ’ what are^'f^ 3 exchanges to reach the torsPii, and (/? 2 3 and O' 1 1
2.4. Permutation* and Transposes 73 21 Prove that the identity matrix cannot be the product of three row exchanges (or five). It can be the product of two exchanges (or four). 22 If every row of a 4 by 4 matrix contains the numbers 0,1,2.3 in some order, can the matrix be symmetric? 23 Start with 9 entries in a 3 by 3 matrix A. Prove that no reordering of rows and reordering of columns can produce AT. (Watch the diagonal entries.) 24 Wires go between Boston, Chicago, and Seattle. Those cities are at voltages хц.хс. xg. With unit resistances between cities, the currents between cities are in y: VBC 1 -1 o' *B у « Ax is yes = 0 1 -1 *C VBS, 1 0 -1 (a) Find the total currents Лт у out of the three cities. (b) Verify that (Ax)Ty agrees with zT(ATy)—six terms in both. 25 The matrix P that multiplies (x.y.z) to give (*,x,y) is also a roution matrix. Find P and P3. The rotation axis a (1,1.1) doesn't move, it equals Pa. What is the angle of rotation from v = (2,3, -5) to Pv « (-5,2,3)? 26 Here is a new factorization A - LS = triangular rimes symmetric: Start from A LDU. Then A equals L times S = U* DU. Why is L (Ur)~x triangular ? Why is UrDU symmetric ? 27 In algebra, a group of matrices includes AB and A~x if it includes A and B. "Products and inverses stay in the group " Which of these sets are groups? Lower triangular matrices L with l’s on lhe diagonal, symmetric matrices S, positive matrices M, diagonal invertible matrices D. permutation matrices P, orthogonal matrices with QT « Q~x Invent two more matrix groups. Challenge Problems 28 If you take powers of a permuUtion matrix, why is some P* eventually equal lo /7 Find a 5 by 5 permutation P so that the smallest power to equal I is P*. 29 (a) Write down any 3 by 3 matrix M. Split M into S + A where S = S1" is symmetric and A = — AT is anti-symmetric. (b) Find formulas for S and A involving M and AfT. We want Af — S + A. 30 Suppose QT equals Q~1 (the transpose equals the inverse, so QTQ = Г). (a) Show that the columnsqj,...,qn are unit vectors: ||q,||2 = 1. (b) Show that every two columns of Q are perpendicular qjq2 = 0. (c) Find a 2 by 2 example with first entry qn = coe 0.
3 The Four Fundamental Subspaces 3.1 Vector Spaces and Subspaces 3 J The Nullspace of A: Solving Ax = 0 33 The Complete Solution to Ax = b ЗА Independence. Basis, and Dimension 33 Dimensions of the Four Subspaces Column 1 h'5* the picture tha Section 3 1 ojieni with a pure algebra question How do we define a "vector space" 1 Looking at R . the key operations are v + w and cv. They are connected by simple laws like c(v + w) = or + on. We must be able to add о + w. and multiply by c. Section 3 I will give eight rules that the vectors t> and the scalars c must satisfy produ“',h' -*• 4,40 «olutions: The nullspace is a subspace. Linear algebra gives us a way to solve 4т - o n. u . Simplify the equations to ftr = 0 TK,n к J “ °’ ?* *“* lystem IS eliminalion - column Taking all their combinations .< m. *. 4*C“I so,ut‘Ofl" for “ch dependent Rnallv r «“«Лесгиа.1 «ер to produce the nullspacc. anally comes the idea of a bash- А nt Their combinations give one and onlv ° VK’°rs Л,< Р^есЧу describes the space. Tl* r independent columns of A >r,. *?У *° Prtx*uce evcry vector in the space. ** • 0 are a basis for N(4). °r C(4). The n - r special solutions to Chapter 2 was ahn> ^*Ve r and n - r. ч.>м|лсг z was about souarr had full rank r = m 3 All four of the matrices in PA = LU Chapter 3 moves to a hither t«. '* °* 'P** °f A WCK thc ful1 spaCC R* Every rn by n matnx is allowed, and thcrT^ |T °* mOS’inlponan, chapter in thc book. Z * k n°nzem to Ax = 0 e ’tarts with equations Ax = T *** co,umn space and row space. al of this chapter ’ П<М Co*umns or rows ^rom conn«* the four wbL3^T"'a/ ntorrm of Unear Algebra ". rowspac^ Ле1ГШтеп^: *’«h it makes ^e "“^P8" °f A' nu«space of AT. ndamental Theorem easy to remember.
3.1. Vector Space* and Sub*pacc* 75 3.1 Vector Spaces and Subspaces 1 Al! linear combination* rv + dw must stay in the vector space 2 The row space of А is "spanned" by the rows of Л. The columns span C( A). ^3 Matrices Л/1 to My and functions ft to JN span matrix spaces and function spaces^ Start with the vector spaces R . R;. R *,... The space R" contains all column vectors v of length n. The components i ( to ц, are real numbers. (When complex numbers like t»i = 2 + 3i arc allowed, the spaces become С1, С2, C3....). We know how to add vectors v and w in R”. We know how to multiply a vector by a number г or d to get rv or dw. So we can find linear combinations rv + dw in the vector space R". This operation of "linear combinations" is fundamental for any vector space. It must satisfy eight rules. Those eight rules are listed at the start of Problem Set 3.1 — they start with v + w = w + v and they are easy to check in R". They don’t need lo be memorized! One important requirement: All linear combinations tv + dw must stay in lhe vector space. The set of positive vectors (vi,...,i>r) with every r, > 0 is not a vector space. The set of solutions to Ax (1,1,...,!) i* n<* a vector space. A line in R" is not a vector space unless it goes through (0,0.0). If the line does go through 0, we can multiply points on the line by any number c and we can add points on the line—without leaving the line. That line in R" shows the idea of a subspace: A vector space imide another vector space Examples of Vector Spaces This book is mainly about the vector spaces R” and their subspaces like lines and planes. The space Z that only contains the zero vector 0 = (0,0...0) counts as a subspace' Combinations cO + dO are still 0 (inside the subspace). Z is the smallest vector space. We often see Z as the nullspace of an invertible matnx: If the only solution to Лх 0 is the zero vector x “ 0. then the nullspace of A is Z. We can certainly accept vector spaces of matrices. The space R3"3 contains all 3 by 3 matrices. We can take combi nations cA + dB of those matrices. They easily satisfy the eight rules. One subspace would be the 3 by 3 matrices with all 9 entries equal— a “line of matrices”. Note that Z “ (zero matrix) and S = symmetric 3 by 3 matrices are also subspaces: A + В stays symmetric. But the invertible matrices are nor a subspace. We can also accept vector spaces of functions. The line of functions у = ce“ (any c) would be a “line in function space". That line contains all the solutions to the differential equation dy/dx = y. Another function space contains all quadratics у = a + bx + ex2. Those are the solutions to rPy/dx? « 0. You see how linear differentia] equations replace linear algebraic equations Ax = 0 when we move to function space. In some way the space of 3 by 3 matrices is essentially the same as R9. The space of functions f(x) = a + bx + a2 is essentially R3.
Chapter 3. The Four Fundamental Subspaces 76 u. .____ r lh, mainces and functions arc safely in those spaces. “,uran veThand n0‘functions T^ Ld-spice” means that all linear combinations of the vectors or matrices or functions stay inside the space. Subspaces of Vector Spaces At different times, we will ask you to think of matrices and functions as vectors. But at all times, the vectors that we need most are ordinary column vectors. They are vectors with n components—but maybe not all of the vecton with n components There are important vector spaces inside R" Those are subspaces of R". Start with the usual three-dimensional space R3. Choose a plane through the origin (0,0,0). That plane is a vector space in its own right If we add two vectors in the plane, their sum is in lhe plane. If we multiply an m-plane vector by 2 or -5, it stays in the plane. A plane in three-dimensional space is not R3 (even if it looks like R2). The vecton have three components and they belong to R3. The plane is a vector space inside R3. This illustrates one of the most fundamental ideas in linear algebra. The plane going through (0,0,0) is a subspace of the full vector space R3. DEFINITION A subspace of a vector space is a set of vectors (including 0) that satisfies two requirements: If v and w are vectors in the subspace and cis any scalar, then (I) v + w is in the subspace (11) cv is in the subspace ,n <>,hcr w‘*d‘-the set o( vectors is 'closed” under addition v + w and multiplication cv । ik. <5*ra,’o» lease us in the subspace. We can also subtract, because - w is in »U space a its sum with v is v - w. AU linear combinations slay in the subspace. are sutJ^iK Гу>'е^ии 4*C’’10 cighl "4“^ conditions check lhe linear combinations requirement for a subspace. (0.0.0). We thilTpIre’ehT’ ^С,0Г planC R’ has t0 g° ,hroUgh •«. -d uk * fram n‘le“° U»n AnugK A, JX'X'Tfc"' teiU ₽lanei “e 1,01 s',bsPatel vectors on the line, we stay on the lin/ R., When ** ти,1'Р,У ЬУ 5- or add tW° Another «Пирке is .)| of Rs T llne 8° through (0,0,0). of all the possible subspaces of R> 4>acc “ a subsP»ce (of itself). Here is a list (L) Any line through (0,0,0) (P) Any plane through (0,0,0) (R1) The whole space I (Z) The single vector (0,0,0) J plane or line, the requirements for a subspace don 1 they are not suh««~—
3.1. Vector Spate» and Subpaces 77 Example 1 Keep only the vector» (г, whose component» are positive or zero Ithi» is a quarter-plane). The vector (2.3)» included but (-2. -3) n not So rule (ii) is isolated when we try to multiply by c = -1. The quarter-plane и Me subspace. Example 2 Include also the vectors whose components are both negative. Now we have two quarter-planes. Requirement liii is satisfied, we can multiply by any c. But rule (it now fails. The sum of t? = (2.3) and w = (-3. -2) is (-1.1). which is outside the quarter-planes. Two quarter-planes doe 'I make a subspace Rules (i) and (ii) involve vector addition г + ic and muluplicatioa by scalar» c and d The rules can be combined into a single requirement—lhe rule for subspacer A subspace containing v and w must contain ell linear combinations cv •+ dw. Example 3 Inside the vector space M of all 2 by 2 matrices, here are two subspaces: (U) All upper triangular matrices (D) All diagonal matrices i'(‘ . Add any upper triangular matrices in U. and the sum is in U. Add diagonal matrices, and the sum is diagonal. In this case D is also a subspace of U! Of course lhe zero matrix is in these subspaces, when a. b. and d all equal zero. Z is always a subspace. Multiples of the identity matrix also form a subspace of Af. Those matrices <7 form a “line of matrices" inside M and U and D. Is the matrix I a subspace by itself? Certainly not. Only lhe zero matnx is. Your mind will invent more subspaces of 2 by 2 matrices—write them down for Problem 5. The Column Space of A The most important subspaces are tied directly to a matnx A. We are trying to solve Ax = b. If A is not invertible, the system is solvable for some b and not solvable for other b. We want to describe the good right sides b—the vectors that can be wntten as A times some vector x. Those b’s form the “column space" of A. Remember that Az is a combination of the columns of A. To get every possible b. we use every possible x. Stan with the columns of A and take all their linear combinations. Thu produces the column space of A. It is a vector space made up of column vectors. DEFINITION The column space consists of all linear combinations of the columns. Those combinations are all possible vectors Ax. They till the column space C( A). This column space is crucial to the whole book, and here is why. To solve Ax = bis to express basa combination of the columns. The right side b has to be in the column space produced by A, or Ax = b has no solution! The equations Ax = bare solvable if and only if bis in the column space of A When b is in the column space C(A), it is a combination of the columns of A. The coefficients in that combination will solve Ax = b. The word “space" is justified by taking all combinations of the columns. The column space is a subspace of Rm.
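The solvability test "b is in the column space C(A)" can be carried out by comparing ranks: Ax = b is solvable exactly when [ A b ] has the same rank as A. A small sketch in Python with NumPy, using my own 3 by 2 example whose column space is a line in R³:

    import numpy as np

    A = np.array([[1., 2], [2, 4], [3, 6]])    # column 2 = 2(column 1): C(A) is a line in R^3
    b_good = np.array([2., 4, 6])              # a combination of the columns
    b_bad = np.array([1., 0, 0])               # not on that line

    for b in (b_good, b_bad):
        same_rank = np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)
        print(same_rank)                       # True then False: Ax = b is solvable only for b_good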
I — Chapter 3. The Four Fundamental Subspaces 78 Caution • The columns of A do txx form a subspace! The invertible matnccs do no form a suZce The W«hr matnees do not fonn a subspace. You have to include all hnear combinations The columns of A “span” « subspace when we take thetr combinations. The Row Space of A The rows of A are the columns of AT. the n by m transpose matrix. Since we prefer to work with column vectors, we welcome AT: The row space of A is the column space C(AT) of the transpose matrix AT This row space is a subspace of R" It contains m column vectors from AT and all their combinations. The equations ATy = e are solvable exactly when the vector c is in the subspace С(ЛТ) = row space of A. Chapter 1 explained why С(Л) and C(AT) both contain r independent vectors and no more. Then r rank of A = rank of Лт. A new proof is in Section 3.5. W *P*Ce °f ,he 1ma,n* Л = ut,T “,hc ,mc °f a" c°lumn vectors row an«J* nnr '°luinn Л ~ vuT “ • multiple of v. One vector v spans the row space, one vector u spans the column space The Columns of A Span the Vector Space C(A) tuns only it **' f °f VCC,ors in R"‘ ,f S СОП’ of the vectors in S. then we h>v W *Pace ®ut 'nc'udc combinations In fact V is the smallest vector snV I₽,Ce V‘ ,n lh‘l case ,he sel S sPans V combinations to produce a vector c*”*,ainin8 $ (because we are forced to include all Thu is exactly what we , column space С(Л) « all combinations JiiT', °f Л Th0SC n colutnns sPan ,he у the word span. In the same wav the m columns. Independence is not required ™ question. Show that the * У the row space C(AT). ._______ “тЫегь>2"«пе«,р„к1.< Next comes the пцц^ 3 3 mMric« span R3*’. (equations and nm and that reni.,~ I» й a vector space be«°\ ‘O,ut>ons^ t0 rJT* Wc Start with Лх = ° have io work to find livl.USe = 0 and Ab - n 7°* e4uat>°ns give the nullspace, ‘’hndthoseso,^ 4V-01eadtoA(cz + dv) =0. But we ^^«^Whendonj^ V V> Vec*0R sPan R5 ? Th»» • ThtS,$ very possible.
3.1. Vector Space» and Subspaces Problem Set 3.1 79 The first problems 1-7 are about vector spaces in general. The vectors in those space* are not necessarily column vectors. In the definition of a vector space, vector addition x + у and scalar multiplication ex must obey the following eight rules: (I) x + у = у + x (2) x + (у + a) = (х + у) + ж (3) There is a unique “zero vector" such that X 4- 0 = X for all X (4) For each x there is a unique vector -x such that x + (-ж) = 0 (5) I times x equals x (6) (cica)x ж C|(cj«) (l)lo(4) about x + у (7) c(x + y) = ex + ey (5) to (6) about ex (8) (ci + ca)x = Cix + Cjx. (7) to (8) connect* them 1 Suppose (xi.xa) + (уг.уг) '* defined to be (xi + yj.xj + pi). With lhe usual multiplication ex ж (cri.cri), which of the eight conditions are not satisfied ? 2 Suppose the multiplication ex is defined to produce (cx|,0) instead of (cT|,CXj). With the usual addition in R2. are the eight conditions satisfied ? 3 (a) Which rules are broken if we keep only the positive numbers x > 0 in R1? Every c must be allowed. The half-line is not a subspace. (b) The positive numbers with x + у and ex redefined to equal lhe usual ту and x* do satisfy the eight rules. Test rule 7 when c — 3, x « 2, у 1. (Then x + у = 2 and ex — 8.) Which number acts as the "zero vector" 7 4 The matrix A [, Za ] is • “vector" in the space M of all 2 by 2 matrices. Write down the zero vector in this space, the vector | A. and the vector —A. What matrices are in the smallest subspace containing A (the subspace spanned by Л)? 5 (a) Describe a subspace of M that contains A = [ J g] but not В = [J (b) If a subspace of M does contain A and B, must it contain /? (c) Describe a subspace of M that contains no nonzero diagonal matrices.
Cfaptrr 3. The Four Fundamental Subspaces 80 6 7 TV A nr\ = I2 and o(x) = 5x are “vectors" in F. This is the vector space of all real functions (The functions are defined for -oo < x < oo.) The combination 3/(x) - 4у(х) в the function h(x) = -------• Which rule is broken if multiplying /(x) by c gives the function /(ex)? Keep the usual addition /(x) + <?( x). Questions 8-15 are about the “subspace requirements”: x + у and ex (and (hen all linear combinations ex 4 dp) stay in the subspace. 8 One subspace requirement can be met while the other fails. Show this by finding (a) A set of vectors in R2 for which x + у stays in the set but ’ x may be outside. (b) A set of vectors in R2 (other than two quarter-planes) for which every ex stays in the set but x + у may be outside. 10 11 12 13 14 Which of these subsets of R1 are actually subspaces ? They all span subspaces! (a) Theplaneof vectors (bi.bj.bj) with b] = bj. (b) The plane of vectors with bi = 1. (c) The vectors with bib^bj a 0. (d) All linear combinations of»» (1,4,0) and w a (2,2,2). (e) AU vectors that satisfy bi + b, + b, a 0. (0 AU vectors with b, < bi < bj. “К H:] h;;] <»>[::]-[j ?]• Let P be the plane in R5 with _ P' Find two vectors u> P and checkth f ь.?.~ 2* ~ 4 The on8ln (0,0,0) is not in t«o their sum is not in P. ue’ *o be the plane through (0 n m *чимкж fw ₽o? Find two vectors tn P^110 Ше prtV|ous plane P. What is the S-Р^Р..Ы к Лгои^ (0,0 0) and T • .. ’*’** containing both P and L is athe’ ’ ”* ‘hr°Ugh (°- °- 0)- The smallest (a) Show that the set of -----*---------
3.1. Vector Spaces and Subspaces 81 15 True or false (check addition in each case by an example): (a) The symmetric matrices in M (with Лт = Л) form a subspace. (b) The skew-symmetric matrices in M (with Лт = - Л) form a subspace. (c) The unsymmctric matrices in M (with Лт # ЛI span a subspace. Questions 16-26 are about column spaces C( A) and the equation Ax = b. 16 Describe the column spaces (lines or planes) of these particular matrices: 17 For which right sides (find a condition on b|, bj. bj) are these systems solvable? 18 Adding row I of Л to row 2 produces B. Adding column I to column 2 produces C. A combination of the columns of (B or C1) is also a combination of the columns of Л. Which two matrices have the same column___________? 19 20 (Recommended) If we add an extra column b to a matrix A. then the column space gets larger unless_________. Give an example where the column space gets larger and an example where it doesn't Why is Лх = b solvable exactly when the column space doesn't get larger ? Then it is the same for A and [ A b ]. 21 The columns of AB are combinations of the columns of A. This means: The column space of AB is contained in (possibly equal to) the column space of A. Give an example where the column spaces of Л and AB are not equal.
Chapter 3. The Four Fundamental Subspaces 82 22 23 24 25 26 27 28 29 30 31 - b* are both solvable. Then Az = b + b is solvable. Suppose Ax = b and Ay coiumn space C(A), then What is a? This translates into. Lt b + b’is in Wn>at is a requirement for a vect p ил«ад5Ь,5»«^"-^'Ь“'““|ш,шч>“е“-------------------------------- W? True or false (with a counterexample if false I (a) The vectors b that are not in the column space C( A) form a subspace. (b) If C(A) contains only the zero vector, then .4 is the zero matrix. (c) The column space of 2.4 equals the column space of Л. (d) The column space of A -1 equals the column space of A (test this). Construct a 3 by 3 matrix whose column space contains (1,1,0) and (1,0,1) but not (1,1,1). Construct a 3 by 3 matrix whose column space is only a line. If the 9 by 12 system Ax = b is solvable for every b. then C( A) =. Challenge Problems Suppose S and T are two subspaces of a vector space V. (a) Definition The sum S + T contains all sums a + t of a vector a in S and a ***** t in T Show that S + T satisfies the requirements for a vector space. Addition and scalar multiplication stay inside S + T . Cb) И S and T are lines in R*. what is the difference between S + T and S U T? t union contains all, vectors from S or T or both. Explain this statement: The span ofSuTbS + T. (Section 3.5 returns to this word “span".) what matrix W°* * 7^ T **,hcn S + T is the column space of XiLvT.^ » «nd M are all in R-. I don’t think A + В « always a correct M We want the columns of Af to span S + T. Show that thc rcistrifes A And Г А A n 1 " ”" - c“> - Я- *. л ь „ _ Rnd an°<h« independent solution (after « - equation tPy/dx3 = u . v ) to the second order differential w 2^ XCZ4* “"biinuon, у =-----. th two subspaces of Dn Tv • T**”*’” are in both subsoac« m*. “'nto*ction" V П W contains *“* vn* *r •• “*> vector is in V and W.) yacK + ,iVmVnwl ghe rc4Uircment: If z and у are in VO W.
3.2. The Nullspace of A: Solving Ax = 0 63 3 .2 The Nullspace of A: Solving Ax = 0 I The nullspacc N(A) in R contains all solutions x to Ax = 0. This includes x — 0. 2 Elimination from A to U to Ho docs not change the nullspace: N(A) - N(U) = N(H«). 3 The reduced row echelon form Ho = rref(A) has I in r columns and F in n - r columns. 4 If column j of Ho is free (no pivot), there is a “special solution" to Ax = 0 with Xj = 1. \5 Every short wide matrix with rn < n has nonzero solutions to Ax = 0 in its nullspace. This section is about the nullspace containing all solutions to Ax = 0. The m by n matrix A can be square or rectangular. The right hand side is b = 0. One immediate solution is x - 0. For square invertible matrices this is the only solution. For other matrices, we find n - r special solutions to Ax - 0. Each solution x belongs to the nullspace of A. Elimination will find all solutions and identify this very important subspace. The nullspace N(A) consists of all solutions to Ax = 0. These vectors x are in R". Check that those vectors form a subspace. Suppose x and у are in the nullspacc (this means Ax — 0 and Ay — 0). The rules of matrix multiplication give A(x + y) = 0 + 0. The rules also give A(cx) = cO. The right sides are still zero. Therefore x + у and ex arc also in the nullspace N(A), and the test for a subspace is passed. To repeat: The solution vectors x have n components. They are vectors in R", so the nullspace is a subspace of R". The column space C(A) is a subspace of R”'. [1 2] j gI. This matrix is singular! Solution Apply elimination to change the linear equations Ax = 0 to Rox = 0; Xi + 2xj = 0 + 2хз = О Г 1 2 1 „ f 1 2 1 _ Г / Fl 3x, +6x3-0 0 = 0 L 3 6 J ' L 0 0 J “ I ° 0 J There is really only one equation. The second equation is the first equation multiplied by 3. In the row picture, the line xi + 2x2 = 0 is the same as the line 3x> + 6x3 = 0. That line is the nullspace N(A). It contains all solutions (x|,xa) = (-2c,c) = c(-2,1). To describe the solutions to Ax = 0, here is an efficient way. Choose one “special solution”. Then all solutions are multiples of this one. We choose the second component to be x2 = 1 (a special choice). From the equation xj + 2x2 = 0, the first component must be xi = -2. The special solution is а = (—2,1). Special A 0 nu|lspace of A = R ~ contains all multiples of a = ~2 . solution LJ ®J L This is the best way to describe the nullspace. The solution з is special because the free variable is 1. Simple formulas for H and в come at the end of this Section 3.2.
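SymPy reproduces this nullspace directly. A sketch: nullspace() returns the special solutions and rref() returns the reduced form with its pivot columns.

    import sympy as sp

    A = sp.Matrix([[1, 2], [3, 6]])      # the singular matrix of Example 1
    print(A.rref())                      # (Matrix([[1, 2], [0, 0]]), (0,)): one pivot column
    print(A.nullspace())                 # [Matrix([[-2], [1]])]: every solution is a multiple of (-2, 1)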
Chip«er3. The Four Fundamental Subspace, п»И«5 Г W 84 special solutions to Ax = q jk, i by 3 matrix A = [ 1 2 3]. Then F.amole 2 x + 2y + *» = 0 COmeS^n the pla* PcrPcnd,cular to (1.2,3). Ax = 0 produces a plane A1'free variables у and z: Set to 0 and 1. The plane is the mdlspace of A. 1h« [123] » and «2 = -3‘ 0 Oj 1 . , 4.2» + 3z = 0. All vectors on the plane are 2 3] = (/ Fj. eomlunaiwo* of b »tomponrnls an ~fnt~ and wt >“» •“ “ ir'.tL'of 6« «»»«“"“ -2 «о -3 № dem,. choose them specialty as 1,0 and U,1- u mined by the equation x + 2y + 3a = 0. The solutions to x + 2y + За = в also lie on a plane, but that plane is not a subspace. The vector x « 0 u only a solution if b = 0. Section 3.3 will show how lhe solutions to Ax = b (if there are any solutxms) are shifted away from zero by one particular solution. TWo key steps (1) reducing A to iU row echelon forms R^ and R in this section (2) finding the n - r special solutions a to Rx = 0 Section 3.3 has the final step (3) finding a particular solution to Ax - b Example Л Я is connected to A by A CR. As in Chapter 1. C contains r independent columns. Elimination (row operations) will now take us directly from A to Ro to R. without C. 12 1 2 4 5 . We can see that column 2 is 2 times column 1. Then 3 6 9 columns 1 and 3 are independent and the rank is r - 2. But we don’t want to use this information I We want a systenubc way to find dependent columns for any matrix A. Thatsystetnabc way is a sequence of row operations on A that will lead directly to R. (urJi't r°* l,ke е11ГП1М1|оп «ер* in Chapter 2. leading from A to U <Wt ЙО₽ * U WiU COnt,nuc '° * and R We discovering Я before C. That matrix R will reveal the nullspace of A. ”'4te °"”1 “ ““ ~ I IK- 4» Um Pi»,. Column 2 Г 5 9 1 2 3 2 6 1 2 Г ооз 3 6 9 12 1' 0 0 3 0 0 6 • о Step2 Divide row 2 by 1 ю produce second pivot-t n , P vt _ 1. Use it to eliminate 6 and 1: 1 0 0 2 0 0 1 3 6 1 2 Г 0 0 1 . 0 0 6 1 0 0 2 0 0 0 ‘ 0
3.2. The Nullspace of A: Solving Ax = 0 85 That matrix Яо is the reduced row echelon form It has the same rank as A (rank 2). The word echelon means that lhe l’s in Яо go steadily down, left to right. До has the same row space as A (all our row operations were invertible). Яо has the same nullspace as A. The equations Hqx = 0 are linear combinations of the equations Ax = 0. Notice the zero row in Яо. We can and will remove it—no change in the row space or nullspace. Яо becomes Я with no aero rows Thu u the R we wanted in Chapter 1. 12 0' 0 0 1 0 0 0 f1 >11 2 5 I 3 9 C contains the first r independent columns of A (columns 1 and 3) Я has the identity matrix in columns 1 and 3 and F in column 2; rank r = 2 The special solution to Rx = 0 is a = (-2,1,0) with free variable = 1 The nullspace of A and Яо and Я contains all multiples of that solution a Ho = = CH is 1 2 Г 2 4 5 3 6 9 1 2 0 0 0 1 This is the same A CR = (m X r)(r x n) that Section 1.4 would produce by looking for independent columns in C. Now we have a good computational system: Elimination steps from A to Яо and H. then look for the r by r identity matrix inside Я. By creating Я, we know the correct columns of A that go into C. Those columns give the identity matrix I in Я. Then A " CR is the result of elimination on any matrix. going far beyond A = LU to allow every matrix A. Pivot Columns and Free Columns оГ R and A If Rx = 0 then Ax = CRx = 0. There is a special solution x = a for every column of A without a pivot. The r pivots are the Га in I. leaving n - r free columns of Я. Here is the result of elimination on a 4 by 5 matrix when the rank is r = 3 and the independent columns of C are oi and 03 and as- The free columns aj and a« are not in C. 0 9 O' 1 r 0 0 0 1 С Я 4x3 3x5 You see the 3 by 3 identity matrix in R. Elimination on the 4 by 5 matrix A led to a 4 by 5 matrix Яо- With rank r = 3. the fourth row of Яо was all zeros Removing that zero row from R« produced the perfect factorization A = CR. Elimination on A is complete and it reached Я. The remaining step is to read off the 5 — 3 = 2 special solutions to Rx = 0.
86 Chapter у The Four Fundamental Subsp^ . 3 = 2 special solubons >, and * ? Those vectors solve What are the n — r — o •> ^s0 solve = 0 and Аз2 e r. ________________ °- rhe combinations cjai т To find the special solutk We arc assigning the vali correspond to the colum Rei — 0 and R»i=0 tel Special solutions = to Rx - 0 >ns. stai tes 1.0 ns 1,3. ushov -P 1 0 0 0 •к. -( 1. ,0,_)апЬв2 = (—.0,—,1 i and 0,1 to the n - r = 5 — 3 = 2 positions that don’t 5 containing the identity matrix in R. The equations v Ю fill in the rest of those special solutions a, and a2: The nullspace , >,= -r N(A) = N(«) and а? contains all 0 X’Ci^ + c^ » Those three numbers -p and -g and -r are pst negatives of three numbers in ft Elimination has led systematically ion - r 2 independent vectors in the nullspace of R. Those are the two special solutions a. and aj to Rx = 0 and Ax = 0. The free components correspond to columns with no pivots. The special choice (one or zero) is only for the free variables in lhe special solutions. Exampin 3 Find the nullspaces of А. В. M and the two special solutions to Mx = 0. [И] 2 н 4 16 M - [Л 2Л]= 12 2 4 3 8 6 1б] ’ Solution The equation Ax • 0 has only the zero solution x = 0. The nullspace is Z. It contains only lhe single point x « 0 in R3. This fact comes from elimination: Лх“[з e] “* [o 2] [о 1]“Яж/ Nofn*variahl‘“s A is invertible. There are no special solutions. Both columns of this matrix have pivots. The rectangular matrix В has the same nullspace Z. The first two equations in Bx - 0 again require z ж 0. The last two equations would also force x = 0. When we add extra equations (giving extra rows), the nullspace certainly cannot become larger. The extra rows impose more conditions on the vectors x in the nullspace. The rectangular matnx Af is different. It has extra columns instead of extra rows. The solution vector z has/™, components Elimination will produce pivots in the first two columns of Af . The last two columns of M are “free”. They don’t have pivots, -[и:.:] «'-[ни] t t 11 pivot columns free columns
For the free variables x3 and x4, we make special choices of ones and zeros. First x3 = 1, x4 = 0 and second x3 = 0, x4 = 1. The pivot variables x1 and x2 are determined by the equation Ux = 0 (or Rx = 0). We get two special solutions in the nullspace of M. This is also the nullspace of U and R:

R = [1 0 2 0; 0 1 0 2]      s1 = (-2, 0, 1, 0)  and  s2 = (0, -2, 0, 1)      Rs1 = 0 and Rs2 = 0

The first two components of each s are the pivot variables; the last two components (1 and 0, then 0 and 1) are the free variables.

The Reduced Row Echelon Form R

1. Produce zeros above the pivots. Use pivot rows to eliminate upward.
2. Produce ones in the pivots. Divide the whole pivot row by its pivot.

Those steps don't change the zero vector on the right side of the equation. The nullspace stays the same: N(A) = N(U) = N(R). This nullspace becomes easiest to see when we reach the reduced row echelon form. The pivot columns of R contain I.

U = [1 2 2 4; 0 2 0 4]      Reduced form  R = [1 0 2 0; 0 1 0 2]

I subtracted row 2 of U from row 1. Then I multiplied row 2 by 1/2 to get pivot = 1. Now (free column 3) = 2 (pivot column 1), so -2 appears in s1 = (-2, 0, 1, 0). The special solutions are much easier to find from the reduced system Rx = 0: in each free column of R, change all the signs to find s. Second special solution s2 = (0, -2, 0, 1).

Before moving to m by n matrices A and their nullspaces N(A) and special solutions, allow me to repeat one comment. For many matrices, the only solution to Ax = 0 is x = 0. Their nullspaces N(A) = Z contain only that zero vector: no special solutions. The only combination of the columns that produces b = 0 is then the "zero combination". This case of a zero nullspace Z is of the greatest importance. It says that the columns of A are independent. No combination of columns gives the zero vector (except x = 0). But this can't happen if n > m. We can't have n independent columns in Rᵐ.

Important  Suppose A has more columns than rows. With n > m there is at least one free variable. The system Ax = 0 has at least one nonzero solution.

Suppose Ax = 0 has more unknowns than equations (n > m). There must be at least n - m free columns. Ax = 0 has nonzero solutions in N(A).

The nullspace is a subspace. Its "dimension" is the number of free variables. This central idea, the dimension of a subspace, is defined and explained in this chapter.
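A quick numerical check of Example 3 (my own sketch, not part of the book): SymPy's nullspace method returns exactly these special solutions for M.

```python
# Special solutions of M = [A 2A] from Example 3 (illustrative check).
from sympy import Matrix

M = Matrix([[1, 2, 2, 4],
            [3, 8, 6, 16]])

R, pivots = M.rref()
print("R =", R)                   # [1 0 2 0; 0 1 0 2], pivots in columns 1 and 2
print("pivot columns:", pivots)   # (0, 1)

for s in M.nullspace():           # a basis of N(M): the special solutions
    print("special solution:", s.T, " M*s =", (M * s).T)
# Expect s1 = (-2, 0, 1, 0) and s2 = (0, -2, 0, 1), both giving M*s = 0.
```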
Pivot Columns and Free Columns in the Echelon Matrix R

A has 3 pivot columns and 2 free columns: rank r = 3, to be revealed by R. R has I in its pivot columns and F in its free columns. The special solutions satisfy Rs1 = 0 and Rs2 = 0, and the numbers in those solutions come from F with signs reversed. The same must be true for A, because Ax = CRx = 0: each special solution records how a free column of A combines the pivot columns. Nullspace of A = Nullspace of R.

On the next page you will see simple formulas for the echelon matrix R (I in pivot columns, F in free columns, rank r = 3) and the n - r special solutions to Ax = 0.

Example  This 4 by 7 reduced row echelon matrix R0 has 3 pivots. Delete row 4 to find R.

R0 = [1 0 x x x 0 x; 0 1 x x x 0 x; 0 0 0 0 0 1 x; 0 0 0 0 0 0 0]

Three pivot variables x1, x2, x6.   Four free variables x3, x4, x5, x7.
Four special solutions s in N(R0) = N(R).   The pivot rows and columns contain I.

R = [I F] P has row 4 removed. The permutation P puts column 3 of I into column 6.

Question  What are the column space and the nullspace for this matrix R0?

Answer  The columns of R0 have four components, so they lie in R⁴. (Not in R³!) The fourth component of every column is zero. The column space of R0 consists of all vectors of the form (b1, b2, b3, 0). The nullspace N(R) = N(R0) is a subspace of R⁷. The solutions to R0 x = 0 are combinations of the four special solutions, one for each free variable:

1. Columns 3, 4, 5, 7 have no pivots. So the four free variables are x3, x4, x5, x7.
2. Set one free variable to 1 and set the other three free variables to zero.
3. To find each s, solve Rs = 0 for the pivot variables x1, x2, x6.   Four special solutions.

To repeat: A short wide matrix (n > m) always has nonzero vectors in its nullspace. There must be at least n - m free variables, since the number of pivots cannot exceed m.
The Echelon Form and Special Solutions in Matrix Language

From the examples you see the steps to R0 and R. Chapter 2 produced zeros below the pivots in U. Chapter 3 also has zeros above the pivots in R. All pivots are 1. We now have a systematic way to identify independent columns in A and to reach A = CR. This row echelon form is famous, but its simple matrix formula is seldom given. This page will give formulas for R0 and R, along with the special solutions to Ax = 0. Those n - r special solutions combine to give the nullspace: all solutions to Ax = 0.

R0 comes from elimination (down and up) on A. Here are the basic formulas:

R0 = [I F; 0 0] P        R = [I F] P        A = CR = [C CF] P        (1)

That column permutation P puts the columns of I and F into their correct positions in R. F tells how the independent columns in A combine into the dependent columns.

Special solutions to Ax = 0   Since A has rank r, we expect n - r independent solutions. Ax = 0 gives Rx = [I F] Px = 0. Here I is r by r and F is r by n - r. Thanks to the simplicity of I, and the fact that P Pᵀ = I, we know immediately the matrix S whose n - r columns are the special solutions s1, ..., s_{n-r}:

S = Pᵀ [-F; I]

S has n rows and n - r columns (special solutions). The identity matrix in S has size n - r. Each column has a 1, as special solutions always do. The other nonzeros in that column come directly from F, with signs reversed to -F. The role of Pᵀ is to move the 1's into the right positions (free positions) in these special solutions. If the r independent columns of A come first, then P is the identity matrix and S is truly simple: RS = [I F][-F; I] = -F + F = 0.

Here is a magic factorization that treats rows and columns of A in the same way. C contains the first r independent columns of A as always. Suppose R* contains the first r independent rows of A. (We know that row rank = column rank.) The rows of R* will meet the columns of C in an r by r matrix W. Then A factors into C W⁻¹ R*. The first columns of W⁻¹ R* will be W⁻¹ W = I. The remaining columns will be the free part F. The permutation is just P = I, since the independent rows and columns came first in A. W⁻¹ R* is the same matrix as R = [I F].
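Formula (1) and S = Pᵀ[-F; I] can be checked on the small example from the start of this section (A with pivot columns 1 and 3). This is my own sketch, not the book's computation; the permutation P below simply swaps columns 2 and 3 so that I lands in the pivot positions.

```python
# Checking R = [I F]P and S = P^T [-F; I] on the 3 by 3 example (illustrative).
from sympy import Matrix, eye

A = Matrix([[1, 2, 1], [2, 4, 5], [3, 6, 9]])
r = 2
Ir = eye(r)
F = Matrix([[2], [0]])                          # how the free column combines the pivot columns
P = Matrix([[1, 0, 0], [0, 0, 1], [0, 1, 0]])   # puts column 2 of [I F] into position 3

R = (Ir.row_join(F)) * P                        # [1 2 0; 0 0 1]
S = P.T * (-F).col_join(eye(3 - r))             # special solutions, one per free column

print("R =", R)
print("S =", S.T)                               # (-2, 1, 0)
print("R*S =", (R * S).T)                       # zero: RS = -F + F = 0 after the permutation
print("A*S =", (A * S).T)                       # zero: the special solutions solve Ax = 0
```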
Three Identical Factorizations A = CR

This very optional page completes the presentation of A = CR, always with the same C and R. C contains the first r independent columns of A, and R* contains the first r independent rows of A. C and R* meet in an r by r "mixing matrix" W, and the small example from page 32 has grown into the factorization A = C W⁻¹ R*:

A = C W⁻¹ R*        (m x n) = (m x r)(r x r)(r x n)

A  = any matrix of rank r
C  = first r independent columns of A
R* = first r independent rows of A
W  = intersection of C and R*  (r by r)
R  = W⁻¹ R* = [I F]

Theorem  The r by r matrix W also has rank r.  Write A in block form with W in the corner: A = [W H; J K].

1. Combinations V of the rows of R* = [W H] must produce the dependent rows [J K]. Then A = [W H; VW VH] for some matrix V, and C = [W; VW].

2. Combinations T of the columns of C must produce the dependent columns [H; VH]. Then H = WT and VH = VWT for some matrix T, and R* = [W WT] = W [I T].

3. A = [W WT; VW VWT] = [W; VW][I T] = C W⁻¹ W [I T] = C W⁻¹ R*.

Since A has rank r, its factors must have rank at least r. From their shapes that means rank r. If C and R* were not in the first r columns and rows of A, then permutations P_R of the rows and P_C of the columns give P_R A P_C, and the proof goes through.

I.   Find C and R* and W and W⁻¹ and R = W⁻¹ R* for the transpose of A above.

II.  Explain these statements about the rank of the augmented matrices [A b] and [C D]:
     The rank of A equals the rank of [A b] if and only if Ax = b is solvable.
     The rank of C equals the rank of [C D] if and only if CT = D is solvable.

III. If A = C M R* has sizes (m x r)(r x r)(r x n) and rank A = r, show that rank M = r.
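The identity A = C W⁻¹ R* is easy to test numerically. The matrix below is my own rank-2 example (not the book's page-32 matrix, which did not survive scanning); any matrix whose first r rows and columns are independent will do.

```python
# A = C W^{-1} R*  on an illustrative rank-2 matrix (my own example, not the book's).
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [4, 5, 6],
            [5, 7, 9]])      # row 3 = row 1 + row 2, so rank r = 2

r  = A.rank()
C  = A[:, :r]                # first r independent columns
Rs = A[:r, :]                # first r independent rows  (R* in the text)
W  = A[:r, :r]               # their r by r intersection

print("rank of W:", W.rank())                        # r, as the theorem claims
print("W^(-1) R* =", W.inv() * Rs)                   # equals R = [I F]
print("A == C W^(-1) R*:", A == C * W.inv() * Rs)    # True
```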
91 3.2. The Nullspace of A: Solving Ax = 0 problem Set 3.2 1 Why do A and R = EA have the same nullspace ? We know that E is invertible. 2 Find the row reduced form R and the rank r of A and В (those depend on c). Which are the pivot columns? Find the special solutions to Ax = 0 and Bx = 0. Find special solutions ' 1 2 1 ' 3 6 3 4 8c and В = Create a 2 by 4 matrix R whose special solutions to Rx = 0 arc «i and ej: -3 1 0 0 pivot columns 1 and 3 free variables x? and Xj and £4 are 1,0 and 0,1 Describe all 2 by 4 matrices with this nullspace N(A) spanned by and tj. Reduce A and В to their echelon forms R. Which variables are free? 1 2 2 4 6* 1 2 3 6 9 0 0 12 3 (a) A = (b) В *2 4 2 0 4 4 0 8 8 A = 3 5 For the matrix A in Problem 4, find a special solution to Rx 0 for each free vari- able. Set the free variable to 1. Sei the other free variables to zero. Solve Rx = 0. 6 True or false (with reason if true or example to show it is false): (a) A square matrix has no free variables. (b) An invertible matrix has no free variables. (c) An rn by n matrix has no more than n pivot variables. (d) An m by n matrix has no more than m pivot variables. 7 Put as many l's as possible in a 4 by 7 echelon matrix U whose pivot columns arc (a) 2,4,5 (b) 1,3,6,7 (c) 4 and 6. 8 Put as many l's as possible in a 4 by 8 reduced echelon matrix R so that the free columns are (a) 2,4,5,6 or (b) 1,3,6,7,8. 9 Suppose column 4 of a 3 by 5 matrix is all zero. Then £4 is certainly a _______________ variable. The special solution for this variable is the vector x = _______. 10 Suppose the first and last columns of a 3 by 5 matrix are the same (not zero). Then is a free variable. Find the special solution for this free variable. 11 The nullspace of a 5 by 5 matrix contains only x = 0 when the matrix has -------------- pivots. In that case the column space is R '. Explain why.
92 12 13 14 15 16 17 18 19 20 21 23 24 25 26 Chapter 3. 1 he fundamental Suh ’14, The number of special solutions is by n main* has r pn» . contains only x = 0 when r = Suppose an n m The n £ is r _ -------------------. hv tbc СоипШ V when me tUi—***” 3, - ' 12 * para"'[“ > ’ Iй ' - 0. <v . тъи olane x ' •’» _ ли noints on the plane have the f— 7 he cuiuo... (Recommended) The plane r - 3y - : = i. .. r '«nojnt on this plane is (12,0,0). All points on the p|ane ц -- r, r7 particular point x У t + !/ 11 +* l°l • 0 0 0 , mn з + column 5 = 0 in a 4 by 5 matrix with f0Ur Djv Suppose column 1 + «’1UI” * ц specia| solution? Describe N( Д). P V°u- Which column has no pt vo . w» ‘°"”" ,p“ (l'I,5> <0,3'11 “nd *« nulhpace contains (1,1,2), Construct a matrix whose column space contains (1,1,0) and (0,1,1) and nullspace contains (1,0. !)• Construct a 2 by 2 matrix whose nullspace equals its column space. This is possible Why does no 3 by 3 matrix have a nullspace that equals its column space? If AR = 0 then the column space of В is contained in the------of A. Why? The reduced form R of a 3 by 3 matrix with randomly chosen entries is almost sure to be______. What R is virtually certain if the random A is 4 by 3? If N( A) = all multiples of x (2,1,0,1). what is R and what is its rank? If the special solutions to Rx 0 are in the columns of these nullspace matrices N go backward to find the nonzero rows of the reduced matrices R: Г2 3' 0 1 iV = I 0 and W 0‘ 0 I and N 1 (empty 3 by 1). (a) What are the five 2 by 2 reduced matrices R whose entries arc all 0’s and I’i? (b) What are the eight 1 by 3 matrices containing only 0’s and 1 ’s? Are all eight of them reduced echelon matrices R ? If A is 4 by 4 and invertible, describe the nullspace of the 4 by 8 matrix В = [А А]. Explain why A and -A always have the same reduced echelon form R.
3.2. The Nullspace of A. Solving Ax = 0 93 27 How is the nullspacc N(C) related to the spaces N(A) and N(B). if C = j ? 28 Find the reduced Ro and Я for each of these matrices: 29 Suppose the 2 pivot variables come last instead of first. Describe the reduced matrix R (3 columns) and the nullspace matrix N containing the special solutions. 30 If A has r pivot columns, bow do you know that AT has r pivot columns? Give a 3 by 3 example with different pivot column numbers for A and AT. 31 Fi 11 out these matrices so that they have rank 1: a b c 32 If A is a rank one matrix, the second row of Я is___. Do an example. // A has rank r. then it has an r by r submatrix S that is invertible. Remove m - r rows and n — r columns to find an invertible submatrix S inside A. B. and C. You could keep the pivot rows and pivot columns: 1 0' о 0 0 1 U 5 5] (! i Л 34 Suppose A and В have the same reduced row echelon form Я (a) Show that A and В have the same nullspace and the same row space (b) We know Ei A = R and E?B = R. So A equals an___________matrix times B. 35 Kirchhoff’s Current Law ATy « 0 says that current in - current out at every node. At node 1 this is yj = y( + y« (arrows show the positive direction of each y). Reduce AT lo Я (3 rows) and find three special solutions in the nullspace of AT. -1 0 1-1 0 0 1-1 0 0-10 0 1-1 0 0 1 0 0 0 1 1 1 3
з. The Four Fundamental Subspac^ C contains the r pivxM columns of A. Find the r pivo< columns of CT >r . Transpose r bv r * '* —hmatrix 5 inside A: * Ъ). 94 36 37 36 39 40 41 Г1 2 3 ЛгЛ» 2 4 6 2 4 7J find C (3 by 2) thcn “*,nvcrtible S (2 by 2). Why is the column space C(AB) • subspace of C(A) ? Then rank(.4B) < Suppose column j of В is a «unb.nat.on of previous columns of В Show XT, «< ЛВ U °'AB л" cannot have new pivot columns, so rank(AB) < rank(B) 'Important) Suppose .4 and Я are n by n matrices. and ЛВ = / Prove fron, nnk(AB) < гапк(Л) that the rank of A is n. So A is invertible and В must be it, inverse Therefore BA - I <»htch is not so obvions!). If A is 2 by 3 and В is 3 by 2 and А В ml. show from its rank that В A * I. Givc an example of A and В with AB = I For m < n, a right inverse is not a left inverse. What is the nullspace mains N (containing the special solutions) for A. В, C ? Г - 2 by 2 Nocks Л«(/ /] and В and C-[/ 1 /). 42 Suppose Л is an m by n mains of rank r lb reduced echelon form (including any zero rows) is Яо Describe exactly the matrix Z (its shape and all its entries) that comes from transposing the redisced ton echelon form of Rq . Z (rref (Д r))T 43 (Recommended) Suppose Яо = j is m by n of rank r. Pivot columns first: (a) W hat are lhe shapes of those four blocks, based on m and n and r? (b) Find a right inverse В with RtlB = I if r “ m. The zero blocks are gone, (c) Find a left inverse C with CRo ~ J if r m n. The F and 0 column is gone, (d) What is the reduced row echelon form of R° (with shapes)? (e) What is the reduced row echelon form af RqRq (with shapes)? Suppose you allow elementary column operations on A as well as elementary row- operations (which get to Ro). What is the "row-and-column reduced form” for an rn by n matrix A of rank r? Verify that equation (I) on page 89 is correct: IV is invertible and IV' The magk factorization is easy if the first r rows and columns of Л are independent. What multiple of block row 1 will equal block row 2 of this matrix ? [ W 1[ IV‘ ][ IV Я ] Г IV Я 1 I J ] = [ J JW~'H I
Elimination: The Big Picture

This page explains elimination at the vector level and subspace level, when A is reduced to R. You know the steps and I won't repeat them. Elimination starts with the first pivot. It moves a column at a time (left to right) and a row at a time (top to bottom) for U. Then upward elimination produces R0 and R. Elimination answers two questions:

Question 1  Is this column a combination of previous columns?
If the column contains a pivot, the answer is no. Pivot columns are "independent" of previous columns. If column 4 has no pivot, it is a combination of columns 1, 2, 3.

Question 2  Is this row a combination of previous rows?
If the row contains a pivot, the answer is no. Pivot rows are independent of previous rows, and their first nonzero is 1 from I. Rows that are all zero in R0 were and are not independent, and they disappear in R.

It is amazing to me that one pass through the matrix answers both questions 1 and 2. Elimination acts on the rows but the result tells us about the columns! The identity matrix in R locates the first r independent columns in A. Then the free columns F in R tell us the combinations of those independent columns that produce the dependent columns in A. This is easy to miss without seeing the factorization A = CR.

R tells us the special solutions to Ax = 0. We could reach R from A by different row exchanges and elimination steps, but it will always be the same R. (This is because the special solutions are decided by A. The formula comes before Problem Set 3.2.) In the language coming soon, R reveals a "basis" for three of the fundamental subspaces:

The column space of A: choose the columns of A that produce pivots in R.
The row space of A: choose the rows of R as a basis.
The nullspace of A: choose the special solutions to Rx = 0 (and Ax = 0).

For the left nullspace N(Aᵀ), we look at the elimination step EA = R0. The last m - r rows of R0 are zero. The last m - r rows of E are a basis for the left nullspace! In reducing [A I] to [R0 E], the matrix E keeps a record of elimination that is otherwise lost.

Suppose we fix C and R* (m by r and r by n, both rank r). Choose any invertible r by r mixing matrix M. All the matrices C M R* (and only those) have the same four fundamental subspaces as the original A.
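The remark about reducing [A I] to [R0 E] can be checked directly. Here is my own sketch, again using the 3 by 3 example A from this section: the rref of [A I] shows R0 in its first columns and one valid record E of the row operations in its last columns, and the last row of that E multiplies A to give the zero row.

```python
# Recovering a valid E from [A I] -> [R0 E] and checking the left nullspace (illustrative).
from sympy import Matrix, eye

A = Matrix([[1, 2, 1], [2, 4, 5], [3, 6, 9]])
m, n = A.shape

aug, _ = A.row_join(eye(m)).rref()    # eliminate on [A I]
R0 = aug[:, :n]                       # reduced row echelon form of A
E  = aug[:, n:]                       # record of the row operations: E*A = R0

print("E*A == R0:", E * A == R0)          # True
print("last row of E:", E[m - 1, :])      # a basis vector for the left nullspace N(A^T)
print("(last row of E)*A =", E[m - 1, :] * A)   # the zero row
```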
3.3  The Complete Solution to Ax = b

1  Complete solution to Ax = b:  x = (one particular solution x_p) + (any x_n in the nullspace).
2  Elimination on [A b] leads to [R0 d]. Ax = b is solvable exactly when the zero rows of R0 have zeros in d.
3  When R0 x = d is solvable, one very particular solution x_p has all free variables equal to zero.
4  A has full column rank r = n when its nullspace N(A) = zero vector: no free variables.
5  A has full row rank r = m when its column space C(A) is Rᵐ: Ax = b is always solvable.

The last section totally solved Ax = 0. Elimination converted the problem to R0 x = 0. The free variables were given special values (one and zero). Then the pivot variables were found by back substitution. We paid no attention to the right side b because it stayed at zero. Then zero rows in R0 were no problem.

Now b is not zero. Row operations on the left side must act also on the right side. Ax = b is reduced to a simpler system R0 x = d with the same solutions (if any). One way to organize that is to add b as an extra column of the matrix. I will "augment" A with the right side (b1, b2, b3) = (1, 6, 7) to produce the augmented matrix [A b]:

A = [1 3 0 2; 0 0 1 4; 1 3 1 6]   has the augmented matrix   [A b] = [1 3 0 2 1; 0 0 1 4 6; 1 3 1 6 7].

When we apply the usual elimination steps to A, reaching R0, we also apply them to b. In this example we subtract row 1 from row 3. Then we subtract row 2 from row 3. This produces a row of zeros in R0, and it changes b to a new right side d = (1, 6, 0):

R0 = [1 3 0 2; 0 0 1 4; 0 0 0 0]   has the augmented matrix   [R0 d] = [1 3 0 2 1; 0 0 1 4 6; 0 0 0 0 0].

That very last row is crucial. The third equation has become 0 = 0. So the equations can be solved. In the original matrix A, the first row plus the second row equals the third row. To solve Ax = b, we need b1 + b2 = b3 on the right side too. The all-important property of b was 1 + 6 = 7. That led to 0 = 0 in the third equation. This was essential. Here are the same augmented matrices for a general b = (b1, b2, b3):

[A b] = [1 3 0 2 b1; 0 0 1 4 b2; 1 3 1 6 b3]   ->   [R0 d] = [1 3 0 2 b1; 0 0 1 4 b2; 0 0 0 0 b3 - b1 - b2]

Now we get 0 = 0 in the third equation only if b3 - b1 - b2 = 0. This is b1 + b2 = b3.
One Particular Solution  Ax_p = b

For an easy solution x_p, choose the free variables to be zero: x2 = x4 = 0. Then the two nonzero equations give the two pivot variables x1 = 1 and x3 = 6. Our particular solution to Ax = b (and also R0 x = d) is x_p = (1, 0, 6, 0). This particular solution is my favorite: free variables are zero, pivot variables come from d. For a solution to exist, zero rows in R0 must also have zeros in d. Since I sits in the pivot rows and pivot columns of R0, the pivot variables in x_p come from d:

R0 x_p = [1 3 0 2; 0 0 1 4; 0 0 0 0] (1, 0, 6, 0) = (1, 6, 0) = d
Pivot variables 1, 6.   Free variables 0, 0.   Solution x_p = (1, 0, 6, 0).

Notice how we choose the free variables (as zero) and solve for the pivot variables. After the row reduction to R0, those steps are quick. When the free variables are zero, the pivot variables for x_p are already seen in the right side vector d.

x_particular   The particular solution solves   Ax_p = b
x_nullspace    The n - r special solutions solve   Ax_n = 0

That particular solution is (1, 0, 6, 0). The two special (nullspace) solutions to R0 x = 0 come from the two free columns of R0, by reversing signs of 3, 2, and 4. Please notice how I write the complete solution x_p + x_n to Ax = b:

Complete solution: one x_p + many x_n        x = x_p + x_n = (1, 0, 6, 0) + x2 (-3, 1, 0, 0) + x4 (-2, 0, -4, 1)
x_p : free variables = 0        x_n : special solutions

Question  Suppose A is a square invertible matrix, m = n = r. What are x_p and x_n?

Answer  The particular solution is the one and only solution x_p = A⁻¹b. There are no special solutions or free variables. R0 = I has no zero rows. The only vector in the nullspace is x_n = 0. The complete solution is x = x_p + x_n = A⁻¹b + 0.

We didn't mention the nullspace in Chapter 2, because A was invertible. It was reduced all the way to I. [A b] went to [I A⁻¹b]. Then Ax = b became x = A⁻¹b, which is d. This is a special case here, but square invertible matrices are the best. So they got their own chapter before this one.

For small examples we can reduce [A b] to [R0 d]. For a large matrix, MATLAB does it better. One particular solution (not necessarily ours) is x = A\b from backslash.

Here is an example with full column rank. Both columns have pivots.

Example 1  Find the condition on (b1, b2, b3) for Ax = b to be solvable, if b = (b1, b2, b3).
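Here is the same computation in code (my own sketch): rref of the augmented matrix gives [R0 d], the particular solution copies the entries of d into the pivot positions, and the nullspace supplies the special solutions.

```python
# x_p plus nullspace for A x = b with A = [1 3 0 2; 0 0 1 4; 1 3 1 6], b = (1, 6, 7).
from sympy import Matrix, zeros

A = Matrix([[1, 3, 0, 2], [0, 0, 1, 4], [1, 3, 1, 6]])
b = Matrix([1, 6, 7])

aug, pivots = A.row_join(b).rref()      # [R0 d]
d = aug[:, -1]

xp = zeros(A.cols, 1)                   # free variables = 0
for i, col in enumerate(pivots):        # pivot variables copied from d
    xp[col] = d[i]

print("x_p =", xp.T)                    # (1, 0, 6, 0)
print("A*x_p =", (A * xp).T)            # (1, 6, 7) = b
for s in A.nullspace():                 # special solutions (-3, 1, 0, 0) and (-2, 0, -4, 1)
    print("special solution:", s.T)
```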
Solution  Use the augmented matrix, with its extra column b. Subtract row 1 of [A b] from row 2, and add 2 times row 1 to row 3, to reach [R0 d]:

[A b] = [1 1 b1; 1 2 b2; -2 -3 b3]  ->  [1 1 b1; 0 1 b2 - b1; 0 -1 b3 + 2b1]  ->  [1 0 2b1 - b2; 0 1 b2 - b1; 0 0 b3 + b1 + b2] = [R0 d]

The last equation is 0 = 0 provided b3 + b1 + b2 = 0. This is the condition to put b in the column space. Then Ax = b will be solvable. The rows of A add to the zero row. So for consistency (these are equations!) the entries of b must also add to zero.

This example has no free variables since n - r = 2 - 2. Therefore no special solutions: the nullspace solution is x_n = 0. The particular solution to Ax = b and R0 x = d sits at the top of the final column d:

One solution to Ax = b        x = x_p + x_n = (2b1 - b2, b2 - b1) + (0, 0).

If b3 + b1 + b2 is not zero, there is no solution to Ax = b (x_p and x don't exist).

This example is typical of an extremely important case: A has full column rank. Every column has a pivot. The rank is r = n. The matrix is tall and thin (m > n). Row reduction puts I at the top, when A is reduced to R0 with rank n:

Full column rank r = n        R0 = [I; 0] = [n by n identity matrix; m - n rows of zeros]

There are no free columns or free variables. The nullspace is Z = {zero vector}.

We will collect together the different ways of recognizing this type of matrix.

Every matrix A with full column rank (r = n) has all these properties:

1. All columns of A are pivot columns. No free variables.
2. The nullspace N(A) contains only the zero vector x = 0.
3. If Ax = b has a solution (it might not) then it has only one solution.

In the essential language of the next section, this A has independent columns: Ax = 0 only happens when x = 0. Later we will add one more fact to the list above: the rows of A span Rⁿ when the rank is r = n. A may have many rows.

In this case the nullspace of A has shrunk to the zero vector. The solution to Ax = b, when it exists, is unique.
Full Row Rank and the Complete Solution

The other extreme case is full row rank. Now Ax = b has one or infinitely many solutions. In this case A must be short and wide (m <= n). A matrix has full row rank if r = m. The rows are independent and every row has a pivot. Here is an example.

Example 2  This system Ax = b has n = 3 unknowns but only m = 2 equations:

Full row rank          x + y + z = 3
(rank r = m = 2)       x + 2y - z = 4

These are two planes in xyz space. The planes are not parallel, so they intersect in a line. This line of solutions is exactly what elimination will find. The particular solution will be one point on the line. Adding the nullspace vectors x_n will move us along the line in Figure 3.1. Then x = x_p + all x_n gives the whole line of solutions.

Figure 3.1: Complete solution = one particular solution x_p + all nullspace solutions x_n.

We find x_p and x_n by elimination downwards and then upwards on [A b]:

[1 1 1 3; 1 2 -1 4]  ->  [1 1 1 3; 0 1 -2 1]  ->  [1 0 3 2; 0 1 -2 1] = [R d].

The particular solution (2, 1, 0) has free variable x3 = 0. It comes directly from d. The special solution s has x3 = 1. Then -x1 and -x2 come from the free column of R. It is wise to check that x_p and s satisfy the original equations Ax_p = b and As = 0:

2 + 1 = 3        -3 + 2 + 1 = 0
2 + 2 = 4        -3 + 4 - 1 = 0

The nullspace solution x_n is any multiple of s. It moves along the line of solutions, starting at x_particular. Please notice again how to write the answer:

Complete solution        x = x_p + x_n = (2, 1, 0) + x3 (-3, 2, 1)        Particular + nullspace

This line of solutions is drawn in Figure 3.1. Any point on the line could have been chosen as the particular solution. We chose x_p as the point with x3 = 0.
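Example 2 in code (my own sketch): rref of [A b] gives x_p = (2, 1, 0), the nullspace gives s = (-3, 2, 1), and every point x_p + t s on the line really does solve Ax = b.

```python
# The line of solutions for Example 2: x = x_p + t*s (illustrative check).
from sympy import Matrix, zeros

A = Matrix([[1, 1, 1], [1, 2, -1]])
b = Matrix([3, 4])

aug, pivots = A.row_join(b).rref()
xp = zeros(3, 1)
for i, col in enumerate(pivots):
    xp[col] = aug[i, -1]                # x_p = (2, 1, 0), free variable x3 = 0

s = A.nullspace()[0]                    # s = (-3, 2, 1)

print("x_p =", xp.T, " s =", s.T)
for t in [0, 1, -2]:                    # every point x_p + t*s lies on the solution line
    print("A*(x_p + %d*s) =" % t, (A * (xp + t * s)).T)   # always (3, 4) = b
```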
The particular solution is not multiplied by an arbitrary constant! The special solution is: the nullspace needs that constant, and you understand why.

Now we summarize this short wide case of full row rank. With m < n, Ax = b is underdetermined (many solutions).

Every matrix A with full row rank (r = m) has all these properties:

1. All rows have pivots, and R0 has no zero rows: R0 = R.
2. Ax = b has a solution for every right side b.
3. The column space is the whole space Rᵐ.
4. There are n - r = n - m special solutions in the nullspace of A.

In this case with m pivots, the rows are linearly independent. So the columns of Aᵀ are linearly independent. The nullspace of Aᵀ contains only the zero vector. And that nullspace N(Aᵀ) will be the fourth fundamental subspace.

We are ready for the definition of linear independence, as soon as we summarize the four possibilities, which depend on the rank. Notice how r, m, n are the critical numbers.

The four possibilities for linear equations depend on the rank r:

r = m and r = n    Square and invertible    Ax = b has 1 solution
r = m and r < n    Short and wide           Ax = b has infinitely many solutions
r < m and r = n    Tall and thin            Ax = b has 0 or 1 solution
r < m and r < n    Not full rank            Ax = b has 0 or infinitely many solutions

The reduced R0 will fall in the same category as the matrix A. In case the pivot columns happen to come first, we can display these four possibilities. For R0 x = d and Ax = b to be solvable, d must end in m - r zeros. F is the free part of R0.

Four types for R0:   [I]   [I F]   [I; 0]   [I F; 0 0]     Their ranks:   r = m = n,   r = m < n,   r = n < m,   r < m and r < n

Cases 1 and 2 have full row rank r = m. Cases 1 and 3 have full column rank r = n.

REVIEW OF THE KEY IDEAS

1. The rank r is the number of pivots. The matrix R0 has m - r zero rows.
2. Ax = b is solvable if and only if the last m - r equations reduce to 0 = 0.
3. One particular solution x_p has all free variables equal to zero.
4. The pivot variables are determined after the free variables are chosen.
5. Full column rank r = n means no free variables: one solution or none.
6. Full row rank r = m means one solution if m = n or infinitely many if m < n.
WORKED EXAMPLES

3.3 A  This question connects elimination (pivot columns and back substitution) to column space-nullspace-rank-solvability (the higher level picture). A has rank 2:

Ax = b  is    x1 + 2x2 + 3x3 + 5x4 = b1
              2x1 + 4x2 + 8x3 + 12x4 = b2          A = [1 2 3 5; 2 4 8 12; 3 6 7 13]
              3x1 + 6x2 + 7x3 + 13x4 = b3

1. Reduce [A b] to [U c], so that Ax = b becomes a triangular system Ux = c.
2. Find the condition on b1, b2, b3 for Ax = b to have a solution.
3. Describe the column space of A. Which plane in R³?
4. Describe the nullspace of A. Which special solutions in R⁴?
5. Reduce [U c] to [R0 d]: special solutions from R0, particular solution from d.
6. Find a particular solution to Ax = (0, 6, -6) and then the complete solution.

Solution

1. The multipliers in elimination are 2 and 3 and -1. They take [A b] into [U c]:

[1 2 3 5 b1; 2 4 8 12 b2; 3 6 7 13 b3]  ->  [1 2 3 5 b1; 0 0 2 2 b2 - 2b1; 0 0 -2 -2 b3 - 3b1]  ->  [1 2 3 5 b1; 0 0 2 2 b2 - 2b1; 0 0 0 0 b3 + b2 - 5b1]

2. The last equation shows the solvability condition b3 + b2 - 5b1 = 0. Then 0 = 0.

3. First description: The column space is the plane containing all combinations of the pivot columns (1, 2, 3) and (3, 8, 7). The pivots are in columns 1 and 3. Second description: The column space contains all vectors with b3 + b2 - 5b1 = 0. That makes Ax = b solvable, so b is in the column space. All columns of A pass this test b3 + b2 - 5b1 = 0. This is the equation for the plane in the first description.

4. The special solutions have free variables x2 = 1, x4 = 0 and then x2 = 0, x4 = 1:

Special solutions to Ax = 0:   s1 = (-2, 1, 0, 0)  and  s2 = (-2, 0, -1, 1)
(Back substitution in Ux = 0, or change signs of 2, 2, 1 in R0.)

The nullspace N(A) in R⁴ contains all x_n = c1 s1 + c2 s2.

5. In the reduced form R0, the third column changes from (3, 2, 0) in U to (0, 1, 0). The right side c = (0, 6, 0) becomes d = (-9, 3, 0), showing -9 and 3 in

[U c] = [1 2 3 5 0; 0 0 2 2 6; 0 0 0 0 0]  ->  [R0 d] = [1 2 0 2 -9; 0 0 1 1 3; 0 0 0 0 0]

6. x_p = (-9, 0, 3, 0) is the very particular solution with free variables = zero. The complete solution to Ax = (0, 6, -6) is x = x_p + x_n = x_p + c1 s1 + c2 s2.
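A compact check of steps 2 to 6 (my own sketch): the solvability condition, the special solutions, and the particular solution for b = (0, 6, -6) all come out of elimination and rref.

```python
# Worked Example 3.3 A in code (illustrative check of the hand computation).
from sympy import Matrix, zeros, symbols

A = Matrix([[1, 2, 3, 5], [2, 4, 8, 12], [3, 6, 7, 13]])
b1, b2, b3 = symbols('b1 b2 b3')

# Step 2: eliminate on [A b] with a symbolic right side to see the condition on b.
aug = A.row_join(Matrix([b1, b2, b3]))
r1, r2, r3 = aug[0, :], aug[1, :], aug[2, :]
r2 = r2 - 2 * r1
r3 = r3 - 3 * r1 + r2
print("last row of [U c]:", r3)            # (0, 0, 0, 0, b3 + b2 - 5*b1): the condition on b

# Steps 4-6 for the solvable right side b = (0, 6, -6).
b = Matrix([0, 6, -6])
R0d, pivots = A.row_join(b).rref()
xp = zeros(4, 1)
for i, col in enumerate(pivots):
    xp[col] = R0d[i, -1]
print("x_p =", xp.T)                                          # (-9, 0, 3, 0)
print("special solutions:", [s.T for s in A.nullspace()])     # (-2,1,0,0) and (-2,0,-1,1)
```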
3.3 B  Suppose you know this much about the solutions to Ax = b for one specific right side b. What does that tell you about m and n and r, and about A itself?

1. There is exactly one solution.
2. All solutions to Ax = b have the form x = (1, 1, 0) + c (1, 0, 1).
3. There are no solutions.

Solution  In case 1 there are no free variables: the nullspace of A contains only x = 0 and the rank is r = n. Necessarily m >= n: A has n independent columns.

In case 2, A must have n = 3 columns (and m is arbitrary). With (1, 0, 1) in the nullspace of A, column 3 is -(column 1). The rank is 3 - 1 = 2, and b is column 1 + column 2 because (1, 1, 0) is a particular solution.

In case 3 we only know that b is not in the column space of A. The rank of A must be less than m. Surely b is not zero, otherwise x = 0 would be a solution.

3.3 C  Find the complete solution x = x_p + x_n by forward elimination on [A b]:

[A b] = [1 2 1 0 4; 2 4 4 8 2; 4 8 6 8 10]

Solution

[1 2 1 0 4; 2 4 4 8 2; 4 8 6 8 10]  ->  [1 2 1 0 4; 0 0 2 8 -6; 0 0 2 8 -6]  ->  [1 2 1 0 4; 0 0 2 8 -6; 0 0 0 0 0]  ->  [R0 d] = [1 2 0 -4 7; 0 0 1 4 -3; 0 0 0 0 0]

For the nullspace part x_n with b = 0, set the free variables x2, x4 to 1, 0 and also 0, 1:

Special solutions  s1 = (-2, 1, 0, 0)  and  s2 = (4, 0, -4, 1)        Particular  x_p = (7, 0, -3, 0)

Then the complete solution to Ax = b is x_complete = x_p + c1 s1 + c2 s2.

The rows of A produced the zero row from 2(row 1) + (row 2) - (row 3) = (0, 0, 0, 0). Thus y = (2, 1, -1). The same combination for b = (4, 2, 10) gives 2(4) + (2) - (10) = 0. If a combination of the rows (on the left side) gives the zero row, then the same combination must give zero on the right side. Of course! Otherwise no solution.

Later we will say this again in different words: If every column of A is perpendicular to y = (2, 1, -1), then any combination b of those columns must also be perpendicular to y. Otherwise b is not in the column space and Ax = b is not solvable.

And again: If y is in the nullspace of Aᵀ then y must be perpendicular to every b in the column space of A. Just looking ahead...
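That "looking ahead" remark can be checked in one line each (my own sketch): y = (2, 1, -1) multiplies A to give the zero row, y is perpendicular to b, and the particular solution really hits b.

```python
# Checking the left-nullspace vector y for Worked Example 3.3 C (illustrative).
from sympy import Matrix

A = Matrix([[1, 2, 1, 0], [2, 4, 4, 8], [4, 8, 6, 8]])
b = Matrix([4, 2, 10])
y = Matrix([2, 1, -1])

print("y^T A =", (y.T * A))        # the zero row: y is in N(A^T)
print("y . b =", (y.T * b)[0])     # 0: b satisfies the same combination, so Ax = b is solvable
print("x_p check:", (A * Matrix([7, 0, -3, 0])).T)    # (4, 2, 10) = b
```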
103 3.3. The Complete Solution to Ax = b Problem Set 33 3 (Recommended) Execute the six steps of Worked Example 3.3 A to describe the column space and nullspace of A and the complete solurion to Ax = b 2 4 6 4 2 5 7 6 2 3 5 2 Г I b= 5, [ Ьз J 4 3 5 A = 2 Cany out the same six steps for this matrix A with rank one. You will find two conditions on bi, 63,63 for Ax — 6 to be solvable. Together these two conditions put b into the------------space (two planes give a line): |2 I 3) = bi bj bj 4 2 10 311 n Questions 3-15 are about the solution of Ax = b. Follow the steps in lhe text to x,, and xn. Start from the augmented matrix with last column 6. 3 Write lhe complete solution as xf plus any multiple of a in lhe nullspace: x + 3y = 7 2x + 6y=l4 x + Зу + За = 1 2x + 6p + 9t - 5 —x - 3jr + 3a = 5 4 Find the complete solution x — x,+ any z„ (also called the general solution) to 1 2 0 3 1 2‘ 6 4 8 0 2 4 5 Underwhatconditionsonbi.bj.bjarethesesystemssolvable? Include b as a fourth column in elimination. Find all solutions when that condition holds: x + 2p — 2a = bi 2x + 2a ” b> 2х + 5у-4г = Ьз 4-r + 4y = t>j 4x + 9y — 8a = 63 8x + 8y = 6з 6 What conditions on bi, 6j, 63, 64 make each system solvable? Find x in that case: 1 2 2 3 2' 4 5 9 bi bi bj b4 12 3 2 4 6 2 5 7 3 9 12
104 10 11 12 13 The ranker = Copter 3. The Four Fundamental Sub^ . . is in the column space if h - 2^ + 4/>] Г1 3 1 3 8 2 . 2 4 0. . s.i«in the column space of Л? Which combinations of the Which vectors (•>!• 03; rows of Л give zero? 1 2 1 2 6 3 0 2 3 - form z. + *» ю d** ful1 rank systcms: Find the complete solution in the form z, (b) A 1 1 1' 1 2 4 2 4 8 (b) Construct a 2 by 3 Az - b wit.^particular so.ution x, - (2,4,0) and homogeneous solution x. - »У mull,Pkr " <*• *• 1,1 Why can’t a I by 3 system have « (2.4.0) and x. - any multiple of (1,1,1)? (a) If Az - b has two solutions z, and x2. find two solutions to Ax = 0. (b) Then find another solution to Az - 0 and another solution to Ax - b Explain why these are all false: (a) The complete solution is any linear combination of aj, and xn. (b) A system Az - b has at most one particular solution. This is true if A is 7 8 9 14 15 W A (c) The solution z, with all free variables zero is the shortest solution (minimum length ||z||). Find a 2 by 2 counterexample. (d) If A is invertible there is no solution x„ in the nullspace. Suppose column 5 of (/’has no pivot Then x5 is a____ variable. The zero vector (is) (is not) the only solution to Az = 0. If Az = b has a solution, then it has solutions. Suppose row 3 of U has no pivot. Then that row is___The equation Ux = e 1^ли°,*Л’е pro*'<W-----------C4uatlon Ax ’ 6 (й) (“ no,) not
33. The Complete Solution to Ax = b 105 16 The largest possible rank of a 3 by 5 matrix is_______. Then there is a pivot in every -— and R. The solution to Ax = b (always exists) (is unique). The column space of A is________An example is A =__________ 57 The largest possible rank of a 6 by 4 matrix is _________. Then there is a pivot in every ------of U and R. The solution to Az = b (ofweyr exists) (is unique). The nullspace of A is_______An example is A =__________. 18 Find by elimination the rank of A and also the rank of AT: 1 4 O' 2 11 5 11 5 2 10 1 0 Г and A = 1 1 2 (rank depends on q). .1 1 Я. 19 If Az = b has infinitely many solutions, why is it impossible for Az » В (new right side) to have only one solution? Could Az “ В have no solution ? 20 Choose the number q so that (if possible) the ranks are (a) I (b)2 (c)3: 6 4 2 -3 -2 -1 9 6 q 21 Give examples of matrices A for which the number of solutions to Az = b is (a) 0 or 1, depending oo b (b) oc. regardless of b (c) 0 or oo, depending on b (d) I, regardless of b. 22 Write down all known relations between r and m and n if Az b has (a) no solution for some b (b) one solution for some b. no solution for other b (c) infinitely many solutions for every b (d) exactly one solution for every b. Questions 23-27 are about the reduced echelon matrices Ro and R. 23 Divide rows by pivots. Then produce zeros above those pivots to reach Ro »nd R- 2 4 4' 0 3 6 0 0 0 and U= 2 4 4' 0 3 6 0 0 5 24 If A is a triangular matrix, when is Ro - rref(A) equal to I ? 25 Apply elimination to Ux = 0 and Ux = c. Reach R«z = 0 and Roz = d : l«'»l-[J o’ °i “d l" •=!-[' 1 2 3 51 0 0 4 Solve Roz = 0 to find xn with z2 = 1. Solve Roz = d to find xr with za = 0.
Chapter 3. The Four Fundamental Subsp^ = - c . to - 0 - - d Wl“’m •» Яо1=<*? 106 26 27 28 29 30 31 32 33 '3 0 о 0 о 0 6 9' 0 4 . 2 5 3 0 6 0 0 0 U с and 0 0 0 2 _ ,,x _ c (Gaussian elimination) and then R»x = d. и о 0 0 Find a panicuia, and all homogeneous solutions zn. 10 2 3 13 2 0 2 0 4 9 ’ 2 5 10 = b. Hnd matrices A and В with the gisen property or explain why you can’t: 4 "[!]• (a) The only solution of Ar - 0 1 . (b) The only solution of Bz = j I t isz = 1 2 . 3 Find the LU factorization of A and all solutions to Ax = b: The complete solution to Az = 0 1 1 0 0 0 . Find A. (Recommended!) Suppose you know that the 3 by 4 matrix A has the vector s = (2,3.1,0) as the only special solution to Ax = 0. (a) What is the rank of A and the complete solution to Ax = 0? (b) What is the exact row reduced echelon form Яо of A? (c) H°* do you know that Ax = Ь can be solved for all b ? to AequAC?* = 6 ,he Mme <comPle,e) solutions for every b. Describe the column snace nf . j . Removing any zero rouT a- . educed row echelon matrix Rq with rank r. g any zero rows, descnbe the column space of R.
3.4  Independence, Basis, and Dimension

1  Independent vectors: The only zero combination c1 v1 + ... + ck vk = 0 has all c's = 0.
2  The vectors v1, ..., vk span the space S if S = all combinations of the v's.
3  The vectors v1, ..., vk are a basis for S if (1) they are independent and (2) they span S.
4  The dimension of a space S is the number k of vectors in every basis for S.

This important section is about the true size of a subspace. There are n columns in an m by n matrix. But the true "dimension" of the column space is not necessarily n. The dimension of C(A) is measured by counting independent columns. We will see again that the true dimension of the column space is the rank r.

The idea of independence applies to any vectors v1, ..., vn in any vector space. Most of this section concentrates on the subspaces that we know and use, especially the column space and the nullspace of A. In the last part we also study "vectors" that are not column vectors. They can be matrices and functions; they can be linearly independent or dependent. First come the key examples using column vectors.

The goal is to understand a basis: independent vectors that "span the space". Every vector in the space is a unique combination of the basis vectors. We are at the heart of our subject, and we cannot go on without a basis. The four essential ideas in this section (with first hints at their meaning) are:

1. Independent vectors  (no extra vectors)
2. Spanning a space  (enough vectors to produce the rest)
3. Basis for a space  (not too many and not too few)
4. Dimension of a space  (the number of vectors in every basis)

Linear Independence

Our first definition of independence is not so conventional, but you are ready for it.

DEFINITION  The columns of A are linearly independent when the only solution to Ax = 0 is x = 0. No other combination Ax of the columns gives the zero vector.

The columns are independent when the nullspace N(A) contains only the zero vector.

Let me illustrate linear independence (and dependence) with three vectors in R³.
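That definition translates directly into a computation (my own sketch, not part of the book): put the vectors into the columns of A and ask whether N(A) contains only the zero vector, i.e. whether the rank equals the number of columns.

```python
# Testing linear independence of column vectors (illustrative sketch).
from sympy import Matrix

def independent(vectors):
    """Columns are independent exactly when rank = number of columns (nullspace = {0})."""
    A = Matrix.hstack(*[Matrix(v) for v in vectors])
    return A.rank() == A.cols

print(independent([(1, 0, 0), (1, 1, 0), (1, 1, 1)]))    # True: three independent vectors in R^3
print(independent([(1, 2, 3), (2, 4, 6)]))               # False: the second is twice the first
print(independent([(1, 0), (0, 1), (1, 1)]))             # False: three vectors in R^2 must be dependent
```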
1. If three vectors v1, v2, v3 are not in the same plane, they are independent. No combination of v1, v2, v3 gives zero except 0v1 + 0v2 + 0v3.

2. If three vectors w1, w2, w3 are in the same plane in R³, they are dependent.

Figure 3.2: Independent: only 0v1 + 0v2 + 0v3 gives 0. Dependent: w1 - w2 + w3 = 0.

This idea of independence applies to 7 vectors in 12-dimensional space. If they are the columns of A, and independent, the nullspace only contains x = 0. None of the vectors is a combination of the other six vectors. Now we choose different words to express the same idea in any vector space.

DEFINITION  The sequence of vectors v1, ..., vn is linearly independent if the only combination that gives the zero vector is 0v1 + 0v2 + ... + 0vn.

Linear independence:  x1 v1 + x2 v2 + ... + xn vn = 0  only happens when all x's are zero.

If a combination gives 0, when the x's are not all zero, the vectors are dependent.

Correct language: "The sequence of vectors is linearly independent." Acceptable shortcut: "The vectors are independent." Unacceptable: "The matrix is independent."

A sequence of vectors is either dependent or independent. They can be combined to give the zero vector (with nonzero x's) or they can't. So the key question is: Which combinations of the vectors give zero? We begin with some small examples in R²:

(a) The vectors (1, 0) and (1, 0.00001) are independent.
(b) The vectors (1, 1) and (-1, -1) are dependent.
(c) The vectors (1, 1) and (0, 0) are dependent because of the zero vector.
(d) In R², any three vectors (a, b), (c, d), (e, f) are dependent.

Dependent columns:  (1, 1) is in the nullspace.  [1 -1; 1 -1][x1; x2] = 0 for x1 = 1 and x2 = 1.

Three vectors in R² cannot be independent. One way to see this: the matrix A with those three columns must have a free variable and then a special solution to Ax = 0. Now move to three vectors in R³. If one of them is a multiple of another one, these vectors are dependent. But the complete test involves all three vectors at once. We put them in a matrix and try to solve Ax = 0.
Example 1  The columns of this A are dependent: Ax = 0 has a nonzero solution. The rank is only r = 2. Independent columns would produce full column rank r = n = 3. For a square matrix, dependent columns imply dependent rows and vice versa.

Question  How to find that solution to Ax = 0? The systematic way is elimination.

Full column rank  The columns of A are independent exactly when the rank is r = n. There are n pivots and no free variables and A = C. Only x = 0 is in the nullspace.

One case is of special importance because it is clear from the start. Suppose seven columns have five components each (m = 5 is less than n = 7). Then the columns must be dependent. Any seven vectors from R⁵ are dependent. The rank of A cannot be larger than 5. There cannot be more than five pivots in five rows. Ax = 0 has at least 7 - 5 = 2 free variables, so it has nonzero solutions, which means that the columns are dependent.

Any set of n vectors in Rᵐ must be linearly dependent if n > m.

This type of matrix has more columns than rows: it is short and wide. The columns are certainly dependent if n > m, because Ax = 0 has a nonzero solution. The columns might be dependent or might be independent if n < m. Elimination will reveal the r pivot columns. It is those r pivot columns that are independent in C.

Note  Another way to describe linear dependence is this: "One vector is a combination of the other vectors." That sounds clear. Why don't we say this from the start? Our definition was longer: "Some combination gives the zero vector, other than the trivial combination with every x = 0." We must rule out the easy way to get the zero vector. The point is, our definition doesn't pick out one particular vector as guilty. All columns of A are treated the same. We look at Ax = 0, and it has a nonzero solution or it hasn't.
Vectors that Span a Subspace

No single word describes the column space C(A) better: the column space consists of all combinations Ax of the columns. The columns of a matrix span its column space. They might be dependent.

DEFINITION  A set of vectors spans a space if their combinations fill that space.

Example 2  Describe the column space and the row space of A (m = 3, n = 2):

A = [1 4; 2 7; 3 5]        Aᵀ = [1 2 3; 4 7 5]

The column space of A is the plane in R³ spanned by the two columns of A. The row space of A is spanned by the three rows of A (which are the columns of Aᵀ); it is all of R².

Remember  The rows are in Rⁿ spanning the row space. The columns are in Rᵐ spanning the column space. Same numbers, different vectors, different spaces.

A Basis for a Vector Space

Two vectors can't span all of R³, even if they are independent. Four vectors can't be independent, even if they span R³. We want enough independent vectors to span the space (and not more). A "basis" is just right.

DEFINITION  A basis for a vector space is a sequence of vectors with two properties: the basis vectors are linearly independent and they span the space.

This combination of properties is fundamental to linear algebra. Every vector v in the space is a combination of the basis vectors, because they span the space. More than that, the combination that produces v is unique, because the basis vectors v1, ..., vn are independent:

There is one and only one way to write v as a combination of the basis vectors.

Reason: Suppose v = a1 v1 + ... + an vn and also v = b1 v1 + ... + bn vn. By subtraction, (a1 - b1) v1 + ... + (an - bn) vn is the zero vector. From the independence of the v's, each ai - bi = 0. Hence ai = bi, and there are not two ways to produce v.

Example 3  The columns of I = [1 0; 0 1] produce the "standard basis" for R². The basis vectors i = (1, 0) and j = (0, 1) are independent. They span R².

Everybody thinks of this basis first. The vector i goes across and j goes straight up. The columns of the n by n identity matrix give the "standard basis" for Rⁿ.
Now we find many other bases (infinitely many). The basis is not unique!

Example 4 (Important)  The columns of every invertible n by n matrix give a basis for Rⁿ:

Invertible matrix A: independent columns, column space is R³.
Singular matrix B = [1 0 1; 1 1 2; 1 1 2]: dependent columns, column space is not R³.

The only solution to Ax = 0 is x = A⁻¹0 = 0. The columns are independent. They span the whole space R³ because every vector b is a combination of the columns: Ax = b can always be solved by x = A⁻¹b. Do you see how everything comes together for invertible matrices? Here it is in one sentence:

The vectors v1, ..., vn are a basis for Rⁿ exactly when they are the columns of an n by n invertible matrix. Thus Rⁿ has infinitely many different bases.

When the columns are dependent, we keep only the pivot columns: the first two columns of B above. Those two columns are independent and they span the column space.

Every set of independent vectors can be extended to a basis. Every spanning set of vectors can be reduced to a basis.

Example 5  This matrix is not invertible. Its columns are not a basis for anything! There is one pivot column and one pivot row (r = 1).

Example 6  Find bases for the column and row spaces of this rank two matrix:

R0 = [1 2 0 3; 0 0 1 4; 0 0 0 0]

Columns 1 and 3 are the pivot columns. They are a basis for the column space of R0. The column space is the "xy plane" inside xyz space R³. That plane is not R², it is a subspace of R³. Columns 2 and 3 are also a basis for the same column space. Which pair of columns of R0 is not a basis for its column space?

The row space is a subspace of R⁴. The simplest basis for that row space is the two nonzero rows of R0. The zero vector is never in a basis.

Question  Given five vectors in R⁷, how do you find a basis for the space they span?

First answer   Make them the rows of A, and eliminate to find the nonzero rows in R.
Second answer  Put the five vectors into the columns of A. Eliminate to find the pivot columns. Those pivot columns in C are a basis for the column space.

Could another basis have more vectors, or fewer? This is a crucial question with a good answer: No. All bases for a vector space contain the same number of vectors. The number of vectors in any and every basis is the "dimension" of the space.
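The one-sentence test (a basis of Rⁿ comes from the columns of an invertible matrix) is a one-line computation. A sketch with example vectors of my own choosing:

```python
# Columns form a basis for R^n exactly when the square matrix is invertible (illustrative).
from sympy import Matrix

def is_basis_of_Rn(vectors):
    A = Matrix.hstack(*[Matrix(v) for v in vectors])
    return A.is_square and A.det() != 0      # n vectors in R^n, and they are independent

print(is_basis_of_Rn([(1, 0, 0), (1, 1, 0), (1, 1, 1)]))    # True: invertible, a basis for R^3
print(is_basis_of_Rn([(1, 1, 1), (1, 1, 2), (2, 2, 3)]))    # False: column 3 = column 1 + column 2
```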
Dimension of a Vector Space

We have to prove what was just stated. There are many choices for the basis vectors, but the number of basis vectors doesn't change.

If v1, ..., vm and w1, ..., wn are both bases for the same vector space, then m = n.

Proof  Suppose that there are more w's than v's. From n > m we will reach a contradiction. The v's are a basis, so w1 must be a combination of the v's. If w1 = a11 v1 + ... + am1 vm, this is the first column of a matrix multiplication V A = W:

W = [w1 w2 ... wn] = [v1 v2 ... vm] A        Each w is a combination of the v's.

We don't know each a_ij, but we know the shape of A (it is m by n). The second vector w2 is also a combination of the v's. The coefficients in that combination fill the second column of A. The key is that A has a row for every v and a column for every w. A is a short wide matrix, since we assumed n > m. So Ax = 0 has a nonzero solution.

Ax = 0 gives V Ax = 0 which is Wx = 0. A combination of the w's gives zero! Then the w's could not be a basis, so our assumption n > m is not possible for two bases.

If m > n we exchange the v's and w's and repeat the same steps. The only way to avoid a contradiction is to have m = n. This completes the proof that m = n. The number of basis vectors is the dimension. So the dimension of Rⁿ is n.

We now define the important word dimension.

DEFINITION  The dimension of a space is the number of vectors in every basis.

The dimension matches our intuition. The line through v = (1, 5, 2) has dimension one. It is a subspace with this one vector v in its basis. Perpendicular to that line is the plane x + 5y + 2z = 0. This plane has dimension 2. To prove it, we find a basis (-5, 1, 0) and (-2, 0, 1). The dimension is 2 because the basis contains two vectors. The plane is the nullspace of the matrix A = [1 5 2], which has two free variables. Our basis vectors (-5, 1, 0) and (-2, 0, 1) are the "special solutions" to Ax = 0. The n - r special solutions always give a basis for the nullspace: dimension n - r.

Note about the language of linear algebra  We never say "the rank of a space" or "the dimension of a basis" or "the basis of a matrix". Those terms have no meaning. It is the dimension of the column space that equals the rank of the matrix.
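The plane example in code (my own sketch): the one-row matrix A = [1 5 2] has n - r = 3 - 1 = 2 special solutions, and they are exactly the basis (-5, 1, 0) and (-2, 0, 1).

```python
# Dimension of the plane x + 5y + 2z = 0 as the nullspace of A = [1 5 2] (illustrative).
from sympy import Matrix

A = Matrix([[1, 5, 2]])
basis = A.nullspace()                  # special solutions, one per free variable

print("n - r =", A.cols - A.rank())    # 3 - 1 = 2 = dimension of the plane
for v in basis:
    print(v.T, " A*v =", (A * v)[0])   # (-5, 1, 0) and (-2, 0, 1), both in the plane
```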
Bases for Matrix Spaces and Function Spaces

The words "independence" and "basis" and "dimension" are not limited to column vectors. We can ask whether three matrices A1, A2, A3 are independent. When they are 3 by 4 matrices, some combination might give the zero matrix. We can also ask the dimension of the full 3 by 4 matrix space. (It is 12.)

In differential equations, d²y/dx² = y has a space of solutions. One basis is y = e^x and y = e^(-x). Counting the basis functions gives the dimension 2 for this solution space. (The dimension is 2 because of the second derivative.)

Matrix spaces  The vector space M contains all 2 by 2 matrices. Its dimension is 4.

One basis is  A1 = [1 0; 0 0],  A2 = [0 1; 0 0],  A3 = [0 0; 1 0],  A4 = [0 0; 0 1].

Those matrices are linearly independent. We are not looking at their columns, but at the whole matrix. Combinations of those four matrices can produce any matrix in M. Every A combines the basis matrices:

c1 A1 + c2 A2 + c3 A3 + c4 A4 = [c1 c2; c3 c4] = A.

A is zero only if the c's are all zero; this proves independence of A1, A2, A3, A4.

The three matrices A1, A2, A4 are a basis for a subspace: the upper triangular matrices. Its dimension is 3. A1 and A4 are a basis for the diagonal matrices. What is a basis for the symmetric matrices? Keep A1 and A4, and throw in A2 + A3.

The dimension of the whole n by n matrix space is n². The dimension of the subspace of upper triangular matrices is n²/2 + n/2. The dimension of the subspace of diagonal matrices is n. The dimension of the subspace of symmetric matrices is n²/2 + n/2 (why?).

Function spaces  The equations d²y/dx² = 0 and d²y/dx² = -y and d²y/dx² = y involve the second derivative. In calculus we solve to find the functions y(x):

y'' = 0   is solved by any linear function  y = cx + d
y'' = -y  is solved by any combination  y = c sin x + d cos x
y'' = y   is solved by any combination  y = c e^x + d e^(-x).

That solution space for y'' = -y has two basis functions: sin x and cos x. The space for y'' = 0 has x and 1. It is the "nullspace" of the second derivative! The dimension is 2 in each case (these are second-order equations).

The solutions of y'' = 2 don't form a subspace; the right side b = 2 is not zero. A particular solution is y(x) = x². The complete solution is y(x) = x² + cx + d. All those functions satisfy y'' = 2. Notice the particular solution plus any function cx + d in the nullspace. A linear differential equation is like a linear matrix equation Ax = b.

We end here with the space Z that contains only the zero vector. The dimension of this space is zero. The empty set (containing no vectors) is a basis for Z. We can never allow the zero vector into a basis, because then linear independence is lost.
REVIEW OF THE KEY IDEAS

1. The columns of A are independent if x = 0 is the only solution to Ax = 0.
2. The vectors v1, ..., vr span a space if their combinations fill that space.
3. A basis consists of linearly independent vectors that span the space. Every vector in the space is a unique combination of the basis vectors.
4. All bases for a space contain the same number of vectors. That number is the dimension.
5. The pivot columns are one basis for the column space. The dimension is r.

WORKED EXAMPLES

3.4 A  Suppose v1, ..., vn is a basis for Rⁿ and the n by n matrix A is invertible. Show that Av1, ..., Avn is also a basis for Rⁿ.

Solution  In matrix language: Put the basis vectors v1, ..., vn in the columns of an invertible (!) matrix V. Then Av1, ..., Avn are the columns of AV. Since A is invertible, so is AV. Its columns give a basis.

In vector language: Suppose c1 Av1 + ... + cn Avn = 0. This is Av = 0 with v = c1 v1 + ... + cn vn. Multiply by A⁻¹ to reach v = 0. By linear independence of the v's, all ci = 0. This shows that the Av's are independent. To show that the Av's span Rⁿ, solve c1 Av1 + ... + cn Avn = b, which is the same as c1 v1 + ... + cn vn = A⁻¹b. Since the v's are a basis, this must be solvable.

3.4 B  Start with the vectors v1 = (1, 2, 0) and v2 = (2, 3, 0).
(a) Are they linearly independent?
(b) Are they a basis for any space V?
(c) What is the dimension of V?
(d) Which matrices A have V as their column space?
(e) Which matrices have V as their nullspace?
(f) Describe all vectors v3 that complete a basis v1, v2, v3 for R³.

Solution
(a) v1 and v2 are independent: the only combination to give 0 is 0v1 + 0v2.
(b) Yes, they are a basis for the space V they span: all vectors (x, y, 0).
(c) The dimension of V is 2, since the basis contains two vectors.
(d) This V is the column space of any 3 by n matrix A of rank 2, if every column is a combination of v1 and v2. In particular A could just have columns v1 and v2.
(e) This V is the nullspace of any m by 3 matrix B of rank 1, if every row is a multiple of (0, 0, 1). In particular take B = [0 0 1]. Then Bv1 = 0 and Bv2 = 0.
(f) Any third vector v3 = (a, b, c) will complete a basis for R³ provided c is not zero.
3.4 C  Start with three independent vectors w1, w2, w3. Take combinations of those vectors to produce v1, v2, v3. Write the combinations in matrix form as V = W B:

v1 = w1 + w2
v2 = w1 + 2w2 + w3        which is        [v1 v2 v3] = [w1 w2 w3] [1 1 0; 1 2 1; 0 1 c]
v3 = w2 + c w3

What is the test on B to see if V = W B has independent columns? If c is not 1, show that v1, v2, v3 are linearly independent. If c = 1, show that the v's are linearly dependent.

Solution  For independent columns, the nullspace of V must contain only the zero vector: Vx = 0 requires x = (0, 0, 0).

If c = 1 in our problem, we can see dependence in two ways. First, v1 + v3 will be the same as v2. Then v1 - v2 + v3 = 0, which says that the v's are dependent. The other way is to look at the nullspace of B. If c = 1, the vector x = (1, -1, 1) is in that nullspace, and Bx = 0. Then certainly W Bx = 0, which is the same as Vx = 0. So the v's are dependent: v1 - v2 + v3 = 0.

Now suppose c is not 1. Then the matrix B is invertible. So if x is any nonzero vector, we know that Bx is nonzero. Since the w's are given as independent, we further know that W Bx is nonzero. Since V = W B, this says that x is not in the nullspace of V. In other words, v1, v2, v3 are independent.

The general rule is "independent v's from independent w's when B is invertible". And if these vectors are in R³, they are not only independent: they are a basis for R³. "Basis of v's from basis of w's when the change of basis matrix B is invertible."
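A symbolic check of 3.4 C (my own sketch): det B = c - 1, so B is invertible exactly when c is not 1, and at c = 1 the vector (1, -1, 1) is in the nullspace of B.

```python
# Independence test for the v's = W*B in Worked Example 3.4 C (illustrative).
from sympy import Matrix, symbols

c = symbols('c')
B = Matrix([[1, 1, 0],
            [1, 2, 1],
            [0, 1, c]])

print("det B =", B.det())                       # c - 1: B is invertible exactly when c != 1

B1 = B.subs(c, 1)
x = Matrix([1, -1, 1])
print("c = 1:  B*x =", (B1 * x).T)              # the zero vector, so v1 - v2 + v3 = 0
print("c = 1:  nullspace of B:", [v.T for v in B1.nullspace()])
```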
Copter з. The Four Fundamental Sub^ 116 Problem Set 3. |incar dcpen<lence> boutiin«rin,kpe^ Questions l-Ю arc a V4 dcpcndcn(. Show that n. T о о V| = Solve C|»i (Recommended) Find the U| • I -1 0 0 1’2 = Ci = t»4 = + ri,.4 = Oor.Ax = ° 1116 ” s 8°in the columns of д largest possible number of independent vectors among 0 -I 0 1' 0 0 -1 V4 e o' I -1 0 o' 1 0 -1 «в » O' 0 1 -1 1 2 vt “ T i о 1 1 1 2 3 4 V8 » I,., f oor d = Oor / e 0(3 cases), the columns of I are dependent: 3 5 6 a b e [/«Ode. 0 0 /. If „ d / in Question 3 are nonzero, show that the only solution to Ux = 0 is x - 0. An upi*r triangular U with no diagorul zeros ha* independent columns. Decide the dependence or independence of (a) the vectors (1,3.2) and (2.1.3) and (3,2.1) (b) the vectors (1.-3,2) and (2,1,-3) and (-3,2,1). Choose three independent columns of U. Then make two other choices. Do the same for A. 7 8 If W|. to2, w3 are independent vectors, show that the differences Vi = w2 - w3 and vj = u>| - w3 and »з = Wj - wj are dependent. Find a combination of the v’s that gives zero Which matrix A in [ v\ v2 Vj ] = [ Wj w2 w3 ] A is singular? If W|. w2. w3 are independent sectors, show that the sums = w2 4- w3 and v2 = wi + wj and th = wj + Wj are independent. (Write Ci V| +C2V2 + C3V3 = 0 in terms of lhe w’s. Find and solve equations for the c's, to show they arc zero.)
3,4. Independence. Basis, .nd Dimension 117 9 Suppose Vi.Vj.Vj,t>4 ие vectors in R1 (a) These tour vectors arc dependent because (b) Thc two vectors V| and v2 will be dependent if (c) The vectors v, and (0,0.0) are dependent because _. 1 о Find two independent vectors on the plane t + 2y - 3a -1 = 0 in R* Then find three independent vectors. hy not four? This plane is the nullspace of what mains? Questions 11-14 are about the space ipannrd by a set of vectors. Take all linear com- binations of the vectors. 11 Describe thc subspace of R’ (is it a line or plane or R1?) spanned by (a) the two vectors (1,1,-1) and (-1,-1,1) (b) the three vectors (0.1,1) and (1,1,0) and (0,0,0) (c) all vectors in RJ with whole number components (d) all vectors with positive components 12 The vector b is in the subspace spanned by the columns of A when has a solution. Thc vector c is in the row space of A when_______has a solution. True or false: If the zero vector is in the row space, thc rows are dependent 13 Find the dimensions of these 4 spaces. Which twoofthe spaces are thc same? (a) col- umn space of 4, (b) column space of U. (c) row space of A. (d) row space of U: A = 1 1 0 1 3 1 3 1 -1 and U " 1 1 O' 0 2 1 0 0 0 14 v + w and v - w are combinations of v and w. Write v and w as combinations of v + w and v - w. Tbe two pairs of vectors the same space When are they a basis for the same space? Questions 15-25 are about the requirements for a basis. 15 If vt..... »„ are linearly independent the space they span has dimension These vectors are a for that space. If the vectors are the columns of an m by n matrix, then m is_____than n. If m = n. that matrix is___. 16 Find a basis for each of these subspaces of R4: (a) All vectors whose components are equal. (b) All vectors whose components add to zero. (c) All vectors that are perpendicular to (1.1,0,0) and (1,0,1,1). (d) The column space and the nullspace of / (4 by 4).
118 Chapter 3. The Four Fundamental Subspace 17 Find three different bases for the column space of U- [ J ? J ? J]. Then find two different bases for the row space of U. 18 Suppose vi.vj......vt are six vectors in R . (a) Those vectors (doXdo not X might not) span R . (b) Those vectors (areXare notXmight be) linearly independent. (c) Any four of those vectors (areXare notXmight be) a basis for R"*. 19 The columns of A are n vectors from Rm. If they are linearly independent, what is the rank of A? If they span Rm. what is the rank? If they are a basis for Rm, what then? Looking ahead The rank r counts the number of____________columns. 20 Find a basis for the plane x—2y+3a = 0 in RJ. Then find a basis for the intersection of that plane with the xy plane. Then find a basis for all vectors perpendicular to the plane 21 Suppose the columns ofa 5 by 5 matrix A are a basis for R&. (a) The equation Ax = 0 has only the solution x = 0 because ___, (b) If b is in R' then Ax b is solvable because the basis vectors _R5. Conclusion: A is invertible. Ils rank is 5. Its rows are also a basis for R&. 22 Suppose S is a 5-dimensional subspace of R" True or false (example if false): (a) Every basis for S can be extended to a basts for R® by adding one more vector. (b) Every basis for R6 can be reduced lo a basis for S by removing one vector. 23 U comes from A by subtracting row I from row 3: 1 3 2 0 1 I 3 2 A- and U I 1 3 2‘ 0 1 1 0 0 0 24 25 f^^f7nhe'WOtS’lnSP*C‘ F,nd bases for the two row spaces. Find bases or the two nullspaces. Which spaces stay fixed in elimination? True or false (give a good reason) (a) If the columns of . matnx are dependent, so are the rows £ T" T ” ‘ 2 » " ”> >»“' >™ red““ •»». basis? Suppose t»| Suppose t>t. a basis ?
3 4. Independence, Basis, and Dimension 119 26 For which numbers c and d do these matrices have rank 2? Questions 27-31 are about spaces where the “vectors" are matrices. 27 Find a basis (and the dimension) for each of these subspaces of 3 by 3 matrices: (a) All diagonal matrices. (b) All symmetric matrices (AT = A). (c) All skew-symmetric matrices (Лт ж -Д). 28 Construct six linearly independent 3 by 3 echelon matrices l/|,..., l/e. 29 Find a basis for the space of all 2 by 3 matrices whose columns add lo zero. Find a basis for the subspace whose rows also add to zero. 30 What subspace of 3 by 3 matrices is spanned (take all combinations) by (a) lhe invertible matrices? (b) the rank one matrices? (c) the identity matrix? 31 Find a basis for the space of 2 by 3 matrices whose nullspace contains (2,1,1). Questions 32-36 are about spaces where the “vectors" are functions. 32 (a) Find all functions that satisfy jJ = 0. (b) Choose a particular function that satisfies = 3. (c) Find all functions that satisfy jJ = 3. 33 The cosine space Fj contains all combinations y(x) “ A cos x+ В сое 2x+C cos 3x. Find a basis for the subspace with t/(0) 0. 34 Find a basis for the space of functions that satisfy (a) £-2v = ° <b) й-;=о- 35 36 37 Suppose j/i (x), Jf2(x), кз(х) are three different functions of x. The vector space they span could have dimension 1. 2. or 3. Give an example of щ, уз. уз to show each possibility. Find a basis for the space of polynomials p(x) of degree < 3. Find a basis for the subspace with p(l) = 0. Find a basis for the space S of vectors (a, b, c. d) with a + e + d - 0 and also for the space T with a + b = 0 and c = 2d. What is the dimension of the intersection S ПТ7
I '< •I < I Chapter 3. The Four Fundamental Subspilc 120 C4 . н chow that ^as no soIution whcn the 5 bv ч 38 X* »| « Л. = ” » <—* M b|» ™eu ’ ^../,/r2 = u(x)and then = ~v(x). 39 Find bases for all solutions to d y/dx Challenge Problems 40 41 42 Write the 3 by 3 identity matrix as a combination of the other five permu(alion matrices^ Then show that those five matrices are hneary independent. This U , basis for the subspace of 3 by 3 matrices with row and column sums all equal. Choose x = (x„z3.x3.z4) in R4. It has 24 rearrangements like {x2>xUXi^} and (хд.хз.хьx2). Those 24 vectors, including x itself, span a subspace S. Find specific vectors x so that the dimension of S is: (a) zero, (b) one, (c) three, (d) f0Ur Intersections and sums have dim(V) + dim(W) — dim(V П W) + dim(V 4 W) Start with a basis ui,.. ., ur for the intersection V О W. Extend with Vj...,, v, to a basis for V, and separately W|,..., to a basis for W. Prove that the u’s, v’s and w'i together arc independent: dimensions (r + e) + (r-f-t) » (r) + (r 4- a 4. j 43 Inside R”, suppose dimension (V) + dimension (W) > n. Show that some nonzero vector is in both V and W 44 Suppose A is 10 by 10 and A1 0 (zero matrix). So A multiplies each column of A to give the zero vector. Then the column space of A is contained in the If A has rank r, those subspaces have dimension r < 10 - r. So the rank is r < 5.
3.5. Dimensions of the Four Subspaces   121

3.5  Dimensions of the Four Subspaces

1  The column space C(A) and the row space C(AT) have dimension r (the rank of A).
2  The nullspace N(A) has dimension n - r. The left nullspace N(AT) has dimension m - r.
3  Elimination often changes C(A) and N(AT) (but their dimensions don't change).

The main theorem in this chapter connects rank and dimension. The rank of a matrix counts independent columns. The dimension of a subspace is the number of vectors in a basis. We can count pivots or basis vectors. The rank of A reveals the dimensions of all four fundamental subspaces.

Here are the subspaces, including the new one. Two subspaces come directly from A, and the other two come from AT.

Four Fundamental Subspaces                                Dimensions
1. The row space is C(AT), a subspace of Rn.              r
2. The column space is C(A), a subspace of Rm.            r
3. The nullspace is N(A), a subspace of Rn.               n - r
4. The left nullspace is N(AT), a subspace of Rm.         m - r

We know C(A) and N(A) pretty well. Now C(AT) and N(AT) come forward. The row space contains all combinations of the rows. This row space is the column space of AT.

For the left nullspace we solve ATy = 0; that system is n by m. In Example 2 this produces one of the great equations of applied mathematics, Kirchhoff's Current Law. The currents flow around a network, and they can't pile up at the nodes. The four subspaces come from nodes and edges and loops and trees. Those subspaces are connected in an absolutely beautiful way.

Part 1 of the Fundamental Theorem finds the dimensions of the four subspaces. One fact stands out: The row space and column space have the same dimension r. This number r is the rank of A (Chapter 1). The other important fact involves the two nullspaces: N(A) and N(AT) have dimensions n - r and m - r, to make up the full n and m.

Part 2 of the Fundamental Theorem will describe how the four subspaces fit together: Nullspace perpendicular to row space, and N(AT) perpendicular to C(A). That completes the "right way" to understand Ax = b. Stay with it - you are doing real mathematics.
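Those four dimension counts are easy to confirm on a computer. Here is a short Python sketch (NumPy's matrix_rank stands in for counting pivots), applied to the 3 by 5 echelon matrix that opens the example on the next page:

import numpy as np

A = np.array([[1, 3, 5, 0, 7],
              [0, 0, 0, 1, 2],
              [0, 0, 0, 0, 0]])
m, n = A.shape                        # m = 3, n = 5
r = np.linalg.matrix_rank(A)          # r = 2

print("dim row space      =", r)      # subspace of Rn
print("dim column space   =", r)      # subspace of Rm
print("dim nullspace      =", n - r)  # 3
print("dim left nullspace =", m - r)  # 1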
122   Chapter 3. The Four Fundamental Subspaces

The Four Subspaces for R0

Suppose A is reduced by elimination to its row echelon form R0. For that special form, the four subspaces are easy to identify. We will find a basis for each subspace and check its dimension. Then we look back at A and R0 to see which subspaces change and which stay the same. The main point will be that the four dimensions are the same for A and R0.

As a specific 3 by 5 example, look at the four subspaces for this echelon matrix R0:

        [ 1  3  5  0  7 ]     pivot rows 1 and 2
   R0 = [ 0  0  0  1  2 ]
        [ 0  0  0  0  0 ]     pivot columns 1 and 4

The rank of this matrix is r = 2 (two pivots). Take the four subspaces in order.

1. The row space of R0 has dimension 2, matching the rank.

Reason: The first two rows are a basis. The row space contains combinations of all three rows, but the third row (the zero row) adds nothing to the row space.

The pivot rows 1 and 2 are independent. That is obvious for this example, and it is always true. If we look only at the pivot columns, we see the r by r identity matrix. There is no way to combine its rows to give the zero row (except by the combination with all coefficients zero). So the r pivot rows (the nonzero rows of R0) are a basis for the row space.

   The dimension of the row space is the rank r. The nonzero rows of R0 form a basis.

2. The column space of R0 also has dimension r = 2.

Reason: The pivot columns 1 and 4 form a basis. They are independent because they contain the r by r identity matrix. No combination of those pivot columns can give the zero column (except the combination with all coefficients zero). And they also span the column space. Every other (free) column is a combination of the pivot columns. Actually the combinations we need are the three special solutions!

   Column 2 is 3 (column 1). The special solution is (-3, 1, 0, 0, 0).
   Column 3 is 5 (column 1). The special solution is (-5, 0, 1, 0, 0).
   Column 5 is 7 (column 1) + 2 (column 4). That solution is (-7, 0, 0, -2, 1).

The pivot columns are independent, and they span C(R0), so they are a basis for C(R0).

   The dimension of the column space is the rank r. The pivot columns form a basis.
3.5. Dimensions of the Four Subspaces   123

3. The nullspace of R0 has dimension n - r = 5 - 2. The 3 free variables x2, x3, x5 give 3 special solutions to R0 x = 0. Set each free variable to 1 and the others to 0:

        [ -3 ]        [ -5 ]        [ -7 ]
        [  1 ]        [  0 ]        [  0 ]      R0 x = 0 has the
   s2 = [  0 ]   s3 = [  1 ]   s5 = [  0 ]      complete solution
        [  0 ]        [  0 ]        [ -2 ]      x = x2 s2 + x3 s3 + x5 s5
        [  0 ]        [  0 ]        [  1 ]

The nullspace has dimension 3.

Reason: There is a special solution for each free variable. With n variables and r pivots, that leaves n - r free variables and special solutions. The special solutions are independent, because they contain the identity matrix in rows 2, 3, 5.

   The nullspace N(A) has dimension n - r. The special solutions form a basis.

4. The nullspace of R0T (left nullspace of R0) has dimension m - r = 3 - 2.

Reason: R0 has r independent rows and m - r zero rows. Then R0T has r independent columns and m - r zero columns. So y in the nullspace of R0T can have nonzeros only in its last m - r entries. The example has m - r = 1 zero column in R0T and 1 nonzero in y:

           [ 1  0  0 ]          [ 0 ]
           [ 3  0  0 ] [ y1 ]   [ 0 ]                         [ 0 ]
   R0T y = [ 5  0  0 ] [ y2 ] = [ 0 ]    is solved by   y =   [ 0 ]        (1)
           [ 0  1  0 ] [ y3 ]   [ 0 ]                         [ 1 ]
           [ 7  2  0 ]          [ 0 ]

Because of zero rows in R0 and zero columns in R0T, it is easy to see the dimension (and even a basis) for this fourth fundamental subspace:

   If R0 has m - r zero rows, its left nullspace has dimension m - r.

Why is this a "left nullspace"? Because we can transpose R0T y = 0 to yT R0 = 0T. Now yT is a row vector to the left of R0. This subspace came fourth, and some linear algebra books omit it, but that misses the beauty of the whole subject.

   In Rn the row space and nullspace have dimensions r and n - r (adding to n).
   In Rm the column space and left nullspace have dimensions r and m - r (total m).

We have a job still to do. The four subspace dimensions for A are the same as for R0. The job is to explain why. A is now any matrix that reduces to R0 = rref(A).

        [ 1  3  5  0  7 ]
   A =  [ 0  0  0  1  2 ]     This A reduces to R0.   Same row space as R0.
        [ 1  3  5  1  9 ]                             Different column space! But same dimension!
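Here is the same computation in Python with SymPy (exact arithmetic). nullspace() returns exactly the special solutions above, and the left nullspace is the nullspace of the transpose:

from sympy import Matrix

R0 = Matrix([[1, 3, 5, 0, 7],
             [0, 0, 0, 1, 2],
             [0, 0, 0, 0, 0]])

print(R0.rref()[1])            # pivot columns (0, 3): columns 1 and 4
for s in R0.nullspace():       # special solutions: a basis of N(R0)
    print(list(s))             # (-3,1,0,0,0), (-5,0,1,0,0), (-7,0,0,-2,1)
for y in R0.T.nullspace():     # basis of the left nullspace N(R0^T)
    print(list(y))             # (0, 0, 1)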
Figure 3.3: The dimensions of the Four Fundamental Subspaces (for R0 and for A).

The Four Subspaces for A

1  A has the same row space as R0 and R. Same dimension r and same basis.

Reason: Every row of A is a combination of the rows of R0. Also every row of R0 is a combination of the rows of A. Elimination changes rows, but not row spaces.

Since A has the same row space as R0, the first r rows of R0 are still a basis. Or we could choose r suitable rows of the original A. They might not always be the first r rows of A, because those could be dependent. The good r rows of A are the ones that end up as pivot rows in R0 and R.

2  The column space of A has dimension r. The column rank equals the row rank.

   The number of independent columns = the number of independent rows.

Wrong reason: "A and R0 have the same column space." This is false. The columns of R0 often end in zeros. The columns of A don't often end in zeros. Then C(A) is not C(R0).

Right reason: The same combinations of the columns are zero (or not) for A and R0. Say that another way: Ax = 0 exactly when R0 x = 0. The column spaces are different, but their dimensions are the same, equal to the rank r.

Conclusion   The r pivot columns of A are a basis for its column space C(A).
3.5. Dimensions of the Four Subspaces   125

3  A has the same nullspace as R0. Same dimension n - r and same basis.

Reason: The elimination steps don't change the solutions. The special solutions are a basis for this nullspace (as we always knew). There are n - r free variables, so the dimension of the nullspace is n - r. This is the Counting Theorem: r + (n - r) equals n.

   (dimension of column space) + (dimension of nullspace) = dimension of Rn.

4  The left nullspace of A (the nullspace of AT) has dimension m - r.

Reason: AT is just as good a matrix as A. When we know the dimensions for every A, we also know them for AT. Its column space was proved to have dimension r. Since AT is n by m, the "whole space" is now Rm. The counting rule for A was r + (n - r) = n. The counting rule for AT is r + (m - r) = m. We have all details of a big theorem:

   Fundamental Theorem of Linear Algebra, Part 1
   The column space and row space both have dimension r.
   The nullspaces have dimensions n - r and m - r.

By concentrating on spaces of vectors, not on individual numbers or vectors, we get these clean rules. You will soon take them for granted; eventually they begin to look obvious. But if you write down an 11 by 17 matrix with 187 nonzero entries, I don't think most people would see why these facts are true:

   Two key facts     dimension of C(A) = dimension of C(AT) = rank of A
                     dimension of C(A) + dimension of N(A) = 17.

Every vector Ax = b in the column space comes from exactly one x in the row space! (If we also have Ay = b then A(x - y) = b - b = 0. So x - y is in the nullspace as well as the row space, which forces x = y.) From its row space to its column space, A is like an r by r invertible matrix. It is the nullspaces that will force us to define a "pseudoinverse of A": page 133.

                 [ 1  2  3 ]
Example 1   A =  [ 2  4  6 ]    has m = 2 with n = 3. The rank is r = 1.

The row space is the line through (1, 2, 3). The nullspace is the plane x1 + 2x2 + 3x3 = 0. The line and plane dimensions still add to 1 + 2 = 3.

The column space and left nullspace are perpendicular lines in R2. Dimensions 1 + 1 = 2.

   Column space = line through [ 1 ]       Left nullspace = line through [  2 ]
                               [ 2 ]                                     [ -1 ]

Final point: The y's in the left nullspace combine the rows of A to give the zero row.
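A quick SymPy check of Example 1 (exact arithmetic). The rank is 1, the four dimensions are 1, 1, 2, 1, and the left nullspace is the line through (2, -1); SymPy returns the multiple (-2, 1) of that vector.

from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6]])
m, n = A.shape
r = A.rank()
print(r, n - r, m - r)     # 1 2 1
print(A.nullspace())       # two vectors spanning the plane x1 + 2x2 + 3x3 = 0
print(A.T.nullspace())     # one vector on the line through (2, -1)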
126   Chapter 3. The Four Fundamental Subspaces

Example 2   This matrix A comes from a graph with m = 5 edges and n = 4 nodes (Figure 3.4). A is its incidence matrix: every row has a -1 and a 1, because every edge has a start node and an end node. The equations in Ax = b are differences of the unknowns x1, x2, x3, x4 (one unknown for every node).

Differences Ax = b                   [ -1   1   0   0 ]
across edges 1, 2, 3, 4, 5      A =  [ -1   0   1   0 ]       m = 5 and n = 4
between nodes 1, 2, 3, 4             [  0  -1   1   0 ]
                                     [  0  -1   0   1 ]
                                     [  0   0  -1   1 ]

If you understand the four fundamental subspaces for this matrix (the column spaces and the nullspaces for A and AT) you have captured a central idea of linear algebra.

Figure 3.4: A "graph" with 5 edges and 4 nodes. A is its 5 by 4 incidence matrix.

The nullspace N(A)   To find the nullspace we set b = 0. Then the first equation says x1 = x2. The second equation says x1 = x3. Equation 4 is x2 = x4. All four unknowns x1, x2, x3, x4 have the same value c. The vectors x = (c, c, c, c) fill the nullspace of A.

That nullspace is a line in R4. The special solution x = (1, 1, 1, 1) is a basis for N(A): one vector in the basis, so the dimension of N(A) is 1. The rank of A must be 3, since n - r = 4 - 3 = 1. We now know the dimensions of all four subspaces.

The column space C(A)   The rank is r = 3, so there are 3 independent columns. The fast way is to see them directly in A. The systematic way is to reduce A to R0 = rref(A):

                    [ -1   1   0 ]                                  [ 1  0  0  -1 ]
Columns 1, 2, 3     [ -1   0   1 ]      reduced row echelon form    [ 0  1  0  -1 ]
of A                [  0  -1   1 ]      R0 = rref(A) =              [ 0  0  1  -1 ]
                    [  0  -1   0 ]                                  [ 0  0  0   0 ]
                    [  0   0  -1 ]                                  [ 0  0  0   0 ]

From R0 we see again that the first 3 columns are basic: columns 1, 2, 3. For a basis of the column space C(A) we must go back to columns 1, 2, 3 of A itself.
3.5. Dimensions of the Four Subspaces   127

The row space C(AT)   The dimension must again be r = 3. But the first 3 rows of A are not independent: row 3 = row 2 - row 1. So row 3 became zero in elimination, and row 3 was exchanged with row 4. The first three independent rows are rows 1, 2, 4. Those three rows are a basis (one possible basis) for the row space.

   Edges 1, 2, 3 form a loop in the graph: Dependent rows 1, 2, 3.
   Edges 1, 2, 4 form a tree. Trees have no loops! Independent rows 1, 2, 4.

The left nullspace N(AT)   Now we solve ATy = 0. Combinations of the rows give zero. We already noticed that row 3 = row 2 - row 1, so one solution is y = (1, -1, 1, 0, 0). I would say: That y comes from following the upper loop in the graph. Another y comes from going around the lower loop and it is y = (0, 0, -1, 1, -1): row 3 = row 4 - row 5. Those two y's are independent, they solve ATy = 0, and the dimension of N(AT) is m - r = 5 - 3 = 2. So we have a basis for the left nullspace.

You may ask how "loops" and "trees" got into this problem. That didn't have to happen. We could have used elimination to solve ATy = 0. The 4 by 5 matrix AT would have three pivot columns 1, 2, 4 and two free columns 3, 5. There are two special solutions and the nullspace of AT has dimension two: m - r = 5 - 3 = 2. But loops and trees identify dependent rows and independent rows in a beautiful way for every incidence matrix.

The equations Ax = b give "voltages" x1, x2, x3, x4 at the four nodes. The equations ATy = 0 give "currents" y1, y2, y3, y4, y5 on the five edges. These two equations are Kirchhoff's Voltage Law and Kirchhoff's Current Law. Those laws apply to an electrical network. But the ideas behind the words apply all over engineering and science and economics and business. Linear algebra connects the laws to the four subspaces.

Graphs are the most important model in discrete applied mathematics. You see graphs everywhere: roads, pipelines, blood flow, the brain, the Web, the economy of a country or the world. We can understand their matrices A and AT. Here is a summary.

The incidence matrix A comes from a connected graph with n nodes and m edges. The row space and column space have dimensions r = n - 1. The nullspaces of A and AT have dimensions 1 and m - n + 1:

N(A)    The constant vectors (c, c, ..., c) make up the nullspace of A: dim = 1.
C(AT)   The edges of any tree give r independent rows of A: r = n - 1.
C(A)    Voltage Law: The components of Ax add to zero around all loops: dim = n - 1.
N(AT)   Current Law: ATy = (flow in) - (flow out) = 0 is solved by loop currents. There are m - r = m - n + 1 independent small loops in the graph.

For every graph in a plane, linear algebra yields Euler's formula: Theorem 1 in topology!

   (nodes) - (edges) + (small loops) = (n) - (m) + (m - n + 1) = 1
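Here is a SymPy check of those statements, with the incidence matrix entered as it is read from Figure 3.4 (edges 1 to 2, 1 to 3, 2 to 3, 2 to 4, 3 to 4; each row has -1 at the start node and +1 at the end node):

from sympy import Matrix

A = Matrix([[-1,  1,  0,  0],
            [-1,  0,  1,  0],
            [ 0, -1,  1,  0],
            [ 0, -1,  0,  1],
            [ 0,  0, -1,  1]])

print(A.rank())                             # r = 3 = n - 1
print([list(v) for v in A.nullspace()])     # [(1,1,1,1)]: the constant vectors, dim 1
print(len(A.T.nullspace()))                 # 2 = m - n + 1 independent loop currents

loop_upper = Matrix([1, -1, 1, 0, 0])       # row 3 = row 2 - row 1 (edges 1, 2, 3)
loop_lower = Matrix([0, 0, -1, 1, -1])      # row 3 = row 4 - row 5 (edges 3, 4, 5)
print(list(A.T * loop_upper), list(A.T * loop_lower))   # both zero: left-nullspace vectors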
128   Chapter 3. The Four Fundamental Subspaces

Rank Two Matrices = Rank One plus Rank One

Rank one matrices have the form uvT. Here is a matrix A of rank r = 2. We can't see r immediately from A. So we reduce the matrix by row operations to R0. R0 has the same row space as A. Throw away its zero row to find R, also with the same row space.

                 [ 1   0   3 ]         [ 1  0  3 ]                                [ 1  0 ]
Rank two    A =  [ 1   1   7 ]    R0 = [ 0  1  4 ]    and   A = CR   with    C =  [ 1  1 ]   R = [ 1  0  3 ]      (3)
                 [ 4   2  20 ]         [ 0  0  0 ]                                [ 4  2 ]       [ 0  1  4 ]

Now look at columns. The pivot columns of R are clearly (1, 0) and (0, 1). Then the pivot columns of A are also in columns 1 and 2: u1 = (1, 1, 4) and u2 = (0, 1, 2). Notice that C has those same first two columns! That was guaranteed, since multiplying by two columns of the identity matrix (in R) won't change the pivot columns u1 and u2.

When you put in letters for the columns and rows, you see rank 2 = rank 1 + rank 1. The zero row of R0 contributes nothing:

Matrix A                         [ v1T ]
Rank two       A = CR = [ u1 u2 ][ v2T ]  =  u1 v1T + u2 v2T        Columns of C times rows of R.

Every rank r matrix is a sum of r rank one matrices.

• WORKED EXAMPLES •

3.5 A   Put four 1's into a 5 by 6 matrix of zeros, keeping the dimension of its row space as small as possible. Describe all the ways to make the dimension of its column space as small as possible. Then describe all the ways to make the dimension of its nullspace as small as possible. How to make the sum of the dimensions of all four subspaces small?

Solution   The rank is 1 if the four 1's go into the same row, or into the same column. They can also go into two rows and two columns (so aii = aij = aji = ajj = 1). Since the column space and row space always have the same dimensions, this answers the first two questions: Dimension 1.

The nullspace has its smallest possible dimension 6 - 4 = 2 when the rank is r = 4. To achieve rank 4, the 1's must go into four different rows and four different columns.

You can't do anything about the sum r + (n - r) + r + (m - r) = n + m. It will be 6 + 5 = 11 dimensions, no matter how the 1's are placed. The sum is 11 even if there aren't any 1's.

If all the other entries of A are 2's instead of 0's, how do these answers change?
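Going back to equation (3), here is a short NumPy check that A = CR and that A really is the sum of two rank one matrices: columns of C times rows of R.

import numpy as np

A = np.array([[1, 0,  3],
              [1, 1,  7],
              [4, 2, 20]])
C = A[:, :2]                    # pivot columns u1 = (1,1,4), u2 = (0,1,2)
R = np.array([[1, 0, 3],
              [0, 1, 4]])       # the nonzero rows v1^T, v2^T of R0

print(np.array_equal(A, C @ R))                                              # True
print(np.array_equal(A, np.outer(C[:, 0], R[0]) + np.outer(C[:, 1], R[1])))  # True: u1 v1^T + u2 v2^T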
3.5. Dimensions of the Four Subspaces   129

3.5 B   All rows of AB are combinations of the rows of B. So the row space of AB is contained in (possibly equal to) the row space of B. Rank(AB) ≤ rank(B).

All columns of AB are combinations of the columns of A. So the column space of AB is contained in (possibly equal to) the column space of A. Rank(AB) ≤ rank(A).

If we multiply A by an invertible matrix B, the rank will not change. The rank can't drop, because when we multiply by the inverse matrix the rank can't jump back up.

Appendix 1 collects the key facts about the ranks of matrices.

Problem Set 3.5

1  (a) If a 7 by 9 matrix has rank 5, what are the dimensions of the four subspaces? What is the sum of all four dimensions?
   (b) If a 3 by 4 matrix has rank 3, what are its column space and left nullspace?

2  Find bases and dimensions for the four subspaces associated with A and B:

3  Find a basis for each of the four subspaces associated with A:

        [ 0  1  2  3  4 ]   [ 1  0  0 ] [ 0  1  2  3  4 ]
   A =  [ 0  1  2  4  6 ] = [ 1  1  0 ] [ 0  0  0  1  2 ]
        [ 0  0  0  1  2 ]   [ 0  1  1 ] [ 0  0  0  0  0 ]

4  Construct a matrix with the required property or explain why this is impossible:
   (a) Column space contains (1, 1, 0) and (0, 0, 1), row space contains (1, 2) and (2, 5).
   (b) Column space has basis (1, 1, 3), nullspace has basis (3, 1, 1).
   (c) Dimension of nullspace = 1 + dimension of left nullspace.
   (d) Nullspace contains (1, 3), column space contains (3, 1).
   (e) Row space = column space, nullspace ≠ left nullspace.

5  If V is the subspace spanned by (1, 1, 1) and (2, 1, 0), find a matrix A that has V as its row space. Find a matrix B that has V as its nullspace. Multiply AB.

6  Without using elimination, find dimensions and bases for the four subspaces for

        [ 0  3  3  3 ]               [ 1 ]
   A =  [ 0  0  0  0 ]    and    B = [ 4 ]
        [ 0  1  0  1 ]               [ 5 ]

7  Suppose the 3 by 3 matrix A is invertible. Write down bases for the four subspaces for A, and also for the 3 by 6 matrix B = [ A  A ]. (The basis for Z is empty.)
130 8 9 10 11 12 13 14 18 18 17 18 м iimensrnmof .he four subspaces Й»ЛД and C, if / is lhe 3 by 3 identity matnx and и ик J > and С = [О]. onv for these matrices of different sizes? Which subspaces are the same for rncse r ГЛ [A .41 (a) M| and [*] 0» [л] «“» [Л A]’ prove that all three of those matrices have the same wik г. If the entries of a 3 by 3 matnx are chosen randomly between 0 and I. what are the том likely dimensions of the four subspaces" What if the random matnx is 3 by 5? (Important) A is an m by n matnx of rank r. Suppose there are right sides b for which Az = b has no solution (a) What are all inequalities « or <) that must be true between m, n, and r? (b) How do you know that ATy = 0 has solutions other than у » 0? Construct a matrix with (1.0,1) and (1,2,0) as a basis for its row space and its column space. Why can’t this be a basis for the row space and nullspace ? True or false (with a reason or a counterexample): (a) If m « n then the row space of A equals the column space. (b) The matrices A and - A share the same four subspaces. (c) If A and В share the same four subspaces then A is a multiple of B. Without computing A. find bases for its four fundamental subspaces: 1 0 0 6 1 0 9 8 1 A- 1 2 3 41 0 12 3 0 0 I 2 J If you exchange the first two rows of A. which of the four subspaces stay the same? If» = (1.2.3,4) is in the left nullspace of A write down a vector in the left nullspace of the new matrix after the row exchange. Explain why v — (1,0, -1) cannot be a row of A and also in the nullspace. Describe the four subspaces of RJ associated with 0 1 о Л= о о 1 and J ooo ^COmpkled (5 ones and 4 nerther side passed up a winning move ? ’1 0 0 o‘ 1 i zcros in A) so that rank (A) = 2 but 1 1 0
3 5 Dimensions of the Four Subspaces 131 19 (Left nullspace) Add the extra column b and reduce A to echelon form: 12 3 b| 4 5 6 bj 7 8 9 bj 20 1 2 3 bi t>2 - 4b, 0 0 0 A combination of the rows of A has produced the zero row. What combination is it? (Look at Ьз - 2bi + bi on the nght side.) Which vecton are in the nullspace of Ar and which vectors are in the nullspace of A? (a) Check that the solutions to Ax = 0 are perpendicular lo the rows of A: 21 22 23 1 2 1 0 1 0 0] [4 2 0 0 0 13= ERo = CR 3 4 11 [O 0 0 0 (b) How many independent solutions to ATy = O’ Why does yT = row 3 of £'1 ? Suppose A is the sum of two matrices of rank one: A = uvT + w:T. (a) Which vectors span the column space of A ? (b) Which vectors span lhe row space of A? (c) The rank is less than 2 if_or if_____. (d) Compute A and its rank if u = z = (1,0,0) and v = w = (0,0,1). Construct A = ut»T + wxT whose column space has basis (1,2,4),(2,2,1) and whose row space has basis (1,0), (1,1). Write A as (3 by 2) times (2 by 2). Without multiplying matrices, find bases for the row and column spaces of A: 24 How do you know from these shapes that A cannot be invertible? (Important) ATy = d is solvable when d is in which of the four subspaces? The solution у is unique when the _______contains only the zero vector. True or false (with a reason or a counterexample): A and AT have the same number of pivots. A and AT have the same left nullspace. If the row space equals the column space then AT = A. (b) (c) (d) If AT = -A then the row space of A equals the column space.
I 132 26 27 Chapter 3. The Four Fundamental SubSpact И а Ь c ж to» a / 0.1»* Cl”°“ d " I “ a ] h“ ra"k R,d to of to . by 8 tocketo«d man. В and to toy. C. 1? it It '1 0 1 0 10 10 1 10 10 10 0 10 10 1 and 10 10 10 b 4 к b n P P P p p four zero rows P P P P p b q k b n I Ц В = 0 1 0 C = P P 0 1 '4 n P P n P P The numbers r.n.UM « 11,1 differenr Find baSC* r0W Space a"d left nullspace of В and C. Challenge problem: Find a basts for the nullspace of C. Challenge Problems 2B 29 If A uvT is a 2 by 2 matrix of rank 1, redraw Figure 3.5 to show clearly the Four Fundamental Subspaces. If В produces those same four subspaces, what is the exact relation of В to Al M is the space of 3 by 3 matrices. Multiply every matrix X in M by A 1 -1 0 0 1 -1 -1 0 1 . Notice: A O' 0 0 J 1 1 1 (a) Which matrices X lead to AX « zero matrix? (b) Which matrices have tbe form AX for some matrix XI 30 (a) finds the "nullspace" of that operation AX and (b) finds the "column space”. What are the dimensions of those two subspaces of M 7 Why do they add to 9 ? Suppose the rn by n matrices A and В have the same four subspaces. If they are both in row reduced echelon form, prove that F must equal G: ll Л = B = и n
3.5. Dimensions of the Four Subspaces   133

Every Matrix A has a Pseudoinverse A+

If the columns of A are independent, then A+ = (ATA)^-1 AT is a left-inverse: A+A = I.
If the rows of A are independent, then A+ = AT(AAT)^-1 is a right-inverse: AA+ = I.
This page allows dependent columns and dependent rows, and creates the pseudoinverse A+.

Here is the key idea for A+b. Split b into p and e. The part p in the column space equals Ax+ for one vector x+ in the row space (see page 125). The part e is in the nullspace of AT (the fourth subspace). Then the best possible inverse A+ has A+p = x+ and A+e = 0 on the two parts. By linearity A+b = A+(p + e) = x+.

In short, A takes its row space to its column space. A+ inverts that invertible part.

Figure 3.5: Vectors p = Ax+ in the column space of A go back to x+ in the row space.

Notice!   Suppose you start from a vector b that is not in the column space. Then A+b = x+ is in the row space and AA+b = Ax+ = p is in the column space. So AA+ ≠ I. But p is as close to b as possible. Actually p is the "projection" of b onto the column space, the subject of Chapter 4. Symmetrically, A+Ax is the projection of x onto the row space.

Examples   If A = CR = (m x r)(r x n)   then   A+ = R+C+.
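A small NumPy sketch of that splitting, using numpy.linalg.pinv for A+. The 2 by 2 rank one matrix here is my own example, not one from the text:

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])              # rank 1: dependent rows and dependent columns
b = np.array([3.0, 1.0])

A_plus = np.linalg.pinv(A)
x_plus = A_plus @ b                     # x+ lies in the row space of A
p = A @ x_plus                          # the part of b in the column space
e = b - p                               # the part of b in N(A^T)

print(x_plus, p, e)                     # (0.2, 0.4)   (1, 2)   (2, -1)
print(A.T @ e)                          # zero: e is orthogonal to the column space
print(np.allclose(A @ A_plus @ A, A))   # A A+ A = A, even though A A+ is not I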
4   Orthogonality

4.1  Orthogonality of the Four Subspaces
4.2  Projections onto Subspaces
4.3  Least Squares Approximations
4.4  Orthogonal Matrices and Gram-Schmidt

Two vectors are orthogonal when their dot product is zero: v · w = vTw = 0. This chapter moves to orthogonal subspaces and orthogonal bases and orthogonal matrices. The vectors in two subspaces, and the vectors in a basis, and the column vectors in Q: all pairs will be orthogonal. Think of a² + b² = c² for a right triangle with sides v and w.

   Orthogonal vectors      vTw = 0      ||v||² + ||w||² = ||v + w||²

The right side is (v + w)T(v + w). This equals vTv + wTw when vTw = wTv = 0.

Subspaces entered Chapter 3 to throw light on Ax = b. Right away we needed the column space and the nullspace. Then the light turned onto AT, uncovering two more subspaces. Those four fundamental subspaces reveal what a matrix really does.

A matrix multiplies a vector: A times x. At the first level this is only numbers. At the second level Ax is a combination of column vectors. The third level shows subspaces. But I don't think you have seen the whole picture until you study Figure 4.2. Those fundamental subspaces are orthogonal:

   The nullspace N(A) contains all vectors orthogonal to the row space C(AT).
   The nullspace N(AT) contains all vectors orthogonal to the column space C(A).
   Ax = 0 makes x orthogonal to each row. ATy = 0 makes y orthogonal to each column.

A key idea in this chapter is projection. If b is outside the column space of A, find the closest point p that is inside. The line from b to p shows the error e. That line is perpendicular to the column space. The least squares equation ATAx = ATb produces the best possible solution x when Ax = b is unsolvable. That best x makes ||Ax - b||² = ||e||² as small as possible.

Everything becomes easy when ATA = I. Then A has orthonormal columns, and we call it Q. Orthogonality of the columns won't happen by accident, but we can make it happen: Gram-Schmidt orthogonalizes the columns to end with q1, ..., qn. Then QTQ = I, and Q connects to A by A = QR. That matrix Q is a special favorite for computations. The whole of Section 7.4 will highlight Q in many ways. A = QR is better than A = LU, and this chapter shows why.

134
4.1. Orthogonality of the Four Subspaces   135

4.1  Orthogonality of the Four Subspaces

1  Orthogonal vectors have vTw = 0. Then ||v||² + ||w||² = ||v + w||².
2  Subspaces V and W are orthogonal when vTw = 0 for every v in V and every w in W.
3  The row space of A is orthogonal to the nullspace. The column space is orthogonal to N(AT).
4  The dimensions add to r + (n - r) = n and r + (m - r) = m: Orthogonal complements.
5  If n vectors in Rn are independent, they span Rn. If n vectors span Rn, they are independent.

Chapter 1 connected dot products vTw to the angle between v and w. For 90° angles we have vTw = 0. This chapter moves up from orthogonal vectors v and w to orthogonal subspaces V and W. The subspaces fit together to show the hidden reality of A times x. The 90° angles between subspaces are new, and we can say now what those right angles mean.

The row space is perpendicular to the nullspace. Every row of A is perpendicular to every solution of Ax = 0. That gives the 90° angle on the left side of the figure. This perpendicularity of subspaces is Part 2 of the Fundamental Theorem of Linear Algebra.

The column space is perpendicular to the nullspace of AT. When we want to solve Ax = b and can't do it, this nullspace of AT contains the error e = b - Ax in the "least-squares" solution x. A key application of linear algebra.

DEFINITION   Two subspaces V and W of a vector space are orthogonal if every vector v in V is perpendicular to every vector w in W:

   Orthogonal subspaces      vTw = 0   for all v in V and all w in W.

Example 1   Two walls of a room look perpendicular but those two subspaces are not orthogonal! The meeting line is in both V and W, and this line is not perpendicular to itself. Two planes in R3 (dimensions 2 and 2) cannot be orthogonal subspaces.

Example 2   The floor and a vertical line do give orthogonal subspaces. They have dimensions 2 + 1 = 3. Perpendicular lines through 0 also give orthogonal subspaces.

When a vector is in two orthogonal subspaces, it must be zero. It is perpendicular to itself. It is v and it is w, so vTv = 0. This has to be the zero vector.

The crucial examples for linear algebra come from the four fundamental subspaces. Zero is the only point where the nullspace meets the row space. More than that, the nullspace and row space of A meet at 90°. This key fact comes directly from Ax = 0:

Every vector x in the nullspace is perpendicular to every row of A, because Ax = 0. The nullspace N(A) and the row space C(AT) are orthogonal subspaces of Rn.
136 orthogonal plane V and line W F.₽. 4.1: Onbop^^ »i«V«»bk •»- <“» V + di"' W > To « . »и»**™«»•«* * * °' m"l“pl“’ *= 01 <— (row 1) • X is «го : (I) Oj 1— (row m) • x is zero The fin» equation says that row I u perpendicular to x. The last equation says that row m is perpendicular to x. Every row has a zero dot product with x. Then x is also perpendicular to every combination of the rows. Tbe whole row space C(AT) is orthogonal to N( A). Here is a second proof of that orthogonality for readers who like matrix shorthand. The vectors in the row space of A art combinations A ' у of the rows. Nullspace orthogonal to row space xT(ATy) (Ax)Ty = O ' v 0. (2) We like the first proof. You can see those rows of A multiplying x to produce zeros in equa- tion (I). The second proof shows why A and AT are both in the Fundamental Theorem. Рал I of the Fundamental Theorem gave the dimensions of the four subspaces. The row and column spaces have the same dimension r (they are drawn the same size). The two nullspaces have the remaining dimensions n - r and m - r. Now we know that the row space and nulhpace are orthogonal subspaces inside R”. Example 3 The rows of A are perpendicular to x = (1,1, -1) in the nullspace: gives the dot products 1 + 2 3 1 ° 5+2-7=0 Now turn to the column space of A and the nullspace of AT: the other pair. Every vector у in the nullspace of 4T >< T~ TbeUfinulUpace^ anJJ^^^ V0 column of A. ---------are orthogonal in Rrn. 2е 7"’“ °*41»»к - «' ‘ л »«« column space of A Q.E.D.
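A quick Python check of that orthogonality. The 2 by 3 matrix below matches the dot products displayed in Example 3 (rows (1,2,3) and (5,2,7) with x = (1,1,-1) in the nullspace), as far as those numbers can be read; any matrix and any vector in its nullspace behave the same way.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [5.0, 2.0, 7.0]])
x = np.array([1.0, 1.0, -1.0])

print(A @ x)                              # zero vector: x is in N(A)
for i, row in enumerate(A, start=1):
    print("row", i, "dot x =", row @ x)   # each dot product is 0
y = np.array([2.0, -3.0])                 # any combination of the rows
print((y @ A) @ x)                        # still 0: the whole row space is orthogonal to x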
4.1. Orthogonality of the Four Subspaces 137 For a visual proof, look at A'y = 0. Each column of A multiplies у to give 0: column of A multiplies у to give 0: lupuesymg Figure 4.2: Two pairs of orthogonal subspaces Tbe dimensions add to n and add to m. This is the Big Picture—two subspaces in R" and two subspaces in Rw. Orthogonal Complements Important The fundamental subspaces are more than just orthogonal (in pain). Their dimensions are also right. Two lines could be perpendicular in R3. but those lines could not be the row space and nullspace of a 3 by 3 matrix The lines have dimensions I and I, adding to 2. But the correct dimensions г and n - r must add to n = 3. Figure 4.1 showed two walls of a room. Dimensions 2 + 2 # 3 must fail. The fundamental subspaces of a 3 by 3 matrix have dimensions 2 and I. or 3 and 0. Those pairs of subspaces are not only orthogonal, they are orthogonal complements. DEFINITION The orthogonal complement Vх of a subspace V contains every vector that it perpendicular to V. The dimensions of V and V* add to (dimension of the whole space). By this definition, the nullspace is the orthogonal complement of the row space. Every x that is perpendicular to the rows satisfies Ax = 0. and lies in the nullspace. The reverse is also true. If vis orthogonal to the nullspace, it must be in the row space In the same way, the left nullspace and column space are orthogonal in Rm, and they are orthogonal complements. Their dimensions r and m - r add to the full dimension m.
138

Fundamental Theorem of Linear Algebra, Part 2
N(A) is the orthogonal complement of the row space C(AT)   (in Rn).
N(AT) is the orthogonal complement of the column space C(A)   (in Rm).

Part 1 gave the dimensions of the four subspaces. Part 2 gives their right angles. Every x in Rn splits into a row space part xr plus a nullspace part xn. Multiplying by A, the nullspace part gives Axn = 0 and the row space part gives Axr = Ax: everything lands in the column space. Every vector b in the column space comes from one and only one vector xr in the row space.

Proof: If Axr = Ax'r, the difference xr - x'r is in the nullspace. It is also in the row space, where xr and x'r came from. This difference must be the zero vector, because the nullspace and row space are perpendicular. Therefore xr = x'r.

Figure 4.3: This update of Figure 4.2 shows the true action of A on x = xr + xn. A times xr is in the column space. A times xn is the zero vector.

There is an r by r invertible matrix hiding inside A, if we throw away the two nullspaces. From row space to column space, A is invertible (page 127). The pseudoinverse A+ will invert that part of A (page 133).

Example   Every matrix of rank r has an r by r invertible submatrix B, found in its pivot rows and pivot columns. Here A has rank 2:

   A = [ 1  2  3  4  5 ]    contains the invertible submatrix   B = [ 1  3 ]    from pivot rows 1, 2
       [ 1  2  4  5  6 ]                                            [ 1  4 ]    and pivot columns 1, 3.

A will be diagonalized when we choose the right orthogonal bases for the row space and the column space. I hope you reach that amazing fact: the Singular Value Decomposition of A.

Let me repeat: The only vector in two orthogonal subspaces is the zero vector.
41. Orthogonality of the Four Subspaces 139 Combining Bases from Subspaces A basis contains linearly independent vectors that span the space Normally we have to check both properties of a basis When the count is right, one property implies the other: Every vector is a combination of the basis vectors in exactly one way. Any n independent vectors in Rn must span R". So they are a basis. Any n vectors that span R" must be independent. So they are a basis. Starting with thc correct number of vectors, one property of a basis produces the other. This is true in any vector space, but we care most about Rn. When the vectors go into the columns of an n by n square matrix A, here are the same two facts: If the n columns of A are independent, they span Rn So Ax = b is solvable If the n columns span Rn, they are independent So Ax = b has only one solution. If AB = I for square matrices, then BA = I. Uniqueness implies existence and existence implies uniqueness. Then A is invertible. If (here are no free variables, the solution X is unique There must be n pivot columns Then back substitution solves Ax = b (the solution exists) Starting in the opposite direcuon, suppose that Ax = b can be solved for every b (existence of solutions). Then elimination produced no zero rows. There are n pivots and no free variables The nullspace contains only x = 0 (uniqueness of solutions). With bases for the row space and the nullspace, we have r + (n - r) - n vectors. This is the right number. Those n vectors are independent.1 Therefore they span R”. Each x is the sum xr + zn of a row space vector zr and a nullspace vector zn. The splitting xr + z„ in Figure 4.3 shows the key point of orthogonal complements— the dimensions add to n and all vectors are fully accounted for. Example 5 For A = * g j split z = j into z, + z„ j + ^ _ j j. The vector (2,4) is in the row space. The orthogonal vector (2, -1) is in the nullspace. The next section will compute this splitting by a projection matrix P. Example 6 Suppose S is a six-dimensional subspace of nine-dimensional space R*. (a) What are the possible dimensions of subspaces orthogonal to S ? 0,1,2,3 (b) What are the possible dimensions of the orthogonal complement Sx of S ? 3 (c) What is the smallest possible size of a matrix A that has row space S ? в by 9 (d) What is the smallest possible size of a matrix В that has nullspace S1 ? в by 9 a If a combination of all n vectors gives zr+*n = 0. then *r = -x,. is in both subspaces So Xr = Xn = 0. All coefficients of the row space basis and of the nullspace basis must be zero. This proves independence of the n vectors together.
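The splitting in Example 5 can be computed with two one-dimensional projections (projection matrices arrive in Section 4.2). This sketch assumes A = [1 2; 3 6], a 2 by 2 matrix whose row space is the line through (1, 2) and whose nullspace is the line through (2, -1), with x = (4, 3):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 6.0]])
x = np.array([4.0, 3.0])
r = np.array([1.0, 2.0])        # direction of the row space
n = np.array([2.0, -1.0])       # direction of the nullspace (A @ n = 0)

x_r = (r @ x) / (r @ r) * r     # projection of x onto the row space: (2, 4)
x_n = (n @ x) / (n @ n) * n     # projection of x onto the nullspace: (2, -1)

print(x_r, x_n, x_r + x_n)      # the two parts add back to x = (4, 3)
print(A @ x_n, A @ x - A @ x_r) # both zero: only x_r reaches the column space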
Chap,Cf4^nahty 140 2 Problem Set 4.1 n u , ^unx of rank one. Copy fi₽« 42 and one vec>°r in each Construct any 2 by 3 matru oi orthogonal? subspace (and put two in the nullspace). a 1 for a 3 by 2 matnx of rank r = 2- Which subspace is Z .л К “ “y wb>’ “ “ impo“ible: 3 . [ 11 and 1-3 . nullspace contains [11 (a) Column space contains [JJ ,J’ Li J [11 and -3 nullspace contains [ 11 jjand^ sj.nuus,— LiJ (c) Ax = [i] has a solution and AT [g] » [g] (d) Every rot ia orthogonal to every column (A is not the zero matrix) (e) Columns add up to a column of zeros, rows add to a row of l’s. 4 If -LB = 0 then the columns of В are in the---- of A. The rows of A are in the of B. With AB - 0, why can’t A and В be 3 by 3 matrices of rank 2? (a) If Ax = b has a solution and ATp = 0. is (yTx = 0) or (yTb = 0)? (b) If ATy «(1,1,1) has a solubon and Ax = 0, then------. 6 This system of equations Ax = b has no solution (they lead to 0 = 1): x + 2y + 2z = 5 2z + 2y + 3z = 5 3x + 4y + 5z = 9 Find numbers уьуьуз to multiply the equations so they add to 0 = 1. You have found a vector у in which subspace? Its dot product у rb is 1, so no solution x. 7 Every system Ax = b with no solution is like the one in Problem 6. There are num- bers yi .........у», that multiply the m equations so they add up to 0 = 1. This is called Fredholm’s Alternative If b ts not in C(A), then part of b is in N(AT). Exactly one problem has a solution: Ax = b OR Ary = 0 with yTb = l, Multiply tbe equations xj - x2 = 1 and Xj — Z3 = 1 and Xi — X3 = 1 by numbers Vi-Уз.Уз chosen so that the equations add up to 0 = 1. 8 wrinr • v. "°* ^Zr B еЯиа1 t0 How do we know that this vector is in the column space’If A — 11 U _ in «. • « >p«x.uA-[11jandx = |i] what is xr?
4.1. Orthogonality of the Four Subspaces 141 9 Jh A Ax ofVanrt IR1 0 Reason: » the nullspace of AT and also in А ГГТ4»“> «--------------------0 and toA.ce «те ли pace аз Л. Thu key faa a repeated in lhe nett tertian. Ю Suppose A is a symmetric matrix (Лт = Д) (a) Why is its column space perpendicular to its nullspace? (b) If Ax — 0 and Az = 5z, which subspaces contain these “eigenvectors” x and z Symmetric matrices have perpendicular eigenvectors xTz = 0. 1 0 3 0 12 Find xr and xn and draw Figure 43 property if A = I1. X| and x = Questions 13-23 are about orthogonal subspaces. 13 Put bases for the subspaces V and W into the columns of matrices V and W. Explain why the test for orthogonal subspaces can be written VT1V = zero matrix. This matches WT w = 0 for orthogonal vectors. 14 The floor V and the wall W are not orthogonal subspaces, because they share a nonzero vector (along the line where they meet). No planes V and W in R3 can be orthogonal! Find a vector in the column spaces of both matnces. 1 2 1 3 1 2 5 4 6 3 5 1 This will be a vector Ax and also Bx. Think 3 by 4 with the matrix [ A В ]. 15 Extend Problem 14 to a p-dimensional subspace V and a g-dimensional subspace W of Rn. What inequality on p + q guarantees that V intersects W in a nonzero vector? These subspaces cannot be orthogonal. 16 Prove that every у in N(XT) is perpendicular to every Ax in the column space, using the matrix shorthand of equation (2). Start from ATy = 0. 17 If S is the subspace of R3 containing only the zero vector, what is Sx ? If S is spanned by (1,1,1), what is Sx ? If S is spanned by (1,1,1) and (1,1,-1). what is a basis for Sx ? 18 Suppose S only contains two vectors (1,5,1) and (2,2,2) (not a subspace). Then Sx is the nullspace of the matrix A =__________. Sx is a subspace even if S is not. 19 Suppose L is a one-dimensional subspace (a line) in R3. Its orthogonal complement Iх is the____________perpendicular to L. Then (Lx)x is a------ perpendicular to L . In fact (Lx)x is the same as--------
Chapter 4.0^,.^ 142 20 21 22 23 <nart.Rt Then Vх contains only the vector_ е11ПЛЛ, V в the whole K « ------Thea Suppose v (Vх Iх is the same as --• n (Vx)x is__•’°' ' t hv the vectors (1.2.2,3) and (1.3,3,2). Find two vector ,l S«P<K~S«^2^»ta«A* = »fa*h'ch'’’ “* span Sx. This в the same as soiling л !f P в the plane of vectors in R4 satisfying z, + *,+«,+ «4 =• 0. write a JL ^ cJXt a matrix that has P as its nuilspace. If, subspace S is contained to a subspace V. explain why Sx contains Vх. about perpendicular columns and rows. Suppose an n by n matrix is invertible: AA~l = I. Then the first column of л orthogonal to the space spanned by which rows of A? is Questions 24-23 are 24 25 Find ArA if the columns of A are umt vectors, perpendicular to each other. 25 Construct a 3 by 3 matnx A with no zero entries whose columns arc mutual] pendicular. Compute ArA. Why is it a diagonal matrix? У **r‘ 27 The lines 3r+ p = b| and 6r + 2y = bj are_____________They are the same line if In that case (bj, bj) и perpendicular to the vector_______The nullspace of the п^ы is the line За 4 у »______. One particular vector in that nullspace is 25 Why is each of these statements false? (a) (1.1,1) is perpendicular to (1,1, -2) so the planes z + у + z = 0 and z + y- 2a = 0 are orthogonal subspaces (b) The subspace spanned by (1,1,0,0,0) and (0,0,0,1,1) is the orthogonal com- plement of the subspace spanned by (1,-1,0,0,0) and (2, -2,3,4, -4). (c) Two subspaces that meet only in the zero vector are orthogonal. 29 Find a matnx A with v (1,2,3) in the row space and column space. Find В with v in the nullspace and column space. Which pairs of subspaces can't share v ? 30 Suppose A is 3 by 4 and R is 4 by 5 and AB = 0. So N(A) contains C(B). Prove from the dimensions of N(A) and C(B) that rank(A) + rank(B) < 4. 31 The command .V = ntal(A) will produce a basis for the nullspace of A. Then the command В = пи1(ЛГ) will produce a basis for the__ of A. 32 What are the conditions for nonzero vectors r, n, с, I in R2 to be bases for the four fundamental subspaces C(AT),N(A),C(A),N(AT) of a 2 by 2 matrix? 33 Whcncan the vectors ri.rj.m.ni,C|,C2,f i,l2 in R4 be bases for thc four funda- mental subspaces of a 4 by 4 matrix ? What is one possible A ?
4 2. Projections onto Subspaces 143 4.2 Projections onto Subspaces \The projection of a vector b onto the line through a it the closest point p = a(aTb/ aTa). 2 The error e = b - p is perpendicular to a: Right inangle bpe has ||p||2 + ||e||a = ||b||a. 3 The projection of b onto a subspace S is the closest vector p in S; b - p is orthogonal to S. 4 Then the projection of b onto the column space of A is the vector p = Л(ЛТЛ)“‘ Лтb. 5 The projection nutria onto С(Л) is P ~ А(ЛТЛ)-’AT.| Then p = Pb and P3 = P. This section of lhe book is about closest points We have a point b that is not in a subspace S (both are in rn dimensions). What point p in the subspace is closest lo b ? A picture of lhe problem suggests the key to tbe solution: The line from b to p is perpendicular to the subspace That line in Figure 4.5 shows us the error e = b - p. Our first examples are "projecting" b onto special subspaces like the xy plane. There is a projection matrix P that multiplies b and produces its projection p = Pb 1 What are the projections of b = (2,3,4) onto tbe z axis and onto lhe xу plane ? 2 What matrices P1 and Pi produce those projections Pb onto a line and a plane ? When b is projected onto a line, its projection p it the part of b along that line. If b is projected onto a plane, p is the part in that plane. The projection p it Pb. The projection onto the z axis we call pt. The second projection drops straight down to the xy plane. The picture in your mind should be Figure 4.4. Start with b a (2,3,4). The г-projection gives pt (0,0,4). The projection down gives p2 = (2,3,0). Those are the parts of b along the z axis and in the xy plane. The projection matrices P\ and Pi are 3 by 3. They multiply b with 3 components to produce p with 3 components. Projection onto a line comes from a rank one matrix. Projection onto a plane comes from a rank two matrix: „ , , Го о 01 Г1 о ol Projection matrix p 0 0 0 Onto the xy plane: 0 1 0 . Onto the z axis: о 0 1] [о 0 0 Pi picks out the г component of every vector. Pi picks out the x and у components. To find the projections pt and p7 of b, multiply b by Pi and Pi (small p for the vector, capital P for the matrix that multiplies b to produce p):
144 * ты rv pla* and lhe z axis are orthogonal spaces, like the noor 01 v. Projection ₽! Figure4.4: The projectionsp, » fib Pt = ** ont° Лс * “ls “d *V plane. More than just orthogonal, the line and plane are orthogonal complements Thejf dimensions add to 1 + 2 « 3. Every vector 6 in the whole space is the sum of fa pans in the two subspaces The projections p, and p, are exactly those two parts of b: The vectors give pj+ft-b The matrices give + P, . J. (J) This is perfect. Our goal is reached—for this example We have lhe same goal for Wy line and any plane and any n-dimensional subspace of R . The object is to find the pan p in each subspace, and also the projection matrix P that produces that part p = pfo Every subspace of R" has its own m by m projection matrix P The best description of a subspace is a basis. We put lhe basis vectors into the columns of A. Nov we art projecting onto the column space of Л! Certainly the z axis is the column space of lhe 3 by 1 matrix A।. The xy plane is the column space of Л2. That plane is also the column space of Л3 (a subspace has many bases). So p2 = p3 and = Pj. Дз has the same column space as A? Our problem is to project any b onto the column space of an m by n matrix A. Start with a line (dimension n = 1). The matnx A will have only one column. Call it a. Projection Onto a Line A line goes through the origin in lhe direction of a = (a.,. . am) Alone that line we want the point p closest to b = lb, h t tv 1, ' 1 8 ’ The line from b top is perpendicular to A^' рПуеС1,0П ,S orthog°"ali>y: f figure 4.5. We now compute p by algebra.
4 2. Projections onto Subspace* 145 The projection p win be some multiple of a. Call ц p = io = hat tunes a. C°?Pff £ ^i^mV p'^ P ТЪсП frora for ₽ *e read ° th find the "t* * J*** lhree S|CP' will lead co all projection matnces: ftnd x, then find the vector p = Ax. then find tbe matrix P The dotted line 6 - p i. the “error" e = b - ia. h b perpend.cular to а-this will determine x. Use the fact that b - xa is perpendicular to a when their dot product is zero: Projecting b onto a with error e = b - z a . Tl • a*b о о (2) fl.(b-za) = Q or a.b = 2a.a 9 a^a s 7^ The multiplication aTb is the same nab. Using the transpose is better, because it applies also to matnces. Our formula z ж oTb/aTa gives the projection p = xa. Figure 4.5: The projection p of b onto a line (left) and onto S = column space of A. The projection of b onto the line through a is the vector p = xa ж °т в. Special сне I: If b = a then i = I. The projection of a onto a is itself. Pa - a Special case 2: If b is perpendicular to a then aTb ж 0. The projection is p = 0. ’ 1 ' 2 2 Solution The number x is the ratio of aTb = 5 to aTa = 9. So the projection is p = Ja The error vector between b and p is e = b - p. Those vectors p and e will add to b. Example 1 Project b onto a = to find p = xa = | a in Figure 4 J 5 P=9 5 10 10 9’ 9 ’ 9 1 1 1 1 9 The error e should be perpendicular to a = (1,2,2) and it is: e a
146 . Of b P. and e. The vector b is split into two pan_ Look at the right tnang popendicular part is e. Those two sides p M - W **• T"f"''v*•"«pr»dut,' have length ||p|l — - «=£ws has length W = “ПЙ5 Hl = l|b|l (3) aTa ы ampler than gening involved with coefl and the length Of s. The dot product is а lot (|Ьц = yj 8® Of 6. The example has square * j$ sqUilre roots in the projection p=5o/!Jlne«wu ' ' Now comes the projection matrix In the formula for p what matrix is multiply, 6, You c^ 2Z JXer if the number x „ on the nght stde of a: Projection matrix P s Pb "hen the matrix is P * tsar ara aTa' Solution Multiply column a rimes row aT and divide by a1 a 1 9 P i, . column rimes a row! Tbe column is a. the row is a Tben dmde by the number aTo The projectton matnx P is m by m. but its rank is one We are projecting onto a one-dunensional subspace, the line through a. That hne ss the column space of P. Ex.mpte2 Fmd the projection matrix P » onto the line through a = [j], 9: 1 2 2 И 2 4 2 1 9 Projection matrix 2 4 This matrix projects any vector bonto a. Check p ” Pb for b = (1,1,1) in Example I: P-Pb-I 1 1 1 2 21 Г1 2 4 4 2 4 4 1 9 5 10 10 > • which is correct. If the vector a is doubled, the matrix P stays the same! It still projects onto the same line. If P is squared. P3 equals P. Projecting a second time doesn't change anything. so P2 - P. The diagonal entries of P add to | (1 + 4 +• 4) = 1 = dimension of line. P3 = = P when you cancel the number e*a aTe aTo The matrix I - P should be a projection too. It produces the other side e of the triangle— the perpendicular part of b. Note that (I - P)b equals b - p which is e in the left nullspace. When P projects onto one subspace. I — P projects onto the perpendicular subspace. Now we move beyond lines and planes in R3. Projecting onto an n-dimensional KxJfiuj-n p • Cff°rt T** cn,cial formulas will be collected in equations (5H6H7). Bastoally you need to remember thou-.b--------
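The three formulas for a line (the number aTb/aTa, the projection p, and the matrix P = aaT/aTa) are easy to try in Python. This sketch reproduces Examples 1 and 2 with a = (1, 2, 2) and b = (1, 1, 1):

import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([1.0, 1.0, 1.0])

x_hat = (a @ b) / (a @ a)          # 5/9
p = x_hat * a                      # (5/9, 10/9, 10/9)
e = b - p                          # error, perpendicular to a
P = np.outer(a, a) / (a @ a)       # rank one projection matrix

print(x_hat, p, e @ a)             # e . a = 0
print(P @ b)                       # the same projection p
print(np.allclose(P @ P, P))       # True: projecting twice changes nothing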
4.2. Projections onto Subspaces 147 Projection Onto a Subspace Start with n vectors <ц,..., a„ m R Assume that these a's are linearly independent. Problem: Find the combination p = z,a, + ... + * a glyfn „аог b We are projee mg eac m R onto the n-dimensional subspace spanned by the a's. With n = I (one vector a() this is projection onto a line. The line is lhe column space of A, which has just one column. In general the matnx A has n columns a,...a„. The combinations in R”1 are the vectors Az in the column space. We are looking for the particular combination p = Ax (the projection) that is closest lo b. The hat over z indicates the best choice z. to give the closest vector in the column space. That choice is £ = aTb/a a when n = 1. Forn > 1. the best z = (z,........£„) is to be found now. We compute projections onto n-dimensional subspaces in three steps as before Find the vector x. Find the projection p = Az. Find the projection matrix P. The key is in the geometry ' The dotted line in Figure 4.5 goes from b to the nearest point Ax in the subspace. This error vector b — Ax is perpendicular to lhe subspace. The error b - Ax makes a right angle with all the vectors al,...,an in the base. Those n right angles give the n equations for x: af (b - Az) = 0 or aj(b- Az) ж 0 .T (4) 0 The matrix with those rows af is AT. The n equations are exactly AT(b - Ax) = 0. Rewrite AT(b - Az) = 0 in its famous form ATAx = ATb. This is the equation for z, and the coefficient matrix is AT A. Now we can find z and p and P. in that order. The combination p = X|Oi+ ••• +znOn that is closest to b is p = Az: Findz(nxl) AT(b-Az)-0 or ATAz = ATb (5) This symmetric matrix ATA is n by n. It is invertible if the a's are independent. The solution is x = (ATA)~lATb.Theprvjecribnofbontotbesubspaceisp: Findp(mxl) p = Az = A(ATA)* ATb. The next formula picks out the projection matrix that is multiplying b in (6): Find P (m x m) (6) (7)
148 Compare with pr®Jft'uon For n = 1 и», ЛЬ»0** °°ta,,n “d ATA = »Ta <1 by ъ - -S* .»d Р=«Й •"« aTa a a a . n[ical with (5)and <« and nUmbcr ®Tq become. ,k Those formulas are Л divide by 1L When it is a matrix. we in ,he Л -.................................°- лТл » *5£ The linear ^dependence = Q used gcometry (e is orthogonal to Cach The key step was A ( _ .. g very quick and beautifu] * a). Unear algebra gives thu normal egaanon way, 1. Our subspace is the column space of Л 2. The error vector e = b - Лх is perpendicular to that column space. I N» « Ь i. *1* ..llx*» °f f! ТЬеь ЛТ(Ь - Л») « 0 and ЛТЛ£ « лть Th, left » -pod»* '** Г*0*"”1" J- Пи1!1Р“е °'»* «го, vector е = b - Лх. The vector b is split into the projection p and the error e = b ₽ Projection produces a right mangle with sides p. «. and b Example 3 If Л - [} ?] «d b =[?] find £ and p and P. Solution Compute the square matnx ЛТЛ and the vector ATb. Solve ЛТЛ2 » лт6: Equations ЛТЛ£ = ЛТЬ : The combination p = Ai is the projection of b onto the column space of A: Two checks on the calculation. First, the error e = (1, -2,1) is perpendicular to both columns (1.1.1) and (0.1,2) of Л. Second, the matrix P times b = (6.0,0) correctly gives p = (5.2, -1). That solves the problem for one particular b. The projection matrix is P = Л(ЛТЛ)-*ЛТ. The determinant of ATA is 15 - 9 = 6. Then multiply A times (ЛТЛ)~’ limes Лт to reach P: We must have Р» = p. because a second and P=1 6 5 2-1 2 2 2 -12 5 (10) projection doesn’t change the first projection!
4.2- Projections onto Subspaces 149 The ™. P . 4(4’4)-UT » Уоо (ЛТЛ)-. *» A ,-ujtw It '.' ** »>*«* ii imo P. you will H~l p = A A {A ) Л . Apparently everything cancels. This looks like P = I. the identity matrix. We want to say why this is wrong. fhe matrix A is rectangular. It has no (wn, We н, (лт A)-1 inlo A"1 times (A ) because there is no A -»in the first place In our expencnce, a problem that involves a rectangular matnx almost always leads to ArA- When A has independent columns. ATA is inveruble This fact is so crucial thal we state it clearly and give a proof. A A Is invertible if and only If A has linearly independent columns, j Proof ЛТА is a square matrix (n by n). For every matrix A, we will now show that A1 A has the same nullspace as A. When the columns of A art linearly independent, its nullspace contains only the zero vector. Then ATA. with this same nullspace, is invertible. Let A be any matrix. If x is in its nullspace, then Ax • 0. Multiplying by AT gives AT Ax = 0. So x is also in the nullspace of ATA. Now start with the nullspace of ATA. From ATAx « 0 we must prove Ax 0. We can't multiply by (AT)'1, which generally doesn't exist Just multiply by xT: (xT)ATAx = 0 or (Ax)T(Ax)-0 or |Ax|2 = 0. (II) We have shown: If A1 Ax “ 0 then Ax has length zero Therefore Ax = 0. Every vector x in one nullspace is in the other nullspace. If ATA has dependent columns, so has A. If ATA has independent columns, so has A. This is the good case: ATA is invertible. When A has independent columns, AT A is square and symmetric and invertible. AT A ATA дТ A ATA [110 I 2' [2 41 [1 1 01 1 2‘ [2 41 [2 2 0 1 2 0 0 4 *1 [2 2 1] 1 2 .° >. ’1 в Л dependent singular indep. invertible Very brief summary To find the projection p = Zia1 + • • • + zna„. solve ATAx = A1 b. This gives x. The projection of b is p = Ax and the error is e = b~p = b Ax. The projection matrix P = A(ATA)_,AT gives p = Pb This matrix satisfies P2 = P. The distance from b to the subspace C(A) is ||e|| = ||b — p|| (p = closest point). Example Suppose your pulse is measured at x = 70 beats per minute, then z = 80. then x = 120. Those three equations Ax = b in one unknown have AT = (1 1 1) and b = (70,80,120). Thenx = 90° is the average of 70,80,120 Use calculus or algebra: I. Minimize E = (x - 70)a + (x - 80)’ + (x - 120)1 by solving dE/dx = 6z - 540 = 0. , . «Tfc 70 + 80+120 2. Project b = (70,80.120) onto a = (1,1,1) to find x= =-----------=90.
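Here is a NumPy check of Example 3 and of the formula P = A(ATA)^-1 AT. Solving the normal equations directly is the textbook route; numpy.linalg.lstsq would do the same job more stably on larger problems.

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations: x_hat = (5, -3)
p = A @ x_hat                               # projection p = (5, 2, -1)
e = b - p                                   # error e = (1, -2, 1)
P = A @ np.linalg.inv(A.T @ A) @ A.T        # projection matrix onto C(A)

print(x_hat, p, e)
print(A.T @ e)                              # zero: e is perpendicular to both columns
print(np.allclose(P @ b, p), np.allclose(P @ P, P))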
150 Copter 4. Onho^^ Problem Set 4.2 Questions 1-9 ask for projections p onto lines. Abo errors e = b - p and niatricts p Project lhe vector b onto the line through a. Check that e и perpendicular to a; T (b) b= 3 1 and Dm the projection of b onto a and also compute it from p - xa: In Problem I. find the projection matrix P « aaT/ara onto the line through vector a. Verify in both cases that P3 = P. Multiply Pb in each rw l0 co Mc" the projection p Projection matrices onto lines have rank 1. PUIe 4 Construct the projection matrices Pi and Pi onto the lines through the ai in Prob- lem 2. Is it true that (ft + ft)’ - ft + ft? ‘П>« *<*« be true if P, ft - 0. 5 Compute the projection matnees aaT/a'!a onto the lines through a! - (-1,2,2) and oj = (2.2. -1) Multiply those projection matrices and explain why their prod- uct Pt Pj is what it is. 6 Project b a (1,0,0) onto lhe lines through Oi and a? in Problem 5 and also onto ej a (2.-1,2). Add up the three projections p, +p, +p3. 7 Continuing Problems 5-6. find the projection matrix P3 onto aj = (2, — 1,2). Verify that ft + ft + ft a /. This is because the basis a], aj. a3 is orthogonal I Questions 5-6-7: orthogonal Questions 8-9-10: not orthogonal
42 projections onto Subspace» 151 project thc vector b = (1,1) onU) Draw thc projection» p, and p, and add p because thc a’» are not orthogonal. through at . (1,0) and a2 = (1,2). + Pj. Thc projections do not add to b 9 10 In Problem 8, the projection of b p = A(ATA)-1 AT for A = (a( project ai = (1,0) ontoa2 = (1,2). Then these projections and multiply the projection onto the plane of a( and a2 will equal b. Find °2' = [ol] = invertible main*. project the result back onto a,. Draw matnces Pt Pj; Is this a projection? Questions 11-20 ask for projections, and projection matrices, onto »u tn paces. 11 Project b onto the column space of A by solving ATAi -ЛЧшАр-Аж: 1 1 2 1 1' ’4‘ (a) A = 0 1 and b a 3 (b) A = 1 1 and b “ 4 0 0 4 0 1 6 Find e = b - p. It should be perpendicular to the columns of A. 12 Compute the projection matrices P, and Pj onto the column spaces in Problem 11. Verify that P^b gives the first projection p,. Also verify P22 Pj. 13 (Quick and Recommended) Suppose A is the 4 by 4 identity main* with its last column removed. A is 4 by 3. Project b « (1,2,3,4) onto the column space of A. What shape is the projection matrix P and what is P? 14 Suppose b equals 2 times the first column of A. What is the projection of b onto the column space of A? Is P = I for sure in this case? Compute p and P when b = (0,2,4) and the columns of A are (0,1,2) and (1,2,0). 15 If A is doubled, then P = 2A(4ATA)* *2AT. This is the same as A(ATA)_,AT. The column space of 2A is the same as______. Is x the same for A and 2A? 16 What linear combination of (1,2, -1) and (1,0,1) is closest to b - (2,1,1)? 17 (Important) If P2 = P show that (/ - P)2 « I - P. When P projects onto thc column space of A, / — P projects onto the_________. 18 (a) If P is the 2 by 2 projection matnx onto the line through (1,1), then / - P is the projection matrix onto_____. (b) If P is the 3 by 3 projection matrix onto the line through (1.1,1). then I - P is the projection matrix onto______. 19 To find the projection matrix onto the plane x — у — 2z = 0. choose two vectors in that plane and make them the columns of A. The plane will be tbe column space of A! Then compute P — A(ArA)"*AT. 20 To find the projection matrix P onto the same plane X - у - 2z = 0, write down a vector e that is perpendicular to that plane. Compute tbe projection Q - ее / e e and then P = I — Q.
152 21 24 26 28 _ Л/АтА)~'Ат by iee^- Cancel ,0 Prove that pa n “in“lumn W««’ „T „ p™. ,ы p - “ »”“«by “mpu"”8 ' Remcmb''««u, Х“.^"“"«“!уп,лллс- .. ,JTJ , и л i, М|««е and mvcrt.ble. (he warning 'P'T‘J( n™ Wt " = /• When A В invertible, why is P - П Whath^ 1 Then AA *(Л ) л n nf Ar it to 'hc column space С(Л)- So if ATb - 0 ik The nullspace of A it —-— Check that P - л/ лт ’ ,'|e projection of b onto C(A) should be p = -• Check that _ A(A A)~>Лт gives this answer. The projection matrix P onto an n-dimensional subspace of R”' has rank r . n. Keaton: The projections Pb fill the subspace S. So S is the-of P. If an m by rn matrix has Л3 = Л and its rank is m, prove that A = I. The important fact that ends lhe section it this: If ATAx = 0 then Ax = <j New Proof. The vector Ax is in lhe nullspace of-. Ax is always in the column space of ___. To be in both of those perpendicular spaces, Ax must be zero. Use PT « P and P2 = P to prove that the length squared of column 2 always equals the diagonal entry Pjj. This number i* £ = j|j + jjj + ;& for ‘52-1 29 If I) has rank rn (full row rank, independent rows) show that В В1 is invertible. 30 (a) Find the 2 by 2 projection matrix Pc onto the column space of A (after looking closely at the matrix!) . _ [ 3 6 6 1 I 4 8 8 J 31 32 33 (b) Find the 3 by 3 projection matrix Pw onto the row space of A. Multiply В = Pc Al ft. Your answer В should be a little surprising—can you explain it? In R™. suppose I give you b and also a combination p of щ.an. How would you test to see if p is the projection of b onto the subspace spanned by the a’s? SnjWeynhkn.n.te.ve^^ „Г4..4,............ w When Z,,™ „rives, check “•al + (610U0 - IoM)/1(Kjo. That step updates fotd to xww. Suppose P, and P, are projection matrices (P'2 = pi = /rT) Provc (his fac,. A P3 is a projection matrix if and only if Pt p3 = p2pl.
4.3. Least Squares Approximations   153

4.3  Least Squares Approximations

It often happens that Ax = b has no solution. The usual reason is: too many equations. The matrix A has more rows than columns. There are more equations than unknowns (m is greater than n). The n columns span a small part of m-dimensional space. Unless all measurements are perfect, b is outside that column space of A. Elimination reaches an impossible equation and stops. But we can't stop just because measurements include noise!

To repeat: We cannot always get the error e = b − Ax down to zero. When e is zero, x is an exact solution to Ax = b. When the error e is as small as possible, x̂ is a least squares solution. The words "least squares" mean that ||b − A x̂||² is a minimum. Our goal in this section is to compute x̂ and use it. These are real problems that need answers.

Note   In statistics this problem is linear regression: x and b often become Y and X.

The previous section emphasized p (the projection). This section emphasizes x̂ (the least squares solution). They are connected by p = A x̂. The fundamental equation is still A^T A x̂ = A^T b. Here is a short unofficial way to reach this "normal equation":

When Ax = b has no solution, multiply by A^T and solve A^T A x̂ = A^T b.

Example 1   A crucial application of least squares is fitting a straight line to m points. Start with three points: Find the closest line to the points (0, 6), (1, 0), and (2, 0).

No straight line b = C + Dt goes through those three points. We are asking for two numbers C and D that satisfy three equations: n = 2 and m = 3 and m > n. Here are the three equations at t = 0, 1, 2 to match the given values b = 6, 0, 0:

t = 0   The first point is on the line b = C + Dt if C + D · 0 = 6
t = 1   The second point is on the line b = C + Dt if C + D · 1 = 0
t = 2   The third point is on the line b = C + Dt if C + D · 2 = 0

This 3 by 2 system has no solution: b = (6, 0, 0) is not a combination of the columns (1, 1, 1) and (0, 1, 2). Read off A and x and b from those equations:

A = [1 0; 1 1; 1 2]   x = [C; D]   b = [6; 0; 0]   Ax = b is not solvable: the system is overdetermined.
154   Chapter 4. Orthogonality

The same numbers were in Example 3 in the last section. We computed x̂ = (5, −3), so 5 − 3t will be the best line for the 3 points. Those numbers are the best C and D. We must explain why A^T A x̂ = A^T b produces them.

In practical problems, there could easily be m = 100 points instead of m = 3. They don't exactly match any straight line C + Dt. Our numbers 6, 0, 0 exaggerate the errors so you can see e1, e2, and e3 in Figure 4.6.

Minimizing the Error

How do we make the error e = b − Ax as small as possible? This is an important question with a beautiful answer. The best x (called x̂) can be found by geometry (the projection p), by algebra (A^T A x̂ = A^T b), or by calculus (set the derivatives of ||b − Ax||² to zero).

By geometry   Every Ax lies in the plane of the columns (1, 1, 1) and (0, 1, 2). In that plane, we look for the point closest to b. The nearest point is the projection p. The best choice for A x̂ is p. The smallest possible error is e = b − p, perpendicular to the columns. The three points at heights (p1, p2, p3) do lie on a line, because p is in the column space of A. The best line C + Dt comes from x̂ = (C, D).

By algebra   Every vector b splits into two parts. The part in the column space is p. The perpendicular part is e. There is an equation we cannot solve (Ax = b). There is an equation A x̂ = p we can and do solve (by removing e and solving A^T A x̂ = A^T b):

Ax = b = p + e is impossible.   A x̂ = p is solvable.   x̂ is (A^T A)^{-1} A^T b.   (1)

The solution to A x̂ = p leaves the least possible error (which is e):

Squared length for any x:   ||Ax − b||² = ||Ax − p||² + ||e||².   (2)

This is the law c² = a² + b² for a right triangle. The vector Ax − p in the column space is perpendicular to e in the left nullspace. We reduce Ax − p to zero by choosing x = x̂. That leaves the smallest possible error e = (e1, e2, e3), which we can't reduce.

Notice what "smallest" means. The squared length of Ax − b is minimized: The least squares solution x̂ makes E = ||Ax − b||² as small as possible.

Figure 4.6a shows the closest line. It misses by distances e1, e2, e3 = 1, −2, 1. Those are vertical distances. The least squares line minimizes E = e1² + e2² + e3².

Figure 4.6b shows the same problem in 3-dimensional space (b, p, e space). The vector b is not in the column space of A. That is why we could not solve Ax = b. No line goes through the three points. The smallest possible error is the perpendicular vector e. This is e = (1, −2, 1), the vector of errors in the three equations. Those are the distances from the best line. Behind both figures is the fundamental equation A^T A x̂ = A^T b.
4.3. Least Squares Approximations   155

Figure 4.6: Best line and projection: two pictures, same problem. (In the first picture the errors are the vertical distances to the line.) The line has heights p = (5, 2, −1) with errors e = (1, −2, 1). The equations A^T A x̂ = A^T b give x̂ = (5, −3). The best line is b = 5 − 3t and the closest point is p = 5 a1 − 3 a2. Same answer!

Notice that the errors 1, −2, 1 add to zero. Reason: The error e = (e1, e2, e3) is perpendicular to the first column (1, 1, 1) in A. The dot product gives e1 + e2 + e3 = 0.

By calculus   Most functions are minimized by calculus! The graph of E bottoms out and the derivative in every direction is zero. Here the error function E to be minimized is a sum of squares e1² + e2² + e3² (the square of the error in each equation):

E = ||Ax − b||² = (C + D · 0 − 6)² + (C + D · 1)² + (C + D · 2)².   (3)

The unknowns C and D tell us the closest line C + Dt. With two unknowns there are two derivatives, both zero at the minimum. They are "partial derivatives" because ∂E/∂C treats D as constant and ∂E/∂D treats C as constant:

∂E/∂C = 2(C + D · 0 − 6) + 2(C + D · 1) + 2(C + D · 2) = 0
∂E/∂D = 2(C + D · 0 − 6)(0) + 2(C + D · 1)(1) + 2(C + D · 2)(2) = 0.

∂E/∂D contains the extra factors 0, 1, 2 from the chain rule. (The last derivative from (C + 2D)² was 2 times C + 2D times that extra 2.) Those factors are 1, 1, 1 in ∂E/∂C. It is no accident that those factors 1, 1, 1 and 0, 1, 2 in the derivatives of ||Ax − b||² are the columns of A.

Now cancel 2 from every term and collect all C's and all D's:

The C derivative is zero:  3C + 3D = 6
The D derivative is zero:  3C + 5D = 0        This matrix [3 3; 3 5] is A^T A!   (4)
156   Chapter 4. Orthogonality

These equations from calculus are the same as the normal equations from linear algebra. These are the key equations of least squares: The partial derivatives of ||Ax − b||² are zero when A^T A x̂ = A^T b. The best C and D are the components of x̂; they solve the "normal equations" of linear regression.

The solution is C = 5 and D = −3. Therefore b = 5 − 3t is the best line: it comes closest to the three points. At t = 0, 1, 2 this line goes through p = 5, 2, −1. It could not go through b = 6, 0, 0. The errors are 1, −2, 1. This is the vector e!

The Big Picture for Least Squares

The key figure of this book shows the four subspaces and the true action of a matrix. The vector x on the left side went to b = Ax on the right side. In that picture x was split into its row space and nullspace parts, and there were many solutions to Ax = b.

In this section the situation is just the opposite. There are no solutions to Ax = b. Instead of splitting up x we are splitting up b. Figure 4.7 shows the big picture for least squares. Instead of Ax = b (not solvable, because b is not in the column space) we solve A x̂ = p (solvable, because p is in the column space). The error e = b − p is unavoidable.

Figure 4.7: The projection p = A x̂ is closest to b, so x̂ minimizes E = ||b − Ax||². In the figure, A has independent columns, so the nullspace N(A) is only the zero vector; b is outside the column space, and p = Pb is the nearest point inside it.

Notice how the nullspace N(A) is very small: just one point. With independent columns, the only solution of Ax = 0 is x = 0. Then A^T A is invertible, and A^T A x̂ = A^T b fully determines the best vector x̂. The error has A^T e = 0.
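A minimal MATLAB check of Example 1 (my sketch, not from the text). It reproduces C = 5, D = −3 and the heights and errors in Figure 4.6:

    t = [0; 1; 2];  b = [6; 0; 0];
    A = [ones(3,1) t];              % columns (1,1,1) and (0,1,2)
    xhat = (A'*A) \ (A'*b);         % C = 5, D = -3
    p = A*xhat;                     % heights on the best line: (5, 2, -1)
    e = b - p;                      % vertical errors (1, -2, 1); they add to zero
    xls = A \ b;                    % backslash gives the same least squares solution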
4.3. Least Squares Approximations   157

Fitting a Straight Line

Fitting a line is the clearest application of least squares. It starts with m points, hopefully near a straight line. At times t1, ..., tm those m points are at heights b1, ..., bm. The best line C + Dt misses the points by vertical errors e1, ..., em, and the least squares line minimizes E = e1² + ··· + em².

The first example in this section had three points in Figure 4.6. Now we allow m points (and m can be large). The two components of x̂ are still C and D.

A line goes through the m points when we exactly solve Ax = b. Generally we can't do it. Two unknowns C and D determine a line, so A has only n = 2 columns. To fit the m points, we are trying to solve m equations (and we only have two unknowns!):

C + Dt1 = b1
C + Dt2 = b2        with   A = [1 t1; 1 t2; ... ; 1 tm]   and   x = [C; D].   (5)
...
C + Dtm = bm

The column space of A is so thin that almost certainly the vector b is outside of it. When b happens to lie in the column space, the points happen to lie on a line. That case b = p is very unusual. Then Ax = b is solvable and e = (0, ..., 0).

The best line C + Dt has heights p1, ..., pm with errors e1, ..., em. Solve A^T A x̂ = A^T b for x̂ = (C, D). The errors are ei = bi − C − Dti.

Fitting points by a straight line is so important that we now find the two equations A^T A x̂ = A^T b, once and for all. The two columns of A are independent (unless all of the times ti are the same). So we turn to least squares and solve A^T A x̂ = A^T b.

Dot-product matrix   A^T A = [1 ··· 1; t1 ··· tm] [1 t1; ... ; 1 tm] = [m  Σ ti; Σ ti  Σ ti²].   (6)

On the right side of the normal equation is the 2 by 1 vector A^T b:

A^T b = [1 ··· 1; t1 ··· tm] [b1; ... ; bm] = [Σ bi; Σ ti bi].   (7)

In a specific problem, the t's and b's are given. The best x̂ = (C, D) is (A^T A)^{-1} A^T b. The line C + Dt minimizes e1² + ··· + em² = ||Ax − b||² when A^T A x̂ = A^T b:

A^T A x̂ = A^T b      [m  Σ ti; Σ ti  Σ ti²] [C; D] = [Σ bi; Σ ti bi].   (8)
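Equation (8) needs only four sums. A short MATLAB sketch (with made-up data, just to illustrate the formula) builds those sums and checks against the matrix form:

    t = [1; 2; 3; 4; 5];                    % example times (my own data)
    b = [2.1; 2.9; 4.2; 4.8; 6.1];          % heights roughly on a line
    m = length(t);
    N = [m sum(t); sum(t) sum(t.^2)];       % A'A from equation (6)
    r = [sum(b); sum(t.*b)];                % A'b from equation (7)
    CD = N \ r;                             % best C and D from equation (8)
    A = [ones(m,1) t];
    CD_check = (A'*A) \ (A'*b);             % same answer from the full matrix A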
158   Chapter 4. Orthogonality

The vertical errors at the m points on the line are the components of e = b − p. This error vector (the residual) b − A x̂ is perpendicular to the columns of A (geometry). The error is in the nullspace of A^T (linear algebra). The best x̂ = (C, D) minimizes the total error E, the sum of squares (calculus):

E(x) = ||Ax − b||² = (C + Dt1 − b1)² + ··· + (C + Dtm − bm)².

Calculus sets the derivatives ∂E/∂C and ∂E/∂D to zero, and produces A^T A x̂ = A^T b.

Other least squares problems have more than two unknowns. Fitting by the best parabola has n = 3 coefficients C, D, E (see below). In general we are fitting m data points by n parameters x1, ..., xn. The matrix A has n columns and n < m. The derivatives of ||Ax − b||² give the n equations A^T A x̂ = A^T b. The derivative of a square is linear. This is why the method of least squares is so popular.

Example 2   A has orthogonal columns when the measurement times ti add to zero. Suppose b = 1, 2, 4 at times t = −2, 0, 2. Those times add to zero. The columns of A have zero dot product: (1, 1, 1) is orthogonal to (−2, 0, 2):

C + D(−2) = 1
C + D(0) = 2        A = [1 −2; 1 0; 1 2]   b = [1; 2; 4].
C + D(2) = 4

When the columns of A are orthogonal, A^T A will be a diagonal matrix (this is good):

A^T A x̂ = A^T b   is   [3 0; 0 8] [C; D] = [7; 6].   (9)

Main point: Since A^T A is diagonal, we can solve separately to find C = 7/3 and D = 6/8. The zeros in A^T A are dot products of perpendicular columns in A. The diagonal matrix A^T A, with entries 3 and 8, is almost as simple as the identity matrix.

Orthogonal columns are worth the small effort of subtracting away the average time t̄ = (t1 + ··· + tm)/m. If the original times were 1, 3, 5 then their average is t̄ = 3. The shifted times T = t − t̄ add to zero!

T1 = 1 − 3 = −2,  T2 = 3 − 3 = 0,  T3 = 5 − 3 = 2   give   A = [1 −2; 1 0; 1 2]   and   A^T A = [3 0; 0 8].

Now C and D come from the easy equation (9). The best straight line uses C + DT, which is C + D(t − t̄). We even get a formula for C and D. That was a perfect example of the idea coming in the next section: Make the columns orthogonal in advance.
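A MATLAB sketch of Example 2 (assuming the same heights b = 1, 2, 4 are measured at the original times 1, 3, 5): shifting the times makes A^T A diagonal, so C and D separate.

    t = [1; 3; 5];  b = [1; 2; 4];
    T = t - mean(t);                 % shifted times -2, 0, 2 add to zero
    A = [ones(3,1) T];
    N = A'*A                         % diagonal matrix [3 0; 0 8]
    xhat = N \ (A'*b);               % C = 7/3 and D = 6/8 come out separately
    % the best line in the original variable is C + D*(t - mean(t))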
4.3. Least Squares Approximations   159

Dependent Columns in A: Which x̂ is best?

From the start, this chapter assumed independent columns in A. Then A^T A is invertible and A^T A x̂ = A^T b produces the only least squares solution to Ax = b. Which x̂ is best if A has dependent columns?

In the figure for this example, the measurements b1 = 3 and b2 = 1 are at the same time T, and all the dashed lines through the data have the same errors e = (1, −1). A straight line C + Dt cannot go through both points. I think we are right to project b = (3, 1) to p = (2, 2) in the column space of A. That changes the equation Ax = b to the equation A x̂ = p. An equation with no solution has become an equation with infinitely many solutions. The problem is that A has dependent columns and x̂1 + x̂2 = 2 has many solutions.

Which solution x̂ should we choose? All the dashed lines in the figure have the same two errors 1 and −1 at time T. Those errors (1, −1) = e = b − p are as small as possible. But this doesn't tell us which dashed line is best. My instinct is to go for the horizontal line at height 2.

The "pseudoinverse" of A will choose the shortest solution x⁺ = A⁺b to A x̂ = p. Here, that shortest solution will be x⁺ = (1, 1). This is the particular solution in the row space of A, and x⁺ has length √2. (Both solutions x̂ = (2, 0) and (0, 2) have length 2.) We are choosing the nullspace component of the solution x⁺ to be zero.

When A has independent columns, the nullspace only contains the zero vector and the pseudoinverse is our usual left inverse L = (A^T A)^{-1} A^T. When I write it that way, the pseudoinverse sounds like the best way to choose x̂. The shortest solution x⁺ is often called the minimum norm solution: its nullspace component is zero.

Comment   MATLAB experiments with singular matrices produced either Inf or NaN (Not a Number) or a huge number like 10^16 (a bad answer). There is a warning in every case! I believe that Inf and NaN and 10^16 come from the possibilities 0x = b and 0x = 0 and 10^{-16}x = 1. Those are three small examples of three big difficulties: singular with no solution, singular with many solutions, and very very close to singular. Try more experiments.
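A MATLAB sketch of this example (my choice of the common time, T = 1, so the two columns of A are equal): pinv picks the shortest solution.

    A = [1 1; 1 1];  b = [3; 1];         % two measurements at the same time T = 1
    xplus = pinv(A) * b;                 % shortest solution (1, 1), length sqrt(2)
    p = A * xplus;                       % projection of b onto C(A): p = (2, 2)
    e = b - p;                           % smallest possible error (1, -1)
    x1 = [2; 0];  x2 = [0; 2];           % other solutions of A*x = p, both of length 2
    norm(A*x1 - b) - norm(A*xplus - b)   % zero: every solution leaves the same error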
160   Chapter 4. Orthogonality

Fitting by a Parabola

If we throw a ball, it would be crazy to fit the path by a straight line. A parabola b = C + Dt + Et² allows the ball to go up and come down again (b is the height at time t). The actual path is not a perfect parabola, but the whole theory of projectiles starts there.

When Galileo dropped a stone from the Leaning Tower of Pisa, it accelerated. The distance fallen contains a quadratic term in t. (Galileo's point was that the stone's mass is not involved.) Without that t² term we could never send a satellite into its orbit. But even with a nonlinear function like t², the unknowns C, D, E still appear linearly! Fitting points by the best parabola is still a problem in linear algebra.

Problem   Fit heights b1, ..., bm at times t1, ..., tm by a parabola C + Dt + Et².

Solution   With m > 3 points, the m equations for an exact fit are generally unsolvable:

C + Dt1 + Et1² = b1
...                     is Ax = b with the m by 3 matrix   A = [1 t1 t1²; ... ; 1 tm tm²]   and   x = (C, D, E).   (10)
C + Dtm + Etm² = bm

Least squares   The closest parabola C + Dt + Et² chooses x̂ = (C, D, E) to solve the three normal equations A^T A x̂ = A^T b.

May I ask you to convert this to a problem of projection? The column space of A has dimension ______. The projection of b is p = A x̂, which combines the three columns using the coefficients C, D, E. The error at the first data point is e1 = b1 − C − Dt1 − Et1². The total squared error is e1² + ______. If you prefer to minimize by calculus, take the partial derivatives of E with respect to ______, ______, ______. These three derivatives will be zero when x̂ = (C, D, E) solves the 3 by 3 system of equations A^T A x̂ = A^T b.

Example 3   For a parabola b = C + Dt + Et² to go through the three heights b = 6, 0, 0 when t = 0, 1, 2, the equations for C, D, E are

C + D · 0 + E · 0² = 6
C + D · 1 + E · 1² = 0        (11)
C + D · 2 + E · 2² = 0.

Those three points give three equations and a square invertible matrix. The solution is x = (C, D, E) = (6, −9, 3), and the parabola b = 6 − 9t + 3t² goes through the three points exactly. The matrix has three columns, which span the whole space R³. The projection matrix is the identity. The projection of b is b. The error is zero. We didn't need A^T A x̂ = A^T b, we just solved Ax = b. If there are m = 4 data points (still with 3 unknowns), then we need A^T A and least squares.
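A small MATLAB sketch of Example 3 (a check, not from the text). The 3 by 3 system is solved exactly; adding a fourth point (made up here) forces least squares:

    t = [0; 1; 2];  b = [6; 0; 0];
    A = [ones(3,1) t t.^2];                  % square 3 by 3 matrix
    x = A \ b;                               % exact fit: C = 6, D = -9, E = 3
    t4 = [0; 1; 2; 3];  b4 = [6; 0; 0; 2];   % one extra (hypothetical) measurement
    A4 = [ones(4,1) t4 t4.^2];
    xhat = (A4'*A4) \ (A4'*b4);              % least squares parabola: m = 4 > n = 3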
43. Least Squares Approximations 161 Three Ways to Measure Error Start with nine measurements bi to b». all urn, at times t = The tenth measurement b10 = 40 is an outlier. Find the best horizontal line у = C to fit the ten points (1,0),(2,0),...,(9,0),(10,40) using three options for the error E: (1) Least squares Ei = ej + • • • + ejo (then the normal equation for C is linear) (2) Least maximum error Eoo = | emaj | (3) Least sum of errors E\ = |«il+ ” ' +le»o| Solution (1) The least squares fit to 0,0,...,0,40 by a horizontal line is C = 4: A = column of l's ЛтА = 10 ATb = sum of b, = 40. So 10C - 40. (2) The least maximum error requires C = 20. halfway between 0 and 40. (3) The least sum requires C = 0 (!!). The sum of errors 9|C| + |40 - C| would increase if C moves up from zero. The least sum comes from the median measurement (the median of 0.....0,40 is zero). Many statisticians feel that the least squares solution is too heavily influenced by outliers like bio ~ 40. and they prefer least sum. But the equations become nonlinear. Now find the least squares line C + Dt through those ten points (1,0) to (10,40): Those come from equation (8). Then ArAx = ATb gives C « -8 and D “ 24/11. Problem Set 43 Problems 1-11 use four data points b = (0.8,8,20) to bring out the key ideas.
162 a’»pt«4.otthogonah(y Wkh b = 0,8,8,20 at t = 0,1.3,4. set up and solve the normal eqUations 4t 4i = лтЬ- Rx the best straight line m Figure 4.8a. find its four heights P1 and*four errors e,. What is the minimum value E - e1 + e2 + e3 + e<? 1 2 3 4 5 6 7 8 9 10 11 (Line C + Dt does go through p’s) With b = 0,8,8,20 at times t = о 1 a write down the four equations Az = b (unsolvable). Change the measuremen p = 1,5,13,17 and find an exact solution to Az = p. nts 10 Check that e = b - P = (-1.3, -5,3) is perpendicular to both columns of the same matrix A. What is tbe shortest distance H from b to the column space of Л? (By calculus) Write down E = ||Az - b(|2 as a sum of four squares—the last is (C + 4D - 20)2. Find the derivative equations dE/dC = 0 and dE/dD ~° o' Divide by 2 to obtain tbe normal equations A1 Az = ATb. Find the height C of the best horizontal line to fit b = (0,8,8,20). An exact fit would solve the unsolvable equations C = 0, C = 8, C = 8, C = 20. Find the 4 by 1 matrix A in these equations and solve ATAz = ATb. Draw the horizontal line at height x = C and the four errors in e. Project b = (0,8,8,20) onto the line through a = (1,1,1,1). Find z = aTb/aTa and the projection p = xa. Check that e = b - p is perpendicular to a, and find the shortest distance |elf from b to the line through a. Find the closest line b = Dt, through the origin, to the same four points. An exact fit would solve D • 0 = 0, D • 1 = 8, D • 3 = 8, D • 4 = 20. Find the 4 by 1 matrix and solve ATAz = ATb. Redraw Figure 4.8a showing the best line b = Dt and the e’s. Project b = (0,8,8,20) onto the line through a = (0,1,3,4). Find X = D and p = xa. The best C in Problems 5-6 and the best D in Problems 7-8 do not agree with the best (C, D) in Problems 1-4. That is because (1,1,1,1) and (0,1,3,4) are ______perpendicular For the closest parabola b = C + Dt + Et2 to the same four points, write down the unsolvable equations Az = b in three unknowns z = (C, D, E). Set up the three normal equations AT Az = ATb (solution not required). In Figure 4.8a you arc now fitting a parabola to 4 points—what is happening in Figure 4.8b? For the closest cubic b = C + Dt + Et2 + Ft3 to the same four points, write down the four equations Az = b. Solve them by elimination. In Figure 4.8a this cubic now goes exactly through the points. What are p and e? The average of the four times is t = {(0 + 1 + 3 + 4) = 2. The average of the four b’s is b = |(0 + 8 + 8 + 20) = 9. (a) Verify that the best line goes through the center point (F, b) = (2,9). (b) Explain why C + Dt = b comes from the first equation in ATAx = A^b.
4.3. Least Squares Approximation* 163 Questions 12-16 introduce basic ideas of statistics—the foundation for least squares. 12 (Recommended) This problem projects b = (b,.......b_) onto the line through a = (1,... t !)• We solve m equations ax = b in one unknown x (by least squares). (a) Solve aTax = a Yb to show that x is the mean (the average) of the b’s. (b) Find e = b - ax and tbe variance ||e|2 and the standard deviation ||e||. (c) Thehorizontal line b = 3isclosest tob = (1,2,6). Check that p = (3.3,3) is perpendicular toeandfindthe3by3 projection matrix P. 13 First assumption behind least squares: Ax = b - (noise e with mean zero). Multiply the error vector e - b - Ax by (ATA)-IAT to get x - x on lhe right. The estimation errors x — x also average to zero. The estimate x is unbiased. 14 Second assumption behind least squares: The m errors e, are independent with vari- ance a2, so the average of (b - Ax)(b - Ax)T is a2!. Multiply on lhe left by (ATA)-1AT and on the right by A(ATA)-1 to show that tbe average matrix (x — x)(x — x)T is <72(ATA)~*. This is the covariance matrix in Section 8.4. 15 A doctor takes 4 readings of your heart rate. The best solution to x = b|,...,x = b« is the average x of bi,...,bt. The matrix A is a column of Га. Problem I4 gives the expected error (x - x)2 as o2(ATA)~l = . By averaging, the variance drops from a2 to a2 /4. 16 If you know the average xg of 9 numbers b|,..., 6». how can you quickly find the average хщ with one more number Ью? Tbe idea of recursive least squares is to avoid adding 10 numbers. What number multiplies x# in computing гщ ? *io = i^^io +-------i» = + ••• + 6ю) Questions 17-24 give more practice with x and p and e. 17 Write down three equations for the line b = C + Dt to go through b = 7 at t = — 1, b = 7 at t = 1. and b = 21 at t = 2. Find the least squares solution x = (C, D) and draw the closest line. 18 Find the projection p = Ax in Problem 17. This gives the three heights of the closest line. Show that the error vector is e = (2. -6,4). Why is Pe = 0? 19 Suppose the measurements at t = -1,1.2 are the errors 2.-6.4 in Problem 18. Compute x and the closest line to these new measurements. Explain the answer: b = (2, -6.4) is perpendicular to__________so the projection is p = 0. 20 Suppose the measurements at t = -1,1.2 are b = (5.13,17). Compute x and the closest line and e. The error is e = 0 because this b is-----. 21 Which of the four subspaces contains the error vector e? Which contains p? Which contains x? What is the nullspace of A?
164 СЬдр,сг4°пЬор)Па1йу 22 HndihetatltaeC + »iofiifc-4.2.-1.0.0mtiIne#r = -2,-|,0,i,2> 23 к the error vector e orthogonal to b or p or e or z? Show that ||e||2 equals which equals brb - pTb. This is the smallest total error E. 24 The partial derivatives of | Ax|2 with respect Ю Ц.z„ fill the vector 2AтЛа. The derivatives of 2bJ Ax fill the vector 2ATb So lhe derivatives of || Ax _ 6#a zero when_____. Challenge Problems 25 What condition on (f|,bi).(fj.^2).(f3.fo) Puls >hose three points onto a straight line’! A column space answer is: (bt.bj.bj) must be a combination of (1, 1, i) an<J (ft. h- <»)• Try to reach a specific equation connecting the f's and b’s. I should have thought of this question sooner! 26 Hnd the plane that gives the best fit lo the 4 values b - (0,1,3,4) at the corners (1.0) and (0,1) and (-1.0) and (0. -1) of a square. The equations C+Dx + Ey* b al those 4 points are Ax • b with 3 unknowns x — (C,D, E). What is A? At lhe center (0.0) of the square, show thal C + Dx + Ey = average of the b's. 27 (Distance between lines) The points P (x, x, x) and Q (y, 3y, -1) arc on two lines in space that don't meet. Choose x and у to minimize the squared distance ||P - QH1. The line connecting the closest P and Q is perpendicular to___________ 28 Suppose the columns of A arc not independent. How could you find a matrix В so thal P B(BrB)~'Br does give lhe projection onto the column space of A? (The usual formula will fail when ATA is not invertible.) 29 Usually there will be exactly one hyperplane in R” that contains the n given points x = O o..............*.-!• (Example for n = 3: There will be one plane containing 0, o,, aj unless------) What is the test to have exactly one plane in R"? Example 2 shifted the times t, to make them add to zero. We subtracted away the average time t « (t, +... + tm)/m to gc, T( _ t( _ f -p,^ Tj add (0 With the columns (I,., entries are m and 7? + ., 1) and (7j,...,Tm) now orthogonal, A1A is diagonal. Ils ' ” + T^. Show that the best C and D have direct formulas: Tfat-t Cg*l+' +*! W)d blr,q--.. + 6m7'BI is an example*^ t^GmmVhmidt «2- ,ha‘ diagOna' advance. This и in Section 4 4 process, orthogonalize the columns of A in
4.4. Orthogonal Matrices and Gram-Schmidt   165

4.4  Orthogonal Matrices and Gram-Schmidt

This section has two goals, why and how. The first is to see why orthogonality is good. If Q has orthonormal columns, then Q^T Q = I. Least squares becomes easy. The second goal is to convert independent vectors in A to orthonormal vectors in Q. You will see how Gram-Schmidt combines the columns of A to produce right angles between columns of Q.

From Chapter 3, a basis consists of independent vectors that span the space. The basis vectors could meet at any angle (except 0° and 180°). But every time we visualize axes, they are perpendicular. In our imagination, the coordinate axes are practically always orthogonal. This simplifies the picture and it greatly simplifies the computations.

The vectors q1, ..., qn are orthogonal when their dot products qi · qj are zero. More exactly, qi^T qj = 0 whenever i ≠ j. With one more step, dividing each vector by its length, the vectors become orthogonal unit vectors. Their lengths are all 1 (normal). Then the basis is orthonormal.

DEFINITION   The vectors q1, ..., qn are orthonormal if their dot products are 0 or 1:

qi^T qj = 0 when i ≠ j   and   qi^T qi = 1 (unit vectors).

A matrix with orthonormal columns is assigned the special letter Q. The matrix Q is easy to work with because Q^T Q = I. This repeats in matrix language that the columns q1, ..., qn are orthonormal. Q is not always required to be square.
Chapter 4. 166 lily When row i of (?T multiplies column jof Q. the dot product is qj 9j< 0(r (i # j) that dot product is zero by orthogonality. On the diagonal (i = jj йе u .<U4o^ give qrq = |9(|’ = ]. ^ can be rectangular (m > n) or square (rn = " veqOrs Hhen Qis square, QTQ = I means that QT= Q~\ transpose == /n„erj When Q is not square, its rank is n < m. So QQT cannot equal /m If the columns are only orthogonal (not unit vectors), dot products still gjve matrix QTQ (but not the identity matrix). This diagonal matrix is almost as “ dia8on4| The important thing is orthogonality—then it is easy to produce unit vectors ^°°d To repeat QTQ ж 1 even when Q is rectangular. In that case QT js On( from the left For square matrices we also have QQT = /, so QT is the t *nvcr”: »ene of Q. The rows of a square Q are orthonormal like the columns. The .•*°'s‘<,ed b- transpose In this square case we call Q an orthogonal matrix 1 ">ersr is the Here are three examples of orthogonal matrices—rotation and permutation . bon. The quickest test is QrQ /. at>d ^flec- Exampte 1 (Rotation) Q rotates every vector in the plane by the angle # and Q-t к Q= Геш# -sin# - V" I sin# COS# sin# - sin в cos # ' cos# The columns of Q are orthogonal (take their dot product). They are unit vecton because sin29 + cos’ 9 « I. Those columns give an orthonormal basis for the plane R2, Example 2 (Permutation) These matrices change the order to (y, z, x) and (y,*); V t and 1 V Inverse = transpose: |0 0 [1 0 Oj [aj [xj l* ',J lMJ L*J All columns of these Q't are unit vecton (their lengths are obviously 1). They are also orthogonal (the l’s appear tn different places). The inverse of a permutation matrix is its transpose Q~l QT. The inverse puls the components back into their original order: ° 0 11 Гу] [ж] rn ,i г.л r_s 10 0 z 0 10 x JU. Every permutation matrix is an orthogonal matrix. Example 3 (Reflection) If u is any unit vector, set Q = I - 2uuT. Notice that uu is a matrix while uTu is the number |u|2 = 1. Then QT and Q~* both equal Q\ <3T*/-’wT’Q QTQ = f _ 4UUT + 4uuTuu and V (1) a better name for Q, but it’» not used. Any matnx with *e only call it an orthogonal matrix when It is square-
4.4. Orthogonal Matrices and Gram-Schmidl 167 Reflection matrices 1 - 2uu are symmetric and also orthogonal. If you them, you get thc identity Q = Q Q - I Reflecting twice through a mirror bring* back the original, like (-1) = 1. Notice uTu = I inside4uuTuuT inequauonflj. Figure 4.9: Rotation by Q = [J •] and reflection across 45° by Q - [J J). As an example choose the direction u = (-l/>/2,1/^2). Compute 2uuT (column times row) and subtract from I to get the reflection matnx Q across thc 45° line. aaw»» j] [»;][;]-[;]. When (*, y) goes to (y, x), a vector like (3,3) doesn't move. It is on the mirror line. Rotations preserve the length of every vector. So do reflections. So do permutations. If Q has orthonormal columns (QTQ = I), it leaves lengths unchanged: \|Qx|| - ||x|| Same length for Qx (Qx)T(Qx) = xrQrQx - xTlx xTx (2) Same dot product: (<?x)T(Qу) = xTQTQy « xTy. Just use QTQ / Projections Using Orthonormal Bases: Q Replaces A Orthogonal matrices are excellent for computations—numbers can never grow loo large when lengths of vectors are fixed. Stable computer codes use Q's as much as possible. For projections onto subspaces, all formulas involve ATA. The entries of AT A arc the dot products between the columns of A. Usually AT A is not diagonal. Suppose those columns are actually orthonormal Tbe a's become the q's. Then AT A simplifies to QTQ = I. Look at the improvemenu in x and p and the projection matrix P = QQT. Instead of QrQ we print a blank for the identity matrix: _____x = QTb and p-Qx and P = Q--------------Qr. (3) The least squares solution of Qx = b is x = QTb. There are no matrices to invert. This is the point of an orthonormal tmis. The best x = QTb just has dot products of q,,...,qn With b. We have 1-dimensional projections' A A is now Q Q - /.There is no coupling. When A is Q, with orthonormal columns, here is p = Qx - QQ »:
168   Chapter 4. Orthogonality

Projection onto q's   p = q1 (q1^T b) + q2 (q2^T b) + ··· + qn (qn^T b).   (4)

Important case when Q is square: If m = n, the subspace is the whole space. Then Q^T = Q^{-1} and x̂ = Q^T b is the same as x = Q^{-1} b. The solution is exact! The projection of b onto the whole space is b itself. In this case p = b and P = QQ^T = I.

You may think that projection onto the whole space is not worth mentioning. But when Q is square and p = b, our formula assembles b out of its 1-dimensional projections. If q1, ..., qn is an orthonormal basis for the whole space, then b equals QQ^T b. Every b is the sum of its components along the q's:

b = q1 (q1^T b) + q2 (q2^T b) + ··· + qn (qn^T b).   (5)

Example 4   The columns of this orthogonal Q are orthonormal vectors q1, q2, q3:

m = n = 3   Q = (1/3) [−1 2 2; 2 −1 2; 2 2 −1]   has   Q^T Q = QQ^T = I.

The separate projections of b = (0, 0, 1) onto q1 and q2 and q3 are p1 and p2 and p3:

q1 (q1^T b) = (2/3) q1   and   q2 (q2^T b) = (2/3) q2   and   q3 (q3^T b) = −(1/3) q3.

The sum of the first two is the projection of b onto the plane of q1 and q2. The sum of all three is the projection of b onto the whole space, which is b itself:

Reconstruct b   p1 + p2 + p3 = (2/3) q1 + (2/3) q2 − (1/3) q3 = (1/9) [−2 + 4 − 2; 4 − 2 − 2; 4 + 4 + 1] = [0; 0; 1] = b.

Transforms   QQ^T = I is the foundation of Fourier series and all the great "transforms" of applied mathematics. They break vectors b or functions f(x) into perpendicular pieces. Then by adding the pieces in (5), the inverse transform puts b and f(x) back together.

Fourier series   f(x) = a0 + a1 cos x + b1 sin x + a2 cos 2x + b2 sin 2x + ···

Only two differences: those are functions, and the sine-cosine basis is infinite: m = n = ∞.
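A quick MATLAB check of Example 4 (my sketch): the three one-dimensional projections add back to b, because the square matrix Q has orthonormal columns.

    Q = (1/3) * [-1 2 2; 2 -1 2; 2 2 -1];
    b = [0; 0; 1];
    c = Q' * b;                                    % components along q1, q2, q3: (2/3, 2/3, -1/3)
    p = Q(:,1)*c(1) + Q(:,2)*c(2) + Q(:,3)*c(3);   % sum of the 1-D projections
    disp(p - b)                                    % zero vector: the pieces rebuild b
    disp(norm(Q'*Q - eye(3)))                      % zero: the columns really are orthonormal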
4.4. Orthogonal Matrices and Gran-Schmidt 169 The Gram-Schmidt Process The point of т“тТи^ “ lhjI “onho8°nal “ g(wd"- Projections and least squares always involve A A. When this matrix becomes QTQ = /. ihe inverse is no problem. The one-dimensional projections are uncoupled. The best z is Qrb Gust n separate dot products). For this to be true, we had to say “If the vectors are orthonormal". Now we explain the “Gram-Schmidl way" to create orthonormal vectors. Start with three independent vectors a, b. c. We intend to construct three orthogonal vectors А, В, C. Sooner or later we will divide А. В, C by their lengths. That produces three orthonormal vectors q, = A/|A|. q2 = <j3 = C/JC||. Gram-Schmidt Begin by choosing Ажа. This first direction is accepted as it comes. The next direction В must be perpendicular to A. Start with b and subtract its projection along A. This leaves the perpendicular part, which is the orthogonal vector B: First Gram-Schmidt step B = b- (6) A and В are orthogonal in Figure 4.10. Multiply equation (6) by AT to verify that ATB A1 b - A b = 0. This vector В is what we have called lhe error vector e. perpendicular to A. Notice that В in equation (6) is not zero (otherwise a and b would be dependent). The directions A and В are now set. The third direction starts with c. This is not a combination of A and В (because c is not a combination of a and b). But most likely c is not perpendicular to A and B. So subtract off its components in those two directions to get a perpendicular direction C: Next Gram-Schmidt step (7) Subtract the projection p to get В = b — p Figure 4.10: First project b onto the line through a and find the orthogonal В as b - p. Then project c onto the AB plane and find C as c - p,. Divide by ЦАЦ, ||B||. ||C||. This is the one and only idea of the Gram-Schmidl process Subtract from every new vector its projections in the directions already set. That idea is repeated at every step. A = a b
Chapter 4. 170 Illy , К d ue would subtract three projections onto А. В, C t0 lc. n ~ «.л-* “»"""«»»; •=«»„ A ° E^dCM-SdvM SW~lbc'»^^"^-onto«0""l»«ooa,(, ’-care and ’ 2 0 -2 and c = ’ 3 -3 3 b = 0 Then A = a has ATA = 2 and ATb = 2. Subtract from b its projection p along A; В 2 1 First step 2 Check: AJ В » 0 as required. Now subtract the projections of c on A and В to get C. T 1 . Next step c Brc 6 6 —A-VB = £-^ + ;B = A BrB 2 6 Check; C = (1,1,1) is perpendicular to both A and B. Finally convert A,B,C to unit vecton (length I. orthonormal). The lengths of А. В, C are >/2 and ч/б and vj Divide by those lengths for an orthonormal basis q,, q2 Qi 1 '1 if1] 1 pl Usually А. В. C contain fractions Almost always q(, q2, q3 contain square roots. The Factorization A = QR We started with a matrix A, whose columns were a, b, c. We ended with a matrix Q whose columns are q^.q^ How are those matrices related? Since the vectors a b c °* lhe ’ ‘ VWM)’ “* ** 1 n,atnx connecting 'a third matnx is the tnangular R in A - QR. (Nol the R ln chapier 1.) e4uatK,n Г*” 'nWlVCd) Thc S,eP Was Z ed Si ’°" °f Л аП<1 B A‘ 'hat Ma«e C and «3 *c« This non-mvolvement of later vecton is the key point of Gntm-Schm.dt: • The vecton a and A and q, « all along a single line. • The vectona.band А Я »wi ~ _ ... ам Л. В and q,, q2 are all in the same plane. * vecton g. b, c and ABC and 9i>9j.q3 are in one subspace (dimension 3).
Orthogonal Matrice» and Gram-Schmidt 171 At every step ai,...,a* are combination*of q,.....„ Later q * arc not involved. The connecting matrix R is triangular, and we have A = QR a b rfa 0 0 «1 4i q3 fljb 0 4ic flic flje, or A = QR <8> A • QR « Gram-Schmidt in a nutshell. Multiply by QT ю recognize that R = QT A (Gram-Schmidt) From independent vectors a.... a., Gram-Schmidt construct* orthonormal vectors flp ••., fln. The matrices with these columns satisfy A — QR- Then R — Q1A is upper triangular because later q\ are orthogonal to earlier a's. Here are the original as and the final q't from the example. The i,j entry of R - QT Л is row 1 of Qr times column j of A. The dot products go into R. Then A — QR: 2 0 -2 3' -3 3 l/v/2 = -l/s/2 l/s/6 l/x/6 -2/^6 0 1/»/3] 1/V3 l/s/5j 0 -QR 0 Look closely at Q and R. The lengths of A. B.C are */2. s/б. s/3 on lhe diagonal of R. The columns of Q are orthonormal. Because of lhe square roots. QR might look harder than LU. Both factorizations are absolutely central to calculations tn linear algebra Any rn by n matrix A with independent columns can be factored into A — QR- The rn by n matrix Q has orthonormal columns, and the square matnx R is upper tnangular with positive diagonal. We must not forget why this is useful for least squares. ATA = (QR)TQR = RrQTQR = RT R. The least squares equation A1 Ax — Arb simplifies to RT Rx = RrQrb. Then finally we reach Rx = QTb: success. Least squares Rr Rx — RTQrb or Ri - Q'b or x = R~'QJb (9) Instead of solving Ax = b. which is impossible, we solve Rx = Q1 b by back substitu- tion—which is very fast. The real cost is the mn1 multiplications needed by Gram-Schmidt. The next page has an informal code. It projects each new column v = a} onto the known orthonormal columns q(,...,q,_*. After subtracting those projections from a7. the last line divides the new orthogonalized vector (still called t>) by its length r„. This produces the next orthonormal vector q;.
172 Starting from a. b.c 4l =ai/|oill subtracts all projections at once. -"-«onulity = ei. aj. аз the code will construct 91. B, q2, C\ C, q3. В = 02- (41a^i 42 ~ (qT<j)91 C = C--(^)«2 9з = С/||С|| nmiM-tion al a time in C* and then C. That change j m “* ”«<» m (Ю) ;e *s celled kxj = l:n v = tari = l:J-l R(i.j) = <?(:• »/*« v=0-R(iJ)»Q(:J); end R(j,j) = norni(u); Q(:.» = »/ЯиЛ end % modified Gram-Schmidt % v begins as column j of thc original A 4c columns d| to are already settled in Q Як compute R,} = which is q}v Як subtract the projection (q/'vjq, Як v is now perpendicular to all of q,,..., q % the diagonal entries RJ} are lengths Як divide v by its length to get the next Як the loop “for j = 1: n” produces all of the qj To recover column j of A, undo the last step and the middlesteps of the code- Л(у, j)qj = (v minus its projections) = (column j of A) - . (Ц) =i Moving the sum to the far left, this is column j in the multiplication QR = A. Note Good software like LAPACK. used m good systems like MATLAB and Julia and Python, will offer alternative ways to factor A = QR. “Householder reflections” act on A to produce the upper triangular R. This happens one column at a time in thc same way that elimination produces the upper triangular U in LU. Those reflection matrices I - 2uuT will be described in Section 7.4. If A is tridiagonal we can simplify even more to use 2 by 2 rotations. The result is always A = QR and the MATLAB command to orthogonalize A is [Q, Я] = qr(A). Here is a further way to reduce roundoff error: Allow reordering of thc columns of A. When each new 4j is found, subtract its projections from all remaining columns j + 1 to n. Then choose the largest of the resulting vectors as aj+1 (leading to q... ,). We are exchanging columns just as elimination exchanged rows. So a permutation P is allowed on the column side of A, and AP = QR or rot^iT Ih^‘1Gram-S<;uhm•<1, “ ®" «>* g<»d process to understand, even if reflections or rotations or column exchanges lead to a more perfect Q.
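The built-in qr command packages all of this. A short MATLAB sketch (using the matrix and right side from Example 1 of Section 4.3, my choice) that solves least squares through R, as in equation (9):

    A = [1 0; 1 1; 1 2];  b = [6; 0; 0];
    [Q, R] = qr(A, 0);                   % economy size: Q is 3 by 2, R is 2 by 2
    xhat = R \ (Q'*b);                   % back substitution gives (5, -3)
    disp(norm(A - Q*R))                  % A = QR up to roundoff (qr chooses the signs)
    disp(norm(Q'*Q - eye(2)))            % orthonormal columns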
4 4. Orthogonal Matrices and Gram-Schmidt 173 REVIEW OF THE KEY IDEAS 1, If the orthonormal vectors оn . T n . T , , . are the columns of Q. then q,'q. = 0 and 4» 4i = * •tanslate into the matrix multiplication QTQ = / 2. If Q is square (an orthogonal matrix) then Qr = Q- >: = inrer^e 3. The length of Qx equals the length of z; |QZ|| _ |хц 4. The projection onto the column space of Q spanned by the q't is P = QQT. 5. If Q is square then P = QQ? = / b = (flTfr) +... + qjrfb). 6. Gram-Schmidt produces orthonormal vectors qi.q2,q3 from independent a,b,c. In matrix form this is the QR factorization A = (orthogonal Q)(triangular R). WORKED EXAMPLE Add two more rows and columns with all entries 1 or — 1, so the columns of this 4 by 4 Hadamard matrix are orthogonal. How do you turn Hi into an orthogonal matrix Q ? 1 -1 x x. The projection of b = (6,0,0,2) onto the first column of Hi is px = (2,2,2,2). The projection onto the second column is pj = (1,—1,1,—1). What is the projection pt 2 of b onto the 2-dimensional space spanned by the first two columns? Solution Hi is built from H2 just as Ht is built from Ht: Я4 = H2 H2 1 I 1 1' Я21 _ -я2] - 1 -1 1 1 1 -1 1 -1 -1 -1 -1 1 has orthogonal columns. Then Q = Я/2 has orthonormal columns. Dividing by 2 gives unit vectors in Q. A 5 by 5 Hadamard matrix is impossible because the dot product of columns would have five l's and/or — l’s and could not add to zero. Я8 has orthogonal columns of length s/8. „т„ [Ят ЯТ1[Я Я1_[2НТЯ 0 1 _ [87 01 Яа «8'7в= [ят _ят] [я _я] " [ о 2ЯТЯ]"[О 8/J 4’-^ What is the key point of orthogonal columns? Answer ATA is diagonal and easy to invert. We can project onto lines and just add. The axes are orthogonal.
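A MATLAB check of the worked example (my sketch): because the columns of H4 are orthogonal, the projection onto the plane of the first two columns is just p1 + p2.

    H = [1 1 1 1; 1 -1 1 -1; 1 1 -1 -1; 1 -1 -1 1];   % the Hadamard matrix H4 above
    disp(H'*H)                           % 4I: orthogonal columns of length 2
    Q = H/2;                             % orthonormal columns
    b = [6; 0; 0; 2];
    p1 = H(:,1) * (H(:,1)'*b) / 4;       % projection on column 1: (2, 2, 2, 2)
    p2 = H(:,2) * (H(:,2)'*b) / 4;       % projection on column 2: (1, -1, 1, -1)
    p12 = p1 + p2;                       % projection onto the plane of those two columns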
174 ChaptW4O^onallty Problem Set 4.4 Problems 1-12 are about orthogonal sectors and orthogonal matrices. 1 Are these pairs of vecton orthonormal or only orthogonal or only independent? йН-з] (c) Change the second vector when necessary to produce orthonormal vectors. 2 The vecton (2,2,-l)and (-1,2,2) are orthogonal. Divide them by their len find orthonormal vecton qt and q2 Put those into the columns of О and . ’° QTQtndQQT v multiply 3 (a) If A has three orthogonal columns each of length 4, what is AT A ? (b) If A has three orthogonal columns of lengths 1,2,3. what is ATA ? 4 Give an example of each of the following: (a) A matrix Q that has orthonormal columns but QQT J. (b) Two orthogonal vecton that are not linearly independent. (c) An orthonormal basis for RJ, including the vector q, ж(1,1,1)/у/з S Find two orthogonal vecton in the plane * + v + 2x = 0. Make them orthonormal • If Qi «»d Qj are orthogonal matrices. show that thetr product Q,n, thogonal matrix. (Use QTQ * I.) V t Vj is also an or- 7 8 9 If Q has orthonormal columns, what is the least squares solution x to Qx = b? If <?] and q2 are orthonormal vectors in Rs. what combination__+___________q2 is closest to a given vector b? (a) Compute P • QQT when q, « (.8, .6,0) and q2 « (-.6, .8,0). Verify that Dl _ D 10 11 (b) Prove that always (<JQT)2 » QQT by using QrQ « I. Then P « QQT is the projection matnx onto the column space of Q. Orthonormal vectors q{,q2.q2 are automatically linearly independent. (a) Vector proof: When c> + cjfl2 -t-Cjflj = 0, what dot product leads to ci =0? Similarly cj = 0 and cj = 0. Thus the q's are independent. (bl Matnx proof: Show that Qx = 0 leads to x = 0. Since Q may be rectangular, you can use QT but not Q"1. Find orthonormal vectors q, and q2 in the plane spanned by a = (1,3,4,5,7) and b = (-6,6,8,0,8). Which combination is closest to (1,0,0,0,0) ?
4.4. 12 13 14 15 16 17 18 19 20 Orthogonal Matrices and Gram-Schmidl 175 If ai, aa, a3 is a basis for R3. any vector u. - ” ^t r ° can be written as b = х1а|+гааа+.тз<*з. (a) Suppose the as are orthonormal. Show that r, = вт6 <Ь> Suppose 4» «> are onhog.uul. Show Um », . «f <c> 1Г Ле L11 - [J) ЛтЛеЛе^и!. В orthogonal to a? Sketch a figure to show a, b, and В. loJ Complete theGram-Schmidt process in Problem 13 by computing q, - а/||a|| and q, = B/IIBII and factoring into QR: Find orthonormal vectors q\,q2.q2 such that qt,q2 span the column space of A. Which “fundamental subspace" contains q, ? Solve Ax « (1,2,7) by least squares. Г -I 4 What multiple of а - (4,5,2,2) is closest to b - (1,2,0,0)? Find orthonormal vectors q( and q2 in the plane of a and b. Project b — (1,3,5) onto the line through а — (1,1,1). Then find p and e. Compute the orthonormal vectors q( = а/|а|| and q2 - e/|e||. (Recommended) Find orthogonal vectors А. В. C by Gram-Schmidt from a. b. c a -(1,-1,0,0) b-(0,1,-1,0) c = (0,0,1,-1). A, B,C and a, b,c are bases for the vectors perpendicular tod — (1,1,1,1). If A — QR then A1 A = RTR = ____________ triangular times ___ triangular. Gram-Schmidt on A corresponds to elimination on ЛТЛ. The pivots for Лт A must be the squares of diagonal entries of R. Find Q and R by Gram-Schmidl for this A: -1 1' 2 1 2 4 Find an orthonormal basis for the column space of A. Then project b on С(Л).
Chapter 4. Orthog^,^ 176 21 22 Hndortho?’"*1 vectc** A В ' f n = 1 2 and bs ’ f -1 0 and •“•Jit в» Then write л 4 finances of a,b,c (independent colum Г1 2 4 - 0 0 5 • 0 3 6. ns). c = T о 4 23 24 25 26 27 28 29 30 31 bl m> 23-2* use the Qfl code above equation (11). It executes Gram.Schmidt Shownhy C (found Via C* in (Ю)) b equd to C in equation (7). Equate (7) subtracts from c its components along A and В Why not subtract components along a and along b WTtern « dr rnn’ small multiplscation* in executing Gram-Schmidl ? Apply the MATLAB qr code to a - (2,2.-1). b = (0,-3,3),c« (1,0,0). Whai are the д’»? If u is a unit rector, then Q - I - 2«uT b • reflection matrix (Example 3). Find Q. from u - (0.1) and Qj from u (0. V2/2. V2/2). Draw the reflections when Qt and Qj multiply the vectors (1.2) and (1,1,1). Find all matrices that are both orthogonal and lower tnangular Q-I- 2uuT is a reflection matrix when uTu - 1. Two reflections give QJ / (a) Show that Qv - -a The mirror is perpendicular lo u. (b| Find Qv when uTv - 0 The mirror contains v. It reflects to itself. Challenge Problems (MATLAB) Factor |Q. Я) = qr(A) if A has columns (1, -1,0,0) and (0,1, -1,0) and (0.0,1. -1) and (0,0,0.1). Cao you scale the orthogonal columns of Q to get nice integer components ? If A is m by n with rank n. then qr( A) produces a square Q and zeros below R: The factors from MATLAB are (m by m)(m by n) A = f о ‘ The n columns of Q( are an orthonormal basis for which fundamental subspace? The m-n columns of Q2 are an orthonormal basis for which fundamental subspace? 32 We know that P = QQT is the projection onto the column space of Q(m by n). Now add another column a to produce A = (Q a]. Gram-Schmidt replaces a by what vector q? Start with a. subtract_.divide by_to find q.
5 Determinantsand Linear Transformations 5.1 3 by 3 Determinants 5j Properties and Application, of Determinanls 5.3 Linear Transformations The determinant of a square matnx i« >n ..._. that the column vectors are dependent and A ' П8 numbeT И det A = 0. this signals Za for A - * Will have a divXt iZre™ T”'"' " ** A “ ™ f°" and the "cofactor formula" for det A A ™™" 5’1 finds 3 b> 3 determinants Section 5.2 begins with algebra; Cramer s Rule fnr , _ a-tx geometry: the volume of a tilted box. The edecs of the hn - Tbc" “ nWVCS ‘° When the box is flat. A is singular. In ,11 c^s Z * Г г”1Т "* t в «• in an cases, the volume of the box is |dctA|. When we multiply matrices AB ue j mllipl,ln, boxes .nd die., volumes. X." are parallelograms and | det Al = Mc. * Л , Жс Ьо*“ к 11 ca Section 53 multiplies matrices for det AB. This link to volumes leads naturally to line» . в3 hikrtnciibe intn. tii>~i k- .r linear transformations. A linear transformation ;:*e^.o7^^ then you know T(.) to, every ““ Wmfani™. Tdoe. lo. bon. There are three useful formulas for dot л тк. — r Pivots^Ube triangular matnx U So the product of pivots in U gives ± det A. This is usually the fastest way A second formula uses determmants of size n - I; the "cofactors" of A Thev rive the best formula for A"1. The entry of A~* « th, , . t л j j i. j * 5 _. . . , „ , л ’1 < °> л и the j, i cofactor divided by det A. The Ing formula for det A adds up n! terms^oe for every path down the matnx. Each path chooses one entry from every row and column, and we multiply the n entries on the path. Reverse the sign if the path has an odd permutation of column numbers. A 3 by 3 matnx has six paths and the big formula has six terms—one for every 3 by 3 permutation. This chapter could easily become fuU of formulas! The connection to linear transfor- mations shows how an n by n matrix acts on a shape in n dimensions It produces another shape. And the ratio of the two volumes is | det A|. Determinants could tel) us everything, if only they were not so hard to compute.
Chapter 5. Ddermmanis and Linear Trantf^ 178 5.1 3 by 3 Determinants ______________________ H is ea — be. The singular matrix [“ has 4^7 al L J U, life H _ Гс П has det PA = be - ad- а o] [c d] [o 6J "detд 1 The determinant of A = 2 Rowexchange рА=Г0 reverses signs [ 1 "'["t 9a Ib * -uB 1 is x(ad - be)+ V( Ad- Be). De‘ b lineaf.. n>wlbyW then 1.2.3 remain true: det = 0 when A is singular det •------v iIr„ _ ^~L...lre.Vcrs«Ssi| 3 4 If A is n by n then 1.2.3 remain «к.. —--------- _ when rows are exchanged, det is linear in row 1. Also, det = product kr.T***® s'gn ‘ <' —• jT = det A is an am . ^pivots "et BA =7de7B)(det A) and det AT = det A. Hus is an ama^ , formuias. But often they are not practical for computing. Determinants lead 10 ‘ ‘ZsTo see how determinants produce the matrix A~>. -pus да-иоп will focus on 3 by 3 j 4J Rm wiU come 2 by 2 matnces: ГПЬГошиЬГогЛ-'»'”»"”»”1^ , d. |i «1 i «kJ0 11—i < H'ic drt • ‘ =Ьс'^ det _ j I"1 detl 1 0] Iе “J L J L J „ matrices. Their determinants change sign when the We «art *iA™ sign change appears for any matnx. This rule becomes a rows are exchanged invfne when det A - 0. key to determinants ofЯ o/c = b/d. The rows are parallel. For n by 2^«fet A^neam that tbe columns of A are not independent. A combmation n matnces. det A_ лх = о with ж # 0. A is not invertible. °f CTh7p^ti« and more will follow after we define the determinant. 3 by 3 Determinants 2 by 2 matrices are easy: ad - be. 4 by 4 matrices are hard. Better to use elimination (or a laptop). For 3 by 3 matrices, the determinant has 3! = 6 terms, and often you can compute it by hand. We will show how. Start with the identity matrix (det I = 1) and exchange two rows (then det = -1). Exchanging again brings back det = +1. You quickly have all six permutation matrices. Each row exchange will reverse the sign of the determinant (+1 to -1 or else —1 to +1): Notice! If I exchange two rows—say row 1 and row 2—each determinant changes sign. Permutations 1 and 2 exchange. 3 and 4 exchange. 5 and 6 exchange. This will cany over to al) determinants: Row exchange multiplies det A by -1.
5 I. 3 by 3 Determinants 179 When you multiply a row by a number ihi< ~ i . Suppose the throe rows arc a b e and „ , r^t ₽ dc,erTn,nanl b> thal number canaP4ruuixyz. Those шпе nufnbCT5 mu|upiy ±1. ' a Я z det = +aqz -bpz +ЬгЖ -cgx +cpp —ary Finally we use the most powerful property we have. The determinant of A is linear in each row separately. As equation (3) will show, we can add those six determinants. To remember the plus and minus signs, I follow the arrows in this picture of the matrix. Combine 6 simple determinants into det A + aqz 4- brx + cpy — ary — bpz — cqx (I) Hotice! Those six terms all have one entry from each row of the matrix. They also have one entry from each column of the matnx. There are 6 = 3 ! terms because there are six 3x3 permutation matrices. A 4 x 4 determinant will have 4! = 24 terms. This guides us to the big formula' for the determinant of an n by n matrix. That formula has n . terms ац а2* .. «ц,,. one for every n by n permutation. Each permutation matrix P picks out n numbers in A (one number from every row and column). Multiply those n numbers by det P= 1 or —1. Then add the results like On а-&—uijoji =ad—be. Det P is 1 for even permutations like 231 and —1 for odd permutations like 213. Those are reached from the identity matnx I by an even or an odd number of exchanges Each permutation P reorders the column numbers 1,2,...,n into some order j.k. z. The determinant is the sumof n ! simple determinants like (-1)а13а31ам = -bpz. det A = sum over all n! column permutations P = (j, k,.z) = £ (det P) aij а2к...апя= BIG FORMULA (2) So every term in the big formula picks out one number a,} from each row of A and at the same time one number from each column of A. Multiply those n numbers times 1 or —1. Permutations P and their plus-minus signs are the keys to determinants I Let me return for a minute to that powerful property: det A is linear in each row separately. We can split row 1 into (a,0,0) + (0,6,0) + (0,0,c). We can split row 2 and row 3 in the same way. This gives us a lot of pieces (33 = 27 different pieces). But only 6 of those pieces are important and 21 of them are zero automatically (a zero column). 3! = 6 ways to use every row and column once, 33 = 27 ways if columns could repeat det 0 q aqz is important det 0 0 = 0 automatically z 21 like this a 0 0 0 a 0 P o
180   Chapter 5. Determinants and Linear Transformations

Cofactors and a Formula for A^{-1}

I can explain the "cofactor formula" for det A. It starts from the big formula (1) with 6 terms and reduces from 3 by 3 to 2 by 2. Factor out a and b and c from row 1. Two of the six terms go with each factor:

Cofactor formula   det A = a(qz − ry) + b(rx − pz) + c(py − qx).   (3)

We have factored the entries a, b, c of row 1 from their "cofactors". Each cofactor is a 2 by 2 determinant, using the rows and columns that its factor does not use. The cofactor of a in the 1,1 position comes from rows 2, 3 and columns 2, 3: it is qz − ry. Notice that the cofactor of b in the 1,2 position is rx − pz and not pz − rx (the actual 2 by 2 determinant). There is a rule for plus and minus signs:

Cofactor C_ij of a_ij   C_ij = (−1)^{i+j} times the determinant of the remaining matrix (size n − 1) when row i and column j are removed.

The cofactor formula along row 1 is   det A = a11 C11 + a12 C12 + ··· + a1n C1n.   (4)

It collects all the terms in det A that are multiplied by a11, by a12, ..., by a1n.

A 2 by 2 matrix shows the idea. The cofactors of A = [a b; c d] go into C^T = [d −b; −c a]. Then A times C^T is det A times the identity matrix:

A C^T = [a b; c d] [d −b; −c a] = [ad − bc  0; 0  ad − bc] = (det A) I.   (5)

Dividing by det A, cofactors give our first and best formula for the inverse matrix:

Inverse matrix formula   A^{-1} = C^T / det A.   (6)

This formula shows why it is impossible to divide by a singular matrix: we would be dividing C^T by det A = 0. Every entry of A^{-1} is a ratio of two determinants (size n − 1 for the cofactor divided by size n for A). This example has determinant 1, so the inverse is exactly C^T (the cofactors):

Example of A^{-1}   A = [1 1 1; 0 1 1; 0 0 1]   has determinant 1 and   A^{-1} = C^T = [1 −1 0; 0 1 −1; 0 0 1].   (7)

Notice how C32 removes row 3 and column 2. That leaves a 2 by 2 matrix with determinant 1. Since (−1)^{3+2} = −1, this becomes −1 in C^T.

The diagonal entries of A C^T are always det A. That is exactly the cofactor formula. Problem 24 will show why the off-diagonal entries of A C^T are always zero. Those numbers turn out to be determinants of matrices with two equal rows. Automatically zero.

A typical cofactor C31 removes row 3 and column 1. In our 3 by 3 example, that leaves a 2 by 2 matrix of 1's, with determinant 0. This is the zero in the corner of A^{-1}.

If we change A to 2A, the determinant is multiplied by (2)(2)(2) = 8. All cofactors C are multiplied by (2)(2) = 4. Then A^{-1} = C^T/det A is divided by 2. Of course (2A)^{-1} = A^{-1}/2. Section 5.2 will solve Ax = b and Section 5.3 will find volumes from det A.
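A MATLAB sketch (my own check, not from the text) that builds every cofactor of the example matrix and confirms A C^T = (det A) I and A^{-1} = C^T/det A:

    A = [1 1 1; 0 1 1; 0 0 1];
    n = size(A,1);  C = zeros(n);
    for i = 1:n
      for j = 1:n
        M = A;  M(i,:) = [];  M(:,j) = [];     % remove row i and column j
        C(i,j) = (-1)^(i+j) * det(M);          % the i,j cofactor
      end
    end
    disp(A*C')                                 % (det A) times I, here just I since det A = 1
    disp(C'/det(A) - inv(A))                   % zero matrix: A^{-1} = C^T / det A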
5.1. 3 by 3 Determinants 181

The diagonal entries of A C^T are always det A. That is exactly the cofactor formula. Problem 24 will show why the off-diagonal entries of A C^T are always zero. Those numbers turn out to be determinants of matrices with two equal rows. Automatically zero.

A typical cofactor C31 removes row 3 and column 1. In our 3 by 3 example, that leaves a 2 by 2 matrix of 1's, with determinant = 0. This is the bold zero in A^-1.

If we change A to 2A, the determinant is multiplied by (2)(2)(2) = 8. All cofactors C are multiplied by (2)(2) = 4. Then A^-1 = C^T/det A is divided by 2. Of course. Section 5.2 will solve Ax = b and will find volumes from det A.

Problem Set 5.1

Questions 1-5 are about the rules for determinants.

1   If a 4 by 4 matrix has det A = 1/2, find det(2A) and det(-A) and det(A^2) and det(A^-1).

2   If a 3 by 3 matrix has det A = -1, find det(A/2) and det(-A) and det(A^2) and det(A^-1). What are those answers if det A = 0?

3   True or false, with a reason if true or a counterexample if false:
    (a) The determinant of I + A is 1 + det A.
    (b) The determinant of ABC is |A| |B| |C|.
    (c) The determinant of 4A is 4|A|.
    (d) The determinant of AB - BA is zero. Try an example with a 2 by 2 matrix A.

4   Which row exchanges show that these "reverse identity matrices" J3 and J4 have |J3| = -1 but |J4| = +1?

    det [0 0 1; 0 1 0; 1 0 0] = -1   but   det [0 0 0 1; 0 0 1 0; 0 1 0 0; 1 0 0 0] = +1

5   For n = 5, 6, 7, count the row exchanges to permute the reverse identity Jn to the identity matrix In. Propose a rule for every size n and predict whether J101 has determinant +1 or -1.

6   Find the six terms in equation (1) like +aqz (the main diagonal) and -cqx (the anti-diagonal). Combine those six terms into the determinants of A, B, C:

    A = [2 -1 0; -1 2 -1; 0 -1 2]     B = [1 2 3; 2 4 6; 3 8 12]     C = [1 2 3; 4 5 6; 7 8 9]
Chapter 5. Determinants and Linear Transf, 182 , 4 ’]«oge‘[P + a 7 + b r + cl ’ sho*'^11* in row row 1 rowl+r0*2 ro*3 8 9 10 11 12 13 14 15 16 ’ row 1 row 2 row 3 ’ row 1 row 1 +det row3 jT _ ,tr>t A because both of those 3 by 3 determinants come fris Do these matrices have determinant 0,1,2, or 3? [0 0 1 1 0 0 0 1 0 det A = = det 0 1 1 B — 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 — O+det row 1 ’ row 2 . row3 Г1 1 0 1 L1 i = 0 to prove det A D = 1 0 1 If the entries tn every row of A add to zero, solve Ax ~ If those entries add to one. show that det(.4 - I) - 0. Does this mean det A Why doesdet(PiA) = (det A) times (det A) for permutations ? If p ne- row exchanges and needs 3 row exchanges to reach I, why does P} p2 rc ? 2 from 2 + 3 exchanges ? Then their determinants will be (-1)2(—1)3 _ Explain why half of all 5 by 5 permutations are even (with det P = 1). Reduce .4 to U and find det A = product of the pivots: 1 2 3 A- 1 1 1 1 2 2 1 2 3 A = 2 2 3 3 3 3 By applying row operations to produce an upper triangular U, compute 0’ 0 -1 2 det 1 2 -1 0 2 6 0 2 3 6 0 0 0 1 3 ? and det 2 -1 0 0 2 -1 0 0 -1 2 -1 Use row operations lo simplify and compute these determinants: 101 201 301 det 102 202 302 103 203 303 0. 1? t2 t 1 1 t t 1 t2 t Rnd the determinants of a rank one matrix and a skew-symmetric matrix: ГП and det A = 3 and A = Г 0 1 -1 0 -3 -4 3 4 0
5 ! 3 by 3 Determinants 183 If the i,j entry of A is i times j, show that det A = 0. (Exception when A = [ 1 ]•) 1 If the», j entry of A is i + j, show that det A = 0. (Exception when n = 1 or 1в Use row operations to show that the 3 by 3 “Vandermonde determinant" is 1 a2 b2 c2 - (6 - o)(c - a)(c - b). a b c 19 Place the smallest number of zeros in a 4 by 4 matrix that will guarantee det A = 0. Place as many zeros as possible while still allowing det A / 0. 20 (a) If «и = «» =азз = 0. how many of the six terms in det A will be zero? (b) If an — «22 - <133 = 044=0, how many of the 24 products Oi .a2jb«3<«4m are sure to be zero? 2i If all the cofactors are zero, how do you know that A has no inverse? If none of the cofactors are zero, is A sure to be invertible? 22 The big formula has n! terms. But if an entry of A is zero, (n - 1)! terms disappear. If A has only three nonzero diagonals (in the center of A). bow many terms are left ? For n = 1,2,3,4 that tridiagonal determinant has 1,2,3,5 terms. Those are Fibonacci numbers in Section 6.2! Show why a tridiagonal 5 by 5 determinant has 5 + 3 = 8 nonzero terms (Fibonacci again). Use the cofactors of щ > and an- 23 Cofactor formula when two rows are equal. Write out the 6 terms in det A when a 3 by 3 matrix has row 1 = row 2 = a, b, c. The determinant should be zero. 24 Why is a matrix that has two equal rows always singular? Then det A = 0. If we combine the cofactors from one row with the numbers in another row. we will be computing det A' when A* has equal rows. Then det A* = 0—this is what produces the off-diagonal zeros in AC1" = (det A) I. 25 The Big Formula has 24 terms if A is 4 by 4. How many terms (a) include Ou? (b) include о 13 and aa? (c)are left if an= 022 = 033 = 044=0?
184 Chapter 5. Determinants and Linear Transformations

5.2 Properties and Applications of Determinants

1   Useful properties: det A^T = det A and det AB = (det A)(det B).
2   Cramer's Rule finds x = A^-1 b from ratios of determinants (a slow way).
3   The volume of the box (parallelogram in 2D) with edges e1 to en is |det E|.

The determinant of a square matrix is an amazing number. First of all, an invertible matrix has det A not equal to 0. A singular matrix has det A = 0. When we come to eigenvalues lambda and eigenvectors x with Ax = lambda x, we will write that eigenvalue equation as (A - lambda I)x = 0. This tells us that A - lambda I is singular and det(A - lambda I) = 0. We have an equation for lambda.

Overall, the formulas are most useful for small matrices, and the properties of determinants can make those formulas simpler. If the matrix is triangular or diagonal, we just multiply the diagonal entries to find the determinant:

Triangular matrix   Diagonal matrix     det [a b c; 0 q r; 0 0 z] = det [a 0 0; 0 q 0; 0 0 z] = aqz   (1)

If we transpose A, the same formula still takes one number from each row and column:

Transpose the matrix     det(A^T) = det(A)   (2)

If we multiply AB, we just multiply determinants (a wonderful fact):

Multiply two matrices     det(AB) = (det A)(det B)   (3)

A proof by algebra can get complicated. We will give a simple proof of (3) by geometry. When we add matrices, we do not just add determinants! (Try I + I.) Here are two good consequences of equations (2) and (3):

Orthogonal matrices Q have determinant 1 or -1
We know that Q^T Q = I. Then (det Q)^2 = (det Q^T)(det Q) = 1. Therefore det Q is +1 or -1.

Invertible matrices have det A = +- (product of the pivots)
If A = LU then det A = (det L)(det U) = det U. Triangular U: multiply the pivots. If PA = LU because of row exchanges, then det P = 1 or -1. Permutation matrix!

Multiplying the pivots U11 U22 ... Unn on the diagonal reveals the determinant of A. This is how determinants are computed by MATLAB and by all computer systems for linear algebra. The cost to find U in Chapter 2 was only n^3/3 multiplications. Notice: The "Big Formula" for det A would have a much larger cost. It is the sum of n! terms.
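A quick MATLAB sketch of that pivot computation (lu and prod are standard; the sign of the permutation has to be included, and the test matrix here is just my own choice):

    % det A = (det P) * product of the pivots, from PA = LU
    A = [2 1 1; 4 -6 0; -2 7 2];      % a 3 by 3 test matrix (row exchanges may occur)
    [L, U, P] = lu(A);                % PA = LU with a permutation matrix P
    pivots = diag(U);
    d = det(P) * prod(pivots);        % det P is +1 or -1, det L is 1
    disp([d, det(A)])                 % the two values agree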
5.2. Properties and Applications of Determinants 185

We know that exchanging two rows will reverse the sign of det A. And a matrix with two equal rows has det A = 0. Linearity in each row allows us to check the key operation in elimination: subtracting a multiple of one row from another row does not change the determinant:

Row operation     det [a b; c - la, d - lb] = det [a b; c d] - l det [a b; a b] = det [a b; c d]   (4)

This was "linearity in row 2 with row 1 fixed". It means (again) that our elimination steps from the original matrix A to an upper triangular U do not change the determinant:

det A = det U = U11 U22 ... Unn = product of the pivots   (5)

Cramer's Rule to Solve Ax = b

Start with the solution vector x = (x1, x2, x3) to Ax = b. Replace the first column of the identity matrix by x. When you multiply that matrix by A, the first column becomes Ax, which is b. The other columns of B1 are copied from A:

Key idea     A [x1 0 0; x2 1 0; x3 0 1] = [b1 a12 a13; b2 a22 a23; b3 a32 a33] = B1.   (6)

We multiplied a column at a time. Take determinants of the three matrices in (6) to find x1:

Product rule     (det A)(x1) = det B1     or     x1 = det B1 / det A.   (7)

This is the first component of x in Cramer's Rule. Changing a column of A gave B1. To find x2 and B2, put the vectors x and b into the second columns of I and A:

Same idea     A [1 x1 0; 0 x2 0; 0 x3 1] = [a11 b1 a13; a21 b2 a23; a31 b3 a33] = B2.   (8)

Take determinants to find (det A)(x2) = det B2. This gives x2 = (det B2)/(det A).

Example 1     Solving 3x1 + 4x2 = 2 and 5x1 + 6x2 = 4 needs three determinants:

Put 2 and 4 into each B     det A = det [3 4; 5 6]     det B1 = det [2 4; 4 6]     det B2 = det [3 2; 5 4]

The determinants of B1 and B2 are -4 and 2. Those are divided by det A = -2:

Find x = A^-1 b     x1 = -4/-2 = 2     x2 = 2/-2 = -1     Check  [3 4; 5 6] [2; -1] = [2; 4]
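Example 1 translates directly into MATLAB. A minimal sketch, replacing a column of A by b as in (6)-(7) and comparing with elimination (the backslash solver):

    % Cramer's Rule for 3x1 + 4x2 = 2, 5x1 + 6x2 = 4
    A = [3 4; 5 6];   b = [2; 4];
    B1 = A;  B1(:, 1) = b;            % replace column 1 of A by b
    B2 = A;  B2(:, 2) = b;            % replace column 2 of A by b
    x1 = det(B1) / det(A);            % -4 / -2 = 2
    x2 = det(B2) / det(A);            %  2 / -2 = -1
    disp([x1; x2])
    disp(A \ b)                       % elimination gives the same answer (and is faster)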
186 Chapter 5. Determinants and Linear Transformations

CRAMER'S RULE     If det A is not zero, Ax = b is solved by determinants:

x1 = det B1 / det A     x2 = det B2 / det A     ...     xn = det Bn / det A   (9)

The matrix Bj has the jth column of A replaced by the vector b.

To solve an n by n system, Cramer's Rule evaluates n + 1 determinants (A and the n different B's). When each one is the sum of n! terms--applying the "big formula" with all permutations--this makes a total of (n + 1)! terms. It would be crazy to solve equations that way. But we do finally have an explicit formula for the solution to Ax = b.

Example 2     Cramer's Rule is inefficient for numbers but it is well suited to letters. For n = 2, find the columns x and y of A^-1 by solving A A^-1 = A [x y] = I:

[a b; c d] [x1; x2] = [1; 0]     [a b; c d] [y1; y2] = [0; 1]

Those share the same matrix A. We need |A| and four determinants for x1, x2, y1, y2:

|A| = det [a b; c d]     det [1 b; 0 d]     det [a 1; c 0]     det [0 b; 1 d]     det [a 0; c 1]

The last four determinants are d, -c, -b, and a. (They are the cofactors!) Here is A^-1:

x1 = d/|A|,  x2 = -c/|A|,  y1 = -b/|A|,  y2 = a/|A|,   so   A^-1 = [d -b; -c a] / (ad - bc).

I chose 2 by 2 so that the main points could come through clearly. The key idea is:

A^-1 involves the cofactors. When the right side b is a column of the identity matrix I, as in A A^-1 = I, the determinant of each Bj in Cramer's Rule is a cofactor of A.

You can see those cofactors for n = 3. Solve Ax = (1, 0, 0) to find column 1 of A^-1:

Determinants of B's are cofactors of A
det B1 = det [1 a12 a13; 0 a22 a23; 0 a32 a33]     det B2 = det [a11 1 a13; a21 0 a23; a31 0 a33]     det B3 = det [a11 a12 1; a21 a22 0; a31 a32 0]

That determinant of B1 is the cofactor C11 = a22 a33 - a23 a32. Then det B2 is the cofactor C12. Notice that the correct minus sign appears in -(a21 a33 - a23 a31). This cofactor C12 goes into column 1 of A^-1. When we divide by det A we have computed the inverse.

FORMULA FOR A^-1     (A^-1)ij = Cji / det A     A^-1 = C^T / det A
5.2. Properties and Applications of Determinants 187

Areas and Volumes

We start in two dimensions, with a parallelogram--or with half of the parallelogram, which is a triangle. The problem is: Find the area of a triangle. We all learned that the area is (1/2)bh, half the base times the height. A parallelogram contains two triangles with equal area, so we omit the 1/2. Then

parallelogram area = bh = base times height   (10)

That formula is easy to remember, but we have a different problem, because all we know is the corners. For the triangle, those corner points are (0,0), (a,b), and (c,d). For the parallelogram (twice as large) the fourth corner will be (a+c, b+d).

To find the height, we could create a line from (c,d) that is perpendicular to the baseline. The length h of that line involves square roots. But ad - bc does not involve square roots, and it has a beautiful formula:

Area of parallelogram = Determinant of matrix = +- det [a b; c d] = |ad - bc|.   (11)

Our goal is to find that formula by linear algebra: no square roots or negative volumes. We also have a more difficult goal. We need to move into 3 dimensions and eventually into n dimensions. We start with four corners (0,0,0) and (a,b,c) and (p,q,r) and (x,y,z) of a box. (The box is not rectangular. Every side is a parallelogram. It will look lopsided.) If we use the area formula (11) as a guide, we could guess the correct volume formula:

Volume of box = Determinant of matrix = +- det [a b c; p q r; x y z]   (12)

Our first effort stays in a plane. For this case we use geometry. Figure 5.1 shows how adding pieces to a parallelogram can produce a rectangle. When we subtract the areas of those six pieces, we arrive at the correct parallelogram area ad - bc (no square roots). The picture is not very elegant, but in two dimensions it succeeds.
188 Chapter 5. Determinants and Linear Transformations

Area of parallelogram = (a+c)(b+d) - 2bc - ab - cd = ad - bc
Figure 5.1: Adding six simple pieces to a parallelogram produces a rectangle.

Would a similar construction be possible in three dimensions? Following Figure 5.1, I believe we could add simple pieces to make a tilted box into a rectangular box, but it doesn't look easy. And there is a much better way: use linear algebra.

Areas and Volumes by Linear Algebra

In working on this problem, I came to an understanding. If we do more algebra, then we need less geometry. Very often, linear algebra comes down to factoring a matrix. We will look there for ideas.

A box in n dimensions has n edges e1, e2, ..., en going out from the origin. The parallelogram in two dimensions had two vectors e1 = (a, b) and e2 = (c, d). Those vectors e give two corners, or n corners, of the "box". In the 2-dimensional picture, the fourth corner was e1 + e2. In the n-dimensional picture, the other corners of the box would be sums of the e's. The box is totally determined by the n edges in the matrix E:

Edge matrix     E2 = [a b; c d]     and     En = the n by n matrix whose rows are e1, ..., en

Our goal is to prove that the volume of the box is |det E|. We considered three possible factorizations of E to reach this goal. They are taken from Chapters 2 and 4 and 7. The third factorization is called the Singular Value Decomposition of E: the SVD.

Lower times upper triangular          E = LU
Orthogonal times upper triangular     E = QR
Orthogonal - Diagonal - Orthogonal    E = U Sigma V^T
5.2. Properties and Applications of Determinants 189

The problem is to connect the volume of the box to the determinant of E. Those factors of E are square matrices because E is square. The determinant of L is 1 (all ones on its diagonal). The determinant of any orthogonal matrix Q or U or V is 1 or -1. Then |det L| = |det Q| = |det U| = |det V| = 1. We will certainly depend on the multiplication formula for the determinant of a product, which now tells us that

|det E| = |det U| = |det R| = |det Sigma|   (13)

Multiplying by L or Q or U or V does not change the volume. Let me understand this first for E = Q times R.

Multiply by any matrix: straight lines stay straight. Multiply by an orthogonal Q: x^T x and x^T y are the same as (Qx)^T (Qx) and (Qx)^T (Qy). Then lengths and angles and box shapes and volumes are not changed by Q. This remains true for curved regions. We divide them into many small cubes plus thin curved pieces. The total volume of those curved pieces can approach zero. The volumes of the cubes are not changed by Q. The boxes for R and for E = QR have the same volume.

R is a triangular matrix! Its box has a volume we can compute. For a parallelogram in the xy plane, the base and height are exactly the diagonal entries of R:

R = [u v; 0 w]     base = u,  height = w     |area| = |u times w| = |det R|

The key point is: The main diagonal of R shows the height in each new dimension. When we multiply those numbers on the diagonal of R, we get the volume of the box and also the determinant of the triangular matrix R. The volume formula |det E| is now proved in all dimensions, because |det Q| = 1 and |det E| = |det R|.

Final comment: The Singular Value Decomposition E = U Sigma V^T has two orthogonal matrices U and V. The number |det E| is equal to |det Sigma|. And this matrix Sigma is diagonal. It gives a perfectly normal rectangular box in R^n.

This SVD approach by U Sigma V^T looks simpler than QR, which had a triangular matrix R producing a tilted box. But that tilted figure shows a clear geometric meaning for the diagonal entries of R: base and height. The geometry of the SVD will be seen in Chapter 7. It is beautifully clear for ellipses in n dimensions. But the singular values are not so clear for boxes. Sigma gives the lengths of the axes of an ellipse but not the sides of a rectangular box. For a box with straight sides, E = QR leads directly to volume = |det R|. The next section will allow any shape: not just boxes.
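Equation (13) and the diagonal-of-R argument can be checked numerically. A short MATLAB sketch with an arbitrary 3 by 3 edge matrix (qr is the standard factorization used above):

    % Volume of the box = |det E| = |det R| = |product of the diagonal of R|
    E = [1 2 0; 0 1 3; 2 0 1];        % edge matrix: its rows (or columns) are the edges
    [Q, R] = qr(E);                   % E = QR with Q orthogonal, R triangular
    volume1 = abs(det(E));
    volume2 = abs(prod(diag(R)));     % base times height times height ...
    disp([volume1, volume2])          % equal, since |det Q| = 1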
190 Problem Set 5.2 If<MA = 2.*hJ«arc Compute the determinants 1 2 о 1 1 1 1 1 О О 1 drt A'* and det A" and det AT ? of А. В, C, D. Are their columns independent? fl 2 3 B= 4 5 6 7 8 9 С = „««л •« "«»*“ rf««<ь« 3 0 0 x 0 0 x Whal are lhe cofactors of row 1 ? What is the rank of A? What are the 6 terms in det A? 4 (a) IfD. - «И(Л-). could Л oo even if all |XV| < 1 ? fl>) Could D. -»0 esen if all |Aj| > 1 ? Problems 5-9 art about Cramer’s Rule for x - A b. 5 Solve these linear equations by Cramer’s Rule Xj = det Д2/det A: (.1 2r|+Sx’”1 ,B) x, + 4xa = 2 2xi + x2 =1 (b) x, + 2xj + xj 0 xj + 2xj “ 0. 6 Use Cramer s Rule to solve for у (only). Call the 3 by 3 determinant D. (a) «x + by = I rx + dy = 0 ax + by + a - 1 (b) dr + ey + ft “ 0 gr + hy + it = 0. 7 Cramer’s Rule breaks down when det A - 0. Example (a) has no solution while (b) has infinitely many. What are lhe ratios x, = det B,/det A in these two cases? (P*alWlinc$) 0,1 S+te’Zi («««line) 8 Quick pmofof Cramer'i rule The determinant is a linear function of column I. ft is zero if two columns are equal. When b = Ax = X]Oi + x2a2 + x3a3 goes into the first column of A. lhe determinant of this matrix B\ is lb a3 a3| = |х|О| + xjOj + xjtij o2 a3| = xj |а2 a? a3| = z, det A. (a) What formula forxt comes from left side = right side? (b) Whal steps lead lo lhe middle equation? If the right side b is the first column of A. solve the 3 by 3 system Ax = b. How does each determmant in Cramer's Rule lead lo this solution x?
5-1. Ю 11 12 13 14 15 16 17 18 19 20 21 properties and Applications of Detemunanu 191 ГЫ .be _ { w _ (1#4). The c«nen ol.u,„,k„ft I|ed (a.4)-4(0 <« FigdtlKirel ™ ош„. И ta, «Ли,^ ta> b . , 1 1 1 -I 1 1 1 -1 What is |Я| 1 1 I 1 I I I 1 » volume of a hypercube in R‘? The sides have length 2 An n dimensional cube has how many comers"’ How many edges? How many (n - irnensonJ aces The cube in R* whose edges are the rows of 2/ has volume ------ • A hypercube computer has parallel processors at the comers with connections along the edges. The triangle with comers (0.0). (1.0), (0.1) has area 1. The pyramid in R1 with four comers (0,0,0), (1,0,0), (0,1.0), (0,0.1) has volume _______What is the vol- ume of a pyramid in R with five comers al (0,0,0,0) and the four columns of / ? Suppose E„ is the determinant of the tridiagonal 1,1,1 matnx of order n. By cofac- tors of row 1 show that En > En_, - Starting from E, - 1 and Ej - 0 find E3, By noticing how the Es repeat, find EIOo- Ез.Еч, Ft = IM Е,- £s- 1 1 1 1 1 1 1 1 0 1 From the cofactor formula AC1 = (det A)/ show that det C = (det A)—’. Suppose det A 1 and you know all the cofactors in C. How can you find A? If a 3 by 3 matrix has entries 1,2,3,4.9. what is the maximum determinant ? If the edge matrix E is orthogonal, the box has volume___. If the edge matrix E is singular, the box has volume___. If the volume in Rn is V. the box for 2E has volume____.. Draw parallelograms for ’jand^ ] j Can you see any reason for equal areas ? Transposing the edge matrix | J gives a matrix with the same determinant and a new parallelogram with the same area. Can you draw it and recompute its area ?
192 Chapter 5. Determinants and Linear Transformations

5.3 Linear Transformations

1   Linear transformations T obey the rule T(cv + dw) = cT(v) + dT(w).
2   Derivatives and integrals are linear transformations in function space.
3   Volumes of all shapes are multiplied by |det A| when every x goes to Ax.

Transformations T follow the same idea as functions. In goes a number x or a vector v, out comes f(x) or T(v). For one vector v or one number x, we apply the transformation T or we evaluate f(x). The deeper goal is to see all vectors v at once. We are transforming the whole space V.

Start again with a matrix A. It transforms v to Av. It transforms w to Aw. Then we know what happens to u = v + w. There is no doubt about Au: it has to equal Av + Aw. Matrix multiplication T(v) = Av is an example of a linear transformation.

A transformation T assigns an output T(v) to each input vector v in V. The transformation is linear if it meets these requirements for all v and w:

(a) T(v + w) = T(v) + T(w)     (b) T(cv) = cT(v) for all c.   (1)

Those rules tell us: If the input is v = 0, then the output must be T(0) = 0. No shift. T(0 + w) = T(0) + T(w). Removing T(w) from both sides leaves 0 = T(0). Combining the two rules tells us about linear combinations of v and w:

T(cv + dw) = T(cv) + T(dw) = cT(v) + dT(w)

Example 1     T1 rotates the whole xy plane by 90 degrees around the center point (0,0). This is a linear transformation! Straight lines will rotate into straight lines. A square will rotate into the same size square. The center point (0,0) does not move: T1(0) = 0. Requirement (1) for linear combinations cv + dw is satisfied.

The likable part of that example is: No matrix was needed. We can visualize linear geometry without linear algebra. If we have another linear transformation T2 of the xy plane, then T2 can follow T1 to produce T2T1: first find T1(v) and then apply T2.

Example 2     T2 reflects each vector (x, y) to its mirror image (x, -y) across the x axis. This is another linear transformation that doesn't need a matrix. Notice that T2T1 differs from T1T2: Reflecting the rotated vector is not the same as rotating the reflected vector. (1,0) rotates to (0,1) and reflects to (0,-1). But (1,0) reflects to (1,0) and rotates to (0,1).
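Example 4 (next page) will produce the matrix for a rotation; assuming that standard matrix with theta = 90 degrees, and the standard reflection matrix across the x axis, a short MATLAB check shows that T2T1 and T1T2 really differ:

    % Rotation by 90 degrees and reflection across the x axis do not commute
    T1 = [0 -1; 1 0];         % rotation by 90 degrees: (1,0) -> (0,1)
    T2 = [1 0; 0 -1];         % reflection across the x axis: (x,y) -> (x,-y)
    v = [1; 0];
    disp(T2 * (T1 * v))       % reflect the rotated vector: (0,-1)
    disp(T1 * (T2 * v))       % rotate the reflected vector: (0, 1)
    disp(T2*T1 - T1*T2)       % a nonzero matrix: the order matters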
5.3. Linear Transformations 193

Example 3     The length T(v) = ||v|| is not linear. Requirement (a) for linearity would be ||v + w|| = ||v|| + ||w||. Requirement (b) would be ||cv|| = c||v||. Both are false!

Not (a): The sides of a triangle satisfy an inequality ||v + w|| <= ||v|| + ||w||.
Not (b): The length ||-v|| is ||v|| and not -||v||. For negative c, linearity fails.

T (every vector) from T (basis vectors)

The rule of linearity extends to combinations of three vectors or n vectors:

Linearity     u = c1 v1 + c2 v2 + ... + cn vn   must transform to   T(u) = c1 T(v1) + c2 T(v2) + ... + cn T(vn)   (2)

The 2-vector rule starts the 3-vector proof: T(cu + dv + ew) = T(cu) + T(dv + ew). Then linearity applies to both of those parts, to give cT(u) + dT(v) + eT(w).

The n-vector rule (2) leads to the most important fact about linear transformations:

Suppose you know T(v) for all vectors v1, ..., vn in a basis. Then you know T(u) for every vector u in the space.

You see the reason: Every u in the space is a combination of the basis vectors vj. Then linearity tells us that T(u) must be the same combination of the outputs T(vj).

A key point about linear transformations: If we choose bases for the input space and the output space, then T can be specified by a matrix A. The rule for constructing A must use the two bases (which can be the same if output space = input space).

Step 1.   Apply the transformation T to each input basis vector vj.
Step 2.   Write the output T(vj) as a combination of the output basis vectors wi.
Step 3.   The coefficients Aij in that combination T(vj) = sum of Aij wi go into column j of A.

This matrix A finds the output T(v) for any input v. If the vector c = (c1, ..., cn) gives the input coefficients in v = c1 v1 + ... + cn vn, then b = Ac gives the output coefficients in T(v) = b1 w1 + ... + bm wm. T becomes multiplication by A.

Example 4     T is rotation by theta of the xy plane with basis vectors i and j.

T [1; 0] = [cos theta; sin theta]   and   T [0; 1] = [-sin theta; cos theta]   produce   A = [cos theta, -sin theta; sin theta, cos theta]
194 Chapter 5. Determinants and Linear Transformations

The Derivative is a Linear Transformation

It is linearity that allows us to find the derivative of u(x) = 6 - 4x + 3x^2. Start with the derivatives of 1 and x and x^2. Those functions are the basis vectors. Their derivatives are 0 and 1 and 2x. Then use linearity for the derivative of any combination like u = 6 - 4x + 3x^2:

du/dx = 6 (derivative of 1) - 4 (derivative of x) + 3 (derivative of x^2) = -4 + 6x.

All of calculus depends on linearity! Precalculus finds a few essential derivatives like x^n and sin x and cos x and e^x. Then linearity applies to all their combinations. I would say that the only rule special to calculus is the chain rule. That produces the derivative of a chain of functions f(g(x)) or f(g(h(x))). Needed in deep learning!

Nullspace of T(u) = du/dx     For the nullspace we solve T(u) = 0. The derivative is zero when u is a constant function. So the nullspace of d/dx is a line in function space--all multiples of the special solution u = 1. The derivative operator is not invertible.

Column space of T(u) = du/dx     In our example the input space contains all quadratics a + bx + cx^2. Their derivatives (the column space of T) are all linear functions b + 2cx. Notice that the Counting Theorem is still true:

dimension (column space) + dimension (nullspace) = 2 + 1 = 3 = dimension (input space)

What is the matrix for d/dx? I can't leave derivatives without asking for a matrix. T = d/dx is a linear transformation from the 3-dimensional input space V (= quadratics, basis 1, x, x^2) to the 2-dimensional output space W (= linear functions, basis 1, x). We know what T does to those basis functions: 1, x, x^2 go to 0, 1, 2x. If v1, v2, v3 were vectors, we would know the matrix A:

Matrix form of the derivative T = d/dx     A = [0 1 0; 0 0 2]     Input v = (6, -4, 3) for u = 6 - 4x + 3x^2     Multiplication Av = (-4, 6)     Output = -4 + 6x = du/dx   (3)
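The matrix in (3) can be applied in MATLAB exactly as described; a minimal sketch with the coefficients of u = 6 - 4x + 3x^2:

    % d/dx on quadratics a + b*x + c*x^2, input basis {1, x, x^2}, output basis {1, x}
    A = [0 1 0;
         0 0 2];            % derivative matrix: 1 -> 0, x -> 1, x^2 -> 2x
    v = [6; -4; 3];         % coefficients of u(x) = 6 - 4x + 3x^2
    dudx = A * v;           % coefficients of du/dx in the basis {1, x}
    disp(dudx)              % [-4; 6]  means  du/dx = -4 + 6x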
5.3. Linear Transformations 195

The Integral is a Linear Transformation

Next we look at integrals from 0 to x. They give the "pseudoinverse" of the derivative. The input space is now W (linear functions v = D + Ex) and the output space is V (quadratics). Integration from 0 to x is also linear, and it has a 3 by 2 matrix A^+:

Input v = D + Ex     Output = integral of v     T^+(v) = Dx + (1/2)E x^2     A^+ = [0 0; 1 0; 0 1/2]

A^+ inverts A where possible:

A A^+ = [1 0; 0 1]     but     A^+ A = [0 0 0; 0 1 0; 0 0 1]   (4)

The Fundamental Theorem of Calculus says that integration is the pseudoinverse of differentiation. For linear algebra, the matrix A^+ is the pseudoinverse of A.

The derivative of a constant function is zero. That zero is on the diagonal of A^+ A. Calculus wouldn't be calculus without that 1-dimensional nullspace of T = d/dx.

Example 6     Suppose A is an invertible matrix. Certainly T(v + w) = Av + Aw = T(v) + T(w). Another linear transformation is multiplication by A^-1. This produces the inverse transformation T^-1, which brings every vector T(v) back to v:

T^-1(T(v)) = v   matches the matrix multiplication   A^-1(Av) = v.

If T(v) = Av and S(u) = Bu, then T(S(u)) matches the product ABu.

We are reaching an unavoidable question. Are all linear transformations from V to W produced by matrices? When a linear T is described as a "rotation" or a "projection", is there always a matrix A hiding behind T? Is T(v) always Av? The answer is yes! This is an approach to linear algebra that doesn't start with matrices, but every linear T still ends up with a matrix, once bases are chosen.

A shift of the whole plane is not linear (it moves the zero vector). Still, computer graphics has found a way to use a matrix--but it is 3 by 3. Every point (x, y) is given the "homogeneous coordinates" (x, y, 1). Then you can shift a page by using matrices. A diagonal matrix rescales a page and an orthogonal matrix rotates a page.
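MATLAB's pinv recovers exactly that integration matrix, and the two products in (4) show the one-sided inverse; a quick sketch with the A and A^+ above:

    % Integration is the pseudoinverse of differentiation on these small spaces
    A     = [0 1 0; 0 0 2];          % derivative matrix (2 by 3)
    Aplus = [0 0; 1 0; 0 1/2];       % integration matrix from 0 to x (3 by 2)
    disp(pinv(A))                    % same as Aplus
    disp(A * Aplus)                  % 2 by 2 identity: derivative of the integral of v is v
    disp(Aplus * A)                  % not the identity: the constant was lost (the nullspace)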
196 Chapter 5. Determinants and Linear Transformations

The Geometry of Linear Transformations

Suppose T is a linear transformation of the xy plane. Figure 5.2 shows what happens: straight lines are transformed into straight lines, and equal spacing goes to equal spacing. The vectors v1 and v2 are a basis. If we know T(v1) and T(v2), then we know T(v) for all other points v = c1 v1 + c2 v2, because T is linear. A triangle with corners 0, v1, v2 transforms into a triangle with corners 0, T(v1), T(v2). The areas of the original and transformed triangles are connected by the determinant formula:

area of transformed triangle / area of original triangle = |det A|

What is the determinant of a linear transformation T of the xy plane? Use its matrix!

Figure 5.2: Lines to lines, equal spacing to equal spacing, u = 0 to T(u) = 0.

One key point is about areas, when a linear T transforms one part of a plane to another part in Figure 5.2. The area of the big triangle is multiplied by a number. The area of every small triangle is also multiplied by that number. More is true: Every circle and square and every shape whatsoever has its area multiplied by that same number. We just fill the shape with small squares, as closely as we want. Their areas are all multiplied the same way. Section 5.2 discovered that the area multiplier is the determinant of the matrix E. Then E takes squares into parallelograms and circles into ellipses.

Why Do Determinants Multiply?

This valuable property |det AB| = |det A| |det B| looks messy to prove. It is buried somehow in the big formula for the determinant. Here are two very different proofs, one from geometry using volumes, and one from logic using properties of the determinant.

Geometry     Start with a standard cube in n dimensions. Each edge has length 1. The n edges going out from (0, ..., 0) are the rows of the n by n identity matrix I. The volume of the cube is 1, which is the determinant of I.

Now multiply every point in the cube by the matrix B. This gives a box with sides from B. We know that volume of box = determinant of B. No problem so far, except for some risk that our logic might become circular.
s j. Linear Transformations 197 Now multiply every point in that B-boX bv th. Altogether we have multiplied the columns of hTJ*?? A ’П,'‘ giv*‘ * nc* Л/?Ьо* haS volume = |det AB |. •denuty matnx by AB. so the Л B-box But also we have multiplied the B-box bv A u-x. volume is multiplied by | det A |. Since the В L к . Wy *** “ mul,'Pl'cd by \‘te v >|ume = |dct A | |dct В |. •>*» *®'ume | del В |. the Л B-box has ( ) we give three properties of all determinants. 1. The determinant of A = I is j. 2. The determinant reverses sign if two row. of A are exchanged 3. The determinant is a linear function of each row separately From those wrtb 3 we can make any matnx tnangular and find its determinant. R“ в ? *. a multiX 7Га"ОП$ 'ke cl,m"ut,ofl R“le 2 allows us to exchange rows Ru)e 1 sets a multiplying factor to 1. for the volume of a unit cube The product rule comes fn>m cbeckmg tKa, R three properties. Then that ratio must equal det A. Thus det AB = (det A) (det B). Change of Bases Suppose the input space and output space are both R2. Suppose v2 is the input has.* Md w,, W, .. the output bmis. What is the matnx В (us.ng these bases) for the Entity transformation T(o) = v?No, always /! В is the “change of bas.s matnx”. WhenV=[vt Ua]andlVa[W| w3 ], the change of basis matrix is В = W~ ‘V. For any linear transformation, its matrix A (In the old bases) changes to W~lAV. I see a clear way to understand those rules Suppose the same vector u is written in the input basis of v’s and the output basis of w’s. ThenT = /. Its matnx gives d - etl Г 1141 U = C|V! +•• +c„v„ u = diWt + • •• + dnWn ,S »l ••• On Wi w„ and Vc = W d The coefficients d in lhe new basis of w'sare d = W~*Vc. Then В is W~lV. For a transformation T between spaces V and W. we insert the matrix for T to get W~l AV: Change of basis leads to W *4V not И’Л V. Larger vectors w have smaller coefficients d!
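Returning to the product rule det AB = (det A)(det B) just proved: a numerical spot-check takes only a few lines of MATLAB (one fixed pair of matrices and one random pair; any square matrices of equal size would do):

    % Determinants multiply
    A = [2 1; 1 2];   B = [0 1; -1 3];
    disp([det(A*B), det(A)*det(B)])            % 3 and 3
    A = rand(4);      B = rand(4);             % a random pair
    disp(det(A*B) - det(A)*det(B))             % zero up to roundoff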
198 Chapter 5. Determinants and Linear Transformations

Looking ahead:
Chapter 6     Eigenvectors of A (they are orthogonal when A is symmetric): in that basis the matrix becomes diagonal.
Chapter 7     Singular vectors u and v for every A: the orthogonal form A = U Sigma V^T.

Linear Transformations of a House

When a 2 by 2 matrix A transforms the plane, it is more interesting to watch a whole shape than a few single vectors Av. So we use a "house" with eleven corners. The house matrix H is 2 by 12: its columns are the corner points of the house, and the twelfth column repeats the first corner to close up the outline. A multiplies all the points at once: the matrix multiplication AH produces the new house. Straight lines stay straight, so each new house is drawn by connecting the transformed corners (the columns of AH) in order. The figures will show four houses AH for four different matrices A.

House matrix     H = [-6 -6 -7 0 7 6 6 -3 -3 3 3 -6; -7 2 1 8 1 2 -7 -7 -2 -2 -7 -7]
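Here is a minimal MATLAB sketch that draws the house H above and one transformed house AH. The particular A (a shear) is my own choice--any 2 by 2 matrix can be tried:

    % Draw the house and a transformed house A*H
    H = [-6 -6 -7  0  7  6  6 -3 -3  3  3 -6;
         -7  2  1  8  1  2 -7 -7 -2 -2 -7 -7];   % 2 by 12 house matrix
    A = [1 0.3; 0 1];                            % try any 2 by 2 matrix here (this one shears)
    AH = A * H;                                  % every corner is multiplied by A
    plot(H(1,:), H(2,:), 'b-', AH(1,:), AH(2,:), 'r-')
    axis equal                                   % straight lines stay straight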
199 jj. Linear Transformation! problem Set 5.3 1 2 3 4 5 б 7 8 9 10 A linear transformation must leave the »» — - , T(v + w) = T(tt) + T(w) by с»юомп^ T(0) = 0 Provc lh'4 fr°m alsofromT(cv) L ^(.J * Suppose a linear T transforms (1,1) to (2,2) (2fl) * (() fl) RwJ r(v). (.)v = (2,2) (b) (c) (d) v = (efc) Which of these transformations are not linear? The input i. v - (v,. v,): (a) T(v)-(«>,„) (b) T(V),(VliVl} (c) r(r)_(0.ri) (d)TO-(o,i) (e) nvj.v.-.. (f) T(e) = VlP2. If S and 7 are linear transformations, is T(S(»)) linear or quadratic ? (a) (Special case) If S(v) - v and 7(e) > ®. then T(S(®)) - v ot v11 (b) (Generalcase)S(®1+Vj) . ,+»,) - 7(v,) + 7(v2) combine into T(S(v> 4-V])) ш T(_____)= \ __ Sup’TSe^(w)x " ® «сеР“»ш 7(0,4,) . (0,0). Show that this transformation satisfies 7(cv) = cT(v) but does not satisfy T(v + w) - 7(t>) + 7(w). True or False: If we know 7(«) for n different nonzero vectors in R". then we know 7(v) for every vector v in R". Which of these transformations satisfy T(v + w) - T(v) + T(w) and which satisfy T(ct>) •= cT(w) ? (a) T(w) - t>/|M| (b) T(v) . v, +vj+vj (c) 7(„) . (₽1,2vjt (d) T(w) = largest component of ®. How can you tell from the picture of 7 (house) that A is (a) a diagonal matrix ? The house expands or contracts along each axis. (b) a rank-one matrix ? (c) a lower triangular matrix ? Draw a picture of 7 (house) for these matrices: D=[o i] «d л=[1 л] tf-[J i] • What are the conditions on A = [" $] to ensure that 7 (house) will (a) sit straight up? (b) expand the house by 3 in all directions? (c) rotate the house with no change in its shape?
Chapter 5. Determinants and Linear Transfo^ 200 11 12 13 nuter sketch the bouses .4 . H for these matrices Д: Without a computer sxc< [i.:] - '] “* [-’ J “ N- .... .. „„ ()K 4 = ad - be ensure that the output house AH will What conditions on <iet t (a) be squashed onto a line. <b) keep its endpoints in clockwise order (not reflected)? (c) base the same area as the original house? This code creates a vector theta of 50 angles It draws the unit circle and it draws T (circle) = ellipse. The multiplication Av takes circles to ellip^ A-[21;12] % Мэи can change A theta • [О* * I*502 • PH: * 50 an0*es arete (cosfthete); s«(theU)); % 50 points ellipse-A.drete;* arete to ethpse axis(H 4 -4 4[). aaisCsquare’) ptot(oecte(1. ). drete(2.:). еде!.:). ««PH2.:)) 14 Suppose the spaces V and W have lhe same basis B|,va. (a) Describe a transformation T (not /) that is its own inverse. (b) Describe a transformation T (not /) that equals Г3. (c) Why can’t the same T be used for both (a) and (b)? Questions 15-18 are about changing the basis. 15 (a) What matrix transforms (1,0) into (2,5) and transforms (0,1) to (1,3)7 (b) What matrix C transforms (2,5) lo (1,0) and (1,3) to (0,1)? (c) Why does no matrix transform (2,6) to (1,0) and (1,3) to (0,1)? 16 (a) What matrix .If transforms (1,0) and (0,1) to (r, t) and (a, u)? (b) What matnx .V transforms (a,c) and (b,d) lo (1,0) and (0,1)? (c) What condition on c, b, e, d will make part (b) impossible? 17 (a) How do .If and N in Problem 16 yield lhe matrix thal transforms (a, c) to (r t) and (6. d) to (a,«)? 7 (b) What matrix transforms (2,5) to (1,1) and (1,3) to (0,2)? 16 If you keep the same basis vectors but put them in a different order, lhe change of .. "Г? ir *------------------niatnx. If you keep the basis vectors in order but change their lengths. В is a___matrix. 19 Why is integration not the inverse of differentiation ?
6 Eigenvalues and Eigenvectors

6.1 Introduction to Eigenvalues
6.2 Diagonalizing a Matrix
6.3 Symmetric Positive Definite Matrices
6.4 Systems of Differential Equations

Eigenvalues lambda and eigenvectors x obey the equation Ax = lambda x. Here A is a square matrix and lambda is a number. The vector Ax is in the same direction as x--that is unusual. If we find n of these eigenvectors, we can turn an n-dimensional problem for A into n simple one-dimensional problems, one for each eigenvector. Here is an example: solve u_{k+1} = A u_k, starting from u_0. Each eigenvector is simply multiplied at every step by its own eigenvalue:

Input  u_0 = c1 x1 + ... + cn xn     Output  u_k = A^k u_0 = c1 (lambda_1)^k x1 + ... + cn (lambda_n)^k xn

The differential equation du/dt = Au is solved at time t in exactly the same way. The numbers lambda^k change to e^(lambda t).

In matrix language, the matrix X of eigenvectors turns A into X^-1 A X = Lambda. This is the diagonal matrix of eigenvalues. A diagonal matrix means that the system is uncoupled into n easy equations like du/dt = lambda u, solved by exponentials e^(lambda t).

Sections 6.1 and 6.2 present the key ideas of eigenvalues. Then Section 6.3 goes from general matrices A to symmetric matrices S. The eigenvectors of S are orthogonal. Their matrix X becomes Q with Q^T Q = I. And with positive eigenvalues lambda > 0, we have the best matrices in pure and applied mathematics: symmetric and positive definite. Positive definiteness can be tested five ways--by positive pivots and determinants and eigenvalues and energy (and by S = A^T A). The central ideas of this book are coming together for those best matrices.

201
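The input-output display above can be checked in a few lines of MATLAB. The matrix and starting vector here are arbitrary choices; eig supplies the eigenvectors X and eigenvalues Lambda:

    % u_k = A^k u0 = c1*lambda1^k*x1 + ... + cn*lambdan^k*xn
    A  = [.8 .3; .2 .7];
    u0 = [1; 0];
    [X, Lam] = eig(A);            % columns of X are eigenvectors, Lam is diagonal
    c  = X \ u0;                  % expansion coefficients: u0 = X*c
    k  = 10;
    uk_direct = A^k * u0;
    uk_eigen  = X * Lam^k * c;    % each eigenvector multiplied by its own lambda^k
    disp([uk_direct, uk_eigen])   % the two columns agree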
202 Chapter 6. Eigenvalues and Eigenvectors

6.1 Introduction to Eigenvalues

1   An eigenvector x lies along the same line as Ax: Ax = lambda x. The eigenvalue is lambda.
2   If Ax = lambda x then A^2 x = lambda^2 x and A^-1 x = lambda^-1 x and (A + cI)x = (lambda + c)x: the same x.
3   If Ax = lambda x then (A - lambda I)x = 0 and A - lambda I is singular: det(A - lambda I) = 0. n eigenvalues.
4   Check lambda's by det A = (lambda_1)(lambda_2)...(lambda_n) and by diagonal sum a11 + a22 + ... + ann = sum of lambda's.
5   Projections have lambda = 1 and 0. Reflections have 1 and -1. Rotations have e^(i theta) and e^(-i theta): complex!

This chapter enters a new part of linear algebra. The first part was about Ax = b: linear equations for a steady state. Now the second part is about change. Time enters the picture--continuous time in a differential equation du/dt = Au, or time steps in a difference equation u_{k+1} = A u_k. Those equations are NOT solved by elimination.

The key idea is to find solutions u(t) that stay in the direction of a fixed vector x. We want "eigenvectors" that don't change direction when you multiply by A. The eigenvector-eigenvalue equation is Ax = lambda x. We look for n eigenvectors x and their eigenvalues lambda. Then A^2 also has those eigenvectors: A^2 x = A(lambda x) = lambda(Ax) = lambda^2 x.

A good model comes from the powers A, A^2, A^3, ... of a matrix. Suppose you need the hundredth power A^100. Its columns are very close to the eigenvector x1 = (.6, .4):

A, A^2, A^3 = [.8 .3; .2 .7]   [.70 .45; .30 .55]   [.650 .525; .350 .475]     A^100 is close to [.6000 .6000; .4000 .4000]

A^100 was found by using the eigenvalues of A, not by multiplying 100 matrices. Again: each eigenvector x has an eigenvalue lambda with Ax = lambda x. Then A^100 x = lambda^100 x.

To explain eigenvalues, we first explain eigenvectors. Almost all vectors change direction when they are multiplied by A. Certain exceptional vectors x are in the same direction as Ax. Those are the eigenvectors. Multiply an eigenvector by A, and the vector Ax is a number lambda times the original x.

The basic equation is Ax = lambda x. This leads to (A - lambda I)x = 0.

The eigenvalue lambda tells whether the eigenvector x is stretched or shrunk or reversed or left unchanged--when it is multiplied by A. We may find lambda = 2 or 1/2 or -1 or 1. If lambda = 0 then Ax = 0x means that this eigenvector x is in the nullspace of A.

If A is the identity matrix, every vector has Ax = x. All vectors are eigenvectors of I. All eigenvalues "lambda" are lambda = 1. This is unusual to say the least. Most 2 by 2 matrices have two eigenvector directions and two eigenvalues: Ax1 = lambda_1 x1 and Ax2 = lambda_2 x2.

This section will explain how to compute the x's and lambda's. It can come early in the course because we only need the determinant of a 2 by 2 matrix A - lambda I.
6.1. Introduction to Eigenvalues 203

Example 1     A = [.8 .3; .2 .7]     det(A - lambda I) = det [.8-lambda .3; .2 .7-lambda] = lambda^2 - (3/2)lambda + 1/2 = (lambda - 1)(lambda - 1/2).

I factored the quadratic into lambda - 1 times lambda - 1/2, to see the two eigenvalues lambda = 1 and lambda = 1/2. For those numbers, the matrix A - lambda I becomes singular (zero determinant). The eigenvectors x1 and x2 are in the nullspaces of A - I and A - (1/2)I:

(A - 1I)x1 = [-.2 .3; .2 -.3] x1 = [0; 0]   and the first eigenvector is x1 = (.6, .4)
(A - (1/2)I)x2 = [.3 .3; .2 .2] x2 = [0; 0]   and the second eigenvector is x2 = (1, -1)

If x1 is multiplied again by A, we still get x1. Every power of A will give A^n x1 = x1. Multiplying x2 by A gave (1/2)x2, and if we multiply again we get (1/2)^2 times x2.

When A is squared, the eigenvectors stay the same. The eigenvalues are squared.

This pattern keeps going, because the eigenvectors stay in their own directions (Figure 6.1). They never get mixed. The eigenvectors of A^100 are the same x1 and x2. The eigenvalues of A^100 are 1^100 = 1 and (1/2)^100 = a very small number.

Figure 6.1: The eigenvectors of A are also eigenvectors of A^2: eigenvalue = lambda^2.

Other vectors like (.8, .2) do change direction. But all other vectors are combinations of the two eigenvectors x1 = (.6, .4) and x2 = (1, -1). The first column of A is x1 + (.2)x2:

Separate other vectors into eigenvectors     [.8; .2] = x1 + (.2)x2 = [.6; .4] + [.2; -.2]   (1)
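A quick MATLAB check of this example (eig may scale and reorder the eigenvectors, so compare directions rather than exact columns):

    % Eigenvalues 1 and 1/2 of A, and the approach of A^100 to the steady state
    A = [.8 .3; .2 .7];
    [X, Lam] = eig(A);
    disp(diag(Lam)')              % 0.5 and 1 (possibly in either order)
    disp(X)                       % columns are multiples of (1,-1) and (.6,.4)
    disp(A^100)                   % every column is very close to (.6, .4)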
Chapter 6. Eigenvalues and bigCnVcc tar. 204 When we multiply Xi + Multiply Xi •» by A = 1 and Л = Fw molo^io. Ь, A » b by (J). Tbe »m eiseoree.», b b, l.Tben99«4»F«<D"sma"<’> lppem“Л'”: This is the first column of A’00- The number we originally wrote as .6000 was not exact. We left out (.2) (J)" whkh wouldn’t show up for 30 decimal places. The eigenvector x( is a "steady state” that doesn’t change (because A, = 1). The eigenvector x2 is "decaying” and virtually d.sappears (because A, - |). The higher lhe power of A. the more closely its columns approach the steady state. This particular A is a Markor matnx: Columns add to I. Its largest eigenvalue is A 1. Its eigenvector X| (.6, .4) is the Heady state—which all columns of A* will approach, A giant Markov matrix is the key to Google’s search algorithm—which is truly fat. Other matnces have other eigenvalues. Projection matrices have A s 1 for vectors in the column space and A = 0 for vectors in thc nullspace (projected to the zero vector). Then Pxt - *i and Px2 - 0. We have P3 = P because 1’ = I and 02 - 0. Example 2 The projection matrix /’ • $ j has eigenvalues A « 1 and А ж 0. Its eigenvectors are x> “ (1,1) and zj « (1,-1). For those vectors. Px} equals «i (steady state) and Pz2 • 0 (nullspace). Then P2x1 “ Xi and P2x2 » 0 and P2 = P. Our examples illustrate Markov matrices and singular matrices and symmetric matrices: Those matrices have special A’s and special eigenvectors: I. Markov matrix: Each column adds to 1. This makes A = 1 an eigenvalue. 2. Singular matrix: Some vector has Ax = 0. Then A = 0 is an eigenvalue. 3. Symmetric matrix The eigenvectors (1,1) and (1, -1) are perpendicular. The only eigenvalues of a projection matrix are 0 and 1. The nullspace is projected to zero. The column space projects onto itself. The projection keeps the column space and destroys thc nullspace. so A = I and A = 0: [11 Г21 fol -1 + 2 projects onto Pv = q + The next matrix R is a reflection matrix and also a permutation matrix.
I. Introduction to Eigenvalues 205 еитриз ТЪ.г.Пееи.пт.ж.я, [?.| ta>«i|wmle,I1„t -1. The eigenvector (1,1) is unchanged by Л The are reversed by R. A matnx with no negatived e'8“vectw “ ~1 Ils «£ ° „„vm-tnr» for n .i 8a,,ve entries can suit have a negative eigenvalue! The eigenvectors for R are the same as for P, because reflection = 2(pr»jection) - I. Eigenvalues of R = 2P - I A =2(1)- bl md Аж 2(0) _ J _ -1 (2) whe„ a matrix is shifted by I, each A is shifted by 1. No change tn it* eigenvectors. The Equation for the Eigenvalues For projection matrices we found A’s and xi by geometry: Px - x and Px - 0. For other matrices we use determinants and linear algebra This is the key calculation in the chapter—almost every application Mans by solving Ax = Xx First move Ax to the left side Write the equafon Ax = Ax a* (A - A/)x - 0. rA< eigenvector, «« tn the nullspace of A - AZ When we know an eigenvalue A. we find x by solving (A - AZ)x a 0. Eigenvalues first. If (Л - AZ)x ж 0 has a nonzero solution. A - Af is not invertible. The determinant of A — XI must be zero. This is bow to recognize an eigenvalue A: Eigenvalues A к an eigenvalue of A о A - AZ is singular Equation for A det (A - AZ) = 0 (3) This '‘characteristic polynomiaP det(A - AZ) involves only A. not x. When A is n by n, equation (3) contains A". So A has n eigenvalues (repeats possible!) Each A leads to x: For each eigenvalue A solve (A - AZ)x = 0 or Ax = Ax to find an eigenvector x. Example 4 A = * * j is already singular (zero determinant). Find its A's and x's. When A is singular, A = 0 is one of the eigenvalues The equation Ax = Ox has solutions. They are the eigenvectors for A = 0. Solving det (A - AZ) » 0 is the way to find all A’s and x’s. Always subtract AZ from A: 1 - A 2 Subtract A from the diagonal of A to find A - XI = ? 4 - A ‘ Take the determinant uad — be” of this 2 by 2 matrix. From 1 - A times 4 - A, the “ad" part is A2 - 5A + 4. The “be" part without A is 2 times 2. Subtract: 4*a]»(1-AX4-A)-(2X2) = A’-5A. (5)
206 Chapter 6. Eigenvalues and E,8cnv«*t<n 1» - 5A to -его One solution is Aj -°- This was expected this ^.nont X fX д dmes A - 5. the other eigenvalue is д2 Л' A is singular. Factoring л . •* - 5; toM-AI)-A’-»-» A, = 0 >nd A> = S Now find the eigenvectors. Sohr (A - AZ)z = 0 separately for At =0 and A — e 2 5. M _ 0/)x = |J Jj Щ = fJj Vld(fa „ eigenvector j = 2j forA2 = 5 The matrices A - 01 and .4 - 51 are singular (because A - 0 and 5 are eigenvalues). The eigenvectors (2.-1) and (1,2) are in the nullspaces: (A - XI)x - 0 is Ax = We need to emphasize: There is nothing exceptional about A = 0. Like every other number, zero might be an eigenvalue and it might not If A is singular, the eigenvectors for A = 0 fill the nullspace: Ax — Ox - 0 If A is invertible, zero is not an eigenvalue. We shift A by a multiple of I to make it angular. In Example 4 the shifted matrix A - 51 is singular. Then 5 is the other eigenvalue of A. Summary To find the eigenvalues of an n by n matrix, follow these steps: best if n s 2 1. Compute the determinant of A - AZ. With A subtracted along the diagonal. this determinant starts with A" or -A". It is a polynomial in A of degree n. 2. Find the roots of this polynomial, by solving det (A - AZ) = 0. The n roots are the n eigenvalues of A. Those n numbers make A — AZ singular. 3, For each eigenvalue A: Solve (A - AZ)z = 0 to find an eigenvector: Ax = Az. 4. The eigenvalues of a triangular matrix are the numbers on its diagonal. A note on the eigenvectors of 2 by 2 matrices. When A - AZ is singular, both rows are multiples of a vector (a, b). The eigenvector direction is (6, —a). Tbe example had A = 0: rows of A - 0Z in the direction (1,2); eigenvector in the direction (2, -1) A = 5 : rows of A - 5Z in the direction (-4.2); eigenvector in the direction (2,4) Previously we wrote that last eigenvector as (1,2). Both (1,2) and (2,4) are correct There is a whole line of eigenvectors—any nonzero multiple of z is as good as x. MATLAB's dg(A) divides by the length of z. to make the eigenvector into a unit vector.
5.1. Introduction lo Eigenvalues 207 Determinant and Trace Bad news first: If you add a row of A to amwK- eigenvalues usually change. Elimination does T U has its eigenvalues sitting along the diaeonSl? «** А к The upper triangular eigenvalues of U but not of A ! Eigenvalues are Г u pnXXs b r. _i gtnvatues are changed when row 1 is added lo row 2: U has A = Oand A = 1; д ^XXX^s^^ and ,he ™ A«+* - «•*** h ' m The Slim nf ’ P10**0* “ 0 times 4. That agrees with the determinant (Wh' J?’, “ 0 + 4 ТЫ agrees with the sum down the main has A = 0 and A = 4. The product of the n eigenvalues equals the determinant. The sum of the n eigenvalues equals the sum of the n diagonal entries. The sum of the entries of A along the main diagonal is called the truce of A: 6 (6) Those checks are very useful. They are proved in PSct 6.1 and again in Section 6.2. They don’t remove the pain of computing A’s. But when the computation is wrong, they generally tell us so. To compute the correct A’s, go back to det (A - Af) = 0. The trace and determinant do tell everything when the matrix is 2 by 2. We never want to get those wrong! Here trace = 3 and det = 2. so these matrices have A = 1 and A = 2: (7) And here is a question about the easiest matrices for finding eigenvalues: triangular A. Why do the eigenvalues of a triangular matrix A lie along its diagonal ? Imaginary' Eigenvalues One more possibility, not too terrible. The eigenvalues might not be real numbers. Example 5 The 90° rotation Q = [? ~o] ^as 1,0 rea^ eigenvectors. Its eigenvalues are Aj = t and Aj = —i. Then Ai + Aj = trace = 0 and AjAj = determinant = 1. After a rotation, no real vector Qx stays in the same direction as x (x — 0 is useless). There cannot be an eigenvector, unless we go to imaginary numbers. Which we do. To see how i = y/^T can help, look at a rotation Q through 90°. Then Q2 is rotation through 180°. Its eigenvalues are —1 and —1 because —Jx = —lx. Squaring Q will square each A. so we must have A2 = -1. The eigenvalues of the 90° rotation matrix Q are +i and —i, because i2 = —1.
Chapter 6. Eigenvalues and Eii 208 Those A’s come as usual from det(Q - AZ) = 0. This equation is A2 Its roots are i and -i. We meet the imaginary number i also in the eigenvectors cpu. rc *UJ ”* l* №‘l!]- eigenvectors L , = (1 i) and x2 = (*. 1) keeP ±е1г dlrccti<>n as they Somehow the complex vectors ,mportant poinl that real matrices are rotated. Don't ask me bow- • The particular eigenvalues i and •“ »»™ «. p. p malnces Q ta„ 1|(W| _ )M 1. The absolute value of each = X is pure imaginary. 2. This Q is a skew-symmetric matnx (У ,ct _ Cl can be compared to a real number: A is real. A symmetric matnx (5 - h) can * to. qmntok man. (Лт = -Л) is Mu ” "“P"”’ "umb": A “ « «togtol man. (9’0 - П ‘ ”“тЬ" W "1 , cneeial matrices S and A and Q are perpendicular. Somehow Eigenvalues of AB and A+B The first guess about the eigenvalues of AB is not true. An eigenvalue A of A times an eigenvalue 3 of В usually does not give an eigenvalue of AB: False proof ABx = Apx = pAx = pXx. (8) When x is an eigenvector for A and B. this equation is correct. The mistake is to expect that A anti В automatically share the same eigenvector x. Usually they don't. Eigenvectors of A are not generally eigenvectors of B. These singular matrices A and В have all zero eigenvalues while 1 is an eigenvalue of AB and A + В: and В then AB and A + В = qJ • The eigenvalues of A + В are generally not X + P. Here A + p = 0 while A + В has eigenvalues 1 and -1: trace = 0 and determinant = -1. The false proof suggests what is true. Suppose x really is an eigenvector for A and B. Then we do have ABx = XPx and В Ax = XPx. When all n eigenvectors are shared by A and B, we гол multiply eigenvalues. The test AB = BA for shared eigenvectors is important in quantum mechanics—time out to mention this application of linear algebra.
6 I. Introducuon to Eigenvalues 209 |7^d В share the same n independeni'^^ if and only if AB = BA | Heisenberg’s uncertainty principle In quantum mechanics, the position matrix P and the momentum matrix Q do not commute. In fact QP - PQ = / (these are infinite matrices). To have Px - 0 at the same time as Qx = о would require x = lx = 0. But if we knew the posibon exactly, we could not also know the momentum exactly. Heisenberg’s uncertainty principle JPxfi ||Qx|| > l|xp b ln Problcm WORKED EXAMPLES 6.1 A Find the eigenvalues and eigenvectors of Л and A2 and A~1 and A + 4Z: and A2 Check that the trace Ai + Aj = 2 + 2 = 4 and the determinant is AjAj =4-1=3. We don’t need to compute A2 to find its eigenvalues A2. Solution The eigenvalues of A come from det(X - AZ) = 0: det( A - AZ) = 2-Л -1 1 I = A2 - 4A+ 3 = 0. 2 This factors into (A — 1)(A — 3) = 0 so the eigenvalues of A are Ai = 1 and Aj = 3. For the trace, the sum 2 + 2 agrees with 1+3. The determinant 3 agrees with the product Ai Aj. The eigenvectors come separately by solving (A — AZ)x = 0 which is Ax = Ax: A = 1: (A-Z)x gives the eigenvector Xj = 1 1 л=з: _;][;]-[;] gives the eigenvector Xj A2 and A-1 keep the same eigenvectors as A Their eigenvalues are A2 and A 1. A2 has eigenvalues I2 = 1 and 32 = 9 A 1 has — and - A + 4Zhas ^-^4 = 7 Notes for later sections: A has orthogonal eigenvectors (Section 63 on symmetric matrices). A can be diagonalized since Aj / Aj (Section 6.2). A is similar to any 2 by 2 matrix with eigenvalues 1 and 3 (Section 6.2). A is л positive definite matrix (Section 6.3).
210 Chapter 6. Eigenvalues and Eigenvector te the eigenvalues of any A ? Gershgorin gave this answer. 6.1 В How can you es onc of the entries^. on the main diagonal. Every eigenvalue of Л must be W » (han the sum Щ of all other |% j I«.,l» “““ °f • “""w ««... in that row. of the ma diagonal entries а« |o« - A| < R, Every A b m the circle = ° of x is x2. Then Fro# Suppose (- _ A| < |aal | |z2| + |aM| |Ia| . , - Ain + “»ls ~ ° 8 ajtxi + («и si p and A is inside the second Gershgonn circle. Dividing by |xj| leaves |o» — - "2 Example I. Every eigenvalue A of this A falls into one or both of the Gershgonn circl The centers аге a and d. the radii are Я> = |6f and R2 = |c|. И Га 6 1 First circle: |A - a| < |6| “ [ c d | Second circle: |A - d| < |c| Those are circles in the complex plane, since A could certainly be complex. Example 2. All eigenvalues of this A lie in a circle of radius R = 3 around one or mo of the diagonal entries d\,d2, d3: |A-di|<l+2 = fll |A - <Ы < 2 + 1 = R2 |A - d,| < 1 + 2 = R3 You see that “near" means not more than 3 away from dt or d2 or d3, for this example. 6.1 C Find the eigenvalues 0,1,3 and eigenvectors of this symmetric 3 by 3 matrix S: Symmetric matrix Singular matrix 1 -1 0 -1 2 -1 0‘ -1 1 All eigenvalues are in the Gershgonn circle |A - 2| < 1 + 1. Solution Since all rows of 5 add to zero, the vector x = (1,1,1) gives Sx = 0. This is an eigenvector for A = 0. To find Aj and Аз I will compute the 3 by 3 determinant: 1-A det(5-A/)= -1 0 Those three factors give A = Pl *1 = 1 1 Sx)= 0z[ -1 2-A -1 = (1 - A)(2 — A)(l — A) — 2(1 - A) = (1 — A)[(2 — A)(l — A) — 2] 0 -1 1-A| = (1 — А)(—Л)(3 — A). — 0,1,3. Each eigenvalue corresponds to an eigenvector: ’ 1 0 -1 Sx2 = lx2 x3 = 1‘ -2 1 Sx3 =Зхз. u J ’ ei8envecl<*s « perpendicular when S is symmetric. We were lucky to тъГа|'| ’ 1WOU*<1 use eig(A). and never touch determinants. u comma [ •£]=eig(A) will produce unit eigenvectors in the columns of X. S =
$.1. introduction to Eigenvalues 211 problem Set 6.1 1 The example ai the start of the chapter has powers of this matrix A: 4'"] - *-[« 5] - -=[:< :?] Find the eigenvalues of these matrices. All powers have lhe same eigenvectors. (a) Show from A how a row exchange can produce different eigenvalues. (b) Why is a zero eigenvalue not changed by the steps of elimination? 2 Find the eigenvalues and the eigenvectors of these two matrices: A=[j 3] and Л + /=[2 j], A +1 has the eigenvectors as A. Its eigenvalues are____by 1. 3 Compute the eigenvalues and eigenvectors of A and A-1. Check the trace ! i] A"1 has the eigenvectors as A. When A has eigenvalues A j and >2, its inverse has eigenvalues. 4 Compute the eigenvalues and eigenvectors of A and A2: - = o] “ *•[-» 4 A2 has the same____as A. When A has eigenvalues A! and Aj. A2 has eigenvalues . In this example, why is A2 + A| = 13? 5 Find the eigenvalues of A and В (easy for triangular matrices) and A + В: A=[J j] and B«[j J] and A + B=[{ ’] • Eigenvalues of A + В are / art not equal to eigenvalues of A + eigenvalues of B. 6 Find the eigenvalues of A and В and AB and BA: (a) Are the eigenvalues of AB equal to eigenvalues of A times eigenvalues of B? (b) Are the eigenvalues of AB equal to the eigenvalues of BA?
212 Chapter 6. Eigenvalues and Ei8««*aors - aneurin, nroduces -4 = UJ The eigenvalues of U are on its diagona|. 7 X the P The eigenvalues of £ are on its diagonal; they are all _ ’ eigenvalues of -4 are not tbe same as ------ 8 (a) If you know that z is an eigenvector, lhe way lo find A is to---. (b) If you know that A is an eigenvalue, the way to find X is to---- 9 What do you do to tbe equation Ax = Az. in order to prove (a), (b), and (с)? (a) A* is an eigenvalue of A2. as in Problem 4. (b) A*1 is an eigenvalue of A as in Problem 3. (c) A + 1 is an eigenvalue of A + I. as in Problem 2. 10 find the eigenvalues and eigenvectors for both of these Markov matrices A and A °° Explain from those answers why A100 is close to Ax: 11 Here is a strange fact about 2 by 2 matrices with eigenvalues Aj / A2; The columns of A - Aj I an multiples of the eigenvector x2. Any idea why this should be? 12 Find three eigenvectors for this matrix P (projection matrices have A= 1 and 0) Projection matrix f.2 .4 0] .4 .8 0 0 0 1] If two eigenvectors share lhe same A, so do all their linear combinations. Find an eigenvector of P with no zero components. 13 From the unit vector « = (1,1,3,5)/6 construct the rank one projection matrix ' = «« . This matrix has P2 = P because uTu = 1. (a> sfrh U to™s fmm <uuT)u - u(uTu) - u. Then и is an eigenvector with eigenvalue A = 1. In that case find />100u. (b) If v is perpendicular to и show that Pv ~ 0. Then A = 0. <« ejjel)vaJueA = “ - ад . w «.ы, и д . ± ,.л л Q _ — sin в] [ sin 8 casfil ’be ту plane by the angle 9. No real A’s. **fcb,= и,,,..!
Introduction to Eigenvalues 213 15 Every permutation matrix leaves x = (1,1....j, unrflan^ Then A = 1. Find two more A s (possibly complex) for these permuuuons. from <fct(P - AZ) = 0: _ Г0 1 °] ГО 0 1' P ~ I ° ° 1 I and P = о 1 0 . L1 0 0j 10 0 16 The determinant of A equals the product А» Aj • • - A,. Start with the polynomial det(A - AZ) separated into its n factors (always possible). Then set A = 0: det(A — AZ) = (At — A)(A2 — A) • • • (A, — A) so det A = __- Check this rule in Example I where the Markov matrix has A = 1 and |. 17 If A has Ai = 4 and A2 = 5 then det(A - AZ) = (A - 4)(A - 5) = A2 - 9A + 20. Find three matrices that have trace a + d = 9 and determinant 20 and A = 4,5. 18 A 3 by 3 matrix В is known to have eigenvalues 0.1,2- This information is enough to find three of these (give the answers where possible): (a) the rank of В (b) the determinant of BTB (c) the eigenvalues of BTB (d) the eigenvalues of (В2 + Z)_|. 19 Choose the last rows of A and C to give eigenvalues 4,7 and 1.2.3: Companion matrices 20 The eigenvalues of A equal the eigenvalues of AT. This is because det(A - AZ) equals det(AT - AZ). That is true because ____________. Show by an example that the eigenvectors of A and AT are not the same. 21 Find three 2 by 2 matrices that have Aj = Aj = 0. The trace is zero and the determinant is zero. A might not be the zero matrix but check that A =0. 22 This matrix is singular with rank one. Find three A’s and three eigenvectors 1 2 2 4 1 2 All eigenvalues are in the Gershgorin circle |A — 2| < 8. «tnnnncr. 4 and В have the same eigenvalues A,.A„ with the same independent Suppose A and В have tne same eigenvectors Xj,. . ..Xn- Then A " CjX, + • • • + c„x„. What is Axl What is Bx.
214 Chapter 6. Eigenvalues and Eigenve^ 24 Find the rank and the four eigenvalues of A and C: 25 (Review) Find thc eigenvalues of А, B, and C: 1 2 3 0 4 5 0 0 6 0 0 I* 0 2 0 3 0 0 2 2' 2 2 2 2 26 Suppose A has eigenvalues 0,3,5 with independent eigenvectors u. v, w (a) Give a basis for the nullspace and a basis for the column space. (b) Find a particular solution z lo Az v + w. Find al) solutions. (c) Az . u has no solution. If it did then____would be in lhe column space. Challenge Problems 27 28 28 30 eigenvalues of A. Check that A| + A, agrees with Die trace ut ti| + u2V3 = UT„ ’* ^m<,? Wh"‘ numhcni can •* ,hc "Г P? What fimr numbtn can he e.genvalue. of P, as in Problem 15? ».’££F rrt^“ «' <-* »• A Hz - zrHAz S 2 IIЛз>|| ||//X||. Then I* *• •mf"’»»iMc to get Hr positam emir *T*1 "”** 1|Яж|,/|,а'11 '* lc"'' J* I* ion emu and momentum error txMh very small. A ’ ' 1?*y can hT **/’’‘el*env"luw mu*, “,irfy . Whal are the trace and determinant ?
6.2 Diagonalizing a Matrix

1 The columns of AX = XΛ are Ax_k = λ_k x_k. The eigenvalue matrix Λ is diagonal.
2 n independent eigenvectors in X diagonalize A:  A = XΛX⁻¹  and  Λ = X⁻¹AX.
3 Solve u_{k+1} = Au_k by u_k = A^k u_0 = XΛ^k X⁻¹ u_0 = c₁(λ₁)^k x₁ + ··· + c_n(λ_n)^k x_n.
4 No equal eigenvalues: n independent eigenvectors in X. Then A can be diagonalized.
  Equal eigenvalues: A might have too few independent eigenvectors. Then X⁻¹ fails.
5 Every matrix C = B⁻¹AB has the same eigenvalues as A. These C's are "similar" to A.

When x is an eigenvector, multiplication by A is just multiplication by a number λ: Ax = λx. All the difficulties of matrices are swept away. Instead of an interconnected system, we can follow the eigenvectors separately. It is like having a diagonal matrix, with no off-diagonal interconnections. The 100th power of a diagonal matrix is easy.

The point of this section is very direct. A turns into a diagonal matrix Λ when we use the eigenvectors properly. Diagonalizing A is the matrix form of our key idea.

Suppose the n by n matrix A has n linearly independent eigenvectors x₁, ..., x_n. Put them into the columns of an eigenvector matrix X. We will prove AX = XΛ. Therefore X⁻¹AX is the eigenvalue matrix Λ:

Eigenvector matrix X
Eigenvalue matrix Λ        X⁻¹AX = Λ = the diagonal matrix with λ₁, ..., λ_n on its diagonal.   (1)

The matrix A is "diagonalized." We use a capital lambda for the eigenvalue matrix, because the small λ's (the eigenvalues of A) are on its diagonal.

Example 1  This A is triangular, so its eigenvalues 1 and 6 are on the main diagonal:

Eigenvectors (1,0) and (1,1) go into X
X⁻¹ A X = Λ :   [1 -1; 0 1] [1 5; 0 6] [1 1; 0 1] = [1 0; 0 6]

In other words A = XΛX⁻¹. Then A² = XΛX⁻¹XΛX⁻¹. So A² is XΛ²X⁻¹. A² has the same eigenvectors in X and its squared eigenvalues are in Λ².
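Example 1 is easy to check numerically. Here is a short MATLAB sketch (mine, not the book's): eig returns an eigenvector matrix X and a diagonal eigenvalue matrix Λ, and the two products below recover Λ and A.

% Check Example 1: A = [1 5; 0 6] has eigenvalues 1 and 6
A = [1 5; 0 6];
[X, Lambda] = eig(A);        % columns of X are (unit) eigenvectors, Lambda is diagonal
disp(diag(Lambda)')          % the eigenvalues 1 and 6
disp(X \ A * X)              % X^{-1} A X recovers Lambda
disp(X * Lambda / X)         % X Lambda X^{-1} recovers A

MATLAB scales each eigenvector to length 1, so its X differs from the X above by column scaling, but X⁻¹AX is the same Λ.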
216 Chapter 6. Eigenvalues and Ei, „.. . . Y _ у A ’ Л multiplies its eigenvectors, which are the columns of X Th ™ь>,«,.ЕкЬ.ЫиЯ.»ГХЬт«И|ЧЫ|,1иЫ|!я^ The key idea is to split this matrix AX into A’ times Л. Keep those matrices in lhe right order! Then Aj multiplies the first column xj, as shown The diagonalization is complete, and we can write AX = XK in two good ways: AA'=A'A is X"’AX=A or A = АЛХ» (2) The matrix X has an inverse, because its columns (the eigenvectors of A) were assumed to be linearly independent Without n independent eigenvectors, we can't diagonalize A. A and Л have the same eigenvalues Aj,..., An. Their eigenvectors are different. The job of the original eigenvectors Xj,..., xn was to diagonalize A. Those eigenvectors in X produce A = XAX~l. You will soon see their simplicity and importance and meaning The fcth power will be A* = XA*X-1 which is easy to compute: A* = (XAX^XXAX-1)... (AAA"1) = XAkX~* Example 1 A = 1 and 6 P 5Г = P И P 1P -i] _ Г1 [0 6] “ [0 1] [ 6*J [0 1J “ [o With к = 1 we get A With к = 0 we get A0 = / (and A0 = 1). With к = -1 we get A"1 with eigenvalues 1 and j. You see bow A2 = [1 35; 0 36] fits that formula when k = 2. Here are four small remarks before we use A again in Example 2. Remark 1 Suppose the eigenvalues A,,..., are all different Then it is automatic that the eigenvector x, .... xB are independent The eigenvector matrix X will be invertible, -very matnx that has no repealed eigenvalues can be diagonalized. In “"™е,оп b’ an’ constants. A(cx) = A(cx) is MATLAB 1 ** d,vide 1 ~ (1.1) by s/2 to produce a unit eigenvector. MATLAB and vmually all other codes produce eigenvectors of length ||x|| = 1.
6.2- Diagonalizing a Matrix 217 Л come “ «me order as the eigenvalues in A. To reverse the order tn Л. put the eigenvector (1,1) W<Me eigenvcctof (1(0) in X: New order 6,1 To diagonalize A we must use an eigenvector matrix. From X'AX = Л we know that AX - XA. Suppose the first column of X is x. Then the first columns of AX and XA arc Ax and Ац. For those to be equal, x must be an eigenvector. Remark 4 (repeated warning for repealed eigenvalues) Some matrices have too few eigenvectors. Those matrices cannot be diagonalized. Here are two examples: A and В are not diagonalizable They have only one eigenvector ““ B=[o o]- Their eigenvalues happen to be 0 and 0. Nothing is special about A * 0. thc problem is lhe repetition of A. All eigenvectors of the first matrix are multiples of (1,1): Only one line of eigenvectors Ax = Ox means [! -IH -Й and x = c There is no second eigenvector, so this unusual matnx A cannot be diagonalized. Those matrices are the best examples to test any statement about eigenvectors In many true-false questions, non-diagonalizable matrices lead to false. Remember that there is no connection between invertibility and diagonalizability: - Invertibility is concerned with tbe eigenvalues (A = 0 or A / 0). - Diagonalizability is concerned with the eigenvectors (too few or enough for X). Each eigenvalue has at least one eigenvector! A - Af is singular. If (Л - AI)x = 0 leads you to x = 0. A is not an eigenvalue. Look for a mistake in solving det (A - Af) = 0. Eigenvectors for n different A’s are independent. Then we can diagonalize A. Independent x from different A Eigenvectors xj,...,x7 that correspond to distinct (all different) eigenvalues are linearly independent An n by n matrix that has n different eigenvalues (no repeated A’s) must be diagonalizable. Proof Suppose cjxi+caxj =0. Multiply by A tofindciArXi + caAjXj = 0. Multiply by A2 to find CtAeXt + cjAjXa = 0. Now subtract ooe from the other to show c, = 0. Subtraction leaves (At - AjjciXj = 0. Then с» = 0 because A> # Aj. Similarly ca = 0. Only the combination of x’s with d = ca = 0 gives ctx, + Cjx2 = 0. So the eigenvectors x । and x2 must be independent.
This proof extends directly to 3 eigenvectors. Suppose that c₁x₁ + c₂x₂ + c₃x₃ = 0. Multiply by A − λ₃I and x₃ is gone. Multiply by A − λ₂I and x₂ is also gone:

(λ₁ − λ₂)(λ₁ − λ₃) c₁x₁ = 0   which forces c₁ = 0.   (3)

Similarly every cᵢ = 0. When the λ's are all different, the eigenvectors are independent. A full set of n eigenvectors can go into the columns of the eigenvector matrix X.

Example 2  Powers of A   The Markov matrix A = [.8 .3; .2 .7] has λ₁ = 1 and λ₂ = .5. Here is A = XΛX⁻¹:

Markov example   [.8 .3; .2 .7] = [.6 1; .4 -1] [1 0; 0 .5] [1 1; .4 -.6]

The eigenvectors (.6, .4) and (1, -1) are in the columns of X. They are also the eigenvectors of A². Watch how A² has the same X, and the eigenvalue matrix of A² is Λ²:

Same X for A²   A² = XΛX⁻¹XΛX⁻¹ = XΛ²X⁻¹.

Just keep going, and you see why the high powers A^k approach a steady state:

A^k = XΛ^kX⁻¹ = [.6 1; .4 -1] [1^k 0; 0 (.5)^k] [1 1; .4 -.6]   (4)

As k gets larger, (.5)^k gets smaller. In the limit it disappears completely. That limit is A^∞:

Limit k → ∞   A^∞ = [.6 1; .4 -1] [1 0; 0 0] [1 1; .4 -.6] = [.6 .6; .4 .4].

The limit has the eigenvector x₁ in both columns. We saw this A^∞ on the very first page of Chapter 6. Now we see it coming from powers like A^100 = XΛ^100X⁻¹.

Question  When does A^k → zero matrix?   Answer  All |λ| < 1.

Similar Matrices: Same Eigenvalues

Suppose the eigenvalue matrix Λ is fixed. As we change the eigenvector matrix X, we get a whole family of different matrices A = XΛX⁻¹, all with the same eigenvalues in Λ. All those matrices A (with the same eigenvalues in Λ) are called similar.

This idea extends to matrices C that can't be diagonalized. Again we look at the whole family of matrices A = BCB⁻¹, allowing all invertible matrices B. Again all those matrices A and C are similar.

We are using C instead of Λ because C might not be diagonal. We are using B instead of X because the columns of B might not be eigenvectors. We only require that B is invertible.

Similar matrices C and BCB⁻¹ have the same eigenvalues. All the matrices A = BCB⁻¹ are "similar" to each other and to C.
ьг pjagonalizing a Matnx 219 Supp0se Cx = Ax. Then BCB' also has the eigenvalue A The new eigenvector is Bx: Same A (BCB-*)(Bx) = BCx = BAx = A(Bx). (5) д fixed matrix C produces a family of similar matrices BCB*1, allowing all B. n £ is the identity matrix, the “family” is very small. The only member is BIB ' = I- The identity matrix is the only diagonalizable matrix with all eigenvalues A = 1. The family is larger when A = 1 and 1 with only one eigenvector (not diagonalizable). The simplest C in this family is called the Jordan form. Every matrix A in the family has determinant = 1 and trace = 2 and this special form with A = I excluded: J J = Jordan form for every A = BCB~1 = 1 For an important example I will take eigenvalues A = 1 and 0 (not repeated!). Now the whole family is diagonalizable with the same eigenvalue matrix A. We get every 2 by 2 matrix A that has eigenvalues 1 and 0. The trace of A is 1 and the determinant is zero: All . Г 1 0 Г 1 1 1 Г Л .5 1 . xyr similar Л“[о о] Л ’ [ 0 0 ] ” 4 ~ [ S .5 ] ' 7V The family contains all matrices with A2 = A. including A = A. When A is symmetric these are projection matrices P2 = P. Eigenvalues 1 and 0 make life easy. Fibonacci Numbers We present a famous example, where every new Fibonacci number is the sum of the two previous F’s. Then eigenvalues of A tell how fast the Fibonacci numbers grow. The sequence 0,1,1,2,3,5,8,13.... comes from = F*+i + F^. Problem: Find the Fibonacci number F100 The slow way is to apply the rule F*+2 = Fk+i + F* one step at a time. By adding Fe = 8 to F? = 13 we reach Fe = 21. Eventually we come to Fioo- Linear algebra gives a better way. The key is to begin with a matrix equation 11*4.1 — Au*. That is a one-step rule for vectors, while Fibonacci gave a two-step rule for scalars. We match those rules by putting two Fibonacci numbers into a vector u. Then you will see lhe matrix A.
Every step multiplies u_k = (F_{k+1}, F_k) by A = [1 1; 1 0]. After 100 steps we reach u₁₀₀ = A¹⁰⁰u₀:

u₀ = [1; 0],  u₁ = [1; 1],  u₂ = [2; 1],  u₃ = [3; 2],  ...,  u₁₀₀ = [F₁₀₁; F₁₀₀].

This problem is perfect for eigenvalues. Take the determinant of A − λI:

A − λI = [1−λ 1; 1 −λ]   leads to   det(A − λI) = λ² − λ − 1.

The equation λ² − λ − 1 = 0 is solved by the quadratic formula (1 ± √5)/2:

Eigenvalues   λ₁ = (1 + √5)/2 ≈ 1.618   and   λ₂ = (1 − √5)/2 ≈ −0.618.

These eigenvalues lead to eigenvectors x₁ = (λ₁, 1) and x₂ = (λ₂, 1). Step 2 finds the combination of those eigenvectors that gives the starting vector u₀ = (1, 0):

u₀ = [1; 0] = (x₁ − x₂)/(λ₁ − λ₂)   (9)

Step 3 multiplies u₀ by A¹⁰⁰ to find u₁₀₀. The eigenvectors x₁ and x₂ stay separate! They are multiplied by (λ₁)¹⁰⁰ and (λ₂)¹⁰⁰:

u₁₀₀ = ((λ₁)¹⁰⁰ x₁ − (λ₂)¹⁰⁰ x₂)/(λ₁ − λ₂)

We want F₁₀₀ = second component of u₁₀₀. The second components of x₁ and x₂ are 1. The difference between λ₁ = (1 + √5)/2 and λ₂ = (1 − √5)/2 is √5. And (λ₂)¹⁰⁰ ≈ 0.

100th Fibonacci number = ((λ₁)¹⁰⁰ − (λ₂)¹⁰⁰)/(λ₁ − λ₂) = nearest integer to (λ₁)¹⁰⁰/√5.   (10)

Every F_k is a whole number. The ratio F₁₀₁/F₁₀₀ must be very close to the limiting ratio (1 + √5)/2. The Greeks called this number the "golden mean". For some reason a rectangle with sides 1.618 and 1 looks especially graceful.

Matrix Powers A^k

Fibonacci's example is a typical difference equation u_{k+1} = Au_k. Each step multiplies by A. The solution is u_k = A^k u₀. We want to make clear how diagonalizing the matrix gives a quick way to compute A^k and find u_k in three steps.

The eigenvector matrix X produces A = XΛX⁻¹. This is a factorization of the matrix, like A = LU or A = QR. The new factorization is perfectly suited to computing powers, because every time X⁻¹ multiplies X we get I:

Powers of A   A^k u₀ = (XΛX⁻¹) ⋯ (XΛX⁻¹) u₀ = XΛ^kX⁻¹ u₀

I will split XΛ^kX⁻¹u₀ into three steps. Equation (11) will show how the eigenvalues work.
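Before carrying out those three steps in general, here is a quick numerical check of the Fibonacci formula (10), sketched in MATLAB (the code is mine, not the book's); the three general steps continue right after this sketch. One caution: F₁₀₀ has 21 digits, so double precision only matches it to about 16 significant figures. The point is that the eigenvalue formula and the step-by-step rule agree.

% Fibonacci two ways: eigenvalue formula (10) vs. repeated multiplication by A = [1 1; 1 0]
lambda1 = (1 + sqrt(5))/2;                 %  1.618...
lambda2 = (1 - sqrt(5))/2;                 % -0.618...
F100_eig = (lambda1^100 - lambda2^100) / sqrt(5);     % formula (10), about 3.54e20

A = [1 1; 1 0];
u = [1; 0];                                % u_0 = (F_1, F_0)
for k = 1:100
    u = A*u;                               % u_k = (F_{k+1}, F_k)
end
F100_iter = u(2);                          % second component of u_100 is F_100

fprintf('formula (10): %.4e   iteration: %.4e\n', F100_eig, F100_iter)
fprintf('relative difference: %.1e\n', abs(F100_eig - F100_iter)/F100_iter)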
6 2. Diagonalizing a Matrix 221 wn» « — + Then с - X -«. 2. Multiply each eigenvector x, by (A,)* Now we have Д* 3. Add up the pieces c,(A, )*x, lo find the solution щ = A'uo. This is XA*X-‘uo Solution for = Aut ut = ^t«o=c1(AI)‘x,+... + Ce(Ae)‘xn. (II) In matrix language Ak equals (ХЛХ *)* which is X times A* times X~l 2. In Step 1. the eigenvectors in X lead to the c’s in the combination uo = С|х, + — + c„x„: Step 1 This says that uo = Ac. (12) The coefficients in Step 1 аге c = X'1^. Then Step 2 multiplies each c* by A*. The final result u* = 52с,(А,)кх, in Step 3 is the product of X and A* and c = X-1uo: A*uo = XA*X-‘uo=XA*c= xt RAi)‘ . (В) (A„)fcJ |< This result is exactly u* = Ci(Ai)fcXi +••• + c„(A„)kxn It solves ut+l = Aut. Nondiagonalizable Matrices (Optional) Suppose Л is an eigenvalue of A. We discover thal fact in two ways: 1. Eigenvectors (geometric) There are nonzero solutions lo Ax = Ax. 2. Eigenvalues (algebraic) The determinant of A - AI is zero. The number A may be a simple eigenvalue or a multiple eigenvalue, and we want to know its multiplicity. Most eigenvalues have multiplicity Af = 1 (simple eigenvalues). Then there is a single line of eigenvectors, and det( A — Af) does not have a double factor. For exceptional matrices, an eigenvalue can be repeated Then there are two different ways to count its multiplicity. Always GM < AM for each A: 1. (Geometric Multiplicity = GM) Count the independent eigenvectors for A. Then GM is the dimension of the nullspace of A - Af. 2. (Algebraic Multiplicity = AM) AM counts the repetitions of A among the eigenvalues of A. Look at the n roots of det, A — Af) — 0.
CWer6.ElgenvJ1luesandEigcnvBc^ 222 If A has A = 4,4.4. then that eigenvalue has AM = 3 and GM = 1 or 2 or 3 The following num* A is an example of trouble. Its eigenvalue A = n It is a double eigenvalue (AM = 2) with only one independent eigenvector '* rcPeated. = (1,0). AM = 2 A=[? 11 has <fet(A — Af) — | 0 _д| A- 1 eigenvea^ GM =1 1° when GM is below AM means that A is not diagonal^ This shortage of eigenvectors WORKED EXAMPLES 6.2 A Find the inverse and the e.genvalues and tbe determinant of this matrix A. A = 5 • eye(4) - ooes(4) 4 -1 -1 -1 -1 4 -1 -1 -1 -1 4 -1 -1 -1 -1 4 Describe an eigenvector matrix X that gives X AX-A. Solution What are the eigenvalues of the all-ones matrix ? Its rank is certainly 1 so three eigenvalues are A = 0,0,0. Its trace is 4, so the other eigenvalue is A = 4 Subtract this all-ones matrix from 5/ to get our matrix A: Subtract the eigenvalues 4.0,0,0 from 5,5,5,5. The eigenvalues of A are 1,5,5,5. The determinant of A is 125, the product of those four eigenvalues. The eigenvector for A = 1 is x = (1,1,1,1) or (c,c,c,c). The other eigenvectors are perpendicular to x (since A is symmetric). The nicest eigenvector matrix X is the symmetric orthogonal Hadamard matrix H The factor 5 produces unit column vectors = eigenvectors of A. Г 1 1 1 11 Tbe eigenvalues of A*1 are 1, |, |. The eigenvectors are not changed so A-1 = The inverse matrix is surprisingly neat: A-1 = (/ + all ones)/5. A is a rank-one change from 51. So A-1 is a rank-one change from I/5. In a graph with 5 nodes, the determinant 125 counts the “spanning trees”. Those trees have no loops and they touch all 5 nodes.In a graph with 5 nodes, lhe determinant 125 counts the “spanning trees". Those trees have no loops and they touch all 5 nodes. W5lh 6 nodes, the matrix 6 • eye(5) - ones(5) has the five eigenvalues 1,6,6,6,6.
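A short MATLAB check of Worked Example 6.2 A (a sketch; the command 5*eye(4) - ones(4) is the one given above):

% Worked Example 6.2 A: eigenvalues of A = 5*eye(4) - ones(4) are 1, 5, 5, 5
A = 5*eye(4) - ones(4);
disp(sort(eig(A))')                        % 1  5  5  5
disp(det(A))                               % 125 = product of the eigenvalues
disp(norm(inv(A) - (eye(4) + ones(4))/5))  % A^{-1} = (I + all-ones)/5, difference ~ 0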
& 2. Diag°nali,jn*>a MaUu problem Set 6.2 Questions 1-7 are about the eigenvalue and eigenvector matrices Л and X. 1 (a) Factor these two matrices into A = XXX (b)IfA = XAX 'thenA3 = ( X X )MdA-‘=( X )( )• 2 If A has Ai = 2 with eigenvector Zj = [J] and A2 = 5 with x2 = [}]. useXAX to find A. No other matrix has the same A’s and z’s. 3 Suppose A = XAX’-1. What is the eigenvalue matrix for A + 21? What is the eigenvector matrix? Check that A + 2/ = ( )( )( )-*. 4 True or false: If the columns of X (eigenvectors of A) are linearly independent, then (a) A is invertible (b) A is diagonalizable (с) X is invertible (d) X is diagonalizable. 5 If the eigenvectors of A are the columns of I. then A is a_matrix. If the eigen- vector matrix X is triangular, then X~l is triangular. Prove that A is also triangular. 6 Describe all matrices X that diagonalize this matrix A (find all eigenvectors): A—2} • Why does this X also diagonize A-1 = | 7 Diagonalize the Fibonacci matrix by completing X-1: 1 11 [Ai Ajl [Ai 01 [ 1 0] ” I 1 1J L ° a2J L Do the multiplication XA*X-1 [ J] to find its second component. This is the ith Fibonacci number F* = (A* — A|)/(Ai — Aj). 8 Suppose Gt+2 is the average of lhe two previous numbers Gk+i and G*: Gk+a = |Ga+i + fGk+2l = [ A 1 Gk+i=Gk+l I HG‘ (a) Find the eigenvalues and eigenvectors of A. (b) Find the limit as n -> oc of the matrices A" = XA"X . (c) If Go = 0 and Gi = 1 show that the Gibonacci numbers approach |.
224 Chapter 6. Eigenvalues and j. , eenvtet,^ Prove that every third Fibonacci number in 0,1,1,2,3.... is even. Write down the most general matrix that has eigenvectors [ I ] and [ J j 10 Questions I l-M are 11 12 13 True or false: If the etgem^trs of Лапе 2.2.5then tbe matrix is ccnain)y (a) invertible (b) diagonal.zabic (O no. diagonalizable. True or false: If the only eigenvectors of Л arc multiples of (1,4) then A has (a) no inverse (b) a repeated eigenvalue (c) no diagonalization XAX-1 Complete these matnces so that drt Л = 25- Then check that A = 5 is repeated^ the trace is 10 so the determinant of Л - Af •* (A •>) . Hind an eigenvector with 4x = 5». These matrices will not be diagonalizable because there is no second line of eigenvectors. and Л and 4 Tbe matrix Л = [’ J] is not diagonalizable because thc rank of A - 3/ is_______ Change one entry to make Л diagonalizable. U Inch entries could you change? Questions 15-19 are about powers of matrices. Л* = XA*X'1 approaches the zero matrix as к -» oo if and only if every A has absolute ' aloe less than___. Which of these matrices has Л* -+ 0? .6 .91 •6 | ’ 14 15 and .1 9 Л = 16 (Recommended) Find A and A' to diagonalize Л1 in Problem 15. What is the limit of A* as к -+ oo? What is the limit of XA*X-1? In tbe columns of this limiting matrix you see the______. 17 Find A and X to diagonalize Л2 in Problem 15. What is (Лг)10ио for these Uo? and u0 = 6' 0 ' 18 Diagonalize Л and compute XA*X-1 to prove this formula for Л*: has Л‘=’Г1 + 3‘ 1-3* 2 11-3* 1 +3* '
6.2- Diagonalizing “ Mattn 225 19 20 21 22 23 24 25 26 27 28 29 Diagonalize В and compute XA‘X- to prove this formula tor B‘: 5* 5* -4*1 0 Suppose A = XAX 'Jake determinants io prove det A = det A = A, A2- A„. This quick proof only works when A can be___ Show that trace X Y = trace YX. by adding the diagonal entries of X Y and YX: 13 = has 4» X И and Now choose Y to be AX *. Then XAX' has the same trace as AX~'X = A. This proves that the trace of A equals the truce of A = sum of the eigenvalues. When is a matrix A similar to its eigenvalue matrix A ? A and A always have the same eigenvalues. But similarity requires a matrix В with A = BAB~ 1. Then В is the ________ matrix and A must have n independent. If A = X AX ~1, diagonalize the block matrix В = [ J ] Find its eigenvalue and eigenvector (block) matrices. Consider all 4 by 4 matrices A that are diagonalized by the same fixed eigenvector matrix X. Show that the A’s form a subspace (cA and A> + Aj have this same X). What is this subspace when X = /? What is its dimension? Suppose A2 = A. On tbe left side A multiplies each column of A. Which of our four subspaces contains eigenvectors with A = 1? Which subspace contains eigenvectors with A = 0? From the dimensions of those subspaces. A has a full set of independent eigenvectors. So a matrix with A2 = A can be diagonalized. (Recommended) Suppose Ax — Ax. If A = 0 then z is in the nullspacc. If A # 0 then z is in the column space. Those spaces have dimensions (n - r) + г = n. So why doesn’t every square matrix have n linearly independent eigenvectors? The eigenvalues of A are 1 and 9. and the eigenvalues of В are — 1 and 9: A and В Find a matrix square root of A from R — Xv/ЛХ'1. Why is there no real matrix square root of B? If A and В have the same A’s with the same independent eigenvectors, their factor- izations into ____are the same. So A = B. Suppose the same X diagonalizes both A and B. They have the same eigenvectors in A = XA,X"1 and В = XA2X*. Prove «hat AB = BA. 1
Chapter 6. Eigenvalues 226 30 31 32 33 34 35 36 37 38 and Ei»_^ '8cn*eCl0^ • _ l« bl then the determinant of A - »(A - a)(A - <f). (a)^-H^i,on‘n*orem,'tha,( * h Cavley-Hamilton Theorem on Fibonacci » A -[, 1]. Thc (b) Test the Cayley ; = 0 М1КС the polynomial det(A - A/) is д2 °*т ^TT^btotbe^H-At/XX-A^-..^.^'? “Xto д»,м ”di If Д - p 0] and AB = BA. show that В = [j “ al*° a diagonal matrix И Л - [° ai A different etgen----------. These diagonal matri B ^^si^aJ subspace of matnx space. AB - BA ~ 0 gi^ -«to-»»»**1 to nnk of to 4 by 4 mart,. Ш ,4‘ Wtob if •" W < I”?,0"» “7 if *"> W > L PeterTax gives these striking examples in his book Linear Algebra: c- \ 5 r ~ L~3 ~4 c1024 = -c D = [ 5 6.91 1-3 -4 H^10M|| < 10-П В A ||A,024|| > 10™ В1024 = I Find the eigenvalues A = e** of В and C to show В4 = I and C3 = — ] The nth power of rotation through в is rotation through n0: A„ _ Г cos6 -sin# 1" _ Г cosn# -sinn0 1 ~ [ sin0 cos в j [ sinn0 cosn# I' Prove that neat formula by diagonalizing A = XAX-1. The eigenvectors (colum of X) are (1. i) and (i, 1). You need to know Euler’s formula <* = cos 0 + i Sjn $ * The transpose of A = XAX~l is AT = (X-1)TAAfT. The eigenvectors in ATy = Ay are the columns of that matrix (X”*)T. They are often called left eigenvectors of A. because yTA = AyT. How do you multiply matrices to find this formula for A? [ Sum of rank-1 matrices A = XAX"1 = AjXiy^ + - • • + AnXnyJ. The inverse of A = eyt{n) + ones(n) is A-1 = eye(n) + C • ones(n). Multiply AA~* to find that number C (depending on n). Suppose Ai and A3 are n by n invertible matrices. What matrix В shows that A2A) = B(A|A2)B~* ? Then A2Ai is similar to A) A2: same eigenvalues. (Pavel Gnnfcld) Without writing down any calculations, can you find the eigenvalues of this matrix ? Can you find the 2020th power A2020 ? 110 55 —164 A = 42 21 -62 88 -131
6.3 Symmetric Positive Definite Matrices

1 A symmetric matrix S = Sᵀ has n real eigenvalues λ and n orthonormal eigenvectors q₁, ..., q_n.
2 Then S is diagonalized by an orthogonal matrix Q:  S = QΛQ⁻¹ = QΛQᵀ.
3A S positive definite: all λ > 0, every pivot > 0, all upper left determinants > 0.
3B The energy test is xᵀSx > 0 for all x ≠ 0. Then S = AᵀA with independent columns in A.
Positive semidefinite allows λ = 0: pivot = 0, determinant = 0, energy xᵀSx = 0, any A in S = AᵀA.

Symmetric matrices S = Sᵀ deserve all the attention they get. Looking at their eigenvalues and eigenvectors, you see why they are special:

1 All n eigenvalues λ of a symmetric matrix S are real numbers.
2 The n eigenvectors q can be chosen orthogonal (perpendicular to each other).

The identity matrix S = I is an extreme case. All its eigenvalues are λ = 1. Every nonzero vector x is an eigenvector: Ix = 1x. This shows why we wrote "can be chosen" in Property 2 above. With repeated eigenvalues like λ₁ = λ₂ = 1, we have a choice of eigenvectors. We can choose them to be orthogonal. And we can rescale them to be unit vectors (length 1). Then those eigenvectors q₁, ..., q_n are not just orthogonal, they are orthonormal. The eigenvector matrix for S has QᵀQ = I: orthonormal columns in Q.

We write Q instead of X for the eigenvector matrix of S, to emphasize that these eigenvectors are orthonormal: QᵀQ = I and Qᵀ = Q⁻¹. This eigenvector matrix is an orthogonal matrix. The usual A = XΛX⁻¹ becomes S = QΛQᵀ:

S = QΛQᵀ   with real eigenvalues in Λ and orthonormal eigenvectors in the columns of Q.

Every matrix of that form is symmetric: Transpose QΛQᵀ to get (Qᵀ)ᵀΛᵀQᵀ = QΛQᵀ.

Quick Proofs: Orthogonal Eigenvectors and Real Eigenvalues

Suppose first that Sx = λx and Sy = 0y. The symmetric matrix S has a nonzero eigenvalue λ and a zero eigenvalue. Then y is in the nullspace of S and x is in the column space of S (x = Sx/λ is a combination of the columns of S). But S is symmetric: column space = row space! Since the row space and nullspace are always orthogonal, we have proved that x is orthogonal to y.
Chapter 6. Eigenvalues and Ei, is not zero, we have Sy = «У In this case we i^. to = OK «»<S - “/)l Г <Л 0)1 ’**' * - О »«»s -•' Л »t» »'»” (' г ’ * S ' « №• » “ Й S»»r*’0: " „TMP<too<«toSoto.ly«rs»ntortoeigen4|uei to «W~ ,”,°1" Г4*' M“"S dPmto«>«'S- T”"" «tor IT ctoiges и -i). Tb, J’ ’'^^^’„.„«rtoptanrmbto^tonpte.tomeeslortoendorto^ „«и. B« poto« "“into art ю bt,„„( Positive Definite Matrices 7^ti^definite matrii has all positive eigenvalues ] i .^^mr-inc matnces S « S7. All their eigenvalues are i».i "m Ьг* “,““тp7'rfu' р,<ч*п> H.„ i. ТЫ1 7 podtTredefinite matrixhas all positive eigenvalues ] We would ita to check for positive eigenvalues without ^computing those numbers A. You will see four more tests for positive definite matrices, after these five examples. is positive definite. Its eigenvalues 2 and 6 are both positive 2 S Q J p g j Qr “ positive definite if Qr - Q~1: same A 2 and 6 3 S = C 2 ° C1" is positive definite if C is invertible (not obvious) [ 0 и J 4 S » [ “ j is positive definite exactly when a > 0 and ac > b2 5 ~ о о ** Р0*1^** semidefinite: it has all A > 0 but not A > 0 Try Test I on these examples. The other tests may give faster answers. No, No, Yes. I 2 2 I 5 = vvT(rank 1) ’ 2 1 0 1 2 1
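Test 1 is easy to try on a computer. Here is a small MATLAB sketch (the two matrices are my own illustrations, not necessarily the examples above): eig gives the λ's, and the factorization S = QΛQᵀ can be confirmed at the same time.

% Test 1 (all lambda > 0) and the factorization S = Q*Lambda*Q' for symmetric S
S = [2 4; 4 9];                    % symmetric; this S returns in the energy test below
[Q, Lambda] = eig(S);              % Q has orthonormal eigenvectors, Lambda is diagonal
disp(diag(Lambda)')                % both eigenvalues positive: S passes Test 1
disp(norm(Q'*Q - eye(2)))          % Q is orthogonal: Q'Q = I
disp(norm(S - Q*Lambda*Q'))        % spectral theorem S = Q Lambda Q'
B = [1 2; 2 1];                    % symmetric but indefinite
disp(eig(B)')                      % eigenvalues -1 and 3: Test 1 fails for B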
The Energy-based Definition

May I bring forward the most important idea about positive definite matrices? This new approach doesn't directly involve eigenvalues, but it turns out to be a perfect test for λ > 0. This is a good definition of positive definite matrices:

Test 2 is the energy test   S is positive definite if the energy xᵀSx is positive for all vectors x ≠ 0.   (1)

Of course S = I is positive definite: all λᵢ = 1. The energy xᵀIx = xᵀx is positive if x ≠ 0.

Let me show you the energy in a 2 by 2 matrix. It depends on x = (x₁, x₂):

xᵀSx = [x₁ x₂] [2 4; 4 9] [x₁; x₂]      Energy = 2x₁² + 8x₁x₂ + 9x₂²

Is this positive for every x₁ and x₂ except (x₁, x₂) = (0, 0)? Yes, it is a sum of squares:

xᵀSx = 2x₁² + 8x₁x₂ + 9x₂² = 2(x₁ + 2x₂)² + x₂² = positive energy.

We must connect positive energy xᵀSx > 0 to positive eigenvalues λ > 0:

If Sx = λx then xᵀSx = λxᵀx.  So λ > 0 leads to energy xᵀSx > 0.

That line tested xᵀSx for each separate eigenvector x. But more is true. If every eigenvector has positive energy, then all nonzero vectors x have positive energy:

If xᵀSx > 0 for the eigenvectors of S, then xᵀSx > 0 for every nonzero vector x.

Here is the reason. Every x is a combination c₁x₁ + ··· + c_nx_n of the eigenvectors. Those eigenvectors can be chosen orthogonal because S is symmetric. We will now show: xᵀSx is a positive combination of the energies λ_k x_kᵀx_k > 0 in the separate eigenvectors.

xᵀSx = (c₁x₁ + ··· + c_nx_n)ᵀ S (c₁x₁ + ··· + c_nx_n)
     = (c₁x₁ᵀ + ··· + c_nx_nᵀ)(c₁λ₁x₁ + ··· + c_nλ_nx_n)
     = c₁²λ₁x₁ᵀx₁ + ··· + c_n²λ_nx_nᵀx_n > 0   if every λᵢ > 0.

From line 2 to line 3 we used the orthogonality of the eigenvectors of S: xᵢᵀxⱼ = 0.

Here is a typical use for the energy test, without knowing any eigenvalues or eigenvectors:

If S₁ and S₂ are symmetric positive definite, so is S₁ + S₂.
Proof by adding energies:   xᵀ(S₁ + S₂)x = xᵀS₁x + xᵀS₂x > 0 + 0.

The eigenvalues and eigenvectors of S₁ + S₂ are not easy to find. Energies just add.
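The energy and the sum of squares above are easy to compare numerically. A MATLAB sketch (mine), using the same S = [2 4; 4 9]:

% Energy test for S = [2 4; 4 9]: x'Sx = 2x1^2 + 8x1x2 + 9x2^2 = 2(x1 + 2x2)^2 + x2^2
S = [2 4; 4 9];
x = randn(2, 1000);                          % 1000 random vectors (columns)
energy = sum(x .* (S*x));                    % x'Sx for each column of x
squares = 2*(x(1,:) + 2*x(2,:)).^2 + x(2,:).^2;
fprintf('smallest energy found: %.3f\n', min(energy))                % stays positive
fprintf('max |energy - sum of squares|: %.1e\n', max(abs(energy - squares)))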
230 Chapter 6. Eigenvalues and '8en*B<.tUr, Three More Equivalent Test . n nositive eigenvalues and positive energy. That Crtc~ . $ be «lu« 1 w .and prob.bly«be«.bu.»e.lopWi,|,u,^.'» ""“J'“lun™ £ 4 « = К-» °' ..S r' Toll ................ •t j why must columns of A be independent in thia test? Test 3 applies to S = А л. if w“" . Л. - Лт>1« - M«)T M-) - ||Л.||>, a s= A A ы* ж. Ax This cncrgy is positive provided Л, i. Бк cmupy » И» !<«'»ЛЛ„ О. Ле column of И mu,, be lndepenacn‘ л,л ь“ -* 2- ” 3 . r ] । i 1 Г 2 3 4 is not positive definite. -I 9 C 7 It i< nncitium .’ is not positive definite. It is positive semidefinite ||Л®||2 > 0 ,_________ • (1, -2,1) has zero energy 0. Then S » ArA is only positive semidefinite, 1 2 1 3 3 5 7 4 7 10 1» Л I3 - 2 P7T* It is an eigenvector of A A with л Equation (2)»ys 4ТЛ is al least semidefinite. because xrSx = ||Ae||« j, never negative. Semidefinite allows energy I eigenvalues I determinants /pivots of S to be ten Determinant Test and Pivot Test The determinant test is recommended for a small matnx. I will mark the four "leading determinants” Dt, Di. Dt, Dt in this 4 by 4 symmetric second difference matrix. Test 4 S -1 2 -1 2 -1 -1 -1 2 has 1st determinant 2nd determinant 3rd determinant 4th determinant D, =2 £>2 = 3 D3 = 4 D4 = 5 The determinant test is here passed! The energy xTSx must be positive loo. Leading determinants are closely related to pivots (the numbers on the diagonal after elimination). Here the first pivot is 2 The second pivot | appears when l(row 1) is added to row 2. The third pivot j appears when j(new row 2) is added to row 3. Those fractions 1 are ratios of determinants! The last pivot is The Ath pivot equals the ratio * of the leading determinants (sizes k and k - 1) ”fc—1
6 , Symmetric Positive Definite Мжпса 231 Tet 4 П» !»•"“ «" И* S П. ' C"n ’“^1 Sir “ S Л’Л. In («I elimination on b produces an important choice of Л. RcnicmbCT that elimination = iriaHKu^r fa‘ C . . P lo no* has had l's on its diagonal and U contained the pivots. But with symmetric matnees we can balance S as LDLT: 2 -1 -1 2 0 -i put pivots into О - “J Test5 ° "J [’I 41 S-LU (3) S = LDLT (4) Share those pivots between A7 and A Test 3 0 1 am sorry about those square roots—but the pattern 5 - ATA is beautiful: A = y/DLT. ^Elimination factors every positive definite S into Ar A (Ab upper triangular) j This is the Cholesky factorization S ATA with ^pivots on the main diagonal of A. To apply the S “ ATA test when 5 is positive definite, we must find al least one possible A. There are many choices foe A. including (1) symmetric and (2) triangular 1 If S ж QAQT. take square roots of those eigenvalues. Then A « Qy/XQ' “ AT. 2 If S = LU “ LDLr with positive pivots in D. then S (Li/D) (y/T)LT). Summary The 5 tests for positive definiteness of S = S* involve 5 different parts of linear algebra—pivots, determinants, eigenvalues. S « ATA. and energy. Each test gives a complete answer by itself: positive definite or semidefinite or neither. Positive energy xTSx > 0 fa tbe best definition. It connects them all.
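As a quick summary in code, here is a MATLAB sketch (mine) that applies the eigenvalue, determinant, pivot, and Cholesky tests to the 2 by 2 second difference matrix S = [2 -1; -1 2] factored above. The command chol is the one the book uses for S = AᵀA with triangular A.

% Four quick tests on S = [2 -1; -1 2]
S = [2 -1; -1 2];
disp(eig(S)')                     % Test 1: eigenvalues 1 and 3, both positive
disp([S(1,1) det(S)])             % Test 4: leading determinants 2 and 3
[L, U] = lu(S);                   % Test 5: the pivots appear on the diagonal of U
disp(diag(U)')                    % pivots 2 and 3/2, both positive
A = chol(S);                      % Test 3: S = A'A with upper triangular A (Cholesky)
disp(norm(S - A'*A))              % ~ 0, so S = A'A with independent columns in A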
232 Suppose 5 » • *• 5 = • ‘ b ej I will choose an Energy E - **S* The graph of that energy Chapter 6. Eigenvalue Positive Definite Matrices and Minimum ь ” rn>b|( mmetric positive definite 2 by 2 matrix. Apply four of the tests determinants a > O.ae- b2 >0 - л t-^0 =^+M+51/J>0 pivots a>0'(ac-bi\i energy ax2 + 2bry + '.J * 0 3 c = 5 and b = 4. This matrix S has A = 9 and A = ,° I» * function £(z.f) «• bon! opening upwards The bottom tne gmp..« — —- . . = 0. This connects minimum problem book describes numerical minimization^ For the beat problems, the --------------------------like / = r,or/-x2 + v2 Here is the convexity lcsl. ^murnx <^ *<*-*« « define a, all poinlx. We are in high dilw X. but linear algebra .denuf.es the crucial properties of the second denvat.ve matrix. calculus with positive Chapter 8 of this I function f(x) ь strictly cons ex—I The second derivatives of thc energy j zTSz are in thc matrix S For an ordinary function J(x) of one variable x, the lest for a minimum is farnou ... . First derivative <// n Second derivative d2/ Minimum 7- • 0 . . . —~ n ~ is zero di is positive <£ra u d ж = *o For f(x.y) with two variables, the second derivatives go into a matrix : positive definite! Minimum ilf _ 0 and — = 0 and at xo. vo Ox Oy (Pf/dz2 02f/&x0y'\ is positive definite e^f/didy fPf/Ov* at x0, Vo Thc graph of: ’ f(z.y) is flat at that point xo.vo because df/dx = df/dy = 0. The graph goes upwards because the second derivative matrix is positive definite. So we have a minimum point of lhe function f(x,y). Similarly for f(x,y,t). Positive Semidefinite Matrices Often we arc at the edge of positive definiteness. The determinant is zero. The smallest eigenvalue is A = 0. Thc energy in its eigenvector is zT5z = z1 Oz = 0. These matrices on the edge are “positm irmidefiniie". Here are two examples (not invertible): are positive semidefinite but not positive definite
Symmetric Positive Definite Matrices 233 S has eigenvalues 5 and Ota trace i. 1 + 4 = 5. !u upper , ani]0. The rank of S is only 1. This matnx S factors into A1.4 with depfluirn, columns in A: Dependent columns in А Г1 21 fl О] [1 21 Positive semidefinite S [2 41 * 12 0 0 0 = ^T,A‘ If 4 is increased by any small number, the matrix S will become positive definite. The cyclic T also has zero determinant. The eigenvector x - (1,1,1) has Tx = 0 and energy хлТх “ Sectors z in all other directions do give positive energy. Second differences T from first differences A Columns add to (0,0,0) 2 -1 -1 2 —1 -I -1] 1 -1 ж 0 2j [-1 -1 0‘ 1 -1 0 1 1 0-1’ -1 I 0 0 -1 1 Positive semidefinite matrices have all A > 0 and all x^ Sx > 0. Those weak inequalities (> instead of >) include positive definite S along with the singular matrices at the edge. If S is positive semidefinite. so is every matnx ATSA: If x^Sx > 0 for every vector x, then (Ax)TS( Ax) > 0 for every x. We can tighten this proof to show that A* SA is actually positive definite. But we have to guarantee that Az is not the zero vector—<0 be sure that (Ax)TS(Az) is ms zero. Suppose zTSx > 0 and Ax / 0 whenever z is not zero. Then ATSA is positive definite. Again we use the energy test. For every x # 0 we have Ax / 0. The energy in Ax is strictly positive: (Az)T5(Ax) > 0. The matnx ATSA is called "congruent to S. Example 1 The identity matrix S ” I is positive definite. Then we have proved: 1. AT A is positive semidefinite. 2. If A is invertible, then Ar A is positive definite. This was Test 3 for a positive definite matrix It is mentioned again because ATSA is such an important matrix in applied mathematics. We warn to be sure it is positive definite (not just semidefinite). Then the equations ATSAx f in engineering can be solved. Here is an extension called the Law of Inertia If S1" = S has P positive eigenvalues and N negative eigenvalues and Z zero eigenvalues, then the same is true for ATSA—provided A is invertible. The Ellipse ax2 + 2bxy 4- cy2 = 1 Think of a tilted ellipse xTSx = 1. Its center is (0,0). as in Figure 6.2a. Tum it to line up with the coordinate axes (X and Y axes). That is Figure 6.2b. These two pictures show the geometry behind the eigenvalues in Л and the eigenvectors in Q and in S = QAQT. The eigenvector matrix Q lines up the ellipse The tilted ellipse has xTSx = I. The lined-up ellipse has ХТЛХ = 1 with X = Qx.
Chapter 6. Eigenvalues and Ei( 234 rod the axes of this tilted ellipse 5x2 + 8xy + 5V2 = 1. Example 2 Ft nu(nx 5 [haI matches this equation: Solution Sort mth »e poe • . r. The equation is The matrix is S' The eigenvectors are i ] and [J J Divide by Л for unit vectors. Then $ a qAqt Eigenvalues 9 and 1 Now i multiply by [x у] on the left and ['] on the right to get xTSx » (»Tq)a'( r'dl‘ sum oi — .—, - - , , К >/2 / V y/2~J • (6) The coefficients are the eigenvalues 9 and 1 from Л. Inside the squares arc th. «, . (i. - (i. -I)M "“"««««Kn The axes of the tilted ellipse point along those eigenvectors. This ex S « QAQ* is called the “principal axis theorem"—it displays the axes No **’y axis directions (from lhe eigenvectors) but also the axis lengths (from th. * On,y *hc »<ne eigenvalue,) Figure 6.2: The ellipse *TS* = 5г2 + 8ry + 5y2 = 1. Lined up it is ЭХ2 + У* 1. To sec it all. use capital letters for the new coordinates that line up the ellipse: Lined up ЦД = Х and =Y and 9X2 + У2 = 1. y/2 Vi The largest value of X2 is 1/9. The endpoint of the shorter axis has X = 1/3 and Y = 0. Notice: The bigger eigenvalue A( gives lhe shorter axis, of half-length l/УХ? = 1/3. The smaller eigenvalue Aj = 1 gives the greater length 1 /V^a = 1. In the ry system, the axes are along the eigenvectors of S. In the XY system, lhe axes art along the eigenvectors of A—lhe coordinate axes. All comes from S = QAQJ.
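Here is Example 2 in MATLAB (a sketch, mine): the eigenvalues 9 and 1 of S = [5 4; 4 5] give the half-lengths 1/3 and 1 of the axes, and the unit eigenvectors give their directions.

% Axes of the tilted ellipse 5x^2 + 8xy + 5y^2 = 1
S = [5 4; 4 5];
[Q, Lambda] = eig(S);                       % eigenvalues 1 and 9 (ascending order)
half_lengths = 1 ./ sqrt(diag(Lambda));     % 1/sqrt(lambda): lengths 1 and 1/3
disp(half_lengths')
disp(Q)                                     % columns are +-(1,-1)/sqrt(2) and +-(1,1)/sqrt(2)
p = Q(:,2) / sqrt(Lambda(2,2));             % endpoint of the shorter axis (lambda = 9)
disp(p' * S * p)                            % equals 1: the point p lies on the ellipse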
Symmetric Positive Definite Matnees 235 Optimization and Machine Learning This book will end with gradient descent to пип.пии /(x). Ucp t0 Xt41 takes be steepest direction al (he current point «*. But that «еерем direction changes as we descend. This is where calculus meets linear algebra, at lhe minimum point z7 Calculus The partial derivatives of / are all zero at z’ : = 0 dx, Linear algebra The matrix 5 of second derivative, ^L- « posuive definite If S is positive definite (or semtdefinite) at all «u- function /(z) is convex. If the eigenvalues of «Г f ~ (*>•••• »")• «ben the then /(®) I» strictly convex Then are the beuV* ***’*' роы|,*е numbcr s- only one minimum, and gradient descent will find iL Г“Паюп$ to °P“mi“ They have Machine learning produces “loss functions" with hundreds of thousands of variables. They measure the error—which we minimize. But computing al) the second derivatives is barely possible. We use tint derivatives to tell us a direction to move—the error drops fastest in lhe steepest direction. Then we take another descent step in a new direction. This is the central computation in least squares and neural nets and deep learning. All Symmetric Matrices are Diagonalizable This section ends by returning to the proof of the (real) spectral theorem S QAQr- Every real symmetric matrix S can be diagonalized by a real orthogonal matrix Q When no eigenvalues of A are repeated, the eigenvectors are sure to be independent. Then A can be diagonalized. But a repeated eigenvalue can produce a shortage of eigenvectors. Thi, sometimes happens for nonsymmetric matrices. It never happens for symmetric matrices. There are always enough eigenvectors to diagonalize S = ST The proof come, from Schur’s Theorem: Every real square matrix factors into A = QTQ-1 « QTQT for some triangular matrix T If A is symmetric, then 7* = QrAQ is also symmetric. But a symmetric triangular matrix T is actually diagonal. Thu, Schur has found the diagonal matrix we want. If A = S is symmetric, then T = A and 5 = QAQ r as required The website math.mil.edu/esTryone has the proof of Schur*, Theorem. Here we just note that if A can be diagonalized (A = XAX"’) then we can see that triangular matrix T. Use Gram-Schmidt from Section 4.4 to factor X into QR: A = XAX-1 = QRAR~lQ~l = QTQ~' with triangular T = RAR'1.
236 Chapter 6. Eigenvalues and Eii Complex Vectors and \.ja This page allows for complex numbers like 3 + 4t in z and 5. The complex z, « 3 + dr is It = 3 - 4i. Then z, times z, is 32 + 42 = 25 (reaj,. nnjugaic of |z, I = y/25 = 5. Now suppose we have a complex vector x = (Xj, = ^nitu^ J Length squared ^3 _ yT, _ 25 + 2 * 27 II*» 4.v_ Before we move to nutnces, here are the key facts about complex numbers I The conjugate isz=a-ib x + 't ------° ’b. = 2a is real z times I = |x|2 = a2 + b2^ . 7~г~ - — — a ~ tb equals .i-з , - w - j--. I____________________ A real symmetric matrix has 5T = S. With complex numbers, we must change S'7 - Here is a mauix that has S T = S (Hermitian matrix). to S7 2 3-3» 3 + 3i 5 S=S xrSx = real number for complex i 8 and —1= real eigenvalues of S ‘ 2 —A 3-3» dct(S - A/) = det 3 4. Ji 5 - A A2 - 18 = (A - 8) (A +1) f_ _ ir » 3-3iirzil 2ziXi + 512*2 + ITS.-!1' 5 ][i;]'i,(3-3.>1+Il(3 + 3i)xrre*1 W xT5x is real: Equation (7) ends with complex numbers z+z = a+ib+a-ib = 2a (reef). Hermitian matrix ST = S is the complex analog of a real symmetric matrix SF = $. Unitary matrix QT=Q'1 is lhe complex analog of an orthogonal matrix with QT=Q-t The eigenvalues of S = S Г are real. Tbe eigenvalues of a unitary Q have |A| = 1. The eigenvectors q,.....q„ can be complex. Those eigenvectors are still orthogonal. The command S' in MATLAB or Julia automatically returns JT when S is complex. Thc dot product of complex vectors is xTy = *t»i + • • • + znj/n. Example to show two complex dot products xTx = 2 xTy = 1-1 = 0 Those orthogonal vectors z. p are thc eigenvectors of Q = 1' 0 and5= _® ’ .
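The Hermitian example is easy to check numerically; here is a MATLAB sketch (mine). The test vector x is chosen to have length squared 2 + 25 = 27 as above, and the prime in MATLAB is the conjugate transpose, as the text notes.

% Hermitian matrix: real eigenvalues, orthogonal (complex) eigenvectors
S = [2, 3-3i; 3+3i, 5];
disp(norm(S - S'))            % S equals its conjugate transpose: Hermitian
[Q, Lambda] = eig(S);
disp(diag(Lambda)')           % real eigenvalues -1 and 8
disp(abs(Q(:,1)' * Q(:,2)))   % the eigenvectors are orthogonal: q1-bar' * q2 = 0
x = [1+1i; 3-4i];             % a complex vector with |x|^2 = 2 + 25 = 27
disp(x' * x)                  % length squared 27, a real number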
_„,lnc Positive Definite Matrices 237 6 3. WORKED EXAMPLES Test these symmetric matrices S' and T for positive definiteness: вЛА [ 2 -1 01 r . • ’ Solution The pivots of S are 2 and 2 and < “ and 3 and 4. all positive. The eigenv^ucs of $ uppcr feft determinants are 2 That completes three tests. Any one test is । V 2 2 and 2 + x/2. all positive. 1 have three candidates Alt Aa, 4j to . a first difference matrix with 1 and -1 to “ л A “ Positive definite. At is to produce the second difference-1,2,-1 in S: S= A{Ai 2 -1 0 -1 2 -1 °] Г1 -1=0 2j [O -1 0 1 -1 0 1 O' 0 -1 1 -1 0 0 0 1 -1 0 o' 0 1 -1 The three columns of A, are independent Therefore S is positive definite. Aj comes from S = LDLT (the symmetric version of S = LVY Elimination gives the pivots 2, j in D and the multipliers -|,0, -1 in L. Just put A2 = Ly/D. LDLT = 1 “5 0 1 -j 01 1 -J = (L7D)(Lx/D)t = aJa2. This triangular matrix Aj has square roots (not so beautiful). A? is the “Cbolesky factor" of S and the MATLAB command is A = chol(S). In applications, the rectangular Ai is how we build S and this Aj is how elimination breaks it apart Eigenvalues give the symmetric choice A3 = Qy/A.QT. This succeeds because A3 A3 = QAQT = S. All tests show that the —1,2. — 1 matrix S is positive definite. The three choices Ai, A2. A3 give three different ways to split up the energy xTSx: xTSx = 2x^ - 2xi jj + 2x, - 2хэхз + 2xj l|Atx||2 = x? + (x2 - xi)2 + (x3 - xj)2 + x? ||A2x||2 = 2(x! - |xa)’ + |(x2 - §x3)2 + $ x§ ||A3x||a = Ai(q7x)2 + A2(q2 x)2 + As(gjx)2 Rewrite with squares S = A^Ai S — LDLT = AjAa S = QAQT = AjA3 For the second matrix T, the determinant test is easiest. Test on T det T = 4 + 2b - 2b2 = (1 + b) (4 — 2b) must be positive. At b = — 1 and b = 2 we get det T = 0. Between b = -1 and b — 2 this matrix T is positive definite. The comer entry b = 0 in the matrix S was safely between -1 and 2.
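A MATLAB sketch (mine) of two of the factor choices in Worked Example 6.3 A: chol produces the triangular A₂ (the book's A = chol(S)) and the eigenvalues produce the symmetric A₃ = Q√ΛQᵀ. Both satisfy S = AᵀA.

% Two ways to write the -1, 2, -1 matrix as S = A'A
S = [2 -1 0; -1 2 -1; 0 -1 2];
disp(eig(S)')                       % eigenvalues 2-sqrt(2), 2, 2+sqrt(2): all positive
A2 = chol(S);                       % upper triangular Cholesky factor
disp(norm(S - A2'*A2))              % ~ 0
[Q, Lambda] = eig(S);
A3 = Q * sqrt(Lambda) * Q';         % symmetric square root: A3 = A3' and A3*A3 = S
disp(norm(S - A3'*A3))              % ~ 0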
Chapter 6. Eigenvalues and Ei Benve^ Problem Set 6.3 Лс мгос cigenvaJucs * 1 Suppose S ₽ й syinnKlnc When В ------------------ (•) Tnubpo* -45B eigenvalues) when В =---------- (b) AS В » sinular to та1псе5 similar to S look like (_ W( Put (a) and (Ы together eigenvectors and the factors for Q\QT. the Cln 2 2 2 S« 2 0 0 2 0 OJ 0 21 and T- 0 -1 -2 -2 0 2 П 2 3 4 5 6 7 Find all orthogonal matnees that dugonanze J2 J6J. (.) Find a svmmetnc matnx [J ?] that 1ш a negative eigenvalue. <b) How do you known must have, ncgal.ve pivot? (c) How do you know и can t hm two negator eigenvalues? If C is symmetne prove .hat ArCA is also symmetric. (Transpose it.) When A it 6 by 3. what tn the shapes of C and A CA. Find an orthogonal matnx Q that diagonalizes S = [ ’ “j. What is Л ? If Л’ 0 then the eigenvalues of A must be ——• Give an example that hai A / 0 But if Л is symmetric, diagonalize it to prove that A must be a zero matnx Write 5 and 7 in the form A|»jxf + *2*1*1 of 'he spectral theorem QAQT. (keep||xi|| = ||x2|| = j). T- 5 = 8 Every 2 by 2 symmetne matnx is A|X|X^ + A2x2x2 “ A|ft + A2ft. Explain ft + ft = xixf + x2xj - I from columns times rows of Q. Why is ft ft - 0? 9 What arc the agenvalues of A = [_£ ,]? Create a 4 by 4 antisymmetric matrix (Лт = -Л) and verify that all its eigenvalues are imaginary. 10 (Recommended) This matrix Af is antisymmetric and also_________Then all its eigenvalues are pure imaginary and they also have |A| = 1. (||A/x|| = ||x|| for every x so jt Ax|| = |x| for eigenvectors.) Find all four eigenvalues from the trace of Af: 1 -1 0 I 1 1 -1 0 can only have eigenvalues i or - t.
6Л Symmetric Positive Detinue Matrices 239 . Show that this A (symmetric but complex i . 11 ° “Prex> has only one line of eigenvectors: Л*[1 -I] <|,Ч«и1.иЫе „8с<„11ие%х»o.O д! , X » M .ucb. Ч«х»1 prapcn, fa trapk> sroj profeny „ А = A. Then all A s are real and the eigenvectors are orthogonal 12 Find the eigenvector matrices Q for 5 and X for B. Show that X doesn’t collapse at ~ !•evcn lhou*h л = 1 is repeated. Are those eigenvectors perpendicular? 0 d o’ -d 0 1* 5 = d 0 0 Be 0 1 0 have A-l,d.-d. 0 0 1 0 0 d 13 Write a 2 by 2 complex matrix with 3T = S (a “Hermitian matrix"). Find A, and Aj for your complex matrix. Check that Sjzj = 0 (this is complex orthogonality). 14 True (with reason) or fake (with example). (a) A matrix with n real eigenvalues and n real eigenvectors is symmetric. (b) A matrix with n real eigenvalues and n orthonormal eigenvectors is symmetric. (c) The inverse of an invertible symmetric matrix is symmetric. (d) The eigenvector matrix Q of a symmetric matrix is symmetric. 15 (A paradox for instructors) If AAT = ATA then A and AT share the same eigen- vectors (true). A and AT always share lhe same eigenvalues. Find the flaw in this conclusion: A and AT must have the same X and same A. Therefore A equals A r. 16 Are A and В invertible, orthogonal, projections, permutations, diagonalizable ? Which of these factorizations are possible: LU.QFL XAX~1.QЛQT^ 0 A- 0 1 1’ I 1 . 1 I 0 1 1 0 0 0 17 What number 6 in A = [ ? j}] makes A = Q\QT possible? What number will make it impossible to diagonalize A? What number makes A singular? 18 Find all 2 by 2 matrices that are orthogonal and also symmetric. Which two numbers can be eigenvalues of those two matrices? 19 This A is nearly symmetric. But what is the angle between Ле eigenvectors ? Г1 IO"15 1 [0 i + io-*5] has eigenvectors and [?]
Chapter 6. Eigenvalues and & 240 21 22 23 If Аш„ i* u* --------~ larger than A^. What n the first entry о,, °> о - ----------- Suppose Лт = -Л (real aniitymmetnc matrix). Explain these facts abou( * **»• (а) x7 Az = 0 for every real vector z. (b) The eigenvalues of A are pure imaginary. (ct The determinant of A is positive or zero (not negative). For (a). multiply out an example of xT Ax and watch terms cancel n xr(Ax) to -(Ar)Tz. For(b). Ax = Az leads to tTAz = AlT* a дй J* г«*егч shows that zTAz - (*- iy)TA(x + ty) has zero real part. Then (bi hit1 Pa'4a> ncips wi(Jj i If S is symmetric and all its eigenvalues are A = 2, how do you know th о 2/ ? Key point S) mmetry guarantees that S « ^AQ7. What is that Д ? * 'T,UM ** Which tymmetnc maincei S art alto orthogonal? Show why S'2 « / \v possible eigenvalues A ? Then S must be QAQr for which A ? ’hat are thc Problems 24-49 are about tests for positive definiteness. Suppose the 2 by 2 tests a > 0 andac -1? > 0 arc passed by S * f • ь i (I) A| and A, have the tome tign because their product A, Aj equals * (i) That sign is positive because A( + A2 equals___So A. > n «nd Aj q Which of Si. Sj. Sj. S4 has two positive eigenvalues? Use a test don1. Also find an x so that xrStx < 0. so S, is not positive definite.’ C°n’PUte A’«. 24 25 Si - 26 For which numbers b and e is positive definite ? Factor S into LDLT. S-ff ‘I. S = S 27 Write f(x. p) - x2 + 4*v + Зу2 as a difference of squares and find a point (*, v) where / is negative. No minimum al (0,0) even though / has positive coefficients. The function /(*,») = 2zy certainly has a saddle point and not a minimum at (0,0). What symmetric matnx S produces [ г у ]S [' ] = 2ry ? What are its eigenvalues? 29 Test to see if ArA is positive definite in each case: A needs independent columns.
bi Syfflinctnc Positive Detinue Matrices 241 30 31 32 33 34 35 36 37 38 39 Which 3 by 3 symmetric matrices S and T з «1*1-z,z»). Why isTsemidefinite? Compute lhe three upper left determinants <rf c M • . • • Verify that their ratios give the second and ЛЛ P°'"1'e dd"”*ene4S 7 * Ond л,г<1 pivots in elimination. Pivots = ratios of determinants $ 2 2 O’ 2 0 5 3 3 8 For what number» e and d art S and T posiuve definite? Тем their 3 determinants: 1 Г 1 S = c 1 lie and 1 2 3‘ d 4 4 5 2 3 Find a matrix with a > 0 and c>0anda + c>26 that has a negative eigenvalue. If S is positive definite then S 1 is positive definite. Best proof: The eigenvalues of S~' are positive because________Can you use another lest ? A positive definite matrix cannot have a zero (or even worse, a negative number) on its main diagonal. Show that this matrix fails the energy lest zTSz > 0: 4 1 1 *1 0 2 zj is not positive when (zi.zj.za)" ( , , ). 2 & A diagonal entry »t) of a symmetric matnx cannot be smaller than all lhe A's. If it were, then S - в]}1 would have_________eigenvalues and would be positive definite. But S - / has a_______on the main diagonal. Give a quick reason why each of these statemenu is true: (a) Every positive definite matrix u invertible. (b) The only positive definite projection matrix is P = I. (c) A diagonal matrix with positive diagonal entries is positive definite (d) A symmetric matrix with a positive determinant might not be positive definite! For which s and t do S and T have all A > 0 (therefore positive definite)? From S = QAQT compute the positive definite symmetric square root Qy/\QT of each matrix. Check that this square root gives AT .4 = S:
242 40 Chapter 6 bgenvalues find the half-lengths Of iUa aXcs frOfn 41 find S. From S = 4 8 8 25 find С» _k , ‘Wj. From Ся In the Cbolesky factorization S - CTC. with C = >/DLT, the square root. p.HXs are oo the diagonal of C. Find C (upper triangular) for №o,s of ц,с 19 0 0 0 1 2 0 2 SJ [ci»0 Wrnhout multiplying S = К $ (a) the detenmnani of S (c) the eigenvectors of S The graph of a x3 + y3 is a bowl opening upward. The graph of r - xa raddle The graph of г - -X3 - у3 “ • bowl opening downward. What' o, 6, c for x a ox3-t-T&ry + cy3 to have a saddle point at (x.y) ж (o.q)?* Which values of e give a bowl and which c give a saddle point for th r 4т3 + I2xy + cy3? Describe this graph at lhe borderline value of r ' 8r*Ph * When S and T are symmetric positive definite, ST might not even be But stan from STx - Xx and take dot products with Tx Th™ n ,y,nn’cl'ic. ________ 1 ncn Prove A > 0 Suppose C is positive definite (so yTCy > 0 whenever v d 0) and 4 c . dent columns (so Ax / 0 whenever x # 0). Apply the enerjtv test t *T 'f^epen- - S - Importanl ’ Suppose S is positive definite with eigenvalue* A1 > д > (a) What are the eigenvalue* ofthematrixAi/ -$?!* it posn.vr c A” (b) How doe* «follow that AiXT* > xTSx for every x? ' ° П"С? (О Draw this conclusion. The nusimum value of x^Sx/x^x is __ Pur which a and ria this matrix positive definite > For «л к м-Tnidefinitr* /th» k л ' r which a and c is it positjy^ All 5 tests are possible. The energy xTSx equals e(*i+xl+lJ)2 + c(T2_;r3)2 42 43 45 47 45 49 5 = [1 1 I] 11 2 21 [1 2 7] 01 Г сове sin 01 япв cos0j[O 5][-sin0 coe0J'^nd (b) lhe eigenvalues of 5 (d) a reason why S is symmetric positive definite y3he atc«on and
6.4 Systems of Differential Equations

1 If Ax = λx then u(t) = e^{λt}x will solve du/dt = Au. Each λ and x give a solution e^{λt}x.
2 If A = XΛX⁻¹ then u(t) = e^{At}u(0) = Xe^{Λt}X⁻¹u(0) = c₁e^{λ₁t}x₁ + ··· + c_ne^{λ_nt}x_n.
3 Matrix exponential  e^{At} = I + At + ··· + (At)ⁿ/n! + ··· = Xe^{Λt}X⁻¹  if A = XΛX⁻¹.
4 A is stable and u(t) → 0 and e^{At} → 0 when all eigenvalues of A have real part Re λ < 0.
5 Second order  u″ + Bu′ + Cu = 0  is equivalent to a first order system  [u; u′]′ = [0 1; −C −B][u; u′].

Eigenvalues and eigenvectors and A = XΛX⁻¹ were perfect for matrix powers A^k. They are also perfect for differential equations du/dt = Au. This section is mostly linear algebra, but to read it you need one fact from calculus: the derivative of e^{λt} is λe^{λt}. The whole point of the section is this: Constant coefficient differential equations can be converted into linear algebra.

The ordinary equations du/dt = u and du/dt = λu are solved by exponentials:

du/dt = u  produces  u(t) = Ce^t        du/dt = λu  produces  u(t) = Ce^{λt}   (1)

At time t = 0 those solutions start from u(0) = C, because e⁰ = 1. This "initial value" tells us C. The solutions u(t) = u(0)e^t and u(t) = u(0)e^{λt} start from that number u(0).

This section solves du/dt = Au. The unknown is a vector u (now boldface). It starts from the initial vector u(0), which is given. The n equations contain a square matrix A, n by n. We expect n solutions u(t) = e^{λt}x from n eigenvalues and eigenvectors.

System of n equations   du/dt = Au   starting from the vector   u(0) = (u₁(0), ..., u_n(0))   at t = 0.   (2)

These differential equations are linear. If u(t) and v(t) are solutions, so is Cu(t) + Dv(t). We will need n constants like C and D to match the n components of u(0). Our first job is to find n "pure exponential solutions" u = e^{λt}x by using Ax = λx.

Notice that A is a constant matrix. In other linear equations, A changes as t changes. In nonlinear equations, A changes as u changes. We don't have those difficulties: du/dt = Au is "linear with constant coefficients". Those and only those are the differential equations that we will convert directly to linear algebra.
Solution of du/dt = Au

Our pure exponential solution will be e^{λt} times a fixed vector x. You may guess that λ is an eigenvalue of A and x is the eigenvector. Substitute u(t) = e^{λt}x into the equation du/dt = Au to prove you are right. The factor e^{λt} will cancel to leave λx = Ax:

Choose u = e^{λt}x   du/dt = λe^{λt}x   agrees with   Au = Ae^{λt}x   when   Ax = λx.   (3)

All components of this special solution u = e^{λt}x share the same e^{λt}. The solution grows when λ > 0. It decays when λ < 0. If λ is a complex number, its real part decides growth or decay. The imaginary part ω gives oscillation e^{iωt} like a sine wave.

Example 1   Solve du/dt = Au = [0 1; 1 0] u   starting from   u(0) = [4; 2].

This is a vector equation for u. It contains two scalar equations for the components y and z. They are "coupled together" because the matrix A is not diagonal:

du/dt = Au   means that   dy/dt = z   and   dz/dt = y.

The idea of eigenvectors is to combine those equations in a way that gets back to 1 by 1 problems. The combinations y + z and y − z will do it. Add and subtract equations:

d/dt (y + z) = z + y   and   d/dt (y − z) = −(y − z).

The combination y + z grows like e^t, because it has λ = 1. The combination y − z decays like e^{−t}, because it has λ = −1. Here is the point: We don't have to juggle the original equations du/dt = Au, looking for these special combinations. The eigenvectors and eigenvalues of A will do it for us.

This matrix A has eigenvalues 1 and −1. The eigenvectors x are (1, 1) and (1, −1). The pure exponential solutions u₁ and u₂ take the form e^{λt}x with λ₁ = 1 and λ₂ = −1:

u₁(t) = e^{λ₁t}x₁ = e^t [1; 1]   and   u₂(t) = e^{λ₂t}x₂ = e^{−t} [1; −1].   (4)

Complete solution
Combine u₁ and u₂   u(t) = C e^t [1; 1] + D e^{−t} [1; −1] = [Ce^t + De^{−t}; Ce^t − De^{−t}]   (5)

With these two constants C and D, we can match any starting vector u(0) = (u₁(0), u₂(0)). Set t = 0 and e⁰ = 1. Example 1 asked for the initial value to be u(0) = (4, 2):

u(0) decides C and D   C [1; 1] + D [1; −1] = [4; 2]   yields   C = 3 and D = 1.

With C = 3 and D = 1 in the solution (5), the initial value problem is completely solved.
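A quick check of Example 1 in MATLAB (a sketch, mine): the eigenvector solution (5) with C = 3 and D = 1 agrees with expm(A*t)*u(0), the matrix exponential that appears at the end of this section.

% Example 1: du/dt = [0 1; 1 0] u,  u(0) = (4, 2)
A = [0 1; 1 0];
u0 = [4; 2];
t = 0.7;                                        % any time t
u_eig = 3*exp(t)*[1; 1] + 1*exp(-t)*[1; -1];    % C = 3, D = 1 from equation (5)
u_exp = expm(A*t) * u0;                         % matrix exponential solution e^{At} u(0)
disp([u_eig u_exp])                             % the two columns agree
disp(norm(u_eig - u_exp))                       % ~ 0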
For n by n matrices we look for n eigenvectors. Then C and D become c₁, ..., c_n.

1. Write u(0) as a combination c₁x₁ + ··· + c_nx_n of the eigenvectors of A.
2. Multiply each eigenvector xᵢ by its growth factor e^{λᵢt}.
3. The solution to du/dt = Au is the same combination of those pure solutions e^{λt}x:

u(t) = c₁e^{λ₁t}x₁ + ··· + c_ne^{λ_nt}x_n.   (6)

Not included: If two λ's are equal, with only one eigenvector, another solution is needed. (It will be te^{λt}x.) Step 1 needs a basis of n eigenvectors to diagonalize A = XΛX⁻¹.

Example 2   Solve du/dt = Au knowing the eigenvalues λ = 1, 2, 3 of A:

Typical example
Equation for u
Initial condition u(0)
du/dt = [1 1 1; 0 2 1; 0 0 3] u   starting from   u(0) = [9; 7; 4].

The eigenvector matrix X has x₁ = (1, 0, 0) and x₂ = (1, 1, 0) and x₃ = (1, 1, 1).

Step 1   The vector u(0) = (9, 7, 4) is 2x₁ + 3x₂ + 4x₃. Then (c₁, c₂, c₃) = (2, 3, 4).
Step 2   The factors e^{λt} give exponential solutions e^t x₁ and e^{2t}x₂ and e^{3t}x₃.
Step 3   The combination that starts from u(0) is u(t) = 2e^t x₁ + 3e^{2t}x₂ + 4e^{3t}x₃.

The coefficients 2, 3, 4 came from solving the linear equation c₁x₁ + c₂x₂ + c₃x₃ = u(0):

[1 1 1; 0 1 1; 0 0 1] [c₁; c₂; c₃] = [9; 7; 4]   which is   Xc = u(0).   (7)

You now have the basic idea: how to solve du/dt = Au. The rest of this section goes further. We solve equations that contain second derivatives, because they arise so often in applications. We also decide whether u(t) approaches zero or blows up or just oscillates.

At the end comes the matrix exponential e^{At}. The short formula e^{At}u(0) solves the equation du/dt = Au in the same way that A^k u₀ solves the equation u_{k+1} = Au_k. Example 3 will show how "difference equations" help to solve differential equations.

All these steps use the λ's and the x's. This section solves the constant coefficient problems that turn into linear algebra. Those are the simplest but most important differential equations, whose solution is completely based on growth factors e^{λt}.

Second Order Equations

The most important equation in mechanics is my″ + by′ + ky = 0. The term my″ is the mass times the acceleration a = y″. Then by′ is damping and ky is force.
246 Chapter 6 f .ipenvaJues JrKj bgenvm,^ Thu a a seconder equation (it » Abrtaw'r F = ma). It conta.n* lhc denvauve p" = tfy, Л< fl is util linear »nh constant coefficient* m. b, k. In a differential equation* course. lhe method of *olution ia to substitute у = fAt Each derivative of p bring* down a factor A. We want p = c to solve thc equation: m + 6 — + ky = 0 become* (mA’ + 6A + fc) eA‘ о, (K. Everything depend* on rnA3 + ЬХ + к - 0. Thi* equation for A ha* two root* Д, an(j A,. Then the equation for v ha* two pure wilution* y, = e 1 and yy ж rA’'. combination* e>pi + ejpj five lhe complete wilution. Thii U not true if A| ж AJt In a linear algebra course we expect matrices and eigenvalue*. Therefore we turn the scalar equation (with p") into a rertor equation for у and y': First derivative only! Suppose lhe ma*» is m - I. Two equation* for u (y, p') give du/dt = Au: dy/dt = |/ dtf/dl = ~ky - by' converts to - IV1 - f ° ’ll»] л kJ 1-* -M kJ Ли. (9, "The first equation dy/dt - tf is trivial (but true). Thc second is equation (K) connecting y" lo g' and y. Together they connect u' to u Now we solve и' ж Au by eigenvalues of A • A - XI ж J * fc * A ] determinant X1 + ЬХ + к ж 0. The equation for the A’* Is the same aa (M)' It I* still A3 + bX 4- к я 0, since m = 1 The roots A| and Aj are now eigenvaluet of A. Thc eigenvector* and lhe solution arc Hie first component of u(t) ha* p - qr*'1 + r,rA>'-thc same solution a* before. Il can't be anything else. In the second component of u(f) you sec the velocity dy/dt. The 2 by 2 matrix A in (9) is called a companion matrix—a companion to thc equation (8). Example 3 Motion around a circle with y" + p = 0 and у = con t Thia is our master equation with mass rn 1 and stiffness к ~ 1 and d 0: no damping Substitute p r ‘* into if + p 0 lo reach A3 + 1 = 0. The root, are A = I and А я Then half of r'* f e'" gives lhe solution у ж «м|. As a first order system, the initial values p(0) ж |, y'(O) = Ogo into u(0) = (1,0): Use p" = — p The eigenvalues of A are again the same A - I .„d A = -( (no surprise). A is anti- L h (7 m e?rm'tan ? (and 0. -•). The combination that matches t*(0) (1.0) i. J (a-t+r,). Step 2 multiplies the bye" and e ".
Example 3   Motion around a circle with y'' + y = 0 and y = cos t.

This is our master equation with mass m = 1 and stiffness k = 1 and b = 0: no damping. Substitute y = e^(λt) into y'' + y = 0 to reach λ² + 1 = 0. The roots are λ = i and λ = -i. Then half of e^(it) + e^(-it) gives the solution y = cos t.

As a first order system, the initial values y(0) = 1, y'(0) = 0 go into u(0) = (1, 0):

   Use y'' = -y     du/dt = [0 1; -1 0] [y; y'] = Au.

The eigenvalues of A are again the same λ = i and λ = -i (no surprise). A is antisymmetric with eigenvectors x1 = (1, i) and x2 = (1, -i). The combination that matches u(0) = (1, 0) is (1/2)(x1 + x2). Step 2 multiplies the x's by e^(it) and e^(-it).

Step 3 combines the pure oscillations e^(it) and e^(-it) to find y = cos t as expected:

   u(t) = (1/2) e^(it) x1 + (1/2) e^(-it) x2 = [cos t; -sin t].

All good. The vector u = (cos t, -sin t) goes around a circle because cos²t + sin²t = 1. The radius is 1. The period is 2π when u completes a circle.

Figure 6.3: The exact solution u = (cos t, -sin t) stays on a circle. Forward differences Y1, Y2, . . . spiral out from the circle in 32 steps.

Difference Equations

To display a circle on a screen, replace y'' = -y by a difference equation. Here are three choices using Y(t + Δt) - 2Y(t) + Y(t - Δt). Divide by (Δt)² to approximate d²y/dt²:

   F  Forward from n-1      (Y_{n+1} - 2Y_n + Y_{n-1}) / (Δt)² = -Y_{n-1}    (11F)
   C  Centered at time n    (Y_{n+1} - 2Y_n + Y_{n-1}) / (Δt)² = -Y_n        (11C)
   B  Backward from n+1     (Y_{n+1} - 2Y_n + Y_{n-1}) / (Δt)² = -Y_{n+1}    (11B)

Figure 6.3 shows the exact y(t) = cos t completing a circle at t = 2π. The three difference methods don't complete a perfect circle in 32 time steps of length Δt = 2π/32. The spirals in those pictures will be explained by the eigenvalues λ for 11F, 11B, 11C:

   Forward |λ| > 1 (spiral out)    Centered |λ| = 1 (best)    Backward |λ| < 1 (spiral in)

The 2-step equations (11) reduce to 1-step systems U_{n+1} = A U_n. Instead of u = (y, y') the unknown is U_n = (Y_n, Z_n). We take n time steps of size Δt starting from U_0:

   Forward (11F)   Y_{n+1} = Y_n + Δt Z_n
                   Z_{n+1} = Z_n - Δt Y_n     becomes   U_{n+1} = [1 Δt; -Δt 1] U_n.    (12)

Those are like Y' = Z and Z' = -Y. They are first order equations. Now we have a matrix. Eliminating Z would bring back the second order equation (11F).
My question is simple: Do the points (Y_n, Z_n) stay on the circle Y² + Z² = 1? No, they are growing to infinity in Figure 6.3. We are taking powers A^n and not e^(At), so we test the magnitude |λ| and not the real part of λ. The key is the eigenvalues!

   Eigenvalues of A    λ = 1 ± iΔt.   Then |λ| > 1 and (Y_n, Z_n) spirals out.

The backward choice in (11B) will do the opposite in Figure 6.4. Notice the new A:

   Backward   Y_{n+1} = Y_n + Δt Z_{n+1}
              Z_{n+1} = Z_n - Δt Y_{n+1}    is    [1 -Δt; Δt 1] [Y_{n+1}; Z_{n+1}] = [Y_n; Z_n] = U_n.    (13)

That matrix has eigenvalues 1 ± iΔt. But we invert it to reach U_{n+1} from U_n. Then |λ| < 1 explains why the solution spirals in to (0, 0) for backward differences.

Figure 6.4: Backward differences (11B) spiral in. Leapfrog (11C) stays near the circle.

On the right side of Figure 6.4 you see 32 steps with the centered choice. The solution stays close to the circle (Problem 28) if Δt < 2. This is the leapfrog method, constantly used. The second difference Y_{n+1} - 2Y_n + Y_{n-1} "leaps over" the center value Y_n in (11C). This is the way a chemist follows the motion of molecules (molecular dynamics leads to giant computations). Computational science is lively because one differential equation can be replaced by many difference equations, some unstable, some stable, some neutral. Problem 26 has a fourth (very good) method that exactly completes the circle.

Real engineering and real physics deal with systems (not just a single mass at one point). The unknown y is a vector. The coefficient of y'' is a mass matrix M, with n masses. The coefficient of y is a stiffness matrix K, not a number k. The coefficient of y' is a damping matrix which might be zero. The vector equation My'' + Ky = f is a major part of computational mechanics.
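A small MATLAB sketch of the forward and backward steps in (12) and (13), taking 32 steps of size Δt = 2π/32 from (1, 0); the forward points drift outside the unit circle and the backward points drift inside:

    dt = 2*pi/32;  N = 32;
    F = [1 dt; -dt 1];                 % forward step (12)
    B = inv([1 -dt; dt 1]);            % backward step: invert the matrix in (13)
    Uf = [1; 0];  Ub = [1; 0];
    for n = 1:N
        Uf = F*Uf;  Ub = B*Ub;
    end
    [norm(Uf)  norm(Ub)]               % forward radius > 1, backward radius < 1
    abs(eig(F))                        % both eigenvalues of F have |lambda| = sqrt(1 + dt^2) > 1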
Stability of 2 by 2 Matrices

For the solution of du/dt = Au, there is a fundamental question. Does the solution approach u = 0 as t → ∞? Is the problem stable, as the energy dies out? A solution that includes e^t is unstable. Stability depends on the eigenvalues of A.

The complete solution u(t) is built from pure solutions e^(λt) x. We know exactly when e^(λt) will approach zero: the number λ must be negative. If the eigenvalue is a complex number λ = r + is, the real part r must be negative. When e^(λt) splits into e^(rt) e^(ist), the factor e^(ist) has absolute value fixed at 1:

   e^(ist) = cos st + i sin st   has   |e^(ist)|² = cos² st + sin² st = 1.

Then |e^(λt)| = e^(rt) controls the growth (r > 0) or the decay (r < 0). The question is: Which matrices have negative real parts for all their eigenvalues?

A is stable and u(t) → 0 when all eigenvalues λ of A have negative real parts. The 2 by 2 matrix A = [a b; c d] must pass two tests:

   λ1 + λ2 < 0    The trace T = a + d must be negative.          (14T)
   λ1 λ2 > 0      The determinant D = ad - bc must be positive.  (14D)

Reason   If the λ's are real and negative, their sum is negative. This is the trace T = a + d. Their product is positive. This is the determinant D. The argument also goes in reverse. If D = λ1 λ2 is positive, then λ1 and λ2 have the same sign. If T = λ1 + λ2 is negative, that sign will be negative.

If the λ's are complex numbers, they must have the form r + is and r - is. Otherwise T and D will not be real. The determinant D is automatically positive, since (r + is)(r - is) = r² + s². The trace T is r + is + r - is = 2r. So a negative trace T means that r < 0 and the matrix is stable. The two tests in (14) are correct.
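A two-line MATLAB check of the tests (14T) and (14D); the sample matrix is an illustrative choice, not from the text:

    A = [-2 3; -1 -1];                        % sample matrix
    stable = (trace(A) < 0) && (det(A) > 0)   % true exactly when both tests pass
    real(eig(A))                              % both real parts are negative (-1.5 and -1.5)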
The Exponential of a Matrix

We want to write the solution u(t) in a new form e^(At) u(0). First we have to say what e^(At) means, with a matrix in the exponent. To define e^(At) for matrices, we copy e^x for numbers.

The direct definition of e^x is by the infinite series 1 + x + (1/2)x² + (1/6)x³ + · · ·. When you change x to a square matrix At, this infinite series defines the matrix exponential e^(At):

   Matrix exponential    e^(At) = I + At + (1/2)(At)² + (1/6)(At)³ + · · ·    (15)
   Its t derivative is Ae^(At):    A + A²t + (1/2)A³t² + · · · = Ae^(At)
   Its eigenvalues are e^(λt):     (I + At + (1/2)(At)² + · · ·) x = (1 + λt + (1/2)(λt)² + · · ·) x

The series converges and its derivative is always Ae^(At). Therefore e^(At) u(0) solves the differential equation with one quick formula, even if there is a shortage of eigenvectors.

This chapter emphasizes how to find u(t) = e^(At) u(0) by diagonalization. Assume A does have n independent eigenvectors, so we can substitute A = XΛX^(-1) into the series for e^(At). Whenever XΛX^(-1) XΛX^(-1) appears, cancel X^(-1) X in the middle:

   Use the series             e^(At) = I + XΛX^(-1) t + (1/2)(XΛX^(-1) t)(XΛX^(-1) t) + · · ·
   Factor out X and X^(-1)           = X [I + Λt + (1/2)(Λt)² + · · ·] X^(-1)
   e^(At) is diagonalized!    e^(At) = X e^(Λt) X^(-1)

e^(At) has the same eigenvector matrix X as A. Then Λ is a diagonal matrix and so is e^(Λt). The numbers e^(λi t) are on the diagonal. Multiply X e^(Λt) X^(-1) u(0) to recognize u(t): X^(-1) u(0) produces the coefficients c, then e^(Λt) multiplies each c_k by its growth factor e^(λk t), and X combines the pure solutions. This solution e^(At) u(0) is the same answer that came in equation (6) from three steps.

Example 4   Use the infinite series to find e^(At) for A = [0 1; -1 0]. Notice that A⁴ = I; then A⁵, A⁶, A⁷, A⁸ will be a repeat of A, A², A³, A⁴. The top right corner has 1, 0, -1, 0 repeating over and over in powers of A. Then t - (1/6)t³ starts the infinite series for e^(At) in that top right corner, and 1 - (1/2)t² starts the top left corner:

   e^(At) = I + At + (1/2)(At)² + (1/6)(At)³ + · · · = [1 - (1/2)t² + · · ·   t - (1/6)t³ + · · · ;  -t + (1/6)t³ - · · ·   1 - (1/2)t² + · · ·]

That matrix e^(At) shows the infinite series for cos t and sin t!

   e^(At) = [cos t   sin t; -sin t   cos t]    (18)
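MATLAB's expm sums that series for us. This quick sketch confirms the rotation matrix in (18) and the fact that it is orthogonal (the value t = 1.3 is an arbitrary choice):

    A = [0 1; -1 0];  t = 1.3;
    E = expm(A*t);                                 % the matrix exponential e^(At)
    norm(E - [cos(t) sin(t); -sin(t) cos(t)])      % matches (18): near zero
    norm(E'*E - eye(2))                            % E is orthogonal: E'E = I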
A = [0 1; -1 0] is an antisymmetric matrix (A^T = -A). Its exponential e^(At) is an orthogonal matrix. The eigenvalues of A are i and -i. The eigenvalues of e^(At) are e^(it) and e^(-it):

1. The inverse of e^(At) is always e^(-At).
2. The eigenvalues of e^(At) are always e^(λt).
3. When A is antisymmetric, e^(At) is orthogonal. Inverse = transpose = e^(-At).

"Antisymmetric" is the same as "skew-symmetric". Those matrices have pure imaginary eigenvalues like i and -i. Then e^(At) has eigenvalues like e^(it) and e^(-it). Their absolute value is 1: neutral stability, pure oscillation, energy is conserved. So ||u(t)|| = ||u(0)||.

If A is triangular, the eigenvector matrix X is also triangular. So are X^(-1) and e^(At). The solution u(t) is a combination of eigenvectors. Its short form is e^(At) u(0).

Example 5   Solve du/dt = Au = [1 1; 0 2] u starting from u(0) at t = 0.

Solution   The eigenvalues 1 and 2 are on the diagonal of A (since A is triangular). The eigenvectors are (1, 0) and (1, 1). Then e^(At) produces u(t) for every u(0):

   u(t) = X e^(Λt) X^(-1) u(0) = [1 1; 0 1] [e^t 0; 0 e^(2t)] [1 -1; 0 1] u(0) = [e^t   e^(2t) - e^t; 0   e^(2t)] u(0).

That last matrix is e^(At). It is nice because A is triangular. The situation is the same as for Ax = b and inverses. We don't need A^(-1) to find x, and we don't need e^(At) to solve du/dt = Au. But as quick formulas for the answers, A^(-1) b and e^(At) u(0) are unbeatable.

Example 6   Solve y'' + 4y' + 3y = 0 by substituting e^(λt) and also by linear algebra.

Solution   Substituting y = e^(λt) yields (λ² + 4λ + 3) e^(λt) = 0. That quadratic factors into λ² + 4λ + 3 = (λ + 1)(λ + 3) = 0. Therefore λ1 = -1 and λ2 = -3. The pure solutions are y1 = e^(-t) and y2 = e^(-3t). The complete solution y = c1 y1 + c2 y2 approaches zero.

To use linear algebra we set u = (y, y'). Then the vector equation is u' = Au:

   dy/dt = y'
   dy'/dt = -3y - 4y'    converts to    du/dt = [0 1; -3 -4] u.

This A is a "companion matrix" and its eigenvalues are again λ1 = -1 and λ2 = -3:

   Same quadratic    det(A - λI) = det [-λ 1; -3 -4-λ] = λ² + 4λ + 3 = 0.

The eigenvectors of A are (1, λ1) and (1, λ2). Either way, the decay in y(t) comes from e^(-t) and e^(-3t). With constant coefficients, calculus leads to linear algebra Ax = λx.
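Looking back at Example 5, the formula for e^(At) is easy to verify (a MATLAB sketch; t = 0.7 is an arbitrary choice):

    A = [1 1; 0 2];  t = 0.7;
    X = [1 1; 0 1];                                % eigenvectors of A in its columns
    E1 = X * diag([exp(t) exp(2*t)]) / X;          % X e^(Lambda t) X^(-1)
    E2 = [exp(t)  exp(2*t)-exp(t); 0  exp(2*t)];   % the formula found in Example 5
    norm(E1 - expm(A*t)) + norm(E2 - expm(A*t))    % both differences are near zero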
Heat diffusion and waves   For three boxes in a row, with the outside held frozen at 0°, heat flows between neighbors at a rate proportional to the temperature difference. From box 2, heat flows left and right at the rates u2 - u1 and u2 - u3, and boxes 1 and 3 also lose heat to the frozen outside. The rate equation is du/dt = Au with the symmetric matrix

   A = [-2 1 0; 1 -2 1; 0 1 -2].

The eigenvectors are orthogonal (proved in Section 6.3 for all symmetric matrices): x1 = (1, √2, 1), x2 = (1, 0, -1), x3 = (1, -√2, 1). All three eigenvalues λ = -2 + √2, -2, -2 - √2 are negative. This A is negative definite and e^(At) decays to zero (stability). The starting temperature u(0) = (0, 2√2, 0) is x1 - x3. The solution is u(t) = e^(λ1 t) x1 - e^(λ3 t) x3. The temperature at the center starts at 2√2 and heat diffuses away from box 2.

But now the eigenvalues λ lead to oscillations when the same A enters the wave equation u'' = Au. The frequencies ω come from ω² = -λ:

   d²/dt² (e^(iωt) x) = A (e^(iωt) x)   becomes   -ω² x = λx   and   ω² = -λ.

Each of the three eigenvectors x gives two solutions e^(iωt) x and e^(-iωt) x. A combination of those six solutions will match the six components of u(0) and u'(0).

Figure 6.5: Heat diffuses away from box 2 (left). Waves travel from box 2 (right).

Example 7   Solve y'' - 2y' + y = 0 by e^(λt). Substituting gives (λ - 1)² e^(λt) = 0 with λ = 1, 1. A differential equations course would propose e^t and te^t as two independent solutions. Here we discover why. Linear algebra reduces y'' - 2y' + y = 0 to a vector equation for u = (y, y'):

   du/dt = [0 1; -1 2] u = Au.

A has a repeated eigenvalue λ = 1, 1 (with trace = 2 and det A = 1). The only eigenvectors are multiples of x = (1, 1). Diagonalization is not possible. This matrix A has only one line of eigenvectors. So we compute e^(At) from its definition as a series:

   Short series    e^(At) = e^(It) e^((A-I)t) = e^t [I + (A - I)t].    (21)

That "infinite" series for e^((A-I)t) ended quickly because (A - I)² is the zero matrix. You can see te^t in equation (21). The first component of e^(At) u(0) is our answer y(t):

   [y; y'] = e^t [I + [-1 1; -1 1] t] [y(0); y'(0)]   gives   y(t) = e^t y(0) - te^t y(0) + te^t y'(0).
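The short series (21) is easy to check in MATLAB (a sketch; t = 0.9 is an arbitrary choice):

    A = [0 1; -1 2];  t = 0.9;                     % repeated eigenvalue lambda = 1, 1
    E1 = exp(t) * (eye(2) + (A - eye(2))*t);       % e^t [ I + (A - I) t ] from (21)
    norm(E1 - expm(A*t))                           % near zero: (A - I)^2 = 0 ends the series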
In linear algebra the serious danger is a repeated eigenvalue: the eigenvectors (1, λ1) and (1, λ2) are the same if λ1 = λ2. Then our eigenvectors don't diagonalize A, and we don't yet have two independent solutions to du/dt = Au. In differential equations the danger is also a repeated λ. After e^(λt) x, a second solution has to be found. It turns out to be u = te^(λt) x.

Problem Set 6.4

1  Find two λ's and x's so that u = e^(λt) x solves

      du/dt = [4 3; 0 1] u.

   What combination u = c1 e^(λ1 t) x1 + c2 e^(λ2 t) x2 starts from u(0) = (5, -2)?

2  Solve Problem 1 for u = (y, z) by back substitution, z before y:

      Solve dz/dt = z from z(0) = -2. Then solve dy/dt = 4y + 3z from y(0) = 5.

   The solution for y will be a combination of e^(4t) and e^t. The λ's are 4 and 1.

3  (a) If every column of A adds to zero, why is λ = 0 an eigenvalue?
   (b) With negative diagonal and positive off-diagonal adding to zero, u' = Au will be a "continuous" Markov equation. Find the eigenvalues and eigenvectors, and the steady state as t → ∞. Solve du/dt = [-2 3; 2 -3] u with u(0) = (4, 1). What is u(∞)?

4  A door is opened between rooms that hold v(0) = 30 people and w(0) = 10 people. The movement between rooms is proportional to the difference v - w:

      dv/dt = w - v   and   dw/dt = v - w.

   Show that the total v + w is constant (40 people). Find the matrix in du/dt = Au and its eigenvalues and eigenvectors. What are v and w at t = 1 and t = ∞?

5  Reverse the diffusion of people in Problem 4 to du/dt = -Au:

      dv/dt = v - w   and   dw/dt = w - v.

   The total v + w still remains constant. How are the λ's changed now that A is changed to -A? But show that v(t) grows to infinity from v(0) = 30.

6  A = [a 1; 1 a] has real eigenvalues but B = [b 1; -1 b] has complex eigenvalues. Find the conditions on a and b (real) so that all solutions of du/dt = Au and dv/dt = Bv approach zero as t → ∞: Re λ < 0 for all eigenvalues.
Chapter 6. Eigenvalues and Ei, 254 8 9 Suppose P is the projection matrix onto the 45° line у = j jn rz eigenvalues? If du/dt = ~Pu (notice minus sign) can you find the limit f are ‘U t — oc slatting from u(0) = (3,1)? ° **(t) at The rabbit population shows fast growth (from 6r) but loss to wolves (f The wolf population always grows in this model (-tr2 would control и Г*”1 '2*") wt’IVcs). dr . dw — « 6r - 2r and — = 2r + и,. Find the eigenvalues and eigenvectors. If r(0) = «(()) = 30 what are the at time t? After a long time, what is the ratio of rabbits to wolves? ° P°pulal'«ns (a) Write (4.0) as a combination c,», + CjXj of these two eigenvectors of (b) The solutioo to du/dt Ли starting from (4.0) is + cye~ttx stituter*1 = cost + isint and e*'1 “ аж/ — tsinf to find u(t). 2 U*>‘ 10 Find Л to change f - 5/ + 4V into a vector equation for и - (у. y>); What are the eigenvalues <* Л? F,nd lhcm a,*° from “ bv> + 4v wi,h V = «*. 11 The solution io g” = 0 is a straight line у - C + Dt. Convert to a matrix equation: — f * = ° 1 * has the solution dt (VJ [0 0J l/J V = ,At v(0) И k(0). This matrix Л has A > 0.0 and it cannot be diagonalized. Find A2 and compute ел‘ = 1 + At + pl/a + •••. Multiply your e* times (y(0).1/(0)) to check the straight line y(t) = y(0) + /(0)t. 12 Substitute у = e* into y" = 6y* - 9y to show that A = 3 is a repeated root. This is trouble; we need a second solution after r3*- The matrix equation is d у _ 0 1 у dt И = Г» б] Show that this matnx has А ж 3.3 and only one line of eigenvectors. Trouble here too. Check that у = Ic31 is a second solution lo g" = 6j/ - 9g.
е 4 Sy»‘ems °*l>l,lCTtn,lal Циа1н>т 255 (a) Write down two familiar functions ' Иорл/л * (W . л.: Find u(t) by using the eigenvalues and eigenvectors of A: u(0) = (3,0). 14 The matrix in this question is skew-symmetric (Ат = -Ay. du 0 c ~b < cu> - bu3 dt “ ° ° ** °* “з “ aui ~ ™i . • ~° OJ t^ = bu|-eu2. (a) Tbe derivative of ||u(t)||2 . u’+u’+u? is 2uiu',+2u2«'2+2u1ui Substitute u'„ Uj, Uj ‘° 8е’ »« Then ||u(t)||2 stays equal to |u(0)||2. (b) A1 — -A maket Q = tAl orthogonal. Prove QT «e“^* from tbe senes fix Q. A particular solution to du/dt = Au - b is u, - A ~1 b. if A is invertible. Thc usual solutions to du/dt = Au give u„ Find the complete solution и = u,+ u,: Questions 16-25 are about lhe matrix exponential eAt. 16 Write five terms of the infinite series for e'*1. Take thc I derivative of each term. Show that you have four terms of AtM. Conclusion: e4,Uo solves u' = Au. 17 The matrix В “ [J ~J] has B2 = 0. Find r®’ from a (short) infinite series. Check that the derivative of e®‘ is Be®. 18 Starting from u(0) lhe solution at time T is eAI u(0). Go an additional time t to reach eAl eAT u(0). This solution at time t + T can also be written as ---------------. Conclusion: eAt times eAT equals________. 19 If A2 = A show that the infinite series produces c 41 =/ + («*- 1)A. 20 Generally eAeB / eBeA. They are both different from tA + ®. Check this for 21 Put A = [ J § ] into the infinite series to find tM. First compute A* and A":
256 Chapter 6. Eigenvalues and Eigci 22 23 24 (Recommended l Give two reasons why the matrix exponential eAt js ncv (a) Write down its inverse. (b) Why are its eigenvalues eXl nonzero? Г *’Пви,«г: find a solution x(t). y(t) thal gets large as t -f oo. To avoid this instabilit v exchanged the two equations to get A < 0. How can this be ? a ^'entist Кн.! - 2У. + Г.-i = -(А/)2Г. can be written as a one-step difference eqUatj Уя+| = Г« + Д/ Z. Г 1 01Г r.+11 r 1 Д/ 1 r Z.^i = Z. - Af r.+t [ Af 1 J [Ze+I J = 0 1 ? J I J Invert the matrix oo the left side to write this as 1/я4] ж AU,. Show that I Choose the large time step A/ = 1 and find the eigenvalues A< and A _ T * 1. 2 ® At of 4. 4= j * has|AI| = |AJ| = l.Showthat4e-JsoueeUoexactJy 25 That leapfrog method in Problem 24 is very successful for small time But find the eigenvalues of A for Af s/2 and 2. Any time step Д/ V*? A/' lead to |A| > I. and lhe powers int/„ A"U0 will explode. 2 W'H A 1 Л -Л -1J and A = borderline unstable 26 A very good idea for / - -Jf » «he trapezoidal method (half forward/half back). Thu may be lhe belt way lo keep (Y„,Zn) exactly on a circle. _ . f 1 1 [ K-+1 1 - ( ’ Д‘/2 1 [ Г" 1 Trapezoidal [ д,/2 , J [ 2Я>1 ] [ -At/2 1 J [ Z„ J ♦ (a) Invert the left matrix lo wnie this equation as l/„+i = AU,. Show that A it ал orthogonal matrix: A1 A I. These points U„ never leave the circle. A « (/ - В)"'(/ + B) is always an orthogonal matrix if BT ж -B. (b) (Optional MATLAB) Take 32 steps from l/0 (1.0)tof732 with At = 2ir/32. Is Un = Ue11 think there is a small error. 27 Explain one of these three proofs that lhe square of eA is c2A. I. Solving with r4 from t = 0 to 1 and then 1 lo 2 agrees with e2A from 0 to 2. 2. The squared series (I + A + + • • • )2 matches I + 2A + + • • • = e2A. 3. If A can be diagonalized then (XeA№,)(XeA№1) = Л>2ЛХ-1.
Notes on a Differential Equations Course

Constant coefficient linear equations are the simplest to solve. This Section 6.4 shows you part of a differential equations course, but there is more.

1. The second order equation mu'' + bu' + ku = 0 has major importance in applications. The exponents λ in the solutions u = e^(λt) solve mλ² + bλ + k = 0:

      Underdamping b² < 4mk    Critical damping b² = 4mk    Overdamping b² > 4mk

   This decides whether λ1 and λ2 are real or repeated or complex. With complex λ = r + iω the solution u(t) oscillates from e^(iωt) as it decays from e^(rt).

2. Our equations had no forcing term f(t). We were finding the "nullspace solution" u_n(t). To u_n(t) we need to add a particular solution u_p(t) that balances the force f(t). This solution can also be discovered and studied by Laplace transform:

      Input f(s) at time s    Growth factor e^(λ(t-s))    Add up outputs at time t.

3. In real applications, nonlinear differential equations are solved numerically. A method with good accuracy is Runge-Kutta. The constant solutions to du/dt = f(u) are u(t) = Y with f(Y) = 0 and du/dt = 0: no movement. Far from Y, the computer takes over.

This basic course is the subject of my textbook (a companion to this one) on Differential Equations and Linear Algebra: math.mit.edu/dela. Video lectures for that book are described there, and a parallel series about numerical solutions was prepared by Cleve Moler:

      ocw.mit.edu/resources/res-18-009-learn-differential-equations-up-close-with-gilbert-strang-and-cleve-moler-fall-2015/
      www.mathworks.com/academia/courseware/learn-differential-equations
7  The Singular Value Decomposition (SVD)

   7.1  Singular Values and Singular Vectors
   7.2  Compressing Images by the SVD
   7.3  Principal Component Analysis
   7.4  The Victory of Orthogonality (and a Revolution)

This chapter develops one idea. That idea applies to every matrix, square or rectangular. It is an extension of eigenvectors, and now we need two sets of vectors: input vectors v1 to vn and output vectors u1 to um. This is completely natural for an m by n matrix. The input vectors v1 to vr are in the row space and u1 to ur are in the column space. Then the matrix A splits into pieces of rank one, with r = rank(A):

   SVD    A = UΣV^T = σ1 u1 v1^T + σ2 u2 v2^T + · · · + σr ur vr^T.

The singular vectors v1 to vr are eigenvectors of A^T A. They give an orthonormal basis for the row space, and the u's give an orthonormal basis for the column space. The matrix A is diagonalized by these two bases: AV = UΣ. Each A vi = σi ui.

When A is a symmetric positive definite matrix, those singular vectors will be the eigenvectors q, and the singular values σi become the eigenvalues of A. If A is not square or not symmetric, then A = UΣV^T replaces S = QΛQ^T.

The SVD is a valuable way to understand a matrix of data. In that case AA^T is the sample covariance matrix, after centering the data and dividing by n - 1 (Section 7.3). Its eigenvalues are σ1² to σr². Its eigenvectors are the u's in the SVD.

Principal Component Analysis (PCA) is totally based on the singular vectors of the data matrix A. The SVD allows wonderful projects, by separating a photograph, a matrix of pixels, into its rank-one components. Each time you include one more piece σi ui vi^T, the picture becomes clearer. Section 7.2 shows examples and a link to an excellent website. Section 7.3 describes PCA and its connection to the covariance matrix in statistics. Section 7.4 shows how it all develops from and depends on one idea: orthogonality.
7.1  Singular Values and Singular Vectors

1  The Singular Value Decomposition of any matrix A is AV = UΣ, that is, A = UΣV^T.
2  The singular vectors in A vi = σi ui are orthonormal: V^T V = I and U^T U = I.
3  The diagonal matrix Σ contains the singular values σ1 ≥ σ2 ≥ · · · ≥ σr > 0.
4  The squares of those singular values are eigenvalues of A^T A and AA^T.

The m by n matrix A has two sets of singular vectors: one set of orthonormal vectors v1, . . . , vn in R^n and a second set u1, . . . , um in R^m. Instead of Sx = λx we will have A v = σ u.

Here is a 2 by 2 unsymmetric example with orthogonal inputs and orthogonal outputs:

   A v1 = [3 0; 4 5] [1; 1] = [3; 9]   and   A v2 = [3 0; 4 5] [-1; 1] = [-3; 1].    (1)

(1, 1) is orthogonal to (-1, 1). And (3, 9) is orthogonal to (-3, 1). Those are not unit vectors but that is easily fixed. The inputs (1, 1) and (-1, 1) need to be divided by √2. The outputs need to be divided by √10. That leaves the singular values 3√5 and √5:

   A v1 = σ1 u1    [3 0; 4 5] (1/√2) [1; 1] = 3√5 · (1/√10) [1; 3]
   A v2 = σ2 u2    [3 0; 4 5] (1/√2) [-1; 1] = √5 · (1/√10) [-3; 1]    (2)

Multiply the singular values σ1 = 3√5 and σ2 = √5 to get σ1 σ2 = 15 = det A.

We can move from vector formulas to a matrix formula. SQ = QΛ becomes AV = UΣ:

   AV = [3 0; 4 5] (1/√2) [1 -1; 1 1] = (1/√10) [1 -3; 3 1] [3√5 0; 0 √5] = UΣ.    (3)

V and U are orthogonal matrices. So if we multiply equation (3) by V^T, A will be alone:

   AV = UΣ   becomes   A = UΣV^T = σ1 u1 v1^T + σ2 u2 v2^T.    (4)

This says everything except how to find V and U and Σ, and what they mean. When equation (4) multiplies v1, orthogonality produces A v1 = σ1 u1.

Key point: Every matrix A is diagonalized by two sets of singular vectors, not one set of eigenvectors.

In the 2 by 2 example, the first piece is more important than the second piece because σ1 = 3√5 is greater than σ2 = √5. To recover A, add the pieces σ1 u1 v1^T + σ2 u2 v2^T:

   (3√5/√20) [1 1; 3 3] + (√5/√20) [3 -3; -1 1] = (1/2) [3 3; 9 9] + (1/2) [3 -3; -1 1] = [3 0; 4 5].
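MATLAB's svd command reproduces this example (a quick check; the signs of the computed singular vectors may differ from the text, which does not matter):

    A = [3 0; 4 5];
    [U, S, V] = svd(A);
    diag(S)'                          % 6.7082 and 2.2361, i.e. 3*sqrt(5) and sqrt(5)
    norm(A*V - U*S)                   % AV = U*Sigma: near zero
    norm(A - S(1,1)*U(:,1)*V(:,1)' - S(2,2)*U(:,2)*V(:,2)')   % the rank one pieces add back to A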
260 Chapter 7. The Singular Value Decomposition (SVD) The Reduced Form of the SVD Thai full form AV = l/E can have a lol of zeros in E when lhe rank of A is sntall and the nullspace of A is large. Those zero* contribute nothing to matrix multiplication. The heart of the SVD is in the first г v’s and u’s and rr’s. We can change AV = to AVr = l/rEr by removing the parts that are sure to produce zeros. This leaves the reduced SVD where Er is now square: (m x n) (n X r) = (m X r) (r x r). Reduced SVD AVr = l/rEK A V\ .. vr “ Ui .. Ur Av, = <r,u, fow spacc column space We still have VrT Vr — I, and Uj Ur = /, from those orthogonal unit vectors v’s and u’s. When Vr and Ur are not square, we can’t have full inverse*: Vr VrT / 1 and UT Uj / /, But A = l/rEr VrT is true. The other multiplications in A = BEVT give only zeros. Example? A=[l 2 2]-[1][3][1 2 2] /3 = l/rErVrT ha* r > 1 and rr, . 3. The rest of l/E VT contributes nothing to A, because of all the zeros in E. The key separation of A mto«7|U| v’ + ••• + o,u,»7 iloPs •*<»|U|»T because the rank is r 1. The Important Fact for Data Science Why is the SVD so important for thi* subject and this book ? Like the other factorizations A = LU and A = QR and S “ QAQT. il separates the matrix into rank one pieces. A special property of the SVD is that those piece* come in order of Importance. The first piece ff|U|»,r when at > a2 u the closest rank one matrix to A. More is true: For every k, the sum of the first к piece* is the rank к matrix that is closest to A. A* = <7|Ui»y + • • • + «raUfctiJ is the best rank к approximation to A “Eckart-Young" If В has rank к then 11A - A*|| < ||A — B||. (6) To interpret that statement you need lo know the meaning of the symbol ||A — B||. This is the "norm" of the matrix A — B. a measure of its size (like the absolute value of a number). The norm could be <T| or <t| + • • • + aT or the square root of tzj + • • • + <r,. Our first job is to discover how U and E and VT can be computed. For a small matrix they come from eigenvectors and eigenvalues of ATA and AAT. For a large matrix, multiplying A by AT is not wise. Two steps are much better: Reduce A to two nonzero diagonals and modify lhe QR algorithm that finds eigenvalues.
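Here is a short MATLAB sketch of the reduced SVD for the rank one example A = [1 2 2] above, using the economy-size option so that the zero parts of Σ never appear:

    A = [1 2 2];                      % rank r = 1, sigma_1 = 3
    [U, S, V] = svd(A, 'econ');       % U is 1 by 1, S is 1 by 1, V is 3 by 1
    S                                 % the single singular value 3
    norm(A - U*S*V')                  % near zero: A = U_r Sigma_r V_r'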
. I Singular Values and Singular Vectors 261 HrM Proof of the SVD The go»1 » A S Wc Wan‘'° “""У °* ,BO ** «< Mn8“lar vecKus, thc us and (hc v's. One way » «nd iho* vectors is lo form the symmetric matnces Лт A and AAT : ATA = (VETl/T) ((JEVT) = yj-Tj-yT P) AAT = (l/EVT) (VETl/7) . иЕЕТцт (S) th (7) and <*) P'oduccd *ynmetric matrices Usually ATA and АЛ1 are diilerent. Both °ht hand sides have thc special form QAQ1 Eigenvectors art In Q - V or Q = U. nB • know from (7) and (8) how V and U and E connect to those symmetric matrices S« wc . -T дТ A and AA • V contains orthonormal eigenvectors of AT A U contains orthonormal eigenvectors of AAT to art the nonzero eigenvalues of both AT A and AAT We are not quite finished, for this reason The SVD requires that Ao* - v*u*. It connects each right singular vector v* to a left singular vector «*. for к « I...r. When I choose the v’s, that choice will decide lhe signs of the u's. If Su Au then also S(-u) " A(-u) and I have lo know the sign to choose More than that, there is a whole plane of eigenvectors when A is a double eigenvalue. When I choose two v's in that plane, then Av « au will tell me both u’s. This is in equation (9). The plan is to start with the v's. Choose orthonormal eigenvectors vi..........v, of ATA. Then choose <r* = v^a- To determine the u’s we require that Av * <ru: v’s then u’s ATAv* = and then u* - — for к - 1 (9) This produces the SVD! Let me check that those vectors ut (10) A A u* ® Aa" i । — n । \ <r* / \ O* / <r* Thc v’s were chosen to be orthonormal. I must check that thc u’s are also orthonormal: т /Av^T/<^ = t^A7Av*) = a* T^= f 1 if7 = k (i|) Oj Ok O) * I ® >f J # was the key to equation (10). The law (AB)C = * in linear algebra. Moving the parentheses is a n;“”kTJ w Notice that (AAT)A = A(ATA) A(Z?C) is thc key to a great many proofs in powerful idea. This is the associative law.
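Equations (9) to (11) translate directly into MATLAB (a sketch using the 2 by 2 example of this section): choose the v's as eigenvectors of A^T A, set σ = √λ, and the vectors u = Av/σ come out orthonormal:

    A = [3 0; 4 5];
    [V, D] = eig(A'*A);
    [lam, idx] = sort(diag(D), 'descend');   % eigenvalues sigma^2, largest first
    V = V(:, idx);
    sigma = sqrt(lam);                       % 3*sqrt(5) and sqrt(5)
    U = [A*V(:,1)/sigma(1), A*V(:,2)/sigma(2)];
    norm(U'*U - eye(2))                      % the u's are orthonormal, as equation (11) proves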
262 Chapter 7. The Singular Value Decomposidcm (SVD) Finally we have to choose the last n — r vectors vr+t lo vn and the last rn — r vec- tors Ur+i to um. This is easy. These v’s and u’s are in the nullspaces of A and ДТ We can choose any orthonormal bases for those nullspaccs. They will automatically be orthogonal to the first v’s in the row space of A and the first u’s in the column space This is because the whole spaces are orthogonal: N(A) -L C(AT) and N(AT) j_ С(Д) The proof of the SVD is complete by that Fundamental Theorem of Linear Algebra Now we have U and V and E in the full sire SVD of equation (1): rn u’s, n v’s You may have noticed that the eigenvalues of ATA are in E1 E, and the same numbers Ox to a, are also eigenvalues of AAT in EET. An amazing fact: BA always has the same nonzero eigenvalues as AB. If В is invertible, BA = B(AB)B~* is similar to AB AB and BA: Equal Nonzero Eigenvalues If A is m by n and В is n by m. AB and BA have the same nonzero eigenvalues Start with ABx = Xx and A / 0. Multiply both sides by B. to get В ABx = АВж This says that Bx is an eigenvector of BA with the same eigenvalue A—exactly what we wanted. We needed A / 0 to be sure that Bx is truly a nonzero eigenvector. Notice that if В is square and invertible, then B_|(BA)B = AB. This says that BA is similar to AB: same eigenvalues. But our first proof allows A and В to be m by n and n by rn. This covers the important example of the SVD when В = AT. In that case AT A and AAT both lead to the r nonzero singular values of A. If m is larger than n. then AB has m — n extra zero eigenvalues compared to BA. Example 1 (completed) Find the matrices U and E and V for A = 3 0 1 4 5 J With rank 2. this A has two positive singular values ai and a3. We will sec that at is larger than Amax = 5, and a3 is smaller than A^,, = 3. Begin with AT A and AAT: Those have the same trace (50) and the same eigenvalues tr, = 45 and rr j = 5. The square roots are <7i = v/45 and o3 = -y/5. Then <7i<r2 = 15 and this is the determinant of A. The key step for V is to find the eigenvectors of ATA (with eigenvalues 45 and 5): [ 25 20 1 Г 1 1 _ Г 1 1 Г 25 20 1 Г -1 1 Г -1 1 I 20 25 J [ 1 J - 45 1 ] [ 20 25 J [ 1 J = 5 [ 1 J Then vi and v2 arc those orthogonal eigenvectors rescaled to length 1. Divide by \/2. Right singular vectors V] = Left singular vectors u, = — Oi
7 I. Singular Value, and Singular Vectors Now compute Av, and Av, which muy u 263 Av, Av'i 'io L i division by s/W makes ut and u2 orthonormal. Then O| = v« an a, expected. The Singular Value Decomposition of A is U times £ times VT. 1 -3 3 1 45 (12) .. ancj у contain orthonormal bases for the column space and the row space of A (both spaces are just R2). The real achievement is that those two bases diagonalize A: 1V equals l/E. The matrix A = UEV7 splits into two rank-one matrices, columns times rows, with x/2 v/10 = v 20. Their sun is Л with V5/v/26 = i ^Tio E = V = <72 Uj Every matrix is a sum of rank one matnees with orthogonal w's and orthogonal us. Orthogonal rows [ 1 1 I and [ 3 -3 ], orthogonal columns (1,3) and (3.-1). To say again: Good codes do not start with AT A and AAr. Instead we produce zeros in A by rotations that leave only two diagonals (and don't affect the singular values). The last page of this section describes a successful way to compute the SVD. Question: If 5 = QAQT is symmetric positive definite, what is its SVD ’ Answer: The SVD is exactly UY,V^ = QAQ1. The matrix U = V = Q is orthogonal. And the eigenvalue matrix Л becomes the singular value matrix £. Question: If S = QAQT has a negative eigenvalue (Sx = -ax), what is the singular value and what are the vectors v and tt ? Answer: The singular value will be a = +o (positive). One singular vector (either u or v) must be — x (reverse the sign). Then Sx = —ax is the same as Sv = <zu. The two sign changes cancel. Question: If A = Q is an orthogonal matrix, why does every singular value equal 1 ? Answer: All singular values are a = 1 because ATA = QrQ = 1- Tbw S — !• But 17 = Q and V = I is only one of many choices for the singular vectors u and v: Q = l/EVT can be Q = QUr <*•“* Q = (Wi
264 Chapter 7 Thc SinSul" Value Dcc°mposition (SVD) Question: Why are all eigenvalues of a square matrix A less than or equal to j Answer: Multiplying by orthogonal matrices U and V T does not change vector length,; ||Ax|| - ||f/EVTx|| = ||EVTx|| < <rt||VTx|| = «nllatll for all x. (|3) An eigenvector has ||Ax|| = |A| ||x||. Then (13) gives |A| ||x|| < <r( ||x|| and |Д| < Question; If A = zyT has rank 1. what are U| and »i andffi ? Check that |A,| < Answer: Thc singular vectors Ui = x/||x|| and Vi = у/ |y|I have length 1. Then «г, B ||x|| |; y|| is the only nonzero number in thc singular value matrix E. Here is thc SVD; 2 V1 Rank 1 matrix xyT = (||«|| llvll) - «1<И ef. Observation The only nonzero eigenvalue of A = xy r is A> «= yTx. Tbe eigenvector is x because (xyT)x = x(yTx) ” A|X. Then the key inequality |А, | < becomes exactly the Schwarz inequality |yTx| < ||x|| ||y 11 The Geometry of the SVD The SVD separates a matrix into Л = УEVT: (orthogonal) x (diagonal) x (orthogonal) In two dimensions we can draw those steps. The orthogonal matrices U and V rotate the plane. The diagonal matrix E stretches it along the axes Figure 7.1 shows rotation times stretching times rotation Vectors x on the unit circle go to Ax on an ellipse Figure 7.1: U and V are rotations and possible reflections. E stretches circle to ellipse. This picture applies to a 2 by 2 invertible matrix (because > 0 and rr2 > 0). First is a rotation of any x lo VTx Then E stretches that vector to EVTx. Then U rotates to Ax = t/EVTx. We kept all determinants positive to avoid reflections. Thc four numbers a, b, c, d in the matrix connect to two rotation angles 9, ф and two numbers oj, <z2 in E. ° ^ ] = [ сов® -тпв 1 Г Oj 1 Г сов* шпф 1 c d ] [а»пв cost? ] [ <7j J [ — sin^ сояф J * ' Question. If the matrix is symmetric then b = c. Now A has only 3 (not 4) parameters. How do tbe 4 numbers 9, ф. <Г|, a2 reduce to 3 numbers for a symmetric matrix? Question 2 If9 = 30°andol = 2 and a2 = 1 and ф = 60°. what is A?
The First Singular Vector v1

This page will establish a new way to look at σ1. The previous pages chose the v's as eigenvectors of A^T A. Certainly that remains true. But there is a valuable way to understand the singular vectors one at a time instead of all at once. We start with v1 and the largest singular value σ1.

   Maximize the ratio ||Ax|| / ||x||.   The maximum is σ1 at the vector x = v1.    (15)

The length of x comes from ||x||² = x1² + · · · + xn², and ||Ax||² is measured the same way.

The ellipse in Figure 7.1 showed why the maximizing x is v1. When you follow v1 across the page, it ends at A v1 = σ1 u1. The longer axis of the ellipse has length ||A v1|| = σ1.

But we aim for an independent approach to the SVD! We are not assuming that we already know U or Σ or V. How do we recognize that the ratio ||Ax|| / ||x|| is a maximum when x = v1? Calculus tells us that the first derivatives must be zero. The derivatives will be easier to compute if we square our function and work with S = A^T A:

   Problem: Find the maximum value λ of   ||Ax||² / ||x||² = x^T A^T A x / x^T x = x^T S x / x^T x.    (16)

This "Rayleigh quotient" depends on x1, . . . , xn. Calculus uses the quotient rule, so we need the partial derivatives of x^T x and x^T S x. They are 2x and 2Sx:

   ∂/∂xi (x^T x) = 2 xi    (17)
   ∂/∂xi (x^T S x) = 2 Σj Sij xj = 2 (Sx)i    (18)

Use the quotient rule for x^T S x / x^T x and set those n partial derivatives of (16) to zero:

   (x^T x) 2(Sx)i - (x^T S x) 2xi = 0   for i = 1, . . . , n.    (19)

Equation (19) says that (Sx)i = λ xi. The number λ is x^T S x / x^T x. Then Sx = λx, and the best x to maximize the ratio in (16) is an eigenvector of S!

   Sx = λx   and the maximum value of   ||Ax||² / ||x||² = x^T S x / x^T x   is an eigenvalue λ of S.

The search is narrowed to eigenvectors of S = A^T A. The eigenvector with the largest λ is x = v1. The eigenvalue is λ1 = σ1². Calculus has confirmed the solution (15) of the maximum problem. That problem has led to σ1 and v1 in the SVD.
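A quick numerical illustration (a sketch, again with the 2 by 2 example): random vectors never beat σ1, and x = v1 attains it:

    A = [3 0; 4 5];
    [U, S, V] = svd(A);
    r = zeros(1, 1000);
    for k = 1:1000
        x = randn(2, 1);
        r(k) = norm(A*x) / norm(x);     % the ratio in (15) for a random x
    end
    [max(r)  norm(A*V(:,1))  S(1,1)]    % sampled maximum <= sigma_1, attained at x = v_1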
For the full SVD, we need all the singular vectors and singular values. To find v2 and σ2, we adjust the maximum problem so it looks only at vectors x orthogonal to v1:

   Maximize ||Ax|| / ||x|| under the condition v1^T x = 0.   The maximum is σ2 at x = v2.

"Lagrange multipliers" were invented to deal with constraints on x like v1^T x = 0. And Problem 9 gives a simple direct way to work with this condition v1^T x = 0.

Every singular vector vk+1 gives the maximum ratio ||Ax|| / ||x|| over all vectors x that are perpendicular to the first v1, . . . , vk. The left singular vectors would come from maximizing ||A^T y|| / ||y||. We are always finding the axes of an ellipsoid and the eigenvectors of the symmetric matrices A^T A or AA^T: all at once or separately.

Computing Eigenvalues and Singular Values

There is a close parallel between computing the eigenvalues in the symmetric problem Sx = λx and computing the σ's and v's of A.

Eigenvalues are the same for S and Q^(-1) S Q = Q^T S Q when Q is orthogonal. So we have limited freedom to create zeros in Q^(-1) S Q (which stays symmetric). If we try for too many zeros in Q^(-1) S, the final Q will destroy them. The good Q^(-1) S Q will be tridiagonal: we can reduce S to three nonzero diagonals.

Singular values are the same for A and Q1^(-1) A Q2, even if Q1 is different from Q2. We have more freedom to create zeros in Q1^(-1) A Q2. With the right Q's, this will be bidiagonal (two nonzero diagonals). We can quickly find Q and Q1 and Q2 so that

   Q^(-1) S Q   = tridiagonal (entries a1, a2, . . . on the main diagonal, b1, b2, . . . beside it)   for the λ's
   Q1^(-1) A Q2 = bidiagonal (entries c1, c2, . . . on the main diagonal, d1, d2, . . . above it)     for the σ's    (20)

The reader will know that the singular values of A are the square roots of the eigenvalues of S = A^T A. And the singular values of Q1^(-1) A Q2 are the same as the singular values of A. Multiply (bidiagonal)^T (bidiagonal) to see tridiagonal.

This offers an option that we should not take. Don't multiply A^T A and find its eigenvalues. This is unnecessary work and the condition of the problem will be unnecessarily squared. The Golub-Kahan algorithm for the SVD works directly on A, in two steps:

1. Find Q1 and Q2 so that Q1^(-1) A Q2 is bidiagonal as in (20).
2. Adjust the shifted QR algorithm to preserve the singular values of this bidiagonal matrix.

Step 1 requires O(mn²) multiplications to put an m by n matrix A into bidiagonal form. Then later steps will work only with bidiagonal matrices. Normally it then takes O(n²) multiplications to find singular values (correct to nearly machine precision). The full algorithm is described in the 4th edition of Golub-Van Loan.

When A is truly large, we turn to random sampling. With very high probability, randomized linear algebra gives accurate results. Most gamblers would say that a good outcome from careful random sampling is certain.
7,|. Singular Value» and Singular Wctu»» problem Set 7.1 267 Find Ar A and A A1 and thc singular vector* г 0 I о 1 0 0 8 ООО ha* rank r . } The eigenvalue* are 0.0.0. Check the equation* zU| = tfiu, and .Ats, - If you remove row 3 of A (all zero*), show that "j«»j and A » niuiv[ e> and <ij don't change. + OjttjvJ. Find thc singular value* and also the eigenvalue* of Д; 0 10' 0 0 8 TuuB ° ° В = ha* rank r _ 3 and determinant —--. I (JOO Compared to A above. eigenvalue» have ch»,cd much more than singular value*. 3 T"* to U ‘ “ «*- 4** Transpose Д a (/EV to sec that A = VE l/T ,«* the opposite way. from n't to ATu* = <rbvfc fork = l............... . 0 ferfc = r + ,........m Multiply .‘tv* = <г*Щ by AT. Divide A1 Av* * njek in equation (9) by n*. Whal are thc u * and и * for thc transpose [3 4 ; 0 5] of our esample main*? 4 When Av* and ATu* »Mt. show that 5 ha* eigenvalue* i»» and --л*: S" [ ДТ о ] h»»«rn*«ton [ “‘J and | ] and tract - 0. The eigenvectors of this symmetric S tell tn the singular vector* of A. Find the eigenvalues and the singular values of this 2 by 2 rnatns A lhe eigenvectors (1,2) and (1, -2) of 4 are not orthogonal. How do you know the eigenvectors V|, Vj of ATA will be orthogonal? Nonce that ATA and AA1 hast thc same eigenvalues A> = 25 and Aj = 0. The two columns of А V - (7E are Art ’ ai«i and Avj = So hope that """"H ' JI * J I •> I “ l> 4I-.J —1-1J The first needs <Т| + 1 « e, and the second need* I - Arc those true? The MATLAB command* .4 « rand(20.401 and В = randn (20,40) produce 20 by 4(1 random matrices. The entries of .4 arc between 0 and I with uniform probability. Thc entries of В have a normal "bell-shaped" probability distribution, thing an svd command, find and graph their singular values to <гя. Why do they have 20 <r’s ?
268 В 9 10 11 12 13 14 15 16 17 Chapter 7. The Singular Value Оесотр<л|11()п (SVQj A symmetric matrix S = ATA has eigenvalues A. to A„and eigenvectors v, 10 Then any vector has the form x = и th + • • • + The Rayleigh quotient is x * Sx _ AiC| 4* * * * 4* Anc% Я(ж) = "x5®-= c? + -’- + c£ Which vector x gives the maximum of Я7 What are the numbers C| to c„ for lha( maximizing vector x 1 Which x gives lhe minimum of R1 To find a2 and v2 from maximizing that ratio Я(х), wc must rule out the first singu |ttr vectors V) by requiring XT«I = 0. What docs this mean for c, 7 Which c’s givc thc new maximum a2 at thc second eigenvector x = v2 "! Find ATA and thc singular vectors in Av> = <hUt and Av2 - o2u2: 2 2 . . [33 Л = [ -1 1 Ond A [ 4 4 ] 1 For this rectangular matrix find Vi,Va,«3 and U|,u3 and ffi,cra. Then write the SVD for A as UWr - (2 x 2)(2 x 3) (3 x 3). л f 1 1 °' A [ 0 1 1 lf(ATA)t> - ff’tt. multiply by A. Move the parentheses to get (AAT)Av «. 1Г v b an eigenvector of ATA. then___b an eigenvector of AAT. Find thc eigenvalues and unit eigenvectors v ।, va of ATA. Then find u, Avt/oi: Л- J 2] andATA-[jJ “] and AAT JJ . Verify that щ is a unit eigenvector of AA^. Complete thc matrices Г/, E, V. (a) Why is live trace of A1 A equal lo the sum of all 7 (b) For every rank-one matrix, why is af = sum of all o^7 If A t/EV1 is a square invertible matrix then A~l = , ______. Thc largest singular value of A*1 is therefore l/<7„,|n(A). The largest eigenvalue is l/|A(A)|mi„. Then equation (13) says that <zmin(A) < |Л(А)|тщ. Suppose A = U£Vr is 2 by 2 with > o2 > 0. Change A by as small a matrix as possible lo produce a singular matrix Ao. Hint: V and V do not change. Why doesn't thc SVD for A + / just use E + /?
7 X Compressing ,ma8« by the SVD 269 7 2 Compressing Images by the SVD 1 An image is a large matrix of gra>K4j 2 When nearby pixels are correlated (not rando(n) "" ₽’Wl J Hags often give simple images. Photograob « к. '°трГС"с4 '--------------------------------------<-an be compressed by the SVD Image processing and compression are тают conTZTT~~~~~~~ in1agcs often uses convolutional neural nets in decn 1^7'°* “** al*rtrx Reto8n ^present part of.be This section will beg.n with stylized images Uke fliM л Then we move to photographs with many more oixek t n *,lh со"Ч>1ги'У- ways to process and transmit signals The ши» к. re . “ ** wam eff,c’cnl «present light/dark and red/greetVblue fw every smJfl X? У * °’ The SVD offers one approach to matnx an»»;—-.' . . sum A of r rank one matrices o,u,WT c>n he геИ,ХеИ <> A by A*. The This section (plus online help) will consider the effect M"n Л* of fc ‘ennk oraphmograph. Section7 3 willexptorenw^exMtjto^whirtwencedioapprox^maic and understand a matnx of data. approximate Stan with flags. More than 30 countnes chore flags with three «npes Those flap have , particularly simple form: easy to compress I found a book called "Flags of the World" and the pictures range from one solid color (Libya*, flag was ent.rely green dunng the Gaddah years) to very completed images. How would those pictures be compressed with minimum loss? The linear algebra answer is: Use the SVD. Notice that 3 stripes still produce rank 1. France has blue-white-red vertical stripes and b w r in its columns. By coincidence lhe German flag is nearly its transpose with the coion Black-White-Red: bbwwrr bbw w rr bbwwrr bbwwrr = T 1 1 1 [bbwwrr] France В В В В В В' В В В В В В W IV IV И’ И’ IV IV IV IV IV IV IV = В в IV IV [111111] Germany bbwwrr 1 В R R R R R R bbwwrr t R R R R R R R Each matrix reduces to two vectors. To transmit those images we can replace № pixels by 2ЛГ pixels. Similarly. Italy is green-white-red and Iceland is green-white-orange. But many many countries make the problem infinitely more difficult by adding a small badge on top of those stripes. Japan has a red sun on a white background and the Maldives have an elegant white moon on a green rectangle oo a white rectangle. Those curved images have infinite rank—compression is still possible and necessary, but not to rank one.
270 Chapter 7. The Singular Value Decomposition (SVDj A few flags slay with finite rank but they add a cross to increase the rank. Here flags (Greece and Tonga) with rank 3 and rank 4. lw° I see four different rows in the Greek flag, but only three columns. Mistakenly, I thought lhe rank was 4. But now I think that row 2 + row 3 - row 1 and the rank of Greece is 3. On the other hand. Tonga’s flag does seem to have rank 4. Tbe left half has four rows: all while-short red-longer red-all red. We can’t produce any row from a linear combination of the other rows. The island kingdom of Tonga has the champion flag of finite rank I Singular Values with Diagonals Three countries have flags with only two diagonal lines Bahamas. Czech Republic, and Kuwait. Many countries have added in stars and multiple diagonals. From my book I can’t be sure whether Canada also has small curves. Il is interesting to find the SVD of this matrix with lower triangular I s—including the main diagonal—and upper triangular O s. Hag with a triangle 10 0 0 110 0 1110 1111 has A-1 « 1 -1 0 0 0 0‘ 1 oo -1 1 0 0-11 0 A has full rank r = AL All eigenvalues are 1. on the main diagonal. Then A has N singular values (all positive, but not equal to 1). The SVD will produce n pieces 0,14,0^ of rank one. Perfect reproduction needs all n pieces In compression lhe small o's can be discarded with no serious loss in image quality. We want to understand the singular values for n ж 4 and also to plot all a'l for large n. The graph on the next page will decide if A is compressed by the SVD. Working by hand, we begin with AAT (a computer would proceed differently): That -1,2,-1 inverse matrix is included because all its eigenvalues have thc form 2 — 2 cos в. We know those eigenvalues! So we know the singular values of A.
   Eigenvalues of AA^T    λ(AA^T) = 1 / (2 - 2 cos θ) = 1 / (4 sin²(θ/2))        Singular values of A    σ(A) = √λ = 1 / (2 sin(θ/2)).

The n different angles θ are equally spaced, which makes this example so exceptional:

   θ = π/(2n+1), 3π/(2n+1), . . . , (2n-1)π/(2n+1)     (n = 4 includes θ = 3π/9, with 2 sin(θ/2) = 1).

The important point is to graph the n singular values of A. They drop off (unlike the eigenvalues of A, which are all 1). But the dropoff is not steep. So the SVD gives only moderate compression of this triangular flag.

Figure 7.2: Singular values of the 40 by 40 triangle of 1's (it is not compressible). The evil Hilbert matrix H(i, j) = (i + j - 1)^(-1) has low effective rank: we must compress it.

The striking point about the graph is that the singular values for the triangular matrix never go below 1/2. Working with Alex Townsend, we have seen this phenomenon for 0-1 matrices with the 1's in other shapes (such as circular). This has not yet been explained.

Image Compression by the SVD

Compressing photographs and images is an exceptional way to see the SVD in action. The action comes by varying the number of rank one pieces σ u v^T in the display. By keeping more terms the image improves.

We hoped to find a website that would show this. By good fortune, Tim Baumann has achieved exactly what we hoped for, and we are grateful for his permission to use his work: https://timbaumann.info/svd-image-compression-demo/
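The computation behind such a demo is just the truncated SVD. Here is a sketch, with the 40 by 40 triangle of Figure 7.2 standing in for an image matrix (any grayscale image matrix could replace it):

    X = tril(ones(40));
    [U, S, V] = svd(X);
    k = 10;
    Xk = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';     % best rank k approximation (Eckart-Young)
    norm(X - Xk) / norm(X)                         % relative error = sigma_(k+1) / sigma_1
    semilogy(svd(X), 'o')                          % the slow dropoff seen in Figure 7.2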
[Two screenshots from the website. Left: the uncompressed image, 600 by 600 (360000 pixels), with the slider at 300 singular values; the compressed size is approximately proportional to 600 × 300 + 300 + 300 × 600 = 360300, a compression ratio of 360000/360300 = 1.00. Right: the same image with the slider at 20; the compressed size is approximately proportional to 600 × 20 + 20 + 20 × 600 = 24020, a compression ratio of 360000/24020 = 14.99. Each view offers a "Show singular values" option. Change the number of singular values using the slider. Click on one of the sample images to compress it, or compress your own images by using the file picker or by dropping them on the page.]
7 2. Compressing ln“₽* by the SVD This is onc of ,hc fivc ,ma8« directly available 273 compression rat». The best ratio depends on the °* °* *“« ttaerrnmes the Mondnan pamtmg arc less complex and allow higher^ “* girl and dK. or the cats. When the computatron of comp,^ Лап Ле city o, the tree we have 80 terms truv with vectors u andvtf d.m^^'fС‘’° * 80 + *• + «0 x 600. You can compress your own images by UMn. /Г* 4lda U *« “ *>• sample images provided on the site, or by droon.„L Z_T рккст" bw,o° below the six One beautiful feature of T.m Baumanns site в that й ’ Stunt results. This book's website malh.mit.edu/ev . “Perates in the browser, with in- please sec that edited site for questions and C*" mc,ude ,dci' from readers. '-,Jmfncnt\ and suggestions problem Set 7.2 1 We usually think that the identity matnx / is as um„L. difficult to compress? Create the matru fora rank ч л, “ ₽'AM*’k Bul wh> “ 1 cro$i ™ JJb/t»ith a hontonta/-vertical 2 These flags have rank 2. Write A and В in any way as .,»T + 12 1! Aswvdan — Apintand w 2 2 2 2 --[J i 11 .12 11 [ 1 3 3 J 3 Now find the trace and determinant of BBT and alio ВтВ in Problem 2. The singular values of В are close to erf = 28 - and - J,. Is В compressible or not? 4 Use [I/, S, V] « svd (A) to find two orthogonal pieces auvT of ASw^s.n 5 A matrix for the Japanese flag has a circle of ones surrounded by all zeros. Suppose the center line of lhe circle (the diameter) has 2.V ones. Then the circle will contain about ir№ ones. We think of the flag as a 1-0 matrix. Its rank will be approximately CN, proportional to What is that number C 2 Hint: Remove a big square submatnx of ones, with corners at ±45® and ± 135°. The rows above the square and the columns to the right of the square are independent Draw a picture and estimate lhe number cJV of those rows. Then C = 2c. 6 Here is one way to start with a function F(r, y) and construct a matrix A Set Aij = F(i/N,j/N). (The indices i and j go from 0 to A' or from -N lo /V.) The rank of A as JV increases should reflect the simplicity or complexity of F. Find the ranks of the matrices for the functions F( = ту and F2 = x + и and F3 = x3 + y3. Then find three singular values and singular vectors for F3. 7 In Problem 6. what conditions on F(x.y) will produce a symmetric matrix S? An antisymmetric matrix A ? A singular matrix Af ? A matrix of rank 2 ?
214 Chapter 7. The Singular Value Decomposition (SVD) 7.3 Principal Component Analysis The “principal components" of A are its singular vectors, thc orthogonal column» and Vj of the matrices U and V. This section aims to apply thc Singular Value Dccomposj' lion A = U'EV'*. Principal Component Analysis (PCA) uses the largest a’s connected to the first u's and v’t lo understand the information in a matrix of data. Wc are given a matrix A. and we extract its most important part Ak (largest tr’g). Лк - <7iUIe]’ + • • • + with rank (Afc) = *. solves a matrix optimization problem—and we start there The closest rank к matrix to A is A*. In statistics we are identifying the rank one pieces of A with largest variance This puts lhe SVD al the center of data science. In that world, PCA is “unsupervised" learning. Our only instructor is linear algebra— the SVD tells us to choose A*. When thc learning is supervised, wc have a big Kt of training data. Deep Learning constructs a (nonlinear!) function F to correctly classify most of that data. Then we apply this F to new data, as you will see in Chapter 8. Principal Component Analysis is based on matrix approximation by A*. The proof that A* is lhe best choice was begun by Schmidt (1907). He wrote about operators in function space; his ideas extend directly to matrices. Eckart and Young gave a new proof (using thc Frobenius norm to measure A - A*). Then Mirsky allowed any norm ||A|| that depends only on the singular values—as in the definitions (2), (3). and (4) below. Here is that key property of lhe special rank к matrix A* a। ut vf + • • • + vj. A* is closest to A If В has rank к then ЦЛ —Afc|| < ||A - B||. (|) Three choices for the matrix norm ||A|| have special importance and their own names: Spectral norm ||A|| = max« tri (often called the Z3 norm) (2) Frobenius norm ||A||r = v/a?+ •••+ (7) also defines ||A||jr (3) Nuclear norm ||A||/v = <7i + <7a + ••• + <7r (the trace norm) (4) These norms have different values already for thc n by n identity matrix: ll/lh-i l|/|k = n. (5) Replace I by any orthogonal matrix Q and the norms stay the same (because all at = 1): 1Ю111-1 H<?llF = t/n IIQII/V-n. (6) More than this, the spectral and Frobenius and nuclear norms of any matrix stay the same when A is multiplied (on either side) by an orthogonal matrix. So the norm of A =(/EV1 equals thc norm of £: ||A|| = ||S|| because I/and V are orthogonal matrices.
275 Norm of a Matrix Wc need a way to measure the size of a vector or. matnx . * is the usual length ||tr||. For. matnx, FmbeatnsexcILlT T*T’ S** H .x -K. in Л. ™. ........... 11,ll> = 4 + - + ^ 1М&-4>+-+<+...+4|1....+< Clearly |M 7 Iе! IMI- Similarly ||A||r > 0 and |lcA||r = lei ||A||r Equally essential is the tnanglc inequality for tr + w and A~+ fl: Triangle Inequalities ||v + w|| < ||v|| + ||w|| „j цл + дцг < цдц^ + (gj We use one more fact when we meet dot products vT w or matrix products AB : Schwarz inequalities |vTw| < ||tr||||w|| and ||АВ||Г < ||A||f||B|If W That Frobenius matrix inequality comes directly from the Schwarz vector inequality: |(AJ3)y|’ < Hrowiof A||2 Ucolumn j of B||2. Add for all 1 and) to see ||A0||^. This suggests that there could be a dot product of matrices. It is A • В trace! AT B). Note. The largest size |A| of the eigenvalues of A is nor on acceptable norm' We know that a nonzero matrix could have al) zero eigenvalues—but its norm ||A|| b not allowed to be zero. In this respect singular values are superior 10 eigenvalues The Eckart-Young Theorem The theorem was in equation (I): И В has rank к then ||A - A.|| < IIA - ВЦ- In all three norms ||A|| and ||A||F and ||A||a. we come closest to A by cutting off thc SVD after к terms. The closest matrix is A* оiat।t>] + • • • + • This to the fact to use in approximating A by a low rank matnxI We need an example and it can look extremely simple: a diagonal matnx A. The rank 2 matrix closest to A = 0 3 0 0 0 0 2 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ' 0 0 0 0 0 The difference A - A, is all zero except foe the z ano ь. ™ How could any other rank 2 matrix be closer io A than this Aj . Пез. »«"*9.T QiAQi. The norms and the rank are n 1ие.д 3 2,1. The best approximation So this example includes all matnces wit singu 2019 book Linear Algebra and A2 keeps 4 and 3. Several proofs art Li has simplified Mirsky's Learning from Data (Wellesley-Cambndge Pte^Ou К g proof that Ak is closest to A, for all norms that depend only on the
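The norms in (2), (3), (4) and the Eckart-Young error are easy to compute for the diagonal example above (a MATLAB sketch):

    A = diag([4 3 2 1 0]);
    s = svd(A);
    spectral  = max(s)                 % same as norm(A)
    frobenius = norm(s)                % same as norm(A, 'fro')
    nuclear   = sum(s)                 % the trace norm
    Ak = diag([4 3 0 0 0]);            % keep the two largest singular values
    [norm(A - Ak)  s(3)]               % the spectral error ||A - A_2|| equals sigma_3 = 2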
276 Chapter 7. The Singular Value Dccompositlon (SVD) Principal Component Analysis Now we start using the SVD. The matrix .4 is full of data. We have n samples. por each sample we measure m variables (like heigh! and weight). Thc data matrix Ao has n columns and rn rows. In many applications it is a very large matrix. The first step is lo find the average (lhe sample mean) along each row of Ло. Subtract that mean from all m entries in the row. Now each row of the centered matrix Д has mean zero. The columns of A are n points in R Because of centering, the sum of the n column vectors is zero. So lhe average column is thc zero vector. Often those n points are clustered near a line or a plane or another low-dimensional subspace of R™. Figure 73 shows a typical set of data points clustered along a line in R2 (after centering Ao to shift the points left-right and up-down to have mean (0,0) in A). How will linear algebra find that closest line through (0,0) 7 It is in the direction of the first singular vector u> of A. This is the key point of PCA ! A is 2 x n (large nullspace) AAT is 2 x 2 (small matrix) ATA is n x n (large matrix) Two singular values O\ > > 0 Figure 73: Data points (columns of A) are often close to a line in R2 or a subspace in R"'. The Geometry Behind PCA The best line in Figure 73 solves a problem in perpendicular least squares. This is also called orthogonal regression. It is different from the standard least squares fit to n data points, or thc least squares solution to a linear system Ax « b. That classical problem in Section 43 minimizes ||Ax — b||a. It measures distances up and down to the best line. Our problem minimizes perpendicular distances. Thc older problem leads to a linear equation ATAx - AJb lot the best x. Our problem leads to singular vectors tq (eigenvectors of AAT). Those are the two sides of linear algebra: not the same side. The sum of squared distances from the data points to the uj line is a minimum. To see this, separate each column a, of A into its components along U) and u2: E 11‘bll* = E l°tT“'l2 + Ё > । i The sum on the left is fixed by the data. The first sum on the right has terms u7°jo7u,‘ It adds to uf(AAr)U| So when we maximize that sum in PCA by choosing the top eigenvector U! of AA , we minimize the second sum. That second sum of squared distances from data points to the best line (or best subspace) is the smallest possible.
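For the age-height data just given, the whole PCA computation fits in a few MATLAB lines (a sketch; the sign of the computed u1 may come out reversed, which does not change the line):

    A = [3 -4 7 1 -4 -3; 7 -6 8 -1 -1 -7];     % already centered: each row adds to zero
    S = A*A' / (6 - 1)                         % sample covariance matrix: [20 25; 25 40]
    [U, Sig, V] = svd(A);
    u1 = U(:, 1)                               % first principal component, close to (0.6, 0.8)
    (Sig(1,1)^2 + Sig(2,2)^2) / 5              % total variance = trace of S = 60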
7 3. Principal Component Analyse 277 The Geometric Meaning of Eckart-Young c eure 7-3 *as *n lwo dimensions and it led to the closest line. Now suppose our data matrix Ao is 3 by n Three measurements like age. height, weight for each of n samples. A ain we center each row of the matrix, so all the rows of A add to zero And the points move into three dimensions. V^'e can still look for the nearest line. It will be revealed by the first singular vector ui , -pu best line will go through (0,0,0). But if the data points fan out compared to °' re 7 3. we really need to look for the best plane. The meaning of “best" is still this: Tlw sum of perpendicular distances squared to the best plane is a minimum That plane will be spanned by the singular sectors u> and Uj. This is the meaning of Eckart-Young. It leads to a neat conclusion: The best plane contains the best line. The Statistics Behind PCA The key numbers in probability and statistics are the mean and variance The “mean" is an average of the data (in each row of Ao) Subtracting those means from each row of Ao produced lhe centered A. The crucial quantities arc the “variances" and “covariances". The variances are sums of squares of distances from the mean—along each row of A. The variances are the diagonal entries of lhe matrix A AT. Suppose the columns of A correspond lo a child's age on the x-axis and its height on the у-axis. (Those ages and heights art measured from the avenge age and height.) We are looking for the straight line that slays closest to the data points in the figure. And wc have to account for the joint age-height distribution Of the data. The covariances are the off-diagonal entries of lhe matrix AAr. Those are dot products (row i of A) • (row j of A). High covariance means that increased height goes with increased age. (Negative covariance means that one variable increases when lhe other decreases.) Our first example has only two rows from age and height: the symmetric matrix AAT is 2 by 2. As the number n of sample children increases, we divide by n — 1 to give AAT its statistically correct scale. The factor is n — 1 because one degree of freedom has already been used for mean 0. This example with six ages and heights is already centered to make each row add to zero: 3-4 7 1-4-3 1 7-6 8-1 -I -7 j For this data, the sample covariance matrix S is easily computed. It is positive definite. Example Variances and covariances $ = 6 _ j
278   Chapter 7. The Singular Value Decomposition (SVD)

The two orthogonal eigenvectors of S are u1 and u2. Those are the left singular vectors (often called the principal components) of A. The Eckart-Young theorem says that the vector u1 points along the closest line in Figure 7.3. The second singular vector u2 will be perpendicular to that closest line.

Important note   PCA can be described using the symmetric S = AA^T/(n - 1) or the rectangular A. No doubt S is the nicer matrix. But given the data in A, computing S can be a computational mistake. For large matrices, a direct SVD of A is faster and more accurate. By going to AA^T we square sigma_1 and sigma_r and the condition number sigma_1/sigma_r.

In the example, S has eigenvalues near 57 and 3. Their sum is 20 + 40 = 60, the trace of S. The first rank one piece sqrt(57) u1 v1^T is much larger than the second piece. The leading eigenvector u1 ~ (0.6, 0.8) tells us that the closest line in the scatter plot has slope near 8/6. The direction in the graph nearly produces a 6 - 8 - 10 right triangle.

The Linear Algebra Behind PCA

Principal Component Analysis is a way to understand n sample points in m-dimensional space—the data. That data plot is centered: all rows of A add to zero. The crucial connection to linear algebra is in the singular values sigma_i and the left singular vectors u_i of A. Those come from the eigenvalues lambda_i = sigma_i^2 and the eigenvectors of the sample covariance matrix S = AA^T/(n - 1). The total variance in the data comes from the squared Frobenius norm of A:

   Total variance   T = ||A||_F^2 /(n - 1) = (||a_1||^2 + ... + ||a_n||^2)/(n - 1).   (11)

This is the trace of S—the sum down the diagonal. Linear algebra tells us that the trace equals the sum of the eigenvalues of the sample covariance matrix S.

The SVD is producing orthogonal singular vectors u_i that separate the data into uncorrelated pieces (with zero covariance). They come in order of decreasing variance, and the first pieces tell us what we need to know. The trace of S connects the total variance to the sum of variances of the principal components u_1, ..., u_r:

   Total variance   T = sigma_1^2 + ... + sigma_r^2.   (12)

The first principal component u1 accounts for (or "explains") a fraction sigma_1^2/T of the total variance. The next singular vector u2 of A explains the next largest fraction sigma_2^2/T. Each singular vector is doing its best to capture the meaning in a matrix—and all together they succeed.

The point of the Eckart-Young Theorem is that k singular vectors (acting together) explain more of the data than any other set of k vectors. So we are justified in choosing u1 to uk as a basis for the k-dimensional subspace closest to the n data points.

The "effective rank" of A and S is the number of singular values above the point where noise drowns the true signal in the data. Often this point is visible on a "scree plot" showing the dropoff in the singular values (or their squares sigma_i^2). Look for the "elbow" in the scree plot (Figure 7.2) where the signal ends and noise takes over.
7.3. Principal Component Analysis   279

Problem Set 7.3

1   Suppose A0 holds these 2 measurements of 5 samples:

       A0 = [  5   4   3   2   1 ]
            [ -1   1   0   1  -1 ]

    Find the average of each row of A0 and subtract it to produce the centered matrix A. Compute the sample covariance matrix S = AA^T/(n - 1) and find its eigenvalues lambda_1 and lambda_2. What line through the origin is closest to the 5 samples in the columns of A?

2   Take the steps of Problem 1 for this 2 by 6 matrix A0:

       A0 = [ 1  0  1  0  1  0 ]
            [ 1  1  2  3  3  2 ]

    The sample variances and the sample covariance s_12 are the entries of S. Find S after subtracting row averages from A0. What is sigma_1?

3   From the eigenvectors of S = AA^T, find the line (the u1 direction through the center point) and then the plane (the u1 and u2 directions) closest to these four points in three-dimensional space:

       A = [ 1  -1   0   0 ]
           [ 0   0   2  -2 ]
           [ 1   1  -1  -1 ]

4   Compare ordinary least squares (Section 4.3) with PCA (perpendicular least squares). They both give a closest line C + Dt to the symmetric data b = -1, 0, 1 at times t = -3, 1, 2.

       Least squares: A^T A x = A^T b        PCA: eigenvector of AA^T (singular vector u1 of A)

5   The idea of eigenfaces begins with N images: same size and alignment. Subtract the average image from each of the N images. Create the sample covariance matrix S = sum A_i A_i^T / (N - 1) and find its eigenvectors (= eigenfaces) with the largest eigenvalues. They don't look like faces, but their combinations come close to the N faces. Expressing each image in the eigenface basis gives a code for this dimension reduction.

6   What are the singular values of A - A3 if A has singular values 5, 4, 3, 2, 1 and A3 is the closest matrix of rank 3 to A?

7   If A has sigma_1 = 9 and B has sigma_1 = 4, what are upper and lower bounds to sigma_1 for A + B? Why is this true?
280   Chapter 7. The Singular Value Decomposition (SVD)

7.4 The Victory of Orthogonality (and a Revolution)

If I look back at the linear algebra in this book, orthogonal matrices have won. You could say that they deserved to win. The key to their success goes all the way back to Section 1.2 on lengths and dot products. Let me recall some of their victories and add new ones.

1   The length of Qx equals the length of x: (Qx)^T(Qx) = x^T Q^T Q x = x^T x = ||x||^2.
2   The dot product (Qx)^T(Qy) equals the dot product x^T y: x^T Q^T Q y = x^T y.
3   All powers Q^n and products Q1 Q2 ... Qk of orthogonal matrices are orthogonal.
4   The projection matrix onto the column space of Q (m by n) is QQ^T = (QQ^T)^2.
5   The least squares solution to Qx = b (m > n) is x = Q^T b = Q^+ b (pseudoinverse).
6   The eigenvectors of a symmetric matrix S can be chosen orthonormal: S = Q Lambda Q^T.
7   The singular vectors of every matrix are orthonormal: A = Q1 Sigma Q2^T = U Sigma V^T.
8   The pseudoinverse of U Sigma V^T is V Sigma^+ U^T. The nonzeros in Sigma^+ (n by m) are 1/sigma_1, ..., 1/sigma_r.

That list shows something important. The success of orthogonal matrices is tied to the sum of squares definition of length: ||x||^2 = x^T x. In least squares, the derivative of ||Ax - b||^2 leads to a symmetric matrix S = A^T A. Then S is diagonalized by an orthogonal matrix Q (the eigenvectors). A is diagonalized by two orthogonal matrices U and V. And here is more about orthogonal matrices: A = QS and A = QR.

9   Every invertible matrix equals an orthogonal Q times a positive definite S.

       Polar Decomposition   A = U Sigma V^T = (U V^T)(V Sigma V^T) = QS   (1)

S is like a positive number r, and Q is like a complex number e^{i theta} of magnitude 1. Every complex number x + iy can be written as e^{i theta} times r. Every matrix factors into Q times S. The square root of (x - iy)(x + iy) is r. The square root of A^T A = V Sigma^2 V^T is S = V Sigma V^T.

Example   Q = U V^T and S = V Sigma V^T come from the SVD of A in Section 7.1.

10  Every invertible matrix equals an orthogonal Q times an upper triangular R.

Example   Q and R come from the Gram-Schmidt algorithm in Section 4.4.
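As a check on the polar decomposition (1), here is a minimal sketch (not from the book) that builds Q = U V^T and S = V Sigma V^T from the SVD of an invented test matrix, assuming NumPy:

```python
import numpy as np

# Polar decomposition A = QS: Q = U V^T (orthogonal), S = V Sigma V^T (positive definite).
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))              # an invertible matrix (with probability 1)

U, sigma, VT = np.linalg.svd(A)
Q = U @ VT                               # orthogonal factor
S = VT.T @ np.diag(sigma) @ VT           # symmetric positive definite factor

print(np.allclose(A, Q @ S))             # True: A = QS
print(np.allclose(Q.T @ Q, np.eye(4)))   # True: Q is orthogonal
print(np.linalg.eigvalsh(S) > 0)         # all True: S is positive definite
```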
7.4. The Victory of Orthogonality (and a Revolution)   281

Householder Reflection Matrices

Here is a neat construction of orthogonal matrices. Instead of rotations (determinant = 1), these are reflections (determinant = -1). Each matrix H is determined by a unit vector u:

   Reflection matrix   H = I - 2uu^T

Clearly H^T = H. Verify that H is an orthogonal matrix: H^T H = I because u^T u = 1:

   H^T H = (I - 2uu^T)(I - 2uu^T) = I - 4uu^T + 4uu^T uu^T = I.   (2)

One eigenvector of H is u itself, with eigenvalue lambda = -1: Hu = u - 2u(u^T u) = -u. The other eigenvectors x fill the hyperplane u^T x = 0 that is orthogonal to u. Then u^T x = 0 leads to Hx = x - 2uu^T x = x, so lambda = 1.

Notice:  The eigenvalues 1 and -1 are real, since H is symmetric.
         The eigenvalues have |lambda| = 1, since H is orthogonal.
         The eigenvalues are lambda = 1 (n - 1 times) and lambda = -1 (one time).
         The determinant of H is -1, from multiplying the lambda's.

Examples   u = (1/sqrt(2)) [1, -1]^T leads to the permutation H = I - [ 1 -1 ; -1 1 ] = [ 0 1 ; 1 0 ].

           u = [cos theta, sin theta]^T leads to the reflection
           H = I - 2 [ cos^2 theta        cos theta sin theta ]  =  [ -cos 2theta  -sin 2theta ]
                     [ cos theta sin theta    sin^2 theta     ]     [ -sin 2theta   cos 2theta ]

Both examples have determinant -1. The neatest formula is the answer to this question: If ||a|| = ||r||, which matrix H reflects a into Ha = r? Choose the unit vector u = v/||v|| with v = a - r. Then Ha = (I - 2uu^T) a = r.

This leads to an error-free algorithm that factors A into QR: orthogonal times triangular. Q will be a product H_n ... H_2 H_1 of Householder reflections. One column at a time, we choose H_j to produce the desired column r_j in R. We keep a record of the vectors u_j that lead to each H_j (storing vectors, not matrices). When we need the matrices H_j (to apply Q or Q^T), we just use the u_j to recover them. This idea can replace Gram-Schmidt for A = QR.

Long ago Euler found another way to produce all orthogonal matrices. He rotated in some order around the x and y and z axes (three plane rotations). To an airline pilot those three rotations are roll and pitch and yaw.

Now we show that orthogonality also wins in "function space". The vectors q become functions q(x). The dot products become integrals of f g dx. The dimension becomes infinite.
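Returning to the Householder construction above, here is a minimal sketch of the QR idea (not the book's code), assuming NumPy. One reflection is applied per column, and only the vectors u_j are stored:

```python
import numpy as np

def householder_qr(A):
    """Factor A (m x n, m >= n) into Q R with Householder reflections.
    Only the reflection vectors u_j are stored; Q is rebuilt at the end."""
    m, n = A.shape
    R = A.astype(float).copy()
    us = []
    for j in range(n):
        x = R[j:, j].copy()
        v = x.copy()
        v[0] += np.sign(x[0]) * np.linalg.norm(x)          # reflect x onto a multiple of e1
        u = v / np.linalg.norm(v)
        us.append(u)
        R[j:, j:] -= 2.0 * np.outer(u, u @ R[j:, j:])      # apply H = I - 2uu^T to the block
    Q = np.eye(m)
    for j in reversed(range(n)):                           # recover Q from the stored u_j
        Q[j:, :] -= 2.0 * np.outer(us[j], us[j] @ Q[j:, :])
    return Q, np.triu(R)

A = np.random.default_rng(2).normal(size=(5, 3))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))   # True True
```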
282   Chapter 7. The Singular Value Decomposition (SVD)

Calculus: Vectors Become Functions

This is a book about linear algebra (for matrices). Orthogonality is just as important in calculus (for functions). Linear combinations of functions produce a function space. Lengths ||f|| are still the square roots of inner products. If the inner product of two vectors is a sum, the inner product of two functions f(x) and g(x) is an integral:

   Inner product   (f, g) = integral of f(x) g(x) dx      Length   ||f||^2 = (f, f) = integral of |f(x)|^2 dx   (3)

The two great inequalities of mathematics extend from vectors to functions:

   | integral of f(x) g(x) dx | <= ||f|| ||g||     and     ||f + g|| <= ||f|| + ||g||   (4)

The orthogonal basis from Gram-Schmidt now contains functions q_k(x) instead of vectors:

   Basis functions q(x)   f(x) = c_1 q_1(x) + c_2 q_2(x) + ...   (infinite series)   (5)

In a Fourier series, those q's are sines or cosines. Other series use Chebyshev functions: Chebfun.org computes with very high accuracy. It is orthogonality, the integral of q_j q_k dx = 0, that allows us to find each Fourier coefficient c_k. Just multiply the series by q_k and integrate:

   integral of f(x) q_k(x) dx = c_1 integral of q_1 q_k dx + ... + c_k integral of (q_k)^2 dx + ...   (6)

By orthogonality, all terms on the right side are zero except the kth term:

   Find c_k    integral of f(x) q_k(x) dx = c_k integral of (q_k(x))^2 dx   (7)

Basis functions like q_k = sin kx and cos kx are guaranteed to be orthogonal because they are eigenfunctions for a symmetric differential equation A^T A q_k = lambda_k q_k:

   A = d/dx    A^T = -d/dx    A^T A sin qx = -(d^2/dx^2) sin qx = q^2 sin qx.   (8)

Symmetric matrix equations A^T A x = b become symmetric differential equations. Here is Newton's Law using -A^T A y for acceleration (the second derivative):

   -(d/dt)(-dy/dt) = d^2 y/dt^2 = force / mass

The equations of physics and mechanics tell us about minimizing the energy (1/2) y^T S y—just as we saw for positive definite matrices. The important point is that the basic laws of physics (as presented by Feynman) produce equations from minimum principles. They lead to symmetry and positive definiteness. Then the eigenfunctions are orthogonal.
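Equation (7) turns directly into a numerical recipe. Here is a minimal sketch (not from the book) that recovers Fourier sine coefficients of an invented test function on [0, 2 pi] by numerical integration, assuming NumPy:

```python
import numpy as np

# Recover Fourier sine coefficients c_k from equation (7):
#   integral of f(x) q_k(x) dx = c_k * integral of q_k(x)^2 dx,   q_k(x) = sin(kx) on [0, 2*pi]
n = 200000
x = (np.arange(n) + 0.5) * (2*np.pi/n)       # midpoint-rule sample points
dx = 2*np.pi/n
f = 3*np.sin(x) - 2*np.sin(4*x)              # invented test function: c_1 = 3, c_4 = -2

for k in range(1, 6):
    qk = np.sin(k*x)
    ck = np.sum(f*qk)*dx / (np.sum(qk*qk)*dx)
    print(k, round(ck, 4))                   # 3.0, 0.0, 0.0, -2.0, 0.0 (up to rounding)
```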
7.4. The Victory of Orthogonality (and a Revolution)   283

A Revolution from Sparsity

The message from classical mechanics was that symmetric equations have orthogonal eigenfunctions. But the world of deep learning works with large rectangular matrices and all kinds of training data. We want to emphasize that smaller sums of squares are not necessarily the overriding goal. Sparse vectors with few nonzeros are the easiest to understand.

We would like to build that goal of sparsity into the optimization. Right away a sum of squares is inappropriate. If we minimize x_1^2 + ... + x_N^2 subject to x_1 + ... + x_N = 1, the best vector x* has N components all equal to 1/N. This is the exact opposite of sparse!

So we add a constraint that will push the solution x toward few nonzero components. The difficulty is that the cardinality of x (the number of nonzero components) is not a convex function of x. The set of vectors with a fixed cardinality is not convex, because the halfway vector (1/2)(x + X) can have twice as many nonzeros. We need a different convex function whose minimization encourages sparsity. It was not at all sure that a good convex function would be found.

In fact there is an excellent substitute for cardinality. It is the l1 norm of x:

   l1 norm   ||x||_1 = |x_1| + ... + |x_N|   with   ||x + y||_1 <= ||x||_1 + ||y||_1.   (9)

l1 is the first in a sequence of norms. The exponents in lp go between p = 1 and p = infinity:

   lp norm   ||x||_p = (|x_1|^p + ... + |x_N|^p)^(1/p)   has   ||x + y||_p <= ||x||_p + ||y||_p   (10)

p = 2 gives our usual sum of squares (the l2 norm). At the top end, p = infinity is the maximum norm ||x||_inf = max |x_i|. If we go below p = 1, the triangle inequality fails: convexity is lost. As p goes to 0, ||x||_p approaches the cardinality of x: not a norm.

Let me show how adding an l1 penalty to the l2 norm produces a sparser solution x*.

   No penalty   2E = (x - 1)^2 + (y - 1)^2 + (x + y)^2.
   The minimum at x = y = 1/3 (not sparse) is E = 2/3.

Now add a penalty based on ||v||_1 = |x| + |y|:

   2E = (x - 1)^2 + (y - 1)^2 + (x + y)^2 + 4|x| + 4|y|.

The minimizing solution (x*, y*) moves from (1/3, 1/3) to (0, 0): totally sparse.

There is a geometric way to see why an l1 minimization picks out a sparse solution. The vectors with |v_1| + |v_2| <= 1 fill a diamond in the v_1-v_2 plane. The l2 norm gives a circle v_1^2 + v_2^2 <= 1 and the l-infinity norm |v_1| <= 1, |v_2| <= 1 gives a square. It is the corners of the diamond that touch a line at a sparse point. One of those sharp corners will lead to the minimum in the following typical optimization.
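A quick numerical check of the penalty example above (a minimal sketch, not the book's code; it uses a brute-force grid rather than a proper solver), assuming NumPy:

```python
import numpy as np

# Minimize 2E = (x-1)^2 + (y-1)^2 + (x+y)^2, with and without the l1 penalty 4|x| + 4|y|,
# by brute force on a grid. The penalized minimizer lands exactly at (0, 0).
xs = np.linspace(-1, 1, 801)
X, Y = np.meshgrid(xs, xs)

E2_smooth = (X - 1)**2 + (Y - 1)**2 + (X + Y)**2
E2_penalized = E2_smooth + 4*np.abs(X) + 4*np.abs(Y)

i = np.unravel_index(np.argmin(E2_smooth), X.shape)
j = np.unravel_index(np.argmin(E2_penalized), X.shape)
print("no penalty :", X[i], Y[i])    # close to (1/3, 1/3)
print("l1 penalty :", X[j], Y[j])    # exactly (0.0, 0.0)
```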
284   Chapter 7. The Singular Value Decomposition (SVD)

The Minimum of ||v|| on the line a_1 v_1 + a_2 v_2 = 1

Which point on a diagonal line like 3v_1 + 4v_2 = 1 is closest to (0, 0)? The answer (and the meaning of "closest") will depend on the norm—the measure of distance. This is a nice way to see important differences between l1 and l2 and l-infinity.

To find the closest point to (0, 0), rescale the l1 diamond and the l2 circle and the l-infinity square (where the vectors have ||v||_1 <= 1 and ||v||_2 <= 1 and ||v||_inf <= 1). When the expanding diamond or circle or square first touches the line, that touching point is the minimizer v*.

Figure 7.4: The solutions v* to the l1 and l2 and l-infinity minimizations. The first is sparse.

The first figure displays a highly important property of the minimizing solution to the l1 problem. That solution v* has a zero component. The vector v* is "sparse". To repeat, this happened because a diamond will touch a line at a sharp point. The line (or the hyperplane in high dimensions) contains the vectors that satisfy the constraints Av = b. The diamond expands to meet the line at a sharp corner!

The essential point is that the solutions to l1 problems are sparse. They have few nonzero components, and those components have meaning. By contrast the least squares solution (using l2) has many small and non-interesting components. By squaring, those small components hardly affect the l2 distance, and they turn up in the l2 solution.

Minimizing with the l1 norm

The point of these pages is that computations are not all based on minimum energy. When sparsity is desirable, l1 comes in. We need new methods for new problems like these:

   Basis Pursuit           Minimize ||x||_1 subject to Ax = b
   LASSO with Penalty      Minimize ||Ax - b||^2 + lambda ||x||_1
   LASSO with Constraint   Minimize ||Ax - b||^2 with ||x||_1 <= T

LASSO was invented by Tibshirani to improve on ordinary regression (= least squares).
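For the line 3v_1 + 4v_2 = 1, the three closest points can be checked numerically. This is a minimal sketch (not from the book), assuming NumPy; it scans along the line rather than calling an optimization library:

```python
import numpy as np

# Closest point to (0,0) on the line 3*v1 + 4*v2 = 1, measured in three different norms.
# Parametrize the line by v2 and scan: v1 = (1 - 4*v2)/3.
v2 = np.linspace(-1.0, 1.0, 200001)
v1 = (1.0 - 4.0*v2) / 3.0

for name, norm in [("l1  ", np.abs(v1) + np.abs(v2)),
                   ("l2  ", np.sqrt(v1**2 + v2**2)),
                   ("linf", np.maximum(np.abs(v1), np.abs(v2)))]:
    k = np.argmin(norm)
    print(name, (round(v1[k], 4), round(v2[k], 4)), round(norm[k], 4))

# Expected: l1 picks the sparse corner (0, 1/4); l2 gives (3/25, 4/25) = a/||a||^2;
# linf gives the equal-components point near (1/7, 1/7).
```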
7.4. The Victory of Orthogonality (and a Revolution)   285

Numerical methods for l2 + l1 minimization proceed step by step, with each step an improvement in x. Lagrange's idea builds the constraints into the function to be minimized (by introducing Lagrange multipliers as unknowns). Those multipliers have meanings—they are derivatives of the minimum cost with respect to the constraints. In mathematical finance the multipliers are represented by Greek letters. They measure the risks in buying an option—the right to buy or sell a stock at a designated strike price.

Problem Set 7.4

1   If v is a complex vector, its squared length is ||v||^2 = |v_1|^2 + ... + |v_n|^2. For the given vectors v and w, find the conjugate transposes and the lengths ||v||^2 and ||w||^2.

2   Find the eigenvalues and eigenvectors of a rotation matrix and a reflection matrix:

       Q = [ cos theta  -sin theta ]        H = [ cos theta   sin theta ]
           [ sin theta   cos theta ]            [ sin theta  -cos theta ]

3   A permutation matrix has the same columns as the identity matrix (in some order). Explain why this permutation matrix and every permutation matrix is orthogonal:

       P = [ 0 0 0 1 ]
           [ 1 0 0 0 ]     has orthonormal columns, so P^T P = ______ and P^{-1} = ______.
           [ 0 0 1 0 ]
           [ 0 1 0 0 ]

4   When a matrix is symmetric or orthogonal, it will have orthogonal eigenvectors. This is the most important source of orthogonal vectors in applied mathematics. Four eigenvectors of that matrix P are x_1 = (1, 1, 1, 1), x_2 = (1, i, i^2, i^3), x_3 = (1, i^2, i^4, i^6), and x_4 = (1, i^3, i^6, i^9). Multiply P times each vector to find lambda_1, lambda_2, lambda_3, lambda_4. The eigenvectors are the columns of the 4 by 4 Fourier matrix F.

5   Show that

       Q = (1/2) [ 1   1   1   1 ]
                 [ 1   i  -1  -i ]     has orthonormal columns: conj(Q)^T Q = I.
                 [ 1  -1   1  -1 ]
                 [ 1  -i  -1   i ]

6   Haar wavelets are orthogonal vectors (columns of W) using only 1, -1, and 0:

       W = [ 1   1   1   0 ]
           [ 1   1  -1   0 ]     Find W^T W and W^{-1} and the eight Haar wavelets for n = 8.
           [ 1  -1   0   1 ]
           [ 1  -1   0  -1 ]
8   Learning from Data

8.1  Piecewise Linear Learning Functions
8.2  Convolutional Neural Nets
8.3  Minimizing Loss by Gradient Descent
8.4  Mean, Variance, and Covariance

This chapter describes a combination of linear algebra and calculus and machine learning. They produce an algorithm called "deep learning" that approximates a nonlinear function of many variables. That unknown function classifies the data, and recognizes the image, and translates the sentence, and finds the best move in Go. The learning function F(x, v) has to combine complexity with simplicity.

Simplicity comes from two key steps in each layer F_k of the overall learning function F:

   Layer k - 1 to layer k    v_k = F_k(v_{k-1}) = ReLU(A_k v_{k-1} + b_k)   (1)

That function F_k begins with a linear step to the vector A_k v_{k-1} + b_k. Then comes a fixed nonlinear function like ReLU (my pronunciation is RayLoo). That function acts on every component of every vector A_k v_{k-1} + b_k to give v_k:

   ReLU (any number x) = max(0, x)    (its graph is a ramp: zero for x <= 0, slope 1 for x >= 0)

It is amazing that this nonlinear function can achieve so much. The key is composition: functions of functions of functions. We have L + 1 layers 0, 1, ..., L (layer 0 is input, layer L is output). Composition produces v_L from v_{L-1} and eventually from the input v_0:

   v_L = F_L(v_{L-1}) = F_L(F_{L-1}(...(F_1(v_0)))) = chain of nonlinear functions F_k.   (2)

The "weights" x are all the entries in the matrices A_1 to A_L and the vectors b_1 to b_L. A deep network will have many weights from dense matrices A_k and fewer weights from convolution matrices (Section 8.2). The big computation is to choose those A's and b's in x so that v_L = F(x, v_0) is close to the known outputs w from the training data v_0.

More training data should give more accurate weights A_k and b_k—at a cost of extra computations. Those computations aim for weights that minimize the loss—the difference between v_L and w. Stochastic gradient descent is a favorite algorithm to find those weights. Backpropagation computes derivatives of F from derivatives of every F_k by the chain rule.

The design of F is a balance between computing cost and learning power. Amazingly, F can achieve accurate outputs on new test data that it has never seen.

Reference: Linear Algebra and Learning from Data, Gilbert Strang, Wellesley-Cambridge Press (2019).

286
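Equation (2) is only a few lines of code. Here is a minimal sketch (not the book's code) of the forward pass through the layers with ReLU, assuming NumPy; the layer sizes and weights are invented just to show the shapes:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(v0, As, bs):
    """Compose the layers of equation (2): v_k = ReLU(A_k v_{k-1} + b_k).
    The final layer here applies A_L and b_L without ReLU (one common choice)."""
    v = v0
    for A, b in zip(As[:-1], bs[:-1]):
        v = relu(A @ v + b)
    return As[-1] @ v + bs[-1]

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2]                     # widths: input 3, two hidden layers of 4, output 2
As = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.normal(size=m) for m in sizes[1:]]
print(forward(np.array([1.0, -2.0, 0.5]), As, bs))
```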
8. Learning from Data   287

The Functions of Deep Learning

Suppose one of the digits 0, 1, ..., 9 is drawn. How does a human recognize which digit it is? That neuroscience question is not answered here. How can a computer recognize which digit it is? This is a machine learning question. Probably both answers begin with the same idea: Learn from examples.

So we start with M different images (the training set). An image will be a set of p small pixels—or a vector v = (v_1, ..., v_p). The component v_i tells us the "grayscale" of the ith pixel in the image: how dark or light it is. So we have M images, each with p features: M vectors v in p-dimensional space. For every v in that training set we know the digit it represents.

In a way, we know a function. We have M inputs in R^p, each with an output from 0 to 9. But we don't have a "rule". We are helpless with a new input. Machine learning proposes to create a rule that succeeds on (most of) the training images. But we want much more than that: The rule should give the correct digit for a much wider set of test images, taken from the same population.

One answer might be: F(v) could be a linear function from R^p to R^10 (a 10 by p matrix). The 10 outputs would be probabilities of the numbers 0 to 9. We would have 10p entries and M training samples to get mostly right.

The difficulty is: Linearity is far too limited. Artistically, two zeros could make an 8. 1 and 0 could combine into a handwritten 9 or possibly 6. Images don't add. In recognizing faces instead of numbers, we will need a lot of pixels—and the input-output rule is nowhere near linear.

Artificial intelligence languished for a generation, waiting for new ideas. There is no claim that the absolutely best class of functions has now been found. That class needs to allow a great many parameters (called weights). And it must remain feasible to compute all those weights (in a reasonable time) from knowledge of the training set.

The choice that has succeeded beyond expectation—and has turned shallow learning into deep learning—is Continuous Piecewise Linear (CPL) functions. Linear for simplicity, continuous to model an unknown but reasonable rule, and piecewise to achieve the nonlinearity that is an absolute requirement for real images and data.

This leaves the crucial question of computability. What parameters will quickly describe a large family of CPL functions? Linear finite elements start with a triangular mesh. But specifying many individual nodes in R^p is expensive. Much better if those nodes are the intersections of a smaller number of lines (or hyperplanes). Please know that a regular grid is too simple.

Here is a first construction of a piecewise linear function of the data vector v, using a matrix A, a bias vector b, the input v, and the output w.
288   Chapter 8. Learning from Data

[Figure: A one-layer network. The input v goes through the components [(Av + b)_i]_+ of a hidden layer, with pq + 2q = 20 weights, producing r(4, 3) = 15 linear pieces in the output w = F(v).]

Actually the nonlinear function ReLU(x) = x_+ = max(x, 0) was originally smoothed into a logistic curve like 1/(1 + e^{-x}). It was reasonable to think that continuous derivatives would help in optimizing the weights A_1, b_1, A_2. That proved to be wrong.

The graph of each component of (A_1 v + b_1)_+ has two halfplanes (one is flat, from the zeros where A_1 v + b_1 is negative). If A_1 is q by p, the input space R^p is sliced by q hyperplanes into r pieces. We can count those pieces! This measures the "expressivity" of the overall function F(v). The formula from combinatorics uses the binomial coefficients (see Section 8.1):

   r(q, p) = (q choose 0) + (q choose 1) + ... + (q choose p).

This number gives an impression of the graph of F with a hidden layer. But our function is not yet sufficiently expressive, and one more idea is needed.

Here is the indispensable ingredient in the learning function F: The best way to create complex functions from simple functions is by composition. Each F_i is linear (or affine) followed by the nonlinear ReLU: F_i(v) = (A_i v + b_i)_+. Their composition is F(v) = F_L(F_{L-1}(...F_2(F_1(v)))). We now have L - 1 hidden layers before the final output layer. The network becomes deeper as L increases. That depth can grow quickly for convolutional nets (with banded Toeplitz matrices A: many zeros).

The great optimization problem of deep learning is to compute weights A_i and b_i that will make the outputs F(v) nearly correct—close to the digit w(v) that the image v represents. This problem of minimizing some measure of F(v) - w(v) is solved by following a gradient downhill. The derivatives of this complicated function are computed by backpropagation—the workhorse of deep learning that executes the chain rule.

A historic competition in 2012 was to identify the 1.2 million images collected in ImageNet. The breakthrough neural network in AlexNet had 60 million weights. Its accuracy (after 5 days of stochastic gradient descent) cut in half the next best error rate. Deep learning had arrived.

Our goal here was to identify continuous piecewise linear functions as powerful approximators. That family is also convenient—closed under addition and maximization and composition. The magic is that the learning function F(A_k, b_k, v) gives accurate results on images v that F has never seen.
8.1. Piecewise Linear Learning Functions   289

8.1 Piecewise Linear Learning Functions

Deep neural networks have evolved into a major approach to machine learning. The structure of the network has become more varied and more easily adapted to new applications. One way to begin is to describe the essential pieces in the structure. Those pieces come together into a learning function F(x, v), with weights x creating that function from the training data—ready to apply to new test data v. Here are important steps in creating that function F:

   Key operation      Composition F = F_3(F_2(F_1(v)))
   Key rule           Chain rule for the x-derivatives of F
   Key algorithm      Stochastic gradient descent to find the best weights x
   Key subroutine     Backpropagation to execute the chain rule
   Key nonlinearity   ReLU(y) = max(y, 0) = ramp function

Our first step is to describe the pieces F_1, F_2, ..., F_L. The weights x that connect the layers are optimized in the learning. Each input vector v_0 comes from the training set, and the function F produces the output at the final layer.

F_k is a Piecewise Linear Function of v_{k-1}

The input to F_k is a vector v_{k-1} of length N_{k-1}. The output is a vector v_k of length N_k, ready for input to F_{k+1}. This function F_k has two parts, first linear and then nonlinear:

1. The linear part of F_k yields A_k v_{k-1} + b_k (that bias vector b_k makes this "affine").
2. A fixed nonlinear function like ReLU is applied to each component of A_k v_{k-1} + b_k.

   Layer k    v_k = F_k(v_{k-1}) = ReLU(A_k v_{k-1} + b_k)   (1)

The training data for each sample is a feature vector v_0. The matrices A_k and the column vectors b_k are "weights" to be chosen—so that the final output v_L is close to the correct output w. Frequently stochastic gradient descent computes optimal weights x = (A_1, b_1, ..., A_L) in the central computation of deep learning. Minimizing a loss function of v_L - w relies on "backpropagation" to find the x-derivatives.

The activation function ReLU(y) = max(y, 0) gives flexibility and adaptability. Linear steps alone were of limited power and ultimately they were unsuccessful. ReLU is applied to every "neuron" in every internal layer. There are N_k neurons in layer k, containing the N_k outputs from equation (1). Notice that ReLU itself is continuous and piecewise linear, as its graph shows. (The graph is just a ramp with slopes 0 and 1. Its derivative is the usual step function.) When we choose ReLU, the composite function F = F_L(...(F_2(F_1(v)))) has an important and attractive property:

   The learning function F is continuous and piecewise linear in v.
Chapter 8. Learning from Dala 290 One Internal Layer (L = 2) Sunoco we have measured m = 3 feature» of one sample point in the training seu Tho« features are the 3 components of the input vector v - v0. Then the first function F, in the chain multiplies v0 by a matrix A, and adds an offset vector bj (bias vector). If 1, is 4 by 3 and the vector bj is 4 by 1. we h«*e 4 components of A0v0 + That step found 4 combmations of the 3 original features in v = v0. The 12 weights in the matrix .4, were optinuzed over many feature vectors »o tn the tnumng set, to choose a I by 3 matrix (and a 4 by 1 bias vector) that would find 4 insightful combinations. The final step to reach v, is to apply the nonlinear "activation function" to each of the 4 components of Л.оо + bi Historically, the graph of that nonlmcar function was often given by a smooth “S-arn*". Particular choices then and now are in Figure 8.1. Figure 8.1: The Rectified Linear Unit and a sigmoid option for nonlinearity. Previously it was thought that a sudden change of slope would be dangerous and pos- sibly unstable. But large scale numerical experiments indicated otherwise! A better result was achieved by lhe ramp function ReLU(y) - max(y.O). We will work with ReLU: Substitute A। t>o + bi into ReLU to find t>i (®i)fc “ max((AiVo + bt)s,0). (2) Now we have the components of V| at the four “neurons" in layer 1. The input layer held the three components of this particular sample of training data. We may have thousands of samples. The optimization algorithm found Ai and b(. probably by gradient descent. Suppose our neural net is shallow instead of deep. It only han this first layer of 4 neunms. Then the final step will multiply the 4-component vector by a 1 by 4 matrix A; (a row vector). A vector bj and the nonlinear ReLU are not applied to the output. Overall we compute Vj = F(x, i>q) for each feature vector t>o in the training set. The steps are v2 = Aj (ReLU (A|V0 + 6|)) “ F(®, ®o)- The goal in optimizing x = A|.b(.Aj is that the output values Vl = v2 at the last layer L “ 2 should correctly capture the important features of the training date t>0. At the beginning of machine learning the function F was linear—a severe limitation. Now F is certainly nonlinear. Just the inclusion of ReLU al each neuron in each layer has made a dramatic difference. It is the processing power of the computer that makes for fast operations on the data. For a deep network we depend on the speed of GPU’s (the Graphical Processing Units that were developed for computer games).
g ) piecewise Linear Learning Functions 291 ReLU ReLU д1Л3та,пхЛ1 Add4* ReLU Ы matrix Feature vector v0 Three components for each training sample ReLU Vi « Л|во + bj в| at layer 1 ViM layer 1 Ui«ReLU(»,) Four components of Vl and v, Output v3 Vj = 4jt>i True = w x: ’д,:. V"- For a classification problem each sample of the trammg data b asugned 1 or -1. We want the output v, to have that correct ogn (most of the time) For a regression problem we use the numerical value (not just the sign) of vj Depending on our choice of loss function of v, - w. thts problem can be like least squares or entropy minimization. We are choosing * - (weight matrices Л» and vectors bfc) to minimize the loss. Here are 3 possible loss functions: 1 1 Square loss £(z) = -£||F(x,»j) _ 7 . N training samples 1 N 2 Hinge loss L(z) = — m,x “ It F’(x)) for classification у = 1 or -1 1 ?. 3 Cross-entropy loss L(z)= £[jhlog b+(l-g,)k>g(l-p1)]fory,=Oar 1 Our hope is that the function F has “learned” the data This is machine learning. We aim for enough weights so that F has discovered what is important in recognizing dog versus cat—or identifying an oncoming car versus a turning car. Machine learning doesn't aim to capture every detail of the numbers 0,1,2...,9. It just aims to capture enough information to decide correctly <•*'<* number it is.
ChafXcr 8. Learning frum 292 The Initial Weights x0 in Gradient Descent * the form of Ле learning function F(®, v). The The architecture in a neural net ucciuc x Jn malnCes д and vcclors b шшйч аш VXX'Z*'"**•8 ” ” •“* Starting from «о.«« « тИЙ~« X2 and onward, aiming to imd w g ? Choosing Жо = о would be a disaster. ’ The question is: What "fW,“Xo f failure in deep learning. A proper choice of the Poor initialization is anXX X indcpenW «l«bu tlntt .«"Wen.,; 1. x„ha> a carefully chorea variattca »’• 2. The hidden layers in the neural net have enough neurons. , , , . , initial variance <ra controls the mean of the computed X!I iSX гX the variance of Ле weights. The key point is mis wcignts. I nc у on Ihe |ra|n|ng M(. Bm tf a j, wmng or & 5SU <M g— cm to С«»п,1 «/</.c weigta. The enqw. lion of x can explode to infinity or implode to zero. The danger controlled by the variance a* of x0 is exponentially large or exponentially tne uang у i. „2 s 2/fan-in. The fan-in is the maximum number of small weights, ic go< ' hc outpul). Software like Kerns makes a inputs to neurons (Figure 8.Z nas inn in good choice of Max-Pooling to Reduce Dimensions An image can have many pixels; the input tin can include many features. Then the size of our computations can grow out of control. We have to reduce the number of components (sometimes called neurons) in a typical layer. If you look at the architecture of AlexNet. you sec "pooling layers" lo reduce dimension. The most popular choice is simply max-pooling. Divide the image into blocks of 4 pixels. Replace each 2x2 block by 1 x 1: usually the largest of the four numbers. (Average pooling would keep the average of those numbers.) Here is AlexNet. Figure 8.3: AlexNet uses two GPU’s linked at certain layers. Pooling simply reduces the image dimensions. Most layers connect by convolution matrices A* (Section 8.2) but the (inal layers are fully connected by dense matrices A*. The input dimension is 150.528 and AlexNet had 00,000,000 weights—it won the 2012 competition to classify linageNet.
g.|. Piecewise Linear Learning Function» 293 Graph of the Learning Function F(v) The graph of F(v) is a огЬурегр1алС5 that fit together akxigaU рксех-they же plancs This is like ongami except that this р1р6 'produced a change of slope, graph might not be in RJ—Ле feature sector в = 4 P**'' going ю infinity. And Ле Л’« » m components. Part of the mathematic, of deep leanwtr and to visualize how they fit into one pteceuhe be numbcT r M ftal Р,лс' an example of a neural net w nh one intern^ 1»»„ ”***' *f,cr rn measurements like height, weight, age of a sample in tb r J^*,***** V“ COnU‘"' In Figure 8.2, F had three inputs in л, and one flat surface in 4-d.menMona) space The he.ght of £ "“j point Vo >n 3-ditnensional space, Limitations of .. hLs—Г —Л («hi), over the •« ™, T,»«»« Л. lb. p^. s^, „ 3 Note You actually see points on the graph of F when you run examples oo playgnxind.tensorflow.org This is a very instructive website That website offers four options for the training set of points r0 You choose the number of layers and neurons. Please choose the ReLU activation function ’ Then the program counts epochs as gradient descent optimizes the weights. (An epoch sees all samples on average once.) If you have allowed enough layers and neurons to correctly classify the blue and orange training samples, you will see a polygon separating them. That polygon shows where F = 0. It is the cross-season of the graph of : = F(o) ж height: «0 That polygon separating blue from orange (or phu from miiuu: this is classification) is the analog of a separating hyperplane in a Support Vector Machine, if we were limited to linear functions and a straight line between a blue ball and an orange nng around it. separation would be impossible But for the deep learning function F this is not difficult.. We will discuss experiments oo this plav ground.temorflow site tn the Problem Set Important Note: Fully Connected versus Convolutional Wc don’t want to mislead the reader -Fully connected" nets are often not Ae most ef- fective. If the weights around one pixel in an image can be repealed around *1 poeds (why not ?). then one row of .4 is all we need The row can asep> ™ pixel*. Local convolutional neura) nets <CNV.) же m .MexNetaod Sect- 8.2. The website math.mrt.eduTNNUI alkms the reader to creaie a CXX poo g Y™ e I— That is a useful insight into the power of to visualize in full detail the size and depth of the neural netwxrt make it оитк
Chapter 8. Learning from DftU 294 CounU»8 PiKeS ‘°,he G”₽h! °"' U»w .hr weicht matrices As and *»* bias vcc‘ore b* number h is easy to count * • 6far more mtcresling to count the number of flat picccs determine the function F. But ures the expressivity of the neural network in the graph of F- uon we fully understand (at least so far). F(x,v) is a more coroplicairo • without explicit approval of its •‘thinking". The system is deciding and * fairly soon. For driverless cars »> components. We have N functions Suppose vo has mcompone ^ * „ro g hypcrplane (dimension m - 1) of vo. Each of those linear fc becomet piecewise linear, with a in Rm. When we apply fold iu graph is sloping, on the other side fold along that hyperplane. this component of vt в «го piecewise linear functions of v0. so va Then the next matrix Ai conn' „ R- wordl describe each piecewise now has/оИг along 5r A, ReLU(Ai»o + &t)): thc output in our case. linear component о “ g ((he folds actually along N hyper- You could think of 5 ЙГИ*Ь , (old separates lhe plane in two pieces. The next planes in rn-dimensional space) _ fold morc dlfflcu|t t0 (ЫО <™. «ли km »-Лw but tbe rollowtn. H.0(e 4 .rraRRfmcnt-ur.d a theorem of Tom In combinatorial theory, wena V nled b Richard Stanley’s great textbook on Zaslavsky counts the pieces. 11* proo _ P u more complicated than we need. Enumerative Combinatorics ( nossible ways. We assume that the fold lines because it aikre.Ле <oM 1t~a» ~ » cm.». line» p.eee. it are re-geuerai [«««<« NretntlNet»»-*.<«».: 1606.05336). given by On the Expnutve Power of Deep Theorem For v in rn dimensions R". suppose the graph of F(v) has folds along N hyperplanes H|,...,Hjv. Those come from N linear equations ajv + b, = 0. in other words from ReLU at N neurons. Then the number of linear pieces of F and regions bounded by the hyperplanes is r(N, m): r(N,m)«(o ) + (7)+"‘+(m)‘ (4) These binomial coefficients are / N\ with 0! = 1 and I j = 1 and ( . 1 = 0 for t > N. N\ N\ » J ~ il(N-i)! Example The function F(x,p, z) = ReLU (z) + ReLU (y) + ReLU (z) has 3 folds along the 3 planes r = 0. p = 0, г = 0. Those planes divide R3 into r(3,3) = 8 pieces where F = x + у + z and x + x and r and 0 (and 4 more). Adding ReLU (x + у + z — 1) gives a fourth fold and r(4,3) = 15 pieces of R3. Not 16 because the new fold plane x + у + z = 1 does not meet the 8th original piece where x < 0, у < 0, z < 0.
g I. Piecewise Linear Learning FuntUum 295 George Polya'» famous YouTube video ь.,.. _ He helps lhe class to find r(5.3) = 26 Ole„ ?** G*“"« a One hyperplane in R"* produces ( 1V /Т will produce r(2,m) = 1 + 2 + 1 ?;/ J 'r two fokh in a line, which only ikr1- — * u** into г 2 11 -. •> The count r of linear pieces from .V g.li. ~ **“' '°Uo*fro«» the recursive formula «х/t Спай,, си a cake by 5 pbnes । s -------‘ "'-dimensional cakes. jJ=2rc^AndJVe2hyp«pllne% •egions provided m _> 1 u ----- “ L.. „ * 1 Whcn "i = 1 we have m- 1) r(N.m)=r(N_1,n,) + r<N_l (5) _ *ndr(N-l,m) regions m-1). The established N - | hyperplanes -------------------t one existing region - l.m); see Figure 8.4. To understand that recursion, start with .V -1 hypernb^ lnB- bdd one more hyperplane H (dimensjon r • - cut H in<o r(N - l.m - 1)^,on*. Eachof-tbore'p^^Xs' into two. adding r(N - l.m - 1) regions to the original r(N - ’ So the recursion is correct, and we now apply equauon (5) to comp^’^f ’ The count starts at r(l,0) = r(0, l) ж 1. Then (4)is proved by inducuonoo M + m: VKeKVH?;.*)] ' ' O' ' 9 ' ' The two terms in brackets (second line) became one term because of a useful identity: I . J + \, + l)eli + l) “^Леinductionucomplete. Mike Giles made that presentation clearer, and be suggested Figure 8.4 to show the effect of the last hyperplane H. There are r » 2* linear pieces of F(v) for N < ni and r » Nm/m\ pieces for N » m. when the hidden layer has many neurons 4 Stan with 2 planes \/ 4- r(2.2) - 4 la / \ 3a pc рЦпе H / 2a \ - н 4- H2.1) - 3 lb / 26 ' 36 Total r(3,2)e 7 |l •• n P • »• |i Figure 8.4: The r(2.1) = 3 pieces of H create ’ r(3,2) =4 + 3 = 73-^M’r(4 2) = n. fourth fold would cross all 3 existing folds and create 4 .
296   Chapter 8. Learning from Data

Flat Pieces of F(v) with More Hidden Layers

Counting the linear pieces of F(v) is much harder with 2 internal layers in the network. Again v_0 has m components. Now A_1 v_0 + b_1 will have N_1 components before ReLU. Each one is like the one-layer function described above. Then applying ReLU creates new folds in its graph. Those folds are along the lines where a component of A_1 v_0 + b_1 is zero.

Remember that each component of v_2 = ReLU(A_2 v_1 + b_2) is piecewise linear, not linear. So it crosses zero (if it does) along a piecewise linear surface, not a hyperplane. The straight fold lines of Figure 8.4 change to piecewise straight lines for the folds in v_2. So the count becomes variable, depending on the details of v_0, A_1, b_1, A_2, and b_2.

Still we can estimate the number of pieces created by the N_2 ReLU's at the second hidden layer. If those fold lines (or piecewise hyperplanes in R^m) were actually straight, we would have a total of N_1 + N_2 folds in each component of v_2 = F(v_0). The count of pieces for deeper networks is studied by Rolnick (arXiv: 1906.00904).

Composition F_3(F_2(F_1(v)))

The word "composition" would simply represent "matrix multiplication" if all our functions were linear: F_k(v) = A_k v. Then F(v_0) = A_3 A_2 A_1 v_0: just one matrix. For nonlinear F_k the meaning is the same: Compute v_1 = F_1(v_0), then v_2 = F_2(v_1), and finally v_3 = F_3(v_2). Now we get remarkable functions. This operation of composition F_3(F_2(F_1(v_0))) is far more powerful in creating functions than addition!

For a neural network, composition produces continuous piecewise linear functions F(v_0). The 13th problem on Hilbert's list of 23 unsolved problems in 1900 asked a question about all continuous functions. A famous generalization of his question was this: Is every continuous function F(x, y, z) of three variables the composition of continuous functions G_1, ..., G_n of two variables? The answer is yes.

Hilbert seems to have expected the answer no. But a positive answer was given in 1957 by Vladimir Arnold (age 19). His teacher Andrey Kolmogorov had previously created multivariable functions out of 3-variable functions. The 2-variable functions xy and x^y use the 1-variable functions exp and log, and you must allow addition:

   xy = exp(log x + log y)   and   x^y = exp(exp(log y + log log x)).   (7)

So much to learn from the Web. A chapter of Kolmogorov's Heritage in Mathematics (Springer, 2007) connects these questions explicitly to neural networks. Is the answer to Hilbert still yes for piecewise linear continuous functions? With enough layers and neurons, F can approximate any continuous function. This is universality. New research by Telgarsky and Townsend shows the power of rational functions.
g.|. Piecewise Linear Learning Function, 297 problem Set 8.1 In the example F = ReLU(x) + r,i ... , for r(N, rn). suppose the 4th fold Comes * **}-«(«) that follows fonnuU (4) amglc point (0,0,0)—™ aceptionof^ £ *** x “ °* = = 0 at . F = sum of these four ReLU’s. ^«nbe the 16 (not 15) hncar pieces of Suppose we have m = 2 input, and X neu(um it a linear combination of N ReLU's. W °" 1 h,dden byer, so F(z,y) that the count of linear pieces of F has leading tenr/ ^nrmu'i *** to show Suppose we have X = 18 lines in a ni.« rr <i how many pieces of lhe plane? CompL wnh ” ** position and no three lines meet. **n the lines arc in general What weight matrix Л, and bias vector b, will produce ReLU lx + 2v 41 >„d ’ ? ^,nd (2x+* - »> - oitii hidden ayer (Theinput layer!has 2 components x and y.) If the output u> is the sum of those three ReLU s, how many flat pieces ia the piecewise linear w(x,y)? Folding a line four times gives r (4.1) = 5 pieces Folding a plane four times gives r И’ “11 pieces. According to formula (4), how many flat subsets come from folding R four times ? The flat subsets of R1 meet al 2D planes (like a door frame). 6 The binomial theorem finds the coefficients ' j u(s + b)x e‘b*“*. Fora = b=l what does this reveal about those coefficients and r(N.m) form > X? 7 8 In Figure 8 4. one more fold will produce 11 flat pieces in the graph of : “ F(x, y). Check that formula (4) gives r (4,2) = 11. How many pieces after five folds ? Explain with words or show with graphs why each of these statements about Continuous Piecewise Linear functions (CPL functions) is true: M The maximum M(x, y) of two CPL functions Ft(x. y) and F»(x, y) is CPL S The sum S(x. y) of two CPL functions F|(x. у) and Fj(x, y) is CPL. C If the one-variable functions у = F((x) and i »Fs(y) are CPU so is the composition C(x) = z » (Fj(Fi(x)).
298 Chapter 8. Leaning from 0<ц How many weights and biases are in a network with m = A'o = 4 । . feature vector t>0 and N = 6 neurons on each of lhe 3 hidden layers *» н *П Cacl1 activation functions (ReLU) are in this network, before lhe final output ? * П,апУ 10 (Experimental) In a neural network with two internal layers and a total of should you pul more of those neurons in layer 1 or layer 2 ? 10 neurons, Problems 11-13 use the blue ball, orange ring example on playground.lensorflow with one hidden layer and activation by ReLU (not Tanh). When learning succ wf8 a white polygon separates blue from orange in lhe figure that the code create* 11 Does learning succeed for N = 4 ? What is lhe count r(N, 2) of flat pieces in F(x) 7 The white polygon shows where flat pieces in the graph of F(x) change sign as they go through the base plane z - 0. How many sides in the polygon ? 12 Reduce to .V = 3 neurons in one layer. Does F still classify blue and orange cor- rectly? How many flat pieces r(3,2) in the graph of F(v) and how many sides in the separating polygon ? 13 Reduce further loW 2 neurons in one layer. Does learning still succeed ? What is the count r(2,2) of flat pieces ? How many folds in the graph of F(v) ? How many sides in the while separator 7 14 Example 2 has blue and orange in two quadrants each. With one layer, do N 3 neurons and even N • 2 neurons classify that training data correctly ? How many flat pieces are needed for success ? Describe the unusual graph of F(v) when W = 2. 15 Example 4 with blue and orange spirals is much more difficult! With one hidden layer, can the network team this training data ? Describe the results as N increases. 16 Try that difficult example with two hidden layers. Start with 4 + 4and6 + 2and 2 + 6 neurons. Is 2 + 6 better or worse or more unusual than 6 + 2? 17 How many neurons bring complete separation of the spirals with two hidden layers ? Can three layers succeed with fewer neurons than two layers ? I found that 1 + 4 + 2 and 4 + 4 + 4 neurons give very unstable iterations for that spiral graph. There were spikes in the training loss until the algorithm stopped trying, playground tensorflow org was created by Daniel Smilkov. 18 What is the smallest number of pieces that 20 fold lines can produce in a plane ? 19 How many pieces are produced from 10 vertical and 10 horizontal folds ? 20 What is the maximum number of pieces from 20 fold lines in a plane ?
8.2. Convolutional Neural Nets   299

8.2 Convolutional Neural Nets

This section is about networks with a different architecture. Up to now, a fully connected matrix A connects a layer with n neurons to the next layer with m neurons: A had mn independent weights. Now we might have only E = 3 or E^2 = 9 independent weights in A.

The fully connected "dense net" will be expensive and inefficient for image recognition. First, the weight matrices A will be huge. If one image has 200 by 300 pixels, then its input layer has 60,000 components. The weight matrix for the first hidden layer would be far too large to compute and to store.

Almost always, the important connections in an image are local:

   Music has a 1D local structure
   Images have a 2D local structure (3 copies for red-green-blue)
   Video has a 3D local structure: images in a time series

More than this, the search for structure is essentially the same everywhere in the image. There is normally no reason to process one part of a text or image or video differently from its other parts. We can use the same weights in all parts: Share the weights. The neural net of local connections between pixels is shift-invariant: the same everywhere.

The result is a big reduction in the number of independent weights. Suppose each neuron is connected to only E neurons on the next layer, and those connections are the same for all neurons. Then the matrix A between those layers has only E independent weights x. The optimization of those weights becomes enormously faster. In reality we have time to create several different channels with their own E or E^2 weights. They can look for edges in different directions (horizontal, vertical, and diagonal).

In one dimension, a banded shift-invariant matrix is a Toeplitz matrix or a filter. Multiplication by that matrix A is a convolution x * v. The network of connections between the layers is a Convolutional Neural Net (CNN or ConvNet). Here E = 3:

   A = [ x1  x0  x-1  0   0   0  ]        v = (v0, v1, v2, v3, v4, v5)
       [ 0   x1  x0   x-1 0   0  ]        y = Av = (y1, y2, y3, y4)
       [ 0   0   x1   x0  x-1 0  ]
       [ 0   0   0    x1  x0  x-1]        N + 2 inputs v and N outputs y

It is valuable to see A as a combination of shift matrices L, C, R: Left, Center, Right. Each shift has a diagonal of 1's:

   A = x1 L + x0 C + x-1 R

Then the derivatives of y = Av = x1 Lv + x0 Cv + x-1 Rv are exceptionally simple:

   d(output)/d(weight)    dy/dx1 = Lv    dy/dx0 = Cv    dy/dx-1 = Rv   (1)
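A minimal sketch (not the book's code) that builds this 4 by 6 Toeplitz matrix from the three weights and checks it against a sliding-window convolution, assuming NumPy:

```python
import numpy as np

def conv_matrix(x1, x0, xm1, N):
    """Banded Toeplitz (convolution) matrix: N outputs from N + 2 inputs, E = 3 weights."""
    A = np.zeros((N, N + 2))
    for i in range(N):
        A[i, i:i+3] = [x1, x0, xm1]
    return A

x1, x0, xm1 = 2.0, 1.0, 2.0                         # the symmetric kernel (2, 1, 2)
A = conv_matrix(x1, x0, xm1, 4)
v = np.array([1.0, 0.0, 3.0, -1.0, 2.0, 5.0])
print(A @ v)                                        # the four outputs y
print(np.convolve([x1, x0, xm1], v, mode='valid'))  # same numbers from a moving window
# (np.convolve flips the kernel; with this symmetric kernel the flip does not matter)
```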
300   Chapter 8. Learning from Data

Convolution of Vectors  x * v

The convolution of two vectors is written x * v = (2, 1, 2) * (3, 3, 1). Computing the result x * v = (6, 9, 11, 7, 2) is like multiplying the numbers 212 and 331, without carrying:

            2  1  2
         x  3  3  1
         -----------
            2  1  2
         6  3  6
      6  3  6
      --------------
      6  9  11  7  2

Notice that we leave 6 + 3 + 2 = 11 as "11" (no carrying). The same steps multiply 2x^2 + x + 2 times 3x^2 + 3x + 1. That answer would be 6x^4 + 9x^3 + 11x^2 + 7x + 2.

The previous page just put the numbers (x1, x0, x-1) = (2, 1, 2) on three diagonals of A. Then ordinary multiplication 212 times 331 converts to matrix-vector multiplication Av. When x has length j + 1 and v has length k + 1, the convolution x * v has length j + k + 1.

Convolution as a Moving Window

Suppose we average each number with the next number in v = (1, 3, 5). The result is y = (2, 4). This is a typical convolution of v with the averaging vector x = (1/2, 1/2):

   y = Av = [ 1/2  1/2   0  ] [ 1 ]   =  [ 2 ]
            [  0   1/2  1/2 ] [ 3 ]      [ 4 ]
                              [ 5 ]

Notice the decision not to pad v with a zero at each end (and extend A to be 4 by 5). That would lead to 4 outputs y instead of 2. It would be consistent with multiplying numbers: 11 times 135 is 1485, and dividing by 2 gives 1/2, 2, 4, 5/2. Python and MATLAB offer both versions of convolution, padded or not (and a third option with three outputs). We will choose not to pad the input with zeros. Each row of A is a perfect shift of the previous row, as above.

Often the convolution process Av is seen as a moving window. The window starts at 1, 3 and moves to 3, 5. Averaging produces 2 in the first window and 4 in the second window. The whole point of "shift invariance" is that a convolution does the same thing in each window.

Windows in Two Dimensions

This approach is helpful in two dimensions where the window is a square or a rectangle. It is easy to see 2 by 2 overlapping windows filling an n by n square. There would be (n - 1)^2 windows and an average over each window. The matrix A has (n - 1)^2 outputs from n^2 inputs. Each row of A has (1/4, 1/4, 1/4, 1/4): four nonzeros to average over a 2 by 2 window.
8.2. Convolutional Neural Nets   301

Experiments have pointed to E = 3 as a good size for the moving window: an E by E = 3 by 3 filter has only 9 weights.

[Figure: 3 by 3 windows in a 5 by 5 square of pixels. Moving the window left/right/up/down and diagonally produces 9 windows, centered at the nine interior positions 1 to 9.]

2D Convolution by One Large Matrix

When the input v is an image, convolution becomes two-dimensional. The numbers x-1, x0, x1 change to E^2 = 3^2 = 9 independent weights. The inputs v_ij have two indices and v represents (N + 2)^2 pixels. The outputs have only N^2 pixels unless we pad v with zeros at the boundary. The 2D convolution Av is a linear combination of 9 shifts of v:

   Weights   x11   x01   x-11        Input image  v_ij   with i, j from (0, 0) to (N + 1, N + 1)
             x10   x00   x-10        Output image y_ij   with i, j from (1, 1) to (N, N)
             x1-1  x0-1  x-1-1       Shifts L, C, R and U, D: Left, Center, Right, Up, Down

   A = x11 LU + x01 CU + x-11 RU + x10 L + x00 C + x-10 R + x1-1 LD + x0-1 CD + x-1-1 RD

This expresses the convolution matrix A as a combination of 9 shifts. The derivatives of the output y = Av are again exceptionally simple. We use these same derivatives in (2) to create the gradients of the learning function and the loss function that are needed in gradient descent to improve the weights x_k. The next iteration x_{k+1} = x_k - s_k (gradient) has weights that better match the correct outputs from the training data.

Backpropagation finds these 9 derivatives of y = Av with respect to the 9 weights:

   dy/dx11 = LUv    dy/dx01 = CUv    ...    dy/dx-1-1 = RDv   (2)
302   Chapter 8. Learning from Data

CNN's can readily afford to have B parallel channels (and that number B can change as we go deeper into the net). The count of weights in x is so much reduced by weight sharing and weight locality that we don't need and we can't expect one set of E^2 weights to do all the work of a convolutional net. B convolutions give the next layer.

Two-dimensional Convolutional Nets

Now we come to the real success of CNN's: image recognition. ConvNets and deep learning have produced a small revolution in computer vision. The applications are to self-driving cars, drones, medical imaging, security, robotics—there is nowhere to stop. Our interest is in the algebra and geometry and intuition that makes all this possible.

In two dimensions (for images) the matrix A is block Toeplitz. Each small block is E by E. This is a familiar structure in computational engineering. The count E^2 of independent weights to be optimized is far smaller than for a fully connected network. The same weights are used around all pixels (shift invariance). The matrix A produces a 2D convolution x * v. Frequently A is called a filter.

To understand an image, look to see where it changes. Find the edges. Our eyes look for sharp cutoffs and steep gradients. The computer can do the same by creating a filter. The difficulty with two or more dimensions is that edges can have many directions. We will need horizontal and vertical and diagonal filters for the test images. And filters have many purposes, including smoothing and gradient detection and edge detection.

Smoothing   For functions, one smoother is convolution with a Gaussian e^{-x^2/2 sigma^2}. For vectors, we could convolve v with G = (1/16)(1, 4, 6, 4, 1).

Gradient detection   Image processing (as distinct from learning by a CNN) needs filters that detect the gradient. They contain specially chosen weights. We mention some simple filters just to indicate how they can find first derivatives of f.

   One dimension   E = 3   weights (1/2)(1, 0, -1)   give (Av)_i = (1/2) v_{i+1} - (1/2) v_{i-1}.

   Two dimensions  E = 3   These 3 by 3 Sobel operators approximate d/dx and d/dy:

      d/dx:  [ -1  0  1 ]          d/dy:  [ -1 -2 -1 ]
             [ -2  0  2 ]                 [  0  0  0 ]     (3)
             [ -1  0  1 ]                 [  1  2  1 ]

Edge detection   Those weights were created for image processing, to locate the most important features of a typical image: its edges. These would be candidates for E by E filters inside a 2D convolutional matrix A. But remember that in deep learning, weights like these are not chosen by the user. They are created from the training data.
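A minimal sketch (not the book's code) applying the two Sobel filters of (3) to a small synthetic image, assuming NumPy and SciPy; the image values are invented:

```python
import numpy as np
from scipy.signal import convolve2d

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

# Invented test image: dark on the left half, bright on the right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

gx = convolve2d(image, sobel_x, mode='valid')   # large magnitude at the vertical edge
gy = convolve2d(image, sobel_y, mode='valid')   # near zero: there is no horizontal edge
print(gx)
print(np.abs(gy).max())
```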
8.2. Convolutional Neural Nets   303

Stride and Padding

The filters described so far move one pixel at a time: the stride is S = 1. For a larger stride, the moving window takes longer steps. Here is a matrix A for a 1-dimensional 3-weight filter with stride S = 2. The output y = Av is about half as long as the input v:

   Stride S = 2   A = [ x1  x0  x-1  0   0   0   0  ]
                      [ 0   0   x1   x0  x-1 0   0  ]   (4)
                      [ 0   0   0    0   x1  x0  x-1]

Notice how the nonzero weights x1, x0, x-1 move two columns at a time (the stride S). In 2D, a stride S = 2 reduces each direction by half, so the output has about one quarter as many components as the input. Padding adds P zeros at each end of the input, to control the size of the output.

In a one-dimensional problem, suppose a layer has N neurons. We apply a convolutional matrix with E nonzero weights. The stride is S, and we pad the input with P zeros at each end. How many outputs (M numbers) does this filter produce?

   Karpathy's formula   M = (N - E + 2P)/S + 1   (5)

In a 2D or 3D problem, this 1D formula applies in each direction.

Suppose E = 3 and the stride is S = 1. If we use one zero of padding (P = 1) at each end, then M = N - 3 + 2 + 1 = N (input length = output length). This case 2P = E - 1 with stride S = 1 is the most common architecture for CNN's. If we don't pad the ends of the input with zeros, then P = 0 and M = N - 2 (as in the 4 by 6 matrix A at the start of this section). In 2 dimensions this becomes M^2 = (N - 2)^2. We lose neurons this way, but we avoid any artificial zero-padding.

Now suppose the stride is S = 2. Then N - E must be an even number. Otherwise the formula (5) produces a fraction. Here are two examples of success for stride S = 2, with N - E = 5 - 3 and padding P = 0 or P = 1 at both ends of the N = 5 inputs:

   P = 0 (M = 2 outputs)   A = [ x1  x0  x-1  0   0  ]
                               [ 0   0   x1   x0  x-1]

   P = 1 (M = 3 outputs)   A = [ x0  x-1  0   0   0  ]
                               [ 0   x1   x0  x-1 0  ]
                               [ 0   0    0   x1  x0 ]

A Deep Convolutional Network

Recognizing images is a major application of deep CNN's, and a major success came with the creation of deeper networks. This page describes a deep network for image recognition. We follow the prize-winning paper from ICLR 2015, which used small (3 by 3) filters. The network has a breadth B of parallel channels in each layer.
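Returning to formula (5), here it is in code: a minimal sketch (not from the book), checked on the cases discussed above.

```python
def output_size(N, E, S=1, P=0):
    """Number of outputs for a length-N input, filter size E, stride S, padding P: formula (5)."""
    assert (N - E + 2*P) % S == 0, "stride does not fit: formula (5) would give a fraction"
    return (N - E + 2*P)//S + 1

print(output_size(6, 3))               # 4 outputs: the 4 by 6 matrix at the start of the section
print(output_size(5, 3, S=2, P=0))     # 2 outputs
print(output_size(5, 3, S=2, P=1))     # 3 outputs
print(output_size(224, 3, S=1, P=1))   # 224: padding with 2P = E - 1 keeps the size
```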
304   Chapter 8. Learning from Data

If the breadth B were to stay the same at all layers, and all filters had E by E local weights, a straightforward formula would estimate the number W of weights in the net:

   W ~ L B E^2    (L layers, B channels, E by E local convolutions)   (6)

Notice that W does not depend on the count of neurons on each layer. This is because the E^2 weights are shared. Pooling will reduce the count of neurons. It is common to end a CNN with fully-connected layers. You see the last layers in AlexNet (Section 8.1). Those dense layers radically increased the count of weights to W ~ 135,000,000.

Softmax Outputs for Multiclass Networks

In recognizing digits, we have 10 possible outputs. For letters and other symbols, 26 or more. With multiple output classes, we need an appropriate way to decide the very last layer (the output layer w in the neural net that started with v). "Softmax" replaces the two-output case of logistic regression. We are turning n numbers into probabilities.

The outputs w_1, ..., w_n are converted to probabilities p_1, ..., p_n that add to 1:

   Softmax   p_j = e^{w_j} / (e^{w_1} + ... + e^{w_n})

Certainly softmax assigns the largest probability p_j to the largest output w_j. But e^w is a nonlinear function of w. So the softmax assignment is not invariant to scale: If we double all the outputs w_j, softmax will produce different probabilities p_j. For small w's softmax actually deemphasizes the largest number w_max.

In the CNN example of teachyourmachine.com that recognizes digits, you will see how softmax produces the probabilities displayed in a pie chart—an excellent visual aid.

Residual Networks (ResNets)

Networks are becoming seriously deeper with more and more hidden layers. Mostly these are convolutional layers with a moderate number of independent weights. But depth brings dangers. Information can jam up and never reach the output. The problem of "vanishing gradients" can be serious: so many multiplications in propagating so far, with the result that computed gradients are exponentially small.

When it is well designed, depth is a good thing—but you must create paths for learning to move forward. The remarkable thing is that those fast paths can be very simple: "skip connections" that go directly to the next layer—bypassing the usual step v_n = (A_n v_{n-1} + b_n)_+.

L layers could allow 2^L possible routes—fast or normal from each layer to the next. One result is that entire layers can be removed without significant impact. The nth layer is reached by 2^{n-1} possible paths. Many paths have length well below n, not counting the skips. By sending information far forward, features that are learned early don't get lost before the output. Residual networks have become highly successful deep networks.
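The softmax formula above in code: a minimal sketch (not the book's code), assuming NumPy. The max-subtraction is a standard numerical-stability detail, not part of the formula itself.

```python
import numpy as np

def softmax(w):
    """Convert n outputs w into n probabilities that are positive and add to 1."""
    e = np.exp(w - np.max(w))      # subtracting max(w) avoids overflow and leaves p unchanged
    return e / e.sum()

w = np.array([2.0, 1.0, 0.1])
print(softmax(w), softmax(w).sum())   # the largest w gets the largest probability; sum is 1
print(softmax(2*w))                   # doubling w changes the probabilities (not scale-invariant)
```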
g 2. Convolutional Neural Neu A fit 305 AS»mpleCNN., One Of the class projects м MIT Wls a _ . to Read |л(|т user begins by drawing multiple coptcs T“,,<ul *t ^h, «„mad, in™ _ learning thu data—creating a conumaum pX^*, 1Ъеа «he тумеп.ш\ uJot probability Ю lhe correct answer (the letter £ w4,1I^U"etM* F(b» «««“ high For learning lo read digit*. » ргпЫЫШЬ lnnr ' lhul too small a training set leads ю frequeot erZT w** chm You Ч“,Л|У or letters, and the test images are not centered the nJ- /11Лу>сл h*d ««e»cd numbers One purfHisc of teachyourmqchlne.com I* rtucXu?^'’‘‘°* emirs appear. The World Championship al the Game of Go A dramatic achievement by a deep conw|ullo<ul netw,»s champion al Go. This is a difficult game pbved <» . in u*“ l°de,e* “* <h“mani world p., J.»» -«one.- *>- color has no open space beside it (left, right, up « d.mn, 1Г * ,roup Ы °* the board. Wikipedia h« an an.rnaled galx tX^X* " ,ПИП AlphaGo defeated the lead.ng player Ue Sedol by 4 games ю 1 щ »!«. ft ^ned on thousands of human game, Jh.s w„ . P™ he neural network was deepened and unproved Google's new version AlplJ, Zero learned to play w.thout any human intenenoon-s.tnply by play.ng against itself. Now it defeated Us former self AlphaGo by 100 to a The key point about the new and better version is that the machine learned by Itself ft wus told the mlfi and nothing more. The first version had been fed earlier games, aiming lo discover why winners had won and losers had lost The outcome from the new approach was parallel to the machine translation of languages. To master a language, special cases from grammar seemed essential. How else to team ail those exceptions ? The translation team at Google was telling (he system what it needed to know. Meanwhile another small team was taking a different approach: Let the machine figure it out. In both cases, playing Go and translating languages, success came with a deeper neural net and more games and no coaching. it is interesting that the machine often makes opening moves thal have seldom or never been chosen by humans. The input to the network is a board position and its history. The output vector gives the probability of selecting each move—and also a scalar that estimates the probability of winning from thal position. Every step communicates with a Monte Carlo tree search, to produce reinforcement teaming.
Chapter 8. Learning from Dau 306 8.3 Minimizing Loss by Gradient Descent This section of lhe final chapter is about a fundamental problem: Minimise a function F(x x ) Cab nlits teaches us that all tbe first derivatives oF /дх, are zero at the minimum‘(when F is smooth). If we have n = 20 unknowns (a small number in deep learning) then minimizing one function F leads to 20 equat.ons OF/дх, » 0. • Gradient descent- uses the derivatives dF/dx, to find a direction that reduces F(x). The steepest direction, in which F(z) decreases fastest, is given by the gradient -V F. Gradient descent / learning rate at, (1) VF represents the vector (dF/dx\........dF/dx») of the n partial derivatives of F. So (1) is a vector equation for each step к = 1,2.3,... and ak is lhe stepsize or the learning rate. We hope to move toward lhe point z* where lhe graph of F(z) hits bottom. We are willing lo assume for now that 20 first derivatives exist and can be computed. We are not willing to assume that those 20 functions also have 20 convenient derivatives d/dx^dF/dx,). Those are thc 210 second derivatives of F which go into a 20 by 20 symmetric matnx H—the Hessian matrix. (Symmetry reduces n2 - 400 to |n2 + « 210 computations.) Those second derivatives would be very useful extra information, but in many problems we have to go without. You should know that 20 first derivatives and 210 second derivatives don't multiply the computing cost by 20 and 210. The neat idea of automatic differentiation rediscovered and extended as backpropagation in machine learning—makes those cost factors much smaller in practice. Backpropagation is a fast way to follow n chain rules at once. Return for a moment to equation (1). The step -s|VF(zk) includes a minus sign (to descend) and a factor a* (to control lhe the stepsize) and the gradient vector VF (containing lhe first derivatives of F computed at the current point zk). A lot of thought and computational experience has gone into lhe choice of stepsize and search direction. We start with the main facts about derivatives and gradient vectors VF. Please forgive me. this linear algebra book is ending with a touch of calculus. Multivariable Calculus Machine learning involves functions F(Z|,..., z„)of many variables. We need basic facts about the first and second derivatives of F. These are "partial derivatives" when n > 1. °,Tvu^x F<* + « F^ + VF + 5 (Az)T/f (Ar) (2) This is the beginning of a Taylor series—and we don’t often go beyond that second-order term. The first terms F(z) + (Az)T VF give ajirar order approximation lo F(x + Az), using information at z. Then the (Az)2 term makes it a second order approximation.
g j. Lo“ by Gradient Descent 307 :/2*“vr = as* *• 1"° "«nt Sit 2 by 2: Example 1 When S is symmetric, the EIJ1| To sec this vector, write out the function Fu*/' fxt ®a ] Г “ bl [i|l aH+d-»J r« * *i T- - • This is an important example! The minimum occurinhere 6m VF = ’ BF/дх, ' 0F/Bxn = ~ 0 = 0 at r* — e-i_ . _ 1 s a = arg min F. (3) « always ЧШ *4 «>>п Г Hands for Uk wbm __ vector! =S a than in the actual minimum value F^ = F(z®) at that point • ’ F«i. » 5(S'‘a)TS(S",e)-aT(S-,a)«l.Tr‘«-eT^->ee_l.TS-». The graph of F is a bowl passing through too at z = 0 and dipping to its minimum al a1 The Geometry of the Gradient Vector V / Start with a function /(z, y). It has n 2 variables. Its gradient is V/ = {dj/dx. дЦду\ This vector changes length as we move lhe point z.у where the derivatives are computed: /а/ dj\ , .. If8/\*. (9f\* Slope in the ~ \\dx) + steepest direction That length || V/|| tells us the steepness of the graph of : « /(z. *). The graph is normally a curved surface—like a mountain or a valley in zyz space At each point there is a slope df/дх in the z-direction and a slope df/dy in the y-duection The steepest slope is In the direction of Vf - grad /. The magnitude of that steepest slope is || V/Ц. Example 3 The gradient is the vector V/ That steepest z = constant has ax + by - was»!»- It is I The graph of a linear function/(z, y)«= az + by is the plane: «az + by. - । 11 ! ,jf partial denvanvrs The length of that vector is - Лч»"<«““”< л“”p““’"‘"'“°"v/- в у ="’v
308 Chapters. Learning frnni Figure 8J: The negative gradient - V/ gives the direction of steepest descent. Example 4 Thc gradient of the quadratic / = ar2 + by2isV/ = | j « ] That tells us lhe steepest direction, changing from point to point. We are on a curved surface (a bowl opening upward). Thc bottom of thc bowl is at x = у » 0 where the gradient vector is zero. The slope in thc steepest direction is '|V/||. At the minimum, V/ » (2«x. 2by) « (0.0) and ilopt - zero. The level direction has z — ax2 + by2 ~ constant height That plane t = constant cuts through the bowl in a level curve. In this example the level curve ax2 + by2 « c is an ellipse. Thc tangent line to the ellipse (level direction) is perpendicular to lhe gradient (steepest direction). But there is a serious difficulty for steepest descent: The steepest direction changes as you go down! Thc gradient doesn't point to the bottom I _____________Л___ steepest direction V/ up and down thc bowl ax2 4- by2 = z flat direction (V/)x along thc ellipse ax2 + by2 = constant V Thc steepest direction is perpendicular to the flat direction but the steepest direction is not aimed at thc minimum point (0,0) Figure 8.6 Steepest descent moves down the bowl in the gradient direction Let me repeat. At lhe point x0.Ste the gradient direction for / = ax3 + by2 is along V/ - (Zaxo.Zbyu). The steepest line through Xo.Jto is 2ru*0(y - J/o) — 2fejfo(r - r0)- But then the lowest point (x, y) = (0.0) does not lie on thc line! We will not find that minimum point in one step of "gradient descent**. The steepest direction does not lead to thc bottom of the bowl—except when b = о and the bowl is circular. Water changes direction as it goes down a mountain. Sooner or later, we must change direction 1<ю. In practice wc keep going in thc gradient direction and stop w hen our cost function J is not decreasing quickly (or starts upward). At that point Step 1 ends and we recompute the gradient V/. This gives a new descent direction for Step 2.
gj Minimizing Loss by Gradient Descent A I 309 Ш example /(*,!/) = Х(«з + *** ^*“4* with Zig-Zag Vf has two components df/dx = x «efrt for (j < b < , . That minimum is reached at the point (Г./ "^"um ‘( exact line search produces a slmple fofW||' [j M- Best of < down thc bowl toward (0,0). Starting^ (^^ ** <*•*) in the slow progress f-----------------:---------------- (M) we find these poi«u 2b I /(zo.ito) (4) Vk Zk If b = 1. you see immediate success in oneиептС .--------------- is perfectly circular with J = + P01" (*» • Vi I»(0,0». The bowl bowl, and it goes exactly through (0,0). Then йж firsi^^l^^’'0" d°WB correct minimizing point where J = 0. ₽ 8ГаЛа“<fcvxnl finds that The real purpose of this example is seen when b is tk. . equation (4) to r (1 - b)/(b + 1). For b this гию is r - 9/ц Rz ь™ "T the ratio is 99/10L ]hc ratio is approaching 1 urf p,^, /|ow|lrt (0 0)JJ virtually stopped when b is very small Figure 8.7 shows the frustrating zig-zag рлют ,Q Q, E is short and progress is very stow. This is a case where the stepsue at in »k*i - zt “ afcV/(Zk) was exactly chosen to minimize J (an exact line search). But the direction of - V f, even if steepest, is pointing far from the final answer (z*, /) - (0.0). The bowl has become a narrow valley when b is small. We are uselessly ernurng the valley instead of moving down the valley to the bottom The first descent step starts out perpendicular to lhe level set. As it crosses through loner level sets, the function /(z,y) is decreasing Eventually Us path is tangent to a level set L. Descent has slopped. Going further will increase /. The first step ends. The next step is perpen- dicular to L. So the tig-zag path took a 90 ° turn. Rg„ 17: Sto- .____. lh, .vseent is faster First-order convergence For b close to 1, the bowl » by . ^nsun. factor al each мер means that the distance to (ж , У > in (4) is (1 - M’/U + b^' For gradient descent and this f. the consergence factor
310 Оицмст 8. Learning fnxn I)aI1 Momentum and the Path of a Heavy Ball The slow zig-zag path of steepest descent is a real problem. Wc have lo improve it. Our model example / = ^ + hai ,wo vanab,CS and ,tS SCCOnd dcr'vatlVc matrix H is diagonal—constant entries f„ = 1 and - b and =0. But it shows the zig-zag problem very clearly when b = = b/1 •» «mall. Key idea: Zig-zag would not happen for a heavy ball rolling downhill. Its momentum carries it through the narrow valley-bumping the sides but moving mostly forward. So we add momentum with coefficient 0 to the gradient (this is Polyak's important idea). The direction z4 of the improved step remembers the previous direction Descent with momentum x*+i =x* — aza with zj, = V/(z*) +/3zfc j Now we have two coefficients to choose—the stepsize в and also 0. Most important, the step to za>i in equation (5) involves x*_|. Momentum has turned a one-step method (gradient descent) into a two-step method. To get back lo one step, we have to rewrite equation (5) as two coupled equations (one vector equation) for lhe stole al time к + 1: Vector equation with momentum x*+i = Xk -azk z*+i - V/(Xk*t) 0zk (6) With those two equations, we have recovered a one-step method. This is exactly like re- ducing a single second order differential equation to a system of two first order equations. Second order reduces to first order when dy/dt becomes a second unknown along with y. 2nd order equation 1st order system Interesting that this b is damping the motion while 0 adds momentum to encourage it. The Quadratic Model When f(x) - |xTSx is quadratic, its gradient V/ = Sx is linear. This is the model problem to understand: S is symmetric positive definite and V/(zk+t) becomes Sik+1. Our 2 by 2 supermodcl is included, when the matrix 5 is diagonal with entries 1 and b. For a bigger matrix 5. you will sec thal its largest and smallest eigenvalues determine the best choices for 0 and the stepsize s—so the 2 by 2 case actually contains the essence of the whole problem. To understand the steps of accelerated descent, we track each eigenvector q of S. Here we are using a key idea from linear algebra (Chapter G): Follow the eigenvectors. Suppose Sq = Aq and xt = c*<7 and zk = dtq and Vfk = Sxk = Actq. Then equatiixi (7) connects the numbers ct and d* al step к to r*+| and dk+t at step к + 1.
и 3. Min'n,izin8 1ли by Gr*1’«’t Descent 311 K»ll<»*inR,he ***’ ~ca-adk Г figenvectorq -Ack+l + dt + 1 = Now we invert the first matrix (-A becomes А)ю (Ъ *ee** descent мер clearly: Descent step multlpU« ЬУ R 1 —a A в-Ха After к steps thc starting vector is multiply bv P* (which is the minimum of f a 1zts °У«'- as sroa|! as possible. Clearly those etgenvalues <0(2^ That eigenvalue A could be anywhere between A * -fc₽end Choose a and 0 to minimize max ре,(Д)|, |еа(Д) For fast convergence to irtu eigenvalues r, and e2 of Я lo be I on the eigenvalue A of S. !,n(S) and AfnnfS). Our problem is: I f<* \ninW - * - *»($)• U seems a miracle that this problem has a beaunful sdunon The opumal » and 3 are Think of thc 2 by 2 supermodel, when S has eigenvalues A^ a 1 „rf Am„ . i: *"(г+я) “d *"(jT7s) "°’ These choices of stepsize and momentum give a convergence rate that looks like the rate in equation (4) for ordinary steepest descent (no momentum) But there is a crucial difference between (10) and (4): b is replaced by y/b. Ordinary / № fl “bl A - r «11 f t ->--t л cct i с га ico / r-\2 11 — v/b\ descent factor \1+Ь/ descent factor \i +vv (ID So similar but so different. The real lest comes when b is very small Then the ordinary descent factor is essentially 1 - 4b. very close lo 1. The accelerated descent factor is essentially 1 - 4y/b, much further below 1. To emphasize thc improvement that momentum bongs, suppose b = 1/100. Then y/b = 1/10 (ten times larger than b). The convergence factors in equation (11) are / 90 \2 ( .9 V Steepest descent ( pjyj- J ss .96 Accelerated descent j » .67 Ten steps of ordinary descent multiply the starting error by 0.67. This is matched by a single momentum step. Ten steps with momentum multiply lhe error by 0.018. Amax/Amin = V* = * b ,hf condition number of S.
312 Chafer 8. Loaming from DaIil Stochastic Gradient Desccn( Gradient descent is fundamental in training a deep neural network. It is based on a sten the form x^1 = xk - st VL(xk). That step should lead us downhill toward the J,, , x* where the loss function L(x) is minimized for lhe test data v. But for large network with many samples in the training set. this algorithm (as it stands) is not successful! It is important to recognize two different problems with classical steepest descent • 1. Computing VL at every descent step—the derivatives of the total loss L wj(h respect to ail the weights x in the network—is too expensive, Thai total loss add lhe individual losses t(x,v.) for every sample v, in the training set—potential! * millions of separate losses are computed and added in every computation of i 2. The number of weights is even larger. So VXL = 0 for many different choices »• of the weights. Some of those choices can give poor results on unseen test data The learning function F can fail to "generalize". But stochastic gradient descent (SGD) does find weights x* that generalize—weights that will succeed on unseen input vectors v from a similar population. Stochastic gradient descent uses only a “minibatch” of the training data at each step В samples will be chosen randomly. Replacing the full batch of all the training data bv a minibatch changes L(x) | £f.(x) lo a sum of only В losses. This resolves both difficulties at once. The success of deep learning rests on these two facta; 1. Computing VL by backpropagation on В samples is much faster. Often 1. 2. The stochastic algorithm produces weights x* that also succeed on unseen data. The first point is clear. The calculation per step is greatly reduced. The second point is a miracle. Generalizing well to new data is a gift that researchers work hard to explain. Stochastic Descent Using One Sample Per Step To simplify, suppose each minibatch contains only one sample v* (so В — 1). That sample is chosen randomly. The theory of stochastic descent usually assumes that the sample is replaced after use—in principle the sample could be chosen again at step к + 1. But replacement is expensive compared to starting with a random ordering of the samples. In practice, we often omit replacement and work through samples in a random order. Each pass through the training data is one epoch of the descent algorithm. Ordinary gradient descent computes one epoch per step (batch mode). Stochastic gradient descent needs many steps (for minibatches). The online advice is to choose В < 32. Stochastic descent is more sensitive to the stepsizes a* than full gradient descent. If we randomly choose sample o, al step k, then the fcth descent step sees only one loss: ~ ж* ~**Vx^(xa,p,)| Vj I = denvative of the loss term from sample v,
313 gj. Minimizing Loss by Grad^nt We are doing much less work ре ... training let). Bul we do not necessary CL ,DPW' ,nMe*« « all inou, f stochastic gradient descent is “semi-conve^^ A tyX'feXe ^^nteneemtheoan Early steps of SGD often converge то^Х^^’ П = 1 Wc admit immediately that later iterations of SGD г °* *’• at thc start changes to large oscillations near ibe rti^'b Coo'tr^n« One response is to stop early. And thereby we , 2/°“ RpUt ” W,U Uwu th“ 7 t owrntting the In lhe following example, the solution z" is u, . approximation x* is outside I, the next aowon™,,. “Г*'1* Л cuncnl Thnt gives semtconvergence-a good sun 8w evenly '! We learned from Suvrit Sra that the simplest examnle L one component x. The tih loss is (, . l(OjX - xp W1th . . .. - .**^2 * iu derivative at(a,x - 6,) It is zero and /, is minmzed at a= b /а а11 /V samples is Цх) = E(«.x^)’ Ьеам^Д^Х^ . * £aifei e.b, В ---------- Important If B/A is the largest ratio bja,. then the tree solution x* к below B/A. This follows from a row of four inequalities: bt В Ла<1,<^Ва' л(Е“<М<в(Е«?) *•- Similarly X* is above the smallest ratio 0/a. Conclusion: If z« is outside the interval / from в/a to B/A, then the hh gradient descent step will move nmwrf that interval I containing x*. Here is what we can expect from stochastic gradient descent: If xi, Is outside /, then хьм moves toward tbe interval 0/a < x < B/A. If Xh is inside I, then so is Za+i. Th* iterations can bounce around inside I. A typical sequence Xo.Xi.Xj,... from minimizing i|Az - b|J by stochastic gradient descent is graphed in Figure 8.8. Ком tee the fan uan and the otcillanng finish. This behavior is a perfect signal to think about early stopping or else averaging. t
314 iterations . -«nl but later iterations oscillate. For these four SGD paths. Figure 8.8: Early iterations v uuickly and then fluctuates instead of converging, the quadratic cost function decreases quit у Overfifting Here is an observation from experience. W<- may not want to fit the training data too perfectly That could be overfilling The function F becomes oversensitive. It memorized everything but it hasn't learned anything Generalization is the ability to give the correct classification for unseen lest data v. based on the weights x that were learned from the training data. I compare overfining with choosing a polynomial of degree 60 that fits exactly to 61 data points Ils 61 coefficients a« »«n will perfectly leam the data. But that high degree polynomial will oscillate wildly between the data points. For test data at a nearby point, the perfect-fit polynomial gives a totally wrong answer. But see Figure 8.9 for an unexpected result from severe overfitting. One fundamental strategy in training a neural network (which means finding a func- tion that кал» from the training data and generalizes well to lest data) is early stopping Machine learning needs to know when lo quit! Possibly this is true of human learning too. The Double Descent Curve This section is a report oo experiments (including ordinary least squares). Usually we are fitting a large problem with a smaller number of parameters. We may have m equations and n < rn unknowns: Ax *= b with a tall thin matnx A. The m measurements b arc not exact and loo few parameters (n = 2 for fitting by a straight line) cannot model the data. To improve our result, we allow more parameters. As n increases, the model begins to improve. "The bias is reduced But nothing is perfect, and the first descent in Figure 8.5 turns upward. The model begins to fail from overfitting. All this is for n < m. the usual situation. It was fully expected that deep karning would become deep suffering, when the num- ber of layers and matrix weights increased too far. The evidence pointed that way. until computers became so fast and powerful that n went past m: The model is overparame- terized Now we have many solutions to choose from—many different weights x minimize the loss function L(x).
315 gj. Minimizing Loss by Gradient m>n m<n ^Ex —int^poiati' Number of Weights Figure 8.9: Thi» is Belkin’» double dev «"«joins J<~" » »««,.шw, ’«'«no;, * J1’*1 «*!» ch.net! n Gradient descent (full batch or numbatch to good weight»! Apparently it doe». ,( Л"? “осЬ*Лс»»У> <nev U> converge The method generalize» well to new data by ch. ** *C°°d <*e4cw *" R*ure ” all the possible solutions. That process is п<и fuli?7"r!2 fMnKul"1> 8°°d “hitum among For the linear least squares equate “ Л“ *«* “ being written added to x. Then ATA(z + ж) М1ц _*/ u,lu‘'<,<ito Ax = 0 could be attoon м n > m (more unknowns than equations. r72“" keeps x in the row space of А H avwd, Xnr x from the nuT """"r 7?' ‘°‘и“"П would increase the norm: ||i + x||« e |lx7^|x|P^nd^ A neat observation by Poggio (arXiv 1912 06190» looks at the condition numh« rf AT A. As n increases, the graph of that number is ven, . r </?> number of , , v numoer is very close to Figure 8.9—the error goes down again for large n. Out of many solutmns („ n > m. Мк^ы other authors show that gradient descent somehow chooses a volution that generalizes well lo new data ADAM: Adaptive Methods Using Earlier Gradients For faster convergence of gradient descent and stochastic gradient descent, adaptive methods have been a major success. The idea is to uir gmdienu from rariirr aept. “Momentum” went one step back, to к - 1. These adaptive methods (ADAM) go all the way back. Memory partly guides the choice of search direction Dt and stepsize aa. We are searching for the vector z* that minimizes a specified loss function L(z). In the step to xa+t = z* - sD*. we are free to choose the ditraion Dt and tieptize a*. Dk = DCVLa.VLk-t........VL,) and a* = a(VLk. VL»-t....VLo). (12) For a standard iteration (not adaptive). Dt depends only oo the current gradient VLk (and sk can be s/y/k). That gradient is evaluated only on a random mimbatch В of the test data. Now, deep networks often have the option of averaging some or all of the gradients from earlier minibatches Success or fatlure wtll depend on D. and »*.
Chapter 8. Learning from 316 Епхжелгш/ mming mrmgez « ADAM have become .he favorites Recent gradients v/Se greater weight .han earlier gradients in both ч and .he step dtrcction Dk. 7^ exponential weights in D and a come from 8 <\ and 0 < 1 J^al valucs arc A a 0.9 and7 = 0.999 Small values of A and fl will effectively kill off lhe moving memory end lose die advantages of adaptive methods. The actual computation of D* and a* will be a recursive combination of old and new; A - /Da-! 4- (1 - - Я1 + О ~ Д) ||V£ (g Д (J3) For several class projects, adaptive methods clearly produced faster convergence. After fast convergence lo weights thal nearly solve VL(x) = 0 there is still t|le crucial issue: Why do those weights generalize well to unseen test data ? Randomized Kaczmarz for Ax — b is Stochastic (radicnl Descent Kaczmarz for Ax = b with random i(fc) - a, xk i№ ai (14) The Kaczmarz idea is simple. Choose row i of A at step k. Adjust x*+J to solve equation • in Ax b. (Multiply equation (14) by a? to verify that a,Tx*+i - b„ This is equation i in Ax - b.) Geometrically. x*+1 is lhe projection of x* onto one of the hyperplanes a*x •• b, that meet at x* “ A~'b. This algorithm resisted a close analysis for many years. The equations ofx 6b ajx bj... were taken in cyclic order I to n. 1 to n.... Then Strohmer and Vcrshynin proved fast convergence for random Kaczmarz. They used SGD with norm-squared sam- pling : Choose row 1 of A with probability p, proportional lo ||a,||a. A previous page described the Kaczmarz iterations for Ax — b when A is N by 1. The sequence xOlX|,xg,... moved toward the interval /. The least squares solution x* was in that interval. For an N by К matrix A. we expect the К by 1 vectors x, to move into a Л'-dimensional box around X*. Figure 8.8 showed this for К — 2. The next page will present numerical experiments for stochastic gradient descent.
g з. Minimizing Loss by Gradient Descent 317 Random Kaczrnarz and Iterated Projections SUPP?hCC t Гм' ГаП“”'П Кж/Лап "* emx - X- °n,° online mine mdw h i'i cho'cn ^^mily at step t (often with impor- tance ob-b.hu« proportional io ||a,|p). To see that projection matnx OiOl/a,1 a,, substitute о, «< a:’into the update мер: Olbogonsl k»pb Tbtmo<cM,«,|) aar„« Jl,™ norm ||xfc - x || decrease» steadily, even it the com function ||Лх* - b| doc* not. Bul convergence in usually slow! Suohmer-Venhynm eMimate the expected слот: E l||x* - x || I < H - — j ||х» - ж,||,1 e»= condition number of Л. (16) Thi» Й slow compared lo gradient descent (there c2 h replaced by r. and then y/r with momentum added). But (16) is independent of the size of A: attractive for large problem*. Our experiments converge slowly! The 100 by 10 matnx A is random with c * 400. The figures show random Kaczmarz for 600.000 steps We measure convergence by lhe angle 0k between x* - x* and the row a, chosen al step k. The error equation (IS) is ||x*>i - x*||2 (1 -соаа0й)||х* -x*||’ (П) The graph shows that those numbers 1 - сои2 Oy are very close ю I: slow convergence But the second graph confirms that convergence doe* occur. The Strohmcr-Venhynin bound becomes E[coa2®»] > 1/c2. Our example matnx has 1/c2 close to 10'* and experimentally con2 0* =s 2 • 10"*, confirming that bound. Figure 8.10. Convergence of the squared error for random Kaczmarz. Equation (17) with 1 - сов2 0,, close to 1 - 10's produces the slow convergence in lhe lower graph
Chapter 8. Learning from Data 318 Product of Matrices ABC: Which Order ? ^hblv efficient improvement on computing each OF/dz, scp. Backpropagauon » an «««*«*«* thc computations can make such arately. At fir* •<**"“ “ £ cnd (thc j^bter might say) you have to compute derivatives an enormous difference B||t reordcring for each small step in faJ Js than N times lhe cost of one derivative dF/dxx. d,ffere„„auo„ mat, ЛО). X, m VF Those are number i„ , bus Шау ле m.mces .» top ^.ag-wben each ,a,cr „ muluplkaao« son gives lire to~< d»h («« below). It is beautiful . t three matrices ABC. thc associative law offers ?“ «--- - ь- AB first or BC first ? Compute (AB) C or A(BC) 1 The result is the same but the number of ind.vidual multiplications can be very different mein. Л is » b, o. end В is » b, p. and C « p by AB = (m x n) (n x p) has mnp multiplications First way ^AB)C - (m x p) (p x q) has mpq multiplications BC = (n x p) (p x q) has npq multiplications Second way = (m x nj (n x g) has mnq multiplications So the comparison is between mp(n + q) and nq(m + p). Divide both by mnpq: The first way is faster when - + — is smaller than------f- —. q n m p Here is an extreme case (extremely important). Suppose C is a column vector: p by 1. Thus q = 1. Should you multiply BC to get another column vector (n by 1) and then A(BC) to find the output (m by 1)? Or should you multiply matrices AB first ? Thc question almost answers itself. The correct A(BC) produces a vector at each step. The matrix-vector multiplication BC has np steps. The next matrix-vector multiplication A(BC) has mn steps. Compare those matrix-vector steps to the cost of starting with the matrix-matrix option AB (mnp steps!). Nobody in their right mind would do that. In the application of A(BC) lo the chain rule, we start with the last layer C is the derivative of the last Fl and we go back to the first layer (A is the derivative of Fj).
8.3. Minimizing Loss by Gradient Dewem 319 11,6 Multivariable Chain Rule Suppose the vector v with n components v, is a fUIKtlon .. u.. The derivative of each u, with respect to each ’ vector “ *llh c°mponcnts (often called the Jacobian matrix J) 1ц ^anr lnto *** dcnvatlve matrix dv/du derivative matrix dw/dv of the vector functiX» --Г <П°* Ьу П) 5,тИяг1У-lhc components of v = (v,....M is a p by n matnx " ....*lth t0 the dw dv dwi dv\ dwp . tbt (18) Each w. depends on the v’s and each Bj depends oo the us. Therefore each function wt,..., wp depends on ult..., и„. The chain rale aims to find the derivatives dw,/duk. And the rule is exactly a dot product: (row i of dw/dv)-(column к of dv/du). 3w, = 3w, Эщ + + =/dw, dw,\ /dvj dv„\ duk &vi duk dv„ duk \dv}'‘“'dv^) \duk'’",dui,) Multivariable chain rules dw / dw \ / dv\ Multiply the matrices in (18) du = \ dv J \ du) (19) The key to matrix calculus is linear algebra—shown again in this chain rule. Problem Set 8.3 1 The rank-one matrix P = aoT/aTa is an orthogonal projection onto the line through a. Verify that P2 = P (projection) and that Px is oo that line and that x — Px is always perpendicular to a. (Why is aTz = aTPx ?) 2 Verify equation (15) which shows that Xk+i — x* в exactly P(xk - x ). 3 If A has only two rows at and aj. then Kaczmarz will produce the alternating projections in this figure. Starting from any error vector eo = xo - x*. why does ek approach zero ? How fast—if you know that angle в at 0 ? »> -fli
320 4 Chapter 8. Leaning from Dau Suppose we want to min.mize F(x,y) = V*+ (* " J1* «^nl minimum • al (x’.y*) = (0.0). Find thc gradient vector VF at thc starting pojn| (to <ai) = (I. 0- fr* ,u*' 4?ги^и'п* descent (no/ stochastic) with stepsize в = 1. where is (xi. Щ) • In minimizing F(x) = ||Ax - Ы12. stochastic gradient descent with minibatch size В = I will solve one equation ajx = b, at each step. Explain thc typical step for minibatch size В — 2. (Experiment) For a random A and b (20 by 4 and 20 by 1). try stochastic gradient descent with minibatch sizes В = 1 and В = 2. Compart thc convergence rales— the ratios r* = |[x4+i - хф(|/||х* — X*||. (Experiment)Try thc weight averaging proposed in arXiv: 1803.05407on page 365. Apply it to lhe minimization of || Ax - b\|2 with randomly chosen A (20 by 10) and b(20by 1). and minibatch В = 1. Do averages in stochastic descent converge faster than tbe usual iterates xk ?
8.4. Mean- Variance, and Covariance 8.4 Mean, Variance, and Covariance 321 *1 to z. are positive numbers adding to 1. The mean is simple and we will start there Ri.t» ~ We may have *, rexull, *— expected resulls (expec/ed rala«) (nxu (ulure итак. Sample values Five random freshmen have ages 18.17.18.19.17 Sample mean 5 (18 + 17 + 18 + 19 + 17) = 17.8 Probabilities Expected age E [x] of a random freshman = (0.2) 17 + (0.5) 18 + (0.3) 19 = 18.1 Both numbers 11.8 and 18.1 are correct averages. The sample mean starts with N samples Xi to XN from a completed trial. Their mean is the average of the .V observed samples: 5(18 + 17+ 18+19+17) = 17.8 Thc ages in a freshmen class are 17 (20%), 18 (50%). or 19(30%) „ 1 1 Sample mean m = и = — N The expected value of x starts with the probabilities pt Expected value m = E(z] = pi*i + Paxa + • • • + P»x„. (I) (2) This is p • x. Thc number m = Ejxj tells us what to expect, rn = p tells us what we got. A fair coin has probability po = | of tails and pi = | of heads. Then E[r] = (5) 0+| (1). The fraction of heads in N coin flips is lhe sample mean. Thc “Law of Large Numbers” says that with probability 1. the sample mean will converge to its expected value E[r] = 5 as the sample size N increases. This does nor mean that if we have seen more tails than heads, the next sample is likely to be heads. The odds remain 50-50. The first 1000 flips do affect the sample mean. But 1000 flips wll not affect its limit— because you are dividing by .V -» oc. Note Probability and statistics are essential for modem applied mathematics *ith mul- tiple experiments, the mean m is a vector. The variances/covariances go mto a matnx. Probabilities p(<) change with time in a master equation
322 Chapter 8. Variance (around the mean) The variance <ra measures expected distance (squared) from the expected mean 1-1 I The sample variance № measures actual diMance (squared) from thc actual sample nic The square nx« is the standard deviation a or S. Aller an exam. 1 email the results n tmrt 5 to the class. I don’t know thc expected m and <r* because 1 don’t know the probabilif Pt Ю pioo for each score. (After GO year*. I still have no idea what to expect.) The distance is always/гот the mean—sample or expected. We arc looking for the si of the "spread” around thc mean value x = m. Start with N samples. Sample variance S’ = [(®i - m)a + ... + (xN - m)’j (3) Thc sample agesr = 18,17,18,19,17base mean rn = 17.8. That sample has variance 0 7 • S’ = J [(-2)’ + (-.8)’ + (.2)’ + (1.2)’ + (—.8)’] = 1(2.8) = 0.7 Thc minus sign* disappear when we compute squares. Please notice ! Statisticians divide by ;V - I = ( (and not N = 5) so thal S’ is an unbiased estimate of o’. One degree of freedom is already accounted for in thc sample mean. An important identity comes from splitting each (x — m)’ into x2 - 2mx g. m2: sum of (x, - rn)’ (sum of x’) - 2m(sum of x<) + (sum of rn’) (sum of x2) - 2ni(/Vm) + Nrn2 sum of (x< — rn)’ = (sum of x2) — Nm2. This is an equivalent way to find (x( - m)2 + • • • + (xN - m2) by adding rf q.... g. j.3 Now start with probabilities p, (never negative I) instead of samples. We find expected values instead of sample values. The variance a2 is thc crucial number in statistics. Variance <т3 = E [(x - rn)’] =P1(r, - m)’ g.... g. Рп(Жп т)з | (J) Wc arc squaring thc distance from the expected value rn = E[r], We don’t have samples, only expectations. We know probabilities but wc don’t know the experimental outcomes' Equation (3) for thc sample variance № extends directly lo equation (6) for the variance o’: Sum of p,(xt-rn)2 = (Sum of p,x’)-(Sum of p,x,)2 or <r’ = E(®’] - (E[x]a) (6) Example 1 Coin flipping has outputs x = 0 and 1 with probabilities po = Pi = |. Mean rn = j(0) + ](1) = ] = average outcome = E|x] Variances = |(0 —i) + j(l —}) = 1 g-1 = 1 = average of (distance from m)2 For average distance (not squared) from rn. what do we expect ? E[x - m] is zero!
К.4. Mean. Variance, and Covwwncc 323 Find thc variance n2 of the The probabilities of ages r. Example 2 Solution «js» x, 17.18.19 were p, = 0.2 and 0.5 and 0 3. -fhc expected value was m “ 18.1. Thesariawceuveslhinc same probabilities; <r’ - (0Д)(И - «Л)’ + (O.5)(1B - IMf + (03)110 - 18.1)’ . (0.2Ц1.21) + (0.51(0.01) + (03)(0.81) . 0.49 Then в = 0.7. measures ,*1C 4PrcaJ °1 *8. V* around E|xj. weighted by prubabilitievO.2.0.5,03. Continuous Probability Distrilnitions Up to now we have allowed for ages 17 19. __________ „ stead of years, there will be loo many р^мЫе ares Г 17ond 20^8 continuum of possible ages. The^^^ ? *7 change to a probability distribution p(x) for a^^f Ж!”* The best way to explam probab.|1Iy dnmbutxxn n to gne you twn examples They wi|| be the uniform distribution and thc normal distribution The first fund Д) is easy The normal distribution is all-important. Uniform distribution Suppose ages are uniformly distributed between 17.0 and 20.0. All those ages are equally likely . Of course any one exact age has no chance at all I. ---------- ----------------.. . r x . п I orX - 17 4. V5; ------- —i agr less than < x: X < 17 won't happen x < 20 will happen There is zero probability that you will hit the exact number z ’ IT. But you can provide lhe chance F(z) that a random freshman has The chance of age less than x « 17 is F( 17) 0 The chance of age less than z 20 is F(20) 1 The chance of age less than x is F(x) = |(x - 17) 17 to 20 : F goes from 0 to 1 From 17 to 20. the cumulative distribution F(x) increases linearly where p is constant. You could say that p(x) dz is the probability of a sample falling in between z and x + dz. This is “infinitesimally true": p(z)dz is F(x + dr) - F(x). Here is calculus: F = Integral of p Probability of a < x < b = / p(x) dz = F(b) — F(a) (7) F(b) is the probability of z < b. Subtract F(a) to keep x > a . That leaves a < r < b.
324 Chapter H. Learning fnwi) Ьлц Mean and Variance of p(xj cumulative F(x) =. . probability that a sample !• below x F(x) = э(* “ 17) H "pdf P(z) . derivative of F probability that M sample is near ® _________ p(®) = -7- dr Figure К 11; /’(.г) is the cumulative distribution and its derivative p(x) = JF/d® is t|K probability density function (pdf). The urea up lo ж under the graph of p(®) is F(®). What arc (he mean in and variance o’ for a probability distribution 7 Previously we added juj:, to gel lhe mean (expected value). With a continuous distribution wc Integrate xp(x); Mean m E[x) = x p(a>) dx Variance a1 = E [(« - m)a] = I p(x) (x - m)a dx 3 When ages are uniform between 17 and 20, lhe mean is m ~ IN.5 with o2 = That is u typical example, and here is the complete picture for a uniform p(x), 0 to a. Uniform ford < x < a Density p(x) = — Cumulative F(z) = - a a Menn in a J x p(r)dx = - Variance 9a = Г - (ж - -Л dx e — J ' 2 J «\ 2/ 12 l or one random number between 0 and I (mean q ) lhe variance is a2 (K) Normal Distribution: Bcll-shttpcd Curve I he normal distribution is also called (he "Gaussian" distribution. It is the most important of all probability density functions р(ж). Hie reason for its overwhelming importance comes from repenting rm experiment and averaging lire outcomes. The experiments huve their own distribution (like heads and tails). The aventge approaches a normal distribution.
и 4. Mean. Variance, and Covwwnte 325 Figure 8.12: The standard norma) distribution p (x) ha, mean m - 0 and a - 1. The “Mandard normal distribution" p(x) is symmetric around x = 0. ю its mean value ii rn “ О- I* •» chosen to have a standard variance a2 1. It ia called N(0,l). I he graph ol p(®) = e 12 is the bell-shaped curve with variance а2 = I. By symmetry thc mean i« rn 0. The integral for a2 uses the idea in Problem 11 to reach 1. Figure 8.12 shows a graph of p(x) for N (0,<r) and also its cumulative distnbution F(x) “ integral of p(x). From F(x) you see a very important approximation for opinion polling: 2 'Die probability that a random sample falls between -a and a is F{a) - F(-tr) as -. 3 Similarly, thc probability that a random x lies between -2a and 2a ("less than two standard deviations from lhe mean") is F(2a) - F(-2a) « 0.95. If you have an experimental result further than 2a from the mean, it is fairly sure to be not accidental. The normal distribution with any mean rn and standard deviation a comes by shifting and stretching the standard N (0,1). Shift x to x - m. Stretch x — m to (x - m)/a. GauMlan density p(x) = 1 _ т)а/2<уэ (g) Normal distribution N(m, <r) <r s/2rr The integral of p(x) is F(x)—lhe probability that a random sample will fall below x. There it. no simple formula lo integrate e“* ^2, so I' (x) is computed very carefully.
326 СЬчмег 8. Learning faxn DWa N Coin Elips and N -> Exsmple3 SupP*"*xЬ 1 ** . ~. i _ ifn2 . i, na The variance is -5(1) + |(-1)2 = 1. The mean value is m = jt*> + jl A»> = (xi + •" + 1Ъс ,ndcPendcni т. The key queMhm “**“ e‘* ' ‘ by д’. The expected mean of AN is still zero, зге ±1 and we air k awrage approaches zero with probability 1. ,„e b. or Uge numberv>riance ,, , How fast docs .4.v approach zero. . 4. = .V — = — By linearity <r^. = д’1 + Д’’ № since a2 = 1. Here are the results from three numerical tests: random 0 or 1 averaged over trials. [4B I s from .V = IOC! [5035 i's from .V = 10000] (19967 l’s from N = 40000]. The standardized X = (x - m)/o = (As - 5) /2v/X was (-.40] (.70] (-.33]. The Central Limit Theorem says that the average of many coin flips will approach a normal distribution. Let us begin to see how that happens: binomial approaches normal. The “binomial" probabilities po,..., PN TOunt die number of heads in W coin flips. For each (fair) flip, thc probability of heads is |. For N = 3 flips, the probability of heads all three times is (|)’ = |. The probability of heads twice and tails once is from three sequences HHT and HTH and THH. These numbers | and | are pieces of /1 + I)3 = l + ’ + ’ + l = l. The average number of heads in 3 flips is 1.5. V 2 2' 8 8 8 8 1 3 . .3 3 6 3 Mean m = (3 heads)- + (2 heads)- + (1 bead)- +0= - + - + - = 1.5 heads Ь о о о о о With .V flips. Example 3 (or common sense) gives a mean of m = E XiPi = heads. The variance a2 is based 00 the squared distance from this mean N/2. With N = 3 the variance is a2 — j (which is X/4). To find a2 wc add (x< — m)2 p, with m = 1.5: a2 = (3-1.5)’1 + (2-1-5)» I + (1-1-5)’| + (0-1.5)’ | = ~*3 + 3 + 9 = ? о о о о oz 4 For any Л’. the variance for a binomial distribution is aj, = N/4. Then as = VN/2. Figure 8.13 shows how the probabilities of 0.1.2.3.4 heads in N = 4 flips come close to a bell-shaped Gaussian. Thai Gaussian is centered at thc mean value m = N/2 = 2.
Й.4. Menn. Variance. and Covar1ancc To reach the .standard Gau^,^., 327 If x is the number of heads in V n u **> variance i > . r biW by ll> me» m , N/2 м V Jt --------------to P”*1** the standard X • Shifted and scaled Subtracting m Is “centering" or “detrmdi ' ? Dividing by cr Is "normalizing" «г “« и"' "**" ** X h иго- L—~v.ri>rKe It IS fun to sec the Central I imir tt^Z-;----- X = 0. At that point, the factor e~x’/i «271 П?Ь‘ MM*CT 11 ccntCT P°inl flips is ff2 = M/4. The center of the bell-sha^ \kno*,hat *** vanani-c for ti coin What is thc height at the ^n<er of '**** distribution)? For X = 4. the probabilitiesf»n , о ',°П А» •e Px «he b.nomial Polities forO. 1,2,3.4 heads come from (1 + I)4. i + lV- 1 . 4 6 4 ! 2 2/ 16 16+1б+1б + 1б = 1' (9) g Center probability = — 16 p(x) = l ₽*/1*>/27тХ/'«'х ---------7 ' ' uniform-----------------------------------------z \ J binomial / approaches \ M heads area =1 1 1 Z G’ussian \ ti flips —i------------»-—L a 0 1 Af=O X/2 ti Figure 8.13: The probabilities p = (1,4,6.4,1)/16 for the number of heads in 4 flips. These p. approach a Gaussian distribution with variance a2 = X/4 centered al m = X/2. For X, the Central Limit Theorem gives convergence to the normal distribution N(0.1). The binomial (| -t-1)* tells us the probabilities for 0,1, , X ty»rk The center term is the probability of beads. - tails___________- ' ____________________________________2 2 2N (Х/2)! (Х/2)! ForX = 4. those factorials produce 4!/2! 2! = 24/4 = 6. For large .V. Stirling s formula x/2ttN(N/e)s is a close approximation to XL Use this formula for X and twice for .V/2: Limit of coin flip __ 1 s/2x.V(X/e)* _ y/2 1 Center probability ₽,v/2 "" 2* яХ(Х/2е) * “ ~ y/bta' The last step used the variance a1 = X/4 for coin-tossing. The result 1/'Jlxa matches the center value (above) for the Gaussian. The Central Limit Theorem is true: The centered binomial distribution approaches the normal distribution p(z) as N —» 00.
328 Chapter 8. Learning from Data Covariance Matrices and Joint Probabilities Linear ateebn enters when we mn Af different experiments at once. We might measure аве and height (Af = 2 measurements of .V children). Each experiment has Ha own mean value So we have a vector m = (rm.m,) containing two mean values. Those could be wm/i/r mrons of age and hc.ght. Or m. and rrr2 could be expec/erf ro/trw of age and height based on known probabilities. A matnx becomes involved when we look al variances. Each experiment will have a sample variance 5? or an expected o? » E[(r, -m.)-j based on thc squared distance from ils mean Those M numbers of............will go on the main diagonal of the "variance-covariance matrix’’. So far we have made no connection between the M parallel experiments. They measure different random variables, but thc experiments are not necessarily independent! If we measure age and height for children, the results will be strongly correlated. Older children are generally taller Suppose thc means rn. and rn* arc known. Then rrf and arc thc separate variances in age and height. The new number Is thc covariance <xo*, which measures the connection of each possible age to each possible height. Covariance <r.* = E [(age - mean age) (height - mean height)]. (j |) This definition needs a close look. To compute <r«*. it is not enough to know the probability of each age and lhe probability of each height. We have lo know thc joint probability p„* of each pair (age and height) This is because age is related to height. Pah probability that a random child has age = a and height = h: both at once pl} probability that experiment 1 produces m and experiment 2 produces pj Suppose experiment I (age) has mean rn(. Experiment 2 (height) has its own mean rna. The covariance between experiments 1 and 2 looks at all pairs of ages r, and heights y, We multiply by thc joint probability p4 of that age-height pair. Expected value of (x - rrii)(y - rna) Covariance <7la = £ £ ргДхг - mx)(Vi - ma) (12) alii, j To capture this idea of “joint probability p,/’ we begin with two small examples. Example 4 Hip two coins separately With I for heads and 0 for tails, the results can be (I, I) or (I. I)) or (0,1) or (0.0). Those four outcomes all have probability (I)2 = 1 lor independent experiments we multiply probabilities: The covariance is zero. pt) - Probability of (i, J) = (Probability of t) times (Probability of j).
8.4. Mean. Variance. and Covenant 329 Example 5 Glue the coins together (1,1) and (0,0) Those have probabtht^ i'X'i' £"* **У °“'У рохмЫте, are (1,0) and (0.1) won l happen because the coin, P,u m IQ*e®e* both heads or both tails. Joint probability matrices for Examples 1 and 2 1 and P2 Let me stay longer with P. to show it h, (heads, tails). Notice the row sum, p>. p, colunJi' Probability matrix matrix shows the •heads, heads) and (а,, й) . wm‘ Pi. Pi and the total sum - 1. first \ coin / Ptl + Р» = P2 4 entries add to 1 Pit P|2 Pit Pn (second coin) column sums Pt p* Those sum, p^ and />,.* are the marginals of the Join, pmbdahty matnx Pt P> = ₽“ + Pta - chance of heads from coin 1 (coin 2 can be heads or tatls) Pt -Ptt + P11 -chance of heads from coin 2 (coin lean be heads or tads) Example Hhowed independent random variables Every probably p„ equal, p. times p, (| times 5 gave ptJ - tn that example). In this сак the covariance al} will be aero Heads or tails from the first com gave no information about the second coin Zero covariance <7la for independent trial. «Л л= ~ diagonal covariance matrix V Independent experiments have <7 и 0 because every ptJ equals (p, )(p2) in equation (12). •Ha 52^)^И*‘“’"|)<»|-П1»)“ [&•)(*•-"«>'] p£(p,)(lb-"u)] -10]|0|. Example 6 The glued coin, show perfect correlation Head, on one means head, on the other. Thc covariance ffu moves from 0 to <T| times <ra This is the largest possible value of tfij. Here it is (|)(|) - <rla = (J), as a separate computation confirms: Means = - 2 ^’ = 5 (*-5) (’“I) + 0 + 0 + l(°"5)(°"l) Heads or tails from coin 1 gives complete information about heads or tails from coin 2: Glued coins give largest possible covariances Singular covariance matrix: determinant = 0 ' ci»« - <rj <7i<ra <7t<r3 <rj
330 Chapters. Learning from Dau Always <r3<r2 > (<Иа)2 Thus “ betnWn ‘Т,<Т2' Thc rnatri* V is positive definite (or in this singular case of glued coms. V is positive semidefinite). Those are important facts about all M by Л/ covanance matnees V for M experiments. Note that the sample covariance matrix S from A trials is certainly semidefinite. Every sample A' = (age. height) contributes to the sample mean X = (m.,пц). rank-one term (X, - X)(X - X)T is positive semidefimte and we just add to reach the matrix S. No probabilities in S. use the actual outcomes: The Covariance Matrix V is Positive Semidefinite Come back to the expected covariance »u between two expenments 1 and 2 (two coins); on = expected value of |(ошрш 1 - «гол 1) times (output 2 - mean 2)] (Пэ - E E Pa (** - m«) (Vi “ ma). Thc sum includes all ij. (’4) p,. > 0 is the probability of seeing outputs x, in experiment 1 and y, in experiment 2. Some pair of outputs must appear. Therefore the № joint probabilities p,} add lo 1. Total probability (ail pain) is 1 $2 " L (15) all i.J Here is another fact we need, fix on one particular output r, in experiment 1. Allow all outputs У) in experiment 2. Add the probabilities of (x,. pi). (x,. pj),..., (x,, t/„); n Row sum р,- of P £ Pij ~ probability p, of x, in experiment 1. (16) Some уj must happen in experiment 2! Whether the two coins are completely separate or glued, we get the same answer | for the probability рн “ Рни +Рнт that coin 1 is heads: (separate) Рнн + Pht - | | ~ (glued) Рнн + Рит = + 0 = i. That basic reasoning allows us to write one matrix formula that includes the covariance <rl2 along with the separate variances <rf and a2 for experiment 1 and experiment 2. We get the whole covariance matrix V by adding the matrices Цу for each pair (a. j): Covariance matrix _ Г (x, - mi)2 (x, — m, )(y, - mj ll V = sum of all *jj — P,] |<r, - mi)(yj — m2) (p, - mj)2 J
8.4. Mean. Variance. and Covari^e 0«1bedi4l.»al.1hi.i.tqo<l„M « ,eum» Ite «ta, «шм. *<•№. „ by using equation (16). Allowing all} jUM ? 1 Аи* ln detail how we get Ц, = — the probability Pi 0(,. in Mperinlen| Vit - > . - mJ2 = V (nn. . ..lift jr-,р,оЬЛ,,“> H|| | please look al that twice. It j( the . one formula (17). The beauty of that formula bT? *!“* cwwiancc тм,пх h> v„ h„di.s..n.l«n«k,p,,(Ij.„,). >„ „ N„rmi|1 Jo ( Thai matrix V(J ha. rank l.EquuionUTJmvl [(x(-mi)(^-m2) (gj-m,)2 ] [ю-т,]1 Ei-ery matrix рциит it positive semidefinite. So the whole matrix Г (thc шт of those rank 1 matnces) is at least semidefinite—and probably I’ is definite. (18) (19) The covariance matrix V is positive definite unless the experiments art dependent. Now we move from two variables r and g to Л/ viable» like age-height-weight The output from each tnal is a vector X with Л/ components (Each child has an agc- height-weight vector X with 3 components.) The covariance matrix V ix now Л/ by Af. The matrix V is created from the output vectors X and their average У = E ,’XJ: Covariance V=Ef(X-X) (X-Х)7 (X^-m,) (20) matrix i_________________________________________________ Remember that XXr and X XT = (column)(row) are M by M matnces. For Af = 1 (one variable) you see that У is the mean m and V is the variance a3. For Af -2 (two coins) you see that Г is (mt.m2) and V matches equation (7). Thc expectation always adds up outputs limes their probabilities. For age-height-weight thc output could be X = (5 years, 31 inches. 4S pounds) and its probability is ps.31.4e • Now comes a new idea. Take any linear combination с1 X = ci Xt + • • • + см Хм- With c = (6.2.5) this would be cTX = 6 x age + 2 x height + 5 x weight. By linearity wc know that its expected value E [cTX] is cTE [Xj = с1 X: E [cTX] = cTE (XJ - 6 (expected age) + 2(expected height) + 5 (expected weight).
332 Chapter 8. Learning from Data More than the mean of cTX. we also know iu variance tr2 - cT Vc: Variance of cTX = cTE ^(X - X) (X - X) j c — cT Vc Now lhe key point: The variance of cTX can never be negative. So Vc > о New proof: The covariance matrix V is positive semidefinite by the energy test cT Vc > o' Covariance matnees V open up the link between probability and linear algJbn>. V equals QAQT with eigenvalues A( > 0 and orthonormal eigenvectors qt to Diagonalizing the covariance matrix V means finding M independent experiments as combinations of tbe original M experiments. The Covariance Matrix for Z = AX Here is a good way to see oj when z = r + y. Think of (x,y) as a column vector X. Think of the 1 by 2 matrix A = [ 1 1 ] multiplying that vector X = (x. y). Then AX is lhe sum z = x + y. The variance a2 goes into matrix notation as •:-(* ч[Й. S’lR] •*“* (2J) Now for the main point. The vector X could have Af components coming from А/ experiments (instead of only 2). Those experiments will have an M by А/ covariance matrix Vx. The matrix A could be К by M. Then AX is a vector with К combinations of the Л/ outputs (instead of one combination x + у of 2 two outputs). That vector Z “ AX of length К has a К by К covariance matrix V% . Then the great rule for covariance matrices—of which equation (22) was only a 1 by 2 example—is this beautiful formula: The covariance matrix of AX is A (covariance matrix of X) AT; The covariance matrix of Z = AX Vz = AVXAT (23) To me. this neat formula shows the beauty of matrix multiplication. 1 won’t prove this formula, just admire it. It is constandy used in applications. The Correlation p Correlation p,v is closely related to covariance a^. They both measure dependence or independence. Start by rescaling or "standardizing" the random variables x and у The new X = х/<гж and У = y/av have variance <r2x = trj. = 1. This is just like dividing a vector t> by its length to produce a unit vector ю/||ю|| of length 1. Thc correlation of x and у is the covariance of X and Y. If the original covariance of x and у was then rescaling to X and У gives correlation pxy = <тяу/ая<ти. Correlation pxy =—— ж covariance of — and — tr* <rv Always —1 < p*v < 1
333 8.4. Mean, Variance. and Covariance problem Set 8.4 1 2 3 4 5 6 7 8 9 10 11 If all 24 samples froin sample mean p lhc ” 4 (hc _ ° finance & Wk » < * ж 20» * hat arc thc Add 7 to every outputWhat J *f ' = » « 21.12 tunes each 7 new sample mean, thc neu. - PP^ 10 °* n*an .k- “*’** «petted mean. and inancc 9 are thc We know: | of all lnte„en „. . . **lhc *• *«n»ce 7 What fraction of integers will к, ,'*'мЫе b7 3 and 1 o( lnu 8 *'П ** ^visible by 3 o, 7 t““?sen *" d,*‘“ble by 7. Suppose you sample from the number, , , " Whai k th‘hC PrDbab,l"le' Po «о Л that the 1^?^ ₽robab,l"*‘ 1/1000. What the expected mean m of that J* , **" °* 7ой' «mple is 0.......97 Cl . , •“ *1»M is ns Variance n3 7 Sample again from 1 to 1000 but lew м .h. i square could end with z = o, 1 4 j 6 d'«« <* 1* sample ^uarrd T^t finl dig.i (71 I 'iS)'ww*Т-•*К*I»«» variances11 Rtme„to S « “»-P* Equation (4) gave a second equivalent form fw (Ihe vanance using sample,) S’ = ^Ti,umof(xi-m),-~-y[(lumofzJ)-WmJ]. Verify lhe matching identity for the expected vanancc a3 (using m - £ p, r.): <ra =- sum of (z, - m)’ = (sum of p, *J) - m2. Computer experiment: Find the average Aioooow of a million random 0-1 samples I What is your value of the standardized vanable X = (Ajv - |) /2s/?f 7 For any function /(z) the expected value is E[/| = £> /fo) or f p(z) /(z)dr (discrete or continuous probability). The function can be i or (z - m)’ or z2. If the mean is E[z] = m and the variance is E[(x - m)2] ® o3. what is E[z2) 7 Show that the standard normal distnbution p(z) has total probability J p(r) dr = 1 as required. A famous trick multiplies Jp(x)dr by fp(ii)dy and computes thc integral over all z and all у (—ос to oo). The trick is to replace dr dy in that double integral by r dr dO (polar coordinates with r2 + g2 = r2). Explain each step: oo oo oo 2» 2тг Zpfz) dr Ml/) dy= /J e'^^dx dy= J —oo —oc —co 0 = 0 г -к Ir dr de = 2* = 0
A1 The Ranks of AB and A + В This page establishes kes facts about ranks: «ben we multiply matnees. the rank cannot increase You vu ill see this by looking at column spaces and row spaces. And there is one special situation alien the rank cannot decrease. Then you know thc rank of AB. Statement 4 will be tmporunt when data science factors a mains into UV or CR. Here are five key facts in one place mequal.ties and equal.tics for the rank. 1 Rank of AB < rank of A Rank of AB < rank of В 2 Rank of A + В < (rank of A) + (rank of Bi 3 Rank of A* A = rank of AAT = rank of A = rank of AT 4 If A is m by r and В is r by n—both with rank r—then AB also has rank r Statement 1 insolves the column space and row space of AB: C(AB) is contained in C(A) C((AB)T) i» contained in C(BT) Every column of A В is a combination of the columns of A (matrix multiplication) Every row of AB is a combination of the rows of В (matrix multiplication) Remember from Section 1.4 that row rank = column rank Wc can use rows or columns. The rank cannot grow when we multiply AB Statement 1 in the box is frequently used. Statement 2 Each column of Л + В is lhe sum of (column of A) + (column of B). rank (A + B) < rank (A) + rank (B) is always true It combines bases for C(A) and C(B) rank (A + B) - rank (A) + rank (B) is not always true. It is certainly false if A - В I. Statement 3 A and ATA both have n columns. They also have the same nullspace. (See Problem 4.1.9.) So n - r is the same for both, and the rank r is rhe same for both. Then rank(AT) > rank(ATA) - rank(A). Exchange A and AT to show their equal ranks. Statement 4 We are told that A and В have rank r. By Statement 3. A1A and BBT have rank r. Those arerby r matnees so they are invertible. So is their product A1 ABBT. Then r = rank of (ATABBT) < rank of (AB) by Statement 1 : AT, BTcan’t increase rank We also know rank (AB) < rank A = r. So we have proved that AB has rank exactly r. Note This does not mean that every product of rank r matrices will have rank r. Statement 4 assumes that A has exactly r columns and В has r rows. BA can easily fail. В = [ 1 2 —3 AB has rank 1 But BA is zero! 334
A2   Eigenvalues and Singular Values: Rank One

A rank one matrix has the simple form A = xyᵀ. Its singular vectors and its one nonzero singular value are incredibly easy to find: u₁ = x/||x||, v₁ = y/||y||, and σ₁ = ||x|| ||y||. You see immediately that A = xyᵀ = σ₁u₁v₁ᵀ. All other columns of U and V are orthogonal to u₁ and v₁, so the SVD of A is complete.

Eigenvalues and eigenvectors are not quite that easy. Of course the matrix A must be square. To make life simple we continue with a 2 by 2 matrix A = xyᵀ. Certainly x is an eigenvector:  Ax = xyᵀx = λ₁x.  λ₁ is the number yᵀx. The other eigenvalue is λ₂ = 0 since A is singular (rank = 1). The eigenvector x₂ = y⊥ must be perpendicular to y, so that Ax₂ = xyᵀy⊥ = 0. If y = (a, b) then y⊥ is its 90° rotation (b, −a).

The transpose matrix Aᵀ = yxᵀ has the same eigenvalues yᵀx and 0. Its eigenvectors are the left eigenvectors of A. They will be y and x⊥ (because yxᵀ has eigenvectors y and x⊥). The only question is the scaling that decides the eigenvector lengths. The requirement is (left eigenvector)ᵀ(right eigenvector) = 1. Then the left eigenvectors are the rows of X⁻¹ when the right eigenvectors are the columns of X: perfection! In our case those dot products of eigenvectors now stand at yᵀx and (x⊥)ᵀy⊥. Divide both left eigenvectors y and x⊥ by the number yᵀx, to produce X⁻¹X = XX⁻¹ = I.

Finally there is one more crucial possibility, that yᵀx = 0. Now the eigenvalues of A = xyᵀ are zero and zero. A has only one line of eigenvectors, because y⊥ is in the same direction as x. The diagonalization (2) breaks down because the eigenvector matrix X becomes singular. We cannot divide by its determinant yᵀx = 0. This shows how eigenvectors can go into a death spiral (or a fatal embrace x = y⊥). Of course the pairs of singular vectors x, x⊥ and y, y⊥ remain orthogonal.

Question   In equation (2), verify that A [x  y⊥] = (yᵀx) [x  0].
Question   When does A = xyᵀ have orthogonal eigenvectors?
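These rank-one facts are easy to check numerically. A small sketch, assuming NumPy, with an arbitrary choice of x and y (not from the text):

import numpy as np

x = np.array([2.0, 1.0])
y = np.array([3.0, -1.0])
A = np.outer(x, y)                    # A = x y^T, rank one

# Eigenvalues should be y^T x and 0; eigenvectors are x and the 90-degree rotation of y.
evals, evecs = np.linalg.eig(A)
print(np.sort(evals.real), "vs", [0.0, y @ x])

# The only nonzero singular value should be ||x|| ||y||.
print(np.linalg.svd(A, compute_uv=False), "vs", np.linalg.norm(x) * np.linalg.norm(y))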
A3   Counting Parameters in the Basic Factorizations

   A = LU     A = QR     S = QΛQᵀ     A = XΛX⁻¹     A = QS     A = UΣVᵀ

This is a review of key ideas in linear algebra. The ideas are expressed by those factorizations and our plan is simple: Count the parameters in each matrix. We hope to see that in each equation like A = LU, the two sides have the same number of parameters. For A = LU, both sides have n² parameters.

   L : triangular n × n matrix with 1's on the diagonal      n(n − 1)/2
   U : triangular n × n matrix with free diagonal            n(n + 1)/2
   Q : orthogonal n × n matrix                               n(n − 1)/2
   S : symmetric n × n matrix                                n(n + 1)/2
   Λ : diagonal n × n matrix                                 n
   X : n × n matrix of independent eigenvectors              n² − n

Comments are needed for Q. Its first column q₁ is a point on the unit sphere in Rⁿ. That sphere is an (n − 1)-dimensional surface, just as the unit circle x² + y² = 1 in R² has only one parameter (the angle θ). The requirement ||q₁|| = 1 has used up one of the n parameters in q₁. Then q₂ has n − 2 parameters—it is a unit vector and it is orthogonal to q₁. The sum (n − 1) + (n − 2) + ··· + 1 equals n(n − 1)/2 free parameters in Q.

The eigenvector matrix X has only n² − n parameters, not n². If x is an eigenvector then so is cx for any c ≠ 0. We could require the largest component of every x to be 1. This leaves n − 1 parameters for each eigenvector (and no free parameters for X⁻¹).

The count for the two sides now agrees in all of the first five factorizations. For the SVD, use the reduced form A(m×n) = U(m×r) Σ(r×r) Vᵀ(r×n) (known zeros are not free parameters!). Suppose that m ≤ n and A is a full rank matrix with r = m. The parameter count for A is mn. So is the total count for U, Σ, and V. The reasoning for orthonormal columns in U and V is the same as for orthonormal columns in Q.

   U has m(m − 1)/2     Σ has m     V has (n − 1) + ··· + (n − m) = mn − m(m + 1)/2

Finally, suppose that A is an m by n matrix of rank r. How many free parameters in a rank r matrix? We can count again for U(m×r) Σ(r×r) Vᵀ(r×n):

   U has (m − 1) + ··· + (m − r) = mr − r(r + 1)/2     V has nr − r(r + 1)/2     Σ has r

The total parameter count for rank r is (m + n − r)r.

We reach the same total for A = CR in Section 1.4. The r columns of C were taken directly from A. The row matrix R includes an r by r identity matrix (not free!). Then the count for CR agrees with the previous count for UΣVᵀ, when the rank is r:

   C has mr parameters     R has nr − r² parameters     Total (m + n − r)r.
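The counts are easy to tally by machine. A small sketch (ours, not from the text) that checks the totals for the square factorizations and for a rank r matrix:

def square_counts(n):
    L = n * (n - 1) // 2          # unit lower triangular
    U = n * (n + 1) // 2          # upper (or R) with free diagonal
    Q = n * (n - 1) // 2          # orthogonal
    S = n * (n + 1) // 2          # symmetric
    Lam = n                       # diagonal eigenvalue matrix
    X = n * n - n                 # eigenvectors, scaled
    assert L + U == n * n         # A = LU
    assert Q + U == n * n         # A = QR
    assert Q + Lam == S           # S = Q Lambda Q^T
    assert X + Lam == n * n       # A = X Lambda X^{-1}
    assert Q + S == n * n         # A = QS

def rank_r_count(m, n, r):
    U = m * r - r * (r + 1) // 2
    V = n * r - r * (r + 1) // 2
    Sigma = r
    assert U + Sigma + V == (m + n - r) * r       # total for a rank r matrix
    # A = CR gives the same total: C has m*r, R has n*r - r*r
    assert m * r + (n * r - r * r) == (m + n - r) * r

square_counts(7)
rank_r_count(9, 6, 4)
print("all parameter counts agree")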
A4   Codes and Algorithms for Numerical Linear Algebra

LAPACK is the first choice for dense linear algebra computations.
ScaLAPACK achieves high performance for very large problems.
COIN/OR collects high quality codes for the optimization problems of operations research.

Here are sources for specific algorithms.

Direct solution of linear systems
   Basic matrix-vector operations                   BLAS
   Elimination with row exchanges                   LAPACK
   Sparse direct solvers (UMFPACK)                  SuiteSparse, SuperLU
   QR by Gram-Schmidt and Householder               LAPACK

Eigenvalues and singular values
   Shifted QR method for eigenvalues                LAPACK
   Golub-Kahan method for the SVD                   LAPACK

Iterative solutions
   Preconditioned conjugate gradients for Sx = b    Trilinos
   Preconditioned GMRES for Ax = b                  Trilinos
   Krylov-Arnoldi for Ax = λx                       ARPACK, Trilinos, SLEPc
   Extreme eigenvalues of S                         see also BLOPEX

Optimization
   Linear programming                               CLP in COIN/OR
   Semidefinite programming                         CSDP in COIN/OR
   Interior point methods                           IPOPT in COIN/OR
   Convex optimization                              CVX, CVXR

Randomized linear algebra
   Randomized factorizations via pivoted QR         users.ices.utexas.edu/~pg…/main codes.html
   A = CMR   columns/mixing/rows
   Fast Fourier Transform                           FFTW.org

Repositories of high quality codes                  GAMS and Netlib.org
ACM Transactions on Mathematical Software           TOMS

Deep learning software
   Deep learning in Julia                           Flux: fluxml.ai/Flux.jl/stable
   Deep learning in MATLAB
   Deep learning in Python                          TensorFlow.org, Keras
   Deep learning in R and JavaScript                KerasR, TensorFlow.js
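Many of these packages are reachable from high-level languages. A minimal sketch (not from the text), assuming NumPy and SciPy, whose dense routines wrap BLAS and LAPACK and whose sparse eigensolver wraps ARPACK:

import numpy as np
from scipy.linalg import lu, qr, svd           # LAPACK-backed dense routines
from scipy.sparse.linalg import eigsh          # ARPACK: extreme eigenvalues

A = np.random.randn(6, 6)
S = A.T @ A                                    # symmetric positive definite

P, L, U = lu(A)                                # elimination with row exchanges
Q, R = qr(A)                                   # Householder QR
U2, s, Vt = svd(A)                             # Golub-Kahan SVD
lam_max = eigsh(S, k=1, which='LA')[0]         # largest eigenvalue of S
x_fft = np.fft.fft(np.random.randn(8))         # Fast Fourier Transform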
A5   Matrix Factorizations

1.  A = CR = (basis for column space of A) (basis for row space of A)
    Requirements: C is m by r and R is r by n. Columns of A go into C if they are not combinations of earlier columns of A. R contains the nonzero rows of the reduced row echelon form R₀ = rref(A). Those rows begin with an r by r identity matrix, so R equals [ I  F ] times a column permutation P.

2.  A = CMR′ :  C = first r independent columns of A,  W = first r by r invertible submatrix,  R′ = first r independent rows of A
    Requirements: C and R′ come directly from A. Those columns and rows meet in the r by r matrix W = M⁻¹ (Section 3.2): M = mixing matrix. The first r by r invertible submatrix W is the intersection of the r columns of C with the r rows of R′.

3.  A = LU = (lower triangular L with 1's on the diagonal) (upper triangular U with pivots on the diagonal)
    Requirements: No row exchanges as Gaussian elimination reduces square A to U.

4.  A = LDU = (lower triangular L, 1's on the diagonal) (pivot matrix D is diagonal) (upper triangular U, 1's on the diagonal)
    Requirements: No row exchanges. The pivots in D are divided out from rows of U to leave 1's on the diagonal of U. If A is symmetric then U is Lᵀ and A = LDLᵀ.

5.  PA = LU (permutation matrix P to avoid zeros in the pivot positions)
    Requirements: A is invertible. Then P, L, U are invertible. P does all of the row exchanges on A in advance, to allow normal LU. Alternative: A = L₁P₁U₁.

6.  S = CᵀC = (lower triangular) (upper triangular) with √D on both diagonals
    Requirements: S is symmetric and positive definite (all n pivots in D are positive). This Cholesky factorization C = chol(S) has Cᵀ = L√D, so S = CᵀC = LDLᵀ.

7.  A = QR = (orthonormal columns in Q) (upper triangular matrix R)
    Requirements: A has independent columns. Those are orthogonalized in Q by the Gram-Schmidt or Householder process. If A is square then Q⁻¹ = Qᵀ.

8.  A = XΛX⁻¹ = (eigenvectors in X) (eigenvalues in Λ) (left eigenvectors in X⁻¹)
    Requirements: A must have n linearly independent eigenvectors.

9.  S = QΛQᵀ = (orthogonal matrix Q) (real eigenvalue matrix Λ) (Qᵀ is Q⁻¹)
    Requirements: S is real and symmetric: Sᵀ = S. This is the Spectral Theorem.
10. A = BJB⁻¹ = (generalized eigenvectors in B) (Jordan blocks in J) (B⁻¹)
    Requirements: A is any square matrix. J has one Jordan block for each linearly independent eigenvector of A. Every block has only one eigenvalue.

11. A = UΣVᵀ = (orthogonal U, m × m) (m × n singular value matrix Σ with σ₁, ..., σᵣ on its diagonal) (orthogonal V, n × n)
    Requirements: None. This Singular Value Decomposition has the eigenvectors of AAᵀ in U and the eigenvectors of AᵀA in V; σᵢ = √λᵢ(AᵀA) = √λᵢ(AAᵀ). Those singular values are σ₁ ≥ σ₂ ≥ ··· ≥ σᵣ > 0. By column-row multiplication A = UΣVᵀ = σ₁u₁v₁ᵀ + ··· + σᵣuᵣvᵣᵀ. If S is symmetric positive definite then U = V = Q and Σ = Λ and S = QΛQᵀ.

12. A⁺ = VΣ⁺Uᵀ = (orthogonal, n × n) (n × m pseudoinverse of Σ with 1/σ₁, ..., 1/σᵣ on its diagonal) (orthogonal, m × m)
    Requirements: None. The pseudoinverse A⁺ has A⁺A = projection onto the row space of A and AA⁺ = projection onto the column space. A⁺ = A⁻¹ if A is invertible. The shortest least-squares solution to Ax = b is x⁺ = A⁺b. This solves AᵀAx⁺ = Aᵀb.

13. A = QS = (orthogonal matrix Q) (symmetric positive definite matrix S)
    Requirements: A is invertible. This polar decomposition has S² = AᵀA. The factor S is semidefinite if A is singular. The reverse polar decomposition A = KQ has K² = AAᵀ. Both have Q = UVᵀ from the SVD.

14. A = UΛU⁻¹ = (unitary U) (eigenvalue matrix Λ) (U⁻¹, which is Uᴴ = Ūᵀ)
    Requirements: A is normal: AᴴA = AAᴴ. Its orthonormal eigenvectors are the columns of U. Complex λ's unless S = Sᴴ: the Hermitian case.

15. A = QTQ⁻¹ = (unitary Q) (triangular T with λ's on the diagonal) (Q⁻¹ = Qᴴ)
    Requirements: Schur triangularization of any square A. There is a matrix Q with orthonormal columns that makes Q⁻¹AQ triangular: Section 6.3.

16. Fₙ = [ I  D ; I  −D ] [ F(n/2)  0 ; 0  F(n/2) ] [ even-odd permutation ]
    Requirements: Fₙ = Fourier matrix with entries wʲᵏ where wⁿ = 1. D has 1, w, ..., w^(n/2 − 1) on its diagonal. The recursive Fast Fourier Transform will compute Fₙx with only ½ n log₂ n multiplications.
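Several of the factorizations in this list can be computed directly with standard library calls. A minimal sketch, assuming NumPy and SciPy (the function names below are SciPy's, not the book's):

import numpy as np
from scipy.linalg import lu, qr, cholesky, schur, polar

A = np.random.randn(4, 4)
S = A.T @ A                                   # symmetric positive definite

P, L, U = lu(A)                               # 5.  row exchanges: A = P L U in SciPy's convention
C = cholesky(S, lower=False)                  # 6.  S = C^T C (Cholesky)
Q, R = qr(A)                                  # 7.  A = QR
lam, X = np.linalg.eig(A)                     # 8.  A = X Lambda X^{-1} (if diagonalizable)
lamS, QS = np.linalg.eigh(S)                  # 9.  S = Q Lambda Q^T
U2, s, Vt = np.linalg.svd(A)                  # 11. A = U Sigma V^T
Aplus = np.linalg.pinv(A)                     # 12. A^+ = V Sigma^+ U^T
Qp, Sp = polar(A)                             # 13. A = QS (polar decomposition)
T, Z = schur(A)                               # 15. real Schur form: A = Z T Z^T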
A6   The Column-Row Factorization of a Matrix

Abstract
The active ideas in linear algebra are often expressed by matrix factorizations: S = QΛQᵀ for symmetric matrices (the spectral theorem) and A = UΣVᵀ for all matrices (singular value decomposition). Far back near the beginning comes A = LU for successful elimination: lower triangular times upper triangular. This paper is one step earlier, with bases in A = CR for the column space and row space of any matrix—and a proof that column rank = row rank. The echelon form of A and the pseudoinverse A⁺ appear naturally. The "proofs" are mostly "observations".

Introduction
An introduction is hardly necessary for so short a paper. But I can explain the background. In teaching linear algebra, the course often begins slowly. The idea of a vector space waits until Chapter 3. The highly important topic of singular values is squeezed into the final week or completely omitted. A new plan is needed.

I now start the course in a different way. The multiplication Ax produces a combination of the columns of A. All combinations fill the column space of A—a key idea to visualize. Simple examples of Ax = 0 show the idea of linear dependence. Starting with column 1, we create a matrix C with a full set of independent columns—a basis for the column space. I believe that this "fast start" is also a better start.

Every column of A is a combination of the columns of C. Introducing matrix multiplication, that fact becomes A = CR. We have a natural factorization of A, to be followed by A = LU (elimination) and A = QR (Gram-Schmidt) and S = QΛQᵀ (eigenvalues in the spectral theorem) and A = UΣVᵀ (singular values in the SVD). The course has a structure that students can follow. A new textbook called "Linear Algebra for Everyone" is in preparation.

The key point for this paper is that the matrix R in A = CR is already famous. R is the reduced row echelon form of A, with any zero rows removed. It has a simple "formula" R = [ I  F ] P which the mechanics of elimination will execute. And it has a "meaning" that is hidden in those row operations on A. R tells us the combinations of independent columns in C which produce all the columns of A.

The Factorization A = CR
A is a real matrix with m rows and n columns. Some of those columns might be linear combinations of previous columns. Here is a natural way, working from left to right, to find a complete set of independent columns.

   If column 1 is not zero, put it into C.
   If column 2 of A is not a multiple of column 1, put it into C.
   If column 3 is not a linear combination of columns 1 and 2, put it into C. Continue.
At the end, C will have r independent columns. Those columns will be a basis for the column space of A. Every column of A is a combination of the columns of C, and the coefficients in those combinations go into the columns of R:

   A = [ 1 4 7 ; 2 5 8 ; 3 6 9 ] = [ 1 4 ; 2 5 ; 3 6 ] [ 1 0 −1 ; 0 1 2 ] = CR   with r = 2.

The matrix R contains an r by r identity matrix in the columns that correspond to independent columns of A. Column 3 of R tells us that column 3 of A is (−1)(column 1) + (2)(column 2). A first observation: R is the reduced row echelon form of A, without its m − r zero rows.

A second observation: Every row of A is a combination of the rows of R. This comes directly from the matrix multiplication A = CR. Row 1 of A is 1 times row 1 of R plus 4 times row 2 of R. The coefficients like 1 and 4 in those linear combinations are in the rows of C. And the rows of R are independent, because the r by r identity matrix is a submatrix of R. Then the rows of R are a basis for the row space of A. This matrix R is usually computed by row operations on A to reach the "echelon form". Here R appears after the column basis in C.

A third observation comes from A = CR = (m × r)(r × n):

   The column rank of A equals the row rank of A.

The same number r counts independent columns in C and independent rows in R.

A fourth observation is a "formula" for the reduced row echelon form R₀ = rref(A). Normally this matrix with m − r zero rows is constructed directly by row operations on A, and C does not appear. A direct description of R₀ could be as follows. Suppose the basic columns in C are columns n₁ < n₂ < ··· < nᵣ of A. The other n − r columns of A are combinations N = CF of those r basic columns (in order). Then the reduced row echelon form of A with m − r zero rows is R₀:

   R₀ = [ I  F ; 0  0 ] P.   The permutation P puts the n columns of C and N into their correct order in A, and I = r × r identity matrix.

Note that A = C [ I  F ] P = [ C  N ] P has the columns of A in their original order, thanks to P. In the formula above, R₀ is constructed directly from A and its uniqueness is clear.

Eric Grinberg has invented the name "gauche basis" for the columns of C—a brilliant suggestion that reinforces the left-to-right construction of this basis for the column space of A.

Uniqueness of the reduced row echelon form seems to be a moot question when there is an explicit formula for that matrix R₀. The formula cannot be new, but I don't know a reference. Observations 1-3 are definitely not new.
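A minimal computational sketch of this construction, assuming NumPy (the helper name cr_factorization is ours, not from the paper): it picks independent columns left to right and then solves CR = A for R.

import numpy as np

def cr_factorization(A, tol=1e-10):
    A = np.asarray(A, dtype=float)
    cols = []                                  # indices of independent columns
    for j in range(A.shape[1]):
        candidate = A[:, cols + [j]]
        if np.linalg.matrix_rank(candidate, tol=tol) > len(cols):
            cols.append(j)                     # column j is not a combination of earlier ones
    C = A[:, cols]                             # basis for the column space
    # Each column of A is C times a coefficient vector: solve C R = A.
    R = np.linalg.lstsq(C, A, rcond=None)[0]   # r by n, contains an identity in positions 'cols'
    return C, R

A = np.array([[1., 4., 7.],
              [2., 5., 8.],
              [3., 6., 9.]])
C, R = cr_factorization(A)
print(C)                      # columns 1 and 2 of A
print(np.round(R, 10))        # [[1, 0, -1], [0, 1, 2]]
print(np.allclose(C @ R, A))  # True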
A Mixing Matrix
Here is a variation on the matrix factorization A = CR. The matrix C contains actual columns of A, but the matrix R does not contain rows of A. For symmetric perfection, we might prefer the matrix R′ consisting of the r uppermost linearly independent rows, taken directly from A. Generally A = CR′ will not be true. To recover a correct factorization of A, we need to include a mixing matrix M between C and R′. Then A = CMR′.

(The idea of a mixing matrix has become widespread in numerical linear algebra. The symbol U is often chosen instead of M, but U is needed in the important factorizations LU and UΣVᵀ. We would like to nominate the letter M and the adjective mixing.)

Does M have a simple formula? Yes: M⁻¹ is the r by r submatrix at the intersection of the r columns of C with the r rows of R′. If those happen to be the first r columns and the first r rows, it is easy to see that MR′ will produce the familiar r by r identity matrix that begins the reduced row echelon form R. Then A = CMR′ is identified with A = CR.

The Pseudoinverse
Factorizations like A = CR are familiar to algebraists. It is not surprising that they connect to other constructions. An example in linear algebra is the pseudoinverse of A. We write the pseudoinverse as A⁺. It inverts the mapping (multiplication by A) from row space to column space. The pseudoinverse is zero on the nullspace of Aᵀ. Thus it inverts A where inversion is possible: AA⁺A = A, and A⁺ = (UΣVᵀ)⁺ = VΣ⁺Uᵀ.

The pseudoinverse connects perfectly to the rank r factors in A = CR. The pseudoinverse of C is its left inverse C⁺ = (CᵀC)⁻¹Cᵀ. The pseudoinverse of R is its right inverse R⁺ = Rᵀ(RRᵀ)⁻¹. Then the pseudoinverse of A = CR is A⁺ = R⁺C⁺, because all ranks are equal to r.

This short paper was submitted to the Journal of Convex Analysis. A further note about these factorizations is in progress with Daniel Drucker and Alexander Lin. Our plan is to post both papers on the arXiv website in 2020.
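A short numerical check of A⁺ = R⁺C⁺ for the 3 by 3 example above, assuming NumPy (a sketch, not from the paper):

import numpy as np

A = np.array([[1., 4., 7.],
              [2., 5., 8.],
              [3., 6., 9.]])
C = A[:, :2]                                   # the two independent columns of A
R = np.linalg.lstsq(C, A, rcond=None)[0]       # so that A = CR with rank r = 2

C_plus = np.linalg.inv(C.T @ C) @ C.T          # left inverse of C
R_plus = R.T @ np.linalg.inv(R @ R.T)          # right inverse of R
A_plus = R_plus @ C_plus                       # pseudoinverse A^+ = R^+ C^+

print(np.allclose(A_plus, np.linalg.pinv(A)))  # True
print(np.allclose(A @ A_plus @ A, A))          # A A^+ A = A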
A7   The Jordan Form of a Square Matrix

We know that some square matrices A do not have n independent eigenvectors. Therefore they cannot be diagonalized by XΛX⁻¹: there is no invertible eigenvector matrix X. Jordan established the closest possible approach to a diagonal form. When A has k independent eigenvectors, his form J has k Jordan blocks J₁, ..., J_k:

   J = B⁻¹AB has Jordan blocks Jᵢ with the eigenvalue λᵢ repeated on the diagonal and 1's just above the diagonal.

If A can be diagonalized, then k = n (all blocks are 1 by 1 and J = Λ). If A can't be diagonalized, then k < n. Each block Jᵢ of size nᵢ has only one eigenvalue λᵢ and only one eigenvector. The matrix B contains eigenvectors and "generalized eigenvectors" of A. Here is an example rather than a proof.

Example   A has λ = 3, 3, 3 with only two genuine eigenvectors. Its Jordan form has a 2 by 2 block and a 1 by 1 block, with eigenvectors (1, 0, 0) and (0, 0, 1) for J:

   J = [ 3 1 0 ; 0 3 0 ; 0 0 3 ]   and   A = BJB⁻¹.

This Jordan form makes Aⁿ = BJⁿB⁻¹ and e^(At) = B e^(Jt) B⁻¹ as simple as possible to compute. For powers and exponentials of J, we just compute block by block:

   [ 3 1 ; 0 3 ]ⁿ = [ 3ⁿ  n3ⁿ⁻¹ ; 0  3ⁿ ]   and   exp([ 3 1 ; 0 3 ]t) = [ e^(3t)  te^(3t) ; 0  e^(3t) ].

That exponential formula is telling us the missing solution to the differential equation dU/dt = JU (and also du/dt = Au). The usual solution has e^(3t). We can't just use that twice, when λ = 3 is repeated. The missing solution is te^(3t). And a triple eigenvalue λ = 3 with only one eigenvector (and one Jordan block) would involve t²e^(3t).

The Cayley-Hamilton Theorem
"Every matrix A satisfies its own characteristic equation p(λ) = 0." The determinant of A − λI is a polynomial p(λ), and the n solutions to p(λ) = 0 are the eigenvalues of A. Our example above has p(λ) = (λ − 3)³ with a triple eigenvalue λ = 3, 3, 3. Then Cayley-Hamilton says that p(A) = (A − 3I)³ has to be the zero matrix. Jordan makes this easy, because p(A) = B p(J) B⁻¹ and p(J) is certainly zero:

Example   p(J) = (J − 3I)³ = [ 0 1 0 ; 0 0 0 ; 0 0 0 ]³ = [ 0 0 0 ; 0 0 0 ; 0 0 0 ]   and then (A − 3I)³ = 0.
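A quick numerical check, assuming NumPy; the invertible matrix B below is an arbitrary choice (not from the text), used only to build some A = BJB⁻¹ with this Jordan form.

import numpy as np

J = np.array([[3., 1., 0.],
              [0., 3., 0.],
              [0., 0., 3.]])                 # Jordan form: eigenvalues 3, 3, 3
B = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])                 # any invertible matrix
A = B @ J @ np.linalg.inv(B)                 # A = B J B^{-1}

p_of_A = np.linalg.matrix_power(A - 3 * np.eye(3), 3)   # p(A) = (A - 3I)^3
print(np.allclose(p_of_A, 0))                # True: Cayley-Hamilton

# Powers come block by block: the 2 by 2 block gives [[3^n, n 3^(n-1)], [0, 3^n]].
n = 5
print(np.linalg.matrix_power(J[:2, :2], n))
print([[3**n, n * 3**(n - 1)], [0, 3**n]])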
A8   Tensors

In linear algebra, a tensor is a multidimensional array. (To Einstein, a tensor was a function that followed certain transformation rules.) A matrix is a 2-way tensor. A 3-way tensor T is a stack of matrices. Its elements have three indices: row number i and column number j and "tube number" k.

An example is a color image. It has 3 slices corresponding to red-green-blue. A slice of T shows the density of one of those primary colors RGB (k = 1 to 3) at each pixel (i, j) in the image.

[Figure: a vector (1-way array) and a tensor (3-way array, a stack of matrices).]

Another example is a joint probability tensor. Now p(i, j, k) is the probability that a random individual has (for example) age i and height j and weight k. The sum of all those numbers p(i, j, k) will be 1. For i = 9, the sum of all p(9, j, k) would be the fraction of individuals that have age 9—the sum over one slice of the tensor.

A fundamental problem—with tensors as with matrices—is to decompose the tensor T into simpler pieces. For a matrix A that was accomplished by the SVD. The pieces that add to A are matrices (we should now say 2-tensors), with the special property that each piece is a rank-one matrix uvᵀ. Linear algebra allowed us to require that the u's from different pieces were orthogonal, that the v's were also orthogonal, and that there were only r pieces (r ≤ m and r ≤ n).

Sad to say, this SVD format is not possible for a 3-way tensor. We can still ask for R rank-one pieces that approximately add to T:

   CP Decomposition   T ≈ a₁ ∘ b₁ ∘ c₁ + ··· + a_R ∘ b_R ∘ c_R.   (1)

Orthogonality of the a's and of the b's and of the c's is generally impossible. The number of pieces is not set by T (its "rank" is not well defined). But an approximate decomposition of this kind is still useful in computations with tensors. One option is to solve alternately for the aᵢ (with fixed bᵢ and cᵢ), then for the bᵢ (fixed aᵢ and cᵢ), and then for the cᵢ (fixed aᵢ and bᵢ). Those subproblems can be reduced to least squares. Other approximate decompositions of T are possible.

The theory of tensor decompositions (multilinear algebra) is driven by applications. We must be able to compute with T. So the algorithms are steadily improving, even without the orthogonality properties of an SVD.
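A small sketch, assuming NumPy, of a 3-way tensor assembled from rank-one pieces a ∘ b ∘ c as in the CP decomposition (1); the sizes are arbitrary and not from the text.

import numpy as np

I, J, K, R = 4, 5, 3, 2
a = [np.random.randn(I) for _ in range(R)]
b = [np.random.randn(J) for _ in range(R)]
c = [np.random.randn(K) for _ in range(R)]

# T[i, j, k] = sum over r of a_r[i] * b_r[j] * c_r[k]
T = sum(np.einsum('i,j,k->ijk', a[r], b[r], c[r]) for r in range(R))
print(T.shape)                            # (4, 5, 3): rows, columns, "tubes"

# Slice k = 0 is an ordinary matrix: sum over r of c_r[0] * a_r b_r^T
slice0 = sum(c[r][0] * np.outer(a[r], b[r]) for r in range(R))
print(np.allclose(T[:, :, 0], slice0))    # True: each slice has rank at most R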
A9   The Condition Number

The condition number measures the ratio of (change in solution) to (change in data). The most common problem is to solve n linear equations Ax = b for the n unknowns x. In this case the data is b and the solution is x = A⁻¹b. Suppose A is fixed. The change in the data is Δb and the change in the solution is Δx. We have to decide the meaning of the word "change". Do we compute the absolute change ||Δb|| or the relative change ||Δb||/||b||? That decision for the data b brings a matching decision for the solution x.

   Absolute condition = max over b, Δb of ||Δx|| / ||Δb|| = ||A⁻¹||   (1)

The absolute choice looks good but it has a problem. If we divide the matrix A by 10, we are multiplying A⁻¹ by 10. The absolute condition goes up by 10. But solving Ax = b is not 10 times harder. The relative condition number is the right choice.

   Relative condition = max over b, Δb of (||Δx||/||x||) / (||Δb||/||b||) = ||A|| ||A⁻¹|| = cond(A)   (2)

If A is the simple diagonal matrix Σ with entries σ₁ ≥ ··· ≥ σₙ, then its norm is ||A|| = σ₁ = σ_max. The norm of A⁻¹ is 1/σ_min. The orthogonal matrices U and V in the SVD leave the norms unchanged. So the ratio σ_max/σ_min is cond(A). We are using the usual measure of length ||x||² = x₁² + ··· + xₙ².

Notice that σ_min (not λ_min) measures the distance from A to the nearest singular matrix. At first we might expect to see A − λ_min I, bringing the smallest eigenvalue to zero. Wrong. The nearest singular matrix to A = UΣVᵀ is U(Σ − σ_min I)Vᵀ, because the orthogonal matrices U and V don't affect the norm. Bring the smallest singular value to zero.

The eigenvalues of A have different condition numbers. Suppose λ is a simple root (not a repeated root) of the equation det(A − λI) = 0. Then Ax = λx and Aᵀy = λy for unit eigenvectors ||x|| = ||y|| = 1. The condition number of λ is 1/|yᵀx|. In other words it is 1/|cos θ|, where θ is the angle between the right eigenvector x and the left eigenvector y. (The name comes from the equation yᵀA = λyᵀ, with yᵀ on the left side of A.)

Notice that a symmetric matrix A will have y = x with cos θ = 1. The eigenvalue problem is perfectly conditioned for symmetric matrices, just as Ax = b was perfectly conditioned for orthogonal matrices with ||Q|| ||Q⁻¹|| = 1. The formula 1/|yᵀx| comes from the change Δλ ≈ yᵀ(ΔA)x / yᵀx in the eigenvalue created by a small change ΔA in the matrix.
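A short sketch, assuming NumPy, that computes both condition numbers on this page: cond(A) = σ_max/σ_min for Ax = b, and 1/|yᵀx| for each eigenvalue. The 2 by 2 matrix is an arbitrary example, not from the text.

import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 0.01]])

sigma = np.linalg.svd(A, compute_uv=False)
print(sigma[0] / sigma[-1], np.linalg.cond(A))   # cond(A) = sigma_max / sigma_min

# Condition number of an eigenvalue: 1 / |y^T x| for unit right and left eigenvectors.
lam, X = np.linalg.eig(A)            # right eigenvectors (columns of X)
lamT, Y = np.linalg.eig(A.T)         # eigenvectors of A^T = left eigenvectors of A
for i in range(2):
    j = np.argmin(abs(lamT - lam[i]))            # match the same eigenvalue
    x = X[:, i] / np.linalg.norm(X[:, i])
    y = Y[:, j] / np.linalg.norm(Y[:, j])
    print(lam[i], 1.0 / abs(y @ x))  # large when right and left eigenvectors are nearly orthogonal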
A10   Markov Matrices and Perron-Frobenius

This appendix is about positive matrices (all aᵢⱼ > 0) and nonnegative matrices (all aᵢⱼ ≥ 0). Markov matrices M are important examples, when every column of M adds to 1. Positive numbers adding to 1 make you think immediately of probabilities.

A useful fact about any Markov matrix M: The largest eigenvalue is always λ = 1. We know that every column of M − I adds to zero. So the rows add to the zero row, and M − I is not invertible: λ = 1 is an eigenvalue. Here are two examples:

   A = [ 0.8 0.3 ; 0.2 0.7 ] has eigenvalues 1 and 0.5      B = [ 0 1 ; 1 0 ] has eigenvalues 1 and −1

That matrix A is typical of Markov. The eigenvectors are x₁ = (0.6, 0.4) and x₂ = (1, −1).

   [ 0.8 0.3 ; 0.2 0.7 ] [ 0.6 ; 0.4 ] = [ 0.6 ; 0.4 ]         x₁ is a steady state
   [ 0.8 0.3 ; 0.2 0.7 ] [ 1 ; −1 ] = ½ [ 1 ; −1 ]             x₂ is a "transient" that disappears

Our favorite example is based on rental cars in Chicago and Denver. We start with 100 cars in Chicago and no cars in Denver: y₀ = (100, 0). Every month we multiply the current vector yₙ by A to find yₙ₊₁, the numbers in Chicago and Denver after n + 1 months:

   y₀ = [ 100 ; 0 ]   y₁ = [ 80 ; 20 ]   y₂ = [ 70 ; 30 ]   y₃ = [ 65 ; 35 ]   ...   y_∞ = [ 60 ; 40 ]

That steady state (60, 40) is an eigenvector of A for λ = 1. If we had started with y₀ = (60, 40) then we would have stayed there forever. Starting at (100, 0) we needed to get 40 cars to Denver. You see that number 40 at time zero reduced to 20 at time 1, 10 at time 2, and 5 at time 3. That is the effect of the other eigenvalue λ = ½, dividing its eigenvector by 2 at every step.

This is yₙ = Aⁿy₀ coming from the single step equation yₙ₊₁ = Ayₙ. In matrix notation, Aⁿ approaches rank one!

   Aⁿ = XΛⁿX⁻¹ = X [ 1ⁿ 0 ; 0 (½)ⁿ ] X⁻¹   approaches   A^∞ = [ 0.6 0.6 ; 0.4 0.4 ]

You have now seen a typical Markov matrix with λ_max = 1. Its eigenvector (0.6, 0.4) is the survivor as time goes forward. All smaller eigenvalues have λⁿ → 0. But our second Markov example B has a second eigenvalue λ = −1. Now we don't approach a steady state:

   B = [ 0 1 ; 1 0 ] has eigenvalue λ₁ = 1 with x₁ = [ 1 ; 1 ] and λ₂ = −1 with x₂ = [ 1 ; −1 ]
The zeros in B allow the second eigenvalue λ₂ = −1 to be as large as λ₁ = 1: |λ₂| = 1. All cars switch cities every month. If we start with y₀ = (100, 0), then y₁ = (0, 100) and y₂ = (100, 0): the cars bounce back and forth forever. No steady state, because λ₂ = −1.

Perron proved that a strictly positive matrix (all aᵢⱼ > 0) has a largest eigenvalue λ_max > 0 with a strictly positive eigenvector, and every other eigenvalue has |λᵢ| < λ_max. Frobenius allowed zero entries aᵢⱼ ≥ 0.

Theorem (Perron)   All numbers in Ax = λ_max x are strictly positive.

Proof   The key idea is to look at all numbers t such that Ax ≥ tx for some nonnegative vector x (other than x = 0). We are allowing inequality in Ax ≥ tx in order to have many candidates t. For the largest value t_max (which is attained), we will show that equality holds: Ax = t_max x. Then t_max is our eigenvalue and x is its positive eigenvector—which we now prove.

If Ax ≥ t_max x is not an equality, multiply both sides by A. Because A is strictly positive, that produces a strict inequality A²x > t_max Ax. Therefore the positive vector y = Ax satisfies Ay > t_max y, and t_max could be increased. This contradiction forces the equality Ax = t_max x, and we have an eigenvalue. Its eigenvector x is positive, because every component of Ax on the left side combines the components of x with positive weights aᵢⱼ.

To see that no eigenvalue can be larger than t_max, suppose Az = λz. The eigenvector z may involve negative or complex numbers. Since A > 0 we always have |λ||z| = |Az| ≤ A|z| by the "triangle inequality". This |z| is a nonnegative vector, so |λ| is one of the possible candidates t. Therefore |λ| cannot exceed t_max—which must be λ_max.

Many Markov examples start with zeros in A, so Perron does not apply directly. But if a higher power Aᵏ is strictly positive, Perron applies to Aᵏ and the chain still approaches its one steady-state eigenvector for λ_max = 1. Google's PageRank matrix is built in this spirit: each webpage spreads its probability over its outgoing links so that column sums equal 1, and a final adjustment makes every entry of the Google matrix G strictly positive (Perron again). See Wikipedia and the book by Amy Langville and Carl Meyer (which is quickly found using Google).

Reference
Amy Langville and Carl Meyer, Google's PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press (2011).
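A minimal sketch, assuming NumPy, of the rental-car chain: repeated multiplication by A drives yₙ toward the steady state, which is the eigenvector for λ_max = 1 scaled to 100 cars.

import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])            # columns add to 1: a Markov matrix

y = np.array([100.0, 0.0])            # 100 cars in Chicago, 0 in Denver
for n in range(6):
    print(n, y)
    y = A @ y                         # y_{n+1} = A y_n

evals, evecs = np.linalg.eig(A)
steady = evecs[:, np.argmax(evals)]   # eigenvector for lambda_max = 1
print(100 * steady / steady.sum())    # scaled to 100 cars: [60. 40.]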
Index A Adaptive, 315 Add vectors, 1,2 AlexNet. 288.292,303 All combinations. 6,21 All-ones matrix. 222 AlphaGo, 305 Angle between vectors. 14 Antisymmetric. 69.238.240.251 Area. 187, 189. 196 Area of parallelogram. 188 Arg min, 307 Arrow. 5 Associative law, 30.38 Augmented matrix. 45.81.96 Average. 301,316 Axis, 234 В Back substitution. 41,42 Backpropagation, 286,288, 289, 306,318 Backslash. 97 Base, 189 Basis. 24.31.33.74.107,110.114.117. 118,122.139.193 Basis Pursuit, 284 Baumann, vii. 271.272 Bell-shaped. 267, 324-326 Best line. 155.277 Bidiagonal. 266 Big Formula. 179. 183,184 Big Picture. 124,137. 138,156 Binomial. 297.326.327 BLAS. 337 Block elimination. 70 Block matrix. 56.70.71.343 3bluelbrown.com, ix Bowl. 232.307,309 Box. 189 Breakdown. 43 C Calculus, 155.282,306 Cardinality, 283 Cayley-Hamilton. 226,343 Center point, 162 Centered, 247 Central Limit Theorem. 324. 326, 327 Chain rule, 289,319 Change of basis. 115,197,200 Characteristic polynomial, 205, 343 Chebfun.org, 282 Checkerboard. 132 Chess matrix. 132 China. 39 Cholesky, 231,237 Circle. 210.247 Classification. 291 Clock. 9 Closest line. 153.157.276 Closest point. 143. 147, 149 CNN. 293.299 Code. 172,176.200.263.267 Cofactor, 177,180,186 Coin flip. 321,327,328 COIN/OR. 337 Column picture. 21.44 Column rank. 25 Column rank = row rank, viii, 33,124 Column space, vi, 22,23,27,77,101 Column way, 29 Columns times rows, 34,35,60 Combination of columns. 3,21 Commutative, 30 Companion matrix, 213,251 Complete solution, 97,99,104 Complex matrix. 236.239 Complex number, 228,236 Complex vector. 236 Components. 5 Composition. 286,288,289,296 348
Index Compression by SVD. 269-272 Condition number. 311.315,345 Congruent. 233 Constant coefficient. 243 Convex. 232.235 Convolution. 292,299-301 Comers, 10,187 Coronavirus, xi Correct language, 108 Correlation. 332 Cosine, 13,15,17 Cosine Law, 18 Cost. 57 Counterrevolution, 283 Counting Theorem. 125 Covariance matrix. 258.277,328-332 Cramer's Rule. 185,186,190 Cross-entropy. 291 Cube, 9. 10.191 Cumulative, 323.324 Current, 127 D Damping, 245 Data science, viii, 260,291 Deep learning, 286. 337.356 Dependent, v, 22.40, 159.356 Derivative. 69.194 Determinant. 50.54.177,207.213,225 Diagonal matrix. 20.215,275 Diagonalizable, 217,223,224,235 Diagonalization. 216,343 Diamond. 283,284 Dictionary, xii Difference equation, 247 Difference matrix. 20 Differential equation. 243.257 Dimension, 33,87,107,111.112.121 Distance to singular, 345 Dot product, 1,2,13.21 Dot-product matrix, 157 Double descent. 314.315 Double eigenvalue. 222.343 Drucker, xi, 65 <lung. 260. 349 К 3.87.89.341 275.277 i-o Edge matnx. IWt E<genfaces. 279 Eigenvalue 202-210 Eigenvector. 202.311. 345 bgemector main x, 245 bgenvcctonof .4ТЛ. 261 Eight rules. 79 Elimination, 39.47,95 biminatmo matnx £. 49 Ellipse. 189.200.233.234 Empty basis. 113 Energy, 229.232 Epoch.312 Equation for A. 205 Error. 161 Euler s formula. 127 Even permutation. 64.71.182 Even-odd permutation. 66 Even/odd. 179 Exchange matnx. 30 Existence. 139 Expected value. 321.322 Exponential. 249.343 Exponential volunoo. 244 Expressivity. 288 F Factorial 250 Factorization. 59.89.188. 336.338 Factors are unique. 63 Fast Foaner Transform. 64.66. 337.339 Fast start viii. 313 Fibonacci 183.219.224 Filter. 299.302 Finance. 285 Flag. 270. 273 Flat pieces. 293. 294 Folds. 294.298 Four subspaces. 121.132.133 Four ways to multiply, vii. 35 Fourier matrix. 285. 339
350 Index Fourier series, 168, 282 Fredholm. 140 Free column. 85,86 Free variables. 88,91. 100 Frobenius, 274, 347 Full column rank. 43.98, 109 Full row rank, 99, 100 Function space. 75, 113, 119 Fundamental subspace, 74,121 Fundamental Theorem. 125, 138,262 Fundamental Theorem of Calculus, 195 G Gauss-Jordan, 56,57 Gaussian, 324-326 General solution. 103 Generalization, 287,314 Geometry of SVD, 264 Gershgonn, 210,213 Go. 305 Golub- Kuhan. 266,337 Golub-Van Loan. 266 Google. 204 Google matrix. 347 Gradient. 307, 308 Gradient descent. 235.289, 306,308.309, 312,317,320 Gram-Schmidt, 158, 164, 169-171,175, 176,280,337 Graph. 126 Grayscale, 269. 287 Greeks. 285 Group, 73 Growth factor, 245 H Hadamard, 173,191,222 Heat equation, 252 Height. 189 Heisenberg. 209.214 Hermitian matrix, 236 Hessian matrix. 306 Hidden layer. 287,296. 304 Hilbert matrix. 271.296 Homogeneous, 195 House, 198, 199 Householder, 172,281,337 Hypercube, 191 Hyperplane, 294,295 I Identity matrix, 20,30 111 conditioned, 345 Image recognition, 269,299, 302 Imaginary eigenvalue, 207 Incidence matrix, 126 Independence. 22. 107, 108,217,356 Independent columns, 1.31 Independent variables, 329 Inequality, 15 Infinitely many solutions, 40 Initial value, 243 Inner product, 34.68.282 Integral, 195 Integration by parts, 69 Inverse matrix, 50,180 Inverse of AD, 51 Inverse of E, 53 Invertible matrix, 55,93,138 J Jacobian matrix, 319 Jordan block. 343 Jordan fonn, 219,339,343 Julia, 172,236,337 К Kaczmarz, 316,317,319 Kirchhoff's Current Law. 93, 127 L Lagrange multiplier, 266,285 Language. 112 LAPACK. 337 LASSO. 284 Law of Inertia, 233 Law of Large Numbers, 321 Layer, 289 Leapfrog, 248 Learning from data, viii, ix, 356 Learning function, 289,293,301 Learning rate, 306
Index Least squares. 153.154.156, pi _ 291,315 -«'1.276. Left eigenvector, 226.335 Left inverse, 94,133 Left nullspace, 121.123,125 Length, 11, 167,282 Length ||v||, 11 Line, 31,99 Linear combination, 1,3 Linear dependence, 109 Linear in each row, 179 Linear pieces, 294-296 Linear transformation, 192,194 |y? Linearly independent, 107 174 ’ ’ ” Loop. 127,172 Loss function, 291,301,312 LU,58-60 351 N Ne8«4ve definite 251 n«tlib, 58 Neural net, 235.291 № P'vot, 43 Noi«. 153,163 Normalizable, 221 Nenn. 260.274.275,282,284 345 »rma 1 distribution.323-325 333 NoT e<,Ui"iOn' I48J«.I56 "ormal matrix, 326,339 Nullspace, 83,123 Nutshell. 356 0 M Machine learning, viii, 235,291 Magic factorization, 89,90 Magnitude, 248 Marginal, 329 Markov equation, 253 Markov matrix. 204,212,218.346 MATLAB. 19.45.159,172.236 237 300,337 Matrix, 20 Matrix exponential, 243,249,255 Matrix multiplication. 2,3,29,33 Matrix space, 77,113 Max-pooling, 292 Mean, 163,277,321,322,324.326 Median, 161 Minibntch. 312 Minimum, 232 Minimum norm solution, 159,3)5 Mixing matrix, 90,95,342 Modified Gram-Schmidt, 172 Momentum, 310 Multidimensional. 344 Multiplication, 30,35 Multiplicity, 221 Multiplier, 42,49, 52 Multivariable, 306,319 oew-mit.edu. ix Origami, 293 Orthogonal complement. 137.138 Orthogonal eigenvector. 227 Orthogonal matrix. 134.166. 208, 280 Orthogonal subspaces, 135.136 Orthogonal vectors. 134.285 Orthogonality. 258.280,282.336 Orthonormal. 165 Outer product. 34,68 Overdamping, 257 Overdetermined, 153 Overfilling, 3)4 P PageRank, 347 Parabola, 160 Paradox, 239 Parallel plane. 92 Parallelogram. 4.187 Parameters, 336 Partial pivoting. 66 Particular solution, 97,99,100 PCA. 274,276,278 Penalty. 283 Permutation, 64.166,178,285 Perpendicular, see Orthogonal Perron. 347
352 Piecewise, 296,297 Piecewise linear. 2X7 Pivot. 41-43. .156 Pivot column, Кб. X7 Pivot variable*. XX Pixel, 2X7 Plane, v, 6,7 playground.tensorflow.org. viii, 293. 29X Polar decompusilHM, 2X0. 339 Positive definite, 227-242 Positive matrix, 346 Positive pivot», 230 Positive semidefinite, 22X Principal axis theorem. 234 Principal Component. 274 Probability. 36. 321, 323. 330.333, 344 Probability density. 324 Probability matrix. 329 Product ADC, 318 Product of pivot*. 1X4 Projection, 133, 134. 143.147,156 Projection matrix. 143.146, 147,151, 152.204.212 Proof. 58.60.90 Properties of determinant*. 197 Pscudoinvene. 133,138,159, 195.280. 339.342 Pythagoras. 11.18 Python. 172,300.337 Q qr. tee Gram-Schmidt Quadratic. 310 Quantum mechanics. 209 Quarter-plane. 77 R Ramp function. 290 Random. 316,317 Random sampling. 266 Rank. 24.33.100,105.122.334 Rank 1 matrix. 25. 31, 264 Rank к approximation. 260 Rank r, 25.33. 122.336 Rank of [ A b ],90 Indci Rank of A1 A " rank ofA, 334 Rank of AD. 334 Rank one. 25,31,335.344 Rank two. 128. 131 Ratio of determinants. 184. 230 Rayleigh quotient. 265, 268 Real eigenvalue, 227, 228 Real part. 248 Reduced SVD. 260 Reflection matrix. 166. 167,205, 281 Regression. 153, 156, 276.291 Relative error. 345 ReLU. viii. 286.289.297.356 Repeated eigenvalue, 216. 221 Repeated root. 252, 254, 343 Residual nets. 304 Reverse identity, 181 Reverse order, 45. 51 Revolution. 280 Right inverse, 94,133 Rotation. 166, 207 Row echelon form, 85, 87,94. 341 Row exchange. 61,65.66 Row picture. 21.29.44 Row rank. 25 Row space. 24. 78. 356 Row-and-column reduced, 94 Runge-Kutta. 257 S Saddle-point matrix. 70. 240 Sample value, 321 Sample variance. 322 Schur's Theorem. 235.339 Schwarz inequality. 15. 18.214.282 Scree plot. 278 Second derivative. 232 Semi -convergence, 313 Semidefinite. 230,232,331 Sensitivity, 345 Shift matrix. 299 Shift-invariance, 299.302 Shifted QR. 266,337 Short wide matrix, 88 Shortest solution. 159.315
353 lode* Similar matnx. 218.219.225, М3 Singular matnx. 46.48. 80 Singular value. 189.271 Singular vector. 262.276 Skew-symmetric. 182.208.231 255 Slider. 272 Sobel operator. УГ2 Softmax. 304 Solvable. 77 Space of matrices. 75,119 Span. 78. 110 Spanning tree. 222 Sparse. 283.284 Special volution. 84. 86,89 Spectral norm. 274 Spectral Theorem. 227,235 Spiral. 247.298 Stability, 249 Standard basis. 110 Standard deviation. 322.325 Standardized. 326 Statistics, 163 Steady state, 204.253.346 Steepest descent. 308,311 Stepsize, 306 Stiffness matrix. 248 Stochastic. 289.312.320 Straight line fit. 153.155.157 Stride. 303 Submatrix. 138 Subspace, vi. 24.75.76.80 Sum matrix. 28 Sum of matrices. 229.258.263 Sum of squares. 234.237 SVD. 272 Symmetric matrix. 20.69.72,336 T Taylor series. 306 Tensor. 344 Theorem. 90.347.356 Tilted box. 177 Tocplitz matnx. 299. 302 Tonga, 270 T'XaJ varunce 278 Tunmead. 27| Trace. 207.225.268.275 Tr^rung dau. 2*6 Training Kt. 3|2 Transformation. 192 Transpose. 67.184 TranspuK uf ЛВ, 67 Transput of d/dt. 69 Tree. 127 Triangle. 10.187 Triangle inequality, 15. 275.282 Triangular reams. 20.42. 59. 189. 336 Tndiagonal. 183.266 T*o equal runs. 181 U Unbiased. 163 Undcrdamping. 257 Underdetermincd. 100 Uniform distribution. 323 Uniqueness. 139 Unit circle. 13 Unit vector. 12 Unitary matrix. 236 Upper triangular. 4|. 45.171 V Vandermonde. 183 Variance. 163.277.292.321-326.333 Vector, v. 4.75 Vector addition. 2 Vector space. 74.75.79 Victory. 280 Volume. 177.187. 189 W Wave equation. 252 Wavelets. 285 Weights, viii. 286.299 Window. 300. 301 z Zero mean. 277 Zero sector. 76.79 Zig-zag. 309.310
> I I •I Index of Symbols (ЛЛТ)Л Л(ЛТЛ). 261 (ЛН)С ж Л (НС). 30. 38 (ЛА)я - Л(Н®). 33 (ЛН)Т - ИТЛ ’.67 (ЛЯ)-’ = Д-'Л-'.З! Л’1.50.57 (Лх)1 у ®т(Лту).68 (Л-1)1'- (Л1) '.67 (Л-')м " С>(/<1нЛ. 1X0. 1X6 Л H.III '.339 Л СЛ/И’. 90.95.338,343 Л - СП. 32. Х5. ЗЗХ. 340 Л СП' '/Г. 89.90 Л I.DU. 63 Л = LU. 58-60. ЗЗХ Л С Н. 171,2X1.338 Л QS. 2X0 Л - НЕУТ. 259.339 Л - (ЛЕгЦТ. 260 Л = АЛА-1.215,338 ЛН. 29.34.35 АН and НА. 225.262 А НС. 3IX AVr = t/r£r. 260 Лх Ь. 27.40 Ах = Аят. 202 Ли. 3 Ли а <ти, 259 Л1 Л and ЛЛ1,262 Л' Л® = ЛТЬ. 147.148 Л+. 133.2X0 Л* = АЛ*№|,216 АН = O|6J + • • + a„b*n, 34 СР Decomposition, 344 / НЛх-ЬН*. 154. 155 £-• = /,,52.53 H.j.49 Р=Л(ЛТЛ)-'ЛТ. 147 P = QQr, 176 РА - LU. 65. ЗЗХ Р2 - Р. 146 QR, 171,2X0.338 QrQ - 165 (/' ’> '.166 |(?,Н]-Чг(Л).176 Я, 33 п = [ / F ) Р. XX. 89. 340 Но. 85 О /?<,а = d. 96 S = ЛТЛ. 231,265 Я - СТС, 338 ,S’ = LDLT, 231.237 ].89 S - QAQ'r. vii. 227.237.338 .S’T - S. 69 T(ev + dw) = cT(u) + dT(w). 192 T^d/dx. 194 Т0ь344 VW UT. 280 X-'AX = Л. 215 u(t) = e^'x, 244 F(x,v), viii. 289 Sx.141 354
Index of Symbol» 355 <s [°] ж Id' ° P s Ax. 147 « = QQrb, 167 ufc = Л*м<>. 221 ж+ = A+b. 159 x » Xf, + xn. 97 х = Л\Ь. 45,97 x ♦ v. 300 x'rSx/xrx, 265 J = fl“*Q'rb. 171 STU. 236 I/1'335 U = W/|H|.I2.28I v + w, 2,5 v-w/i|v||||w|| = «*0.15 v • w « v'w, 2,12 |v-w| < Ijyll l|w||. 15 ця||ав®+в.236 ||x||b2«3 ||v +w|| < |MI + I|w||. 15 ||v||3 II cv + dw, 3,7 det. AB (<lct Л) (det В), I«4.197 dotP" lor-1.179 <1(А(Л- A/) = 0,203 det.^T) = dct(4). 184 е!В(Л). 206 Л* norm. 283,284 ^+/‘.285 ANlogaJV,66 ReLU. viii. 286.289.290.297.356 null(X). |42 пеКЛ). 35.34) С(Л). 23 С(ЛТ),78 N(4), 83 N(4T), 121 N(4T)±C(4). 136 N(4)1C(4T). 136 N(4'4) = NM), 141.149.152 V1,137 cond = <T| /a„, 345 V +W. 120 VnW.82.120 VUW, 120 Z. 113 (Л b).81.96 <r* = 261 |Л® ЛИ Лх].29 Капк(ЛЙ) < гапк(Л), 129.334 Rank( АВ) < rank(B). 129.334 | n3 multiplication», 58 |A|< fft.264 ЦЛ - B||. 260.274 1И®||/||«||. 265 ЦЛЦ.274.345 Il4“4i 345 HQxll = ll«l|. 167.280 ел‘,249 е4сВ^гМ. 255 « XeA* №‘. 250 mnp steps. 34 LAPACK. 172 randn. 19.267
Six Great Theorems / Linear Algebra in a Nutshell

Six Great Theorems of Linear Algebra

   Dimension Theorem    All bases for a vector space have the same number of vectors.
   Counting Theorem     Dimension of column space + dimension of nullspace = number of columns.
   Rank Theorem         Dimension of column space = dimension of row space. This is the rank.
   Fundamental Theorem  The row space and nullspace of A are orthogonal complements in Rⁿ.
   SVD                  There are orthonormal bases (v's and u's for the row and column spaces) so that Avᵢ = σᵢuᵢ.
   Spectral Theorem     If Aᵀ = A there are orthonormal q's so that Aqᵢ = λᵢqᵢ and A = QΛQᵀ.

Linear Algebra in a Nutshell   ((The matrix A is n by n))

   Nonsingular                               |  Singular
   A is invertible                           |  A is not invertible
   The columns are independent               |  The columns are dependent
   The rows are independent                  |  The rows are dependent
   The determinant is not zero               |  The determinant is zero
   Ax = 0 has one solution x = 0             |  Ax = 0 has infinitely many solutions
   Ax = b has one solution x = A⁻¹b          |  Ax = b has no solution or infinitely many
   A has n (nonzero) pivots                  |  A has r < n pivots
   A has full rank r = n                     |  A has rank r < n
   The reduced row echelon form is R = I     |  R has at least one zero row
   The column space is all of Rⁿ             |  The column space has dimension r < n
   The row space is all of Rⁿ                |  The row space has dimension r < n
   All eigenvalues are nonzero               |  Zero is an eigenvalue of A
   AᵀA is symmetric positive definite        |  AᵀA is only semidefinite
   A has n (positive) singular values        |  A has r < n singular values

Linear Algebra and Learning from Data   See math.mit.edu/learningfromdata

This is the new textbook for the applied linear algebra course 18.065 at MIT. It starts with the basic factorizations of a matrix:

   A = CR     A = LU     A = QR     A = XΛX⁻¹     S = QΛQᵀ     A = UΣVᵀ

The goal of deep learning is to find patterns in the training data. Matrix multiplication is interwoven with the nonlinear ramp function ReLU(x) = max(0, x). The result is a learning function that can interpret new data. The textbook explains how and why this succeeds—even in the classroom. Linear algebra and student projects are the keys.