Author: Gohberg I.   Lancaster P.   Rodman L.  

Tags: mathematics   matrix  

ISBN: 0-89871-608-X

Year: 2006

Invariant Subspaces of Matrices with Applications

Israel Gohberg
Peter Lancaster
Leiba Rodman

Classics in Applied Mathematics 51


Invariant Subspaces of Matrices with Applications
SIAM's Classics in Applied Mathematics series consists of books that were previously allowed to go out of print. These books are republished by SIAM as a professional service because they continue to be important resources for mathematical scientists.

Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington

Editorial Board
Richard A. Brualdi, University of Wisconsin-Madison
Leah Edelstein-Keshet, University of British Columbia
Nicholas J. Higham, University of Manchester
Herbert B. Keller, California Institute of Technology
Andrzej Z. Manitius, George Mason University
Hilary Ockendon, University of Oxford
Ingram Olkin, Stanford University
Peter Olver, University of Minnesota
Ferdinand Verhulst, Mathematisch Instituut, University of Utrecht

Classics in Applied Mathematics
C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Translated by G. W. Stewart
Richard Bellman, Introduction to Matrix Analysis
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability
Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
Beresford N. Parlett, The Symmetric Eigenvalue Problem
*First time in print.
Classics in Applied Mathematics (continued)
Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow
Peter W. M. John, Statistical Design and Analysis of Experiments
Tamer Basar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition
Emanuel Parzen, Stochastic Processes
Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. I: Theory
M. Vidyasagar, Nonlinear Systems Analysis, Second Edition
Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice
Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations
Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods
Leah Edelstein-Keshet, Mathematical Models in Biology
Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations
J. L. Hodges, Jr. and E. L. Lehmann, Basic Concepts of Probability and Statistics, Second Edition
George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable: Theory and Technique
Friedrich Pukelsheim, Optimal Design of Experiments
Israel Gohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications
Invariant Subspaces of Matrices with Applications

Israel Gohberg
Tel-Aviv University
Ramat-Aviv, Israel

Peter Lancaster
University of Calgary
Calgary, Alberta, Canada

Leiba Rodman
College of William & Mary
Williamsburg, Virginia

SIAM, Society for Industrial and Applied Mathematics
Philadelphia
Copyright © 2006 by the Society for Industrial and Applied Mathematics

This SIAM edition is an unabridged republication of the work first published by John Wiley & Sons, Inc., New York, 1986.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

Library of Congress Cataloging in Publication Data

Gohberg, I. (Israel), 1928-
Invariant subspaces of matrices with applications / Israel Gohberg, Peter Lancaster, Leiba Rodman.
p. cm. (Classics in applied mathematics; 51)
Originally published: New York : Wiley, c1986, in series: Canadian Mathematical Society series of monographs and advanced texts.
Includes bibliographical references and indexes.
ISBN 0-89871-608-X (pbk.)
1. Invariant subspaces. 2. Matrices. I. Lancaster, Peter, 1929-. II. Rodman, L. III. Title. IV. Series.
QA322.G649 2006
515'.73--dc22
2006042260

SIAM is a registered trademark.
To our wives Bella, Diane, and Ella
Contents

Introduction 1

Part One  Fundamental Properties of Invariant Subspaces and Applications 3

Chapter One  Invariant Subspaces: Definition, Examples, and First Properties 5
1.1 Definition and Examples 5
1.2 Eigenvalues and Eigenvectors 10
1.3 Jordan Chains 12
1.4 Invariant Subspaces and Basic Operations on Linear Transformations 16
1.5 Invariant Subspaces and Projectors 20
1.6 Angular Transformations and Matrix Quadratic Equations 25
1.7 Transformations in Factor Spaces 28
1.8 The Lattice of Invariant Subspaces 31
1.9 Triangular Matrices and Complete Chains of Invariant Subspaces 37
1.10 Exercises 40

Chapter Two  Jordan Form and Invariant Subspaces 45
2.1 Root Subspaces 45
2.2 The Jordan Form and Partial Multiplicities 52
2.3 Proof of the Jordan Form 58
2.4 Spectral Subspaces 60
2.5 Irreducible Invariant Subspaces and Unicellular Transformations 65
2.6 Generators of Invariant Subspaces 69
2.7 Maximal Invariant Subspace in a Given Subspace 72
2.8 Minimal Invariant Subspace over a Given Subspace 78
2.9 Marked Invariant Subspaces 83
2.10 Functions of Transformations 85
2.11 Partial Multiplicities and Invariant Subspaces of Functions of Transformations 92
2.12 Exercises 95

Chapter Three  Coinvariant and Semiinvariant Subspaces 105
3.1 Coinvariant Subspaces 105
3.2 Reducing Subspaces 109
3.3 Semiinvariant Subspaces 112
3.4 Special Classes of Transformations 116
3.5 Exercises 119

Chapter Four  Jordan Form for Extensions and Completions 121
4.1 Extensions from an Invariant Subspace 121
4.2 Completions from a Pair of Invariant and Coinvariant Subspaces 128
4.3 The Sigal Inequalities 133
4.4 Special Case of Completions 136
4.5 Exercises 142

Chapter Five  Applications to Matrix Polynomials 144
5.1 Linearizations, Standard Triples, and Representations of Monic Matrix Polynomials 144
5.2 Multiplication of Monic Matrix Polynomials and Partial Multiplicities of a Product 153
5.3 Divisibility of Monic Matrix Polynomials 156
5.4 Proof of Theorem 5.3.2 161
5.5 Example 167
5.6 Factorization into Several Factors and Chains of Invariant Subspaces 171
5.7 Differential Equations 175
5.8 Difference Equations 180
5.9 Exercises 183

Chapter Six  Invariant Subspaces for Transformations Between Different Spaces 189
6.1 [A B]-Invariant Subspaces 189
6.2 Block Similarity 192
6.3 Analysis of the Brunovsky Canonical Form 197
6.4 Description of [A B]-Invariant Subspaces 200
6.5 The Spectral Assignment Problem 203
6.6 Some Dual Concepts 207
6.7 Exercises 209
Chapter Seven  Rational Matrix Functions 212
7.1 Realizations of Rational Matrix Functions 212
7.2 Partial Multiplicities and Multiplication 218
7.3 Minimal Factorization of Rational Matrix Functions 225
7.4 Example 230
7.5 Minimal Factorizations into Several Factors and Chains of Invariant Subspaces 234
7.6 Linear Fractional Transformations 238
7.7 Linear Fractional Decompositions and Invariant Subspaces of Nonsquare Matrices 244
7.8 Linear Fractional Decompositions: Further Deductions 251
7.9 Exercises 255

Chapter Eight  Linear Systems 262
8.1 Reductions, Dilations, and Transfer Functions 262
8.2 Minimal Linear Systems: Controllability and Observability 265
8.3 Cascade Connections of Linear Systems 270
8.4 The Disturbance Decoupling Problem 274
8.5 The Output Stabilization Problem 279
8.6 Exercises 285

Notes to Part 1 290

Part Two  Algebraic Properties of Invariant Subspaces 293

Chapter Nine  Commuting Matrices and Hyperinvariant Subspaces 295
9.1 Commuting Matrices 295
9.2 Common Invariant Subspaces for Commuting Matrices 301
9.3 Common Invariant Subspaces for Matrices with Rank 1 Commutators 303
9.4 Hyperinvariant Subspaces 305
9.5 Proof of Theorem 9.4.2 307
9.6 Further Properties of Hyperinvariant Subspaces 311
9.7 Exercises 313

Chapter Ten  Description of Invariant Subspaces and Linear Transformations with the Same Invariant Subspaces 316
10.1 Description of Irreducible Subspaces 316
10.2 Transformations Having the Same Set of Invariant Subspaces 323
10.3 Proof of Theorem 10.2.1 328
10.4 Exercises 338

Chapter Eleven  Algebras of Matrices and Invariant Subspaces 339
11.1 Finite-Dimensional Algebras 339
11.2 Chains of Invariant Subspaces 340
11.3 Proof of Theorem 11.2.1 343
11.4 Reflexive Lattices 346
11.5 Reductive and Self-Adjoint Algebras 350
11.6 Exercises 355

Chapter Twelve  Real Linear Transformations 359
12.1 Definition, Examples, and First Properties of Invariant Subspaces 359
12.2 Root Subspaces and the Real Jordan Form 363
12.3 Complexification and Proof of the Real Jordan Form 366
12.4 Commuting Matrices 371
12.5 Hyperinvariant Subspaces 374
12.6 Real Transformations with the Same Invariant Subspaces 378
12.7 Exercises 380

Notes to Part 2 384

Part Three  Topological Properties of Invariant Subspaces and Stability 385

Chapter Thirteen  The Metric Space of Subspaces 387
13.1 The Gap Between Subspaces 387
13.2 The Minimal Angle and the Spherical Gap 392
13.3 Minimal Opening and Angular Linear Transformations 396
13.4 The Metric Space of Subspaces 400
13.5 Kernels and Images of Linear Transformations 406
13.6 Continuous Families of Subspaces 408
13.7 Applications to Generalized Inverses 411
13.8 Subspaces of Normed Spaces 415
13.9 Exercises 420
Chapter Fourteen  The Metric Space of Invariant Subspaces 423
14.1 Connected Components: The Case of One Eigenvalue 423
14.2 Connected Components: The General Case 426
14.3 Isolated Invariant Subspaces 428
14.4 Reducing Invariant Subspaces 432
14.5 Coinvariant and Semiinvariant Subspaces 437
14.6 The Real Case 439
14.7 Exercises 443

Chapter Fifteen  Continuity and Stability of Invariant Subspaces 444
15.1 Sequences of Invariant Subspaces 444
15.2 Stable Invariant Subspaces: The Main Result 447
15.3 Proof of Theorem 15.2.1 in the General Case 451
15.4 Perturbed Stable Invariant Subspaces 455
15.5 Lipschitz Stable Invariant Subspaces 459
15.6 Stability of Lattices of Invariant Subspaces 463
15.7 Stability in Metric of the Lattice of Invariant Subspaces 464
15.8 Stability of [A B]-Invariant Subspaces 468
15.9 Stable Invariant Subspaces for Real Transformations 470
15.10 Partial Multiplicities of Close Linear Transformations 475
15.11 Exercises 479

Chapter Sixteen  Perturbations of Lattices of Invariant Subspaces with Restrictions on the Jordan Structure 482
16.1 Preservation of Jordan Structure and Isomorphism of Lattices 482
16.2 Properties of Linear Isomorphisms of Lattices: The Case of Similar Transformations 486
16.3 Distance Between Invariant Subspaces for Transformations with the Same Jordan Structure 492
16.4 Transformations with the Same Derogatory Jordan Structure 497
16.5 Proofs of Theorems 16.4.1 and 16.4.4 500
16.6 Distance Between Invariant Subspaces for Transformations with Different Jordan Structures 507
16.7 Conjectures 510
16.8 Exercises 513
Chapter Seventeen  Applications 514
17.1 Stable Factorizations of Matrix Polynomials: Preliminaries 514
17.2 Stable Factorizations of Matrix Polynomials: Main Results 520
17.3 Lipschitz Stable Factorizations of Monic Matrix Polynomials 525
17.4 Stable Minimal Factorizations of Rational Matrix Functions: The Main Result 528
17.5 Proof of the Auxiliary Lemmas 532
17.6 Stable Minimal Factorizations of Rational Matrix Functions: Further Deductions 537
17.7 Stability of Linear Fractional Decompositions of Rational Matrix Functions 540
17.8 Isolated Solutions of Matrix Quadratic Equations 545
17.9 Stability of Solutions of Matrix Quadratic Equations 551
17.10 The Real Case 553
17.11 Exercises 557

Notes to Part 3 561

Part Four  Analytic Properties of Invariant Subspaces 563

Chapter Eighteen  Analytic Families of Subspaces 565
18.1 Definition and Examples 565
18.2 Kernel and Image of Analytic Families of Transformations 569
18.3 Global Properties of Analytic Families of Subspaces 575
18.4 Proof of Theorem 18.3.1 (Compact Sets) 578
18.5 Proof of Theorem 18.3.1 (General Case) 584
18.6 Direct Complements for Analytic Families of Subspaces 590
18.7 Analytic Families of Invariant Subspaces 594
18.8 Analytic Dependence of the Set of Invariant Subspaces and Fixed Jordan Structure 596
18.9 Analytic Dependence on a Real Variable 599
18.10 Exercises 601

Chapter Nineteen  Jordan Form of Analytic Matrix Functions 604
19.1 Local Behaviour of Eigenvalues and Eigenvectors 604
19.2 Global Behaviour of Eigenvalues and Eigenvectors 607
19.3 Proof of Theorem 19.2.3 613
19.4 Analytic Extendability of Invariant Subspaces 616
19.5 Analytic Matrix Functions of a Real Variable 620
19.6 Exercises 622

Chapter Twenty  Applications 624
20.1 Factorization of Monic Matrix Polynomials 624
20.2 Rational Matrix Functions Depending Analytically on a Parameter 627
20.3 Minimal Factorizations of Rational Matrix Functions 634
20.4 Matrix Quadratic Equations 639
20.5 Exercises 642

Notes to Part 4 645

Appendix  Equivalence of Matrix Polynomials 646
A.1 The Smith Form: Existence 646
A.2 The Smith Form: Uniqueness 651
A.3 Invariant Polynomials, Elementary Divisors, and Partial Multiplicities 654
A.4 Equivalence of Linear Matrix Polynomials 659
A.5 Strict Equivalence of Linear Matrix Polynomials: Regular Case 662
A.6 The Reduction Theorem for Singular Polynomials 666
A.7 Minimal Indices and Strict Equivalence of Linear Matrix Polynomials (General Case) 672
A.8 Notes to the Appendix 678

List of Notations and Conventions 679
References 683
Author Index 687
Subject Index 689
Preface to the SIAM Classics Edition

In the past 50 or 60 years, developments in mathematics have led to innovations in linear algebra and matrix theory. This progress was often initiated by topics and problems from applied mathematics. A good example of this is the development of mathematical systems theory. In particular, many new and important results in linear algebra cannot even be formulated without the notion of invariant subspaces of matrices or linear transformations.

In view of this, the authors set out to write a work on advanced linear algebra in which invariant subspaces of matrices would be the central notion, the main subject of research, and the main tool. In other words, matrix theory was to be presented entirely on the basis of the theory of invariant subspaces, including the algebraic, geometric, topological, and analytic aspects of the theory. We believed that this would give a new point of view and a better understanding of the entire subject. It would also allow us to follow up systematically the central role of invariant subspaces in linear algebra and matrix analysis, as well as their role in the study of differential and difference equations, systems theory, matrix polynomials, rational matrix functions, and algebraic Riccati equations. The first edition of the present book was the result. To the authors' knowledge it is the only book in existence with these aims.

The first parts of the book have the character of a textbook easily accessible for undergraduate students. As the development progresses, the exposition changes to approach the style and content of a graduate textbook and even a research monograph until, in the last part, recent achievements are presented. The fundamental character of the mathematics, its accessibility, and its importance in applications make this a widely useful book for experts and for students in mathematics, sciences, and engineering.

The first edition sold out in early 2005, and we could not help colleagues who found a need for it. We are grateful to Wiley-Interscience publications for producing the first edition and for returning the copyright to us in order to give the work a new life. We are especially thankful to SIAM for the decision to include this work in their series Classics in Applied Mathematics.

We would like to mention some other literature with strong connections to this book. First, there are two other relevant monographs by the present authors: Matrix Polynomials, published by Academic Press in 1982, and Matrices and Indefinite Scalar Products, published by Birkhäuser Verlag in 1983. Invariant subspaces play an important role in both of them. In fact, work on these two books convinced us of the need for the present systematic treatment. The monograph of I. Gohberg, M. A. Kaashoek, and F. van Schagen,
Partially Specified Matrices and Operators: Classification, Completion, Applications, Birkhäuser Verlag, 1995, is recommended as additional reading for Chapter 4. A later, comprehensive account of the theory of algebraic Riccati equations, discussed in Chapters 17 and 20, can be found in the monograph Algebraic Riccati Equations by P. Lancaster and L. Rodman, published by Oxford University Press in 1995. By the end of 2005 Birkhäuser Verlag will also publish the authors' Indefinite Linear Algebra. This can also be recommended as a book in which invariant subspaces play an important role.

It is a pleasure to repeat the acknowledgments appearing in the first edition. These include support from the Killam Foundation of Canada and the Nathan and Lily Silver Chair on Mathematical Analysis and Operator Theory of Tel Aviv University. Continuing support was also provided by staff at the School of Mathematical Sciences of Tel Aviv University and at the Department of Mathematics and Statistics of the University of Calgary. In particular, Jacqueline Gorsky in Tel Aviv and Pat Dalgetty in Calgary contributed with speedy and skillful development of the first typescript. Support from national organizations is also acknowledged: the Basic Research Fund of the Israel Academy of Science, the U.S. National Science Foundation, and the Natural Sciences and Engineering Research Council of Canada.

COMMENTS ON THE DEVELOPMENTS OF TWENTY YEARS

Twenty years have passed since the appearance of the first edition. Naturally, in this time advances have been made on some of the theory appearing in the first edition, advances which have appeared in specialized journals and books. Also, the status of some conjectures made in the first edition has been clarified. Here, several developments of this kind are summarized for the interested reader, together with a short bibliography.

1. Chapter 2. A characterization of matrices all of whose invariant subspaces are marked is given in [1].

2. Chapter 4. The problem of describing the Jordan forms of completions from an invariant and a coinvariant subspace, also known as the Carlson problem, has been solved (in terms of Littlewood-Richardson sequences). As it turns out, it is closely related to the problem of describing the range of the eigenvalues of A + B in terms of the eigenvalues of Hermitian matrices A and B, solved by Klyachko [5]. See the expository paper [2] and references there.

3. Chapter 9. Various results on the existence of complete chains of invariant subspaces that extend Theorem 9.3.1 are presented in [8] (see also references there). We quote Radjavi's theorem [7]: a collection S of n × n complex matrices has a complete chain of common invariant subspaces if and only if the trace is permutable:

    trace(A₁ ⋯ Aₚ) = trace(A_σ(1) ⋯ A_σ(p))

for every p-tuple A₁, …, Aₚ, Aⱼ ∈ S, and every permutation σ of {1, 2, …, p}.
4. Chapter 11. A simple proof of Burnside's theorem (Theorem 11.2.1 in the text) is given in [6]. Conjecture 11.2.3 was disproved in [3] (for all n > 1 except n = 7 and n = 11) and in [10] (for n = 7 and n = 11). It is certainly of interest to describe all pairs of complementary algebras V₁ and V₂ for which this conjecture is correct. In [3] it was proved that the conjecture is valid if the complementary algebras V₁ and V₂ are orthogonal.

5. Chapter 15. The past twenty years have seen the development of a substantial literature concerning stability (in various senses) of invariant subspaces of matrices, as well as of linear operators acting in an infinite-dimensional Hilbert space. For much of this material and its applications in the context of finite-dimensional spaces, we refer the reader to the expository paper [9] and references there.

6. Chapter 16. Conjecture 16.7.1 is false in general. A counterexample is given in [4]. The conjecture holds when A is nonderogatory (however, the proof given on page 512 is erroneous, as pointed out in [4]) and when A is diagonable. These results were established in [4] as well. An interesting open question concerns the characterization of those Jordan structures for which Conjecture 16.7.1 fails.

References

[1] R. Bru, L. Rodman, and H. Schneider, "Extensions of Jordan bases for invariant subspaces of a matrix," Linear Algebra Appl. 150, 209-225 (1991).
[2] W. Fulton, "Eigenvalues, invariant factors, highest weights, and Schubert calculus," Bull. Amer. Math. Soc. 37, 209-249 (2000).
[3] M. D. Choi, H. Radjavi, and P. Rosenthal, "On complementary matrix algebras," Integral Equations and Operator Theory 13, 165-174 (1990).
[4] J. Hartman, "On a conjecture of Gohberg and Rodman," Linear Algebra Appl. 140, 267-278 (1990).
[5] A. A. Klyachko, "Stable bundles, representation theory and Hermitian operators," Selecta Math. 4, 419-445 (1998).
[6] V. Lomonosov and P. Rosenthal, "The simplest proof of Burnside's theorem on matrix algebras," Linear Algebra Appl. 383, 45-47 (2004).
[7] H. Radjavi, "A trace condition equivalent to simultaneous triangularizability," Canad. J. Math. 38, 376-386 (1986).
[8] H. Radjavi and P. Rosenthal, Simultaneous Triangularization, Springer-Verlag, New York, 2001.
[9] A. C. M. Ran and L. Rodman, "A class of robustness problems in matrix analysis," Operator Theory: Advances and Applications 134, 337-389 (2002).
[10] T. Yoshino, "Supplemental examples: 'On complementary matrix algebras,'" Integral Equations and Operator Theory 14, 764-766 (1991).
Corrections

    Page  Line      Correction
    123   13        For [I 0] read [0 I].
    137   3         For nondecreasing read nonincreasing.
    137   6 up      For Theorem 4.4.1 read Theorem 4.1.4.
    137   5 up      For Proposition 4.1.1 read Proposition 4.4.1.
    140   8, 9 up   Reverse the order of vectors in these chains.
    145   14        For L9λ) read L(λ).
    146   1 up      For n × nℓ read nℓ × n.
    196   6         For F_(N−1) read F_N.
    197   5 up      For ℂ^(m+n) read ℂ^(m+n) → ℂⁿ.
    214   6 up      Reverse the positions of B and C. Also B and C.
    221   11        For λⱼ − 1 read λⱼ₋₁.
    223   10        For (λI − A₁) read (λI − A₁)⁻¹.
    225   4 up      For W(λ)⁻¹ read W(λ), and replace −C by C.
    360   11        In the bottom row of the matrix replace t by −t.
    673   2         For k! read k.
    687   8 up      For "Mardsen" read "Marsden."
Introduction

Invariant subspaces are a central notion of linear algebra. However, in existing texts and expositions the notion is not easily or systematically followed. Perhaps because the whole structure is very rich, the treatment becomes fragmented as other related ideas and notions intervene. In particular, the notion of an invariant subspace as an entity is often lost in the discussion of eigenvalues, eigenvectors, generalized eigenvectors, and so on.

The importance of invariant subspaces becomes clearer in the context of operator theory on spaces of infinite dimension. Here, it can be argued that the structure is poorer, and this is one of the few available tools for the study of many classes of operators. Probably for this reason, the first books on invariant subspaces appeared in the framework of infinite-dimensional spaces.

It seems to the authors that now there is a case for developing a treatment of linear algebra in which the central role of invariant subspaces is systematically followed up. The need for such a treatment has become more apparent in recent years because of developments in different fields of application, and especially in linear systems theory, where concepts such as controllability, feedback, factorization, and realization of matrix functions are commonplace. In the treatment of such problems new concepts and theories have been developed that form complete new chapters in the body of linear algebra. As examples of new concepts of linear algebra developed to meet the needs of systems theory, we should mention invariant subspaces for nonsquare matrices and similarity of such matrices.

In this book the reader will find a treatment of certain aspects of linear algebra that meets two objectives: to develop systematically the central role of invariant subspaces in the analysis of linear transformations, and to include relevant recent developments of linear algebra stimulated by linear systems theory. The latter are not dealt with separately, but are integrated into the text in a way that is natural in the development of the mathematical structure.
The first part of the book, taken alone or together with selections from the other parts, can be used as a text for undergraduate courses in mathematics, having only a first course in linear algebra as prerequisite. At the same time, the book will be of interest to graduate students in science and engineering. We trust that experts will also find the exposition and new results interesting. The authors anticipate that the book will also serve as a valuable reference work for mathematicians, scientists, and engineers. A set of exercises is included in each chapter. In general, they are designed to provide illustrations and training rather than extensions of the theory.

The first part of the book is devoted mainly to geometric properties of invariant subspaces and their applications in three fields. The fields in question are matrix polynomials, rational matrix functions, and linear systems theory. They are each presented in self-contained form and, rather than being exhaustive, the focus is on those problems in which invariant subspaces of square and nonsquare matrices play a central role. These problems include factorization and linear fractional decompositions for matrix functions; problems of realization for rational matrix functions; and the problem of describing connections, or cascades, of linear systems, pole assignment, output stabilization, and disturbance decoupling.

The second part is of a more algebraic character, in which other properties of invariant subspaces are analyzed. It contains an analysis of the extent to which the invariant subspaces determine the parent matrix, invariant subspaces common to commuting matrices, and lattices of subspaces for a single matrix and for algebras of matrices.

The numerical computation of invariant subspaces is a difficult task as, in general, it makes sense to compute only those invariant subspaces that change very little after small changes in the transformation. Thus it is important to have appropriate notions of "stable" invariant subspaces. Such an analysis of the stability of invariant subspaces and their generalizations is the main subject of Part 3. This analysis leads to applications in some of the problem areas mentioned above.

The subject of Part 4 is analytic families of invariant subspaces, and it has many useful applications. Here, the analysis is influenced by the theory of complex vector bundles, although we do not make use of this theory. The study of the connections between local and global problems is one of the main problems studied in this part. Within reasonable bounds, Part 4 relies only on the theory developed in this book. The material presented here appears for the first time in a book on linear algebra and is thereby made accessible to a wider audience.
Part One

Fundamental Properties of Invariant Subspaces and Applications

Part 1 of this work comprises almost half of the entire book. It includes what can be described as a self-contained course in linear algebra with emphasis on invariant subspaces, together with substantial developments of applications to the theory of polynomial and rational matrix-valued functions, and to systems theory. These applications demand extensions of the standard material in linear algebra that are included in our treatment in a natural way. They also serve to breathe new life into an otherwise familiar body of knowledge. Thus there is a considerable amount of material here (including all of Chapters 3, 4, and 6) that cannot be found in other books on linear algebra. Almost all of the material in this part can be understood by readers who have completed a beginning course in linear algebra, although there are places where basic ideas of calculus and complex analysis are required.
Chapter One

Invariant Subspaces: Definition, Examples, and First Properties

This chapter is mainly introductory. It contains the simplest properties of invariant subspaces of a linear transformation. Some basic tools (projectors, factor spaces, angular transformations, triangular forms) for the study of invariant subspaces are developed. We also study the behaviour of invariant subspaces of a transformation when the operations of similarity and taking adjoints are applied to the transformation. The lattice of invariant subspaces of a linear transformation, a notion that will be important in the sequel, is introduced. The presentation of the material here is elementary and does not even require use of the Jordan form.

1.1 DEFINITION AND EXAMPLES

Let A: ℂⁿ → ℂⁿ be a linear transformation. A subspace M ⊆ ℂⁿ is called invariant for the transformation A, or A invariant, if Ax ∈ M for every vector x ∈ M. In other words, M is invariant for A means that the image of M under A is contained in M: AM ⊆ M.

Trivial examples of invariant subspaces are {0} and ℂⁿ. Less trivial examples are the subspaces

    Ker A = {x ∈ ℂⁿ | Ax = 0}   and   Im A = {Ax | x ∈ ℂⁿ}

Indeed, as Ax = 0 ∈ Ker A for every x ∈ Ker A, the subspace Ker A is A invariant. Also, for every x ∈ ℂⁿ, the vector Ax belongs to Im A; in particular, A(Im A) ⊆ Im A, and Im A is A invariant.
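Both facts are easy to confirm numerically. The following sketch is ours, not the book's; it assumes NumPy and uses a rank criterion: the column span of V is A invariant exactly when appending AV to V does not increase the rank.

    # Checking A-invariance of a column span numerically (illustrative sketch).
    import numpy as np

    def is_invariant(A, V, tol=1e-10):
        """True if the column span of V is invariant under A."""
        r = np.linalg.matrix_rank(V, tol=tol)
        return np.linalg.matrix_rank(np.hstack([V, A @ V]), tol=tol) == r

    A = np.array([[2., 1., 0.],
                  [0., 2., 0.],
                  [0., 0., 0.]])

    # Here Ker A = Span{e3} and Im A = Span{e1, e2}; both are A-invariant.
    print(is_invariant(A, np.eye(3)[:, [2]]))   # Ker A: True
    print(is_invariant(A, A))                   # columns of A span Im A: True
    print(is_invariant(A, np.eye(3)[:, [1]]))   # Span{e2}: False, A e2 = e1 + 2 e2

The same helper is reused in the later sketches of this chapter.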
More generally, the subspaces

    Ker Aᵐ = {x ∈ ℂⁿ | Aᵐx = 0} ,  m = 1, 2, …

and

    Im Aᵐ = {Aᵐx | x ∈ ℂⁿ} ,  m = 1, 2, …

are A invariant. To verify this, let x ∈ Ker Aᵐ, so Aᵐx = 0. Then Aᵐ(Ax) = A(Aᵐx) = 0, that is, Ax ∈ Ker Aᵐ. This means that Ker Aᵐ is A invariant. Further, let x ∈ Im Aᵐ, so x = Aᵐy for some y ∈ ℂⁿ. Then Ax = A(Aᵐy) = Aᵐ(Ay), which implies that Ax ∈ Im Aᵐ. So Im Aᵐ is A invariant as well.

When convenient, we shall often assume implicitly that a linear transformation from ℂᵐ into ℂⁿ is given by an n × m matrix with respect to the standard orthonormal bases e₁ = (1, 0, …, 0), e₂ = (0, 1, 0, …, 0), …, eₙ = (0, 0, …, 0, 1) in ℂⁿ and e₁, …, eₘ in ℂᵐ.

The following three examples of transformations and their invariant subspaces are basic and are often used in the sequel.

Example 1.1.1. Let

    A = [ λ₀  1           ]
        [     λ₀  1       ]
        [         ⋱   ⋱   ]
        [             λ₀ 1 ]
        [                λ₀ ]

(the n × n Jordan block with λ₀ on the main diagonal). Every nonzero A-invariant subspace is of the form Span{e₁, …, eₖ}, where eᵢ is the vector (0, …, 0, 1, 0, …, 0) with 1 in the ith place. Indeed, let M be a nonzero A-invariant subspace, and let

    x = Σᵢ₌₁ⁿ αᵢeᵢ ,  αᵢ ∈ ℂ

be a vector from M for which the index k = max{m | 1 ≤ m ≤ n, αₘ ≠ 0} is maximal. Then clearly

    M ⊆ Span{e₁, …, eₖ}

On the other hand, the vector x = Σᵢ₌₁ᵏ αᵢeᵢ, αₖ ≠ 0, belongs to M. Hence, since M is A invariant, the vectors
    x₁ = Ax − λ₀x = Σᵢ₌₂ᵏ αᵢeᵢ₋₁
    x₂ = Ax₁ − λ₀x₁ = Σᵢ₌₃ᵏ αᵢeᵢ₋₂
    ⋮
    xₖ₋₁ = Axₖ₋₂ − λ₀xₖ₋₂ = αₖe₁

also belong to M. Hence the vectors

    e₁ = (1/αₖ)xₖ₋₁ ,  e₂ = (1/αₖ)(xₖ₋₂ − αₖ₋₁e₁) ,  …

belong to M as well. So Span{e₁, …, eₖ} ⊆ M, and the equality Span{e₁, …, eₖ} = M follows. As for every y = Σᵢ₌₁ᵏ βᵢeᵢ ∈ Span{e₁, …, eₖ} we have

    Ay = λ₀y + Σᵢ₌₂ᵏ βᵢeᵢ₋₁ ∈ Span{e₁, …, eₖ}

the subspace Span{e₁, …, eₖ} is indeed A invariant. The total number of A-invariant subspaces (including {0} and ℂⁿ) is thus n + 1.

In this example we have

    Ker A = {0} if λ₀ ≠ 0 ;   Ker A = Span{e₁} if λ₀ = 0

and

    Im A = ℂⁿ if λ₀ ≠ 0 ;   Im A = Span{e₁, …, eₙ₋₁} if λ₀ = 0

As expected, these subspaces are A invariant. □
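Example 1.1.1 is easy to explore numerically. The sketch below is ours (it assumes NumPy and repeats the rank test shown earlier); it builds a 4 × 4 Jordan block and confirms that each leading coordinate span Span{e₁, …, eₖ} is invariant, while a span that skips e₁ is not.

    # Invariant subspaces of a Jordan block: exactly Span{e1,...,ek}.
    import numpy as np

    def is_invariant(A, V, tol=1e-10):
        r = np.linalg.matrix_rank(V, tol=tol)
        return np.linalg.matrix_rank(np.hstack([V, A @ V]), tol=tol) == r

    n, lam = 4, 2.0
    J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)   # Jordan block J_n(lam)

    I = np.eye(n)
    for k in range(1, n + 1):
        print(k, is_invariant(J, I[:, :k]))   # True for every k = 1, ..., n

    print(is_invariant(J, I[:, [1]]))         # Span{e2} alone: False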
Example 1.1.2. Let A = λ₀I, where I is the n × n identity matrix. Clearly, every subspace in ℂⁿ is A invariant. Here the number of A-invariant subspaces is infinite (if n > 1).

Note that the set Inv(A) of all A-invariant subspaces is uncountably infinite. Indeed, for linearly independent vectors x, y ∈ ℂⁿ the one-dimensional subspaces Span{x + αy}, α ∈ ℂ, are all different and belong to Inv(A). So they form an uncountable set of A-invariant subspaces.

Conversely, if every one-dimensional subspace of ℂⁿ is A invariant for a linear transformation A, then A = λ₀I for some λ₀. Indeed, for every x ≠ 0 the subspace Span{x} is A invariant, so Ax = λ(x)x, where λ(x) is a complex number that may, a priori, depend on x. Now if λ(x₁) ≠ λ(x₂) for linearly independent vectors x₁ and x₂, then Span{x₁ + x₂} is not A invariant, because

    A(x₁ + x₂) = λ(x₁)x₁ + λ(x₂)x₂ ∉ Span{x₁ + x₂}

Hence λ₀ = λ(x) must be independent of x ≠ 0, so actually A = λ₀I. □

Later (see Proposition 2.5.4) we shall see that the set of all A-invariant subspaces of an n × n complex matrix A is never countably infinite; it is either finite or uncountably infinite.

Example 1.1.3. Let A = diag[λ₁, λ₂, …, λₙ] (n ≥ 2), where the complex numbers λ₁, …, λₙ are distinct. For any indices 1 ≤ i₁ < ⋯ < iₖ ≤ n the subspace Span{eᵢ₁, …, eᵢₖ} is A invariant. Indeed, for

    x = Σⱼ₌₁ᵏ αⱼeᵢⱼ

we have

    Ax = Σⱼ₌₁ᵏ αⱼλᵢⱼeᵢⱼ ∈ Span{eᵢ₁, …, eᵢₖ}

It turns out that these are all the invariant subspaces for A. The proof of this fact for a general n is given later in a more general framework. So the total number of A-invariant subspaces is

    Σₖ₌₀ⁿ (n choose k) = 2ⁿ
Here we shall check only that the 2 × 2 matrix

    A = [ λ₁  0 ]
        [ 0   λ₂ ] ,   λ₁ ≠ λ₂

has exactly two nontrivial invariant subspaces, Span{e₁} and Span{e₂}. Indeed, let M be any one-dimensional A-invariant subspace:

    M = Span{x} ,  x = α₁e₁ + α₂e₂ ≠ 0

Then Ax = α₁λ₁e₁ + α₂λ₂e₂ should belong to M and thus is a scalar multiple of x:

    α₁λ₁e₁ + α₂λ₂e₂ = βα₁e₁ + βα₂e₂

for some β ∈ ℂ. Comparing coefficients, we see that we obtain a contradiction λ₁ = λ₂ unless α₁ = 0 or α₂ = 0. In the former case M = Span{e₂}, and in the latter case M = Span{e₁}.

In this example we have Ker A = Span{eᵢ₀} (when det A = 0), where i₀ is the index for which λᵢ₀ = 0 (as we have assumed that the λᵢ are distinct and det A = 0, there is exactly one such index), and Im A = Span{eᵢ | i ≠ i₀}. □

The following observation is often useful in proving that a given subspace is A invariant: a subspace M = Span{x₁, …, xₖ} is A invariant if and only if Axᵢ ∈ M for i = 1, …, k. The proof of this fact is an easy exercise.

For a given transformation A: ℂⁿ → ℂⁿ and a given vector x ∈ ℂⁿ, consider the subspace

    M = Span{x, Ax, A²x, …}

We now appeal to the Cayley-Hamilton theorem, which states that Σᵢ₌₀ⁿ aᵢAⁱ = 0, where the complex numbers a₀, …, aₙ are the coefficients of the characteristic polynomial det(λI − A) of A:

    det(λI − A) = Σᵢ₌₀ⁿ aᵢλⁱ

(By writing A as an n × n matrix in some basis in ℂⁿ, we easily see from the definition of the determinant that det(λI − A) is a polynomial of degree n with aₙ = 1.) Hence Aᵏx with k ≥ n is a linear combination of x, Ax, …, Aⁿ⁻¹x, so actually

    M = Span{x, Ax, A²x, …, Aⁿ⁻¹x}

The preceding observation shows immediately that M is A invariant. Any A-invariant subspace ℒ that contains x also contains all the vectors Ax, A²x, …, and hence contains M. It follows that M is the smallest A-invariant subspace that contains the vector x.
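The span of x, Ax, …, Aⁿ⁻¹x is often called a Krylov subspace, and it can be computed directly. The following sketch is ours, not the book's; it assumes NumPy, the helper name krylov_basis is our own, and the rank-one QR screening relies on the fact that once a Krylov vector is dependent on its predecessors, all later ones are too.

    # Smallest A-invariant subspace containing x: an orthonormal basis via QR.
    import numpy as np

    def krylov_basis(A, x, tol=1e-10):
        n = A.shape[0]
        K = np.column_stack([np.linalg.matrix_power(A, j) @ x for j in range(n)])
        Q, R = np.linalg.qr(K)
        keep = np.abs(np.diag(R)) > tol      # drop dependent Krylov vectors
        return Q[:, keep]

    A = np.array([[2., 1., 0.],
                  [0., 2., 0.],
                  [0., 0., 3.]])
    x = np.array([1., 1., 0.])

    V = krylov_basis(A, x)
    print(V.shape[1])   # dimension of the subspace (here 2)
    r = np.linalg.matrix_rank(V)
    print(np.linalg.matrix_rank(np.hstack([V, A @ V])) == r)   # True: A-invariant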
We conclude this section with another useful fact regarding invariant subspaces. Namely, a subspace M ⊆ ℂⁿ is A invariant for a transformation A: ℂⁿ → ℂⁿ if and only if it is (αA + βI) invariant, where α, β are arbitrary complex numbers such that α ≠ 0. Indeed, assume that M is A invariant. Then for every x ∈ M the vector (αA + βI)x = αAx + βx belongs to M. So M is (αA + βI) invariant. As

    A = (1/α)(αA + βI) − (β/α)I

the same reasoning shows that any (αA + βI)-invariant subspace is also A invariant.

1.2 EIGENVALUES AND EIGENVECTORS

The most primitive nontrivial invariant subspaces are those with dimension equal to one. For a transformation A: ℂⁿ → ℂⁿ and some nonzero x ∈ ℂⁿ, therefore, we consider an A-invariant subspace of the form M = Span{x}. In this case there must be a λ₀ ∈ ℂ such that Ax = λ₀x. Since we then have A(αx) = α(Ax) = λ₀(αx) for any α ∈ ℂ, the number λ₀ does not depend on the choice of the nonzero vector in M. We call λ₀ an eigenvalue of A and, when Ax = λ₀x with 0 ≠ x ∈ ℂⁿ, we call x an eigenvector of A (corresponding to the eigenvalue λ₀). Observe that, since (λ₀I − A)x = 0, the eigenvalues of A can also be characterized as the set of complex zeros of the characteristic polynomial of A: φ_A(λ) = det(λI − A). The set of all eigenvalues of A is called the spectrum of A and is denoted by σ(A).

We have seen that any one-dimensional A-invariant subspace is spanned by some eigenvector. Conversely, if x₀ is an eigenvector of A corresponding to some eigenvalue λ₀, then Span{x₀} is A invariant. (In other words, A is the operator of multiplication by λ₀ when restricted to Span{x₀}.)

Let us have a closer look at the eigenvalues. As the characteristic polynomial φ_A(λ) = det(λI − A) is a polynomial of degree n, by the fundamental theorem of algebra φ_A(λ) has n (in general, complex) zeros when counted with multiplicities. These zeros are exactly the eigenvalues of A. Since the characteristic polynomial and eigenvalues are independent of the choice of basis producing the matrix representation, they are properties of the underlying transformation. So a transformation A: ℂⁿ → ℂⁿ has exactly n eigenvalues when counted with multiplicities and, in any event, the number of distinct eigenvalues of A does not exceed n.
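In computations, eigenvalues and eigenvectors, and hence all one-dimensional invariant subspaces, come straight from a standard eigensolver. A short sketch of ours, again assuming NumPy:

    # Each eigenvector x spans a one-dimensional invariant subspace: Ax = lam x.
    import numpy as np

    A = np.array([[2., 1.],
                  [0., 3.]])

    eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are eigenvectors
    for lam, x in zip(eigvals, eigvecs.T):
        print(lam, np.allclose(A @ x, lam * x))   # True: Ax stays in Span{x}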
Note that this is a property of transformations over the field of complex numbers (or, more generally, over an algebraically closed field). As we shall see later, a transformation from ℝⁿ into ℝⁿ does not always have (real) eigenvalues.

Since at least one eigenvector corresponds to any eigenvalue λ₀ of A, it follows that every linear transformation A: ℂⁿ → ℂⁿ has at least one one-dimensional invariant subspace. Example 1.1.1 shows that in certain cases a linear transformation has exactly one one-dimensional invariant subspace.

We pass now to the description of two-dimensional A-invariant subspaces in terms of eigenvalues and eigenvectors. So assume that M is a two-dimensional A-invariant subspace. Then, in a natural way, A determines a transformation from M into M. We have seen above that for every transformation in a (complex) finite-dimensional vector space (which can be identified with ℂᵐ for some m) there is an eigenvalue and a corresponding eigenvector. So there exists an x₀ ∈ M \ {0} and a complex number λ₀ such that Ax₀ = λ₀x₀. Now let x₁ be a vector in M for which {x₀, x₁} is a linearly independent set; in other words, M = Span{x₀, x₁}. Since M is A invariant, it follows that

    Ax₁ = μ₀x₀ + μ₁x₁

for some complex numbers μ₀ and μ₁. If μ₀ = 0, then x₁ is an eigenvector of A corresponding to the eigenvalue μ₁. If μ₀ ≠ 0 and μ₁ ≠ λ₀, then the vector y = −μ₀x₀ + (λ₀ − μ₁)x₁ is an eigenvector of A corresponding to μ₁ for which {x₀, y} is a linearly independent set. Indeed

    Ay = −μ₀λ₀x₀ + (λ₀ − μ₁)Ax₁ = −μ₀λ₀x₀ + (λ₀ − μ₁)(μ₀x₀ + μ₁x₁)
       = (λ₀ − μ₁)μ₁x₁ − μ₁μ₀x₀ = μ₁y

Finally, if μ₀ ≠ 0 and μ₁ = λ₀, then x₀ is the only eigenvector (up to multiplication by a nonzero complex number) of A in M. To check this, assume that α₀x₀ + α₁x₁, α₁ ≠ 0, is an eigenvector of A corresponding to an eigenvalue ν₀. Then

    A(α₀x₀ + α₁x₁) = ν₀α₀x₀ + ν₀α₁x₁        (1.2.1)

But the left-hand side of this equality is

    α₀Ax₀ + α₁Ax₁ = α₀λ₀x₀ + α₁(μ₀x₀ + λ₀x₁)

and comparing this with equality (1.2.1), we obtain

    λ₀α₁ = ν₀α₁ ,   α₀λ₀ + α₁μ₀ = ν₀α₀
which (with α₁ ≠ 0) implies λ₀ = ν₀ and α₁μ₀ = 0, a contradiction with the assumption μ₀ ≠ 0. However, note that the vectors z = (1/μ₀)x₁ and x₀ form a linearly independent set, and z has the property that Az − λ₀z = x₀. Such a vector z will be called a generalized eigenvector of A corresponding to the eigenvector x₀. In conclusion, the two-dimensional invariant subspace M is spanned by two eigenvectors if and only if either μ₀ = 0, or μ₀ ≠ 0 and μ₁ ≠ λ₀. If μ₀ ≠ 0 and μ₁ = λ₀, then M is spanned by an eigenvector and a corresponding generalized eigenvector.

A study of invariant subspaces of dimension greater than 2 along these lines becomes tedious. Nevertheless, it can be done and leads to the well-known Jordan normal form of a matrix (or transformation) (see Chapter 2).

Using eigenvectors, one can generally produce numerous invariant subspaces, as demonstrated by the following proposition.

Proposition 1.2.1
Let λ₁, …, λₖ be eigenvalues of A (not necessarily distinct), and let xᵢ be an eigenvector of A corresponding to λᵢ, i = 1, …, k. Then Span{x₁, …, xₖ} is an A-invariant subspace.

Proof. For any x = Σᵢ₌₁ᵏ αᵢxᵢ ∈ Span{x₁, …, xₖ}, where αᵢ ∈ ℂ, we have

    Ax = Σᵢ₌₁ᵏ αᵢAxᵢ = Σᵢ₌₁ᵏ αᵢλᵢxᵢ

so indeed Span{x₁, …, xₖ} is A invariant. □

For some transformations all invariant subspaces are spanned by eigenvectors as in Proposition 1.2.1, and for some transformations not all invariant subspaces are of this form. Indeed, in Example 1.1.1 only one of the n nonzero invariant subspaces is spanned by eigenvectors. On the other hand, in Example 1.1.2 every nonzero vector is an eigenvector corresponding to λ₀, so obviously every A-invariant subspace is spanned by eigenvectors.

1.3 JORDAN CHAINS

We have seen in the description of two-dimensional invariant subspaces that eigenvectors alone are not always sufficient for description of all invariant subspaces. This fact necessitates consideration of generalized eigenvectors as well. Let us make a general definition that will include this notion.

Let λ₀ be an eigenvalue of a linear transformation A: ℂⁿ → ℂⁿ. A chain of vectors
    x₀, x₁, …, xₖ

is called a Jordan chain of A corresponding to λ₀ if x₀ ≠ 0 and the following relations hold:

    Ax₀ = λ₀x₀
    Ax₁ = λ₀x₁ + x₀        (1.3.1)
    ⋮
    Axₖ = λ₀xₖ + xₖ₋₁

The first equation (together with x₀ ≠ 0) means that x₀ is an eigenvector of A corresponding to λ₀. The vectors x₁, …, xₖ are called generalized eigenvectors of A corresponding to the eigenvalue λ₀ and the eigenvector x₀.

For example, let A be the n × n Jordan block with λ₀ ∈ ℂ on the main diagonal, as in Example 1.1.1. Then e₁ is an eigenvector of A corresponding to λ₀, and e₁, e₂, …, eₙ is a Jordan chain. This Jordan chain is by no means unique; for instance, e₁, e₂ + αe₁, …, eₙ + αeₙ₋₁ is again a Jordan chain of A, where α ∈ ℂ is any number.

In Example 1.1.3 the matrix A does not have generalized eigenvectors at all; that is, every Jordan chain consists of an eigenvector only. Indeed, we have A = diag[λ₁, λ₂, …, λₙ], where λ₁, …, λₙ are distinct complex numbers; therefore

    det(λI − A) = (λ − λ₁)(λ − λ₂)⋯(λ − λₙ)

So λ₁, …, λₙ are exactly the eigenvalues of A. It is easily seen that any eigenvector of A corresponding to λᵢ is of the form αeᵢ with a nonzero scalar α. Assuming that there is a Jordan chain αeᵢ₀, x of A corresponding to λᵢ₀, equations (1.3.1) imply

    Ax − λᵢ₀x = αeᵢ₀        (1.3.2)

Write x = Σᵢ₌₁ⁿ βᵢeᵢ; then Ax = Σᵢ₌₁ⁿ λᵢβᵢeᵢ, and equality (1.3.2) gives

    Σᵢ₌₁ⁿ (λᵢ − λᵢ₀)βᵢeᵢ = αeᵢ₀        (1.3.3)
As λᵢ ≠ λᵢ₀ for i ≠ i₀, we find immediately that βᵢ = 0 for i ≠ i₀. But then the left-hand side of equation (1.3.3) is zero, a contradiction with α ≠ 0. So there are no generalized eigenvectors for the transformation A.

Jordan chains allow us to construct more invariant subspaces.

Proposition 1.3.1
Let x₀, …, xₖ be a Jordan chain of a transformation A. Then the subspace M = Span{x₀, …, xₖ} is A invariant.

Proof. We have

    Ax₀ = λ₀x₀ ∈ M

where λ₀ is the eigenvalue of A to which x₀, …, xₖ corresponds; and for i = 1, …, k

    Axᵢ = λ₀xᵢ + xᵢ₋₁ ∈ M

Hence the A invariance of M follows. □

The following proposition shows how Jordan chains behave under a linear change in the matrix A.

Proposition 1.3.2
Let α ≠ 0 and β be complex numbers. A chain of vectors x₀, x₁, …, xₖ is a Jordan chain of A corresponding to the eigenvalue λ₀ if and only if the vectors

    x₀, (1/α)x₁, …, (1/αᵏ)xₖ        (1.3.4)

form a Jordan chain of αA + βI corresponding to the eigenvalue αλ₀ + β of αA + βI.

Proof. Assume that x₀, …, xₖ is a Jordan chain of A corresponding to λ₀; that is, equalities (1.3.1) hold. Then we have

    (αA + βI)x₀ = αAx₀ + βx₀ = αλ₀x₀ + βx₀ = (αλ₀ + β)x₀

    (αA + βI)(1/α)x₁ − (αλ₀ + β)(1/α)x₁ = Ax₁ − λ₀x₁ = x₀

and in general, for i = 1, …, k

    (αA + βI)(1/αⁱ)xᵢ − (αλ₀ + β)(1/αⁱ)xᵢ = (1/αⁱ⁻¹)(Axᵢ − λ₀xᵢ) = (1/αⁱ⁻¹)xᵢ₋₁
So, by definition, the vectors in (1.3.4) form a Jordan chain of αA + βI corresponding to αλ₀ + β.

Conversely, assume that (1.3.4) is a Jordan chain of αA + βI corresponding to αλ₀ + β. As

    A = (1/α)(αA + βI) − (β/α)I

the first part of the proof shows that the vectors

    x₀ , α·(1/α)x₁ = x₁ , … , αᵏ·(1/αᵏ)xₖ = xₖ

form a Jordan chain of A corresponding to the eigenvalue (1/α)(αλ₀ + β) − (β/α) = λ₀. □

Two corollaries of Proposition 1.3.2 will be especially useful in the sequel.

Corollary 1.3.3
(a) The vector x₀ is an eigenvector of A corresponding to λ₀ if and only if x₀ is an eigenvector of αA + βI (here α ≠ 0, β are complex numbers) corresponding to αλ₀ + β. (b) The vectors x₀, …, xₖ form a Jordan chain of A corresponding to λ₀ if and only if these vectors constitute a Jordan chain of A + βI corresponding to λ₀ + β, for any complex number β.

In many instances Corollary 1.3.3 allows us to reduce the consideration of eigenvalues and Jordan chains to cases when the eigenvalue is zero. Our first example of this device appears in the proof of the following proposition.

Proposition 1.3.4
The vectors in a Jordan chain x₀, …, xₖ of A are linearly independent.

Proof. Assume the contrary, and let xₚ be the first generalized eigenvector in the Jordan chain that is a linear combination of the preceding vectors:

    xₚ = Σᵢ₌₀ᵖ⁻¹ αᵢxᵢ ,  αᵢ ∈ ℂ

We can assume that the eigenvalue λ₀ of A to which the Jordan chain x₀, …, xₖ corresponds is zero. (Otherwise, in view of Corollary 1.3.3(b), we consider A − λ₀I in place of A.) So we have Axₚ = xₚ₋₁. On the other hand, we have

    Axₚ = Σᵢ₌₀ᵖ⁻¹ αᵢAxᵢ = Σᵢ₌₁ᵖ⁻¹ αᵢxᵢ₋₁

Comparing both expressions, we see that xₚ₋₁ is a linear combination of the vectors x₀, …, xₚ₋₂. This contradicts the choice of xₚ as the first vector in the Jordan chain that is a linear combination of the preceding vectors. □
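Numerically, a Jordan chain above an eigenvector x₀ can be grown by repeatedly solving (A − λ₀I)x = previous vector, whenever that system is solvable. The sketch below is ours; it assumes NumPy and uses a least-squares solve of the singular system, whose minimum-norm solution for this particular A is exactly e₂ and then e₃.

    # Building a Jordan chain of a 3 x 3 Jordan block at eigenvalue lam.
    import numpy as np

    lam = 2.0
    A = np.array([[lam, 1., 0.],
                  [0., lam, 1.],
                  [0., 0., lam]])
    N = A - lam * np.eye(3)                 # nilpotent part

    chain = [np.array([1., 0., 0.])]        # x0 = e1, an eigenvector
    for _ in range(2):
        x, *_ = np.linalg.lstsq(N, chain[-1], rcond=None)
        chain.append(x)

    for i in range(1, len(chain)):
        print(np.allclose(A @ chain[i], lam * chain[i] + chain[i - 1]))  # True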
1.4 INVARIANT SUBSPACES AND BASIC OPERATIONS ON LINEAR TRANSFORMATIONS

In this section we first consider questions concerning invariant subspaces of sums, compositions, and inverses of linear transformations. We shall also develop the connection between invariant subspaces of a linear transformation and those of similar and adjoint transformations. The basic result for the first three algebraic operations is given in the following proposition.

Proposition 1.4.1
Let A, B: ℂⁿ → ℂⁿ be transformations, and let M ⊆ ℂⁿ be a subspace that is simultaneously A invariant and B invariant. Then M is also invariant for αA + βB (with any α, β ∈ ℂ) and for AB. Further, if A is invertible, then M is also invariant for A⁻¹.

Proof. For every x ∈ M we have

    (αA + βB)x = α(Ax) + β(Bx) ∈ M

and (AB)x = A(Bx) ∈ M because Bx ∈ M. Assume now that A is invertible, and let x₁, …, xₚ be a basis in M. Then the vectors y₁ = Ax₁, …, yₚ = Axₚ are linearly independent (because A is invertible) and belong to M (because M is A invariant). So y₁, …, yₚ is also a basis in M. Now

    A⁻¹M = A⁻¹ Span{y₁, …, yₚ} = Span{x₁, …, xₚ} = M  □

For any transformation A, we denote by Inv(A) the set of all A-invariant subspaces. Then Proposition 1.4.1 means, in short, that

    Inv(A) ∩ Inv(B) ⊆ Inv(αA + βB)        (1.4.1)
    Inv(A) ∩ Inv(B) ⊆ Inv(AB)             (1.4.2)
    Inv(A) ⊆ Inv(A⁻¹)  (if A is invertible)        (1.4.3)
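A quick numerical spot-check of Proposition 1.4.1 (ours, with NumPy and the same rank-based invariance test as before):

    # If M is invariant for both A and B, it is invariant for aA + bB,
    # for AB, and for the inverse of A (when A is invertible).
    import numpy as np

    def is_invariant(T, V, tol=1e-10):
        r = np.linalg.matrix_rank(V, tol=tol)
        return np.linalg.matrix_rank(np.hstack([V, T @ V]), tol=tol) == r

    # A and B both leave M = Span{e1, e2} invariant.
    A = np.array([[1., 2., 3.],
                  [0., 1., 4.],
                  [0., 0., 5.]])
    B = np.array([[2., 0., 1.],
                  [1., 3., 0.],
                  [0., 0., 4.]])
    V = np.eye(3)[:, :2]

    for T in (2 * A + 3 * B, A @ B, np.linalg.inv(A)):
        print(is_invariant(T, V))   # True, True, True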
By applying (1.4.3) with A replaced by A⁻¹, we get Inv(A⁻¹) ⊆ Inv(A), so actually equality holds in (1.4.3). It is very easy to produce examples where the equality fails in (1.4.1) or (1.4.2). For instance:

Example 1.4.1. Let A: ℂⁿ → ℂⁿ be a transformation that is not of the form γI for some γ ∈ ℂ (if n ≥ 2, such transformations obviously exist). By Example 1.1.2, not all subspaces in ℂⁿ are A invariant. On the other hand, take B = A and α + β = 0 in (1.4.1). Then αA + βB is the zero transformation, for which every subspace in ℂⁿ is invariant, so the right-hand side of (1.4.1) consists of all subspaces. □

To give an example where the inclusion in (1.4.2) is strict, put

    A = B = [ 0  1 ]
            [ 0  0 ]

The following example of strict inclusion in (1.4.2) is also instructive.

Example 1.4.2. Let

    A = [ 0  1 ] ,   B = [ 0  0 ]
        [ 0  0 ]         [ 1  0 ]

An easy analysis (using Example 1.1.1) shows that A and B have no nontrivial common invariant subspaces. Thus Inv(A) ∩ Inv(B) = {{0}, ℂ²}. On the other hand, the transformation AB must have an eigenvector, which spans a nontrivial AB-invariant subspace. Again, the inclusion (1.4.2) is strict. □

Consider now the notion of similarity. Recall that two transformations A and B on ℂⁿ are called similar if A = S⁻¹BS for some invertible transformation S (called a similarity transformation between A and B). Evidently, similar transformations have the same characteristic polynomial and, consequently, the same eigenvalues. The next proposition reveals the close connection between invariant subspaces of similar transformations.

Proposition 1.4.2
Let transformations A and B be similar, with the similarity transformation S: A = S⁻¹BS. Then a subspace M ⊆ ℂⁿ is A invariant if and only if the subspace

    SM = {Sx | x ∈ M} ⊆ ℂⁿ

is B invariant.

Proof. Let M be A invariant, and let x ∈ SM, so that x = Sy for some y ∈ M. Then Bx = BSy = SAy, and since Ay ∈ M, we find that Bx ∈ SM. So SM is B invariant.
Conversely, assume that SM is B invariant. Then for y ∈ M we have BSy ∈ SM, and thus

    Ay = S⁻¹BSy ∈ S⁻¹(SM) = M

So M is A invariant. □

Proposition 1.4.2 shows, in particular, that there is a natural correspondence between the sets of invariant subspaces of similar transformations. Let us check this correspondence more closely in some of the examples of invariant subspaces already introduced.

Proposition 1.4.3
Let A and B be similar, with the similarity transformation S. Then: (a) Im B = S(Im A); (b) Ker B = S(Ker A); (c) if x₀, x₁, …, xₖ is a Jordan chain of A corresponding to λ₀, then Sx₀, Sx₁, …, Sxₖ is a Jordan chain of B corresponding to the same λ₀.

Proof. The proof is straightforward. Let us check (b). Take x ∈ Ker A, so Ax = 0. Then Ax = S⁻¹BSx = 0, and as S is invertible, BSx = 0; that is, Sx ∈ Ker B. Reversing the order of this argument, we see that if Sx ∈ Ker B for some x ∈ ℂⁿ, then x ∈ Ker A. The proofs of (a) and (c) proceed in a similar way. □

Consider now the operation of taking adjoints. Let A: ℂⁿ → ℂⁿ be a transformation. Recall that the adjoint transformation A*: ℂⁿ → ℂⁿ is defined by the relation

    (Ax, y) = (x, A*y) ,  for all x, y ∈ ℂⁿ

where (·, ·) is the standard scalar product in ℂⁿ:

    (x, y) = Σᵢ₌₁ⁿ xᵢȳᵢ ,  x = (x₁, …, xₙ) ,  y = (y₁, …, yₙ)

More generally, if 𝒯₁, 𝒯₂ are subspaces in ℂⁿ and A: 𝒯₁ → 𝒯₂ is a linear transformation, its adjoint A*: 𝒯₂ → 𝒯₁ is defined by the relation

    (Ax, y) = (x, A*y)  for all x ∈ 𝒯₁, y ∈ 𝒯₂

It is not difficult to check that the adjoint transformation always exists and is unique. It is easily verified that for any linear transformations A and B on ℂⁿ and any α ∈ ℂ
    (A + B)* = A* + B* ,   (αA)* = ᾱA*
    (AB)* = B*A* ,   (A*)* = A

If (in the standard basis e₁, …, eₙ)

    A = [ a₁₁  ⋯  a₁ₙ ]
        [  ⋮         ⋮ ]
        [ aₙ₁  ⋯  aₙₙ ]

then the adjoint transformation is given by the formula

    A* = [ ā₁₁  ⋯  āₙ₁ ]
         [  ⋮         ⋮ ]
         [ ā₁ₙ  ⋯  āₙₙ ]

The same formula also holds for the transformation A written as a matrix in any orthonormal basis in ℂⁿ, as long as A* is considered as a matrix in the same basis.

There is a simple and useful characterization of the invariant subspaces of the adjoint transformation A* in terms of the invariant subspaces of A, as follows.

Proposition 1.4.4
Let A: ℂⁿ → ℂⁿ be a linear transformation. A subspace M ⊆ ℂⁿ is A* invariant if and only if its orthogonal complement M⊥ is A invariant.

Proof. Assume that M is A* invariant, and let x ∈ M⊥. We must prove that Ax ∈ M⊥. Indeed, for every y ∈ M we have

    (Ax, y) = (x, A*y) = 0

because A*y ∈ M and x ∈ M⊥. Conversely, assume that M⊥ is A invariant, and take y ∈ M. Then for every x ∈ M⊥ we have

    (A*y, x) = (y, Ax) = 0

which means that A*y ∈ M. So M is A* invariant. □

Note the following equalities relating the A-invariant subspaces Ker A and Im A and the A*-invariant subspaces Ker A* and Im A*:

    (Ker A)⊥ = Im A* ;   Ker A* = (Im A)⊥        (1.4.4)
Indeed, let x = A*y and z ∈ Ker A. Then (x, z) = (A*y, z) is the complex conjugate of (z, A*y) = (Az, y) = 0; so x ∈ (Ker A)⊥. Hence we have proved that

    Im A* ⊆ (Ker A)⊥        (1.4.5)

On the other hand, let x be orthogonal to Im A*. Then for every y ∈ ℂⁿ we have (Ax, y) = (x, A*y) = 0; so Ax ⊥ ℂⁿ, and thus Ax = 0, or x ∈ Ker A. So (Im A*)⊥ ⊆ Ker A. Taking orthogonal complements, we obtain Im A* ⊇ (Ker A)⊥. Combining with (1.4.5), we obtain the first equality in (1.4.4). The second equality follows from the first one applied to A* instead of A [recall that (A*)* = A].

Later, we shall also need the following property:

    Im A = Im(AA*)

Here, the inclusion ⊇ is clear. For the opposite inclusion, let x ∈ Im A. Then x = Ay for some y. If z is the projection of y onto Ker A, then y − z ∈ (Ker A)⊥ and also x = A(y − z). Then (1.4.4) implies that y − z ∈ Im A*, and so x ∈ Im(AA*), as required.

A transformation A: ℂⁿ → ℂⁿ is called self-adjoint if A = A*. It is easily seen that A is self-adjoint if and only if it is represented by a hermitian matrix in some orthonormal basis (recall that an n × n matrix [aⱼₖ] is called hermitian if aⱼₖ = ākⱼ for j, k = 1, …, n). For this important class of transformations we have the following corollary of Proposition 1.4.4.

Corollary 1.4.5
If A is self-adjoint, then M⊥ is A invariant if and only if M is A invariant.
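Proposition 1.4.4 is easy to confirm numerically; in matrix terms, A* is the conjugate transpose. A sketch of ours, assuming NumPy and orthonormal bases of M and M⊥:

    # M is A-invariant exactly when its orthogonal complement is A*-invariant
    # (Proposition 1.4.4 applied to A*). Here A is upper triangular, so
    # M = Span{e1, e2} is A-invariant and M-perp = Span{e3} is A*-invariant.
    import numpy as np

    def is_invariant(T, V, tol=1e-10):
        r = np.linalg.matrix_rank(V, tol=tol)
        return np.linalg.matrix_rank(np.hstack([V, T @ V]), tol=tol) == r

    A = np.array([[1., 2., 3.],
                  [0., 4., 5.],
                  [0., 0., 6.]])
    Astar = A.conj().T

    M = np.eye(3)[:, :2]       # Span{e1, e2}
    Mperp = np.eye(3)[:, 2:]   # Span{e3}

    print(is_invariant(A, M))          # True
    print(is_invariant(Astar, Mperp))  # True, as the proposition predicts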
Theorem 1.5.1

Let P be a projector. Then (Im P, Ker P) is a pair of complementary subspaces in ℂⁿ. Conversely, for every pair (L₁, L₂) of complementary subspaces in ℂⁿ, there exists a unique projector P such that Im P = L₁, Ker P = L₂.

Proof. Let x ∈ ℂⁿ. Then x = (x − Px) + Px. Clearly, Px ∈ Im P and x − Px ∈ Ker P (because P² = P). So Im P + Ker P = ℂⁿ. Further, if x ∈ Im P ∩ Ker P, then x = Py for some y ∈ ℂⁿ and Px = 0. So

    x = Py = P²y = P(Py) = Px = 0

and Im P ∩ Ker P = {0}. Hence Im P and Ker P are indeed complementary subspaces.

Conversely, let L₁ and L₂ be a pair of complementary subspaces. Let P be the unique linear transformation on ℂⁿ such that Px = x for x ∈ L₁ and Px = 0 for x ∈ L₂. Then clearly P² = P, L₁ ⊆ Im P, and L₂ ⊆ Ker P. But we already know from the first part of the proof that Im P ∔ Ker P = ℂⁿ. By dimensional considerations we have, consequently, L₁ = Im P and L₂ = Ker P. So P is a projector with the desired properties. The uniqueness of P follows from the property that Px = x for every x ∈ Im P (which, in turn, is a consequence of the equality P² = P). □

We say that P is the projector on L₁ along L₂ if Im P = L₁, Ker P = L₂. A projector P is called orthogonal if Ker P = (Im P)⊥; thus the corresponding complementary subspaces are mutually orthogonal. Orthogonal projectors are particularly important and can be characterized as follows.

Proposition 1.5.2

A projector P is orthogonal if and only if P is self-adjoint, that is, P* = P.

Proof. Suppose that P* = P, and let x ∈ Im P, y ∈ Ker P. Then

    (x, y) = (Px, y) = (x, Py) = (x, 0) = 0

that is, Ker P is orthogonal to Im P. Since by Theorem 1.5.1 Ker P and Im P are complementary, it follows that in fact Ker P = (Im P)⊥.

Conversely, let Ker P = (Im P)⊥. To prove that P* = P, we have to check the equality

    (Px, y) = (x, Py)  for all x, y ∈ ℂⁿ    (1.5.1)

Because of the sesquilinearity of the function (Px, y) in the arguments x, y ∈ ℂⁿ, and in view of Theorem 1.5.1, it is sufficient to prove equation (1.5.1) in the following four cases: (a) x, y ∈ Im P; (b) x ∈ Ker P, y ∈ Im P; (c) x ∈ Im P, y ∈ Ker P; (d) x, y ∈ Ker P. In case (d), equality (1.5.1)
is trivial because both sides are 0. In case (a) we have

    (Px, y) = (Px, Py) = (x, Py)

and (1.5.1) follows. In case (b), the left-hand side of equation (1.5.1) is zero (since x ∈ Ker P) and the right-hand side is also zero in view of the orthogonality Ker P = (Im P)⊥. In the same way, one checks (1.5.1) in case (c). So (1.5.1) holds, and P* = P. □

Note that if P is a projector, so is I − P. Indeed,

    (I − P)² = I − 2P + P² = I − 2P + P = I − P

Moreover, Ker P = Im(I − P) and Im P = Ker(I − P). It is natural to call the projectors P and I − P complementary projectors.

We now give useful representations of a projector with respect to a decomposition of ℂⁿ into a sum of two complementary subspaces. Let T: ℂⁿ → ℂⁿ be a transformation, and let L₁, L₂ be a pair of complementary subspaces in ℂⁿ. Denote mᵢ = dim Lᵢ (i = 1, 2); then m₁ + m₂ = n. The transformation T may be written as a 2 × 2 block matrix with respect to the decomposition L₁ ∔ L₂ = ℂⁿ:

    T = [ T₁₁  T₁₂ ]
        [ T₂₁  T₂₂ ]    (1.5.2)

Here T_{ij} (i, j = 1, 2) is an mᵢ × mⱼ matrix that represents in some bases the transformation PᵢT|_{Lⱼ}: Lⱼ → Lᵢ, where Pᵢ is the projector on Lᵢ along L_{3−i} (so P₁ + P₂ = I).

Suppose now that T = P is a projector on L₁ = Im P. Then representation (1.5.2) takes the form

    P = [ I  X ]
        [ 0  0 ]    (1.5.3)

for some matrix X. In general, X ≠ 0. One can easily check that X = 0 if and only if L₂ = Ker P. Analogously, if L₁ = Ker P, then (1.5.2) takes the form

    P = [ 0  Y ]
        [ 0  I ]    (1.5.4)

and Y = 0 if and only if L₂ = Im P. By the way, the direct multiplication P · P, where P is given by (1.5.3) or (1.5.4), shows that P is indeed a projector: P² = P.

Consider now an invariant subspace M for a transformation A: ℂⁿ → ℂⁿ. For any projector P with Im P = M we obtain

    PAP = AP    (1.5.5)
Indeed, if x ∈ Ker P, we obviously have PAPx = APx. If x ∈ Im P = M, we see that Ax belongs to M as well, and thus

    PAPx = PAx = Ax = APx

once more. Since ℂⁿ = Ker P ∔ Im P, (1.5.5) follows.

Conversely, if P is a projector for which (1.5.5) holds, then for every x ∈ Im P we have PAx = Ax; in other words, Im P is A invariant. So a subspace M is A invariant if and only if it is the image of a projector P for which (1.5.5) holds.

Let M be an A-invariant subspace and let P be a projector on M [so that (1.5.5) holds]. Denoting by M′ the kernel of P, represent A as a 2 × 2 block matrix

    A = [ A₁₁  A₁₂ ]
        [ A₂₁  A₂₂ ]

with respect to the direct sum decomposition ℂⁿ = M ∔ M′. Here A₁₁ is the transformation PAP|_M : M → M, A₁₂ is the linear transformation PA(I − P)|_{M′} : M′ → M,

    A₂₁ = (I − P)AP|_M : M → M′ ,  A₂₂ = (I − P)A(I − P)|_{M′} : M′ → M′

and all these transformations are written as matrices with respect to some chosen bases in M and M′. As M is A invariant, equation (1.5.5) implies that (I − P)AP = 0, that is, A₂₁ = 0. Hence

    A = [ A₁₁  A₁₂ ]
        [ 0    A₂₂ ]    (1.5.6)

Using this representation of the matrix A, we can deduce some important connections between the restriction A|_M = A₁₁ and the matrix A itself.

Proposition 1.5.3

Let x₀, …, x_k be a Jordan chain of A|_M corresponding to the eigenvalue λ₀ of A|_M. Then x₀, …, x_k is also a Jordan chain of A corresponding to λ₀. In particular, all eigenvalues of A|_M are also eigenvalues of A.

Proof. We have x₀ ≠ 0, xᵢ ∈ M for i = 0, …, k, and

    A|_M x₀ = λ₀x₀ ;  A|_M xᵢ − λ₀xᵢ = x_{i−1} ,  i = 1, …, k
As A|_M = PAP|_M = AP|_M, these relations can be rewritten as

    APx₀ = λ₀x₀ ;  APxᵢ − λ₀xᵢ = x_{i−1} ,  i = 1, …, k

But Pxᵢ = xᵢ, i = 0, 1, …, k, and we obtain the relations defining x₀, …, x_k as a Jordan chain of A corresponding to λ₀. □

The last statement in Proposition 1.5.3 can also be proved in the following way. Suppose that λ₀ ∈ σ(A₁₁), that is, Ker(λ₀I − A₁₁) ≠ {0}. The representation (1.5.6) implies that any nonzero vector from Ker(λ₀I − A₁₁) belongs to Ker(λ₀I − A). Thus Ker(λ₀I − A) ≠ {0}, and λ₀ ∈ σ(A).

In fact, a more general result holds.

Proposition 1.5.4

Let M be an A-invariant subspace with a direct complement M′ in ℂⁿ, and let

    A = [ A₁₁  A₁₂ ]
        [ 0    A₂₂ ]

be the representation of A with respect to the decomposition ℂⁿ = M ∔ M′. Then

    σ(A) = σ(A₁₁) ∪ σ(A₂₂)

Proof. This follows immediately from the fact that det(λI − A) = det(λI − A₁₁) det(λI − A₂₂). □

As an example in which projectors and the subspaces Im A and Ker A of a transformation A all play important roles, let us describe here a construction of generalized inverses for A. Given a transformation A: ℂⁿ → ℂᵐ, a transformation X: ℂᵐ → ℂⁿ is called a generalized inverse of A if the following holds: for any b ∈ Im A the linear system Ax = b has a solution x = Xb, and for any b ∈ Im X the linear system Xx = b has a solution x = Ab. So this is a natural generalization of the notion of the inverse transformation.

Observe that X is a generalized inverse of A if and only if AXA = A and XAX = X. Indeed, let X be a generalized inverse of A. Then AXb = b for every b ∈ Im A, that is, for every b of the form b = Ay. So AXAy = Ay for all y ∈ ℂⁿ, and AXA = A. Similarly, one checks that XAX = X. Conversely, if AXA = A, then for every b of the form b = Ay the vector Xb = XAy is obviously a solution of the linear equation Ax = b; in the same way, XAX = X ensures that x = Ab solves Xx = b for every b ∈ Im X.

The description of all generalized inverses of A, which implies, in particular, that a generalized inverse of A always exists, is given by the following theorem.
Theorem 1.5.5

Let A: ℂⁿ → ℂᵐ be a transformation, let ℂⁿ = Ker A ∔ N, ℂᵐ = Im A ∔ R for some subspaces N and R, and let P be the projector on Im A along R, Q the projector on N along Ker A. Then (a) the transformation A₁ = A|_N is a one-to-one transformation of N onto Im A; (b) the transformation A^I defined on ℂᵐ by A^I y = A₁⁻¹(Py), for all y ∈ ℂᵐ, is a generalized inverse of A for which AA^I = P and A^I A = Q; (c) all generalized inverses of A are obtained in this way as N, R range over all complementary subspaces for Ker A, Im A, respectively.

The proof of Theorem 1.5.5 is straightforward. It is easily seen that, in the hypotheses of the theorem, the complementary subspaces R, N are simply the range and null space of the generalized inverse that they determine.

Corollary 1.5.6

In the statement of Theorem 1.5.5, we have

    Im A^I = N  and  Ker A^I = R

so that

    Ker A ∔ Im A^I = ℂⁿ  and  Im A ∔ Ker A^I = ℂᵐ

1.6 ANGULAR TRANSFORMATIONS AND MATRIX QUADRATIC EQUATIONS

In this section we study angular transformations and their connections with matrix quadratic equations and invariant subspaces. The correspondence between the invariant subspaces of similar transformations described in Proposition 1.4.2 is useful here. This discussion can be seen as the first step in the examination of solutions of matrix quadratic equations. In this program, we first need the notion of a subspace "angular with respect to a projector." In Chapter 13 we discuss the topological properties of such subspaces in preparation for the applications to quadratic equations to be made in Chapters 17 and 20.

Let π be a projector defined on ℂⁿ. Transformations acting on ℂⁿ are written in this section in 2 × 2 block matrix form with respect to the decomposition ℂⁿ = Im π ∔ Ker π. A subspace N of ℂⁿ is said to be angular with respect to π if N ∔ Ker π = ℂⁿ, that is, if and only if N and Ker π are complementary subspaces of ℂⁿ. Thus Im π is angular with respect to π; but more generally, if R is any transformation from Im π into Ker π, then the subspace

    N_R = {x | x = Ry + y , y ∈ Im π}    (1.6.1)
is angular with respect to π. To see this, observe first that N_R is indeed a subspace; that is, if x₁, x₂ ∈ N_R, then for some y₁, y₂ ∈ Im π

    x₁ + x₂ = (Ry₁ + y₁) + (Ry₂ + y₂) = R(y₁ + y₂) + (y₁ + y₂) ∈ N_R

and, if α ∈ ℂ,

    αx₁ = α(Ry₁ + y₁) = R(αy₁) + (αy₁) ∈ N_R

Then ℂⁿ = N_R + Ker π because, for any y ∈ ℂⁿ, if y₁ = πy, y₂ = (I − π)y, then

    y = y₁ + y₂ = (Ry₁ + y₁) + (y₂ − Ry₁)

and Ry₁ + y₁ ∈ N_R, y₂ − Ry₁ ∈ Ker π. Finally, if z ∈ N_R ∩ Ker π, then z = Ry + y, where y ∈ Im π, and also πz = 0. Thus

    0 = πRy + πy = πRy + y

Since R maps into Ker π, πR = 0 and it follows that y = 0. Hence z = 0 and ℂⁿ = N_R ∔ Ker π.

The angular subspaces generated in this way are, in fact, all possible angular subspaces.

Proposition 1.6.1

Let N be a subspace of ℂⁿ. Then N is angular with respect to π if and only if N = N_R for some transformation R: Im π → Ker π that is uniquely determined by N.

Proof. If N = N_R, we have already checked that N is angular. To prove the converse, assume that N is angular with respect to π, and let Q be the projector of ℂⁿ onto N along Ker π. Put

    Rx = (Q − π)x ,  x ∈ Im π    (1.6.2)

Then N = N_R. Indeed,

    π(Rx) = (πQ − π)x = (π − π)x = 0

that is, R: Im π → Ker π, and we have to show that N = N_R. If x ∈ N_R, then for some y = πy,

    x = Ry + y = (Q − π)y + πy = Qy ∈ N
Thus N_R ⊆ N. Conversely, if y ∈ N, then

    y = Qy = Qπy = (R + π)πy = R(πy) + (πy) ∈ N_R

thus N = N_R, as required.

To prove the uniqueness of R, we show that any defining transformation R in (1.6.1) must have the form (1.6.2). Thus, let N be angular with respect to π, and let R: Im π → Ker π satisfy (1.6.1). Let y ∈ Im π and x = Ry + y ∈ N. Then, since I − Q is the projector onto Ker π along N,

    0 = (I − Q)x = (I − Q)Ry + (I − Q)πy

But QR = 0 and Qπ = Q, so that Ry = (Q − π)y. □

The transformation R appearing in the preceding proposition is called the angular transformation for N. Note that R can be defined as the restriction of a difference of projectors:

    R = (Q − π)|_{Im π}

Consider now a transformation T: ℂⁿ → ℂⁿ. As before, let π: ℂⁿ → ℂⁿ be a projector, so that we have ℂⁿ = Im π ∔ Ker π. Then T has a representation with respect to this decomposition:

    T = [ T₁₁  T₁₂ ]
        [ T₂₁  T₂₂ ]    (1.6.3)

It is clear that Im π is invariant under T if and only if T₂₁ = 0. Similarly, Ker π is T invariant if and only if T₁₂ = 0. More generally, what is the condition that a subspace N that is angular with respect to π be T invariant?

Theorem 1.6.2

Let N be an angular subspace with respect to the projector π, and let T have the representation (1.6.3) with respect to the decomposition ℂⁿ = Im π ∔ Ker π. Then N is T invariant if and only if the angular transformation R for N satisfies the matrix quadratic equation

    RT₁₂R + RT₁₁ − T₂₂R − T₂₁ = 0    (1.6.4)

Proof. If I₁, I₂ are the identity transformations on Im π and Ker π, respectively, then, since R: Im π → Ker π, we can define the transformation
    E = [ I₁  0  ]
        [ R   I₂ ]

which is written as a 2 × 2 matrix with respect to the decomposition ℂⁿ = Im π ∔ Ker π. The transformation E is obviously invertible, and

    E⁻¹ = [ I₁   0  ]
          [ −R  I₂ ]

For every x ∈ Im π we have Ex = x + Rx ∈ N. So E maps Im π onto N, and E⁻¹ maps N back onto Im π. By Proposition 1.4.2, N is T invariant if and only if Im π is E⁻¹TE invariant. Now observe that

    E⁻¹TE = [ T₁₁ + T₁₂R                         T₁₂        ]
            [ −RT₁₂R − RT₁₁ + T₂₂R + T₂₁        T₂₂ − RT₁₂ ]    (1.6.5)

so Im π is E⁻¹TE invariant if and only if (1.6.4) holds. □

Another important observation follows from the similarity (1.6.5).

Corollary 1.6.3

If N is T invariant, then

    σ(T) = σ(T₁₁ + T₁₂R) ∪ σ(T₂₂ − RT₁₂)    (1.6.6)

and

    σ(T|_N) = σ(T₁₁ + T₁₂R)    (1.6.7)

Proof. We have

    σ(T) = σ(E⁻¹TE) = σ( [ T₁₁ + T₁₂R  T₁₂        ] )
                         [ 0            T₂₂ − RT₁₂ ]

Now use Proposition 1.5.4 to obtain (1.6.6). Further, σ(T|_N) = σ(E⁻¹TE|_{Im π}) = σ(T₁₁ + T₁₂R). □

1.7 TRANSFORMATIONS IN FACTOR SPACES

Let N ⊆ ℂⁿ be a subspace. We say that two vectors x, y ∈ ℂⁿ are comparable modulo N if x − y ∈ N, and denote this by x ≡ y (mod N). In particular, x ≡ 0 (mod N) if and only if x ∈ N. This relation is easily seen to be reflexive, symmetric, and transitive. That is,

    x ≡ x (mod N)  for all x ∈ ℂⁿ
    x ≡ y (mod N)  ⟹  y ≡ x (mod N)
    x ≡ y (mod N) and y ≡ z (mod N)  ⟹  x ≡ z (mod N)
Thus we have an equivalence relation on ℂⁿ. It follows that ℂⁿ is decomposed into disjoint classes of vectors with the properties that in each class the vectors are comparable modulo N, and in different classes the vectors are not comparable modulo N. We denote by [x]_N the class of vectors that are comparable modulo N to a given vector x ∈ ℂⁿ. The set of all such classes of vectors defined by comparability modulo N is denoted ℂⁿ/N.

Proposition 1.7.1

The set ℂⁿ/N is a vector space over ℂ with the following operations of addition and multiplication by a complex number:

    [x]_N + [y]_N = [x + y]_N ;  α[x]_N = [αx]_N ,  α ∈ ℂ

Proof. We have to check first that these definitions do not depend on the choice of the representatives x ∈ [x]_N and y ∈ [y]_N. If x₁ ∈ [x]_N and y₁ ∈ [y]_N, then

    (x₁ + y₁) − (x + y) = (x₁ − x) + (y₁ − y) ∈ N

that is, x₁ + y₁ ∈ [x + y]_N. So indeed the class [x + y]_N does not depend on the choice of x and y. Similarly, one checks that [αx]_N does not depend on the choice of x in the class [x]_N (for fixed α).

It is a straightforward but tedious task to verify that ℂⁿ/N satisfies the following defining properties of a vector space over ℂ: (a) the sum is commutative and associative: x + y = y + x and (x + y) + z = x + (y + z) for every x, y, z ∈ ℂⁿ/N; (b) there is a zero element 0 ∈ ℂⁿ/N, that is, an element 0 such that x + 0 = x for all x ∈ ℂⁿ/N; (c) for every x ∈ ℂⁿ/N there is an additive inverse element y ∈ ℂⁿ/N, that is, such that x + y = 0; (d) for every α, β ∈ ℂ and x, y ∈ ℂⁿ/N the following equalities hold: α(x + y) = αx + αy, (α + β)x = αx + βx, (αβ)x = α(βx), and 1x = x (here 1 is the complex number). We leave the verification of all these properties to the reader. □

The vector space ℂⁿ/N is isomorphic to any direct complement N′ of N in ℂⁿ. Indeed, let a ∈ ℂⁿ/N; then there exists a unique vector y ∈ N′ such that a = [y]_N and, in fact, y = Px, where P is the projector on N′ along N and x is any vector in the class a. This is easily checked. We have y − x = −(I − P)x ∈ N, so y ∈ a. If there were two different vectors y₁ and y₂ in N′ such that [y₁]_N = [y₂]_N = a, then y₁ − y₂ ∈ N′ ∩ N and y₁ ≠ y₂, which contradicts the choice of N′ as a direct complement to N in ℂⁿ. So we have constructed a map φ: ℂⁿ/N → N′ defined by φ(a) = y. This map is easily seen to be a homomorphism of vector spaces; that is,

    φ(a + b) = φ(a) + φ(b) ;  φ(αa) = αφ(a)
for every a, b ∈ ℂⁿ/N and every α ∈ ℂ. Moreover, if φ(a) = φ(b), then the vector y = φ(a) = φ(b) belongs to both classes a and b of comparable vectors modulo N, and thus a = b. So φ is one-to-one. Taking any y ∈ N′, we see that φ([y]_N) = y, so φ is onto. Summing up, φ is an isomorphism between the two vector spaces ℂⁿ/N and N′. In particular, dim ℂⁿ/N = n − dim N.

Assume now that N is A invariant for some transformation A: ℂⁿ → ℂⁿ. Then the induced transformation Ā: ℂⁿ/N → ℂⁿ/N is defined by Ā[x]_N = [Ax]_N for any x ∈ ℂⁿ. This definition does not depend on the choice of the vector x in its class [x]_N. Indeed, if [x₁]_N = [x₂]_N, then

    Ax₁ − Ax₂ = A(x₁ − x₂) ∈ N

because x₁ − x₂ ∈ N and N is A invariant.

We now present some basic properties of the induced linear transformation Ā.

Proposition 1.7.2

If N is invariant for both transformations A: ℂⁿ → ℂⁿ and B: ℂⁿ → ℂⁿ, then

    (αA + βB)‾ = αĀ + βB̄  for any α, β ∈ ℂ    (1.7.1)
    (AB)‾ = ĀB̄

If, in addition, A is invertible, then

    (A⁻¹)‾ = (Ā)⁻¹    (1.7.2)

Proof. By Proposition 1.4.1, N is invariant for αA + βB, AB, and A⁻¹ (if A is invertible). For any x ∈ ℂⁿ we have

    (αA + βB)‾[x]_N = [(αA + βB)x]_N = α[Ax]_N + β[Bx]_N = αĀ[x]_N + βB̄[x]_N

Further, by the definition of the induced transformation we have

    (AB)‾[x]_N = [ABx]_N  and  ĀB̄[x]_N = Ā[Bx]_N = [ABx]_N

for every x ∈ ℂⁿ. Finally, (1.7.2) is a particular case of (1.7.1) (with B = A⁻¹), taking into account the fact that Ī = I. □
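In computations, the isomorphism ℂⁿ/N ≅ N′ makes the induced transformation concrete: in a basis adapted to ℂⁿ = N ∔ N′ the matrix of A is block upper triangular as in (1.5.6), and Ā is represented by the lower-right block. The following is a small sketch of this, assuming Python with NumPy; the 3 × 3 example is ours, chosen so that N = Span{e₁} is A invariant.

```python
import numpy as np

# A with invariant subspace N = Span{e1}.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 3.0]])
N  = np.array([[1.0], [0.0], [0.0]])   # basis of N; A N is contained in N
Nc = np.array([[0.0, 0.0],
               [1.0, 0.0],
               [0.0, 1.0]])            # basis of a direct complement N'

S = np.hstack([N, Nc])                 # basis adapted to C^3 = N + N'
T = np.linalg.inv(S) @ A @ S           # block upper triangular, cf. (1.5.6)
A_induced = T[1:, 1:]                  # matrix of the induced map on C^3/N
print(A_induced)                       # [[3. 1.], [0. 3.]]
```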
It may happen that A is not invertible but Ā is invertible. For instance, let A: ℂⁿ → ℂⁿ be any transformation with the property that ℂⁿ = Ker A ∔ Im A. (There are many transformations with this property; those represented by a diagonal matrix in some basis for ℂⁿ, for example.) Put N = Ker A. Then for every vector x ∈ ℂⁿ that is not in N we have Ā[x]_N = [Ax]_N ≠ 0. Thus Ker Ā = {0} and Ā is invertible. The following proposition clarifies the situation.

Proposition 1.7.3

If λ₀ is an eigenvalue of A and Ker(A − λ₀I) is not contained in N, then λ₀ is also an eigenvalue of Ā. Conversely, for every eigenvalue λ₀ of Ā, λ₀ is an eigenvalue of A and Ker(A − λ₀I) is not contained in N.

The proof is immediate: if Ax = λ₀x with x ∉ N, then Ā[x]_N = λ₀[x]_N with [x]_N ≠ 0, and conversely.

1.8 THE LATTICE OF INVARIANT SUBSPACES

We start with the notion of a lattice of subspaces in ℂⁿ. A set S of subspaces in ℂⁿ is called a lattice if {0} and ℂⁿ belong to S and S contains the intersection and sum of any two subspaces belonging to S. The following are examples of lattices of subspaces: (a) S = {{0}, M, M⊥, ℂⁿ}, where M is a fixed subspace in ℂⁿ; (b) S = {{0}, Span{e₁, …, e_k} for k = 1, …, n}; (c) S is the set of all subspaces in ℂⁿ.

For us, the following example of a lattice of subspaces will be the most important.

Proposition 1.8.1

The set Inv(A) of all invariant subspaces of a fixed transformation A: ℂⁿ → ℂⁿ is a lattice.

Proof. Let M, N ∈ Inv(A). If x ∈ M ∩ N, then because of the A invariance of M and N we have Ax ∈ M and Ax ∈ N, so M ∩ N is A invariant. Now let x ∈ M + N, so that x = x₁ + x₂, where x₁ ∈ M, x₂ ∈ N. Then Ax = Ax₁ + Ax₂ ∈ M + N, and M + N is A invariant as well. Finally, both {0} and ℂⁿ obviously belong to Inv(A). □

Actually, examples (b) and (c) are particular cases of Proposition 1.8.1: (b) is just the set of all A-invariant subspaces for

    A = [ 0  1  0  ⋯  0 ]
        [ 0  0  1  ⋯  0 ]
        [ ⋮           ⋮ ]
        [ 0  0  0  ⋯  1 ]
        [ 0  0  0  ⋯  0 ]

and example (c) is the set of all invariant subspaces for the zero matrix.
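Proposition 1.8.1 is easy to probe mechanically. The following sketch (Python with NumPy assumed; the rank-based test is_invariant is our ad hoc device) checks the A invariance of a few subspaces for the shift matrix of example (b), including the sum of two invariant subspaces.

```python
import numpy as np

def is_invariant(A, M, tol=1e-10):
    """Columns of M span a subspace; A M lies in span(M) iff
    appending the columns of A M to M does not raise the rank."""
    r = np.linalg.matrix_rank
    return r(np.hstack([M, A @ M]), tol) == r(M, tol)

n = 4
A = np.diag(np.ones(n - 1), k=1)   # upper shift: e1 -> 0, e_k -> e_{k-1}

E = np.eye(n)
M = E[:, :2]                                   # Span{e1, e2}
N = E[:, :3]                                   # Span{e1, e2, e3}
print(is_invariant(A, M), is_invariant(A, N))  # True True
print(is_invariant(A, np.hstack([M, N])))      # the sum M + N: True
print(is_invariant(A, E[:, 1:3]))              # Span{e2, e3}: False
```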
In contrast, if n > 2, the lattice of example (a) is never the lattice of invariant subspaces of a fixed transformation A. Indeed, assuming the contrary, the restriction A|_M has a one-dimensional invariant subspace (a subspace spanned by an eigenvector; here we consider A|_M as a transformation from M into M). By Proposition 1.5.3, this subspace is also an invariant subspace of A. Hence necessarily dim M = 1, and for the same reason dim M⊥ = 1. Since ℂⁿ = M ∔ M⊥, we obtain a contradiction when n > 2.

In terms of the lattices of invariant subspaces, Propositions 1.4.2 and 1.4.4 can be restated as follows. We define [Inv(A)]⊥ to be the set of subspaces M⊥ for which M ∈ Inv(A).

Proposition 1.8.2

Given a transformation A: ℂⁿ → ℂⁿ and an invertible transformation S: ℂⁿ → ℂⁿ, we have

    S[Inv(A)] = Inv(SAS⁻¹)  and  Inv(A*) = [Inv(A)]⊥

We know that if M₁ and M₂ are A invariant, then so are M₁ + M₂ and M₁ ∩ M₂. It is of interest to find out how the spectra of the restrictions A|_{M₁+M₂} and A|_{M₁∩M₂} are related to the spectra of A|_{M₁} and A|_{M₂}.

Theorem 1.8.3

If M₁ and M₂ are A-invariant subspaces, then

    σ(A|_{M₁+M₂}) = σ(A|_{M₁}) ∪ σ(A|_{M₂})    (1.8.1)

and

    σ(A|_{M₁∩M₂}) ⊆ σ(A|_{M₁}) ∩ σ(A|_{M₂})    (1.8.2)

Recall that σ(B) stands for the set of eigenvalues of a transformation B.

Proof. Proposition 1.5.3 shows that the inclusion ⊇ holds in (1.8.1). To prove the opposite inclusion, write

    M₁ + M₂ = M′₁ ∔ (M₁ ∩ M₂) ∔ M′₂    (1.8.3)

where M′₁ is a subspace of M₁ such that M′₁ ∔ (M₁ ∩ M₂) = M₁, and M′₂ ⊆ M₂ satisfies M′₂ ∔ (M₁ ∩ M₂) = M₂. Write A|_{M₁+M₂} as the 3 × 3 block matrix with respect to the decomposition (1.8.3):
    A|_{M₁+M₂} = [ A₁₁  A₁₂  A₁₃ ]
                 [ A₂₁  A₂₂  A₂₃ ]  : M₁ + M₂ → M₁ + M₂
                 [ A₃₁  A₃₂  A₃₃ ]

Here A_{ij} = PᵢAPⱼ, and P₁ (resp. P₃) is the projector on M′₁ along (M₁ ∩ M₂) ∔ M′₂ [resp. on M′₂ along (M₁ ∩ M₂) ∔ M′₁], and P₂ = I − P₁ − P₃. As we have seen above, the A invariance of M₁ implies A₃₁ = A₃₂ = 0, and the A invariance of M₂ implies A₁₂ = A₁₃ = 0. So

    A|_{M₁+M₂} = [ A₁₁  0    0   ]
                 [ A₂₁  A₂₂  A₂₃ ]    (1.8.4)
                 [ 0    0    A₃₃ ]

We find that

    det(λI − A|_{M₁+M₂}) = det(λI − A₁₁) det(λI − A₂₂) det(λI − A₃₃)

and hence that λ ∈ σ(A|_{M₁+M₂}) implies λ ∈ σ(A|_{M₁}) or λ ∈ σ(A|_{M₂}).

For the proof of (1.8.2) note that M₁ ∩ M₂ ⊆ M₁, and hence, by Proposition 1.5.3, σ(A|_{M₁∩M₂}) ⊆ σ(A|_{M₁}). Similarly, σ(A|_{M₁∩M₂}) ⊆ σ(A|_{M₂}), and (1.8.2) follows. □

The following example shows that the inclusion in (1.8.2) may be strict.

EXAMPLE 1.8.1. Let

    A = [ 1  0  0 ]
        [ 0  0  0 ] ,  M₁ = Span{e₁, e₂} ,  M₂ = Span{e₁, e₃}
        [ 0  0  0 ]

Then M₁ and M₂ are A invariant and σ(A|_{M₁∩M₂}) = {1}; σ(A|_{M₁}) = σ(A|_{M₂}) = {1, 0}. □

A set S of subspaces in ℂⁿ is called a chain if {0} and ℂⁿ belong to S and either M ⊂ N or N ⊂ M (with proper inclusions) for every pair of different subspaces M, N ∈ S. Obviously, a chain is also a lattice. Also, a chain of subspaces is always finite (actually, it cannot contain more than n + 1 subspaces), in contrast with lattices, which may be infinite, as in example (c) above. Let

    {0} ⊂ M₁ ⊂ M₂ ⊂ ⋯ ⊂ M_{k−1} ⊂ M_k = ℂⁿ    (1.8.5)

be a chain of different subspaces. We choose a direct complement Lᵢ to
M_{i−1} in the subspace Mᵢ (i = 1, …, k; here M₀ = {0}). Then we obtain a decomposition of ℂⁿ into a direct sum

    L₁ ∔ L₂ ∔ ⋯ ∔ L_k = ℂⁿ    (1.8.6)

This means that for every vector x ∈ ℂⁿ there exist unique vectors x₁ ∈ L₁, …, x_k ∈ L_k such that x = x₁ + x₂ + ⋯ + x_k. Now let Pᵢ be the projector on Lᵢ along

    L₁ ∔ L₂ ∔ ⋯ ∔ L_{i−1} ∔ L_{i+1} ∔ ⋯ ∔ L_k

The projectors Pᵢ are mutually disjoint; that is, PᵢPⱼ = PⱼPᵢ = 0 for i ≠ j, and P₁ + ⋯ + P_k = I. Now any transformation A: ℂⁿ → ℂⁿ can be written as a k × k block matrix with respect to the decomposition (1.8.6):

    A = [ A₁₁  A₁₂  ⋯  A₁ₖ ]
        [ ⋮             ⋮ ]    (1.8.7)
        [ Aₖ₁  Aₖ₂  ⋯  Aₖₖ ]

where each transformation A_{ij} = PᵢA|_{Lⱼ}: Lⱼ → Lᵢ is written as a matrix in some fixed bases in Lⱼ and Lᵢ. Choose a basis x₁, …, xₙ in ℂⁿ in such a way that

    Span{x₁, …, x_{pᵢ}} = Mᵢ ,  i = 1, …, k

where 0 < p₁ < p₂ < ⋯ < p_k = n, and let

    Lᵢ = Span{x_{p_{i−1}+1}, x_{p_{i−1}+2}, …, x_{pᵢ}}

Then one can characterize all matrices for which (1.8.5) is a chain of (not necessarily all) invariant subspaces in terms of the k × k block representation, as follows.

Proposition 1.8.4

All subspaces of the chain (1.8.5) are invariant for a transformation A if and only if A has the following form in the chosen basis x₁, …, xₙ:

    A = [ A₁₁  A₁₂  ⋯  A₁ₖ ]
        [ 0    A₂₂  ⋯  A₂ₖ ]
        [ ⋮        ⋱    ⋮ ]    (1.8.8)
        [ 0    0   ⋯   Aₖₖ ]

where A_{ij} is a (pᵢ − p_{i−1}) × (pⱼ − p_{j−1}) matrix, 1 ≤ i ≤ j ≤ k (and we define p₀ = 0).
Proof. Assume that A has the form (1.8.8), which means that, in terms of the projectors P₁, …, P_k defined above, the equalities PᵢAPⱼ = 0 hold for i > j. For a fixed j it follows that

    (P_{j+1} + ⋯ + P_k) A (P₁ + ⋯ + Pⱼ) = 0

As Qⱼ ≝ P₁ + ⋯ + Pⱼ is a projector on Mⱼ and P_{j+1} + ⋯ + P_k = I − Qⱼ, we obtain (I − Qⱼ)AQⱼ = 0, which means that Mⱼ = Im Qⱼ is A invariant.

Conversely, if M₁, M₂, …, M_k are all A invariant, then the equality (I − Qⱼ)AQⱼ = 0 holds for j = 1, …, k. So PᵢAPⱼ = 0 for i > j, and A has the form (1.8.8). □

A chain of subspaces

    {0} ⊂ M₁ ⊂ M₂ ⊂ ⋯ ⊂ M_t = ℂⁿ    (1.8.9)

is called maximal (or complete) if it cannot be extended to a larger chain; that is, any chain of subspaces

    {0} ⊂ L₁ ⊂ L₂ ⊂ ⋯ ⊂ L_s = ℂⁿ

with the property that every Mᵢ is equal to some Lⱼ coincides with the chain (1.8.9). It is easily seen that a chain (1.8.9) is maximal if and only if dim Mᵢ = i, i = 1, …, n. Now if (1.8.9) is a maximal chain, we may choose a basis x₁, …, xₙ in ℂⁿ in such a way that

    Mᵢ = Span{x₁, …, xᵢ} ,  i = 1, …, n

As a particular case of Proposition 1.8.4, we find that all the subspaces M₁, …, Mₙ are A invariant for a transformation A if and only if A has upper triangular form in the basis x₁, …, xₙ:

    A = [ a₁₁  a₁₂  ⋯  a₁ₙ ]
        [ 0    a₂₂  ⋯  a₂ₙ ]
        [ ⋮        ⋱    ⋮ ]
        [ 0    0   ⋯   aₙₙ ]

We conclude this section with a useful result on chains of invariant subspaces for a transformation having a basis of eigenvectors in ℂⁿ. It turns out that such chains can be chosen to be complementary to any chain of subspaces given in advance.

Theorem 1.8.5

Let A: ℂⁿ → ℂⁿ be a transformation having a basis in ℂⁿ formed by eigenvectors of A. Then for every chain of subspaces N₁ ⊂ ⋯ ⊂ N_p in ℂⁿ
there exists a chain of A-invariant subspaces M₁ ⊇ ⋯ ⊇ M_p such that Mⱼ is a direct complement to Nⱼ, j = 1, …, p.

Proof. Let x₁, x₂, …, xₙ be a basis in ℂⁿ consisting of eigenvectors of A. We show first of all that there exists a set of indices K₁ ⊆ {1, …, n} such that the subspace M₁ = Span{xᵢ | i ∈ K₁} is a direct complement to N₁ in ℂⁿ. Let i₁ be the first index such that x_{i₁} does not belong to N₁. If i₁ < i₂ < ⋯ < i_s (≤ n) are already chosen, let i_{s+1} be the first index such that x_{i_{s+1}} does not belong to Span{x_{i₁}, …, x_{i_s}} + N₁. This process will stop after t steps (say), when the equality Span{x_{i₁}, …, x_{i_t}} + N₁ = ℂⁿ is reached. Now one can put K₁ = {i₁, …, i_t} to ensure that Span{xᵢ | i ∈ K₁} is a direct complement to N₁.

By the same token, there is a set K₂ ⊆ K₁ such that M₂ ≝ Span{xᵢ | i ∈ K₂} is a direct complement to M₁ ∩ N₂ in M₁. As N₂ = (M₁ ∩ N₂) ∔ N₁, clearly M₂ is a direct complement to N₂ in ℂⁿ. Let M₃ = Span{xᵢ | i ∈ K₃}, where K₃ ⊆ K₂ and M₃ is a direct complement to M₂ ∩ N₃ in M₂, and so on. Clearly, all the subspaces Mⱼ are A invariant. □

In connection with Theorem 1.8.5 we emphasize that not every transformation has a basis of eigenvectors. Indeed, we have seen in Example 1.1.1 a transformation A with only one eigenvector (up to multiplication by a nonzero complex number); obviously, one cannot form a basis in ℂⁿ from the eigenvectors of A. Furthermore, the transformation A of Example 1.1.1 does not satisfy the conclusion of Theorem 1.8.5. We leave it to the reader to verify the following fact concerning this transformation A: for a chain N₁ ⊂ ⋯ ⊂ N_p of subspaces in ℂⁿ there is a chain M₁ ⊇ ⋯ ⊇ M_p of A-invariant subspaces such that Mⱼ ∔ Nⱼ = ℂⁿ, j = 1, …, p, if and only if each Nⱼ is spanned by vectors of the type e_n + f_n, e_{n−1} + f_{n−1}, …, e_{n−rⱼ+1} + f_{n−rⱼ+1}, where rⱼ = dim Nⱼ and the vectors f_n, f_{n−1}, …, f_{n−rⱼ+1} belong to Span{e₁, e₂, …, e_{n−rⱼ}}. (As usual, e_k stands for the kth unit coordinate vector in ℂⁿ.)

The converse of Theorem 1.8.5 is also true: if for every chain of subspaces N₁ ⊂ ⋯ ⊂ N_p in ℂⁿ there exists a chain of A-invariant subspaces M₁ ⊇ ⋯ ⊇ M_p such that Mⱼ is a direct complement to Nⱼ, j = 1, …, p, then there exists a basis of eigenvectors of A. However, a stronger statement holds.

Proposition 1.8.6

Let A: ℂⁿ → ℂⁿ be a transformation. If each subspace M ⊆ ℂⁿ has a complementary subspace that is A invariant, then there is a basis in ℂⁿ consisting of eigenvectors of A.

Proof. Let N₀ be the subspace spanned by all the eigenvectors of A. We have to prove that N₀ = ℂⁿ. Assume the contrary: N₀ ≠ ℂⁿ. The hypothesis
of the proposition implies that there is an A-invariant subspace M₀ that is a direct complement to N₀. Clearly, M₀ ≠ {0}. Hence there exists an eigenvector x₀ of A in M₀: Ax₀ = λ₀x₀, x₀ ≠ 0. Since x₀ ∉ N₀, we contradict the definition of N₀. □

1.9 TRIANGULAR MATRICES AND COMPLETE CHAINS OF INVARIANT SUBSPACES

The main result of this section is the following theorem on unitary triangularization of a transformation. It has important implications for the study of invariant subspaces. Recall that a transformation U: ℂⁿ → ℂⁿ is called unitary if it is invertible and U⁻¹ = U* or, equivalently, if (Ux, Uy) = (x, y) for all x, y ∈ ℂⁿ. Note that the seemingly weaker condition ‖Ux‖ = ‖x‖ for all x ∈ ℂⁿ is also sufficient to ensure that U is unitary. Note also that the product of two unitary transformations is again unitary, and so is the inverse of a unitary transformation.

It will be convenient to write linear transformations from ℂⁿ into ℂⁿ as n × n matrices with respect to the standard orthonormal basis e₁, …, eₙ in ℂⁿ. We shall use the fact that a matrix is unitary if and only if its columns form an orthonormal basis in ℂⁿ.

Theorem 1.9.1

For any n × n matrix A there exists a unitary matrix U such that

    T = U*AU = [t_{ij}]_{i,j=1}^n    (1.9.1)

is an upper triangular matrix, that is, t_{ij} = 0 for i > j, and the diagonal elements t₁₁, …, tₙₙ are just the eigenvalues of A.

Proof. Let λ₁ be an eigenvalue of A with an eigenvector x₁, and assume that ‖x₁‖ = 1. Let x₂, …, xₙ be vectors in ℂⁿ that, together with x₁, form an orthonormal basis for ℂⁿ. Then the matrix

    U₁ = [x₁ x₂ ⋯ xₙ]

is unitary. Write U₁ in the block matrix form U₁ = [x₁ V], where V = [x₂ ⋯ xₙ] is an n × (n − 1) matrix. Then, because of the orthonormality of x₁, …, xₙ, V*x₁ = 0. Now, using the relation Ax₁ = λ₁x₁, we obtain

    U₁*AU₁ = [ x₁* ] A[x₁ V] = [ x₁* ] [λ₁x₁  AV]
             [ V*  ]           [ V*  ]

           = [ λ₁‖x₁‖²  x₁*AV ]  =  [ λ₁  x₁*AV ]
             [ λ₁V*x₁   V*AV  ]     [ 0   V*AV  ]
Applying the same procedure to the (n − 1) × (n − 1) matrix A₂ ≝ V*AV, we find an (n − 1) × (n − 1) unitary matrix U₂ such that

    U₂*A₂U₂ = [ λ₂  *  ]
              [ 0   A₃ ]

for some eigenvalue λ₂ of A₂ and some (n − 2) × (n − 2) matrix A₃. Apply the same procedure to A₃, using a suitable (n − 2) × (n − 2) unitary matrix U₃, and so on. Then for the n × n unitary matrix

    U = U₁ [ 1  0  ] [ I₂  0  ] ⋯ [ I_{n−2}  0       ]
           [ 0  U₂ ] [ 0   U₃ ]   [ 0        U_{n−1} ]

the product U*AU is upper triangular. Finally, as U* = U⁻¹, we have

    det(λI − A) = det(λI − T) = (λ − t₁₁)⋯(λ − tₙₙ)

so that t₁₁, …, tₙₙ are the eigenvalues of A. □

Let T = U*AU be a triangular form of the matrix A as in Theorem 1.9.1. Then it follows from Proposition 1.8.4 that there is a maximal chain

    {0} ⊂ M₁ ⊂ ⋯ ⊂ M_{n−1} ⊂ Mₙ = ℂⁿ

in which all subspaces Mᵢ are T invariant. Then Proposition 1.4.2 shows that the maximal chain

    {0} ⊂ UM₁ ⊂ ⋯ ⊂ UM_{n−1} ⊂ UMₙ = ℂⁿ

consists of A-invariant subspaces (recall that A = UTU*). We have obtained the following fact.

Corollary 1.9.2

Any transformation (or n × n matrix) A: ℂⁿ → ℂⁿ has a maximal chain of A-invariant subspaces. In particular, for every i, 1 ≤ i ≤ n, there exists an i-dimensional A-invariant subspace.

In general, a complete chain of A-invariant subspaces is not unique. An extreme case of this situation is provided by A = αI, α ∈ ℂ. For such an A, every complete chain of subspaces is a complete chain of A-invariant subspaces. Clearly, there are many complete chains of subspaces in ℂⁿ (unless n = 1). Let us characterize the matrices A for which there is a unique complete chain of invariant subspaces.
Theorem 1.9.3

An n × n matrix A has a unique complete chain of invariant subspaces if and only if A has a unique eigenvector (up to multiplication by a scalar).

Proof. We have seen in the proof of Theorem 1.9.1 that for any eigenvector x of A the subspace Span{x} appears in some complete chain of A-invariant subspaces. So if the complete chain of invariant subspaces is unique, the matrix A has a unique eigenvector (up to multiplication by a scalar). The converse part of Theorem 1.9.3 will be proved later, using the Jordan normal form of a matrix (see Theorem 2.5.1). □

Theorem 1.9.1 has important consequences for normal transformations. A transformation A: ℂⁿ → ℂⁿ is called normal if AA* = A*A. Self-adjoint and unitary transformations are normal, of course, but there are also normal transformations that are neither self-adjoint nor unitary.

Theorem 1.9.4

A transformation A: ℂⁿ → ℂⁿ is normal if and only if there is an orthonormal basis in ℂⁿ consisting of eigenvectors of A.

Proof. Write A as an n × n matrix. Assuming that A is normal, the matrix T from (1.9.1) is easily seen to be normal as well:

    TT* = U*AUU*A*U = U*AA*U = U*A*AU = U*A*UU*AU = T*T

But T is upper triangular:

    T = [ t₁₁  t₁₂  t₁₃  ⋯  t₁ₙ ]
        [ 0    t₂₂  ⋯       t₂ₙ ]
        [ ⋮             ⋱    ⋮ ]
        [ 0    0    ⋯       tₙₙ ]

Hence the (1,1) entry of T*T is |t₁₁|², whereas this entry of TT* is |t₁₁|² + |t₁₂|² + ⋯ + |t₁ₙ|². As T*T = TT*, it follows that t₁₂ = ⋯ = t₁ₙ = 0. Comparing the (2,2) entries of T*T and TT*, we now find that t₂₃ = ⋯ = t₂ₙ = 0, and so on. It turns out that T is diagonal. Now Ue₁, …, Ueₙ is an orthonormal basis in ℂⁿ consisting of eigenvectors of A.

Conversely, assume that A has a set of eigenvectors f₁, …, fₙ that form an orthonormal basis in ℂⁿ. Then the matrix U = [f₁ f₂ ⋯ fₙ] is unitary and

    U*AUeᵢ = U*Afᵢ = λᵢU*fᵢ = λᵢeᵢ
where λᵢ is the eigenvalue of A corresponding to fᵢ. So

    T ≝ U*AU = diag[λ₁ λ₂ ⋯ λₙ]

As the diagonal matrix T is obviously normal, we find that A is normal as well. □

1.10 EXERCISES

1.1 Prove or disprove the following statements for any linear transformation A: ℂⁿ → ℂⁿ:
(a) Im A ∔ Ker A = ℂⁿ.
(b) Im A + Ker A = ℂⁿ (the sum not necessarily direct).
(c) Im A ∩ Ker A ≠ {0}.
(d) dim Im A + dim Ker A = n.
(e) Im A is the orthogonal complement of Ker A*.

1.2 Prove or disprove statements (d) and (e) in the preceding exercise for a transformation A: ℂᵐ → ℂⁿ, where m ≠ n.

1.3 Let A: ℂⁿ → ℂⁿ be the transformation given (in the standard orthonormal basis) by an upper triangular Toeplitz matrix

    A = [ a₀  a₁  ⋯  aₙ₋₁ ]
        [ 0   a₀  ⋱   ⋮  ]
        [ ⋮       ⋱   a₁ ]
        [ 0   0   ⋯   a₀ ]

where a₀, …, aₙ₋₁ are complex numbers. Find the subspaces Im A and Ker A.

1.4 Given A: ℂⁿ → ℂⁿ as in Example 1.1.3, identify the A-invariant subspaces Im Aᵏ and Ker Aᵏ, k = 0, 1, ….

1.5 Identify Im Aᵏ and Ker Aᵏ, k = 0, 1, …, where

    A = [ a₀    0   ⋯   0  ]
        [ a₁    a₀  ⋱   ⋮  ]
        [ ⋮        ⋱    0  ]
        [ aₙ₋₁  ⋯   a₁  a₀ ]

is given by a lower triangular Toeplitz matrix.

1.6 Find all one-dimensional invariant subspaces of the following transformations (written as matrices with respect to the standard orthonormal basis):

    [ -2  0  0 ]    [ -1  0  0 ]    [ 2  0   0 ]
    [  1 -1  2 ] ,  [  2  1  0 ] ,  [ 1  3  -1 ]
    [  1  0  1 ]    [ -2  0  1 ]    [ 0  1   1 ]
1.7 In the preceding exercise, which transformations have a Jordan chain consisting of more than one vector? Find these Jordan chains.

1.8 Show that all invariant subspaces of the projector P on the subspace N along the subspace M are of the form M₁ ∔ N₁, where M₁ (resp. N₁) is a subspace of M (resp. N). Find the lattice Inv P*.

1.9 Given P as in Exercise 1.8, find all the invariant subspaces of α₁P + α₂(I − P), where α₁ and α₂ are complex numbers.

1.10 Let A: ℂⁿ → ℂⁿ be a transformation with A² = I. Show that Im(I + A) and Im(I − A) are the subspaces consisting of the zero vector and all eigenvectors of A corresponding to the eigenvalues 1 and −1, respectively.

1.11 Find all invariant subspaces of a transformation A: ℂⁿ → ℂⁿ such that A² = I.

1.12 Let

    A = [ 0  I ]  : ℂ²ⁿ → ℂ²ⁿ
        [ I  0 ]

(a) Show that A is similar to

    [ I  0  ]
    [ 0  −I ]

(b) Find all invariant subspaces of A.

1.13 Let

    A = [ 0    a₂I  0   ⋯  0    ]
        [ 0    0    a₃I ⋯  0    ]
        [ ⋮              ⋱  ⋮   ]  : ℂⁿ ⊕ ⋯ ⊕ ℂⁿ → ℂⁿ ⊕ ⋯ ⊕ ℂⁿ
        [ 0    0    0   ⋯  a_kI ]
        [ a₁I  0    0   ⋯  0    ]

Show that A is similar to a matrix of the type

    [ β₁I  0    ⋯  0    ]
    [ 0    β₂I  ⋯  0    ]
    [ ⋮         ⋱   ⋮   ]
    [ 0    0    ⋯  β_kI ]

and find the lattice Inv(A). What are the invariant subspaces of A*?

1.14 Let

    Q = [ 0  1  0  ⋯  0 ]
        [ 0  0  1  ⋯  0 ]
        [ ⋮           ⋮ ]  : ℂⁿ → ℂⁿ
        [ 0  0  0  ⋯  1 ]
        [ 1  0  0  ⋯  0 ]
Prove that the eigenvalues of Q are cos(2πk/n) + i sin(2πk/n), k = 0, 1, …, n − 1. Find the corresponding eigenvectors.

1.15 Show that the transformation

    [ a  1  0  ⋯  0 ]
    [ 1  a  1  ⋯  0 ]
    [ 0  1  a  ⋱  ⋮ ]  : ℂⁿ → ℂⁿ
    [ ⋮     ⋱  ⋱  1 ]
    [ 0  ⋯  0  1  a ]

has eigenvectors x_p whose jth coordinate is sin{jpπ/(n + 1)}, p = 1, …, n (independently of a). What are the corresponding eigenvalues?

1.16 Let

    A = [ 0   1   0   ⋯  0    ]
        [ 0   0   1   ⋯  0    ]
        [ ⋮              ⋮    ]  : ℂⁿ → ℂⁿ
        [ 0   0   0   ⋯  1    ]
        [ a₀  a₁  a₂  ⋯  aₙ₋₁ ]

where a₀, …, aₙ₋₁ are complex numbers. Show that λ₀ is an eigenvalue of A if and only if λ₀ is a zero of the equation

    λⁿ − aₙ₋₁λⁿ⁻¹ − ⋯ − a₁λ − a₀ = 0

1.17 Let

    A = [ 0       1       0   ⋯  0         ]
        [ 0       0       1   ⋯  0         ]
        [ ⋮                  ⋱   ⋮         ]  : ℂⁿ → ℂⁿ
        [ 0       0       0   ⋯  1         ]
        [ −C(n,0) −C(n,1)  ⋯     −C(n,n−1) ]

where C(n,j) denotes the binomial coefficient.

(a) Find all eigenvalues and eigenvectors of A.
(b) Find a longest Jordan chain.
(c) Show that A is similar to a matrix of the form

    [ λ₀  1          ]
    [     λ₀  ⋱      ]
    [         ⋱   1  ]  : ℂⁿ → ℂⁿ
    [             λ₀ ]

and find the similarity matrix.
(d) Find the lattice Inv(A) of all invariant subspaces of A.
(e) Find all invariant subspaces of the transposed matrix Aᵀ.

1.18 Let A: ℂⁿ → ℂⁿ be a transformation represented as a matrix in the standard orthonormal basis. Show that all invariant subspaces of the transposed matrix Aᵀ are given by the formula Span{x̄₁, …, x̄_k}, where x₁, …, x_k is a basis in the orthogonal complement of some A-invariant subspace, and for a vector y = (y₁, …, yₙ) ∈ ℂⁿ we denote ȳ = (ȳ₁, …, ȳₙ).

1.19 Prove a generalization of Proposition 1.4.4: if A: ℂⁿ → ℂⁿ is a transformation and M, N are subspaces in ℂⁿ, then AM ⊆ N holds if and only if A*N⊥ ⊆ M⊥.

1.20 Give an example of a transformation A: ℂⁿ → ℂⁿ that is not self-adjoint but nevertheless AM⊥ ⊆ M⊥ for every A-invariant subspace M.

1.21 Let

    M₁ = { [ x ] ∈ ℂ²ⁿ | x ∈ ℂⁿ } ,  M₂ = { [  x ] ∈ ℂ²ⁿ | x ∈ ℂⁿ }
         { [ x ]                }       { [ −x ]                }

Find the angular transformations of M₁ and M₂ with respect to the projector on ℂⁿ ⊕ {0} along {0} ⊕ ℂⁿ.

1.22 Find at least one solution of the quadratic equation

    RT₁₂R + RT₁₁ − T₂₂R − T₂₁ = 0

where
(a) the Tᵢⱼ are the 2 × 2 matrices

    T₁₁ = [ 0  0 ] ,  T₂₂ = [ 0  0 ] ,  T₁₂ = T₂₁ = 0 ,  λ ∈ ℂ
          [ 1  0 ]          [ 0  λ ]

(b) the Tᵢⱼ are n × n diagonal matrices.
(c) the Tᵢⱼ are n × n circulant matrices.

1.23 Prove that x₁ + M, …, x_k + M is a basis in ℂⁿ/M (where x₁, …, x_k ∈ ℂⁿ) if and only if for some basis y₁, …, y_p in M the vectors x₁, …, x_k, y₁, …, y_p form a basis in ℂⁿ.

1.24 Let A = diag[a₁, …, aₙ] : ℂⁿ → ℂⁿ, where the numbers a₁, …, aₙ are distinct. Show that for any A-invariant subspace M the induced transformation Ā: ℂⁿ/M → ℂⁿ/M can also be written in the form diag[b₁, …, b_k] in some basis in ℂⁿ/M.
1.25 Find all the induced transformations Ā: ℂⁿ/M → ℂⁿ/M, where

    A = [ λ₀  1   0  ⋯  0  ]
        [ 0   λ₀  1  ⋯  0  ]
        [ ⋮          ⋱  ⋮  ]
        [ 0   0   0  ⋯  1  ]
        [ 0   0   0  ⋯  λ₀ ]

and M is any A-invariant subspace.

1.26 Show that if P is a projector on ℂⁿ and P̄ is the induced transformation on ℂⁿ/M, where M is a P-invariant subspace, then P̄ is a projector as well. Find Im P̄ and Ker P̄.

1.27 Let

    A = [ 1  0  3 ]
        [ 0  1  4 ]
        [ 0  0  2 ]

be in a triangular form. Show that

    [ 0  1  0 ]⁻¹ [ 1  0  3 ] [ 0  1  0 ]
    [ 1  0  0 ]   [ 0  1  4 ] [ 1  0  0 ]
    [ 0  0  1 ]   [ 0  0  2 ] [ 0  0  1 ]

is also in a triangular form. Hence the triangular form of a matrix is not unique, in general.

1.28 Find complete chains of invariant subspaces for the transformations given in Exercise 1.6. Check for uniqueness in each case.

1.29 Given a transformation in the matrix form

    A = [ 0    0    0 ]
        [ a₂₁  0    0 ]
        [ a₃₁  a₃₂  0 ]

with respect to the basis e₁, e₂, e₃, find a complete chain of A-invariant subspaces. Find a basis in which A has the upper triangular form.

1.30 Let A: ℂ²ⁿ → ℂ²ⁿ be a transformation. Prove that there exists an orthonormal basis in ℂ²ⁿ such that, with respect to this basis, A has the representation

    [ A₁₁  A₁₂ ]
    [ A₂₁  A₂₂ ]

where, for each i and j, A_{ij} is an upper triangular matrix.
Chapter Two

The Jordan Form and Invariant Subspaces

We have seen in Section 1.4 and Proposition 1.8.2 that there is a strong relationship between the lattices of invariant subspaces of similar transformations, namely, S(Inv(A)) = Inv(SAS⁻¹) for any two transformations A and S from ℂⁿ into ℂⁿ with S invertible. Thus, for the study of invariant subspaces, it is desirable to use similarity transformations to reduce a given transformation to the simplest form, in the hope that the lattice of invariant subspaces for the simplest form would be more transparent than that for the original transformation. The "simplest form" here is the Jordan form. It is obtained in this chapter and used to study some properties of invariant subspaces. Special insights are obtained into the structure of invariant subspaces and are exploited throughout the book. We examine irreducible invariant subspaces, generators of invariant subspaces, maximal and minimal invariant subspaces, and invariant subspaces of functions of transformations. An interesting class of subspaces, which we call "marked," is introduced and studied in Section 2.9.

All the subject matter here is well known, although this exposition may be unusual in matters of emphasis and detail that will be useful subsequently.

2.1 ROOT SUBSPACES

In this section we introduce the root subspaces of a transformation. The study of these subspaces is the first step towards an understanding of the Jordan form. At the same time, it will be seen that the root subspaces are important examples of invariant subspaces that can be described in terms of Jordan chains.
We consider now some ideas leading up to the definition of root subspaces. Let A: ℂⁿ → ℂⁿ be a transformation and let λ₀ be an eigenvalue of A. Consider the subspaces Ker(A − λ₀I)ⁱ, i = 1, 2, …. For i = 1 the subspace Ker(A − λ₀I) ≠ {0} is just the subspace spanned by the eigenvectors of A corresponding to λ₀. As (A − λ₀I)ⁱx = 0 implies (A − λ₀I)ⁱ⁺¹x = 0, we have

    Ker(A − λ₀I) ⊆ Ker(A − λ₀I)² ⊆ ⋯ ⊆ Ker(A − λ₀I)ⁱ ⊆ Ker(A − λ₀I)ⁱ⁺¹ ⊆ ⋯    (2.1.1)

Consequently, Ker(A − λ₀I)ⁱ⁺¹ ≠ Ker(A − λ₀I)ⁱ if and only if dim Ker(A − λ₀I)ⁱ⁺¹ > dim Ker(A − λ₀I)ⁱ. Since the dimensions of the subspaces Ker(A − λ₀I)ⁱ, i = 1, 2, …, are bounded above by n, there exists a minimal integer p ≥ 1 such that Ker(A − λ₀I)ⁱ = Ker(A − λ₀I)ᵖ for all integers i ≥ p. The subspace Ker(A − λ₀I)ᵖ is called the root subspace of A corresponding to λ₀ and is denoted ℛ_{λ₀}(A). In other words, ℛ_{λ₀}(A) consists of all vectors x ∈ ℂⁿ such that (A − λ₀I)^q x = 0 for some integer q ≥ 1. (This integer may depend on x.) Because

    A(A − λ₀I)ⁱ = (A − λ₀I)ⁱA ,  i = 1, 2, …

all subspaces in (2.1.1) are A invariant. In particular, the root subspace ℛ_{λ₀}(A) is A invariant. By definition, ℛ_{λ₀}(A) = Ker(A − λ₀I)ᵖ is the biggest subspace in the chain (2.1.1). We see later that, in fact, p is the minimal integer t ≥ 1 for which the equality Ker(A − λ₀I)ᵗ = Ker(A − λ₀I)ᵗ⁺¹ holds, and that p ≤ n. Hence we also have

    ℛ_{λ₀}(A) = {x ∈ ℂⁿ | (A − λ₀I)ⁿ x = 0}

The nesting of the kernels in (2.1.1) has a dual in the (descending) nesting of images:

    Im(A − λ₀I) ⊇ Im(A − λ₀I)² ⊇ ⋯ ⊇ Im(A − λ₀I)ⁱ ⊇ ⋯

But these sequences of inclusions are coupled by the fact that, for any integer i ≥ 0,

    dim Ker(A − λ₀I)ⁱ + dim Im(A − λ₀I)ⁱ = n

Consequently, if p is the least integer for which Ker(A − λ₀I)^{p+1} = Ker(A − λ₀I)ᵖ, it is also the least integer for which Im(A − λ₀I)^{p+1} = Im(A − λ₀I)ᵖ.
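The chain (2.1.1) is easy to compute numerically by tracking the dimensions dim Ker(A − λ₀I)ⁱ. The following is a small sketch in Python with NumPy; the function name, the rank tolerance, and the test matrix are our own choices, and λ₀ is assumed to be an eigenvalue of A.

```python
import numpy as np

def root_subspace(A, lam, tol=1e-8):
    """Track d_i = dim Ker(A - lam I)^i until the chain (2.1.1) stabilizes.
    Returns (p, dim R_lam(A), [d_1, d_2, ...]); lam must be an eigenvalue."""
    n = A.shape[0]
    B = A - lam * np.eye(n)
    dims, P = [], np.eye(n)
    for _ in range(n + 1):
        P = P @ B
        dims.append(n - np.linalg.matrix_rank(P, tol))
        if len(dims) > 1 and dims[-1] == dims[-2]:
            break
    return len(dims) - 1, dims[-1], dims

# Two Jordan blocks with eigenvalue 2, of sizes 2 and 1: p = 2, dim = 3.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
print(root_subspace(A, 2.0))   # (2, 3, [2, 3, 3])
```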
Proposition 2.1.1

The root subspace ℛ_{λ₀}(A) contains the vectors from any Jordan chain of A corresponding to λ₀.

Proof. Let x₀, …, x_k be a Jordan chain of A corresponding to λ₀. Then

    (A − λ₀I)^{k+1} x_k = (A − λ₀I)^k (A − λ₀I)x_k = (A − λ₀I)^k x_{k−1} = (A − λ₀I)^{k−1} x_{k−2} = ⋯ = (A − λ₀I)x₀ = 0

Hence all the vectors xᵢ (i = 0, …, k) belong to ℛ_{λ₀}(A). □

Let us look at the simplest examples. For

    A = [ λ₀  1  ]
        [ 0   λ₀ ]

as well as for A = λ₀I, the only eigenvalue is λ₀, and the corresponding root subspace ℛ_{λ₀}(A) is the whole space. If

    A = diag[λ₁, λ₂, …, λₙ] ,  λᵢ ≠ λⱼ for i ≠ j

then the root subspace ℛ_{λᵢ}(A) is one-dimensional and is spanned by eᵢ, for i = 1, 2, …, n.

Later, we also use the following fact: if A, S: ℂⁿ → ℂⁿ are transformations with S invertible, then

    ℛ_{λ₀}(SAS⁻¹) = S[ℛ_{λ₀}(A)]    (2.1.2)

for every eigenvalue λ₀ of A. An analogous property holds also for every member of the chain (2.1.1). The proof of equation (2.1.2) follows the same lines as the proof of Proposition 1.4.3.

The following property of root subspaces is crucial.

Theorem 2.1.2

Let λ₁, …, λ_r be all the different eigenvalues of a transformation A: ℂⁿ → ℂⁿ. Then ℂⁿ decomposes into the direct sum

    ℂⁿ = ℛ_{λ₁}(A) ∔ ⋯ ∔ ℛ_{λ_r}(A)

We need some preparation to prove this theorem.
Lemma 2.1.3

For every eigenvalue λ₀ of A, the restriction A|_{ℛ_{λ₀}(A)} has the sole eigenvalue λ₀.

Proof. Let B = A|_{ℛ_{λ₀}(A)}. We shall show that for every λ₁ ≠ λ₀ the transformation λ₁I − B on ℛ_{λ₀}(A) is invertible. Let q be an integer such that

    ℛ_{λ₀}(A) = Ker(λ₀I − A)^q

Then clearly

    (λ₁ − λ₀)^q I = (λ₁ − λ₀)^q I − (B − λ₀I)^q    (2.1.3)

Since this implies that

    (λ₁ − λ₀)^q I = (λ₁I − B)((λ₁ − λ₀)^{q−1} I + (λ₁ − λ₀)^{q−2}(B − λ₀I) + ⋯ + (B − λ₀I)^{q−1})

and since λ₁ ≠ λ₀, the invertibility of λ₁I − B follows. □

Lemma 2.1.4

Given a transformation A: ℂⁿ → ℂⁿ with an eigenvalue λ₀, let q be a positive integer for which

    Ker(A − λ₀I)^q = ℛ_{λ₀}(A)    (2.1.4)

Then the subspaces Ker(A − λ₀I)^q and Im(A − λ₀I)^q are direct complements of each other in ℂⁿ.

Proof. Since

    dim Ker(A − λ₀I)^q + dim Im(A − λ₀I)^q = n

we have only to check that

    Ker(A − λ₀I)^q ∩ Im(A − λ₀I)^q = {0}    (2.1.5)

Arguing by contradiction, assume that there is an x ≠ 0 in the left-hand side of equation (2.1.5). Then x = (A − λ₀I)^q y for some y. On the other hand, for some integer r ≥ 1 we have

    (A − λ₀I)^r x = 0 ,  and  (A − λ₀I)^{r−1} x ≠ 0
It follows that

    (A − λ₀I)^{q+r} y = 0 ,  and  (A − λ₀I)^{q+r−1} y ≠ 0

Hence

    Ker(A − λ₀I)^{q+r} ≠ Ker(A − λ₀I)^{q+r−1}

a contradiction with (2.1.4) and the definition of a root subspace. □

Proof of Theorem 2.1.2. Let λ₁ be an eigenvalue of A. Lemma 2.1.4 shows that

    Ker(A − λ₁I)^q ∔ Im(A − λ₁I)^q = ℂⁿ

where q is some positive integer for which

    Ker(A − λ₁I)^q = ℛ_{λ₁}(A)

By Lemma 2.1.3, the restriction of A to Ker(A − λ₁I)^q has the sole eigenvalue λ₁. On the other hand, λ₁ is not an eigenvalue of the restriction of A to Im(A − λ₁I)^q. To see this, observe that we also have

    Im(A − λ₁I)^{q+1} = Im(A − λ₁I)^q

Hence A − λ₁I maps Im(A − λ₁I)^q onto itself. It follows that λ₁ is not an eigenvalue of the restriction of A to the A-invariant subspace Im(A − λ₁I)^q. So the restrictions of A to the subspaces Ker(A − λ₁I)^q = ℛ_{λ₁}(A) and L = Im(A − λ₁I)^q have no common eigenvalues. This property is easily seen to imply that, for any eigenvalue λ₂ of A|_L,

    ℛ_{λ₂}(A) = ℛ_{λ₂}(A|_L)

So we can repeat the previous argument with A replaced by A|_L and with λ₁ replaced by an eigenvalue λ₂ of A|_L, to show that

    ℛ_{λ₁}(A) ∔ ℛ_{λ₂}(A) ∔ M = ℂⁿ

for some A-invariant subspace M such that λ₁ and λ₂ are not eigenvalues of A|_M. Continuing this process, we eventually prove Theorem 2.1.2. □

Another approach to the proof of Theorem 2.1.2 is based on the fact that if q₁(λ), …, q_r(λ) are polynomials (with complex coefficients) with no common zeros, there exist polynomials p₁(λ), …, p_r(λ) such that
    p₁(λ)q₁(λ) + ⋯ + p_r(λ)q_r(λ) ≡ 1    (2.1.6)

(This is easily proved by induction on r, using the Euclidean algorithm for the case r = 2.) Now let the characteristic polynomial φ_A(λ) = det(λI − A) be factorized in the form

    φ_A(λ) = ∏_{i=1}^r (λ − λᵢ)^{νᵢ}

where λ₁, …, λ_r are different complex numbers (and are, of course, just the eigenvalues of A) and ν₁, …, ν_r are positive integers. Define

    qⱼ(λ) = ∏_{i=1, i≠j}^r (λ − λᵢ)^{νᵢ}

for j = 1, …, r. Using the fact that φ_A(A) = 0 (the Cayley-Hamilton theorem), one verifies that actually

    ℛ_{λⱼ}(A) = Im qⱼ(A)    (2.1.7)

for j = 1, …, r. Finally, take advantage of the existence of polynomials p₁(λ), …, p_r(λ) such that equality (2.1.6) holds, and use equation (2.1.7), to prove Theorem 2.1.2. This approach can be used to prove results analogous to Theorem 2.1.2 for matrices over fields other than ℂ.

Now let M be an A-invariant subspace. Consider the restriction A|_M as a linear transformation from M into M, and note that

    ℛ_{λ₀}(A|_M) = {x ∈ M | (A|_M − λ₀I)^q x = 0 for some q ≥ 1} = M ∩ ℛ_{λ₀}(A)

for every λ₀ that is an eigenvalue of A|_M. If λ₀ is an eigenvalue of A but not an eigenvalue of A|_M, then ℛ_{λ₀}(A|_M) = {0}; but also M ∩ ℛ_{λ₀}(A) = {0}. So the equality ℛ_{λ₀}(A|_M) = M ∩ ℛ_{λ₀}(A) holds for any λ₀ ∈ σ(A). Applying Theorem 2.1.2 to the linear transformation A|_M and using the above remark, we obtain the following result.

Theorem 2.1.5

Let A: ℂⁿ → ℂⁿ be a transformation, and let M be an A-invariant subspace. Then M decomposes into the direct sum

    M = M ∩ ℛ_{λ₁}(A) ∔ ⋯ ∔ M ∩ ℛ_{λ_r}(A)

where λ₁, …, λ_r are all the different eigenvalues of A.
Note that Theorem 2.1.2 is actually the particular case of Theorem 2.1.5 with M = ℂⁿ.

We consider now some examples in which Theorem 2.1.5 allows us to find all invariant subspaces of a given linear transformation.

EXAMPLE 2.1.1. Let

    A = diag[λ₁, λ₂, …, λₙ]

where λ₁, …, λₙ are different complex numbers (as in Example 1.1.3). Then σ(A) = {λ₁, …, λₙ}, and

    ℛ_{λᵢ}(A) = Span{eᵢ} ,  i = 1, 2, …, n

By Theorem 2.1.5, any A-invariant subspace M is a direct sum

    M = (M ∩ Span{e₁}) ∔ ⋯ ∔ (M ∩ Span{eₙ})

As M ∩ Span{eᵢ} is either {0} or Span{eᵢ}, it follows that any A-invariant subspace is of the form

    M = Span{e_{i₁}} ∔ ⋯ ∔ Span{e_{i_q}} = Span{e_{i₁}, …, e_{i_q}}

for some indices 1 ≤ i₁ < i₂ < ⋯ < i_q ≤ n. This fact was stated without proof in Example 1.1.3. □

EXAMPLE 2.1.2. Let

    A = [ λ₁  1   0   0  ]
        [ 0   λ₁  0   0  ]  : ℂ⁴ → ℂ⁴
        [ 0   0   λ₂  0  ]
        [ 0   0   0   λ₂ ]

where λ₁ and λ₂ are different complex numbers. The matrix A has the eigenvalues λ₁ and λ₂. Further,

    A − λ₁I = [ 0  1  0        0       ]
              [ 0  0  0        0       ]
              [ 0  0  λ₂ − λ₁  0       ]
              [ 0  0  0        λ₂ − λ₁ ]

and thus

    Ker(A − λ₁I)^j = Span{e₁}  if j = 1 ;  Ker(A − λ₁I)^j = Span{e₁, e₂}  if j > 1

So ℛ_{λ₁}(A) = Span{e₁, e₂}. For the eigenvalue λ₂ we have ℛ_{λ₂}(A) = Span{e₃, e₄}.
We see (as Theorem 2.1.2 leads us to expect) that ℂ⁴ is a direct (even orthogonal) sum of ℛ_{λ₁}(A) and ℛ_{λ₂}(A).

Let M be any A-invariant subspace. By Theorem 2.1.5, we obtain

    M = M ∩ Span{e₁, e₂} ∔ M ∩ Span{e₃, e₄}

It is easily seen (cf. Example 1.1.1) that the only A-invariant subspaces in Span{e₁, e₂} are {0}, Span{e₁}, and Span{e₁, e₂}. On the other hand, any subspace of Span{e₃, e₄} is A invariant. One can easily describe all subspaces of Span{e₃, e₄} as follows: {0}; the one-dimensional subspaces Span{e₃ + αe₄}, where α ∈ ℂ is fixed for each particular subspace; the one-dimensional subspace Span{e₄}; and Span{e₃, e₄}. Finally, the following is a complete list of A-invariant subspaces:

    {0} ,  Span{e₁} ,  Span{e₁, e₂}
    Span{e₃ + αe₄}  for a fixed α ∈ ℂ
    Span{e₁, e₃ + αe₄}  for a fixed α ∈ ℂ
    Span{e₁, e₂, e₃ + αe₄}  for a fixed α ∈ ℂ
    Span{e₄} ,  Span{e₁, e₄} ,  Span{e₁, e₂, e₄}
    Span{e₃, e₄} ,  Span{e₁, e₃, e₄} ,  ℂ⁴ .  □

2.2 THE JORDAN FORM AND PARTIAL MULTIPLICITIES

Let A be an n × n matrix. In this section we state one of the most important results in linear algebra: the canonical form of a matrix A under similarity transformations A → S⁻¹AS, where S is an invertible n × n matrix. We start with some notation. The Jordan block of size k × k with eigenvalue λ₀ is the matrix

    J_k(λ₀) = [ λ₀  1   0  ⋯  0  ]
              [ 0   λ₀  1  ⋯  0  ]
              [ ⋮          ⋱  ⋮  ]
              [ 0   0   0  ⋯  1  ]
              [ 0   0   0  ⋯  λ₀ ]

Clearly, det(λI − J_k(λ₀)) = (λ − λ₀)^k, so λ₀ is the only eigenvalue of J_k(λ₀). Further,

    λ₀I − J_k(λ₀) = [ 0  −1  0   ⋯  0  ]
                    [ 0  0   −1  ⋯  0  ]
                    [ ⋮           ⋱  ⋮ ]
                    [ 0  0   0   ⋯  −1 ]
                    [ 0  0   0   ⋯  0  ]
so the only eigenvector of J_k(λ₀) (up to multiplication by a nonzero complex number) is e₁. The invariant subspaces of J_k(λ₀) were described in Example 1.1.1; they form a complete chain of subspaces in ℂᵏ:

    Span{e₁} ⊂ Span{e₁, e₂} ⊂ ⋯ ⊂ Span{e₁, e₂, …, e_{k−1}} ⊂ ℂᵏ

It turns out that a similarity transformation can always be found that transforms a matrix into a direct sum of Jordan blocks.

Theorem 2.2.1

Let A be an n × n (complex) matrix. Then there exists an invertible matrix S such that S⁻¹AS is a direct sum of Jordan blocks:

    S⁻¹AS = J_{k₁}(λ₁) ⊕ ⋯ ⊕ J_{k_p}(λ_p)    (2.2.1)

The Jordan blocks J_{kⱼ}(λⱼ) in the representation (2.2.1) are uniquely determined by the matrix A (up to permutation) and do not depend on the choice of S.

Since the eigenvalues of a matrix are invariant under similarity, it is clear that the numbers λ₁, …, λ_p are the eigenvalues of A. Note that they are not necessarily distinct. We stress that this result holds only for complex matrices. For real matrices there is also a canonical form under similarity with a real similarity matrix. This canonical form is dealt with in Chapter 12.

The right-hand side of equality (2.2.1) is called a Jordan form of the matrix (or the linear transformation) A. For a given eigenvalue λ₀ of A, let J_{k_{i₁}}(λ_{i₁}), …, J_{k_{i_m}}(λ_{i_m}) be all the Jordan blocks in the Jordan form of A for which λ_{i_q} = λ₀, q = 1, …, m. The positive integer m is called the geometric multiplicity of λ₀ as an eigenvalue of A, and the integers k_{i₁}, …, k_{i_m} are called the partial multiplicities of λ₀. So the number of partial multiplicities of λ₀ as an eigenvalue of A coincides with the geometric multiplicity of λ₀. In view of Theorem 2.2.1, the geometric multiplicity and the partial multiplicities depend on A and λ₀ only and do not depend on the choice of the invertible matrix S for which (2.2.1) holds. The sum k_{i₁} + ⋯ + k_{i_m} of the partial multiplicities of λ₀ is called the algebraic multiplicity of λ₀ (as an eigenvalue of A). Obviously, the algebraic multiplicity of λ₀ is not less than its geometric multiplicity.

The following property of the partial multiplicities will be useful in the sequel.

Corollary 2.2.2

If A₁ and A₂ are n₁ × n₁ and n₂ × n₂ matrices with the partial multiplicities k₁(A₁), …, k_{m₁}(A₁) and k₁(A₂), …, k_{m₂}(A₂) of A₁ and A₂, respectively,
all corresponding to the common eigenvalue λ₀, then k₁(A₁), …, k_{m₁}(A₁), k₁(A₂), …, k_{m₂}(A₂) are the partial multiplicities of the matrix

    [ A₁  0  ]
    [ 0   A₂ ]

corresponding to λ₀. In particular, the geometric (resp. algebraic) multiplicity of

    [ A₁  0  ]
    [ 0   A₂ ]

at λ₀ is the sum of the geometric (resp. algebraic) multiplicities of A₁ and A₂ at λ₀.

The proof of this corollary is immediate if one observes that the Jordan form of

    [ A₁  0  ]
    [ 0   A₂ ]

can be obtained as a direct sum of the Jordan forms of A₁ and A₂.

We also need the following property of partial multiplicities.

Corollary 2.2.3

The partial multiplicities of A at λ₀ coincide with the partial multiplicities of the conjugate transpose matrix A* at λ̄₀.

Proof. Write A = SJS⁻¹, where J is the Jordan form of A and S is a nonsingular matrix. Then A* = S⁻¹*J*S*. Now the conjugate transpose J* of the matrix J is similar to the matrix J̄ that is obtained from J by replacing each entry by its complex conjugate. Indeed, if we define the permutation ("rotation") matrix R with elements r_{ij} defined in terms of the Kronecker delta by r_{ij} = δ_{i,n+1−j}, then it is easily verified that R⁻¹ = R and

    R J_k(λ)* R = J_k(λ̄)

Hence J̄ is the Jordan form of A*, and Corollary 2.2.3 follows from the definition of partial multiplicities. □

To describe the result of Theorem 2.2.1 in terms of linear transformations, let us introduce the following definition. An A-invariant subspace M is called a Jordan subspace corresponding to the eigenvalue λ₀ of A if M is spanned by the vectors of some Jordan chain of A corresponding to λ₀.
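For a single Jordan block, the standard unit vectors form such a chain, and the defining relations Ax₀ = λ₀x₀, Axᵢ = λ₀xᵢ + x_{i−1} are easy to verify numerically. The following is a small sketch, assuming Python with NumPy; the 3 × 3 block J₃(5) is our illustrative choice.

```python
import numpy as np

# A Jordan chain x0, x1, x2 for J_3(5): (A - 5I)x0 = 0, (A - 5I)x_i = x_{i-1}.
A = np.array([[5.0, 1.0, 0.0],
              [0.0, 5.0, 1.0],
              [0.0, 0.0, 5.0]])
B = A - 5.0 * np.eye(3)

x0, x1, x2 = np.eye(3)            # the standard unit vectors e1, e2, e3
print(np.allclose(B @ x0, 0))     # True: x0 is an eigenvector
print(np.allclose(B @ x1, x0))    # True
print(np.allclose(B @ x2, x1))    # True: e1, e2, e3 is a Jordan chain
```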
Theorem 2.2.4

Let A: ℂⁿ → ℂⁿ be a linear transformation. Then there exists a direct sum decomposition

    ℂⁿ = M₁ ∔ ⋯ ∔ M_p    (2.2.2)

where Mᵢ is a Jordan subspace of A corresponding to an eigenvalue λᵢ (here λ₁, …, λ_p are not necessarily different). If ℂⁿ = N₁ ∔ ⋯ ∔ N_q is another direct sum decomposition with Jordan subspaces Nᵢ corresponding to eigenvalues μᵢ, i = 1, …, q, then q = p, and (possibly after a permutation of N₁, …, N_q) dim Nᵢ = dim Mᵢ and λᵢ = μᵢ for i = 1, …, q.

Note that in general the decomposition (2.2.2) is not unique. For example, if A = I, then one can take Mᵢ = Span{xᵢ}, where x₁, …, xₙ is any basis in ℂⁿ.

Theorem 2.2.1 follows easily from Theorem 2.2.4 and vice versa. Indeed, let S be as in Theorem 2.2.1. Then put

    M₁ = S(Span{e₁, …, e_{k₁}})
    M₂ = S(Span{e_{k₁+1}, e_{k₁+2}, …, e_{k₁+k₂}})
    ⋮
    M_p = S(Span{e_{k₁+⋯+k_{p−1}+1}, …, e_{k₁+⋯+k_p}})

to satisfy equality (2.2.2). Conversely, if the Mᵢ are as in (2.2.2), choose a basis x₁⁽ⁱ⁾, …, x_{kᵢ}⁽ⁱ⁾ in Mᵢ whose vectors form a Jordan chain for A. Then put

    S = [x₁⁽¹⁾ x₂⁽¹⁾ ⋯ x_{k₁}⁽¹⁾ x₁⁽²⁾ ⋯ x_{k₂}⁽²⁾ ⋯ x₁⁽ᵖ⁾ ⋯ x_{k_p}⁽ᵖ⁾]

The direct sum decomposition (2.2.2) ensures that S is an n × n nonsingular matrix, and the definition of a Jordan chain ensures that S⁻¹AS has the form (2.2.1).

Theorem 2.2.1 (or Theorem 2.2.4) is proved in the next section. Note that because of Theorem 2.1.2 one has to prove Theorem 2.2.1 only for the case when ℛ_{λ₀}(A) = ℂⁿ, that is, when A has only one eigenvalue λ₀. In this sense the property of root subspaces described in Theorem 2.1.2 is the first step toward a proof of the Jordan form.

In view of Proposition 1.4.2, there are many cases in which the Jordan form allows us to reduce the consideration of invariant subspaces of a general linear transformation to the consideration of invariant subspaces of a linear transformation that is given by the Jordan normal form in the standard orthonormal basis. This reduction is used many times in the sequel. As a first example of such a reduction we note the following simple fact.
Proposition 2.2.5
Let $A: \mathbb{C}^n \to \mathbb{C}^n$ be a linear transformation. Then the geometric multiplicity of any $\lambda_0 \in \sigma(A)$ coincides with $\dim \operatorname{Ker}(A - \lambda_0 I)$, and the algebraic multiplicity of $\lambda_0$ coincides with the dimension of $\mathcal{R}_{\lambda_0}(A)$, the root subspace of $\lambda_0$ [i.e., with the dimension of $\operatorname{Ker}(A - \lambda_0 I)^n$].

Proof. By (2.1.2) and Theorem 2.2.1 we can assume without loss of generality that

$$A = J_{k_1}(\lambda_1) \oplus \cdots \oplus J_{k_p}(\lambda_p)$$

Then for any $\lambda_0 \in \mathbb{C}$ we have

$$A - \lambda_0 I = J_{k_1}(\lambda_1 - \lambda_0) \oplus \cdots \oplus J_{k_p}(\lambda_p - \lambda_0)$$

From the definition of the Jordan block it is easily seen that

$$\operatorname{Ker} J_{k_i}(\lambda_i - \lambda_0) = \begin{cases} \{0\}, & \text{if } \lambda_0 \neq \lambda_i \\ \operatorname{Span}\{e_1\}, & \text{if } \lambda_0 = \lambda_i \end{cases}$$

Hence

$$\dim \operatorname{Ker}(A - \lambda_0 I) = \sum_{i=1}^{p} \dim \operatorname{Ker} J_{k_i}(\lambda_i - \lambda_0)$$

is the number of indices $i$ for which $\lambda_0 = \lambda_i$, and, by definition, this number coincides [in case $\lambda_0 \in \sigma(A)$] with the geometric multiplicity of $\lambda_0$. Similarly,

$$\operatorname{Ker}[J_{k_i}(\lambda_i - \lambda_0)]^q = \begin{cases} \{0\}, & \text{if } \lambda_0 \neq \lambda_i \\ \operatorname{Span}\{e_1, \ldots, e_q\}, & \text{if } \lambda_0 = \lambda_i \text{ and } q = 1, \ldots, k_i - 1 \\ \mathbb{C}^{k_i}, & \text{if } \lambda_0 = \lambda_i \text{ and } q \geq k_i \end{cases}$$

So for $q = 1, 2, \ldots$ and $\lambda_0 \in \mathbb{C}$ we have

$$\dim[\operatorname{Ker}(A - \lambda_0 I)^q] = \sum_{i=1}^{p} \dim\big[\operatorname{Ker}[J_{k_i}(\lambda_i - \lambda_0)]^q\big] = \sum_{\{i \mid \lambda_i = \lambda_0\}} \min(k_i, q) \qquad (2.2.3)$$

As $\mathcal{R}_{\lambda_0}(A)$ is the maximal subspace of the type $\operatorname{Ker}(A - \lambda_0 I)^q$, $q = 1, 2, \ldots$, we obtain
$$\dim \mathcal{R}_{\lambda_0}(A) = \sum_{\{i \mid \lambda_i = \lambda_0\}} k_i$$

which, by definition, is just the algebraic multiplicity of $\lambda_0$. □

Proposition 2.2.5 is actually a particular case of the following general proposition.

Proposition 2.2.6
Let $A: \mathbb{C}^n \to \mathbb{C}^n$ be a transformation with partial multiplicities $k_1, \ldots, k_m$ corresponding to the eigenvalue $\lambda_0$ of $A$. Then

$$\dim[\operatorname{Ker}(A - \lambda_0 I)^q] = \sum_{i=1}^{q} \{\, j \mid 1 \leq j \leq m,\ k_j \geq i \,\}^{\#}, \qquad q = 1, 2, \ldots$$

where $\Omega^{\#}$ denotes the number of different elements in a finite set $\Omega$.

Proof. In view of formula (2.2.3) we have only to show that

$$\sum_{i=1}^{m} \min\{k_i, q\} = \sum_{i=1}^{q} \{\, j \mid 1 \leq j \leq m,\ k_j \geq i \,\}^{\#}, \qquad q = 1, 2, \ldots \qquad (2.2.4)$$

This equality is certainly true for $q = 1$ (for then both sides are equal to $m$). Assume that the equality is true for $q - 1$. We have

$$\sum_{i=1}^{m} \min\{k_i, q\} - \sum_{i=1}^{m} \min\{k_i, q-1\} = \sum_{i=1}^{m} \big[\min\{k_i, q\} - \min\{k_i, q-1\}\big] = \{\, j \mid 1 \leq j \leq m,\ k_j \geq q \,\}^{\#}$$

Adding the relation

$$\sum_{i=1}^{m} \min\{k_i, q-1\} = \sum_{i=1}^{q-1} \{\, j \mid 1 \leq j \leq m,\ k_j \geq i \,\}^{\#}$$

(which is just the induction hypothesis), we verify (2.2.4). □

It follows from Proposition 2.2.6 that if $\operatorname{Ker}(A - \lambda_0 I)^q = \operatorname{Ker}(A - \lambda_0 I)^{q+1}$ for some positive integer $q$, then actually $\operatorname{Ker}(A - \lambda_0 I)^p = \operatorname{Ker}(A - \lambda_0 I)^q$ for all $p \geq q$; that is,

$$\operatorname{Ker}(A - \lambda_0 I)^q = \mathcal{R}_{\lambda_0}(A)$$
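Proposition 2.2.6 can be inverted: the sequence $d_q = \dim \operatorname{Ker}(A - \lambda_0 I)^q$ determines the partial multiplicities, since $d_q - d_{q-1}$ counts the $k_j$ with $k_j \geq q$. The following minimal numpy sketch (an addition, not from the original text; the rank tolerance is an assumption suited to small, well-scaled examples) recovers them:

```python
# Recover the partial multiplicities of A at lam0 from the kernel
# dimensions d_q = dim Ker (A - lam0*I)^q, inverting formula (2.2.3).
import numpy as np

def partial_multiplicities(A, lam0, tol=1e-9):
    n = A.shape[0]
    B = A - lam0 * np.eye(n)
    d, P = [0], np.eye(n)                 # d[q] = dim Ker B^q
    for q in range(1, n + 1):
        P = P @ B
        d.append(n - np.linalg.matrix_rank(P, tol=tol))
    mults = []
    for q in range(1, n + 1):
        at_least_q  = d[q] - d[q - 1]                     # blocks of size >= q
        at_least_q1 = d[q + 1] - d[q] if q < n else 0     # blocks of size >= q+1
        mults.extend([q] * (at_least_q - at_least_q1))
    return sorted(mults, reverse=True)

# J_3(5) (+) J_1(5): expected partial multiplicities [3, 1] at lam0 = 5
A = np.diag([5.0, 5, 5, 5]) + np.diag([1.0, 1, 0], 1)
print(partial_multiplicities(A, 5.0))     # [3, 1]
```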
2.3 PROOF OF THE JORDAN FORM

In this section we prove Theorem 2.2.4. In view of Theorem 2.1.5, it is sufficient to consider $A|_{\mathcal{R}_{\lambda_0}(A)}$, where $\lambda_0 \in \sigma(A)$ is fixed, in place of $A$. In other words, we can assume that $A$ has only one eigenvalue $\lambda_0$, possibly with several partial multiplicities.

Let $\mathcal{S}_j = \operatorname{Ker}(A - \lambda_0 I)^j$, $j = 1, 2, \ldots, m$, where $m$ is chosen so that $\mathcal{S}_m = \mathcal{R}_{\lambda_0}(A)$ but $\mathcal{S}_{m-1} \neq \mathcal{R}_{\lambda_0}(A)$. Note that $\mathcal{S}_1 \subset \mathcal{S}_2 \subset \cdots \subset \mathcal{S}_m$. Let $x_m^{(1)}, \ldots, x_m^{(t_m)}$ be a basis in $\mathcal{S}_m$ modulo $\mathcal{S}_{m-1}$, that is, a linearly independent set in $\mathcal{S}_m$ such that

$$\mathcal{S}_{m-1} + \operatorname{Span}\{x_m^{(1)}, \ldots, x_m^{(t_m)}\} = \mathcal{S}_m \qquad (2.3.1)$$

(the sum here is direct). We claim that the $m t_m$ vectors

$$(A - \lambda_0 I)^k x_m^{(1)}, \ \ldots, \ (A - \lambda_0 I)^k x_m^{(t_m)}, \qquad k = 0, \ldots, m-1$$

are linearly independent. Indeed, assume

$$\sum_{k=0}^{m-1} \sum_{i=1}^{t_m} a_{ik} (A - \lambda_0 I)^k x_m^{(i)} = 0, \qquad a_{ik} \in \mathbb{C} \qquad (2.3.2)$$

Applying $(A - \lambda_0 I)^{m-1}$ to the left-hand side and using the property that $(A - \lambda_0 I)^m x_m^{(i)} = 0$ for $i = 1, \ldots, t_m$, we find that

$$(A - \lambda_0 I)^{m-1} \Big( \sum_{i=1}^{t_m} a_{i0} x_m^{(i)} \Big) = 0$$

Hence $\sum_{i=1}^{t_m} a_{i0} x_m^{(i)} \in \mathcal{S}_{m-1}$, and because of (2.3.1), $a_{10} = \cdots = a_{t_m 0} = 0$. Applying $(A - \lambda_0 I)^{m-2}$ to the left-hand side of (2.3.2), we show similarly that $a_{11} = \cdots = a_{t_m 1} = 0$, and so on.

We put

$$\mathcal{M}_1 = \operatorname{Span}\{(A - \lambda_0 I)^k x_m^{(1)} \mid k = 0, \ldots, m-1\}, \quad \mathcal{M}_2 = \operatorname{Span}\{(A - \lambda_0 I)^k x_m^{(2)} \mid k = 0, \ldots, m-1\}, \quad \ldots, \quad \mathcal{M}_{t_m} = \operatorname{Span}\{(A - \lambda_0 I)^k x_m^{(t_m)} \mid k = 0, \ldots, m-1\}$$

As we have just seen, the sum $\mathcal{M}_1 + \mathcal{M}_2 + \cdots + \mathcal{M}_{t_m}$ is direct. Consider now the vectors

$$x_{m-1}^{(i)} = (A - \lambda_0 I)\, x_m^{(i)}, \qquad i = 1, \ldots, t_m$$

We claim that
$$\mathcal{S}_{m-2} \cap \operatorname{Span}\{x_{m-1}^{(1)}, x_{m-1}^{(2)}, \ldots, x_{m-1}^{(t_m)}\} = \{0\} \qquad (2.3.3)$$

Indeed, assume

$$\sum_{i=1}^{t_m} a_i x_{m-1}^{(i)} \in \mathcal{S}_{m-2}$$

Applying $(A - \lambda_0 I)^{m-2}$ to this vector, we get

$$(A - \lambda_0 I)^{m-1} \sum_{i=1}^{t_m} a_i x_m^{(i)} = 0$$

which implies $a_1 = \cdots = a_{t_m} = 0$ in view of equality (2.3.1). So equation (2.3.3) follows.

Assume first that $\mathcal{S}_{m-2} + \operatorname{Span}\{x_{m-1}^{(1)}, \ldots, x_{m-1}^{(t_m)}\}$ does not coincide with $\mathcal{S}_{m-1}$. Then there exist vectors $x_{m-1}^{(t_m+1)}, \ldots, x_{m-1}^{(t_m+t_{m-1})}$ in $\mathcal{S}_{m-1}$ such that the set $\{x_{m-1}^{(i)}\}_{i=1}^{t_m+t_{m-1}}$ is linearly independent and

$$\mathcal{S}_{m-2} + \operatorname{Span}\{x_{m-1}^{(1)}, \ldots, x_{m-1}^{(t_m+t_{m-1})}\} = \mathcal{S}_{m-1} \qquad (2.3.4)$$

Applying the previous argument to (2.3.4) as with (2.3.1), we find that the vectors

$$(A - \lambda_0 I)^k x_{m-1}^{(1)}, \ \ldots, \ (A - \lambda_0 I)^k x_{m-1}^{(t_m+t_{m-1})}, \qquad k = 0, \ldots, m-2$$

are linearly independent. Now put

$$\mathcal{M}_{t_m+1} = \operatorname{Span}\{(A - \lambda_0 I)^k x_{m-1}^{(t_m+1)} \mid k = 0, \ldots, m-2\}, \quad \ldots, \quad \mathcal{M}_{t_m+t_{m-1}} = \operatorname{Span}\{(A - \lambda_0 I)^k x_{m-1}^{(t_m+t_{m-1})} \mid k = 0, \ldots, m-2\}$$

If it happens that

$$\mathcal{S}_{m-2} + \operatorname{Span}\{x_{m-1}^{(i)} \mid i = 1, \ldots, t_m\} = \mathcal{S}_{m-1}$$

then put formally $t_{m-1} = 0$. At the next step put

$$x_{m-2}^{(i)} = (A - \lambda_0 I)\, x_{m-1}^{(i)}, \qquad i = 1, \ldots, t_m + t_{m-1}$$

and show similarly that

$$\mathcal{S}_{m-3} \cap \operatorname{Span}\{x_{m-2}^{(i)} \mid i = 1, \ldots, t_m + t_{m-1}\} = \{0\}$$

Assuming that $\mathcal{S}_{m-3} + \operatorname{Span}\{x_{m-2}^{(i)} \mid i = 1, \ldots, t_m + t_{m-1}\} \neq \mathcal{S}_{m-2}$, choose
$x_{m-2}^{(i)}$, $i = t_m + t_{m-1} + 1, \ldots, t_m + t_{m-1} + t_{m-2}$, in such a way that the vectors $x_{m-2}^{(i)}$, $i = 1, \ldots, t_m + t_{m-1} + t_{m-2}$, are linearly independent and the linear span of these vectors is a direct complement to $\mathcal{S}_{m-3}$ in $\mathcal{S}_{m-2}$. Then put

$$\mathcal{M}_{t_m + t_{m-1} + i} = \operatorname{Span}\{(A - \lambda_0 I)^k x_{m-2}^{(t_m + t_{m-1} + i)} \mid k = 0, \ldots, m-3\}$$

for $i = 1, \ldots, t_{m-2}$. We continue this process of construction of $\mathcal{M}_i$, $i = 1, \ldots, p$, where $p = t_m + t_{m-1} + \cdots + t_1$. The construction shows that each $\mathcal{M}_i$ is a Jordan subspace of $A$ and the sum $\mathcal{M}_1 + \cdots + \mathcal{M}_p$ is a direct sum. Also

$$\mathcal{M}_1 + \cdots + \mathcal{M}_p = \mathcal{R}_{\lambda_0}(A) = \mathbb{C}^n$$

because of our assumption that $\sigma(A) = \{\lambda_0\}$. Hence (2.2.2) holds.

Let us prove the uniqueness part of Theorem 2.2.4. Assume that (2.2.2) holds, and let $\nu_1, \ldots, \nu_t$ be all the different eigenvalues of $A$. Denoting by $E_j$ the set of all integers $i$, $1 \leq i \leq p$, such that $\lambda_i = \nu_j$, we have for $l = 0, 1, 2, \ldots$:

$$\dim \operatorname{Ker}(A|_{\mathcal{M}_i} - \nu_j I)^l = \begin{cases} 0, & \text{if } i \notin E_j \\ \min(l, \dim \mathcal{M}_i), & \text{if } i \in E_j \end{cases}$$

Consequently

$$\dim \operatorname{Ker}(A - \nu_j I)^l = \sum_{i \in E_j} \min(l, \dim \mathcal{M}_i) \qquad (2.3.5)$$

In particular (taking $l = 1$), the number of elements in $E_j$ coincides with $\dim \operatorname{Ker}(A - \nu_j I)$. This proves that for a direct sum decomposition $\mathbb{C}^n = \mathcal{N}_1 \dotplus \cdots \dotplus \mathcal{N}_q$ as in Theorem 2.2.4 we have $q = p$, and for a fixed $j$ the number of $\mu_i$ values that are equal to $\nu_j$ coincides with the number of $\lambda_i$ values that are equal to $\nu_j$. Hence we can assume $\mu_i = \lambda_i$, $i = 1, \ldots, p$. Further, (2.3.5) implies that (for fixed $\nu_j$) the number

$$\dim \operatorname{Ker}(A - \nu_j I)^l - \dim \operatorname{Ker}(A - \nu_j I)^{l-1}$$

coincides with the number of indices $i \in E_j$ such that $\dim \mathcal{M}_i \geq l$ ($l = 1, 2, \ldots$), and thus it also coincides with the number of indices $i \in E_j$ such that $\dim \mathcal{N}_i \geq l$. This implies the uniqueness part of Theorem 2.2.4. □
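The construction just carried out is effective, and for small matrices with exact entries it can be delegated to a computer algebra system. A brief sketch added here (not from the original text; it assumes sympy, whose `jordan_form` returns $P$ and $J$ with $A = PJP^{-1}$, and the ordering of the blocks may vary):

```python
from sympy import Matrix

A = Matrix([[3, 1, 1],
            [0, 3, 0],
            [0, 0, 3]])
P, J = A.jordan_form()      # A = P * J * P**-1, J in Jordan form
print(J)                    # J_2(3) (+) J_1(3)  (block order may vary)
# Each Jordan block of size k corresponds to k consecutive columns of P,
# which form a Jordan chain; the span of each chain is one subspace M_i
# in the decomposition (2.2.2).
```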
2.4 SPECTRAL SUBSPACES

Let $A: \mathbb{C}^n \to \mathbb{C}^n$ be a transformation. A subspace $\mathcal{M} \subset \mathbb{C}^n$ is called a spectral subspace for $A$ if $\mathcal{M}$ is a sum of root subspaces for $A$. The zero subspace is also considered spectral. Since root subspaces are $A$-invariant, a spectral subspace for $A$ is $A$-invariant. It is easily seen that the total number of spectral subspaces for $A$ is $2^r$, where $r$ is the number of distinct eigenvalues of $A$.

By Theorem 2.1.5, for every $A$-invariant subspace $\mathcal{M}$ we have

$$\mathcal{M} = [\mathcal{M} \cap \mathcal{R}_{\lambda_1}(A)] \dotplus \cdots \dotplus [\mathcal{M} \cap \mathcal{R}_{\lambda_r}(A)] \qquad (2.4.1)$$

where $\lambda_1, \ldots, \lambda_r$ are all the distinct eigenvalues of $A$. From this formula it is clear that $\mathcal{M}$ is spectral if and only if for every $\lambda_j$ either $\mathcal{M} \cap \mathcal{R}_{\lambda_j}(A) = \{0\}$ or the inclusion $\mathcal{R}_{\lambda_j}(A) \subset \mathcal{M}$ holds. Another consequence of formula (2.4.1) is that, for any nonzero spectral subspace $\mathcal{M}$ of $A$,

$$\mathcal{M} = \mathcal{R}_{\mu_1}(A) \dotplus \cdots \dotplus \mathcal{R}_{\mu_s}(A)$$

where $\mu_1, \ldots, \mu_s$ are all the distinct eigenvalues of the restriction $A|_{\mathcal{M}}$.

A useful characterization of spectral subspaces is given by their maximality property.

Proposition 2.4.1
An $A$-invariant subspace $\mathcal{M} \neq \{0\}$ is spectral if and only if any $A$-invariant subspace $\mathcal{L}$ with the property $\sigma(A|_{\mathcal{L}}) \subset \sigma(A|_{\mathcal{M}})$ is contained in $\mathcal{M}$.

Proof. Assume that $\mathcal{M}$ is not spectral, so that, in particular, $\{0\} \neq \mathcal{M} \cap \mathcal{R}_{\lambda_0}(A) \neq \mathcal{R}_{\lambda_0}(A)$ for some $\lambda_0 \in \sigma(A)$. Define the $A$-invariant subspace $\mathcal{L}$ by the equalities $\mathcal{L} \cap \mathcal{R}_{\lambda_0}(A) = \mathcal{R}_{\lambda_0}(A)$ and $\mathcal{L} \cap \mathcal{R}_{\lambda_i}(A) = \mathcal{M} \cap \mathcal{R}_{\lambda_i}(A)$ for all eigenvalues $\lambda_i$ of $A$ different from $\lambda_0$. Obviously, $\sigma(A|_{\mathcal{L}}) = \sigma(A|_{\mathcal{M}})$, but $\mathcal{L}$ is not contained in $\mathcal{M}$ (actually, $\mathcal{L}$ contains $\mathcal{M}$ properly).

On the other hand, assume that $\mathcal{M}$ is spectral. If $\mathcal{L}$ is $A$-invariant with $\sigma(A|_{\mathcal{L}}) \subset \sigma(A|_{\mathcal{M}})$, then the equality

$$\mathcal{L} = [\mathcal{L} \cap \mathcal{R}_{\lambda_1}(A)] \dotplus \cdots \dotplus [\mathcal{L} \cap \mathcal{R}_{\lambda_r}(A)] \qquad (2.4.2)$$

(where $\lambda_1, \ldots, \lambda_r$ are the distinct eigenvalues of $A$) implies that $\mathcal{L} \cap \mathcal{R}_{\lambda_0}(A) = \{0\}$ for every $\lambda_0 \in \sigma(A)$ not belonging to the spectrum of $A|_{\mathcal{M}}$. It follows then from (2.4.2) that

$$\mathcal{L} \subset \mathcal{R}_{\mu_1}(A) \dotplus \cdots \dotplus \mathcal{R}_{\mu_s}(A) \qquad (2.4.3)$$

where $\mu_1, \ldots, \mu_s$ are the distinct eigenvalues of $A|_{\mathcal{M}}$. As the right-hand side of (2.4.3) is equal to $\mathcal{M}$, the inclusion $\mathcal{L} \subset \mathcal{M}$ follows. □

Another characterization of spectral subspaces can be given in terms of direct complements.
Theorem 2.4.2
The following statements are equivalent for an $A$-invariant subspace $\mathcal{M}$: (a) $\mathcal{M}$ is spectral for $A$; (b) there exists a direct complement $\mathcal{N}$ to $\mathcal{M}$ such that $\mathcal{N}$ is $A$-invariant and

$$\sigma(A|_{\mathcal{M}}) \cap \sigma(A|_{\mathcal{N}}) = \emptyset \qquad (2.4.4)$$

(c) there exists a unique $A$-invariant direct complement $\mathcal{N}$ to $\mathcal{M}$; (d) for any $A$-invariant subspace $\mathcal{L}$ that contains $\mathcal{M}$ properly, $\sigma(A|_{\mathcal{L}})$ contains $\sigma(A|_{\mathcal{M}})$ properly.

To accommodate the cases $\mathcal{M} = \{0\}$ and $\mathcal{M} = \mathbb{C}^n$ in Theorem 2.4.2 we adopt the convention that the spectrum of the restriction of $A$ to the zero subspace is empty.

Proof. The equivalence of (a) and (d) follows immediately from Proposition 2.4.1. By Theorem 2.1.5 (considering each root subspace of $A$ separately) we can assume that $\mathcal{R}_{\lambda_0}(A) = \mathbb{C}^n$, that is, $A$ has the single eigenvalue $\lambda_0$. Then the only spectral subspaces of $A$ are $\{0\}$ and $\mathbb{C}^n$. Further, since $\sigma(A|_{\mathcal{L}}) = \{\lambda_0\}$ for every nonzero $A$-invariant subspace $\mathcal{L}$, equation (2.4.4) implies that either $\sigma(A|_{\mathcal{M}})$ or $\sigma(A|_{\mathcal{N}})$ is empty; in other words, either $\mathcal{M} = \{0\}$ or $\mathcal{N} = \{0\}$. But if the latter case holds, then obviously $\mathcal{M} = \mathbb{C}^n$. Thus $\mathcal{M} = \{0\}$ and $\mathcal{M} = \mathbb{C}^n$ are the only subspaces satisfying (b), and (a) and (b) are equivalent.

Obviously, (a) implies (c). So it remains to prove that (c) implies (a). Let $\mathcal{M}$ be a nontrivial $A$-invariant subspace (i.e., different from $\{0\}$ and $\mathbb{C}^n$) that has an $A$-invariant direct complement $\mathcal{N}$. Then $\mathcal{N}$ is nontrivial as well. We now use the Jordan form (Theorem 2.2.4) for the restriction $A|_{\mathcal{N}}$:

$$\mathcal{N} = \operatorname{Span}\{x_1^{(1)}, \ldots, x_{k_1}^{(1)}\} \dotplus \operatorname{Span}\{x_1^{(2)}, \ldots, x_{k_2}^{(2)}\} \dotplus \cdots \dotplus \operatorname{Span}\{x_1^{(q)}, \ldots, x_{k_q}^{(q)}\}$$

where $x_1^{(i)}, \ldots, x_{k_i}^{(i)}$ is a Jordan chain (necessarily with eigenvalue $\lambda_0$) of $A$, $i = 1, \ldots, q$. It is easily seen (cf. Proposition 1.3.4) that the vectors $x_j^{(i)}$, $j = 1, \ldots, k_i$, $i = 1, \ldots, q$, are linearly independent and hence form a basis in $\mathcal{N}$.

We now construct another direct complement for $\mathcal{M}$ that is $A$-invariant. Let $y\ (\neq 0)$ be an eigenvector of $A$ in $\mathcal{M}$, and put

$$\mathcal{N}' = \operatorname{Span}\{x_1^{(1)}, \ldots, x_{k_1-1}^{(1)}, x_{k_1}^{(1)} + y\} \dotplus \operatorname{Span}\{x_1^{(2)}, \ldots, x_{k_2}^{(2)}\} \dotplus \cdots \dotplus \operatorname{Span}\{x_1^{(q)}, \ldots, x_{k_q}^{(q)}\}$$

As $Ay = \lambda_0 y$, one checks easily that $\mathcal{N}'$ is $A$-invariant. Also, $\mathcal{N}' \neq \mathcal{N}$,
because otherwise $y$ would belong to $\mathcal{N}$, a contradiction with the direct sum $\mathcal{M} \dotplus \mathcal{N} = \mathbb{C}^n$. We verify that $\mathcal{N}'$ is a direct complement to $\mathcal{M}$. Indeed, observe that the vectors $x_1^{(1)}, \ldots, x_{k_1-1}^{(1)}, x_{k_1}^{(1)} + y$, and $x_j^{(i)}$, $j = 1, \ldots, k_i$, $i = 2, \ldots, q$, are linearly independent, and hence $\dim \mathcal{N}' = \dim \mathcal{N}$. So we must only check that $\mathcal{M} \cap \mathcal{N}' = \{0\}$. Let

$$z = \sum_{i=2}^{q} \sum_{j=1}^{k_i} a_{ij} x_j^{(i)} + \sum_{j=1}^{k_1-1} a_{1j} x_j^{(1)} + a_{1k_1}\big(x_{k_1}^{(1)} + y\big) \in \mathcal{M} \cap \mathcal{N}' \qquad (2.4.5)$$

where the $a_{ij}$ are complex numbers. The condition $\mathcal{M} \cap \mathcal{N} = \{0\}$ implies

$$z - a_{1k_1} y = 0 \qquad (2.4.6)$$

which in turn implies

$$\sum_{i=2}^{q} \sum_{j=1}^{k_i} a_{ij} x_j^{(i)} + \sum_{j=1}^{k_1} a_{1j} x_j^{(1)} = 0$$

and, because of the linear independence of the $x_j^{(i)}$, all the coefficients $a_{ij}$ are zeros. In particular, $a_{1k_1} = 0$, and $z = 0$ in view of equation (2.4.6).

We have proved that (when $\sigma(A) = \{\lambda_0\}$) any nontrivial $A$-invariant subspace either does not have $A$-invariant direct complements or has at least two of them. This means that (c) implies (a). □

We deduce immediately from Theorem 2.4.2 that the unique $A$-invariant direct complement $\mathcal{N}$ to a spectral subspace $\mathcal{M}$ is spectral as well: if $\mathcal{M} = \mathcal{R}_{\mu_1}(A) \dotplus \cdots \dotplus \mathcal{R}_{\mu_s}(A)$, then $\mathcal{N} = \mathcal{R}_{\nu_1}(A) \dotplus \cdots \dotplus \mathcal{R}_{\nu_t}(A)$, where $\mu_1, \ldots, \mu_s, \nu_1, \ldots, \nu_t$ is a complete list of all the distinct eigenvalues of $A$.

We say that the spectral subspace $\mathcal{M}$ for $A$ corresponds to the part $\Lambda$ of the spectrum of $A$ if $\sigma(A|_{\mathcal{M}}) = \Lambda$. Obviously, there is a unique spectral subspace corresponding to any given subset $\Lambda$ of $\sigma(A)$ [with the understanding that $\sigma(A|_{\{0\}}) = \emptyset$]. This spectral subspace can easily be described in case $A$ is given by an $n \times n$ matrix in Jordan form as in equation (2.2.1). Indeed, using the notation of that equation, if $\Lambda \subset \sigma(A)$, define the $k_i \times k_i$ matrix $K_i$ by $K_i = I$ if $\lambda_i \in \Lambda$ and $K_i = 0$ if $\lambda_i \notin \Lambda$. Then the subspace

$$\operatorname{Im}[K_1 \oplus \cdots \oplus K_p]$$

is the spectral subspace for $A$ corresponding to $\Lambda$. Its only $A$-invariant direct complement is

$$\operatorname{Im}[(I - K_1) \oplus \cdots \oplus (I - K_p)]$$

We conclude this section with a description of spectral subspaces in terms of contour integrals. (Actually, this description is a particular case of the properties of functions of transformations that are studied in more detail in Section 2.10.)
Let $\Gamma$ be a simple, closed, rectifiable, and positively oriented contour in the complex plane. In fact, for our purposes polygonal contours will suffice. Given an $n \times n$ matrix $B(\lambda) = [b_{ij}(\lambda)]_{i,j=1}^n$ that depends continuously on the variable $\lambda \in \Gamma$ (this means that each entry $b_{ij}(\lambda)$ of $B(\lambda)$ is a continuous function of $\lambda$ on $\Gamma$), the integral $\int_\Gamma B(\lambda)\, d\lambda = [c_{ij}]_{i,j=1}^n$ is defined naturally as the $n \times n$ matrix whose entries are the integrals of the entries of $B(\lambda)$:

$$c_{ij} = \int_\Gamma b_{ij}(\lambda)\, d\lambda, \qquad i, j = 1, \ldots, n$$

The same definition of a contour integral applies also for transformations $B(\lambda): \mathbb{C}^n \to \mathbb{C}^n$ that are continuous functions of $\lambda$ on $\Gamma$. We have only to write $B(\lambda)$ as a matrix $[b_{ij}(\lambda)]_{i,j=1}^n$ in a fixed basis and then interpret $\int_\Gamma B(\lambda)\, d\lambda$ as the transformation represented by the matrix $[\int_\Gamma b_{ij}(\lambda)\, d\lambda]_{i,j=1}^n$ in the same basis. One checks easily that this definition is independent of the chosen basis.

Proposition 2.4.3
Let $\Lambda$ be a subset of $\sigma(A)$, where $A$ is a transformation on $\mathbb{C}^n$, and let $\Gamma$ be a closed contour having $\Lambda$ in its interior and $\sigma(A) \setminus \Lambda$ outside $\Gamma$. Then the transformation

$$\frac{1}{2\pi i} \int_\Gamma (\lambda I - A)^{-1}\, d\lambda$$

is a projector (known as a Riesz projector) onto the spectral subspace associated with $\Lambda$ and along the spectral subspace associated with $\sigma(A) \setminus \Lambda$.

Proof. Using the relation $S(\lambda I - A)^{-1} S^{-1} = (\lambda I - SAS^{-1})^{-1}$, equation (2.1.2), and the Jordan form, we can assume that $A = S J S^{-1}$, where

$$J = J_{k_1}(\lambda_1) \oplus \cdots \oplus J_{k_p}(\lambda_p)$$

and $J_{k_i}(\lambda_i)$ is the $k_i \times k_i$ Jordan block with $\lambda_i$ on the main diagonal. One easily verifies that

$$[\lambda I - J_{k_i}(\lambda_i)]^{-1} = \begin{bmatrix} (\lambda - \lambda_i)^{-1} & (\lambda - \lambda_i)^{-2} & \cdots & (\lambda - \lambda_i)^{-k_i} \\ 0 & (\lambda - \lambda_i)^{-1} & \cdots & (\lambda - \lambda_i)^{-k_i+1} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & (\lambda - \lambda_i)^{-1} \end{bmatrix}$$

As a first consequence of this formula we see immediately that, because $\sigma(A) \cap \Gamma = \emptyset$, $(\lambda I - A)^{-1}$ is indeed continuous on $\Gamma$. Further, the Cauchy formula gives

$$\int_\Gamma (\lambda - \lambda_i)^{-m}\, d\lambda = \begin{cases} 2\pi i, & \text{if } m = 1 \text{ and } \lambda_i \text{ is inside } \Gamma \\ 0, & \text{otherwise} \end{cases}$$

Thus

$$\frac{1}{2\pi i} \int_\Gamma (\lambda I - A)^{-1}\, d\lambda = S(K_1 \oplus \cdots \oplus K_p)S^{-1} \qquad (2.4.7)$$

where $K_i = I$ if $\lambda_i \in \Lambda$ and $K_i = 0$ if $\lambda_i \notin \Lambda$. Thus the matrix (2.4.7) is indeed a projector with image and kernel as prescribed by the proposition. □
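Proposition 2.4.3 lends itself to a direct numerical experiment: discretizing the contour integral over a circle that encloses only part of the spectrum produces the corresponding Riesz projector. A rough numpy sketch added here (not from the original text; the contour, the number of quadrature points, and the tolerances are illustrative assumptions):

```python
# Approximate (1/(2*pi*i)) \int_Gamma (lam*I - A)^{-1} d(lam) over a circle
# enclosing only the eigenvalue 1 of A = J_2(1) (+) J_1(5).
import numpy as np

A = np.diag([1.0, 1.0, 5.0]) + np.diag([1.0, 0.0], 1)
center, radius, m = 1.0, 2.0, 2000          # circle around {1}; 5 stays outside
P = np.zeros((3, 3), dtype=complex)
for t in 2 * np.pi * np.arange(m) / m:
    lam = center + radius * np.exp(1j * t)
    dlam = 1j * radius * np.exp(1j * t) * (2 * np.pi / m)
    P += np.linalg.inv(lam * np.eye(3) - A) * dlam
P /= 2j * np.pi

print(np.allclose(P @ P, P, atol=1e-8))     # P is (numerically) a projector
print(np.round(P.real, 6))                  # diag(1, 1, 0): image Span{e1, e2}
```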
2.5 IRREDUCIBLE INVARIANT SUBSPACES AND UNICELLULAR TRANSFORMATIONS

In this section we use the Jordan form to study irreducible invariant subspaces. An invariant subspace $\mathcal{M}$ of a transformation $A: \mathbb{C}^n \to \mathbb{C}^n$ is called reducible if $\mathcal{M}$ can be represented as a direct sum of nonzero $A$-invariant subspaces $\mathcal{M}_1$ and $\mathcal{M}_2$; otherwise $\mathcal{M}$ is called irreducible. Let us consider some examples.

EXAMPLE 2.5.1. Let $A$ be a Jordan block. Then, as Example 1.1.1 shows, each nonzero $A$-invariant subspace (including $\mathbb{C}^n$ itself) is irreducible. □

EXAMPLE 2.5.2. Let $A = \lambda_0 I$, $\lambda_0 \in \mathbb{C}$. Then an $A$-invariant subspace is irreducible if and only if it is one-dimensional. □

EXAMPLE 2.5.3. Let

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

A straightforward verification shows that the $A$-invariant subspaces are as follows: $\{0\}$; $\operatorname{Span}\{\alpha e_1 + \beta e_3\}$ for fixed numbers $\alpha, \beta \in \mathbb{C}$ with at least one of them different from zero; $\operatorname{Span}\{e_1, e_2 + \gamma e_3\}$ for a fixed $\gamma \in \mathbb{C}$; $\operatorname{Span}\{e_1, e_3\}$; $\mathbb{C}^3$. Among these subspaces $\operatorname{Span}\{e_1, e_3\}$ and $\mathbb{C}^3$ are reducible and the rest are irreducible. □

The following theorem gives various characterizations of irreducible invariant subspaces.
Theorem 2.5.1
The following statements are equivalent for an $A$-invariant subspace $\mathcal{M}$: (a) $\mathcal{M}$ is irreducible; (b) each $A$-invariant subspace contained in $\mathcal{M}$ is irreducible; (c) $\mathcal{M}$ is Jordan, that is, $\mathcal{M}$ has a basis consisting of vectors that form a Jordan chain of $A$; (d) there is a unique eigenvector (up to multiplication by a scalar) of $A$ in $\mathcal{M}$; (e) the lattice of invariant subspaces of $A|_{\mathcal{M}}$ is a chain—that is, for any $A$-invariant subspaces $\mathcal{L}_1, \mathcal{L}_2 \subset \mathcal{M}$ either $\mathcal{L}_1 \subset \mathcal{L}_2$ or $\mathcal{L}_2 \subset \mathcal{L}_1$ holds; (f) every nonzero $A$-invariant subspace that is contained in $\mathcal{M}$ is Jordan; (g) the spectrum of $A|_{\mathcal{M}}$ is a singleton $\{\lambda_0\}$, and

$$\operatorname{rank}\big[(A|_{\mathcal{M}} - \lambda_0 I)^i\big] = \max\{0, (\dim \mathcal{M}) - i\}, \qquad i = 0, 1, \ldots$$

(h) the Jordan form of the linear transformation $A|_{\mathcal{M}}$ consists of a single Jordan block.

Proof. The definition of a Jordan block and the description of its invariant subspaces (Example 1.1.1) show that (h) implies all the other statements in Theorem 2.5.1. The implications (f) $\Rightarrow$ (c) and (b) $\Rightarrow$ (a) are obvious.

Let us show that (c) $\Rightarrow$ (d). Let $x_1, \ldots, x_k$ be a basis in $\mathcal{M}$ such that $Ax_1 = \lambda_0 x_1$; $Ax_2 - \lambda_0 x_2 = x_1$; $\ldots$; $Ax_k - \lambda_0 x_k = x_{k-1}$. The matrix of $A|_{\mathcal{M}}$ in this basis is the $k \times k$ Jordan block with $\lambda_0$ on the main diagonal, so the spectrum of $A|_{\mathcal{M}}$ is the singleton $\{\lambda_0\}$. If $x = \sum_{i=1}^k a_i x_i$ is an eigenvector of $A$ (necessarily corresponding to $\lambda_0$), then $(A - \lambda_0 I)x = 0$, which implies $\sum_{i=2}^k a_i x_{i-1} = 0$. As $x_1, \ldots, x_k$ are linearly independent, $a_2 = \cdots = a_k = 0$, and $x$ is a scalar multiple of $x_1$. So (d) holds.

If $x$ and $y$ are two eigenvectors of $A|_{\mathcal{M}}$ such that $\operatorname{Span}\{x\} \neq \operatorname{Span}\{y\}$, then for the $A$-invariant subspaces $\mathcal{L}_1 = \operatorname{Span}\{x\}$ and $\mathcal{L}_2 = \operatorname{Span}\{y\}$ we have $\mathcal{L}_1 \not\subset \mathcal{L}_2$ and $\mathcal{L}_2 \not\subset \mathcal{L}_1$. So (e) implies (d).

It remains, therefore, to show that (d) $\Rightarrow$ (h), (a) $\Rightarrow$ (h), and (g) $\Rightarrow$ (h). To this end we can assume that $A|_{\mathcal{M}}$ is in Jordan form (written as a matrix in a suitable basis in $\mathcal{M}$):

$$A|_{\mathcal{M}} = J_{k_1}(\lambda_1) \oplus \cdots \oplus J_{k_p}(\lambda_p) \qquad (2.5.1)$$

If $p > 1$, then $e_1$ and $e_{k_1+1}$ are two eigenvectors of $A$ in $\mathcal{M}$ that are not scalar multiples of each other; so (d) $\Rightarrow$ (h). Further, if $p > 1$, then

$$\mathcal{M} = \operatorname{Span}\{e_1, \ldots, e_{k_1}\} \dotplus \operatorname{Span}\{e_{k_1+1}, \ldots, e_{k_1+k_2+\cdots+k_p}\}$$

is a direct sum of two nonzero $A$-invariant subspaces. Hence (a) $\Rightarrow$ (h). Finally, assume that (g) holds. Then we have $\lambda_1 = \lambda_2 = \cdots = \lambda_p = \lambda_0$ in equation (2.5.1), and this equation implies

$$\operatorname{rank}(A|_{\mathcal{M}} - \lambda_0 I)^i = \sum_{j=1}^{p} \max\{0, k_j - i\}, \qquad i = 0, 1, 2, \ldots$$
On the other hand, statement (g) implies that the left-hand side of this equation is also equal to $\max\{0, k_1 + \cdots + k_p - i\}$. In particular (for $i = 1$), we have

$$\sum_{j=1}^{p} (k_j - 1) = k_1 + \cdots + k_p - 1$$

which implies $p = 1$. So (h) holds, and Theorem 2.5.1 is proved. □

Observe that with $\mathcal{M} = \mathbb{C}^n$, Theorem 1.9.3 is just the equivalence (d) $\Leftrightarrow$ (e). Thus the proof of that theorem is now complete.

A transformation $A: \mathbb{C}^n \to \mathbb{C}^n$ is called unicellular if the Jordan form of $A$ consists of a single Jordan block. Comparing statements (a) and (h) of Theorem 2.5.1, we obtain another characterization of a unicellular transformation.

Proposition 2.5.2
A transformation $A: \mathbb{C}^n \to \mathbb{C}^n$ is unicellular if and only if the whole space $\mathbb{C}^n$ is irreducible as an $A$-invariant subspace.

Indeed, rewriting Theorem 2.5.1 for the particular case $\mathcal{M} = \mathbb{C}^n$, one obtains various characterizations of unicellular transformations. Another important property of a unicellular transformation is the "near" uniqueness of an orthonormal basis in which this transformation has upper triangular form (see Section 1.9).

Theorem 2.5.3
A transformation $A: \mathbb{C}^n \to \mathbb{C}^n$ is unicellular if and only if for any two orthonormal bases $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ in which $A$ has an upper triangular form we have

$$x_j = \theta_j y_j, \qquad j = 1, \ldots, n \qquad (2.5.2)$$

where $\theta_j \in \mathbb{C}$ and $|\theta_j| = 1$.

Proof. Assume that $A$ is not unicellular. By Theorem 2.5.1 there exist two eigenvectors $x_1$ and $y_1$ (which can be assumed to have norm 1) such that $\operatorname{Span}\{x_1\} \neq \operatorname{Span}\{y_1\}$. The proof of Theorem 1.9.1 shows that there exists an orthonormal basis whose first vector is $x_1$ and in which $A$ has a triangular form. Similarly, there exists such a basis whose first vector is $y_1$. So equation (2.5.2) does not hold for $j = 1$.

Assume now that $A$ is unicellular, and let $z_1, \ldots, z_n$ be a Jordan basis for $A$ in $\mathbb{C}^n$. So

$$Az_1 = \lambda_0 z_1; \qquad Az_i - \lambda_0 z_i = z_{i-1}, \quad i = 2, \ldots, n$$
For $i = 1, 2, \ldots, n$ define $x_i$ to be a vector in $\operatorname{Span}\{z_1, \ldots, z_i\}$ that is orthogonal to $\operatorname{Span}\{z_1, \ldots, z_{i-1}\}$ and has norm 1. (By definition, $x_1 = \alpha z_1 / \|z_1\|$ for some $\alpha \in \mathbb{C}$ with $|\alpha| = 1$.) Then

$$\operatorname{Span}\{x_1, \ldots, x_i\} = \operatorname{Span}\{z_1, \ldots, z_i\}, \qquad i = 1, \ldots, n$$

and these subspaces are $A$-invariant. By Proposition 1.8.4, $A$ has an upper triangular form with respect to the orthonormal basis $x_1, \ldots, x_n$. If $A$ also has an upper triangular form in an orthonormal basis $y_1, \ldots, y_n$, then

$$\operatorname{Span}\{y_1\} \subset \operatorname{Span}\{y_1, y_2\} \subset \cdots \subset \operatorname{Span}\{y_1, \ldots, y_n\} \qquad (2.5.3)$$

is a chain of $A$-invariant subspaces. But the lattice of all $A$-invariant subspaces is a chain (Example 1.1.1); therefore, (2.5.3) is the unique complete chain of $A$-invariant subspaces. Hence the chain (2.5.3) coincides with

$$\operatorname{Span}\{z_1\} \subset \operatorname{Span}\{z_1, z_2\} \subset \cdots \subset \operatorname{Span}\{z_1, z_2, \ldots, z_n\}$$

Hence $\operatorname{Span}\{y_1, \ldots, y_i\} = \operatorname{Span}\{z_1, \ldots, z_i\}$ for $i = 1, 2, \ldots, n$, and the orthonormality of $y_1, \ldots, y_n$ implies that (2.5.2) holds. □

We conclude this section with a proposition that was promised in Section 1.1.

Proposition 2.5.4
The set $\operatorname{Inv}(A)$ of all invariant subspaces of a fixed transformation $A: \mathbb{C}^n \to \mathbb{C}^n$ is either a continuum [i.e., there exists a bijection $\varphi: \operatorname{Inv}(A) \to \mathbb{R}$] or a finite set.

Proof. In view of Theorem 2.1.5 we can assume that $A$ has only one eigenvalue $\lambda_0$, that is, $\mathcal{R}_{\lambda_0}(A) = \mathbb{C}^n$. If $A$ is unicellular, then by Example 1.1.1 the set $\operatorname{Inv}(A)$ is finite (namely, there are exactly $n + 1$ $A$-invariant subspaces). If $A$ is not unicellular, then by the equivalence (c) $\Leftrightarrow$ (d) in Theorem 2.5.1 there exist two linearly independent eigenvectors $x$ and $y$ of $A$: $Ax = \lambda_0 x$ and $Ay = \lambda_0 y$. Then $\{\operatorname{Span}\{x + \alpha y\} \mid \alpha \in \mathbb{C}\}$ is a set of $A$-invariant subspaces which is a continuum. On the other hand, let $\psi$ be the map from the set of all $n$-tuples $(x_1, \ldots, x_n)$ of $n$-dimensional vectors $x_1, \ldots, x_n$ onto $\operatorname{Inv}(A)$ defined by $\psi(x_1, \ldots, x_n) = \operatorname{Span}\{x_1, \ldots, x_n\}$ if the subspace $\operatorname{Span}\{x_1, \ldots, x_n\}$ is $A$-invariant and $\psi(x_1, \ldots, x_n) = \{0\}$ otherwise. As the set of all $n$-tuples $(x_1, \ldots, x_n)$, $x_i \in \mathbb{C}^n$, is a continuum, an elementary result in set theory implies that $\operatorname{Inv}(A)$ is a continuum as well. □
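Condition (g) of Theorem 2.5.1, applied with $\mathcal{M} = \mathbb{C}^n$, gives a direct numerical test for unicellularity: $\sigma(A)$ must be a singleton $\{\lambda_0\}$ and $\operatorname{rank}(A - \lambda_0 I)^i = \max\{0, n - i\}$. A minimal sketch added here (not from the original text; clustering the computed eigenvalues of a defective matrix is delicate in floating point, so this is reliable only for small, well-scaled examples):

```python
import numpy as np

def is_unicellular(A, tol=1e-8):
    n = A.shape[0]
    eigs = np.linalg.eigvals(A)
    if np.ptp(eigs.real) > tol or np.ptp(eigs.imag) > tol:
        return False                      # more than one distinct eigenvalue
    lam0 = eigs.mean()
    B, P = A - lam0 * np.eye(n), np.eye(n)
    for i in range(1, n + 1):             # rank (A - lam0*I)^i == max(0, n-i)?
        P = P @ B
        if np.linalg.matrix_rank(P, tol=tol) != max(0, n - i):
            return False
    return True

print(is_unicellular(np.array([[2.0, 1], [0, 2]])))   # True:  J_2(2)
print(is_unicellular(np.array([[2.0, 0], [0, 2]])))   # False: 2*I
```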
2.6 GENERATORS OF INVARIANT SUBSPACES

Let $\mathcal{M}$ be an invariant subspace for the transformation $A: \mathbb{C}^n \to \mathbb{C}^n$. The vectors $x_1, \ldots, x_m \in \mathbb{C}^n$ are called generators for $\mathcal{M}$ if

$$\mathcal{M} = \operatorname{Span}\{x_1, \ldots, x_m, Ax_1, \ldots, Ax_m, A^2 x_1, \ldots, A^2 x_m, \ldots\}$$

For example, any basis for $\mathcal{M}$ forms a set of generators for $\mathcal{M}$. In connection with this definition note that for any vectors $y_1, \ldots, y_p \in \mathbb{C}^n$ the subspace $\operatorname{Span}\{y_1, \ldots, y_p, Ay_1, \ldots, Ay_p, A^2 y_1, \ldots, A^2 y_p, \ldots\}$ is $A$-invariant.

The particular case when $\mathcal{M}$ has one generator is of special interest (see also Section 1.1), that is, when

$$\mathcal{M} = \operatorname{Span}\{x, Ax, A^2 x, \ldots\}$$

for some $x \in \mathbb{C}^n$. In this case we call $\mathcal{M}$ a cyclic invariant subspace (such a subspace is frequently referred to as a "Krylov subspace" in the literature on numerical analysis).

The notion of generators behaves well with respect to similarity. That is, if $\mathcal{M}$ is an $A$-invariant subspace with generators $x_1, \ldots, x_m$, then $S\mathcal{M}$ is an $SAS^{-1}$-invariant subspace with generators $Sx_1, \ldots, Sx_m$ (here $S$ is any invertible transformation). So the study of generators of $A$-invariant subspaces can be reduced to the study of generators of $J$-invariant subspaces, where $J$ is a Jordan form for $A$. Let us give some examples.

EXAMPLE 2.6.1. Let $A = I$ (or, more generally, $A = aI$, where $a \in \mathbb{C}$). Then a $k$-dimensional subspace $\mathcal{M}$ in $\mathbb{C}^n$ (which is obviously $A$-invariant) has not less than $k$ generators. Any set of vectors that spans $\mathcal{M}$ is a set of generators. □

EXAMPLE 2.6.2. Let $A = J_n(\lambda)$ be the $n \times n$ Jordan block with eigenvalue $\lambda$. The $A$-invariant subspace $\mathcal{M}_k = \operatorname{Span}\{e_1, \ldots, e_k\}$ is cyclic with generator $e_k$. □

The generators $x_1, \ldots, x_m$ of $\mathcal{M}$ are called minimal generators for $\mathcal{M}$ if $m$ is the smallest number of generators of $\mathcal{M}$. Obviously, any set of minimal generators is a minimal set of generators. (A set of generators $x_1, \ldots, x_p$ for the $A$-invariant subspace $\mathcal{M}$ is called minimal if no proper subset of $\{x_1, \ldots, x_p\}$ constitutes a set of generators for $\mathcal{M}$.) However, not every minimal set of generators is a set of minimal generators. Let us demonstrate this in an example.

EXAMPLE 2.6.3. Let

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$$

and let $\mathcal{M} = \mathbb{C}^2$ be the $A$-invariant subspace. The vector $(1, 1)$ is obviously a generator for $\mathcal{M}$, so a set of minimal generators must consist of a single vector.
On the other hand, the set of two vectors $\{e_1, e_2\}$ is a set of generators of $\mathbb{C}^2$ that is minimal. Indeed, neither of the vectors $e_1$ and $e_2$ by itself is a generator of $\mathbb{C}^2$. □

The number of vectors in a set of minimal generators admits an intrinsic characterization, as follows.

Theorem 2.6.1
Let $\mathcal{M}$ be an $A$-invariant subspace. Then the number of vectors in a set of minimal generators for $\mathcal{M}$ coincides with the maximal dimension $m$ of $\operatorname{Ker}(A - \lambda_0 I)|_{\mathcal{M}}$, where $\lambda_0$ is any eigenvalue of $A|_{\mathcal{M}}$.

Proof. We can assume that

$$A|_{\mathcal{M}} = J_{k_1}(\lambda_1) \oplus \cdots \oplus J_{k_p}(\lambda_p) \qquad (2.6.1)$$

a matrix in Jordan form (with respect to a certain basis in $\mathcal{M}$). Further, we can assume that $\lambda_1 = \cdots = \lambda_m$, where $m \leq p$ (recall that $m$ is the maximal number of Jordan blocks corresponding to any single eigenvalue). Let $x_1, \ldots, x_q$ be generators of $\mathcal{M}$. Let $y_i$ be the $m$-dimensional vector formed by the $k_1$th, $(k_1 + k_2)$th, $\ldots$, $(k_1 + k_2 + \cdots + k_m)$th coordinates of $x_i$ ($i = 1, \ldots, q$). Now

$$e_{k_1} \in \operatorname{Span}\{x_1, \ldots, x_q, Ax_1, \ldots, Ax_q, A^2 x_1, \ldots, A^2 x_q, \ldots\}$$

Examining the $k_1$th, $\ldots$, $(k_1 + k_2 + \cdots + k_m)$th coordinates of the $x_i$ and using the condition $\lambda_1 = \cdots = \lambda_m$, we see that $e_1 \in \operatorname{Span}\{y_1, \ldots, y_q\}$. Similarly, the condition

$$e_{k_1+k_2} \in \operatorname{Span}\{x_1, \ldots, x_q, Ax_1, \ldots, Ax_q, A^2 x_1, \ldots, A^2 x_q, \ldots\}$$

gives rise to the conclusion that $e_2 \in \operatorname{Span}\{y_1, \ldots, y_q\}$. Continuing in this way, we eventually find that $e_i \in \operatorname{Span}\{y_1, \ldots, y_q\}$, $i = 1, 2, \ldots, m$. So $y_1, \ldots, y_q$ span the whole space $\mathbb{C}^m$, and thus $q \geq m$.

We now prove that there is a set of $m$ generators for $\mathcal{M}$. We proceed by induction on $m$. Suppose first that $m = 1$; that is, the eigenvalues $\lambda_1, \ldots, \lambda_p$ are all different. Then the vector $x = e_{k_1} + e_{k_1+k_2} + \cdots + e_{k_1+\cdots+k_p}$ is a generator for $\mathcal{M}$. Indeed,

$$(A - \lambda_2 I)^{k_2} \cdots (A - \lambda_p I)^{k_p} x = (A - \lambda_2 I)^{k_2} \cdots (A - \lambda_p I)^{k_p} e_{k_1} =: f_1$$

Because of the form (2.6.1) of $A|_{\mathcal{M}}$, the matrix $(A - \lambda_2 I)^{k_2} \cdots (A - \lambda_p I)^{k_p}$ has the form $T_1 \oplus 0_{k_2} \oplus \cdots \oplus 0_{k_p}$, where $T_1$ is an upper triangular nonsingular matrix.
Hence the $k_1$th coordinate $f_{1k_1}$ of $f_1$ is nonzero. Now $(A - \lambda_1 I)^j f_1$ has $(k_1 - j)$th coordinate equal to $f_{1k_1}$ (and thus nonzero), and all the coordinates of $(A - \lambda_1 I)^j f_1$ below the $(k_1 - j)$th coordinate are zeros ($j = 1, \ldots, k_1 - 1$). Consequently, the vectors $e_1, \ldots, e_{k_1}$ belong to the span of $f_1, (A - \lambda_1 I)f_1, \ldots, (A - \lambda_1 I)^{k_1-1} f_1$. Similarly, one shows that the span of the vectors

$$f_2, \ (A - \lambda_2 I)f_2, \ \ldots, \ (A - \lambda_2 I)^{k_2-1} f_2$$

where

$$f_2 = (A - \lambda_1 I)^{k_1}(A - \lambda_3 I)^{k_3} \cdots (A - \lambda_p I)^{k_p} x$$

contains the vectors $e_{k_1+1}, \ldots, e_{k_1+k_2}$. Proceeding in this way, we find eventually that all the vectors $e_i$, $i = 1, \ldots, k_1 + \cdots + k_p$, belong to $\operatorname{Span}\{x, Ax, A^2 x, \ldots\}$.

Assume now that $m > 1$. Suppose that for any transformation $B$ and any $B$-invariant subspace $\mathcal{L}$ such that

$$\max_{\lambda \in \sigma(B|_{\mathcal{L}})} \dim \operatorname{Ker}(B - \lambda I)|_{\mathcal{L}} = m - 1$$

there exists a set of $m - 1$ generators for $\mathcal{L}$. Given the transformation $A: \mathbb{C}^n \to \mathbb{C}^n$, write

$$A|_{\mathcal{M}} = A|_{\mathcal{M}_1} \oplus A|_{\mathcal{M}_2}$$

where $\mathcal{M}_1$ and $\mathcal{M}_2$ are $A$-invariant subspaces such that

$$\max_{\lambda \in \sigma(A|_{\mathcal{M}_1})} \dim \operatorname{Ker}(A|_{\mathcal{M}_1} - \lambda I) = m - 1, \qquad \max_{\lambda \in \sigma(A|_{\mathcal{M}_2})} \dim \operatorname{Ker}(A|_{\mathcal{M}_2} - \lambda I) = 1$$

(Such subspaces $\mathcal{M}_1$ and $\mathcal{M}_2$ are easily found by using the Jordan form of $A$.) By the induction hypothesis we have a set of $m - 1$ generators $x_1, \ldots, x_{m-1}$ for the $A$-invariant subspace $\mathcal{M}_1$. Also, we have proved that there is a generator $x_m$ for the $A$-invariant subspace $\mathcal{M}_2$. Then, obviously, $x_1, \ldots, x_m$ is a set of generators for $\mathcal{M}$. □

In particular, an $A$-invariant subspace $\mathcal{M}$ is cyclic if and only if there is only one eigenvector (up to multiplication by a nonzero number) in $\mathcal{M}$ corresponding to each eigenvalue of the restriction $A|_{\mathcal{M}}$. We conclude this section with an example.
EXAMPLE 2.6.4. Let

$$A = \operatorname{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n]$$

where $\lambda_1, \ldots, \lambda_n$ are different complex numbers. Then $\mathbb{C}^n$ is a cyclic subspace for $A$. A vector $x = \langle x_1, \ldots, x_n \rangle \in \mathbb{C}^n$ is cyclic, that is,

$$\mathbb{C}^n = \operatorname{Span}\{x, Ax, A^2 x, \ldots\}$$

if and only if all the coordinates $x_i$ are different from zero. Indeed, if $x_i = 0$ for some $i$, then $e_i$ does not belong to $\operatorname{Span}\{x, Ax, A^2 x, \ldots\}$. On the other hand, if $x_i \neq 0$ for $i = 1, \ldots, n$, then

$$\det[x,\ Ax,\ \ldots,\ A^{n-1}x] = \det \begin{bmatrix} x_1 & \lambda_1 x_1 & \cdots & \lambda_1^{n-1} x_1 \\ x_2 & \lambda_2 x_2 & \cdots & \lambda_2^{n-1} x_2 \\ \vdots & \vdots & & \vdots \\ x_n & \lambda_n x_n & \cdots & \lambda_n^{n-1} x_n \end{bmatrix} = x_1 x_2 \cdots x_n \det \begin{bmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{n-1} \\ 1 & \lambda_2 & \cdots & \lambda_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \lambda_n & \cdots & \lambda_n^{n-1} \end{bmatrix}$$

The determinant on the right-hand side is known as the Vandermonde determinant, and it is well known that it is equal to $\prod_{i<j}(\lambda_j - \lambda_i) \neq 0$. So $\det[x, Ax, \ldots, A^{n-1}x] \neq 0$. It follows that the vectors $x, Ax, \ldots, A^{n-1}x$ are linearly independent and thus span $\mathbb{C}^n$. □
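Example 2.6.4 suggests a general numerical test: $x$ is a cyclic vector for $A$ exactly when the Krylov matrix $[x, Ax, \ldots, A^{n-1}x]$ has rank $n$. A minimal numpy sketch added here (not from the original text; the rank tolerance is an assumption):

```python
import numpy as np

def is_cyclic_vector(A, x, tol=1e-9):
    """rank [x, Ax, ..., A^{n-1}x] == n ?"""
    n = A.shape[0]
    K = np.empty((n, n))
    v = x.astype(float)
    for j in range(n):
        K[:, j] = v
        v = A @ v
    return np.linalg.matrix_rank(K, tol=tol) == n

A = np.diag([1.0, 2.0, 3.0])
print(is_cyclic_vector(A, np.array([1.0, 1.0, 1.0])))  # True: all coords nonzero
print(is_cyclic_vector(A, np.array([1.0, 0.0, 1.0])))  # False: x_2 = 0
```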
2.7 MAXIMAL INVARIANT SUBSPACE IN A GIVEN SUBSPACE

Given a transformation $A: \mathbb{C}^n \to \mathbb{C}^n$ and a subspace $\mathcal{N} \subset \mathbb{C}^n$, we say that an $A$-invariant subspace $\mathcal{M}$ is maximal in $\mathcal{N}$ if $\mathcal{M} \subset \mathcal{N}$ and there is no $A$-invariant subspace that is contained in $\mathcal{N}$ and contains $\mathcal{M}$ properly.

Proposition 2.7.1
A maximal $A$-invariant subspace $\mathcal{M}$ in $\mathcal{N}$ exists, is unique, and is equal to the sum of all $A$-invariant subspaces that are contained in $\mathcal{N}$.

Note that, because the dimension of $\mathcal{N}$ is finite, $\mathcal{M}$ can actually be expressed as the sum of a finite number of $A$-invariant subspaces.

Proof. Let $\mathcal{M}$ be the sum of all $A$-invariant subspaces that are contained in $\mathcal{N}$. Clearly, $\mathcal{M}$ is $A$-invariant and contained in $\mathcal{N}$. Also, $\mathcal{M}$ is maximal in $\mathcal{N}$: this follows from the definition of $\mathcal{M}$, which implies that every $A$-invariant subspace in $\mathcal{N}$ is contained in $\mathcal{M}$. For the uniqueness, assume that there are two different maximal $A$-invariant subspaces in $\mathcal{N}$, say $\mathcal{M}_1$ and $\mathcal{M}_2$. Then $\mathcal{M}_1 + \mathcal{M}_2$ is an $A$-invariant subspace in $\mathcal{N}$ that contains $\mathcal{M}_1$ properly, a contradiction with the definition of a maximal $A$-invariant subspace in $\mathcal{N}$. □

Observe that if $\mathcal{N}$ is $A$-invariant, the maximal $A$-invariant subspace in $\mathcal{N}$ coincides with $\mathcal{N}$ itself. At the other extreme, assume that $\mathcal{N}$ does not contain any eigenvector of $A$; since every nonzero $A$-invariant subspace contains an eigenvector, it follows that $\mathcal{N}$ does not contain nonzero $A$-invariant subspaces. Hence the maximal $A$-invariant subspace in $\mathcal{N}$ is the zero subspace. Let us consider some examples.

EXAMPLE 2.7.1. Let $A = \operatorname{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n]$, where $\lambda_1, \ldots, \lambda_n$ are different complex numbers. Then the maximal $A$-invariant subspace in $\mathcal{N}$ is $\operatorname{Span}\{e_{j_1}, \ldots, e_{j_k}\}$, where $e_{j_p}$, $p = 1, \ldots, k$, are all the vectors among $e_1, \ldots, e_n$ that are contained in $\mathcal{N}$ (by definition, $\operatorname{Span}\{e_{j_1}, \ldots, e_{j_k}\} = \{0\}$ if none of the vectors $e_1, \ldots, e_n$ belongs to $\mathcal{N}$). □

EXAMPLE 2.7.2. Let $A = J_n(\lambda_0)$, the $n \times n$ Jordan block with eigenvalue $\lambda_0$. Then the maximal $A$-invariant subspace in $\mathcal{N}$ is $\operatorname{Span}\{e_1, \ldots, e_{p-1}\}$, where $p$ is the minimal index such that $e_p \notin \mathcal{N}$ (again, we put $\operatorname{Span}\{e_1, \ldots, e_{p-1}\} = \{0\}$ if $\mathcal{N}$ does not contain $e_1$). □

The following more explicit description of the maximal $A$-invariant subspace is sometimes useful.

Theorem 2.7.2
The maximal $A$-invariant subspace $\mathcal{M}$ in $\mathcal{N}$ coincides with $\bigcap_{j=0}^{\infty} \mathcal{N}_j$, where

$$\mathcal{N}_j = \{x \in \mathbb{C}^n \mid A^j x \in \mathcal{N}\}$$

(in particular, $\mathcal{N}_0 = \mathcal{N}$).

Proof. Put $\mathcal{M} = \bigcap_{j=0}^{\infty} \mathcal{N}_j$. We have $\mathcal{M} \subset \mathcal{N}_0 = \mathcal{N}$. Further, $\mathcal{M}$ is $A$-invariant. For, if $x \in \mathcal{M}$, then $A^j x \in \mathcal{N}$ for $j = 0, 1, \ldots$, and hence $A^j(Ax) = A^{j+1}x \in \mathcal{N}$, $j = 0, 1, \ldots$. Hence $Ax \in \mathcal{M}$. It remains to verify that $\mathcal{M}$ is maximal in $\mathcal{N}$. Let $\mathcal{L}$ be an $A$-invariant subspace contained in $\mathcal{N}$. Then for $j = 0, 1, \ldots$,

$$\{x \in \mathbb{C}^n \mid A^j x \in \mathcal{L}\} \subset \{x \in \mathbb{C}^n \mid A^j x \in \mathcal{N}\}$$

and (because $\mathcal{L}$ is $A$-invariant)

$$\mathcal{L} \subset \{x \in \mathbb{C}^n \mid A^j x \in \mathcal{L}\}$$

Combining these inclusions, we have
$$\mathcal{L} \subset \bigcap_{j=0}^{\infty} \{x \in \mathbb{C}^n \mid A^j x \in \mathcal{L}\} \subset \bigcap_{j=0}^{\infty} \{x \in \mathbb{C}^n \mid A^j x \in \mathcal{N}\} = \mathcal{M}$$

and $\mathcal{M}$ is indeed maximal in $\mathcal{N}$. □

In connection with Theorem 2.7.2 observe that $\mathcal{N}_j = A^{-j}\mathcal{N}$ when $A$ is invertible.

Given a transformation $A: \mathbb{C}^n \to \mathbb{C}^n$, it is well known that there are scalar polynomials $f(\lambda)$ such that, if $f(\lambda) = \sum_{i=0}^{l} a_i \lambda^i$, then

$$f(A) = \sum_{i=0}^{l} a_i A^i = 0$$

Indeed, the characteristic polynomial of $A$ has this property (the Cayley–Hamilton theorem). A nonzero polynomial $g(\lambda)$ of least degree—say, $p$—for which $g(A) = 0$ is called a minimal polynomial for $A$ (it can be shown that $p$ is uniquely defined). Then it is clear that, for any integer $l \geq p$, we can equate $A^l$ to a polynomial in $A$ of degree less than $p$. Thus (in the notation of Theorem 2.7.2)

$$\bigcap_{j=0}^{\infty} \mathcal{N}_j = \bigcap_{j=0}^{p-1} \mathcal{N}_j \qquad (2.7.1)$$

where $p$ is the degree of a minimal polynomial of $A$. Indeed, the inclusion $\subset$ in equation (2.7.1) is obvious. To prove the opposite inclusion, let $q(\lambda) = \lambda^p + \sum_{j=0}^{p-1} a_j \lambda^j$ be a minimal polynomial of $A$, so

$$q(A) = A^p + \sum_{j=0}^{p-1} a_j A^j = 0$$

Let $x \in \bigcap_{j=0}^{p-1} \mathcal{N}_j$, so $A^j x \in \mathcal{N}$ for $j = 0, \ldots, p-1$. Then

$$A^p x = -\sum_{j=0}^{p-1} a_j A^j x \in \mathcal{N}$$

and $x \in \mathcal{N}_p$. Assume inductively that we have already proved that $x \in \mathcal{N}_j$, $j = 0, \ldots, s-1$, for some $s \geq p$. Then

$$A^s x = -\sum_{j=0}^{p-1} a_j A^{s-p+j} x \in \mathcal{N}$$

and $x \in \mathcal{N}_s$. So actually $x \in \bigcap_{j=0}^{\infty} \mathcal{N}_j$, and equation (2.7.1) is proved. Observe that (2.7.1) implies

$$\bigcap_{j=0}^{\infty} \mathcal{N}_j = \bigcap_{j=0}^{q} \mathcal{N}_j \qquad (2.7.2)$$

for every $q \geq p - 1$. In particular, equation (2.7.2) holds with $q = n - 1$.
The case when $\mathcal{N} = \operatorname{Ker} C$ for a transformation $C: \mathbb{C}^n \to \mathbb{C}^r$ is of particular interest. In this case one can describe the maximal $A$-invariant subspace in $\operatorname{Ker} C$ in terms of the kernels of the transformations $CA^j$, $j = 0, 1, \ldots$.

Theorem 2.7.3
Given linear transformations $A: \mathbb{C}^n \to \mathbb{C}^n$ and $C: \mathbb{C}^n \to \mathbb{C}^r$, the maximal $A$-invariant subspace in $\operatorname{Ker} C$ is

$$\mathcal{N}(C, A) \overset{\text{def}}{=} \bigcap_{j=0}^{\infty} \operatorname{Ker}(CA^j)$$

Moreover, the subspace $\mathcal{N}(C, A)$ coincides with $\bigcap_{j=0}^{q-1} \operatorname{Ker}(CA^j)$ for every integer $q$ greater than or equal to the degree of a minimal polynomial of $A$.

Proof. In view of Theorem 2.7.2 and equality (2.7.2), we have only to show that

$$\operatorname{Ker}(CA^j) = \{x \in \mathbb{C}^n \mid A^j x \in \operatorname{Ker} C\}, \qquad j = 0, 1, \ldots$$

However, this equality is immediately verified using the definitions of $\operatorname{Ker} C$ and $\operatorname{Ker}(CA^j)$. □

We say that a pair of linear transformations $(C, A)$, where $A: \mathbb{C}^n \to \mathbb{C}^n$ and $C: \mathbb{C}^n \to \mathbb{C}^r$, is a null kernel pair if the maximal $A$-invariant subspace in $\operatorname{Ker} C$ is the zero subspace or, equivalently, if

$$\bigcap_{j=0}^{\infty} \operatorname{Ker}(CA^j) = \{0\}$$

It is easily seen that, equivalently, the pair $(C, A)$ is a null kernel pair if and only if

$$\operatorname{rank} \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} = n$$
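This rank criterion is, in the language of linear systems theory, the observability-matrix test. The following sketch added here (not from the original text; rank tolerance assumed benign) implements it and reproduces the conclusion of Example 2.7.3 below:

```python
import numpy as np

def is_null_kernel_pair(C, A, tol=1e-9):
    """rank col(C, CA, ..., CA^{n-1}) == n ?"""
    n = A.shape[0]
    blocks, M = [], C
    for _ in range(n):
        blocks.append(M)
        M = M @ A
    return np.linalg.matrix_rank(np.vstack(blocks), tol=tol) == n

A = np.eye(3, 3, 1)                                     # J_3(0)
print(is_null_kernel_pair(np.array([[1.0, 0, 0]]), A))  # True:  c_1 != 0
print(is_null_kernel_pair(np.array([[0.0, 1, 0]]), A))  # False: c_1 = 0
```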
EXAMPLE 2.7.3. Let $C = [c_1 \cdots c_n]: \mathbb{C}^n \to \mathbb{C}$, and

$$A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}$$

For $j = 0, 1, \ldots, n-1$ we have

$$CA^j = [\underbrace{0 \cdots 0}_{j}\ c_1 \cdots c_{n-j}]$$

and hence

$$\bigcap_{j=0}^{\infty} \operatorname{Ker}(CA^j) = \operatorname{Span}\{e_1, \ldots, e_{k-1}\}$$

where $k$ is the smallest index such that $c_k \neq 0$. In particular, $(C, A)$ is a null kernel pair if and only if $c_1 \neq 0$. □

The notion of null kernel pairs plays important roles in realization theory for rational matrix functions and in linear systems theory, as we see in Chapters 7 and 8. Here we prove that every pair of transformations has a naturally defined null kernel part.

Theorem 2.7.4
Let $C: \mathbb{C}^n \to \mathbb{C}^r$ and $A: \mathbb{C}^n \to \mathbb{C}^n$ be transformations, and let $\mathcal{M}_1$ be the maximal $A$-invariant subspace in $\operatorname{Ker} C$. Then for every direct complement $\mathcal{M}_2$ to $\mathcal{M}_1$ in $\mathbb{C}^n$, $C$ and $A$ have the following block matrix forms with respect to the direct sum decomposition $\mathbb{C}^n = \mathcal{M}_1 \dotplus \mathcal{M}_2$:

$$C = [0\ \ C_2], \qquad A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} \qquad (2.7.3)$$

where the pair $C_2: \mathcal{M}_2 \to \mathbb{C}^r$, $A_{22}: \mathcal{M}_2 \to \mathcal{M}_2$ is a null kernel pair. If $\mathbb{C}^n = \mathcal{M}_1' \dotplus \mathcal{M}_2'$ is another direct sum decomposition with respect to which $C$ and $A$ have the form

$$C = [0\ \ C_2'], \qquad A = \begin{bmatrix} A_{11}' & A_{12}' \\ 0 & A_{22}' \end{bmatrix} \qquad (2.7.4)$$

where the pair $(C_2', A_{22}')$ is a null kernel pair, then $\mathcal{M}_1'$ is the maximal $A$-invariant subspace in $\operatorname{Ker} C$ (so $\mathcal{M}_1' = \mathcal{M}_1$), and there exists an invertible linear transformation $S: \mathcal{M}_2' \to \mathcal{M}_2$ such that

$$C_2' = C_2 S, \qquad A_{22}' = S^{-1} A_{22} S \qquad (2.7.5)$$

Proof. As $\mathcal{M}_1$ is $A$-invariant and $Cx = 0$ for every $x \in \mathcal{M}_1$, the transformations $C$ and $A$ indeed have the form of equality (2.7.3). Let us show that the pair $(C_2, A_{22})$ is a null kernel pair. Assume $x \in \bigcap_{j=0}^{\infty} \operatorname{Ker}(C_2 A_{22}^j)$. As

$$A^j = \begin{bmatrix} A_{11}^j & * \\ 0 & A_{22}^j \end{bmatrix}, \qquad CA^j = [0\ \ C_2 A_{22}^j], \qquad j = 0, 1, \ldots$$

where by $*$ we denote a transformation of no immediate interest, we have
and hence

$$x \in \bigcap_{j=0}^{\infty} \operatorname{Ker}(CA^j) = \mathcal{M}_1$$

On the other hand, $x$ belongs to the domain of definition of $A_{22}$, that is, $x \in \mathcal{M}_2$. Since $\mathcal{M}_1 \cap \mathcal{M}_2 = \{0\}$, the vector $x$ must be the zero vector. Consequently, $(C_2, A_{22})$ is a null kernel pair.

Now consider a direct sum decomposition $\mathbb{C}^n = \mathcal{M}_1' \dotplus \mathcal{M}_2'$ with respect to which $C$ and $A$ have the form of equality (2.7.4) with the null kernel pair $(C_2', A_{22}')$. As

$$CA^j = [0\ \ C_2'(A_{22}')^j], \qquad j = 0, 1, \ldots$$

we have

$$\bigcap_{j=0}^{\infty} \operatorname{Ker}(CA^j) = \mathcal{M}_1' \dotplus \Big( \bigcap_{j=0}^{\infty} \operatorname{Ker}[C_2'(A_{22}')^j] \Big) = \mathcal{M}_1'$$

where the last equality follows from the null kernel property of $(C_2', A_{22}')$. Hence $\mathcal{M}_1'$ actually coincides with $\mathcal{M}_1$.

Further, write the identity transformation $I: \mathbb{C}^n \to \mathbb{C}^n$ as a $2 \times 2$ block matrix

$$I = \begin{bmatrix} I & * \\ 0 & S \end{bmatrix} : \mathcal{M}_1 \dotplus \mathcal{M}_2' \to \mathcal{M}_1 \dotplus \mathcal{M}_2$$

Here $S: \mathcal{M}_2' \to \mathcal{M}_2$ is a linear transformation that must be invertible in view of the invertibility of $I$. The inverse of $I$ (which is $I$ itself), written as a $2 \times 2$ block matrix with respect to the direct sum decompositions $\mathbb{C}^n = \mathcal{M}_1 \dotplus \mathcal{M}_2 = \mathcal{M}_1 \dotplus \mathcal{M}_2'$, has the form

$$I = \begin{bmatrix} I & * \\ 0 & S^{-1} \end{bmatrix} : \mathcal{M}_1 \dotplus \mathcal{M}_2 \to \mathcal{M}_1 \dotplus \mathcal{M}_2'$$

We obtain the equalities

$$\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} \begin{bmatrix} I & * \\ 0 & S \end{bmatrix} = \begin{bmatrix} I & * \\ 0 & S \end{bmatrix} \begin{bmatrix} A_{11}' & A_{12}' \\ 0 & A_{22}' \end{bmatrix}, \qquad [0\ \ C_2'] = [0\ \ C_2] \begin{bmatrix} I & * \\ 0 & S \end{bmatrix}$$

which imply equality (2.7.5). □
Observe that if (2.7.5) holds, one can identify both $\mathcal{M}_2$ and $\mathcal{M}_2'$ with $\mathbb{C}^m$ for some integer $m$. Write $C_2$ and $A_{22}$ as $r \times m$ and $m \times m$ matrices, respectively, with respect to a fixed basis in $\mathbb{C}^r$ and some basis in $\mathbb{C}^m$. Then $C_2'$ and $A_{22}'$ are transformations represented by the same matrices $C_2$ and $A_{22}$, respectively, with respect to the same basis in $\mathbb{C}^r$ and a possibly different basis in $\mathbb{C}^m$. So the pairs $(C_2, A_{22})$ and $(C_2', A_{22}')$ are essentially the same. We conclude this section with an example.

EXAMPLE 2.7.4. Let $C$ and $A$ be as in Example 2.7.3, and assume that $c_1 = \cdots = c_{k-1} = 0$, $c_k \neq 0$ ($k > 1$). Then $(C, A)$ is not a null kernel pair. The null kernel part $(C_2, A_{22})$ of $(C, A)$ (as in Theorem 2.7.4) is given by

$$C_2 = [c_k, c_{k+1}, \ldots, c_n], \qquad A_{22} = J_{n-k+1}(0) \qquad \square$$

2.8 MINIMAL INVARIANT SUBSPACES OVER A GIVEN SUBSPACE

Here we present properties of invariant subspaces that contain a given subspace and are minimal with respect to this property. It turns out that such subspaces are, in a certain sense, dual to the maximal subspaces studied in the preceding section. We also see a connection with generators of invariant subspaces, as studied in Section 2.6.

Given a transformation $A: \mathbb{C}^n \to \mathbb{C}^n$ and a subspace $\mathcal{N} \subset \mathbb{C}^n$, we say that an $A$-invariant subspace $\mathcal{M}$ is minimal over $\mathcal{N}$ if $\mathcal{M} \supset \mathcal{N}$ and there is no $A$-invariant subspace that contains $\mathcal{N}$ and is contained properly in $\mathcal{M}$. As an analog of Proposition 2.7.1, one sees that a minimal $A$-invariant subspace over $\mathcal{N}$ exists, is unique, and is equal to the intersection of all $A$-invariant subspaces that contain $\mathcal{N}$. The proof of this statement is left to the reader.

If $\mathcal{N}$ is $A$-invariant, then the minimal $A$-invariant subspace over $\mathcal{N}$ coincides with $\mathcal{N}$ itself. On the other hand, it can happen that $\mathbb{C}^n$ is the minimal $A$-invariant subspace over $\mathcal{N}$, even when $\mathcal{N}$ is one-dimensional.

EXAMPLE 2.8.1. Let $A = \operatorname{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n]$, with different complex numbers $\lambda_1, \ldots, \lambda_n$. Let $\mathcal{N} = \operatorname{Span}\{\sum_{i=1}^n a_i e_i\}$ be a one-dimensional subspace. Then the minimal $A$-invariant subspace over $\mathcal{N}$ is $\operatorname{Span}\{e_i \mid a_i \neq 0\}$. In particular, if all the $a_i$ are different from zero, then the minimal $A$-invariant subspace over $\mathcal{N}$ is $\mathbb{C}^n$. □

Our next result expresses the duality between minimal and maximal invariant subspaces in a precise form. (Recall that by Proposition 1.4.4 the subspace $\mathcal{M}$ is $A$-invariant if and only if its orthogonal complement $\mathcal{M}^\perp$ is $A^*$-invariant.)

Proposition 2.8.1
An $A$-invariant subspace $\mathcal{M}$ is minimal over $\mathcal{N}$ if and only if the $A^*$-invariant subspace $\mathcal{M}^\perp$ is maximal in $\mathcal{N}^\perp$.
Proof. Assume that the $A$-invariant subspace $\mathcal{M}$ is minimal over $\mathcal{N}$. In particular, $\mathcal{M} \supset \mathcal{N}$, so $\mathcal{M}^\perp \subset \mathcal{N}^\perp$. If there were an $A^*$-invariant subspace $\mathcal{L}$ such that $\mathcal{M}^\perp \subset \mathcal{L} \subset \mathcal{N}^\perp$ and $\mathcal{M}^\perp \neq \mathcal{L}$, the subspace $\mathcal{L}^\perp$ would be $A$-invariant and $\mathcal{M} \supset \mathcal{L}^\perp \supset \mathcal{N}$, $\mathcal{M} \neq \mathcal{L}^\perp$. This contradicts the definition of $\mathcal{M}$ as a minimal $A$-invariant subspace over $\mathcal{N}$. Hence $\mathcal{M}^\perp$ is a maximal $A^*$-invariant subspace in $\mathcal{N}^\perp$. Reversing the argument, we find that, if the $A^*$-invariant subspace $\mathcal{M}^\perp$ is maximal in $\mathcal{N}^\perp$, the $A$-invariant subspace $\mathcal{M}$ is minimal over $\mathcal{N}$. □

Proposition 2.8.1 allows us to obtain many properties of minimal invariant subspaces from the corresponding properties of maximal invariant subspaces proved in the preceding section. For example, let us prove an analog of Theorem 2.7.2 in this way.

Theorem 2.8.2
The minimal $A$-invariant subspace $\mathcal{M}$ over $\mathcal{N}$ coincides with $\sum_{j=0}^{\infty} A^j \mathcal{N}$.

Proof. By Proposition 2.8.1 and Theorem 2.7.2, we have

$$\mathcal{M}^\perp = \bigcap_{j=0}^{\infty} \mathcal{N}_j \qquad (2.8.1)$$

where $\mathcal{N}_j = \{x \in \mathbb{C}^n \mid (A^*)^j x \in \mathcal{N}^\perp\}$. It is not difficult to check that for $j = 0, 1, \ldots$

$$\mathcal{N}_j^\perp = A^j \mathcal{N} \qquad (2.8.2)$$

Indeed, let $y \in A^j \mathcal{N}$, so that $y = A^j z$ for some $z \in \mathcal{N}$. Then for every $x \in \mathbb{C}^n$ such that $(A^*)^j x \in \mathcal{N}^\perp$ we have

$$(y, x) = (A^j z, x) = (z, (A^*)^j x) = 0$$

Hence $y \in \mathcal{N}_j^\perp$. If the equality (2.8.2) were not true, there would exist a nonzero $y_0 \in \mathcal{N}_j^\perp$ such that $y_0$ would be orthogonal to $A^j \mathcal{N}$. Hence for every $z \in \mathcal{N}$ we have

$$0 = (y_0, A^j z) = ((A^*)^j y_0, z)$$

which implies $y_0 \in \mathcal{N}_j$, a contradiction with $0 \neq y_0 \in \mathcal{N}_j^\perp$. Now (2.8.1) and (2.8.2) give

$$\mathcal{M} = \Big( \bigcap_{j=0}^{\infty} \mathcal{N}_j \Big)^\perp = \sum_{j=0}^{\infty} \mathcal{N}_j^\perp = \sum_{j=0}^{\infty} A^j \mathcal{N} \qquad \square$$

Note that the equality $\mathcal{M} = \sum_{j=0}^{\infty} A^j \mathcal{N}$ can also be verified directly without
difficulty. To this end, observe that the subspace $\mathcal{M}_0 \overset{\text{def}}{=} \sum_{j=0}^{\infty} A^j \mathcal{N}$ is $A$-invariant: if $x = A^j z$ for some $z \in \mathcal{N}$, then $Ax = A^{j+1} z$ belongs to $\mathcal{M}_0$. Obviously, $\mathcal{M}_0$ contains $\mathcal{N}$. If $\mathcal{M}'$ is an $A$-invariant subspace that contains $\mathcal{N}$, then

$$\mathcal{M}_0 = \sum_{j=0}^{\infty} A^j \mathcal{N} \subset \sum_{j=0}^{\infty} A^j \mathcal{M}' \subset \mathcal{M}'$$

So $\mathcal{M}_0$ is indeed the minimal $A$-invariant subspace over $\mathcal{N}$.

As all subspaces under consideration are finite dimensional, the sum $\sum_{j=0}^{\infty} A^j \mathcal{N}$ is actually the sum of a finite number of subspaces $A^j \mathcal{N}$ ($j = 0, 1, \ldots$). In fact,

$$\sum_{j=0}^{\infty} A^j \mathcal{N} = \sum_{j=0}^{q} A^j \mathcal{N} \qquad (2.8.3)$$

where $q$ is any integer greater than or equal to the degree $p$ of a minimal polynomial for $A$. Indeed, it is sufficient to verify equation (2.8.3) for $q = p$. Let $r(\lambda) = \lambda^p + \sum_{i=0}^{p-1} a_i \lambda^i$ be a minimal polynomial of $A$, so

$$A^p + \sum_{i=0}^{p-1} a_i A^i = 0$$

Assuming by induction that we have already proved the inclusion $A^j \mathcal{N} \subset \sum_{i=0}^{p-1} A^i \mathcal{N}$ for $j = 0, \ldots, s-1$ and some $s \geq p$, we have, for $x = A^s y$ with $y \in \mathcal{N}$,

$$x = A^s y = -\sum_{i=0}^{p-1} a_i A^{s-p+i} y \in \sum_{i=0}^{p-1} A^i \mathcal{N}$$

So the inclusion $A^s \mathcal{N} \subset \sum_{i=0}^{p-1} A^i \mathcal{N}$ follows, and by induction we have proved the inclusion $\subset$ in (2.8.3) (with $q = p$). As the opposite inclusion is obvious, (2.8.3) is proved.

Going back to Theorem 2.8.2, observe that

$$\mathcal{M} = \operatorname{Span}\{A^j f_1, \ldots, A^j f_k \mid j = 0, 1, \ldots\}$$

where $f_1, \ldots, f_k$ is a basis in $\mathcal{N}$. In other words, the $A$-invariant subspace $\mathcal{M}$ has a set of $k$ generators, where $k = \dim \mathcal{N}$. Combining this observation with Theorem 2.6.1, we obtain the following fact.
Theorem 2.8.3
If $\mathcal{M}$ is the minimal $A$-invariant subspace over $\mathcal{N}$, and $k = \dim \mathcal{N}$, then for any eigenvalue $\lambda_0$ of $A|_{\mathcal{M}}$ we have

$$\dim \operatorname{Ker}(A - \lambda_0 I)|_{\mathcal{M}} \leq k \qquad (2.8.4)$$

In particular, the theorem implies that if $\mathcal{N}$ is one-dimensional, then $\mathcal{M}$ is cyclic. It is easy to produce examples when the inequality in (2.8.4) is strict. For instance, in the extreme case when $\mathcal{N} = \mathbb{C}^n$ and $A$ has $n$ distinct eigenvalues, we have $\mathcal{M} = \mathbb{C}^n$ and

$$\max_{\lambda_0 \in \sigma(A)} \dim \operatorname{Ker}(A - \lambda_0 I) = 1$$

The case when $\mathcal{N} = \operatorname{Im} B$, where $B: \mathbb{C}^s \to \mathbb{C}^n$ is a transformation, is of special interest. Noting that $A^j(\operatorname{Im} B) = \operatorname{Im}(A^j B)$, Theorem 2.8.2 together with (2.8.3) gives the following.

Theorem 2.8.4
Let $B: \mathbb{C}^s \to \mathbb{C}^n$ and $A: \mathbb{C}^n \to \mathbb{C}^n$ be transformations. Then the minimal $A$-invariant subspace over $\operatorname{Im} B$ coincides with

$$\mathcal{J}(A, B) \overset{\text{def}}{=} \sum_{j=0}^{\infty} \operatorname{Im}(A^j B) = \sum_{j=0}^{q-1} \operatorname{Im}(A^j B)$$

for every integer $q$ greater than or equal to the degree of a minimal polynomial for $A$. [In particular, $\mathcal{J}(A, B) = \sum_{j=0}^{n-1} \operatorname{Im}(A^j B)$.]

We say that a pair of transformations $(A, B)$, where $A: \mathbb{C}^n \to \mathbb{C}^n$ and $B: \mathbb{C}^s \to \mathbb{C}^n$, is a full-range pair if the minimal $A$-invariant subspace over $\operatorname{Im} B$ coincides with $\mathbb{C}^n$ or, equivalently, if

$$\sum_{j=0}^{n-1} \operatorname{Im}(A^j B) = \mathbb{C}^n$$

It is easy to see that, equivalently, $(A, B)$ is a full-range pair if and only if

$$\operatorname{rank}[B\ \ AB\ \ \cdots\ \ A^{n-1}B] = n$$
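Dually to the observability test of Section 2.7, this is the controllability-matrix rank test of linear systems theory. A minimal numpy sketch added here (not from the original text) checking both the criterion and the duality with null kernel pairs discussed next:

```python
import numpy as np

def is_full_range_pair(A, B, tol=1e-9):
    """rank [B, AB, ..., A^{n-1}B] == n ?"""
    n = A.shape[0]
    blocks, M = [], B
    for _ in range(n):
        blocks.append(M)
        M = A @ M
    return np.linalg.matrix_rank(np.hstack(blocks), tol=tol) == n

A = np.eye(3, 3, 1) + 0.5 * np.eye(3)      # J_3(1/2)
B = np.array([[0.0], [0.0], [1.0]])        # b_n != 0, as in Example 2.8.2
print(is_full_range_pair(A, B))            # True
# Duality: (A, B) is full range iff (B*, A*) is a null kernel pair
obs = np.vstack([B.conj().T @ np.linalg.matrix_power(A.conj().T, j)
                 for j in range(3)])
print(np.linalg.matrix_rank(obs) == 3)     # True
```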
The duality generated by Proposition 2.8.1 now takes the following form: the pair $(A, B)$ is a full-range pair if and only if the adjoint pair $(B^*, A^*)$ is a null kernel pair. This follows from the orthogonal decomposition

$$\mathbb{C}^n = \operatorname{Im}[B\ \ AB\ \ \cdots\ \ A^{n-1}B] \oplus \operatorname{Ker} \begin{bmatrix} B^* \\ B^* A^* \\ \vdots \\ B^* (A^*)^{n-1} \end{bmatrix}$$

which is obtained directly from Proposition 1.4.4.

EXAMPLE 2.8.2. Let $A = J_n(\lambda)$, the $n \times n$ Jordan block with eigenvalue $\lambda$, and

$$B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} : \mathbb{C} \to \mathbb{C}^n$$

Then

$$\mathcal{J}(A, B) = \operatorname{Span}\{e_1, \ldots, e_m\}$$

where $m$ is the index determined by the properties that $b_m \neq 0$, $b_{m+1} = \cdots = b_n = 0$. In particular, $\mathcal{J}(A, B) = \{0\}$ if and only if $B = 0$, and the pair $(A, B)$ is full range if and only if $b_n \neq 0$. □

As with null kernel pairs, full-range pairs will be important in realization theory for rational matrix functions and in linear systems theory (see Chapters 7 and 8). We conclude this section with an analog of Theorem 2.7.4 concerning the full-range part of a pair of transformations.

Theorem 2.8.5
Given transformations $A: \mathbb{C}^n \to \mathbb{C}^n$ and $B: \mathbb{C}^s \to \mathbb{C}^n$, let $\mathcal{N}_1$ be the minimal $A$-invariant subspace over $\operatorname{Im} B$. Then for every direct complement $\mathcal{N}_2$ to $\mathcal{N}_1$ in $\mathbb{C}^n$, and with respect to the decomposition $\mathbb{C}^n = \mathcal{N}_1 \dotplus \mathcal{N}_2$, the transformations $A$ and $B$ have the block matrix form

$$A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}, \qquad B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix} \qquad (2.8.5)$$

where the pair $A_{11}: \mathcal{N}_1 \to \mathcal{N}_1$, $B_1: \mathbb{C}^s \to \mathcal{N}_1$ is a full-range pair. If $\mathbb{C}^n = \mathcal{N}_1' \dotplus \mathcal{N}_2'$ is another direct sum decomposition with respect to which $A$ and $B$ have the form

$$A = \begin{bmatrix} A_{11}' & A_{12}' \\ 0 & A_{22}' \end{bmatrix}, \qquad B = \begin{bmatrix} B_1' \\ 0 \end{bmatrix} \qquad (2.8.6)$$

with full-range pair $(A_{11}', B_1')$, then $\mathcal{N}_1' = \mathcal{N}_1$ and $A_{11}' = A_{11}$, $B_1' = B_1$.
Proof. Equality (2.8.5) holds because $\mathcal{N}_1$ is $A$-invariant and $\mathcal{N}_1 \supset \operatorname{Im} B$. Further, in view of (2.8.5) we have

$$\sum_{j=0}^{\infty} \operatorname{Im}(A_{11}^j B_1) = \sum_{j=0}^{\infty} \operatorname{Im}(A^j B) = \mathcal{N}_1$$

so $(A_{11}, B_1)$ is indeed a full-range pair. If (2.8.6) holds for a direct sum decomposition $\mathbb{C}^n = \mathcal{N}_1' \dotplus \mathcal{N}_2'$, then

$$\sum_{j=0}^{\infty} \operatorname{Im}(A^j B) = \sum_{j=0}^{\infty} \operatorname{Im}[(A_{11}')^j B_1']$$

which is equal to $\mathcal{N}_1'$ in view of the full-range property of $(A_{11}', B_1')$. Hence $\mathcal{N}_1'$ is the minimal $A$-invariant subspace over $\operatorname{Im} B$, and thus $\mathcal{N}_1' = \mathcal{N}_1$. Now clearly $A_{11}' = A_{11}$ (which is the restriction of $A$ to $\mathcal{N}_1' = \mathcal{N}_1$) and $B_1' = B_1$. □

2.9 MARKED INVARIANT SUBSPACES

Let $A: \mathbb{C}^n \to \mathbb{C}^n$ be a transformation, and let

$$f_{11}, \ldots, f_{1k_1};\ f_{21}, \ldots, f_{2k_2};\ \ldots;\ f_{p1}, \ldots, f_{pk_p}$$

be a basis in which $A$ has the Jordan form $J_{k_1}(\lambda_1) \oplus \cdots \oplus J_{k_p}(\lambda_p)$. Obviously, any subspace of the form

$$\operatorname{Span}\{f_{11}, \ldots, f_{1m_1}, f_{21}, \ldots, f_{2m_2}, \ldots, f_{p1}, \ldots, f_{pm_p}\} \qquad (2.9.1)$$

for some choice of integers $m_i$, $0 \leq m_i \leq k_i$, is $A$-invariant. [Here $m_i = 0$ is interpreted in the sense that the vectors $f_{i1}, \ldots, f_{ik_i}$ do not appear in (2.9.1) at all.] Such $A$-invariant subspaces are called marked (with respect to the given basis $f_{ij}$ in which $A$ is in the Jordan form). The following example shows that, in general, not every $A$-invariant subspace is marked (with respect to some Jordan basis for $A$).

EXAMPLE 2.9.1. Let

$$A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} : \mathbb{C}^4 \to \mathbb{C}^4$$
We shall verify that the $A$-invariant subspace $\mathcal{M} = \operatorname{Span}\{e_1, e_2\}$ is not marked in any Jordan basis for $A$. Indeed, it is easy to see (because $A^2 \neq 0$ and $\operatorname{rank} A = 2$) that the Jordan form of $A$ is

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

So any Jordan basis of $A$ is of the form $f_1, f_2, f_3, g$, where $Af_1 = Ag = 0$, $Af_2 = f_1$, $Af_3 = f_2$. If $\mathcal{M}$ were marked with respect to this basis, we would have either $\mathcal{M} = \operatorname{Span}\{f_1, g\}$ or $\mathcal{M} = \operatorname{Span}\{f_1, f_2\}$. The former case is impossible because $A|_{\mathcal{M}} \neq 0$, and the latter case is impossible because it implies $\mathcal{M} \subset \operatorname{Im} A$, which is not true ($e_2 \notin \operatorname{Im} A$). □

The description of marked invariant subspaces can be reduced to the description of invariant subspaces that are marked with respect to a fixed Jordan basis. This reduction is achieved with the use of matrices commuting with $J$.

Theorem 2.9.1
Let $J$ be an $n \times n$ matrix in Jordan form. Then every marked $J$-invariant subspace $\mathcal{L}$ can be represented in the form $\mathcal{L} = B\mathcal{M}$, where $\mathcal{M}$ is marked (with respect to the standard basis $e_1, \ldots, e_n$ in $\mathbb{C}^n$) and $B$ is an invertible $n \times n$ matrix commuting with $J$.

Proof. Assume $\mathcal{L} = B\mathcal{M}$, where $\mathcal{M}$ is marked (with respect to the standard basis) and $BJ = JB$ with $B$ invertible. Denoting by $f_1, \ldots, f_n$ the columns of $B$, we find that $\mathcal{L}$ is a marked $J$-invariant subspace in the basis $f_1, \ldots, f_n$. (In view of the equality $BJ = JB$, the matrix $J$ has the same Jordan form with respect to the basis $f_1, \ldots, f_n$.) Conversely, if $\mathcal{L}$ is marked with respect to some Jordan basis $f_1, \ldots, f_n$ of $J$, then, denoting $B = [f_1 f_2 \cdots f_n]$ and $\mathcal{M} = B^{-1}\mathcal{L}$, we obtain $\mathcal{L}$ in the required form $\mathcal{L} = B\mathcal{M}$. □

Note that the characteristic property of a marked invariant subspace depends only on the parts of this subspace corresponding to each eigenvalue: an $A$-invariant subspace $\mathcal{M}$ is marked if and only if for every eigenvalue $\lambda_0$ of $A$ the $A|_{\mathcal{R}_{\lambda_0}(A)}$-invariant subspace $\mathcal{M} \cap \mathcal{R}_{\lambda_0}(A)$ is marked. This follows immediately from the definition of a marked subspace.

In view of Example 2.9.1 it is of interest to find transformations for which every invariant subspace is marked. We have the following result.

Theorem 2.9.2
Let $A: \mathbb{C}^n \to \mathbb{C}^n$ be a transformation such that, for every eigenvalue $\lambda_0$ of $A$, at least one of the following holds: (a) the geometric multiplicity of $\lambda_0$ is equal
to its algebraic multiplicity; (b) $\dim \operatorname{Ker}(A - \lambda_0 I) = 1$. Then every $A$-invariant subspace is marked.

Proof. Considering $\mathcal{M} \cap \mathcal{R}_{\lambda_0}(A)$ and $A|_{\mathcal{R}_{\lambda_0}(A)}$ in place of $\mathcal{M}$ and $A$, respectively, we can assume that $A$ has a single eigenvalue $\lambda_0$. If $\dim \operatorname{Ker}(A - \lambda_0 I) = 1$, then there is a unique maximal chain of $A$-invariant subspaces:

$$\{0\}, \quad \operatorname{Span}\{f_1, f_2, \ldots, f_k\}, \qquad k = 1, \ldots, n$$

where $f_1, f_2, \ldots, f_n$ is any Jordan chain for $A$. So obviously every $A$-invariant subspace is marked. Assume now that the geometric multiplicity of the eigenvalue $\lambda_0$ of $A$ is equal to its algebraic multiplicity. Then $A|_{\mathcal{R}_{\lambda_0}(A)} = \lambda_0 I$, and since every nonzero vector in $\mathcal{R}_{\lambda_0}(A)$ is an eigenvector for $A$, again every $A$-invariant subspace is marked. □

It is easy to produce examples of transformations for which the hypotheses of Theorem 2.9.2 fail, but nevertheless every invariant subspace is marked; for example,

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

2.10 FUNCTIONS OF TRANSFORMATIONS

We recall the definition of functions of matrices. Let $f(\lambda) = \sum_{i=0}^{l} f_i \lambda^i$ be a scalar polynomial of the complex variable $\lambda$, and let $A: \mathbb{C}^n \to \mathbb{C}^n$ be a transformation written as a matrix in the standard basis. Then $f(A)$ is defined as $f(A) = \sum_{i=0}^{l} f_i A^i$. Letting $J_k(\lambda)$ be the Jordan block of size $k$ with eigenvalue $\lambda$, let

$$J = S^{-1}AS = J_{k_1}(\lambda_1) \oplus \cdots \oplus J_{k_p}(\lambda_p)$$

be a Jordan form for $A$. Then

$$f(A) = \sum_{i=0}^{l} f_i A^i = \sum_{i=0}^{l} f_i (SJS^{-1})^i = S\Big[\sum_{i=0}^{l} f_i \big(J_{k_1}(\lambda_1)^i \oplus \cdots \oplus J_{k_p}(\lambda_p)^i\big)\Big]S^{-1}$$

A computation shows that
$$J_k(\lambda)^2 = \begin{bmatrix} \lambda^2 & 2\lambda & 1 & & 0 \\ 0 & \lambda^2 & 2\lambda & \ddots & \\ & & \ddots & \ddots & 1 \\ \vdots & & & \lambda^2 & 2\lambda \\ 0 & & \cdots & 0 & \lambda^2 \end{bmatrix}$$

and, in general, the $(s, q)$ entry of $J_k(\lambda)^i$ is $\binom{i}{q-s} \lambda^{i-(q-s)}$ if $q \geq s$ and zero otherwise [here $\binom{i}{q-s} = i!/[(q-s)!\,(i-(q-s))!]$ if $i \geq q-s$, and $\binom{i}{q-s} = 0$ if $i < q-s$]. It follows that

$$\sum_{i=0}^{l} f_i [J_k(\lambda)]^i = \begin{bmatrix} f(\lambda) & \frac{1}{1!}f'(\lambda) & \frac{1}{2!}f''(\lambda) & \cdots & \frac{1}{(k-1)!}f^{(k-1)}(\lambda) \\ 0 & f(\lambda) & \frac{1}{1!}f'(\lambda) & \cdots & \frac{1}{(k-2)!}f^{(k-2)}(\lambda) \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & & f(\lambda) \end{bmatrix}$$

and hence

$$f(A) = S\left[\bigoplus_{j=1}^{p} \begin{bmatrix} f(\lambda_j) & \frac{1}{1!}f'(\lambda_j) & \cdots & \frac{1}{(k_j-1)!}f^{(k_j-1)}(\lambda_j) \\ 0 & f(\lambda_j) & \cdots & \frac{1}{(k_j-2)!}f^{(k_j-2)}(\lambda_j) \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_j) \end{bmatrix}\right]S^{-1} \qquad (2.10.1)$$

Hence for fixed $A$ the matrix $f(A)$ depends only on the values of the derivatives

$$f(\mu_j), f'(\mu_j), \ldots, f^{(m_j-1)}(\mu_j), \qquad j = 1, \ldots, r$$

where $\mu_1, \ldots, \mu_r$ are all the different eigenvalues of $A$ and $m_j$ is the height of $\mu_j$, that is, the maximal size of the Jordan blocks with eigenvalue $\mu_j$ in a Jordan form of $A$. Equivalently, the height of $\mu_j$ is the minimal integer $m$ such that $\operatorname{Ker}(A - \mu_j I)^m = \mathcal{R}_{\mu_j}(A)$. This observation allows us to define $f(A)$ by equality (2.10.1) not only for polynomials $f(\lambda)$, but also for complex-valued functions that are analytic in a neighbourhood of each eigenvalue of $A$.
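Formula (2.10.1) is easy to exercise numerically for a single Jordan block: $f(J_k(\lambda))$ is the upper triangular Toeplitz matrix built from the Taylor coefficients $f^{(j)}(\lambda)/j!$. A minimal sketch added here (not from the original text; the derivatives of the test polynomial $f(\lambda) = \lambda^3$ are supplied by hand as an assumption of the example):

```python
import numpy as np
from math import factorial

def f_of_jordan_block(derivs, lam, k):
    """derivs[j] = f^{(j)} evaluated at lam; returns f(J_k(lam)) per (2.10.1)."""
    F = np.zeros((k, k))
    for j in range(k):
        F += derivs[j](lam) / factorial(j) * np.eye(k, k, j)
    return F

k, lam = 4, 2.0
derivs = [lambda t: t**3, lambda t: 3*t**2, lambda t: 6*t, lambda t: 6.0]
J = lam * np.eye(k) + np.eye(k, k, 1)
print(np.allclose(f_of_jordan_block(derivs, lam, k),
                  np.linalg.matrix_power(J, 3)))    # True: matches J^3 directly
```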
Note that for a fixed $A$ the correspondence $f(\lambda) \to f(A)$ is an algebraic homomorphism. This means that for any two functions $f(\lambda)$ and $g(\lambda)$ that are analytic in a neighbourhood of each eigenvalue of $A$ the following holds:

$$(\alpha f + \beta g)(A) = \alpha f(A) + \beta g(A), \quad \alpha, \beta \in \mathbb{C}; \qquad (fg)(A) = f(A)g(A) \qquad (2.10.2)$$

Here the functions $\alpha f + \beta g$ and $fg$ (which are analytic in a neighbourhood of each eigenvalue of $A$) are naturally defined by $(\alpha f + \beta g)(\lambda) = \alpha f(\lambda) + \beta g(\lambda)$ and $(fg)(\lambda) = f(\lambda)g(\lambda)$. These properties can be verified by a straightforward computation using (2.10.1). For example,

$$\begin{bmatrix} f(\lambda_0) & \frac{1}{1!}f'(\lambda_0) & \cdots & \frac{1}{(k-1)!}f^{(k-1)}(\lambda_0) \\ 0 & f(\lambda_0) & \cdots & \frac{1}{(k-2)!}f^{(k-2)}(\lambda_0) \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_0) \end{bmatrix} \begin{bmatrix} g(\lambda_0) & \frac{1}{1!}g'(\lambda_0) & \cdots & \frac{1}{(k-1)!}g^{(k-1)}(\lambda_0) \\ 0 & g(\lambda_0) & \cdots & \frac{1}{(k-2)!}g^{(k-2)}(\lambda_0) \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & g(\lambda_0) \end{bmatrix} = \begin{bmatrix} h(\lambda_0) & \frac{1}{1!}h'(\lambda_0) & \cdots & \frac{1}{(k-1)!}h^{(k-1)}(\lambda_0) \\ 0 & h(\lambda_0) & \cdots & \frac{1}{(k-2)!}h^{(k-2)}(\lambda_0) \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & h(\lambda_0) \end{bmatrix}$$

where $f(\lambda)$ and $g(\lambda)$ are analytic functions in a neighbourhood of $\lambda_0$ and $h(\lambda) = f(\lambda)g(\lambda)$. In particular, property (2.10.2) ensures that $f(A)g(A) = g(A)f(A)$ for any functions $f(\lambda)$ and $g(\lambda)$ that are analytic in a neighbourhood of each eigenvalue of $A$.

In the sequel we need integral formulas for functions of matrices. Let $A$
be an $n \times n$ matrix, and let $\Gamma$ be any simple rectifiable contour in the complex plane with the property that all eigenvalues of $A$ are inside $\Gamma$. For instance, one can take $\Gamma$ to be a circle with center $0$ and radius greater than $\|A\|$ (here and elsewhere the norm $\|A\|$ of a transformation $A: \mathbb{C}^n \to \mathbb{C}^n$ is defined by $\|A\| = \max_{\|x\| = 1} \|Ax\|$, where $\|x\| = (|x_1|^2 + \cdots + |x_n|^2)^{1/2}$ for a vector $x = \langle x_1, \ldots, x_n \rangle \in \mathbb{C}^n$).

Proposition 2.10.1

$$\frac{1}{2\pi i} \int_\Gamma \lambda^j (\lambda I - A)^{-1}\, d\lambda = A^j, \qquad j = 0, 1, \ldots \qquad (2.10.3)$$

Proof. Suppose first that $T$ is a Jordan block with eigenvalue $\lambda = 0$:

$$T = \begin{bmatrix} 0 & 1 & & 0 \\ & 0 & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{bmatrix} \qquad (2.10.4)$$

Then

$$(\lambda I - T)^{-1} = \begin{bmatrix} \lambda^{-1} & \lambda^{-2} & \cdots & \lambda^{-n} \\ 0 & \lambda^{-1} & \cdots & \lambda^{-n+1} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda^{-1} \end{bmatrix}$$

(recall that $n$ is the size of $T$). So

$$\frac{1}{2\pi i} \int_\Gamma \lambda^j (\lambda I - T)^{-1}\, d\lambda = \frac{1}{2\pi i} \int_\Gamma \begin{bmatrix} \lambda^{j-1} & \lambda^{j-2} & \cdots & \lambda^{j-n} \\ 0 & \lambda^{j-1} & \cdots & \lambda^{j-n+1} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda^{j-1} \end{bmatrix} d\lambda = T^j \qquad (2.10.5)$$

because, by the Cauchy formula, the only entries whose integral over $\Gamma$ is different from zero (namely, equal to $2\pi i$) are the entries $\lambda^{-1}$, and these occupy the $j$th superdiagonal; the resulting matrix, with ones on the $j$th superdiagonal and zeros elsewhere, is exactly $T^j$.
It is then easy to verify (2.10.3) for a Jordan block $T$ with eigenvalue $\lambda_0$ (not necessarily $0$). Indeed, $T - \lambda_0 I$ has the eigenvalue $0$, so by the case already considered

$$\frac{1}{2\pi i} \int_{\Gamma_0} \lambda^j [\lambda I - (T - \lambda_0 I)]^{-1}\, d\lambda = (T - \lambda_0 I)^j, \qquad j = 0, 1, \ldots$$

where $\Gamma_0 = \{\lambda - \lambda_0 \mid \lambda \in \Gamma\}$. The change of variables $\mu = \lambda + \lambda_0$ on the left-hand side leads to

$$\frac{1}{2\pi i} \int_\Gamma (\mu - \lambda_0)^j (\mu I - T)^{-1}\, d\mu = (T - \lambda_0 I)^j, \qquad j = 0, 1, \ldots \qquad (2.10.6)$$

Now

$$\frac{1}{2\pi i} \int_\Gamma \mu^j (\mu I - T)^{-1}\, d\mu = \frac{1}{2\pi i} \int_\Gamma [(\mu - \lambda_0) + \lambda_0]^j (\mu I - T)^{-1}\, d\mu = \sum_{i=0}^{j} \binom{j}{i} \lambda_0^{j-i} (T - \lambda_0 I)^i = T^j$$

so (2.10.3) holds for the block $T$. Applying (2.10.3) separately to each Jordan block, we can carry the result further to arbitrary Jordan matrices $J$. Finally, for a given matrix $A$ there exist a Jordan matrix $J$ and an invertible matrix $S$ such that $A = S^{-1}JS$. Since (2.10.3) is already proved for $J$, we have

$$\frac{1}{2\pi i} \int_\Gamma \lambda^j (\lambda I - A)^{-1}\, d\lambda = S^{-1}\Big[\frac{1}{2\pi i} \int_\Gamma \lambda^j (\lambda I - J)^{-1}\, d\lambda\Big]S = S^{-1} J^j S = A^j \qquad \square$$

As a consequence of Proposition 2.10.1 we see that for a scalar polynomial $f(\lambda)$ the formula

$$\frac{1}{2\pi i} \int_\Gamma f(\lambda)(\lambda I - A)^{-1}\, d\lambda = f(A)$$

holds. Note that here $\Gamma$ can be replaced by a composite contour that consists of a small circle around each eigenvalue of $A$. [Indeed, the matrix function $(\lambda I - A)^{-1}$ is analytic outside the spectrum of $A$.] Using this observation and formula (2.10.1), we see that for any function that is analytic in a neighbourhood of each eigenvalue of $A$, the formula

$$f(A) = \frac{1}{2\pi i} \int_\Gamma f(\lambda)(\lambda I - A)^{-1}\, d\lambda$$

holds, where $\Gamma$ consists of a sufficiently small circle around each eigenvalue of $A$ [so that $f(\lambda)$ is analytic inside and on $\Gamma$].
A transformation $A: \mathbb{C}^n \to \mathbb{C}^n$ (or an $n \times n$ matrix $A$) is called diagonable if there exist eigenvectors $x_1, \ldots, x_n$ of $A$ that form a basis in $\mathbb{C}^n$. Equivalently, an $n \times n$ matrix $A$ is diagonable if for some nonsingular matrix $S$ the matrix $S^{-1}AS$ has a diagonal form:

$$S^{-1}AS = \operatorname{diag}[a_1\ a_2\ \cdots\ a_n]$$

So a diagonable matrix has $n$ Jordan blocks in its Jordan form, with each block of size 1. If one knows that $A$ is diagonable, then $f(A)$ can be given a meaning [by the same formula (2.10.1)] for every function $f(\lambda)$ that is defined on the set of all eigenvalues of $A$. So, given a diagonable $A$, there is an $S$ such that

$$S^{-1}AS = \operatorname{diag}[a_1\ \cdots\ a_n]$$

For any function $f(\lambda)$ that is defined for $\lambda = a_1, \ldots, \lambda = a_n$, put

$$f(A) = S \operatorname{diag}[f(a_1)\ \cdots\ f(a_n)]\, S^{-1}$$

In particular, $f(A)$ is defined for a hermitian $A$ and any function $f$ defined on $\mathbb{R}$. Also, for a unitary $A$ and any function $f$ defined on the unit circle, the matrix $f(A)$ is well defined in this way.

Consider now the application of these ideas to the exponential function. This is subsequently used in connection with the solution of systems of differential equations with constant coefficients. As $f(\lambda) = e^\lambda$ is analytic in the whole complex plane, the linear transformation $f(A) = e^A$ is defined for every linear transformation $A: \mathbb{C}^n \to \mathbb{C}^n$. In fact,

$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots \qquad (2.10.7)$$

is given by the same power series as $e^\lambda$. In order to verify (2.10.7), we can assume that $A$ is in the Jordan form $A = J_{k_1}(\lambda_1) \oplus \cdots \oplus J_{k_p}(\lambda_p)$. Then, by definition,

$$e^A = \bigoplus_{i=1}^{p} \begin{bmatrix} e^{\lambda_i} & \frac{1}{1!}e^{\lambda_i} & \cdots & \frac{1}{(k_i-1)!}e^{\lambda_i} \\ 0 & e^{\lambda_i} & \cdots & \frac{1}{(k_i-2)!}e^{\lambda_i} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_i} \end{bmatrix}$$

On the other hand,
$$[J_{k_i}(\lambda_i)]^j = \begin{bmatrix} \lambda_i^j & \binom{j}{1}\lambda_i^{j-1} & \binom{j}{2}\lambda_i^{j-2} & \cdots \\ 0 & \lambda_i^j & \binom{j}{1}\lambda_i^{j-1} & \cdots \\ \vdots & & \ddots & \\ 0 & 0 & \cdots & \lambda_i^j \end{bmatrix}, \qquad j = 0, 1, \ldots$$

So the $(s, q)$ entry ($q \geq s$) of the matrix

$$I + \frac{1}{1!}J_{k_i}(\lambda_i) + \frac{1}{2!}[J_{k_i}(\lambda_i)]^2 + \cdots$$

is

$$\sum_{j \geq q-s} \frac{1}{j!}\binom{j}{q-s}\lambda_i^{j-(q-s)} = \frac{1}{(q-s)!}\sum_{j \geq q-s} \frac{\lambda_i^{j-(q-s)}}{(j-(q-s))!} = \frac{1}{(q-s)!}\, e^{\lambda_i}$$

Hence formula (2.10.7) follows. This argument shows that the series (2.10.7) converges for every transformation $A$. Actually, it converges absolutely in the sense that the series

$$1 + \|A\| + \frac{\|A\|^2}{2!} + \frac{\|A\|^3}{3!} + \cdots$$

converges as well.

The exponential function appears naturally in the solution of systems of differential equations of the type

$$\frac{dx_k(t)}{dt} = \sum_{j=1}^{n} a_{kj} x_j(t), \qquad k = 1, \ldots, n$$

Here the $a_{kj}$ are fixed (i.e., independent of $t$) complex numbers, and $x_1(t), \ldots, x_n(t)$ are functions of the real variable $t$ to be found. Denoting $A = [a_{kj}]_{k,j=1}^n$ and $x(t) = \langle x_1(t), \ldots, x_n(t) \rangle$, we rewrite this system in the form

$$\frac{dx(t)}{dt} = Ax(t)$$

A general solution is given by the formula

$$x(t) = e^{tA} x_0, \qquad -\infty < t < \infty \qquad (2.10.8)$$

where $x_0 = x(0)$ is the initial value of $x(t)$.
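Formula (2.10.8) can be exercised with any matrix-exponential routine. A minimal sketch added here (not from the original text; it assumes scipy.linalg.expm is available and checks $dx/dt = Ax$ by a centered finite difference):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
x0 = np.array([1.0, 0.0])
t, h = 0.7, 1e-6

x = lambda s: expm(s * A) @ x0             # x(t) = e^{tA} x0, formula (2.10.8)
lhs = (x(t + h) - x(t - h)) / (2 * h)      # numerical dx/dt at time t
print(np.allclose(lhs, A @ x(t), atol=1e-5))   # True
```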
A general solution is given by the formula

    x(t) = e^{tA}x_0 , \qquad -\infty < t < \infty \qquad (2.10.8)

where x₀ = x(0) is the initial value of x(t). In connection with this formula observe that e^{(t+s)A} = e^{tA}e^{sA}, as follows, for instance, from (2.10.7). In fact, e^{A+B} = e^{A}e^{B} provided A and B commute. However, e^{A+B} is not equal to e^{A}·e^{B} in general.
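Both remarks can be tested numerically. A brief sketch (the matrices are illustrative choices; SciPy is assumed):

    import numpy as np
    from scipy.linalg import expm
    from scipy.integrate import solve_ivp

    # the solution formula (2.10.8) against a high-accuracy ODE integrator
    A = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])
    x0 = np.array([1.0, 0.0])
    sol = solve_ivp(lambda t, x: A @ x, (0.0, 2.0), x0, rtol=1e-10, atol=1e-12)
    print(np.allclose(sol.y[:, -1], expm(2.0 * A) @ x0, atol=1e-7))   # True

    # e^{B+C} = e^B e^C requires BC = CB; here it fails
    B = np.array([[0.0, 1.0], [0.0, 0.0]])
    C = np.array([[0.0, 0.0], [1.0, 0.0]])
    print(np.allclose(expm(B + C), expm(B) @ expm(C)))                # False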
2.11 PARTIAL MULTIPLICITIES AND INVARIANT SUBSPACES OF FUNCTIONS OF TRANSFORMATIONS

From the definition of a function of a transformation A: ℂⁿ → ℂⁿ it follows immediately that if λ₁, …, λₙ are the eigenvalues of A (not necessarily distinct), then f(λ₁), …, f(λₙ) are the eigenvalues of f(A). Moreover, we can compute the partial multiplicities of f(A), as follows.

Theorem 2.11.1
Let A: ℂⁿ → ℂⁿ be a transformation with distinct eigenvalues μ₁, …, μ_r and partial multiplicities m_{i1}, …, m_{ik_i} corresponding to μ_i, i = 1, …, r. Let f(λ) be an analytic function in a neighbourhood of each μ_i (if all m_{ij} are 1, it is sufficient to require that f(μ_i) be defined for i = 1, …, r). For each m_{ij} define a positive integer s_{ij} as follows: s_{ij} = m_{ij} if m_{ij} = 1 or if f^{(k)}(μ_i) = 0 for k = 1, …, m_{ij} − 1; otherwise s_{ij} is determined by the condition that f^{(s_{ij})}(μ_i) is the first nonvanishing derivative of f(λ) at μ_i. Then the partial multiplicities of f(A) corresponding to the eigenvalue λ are as follows:

    ⌊m_{ij}/s_{ij}⌋ repeated s_{ij} − m_{ij} + s_{ij}⌊m_{ij}/s_{ij}⌋ times, j = 1, …, k_i
    ⌊m_{ij}/s_{ij}⌋ + 1 repeated m_{ij} − s_{ij}⌊m_{ij}/s_{ij}⌋ times, j = 1, …, k_i

for all indices i such that f(μ_i) = λ. (Here ⌊x⌋ denotes the largest integer not exceeding x.)

Proof. By Corollary 2.2.2, it suffices to consider the case when A = J_m(μ) is a Jordan block. Using equation (2.10.1), we see that

    dim Ker[f(A) − f(μ)I] = s

where f^{(s)}(μ) is the first nonvanishing derivative of f(λ) at μ. [If m = 1 or if f^{(k)}(μ) = 0 for k = 1, …, m, we put s = m.] More generally,

    dim Ker[f(A) − f(μ)I]^j = min(m, js) , \qquad j = 1, 2, \ldots \qquad (2.11.1)

Denoting the left-hand side of this relation by t_j, note that the sizes of the Jordan blocks of f(A) are uniquely determined by the sequence t₁, …, t_m. Indeed, the number of Jordan blocks of f(A) with size not less than j is just t_j − t_{j−1}, where j = 1, …, m and t₀ is zero by definition. This observation, together with (2.11.1), leads to the conclusion of the theorem. □
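The theorem can be confirmed symbolically. A SymPy sketch (the blocks J₉(0) and J₇(1) with f(λ) = λ²(λ − 1)⁴ are chosen to match two rows of the example that follows): at μ = 0 the first nonvanishing derivative is f″(0), so s = 2 and m = 9 splits as 5, 4; at μ = 1 the first nonvanishing derivative is the fourth, so s = 4 and m = 7 splits as 2, 2, 2, 1, all at the eigenvalue f(0) = f(1) = 0.

    import sympy as sp

    def jordan_block(lam, m):
        # m × m Jordan block with eigenvalue lam
        J = sp.zeros(m)
        for i in range(m):
            J[i, i] = lam
            if i + 1 < m:
                J[i, i + 1] = 1
        return J

    A = sp.diag(jordan_block(0, 9), jordan_block(1, 7))   # 16 × 16 block diagonal
    fA = A**2 * (A - sp.eye(16))**4                       # f(A), f(λ) = λ²(λ−1)⁴
    P, J = fA.jordan_form()

    # read off the block sizes of J (all blocks belong to the eigenvalue 0)
    sizes, k = [], 0
    while k < 16:
        size = 1
        while k + size < 16 and J[k + size - 1, k + size] == 1:
            size += 1
        sizes.append(size)
        k += size
    print(sorted(sizes, reverse=True))    # [5, 4, 2, 2, 2, 1]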
Let us give an illustrative example for Theorem 2.11.1.

EXAMPLE 2.11.1. Let A be a 23 × 23 matrix with only two distinct eigenvalues 0 and 1, with partial multiplicities 1, 4, 9 corresponding to the eigenvalue 0, and with partial multiplicities 2, 7 corresponding to the eigenvalue 1. Let f(λ) = λ²(λ − 1)⁴. Then f(A) has the unique eigenvalue 0, and the different partial multiplicities of A make the following contributions to the partial multiplicities (PM) of f(A), according to Theorem 2.11.1:

    The PM 1 of A gives rise to the PM 1 of f(A).
    The PM 4 of A gives rise to the PM values 2, 2 of f(A).
    The PM 9 of A gives rise to the PM values 4, 5 of f(A).
    The PM 2 of A gives rise to the PM values 1, 1 of f(A).
    The PM 7 of A gives rise to the PM values 1, 2, 2, 2 of f(A).

Hence a Jordan form for the transformation A²(A − I)⁴ has four Jordan blocks of size 1, five Jordan blocks of size 2, one Jordan block of size 4, and one Jordan block of size 5, all corresponding to the eigenvalue zero. □

Note that for a given transformation A: ℂⁿ → ℂⁿ and a function f(λ) such that f(A) can be defined as above, there exists a polynomial p(λ) such that p(A) = f(A). Indeed, take p(λ) such that

    p(μ_i) = f(μ_i) , …, p^{(m_i−1)}(μ_i) = f^{(m_i−1)}(μ_i) , \qquad i = 1, \ldots, r

where μ₁, …, μ_r are all the different eigenvalues of A and m_i is the height of μ_i.

Consider now the connections between invariant subspaces of A and the invariant subspaces of a function of A.

Proposition 2.11.2
If M is an invariant subspace of a transformation A, then M is also invariant for every transformation f(A), where f(λ) is a function for which f(A) is defined.

The proof is immediate:

    f(A) = p(A) = \sum_{j=0}^{m} p_j A^j

for some polynomial p(λ) = Σ_{j=0}^{m} p_jλ^j; so for every x ∈ M we have A^jx ∈ M, j = 0, …, m, and thus

    \Bigl(\sum_{j=0}^{m} p_j A^j\Bigr) x \in M

Note that in general the linear transformation f(A) may have more invariant subspaces than A, as the following example shows.

EXAMPLE 2.11.2. Let

    A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}

The invariant subspaces of A are {0}, Span{e₁}, ℂ², but the invariant subspaces of A² = 0 are all the subspaces in ℂ². □
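Example 2.11.2 can be seen in a two-line computation. A NumPy sketch (the test vector e₁ + e₂ is an illustrative choice):

    import numpy as np

    A = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    v = np.array([1.0, 1.0])                    # e1 + e2

    # Av = (1, 0) is not a multiple of v, so Span{v} is not A-invariant ...
    print(np.linalg.matrix_rank(np.column_stack([v, A @ v])))   # 2
    # ... but A²v = 0 ∈ Span{v}, so Span{v} is invariant for A² = 0
    print(np.allclose(A @ A @ v, 0))                            # True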
We characterize the cases when f(A) has exactly the same invariant subspaces as A.

Theorem 2.11.3
(a) Assume that f(λ) is an analytic function in a neighbourhood of each eigenvalue μ₁, …, μ_r of A (μ₁, …, μ_r are assumed to be distinct). Then f(A) has exactly the same invariant subspaces as A if and only if the following conditions hold: (i) f(μ_i) ≠ f(μ_j) if μ_i ≠ μ_j; (ii) f′(μ_i) ≠ 0 for every eigenvalue μ_i with height greater than 1. (b) If A is diagonable and f(λ) is a function defined at each eigenvalue of A, then f(A) has exactly the same invariant subspaces as A if and only if condition (i) of part (a) holds.

Proof. We shall assume that A has the Jordan form

    A = J_{k_1}(\lambda_1) \oplus \cdots \oplus J_{k_p}(\lambda_p)

where each λ_i coincides with some μ_j, 1 ≤ j ≤ r. Suppose that (i) does not hold, and suppose, for instance, that λ₁ ≠ λ₂ but f(λ₁) = f(λ₂). Formula (2.10.1) shows that e₁ + e_{k₁+1} is an eigenvector of f(A) corresponding to the eigenvalue f(λ₁). Hence Span{e₁ + e_{k₁+1}} is f(A) invariant; but this subspace is easily seen not to be A invariant. Suppose that (ii) does not hold; say, k₁ > 1 and f′(λ₁) = 0. Formula (2.6.1) implies that e₁ + e₂ is an eigenvector of f(A) corresponding to f(λ₁). So Span{e₁ + e₂} is an f(A)-invariant subspace that is not A invariant.

Assume now that (i) and (ii) hold. As f(A) = p(A) for some polynomial p(λ), we can assume that f(λ) is itself a polynomial. Condition (i) imposed on the polynomial f ensures that the root subspace of A corresponding to an eigenvalue λ₀ is also a root subspace of f(A) corresponding to the eigenvalue f(λ₀). Since every A-invariant [resp. f(A)-invariant] subspace is a direct sum of A-invariant [resp. f(A)-invariant] subspaces, each summand belonging to a root subspace, we can assume that σ(A) consists of a single point; say, σ(A) = {0}. Replacing, if necessary, f(λ) by αf(λ) + β, where α, β ∈ ℂ are constants and α ≠ 0—such a replacement does not alter the set Inv(f(A)) of all f(A)-invariant subspaces—we can assume that f(0) = 0, f′(0) = 1. In this case

    f(\lambda) = \lambda + \sum_{i=2}^{p} a_i \lambda^i , \qquad a_i \in \mathbb{C}

But then f(A) = AF, where F = I + Σ_{i=1}^{p−1} a_{i+1}A^i is an invertible matrix. Clearly, every A-invariant subspace is also AF invariant. Note that F^{-1} is a polynomial in AF (this can be checked, for instance, by direct computation in each Jordan block of A, using the fact that A is a Jordan matrix and σ(A) = {0}); so every AF-invariant subspace is also (AF·F^{-1}) invariant, that is, A invariant. Thus we have proved that Inv(f(A)) = Inv(A). □

2.12 EXERCISES

2.1 Let

    A = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}

where A₁: ℂ^m → ℂ^m and A₂: ℂⁿ → ℂⁿ are transformations.
(a) Prove or disprove the following statement: every A-invariant subspace is a direct sum of an A₁-invariant subspace and an A₂-invariant subspace.
(b) Prove or disprove the preceding statement under the additional condition that the spectra of A₁ and A₂ do not intersect.
(c) Prove or disprove the preceding statement under the additional condition that A₁ and A₂ are unicellular with the same eigenvalue.

2.2 Let A: ℂⁿ → ℂⁿ be a transformation with A² = I. Describe the root subspaces of A.

2.3 Describe the root subspaces of a transformation A such that A³ = I. How many spectral A-invariant subspaces are there?

2.4 Find the root subspaces of the transformation

    A = \begin{bmatrix} \lambda_1 I & B \\ 0 & \lambda_2 I \end{bmatrix} : \mathbb{C}^{2n} \to \mathbb{C}^{2n}

where B: ℂⁿ → ℂⁿ is some transformation and λ₁ ≠ λ₂. Is it true that ℛ_{λ_i}(A) = Ker(λ_iI − A), i = 1, 2?
96 The Jordan Form and Invariant Snbspaces 2.5 Find the Jordan form for the following matrices A: 2 1 0 -1 -1 2 -1" 0 1. ) 1 0 0 -2 1 0 3" 0 1. » 2 1 -1 0 3 1 0-10 For each one of the matrices A and each eigenvalue A0 of A, check whether dt^A) = Ker( A0/ - >t). 2.6 Find all possible Jordan forms of transformations A: <p"-» <p" satisfying A1 = 0. Express the number of Jordan blocks of size 2 in terms of A. 2.7 Find the Jordan form of the transformation Q = 0 1 0 0 0 1 0 0 0 .10 0 0 0 1 0J : <p"^<p" 2.8 What is the Jordan form of Q , k = 2, 3,. . . , where Q is given in Exercise 2.7.] 2.9 Describe the Jordan form of a circulant matrix A = a2 a, La, a3 •n~2 "n-\ a. . where a,, . . . , a„ are complex numbers. Prove that there exists an invertible matrix 5 independent of ax,. . . , an such that 5^45"' is diagonal. [Hint: A is a polynomial in Q, where Q is denned in Exercise 2.7]. 2.10 What is the Jordan form of the transformation fi2 "0 0 6 u h 0 6 0 o ■ h ■ 6 • o • • 0" • 0 • h ■ 0 J 2n „ Jr,2n< <P2"-(P
Exercises 97 2.11 Find the Jordan form of the transformation G* o 0 0 '* o 0 0 0 Uk 0 0 0J :£"*^<p" 2.12 Let Au A2,. . . , An be transformations on <p2, and define ,4 = VI, A An Al A2 2 ^3 •"2 3 4 A J :<F2"^<P2 (a) Show that A is similar to a block diagonal matrix with 2x2 blocks on the main diagonal. [Hint: On writing A> = [bJ. Cf]'- bi'ei.di-f>et *> fr for ;' = 1,. . . , n, A is similar to ID F\ where B is the circulant matrix A *. ub2 b3 K J and analogously for C, D, and F. Now use the existence of one similarity transformation that takes B, C, D, and F to the Jordan form (Exercise 2.9).] (b) Prove that in the Jordan form of A only Jordan blocks of size 2 or 1 may appear. (c) Show that if all Ajt j = 1,. . . , n are diagonal matrices, then A is diagonable, that is, the Jordan form of A is a diagonal matrix. Give an example of nondiagonal A t,. . . , A n for which A is diagonable nevertheless.
98 The Jordan Form and Invariant Subspaces 2.13 Prove that the block circulant matrix A A, ■A2 A3 A„_ nk Jc"* :<p"*^<p where A{,. . . , An are k x k matrices, has Jordan blocks of sizes less than or equal to k in its Jordan form. 2.14 Find the Jordan form for the transformation A = 0 0 1 0 0 a, 0 1 0 0 where a0,. . . , a„_, £ <p and the polynomial A" - E"J0' ayA' has n distinct zeros. Show that a similarity that takes A to its Jordan form is given by the Vandermonde matrix of type LA 1 A, 1 A„ 2.15 Let 0 0 0 «o 1 0 0 «1 0 • 1 • 0 • «2 • 0 0 1 •• «„- (a) Prove that, for each eigenvalue, A has only one Jordan block in its Jordan form. (Hint: Use the description of partial multiplicities of A in terms of the matrix polynomial A/ - A; see the appendix.) (b) Find the Jordan form of A.
Exercises 99 2.16 Show that any matrix of the type 0 0 0 ^0 h 0 0 A\ 0 • h • 0 • A2 ■ 0 0 • /* •• A.-1 :47"*-»^* where A/ are kx k matrices, has not more than k Jordan blocks corresponding to each eigenvalue in its Jordan form. 2.17 What is the Jordan form of the upper triangular Toeplitz matrix L0 0 An -J where a0,. . . , an_l are complex numbers with a, 7^0? 2.18 Find the Jordan form of (/„(A0))\ k = 2, 3, Show that [/n(0)]* has infinitely many invariant subspaces if k & 2. 2.19 Describe the Jordan form of the matrix in Exercise 2.17 without the restriction a^O. When does this matrix have infinitely many invariant subspaces? [Hint: Observe that the matrix is a polynomial in J„(0) and use Theorem 2.11.1.] 2.20 Prove that an n x n matrix A is similar to its transpose A . 2.21 Let A: <p" -» <f" be a transformation such that p(A) = 0, where p( A) is a polynomial of degree k with k distinct zeros A,,. . . , \k. (a) Show that Ker(A/- A)* {0}, j = 1, . . . , k. (b) Verify the direct sum decomposition <p" = Ker( A,/ - i4) + • • ■ + Ker(\kI - A) 2.22 2.23 (c) Prove that A is diagonable. Assume that the transformation A:$"—»<p" satisfies the equation p(A) = 0, where p(\) is a polynomial. Let A0 be a zero of p(X), and let k be its multiplicity. Show that the ^-invariant subspace Im ^(^4), where g(A) = /?(A)(A - A0)~\ is spectral. Prove that for any transformation A: <p"—* <p" the inequalities dim Ker As+l + dim Ker ,4s ' < 2 dim Ker A5, s = l,2, hold.
100 The Jordan Form and Invariant Subspaces 2.24 Prove that a transformation A: $"-+$" has the property that AML <ZMX for every .^-invariant subspace M if and only if A is normal. 2.25 Show that a transformation has only one-dimensional irreducible subspaces if and only if A is diagonable. 2.26 Find the minimal number of generators in <p" of the following transformations: (a) The circulant - a, a2 ■•■ an~ an al ■■■ fl„-i _a2 a3 ''" a\ - (b) The lower triangular Toeplitz matrix " a0 0 «i «o -«„-l a„-2 (c) The companion matrix "0 1 0 0 0 1 La0 a, a2 2.27 Prove that if A: (pn—»<p" has one-dimensional image, the minimal number of generators of any ^4-invariant subspace is less than or equal to n — 1. Show that Ker A is the only nontrivial .4-invariant subspace whose minimal number of generators is precisely n-\. 2.28 For a given transformation A, denote by g(M) the minimal number of generators in an ^-invariant subspace M. Prove that g(M) = maxg(Mn0lX(>(A)] where the maximum is taken over all eigenvalues A0 of A [g({0}) is interpreted as zero]. «,e<P o o 0 0
Exercises 101 2.29 Let A = A, 0 1 . o aA where Al and A2 are transformations such that every invariant subspace of each of them is cyclic. Prove or disprove the following statements: (a) Every /4-invariant subspace is cyclic. (b) Every ^-invariant subspace has not more than two minimal generators. 2.30 Show that the vector (0,0, . . . , 0,1) G <p" is a generator of <p" as an invariant subspace of a companion matrix. 2.31 Find the minimal ^-invariant subspace over Im B for the following pairs of transformations: (a) (b) (c) A = A = ~ax - "a\ a2 -an i- -«ih «2 0 0 a, «„-i •• 0 a2lk a a, . 0" 0 «i- B = B = aJk 0 - "0 0 0 -1 '0 0 0 0 1 0 -0 1 , B = ■o 0 0 Here a,,. . . , an are complex numbers. 2.32 Find the maximal y4-invariant subspace in Ker C for the following pairs of transformations: (a) C = [1 0 ■ • • 0]; A is a companion matrix. (b) C=[l 0 ••• 0]; A is an upper triangular Toeplitz matrix. (c) C = [Ik 0 0]; A is an in Exercise 2.31, (c).
102 The Jordan Form and Invariant Subspaces 2.33 Prove or disprove the following statements: (a) If Ml is the maximal ^-invariant subspace in Vx and M2 is the maximal .4-invariant subspace in ¥2, then Jii + M2 is the maximal ^-invariant subspace in Vl + T2. (b) If Mf and T, (i = 1, 2) are as in (a), then Mx C\M2 is the maximal /l-invariant subspace in T, H T2. (c) The analog of (a) for the case of minimal ^-invariant subspaces Mi over Vi,i = 1,2. (d) The analog of (b) for the case of minimal ^-invariant subspaces M, over Y„ i = l,2. 2.34 Find when the following pairs of matrices are full-range pairs: (a) U.(Ao), b, where bltb2,. . . ,bnE$ (b) (A, B), where A is an n x n matrix with A" = 0 and B is an « x 1 matrix. 2.35 Find when the following pairs of matrices are null kernel pairs: (a) ([c,,. . . ,c„], J„(h0)k), where k > 1 is a fixed integer and c,,...,c„e<p. (b) (C, j4), where Cis an 1 x n matrix and A is an n x n upper triangular matrix with zeros on the main diagonal. 2.36 Given a full-range pair >1: <p"-»<p", B:<pm^<p", prove that if A': <£"-». (p", B': <pm-» <p" are transformations sufficiently close to >t and B, respectively, (i.e., \{A' - A\\ < e, ||73'- B|| < e, where e>0 depends on A and B only), then (A', B') is a full-range pair as well. 2.37 Prove that for every pair of transformations A: <p"-* <p", B: <pm-» <p" there exists a sequence of full-range pairs (Ap, Bp), p = 1, 2,. . . such that lim„_ \\Ap - .4|| =0 and lim„_ \\Bp - B\\ = 0. 2.38 State and prove the analogs of Exercises 2.36 and 2.37 for null kernel pairs. 2.39 Let A and B be transformations on <p". Show that the biggest ^-invariant (or, equivalently, B-invariant) subspace M for which A\M = B\M consists of all vectors jcG(p" such that A'x= B'x, 7 = 1,2,.... 2.40 Let Ax, . . . , Ak be transformations on <p". Show that the biggest At - invariant subspace M for which A^M = Ap\M for p = 1,. . . , k, consists of all jc E <p" such that A\x = A'px for p = 1,. . . , k and 7 = 1,2,....
Exercises 103 2.41 Show that the transformation eA is nonsingular for every transformation A: <p"—* <p". Find the eigenvalues and the partial multiplicities of eA in terms of the eigenvalues and the partial multiplicities of A. 2.42 Give an example of a transformation A such that lm(A) is finite but Inv^) is infinite. 2.43 Show that for a transformation A the series f{A) = I-A+±A2-jA3 + --- converges provided all eigenvalues of A are less than 1 in absolute value. For such an A prove that A = eS(A) -1, so one can write f(A) = ln(/ + A). Prove that A and ln(/ + A) have exactly the same invariant subspaces. 2.44 Find all marked ^-invariant subspaces for the transformation A of Example 2.9.1. 2.45 Show that for any transformation A, all ^-hyperinvariant subspaces are marked. 2.46 For which of the following classes of n x n matrices are all invariant subspaces marked? (a) Companion matrices (b) Block companion matrices " 0 / 0 • • 0 " 0 0 / •■■ 0 -A0 A{ A2 •■• Ap_x-i with 2x2 blocks Ai {p = nil) (c) Upper triangular Toeplitz matrices (d) Circulant matrices (e) Block circulant matrices j4, A2 ■ ■ • Ap Ap ^4, ••■ Ap_x with 2x2 blocks Ai (f) Matrices A such that A2 = 0
104 The Jordan Form and Invariant Subspaces 2.47 Prove that every invariant subspace of a matrix of type "a, 0 0 ••• 0 /3,- 0 a2 0 ••• & 0 0 /3„_, 0 ••• a„'_, 0 L/3„ 0 0 ••• 0 a J is marked. 2.48 Prove that for any transformation A: <p3—»<p3 every invariant sub- space is marked. 2.49 Find all Jordan forms of transformations A: <p4 —» <p4 for which there exists a nonmarked invariant subspace.
Chapter Three

Coinvariant and Semiinvariant Subspaces

In this chapter we study two classes of subspaces closely related to invariant ones, namely, coinvariant and semiinvariant subspaces. A subspace is called coinvariant if it is a direct complement to an invariant subspace. A subspace is called semiinvariant if it is a coinvariant part of an invariant subspace. Also, we introduce here the related notion of a triinvariant decomposition for a transformation. This requires a decomposition of the whole space into a direct sum of three subspaces with respect to which the transformation has a block upper triangular form. It follows that the first, second, and third subspaces are invariant, semiinvariant, and coinvariant, respectively. The triinvariant decomposition will play an important role in subsequent applications.

3.1 COINVARIANT SUBSPACES

A subspace M ⊆ ℂⁿ is called coinvariant for the transformation A: ℂⁿ → ℂⁿ (or, in short, A coinvariant) if there is an A-invariant direct complement to M in ℂⁿ. Consider some simple examples.

EXAMPLE 3.1.1. Let A be an n × n Jordan block. Then for each i (1 ≤ i ≤ n), Span{e_i, e_{i+1}, …, e_n} is an A-coinvariant subspace (although there are many other A-coinvariant subspaces). For this subspace there is a unique A-invariant subspace that is its direct complement, namely, Span{e₁, e₂, …, e_{i−1}} ({0} if i = 1). Note that, in this case, the only subspaces that are simultaneously A invariant and A coinvariant are the trivial ones {0} and ℂⁿ. □
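A quick machine check of Example 3.1.1 (a NumPy sketch; the size n = 4, the eigenvalue 2, and the index i = 3 are illustrative choices):

    import numpy as np

    n, i = 4, 3
    A = np.diag(np.full(n, 2.0)) + np.diag(np.ones(n - 1), 1)   # Jordan block J_4(2)

    N = np.eye(n)[:, :i - 1]       # basis of Span{e1, e2}
    M = np.eye(n)[:, i - 1:]       # basis of Span{e3, e4}

    # N is A-invariant: the columns of A @ N stay inside the column span of N
    print(np.linalg.matrix_rank(np.hstack([N, A @ N])) == N.shape[1])   # True
    # N is a direct complement to M: together they span C^n
    print(np.linalg.matrix_rank(np.hstack([N, M])) == n)                # True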
EXAMPLE 3.1.2. Let A = diag[λ₁, …, λₙ], where all the λ_i are different. As we have seen in Example 1.1.3, the only A-invariant subspaces are {0}, ℂⁿ, and Span{e_{i₁}, …, e_{i_k}}, k = 1, …, n − 1, for any choice of i₁ < i₂ < ⋯ < i_k. In contrast, every subspace in ℂⁿ is A coinvariant. Indeed, let M = Span{x₁, …, x_q}, where x₁, …, x_q are linearly independent vectors in ℂⁿ. Then the columns of the n × q matrix X = [x₁ x₂ ⋯ x_q] are linearly independent. So there exist q rows of X, say the i₁th, …, i_qth rows, which are also linearly independent. Put {j₁, …, j_{n−q}} = {1, …, n} \ {i₁, …, i_q} and

    N = Span{e_{j₁}, …, e_{j_{n−q}}}

so that N is an A-invariant subspace. As, by construction, the n × n matrix

    [x₁  x₂  ⋯  x_q  e_{j₁}  ⋯  e_{j_{n−q}}]

is nonsingular, N is a direct complement to M in ℂⁿ. Thus M is A coinvariant. □

EXAMPLE 3.1.3. If A = αI, α ∈ ℂ, then every subspace in ℂⁿ is obviously A coinvariant. For every A-coinvariant subspace M there is a continuum of A-invariant subspaces that are direct complements to M in ℂⁿ. □

For an A-coinvariant subspace M and any projector P onto M such that Ker P is A invariant, we have PAP = PA. This follows, for instance, when equation (1.5.5) is applied to I − P, or else it can be proved directly. Conversely, if PAP = PA for some projector P onto a subspace M ⊆ ℂⁿ, then M is A coinvariant and Ker P is an A-invariant direct complement to M in ℂⁿ.

Given an A-coinvariant subspace M and a projector P onto M such that Ker P is A invariant, the linear transformation A has the following block triangular form:

    A = \begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix} \qquad (3.1.1)

with respect to the decomposition ℂⁿ = Im P ∔ Ker P. In particular, we find that every eigenvalue of the compression PA|_M: M → M of A to its coinvariant subspace M is also an eigenvalue of A. Indeed, in the representation (3.1.1) the compression PA|_M coincides with A₁₁, and this immediately implies that σ(PA|_M) ⊆ σ(A).

We note that, essentially, the compression to a coinvariant subspace depends on the invariant direct complement only. (Actually, we have encountered this property already in Theorem 2.7.4 and its proof.)
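These facts are easy to check in coordinates. A NumPy sketch (the diagonal matrix and the plane M are illustrative; the invariant complement Span{e₃} is found exactly as in Example 3.1.2):

    import numpy as np

    A = np.diag([1.0, 2.0, 3.0])
    M = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 1.0]])           # basis of M; its first two rows are independent
    N = np.eye(3)[:, [2]]                # invariant complement Span{e3}

    S = np.hstack([M, N])                # columns: basis of M followed by basis of N
    P = S @ np.diag([1.0, 1.0, 0.0]) @ np.linalg.inv(S)   # projector onto M along N

    print(np.allclose(P @ A @ P, P @ A))   # True: PAP = PA since Ker P is invariant

    # the compression PA|_M written in the basis given by the columns of M
    C = np.linalg.lstsq(M, P @ A @ M, rcond=None)[0]
    print(np.linalg.eigvals(C))            # {1, 2} ⊆ σ(A) = {1, 2, 3}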
Proposition 3.1.1
Let M₁ and M₂ be A-coinvariant subspaces with a common A-invariant direct complement N. Then the compressions P₁A|_{M₁}: M₁ → M₁ and P₂A|_{M₂}: M₂ → M₂ (where P_j is the projector on M_j along N for j = 1, 2) are similar.

Proof. Write

    A = \begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} A'_{11} & 0 \\ A'_{21} & A'_{22} \end{bmatrix}

with respect to the direct sum decompositions ℂⁿ = M₁ ∔ N and ℂⁿ = M₂ ∔ N, respectively. Also, write the identity transformation I: ℂⁿ → ℂⁿ in the 2 × 2 block matrix form

    I = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix} : M_1 \dotplus N \to M_2 \dotplus N

(so S₁₁: M₁ → M₂, S₁₂: N → M₂, S₂₁: M₁ → N, S₂₂: N → N). It is easily seen that S₁₂ = 0 and S₂₂ = I_N, the identity transformation on N. As I is invertible, the transformation S₁₁ must be invertible as well, and

    I^{-1} = \begin{bmatrix} S_{11}^{-1} & 0 \\ -S_{21}S_{11}^{-1} & I_N \end{bmatrix} : M_2 \dotplus N \to M_1 \dotplus N

Now

    \begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} S_{11}^{-1} & 0 \\ -S_{21}S_{11}^{-1} & I_N \end{bmatrix}\begin{bmatrix} A'_{11} & 0 \\ A'_{21} & A'_{22} \end{bmatrix}\begin{bmatrix} S_{11} & 0 \\ S_{21} & I_N \end{bmatrix}

which gives, in particular, A₁₁ = S₁₁^{-1}A′₁₁S₁₁. It remains to observe that A₁₁ = P₁A|_{M₁} and A′₁₁ = P₂A|_{M₂}. □

The following property of coinvariant subspaces is analogous to the property of A-invariant subspaces proved in Section 1.4.

Proposition 3.1.2
A subspace M is A coinvariant if and only if its orthogonal complement M^⊥ is A* coinvariant.

Proof. Assume that M is A coinvariant, and let N be an A-invariant direct complement to M in ℂⁿ. Then M^⊥ ∩ N^⊥ = (M + N)^⊥ = (ℂⁿ)^⊥ = {0}, and since dim M^⊥ + dim N^⊥ = (n − dim M) + (n − dim N) = n, we have M^⊥ ∔ N^⊥ = ℂⁿ. As N^⊥ is A* invariant (see Section 1.4), it follows that M^⊥ is A* coinvariant. Conversely, if M^⊥ is A* coinvariant, then by the part of this proposition already proved, the subspace (M^⊥)^⊥ = M is (A*)* = A coinvariant. □
108 Coinvariant and Semiinvarianl Subspaces A subspace M C <p" is called orthogonally coinvariant for the transformation A: <p"—» <p" (in short, orthogonally A coinvariant) if the orthogonal complement M L of M is A invariant. Proposition 3.1.3 A subspace M is orthogonally A-coinvariant if and only if M is invariant for the adjoint linear transformation A*. Proof. Assume that M is orthogonally A coinvariant. So Ax EML. Then we have (Ax,y) = 0 (3.1.2) for all yEM. But the left-hand side of (3.1.2) is just (x, A*y). Hence A*yE(M1)1 = M for all yEM and M is A* invariant. Reversing this argument we find that if M is A* invariant, then Ax EM1 for every xEM1, that is, M is orthogonally A coinvariant. □ We observe that, in general, ^-coinvariant subspaces do not form a lattice, that is, the sum and intersection of ^-coinvariant subspaces need not be A coinvariant. This is illustrated in the following example. example 3.1.4. Let A = "0 0 .0 1 0 0 (T 1 0. :<P3-<P3 The only y4-invariant subspaces are {0}, Span{e,}, Span{e,, e2}, <p3. Consequently, all ^-coinvariant subspaces are as follows: {0}; <p3; Span{<x, y, 1)} , x, y E <p Span{<jc,l,0>, (y,0,l)}, x,yE( Indeed, assume Span{«, v} is a two-dimensional subspace for which Span{e,} is a direct complement. Writing « = («,, u2, h3), v = <u,, v2, v3) u2 u2] 7^0. Hence replacing u and v with their linear we see that det combinations, if necessary, we see that Span{«(i>} = Span«x, 1,0), (y,0,l»
Reducing Subspaces 109 for some jc, yG (p. Now Span{e2, e3} and Span{e2, (1,0,1)} are ^-coin- variant subspaces but their intersection (which is equal to Span{e2}) is not. Also, Span{e3} and Span{(1,0,1)} are ^-coinvariant subspaces but their sum (which is equal to Span{e3, (1,0,1)}) is not. □ In contrast, it follows immediately from Proposition 3.1.3 that the set of all orthogonally ^-coinvariant subspaces is a lattice. Note also the following property of orthogonally coinvariant subspaces. Proposition 3.1.4 Any transformation has a complete chain of orthogonally coinvariant sub- spaces. Proof. Let A: <p" —» <p" be a transformation. As we have seen in Section 1.9, there is an orthonormal basis *,,...,*„ for <p" in which A has the upper triangular form: A = Clearly, the subspaces Span{jcfc,. . . , *„}, k - 1,. . . , n are orthogonally A coinvariant and form a complete chain. □ 3.2 REDUCING SUBSPACES An invariant subspace if of a transformation A: <p" —> <p" is called reducing for A if !£ 4- M = <p" for some other A -invariant subspace M. In other words, a subspace if C <p" is reducing for A if it is simultaneously A invariant and A coinvariant. In particular, {0} and <p" are trivially reducing. A more important example follows from Theorem 2.1.2. This shows that the root subspaces 9lk {A) are reducing for A. A unicellular linear transformation is an example in which the only reducing subspaces are the trivial ones {0} and <p". On the other hand, A = I is a linear transformation for which every subspace in <p" is invariant and reducing. As a transformation on <p" with only one Jordan block (i.e., a unicellular transformation) has the smallest possible number of reducing subspaces, one might expect that a transformation with the most Jordan blocks has the most reducing subspaces. This is indeed so. Recall that a transformation is called diagonable if its Jordan form is a diagonal matrix.
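Before the general statement (Theorem 3.2.1, which follows), the dichotomy is easy to observe numerically. A small sketch (the matrices are illustrative choices):

    import numpy as np

    # diagonable case: an invariant subspace spanned by eigenvectors has an
    # invariant complement spanned by the remaining eigenvectors
    A = np.diag([2.0, 2.0, 5.0])
    L = np.eye(3)[:, :2]                  # invariant plane Span{e1, e2}
    Lc = np.eye(3)[:, [2]]                # invariant complement Span{e3}
    print(np.linalg.matrix_rank(np.hstack([L, Lc])) == 3)    # True: L is reducing

    # nondiagonable case: J has a single eigendirection (up to roundoff), so
    # the invariant line Span{e1} admits no invariant complement
    J = np.array([[2.0, 1.0],
                  [0.0, 2.0]])
    vals, vecs = np.linalg.eig(J)
    print(np.linalg.matrix_rank(vecs))    # 1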
110 Coin variant and Semiinvariant Subspaces Theorem 3.2.1 If A is diagonable, then each invariant subspace of A is reducing. Conversely, if each invariant subspace of A is reducing, then A is diagonable. Proof. Assume that A is diagonable. Using Proposition 1.4.2, it is easily seen that each invariant subspace of A is reducing if and only if the same is true for S~ AS, for any nonsingular matrix 5. So we can assume that A = diag[a, • • ■ a„] for some a,,. . . , an G (p. Let A,,. . . , A be all the different numbers among the a, values, and for notational convenience assume that where 1 < kt < k2 < ■ ■ ■ < kp_1 <kp = n are integers. Obviously, the eigenvalues of A are A,,. . . , A , and the root subspaces of A are &*,(>0 = Span{eti_i + 1, <Vi+2, . . . , ek) , i = \,...,p (by definition we put k0 = 0). By Theorem 2.1.5 any ^-invariant subspace M has the form M = Mi + ■ ■ ■ + Mp where Mi C 9lk(A). Let Mt be any direct complement for Mi in 9lk(A). As Ax = A;jc for every xE:9lk(A), the subspace Mt is obviously A invariant. Hence the subspace J*f = ^/", + • • • 4- Np, which is a direct complement to M in <p", is also A invariant. This means, by definition, that M is reducing. Conversely, assume that A is not diagonable. Let M be the ^-invariant subspace of A spanned by its eigenvectors. As A is not diagonable, M # <p". If M is any other ^4-invariant subspace and x is an eigenvector of ^4^, then jc is also an eigenvector of A, and thus xE.M. So J CiJ{^{0} for every y4-invariant jf. Consequently, M is not reducing. □ An important class of diagonable transformations A: <pn—* <p" are those that have n distinct eigenvalues A,,...,A„. Indeed, the corresponding eigenvectors xl, . . . , xn are linearly independent (and, therefore, form a basis in <p") because jc, G 3tK(A) and the subspaces % (^4),. . . ,9lk {A) form a direct sum. We have the following.
Reducing Subspaces 111 Corollary 3.2.2 If a transformation A: <p"—»<p" has n distinct eigenvalues, then every A- invariant subspace is reducing. Consider now the situation in which an ^-invariant subspace is reducing and is orthogonal to its y4-invariant comlementary subspace. An invariant subspace M of a transformation A: <p"—* <p" is called orthogonally reducing if its orthogonal complement M y is also A invariant. Theorem 3.2.3 Every invariant subspace of A is orthogonally reducing if and only if A is normal. Proof. Recall first (Theorem 1.9.4) that A is normal if and only if there is an orthonormal basis of eigenvectors oc,, . . . , xn of A. Assume that A is normal, and let x,,..., xn be an orthonormal basis of eigenvectors of A that is ordered in such a way that xx, . . . ,xk correspond to the eigenvalue A, xk +,,..., xk correspond to the eigenvalue A2 xk ,. . . , xk correspond to the eigenvalue Ap Here A,,. . . , \p are all the different eigenvalues of A. Arguing as in the proof of Theorem 3.2.1 we see that any /4-invariant subspace is of the form M = Mi + ■■ ■ + Mp where Mi(Z^z.n{xk , . . . , *,.}, / = 1,. . . , p (by definition k() = 0) and its orthogonal complement ML = M\ + ■■■ + Mp in <p" is also A invariant. Here M\ is the orthogonal complement to Mt in the space (p*^*1"1 Conversely, assume that every ^-invariant subspace is orthogonally reducing. In particular, every .^-invariant subspace is reducing, and by Theorem 3.2.1, A = diag[a],. . . , an] in a certain basis in <p". Denoting by A,,. . . , A all the different eigenvalues of A, it follows that 3ik(A) is spanned by the eigenvectors of A corresponding to A,. Now for each L, 1 < i0 <p, the subspace 3i. (^4) is the unique ^-invariant subspace '0 that is a direct complement to E,Vj £%A (A) in <p". [This follows from the fact
112 Coinvariant and Semiinvariant Subspaces that any ,4-invariant subspace At has the form M - Ef=1 Ai n 9lk (A).] The orthogonal reducing property of 91. (A) implies that the subspaces 9ik (A),. . . , 9ik (A) are orthogonal to each other. Taking an orthonormal basis in each 9tk{A) (which necessarily consists of eigenvectors of A corresponding to A,), we obtain an orthonormal basis in <p" in which A has a diagonal form. Hence A is normal. □ The proof of Theorem 3.2.3 shows that if every ^-invariant subspace is reducing and every root subspace for A is orthogonally reducing, then every y4-invariant subspace is orthogonally reducing. Note also the important special cases of Theorem 3.2.3: every invariant subspace of a hermitian or unitary transformation is orthogonally reducing. 3.3 SEMIINVARIANT SUBSPACES A subspace Ai C <p" is called semiinvariant for a transformation A: <p"—» <p" (or, in short, A semiinvariant) if there exists an ^-invariant subspace ^Vsuch that 1 n J = {0} and the sum Ai + jV is again A invariant. By taking Jf = {0} we see that any ^-invariant subspace is also A semiinvariant. If Ai is an ^-coinvariant subspace, then there is an ^-invariant direct complement Jf to At in <p" (so the conditions that M fll = {0} and that Ai -i- M is A invariant are automatically satisfied). Thus we see that any ^-coinvariant subspace is also A semiinvariant. In general, a subspace At C <p" is A semiinvariant if and only if At is A\L-coinvariant for some ^-invariant subspace if containing At. example 3.3.1. Let A be an n x n Jordan block. Then it is easily seen that the subspaces Span{ej, ei+l,. . . , e-}, where 1 </</<«, are A semiinvariant (but there are many other A semiinvariant subspaces). This example shows that in general there exist semiinvariant subspaces that are neither invariant nor coinvariant. D Consider now the ^-semiinvariant subspace M, and let Ji be an A- invariant subspace such that Jf nJt = {0} and At + Jf is A invariant. Then we have a direct sum decomposition $n = Jf + Ai + £ (3.3.1) where if is a direct complement to M. + Jfin <p". To emphasize the fact that this is a decomposition of <p" into the sum of invariant, semiinvariant, and coinvariant subspaces, respectively, we call equation (3.3.1) a triinvariant decomposition associated with the ^-semiinvariant subspace M. Triinvariant decompositions play an important role in the applications of Chapters 5 and 7.
Semiinvariant Snbspaces 113 Note that in general a triinvariant decomposition associated with a given M is not unique. With respect to the triinvariant decomposition (3.3.1), the transformation A has the following 3x3 block form: A= 0 A22 A23 (3.3.2) "^1. 0 - 0 An A 22 0 *» A 23 A3J-i Here An: Jf-*Jf, A22--M-*M, A3J: <£-*<£, A12:M-+Jf, A23:£-*M, A13: !£—*Jf. The presence of zeros in (3.3.2) follows from the A invariance of Jf and M + Jf (see Section 1.5). The converse is also true: if A is a transformation from <p" into <p", and A has the form (3.3.2) with respect to some direct sum decomposition (3.3.1), then M is A semiinvariant, and the ^-invariant subspace N is such that M, + Jf is A invariant as well. In particular, it follows from the formula (3.3.2) that the spectrum of the compression PA\M (where P: N + M—* N + Mis the projector on M along Jf) of A to its semiinvariant subspace M is contained in the spectrum of A. We characterize i4-semiinvariant subspaces in terms of functions of A, as follows. Theorem 3.3.1 Let A: <p"—* <p" be a transformation. The following statements are equivalent for a subspace M C <p": (a) M is semiinvariant for A; (b) for a suitable projector P mapping <p" onto M, we have PAm\M = {PA\M)m , m = 0,l,2,... (c) for any function /(A) such that f(A) is defined we have Pf(A)\M=f(PA\M) (3.3.3) where P is a suitable projector with Im P — M. In (b) PAm\M is understood as a transformation from M into M. Recall that f(A) is certainly defined for a function /(A) that is analytic on the spectrum of A and, if A is diagonable, for any function /(A) that is merely defined on the spectrum of A. As the spectrum of PA\M is contained in the spectrum of A (provided M is A semiinvariant), it follows that f(PA\M) is well defined if /(A) is analytic on the spectrum of A. We shall see in Section 4.1 that if A is diagonable, so is PA\M (provided M is A semiinvariant), and thus f(PA\M) is well defined in the case when A is diagonable and /(A) is defined on the spectrum of A. Proof. Assume that M is A semiinvariant, and write A as in (3.3.2), with respect to the triinvariant decomposition (3.3.1). Let P be the projec-
114 Coinvariant and Semiinvariant Subspaces tor on M along JC + Z£. Then PA\M tion shows that A22. Now a straightforward calcula- 0 0 l22 Am m = 0,1,2,. Now assume that (b) holds. Let if be the smallest ^-invariant subspace containing M. (In other words, if is the intersection of all ^-invariant subspaces that contain Jt.) Equivalently, if is the span of all vectors of type A'x, where x £ M and ; = 0,1, .... In particular, if D M. Let Q be a projector on if such that Ker Q CKer P (e.g., take any direct complement Jf' to j? n Ker P in Ker P, so that Ker P = JT 4- (if n Ker P), and let 0 be the projector on if along Jf'). Then Im(/- (2) CKer P or, equivalently, **(/ - C) = 0, that is, PQ = P. As if D M, the equality QP= P obviously holds. Now (Q - P)(Q - P) = Q2 - PQ - QP + P2 = Q - P so Q- P is a projector, and lm(Q - P) is a direct complement to M in if. We shall prove that Im((2- P) is A invariant, which shows that Jt is semiinvariant for A. Clearly, QAQ = AQ (because Im Q = if is A invariant) and QAP = AP (because for every vector jc G Im P = M, the vector Ax belongs to i£ and thus QAx = Ax). Let us show that PAP = PAQ (3.3.4) For every x6i and for any / = 0,1,2,... we have PAPA'x = PA\MPA'\Mx = PA\M ■ (PA\M)'x = {PA\M)i+ix = PA' + ix = PA- A'x where we have used the property (b) twice. As the subspace if is spanned by A'x, x G Jt, j = 0,1,. . . , we conclude that PAPy = PAy for every y G if, which amounts to the equality PAPQ = PAQ, and (3.3.4) follows. Using the equalities QAQ = AQ, QAP = AP, PAP = PAQ, we easily verify that (Q - P)A(Q - P) = A(Q - P). This means that Im(Q - P) is A invariant. Finally, let /(A) be a function such that f(A) is defined. Then f(A) = p{A), where p(\) is a polynomial such that Ji>( _ Ai) pw'(At)=f»(A4), ;=0,. »u 1 «: = !,. where At,. . . , A, are all the distinct eigenvalues of A, and m^ is the height
of λ_k (k = 1, …, s). Such a polynomial p(λ) always exists. For example, one may take the Lagrange–Sylvester interpolation polynomial, which is given by the formula

    p(\lambda) = \sum_{k=1}^{s} \bigl[a_{k1} + a_{k2}(\lambda-\lambda_k) + \cdots + a_{km_k}(\lambda-\lambda_k)^{m_k-1}\bigr]\,\psi_k(\lambda)

where

    a_{kj} = \frac{1}{(j-1)!}\Bigl[\frac{d^{\,j-1}}{d\lambda^{\,j-1}}\,\frac{f(\lambda)}{\psi_k(\lambda)}\Bigr]_{\lambda=\lambda_k} , \qquad j = 1, \ldots, m_k ; \; k = 1, \ldots, s

and ψ_k(λ) = (λ − λ_k)^{−m_k}∏_{i=1}^{s}(λ − λ_i)^{m_i}, k = 1, …, s [see, e.g., Chapter V of Gantmacher (1959)]. As the eigenvalues of PA|_M are also eigenvalues of A, and the height of λ₀ ∈ σ(PA|_M) does not exceed the height of λ₀ as an eigenvalue of A (see Section 4.1), we obtain f(PA|_M) = p(PA|_M). Now equality (3.3.3) follows from (b). Conversely, (c) obviously implies (b). □
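The interpolation step is easy to carry out symbolically. A SymPy sketch (the data f(λ) = e^λ and A = J₂(0) ⊕ J₁(1), with heights 2 at 0 and 1 at 1, are illustrative choices): the polynomial p matching f and its derivatives at the eigenvalues satisfies p(A) = e^A.

    import sympy as sp

    lam = sp.symbols('lam')
    f = sp.exp(lam)
    data = [(0, 2), (1, 1)]                   # (eigenvalue, height) pairs
    deg = sum(m for _, m in data)             # 3 interpolation conditions

    c = sp.symbols('c0:%d' % deg)
    p = sum(c[i] * lam**i for i in range(deg))

    eqs = [sp.Eq(sp.diff(p, lam, j).subs(lam, mu), sp.diff(f, lam, j).subs(lam, mu))
           for mu, m in data for j in range(m)]
    p = p.subs(sp.solve(eqs, c))              # p(λ) = 1 + λ + (e − 2)λ²

    A = sp.Matrix([[0, 1, 0],
                   [0, 0, 0],
                   [0, 0, 1]])                # J_2(0) ⊕ J_1(1)
    coeffs = sp.Poly(p, lam).all_coeffs()[::-1]   # ascending order
    pA = sp.zeros(3)
    for i, co in enumerate(coeffs):
        pA += co * A**i
    print(sp.simplify(pA - A.exp()))          # the zero matrix: p(A) = f(A)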
Given an A-semiinvariant subspace M with an associated triinvariant decomposition ℂⁿ = N ∔ M ∔ L, the proof of Theorem 3.3.1 shows that (b) holds with P being the projector on M along N ∔ L. And conversely, if a projector P satisfies (b), then Ker P = N ∔ L, where N and L are A-invariant and A-coinvariant subspaces, respectively, taken from some triinvariant decomposition associated with M.

Extending the notion of orthogonally coinvariant subspaces, we introduce the notion of orthogonally semiinvariant subspaces, as follows. A subspace M ⊆ ℂⁿ is called orthogonally semiinvariant for a transformation A: ℂⁿ → ℂⁿ if there exists an A-invariant subspace N such that M + N is again A invariant and M is the orthogonal complement to N in M + N. Clearly, an orthogonally semiinvariant subspace is semiinvariant. For an orthogonally A-semiinvariant subspace M there exists an orthogonal decomposition

    ℂⁿ = N ⊕ M ⊕ L \qquad (3.3.5)

where L = (M + N)^⊥. Decomposition (3.3.5) will be called an orthogonal triinvariant decomposition associated with M. Again, for a given M there are generally many associated orthogonal triinvariant decompositions. (The extreme case of this situation appears for A = 0.)

Consider the orthogonal triinvariant decomposition (3.3.5), and choose orthonormal bases in N, M, and L. Then we represent A as the 3 × 3 block matrix

    A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ 0 & A_{22} & A_{23} \\ 0 & 0 & A_{33} \end{bmatrix} \qquad (3.3.6)

in the orthonormal basis for ℂⁿ obtained by putting together the orthonormal bases in N, M, and L. As the representation (3.3.6) is in an orthonormal basis, we have

    A^* = \begin{bmatrix} A_{11}^* & 0 & 0 \\ A_{12}^* & A_{22}^* & 0 \\ A_{13}^* & A_{23}^* & A_{33}^* \end{bmatrix}

This leads to the following conclusion.

Proposition 3.3.2
An orthogonally A-semiinvariant subspace is also orthogonally A* semiinvariant.

Indeed, if equation (3.3.5) holds, then L is A* invariant, and M is the orthogonal complement to L in the A*-invariant subspace N^⊥ = M ⊕ L.

An analog of Theorem 3.3.1 holds for orthogonally semiinvariant subspaces.

Theorem 3.3.3
The following statements are equivalent for a transformation A: ℂⁿ → ℂⁿ and a subspace M ⊆ ℂⁿ: (a) M is orthogonally semiinvariant for A; (b) we have

    P_M A^m|_M = (P_M A|_M)^m , \qquad m = 0, 1, 2, \ldots

where P_M is the orthogonal projector on M; (c) for any function f(λ) such that f(A) is defined, we have

    P_M f(A)|_M = f(P_M A|_M)

The proof is like the proof of Theorem 3.3.1, with the only difference that an orthogonal triinvariant decomposition is used and the projector Q is taken to be orthogonal.

3.4 SPECIAL CLASSES OF TRANSFORMATIONS

In this section we describe coinvariant and semiinvariant subspaces for certain classes of transformations. We start with the relatively simple case of unicellular transformations.

Proposition 3.4.1
Let A: ℂⁿ → ℂⁿ be a unicellular transformation that is represented as a Jordan block in some basis x₁, …, xₙ. Then a k-dimensional subspace
Special Classes of Transformations 117 M C <fr" is A-coinvariant if and only if M is spanned by a set of vectors yt,. . . , yk with the property that jc,,. . . , xn_k, yu . . . , yk is a basis in <p". A k-dimensional subspace M is A semiinvariant if and only if M = Spanfy,,. . . , yk) where the vectors y,,. . . , yk are such that, for some index I with k ^ / < n, we have yi £ Span{jt],. . . , x,}, i = 1,. . . , k and *!,... ,x,_k, yt, . . . , yk is a basis in Span!*,,. . . ,x,}. The proof follows easily from the definitions of coinvariant and semi- invariant subspaces and from the fact that the only ^-invariant subspaces are {0} and Span{jcl5. . . , x,}, 1= \, . . . , n. Consider now a diagonable transformation A:$"—»<p", so that A = diag[A],. . . , A J in some basis in <p". As we have seen in Example 1.2, if all A, are different, then every subspace in <p" is A coinvariant and hence also A semiinvariant. In fact, this conclusion holds for any diagonable transformation (not necessarily with all eigenvalues distinct). Indeed, consider the transformation B given by the matrix diag[/*i,. . . , /*.„] with different /*., values in the same basis in which A is given by diag[A,,. . . , An]. As every B-invariant subspace is also A invariant, it follows that every B-coinvariant subspace is also A coinvariant. But we have already seen that every subspace is B-coinvariant. We consider now the orthogonally coinvariant and semiinvariant sub- spaces. We say that a transformation A: <p"—* <p" is orthogonally unicellular if there exists a Jordan chain jc ,,..., xn of A such that the vectors xx,...,xn form an orthogonal basis in <p". Clearly, any orthogonally unicellular transformation is unicellular. Proposition 3.4.2 Let A: <f""—* £" be an orthogonally unicellular transformation, and let xl,...,x„ be its orthogonal Jordan chain. Then the only orthogonally A-coinvariant subspaces are Span{jc^, xk + i, . . . ,xn}; k — l,...,n; and {0}. The only orthogonally A-semiinvariant subspaces are Span{jc^,. . . , jc,}, 1 <&</<« and {0}. Again, Proposition 3.4.2 follows from the description of all ^-invariant subspaces. Consider a normal transformation A: £" —* <p": AA* = A*A. By Theorem 1.9.4, A has an orthonormal basis of eigenvectors (and conversely, if a transformation has an orthonormal basis of eigenvectors, it is normal). It turns out that normal transformations are exactly those for which the classes of invariant subspaces and of orthogonally semiinvariant subspaces coincide. Theorem 3.4.3 The following statements are equivalent for a transformation: (a) A is normal; (b) every A-invariant subspace is orthogonally A coinvariant; (c)
118 Coinvariant and Semiinvariant Subspaces every orthogonally A-coinvariant subspace is A invariant; (d) every orthogonally A-semiinvariant subspace is A invariant. Proof. Obviously, (d) implies (c). Assume that A is normal, and let A,,. . . , \k be all the different eigenvalues of A. Then <£" = ®,,A)®---®®k(A) is an orthogonal sum, and A\R (A) = A./. Let M be an orthogonally A- semiinvariant subspace, so that' Ji is the orthogonal complement to an /4-invariant subspace M in another ^-invariant subspace J£ We have Jf = Jfl@---@Jfk, 2=%®---^ where jVj-Ci^C £%A(.4), i = 1,. . . , k. Denoting by Mi the orthogonal complement of Jft in ifj, the definition of M implies that M = Mx®---@Mk. It follows that M is A invariant. So (a) implies (d). One sees easily that (a) implies (b) also. It remains to show that (c)=>(a) and (b)=>(a). Assume (c) holds, that is (cf. Proposition 3.1.2) every ^-invariant subspace is /4-invariant. Write A* in an upper triangular form with respect to some orthonormai basis Xl> ■ ■ ■ ' Xn' 0 a 22 L 0 0 (3.4.1) As Spanjjc,, . . . , xk), k = 1, . . . , n are i4*-invariant subspaces, they are also A invariant. Hence (Proposition 1.8.4) A also has an upper triangular form in the same basis: A = 0 bln b In (3.4.2) 0 0 On the other hand, equality (3.4.1) implies
Exercises 119 0 0 >t = (3.4.3) Lfl, Comparison of (3.4.2) and (3.4.3) reveals that b,v = 0 for i<j, and A is normal. Assume now that (b) holds, and write A = 0 fo„ L 0 0 bln (3.4.4) in some orthonormal basis jc ,,..., xn in <p". The subspaces Spanjjc,, . . . , xk}, k = 1,. . . , n are A invariant and, by (b), orthogonally A coinvariant. Hence Span{xt+1,. . . , *„}, k = 1,...,«- 1 are ^-invariant subspaces, which means that A has a lower triangular form A = 0 01 0 LC„ (3.4.5) Comparing equations (3.4.4) and (3.4.5), we find that A is normal. □ As a corollary of Theorem 3.4.3 we obtain the following characterization of a normal transformation in terms of its invariant subspaces. Corollary 3.4.4 A transformation A: <p" —* <p" is normal if and only if a subspace M is A invariant exactly when its orthogonal complement is A invariant. Indeed, it follows from the definition that the subspace M x is A invariant if and only if M is orthogonally A coinvariant. 3.5 EXERCISES 3.1 Prove that, in Example 3.1.2, there is a unique ^-invariant direct complement to the ^-coinvariant subspace M if and only if M itself is A invariant. 3.2 Prove that a subspace M is A coinvariant (resp. A semiinvariant) if and only if M is (aA + /3/) coinvariant [resp. (aA + fil) semiinvariant]. Here a, (3 are complex numbers and a/0.
120 Coinvariant and Semiinvariant Subspaces 3.3 Show that a subspace M is A coinvariant (resp. A semiinvariant) if and only if ZfM is SAS~l coinvariant (resp. SAS~l semiinvariant), where 5 is an invertible transformation. 3.4 Let A:$n —»<p" (n^3) be a unicellular transformation. Give an example of a subspace M C <p" that is not A semiinvariant. List all such subspaces when n = 3. 3.5 Show that every subspace in <p" is A coinvariant if and only if A is diagonable (i.e., it is similar to a diagonal matrix). 3.6 Prove that every subspace in <p" is coinvariant for any n x n circulant matrix. 3.7 Give an example of a nondiagonable transformation A: <p"—* <p" such that every subspace in <p" is A semiinvariant. 3.8 Find all the coinvariant subspaces for the matrices J "0 0 ,i 1 0 3 0 " 1 -3i. 3.9 Find all coinvariant and semiinvariant subspaces for the matrix ■0 1 -r 0 0 1 .0 0 1. 3.10 Prove that every reducing ^-invariant subspace is reducing also for f(A), where /(A) is any function such that f(A) is defined. Is the converse true? 3.11 If J is a Jordan block, for which positive integers k does the matrix /* have a nontrivial reducing invariant subspace? Is the reducing sub- space unique? 3.12 Prove that an ^-invariant subspace M is reducing if and only if M n 9tk (A) is reducing for every eigenvalue A0 of A. 3.13 Find all the triinvariant decompositions <p3 = ^V 4- M 4- if with dim N = dim M = dim if = 1 for the following matrices: "0 1 0" 0 0 1 .0 2 1. > "0 0 .— i 1 0 3 (T 1 3/.
Chapter Four

Jordan Forms for Extensions and Completions

Consider a transformation A: ℂⁿ → ℂⁿ and an A-coinvariant subspace M. Thus there is an A-invariant subspace N such that ℂⁿ = M ∔ N, and there is a projector P onto M along N. The main problems of this chapter are: given Jordan normal forms for A|_N and PA|_M, what are the possible Jordan forms for A itself? In general, this problem is open. Here, we present partial results and important inequalities.

4.1 EXTENSIONS FROM AN INVARIANT SUBSPACE

Let M ⊆ ℂⁿ be a subspace, and consider a transformation A₀: M → M. A linear transformation A: ℂⁿ → ℂⁿ is called an extension of A₀ if Ax = A₀x for every x ∈ M. Then, in particular, M is A invariant. Also, A₀ is called the restriction of A to M. We are interested in the Jordan form (or, equivalently, the partial multiplicities) of A₀ and its extensions.

We start with a relatively simple but important case in which A₀ as well as its extension are in Jordan form and have special spectral properties. These spectral properties ensure that the partial multiplicities corresponding to a particular eigenvalue λ₀ are the same for A₀ and its extension A.

Theorem 4.1.1
Let J₁ and J₂ be matrices in Jordan normal form with sizes p × p and q × q, respectively. Let B be a p × q matrix, and let

    J = \begin{bmatrix} J_1 & B \\ 0 & J_2 \end{bmatrix}
Denote by J₁₀ and J₂₀ the Jordan submatrices of J₁ and J₂, respectively, formed by those Jordan blocks with the same eigenvalue λ₀. Then the partial multiplicities of J corresponding to λ₀ coincide with the partial multiplicities of the matrix

    \begin{bmatrix} J_{10} & B_0 \\ 0 & J_{20} \end{bmatrix}

where B₀ is the submatrix of J formed by the rows that belong to the rows of J₁₀ and by the columns that belong to the columns of J₂₀ (so actually B₀ is a submatrix of B).

Theorem 4.1.1 is used later to reduce problems concerning the Jordan form of an extension to the case when the transformations involved have only one eigenvalue. The proof of Theorem 4.1.1 is based on two lemmas, which are also independently important.

Lemma 4.1.2
Let A, B, C be given matrices of sizes n × n, m × m, and n × m, respectively. Consider the equation

    AX − XB = C \qquad (4.1.1)

where X is an n × m matrix to be found. Equation (4.1.1) has a unique solution X for every C if and only if σ(A) ∩ σ(B) = ∅.

This lemma follows immediately from the fact that, for the linear transformation L: ℂ^{n×m} → ℂ^{n×m} defined by L(X) = AX − XB, we have σ(L) = {λ − μ | λ ∈ σ(A) and μ ∈ σ(B)}. [See Chapter 12 of Lancaster and Tismenetsky (1985), for example.] Here we give a direct proof based on the Jordan decompositions of A and B.

Proof. Equation (4.1.1) may be regarded as a system of linear equations in the nm variables x_{ij} (i = 1, …, n; j = 1, …, m) that form the entries of the matrix X. Thus it is sufficient to prove that the homogeneous equation

    AX − XB = 0 \qquad (4.1.2)

has only the trivial solution X = 0 if and only if σ(A) ∩ σ(B) = ∅. Let J_A and J_B be the Jordan forms of A and B, respectively; so A = S_A J_A S_A^{-1}, B = S_B J_B S_B^{-1} for some invertible matrices S_A and S_B. It follows that X is a solution of (4.1.2) if and only if Z = S_A^{-1}XS_B is a solution of

    J_A Z − Z J_B = 0 \qquad (4.1.3)
Thus we can restrict ourselves to equation (4.1.3). Let us write J_A and J_B explicitly:

    J_A = diag[J_{A,1}, …, J_{A,μ}] ; \qquad J_B = diag[J_{B,1}, …, J_{B,ν}]

where J_{A,i} (resp. J_{B,j}) is a Jordan block of size m_{A,i} (resp. m_{B,j}) with eigenvalue λ_{A,i} (resp. λ_{B,j}). The matrix Z from (4.1.3) is decomposed into blocks accordingly:

    Z = [Z_{ij}]_{i=1,…,μ;\; j=1,…,ν} \qquad (4.1.4)

where Z_{ij} is of size m_{A,i} × m_{B,j}.

Suppose first that σ(A) ∩ σ(B) ≠ ∅. Without loss of generality we can assume that λ_{A,1} = λ_{B,1}. Then we can construct a nonzero solution Z of equation (4.1.3) as follows. In the representation (4.1.4) put Z_{ij} = 0, except for the case i = j = 1, and let

    Z_{11} = \begin{bmatrix} I \\ 0 \end{bmatrix} \quad\text{or}\quad Z_{11} = [\,0 \;\; I\,]

(according as m_{A,1} ≥ m_{B,1} or m_{A,1} < m_{B,1}). Direct examination shows that such a matrix Z satisfies (4.1.3).

Suppose now that σ(A) ∩ σ(B) = ∅. Let Z be given by (4.1.4) and suppose that Z satisfies (4.1.3). We have to prove that Z = 0. Equation (4.1.3) means that

    J_{A,i} Z_{ij} = Z_{ij} J_{B,j} \quad\text{for } i = 1, …, μ ; \; j = 1, …, ν \qquad (4.1.5)

Write

    J_{A,i} = λ_{A,i} I + H ; \qquad J_{B,j} = λ_{B,j} I + G

where H and G are the nilpotent matrices [i.e., σ(H) = σ(G) = {0}] having 1 on the first superdiagonal and zeros elsewhere. Rewrite equation (4.1.5) in the form

    (λ_{A,i} − λ_{B,j}) Z_{ij} = Z_{ij} G − H Z_{ij}

Multiply the left-hand side by λ_{A,i} − λ_{B,j}, and in each term on the right-hand side replace (λ_{A,i} − λ_{B,j})Z_{ij} by Z_{ij}G − HZ_{ij}. We obtain

    (λ_{A,i} − λ_{B,j})² Z_{ij} = Z_{ij} G² − 2 H Z_{ij} G + H² Z_{ij}

Repeating this process, we obtain for every p = 1, 2, …

    (\lambda_{A,i} - \lambda_{B,j})^p Z_{ij} = \sum_{q=0}^{p} (-1)^q \binom{p}{q} H^q Z_{ij} G^{\,p-q} \qquad (4.1.6)

Choose p large enough so that either H^q = 0 or G^{p−q} = 0 for every q = 0, …, p. Then the right-hand side of equation (4.1.6) is zero, and since λ_{A,i} ≠ λ_{B,j}, we find that Z_{ij} = 0. Thus Z = 0. □
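The equation in Lemma 4.1.2 is the Sylvester equation, which SciPy solves directly. A small sketch (the matrices are illustrative choices):

    import numpy as np
    from scipy.linalg import solve_sylvester

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])              # σ(A) = {1}
    B = np.array([[3.0]])                   # σ(B) = {3}: disjoint spectra
    C = np.array([[1.0],
                  [2.0]])

    X = solve_sylvester(A, -B, C)           # solves A X + X(−B) = C
    print(np.allclose(A @ X - X @ B, C))    # True: the unique solution

    # with a common eigenvalue uniqueness fails: Z = [1, 0]^T is a nonzero
    # solution of A Z − Z B' = 0, exactly the block Z11 built in the proof
    Bp = np.array([[1.0]])
    Z = np.array([[1.0],
                  [0.0]])
    print(np.allclose(A @ Z - Z @ Bp, 0))   # True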
Lemma 4.1.3
If A and B are n × n and m × m matrices, respectively, with σ(A) ∩ σ(B) = ∅, then for every n × m matrix C the (m + n) × (m + n) matrices

    \begin{bmatrix} A & C \\ 0 & B \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}

are similar.

Proof. By Lemma 4.1.2, for every n × m matrix C there is a unique n × m matrix X such that AX − XB = −C. With this X, one verifies that

    \begin{bmatrix} I & X \\ 0 & I \end{bmatrix}^{-1}\begin{bmatrix} A & C \\ 0 & B \end{bmatrix}\begin{bmatrix} I & X \\ 0 & I \end{bmatrix} = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}

As \begin{bmatrix} I & -X \\ 0 & I \end{bmatrix} is the inverse of \begin{bmatrix} I & X \\ 0 & I \end{bmatrix}, the lemma follows. □

Proof of Theorem 4.1.1. For notational simplicity assume that

    J = \begin{bmatrix} J_{10} & B_{12} & B_0 & B_{14} \\ 0 & J_{11} & B_{23} & B_{24} \\ 0 & 0 & J_{20} & B_{34} \\ 0 & 0 & 0 & J_{21} \end{bmatrix}

where J₁₁ (resp. J₂₁) consists of the Jordan blocks from J₁ (resp. J₂) with eigenvalues different from λ₀, and the B_{ij} are the corresponding submatrices in J. Applying Lemma 4.1.3 twice, we see that J is similar to

    \begin{bmatrix} J_{10} & B_{12} & B_0 & 0 \\ 0 & J_{11} & 0 & B_{24} \\ 0 & 0 & J_{20} & B_{34} \\ 0 & 0 & 0 & J_{21} \end{bmatrix}

which, after interchanging the second and third block rows and columns (this is a similarity operation), becomes

    \begin{bmatrix} J_{10} & B_0 & B_{12} & 0 \\ 0 & J_{20} & 0 & B_{34} \\ 0 & 0 & J_{11} & B_{24} \\ 0 & 0 & 0 & J_{21} \end{bmatrix}

It remains to apply Lemma 4.1.3 once more to prove that J is similar to

    \begin{bmatrix} J_{10} & B_0 \\ 0 & J_{20} \end{bmatrix} \oplus \begin{bmatrix} J_{11} & B_{24} \\ 0 & J_{21} \end{bmatrix} \qquad \square
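Numerically, the decoupling similarity in Lemma 4.1.3 is obtained from a Sylvester solve. A sketch (illustrative matrices; SciPy assumed):

    import numpy as np
    from scipy.linalg import solve_sylvester

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])                       # σ(A) = {1}
    B = np.array([[4.0, 0.0],
                  [0.0, 5.0]])                       # σ(B) = {4, 5}
    C = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    X = solve_sylvester(A, -B, -C)                   # A X − X B = −C
    S = np.block([[np.eye(2), X],
                  [np.zeros((2, 2)), np.eye(2)]])
    M = np.block([[A, C],
                  [np.zeros((2, 2)), B]])

    D = np.linalg.inv(S) @ M @ S                     # conjugate by [[I, X],[0, I]]
    print(np.allclose(D[:2, 2:], 0))                 # True: coupling block removed
    print(np.allclose(D[:2, :2], A) and np.allclose(D[2:, 2:], B))   # True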
It is convenient to describe the partial multiplicities of a transformation A: ℂⁿ → ℂⁿ at an eigenvalue λ₀ as a nonincreasing sequence of nonnegative integers

    α₁(A; λ₀) ≥ α₂(A; λ₀) ≥ α₃(A; λ₀) ≥ ⋯

where the nonzero members of this sequence are exactly the partial multiplicities of A at λ₀. In particular, not more than n of the numbers α_j(A; λ₀) are different from zero. Also, if λ₀ is not an eigenvalue of A, we define α_j(A; λ₀) = 0 for j = 1, 2, …. Thus the nonnegative integers α_j(A; λ₀) are defined for all λ₀ ∈ ℂ, and we have

    \sum_{\lambda_0 \in \mathbb{C}} \sum_{j=1}^{\infty} \alpha_j(A; \lambda_0) = n

The following result describes the connections between the partial multiplicities of a transformation and those of its extension.

Theorem 4.1.4
Let M ⊆ ℂⁿ be a subspace, and let A₀: M → M be a transformation. Then for every extension A: ℂⁿ → ℂⁿ of A₀ we have

    α_j(A; λ₀) ≥ α_j(A₀; λ₀) , \qquad j = 1, 2, \ldots \qquad (4.1.7)

for every λ₀ ∈ ℂ. Conversely, let β₁ ≥ β₂ ≥ ⋯ be a nonincreasing sequence of nonnegative integers such that

    \sum_{j=1}^{\infty} \beta_j \le n \qquad (4.1.8)

and

    β_j ≥ α_j(A₀; λ₀) , \qquad j = 1, 2, \ldots \qquad (4.1.9)

for a fixed complex number λ₀. Then there is an extension A of A₀ such that α_j(A; λ₀) = β_j, j = 1, 2, ….

Proof. We prove (4.1.7) for an extension A of A₀. In view of Theorem 4.1.1, we may restrict ourselves to the case when σ(A) = {λ₀}. (Indeed,
126 Jordan Forms for Extensions and Completions without loss of generality it can be assumed that A0 is in the Jordan form. Furthermore, the transformation PA\j,-: -V-» jV, where Jfis a direct complement to Jl and P is the projector on Jf along Jl, may also be assumed to have Jordan normal form.) There exists a chain of ^-invariant subspaces j = j0ci(1C'-cj„.m = (|:" (4.1.10) where dim M-t = m + i, i = 0,1,. . . , n — m (so m = dim Jl). This can be seen by considering the transformation A: §"IJl —* ("/Jl induced by A and using the existence of a complete chain of ^-invariant subspaces. In view of the chain (4.1.10) and using induction on the index i of Jl^ it will suffice to prove inequalities (4.1.7) for the case dim Jl = n - 1. Writing A0 in a basis for Jl in which A0 has a Jordan form, we can assume _\J B A-lo aJ where J= /Ai(A0)©- ■ -®Jk (A0), kl > • •• > kp is the Jordan form of A0 and B is an (« - l)-dimensional vector. Let j be the first index (1 <;' </?) for which the (A:, + Ac2 H + &.)th coordinate of B is nonzero (if such a / exists). Let 5 be the (n - 1) x (n - 1) matrix "/* 0 ••• 0 0 0 ■.. /t/ 0 S~ 0 ••• v,G/+, V' .0 •■• apQp 0 where Qm is the A:m x ki matrix of the form [0 Ik] and aj+,,..., ap are complex numbers chosen so that the (kl + k2 + ■ ■ ■ + km)th coordinates of SB are zeros for m = j + 1,. . . , p. If all coordinates kx, kt + k2, . . . , k1 + • • kp of B are zeros, put 5 = /B_,. It is easy to see that SJ = JS and 5 is nonsingular. Moreover, the A:,th, (&, + fc2)th,. . . , (kt + k2 + ■ ■ ■ + kp)th coordinates of SB are all zero except for at most one of them. Further, let X be an (« - 1)-dimensional vector such that the nonzero coordinates of the vector Y = (\0I-J)X + SB can appear only in the places A:,, kl + k2,. . . , k1 + k2 + • • • + kp (this is possible because 0 0
Extensions from an Invariant Subspace 127 Im(A0/-/) = Span{^| ;V k, kx + k2,. . . , kx + k2 + ■ ■ ■ + kp}) Now a computation shows that [S X~\U BITS ' -S~lXl\J Y] LO 1 JLO aJL 0 1 J LO A0J As is the inverse of , it follows that and T7 Y] ° have the same partial multiplicities. Now the partial multiplicities L U Aq j IJ Y] of are easy to discover: they are kx,. . . , kp,l if Y = 0, and kx, . . . , kj_x, kj + 1, kj + x,. . . , kp if V^0 and the nonzero coordinate of K (by construction of Y there is exactly one) appears in the place kt + • • • + kr So the inequalities (4.1.7) are satisfied. If B = 0, then (4.1.7) is obviously satisfied. Now let /3, be a sequence with the properties described in the theorem. Let *,,. . . , xk be a basis in M in which A0 has the Jordan form. We assume also that the first p Jordan blocks in the Jordan form have eigenvalues A0 and sizes ax(A0; A0),. . . , ap(A0; A„), respectively. (Here, otx(Aa; A0),. . . , ap(A0; A0) are all the nonzero integers in the sequence {ctj(A0; A0)}JL,). So in the basis oc,,. . . , xk we have ^o = -/a](A0)©---©/ap(A0)©7mi(A1)©---©Jmu(Au) where A,,...,AU are different from A0, and a/. = ctj{A0\ A0). Now let yx, ■ • - , y„-k be vectors in <p" such that jc,, . . . , xk, yx,. . . , yn_k is a basis in <p". Put 2j — xx,... , 2U| = xaj, 2„i + 1 — yx,. . . , Zpt = y^i_„| 0,+ l ~~ *ax + l' ■ ■ " ' Z0,+a2 _ ',;«i+a2 Z0,+«2+l = yp,-a, + l> • • • > 20,+02~ y^.-a.+pj-^' • " ' ' Zs ~~ yr where s = E?=1 /3,, r = Ef_, (ft - a,), and q is the number of positive ft values. Further, setting f = Ef=] a,, put 2s+i — *t+i' ■ ■ ■ > zk+s_, — xk, zk+s_l+x — yr+i, • • • > zn — y„-k Now let A: <p"-» (f"1 be a transformation that is given in the basis zx,. . . , z„ by the matrix
128 Jordan Forms for Extensions and Completions where / is any (« - k - r) x (« - k - r) matrix in the Jordan form with the property that A0 is not an eigenvalue of J. From the construction of A it is clear that /3,,. . . , B are the partial multiplicities of A corresponding to A0 and that A is an extension of A0. □ In particular, the theorem shows that if A is diagonable, then so is the restriction of A to any ^-invariant subspace. For coinvariant subspaces the notions of coextension and corestriction become natural. Let M C <p" be a subspace, and let A0: M—*M be a linear transformation. A transformation A: <p"—* <p" is called a coextension of A0 if there exists an .4-invariant direct complement JV to M in <p" such that PA\M = A0, where P is the projector on M along M. Clearly, in this case M is an j4-coinvariant subspace. There is a connection between the partial multiplicities of a transformation and those of a coextension of the kind described in Theorem 4.1.4. Theorem 4.1.5 Let M C £" be a subspace and A0: M—>M be a transformation. Then for every coextension A of A0 we have a^A; A0) 2 a-(i40; A0), j = 1, 2,. . . for every A0 e (p. Conversely, let &i ^ B2 ^ • • • be a nonincreasing sequence of nonnegative integers such that equations (4.1.8) and (4.1.9) hold. Then there is a coextension A of A0 such that at(A; A0) = /3., / = 1,2,. . . . The proof of Theorem 4.1.5 is similar to the proof of theorem 4.1.4. Given a transformation A0: M—*M, where M C <p", we say that a transformation A: <p" —* <p" is a dilation of y40 if there exists an y4-invariant subspace ^V for which ^V D M = {0}, M -i- Jf is ^4 invariant as well, and F^l^ = j40, where P is some projector on M with ^VCKerP. (The term "semiextension" would be more logical in the context of our terminology; however, "dilation" is widely used in the literature.) In this case M is an /4-semiinvariant subspace and A0 is the reduction of A (again the term "semirestriction" would be consistent with our terminology, but "reduction" is already widely used.) Thus there is a subspace if of <p" for which the decomposition (3.3.1) holds, and this decomposition determines a triangular representation such as (3.3.2) for A in which j422 = i40. A result similar to theorems 4.1.4 and 4.1.5 also holds for dilations, and it can be proved by first applying one of these theorems and then applying the second. In particular, if A is diagonable, so is any reduction of A. 4.2 COMPLETIONS FROM A PAIR OF INVARIANT AND COVARIANT SUBSPACES Let A: M—*Jt and BiN—^N be transformations, where M and M are subspaces in <p" which are direct complements to each other. A transformation C: <p"—* <p" is called a completion of A and B if M is C invariant and
Completions from a Pair of Invariant and Covariant Subspaces 129 C\M - A, PC\M = B, where P is the projector on J{along M. So with respect to the direct sum decomposition <p" = M + Jf, C has the form for some matrix D. Let a, > a2 > • • ■ (resp. B, s B2 > • • ■) be a sequence of nonnegative integers whose nonzero elements are exactly the partial multiplicities of A (resp. B) corresponding to a fixed point A0E<p. Assuming that C is a completion of A and B, let -y, > -y2 > ■ ■ • be a sequence of nonnegative integers such that the nonzero yl values are the partial multiplicities of C at A0. In this section we study the connections between ait /3,, and yr In view of Theorem 4.1.1, these connections describe the Jordan form of C in terms of the Jordan forms of A and B. Some such connections are easily seen. We have det(C - A/) = det(>t - A/) det(B - A/) (4.2.2) for every A G <p. Now the algebraic multiplicity of an eigenvalue A0 of a matrix X coincides with the multiplicity of A0 as a zero of the polynomial det(A" - A/). (When A0 is not an eigenvalue of X this statement is also true if we accept the convention that, in this case, the algebraic multiplicity of A0 is zero.) It follows from equation (4.2.2) that the algebraic multiplicity* of C at A0 is equal to the sum of the algebraic multiplicities of A and B at A0. In other words 00 OC 00 2 y,■ = 2 *, + 2 A (4.2.3) i=i /=i i=i Further, as C is an extension of A and a coextension of B, Theorems 4.1.4 and 4.1.5 imply that 7/amax(ai,ft), i = l,2,... (4.2.4) The following inequality between (a^lJL,, {fy}*^, and {yyJJL, is deeper. Proposition 4.2.1 Let C be a completion of A and B, with the partial multiplicities of A, B, and C at a fixed A0 G <p given by the nonincreasing sequences of nonnegative integers {a,}"^, {p,}7-„ and {y,}%lt respectively. Then *It is convenient here to talk about the "algebraic multiplicity of C at A0" rather than the "algebraic multiplicity of A0" as an eigenvalue of C.
130 Jordan Forms for Extensions and Completions 2 {k\yk^j}* sE (k\ /3ta/}# + 2 {k\ak*>j}* , m = l,2,... i=\ j-i ,=\ (4.2.5) As usual in this book, the symbol ft* represents the number of different elements in a finite set il. Proof. First we prove the following inequalities: dim Ker(C - A0/)' < dim Ker(,4 - A0/)' + dim Ker(B - A0/)' , i = 1, 2,. . . (4.2.6) Indeed, for every e^Owe have [using formula (4.2.1)] I"/ 0 lM-V D IT/ 0] = M-A0/ eD 1 Lo e'/JL 0 B-A0/JLo e/J L 0 B - A0/J and thus dim Ker(C-A0/)' = dim Kerf ^ -A°/ *** , L U a ~~ A,.i i = l,2,.. . (4.2.7) Fix some i, and let m = rank[y4_0V ° ] So there exists an mx m nonsingular submatrix Q in (A — A0/)' © (B - A()/)'. Consider the m x m submatrix Q(e) of [i4-A0/ eD 1 L 0 B-\j\ which is formed by the same rows and columns as Q itself. Now Q(e) is as close as we wish to Q provided e is sufficiently close to 0. Take e so small that the matrix (2(e) is also nonsingular. For such an e rank A-k{)I eD V 0 B-A„/J m Comparing with (4.2.7), we obtain the desired inequality (4.2.6). Now use Proposition 2.2.6 to obtain the inequalities (4.2.5). □ In connection with inequalities (4.2.5), note that
Completions from a Pair of Invariant and Covariant Subspaces 131 2{k\yk^)}*^2{k\ak^j}* + Y, {k\pk^j}* (4.2.8) >=1 ; = l >=1 Indeed, as {k \ yk s=;}# =0 for j>yx, and similarly for {ak)l=l and {/3*}jt=i' a^ tne sums in equation (4.2.8) are finite, so (4.2.8) makes sense. Further, for any nonincreasing sequence of nonnegative integers {5,},1i w'tn finite sum E*=1 8, we have 2s, = 2{*|^0# (4.2.9) i=l i-l The easiest way to verify (4.2.9) is by representing each nonzero Sj as the rectangle with height S, and width 1 and putting these rectangles one next to another. The result is a ladderlike figure 4>. For instance, if St = 5, S2 = S3 = 4, S4 = 1, Sj: = 0 for / > 4, then 4> is the following figure: Obviously, the area of <& is just the left-hand side of equation (4.2.9). On the other hand, the right-hand side of (4.2.9) is also the area of 4> calculated by the rows of 4> (indeed, {k | Sk s i}# is the area of the ith row in 4> counting from the bottom); hence equality holds in (4.2.9). Now appeal to (4.2.3) and (4.2.8) follows. We need a completely different line of argument to prove the following proposition. Proposition 4.2.2 With {a,}°°=i» (Air=i and (X}T= i as m Proposition 4.2.1, we have m m m 2 7^2 a,+ 2 ft, m = l,2,... (4.2.10) >=i /=i /=i Proof. Assuming that C is given by (4.2.1), one easily obtains *-"-['* .-°J[J TV °,} Using Theorem A.4.3 of the appendix, pick a pXp submatrix C0(A) in C - A/ such that A0 is a zero of det C0( A) of multiplicity y„ + • • • + y„_p+1
132 Jordan Forms for Extensions and Completions (here n x n is the size of C). The integer p is assumed to be greater than ma\(nA, nB) where nA x nA is the size of A and nB x nB is the size of B (so n = nA + nB). By the Binet-Cauchy formula (Theorem A.2.1 of the appendix) we have detC0(A)= 2 detB,(A)det£>,det Ak(\) (4.2.11) i.j.k where B,(A), Dy, Ak(X) are p xp submatrices of [Q fi_A/J>[o 7 J> and T/l-A/ 0 L 0 / respectively, and the summation is taken over certain triples i, j, k. Note that det B,.( A) = 0 unless B,( A) is of the form Is © B,( A), where Bt(k) is a (p - s) x (p - 5) submatrix of B - XI (here 5 is an integer that may depend on i and for which 0<s^nA). Similarly, det Ak(\) = 0 unless i4t(A) is of the form /,@Ak(\), where Ak(\) is a (p - f) x (p - ?) submatrix of ^4 - A/ (0< t^nB). Taking these observations into account, rewrite equation (4.2.11) as follows: det C0( A) = 2 det B,( A) • det D, ■ det Ak( A) Now the size of Bj(A) is at least (p ~ nA)x(p - nA), so by the same theorem, Theorem A.4.3, the multiplicity of A0 as a zero of det Bt(\) is at least (here we use nB + nA = n and /3, = 0 for i > nB). Similarly, the multiplicity of A0 as a zero of det Ak(X) is at least otn + a„_, + • • • + a +1. We find that the multiplicity £J=B_p+i 7,- of A0 as a zero of detC0(A) is at least Z"-„-p+i («/ + Pf)- It follows from equation (4.2.3) that n-p n—p n—p 2 yy^ 2 a,+ 2/3, (4.2.12) l-i >=i /=i If it happens that p<nA, then the inequality n n n V V1 V - Z r^ Z «,+ Z /3, j = n— p+Y j—n—p + 1 j = n—p + l and hence also the relation (4.2.12), follows from (4.2.4) because in this case Bj = 0 for / > n - p + 1. Similarly, (4.2.12) holds for p^nB. We have proved (4.2.10) for m = 1,. . . , n. For m s n the inequality (4.2.10) coincides with (4.2.3), so the proof of (4.2.10) is complete. O We have proved various inequalities and equalities relating the sequences
The Sigal Inequalities 133 {«;}:=,. m:.i, and {r,}°"l, [relations (4.2.3), (4.2.5), (4.2.8), (4.2.10)]. These relations are by no means the only connections between these sequences. More specifically, there exist nonincreasing sequences of non- negative integers {a,}"!,, {/3,}°°=1 and {-y,},=1, only a finite number of them nonzero, that satisfy equations (4.2.3), (4.2.5), (4.2.8), and (4.2.10), but for which there is no completion C of A and B with the property that for some A0£<p the sequences {aJJLj, {/3,.}°Li» and (yJJLi give the partial multiplicities of A, B, and C, respectively, corresponding to A0. In the next section we see more general inequalities, but even they do not completely describe the connections between the partial multiplicities of extensions of A and B and the partial multiplicities of A and B. The problem of describing all such connections is open. 4.3 THE SIGAL INEQUALITIES The main result in this section is the following generalization of Proposition 4.2.2. Theorem 4.3.1 Let {«,•}"=,, {/3,-IJLij and {"y;}°°=i be as in Proposition 4.2.1. Then for every sequence r, < r2 < • ■ • < rm of positive integers we have m m m 2%,^S«,, + 2/3, (4.3.1) i=l ,=1 1-1 and m m m 2%.^2^ + Sft. (4.3.2) i=l i=l 1=1 Proposition 4.2.2 is obtained from this theorem by putting r;. = /, / = 1,. . . , m. It will be convenient to prove a lemma (which is actually a particular case of Theorem 4.3.1) before proving the theorem itself. Lemma 4.3.2 Let c[v 2] where B is (n - k)x(n- k) with a(B) = {0}. // {^}r»i and {A}T=i are the nonincreasing sequences of partial multiplicities of C and B, respectively, then yt = Bt+ S,, i = 1, 2, . . . , where S, is zero or one, and E°°=] 8t = k.
134 Jordan Forms for Extensions and Completions Proof. Let jc ,, . . . , xt (/ ^ 2) be a Jordan chain for C: Ct, + 1 = Jt,, f = l,...,7 — 1; *,*0 (4.3.3) Write xi = ' , where y, is a ^-dimensional vector and z, is (n - &)- dimensional. Equalities (4.3.3) then imply y, = • • • = y,_, = 0 and Bzj + 1 = z,, i = 1, . . . , /-2, z, #0. In other words, z,,. . . , z,_, is a Jordan chain for fi. Moreover, if A/y, = 0, then z,,. . . , z, is also a Jordan chain for B. Now let •*!!>•• • > X\,yt> ■ ■ • ' •''^l' ■ • • ' Xq,yq (4.3.4) be a basis in <p" consisting of Jordan chains for C (so 9 is the maximal index such that y >0). Denoting by /? the maximal index such that y s: 2, let Z be the subspace spanned by the Jordan chains zn,...,z,,;...; z ,,...,z, for fi constructed as in the preceding paragraph from the Jordan chains oc-,,. . . , xjy, j - 1,. . . , p of C. Here l} is either y^ - 1 or yr The order of Jordan chains in equation (4.3.4) of the same length can be adjusted so that /,>•••> lp. Since Z is B invariant, Theorem 4.1.4 gives B,,2: /,, 1 = 1,. . . , p. On the other hand, by Theorem 4.1.5 yt > ft, i'=l,2 So we obtain y, - ft < S,, / = 1, 2,. . . , where each S, is either zero or one. The equality EJL, S, = k follows from the fact that the sum of the partial multiplicities of C (resp. of B) is n (resp. n-k). □ Proof of Theorem 4.3.1. Let <p" = M + Jf and let A: M -+ M, B: Jf^> Jf be transformations such that {a,}°°-i> {ft)r=i> anc* {YjlT-i are tne nonin- creasing sequences of nonnegative integers representing the partial multiplicities of A, B, and c = \A D Lo b\ respectively, corresponding to the eigenvalue A0 (here D is some transformation from jV into M). Applying a similarity transformation, if necessary, we can assume that Jf = M \ Without loss of generality (Theorem 4.1.1) we can assume also that A0 = 0 and a{A) = o\B) = {0} (then also cr(C) = {0}). We can assume also that A is in the Jordan form: ^ = diag[7Oi(0),...,71,/(0)], (a,=0 for />/) We use induction on the size a, of the biggest Jordan block in A. If
The Sigal Inequalities 135 a, = 1, then A = 0 and by Lemma 4.3.2 (applied to B* and C* in place of B and C, respectively) we have mm m mm 2 %, = 2 (ft, + «,,) ^ 2 ft, + min(w, /) = 2 ft, + 2 a, (=1 / = ! (=1 1=1 /=1 Assume that inequality (4.3.2) is proved for all A with the property that the size of the biggest Jordan block is less than a,. Using a matrix similar to A in place of A, we can assume that 0 A2\ where A2 is a Jordan matrix with partial multiplicities {a,'}*!, satisfying o; = a,-l,...,a; = a,-l;a;=0 for />/ (4.3.5) r*ii With the corresponding partition D = , and using the induction hypothesis the partial multiplicities {-y J} °L, of the matrix C = 2 2 satisfy the inequalities: m m m 2y;^2«; + 2ft (4.3.6) ,=1 ' i=\ 1=1 ' But in view of Lemma 4.3.2 (applied with C* and C* in place of B and C, respectively) m m 2 yr^2 yj +min(m,/) (4.3.7) i-i ' 1=1 ' Now combine relations (4.3.5), (4.3.6), and (4.3.7) to obtain the inequality (4.3.2). The inequalities (4.3.1) are obtained from (4.3.2) applied to the transformation C* written as the 2x2 block matrix with respect to the direct sum decomposition <p" = Jf + M. □ Inequalities (4.3.1) and (4.3.2) admit the following geometric interpretation. Let q be any index such that y,=0 for i>q (e.g., ^ = E°L1 at + E*=1 ft). Denote by Ki C W the convex hull of the points a\ + ftd)' a2 + ft(2)' ■•■>", + ft(,) where w is any permutation of {1,2, ... , q}, that is Also let
136 Jordan Forms for Extensions and Completions K2 = JS *,(«»(., + ft, . . . , «,,„ + 0,) 11,^0 , 2 *w = l} Then inequalities (4.3.1) and (4.3.2) imply (ylt...,yq)eKlnK2 (4.3.8) Actually, the inclusion (4.3.8) in turn implies (4.3.1) and (4.3.2). The proof of these statements would take us too far afield; we only mention that it is essentially the same as the proof of Theorem 10 of Lidskii (1966). It is interesting that the geometric interpretation of inequalities (4.3.1) and (4.3.2) is completely analogous to the geometric interpretation of the inequalities for the eigenvalues of the sum of two hermitian matrices in terms of the eigenvalues of each hermitian matrix [see Lidskii (1966)]. Inequalities (4.3.1) and (4.3.2) can be generalized. In fact, for any sequence r, < r2 < ■ ■ ■ < rm of positive integers and any nonnegative integer k<rl the following inequalities hold [see Thijsse (1984)]: m m m m m m 2 yr*H,a k + 2 Bi+k ; Er^S al+k + 2 &,-* (4-3-9) i=l ( = 1 1=1 i=l i-l ;=1 Theorem 4.3.1 is a particular case of (4.3.9) with k = 0. We have seen that, given the sequences {ajT-i an(J {/3,}T= 1 °f partial multiplicities of A and B, respectively, corresponding to A0, the sequence (7(}i°-i °f partial multiplicities corresponding to A0 of any completion C of A and B satisfies the properties of (4.2.3), (4.2.4), (4.2.5), (4.3.1), and (4.3.2); moreover, (4.3.9) is satisfied as well. However, the following example shows that, in general, these properties do not characterize the partial multiplicities of completions. example 4.3.1. Let a, = a2 = 3, a, = 0 for i > 2; B, = f}2 = 5; B3 = 4; Bt = 0 for i > 3; yx = 7, y2 = 6, y3 = 4, y4 = 3, yt = 0 for i > 4. One verifies that relations (4.2.3), (4.2.4), (4.2.5), and (4.3.9) hold [the verification of (4.3.9) is lengthy because of the many possibilities involved]. However, Theorem 7 of Rodman and Schaps (1979) implies that there is no completion C of A and B such that the partial multiplicities of A, B, and C corresponding to some A0 are given by {a,}°°=], {/3,}°°=1, and {7;} °L,, respectively. 4.4 SPECIAL CASE OF COMPLETIONS In this section we describe all the possible sequences of partial multiplicities corresponding to A0 for completions of A and B in case at least one of A and B has only one partial multiplicity at A0. First, we establish some general
Special Case of Completions 137 observations on partial multiplicities of completions that are used in this description. It is convenient to introduce the set Q, of all nondecreasing sequences of nonnegative integers such that, in each sequence, only a finite number of integers is different from zero. For a = (alt a2,. . .), fi = (&, fi2,. . .) Eil denote by T(a, fi) the set of all sequences y = (y,, y2,. . .)Eft with the following properties: (a) there is a transformation C: <p" —*■ <p" (for some n) and a C-invariant subspace M such that the restriction C\M has partial multiplicities a,, a2,. . . corresponding to a certain eigenvalue A0; (b) the compression of C to a coinvariant subspace that is a complement to M has partial multiplicities /3,, /32,. . . corresponding to A0, and (c) C itself has partial multiplicities y,, y2,. . . corresponding to the same A0. Proposition 4.4.1 Let a = (a,, a2, . . .)Gft, fi = (/3j, /32,. . .)Gft, and put m = E"=1 a,, « = E°°_, /3,-. 77ien a sequence y = (y,, y2,. . .) G ft belongs to T(a, /3) i/and o«/y i/ f/iere is an m~x- n matrix A such that the partial multiplicities of the matrix Ho 1} <"■*> where 7, = 7Oi(0) 0 • ■ • 07^(0), 72 = 7^(0)©• • ■ ©7^(0) [n, (resp. «2)] is tfie largest index such that an ^0 [res/;. /3n #0] are y1; y2, . . . . Proof. As the part "if" follows from the definition of T(a, /3), we have only to prove the "only if part. Assume y G T(a, fi). By definition, there is a matrix C partitioned as follows: C = L o" c22J where for some eigenvalue A0 of C the partial multiplicities of C (resp. Cn, C22) at A0 are given by y (resp. a, fi). Replacing C by C- A0/, we can assume A0 = 0. Furthermore, we can assume that C,, and C22 are matrices in the Jordan form. It remains to appeal to Theorem 4.4. l.i. □ It follows immediately from Proposition 4.4.1 that r(a, fi) = T(/3, a). Indeed, in the notation of Proposition 4.4.1 we have ro /p ^nro /] = |"72 o L7 oJL 0 72JL/ OJ LA 7,. so the matrices [i ;j - [i :i
138 Jordan Forms for Extensions and Completions have the same Jordan form. But then (in view of Corollary 2.2.3) this is also true for the matrices [•/, Al \J2 01* \J*2 A*] Lo j2\ and [a jJ =Lo j*\ As 7* and 7* are similar to J2 and 7,, respectively, the conclusion T(a, /3) = V(fi, a) follows. In view of Proposition 4.4.1, in order to determine T(a, /3), we have to find the partial multiplicities -y, > y2 > •• • (or, what is the same, the Jordan form) of matrices 7 of type (4.4.1). As {k | yk> i + 1}# = rank 7' - rank 7'" , i = 0,1,. . . (by definition, 7° = /), we focus on a formula for computation of the ranks of 7', i = 1,2, Divide the matrix A into blocks Atj, i = 1, ...,«,;/= 1,. . . , n2 according to the sizes of Jordan blocks in 7, and 72 (so the size of Ait is aj x /8). For fixed i and ;', write i4j; = E*'=1 E°'=, «m£m, where Epq is an a, x /3, matrix with 1 in the intersection of the (a, - p + l)th row and gth column and zero in all other places. Let df = Uu + "2,-1 + ■ " + ",1 (we put upq = 0 if p > a, or ^ > fy). Define Bf = 2 4P+""*)£P<? , * = 1,2,... (4.4.2) where the sum is over all the pairs p,q such that p <min(&, a,), ^ < min(A:, /3;), and p + q> k. For example, fi;;n has «n in the lower left corner and zeros elsewhere, Z?l2> has '' 12 21 in the lower left corner and ' L 0 «., J zero elsewhere, Btl has <11 0 0 "12 + "21 «,, 0 "l3 + M22 + «31 "l2+«21 «,, J in the lower left corner and zeros elsewhere (provided a,, /3 >3). Let B(k) be the m x n matrix with blocks fi|/'(i = 1,...,«,; y = 1,... , «2). Lemma 4.4.2 In the preceding notation we have rank Jk = rank 7* + rank j\ + rank /3(A:) , k = 1, 2,. . .
Special Case of Completions 139 Proof. Let Aw be defined by Jk = ! k . An easy induct argument on k shows that ion Aik) = H J\AJ\-'-', A: = 1,2,. and hence *-i k-l ^^wr^s-^w 'p<tJV = 2 2««~£ p,<? 5=0 P9 p + s,<7 + A:—s—1 A-l F -k+s + l'-'p'q' ' where Eab = 0 whenever at least one of the inequalities 1 ^ a < a,; 1 s fr < /^ is violated, and uab = 0 for a < 1 or />< 1. It follows that |(<0 B)j + (terms with Ep.q. such that p' > k or q' > k) By column operations from /* and row operations from Jk, we can eliminate all terms of A\k) except those in the block B\k). Permuting the rows and columns of the resulting matrix, we obtain the following matrix that has the same rank as Jk: 0 0 0 -0 l»k 0 0 0 0 B{k) 0 0 0 0 hk 0 where ak = rank 7, and bk = rank J2. Lemma 4.4.2 follows. □ It is an immediate consequence of the lemma that the sequence {yjjli depends only on the diagonal sums d^, for t <min(a(, Bt). Thus we can replace each A(j by a matrix in which only the first column can contain nonzero entries. Alternatively, we can presume that only the bottom row of Atj can contain nonzero entries. For illustration of Lemma 4.4.2, consider the following example. example 4.4.1. Let a = (a,,0,0,. . .), B = (/3,, 0,0,. . .), where a,, /3, > 0. We suppose for definiteness that a, > Bv If d0) ^0, it is easily seen that
140 Jordan Forms for Extensions and Completions rank B^ = min(k, a,) + min(A;, ft) - k , A>1 In general, we have {Jt) _ f min(A:, a,) + min(A:, ft) - fc - ?0 + 1 for k^t0 rank*,, -j Q ior k<t0 (4.4.3) where f0 is the smallest t such that d^ # 0, or t0 = ft + 1 if all d*'/ are zeros. It is now clear that y = (y,, y2, . . .) £ T(a, ft) is determined completely by the value of t0. Further, using formula (4.4.3) and Lemma 4.4.2, we compute {k | yk > i + 1}# = rank f - rank J' + l Computation shows that r(o, ft) = {(a, + ft, 0), (a, + ft - 1,1),... , (a, + 1, ft - 1), (a„ ft)} (In every y sequence we write only the first members; the others are zeros.) The y sequence (a, + ft — p, p) corresponds to the value t0 = p + 1. The possibility of y = (a^ + ft - p, p), p = 0, . . . , ft, is realized for the matrix i(p) Lo jA where Ap is an a, x ft matrix with all but the (a^ - p, l)th entry equal to zero, and this exceptional entry is equal to 1 (for p = ft we put Ap = 0). It is not difficult to construct two independent Jordan chains of A/ - /(p) of lengths «] + ft - p and p. Namely, the Jordan chain of length al + ft - p is ««,+<>,. ««I+Pl-i. ••■.««, + !. %-p> ««,-p-i» ••■.«!■ The Jordan chain of length p is e0[ - e„|+(1, <?„,_, - ett|+p_,, . . . , eai_p + 1 - <?„, + ,. □ Using Lemma 4.4.2, we shall now give a complete description of the set r(a, ft) in the case that a = (0], a2,. . . , a„,0,. . .) and /3 = (ft,0,0,. . .) where an and ft are positive. Introduce the set ft0 of all n-tuples («,, w2,. . . , w„), where w, are integers such that 1 < ^ < A, + 1 and A; = min(ay, ft). For a given sequence (o = (<i>j, a>2, . . . , a>„) G ft0 and i = 1, 2,. . . , define integers c{"' as follows: (i - min(wi - 1, i) for 1 < j =£ A, Aj-min^.-l.A,) for A,.</<M> A ■ + /x- - i - min(w, - 1, A. + /x. - i) for i > p,y
Special Case of Completions 141 where ^■ = max(a,, /3,). Now let y = (yt, y2,. . .) be the nonincreasing sequence of nonnegative integers denned by the equalities {; | y> a k + 1}# = {; | a,, s A: + 1}# + max(/3, - k, 0) -max(/31-*-l,0)+/t-/t+1 for k = 0,1, 2,... , where /„ = 0 and /t = max(4-),cE\...,ci:)) for A:>0 (4.4.4) Thus for every w G il0 we have constructed a sequence -y. Let us denote this sequence by F(w). Theorem 4.4.3 For every (d£(1(i f/ie sequence F{w) belongs to Y{a, fi). Conversely, if y G T(a, B), there exists <o G (l0 such that y = F(cd). Proof Recall that {; | yt a fc + 1}# = rank 7* - rank Jk + 1 In view of Lemma 4.4.2, we find that rank Jk - rank Jk+1 = rank Jk — rank Jk+l + rank 7* _ rank 7*+l + rank B{k) - rank B(*+,) = {;' | at > A; + 1}# + max(/3, - fc, 0) - max(/3, - k - 1,0) + rank B(*> - rank B(*+1) It remains to check, therefore, that for every <o Eft it is possible to pick the complex numbers djj' (l</<n, f = l,2, ...) in such a way that fk = rank B(A:) for k = 1, 2, ... , where /t is denned by equation (4.4.4) and B(*> is denned as in Lemma 4.4.2; and conversely, for every choice of rfj'j' it is possible to find an <o Eft0 such that fk = rank B{k). Note that fi1*' depends on d^ with f ^ A., so we restrict ourselves only to these values of ?. Given &> = (w,,. . . , wn)GO0, choose dj',' in such a way that w; is the smallest index t with the property that d{^ # 0 [if wy = A, + 1, put df? = 0 for all t]. It is easy to see that ck"} is just the rank of the matrix fijj' [denned by (4.4.2)]. Observe that after crossing out some zero columns and rows, if necessary, BJJ* is an upper triangular Toeplitz matrix with min(A:, /3,) columns. Thus the rank of
142 Jordan Forms for Extensions and Completions Bik) = B (*) is just the maximum of the ranks of B\\\ B\]',. . . , BJ,,\ that is, fk. Conversely, if d^ are given, define w; as the minimal /(l <r< A;) such that d^ # 0; and if d^ = 0 for every t, 1 < t < A;, put w, = A; + 1. D 4.5 EXERCISES 4.1 Supply a proof of Theorem 4.1.5. 4.2 State and prove a result for dilations analogous to Theorems 4.1.4 and 4.1.5. 4.3 Prove that the maximal dimension of an irreducible ^-invariant sub- space coincides with the maximal dimension of a Jordan block in the Jordan form of A. 4.4 Find all possibilities for the partial multiplicities of matrices of type ro *i Lo o J where X is any n x m matrix. 4.5 What is the answer to the preceding exercise under the restriction that rank X ^ k, where & is a fixed positive integer? 4.6 Find all possibilities for the partial multiplicities of matrices of the following types: (a) (b) J-(0) 0 0 J where X is any n x m matrix r-uo) x] L o oJ where X is any n x m matrix of rank 1. (Hint: Prove that there exists an n x m matrix X0 with exactly one nonzero entry such that [.UO) X] |"7„(0) X0 I o oJ and L o o are similar.) (c) What happens if we allow matrices X of rank 2?
Exercises 143 4.7 Find all possibilities for partial multiplicities of matrices of type U„(0) L o j. x ,(0)J where X is any n x m matrix. 4.8 Let C,= «1 «2 .a, a, and C2 = P. bn -b2 b2 ■ b, ■ ■ bn ■ bx be circulant matrices. Find all possibilities for the partial multiplicities of matrices of type re, x^ L o cA where X is an n x n matrix.
Chapter Five Applications to Matrix Polynomials Let A0, A,,. . . , j4,_( be complex n x n matrices. We call the matrix-valued function L(A) = IX + E^ AtX' a monic matrix polynomial of degree /. It will be seen that there are In x In matrices C such that L o / and A/_c are equivalent. (See the appendix for the notion of equivalence.) In this case C is said to be a linearization of L(A). The invariant, coinvariant, and semiinvariant subspaces for C play a special role in the study of the matrix polynomial L(X). For example, certain invariant subspaces of C are related to factorizations of L(A). More precisely, certain invariant subspaces determine monic right divisors of L(A), certain coinvariant subspaces determine monic left divisors, and certain semiinvariant subspaces determine three monic factors of L(A). In this chapter we explore these and similar connections and study the behavior of solutions of differential and difference equations with constant coefficients. 5.1 LINEARIZATIONS, STANDARD TRIPLES, AND REPRESENTATIONS OF MONIC MATRIX POLYNOMIALS In this section we introduce the main tools required for the study of monic matrix polynomials. These tools are freely used in subsequent sections. Let L(A) = /A' + L'jZl AjX' be a monic matrix polynomial of degree /, where the A; are n x n matrices with complex entries. Note that det L(A) is a polynomial of degree nl. A linear matrix polynomial IX - A of size (« + p) x (n + p) is called a linearization of L( A) if 144
Monic Matrix Polynomials /A- A = E(A) L(A) 0 a F(A) 145 (5.1.1) where £( A) and F( A) are (« + p) x (n + p) matrix polynomials with constant nonzero determinants. Admitting a small abuse of language, we also call matrix A from equation (5.1.1) a linearization of L(A). Comparing determinants on both sides of (5.1.1), we conclude that det(/A - A) is a polynomial of degree nl, where / is the degree of L(\). So the size of a linearization A of L( A) is necessarily nl. As an illustration of the notion of linearization, consider the linearizations of a scalar polynomial (n = 1). Let L(A) = fl*,, (A - A,)"' be a scalar polynomial having different zeros A,,. . . , \k with multiplicities at,. . . ,ak, respectively. To construct a linearization of L( A), let Ji(i = 1,. . . , k) be the Jordan block of size at with eigenvalue A,, and consider the linear polynomial A/ - / of size E*=1 ajy where J = diag[7;]j=1. Then J is a linearization of L9A). Indeed, IX.- J and have the same elementary divisors; so using Theorem A.3.1, we find that / is a linearization of L(A). The following theorem describes a linearization of a monic matrix polynomial directly in terms of the coefficients of the polynomial. Theorem 5.1.1 For a monic matrix polynomial L(\) - /A + E;=0 Aj\' of size « x n, define the nl x nl matrix C,= 0 0 -Ao I 0 -A, 0 •■ / •■ 0 0 / Then Cxis a linearization of L( A). Proof. Define nl x nl matrix polynomials E{k) and F(A) as follows: F(A) = ' / 0 •• -A/ / •• 0 0- - 0 0 •• • 0 • 0 / • -A/ 0 0 0 7
146 Applications to Matrix Polynomials E(A) = /-,(A) B,_2(A) ••• -/ 0 0 -/ Bo(A) 0 L 0 / 0 where B0(A) = / and Br+1(A) = ABr(A) + /!,_,._, for r = 0,1,...,/- 2. It is immediately seen that det F( A) = 1 and det £(A) = ±1. Direct multiplication on both sides shows that EUXAZ-C,)^^ °JF(A) (5.1.2) and Theorem 5.1.1 follows. □ The matrix Ct from Theorem 5.1.1 will be called the (first) companion matrix of L(A), and will play an important role in the sequel. From the definition of C, it is clear that det(/A-C1) = detL(A) In particular, the eigenvalues of L( A), that is, zeros of the scalar polynomial det L(A), and the eigenvalues of /A - C, are the same. In fact, we can say more: since C, is a linearization of L(A), it follows that the elementary divisors (and thus also the partial multiplicities of every eigenvalue) of /A - C, and L(A) are the same. Now we prove an important result connecting the rational matrix function L(A)~' with the resolvent function for the linearization C,. Proposition 5.1.2 For every AG (p that is not an eigenvalue of L(A), the following equality holds: [L(\)]-l = Pl(I\-Cl),Rl (5.1.3) where F, = U 0 0] is an n x nl matrix and *.= o LI. (5.1.4) is an n x nl matrix.
Monic Matrix Polynomials 147 Proof. Consider the equality (5.1.2) used in the proof of Theorem 5.1.1. We have ['L(f 5] = F(A)(/A-C'rl^A)]"' (5L5) It is easy to see that the first n columns of the matrix [E( A)]"' have the form (5.1.4). Now, multiplying equation (5.1.5) on the left by P, and on the right by PTX and using the relation P,P(A) = [/ 0 ••■ 0] = P, we obtain the desired formula (5.1.3). □ Formula (5.1.3) is referred to as a resolvent form of the monic matrix polynomial L(A). The following result follows directly from the definition of a linearization and Theorem A.4.1. Proposition 5.1.3 Any two linearizations of a monic matrix polynomial L(\) are similar. Conversely, if a matrix T is a linearization of L( A) and matrix S is similar to T, then S is also a linearization of L{\). This proposition and the resolvent form (5.1.3) suggest the following important definition: a triple of matrices (X, T, Y), where T is nl x nl, X is n x nl, and Y is nl x n, is called a standard triple of L(A) if L(A) ' = ^(/A-T)My For example, Proposition 5.1.2 shows that (P,, C,, Rt) is a standard triple of L(A). It is evident from the definition that, if (X, T, Y) is a standard triple for L(A), then so is any other triple (X, f, Y) that is similar to (X, T, Y), that is, such that X=XS, T = S lfS, Y = S~lY for some nonsingular matrix 5. As we see in Theorem 5.1.5, this is the only freedom in the choice of standard triples. We start with some useful properties of standard triples. Here and in the sequel we adopt the notation col[Z,]f=0 for the column matrix ~za- Zy
148 Applications to Matrix Polynomials Proposition 5.1.4 If (X, T,Y) is a standard triple of a monic n x n matrix polynomial /A' + EJIq Aj\', then the nl x nl matrices col[AT'];;J and [Y, TY, . . . , T'~lY] are nonsingular. Further, the equalities A0X + A,XT + ■ ■ ■ + At^XT1'1 + XT' = 0 (5.1.6) and YAn + TYA, + ■■■ + T'~,YA,_l + TlY = 0 (5.1.7) hold. Proof. We have L(\)~l = X(I\- T)~lY and by Proposition 2.10.1, ^—. I ^Lixy1 dy = ~ I k*X{Ik-TYlYdk = XTjY , / = 0,1,... 2tti Jr v 2iti Jr (5.1.8) where T is a circle with centre 0 and sufficiently large radius so that cr(T) and the eigenvalues of L(A) are inside V. On the other hand, since L( A) is a monic polynomial of degree /, the matrix function L(A) = A~'L(A) is analytic and invertible in a neighbourhood of infinity and takes the value / at infinity. In fact, L(A) is analytic outside and on T. Hence ^-. [ X'L(\yld\=^-. I A'-'LUr'dA 2tti Jr ' 2iti Jr v ' and representing L(X)~ as a power series / + E£_, X~kLk, we see that ^[A-Lur'A-f0, !"' >-?••; ■'-2 2iti Jr ' I / for ; = /- 1 Combining this with (5.1.8), we have
J-f Monic Matrix Polynomials i '-1 -i LA' , / , 2i-2 L(A)"1 d\ = 0 ••• 0 /"! / 6 / * 149 X 1 XT LxT x[Y TY ■■■ T'~lY] (5.1.9) As the right-hand side in equation (5.1.9) is nonsingular, the nl x nl matrices oo\[XT']'rX and [Y TY ■■■ TllY] are both nonsingular. Now use equation (5.1.8) again and we find that, for i = 0,1,...,/- 1, 0=^-. I A'L(A)L(A)~1dA=-^ | x'Li^XilA-Ty^dX = {XT' + --- + AtXT+ A0X)T'Y It follows that (XT' + ■ ■ ■ + AlXT + A0X)[Y, YT,..., TllY] = 0 and since the second factor is nonsingular, formula (5.1.6) follows. Similarly, starting with the equality °=2^/rA'L(ArlL(A)dA formula (5.1.7) can be verified. □ We are now ready to state and prove the basic result that the standard triple for a monic matrix polynomial is essentially unique (up to similarity). Theorem 5.1.5 Let (Xx, Tx, Yt) and (X2, T2, Y2) be two standard triples of the monic matrix polynomial L( A) of degree I. Then there exists a unique nonsingular matrix S such that X,=X2S, r,=5_1r25, Y1 = S~,Y2 (5.1.10) The matrix S is given by the formula S = (col[X2T2]'i:lyl.Co\[XlTi1]'-l0 = [Y2, t2y2,..., T'fXliYi, r.r„• • •, r'r'y,]- (5.1.11) where the invertibility of the matrices involved is ensured by Proposition
ISO Applications to Matrix Polynomials 5.1.4. In particular, if (X, T,Y) is a standard triple of L(A), then T is a linearization of L{k). Proof. Assume we have already found a nonsingular 5 such that (5.1.10) holds. Then and coi[*Ir1]I':0 = coi[*2r2]I':0s [y„ r.y,,..., r'r'yj = s~1[y2, t2y2,..., t'2~xy2] Thus formulas (5.1.11) hold and consequently S is unique. Now we prove the existence of an 5 such that (5.1.10) holds. Without loss of generality, and taking advantage of Proposition 5.1.2, we can assume that X, = P,, r2 = C,, Y2 = R{. Using (5.1.6) [with (X,T,Y) replaced by (X{, 7,, y,)], the equality coi^.rjii^c, coi^r,],':,' where C, is the companion matrix of L(A), is easily verified. Also, (5.1.9) implies coiiJf.r'jUy^cop,,/];., where S(/ is the Kronecker index (8,; = 0 if i # /; 51; = 1 if i = ;'). Obviously *, = [/ o ■■■ o]coit^r,]!:,; and equations (5.1.10) hold with 5 = col[Ar,ri]f:(1). Finally, if {X, T, Y) is a standard triple of L(A), then, by the part of Theorem 5.1.4 already proved, T is similar to the companion matrix C, of L(A), and thus T is also a linearization of L(A). □ Proposition 5.1.2 gives an example of a standard triple based on the companion matrix of L(A). Another useful example of a standard triple is where ([0 ••■ 0 /l.Q.cop,,/]!.,) 0 ••• 0 -A0 1 ■■■ 0 -A, (5.1.12) -0 ••• / -^,_J and is called the second companion matrix of L(A). Indeed, if we define
Monic Matrix Polynomials 151 mAx A2 ••■ i4,_, r - / 0 ••■ 0 - then we have [0 ■•• 0 /] = [/ 0 ••• 0]B ', C2 = BC1B~l col[8„/];=1 = Bcol[8„C Thus the triple (5.1.10) is similar to the standard triple given in Proposition 5.1.2. The notion of a standard triple is the main tool in the following representation theorem. Theorem 5.1.6 Let L(\) = /A + L'Z0 v4;A' be a monic matrix polynomial of degree I with standard triple (X, T, Y). Then L(X) admits the following representations: (a) Right canonical form: L(A)=/A'-AT'(V, +V2\ + --- + Vt\'-1) (5.1.13) where Vt are nl x n matrices such that [v, •■• v,] = {coi[*n;:jri (b) Left canonical form: L(A) = A'/- (W, + kW2 + ■ ■ ■ + k'-lW,)T'Y (5.1.14) where W, are n x nl matrices such that coi[w;.];.,=[y, 7Y,...,r'-Iy] ' Note that only X and T appear in the right canonical form of L(A), whereas only T and Y appear in the left canonical form. Proof. Observe that the forms (5.1.13) and (5.1.14) are independent of the choice of the standard triple (X, 7, Y). Let us check this for (5.1.13), for example. We have to prove that if (X, T, Y) and (A", 7", V) are standard triples of L(A), then XT'[V, ■■■ V,] = X'(T,)'[V[ ■■■ V]} (5.1.15) where
152 Applications to Matrix Polynomials [V, •■• Vl] = {col[XT']'r_l0}-1, [V\ ••• V'l]^{co\[X'T',]'i:l0}-i But these standard triples are similar: X' = xs, r' = s-1r5, y' = s~1y Therefore [v; •■• v;] = {coi[jfr'];:0}-, = {coi[jfr];:0s}-1 = S"1[V1 ••• V,] and (5.1.15) follows. Thus it suffices to check equation (5.1.13) only for the special standard triple x = [i o ••• o], r=c,, y = coi[8,,/L'-i and for checking (5.1.14), we choose the standard triple defined by (5.1.12). To prove (5.1.13), observe that [/ 0 ••• 0)C\=[-A0 -A, ••■ -A,.,] and [Vx ••• K,] = {col[[/ 0 •■• 0]C\]'llrl = l so [/ 0 ■■■ 0]Ci[K, V2 ■■■ V,] = [-A0 -A, ■■■ -A,.,] and (5.1.13) becomes evident. To prove (5.1.14), note that by direct computation one easily checks that for the standard triple (5.1.12) c^coi^ ,/]!=. = coi[s<i>+I/i;_1, y = o,...,/-1 and Cicol^./Jl.^colI-^^i So [col^-j;., , C2 col[S(1/],'=,> ...,C'~l col[8n/]J-.] = / Thus
Multiplication of Monic Matrix Polynomials 153 coi[wa'.,=/ and W,C2 col^/lU^-^V,, i=l,...,/ So equations (5.1.14) follows. □ 5.2 MULTIPLICATION OF MONIC MATRIX POLYNOMIALS AND PARTIAL MULTIPLICITIES OF A PRODUCT In this section we describe multiplication of monic matrix polynomials in terms of their standard triples. First we compute the inverse L~ (A) of the product L(A) = L2(A)L,(A) of two monic matrix polynomials L,(A) and MA). Theorem 5.2.1 Let L,( A) be a matrix polynomial with standard triple (Xt, T,, Yt)for i — 1, 2, and let L(A) = L2(A)L,(A). Then L-1(A) = [^10](/A-r)-1[yJ (5.2.1) where T2 , L o Proof. It is easily verified that (/A T) ~l o (/A-r2)-' J The product on the right of equation (5.2.1) is then found to be ^(/A-r.r'y.^/A-^)-1^ But, using the definition of standard triples, this is just L1~'(A)LJ,(A), and the theorem follows immediately. □ Corollary 5.2.2 If L,(A) are monic matrix polynomials with standard triples (Xt, Tn V,) for i = l,2, then L(\) = L2(A)L,(A) has a standard triple (X, T, Y) with the representations
154 Applications to Matrix Polynomials ™. '-ft r£]. y-[°r,} Proof. Combine Theorem 5.1.5 with Theorem 5.2.1. □ Corollary 5.2.2 allows us to describe the partial multiplicities of a product of monic matrix polynomials. We first give some necessary definitions. For a monic matrix polynomial L(A) and its eigenvalue A0 [i.e., det L(A0) = 0], let a, 2 a2 > • • • > ar be the degrees of the elementary divisors of L( A) corresponding to A0. The integers a, are called the partial multiplicities of L(\) corresponding to A0. It is convenient to augment the a, values by zeros and call the sequence a = (a,, a2,. . . , ar, 0,. . .) the sequence of partial multiplicities of L( A) at A0. Thus a £ 12 (see Section 4.4 for the definition of ft). Also, we shall say formally that the partial multiplicities of L( A) corresponding to a complex number that is not an eigenvalue of L(A) are all zeros. Recall also the definition of the set r(a, /3) given in Section 4.4. Theorem 5.2.3 Let Lj(A) and L2(X) be n x n monic matrix polynomials. Let a, (3 and y be the sequences of partial multiplicities of L,(A), L2(A), and L2(A)Lj(A), respectively, at A0. Then y £ r(a, /3). Conversely, if y £ r(a, /3), then for n sufficiently large there exist nXn monic matrix polynomials Lt(\) and L2( A), such that the sequence of their partial multiplicities at A0 are a and [}, respectively, and the sequence of partial multiplicities of L2(A)L,(A) is y. Proof. Let (A',, 7,, Y,) be a standard triple for L,(A) and i = 1,2. By the multiplication formula (Corollary 5.2.2), the matrix rr, y.x2~\ Ho T2\ is a linearization of L2(A)L,(A). From the properties of a linearization it follows that y is also the sequence of partial multiplicities of T at A0. Now from the structure of T it is clear that y £ T(a, /3), and the first part of the theorem follows. To prove the second part of Theorem 5.2.3, we first prove the following assertion: let A be an rx x r2 matrix. Then for n sufficiently large there exist an /-] x n matrix Y and an n x r2 matrix X such that YX - A, the rows of Y are linearly independent, and the columns of X are linearly independent. Indeed, multiplying A by invertible matrices from the left and the right (if necessary), we can suppose that
Multiplication of Monk Matrix Polynomials 155 where / is the unit rxr matrix (for some r < min^j, r2)). Then we can take [. 0 "/ 0 _0 0" 0 *.- where Yi is an (r, - r)x r{ matrix with linearly independent rows and Xy is an r2 x (r2 - r) matrix with linearly independent columns. Then n = r+rx + r2, of course. Now let y £ T(a, /3), so that y is the sequence of partial multiplicities of 0 for some Tx, T2, A, and the partial multiplicities of Tx (resp. T2) corresponding to A0 are given by the sequence a (resp. /3). Applying a similarity to T0, if necessary, we can assume that Tt and T2 are in Jordan form. Further, in view of Theorem 4.1.1 we can assume that a(Tx) = a(T2) - {A,,}- According to the assertion proved in the preceding paragraph, for n sufficiently large there exist matrices Xt) and V0 of sizes n x r2 and r, x n, respectively (where r, = E*=, a;, r2 = E°L, /3y) such that VqA",, = A, the rows of Y0 are linearly independent, and so are the columns of X0. Choose an n x {n - r2) matrix A", such that the matrix [A'(IA'1] (of size n x n) is invertible, and put L2(A) = A/-[*-„*,] T2 0 0 z/ W,^] where z is some complex number different from A0. Similarly, choose an ryoi (n - r,) x n matrix Yt such that is nonsingular, and put As T2@zl (resp. 7\ ©z/) is a linearization of L2(A) [resp. of L,(A)], it follows that the partial multiplicities of L2(A) [resp. of L,(A)] corresponding to A() are given by the sequence /3 (resp. a). Further (WAV; i].M,ii - ([;;]-'.[j :,),m are the standard triples for L2(A) and L,(A), respectively. By Corollary 5.2.2 the matrix
156 Applications to Matrix Polynomials rr, o yox0 Y0xt-\ rr, o a ym 0 zl YtX0 YlXl 0 zl YlXli Y,X, 0 0 T2 0 0 0 T2 0 .00 0 z/JLoO 0 zl . is a linearization of L2( A)L,( A). Now Theorem 4.1.1 ensures that the partial multiplicities of T corresponding to A0 are exactly those for T0; that is, they are given by the sequence y. □ The proof of the converse statement of Theorem 5.2.3 shows that for a yEr(a, /3) there exist linear monic matrix polynomials L,(A) and L2(A) with the desired properties and with the size not exceeding min(r,,r2) + ri + r2> where r, (resp. r2) is the sum of all integers in a (resp. /3). Our analysis of partial multiplicities of completions in sections 4.2- 4.4, combined with Theorem 5.2.3, allows us to deduce various connections between the partial multiplicities of monic matrix polynomials and the partial multiplicities of their product, as indicated, for instance, in the following corollary. Corollary 5.2.4 Let Lj(A) and L2(A) be n x n monic matrix polynomials. Let a = (a,, a2,. . .), /3 = (/§,, /32,. . .), and y = (y,, y2,. . .) be sequences of partial multiplicities o/L,(A), L2(A), and L2(A)Lj(A), respectively, at A0. Then m / m mm m \ Syrs min( X a, +E ft, E «, + E ft) /or any sequence r, < • • • < rm of positive integers. The corollary follows from Theorems 4.3.1 and 5.2.3. 5.3 DIVISIBILITY OF MONIC MATRIX POLYNOMIALS Let L( A) be an n x n monic matrix polynomial of degree /, and let (X, T, Y) be a standard triple for L(A). Consider a r-semiinvariant sub- space M. Thus there exists a triinvariant decomposition (see Section 3.3) associated with M: $"' = £ +M+Jf (5.3.1) where the subspaces if and Z£ + M are T invariant. The triinvariant decomposition (5.3.1) is called supporting [with respect to (X, T, Y)] if, for some integers p and q, the transformations
Divisibility of Monic Matrix Polynomials 157 and X XT LXTp~lJ \X+M ^£ + M^^n (5.3.2) X 1 XT I XT q-\ 2-+C"1 \x (5.3.3) are invertibie (in particular, this implies that dim(i? + M) = np, dim i?= nq). Cases in which i?= {0} are of particular interest; then M is T invariant and condition (5.3.3) is vacuous. Also, if N — {0}, then M is T coinvariant and the condition (5.3.2) is satisfied automatically with p = I. (Indeed, we have seen in Proposition 5.1.4 that the matrix co\[XT']'i=o is non- singular.) The definition of a supporting triinvariant decomposition is given in terms of (X, T) only. However, if Px is a projector with Ker Px = Jf, the following lemma shows that the invertibiiity of (5.3.2) is equivalent to the invertibiiity of the transformation from c"{l~p) into Im PM defined by PAT'-p-lY, ...,TY, Y] = [PyTl-p-1PxY, ..., PMTP„Y, PXY] Similarly, (5.3.3) is invertibie if and only if P<e[T'-"-iY,. . . , TY, Y] = [P^T'"lP^Y, ..., P^TP^Y, P^Y] is invertibie, where Px is a projector with Ker P^ - ££ (note that because of the T invariance of £ and Jf we have P^T' = PXT'PX, P^V = P^T'P^, ; = 1,2,...). Lemma 5.3.1 Let L(X) be a monic matrix polynomial of degree I with standard triple (X, T, Y), and let P be a projector in <p"'. Then the transformation col[XT-l]UlmP--I™P^Pk (where k< I) is invertibie if and only if the transformation (5.3.4) (I-P)[T'" W, n/-Jfc-2 Y,. . . , Y]: <p *('-*) KerP (5.3.5) is invertibie.
158 Applications to Matrix Polynomials Proof. Put A = col[*r ']{_, and B = [1J lY,. . . ,TY, Y]. With respect to the decompositions $nl = Im P + Ker P and <p"' = <pB* © $"<'~k) write A Ya3 a4V b Yb3 bJ Thus the At are transformations with the following domains and ranges: Ax:\mP^$nk; A2: Ker P->$""; AylmP^$ni'-k); A4: Ker P^ §n(,k); and similarly for the Br Observe that Ax and B4 coincide with the transformations (5.3.4) and (5.3.5), respectively. By formula (5.1.9) the product AB has the form \ D. 01 L * D2J where Dx and D2 are nonsingular matrices. Recall that A and B are also nonsingular by Proposition 5.1.4. But then At is invertible if and only if B4 is invertible. This may be seen as follows. Suppose that B4 is invertible. Then r / o i [*, b2]\ i o l \--B4lB3 B4-'J lB3 Bt\l-B;lB3 B~4l J _ [ B\ ~ fi2fi4 fi3 fi2B4 1 L o i \ is invertible in view of the invertibility of B, and then also Bx - B2B4lB3 is invertible. The special form of AB implies AXB2 + A2B4 = 0. Hence D, = AxBt + A2B3 = AXBX - AxB2B~iB3 = AX(BX - B2B41B3) and it follows that Ax is invertible. A similar argument shows that invertibility of Ax implies the invertibility of B4. This proves the lemma. □ The importance of supporting triinvariant decompositions stems from the following result describing factorizations of a monic matrix polynomial L( A) in terms of supporting triinvariant decompositions associated with a linearization of L(A).
Divisibility of Monic Matrix Polynomials 159 Theorem 5.3.2 Let Z£{ A) be an n x n monic matrix polynomial with standard triple (X, T, Y), and let <p" = !£ 4- M 4- J*f be a supporting triinvariant decomposition associated with a T-semiinvariant subspace M. Then L(k) admits a factorization L(A)=L,(A)L2(A)L3(A) (5.3.6) where L^k), i= 1,2,3 are monic matrix polynomials with the following property: (a) (X^, T^, Y) is a standard triple of L3(A), where r x -\ XT Y = XT 9-1 \*> 01 0 0 (5.3.7) (b) (X, PjfT\lmP , PXY) is a standard triple for L{(k), where Fv is a projector with Ker Pv = i? 4- M and * = [0 0 I)(PAY,TY,...,T'-plY]) (5.3.8) (c) (Z|W, PMT\M, Y) is a standard triple for L2(A), where PM is the projector on M along !£ 4- Im PN, Z = [0 ••• 0 /]{(?„ +Pj,)[Y, TY,..., r'"'"1y]}"1:^ + ImPv-»4:" (5.3.9) and z(^r) r p-«-i_ \ ' \M J "0" 0 -/- (5.3.10) [//ere g< / and I -p<l are the unique nonnegative integers such that the linear transformations col(XT')1~^: if—* <pn<? and /VfK,. . . , T ~p~ Y, TlplY]: (p"('"",-»^> 4- M are invertible.] Conversely, if equation (5.3.6) is a factorization of L(k) into a product of three monic matrix polynomials L^k), L2(k), and L3(A), there exists a supporting triinvariant decomposition §"' = y + M+N (5.3.11)
160 Applications to Matrix Polynomials associated with a T-semiinvariant subspace M such that the standard triples 0/L,(A), L2(A),and L3(A) are (X, PxT[lmPj<, PXY), (Z\M, PMT[M, PMY), and (X\<g, T\<g, Y), respectively, where Px is a projector with Ker Px = Z£ + M, PMis the projector on M along !£ + Im P^, and X, Y, Z, Y are given by (5.3.8), (5.3.7), (5.3.9) and (5.3.10), respectively. Moreover, the T-invariant subspaces J£ and J£ + M in (5.3.11) are uniquely determined by the factors Lj(A), L2(\), and L3(\). It is assumed in Theorem 5.3.2 that PM: <p"'—»M, Px: <p" —* Jf, where / is the degree of L(A). As a monic matrix polynomial M(A) and its inverse are uniquely determined by any standard triple (see Theorem 5.1.6 and the definition of a standard triple), Theorem 5.3.2 provides an explicit description of the factors L,(A) in (5.3.6) in terms of supporting triinvariant decompositions. For instance, if <p"' = if 4- M 4- M is a supporting triinvariant decomposition (associated with a T-semiinvariant subspace M) and L( A) = L,(A)L2(A)L3(A) is the corresponding factorization of L(A), then (in the notation of Theorem 5.3.2) we have LI(A)-, = *(A/-p^r|lB#v)-,JVi' L2(\yl = zlM(M-PMTllmPMylY L3(\y1 = x^(\i-Tl^y,Y Similarly, using Theorem 5.3.2, one can produce the formulas for L,(A), L2(A), and L3(A) themselves. The proof of Theorem 5.3.2 is quite lengthy and is relegated to the next section. The following particular case of Theorem 5.3.2 is especially important. We assume that L(A) and (X, T, Y) are as in Theorem 5.3.2. Corollary 5.3.3 Let <p" =Z£ + M+Nbea supporting triinvariant decomposition associated with a T-semiinvariant subspace M such that Z£ 4- M = <p"' (so Jf = {0} and M is actually T coinvariant). Then L(\) admits a factorization L(A)=L2(A)L3(A) (5.3.12) where L3(A) is a monic matrix polynomial of degree q with a standard triple of the form (X^, T\x, Y), where .XT"'1 \*t
Proof of Theorem 5.3.2 161 Also, L2{\) is a monic matrix polynomial of degree I- p with a standard triple of the form (X\M, PM T[M, PM Y) where X = [0 ■■■ 0 I)(PM[Y,TY,...,T'-p-lY)yl and PM is the projector on M along !£. Conversely, if equation (5.3.12) is a factorization of L(X) into a product of two monic matrix polynomials L2{\) and L3{\), there exists a unique T-invariant subspace Z£ such that the triinvariant decomposition <p"' = i? + M 4- {0} {where M is a direct complement to Z£) is supporting and the standard triples of L2{\) and L3{\) are as described above. Note that under the conditions of Corollary 5.3.3 we have q = I-p (cf. Lemma 5.3.1). Again, as in Theorem 5.3.2, one can write down explicit formulas for the factors in (5.3.12) and their inverses using the triinvariant decomposition <p"' = if + M + Jf with Jf = {0}. For example in the notation of Corollary 5.3.3. 5.4 PROOF OF THEOREM 5.3.2 We need the following fact. Proposition 5.4.1 Let L(A) = E>=0 ^^A' be an n x n matrix polynomial {not necessarily monic) and let L,(A) be an nx n monic matrix polynomial with standard triple (A"], 7,, Yt). Then {a) L(A) = L2(A)L,(A) for some matrix polynomial L2(A) if and only if the equality i I.A^T'^0 (5.4.1) holds; (b) L{ A) = L, (A) L3( A) for some matrix polynomial L3( A) if and only if the equality i 2 t[yxa. = o holds. Proof. Let us prove (a). We have L^A)"'= Jr,(A7-7'1)~1K1. Therefore
162 Applications to Matrix Polynomials l(a)l1(a)-' = (S z^.a'Wa/- rj-'y, and for |A| large enough (e.g., for |A|> ||r,||) we have L(A)L,(A)-, = (2 A^xJJ: rA"' ')y, (5.4.2) Now assume L(A)L,(A)~' is a polynomial. Then in formula (5.4.2) the coefficients of negative powers of A are zeros. But the coefficient of A~/_I(/ = 0, 1,...) in (5.4.2) is A0XJ\YX + AXXXT\+,YX + ■■■ + AtXxT\+'Yx which is zero. So (tlAixlri)TliYl=o, y = o,i,... \ = 0 ' As [ y,, 7",, y,,. . . , T * ~' y, ] is nonsingular [where A: is the degree of L, (A); see Proposition 5.1.4], we obtain equality (5.4.1). Conversely, if (5.4.1) holds, then AnXlT\Yl + AlXJ\+lYl+ ■■■ +A,XlT{+,Yl=0, ; = 0,1,. . . which means that all coefficients of negative powers of A in (5.4.2) are zeros, that is, L(A)L,(A)~' is a polynomial. Statement (b) of Proposition 5.4.1 follows from the (already proved) def _ statement (a) when applied to the matrix polynomials L(A) = (L(A))* = def E/=0 A*\' and L,(A) = (L,(A))* in place of L(A) and L,(A), respectively. {Note that {Y*, T*,X*) is a standard triple for L,(A), and that L(A) = L,(A)L3(A) if andonly if L(A)=L3(A)^i(A), where L3(A) = [L3(A)]* is a matrix polynomial together with L3(A).} □ Assume now that <p" = !£ 4- M + Jf is a supporting triinvariant decomposition associated with r-semiinvariant subspace M, as in Theorem 5.3.2. As [col[A'T']fro1]|^: if—* (p"* is an invertible transformation, we can define the n x n monic matrix polynomial L3(A) by the formula L3(A) = /A" - X^T^nV, + V2A + • • ■ + V^A'"1) where
Proof of Theorem 5.3.2 163 (so V,: $"-+£, i = l,. .. ,q). It turns out that {X^, T^, Vq) is a standard triple of L3(A). Indeed, we note that the following equalities hold: [/ 0 •■• 0][col[XT>]«:X = Xw c3[co\[xr]i:X = [coi[xr}'>:;][xTly where C, is the companion matrix for L3(A), and ^\lSiqI]ll=[col[XT']l:XVll (The second equality is obtained from [Vt V2 ■■■ Vq\[co\[XT']?:X = VqX^ + V2XT^ + --- + VqXT^ = I on premultiplication by ATj^,.) Hence (X^,T^,V ) is similar to the standard triple (P,, C,, /?,) for the matrix polynomial L3(A) (in the notation of Proposition 5.1.2), so (X^,, T^, Vq) is itself a standard triple for L3(A). Because of the equality A0X +AtXT + --- + Al_lXT,x + XT' = 0 where / is the degree of L(A) and A/ is the coefficient of A' in L(\) [see formula (5.1.6)], Proposition 5.4.1 ensures that there exists a matrix polynomial L4(A) such that L(A)= L4(A)L3(A). The matrix polynomial L4(A) is necessarily monic and of degree / - q. Let us find its standard triple. First note that the transformation Q = PM + Pv is a projector on M 4- Im Pv along if. Indeed, for every jc £ if we have Qx = PMx + Pxx = 0 + 0 = 0, and for every yEM (resp. yElmPv) we have Qy = PMy + Pyy = y + 0 = 0 (resp. Qy = Pxy = y). Then by Lemma 5.3.1, the transformation Q[Y, TY,..., T'~",_1yj: (nV-")-*M + Im PN is invertible. Now we check that LA{kYl = Z{Ik-QTQ)-lQY (5.4.3) where Z = [0 ■•• 0 I]{Q[Y, TY,..., T'~q'lY]yl:lmQ-*(n and QTQ is considered as a transformation from Im Q into itself. In view of
164 Applications to Matrix Polynomials the multiplication theorem (Theorem 5.2.1) it will suffice to check that the triple (A, T, Y) is similar to the triple (^ ».. V: aril P) vqz QTQ For then we have L(A) ' = L3(A) '^(A) ', where L4(\) is the right-hand side of (5.4.3) and thus L4(\) = L4(\). To this end define P' = [col[Jf,T[~']?_,]~l col[XT-']«.,: £"'-* <p"' where A', = Aji?, Tj = 7^. Then P' is a projector and Im P' = if. Indeed, we obviously have P'y — y for every y£if. Further, formula (5.1.9) implies that KerP'Dlm[y, TY,. . . , r,~'"1K] In fact, we have the equality: Ker P' = Im[y, TY,..., T'~qiY} (5.4.4) To check this, let y GKer P'. As [Y, TY,..., T,~"~lY] is invertible, we have y = E,'lo T'Yxi for some *„,...,.*,_,£ <p". Now o = coi[XT-']Uy i_ii« .. — AT AT »-i [y, 7Y,...,r-1y] LJCi_, and formula (5.1.9) easily implies that X/_q = ■ ■ ■ = x,_1 = 0. Hence (5.4.4) follows. In view of Lemma 5.3.1 the transformation [Y, TY,. . . , T''qlY\ is one-to-one; therefore dim Im[y, TY,..., Tl~q~1Y] = (/ - q)n Using (5.4.4) and the fact that P'^ = /, it follows that if and Im[y, TY,. . . , T'~q~*Y\ are direct complements to each other in <p"'. Thus P' is indeed a projector. Define 5: <p"'-*Im P + Im Q by 5 = ft where P' and (2 are considered as transformations from <p"' into Im P' and
Proof of Theorem 5.3.2 165 Im Q, respectively. One verifies easily that 5 is invertibie. We show that [*, o)s = x, 5r=[r0' %£\s, 5y = [e°y] (5.4.5) Take y£ (p"'. Then P'yE^and co\[XJ\'lP'y]ki=l = co\[XT'ly]ki=i. In particular, XtP'y = Xy. This proves that [Xx 0]5 = X. The second equality in (5.4.5) is equivalent to the relations P'T= TiP' + VqZQ (5.4.6) and QT= QTQ. The last follows immediately from the fact that Ker Q is an invariant subspace for T. To prove (5.4.6), take y G <p"'. The case when y e Ker Q = Im P' is trivial. Therefore, assume that y £ Ker P'. We then have to demonstrate that P'Ty = VtZ2Qy. Since y £ Ker P', there exist *o. ■ • • . */-,-i e <P" such that u = E!=? ^"'"'Kx,.,. Hence with uGKerF' and, as a consequence, P'Ty = P'Tl'qYx0. But then it follows from the definition of P' that P'Ty = [T\"V,,. . . , T.V;, V,]col[0,. . . ,0, *0] = V„ On the other hand, putting x = col[jc, jjl*, we obtain Qy = Q[T'~"~'Y, ...,TY, Y]x = [(GrQj'-'-'Qy,. . . , (QTQ) ■ QY, QY]x and so VqZQy is also equal to Vqxu. This completes the proof of equation (5.4.6). Finally, the last equality in (5.4.5) is obvious because P'Y = 0. We have now proved equality (5.4.3), from which it follows that (Q, QTQ, QY) is a standard triple for L4(A). Now define the monic matrix polynomial MA) = a'""/ - (i/, + u2x + • • ■ + f/,_pA'-p-1)(P^r|Im Pv.)'-"p^y where col[£/,],';? = A'1 and -^ = [^Vy. P^T\lm Pjf • P*Y, • • • . CV^|im Pjf) Pjr*\ Then (A", P^T ilm Pji_, PXY) is a standard triple for Lt(\). Indeed, this follows from the equalities
166 Applications to Matrix Polynomials PJl.Y=Aco][8n\'rP,PJfTilaPyA = AC2, XA = [0- • 0 /] (5.4.7) where C2 is the second companion matrix of Lj(A). The first and third equations of (5.4.7) follow from the definitions; the second equality follows from the structure of C2 using the fact that A col[£/,]Jl^ = /. Now Proposition 5.4.1, (b) implies that L4(A) = L,(A)L2(A) for some (necessarily monic) matrix polynomial L2(\). So in order to prove the direct statement of Theorem 5.3.2 we have only to verify that (Z\M, PMT\M, Y) is indeed a standard triple for L2(X). To this end, put L2(A) = /A'" - Z^(P.ttTlM)p-'(Vl + V2\ + --- + Vp_q\"-"~l) where tt V2 •■■ K|,_,] = [col[Z|J(((P<,^|.lf)'■]f_-0'-,]-, (5-4.8) Note that, in view of Lemma 5.3.1, the invertibility of the transformation on the right-hand side of (5.4.8) follows from the invertibility of A. As shown earlier in this section, (Z^, PMT^M, Y) is a standard triple for L2(X), and L4(A) = L,(A)L2(A) for some monic matrix polynomial Lt(\) with standard triple (X, PvT|lm/v PVY). Hence L,(A)= L,(A), and thus L2(\) = L2(A). Consider now the proof of the converse statement of Theorem 5.3.2. This statement amounts to the following: if L(A) = L4(A)L3(A) for some monic matrix polynomials L4(\) and L3(\), then there is a unique T-invariant subspace !£ such that (X^., T^,, Y), with y=[col[JfT']»-o|^]"lcol[8I.,/]?_l, is a standard triple for L3(A). Here q is the degree of L3(\). Let C be the first companion matrix of L(A). Proposition 5.4.1 implies that ccoi^.r.L'z^coi^r,];:^, (5.4.9) where (X,, 7",, V,) is a standard triple for L3(A). Also c coi[xr]':l0 = coi[XT']J;0r (5.4.10) Eliminating C from (5.4.9) and (5.4.10), we obtain r[coi[A-r];:;j-,[coi[jf1r1];:1,] = [coi[jfr,];;0]-l[coi[jf1T'I]j:0]r (5.4.11) This readily implies that the subspace 2=lmacol[XT,fr_l0]-l[col[XlT'Jrl0])
Example 167 is T invariant. Moreover, it is easily seen that the columns of [col[AT'],Zo]~'[col[A'17"1]1'~o] are linearly independent; equation (5.4.11) implies that in the basis of informed by these columns, T^is represented by the matrix Tt. Further jf[coi[jfr'];:i]-,[coi[jf1r,l];;i] = jf1 so X\x is represented in the same basis in Z£ by the matrix X. Now it is clear that (A'j^,, T|^, Y) is similar to (A\, 7\, y,), and thus (X^, T^, Y) is also a standard triple for L3(A). It remains to prove the uniqueness of 5£. Assume that if' is also a ^-invariant subspace such that (X^., T^,, Y) is a standard triple for L3(A) (for some admissible Y). As any two standard triples of L3(A) are similar, there exists an invertible transformation 5: if'- T,y, = 5 Ti^S. Then •if such that X^, = X^S, col[AT']; -o \je' col[XT li=0 | if ,5 In particular Im(col[*nU| *) = Im(col[*n;:J, ^) But the matrix co\[XT']'i=l0 is invertible, so $' = if. Theorem 5.3.2 is proved completely. 5.5 EXAMPLE We illustrate Theorem 5.3.2 with an example. Let A L(A) A(A-l)2 0 A2(A-2)J Then 10 0 0 0 0] 0 1 0 0 0 oJ' is the standard triple for L(A) of Proposition 5.1.2, where
168 Applications to Matrix Polynomials "0 0 0 0 0 -0 0 0 0 0 0 0 1 0 0 0 -1 0 0 1 0 0 -1 0 0 0 1 0 2 0 0 0 0 1 0 2 c = is the companion matrix for L(A). As we are concerned with semiinvariant subspaces for C, it is more convenient to use a Jordan form for C in place of C itself. The only eigenvalues of L(A) (and thus also of C) are 0,1, and 2. A calculation shows that the vectors x, = <-l, 1,0,0,0,0) , x2 = <0,0, -1,1,0,0) form a Jordan chain of C corresponding to 0; the vector jt3 = (1,0,0,0,0,0) is an eigenvector of C corresponding to 0; the vectors jc4 = <1, 0,1, 0,1,0), x5 = (0,0,1,0, 2,0) form a Jordan chain of C corresponding to 1; and the vector jc6 = (-1,1, -2, 2, -4, 4) is an eigenvector of Ccorresponding to 2. The vectors jc, , . . . , x6 are easily seen to be linearly independent. Denoting by 5 the invertible 6x6 matrix with columns xu . . . , jc6, let ri oooo o] r-i oiio -l] Lo iooo oJ L ioooo lJ 7=5-C5 = [^ J]©[0]©[J j]©[2] (5.5.1) (J is the Jordan form of C); and Y = S~* -o o- 0 0 0 0 0 0 1 0 -0 1- - o 0 1 -1 1 - 0 4 1 2 1 -1 1 I d Clearly, (X, J, Y) is a standard triple for L(A). We now find some factorizations L(A)=L,(A)L2(A)L3(A) (5.5.2) where L-( A), i — 1, 2, 3 are monic matrix polynomials of the first degree. As
Example 169 in Theorem 5.3.2, we express these factorizations in terms of the supporting triinvariant decompositions $6 = £ + M + J{ (5.5.3) with respect to the standard triple (X, J, Y). So we are looking for 7-semiinvariant subspace M with if and Z£ + M J invariant, such that the transformations X\x: if —* <p and X .XJi\x are invertible. In particular, dim if = dim M = dim M = 2. As if and <£ + M are / invariant, we have if = (if n 9i0(j)) + (se n »,(/» + (if n $2(/)) and if + ^ = ((j? + M) n $„(./)) + ((i? + j«) n »!(/)) + ((i? + j«) n $2(/)) where i%A (7) is the root subspace of J corresponding to the eigenvalue A0. We consider only those supporting triinvariant decomposition (5.5.3) for which dim(i? n %{J)) = dim(if n &,(./)) = 1, dim(if n 912{J)) = 0 (5.5.4) and dim((if + M) n 38o(/)) = 2 , (5.5.5) dim((if + i)n»,(/)) = dim((if + .if)n ^2(7)) = 1 . In other words, we consider only those factorizations (5.5.1) for which detL3(A) = A(A-l) and det(L2(A)L3(A)) = A2(A - 1)(A - 2), or, equivalently detL,(A) = A(A-l); det L2(A) = A(A - 2) ; det L3(A) = A(A - 1) One could consider all other factorizations (5.5.2) of L(A) in a similar way. First, we find all pairs of /-invariant subspaces (if, if 4- M) with the
170 Applications to Matrix Polynomials properties (5.5.4) and (5.5.5). Using the Jordan form (5.5.1), it is not difficult to see that all such pairs are given by the following formulas: (a) !£ = Spanje, + ae3, e4}\ !£ 4- M - Span{e,, e3, e4, e6}, where a G <p is arbitrary. (b) i? = Span{e3, e4}; <£ + M =Span{e,, e3, e4, e6}. (c) i? = Span{e,, e4}; Z£ 4- M = Span{e,, e2 + fie3, e4, e6}, where /3 e <f is arbitrary. Let us check which of these pairs (if, !£ 4- M) give rise to supporting triinvariant decompositions, that is, for which pairs the transformations are invertible. We have for if = Span{e1 + ae3,e4): Ai* L i oJ (in the basis et + ae3, e4 in if and the standard basis in <p2), and this matrix is invertible for all a G (p. For i? = Span{e3, e4} L|if Lo oJ which is not invertible. For ££ + M= Span{e,, e3, e4, e6} "-1 1 LXj\\X+Jt which is invertible. For if 4- M = Span{e,, ez + /3e3, e4, e6} L XJ \\se+M 111 —1 10 0 1 0 0 1-2 L 0 0 0 2J -1 /3 1 1 0 0 0 -1 1 L 0 10 -1 1 -2 2J which is invertible if and only if /3 # 0. (In this calculation we have used the formula [£]- "-1 1 0 . 0 0 0 -1 1 1 0 0 0 1 0 1 0 0 0 1 0 -1" 1 -2 2.
Factorization into Several Factors and Chains of Invariant Subspaces 171 Summarizing, one obtains all the supporting triinvariant decompositions (5.5.3) with the properties (5.5.4) and (5.5.5) where either 3! = Span{e, + ae3,e4} for some a £ <p, M is a direct complement to Span{ej, e3, e4, e6} in <p6, or, for some nonzero /3 E <p we have i? = Span{ej, e4}, J< is a direct complement to if in Span{e,, e2 + /3e3, e4, e6}, and ^V is a direct complement to Span{e,, e2 + ^eiy e4, e6} in <p . Using the formulas given in Theorem 5.3.2, one finds all the factorizations (5.5.2) corresponding to the supporting triinvariant decomposition with properties (5.5.4) and (5.5.5) (here a E <p and /3 E <p, /3 ¥■ 0 are as above): , , r A— 1 A-11TA -A + 21TA-1 A-ll L^=[ 0 A JLo A-2JI 0 A J A-l -1 0 A 5.6 FACTORIZATION INTO SEVERAL FACTORS AND CHAINS OF INVARIANT SUBSPACES In this section we study factorizations of the monic n x n matrix polynomial L(A) of degree / into the product of several factors: L(A)=L,(A)L2(A)-■■/,,(A) (5.6.1) where L((A), . . . , Lk{k) are monic n x n matrix polynomials of positive degrees /,,..., lk, respectively (of course, /, + ■■• + lk = I). We have already encountered particular cases of factorizations (5.6.1) in Theorem 5.3.2 (with k = 3) and in Corollary 5.3.3 (with k = 2). In Theorem 5.3.2 factorizations (5.6.1) with k = 3 were described in terms of supporting triinvariant decompositions associated with semiinvariant subspaces of a linearization of L(A). In contrast, the description of (5.6.1) is to be given in terms of chains of invariant subspaces for a linearization of L(A). The following main result can be regarded as a generalization of Corollary 5.3.3. Theorem 5.6.1 Let (X, T, Y) be a standard triple for L(A). Then for every chain of T-invariant subspaces p+2 p+2 L(A) P P 2 2 P A+£J 2 2(0 + 1) A+0 P 2 'P A- 2(P + 1) P .
172 Applications to Matrix Polynomials {0} c£tci?t_, c • • • c^2c <p"' (5.6.2) satisfying the property that the transformations X XT LOT™'"'J :&r j = 2,...,k \*i are invertible {for some positive integers mk < mk_l <■ ■ ■ <m2< I) there exists a factorization (5.6.1) of L{\), with the factors L;(A) uniquely determined by the chain (5.6.2), as follows. For j = 1, 2,. . . , k - 1, let Mj be a direct complement to i^ + 1 in &■ {by definition, =Sf, = <J7") and let PM\ yt^Mj be the projector on M t along £j+1. Then for j = 1,2,. . . , k — Lj{\) = A'<I-(Wn + \Wf2 + ■ • • + I'^W^iPjT^PjY; (5.6.3) where /■ = m; - mj+1 {by definition, ml = I) and Y, = {aAlXT1]^)-' collfi,^/]"/, (5.6.4) and tfie transformations Wyj: ^y—» (p" (i = 1,. . . , /.) are determined by col[W„]L = [PMYr PMT{MPMY.t,..., {PMTlM/PM)'riPMYl} Further L,(A) = A"*/- A^(7Vtr*(KtI + V,2A + • • • + V^A""-1) (5.6.5) w/iere [v*. n2 ••• ^Bfj=(coi[jfr'r.*oiitr,:rM*-^ so V^ are transformations from <p" into i^ /or # = 1,. . . , mk. Conversely, for every factorization (5.6.1) of L{\) there is a unique chain of T-invariant subspaces (5.6.2) such that for j = 2,3,..., k the transformations col[Xr)r>o^:^-rm' where m; = lt + lj + i + • • • + lk (/, is the degree of Ly), are invertible and formulas (5.6.3) and (5.6.5) hold. Observe that in view of Proposition 3.1.1 formulas (5.6.3) do not depend on the choice of M;. Proof. Apply Corollary 5.3.3 several times to see that factorization
Factorization into Several Factors and Chains of Invariant Subspaces 173 (5.6.1) holds for the monic matrix polynomial Ly(A)L/+1(A)- ■ ■ Lk(\) having the standard triple (A^, T^, Yj), where Yt is given by (5.6.4) (j = 2,. . . , k). Now use Theorem 5.1.6 to produce the formulas L/(A)L. + I(A)---Lt(A) = W-X^T^)m'{Vii + ---+Vhm\mrl), j = 2,...,k where [ytl Vji ■■■ vj„) = [ccA[XTi]T^\tlrl--cm'^^ (so Vj are transformations from <p" into if; for q — 1, . . . , m;). In particular (with j = k), formula (5.6.5) follows. Further, using the formulas for the standard triple of the factor L2(A) in Corollary 5.3.3, one easily obtains the desired formulas [equation (5.6.3)]. The converse statement also follows by repeated application of the converse statement of Corollary 5.3.3. □ A "dual" version of Theorem 5.6.1 can be obtained if one uses the left canonical form [equation (5.1.14) instead of the right canonical form equation (5.1.13)] to produce formulas for L;(A)L/+1(A)■• ■ Lk(\). Then one uses (5.1.13) [instead of (5.1.14)] to derive the formulas for Lj(A),. . . , LA_j(A). We omit an explicit formulation of these results. We are interested particularly in factorizations (5.6.1) with linear factors Lj(\): L}(\) = A/+ A j for some n x n matrices Aj (;'= 1,. . . , k). Note that in contrast to the scalar case, not every monic matrix polynomial admits such a factorization: example 5.6.1. Let We claim that L(A) cannot be factorized into the product of (two) linear factors. Indeed, assume the contrary: TA2 -llU + a, bx ITA + fl, b2 ] Lo A2 J I- c, A + rfJL c2 A + dJ (56'6) for some complex numbers a,, bit c,, dt,i = \, 2. Multiplying the factors on the right-hand side and comparing entries, we obtain a1 + a2=0; b2 + bt = 0 c, + c2 = 0 ; dl + d2 = 0 Letting
174 Applications to Matrix Polynomials -[:: si we can rewrite equality (5.6.6) in the form [AQ ~J]=(XI+A)(\I-A) which implies A2 = . However, there is no 2 x 2 matrix A with this property (indeed, such an A must have only the zero eigenvalue, but then inevitably A2 = 0). □ As we shall see in the next theorem, a necessary (but not sufficient) condition for a monic matrix polynomial L( A) not to be decomposable into a product of monic linear factors is that the linearization of L(A) is not diagonable. Indeed, in Example 5.6.1 the linearization of 2 has only one Jordan block 74(0) in its Jordan form. Theorem 5.6.2 Let L{ A) be an n x n monic matrix polynomial of degree I for which the companion matrix is diagonable. Then there exist n x n matrices AV,...,A, such that L(\) = (\I + Al)(\I+ A2)--(\I+ A,) Proof. Let (X, T, Y) be a standard triple for L(\), and let Jfj = Ker X XT 7 = 1 / — 1 LaT"'. Obviously, the Jfj are subspaces in <p"' and ^D^D-O/,., By Theorem 1.8.5 there exist ^-invariant subspaces Mx C M2 C - • - C M,_{ such that Mj is a direct complement to N} in <p"'. The transformations col[^T'K:^;: M,^ <T , ; = 1,...,/- 1 (5.6.7) are invertible. Indeed, by the choice of M{ we have Ker(col[AT']'Io|^ ) = {0}. As the matrix col[AT']j:', is invertible, the matrix col[AT']j:,' has linearly independent rows and thus \m{co\\XTl^\[M ) = Im(col[A7"']|:£) = <P"', j = 1, ...,/- 1. Invertibility of (5.6.7) now follows. The proof is completed by applying Theorem 5.6.1. □
Differential Equations 175 5.7 DIFFERENTIAL EQUATIONS Consider the homogeneous system of differential equations with constant coefficients: d'xjt) dt1 i-o d'x(t) dt' o, /e[o,») (5.7.1) where Au,. . . , At_l are n x n (complex) matrices, and x(t) is an n- dimensional vector function of t to be found. The behaviour of solutions of equation (5.7.1) as t—»°° is an important question in applications to physical systems. We look for solutions with prescribed growth (or decay) at infinity. It will turn out that such solutions depend on certain invariant subspaces of a linearization of the monic matrix polynomial i-\ L(A) = /A' + E At\' connected with (5.7.1). First we observe that a solution of (5.7.1) is uniquely defined by the initial data xu\a) = xj, j = 0, ...,/- 1, with given initial vectors xa,. . . ,x,_1. Indeed, denoting by y(t) the «/-dimensional vector r *(') -i x'(t) ,('-') »J equation (5.7.1) is equivalent to the following equation: dy{t) _ dt 0 0 0 L-i4n / 0 0 -A, 0 / 0 -A 0 0 -^,-,-1 y(t), re [a, oo) (5.7.2) As it is well known [cf. Section 2.10, especially formula (2.10.8)], a solution of equation (5.7.2) is uniquely defined by the initial data y(a), which amounts to the initial data xu\a), j - 0,...,/- 1 for equation (5.7.1). In particular, the dimension of the set of all solutions of (5.7.1) (this set obviously is a linear space) is nl, the number of (complex) parameters in the n-dimensional vectors jt0, . . . , x,_x that determine the initial data of a solution and thus the solution itself. It will be convenient to describe the general solution of (5.7.1) in terms of a standard triple (X, T, Y) of the monic matrix polynomial L(\).
176 Applications to Matrix Polynomials Lemma 5.7.1 A function x{t) is a solution of (5.7.1) if and only if it has the form x{t) = Xe'Tc, tE[a,n) (5.7.3) for some vector c £ <p". Proof. Differentiating (5.7.3), we obtain n> (T- xu\t) = XT'e"c, 7 = 0,1, so i-i ^P-+lA, ^ = XT'e'Tc + tAtxre'Tc = I XT1 + £ A.XT^u dt' y = o ' dt' y = 0 ' ^ j = 0 ' ' which is equal to zero in view of Proposition 5.1.4. It remains to show that every solution of (5.7.1) is of the type (5.7.3) for some c G <p"'. As the linear space of all solutions of (5.7.1) has dimension nl it will suffice to show that the solutions Xe'Tct,. . . , Xe' cnl that correspond to a basis c1,...,cnl'm <p"' are linearly independent. In other words, we should prove that Xe'Tc = 0 for all t^a implies c — 0. Indeed, differentiating the relation Xe'Tc = 0 j times, we obtain XT'e'Tc = 0 for / = 0,1,2, . . . . In particular X XT \-XT i-\ e'Tc = 0 As the matrices e'T and col(AT')j=o are nonsingular (Proposition 5.1.4), it follows that c = 0. □ Now let us introduce some T-invariant subspaces: 9t+{T) [resp. 3i_{T)\ is the sum of all root subspaces of T corresponding to its eigenvalues with positive real part (resp. with negative real part); 310{T) is the sum of all root subspaces of T corresponding to its pure imaginary eigenvalues (including zero); and Xo(T)= 2 Ker(r-A07) A0Sa(T) Obviously, 3fc0(T) is a T-invariant subspace contained in 3i0{T). If it happens that T has no eigenvalues with positive real part, we set 3i+{T) = {0}. A similar convention will apply for 9l_(T), 9l0(T), and 9CQ(T).
Differential Equations 177 Let 9Cy(T) be a fixed direct complement to JC0(T) in &t0{T) and note that 3ifi(r) is never T invariant [unless JC1(T) = {0}]. Otherwise ft^T) would contain an eigenvector of T that, by definition, should belong to 3C0(T). We now have the direct sum <p"' = 91 (T) + 9C0(T) + 3fCt(T) + 91JJ). For a given vector c £ <p"', let c = c_ + c0 + c, + c+ (5.7.4) where ce».(r), c0E3if0(r), c,e3Sr,(r), ct6»+(r). We describe the qualitative behaviour of solutions of (5.7.1) in terms of this decomposition of the initial value of the solution x(t). A solution x(t) of (5.7.1) is said to be exponentially increasing if for some positive number p. 0<\mi\\e->ux(t)\\^™ (5.7.5) but lim||e"u+"'jc(0||=0 (5.7.6) for every e >0. Obviously, such a positive number p is unique and is called the exponent of the exponentially increasing solution x(t). A solution x(t) of (5.7.1) is exponentially decreasing if (5.7.5) and (5.7.6) hold for some negative number p [which is unique and is called again the exponent of x(t)]. We say that a solution x(t) is polynomially increasing if 0<iim||rmjc(0l|<a> for some positive integer m. Finally, we say that a solution x(t) is oscillatory if 0<iirn||jc(0l|<ao These classes of solutions of (5.7.1) can be distinguished according to the decomposition (5.7.4) of the vector c, as follows. Theorem 5.7.2 Let x(t)-Xe'Tc be a solution of (5.7.1). Then (a) x(t) is exponentially increasing if and only ifc+ # 0; (b) x{t) is polynomially increasing if and only if c+ = 0, Cj ¥" 0; (c) x(i) is oscillatory if and only if c+= c, = 0, c0 # 0; (d) x(t) is exponentially decreasing if and only if c+ = c, = c0 = 0, c_ ^0. In cases (a) and (d), the exponent of x(t) is equal to the maximum of the real parts of the eigenvalues A0 of T with the property that P^c # 0, where PA is the projector on 3ikQ{T) along EAeo.(T)$A(r).
178 Applications to Matrix Polynomials Proof. We have x(t) = Xe'T c + Xe'To(c0 + c,) + Xe'T+c+ (5.7.7) where T = T^ (r), T0 = TjS) (r), T+ = Tim <T). Without loss of generality [passing to a similar triple (X, T, Y), if necessary] we can assume that T_, T0, T+ are matrices in Jordan form. Note that for the Jordan block Jk( A) we have (according to Section 2.10) ,"*(*) 1! te t (*-l)! 1 1! te So every entry in Xe' +c+ is a function of the type 2 *>,(f) i*< A,>0 (5.7.8) for some polynomials /»,-(')■ Also, every entry in ^e'T°(c0 + c,) is of the type A,e<r(r) 38« A, = 0 (5.7.9) whereas every entry in Xe'T°c0 is of the type (5.7.9) with all polynomials Pj(t) constant. Finally, every entry in Xe'Tc_ is of the type 2 e'k'pt{t) A,e<r(T) 38. A,<0 (5.7.10) Further, note that Xeu±c^=0 for all f>a (5.7.11) if and only if c± =0. Indeed, if equality (5.7.11) holds, then successive differentiation gives XT'±e'Tlc± = 0, / = 0, 1,.... In particular •T±~ =i [coi[*n;:0]lflMT/'*c±=o (5.7.12) As co1[AT']'=q is a nonsingular matrix, the transformation
Differential Equations 179 [col[Ar]i:Xt(r>:SUn-»r' has zero kernel, and equation (5.7.12) implies c± =0. Also, the equality Xe,T°(co + ci) = 0, t>a holds if and only if c0 + cl- 0. Also Xe'T°c0 = 0 , t > a if and only if c0 = 0. According to the observation made in this and the preceding paragraphs, statements (a)-(d) follow easily from formula (5.7.7). For instance, assume that x{t) is exponentially increasing. In view of (5.7.8)-(5.7.10), this means that Xe'T'c+^0 (since \ez\ = e*" for any complex number z), and this is equivalent to the inequality c+ ^ 0. □ A special case with X= [I 0 ■ ■ • 0] and T the companion matrix of L(A) deserves special attention. In this case the matrix col[AT']|lo is just the identity, and thus *(0) *'(0) c - ,('-!) L^'-"(0). Exponentially decreasing solutions of (5.7.1) are of particular interest. We present one result on existence and uniqueness of exponentially decreasing solutions in which only partial initial data are prescribed. Theorem 5.7.3 For every set of k vectors xQ,. . . , xk_l in <p" there exists a unique exponentially decreasing solution x{t) of (5.7.1) such that xt'\a) = xl, i = 0,...,*-l if and only if the matrix polynomial L( A) admits a factorization L( A) = L2(A)L,(A), where Lj(A) and L2(A) are monk matrix polynomials of degrees k and I - k, respectively, such that Re A <0 for all A E o"(L,) and 3le A>0 for all AEcr(L2). Proof. In the notation of Theorem 5.7.2 the solution x{t) is exponentially decreasing if and only if x{t) = Xe,Tc_ (5.7.13) where c_ E 91_{T). When *(/) is given by (5.7.10) we have
180 Applications to Matrix Polynomials x{i\a) = Xte'Tc_, i = 0,l,2,... It follows that for every set jc0, . . . , jc^, G <p" there exists a unique exponentially decreasing solution x(t) of (5.7.1) with jc<0(a) = *,, i = 0,..., A: — 1 if and only if the transformation is one-to-one and onto. This amounts to the invertibility of col[X(T\m_(7-)),]*J0l> which in turn is equivalent (by Corollary 5.3.3) to the existence of a factorization L(A) = L2(A)L,( A). Moreover, in this factorization [X\m (r), T^ (r), Y] is a standard triple for L,(A) (for a suitable Y), whereas (X, PT\lmP, PY) is a standard triple for L2(A) for a suitable X, where P is the projector on 9l0(T) + 3l+(T) along 3i_{T). As r^_(r) and PT\lmP are linearizations of L,(A) and L2(A), respectively (Theorem 5.1.5), it follows that indeed 0ie A<0 for all AE<r(L,), and 3le A>0 for all aeit(l2). a 5.S DIFFERENCE EQUATIONS In this section we consider the system of difference equations xj+t + At_xxj+l^ + --- + AoXj = Q, / = 0,1,... (5.8.1) where AQ,. . . , A,_l are given « x n matrices, and {jCy}JL0 is a sequence of n-dimensional vectors to be found. Clearly, given / initial vectors jc0, . . . , */_,, the vectors xh jc,+], and so on are determined uniquely from (5.8.1). Hence, a solution {*y}JL0 of equation (5.8.1) is determined by its first / vectors. Again, it will turn out that the asymptotic behaviour of solutions of (5.8.1) can be described in terms of certain invariant subspaces of a linearization of the associated monic matrix polynomial i-\ L(A)=/A' + 2 A^' Let (X, T, Y) be a standard triple for L( A). The general solution of (5.8.1) is then {XT'c):.0 (5.8.2) where c E <p" is an arbitrary vector. Indeed, putting jcy = XT'c, j = 0, 1,..., we have
Difference Equations 181 xl+l + i4,_,je|.+/_1 + • • • + A0Xj = XTI+Jc + A^,XT'+l lc + ■ ■ ■ + A0XT'c = {XT1 + A^XT1'1 + ■■■ + A0X)Vc which is zero in view of Proposition 5.1.4. If the first / vectors in (5.8.2) are zeros, that is Xc = XTc=-=XT'~ic = 0 then by the nonsingularity of col[AT']|~o we obtain c = 0. This means that the solutions (5.8.2) are indeed all the solution of (5.8.1). The solutions of (5.8.1) are now to be classified according to the rate of growth of the sequence {*;}JL0. We say that the solution {Xj}J=0 is of geometric growth (resp. geometric decay) if there exists a number q > 1 (resp. a positive number q< 1) such that 0<TET||<r'"xJ|<oo but for every positive number e. The number q is called the multiplier of the geometrically growing (or decaying) solution {jc;-}°°=0. The solution {jty}JL0 is said to be of arithmetic growth if for some positive integer k the inequalities 0<TFrrr||m"*xJ|<oo holds. Finally, {*;}7=0 is oscillatory if 0<TFrrri|x J|<°o The classification of the solution *. = XT'c, / = 0, 1, . . . of (5.8.1) in terms of c G <p"' is based on certain T-invariant subspaces. Let us introduce these subspaces. Denote by 91+{T) [resp. £%(T)] the sum of all root subspaces of T corresponding to the eigenvalues A0 of T with | A0| > 1 (resp. with |A0| < 1), and let 5tf\T) be a direct complement to the subspace %°(T)= 2 Ker(r-A0/) U0l = i Ane<r(r) in the sum of ail root substances of T corresponding to the eigenvalues An with |A0| = 1. Observe that 01+{T), 9l~(T), and 3C°(T) are 7invariant. We have a direct sum decomposition
182 Applications to Matrix Polynomials <f"' = 01 +{T) + X°{T) + X\T) + 01 ~(T) according to which every vector c G <f"*' will be represented as + , o , i , c = c + c + c + c Theorem 5.8.1 Let {Xj = XT'c}*=0 be a solution of (5.8.1). Then the solution is (a) of geometric growth if and only ifc+ ^O; (b) of arithmetic growth if and only if c+ =0, c1 t^O; (c) oscillatory if and only if c+ =0, c1 =0, c°^0; (d) of geometric decay if and only if c+ = c1 = c° = 0, c~ # 0. In cases (a) and {d) the multiplier of {jc,}^L0 is equal to the maximum of the absolute values of the eigenvalues A0 of T with the property that Pk c ¥^ 0, where Px is the projector on 01 (T) along EAe„(r) ®X(T). A*A„ The proof of Theorem 5.8.1 is similar to the proof of Theorem 5.7.2 if we first observe that the mth power of the Jordan block of size k x k with eigenvalue A is [■MA)]M = a- (7)a- (»)*■ ■'■ (7)'- o 0 L 0 ■ (*ffl2)'-'*! (5.8.3) (It is assumed here that ( . ) = 0 if y > m.) This formula can be easily verified by induction on m. The following result on existence of geometrically decaying solutions of equation (5.8.1) can be established using a proof similar to that of Theorem 5.7.3. Theorem 5.8.2 For every set of k vectors ya,. . . , yk_x in <p" there exists a unique geometrically decaying solution {*,},, 0 with x0 y». = yk if and only if L(A) admits a factorization L{X) = L2(A)L,(A), where L2(A) and L,(A) are monic matrix polynomials of degrees I— k and k, respectively, such that |A0| < 1 for every A() G <r(L,) and |A0| 2:1 for every A0G cr(L2).
Exercises 183 5.9 EXERCISES For a monic n x n matrix polynomial L( A) of degree /, the pair of matrices (X, T), where X and T have sizes n x nl and nl x «/, respectively, is called a r/g/i/ standard pair for L(A) if (X, T, Y) is a standard triple of L(A), for some n x n matrix Y. (a) Prove that a pair of matrices (X, T) of sizes nx nl and n/ x n/, respectively, is a right standard pair for a monic matrix polynomial L(A) = /A' + Ej:j j4,Ay if and only if co\[xr]'-J0 is in- vertible and i40AT + • • • + A,_xXt~x + XT' = 0 [Hint: The necessity follows from Proposition 5.1.4. To prove sufficiency, define y=(coi[jfr];:i)-lcoi[51.l/];.l (i) and verify that (X, T, Y) is similar to the triple (P,, C,, /?,) from Proposition 5.1.2 with the similarity matrix col[AT']'lo.] (b) Show that given a right standard pair (X, T) of L(A), there exists a unique Y such that (X, T, Y) is a standard triple for L( A), and in fact Y is given by formula (1). [Hint: Use formula (5.1.11) for the similarity between the standard triple (X, T, Y) and the standard triple (Ply C,, r?j) from Proposition 5.1.2.] A pair of matrices (T, Y) of sizes nl x nl and nix n, respectively, is called a left standard pair for the monic n x n matrix polynomial L( A) if for some n x nl matrix X the triple (X, T, V) is a standard triple of L(A). (a) Prove that a pair of matrices (T, Y) of sizes n/ x nl and nix n, respectively, is a left standard pair for L(A) = /A' + EJIq Aj\' if and only if [Y, TY,. . . , r'"'y] is invertible and YA0+ TYAl + ■■■+ T'~lYAl_l + T'Y^0 (b) Show that given a left standard pair (T, Y) of L(A), there exists a unique X such that (X, T, Y) is a standard triple of L(A), and in fact *=[o ••• o j][y,ty,..., r'_ly]_I (c) Prove that (T, Y) is a left standard pair for L(A) = /A' + Ejlo /i^A' if and only if (K*, 71*) is a right standard pair for the monic matrix polynomial Ik' + EJI^ A*X'.
Applications to Matrix Polynomials Let L(A) = A' + EJZq a,A' be a scalar polynomial with / distinct zeros A,,..., A,. (a) Show that (X,T) = ([l 1 ••• 1], diag[A,,...,A,]) is a right standard pair for L(A). Find Y such that (X, T, Y) is a standard triple for L(A). (b) Show that (r,K)= diag[A,,...,A,], 1 1 U is a left standard pair for L( A), and find X such that (X, T, Y) is a standard triple for L(A). Let L(A) = ( A - A0)' be a scalar polynomial. Show that ([1 0 ■ • ■ 0], J,(K)) 's a right standard pair of L(A) and that (J,(A0), col[5,;]J=1) is a left standard pair for L(A). Find X and Y such that ([1 0 • • • 0], /,(A0), Y) and (X,J,(K)> co'[S,i]|=1) are standard triples for L(A). Let L(A) = (A - A,)' • • - (A - \k) * be a scalar polynomial, where A,, . . . , \k are distinct complex numbers. Show that {[Xl,...,Xk],Jli{X1)@---®Jlt(Xk)) and ^(Aje-'-e/jAj, L^J are right and left standard pairs, respectively, of L(A), where X, [1 0 • • • 0] is an 1 x /; matrix and K.= 0 LU is an L x 1 matrix.
Exercises 185 5.6 Let 5.7 5.8 5.9 5.10 5.11 5.12 L(A) 4Ll (A) 0 0 L2(A)J be a monic matrix polynomial, and let (A",, Ty, Yx) and (X2, T2, Y2) be standard triples for the polynomials L,(A) and L2(\), respectively. Find a standard triple for the polynomial L(A). Given a standard triple for the polynomial L( A), find a standard triple for the polynomial S~"'L(A + a)S, where S is an invertible matrix, and a is a complex number. Let (X, T, Y) be a standard triple for L(A). Show that [*,0], o n ro-ix t or iy\) is a standard triple for the matrix polynomial L(A2). Given a standard triple for the matrix polynomial L(A), find a standard triple for the polynomial L(/?(A)), where p(\) = \m + T.J'Jq \'aj is a scalar polynomial. Let l(a) = /a' + S Aj\' be a 3 x 3 matrix polynomial whose coefficients are circulants: h bh c Ak = ubk k = 0,1,...,/-! (ak, bk, and ck are complex numbers). Describe right and left standard pairs of L(X). [Hint: Find an invertible S such that S~'L(A)S is diagonal and use the results of Exercises 5.5-5.7.] Identify right and left standard pairs of a monic n x n matrix polynomial with circulant coefficients. Using the right standard pair of a scalar polynomial given in Exercise 5.5, describe: (a) The solutions of differential equation ;-i / = 0 where a0, . . . , a,^, are complex numbers; (b) The solutions of difference equations f/+i + «,-,JfJ+,-, + "- + fl,Jf/+I +«„*,-=0, 7 = 0,1,.. .
186 Applications to Matrix Polynomials 5.13 Find the solution of the system of differential equations x{'\t) + 2(aJxU)(t) + b/i\t)) = 0 j = 0 Ao + X(V(>)(0 + «>//)(0) = o where a0,. . . , a,_, and b0,. . . , b,_l are complex numbers. When are all solutions exponentially decreasing? When does there exist a nonzero oscillatory solution? 5.14 Find the solutions of the system of difference equations xj+l+ S (akxj+k + bkyj+k) = 0 k = 0 i-\ yi+l+ S (bkyj+k + akxj+k) = 0; j = 0,1,2,... When do all nonzero solutions have geometric growth? 5.15 Find the supporting triinvariant decomposition i? 4- Jt 4- {0} = <p' corresponding to the divisor (A - A,)"1 • • • (A - \k)"k of the scalar polynomial (A - A,)"1 • • • (A - Xk)Pk (here a; < 0y, j = \,...,k, and a; are nonnegative integers). Use the standard triple determined by the right standard pair described in Exercise 5.5. 5.16 Let XI — Xx and XI — X2 be linear n x n matrix polynomials such that the matrix Xx - X2 is invertible. Construct a monic n x n matrix polynomial of second degree with right divisors XI - Xx and XI - X2. [Hint: Look for a matrix polynomial with the standard pair ([/ /], Xy@X2).] 5.17 Let Lj(A) and L2(A) be monic matrix polynomials with no partial multiplicities greater than 1. Show that the product L,(A)L2(A) has no partial multiplicities greater than 2. 5.18 State and prove a generalization of the preceding exercise for the product of k monic matrix polynomials with no partial multiplicities greater than 1. 5.19 Show that a monic n x n matrix polynomial has not more than n partial multiplicities corresponding to any zero of its determinant. (Hint: Use Exercise 2.16.) 5.20 Prove that a monic n x n matrix polynomial of degree / with circulant coefficients has not more than / partial multiplicities corresponding to any zero of its determinant. 5.21 Describe all supporting triinvariant decompositions for the scalar polynomial (A - A0)".
Exercises 187 5.22 Given an n x n monic matrix polynomial L( A) of degree /, a CL-invariant subspace i? is called supporting if the direct sum decomposition Z£ 4- M 4- {0} = £"' is a supporting triinvariant decomposition with respect to the standard triple [/ 0-- 0], CL, ro o Find all supporting subspaces for the scalar polynomial (A-A,)*'(A-A2)*> 5.23 Find all supporting subspaces for the scalar polynomial (A-A,)*'---(A-Ar)*' 5.24 Prove that for a scalar monic polynomial L(A), every CL-invariant subspace is supporting. 5.25 Describe all supporting subspaces for a monic matrix polynomial whose coefficients are circulant matrices, that is, matrices of type «,e<P 5.26 5.27 Give an example of a monic matrix polynomial of second degree with nondiagonable companion matrix that admits factorization into linear factors. Prove the following extension of Theorem 5.6.2 for polynomials of second degree. Let L(A) be a monic n x n matrix polynomial of second degree such that its companion matrix has at least 2n - 1 blocks in its Jordan form. Then L(A) admits a factorization into linear factors (A/- /t,)(A/- A2). [Hint: Let (X, J) be a right standard pair of L(A) with J in the Jordan form. Arguing by contradiction, assume that every n columns of X formed by the eigenvectors of L(A) are linearly dependent. Then the columns in that correspond to the eigenvectors of L(A) are linearly dependent, and this contradicts the invertibility of XJ ]
188 Applications to Matrix Polynomials 5.28 5.29 5.30 A factorization L(A) = L2(A)L3(A) of a monic matrix polynomial L(A) is called spectral if det L2(A) and det L3(A) have no common zeros. Show that the factorization is spectral if and only if in the corresponding triinvariant decomposition i? 4- M + {0} = <p"' (Corollary 5.3.3) the /"-invariant subspace 2£ is spectral. Prove or disprove the following statement: each monic matrix polynomial ^(A) has a spectral factorization corresponding to every triinvariant decomposition Z£ 4- M + {0} = <p" with spectral T- invariant subspaces i? and M, where T is a linearization for L(A). Let a,, a2, a3, a4 be distinct complex numbers, and let L(A) = [ (A-fll)(A-a2) 0 A-a, 1 (A-fl3)(A-fl4)J (a) Show that (fl2-fl3) ' (a2-a4)' 1 1 , diag[fl1,fl2,a3,fl4]j 5.31 is a right standard pair for L(A). (b) Find Y such that (X, T, Y) is a standard triple for L(A). (c) Using the supporting triinvariant decomposition i? 4- M 4- {0} = <p4 with spectral T-invariant subspace !£, find all spectral factorizations of L(A). Let M{ A) and N( A) be a monic matrix polynomials of sizes n x n and mx m, respectively, and of the same degree /, and let ^>-[T 01 (X)i /V(A) 5.32 5.33 5.34 be a direct sum of A/(A) and /V(A). Prove or disprove each of the following statements: (a) the monic matrix polynomials L,(A) and L2(A) in every factorization L(A) = L,(A)L2(A) are also direct sums; (b) same as (a) with the extra assumption that M( A) and /V(A) do not have common eigenvalues. Verify formula (5.8.3). Supply the details for the proof of Theorem 5.8.1. Prove Theorem 5.8.2.
Chapter Six Invariant Subspaces For Transformations Between Different Spaces We are now to generalize the notion of an invariant subspace for transformations from <p" into <p" in such a way that it will apply to transformations from <pm+" into <p", or from <£" into $m+". The definitions introduced will have associated with them a natural generalization of similarity, called "block similarity", that will apply to transformations between different spaces. This will form an equivalence relation on the class of transformations between two given (generally different) spaces. A canonical form is developed for this similarity that is a generalization of the Jordan normal form. These ideas and results are then applied to the resolution of two spectral assignment problems. This really means analysis of the changes in spectra brought about by block similarity transformations. Although this material is based on the theory of feedback in time- invariant linear systems, the presentation here is in the framework of linear algebra. 6.1 [A B]-INVARIANT SUBSPACES Consider a transformation from <pm+'' into <p". Our objective in this section is to develop and investigate a generalization of the notion of an invariant subspace that will apply to such transformations and that reduces to the familiar concept when m = 0. Let P be the projector on $m+" that maps each vector onto the corresponding vector with zeros in the last m positions. We treat vectors of <pm+" in terms of their components in Im P and 189
190 Invariant Subspaces for Transformations Between Different Spaces Im(/ - P), respectively, and, for x = {.*,,. . . , xm+n) E. §m+" we identify Px = (xt, . . . , xn,0,. . . ,0) with {.*,, . . . , xn) e<p". Then we may represent any x £ <pm+" as an ordered pair (Px, (/ - P)x) and, with respect to this decomposition, a transformation from $m+" into <p" can be written in the block form [A B] where A: $"-*$" and fl: <pm-> <p". We also write [A B}:$" + <pm^<p". A subspace J( of <p" will be said to be [A B] invariant if there is a subspace Sf of £m+'' with M = P^and[A B]y C Py = M. Of course, when m = 0, P = /, and this is interpreted as the familiar definition AM (ZM tor A in variance. We now characterize this concept in different ways and, for this purpose, introduce another definition. Given a transformation [A B]: <p" 4- £"■-»• <p", a transformation T: <pm+"^> $m+n is called an extension of [/I B] if it has the form T=\A B] IC D for some transformations C: $"—* <pm and D: <£""—> <pm. Theorem 6.1.1 Let M be a subspace of <p" and [/I B] be a transformation from $m+n into <p". Then the following are equivalent: (a) M is [A B] invariant; (b) there exists a subspace y of <pm+" with M = Py and an extension of [A B] under which y is invariant; (c) the subspace M satisfies AMGM+lmB (6.1.1) (d) there is a transformation F: <p"—> <fm such that (A + BF)MCM (6.1.2) Proof. The theorem will be proved by verifying the implications (a)^>(d)^>(c)^>(b)^>(a). (a)=>(d): Since M is [A B] invariant, there is a subspace Sf of <pm+" with M = py and [A B\y C ^. Let at, ,. . . , xk be a basis for J(. Then there exist z,, . . . , z* £ Sf such that Xj = Pzjf j= 1,2,..., k. Define y; = (/- P)z, G <pm, j = 1,2,. . . , k and then, since ' \ey,[A B]ycM implies that, for / = 1,2,. . . , k, AXj + By j E. M. Now define a transformation F: <p" -> <pm by setting f^ = _y; for / = 1,. . . , £ and letting F be arbitrary on some direct complement to M in <p". Then for any m = E*=1 a^ £i we have (.4 + BF)m = 2! "/(A^. + By,) e M as required.
[A fi]-Invariant Subspaces 191 (d)=>(c): Given condition (6.1.2) we have, for any xEM Ax = (A + BF)x - BFx e M + Im B and (6.1.1) follows. (c)=>(b): Let *,, . . . , xk be a basis for M and, using formula (6.1.1), let yl,. . . , yk be vectors in <£"" for which Axi + Byi E M for / = 1, 2,. . . , k. Define a transformation H: <p"—><pm by means of the relation Hxj = y/, j = 1, 2,. . . , k and letting H be arbitrary on some direct complement to M in <p". Then define the subspace V of $m+" by S^ = m 1 i and note our construction ensures that (.4 + BH)m G M for any mE M. Consider the extension of [/I B]. It is easily verified that 5^ is L r/yH no j invariant under this extension. (b)=£>(a): This follows immediately from the definitions. □ We will find the next simple corollary useful. Corollary 6.1.2 With the notation of Theorem 6.1.1, if M is [A B] invariant, then for any transformation F: $"—* <pm, M is [A + BF B] invariant. Proof. We use the equivalence of statements (a) and (d) of the theorem. The fact that M is [,4 B] invariant implies the existence of an F0: <pm—» <pm such that M is (A + BF0) invariant. Thus, for any F: <p"—* <pm, M is invariant under A + BF + B(F0 - F). Consequently, M is [A + BF B] invariant. □ Subspaces characterized by equation (6.1.1) are described in more geometric terms by replacement of Im B by some subspace V of <p". In this context it is useful to describe a subspace M as A invariant (mod T) if AMdM+r When T = {0} a subspace is A invariant (mod V) if and only if it is A invariant. At the other extreme, when V = <p", every subspace is A invariant (mod T). For a given transformation A: <p" —> <p" and a subspace V of <p", consider the class of all subspaces that are A invariant (mod V). It is easy to see that this class is closed under addition of subspaces, but is not closed under intersection. This is illustrated in the next example. We observe that
192 Invariant Subspaces for Transformations Between Different Spaces (reverting to the language of transformations), although the set of all /1-invariant subspaces form a lattice, the same is not generally true for the set of all [A Z?]-invariant subspaces. example 6.1.1. Let A: <p3—><p3 be defined by linearity and the equalities Ael = e2, Ae2 = el, Ae3 — ey. Let V = Span{e2 + e3}. The subspaces Span{e1; e2} and Span{e,,e3} are both A invariant (mod T). (The sub- space Span{e,,e2} is actually A invariant.) However, their intersection Span{e,} is not A invariant (mod T). Indeed, Aet = e20Span{el} + Span{e2 + e3). □ Given A and V as above, it is natural to look for a "largest" subspace among all of those that are A invariant (mod V). More generally (cf. Section 2.7), given a subspace M of <p", a subspace °U of M that is A invariant (mod V), is said to be maximal in M if % contains all other subspaces of M that are A invariant (mod T). Proposition 6.1.3 For every subspace M C <p" there is a unique subspace of <p" that is A invariant (mod V) and maximal in M. Proof. Let % be the sum of all subspaces that are A invariant (mod V) and are contained in M. Because of the finite dimension of M, % is in fact the sum of a finite number of such subspaces. Consequently, % is itself A invariant (mod V) and thus maximal in M. The uniqueness is clear from the definition. O 6.2 BLOCK SIMILARITY In the preceding section the idea of [,4 B]-invariant subspaces has been developed where [A B] is viewed as a transformation from (f"1 + <pm into <p". We must also consider transformations of the other kind, namely, those acting from <p" to <p" 4- <pm. Such transformations can be written in the form where A: <p"—><p" and C: (pn—»(pm. For these transformations, [ a~\ r a ~* need a dual concept of -invariant subspaces where A we is viewed as a transformation from <p" into <p" 4- <pm. Thus, guided by Proposition 1.4.4, it is natural to define a subspace M of <p" is r\ invariant if and only if M x is [A* C*] invariant in the sense of Section 6.1. We develop this idea in Section 6.6. The purpose of this section is to generalize the notion of similarity to transformations [A B] and in a way that will be consistent with the definitions of these generalized invariant subspaces.
Block Similarity 193 Let us begin with similarity for transformations from <(7" into L C J [ A ~\ <£"■ 4- <pm. In this case it is natural to say that a transformation 'is Ml . LCiJ similar to if there is an invertible transformation S on <£"■ 4- <pm such that and the additional assumption that <p" is S invariant. Thus S\fn defines an invertible transformation on <p"—the space on which acts. This means that, with respect to the decomposition <f""+n = <p" 4- <pm, S has the representation S = X Z 0 Y where X, Y are invertible transformations on <£" and <pm, respectively. The formal definition is thus as follows: transformations ' and 2 from <p" into <p" 4- (pm are said to be block similar if there is an invertible transformation 5 = X Z 0 Y j: £" + 4r"-*<p"-i-<p" such that [c;Mc,'k' <*»> Going to the adjoint transformations, this leads us to the dual definition: transformations [Al Bt] and [A2 B2] from (f"1 4- <pm into <p" are said to be fe/oc/: similar if there is an invertible transformation 5 = [l ^]:(P" + (Pm^P+<Pm (6-2.3) such that [/l2 B2] = N"I[/l1 Bj[^ ^] (6.2.4) Now let us describe block-similar pairs [Al B,] and [A2 B2] in two other ways.
194 Invariant Subspaces for Transformations Between Different Spaces Theorem 6.2.1 Let [Al B,] and [A2 B2] be transformations from <£" 4- <£"" into (p". Then the following statements are equivalent: (a)[Ax B,] and [A2 B2] are block similar; (b) there exist invertible transformations N and M on <p" and <pm, respectively, and a transformation F: <p" —* <pm such that A2 = N'\Ay + ByF)N and B2 = N'1BlM (6.2.5) (c) for any extension Tl of [A, B,] there is an extension T2 of [A2 B2] and a triangular invertible transformation S of the form (6.2.3) for which Tl = ST2S~l. Proof. Given statement (a) and, hence, equation (6.2.4), let F= LN'\ and it is found immediately that equation (6.2.4) implies the relations (6.2.5). So (a)4>(b). Given statement (b), define S as in (6.2.3), let L = FN, and let be an extension of [Al BJ. Then it is easily verified that S~1TlS is an extension of [A2 B2], and statement (c) follows. Finally, statement (c) implies that for any extension Tl of [Ax B,] [as in (6.2.6)] there is an extension T2 of [A2 B2] such that T2 = S~1TlS with S as in (6.2.3). This immediately implies equation (6.2.4). Thus (c)=>(a). □ Corollary 6.2.2 Let [/4, B,] and [A2 B2] be block-similar transformations with transforming matrix S given by [6.2.3]. Then Mis an [A, B^-invariant subspace if and only if N~ M is an [A2 B2]-invariant subspace. Proof. Assume that M is \AX BJ invariant. By Theorem 6.1.1 there is an extension Tx of [At B,] and a subspace y such that M-Py and T^CV. Since 7, = ST2S~\ r2(5"V)C (S1?). But also, using (6.2.3), P(S'ly) = N~lPy=N~iM. Hence [A2 B2](S~l^)C N~lM and, by definition, N~lM is [A2 B2] invariant. If we are given that N~1M is [A2 B2] invariant, it follows from T2 = S'1TlS that Mis [A2 BJ invariant. D Corollary 6.2.3 If transformations [At B,] and [A2 B2] are block similar, they have the same rank. Proof. Let [At B,] and [A2 B2] be block similar. Then Theorem 6.2.1 implies that
Block Similarity 195 [A2 B2] = N~l[(Al + B1F)N BXM] Writing G = FNM ', we see that rank[,42 B2] = rank[ ,4, + B,G B,] But it is easily verified that Im[A1 + BlG B^-lv^A^ B,], and so rank[,42 B2] = rank[Al B,]. D By use of the characterizations of block-similar transformations developed in Theorem 6.2.1, it is easily verified that block similarity determines an equivalence relation on the class of all transformations from <p" 4- <pm into <p". This immediately raises the problem of finding a canonical form for representations of the transformations in the equivalence classes determined by this relation. The rest of this section is devoted to the derivation of such a form. It will, of course, be a generalization of (and so be more complicated than) the Jordan normal form, which is associated with similarity of transformations in the usual sense, and which appears herein as Theorem 2.2.1. Our argument will make use of the Kronecker canonical form for linear matrix polynomials under strict equivalence, as developed in the appendix. The following proposition is an important step in the argument. Note that it is convenient to work with matrices here. The previous analysis applies, of course, when they are viewed as transformations in the natural way. Proposition 6.2.4 Let A{ and A2 be n x n matrices and B, and B2 be n x m matrices. Then [A, B,] and [A2 B2] are block similar if and only if the linear matrix polynomials [I\ + Ay B,] and [/A + A2 B2] are strictly equivalent, that is, there exist invertible matrices S and T such that S[I\ + Al Bl]T=[IX + A2 B2] (6.2.7) Proof. Assume that (6.2.7) holds and write TJTn Tl2l *- -*21 '22-* where Tu is n x n. Then 5(/A + >li)7'11 + 5Bjr21 = /A + A2 Hence Tn = S~\ and S(At + BJ2lS)S ' = A2 (6.2.8) Equation (6.2.7) also implies that
196 Invariant Subspaces for Transformations Between Different Spaces S(I\ + A,)Tl2 + SB1T22 = B2 It follows that Tl2 = 0 and then that SBtT22 = B2. Combining this relation with (6.2.5), it follows from Theorem 6.2.1 that [Al B,] and [A2 B2] are block-similar. Conversely, suppose that the relations (6.2.5) hold for appropriate N, M and F. Then (6.2.7) holds with 5= N~\ Tu = N, Tl2 = 0, T2l = FN'r, and Tr. = M. D Now we are ready to state and prove a result giving a canonical form for block-similar transformations and known as the Brunovsky canonical form. In the statement of the theorem Jk(\) will, as usual, denote the kxk Jordan block with eigenvalue A. Theorem 6.2.5 Given a transformation [A B]: <p" 4- <pm-» <p", there is a block-similar transformation [A0 B0] that (in some bases for <p" and <pm) has the representation A0=Jkl(0)@---®JkJiO)®Jli(\l)®---®Jl<i(\q) (6.2.9) for some integers i, s • • • > kp > 0 and all entries in B0 are zero except for those in positions (£,, 1), (i, + k2, 2), ...,(&, + ••• + /c p), and these exceptional entries are equal to one. Moreover, the matrices A0 and B0 defined in this way are uniquely determined by [A B], apart from a permutation of the blocks J, (A,),...,/, (\q) in (6.2.9). Thus the pair of matrices A0, B0 or the block matrix [A0 B0] may be seen as making up the Brunovsky canonical form for the transformation [A B]. It will be convenient to call the matrix Jk(0)@---@Jk (0) the Kronecker part of A0 and the integers kl, . . . ,k the Kronecker indices of [A B]. Similarly, we call /, (A,)0- ■ -®J, (\q) the Jordan part of A0 and /,,..., / the Jordan indices of [A B], Proof. We use the terminology and results of the appendix to this book. We may consider A and B to be n x n and n x m matrices, respectively. Consider the linear matrix polynomial C(A) = [A/+ A,B] of size n x (« + m). As the equation x(X)TC(X) = 0T, AG<p has no nontrivial polynomial solution *(A), the minimal row indices of C(A)
Analysis of the Brunovsky Canonical Form 197 are absent. Further, the polynomial AC(A~') = [/ + \A, ABJ obviously has no elementary divisors at zero, so C(A) has no elementary divisors at infinity. Let k1,...,k be the minimal column indices of C(A) and (A + Ai)'\ . . . ,(A + \q)'i be the elementary divisors of C(A). Then Theorem A.7.3 ensures that C(A) is strictly equivalent to the linear matrix polynomial [^©•••©^©(A/+/,1(A1))©-'©(A/+y/jA,)),0„XI] (6.2.10) where Lk is the k x (k + 1) matrix "A 1 0 ••• 0 0- 0 A 1 ••• 0 0 .0 0 0 ■•■ A 1. and j = maxAglp (rank C(A)) - n [and we have used the elementary fact that -/;(A0) and J,(~K) are similar]. After a permutation of columns the polynomial (6.2.10) becomes [/A + A0 B0] with A0 and B0 as defined in the statement of the theorem. The theorem itself now follows in view of Proposition 6.2.4. □ 6.3 ANALYSIS OF THE BRUNOVSKY CANONICAL FORM We first draw attention to an important special case of Theorem 6.2.5. This concerns transformations [A B]: <pm+"—»<£■" in which the pair (A, B) is a full-range pair in the sense defined in Section 2.8. That is, when 2 Im(A'B) = 2 Im(A'B) = <p" where p is the degree of a minimal polynomial for A. The following lemma will be useful. Lemma 6.3.1 Consider any transformations [A B]: <pm+" and F:<p"^<pm. For s = 0, 1,2,. . . we have s s X lm(A'B) = 2 lm(A + BF)'B (6.3.1) /-o /-o Proof. The proof is by induction on s. When 5=0, equation (6.3.1) is trivially true. Using a binomial expansion it is found that
198 Invariant Subspaces for Transformations Between Different Spaces Im A(A + BF)rlB = A lm((A + BF)' lB) CAlm[B AB ••• ArlB] Clm[AB A2B ■■■ A'B] Hence Im^A + BF)rB C Im A(A + BF)r~lB + Im BF(A + BF)r~lB d\m[AB ■■■ ArB] + lmB = Im[B AB ••• A'B] Assuming that the relation (6.3.1) holds when s = r-1, this implies that the right-hand side of (6.3.1) is contained in the left-hand side. But the opposite inclusion follows from that already proved on replacing A by A - BF. □ We now formulate other characterizations of full-range pairs (A, B). Theorem 6.3.2 For a transformation [A B]: <pm+"—»<p" the following statements are equivalent: (a) the pair (A, B) is a full-range pair; (b) there is a full-range pair (A^By) for which [Ax Bx) and [A B] are block-similar; (c) in the Brunovsky form [A0 B0] for [A B], the matrix A0 has no Jordan part; (d) the rank of the transformation [IX + A B] does not depend on the complex parameter A. Proof. Consider statement (b). If [At B,] and [A B] are block-similar, then, by Theorem 6.2.1, there are invertible transformations N, M and a transformation F such that Al = N~\A + BF)Nt BX = N~'BM Thus A\B1 = N'\A + BF)'M. From the definition of full-range pairs and Lemma 6.3.1 it follows that (A, B) is a full-range pair. So (a) and (b) are equivalent. Now consider a canonical pair (A0, B0) as defined in Theorem 6.2.5. It is easily verified that such a pair is a full-range pair if and only if the Jordan part of A0 is absent. Since [A B] is block-similar to a canonical pair [A0 B0] (by Theorem 6.2.5), the equivalence of (a) and (c) follows from the equivalence of (a) and (b). Consider condition (d). It follows from Corollary 6.2.3 that the rank of [IX + A B] for any A G <p is just that of [IX + A0 B0] where [A0 B0] is a Brunovsky form for [A B]. A moment's examination of A0 and B0 convin-
Analysis of the Brnnovsky Canonical Form 199 ces us that the rank of [/A + A0 B0] takes the same numerical value, except at the points A = -A •, j = 1, . . . , q, where there is a reduction in rank. Thus the rank of [/A + A B] is independent of A if and only if there is no Jordan part in A0, and the equivalence of (c) and (d) is proved. □ So far, the discussion of this section has focussed on cases in which the matrix A0 of a canonical pair (A0, B0) has no Jordan part. This can be described as the case q — 0 in equation (6.2.9). It is also possible that A0 has no Kronecker part; the case p = 0 in equation (6.2.9). In this case B0 = 0 as well. We return to this case in Section 6.6. We conclude this section by showing that the Kronecker indices of the Brunovsky form can be determined directly from geometric properties of the transformation [A B] without resort to the computation of the minimal column indices of [IX + A B]. Proposition 6,3.3 Let [A B] be a transformation from $m+" into <p" and define the sequence d^l,do,dl,...byd^l=0 and, for s = 0,1,. . . s ds = dim 2 lm(A'B) (6.3.2) / = 0 Then the Kronecker indices klt. . . , kp of [A B] are determined by the relations {*,| *,>*}* = </,-</, , (6.3.3) Note that the sequence d_,, d0,. . . is ultimately constant and (if B ¥=0), is initially strictly increasing (see Section 2.8). Proof. Use Theorems 6.2.1 and 6.2.5 to write /l'B = N"1(>l0 + B0F)/B0^ where M and /V are invertible and [A0 B0] is block similar to [A B). Now Lemma 6.3.1 implies t ha(A'B) = WS lm(A0 + B0F)jb\m = /V"'(S M,4'0B0))a# / = () \=0 ' X/ = 0 ' Consequently, the integers ds defined by formula (6.3.2) are invariant under block similarity. Now formula (6.3.3) is easily verified for a canonical pair 04o,B0). □
200 Invariant Subspaces for Transformations Between Different Spaces Note that the number of Kronecker indices p is given by equation (6.3.3) in the case 5 = 0. Thus d0 = p = dim(Im B) = rank B 6.4 DESCRIPTION OF [A B]-INVARIANT SUBSPACES In some special cases Theorem 6.2.5 can be used to describe explicitly all [A B]-invariant subspaces. We consider a primitive but important "full- range" case in this section. Theorem 6.4.1 Let [A B] be a transformation from <p" + 1 into <p" for which (A, B) is a full-range pair. Then there exists a basis /,,••,/„ in <p" such that every m-dimensional [A B]-invariant subspace M ¥= {0} admits the description: M = ^r\±X^fk,±(\l)x^fk,... ■•■.il*!1!)**"''/*; /•=!.■••.'} (6-4.1) where r,,. . . , r, are positive integers with r, + • • • + r,- m and A,,. . . , A, are distinct complex numbers (as usual, I } = m\l[p\(m — p)\ with the (m\ yP' understanding that 0! = 1 and that I I =0 for m< p). Conversely, every subspace M(Z§" of the form (6.4.1) is [A B] invariant. Proof. Taking advantage of the equivalence (a)<=>(b) in Theorem 6.3.2, we can assume that A = "0 10- 0 0 1- -0 0 0- • o- • 0 • 0- B = > -o- 0 -1- Let M ¥" {0} be an [A B]-invariant subspace. Then, by Theorem 6.1.1, there exists a 1 x n matrix F = [a0, . . . , an^x] such that M is invariant for the matrix A + BF = 0 1 0 0 0 1 0 0 0 Lfl„ a, a, 0 0
Description of [A B]-Invariant Subspaces 201 Let r,,. . . , r, be all the partial multiplicities of (A + BF)\M (so r, + • • • + r, = dim M), and let At, . . . , A, be the corresponding eigenvalues. For every A0£ <p the matrix Au/- (A + BF) has a nonsingular (n - 1) x (n - 1) sub- matrix (namely, that formed by the rows 1,2, ...,n-l and columns 2, 3,... , «). It follows that dim Ker(A0/ - (A + BF)) = 1 for every A0 e <r(A). So there is exactly one Jordan block in the Jordan form of A + BF corresponding to each A0 £ <r(A + BF). Hence the same property holds for (A + BF)\M, and the eigenvalues A,, . . . , A, must be distinct. It follows that in order to prove that M has the form (6.4.1), it will suffice to verify that for any Jordan chain g,, . . . , gr of (A + BF)\M corresponding to A; we have Span{g„ . . . , gr) = Spanf £ \^lek, . . . , £ (^W^J (6.4.2) Observe first that det(A/- (A + BF)) = A" - an .A""1 a,A - a0 and consequently A^ is a zero of the polynomial A" - fln_lAn_ -•■•-- a,A - a0 of multiplicity at least r. Further, for t = 1, 2, . .. , r [(A + BF) - A,/] £ (kt~_\)^'ek = £ (^2 )a;+J ^ (6-4-3) (and the right-hand side is interpreted as zero for t = 1). Indeed, equality in the 5th place (s = 1,. . . , n - 1) on both sides of (6.4.3) follows from the easily verified combinatorial identity: (,-,)-(;:;m;-1) Equality in the nth place on both sides of (6.4.3) amounts to 1?1'.-.(*:1,)v-'-^(::1,K-'-C:2,K"'- or SS'-fciV'+C-ik'-'-o «■«> but the left-hand side of this equation is just the (t— l)th derivative of the polynomial A" - an_xX" ' a0 evaluated at A;; so equation (6.4.4), and hence (6.4.3), is confirmed. Ik - 1\ We have verified that the vectors £*_, kkrlek,... , E^, I _ )X*~rek form a Jordan chain of A + BF corresponding to A.. As the restriction (A + BF)\ mK (a+bf) 's unicellular, there exists a unique (A + BF)-
202 Invariant Subspaces for Transformations Between Different Spaces invariant subspace in $t k(A + BF) of dimension r, and this subspace is spanned by the vectors in any Jordan chain of (A + BF) of length r corresponding to A;. So (6.4.2) follows. Conversely, let M be given by (6.4.1) (with fk replaced by ek, k = 1,. . . , n). Let/(A) = A" - fln^,A"_1 - • • • - a0 be a polynomial such that A; is a zero of /( A) of multiplicity of at least r•, / = 1,. . . , /. As we have seen above, the vectors E* = 1 A*"'e*,. . . , E*=l ( _1)A*"r' form a Jordan chain of A + BF corresponding to A^ for / = 1, . . . , / (here F = [a0, a,, . . . , an_,]). So by Theorem 6.1.1, M is [A B] invariant. □ The case /= m in Theorem 6.4.1 deserves special attention. Corollary 6.4.2 Let [A B] be as in Theorem 6.4.1. Then there exists a basis /,,. . . , fnin <p" such that, for every m-tuple of distinct complex numbers Al5. . . , Am, the m-dimensional subspace Span! S A*-1/*; y=l,...,m) is [A B] invariant. This corollary shows that (at least in the case of a full-range pair A: <p"-» <p" and B: <p—> <p") there are a lot of [A Z?]-invariant subspaces. Indeed, Corollary 6.4.2 shows the existence of a family of [A B]-invariant m-dimensional subspaces that depends on m complex parameters (namely, A„...,AJ. For the general case of a full-range pair we have the following partial description of [,4 B]-invariant subspaces. Theorem 6.4.3 Let (A, B) be a full-range pair with Kronecker indices &,>•••> kr. Then there exists a basis fn,. . . , fik, i - 1,. . . , r in <p" such that for every r-tuple ofnonnegative integers lt,. . . , lr satisfying l,si/,! = l,.,.,r, and for every collection {Aa, . . . , A,,; / = 1,. . . , r} of complex numbers the subspace SSpanfS A^.-%|y = l,...,/.} is [A B] invariant. The proof of Theorem 6.4.3 is obtained by combining Theorem 6.3.2 and Corollary 6.4.2.
The Spectral Assignment Problem 203 6.5 THE SPECTRAL ASSIGNMENT PROBLEM For a transformation A on <p" the eigenvalues are invariant under similarity transformations. More generally, if A is defined by a transformation [A B]: <p" + <pm^> <p", then, by Theorem 6.2.1, block similarity transforms A into N~\A + BF)N for some invertible /V. Thus the eigenvalues of A are no longer invariant, but are transformed to those of A + BF, where F depends on the similarity. Now we ask, for given [A B], what are the attainable eigenvalues of A + BF? We do not answer this question directly, but we present solutions to two closely related problems. First, suppose that we are given n complex numbers A,,. . . , An (possibly with repetitions) that are candidates for the eigenvalues of A + BF. Under what conditions on the transformation [A B] does a transformation F: <p"—* £" exist such that the numbers A,,. . . , A„ are just the eigenvalues of A + BF, counting algebraic multiplicities? This is known as the spectral assignment problem. It is important in its own right and is also relevant to our discussion of the stability of [A B]-invariant subspaces. Clearly, when B = 0, the problem is not generally solvable. Another extreme case arises if B = I when it is easily seen that a solution can always be found by using diagonable matrices F. We show first that the problem is always solvable as long as {A, B) is a full-range pair. Theorem 6.5.1 Let A: $"—> <p", B: <£""—> <£" be a full-range pair of transformations. Then for every n-tuple of complex numbers A,, . . . , An there exists a transformation F: <p"—> <pm such that A + BF has eigenvalues A,, . . . , A„. Proof. With the use of Theorem 6.2.1 it is easily seen that we can assume, without loss of generality, that A and B are in Brunovsky canonical form. Furthermore, by Theorem 6.3.2, it follows that the Jordan part of A is absent [see equation (6.2.9)]. So the Kronecker indices kx,. . . ,kpoi[A B] satisfy the condition kx + •• ■ + k = n. For j = 1,. . . , p let «y(A) = A*'+2 c„A« be the scalar polynomial with zeros A, +1,. . . , A,., where // = kt + • ■ • + kf (and we define /0 = 0). Let F=[FX F2 ■■■ Fp] where F, is the m*- kt matrix whose ith row is [-c,0, —cn, . . . , -cik,_x\ and the other rows are zeros. Then
204 Invariant Subspaces for Transformations Between Different Spaces A + BF = dmg[A1,A2,...,A„] where A,= 0 0 1 0 L-C;„ -C, 0 1 0 -c, 0 0 is a kt x kt matrix for / = 1,. . . , p [the companion matrix of «,-(•)]. It is well known that the eigenvalues of At are exactly A, +,,. . . , A,.. This proves the theorem. □ The argument used in proving Theorem 6.5.1 can also be utilized to obtain a full description of the solvable cases of the spectral assignment problem. We omit the details of the proof. Theorem 6.5.2 Let A: <p"—> (f"1 and B: <pm—* <p" be a pair of transformations, and let the I x / matrix J = J, (A,) ©•••©/, (A^) be the Jordan part of the Brunovsky form for [A B]. Then, given an n-tuple of (not necessarily distinct) complex numbers fil,. . . , /i.„, there exists a transformation F: <p"—> <pm such that A + BF has eigenvalues /a, , . . . , fin if and only if at least I numbers among fiy,. . . , ftn coincide with the eigenvalues of J (counting multiplicities). We need another version of the spectral assignment problem, known as the spectral shifting problem. Given a transformation [A B]: §m+n—> <p" and a nonempty set H C <p, when does there exist a transformation F: <p"-^ <pm such that u(A + BF) CO? When (A, B) is a full-range pair, such an F always exists in view of Theorem 6.3.2. In general, the answer depends on the relationship between the root subspaces of A and the minimal ,4-invariant subspace over Im B: def n-1 </l|lmB) = ImB + i4(ImB) + --- + >l""I(ImB)=2 Im(A'B) 1=0 [known as the "controllable subspace" of the pair (A, B) in the systems theory literature; see also Proposition 8.4.1]. Observe first that the subspace {A | Im B) is the minimal ,4-invariant subspace over Im B (see Theorem 2.8.4). In particular, (A | Im B) is A invariant. Also, equation (6.3.1) can be expressed in the form <i4|ImB) = <>l + BF|ImB) (6.5.1) for any transformation F: <p" -» <pm.
The Spectral Assignment Problem 205 Theorem 6.5.3 Given a nonempty set ft C <p and a transformation [A B]: $m+"—> <p", there exists a transformation F: $"—> <pm such that cr(A + BF) CCl if and only if ®Ao(A)C(A\lmB) (6.5.2) for every eigenvalue A0 of A that does not belong to ft. Recall that S?A (.4) = Ker( A0/ - A)" is the root subspace of A corresponding to the eigenvalue A0 and, by definition, £%A (A) = {0} if \o0<r(A). In the proof we use the following basic fact about induced transformations in factor spaces. (Recall the definition of the induced transformation given in Section 1.7.) Lemma 6.5.4 Let X: <p"—> <p" be a transformation with an invariant subspace Z£, and let X: §"13!—* <p7iP be the induced transformation. Then for every A0G <p we have ®Xg(X) = P®Xu(X) (6.5.3) where P: <p"—> <p7i?is the canonical transformation: Px = x + Z£, x G <p". In particular, every eigenvalue of X is also an eigenvalue of X. Proof Let p( A) = ( A0 - A)". Then for every x e <p" with Px e 9tko(X) we have P(p(X)x) = p(X)(Px) = 0 So p(X)x<E<£. Let fl(A) = n;_1(A/-A)", where A,,..., A, are all the eigenvalues of X different from A0. As p(A) and q(X) are polynomials with no common zeros, there exist polynomials g(A) and /i(A) such that g(A)p(A) + h(\)q(\) = 1. (This is well known and is easily deduced from the Euclidean algorithm.) Hence x = g(X)p(X)x + h(X)q(X)x (6.5.4) Since p(X)x e i?, we also have g(X)p(X)x e !£. On the other hand, the Cayley-Hamilton theorem ensures that p(X)h(X)g(X)x = 0, that is, the vector u = h(X)g(X)x belongs to 9tx(X). Now equation (6.5.4) implies Px=Pu<E PM^X)
206 Invariant Subspaces for Transformations Between Different Spaces We have proved the inclusion C in equality (6.5.3). The opposite inclusion follows from the relation P{p(X)y) = p{X)(Py) for every vector y E <p". □ Proof of Theorem 6.5.3. First consider a pair (A0, B0) in the Brunovsky canonical form, as described in Theorem 6.2.5. Then (A0 | Im B0> = Span{ey | / = 1, ...,*, + ■■• + *,} The condition 3frx (A0) C { A0 | Im B0) for every A0 e (p -~ ft means that [in the notation of equation (6.2.9)] A,,..., A Eft. It remains to apply Theorem 6.5.2. Now consider the general case, and let Aa = N~l(A + BF0)N; B0 = N'lBM where [A0 B0] is in Brunovsky canonical form. It is easily seen that there exists a transformation F, such that cr(A0 + B0F,)cn if and only if there exists an F2 with a(A + BF2) Cft (indeed, one can take F2 = F0 + MFlN~l). Further, using equation (6.5.1), we have (A | Im B) = (A + BF0 | Im B) = N(A0 | Im B0) and obviously, for any A0 E <p ^o(A + BF0) = Jf^o(A0) So it remains to show that (6.5.2) holds if and only if ^o(A + BF0)C(A\\mB) , A0E<p-.ft (6.5.5) This is done by using Lemma 6.5.4. Denote by P: <p"-* §"l{A \ Im B) the canonical transformation Px = x+(A\lmB) , Are<p" For a transformation X: <p"^ <p" with invariant subspace (A | Im B), let X:$nl(A\\mB)^>$Hl{A\\mB) be the induced transformation. Using (6.5.1), we see that A and A + BFare well defined. Further, for every x E <p"
Some Dual Concepts 207 Ax = (A 4 BF0)x - BFx G(A + BF0)x 4 {A | Im B) so A = A + BF0. Now, assuming that (6.5.5) holds, and in view of Lemma 6.5.4, we find that for every A0 £ <p -. ft: m^A) = ®,u(A) = »,JLA + BF0) = PW^A 4 BF) C P( A | Im B) = {0} Hence Sft k (A) C{A\lm B). A similar argument shows that (6.5.2) implies (6.5.5). a 6.6 SOME DUAL CONCEPTS The definitions and analysis of this chapter have primarily concerned transformations [A B]: <p" + <pm—><p". Questions arise concerning analogs for transformations : <p" -> (p" 4- (pm. In this section we quickly review some notions and results in this direction. Recall first that a subspace M of <£"■ will be called invariant if and only if M1 is [A* C*] invariant. Thus, with the characterization (d) of Theorem 6.1.1 for [A* C*]-invariant subspaces, there is a transformation G* such that (A* + C*G*)M±CM1 (6.6.1) if and only if M is invariant. Using Proposition 1.4.4, we see that this is equivalent to (A + GC)MCM We include this discussion as part of the following statement. Theorem 6.6.1 Let M be a subspace of <p" and \ be a transformation from <p" into <p" 4- <pm. Then the following are equivalent: (a) M is \ \ invariant; (b) A(M<lKeTC)CM (6.6.2) (c) there is a transformation G: <pm—> <p" such that (A + GC)MCM (6.6.3) Proof. It remains only to establish the equivalence of (a) and (b). This is done by using the equivalence of statements (a) and (c) in Theorem 6.1.1. Thus M is invariant if and only if
208 Invariant Subspaces for Transformations Between Different Spaces A*M±CMX +lm C* = ML + (Ker C)x (6.6.4) Now it is easily verified that, for subspaces %, T, and a transformation A, the relations ATCll and A*aUA~CY± are equivalent. Thus equation (6.6.4) is equivalent to A(ML +(KeTC)L)A CiM1)1 or A(M n Ker C)CM which is condition (6.6.2). □ It is useful to have a terminology involving an arbitrary subspace in the place of Ker C in (6.6.2). Thus, if A is a transformation on <p" and T is a subspace of <p", we say that a subspace M is A invariant intersect T, or A invariant (int T), if A(M n7)Cl Through extension of the terminology of Section 2.8 for any given subspace M, a subspace % that is A invariant (int V) is said to be minimal over M if °U D M and there is no other ,4-invariant (int T) subspace that contains At and is contained in °U. Now consider a generalization of similarity for transformations from <p" to <p" 4- <pm. If is such a transformation, an extension of is a transformation T on <p" + <pm of the form A Bl C Di Then we say that transformations ' and 2 from <p" into <p" + <pm are block similar if, given any extension Tl of ' , there is an extension such that r, and T2 are similar. Comparing this with the -[: T2 of A2 C2 corresponding definition of Section 6.2, we see that this is equivalent to the block similarity of [A* C*] and [A* C*]. We may thus apply Theorem 6.2.1 to obtain the following theorem. Theorem 6.6.2 The transformations ' and 2 from <p" to §m+" are block similar if and only if there exist invertible transformations N on <p" and M on <pm, and a transformation G: <pm —»• <p" such that A2 = N(Al + GCl)N~l , C2 = MClN~l (6.6.5)
Exercises 209 Once again, it is found that block similarity determines an equivalence relation on all transformations from <p" into <£" 4- <pm. Furthermore, the canonical forms in the corresponding equivalence follow immediately from the Brunovsky form of Theorem 6.2.5 by duality. Theorem 6.6.3 Given a transformation _ : C —> <t" + <tm there is a block-similar trans- \ A~\ LCJ „ formation ° that (in some bases for <£"" and (f"1) has the representation L C0 J ^o = V°)©---©V°)@/'1(A>)©---©VA<) (6-66) for some integers fc, > /c2 > • ■ • ^ /c and al entries in C0 are zero except for those in positions (1,1), (2, kl + 1),. . . , (p, &, + •■• + kp_l + 1), and those exceptional entries are equal to one. Moreover, the matrices A0 and C0 defined in this way are uniquely determined by A and C, apart from a permutation of the blocks /, (A,),...,/, (A ) in equation (6.6.6). The case of full-range pairs (.4, B), which was one of our concerns in Section 6.3, is now replaced by the dual case in which (C, A) is a null kernel pair (see the definition in Section 2.7 and Theorem 2.8.2). The dual of Theorem 6.3.2 is now as follows. Theorem 6.6.4 For a transformation : <p" —> (f"1 + <pm the following statements are equivalent: (a) the pair (A, C) is a null kernel pair; (b) there is a null kernel pair (/!,, CJ for which and ' are block similar; (c) in the Brunovsky , the matrix A0 has no Jordan part; (d) the rank of the form [Ac°] for [* _ \I\ + A] J transformation does not depend on the complex parameter A 6.7 EXERCISES 6.1 Let A: <p"—»<p" be a transformation. A chain Ml(Z---dMk (1) of subspaces in <p" will be called almost A invariant if AMj(ZMi + l, i = 1,. . . , k - 1. Show that the chain (1) is almost A invariant if and only if A has the block matrix form A = [Aij\ki*il with j4i7 = 0 for i - / > 1, with respect to the direct sum decomposition C" = J£l + •■• + iP*+i> where i?; is a direct complement to Mi_l in Mt (by definition, M0 = {0} and Mk + l = <£").
210 Invariant Subspaces for Transformations Between Different Spaces 6.2 Prove that every transformation /l:<p"-><p" has an almost A- invariant chain {0}ci,c--ci(„.,c<|:" consisting of n + 1 distinct subspaces, where Ml is any given one- dimensional subspace. {Hint: For a given Mx — Span{x}, x^O, put M2 = Span{x, Ax), . . . , Mk = Span{*, Ax, . . . , A lx}, where k is the least positive integer such that the vectors x, Ax,. . . , Akx are linearly dependent. Use the preceding exercise.) 6.3 A block matrix A = [Au]1y=1 is called tridiagonal if Arj = Q for |i - j\ > 1. Show that a transformation A has tridiagonal block matrix form with respect to a direct sum decomposition <p" = 5£x + • • • 4- j? if and only if the chains oZ- | V_ oZ-1 "T" aL"} >^ V— oZ-1 "T" I n7. _ . and ^cifr]i^c-cl2 + - + ^ are almost ,4 invariant. 6.4 Let A: <p" —» <p" be a self-adjoint transformation. Prove that for any vector .*, G <p" with norm 1 there exists an orthonormal basis xx,. . . , xn in <p" such that the chains Spanf*,} CSpan{jr,, x2} C • • ■ CSpanl^j, x2,. . . , xn_x) and SpanljcJCSpanfj:^,, *,,} C • ■ • C Span{*2,. . . ,xj are almost A invariant (so A has a tridiagonal form with respect to the basis xx,. . . , xn). [Hint: Apply Gram-Schmidt orthogonaliza- tion to a basis xx, y2,. . . , yn in <p" such that the chain Span{x,}CSpan{x1,)'2}C---CSpan{x1,)>2, . . . , yn_x) is almost A invariant (Exercise 6.2) and use the self-adjointness of A.] 6.5 Let A: $"->■ <p" and B: (pm-* <p" be transformations. (a) Show that dimIm[A/- A,B] = n (2) for every A e <p with the possible exception of not more than n - (dim Im B) points. (b) Show that if equation (2) holds at k eigenvalues of A (counting multiplicities) then for every i-tuple /a, ,. . . , fik there exists a transformation F: £"->£'" such that /At, . . . , /aa E <t(A + BF).
Exercises 211 6.6 State and prove the analogs of Exercises 6.5 (a) and (b) for a pair of transformations C: $'-*(", A: <p"^ <p". 6.7 Let (A, B) be a full-range pair of transformations. Show that for any F the transformation A + BF has not more than dim Im B Jordan blocks corresponding to each eigenvalue in its Jordan form. 6.8 Let 6.9 6.10 6.11 0 1 0" 0 0 1 1 0 2. , B = "-1 0 . 1 (a) (b) (c) Show that (.4, B) is a full-range pair. Find matrices N, M and F, where N and M are invertible, such that the pair N~\A + BF)N, N~lBM is in the Brunovsky canonical form. Find G such that A + BG has the eigenvalues 0,2,-1. Let A: <p" —> <p" be a transformation, and let x £ (p" be a cyclic vector in <p" for A (i.e., <p" = Span{;t, Ax, A2x,. . .}). Show that for any n-tuple of not necessarily distinct complex numbers A,, . . . , An there exists a transformation B: <£""—» <p" with Im B CSpan{.*} such that A + B has the eigenvalues A,,. . . , A„. Let A: <p" -> <p" be a transformation, and let M C <p" be a subspace such that <p" is the minimal /I-invariant subspace that contains M. Show that for n-tuple A1; . . . , A„ of not necessarily distinct complex numbers there exists a transformation B: <p"-» <p"withlm B C M such that A + B has eigenvalues A,,. . . , A„. Let C = [l,-1,0]; A = 0 0 1 1 0 0 L0 1 -2. (a) Show that (C, ,4) is a null kernel pair. (b) Find matrices N, M and F, where /V and M are invertible such that MCN~l, N(A + FC)N~l are in the canonical form as described in Theorem 6.6.3. (c) Find G such that A + GC has the eigenvalues 1,-1,0. 6.12 Let A: <p"—» <p" be a transformation, and let M C <p" be a subspace such that {0} is the maximal ,4-invariant subspace in M. Prove that for any n-tuple of not necessarily distinct complex numbers A,, . . . , An there exists a transformation C: <p"—» <p" with Ker CC M such that A + C has eigenvalues A,,. . . , A„.
Chapter Seven Rational Matrix Functions In this chapter we study r x n matrices W(A) whose elements are rational functions of a complex variable A. Thus we may write where p,;(A) and qtj(X) are scalar polynomials and <^(A) are not identically zero. Such functions W(A) are called rational matrix functions. We focus on problems for rational matrix functions in which different types of invariant subspaces and triinvariant decompositions play a decisive role. All these problems are motivated mostly by linear systems theory, and their solutions are used in Chapter 8. The problems we have in mind are the following: (1) the realization problem, which concerns representations of a rational matrix function in the form D + C(A/- A)~lB with constant matrices A, B, C, D; (2) the problem of minimal factorization; and (3) the problem of linear fractional decomposition. 7.1 REALIZATIONS OF RATIONAL MATRIX FUNCTIONS Let W(A) be an r x n rational matrix function. We assume that W(A) is finite at infinity; that is, in each entry /?,7(A)/^/y(A) of W(\) the degree of the polynomial p,7(A) is less than or equal to the degree of <7/7(A). A realization of the rational matrix function W( A) is a representation of the form W(A)=D + C(XIm-Ay1B, A0<t(A) (7.1.1) where A, B, C, D are matrices of sizes /n x m, m x n, r x m, r x n, 212
Realizations of Rational Matrix Functions 213 respectively. Observe that limA_>03(A/- A)~l = 0. [To verify this, assume that A is in the Jordan form and use formula (1.9.5).] So if there exists a realization (7.1.1), then necessarily D = W(°°). We may thus identify such a realization with the triple (A, B, C). The following lemma is useful in the proof of existence of a realization. Lemma 7.1.1 Let //(A) = Ejlo XiH) and L(A) = A 7 + EJI^ A'^, berXnand nxn matrix polynomials, respectively. Put B = Then 0" 0 L/J A = 0 0 0 -A, 0 " / C = [H0 H ;-iJ H(X)L(\)'1 = C(U-A)'1B Proof. We know already (see Section 5.1) that for \0<r(A) L(\yl = Q(\I-AylB (7.1.2) where Q = [/ 0 by 0]. We may define C,( A),. . . , C,(A) for all \0<r(A) col[C/(A)]/LI = (A/-/l)-,B From equation (7.1.2) we see that C,(A)= L(A)~'. As (A/-^)[col[C/(A)]|'=I] = B the special form of A yields C,(A) = A'-'C1(A), 1 < i < / It follows that C(XI - A)~lB = EJZj //;C/ + 1(A) = //(A)L(A)-1, and the proof is complete. □ Theorem 7.1.2 Every r x n rational matrix function that is finite at infinity has a realization. Proof. Let W( A) be an r x n rational matrix function with finite value at infinity. There exists a monic scalar polynomial /(A) such that /(A)W(A) is a
214 Rational Matrix Functions (matrix) polynomial. For instance, take /(A) to be a least common multiple of the denominators of entries in W(\). Put //(A) = /(A)(W(A) - W(°°)). Then //(A) is an r x n matrix polynomial. Clearly, L(A) = /(A)/n is monic and W(A) = VV(oo) + //(A)L(A)"1. Further lim H(A)L(A)~' = lim [W(\) - W(°°)] = 0 A—*cc A—*=c So the degree of //(A) is strictly less than the degree of L(A). We can apply Lemma 7.1.1 to find A, B, C for which W(\) = W(oo) + //(A)L(A)-1 = W(oo) + C(A/- A)~lB This is a realization of W(A). D A realization for W(A) is far from being unique. This can be seen from our construction of a realization because there are many choices for /(A). In general, if (A, B, C) is a realization of W(A), then so is (A, B, C), where 4u 0 0 An A 0 An A 23 ^33J , B = [0.1 B .0. C = [0CC,] (7.1.3) for any matrices Aif, Bx, and C, with suitable sizes (in other words, the matrices A, B, C are of size s x s, s x n, r x s, respectively, and partitioned with respect to the orthogonal sum <p* = <pp 0 <pm © <p*, where m is the size of A; for instance, Al3is a px q matrix). Indeed, for every \0<r(A) we have (A/-^)"' = (XI-AU) 0 0 0 (A/ ^33rlJ and thus B(A/- ^)"'C= B(A/- /l)"'C= W(A)- W(oo) Among all the realizations of W(A) those with the properties that (C, A) is a null kernel pair and (A, B) is a full-range pair will be of special interest. That is, for which D Ker C47 {0} and 2 Im /I'B = <p" (7.1.4) The next result shows that any realization "contains" a realization with
Realizations of Rational Matrix Functions 215 those properties. To make this precise it is convenient to introduce another definition. Let (.4, B, C) be a realization of W(A), and let m x m be the size of A. Given a triinvariant decomposition <pm = i? 4- M 4- Jf associated with an /l-semiinvariant subspace M (so that the subspaces i? and 2£ + M are A invariant) with the property that C\^ — 0 and Im B C i? + Ji, a realization (PjtAly, PMB, C\M), where PM:$m—*M is a projector on M with Ker PM D !£, is called a reduction of (A, B, C). Note that (PMA\jf, PMB, C\M) is again a realization for the same W(X). [See the proof that (7.1.3) is a realization of W(A) if (A, B, C) is.] We shall also say that (A, B, C) is a dilation of (PMA\M, PMB, C\M) (in a natural extension of the terminology introduced at the end of Section 4.1). Theorem 7.1.3 Any realization (A, B, C) of W(A) is the dilation of a realization (A0, B0, C0) of W(A) with null kernel pair (C0, A0) and full-range pair (A0, B0). Proof. Let X(C, A) = n~=0Ker(C4') and ${A, B) = E°l0 Im(A'B) be the maximal ,4-invariant subspace in Ker C and the minimal ,4-invariant subspace over Im B, respectively. Put £ = %(C,A), let M be a direct complement of S£n#(A,B) in #(A, B), and choose Jf so that (pm = 2 + M 4- jV (7.1.5) and we recall that m is the size of A. Let us verify that equality (7.1.5) is a triivariant decomposition associated with an ^4-semiinvariant subspace M, and that the realization (PMA\M,PMB,C\M) (7.1.6) where P: <pM—»J( is the projector on M along i? 4- jV, is a reduction of (/I, B, C), and has the required properties. Indeed, i? and 1£ + M = jfc(C, A) + $(A, B) are ,4-invariant subspaces, so (7.1.5) is indeed a tri- invariant decomposition. Further, C\% = 0 and lmBCf(A,B)C2 + M so (7.1.6) is a reduction of (.4, B, C). It remains only to prove that the realization (7.1.6) of W(A) has the null kernel and full-range properties. Indeed KerC|^(P^L)' = (KerC/lOni(, / = 0,1,... So
216 Rational Matrix Functions H Ker C\M(PMA\M)' = f) Ker CA' D M = if n M = {0} j=0 ;=0 Also Im(P^L)'/»^B = PJI(Im^/B) Hence 2 lm(PMA\MyPJllB = pjj: Im ,4'fi) = PM($(A, B)) = J< because by construction M C^(A, B). D It turns out that a realization {A, B, C) for which conditions (7.1.4) are satisfied is essentially unique. To state this result precisely and to prove it, we need some observations concerning one-sided invertibility of matrices. By Theorems 2.7.3 and 2.8.4 we have Ker coliCA'^Io = {0} , Im[fl, AB,...,A" lB] = <pm where p is any integer not smaller than the degree of the minimal polynomial for A. Hence there exists a left inverse [co\[CA']?~q]~ . Thus [co\[ca%^]1-co\[ca%^ i Also, there exists a right inverse [B, AB,. . . , AP~XB\~R: [B, AB,..., AplB][B, AB,..., A" [B]'R = I Note that in general the left and right inverses involved are not unique. Theorem 7.1.4 Let (Al,Bl,Cl) and (A2,B2,C2) be realizations for a rational matrix function W(X) for which (C,, ,4,) and (C2, A2) are null kernel pairs and (,4,,/?,), (A2, B2) are full-range pairs. Then the sizes of Ay and A2 coincide, and there exists a nonsingular matrix S such that Al = S~lA2S, B, = S~'B2, Ct = C2S (7.1.7) Moreover, the matrix S is unique and is given by s = [coi[c2/i2];:01]-L[coi[c1/i'l]f:01] = [B2, A2B2,..., Ap2-lB2][Bl,A1Bl,. . . , A^B^* (7.1.8)
Realizations of Rational Matrix Fnnctions 217 Here p is any integer greater than or equal to the maximum of the degrees of minimal polynomials for Al and A2, and the superscript -L (resp. -R) indicates left (resp. right) inverse. Proof. We have W(A) = D + C1(XI-AiylB1 = D + C2(A/- A2)~lB2 For |A|>max{||i4j||, ||i42||} the matrices XI-A^ and XI- A2 are non- singular and for / = 1,2 (xi-Alyl = x~\i-x~lAlyl = JZ A"/-l/i{ ,=o Consequently, we have Cj(2 X~'~1A\)bi = C2(S X~'~xA'2\b2 for any A with |A| >max{||>t1||, ||^2II}- Comparing coefficients, we see that C1A\B1 = C2A'2B2, j = 0,1,. . . . This implies fl,A, = il2A2, where, for k = 1, 2 we write ak = coi[c^i];;0', a, = \Bk, AkBk,..., Apk-lBk\ Premultiplying by a left inverse of il2 and postmultiplying by a right inverse of A2, we find that the second equality in (7.1.8) holds. Now define S as in (7.1.8). Let us check first that S is (two-sided) invertible. Indeed, we can verify the relations (ni"Ln2)S = /, S(A,A2-R)=7 Since ClA\Bl = C2A'2B2, for ; = 0,1, . . . , we have n2"Ln,A1A2TR = ft2 Lft2A2A2 * = /. Similarly, one checks that iVLft2A2A~* = /. Because S is invertible, the sizes of Al and A2 must coincide. It remains to check equations (7.1.7). Write Q2A2A2 =a,/l1A1 = a.A.A'^A, = O.AjA'M.A, Premultiply by il2L and postmultiply by A^" to obtain A2S = SA^ Now 5B, = a2LalB1 = a2La2B2 = b2 and C25-C2A2A7R = C,A1ArR = C1 D
218 Rational Matrix Functions Theorems 7.1.3 and 7.1.4 allow us to deduce the following important fact. Theorem 7.1.5 In a realization {A, B, C) of W( A), (C, ,4) and (A, B) are null kernel pairs and full-range pairs, respectively, if and only if the size of A is minimal among all possible realizations of W( A). Proof Assume that the size m of A is minimal. By Theorem 7.1.3, there is a reduction (A', B', C") of (A, B, C) that is a realization for W(A) and satisfies conditions (7.1.4). But because of the minimality of m the realizations (A', B', C") and (A, B, C) must be similar, and this implies that (A, B, C) also satisfies condition (7.1.4). Conversely, assume that (A, B, C) satisfies conditions (7.1.4). Arguing by contradiction, suppose that there is a realization (A', B', C") with A' of smaller size than A. By Theorem 7.1.3, there is a reduction (A", B", C") of (A', B\ C) that satisfies conditions (7.1.4). But then the size of A' is smaller than that of A, which contradicts Theorem 7.1.4. D Realizations of the kind described in this theorem are, naturally, called minimal realizations of W(\). That is, they are those realizations for which the dimension of the space on which A acts is as small as possible. 7.2 PARTIAL MULTIPLICITIES AND MULTIPLICATION In this section we study multiplication and partial multiplicities of rational matrix functions. To facilitate the presentation, it is assumed that the functions take values in the square matrices and that the determinant function is not identically zero. Let W(A) by an n x n rational matrix function with det W(A)^0. In a neighbourhood of each point A0 G <p the function W( A) admits the representation, called the local Smith form of W(A), at A0: W(A) = £,(A) diag[(A - A0)\ . . . , (A - A0)"»]£2(A) (7.2.1) where £,(A) and E2(\) are rational matrix functions that are defined and invertible at A0, and vx,. . . , vn are integers. Indeed, for matrix polynomials equation (7.2.1) follows from Theorem A.3.4 in the appendix. In the general case write W(A) =p(A)~1VV'(A), where W(\) and p(\) are matrix and scalar polynomials, respectively. Since we have a representation (7.2.1) for W(\), it immediately follows that a similar representation holds for W(X). The integers f,,. . . , vn in (7.2.1) are uniquely determined by W(A) and
Partial Multiplicities and Multiplication 219 A0 up to permutation and do not depend on the particular choice of the local Smith form (7.2.1). To see this, assume that vx < • • ■ < vn, and define the multiplicity of a scalar rational function g(A)^0 at A0 as the integer v such that the function g(A)(A - A„)~" is analytic and nonzero at A0. Then, using the Cauchy-Binet formula (Theorem A.2.1 in the appendix), we see that f, + ••• + p, is the minimal multiplicity at A0 of the not identically zero minors of size i x / of W(A), / = 1,. . . , n. Thus the numbers vx + ■ ■ • + vn i - 1,. . . , n, and, consequently, vx, . . . ,vn are uniquely determined by W(A). The integers vx,. . . , vn from the local Smith form (7.2.1) of W(A) are called the partial multiplicities of W(A) at A0. Note that A0 G <p is a pole of W(A) [i.e., a pole of at least one entry in W(A)] if and only if W(A) has a negative partial multiplicity at A0. Indeed, the minimal partial multiplicity of W(A) at A0 coincides with the minimal multiplicity at A0 of the not identically zero entries of W( A). Also, A0 G <p is a zero of W(A) [by definition, this means that A„ is a pole of W(A)~'] if and only if W(X) has a positive partial multiplicity. In particular, for every A0 G <p, except for a finite number of points, all partial multiplicities are zeros. There is a close relationship between the partial multiplicities of W(\) and the minimal realization of W(A). Namely, let W(A) be a rational n x « matrix function with determinant not identically zero. Let i W(A)= 2 A'W, (7.2.2) be the Laurent series of W(A) at infinity (here q is some nonnegative integer and the coefficients W; are n x n matrices): write f/(A) = E*=0 \'Wj for the polynomial part of W(A). Thus W(\) - U(\) takes the value 0 at infinity, and we may write W(\) = C(\J- A)~lB + U(\) (7.2.3) where C(XI - A) lB is a minimal realization of the rational matrix function W(A) - t/(A). We say that (7.2.3) is a minimal realization of W(A). We see later (Theorem 7.2.3) that A0G <p is a pole of W(A) if and only if A0 is an eigenvalue of A. Moreover, for a fixed pole of W( A) the number of negative partial multiplicities of W(X) at A0 coincides with the number of Jordan blocks with eigenvalue A„ in the Jordan normal form of A, and the absolute values of these partial multiplicities coincide with the sizes of these Jordan blocks. A similar statement holds for the zeros of W(A). An analytic n-dimensional vector function <KA) = 2(A-A0)Vy
220 Rational Matrix Functions defined on a neighbourhood of A0 G <p is said to be a null function of a rational matrix function W{\) at A0 if (/fo^O, W(A)</f(A) is analytic in a neighbourhood of A0, and [W/(^)(/'(^)]a=* = 0- The multiplicity of A0 as a zero of the vector function VV(A)</f(A) is the order of <HA), and i/>0 is the null vector of <KA). From this definition it follows immediately that for n x n matrix-valued functions U{ A) and V{ A) that are rational and invertible in a neighbourhood of A0, </f(A) is a null function of V{\)W{\)U{\) at A0 of order k if and only if U{ \)ip( A) is a null function of W{ A) at A0 of order k. A set of null functions </f,(A),. . . , if/p{ A) of W(A) at A0 with orders kt,. . . , kp, respectively, is said to be canonical if the null vectors ^(A,,),. . . , t^p(A0) are linearly independent and the sum kt + k2 + ■ ■ ■ + kp is maximal among all sets of null functions with linearly independent null vectors. Proposition 7.2.1 Let W{\) be as defined above and t^,(A),. . . , i//p{\) be a canonical set of null functions of W{\) {resp. W(A)~') at A0. Then the number p is the number of positive {resp. negative) partial multiplicities of W{\) at A0, and the corresponding orders klt. . . , k are the positive {resp. absolute values of the negative) partial multiplicities of W{\) at A0. Proof. Briefly, reduce W{\) to local Smith form as described above and apply the observation made in the paragraph preceding Proposition 7.2.1. □ Now we fix an n x n rational matrix function W(A) with det W{X)^0. Let W{\) = C{XI- A) lB + U{\) (7.2.4) be its minimal realization, and fix an eigenvalue A0 of A. Replacing (7.2.4), if necessary, by a similar realization, we can assume that where cr{Ap) — {A0} and Xo0cr{A'p). Note also that if A0 is a pole of W(A), then equation (7.2.4) implies that A0 is an eigenvalue of A. Proposition 7.2.2 Let W{ A), Ap, and Bp be defined as above. Let A0 be a pole of W{ A), let </f( A) be a null function of W{ X)' at A0 of order k, and let <p be the coefficients of def ?(A)=W(A)-V(A):
Partial Multiplicities and Multiplication 221 ^(A) = E(A-A0)V/ (7.2.5) Then xr^(Ap-X0ir-ilBp^, / = 0, . . . , fe - 1 (7.2.6) is a Jordan chain for A at A0. Conversely, if A0 is an eigenvalue of A and x0,. . . ,xk^l is a Jordan chain of A at A0, there is a null function </f(A) of W( A) ' at A0 with order not less than kfor which (7.2.6) holds [in particular, A„ is a pole of W(A)]. Note that as o-(Ap) - {A0}, the series in (7.2.6) is actually finite. Proof. By definition, vectors (7.2.6) form the Jordan chain for Ap at A0 if (Ap - A0/)*u = 0 , x0*0 (Ap-X0J)xl = xj-l, / = 1,2,...,*-1 The last k - 1 statements follow immediately from (7.2.6). Also (Ap - \0I)x0 =t(Ap- KI)"Bp<pv Now the Laurent series for W(A) at A0, say, W(A) = EjL_„ (A - A0)'W,, has the following coefficients of negative powers of (A — A0): W_l = Cp(Ap-\0iy-1Bp, j=\,2,...,q and it is easily seen that q is the least positive integer for which (Ap - A0/)9 = 0. (One checks this by passing to the Jordan form of Ap.) Now recall that </f(A) = W(A)<p(A) is analytic near A0; so equating coefficients of negative powers of (A-A0) to zero and using the fact that (Ap - A0/)* = 0, we obtain for / = 1,2, . . . vk 0 = 2 W.r.itPv = X Cp(Ap - A0/)"+'-'B^„ = lcp(Ap-x0iy+>-lBp<Pl, = C(Ap-\0iy-\Ap-\0I)x0
222 Rational Matrix Functions Since n*=0Ker CA' = {0}, it follows that n*=0Ker CpA'p = {0} or, what is the same, that col[C A' ]r,2o is left invertible for some integer r. As C„{Ap ~ V) _Cp(Ap-\l)I)r>} L(-A0) (r;1W- CnAr cAr; the matrix co\[Cp(Ap - A0/)'],rr,J is left invertible as well, and since (Ap - XJf-0 for s>q, we obtain the left invertibility of co\[Cp(Ap - A0/)/_1]J=1. It now follows that (A - A0/);tu = 0 as required. Finally, since </f(A0) = x0, it is also true that xQ #0. Thus, as asserted, equations (7.2.6) do associate a Jordan chain for Ap with the null function <HA). Conversely, let x0, xx, . . . , xk_x be a Jordan chain of Ap at A0. From the definition of a minimal realization it follows that the matrix [BpMP-Ki)Bp,...MP-Ki)m~W is right invertible for some integer m. Consequently, there exist vectors ipk, <pk + l, ■ ■ ■ , with only finitely many nonzero, such that *t-,=i(^-vfV/ (7.2.7) The definition of a Jordan chain includes (Ap - h0I)Xj = xj_l for j - 1,2, . . . , k - 1, and so equations (7.2.6) follow immediately from (7.2.7). It remains only to check that W(A)<p(A) is now a null function of W(\)~l at A0, where <p( A) = Ejlt (A - Au)Vy Observe first that *0 ^0 and that CpA'px0 = \'Cpx0 for/ = 0,1,2, As the matrix col[Cp(Ap - A,,/)7]^1 is left invertible for some integer m, so is col[CpAp]JL~0l, and it follows that Cpxo^0. But using (7.2.6), we obtain 0*Cpxo 2c,04,-A0/)> 'B a> pre lim W(A)<p(A) D If the Jordan chain xn , of Ap at A() cannot be prolonged, then xk , f£\m{Ap - A0/), and it follows from (7.2.7) that ipk ^0. Thus a maximal Jordan chain of length k determines, by means of (7.2.6), an associated null function </f(A) of W(\)~l of order k. Propositions 7.2.1 and 7.2.2 prove the following result. [The second part of Theorem 7.2.3 concerning zeros of W( A) is obtained by applying the first part to W(\)\]
Partial Multiplicities and Multiplication 223 Theorem 7.2.3 Let W(X) be a rational n x n matrix function with det W(A)f^0, and let its minimal realization be given by equation (7.2.4). A complex number A0 is a pole of W(\) if and only if A0 is an eigenvalue of A, and then the absolute values of negative partial multiplicities of W( A) at A0 coincide with the sizes of Jordan blocks with eigenvalue A„ in the Jordan form of A, that is, with the partial multiplicities of A0 as an eigenvalue of A. A complex number A0 is a zero of W(A) if and only if A0 is an eigenvalue of Ax, where Ax is taken from a minimal realization for W(A)-1: W(A)_I = C,(A/- AX)BX + K(A) with matrix polynomial K(A). In this case the positive partial multiplicities of W( A) at A„ coincide with the partial multiplicities of A0 as an eigenvalue of Ax. Now we apply Theorem 7.2.3 to study the partial multiplicities of a product of two rational matrix functions. Let W,(A) and W2(A) be rational n x n matrix functions with realizations W,(A) = D, + C,(A/-/l,r1B, (7.2.8) for i = 1 and 2. [Of course, the existence of realizations (7.2.8) presumes that W,(A) and W2(\) are finite at infinity.] Then the product W,(A)W2(A) has a realization W,(A)W2(A) = D1Z)2 + [C1,D1C2](a/-[^1 B£2]) [Bf2] (7.2.9) Indeed, the following formula is easily verified by multiplication: M, fljCj-h-'rU/-*,)-' (A/-^ir1BlC2(A/-^2)-'l L o a2 \) V o (\i-A2yl J so the right-hand side of (7.2.9) is equal to DlD2 + Cl(M- Al)~lBlD2 + [C,(A/- A,) 'B,C2(A/- A2)~l + D,C2(\I - A2) l)B2 = D1D2 + C,(A/-/11)"IBID2 + Cx(M-AxylBxC2(M- A2)~lB2 + D,C2(A/- ^2)_1B2 = H^,(A)D2 + (W,(A) - DX)(W2(X) - D2) + D,(W2(A) - D2) = W,(A)W2(A)
224 Rational Matrix Functions So formula (7.2.9) produces a realization for the product W1W2 in terms of the realizations for each factor. Easy examples show that (7.2.9) is not necessarily minimal even if the realizations (7.2.8) are minimal. See the following example, for instance. example 2.1. Let W"<A)=A^T' W2(A) = ^p Minimal realizations for W,(A), * = 1,2 are not difficult to obtain: W,(A)=1 + 1-(A-1)~'-1; Wj(A)=l-l-A~'-l Formula (7.2.9) gives /=W,(A)W2(A) = 1 + [1,-1][a/-[J "*]] [j] which is a realization of the rational matrix function /, but not a minimal one. More generally, if W2(A) = W,(A)~', then the realization (7.2.9) is not minimal [unless W,(A) is a constant]. □ Let W( A) be an n x n rational matrix function with determinant not identically zero. For A0 G <p, denote by ir(W; A0) = {tj}}"!, tne nonincreas- ing sequence of absolute values of negative partial multiplicities of W( A) at A0. This means that irx ^ tt2 s • • • are nonnegative integers with only a finite number of them nonzero (say, ~nk> irk + x = 0), and — tt,, — tt2, . . . , —irk are the negative partial multiplicities of W(A) at A0. Consider nonincreasing sequences a = {«;}"=, and /3 = {0y}JLi °i non" negative integers such that only finitely many of them are nonzero, and recall the definition of the set T(a, /3) given in Section 4.4. Theorem 7.2.4 Let W,( A) and W2( A) be n x n rational matrix functions with determinant not identically zero and that take finite value at infinity. Then for every A0 G <p and j - l, 2, . . . we take 7r; ^ 5;, where {tt;-}"=1 = ir(WlW2; A0) and {S-}"_, « some sequence from V(ir(Wl; A0), 7r(W2; A0)). //, i/z addition, W,(A) and W2(X) admit minimal realizations (7.2.8) for which the realization (7.2.9) of Wl(\)W2(\) is minimal as well, then actually ir(W,W2; A0)er(ir(W„ A0), tt(W2; A0)) Proof. Let W,(A) and W2(A) have minimal realizations as in equation (7.2.8). Using Theorem 7.2.3 and the definition of r(7r(W,;A0),
Minimal Factorizations of Rational Matrix Functions 225 7r(W2; A0)), we see that the nonincreasing sequence 5 = {S-}JL, of partial multiplicities of the matrix \AX B,C2 ^"U A2\ belongs to r^W,; A0), it(W2; A0)). Now (7.2.9) is a realization (not necessarily minimal) of WI(A)W2(A). Theorem 7.1.2 shows that there is a \B.S restriction (A0, B0, C0) of (>t, , [C.C2]) to some ,4-semiivariant sub- Li^ J space such that the realization W,(A)W2(A) = / + C„(A/ - Aoy'B0 is minimal. Then {tjj}*=1 is the sequence of partial multiplicities of A0 at A0. But as A0 is a restriction of A to M, we have it- ^ 5/; for / = 1, 2,. . . (see Section 4.1). □ The assumption that both W,(A) and W2(A) take finite values at infinity is not essential in Theorem 7.2.4. However, we do not pursue this generalization. The condition that the realization (7.2.9) is minimal for some minimal realizations (7.2.8) is important in the theory of rational matrix functions and in the theory of linear systems. It leads to the notion of minimal factorization and is studied in detail in the following sections. 7.3 MINIMAL FACTORIZATIONS OF RATIONAL MATRIX FUNCTIONS In this section we describe the minimal factorizations of a rational matrix function in terms of certain invariant subspaces. To make the presentation more transparent, we restrict ourselves to the case when the rational matrix functions involved are n x n and have value / at infinity. (The same analysis applies to the case when the matrix function has invertible value at infinity.) We start with a definition. The McMillan degree of a rational n x n matrix function W(\) [with W(°°) = /], denoted 5(W), is the size of the matrix A in a minimal realization W(A)-' = /-C(A/-,4r1fl (7.3.1) It is easily verified that W(xyl = I-C(XI- Ax)lB (7.3.2) where A* - A- BC. Moreover, if realization (7.3.1) is minimal, so is
226 Rational Matrix Functions equation (7.3.2). Indeed, equation (6.3.1) shows that the pair (A - BC, B) is a full-range pair [because (A, B) is so]. Further, (C, A) is a null kernel pair, or, equivalently, (A*, C*) is a full-range pair. By the same argument, the pair (A* - C*B*, C*) is also a full-range pair. Hence (C, A - BC) is a null kernel pair, and therefore realization (7.3.2) is minimal. In particular, d(W~1) = 5(W). Consider the factorization W(X)=Wl(X)W2(X)-W(X) (7.3.3) where, for /' = 1, . . . , p, W;(A) are n x n rational matrix functions with minimal realizations Wj(X)^I+Cl(XI-AjylBj Formula (7.2.9) applied several times yields a realization for W(A): W(A) = / + [C, C2 c,l A/" Ax B,C2 0 y4, 0 0 0 0 LV (7.3.4) This realization is not necessarily minimal, so we have (in view of Theorem 7.1.2) S(W)^S(W1) + --+5(Wp) We say that the factorization (7.3.3) is minimal if actually 8(W) = d(Wl) + •■•-t-S(W), that is, realization (7.3.4) is minimal as well. In informal terms, minimality of (7.3.3) means that zero-pole cancellation does not occur between the factors W-(A). Because the McMillan degrees of a rational matrix function (with value / at infinity) and of its inverse are the same, (7.3.3) is minimal if and only if the corresponding factorization for the inverse matrix function MA)"1 = W^ArX^A)-'• • • W,(A)-' is minimal. Let us focus on minimal factorizations (7.3.3) with three factors (/? = 3). A description of all such factorizations in terms of certain triinvariant decompositions associated with /l-semiinvariant subspaces is given. Here A is taken from a minimal realization W( A) = / + C(XI - A)~lB. Write Ax = A - BC, and let A and A* be of size m.
Minimal Factorizations of Rational Matrix Functions 227 We say that a direct sum decomposition $m=<e + M+Jf (7.3.5) is a supporting triinvariant decomposition for W(A) if (7.3.5) is a triinvariant decomposition associated with an ,4-semiinvariant subspace M (so i? and i? 4- M are A invariant) and at the same time M is A* semiinvariant with associated triinvariant decomposition <pm = Jf 4- M 4- i? (i.e., Jf and ^V 4- J( are /4X invariant). Note that a supporting triinvariant decomposition for W(A) depends on the choice of minimal realization. We assume, however, that the minimal realization of W(A) is fixed and thereby suppresses the dependence of supporting triinvariant decompositions on this choice. (In view of Theorems 7.1.4 and 7.1.5, there is no loss of generality in making this assumption.) The role of supporting triinvariant decompositions in the minimal factorization problem is revealed in the next theorem. Theorem 7.3.1 Let (7.3.5) be a supporting triinvariant decomposition for W( A). Then W(\) admits a minimal factorization W(X) = [I + Ctt^XI- AylirxB][I + Ciru(\I- A)~lirMB] x[/+Cirjr(A/-^)-V-B] = [/ + C(A/- A)~lwxB][I + Or, (A/- A)~lirMB] x[/+Ctj>(A/- A)~lB] (7.3.6) where irx is the projector on !£ along M 4- Jf, and irM and ir_v are defined similarly. Conversely, for every minimal factorization W(\) = Wl(\)W2(\)W3(\) where the factors are rational matrix functions with value I at infinity there exists a unique supporting triinvariant decomposition <pm = i? 4- M 4- Jf such that W,(A) = / + Cirx(\I- A)-\XB W2(A) = /+C^(A/-/l)-l^B (7.3.7) W3(X) = 1+ Cw^XI - Ay^B Note that the second equality in (7.3.6) follows from the relations it^A7ra = Attx and ■nxATiJ( = tr^A, which express the A invariance of i? and S£ + M, respectively (see Section 1.5).
228 Rational Matrix Functions Proof. With respect to the direct sum decomposition (7.3.5), write A = 4.i 0 0 Al2 ^22 0 ^,3 A23 An A = A* A* J\2X A.22 A* A* ^31 ^32 0 0 l33 C = [Cl C2 C3], B = B2 B3\ Note, in particular, that the triangular form of A* implies Au = BXC2, Al3 = BXC3, and A23 = B2C3. Applying formula (7.2.9) twice, we now see that the product on the right-hand side of (7.3.6) is indeed W(\). Further, denoting Wx(\) = I + Cirx(\I- A)~\XB, for 3iT=iP, M, or M, we obviously have S(WX) ^ dim %. Hence S(W) 2£ 8( Wx) + 8{WM) + 5(H^) < dim if + dim M + dim Jf = m Since, by definition, m = 8(W), it follows that 8(Wy) + 8(WM) + 8(Wy) = m = 5(W) and the factorization (7.3.6) is minimal. Next assume that IV = WXW2W^ is a minimal factorization of W, and for i= 1,2,3 let W,(A) = /+C,(A/-/1,) 'fl, be a minimal realization of W^k). By the multiplication formula (7.2.9) W(A) = /+C(A/- A) lB (7.3.8) where c = [c, Note that C2 c3], i = 0 .0 fl,C2 j4, 0 B,C3 B2C3 /13 J fl = B2 ,4-flC = i ~ B,C, -B2C, -B3C, 0 **- 2 ^2 2 "~ O3C2 0 0 /l3-i As the factorization W = H^H^Wj is minimal, the realization (7.3.8) is
Miuimal Factorizations of Rational Matrix Functions 229 minimal. Hence, by Theorem 7.1.4, for some invertible matrix S we have C=CS, i = 5"U5, B = S~lB To satisfy (7.3.7), put if = S&, M = SM, and Jf=SJf, where ^ = Span{e,,. . .,ePi}, A = Span{ePi + 1,. . • ,ePl+P2} jV- = Span{epi+P2+1,...,ePi+P2+P3} and Aj has size p, for i = 1, 2, 3. It remains to prove the uniqueness of if, J(, and jf. Assume that $m = J£' + M' + Jf' is also a supporting triin variant decomposition such that W,(A) = / + Ctt^,(A/- ^)_17r^,B W2(\) = I+C7rM,(AI-A)~1irM.B (7.3.9) W3(A) = /+C7T>.(A/-,4r17T>B As the realizations (7.3.7) and (7.3.9) are minimal (see the first part of the proof), there exist invertible transformations 7"^: if' —*!£, TM: M'—> M, Tjf\ Jf' -^ Jf such that Cirx, = CirxTx , irx,Airx. = (Tx) itxA'!txTx itX'B = \TX) ^%B , 3if = j?, ^*l, jf Therefore, the invertible transformation T: <pm—> <pm defined by T\x. = Tx for 3ST = if, M, jf is a similarity between the minimal realization W(A) = /+C(A/-^)"'Band itself: C=CT; A=T~lAT; B=TlB Because of the uniqueness of such a similarity (Theorem 7.1.4), we must have T=I. So if'= if, M'= M, Jf'= Jf. D Using formula (7.3.2), we can rewrite the minimal factorization (7.3.6) in terms of the minimal factorization of the inverse matrix function: W(A)-' = [/-CirJ,(A/->lx)-1irvBp-CirJI(A/-^,,r1irJIB] x[/-Ctt^(A/- Ax)~lirxB] = [/- C(A/- A*YXTr„B][I- CirM{XI- A*y\MB] x[I-Cir^(\I-AxylB]
230 Rational Matrix Functions where the second equality follows from ttxA t:k = A 7i> and it % A ir<£ — ■jrxAx, expressing the Ax invariance of Jf and M + Jf. An important particular case of Theorem 7.3.1 appears when jV" = {0} in the supporting triinvariant decomposition (7.3.5). This corresponds to the minimal factorization of W(A) into the product of two factors, as follows. Corollary 7.3.2 Let 3? and M be subspaces in <pm that are direct complements of each other. Assume that Z£ is A invariant and M is A* invariant. Then W(A) admits a minimal factorization W(A) = [/ + C{\I- A)'\ieB\[I + C(/-77>)(A/-,4)~1B] where irx is the projector on Z£ along M. Conversely, if W(A) = Wl(\)W2(\) is a minimal factorization with W,(o°) = W2(°°) = /, then there exists a unique direct sum decomposition <£"" = i? 4- M, where !£ is A invariant, M is Ax invariant, and such that W,(A) = / + C(A/- A)~\^B, W2(\) = /+C(/-7ra,)(A/- A)lB. 7.4 EXAMPLE Let us illustrate the description of minimal factorizations obtained in Theorem 7.3.1. The rational matrix function W(A) [' + A"'(A-1)"1 l + A-'J has a realization where W(\) = I + C(\I- A)~lB (7.4.1) A = "1 0 0- 1 0 0 0 0 0. , B = i o- 0 1 .0 1. ro l 01 Lo o lj This realization is minimal. Indeed, the matrix r0 0 1 Ll 1 0 0 0 °1 1 0 OJ has rank 3 and hence zero kernel. The matrix
Example 231 [B,AB\ 10 10 0 110 0 10 0 has rank 3, and hence its image is <p • Further Ax = A- BC = 1 -1 0 1 0 -1 0 0-1 Let us find all invariant subspaces for A and A*. It is easy to see that (1,1,0) is an eigenvector of A corresponding to the eigenvalue 1, whereas the vectors (0,0,1), (0,1,0) are the eigenvectors of A corresponding to the eigenvalue 0. Hence all one-dimensional ,4-invariant subspaces are of the form Span{(l,1, 0)}; Span{(0,1,0)}; Span{<0, a, l)}, a e <p. All two-dimensional ,4-invariant subspaces are of the form Span{(l,0,0), (0,1,0)}; Span{(l, 1, 0), (0, a, 1)} , a G <p Span{(0,1,0), (0,0,1)} Passing to A*, we fin'd that A* has three eigenvalues — l,y=\(\ + iV3), and y with corresponding eigenvectors (1,2,3), (l,y, 0), and (l,y, 0), respectively. There are three one-dimensional A * -invariant subspaces Span{(l,2,3)}, Span{(l, f,0)}, Span{(l, y,0)}, and three two-dimensional A * -invariant subspaces Span {(1,2,3), (l,y,0)}, Span{(l, 2, 3), (1, y,0)}, and Span{(l,0,0), (0,1,0)}. Now we describe supporing triinvariant decompositions <p3 = if 4- m + Jf (7.4.2) of W(A) with ^ = Span{(l,2,3)}, if = Span{(l, 1,0)}. If we let M = Span{(*, y, z)}, we easily see that <f3 = Span{(l, 1,0)} 4- M 4- Span{(l, 2, 3)} if and only if z ¥= 3(y - x). Further, one of the following four cases appears: (a) ^ = Span{(l,0,0),(0,l,0)}nSpan{(l,2,3),(l,y,0)}. (b) ^ = Span{(l,l,0),(0,a,l)}nSpan{(l,2,3),(l,y,0)}, a G (p. (c) ^ = Span{(l,0,0),(0,l,0)}nSpan({l,2,3),(l,r,0)}. (d) ^=Span{(l,l,0),(0, a, l)}nSpan{(l,2,3),(l,y,0)}, a E (p. In cases (a) and (c) we obtain M = Span{(l, y, 0)} and M = Span{(l, y, 0)}, respectively. In case (b) we have
232 Rational Matrix Functions X y 7. = /> 1 1 0 + q 0 a 1 = r 1 2 3 + 5 1 y o (7.4.3) for some complex numbers p, q, r, s. Consider the second equality in (7.4.3) as an equation with unknowns p, q, r, s. Solving this equation and putting r = 1 - y, we get q = 3- 3y, 5 = 1- 3a, p = 2 - 3a - y, and M is spanned by (2 - 3a - y, 2-3 ay - y, 3 - 3y), where a # 5. [This condition reflects the inequality z ^3(y - x).] Similarly, in case (d) we obtain J£ = Span{{2- 3a - y, 2- 3ay - y, 3 - 3y)}, where a # 5. To summarize, the subspaces M for which <p3 = Span{<l,l,0)} +J(+Span{(l,2,3)} is a supporting triinvariant decomposition for W(A) are exactly the following: Span{<l,y,0)}; Span{<l, y,0)}; Span{(2 - 3a - y, 2-3ay-y, 3-3y)}, a*\; and Span{(2- 3a - y, 2 - 3ay - y, 3 - 3y)}, a*\. To compute the corresponding minimal factorizations according to formula (7.3.6), write the matrices A, B, C (understood as transformations in the standard orthonormal bases in <p2 and <p3) with respect to the basis (1,1,0), (l,y,0), (1,2,3) in <p3 and the standard basis (1,0), (0,1) in <P2: A = -1 0 -0 1 0 0 1- 0 0- -[J y 0 B = -y(i-y)"1 -3(7 + 1X7-1) (l-y)" 0 f(y-i)"' So the minimal factorization corresponding to the supporting triinvariant decomposition (7.4.3) with M = Span{(l, y,0)} is W(A) = Wl(X)W2(X)W3(X), where W,(A) = / + ri- (A- l)-[-y(l y) ',-|(y + l)(y-l) -(y + 1) ■] (l-y)(A-l) 3(y-l)(A-l) 0 1
w,(a) = / + [J]a-'[ W3(A) = /+[3]a-'[o,|] = Example (i-y) '^(y-i)1 233 1 + 2y (l-y)A 3(y-l)A 0 1 _2_ ' 3A 1 + A A J Replacing y by y in these expressions we obtain the minimal factorization corresponding to (7.4.3) with M = Span{(l, y,0)}. Now for a ¥= \ write A, B, and C in the basis (1,1, 0), (2 - 3a - y, 2-3ay-y,3-3y), (1,2,3): A = 1 2-3a-y 1 0 0 0 0 0 0 [J 2 — 3ay — y 2 3-3y 3. B y(y-i) -(y + l)(l-y)- (3a-l)-(y-l)-' ?(3a-ir'(l-y)-' (3a-l)"1 (a-l)(2a-l)-1 The corresponding minimal factorization is given by y y + i W,(A) = i + (y-l)(A-l) 3(l-y)(A-l) 0 1 W2(A) = 1 + 2 — 3ay — y (3a-l)(y-l)A 3-3y (3a-l)(y-l)A 1 + 2(2 — 3ay — y) 3(3a-l)(l-y)A 2(3-3y) 3(3a-l)(l-y)A. W3(A) 1 + 2 2(a - 1) (3a-l)A (3a-l)A 3(a - 1) (3a-l)A 1 + (3a-l)A J
234 Rational Matrix Functions Taking y in the place of y in these expressions, we obtain the minimal factorization corresponding to (7.4.3) with M - Span{{2 - 3a - y, 2-3ay-y, 3-3y». Note that these four factorizations exhaust all minimal factorizations 1 + A~'(A-1)-1 A-1 0 1 + A -i = W,(A)W2(A)W3(A) with not identically constant rational 2x2 matrix functions W,(A) with value / at infinity and for which W,( A) has a pole at A0 = 1 and W3(A) has a zero at A0 = -1 [i.e., H^A)-1 has a pole at A0 = -1]. 7.5 MINIMAL FACTORIZATIONS INTO SEVERAL FACTORS AND CHAINS OF INVARIANT SUBSPACES Let W(X) be an n x n rational matrix function with minimal realization W(\) = I+C(\I- A) lB (7.5.1) so that, in particular, W(pc) - /. We study minimal factorizations of W(A) by means of the realization (7.5.1), and in terms of chains of invariant subspaces for A and A* - A - BC. We state the main theorem of this section. Theorem 7.5.1 Let m be the size of A in equation (7.5.1), and let <pm = i?, + • • • + £p (7.5.2) where the chain 2lC2l+22C---C2l+22 + --- + 2p_l (7.5.3) consists of A-invariant subspaces, whereas the chain <£p C 2p + if,., C • • • C jfp + i?p_, + ••• + i?2 (7.5.4) consists of Ax-invariant subspaces. Then W(A) admits the minimal factorization W(\) = [/ + C7r,(A/ - Ay\yB] •••[/+ Cirp(A/ - A)~lirpB] (7.5.5) where 7ry is the projector on i?; along J^4- • • • 4- 5£j_l + J£j+i + ■ • ■ + J£p. Conversely, for every minimal factorization
Factors and Chains of Invariant Snbspaces 235 W(A) = W,(A)---H;(A) (7.5.6) where W;( A) are rational nx n matrix functions with W;(o°) = /, there exists a unique direct sum decomposition (7.5.2) with the property that the chains (7.5.3) and (7.5.4) consist of invariant subspaces for A and A*, respectively, such that W,(A) = I + Cir,(A/- A)'\B , j = \,...,p The proof is obtained by p - 1 consecutive applications of Corollary 7.3.2. As in the remark following the proof of Theorem 7.3.1, the factorization (7.5.5) implies the minimal factorization for W^A)-1: W(A)-, = [/-Cir,(A/->lT\,B] x[/-Cirp_,(A/-^x)-,irp_1fl] [/- Cir,(A/- Axy\B) We are interested in the case when p, the number of factors in the minimal factorization (7.5.6), is maximal [of course, we exclude the case when some of the W^A) values are identically equal to /]. Obviously, p cannot exceed the McMillan degree of W(A), 8(W), for then each factor Wj(\) must have McMillan degree 1. It is not difficult to find a general form of rational n x n matrix functions K(A) with V(°°) = / and 8{V) - 1; namely K(A) = /+(A-A0)_17? (7.5.7) where A0 is a complex number and R is an n x n matrix of rank 1. Indeed, if V( A) has the form (7.5.7), then by writing R = C0B0, where C0 is an n x 1 matrix and B0 is a 1 x n matrix, we obtain a realization / + C0( A - A0)-1B0 of K(A) that is obviously minimal. So S(V)-\. Conversely, if 5(K) = 1, then we take a minimal realization / + C0(A - A0)~'i?0 of K(A) and put 7? = C0B0 to obtain (7.5.7). Note that if V(A) has the form (7.5.7), then so does V(\)~l [because 8(K_1) = 8(V) = 1]. Indeed, by equation (7.3.2) V(A) 1 = /-(A-(A0-tr7?))-1/? where tr R is the trace of 7? (the sum of its diagonal entries). We arrive at the following problem: study minimal factorizations W(A) = KI(A)-Km(A) (7.5.8) of W(A), where each V^(A) has the form (7.5.7) for some A0 and R. First let
236 Rational Matrix Functions us see an example showing that not every W( A) admits a minimal factorization of this type. example 5.1. Let This realization (with -[Sil- -[X?l- -[J SI' is easily seen to be minimal. As BC = 0, we have —IS J] Obviously, there is no (nontrivial) direct sum decomposition <p2 = if, 4- i?2, where if, and i?2 are /I invariant. So by Theorem 7.5.1 (or Corollary 7.3.2) W(A) does not admit minimal factorizations, except for the trivial ones W(\) = W(\)I=IW(\). □ We give a sufficient condition for the existence of a minimal factorization (7.5.8). This condition is based on the following independently interesting property of chains of invariant subspaces. Lemma 7.5.2 Let A,, A 2: <p" —> <p" be transformations and assume that at least one of them is diagonable. Then there exists a direct sum decomposition <p" = !£l + ■ • • 4- £n with one-dimensional subspaces if., / = 1, . . . , n, such that the complete chains if, c if, + i?2 c • - • c if, + ■ • - + if„_, and if, c if 4- if . c • • • c if 4- if , 4- • • - + if, consist of A ^invariant and A2-invariant subspaces, respectively. Proof. It is sufficient to prove the existence of a direct sum decomposition $n = M + £ (7.5.9) where dim M = n - 1, dim i? = 1, M is Al invariant, and i? is A2 invariant.
Factors and Chains of Invariant Subspaces 237 Indeed, we can then use induction on n and assume that Lemma 7.5.2 is already proved for A^M and PMA2\M in place of Al and A2, respectively, where PM is projector on M along ££. (Remember that if at least one of Al and A2 is diagonable, the same is true for AX\M and PMA2\M; see Theorems 4.1.4 and 4.1.5.) Combining (7.5.9) with the result of Lemma 7.5.2 for A^M and PMA2\M, we prove the lemma for A{ and A2. To establish the existence of the decomposition (7.5.9), assume first that At is diagonable, and let /,,...,/„ be a basis for <p" consisting of eigenvectors of At. If g is an eigenvector of A2, (7.5.9) is satisfied with iP = Span{/-,...,/,-_ }, where the indices /,,...,/„_, are such that //,.-••, /,•„_,'. g form"a'basis in <£". If A2 is diagonable but A, is not, then use the part of the theorem already proved with A\ and A* in place of A^ and A2, respectively. We obtain an (n - 1)-dimensional A*-invariant subspace M and a one-dimensional A\- invariant subspace & that are direct complements of each other. Then put M = (i?)1 and j?= (M)1 to satisfy (7.5.9). D We can now state and prove the following sufficient condition for minimal factorization of a rational matrix function W(A) into the product of 8(W) nontrivial factors. Theorem 7.5.3 Let W(\) be a rational n x n matrix function with a minimal realization W(\) = I+C(\I- A)~lB (7.5.10) and assume that at least one of the matrices A and A - BC is diagonable. Then W( A) admits a minimal factorization of the form W(A) = [/ + (A - A,)"1/?,] •••[/ + (A - AJ-'KJ (7.5.11) where A,,. . . , Am are complex numbers and /?,, . . . , Rm are n x n matrices of rank 1. The proof of Theorem 7.5.3 is obtained by combining Theorem 7.5.1 and Corollary 7.5.2. In Example 7.5.1 the hypothesis of Theorem 7.5.3 is obviously violated. Indeed, the matrix is not diagonable. The following form of Theorem 7.5.3 may be more easily applied in many cases. Theorem 7.5.4 Let W( A) be a rational n x n matrix function with W(&) = I. Assume that either in W(A), or in W(\)~\ all the poles (if any) of each entry are of the first order. Then W(\) admits a factorization (7.5.11).
238 Rational Matrix Functions Recall that the order of a pole A0 of a scalar rational matrix /(A) is defined as the minimal positive integer r such that limA^ [(A - A0)'/(A)] is finite. Proof. Assume that all the poles of each entry in W(A) are of the first order. The local Smith form (7.2.1) implies that all the negative partial multiplicities (if any) of W(A) at each point A0 are -Is. By Theorem 7.2.3, all the partial multiplicities of the matrix A from the minimal realization (7.5.10) are Is. Hence A is diagonable and Theorem 7.5.3 applies. If all poles of W(\)~l are of the first order, apply the above reasoning to W(A)_1, using its realization W(A)_1 = /- C(A/- (A - BC))~{B, which is minimal if (7.5.10) is minimal. □ 7.6 LINEAR FRACTIONAL TRANSFORMATIONS In this and the next sections we study linear fractional transformations and decompositions of general (nonsquare) rational matrix functions. We deviate here from our custom and denote certain matrices by lower case Latin and Greek letters. Let W( A) be a rational matrix function of size rx.m written in a 2 x 2 block matrix form as follows: Iff A)' H» £(j$]= t-' + r-r + t"' <"■» Here m, and r, (/ = 1,2) are positive integers such that m = ml + m2, r=rl + r2. Let V(\) be a rational m2*-rl matrix function for which det(/- W12(A)K(A))^'0, and define matrix function U(\) = W21(A) + W2l(X)V(\)(I- Wl2(\)V(\)ylWn(\) (7.6.2) So U(\) is a rational matrix function of size r2 x ml. It is called the linear fractional transformation of K(A) by W(A) [with respect to the block matrix form of (7.6.1)] and is denoted by !FW(V). It is easily seen that when m, = /-[ and det Wu(X)^0, (7.6.2) can be rewritten in the form U(X) = (K,(A) - /?2(A)K(A))(H,(A) - H4(A)V(A))-' (7.6.3) where K,(A) - W21(A)W11(A)-1 , R2(\) = W21(A)WH(A) 'W12(A) - W22(A) R3(\)=Wn(\y1 , 7?4(A)=WH(A)-V12(A) Conversely, if (7.6.3) holds, then we have (7.6.2) with
Linear Fractional Transformations 239 WH(A) = R3(X)~l , Wl2(X) = R3(xylR4(X), W2I(A) = R^R^Xy1 , W22(A) = R^R^XyXiX) - R2(X) The form (7.6.3) justifies the terminology "linear fractional transformation", however, the form (7.6.2) will be more convenient for our analysis. Observe that multiplication of rational matrix functions is a particular case of the linear fractional transformation, which is obtained in case W21(A) = 0, WI2(A) = 0, and either W22(A) = /or WU(A) = /. Assume now that both W(A) and K(A) take finite values at infinity. Then (see Section 7.1) there exist realizations W(X) = D + C(XI- A)~lB (7.6.4) where A, B, C, and D are matrices of sizes n x n, n x m, r x n, and r x m, respectively, and V(\) = d + c(XJ-a)1b (7.6.5) with matrices a, b, c, d of size pXp, pxr,, m2 x p, and m2'Xrl, respectively. At this point we do not require that the realizations (7.6.4) and (7.6.5) be minimal. We are to find a realization of &W(V) in terms of the realizations (7.6.4) and (7.6.5) of W(X) and K(A). With respect to the direct sum decompositions <pm = <pm' + (p™2 and <pr = <pr' 4- (f"2, we write B, C, and D as block matrices As D = VV(°o), formula (7.6.2) shows that 9FW{V) is analytic at infinity (i.e., has no poles there) provided the matrix /- Dl2d is invertible; in this case 9w(V)(co) = D2i + D22d(I-Dl2d)'xDu We restrict our attention to rational matrix functions that are analytic at infinity, so it will be assumed that /- Dl2d is invertible. Then /- dDl2 is invertible as well and (I-dDn)'x = I + d(I- D12d)"'Dl2 (7.6.6) Indeed, multiplication gives (I-dDl2)[I + d(I-Dl2dylDl2] = / - dDl2 + (d- dDl2d)(I - Dnd)~lDl2 = / + d[-I + (I- D12d)(I - Dl2d)l]D12 = J
240 Rational Matrix Functions Define transformations: \A + B2d(I- Dl2dylCy B2(/-dD12)"V 1 aL b(I-Dl2d)~lCl fl + fe(/-D12d)_1D12cJ' $" + $"-*$" + $" (7.6.7) 7 = [y„ %] = [C2 + D22d(/- O^l-'C,, D22{I-dDl2ylc): <p" + 47'—47r» (7.6.9) 5 = D21 + D22d(/ - Dl2dylDu: f"-^ <T2 (7-6.10) Theorem 7.6.1 We have 9w(Y)(\) = 8 + y(\I-a) 'j8 (7.6.11) Further, if this realization of &W(V) is minimal, then the realizations (7.6.4) and (7.6.5) of W(A) and V(X), respectively, are minimal as well. Proof. Write = I" Du + C,(A/- /i)-1^, Dl2 + C1(\I-A)~1B2'\ ~LD21 + C2(A/-^)"'B, D22 + C2(A/-^)"'B2J So l^7(A) = Z)1/ + C,(A/-/l)-1B;, i, / = 1,2 We use a step-by-step procedure to compute a realization for &w(v) = w21(\) + w22(\)v(\)(i-wl2(\)v(\)ylwn(\) using these realizations for W/;(A) and the realization (7.6.5) for K( A) by the following rules: given two rational matrix functions A^A) and A"2(A) with finite values at infinity and realizations *,(A) = D, + C,(A/- AiylBi , /=1,2 realizations for A",(A) + A"2(A), A"1(A)Ar2(A), and A^A)-1 can be found as follows [cf. formulas (7.2.9) and (7.3.2)]:
Linear Fractional Transformations 241 x1{x) + x2{\) = d1 + d2 + [c1c2](xi-[a01 j1]) [?'] x1(\)x2(\) = d1d2 + [c1,d1c2]{m-[A01 Bf2]) [Bf2] x.(A)-1 = d~1 -d;1c1{xi-(A1 - bxd'xxcx)Y1bxd-x1 (it is assumed in the last formula that Dx is invertible). A computation shows that 9w{V) = 8 + y(\I-a)-lp (7.6.12) where 8 = D2X + D22d(I- Dx2d) lDu y = [C2, C2) D22c, D22d(l- Dl2d)~lCx, D22d(l - Dx2d)~lDX2c , D22d(I-Dx2d)lCx) A 0 0 0 A B2c 0 0 0 ■ XC, XDl2c XCX 0 0 a yC, yD12c yCx 0 0 0 >i + XCX (B2 + XDn)c XCX 0 0 0 yCx a + yDx2c yCx 0 A . L 0 0 0 0 and X=B2d{I- Dnd)~\ y = b(I- Dl2dyl; J8 XDXX yDu XDXX L fl, J Let S = r7- 0 0 0 0 -0 0 0 K 0 0 0 0 /„ 0 0 0 0 0 K 0 h 0 0 0 0 Ip 0 Ip 0 ln -h 0 -h 0 /„
242 Rational Matrix Functions where n x n and p x p are the sizes of A and a, respectively. Then 5~' = nn o o o o -/„-| o o ip o -ip o o /. o -/„ o o 0 0 0 /„ 0 0 0 0 0 >, ° .0 0 0 0 0 /„ and yS = [c2, D22c, C2, C2 + D22d(I - Dl2dylClt D22(I - dDl2)~lc, 0] 5" aS = S~lp = [A 0 0 0 fl 0 0 B2c A\ 0 0 0 fl[ + XDU . fl, Writing (7.6.12) in the form 9wiY) = $ + ys(\i - s~ldsyls'l0 we see that formula (7.6.11) follows. Assume now that (7.6.11) is a minimal realization. Let xE. <p" be such that [£] Akx = 0 (7.6.13) for all nonnegative integers k. Using formula (7.6.7), one proves by induction on k that «tH"o*]. *-<>•■■■ <«■"> Indeed, (7.6.14) holds for k-0. Assuming that (7.6.14) is true for k - 1, we have where the last equality follows in view of (7.6.13). Now
Linear Fractional Transformations 243 ya = (ci + D22d(I- D12d)_1C,)^ = 0 , k = 0,1, and x = 0 because (y, a) is a null kernel pair. [This follows from the minimality of (7.6.11)] So the pair (C, A) is also a null kernel pair. To prove that (A, B) is a full-range pair, observe that a can be written in the form k-\ Ak + 2 A'B2Ylk 2 ^'B2Z,* it = 0,1,. (7.6.15) where y|A and Z,A are certain matrices and the stars denote matrices of no immediate interest. Formula (7.6.15) can be proved by induction on k by means of formula (7.6.7). From the minimality of (7.6.11) it follows that for every x G <p" there exist vectors v0,. . . ,vq G <pm' such that But then, using (7.6.15) and (7.6.8), we have * = 2 {U* + 2 A'B.Y.Ab, + B2d(I- Dl2d)lDu\vk + 2 A'B2Zlkb(I-DndylDnv\= 2 /l*[fl„fl2]M ;=o J *=<> L * J and (,4, B) is a full-range pair. So the realization (7.6.4) is minimal. Now consider the realization (7.6.5). Let xG <pp be such that cakx = 0, A: = 0,1,... . One proves that a* = k , A: = 0,1,. . . using an argument analogous to that used in obtaining (7.6.14). Hence iYi> %]«*[ J =l*> %lf ! 1 = D22(I-dDl2y1ca"x = 0 In view of the minimality of (7.6.11) we obtain x = 0, and (c, a) is a null kernel pair. Finally, write a* in the form k-\ 2 a'bzlk ak + 2 afy„ A = 0,1,... (7.6.16) for some matrices zlk and ylk. [Again, equation (7.6.16) can be proved by induction on k using (7.6.7).] For every x€l$p by the minimality of (7.6.11) there exist vectors u0,. . . , uq G <£""' such that
244 Rational Matrix Functions From (7.6.16), it follows that i x — 2j akbwk * = 0 for some vectors w0,...,wq, and the full-range property of (a, b) is proved. Hence the realization (7.6.5) is minimal as well. □ Observe that if £>12 = 0, D21 = 0, DU = I, C, = 0, B,=0, we have W21(A) = 0, W12(A) = 0, Wn(\) = l, and so 9W(Y)(X) = W22(X)V(X) On the other hand, formulas (7.6.7)-(7.6.10) take the form d J . y = [C2,D22c] , 8 = D22d which coincides with formula (7.2.9) for the realization of a product of rational matrix functions. So (7.6.11) is a generalization of (7.2.9). On the other hand, putting Dl2 = 0, D2l = 0, D22 = /, C2 = 0, B2 = 0, we have ^(V)(A) = HA)WM(A) and formula (7.6.11) gives another version for the realization of the product of two rational matrix functions: 7.7 LINEAR FRACTIONAL DECOMPOSITIONS AND INVARIANT SVBSPACES FOR NONSQUARE MATRICES Let U( A) be a rational matrix function of size q x s with finite value at infinity. A linear fractional decomposition of t/(A) is a representation of U(\) in the form U{k) = 9w(\) (7.7.1) for some rational matrix functions W(A) and V(X) that take finite values at infinity. In this section we describe linear fractional decompositions of i/(A) A 0 B2c P =
Linear Fractional Decompositions and Invariant Subspaces 245 in terms of certain invariant subspaces for nonsquare matrices related to a realization of t/(A). Minimal linear fractional decompositions (7.7.1) are of particular interest. First observe that the definition of the McMillan degree of a rational matrix function with value / at infinity (given in Section 7.3) extends verbatim to a (possibly rectangular) rational matrix function W(\) with finite value at infinity: namely, 8(W) is the size of the matrix A taken from any minimal realization W(\) = D + C(XI-A)~lB of W(A). In any linear fractional decomposition (7.7.1) of £/(A) for which the rational functions W(A) and K(A) take finite values at infinity, we have 8(U)<8(W) + 8(V) (7.7.2) Indeed, assuming that (7.6.4) and (7.6.5) are minimal realizations of W(A) and K(A), respectively, then by Theorem 7.6.1 U(\) has a realization (not necessarily minimal) 8 + y(A/- a)~'/3, where the size of a is t x t, with t= 8(W) + 8(V). Hence (7.7.2) follows. The linear fractional decomposition (7.7.1) is called minimal if equality holds in (7.7.2), that is, 8(U) = 8(W) + 8(V). As in the preceding paragraph, Theorem 7.6.1 implies that (7.7.1) is minimal if and only if for some (and hence for any) minimal realizations (7.6.4) and (7.6.5) of W(A) and t/(A), respectively, the realization (7.6.11) of £/(A) = &W(V) is again minimal. Let t/(A) = 5 + y(A/-a)"'^ (7.7.3) be a realization (not necessarily minimal) of t/(A), where a, J3, y, and 8 are matrices of sizes / x /, / x s, q x /, and q x s, respectively. Recall from Theorem 6.1.1 that a subspace M C <p' is [a /3] invariant if and only if there exists an 5 x / matrix F such that M is invariant for a + pF. Also (see Theorem 6.6.1), a subspace Jf C <p' is invariant if and only if there exists an / x q matrix G such that (a + Gy)Jf C Jf. For the purpose of this section we can accept these properties as definitions of [a /3]-invariant and -invariant subspaces, respectively. A pair of subspaces (M,, M2) of £' will be called reducing with respect to realization (7.7.3) if Mx is [a /3] invariant, M2 is invariant, and Ml and Ji2 are direct complements to each other in <p'. The following theorem provides a geometrical characterization of minimal linear fractional decompositions of U(X) in terms of its realization (7.7.3).
246 Rational Matrix Functions Theorem 7.7.1 Assume that (Ml,M2) is a reducing pair with respect to the realization (7.7.3) of U( A). The following recipe may be used to construct realizations of rational matrix functions W( A) and V( A) such that U(\) = 9W(V) (7.7.4) and = \DU + Cl(\I-AylBl Dn + C,(A/- A)~1B21 ~ID21 + C2(A/- A)~lBl D21 + C2(\I- A)~lB2i' (7.7.5) with a transformation A: Mx-*Mx, and V(X) = d + c(XJ-a)'lb: 4?-»<p* (7.7.6) with a transformation a: M2-^> M2: (a) choose any transformation and any transformation d: <ps—»(p* such that the transformations Dn, D22 and J — Dl2d are invertible and 8 = D21 + D22d(I - Dl2d) 'D,, (7.7.7) (b) choose any transformations F:<p'—»<p* and G:$q—>$ for which (a + BF)Ml CM{ and (a + Gy)M2 C M2; (c) let r oc oc i /3 = [^]:<P^^,+^2; T = [7. %]:^,+^2-<F' (7.7.8) F=[F, F2]:^, + i«2-»^; G = [ ']: £*-».*, + J<2 be block matrix representations with respect to the direct sum decomposition <p = My + M2. Then, defining
Linear Fractional Decompositions and Invariant Subspaces 247 ^ = «ii-G,(5-D21)F1 B,-A + G,(5-Z)2I) B2 = -GlD22 (7.7.9) C^-DnF, C2 = Tl + (5-D2I)F, and a=a22- B2D~{llDX2{I- dDn)D22y2 fe = /32D-'(/-D12d) (7.7.10) c = (/-dD12)D;2'r2 equation (7.7.4) /jo/tij. Moreover, if, in addition, the realization (7.7.3) is minimal, the linear fractional decomposition (7.7.4) is minimal as well; and conversely, any minimal linear fractional decomposition (/(A) = W21(A) + W22(A)V(A)(/ - W21(A)V(A)r V,,(A) (7.7.11) of f/(A) where the rational matrix functions r^,(A) w12(A)i W(A) LW21(A) W22(A)J and V(\) take finite values at infinity and the matrices W, ,(<*>) and W22(o°) are invertible, can be obtained by this recipe. Proof. Let A, Bjt Cy, D:/ and a, b, c, d be defined as in the recipe. Then, using the relationships (7.7.7), (7.7.9), and (7.7.10) and the equalities a21 + /32F,=0 and al2 + Gly2=0 (which follow from the in- variance of Mx and M2 under the transformations a + BF and a + Gy, respectively), one checks that the equalities (7.6.7)-(7.6.10) hold. Now by Theorem 7.6.1. we obtain the linear fractional decomposition (7.7.4). Assume now that (7.7.3) is a minimal realization of t/(A); hence 8(U) = I. By Theorem 7.6.1 the realizations (7.7.5) and (7.7.6) are minimal, so 8(W) = dim M^ 8(V) = dimM2 As Mx and M2 are direct complements to each other in <p', we have 8(U) = 8(W) + 8(V), and the minimality of the linear fractional decomposition (7.7.4) follows.
248 Rational Matrix Functions Conversely, assume that (7.7.3) is a minimal realization of £/(A), and let (7.7.10) be a minimal linear fractional decomposition of £/(A), where the rational functions W(A) and K(A) are finite at infinity and Wn(<x>), W22(o°) are invertible. Here and K(A) is of size ^xj. [The sizes of W(A) and V(\) are dictated by formula (7.7.11) and by the invertibility of W, ,(<*>) and W22(oo); in particular, the matrix functions WH(A) and W22(A) must be square.] Let W(A)=[^I D22] + [c2](A/_/irl[B' B>] (7713) be a minimal realization of W(\) partitioned as in (7.7.12), where the matrix A has size n x n, n = 8(W). Let V(X) = d + c(\I-a)'ib (7.7.14) be a minimal realization of V( A) in which a is p x p, p = 8{V). By Theorem 7.6.1, form a realization l/(A) = 8' + y'(A/-a')-'0' (7.7.15) where a', j3\ y', and 8' are given by formulas (7.6.7), (7.6.8), (7.6.9), and (7.6.10), respectively, using the realizations (7.7.13) and (7.7.14). As (7.7.11) is a minimal linear fractional decomposition, the realization (7.7.15) is minimal. [The size of a' is (n + p) x (n +/?).] Comparing the minimal realizations (7.7.3) and (7.7.15) we find, in view of Theorem 7.1.4, that 8 = 8' and there exists an invertible transformation S: <p" + <pp-»<£' such that a = Sa'S~\ 0 = S0\ y = y'S~l Putting Mx = 5(<p" + {0}), M2 = 5({0} + C), '-i-i>r.'c1i-u..i, G.["S^™B'D-'] one verifies that (a + ^F)MlC Ml, (a + Gy)M2C M2, and the minimal linear fractional decomposition (7.7.11) is given by our recipe. □ Observe that the linear fractional decomposition of £/( A) described in the recipe of Theorem 7.7.1 depends on the reducing pair {Ml, M2), on the choice of D and d such that condition (a) holds, and on the choice of F and
Linear Fractional Decompositions and Invariant Subspaces 249 G such that (a + pF)M1 C Mt, (a + Gy)M2CM2. [We assume that the realization (7.7.3) of U(\) is fixed in advance.] We determine the parts of this information that are uniquely defined by the linear fractional decomposition. Let us introduce the following definition. Let (Mlt M2) be a reducing pair [with respect to the realization (7.7.3)] and F: $'-> <pf, G: $"->■ <£' be transformations such that (a + pF)Ml Ci„(« + Gy)M2 C Ji2, and write F=lFltF2), G = [%] with respect to the direct sum decomposition <p' = Mx + Ji2. The quadruple (Mx, M2; F,, G,) will be called a supporting quadruple [with respect to the realization (7.7.3)]. Given a supporting quadruple, for every choice of D and d satisfying condition (a) of Theorem 7.7.1, the recipe produces a linear fractional decomposition of £/(A). We now have the following important addition to Theorem 7.7.1. Theorem 7.7.2 Assume that the realization (7.7.3) is minimal, and let (7.7.11) be a minimal linear fractional decomposition of f/(A) such that W1;(A) and K(A) take finite values at infinity and the matrices W, ,(<*>) and W22(°°) are invertible. Then there exists a unique supporting quadruple Q = (My, M2; Fl, G,) that produces, together with some choice of D and d satisfying condition (a), the decomposition (7.7.11) according to the recipe of Theorem 7.7.1. Proof. The existence of Q is ensured by Theorem 7.7.1. To prove the uniqueness of Q, assume that Q' = (M[, M2, F\, G[) is another supporting quadruple that gives rise (with some choice of D and d) to the same decomposition (7.7.11). As D = W(pc), d = V(°°), we see that actually the matrices Ld21 d22J and d, which, together with Q\ give rise to the decomposition (7.7.11) are the same matrices chosen to produce (7.7.11), together with Q. Further, let (7.7.8) be the block matrix representations of a, p, and y with respect to the direct sum decomposition <p' = Jil + Ji2, and let «=[:■:!;] *-[£]. -^ be the corresponding representations with respect to the direct sum <p' = M[ + M'2. We now have two realizations for W(A):
250 Rational Matrix Fnnctions W(A) ol D:V[cli"-«r'W B!l (7.7.16) where A, Bt, and Ci are given by formulas (7.7.9) and A', B\, and C;' are given by (7.7.9) with a,,, G,, F,, fl„ r, replaced by «;„ G',, F|, fl;, y;, respectively. By Theorem 7.6.1, both realizations (7.7.16) are minimal, so in view of Theorem 7.1.3 there exists an invertible transformation S: MX—*M[ such that /l = 5-U'5,[^'] = [^]5,[B1 «2] = S-[B; fl2] (7.7.17) Similarly, we have V(A) = d + c(A/ - a)~lb = d + c\XI- a') V (7.7.18) where a, b, and c are given by (7.7.10) and a', b', and c' are given by (7.7.10) with a22, B2, y2 replaced by a'22, B2, y'2, respectively. Since both realizations (7.7.18) are minimal, we have a= T~la'T , c'T, b=T~xb' (7.7.19) for some invertible transformation T: M2- We now verify that ■ M.'2 ■ \s~l o ](■«;, «;2]rs o] [«„ «I2] L 0 T~ll\-a'21 a'22\l0 Ti U2] a22J [5o r0)][fi;] = [^]; [y[ y'Ao °] = [* *] (7.7.20) Indeed, formulas (7.7.9) together with (7.7.17) give f, = f;s, gx = s~1g\, fl^s-'a;, y1 = y'1s and «„ - G,(5 - D2I)F, = S-'(a;, ~ G;(5 - £>2I)F,')5 = 5-'a'115-GI(5-D21)F1 so an = S-^,'^. From formulas (7.7.10) and (7.7.19) one obtains
Linear Fractional Decompositions: Further Deductions 251 ft=r"1^, y2 = y'2T and «22 = « - /32£>H'DI2(/ - dDl2)D22y2 = T~\a' - ^'2D'^Dn(I- dDn)D22y'2)T = T'la22T Further, the definition of the supporting quadruples Q and Q' implies «i2 = -Gi?2' «2i = -/32F,, a'12 = -G;y2, a21 = -$'2F\ so 5 a'12r= — S G[y'2T = — G,y2 = a12 and 7^1a2I5=-7-1^2F;5=-/32F1=a21 All the established relationships verify the equalities (7.7.20). It remains to observe that the transformation V= is a similarity of the minimal realization (7.7.3) with itself. Since such a similarity must be unique (Theorem 7.1.4), it follows that V= /andhenceMl = M\,M2 — M'2, Fx = F\, and G, = GJ. D 7.8. UNEAR FRACTIONAL DECOMPOSITIONS: FURTHER DEDUCTIONS We consider here some deductions, examples, and results on linear fractional decompositions that follow from the main theorems, Theorems 7.7.1 and 7.7.2. The particular case when 8 = /, D - I, and d = I in Theorems 7.7.1 and 7.7.2 is of special interest. In this case condition (a) of Theorem 7.7.1 is satisfied automatically, and we have the following. Theorem 7.8.1 Let £/(A) = / + y( A/-«)"'£ (7.8.1) be a minimal realization of the rational q * q matrix function U(X). Let {Ml, Ji2) be a reducing pair for the realization (7.8.1), and write
252 Rational Matrix Functions [a,, «121 r. Mt + M2 —* Mx + M2 Oty\ 22 Choose any transformations F=[Fl F2\:MX+M2^^ , G = [ J ]: f«-» Mx + J<2 /« smc/i a way </ia< (a 4- f}F)Ml Ci,, (a + Gy)J(2 C M2. Then W(\) = I+\ "/' l(A/-(a1I-G,F1))-,[,31 + G1,-G1] (7.8.2) and K(A) = /+r2(A/-a22)-'/32 (7.8.3) produce a minimal linear fractional decomposition £/( A) = ^(K). Conversely, every minimal linear fractional decomposition U(\) = 3PW(V) with W(°o) = / and K(oo) = / can be obtained in this way, and the quadruple (Mx, M2; Fx, G,) is determined uniquely by W(\) and V(A). Let us give a simple example illustrating Theorem 7.8.1. example 7.8.1. Let ^rr <,-;-]■ •- A minimal realization for f/(A) is easy to find: U(\) = 8+y(\I-a)~lp with 8 = y = [} = I, «=n nWe ^nc^ a" nontrivial [i.e., such that W(\)^I, K(A)^7] minimal linear fractional decompositions f/(A) = &W(V) such that VV(oo) =/, V(<*>) = I. Every subspace in <p2 is [a /3] invariant, as well as invariant. We consider the case when the one- dimensional subspaces Ml and M2 and <p2 that are direct complements to each other are of the form
M,= Span Then one computes Linear Fractional Decompositions: Fnrther Deductions 1 L Xi 253 M2 = Span , x i* y exy ?y y-x y-x -ex2 -exy Ly-x y-x. p = y y-x -x y-x -1 " y-x 1 y-x. y = 1 . X 1 y- with respect to the direct sum decomposition <p = Ml + M2, where (l,x) and (l, y) are chosen as bases in Mx and M2, respectively. Further, F = \Fi Fi\is such that (<* + PF)-^i C.M{ if and only if the transformation '■-[*] satisfies -xfi+fi = ex (7.8.4) The transformation G= ' is such that (a + Gy)M2 C M2 if and only if for G, = [g, g2] we have ffi + ygi = ey y-x (7.8.5) Now formulas (7.8.2) and (7.8.3) give W(\) = I+(\-(y^-g1f1-g2f2)) ' -1 -u ■ u Lx+f2 \ y-x/ lyily-x y — xl (7.8.6) (7.8.7) We conclude that for every six-tuple of complex numbers (*, y, /,, f2, g,, g2) such that *#>' and (7.8.4) and (7.8.5) hold, there is a minimal linear fractional decomposition t/(A) = &W(V) where W(A) and V(\) are given by equalities (7.8.6) and (7.8.7), respectively. □ As an application of Theorem 7.7.1, let us consider linear fractional decompositions with several factors.
254 Rational Matrix Functions Theorem 7.8.2 Let U(X) be a rational matrix function that has no pole at infinity, and let m = S(U). Then U(\) admits a linear fractional decomposition £/(A) = 9Wi{9Wi{ ■ ■ ■ (9WmJ WJ) ■ • •) (7.8.8) where for j = 1,. . . , m H^(A) is a rational matrix function that is finite at infinity with McMillan degree 1. Moreover, W;(A) can be chosen in such a way that &Wi(V) = Wj2(\) + V(\)Wn(A), j=\,...,m-\ (7.8.9) for any rational matrix function K(A) of suitable size, where W;I(A) and W/2(A) are rational matrix functions of appropriate sizes with Wj2dx>) = 0, Observe that the decomposition (7.8.8) is minimal in the sense that 5(U) = 8(Wl) + ••• + d(Wm). So, in contrast with the factorization of rational matrix functions (Example 7.5.1), nontrivial minimal linear fractional decompositions always exist. Proof. Choose a minimal realization U(\) = 8 + y(\I-a) ~'0 By the pole assignment theorem (Theorem 6.5.1), there exists a transformation F such that a(a + /3F) = {A,,. . . , A,} with distinct numbers A,,. . . , A, (here / x / is the size of a). So there is a basis g,,. . . g, in <p' such that (a + flF)gj = A;gy, / = 1,...,/. On the other hand, for any transformation G: (p9—> <p' there is a basis /,,...,/, in <p' in which the matrix of a + Gy has a lower triangular form (Theorem 1.9.1). Choose gj in such a way that g/( /2, f3,. . . , /; are linearly independent and put Ml = Spanfg^}, M2 = Span{/2, /3, • • • , /,}• Then (Mlf M2) is a reducing pair and the recipe of Theorem 7.7.1 (with D = I, d= 8) produces a minimal linear fractional decomposition i/(A) = ^,(1/,), where 8(W) = 1 and W(o°) = /. Moreover, taking G = 0 it follows that W( A) has the form W(AHwi2(A) /J Hence ^(V) has the form (7.8.9). Now apply the preceding argument to i/,(A), and so on. Eventually we obtain the desired linear fractional decomposition (7.8.8). □ Observe that, because 8(Wj) = l, each function W;(A) from Theorem 7.8.2 has only one pole /*., and the multiplicity of this pole is 1. The proof of
Exercises 255 Theorem 7.8.2, together with formula (7.7.8) for the transformation A, shows that the functions W;(A) can be chosen with the additional property that /u,,,. . . , fim are the eigenvalues (counted with multiplicities) of the transformation a taken from a minimal realization (7.8.3) of U(X). 7.9 EXERCISES 7.1 Find realizations for the following rational matrix functions: (a) (b) (c) r i a- i rn-(A-i)-' a- i L 0 1 + (A +1)"1 J Determine whether these realizations are minimal. 7.2 Find the McMillan degree and a minimal realization for the following rational matrix functions: (a) (b) A-+3A + 2 A2+2A + 1 A + 2 A2 + l A2 + 3A + 4 A2 + 3A+2 1 (A-2)2 A J_ A .2 A + 2J 7.3 Reduce the following realizations to minimal realizations or o (a) (b) (c) 1 + [0 1 0 0](A/-/„(Ao)r l+Cp(A/-/„(A0))-C; where Cp is the 1 x n matrix with 1 in the pth place and zeros elsewhere; 1 + [1 0 0] [XI- rl l ll l l l .1 l l. V / roi l .0.
256 Rational Matrix Functions 7.4 Find minimal realizations for the following scalar rational functions: A-A. (a) y—^, A,*A2 (A - A )* ^ ( _ t*' A, t-4 A2 , where A: is a positive integer (A — A2) [Hinf: In the minimal realization I + C(\I- A)'lB the matrix A is the Jordan block of size k with eigenvalue A2.] (c) 2fly(A-A0)-'" 7.5 Find a minimal realization for the scalar rational function with finite value at infinity, assuming that its representation as a sum of simple fractions is known, that is, of the form E^=1 E*=0a;7(A - A,)-'. [Hint: Use Exercise 7.4 (c) and Exercise 7.11.] 7.6 Show that if W,(A) = /„ + C,(A/- Al)~xBl , W2(\) = Im + C2(\I-A2)~lB2 (1) are realizations for n x n and my. m rational matrix functions W, (A) and W2(X), then the (n + m) x (n + m) rational matrix function W, (A) © W2( A) has realization (2) Show, furthermore, that (2) is minimal if and only if each realization (1) is minimal. 7.7 Describe a minimal realization for the 2x2 circulant rational matrix function where a,(A) and a2(A) are scalar rational functions with finite value at infinity. 7.8 Describe a minimal realization for the n x n circulant rational matrix function
Exercises 257 w «,(A) «2(A) «-(A) «i(A) .a2(A) fl3(A) «,(A) " «„-.(*) «,(A) J [As usual, assume that W(o°) is finite at infinity.] 7.9 Let W,(A) and W2(A) be rational matrix functions with realizations W,( A) = D, + Cy( A/ - /!;)"'By, y = l,2 Show that the sum Wt(A) + W2(A) has the realization (3) Wl(A) + ^(A)-DI + Da + [C1 C2](A/-[^ £])"'[£] (4) 7.10 Give an example of rational matrix functions W,(A) and W2(A) with minimal realizations (3) for which the realization (4) is not minimal. 7.11 Assume that the realizations (3) are minimal and Al and A2 do not have common eigenvalues. Prove that (4) is minimal as well. [Hint: We have to show that l[C,, C2], ' ) is a null kernel pair /\Al 0 1 [ B.~\\ and I I, Ms a full-range pair. Suppose that x and y are such that ClAklx+ C2A2y = 0 for k = 0, 1,. . . . Because cr(>4,) n <r(A2) = 0, for jfe = 0,1,. . . there exists a polynomial pt(A) such that pk(A{) = 0, pk(A2) = Ak. Then 0 = CxPk{Ax)x + C2p,(,42)y = C^Jy Hence _y = 0. Similarly, one proves that x - 0.] 7.12 Let W(A) = D + Sz/A-A/rl /=i be a rational n x n matrix function, where A,,. . . , Xk are distinct complex numbers. Show that W(\) admits a realization
258 Rational Matrix Functions W(\)=D + [I ••• /]diag[(A-A,)-7,...,(A-A,)-1/] Z2 l2»j When is this realization minimal? 7.13 Find a realization for a rational n x n matrix function of the form W(A) = Z> + EZ;(A-A,) / = i (where A,, . . . , Xk are distinct complex numbers). When is the obtained realization minimal? 7.14 Given a realization W(A) = C(A - A)~lB, find a realization for the rational matrix function / W(A)1 0 / J Is it minimal if the realization W(A) = C(A - ,4) '/? is minimal? 7.15 Given a realization W(\) = D + C(\I- A)'[B (5) of a rational matrix function, find a realization for W(a\ + /3), where a t^O and /3 are fixed complex numbers. Assuming that (5) is minimal, determine whether the obtained realization is minimal as well. 7.16 Given a realization (5), show that W(A2) has a realization *(*■,-«>♦[. o,(>,-[° £])"[•] If (5) is minimal, is this realization minimal as well? 7.17 Given a realization (5), find a realization for W(p(A)), where p(A) is a scalar polynomial of third degree. Is the realization obtained minimal if (5) is minimal? 7.18 Let W(X)= l+C(XJ- A) {B (6) be a minimal realization, (a) Show that
W(A)2 = / + Exercises A/- 259 [C 0]( A BC 0 A ra A 0 0 BC A 0 BC BC A BC- BC BC A . v \ / ~kB~ IB - B - is a realization of W(A)2. (b) Is the realization of W( A)2 minimal? (c) Is the realization minimal if, in addition, the zeros and poles of W( A) are disjoints? 7.19 For the minimal realization (6), show that W(A)* = / + [C0---0]|A/- LO 0 0 is a realization of W(X) . Is it minimal? Is it minimal if the zeros and poles of W(A) are disjoint? 7.20 Show that a realization W(\) = I + C(A/ - A)'lB is minimal if A and A - BC do not have common eigenvalues. (Hint: Use Theorem 7.1.3.) 7.21 Let W( A) be an n x n rational matrix function with W(o°) = / and assume that W(\) is hermitian for all real A that are not poles of W(A). Prove that for every minimal realization W(\) = l+C(\I- A)~lB there exists a unique invertible matrix S such that C=B*S, A = S~lA*S, B = 5"'C* 7.22 Show that the McMillan degree of fl+iz/A-^.)-1 where A,, . . . , \k are distinct complex numbers, is equal to the sum of ranks of Zx,. . . , Zk. 7.23 Show that for rational n x n matrix functions W{(\) and W2(A) with finite values at infinity the inequalities \S(Wy) - 5(W2)| < S( W, + W2) < S(W1) + S(W2) |5(W1)-8(W2)|<5(W1W2)<8(W1) + 5(W2) hold.
260 Rational Matrix Functions 7.24 Find the McMillan degree of the circulant rational matrix function W(A) ■«,(A) «2(A) «„(A) «i(A) La2(A) a3(A) «„(A) «„-i(A) «,(A) J 7.25 Find a minimal realization of W(A), and, with respect to this realization, describe all the minimal factorizations W(\)= W,(A)W2(A) of W(\) in terms of subspaces i? and M as in Corollary 7.3.2, for the following scalar rational functions: (a) (b) (c) (A-A,)2 (A-A2) 2 ' A,^A2 (A-A,)* (A-A2)* A, # A2, where k > 3 is a fixed integer k 2«,-(A-Aj -i 7.26 When is the realization /„ + /„( AIn - /I)~ B, where /I is upper triangular with zeros on the main diagonal and B is diagonal with distinct eigenvalues, minimal? Show that in this case W(\) admits a minimal factorization with factors having McMillan degree 1. 7.27 Prove that a circulant rational matrix function (Exercise 7.24) with value / at infinity admits a minimal factorization with factors having McMillan degree 1. 7.28 Let W(\) = J+C(XJ-A)'lB be a minimal realization, and assume that BC = 0. (a) Prove that W( A)"' + W( A) = 21. (b) Prove that W(A) admits a nontrivial minimal factorization if and only if A is not unicellular. 7.29 Let t/(A) = A- I - A J A,^A2 be a scalar rational function. Use the recipe of Theorem 7.7.1 to construct all minimal linear fractional decompositions t/(A) = &W{V),
Exercises 261 such that W(A) and K(A) take finite values at infinity and Wn(°°), W22(o°) are invertible. Find all the corresponding reducing pairs of subspaces with respect to a fixed minimal realization of U(\). 7.30 Show that all the following decompositions of a rational matrix function i/(A) are particular cases of the linear fractional decomposition: (a) U(X) = W,(A) + W2(A) (b) £/(A) = W,(A) + W2(A)W2(A) (c) f/(A) = (WI(A)-' + W2(A)-1)-1 7.31 For the rational function t/(A) given in Example 7.8.1, find all minimal linear fractional decompositions i/(A) = ^W(V), with W(oo) = / and K(oo) = /.
Chapter Eight Linear Systems In this chapter we show how the concepts and results of previous chapters are applied to the theory of time-invariant linear systems. In fact, this is a short self-contained introduction to linear systems theory. It starts with the analysis of controllability, observability, minimality, and state feedback and continues with a selection of important problems with full solution. These include cascade connections, disturbance decoupling, and output stabilization. 8.1 REDUCTIONS, DILATIONS, AND TRANSFER FUNCTIONS Consider the system of linear differential equations f dx( t) —if = Ax(t) + Bu(t); x(0) = *0, (20 (8.1.1) Vy{t) = Cx(t) + Du(t) where A: <pm^ <pm, B: ("-> <pm, C: <pm-^ <£', and D: $"-> <p' are constant transformations (i.e., independent of t). Here u(t) is an n-dimensional vector function on f > 0 that is at our disposal and is referred to as the input (or control) of the linear system [equations (8.1.1)]. The r-dimensional vector function y(t) is the output of (8.1.1), and the m-dimensional function x(t) is the state of (8.1.1). Usually the state of the system (8.1.1) is unknown to us and must be inferred from the input (which we know) and the output (which we may be able to observe, at least partially). Let x(t; x0, u) be the solution of the first equation in (8.1.1) [with the initial value jc(0) = x0\. It follows from the basic theory of ordinary differential equations [see Coddington and Levinson (1955), for example] that the solution x(t; x0, u) is unique and is given by the formula 262
Reductions, Dilations, and Transfer Functions 263 x(t\x0,u) = e'Ax0 + jei'~')ABu(s)ds, <>0 (8.1.2) Substituting into the second equation of (8.1.1), we have y = y(t; x0, u) = Ce'Ax0 + J Ce('_I)/,Bu(s) ds + Du(t), t >0 (8.1.3) Formula (8.1.3) expresses the output in terms of the input. In other words, the input-output behaviour of the system is represented explicitly. Now we introduce some important operations on linear systems of type (8.1.1). It is convenient to describe (8.1.1) by the quadruple of transformations (A, B, C, D). A linear system (A', B', C", D') with transformations A'; (pm'^(p"!', B':^"'-»<p"', C: <pm'^ <£r', D'. <p"'-» £'' will be called similar to (A, B, C, D) if there exists an invertible transformation 5: (pm'-> (pm such that A' = S lAS, C' = CS, B' = S~'fl, D' = D (In particular, this implies that m = m', n = n', r = r'.) We also encounter system (8.1.1) with transformations A:M—*M, B: <p"—> M, C: M-* <pr, and D: <p"—> <pr, where M is a subspace of <pm for some m. The definition of similarity applies equally well to this case. [In particular, similarity with the system {A', B', C, D') described above implies dim M = m'.\ A system (A', B', C, D') with A': <pm'-* <pm', B':<p"^(pm, C": <fm'-» <fr, D': <p"-* (pr will be called a d/to/on of (^, B, C, D) if there exists a direct sum decomposition <pm' = se + m + jr (8.1.4) with the two following properties: (1) the transformations A', B', C have the following block forms with respect to this decomposition A' '* 0 .0 * A 0 *' * * C' = [0 C *] B' = (8.1.5) where the stars denote entries of no immediate concern (so A: M—>M, C:M^>$r, B:$"^>M); (2) the system (A, B, C, D') is similar to (A, B, C, D). In particular, if (A', B\ C", D') is a dilation of {A, B, C, D), then D' = D. The form (8.1.5) for A' shows that the subspaces i? and i? 4- M are A' invariant; in other words, (8.1.4) is a triinvariant decomposition associated with the ,4-semiinvariant subspace M. Similarity is actually a particular case of dilation, with M — <£"" and i? = Jf = {0}. We say that (A, B, C, D) is a reduction of (/!', B\ C, D') if {A', B\ C, D') is a dilation of (/I, B, C, D).
264 Linear Systems The basic property of reductions and dilations is that they have essentially the same input-output behaviour; as follows. Proposition 8.1.1 Let (A', B',C',D') be a dilation of (A,B,C,D). Then, for x0 = 0, the input-output behaviours of the systems (A', B', C, D') and (A, B, C, D) are the same. In other words, if u(t) is any (say, continuous) n-dimensional vector function, then the output y = y(t; 0, u) of the system (A', B', C, D') and the output y - y(t; 0, u) of the system (A, B, C, D) coincide. Proof. Formula (8.1.3) gives y(t; 0, u) = J C'e(,~s)A'B'u(s) ds + D'u(t) , / > 0 y(f,0, «)=[ Cel'~')ABu{s) ds + Du(t) , t>0 As D' = D, and e^'~s)A (for a fixed / and s) admits a power series representation (see Section 2.6), we have only to show that for q = 0,1,. . . (8.1.6) CA"B Now (A, B, C, D') and (A, B, C, D) are similar, so there exists an invert- ible transformation S such that A = 5"U5, C= CS, and B = 5-1B. Hence CA"B= CS(S'lAS)"S'lB = CA"B and (8.1.6) follows. □ In practice one is concerned about the dimension m of the state space of a given system (8.1.1). It is desirable to make this dimension as small as possible without changing the input-output behaviour. We say that the system (8.1.1) is minimal if the dimension m of its state space is minimal among all linear systems {A', B', C, D') that exhibit the same input-output behaviour given the initial condition that that state vector is zero [i.e., *(0) = 0]. In view of Proposition 8.1.1, the following problem arises: given the linear system (8.1.1), not necessarily minimal, produce a minimal system by reduction of (8.1.1). We see later that this is always possible. CA"B = C'A,qB' Using formula (8.1.5), we obtain "* * * C'A'qB' = [0 C *] 0 A9 * 0 0* * B 0
Minimal Linear Systems: Controllability and Observability 265 To study this and other problems in linear system theory, it is convenient to introduce the transfer function. Consider the system (8.1.1) with x(0) = 0, and apply the Laplace transform. Denote by the capital Roman letter the Laplace transform of the function designated by the corresponding small letter; thus Z(A) = |o e-*sz(s)ds [It is assumed here that for t > 0 z(/) is a continuous function such that \z(t)\ ^ Ke^' for some positive constants K and fi. This ensures that Z(A) is well defined for all complex A with Re A > /u,.] The system (8.1.1) then takes the form \X(\) = AX(\) + BU(\) Y(\) = CX(\) + DU(\) Solving the first equation for X(\) and substituting in the second equation, we obtain the formula for the input-output behaviour in terms of the Laplace transforms: K(A) = [D + C(A/ - AY'B^iX) So the function W(A) = D + C(A/- A) lB performs the input-outut map of the system (8.1.1), following application of the Laplace transform. This function is called the transfer function of the linear system (8.1.1). Observe that the transfer function is a rational matrix function of size r x n that has finite value (=D) at infinity. Observe also that the transfer functions of two linear systems coincide if and only if the systems have the same input- output behaviour. In particular, systems obtained from each other by reductions and dilations have the same transfer functions. 8.2 MINIMAL LINEAR SYSTEMS: CONTROLLABILITY AND OBSERVABILITY Consider once more the linear system of the preceding section: ~^- = Ax(t) + Bu(t) ; f>0 (8.2.1) (y(t) = Cx(t) + Du(t) and recall that this system is called minimal if the dimension of the state space is minimal. [We omit the initial condition x(0) = x0 from (8.2.1); so (8.2.1) has in general many solutions *(/).]
266 Linear Systems Applying the results of Section 7.1 to transfer functions, we obtain the following information on minimality of the system (8.2.1). Theorem 8.2.1 (a) Any linear system (8.2.1) is a dilation of a minimal linear system; (b) the linear system (8.2.1) is minimal if and only if (A, B) is a full-range pair and (C, A) is a null kernel pair: Pi Ker CA' = {0} , 2 Im(fl'/1) = £m (8.2.2) where m is the dimension of the state space. Moreover, in (8.2.2) one can replace n"_0Ker CA1 by npj Ker CA' and EjL0Im(B;,4) by Ef,"0' \m(B'A), where p is any integer not smaller than the degree of the minimal polynomial of A. Indeed, (a) is a restatement of Theorem 7.1.3, and (b) follows from Theorem 7.1.5. It turns out that the conditions (8.2.2) obtained in Chapter 7 from mathematical considerations have important physical meanings, namely, "controllability" and "observability" of the linear system (8.2.1). Let us introduce these notions. The system (8.2.1) is called observable if for every continuous input u(t) and output y(t) there is at most one solution x(t). In other words, by knowing the input and output one can determine the state (including the initial value) in a unique way. Theorem 8.2.2 The system (8.2.1) is observable if and only if (C, A) is a null kernel pair: f) Ker CA' = {0} (8.2.3) Proof. Assume that (8.2.1) is observable. With y(t) = 0 and u(t) = 0, the definition implies that the only solution of the system ^ = /lx((), Ot(/) = 0 (8.2.4) for t>0 is x(t) = 0. If equality (8.2.3) were not true, there would be a nonzero x0 e DjL0 Ker CA' and the function x(t) - e'Ax0 would be a not identically zero solution of equation (8.2.4). Indeed, for every (>0we have
Minimal Linear Systems: Controllability and Observability 267 " 1 Thus observability implies the condition stated in equality (8.2.3). Now assume that (8.2.3) holds but (arguing by contradiction) the system (8.2.1) is not observable. Then there exist continuous vector functions y(t) and u(t) such that for / = 1, 2 and all t ^0, we obtain dxit) —^- = Ax ft) + Bu(t), y[t) = Cx,(t) + Du(t) (8.2.5) for some xx{t) and x2(t) that do not coincide everywhere. Subtracting (8.2.5) with/'= 2 from (8.2.5) with/= 1, and denotingx(0 = *,(?) - x2(t)^ 0, we have ^P = Ac(0; Cx(r) = 0, rso In particular, C[*(Ar)(0],=o = 0. Since x(?) = e'Ax0 it is found that Ci4**0 = 0, Jt = 0,1, ... Hence x0 = 0 by (8.2.3); but this contradicts x(t)^0. D The system (8.2.1) is called controllable if by a suitable choice of input the state can be driven from any position to any other position in a prescribed period of time. Formally, this means that for every xx G <pm, x2 G <pm, and <2 ><,>() there is a continuous function «(r) such that x(ty) = jc,, x(t2) = x2 for some solution x(t) of ^^ = Ax{t) + Bu{t), t > 0 (8.2.6) Note that in the definition of controllability the second equation y(t) = Cx(t) + Du(t) of equation (8.2.1) is irrelevant. Further, by replacing x(t) by x{t - ty) we can assume in the definition of controllability that tx is always 0. Theorem 8.2.3 The system (8.2.1) is controllable if and only if (A, B) is a full-range pair: 2 lm(A'B) = <pm / = 0 We need the following lemma for the proof of Theorem 8.2.3.
268 Linear Systems Lemma 8.2.4 Let G(t), /G [0, t0] be an m x n matrix depending continuously on t. Then I G(t)u(t) dt | u(t) is continuousj = Im I G(t)G(t)* dt (8.2.7) Proof. Let W= J„0 G(t)[G(t)]* dt. Assume x e <pm is such that x = Wy for some y e <p". Then putting u{t) = [G(t)]*y we find that x belongs to the left-hand side of (8.2.7). Conversely, if xl ^Im W, then there exists an x2 e <pm such that Wx2 = 0 and (at,, ;t2) #0. [Here we use the property that W= W* and thus Im W = (Ker W)x.] Arguing by contradiction, assume that there exists a continuous vector function u(t) such that Jo°G(0«(0 = *i f'° I x*G(t)u(t)dt = (xlyx2)^0 (8.2.8) Then On the other hand 0 = x*2Wx2 = jo° x*2G(t)G(t)*x2 dt = Jo'° ||G(0**2||2 dt and since the norm is nonnegative and G(f)* continuous, we obtain G(t)*x2 = 0, orx*G(0 = 0foralUe[0, t2]. But this contradicts (8.2.8). D Proof of Theorem 8.2.3. By formula (8.1.2) for every solution x(t) of (8.2.6) with *(0) = *, we have x(t) = e'Axx + jo e°-s)ABu(s) ds , t > 0 Hence x{t2) = e'*Axx + j* e(,>~°)ABu(s) ds = ehAxx + e'*A P esABu(s) ds From this equation it is clear that (8.2.1) is controllable if and only if for every t2 > 0 the set of m-dimensional vectors
Minimal Linear Systems: Controllability and Observability 269 jl e sABu{s) ds | u{t) is continuous? coincides with the whole space <£"". By Lemma 8.2.4, the controllability of (8.2.1) is equivalent to the condition that Im W, = <pm for all t>0, where W, = j'o esABB*esA' ds: <pm-» <pm We prove Theorem 8.2.3 by showing that for all t> 0 Ker W, = 2 Im BA') Ly=o J If x e Ker VV„ then jc* W>: = 0, that is (8.2.9) j \\B*e-sA'x\\2ds^0 (8.2.10) So B*e^/t*j: = 0, 0=£s;£f. [Otherwise, in view of the continuity of ||Z?*eJ/t*;t||2 as a function of s, we obtain a contradiction with (8.2.10).] Repeated differentiation with respect to s and putting 5 = 0 gives B*A*'~lx = 0, i = l,2,...,n It follows that x£flKer(BM,w)= f) [Im(/l' 'fl)]1 i = i ;=i L,=i J L,=0 J Assume now that x e [E;=i Im(,4'B)]\ Then B*A*'~lx = 0, / = 1, 2, .... It follows that B*eA'x = 0 when * >0, and hence x*W,x = 0 for / > 0. But Wt is nonnegative definite, so actually W,x - 0, that is, x G Ker Wr □ Combining Theorem 8.2.1 with Theorems 8.2.2 and 8.2.3, we obtain the following important fact. Corollary 8.2.5 The linear system (8.2.1) is minimal if and only if it is controllable and observable. This corollary, together with Theorem 7.1.5, shows that the concept of minimality for systems and realizations of rational functions are consistent,
270 Linear Systems in the sense that a system is minimal precisely when it determines a minimal realization for its transfer function. 8.3 CASCADE CONNECTIONS OF LINEAR SYSTEMS Consider two systems of type (8.1.1) (with initial value zero): *,(<>) = 0 1^,(0 = c,*1(o + d,m1(0 dx. -jj- = Alxl{t) + Blul{t); (8.3.1) and dx2 ~dl = A2x2(t) + B2u2(t) ; x2(0) = 0 ly2(t) = C2x2(t) + D2u2(t) (8.3.2) Suppose also that ux(t) and y2{t) are from the same space. The two systems are combined in a "cascade" form when the output y2 of the second system becomes the input «, of the first system. We obtain dxx ~dt Axxx(t) + Bxy2(t) = A1xl(t) + BtC2x2(t) + BxD2u2(t) and yx(t) = Cxx,{t) + Dxy2{t) = C,*,(r) + DsC2x2(t) + D,D2m2(0 Writing x(t) = ' , we obtain a new system of the same type: idx(t) _\A, dt L 0 BXC2 x(t) + BXD2 B, "2(0 >,(0 = [C, D,C2]*(0 + D,D2«2(0 (8.3.3) The system (8.3.3) is called a simple cascade composed of the first component (8.3.1) and the second component (8.3.2). Note that the dimension of the state space of the simple cascade is the sum of the state space dimensions of its components, and the input of the simple cascade coincides with the input of its second component, whereas the output of the simple cascade coincides with the output of the first component. Similarly, one can consider the simple cascade of more than two components. Let (/I,, Bx, C,, Dx),. . . ,(Ap, Bp, Cp, Dp) be linear systems of
Cascade Connections of Linear Systems 271 type (8.1.1). A linear system that is obtained by identifying the output of 04,., fl„ C„ D,) with the input of (/!,_,, B,_p C,.lt £>,_,), i = 2,3,...,p will be called the simple cascade of the systems (Al,Bl,Cl,Dl), . . . , (Ap, Bp, C , Dp). By applying formula (8.3.2) p-\ times, we see that such a simple cascade has the form dx[i) dt \4, B,C2 B,C3 0 A2 B2C3 : : -0 0 yl(t) = [C1,DlC2, B,Cp B2Cp *(<) + BlDp L B„ . ",(0 .D.C^WO + D.D^-'D^O (8.3.4) In the language of transfer functions the simple cascading connection has a very simple interpretation: formula (7.2.9) shows that the transfer function of the simple cascade of two systems is the product of the transfer functions of its first and second components (in this order). More generally, if (A, B, C, D) is the simple cascade of 04,, Bx, C,, D,), . . . , 04,, Bp, Cp, Dp), then D + C(A/- A)~lB = [D, + C,(A/- A.y'B,] ■ • ■ [D, + Cp(U-Ap)lBp\ The following problem is of considerable interest: describe the representation of a given linear system (A, B,C, D) as a simple cascade of other linear systems. We can assume that (A, B, C, D) is minimal (otherwise replace it by a minimal system with the same input-output behaviour). In order to relate this problem to the factorization problem for rational matrix functions described in Sections 7.3 and 7.5, we shall assume that D = I and that in each component (At, Bt, C,, D,) of the simple cascade (A, B, C, I) we have D, = I. Equation (8.3.4) shows that if (j4, B, C, D) is a simple cascade with components (/!,-, Bn C,, D,), / = 1,. . . , p, then the size of A [or, what is the same, the McMillan degree 8{W) of the transfer function W( A) of (A, B, C, I)] is equal to m, + • • ■ + mp, where mi is the size of Ar Denoting by W,(A) the transfer function of (A,, Bt, C,, D(), we have 5( W,) < mr On the other hand, as we have seen in the preceding paragraph W(A)=W,(A)---J*;(A) (8.3.5) which implies 8(W) < 5(W,) + • • • + 5(Wp) < m, + • • • + mp = 5(W) (8.3.6) So equality holds throughout (8.3.6), which means that the factorization (8.3.5) is minimal and that each system (A,, Bt, C,, D,), i = l,. . . , p is
272 Linear Systems minimal. Now we can use the results of Sections 7.3 and 7.5 concerning minimal factorizations of rational matrix functions to study simple cascading decompositions of minimal linear systems. The following analog of Theorem 7.5.1 is an example. Theorem 8.3.1 The components of every representation of a minimal system (A, B, C, J) as a simple cascade (with the transfer functions of the components having value I at infinity) are given by (ir.^ir,, tt.B, Ctt,, /), . . . , (vpAnp, irpB, Cttp, I) (8.3.7) where the projectors irl, . . . ,ir and associated subspaces if,,. . . , 5£ are defined as in Theorem 7.5.1. The transformations irjAirl in (8.3.7) are understood as acting in S£f, and the transformations Cirj and ir^B are understood as acting from i?y into <p", and from <p" into ^, respectively, where n is the number of rows in C (which is equal to the number of columns in B). We now describe a more general way to connect two linear systems. Consider the linear system dx -r = Ax+ Bu, y = Cx + Du , x(0) = 0 (8.3.8) and assume that the input vector u = u(t) and the output vector y = y(t) are divided into two components: Now let ^^ = aw(t) + bs(t), z(t) = cw(t) + ds(t), z(0) = 0 (8.3.10) be another linear system with the input s(t), output z(t), and the state w(t). (Here a, b, c, and d are constant matrices of appropriate sizes.) We obtain a new system by feeding the first component of the output of (8.3.8) into the input of (8.3.10) and at the same time feeding the output of (8.3.10) into the second component of the input of (8.3.8). [It is assumed, of course, that the vectors y^t) and s(t) are in the same space, as well as the vectors u2(t) and z(t).\ This situation is represented diagrammatically by ". J 1 ?! J I 1 Z^ 1' Z^ Z^ s» (8.3.11) y2 ' ' u2 z ' '
Cascade Connections of Linear Systems 273 Here 2, and %2 represent the linear systems described by equations (8.3.8) and (8.3.10), respectively. The new system has «,(/) as an input and y2(t) as an output and is called the cascade of (8.3.10) by (8.3.8). The "simple cascade" described in the first part of this section is a particular case of a cascade. Indeed, if the first component of the output yx(t) in the system (8.3.8) depends on M,(f) only, and y2(t) = u2(t), then the cascade described by (8.3.11) is actually a simple cascade. We turn now to a description of the cascade in terms of transfer functions. First, rewrite (8.3.8) in the form B. (:} x(0) = 0 ly2\ lc2\X + LD2l D22JLm2J where B = [BX B2], "[£]■ » Du Dl2 D2l D22 are the block matrix representations of B, C, and D conforming with the division [equations (8.3.9)] of y(t) and u(t). The transfer function of this system is W(x)~\-D21 dJ + [c2\(xi~a) [b' ^'[^(a) w22(\)\ where H^y( A) = £>.y + C,(A/ - A) lBf, i, j = 1, 2. So passing to the Laplace transforms, we have y,(A)i = rw11(A) y2(A)J Lw21(a) w12(A)]ri/,(A)i W22(A)JLt/2(A)J (8.3.12) where, as usual, the capital Roman letters indicate Laplace transforms of the functions designated by the corresponding lowercase letters. Let K(A) be the transfer function of (8.3.10); then Z(A) = K(A)5(A) (8.3.13) Now identify 5(A)=YI(A), Z(A)=f/2(A) (8.3.14) Using (8.3.12)-(8.3.14) we have (omitting the variable A)
274 Linear Systems y, = wllul + wnu2 = wuut + w12vy1 and hence Yx = (I-Wx2VylWuUx Further y2 = w2xux + w22u2 = w2xux + w22vyx = (w2l + w22v(i- wx2vylwn)ux So the cascade of a linear system with the transfer function K( A) by a linear system with the transfer function ^A) Lw21(A) W22(A)J has the transfer function U(X) given by the formula U(X) = W2X(X) + W22(X)V(X)(I- Wx2(X)V(X)ylWxl(X) We recognize that t/(A) is just a linear fractional transformation, U(X) = 2FW(V), as discussed in Chapter 7. Consequently, the results of Sections 7.6, 7.7, and 7.8 can be interpreted in terms of minimal cascades of linear systems. The cascade of (7.3.10) by (7.3.8) will be called minimal if the corresponding linear fractional decomposition U = &W(V) is minimal. As an example, let us restate Theorem 7.8.2 in these terms. Theorem 8.3.2 Any minimal linear system with m-dimensional state space can be represented as a minimal cascade of m linear systems each of which has one-dimensional state space. 8.4 THE DISTURBANCE DECOUPLING PROBLEM In this and the next section we consider two important problems from linear system theory in which [A B]-invariant subspaces (as discussed in Chapter 6) appear naturally and play a crucial role. Consider the linear system ^~ = Ax(t) + Bu(t) + Eq{t), t > 0 ' (8.4.1) I 2(0 = Dx(t)
The Disturbance Decoupling Problem 275 where A: <p"-+ <p", B: <pm^ <p", E: ("-> <p", and D: <p"-» <p' are constant transformations, and *(<), «(<), q(t), anc* 2(0 are vector functions taking values in <p", <pm, <p", and <pr, respectively. As in Section 8.1, <p" is interpreted as the state space of the underlying dynamical system, and u(t) is the input. The vector function z(t) is interpreted as the output. The term 17(f) represents a disturbance that is supposed to be unknown and unmeasurable. We assume that q(t) is a continuous function of t for f > 0. An important transformation of the system (8.4.1) involves "state feedback." This is obtained when the state x(t) is fed through a certain constant linear transformation F into the input, so the input of the new system is actually the sum of the original input u(t) and the feedback. Diagrammati- cally, we have u 1 F 1 z ta * Our problem is to determine (if possible) a state feedback F in such a way that, in the new system, the output is independent of the disturbance q(t). To express this problem in mathematical terms we introduce the following definition. The system (8.4.1) is called disturbance decoupled if for every x0 G <p" the output z(f) of the system (8.4.1) with x(0) = x0 is the same for every continuous function q(t). We have (cf. Section 8.1) x(t) = e'Ax(0) + J() e°'s)A[Bu(s) + Eq(s)] ds , t >0 and thus z(0 = De'Ax(0) + D jo el'~')ABu(s) ds + D J el'~')AEq{s) ds , t >0 Hence the system (8.4.1) is disturbance decoupled if and only if D\Qe1' S)A Eq(s) ds = 0 , f>0 for every continuous function q{t). We need one more notion from linear system theory. Consider the linear system dx dt Ax(t) + Bu(t); f>0; x(0)=0 (8.4.2) where A: <p"-><p" and B: <pm—>£" are constant transformations. We say
276 Linear Systems that the state vector y E <p" is reachable for the system (8.4.2) if there exist a r0^0 and a continuous function u(t) such that the solution x(t) of (8.4.2) satisfies x(t0) = y. As x(t)= \e{'-s)ABu(s)ds for / > 0, it follows easily that the set of all reachable state vectors of (8.4.2) is a subspace. Proposition 8.4.1 The set $t of reachable states coincides with the minimal A-invarant subspace that contains Im B: <& = (A | Im B) = Im B + A(\m B) + ■ ■ ■ + ^""'(Im B) C <p" Proof. By Lemma 8.2.4 we find that x E 01 if and only if x E Im[ jj ei'.-'"1B[e''»-'MBj. dsj = Im |o° e-'ABB*e-'A' ds for some /0>0. For any <0>0, let W. = f ° e-sABB*e~sA' ds '« Jo By equality (8.2.9) KerW,o = [ilm(/l'B)] or, taking into account the hermitian property of W, Im W, =2 Im(,4'fl) 0 /=o which coincides with (A\lm B) in view of Theorem 2.7.3. □ Using this proposition, we obtain the following characterization of disturbance decoupled systems. Proposition 8.4.2 The system (8.4.1) is disturbance decoupled if and only if
Using this proposition, we obtain the following characterization of disturbance decoupled systems.

Proposition 8.4.2

The system (8.4.1) is disturbance decoupled if and only if

$$ \langle A \mid \operatorname{Im}E\rangle \subseteq \operatorname{Ker}D $$

Returning to the problem mentioned above, note that state feedback is described by a transformation $F: \mathbb{C}^n \to \mathbb{C}^m$, and substituting $u(t) + Fx(t)$ in place of $u(t)$ in the system (8.4.1), we obtain the system with state feedback:

$$ \frac{dx(t)}{dt} = (A + BF)x(t) + Bu(t) + Eq(t), \quad t \ge 0; \qquad z(t) = Dx(t) $$

The new system has the same form as the original system (8.4.1), with $A$ replaced by $A + BF$. Our mathematical problem is: given transformations $A: \mathbb{C}^n \to \mathbb{C}^n$ and $B: \mathbb{C}^m \to \mathbb{C}^n$, and given subspaces $\mathcal{E} \subseteq \mathbb{C}^n$ (which plays the role of $\operatorname{Im}E$) and $\mathcal{D} \subseteq \mathbb{C}^n$ (which plays the role of $\operatorname{Ker}D$), find, if possible, a transformation $F: \mathbb{C}^n \to \mathbb{C}^m$ such that the subspace

$$ \langle A + BF \mid \mathcal{E}\rangle \overset{\text{def}}{=} \mathcal{E} + (A + BF)\mathcal{E} + \cdots + (A + BF)^{n-1}\mathcal{E} $$

[which is the minimal $(A + BF)$-invariant subspace containing $\mathcal{E}$] is contained in $\mathcal{D}$. The solution to this problem depends on the notion of $[A\;B]$-invariant subspaces, as developed in Chapter 6.

Theorem 8.4.3

In the preceding notation, there exists a transformation $F: \mathbb{C}^n \to \mathbb{C}^m$ such that

$$ \langle A + BF \mid \mathcal{E}\rangle \subseteq \mathcal{D} \tag{8.4.3} $$

if and only if the $[A\;B]$-invariant subspace $\mathcal{U}$ that is maximal in $\mathcal{D}$ contains $\mathcal{E}$. In this case any transformation $F: \mathbb{C}^n \to \mathbb{C}^m$ with $(A + BF)\mathcal{U} \subseteq \mathcal{U}$ (which exists by Theorem 6.1.1) has the property (8.4.3).

Proof. Assume that there is an $F: \mathbb{C}^n \to \mathbb{C}^m$ with the property (8.4.3). By Theorem 2.8.4 (applied with $A + BF$ playing the role of $A$ and any transformation whose image is $\mathcal{E}$ playing the role of $B$) the subspace $\langle A + BF \mid \mathcal{E}\rangle$ is $(A + BF)$ invariant, and thus (Theorem 6.1.1) it is $[A\;B]$ invariant. As $\langle A + BF \mid \mathcal{E}\rangle \supseteq \mathcal{E}$, and the maximal (in $\mathcal{D}$) $[A\;B]$-invariant subspace $\mathcal{U}$ contains $\langle A + BF \mid \mathcal{E}\rangle$, we obtain $\mathcal{U} \supseteq \mathcal{E}$. Conversely, assume $\mathcal{U} \supseteq \mathcal{E}$. By Theorem 6.1.1 there is a transformation $F: \mathbb{C}^n \to \mathbb{C}^m$ such that $(A + BF)\mathcal{U} \subseteq \mathcal{U}$. Now

$$ \langle A + BF \mid \mathcal{E}\rangle \subseteq \langle A + BF \mid \mathcal{U}\rangle = \mathcal{U} \subseteq \mathcal{D} $$

and (8.4.3) follows. □
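Theorem 8.4.3 turns disturbance decoupling into the computation of the maximal $[A\;B]$-invariant subspace $\mathcal{U}$ in $\mathcal{D}$. The book gives no algorithm at this point; the sketch below uses the standard recursion of geometric control theory, $\mathcal{V}_0 = \mathcal{D}$, $\mathcal{V}_{k+1} = \mathcal{D} \cap A^{-1}(\mathcal{V}_k + \operatorname{Im}B)$, which stabilizes at $\mathcal{U}$ after at most $n$ steps (Python/NumPy; all names and tolerances are ours):

```python
import numpy as np

def colspace(M, tol=1e-10):
    """Orthonormal basis of the column space of M."""
    if M.shape[1] == 0:
        return M
    U, s, _ = np.linalg.svd(M)
    return U[:, :int(np.sum(s > tol))]

def nullspace(M, tol=1e-10):
    _, s, Vt = np.linalg.svd(M)
    return Vt.conj().T[:, int(np.sum(s > tol)):]

def intersect(U, V, tol=1e-10):
    """Basis of (col U) ∩ (col V): if U zu = V zv, that common vector lies in both."""
    if U.shape[1] == 0 or V.shape[1] == 0:
        return np.zeros((U.shape[0], 0))
    Z = nullspace(np.hstack([U, -V]), tol)
    return colspace(U @ Z[:U.shape[1], :], tol)

def preimage(A, V, tol=1e-10):
    """Basis of A^{-1}(col V) = Ker(P A), with P the projector onto (col V)^perp."""
    Q = colspace(V, tol)
    P = np.eye(A.shape[0]) - Q @ Q.conj().T
    return nullspace(P @ A, tol)

def max_AB_invariant(A, B, Dsub, tol=1e-10):
    """Maximal [A B]-invariant subspace contained in col(Dsub) (the U of Theorem 8.4.3)."""
    V = colspace(Dsub, tol)
    while True:
        W = intersect(V, preimage(A, np.hstack([V, B]), tol), tol)
        if W.shape[1] == V.shape[1]:     # dimensions stabilize => V_{k+1} = V_k = U
            return W
        V = W
```

For the system (8.4.1) one takes the columns of `Dsub` to span $\operatorname{Ker}D$ (e.g., `Dsub = nullspace(D)`); Theorem 8.4.3 then asks whether the columns of $E$ lie in `max_AB_invariant(A, B, Dsub)`.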
When applied to the disturbance decoupling problem, Theorem 8.4.3 can be restated in the following form.

Theorem 8.4.4

Given a system (8.4.1), there exists a state feedback $F: \mathbb{C}^n \to \mathbb{C}^m$ such that the system

$$ \frac{dx(t)}{dt} = (A + BF)x(t) + Bu(t) + Eq(t), \quad t \ge 0; \qquad z(t) = Dx(t) \tag{8.4.4} $$

is disturbance decoupled if and only if the $[A\;B]$-invariant subspace $\mathcal{U}$ that is maximal in $\operatorname{Ker}D$ contains $\operatorname{Im}E$. In this case the system (8.4.4) is disturbance decoupled for every transformation $F: \mathbb{C}^n \to \mathbb{C}^m$ with the property that $\mathcal{U}$ is $(A + BF)$ invariant.

We illustrate Theorem 8.4.4 by a simple example.

EXAMPLE 8.4.1. Let

$$ A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad E = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}, \quad D = [\,b_1\;\;b_2\;\;b_3\,] $$

where $a_1$, $a_2$, and $a_3$, as well as $b_1$, $b_2$, and $b_3$, are complex numbers, not all zero. Using Theorem 6.4.1 and its proof, we find that a one-dimensional subspace $\mathcal{M}$ is $[A\;B]$ invariant if and only if $\mathcal{M} = \operatorname{Span}\{(1, \lambda, \lambda^2)\}$ for some $\lambda \in \mathbb{C}$, and a two-dimensional subspace $\mathcal{M}$ is $[A\;B]$ invariant if and only if either

$$ \mathcal{M} = \operatorname{Span}\{(1, \lambda, \lambda^2),\,(1, \mu, \mu^2)\}; \qquad \lambda \ne \mu; \quad \lambda, \mu \in \mathbb{C} $$

or

$$ \mathcal{M} = \operatorname{Span}\{(1, \lambda, \lambda^2),\,(0, 1, 2\lambda)\}; \qquad \lambda \in \mathbb{C} $$

Consider first the case when $\operatorname{Ker}D$ is $[A\;B]$ invariant. This happens if and only if $b_3 \ne 0$. Then obviously $\operatorname{Ker}D$ is the maximal $[A\;B]$-invariant subspace in $\operatorname{Ker}D$, and $\operatorname{Ker}D \supseteq \operatorname{Im}E$ if and only if $a_1b_1 + a_2b_2 + a_3b_3 = 0$.
So, when $b_3 \ne 0$, there exists a $1 \times 3$ matrix $F = [\,f_1\;\;f_2\;\;f_3\,]$ such that the system (8.4.4) is disturbance decoupled if and only if $a_1b_1 + a_2b_2 + a_3b_3 = 0$, and in this case one can take the $f_j$ in such a way that the polynomial $b_1 + b_2x + b_3x^2$ divides $-f_1 - f_2x - f_3x^2 + x^3$.

Assume now that $\operatorname{Ker}D$ is not $[A\;B]$ invariant, that is, $b_3 = 0$. If $b_2 \ne 0$, then the maximal $[A\;B]$-invariant subspace in $\operatorname{Ker}D$ is $\operatorname{Span}\{(1, \lambda_0, \lambda_0^2)\}$, where $\lambda_0 = -b_1/b_2$. In this case we have $\operatorname{Span}\{(1, \lambda_0, \lambda_0^2)\} \supseteq \operatorname{Im}E$ if and only if

$$ a_1 \ne 0, \quad a_1b_1 + a_2b_2 = 0, \quad a_2b_1 + a_3b_2 = 0 \tag{8.4.5} $$

So, if $b_3 = 0$ and $b_2 \ne 0$, there exists an $F = [\,f_1\;\;f_2\;\;f_3\,]$ as in Theorem 8.4.4 if and only if (8.4.5) holds, in which case one can take the $f_j$ in such a way that the polynomial $b_1 + b_2x$ divides $-f_1 - f_2x - f_3x^2 + x^3$. Finally, assume $b_2 = b_3 = 0$. Then the maximal $[A\;B]$-invariant subspace in $\operatorname{Ker}D$ is the zero subspace, and there is no $F$ for which the system (8.4.4) is disturbance decoupled. □
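A quick numerical check of the first case of this example, reusing the `reachable_subspace` sketch from Section 8.4 above (the specific numbers $a = (1, 2, 0)$, $b = (0, 0, 1)$, $F = [\,0\;0\;{-1}\,]$ are our choices, consistent with the conditions just derived):

```python
import numpy as np

A = np.array([[0., 1, 0], [0, 0, 1], [0, 0, 0]])
B = np.array([[0.], [0], [1]])
E = np.array([[1.], [2], [0]])    # a1*b1 + a2*b2 + a3*b3 = 0 holds
D = np.array([[0., 0, 1]])        # b3 = 1 != 0, so Ker D is [A B] invariant
F = np.array([[0., 0, -1]])       # x^3 - f3 x^2 - f2 x - f1 = x^2 (x + 1): divisible by x^2

R = reachable_subspace(A + B @ F, E)   # <A + BF | Im E>
print(np.allclose(D @ R, 0))           # True: the feedback system is disturbance decoupled
```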
8.5 THE OUTPUT STABILIZATION PROBLEM

Consider the system

$$ \frac{dx(t)}{dt} = Ax(t) + Bu(t), \quad t \ge 0; \qquad z(t) = Dx(t) \tag{8.5.1} $$

where the transformations $A: \mathbb{C}^n \to \mathbb{C}^n$, $B: \mathbb{C}^m \to \mathbb{C}^n$, and $D: \mathbb{C}^n \to \mathbb{C}^r$ are constant. The problem we deal with in this section is that of stabilizing the output $z(t)$ by means of a state feedback while still maintaining the freedom to apply a control function $u(t)$. More exactly, the problem is to find a transformation $F: \mathbb{C}^n \to \mathbb{C}^m$ (which represents the state feedback) such that the solution of the new system

$$ \frac{dx(t)}{dt} = (A + BF)x(t) + Bu(t), \quad x(0) = x_0; \qquad z(t) = Dx(t) $$

with identically zero input $u(t)$ satisfies $\lim_{t\to\infty}z(t) = 0$ for every initial value $x_0$. As

$$ z(t) = De^{t(A+BF)}x_0 $$

this condition amounts to

$$ \lim_{t\to\infty}De^{t(A+BF)} = 0 \tag{8.5.2} $$

To study the property (8.5.2), we need the following lemma. This is, in fact, a special case of Theorem 5.7.2, but it is convenient to have some of the conclusions recast in the present form.

Lemma 8.5.1

Let $A: \mathbb{C}^n \to \mathbb{C}^n$ be a transformation, and let $\mathcal{M}_-$, $\mathcal{M}_0$, $\mathcal{M}_+$ be the sums of root subspaces of $A$ corresponding to the eigenvalues with negative, zero, and positive real parts, respectively. Then:

(a) $\lim_{t\to\infty}\|(e^{tA})|_{\mathcal{M}_-}\| = 0$;

(b) $\liminf_{t\to\infty}\|(e^{tA})|_{\mathcal{M}_0}\| > 0$, and for some $x_0 \in \mathcal{M}_0$ we have $\limsup_{t\to\infty}\|e^{tA}x_0\| < \infty$;

(c) $\lim_{t\to\infty}\|e^{tA}x\| = \infty$ for all $x \in \mathcal{M}_+ \smallsetminus \{0\}$.

Note that $\mathcal{M}_-$, $\mathcal{M}_0$, and $\mathcal{M}_+$ are $A$-invariant subspaces, and therefore these subspaces are also invariant for the transformations $e^{tA}$, $t \ge 0$.

Given $A$, $B$, $D$ as in (8.5.1) and any transformation $F: \mathbb{C}^n \to \mathbb{C}^m$, let

$$ \mathcal{N}_F = \bigcap_{i=1}^{n}\operatorname{Ker}\big[D(A + BF)^{i-1}\big] $$

be the maximal $(A + BF)$-invariant subspace in $\operatorname{Ker}D$. The condition (8.5.2) can be expressed in terms of the root subspaces of $A + BF$ as follows.

Lemma 8.5.2

We have

$$ \lim_{t\to\infty}De^{t(A+BF)} = 0 $$

if and only if $\mathcal{R}_{\lambda_0}(A + BF) \subseteq \mathcal{N}_F$ for every eigenvalue $\lambda_0$ of $A + BF$ such that $\Re\lambda_0 \ge 0$.

Proof. By Theorem 2.7.4 we have

$$ D = [\,0\;\;D_1\,], \qquad A + BF = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} $$

with respect to the direct sum decomposition $\mathbb{C}^n = \mathcal{N}_F \dotplus \mathcal{M}$, where $\mathcal{M}$ is a direct complement to $\mathcal{N}_F$ in $\mathbb{C}^n$.
Also, $(D_1, A_{22})$ is a null kernel pair. Hence

$$ De^{t(A+BF)} = [\,0\;\;D_1e^{tA_{22}}\,], \quad t \ge 0 $$

Now clearly $\mathcal{R}_{\lambda_0}(A + BF) = \operatorname{Ker}(A + BF - \lambda_0I)^n \subseteq \mathcal{N}_F$ for every $\lambda_0 \in \sigma(A + BF)$ with $\Re\lambda_0 \ge 0$ if and only if $A_{22}$ has all its eigenvalues in the open left half plane. So we have to prove that

$$ \lim_{t\to\infty}D_1e^{tA_{22}} = 0 \tag{8.5.3} $$

if and only if all the eigenvalues of $A_{22}$ are in the open left half plane. Let $x_0$ be an eigenvector of $A_{22}$ corresponding to the eigenvalue $\lambda_0$ with $\Re\lambda_0 \ge 0$. Then

$$ D_1e^{tA_{22}}x_0 = D_1e^{t\lambda_0}x_0 $$

and by Lemma 8.5.1 (applied to $A = A_{22}|_{\operatorname{Span}\{x_0\}}$)

$$ \liminf_{t\to\infty}\|D_1e^{t\lambda_0}x_0\| > 0 \tag{8.5.4} $$

unless $x_0 \in \operatorname{Ker}D_1$. But if $x_0 \in \operatorname{Ker}D_1$, then

$$ \operatorname{Span}\{x_0\} \dotplus \mathcal{N}_F \subseteq \bigcap_{i=1}^{n}\operatorname{Ker}\big[D(A + BF)^{i-1}\big] $$

which contradicts the definition of $\mathcal{N}_F$. Hence inequality (8.5.4) holds, and thus (8.5.3) does not. Conversely, if $\sigma(A_{22})$ lies in the open left half plane, then (8.5.3) holds by Lemma 8.5.1 (where $A_{22}$ plays the role of $A$). □

Now we can reformulate the problem of stabilizing the output by state feedback as follows: given transformations $A: \mathbb{C}^n \to \mathbb{C}^n$, $B: \mathbb{C}^m \to \mathbb{C}^n$, and a subspace $\mathcal{V} \subseteq \mathbb{C}^n$ (which plays the role of $\operatorname{Ker}D$), find an $F: \mathbb{C}^n \to \mathbb{C}^m$ such that every root subspace of $A + BF$ corresponding to an eigenvalue $\lambda_0$ with nonnegative real part is contained in $\mathcal{V}$. In this formulation there is nothing special about the set of eigenvalues with nonnegative real parts. In general, we can consider any proper subset $\Omega_b$ of $\mathbb{C}$ (the "bad" domain) in place of the closed right half plane. Now we can prove a general result on the solvability of this problem in terms of $[A\;B]$-invariant subspaces.

Theorem 8.5.3

Given transformations $A: \mathbb{C}^n \to \mathbb{C}^n$, $B: \mathbb{C}^m \to \mathbb{C}^n$ and a subspace $\mathcal{V} \subseteq \mathbb{C}^n$, there exists a transformation $F: \mathbb{C}^n \to \mathbb{C}^m$ such that $\mathcal{R}_{\lambda_0}(A + BF) \subseteq \mathcal{V}$ for
every eigenvalue $\lambda_0 \in \Omega_b$ of $A + BF$ if and only if, for every eigenvalue $\lambda_0$ of $A$ in $\Omega_b$, we have

$$ \mathcal{R}_{\lambda_0}(A) \subseteq \langle A \mid \operatorname{Im}B\rangle + \mathcal{U} $$

where $\langle A \mid \operatorname{Im}B\rangle$ is the minimal $A$-invariant subspace containing $\operatorname{Im}B$ and $\mathcal{U}$ is the maximal $[A\;B]$-invariant subspace in $\mathcal{V}$.

Proof. For a given transformation $F: \mathbb{C}^n \to \mathbb{C}^m$ let $\mathcal{N}_F$ be the maximal $(A + BF)$-invariant subspace in $\mathcal{V}$. As $\mathcal{N}_F$ is also $[A\;B]$ invariant, we have $\mathcal{N}_F \subseteq \mathcal{U}$. Assume now that $F$ is such that $\mathcal{R}_{\lambda_0}(A + BF) \subseteq \mathcal{V}$ for every eigenvalue $\lambda_0$ of $A + BF$ that belongs to $\Omega_b$. Then, by Lemma 8.5.2, for every $\lambda_0 \in \sigma(A + BF) \cap \Omega_b$ we have

$$ \mathcal{R}_{\lambda_0}(A + BF) \subseteq \mathcal{N}_F $$

and hence

$$ \mathcal{R}_{\lambda_0}(A + BF) \subseteq \mathcal{U} $$

Denote by $P$ the canonical linear transformation $\mathbb{C}^n \to \mathbb{C}^n/\langle A \mid \operatorname{Im}B\rangle$ (so $Px = x + \langle A \mid \operatorname{Im}B\rangle$ for every $x \in \mathbb{C}^n$), and for a transformation $X: \mathbb{C}^n \to \mathbb{C}^n$ for which $\langle A \mid \operatorname{Im}B\rangle$ is an invariant subspace, let $\widetilde{X}$ be the transformation induced by $X$ on $\mathbb{C}^n/\langle A \mid \operatorname{Im}B\rangle$. One easily checks that $\widetilde{A} = \widetilde{A + BF}$. Now use Lemma 6.5.4 to obtain

$$ P\mathcal{R}_{\lambda_0}(A) = \mathcal{R}_{\lambda_0}(\widetilde{A}) = \mathcal{R}_{\lambda_0}(\widetilde{A + BF}) = P\mathcal{R}_{\lambda_0}(A + BF) \subseteq \big(\langle A \mid \operatorname{Im}B\rangle + \mathcal{U}\big)/\langle A \mid \operatorname{Im}B\rangle \tag{8.5.5} $$

for every $\lambda_0 \in \sigma(A + BF) \cap \Omega_b$. Similarly

$$ P\mathcal{R}_{\lambda_0}(A) \subseteq \big(\langle A \mid \operatorname{Im}B\rangle + \mathcal{U}\big)/\langle A \mid \operatorname{Im}B\rangle $$

for every $\lambda_0 \in \Omega_b$ that is not an eigenvalue of $A + BF$. Consequently

$$ \mathcal{R}_{\lambda_0}(A) \subseteq \langle A \mid \operatorname{Im}B\rangle + \mathcal{U} \tag{8.5.6} $$

for every $\lambda_0 \in \sigma(A) \cap \Omega_b$.

Conversely, assume that (8.5.6) holds for every $\lambda_0 \in \Omega_b$. We have to prove that there exists an $F$ such that $\mathcal{R}_{\lambda_0}(A + BF) \subseteq \mathcal{V}$ for every
eigenvalue of $A + BF$ that belongs to $\Omega_b$. Let $F_0: \mathbb{C}^n \to \mathbb{C}^m$ be a transformation such that $(A + BF_0)\mathcal{U} \subseteq \mathcal{U}$. It is easily seen that the subspace $\mathcal{M} \overset{\text{def}}{=} \langle A \mid \operatorname{Im}B\rangle + \mathcal{U}$ is $A$ invariant. We have $\overline{A + BF_0} = \overline{A}$, where the upper bar denotes the induced transformation on $\mathbb{C}^n/\mathcal{M}$. Denoting by $Q$ the canonical transformation $\mathbb{C}^n \to \mathbb{C}^n/\mathcal{M}$, we see that Lemma 6.5.4 and equality (8.5.6) give, for every $\lambda_0 \in \Omega_b$:

$$ Q\mathcal{R}_{\lambda_0}(A + BF_0) = \mathcal{R}_{\lambda_0}(\overline{A + BF_0}) = \mathcal{R}_{\lambda_0}(\overline{A}) = Q\mathcal{R}_{\lambda_0}(A) \subseteq Q\mathcal{M} = \{0\} $$

Hence

$$ \mathcal{R}_{\lambda_0}(A + BF_0) \subseteq \mathcal{M} \tag{8.5.7} $$

Further, the inclusion $(A + BF_0)\mathcal{U} \subseteq \mathcal{U}$ implies that $\mathcal{U} = \mathcal{N}_{F_0}$. Indeed, we have seen the inclusion $\mathcal{N}_{F_0} \subseteq \mathcal{U}$ at the beginning of this proof. To prove the opposite inclusion, take $x \in \mathcal{U}$. Then $Dx = 0$. For $i = 2, 3, \dots, n - 1$ we have $(A + BF_0)^ix \in \mathcal{U}$; hence $D(A + BF_0)^ix = 0$, and the inclusion $\mathcal{U} \subseteq \mathcal{N}_{F_0}$ follows.

Let $P': \mathbb{C}^n \to \mathbb{C}^n/\mathcal{U}$ be the canonical transformation. Denoting by $X': \mathbb{C}^n/\mathcal{U} \to \mathbb{C}^n/\mathcal{U}$ the transformation induced by $X: \mathbb{C}^n \to \mathbb{C}^n$ (it is assumed that $\mathcal{U}$ is $X$ invariant), we have, in view of Lemma 6.5.4 and inclusion (8.5.7), for every $\lambda_0 \in \Omega_b$:

$$ \mathcal{R}_{\lambda_0}\big((A + BF_0)'\big) = P'\mathcal{R}_{\lambda_0}(A + BF_0) \subseteq P'\big(\langle A \mid \operatorname{Im}B\rangle + \mathcal{U}\big) = P'\big(\langle A \mid \operatorname{Im}B\rangle\big) = P'\big(\langle A + BF_0 \mid \operatorname{Im}B\rangle\big) = \big\langle (A + BF_0)' \mid \operatorname{Im}(P'B)\big\rangle $$

Now Theorem 6.5.3 implies the existence of a transformation $\widetilde{F}_1: \mathbb{C}^n/\mathcal{U} \to \mathbb{C}^m$ such that the spectrum of $(A + BF_0)' + P'B\widetilde{F}_1$ lies in the complement of $\Omega_b$. Let $F_1 = \widetilde{F}_1P': \mathbb{C}^n \to \mathbb{C}^m$. We have

$$ \big[(A + BF_0)' + P'B\widetilde{F}_1\big]P' = P'\big[A + B(F_0 + F_1)\big] $$

which means that

$$ (A + BF_0)' + P'B\widetilde{F}_1 = \big(A + B(F_0 + F_1)\big)' $$

By Lemma 6.5.4 again, for every $\lambda_0 \in \Omega_b$:

$$ P'\mathcal{R}_{\lambda_0}\big(A + B(F_0 + F_1)\big) = \mathcal{R}_{\lambda_0}\Big(\big(A + B(F_0 + F_1)\big)'\Big) = \{0\} $$

so

$$ \mathcal{R}_{\lambda_0}\big(A + B(F_0 + F_1)\big) \subseteq \mathcal{U} \subseteq \mathcal{V} $$

and $F = F_0 + F_1$ is the desired transformation. □
The proof of Theorem 8.5.3 shows that, assuming $\mathcal{R}_{\lambda_0}(A) \subseteq \langle A \mid \operatorname{Im}B\rangle + \mathcal{U}$ for every $\lambda_0 \in \Omega_b$, the transformation $F: \mathbb{C}^n \to \mathbb{C}^m$ such that $\mathcal{R}_{\lambda_0}(A + BF) \subseteq \mathcal{V}$ for every $\lambda_0 \in \Omega_b$ can be constructed as follows: $F = F_0 + \widetilde{F}_1P'$, where $F_0: \mathbb{C}^n \to \mathbb{C}^m$ is such that $(A + BF_0)\mathcal{U} \subseteq \mathcal{U}$; $P': \mathbb{C}^n \to \mathbb{C}^n/\mathcal{U}$ is the canonical transformation; and $\widetilde{F}_1: \mathbb{C}^n/\mathcal{U} \to \mathbb{C}^m$ has the property that the spectrum of the transformation on $\mathbb{C}^n/\mathcal{U}$ induced by the transformation $A + B(F_0 + \widetilde{F}_1P')$ on $\mathbb{C}^n$ lies outside $\Omega_b$.

Applying Theorem 8.5.3 to the output stabilization problem, we obtain the following result.

Theorem 8.5.4

Given the linear system

$$ \frac{dx(t)}{dt} = Ax(t) + Bu(t), \quad t \ge 0; \qquad z(t) = Dx(t) \tag{8.5.8} $$

with constant transformations $A: \mathbb{C}^n \to \mathbb{C}^n$, $B: \mathbb{C}^m \to \mathbb{C}^n$, and $D: \mathbb{C}^n \to \mathbb{C}^r$, there exists a transformation (state feedback) $F: \mathbb{C}^n \to \mathbb{C}^m$ such that, for every initial value $x(0)$, the solution of (8.5.8) with $u(t) = Fx(t)$ satisfies $\lim_{t\to\infty}z(t) = 0$ if and only if

$$ \mathcal{R}_{\lambda_0}(A) \subseteq \langle A \mid \operatorname{Im}B\rangle + \mathcal{U} $$

for every eigenvalue $\lambda_0$ of $A$ lying in the closed right half plane, where $\mathcal{U}$ is the maximal $[A\;B]$-invariant subspace in $\operatorname{Ker}D$.
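When the pair $(A, B)$ is controllable, $\langle A \mid \operatorname{Im}B\rangle = \mathbb{C}^n$ and the condition of Theorem 8.5.4 holds trivially; any $F$ that moves the whole spectrum of $A + BF$ into the open left half plane will do. A sketch using SciPy's pole-placement routine (the data are our choice; `place_poles` returns a gain $K$ with $\sigma(A - BK)$ prescribed, so we take $F = -K$):

```python
import numpy as np
from scipy.signal import place_poles

A = np.array([[0., 1, 0], [0, 0, 1], [1, -2, 3]])   # a controllable companion-type pair
B = np.array([[0.], [0], [1]])
D = np.array([[1., 0, 0]])

F = -place_poles(A, B, [-1.0, -2.0, -3.0]).gain_matrix
print(np.all(np.linalg.eigvals(A + B @ F).real < 0))  # True: D e^{t(A+BF)} x0 -> 0 for every x0
```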
We conclude this section with an example illustrating Theorem 8.5.4.

EXAMPLE 8.5.1. Let

$$ A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \lambda_0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \lambda_0 \in \mathbb{C}; \qquad \operatorname{Ker}D = \operatorname{Span}\{(a_1, a_2, a_3)\} $$

where $a_1$, $a_2$, $a_3$ are complex numbers, not all zero. Here $\langle A \mid \operatorname{Im}B\rangle = \operatorname{Span}\{e_1, e_2\}$. If $\Re\lambda_0 < 0$, then there is always an $F = [\,f_1\;f_2\;f_3\,]$ with the properties required in Theorem 8.5.4 (one can take $f_3 = 0$ and choose $f_1$ and $f_2$ so that the equation $\lambda^2 - f_2\lambda - f_1 = 0$ has its zeros in the open left half plane). So assume $\Re\lambda_0 \ge 0$. Then there exists an $F$ as in Theorem 8.5.4 if and only if

$$ \operatorname{Span}\{e_1, e_2\} + \mathcal{U} = \mathbb{C}^3 \tag{8.5.9} $$

If $a_3 = 0$, then (8.5.9) is always false. If $a_3 \ne 0$, then (8.5.9) holds if and only if the subspace $\operatorname{Ker}D$ is $[A\;B]$ invariant or, equivalently, if $\operatorname{Ker}D$ is $(A + BG)$ invariant for some $G = [\,g_1\;g_2\;g_3\,]$. An easy verification shows that this is the case if and only if $a_2 = \lambda_0a_1$. So there exists an $F = [\,f_1\;f_2\;f_3\,]$ as in Theorem 8.5.4 if and only if $a_3 \ne 0$ and $a_2 = \lambda_0a_1$. In this case

$$ f_3 = \big(\lambda_0^2a_1 - f_1a_1 - \lambda_0f_2a_1\big)/a_3 $$

and any $f_1$ and $f_2$ for which the zeros of $\lambda^2 - f_2\lambda - f_1 = 0$ are in the open left half plane will do. □

8.6 EXERCISES

8.1 For every input $u(t)$ find the output $y(t)$ for the following linear systems:
(a) [the coefficient data of this system are illegible in the present copy]
(b)
$$ \frac{dx(t)}{dt} = J_n(0)x(t) + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}u(t), \quad x(0) = 0; \qquad y(t) = [\,1\;\cdots\;1\,]x(t) + u(t) $$

8.2 For every input $u(t)$ find the output $y(t)$ for the following linear systems:
(a)
$$ \frac{dx(t)}{dt} = \begin{bmatrix} J_{k_1}(\lambda_0) & 0 \\ 0 & J_{k_2}(\lambda_0) \end{bmatrix}x(t) + Bu(t); \quad x(0) = x_0; \qquad y(t) = Cx(t) $$
where $B$ is the $(k_1 + k_2) \times 2$ matrix whose first column is $e_1$ and second column is $e_{k_1+k_2}$, and $C$ is the $2 \times (k_1 + k_2)$ matrix whose first row is $e_1^T$ and second row is $e_{k_1+1}^T$.
(b)
$$ \frac{dx(t)}{dt} = Ax(t) + C^Tu(t); \quad x(0) = x_0; \qquad y(t) = Cx(t) $$
where $A$ is an $n \times n$ lower triangular matrix and $C = [\,0\;\cdots\;0\;1\;0\;\cdots\;0\,]$ with 1 in the $k$th place.
8.3 Consider the linear system
$$ \frac{dx(t)}{dt} = A_0x(t) + b_0u(t); \qquad y(t) = [\,c_1\;\;c_2\,]x(t) + u(t) $$
[the $2 \times 2$ matrix $A_0$ and the vector $b_0$ are illegible in the present copy]. When is this system controllable? observable? minimal?

8.4 Find transfer functions for the linear systems given in Exercises 8.1 and 8.2.

8.5 Build minimal linear systems with the following transfer functions:
(a)
$$ \begin{bmatrix} \dfrac{1}{\lambda(\lambda-1)} & 0 \\[2mm] \dfrac{1}{\lambda+1} & \dfrac{1}{\lambda(\lambda+1)} \end{bmatrix} $$
(b) $\dfrac{1}{p(\lambda)}$, where $p(\lambda) = \sum_{j=0}^{k}a_j\lambda^j$ is a scalar polynomial.
(c) $(L(\lambda))^{-1}$, where $L(\lambda)$ is a monic $n \times n$ matrix polynomial of degree $l$.

8.6 Show that the system
$$ \frac{dx(t)}{dt} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ a_0 & a_1 & a_2 & \cdots & a_{n-1} \end{bmatrix}x(t) + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}u(t); \qquad y(t) = [\,1\;0\;\cdots\;0\,]x(t) $$
is controllable and observable.

8.7 For the system in Exercise 8.6, given the $n$-tuple of complex numbers $\lambda_1, \dots, \lambda_n$, find a state feedback $F$ such that $A + BF$ has eigenvalues $\lambda_1, \dots, \lambda_n$. Also, find $G$ such that $A + GC$ has eigenvalues $\lambda_1, \dots, \lambda_n$.

8.8 Let
$$ \frac{dx}{dt} = Ax + Bu; \qquad y = Cx $$
be a linear system with an $n \times n$ circulant matrix $A$ and $n \times 1$ and $1 \times n$ matrices $B$ and $C$, respectively. When is the system controllable? Observable? Minimal?
8.9 Consider the linear system
$$ \frac{dx(t)}{dt} = Jx(t) + Bu(t); \qquad y(t) = Cx(t) + Du(t) $$
where $J$ is a nilpotent $n \times n$ Jordan matrix (i.e., with $J^n = 0$) and $B$ and $C$ are $n \times 1$ and $1 \times n$ matrices, respectively. When is this system controllable? Observable? Minimal?

8.10 Prove or disprove: if the system
$$ \frac{dx(t)}{dt} = Ax + Bu; \qquad y(t) = Cx + Du $$
is minimal, then the system
$$ \frac{dx(t)}{dt} = A^2x + Bu; \qquad y(t) = Cx + Du $$
is minimal as well.

8.11 Let $p(A)$ be a polynomial in the transformation $A: \mathbb{C}^n \to \mathbb{C}^n$. Prove that the minimality of the system
$$ \frac{dx(t)}{dt} = p(A)x + Bu; \qquad y(t) = Cx + Du $$
implies the minimality of
$$ \frac{dx(t)}{dt} = Ax + Bu; \qquad y(t) = Cx + Du $$
Is the converse true?

8.12 Let
$$ \frac{dx(t)}{dt} = A_1x + Bu, \quad y(t) = Cx + Du \qquad \text{and} \qquad \frac{dx(t)}{dt} = A_2x + Bu, \quad y(t) = Cx + Du $$
be two systems, and assume that $A_2 = p(A_1)$, where $p(\lambda)$ is a polynomial such that $p(\lambda_1) \ne p(\lambda_2)$ for any pair of different eigenvalues $\lambda_1$ and $\lambda_2$ of $A_1$, and $p'(\lambda)|_{\lambda=\lambda_0} \ne 0$ for every eigenvalue $\lambda_0$ of $A_1$ such that $A_1|_{\mathcal{R}_{\lambda_0}(A_1)}$ is not diagonable. Prove that the systems are simultaneously minimal or nonminimal.
8.13 Show that if the system
$$ \frac{dx(t)}{dt} = Ax + Bu; \qquad y(t) = Cx + Du $$
is controllable, then for every $\lambda_0 \in \mathbb{C}$ the system
$$ \frac{dx(t)}{dt} = (\lambda_0I + A)x + Bu; \qquad y(t) = Cx + Du $$
is controllable as well. Is this property true for the observability of systems?

8.14 For a controllable system
$$ \frac{dx(t)}{dt} = J_n(0)x(t) + \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}u(t) \tag{1} $$
where $b_n \ne 0$, find a state feedback $F$ such that the system with feedback
$$ \frac{dx(t)}{dt} = \Big(J_n(0) + \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}F\Big)x(t) $$
is stable, that is, all its solutions $x(t)$ tend to zero as $t \to \infty$.

8.15 For system (1) in Exercise 8.14 and any $k > 0$, find a state feedback $F$ such that all solutions $x(t)$ of the system with feedback satisfy $\|x(t)\| \le Ke^{-kt}$, where $K > 0$ is a constant independent of $t$.

8.16 Prove that any minimal linear system with $n$-dimensional state space has a state feedback for which the system with feedback can be represented as a simple cascade of $n$ linear systems with state spaces of dimension 1.

8.17 Prove that controllability is a stable property in the following sense: for every controllable system
$$ \frac{dx}{dt} = Ax + Bu $$
there exists an $\epsilon > 0$ such that any linear system
$$ \frac{dx}{dt} = A'x + B'u $$
with $\|A' - A\| < \epsilon$, $\|B' - B\| < \epsilon$ is controllable as well.
8.18 Prove that observability and minimality of linear systems are also stable properties. The definition of stability is, in each case, to be similar to that of Exercise 8.17.

8.19 Show that for any system
$$ \frac{dx}{dt} = Ax + Bu, \qquad y = Cx + Du $$
there exists a sequence of minimal systems
$$ \frac{dx}{dt} = A_px + B_pu, \qquad y = C_px + Du, \qquad p = 1, 2, \dots $$
such that $\lim_{p\to\infty}\|A_p - A\| = 0$, $\lim_{p\to\infty}\|B_p - B\| = 0$, and $\lim_{p\to\infty}\|C_p - C\| = 0$.
Notes to Part 1

Chapter 1. The material here is quite elementary and well known, although not everything is readily available in the literature. Part of Section 1.5 is based on the exposition in Chapter S4 of the authors' book (1982). More about angular transformations and matrix quadratic equations can be found in Bart, Gohberg, and Kaashoek (1979). Angular subspaces and operators for the infinite dimensional case were introduced and studied in Krein (1970).

Chapter 2. The proof of the Jordan form presented here is standard and can be found in many books on linear algebra; for example, see Gantmacher (1959) or Lancaster and Tismenetsky (1985). A proof of the Jordan form can also be obtained by analyzing the properties of the set of all invariant subspaces as a lattice. This was done in Soltan (1973a). In this approach, the invariance of the Jordan form follows from the well-known Schmidt-Ore theorem in lattice theory [see, e.g., Kurosh (1965)]. "The $A$-invariant subspace maximal in $\mathcal{N}$" and "the $A$-invariant subspace minimal over $\mathcal{N}$" are phrases that are introduced here probably for the first time, although the notions themselves had been developed and are now well known in the context of linear systems theory. In general, the whole material of Sections 2.7 and 2.8 is influenced by linear system theory. However, our presentation here is independent of that theory and leads us to abandon its well-established terminology. In particular, in linear systems theory, "full-range" and "null kernel" pairs are known as "controllable" and "observable" pairs, respectively. Marked invariant subspaces are probably introduced here for the first time. The existence of nonmarked invariant subspaces is often overlooked. The description of partial multiplicities and invariant subspaces of functions will hold no surprises for the specialist but, again, these are results that are not easily found in the standard literature on linear algebra.
Chapter 3. The material of this chapter (except for Theorem 3.3.1) is well known. Theorem 3.3.1 in the infinite dimensional case was proved by Sarason (1965). Here we follow his proof.

Chapter 4. The problem of analysis of partial multiplicities of extensions from an invariant and a coinvariant subspace was stated in Gohberg and Kaashoek (1979). This problem was connected there with the description of partial multiplicities of products of matrix polynomials in terms of the partial multiplicities of each factor, and it reappears in this context in Section 5.2. The first results concerning this description were proved in Sigal (1973). In particular, Theorem 3.3.1 was proved in that paper. Example 4.3.1 and the material in Section 4.4 (except for Proposition 4.4.1) are taken from Rodman and Schaps (1979). For further information and more inequalities concerning the partial multiplicities, see Thijsse (1980, 1984) and Rodman and Schaps (1979). When this book was finalized, the authors learned about another important line of development concerning the problem of partial multiplicities of products of matrix polynomials. This has been intensively studied (even in a more general setting) by several authors. The reader is referred to the recent work of Thompson (1983 and 1985) for details and further references.

Chapter 5. The theory presented in this chapter can be viewed as a generalization of the familiar spectral theory of a matrix $A$ which, in this context, is identified with the linear matrix polynomial $\lambda I - A$. This theory of matrix polynomials was developed by the authors and summarized in the book by Gohberg, Lancaster, and Rodman (1982). The material and presentation in this chapter are based on the first four chapters of that book. That book also contains further results on matrix polynomials, including least common multiples, greatest common divisors, matrix polynomials with hermitian coefficients, nonmonic matrix polynomials, and connections with differential and difference equations. Lists of relevant references and historical comments on this subject are found in the above-mentioned monograph by the authors (1982). In this presentation we focus more closely on decompositions into three or more factors. Theorem 5.2.3 is close to the original theorem of Sigal (1973) concerning matrix-valued functions. See also Thompson (1983 and 1985).

Chapter 6. The main results of this chapter were first obtained in a different form in the theory of linear systems [see, e.g., the monographs by Wonham (1974) and Kailath (1980)]. In this chapter the presentation is independent of linear systems theory and is given in a purely linear algebraic form. This approach led us to change the terminology, which is well established in the theory of linear systems, and to make it more suitable for linear algebra. The ideas of block similarity in Sections 6.2 and 6.6, as well as of $[A\;B]$-invariant and $\left[\begin{smallmatrix} A \\ C \end{smallmatrix}\right]$-invariant subspaces, are taken from Gohberg, Kaashoek, and van Schagen (1980).
That paper contains a more general theory of invariant subspaces, similarity, canonical forms, and invariants of blocks of matrices in terms of these blocks only. Some applications of these results may be found in Gohberg, Kaashoek, and van Schagen (1981, 1982). Theorem 6.2.5 was proved (by a direct approach, without using the Kronecker canonical form) in Brunovsky (1970). The connection between the Kronecker form for linear polynomials and the state feedback problems is given in Kalman (1971) and Rosenbrock (1970). In Theorem 6.3.2 the equivalence of (a) and (d) is due to Hautus (1969). The spectral assignment problem is classical by now and can be found in many books [see, e.g., Kailath (1980) and Wonham (1974)]. There is a more difficult version of this problem in which the eigenvalues and their partial multiplicities are preassigned. This problem is not generally solvable. For further analysis, see Rosenbrock and Hayton (1978) and Djaferis and Mitter (1983).

Chapter 7. The concept of minimal realization is a well-known and important tool in linear system theory [see, e.g., Wonham (1979) and Kalman (1963)]. See also Bart, Gohberg, and Kaashoek (1979), where the exposition matches the purposes of this chapter. Section 7.1 contains the standard material on realization theory, and Lemma 7.1.1 is a particular case of Theorem 2.2 in Bart, Gohberg, and Kaashoek (1979). Section 7.2 follows the authors' paper (1983a). Sections 7.3-7.5 are based on Chapters 1 and 4 in Bart, Gohberg, and Kaashoek (1979). Here we concentrate more on decompositions into three or more factors. Linear fractional decompositions of rational matrix functions play an important role in network theory; see Helton and Ball (1982). Theorem 7.7.1 is proved in that paper. The exposition in Sections 7.6-7.8 follows that given in Gohberg and Rubinstein (1985).

Chapter 8. In the last 20 years linear system theory has developed into a major field of research with very important applications. The literature in this field is rich and includes monographs, textbooks, and specialized journals. We mention only the following books, where the reader can find further references and historical remarks: Kalman, Falb, and Arbib (1969), Wonham (1974), Kailath (1980), Rosenbrock (1970), and Brockett (1970). This chapter can be viewed as an introduction to some basic concepts of linear systems theory. The first three sections contain standard material (except for Theorem 8.3.2). In the last two sections we follow the exposition of Wonham (1979).
Part Two

Algebraic Properties of Invariant Subspaces

In Chapters 9-12 we develop material that supplements the theory of Part 1. In particular, we go more deeply into the algebraic structure of invariant subspaces. We include a description of the set of all invariant subspaces for a given transformation and examine to what extent a transformation is defined by its lattice of invariant subspaces. Special attention is paid to invariant subspaces of commuting transformations and of algebras of transformations. In the final chapter the theory of the first two parts (developed for complex linear transformations) is reviewed in the context of real linear transformations.
Chapter Nine

Commuting Matrices and Hyperinvariant Subspaces

In this chapter we study lattices of invariant subspaces that are common to different commuting transformations. The description of all transformations that commute with a given transformation is a necessary part of the investigation of this problem. This description is used later in the chapter to study the hyperinvariant subspaces of a transformation $A$, that is, those subspaces that are invariant for every transformation commuting with $A$.

9.1 COMMUTING MATRICES

Matrices $A$ and $B$ (both of the same size $n \times n$) are said to commute if $AB = BA$. In this section we describe the set of all matrices that commute with a given matrix $A$. In other words, we wish to find all the solutions of the equation

$$ AX = XA \tag{9.1.1} $$

where $X$ is an $n \times n$ matrix to be found. We can restrict ourselves to the case that $A$ is in Jordan form. Indeed, let $J = S^{-1}AS$ be a Jordan matrix for some nonsingular matrix $S$. Then $X$ is a solution of equation (9.1.1) if and only if $Z = S^{-1}XS$ is a solution of

$$ JZ = ZJ \tag{9.1.2} $$

So we shall assume that $A = J$ is in Jordan form.
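Equation (9.1.1) is linear in the entries of $X$, so the set of solutions can also be computed directly: with the column-stacking vec operation and the identity $\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X)$, the commutant is the null space of $I \otimes A - A^T \otimes I$. A numerical sketch (Python/NumPy; the function name is ours):

```python
import numpy as np

def commutant_basis(A, tol=1e-10):
    """Basis (as matrices) of {X : AX = XA}, via vec(AX - XA) = (I (x) A - A^T (x) I) vec(X)."""
    n = A.shape[0]
    L = np.kron(np.eye(n), A) - np.kron(A.T, np.eye(n))
    _, s, Vt = np.linalg.svd(L)
    null = Vt.conj().T[:, int(np.sum(s > tol)):]
    # each null-space column, reshaped column-wise, is a matrix commuting with A
    return [null[:, k].reshape(n, n, order='F') for k in range(null.shape[1])]

# For J = J_2(0) ⊕ J_1(0) the commutant has dimension 2 + 1 + 1 + 1 = 5,
# as the description obtained below predicts.
J = np.array([[0., 1, 0], [0, 0, 0], [0, 0, 0]])
print(len(commutant_basis(J)))   # 5
```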
Write

$$ J = \operatorname{diag}[J_1, \dots, J_u] $$

where $J_a$ $(a = 1, \dots, u)$ is a Jordan block of size $m_a \times m_a$: $J_a = \lambda_aI_a + H_a$, where $I_a$ is the unit matrix of size $m_a \times m_a$ and $H_a$ is the $m_a \times m_a$ nilpotent Jordan block:

$$ H_a = \begin{bmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{bmatrix} $$

Let $Z$ be a matrix that satisfies (9.1.2), and write $Z = [Z_{a\beta}]_{a,\beta=1}^{u}$, where $Z_{a\beta}$ is an $m_a \times m_\beta$ matrix. Rewrite equality (9.1.2) in the form

$$ (\lambda_a - \lambda_\beta)Z_{a\beta} = Z_{a\beta}H_\beta - H_aZ_{a\beta}, \qquad 1 \le a, \beta \le u \tag{9.1.3} $$

Two cases can occur:

(a) $\lambda_a \ne \lambda_\beta$. We show that in this case $Z_{a\beta} = 0$. Indeed, multiply the left-hand side of equality (9.1.3) by $\lambda_a - \lambda_\beta$ and in each term on the right-hand side replace $(\lambda_a - \lambda_\beta)Z_{a\beta}$ by $Z_{a\beta}H_\beta - H_aZ_{a\beta}$. We obtain

$$ (\lambda_a - \lambda_\beta)^2Z_{a\beta} = Z_{a\beta}H_\beta^2 - 2H_aZ_{a\beta}H_\beta + H_a^2Z_{a\beta} $$

Repeating this process, we obtain for every $p = 1, 2, \dots$:

$$ (\lambda_a - \lambda_\beta)^pZ_{a\beta} = \sum_{q=0}^{p}(-1)^q\binom{p}{q}H_a^qZ_{a\beta}H_\beta^{p-q} \tag{9.1.4} $$

Choose $p$ large enough that either $H_a^q = 0$ or $H_\beta^{p-q} = 0$ for every $q = 0, \dots, p$. Then the right-hand side of equation (9.1.4) is zero, and since $\lambda_a \ne \lambda_\beta$, we find that $Z_{a\beta} = 0$.

(b) $\lambda_a = \lambda_\beta$. Then

$$ Z_{a\beta}H_\beta = H_aZ_{a\beta} \tag{9.1.5} $$

From the structure of $H_a$ and $H_\beta$ it follows that the product $H_aZ_{a\beta}$ is obtained from $Z_{a\beta}$ by shifting all the rows one place upward and filling the last row with zeros; similarly, $Z_{a\beta}H_\beta$ is obtained from $Z_{a\beta}$ by shifting all the columns one place to the right and filling the first column with zeros. So equation (9.1.5) gives (where $\zeta_{ik}$ is the $(i,k)$th entry of $Z_{a\beta}$, which depends, of course, on $a$ and $\beta$):
$$ \zeta_{i+1,k} = \zeta_{i,k-1}, \qquad i = 1, \dots, m_a; \quad k = 1, \dots, m_\beta $$

where by definition $\zeta_{i0} = \zeta_{m_a+1,k} = 0$. These equalities mean that the matrix $Z_{a\beta}$ has one of the following structures.

For $m_a = m_\beta$:

$$ Z_{a\beta} = \begin{bmatrix} c^{(1)} & c^{(2)} & \cdots & c^{(m_a)} \\ 0 & c^{(1)} & \ddots & \vdots \\ \vdots & & \ddots & c^{(2)} \\ 0 & \cdots & 0 & c^{(1)} \end{bmatrix}, \qquad c^{(j)} \in \mathbb{C} \tag{9.1.6} $$

For $m_a < m_\beta$:

$$ Z_{a\beta} = [\,0 \;\; T\,] \tag{9.1.7} $$

For $m_a > m_\beta$:

$$ Z_{a\beta} = \begin{bmatrix} T \\ 0_{m_a - m_\beta,\,m_\beta} \end{bmatrix} \tag{9.1.8} $$

where $T$ is a square matrix of the form (9.1.6) (of size $m_a \times m_a$ in (9.1.7) and $m_\beta \times m_\beta$ in (9.1.8)) and $0_{p,q}$ stands for the zero $p \times q$ matrix. Matrices of types (9.1.6)-(9.1.8) are referred to as upper triangular Toeplitz matrices. So we have proved the following result.

Theorem 9.1.1

Let $J = \operatorname{diag}[J_1, \dots, J_u]$ be an $n \times n$ Jordan matrix with Jordan blocks $J_1, \dots, J_u$ and eigenvalues $\lambda_1, \dots, \lambda_u$, respectively. Then an $n \times n$ matrix $Z$ commutes with $J$ if and only if $Z_{a\beta} = 0$ for $\lambda_a \ne \lambda_\beta$ and $Z_{a\beta}$ is an upper triangular Toeplitz matrix for $\lambda_a = \lambda_\beta$, where $Z = [Z_{a\beta}]_{a,\beta=1}^{u}$ is the partition of $Z$ consistent with the partition of $J$ into Jordan blocks.

We repeat that Theorem 9.1.1 gives, after applying a suitable similarity transformation, a description of all matrices commuting with a fixed matrix $A$. This theorem has a number of important corollaries.
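The block structure of Theorem 9.1.1 is easy to verify numerically. In the sketch below (the data are our choice) we assemble a matrix $Z$ with random blocks of types (9.1.6)-(9.1.8) for $J = J_3(2) \oplus J_2(2)$ and confirm that it commutes with $J$:

```python
import numpy as np

# J = J_3(2) ⊕ J_2(2): two Jordan blocks with the same eigenvalue, m_1 = 3, m_2 = 2.
J = np.diag([2., 2, 2, 2, 2]) + np.diag([1., 1, 0, 1], 1)
c = np.random.randn(9)
Z11 = np.array([[c[0], c[1], c[2]],
                [0,    c[0], c[1]],
                [0,    0,    c[0]]])          # type (9.1.6)
Z12 = np.array([[c[3], c[4]],
                [0,    c[3]],
                [0,    0   ]])                # type (9.1.8): Toeplitz triangle on top, zeros below
Z21 = np.array([[0, c[5], c[6]],
                [0, 0,    c[5]]])             # type (9.1.7): zero column first
Z22 = np.array([[c[7], c[8]],
                [0,    c[7]]])                # type (9.1.6)
Z = np.block([[Z11, Z12], [Z21, Z22]])
print(np.allclose(J @ Z, Z @ J))              # True for every choice of c
```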
Corollary 9.1.2

Let $A$ be an $n \times n$ matrix partitioned as follows:

$$ A = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix} \tag{9.1.9} $$

where the spectra of the matrices $A_1$ and $A_2$ do not intersect. Then any $n \times n$ matrix $X$ that commutes with $A$ has the form

$$ X = \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix} $$

with the same partition as in equality (9.1.9).

Proof. Let $J_1$ (resp. $J_2$) be the Jordan form of $A_1$ (resp. $A_2$), so $J_i = S_i^{-1}A_iS_i$ for some nonsingular matrices $S_1$ and $S_2$. Then

$$ J = \begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix} $$

is the Jordan form of $A$. By Theorem 9.1.1, and since $\sigma(J_1) \cap \sigma(J_2) = \emptyset$, any matrix $Y$ that commutes with $J$ has the form $Y = Y_1 \oplus Y_2$ with the same partition as in (9.1.9). Now $Y$ commutes with $J$ if and only if $X = SYS^{-1}$ commutes with $A$, where $S = S_1 \oplus S_2$. So

$$ X = SYS^{-1} = \begin{bmatrix} S_1Y_1S_1^{-1} & 0 \\ 0 & S_2Y_2S_2^{-1} \end{bmatrix} $$

has the desired structure. □

This corollary, reformulated in terms of transformations, runs as follows: let $A: \mathbb{C}^n \to \mathbb{C}^n$ be a transformation, and let $\mathcal{M}_1$ and $\mathcal{M}_2$ be $A$-invariant subspaces that are complementary to each other and for which the restrictions $A|_{\mathcal{M}_1}$ and $A|_{\mathcal{M}_2}$ have no common eigenvalues. Then $\mathcal{M}_1$ and $\mathcal{M}_2$ are invariant subspaces for every transformation that commutes with $A$. To prove this, write $A$ in $2 \times 2$ block matrix form with respect to the direct sum decomposition $\mathbb{C}^n = \mathcal{M}_1 \dotplus \mathcal{M}_2$ and use Corollary 9.1.2. The next result is a special case.

Corollary 9.1.3

Every root subspace of a transformation $A: \mathbb{C}^n \to \mathbb{C}^n$ is a reducing invariant subspace for any transformation that commutes with $A$.

The proof of Theorem 9.1.1 allows us to study the set $\mathcal{C}(A)$ of all matrices (or transformations) that commute with the matrix (or linear transformation) $A$. First, observe that $\mathcal{C}(A)$ is a linear vector space. Indeed, if $AX_i = X_iA$ for $i = 1$ and 2, then also $A(\alpha X_1 + \beta X_2) = (\alpha X_1 + \beta X_2)A$ for any complex numbers $\alpha$ and $\beta$. To compute the dimension of $\mathcal{C}(A)$, consider the elementary divisors of $A$. Thus, for every Jordan block of size $k \times k$ and eigenvalue $\lambda_0$ in the Jordan normal form of $A$ we have an elementary divisor $(\lambda - \lambda_0)^k$ of $A$
(which is a polynomial in $\lambda$). The greatest common divisor of two elementary divisors $(\lambda - \lambda_1)^{k_1}$ and $(\lambda - \lambda_2)^{k_2}$ of $A$ is $(\lambda - \lambda_1)^{\min(k_1,k_2)}$ if $\lambda_1 = \lambda_2$ and is 1 if $\lambda_1 \ne \lambda_2$. Taking this observation into account, Theorem 9.1.1 shows that the dimension of $\mathcal{C}(A)$ is $\sum_{s,t=1}^{p}a_{st}$, where $a_{st}$ is the degree of the greatest common divisor of $(\lambda - \lambda_s)^{k_s}$ and $(\lambda - \lambda_t)^{k_t}$, and $(\lambda - \lambda_1)^{k_1}, \dots, (\lambda - \lambda_p)^{k_p}$ are all the elementary divisors of $A$. In particular

$$ \dim\mathcal{C}(A) \ge \sum_{s=1}^{p}a_{ss} = \sum_{s=1}^{p}k_s = n \tag{9.1.10} $$

where $n$ is the size of $A$.

We have seen that, quite obviously, any polynomial in $A$ commutes with $A$, and we now ask about conditions on $A$ such that, conversely, each matrix commuting with $A$ is a polynomial in $A$. To this end we need the following notion. An $n \times n$ matrix (or transformation $A: \mathbb{C}^n \to \mathbb{C}^n$) is called nonderogatory if there is only one Jordan block in the Jordan form of $A$ associated with each eigenvalue. It turns out that $A$ is nonderogatory if and only if any one of the following four equivalent statements holds: (a) $\dim\operatorname{Ker}(\lambda I - A) \le 1$ for every $\lambda \in \mathbb{C}$; (b) $A$ is similar to a matrix

$$ \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_0 & a_1 & a_2 & \cdots & a_{n-1} \end{bmatrix} \tag{9.1.11} $$

for some complex numbers $a_0, \dots, a_{n-1}$; (c) the minimal polynomial of $A$ coincides with the characteristic polynomial of $A$; and (d) $A$ is cyclic, that is, there exists an $x \in \mathbb{C}^n$ such that

$$ \mathbb{C}^n = \operatorname{Span}\{x, Ax, A^2x, \dots\} \tag{9.1.12} $$

Indeed, by assuming that $A$ is in Jordan form, condition (a) is clearly equivalent to $A$ having only one Jordan block for each eigenvalue. By Theorem 2.6.1, (d) is equivalent to $A$ being nonderogatory. Further, the minimal polynomial of $A$ is easily seen to be $(\lambda - \lambda_1)^{n_1}\cdots(\lambda - \lambda_p)^{n_p}$, where $\lambda_1, \dots, \lambda_p$ are all the distinct eigenvalues of $A$ and $n_j$ is the maximal size of the Jordan blocks of $A$ corresponding to $\lambda_j$. From this description it is clear that (c) is equivalent to (a). We have proved, therefore, that (a), (c), and (d) are equivalent to each other and to the condition that $A$ is nonderogatory. Now let $A$ be the matrix (9.1.11); we want to prove that (a) holds.
Let $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ be eigenvectors of $A$ corresponding to the eigenvalue $\lambda_0$. Thus $Ax = \lambda_0x$, $Ay = \lambda_0y$, and $x \ne 0$, $y \ne 0$. The structure of $A$ implies that $x_i = \lambda_0^{i-1}x_1$ and $y_i = \lambda_0^{i-1}y_1$ for $i = 1, \dots, n$. But then necessarily $x_1 \ne 0$, $y_1 \ne 0$, and $x = (x_1/y_1)y$; that is, $x$ and $y$ are linearly dependent. Hence (a) holds. Finally, we show that (d) implies (b). First observe that if (9.1.12) holds, then the vectors $x, Ax, \dots, A^{n-1}x$ are linearly independent (otherwise $\mathbb{C}^n$ would be spanned by fewer than $n$ vectors, which is impossible). In the basis $x, Ax, \dots, A^{n-1}x$ the matrix $A$ has the form (9.1.11).

Theorem 9.1.4

Every matrix commuting with $A$ is a polynomial in $A$ if and only if $A$ is nonderogatory.

Proof. First recall that, in view of the Cayley-Hamilton theorem, the number of linearly independent powers of $A$ does not exceed $n$. Thus, if $AX = XA$ implies that $X$ is a polynomial in $A$, then every such $X$ can be written as $X = p(A)$ with $\deg p(\lambda) \le l - 1$, where $l \le n$ is the number of linearly independent powers $I, A, A^2, \dots$. So in this case $\dim\mathcal{C}(A) = l \le n$. Inequality (9.1.10) then implies that $\dim\mathcal{C}(A) = n$. This means [again in view of (9.1.10)] that $a_{st} = 0$ for $s \ne t$. So in the Jordan form of $A$ there is only one Jordan block associated with each eigenvalue of $A$. Conversely, assume that the Jordan form of $A$ is

$$ J = J_{m_1}(\lambda_1) \oplus \cdots \oplus J_{m_s}(\lambda_s) $$

where $\lambda_1, \dots, \lambda_s$ are distinct complex numbers. As we have seen, a solution $X$ of $AX = XA$ is then similar to a direct sum of upper triangular Toeplitz matrices

$$ Y_i = \begin{bmatrix} c_i^{(1)} & c_i^{(2)} & \cdots & c_i^{(m_i)} \\ 0 & c_i^{(1)} & \ddots & \vdots \\ \vdots & & \ddots & c_i^{(2)} \\ 0 & \cdots & 0 & c_i^{(1)} \end{bmatrix}, \qquad i = 1, 2, \dots, s $$

More exactly, $Y_1 \oplus \cdots \oplus Y_s = S^{-1}XS$, where $S$ is a nonsingular matrix such that $J = S^{-1}AS$. Now a polynomial $p(\lambda)$ satisfying the conditions

$$ p(\lambda_i) = c_i^{(1)}, \quad p'(\lambda_i) = c_i^{(2)}, \quad \dots, \quad \frac{1}{(m_i - 1)!}p^{(m_i-1)}(\lambda_i) = c_i^{(m_i)}, \qquad i = 1, \dots, s $$

gives the desired result:

$$ X = S(Y_1 \oplus \cdots \oplus Y_s)S^{-1} = S\operatorname{diag}\big[p(J_{m_1}(\lambda_1)), \dots, p(J_{m_s}(\lambda_s))\big]S^{-1} = Sp(J)S^{-1} = p(SJS^{-1}) = p(A) $$

Note that $p(\lambda)$ can be chosen with degree not exceeding $n - 1$. □
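For a nonderogatory $A$, the coefficients of a polynomial $p$ with $X = p(A)$ can be found by solving a linear system in the powers $I, A, \dots, A^{n-1}$. A minimal sketch (our names and sample data, with a companion-type, hence nonderogatory, matrix):

```python
import numpy as np

def express_as_polynomial(A, X):
    """For nonderogatory A and X with AX = XA, find p of degree <= n-1 with X = p(A),
    by solving the linear system sum_k c_k vec(A^k) = vec(X)."""
    n = A.shape[0]
    powers = [np.linalg.matrix_power(A, k) for k in range(n)]
    M = np.column_stack([P.reshape(-1) for P in powers])
    c, *_ = np.linalg.lstsq(M, X.reshape(-1), rcond=None)
    return c                                   # coefficients c_0, ..., c_{n-1} of p

A = np.array([[0., 1, 0], [0, 0, 1], [2, 1, -1]])
X = 3 * np.linalg.matrix_power(A, 2) - A + 4 * np.eye(3)   # certainly commutes with A
print(np.round(express_as_polynomial(A, X), 6))            # [ 4. -1.  3.]
```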
We now confine our attention to matrices commuting with a diagonable matrix. Recall that an $n \times n$ matrix $A$ is diagonable if and only if there is a basis in $\mathbb{C}^n$ of eigenvectors of $A$. The following corollary is obtained from Theorem 9.1.1.

Corollary 9.1.5

If $\lambda_1, \dots, \lambda_s$ are the distinct eigenvalues of a diagonable matrix $A$, then

$$ \dim\mathcal{C}(A) = \sum_{i=1}^{s}\beta_i^2 $$

where $\beta_i = \dim\operatorname{Ker}(A - \lambda_iI)$, $i = 1, \dots, s$.

For future reference let us also record the following fact.

Proposition 9.1.6

An $n \times n$ matrix $B$ commutes with every $n \times n$ matrix $A$ if and only if $B$ is a scalar multiple of $I$: $B = \lambda I$ for some $\lambda \in \mathbb{C}$.

Proof. The "if" part is obvious. So assume that $B$ commutes with every $n \times n$ matrix $A$. In particular, taking $A$ to be diagonal with $n$ different eigenvalues with respect to a basis $x_1, \dots, x_n$ in $\mathbb{C}^n$, Corollary 9.1.2 implies that $B$ is also diagonal in this basis. Therefore, $Bx_1 \in \operatorname{Span}\{x_1\}$. As any nonzero vector $x$ appears in some basis in $\mathbb{C}^n$, we find that $Bx = \lambda x$ for every $x \in \mathbb{C}^n \smallsetminus \{0\}$, where the number $\lambda$ may depend on $x$: $\lambda = \lambda(x)$. However, if $Bx = \lambda(x)x$ and $By = \lambda(y)y$ with $\lambda(x) \ne \lambda(y)$, then $x$ and $y$ are linearly independent and $B(x + y) \notin \operatorname{Span}\{x + y\}$, a contradiction. Hence $\lambda$ is independent of $x$, and the proposition is proved. □

9.2 COMMON INVARIANT SUBSPACES FOR COMMUTING MATRICES

In this section we establish a fundamental property of a set of commuting transformations, namely, that there is always a complete chain of subspaces that are invariant for every transformation of the set.
Theorem 9.2.1

Let $\Omega$ be a set of commuting transformations from $\mathbb{C}^n$ into $\mathbb{C}^n$ (so $AB = BA$ for any $A, B \in \Omega$). Then there exists a complete chain of subspaces

$$ \{0\} = \mathcal{M}_0 \subseteq \mathcal{M}_1 \subseteq \cdots \subseteq \mathcal{M}_n = \mathbb{C}^n, \qquad \dim\mathcal{M}_j = j $$

such that $\mathcal{M}_0, \mathcal{M}_1, \dots, \mathcal{M}_n$ are invariant for every transformation from $\Omega$.

Proof. For every nonzero vector $x \in \mathbb{C}^n$ write

$$ \mathcal{L}(x) = \operatorname{Span}\{x,\ A_1A_2\cdots A_kx \mid A_1, \dots, A_k \in \Omega,\ k = 1, 2, \dots\} $$

Clearly $\mathcal{L}(x)$ is a nonzero subspace that is invariant for every $A \in \Omega$ (in short, $\Omega$ invariant). Now let $x_1 \in \mathbb{C}^n$ be an eigenvector of some transformation $A_1 \in \Omega$ corresponding to an eigenvalue $\lambda_1$; so $A_1x_1 = \lambda_1x_1$. Hence for every $B_1, \dots, B_k \in \Omega$ we have

$$ A_1B_1\cdots B_kx_1 = B_1A_1B_2\cdots B_kx_1 = \cdots = B_1B_2\cdots B_kA_1x_1 = \lambda_1B_1B_2\cdots B_kx_1 $$

So $A_1|_{\mathcal{L}(x_1)} = \lambda_1I$. Let $x_2 \in \mathcal{L}(x_1)$ be an eigenvector of some $A_2 \in \Omega$: $A_2x_2 = \lambda_2x_2$. Then $A_2|_{\mathcal{L}(x_2)} = \lambda_2I$ and $\mathcal{L}(x_2) \subseteq \mathcal{L}(x_1)$. We continue the construction of nonzero subspaces

$$ \mathcal{L}(x_1) \supseteq \mathcal{L}(x_2) \supseteq \cdots \supseteq \mathcal{L}(x_k) $$

where $A_i|_{\mathcal{L}(x_i)} = \lambda_iI$, $i = 1, \dots, k$, for some $A_1, \dots, A_k \in \Omega$ and complex numbers $\lambda_1, \dots, \lambda_k$, until we encounter the situation where $\mathcal{L}(y) = \mathcal{L}(x_k)$ for every eigenvector $y \in \mathcal{L}(x_k)$ corresponding to any eigenvalue $\lambda$ of any transformation $B \in \Omega$. In this case every $B \in \Omega$ has an eigenvalue $\lambda_B$ with the property that $B|_{\mathcal{L}(x_k)} = \lambda_BI$. Let $y_1$ be any nonzero vector from $\mathcal{L}(x_k)$. Then the subspace $\mathcal{M}_1 = \operatorname{Span}\{y_1\}$ is $\Omega$ invariant. Let $\mathcal{N}_1$ be a direct complement to $\mathcal{M}_1$ in $\mathbb{C}^n$. With respect to the decomposition $\mathcal{M}_1 \dotplus \mathcal{N}_1 = \mathbb{C}^n$, every pair $A, B \in \Omega$ has the block form

$$ A = \begin{bmatrix} \alpha_A & A_1 \\ 0 & A_2 \end{bmatrix}, \qquad B = \begin{bmatrix} \alpha_B & B_1 \\ 0 & B_2 \end{bmatrix} $$

The condition $AB = BA$ implies that $A_2B_2 = B_2A_2$. Repeating the above procedure (now applied to the commuting family $\{A_2 : A \in \Omega\}$ on $\mathcal{N}_1$), we find a vector $y_2 \in \mathcal{N}_1$ such that $\mathcal{M}_2 = \operatorname{Span}\{y_1, y_2\}$ is $\Omega$ invariant, and so on. Eventually we obtain a complete chain of common $\Omega$-invariant subspaces. □
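Numerically, a common triangularizing basis often comes for free: if $B$ is a polynomial in $A$ (one convenient way to manufacture a commuting pair), then any Schur basis for $A$ upper-triangularizes $B$ as well, since $Q^*BQ = p(Q^*AQ)$ and a polynomial of an upper triangular matrix is upper triangular. A sketch with our sample data:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = A @ A + 2 * A + np.eye(4)          # B commutes with A by construction

T, Q = schur(A, output='complex')      # A = Q T Q^*, with T upper triangular
# B is upper triangular in the same orthonormal basis:
print(np.allclose(np.tril(Q.conj().T @ B @ Q, -1), 0, atol=1e-8))   # True
```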
In terms of bases, Theorem 9.2.1 can be stated as follows.

Theorem 9.2.2

Let $\Omega$ be a set of commuting transformations from $\mathbb{C}^n$ into $\mathbb{C}^n$. Then there exists an orthonormal basis $x_1, \dots, x_n$ in $\mathbb{C}^n$ such that the representation of any $A \in \Omega$ in this basis is an upper triangular matrix.

Proof. Let $\{0\} = \mathcal{M}_0 \subseteq \mathcal{M}_1 \subseteq \cdots \subseteq \mathcal{M}_n = \mathbb{C}^n$ be a complete chain of subspaces as in Theorem 9.2.1. Now construct an orthonormal basis $x_1, \dots, x_n$ in such a way that $\operatorname{Span}\{x_1, \dots, x_j\} = \mathcal{M}_j$ for $j = 1, \dots, n$. □

If every transformation from the set $\Omega$ is normal, the upper triangular matrices of Theorem 9.2.2 are actually diagonal (cf. the proof of Theorem 1.9.4). As a result we obtain the "only if" part of the following result.

Theorem 9.2.3

Let $\Omega$ be a set of normal transformations $\mathbb{C}^n \to \mathbb{C}^n$. Then $AB = BA$ for any transformations $A, B \in \Omega$ if and only if there is an orthonormal basis consisting of eigenvectors that are common to all transformations in $\Omega$.

The "if" part of this theorem is clear: if $x_1, \dots, x_n$ is an orthonormal basis in $\mathbb{C}^n$ formed by common eigenvectors of $A$ and $B$, where $A, B \in \Omega$, then in this basis we have

$$ A = \operatorname{diag}[\lambda_1, \lambda_2, \dots, \lambda_n], \qquad B = \operatorname{diag}[\mu_1, \mu_2, \dots, \mu_n] $$

and these diagonal matrices obviously commute.

9.3 COMMON INVARIANT SUBSPACES FOR MATRICES WITH RANK 1 COMMUTATORS

For $n \times n$ matrices $A$ and $B$, the commutator of $A$ and $B$ is, by definition, the matrix $AB - BA$. So the commutator measures the extent to which $A$ and $B$ fail to commute. We have seen in the preceding section that if $A$ and $B$ commute, that is, if their commutator is zero, then there exists a complete chain of common invariant subspaces of $A$ and $B$. It turns out that this result remains true if the commutator is small in the sense of rank.

Theorem 9.3.1

Let $A$ and $B$ be $n \times n$ matrices with $\operatorname{rank}(AB - BA) \le 1$. Then there exists a complete chain of subspaces

$$ \{0\} = \mathcal{M}_0 \subseteq \mathcal{M}_1 \subseteq \cdots \subseteq \mathcal{M}_n = \mathbb{C}^n, \qquad \dim\mathcal{M}_j = j $$

such that each $\mathcal{M}_j$ is both $A$ invariant and $B$ invariant.
Proof. We shall assume that $\operatorname{rank}(AB - BA) = 1$. (If $AB - BA = 0$, Theorem 9.3.1 is contained in Theorem 9.2.1.) We can also assume that $A$ is singular. (If necessary, replace $A$ by $A - \lambda_0I$ for a suitable $\lambda_0$, and note that the commutators of $A$ and $B$ and of $A - \lambda_0I$ and $B$ are the same.) We claim that either $\operatorname{Ker}A$ or $\operatorname{Im}A$ is $B$ invariant. Indeed, if $\operatorname{Ker}A$ is not $B$ invariant, then there exists a nonzero vector $x \in \mathbb{C}^n$ such that $Ax = 0$ and $ABx \ne 0$. Thus $(AB - BA)x = ABx$ spans the one-dimensional range of $AB - BA$. Hence for every $y \in \mathbb{C}^n$ there exists a constant $\mu(y)$ such that

$$ (AB - BA)y = \mu(y)ABx $$

It follows that

$$ BAy = AB\big(y - \mu(y)x\big) $$

and hence

$$ \operatorname{Im}(BA) \subseteq \operatorname{Im}(AB) \subseteq \operatorname{Im}A $$

so $\operatorname{Im}A$ is $B$ invariant.

We have shown that there is a nontrivial subspace $\mathcal{N}$ that is invariant for both $A$ and $B$. Write $A$ and $B$ as $2 \times 2$ block matrices with respect to the decomposition $\mathcal{N} \dotplus \mathcal{N}' = \mathbb{C}^n$, where $\mathcal{N}'$ is some direct complement to $\mathcal{N}$:

$$ A = \begin{bmatrix} A_1 & A_{12} \\ 0 & A_2 \end{bmatrix}, \qquad B = \begin{bmatrix} B_1 & B_{12} \\ 0 & B_2 \end{bmatrix} $$

Then $\operatorname{rank}(A_1B_1 - B_1A_1) \le 1$ and $\operatorname{rank}(A_2B_2 - B_2A_2) \le 1$. So we can apply the preceding argument to find a nontrivial common invariant subspace for $A_1$ and $B_1$ (if $\dim\mathcal{N} > 1$). Similarly, there exists a nontrivial common invariant subspace for $A_2$ and $B_2$ (if $\dim\mathcal{N}' > 1$). Continuing in this way, we ultimately obtain the result of the theorem. □

Theorem 9.3.1 can also be restated in terms of simultaneous triangularizations of $A$ and $B$, just as Theorem 9.2.1 was recast in the form of Theorem 9.2.2. In contrast with Theorem 9.2.1, the result of Theorem 9.3.1 does not generally hold for sets of more than two matrices.

EXAMPLE 9.3.1. Let, for instance,

$$ A_1 = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}, \qquad A_2 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \qquad A_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} $$

It is easily checked that
$$ \operatorname{rank}(A_1A_2 - A_2A_1) = \operatorname{rank}(A_1A_3 - A_3A_1) = \operatorname{rank}(A_2A_3 - A_3A_2) = 1 $$

Nevertheless, there is no one-dimensional common invariant subspace for $A_1$, $A_2$, and $A_3$. Indeed, $A_3$ has exactly two one-dimensional invariant subspaces, $\operatorname{Span}\{e_1\}$ and $\operatorname{Span}\{e_2\}$, and neither of them is invariant for both $A_1$ and $A_2$. □

9.4 HYPERINVARIANT SUBSPACES

Let $A: \mathbb{C}^n \to \mathbb{C}^n$ be a transformation. A subspace $\mathcal{M} \subseteq \mathbb{C}^n$ is called hyperinvariant for $A$ (or $A$ hyperinvariant) if $\mathcal{M}$ is invariant for every transformation that commutes with $A$. In particular, an $A$-hyperinvariant subspace is $A$ invariant. Let us study two simple examples.

EXAMPLE 9.4.1. Let $A = \lambda I$, $\lambda \in \mathbb{C}$. Obviously, every transformation from $\mathbb{C}^n$ to $\mathbb{C}^n$ commutes with $A$, so the only subspaces that are invariant for every linear transformation commuting with $A$ are the trivial ones: $\{0\}$ and $\mathbb{C}^n$. Hence $A$ has only two hyperinvariant subspaces: $\{0\}$ and $\mathbb{C}^n$. □

EXAMPLE 9.4.2. Assume that $A: \mathbb{C}^n \to \mathbb{C}^n$ has $n$ distinct eigenvalues $\lambda_1, \dots, \lambda_n$ with corresponding eigenvectors $x_1, \dots, x_n$. Then $A$ has exactly $2^n$ invariant subspaces $\operatorname{Span}\{x_i \mid i \in K\}$, where $K$ is any subset of $\{1, \dots, n\}$ (see Example 1.1.3). By Theorem 9.1.4, the only transformations that commute with $A$ are the polynomials in $A$. Since every $A$-invariant subspace is invariant also for any polynomial in $A$, we find that every $A$-invariant subspace is $A$ hyperinvariant. □

More generally, let $A$ be a nonderogatory transformation. Then Theorem 9.1.4 shows that every $A$-invariant subspace is also $A$ hyperinvariant. This property is characteristic of nonderogatory transformations.

Theorem 9.4.1

For a transformation $A: \mathbb{C}^n \to \mathbb{C}^n$, every $A$-invariant subspace is $A$ hyperinvariant if and only if $A$ is nonderogatory.

Proof. We have seen already that the "if" part is true. To prove the "only if" part, assume that $A$ is not nonderogatory. We prove that there exists an $A$-invariant subspace that is not $A$ hyperinvariant. By assumption, $\dim\operatorname{Ker}(A - \lambda_0I) \ge 2$ for some eigenvalue $\lambda_0$ of $A$. Without loss of generality we can assume that $A$ is a Jordan matrix

$$ A = J_{k_1}(\lambda_0) \oplus \cdots \oplus J_{k_m}(\lambda_0) \oplus J_{k_{m+1}}(\lambda_{m+1}) \oplus \cdots \oplus J_{k_p}(\lambda_p) $$

where $m \ge 2$ and the first $m$ Jordan blocks correspond to the eigenvalue $\lambda_0$,
they are arranged so that $k_1 \le k_2$, and $\lambda_{m+1}, \dots, \lambda_p$ are different from $\lambda_0$. Obviously, $\operatorname{Span}\{e_1\}$ is an $A$-invariant subspace. It turns out that this subspace is not $A$ hyperinvariant. Indeed, by Theorem 9.1.1 the matrix $S$ with 1 in the entries $(k_1 + 1, 1), \dots, (2k_1, k_1)$ and zeros elsewhere commutes with $A$. On the other hand, $Se_1 = e_{k_1+1}$, so $\operatorname{Span}\{e_1\}$ is not $S$ invariant. □

It is easily seen that the $A$-hyperinvariant subspaces form a lattice; that is, the intersection and sum of $A$-hyperinvariant subspaces are again $A$ hyperinvariant. Denote this lattice by $\operatorname{Hinv}(A)$. Now we can state the main result concerning the structure of $\operatorname{Hinv}(A)$.

Theorem 9.4.2

The lattice of all $A$-hyperinvariant subspaces coincides with the smallest lattice $\mathcal{S}_A$ of subspaces in $\mathbb{C}^n$ that contains

$$ \operatorname{Im}(A - \lambda I)^k \quad \text{and} \quad \operatorname{Ker}(A - \lambda I)^k, \qquad \lambda \in \mathbb{C}, \quad k = 1, 2, \dots $$

Actually, $\mathcal{S}_A$ coincides with the smallest lattice of subspaces in $\mathbb{C}^n$ that contains

$$ \operatorname{Ker}(A - \lambda_jI)^k, \quad \operatorname{Im}(A - \lambda_jI)^k, \qquad j = 1, \dots, m; \quad k = 1, \dots, r_j \tag{9.4.1} $$

where $(\lambda - \lambda_1)^{r_1}\cdots(\lambda - \lambda_m)^{r_m}$ is the minimal polynomial of $A$. Indeed, $\operatorname{Ker}(A - \lambda I)^k = \{0\}$ and $\operatorname{Im}(A - \lambda I)^k = \mathbb{C}^n$ for $\lambda \notin \{\lambda_1, \dots, \lambda_m\}$, while $\operatorname{Ker}(A - \lambda_jI)^k = \mathcal{R}_{\lambda_j}(A)$ and $\operatorname{Im}(A - \lambda_jI)^k = \operatorname{Im}(A - \lambda_jI)^{r_j}$ for $k \ge r_j$.

The proof of Theorem 9.4.2 is given in the next section. The following example shows that, in general, not every $A$-hyperinvariant subspace is the image or the kernel of a polynomial in $A$.

EXAMPLE 9.4.3. Let $A$ be the $6 \times 6$ matrix

$$ A = J_4(0) \oplus J_2(0) = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} $$

According to Theorem 9.4.2, the subspace $\mathcal{L} = \operatorname{Span}\{e_1, e_2, e_5\} = \operatorname{Ker}A + \operatorname{Im}A^2$ is $A$ hyperinvariant. On the other hand, there is no polynomial $p(\lambda)$ such that $\mathcal{L} = \operatorname{Ker}p(A)$ or $\mathcal{L} = \operatorname{Im}p(A)$. Indeed, for any polynomial $p(\lambda)$ the matrix $p(A)$ has the form (as in the proof of Theorem 9.1.4):
$$ p(A) = \begin{bmatrix} p_1 & p_2 & p_3 & p_4 & 0 & 0 \\ 0 & p_1 & p_2 & p_3 & 0 & 0 \\ 0 & 0 & p_1 & p_2 & 0 & 0 \\ 0 & 0 & 0 & p_1 & 0 & 0 \\ 0 & 0 & 0 & 0 & p_1 & p_2 \\ 0 & 0 & 0 & 0 & 0 & p_1 \end{bmatrix} $$

for some complex numbers $p_1, p_2, p_3, p_4$. So $\operatorname{Ker}p(A)$ can only be one of the following subspaces: $\{0\}$ (if $p_1 \ne 0$); $\operatorname{Span}\{e_1, e_5\}$ (if $p_1 = 0$, $p_2 \ne 0$); $\operatorname{Span}\{e_1, e_2, e_5, e_6\}$ (if $p_1 = p_2 = 0$, $p_3 \ne 0$); $\operatorname{Span}\{e_1, e_2, e_3, e_5, e_6\}$ (if $p_1 = p_2 = p_3 = 0$, $p_4 \ne 0$); $\mathbb{C}^6$ (if $p_i = 0$, $i = 1, 2, 3, 4$). The subspace $\operatorname{Im}p(A)$ can only be one of the following: $\mathbb{C}^6$; $\operatorname{Span}\{e_1, e_2, e_3, e_5\}$; $\operatorname{Span}\{e_1, e_2\}$; $\operatorname{Span}\{e_1\}$; $\{0\}$. None of these subspaces coincides with $\mathcal{L}$. □

9.5 PROOF OF THEOREM 9.4.2

The proof of Theorem 9.4.2 requires some preparation. We first prove several auxiliary results that are useful in their own right.

Proposition 9.5.1

For any $\lambda \in \mathbb{C}$ the subspaces

$$ \operatorname{Ker}(A - \lambda I)^k, \quad \operatorname{Im}(A - \lambda I)^k, \qquad k = 1, 2, \dots $$

are $A$ hyperinvariant.

Proof. Fix $\lambda \in \mathbb{C}$ and a positive integer $k$, and let $x$ be any vector from $\operatorname{Ker}(A - \lambda I)^k$. If $B$ commutes with $A$, we have

$$ (A - \lambda I)^kBx = B(A - \lambda I)^kx = 0 $$

So $Bx \in \operatorname{Ker}(A - \lambda I)^k$, and the subspace $\operatorname{Ker}(A - \lambda I)^k$ is $A$ hyperinvariant. Similarly, let $y \in \operatorname{Im}(A - \lambda I)^k$ and $BA = AB$. Then for any $z \in \mathbb{C}^n$ such that $(A - \lambda I)^kz = y$, we obtain

$$ (A - \lambda I)^kBz = B(A - \lambda I)^kz = By $$

So $By \in \operatorname{Im}(A - \lambda I)^k$; therefore, $\operatorname{Im}(A - \lambda I)^k$ is $A$ hyperinvariant. □

We proceed now with the identification of $\operatorname{Hinv}(A)$, assuming that $A$ has only one eigenvalue. Given positive integers $p_1 \ge \cdots \ge p_m$, let $\Lambda(p_1, \dots, p_m)$ be the set of all $m$-tuples of integers $(q_1, \dots, q_m)$ such that $q_1 \ge \cdots \ge q_m \ge 0$ and $p_1 - q_1 \ge p_2 - q_2 \ge \cdots \ge p_m - q_m \ge 0$. For every two
sequences $q' = (q_1', \dots, q_m')$ and $q'' = (q_1'', \dots, q_m'')$ from $\Lambda(p_1, \dots, p_m)$ put

$$ \max(q', q'') = \big(\max(q_1', q_1''), \dots, \max(q_m', q_m'')\big) $$

It is easily seen that $\max(q', q'') \in \Lambda(p_1, \dots, p_m)$. Similarly, let

$$ \min(q', q'') = \big(\min(q_1', q_1''), \dots, \min(q_m', q_m'')\big); $$

then $\min(q', q'')$ belongs to $\Lambda(p_1, \dots, p_m)$.

Let $B: \mathbb{C}^n \to \mathbb{C}^n$ be a transformation with a single eigenvalue $\lambda_0$, and let

$$ f_1^{(1)}, \dots, f_{p_1}^{(1)};\ f_1^{(2)}, \dots, f_{p_2}^{(2)};\ \dots;\ f_1^{(m)}, \dots, f_{p_m}^{(m)} \tag{9.5.1} $$

be a Jordan basis in $\mathbb{C}^n$ for $B$, where $p_1 \ge p_2 \ge \cdots \ge p_m$. So in this basis $B$ has the form

$$ J_{p_1}(\lambda_0) \oplus \cdots \oplus J_{p_m}(\lambda_0) $$

Let

$$ \mathcal{X}_j^i = \operatorname{Span}\{f_1^{(i)}, \dots, f_j^{(i)}\}, \qquad i = 1, \dots, m; \quad j = 1, \dots, p_i \tag{9.5.2} $$

with the convention $\mathcal{X}_0^i = \{0\}$.

Lemma 9.5.2

For every $(q_1, \dots, q_m) \in \Lambda(p_1, \dots, p_m)$ the subspace

$$ \Phi(q_1, \dots, q_m) \overset{\text{def}}{=} \mathcal{X}_{q_1}^1 + \cdots + \mathcal{X}_{q_m}^m \tag{9.5.3} $$

is $B$ hyperinvariant. Conversely, every $B$-hyperinvariant subspace $\mathcal{L}$ has the form $\Phi(q_1, \dots, q_m)$ for some $(q_1, \dots, q_m) \in \Lambda(p_1, \dots, p_m)$. Moreover

$$ \Phi(\max(q', q'')) = \Phi(q') + \Phi(q'') \tag{9.5.4} $$

$$ \Phi(\min(q', q'')) = \Phi(q') \cap \Phi(q'') \tag{9.5.5} $$

for every $q', q'' \in \Lambda(p_1, \dots, p_m)$.

Proof. Let $\mathcal{L}$ be a nonzero $B$-hyperinvariant subspace, and let $x \in \mathcal{L}$ be an arbitrary nonzero vector. Write $x$ as a linear combination of the basis vectors:

$$ x = \sum_{i=1}^{p_1}\xi_i^{(1)}f_i^{(1)} + \cdots + \sum_{i=1}^{p_m}\xi_i^{(m)}f_i^{(m)} $$

Assume that for some $j$ the vector
$$ y = \sum_{i=1}^{p_j}\xi_i^{(j)}f_i^{(j)} $$

is nonzero, and let $q$ be the maximal index $i$ $(1 \le i \le p_j)$ such that $\xi_q^{(j)} \ne 0$. We show that the subspace $\mathcal{X}_q^j$ is contained in $\mathcal{L}$. Let $P_j$ be the projector onto $\mathcal{X}_{p_j}^j$ defined by $P_jf_\alpha^{(i)} = 0$ for $i \ne j$ and $P_jf_\alpha^{(j)} = f_\alpha^{(j)}$ $(\alpha = 1, \dots, p_j)$. Obviously, $P_jB = BP_j$. Therefore, the subspace $\mathcal{L}$ is $P_j$ invariant. Hence $y = P_jx \in \mathcal{L}$. For every $k = 1, 2, \dots$ the linear transformation $(B - \lambda_0I)^k$ commutes with $B$, and hence

$$ (B - \lambda_0I)^ky \in \mathcal{L} $$

Since

$$ (B - \lambda_0I)^{q-1}y = \xi_q^{(j)}f_1^{(j)}, \qquad (B - \lambda_0I)^{q-2}y = \xi_{q-1}^{(j)}f_1^{(j)} + \xi_q^{(j)}f_2^{(j)}, \qquad \dots $$

and $\xi_q^{(j)} \ne 0$, the vectors $f_1^{(j)}, f_2^{(j)}, \dots, f_q^{(j)}$ belong successively to $\mathcal{L}$. Thus $\mathcal{X}_q^j \subseteq \mathcal{L}$.

Furthermore, we show that if $\mathcal{X}_q^j \subseteq \mathcal{L}$ $(j \ge 2)$, then also $\mathcal{X}_q^{j-1} \subseteq \mathcal{L}$. Indeed, let $X: \mathbb{C}^n \to \mathbb{C}^n$ be the linear transformation given in the basis (9.5.1) by the matrix $X = [X_{\mu\nu}]_{\mu,\nu=1}^{m}$, where $X_{\mu\nu}$ is a $p_\mu \times p_\nu$ matrix and $X_{\mu\nu} = 0$ for all $\mu, \nu$ except for $X_{j-1,j}$, which is given as follows:

$$ X_{j-1,j} = \begin{bmatrix} I_{p_j} \\ 0 \end{bmatrix} $$

Theorem 9.1.1 shows that $X$ commutes with $B$. Consequently, $\mathcal{L}$ is $X$ invariant, and the vectors

$$ f_i^{(j-1)} = Xf_i^{(j)} \qquad (i = 1, \dots, q) $$

belong to $\mathcal{L}$.

We have proved that $\mathcal{L}$ has the form (9.5.3) with $q_1 \ge \cdots \ge q_m$. Let us verify that $p_1 - q_1 \ge p_2 - q_2 \ge \cdots \ge p_m - q_m$. Fix $i_0 < j_0$ and let $C: \mathbb{C}^n \to \mathbb{C}^n$ be defined in block matrix form $C = [C_{ij}]_{i,j=1}^{m}$ with respect to the basis (9.5.1), where $C_{ij}$ is the zero $p_i \times p_j$ matrix if $i \ne j_0$ or $j \ne i_0$, and $C_{j_0i_0}$ is the $p_{j_0} \times p_{i_0}$ matrix $[\,0\;\;I_{p_{j_0}}\,]$. By Theorem 9.1.1, $C$ commutes with $B$, so $\mathcal{L}$ is $C$ invariant. If $q_{i_0} = 0$ or $p_{i_0} - q_{i_0} \ge p_{j_0}$, then obviously $p_{i_0} - q_{i_0} \ge p_{j_0} - q_{j_0}$. Otherwise

$$ Cf_{q_{i_0}}^{(i_0)} = f_{q_{i_0} - (p_{i_0} - p_{j_0})}^{(j_0)} \in \mathcal{L} $$

which implies $p_{j_0} - p_{i_0} + q_{i_0} \le q_{j_0}$, that is, $p_{i_0} - q_{i_0} \ge p_{j_0} - q_{j_0}$ again.
It remains to show that every subspace $\mathcal{L} = \mathcal{X}_{q_1}^1 + \cdots + \mathcal{X}_{q_m}^m$ with $(q_1, \dots, q_m) \in \Lambda(p_1, \dots, p_m)$ is $B$ hyperinvariant. Let $C \in \mathcal{C}(B)$. We must prove that $\mathcal{L}$ is $C$ invariant. With respect to the basis (9.5.1), write $C$ as the block matrix $C = [C_{ij}]_{i,j=1}^{m}$, where $C_{ij}$ is a $p_i \times p_j$ matrix of one of the following types (see Theorem 9.1.1): of type (9.1.6) if $i = j$; of type $[\,0\;\;T\,]$ if $i > j$; of type $\begin{bmatrix} T \\ 0 \end{bmatrix}$ if $i < j$ [in the notation of (9.1.6)-(9.1.8)]. From the structure of $C$ it is easily seen that $\mathcal{L}$ is $C$ invariant if and only if the $q_j$th column of every $C_{ij}$ has all its entries zero in the places $q_i + 1, \dots, p_i$. In case $i > j$ the lowest possibly nonzero entry in the $q_j$th column of $C_{ij}$ is in the $[q_j - (p_j - p_i)]$th place; but $q_j - (p_j - p_i) \le q_i$ because $(q_1, \dots, q_m) \in \Lambda(p_1, \dots, p_m)$. In case $i < j$ the lowest possibly nonzero entry in the $q_j$th column of $C_{ij}$ is in the $q_j$th place; but $q_j \le q_i$, so we are done in this case also. Finally, in case $i = j$, obviously the $q_j$th column of $C_{jj}$ has zeros in the places $q_j + 1, \dots, p_j$. We have verified that $\mathcal{L}$ is indeed $C$ invariant.

Finally, equalities (9.5.4) and (9.5.5) are clear from the definitions of $\max(q', q'')$ and $\min(q', q'')$. □

Now we begin the proof of Theorem 9.4.2 itself. In view of Proposition 9.5.1, every element in the lattice $\mathcal{S}_A$, the smallest lattice containing the subspaces (9.4.1), is $A$ hyperinvariant. Now let $\mathcal{L}$ be an $A$-hyperinvariant subspace. Then $\mathcal{L}$ is, in particular, $A$ invariant; therefore

$$ \mathcal{L} = \mathcal{L} \cap \mathcal{R}_{\lambda_1}(A) \dotplus \cdots \dotplus \mathcal{L} \cap \mathcal{R}_{\lambda_m}(A) \tag{9.5.6} $$

where $\lambda_1, \dots, \lambda_m$ are all the distinct eigenvalues of $A$. Now $\mathcal{L} \cap \mathcal{R}_{\lambda_j}(A) = \mathcal{L} \cap \operatorname{Ker}(A - \lambda_jI)^{r_j}$ is also an $A$-hyperinvariant subspace. [Recall that the integers $r_j$ are defined by the minimal polynomial $(\lambda - \lambda_1)^{r_1}\cdots(\lambda - \lambda_m)^{r_m}$ of $A$.] Thus, to show that $\mathcal{L} \in \mathcal{S}_A$, we can assume that $A$ has only one eigenvalue $\lambda_0$. Letting $p_1 \ge \cdots \ge p_l$ be the partial multiplicities of $A$, in view of Lemma 9.5.2 it suffices to verify that

$$ \mathcal{X}_{q_1}^1 + \cdots + \mathcal{X}_{q_l}^l \in \mathcal{S}_A $$

where $(q_1, \dots, q_l) \in \Lambda(p_1, \dots, p_l)$ and the $\mathcal{X}_j^i$ are defined as in equation (9.5.2) [with respect to a Jordan basis $f_j^{(i)}$ of $A$]. Actually

$$ \mathcal{X}_{q_1}^1 + \cdots + \mathcal{X}_{q_l}^l = \big(\operatorname{Ker}N^{q_1} \cap \operatorname{Im}N^{p_1-q_1}\big) + \cdots + \big(\operatorname{Ker}N^{q_l} \cap \operatorname{Im}N^{p_l-q_l}\big) \tag{9.5.7} $$
where $N = A - \lambda_0I$. Indeed, as $\mathcal{X}_{q_i}^i \subseteq \operatorname{Ker}N^{q_i} \cap \operatorname{Im}N^{p_i-q_i}$, $i = 1, \dots, l$, the inclusion $\subseteq$ in (9.5.7) is obvious. For the opposite inclusion, let

$$ x \in \operatorname{Ker}N^{q_i} \cap \operatorname{Im}N^{p_i-q_i} $$

so $x = N^{p_i-q_i}y$ for some $y$ with $N^{p_i}y = 0$. Write $y = y_1 + y_2 + \cdots + y_l$, where $y_j \in \operatorname{Span}\{f_1^{(j)}, \dots, f_{p_j}^{(j)}\}$. Then $x = \sum_{j=1}^{l}N^{p_i-q_i}y_j$ and

$$ N^{p_i}y_j = 0 \quad \text{for } j = 1, \dots, l \tag{9.5.8} $$

We want to show that $N^{p_i-q_i}y_j \in \mathcal{X}_{q_j}^j$ or, equivalently,

$$ N^{q_j + p_i - q_i}y_j = 0, \qquad j = 1, \dots, l \tag{9.5.9} $$

But since $(q_1, \dots, q_l) \in \Lambda(p_1, \dots, p_l)$, we have $q_j + p_i - q_i \ge \min(p_i, p_j)$, $1 \le j \le l$, and (9.5.9) follows from (9.5.8). Theorem 9.4.2 is proved. □

9.6 FURTHER PROPERTIES OF HYPERINVARIANT SUBSPACES

We present here some properties of the lattice $\operatorname{Hinv}(A)$ of all $A$-hyperinvariant subspaces.

Theorem 9.6.1

For any transformation $A: \mathbb{C}^n \to \mathbb{C}^n$ the lattice $\operatorname{Hinv}(A)$ is distributive and self-dual and contains exactly

$$ \prod_{i=1}^{k}\Big[\prod_{j=1}^{m_i-1}\big(p_j^{(i)} - p_{j+1}^{(i)} + 1\big)\Big]\big(p_{m_i}^{(i)} + 1\big) \tag{9.6.1} $$

elements, where $p_1^{(i)} \ge \cdots \ge p_{m_i}^{(i)}$ are the partial multiplicities of $A$ corresponding to the $i$th eigenvalue, $i = 1, \dots, k$, and $k$ is the number of distinct eigenvalues of $A$ (in particular, $\operatorname{Hinv}(A)$ is finite).

Let us explain the terms that appear in this theorem. By definition, a lattice $\Delta$ of subspaces in $\mathbb{C}^n$ is called distributive if

$$ \mathcal{M} \cap (\mathcal{N}_1 + \mathcal{N}_2) = (\mathcal{M} \cap \mathcal{N}_1) + (\mathcal{M} \cap \mathcal{N}_2) $$

for every $\mathcal{M}, \mathcal{N}_1, \mathcal{N}_2 \in \Delta$. The lattice $\Delta$ is said to be self-dual if there exists a bijective map $\psi: \Delta \to \Delta$ such that

$$ \psi(\mathcal{M} + \mathcal{N}) = \psi(\mathcal{M}) \cap \psi(\mathcal{N}), \qquad \psi(\mathcal{M} \cap \mathcal{N}) = \psi(\mathcal{M}) + \psi(\mathcal{N}) $$

for every $\mathcal{M}, \mathcal{N} \in \Delta$. [In other words, $\Delta$ is isomorphic (as a lattice) to the dual lattice of $\Delta$.]

Proof. Note that every $A$-hyperinvariant subspace $\mathcal{L}$ admits the representation
$$ \mathcal{L} = \mathcal{L} \cap \mathcal{R}_{\lambda_1}(A) + \cdots + \mathcal{L} \cap \mathcal{R}_{\lambda_k}(A) $$

where $\lambda_1, \dots, \lambda_k$ are all the distinct eigenvalues of $A$. As

$$ \mathcal{L}_1 \cap \mathcal{L}_2 = \sum_{i=1}^{k}\big(\mathcal{L}_1 \cap \mathcal{R}_{\lambda_i}(A)\big) \cap \big(\mathcal{L}_2 \cap \mathcal{R}_{\lambda_i}(A)\big) $$

and

$$ \mathcal{L}_1 + \mathcal{L}_2 = \sum_{i=1}^{k}\big[\big(\mathcal{L}_1 \cap \mathcal{R}_{\lambda_i}(A)\big) + \big(\mathcal{L}_2 \cap \mathcal{R}_{\lambda_i}(A)\big)\big] $$

for any $A$-hyperinvariant subspaces $\mathcal{L}_1$ and $\mathcal{L}_2$, we may assume (without loss of generality) that $A$ has only a single eigenvalue $\lambda_0$ [i.e., $\mathcal{R}_{\lambda_0}(A) = \mathbb{C}^n$]. To show that the lattice of $A$-hyperinvariant subspaces is distributive, first observe the following equality for any real numbers $r$, $s$, $t$:

$$ \min(\max(r, s), t) = \max(\min(r, t), \min(s, t)) \tag{9.6.2} $$

This equality is easily verified by assuming (without loss of generality) that $r \le s$ and then considering three cases separately: (1) $t \le r \le s$; (2) $r \le t \le s$; (3) $r \le s \le t$. Now let $\mathcal{M}_1$, $\mathcal{M}_2$, $\mathcal{M}_3$ be $A$-hyperinvariant subspaces. According to Lemma 9.5.2, write

$$ \mathcal{M}_i = \mathcal{X}_{q_1^{(i)}}^1 + \cdots + \mathcal{X}_{q_m^{(i)}}^m, \qquad i = 1, 2, 3 $$

in the notation of Lemma 9.5.2, where $q^{(i)} = (q_1^{(i)}, \dots, q_m^{(i)}) \in \Lambda(p_1, \dots, p_m)$, $i = 1, 2, 3$, and $p_1 \ge \cdots \ge p_m$ are the partial multiplicities of $A$. Using (9.5.4) and (9.5.5), we have

$$ \mathcal{M}_1 \cap (\mathcal{M}_2 + \mathcal{M}_3) = \Phi\big(\min[\max(q^{(2)}, q^{(3)}),\ q^{(1)}]\big) \tag{9.6.3} $$

and

$$ (\mathcal{M}_1 \cap \mathcal{M}_2) + (\mathcal{M}_1 \cap \mathcal{M}_3) = \Phi\big(\max[\min(q^{(1)}, q^{(2)}),\ \min(q^{(1)}, q^{(3)})]\big) \tag{9.6.4} $$

Using (9.6.2), we obtain equality between (9.6.3) and (9.6.4). To prove the self-duality of $\operatorname{Hinv}(A)$, observe that, in view of Lemma 9.5.2, the map $\psi: \operatorname{Hinv}(A) \to \operatorname{Hinv}(A)$ defined by

$$ \psi\big(\Phi(q_1, \dots, q_m)\big) = \Phi(p_1 - q_1, \dots, p_m - q_m), \qquad (q_1, \dots, q_m) \in \Lambda(p_1, \dots, p_m) $$

satisfies the definition of a self-dual lattice [note that $(p_1 - q_1, \dots, p_m - q_m)$ again belongs to $\Lambda(p_1, \dots, p_m)$]. For instance:
$$ \psi\big(\Phi(q') + \Phi(q'')\big) = \psi\big(\Phi(\max(q', q''))\big) = \sum_{i=1}^{m}\mathcal{X}_{p_i - \max(q_i', q_i'')}^{i} = \sum_{i=1}^{m}\mathcal{X}_{\min(p_i - q_i',\ p_i - q_i'')}^{i} = \psi(\Phi(q')) \cap \psi(\Phi(q'')) $$

It remains to verify that $\operatorname{Hinv}(A)$ has exactly

$$ \Big[\prod_{j=1}^{m-1}(p_j - p_{j+1} + 1)\Big](p_m + 1) \tag{9.6.5} $$

elements. Instead of $\operatorname{Hinv}(A)$, we count the elements in $\Lambda(p_1, \dots, p_m)$. Using induction on $m$ [formula (9.6.5) obviously holds for $m = 1$], assume that $\Lambda(p_2, \dots, p_m)$ has exactly $\big[\prod_{j=2}^{m-1}(p_j - p_{j+1} + 1)\big](p_m + 1)$ elements. Now observe that $(q_2 + s, q_2, \dots, q_m)$ belongs to $\Lambda(p_1, \dots, p_m)$ if and only if $(q_2, \dots, q_m)$ belongs to $\Lambda(p_2, \dots, p_m)$ and $0 \le s \le p_1 - p_2$. This completes the induction step. □

We conclude this section by observing that the number of $A$-hyperinvariant subspaces for $A: \mathbb{C}^n \to \mathbb{C}^n$ lies between 2 and $2^n$, and both bounds can be attained. Indeed, the transformation $I$ has only trivial hyperinvariant subspaces, whereas a diagonable transformation with $n$ distinct eigenvalues has $2^n$ hyperinvariant subspaces (see Examples 9.4.1 and 9.4.2). That the number of $A$-hyperinvariant subspaces cannot exceed $2^n$ follows from a general result in lattice theory [see, e.g., Theorem 148 in Donnellan (1968)], using the fact that $\operatorname{Hinv}(A)$ is distributive and each chain in $\operatorname{Hinv}(A)$ contains not more than $n + 1$ distinct subspaces.
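Formula (9.6.5) can be checked against a brute-force enumeration of $\Lambda(p_1, \dots, p_m)$ (the code is ours; for the matrix of Example 9.4.3, with partial multiplicities $(4, 2)$, both give $(4 - 2 + 1)(2 + 1) = 9$):

```python
from itertools import product

def count_hinv(partial_multiplicities):
    """Number of A-hyperinvariant subspaces for a single eigenvalue with partial
    multiplicities p_1 >= ... >= p_m, by enumerating Lambda(p_1, ..., p_m)."""
    p = sorted(partial_multiplicities, reverse=True)
    m = len(p)
    count = 0
    for q in product(*(range(pi + 1) for pi in p)):
        if all(q[i] >= q[i + 1] for i in range(m - 1)) and \
           all(p[i] - q[i] >= p[i + 1] - q[i + 1] >= 0 for i in range(m - 1)):
            count += 1
    return count

print(count_hinv([4, 2]))   # 9, in agreement with formula (9.6.5)
```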
9.7 EXERCISES

9.1 Consider the transformation

$$ A = \begin{bmatrix} 2 & 1 & 0 \\ -1 & 0 & 0 \\ 1 & 1 & 2 \end{bmatrix} : \mathbb{C}^3 \to \mathbb{C}^3 $$

written as a matrix with respect to the standard basis $e_1, e_2, e_3$.
(a) Find all transformations that commute with $A$.
(b) Find all $A$-hyperinvariant subspaces.

9.2 Show that if a transformation $A: \mathbb{C}^n \to \mathbb{C}^n$ has $n$ distinct eigenvalues, then every transformation commuting with $A$ is diagonable. Conversely, if every transformation commuting with $A$ is diagonable, then $A$ has $n$ distinct eigenvalues.

9.3 Supply a proof for Corollary 9.1.5.

9.4 Show that if $AJ_n(\lambda_0) = J_n(\lambda_0)A$, then $A$ is diagonable if and only if $A$ is a scalar multiple of the identity.

9.5 Prove or disprove each of the following statements for any commuting transformations $A: \mathbb{C}^n \to \mathbb{C}^n$ and $B: \mathbb{C}^n \to \mathbb{C}^n$:
(a) There exists an orthonormal basis in which $A$ and $B$ have lower triangular form.
(b) There exists a basis in which both $A$ and $B$ have Jordan form.
(c) Both $A$ and $B$ have the same eigenvectors (possibly corresponding to different eigenvalues).
(d) Both $A$ and $B$ have the same invariant subspaces.

9.6 Show that any matrix commuting with

$$ \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \end{bmatrix} $$

is a circulant.

9.7 Show that any matrix commuting with

$$ A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_0 & a_1 & a_2 & \cdots & a_{n-1} \end{bmatrix} $$

where $a_0, a_1, \dots, a_{n-1}$ are given complex numbers, is a polynomial in $A$.

9.8 Describe all matrices commuting with

$$ Q = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} $$
Are all of these polynomials of $Q$? Find all $Q$-hyperinvariant subspaces.

9.9 Describe all transformations commuting with a transformation $A\colon \mathbb{C}^n \to \mathbb{C}^n$ of rank 1. Find all $A$-hyperinvariant subspaces.

9.10 Let $A\colon \mathbb{C}^n \to \mathbb{C}^n$ be a transformation. Prove that every $A$-hyperinvariant subspace is the image of some transformation that commutes with $A$. [Hint: Use Lemma 9.5.2.]

9.11 Show that every $A$-hyperinvariant subspace is the kernel of some transformation that commutes with $A$.

9.12 Prove that for the matrix $A$ from Exercise 9.7 we have $\operatorname{Hinv}(A) = \operatorname{Inv}(A)$.

9.13 Is $\operatorname{Hinv}(A) = \operatorname{Inv}(A)$ true for any block companion matrix

$$\begin{bmatrix} 0 & I & 0 & \cdots & 0 \\ 0 & 0 & I & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I \\ -A_0 & -A_1 & -A_2 & \cdots & -A_{n-1} \end{bmatrix}$$

where the $A_j$ are $2 \times 2$ matrices?

9.14 Show that for circulant matrices $A$, in general, $\operatorname{Hinv}(A) \ne \operatorname{Inv}(A)$. Find necessary and sufficient conditions on the circulant matrix $A$ in order that $\operatorname{Hinv}(A) = \operatorname{Inv}(A)$.

9.15 Give an example of a transformation $A$ and of an $A$-hyperinvariant subspace $\mathcal{M}$ that does not belong to the smallest lattice of subspaces containing the images of all polynomials in $A$.

9.16 Give an example analogous to Exercise 9.15 with "images" replaced by "kernels."

9.17 Give an example of a transformation $A$ such that $\operatorname{Inv}(A)$ is not distributive.
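As a computational aside (this sketch is not part of the original text), Exercise 9.1(a) can be explored numerically: the condition $AX = XA$ is linear in $X$, so the commutant of $A$ is the null space of the Kronecker-product matrix $I \otimes A - A^{T} \otimes I$. Here is a minimal NumPy sketch, assuming the standard column-stacking convention $\operatorname{vec}(AX - XA) = (I \otimes A - A^{T} \otimes I)\operatorname{vec}(X)$:

```python
import numpy as np

# The matrix of Exercise 9.1 with respect to the standard basis e1, e2, e3.
A = np.array([[ 2., 1., 0.],
              [-1., 0., 0.],
              [ 1., 1., 2.]])
n = A.shape[0]

# X commutes with A  iff  (I (x) A - A^T (x) I) vec(X) = 0,
# where vec stacks columns and (x) denotes the Kronecker product.
K = np.kron(np.eye(n), A) - np.kron(A.T, np.eye(n))

# Null space of K: rows of Vh whose singular values vanish.
_, s, Vh = np.linalg.svd(K)
basis = Vh[s < 1e-10]

print("dimension of the commutant:", len(basis))
for row in basis:
    X = row.reshape(n, n, order="F")   # undo the column-stacking vec
    assert np.allclose(A @ X, X @ A)
```

The commutant here turns out to be three dimensional, consistent with the fact that this $A$ is nonderogatory (its eigenvalues are $1, 1, 2$, with a single Jordan block for each), so every commuting matrix is a polynomial in $A$.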
Chapter Ten

Description of Invariant Subspaces and Linear Transformations with the Same Invariant Subspaces

In this chapter we consider two related problems: (a) the description of all invariant subspaces of a given transformation, and (b) the question of to what extent a transformation is determined by its lattice of invariant subspaces.

We have seen in Chapter 2 that every invariant subspace of a linear transformation $A\colon \mathbb{C}^n \to \mathbb{C}^n$ is a direct sum of irreducible $A$-invariant subspaces, that is, subspaces such that the restriction of $A$ to each of them has only one Jordan block in its Jordan form. Thus, to solve the first problem mentioned above, it is sufficient to describe all irreducible $A$-invariant subspaces. This is done in Section 10.1.

The second objective of this chapter is a characterization of the transformations having exactly the same set of invariant subspaces. It turns out that, in general, not all such transformations are polynomials of each other. Our characterization (given in Section 10.2) depends on the description of irreducible invariant subspaces given in Section 10.1.

10.1 DESCRIPTION OF IRREDUCIBLE SUBSPACES

In the description of invariant subspaces, upper triangular Toeplitz matrices, and matrices that resemble upper triangular Toeplitz matrices, play an important role, as we shall see. We first recall some simple facts about Toeplitz matrices.
A matrix $A$ of size $j \times j$ is called Toeplitz if its entries have the following structure:

$$A = [a_{k-l}]_{l,k=1}^{j} = \begin{bmatrix} a_0 & a_1 & \cdots & a_{j-1} \\ a_{-1} & a_0 & \cdots & a_{j-2} \\ \vdots & & \ddots & \vdots \\ a_{-j+1} & a_{-j+2} & \cdots & a_0 \end{bmatrix} \tag{10.1.1}$$

where $a_i \in \mathbb{C}$, $i = -j+1, -j+2, \dots, j-1$. Denote by $T_j$ the class of all upper triangular Toeplitz matrices of size $j \times j$, that is, those with $a_{-1} = \cdots = a_{-j+1} = 0$ in equation (10.1.1).

Proposition 10.1.1

The class $T_j$ is an algebra; that is, it is closed under the operations of addition, multiplication by scalars, and matrix multiplication. Moreover, if $A \in T_j$ and $\det A \ne 0$, then $A^{-1} \in T_j$.

Proof. All but the last assertion of Proposition 10.1.1 are immediate consequences of the definition of $T_j$. To prove the last assertion, suppose that

$$\begin{bmatrix} a_0 & a_1 & \cdots & a_{j-1} \\ 0 & a_0 & \cdots & a_{j-2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_0 \end{bmatrix} [b_{ik}]_{i,k=1}^{j} = I$$

One deduces easily that $b_{ik} = 0$ for $i > k$. Further,

$$b_{ii} = a_0^{-1} ; \qquad a_0 b_{i-1,i} + a_1 b_{ii} = 0$$

and, in general,

$$\sum_{p=0}^{k} a_p\, b_{i-k+p,\,i} = 0 , \qquad k = 1, \dots, j-1 ; \quad i = 1, \dots, j \tag{10.1.2}$$

(It is assumed that $b_{ki} = 0$ whenever $k \le 0$.) Equations (10.1.2) define the $b_{i-k,i}$ recursively:

$$b_{i-k,i} = -a_0^{-1} \sum_{p=0}^{k-1} a_{k-p}\, b_{i-p,\,i} \tag{10.1.3}$$

Using (10.1.3), we can prove by induction on $k$ (starting with $k = 0$) that $b_{i-k,i}$ does not depend on $i$. But this means exactly that the matrix $[b_{ik}]_{i,k=1}^{j}$ is Toeplitz. □
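As an illustration (not part of the original text), the recursion (10.1.3) is easy to implement. The following NumPy sketch computes the first row of $T_j(a_0, \dots, a_{j-1})^{-1}$ and confirms on a small example that the inverse is again upper triangular Toeplitz, as Proposition 10.1.1 asserts:

```python
import numpy as np

def upper_toeplitz(a):
    """The upper triangular Toeplitz matrix T_j(a_0, ..., a_{j-1})."""
    j = len(a)
    return np.array([[a[c - r] if c >= r else 0.0 for c in range(j)]
                     for r in range(j)])

def inverse_first_row(a):
    """First row (b_0, ..., b_{j-1}) of T_j(a)^{-1} via recursion (10.1.3).
    Since the inverse is again upper triangular Toeplitz, its first row
    determines it completely."""
    b = np.zeros(len(a), dtype=complex)
    b[0] = 1.0 / a[0]                      # the diagonal entry a_0^{-1}
    for k in range(1, len(a)):
        # b_k = -a_0^{-1} * sum_{p=0}^{k-1} a_{k-p} b_p   [cf. (10.1.3)]
        b[k] = -b[0] * sum(a[k - p] * b[p] for p in range(k))
    return b

a = [2.0, -1.0, 3.0, 0.5]
T = upper_toeplitz(a)
S = upper_toeplitz(inverse_first_row(a))
assert np.allclose(T @ S, np.eye(4))       # T^{-1} is Toeplitz, as claimed
```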
Let $A\colon \mathbb{C}^n \to \mathbb{C}^n$ be a transformation. It is clear that each $A$-invariant subspace $\mathcal{M}$ can be represented as a direct sum of nonzero $A$-invariant subspaces $\mathcal{M}_1, \dots, \mathcal{M}_k$, each of which is irreducible, that is, not representable as a direct sum of smaller invariant subspaces. (Indeed, let $l$ be the maximal number of summands in a decomposition

$$\mathcal{M} = \mathcal{M}_1 \dotplus \cdots \dotplus \mathcal{M}_l \tag{10.1.4}$$

of $\mathcal{M}$ into a direct sum of nonzero $A$-invariant subspaces $\mathcal{M}_i$; then from the choice of $l$ it follows that each $\mathcal{M}_i$ in (10.1.4) is irreducible.) To describe the $A$-invariant subspaces, therefore, it is sufficient to describe all the irreducible subspaces.

It follows from Theorem 2.5.1 that an $A$-invariant subspace $\mathcal{L}$ is irreducible if and only if the Jordan form of $A|_{\mathcal{L}}$ consists of one Jordan block only. In other words, $\mathcal{L}$ is irreducible if and only if there exist a basis $x_1, \dots, x_p$ in $\mathcal{L}$ and a complex number $\lambda$ such that

$$(A - \lambda I)x_1 = 0 , \qquad (A - \lambda I)x_{j+1} = x_j \quad (j = 1, \dots, p-1) \tag{10.1.5}$$

that is, the system $\{x_j\}_{j=1}^{p}$ is a Jordan basis in $\mathcal{L}$. Consequently, every irreducible subspace is contained in some root subspace. (One can see this also from Theorem 2.1.5.) Thus it is sufficient to describe all the irreducible subspaces contained in a fixed root subspace corresponding to an eigenvalue $\lambda$. Without loss of generality, we assume that $\lambda = 0$. (Otherwise, replace $A$ by $B = A - \lambda I$ and observe that the transformations $A$ and $B$ have the same invariant subspaces.)

The root subspace $\mathcal{R}_0(A)$ is decomposed into a direct sum of Jordan subspaces:

$$\mathcal{R}_0(A) = \mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_m \tag{10.1.6}$$

The description of the Jordan subspaces contained in $\mathcal{R}_0(A)$ is given according to the number $m$ of irreducible subspaces in the decomposition (10.1.6). If $\mathcal{R}_0(A)$ is an irreducible subspace [i.e., $m = 1$ in (10.1.6)] and the vectors $\{x_i\}_{i=1}^{p}$ form a Jordan basis in $\mathcal{R}_0(A)$, then $\operatorname{Span}\{x_1, \dots, x_j\}$, $j = 1, \dots, p$, are all the $A$-invariant subspaces in $\mathcal{R}_0(A)$, and all of them are irreducible subspaces.

Consider now the case $m = 2$ in (10.1.6). We use the following notation: if $\{z_i\}_{i=1}^{p}$ is a system of vectors $z_i \in \mathbb{C}^n$, denote by $z^{(j)}$ the column formed by these vectors, as follows. If $j \le p$, then

$$z^{(j)} = \begin{bmatrix} z_j \\ \vdots \\ z_1 \end{bmatrix}$$
If $j > p$, then

$$z^{(j)} = \begin{bmatrix} z_p \\ z_{p-1} \\ \vdots \\ z_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

with $j - p$ zero vectors at the bottom.

Let $g_1, \dots, g_p \in \mathcal{L}_1$ and $f_1, \dots, f_q \in \mathcal{L}_2$ be Jordan bases in $\mathcal{L}_1$ and $\mathcal{L}_2$, respectively. Without loss of generality, suppose that $p \ge q$. It is known that in any irreducible subspace $\mathcal{L}$ ($\ne 0$) of $A$ there exists only one eigenvector (up to multiplication by a nonzero scalar). We describe first all the irreducible subspaces that contain the eigenvector $g_1$ [and thus are contained in $\mathcal{R}_0(A)$]. In the following proposition $j$ is a fixed integer, $1 \le j \le p$.

Proposition 10.1.2

Let $T^{(v)}$, where $v = \min(j, q)$, be an upper triangular matrix of size $j \times j$ whose diagonal elements are zeros and whose block formed by the first $v$ rows and first $v$ columns is a Toeplitz matrix:

$$T^{(v)} = \begin{bmatrix}
0 & \alpha_1 & \alpha_2 & \cdots & \alpha_{v-1} & \beta_{1,v+1} & \cdots & \beta_{1,j} \\
0 & 0 & \alpha_1 & \cdots & \alpha_{v-2} & \beta_{2,v+1} & \cdots & \beta_{2,j} \\
\vdots & & \ddots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 0 & \beta_{v,v+1} & \cdots & \beta_{v,j} \\
\vdots & & & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{bmatrix} \tag{10.1.7}$$

Then the components of the column

$$z^{(j)} = g^{(j)} + T^{(v)} f^{(j)} \tag{10.1.8}$$

form a Jordan basis of some $j$-dimensional $A$-invariant irreducible subspace that contains $g_1$. Conversely, every irreducible subspace of $A$ of dimension $j$ that contains $g_1$ has a Jordan basis given by the components of (10.1.8), where $T^{(v)}$ is some matrix of type (10.1.7).

The multiplication in $T^{(v)} f^{(j)}$ is performed componentwise: for complex numbers $x_{rs}$ and $n$-dimensional vectors $z_1, \dots, z_j$ we define

$$\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1j} \\ \vdots & & & \vdots \\ x_{k1} & x_{k2} & \cdots & x_{kj} \end{bmatrix} \begin{bmatrix} z_j \\ z_{j-1} \\ \vdots \\ z_1 \end{bmatrix} = \begin{bmatrix} x_{11} z_j + x_{12} z_{j-1} + \cdots + x_{1j} z_1 \\ \vdots \\ x_{k1} z_j + x_{k2} z_{j-1} + \cdots + x_{kj} z_1 \end{bmatrix}$$
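To make the construction concrete, here is a small numerical sketch (not part of the original text) for $A = J_4(0) \oplus J_3(0)$, so that $p = 4$ and $q = 3$, with $j = 3$ and hence $v = \min(j, q) = 3$; the parameters $\alpha_1, \alpha_2$ below are arbitrarily chosen entries of a matrix of type (10.1.7), and the chain relations (10.1.5) are verified directly:

```python
import numpy as np

def jordan_block(n):
    """Nilpotent Jordan block J_n(0): A e_1 = 0, A e_{i+1} = e_i."""
    J = np.zeros((n, n))
    J[np.arange(n - 1), np.arange(1, n)] = 1.0
    return J

p, q, j = 4, 3, 3
A = np.block([[jordan_block(p), np.zeros((p, q))],
              [np.zeros((q, p)), jordan_block(q)]])

E = np.eye(p + q)
g = [E[:, i] for i in range(p)]          # Jordan basis of L_1
f = [E[:, p + i] for i in range(q)]      # Jordan basis of L_2

# With v = j = 3, a matrix of type (10.1.7) is strictly upper triangular
# Toeplitz; its parameters alpha_1, alpha_2 may be chosen arbitrarily.
a1, a2 = 2.0, -1.0
# Components of z^(j) = g^(j) + T^(v) f^(j), written out for j = 3:
z1 = g[0]
z2 = g[1] + a1 * f[0]
z3 = g[2] + a1 * f[1] + a2 * f[0]

# The chain relations (10.1.5) with lambda = 0:
assert np.allclose(A @ z1, 0)
assert np.allclose(A @ z2, z1)
assert np.allclose(A @ z3, z2)
# Span{z1, z2, z3} is therefore a 3-dimensional irreducible A-invariant
# subspace containing the eigenvector g_1.
```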
Note also that the dimension of every irreducible subspace of $A$ contained in $\mathcal{R}_0(A)$ does not exceed $p$ [recall that $m = 2$ in (10.1.6) and that $\dim \mathcal{L}_1 = p \ge \dim \mathcal{L}_2 \ge 1$]; so Proposition 10.1.2 does indeed give the description of all irreducible subspaces that contain $g_1$.

Proof. First observe that if $\mathcal{L}$ is an irreducible subspace and $g_1 \in \mathcal{L}$, then

$$\mathcal{L} \cap \mathcal{L}_2 = \{0\} \tag{10.1.9}$$

Indeed, if $y \in \mathcal{L} \cap \mathcal{L}_2 \smallsetminus \{0\}$, then for some $i$ ($0 \le i \le p-1$) and some complex number $\gamma \ne 0$ the equality $A^i y = \gamma f_1$ holds. So $f_1 \in \mathcal{L} \cap \mathcal{L}_2 \subseteq \mathcal{L}$, and since also $g_1 \in \mathcal{L}$, the irreducible subspace $\mathcal{L}$ contains two linearly independent eigenvectors $f_1$ and $g_1$, which is impossible. From (10.1.9) and the inclusion $\mathcal{L} + \mathcal{L}_2 \subseteq \mathcal{R}_0(A) = \mathcal{L}_1 \dotplus \mathcal{L}_2$ it follows that $\dim \mathcal{L} \le \dim \mathcal{L}_1 = p$.

Now let $\mathcal{L}$ be an irreducible subspace containing $g_1$, with a Jordan basis $y_1, \dots, y_j$; so $y_1 = \alpha_0 g_1$ and $A y_{k+1} = y_k$ ($k = 1, \dots, j-1$). We look for the vectors $y_2, \dots, y_j$ in the form of linear combinations of $g_1, \dots, g_p, f_1, \dots, f_q$. Two possibilities can occur: (1) $j \le q$; (2) $q + 1 \le j \le p$.

Consider first the case $j \le q$. The condition $A y_2 = y_1$ implies that $y_2 = \alpha_0 g_2 + \alpha_1 g_1 + \beta_1 f_1$. The condition $A y_3 = y_2$ implies that $y_3 = \alpha_0 g_3 + \alpha_1 g_2 + \alpha_2 g_1 + \beta_1 f_2 + \beta_2 f_1$. Continuing these arguments, we obtain

$$\begin{aligned}
y_j &= \alpha_0 g_j + \alpha_1 g_{j-1} + \alpha_2 g_{j-2} + \cdots + \alpha_{j-2} g_2 + \alpha_{j-1} g_1 + \beta_1 f_{j-1} + \beta_2 f_{j-2} + \cdots + \beta_{j-2} f_2 + \beta_{j-1} f_1 \\
y_{j-1} &= \alpha_0 g_{j-1} + \alpha_1 g_{j-2} + \alpha_2 g_{j-3} + \cdots + \alpha_{j-2} g_1 + \beta_1 f_{j-2} + \beta_2 f_{j-3} + \cdots + \beta_{j-2} f_1 \\
&\ \,\vdots \\
y_3 &= \alpha_0 g_3 + \alpha_1 g_2 + \alpha_2 g_1 + \beta_1 f_2 + \beta_2 f_1 \\
y_2 &= \alpha_0 g_2 + \alpha_1 g_1 + \beta_1 f_1 \\
y_1 &= \alpha_0 g_1
\end{aligned} \tag{10.1.10}$$

where $\alpha_1, \dots, \alpha_{j-1}, \beta_1, \dots, \beta_{j-1}$ are some numbers. In the case $q + 1 \le j \le p$ one finds analogously

$$\begin{aligned}
y_j &= \alpha_0 g_j + \alpha_1 g_{j-1} + \alpha_2 g_{j-2} + \cdots + \alpha_{j-2} g_2 + \alpha_{j-1} g_1 + \beta_1 f_q + \beta_2 f_{q-1} + \cdots + \beta_{q-1} f_2 + \beta_q f_1 \\
y_{j-1} &= \alpha_0 g_{j-1} + \alpha_1 g_{j-2} + \alpha_2 g_{j-3} + \cdots + \alpha_{j-2} g_1 + \beta_1 f_{q-1} + \beta_2 f_{q-2} + \cdots + \beta_{q-1} f_1 \\
&\ \,\vdots \\
y_3 &= \alpha_0 g_3 + \alpha_1 g_2 + \alpha_2 g_1 \\
y_2 &= \alpha_0 g_2 + \alpha_1 g_1 \\
y_1 &= \alpha_0 g_1
\end{aligned} \tag{10.1.11}$$

where $\alpha_1, \dots, \alpha_{j-1}, \beta_1, \dots, \beta_q$ are some complex numbers.
Formulas (10.1.10) and (10.1.11) can be written in the form

$$y^{(j)} = C g^{(j)} + S^{(v)} f^{(j)}$$

where $C$ and $S^{(v)}$ are $j \times j$ matrices, and $C$ is an upper triangular invertible Toeplitz matrix (invertible because its diagonal element is $\alpha_0 \ne 0$). By Proposition 10.1.1, $C^{-1}$ is also an upper triangular Toeplitz matrix. It is easy to see that the matrix $C^{-1} S^{(v)}$ has the form $T^{(v)}$ [see (10.1.7)]: $T^{(v)} = C^{-1} S^{(v)}$. Put $z^{(j)} = C^{-1} y^{(j)} = g^{(j)} + T^{(v)} f^{(j)}$. It is easy to see that $\operatorname{Span}\{y_i\}_{i=1}^{j} = \operatorname{Span}\{z_i\}_{i=1}^{j}$ and that the vectors $z_1, \dots, z_j$ satisfy (10.1.5). So the components of $z^{(j)}$ form a Jordan basis in $\mathcal{L}$. □

Now let $x_1$ (not collinear with $g_1$) be an arbitrary eigenvector of $A$ contained in $\mathcal{R}_0(A) = \mathcal{L}_1 \dotplus \mathcal{L}_2$. Evidently, $x_1 = \xi g_1 + \eta f_1$ ($\eta \ne 0$). Consider the system of vectors $x_i = \xi g_i + \eta f_i$, $i = 1, \dots, q$. Clearly, the vectors $x_1, \dots, x_q$ satisfy the condition (10.1.5); therefore, they form a Jordan basis of some irreducible subspace $\tilde{\mathcal{L}} \subseteq \mathcal{R}_0(A)$. It can easily be verified that $\mathcal{L}_1 \dotplus \tilde{\mathcal{L}} = \mathcal{R}_0(A)$. Hence $\dim \tilde{\mathcal{L}} = q$. By Proposition 10.1.2, for every irreducible subspace $\mathcal{L}$ containing the vector $x_1$ (the dimension $j$ of $\mathcal{L}$ is necessarily not larger than $q$) there exists a matrix $T^{(j)}$ of the form (10.1.7) such that the components of the column

$$v^{(j)} = x^{(j)} + T^{(j)} g^{(j)}$$

form a Jordan basis in $\mathcal{L}$. Conversely, for every matrix $T^{(j)}$ of size $j \times j$ of the form (10.1.7), the components of the column $v^{(j)}$ form a Jordan basis in some irreducible subspace of $A$. Thus a complete description of the irreducible subspaces contained in the root subspace $\mathcal{R}_0(A) = \mathcal{L}_1 \dotplus \mathcal{L}_2$ is obtained.

This description for the case $m = 2$ in the decomposition (10.1.6) can be generalized to arbitrary $m$. This is the content of the following theorem.

Theorem 10.1.3

Let $\mathcal{R}_{\lambda_0}(A) = \mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_m$ be a decomposition of the root subspace $\mathcal{R}_{\lambda_0}(A)$ of the transformation $A\colon \mathbb{C}^n \to \mathbb{C}^n$ into a direct sum of irreducible subspaces $\mathcal{L}_1, \dots, \mathcal{L}_m$. Let

$$g_1, \dots, g_{p_1} \in \mathcal{L}_1 ; \quad \dots ; \quad f_1, \dots, f_{p_r} \in \mathcal{L}_r ; \quad \dots ; \quad h_1, \dots, h_{p_m} \in \mathcal{L}_m$$

be Jordan bases in $\mathcal{L}_1, \dots, \mathcal{L}_r, \dots, \mathcal{L}_m$, respectively ($p_1 \ge \cdots \ge p_m$). Let $j$ be an integer such that $1 \le j \le p_r = \dim \mathcal{L}_r$. For every $i = 1, \dots, m$ let $v_i = \min(j, p_i)$. Then for every set of matrices $T_1^{(v_1)}, \dots, T_m^{(v_m)}$ of the form (10.1.7) and of size $j \times j$, the components of the column

$$\zeta^{(j)} = T_1^{(v_1)} g^{(j)} + \cdots + T_{r-1}^{(v_{r-1})} u^{(j)} + f^{(j)} + T_{r+1}^{(v_{r+1})} v^{(j)} + \cdots + T_m^{(v_m)} h^{(j)} \tag{10.1.12}$$
form a Jordan basis in some irreducible subspace of $A$ that contains the vector $f_1$ (here $u_1, \dots, u_{p_{r-1}} \in \mathcal{L}_{r-1}$ and $v_1, \dots, v_{p_{r+1}} \in \mathcal{L}_{r+1}$ are Jordan bases in $\mathcal{L}_{r-1}$ and $\mathcal{L}_{r+1}$, respectively). Conversely, for every irreducible subspace $\mathcal{L}$ of dimension $j$ such that $f_1 \in \mathcal{L}$ there exist matrices $T_1^{(v_1)}, \dots, T_m^{(v_m)}$ such that the components of the column (10.1.12) form a Jordan basis in $\mathcal{L}$.

Proof. Use induction on the number $m$ of subspaces in the decomposition (10.1.6). For $m = 2$ this theorem coincides with Proposition 10.1.2. Suppose that the theorem holds for $m \le k - 1$, and assume that $\mathcal{R}_{\lambda_0}(A) = \mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_k$. If $\mathcal{L}$ is an irreducible subspace such that $f_1 \in \mathcal{L}$, then

$$\mathcal{L} \cap \hat{\mathcal{L}}_r = \{0\} \tag{10.1.13}$$

where $\hat{\mathcal{L}}_r = \mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_{r-1} \dotplus \mathcal{L}_{r+1} \dotplus \cdots \dotplus \mathcal{L}_k$. Indeed, for every $y \in \mathcal{L} \smallsetminus \{0\}$ there exist a nonnegative integer $i$ and a complex number $\gamma \ne 0$ such that $A^i y = \gamma f_1$. If, in addition, $y \in \hat{\mathcal{L}}_r$, then $\gamma f_1 = A^i y \in \hat{\mathcal{L}}_r$, which contradicts the directness of the decomposition $\mathcal{R}_{\lambda_0}(A) = \mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_k$. From (10.1.13) and the inclusion $\mathcal{L} \subseteq \mathcal{R}_{\lambda_0}(A) = \mathcal{L}_r \dotplus \hat{\mathcal{L}}_r$ we deduce that $\dim \mathcal{L} \le \dim \mathcal{L}_r$.

Assume that $r < k$. (The case $r = k$ can be considered in a similar way.) If $\mathcal{L} \subseteq \mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_{k-1}$, then by the induction hypothesis the components of a column of the form (10.1.12) form a Jordan basis in $\mathcal{L}$. If $\mathcal{L} \not\subseteq \mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_{k-1}$, consider the subspace $\mathcal{L}' = (\mathcal{L} + \mathcal{L}_k) \cap (\mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_{k-1})$. Since $\mathcal{L} \cap \mathcal{L}_k = \{0\}$, the equality

$$\dim \mathcal{L}' = \dim(\mathcal{L} + \mathcal{L}_k) + \dim(\mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_{k-1}) - \dim(\mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_k) = \dim \mathcal{L}$$

holds. Evidently, $\mathcal{L}'$ is $A$ invariant. Let us show that $\mathcal{L}'$ is an irreducible subspace. Suppose the contrary; then there exists an eigenvector $g \in \mathcal{L}'$ of $A$ that is not a scalar multiple of $f_1$. Since $\mathcal{L}' \subseteq \mathcal{L} + \mathcal{L}_k$, the vector $g$ is a linear combination of the eigenvectors $f_1$ and $h_1$, where $h_1 \in \mathcal{L}_k$. But then $h_1 \in \mathcal{L}' \subseteq \mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_{k-1}$, which means that the sum $(\mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_{k-1}) + \mathcal{L}_k$ is not direct, and this contradicts our assumptions. So $\mathcal{L}'$ is an irreducible subspace. Since $\mathcal{L}' \subseteq \mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_{k-1}$, by the induction hypothesis the components of the column $\zeta^{(j)}$ form a Jordan basis in $\mathcal{L}'$ for some $T_1^{(v_1)}, \dots, T_{k-1}^{(v_{k-1})}$. The definition of $\mathcal{L}'$ implies the inclusion $\mathcal{L} \subseteq \mathcal{L}' + \mathcal{L}_k$. As has been proved above, there exists a matrix $T_k^{(v_k)}$ such that the components of the column

$$y^{(j)} = \zeta^{(j)} + T_k^{(v_k)} h^{(j)} = T_1^{(v_1)} g^{(j)} + \cdots + f^{(j)} + \cdots + T_k^{(v_k)} h^{(j)}$$

form a Jordan basis in $\mathcal{L}$. □

Theorem 10.1.3 also gives a description of all irreducible subspaces of $A$ that contain an arbitrarily given eigenvector of $A$ from the root subspace $\mathcal{R}_{\lambda_0}(A)$. Indeed, let $x_1 \in \mathcal{R}_{\lambda_0}(A)$ be an eigenvector, and let $r$ be the minimal integer such that $x_1 \in \mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_r$. Then $x_1 = a_1 g_1 + \cdots + a_r f_1$, where $a_1, \dots, a_r \in \mathbb{C}$ and $a_r \ne 0$. Consider the system of vectors $x_i = a_1 g_i + \cdots + a_r f_i$, $i = 1, \dots, p_r$. Evidently, $x_1, \dots, x_{p_r}$ satisfy the condition (10.1.5).
Therefore, their linear span $\tilde{\mathcal{L}}_r = \operatorname{Span}\{x_1, \dots, x_{p_r}\}$ is an irreducible subspace. It is easily seen that

$$\tilde{\mathcal{L}}_r \cap (\mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_{r-1} \dotplus \mathcal{L}_{r+1} \dotplus \cdots \dotplus \mathcal{L}_k) = \{0\}$$

So in the representation (10.1.6) one can replace $\mathcal{L}_r$ by $\tilde{\mathcal{L}}_r$. Then, in view of Theorem 10.1.3, the components of the columns of the form (10.1.12) describe all the irreducible subspaces of $A$ that contain the vector $x_1$ [in (10.1.12), write $x^{(j)}$ in place of $f^{(j)}$].

Observe that every irreducible subspace contains an eigenvector of $A$. So the description in the preceding paragraph gives all the irreducible subspaces of $A$ (as the vector $x_1$ is varied).

10.2 TRANSFORMATIONS HAVING THE SAME SET OF INVARIANT SUBSPACES

Consider a transformation $A\colon \mathbb{C}^n \to \mathbb{C}^n$. In this section we describe the class of all transformations $B\colon \mathbb{C}^n \to \mathbb{C}^n$ such that $\operatorname{Inv}(A) = \operatorname{Inv}(B)$. A relatively simple case of this situation has already been pointed out in Theorem 2.11.3 (when one transformation is a polynomial in the other). Surprisingly enough, it turns out that the set of transformations $B$ such that $\operatorname{Inv}(B) = \operatorname{Inv}(A)$ does not generally consist only of the transformations $f(A)$, where $f(\lambda)$ is a polynomial with the properties indicated in Theorem 2.11.3. It can even happen that noncommuting transformations have the same set of invariant subspaces.

Before we embark on the statement and proof of the main theorem describing the transformations with the same set of invariant subspaces (which is quite complicated), let us study some examples.

EXAMPLE 10.2.1. Let $A$ be the $n \times n$ Jordan block $J_n(\lambda_0)$. The invariant subspaces of $A$ are $\mathcal{L}_j = \operatorname{Span}\{e_1, \dots, e_j\}$, $j = 0, \dots, n$ (by definition, $\mathcal{L}_0 = \{0\}$). Let us find all transformations $B\colon \mathbb{C}^n \to \mathbb{C}^n$ for which $\operatorname{Inv}(A) = \operatorname{Inv}(B)$. It turns out that $\operatorname{Inv}(A) = \operatorname{Inv}(B)$ if and only if (in the basis $e_1, \dots, e_n$) $B$ has the form

$$B = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{bmatrix} \tag{10.2.1}$$

where

$$a_{11} = \cdots = a_{nn} \qquad \text{and} \qquad a_{12}\, a_{23} \cdots a_{n-1,n} \ne 0 \tag{10.2.2}$$
Indeed, suppose $\operatorname{Inv}(B) = \operatorname{Inv}(A)$. Then clearly the matrix representing $B$ has the triangular form (10.2.1). Moreover, it is easy to see that $a_{11} = \cdots = a_{nn}$. Indeed, the numbers $a_{11}, \dots, a_{nn}$ are the eigenvalues of $B$; if they are not all equal, then there exists a pair of nonzero complementary invariant subspaces of $B$, namely, the root subspaces corresponding to a pair of complementary nonempty subsets of $\sigma(B)$. But the existence of a pair of nonzero complementary invariant subspaces contradicts the assumption that $\operatorname{Inv}(B) = \operatorname{Inv}(A)$.

Let us show that $a_{12}\, a_{23} \cdots a_{n-1,n} \ne 0$. Consider the transformation $C = B - a_{11} I$, which has the same invariant subspaces as $B$. If $a_{j,j+1} = 0$ for some $j$ ($1 \le j \le n-1$), then $C\mathcal{L}_{j+1} \subseteq \mathcal{L}_{j-1}$. Hence

$$\dim \operatorname{Ker} C \ge 2 \tag{10.2.3}$$

Since any nonzero vector in $\operatorname{Ker} C$ spans a one-dimensional $C$-invariant subspace, inequality (10.2.3) contradicts the assumption $\operatorname{Inv}(B) = \operatorname{Inv}(A)$ again.

Conversely, suppose that $B$ satisfies (10.2.1) and (10.2.2). Put $C = B - a_{11} I$. We show that $\operatorname{Ker} C = \mathcal{L}_1$. Let $x = \sum_{j=1}^{n} \xi_j e_j \in \operatorname{Ker} C$ with $x \ne 0$, and let $p$ be such that $\xi_{p+1} = \cdots = \xi_n = 0$ and $\xi_p \ne 0$. Then $p = 1$. Indeed, if $p$ were greater than 1, then $Cx = \xi_p a_{p-1,p}\, e_{p-1} + (\text{terms involving } e_1, \dots, e_{p-2}) \ne 0$. So $x = \xi_1 e_1$; that is, $\operatorname{Ker} C = \mathcal{L}_1$. This means that any two eigenvectors of $B$ are collinear. Appeal to Theorem 2.5.1 [(d) $\Leftrightarrow$ (e)] and deduce that for any two $B$-invariant subspaces $\mathcal{M}_1$ and $\mathcal{M}_2$, either $\mathcal{M}_1 \subseteq \mathcal{M}_2$ or $\mathcal{M}_2 \subseteq \mathcal{M}_1$. Since $\mathcal{L}_0, \mathcal{L}_1, \dots, \mathcal{L}_n$ are $B$ invariant and $\dim \mathcal{L}_j = j$ ($j = 0, \dots, n$), it follows that any $B$-invariant subspace coincides with one of the $\mathcal{L}_j$. □

Example 10.2.1 provides a situation when $\operatorname{Inv}(A) = \operatorname{Inv}(B)$ but $A$ and $B$ do not commute [take $A = J_n(\lambda_0)$, $n \ge 3$, and $B$ as in (10.2.1) with distinct nonzero numbers $a_{j,j+1}$, $j = 1, \dots, n-1$]. If $A$ has more than one Jordan block, the situation may be completely different from Example 10.2.1.

EXAMPLE 10.2.2. Let

$$A = J_3(0) \oplus J_2(0)$$

It turns out that $\operatorname{Inv}(B) = \operatorname{Inv}(A)$ if and only if $B$ is a polynomial in $A$, $B = p(A)$, such that $p'(0) \ne 0$. In other words, $B$ has the form

$$B = \begin{bmatrix} a & b & c \\ 0 & a & b \\ 0 & 0 & a \end{bmatrix} \oplus \begin{bmatrix} a & b \\ 0 & a \end{bmatrix} \tag{10.2.4}$$

for some $a, b, c \in \mathbb{C}$ with $b \ne 0$. As, by Theorem 2.11.3, $\operatorname{Inv}(B) = \operatorname{Inv}(A)$ for every $B$ of the form (10.2.4)
with $b \ne 0$, we must verify only that every $B\colon \mathbb{C}^5 \to \mathbb{C}^5$ such that $\operatorname{Inv}(B) = \operatorname{Inv}(A)$ has the form (10.2.4) with $b \ne 0$ (in the basis $e_1, e_2, e_3, e_4, e_5$).

So assume $\operatorname{Inv}(B) = \operatorname{Inv}(A)$. Then clearly $B$ has upper triangular form, and (see the argument in Example 10.2.1) the elements on the main diagonal of $B$ are all equal. Without loss of generality, we can assume that the main diagonal of $B$ is zero:

$$B = \begin{bmatrix} 0 & a_{12} & a_{13} & a_{14} & a_{15} \\ 0 & 0 & a_{23} & a_{24} & a_{25} \\ 0 & 0 & 0 & a_{34} & a_{35} \\ 0 & 0 & 0 & 0 & a_{45} \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

As $\operatorname{Span}\{e_4, e_5\}$ is $A$ invariant and hence belongs to $\operatorname{Inv}(B)$, we have $a_{14} = a_{15} = a_{24} = a_{25} = a_{34} = a_{35} = 0$. If one of the numbers $a_{12}$, $a_{23}$, or $a_{45}$ were zero, then $B$ would have three one-dimensional invariant subspaces whose sum is direct. This contradicts the assumption $\operatorname{Inv}(B) = \operatorname{Inv}(A)$ ($A$ cannot have more than two one-dimensional invariant subspaces whose sum is direct). Hence $a_{12}$, $a_{23}$, and $a_{45}$ are different from zero. It remains to show that $a_{12} = a_{23} = a_{45}$. To this end, observe that $\operatorname{Span}\{e_1 + e_4, e_2 + e_5\}$ is $A$ invariant and hence $B$ invariant. So

$$B(e_2 + e_5) = a_{12} e_1 + a_{45} e_4 \in \operatorname{Span}\{e_1 + e_4, e_2 + e_5\}$$

which implies $a_{12} = a_{45}$. A similar analysis of the $B$-invariant subspace $\operatorname{Span}\{e_1, e_2 + e_4, e_3 + e_5\}$ leads to the conclusion that $a_{23} = a_{45}$. □

Now we state the main theorem, which describes all transformations $B\colon \mathbb{C}^n \to \mathbb{C}^n$ with $\operatorname{Inv}(B) = \operatorname{Inv}(A)$, where the transformation $A\colon \mathbb{C}^n \to \mathbb{C}^n$ is given. This description contains the results of Examples 10.2.1 and 10.2.2 as very special cases. Note that without loss of generality we can assume (and we do) that $A$ is an $n \times n$ matrix in Jordan form

$$A = \operatorname{diag}[A_1, A_2, \dots, A_k]$$

where $\sigma(A_j) = \{\lambda_j\}$, $\lambda_1, \dots, \lambda_k$ are all the different eigenvalues of $A$, and

$$A_j = \operatorname{diag}[J_{p_1}(\lambda_j), \dots, J_{p_m}(\lambda_j)]$$

where $p_1 \ge \cdots \ge p_m$. Of course, the number $m$, as well as $p_1, \dots, p_m$, depends on $j$; we suppress this dependence in the notation for the sake of clarity. The notation for upper triangular Toeplitz matrices will be abbreviated to the form
$$T_j(a_0, \dots, a_{j-1}) = \begin{bmatrix} a_0 & a_1 & \cdots & a_{j-1} \\ 0 & a_0 & \cdots & a_{j-2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_0 \end{bmatrix}$$

Finally, we use the notation

$$U_q(a_0, \dots, a_{p-1}; F) = \begin{bmatrix}
a_0 & a_1 & \cdots & a_{p-1} & f_{11} & f_{12} & \cdots & f_{1,q-p} \\
0 & a_0 & \cdots & a_{p-2} & a_{p-1} & f_{22} & \cdots & f_{2,q-p} \\
\vdots & & \ddots & & \ddots & \ddots & \ddots & \vdots \\
 & & & & & & & f_{q-p,q-p} \\
 & & & & & & & a_{p-1} \\
\vdots & & & & & & \ddots & \vdots \\
0 & 0 & \cdots & & & & 0 & a_0
\end{bmatrix}$$

where $F$ is the $(q-p) \times (q-p)$ upper triangular matrix whose $(i, j)$ entry is $f_{ij}$ ($i \le j$). It is assumed, of course, that $p \le q$. In other words, $U_q(a_0, \dots, a_{p-1}; F)$ is a $q \times q$ matrix whose first $p$ superdiagonals (starting from the main diagonal) have the structure of a Toeplitz matrix, whereas the next $q - p$ superdiagonals contain the upper triangular part of the matrix $F$, which is not necessarily Toeplitz. If $p = q$, then $F$ is empty and $U_q(a_0, \dots, a_{p-1}; F) = T_p(a_0, \dots, a_{p-1})$.

Theorem 10.2.1

If $\operatorname{Inv}(B) = \operatorname{Inv}(A)$ for a transformation $B\colon \mathbb{C}^n \to \mathbb{C}^n$, then

$$B = \operatorname{diag}[B_1, \dots, B_k] \tag{10.2.5}$$

(in a chosen Jordan basis for $A$), where each block $B_j = B|_{\mathcal{R}_{\lambda_j}(A)}$ ($j = 1, \dots, k$) has the form

$$B_j = U_{p_1}(\mu_j, b_2, \dots, b_{p_2}; F) \oplus T_{p_2}(\mu_j, b_2, \dots, b_{p_2}) \oplus \cdots \oplus T_{p_m}(\mu_j, b_2, \dots, b_{p_m}) \tag{10.2.6}$$

for some complex numbers $\mu_1, \dots, \mu_k, b_2, \dots, b_{p_2}$ with $\mu_i \ne \mu_j$ ($i \ne j$), $b_2 \ne 0$, and an upper triangular matrix $F$ of size $(p_1 - p_2) \times (p_1 - p_2)$; the numbers $b_2, \dots, b_{p_2}$, as well as the matrix $F$, depend on $j$. Conversely, if $B$ has the form (10.2.5), (10.2.6) and $\mu_i$, $b_i$, and $F$ have the above properties, then $\operatorname{Inv}(B) = \operatorname{Inv}(A)$.
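To make the notation concrete, the following NumPy sketch (not part of the original text) assembles the matrices $T_j$ and $U_q(a_0, \dots, a_{p-1}; F)$, builds a transformation $B$ of the form (10.2.6) for $A = J_3(0) \oplus J_2(0)$ (so $p_1 = 3$, $p_2 = 2$, and $F$ is $1 \times 1$), and spot-checks that $B$ leaves invariant two of the $A$-invariant subspaces used in Example 10.2.2; the placement of the entries of $F$ along the outer superdiagonals follows the verbal description above:

```python
import numpy as np

def T(coeffs):
    """Upper triangular Toeplitz matrix T_j(a_0, ..., a_{j-1})."""
    j = len(coeffs)
    return sum(a * np.eye(j, k=d) for d, a in enumerate(coeffs))

def U(q, coeffs, F):
    """U_q(a_0, ..., a_{p-1}; F): the first p superdiagonals are Toeplitz,
    the remaining superdiagonals carry the upper triangular part of F."""
    p = len(coeffs)
    M = sum(a * np.eye(q, k=d) for d, a in enumerate(coeffs))
    for r in range(q - p):
        for c in range(p + r, q):
            M[r, c] = F[r, c - p]
    return M

mu, b2, c = 0.7, 2.0, -3.0
B = np.block([[U(3, [mu, b2], np.array([[c]])), np.zeros((3, 2))],
              [np.zeros((2, 3)),                T([mu, b2])]])

e = np.eye(5)
# Two A-invariant subspaces appearing in Example 10.2.2:
S1 = np.column_stack([e[:, 0] + e[:, 3], e[:, 1] + e[:, 4]])
S2 = np.column_stack([e[:, 0], e[:, 1] + e[:, 3], e[:, 2] + e[:, 4]])
for S in (S1, S2):
    # B S must lie in the column span of S.
    assert np.linalg.matrix_rank(np.column_stack([S, B @ S])) \
           == np.linalg.matrix_rank(S)
```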
We relegate the lengthy proof of this theorem to the next section. The proof is based on the description of irreducible subspaces obtained in Section 10.1. We conclude this section with two corollaries of Theorem 10.2.1.

Corollary 10.2.2

Suppose that $AB = BA$. Then $\operatorname{Inv}(A) = \operatorname{Inv}(B)$ if and only if $B = f(A)$, where $f(\lambda)$ is a polynomial such that $f(\lambda_i) \ne f(\lambda_j)$ for eigenvalues $\lambda_i \ne \lambda_j$ of $A$, and $f'(\lambda_0) \ne 0$ whenever $\lambda_0 \in \sigma(A)$ and $\operatorname{Ker}(A - \lambda_0 I) \ne \mathcal{R}_{\lambda_0}(A)$.

In other words, the conditions of Theorem 2.11.3 are not only sufficient but also necessary, provided $A$ and $B$ commute.

Proof. In view of Theorem 2.11.3 it is necessary to prove merely the "only if" statement. So assume $\operatorname{Inv}(A) = \operatorname{Inv}(B)$. Let $\lambda_1, \dots, \lambda_k$ be the different eigenvalues of $A$, and let

$$\mathcal{R}_{\lambda_j}(A) = \mathcal{L}_{j1} \dotplus \cdots \dotplus \mathcal{L}_{j,m_j}$$

be the decomposition of $\mathcal{R}_{\lambda_j}(A)$ into a direct sum of Jordan subspaces $\mathcal{L}_{j1}, \dots, \mathcal{L}_{j,m_j}$ such that $\dim \mathcal{L}_{j1} \ge \cdots \ge \dim \mathcal{L}_{j,m_j}$. The restrictions $A|_{\mathcal{L}_{j1}}$ and $B|_{\mathcal{L}_{j1}}$ commute; so, in view of Theorem 9.1.1 (observing that $A|_{\mathcal{L}_{j1}}$ has only one Jordan block), there exists a polynomial $p_j(\lambda)$ such that $B|_{\mathcal{L}_{j1}} = p_j(A|_{\mathcal{L}_{j1}})$. It follows now from Theorem 10.2.1 that $B|_{\mathcal{R}_{\lambda_j}(A)} = p_j(A|_{\mathcal{R}_{\lambda_j}(A)})$. Since the minimal polynomials of $A|_{\mathcal{R}_{\lambda_j}(A)}$, $j = 1, \dots, k$, are relatively prime, there exists a polynomial $p(\lambda)$ such that $B = p(A)$. Indeed, let $p(\lambda)$ be an interpolating polynomial such that

$$p(\lambda_j) = p_j(\lambda_j) ; \quad p'(\lambda_j) = p_j'(\lambda_j) ; \quad \dots ; \quad p^{(k_j - 1)}(\lambda_j) = p_j^{(k_j - 1)}(\lambda_j) , \qquad j = 1, \dots, k$$

where $k_j = \dim \mathcal{L}_{j1}$ and $q^{(a)}(\lambda_0)$ denotes the $a$th derivative of the polynomial $q(\lambda)$ evaluated at $\lambda_0$. [See Gantmacher (1959) and Lancaster and Tismenetsky (1985), for example, for information on interpolating polynomials; see also Section 2.10.] From the definition of a function of the matrix $A$ (see Section 2.10) it follows that $B|_{\mathcal{R}_{\lambda_j}(A)} = p(A|_{\mathcal{R}_{\lambda_j}(A)})$ for $j = 1, \dots, k$ and, consequently, $B = p(A)$. Using Theorem 10.2.1 once more, we deduce that $p(\lambda_i) \ne p(\lambda_j)$ for $i \ne j$ and $p'(\lambda_i) \ne 0$ for $i = 1, \dots, k$. □

Corollary 10.2.3

Let $A\colon \mathbb{C}^n \to \mathbb{C}^n$ be a transformation. Then every transformation $B$ with $\operatorname{Inv}(B) = \operatorname{Inv}(A)$ commutes with $A$ if and only if the following condition holds: for every eigenvalue $\lambda_0$ of $A$ with $\operatorname{Ker}(A - \lambda_0 I) \ne \mathcal{R}_{\lambda_0}(A)$ and $\dim \operatorname{Ker}(A - \lambda_0 I) > 1$ we have

$$\dim \mathcal{R}_{\lambda_0}(A) - \dim \operatorname{Ker}(A - \lambda_0 I)^p \ge 2$$
where $p = p(\lambda_0)$ is the maximal integer such that $\operatorname{Ker}(A - \lambda_0 I)^p \ne \mathcal{R}_{\lambda_0}(A)$. Further, the set of all transformations $B$ with $\operatorname{Inv}(B) = \operatorname{Inv}(A)$ coincides with the set of all transformations commuting with $A$ if and only if $\dim \operatorname{Ker}(A - \lambda_0 I) = 1$ for every eigenvalue $\lambda_0$ of $A$, that is, $A$ is nonderogatory.

The proof is obtained by combining Theorem 10.2.1 with the description of all matrices commuting with $A$ (Theorem 9.1.1).

10.3 PROOF OF THEOREM 10.2.1

We start with three lemmas to be used in the proof of Theorem 10.2.1. Let $A\colon \mathbb{C}^n \to \mathbb{C}^n$ be a unicellular transformation. (Recall that $A$ is called unicellular if $\mathbb{C}^n$ is an irreducible subspace for $A$.) Let $g_1, \dots, g_n$ be a Jordan basis of $A$. Let $B$ be a transformation such that its matrix in the basis $g_1, \dots, g_n$ has the form

$$B = U_n(b_1, \dots, b_k; F) \tag{10.3.1}$$

for some $b_i \in \mathbb{C}$ and an $(n-k) \times (n-k)$ upper triangular matrix $F$.

Lemma 10.3.1

If $B$ has the form (10.3.1) with $b_2 \ne 0$, then in any Jordan basis for $B$ the transformation $A$ has the form

$$A = U_n(a_1, \dots, a_k; G) \tag{10.3.2}$$

for some $a_i \in \mathbb{C}$ with $a_2 \ne 0$, and some upper triangular matrix $G$.

Proof. Without loss of generality, we can assume that $\sigma(A) = \{0\}$ and that the Jordan basis $g_1, \dots, g_n$ coincides with the standard basis: $g_i = e_i$, $i = 1, \dots, n$. Let $B = B_1 + C$, where

$$B_1 = T_n(b_1, \dots, b_n) , \qquad C = U_n(0, \dots, 0; F') \tag{10.3.3}$$

for some $b_{k+1}, \dots, b_n \in \mathbb{C}$ and an upper triangular matrix $F'$ of size $(n-k) \times (n-k)$. Since $b_2 \ne 0$, it follows from Example 10.2.1 that the transformations $A$, $B$, and $B_1$ have the same invariant subspaces. Hence [recalling the equivalence (a) $\Leftrightarrow$ (e) in Theorem 2.5.1] the transformations $B$ and $B_1$ are also unicellular. As $AB_1 = B_1 A$ and $B_1$ is unicellular, it follows from Theorem 9.1.1 that $A = p(B_1)$ for some polynomial $p(\lambda)$.

Let $f_1, \dots, f_n$ be a Jordan basis for $B$. We claim that the matrix of $C$ in the basis $f_1, \dots, f_n$ has the form (10.3.3) again, possibly with another matrix $F'$. Indeed, the only nonzero $B$-invariant subspaces are
$\operatorname{Span}\{e_1, e_2, \dots, e_i\}$, $i = 1, \dots, n$ (because they are the $A$-invariant subspaces). On the other hand, Example 10.2.1 ensures that the only nonzero $B$-invariant subspaces are $\operatorname{Span}\{f_1, f_2, \dots, f_i\}$, $i = 1, \dots, n$. It follows that $f_i \in \operatorname{Span}\{e_1, e_2, \dots, e_i\}$, $i = 1, \dots, n$. Now it is easily seen that the matrix of $C$ in the basis $f_1, \dots, f_n$ has the form (10.3.3).

Consider the following relations:

$$A = p(B_1) = p(B - C) = \sum_{j=0}^{m} a_j (B - C)^j = \sum_{j=0}^{m} a_j B^j + H \tag{10.3.4}$$

where every summand in $H$ contains $C$ as a factor. Consequently, the matrix of $H$ in the basis $f_1, \dots, f_n$ is upper triangular, and its first $k$ diagonals (counting from the main diagonal) are zeros. Now (10.3.2) follows from (10.3.4) and, by Example 10.2.1, $a_2 \ne 0$. □

Let the vectors $d_1, \dots, d_p, f_1, \dots, f_{p-1}$ be linearly independent in $\mathbb{C}^n$. In the sequel we shall encounter systems of vectors of the form

$$\begin{aligned}
g_p &= d_p + \alpha f_{p-1} + \beta f_{p-2} + \cdots + \gamma f_2 + \delta f_1 \\
g_{p-1} &= d_{p-1} + \alpha f_{p-2} + \beta f_{p-3} + \cdots + \gamma f_1 \\
&\ \,\vdots \\
g_3 &= d_3 + \alpha f_2 + \beta f_1 \\
g_2 &= d_2 + \alpha f_1 \\
g_1 &= d_1
\end{aligned} \tag{10.3.5}$$

where $\alpha, \beta, \dots, \gamma, \delta$ are some numbers, and

$$\begin{aligned}
h_p &= d_p + a_{p,p-1} f_{p-1} + a_{p,p-2} f_{p-2} + \cdots + a_{p,2} f_2 + a_{p,1} f_1 \\
h_{p-1} &= d_{p-1} + a_{p-1,p-2} f_{p-2} + a_{p-1,p-3} f_{p-3} + \cdots + a_{p-1,1} f_1 \\
&\ \,\vdots \\
h_3 &= d_3 + a_{3,2} f_2 + a_{3,1} f_1 \\
h_2 &= d_2 + a_{2,1} f_1 \\
h_1 &= d_1
\end{aligned} \tag{10.3.6}$$

where the $a_{ij}$ are certain numbers.

Lemma 10.3.2

If for every $m = 1, \dots, p$ the subspaces $\operatorname{Span}\{g_1, \dots, g_m\}$ and $\operatorname{Span}\{h_1, \dots, h_m\}$ coincide, then $g_j = h_j$ ($j = 1, \dots, p$).

Proof. Use induction on $p$. For $p = 1$ the lemma is evident. Assume the lemma holds true for $p = k$, and let $\operatorname{Span}\{g_1, \dots, g_{k+1}\} =$
$\operatorname{Span}\{h_1, \dots, h_{k+1}\}$. By the induction hypothesis, $g_1 = h_1, \dots, g_k = h_k$. For every vector $x = \sum_{j=1}^{k+1} \xi_j g_j \in \operatorname{Span}\{g_1, \dots, g_{k+1}\}$ we have $x = \sum_{j=1}^{k+1} \eta_j h_j$. Rewrite the equation

$$\sum_{j=1}^{k+1} \xi_j g_j = \sum_{j=1}^{k+1} \eta_j h_j$$

in the form

$$\sum_{j=1}^{k} (\xi_j - \eta_j) g_j + \xi_{k+1} g_{k+1} - \eta_{k+1} h_{k+1} = 0$$

If $\xi_{k+1} \ne \eta_{k+1}$, or if $\xi_{k+1} = \eta_{k+1}$ and $g_{k+1} \ne h_{k+1}$, this contradicts the linear independence of $d_1, \dots, d_{k+1}, f_1, \dots, f_k$. So we must have $g_{k+1} = h_{k+1}$. □

Let $\mathcal{S}_A$ be the set of all irreducible subspaces of a transformation $A\colon \mathbb{C}^n \to \mathbb{C}^n$. Clearly, $\mathcal{S}_A \subseteq \operatorname{Inv}(A)$. Since every invariant subspace of a transformation can be represented as a direct sum of irreducible subspaces, the equality $\operatorname{Inv}(A) = \operatorname{Inv}(B)$ holds if and only if $\mathcal{S}_A = \mathcal{S}_B$. Now consider a special case of this equality.

Lemma 10.3.3

Let $A\colon \mathbb{C}^n \to \mathbb{C}^n$ be such that

$$\mathbb{C}^n = \mathcal{L}_1 \dotplus \mathcal{L}_2$$

where $\mathcal{L}_i$ ($i = 1, 2$) are irreducible subspaces of $A$ corresponding to the same eigenvalue. Let $\dim \mathcal{L}_1 = q \ge p = \dim \mathcal{L}_2$, and let $d_1, \dots, d_q \in \mathcal{L}_1$ and $f_1, \dots, f_p \in \mathcal{L}_2$ be Jordan bases in these subspaces. Then for $B\colon \mathbb{C}^n \to \mathbb{C}^n$ we have $\mathcal{S}_A = \mathcal{S}_B$ if and only if the matrix of $B$ in the basis $d_1, \dots, d_q, f_1, \dots, f_p$ has the form

$$B = U_q(b_1, \dots, b_p; F) \oplus T_p(b_1, \dots, b_p) \tag{10.3.7}$$

where $b_1, \dots, b_p$ are complex numbers with $b_2 \ne 0$ and $F$ is an upper triangular matrix of size $(q-p) \times (q-p)$.

Proof. First we prove the necessity, that is, if $\mathcal{S}_A = \mathcal{S}_B$, then $B$ has the form (10.3.7). Consider first the case $p = q$ and prove the necessity by induction on $p$. For $p = 1$ everything is evident. Suppose that the lemma is true for $p = k$, and let $\mathcal{L}_1$, $\mathcal{L}_2$ be irreducible subspaces of dimension $k + 1$. Let $\mathcal{L}_1' = \operatorname{Span}\{d_1, \dots, d_k\}$ and $\mathcal{L}_2' = \operatorname{Span}\{f_1, \dots, f_k\}$. Evidently, $\mathcal{L}_1'$ and $\mathcal{L}_2'$ are irreducible subspaces of $A$ corresponding to the same eigenvalue. Since
$\mathcal{S}_A = \mathcal{S}_B$ by assumption, the subspaces $\mathcal{L}_1, \mathcal{L}_1', \mathcal{L}_2, \mathcal{L}_2'$ are irreducible subspaces for $B$. By Example 10.2.1 and the induction hypothesis, the matrix representation of $B$ in the basis $d_1, \dots, d_{k+1}, f_1, \dots, f_{k+1}$ has the form

$$B = \begin{bmatrix}
b_1 & b_2 & b_3 & \cdots & b_k & c_{1,k+1} \\
0 & b_1 & b_2 & \cdots & b_{k-1} & c_{2,k+1} \\
\vdots & & \ddots & & & \vdots \\
0 & 0 & 0 & \cdots & b_1 & c_{k,k+1} \\
0 & 0 & 0 & \cdots & 0 & b_1
\end{bmatrix} \oplus \begin{bmatrix}
b_1 & b_2 & b_3 & \cdots & b_k & a_{1,k+1} \\
0 & b_1 & b_2 & \cdots & b_{k-1} & a_{2,k+1} \\
\vdots & & \ddots & & & \vdots \\
0 & 0 & 0 & \cdots & b_1 & a_{k,k+1} \\
0 & 0 & 0 & \cdots & 0 & b_1
\end{bmatrix}$$

We assume $b_1 \ne 0$; otherwise, consider the transformation $B + \lambda_0 I$, where $\lambda_0 \ne -b_1$, in place of $B$, and use the property that $\operatorname{Inv}(B) = \operatorname{Inv}(B + \lambda_0 I)$. This condition means that $B$ is invertible.

Let $\mathcal{L}$ be an irreducible subspace of $A$ such that $\dim \mathcal{L} = k + 1$ and $d_1 \in \mathcal{L}$. By Theorem 10.1.3, there exist numbers $\alpha_1, \dots, \alpha_k$ such that the vectors

$$\begin{aligned}
x_{k+1} &= d_{k+1} + \alpha_1 f_k + \alpha_2 f_{k-1} + \cdots + \alpha_k f_1 \\
x_k &= d_k + \alpha_1 f_{k-1} + \alpha_2 f_{k-2} + \cdots + \alpha_{k-1} f_1 \\
&\ \,\vdots \\
x_3 &= d_3 + \alpha_1 f_2 + \alpha_2 f_1 \\
x_2 &= d_2 + \alpha_1 f_1 \\
x_1 &= d_1
\end{aligned}$$

form a Jordan basis in $\mathcal{L}$. Since $b_1 \ne 0$, it follows that

$$\operatorname{Span}\{x_1, \dots, x_k, x_{k+1}\} = \operatorname{Span}\{x_1, \dots, x_k, B x_{k+1}\}$$

It follows from the form of $B$ that

$$B x_{k+1} = b_1 d_{k+1} + \sum_{j=1}^{k} c_{j,k+1} d_j + \alpha_1 \Bigl[\sum_{j=1}^{k} b_{k+1-j} f_j\Bigr] + \alpha_2 \Bigl[\sum_{j=1}^{k-1} b_{k-j} f_j\Bigr] + \cdots + \alpha_k b_1 f_1$$

and
$$B x_{k+1} - \sum_{j=1}^{k} c_{j,k+1} x_j = b_1 d_{k+1} + \alpha_1 b_1 f_k + \bigl[\alpha_1(b_2 - c_{k,k+1}) + \alpha_2 b_1\bigr] f_{k-1} + \cdots$$

Put

$$y = b_1^{-1}\Bigl(B x_{k+1} - \sum_{j=1}^{k} c_{j,k+1} x_j\Bigr) = d_{k+1} + \alpha_1 f_k + \cdots$$

Evidently, $\operatorname{Span}\{x_1, \dots, x_k, x_{k+1}\} = \operatorname{Span}\{x_1, \dots, x_k, y\}$. Then by Lemma 10.3.2 we have

$$\alpha_1 b_1^{-1}(b_2 - c_{k,k+1}) + \alpha_2 = \alpha_2 ; \quad \dots ; \quad \sum_{j=1}^{k-1} \alpha_j b_1^{-1}(b_{k+1-j} - c_{j+1,k+1}) + \alpha_k = \alpha_k$$

These equalities hold for every $\alpha_1, \dots, \alpha_k$ (by choosing all possible $\mathcal{L}$; see Theorem 10.1.3). Therefore

$$b_2 = c_{k,k+1} , \quad b_3 = c_{k-1,k+1} , \quad \dots , \quad b_{k-1} = c_{3,k+1} , \quad b_k = c_{2,k+1}$$

Similarly, considering Jordan bases of the form

$$\begin{aligned}
&f_{k+1} + \alpha_1 d_k + \alpha_2 d_{k-1} + \cdots + \alpha_{k-1} d_2 + \alpha_k d_1 \\
&f_k + \alpha_1 d_{k-1} + \alpha_2 d_{k-2} + \cdots + \alpha_{k-1} d_1 \\
&\qquad \vdots \\
&f_2 + \alpha_1 d_1 \\
&f_1
\end{aligned}$$

we obtain $b_2 = a_{k,k+1}$, $b_3 = a_{k-1,k+1}$, …, $b_{k-1} = a_{3,k+1}$, $b_k = a_{2,k+1}$.

Let us show that, in fact, $c_{1,k+1} = a_{1,k+1}$. To this end, consider a Jordan basis of $A$ of the form

$$z_i = \xi d_i + \eta f_i , \qquad i = 1, \dots, k+1$$

where $\xi$ and $\eta$ are arbitrary numbers. We have

$$B z_{k+1} = \sum_{j=2}^{k+1} b_{k+2-j} z_j + \xi c_{1,k+1} d_1 + \eta a_{1,k+1} f_1$$

As above, we obtain
$$\operatorname{Span}\{z_1, \dots, z_k, z_{k+1}\} = \operatorname{Span}\{z_1, \dots, z_k, B z_{k+1}\}$$

Further,

$$B z_{k+1} - \sum_{j=2}^{k} b_{k+2-j} z_j - c_{1,k+1} z_1 = b_1 z_{k+1} + \eta(a_{1,k+1} - c_{1,k+1}) f_1$$

Put $y = z_{k+1} + \eta b_1^{-1}(a_{1,k+1} - c_{1,k+1}) f_1$. Evidently,

$$\operatorname{Span}\{z_1, \dots, z_k, z_{k+1}\} = \operatorname{Span}\{z_1, \dots, z_k, y\}$$

By Lemma 10.3.2, $\eta b_1^{-1}(a_{1,k+1} - c_{1,k+1}) = 0$. Since $\eta$ can be arbitrary, $a_{1,k+1} = c_{1,k+1}$. Thus the necessity part of Lemma 10.3.3 is proved for the case $p = q$.

Now consider the case $q > p$. Put $h = q - p$ and proceed by induction on $h$. Assume that the necessity part of Lemma 10.3.3 holds for $h \le k$, and let $\mathcal{L}_1$, $\mathcal{L}_2$ be irreducible subspaces for $A$ with $\dim \mathcal{L}_1 = p + k + 1$ and $\dim \mathcal{L}_2 = p$. By Example 10.2.1 and the assumption of induction, the matrix representation of $B$ in the basis $d_1, \dots, d_{p+k+1}, f_1, \dots, f_p$ has the form

$$B = \begin{bmatrix}
b_1 & b_2 & \cdots & b_p & c_{1,p+1} & \cdots & c_{1,p+k} & c_{1,p+k+1} \\
0 & b_1 & \cdots & b_{p-1} & b_p & \cdots & c_{2,p+k} & c_{2,p+k+1} \\
\vdots & & \ddots & & & \ddots & & \vdots \\
 & & & & & & b_1 & c_{p+k,p+k+1} \\
0 & 0 & \cdots & & & & 0 & b_1
\end{bmatrix} \oplus \begin{bmatrix}
b_1 & b_2 & \cdots & b_p \\
0 & b_1 & \cdots & b_{p-1} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & b_1
\end{bmatrix}$$

where $b_2 \ne 0$. Let

$$\begin{aligned}
x_{p+k+1} &= d_{p+k+1} + \alpha_1 f_p + \alpha_2 f_{p-1} + \cdots + \alpha_{p-1} f_2 + \alpha_p f_1 \\
x_{p+k} &= d_{p+k} + \alpha_1 f_{p-1} + \alpha_2 f_{p-2} + \cdots + \alpha_{p-1} f_1 \\
&\ \,\vdots \\
x_{k+2} &= d_{k+2} + \alpha_1 f_1 \\
x_{k+1} &= d_{k+1} \\
&\ \,\vdots \\
x_1 &= d_1
\end{aligned}$$
be a Jordan basis (for $A$) of an arbitrary irreducible subspace $\mathcal{L}$ of dimension $p + k + 1$ such that $d_1 \in \mathcal{L}$. As above, we obtain

$$\operatorname{Span}\{x_1, x_2, \dots, x_{p+k}, x_{p+k+1}\} = \operatorname{Span}\{x_1, \dots, x_{p+k}, B x_{p+k+1}\}$$

Now

$$B x_{p+k+1} = b_1 d_{p+k+1} + \sum_{j=1}^{p+k} c_{j,p+k+1} d_j + \alpha_1 \Bigl[\sum_{j=1}^{p} b_{p+1-j} f_j\Bigr] + \cdots + \alpha_p b_1 f_1$$

Hence

$$B x_{p+k+1} - \sum_{j=1}^{p+k} c_{j,p+k+1} x_j = b_1 d_{p+k+1} + \alpha_1 b_1 f_p + \bigl[\alpha_1(b_2 - c_{p+k,p+k+1}) + \alpha_2 b_1\bigr] f_{p-1} + \cdots$$

Put

$$y = b_1^{-1}\Bigl[B x_{p+k+1} - \sum_{j=1}^{p+k} c_{j,p+k+1} x_j\Bigr] = d_{p+k+1} + \alpha_1 f_p + \cdots$$

Since $\operatorname{Span}\{x_1, \dots, x_{p+k}, x_{p+k+1}\} = \operatorname{Span}\{x_1, \dots, x_{p+k}, y\}$ for every $\alpha_1, \dots, \alpha_p$, Lemma 10.3.2 implies that

$$b_2 = c_{p+k,p+k+1} , \quad b_3 = c_{p+k-1,p+k+1} , \quad \dots , \quad b_p = c_{k+2,p+k+1}$$

The necessity part of Lemma 10.3.3 is proved.

Let us prove the sufficiency of the conditions of Lemma 10.3.3. Assume that $B$ has the form (10.3.7) in a Jordan basis for $A$. Let $\mathcal{L}$ be an irreducible subspace for $A$ with $\dim \mathcal{L} = k$ ($1 \le k \le p$), and let $x_1 \in \mathcal{L}$ be an eigenvector. Then $x_1 = \xi d_1 + \eta f_1$ for some numbers $\xi$ and $\eta$. Put $x_i = \xi d_i + \eta f_i$ ($i = 2, \dots, p$). Suppose that $\eta \ne 0$. In view of Proposition 10.1.2 (see also the remark after its proof), there are numbers $\alpha_1, \dots, \alpha_{k-1}$ for which the vectors
$$\begin{aligned}
v_k &= x_k + \alpha_1 d_{k-1} + \cdots + \alpha_{k-2} d_2 + \alpha_{k-1} d_1 \\
v_{k-1} &= x_{k-1} + \alpha_1 d_{k-2} + \cdots + \alpha_{k-2} d_1 \\
&\ \,\vdots \\
v_2 &= x_2 + \alpha_1 d_1 \\
v_1 &= x_1
\end{aligned} \tag{10.3.8}$$

form a Jordan basis of $A$ in $\mathcal{L}$. [If $\eta = 0$, replace $d_1, \dots, d_{k-1}$ by $f_1, \dots, f_{k-1}$, respectively, in (10.3.8).] A straightforward computation reveals that $\mathcal{L}$ is $B$ invariant and that in the basis $v_1, \dots, v_k$ we have

$$B|_{\mathcal{L}} = T_k(b_1, b_2, \dots, b_k) \tag{10.3.9}$$

As in Example 10.2.1, $b_2 \ne 0$ implies that $\mathcal{L}$ is an irreducible subspace for $B$.

Now let $\mathcal{L}$ be an irreducible subspace for $A$ such that $\dim \mathcal{L} = m$ ($p + 1 \le m \le q$). It is easily seen that $d_1 \in \mathcal{L}$, and (by Proposition 10.1.2) there exist numbers $\alpha_1, \dots, \alpha_p$ such that the vectors

$$\begin{aligned}
u_m &= d_m + \alpha_1 f_p + \cdots + \alpha_{p-1} f_2 + \alpha_p f_1 \\
u_{m-1} &= d_{m-1} + \alpha_1 f_{p-1} + \cdots + \alpha_{p-1} f_1 \\
&\ \,\vdots \\
u_{m-p+1} &= d_{m-p+1} + \alpha_1 f_1 \\
u_{m-p} &= d_{m-p} \\
&\ \,\vdots \\
u_1 &= d_1
\end{aligned}$$

form a Jordan basis of $A$ in $\mathcal{L}$. Again, a straightforward calculation shows that $\mathcal{L}$ is $B$ invariant and that in the basis $u_1, \dots, u_m$

$$B|_{\mathcal{L}} = U_m(b_1, \dots, b_p; F_0) \tag{10.3.10}$$

where $F_0$ is some upper triangular matrix of size $(m-p) \times (m-p)$. Since $b_2 \ne 0$, it follows from Example 10.2.1 that the subspace $\mathcal{L}$ is an irreducible subspace of $B$. So we have proved that $\mathcal{S}_A \subseteq \mathcal{S}_B$.

Let us prove the opposite inclusion $\mathcal{S}_B \subseteq \mathcal{S}_A$. Let $g_1, \dots, g_p$ be a Jordan basis of $B$ in the subspace $\mathcal{L}_2$. Write $g_k = \sum_{j=1}^{k} \xi_{kj} f_j$ ($k = 1, \dots, p$); put $h_k = \sum_{j=1}^{k} \xi_{kj} d_j$ ($k = 1, \dots, p$). Evidently, the vectors $h_1, \dots, h_p$ form a Jordan basis for $B$ in $\mathcal{L}' = \operatorname{Span}\{d_1, \dots, d_p\}$. We show that the sequence $h_1, \dots, h_p$ can be augmented by vectors $h_{p+1}, \dots, h_q$ so that $h_1, \dots, h_q$ is a Jordan basis of $B$ in $\mathcal{L}_1$. (Observe that, by Example 10.2.1, $\mathcal{L}_1$ is an irreducible subspace for $B$.) Assume that the vectors $h_i = \sum_{j=1}^{i} \xi_{ij} d_j$, $i = p+1, \dots, r-1$, are already constructed. Then for $h_r = \sum_{j=1}^{r} \xi_{rj} d_j$ the following equation must be satisfied in order that $(B - b_1 I)h_r = h_{r-1}$:
$$Z_r \begin{bmatrix} \xi_{r2} \\ \vdots \\ \xi_{rr} \end{bmatrix} = \begin{bmatrix} \xi_{r-1,1} \\ \vdots \\ \xi_{r-1,r-1} \end{bmatrix} \tag{10.3.11}$$

where $Z_r$ is the $(r-1) \times (r-1)$ submatrix of $B - b_1 I$ formed by the first $r-1$ rows and the columns $2, 3, \dots, r$. From (10.3.7) and $b_2 \ne 0$ it follows that $Z_r$ is invertible, so (10.3.11) always has a (unique) solution $\xi_{r2}, \dots, \xi_{rr}$, and $h_r$ is constructed.

By Lemma 10.3.1, $A$ has the following form in the basis $h_1, \dots, h_q, g_1, \dots, g_p$:

$$A = U_q(a_1, \dots, a_p; \tilde F) \oplus T_p(a_1, \dots, a_p)$$

for some $\tilde F$, where $a_2 \ne 0$. The first $p$ diagonals in both blocks are the same in view of the choice of $h_1, \dots, h_p$. Now we can repeat the proof of the inclusion $\mathcal{S}_A \subseteq \mathcal{S}_B$ given above, with $A$ and $B$ interchanged. So $\mathcal{S}_B \subseteq \mathcal{S}_A$ follows and, therefore, also $\mathcal{S}_B = \mathcal{S}_A$ and $\operatorname{Inv}(B) = \operatorname{Inv}(A)$. □

Now we are prepared to prove Theorem 10.2.1 itself.

Proof of Theorem 10.2.1. As every $A$-invariant subspace is the sum of its intersections with the root subspaces of $A$, we may restrict ourselves to the case when $\mathbb{C}^n$ is a root subspace for $A$. Let

$$\mathbb{C}^n = \mathcal{L}_1 \dotplus \cdots \dotplus \mathcal{L}_m \tag{10.3.12}$$

be the decomposition of $\mathbb{C}^n$ into a direct sum of irreducible subspaces $\mathcal{L}_1, \dots, \mathcal{L}_m$ of $A$. Let

$$d_1^{(1)}, \dots, d_{p_1}^{(1)} \in \mathcal{L}_1 ; \quad \dots ; \quad d_1^{(m)}, \dots, d_{p_m}^{(m)} \in \mathcal{L}_m$$

be Jordan bases in $\mathcal{L}_1, \dots, \mathcal{L}_m$, respectively. Assume (without loss of generality) that $p_1 \ge \cdots \ge p_m$.

Now let $B\colon \mathbb{C}^n \to \mathbb{C}^n$ be a transformation, and suppose that the invariant subspaces of $B$ and those of $A$ are the same. Applying Lemma 10.3.3 to the restrictions $A|_{\mathcal{L}_1 \dotplus \mathcal{L}_i}$, $i = 2, \dots, m$, we find that $B$ has the form described in Theorem 10.2.1.

Conversely, assume that $B$ has the form

$$B = U_{p_1}(\mu, b_2, \dots, b_{p_2}; F) \oplus T_{p_2}(\mu, b_2, \dots, b_{p_2}) \oplus \cdots \oplus T_{p_m}(\mu, b_2, \dots, b_{p_m}) \tag{10.3.13}$$

with $b_2 \ne 0$. We now prove that $\operatorname{Inv}(B) = \operatorname{Inv}(A)$. Suppose for definiteness that $\sigma(A) = \{0\}$.

Let us show that every irreducible subspace for $A$ is also an irreducible subspace of $B$. Let $\mathcal{L}$ be an irreducible subspace for $A$ with $\dim \mathcal{L} = j$, and let $x_1 \in \mathcal{L}$ be an eigenvector of $A$. Then $x_1 \in \operatorname{Span}\{d_1^{(1)}, \dots, d_1^{(m)}\} = \operatorname{Ker} A$. Write $x_1 = a_1 d_1^{(1)} + \cdots + a_r d_1^{(r)}$ with $a_r \ne 0$, for some $r$ ($1 \le r \le m$). Put $x_i = a_1 d_i^{(1)} + \cdots + a_r d_i^{(r)}$, $i = 1, \dots, p_r$. It is
easily seen that $j = \dim \mathcal{L} \le p_r$. The vectors $z_1, \dots, z_j$ given by (10.1.12) [with $f_1, \dots, f_j$ replaced by $x_1, \dots, x_j$] then form a Jordan basis for $A$ in $\mathcal{L}$, for some numbers $\alpha_1, \alpha_2, \dots$.

Two possibilities occur for the number $j$ ($= \dim \mathcal{L}$): $j \le p_2$, or $p_2 + 1 \le j \le p_1$. Consider first the case $j \le p_2$. Taking into account the form of $B$, it is easy to check that $\mathcal{L}$ is $B$ invariant and that the matrix of $B|_{\mathcal{L}}$ in the basis $z_1, \dots, z_j$ is of the form

$$B|_{\mathcal{L}} = T_j(\mu, b_2', \dots, b_j')$$

with $b_2' \ne 0$. Then by Example 10.2.1, $\mathcal{L}$ is an irreducible subspace for $B$.

Now suppose that $p_2 + 1 \le j \le p_1$. Since $j \le p_r$, clearly $r = 1$. This means that the eigenvector $x_1 \in \mathcal{L}$ is collinear with $d_1^{(1)} \in \mathcal{L}_1$. Taking into account the form of $B$ [given by (10.3.13)], we conclude that $\mathcal{L}$ is $B$ invariant and that the matrix of $B|_{\mathcal{L}}$ in the basis $z_1, \dots, z_j$ is given by (10.3.10) with $b_1 = \mu$ and $b_2 \ne 0$. By Example 10.2.1, $\mathcal{L}$ is an irreducible subspace for $B$.

We show that every irreducible subspace for $B$ is also an irreducible subspace for $A$. As we have already proved, the subspaces $\mathcal{L}_1, \dots, \mathcal{L}_m$ [which appear in (10.3.12)] are also irreducible subspaces for $B$. Let $h_1, \dots, h_{p_m}$ be a Jordan basis of $B$ in $\mathcal{L}_m$; then

$$h_k = \sum_{i=1}^{k} \xi_{ki} d_i^{(m)} \qquad (k = 1, \dots, p_m)$$

where $d_1^{(m)}, \dots, d_{p_m}^{(m)}$ is the Jordan basis of $A$ in $\mathcal{L}_m$. Construct the vectors $\varphi_1, \dots, \varphi_{p_m}$ as follows (recall that $p_m \le p_{m-1}$):

$$\varphi_k = \sum_{i=1}^{k} \xi_{ki} d_i^{(m-1)}$$

Since the vectors $h_1, \dots, h_{p_m}$ form a Jordan basis (for $B$) in $\mathcal{L}_m$, the vectors $\varphi_1, \dots, \varphi_{p_m}$ satisfy the equalities $(B - \mu I)\varphi_1 = 0$ and $(B - \mu I)\varphi_{j+1} = \varphi_j$ for $j = 1, \dots, p_m - 1$. Because $\mathcal{L}_{m-1}$ is an irreducible subspace for $B$, there exist vectors $\varphi_{p_m+1}, \dots, \varphi_{p_{m-1}}$ such that the system $\varphi_1, \dots, \varphi_{p_{m-1}}$ forms a Jordan basis for $B$ in $\mathcal{L}_{m-1}$. (See the last paragraph of the proof of Lemma 10.3.3.) Express $\varphi_{p_m+1}, \dots, \varphi_{p_{m-1}}$ by means of the Jordan basis for $A$ in $\mathcal{L}_{m-1}$:

$$\varphi_k = \sum_{i=1}^{k} \xi_{ki} d_i^{(m-1)} , \qquad k = p_m + 1, \dots, p_{m-1}$$

Continuing these constructions, we obtain Jordan bases for $B$ in each of the subspaces $\mathcal{L}_m, \mathcal{L}_{m-1}, \dots, \mathcal{L}_1$. From the choice of these bases and Lemma 10.3.1 it follows that the matrix of $A$ in the union of these bases has the form

$$A = U_{p_1}(\lambda, a_2, \dots, a_{p_2}; F') \oplus T_{p_2}(\lambda, a_2, \dots, a_{p_2}) \oplus \cdots \oplus T_{p_m}(\lambda, a_2, \dots, a_{p_m})$$
where $\lambda$ is the eigenvalue of $A$ and $a_2 \ne 0$. As was proved above, it then follows that every irreducible subspace for $B$ is also irreducible for $A$. Thus the equality $\operatorname{Inv}(A) = \operatorname{Inv}(B)$ holds. □

10.4 EXERCISES

10.1 Let $A = J_2(0) \oplus J_3(0)$.
(a) Describe all irreducible $A$-invariant subspaces that contain $e_1$.
(b) Describe all irreducible $A$-invariant subspaces that contain $e_3$.

10.2 Let $A = (J_n(0))^2$. Describe all irreducible $A$-invariant subspaces that contain $e_1$.

10.3 Prove or disprove the following statement: if $A, B\colon \mathbb{C}^n \to \mathbb{C}^n$ are transformations with $\sigma(A) = \sigma(B) = \{\lambda_0\}$, $\lambda_0 \in \mathbb{C}$, and with $\operatorname{Inv}(A) = \operatorname{Inv}(B)$, then $A$ and $B$ are similar.

10.4 Show that if $A, B\colon \mathbb{C}^n \to \mathbb{C}^n$ have the same set of hyperinvariant subspaces and if $\sigma(A) = \sigma(B) = \{\lambda_0\}$, $\lambda_0 \in \mathbb{C}$, then $A$ and $B$ are similar.

10.5 Show that two lower triangular Toeplitz matrices have the same invariant subspaces if and only if each matrix is a polynomial in the other.

10.6 Show that two circulants have the same invariant subspaces if and only if each circulant is a polynomial in the other.

10.7 Is the property expressed in Exercise 10.6 true for two block circulants of type

$$\begin{bmatrix} A_1 & A_2 & \cdots & A_n \\ A_n & A_1 & \cdots & A_{n-1} \\ \vdots & & \ddots & \vdots \\ A_2 & A_3 & \cdots & A_1 \end{bmatrix}$$

where the $A_j$ are $2 \times 2$ matrices? What happens if the $A_j$ are $3 \times 3$ matrices?

10.8 Show that two companion matrices have the same invariant subspaces if and only if each is a polynomial in the other. Is this property true for block companion matrices

$$\begin{bmatrix} 0 & I & 0 & \cdots & 0 \\ 0 & 0 & I & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I \\ -A_0 & -A_1 & -A_2 & \cdots & -A_{n-1} \end{bmatrix}$$

with $2 \times 2$ blocks $A_j$? For block companion matrices with $3 \times 3$ blocks?
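Before leaving this chapter, note that the remark following Example 10.2.1, namely that $\operatorname{Inv}(A) = \operatorname{Inv}(B)$ can hold for noncommuting $A$ and $B$, is easy to check numerically. The sketch below (not part of the original text) takes $A = J_4(0)$ and a $B$ of the form (10.2.1), (10.2.2) with distinct superdiagonal entries; it verifies that $\operatorname{Ker} B$ is one dimensional (so, as in Example 10.2.1, every eigenvector of $B$ is collinear with $e_1$ and $\operatorname{Inv}(B)$ is the full chain $\operatorname{Span}\{e_1, \dots, e_j\}$) while $AB \ne BA$:

```python
import numpy as np

n, lam = 4, 0.0
A = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)       # J_4(0)

# B as in (10.2.1)-(10.2.2): equal diagonal entries, nonzero (and here
# distinct) superdiagonal entries, arbitrary entries further up.
B = lam * np.eye(n) + np.diag([1.0, 2.0, 3.0], k=1) + np.diag([5.0, 7.0], k=2)

# dim Ker(B - lam*I) = 1, so the B-invariant subspaces form the chain
# Span{e_1, ..., e_j}, j = 0, ..., n, exactly as for A.
assert np.linalg.matrix_rank(B - lam * np.eye(n)) == n - 1

# Yet A and B do not commute.
assert not np.allclose(A @ B, B @ A)
```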
Chapter Eleven

Algebras of Matrices and Invariant Subspaces

In this chapter we consider subspaces that are invariant for every transformation from a given algebra of transformations. In fact, this framework includes general finite-dimensional algebras over $\mathbb{C}$. The key result, that every algebra of $n \times n$ matrices that is not the algebra of all $n \times n$ matrices has a nontrivial invariant subspace, is developed with a complete proof. Some results concerning the characterization of lattices of subspaces that are invariant for every transformation from an algebra are presented. Finally, in the last section we study algebras of transformations for which the orthogonal complement of an invariant subspace is again invariant.

11.1 FINITE-DIMENSIONAL ALGEBRAS

A linear space $V$ (over the field of complex numbers $\mathbb{C}$) is called an algebra if an operation (usually called multiplication) is defined in $V$ that associates an element of $V$ (denoted $xy$ or $x \cdot y$) with every (ordered) pair of elements $x, y$ from $V$, with the following properties: (a) $\alpha(xy) = (\alpha x)y = x(\alpha y)$ for every $\alpha \in \mathbb{C}$ and every $x, y \in V$; (b) $(xy)z = x(yz)$ for every $x, y, z \in V$ (associativity of multiplication); (c) $(x + y)z = xz + yz$ and $x(y + z) = xy + xz$ for every $x, y, z \in V$ (distributivity of multiplication with respect to addition). Note that, generally speaking, $xy \ne yx$ in the algebra $V$. The algebra $V$ may or may not have an identity, that is, an element $e \in V$ such that $ae = ea = a$ for every $a \in V$. We consider only finite-dimensional algebras, that is, those that are finite dimensional as linear spaces.

The basic example of an algebra is $M_{n,n}$, the algebra of all $n \times n$ matrices with complex entries, with the usual multiplication operation. Another important example is the algebra of upper triangular $n \times n$ (complex) matrices.

The following theorem shows that actually every (finite-dimensional)
algebra is an algebra of (not necessarily all) matrices. This is the basic simple result concerning representations of finite-dimensional algebras.

Theorem 11.1.1

Let $V$ be an algebra of dimension $n$ (as a linear space). If $V$ has an identity, then $V$ can be identified with an algebra of $n \times n$ matrices. If $V$ does not have an identity, it can be identified with an algebra of $(n+1) \times (n+1)$ matrices.

Proof. Assume first that $V$ has the identity $e$. Let $x_1, \dots, x_n$ be a basis in $V$. For every $a \in V$ the mapping $\hat a\colon V \to V$ defined by $\hat a(x) = ax$, $x \in V$, is a linear transformation. Denote by $M(a)$ the $n \times n$ matrix that represents the linear transformation $\hat a$ in the fixed basis $x_1, \dots, x_n$. It is easy to check that the mapping $M\colon V \to M_{n,n}$ defined above is an algebraic homomorphism:

$$M(a + b) = M(a) + M(b) , \qquad M(ab) = M(a)M(b) , \qquad M(\alpha a) = \alpha M(a)$$

for any elements $a, b \in V$ and any $\alpha \in \mathbb{C}$. Further, the only element $a \in V$ for which $M(a) = 0$ is $a = 0$. Indeed, if $M(a) = 0$, then $ax = 0$ for every $x \in V$. Taking $x = e$, we obtain $a = 0$. Hence we can identify $V$ with the algebra $\{M(a) \mid a \in V\}$, which is simply an algebra of $n \times n$ matrices.

Assume now that $V$ does not have an identity. Define a new algebra $V'$ consisting of all ordered pairs $(x, \alpha)$ with $x \in V$, $\alpha \in \mathbb{C}$, and with the following operations:

$$(x, \alpha) + (y, \beta) = (x + y, \alpha + \beta)$$
$$(x, \alpha) \cdot (y, \beta) = (xy + \alpha y + \beta x, \alpha\beta)$$
$$\gamma(x, \alpha) = (\gamma x, \gamma\alpha)$$

for any $x, y \in V$ and any $\alpha, \beta, \gamma \in \mathbb{C}$. Obviously, the algebra $V'$ has the identity $(0, 1)$ and dimension $n + 1$. According to the part of Theorem 11.1.1 already proved, we can identify $V'$ with an algebra of $(n+1) \times (n+1)$ matrices (clearly, $\dim V' = n + 1$). As $V$ can be identified in turn with the subalgebra $\{(x, 0) \mid x \in V\}$ of $V'$, the conclusion of Theorem 11.1.1 follows. □

In view of Theorem 11.1.1, we consider only algebras of matrices in the sequel.

11.2 CHAINS OF INVARIANT SUBSPACES

Let $V$ be an algebra of (not necessarily all) $n \times n$ matrices. A subspace $\mathcal{M} \subseteq \mathbb{C}^n$ is called $V$ invariant if $\mathcal{M}$ is invariant for every matrix from $V$. The
following basic fact (known as Burnside's theorem) establishes the existence of nontrivial invariant subspaces for algebras of matrices.

Theorem 11.2.1

Let $V$ be an algebra of $n \times n$ (complex) matrices with $V \ne M_{n,n}$ and $n \ge 2$. Then there exists a nontrivial $V$-invariant subspace.

We exclude the case $n = 1$, when every subspace is trivial (in this case the theorem fails for $V = \{0\}$). The proof of Theorem 11.2.1 is lengthy and based on a series of auxiliary results; it is given in the next section. Taking a maximal chain of $V$-invariant subspaces and using Burnside's theorem, we arrive at the following conclusion.

Theorem 11.2.2

For any algebra $V$ of $n \times n$ matrices, there is a chain of $V$-invariant subspaces

$$\{0\} = \mathcal{M}_0 \subset \mathcal{M}_1 \subset \cdots \subset \mathcal{M}_k = \mathbb{C}^n \tag{11.2.1}$$

such that, with respect to a direct sum decomposition

$$\mathbb{C}^n = \mathcal{N}_1 \dotplus \cdots \dotplus \mathcal{N}_k \tag{11.2.2}$$

where $\mathcal{N}_p$ is a direct complement of $\mathcal{M}_{p-1}$ in $\mathcal{M}_p$ ($p = 1, \dots, k$), every transformation $A \in V$ has the block triangular form $A = [A_{pq}]_{p,q=1}^{k}$ with $A_{pq} = 0$ for $p > q$, and the set $\{A_{pp} \mid A \in V\}$ coincides with the algebra of all transformations from $\mathcal{N}_p$ into $\mathcal{N}_p$, for $p = 1, \dots, k$. The chain (11.2.1) is maximal, and every maximal chain of $V$-invariant subspaces has the property stated above.

The case when $V$ is the algebra of all block upper triangular matrices with respect to the decomposition (11.2.2) is of special interest. Then $M_{n,n}$ is a direct sum of two subspaces, $V$ and $W$, where $W$ is the algebra of all block lower triangular matrices with zeros on the main block diagonal:

$$W = \left\{ X \in M_{n,n} \;\middle|\; X = \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ X_{21} & 0 & \cdots & 0 & 0 \\ X_{31} & X_{32} & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ X_{k1} & X_{k2} & \cdots & X_{k,k-1} & 0 \end{bmatrix} , \; X_{ij}\colon \mathcal{N}_j \to \mathcal{N}_i \right\}$$

The subspaces
$$\mathcal{L}_1 = \{0\} , \quad \mathcal{L}_2 = \mathcal{N}_k , \quad \dots , \quad \mathcal{L}_k = \mathcal{N}_2 \dotplus \cdots \dotplus \mathcal{N}_k , \quad \mathcal{L}_{k+1} = \mathbb{C}^n$$

are all the invariant subspaces for $W$. In particular, we have the following direct sum decompositions:

$$\mathbb{C}^n = \mathcal{M}_1 \dotplus \mathcal{L}_k = \mathcal{M}_2 \dotplus \mathcal{L}_{k-1} = \cdots = \mathcal{M}_k \dotplus \mathcal{L}_1$$

This motivates the following conjecture.

Conjecture 11.2.3

Let $V_1$ and $V_2$ be nonzero subalgebras of $M_{n,n}$ such that $V_1 \cap V_2 = \{0\}$ and $V_1 + V_2 = M_{n,n}$. Then there exist nonzero invariant subspaces $\mathcal{M}_1$ and $\mathcal{M}_2$ for $V_1$ and $V_2$, respectively, that are direct complements of each other in $\mathbb{C}^n$.

We are able to prove a partial result in the direction of this conjecture. Namely, if $V_1$ and $V_2$ are subalgebras of $M_{n,n}$ such that $V_1 + V_2 = M_{n,n}$, then for every $V_1$-invariant subspace $\mathcal{M}_1$ and every $V_2$-invariant subspace $\mathcal{M}_2$, either $\mathcal{M}_1 \cap \mathcal{M}_2 = \{0\}$ or $\mathcal{M}_1 + \mathcal{M}_2 = \mathbb{C}^n$ (or both) holds. Indeed, assuming the contrary, let $\mathcal{M}_i'$ be a direct complement of $\mathcal{M}_1 \cap \mathcal{M}_2$ in $\mathcal{M}_i$, $i = 1, 2$, and let $\mathcal{M}$ be a direct complement of $\mathcal{M}_1 + \mathcal{M}_2$ in $\mathbb{C}^n$. Then we have a direct sum decomposition

$$\mathbb{C}^n = \mathcal{M}_1' \dotplus (\mathcal{M}_1 \cap \mathcal{M}_2) \dotplus \mathcal{M}_2' \dotplus \mathcal{M}$$

With respect to this decomposition, every $X \in V_1$ has a block matrix representation of the type

$$\begin{bmatrix} * & * & * & * \\ * & * & * & * \\ 0 & 0 & * & * \\ 0 & 0 & * & * \end{bmatrix}$$

[the zeros appear because of the $V_1$ invariance of $\mathcal{M}_1 = \mathcal{M}_1' \dotplus (\mathcal{M}_1 \cap \mathcal{M}_2)$], whereas every $Y \in V_2$ has a block matrix representation of the type

$$\begin{bmatrix} * & 0 & 0 & * \\ * & * & * & * \\ * & * & * & * \\ * & 0 & 0 & * \end{bmatrix}$$

[the zeros appear because of the $V_2$ invariance of $\mathcal{M}_2 = (\mathcal{M}_1 \cap \mathcal{M}_2) \dotplus \mathcal{M}_2'$]. So every matrix in $V_1 + V_2$ has a zero in the $(4, 2)$ block entry (and both the second and the fourth summands of the decomposition are nonzero by the contrary assumption), which contradicts the assumption that $V_1 + V_2 = M_{n,n}$.
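As a computational aside (not part of the original text), Burnside's criterion of the next section can be probed numerically: if the unital algebra generated by a set of $n \times n$ matrices has dimension less than $n^2$, then by Theorem 11.2.1 it admits a nontrivial invariant subspace. The naive sketch below grows a spanning set of words in the generators until the linear span stabilizes; it is exponential in the worst case but fine for tiny examples:

```python
import numpy as np

def algebra_dimension(gens, n):
    """Dimension of the unital algebra generated by the n x n matrices in
    `gens`: left-multiply a spanning set of words by the generators until
    the linear span of the vectorized matrices stops growing."""
    words = [np.eye(n)] + [np.asarray(G, dtype=float) for G in gens]
    while True:
        rank = np.linalg.matrix_rank(np.array([W.ravel() for W in words]))
        extended = words + [G @ W for G in gens for W in words]
        if np.linalg.matrix_rank(np.array([W.ravel() for W in extended])) == rank:
            return rank
        words = extended

n = 3
J = np.diag(np.ones(n - 1), k=1)         # the single Jordan block J_3(0)
print(algebra_dimension([J], n))         # -> 3 < 9: nontrivial invariant
                                         #    subspaces exist (Theorem 11.2.1)

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((n, n)), rng.standard_normal((n, n))
print(algebra_dimension([X, Y], n))      # -> 9: two generic matrices
                                         #    generate all of M_{n,n}
```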
11.3 PROOF OF THEOREM 11.2.1

We start with auxiliary results. A subset $Q$ of an algebra $U$ of $n \times n$ matrices is called an ideal if $Q$ is a subalgebra (that is, $A_1, A_2 \in Q$ implies $A_1 + A_2 \in Q$, $A_1 A_2 \in Q$, and $\alpha A_1 \in Q$ for every complex number $\alpha$) and, in addition, $AB$ and $BA$ belong to $Q$ as long as $A \in U$ and $B \in Q$. Trivial examples of ideals are $Q = \{0\}$ and $Q = U$.

Lemma 11.3.1

The algebra $M_{n,n}$ has no nontrivial ideals.

Proof. Let $Q$ be a nonzero ideal in $M_{n,n}$, and let $A \in Q$, $A \ne 0$. It is easily seen that for every pair of indices $(i, j)$ ($1 \le i, j \le n$) there are matrices $G_{ij}$, $H_{ij}$ such that $G_{ij} A H_{ij}$ has a one in the $(i, j)$ entry and zeros elsewhere. Now any $n \times n$ matrix $B = [b_{ij}]_{i,j=1}^{n}$ can be written as

$$B = \sum_{i,j=1}^{n} b_{ij}\, G_{ij} A H_{ij}$$

and thus belongs to $Q$. Hence $Q = M_{n,n}$. □

Now let $U$ be an algebra of $n \times n$ matrices ($n \ge 2$) that has no nontrivial invariant subspaces. We prove that $U = M_{n,n}$, thereby proving Theorem 11.2.1. The first observation is that without loss of generality we can assume $I \in U$. Indeed, consider the algebra $\tilde U = \{A + \alpha I \mid A \in U, \alpha \in \mathbb{C}\}$. Obviously, $\tilde U$ has no nontrivial invariant subspaces either. Also, $U$ is an ideal in $\tilde U$. Hence, if we know already that $\tilde U = M_{n,n}$, then Lemma 11.3.1 implies that either $U = M_{n,n}$ or $U = \{0\}$. But the latter case is excluded by the definition of $U$ and the condition $n \ge 2$. So it is assumed that $I \in U$.

Lemma 11.3.2

For every nonzero vector $x \in \mathbb{C}^n$ and every $y \in \mathbb{C}^n$ there exists a matrix $A \in U$ such that $Ax = y$.

Proof. The set $\mathcal{M} = \{Ax \mid A \in U\}$ is an invariant subspace for $U$. This subspace is nonzero because $x = I \cdot x$ is a nonzero vector in $\mathcal{M}$ (recall that $I \in U$). By our assumption on $U$, the subspace $\mathcal{M}$ coincides with $\mathbb{C}^n$. Hence for every $y \in \mathbb{C}^n$ there exists an $A \in U$ such that $y = Ax$. □

Lemma 11.3.3

The only matrices that commute with every matrix in $U$ are the scalar multiples of $I$.

Proof. Let $S \in M_{n,n}$ be such that $SA = AS$ for every $A \in U$. Let $\lambda_0$ be an eigenvalue of $S$ with corresponding eigenvector $x_0$. Then for every $A \in U$ we have
$$S A x_0 = A S x_0 = \lambda_0 A x_0 \tag{11.3.1}$$

By Lemma 11.3.2, for every $y \in \mathbb{C}^n$ there is an $A \in U$ with $A x_0 = y$. So equality (11.3.1) means that $S = \lambda_0 I$. □

Lemma 11.3.4

If $x_1$ and $x_2$ are linearly independent vectors in $\mathbb{C}^n$, then for every pair of vectors $y_1, y_2 \in \mathbb{C}^n$ there exists a matrix $A$ from $U$ such that $A x_1 = y_1$ and $A x_2 = y_2$.

Proof. It is sufficient to show that there exist $A_1, A_2 \in U$ such that $A_1 x_1 \ne 0$, $A_1 x_2 = 0$ and $A_2 x_1 = 0$, $A_2 x_2 \ne 0$. Indeed, we may then use Lemma 11.3.2 to find $B_1, B_2 \in U$ with $B_1 A_1 x_1 = y_1$, $B_2 A_2 x_2 = y_2$. Hence

$$(B_1 A_1 + B_2 A_2) x_i = y_i , \qquad i = 1, 2$$

We now prove the existence of $A_1$. (The existence of $A_2$ is proved similarly.) Arguing by contradiction, assume that $A x_2 = 0$ implies $A x_1 = 0$ for every $A \in U$. Then one can define a transformation $T\colon \mathbb{C}^n \to \mathbb{C}^n$ by the requirement that $T A x_2 = A x_1$ for all $A \in U$. Indeed, if $A x_2 = B x_2$ for some $A$ and $B$ in $U$, then $(A - B)x_2 = 0$ and thus also $(A - B)x_1 = 0$, which means $A x_1 = B x_1$. So $T$ is correctly defined. Further, $\{A x_2 \mid A \in U\} = \mathbb{C}^n$ by Lemma 11.3.2; hence $T$ is defined on the whole of $\mathbb{C}^n$. Now for any $A$ and $B$ in $U$ we have

$$T A B x_2 = A B x_1 = A T B x_2$$

and since $\{B x_2 \mid B \in U\} = \mathbb{C}^n$, we find that $TA = AT$ for all $A \in U$. By Lemma 11.3.3, $T = \alpha I$ for some $\alpha \in \mathbb{C}$. Therefore, $A(x_1 - \alpha x_2) = 0$ for all $A \in U$. But this contradicts Lemma 11.3.2 (note that $x_1 - \alpha x_2 \ne 0$ because $x_1$ and $x_2$ are linearly independent). □

We say that an algebra $V$ of $n \times n$ matrices is $k$ transitive if for every set of $k$ linearly independent vectors $x_1, \dots, x_k$ in $\mathbb{C}^n$ and every set of $k$ vectors $y_1, \dots, y_k$ in $\mathbb{C}^n$ there exists a matrix $A \in V$ such that $A x_i = y_i$, $i = 1, \dots, k$. Evidently, every $k$-transitive algebra is $p$ transitive for $p \le k$. Lemma 11.3.4 says that the algebra $U$ is 2 transitive.

Proof of Theorem 11.2.1. In view of Lemma 11.3.4, it is sufficient to prove that every 2-transitive algebra $V$ of $n \times n$ matrices is $n$ transitive. Assume by induction that $V$ is $k$ transitive; we will prove that $V$ is $(k+1)$ transitive (here $2 \le k \le n-1$). So let $x_1, \dots, x_{k+1}$ be linearly independent vectors in $\mathbb{C}^n$. It will suffice to verify that for every $i$ ($1 \le i \le k+1$) there exists a matrix $A_i \in V$ such that $A_i x_i \ne 0$ and $A_i x_j = 0$ for $j \ne i$. [Indeed, for given $y_1, \dots, y_{k+1} \in \mathbb{C}^n$ the 1 transitivity
of $V$ implies the existence of $B_i \in V$ such that $B_i A_i x_i = y_i$; then for $A = \sum_{i=1}^{k+1} B_i A_i$ we have $A x_i = y_i$, $i = 1, \dots, k+1$.] We will prove the existence of $A_{k+1}$ (for $A_i$, $1 \le i \le k$, one has simply to permute the indices).

Suppose that no such $A_{k+1}$ exists; that is, $A x_1 = \cdots = A x_k = 0$ with $A \in V$ implies $A x_{k+1} = 0$. Consider the algebra of $2n \times 2n$ matrices

$$V^{(2)} = \left\{ \begin{bmatrix} A & 0 \\ 0 & A \end{bmatrix} \;\middle|\; A \in V \right\}$$

It turns out (because of the 2 transitivity of $V$) that any $V^{(2)}$-invariant subspace is one of the subspaces $\{0\}$, $\mathbb{C}^{2n}$, $\{0\} \oplus \mathbb{C}^n$, or

$$\left\{ \begin{bmatrix} x \\ \lambda x \end{bmatrix} \;\middle|\; x \in \mathbb{C}^n \right\} \qquad \text{for some } \lambda \in \mathbb{C}$$

Indeed, a $V^{(2)}$-invariant subspace $\mathcal{M}$ (which we can assume to be nonzero) is a sum of cyclic $V^{(2)}$-invariant subspaces: $\mathcal{M} = \sum_{i=1}^{p} \mathcal{M}_i$, where

$$\mathcal{M}_i = \left\{ \begin{bmatrix} A & 0 \\ 0 & A \end{bmatrix} \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} \;\middle|\; A \in V \right\}$$

Fix an index $i$, and suppose first that $x_{i1}$ and $x_{i2}$ are linearly independent. For any $n \times n$ matrix $B$, by the assumption of 2 transitivity of $V$, we have $B x_{i1} = A x_{i1}$, $B x_{i2} = A x_{i2}$ for some $A \in V$; hence $\mathcal{M}_i$ is $M_{n,n}^{(2)}$ invariant, where

$$M_{n,n}^{(2)} = \left\{ \begin{bmatrix} B & 0 \\ 0 & B \end{bmatrix} \;\middle|\; B \in M_{n,n} \right\}$$

Now, because of the obvious 2 transitivity of $M_{n,n}$, we find that $\mathcal{M}_i = \mathbb{C}^n \oplus \mathbb{C}^n$. Assume now that $x_{i1}$ and $x_{i2}$ are linearly dependent. Then the 1 transitivity of $V$ implies again that $\mathcal{M}_i$ is $M_{n,n}^{(2)}$ invariant. If $x_{i1} = 0$, we get $\mathcal{M}_i = \{0\} \oplus \mathbb{C}^n$, and if $x_{i2} = \lambda x_{i1}$ for some $\lambda \in \mathbb{C}$, we get

$$\mathcal{M}_i = \left\{ \begin{bmatrix} x \\ \lambda x \end{bmatrix} \;\middle|\; x \in \mathbb{C}^n \right\}$$

Consequently, $\mathcal{M} = \sum_{i=1}^{p} \mathcal{M}_i$ is equal to $\mathbb{C}^{2n}$ except in two cases: (1) $x_{i1} = 0$ for all $i = 1, \dots, p$; (2) $x_{i2} = \lambda x_{i1}$, $i = 1, \dots, p$, with the same $\lambda \in \mathbb{C}$. In the first case $\mathcal{M} = \{0\} \oplus \mathbb{C}^n$, and in the second case

$$\mathcal{M} = \left\{ \begin{bmatrix} x \\ \lambda x \end{bmatrix} \;\middle|\; x \in \mathbb{C}^n \right\}$$

Now we return to the proof of the existence of $A_{k+1}$. By the induction hypothesis, for each $j$ ($1 \le j \le k$) there is some $C_j \in V$ with $C_j x_j \ne 0$ and $C_j x_i = 0$ for $i \ne j$, $1 \le i \le k$. The subspace

$$\mathcal{M}_j = \left\{ \begin{bmatrix} A & 0 \\ 0 & A \end{bmatrix} \begin{bmatrix} C_j x_j \\ C_j x_{k+1} \end{bmatrix} \;\middle|\; A \in V \right\} , \qquad j = 1, \dots, k$$

is $V^{(2)}$ invariant; therefore (according to the fact proved in the preceding
paragraph), there exists a complex number $\alpha_j$ such that $A C_j x_{k+1} = \alpha_j\, A C_j x_j$ for all $A \in V$. The induction hypothesis implies that

$$\left\{ \begin{bmatrix} A & & \\ & \ddots & \\ & & A \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_k \end{bmatrix} \;\middle|\; A \in V \right\} = \mathbb{C}^{kn}$$

(with $k$ diagonal copies of $A$), and the assumption that $A x_{k+1} = 0$ whenever $A x_j = 0$ for $j = 1, \dots, k$ shows that a mapping $T\colon \mathbb{C}^{kn} \to \mathbb{C}^n$ is unambiguously defined by

$$T(A x_1 \oplus \cdots \oplus A x_k) = A x_{k+1} , \qquad A \in V$$

Obviously, $T$ is linear. Further, for $A \in V$ and $j = 1, \dots, k$ we have (where the term $A C_j x_j$ appears in the $j$th place)

$$T(0 \oplus \cdots \oplus A C_j x_j \oplus \cdots \oplus 0) = T(A C_j x_1 \oplus \cdots \oplus A C_j x_k) = A C_j x_{k+1} = \alpha_j\, A C_j x_j$$

Since $C_j x_j \ne 0$, the subspace $\{A C_j x_j \mid A \in V\}$ coincides with $\mathbb{C}^n$ by the 1 transitivity of $V$. So the linearity of $T$ gives

$$T(y_1 \oplus \cdots \oplus y_k) = \sum_{j=1}^{k} \alpha_j y_j , \qquad y_1, \dots, y_k \in \mathbb{C}^n$$

Then, for $A \in V$,

$$A\Bigl(x_{k+1} - \sum_{j=1}^{k} \alpha_j x_j\Bigr) = A x_{k+1} - T(A x_1 \oplus \cdots \oplus A x_k) = A x_{k+1} - A x_{k+1} = 0$$

Hence $\{x \mid Ax = 0 \text{ for all } A \in V\}$ is a nontrivial $V$-invariant subspace (it contains the nonzero vector $x_{k+1} - \sum_{j=1}^{k} \alpha_j x_j$). This contradicts the 1 transitivity of $V$. □

11.4 REFLEXIVE LATTICES

Let $\Lambda$ be a lattice of subspaces of $\mathbb{C}^n$. The set of all $n \times n$ matrices $A$ such that $A\mathcal{L} \subseteq \mathcal{L}$ for every $\mathcal{L} \in \Lambda$, denoted $\operatorname{Alg}(\Lambda)$, is an algebra. Indeed, if $A, B \in \operatorname{Alg}(\Lambda)$, then

$$(A + B)\mathcal{L} \subseteq A\mathcal{L} + B\mathcal{L} \subseteq \mathcal{L} , \qquad (AB)\mathcal{L} = A(B\mathcal{L}) \subseteq A\mathcal{L} \subseteq \mathcal{L} , \qquad (\alpha A)\mathcal{L} = \alpha(A\mathcal{L}) \subseteq \mathcal{L} \quad (\alpha \in \mathbb{C})$$
for every subspace $\mathcal{L} \in \Lambda$. On the other hand, for an algebra $V$ of $n \times n$ matrices, the set $\operatorname{Inv}(V)$ of all $V$-invariant subspaces of $\mathbb{C}^n$ is easily seen to be a lattice of subspaces [i.e., $\mathcal{L}, \mathcal{M} \in \operatorname{Inv}(V)$ implies $\mathcal{L} + \mathcal{M} \in \operatorname{Inv}(V)$ and $\mathcal{L} \cap \mathcal{M} \in \operatorname{Inv}(V)$].

The following properties of $\operatorname{Alg}(\Lambda)$ and $\operatorname{Inv}(V)$ are immediate consequences of the definitions.

Proposition 11.4.1

(a) If $\Lambda_1$ and $\Lambda_2$ are two lattices of subspaces of $\mathbb{C}^n$, and $\Lambda_1 \subseteq \Lambda_2$, then $\operatorname{Alg}(\Lambda_1) \supseteq \operatorname{Alg}(\Lambda_2)$.
(b) If $V_1$ and $V_2$ are algebras of $n \times n$ matrices and $V_1 \supseteq V_2$, then $\operatorname{Inv}(V_1) \subseteq \operatorname{Inv}(V_2)$.
(c) $\operatorname{Inv}(\operatorname{Alg}(\Lambda)) \supseteq \Lambda$.
(d) $\operatorname{Alg}(\operatorname{Inv}(V)) \supseteq V$.

Let us check property (c), for example. Assume $\mathcal{L} \in \Lambda$; then $A\mathcal{L} \subseteq \mathcal{L}$ for every $A \in \operatorname{Alg}(\Lambda)$. Hence $\mathcal{L}$ is $\operatorname{Alg}(\Lambda)$ invariant; that is, $\mathcal{L} \in \operatorname{Inv}(\operatorname{Alg}(\Lambda))$.

EXAMPLE 11.4.1. Let $\Lambda$ be the chain

$$\{0\} \subset \operatorname{Span}\{e_1\} \subset \operatorname{Span}\{e_1, e_2\} \subset \cdots \subset \operatorname{Span}\{e_1, \dots, e_{n-1}\} \subset \mathbb{C}^n$$

Then $\operatorname{Alg}(\Lambda)$ is the algebra of all upper triangular matrices. □

EXAMPLE 11.4.2. Let $\Lambda$ be the set of subspaces $\operatorname{Span}\{e_i \mid i \in K\}$, where $K$ runs over all subsets of $\{1, \dots, n\}$. Clearly $\Lambda$ is a lattice. The algebra $\operatorname{Alg}(\Lambda)$ is easily seen to be the algebra of all diagonal matrices. □

EXAMPLE 11.4.3. For a fixed subspace $\mathcal{M} \subseteq \mathbb{C}^n$, let $\Lambda$ be the lattice of all subspaces that are contained in $\mathcal{M}$. Then $\operatorname{Alg}(\Lambda)$ is the algebra of all transformations $A$ having the form

$$A = \begin{bmatrix} aI & * \\ 0 & * \end{bmatrix}$$

with respect to the direct sum decomposition $\mathbb{C}^n = \mathcal{M} \dotplus \mathcal{N}$ (for a fixed direct complement $\mathcal{N}$ of $\mathcal{M}$). □

EXAMPLE 11.4.4. Let $V$ be the algebra of polynomials $\sum_{j=0}^{s} a_j A^j$, $a_j \in \mathbb{C}$, where $A\colon \mathbb{C}^n \to \mathbb{C}^n$ is a fixed linear transformation. Then $\operatorname{Inv}(V)$ is the lattice of all $A$-invariant subspaces. □

EXAMPLE 11.4.5. Let $A\colon \mathbb{C}^n \to \mathbb{C}^n$ be a fixed transformation, and let $V$ be the algebra of all transformations that commute with $A$. Then $\operatorname{Inv}(V)$ is the lattice of all $A$-hyperinvariant subspaces. □

Note that

$$\operatorname{Alg}(\operatorname{Inv}(\operatorname{Alg}(\Lambda))) = \operatorname{Alg}(\Lambda) \tag{11.4.1}$$
348 Algebras of Matrices aud Invariant Subspaces for every lattice A of the subspaces in <p". Indeed, the inclusion C in equation (11.4.1) follows from (c) and (a). To prove the opposite inclusion, let A E Alg(A). Then any subspace M belonging to Inv(Alg(A)) is invariant for every transformation in Alg(A); in particular, M is A invariant. This shows that A E Alg(Inv(Alg(A))). Similarly, one proves that Inv(Alg(Inv(l/))) = Inv(K) (11.4.2) for every algebra V of transformations <p" —► <p". A lattice A of subspaces in <p" is called reflexive if Inv(Alg(A)) = A. Equality (11.4.2) shows, for example, that any lattice of the form Inv(V) for some algebra V is reflexive. Let us give an example of a nonreflexive lattice. example 11.4.6. Let A be the following lattice of subspaces in <p2: {0}, J£ = Span{e2}, M = Span{e,}, Jf = Span{e, + e2}, <p2. Let us find the alge- \ a b~\ bra Alg(A). The 2x2 matrix A = \ has invariant subspaces Z£ and M if and only if b = c = 0. Further, X is A invariant if and only if a + b = c + d. So Alg(A) = {[o °]|«G<f} = W|ae<p} and Lat(Alg(A)) consists of all subspaces in <p2. □ Many results are known about sufficient conditions for reflexivity of a lattice of subspaces. Often the key ingredient in such conditions is distribu- tivity. Recall the definition of a distributive lattice of subspaces given in Section 9.6. Theorem 11.4.2 A distributive lattice of subspaces in <p" is reflexive. Conversely, every finite reflexive lattice of subspaces is distributive. The proof of Theorem 11.4.2 is beyond the scope of this book, and we refer the reader to the original papers by Johnson (1964) and Harrison (1974) for the full story. Here, we shall only prove two particular cases in the form of Theorems 11.4.3 and 11.4.4. Theorem 11.4.3 A complete chain of subspaces {0} C J,C Jt2C- • ■ C Mn_x dp", dim M^ i; i*= 1,. .. ,«-l is reflexive.
Reflexive Lattices 349 Proof. Let/,, ...,/„ be a basis in <p" such that Span{/,, ...,/} = Mt, i = 1,. . . , n - 1, and write linear transformations as matrices with respect to this basis. Example 11.4.1 shows that Alg(A) consists of all upper triangular matrices. As the linear transformation 0 10- 0 0 1- 0 0 • 0 • 0 1 • 0 K obviously belongs to AlgA, and its only invariant subspaces are {0}, M,, i = 1,...,«- 1, and (p", we have Inv(Alg(A)) C A. Since the reverse inclusion is clear, the conclusion of Theorem 11.4.3 follows. D The next theorem deals with lattices that are as unlike chains as possible. A lattice A of subspaces in <p" is called a Boolean algebra if it is distributive and for every M E.A there is a unique complement M' (i.e., M + M' = <p", ini' = {0}) that belongs to A. We say that a nonzero subspace % E A is an atom if there are no subspaces for the lattice A strictly between JC and {0}. The Boolean algebra A is called atomic if any M E A is a sum of all atoms J{ contained in M. A typical example of an atomic Boolean algebra of subspaces is A= {Span{*, | ;'E £}, where E is any subset in {1,2,..., n}}, and *,,. . . , xn is a fixed basis in (p". Theorem 11.4.4 Every atomic Boolean algebra A of subspaces of <p" is reflexive. Proof. Let K be the set of all atoms in A, and for every 3C C K let Px be the projector on 3C along the complement 3C' of 9if in the lattice A. We shall show that A = Inv(V), where V is the algebra generated by the transformations of type PXAPX, where A: <p"-» <p" and 3if E K. In other words, V consists of all linear combinations of transformations of type Pycfi i Pjcfx^ 2 J°ar2' " * PycJ^ m PXm where Ak\ <p"-*■ <p", and 9if,,. . . , 9ifm are atoms in A. Let if be an atom in A. For any atom %, we have either if = % or $<Z3C. (This follows from the distributivity of A: if - se n (3? u 3r) = (if n w) u (i? n 3Z"'); as if is an atom, either iffl^ = iforifn^' = if holds.) In the former case Im PXAPX C if for every transformation A: <p"-> <£"", and in the latter
350 Algebras of Matrices and Invariant Subspaces case if CKer PXAPX. In either case if is PXAPX invariant. Hence i?£ lnv(V). Now every Jt £ A is a sum of the (finitely many) atoms contained in Jt. Hence Jt Elnv(K). In other words, ACInv(V). To prove the reverse inclusion, it is convenient to use the following fact: if X is an atom in A and Jt &lnv(V), then either X C M or JtCX'. Indeed, suppose that Jt is not contained in X', so there exists a vector / £ <p" such that/ £ Jt~-X'. Since Pxf^0, it follows that every vector x in X has the form APxf for some transformation A: <p"—»<p". Then also x = PxAPxf. As /€ Jt and Jt £ Inv(V), we have x £ Jt, that is, X C Jt. Return to the proof of the inclusion Inv(V) C A. Let Jt £ Inv(V), and let Jt0 £ A be the sum of all the atoms in A that are contained in Jt. Also, let Jt, £ A be the intersection of all the complements of atoms in A such that these complements contain Jt. Obviously Jt<)CJtCJtl (11.4.3) Since A is atomic, the complement Jt'Q of Jt0 is the sum of all atoms that are not contained in M. (Indeed, if an atom 9Hs contained in Jt'Q, then 3Sf is not contained in Ma and thus by the definition of Jt0, % is not contained in Jt. Conversely, if an atom % is not contained in Jt, then obviously % is not contained in M0, and since "3Cis an atom, it must be contained in Jt'0.) The fact proved in the preceding paragraph shows that Jt'0 is the sum of all the atoms JC with the property that %' D Jt. For any finite set Sif,,. . . , 9if of atoms with <3i'i D Jt, i = 1,. . . , p, we have (using the distributivity of A): (af1 + --- + af#,) + (af;n---nar;) = (ar, + • • • + xp + x\) n • • • n (ar, + • • • + xp + ar;> = <p" and (ar, + • • • + xp) n (X [ n • • • n X'p) = (ar, n ar; n • • • n ar;) + • • • + (ar,, n ar; + • • ■ + ar;> = {0} so actually ar, + • • • + ar, = (ar; n • • • n ar;)' This shows that Jt'Q = Jt[; hence Jt0 = Jtx. Combining this with (11.4.3), we see that Jt = Jt0 = Jtl and thus Jt £ A. □ 11.5 REDUCTIVE AND SELF-ADJOINT ALGEBRAS We have seen in Corollary 3.4.4 that the set lnv(A) of invariant subspaces of a transformation A: §"-*§" has the property that M £ lm(A) exactly when ML £ \xm{A) if and only if A is normal. This property makes it
Reductive and Self-Adjoint Algebras 351 natural to introduce the following definition: an algebra V of n x n matrices is called reductive if it contains / and for every subspace belonging to Inv(V) its orthogonal complement belongs to Inv(V) as well. Thus the algebra P(A) of all polynomials E™,0 a,/!', where A is a normal transformation, is reductive. This algebra P(A) has the property that X E. P(A) implies X* E P(A). Indeed, we have only to show that, for the normal transformation A, the adjoint A* is a polynomial in A. Passing, if necessary, to the orthonormal basis of eigenvectors of A, we can assume that A is diagonal: A = diag[A,, A2,. .. , Aj. Now let /(A) be a scalar polynomial satisfying the conditions /(AJ = A,, i = 1, . . . , n. Then clearly A* =f(A). The next theorem shows that this property of the reductive algebra P(A) is a particular case of a much more general fact. Theorem 11.5.1 An algebra V of nx n matrices with I E.V is reductive if and only if V is self-adjoint, that is, X E V implies X* E V. As a subspace M is A invariant if and only if Jt1 is A* invariant, it follows immediately that every self-adjoint algebra with identity is reductive. To prove the converse, we need the following basic property of invariant subspaces of reductive algebras. Lemma 11.5.2 Let V be a reductive algebra of n x n matrices, and let M{,... , Mm be a set of mutually orthogonal V-invariant subspaces such that C = Ml@---®Mm and for every i the set of restrictions {A\M \ A E V) coincides with the algebra M(Mt) of all transformations from Mt into Mr Then V is self-adjoint. Proof. We proceed by induction on m. For m = \, that is,Ml = <p" and V= M(<£■"), the lemma is obvious. So assume that the lemma is proved already for m — 1 subspaces, and we prove the lemma for m subspaces. It is convenient to distinguish two cases. In case 1, there exist distinct integers / and k between 1 and m and an algebraic isomorphism <p: M(JtJ)—*M(Mk) such that A\M = (p(A\M ) for every Ae.V. This means that ip is a one-to-one and onto map with the following properties: (a) <p(aA\Mi + 0B\M) = aA\Mk + 0A\Mk for every A, B E V and a, p E <p (b) 9{A\Mi • B\M) = A\Mk ■ B\Mk for every A,BEV (c) 9V\m) = AMk
352 Algebras of Matrices and Invariant Subspaces As dim M(Jtj) = (dim Jtj)2 is equal to dim M(Mk) = (dim Jtk)2, we have dim Jtt = dim Mk. We show first that there exists an invertible transformation S: Mj-*Mk such that (p{X) = SXS~l for every XEM(Mj). Note that <p takes rank 1 projectors into rank 1 projectors. Indeed, if P =PGM(Mj) and rankP = l, then (<p(P))2 = <p(P), so <p(P) is a projector. Moreover, the one-dimensional subspace {PXP \XElM(Mj)} C M(^;) is mapped by <p into the subspace {<p(P)Y9(P)\YeM(Mk)}CM(Mk) so the subspace {<p(P)Y<p(P) \ YE M(Jtk)} is also one-dimensional; hence rank <p(P) = 1. Now fix any nonzero vector/£ M;, and let j40: M ■-*Mi be the orthogonal projector on Span{/}. As (p(A0): Jlk—>Jlk is also a one- dimensional projector, there exists an invertible transformation S0: Jtj—>Mk such that <p(A0) = 5qJ405„'. (This follows from the fact that the Jordan form of any one-dimensional projector in (p" is the same: diag[l, 0,. . . , 0].) Define S: Jij->Jik by S(Af) = <p(A)S0f, A<EM{Mt) Let us show that this definition is correct. Indeed, if Atf=A2f, then (j4, - A2)A0 = 0. Consequently, (<p(j4,) - <p(A2))<p(A0) = 0, and since <p(A0) is a projector onto Span{50/}, we obtain (<p(/4,)- (p(A2))S0f = 0. In other words, Atf = A2f happens only if <p(j4,)S0/= (p(A2)S0. Hence 5 is correctly defined. Clearly, S is linear and onto. If (p(A)Sof = 0y then <p(A)<p(A0) = 0, which implies /4/40 = 0 and /4/=0. This shows that KerS={0}. Hence S is invertible. Finally, for every A,BEzM(Mj) we have S(AB)f= <p(A)<p(B)S0f = <p(A)SBf , and thus SAg - <p(A)Sg for every gEJtr Thus <p(j4) = 5y4S_1 for all A<EM{Mt). Next, we show that S can be taken to be unitary, that is, S ' = S*. Let M be a subspace in <p" consisting of all vectors of the form xl + • • • + xm, where i|Gi„...,j:m£ Mm and xk = S^. As for every A E V, it follows that J< is V invariant. Since V is reductive, M 1 is V invariant as well. A computation shows that
Reductive and Self-Adjoint Algebras 353 Jt1 = {xj + xk I xjEJtj,xk E.Mk, Xj = -S***} The fact that AM1 CM1 for all A G V implies that if Xj = -S*xk for Xj E Mj and j^ G Mk, then Acy = -S*Axk = -5M|^t = -5*5/lL S-lxk = S*SA\MS-lS*-ixj As {.<4|^ |j4GK} coincides with M{M ^ and in the preceding equality xt can be an arbitrary vector from Jtj, we obtain B = S*SBS~iS*~i for all BE M{Mj). By Proposition 9.1.6, 5*5 = A/ for some number A that must be positive because 5*5 is positive definite. Letting U = A" 5, we obtain a unitary transformation U such that <p(fi) = UBU'1 for all fi G M(Jtt). We next show that V\ ± is reductive. Indeed, \ttMCMk=Ml + --- + Mk„. + Mk+l + • • • + Mm be V| ^invariant. Then clearly X is V invariant, and by the reductive property of V, so is Jf1, and hence also Jf1 C\ Mk. It remains to notice that JfL D Jtk coincides with the orthogonal complement to JfL in Mk. By the induction hypothesis, V\M is self-adjoint. Therefore, for every matrix A = Al®-'-®Ak_l@Ak®---®AmeV the transformation A*l®---®A*k_i®A*k + l®---®A*m.MkL^JlkL belongs to V\M x. As for every fi = Bl ® ■■ ■ ® Bk ® ■ ■ ■ ® Bm G V we have Bk = UBtU~\ it follows that A*®---®A*k_l®UA*U~1®A*k+l®---®A*meV (11.5.1) But Ak = UAjU~i and t/ is unitary. So the transformation (11.5.1) is just A*. We have proved that V is self-adjoint (in case 1). Consider now case 2. For any pair of distinct integers ; and k between 1 and m, there is no algebraic isomorphism <p as in case 1. If for fixed j¥= k, A\M =0 implies A\M =0 for any AE.V and vice versa, then we can correctly define an algebraic isomorphism q>: M(Jtj)—> M(Jtk) by putting <p(A\M ) = A\Mk for all A G V (recall that V\M = M{Mj)). Thus our assumption in case 2 implies the following. For each pair /, k of distinct integers between 1 and m there exists a matrix A G V such that exactly one of the transformations A\M and A\M is zero. We now prove that there exists a matrix AEV such that A\M is different from zero for exactly one index /. Choose AEV different from zero so that the number p of indices /' (1 s/< m) with A\M 9^0 is minimal. Permuting
354 Algebras of Matrices and Invariant Subspaces Jtu . . . , Jtm if necessary, we can assume that A\M 7^0,... , A\M 5^0, A\M =0 for j> p. We must show that p - 1. Assume the contrary, that is, p > 1. Interchanging J<, and Jt p if necessary, we can assume that C|^ 5^0, ^- \m = 0 for some matrix C &V. Let ^, denote the set of all transformations B:Jtx-+Jtx such that B=B\Mi for some B6V with B|^ =0. The fact that V\M = A/(y^,) implies that /, is an ideal in A/(^,). Since Ce^, and C#0, Lemma 11.3.1 shows that actually ^, = M{Jtl). Similarly, the set /2 of all transformations B: Jtx—> Jty such that B = B\M for some B EV with #L =0»--->^L =0, 's a nonzero ideal in M{Jtl) and thus ^2 = M(Jt,). Now the identity transformation I: Jtl—*-Jtl belongs to both$x and $2. Therefore, there exist transformations Bj:Jtl-^Jtl (/Vl, /#p) and C-\ Jij—^Jlj (j = 2,3,... , p) such that B=lI®B2®---®Bp_l®0®Bp + 1®---@Bm and C™I®C2®---®Cp_l®Cp®0®---®0 belongs to V. Then also BC belongs to V, and (BC)\M = 0 for ;>/?. However, this contradicts the choice of p. So, indeed, p = 1. As the ideal ^2 constructed above coincides with M(Jtx), it follows that every matrix B from V is a sum of two transformations B\Mi and B| ,. Since VL = M(/^,) we find that V is self-adjoint provided V\ , is. But the I M | algebra V\ ± is easily seen to be reductive because V is. Now the self- adjointness of V\M follows from the induction hypothesis. Lemma 11.5.2 is proved completely. □ Now we are ready to prove the converse statement of Theorem 11.5.1. If V has no nontrivial invariant subspaces, then by Theorem 11.2.1 V= Mn n, and obviously V is self-adjoint. If V has nontrivial invariant subspaces, then it has a minimal one, say, Jtl. As V is reductive, Jt I is also V invariant, and the restriction V\ml is reductive. If V\M± is not the algebra of all transformations Jtx —*Jt\, then there exist a minimal nontrivial V-invariant sub- space Jt2CJti. Proceeding in this manner, we obtain a sequence of mutually orthogonal K-invariant subspaces Jtx,. . . , Mm such that $" = Jtl+--- + Mm and for each ;' there are no nontrivial V-invariant subspaces in Jt y. By Theorem 11.2.1 the restriction V\M (/' = 1,. . . , m) coincides with the algebra of all transformations M •-» Jt.. It remains to apply Lemma 11.5.2.
Exercises 11.6 EXERCISES 355 11.1 Prove or disprove that the following sets of n x n matrices are algebras: (a) Upper triangular Toeplitz matrices: ' a\ 0 -0 itrice ai ai 0 s: fli «3 a2 0 a2 a„ - a„ i a, - » an-n flye<p (1) L-dL a_ (c) Circulant matrices: a, a2 a„ a, (d) Companion matrices: 0 1 0 0 0 1 0 0 0 Lfl0 Ai «z fl,e<p (2) fl„ -| fl>e«p (3) fl;e<p (4) (e) Upper triangular matrices [fll7]"y=, where fl/y = 0 if i >/. 11.2 Prove or disprove that the following sets of nk x nk matrices are algebras: (a) Block upper triangular Toeplitz matrices (1), where ay are k x k matrices, / = 1, . . . , n. (b) Block Toeplitz matrices (2), where ay are k x k matrices, /'=—« + l,...,n — 1. (c) Block circulant matrices (3), where a- are k x k matrices,
356 Algebras of Matrices and Invariaut Subspaces (d) Block upper triangular matrices [a,,]"i=l, where aly are k x k matrices and atj = 0 if / > /. (e) Matrices of type r o 0 0 0 Lfl„ 0 0 1 0 a„„ J where atj are k x k matrices. 11.3 Show that the set of all n x n matrices of type r, 0 0 ••• 0 b„ 0 a2 0 0 0 a. 0 b2 0 \-bl 0 0 0 0 a.J is an algebra. Find all invariant subspaces of this algebra. 11.4 Let A be an n x n matrix, (a) Show that the set is not necessarily an algebra. (b) Prove that the closure of Q, that is, the set of all n x nmatricesXfor which there exists a sequence {Xm}2=] with Xm E Q for m = 1,2,... and limm^0O Xm = X, is an algebra with identity. (c) Describe all invariant subspaces of the closure of Q. 11.5 Show that the algebra of all n x n upper triangular Toeplitz matrices and the algebra of all n x n upper triangular matrices have exactly the same lattice of invariant subspaces. 11.6 Show that the algebra of all upper triangular n x n matrices contains any algebra A for which lnv(yl) = {{0}( Span{e,},. .. ,Span{e,,.. ..e,,.,}, <p"} 11.7 Show that there is no algebra A with identity strictly contained in the algebra UT(n) of upper triangular Toeplitz matrices for which Inv(yl) = {{0}, Spanfe,},. . . , Span{e,, . . . , e^,}, <p"}
Exercises 357 11.8 Prove that the algebra U(n) of n x n upper triangular matrices is the unique reflexive algebra for which the lattice of all invariant sub- spaces is the chain {0} CSpan{e,} CSpan{e,, e2} C • • • CSpan{e,,. . . , e„_,} C <p" (5) 11.9 Show that there exist n different algebras V,,. . . , V„ whose set of invariant subspaces coincides with (5) and for which UT(n) = V1CV2C---CV„ = U(n) 11.10 Find all invariant subspaces of the algebra of all In x 2« matrices of type A fil C D\ where A, B, C, and D are upper triangular matrices. 11.11 As Exercise 11.10 but now, in addition, B and C have zeros along the main diagonal. 11.12 Find all invariant subspaces of the algebra of all In x 2« matrices A B ] , where A, B, C, and D are n x n circulant matrices. 11.13 Let A be an n x n matrix that is not a scalar multiple of the identity. Find a nontrivial invariant subspace for the algebra of all matrices that commute with A. Does there exist such a subspace of dimension 1? 11.14 Let A be an n x n matrix and 2 Ct/L1 ; = 0 «n e<p be the algebra of polynomials in A. Give necessary and sufficient conditions for reflexivity of V in terms of the structure of the Jordan form of A. 11.15 Indicate which of the following algebras are reflexive: (a) n x n upper triangular Toeplitz matrices. (b) n x n upper triangular matrices. (c) n x n circulant matrices. (d) nk x nk block circulant matrices (with k x k blocks). (e) nk x nk block upper triangular matrices (with k x k blocks). (f) nk x nk block upper triangular Toeplitz matrices (with k x k blocks). (g) the algebra from Exercise 11.3.
358 Algebras of Matrices and Invariant Subspaces 11.16 Let Q be as in Exercise 11.4. When is the closure of Q a reflexive algebra? 11.17 Given a chain of subspaces {0}ci1C'--cij,C((;" (6) construct reflexive and nonreflexive algebras whose set of invariant subspaces coincides with (6). 11.18 Let i„..,,i,bea basis in <p", and let A be the minimal lattice of subspaces that contains Span{j:1},. . . ,Span{j:n}. Prove that there exists a unique algebra V for which A = Inv(V). Is V reflexive? 11.19 Let V be an algebra of n x n matrices without identity and such that A" = 0 for every AEV. Prove that A XA2- ■ -An= 0 for every «-tuple of matrices Ax,. . . , A from V. (Hint: Use Theorem 11.2.2.)
Chapter Twelve Real Linear Transformations In this chapter we review the basic facts concerning invariant subspaces for transformations A: $"—*$", focusing mainly on those results that are different (or their proofs are different) in the real case, or cannot be obtained as immediate corollaries, from the corresponding results for transformations from <p" into <p". We note here that the applications presented in Chapters 5, 7, and 8 also hold in the real case. That is, applications to matrix polynomials E;'=0 A;j4; with real n x n matrices Aj and to rational matrix functions W(\) whose values are real n x n matrices for the real values of A that are not poles of W(\). In fact, the description of multiplication and divisibility of matrix polynomials and rational matrix functions in terms of invariant subspaces (as developed in Chapters 5 and 7) holds for matrices over any field. This remark applies for the linear fractional decompositions of rational matrix functions as well. In contrast, the Brunovsky canonical form (Section 6.2) is not available in the framework of real matrices, so all the results of Chapter 6 that are based on the Brunovsky canonical form fail, in general, in this context. Also, the results of Chapter 11 do not generally hold in the context of finite-dimensional algebras over the field of real numbers. 12.1 DEFINITION, EXAMPLES, AND FIRST PROPERTIES OF INVARIANT SUBSPACES Let A: Jf?"—> $" be a linear transformation. As in the case of linear transformations on a complex space, we say that a subspace Jt C J|f" is invariant for A (or A invariant) if Ax G Jt for every xE.Jt. The whole of $" and the zero subspace are trivially A invariant, and the same applies to Im A and Ker A. As in the complex case, one checks that all the nonzero 359
360 Real Linear Transformations invariant subspaces of the n x n Jordan block with real eigenvalue (considered as a transformation from $" into $" written as a matrix in the standard orthonormal basis e,,. . . , e„) are Span{e,,. . . , e^}, k = 1, . . . , n. Also, for the diagonal matrix A = diag[A,,. . . , A„], where A,,. . . , A„ are distinct real numbers, all the invariant subspaces are of the form Span{e, | i E K) with KC{1 «} (Span{<?, | i E 0} is interpreted as the zero subspace). In addition to these examples, the following example is basic and specially significant for real transformations. example 12.1.1. Let " a — T 0 0 0 L0 T a 0 0 0 0 1 0 a — j 0 0 0 •• 1 •• T • ■ a • • 0 •• 0 •■ • 0 • 0 • 0 • 0 1 0 a T where a and r are real numbers and r ¥^ 0. The size n of the matrix A is obviously an even number. It is easily seen that Span{e,, .. . , e2k}, k = 1,. . . ,«/2 are ^-invariant subspaces. It turns out that A has no other nontrivial invariant subspaces. Indeed, replacing A by A - a I, we can assume without loss of generality that a = 0. We prove that if M is an ^-invariant subspace and x = T.j=l ap^M with at least one of the real numbers a 2k_l and a2k different from zero, then M D Span{e,,. . . ,e2k}, and proceed by induction on k. In the case k - 1 we have a,e, + a2e2 E M and A(alel + a2e2) = ra2el - ra,e2 E M. The conditions t # 0 and a\ + a\ ¥* 0 ensure that both vectors e, and e2 are linear combinations of a,e, + a2e2 and Ta2et — ra1e2, and the assertion is proved for k = 1. Assuming that the assertion is proved for k - 1, let x = E;2*, ayey E M with «2*-i + alk ^ 0. A computation shows that the vector y = (A2 + j2)x belongs to Span{e,,. . . , e2k_2} and in the linear combination y = Ey2*j~2 P;ej at least one of the numbers j82Jk_3, j82t_2 is different from zero. Obviously, yE.M, so the induction assumption implies M DSpan{e,,. . . , e2k_2}. Hence a2k_le2k_l + a2ke2kE. M; as the difference Ax - (ra2ke2k_l - ra2k_le2k) belongs to Span{e,,. . . , e2k_2}, also ra2ke2k~i ~ Ta2k-ie2k e •&. Consequently, the vectors e2k_1 and e2k belong to M, and M D Span{e,,. . . , e2k). In particular, A has no odd-dimensional invariant subspaces. □
Defiuition, Examples, and First Properties of Invariant Subspaces 361 We say that a complex number A0 is an eigenvalue of A if det( A0/ - A) = 0. Note that we admit nonreal numbers as eigenvalues of the real transformation A. As before, the set of all eigenvalues of A will be called the spectrum of A and denoted by a(A). Since the polynomial det(A/- A) has real coefficients (as one can see by writing A in matrix form in some basis in ft"), it follows that the spectrum of A is symmetrical with respect to the real axis: if A0 is an eigenvalue of A, so is A0, and the multiplicity of A0 as a zero of det( A/ - A) is equal to that of A0. Not every transformation A: ft"-^ ft" has real eigenvalues. For instance, in Example 12.1.1 the eigenvalues of A are a + ir and a - ir. However, if n is odd, then A must have at least one real eigenvalue. Indeed, det(A/ - A) is a monic polynomial of degree n with real coefficients; hence for n odd det (A/- A) has real zeros. This implies the following fact (which has already been observed in the case of Example 12.1.1). Proposition 12.1.1 If the transformation A: ft"-* ft" has no real eigenvalues, then A has no odd-dimensional invariant subspaces. Proof. If M C ft" were an odd-dimensional /4-invariant subspace, the restriction A\M would have a real eigenvalue, which contradicts the fact that A has no real eigenvalues. (As in the complex case, the eigenvalues of any restriction A\^ to an /1-invariant subspace are necessarily eigenvalues of A.) □ The Jordan chains for real transformations are defined in the same way as for complex transformations: vectors x0,. . . , xk E ft" form a Jordan chain of the transformation A: #"—» ft" corresponding to the eigenvalue A0 of A if x0 t^O and Ax0 = A0;c0; Axf - A0;Cy = xj_i, j = 1,. . . , k. The vector x0 is called an eigenvector. The eigenvalue A0 for which a Jordan chain exists must obviously be real. Since not every real transformation has real eigenvalues, it follows that there exist transformations A: ft"-* ft" without Jordan chains (and in particular without eigenvectors). On the other hand, for every real eigenvalue A0 of A: ft"-* ft" there exists an eigenvector (which is any nonzero vector from Ker( A0/ - A) C ft"). In particular, A has eigenvectors provided n is odd. As we have seen (e.g., in Example 12.1.1), not every real transformation has one-dimensional invariant subspaces. In contrast, two-dimensional invariant subspaces always exist, as shown in the following proposition. Proposition 12.1.2 Any transformation A: ft"—* ft" with n >2 has at least one two-dimensional invariant subspace.
362 Real Linear Transformations Proof. Assume first that A has a pair of nonreal eigenvalues a + h, a - h (or, t are real, t ¥^ 0). Then 0 = det((cr + it)/ - A) det((cr - h)I -A) = det((cr2 + t2)/ - 2aA + A2) Let * £ ft" - {0} be such that [(a2 + r2)I-2aA + A2]x = 0 (12.1.1) Then clearly the subspace Jt = Span{x, Ax) is .4-invariant. Further, M cannot be one-dimensional because otherwise Ax = fix for some p G ft, which in view of equality (12.1.1) would imply /j,2 - 2/jlct + (a2 + t2) = 0, or (fj. — a)2 + t2 = 0, which is impossible since t ^ 0. If A has no nonreal eigenvalues, then (leaving aside the trivial case when A is a scalar multiple of /) the subspace Span{;c, y), where x and y are eigenvectors of A corresponding to different eigenvalues, is two-dimensional and A invariant. □ It is clear now that Theorem 1.9.1 is generally false for real transformations. The next result is the real analog of that theorem. Theorem 12.1.3 Let A: ft" —> ft" be a transformation and assume that det( A/ - A) has exactly s real zeros (counting multiplicities). Then there exists an orthonormal basis *,,. . . , xn in ft" such that, with respect to this basis, the transformation A has the form [a ]"_i = l where all the entries atj with i > j are zeros except for fls + 2.j+l' fljM,i+3' • • • ' an,n-l- So, the matrix [a,;]"/=1 is "almost" upper triangular. Proof. Apply induction on n. If A has a real eigenvalue, then use the proof of Theorem 1.9.1. If A has no real eigenvalues, then pick a two- dimensional /1-invariant subspace (which exists by Proposition 12.1.2) with an orthonormal basis x, y. Write A as the 2 x 2 block matrix with respect to the orthogonal decomposition <p" = M 4- M ±: -ft" %} and apply the induction hypothesis to the transformation A22: Mx —*M1. a It follows from Theorem 12.1.3 that a transformation A: ft"-* ft" with det( A/ - A) having s real zeros has a chain of p + 1 = \(n + s) + 1 invariant subspaces:
Root Subspaces and the Real Jordan Form 363 {0} = MoCMlC---CMp = $n (Observe that n - s is the number of nonreal zeros of det( A/ - A). So n - s and n + s are even numbers.) We leave it to the reader to verify that \(n + s) + 1 is the maximal number of elements in a chain of ^-invariant subspaces. We say that a transformation A: ft"—> tjt" is self-adjoint if (Ax, y) — (x, Ay) for every x, y e J({", [As usual, (-, •) stands for the standard scalar product in $"•] In other words, A is self-adjoint if A = A*. Also, a transformation A is called unitary if A* = /4-1 and normal if /4j4* = A* A. Note that in an orthonormal basis a self-adjoint transformation is represented by a symmetric matrix, and a unitary transformation is represented by an orthogonal matrix. (Recall that a real matrix U is called orthogonal if UUT=UTU = I.) For normal transformations the "almost" triangular form of Theorem 12.1.3 is actually "almost" diagonal: Theorem 12.1.4 Let A be as in Theorem 12.1.3 and assume, in addition, that A is normal. Then there exists an orthonormal basis in ft" with respect to which A has the matrix form [fll7]"y=,, where fl,y = 0 for i^j except for as + 2_s+l, fl»+l.j + 2' • • • ' an.n-l> an-l.n- Proof. Use an orthonormal basis in ft" with the properties described in Theorem 12.1.3, and observe that the equality A*A = AA* implies that actually atj = 0 for i>j except as+iJ+2, ...,d,.,,. □ 12.2 ROOT SUBSPACES AND THE REAL JORDAN FORM Let A: $"—* ft" be a transformation. The root subspace 9tk (A) corresponding to the real eigenvalue A0 of A is denned to be Ker( A0/ - A)", as in the complex case. Then 9tK (A) is spanned by the members of all Jordan chains of A corresponding to A0. For a pair of nonreal eigenvalues a + ir, a - k of A (here a, t are real and t#0) the root subspace is denned by 3la±iM) = Ker[(cr2 + t2)/ - 2aA + A2]" where p is a positive integer such that Ker[(cr2 + t2)/ - 2aA + A2]k C Ker[(cr2 + t2)I - 2aA + A2]" for every positive integer k. Note that, if A,,. . . , Ar are the distinct real eigenvalues of A (if any) and
364 Real Liuear Transformations or, + j't as + irs are the district eigenvalues of A in the open upper half of the complex plane (if any), then r s det( A/ - A) = II (A - A,)"' 11 [{<t\ + r\) ~2ak\+ A2]** >=i * = i for some positive integers a,,. . . , a,, j8,, . . . , (is. Using this observation, it can be proved that there is a direct sum decomposition r = gt^A) + ■■■ + 9tK{A) + ®a^(A) + ■■■ + Stas±iTs(A) (see the remark following the proof of Theorem 2.1.2). Moreover, we have: Theorem 12.2.1 For every A-invariant subspace M the direct sum decomposition M = (Mn stXi(A)) 4- • • • + (M n »Ar(i4» + (M n «„i±iTi(i4)) + ■ • • + {MnaaM±lTt(A)) holds. For the deeper study of properties of invariant subspaces, the real Jordan form of a real transformation, to be described in the following theorem, is most useful. As usual, Jk{\) denotes the k x k Jordan block with eigenvalue A. Also, we introduce the 2/ x 2/ matrix Ji(^ w) = K 0 0 0 h K 0 0 0 • h ■ 0 • 0 • • 0" • 0 • h ■ K- [u, wl and it, w are real numbers with w # 0 and L -W jU. J represents the 2x2 identity matrix. Theorem 12.2.2 For every transformation A: 4?"-* ft" there exists a basis in tjL" in which A has the following matrix form: A = -rkl(*i)®---®Jkp(iP)®Jll(vi,Wi)®---®Jl<L*,>w,) C12'2-1) where A,,. . . , Ap; /u,,. . ., \iq; w,,. . . , wq are real numbers (not necessarily
Root Subspaces and the Real Jordan Form 365 distinct) and wt,. . . ,w are positive. In the representation (12.2.1) the blocks 7^ (A,) and 7,(jt*y, wf) are uniquely determined by A up to permutation. The proof of Theorem 12.2.2 will be relegated to the next section. The right-hand side of equality (12.2.1) is called a real Jordan form of A. Clearly, A,,. . . , A are the real eigenvalues of A, and fil ± nv,,. .. , p.q ± iwq are the nonreal eigenvalues of A. Given A0 E cr(A), A0 real, the partial multiplicities and the algebraic and geometric multiplicity of A corresponding to A0 are denned as in the complex case. For a nonreal eigenvalue /x + iw of A, the partial multiplicities of A corresponding to p. + iw are, by definition, the half-sizes /. of the blocks 7,.( p,jt Wj) with p.t = p. and w; = ± w. The number of partial multiplicities of A corresponding to p + iw is the geometric multiplicity of p + iw, and the sum of partial multiplicities is the algebraic multiplicity of p. + iw. By use of the real Jordan form, it is not difficult to prove the following fact, which we need later. Proposition 12.2.3 If n is odd, then every transformation A: If —* If has an invariant subspace of any dimension k with Ost< n. Proof. Without loss of generality we can assume that A is given by an n x n matrix in the real Jordan form. As n is odd, A has a real eigenvalue, so that blocks Jk (A,) in the real Jordan form (12.2.1) of A are present. Since the subspaces Span{e,,. . . , ey}, / = 1,. . . , kt are 7t(A;) invariant, and the subspaces Span{e,, .. . , e2j}, j = 1,. . . , /, are 7,(/*y, wy) invariant, we obtain the existence of /l-invariant subspaces of any dimension k, 0<k<n. D Analogs of the results on spectral and irreducible invariant subspaces proved in Chapter 2 can be stated and proved for transformations from $" to $". (As in the complex case we say that an ^-invariant subspace M is irreducible if M cannot be represented as a direct sum of two A -invariant subspaces.) For example, see Theorem 12.2.4. Theorem 12.2.4 Let A: $" —* If" be a transformation. The following statements are equivalent for an A-invariant subspace M: (a) M is irreducible. (b) Each A-invariant subspace contained in M is irreducible. (c) The Jordan form of the restriction A\M is either 7„(A), AE If, or (in case n is even) Jn/2(p,, w), /u., w E If, w ¥= 0.
366 Real Linear Transformatious (d) There is either a unique eigenvector (up to multiplication by a nonzero real number) of A in M or (in case A\M has no eigenvectors) a unique two-dimensional A-invariant subspace in M. (e) The lattice of A-invariant subspaces is a chain. (/) The spectrum of A\M is either a singleton {A0}, A„GJ|?, or a pair of nonreal eigenvalues {p, + iw, p. - iw}, and nvk[(A\M - A0/)'] = max{0, dim M - i) , i = 0,1,. . . in the former case and rank[[(ju.2 + w2)I - IpA + A2]\M]' = max{0, dim M - 2i} , i = 0,1,. . . in the latter case. The real Jordan form can be used instead of the (complex) Jordan form to produce results for real transformations analogous to those presented in Chapters 3 and 4 (with the exception of Proposition 3.1.4). For this purpose we say that a transformation A: ft" —> §." is diagonable if its real Jordan form has only 1 x 1 blocks 7,(Ay), A,, . . . , kp e # or 2x2 blocks 7,(/*y, wy), / = 1,. . . , q. Also, we use the fact that the Jordan form of the transformation A: ^"-»4?" with the real Jordan form (12.2.1) is ®Jlq(p.q + iw<l)@Jlq(p<l-iwij) 12.3 COMPLEXIFICATION AND PROOF OF THE REAL JORDAN FORM We describe here a standard method for constructing a transformation <p"—> <p" from a given transformation J)?"—>$" with similar spectral properties. In many cases this method allows us to obtain results on real transformations from the corresponding results on complex transformations. In particular, it is used in the proof of Theorem 12.2.2. Let A: ft"—> $" be a transformation. Define the complexification Ac: <p"-+ <p" of A as follows: Ac(x + iy) = Ax + iAy, where x, ye tf". Obviously, Ac is a linear transformation. If A is given by an n x n matrix in some basis in $", then this same basis may be considered as a basis in <p" and Ac is given by the same matrix. It is clear from this observation that the eigenvalues and the corresponding partial multiplicities of A and of Ac are the same. Let M be a subspace in $". Then M + iM = {x + iy \ x, y E M} is a
Complexification and Proof of the Real Jordan Form 367 subspace in <p". Moreover, if M is A invariant, then M + iM is easily seen to be Ac invariant. We need the following basic connection between the invariant subspaces of a real transformation and the invariant subspaces of its complexification. Theorem 12.3.1 Assume that the transformation A: $"—> ft" does not have real eigenvalues. Let 0l+ C <p" be the spectral subspace of Ac corresponding to the eigenvalues in the open upper half plane. Then for every A-invariant subspace i?(C JfO the subspace (if + i2£) C\ 9t+ is Ac invariant and contained in 9t+. Conversely, for every A1-invariant subspace MG9t+ there exists a unique A-invariant subspace Z£ such that (5£ + iif) n 52+ = M. Proof. The direct statement of Theorem 12.3.1 has already been observed. To prove the converse statement, let M G9t+ be an ^'-invariant subspace. Fix a basis z,,. . . , zk in M, and write zi = x, + iyr j = 1,. . . , k, where xjt y, e ft". Put J£ = Span!*,,. . . , xk, yx,. . . , yk} C ft". Let us check that S£ is A invariant. Indeed, for each /', Aczj is a linear combination (with complex coefficients) of z,,..., zk, say *fz,= 2a<'\ (12.3.1) Letting ap'} - f}^ + iyp'\ where fi^ and y(l) are real, use the definition of Ac to rewrite (12.3.1) in the form Ax, + iAy, = E (Up" + iyPn)(xp + iyp), / = 1,. . . , * After separation of real and imaginary parts, these equations clearly imply that if is A invariant. Further, it is easily seen that JP + i£ = Span{z,, . . . , zk, £,, . - . , zk) C <p" where zj = xf - iyjt j = 1,. . . , k. Equality (12.3.1) implies that the subspace M = Span{z1, . . . , z~k) is Ac invariant and a(A%)=a(A%) This statement is easily verified; by letting z,, . . . , zk be a Jordan basis for AC\M, for example. As M C 0t+, we have M C 0t_, where 9i_ is the spectral subspace of Ac corresponding to the eigenvalues in the open lower half plane. Now
368 Real Linear Trausformations £B + iSB = [(SB + iSB) fl 9t+\ + [(SB + iSB) fl <M_) DSpan{z,, . . . , zk) + Span{z,,. . . , zk} = SB + iSB Hence (SB + iS£) fl ®+ = Span{z,, . . . , zt} = M It remains to prove the uniqueness of SB. Let SB' be another .4-invariant subspace such that (SB'+ iSP')n»+ = M (12.3.2) For a given subspace Jf C (p", define its complex conjugate: ^={<z-I,...,z-„)|{z1,...,zn)e^z/e(p} Obviously, Jfis also a subspace in <p". We have if' + iSB' = SB' + iSB'. Also, it is easy to check (e.j»., by taking complex conjugates of a Jordan basis in 0t+ for Ac\m ) that 3?+ = £%_. Taking complex conjugates in (12.3.2), we have (2' + &')r\9l_=M and if' + iSB' = [(,2" + is?') n 3?+] + [(,2" + is?') n »_] = J< + J = [(iP + tSP) D 98+] + [(iP + tSP) fl &_] = SB + iSP As iP + OB = {a: + iy \ x, y G SB}, and similarly for iP' + iSt', the equality of SB' and iP follows. D The proof shows that Theorem 12.3.1 remains valid if the subspace 0l+ is replaced by the spectral subspace of Ac corresponding to any set S of eigenvalues of Ac such that A0ES implies \O0S and S is maximal with respect to this property. We pass now to the proof of Theorem 12.2.2. First, let us observe that in terms of matrices Theorem 12.2.2 can be restated as follows. Theorem 12.3.2 Given an n x n matrix A whose entries are real numbers, there exists an invertible n x n matrix with real entries S such that sas l = 7,,(A,)e-• -e^(A^e/,,(/*„*i)0-• -e■//,(**«• •%> (12.3.3)
Complexiflcation and Proof of the Real Jordan Form 369 where A., a, and w^ are as in Theorem 12.2.2. The right-hand side of (12.3.3) is uniquely determined by A up to permutations of blocks Jk(\t) and '/,.(/*/. wj)- We now prove the result in the latter form. The Jordan form for transformations from <p" into (p" is used in the proof. Proof. Let Ac be the complexiflcation of A. Let 91^{AL)C <p" be the root subspace of Ac corresponding to a real eigenvalue A0. As the matrices (Ac - A0/)', i = 0,l,2,... have real entries, there exists a basis in each subspace Ker(Ac - A0/)' C C that consists of n-dimensional vectors with real coordinates. (Here, we use the fact that vectors j:, jtt6i(i" are linearly independent over $ if and only if they are linearly independent over (p.) Further, if m is such that Ker(A* - \0I)m = 9tXo{Ac) but Ker(>lc - A0/)m"' ¥■ 9tk (Ac), then, by using the same fact, we see that there is a basis in 9tK (Ac) modulo Ker(Ac - Aq/)"1-1 consisting of real vectors. We can now repeat the arguments from the proof of the Jordan form (Section 12.2.3) to show that there exists a basis in 9tK (Ac) consisting of Jordan chains of Ac (in short, a Jordan basis) with real coordinates. Further, let xn,. . . ,xim;, i-\,...,p be a Jordan basis in *3lk (Ac) where A0 is a nonreal eigenvalue of Ac (so for each i the vectors xii> ■ ■ • ' xi.m f°rm a Jordan chain of Ac corresponding to A0). By taking complex conjugates in the equalities Mt_VKj = Vi' j=l,...,mi; i=\,...,p (by definition, xi0 - 0) and using the fact that Ac is given by a real matrix in the standard basis, we see that *.-i.--•>*,.„.,. i = h---,P (12-3.4) are the Jordan chains of Ac corresponding to A0. The vectors (12.3.4) inherit linear independence from the vectors x,r Further, dim 3?A (Ac) = dim 3?A- (Ac) (because the algebraic multiplicities of Ac at A0 and at A0 are the same); hence the vectors (12.3.4) form a basis in 3l^o(Ac). Putting together Jordan bases for each 9lK (Ac), where A0 E i|f n a(Ac), which consist of vectors with real coordinates, and Jordan bases for each pair of subspaces ihk (Ac) and <3l-k (Ac) (where A0 is a nonreal eigenvalue of Ac) that are obtained from each other by complex conjugation, we obtain the following equality: AR = R{Jmi(\l)®---®Jmp(\p)®[Jli(\p + i)®Jh(\p+l)\®--. 0[^Ap+,)©i/((Apt,)]} (12.3.5)
370 Real Linear Trausformations Here A,,. . . , A are real numbers, A +1,. . . , \p+q are nonreal numbers (which can be assumed to have positive imaginary parts), and R is an invertible n x n matrix that, when partitioned according to the sizes of Jordan blocks in the right-hand side of (12.3.5), say /? = [/?,-•• RpRp+lRp+2- " " Rp+2q-\Rp + 2q\ has the property that /?, (i=l,...,p) are real and Rp+2j-t = R j=l,...,q. Fix /' (l</< q), and consider the 11- x 2/y matrix p+2j' u. V2 1 -i 0 0 0 0 l-i 0 0 0 0 1/00 0 0 1 i 0 0 0 0 1 -i 0 0 0 0 Lo o o o ••• i One checks easily that U- is unitary, that is, UjU* = I, and that and /a; and w- are the real and imaginary parts of Ap+y, repectively (see the paragraph preceding Theorem 12.2.2 for the definition of /((/u.y, wy)). Also, it is easily seen that the matrix [RP+2l-i, RP+2jW = [RP+2r^ Rp+v-W has real entries. Multiplying (12.3.5) from the right by £/='diag[/mi /^ t/fi £//fl] and denoting the real invertible matrix RU by Q, we have AQ = RUU*{Jmi(\1)@---®Jmf(\p)®[Jli(\p+l)®Jli(\p+l)] ©[•/,,(W©VA~'+«)1}1/ = G{/Ml(A1)©---e^p(A#,)©y,1(^,wl)©---©y,(/t,,w,)} and formula (12.3.3) follows. The uniqueness of the right-hand side of (12.3.3) follows from the
Commuting Matrices 371 uniqueness of the Jordan form of Ac. [Indeed, the right-hand side of (12.3.3) is uniquely determined by the eigenvalues and partial multiplicities of Ac.\ □ 12.4 COMMUTING MATRICES Let A be an n x n matrix with real entries. In this section we study the general form of real matrices that commute with A. This result is applied in the next section to characterize the lattice of hyperinvariant subspaces of a real transformation. In view of Theorem 12.2.2, we can assume that ^ = diag[y„...,yj (12.4.1) where each Ja is either a Jordan block of size ma x ma with real eigenvalue Aa, or J = Jm /2(jHQ, wa) (in the notation introduced before Theorem 12.2.2). Let Z be a real matrix such that AZ = ZA. Partition Z according to (12.4.1): Z = [Zafj]"a p = l, where Za/3 is an ma x mp real matrix. Then we have JaZaP = ZapJ„ ; a, fi = 1, . . . , u (12.4.2) If a(Ja) fl a(Jp) = 0, then equation (12.4.2) has only the trivial solution Za/3 =0 (Corollary 9.1.2). Assume cr{Ja) = a(Jp) = {A()}, where A„ is real. Then, as in the proof of Theorem 9.1.1, ZQ/3 is an upper triangular Toeplitz matrix. To study the case cr(Ja) = cr{Jp) = {/j.f) + iwu, /x() - iwf)}, it is convenient to first verify the following lemma. Lemma 12.4.1 Let K = * W]bea — w n J 2x2 matrix with real n, w such that w ¥" 0. Then the system of equations KA + C=AK; KC = CK for unknown 2x2 matrices A and C implies C = 0. The lemma is verified by a direct computation after writing out the entries in A and C explicitly. Now return to the case a(Ja) = o-(Jp) = { ju.0 + iw0, /*„ - i'wu}, ^, w0 e 4?, w„>0 in equations (12.4.2). Letting K= _ " ° and writing Zafj
372 Real Linear Transformations as a (m„/2) x (m„t2) block matrix [Ulj]"]L\mit'2 with 2x2 blocks £/„, we have K / 0 • o a: / ■ 0 0 0 0 0 ••■ / .0 0 0 ••• K r f,i u,i u u, 22 u2l -UmJ2.\ Umal2.2 ma/2,maf2 u. I.mp/2 2,m„/2 UmJ2.l ^m„l2 vm. m„/2,mBl2 K I 0 OK/ 0 0 0 0 0 ••■ / LO 0 0 ■•• K (12.4.3) Comparing the block entries (mJ2,1) and then (ma/2-l,l) in this equation, we obtain By Lemma 12.4.1, t/m /2, =0. Now compare the block entries in positions (mJ2- 1, 1) and (mJ2-2,1), and reapplying Lemma 12.4.1, it follows that Um ,2-1,, =0. Continue in this way, and it is found that Zap = [0,Zap] (if ma< nip) z»" = [Zofl] (ifm-amfl) where Za/3 = [t//y-]fy=1 is a square pxp matrix, /? = min(ma, m^) with U0 = 0 for i>/. So 'K 0 / K 0 0 -0 0 '0n 0 - 0 0 ••• 0 ••• 0 ■■• u12 u22 0 0" 0 / K. 0 0 12 !2 - 0 0 • KPn - ^2,p/2 Upl2,pl2- -K 0 0 .0 •• / K 0 0 ^2,„,2 Upl2,pl2-i 0 ••• / ••• 0 ■•• 0 ••• o- 0 / (12.4.4)
Commuting Matrices 373 Equality (12.4.4) implies that for / = 1,. . . , p/2, KUjj=UijK and for j = 2,...,p/2 KVl-U + Vu- ",-,.,-,+ 1/,-,.,* In view of Lemma 12.4.1, 0U = U22 = - • • = {/ ; hence t/y-_, y commutes with K for /' = 1,. . . , p/2. Further, Kt/y_2>y + £/;_,,; = f//-2.y-1 + ^-2.;^ for /' = 3,. . . , p/2. Using Lemma 12.4.1 again, t/y_, t= t/y_2 ■_, and KUi_2 j = Uj_2 jK- Continuing in this way, we find that £/,y (i^j) depends only on the difference between /' and i and commutes with K. Because of the 1 a J for some real numbers a latter property Utj must have the form and b (which depend, of course, on i and /). Putting all the above information together, we arrive at the following description of all real matrices that commute with a given real n x n matrix A. Theorem 12.4.2 Let A be an « x n matrix with the real Jordan form diag[7,,. . . , Ju], so /I = S-'[diag/„..., 7JS for some invertible real n x n matrix S, where each Ja is either a Jordan block of size ma x ma with real eigenvalue or a matrix of type Ma ~W* 0 0 0 0 wa Ma 0 0 0 0 1 0 M„ -wa 0 0 0 1 wa M„ 0 0 0 0 1 0 0 0 0 •• 0 •• 0 •• 1 •• 0 •• 0 •• 0 0 0 0 1 0 • Ma • ~wa 0 0 0 0 0 1 w« V-a with real fia, wa and wa > 0. Then every real n x n matrix X that commutes with A has the form X— SlZS, where the matrix Z = [Za/3]" p = 1 partitioned conformally with the Jordan form diag[7,, ■■ . , Ju] has the following structure: Ifcr(Ja)r\ a(Jp) = 0, then Zap = 0. Ifcr(Ja) = (Jp)= {A0}, A0 real, then or where Z = [0 Tap] in case ma s m^ in case mn^m P
374 af) • (I) ,.(2) ■*a(3 AaP Real Linear Transformations Xafi -af) L o o '-Q/3 (■) Ca/3 p = min(ma,mp) is a real p x p upper triangular Toeplitz matrix. If cr(Ja) = cr(Jp) = {fi + iw, ft — iw}, where ft and w>0 are real, then again or and in this case T = 1 a/3 ry(l) Aa0 0 - 0 z.* = *.* = v<2> y(') 0 [o r0 ra p] w case w case Aa/3 Aa0 y(l) » ma<mli ma>mp q= \ m w/iere the 2x2 blocks A*„y /iave f/ie /owi r u(>) o(>)i 'a(3 /or some rea/ numbers u(Jl and v(Jl. 12.5 HYPERINVARIANT SUBSPACES Let A: ft"—* $" be a transformation. A subspace Jtclf" is called j4 hyperinvariant if J< is invariant for every transformation X: If" —> tf" that commutes with A. It is easily seen that the set of all /l-hyperinvariant subspaces is a lattice. In this section we obtain another characterization of this lattice, one that is analogous to Theorem 9.4.2. The description of commuting matrices obtained in Theorem 12.4.2 is used in the proof. Theorem 12.5.1 Let a transformation A: $"—>$" have the minimal polynominal k m /(A)=n(A-A/)"ii[(A-^)2 + W/2r'
Hyperinvariant Subspaces 375 where A,, /j,t, and w; are real and wj >0, A,,. . . , Kk are distinct, and so are jti, + MV,, . . . , jtij + i(»s. Then the lattice of ail A hyperinvariant subspaces coincides with the smallest lattice SfA of subspaces in <p" that contains KerM - A,/)*, lm(A - A/)* for k = 1,. . . , ry; /' = 1,. . . , k, and Ker[(,4 - /*,/)* + w)l\\ \m[(A - My/)2 + w)l)k for k = 1,. . . , Sj\ j = 1,. .. , m. We consider first a particular case of Theorem 12.5.1 when the spectrum of A consists only of one pair of nonreal eigenvalues ft + iw, n - iw (H, w&$, w¥^0). Let fd) f(2). /■(2) f{2). . fim) r(m) ,, - , ,-. )\ >••■■> J 2Pl' J I >■■•■> J lPl> ■■■■>) \ ' • • • > J 2pm (14.3.1) be a Jordan basis in $", where p, > • • • >pm so that, in this basis, A is represented by the matrix Let ar; = sPan{/(1'>,...,/<;)}, y = i,...,/>,; i = i,...,m The following lemma is an analog of Lemma 9.5.2. Lemma 12.5.2 Every A-hyperinvariant subspace is of the form 3fi + --- + X" (12.5.2) where qx,. . . , qmis a non-decreasing sequence of nonnegative integers such thatp1-ql^--^pm-qm^O. If q, = 0 for some i, then, of course, 3Clq is interpreted as the zero subspace. We see later that conversely, every subspace of the form (12.5.2) is A hyperinvariant. Proof. Let if be a nonzero /l-hyperinvariant subspace, and let x G !£. Write * as a linear combination of the vectors (12.5.1): x=2^f],)+---+Z{r)f]m) 1=1 i=i We claim that each vector yr = t]i\ £,W/.W belongs to if. Indeed, let Pr be the projector on 9^ denned by Prf\s) = 0 for s^r and Prf\r) = f(p for i = 1,. . . , 1pr. It follows from Theorem 12.4.2 that Pr commutes with A. Hence 5£ is Pr invariant, and yr = Prx E Z£.
376 Real Linear Transformations Fix an integer r between 1 and m and denote by a the maximal index i* of a nonzero coefficient £,-r) (i = 1,. . . , 2pr). Without loss of generality, we can assume that a = 2/8 is even (otherwise consider Ax in place of x). Let us show that all the vectors /f\ . .. , f^} belong to if. Indeed, the vectors Zt-KA-vlf + w'n'-'y^tfljV + tff? and z2 = Azx belong to if and also to Span{/(/'),/2'')}. Now zx and z2 are not collinear; otherwise, A would have a real eigenvalue, and this is impossible. It follows that Span{z,, z2} = Span{/(,r), /f >}, and hence fi\ f?e2. If we already know that f\r\ . . . , f2r,)-2 £ ^ for some i > 2, then by a similar argument using the vectors z2l^-[(A-nI)2 + w2I]eiyrG^ and z2, = Az2i_, £ #, we find that /^,,/^GX For i = /3 we have /<",...,/I" ei?. As the vector x & £ was arbitrary, it follows that if = dCl + ■ • ■ + 3if™ for some integers #, such that Os^.sp,., i = l,...,/M. To prove that <?, > • •• > qr, we must show that l^Ci? implies 9^_1 C if. Consider the transformation B: tf"—> $" that, in the basis (12.5.1), has the block matrix form B = [Xjj]"j=l where Xtj is the 2p, x 2pt zero matrix (i, /' = 1,. . . , m), except for *'-■•'= L 'o J Theorem 12.4.2 ensures that B commutes with A. Hence 3! is B invariant, and f[_,) = B/,(r) e iP, i = 1,. . . , 2a. In other words, 3C C if. Further, consider the transformation C: $" —»$" that, in the basis (12.5.1), has the block matrix form C = [y,;]™/=1, where y/; is the 2pt x 2py zero matrix except for n+1, = [o i2PrJ Then by Theorem 12.4.2, C commutes with A, and assuming 2qr > 2(pr-pr+1), we have /"f(0 _ /•(<■ +1) ^ a> This implies 2qr-2(pr-pr + l)<2qr+l, or pr - ?f >/>, + I - 9r+1. If qr < Pr~ Pr+i> tnen tne inequality pr ~ qr^ pr+l~ qr+l is obvious. □ We are now in a position to prove Theorem 12.5.1 for the case cr(A) = {ft. + iw, p - iw}. As in the proof of Theorem 9.4.2, one shows that every
Hyperinvariant Subspaces 377 subspace of the form Ker[(,4 - filf + w2I]k, or Im[(,4 - /xl)2 + w2I]k is A hyperinvariant. So we have only to show that every j4-hyperinvariant subspace if belongs to the lattice SfA. By Lemma 12.5.2 .SP=afi +--- + X" (12.5.3) for some sequence of integers qx,. . . , qm such that qx > • • • > qm >0 and Pi ~~ <7i — Pi ~ Q2 — ''' — Pm ~ <im — 0- We prove that &E.yA by induction on ^,. Assume first qx = 1. Then .3?= af| 4- • •• 4- af', for some ism. As P,>P,-n> we nave if = Ker[(,4 - ju/)2 + w2/] D Im[(i4 - fil)2 + w2I]"''1 e 5^ Now assume that the inclusion if E 5^, is proved for ty, = v - 1, and let iPbe a subspace of the form (12.5.3) with qx = v. Let r, a be the maximal integers for which qx = • • • = qr and pa - pr + v >0. Consider the subspace j< = ar: + • • • + ar: + ar;;^, + • • ■ + af;o_Pr+„ It is easily seen that M = Ker[(A - nl)2 + w2I]" n Im[(i4 - ^/)2 + w2/]p'+*e ^ The inequalities /?, - ^ srpf+, - qi+i imply that Jl C J£. Further, the sub- space Jf = % fl Ker[(,4 - /a/)2 + w2I)v is j4 hyperinvariant, and since m = ari_, + • • • + ar^_, + ar;^ + ■ • • + ar™M the induction hypothesis ensures that le^,. Finally, if = J< + JV belongs to S^ as well. We have proved Theorem 12.5.1 for the case when the spectrum of A consists of exactly one pair of nonreal eigenvalues. As the proof shows, the converse statement of Lemma 12.5.2 is also true: every subspace of the form (12.5.2) is A hyperinvariant. Proof of Theorem 5.1 {the general case). Again, it is easily seen that each subspace Ker(i4 - A,./)*, Im(i4 - A/)*, Ker[(,4 - A/)2 + w2I]k, Im[(j4 - Ay/) + w2I] is A hyperinvariant. So we must show that each ^-hyperinvariant subspace belongs to ifA. Let M be an /l-hyperinvariant subspace. By Theorem 12.2.1 we have
378 Real Linear Trausformations m = (M n®Ai(A)) + ■ ■ ■ + {M nmKh{A)) + (Jtn9t ^^(A)) + ■■■ + (Mn®^iWt(A)) Write A in the real Jordan form (as in Theorem 12.2.2) and use Theorem 12.4.2 to deduce that each intersection M D 0tj(A) is A\.j) {A) hyperinvariant and M n &t „ ^m, is A\M hyperinvariant (p = I,. . .', s). With the use of Theorem 9.4.2, it follows that MC\9lx (A) belongs to the smallest lattice that contains the subspaces KerMl^, - A/)* = Kcr(A - A,./)* , k = 1,. . . , r, and ImML^, " A/)* = Im(yl - Ay/)* n Ker(i4 - A/)'' , k = 1,. . . , ry Similarly, by the part of the theorem already proved, we find that M D 3?M ±,v M) belongs to the smallest lattice that contains the subspaces Ker[(/lL _ U) - »plf + w\l\k = Ker[M - »plf + w2pI]k for k = 1,2,. . . ,s and Im[(i4|iB)j^i^(A) - fyl)2 + w*/]* ' = lm[(A - mp/)2 + w2pl]k n Ker[(i4 - m„/)z + *#]'' It follows that M e. ¥A, and Theorem 12.5.1 is proved completely. □ 1.2.6 flEAL TRANSFORMATIONS WITH THE SAME INVARIANT SUBSPACES In this section we describe transformations B:lf"—>lf", which have the same invariant subspaces as a given transformation A: If"-* If". This description is a real analog of Theorem 10.2.1. By Theorem 12.2.2, we can assume that, in a certain basis in If", A has the matrix form i4 = diag[y1,...,y#>, *„...,*,] (12.6.1) where /, = diagfi, n( A,),. . ., Jkm (A,)], i = l,...,p with different real numbers A,,. . . , A ; and
Real Transformations with the Same Invariant Subspaces 379 Kt = diag^n( fi., w,),. . . , J,.r((ii, w,)] , i=l,...,q with different complex numbers /j,l + iw,,..., fiq + iwq in the open upper half plane. We use the notation introduced in Section 12.2, and also assume that kn k I /.>• Now introduce the following notation (partly used in Section 10.2): given real numbers a0,...,as _,, denote by Ts(a0,. . . , as_{) the sxj upper triangular Toeplitz matrix 0 a0 ■ L0 0 ■ Further, for positive integers s s ( let 5-2 an J (12.6.2) £/,(«„, . . . ,fls_,,F) a0 a, a2 0 a0 a, 0 0 0 0 L0 0 /i1 /l2 fl,-l fl 22 ••• f2.,-2 as-\ Jl-s.l-s °s-2 flS-I 0 fl„ where Z7 is a real (f — s) x (f — s) upper triangular matrix /li 7l2 0 /?2 Lo o J 2.1-s Jl-s.i-s Similarly, if aj = -c, b, (12.6.3) (12.6.4) , j = 1, . .. , s - 1 are 2x2 real matrices, we define the 2* x 2s upper triangular Toeplitz matrix T2.*2 (a0, . . . , as_,) by the same formula (12.6.2). If, in addition, the real 2x2 matrices fjk (l</'<i<l-j) are given, denote by U2*2 (a0, . . . ,as_t;F) the It x It matrix given by (12.6.3) with F given by (12.6.4). By definition, for s = t we have U,{aa,...,a,_l\F)=T,{aa,...,a,_l) and
380 Real Linear Transformatious U]x2(aa,. . . ,as_,;F)= r2x2(a0,.. .,«,_,) We can now give a description of all transformations B: /f?" —*- /f?" with the same invariant subspaces as A. Theorem 12.6.1 Let the transformation A: $"—>$" be given by (12.6.1), in some basis in tjt". Then a transformation B: $"—*$" has the same invariant subspaces as A if and only if B has the following matrix form (in the same basis): B = dmg[Bl,...,Bp, C,,...,C,] where B, = Ukii(b«\ . . . , b%; F,',)©^<l^(fe,l',,. . . , b%)®-- ■ ®Tk (b^,...,b\° ) for some real numbers b\'\ .... b^ with b[l) t^O and some (kn - k-7) x (kn - ki2) matrix F(,); q = C^ ^;G<")©ff) <)©- i', J', for some 2x2 real blocks Ah fii) r du> fU)~\ s L -fU) dU)i ' ' ' ' ' ' ;2 with f\n r^O and det c£y> #0 and some 2(/;1 - lj2) x 2(/yl - lj2) real matrix Gu\ Moreover, the real numbers b^,. . . , b\p are different and the complex numbers d\l) + i\f\n\, ..., d\q) + i\f\q)\ are different as well. For the proof of Theorem 12.6.1, we refer the reader to Soltan (1974). 12.7 EXERCISES 12.1 Prove that the transformation of rotation through an angle <p: Uostp sin«H L sin <p cos <p J ' has no nontrivial invariant subspaces except when <p is an integer multiple of it.
Exercises 381 12.2 Given an example of a transformation A: ft2" —> ft2" such that A has no eigenvectors but A2 has a basis of eigenvectors in ft2". 12.3 Show that if A: ft" —> ft" is such that A has an eigenvector corresponding to a nonnegative eigenvalue A0, then A has an eigenvector as well. 12.4 Show that if A: ft2" -» ft2" is a transformation with det A< 0, then A has at least two distinct real eigenvalues. 12.5 Find the real Jordan form of the n x n matrix 0 1 0 ••• 0 0 0 1 ••• 0 0 0 0 ■•• 1 Ll 0 0 ••■ 0. Find all the invariant subspaces in ft" of this matrix. 12.6 Describe the real Jordan form and all invariant subspaces in ft of the 3x3 real circulant matrix a b c cab b c a J a, b, cG ft 12.7 Find the real Jordan form of an n x n real circulant matrix fl„ 1 Lfl, 12.8 Find the real Jordan form and all invariant subspaces in ft" of the real companion matrix ro i o o o i o o 0 a, 0 1 0 fl„-r a,A - a0 has n assuming that the polynomial A" - fln_,A" distinct complex zeros. 12.9 What is the real Jordan form of real n x n companion matrix?
382 Real Linear Transformations 12.10 Find the real Jordan form and all invariant subspaces in ft" of the matrix 0 0 0 a„ 0 0 "„-l • 0 • • 0 •• a2 ■■ 0 •■ 0 "l 0 0 0 where a,,. . . , an E ft. 12.11 Two linear matrix polynomials Aj4, + fi, and \A2 + B2 with real matrices At, B,, A2, and B2 are called strictly equivalent (over ft) if there exist invertible real matrices P and Q such that P(\Ai + BX)Q - A/42 + B2. Prove the following result on the canonical form for the strict equivalence (over ft) (the real analog of Theorem A.7.3). A real linear matrix polynomial \A + B is strictly equivalent (over ft) to a real linear polynomial of the type 0pq®Lki®---®Lk®Mli®---®Ml®(Imi + \Jmi(0))®--- ® (L, + A/m,(0)) 0 (A/,, + ■/„,( A,)) 0 • • • 0 (Klnu + Jnu( \u)) ®(\Ihi + Jhi(plt a>l))®---®(\IK+JK(n„, »„)) (1) where 0 is the p * q zero matrix; L€ is the e x (e + 1) matrix A 1 0 0 A 1 on 0 0 0 0 ••• Al J M6 is the transpose of Lt; A,,. . . , \u are real numbers; J,(l*, <o) = [K I2 0 0 K L L0 0 0 o- 0 K. ; and u,, w, are real numbers with <t>>0 for — u> fii > ' ' } — \,...,v. Moreover, the form (1) is uniquely determined by \A + B up to permutations of blocks. (Hint: In the proof of Theorem A.7.3 use the real Jordan form in place of the complex Jordan form.)
Exercises 383 12.12 Prove the following analog of the Brunovsky canonical form for real transformations. Two transformations [Al fij: ft" + 4?'"—>$" and [ A 2 B2\: ft" © ft™ —» ft" are called block similar if there exist invert- ible transformations M: ftm—>ftm and N: ft"—>ft" and a transformation F: ft"-*ftm such that Prove that every transformation [A B]: ft" + ftm-> ft" is block similar to a transformation [An B0] of the following form (written as matrices with respect to certain bases in ftm and ft"): A0 = Jki(0)®---®Jk(0)®J where J is a matrix in the real Jordan form; Bu has all zero entries except for the entries (kt, 1), (kl + k2, 2), . . . , (kl + ■ • • + kr, r), and these exceptional entries are equal to 1. {Hint: Use Exercise 12.11 in the proof of the Brunovsky canonical form.) 12.13 Let A: ft"-* ft" and B:ftm-*ft" be a full-range pair of transformations. Prove that given a sequence S = {A,,...,An} of n (not necessarily distinct) complex numbers such that A0 E S implies A() E S and A0 appears in S exactly as many times as A0, there exists a transformation F: ft"—> ftm such that A,, . . . , A„ are the eigenvalues of A + BF (counted with multiplicities). (Hint: Use Exercise 12.12.)
Notes to Part 2 Chapter 9. The first two sections contain standard material in linear algebra [see, e.g., Gantmacher (1959)]. Theorem 9.3.1 is due to Laffey (1978) and Guralnick (1979). The proof presented here follows Choi, Laurie, and Radjavi (1981). Theorem 9.4.2 appears in Soltan (1976) and Fillmore, Herrero, and Longstaff (1977). Our expositions of Theorem 9.4.2 and Section 9.6 follow the latter paper. Chapter 10. The results and proofs of this chapter are from Soltan (1973b). Chapter 11. Theorem 11.2.1 is a well-known result (Burnside's theorem). It may be found in books on general algebra [see, e.g., Jacobson (1953)] but generally not in books on linear algebra. In the proof of Theorem 11.2.1 we follow the exposition from Chapter 8 in Radjavi and Rosenthal (1973). Other proofs are also available [see Jacobson (1953); Halperin and Rosenthal (1980); E. Rosenthal (1984)]. Example 11.4.6 and Theorem 11.4.4 are from Halmos (1971). In the proof of Theorem 5.1 we are following Radjavi and Rosenthal (1973). Chapter 12. The real Jordan form is a standard result, although not so frequently included in books on linear algebra as the (complex) Jordan form. The real Jordan form can be found in Lancaster and Tismenetsky (1985), for instance. The proof of Theorem 5.1 is taken from Soltan (1981). 384
Part Three Topological Properties of Invariant Subspaces and Stability There are a number of practical problems in which it is necessary to obtain an invariant subspace of a transformation or a matrix by numerical methods. In practice, numerical computation can be performed with only a finite degree of precision and, in addition, the data for a problem will generally be imprecise. In this situation, the best that we can hope to do is to obtain an invariant subspace of a transformation that is close to the one we really have in mind. However, simple examples show that although two transformations may be close (in any reasonable sense), their invariant subspaces can be completely different. This leads us to the problem of identifying all invariant subspaces of a given transformation that are "stable" under small perturbations of the transformation—that is, to identify those invariant subspaces for which the perturbed transformation will have a "close" or "neighbouring" invariant subspace, in an appropriate sense. To develop these ideas, we must introduce a measure of distance between subspaces and to analyze further the structure of the invariant subspaces of a given transformation. This is done in Part 3, together with descriptions of stable invariant subspaces, using different notions of stability. This machinery is then applied to the study of stability of divisors of polynomial and rational matrix functions and other problems. The reader whose interest is confined to the applications of Chapter 17 needs only to study the material presented in Chapter 13, Section 14.3, and Chapter 15. 385
This page intentionally left blank
Chapter Thirteen The Metric Space of Subspaces This chapter is of an auxiliary character. We set forth the basic facts about the topological properties of the set of subspaces in £". Observe that all the results and proofs of this chapter hold for the set of subspaces in i|f" as well. 13.1 THE GAP BETWEEN SUBSPACES We consider <p" endowed with the standard scalar product. If x = {xl,...,xn),y = {y1,...,yn)<=$", then (x, y) = Y,"=l x,y„ and the corresponding norm is \\x\\ = (i\x,\2)1 The norm of an n x n matrix A (or a transformation y4: <p" —* <p") is defined accordingly: |M||= max M*||/|M| Now we introduce a concept that serves as a measure of distance between subspaces. The gap between subspaces if and M (in <p") is defined as 6{^M)=\\PM-P:f\\ (13.1.1) where Px and PM are the orthogonal projectors on if and M, respectively. It is clear from the definition that 0(if, M) is a metric in the set of all subspaces in <p"; that is, 6($,M) enjoys the following properties: (a) 8(<e,M)>0 if 2¥>Jt, d(<e,£) = 0; (b) d(<£,M) = 8(jU,<e); (c) e(<£,M)< 0(ie, Jf) + 9(Jf,M) (the triangle inequality). 387
388 The Metric Space of Subspaces Note also that 0{2£, M)<\. [This property follows immediately from the characterization given in condition (13.1.3).] It follows from (13.1.1) that 8(£,M) = 8(£\M±) (13.1.2) where if1 and M x denote orthogonal complements. Indeed, F^i = / - Px, *>||^-P*|| = ||P«i-'Vl|. In the following paragraphs denote by S^, the unit sphere in a subspace if C £", that is, 5^ = {x G if | ||*|| = 1}. We also need the concept of the distance of d(x, Z) from x G <p" to a set Z C <£"". This is defined by d(*,Z) = infieZ||*-f||. Theorem 13.1.1 Let M, i£ be subspaces in <f"\ Then 0(if, M) = max{sup d{x,£), sup d(x, M)} (13.1.3) xSSM x£Sy If exactly one of the subspaces if and M is the zero subspace, then the right-hand side of (13.1.3) is interpreted as 1; if if = M = {0}, then the right-hand side o/(13.1.3) is interpreted as 0. ///>, and P2 are projectors with Im P2 = if and \m P2 = M, not necessarily orthogonal, then 6(<e,M)^\\Pl-P2\\ (13.1.4) Proof. For every jr6 5ywe have ||jc-P2x|| = ||(P1-P2)*||s||P1-P2|| Therefore sup d(x,M)^\\Pi-P2\\ x£Sx Similarly, supxeSj( d(x, if) <\\Pt- P2\\; so max^.^l^HF.-^ll (13.1.5) where fix = supxg^ d{x, M),/tM= sup,eSj( d(*, if). Observe that >* = supxgJJ|(/-Pa)*||, ^ =supxgSj( ||(/-F^)*||. Consequently, for every * G <p" we have \W-Px)PMx\\*fiM\\PMx\\, \\V~ PM)P*x\i* fiAP**\\ (13-1-6) Now
The Gap Between Subspaces 389 11^^- P*)*\\2 = ((/- P*)PmV- P*)x, V-P*)x) ^\\(i-p*)pAI-pM\-Ui-Px)x\\ Hence by (13.1.6) II^(/ - ^)*H2 ^ ^ II^C - P*)x\\ • IK'" PM\ (13.1.7) \\PAI-PM\^/>A(I-Px)x\\ On the other hand, using the relation Pm-Ps = Pm(I-P*)-V-Pm)P* and the orthogonality of PM, we obtain \\{PM - Pa)x\\2 = \\PM(I- PM\2 + ll(/- ^)^H2 Taking advantage of (13.1.6) and (13.1.7) we obtain ||(^-^)*||2=£>i||(/-P^||2+/k^||^||2^max{^,^}||jt||2 So ll^-^NmaxOW^} Using (13.1.5) (with P, = Px, P2 = PM), we obtain (13.1.3). The inequality (13.1.4) follows now from (13.1.5). □ It is an important property of the metric 6(J£, M) that, in a neighbourhood of every subspace i? G <f"", all the subspaces have the same dimension (equal to dim if). This is a consequence of the following theorem. Theorem 13.1.2 If 0(if, M)<\, then dim if = dim M. Proof. The condition 0(if, M)<\ implies that if n Mx = {0} and if x fl M - {0}. Indeed, suppose the contrary, and assume, for instance, that ifn^^O}. Let x<=S<er\M±. Then d{x, M) = 1, and by (13.1.3) 0(if, M)>1, a contradiction. Now <£C\JI± = {0} implies that dim if < dim M, and ifx ni = {0} implies that dim if >dim M. D It also follows directly from this proof that the hypothesis 0(if, M ) < 1 implies §" = Z£ + Mx = Z£± + M. In addition, we have PM(<e) = M, P^M) = ^£
390 The Metric Space of Subspaces For example, to see the first of these observe that for any xE M there is the unique decomposition x = y + z, y £ if, z £ M±. Hence x = PMx = PMy so that M C PM(J£). But the reverse inclusion is obvious, and so we must have equality. The following result makes precise the idea that direct sum decompositions of <p" are stable under small perturbations of the subspaces, as measured in the gap metric. Theorem 13.1.3 Let M, Mxd <p" be subspaces such that M+Mx = $n If Jf is a subspace in <p" such that B(M, Jf) is sufficiently small, then Jf + M^V (13.1.8) and 6(M, Jf) ^\\PM- PJ\ ^ Cd(M, Jf) (13.1.9) where PM(Pj,-) projects <f"1 onto M (onto Jf) along Ml and C is a constant depending on M and M, but not on Jf. In fact C = 2\\PM\\ max {d{x,M)1} xGM,. \\x\\ = l Proof. Let us prove first that the sum Jf + Jtl is indeed direct. The condition that M + Ml = <p" is a direct sum implies that ||* - _y|| s 8 > 0 for every x £ SM and every y £ M. Here 8 is a fixed positive constant. Take Jf so close to M that 8(M,Jf)<8/2. Then ||z-y||<5/2 for every zE.Sv, where y = y(z) is the orthogonal projection of z on M. Thus for x £ SM and z £ S v we have ||*-z||*||*-;H|-|l*-J'll*i so JfnM, = {0}. By Theorem 13.1.2 dim Jf = dim M if 6(M,Jf)<\, so dimensional considerations tell us that Jf + M, = <p" for Q(M, Jf)<\, and equation (13.1.8) follows. To establish the right-hand inequality in (13.1.9) two preliminary remarks are needed. First note that for any xE.M, and yE.Ml we have x — PM{x + y) so that h + y\\^\\PMV\\x\\ (B.i.io) It is claimed that, for 0(M, Jf) small enough
The Gap Between Snbspaces 391 llz + ylls^llA.iriMI (13.1.11) for all z € Jf and yE.Ml. Without loss of generality, assume ||z|| = l. Suppose that 0(M, Jf)< 8 and let xE.M. Then, using (13.1.10), we obtain Wz + yW^Wx + yW-Wz-xMrj-'WxW-a But then x = (x - z) + z implies )|jc|| s= 1 — S, and so \\z + y\\^\\PM\\-\l-8)-8 and, for 8 small enough, (13.1.11) is established. The second remark is that, for any x G <p" \\x-PMx\\sC0d(x,M) (13.1.12) for some constant C0. To establish (13.1.12), it is sufficient to consider the case that x G Mx and \\x\\ - 1. But then, obviously, we can take Co= ma,x„ {d(x,M)~1} Now for any xE. 5V, by use of (13.1.12) and (13.1.3), we obtain \\(PM - Px)x\\ = \\x - PMx\\ < C0d(jt, M) < C06(M, Jf) Then, if w G <p", \\w\\ = 1, and w = y + z, y G Jf, z G M{, it follows that \\(PM - Pv>|| = \\(PM - /V)y|| < \\y\\C08(M, Jf)^2C0\\PM\\6(Jt, Jf) and the last inequality follows from (13.1.11). This completes the proof of the theorem. □ We remark that the definition and analysis of the gap between subspaces presented in this section extends verbatim to a finite-dimensional vector space V over <p (or over J|?) on which a scalar product is defined. Namely, there exists a complex-valued (or real-valued) function defined on all the ordered pairs, x, y, where x, y G V, denoted by (x, y), which satisfies the following properties: (a) (ax + py, z) - a(x, z) + f}(y, z) for every x,y,zEV and every a, p G <p (or a, p G f); (b) (x, y) = (y, x), x,y<=V; (c) (jt, x) > 0 for all i6V; and (*, x) = 0 if and only if * = 0.
392 The Metric Space of Suhspaces 13.2 THE MINIMAL ANGLE AND THE SPHERICAL GAP There are notions of the "minimal angle" and the "spherical gap" between two subspaces that are closely related to the gap between the subspaces. The basic facts about these notions are exposed in this and the next sections. It should be noted, however, that these notions and their properties are used (apart from Sections 13.2 and 13.3) only in Section 13.8 and in the proof of Theorem 15.2.1. Given two subspaces iE,Md (£"", the minimal angle tpmm{^£, i) (0< <pmin(.£, M.)^ it 12) between it and M is determined by sinVBin(iPf^) = inf{||jt + y|||jtE^,^e^,max{|M|,|^||} = l} (13.2.1) The minimal angle can also be denned by the equality cos<pmin(J2>,^) = sup |i>,y)| (13.2.2) Indeed, writing bx = inf \\ax + py\\ for any x, y G £", we have fc2, = min{inf ||x + /Jy||2, inf ||a* + >-||2} Now for ||jt|| = ||y|| = l ||jc + py\\2 = (jc, jc) + p(x, y) + /§(y, *) + \p\\y, y) = l + 0(*,y) + /3(y,*) + |0|2 and writing (i = u + iv, where u and v are real, we see easily that the function f(u, v) = l + /3(jc, y) + /3(_y, jc) + |0|2 of two real variables u and v has its minimum for u = -|((jt, y) + (y, x)) and v = j(i(y, x) - i(x, y)), that is, when p = -(y, x). Thus inf ||jt + ^||2 = l-|(Jt,y)|2 (13.2.3) Similarly, if ||x|| = ||y|| = 1 inf \\aX + y\\2 = l-\(x,y)\2 a SI
The Minimal Angle and the Spherical Gap 393 Denote by a and b the right-hand sides of equations (13.2.2) and (13.2.1), respectively. Then 1 * " = Jnf.s (1" l(*' ^l2); bl = Jnls b'y In view of (13.2.3) the equality 1 - a2 = b2 follows, and this means that, indeed, formulas (13.2.1) and (13.2.2) define the same angle (pmjn(if, M) with0<^pmin(if,^)<7r/2. Proposition 13.2.1 For two nontrivial subspaces if and M of <f"", if D M = (0) if and only if sin^pmin(if,^)>0 Proof. Obviously, if x G if D M is a vector of norm 1, then sin ?BjB( <£.*)< ||*+ (-*)ll=0 so (pmin(if, M) = 0. Conversely, assume <pmin(if, M) = 0. As the set <&='{(*, y)E. <p" x <p" | max{||*||, ||y\\) = 1} is closed and bounded, the continuous function ||* + _y|| has a minimum in the set <f>, which in our case is zero. In other words, ||*() + y0|| = 0 for some x0 G if, y0 G M, where at least one of ||*0|| and ||_y0|| is equal to 1. But then, clearly, *0 G if n M "- {0}. n We also need the notion of the "spherical gap" between subspaces. For nonzero subspaces if, M in (p" the spherical gap 0(if, M) is defined by 0(if, M) = max{sup d(x, SX), sup d(x, SM)} We also put 0({O}, if) = 0(if, {0}) = 1 for every nonzero subspace if in £" and 0({O}, {0}) = 0. The spherical gap is also a metric in the set of all subspaces in <f"". Indeed, the only nontrivial statement that we have to verify for this purpose is the triangle inequality: 6(<e, Jt) + 0(M, Jf)>&(<£, Jf) (13.2.4) for all subspaces if, M, and Jf in <p". If at least one of if, M, and Jf is the zero subspace, (13.2.4) is evident (observe that 6(£,M.)<2 for all sub- spaces if, M C <p"). So we can assume that if, M and JV are nonzero. Given x G 5^, let zx G SM be such that ||* - zx\\ = d(x, SM). Then for every y E. S^, we have
394 The Metric Space of Subspaces II* - J'll s II* - z,ll + Ik - y\\ = d(x, Su) + \\zx -y\\ and taking the infinum with respect to y, it follows that d(x, Sy) s d(x, SM) + d{zx, Sx) < 6(Jf, M) + 6(M, SB) It remains to take the supremum over x G Sx and repeat the argument with the roles of Jf and i? interchanged, in order to verify (13.2.4). In fact, the spherical gap 0 is not far away from the gap 0 in the following sense: 8(<e,M)<0(<e,M)<V28(£,M) (13.2.5) The left inequality here follows from (13.1.3). To prove the right inequality in (13.2.5) it is sufficient to check that for every x G <p" with ||jc|| = 1 we have d(x,Sy)^y/2d(x,Se) (13.2.6) where !£ C <p" is a subspace. Let y - Pxx, where Px is the orthogonal projector on i£. If y — 0, then x 1 if, and for every z G S^, we have ||x-z||2 = ||^||2 + ||z||2 = 2 = 2[^,^)]2 So (13.2.6) follows. If y¥=0, then, in the two-dimensional real subspace spanned by x and y, there is an acute angle between the vectors x and y. Consider the isosceles triangle with sides x, y/\\y\\ and enclosing this acute angle. In this triangle the angle between the sides >"/||y|| and x - y/||.y|| is greater than it/4. Consequently \x-^\<V2\\x-y\\=V2d(x,X) and (13.2.6) follows again. Proposition 13.2.2 For any three subspaces 5£, M, Jf C <p", sin <pmin(if, #) s sin <pmin(i?, M) - 6(M, Jf) (13.2.7) Proof. Let _y, G i? and y3 G Jf be arbitrary vectors satisfying max{||>'1||, ||y3||} = 1. Letting e be any fixed positive number, choose y2 G M such that f| jv2 j I== M JV3.11 and 11* " y2ll =£ 0(M, Jf) + e)\\y3\\ =s 0(M, Jf) + e
The Minimal Angle and the Spherical Gap 395 Indeed, if y3 = 0, choose y2 = 0; if y3 ¥=0, then the definition of 0(M, Jf) allows us to choose a suitable y2. Now As e >0 was arbitrary, the inequality (13.2.7) follows. □ The angle between subspaces allows us to give a qualitative description of the result of Theorem 13.1.3. Theorem 13.2.3 Let M,Jf be subspaces in <p" such that M D Jf = {0}. Then for every pair of subspaces Jp^Cf" such that 0(41,41,) + 0(Jf, JV,)< sin <pmin(Jl, Jf) (13.2.8) we have M.lC\Jfl = {0}. If, in addition, M + Jf = <p", then every pair of subspaces M^Jf, satisfying (13.2.8) has the additional property that Ml+jfl = i;''. Proof. In view of Proposition 13.2.2 we have sin ?„,,.„(,«„ JV.) > sin vmia(MltJf) ~ 6(Jf, Jft) and sin <pmin(Mlt Jf)>s\n <pmin(Jf,M)- B(M,Mt) Adding these inequalities, and using (13.2.8) and Proposition 13.2.1, we find that Mx n JV, = {0}. Assume now that, in addition, M + Jf - §". Suppose first that M = Mx. Let e > 0 be so small that e^Ji^ + e s <t If M + Jfx ¥ <p", then there exists a vector x G <p" with ||*|| = 1 and ||^->-||>S for all y&M+Jfx [e.g., one can take xE.(M + JV,)1]. We can represent the vector jc as x = y + z, yE.M, ze.Jf. It follows from the definition of sin tpmin(M, Jf) that \\z\\*(sin9min(M,Jf))-1 Indeed, denoting u = max{||y||, ||z||}, we have
396 The Metric Space of Subspaces sin <pmin(£, M) = inf{||*, + x2|| |jc, E J?, *2 E M, max{\\Xl||, ||jc2||} = 1} y-+Z- i J_ «~llz| By the definition of 0{2£, M) we can find a vector z, from jV, with lkll = NI. ||z-z1||<[^,^) + 6]||z||<s The last inequality contradicts the choice of x, because z - z, = x - t, where t = y + zlEM + Jfx, and ||jt-f||<S. Now consider the general case. Inequality (13.2.8) implies d{Jf, Jfl)< sin <pmin(M, JV) and, in view of Proposition 13.2.2, d(M,Ml)< sin <pmin(Jt, Jft). Applying the part of Theorem 13.2.3 already proved, we obtain M + Mx = <p" and then Ml + Jft = <p". □ /J.J MINIMAL OPENING AND ANGULAR LINEAR TRANSFORMATION In this section we study the properties of angular transformations in terms of the minimal angle between subspaces. Let MX,M2 be subspaces in <p". The number ■qiM^MJ = inf{||* + y\\\xGMl,yeM2, max(||*||, \\y\\) = 1} is called the minimal opening between Mi and M2. So ■q[Mx, M2) = sin (pmiJ,MltM2) where ^>mjn(^j, M2) is the minimal angle between Mx and J<2. By convention, t/({0}, {0}) = oo. if n is any projector defined on <p", then max{||n||, ||/ - n||} < (7,(Im n, Ker II)) ' (13.3.1) To see this, note that for each z G <p" ||z|| = ||nz + (/-n)z||>r,(Imn,Kern)-max(||nz||,||(/-n)z||) We would like to mention also the following properties of the minimal opening. If Qx and Q2 are nontrivial (i.e., different from 0 and I) orthogonal projectors of <p" onto the subspaces Mx and M2, respectively, then and
Minimal Opening and Angular Linear Transformation 1 - r\yMx, M2) = sup 2 = sup 0^xGM{ ||*|| 0*yeM2 \\ y\\ Indeed, these formulas follow from the equality in{{\\x+y\\\yEM2) = \\x-Q2x\ for every iGl,, and from 397 (13.3.2) inf I*-(Mill2. -. lWl2-ll(Mli2 11*11 ] - i inf , ,,^'2-n 1 \(x, y)\ 1 - sup 2 = 1 - sup sup . 2 OjiiEJI, ||Jt|| x<EMiy£M2 \\X\\ \\y\\ x*0 y¥-0 \\Q*\ = 1 - sup sup '2 2 = 1 - \\QM\2 yBM.x^M, \\X\\-\\y\\- 0*yGM2 ||>-||2 = inf 0*y£M2 \y-QM\V \\y\\ J As a consequence of (13.3.2) we obtain the following connection between the minimal opening and the distance from one subspace to another. For two subspaces Mx and M2 in <p", put p{Mx,M2) = sup d(x,M2) [if Ml = {0}, then define p(Ml, J<2) = 0]. Then we have p(M2, Mt) = (l- viM^M,)2)112 = cos <pnin(MltM2) (13.3.3) whenever Ml # {0}. To see this, note that for J<2 # {0} p{M2,Mx)- sup jy-ji — = sup I, I, o*yeM2 \\y\\ o*y^M2 \\y\\ where Qx is the orthogonal projector onto Mx. But then we can use (13.3.2) to obtain formula (13.3.3). If M2 = {0}, then (13.3.3) holds trivially. We use the notion of the minimal opening between two subspaces to describe the behaviour of angular transformations when the corresponding projectors are allowed to change.
398 The Metric Space or Subspaces Lemma 13.3.1 Let Il0 be a projector defined on <p", and let Yl be another projector on <p" such that Ker Il0 = Ker n. Then, provided p(Im IT, Im n0) < T/(Ker n„, Im n„) we have the following estimate for the norm of the angular transformation R of Im n with respect to Yl0: || A|| s p(Im n, Im n0)(Tj(Ker no, Im II0) - p(Im n, Im IT,,))"1 (13.3.4) Proof. Put p0 = p(Im II, Im Il0) and t/0 = T/(Ker no, Im no). Recall that « = (n-n0)|lmI(n (13.3.5) For xGlmll and z £ Im no we have ||(n - n0)jt|| = ||(/ - n„)jt|| = ||(/ - n0)(* - z)|| * \\i - nj ||* - z|| Taking the infimum over all z G Imll0 and using inequality (3.1), one sees that IKn-iioHI^PoTjo'lHl, *eimn (13.3.6) Now recall that Ry + y G Im 11 for each yGlmIlu. As /?yGKerIl0 = Kerll, we see from (13.3.5) that (n-ll0)(Ry + y)=Ry So, using (13.3.6), we obtain \\Ry\\^pov~ol\\Ry + yL yeimn0 (13.3.7) It follows from (13.3.7) that (1 - p0-qQ ')||/?y|| ^put?« 'IMI for each y G Imll0, which proves the inequality (13.3.4). D The following lemma will be useful. Lemma 13.3.2 Let P and P* be projectors defined on <p" such that <p" = Im P + Im P*. Then for any pair of projectors Q and Qx defined on <p" with \\P - Q\\ + \\P* - Q*\\ sufficiently small, we have <p" = Im Q + Im Qx, and there exists an invertible transformation S: <f""—* <p" which maps Im Q on Im P, ImQ" on Im Px, and
Minimal Opening and Angular Linear Transformation 399 max{||5 - /||, US"1 - /||} < fi(\\P- Q\\ + \\PX - Qx\\) (13.3.8) where the positive constant /3 depends on P and Px only. Proof. Let a = ^(Im P, Im PX)(||PX|| + l)_l, and assume that the projectors Q and Qx satisfy H^-eiMl^-ei-^ (13.3.9) As 0(lm P, Im Q) < || P - Q\\ and 0(lm Px, Im Q x) < || Px - Q X ||, condition (13.3.9) implies that V20(Im P, Im Q) + V20(Im Px,Im Q x) < Tj(Im P,Im/>x) But then we may apply Theorem 13.2.3 combined with (13.2.5) to show that <f:" = Im0 + Im0x. Note that (13.3.9) implies that \\P- Q\\<\. Hence S, = / + P- Q is invertible, and we can write S^l = I + V with ||V|| ^ 5 ||P - Q\\ < }. As / — P + Q is invertible also, we have ImP = P(I- P + Q) = PQ = (I + P- Q)Q = 5,(Im Q) (13.3.10) Further s1Qxs];l - px = (/ + p- Q)Qx(i + v) - px = QX +{P-Q)QX + QXV + (P-Q)QXV-PX = Qx - Px + (P-Q)(QX - PX) + {P-Q)PX + (QX -PX)V + PXV + (P-Q)(QX -PX)V + (P-Q)PXV So ||S,0xSr' - PX|| ^3||ex - Px\\ + 3||P- OH • ||PX||. But then p(lmSlQ*S;l,lmP*)^SlQxS;1 - Px\\ <3(||P-0|| + ||Px-0x||)(||Pl + l) <^(ImP,ImPx) Let n0(Il) be the projector of <p" along Im P (Im Q) onto Im Px (Im(2) and put n=5in5~1. Then II is again a projector, and by (13.3.10) we haveKerfi = Kern0. Further, Imft = Im SlQxS;1, and so we have p(Im ft, Im n0) < |r/(Ker n<„ Im no) Hence, if R denotes the angular transformation of Im ft with respect to no, then because of equation (13.3.4) of Lemma 13.3.1, we obtain
400 The Metric Space of Subspaces ||fl|| <2p(Im II, Im n„)[i,(Ker II0, Im no)]"' As p(Imii,Imn0)<3(||/>-e|| + ||/jX-e,<||)(||/jX|| + l), this implies that II*ii^(I|j,-gii + iij,x-g1) (13.3.H) Next, put S2 — I- RU0, and take 5 = 52S,. Clearly, S2 is invertible; in fact, SJ1 = / + RU{). It follows that S is invertible also. From the properties of the angular transformation one easily sees that S(Im Q) = Im P, S(Im0x) = ImPx. To prove (13.3.8), we simplify our notation. Put d= \\P- Q\\ + \\PX - 0*||, and let v = 7j(Im P, Im P*). From S = (/- RIl0)(I + P- Q) and the fact that \\P - 0|| < i, one deduces that ||S - /|| < ||P - Q\\ + f \\R\\ • ||n0||. For \\R\\ an upper bound is given by (13.3.11), and from (13.3.1) we know that ||n0||<T/-1. It follows that WS-lW^d+Uiar,)'1 (13.3.12) Finally, we consider S~l. Recall that S~l = I + V with ||V||< I\\P-Q\\<1 Hence ||5-,-/||=s||K|| + ||V||-||n0||.||i?|| + ||i?||-||n0|| *i\\p-Q\\ + m\\-\\n0\\*id + u(<*vrl and (13.3.8) follows in view of (13.3.12). □ 13.4 THE METRIC SPACE OF SUBSPACES We have already seen in Section 13.1 that the set <£(<p") of all subspaces in <p" is a metric space with respect to the gap 0(i?, M). In this section we investigate some topological properties of <£(£"), that is, those properties that depend on convergence (or divergence) in the sense of the gap metric. Theorem 13.4.1 The metric space 4?(<p") is compact, and, therefore, complete (as a metric space). Recall that compactness of <£(£") means that for every sequence i?,,i?2,... of subspaces in <£(<p") there exists a converging subsequence •2J., Z£h,.. ., that is, such that Iim0(«2;v «%) = ()
The Metric Space of Subspaces 401 for some i?0G ^((p"). Completeness of <$(§") means that every sequence of subspaces i£(, i = 1, 2, . . . , for which lim,, -_„ 0(i^, &j) = 0 is convergent. Proof. In view of Theorem 13.1.2, the metric space <$(§") is decomposed into components 4?m, m = 0,. . . , n, where 4?m is a closed and open set in C|?(<P") consisting of all m-dimensional subspaces in <£"". Obviously, it is sufficient to prove the compactness of each <$m. To this end consider the set $m of all orthonormal systems u = {«*}*=, consisting of m vectors u,,. . . , um in £". For u = {uk)mk={ E$m,v = {«,}?_, G £m define m -| ; 2 ll"*-i>J2 1/2 S(u, v) = It is easily seen that 8(u, v) is a metric in £m, thus turning $m into a metric space. For each u = {«t}™=1 G £m define ;4mu = Span{ut,. . . , um) G <J?m. In this way we obtain a map A m: $m —> <J?m of metric spaces $m and (pm. We prove that the map Am is continuous. Indeed, let i?G (J?m and let vl,...,vm be an orthonormal basis in 3?. Pick some u = {uk}k"=l G ^m (which is supposed to be in a neighbourhood of v = {vk}k"=l G £m). For u,, / = 1,. . . , m, we have (where M = /tmu and Px stands for the orthogonal projector on the subspace JV): lltf* " ^Kll = II *V, " "il ^ II^K " ",)ll + II", - "ill ^ II^J Ik - «,IMk - t\N2S(«, i;) and thus for jc = Efl, c^u, G 5^ ll(^-^)*N2 2k|s(«,i') /=i Now, since ||jt|| = E™=1 |a,|2 = 1, we find that \a\ < 1 and E™ , |a,| <m, and so ll(^-^)LN2mfi(ii,i>) (13.4.1) Fix some y G Sy±. We wish to evaluate PMy. For every x G i?, write (x, P„y) = (P„x, y) = ((P* - P*)jc, y) + (x, y) = ((PM - Px)x, y) and \(x,PMy)\^2m\\x\\8(u,v) (13.4.2) by (13.4.1). On the other hand, write
402 The Metric Space of Subspaces m then for every zE.iE1 (z, p.«y) = (z,2 «,(", - «>,■)) + (z,2 «,",) = (z,2 «,(«, - y,)), v 1=1 ' v /=i ' v ;=i ' and |(z,P^)|=£||z||E «,(",-«,) <||z||m max |or,.| Hii,. - u,| X^i-^m But ||>-|| = 1 implies that E,m=1 |«J2 < 1, so max{|a,|, . . . , |aj}<l. Hence \(z,PMy)\^\\z\\m8(u,v) (13.4.3) Combining (13.4.2) and (13.4.3), we find that \(t, PMy)\ <3mS(u, v) for every t E <p" with \\t\\ = 1. Thus \\PMy\\^3m8(u,v) (13.4.4) Now we can easily prove the continuity of Am. Pick an *£<£" with ||*|| = 1. Thus, using (13.4.1) and (13.4.4) we have \\{PM - Pr)x\\ ^ \\{PM - Px)P*x\\ + \\PM(x - P^x)\\ <5m • 8(u, v) so 6(M,<e) = \\Ptt - Py\\^5m8(u,v) which obviously implies the continuity of A m. It is easily seen that $m is compact. Indeed, this follows from the compactness of the unit sphere {x £ (f" | ||*|| = 1} in §". Since Am'- $m —* fym 's a continuous map onto (f?m, the metric space <$m is compact as well. Finally, let us prove the completeness of 4?m. Let i?,, i?2,. . . be a Cauchy sequence in (f?m, that is, 0(i^, i^-)—»0 as /, y-»oo. By compactness, there exists a subsequence i^ such that \imk^0{Z£ik, Z£)=0 for some i?e (J7m. But then it is easily seen that in fact if = lim,_„ J^. D Next we develop a useful characterization of limits in 4?(<P")- Theorem 13.4.2 Let Mx, M2,. . . be a sequence of m-dimensional subspaces in 4?(<pn), such that 6(Mp,M)-^>0 as p-^<x> for some subspace M C <p". Then M consists of exactly those vectors x £ <p" /or which there exists a sequence of vectors x £ <p", p — 1, 2,. . . suc/i f/iaf jr E ./# , /j = 1,2,. . . and x = lim « jt .
The Metric Space of Subspaces 403 Proof. Denoting by Py the orthogonal projector on the subspace Jf C <p", for every xEM. we have: ll*V -*ll - IIC*,- ^)*ll ^ \\Pmp-Pm\\ ■ 11*11 <6{Mp,M)\\x\\->Qzs p-»« So xp = Pu x has the properties that xp £ Mp and limp^ xp = x. Conversely, let xp £ Mp, p = 1, 2, . . . be such that lim^.. xp = x. Then IIP.** ' x\\ s ||/V - P^*|| + IIP^* - PMxp\\ + \\xp - *|| ^d(M,Mp)\\x\\ + \\PMp\\-\\x-xp\\ + \\xp-x\\ *0{M,Mp)\\x\\+2\\x-xp\\->0as p-^°° (in the last inequality we have used the fact that the norm of an orthogonal projector is 1); so PMx = x, and xE M. □ Using Theorems 13.4.2 and 13.1.3, one obtains the following fact. Theorem 13.4.3 Let ^Band M be direct complements to each other in <p", and let {iCm}* = 1, {■^mJm-i be sequences of subspaces such that £m8(2m,2)=]ime(Mm,M) = 0 Then, denoting by P (resp. Pm) the projector on i? along M {resp. on Z£m along Mm) we have rim\\Pm-P\\-0 m—*^ Moreover, there exists a constant K>0 depending on 5£and M only such that \\Pm - P\\ < K{6{Zm, 2) + 6(Mm,M)} (13.4.5) for all sufficiently large m. Observe that, in view of Theorem 13.1.3, the subspaces !£m and Mm are direct complements to each other for sufficiently large m. Proof. Let Pmie be the projector on Mm along it and PM m be the projector on M along Z£m (for sufficiently large m). By Theorem 13.1.3 we have
404 The Metric Space of Subspaces WPm.x-PW^CMM^M) (13.4.6) \\PM.m-P\\^C2e^m,X) where C^2\\P\\ max {d(x, M)'1}, C2 = 2||P|| max {d(x,2y1} x£2e,\\x\\ = l xEM,\\x\\-l As usual, d(x, N) = inf{||* - y|| | y G Jf} is the distance between x G <p" and a subset .A" C <p". In particular, for m large enough we find that H^ll = 1111 + 1 When Theorem 13.1.3 is applied again, it follows that ll^-/V*ll^2||/>m,y|| max {d(x,<eyl)e{<emt<e) x^Mm,\\x\\ = ] Now use (13.4.6) and deduce (for sufficiently large m): \\pm-p\\*\\pm-pmA + \\p«x-p\\ S2(||P|| + 1) max {d(x, 2)~l}8(Sem, SB) + Cl6(Mm, M) x<=Mm,\\x\\=\ We finish the proof by showing that max {d(x,Se)~l}^2 max {d(xy&y1} (13.4.7) *£.*„„ ||x ||=1 jre^,||*||=1 for m sufficiently large. Arguing by contradiction, assume that (13.4.7) does not hold. Then there exists a subsequence {Mm }^=1 and vectors xm G Mm with norm 1 such that d(xm,Xyl>2 max {<*(*, «£)"'} (13.4.8) * j:G^(,||j:|| = 1 As the sequence {xm }£_, of ^-dimensional vectors is bounded, it has a converging subsequence. So we can assume that xm —»*0 as &—»°°. Clearly, ||jt0|| = l, and by Theorem 13.4.2, x0E.M. In view of (13.4.8) for each k = l,2,. . . , there is a vector yk e if such that \\xk-yk\\<(2 max {d{x,Z£yl)yl (13.4.9) In particular, the sequence {yk}k^i is bounded, and we can assume that yk—>y0 as k —>°°, for some y0Gif. Passing to the limit in (13.4.9) when k tends to infinity, we obtain the inequality
The Metric Space of Subspaces 405 2 max {d(x,Serl}*\\x0-y0\\-1* max {d(x,^)~1} which is contradictory. □ The proof of Theorem 13.4.3 shows that actually equation (13.4.5) holds with tf = 4(||P|| + l) max {d(x,Xy1} xeM,\\x\\ = i We conclude this section with the following simple observation. Proposition 13.4.4 The set <$m($") of all m-dimensional subspaces in <p" is connected. That is, for every M, Jf G <$m(§") there exists a continuous function f: [0, l]-» $m($) such that /(0) = M, /(l) = Jf (and the continuity of f is understood in the gap metric). Proof. Using the proof of Theorem 13.4.1 and the notation introduced there, we must show that the set $m is connected. As any orthonormal system «,,..., wm in <fV can be completed to an orthonormal basis in <p", the connectedness of $m would follow from the connectedness of the group £/(£") of all n x n unitary matrices. To show that £/(£") is connected, observe that any X G U(§") has the form *=Sdiag[e"\.. .,ei6"]S'1 , where S is unitary and 0,,. . . , 0„ are real numbers (see Section 1.9). So f(t) = Sd\ag[ei,e\...,ei,6"}S-1, f£ [0,1] is a continuous U(§")-valued function that connects / and X. □ Similarly, one can prove that the set §m(x%") of all m-dimensional subspaces in i%" is connected. To this end use the facts that any orthonormal systems u,,..., um in $" (m < n) can be completed to an orthonormal basis «,,. . . , un with det[U[, u2,. . . , un] - 1 and that the set U+($") of all orthogonal n x n matrices with determinant 1 is connected. Recall that a real n x n matrix U is called orthogonal if UTU = UU T = I. For completeness, let us prove the connectedness of U+($"). It follows from Theorem 12.1.4 that any * E U+($") admits the representation *=S",diag[Kl,K2,...,K|,]S where S is orthogonal and each Kj is either the scalar ±1 or the 2x2 matrix
406 The Metric Space of Subspaces *» = - c-n 0 os 0 Mor some "> 0 — ^ — 2tt, which depends on j. As det X= 1, also det[K,, K2,. .. , KB] = 1, which means that the number of f-1 01 indices j such that Kj = -1 is even. Since _ = Wv, we can assume that each /Cy is either 1 or *fl, 0 = 0(y). Putting X(t) = S"1 diag[ *,(*), K2(t),..., Kp(t)]S , 0< f <1 where /^(f) = K, if Ky = 1 and Ks(t) = Vm if Kf = *„, we obtain a £/+(#")- valued continuous function that connects / and X. 13.5 KERNELS AND IMAGES OF LINEAR TRANSFORMATIONS Important examples of subspaces in <p" are images of transformations into <p" and kernels of transformations from <p". We study here the behaviour of these subspaces when the transformation is allowed to change. The main result in this direction is the following theorem. Theorem 13.5.1 Let X: <f""-* <f"" be a transformation, and let Px be a projector on Ker X. Then there exists a constant K>0, depending only on X and Px, with the following property: for every transformation Y: (£""—» <pm with dim Ker Y = dim Ker X there exists a projector PY on Ker Y such that \\PY-PX\\^K\\X-Y\\. (13.5.1) In particular 0(Ker*,Kery)<A:||*-y||. (13.5.2) Proof. It will suffice to prove (13.5.1) for all those Y with dim Ker Y = dim Ker X that are sufficiently close to X, that is, ||X - Y\\ < €, where e > 0 depends on X and Px only. Indeed, for Y with dim Ker Y = dim Ker X and ||.Y-Y||>e, use the orthogonal projector PY on Y and the fact that IJP,, - Px\\ < ||PJ + HPJ = 1 + ||Pj to obtain (13.5.1) (maybe with a bigger constant K). Consider first the case when X is right invertible. There exists a right inverse X' of X such that Im X' = Im(/ - Px) (cf. Theorem 1.5.5), and then X'X = I - Px (indeed, both sides are projectors with the same kernel and the same image). It is easy to verify that any transformation Y: <f""—>• <f"" with the property .\\Y-X\\*l\\X'\\-1 (13-5.3)
Kernels and Images of Linear Transformations 407 is also right invertible and one of the right inverses Y1 is given by the formula Y1 — ZX1 where Z=2 (-l)"(X'(Y- X))" n = 0 Indeed, we have y = x + ( y - x) = x(i + x'(Y - x)) and hence YZX' = lim X(I + X'(Y - X))t2 (~\)"(X'(Y- X))")x' = lim *(/ + (-1 )*(*'( y - A-))*)*' = XX' = I where the penultimate equality follows from (13.5.3), because \\(X\Y-X))k\\*2- -k A similar argument shows that Z is invertible and ||Z||^2, ||/-Z||< 2\\X'\\ \\X- Y\\. Now put PY = /- y'y. We have Up, - /Ml = ||*'*- y'y|| = H*'*- *'zy|| ^||*'||-||*-zyN||*'||-{||/-z||-||*|| + ||z||-||^-v||} <||*l||{2||*'||-||A'-y||-||*|| + 2||*-y||} So (13.5.1) holds for every Y satisfying ||y-*|| < |||*'|r', with /C = 2||*'||2||A'||+2||Z'||. Now consider the case when X is not right invertible, and let r be the dimension of a complementary subspace N to Im X in (pm. Consider the transformation defined by X(x + y) = Xx + Ly; xE <p\ y E $r, where L: <f:r-» Jf is some invertible transformation. As the image of A" is the whole space <pm the transformation X is right invertible. Also Ker X = Ker X. Let P^ be a projector on Ker * defined by Pk{x + y) = Pxx; x E £", y£ <pr. Applying the part of Theorem 13.5.1 already proved to X, we find positive constants e and K such that, for every transformation Y: <p" © <f"-» <pm with ||* - y|| < e, there exists a projector Pf on Ker Y such that
408 The Metric Space of Subspaces \\Pi-PA*K\\X-Y\\ (13-5.4) Note that the equality dim Ker Y = dim Ker X holds automatically for e small enough because then such Y will also be right invertible (see the first part of this proof). Apply (13.5.4) for Y of the form Y(x + y) = Yx + Ly; x e <p", y G £r, where Y: <p"-» <pm is a transformation such that \\X- Y\\ =£ e and dim Ker Y = dim Ker X. Let us check that Ker Y C <p". Indeed dim Ker Y = dim Ker X = dim Ker X = dim Ker Y and since Ker Y C Ker Y, we have in fact Ker Y = Ker Y and thus Ker Y C (p". Now put Px = Px\ , Py = F^i n to satisfy (13.5.1), for transformations Y: <" -* (pm such that ||X - Y\\ < e. Finally, observe that (13.5.2) follows from (13.5.1) in view of Theorem 13.1.3. □ The condition dim Ker Y = dim Ker X is clearly necessary for the inequality (13.5.1), since otherwise we obtain a contradiction with Theorem 13.1.2 on taking a Y: <p"-+ £" such that \\X - Y\\ < K~\ A result analogous to Theorem 13.5.1 also holds for the images of linear transformations. The statement of this result is obtained from Theorem 13.5.1 by replacing KerX and Ker Y by Im X and Im Y, respectively, and its proof is reduced to Theorem 13.5.1 by observing that Im A = (Ker A*)1 for a linear transformation A and that 6(M, Jf) = 6(Mx, jVx) for any subspaces M, N C <p". 13.6 CONTINUOUS FAMILIES OF SUBSPACES As before, we denote by ^((p") the set of all subspaces in <p" seen as a metric space in the gap metric. In this section we consider subspace-valued families Z£(t) defined on some fixed compact set K C $m, that is, for each t G K, i?(f) is a subspace in <p". The family Z£{t) will be called continuous (on K) if for every t0E K and every e >0 there is S >0 such that ||f-fj<5, tEK implies 6(<£(t), if(f0)) < e (the norm \\t - t0\\ is understood as the Euclidean norm, that is, generated by the standard scalar product (x, y) = E™,, xiyi for x = (xl,...,xm),y={yl,...,ym) E^m). In other words, the continuity is understood in the sense of the gap metric. Examples of continuous families of subspaces are provided by the following proposition. Proposition 13.6.1 Let B(t) be a continuous m x n complex matrix function on K such that rank B{t) = p is independent of t on K. Then Ker B(t) and Im B(t) are continuous families of subspaces on K.
Continuous Families of Subspaces 409 Proof. Take t0 £ K. There exists a nonzero minor of size p x p of B(/0). For simplicity of notation assume that this minor is in the upper left corner of B(t0). By continuity the p x p minor in the upper left corner of B(t) is also nonzero as long as t belongs to some neighbourhood U0 of t0. So [here we use the assumption that rank B(t) is independent of t] for t £ UQ lmB(t) = Span{bt(t),...,bp(t)} (13.6.1) where fc,(f) is the ith column of B(t). Let btj(t) be the (i, y')th entry in B(f); and let D(t) = [6^(0]!:^=,; C(0 = [MOlf.,--,- Then the matrix is a continuous projector with Im P(t) = Im B(t). Hence P(t) is uniformly continuous on Ul, where Ul is a neighbourhood of t0 in ^ such that Ul C t/0. By Theorem 13.1.1 [inequality (13.1.4)] the orthogonal projector on Im B(t) is also uniformly continuous on I/,. The statement concerning Ker B(t) can be reduced to that already considered because Ker B(t) is the orthogonal complement to lm(B(t))* (note that B(t)* is continuous in t if B(t) is). □ In particular, we obtain an important case. Corollary 13.6.2 Let P(t) be a continuous projector-valued function on K. Then Im P(t) and Ker P(t) are continuous families of subspaces on K. We have to show that rank P(t) is constant if the projector function P(t) is continuous. But this follows from inequality (13.1.4) and the fact that the set of subspaces of fixed dimension is open in the set of all subspaces in <p" (Theorem 13.1.2). The following characterization of continuous families of subspaces is very useful. Theorem 13.6.3 Let i£{t) be a family of subspaces {of <p") on a connected compact subset K of $m. Then the following properties are equivalent: (a) Z£(t) is continuous; (b) for each tE.K there exists an invertible transformation S(t): £"—»<(?" which depends continuously on t for tE. K, and there exists a subspace M C <p" such that Z£(t) = S{t)M for all t £ K; (c) for each r0 £ K there exist a neighbourhood Ut of t0 in K, an invertible transformation S,(t): <pB—»<p" that depends continuously on t in U, , and a subspace Mt C <p" such that m = sla(t)Mla,teuh.
410 The Metric Space of Subspaces We prove Theorem 13.6.3 only for the case K = [0,1] (of course, the case when K C $ is easily reduced to this one). The proof when K is a connected compact set of $m requires mathematical tools that are beyond the scope of this book [see Gohberg and Leiterer (1972) for the complete proof]. Proof. Assume that J£(t) is continuous on K = [0, 1]. Let 0 = t0 < t, < t2 < ■ • ■ < tp _, < t = 1 be points with the property that ll^o,)-^(,)ll<l for f,<7,<f, + 1, i = 0,...,p-l Here Px is the orthogonal projector on the subspace Jf C <p". For each i: - 0,. . . , p - 1, the transformation S,.(tj), tt < t\ < ti + l, defined by S.(t/) = / - (P M{, ( - Pmm) maps •/#(',-) on M(tj), is invertible and S,(f,) = I. Now put S(0=S,(f)-"S1(f2)So(f1) for ti<t<tl+l; M = M(0) to satisfy (b). Obviously, (b) implies (c). Finally, let us prove that (c) implies (a). Given S, and Mt as in (c), let P0 be the orthogonal projector on Mt . Then 5, (t)P0(St (0) ' is a projector on Z£(t); therefore, for tE.il, we have e(2(t), ^(<o)) = nvonvo" - wWor'ii < ll^onv)"1 - WW)"1!! + II V»)p»5J0 "l - Vo)poVo)_,ll ^IIVO-VOII-ll^oll-IIW'll + IIVo)l|-|lfol|-||\(0",--SfDOo)",H- As 5, (0 is continuous and invertible in U, , its inverse is continuous as well, and the continuity of !£(i) follows from the preceding inequality. □ Corollary 13.6.4 Let Z£(t) be a continuous family of subspaces (of §") on K, where KG $m is a connected compact set. Then there exists a continuous basis *,(f), . . . , x (t) in Z£(t), where p = dim Z£(t). (Note that because of the connectedness of K the dimension of i£(t) is independent of t on K.) Indeed, use Theorem 13.6.3, (b) and put Xj(t) = S(t)xjf j; = 1,. . . , p, where jt,,. . . , xp is a basis in M. Corollary 13.6.5 Let B(t) be a continuous mx n matrix function on a connected compact set KG If, such that rank B(t) = p is independent of t. Then there exists a
Applications to Generalized Inverses 411 continuous basis *,(?)>. . . , x„_p(t) in Ker B{t) and a continuous basis yi{t),...,yp(t)inlmB{t). This corollary follows from Corollary 13.6.4, taking into account Proposition 13.6.1. 13.7 APPLICATIONS TO GENERALIZED INVERSES In this section we apply results of the preceding sections to study the behaviour of a generalized inverse of a transformation when this transformation is allowed to change. Recall that a transformation B: <pm—»<p is called a generalized inverse of a transformation A: <p" —» <f"" if the equalities BAB = B, ABA = A hold (see Section 1.5). As an application of Theorem 13.5.1, we have the following result concerning close generalized inverses for close linear transformations. Theorem 13.7.1 Let X: <p" -h> <pm be a transformation with a generalized inverse X1: <pm —* <f. Then there exist constants K>0 and e > 0 with the property that every transformation Y: £" -»<pm with \\Y- X\\<e and dim Ker Y = dim Ker X has a generalized inverse Y satisfying \\Y'-X'\\<K\\Y~X\\ (13.7.1) Proof. By Theorem 1.5.5, the generalized inverse X is determined by a direct complement Jf to Ker X in <p" and by a direct complement M to Im X in <f"", as follows: x'y = x;\pxy), >-e<r where Px is the projector on Im X along M, and Jf,: .A^Im Jf is the invertible transformation defined by Xtx = Xx, XE.M. Denote by 3Sf(Z) the set of all transformations Y:§"-^>§m such that dim Ker Y = dim Ker X. Using Theorem 13.1.3 and inequality (13.5.2), choose e,>0 in such a way that Jf is a direct complement to Ker Y for every Y G X(X) with HA'- y||<e,. Using the analog of Theorem 13.5.1 for images of linear transformations, we find a projector PY on Im Y such that IIP*-PJ =£*,!!*-Y|| (13.7.2) for every Y G 51T(Ar). Here the constant Ki depends on X and Px only. Our next observation is that, by Lemma 13.3.2 and (13.7.2), there exists
412 The Metric Space of Subspaces a positive number e2 =s e, such that for any Y G 3£(X) with || X - Y\\ < €2 we can find an invertible transformation SY: (p",-^<p"' with SY(lm Y) = Im X and max(||Sy-/||,||S,;1-/||)</yX-Y|| where the positive constant K2 depends on X and Px only. Let Y - SYY, and note that for every generalized inverse Y1 of Y the transformation Y'SY is a generalized inverse for Y. Now for YE.3C(X) with H*- Y\\ < e2 we have lly's.-^ll^lly'-^'ll + lly^-y'Hlly'-^ll + lly'll^ll^-yll so it is sufficient to prove Theorem 13.7.1 for Y in place of Y. In other words, we can (and will) assume that the transformation Y from Theorem 13.7.1 satisfies the additional property that Im Y = lmX. Now we verify (13.7.1) for the generalized inverse Y' = Y\lPY, where y,: jV-h> Im y = Im X is defined by Ytx =Yx,x& JV. Indeed \\Y;lpY-xVrx\\^\\YV\\-\\P¥-PA + \\YV-xV\\-\\Px\\ and \\y;1-x;1\\ = \\y~\x, - y,)^,'||<lly;1!! II*, - r.ll PT'll ^llyr'llll^-ylMl^r'll But the norms ||y^'|| are bounded provided the transformation YE.J((X) with lmY = lmX is such that \\X - Y\\ < 111^'II- Theorem 13.7.1 is proved. □ Observe that the complete analog of Theorem 13.5.1 does not hold for the case of generalized inverses. Namely, given X and X as in Theorem 13.7.1, in general there is no positive constant K such that any transformation Y: <p"—»<pm witn dim Ker Y = dim Ker X has a generalized inverse Y1 satisfying (13.7.1). To produce an example of such a situation, take n = m and let X: <£""—» <p" be invertible. Then there is only one generalized inverse of X, namely, its inverse X~\ Further, let Y = aX, where a¥=Q. If (13.7.1) were true, we would have for some K>0 and all a: la-'-iHljr-'ll^lk-ilHI*"'!! which is contradictory for a close to zero. Now we consider continuous families of transformations and their generalized inverses. It is convenient to use the language of matrices with the usual understanding that n x m matrices represent transformations from <pm into <p" in fixed bases in <pm and <p".
Applications to Generalized Inverses 413 Theorem 13.7.2 Let Bit) be a continuous mx n matrix function on a connected compact set KC$q such that rank B(t) - p is independent of t. Then there exists a continuous n x m matrix function X(t) on K such that, for every tE. K, X(t) is a generalized inverse of B(t). Proof. In view of Corollary 13.6.5 there exists a continuous basis *,(?),. . . , xn_p(t) in Ker B(t), as well as a continuous basis _y,(f),. . . , yp(t) in Im B(t). By the same corollary there exist a continuous basis xn-P + i(0, • • - , *„(') in Im B(0* and a continuous basis yp+l(t),. . . , ym(t) in KerB(f)*. As Im B{t)* = (Ker B(t))1, it follows that x^t),. . . ,x„(t) is a basis in <pm for all re K. Also, yt(t), . . . , ym(t) is a basis in <pm for all tEK. Define a transformation X(t): <pm^> £" as follows: X(t)yj(t)=0, j = p + 1, . . . , m; and for / = 1,. . . , p X(t)yj(t) is the unique vector in lm B(t)* such that B(t)X(t)yi(t) = y.(t). Theorem 1.5.5 shows that X(t) is indeed a generalized inverse of B(t) for all t G K. It remains to show that X(t) is continuous. For a fixed vector z G <pm and any t G /£, write z = E^l, Z;(0>\(0> f°r some complex numbers z,(f) that depend on f. These numbers z,(0 turn out to be continuous, because Further, the transformation B|lmB(f).:ImB(0*^ImB(0 is invertible, so n / ^ /i - p+1 for some complex numbers a7,(0 that also depend on t. Again, a-,(f) are continuous on /£. Indeed, a;,(0 is the unique solution of the linear system of equations n y,(t)= 2 ajl(t)B(t)xXt), j = \,...,p (13.7.3) i—n—p+1 = [yl(t)---ym(t)V1z Writing yfo), j = 1, ■ ■ ■ , p in terms of linear combinations of the standard basis vectors e,,. . . , em, and writing *,(<), i = n - p + 1, . . . , n in terms of
414 The Metric Space of Subspaces linear combinations of e,,. . . , en we can represent the system (13.7.3) in the form A(t)a(t) = C(t), t<=K (13.7.4) where a(t) is the /?2-dimensional vector formed by a;/(0> /= 1> • • • » P> i = n- p + 1,. . . , n, and A(t) and C(t) are suitable matrix and vector functions, respectively, which are continuous in t. As the solution of (13.7.4) exists and is unique for every t G K, it follows that the columns of A(t) are linearly independent for every t G K. Now fix t() G K, and assume for simplicity of notation that the upper p2 rows of A(t0) are linearly independent. Partition «»-[%$■• <*>-& where A0(t) and C„(f) are the top p2 rows of A(t) and C(t), respectively. Then At)(tQ) is nonsingular; as A(t) is continuous in t, the matrix A0(t) is nonsingular for every t from some neighbourhood £/, of tQ in /£. It follows that a(0 = Mo(0)_,C„(0 is continuous in f for f G [/, . As f0 G ^ was arbitrary, the functions av(0 are continuous on K. Returning to our generalized inverse X(t), we have for every m the following equalities: x(t)z = s 2,(0^(0^(0 = 2 z,.(o*(o '>-,(o i=i i=i = 2 Z,-(0 2 a,,(0*;(0 ; = n-p+l and so X(t) is continuous on /£. □ A particular case of Theorem 13.7.2 deserves to be mentioned explicitly. Corollary 13.7.3. Let B(t) be a continuous mx. n matrix function on a connected compact set Kd$q such that, for every t G K, the matrix B(t) is left invertible (resp. right invertible). Then there exists a left inverse (resp. a right inverse) X(t) of B(t) such that X(t) is a continuous function of t on K.
Subspaces of Normed Spaces 415 13.8 SUBSPACES OF NORMED SPACES Until now we have studied the notions of gaps, minimal angle, minimal opening, and so on for subspaces of <p" where the norm of a vector x = {*,,. . . , xn) is Euclidean: ||*|| = (E"=1 |*.|2)"2. Here we show how these notions can be extended to the framework of a finite-dimensional linear space with a norm that is not necessarily generated by a scalar product. Let V be a finite-dimensional linear space over <p or over ft. A real- valued function defined for all elements * G K, denoted by ||jt||, is called a norm if the following properties are satisfied: (a) ||*||s:0 for all xE.V; ||*|| =0 if and only if x = 0; (b) ||Ajc|| = |a| ||jt|| for every xEV and every scalar A (so A G <p or A G ft according as V is over <p or over ft); (c) ||* + y|| < ||*|| + ||y||, for all x, yE.V (the triangle inequality). example 13.8.1. Let/,, ..,/„ be a basis in V, and fix a number/? 3:1. For every * = E"=1 aJ^V, put 11*11, = (Ski") Also, define H*^ = max(|a,|,. . . , |a„|). We leave it to the reader to verify that ||'||,(p —1) and ||-||„ are norms (one should use the Minkowski inequality for this purpose): for any complex numbers xly. . . ,xn, yt,. . . , yn and any p>lwe have / " \Up/n \ Up I " \\lp (2|x/ + ^r) *\L\x,\') +(2W) a example 13.8.2. For V= <p" (or V= ft") let / " \ 1/2 NI = (2W2) /—i where x = (*,, ...,*„) belongs to <p" (or to ft"). We have used this norm throughout the book. Actually, this is a particular case of Example 13.8.1 (with the basis /. = e,, i = 1, . . . , n in £" (or ft") and p = 2). □ Any norm on V is continuous, as proved in the following proposition. Proposition 13.8.1 Let /,,. . . , fn be a basis in V, and let \\ ■ || be a norm in V. Then, given e > 0 there exists a 8 > 0 such that the inequality
416 The Metric Space of Snbspaces IIW|-|MII<e holds provided \xj — yj\<8 for j — l,...,n, where x = Z"=lxjfj and Proof. Letting M = max,£;.sn ||^||, choose 8 = eM~ln~l. Then for every x = E"=1 Xjfj, y = E"=1 >^. with |x;. - >»;.| < S, y = 1, . . . , n, we have n n \\x-y\\^\xj-yj\\\fj\\^MjJ\xryj\<Mn8 = e ;'=i ;=i It remains to use the inequality IIMI + IHII*II*-:HI which follows easily from the axioms of a norm. D It is important to recognize that different norms on a given finite- dimensional vector space are equivalent in the following sense. Theorem 13.8.2 Let || • ||' and || • ||" be two norms in V. Then there exists a constant K^l such that K-'WxW^WxW-^KWxW (13.8.1) for every x £ V. We stress the fact that K depends on || • ||', || • ||" only (and of course on the underlying linear space V). Proof. Let /,,. . . , /„ be a basis in V. It is sufficient to prove the theorem for the case when , / " , 1/2 2 a,/, =(Sk|2 i = i i'=i Consider the real-valued continuous function g defined on <f"" by g(a,, . . . ,aj = Z a,./. , at G <p for j=l,...,n As the set {(a,, . . . , a„)£ <p"| £"=i la/l = 1) 's closed and bounded, the
Subspaces of Normed Spaces 417 function g attains its maximum and minimum on this bounded set. So there exist *,, x2 G V such that ||jt,||' = Ikll' ~ 1 ar,d IklMMMkll" for every v G V with ||i>||' = l. Now for x G K, x^Owe have ||jt/||jt||'||'= 1 and hence Thus inequality (13.8.1) holds with /C = max(||^2||", 1/||jc,||"). □ In the rest of this section we assume that an arbitrary norm || • || is given in the finite-dimensional linear space V. For any subspace iCK, let S(M) = {x<=M\ ||jc|| = 1} be the unit sphere of M. Now the gap 0(if, M) between the subspaces if and M in V is defined by formula (13.1.3): 0(<e,M) = max{ sup d(x,S£), sup d(x, M)) xSS(M) .(Ells') where d(x, Z) = inf,eZ ||* - t\\ for a set Z C V. The gap has two properties of a metric: (a) 0(if, M) = 6(M, if) for all subspaces i,lCl/;(b) 0(if, J<) > 0 if if ^ M; 0(if, if) = 0. However, the triangle inequality 0(£,M)<d(£,J{) + 0(J{,M) (13.8.2) for all subspaces if, M, JV in V fails in general, although it is true when the norm is defined by means of a scalar product (jc, y), as Theorem 13.1.1 shows. The following example illustrates this fact. example 13.8.3. Let ft2 be the normed space with the norm ll<k>*2>lli = kl + kl > (x^x^eft2 Consider a family of one-dimensional subspaces if(a) = Span{e, + ote2} , a G ft We compute 0(if(a), if(0)). Take x G S(if(0)), so that x = (y, yp), where M = O + l0lr'- Now
418 The Metric Space of Subspaces inf ||jt-y||,= inf If 7J - f M II = inf {|y- /Lt| + |-y/3-/x«|> As the function f(n) — \y - fi\ + \yp - fia\ is piecewise linear, we have inf {\y - fi\ + \y^ - fia\) = mini |-y/3 - -yorl, y- — _ |/3-a l + l/3| So min(l,|a| ') = |/3 - a| max{(l + |/3|)-' min(l, |a|-'), (1 + |a|)-' min(l, l/T')} Let a < (i < y be positive numbers such that /3 < 1 < y and /3-y < 1. We compute 0(2(a), 2(0)) +0(2(0), £{y)) = £^ + 7~^ 1 + a y + /ty and However, clearly so the inequality 0(2(a),2(y)) = ^~^- y + ay &~a + y~& < y~a 1 + a J + fly y + ay holds for sufficiently small positive a, and the triangle inequality for the gap fails in this particular case. □ In contrast, the spherical gap 6(<e,M) = max{ sup d(x,S(2)), sup d(x,S(M))} .r£S(.«) x£S(2) is a metric. (The verification of this fact is exactly the same as that given in Section 13.2.) Instead of inequality (13.2.5), we have in the case of a general normed space the weaker inequality
Subspaces of Normed Spaces 419 0(Z£,M)<d{£,M)<2d{%,M) (13.8.3) for any subspaces it, M C V. Indeed, the left-hand inequality of (13.8.3) is evident from the definitions of 0(2£, M) and B(Z£, M). To prove the right-hand inequality in (13.8.3), it is sufficient to verify that for every vector v E.V with \\v\\ = 1 and every subspace iCKwe have d(u,S(Jf))<2d(u,Jf) (13.8.4) For a given e > 0 there exists a v G Jf such that ||n-i;||<d(K,JV) + e (13.8.5) and we can assume that v ^ 0. [Otherwise, replace v by a nonzero vector sufficiently close to zero so that (13.8.5) still holds.] Then i>0 = u/|i>||E S(Jf) and hence d(u,S(Jf))*\\u-v0\\*\\u-v\\ + \\v-v0\\ But lk-"ollHH-i| = IIMI-HI^II<>-"ll and we have d(u, S(Jf)) <2||v - u\\ < 2d(u, Jf) + 2e As e>0 is arbitrary, the desired inequality (13.8.4) follows. The minimal angle between two subspaces is defined in a normed space by the formula (13.2.1). With this definition, Proposition 13.2.2 and Theorem 13.2.3 are valid in this case. Without going into details, we remark that Lemmas 13.3.1 and 13.3.2 also can be extended to the normed space context. Concerning the metric space properties of the set of all subspaces in the spherical gap metric (such as compactness, completeness), it follows from inequality (13.8.3) and the following result that these do not depend on the particular choice of the norm. Theorem 13.8.3 Let || • ||' and || • ||" be two norms in V, with the corresponding gaps 6'{M, Jf) and d"(M, Jf) between subspaces M and Jf in V. Then there exists a constant L > 1 such that L~l6'(M, Jf) < d'\M, Jf) < L6'(Jl, Jf) (13.8.6) for all subspaces M and Jf.
420 The Metric Space of Subspaces Again, the constant L depends on the norm || • ||' and || • ||" only. Proof. By Theorem 13.8.2 we have for any xEV /r,lUI|'<lUII''</clUI|'', where the constant K > 1 is independent of x. Hence sup inf ||.r-f||'= sup inf||*-?|| xEM '£& x£M IE^ *K sup mi\\x-t\\"=K2 sup inf ||jt-,||" x<EM 'e-^ x<=M ,G& = K' sup inf ||*-f II" xEM '^y IMI"=1 In view of the definition of 6(3!, M) we obtain the left-hand inequality in ,2 (13.8.6) with L = K\ The right-hand inequality in (13.8.6) follows similarly. □ 13.9 EXERCISES 13.1 Compute the gap 0{M, Jf), where M = Span[*], ^ = SpanP]c<p2 and x and y are complex numbers such that |*| = |_y|. 13.2 Compute the gap 0(M, Jf), spherical gap 0(M, Jf), minimal opening 7](M, Jf) and minimal angle (pmm(M, Jf), where J< = Spanl I, JV = Span left y- and x and y are real numbers such that |*| = \y\. 13.3 Compute &(M,Jf), i)(M,Jf), and (pmin(M,Jf) for any two one- dimensional subspaces M and Jf in fj[". 13.4 Let U: <p" —* <p" be a unitary transformation. Prove that 0(Jl,Jf)=0(UM,UJf); 0(M,Jf) = 0(UM,UJf) v(M,Jf) = r)(UJl,UJf) for any pair of subspaces M, Jf C <p".
Exercises 421 5 Prove that for subspaces if, M in £" 6 Show that the equality 0(i?, M) = 1 holds if and only if either &1 n M * {0} or <£ n M L ¥> {0} (or both). 7 Let Ml,Jfl be subspaces in <p" and M2,M2 be subspaces in <pm. Prove that 0(^, ©i<2, JV, © JV2) = max{0(^,, JV,), 0(^<2, Jf2)} where M^M^ ^, © ^2 C <p" © <f:m 8 Find the gaps 0(Ker v4, Ker B) and 0(lm v4, Im B) for the following pairs of transformations A, B: <fB—» (pB: (a) A and B are diagonal in the same orthonormal basis. (b) A and B are commuting normal transformations. (c) A and B are circulant matrices in the same orthonormal basis. {Hint: A and B can be simultaneously diagonalized by a unitary matrix.) (d) A 0 0 "l «2 0 • •• 0 a 1 0 : 0 0. B = "0 0 L0. & 0 0 0 Bn- 0 0 - in the same orthonormal basis, where a. and Bj are complex numbers. 9 For each of cases (a)-(d) in Exercise 13.8, find 0(KerA,KerB), 6{\mA,\mB), Tj(Ker A, Ker B), r,(Im/l,ImB), <pmin(Ker,4,KerB), and ^(Im/l.ImB) 10 Let A: $"-*$" be a transformation. Then 6(M,N) = \ for any distinct ,4-invariant subspaces M and jV if and only if A is normal with n distinct eigenvalues. 11 Show that if A(t), t G [0,1] is a continuous family of n x n circulant matrices and dim Ker A(t) is constant (i.e., independent of t), then the subspaces Ker A(t) and Im A(t) are constant. 12 Prove or disprove the following: (a) If A(t) is a continuous family of upper triangular Toeplitz n x n matrices for t G [0,1], then dim Ker A(t) is constant if and only if Ker A(t) and Im A(t) are constant.
422 The Metric Space of Subspaces (b) Same as (a) for n-l A(t) = 2 <*i{t)A' ,=0 where a-(f) are continuous scalar functions of t G [0, 1] and A is a fixed n x n matrix. 13.13 Show that a circulant matrix has a generalized inverse that is also a circulant. 13.14 Let A{t) be a continuous family of circulant matrices with dim Ker A(t) constant for f£[0,1]. Show that there exists a continuous family B(t) of generalized inverses of A(t) on [0,1] that also consists of circulant matrices. 13.15 Solve Exercises 13.13 and 13.14 with "circulant" replaced by "upper triangular Toeplitz." 13.16 Assume the hypotheses of Lemma 13.3.1 and, in addition, assume that the projector no is orthogonal. Prove that \\R\[ = cotan <pmin, where <pmin is the minimal angle between Ker no and Im n. 13.17 Find the minimal angle between any two one-dimensional subspaces in the normed space $ with the following norms: (a) ll<*,y>ll, = W + M- (b) ||<*,>->|L = max(|*|,|>i).
Chapter Fourteen The Metric Spaces of Invariant Subspaces We study the structure of the set lnv(A) of all invariant subspaces of a transformation A: <p"—» <p" in the context of the metric space (p^") of all subspaces in <p". Throughout this chapter <p" is considered with the standard scalar product and the gap metric determined by this scalar product on ^((p"1), as studied in the preceding chapter. With the exception of Section 14.3, the results of this chapter are not used subsequently in this book. 14.1 CONNECTED COMPONENTS: THE CASE OF ONE EIGENVALUE Let si C % be two sets of subspaces of (f"\ We say that si is connected in 38 if for any subspaces i?, MC. si there is a continuous function /: [0,1]—* 38 such that /(0) = if, /(l) = M. [The continuity of / is understood in the gap metric. Thus, for every ta 6 [0, 1] and every e > 0 there is a 8 > 0 such that \t - tQ\ < 8 and t £ [0,1] imply 8(f(t), f(tit)) < e.] The set si is called connected if si is connected in si. We start the study of connectedness of the set lnv(A) with the case when A = J, a Jordan matrix with <t(7) = {0}. Let r be the geometric multiplicity of the eigenvalue 0 of J, and let kl>--->kr be the sizes of the Jordan blocks in J. Also, denote the set of all /^-dimensional ./-invariant subspaces by Invp. Let / = (/,,..., /r) be an ordered r-tuple of integers such that 0< /, < kt, E;=,/, = /?, and let 4>p be the set of all such r-tuples. We associate every / = (/,,...,/,) E<t>p with the subspace ^(/)6lnvp, spanned by vectors mJ°; j = 0,. . . , /; - 1; i = 1, . . . , r, where uj° are unit coordinate vectors in <p" and the sole nonzero coordinate of m'0 is equal to one and is in the place k{ + • • • + kt:_, + ;' + 1 (we assume k0 = 0) for j = 0,. . . , kt - 1 and / = 1,. . . , r. There is a one-to-one correspondence between elements of <P and 423
424 The Metric Spaces of Invariant Subspaces subspaces from Inv,, spanned by unit coordinate vectors. So we can assume that *p CInv/;. Lemma 14.1.1 <t>p is connected in Invp. Proof. Let / = (/,,..., lr) and / = (/,,..., ir) be r-tuples from <t>p, and suppose, for example, that /, > /, and /,</2- Let ^(e)Glnvp be the subspace spanned by vectors wj'2, + euj,2>, u(\l),. . . , u\l)_2, w*0 for ;'= 0, ...,/,- 1 and i = 2,. . . , r, where e is a complex number. Then ^(0) = $(/) (the subspace corresponding to the r-tuple /) and ^(») = »(/,-l,/2 + l,/.„...,/,) So / = (/,,..., lr) and (/, - 1, l2 + 1, /,, . . . , lr) are connected in Invp. Applying this procedure several times, we obtain a connection between / and /. □ Lemma 14.1.2 Let 2Fl G Invp. Then S>x is connected in Invp with some !f2E.<t>p. Proof. For i' = 0, 1, 2,. . . , let 38, = {x e (p" | fx = 0} Then 0 = 9t0 C 38, C • • • C 38, = <p" for some integer 5 (5 is the minimal integer such that Js =0). We construct the basic set of vectors in 5FX in the following way (see the proof of the Jordan form in Section 2.3). Let i'„ be the greatest index that satisfies (58, "^ 58,_,) n &x ¥=0. Take a basis v, ,,..., v,■ „ in 9. fl 58, modulo 58, ,. Then the vectors J'v, ,,..., J'v, „ are linearly independent in ^nSf _, modulo 38, _,_,; / = 1,. . . , /0 - 1. We complete the set Jvt x,. . . ,Jvt by additional vectors V: _,,,... , V: _, „ to form a basis in iflt , modulo 58, _,. Then the vectors J V: ,,,..., J V,_.„ , J V,: |, . . . , J V. „ 'o-1'1 '0 l'9i0-i 'o' 'oil,, are linearly independent in 9X fl 58; _. modulo 38, _,_, for / = 2,. . . , i0 - 1. Complete the set Jv, _,,,... ,Jv. ^. „ ; J V: ,,.. . ,J2v, „ by additional vectors v, _7 ,,. . . , v., _, u to a basic set of vectors in
Connected Components: The Case of Oue Eigenvalue 425 $\ OS. , modulo 0t, _-,, and so on. So we obtain the basic set of vectors in {J'vn,. . . ,J'viq.; j = l,...,/0; / = 0,...,/-1} To connect &x with some subspace 3>2 G <t> , we use the following procedure. Take a set of a, coordinate unit vectors y,,,...,y,„ in 0t, that are independent modulo 01, _,. For j = 1,2,. . . , qt , put B,i>(A)=Au,, + (l-A)nj where A is a complex parameter. Then the i>, ,(A) are linearly independent modulo 0ti _, for every A G <p except possibly for a finite set S{. Indeed, let jt,,. . . , xk be a basis in 5?, _,, and put fl(A) = K1(A),...,u1(1,i.u(A),*l,...,*t] Then i>,(l/(A), y = 1,. . . , qt are linearly independent modulo 9?,- , if and only if the columns of B{ A) are linearly independent. Let fc( A) be a minor of B(A) of order i0 + A: such that 5(0)^0 (such a minor exists because yi(/, ;'= 1,. . . , g, are linearly independent modulo 01, _,). So 6(A) is a polynomial that is not identically zero. Clearly, for every A that does not belong to the finite set 5, of zeros of b(\), the vectors u, y(A), ;'= 1,. . . , qi are linearly independent modulo 0ii _,. Observe that Sx does not contain 0 and 1. Further, take a set of q, , coordinate unit vectors y,; _, ,,. . . , y,■ _, „ in 82, . such that the vectors are independent modulo 011 _2. Putting ^-../(A^AiVu + O-AbVu for ;' = 1,. . . , ql_,, we see similarly that the vectors «Vi.i(A). y=1.---.?*0-i; -^(A), 7 = 1,...,?,-, are independent modulo 0t t _2 for A G <p "- 52, where S2 D S, is a finite set of complex numbers (not including 0 and 1). We continue this procedure and obtain vectors MA); /=1,...,<7,; i = i0,i0-l,...,l such that
426 The Metric Spaces of Invariant Subspaces {•/'*>/+,,/(A); ; = 1,. ..,<7,; r = 0, l,...,/0-i, i = i0,i0-1,...,1} (14.1.1) are linearly independent for A G <p" ~- 5, where S is finite set of complex numbers not including 0 and 1 and i>/y(l) are coordinate unit vectors. From this procedure it follows also that u.y(A)G 3ij for AG £"- S. Therefore, the subspace ^(A) in <p" spanned by vectors (14.1.1) for AG<p~~-S is a /-invariant subspace with dimension not depending on A. Since S is finite we can connect between 0 and 1 by a continuous curve T such that T D 5 = 0. Then 9{k), AGT carries out the connection between 3>x = 3F(0) and &2 = (1), where &2 G <frp. □ We say that a set si C <$(§") has connected components s&x,. . . , sim if each s&-t, « = l,...,m is a nonempty connected set, but there is no continuous function /: [0, l]-> $($") such that /(0)£i„ /(1)G jtfy, and i^j. (In other words, each sii is a maximal connected set in M.) Lemmas 14.1.1 and 14.1.2 allow us to settle the question of connected components of the set Inv(/1) when the transformation A has only one eigenvalue. Theorem 14.1.3 Assume that the transformation A: <pH—» <P" has only one eigenvalue A((. Then lnv(A) has exactly n + 1 connected components, and each connected component consists of all A-invariant subspaces of fixed dimension. Proof. Without loss of generality we can assume A0 = 0. Let J be the Jordan form of A, and A = S~*JS for some invertible transformation S. Obviously, Inv(v4) = 5"'(Inv(7)) and ln\p(A) = S l(lnvp(J)), where lnvp(A) is the set of all /t-invariant subspaces of dimension p. Lemmas 14.1.1 and 14.1.2 show that Invp(7) and, therefore, lnvp(A) are connected. On the other hand, if if! G Invp(A) and M G Invq{A) with p ¥= q, then there is no continuous function /: [0, l]-» (jj(£") with /(0) = i? and /(l) = M. Indeed, if there were such a function /, then dim f(t) would not be constant in a neighbourhood of some point r„ G [0,1]. This contradicts the continuity of/in view of Theorem 13.1.2. □ 14.2 CONNECTED COMPONENTS: THE GENERAL CASE The description of connected components in ln\(A) for a general transformation A: <p"—»<f"' is given in the following theorem. Theorem 14.2.1 Let A,,. . . , A(. be all the different eigenvalues of A, and let tl>l,. . . , iffc be their respective algebraic multiplicities. Then for every integer p, Os^<n,
Connected Components: The General Case 427 and for every ordered c-tuple of integers (\i> • • • » Xc) sucn tnat Q — X, — 1!',, j= 1,. . . , candT,ci=1 Xi = P {!£ £ Inv A | dim if = p and the algebraic multiplicity of A\<f corresponding to A, is #, for / = 1,. . . , c) (14.2.1) is a connected component of \m(A), and each connected component of Inv(/1) has the form (14.2.1) for a suitable p and suitable c-tuple \ X\ ' • • • ' Xc )■ Proof. In the proof we use the following well-known properties of the trace of a transformation A: <f""—* <p", denoted by tr(A) {e.g., see Section 3.5 in Hoffman and Kunze (1967)]. We may define tr(A) to be the sum of eigenvalues of A. If A is written as an n x n matrix in any basis in <f"", then tt(A) is also the sum of diagonal elements of A. We have tr(AB) = tr(BA) for any transformations A, B: <£""—» §"; in particular, tr(S~US) = tr(A) for any invertible S. The trace (considered as a map from the set of all transformations <f""—* <p" onto <p) is a continuous function. Returning to the proof of Theorem 14.2.1, let T, be a small circle around A, with no other eigenvalue of A inside or on r,.. Let Jf be an ,4-invariant subspace, and let xA-N) be the geometric multiplicity of A, for the transformation A\ v. Using the Jordan form of A\x, for instance, it is easily seen that xx*) = «(^ijiAM-A\Arid\) Let a,,..., ap be an orthonormal basis in Jf. Then in some neighbourhood V(Jf) of Jf, Px.al, . . . , Pxap will be a basis in the subspace Jf' e V(Jf), where Pv. is the orthogonal projector on Jf'. We have 6(Jf, Jf') = \\PX - Px.\\ - \\P,.ai - a,\\ (14.2.2) Write A\A- as a matrix in the basis a,,. . . ,ap, and for every /1-invariant subspace Jf' that belongs to V(Jf), write A\x, as a matrix in the basis Pxat,. . . , Pxap. Using formula (14.2.2) and the continuity of the trace, we see that there exists a 8 >0 such that, if 0(Jf, Jf')<8 and Jf' is A invariant, then \x,W - *,(•*")! < i Since Xi(N') assumes only integer values, it follows that #,(-^') is constant in some neighbourhood of Jf in ln\(A) and, therefore, constant in the connected component of Inv(/1) that contains Jf. We show now that if Jf and Jf' are p-dimensional ,4-invariant subspaces
428 The Metric Spaces of Invariant Subspaces such that Xi(N) = xX-^') for / = 1,. . . , c, then Jf and Jf' are connected in 1ti\(A). Indeed, applying Theorem 14.1.3 to each restriction A\m (A) for j = 1, . . . , c, we find that Jf D 8?A.(/1) is connected with Jf' D S?A (A) in the set of all ,4-invariant subspaces of dimension xX-^) in 5?A(/1). Since jf = (jf n &Xi(A)) + (Jfn m^(A)) + • • • + (jf n »AcC4)) and similarly for Jf', it follows that .A" and Jf' are connected in lnv(A). It remains to show that, given integers Xi»• - • , Af< sucn that 0 < ^(. < ^, and E^=1 *,=/?, there exists a subspace .A" G ln\(A) with #,C^") = #,> f°r / = 1, . . . , c. But assuming that A is in Jordan form, we can always choose an Jf spanned by appropriate coordinate unit vectors. □ Corollary 14.2.2 The set lnv(A) has exactly n)_,(^, + l) connected components, where tpl,. . . , if/c are the algebraic multiplicities of the different eigenvalues A,,. . . , A( of A, respectively. The proof of Theorems 14.1.3 and 14.2.1 shows in more detail how the subspaces in Inv A belonging to the same connected component are connected. We say that a vector function x(t) defined for t G [0,1] and with values in <p" is piecewise linear continuous if there exist m points 0 < t, < •••<fm<l and vectors y,,. . . , ym + l and z,,...,zm + 1 such that, for / = 1,. . . , m + 1 *(/) = .y, + fz,-, ti.^t^t, (by definition, ta = 0, tm + l = 1), and for i = 1,. . . , m, we obtain Corollary 14.2.3 Let M and Jf be p-dimensional A-invariant subspaces that belong to the same connected component in Inv A. Then there exist piecewise linear continuous vector functions i>,(0> • • ■ . vp(t) such that, for all t £ [0,1], the subspace Span{y,(f),. . . , vp(t)} is p-dimensional, A invariant, and M = SpanfMO),. . . , vp(0)} , Jf = Span^O),. . . , vp(l)} 14.3 ISOLATED INVARIANT SUBSPACES Let A: §"—> (p" be a transformation. An /1-invariant subspace M is called isolated if there is an e>0 such that the only ^-invariant subspace Jf satisfying 0(M, Jf) < e is M itself.
Isolated Invariant Subspaces 429 Theorem 14.3.1 An A-invariant subspace M is isolated if and only if, for every eigenvalue A0 of A with dim Ker(A - A0/) > 2, either M D St^A) or M n 9tXf)(A) = {0}. To prove Theorem 14.3.1, we use a lemma that allows us to reduce the problem to the case when A has only one eigenvalue. Lemma 14.3.2 An A-invariant subspace M is isolated if and only if for every eigenvalue A0 of A the subspace M niA (A) is isolated as an A\M (Ayinvariant subspace. Proof We have M = Mn $lX{{A) + M Pi 9?A2(v4) + --- + MH ®K(A) where A,, . . . , Af are all the different eigenvalues of A. Assume that M is isolated. If for some A, the subspace M niA (.A) is not isolated [as an /lja (/l)-invariant subspace], then there exists a sequence of ,4-invariant subspaces Mm C &lx(A), m - 1, 2,. . . , such that Mm¥=M fl 38Ai(y4) and Q{Mm,M n 3?Ai(v4))-»'(). For m = 1, 2,. . . , let jfm = m n »AiC4) + • • • + m n mx. t(A) + Mm + Mr\ »A.tl(>0 + • ■ • + ^ n 9?Ar(v4) Obviously, >Vm is A invariant. Let ify be a direct complement to J< D S?A (/I) in 9?A.(/t) for y = 1, . . . , r, and put if = if, + if, + • • • + ifr. Then if is a direct complement to M in <p". Theorem 13.1.3 shows that for m sufficiently large, if,, is a direct complement to Mm in 5?A (,4), and therefore if is a direct complement to Mm in <f"\ Letting Q (resp. Qm) be the projector on M (resp. ^rm) along if, we have (cf. (13.1.4)) ^^j^lle-ej^iie-eji-il^ll (14.3.1) where P, is the projector on S?A (/I) along &ki(A) + ■■■ + 9tAi_t(A) + aAi+IM) + • • • + aAr(>i) and Q: 9?Ai(v4)-»9?A((v4) [resp. Qm: 9?A.(v4)-» 3?A(/1)] is the projector on J< (~1 S?A(/t) (resp. on Mm) along if,. Theorem 13.1.3 shows that for large m WQ-Qj^ceiJt^Jtnm^A)) where the constant O 0 is independent of m. Comparing with (14.3.1), we
430 The Metric Spaces of Invariant Subspaces obtain d(M,Jfm)-^0 as m-^oo, a contradiction with the fact that M is isolated. Assume now that, for / = 1, . . . , r, M C\ $lx(A) is isolated as an A\x iA)- invariant subspace. So there exists an e, >0 such that the only ,4-invariant subspace JV) C 5?A (A) satisfying e(M n9?A.(/t), jv;)<e, is M D &i„(A) itself. We show now that, for every e > 0, there exists a 8 > 0 such that, for any /t-invariant subspace Jf with d(M, Jf)<8, the inequalities 6(M nSA(/l), Jf fl £%A (/!)) < e hold for i = 1,. . . , r. Indeed, arguing by contradiction, assume that for some e >0 and some / there exists a sequence {Jfm)Z=i of ,4-invariant subspaces such that d(M,Jfm)-^>0 as m—»°° but 6(M D ®X(A), Jfm n m^A)) > e (14.3.2) Let y 6 M n S?A (/I). Then, in particular, >- G J< and by Theorem 13.4.2 there exists a sequence {Jtm}^ = 1 such that xm E.Jfm for m = 1,2,. . . and y = i,im^ (14.3.3) Write *m = *ml + • • • + xmr, where xmj G JVm n 98a.(j4), / = 1, . . . , r. Apply the projector on $lk (A) along the sum of all other root subspaces of A to both sides of (14.3.3). We see that y = lim^^^ xmi. Conversely, if y = \imm^xxmi for some xmiE.JfmC\3^/t(A), then obviously yE.&lx(A), and, by Theorem 13.4.2, we also have y G M. Now by the same Theorem 13.4.2 any limit point of the sequence Jfm n 8?A {A), m = 1, 2,. . . coincides with M C\0ix(A). Since Theorem 13.4.1 ensures that the limit points of {Jfm fl 5?A (/4)}* = 1 exist, we obtain a contradiction with (14.3.2). Now take e = min(e,,. . . , er). Then for S >0 with the property described in the preceding paragraph, we find that for every ,4-invariant subspace Jf with d(M, Jf)<8 the equalities Jf n S?A (A) = M D S?A (/I) hold for; = 1,. . . , r. But these equalities imply Jf ~ M, that is, M is isolated. □ Proof of Theorem 14.3.1 In view of Lemma 14.3.2, we can assume that a(A) = {\0}. If dim Ker(A - \0I) = 1, then /I is unicellular and has a unique complete chain of invariant subspaces. Obviously, every ,4-invariant subspace is isolated. Now assume that dim Ket(A - A0/) > 2. In view of Theorem 14.1.3, the set lmp(A) of all ,4-invariant subspaces of fixed dimension p is connected. So to prove that the only isolated /t-invariant subspaces are {0} and <p", we must show that ln\p(A) has at least two members for 0 ^ p ¥= n. However, for every p with 0 < p < n, and in a fixed Jordan basis for A, the transformation A has at least two invariant subspaces of the same dimension p spanned by some vectors from this basis. □
Isolated Invariant Subspaces 431 An A -invariant subspace M is called inaccessible if the only continuous mapping of the interval [0, 1] into the lattice Inv(/1) of A -invariant subspaces with ^>(0) = M is the constant map <p(t) = M. Clearly, every isolated invariant subspace is inaccessible. The converse is also true, as follows. Proposition 14.3.3 Every inaccessible A-invariant subspace is isolated. Indeed, if A has only one eigenvalue Au and dim Ket(A - A0/) = 1, then any /t-invariant subspace is obviously inaccessible and isolated. It can be proved by using the arcwise connectedness of Inv (A) for 0<p < n that, if (r(/t) = {A0} and dim Ker(^4 - A0Z) > 1, then any nontrivial /t-invariant subspace is not inaccessible (Corollary 14.2.3). The reduction of the general case to this special case is achieved with the following lemma. Lemma 14.3.4 An A-invariant subspace M is inaccessible if and only if, for every eigenvalue A0 of A, the subspace MC\0lk (A) is inaccessible as an A\X {A)-invariant subspace. The proof of Lemma 14.3.4 is left to the reader. (It can be obtained along the same lines as the proof of Lemma 14.3.2.) Theorem 14.3.5 Every inaccessible (equivalently, isolated) A-invariant subspace is A hyper- invariant. Proof. Let A,, . . . , As be the distinct eigenvalues of A (if any) with dim Ker(A - \J) = 1 for / = 1,. . . , s, and let AJ+1, AJ+2,. . . , Ar be other distinct eigenvalues of A (if any). For a given isolated /t-invariant subspace M we have, by Theorem 14.3.1 J n «A(yl) = {0} , for i = s + l,s + 2,...,t MZ>9tx, for i = t + l,...,r and some t with s + 1 < t ^ r. Letting a, be the dimension of M n S?A, we have M = KeTP(A), where />(A) = n;=1 (A - A,)"' -n;,/+1 (A - A,)"-.' As every transformation that commutes with A also commutes with p(A), the subspace M is A hyperinvariant. □ The converse of Theorem 14.3.5 does not hold in general, as the next example shows.
432 The Metric Spaces of Invariant Subspaces EXAMPLE 14.3.1. Let "o 0 Lo 0 0 0 0] 1 OJ T = The subspace M = Span{e,, e2} is the kernel of T and is thus T hyperinvariant. For any complex number a, the subspace M(a) = Span{e, + ae3, e2} is easily seen to be T invariant. We have [1 0 lo 0 1 0 0 0 oJ •«(") 1 o o Vl + |«| 0 .Vi + I«l2 Vi + I«l2 so B{M(fi), M(a)) = ||P. «(0) ",«(a)l and as the norm of a hermitian matrix is equal to the maximal absolute value of its eigenvalues, a computation shows that 6{M((i), M{a)) = £max{|p + q + \'(p - qf + 4r|, where \p + q~ \l{p-q?+*r\) 1 I0|2 0 <? = VTTW VTTR r = a V^W VTTR So the subspace valued function F defined on <p by F{a) = M(a) is continuous and nonconstant and takes T-invariant values. As F(0) = M, the T- invariant subspace M is not inaccessible. □ 14.4 REDUCING INVARIANT SUBSPACES Recall that an invariant subspace M of a transformation A: <p" —* <p" is called reducing if there exists an ,4-invariant subspace J{ that is a direct complement to M in <p".
Reducing Invariant Subspaces 433 The question of existence and openness of the set of reducing A-invariant subspaces of fixed dimension p is settled by the following theorem. Theorem 14.4.1 Let A: §" —* <p" be a transformation with partial multiplicities m,,..., mk (so m, + • • • + mk = n). Then there exists a reducing A-invariant subspace of dimension p¥=0 if and only if p is admissible, that is, is the sum of some partial multiplicities mt,,. . . , m{, . In this case the set of all reducing A- invariant subspaces of dimension p is open in the set of all A-invariant subspaces. Proof. If p is admissible, then obviously a reducing ,4-invariant sub- space of dimension p exists. Conversely, assume that M is a reducing ,4-invariant subspace of dimension p with an ,4-invariant complement Jf. Write A L 0 A2. with respect to the direct sum decomposition M + Jf = <p". Taking Jordan forms of Ai and A2, we see that p is admissible. For an admissible p, let Rinvp(/1) be the set of all /^-dimensional reducing /1-invariant subspaces. For a subspace M G Rinvp(/1), let Jf be a direct complement to M that is A invariant. Theorem 13.1.3 shows that there exists an e < 0 such that Jf is a direct complement for any ,4-invariant subspace M, with &(M,Mx)<e. Hence R\mp(A) is open in the set \mp(A) of all p-dimensional A -invariant subspaces. □ Now consider the question of whether (for admissible p) the set Rinvp(/1) of all p-dimensional reducing subspaces for A is dense in the set Invp(A) of all p-dimensional .A-invariant subspaces. We see later that the answer is, in general, no. So a problem arises as to how one can describe the situations when RinVp(/l) is dense in \n\p(A) in terms of the Jordan structure of A. We need some preparation to state the results. Let A: <p"—»• <p" be a transformation with single eigenvalue A0 and partial multiplicities m, a • • •> mr. It follows from Section 4.1 that the partial multiplicities p, > • • - > p, of the restriction A |M to an ,4-invariant subspace M satisfy the inequalities /<r, Pj^mj, y = l,...,/ (14.4.1) Given an integer p with 1 < p =£ n, let p, 3: • • • 3: p, be a sequence of positive integers such that (14.4.1) holds and p{ + -- + p, = P', a sequence with these properties is called p admissible. For a p admissible sequence p, > •••Srp, denote by \n\p(A; p , p,) the (nonempty) set of all A- invariant subspaces M such that the restriction A\M has the partial multi-
434 The Metric Spaces of Invariant Subspaces plicities p,,. . . , p,, Clearly, dim M - p for every M Glnvp(/i; p{, . . . , p,). Moreover Invp(i4)= U lmp(A; plt. . . , Pl) where the union is taken over the finite set of all p-admissible sequences p, > • • • > p,. For each p-admissible sequence p, > • ■ ■ > p, let pi E(A;pl,...,p,) = ,ZqXcl-qi) (14.4.2) 1 = 1 where c- = {j\ l<;<r, m^i}*, g,. = {;" | 1 <y</, p^i}*, and K# indicates the number of elements in the finite set K. In connection with the definition of E(v4; />,,..., p,), observe that cy s <jn for /' = 1, 2,. . . (so each summand on the right-hand side of (14.4.2) is a nonnegative integer), and/?, is the maximal index with q„ >0. We now give a necessary and sufficient condition for the denseness of Rinvp(/1) in ln\p(A), for a transformation A: £"—»<p" with single eigenvalue and partial multiplicities m, s • • • s mr. Theorem 14.4.2 For a fixed admissible integer p, the set Rinvp(A) is dense in ln\p(A) if and only if the following condition holds: any p-admissible sequence Pi — '-^Pi for which the number a(A; p,,. . . , p,) attains its maximal value among all p-admissible sequences has the form p, = m, ,. . . , p, = mt for some indices 1 < /, < i2 <■••</, < r. In particular, Rimp(A) is dense in ln\p(A) provided there is only one p-admissible sequence p{> • • •> pt for which B(A; p,, . . . , p,) is maximal. In the proof of Theorem 14.4.2 we apply a result proved in Shayman (1982) concerning a representation of ln\p(A) as a union of complex (analytic) manifolds. In this proof (and only in this proof) we assume some familiarity with the definition and simple properties of complex manifolds that can be found, for instance, in Wells (1980). Theorem 14.4.3 For every p-admissible sequence p, s • • • a p{ the set Invp(A; p,,. . . , p,) is, in the topology induced by the gap metric, a connected complex manifold whose (complex) dimension is equal to H(v4; p,, . . . , p,). For the proof of Theorem 14.4.3 we refer the reader to Shayman (1982). Proof of Theorem 14.4.2. Assume that the condition fails, that is, there exists a p-admissible sequence p, > • • • >p, with maximal a(A; p,,. . . , p,)
Reducing Invariant Subspaces 435 that is not of the form p, = m,■,..., p, = ml■., 1 < i, < i2 <•••</, < r. By Theorem 14.4.3 the complex manifold Invp(/1; p,,. . . , p,) has maximal dimension among all the complex manifolds whose union is Inv (A). On the other hand, it is easily seen that \mp(A; plt. . . , p,) does not contain any reducing subspace for A (cf. the proof of Theorem 14.4.1). So Rinvp(/1) is not dense in lmp(A). Assume now that the condition holds. Then every complex manifold lnvp(A; p,,. . . , p,) with maximal H(v4; pl7 . . . , p,) will contain a reducing subspace M{pl,. . . , p,) for A. Fix such a p-admissible sequence p, > • • • > p,, and let JVbe an ,4-invariant direct complement to M(pl, . . . , pt) in <p". It follows from Theorem 7 in Shayman (1982) that the complex manifold Invp(/1; Pi, ■ ■ ■ , Pi) can be covered by a finite number of analytic charts and that each chart is of the form <p: §q—*lnv(A; p{, . . . , pt) (q = E(A; p,,. . . ,p,)) with <p(z) = Span{*,(z), . . . , xp(z)}, where jc,(z), . . . , xp(z) are analytic vector functions in (p*. Now it is easily seen that the set of all subspaces M GInvp(/t; p,, . . . , p,) that are not direct complements to Jf is an analytic set (i.e., the union of the sets of zeros of a finite number of analytic functions that are not identically zero) in each of the charts mentioned above. Denoting by K the union of all Invp(/1; p,, . . . , pi) for which a(A; p,,. . . , p,) is maximal, it follows that Rinvp(/1) D K is dense in K. As \mp(A) is connected (Theorem 14.1.3), it follows from Theorem 14.4.3 that the closure of K coincides with lnvp(A); hence Rinvp(/1) is dense in Invp(^4). Finally, suppose that there exists only one p-admissible sequence p\ 3: • • • SrpJ,, for which H(/l; pj,. . . , p'r) is maximal. As the set ln\p(A) is connected, and Theorem 14.4.3 implies that \n\p(A) is the closure of Invp(/4; p[,. . . , p'r). Since p is admissible, there exists a p-dimensional /4-invariant subspace MQ such that M0 + JV0 = <p" for some /t-invariant subspace jV„. So there exists a subspace M in Invp(^4; p|,. . . , p'r) (sufficiently close to M„) for which ^T0 is a direct complement. Now we can repeat the arguments in the preceding paragraph to show that Rin\p(A) is dense in lnvp(A). □ Let us give an example showing that, for an admissible p, Rinvp(/1) is not generally dense in ln\p(A). example 14.4.1. Let /1 = 75(0)©73(0)©J1(0) where /m(0) is the Jordan block of size m with eigenvalue 0. Clearly, p = 5 is admissible. However, Rinv5(/1) is not dense in Inv5(/1). According to Theorem 14.4.3, the connected set Inv5(/i) is the disjoint union of five analytic manifolds 5,, S2, 53, 54, Ss described as follows: let yl = {5}; y2 = (4,l}; "ft = (3, 2); y, = {3,l,l}; y5 = (2,2,l}. Thenfory = l,...,5,
436 The Metric Spaces of Invariant Subspaces 0 0 .0 1 0' 0 0 0 0. S; consists of all five dimensional A -invariant subspaces M such that the restriction A\M has partial multiplicities given by yr Further, the (complex) dimensions of 5,, S2, S3, S4, S5 are 4, 4, 3, 2, 0, respectively. It is easily seen that there is no reducing subspace for A in S2. Indeed, the sum of a subspace from S2 and any four-dimensional /1-invariant subspace fails to contain the vector e5 £ <p9. Since the dimension of S2 is maximal among the dimensions of Sj, ; = 1,...,5, it follows that Rinvs(A) is not dense in lnvs(A). □ In the next example Rinvp(/1) is dense in ln\p(A), for all admissible p. example 14.4.2. Let A = Obviously, all p-0, 1, 2, 3 are admissible. Among the one-dimensional /t-invariant subspaces Span{ae, + (1 - a)e3) (where a G <p), all are reducing with the exception of Span{e,} (i.e., when a = 1). Indeed Span{e,, e2} + Span{ae, + (1 - a)e3} = <p3 for a # 1. So Rinv,(/1) is dense in Inv,(/t). Further, in the set Span{e,, e3} U I U Span{e,, e2 + ae3}) Va£<t' ' of two-dimensional ^-invariant subspaces the reducing ones are Span{e,, e2 + ae3}, a £ <f", that is, again a dense set. □ We note the following corollary from Theorem 14.4.2. Corollary 14.4.4 If the transformation A:$n—> §" has only one eigenvalue A0 and dim Ker( A0/ - A) = 2, then Rinvp(/1) is dense in lmp(A) for every p such that Rinvp(A) is not empty. Proof. Indeed, let m, ^m2 be the partial multiplicities of A. A simple calculation shows that for every p-admissible sequence p^p2 we have 3(v4; /»,, p2) = m2 - p2, and for the p-admissible sequence consisting of one integer p, only, we have H(^4; p,) = m2. Hence there exists only one p-admissible sequence p, > • • • >/>, for which 3(v4; p,,. . . , p,) is maximal, and the second part of Theorem 14.4.2 applies. □
Covariant and Semiinvariaut Subspaces 437 14.5 COVARIANT AND SEMIINVARIANT SUBSPACES In this section we study topological properties of the sets of coinvariant and semiinvariant subspaces for a transformation A: <pH—»<p". As usual, the topology on these sets is the metric topology induced by the gap metric. For the coinvariant subspaces we have the following basic result. Theorem 14.5.1 The set Co\m(A) of all coinvariant subspaces for a transformation A: <f""—»<p" is open and dense in the set (|?(<p") of all subspaces in <p". Furthermore, the set Coin\p(A) of all A-coinvariant subspaces of a fixed dimension p is connected. Proof. Let M be A coinvariant, so there is an ,4-invariant subspace Jf that is a direct complement to M in <p". By Theoreml3.1.3 there exists an e >0 such that Jf is a direct complement to any subspace i?£ §(§") w'tn 0{M, Z£) < e. Hence Com\{A) is open. We now prove that Coinv(^4) is dense. Let M = Span{u,,. . . , vp) be a /^-dimensional subspace in <f"\ There exists an («- /?)-dimensional A- invariant subspace Jf. Let «,,... ,un_p be a basis for Jf. Denoting by m>,, . . . , iv a basis for some direct complement to Jf, put Mv) = Span{i>,+ tjiv,,.. . ,vp + T)\vp} where tj ¥0 is a complex number. As vlt. . . ,v are linearly independent, for t] close enough to zero the vectors u, + tjiv,, . . . , i>, + i)wp are linearly independent as well. Hence dim J£(t/) =/? for t; close enough to zero. Further, the determinant of the n x n matrix [u, • • • u„ iv, • • • ivp] is nonzero. If £ G <p the determinant of [u, • • • un_p, ^y, + iv,, . . . , ijvp + wp] is a polynomial in £ that is not identically zero, and it follows that det[u,,...,u„_p, fu. + iv,,. ..,{vp + wp}¥0 for all £ such that |f | is large enough. For such £, the subspace Span{^y, + iv,, . . . , £vp + wp} is a direct complement to Jf. As M(Tq) = Span{(l/T7)y, + w,,. . . , (1/T;)yp + wp) it follows that for 17#0 and close enough to zero M{r\) + Jf = <p". To show that M belongs to the closure of the set of all A -coinvariant subspaces, it remains to prove that X\m{M,M(?i)) = § (14.5.1) To prove this, assume for simplicity of notation that the upper p rows in [i>, • • • vp] are linearly independent. Then the same will be true for the upper p rows of [u, + tjiv,, . . . ,vp + Tjwp] (for t; close enough to zero). Write
438 The Metric Spaces of Invariant Subspaces where B(t/) is a nonsingular p *■ p matrix and C(t/) is an (n - p) x p matrix. Then the matrix KV) IX(V)L(V) X(V)L(V)X(V)* \ where X(v) = C(V)B(V)1 and L(v) = (/ + X(t,)*X(t,)) ' is the orthogonal projector on M.(tj). As the entries of P(tj) are continuous functions of t\, equality (14.5.1) follows. Finally, let us verify the connectedness of Coinvp{A). Let Mt, M2E. Coinvp(;4). So Ml 4- if, = M2 4- i?2 = <p" for some (« -/?)-dimensional A- invariant subspaces if, and i?2. Let i>,,. . . , vp and «,,..., u be bases in Mx and J<2, respectively, and consider the subspaces M(rf) = Span{u, + tju ,,..., i; +t/«p} where 17 £ (p. As in the preceding proof of the dense- ness of Coinv(A), one verifies that for all 17 with the possible exception of a finite set <t> (= the set of zeros of a certain polynomial), M(tj) is a direct complement to the least one of the subspaces if, and if2. Pick a continuous curve T(f) in <p U {°°} where t £ [0,1] and that does not intersect <t> and such that r(0) = 0, T(l) = 00. Then M(r(t)) for t £ [0,1] is the desired connection between Mx and M2 in the set Coinvp(/t). □ Now we consider the semiinvariant subspaces. As any /1-coinvariant subspace is also A semiinvariant, Theorem 14.5.1 implies that the set Sinv(/1) of all A -semiinvariant subspaces is dense in <$($"). However, Sinv(/1) is not necessarily open, as the following example shows. example 14.5.1. Let y4 = y4(0): <p4—»■ <p4. The two-dimensional subspace Span{e2, e3} is obviously A semiinvariant, and lim0(Span{e2, e3}, Span{e2, e3 + 17^4}) = 0 7J-.0 (see the proof of Theorem 14.5.1). But the subspace Span{e2, e3 +17^4} is not A semiinvariant for 17 # 0. Indeed, suppose that Span{e2,e3+T7<?4} + jV= M (14.5.2) where .A" and M are A invariant. As the only nonzero /t-invariant subspaces are Span{e, | 1 < i </} for / = 1, 2, 3, 4, and (14.5.2) implies e3 + 7je4 EM, it follows that M = <p4. Then dim Jf = 2. Hence Jf must be Span{e,, e2), which contradicts (14.5.2). □
The Real Case 439 Theorem 14.5.2 For any transformation A: £"—» <p" the set Sinvp(/1) of all A-semiinvariant subspaces of a fixed dimension p is connected. Proof. Given an ,4-invariant subspace N with dimension not less than p, denote by Sp{Jf) the set of all ,4-semiinvariant subspaces if of dimension p such that if + M = ./Vfor some /1-invariant subspace M (in other words, if is A\v coinvariant). It will suffice to show that for any .A" and any if, G Sp(Jf), if, G Sp(<p") there exists a continuous function /: [0,1]—» Sinvp(A) such that /(0) = if,, /(l) = if,. Let if2 + M2 = <p", where M2 is A invariant, and let /,,..., f and g,,. . . , gp be bases in if2 and if,, respectively. Denote by S the finite set of all T?G<p for which Span{/, + r/g,,. . . , fp + ygp) is not a direct complement to M2 in <p". Then put f(t) = Span{f1 + T(t)gl,...,fp + T(t)gp} for 0<f<l and /(l) = if,, where T: [0, l]-^(<pU {°o})^5 is any continuous function with r(0)=0, r(i) = °o. a 14.6 THE REAL CASE Consider now a transformation A: J(f"-^ $". We study here the connected components and isolated subspaces in the set lnv*(A) of all ,4-invariant subspaces in J|f". Theorem 14.6.1 If A has only one eigenvalue, and this eigenvalue is real, then the set \n\p(A) of all A-invariant subspaces of fixed dimension p is connected. The proof of Theorem 14.6.1 will be modeled after the proof of Theorem 14.3.1, taking into account the fact that in some basis in $" the transformation A has the real Jordan form (see Section 12.2). We apply the following fact. Lemma 14.6.2 The set GLr(n) of all real invertible nx n matrices has two connected components; one contains the matrices with positive determinant, the other contains those with negative determinant. Proof. Let T be a real matrix with det T > 0 and let J be a real Jordan form for T. We first show that J can be connected in GLr(n) to a diagonal matrix K with diagonal entries ±1. Indeed, / may have blocks Jp of two types: first
440 The Metric Spaces of Invariant Subspaces ',= in this case we "A, 0 -0 define 1 K 0 0 •• 1 •• • 0' • 0 1 • v ApG#, A„*0 /„(') = Ap(0 1-f 0 A„(0 L 0 0 1 \-t Ap(0- for any tE. [0,1], where A (0 is a continuous path of nonzero real numbers such that Ap(0) = Ap, and Ap(l) = 1 or -1 according as Ap >0 or Ap <0. Second, a Jordan block / may have the form 0 K„ LO 0 I 0 0 M where / = , ^p = I for real a and t with r # 0. Then 7p(f) is defined to have the same zero blocks as Jp, whereas the diagonal and superdiagonal blocks are replaced by \(l-t)a + t (1-0t ] \l~t 0 ] L ~{\-t)r (l-t)a+t\' L 0 1-fJ respectively, for t G [0,1]. Then Jp(t) determines a continuous path of real invertible matrices such that 7p(0) = Jp and ip(l) is an identity matrix. Applying the above procedures to every diagonal block in J, we see that J is connected to K by a path in GLr(n). Now observe that the path in GLr(2) defined for t G [0, 2] by -(1-0 -t Y-Cl-i (2-0 (1-0 f-lJ ] when f£ [0,1] when fG[l,2] connects to . Consequently AT, and hence J, is connect-
The Real Case 441 ed in GLr{n) with either / or diag[-l, 1,1,. . . , 1]. But det T>0 implies det/>0, and so the latter case is excluded. Since T=S~1JS for some invertibie real S, we can hold S fixed and observe that the path in GLr(n) connecting J and / will also connect T and /. Now assume TeGLr(«) and det7"<0. Then det7">0, where 7" = T diag[-l, 1,. . . , 1]. Using the argument above, we find that 7" is connected with / in GLr(n). Hence 7" is connected with diag[-l, 1,. . . , 1] in GLr(n). a Proof of Theorem 14.6.1. Without loss of generality we can assume that A = 7„(0). Let kl > • • • > kr be the sizes of Jordan blocks in A. Let <t>p the set of all ordered r-tuples of nonnegative integers /,,..., lr such that 0</,<A:,, E^=1/, = />. As in Section 14.1, each (/,,..., lr) e<J>p is identified with a certain /^-dimensional ,4-invariant subspace; so <£>p can be supposed to be contained in Inv*(.A). The proof of Lemma 14.1.1 shows that <&p is connected in Inv*(/t). Further, we apply the proof of Lemma 14.1.2 to show that any 3'1 e Inv*(/1) is connected in Inv*(/1) with some 3F2 G <t>p. Take vectors vi} £ %", j = 1,. . . , qt; i = i0, i0 - 1, . . . , 1 as in the proof of Lemma 14.1.2. Let /?, = dim 9?(. - dim £%,_, for i = i0, iQ — 1,. . . , 1. As the vectors y,- ,,. . . , v, q. are linearly independent modulo 9?,- _(, the Pi x ^.( matrix £),„ formed by the rows /„,&, + /„,...,£, + ••• + &, + «„ of the n x <jr. matrix [f,c,,. • • , vi ] has linearly independent columns. For simplicity of notation assume that the top qt x q. submatrix Qt of Qt is nonsingular. Now Lemma 14.6.2 allows us to connect the vectors y<o" • • •' v'o«,0 with ±e'C e*i+-o' • • • ' e*l + -+*„j_1+ib. respectively ^ (the sign + or - coincides with the sign of the nonzero real number det Qt ) in the set of all qt -tuples of vectors in 9?, that are linearly independent modulo »(„-i- Put ytJ= ±ei0> yiQJ = e*l + ... + *|°_1+,0 for y = 2,. . . , 9iu in the proof of Lemma 14.1.2. Using an analogous rule for the choice of y/y at each step of the procedure described in the proof of Lemma 14.1.2, we finish the proof of Theorem 14.6.1. □ Theorem 14.6.3 If the transformation A: J)?"—> J|J" has the only eigenvalues a ± ifi, where a and (3 are real and p # 0, then again the set Inv*(/1) of all A-invariant subspaces of fixed dimension p is connected. Note that under the condition of Theorem 14.6.3, A does not have odd-dimensional invariant subspaces (in particular, n is even), so we can assume that p is even (see Proposition 12.1.1). Proof. Consider A as the n x n real matrix that represents the transformation A in the basis e,,..., en in tf.", and let Ac be the complexification
442 The Metric Spaces of Invariant Subspaces of A; so Ac: <f" —* £". By Theorem 12.3.1, there exists a one-to-one correspondence between the v4''-invariant (/?/2)-dimensional subspaces M in 8fta+ip(A') and the ,4-invariant /^-dimensional subspaces 5£, which is given by the formula At = (2 + i£) D 98a+1-p(y*f) ='*>(■#) (14.6.1) It is easily seen from the proof of Theorem 12.3.1 that this correspondence is actually a homeomorphism <p: lnvf(A)-^ lnvp/2(Ac\^ (/,f)). Now the connectedness of Inv*(/1) follows from the connectedness of i^fiii^L.^A')) (see Theorem 14.1.3). □ Recall that as shown in Chapter 12, any ,4-invariant subspace 2 admits the decomposition 2 = (sen &Xi(A)) + • • • + (if n aAi(A)) + (<en® ai±ifii(A)) + ■■■ + (^na„,±ift(/i))1 where A,, . . . , A^ are all the distinct real eigenvalues of A (if any) and a, + j/3[, . . . , a, + j/3, are all the distinct eigenvalues of A in the open upper half plane. Using this observation, the proof of Theorem 14.2.1 yields the following description of the connected components in the metric space Inv*(/1) of all ,4-invariant subspaces in ft" for the general transformation A:%"^%". Theorem 14.6.4 Let A,, . . . , As be all the different real eigenvalues of A, let their algebraic multiplicities be i//,,. . . , \ps, respectively, and let or, + i/3,, . . . , or, + /j8, be all the distinct eigenvalues of A in the open upper half plane with the algebraic multiplicities «pt,. . . , <pn respectively. Then for every (s + t)-tuple of integers X = (Xi,- ■ •, Xs+I) such that 0< AT, <'/',, i = l,...,s; 0< *J+I.< <p,., / = 1, . . . , t the set {££ G Inv*(/i) | dim 2 = p; x< is the algebraic multiplicity of A\^ corresponding to kjor i = 1,. . . , s; Xs+j is that corresponding to a; + /j3;. for] = 1,. . . , t}, where p = Xl + ■ ■ ■ + xs + 2(*, + i + • • • + *s+,) is a connected component of Inv*(/1) and every connected component of Inv*(/1) has this form. In particular, Inv*(/t) has exactly n*=1 (^ + 1)- l\'j=1 (<p. + 1) connected components. Finally, consider the isolated subspaces in Inv*(/1). Theorem 14.6.5 Let A: Jj?"—»^f" be a transformation. Then an A-invariant subspace M is isolated in Inv*(/1) if and only if either M n 9?A (v4) = {0} or M^> 9?A (/t)
Exercises 443 for every real eigenvalue A0 of A with dim Ker(A0/- A) 3:2, and either M D 01 a±lli(A) - {0} or M D 01 a±ip(A) for any nonreal eigenvalue a + if} of A with geometric multiplicity greater than 1. Proof. Using the real analog of Lemma 14.3.2 (its proof is similar to that of Lemma 14.3.2), we can assume that one of two cases holds: (a) a(A) = {\0}, A0G#; (b) a(A)={a + ip, a-if}}, a,j36»J#0. In the first case Theorem 14.6.5 is proved in the same way as Theorem 14.3.1. In the second case use Theorem 14.3.1 and the homeomorphism between \mf(A) and Invp/2(,4f| »,„.(#)) 8iven by formula (14.6.1). □ 14.7 EXERCISES 14.1 14.2 14.3 Supply the details for the proof of Lemma 14.3.4. Prove that for a transformation A the sets of A -hyperinvariant subspaces and isolated /t-invariant subspaces coincide if and only if A is diagonable. In this case an /1-invariant subspace is isolated if and only if it is a root subspace. What is the number of isolated invariant subspaces of the companion matrix 0 1 0 0 0 1 0 0 0 Lfl,, 0 0 a>e<F? 14.4 Let A = diag[72(0), 72(0), 72(0)]: <p6-> (p6 Is the set of all reducing ,4-invariant subspaces dense in Inv(/1)? 14.5 Show that there exists a converging sequence of semiinvariant sub- spaces for the matrix 7,(0) whose limit is not 7,(0)-semiinvariant.
Chapter Fifteen Continuity and Stability of Invariant Subspaces It has already been mentioned that computational problems for invariant subspaces naturally lead to the problem of describing a class of invariant subspaces that are stable after small perturbations. Only such subspaces can be amenable to numerical computations. The analysis of stability of invariant subspaces is the main topic of this chapter. We also include related material on stability of other classes of subspaces (notably, [A B]-invariant subspaces), and on stability of lattices of invariant subspaces. Different types of stability are analyzed. 15.1 SEQUENCES OF INVARIANT SUBSPACES In this section we consider the continuity of invariant subspaces for transformations from <p" into <p". We start with the following simple fact. Theorem 15.1.1 Let {Am)1l=x be a sequence of transformations from <p" into <p" that converges to a linear transformation A: §"-^> $". If Mm is an Am-invariant subspace for m = 1,2,. . . such that Mm—*M for some subspace M C (£"", then M is A invariant. Proof Let xE. M. Then, by Theorem 13.4.2, there exists a sequence {*m}m = J SUCfl that Xm G ^m for each m ar>d nmm-^ ll*m ~ *ll = °- NOW \\A* ~ Amxm\\ *\\Ax- Amx\\ + \\Amx - Amxm\\ *\\A- Am\\-\\x\\ + \\Am\\-\\x-xm\\ 444
Sequences of Invariant Subspaces As Am-+A, the norms \\Am\\ are bounded; \\Am\\ constant K independent of m. So as m—»°°, 445 K for some positive \Ax-AmXm\ < ||,4-,4J|-1|*||+ tf-||*-*J|-*0 As Mm is A„ invariant, we have AmxmE.Mm for each m, and Theorem mm * m m m » 13.4.2 can be applied to conclude that Ax E.M. □ The continuity property of invariant subspaces expressed in Theorem 15.1.1 does not hold for the classes of coinvariant and semiinvariant subspaces. example 15.1.1. For m = 1, 2,. . . , let A = 0 0 r i m. IS The subspace Span{et} is Am coinvariant for every m. (Indeed, Span a direct complement to Span{e,}, which is Am invariant.) However, m 1 0 11 o oJ is the limit of Am. The □ Span{e,} is not A coinvariant, where A = same subspace Span{e,} is also Am reducing, but not A reducing. example 15.1.2. For m = 1, 2,. . . , let 0 0 0 1 1 m 0 0 1 1 m The eigenvectors of Am are (up to multiplication by a scalar) el,mel + me2, me2 + 2e3. Consequently, the subspace Span{e,,e3} is Am semi- 2 m e. invariant for all m (because Spanfme, + e2) is a direct complement to Span{el5 e3}, which is an /lm-invariant subspace). However, Span{e,, e3} is not A semiinvariant, where A = 0 0 .0 1 0 0 cv 1 0. is the limit of Am if m—»°°. □
446 Continuity and Stability of Invariaut Subspaces Corollary 15.1.2 The set of A-invariant subspaces is closed; that is, if {MmYm^l is a sequence of A-invariant subspaces with limit M = limm_=e M, then M is also A invariant. Simple examples show that the ,4-invariant subspaces Ker A and Im A are not generally continuous in the sense of Theorem 15.1.1. Thus it may happen that {Ker/tm}*=1 does not converge to Ker A and {Im/lm}*=1 does not converge to Im A as Am—* A. The following result shows that the only obstruction to convergence of Ker Am and lmAm is the dimension. Theorem 15.1.3 Let {Am}~m = x be a sequence of transformations on <p" that converges to a transformation A on <p". Then Ker A contains the limit of every convergent subsequence of the sequence {Ker Am}2, = l- In particular, if dim Ker Am = dim Ker A for every m = 1, 2,. . . then Ker Am and Im Am converge, and Ker A = lim Ker Am , Im A = lim Im Am Proof. For k = 1, 2,... , let Ker Am converge to some M C <p". Then for every xE. M there exists a sequence xm G Ker Am , such that xm —» x. As A„ xm = 0, we have also Ax — 0, that is, x G Ker A. mk mk Now let \xaAm be a sequence converging to some Jf C (p". Then [see formula (13.1.1)] * Since j4*-» v4*, by the part of the theorem already proved, Jfx C Ker /I* = (Im A)1 and so .A" D Im v4. Assume in addition that dim Ker Am= dim Ker A for all m = 1, 2, .... If i£ is a limit of a converging subsequence from the sequence {Ker v4m}^,, then (see Theorem 13.1.2) dim if = dim Ker A. From the first part of the theorem we know that i? C Ker A. So actually !£ = Ker A. Hence Ker A is a limit of every converging subsequence of {Ker Am)'m = l. It follows [using the compactness of (frf^")] that Ker Am converges to Ker A. Further, we also have dim Im Am = dim Im A for each m. A similar argument shows that Im Am converges to Im A. □ Let M be an .4-invariant subspace and 11 be an open set in <f. We conclude this section by showing that the inclusion a(A\M) Cll is preserved under small perturbations. Recall that 0 denotes the "gap" metric introduced in Chapter 13.
Stable Invariant Subspaces: The Main Result 447 Theorem 15.1.4 Let M be an invariant subspace for the transformation A: <p"—* <f"", and let ilC <p be an open set such that all eigenvalues of A\M are inside il. Then for transformations B on <p" and B-invariant subspaces JV, (t(B\ v) C il as long as ||B - A || + d(M, Jf) is sufficiently small. Proof Arguing by contradiction, suppose that there exists a sequence of transformations {Bm}* = ] on <p" and a sequence of subspaces {Jfm)Z-i such that Jfm is Bm invariant, \\Bm-A\\ + d(Jt,Jfm)<^, m = l,2,... and cr(Bm\ v JjZ'O. For each m, let Am be an eigenvalue of Bm\s outside il: Bmxm = \mxm, ||*J| = 1, *meJVm (15.1.1) Since ||B„, -/4||—»0 as m-»°°, the norms {||Bm||}^ = i are bounded; hence the sequence {Am}* = 1 is bounded as well. Passing to subsequences in formula (15.1.1), if necessary, we can assume that Am—»• A0 and xm—> x0 (as m-^oo), for some A0£<p and jt„ G £". By Theorem 13.4.2, x0Gi, and clearly x0^0. As Ant,, = A0jr0, A0 is an eigenvalue of v4|-<f, which, by hypothesis, belongs to il. But this contradicts Am 0il for m = 1,2,... . O 15.2 STABLE INVARIANT SUBSPACES: THE MAIN RESULT Let A: £" —* <p" be a transformation. An ,4-invariant subspace .A" is called stable if, given e>0, there exists a 6>0 such that ||B-.A||<6 for a transformation B: <p"^> <p" implies that B has an invariant subspace M with 0(M, Jf) < e. The same definition applies for matrices. This concept is particularly important from the point of view of numerical computation. It is generally true that the process of finding a matrix representation for a linear transformation and then finding invariant sub- spaces can be performed only approximately. Consequently, the stable invariant subspaces will generally be the only ones amenable to numerical computation. Suppose that JV is a direct sum of root subspaces of A. The JV is a stable invariant subspace for A. This follows from the fact that JV appears as the image of a Riesz projector ^ = 2^/i.(/A-^)-,dA (15.2.1) where T is a suitable closed rectifiable contour in <p such that the eigenvalue
448 Continuity and Stability of Invariant Subspaces A0 of A is inside T if 9?Ao(v4) C Jf and outside T if ®Xg(A) C\Jf = {0} (see Proposition 2.4.3). Further, the function F(A) = (/A- A)"1 is a continuous function of A on T. This follows from the formula (/A - A)~l = [det(/A - /4)]"'Adj(/A - ,4) where Adj(/A - A) is the matrix of algebraic adjoints of /A - A, and from the continuity of det(/A - A) and Adj(/A - A) as functions of A. Since T is compact, the number KA = maxAer ||(/A - A)~l\\ is well defined. Now any transformation B: §"-+§" with ||B — v4|| </T^1 has the property that /A - B is invertible for all A G T. [Indeed, for A £ T we have I\-B = (I\- A) + (A- B) = (/A - A){I + (/A - A)'\A - B)] and since ||(/A - A)~\A - B)\\ < 1, the invertibility of Ik-B follows.] Moreover ||(/A - AT1 - (/A - B)-'|| ^ KAKB\\A - £|| which implies that H^/j-^H is arbitrarily small if ||.A-B|| is small enough. Theorem 13.1.1 shows that 8{Jf,M)<\\RB-RA\\ (15.2.2) so 0(Jf, M) is small together with \\RB - RA\\. However, it will turn out that not every stable invariant subspace is spectral. On the other hand, if dim Ker(Ay/- A) > 1 and Jf is a one- dimensional subspace of Ker(A7— A), it is intuitively clear that a small perturbation of A can result in a large change in the gap between invariant subspaces. The following simple example provides such a situation. Let A be the 2x2 zero matrix, and let ■A" = Spanj r C <p2 Clearly, Jf is A invariant, but JV is unstable. Indeed, let B = diag[0, e], where e#0 is close enough to zero. The only one-dimensional B-invariant subspaces are M, = Span n \ and M2 - Spam \\, and both are far from Jf: computation 101 shows that 6(Jf, M) = lh/2, 1 = 1,2 The following theorem gives the description of all stable invariant subspaces. Theorem 15.2.1 Let A,,..., A, be the different eigenvalues of the transformation A. A subspace Jf of <p" is A invariant and stable if and only if Jf — jV, + • • • 4- Jfr,
Stable Invariant Snbspaces: The Main Result 449 where for each j the space JV} is an arbitrary A-invariant subspace of 9?A (A) if dim Ker(Ay/ - A) = 1; if dim Ker( Ay/ - A) # 1 then either JV; = {0} or Jfr®^A). Comparing this theorem with Theorem 14.3.1, we obtain the following important fact: an A-invariant subspace Jf is stable if and only if Jf is isolated in the metric space \n\{A) of all A-invariant subspaces. An interesting corollary is easily detained from Theorem 15.2.1. Corollary 15.2.2 All invariant subspaces of a transformation A: <p"—»<p" are stable if and only if A is nonderogatory [i.e., dim Ker(/t - Au/) = 1 for every eigenvalue A0 of A). The proof of Theorem 15.2.1 will be based on a series of lemmas and an auxiliary theorem that is of some interest in itself. We will also take advantage of an observation that follows immediately from the definition of a stable subspace: the ,4-invariant subspace Jf is stable if and only if the SAS~'-invariant subspace SJf is stable. Here S: <p"-H»<p" is an arbitrary invertible transformation. First we present results leading to the proof of Theorem 15.2.1 for the case when A has only one eigenvalue. To state the next theorem we need the following notion: a chain Mx C M2 C • • • C Mn_l of /t-invariant sub- spaces is said to be complete if dim M ■ = j for j = 1,. . . , n — 1. Theorem 15.2.3 Given e > 0, there exists a 8 > 0 such that the following holds true: if B is a transformation with \\B — A\\ < 8 and {M .} is a complete chain of B- invariant subspaces, then there exists a complete chain {Jf^ of A-invariant subspaces such that 0(Jft, Mj)<e for j=\, . . . ,n — \. In general, the chain {M .} for A will depend on the choice of B. To see this, consider TO 01 Mo oJ« B^ where v G (p. Observe that for v ^ 0 the only one-dimensional invariant subspace of Bu is Span{e2}, and for B'v, v^O, the only one-dimensional invariant subspace is Span{e,}. Proof. Assume that the conclusion of the theorem is not correct. Then there exists an e > 0 with the property that for every positive integer m there exists a transformation Bm satisfying \\Bm - A\\ < 1 Im and a complete chain {Mmj} of Bm-invariant subspaces such that for every complete chain {^} of /t-invariant subspaces 0 °1 R'J° V] v or ° _o ol
450 Coutinuity and Stability of Invariant Snbspaces max 0(Jf„Mmi)>e m = l,2,... (15.2.3) Denote by Pmj the orthogonal projector on Mmj. Since ||Pm;-|| = l, there exists a subsequence {m,} of the sequence of positive integers and transformations P,,. . . , P„ _, on <f"\ such that lim/V/=^. j=\,...,n-\ (15.2.4) (—•30 ''' ' Observe that P,,. . . , P„_, are orthogonal projectors. Indeed, passing to the limit in the equalities Pm y = (Pm>/)2, we find that Pj = P;2. Further, equation (15.2.4) combined with P£' y = Pm } implies that P* = Pf, so P. is an orthogonal projector (see Section 1.5). Further, the subspace Jff = Im P. has dimension /", j' = 1, . . . , n - 1. This is a consequence of Theorem 13.1.2. By passing to the limits it follows from BmPmj = PmJBmPmj that APj = PjAPj. Hence ^T is A invariant. Since Pmj = PmJ+iPmj we have P;. = Pj+iPj, and thus J;C^+1. It follows that .A", is a complete chain of /4-invariant subspaces. Finally, 0(.^, ./#.)= ||P. - PmJ|-»0. But this contradicts (15.2.3), and the proof is complete. □ Corollary 15.2.4 If A has only one eigenvalue, A0, say, and if dim Ker( A0/ - A) = 1, tfie« each invariant subspace of A is stable. Proof. The conditions on A are equivalent to the requirement that for each 1 < / < n - 1 the operator A has only one y'-dimensional invariant subspace and the nontrivial invariant subspaces form a complete chain (see Section 2.5). So we may apply the previous theorem to obtain the desired results. □ Lemma 15.2.5 If A has only one eigenvalue, A0 say, and if dim Ker(A0/- A) >2, then the only stable A-invariant subspaces are {0} and <p". Proof. Let J = diagf/,. (A0), . . . , Jk (A0)] be the Jordan form for A. As dim Ker( A0/ - A) > 2, we have s > 2. By similarity, it suffices to prove that J has no nontrivial stable invariant subspace. For e G <p, define the transformation Tc on <p" by setting Te =("'-' >f' = *! + ••■+ *> + !. / = 1 * - 1 ' ' lo otherwise and put B€ - J + T€. Then ||B£ - i|| tends to 0 as e—»0. For e #0 the linear transformation Bf has exactly one y'-dimensional invariant subspace, namely,
Proof of Theorem 15.2.1 in the General Case 451 Jfj = Span{e,, . . . , et). Here 1 </ < fc - 1. It follows that JfJ is the only candidate for a stable /-invariant subspace of dimension ;'. Now consider 7 = diag[7t (A0),. . . , 7^(A0), /^(A,,)]. Repeating the argument of the previous paragraph for / instead of /, we see that Jft is the only candidate for a stable /-invariant subspace of dimension /'. But / = SJS \ where S is the similarity transformation that reverses the order of the blocks in /. It follows that SJfj is the only candidate for a stable /-invariant subspace of dimension j. As s 3:2, however, we have SJf. ^ Jf- for 1 ^ j; s k - 1, and the proof is complete. □ Corollary 15.2.4 and Lemma 15.2.5 together prove Theorem 15.2.1 for the case when A has one eigenvalue only. 15.3 PROOF OF THEOREM 15.2.1 IN THE GENERAL CASE The proof of Theorem 15.2.1 in the general case is reduced to the case of one eigenvalue considered in the preceding section. Recall the notion of the minimal opening rj(Jt,Jf) = inf{\\x + y\\\x<=Jl,y<=Jf, max(||x||, |M|) = 1} between subspaces M and ^"(Section 13.3). Always 0<tj(JI, Jf) < 1, except when both M and Jf are the zero subspace, in which case r\{M, Jf) = °°. Note that t](M, Jf) >0 if and only if M fl Jf = {0} (Proposition 13.2.1). We need to apply the following fact. Proposition 15.3.1 Let {■Mm}^ri = 1 be a sequence of subspaces in <£"". If limm^x 0(-Mm, &) =0for some subspace !£, then v(Mm,Jf)-^v(<e,Jf) (15.3.1) for every subspace Jf. Indeed, if both if and Jf are nonzero, then also Mm are nonzero (at least for m large enough; see Theorem 13.1.2). Then (15.3.1) follows from formula (13.3.2). If at least one of 3? and Jf is the zero subspace, then (15.3.1) is trivial. Let us introduce some terminology and notation that will be used in the next two lemmas and their proofs. We use the shorthand Am—*A for limm^ot ||.4m - A\\ = 0, where Am, m = 1,2,. . . , and A are transformations on (p". Note that A m —» A if and only if the entries of the matrix representations of Am (in some fixed basis) converge to the corresponding entries of
452 Continuity and Stability of Invariant Subspaces A (represented as a matrix in the same basis). We say that a simple rectifiable contour T splits the spectrum of a transformation T if <r(T) n<$> = 0. In that case we can associate with T and T the Riesz projector nT;r)=j^.jr(i\-Tyld\ The following observation is used subsequently. If T is a transformation for which T splits the spectrum, then T splits the spectrum for every transformation S that is sufficiently close to T (i.e., ||5 - 7"|| is close enough to zero). Indeed, this follows from the continuity of eigenvalues of a linear transformation as functions of this transformation. Lemma 15.3.2 Let r be a simple rectifiable contour that splits the spectrum of T, let T0 be the restriction of T to Im P(T; V), and let Jfbe a subspace of Im P(T; T). Then Jf is a stable invariant subspace for T if and only if jV is a stable invariant subspace for 7"0. Proof. Suppose that Jf is a stable invariant subspace for T0, but not for T. Then one can find an e > 0 such that for every positive integer m there exists a transformation Sm such that \\Sm-T\\<^ (15.3.2) and V,l)>e, M<Elnv(Sm) (15.3.3) From (15.3.2) it is clear that Sm—» T. By assumption, T splits the spectrum of T. Thus, for m sufficiently large, the contour T will split the spectrum of Sm. Moreover, P(Sm; r)-+ P(T; T), and hence lmP(Sm-,r) tends to Im P(T\ T) in the gap topology. But then, for m sufficiently large, KetP(T;r) + lmP(Sm;r) = p (cf. Theorem 13.1.3). Let Rm be the angular transformation of Im P(Sm; T) with respect to P(T; T). Here, as in what follows, m is supposed to be sufficiently large. As P(Sm;T)->P(T;r), we have /?„,-»0. Put -[J Rr\ where the matrix representation corresponds to the decomposition
Proof of Theorem 15.2.1 iu the Geueral Case 453 <:'! = Ker/J(r;r)-i-Im/J(r;r) (15.3.4) Then £L is invertible with inverse [J ""I Also, Em Im P(T; T) = Im P(Sm; T), and Em -* /. Put Tm = E~mlSmEm. Then 7"m Im P(T; T) C Im P(T; T) and Tm-^ 7". Let Tmg be the restriction of Tm to Im P(T; T). Then TmQ-^ T0. As JVis a stable invariant subspace for T0, there exists a sequence {^m} of subspaces of Im P(T; T) such that Jfm is Tm<j invariant and d(Jfm, Jf)-^0. Note that Jfm is also Tm invariant. Now put Mm = EmJfm. Then J<m is an invariant subspace for Sm. From Em-*I one can easily deduce that 0(^m, .A"m)-^0. Together with 8(Mm, Jf)-+0, this gives 6(Mm, Jf)-^0, which contradicts (15.3.3). Next assume that Jf C Im P(T; T) is a stable invariant subspace for T, but not for T0. Then one can find an e >0 such that, for every positive integer m, there exists a transformation Sm on Im P(T; T) satisfying \\Sm ~ T0\\<- (15.3.5) and 0(^, M) > e , JV e Inv(Smn) (15.3.6) Let T, be the restriction of T to Ker /*(r; T) and write T, S = i L 0 °1 S„ J where the matrix representation corresponds to the decomposition (15.3.4). From (15.3.5) it is clear that Sm—* T. Hence, as Jf is a stable invariant subspace for T, there exists a sequence {Jfm} of subspaces of <p" such that Jfm is 5m invariant and 8(Jfm, M)-^0. Put Mm = P(T; F)Jfm. Since P(T; T) commutes with 5m, then Mm is an invariant subspace for Sm . We now prove that 0{Mm, 1A")-»0, thus obtaining a contradiction with (15.3.6). Take yE.Mm with ||_y|| < 1, and let x G Jfm be such that y = P(T; Y)x. Then ||y|| = ||P(r;r^||>inf{||x-M|||MeKerP(r;r)} a^JV,,,, Ker P(r;r))-11*11 (15.3.7) By Proposition 15.3.1, 0(Jfm, JV)-»0 implies that Tj(JVm, Ker P(T; V))-^^,
454 Continuity and Stability of Invariant Subspaces where t)0 = T)(Jf, Ker P{T; T)). So, for m sufficiently large, ■t)(Jfm, Ker P(T; T)) > |tj„. Together with (15.3.7), this gives IMIainblMI for m sufficiently large. Using this inequality, we obtain z||< sup inf ||P(r;r>-z|| = sup inf\\P(T;r)x-P(T;Dz\\ ze.vB *e-v lkll = 2'l« -||p(r;r)||(|-)0(^m,^) y||<sup inf \\P(T;r)z - P(T;r)x\\ 26.V '£-^ Ik 11 = 1 =s||p(r;r)||e(jvm,jv) So for m sufficiently large. We conclude that d(Mtn, .^)-+0, and the proof is complete. □ Lemma 15.3.3 Let Jf be an invariant subspace for T, and assume that the contour F splits the spectrum of T. If Jf is stable for T, then P(T; r)Jf is a stable invariant subspace for the restriction T0 of T to Im P(T; T). Proof It is clear that M - P{T; T)Jf is T0 invariant. Assume that M is not stable for T0. Then M is not stable for T, either, by Lemma 15.3.2. Hence there exist e>0 and a sequence {Sm} such that Sm —* T and 6(<e,M)>e, i?eInv(Sm), m = l,2, ... (15.3.8) As Jf is stable for T, one can find a sequence of subspaces {Jfm} such that SmJfm CJfm and d(Jfm, Jf)-^0. Further, since T splits the spectrum of T and Sm —* T, the contour T will split the spectrum of Sm for m suf- and sup inf II y lkll-i sup inf 2SJV y^Mm Ik 11 = 1 \z —
Perturbed Stable Invariant Subspaces 455 ficiently large. But then, without loss of generality, we may assume that T splits the spectrum of each Sm. Again using Sm-+T, it follows that p(sm,r)-^p(T;r). Let 2E be a direct complement of Jf in £". As 0(Jfm, Jf)-+0, we have <p" = 3f + Jfm for m sufficiently large (Theorem 13.1.3). So, without loss of generality, we may assume that <p" = 2£ + Jfm for each m. Let Rm be the angular transformation of Jfm with respect to the projector of <p" along 3E onto Jf, and put -I"7 Rm\ E"' ~ 10 /J where the matrix corresponds to the decomposition <p" = 2t + Jf. Note that T„ = EmlSmEm leaves Jf invariant. Because Rm-*0, we have Em-*/, and sorm-^r. Clearly, T splits the spectrum of 7"| v. As rm-» T and Jf is invariant for Tm, the contour T will split the spectrum of Tm\^ too, provided m is sufficiently large. But then we may assume that this happens for all m. Also, we have iimP(rm|,.;r)-^p(r|.v;r) Hence J<m = Im P^J^; T)^Im P(T| v-; T) = M in the gap topology. Now consider Z£m = EmMm. Then ifm is an 5m-invariant subspace. From Em-*I it follows that 0(i?m, MJ-^0. This, together with 0(./0m, ■/«)-»0, gives 6{Z£m, M)-^0. So we arrive at a contradiction to (15.3.8) and the proof is complete. □ After this long preparation we are now able to give a short proof of Theorem 15.2.1. Proof of Theorem 15.2.1. Suppose that JVis a stable invariant subspace for A. Put Jf^Jfn 3?A (v4). Then Jf = Jfl + • • • + Jfr. By Lemma 15.3.3, the space Jfj is a stable invariant subspace for the restriction Ai of A to S?A (A). But, by Lemma 2.1.3, Aj has one eigenvalue only, namely, Ar So we may apply Lemma 15.2.5 to prove that Jfi has the desired form. Conversely, assume that each Jfj has the desired form, and let us prove that Jf — JV, + • • • 4- Jfr is a stable invariant subspace for A. By Corollary 15.2.4, the space Jfj is a stable invariant subspace for the restriction v4 . of A to 0l^(A). Hence we may apply Lemma 15.3.2 to show that each Jfj is a stable invariant subspace for A. But then the same is true for the direct sum Jf = Jfl + --- + Jfr. □ 15.4 PERTURBED STABLE INVARIANT SUBSPACES In this section we show that the stability of an A -invariant subspace M is preserved under small perturbations of M and A. This is true also when we restrict our attention to the intersection of M and a fixed spectral subspace
456 Continuity and Stability of Invariant Suhspaces of A. To state this result precisely, denote by dla(A) the spectral subspace of A (the sum of root subspaces for A) corresponding to those eigenvalues of A that lie in an open set ft. Theorem 15.4.1 Let A: <p"—* <p" be a transformation, and let (1C <p be an open set whose boundary does not intersect o-(A). Assume that M is an A-invariant subspace for which the intersection M fl 9?n(v4) is stable (with respect to A). Then any B-invariant subspace Jf has the property that Jf fl 8fta(B) is stable (with respect to B) provided ||B — v4|| and 0(M, Jf) are small enough. The particular case of Theorem 15.4.1 when fl=(p is especially important. Corollary 15.4.2 Let M be a stable A-invariant subspace. Then there exists an e >0 such that any B-invariant subspace Jf is stable provided \\B- A\\ + 0(M,Jf)<e We need the following lemma for the proof of Theorem 15.4.1. Lemma 15.4.3 Let A and fl be as in Theorem 15.4.1, and let M be an A-invariant subspace. Then for every e > 0 there exists a 8 > 0 such that every B-invariant sub- space Jf with ||B - A|| + 6(M, Jf)<8 satisfies the inequality 6(M D $ln(A), Jf D 9ia(B)) < e. Proof. Arguing by contradiction, assume that there is a sequence of transformations {Bm}* = i and a sequence of subspaces {Jfm)Z~i sucn that limm_ ||Bm - A\\ =0, Hmm_ 6(M, Jfm) = 0, Jfm is Bm invariant for each m, but e(M n ma(A), jfm n ma(Bm))> e >o (15.4.1) where e does not depend on m. Denote by Pn(Bm) [resp. Pa(A)] the Riesz projector onto 3in(Bm) [resp. onto 9?n(v4)]. By Lemma 13.3.2, for m large enough there exists an invertible transformation Sm: <p" —» <p" such that Sm(®a(A)) = ®a(Bm) , Sm(Ker Pn(A)) = Ker Pa(Bm) and, moreover, max{||Sm-/||,||s;1-/||}<C1||,4-Bj|
Perturbed Stable Invariant Subspaces 457 Here C,, C2,. . . are positive constants that depend on A only. Actually, one can take Sm defined as follows: Smx = (/ - Pa{Bm) + Pa(A))x , x G Ker Pa(A) Smx = (/ + Pa(Bm) - Pn(A))x , x G mn(A) Put Bm = S~mlBmSm and Xm = S^X (so that ^ is Bm invariant). Let PM (resp. Px ) be the orthogonal projector onto M (resp. jVm). As SmlPx Sm is a projector onto Jfm (not necessarily orthogonal), we have 8(M, Jfjs ||5m'P,m5m - Pj| < C20(^, Jfm) (15.4.2) where the first inequality follows from (13.1.4). Hence 0(M,Jfm)->0 as m-H>°° (15.4.3) It is easily seen that 3?la(Bm) = 3in{A) and Ker Pn(Bm) = Ker Pn(A) (for m large enough). Consequently K = (*m n »nM)) + (^m D Ker Pa(^)) Since also M = (M n 98„(j4)) + (^ D Ker Pn(/4)) Theorem 13.4.2, together with (15.4.2), implies that 0(J#n3?n(v4),.yVmn3?fi(v4))-»O as m-^oo (15.4.4) (cf. the proof of Lemma 14.3.2). Now, as in (15.4.2), we have e(M n ®a(A), Jfm n ®a(Bm)) < c,e{M n an(>4), ^m n aa(A)) which contradicts (15.4.1) in view of (15.4.4). D Proof of Theorem 15.4.1. Consider first the case fl = <p (i.e., S?n(/4) = <p", where n is the size of A). Arguing by contradiction, assume that the statement of the theorem is not true (for Cl = <p). Then there exist an e > 0 and a sequence {Bm } * =, of transformations on <p" converging to A such that 6(M, Ji) > e for every stable fim-invariant subspace ^T, m = 1, 2,. . . Since M is stable and Bm—* A, there exists a sequence {Mm}^ = l of subspaces in <p" with BmMmdMm for each m and &{Mm, M)-+Q. For m sufficiently large we have &(Mm, M)< e, and hence the Bm-invariant subspace Mm is not stable.
458 Continuity and Stability of Invariant Subspaces Let 3? be a direct complement of M in §". We may assume that 3? is also a direct complement to each Mm (Theorem 13.1.3). Let Rm be the angular transformation of Mm with respect to the projector onto M along St. Then fl-»0. Write m Lo /J where the matrix representation is taken with respect to the decomposition §" = 2£ + M. Then Em is invertible, EmM = Mm, and £m-^/. Put v4m = E'mlBmEm. Obviously, Am—* A and AmM CM. Note that J< is not stable for Am. With respect to the decomposition <p" = M + 3?, we write ■^m m I o wj Then Um-+U and Wm-* W Since J< is not stable for Am, Theorem 15.2.1 ensures the existence of a common eigenvalue Am of Um and Wm such that dimKer(Am/-v4m)>2, m = l,2,... (15.4.5) Now |A,J < \\Um\\ and {Um} converges to U. Hence the sequence {Am} is bounded. Passing, if necessary, to a subsequence, we may assume that Am-»A0 for some A(( G <f. But then \mI - Um—> \0I - U and Am/ - Wm -^ A,,/ - W. It follows that A(( is a common eigenvalue of U and W. Again applying Theorem 15.2.1, we see that A() is an eigenvalue of geometric multiplicity one: dim Ker(A0/ - A) = 1. So there exists a nonzero (« — 1) x (n — 1) minor in A()/— A. Then, for m large enough, the corresponding minor in Am/- Am is also nonzero, a contradiction with (15.4.5). Now consider the general case of Theorem 15.4.1. It is seen from the proof of Lemma 15.4.3 that we can assume that B satisfies 8ftn(B) = 3in{A). But then we can apply the part of Theorem 15.4.1 already proved with <p", A and B replaced by ffln(A), A\# (A) and B\^ (B), respectively. □ Now let us focus attention on the spectral A -invariant subspaces, that is, sums of root subspaces for A (the zero subspace will also be called spectral). Theorem 15.2.1 shows that each spectral invariant subspace is stable. The converse is not true in general: every invariant subspace of a unicellular transformation is stable, but the only spectral subspaces in this case are the trivial ones. For the spectral subspaces, an analog of Theorem 15.4.1 holds. Theorem 15.4.4 Let A and Q, be as in Theorem 15.4.1. Assume that M is an A-invariant subspace for which M D &tn(A) is a spectral invariant subspace for A. Then A = U 0 V w A„ =
Lipschitz Stable Invariant Subspaces 459 any B-invariant subspace Jf has the property that 1 n SS!(B) is spectral (as a B-invariant subspace) provided \\B - A\\ + 6(M, Jf) is small enough. Proof. As in the proof of Theorem 15.4.1, the general case can be reduced to the case fl = <f\ So assume il- <f\ Since every invariant subspace is the sum of its intersections with the root subspaces, it follows that an /t-invariant subspace if is spectral if and only if there is an ,4-invariant direct complement if' to i? such that a(/l|^)n cr(y41^.) = 0. Let A be an open set containing a(A\ u), and let A' be an open set disjoint with A that contains all other eigenvalues of A (if any). Then (t(A\u.) C A' for an /1-invariant direct complement M' to M (actually, M' is the spectral /t-invariant subspace corresponding to the eigenvalues in A'). By Theorem 15.1.4, any B-invariant subspace Jf satisfies ct(B|v)CA provided ||B - A\ + B(M, Jf) is small enough. On the other hand, by Theorems 15.2.1 and 15.1.4 there exists a B-invariant subspace Jf' such that ct(B|v.)CA' and 6(M',Jf') is as small as we wish provided ||B-/t|| is small enough. As Jf' is a direct complement to Jf (Theorem 13.1.3) and a-(B|w.)n o-(B|-V..) = 0, it follows that jV is spectral. □ The proof of Theorem 15.4.4 shows that if M is a spectral /t-invariant subspace with o-(A\M) Cfl, where flC(f is an open set, then for any B-invariant subspace Jf such that ||B — A\\ + d(M, Jf) is small enough, we also have a(B|v)Cfl. 15.5 LIPSCHITZ STABLE INVARIANT SUBSPACES In this section we study a stronger version of stability for invariant sub- spaces. A subspace M C <p" that is invariant for a transformation A: <p" —» <p" is said to be Lipschitz stable (with respect to A) if there exist positive constants K and e such that every transformation B: <f""—* <)7" with ||B-/l||<e has an invariant subspace Jf with 6(M, JV)< K\\B - v4||. Clearly, every Lipschitz stable subspace is stable; the converse is not true in general. The following theorem decribes Lipschitz stability. Theorem 15.5.1 For a transformation A and an A-invariant subspace M the following statements are equivalent: (a) M is Lipschitz stable; (b) M - {0} or else M = S?A (A) 4- • • • + 3?A (A) for some different eigenvalues A,,. . . , Ar of A; in other words, M is a spectral A-invariant subspace; (c) for every sufficiently small e > 0 there exists a 8 > 0 such that any transformation B with || A - B || < S has a unique invariant subspace Jf for which 6(M, Jf) < e.
460 Continuity and Stability of Invariant Subspaces The emphasis in (c) is on the uniqueness of Jf; if the word "unique" is omitted in (c), we obtain the definition of stability of M. Proof. First, arguing as in the proof of Lemma 15.3.2, one shows that M is a Lipschitz stable A-invariant subspace if and only if each intersection M C\0l ^(A) is Lipschitz stable (with respect to the restriction Mm (A)) f°r )'' = 1> • • • > s> where /x,,. . . , ns are all the distinct eigenvalues of A. Assume that (c) holds but (b) does not. Then M is a stable subspace, and Theorem 15.2.1 ensures that for some eigenvalue A(( of A with dim Ker( A0/ - A) = 1 we have {0} ¥= M n 3?A (A) ¥= 3?A (A). Let l3*A„M) in a Jordan basis for A in where 0 < a < 1, as follows: A0 1 0 A, 0 1 1 A„J L0 0 (A), and define the transformation B(a), .(A) 0 La 1 0 0 1 0An (15.5.1) B(a) = A on all root subspaces of A other than 8?A (A). Then B(a)—> A as a-»0. Let /? = dimS?A (A); q = dim Ker M D 3tko(A); so 0<q<p. For brevity, denote the right-hand side of (15.5.1) by K(a). To obtain a contradiction, it is sufficient to show that for a small enough the number of ^-dimensional ^(a)-invariant subspaces JV such that 0(M C\ S?A (A), Jf) s C, ailp is exactly ( J > 1 (we denote by C,, C2,. . . positive constants that depend on p and q only). Let us prove this assertion. The matrix K(a) has p different eigenvalues e,,. . . , e„, which are the p different roots of the equation xp = a. The corresponding eigenvectors are y, = (l,e,,. '), i = l,...,p. The only ^-dimensional ^(a)-invariant subspaces are those spanned by any q vectors among yt, . . . , yp notational convenience that Jf - Span{y,, Jf along the subspace spanned by e +1,. . Take such a subspace Jf and suppose for • > yq)- The projector (2.^ onto e is given by the formula My y-' oJ Y Y p-i i
Lipschitz Stable Invariant Subspaces 461 where Yq (resp. Yp-q) is the q x q [resp. (p - q) x q] matrix formed by the first q (resp. last p - q) rows of the matrix [y, y2 ■ • • yq\. As Yq is a Vandermonde matrix, det Yq = nis,<>S(? (e^-e^^O (cf. Example 2.6.4). Let Z^ = Adj J^ be the matrix of algebraic adjoints to the elements of Yq, so that Y"1 = l/(det Yq)Zq. From the form of Yq it is easily seen that \\Zq\\^C2ar!p, where r=l+2+■• •+ (q-2)= \(q-\)(q-2). Further, |det Yq\ = C-ias,p, where s= ^q(q - 1) is the number of all pairs of integers (i,y) such that \<i<j<q. As \\Yp^q\\< C4a'"p, it follows that ||yp_,y;'|| =£ C5a{r+s+q>'p = C,allp. Consequently, \\Q-QA\^Cb-ailp (15.5.2) ', o- 0 0. (resp. onto Jf) we have where Q As Q (resp. Q„) is a projector onto JDSA (/I) 0(^n^,M),jv)<ne-e.vll [see (13.1.4)]. Combining this inequality with (15.5.2), we find that 6(M D 8?A (A), Jf) < C6 • a"p for a >0 small enough. Since the number of ^-dimensional ^(a)-invariant subspaces iV is exactly I J, the required assertion is proved. Conversely, assume that (b) holds but (c) does not. Since M is a stable subspace (by Theorem 15.2.1), this implies the existence of a sequence {BmYm = \ a"d the existence of two different Bm-invariant subspaces Jflm and Klm such that ||Bm - /i|| < (1 /m) and 0(-M,^im)<^ (15-5.3) for i = 1 and 2. Let T (resp. A) be a closed simple rectifiable contour such that cr(A) D T = 0 [resp. a(A) D A = 0] and A,, . . . , Ar are the only eigenvalues of A inside T (resp. outside A). Letting 9?,.(C) be the image of the Riesz projector (2iri)~ J, (A/— C)~ dX, where the matrix C has no eigenvalues on T, we have M = 9ix{A). Since 0(», (B,„), &tr(A))-*0 as m-^oo, we find in view of (15.5.3) that 6{%{Bm),Jfim)-^0 asm-**, i = l,2 (15.5.4) Now ^.^(^n^BJJ + l^n^flJ); combining this with (15.5.4), it is easily seen that Jfim DS?A(Bm) = {0}, at least for large m. (Indeed, argue by contradiction and use the properties that the set of all subspaces in <p" is compact and that the limit of a converging sequence of nonzero subspaces is again nonzero.) So Jfim C &lv(Bm). But (15.5.3) implies that (for large m) dim Jfim = dim M = dim % (Bm). Hence Jfl)n = 3?lr(Bm),
462 Coutiuuity and Stability of Invariant Subspaces i = 1,2 (for large m), contradicting the assumption that Jflm and Jf2m are different. Now we prove the equivalence of (a) and (b). In view of Theorem 15.2.1, we have to check that the only Lipschitz stable invariant subspaces of the Jordan block 0 1 0 0 0 1 01 1 0J <p"^<f:" M) 0 0 • are the trivial spaces {0} and <f"'. For a > 0, let /_ = "0 10- 0 0 1 -a 0 0 • • 0" 1 • 0. For k = 1, ...,«- 1, the only ^-dimensional /-invariant subspace Jfk is spanned by the first k unit coordinate vectors. Denote by Pk the orthogonal projector onto Jfk, and let Pka denote the orthogonal projector onto a ^-dimensional 7a-invariant subspace Jfk a (l<i<n-l). We have y=(h e" ) G Jfk a, where e = a So *(•*■*>•*■*..)= II f* P*-J"IM \Pky-pk.ay\ -v=* ki >I21 i\2 112 s;^iei Now use |e| = Va. One finds that for a sufficiently small «(■""*.■*■*.„) M«*'" On the other hand, ||/ - Ja \\ = a. But then it is clear that for 1 < k < n - 1 the space Jfk is not a Lipschitz stable invariant subspace of J, and thus J has no nontrivial Lipschitz stable invariant subspace. □ The property of being a Lipschitz stable subspace is stable in the following sense: let M be an /t-invariant Lipschitz stable subspace. Then any B-invariant subspace Jf is Lipschitz stable (with respect to B) provided \\B - A\\ and 0(M, Jf) are small enough. In view of Theorem 15.5.1, this is simply a reformulation of Theorem 15.4.4. It follows from Theorem 15.5.1 that a transformation A: <p"—»<p" has
Stability of Lattices of Invariant Subspaces 463 exactly 2r different Lipschitz stable invariant subspaces, where r is the number of distinct eigenvalues of A. 15.6 STABILITY OF LATTICES OF INVARIANT SUBSPACES In this section we extend the notion of stable invariant subspaces to the lattices of invariant subspaces. Recall that a set A of subspaces in <p" is called a lattice if 1,^£A implies i+lGA and M DJf E.A. Two lattices A and A' of subspaces in <p" are isomorphic if there exists a bijective map S: A—* A' such that S(Jt n JV) = SM D SJf and S{M + Jf) = SM + SJf for any two members M and Jf of A. In this case 5 is called an isomorphism of A onto A'. Let A be a lattice of (not necessarily all) invariant subspaces of a transformation v4: <p"-» <p". The lattice A is called stable if for every e >0 there exists a 5>0 such that, for any transformation B:(p"-^(p" with ||/1-B||<S, there exists a lattice A' of (not necessarily all) B-invariant subspaces that is isomorphic to A and satisfies sup^eA 0(i?, 5(if)) < e for some isomorphism S: A—* A'. If A consists of just one subspace, we obtain the definition of a stable invariant subspace. Theorem 15.6.1 A lattice A of A-invariant subspaces is stable if and only if it consists of stable A-invariant subspaces. Proof. Without loss of generality we can assume that {0} and <p" belong to A. Suppose first that A contains an A-invariant subspace M that is not stable. Then there exist an e>0 and a sequence of transformations {Bm)* = 1 tending to A such that d(M, Jf) > e0 for any Bm-invariant subspace Jf and any m. Obviously, A cannot be stable. Assume now that every member of A is a stable A -invariant subspace. As the number of stable /t-invariant subspaces is finite (by Theorem 15.2.1), the lattice A is finite. Let M,,..., Mp be all the elements in A. Denote by A,,. . . , Ar the different eigenvalues of A ordered so that dim Ker(/t - \J) = 1 for i = 1, . .. , s dim Ker(A - A,/) > 1 for / = s + 1,. . . , r Then M, = Jfn+Jf,2 + --- + Jfir, i=l,...,p where Jfif = Mi n S?A (A), and Jftj is equal either to {0} or to ^(A) for
464 Continuity and Stability of Invariant Subspaces ;' = 5 + 1, . . . , r. Let T; (/ = 1,. . . , r) be a small circle around Ay such that A; is the only eigenvalue of A inside or on 1^. There exists a 8Q > 0 such that all transformations B: <f"'—* <p" with \\B - A\\ < 8f) have all their eigenvalues inside the circles Ty; for such a B denote by 5?,(B) the sum of root subspaces of B corresponding to the eigenvalues inside r.. Now put M'i = Jf'n + --- + Jf'ir, i = l p where for / = s + 1, . . . , r, M'h = {0} if Jftf = {0} and #'„ = ^(B) if Jf,. = mk (A); for j = 1,. . . , 5 we take Jf'u as follows. Let {0} = i?0 C if, C • • • C 3?m = tfljiB) be a complete chain of B-invariant subspaces in £%;(B); then Jf'ji is equal to that subspace i£k whose dimension coincides with the dimension of Jfir Clearly, M\ is B invariant. Further, it is clear from the construction that M\CMk if and only if Mi CMk. Using Theorem 15.2.3, it is not difficult to see that, given e < 0, there exists a positive 8 < 80 such that max1SlS/,0(J<(, M'i)< e for any transformation B: <p"-» <p" with ||B - A\\ < 8. Putting A' = {M\,. . . , M'p), we find that A is stable. □ The case when the lattice A is a chain is of special interest for us. We say that a chain i?, C • • • C 3?r of ^-invariant subspaces is stable if for every e >0 there exists a 8 >0 such that any transformation B: <p"—* <p" with ||B - A\\ < 8 has a chain i£\ C • • • C <£'r of invariant subspaces such that 0(i?;, &,) < e for / = 1,. . . , r. It follows from Theorem 15.6.1 that a chain of ,4-invariant subspaces is stable if and only if each member of this chain is a stable ,4-invariant subspace. The notion of Lipschitz stability of a lattice of invariant subspaces is introduced naturally: a lattice A of (not necessarily all) ,4-invariant sub- spaces is called Lipschitz stable if there exist positive constants e and K such that every transformation B with ||B - .A|| < e has a lattice A' of invariant subspaces that is isomorphic to A and satisfies inf sup 0{£, S(%)) *kK\\B-A\\ where S runs through the set of all isomorphisms of A onto A'. Obviously, every Lipschitz stable lattice of invariant subspaces is stable. We leave the proof of the following result to the readers. Theorem 15.6.2 A lattice A of A-invariant subspaces is Lipschitz stable if and only if A consists only of a spectral subspaces for A. 15.7 STABILITY IN METRIC OF THE LATTICES OF INVARIANT SUBSPACES If the lattice A consists of all ,4-invariant subspaces, then a different notion of stability (based on the distance between sets) is also of interest. To introduce this notion, we start with some terminology.
Stability in Metric of the Lattices of Invariant Subspaces 465 Given two sets X and Y of subspaces in <p", the distance between X and Y is introduced naturally: dist(*, Y) = max{sup inf 8(M,Jf), sup inf 0(M, Jf)} Borrowing notation from set theory, denote by 2Z the set of all subsets in a set Z. Then distfA', Y) is a metric in 2<>i<:") [as before, (p(<p") represents the set of all subspaces in £"]. Indeed, the only nontrivial property that we have to check is the triangle inequality: dist(*, Y) < dist(*, Z) + dist(Z, Y) for any subsets X, Y, Z in <$(("). For M e X, Jf G Y, <£ E Z we have 0(M,Jf)<0(M,£) + d(<£,Jf) (15.7.1) Fix J< and e>0 and take i£ in such a way that 0(M,Z£)< inf#ez0(J<, ^) + e. Taking the infimum in (15.7.1) with respect to Jf, we obtain inf 6(M, Jf) < inf 6»(^, 3?) + inf BiJ£, Jf) + e <dist(*, Z) + dist(Z, Y) + e Now take the supremum with respect to J<, and, from the resulting inequality with the roles of M and Jf interchanged, it follows that dist(A', Y) s dist(A", Z) + dist(Z, Y) + e As e >0 was arbitrary, the triangle inequality follows. Note also that dist(Z, Y) < 1 for any X, yeZW) The lattice lm(A) of all invariant subspaces of a transformation >1: <p"—»• <p" is called stable in metric if for every e > 0 there exists a 8 > 0 such that the lattice Inv(fi) of any transformation B:<p"—»<p" with ||B-/1||<S satisfies dist(Inv(B), Inv(/l))<e. The following theorem describes all transformations with stable lattices of invariant subspaces. Theorem 15.7.1 lnv(A) is stable in metric if and only if A is nonderogatory, that is, dim Ker(/t - A0/) = 1 for every eigenvalue A„ of A. Proof. Assume that A is derogatory. Then obviously lm(A) is an infinite set. Without loss of generality we can assume that A is a matrix in the Jordan form:
466 Coutinuity and Stability of Invariant Subspaces Here A,,. . . , Ar are (not necessarily distinct) eigenvalues of A. For i — 1,. . . , r let {e;(m)}* = 1 be a sequence of numbers such that lim,n^„ e,(m) = 0 and kl + el(m)*\i + el{m) for any i¥=j and any positive integer m. (Such sequences can obviously be arranged.) Letting Am = /t|( A, + e,(m)) © • • • ®JkJ[ A, + e,(m)) we obtain || /I m - A \\ —* 0 as m —» °°. Moreover, the number of /I m-invariant subspaces is exactly (ki + 1)- •• (kr + 1), and the lattice of/lm-invariant subspaces is independent of m. As Inv(/1) is infinite, clearly, dist(Inv(/l), Inv(/lm))s e >0, where e does not depend on m. Hence ln\(A) is not stable. Assume now that A is nonderogatory. Then the lattice lnv(A) is finite. Let Ml,...,M be all the ,4-invariant subspaces. Theorem 15.2.1 shows that every Mt is stable. That is, given e >0, there exists a S,>0 such that any transformation B: <p"-» <p" witn II^ ~~ ^11 < fy has an invariant subspace JV, such that 0(^,, Jf,) < e. Taking S' = min(81( . . . , 8p), we have max inf 6(M, Jf) < e for every transformation with ||fi - v4|| < 8'. We prove now that given e >0 there exists a S">0 such that sup inf 6(M,Jf)<e .Ae;inv(B) *elnv(/l) for every transformation B with \\B - A\\ < 8". Suppose not. Then there is a sequence of transformations on <p", {Bm}^, = l, such that Bm—> A as m—»°° and for every m there exists a Bm-invariant subspace Jfm with inf 8(l,/J>Eo>0 (15.7.2) M elnv(.4) where e0 is independent of m. Using the compactness of the set of all subspaces in <p", we can assume that limm^„ 0(Jfm, Jf) = 0 for certain subspace Jf in <p". Then (15.7.2) gives inf 6(M>Jf)>e0 (15.7.3) ^Glnv(i4) However, by Theorem 15.1.1, Jf Blm(A), which contradicts (15.7.3).
Stability in Metric of the Lattices of Invariaut Subspaces 467 Now given e >0, let 8 = min(S', 8") to see that dist(Inv(B), lm(A)) < e for every transformation B with ||B - A\\ < 8. □ It follows from Theorems 15.6.1 and 15.7.1 that lm(A) is stable if and only if it is stable in metric. Also, let us introduce the notion of Lipschitz stability in metric. We say that the lattice Inv(/1) of all invariant subspaces of a transformation A: <p" —* <p" is Lipschitz stable in metric if there exist positive constants K and e such that for any transformation B: §n-^> <F" w'tn II# ~ ^11 < e tne inequality dist(Inv(B), Inv(/t))< K\\B- A\\ holds. Theorem 15.7.2 The lattice \n\(A) for a transformation A: <p" —* <p" is Lipschitz stable in metric if and only if A has n distinct eigenvalues. Proof. Assume that A has n distinct eigenvalues. Then every A- invariant subspace is spectral and by Theorem 15.5.1 every /t-invariant subspace is Lipschitz stable, let M{,. . . ,Jl be all ,4-invariant sub- spaces (their number is finite). So there exist positive constants Kt, ei such that any transformation B with || B - A || < e, has a invariant subspace Jf, with d{Mi,M'i)^Kl\\B- A\\. Letting K = max(Kl,. . . , Kp), e = min(e,,. . . , e ), we find that sup inf B(M,Jf)<K\\B-A\\ (15.7.4) -«elnv(/l)>elnv(fl) provided ||B - v4|| < e. Now consider the invariant subspaces of B. As A has n distinct eigenvalues, the same is true for any transformation B sufficiently close to A. So every B-invariant subspace N is spectral: JV = Im 1 ■ fv(\i- By1 dx] 2tt-/ Jr for a suitable contour T. We can assume that T D o-(A) = 0. Then, letting M = Im we find that bX^i-^A sK,\\B-A — I (\I-Ay1 dk-~ \ (\I-B) for every transformation B sufficiently close to A (cf. the verification of
468 Continuity and Stability of Invariant Subspaces stability of a direct sum of root subspaces in the beginning of Section 15.2). Hence sup inf d(M,Jf)^Kl\\B-A\\ (15.7.5) „«elnv(fl)V61nv<'4) for all such B. In view of (15.7.4) and (15.7.5), Inv(/t) is Lipschitz stable in metric. Conversely, if A has less than n distinct eigenvalues, then by Theorem 15.5.1 there exists an ,4-invariant subspace that is not Lipschitz stable. Then clearly \n\{A) cannot be Lipschitz stable in metric. □ 15.8 STABILITY OF [A B]-INVARIANT SUBSPACES In this section we treat the stability of [A B]-invariant subspaces. In view of the important part they play in our applications (see Section 17.7), the reader can anticipate subsequent applications of this material. Let >1: <p" —»• <p" and B: <p"—»<p" be linear transformations. Recall from Chapter 6 that a subspace M C (p" is called [A B] invariant if there exists a transformation F: (f" -»$m such that (A + BF)M C M (actually, this is a property that is equivalent to the definition of an [A B]-invariant subspace, as proved in Theorem 6.1.1). We restrict our attention to the case most important in applications, when the pair (A, B) is a full-range pair; thus ■x 'Zlm(A>B)= (p" It turns out that, in contrast with the case of invariant subspaces for a transformation, every [A B]-invariant subspace is stable and, moreover, the stability is understood in the Lipschitz sense. More exactly, we have the following theorem. Theorem 15.8.1 Let v4: £"—» <f"' and B: <pm —* <p" be a full-range pair of transformations. Then for every [A B]-invariant subspace M there exist positive constants e and K such that, for every pair of transformations A': (p"-»<prf ond B':<f:m-^(p", with |M-^|| + ||B-fl'||<e there exists an [A' B']-invariant subspace M' satisfying 6(M',M)< K(\\A - A'\\ + ||£ - B'||) (15.8.1) Proof. Let F: <p"-^ <pm be such that (A + BF)M C M. Write A + BF
Stability of {A B]-Invariant Subspaces 469 and B as block matrices with respect to the decomposition <p" - M + Jf, where Jf is some direct complement to M: »-"-K- :;;]• -[«;] We claim that (A22, B2) is a full-range pair. Indeed, since (A,B) is a full-range pair, so is (A + BF, B) (Lemma 6.3.1). Now for every x = xM + xx G <p" with xM G M, Xjf G M we have (/I + BF)'Bjt G A'21B2xx + J< Hence in view of the full-range property of (A + BF, B) we find that Span {/122 #2*^ | jtjy EJf ; i = 0,1,. . .} = Jf This implies the full-range property of (A22, B2). We appeal to the spectral assignment theorem (Theorem 6.5.1). According to this theorem, there exists a transformation G: Jf^> <pm such that a(A22 +B2G)na(Au) = & (15.8.2) Put F0 = F+G, where the transformation G: $"-^ $m is defined by the properties that Gx = 0 for all x G M and Gx = Gx for all x E.Jf. Clearly A + BF0 An A 0 A22 12 + S.Cl -,-, + B-yG J Condition (15.8.2) ensures that J< is a spectral invariant subspace for A + BF0. By Theorem 15.5.1, M is Lipschitz stable [as an (A + B/^-invariant subspace]. So there exist constants e', K' >0 such that every transformation H: §"-^> <p" with ||A + BF0 - H\\ < e' has an invariant subspace M' such that 8(M,M')<K'\\A + BFQ- H\\ It remains to choose e in such a way that \\A-A'\\ + \\B-B'\\<e implies \\(A + BF0) - (A'+ B'F0)\\ < e' and put K= AT'max(l, ||F0||) to ensure (15.8.1). □ We emphasize that the full-range property of (A, B) is crucial in Theorem 15.8.1. Indeed, in the extreme case when B-0 the [A B]- invariant subspaces coincide with ,4-invariant subspaces and, in general, not every ^-invariant subspace is Lipschitz stable.
470 Continuity and Stability of Invariant Subspaces The proof of Theorem 15.8.1 reveals some additional information about the stability of [A B]-invariant subspaces: Corollary 15.8.2 Let A: (p"-* <P" and B: <pm—»<p" be a full-range pair of transformations, and let M be an [A B]-invariant subspace. Then for every transformation F: <pB—* <pm such that (A + BF)M C M and every direct complement Jf to M in <p" there exists positive constants K and e with the property that to any pair of transformations A': <p"-» (p", B': <pm-^ <p" with \\A - A'\\ + \\B- B'\\ < e there corresponds a transformation F': <p"-» <pm with Ker F' D M, and a subspace M' C C", such that {A' + B'{F + F'))M' C M' and 8(M, M')< K(\\A - A'\\ + \\B - B'\\) A dual version of Theorem 15.8.1 also holds. Namely, given a null kernel pair of transformations G: <p"-> £"" and A: $"-+ <£", every -invariant subspace is Lipschitz stable in the above sense. The proof can be obtained by using Theorem 15.8.1 and the fact that a subspace M is I 1 invariant if and only if its orthogonal complement is [A* G*] invariant. We leave it to the reader to state and prove this dual version of Corollary 15.8.2. 15.9 STABLE INVARIANT SUBSPACES FOR REAL TRANSFORMATIONS Let A: 4?"—»■ $" be a transformation. The definition of stable invariant subspaces of A is analogous to that for transformations from <p" to <p". Namely, an /t-invariant subspace M C ft" is called stable if for every e >0 there exists a 8 >0 such that any transformation B: ft"-^ft" with \\B - A\\ < 8 has an invariant subspace .A" with 0(M, N)<e. However, it turns out that, in contrast with the complex case, the classes of stable and of isolated invariant subspaces no longer coincide. More exactly, every stable invariant subspace is isolated, but, in general, not every isolated invariant subspace is stable. To describe the stable invariant subspaces of real transformations, we start with several basic particular cases. Lemma 15.9.1 Let A: ft"'—* ft" be a transformation such that a(A) consists of either exactly one real eigenvalue or exactly one pair of nonreal eigenvalues. Let the geometric multiplicity {multiplicities) be greater than one in either case. Then there js no nontrivial stable A-invariant subspaces. The proof of this lemma is similar to the proof of Lemma 15.2.5.
Stable Invariant Subspaces for Real Transformations 471 Lemma 15.9.2 Assume that n is odd and the transformation A: tjL"-^>$" has exactly one eigenvalue (which is real) and the geometric multiplicity of this eigenvalue is one. Then each A-invariant subspace is stable. Proof. As n is odd, every transformation X: i£"-^ ft" has an invariant subspace of every dimension A: for 1 < A: < n - 1 (this follows from the real Jordan form for X, because X must have a real eigenvalue). Arguing as in the proof of Theorem 15.2.3, one proves that for every e >0 there exists a 8>0 such that, if B is a transformation with ||B-/l||<S and M is a A:-dimensional B-invariant subspace, there exists a A:-dimensional A- invariant subspace Jf with 6(M, Jf) < e. Since A is unicellular, this subspace Jf is unique, and its stability follows. □ Lemma 15.9.3 Let n be even, and A: $"—»ft" have exactly one real eigenvalue. Let its geometric multiplicity be one. Then the even dimensional A-invariant sub- spaces are stable and the odd dimensional A-invariant subspaces are not stable. Proof. If k is even, then the stability of the ^-dimensional ,4-invariant subspace (which is unique) is proved in the same way as Lemma 15.9.2, using the existence of a ^-dimensional invariant subspace for every transformation X: $"-*$". Now let J be a ^-dimensional /1-invariant subspace where k is odd. Without loss of generality we can assume A = J„(0). For every positive e, the transformation A(e) = S(e) + A, where 5(e) has e in the entries (2, 1), (4, 3), ...,(« — 2, n — 3), («, n - 1), and zeros in all other entries, has no real eigenvalues. Hence A(e) has no /c-dimensional invariant subspaces, so 6(M, N) 3:1 for every /l(e)-invariant subspace Jf (Theorem 13.1.2). Therefore, M is not stable. □ Lemma 15.9.4 Assume that A: $"-+ ft" has exactly one pair of nonreal eigenvalues a ± iB, and their geometric multiplicity is one. Then every A-invariant subspace is stable. Proof. Using the real Jordan form of A, we can assume that A = K 0 0 0 / 0 •• K I ■ 0 0-- 0 0-- • 0 • 0 • K ■ 0 0" 0 / K- K = a 01 -B a\
472 Continuity and Stability of Invariant Subspaces (In particular, n is even.) Theorem 12.2.4 shows that the lattice of A- invariant subspaces is a chain; so for every even integer k with 0< fe<«, there exists exactly one ,4-invariant subspaces of dimension k. Also, there exists an e > 0 such that any transformation B with || B - A || < e has no real eigenvalues. [Indeed, for a suitable e all the eigenvalues of B will be in the union of two discs (A G £ 11A - (a ± iB)\ < {812)} that do not intersect the real axis.] Now one can use the proof of Lemma 15.9.2. □ Now we are prepared to handle the general case of a transformation A: $"—> ft". Let A,,. . . , Ar be all the distinct real eigenvalues of A, and let a, + //?,,. . . , as + iBs be all the distinct eigenvalues of A in the open upper half plane (so at are real and 8, are positive). We have ft" = »AlM) + ■ • ■ + mK(A) + mai^(A) + ■■■ + ^as±iPs(A) For every ,4-invariant subspace ^we also have jf = (jfn m^A)) + • • • + (Jf n mK(A)) + (jfnm ai±i,t(A)) + ■■■ + (^n98aj±lft(^)) (see Theorem 12.2.1). In this notation we have the following general result that describes all stable ,4-invariant subspaces. Theorem 15.9.5 Let A be a transformation on ft". The A-invariant subspace Jf is stable if and only if all the following properties hold: (a) Jf C\ 8/lx (A) is an arbitrary even dimensional A-invariant subspace of 8ftA(A) whenever the algebraic multiplicity of Ay is even and the geometric multiplicity of kf is 1; (b) Jf n 5RK (A) is an arbitrary A-invariant subspace of '31^ (A) whenever the algebraic multiplicity of kj is odd and the geometric multiplicity of kj is 1; (c) JfZ) 0tx (A), or Jf nS?A (A) = {0} whenever Ay has geometric multiplicity at least' 2; (d) Jfr\8fta±ifi(A) is an arbitrary A-invariant subspace of &laiip(A) whenever the geometric multiplicity of otj + iBj is 1; (e) Jf D @la.±ip(A) or «V H 3? a ±p (/4) = {0} whenever at + iBt has geometric multiplicity of at least 2. Proof. As in Lemma 15.3.2 one proves that Jf is stable if and only if each intersection Jf D 8ft A (A) is stable as an A\x ^-invariant subspace, and each intersection Jf D 5? +in(A) is stable as an v4L -invariant sub- space. Now apply Lemmas 15.9.1-15.9.4. □ Comparing Theorem 15.9.5 with Theorem 14.6.5, we obtain the following corollary.
Stable Invariant Subspaces for Real Transformations 473 Corollary 15.9.6 For a transformation A: $"—»ft", every stable A-invariant subspace is isolated. Conversely, every isolated A-invariant subspace is stable if and only if A has no real eigenvalues with even algebraic multiplicity and geometric multiplicity 1. We pass now to Lipschitz stable invariant subspaces for real transformations. The definition of Lipschitz stability is the same as for transformations on <(7". Clearly, every Lipschitz stable invariant subspace is stable. Also, for a transformation A: ft"—>ft" every root subspace 3ix(A) corresponding to a real eigenvalue A of A, as well as every root subspace @l a±ip(A) corresponding to a pair a ± ifi of nonreal eigenvalues of A, is a Lipschitz stable ,4-invariant subspace. Moreover, every spectral subspace for A (i.e., a sum of root subspaces) is also a Lipschitz stable /t-invariant subspace. As in the complex case, these are all Lipschitz stable subspaces: Theorem 15.9.7 For a transformation A: ft"-^ ft" and an A-invariant subspace N C ft", the following statements are equivalent: {a) M is Lipschitz stable; (b) M = ®Xi(A) + --- + ®K(A) + ®ai±il3i(A) + --- + mas±il3s(A) for some distinct real eigenvalues A,,. . . , A, of A and some distinct eigenvalues a, + j/3,,. . . , as + i(3s in the open upper half plane (here terms $lK (A) or terms 8%a±ip(A), or even both (in which case M is interpreted as the zero subspace) may be absent); (c) for every e >0 small enough there exists a 8 >0 such that every transformation B: ft" —* ft" for which \\B - A\\ < 8 has a unique invariant subspace Jf for which 0(M, Jf) < e. Proof. As in Lemma 15.3.2, one proves that M is Lipschitz stable if and only if for every real eigenvalue A of A the intersection Mr\<3lk(A) is Lipschitz stable as an /^^-invariant subspace and for every nonreal eigenvalue a + if3 of A M A &ia±,p(A) is Lipschitz stable as an A\# (A)- invariant subspace. Let us prove the equivalence (a)o(b). In view of the above remark, we can assume that A has either exactly one real eigenvalue or exactly one pair of nonreal eigenvalues. By Theorem 15.9.6 we have only to prove that the transformations represented by the matrices AG^ 1 "A 1 0 • 0 A 1 • .0 0 • 0 • 0 1 • A and
474 Continuity and Stability of Invariant Subspaces 2 ~K I2 0 • 0 K I2 ■ -0 0 • o- •• 0 • k. K = •= [ a Tl . —t crJ' 8, t 6 ^ , t ^ 0 have no nontrivial Lipschitz stable invariant subspaces. For Al one shows this as in the proof of Theorem 15.5.1. Consider now A2. By a direct computation one shows that A2 = S[Jn/2(cr - h)®Jnf2(a+ir)]S-1 where n is the size of A2 and 5 = - 1 — i 0 0 0 . 0 0 0 1 -/ 0 0 0 0 0 0 1 — i 1 i 0 0 0 0 0 • 0 • 1 • i 0 • 0 • • 0 • 0 • 0 • 0 • 1 i (For convenience, note that 5~' = i 0 0 i 0 -1 0 0 i 0 0 1 0 0 i 0 • -1 • 0 0 • 1 • • 0 • 0 i ■ 0 • 0 0 0 -1 0 0 L0 0 0 0 1 Moreover, denoting by T the n x n matrix that has 1 in the entries {nil, 1) and («, nil + 1) and zeros elsewhere, we have (for a G J|?) A2(a) K I 0 OKI 0 0 0 al 0 0 0 0 / KJ S~l[V»,2(<r ~ 'T)®Jm,2(" + ir)) + aT]S (15.9.1)
Partial Multiplicities of Close Linear Transformations 475 Now the proof of Theorem 15.5.1 shows that the only candidates for nontrivial Lipschitz invariant subspaces for A2 are S~'(Span{e,,. . . , enl2)) and S~'(Span{e„/2+1,. . . , e„}). But since these subspaces are not real (i.e., cannot be obtained from subspaces in ft" by complexification), A2 has no nontrivial Lipschitz invariant subspaces. The implication (b) => (c) is proved as in the proof of Theorem 15.5.1. To prove the converse implication, observe that, as we have seen in the proof of Theorem 15.5.1, it is sufficient to show that for any /l2-invariant subspace M(C ft") of dimension q (0 < q < n) the number of ^-dimensional invariant subspaces JV of A2(a) such that 8(M,Jf)<Ca2'" (15.9.2) is at least ( ) (Here a is positive and sufficiently close to zero, and C is a positive constant depending on q and n only.) Observe that q, as well as n, must be even. Using formula (15.9.1) and arguing as in the proof of Theorem 15.5.1, we find that for any choice of different complex numbers e,,..., eg/2 with e"'2 = 1, i'• = 1,. . . , q/2, the subspace Jf spanned by the columns of the real matrix V • • - V 1 v v V Vi = (i,ei,...,el ), i = \,...,qll satisfies (15.9.2). □ 15.10 PARTIAL MULTIPLICITIES OF CLOSE LINEAR TRANSFORMATIONS In this chapter we have studied up to now the behaviour of invariant subspaces under perturbations of the given linear transformation. We have found that certain information about the transformation (e.g., its spectral invariant subspaces) remains stable under small changes in the transformation. Here we study the corresponding problem of stability of the partial multiplicities of transformations. Given a transformation A: £"-> <p", denote by &,(A, A),. . . , kp(\, A) the partial multiplicities of A corresponding to its eigenvalue A, arid put kr( A, A) = 0 for r > p (here p is the geometric multiplicity of the eigenvalue A). For a closed contour T in the complex plane that does not intersect the spectrum of A, let kj(r,A)=tkl(\k,A), ; = 1,2,
476 Continuity and Stability of Invariant Subspaces where A,,. . . , Ar are all the distinct eigenvalues of A inside I\ If there are no eigenvalues of A inside T, put formally kj(T, A) = 0 for j'• = 1, 2,. . . . Theorem 15.10.1 Given a transformation A: (p"-* <P" an& a closed contour V with T n (t(A) = 0, there exists an e >0 such that any transformation B: §" —»<p" with \\B — A\\<e has no eigenvalues on V and satisfies the inequalities JZkj(r,B)^2ki(r,A); s = 2,3,... (15.10.1) and f«e equality f,kXr,B) = flkj(r,A) (15.10.2) Proof Let n(r, /) be the number of zeros (counting multiplicities) of a scalar polynomial/inside I\ (It is assumed that/does not have zeros on T.) For 5 = 1, 2,...,«, we have the relations s I,kn + l_i(T;A) = n(T;fs) (15.10.3) >=■ where /(A) is the greatest common divisor of all determinants of sxj submatrices in A/— A. (Here and in the sequel all transformations on <p" are regarded as n x n matrices in a fixed basis in <p".) Indeed, (15.10.3) follows from Theorem A.4.3 (in the appendix). Consider the Smith form of A/- A (see the appendix): kI-A = F(A) diag[fll(A), a2(A),. . . , fl„(A)]G(A) where F( A) and G( A) are n x n matrix polynomials with constant nonzero determinant, and fl,(A), . . . , a„(A) are scalar polynomials such that fl,(A) is divisible by fl,_,(A) for i = 2,. . . , n. By the Binet-Cauchy formula (Theorem A.2.1) /j(A) coincides with the greatest common divisor of all determinants of jxj submatrices in diag[at(A),. . . , a„(A)], and this is equal to the product a,(A)-• • a5(A) in view of the properties of a,(A),. . . ,an(\). So for s= 1,2,. . . , n 2 *,-(r; A) = 2 *„+,-,(r; a) = «(r; /,) = «(r; fll(A)- • • «,(a)) j-n + 1—5 i= I (15.10.4) Now let e >0 be so small that if \\B - A\\ < e, the determinant of the top sxs submatrix in F(A)"'(A/-B)G(A)"1 has exactly «(f; a,(A) • • • fls(A))
Partial Multiplicities of Close Linear Transformations 477 zeros in I\ [Such an e exists by Rouche's theorem in the theory of functions of a complex variable; e.g., see Marsden (1973).] Denote by hs(\) the greatest common divisor of determinants of all s x s submatrices of F(A)_1 (A/ - B)G(A)-1. Then hs(A) coincides (again by the Binet-Cauchy formula) with the greatest common divisor of determinants of all s x s submatrices in XI-B. When ||B-/t||<e we obviously have n(r; a,(A)- • • a,(A)) > «(r; hs). Combining this inequality with (15.10.4) and using (15.10.3) with A replaced by B, we find that, for s = 1, . . . , n 2 *>(rM)s= 2 *>(r;B) ; = n + 1 — A* ) = n + 1 — .y As the inequalities (15.10.1) with s>n are trivial, (15.10.1) is proved. Further, EJL, £;-(T; /I) coincides with the number of zeros of det(A/- A) inside T, counting multiplicities. This number does not change after sufficiently small perturbations of A, again by the Rouche theorem. □ The following question arises in connection with Theorem 15.10.1: Are the restrictions (15.10.1) and (15.10.2) imposed on the transformation B sufficient for existence of such a B arbitrarily close to Al Before we answer this question (it turns out that the answer is yes), let us introduce a convenient notation for the partial multiplicities of a transformation. Given a transformation v4: <p"-+ <p\ let {s; r,, r2, . . . , rs; mu, . . . , mlr|; m21,. . . , mlr\,. . . ; msl, . . . , msr) (15.10.5) be an ordered sequence, where s is the number of distinct eigenvalues of A, and the ith eigenvalue has geometric multiplicity r, and partial multiplicities m,,,. .. , mi . So E,v=] E^, mH = n. The order in (15.10.5) is determined by the following properties: (a) r, > r2- ■ ■ > rs; (b) if r, = ri+l, then ', r, Sm0a2»i, + 1/ (15.10.6) (c) if r, = r( + 1 and equality holds in (15.10.6), then k k Im,y>Sm,tl/, k = l,2,...,ri-l We say that (15.10.5) is the Jordan structure sequence of A. Denote by $ the finite set of all ordered sequences of positive integers (15.10.5) such that properties (a)-(c) hold and E*=1 EJ'_, m,v = n (here n is fixed). Given the sequence flE<l> as in (15.10.5), for every nonempty subset AC {1,... ,s} define
478 Continuity and Stability of Invariant Subspaces *,(fl;A)=2mw-, / = 1,2,... (mpj is interpreted as zero for j> rp). Now we have the following. Theorem 15.10.2 Let A: <p"—» <f"" be a transformation with s distinct eigenvalues and Jordan structure sequence SI. Then, given a sequence SI'= {s'; r[,. . . ,r'5.;m'n,...,m'lr[;.. . ;«;.,,.. . ,«;.,..}£* there exists a sequence of transformations on <p", say, {BmYm = l that converges to A and has a common Jordan structure sequence SI' if and only if there is a partition {1,2, ... ,s'} into s disjoint nonempty sets A,,.. . , As such that the following inqualities hold: S M": <W) s 2 *,-(«'; A„); t=l,2,...; p = l,...,s ;=l ;=1 (15.10.7) 2*,(il;{p}) = 2*y(n';A|,); p = l,...,s (15.10.8) Informally, if A,,. . . , A^ are the distinct eigenvalues of A ordered as in SI, and if Alm,. . . , Asm are the distinct eigenvalues of Bm ordered as in SI', then the eigenvalues {A-m}yEJ cluster around \p, for /? = 1, . . . , s. Proof of Theorem 15.10.2. The necessity of conditions (15.10.7) and (15.10.8) follows from Theorem 15.10.1. To prove sufficiency, we can restrict our attention to the case 5 = 1. Let A0 be the eigenvalue of A, and write mi = T,p=l m'pj (recall that r[ = max{rj,. . . , r\) and that m'pj is zero by definition if j > r'p). We then have the inequalities Sw,7<Swy, t = l,2,... >=i j=\ and the equality ■x ix. Now we construct a sequence {B9}*=1 converging to A such that A0 is the only eigenvalue of Bq and, for each q, the Jordan structure sequence of Bq is SI = {1; /•[; m,,..., wfl). Using induction on the number E^i (Sj=1 m; -
Exercises 479 E'_, m,y), it is sufficient to consider only the case when, for some indices l<q,we have m, = mu + 1, mq = mlq - 1, whereas mi = mXj lox\¥^ I, j' ¥> q. Write as a matrix in some Jordan basis for A. Let B =A+-Q where the matrix Q has all zero entries except for the entry in position (mu + ■■ ■ + mu, mn H + mlq) that is equal to 1. One verifies without difficulty that the partial multiplicities of Bq are ml = mll + q,mq = mlq-l, and rhj = mXi for j ¥^ I, q. Given a sequence {Bq}q_] converging to A such that a(Bq) = {A0} and the Jordan structure sequence of Bq is SI (for each q). For a fixed q, let ■*11 > • • • i ■* l.m, > -^21 ' • ■ ' > ■*2.m2' ' ' " ' > *ri.l > ' ' ' ' *r\,mri \^~>■ IV.y) be a Jordan basis for Bq; in other words, *yi, . . . , xjA is a Jordan chain for Bq for ; = 1,. . . , rj. Let /x,,. . . , [is, be distinct complex numbers; define the transformation B (filt. . . , fis.) by the requirement that in the basis (15.10.9) it has the matrix form Bq + diag[tiJm.i,fi2Im.i,... ,fis.I tnri* * * * * JV'm;.2> • • • , Ml^miri. A^m^, • • • ' ^7m;.,j] (15.10.10) where /, is the i x / unit matrix, and /*,■/„,. does not appear in (15.10.10) if k> r\. Clearly, Bq{\Lx,. . . , fis.) has the Jordan structure sequence SI', and by suitable choice of ja, values one can ensure that ||^(Ml,...,Ms.)-Sj|<^ With this choice of ja, values (which depend on q), put Bm = ^m(Mi» • - - > Ms■) to satisfy the requirements of Theorem 15.10.2. □ 15.11 EXERCISES 15.1 When are all invariant subspaces of the following transformations A: <p"—»<p" (written as matrices in the standard orthonormal basis) stable?
480 Continuity and Stability of Invariant Subspaces (a) A is an upper triangular Toeplitz matrix. (b) A is a circulant matrix. (c) A is a companion matrix. 15.2 Describe all stable invariant subspaces for the classes (a), (b), and (c) in Exercise 15.1. 15.3 Describe all stable invariant subspaces of a block circulant matrix with blocks of size 2x2. 15.4 Show that any transformation A: <p"—»<p" with rank A < n - 2 has a nonstable invariant subspace and identify it. 15.5 Prove that for every transformation A there exists a transformation B such that every invariant subspace of A + eB is stable. Show that one can always ensure, in addition, that rank B = n - 1. 15.6 Give an example of a transformation A: <f""—* <f"' such that there is no transformation B: £"—»£" with rank fi<n-2 such that, for some e £ <p, all invariant subspaces of A + eB are stable. 15.7 Given transformations A: <p"—»■ <P" and B: <p"—»<p", an ^-invariant subspace i£ will be called B stable if for every e0 > 0 there exists S0>0 such that each transformation A + SB, with |S|<S0 has an invariant subspace M such that 0(i?, M)< e0. Clearly, every stable A -invariant subspace is B stable for every B. Give an example of a B-stable ,4-invariant subspace that is not stable. 15.8 Show that if A and B commute, then there is a complete chain of B-stable ,4-invariant subspaces. 15.9 Give an example of transformations A and B with the property that an /4-invariant subspace is stable if and only if it is B stable. 15.10 Show that an ,4-invariant subspace is stable if and only if it is B stable for every B. 15.11 Show that the set of all stable invariant subspace of a transformation A: <p" -» <p" is a lattice. When is this lattice trivial, that is, when does it consist of {0} and <p" only? When does this lattice coincide with Inv(/4)? 15.12 Show that every stable invariant subspace is hyperinvariant. Is the converse true? 15.13 Prove that the transformation A: <p" —» §" has the following property if and only if A is nonderogatory: for every orthonormal basis jt,,. . . , xn in which A has an upper triangular form and any e >0 there exists a S>0 such that any transformation B: <p"—»<p" with ||B-/1||<S has an upper triangular form in some orthonormal basis yl, . . . , yn that satisfies ib,-*,l|<e i=i
Exercises 481 15.14 Let A: if?" —*■ i(?^ and B: $m—> %" be a full-range pair of real transformations. Show that every [A B]-invariant subspace is stable (in the class of real transformations and real subspaces). [Hint: Use the spectral assignment theorem for real transformations (Exercise 12.13).] 15.15 Let A be an upper triangular Toeplitz matrix. Find all possible partial multiplicities for upper triangular Toeplitz matrices that are arbitrarily close to A. 15.16 Let A and B be circulant matrices. Compute dist(Inv(/t), Inv(B)).
Chapter Sixteen Perturbations of Lattices of Invariant Subspaces with Restrictions on the Jordan Structure In this chapter we study the behaviour of the lattice Inv(A') of all invariant subspaces of a matrix X, when X is perturbed within the class of matrices with fixed Jordan structure (i.e., with isomorphic lattices of invariant subspaces). A larger class of matrices with fixed Jordan structure corresponding to the eigenvalues of geometric multiplicity greater than 1 is also studied. For transformations A and B on <p", our main concern is the relationship of the distance between the lattices of invariant subspaces for A and B to ||v4 - B||. 16.1 PRESERVATION OF JORDAN STRUCTURE AND ISOMORPHISM OF LATTICES We start with a definition. Transformations A, B: <p" -+ <p" are said to have the same Jordan structure if they have the same number of distinct eigenvalues [so that we may write a(A) = {A,,. . . , \s) and a(B) — {fi{,. . . , ns}], and the eigenvalues can be ordered in such a way that the partial multiplicities of A; as an eigenvalue of A coincide with the partial multiplicities of ja, as an eigenvalue of B, i = 1,. . . , s. Given a transformation A, denote by J{A) the set of all transformations with the same Jordan structure as A. This structure is determined by the sequence of positive integers (which was also a useful tool in Section 15.10): 482
Preservation of Jordan Structure and Isomorphism of Lattices 483 {s;r,,r2,.. . ,rs;mu,. . . , mlri; m21,. . . , m^,. . . ;msl,. . . ,msr) (16.1.1) where s is the number of distinct eigenvalues of A, and the /th eigenvalue has geometric multiplicity r, and partial multiplicities mn,. . . , mir. Thus E,r=1 EyL, nijj = n. The parameters of this sequence are ordered in such a way that rl>r2-"'-rj. ant' if r, = ri+i> then Xm^E«i+M (16.1.2) and, furthermore, if r, = r, + , and equality holds in inequality (16.1.2), the integers m/; and mi + uj are ordered in such a way that Sw,^E»i/ti;, £ = 1,2, . . . , r, - 1 Clearly, the property of having the same Jordan structure induces an equivalence relation on the set of all transformations on £". The number of equivalence classes under the relation is finite and is equal to the number of all different sequences of type (16.1.1) with the order properties described. It is shown in the first theorem that transformations have the same Jordan structure if and only if they have isomorphic (or linearly isomorphic) lattices of invariant subspaces. Let us define the notion of isomorphism of lattices. First, let Sfl and Sf2 be two lattices of subspaces in <p". A map ip: if',—» if2 is called a lattice homomorphism if y({0}) = {0}, <p($n) = <f"\ and <p(M + JV) = tp(M) + tp(N), ip(Jt n jV) = tp(Jt) n ip(N) for every two subspaces J,jV£yr Then a lattice homomorphism tp is called a lattice isomorphism if <p is one-to-one and onto; in this case the lattices ifl and if2 are sa^ to be isomorphic. An example of a lattice isomorphism is provided by the following proposition. Proposition 16.1.1 IfS: <pH—» <p" 's an invertible transformation and ifis a lattice of subspaces in <p", then Sif = {SM \ M G if} is also a lattice of subspaces and the correspondence cp(M) = SM is a lattice isomorphism of if onto Sif. Proof. The definition of <p ensures that ip is onto, and invertibility of S ensures that tp is one-to-one. Furthermore, S(M n jv) = sm n sjt (16.1.3) for any subspaces M and Jf in <f"\ Indeed, the inclusion C in equation
484 Perturbations of Lattices of Invariant Subspaces (16.1.3) is evident. To prove the opposite inclusion, take x £ SM n SJf, so x = Sm = Sn for some mE.M, nE.Jf. As S is invertible, actually m = n and * £ S{M D JV). Finally, the equality S{M + Jf) = SM + SJf is evident. D The lattice isomorphisms described in Proposition 16.1.1 are called linear. So the lattices Sfl and if2 are called linearly isomorphic if there exists a transformation S: <£""—» <p" (necessarily invertible) such that if E 5^, if and only if S£<=y2. It is easy to provide examples of lattices of subspaces that are isomorphic but not linearly isomorphic. For instance, two chains of subspaces {0} = M0CMlCM2C---CMk_lCJlk = £" and {0} = ^ c sex c &2 c • • • c j?,_, cse, = C are lattice isomorphic if and only if k = I (it is assumed that Mi ¥^ M . for j ^ / and i^ # i^ for / 5^ /). However, there exists an invertible matrix S such that SMj = iEi for / = 1,. . . , k if and only if dim M. = dim ^ for each *'. The following theorem shows, in particular, that for the lattices of all invariant subspaces isomorphism and linear isomorphism are the same. Theorem 16.1.2 Let a transformation A: §"—>§" be given. The following statements are equivalent for a transformation B: <pH—»<p": (a) B has the same Jordan structure as A; (b) the lattices Inv(B) and lnv(A) are isomorphic; (c) the lattices Inv(B) and Inv(,4) are linearly isomorphic. Proof. Assume B G J(A). Let A,, . . . , A and ja,,. . . , fip be all the distinct eigenvalues of A and B, respectively, and let them be numbered so that the partial multiplicities of A and Ay coincide with the partial multiplicities of B at yu.; for / = 1,. . . , p. For a fixed /, let ■*11 ' - • • ' Xlk,' X2\ ) • • • > X2k2> • • • ' Xq\ > • ■ • » Xqkq be a Jordan basis in dik (A), and let j'n. • • •. yi.*,; ^21. • • •. yik^ • • ■; y,i. • • •, y,,t, be a Jordan basis in S?M(B) (so kl, k2,. . . , kq are the partial multiplicities of A at A; and of B at nt). Given an ,4-invariant subspace M C S?A(A) spanned by the vectors
Preservation of Jordan Structure and Isomorphism of Lattices 485 1 kr q *, [here a'^' are complex numbers], put *l>i(M) = Span {/a,,. . . , /x,} where "rlSa^, f=l,...,/ Clearly, tyiM) is a B-invariant subspace that belongs to 8? (B). Now for any ,4-invariant subspace M put *(M) = ^(M n mXi(A)) + ■■■ + 4,p{m n<%Ap(,4)) It is easily seen that ^ is a desired isomorphism between Inv(/1) and Inv(B); moreover, t(i(Jl) = 5J<, where 5 is the invertible transformation defined by Sxrs=y„; s = l,...,kr; r=l,...,q. Conversely, suppose that tp: lnv(A)—> Inv(B) is an isomorphism of lattices. Let A,, . . . , \p be all the distinct eigenvalues of A, and let Jft = «K5?A{A)), i = 1, . . . , p. Then <p" is a direct sum of the B-invariant sub- spaces jfl,. . . , JV . We claim that o-(B| v) n er(B| ^) = 0 for i¥=j. Indeed, assume the contrary, that is, fi0 G a(B| v) D ct(B|v) for some .A^- and Jf} with /t^;'. Let Ji = Span{_y, + y2}, where yt (resp. _y,) is some eigenvector of B\x (resp. of B\x) corresponding to the eigenvalue fi^. Then Jf is B invariant. Let J< be the A -invariant subspace such that if/(Jl) = Jf. Since M must contain a one-dimensional /I-invariant subspace, and since ip is a lattice isomorphism, the subspace J< is one-dimensional. Therefore, JC9tA (/I) for some k. This implies ^V= tf>(M) C i/»(3?A (/I)) = Jfk, a contradiction with the choice of Jf. Further, the spectrum of each restriction B\y is a singleton. To verify this, assume the contrary. Then for some i the subspace Jft is a sum of at least two root subspaces for B: jf1 = mli(B) + --- + m^(B), k>2 Letting At f be the yt-invariant subspace such that ifi(At,) = 91 ^ (B), /' = 1,. . . , k, we have If xl and x2 are eigenvectors of A\Mf and of v4|^ , respectively, then
486 Perturbations of Lattices of Invariant Subspaces Span{*, + *2} is ,4-invariant and does not belong to any subspace M ■. Hence ^(Span{*, + jc2}) is B invariant, belongs to Mt but does not belong to any subspace 01 M(B). This is impossible because ^(Span{*, + x2)) is one-dimensional. We have proved, therefore, that JV, = 8^(B), i=l,. . . , p, where Hi,. . . , p, are all the distinct eigenvalues of B. For a fixed i, the number of partial multiplicities of A corresponding to A. that are greater than or equal to a fixed integer q, coincides with the maximal number of summands in a direct sum if, + • • • + S£s, where, for ;' = 1, . . . , s, SB)C Wi x (A) are irreducible subspaces with dimension not less than q. As \p induces an isomorphism between Inv(/l|^ (A)) and ln\(B\m (B)), it follows that the number of partial multiplicities of A corresponding to A, that are greater than or equal to q coincides with the number of partial multiplicities of B corresponding to ja, that are not less than q. Hence A and B have the same Jordan structure. □ Corollary 16.1.3 Assume that A and B are transformations on <p" with one and only one eigenvalue A0: a(A) = a(B) = {A,,}. Then the lattices ln\(A) and Inv(B) are isomorphic if and only if A and B are similar. 16.2 PROPERTIES OF LINEAR ISOMORPHISMS OF LATTICES: THE CASE OF SIMILAR TRANSFORMATIONS In view of Theorem 16.1.2, for transformations A and B with the same Jordan structure, the set y(A, B) of all invertible transformations S such that i?£lnv(/l) if and only if SSB elnv(B), is not empty. Denote il(A, B) = inf{||/- S|| \SSSr\A, B)} Note that the set 5^(/l, B) contains transformations arbitrarily close to zero [Indeed, take a fixed SG^(/4,B) and consider aS with a-^0, a^O.] Hence Cl(A, B) < 1 for any A and B with the same Jordan structure. This observation will be used frequently in the sequel. The following example shows that the equality Cl(A, B) = 1 is possible. example 16.2.1. Let Hi or- Ho oJ Then
and Properties of Linear Isomorphisms of Lattices 487 <KA,B)= inf If1"8 -Ml «,i..ceil!L -c 1 Jll However, it is easily seen that the norm of is at least 1, for any choice of a, b, and c, and can be arbitrarily close to 1. Hence Cl(A,B)=l. D The number Q.(A, B) is closely related to the distance between Inv(/1) and Inv(B), as we shall see in the next theorem. Recall that dist(Inv(/l),Inv(B)) = max{ sup inf 6(A, B) , sup inf 6(A,B)} ^elnv(fl) AtElnv(B) Theorem 16.2.1 If A and B have the same Jordan structure and £1{A, B) < 1, then dist(Inv(>4), lm(B))<2il(A, B)(l - Q(A, B))~l Proof. For positive e < 1 - fl(v4, B) let S e S?(v4, B) be such that \\I-S\\^q~,a(A,B) + e For every nonzero x £ <p", denoting y = S'lx, we have 1MI .\\y-Sy\\ + \\Sy\\^q\\y\\ 11*11 " 11*11 ~ 11*11 Hence UzJU_i_ 11*11 1-9 so ||5-|| < (1 - 9)- and ||/- S'H < |{S '|| ||/- S|| <<?(! - <?)-. Now for any subspace M C <p" the transformation 5/*^ 5 ' is a projector on 5J< (we denote by PM the orthogonal projector on M). So
488 Pertnrbations of Lattices of Invariant Subspaces 6(M, SM)*\\PM- SPMS~l\\ < \\PM - SPj\ + \\SPM - SPMS l\\ ^l|/-5||-||^|| + ||5||-||^||.||/-5-'|| ^||/-5|| + ||/-5-1|| + ||/-5||-||/-5-1||<29(l-9)-1 Consequently dist(Inv(/t), Inv(B)) <2^(1 - q)~l and since e <0 was arbitrary, Theorem 16.2.1 follows. □ Now consider the case when A and B are similar. Then, evidently, A and B have the same Jordan structure. Clearly, in this case $f(A, B) contains all the similarity transformations between A and B: Sf(A,B)^>{S: $"-^$"\S is invertible and A = S~lBS} We remark that this inclusion can be proper. Indeed, in Example 16.2.1 above, the similarity transformations between A and B have the form which is a proper subset of {f(A, B). Theorem 16.2.2 For every transformation A: <p" —*■ <p" we have il(B,A) . dist(Inv(B),Inv(/l)) ,„„,, ^py/Mr*3 and gnp—iig-^ii <o° (i62i) where the suprema are taken over all transformations B that are similar to A. In other words, the first inequality in (16.2.1) means that there exists a positive constant ^>0 (depending on A) such that for every B that is similar to A we have ||/-r||<K||B-,4|| for some invertible transformation T satisfying A = TBT~\ In the next section the result of Theorem 16.2.2 is generalized to include all the transformations B with the same Jordan structure as A. Proof of Theorem 16.2.2. As 0(M,J()-^\ for any subspaces M,J{ in <p" [this follows, for instance, from formula (13.1.3)], we have dist(Inv(A-),Inv(y))<l (16.2.2)
Properties of Linear Isomorphisms of Lattices 489 for any transformations X, Y: (f"1—* <p". So, by Theorem 16.2.1, the second inequality in (16.2.1) follows from the first one. To prove the first inequality in (16.2.1), consider the linear space L($") of all transformations A: $"->$", with the scalar product (X, Y) = tr(XY*) for X,YEL($") (where Y* denotes the adjoint of Y defined by the standard scalar product on <£"") and the corresponding norm ||Jf||,= ViXTX) for all A"6L(f). For every B6L(f) consider the linear transformation WB(X) = AX - XB, X £ L(<p) so that, in particular, WA(X) = AX- XA. If B is similar to A, then dim Ker WB = dim Ker WA (indeed, Ker WB = {XS \ X e Ker WA}, where S is a fixed invertible transformation such that B = S~US). Let PA be a fixed projector on Ker W^. [Thus PA: L(f)-» L(<p").] By Theorem 13.5.1 there exists a positive constant K^ such that, if B is similar to A, then \\pa ~ pb\\, - kiWwa ~ wb\\, for some projector PB on Ker WB. Here \\PA-PB\\,= max ||(/*>, - -P»)Jf|L is the norm induced by || • ||/( and similarly for \\WA - WB\\r Observe that the norm || • ||, is multiplicative: ||XY\\, < 11*11, ■ || Y\\, for all transformations X, Y: <p" —»■ (p". Indeed, if || • || is the norm induced on transformations by the standard norm on <p", then it is easily verified that ||y||2/-yy* is positive semidefinite and hence that ||y||2A'A'*- XYY*X* is positive semidefinite. Thus ||*y||,2 = tr(A-yy*A-*) < tr(|| y||2***) = || y||2 ■ ||*||2 (16.2.3) Further, denoting by A, > ■ • • > A„ (s:0) the eigenvalues of the positive semidefinite transformation Y*Y, we have every /£ (p" with ||/|| = 1: || y/||2 = (Yf, Yf) = (Y* Yf, f) < A, < A, + ■ • • + A„ = tr(y*y) = tr(yy*) = ||y||2 so ||y||2^||y||2. Substitution in (16.2.3) yields the desired inequality ||*y||^||*||,||y||,. Note that (WA - WB)(X) = X(B - A); so the multiplicative property of II • II, implies WWa-WsI^WB-AW, Now \\PA - PB||, </C,||fl - A\\t for every transformation B similar to A. The identity transformation / belongs to Ker WA; so PA(I) = I and
490 Perturbations of Lattices of Invariant Subspaces ||/- PB(I)\\, *Kt\\B- Al\\l\\, = VnK.WB - A\\, If, in addition, ||B- A\\t< (VnKiy\ then PB(I) is an invertible transformation. In this case PB(I) £ 5^(B, A); hence il(B, A)<K2\\B-A\\ (16.2.4) for every fl£L((p") that is similar to A and such that ||fl-.4||,< {VnK^)~x, where the constant K2 >0 depends on A only. Taking into account the fact that Q(B, A)< 1 for all B similar to A, we find that (16.2.1) follows from (16.2.4). □ Results analogous to Theorem 16.2.2 hold also for other classes of subspaces connected with a linear transformation. For example, the lattice of all invariant subspaces in Theorem 16.2.2 can be replaced by any one of the following sets: coinvariant subspaces, semiinvariant subspaces, hyper- invariant subspaces, reducing invariant subspaces, and root subspaces. The proof remains the same in all these cases. Theorem 16.2.2 fails in general if we drop the requirement that B is similar to A. The next example illustrates this fact. example 16.2.2. Let A-[l l\^f: B(S)-[l 0S].Sef Let us compute dist(Inv(/l), Inv(fl(S)) for S ^0. We have for a complex number a a(span{ei},SPan{[;]}) = |[J J] - (N> + !>-[ f ?]| = (|a| + l)(|«|2 + ir' d({0}, Span{[ "]}) = fl((p2, Span{["]}) = 1 So f 0, if M = {0}, M = Span{e,}, M = <p2 inf 6(M,£)=\ r , [ min(l, (|a| + l)(|a|2 + l)"1), if M = Span[ " J Jfelnv B(«) Hence
Properties of Linear Isomorphisms of Lattices 491 sup inf d(M,Z£) = \ .«elnv(/*)^elnvB(S) As any subspace in <p is A invariant, we obviously have inf 6(%,M) = 0 for every ^GlnvB(S). Thus dist(Inv(,4), Inv(fl(5))) = 1. In the limit as 5—»0, we see that the conclusion of Theorem 16.2.2 fails for this particular A if we drop the condition that B be similar to A. □ We conclude this section with a simple example in which il(B, A) and dist(Inv(B), Inv(j4)) can be calculated explicitly. example 16.2.3. Let TO 01 „ [0 x"l Ho J- Ho iFe*> Then ^•»)"{[o 1] and il(A,B)2 = mm{ inf max {|(1 - a)u - xdv\2 + |(1 - d)v\2} |«|2 + IH2=i inf max {|(1 - xd)u - av\2 + \-du + v\2}} |«|2 + |«|2=l Taking a = 1, it follows that il(A,B)2< inf {\xd\2 + |1 - d\2} On the other hand, taking u = 0, we have il(A,B)2>min{ inf max {\xdv\2 + |(1 - d)v\2}} inf max {|au|2 + |u|2} = min{inf {|xd|2 + |1 - d\2}, 1} So Jl(,4,fl)2 = inf {\xd\2 + \1 - d\2} de<f a,d^0 u [- ;] ..^o
492 Perturbations of Lattices of Invariant Subspaces An elementary calculation (using the stationary points of \xd\2 + |l - d\2 considered as a function of two real variables %e d and 3m d) yields il(A,B) = \x\(\x\2 + iyu2 To calculate the distance between lnv(A) and Inv(fl), note that the unique different invariant subspaces of A and B (if jc^O) are Span and Span , respectively, with corresponding orthogonal projectors ,,-[l J] and ,,-dxl-..,-[«' ;] Observe that ii/w2u = W(M2 + ir,/2 and, letting P3 = I - P,, we find that \\Pt-p3\\ = u \\pi-p2\\-(\x\2 + iyU2 These inequalities, together with the fact that 6(M, Jf) = l if dim M ¥^ dim M (see Theorem 13.1.2), allow us to verify that dis(Inv(,4), Inv(fl)) = \x\(\x\2 + 1)~"2 It is curious that il(A, B) = dist(Inv(/l), Inv(fl)) in this example. □ 16.3 DISTANCE BETWEEN INVARIANT SUBSPACES FOR TRANSFORMATIONS WITH THE SAME JORDAN STRUCTURE We state the main result of this chapter. Theorem 16.3.1 Given a transformation A on <£"", we have il(A,B) SUPP^| and <oo (16.3.1) dist(InvM), Inv(B)) ,„ „ „, SUP l|fl-^|l <C° (1632) where the suprema are taken over the set J(A) of all transformations B: (p"—* <p" which have the same Jordan structure as A.
Transformations with the Same Jordan Structure 493 Before we proceed with the proof of Theorem 16.4.1 (which is quite long), let us mention the following result on Lipschitz continuity of dist(Inv(j4), Inv(B)), whose proof is facilitated by the use of Theorem 16.3.1. Theorem 16.3.2 Let J be a class of all linear transformations having the same Jordan structure. Then the real function defined on J by <p(A, B) = dist(Inv(.4), Inv(fl)) for all A, B G J is Lipschitz continuous at every pair Aa, fl0 G J, that is \<p(A, B) - <p(A0, B0)\ ^ K(\\A - A0\\ + \\B - BQ\\) for every A, BE. J, where the constant K>0 depends on A0 and B0 only. Proof. We need the following observation (proved in Section 15.6): dist(Inv(,4), Inv(B))<dist(Inv(^), Inv(C))+ dist(Inv(C), Inv(fl)) (16.3.3) for any transformations A, B, C: <p"-» <p". Using (16.3.3) and (16.3.2), we obtain for a fixed A0, B0 G J: \<p(A, B) - <p(A{i, B0)\ < \<p(A, B) - <p(A0, B)\ + \<p(A0, B) - <p(A0, B0)\ - <p(A, A0) + 9(B, B0)<zK(\\A - A0\\ + \\B- B0\\) D Proof of Theorem 16.3.1. Since Theorem 16.2.1, together with (16.3.1), implies (16.3.2), we have only to prove (16.3.1). The main idea of the proof is to reduce it to Theorem 16.2.2. For the reader's convenience the proof is divided into three parts. (a) Let A,,. . . , Ap be all the distinct eigenvalues of A, and let T; be a circle around A,, i = 1,. . . , p chosen so small that T, PI r; = 0 for i ¥" j and A. is the unique eigenvalue of A inside T.. For every T, and every transformation B: <p"—»(p" that has no eigenvalues on T,, define i kl(ri,B) = 2kj(nl,B), / = 1,2 where /i,,. . . , /i are all the eigenvalues of B inside T, and
494 Perturbations of Lattices of Invariant Subspaces kx(fim, B) s: k2(ij.m, B) s: • • • are the partial multiplicities of B at /j.m (we put kr(nm, B) = 0 for r greater than the geometric multiplicity of \xm as an eigenvalue of B). By Theorem 13.5.1 there exists an e, >0 such that any transformation B with \\B - A\\ < e, has all its eigenvalues in the union of the interiors of r,,...,!^; and, moreover, the sum of algebraic multiplicities of the eigenvalues of B inside a fixed circle T, is equal to the algebraic multiplicity of the eigenvalue A,of A, for i = 1,. . . , p; further ■x x SW,B)-^Mr^)- ; = 1,2,...; i = 1 p j=s j=s (16.3.4) provided ||fl- j4||<e,. (b) Assume now that \\B - A\\ < e, and B £ J(A). As the numbers of different eigenvalues of B and of A coincide, there is exactly one eigenvalue of B, denoted (tjt inside each circle T,. We claim that for every i = 1,. . . , p the eigenvalue A, of A and the eigenvalue /a, of B have the same partial multiplicities. Indeed, assuming the contrary, it follows from (16.3.4) that i*,(r(|,,B)<i k,(rio,A) (16.3.5) for some «0 (1 < i„ < p) and some sQ (note that the equality 2 *,cX'B) = i *>(r„ /i) /=i >=i holds for j = 1,. . . , p). For notational simplicity assume that j0 = 1, and that A,, A2,. . . , A are exactly those eigenvalues of A whose algebraic multiplicities are equal to the algebraic multiplicity of A,. As B £ J(A), there is a permutation tr of {1,2, . . . , p0) such that ki(ri,A) = kf(rwU),B), i=l,...,p0; j = l,2 Consequently, Po * Pa * 2 2*,(r,,B) = 2 2 k,(T„A) However, (16.3.4) and (16.3.5) imply 2 2 *,-(r„ b)<2 2 k^A) which is a contradiction.
Transformations with the Same Jordan Structnre 495 (c) Observe that a transformation F: (p"-* (p" with \\F- A\\ < e,/2 has no eigenvalue on T, U T2 U • • - U Tp. So the number M= max max ||(A/- F) 'I \\F-A\\*t,l2 is well defined. For the transformation B E J(A) with ||fi—j4||< e,/2 we have [using (13.1.4)] 6(®^A), %,(B)) -It^ L (A/-^)"'rfA--^ f (A/-B)-'dAl II27TZ Jr.- 27TZ Jr, v ' || - 2^ Ir, HCA/- ^)"' " (A/- S) 'II MA| ^ /,. IK A/- /I)"'!! • ||/l - B|| • ||(A/ - B)-'|| ■ |dA| 2i MU-BI M2A 2tt where A, is the length of T,. Let 5' = ' ~ 2^r7 /r,l( A/ ~~ A)"' ~ (A/ " B)"] dA ' 1 = 1, . . . , p Then ||/-5,|| < (M2A,/2tt)||j4 - B|| and (provided, in addition, (M2A;/2tt)|M - fi|| < 1) S,(%(A)) = m^B), i = 1, . . . ,p. Put e, irM'A,]-1 lTM2Ap and for fixed i(l</<p) let 5. be the transformation constructed above for the transformation BE J(A) with ||B - A\\ < e2. Define the transformation Bi:aAj(^)-aAjM)by Bi = 5; fi|^(B)5- where 5, = S^m . Obviously, /a, is the only eigenvalue of fl,. Further, for the transformation At = A\m (A) we have (here xE 9?A(/1)):
496 Perturbations of Lattices of Invariant Subspaces Ik^-B^IMI/U-sr'flVH < \\Ax - S~lBx\\ + \\S;lB(I-S,)x\\ *\\Ax- Bx\\ + ||(/- S;l)Bx\\ + \\S7lB(I- S,)x\\ (16.3.6) Now (16.3.7) ||fl||<IM|| + IM-fl||<|M|| + i and WS-'W^il-q,)1, \\I ~ S;l\\ < q,(l - fl,)"' where 9, = ||/-5|.|| (cf. the proof of Theorem 16.2.1). Since (1 - q,) ' <2, (16.3.6) gives \\A,-Bi\\<K\\A-B\ (16.3.8) where Kt = \ + (AM%lw)(\\A\\ + 1) + 4(||,4|| + l)(M2A,/"n-)- Now we have we det(A/ - A,) = (A - A,)*', det(A/ - B,) = (A - j*,.)*', tri, = A;,A, , tr B,■ = &,A, (16.3.9) On the other hand, for any orthonormal basis fn . . . , fk in 3?A (/I) the inequality (16.3.8) gives M,-trB,| = St/i,/,,/;)"^/^) /-I ^ 2 IM,),, /,) - (B,fr f,)\ * kt\\A, - B,\\ < *,K,||,4 - B|| Taking into account (16.3.9), we obtain k-Ajrsfcfjy^-Bll (16.3.10) Now define the transformation B': <p"—»(p" by fi'jt = (fi-/tt,./ + Ai/)*, *e%.(fl) Then B' is clearly similar to A. As every invariant subspace of a transformation is the direct sum of its intersections with all the root subspaces of this transformation, it follows that Inv(B) = Inv(B'). Moreover, inequality (16.3.10) shows that for all x, £ %(B), ||(B' - A)x,\\ ^ ||(B - A)xt\\ + ||(m, - A.KH < (1 + k*K,)\\A - B|| • \\x,\\ (16.3.11)
Transformations with the Same Derogatory Jordan Structure 497 For every x£<p" write x = x, + • • •+ xp, where Xj~Pj(B)x, and P^B) is the projector on %(fl) along E^S^fl). As P;(B) = (1/2-n-i) J, (A/- B)~x d\, we have M2 A WPfiB)- P^W^-^WA- B\\ where /^(.d) is the projector on 0ix (A) along T,l?tj0i/i(A). Denoting /M2A- \ G, = max I —-^|M - fi|| + ||P,(/1)||) (16.3.12) we see that ||Py(B)|| ^ 0,, / = 1,. . . , p. Now using (16.3.11) with these inequalities we obtain ||(B' - A)x\\^ ||(fi' - A)Xj\\ ^ (2 (1 + k^Wx^WA - fi|| *{£(i + *,2^G,}|M-b|NMI '/=i and thus |fi'-^||<G2|M-B| where Q2 = <2, E?=1 (1 + k)^). By Theorem 16.2.2 there exists Qi >0 such that for any transformation X that is similar to A there exists an invertible S with A = S lXS and ||/- 5|| < Q3\\X- A\\. Applying this result for X= B' and bearing in mind that Inv(fi') = Inv(fi), we obtain il(A,B)^Q2Q3\\B-A\\ (16.3.13) for any B £ J(A) with || B - A \\ < e3. As L\A, B) < 1 for any B £ J{A), (16.3.1) follows from (16.3.13). D 16.4 TRANSFORMATIONS WITH THE SAME DEROGATORY JORDAN STRUCTURE The result on continuity of lnv(A) that is contained in Theorem 16.4.1 can be extended to admit pairs of transformations that are close to one another and have different Jordan structures, provided the variations in this structure are confined to those eigenvalues with geometric multiplicity 1. To make this idea precise, we introduce the following definition. We say that transformations A: (f"1-* <p" and fl: <p"—» (pn have the same derogatory Jordan structure if A\^iA) and B\%(B) have the same Jordan structure, where ^(A) is
498 Perturbations of Lattices of Invariant Subspaces the sum of the root subspaces of A corresponding to eigenvalues A0 with dim Ker( A0/ - A) > 1. By definition, <%(A) = 0 if dim Ker( A0/ - A) = 1 for every eigenvalue A0 of A. Denote by DJ(A) the set of all transformations that have the same derogatory Jordan structure as A. We need one more definition to state the next theorem. For a transformation A, the height of A is the maximal partial multiplicity of A corresponding to the eigenvalues A0 with dim Ker( A(l/ - A) = 1. If A has no such eigenvalues, its height is defined to be 1. Theorem 16.4.1 Let A: <p"—»<pn be a transformation with height a. Then dist(Inv(/i), Inv(B)) sup- <co iiB-^ir where the supremum is taken over all B 6 DJ(A). The inequality in Theorem 16.4.1 is exact in the sense that in general a cannot be replaced by a smaller number. Namely, given a transformation A with height a, there exists a sequence {flm}* = i of transformations converging to A with Bm e DJ(A) such that . x dist(Inv(/l), Inv(g„,)) ^ n hm int T-r— > U \B„-A \Ua (16.4.1) Indeed, it is sufficient to consider the case when A = 7„(0) is a Jordan block. Then the sequence B„ 0 1 0 0 0 1 m 0 0 satisfies (16.4.1). This is not difficult to verify using the fact that Bm has n distinct eigenvalues em""" with corresponding eigenvectors Span(l, em ,. . . , e m ) where e is an «th root of unity. Indeed, writing £ = em l/", we see that the orthogonal projector on Span(l, f,. . . , £"~l) is
Transformations with the Same Derogatory Jordan Structure 499 (i + ki2 + --- + ki2<"",>r' i ki2 yn — 1 cTl c2Tl /|2("-1> so ^SpanCpSpa^l, f,...,r_l»sC|fI =Cm-1"' where the positive constant C is independent of m. Hence for m large enough (such that C|£| < 1) we have dist(Inv(i4),Inv(Bm)) a Cm""" and (16.4.1) follows. The proof of Theorem 16.4.1 is given in the next section. For the time being, note the following important special case. Corollary 16.4.2 Let A: <p" —* <p" be a nonderogatory transformation with height a. Then there exists a neighbourhood °U of A in the set of all transformations on <p" such that supdist(Inv(^),Inv(fl))<oo se* \B-A\V Recall that a transformation A is called nonderogatory if dimKer(A/ — A) = 1 for every eigenvalue A of A, and note that the set of all nonderogatory transformations is open. Indeed, if A: <p"—»<p" is nonderogatory, then rank(/l - A()/) = n — \ for every eigenvalue A0 of A. Write A as an n x n matrix in some basis in <p", and let A 0 be an (n — 1) x (n — 1) nonsingular submatrix of A - A0/. Then, for B sufficiently close to A and A sufficiently close to A0, the corresponding (n-l)x(n-l) submatrix BQ of B - A/ will also be nonsingular. Consequently rank(B - A/) > n ~ 1 (16.4.2) for all such B and A. Now the eigenvalues of a transformation depend continuously on that transformation. So the set of A values for which (16.4.2) holds will contain all eigenvalues of B (if B is close enough to A), which means that B is nonderogatory. Using the openness of the set of all nonderogatory linear transformations, we see that Corollary 16.4.2 follows immediately from Theorem 16.4.1.
500 Perturbations of Lattices of Invariant Subspaces The following result on continuity of dist(Inv(j4), Inv(fl)) can be obtained from Theorem 16.4.1 in the same way that Theorem 16.3.2 was obtained from Theorem 16.3.1. Theorem 16.4.3 Let DJ be a class of all transformations having the same derogatory Jordan structure. Then the real function defined on DJ by <p(A, B) = dist(Inv(.4), Inv(fl)) for every A, BE DJ is continuous. Moreover, for every pair A0, B0E J there exists a constant K>0 such that \<p{A, B) - <p(A0, B0)\ < K(\\A- A0\\l,a + \\B- B0\\Ufi) for every A, BE DJ that is sufficiently close to A0, B0, and where a, 6 are the heights of Aa and fl„, respectively. Now we consider stable invariant subspaces. Recall from Section 15.2 that an ^-invariant subspace M is called stable if for every e > 0 there exists 8 >0 such that any transformation B with ||fl- j4||<S has an invariant subspace ^V with the property that d(M, Jf)< e. Using Theorem 16.4.1 and its proof, we can prove a stronger property of stable invariant subspaces: Theorem 16.4.4 Let A: <p"—»<p" be a transformation with height a, and let M be a stable A-invariant subspace. Then inf 0(M,Jf) -V£Inv(S) SUP U.--4I"- *" where the supremum is taken over all transformations B: <pn—* <p". It will be convenient to prove Theorem 16.4.4 in the next section, following the proof of Theorem 16.4.1. 16.5 PROOFS OF THEOREMS 16.4.1 AND 16.4.4 We start with a preliminary result. Lemma 16.5.1 Let A: ("-*(" be a transformation with cr(A) = {0} and dim Ker A = l. Then, given a constant M>0, there exists a K>0 such that \X0\^K\\B-A\\l/" (16.5.1) for every eigenvalue A0 of every transformation B:(p"—»(p" satisfying \\B- A\\<M.
Proofs of Theorems 16.4.1 and 16.4.4 501 Proof. Let B: <p" -»• (f" be such that ||B - A|| < M. We have .4" - 0 and thus ||B"|I = IIB" - ^"11 = \\B"~\B - A) + B"\B - A)A + --- + B(B - A)A"~2 + (B- i4)i4"_l|| ^\\B-A\\2\\B\\"l->\\A\\> ; = 0 ^\\B-A\\2(M+\\A\\rl->\\A\\> i-o On the other hand, if A0 is an eigenvalue of B, then A„ is an eigenvalue of B" (as one can easily see by passing to the Jordan form of B). Hence |A,,!" = |AjJ| < ||fl"||. If this inequality is combined with the preceding one and nth roots of both sides are taken, the lemma follows. □ Now we prove Theorem 16.4.1 for the case when A: (pn —» <p" is non- derogatory and has only one eigenvalue. Lemma 16.5.2 Let o-(J4) = {A0} and dim Ker( A()/ - A) = 1. Then there exists a constant K>0 such that the inequality dist(Inv(.4), lm(B))<K\\B- A\\lln (16.5.2) holds for every transformation B: <p"—» (p". Proof. It will suffice to prove (16.5.2) for all B belonging to some neighbourhood of A. We can assume A0 = 0. By Lemma 16.5.1 there exist K{ >0 and e, >0 such that any eigenvalue A0 of a B with \\B - A\\ < e, satisfies |A0| < KX\\B - AW1'". As the set of nonderogatory transformations is open, we can assume also that every B with ||fl - A\\ < e, is non- derogatory. Now for such a B and its eigenvalue A0 let x0 be the corresponding eigenvector: (B — A0/)x0 = 0, x0^0. Then dim Ker(B - A0/) = dim Ker A = 1, and using Theorem 13.5.1, we find that 6)(Ker A, Ker(B - A0/)) < K2\\A - B\\Un (16.5.3) for any eigenvalue A0 of any B satisfying ||fl - A\\ < e2, where the positive constants K2 and e2 < e, depend on A only. It is convenient to assume that A is the Jordan block with respect to the standard orthonormal basis in (p": A = •/„(()). For any B sufficiently close to A write B - A = [bii]"J=l. Inequality (16.5.3) shows that there is an eigen-
502 Perturbations of Lattices of Invariant Subspaces vector x of B corresponding to an eigenvalue A0 of the form x = (l,x2,x3, . . . ,xn), where x2,. . . , xn G <p. The equation (B - \QI)x = 0 has the form fc,,-A0 l + bi2 bl3 b2l b22 - A0 1 + b23 b3l b32 b33 - A0 bn-i,i bn_l2 b„_l3 b„, b^ b. 'In bn-\.n-\ K 1 + b„-i n n2 "n"i u n,n~\ Rewrite the first n - 1 equations in the form b„,„- V -1- x2 x3 ~*n- = 1 0 0 -0. 'l + fc,2 bl3 b22 - A0 1 + b23 L bn-l,2 fe«-1.3 '2n 1 + 6^,,,-IL^ '-(*,i-A0)" -*2, Using |A0|^ AT,||fi-i4||l/B and Cramer's rule, we see that for / = 2, 3,. . . , n, xf has the following structure: xj = A0"l(l +fi,i-l(bpq)) + K'2fi.,-2(bpq) + • ■ ■ + A0/;,(fcM) +fj0(bpq) (16.5.4) where fjk(bpq) are scalar functions of n2 variables {bpq}"p q=l such that I/;*(V)I^MM-b|| where B satisfies ||fl - A\\ < e2. Here and in the sequel L0, L,,. . . , denote positive constants that depend on A only. Now let x°\ . . . , x(k) be k eigenvectors of B corresponding to k different eigenvalues A,,. . . , \k. Construct new vectors using divided differences: ,(12) C("-x(2> A, - A2 ," (23) _ x(2)-x(3) A2 - A3 Ak-Uk) _ x«-l)-xik) K-\ A* u(13) = »(12)-»(23) A,-A, ," (21) _ (23) _ ,,(34) tr ' - u A2 - A4 (*-2,*-l)_ (*- (*-2.t) _ " " 1.*) A*-2 ~~ At <1.A-1) _ „<2,*> (1,*) _ U M A, A^ Let
Proofs of Theorems 16.4.1 and 16.4.4 503 0,2:0 a, H +af = k be the homogeneous polynomial of degree /c in variables yl, . . . , y,. A simple induction argument [using (16.5.4)] shows that u(/*} has the following form (where s = k - / and the first s coordinates in uuk) are zeros): 0 0 P|(Ay, A;+1, />2(Ay, A/ + 1, .</*)_ •-.A*)(l+/, + 2.I+l)+/1+2., ■•.Aj(l+/J+3j+2) + pl(A/, "*)/i + 3,s+ 1 ' / s + 3. P«-i-,(A>. A/+i> ■ ■ • • AJ(1 +/„,„_,) + p,,-,^^, . . . , \k)fn „_2 + -" + P,(Ay,...,Aj/J1<J+l,+/llt (16.5.5) Here fuw— fuw{b ). The induction argument is based on the following equality (where we put formally p0 = 1): P«(A;, ■ ■-, A,f)-p„(A/M,.. . , A^,) (A;-A, + 1) = [^l=o AjVu-tt-CAy+i, . . . , A^) - £",=0 A^+1pu_,v(A/+i, . . . , A^)] A/~ A„ + i U = X Pk.^(A;, A9 + 1)^_vv(A; + 1,. . . , A9) = pu^,(A;, . . . , A„ + 1) »V= 1 Now consider the subspace ^=Span{*(,>, u(,2), u(,3),...,u<")}. Obviously ^ = Span{x(,U(2),...^,*)} On the other hand, the matrix Q Ly .y:1 OJ y y~
504 Perturbations of Lattices of Invariant Subspaces is a projector on 5£, where Yk (resp. Yn_k) is the k x k [resp. (n - k) x k] matrix formed by the upper k (resp. lower n - k) rows of the n x k matrix ^(l)„(12)u(13) ,("*> [;tl'V Using formulas (16.5.5), we see that detyt = (l+/2I)---(l+/M_,) and thus, Yk is invertible (for B sufficiently close to A). Using the estimates \fm\^L0\\A-B\\, |A,|==KI||B-i4||",\ we easily find from (16.5.5) that ' * II ^= ^i- Further, II Y„ B L2\\A \Q-[o o Hence <L3||,4 B\ So Consequently e(Se,Span{elt...,ek}): L3\\A B (16.5.6) dist(Inv(J4), Inv(fl))< L4||,4 - fl||' for every transformation B such that ||fl-/4||<e2 and every B-invariant subspace is spanned by its eigenvectors. As B must be nonderogatory, the last condition means that B has n distinct eigenvalues. Assume now that B is such that \\B - A\\ < e2, but B does not have n distinct eigenvalues. In particular, B is nonderogatory. Let {#„,}„ = , be a sequence of transformations such that \\Bm - A\\ < e2 for all m, Bm-* B as m—>«>, and Bm has n distinct eigenvalues for each m. Let M be a ^-dimensional B-invariant subspace. As J is a stable subspace (see Theorem 15.2.1), there exists a sequence {-^„,}™ = i, where Mm is a k- dimensional BTO-invariant subspace such that 0(Mm, M)—*0 as m—»». By (16.5.6) 0(J<, Span{ei, . . . , ek}) < fl(it, J<J + fl(^M, Span{e„ Passing to the limit in this inequality as m—*«>, we obtain 8(M,Svan{el,...>ek}<Li\\A-B\V"' ek)) hence
Proofs of Theorems 16.4.1 and 16.4.4 505 dist(Inv(i4), Inv(B))<Lj.4-B||"" for all B with ||B-.4||<e2. D Proof of Theorem 16.4.1. We now start to prove Theorem 16.4.1 in full generality. Let T, and T2 be two closed contours in the complex plane such that T, D T2 = 0 and the eigenvalues A(l of A lying inside T, (resp. T2) are exactly those for which dim Ker(A0/ - A) = 1 (resp. dim Ker( A„/ - A) > 1). Let 5, >0 be chosen so that any transformation B:(pn—»(p" with ||B - A\\ < 8, has no eigenvalues on T, U T2. For such a B, let S' = I~^i Jr> [(A/" Ay' ~(A/" By l]dx' i = hl and define the transformation S: <p"-»<p" by Sx-SjX for jt£9?.(y4), the spectral subspace associated with the eigenvalues of A inside rr Denote by P, the projector on 0tx(A) along 9t2(A); then for any xE <p" with ||jt|| = 1 we have ||(/ - S)x\\ = ||(P, - S,P,)* + ((/ - P.) - S2(/ - P,))4 s||/-SI||-||PI|| + ||/-52||.||/-P1|| =s^M-B||-||PI|| + ^||X-B||-||/-P1|| where A is the length of r. and M= max max ||(A/-F)_1|| f. ([■"-.((•" Aer,ur2 ||F-/t||-S, (cf. the proof of Theorem 16.3.1). Letting N =(2ir)'lM2^l\\Pl\\ + A2||/-P,||), we have ||/ - S\\ < N\\A - B\\. Hence for ||.4-B||< min(5,,(2A0~l) the transformation S is invertible and SSt^A) = 3?,(B), /=1,2. Now put B = S~'B.S. Then (cf. the proof of Theorem 16.2.1) dist(Inv(B), \m(B))-&2N\\A- B||(l - 2W||,4 - fl||)~' As dist(Inv(/l), Inv(B))<dist(Inv(/4), Inv(B)) + dist(Inv(B), Inv(B)) (16.5.7) it is sufficient to prove Theorem 16.4.1 only for those B: (f"1—* <p" that are close enough to A and satisfy 9?,(B) = 01^ A), j = 1, 2.
506 Perturbations of Lattices of Invariant Subspaces Note that for any transformation B sufficiently close to A with 3?;(B) = 0lj(A), every B-invariant subspace Jf is of the form Jf = Jfl + Jf2, where Jfj = Jff\ @lj{A). Let M be an /t-invariant subspace, and let M = Mx + M2, where Mj = M D ^(A). Then, denoting by Py (resp. Qy) the orthogonal projector on the subspace if, (resp. i?2) in ®X(A) [resp. 3?2(/l)], we have fl(J£, Jf) ^ \\(PMi + Q.M) ~ (Px> + QXi)|| Hence dist(Inv(/l), Inv(B)) < dist^nv^^,), \m(B\^A))) + dist^nv^^,), Inv(BUj(yl))) (16.5.8) Further, we remark that if B is sufficiently close to A, and 9?y(B) = ^(A) for; = 1,2, then B^ (/M is nonderogatory, that is, dim Ker( A0/ - B\# (A)) = 1 for every eigenvalue A0 of B\A (A). Indeed, this follows from the choice of 9?,(/l), which ensures that j4|gj^) is nonderogatory and from the openness of the set of nonderogatory transformations. If, in addition, BE DJ(A), it now follows that A\.^ (/1) and B\A (A) have the same Jordan structure. Hence in view of (16.5.8) and Theorem 16.3.1, we only need to prove the inequality distanv^l^,), Inv(flUi(/,;)) *K\\B- A\\lla In other words, we can assume that A is nonderogatory. Moreover, using the arguments similar to those employed above, we can assume in addition that A has only one eigenvalue, and this case is covered already in Lemma 16.5.2. Theorem 16.4.1 is proved completely. □ Proof of Theorem 16.4.4. It is sufficient to prove that there exist positive constants e and K such that the inequality inf 6(M,Jf)^K\\B- A\\1,a (16.5.9) holds for every transformation B satisfying ||fl - A\\ < e. Observe that for any transformations B, B: £"—» (p" the inequality inf 6(M,Jf)< inf 6(M, Jf) + dist(Inv(B), Inv(fl)) (16.5.10) .velnv(B) .veinv(B) holds. Indeed, for every ^VGlnv(B) and JfEln\(B) we have
Transformations with Different Jordan Structures 507 9{M, Jf) < 0(M, Jf) + 6(Jf, Jf) Taking the infimum over all ^VGlnv(B) it follows that inf e(M,Jf)<6(M,JT) + inf 6(Jf, Jf) Xelnv(B) „VeInv(B) < 8{M, Jf) + dist(Inv(B), Inv(B)) It remains to take the infimum over all ^V"£lnv(B) to obtain (16.5.10). Using the arguments from the proof of Theorem 16.4.1 [when (16.5.10) is used instead of (16.5.7)], we reduce the proof of (16.5.9) to the case when B has the property that every root subspace 9?A (A) of A is a spectral subspace for B and, moreover, the spectra of B\M ^A) and B\A {A) do not intersect if A, ^ A2. Let A,, . . . , Ar be all the distinct eigenvalues of A; then M = {Mf\ S8A|(i4)) + • ■ • + (M D ®Kr{A)) Also, for every B-invariant subspace ^V we have ^ = (jf n MXi{A)) + --- + (jfnm K(A)) Arguing as in the proof of Theorem 16.4.1, we obtain r 6(M, Jf) < 2 e(M fl 0lK {A), JfC\0l, (A)) i=\ ' ' So in order to prove (16.5.9), we can assume without loss of generality that A has only one eigenvalue, say A,. If dim Ker(A,/- A) > 1, then by Theorem 15.2.1 (here we use the assumption that M is stable) M - {0} or M = $", in which case (16.5.9) is trivial. If dimKer(A,/ - A) = 1, then (16.5.9) follows from Theorem 16.4.1. [Note that in this case B £ DJ(A) for all B sufficiently close to A.] O 16.6 DISTANCE BETWEEN INVARIANT SUBSPACES FOR TRANSFORMATIONS WITH DIFFERENT JORDAN STRUCTURES In this section we investigate the behaviour of dist(Inv(/l), Inv(fl)) when A and B have different Jordan structures or different derogatory Jordan structures. The basic result in this direction is as follows. Theorem 16.6.1 We have infdist(Inv(/l),Inv(B))>0 (16.6.1)
508 Perturbations of Lattices of Invariant Subspaces where the infimum is taken over all pairs of transformations A, B: <pB—» <p" such that A is derogatory and B is nonderogatory. [The infimum in (16.6.1) depends on n.\ Proof. Recall that B is nonderogatory if and only if the set of its invariant subspaces is finite. By assumption, dim Ker( A0/ - A) > 1 from some eigenvalue A0 of A. Let x and y be orthonormal vectors belonging to Ker(A0/- A), and put M(t) = Span{x + ty) , 0<f<l Clearly, the subspaces M(t) are A invariant. On the other hand, for every nonderogatory B: <p"—* (pn it is easily seen that the number of B-invariant subspaces does not exceed s max 11 (/>, + !) = 2" where the maximum is taken over all sequences /?,,..., ps of positive integers with p{ + ■ ■ ■ + ps = n. Now for any set of 2" subspaces if,, . . . , if2„ in (pn put F(Seit. . . , .&.) = max min d(M{t), if) Osisl ]<j<2» As 6{M{t), if) is a continuous function of t on [0,1], so is min]sjs2« 6(M(t), if,), hence F(££x, . . . , if2*) 's we" defined. Let us show thatf^if,,. . . , if 2„) is a continuous function of <2\,. . . , S£ 2». For some S >0, let Ml., i = 1,. . . , 2" be subspaces in <(7" such that 0(./V,, if,) < 5 for each i. Then for i = 1, . . . , 2" and / G [0,1], we obtain 6)(J<(0,^;)^6)(J<(0,if,) + S First take the minimum with respect to / on the left-hand side and then on the right-hand side. We obtain minn 6(M(t), Jft) < minn 6(M(t), if,) + 8 for all f £[0,1]. Taking the maximum with respect to t on the right-hand side first, and then on the left-hand side, we obtain FpV,,. . . , Jf2.)<F(Seit. . . , if2„) + S With the roles of if, and Niy switched it also follows that
Transformations with Different Jordan Structures 509 that is F(^, ...,<e2.)s F(Jf{, ...,Jf2„) + 8 \F(Jft,. . . , Jf2„) - F(Seit ..., %r)\ < 8 which proves the continuity of F(i£v , !£2n). Obviously, F(iP,, . . . , S£s„) > 0 for all <er As the set of all 2"-tuples of subspaces in <p" is compact, there exists an e>0 such that F(^,,. . . , Z£2„) > e for all $£-,, i = 1,. . . , 2". From the definition of F(J£X, . . . , 2£2-) is it easily seen that e does not depend on the choice of x and y (because any pair of orthonormal vectors in <p" can be mapped to any other pair of such vectors by a unitary transformation). Hence the theorem follows. D When the transformations A and B are both derogatory, or both non- derogatory, with different Jordan structures, the situation is more complicated. The following question arises naturally: if {Bm}* = 1 is a sequence of transformations converging to A and such that each Bm has Jordan structure different from that of A, does it follow that dist(Inv(^),Inv(flm)) l,m =00? ™->* ||B - A\\ (16.6.2) The next example shows that the answer is, in general, negative. example 16.6.1. For m = 1, 2, . . . , let A = [0 1 0 0 Lo o 01 0 oJ :<P3-<F3; B_ = -m~l 0 0 1 0 0 0 0 oJ :<P3-<P3 Clearly, for all m, Bm and A have different derogatory Jordan structure (in particular, different Jordan structure). One-dimensional /1-invariant subspaces are Span{e, + Be3}, B £ <p and Span{e3}. The orthogonal projector on Span{e, + Be3} is ^ = (l + l)8|2)" 1 0 B 0 0 0 P 0 |0|2 One-dimensional Bm-invariant subspaces are Span{e,}, Span{e3}, and Span{e, + m~1e2 + 3e3} where B £ (p. The orthogonal projector on Span{e, + m~1e2 + Be3} is
510 Perturbations of Lattices of Invariant Subspaces 0B,-=(l + «"2 + liB|2)- 0 /3m"' P jBm"1 |)8|2 1 m" -1 -2 Now there exists a constant L, >0 (independent of B and m) such that G. m.p I Lxm (16.6.3) Two-dimensional /1-invariant subspaces are Span{e,, e2 + j8e3} where )8 E (p and Span{e,,e3}. Two-dimensional Bm-invariant subspaces are Span{e, + m~ e2, e3\, Span{e,, e3}, and Span{e,, e2 + Be3], where )8 E (p. The orthogonal projector on Span{e, + m~'e2, e3} is Rm = (l + m'Y m m m 0 0 1 n 0 0 1 + wt'J There exists a constant L2 >0 (independent of m) such that rt«- [1 0 0] 0 0 0 L0 0 lj L-,m (16.6.4) Now the inequalities (16.6.3) and (16.6.4) ensure that for m = l,2, dist(Inv(Bm),Inv(/i))<m~l max(L,, L2). D In the last example both A and Bm are derogatory. Taking A = [°o o] fl« = ["o ' J] we obtain an example contradicting (16.6.2) with both A and Bm non- derogatory. 16.7 CONJECTURES In view of Example 16.6.1 the following question arises: Given a transformation A: (pn—» <p" with a certain Jordan structure, it is true that for any other Jordan structure there exists a sequence of linear transformations {#,„}* = , that have this other Jordan structure, for which fl„,—> A, and for which dist(Inv(,4), Inv(BJ) lim ~ -71 =00? B„ A\\ (16.7.1)
Conjectures 511 A similar question arises for the case of derogatory Jordan structure, when (16.7.1) is replaced by ,. dist(Inv(,4), Inv(fi)) — \\Bm-A\\Va and a is the height of A. Of course, certain conditions should be imposed on the Jordan structure (or on the derogatory Jordan structure) of {Bm}^=1 to ensure the existence of a sequence {flm}* = 1 converging to A. A complete set of such conditions is given in Theorem 15.10.2. Let us describe the Jordan structure of transformations on (p" in terms of sequences as in (16.1.1), and let <I> be the set of all such sequences. As in Section 15.10, for 11 = {s; r,, r2,. . . , rs; mn,..., mlri; m21,. . . , mlr;,. . . ; mtl,.... msr) e* (16.7.2) and for every nonempty set A C {1, . . . , s} define kj(il;A)=2mpi, /=1,2,.... Further, for ft given by (16.7.2) denote by P(il) the set of all sequences ft' = {s';r[,r^,...,r'1;m[l,. . . , m[r.; m'2l,. . . , m2r,;. . . ;m;.„.. ■ , m's.r) £<!> for which there is a partition of {1,. . . , s'} into s disjoint nonempty sets A,,. . . , As such that the following relations hold: 2 *,(!!;{?})<=£*,(**';*,,); f=l,2,...; p = l,...,s 2^(0; {/>}) = 2 *,(!!'; A„); p = l,...,5 Note that fleP(fl) always (one takes Ap = {p}, p = 1,. . . , s). The set J°(ft) consists of Q if and only if Q represents the Jordan structure corresponding to n distinct eigenvalues, that is, s = n. Note that by Theorem 15.10.2, P(il) represents exactly those Jordan structures for which there is a sequence of transformations converging to a given transformation with the Jordan structure Q. We propose the following conjecture.
512 Perturbations of Lattices of Invariant Subspaces Conjecture 16.7.1 Let A: <£""—* <p" be a transformation with the Jordan structure ftE<I>. Then for any sequence ft' that belongs to P(ft) and is different from ft, there exists a sequence of transformations {Bm} ^ _, that converges to A, for which each Bm has the Jordan structure ft', and for which v dist(InvM),Inv(BM)) J'JH. p-r^y It is not difficult to verify this conjecture when A is nonderogatory. Indeed, without loss of generality we can assume that A is the n x n Jordan block with eigenvalue zero. In view of Theorem 15.10.2, any sequence ft' belonging to P(ft) (here ft is the Jordan structure of A) has the form il' = {s;\,\,. . . ,l;m,; . . . ;ms} where s> 1 and mt are positive integers with E'=1 /n, = n. Given such ft', consider the following n x n matrix (we denote by 0m and lm the m x m zero and identity matrices, respectively): B=/l + diag[0J,V«I-..---.'»A,-il + /l«' e>0 where tj, ,. . . , % are the s\h roots of e, and the n x n matrix A f has e in the (s, 1) entry and zeros elsewhere. It is easy to see [by considering, e.g., det( A/ - B()] that, at least for e close enough to zero, the matrix B( has the Jordan structure ft'. Clearly, tj,, . . . ,17, are the eigenvalues of Be, and (1, T);,. . . , tj*-', 0, . . . , 0) is the only eigenvector of B (up to multiplication by a nonzero scalar) corresponding to tj, for / = 1,. . . , s. It follows (cf. the remark following Theorem 16.5.1) that dist(Inv(/l), Inv(fl)) and Conjecture 16.7.1 is verified for the matrix A. To formulate the corresponding conjecture for derogatory Jordan structure, we introduce one more notion. Let ft={s;r,,. . . ,rs;mu,. . . , mlf|;. .. ,;/n,,,. . . ,msr) and il' = {t;r[,. . . ,r',;m'u,. .. ,m\r[\. .. ;m'n,. . . , m'„.} be two sequences from <I>. We say that ft and ft' have the same derogatory part if the number (say, u) of indices j, 1 < /' < s such that r. > 2 coincides with the number of indices ;', l</<f such that r'^2, and, moreover, ri = r'j, ) = 1, • • • . «; miq = m'Jq, q = 1,. . . , r,; )' = 1, . . . , u. If it does not happen that ft and ft' have the same derogatory part, we say that ft and ft' have different derogatory parts.
Exercises 513 Conjecture 16.7.2 Let the transformation A: <pn—» <p" have the Jordan structure (1G$. Then for every sequence il' that belongs to P(il) and such that il' and il have different derogatory parts there exists a sequence of transformations {Bm}^ = l that converges to A, for which each Bm has the Jordan structure il', and for which dist(Inv(/l), Inv(flJ) m — \\Bm-A\r =co where a is the height of A. 16.8 EXERCISES 16.1 Given an n x n upper triangular Toeplitz matrix A, find all possible Jordan structures of upper triangular Toeplitz n x n matrices that are arbitrarily close to A. Are there additional Jordan structures if the perturbed matrix is not necessarily upper triangular Toeplitz? 16.2 Solve Exercise 16.1 for the class of n x n companion matrices. 16.3 Solve Exercise 16.1 for the class of n x n circulant matrices. 16.4 Solve Exercise 16.1 for the class ofnxn matrices A such that A2 = 0. 16.5 Prove or disprove each one of the following statements (a), (b), and (c): for every transformation A: <£""—»(p" there exists an c >0 such that any transformation B: ("—* <p" with ||fl - A\\ < e has the property that (a) the height of B is equal to the height of A; (b) the height of B is not greater than the height of A; (c) the height of B is not smaller than the height of A. 16.6 Prove Conjecture 16.7.1 for the case when A = 73(0). 16.7 Given a transformation A:("—»(p" and a number a>0, an A- invariant subspace M is called a stable if there exist positive constants K and e such that every transformation B: <p"—* <p" with ||fl - A\\ < e has an invariant subspace Jf satisfying 0{M,N)rEkK\\B- A\\lla Show that all invariant subspaces of the Jordan block J„{K) are a stable if aSn. (Hint: Use Lemma 16.5.2.) 16.8 (a) For every a>l, give an example of an a-stable ^-invariant subspace that is not Lipschitz stable, (b) For every a si, give an example of a stable /1-invariant subspace that is not a stable. 16.9 Are there a-stable invariant subspaces with 0< a < 1?
Chapter Seventeen Applications Chapters 13-16 provide us with tools for the study of stability of divisors for monic matrix polynomials and rational matrix functions. In this chapter we develop a complete description of stable divisors in terms of their corresponding invariant subspaces and supporting projectors. Special attention is paid to Lipschitz stable and isolated divisors. We consider also the stability and isolatedness properties of solutions of matrix quadratic equations as well as stability of linear fractional decompositions of rational matrix functions. 17.1 STABLE FACTORIZATIONS OF MATRIX POLYNOMIALS: PRELIMINARIES Let L(X) be an n x n monic matrix polynomial, and let L(A)=L,(A)L2(A)-L,(A) (17.1.1) be a factorization of L(A) into a product of n x n monic polynomials L,(A),. . . , Lr(\). We say that the factorization (17.1.1) is stable if, after sufficiently small changes in the coefficients of L(A), the new matrix polynomial again admits a factorization of type (17.1.1) with only small changes in the factors L-(A). In the next section we study stability of the factorization of type (17.1.1) in terms of invariant subspaces for the linearization of the matrix polynomial L(A). In this section we establish the framework for this study and prove results on continuity of the correspondence between factorizations and invariant subspaces to be used in the next section. Let CL be the companion matrix for L(A): 514
Matrix Polynomials: Preliminaries 515 cL = ' 0 0 0 --A* I 0 0 -A, 0 / 0 -A2 ■• 0 0 / ■ ~A,- - 1 - where L(A) = /a' + T.'jJ0 j4;A;. As we have seen in Chapter 5, the triple (X0, CL,YQ), where X0 = [I 0 0], Y0 = 01 0 0 ./J is a standard triple for L(A). Further, there is a one-to-one correspondence between the factorizations (17.1.1) of L(A) and chains of CL-invariant subspaces {0}CJ,C--'Ci2Cf' (17.1.2) with the property that the transformations -X0Cl \M ;.*,-*$•"*, 2,. . . ,r (17.1.3) are invertible (see Section 5.6). Here, lr<: ■ • <l2<l are some positive integers. The correspondence between factorizations (17.1.1) and chains of CL-invariant subspaces is given by the formulas from Theorem 5.6.1. Namely, let Jf. be a direct complement to Mi+X in Mj(;' = 1,. . . , r - 1) (by definition, Jtl = (p"'), and let Px: Mj—*Nj be the projector on Jff along Mj+i. For j=l,. . . ,r—l, let pt be the difference /•+1-// where, by definition, /, = /. Here / is the degree of L(A). Then for/= 1,. . . , r- 1 we have L.( A) = A"'/ - (Wn + \WJ2 + ■■■ + k^Wi^(PJCCLy'PyYj where Yj = (colf^CJ^,^)- col[5/m /]-', and the transformations W^: ^.—> (p", / = 1,. . . , /n, are determined by
516 Applications col[W;1]?L, = [Psfi, PxCLUPxYn ..., (P^C^/rip^] (As usual, Suu denotes the Kronecker symbol: duv = 1 if u = v and Suv =0 if u ¥" v.) For the last factor L,(A) we have Lr( A) = A''/ - ^ok(C,u)''(Krl + Vr2A + • • • + Vr>, A''"1) where IK, Vr2 • ■ • Vr,J = (colMfoC'jJ'-oU,)"1: (p"''-^. Also, it is convenient to use the formulas for the products L1(A)L2(A)-L,_1(A) and L;(A)L,.+](A)-• • Lr(A) (cf. the proof of Theorem 5.6.1). We have for / = 2,. . . , r: L,(A)- • • L,(A) = /A'- - X0(CL[Mi)\Vn + K2A + • • • + V,,^"') (17.1.4) where Wn ^•■•^Ml^otCzW'"1]-'-,] /-'I', 1-1 (Observe that when / = r formula (17.1.4) coincides with the preceding formula for Lr.) Also, for i = 2,. . . , r. - (z„ + z,2a + • • • + z^a'"''") • (^cL|j(fi)'-''y0 (17.1.5) where J£j is a direct complement to Mt in ("', P, is the projector on M\ along J<,., and Zn 1 LZ. = [p,y0, pc^^y,, ..., (PiC^.y-'^p^y1 ..(-(,-■ Our next step is to show that this correspondence between factorizations of monic matrix polynomials L(A) and chains of certain CL-invariant subspaces is continuous. To this end define a metric <rk on the set 8Pk of all n x n monic matrix polynomials of degree k: Jia" + £ b,x1, /a* + E b;a'") = 2 ||b, - b;||
Matrix Polynomials: Preliminaries 517 Now fix a positive integer /. Consider the set Wr of all r-tuples (Mr, Mr_t,. . . , M2, L(X)), where L(A) is a monic matrix polynomial of degree /, and Mr CiM C • • • C M2 is a chain of CL-invariant subspaces. The set Wr is a metric space with the metric 6r((Mr, ...,M2, L(A)), (M'„ ...,M^ L'(A))) r i-2 For every increasing sequence £ = {lT < /r_, < • • • < l2) of positive integers /, with l2 < I, define the subset Wr ^ of Wr consisting of the elements (Mr,. . . , M2, L(k)) from Wr with the additional property that the transformations (17.1.3) are invertible. Theorem 17.1.1 For each £ the set Wr ( is open in Wr. Proof. Define the subspace cSl_p <*!,...,*,)£ ^,_p if and only if xx = • of (p"' by the condition x = ■ = xp = 0 (here x,e(p"). As XqCl = [/„, o] for p = 1, . . . , /, it follows that the transformation (17.1.3) is invertible if and only if Mt is a direct complement to %_, in (p"'. From Theorem 13.1.3 it follows that, if Mt + <§,_,. = (p"', then for e > 0 sufficiently small we also have M\\- <8,_t = <£"" for every subspace M\ in <p"' with e(Mt, M'i)< e. Hence W,wt is open in Vr. D Now define a map *i:*V ' '2 '2 '3 'r-l 'r 'r where £ = {/,.,. . . , /2} is an increasing sequence of positive integers lr, /,_,,. . . , l2 with /2</, as follows. Given (Mr, Mr_x, . . . ,M2, L(\))E. Wr (, the image of this element is (L,(A),. . . , Lr(\)), where the monic matrix polynomials L,(A) are taken from the factorization L(A)=L,(A)L2(A)---Lr(A) which corresponds to the chain MT C • • • C J<2 of CL-invariant subspaces. It
518 Applications is evident that Ft is one-to-one and surjective, so that the map F ^ exists. Make the set 3>i = g>,_,2 x 0> x • • • x 0>, _t x ^ into a metric space by defining p(t^,...,LX(L\,...,K)) = <rl_h{Li,L\) + <rh_li(L2,L'2) + --- + (rlr(Lr,L'r) If A',, X2 are topological spaces with metrics p,, p2, defined on A', and X2, respectively, the map G: X^ —*■ X2 is said to be locally Lipschitz continuous if, for every i£Z„ there is a deleted neighbourhood Ux of x for which SUp I r— I < oo Obviously, a locally Lipschitz continuous map is continuous. It is easy to see that the composition of two locally Lipschitz continuous maps is again locally Lipschitz continuous. Theorem 17.1.2 The maps F( and FJ1 are locally Lipschitz continuous. Proof. Given (Mr,. . . , M2, L(\)) G WrV write M,(A) = L,(A)- • • L,_,(A), jV,(A) = L,(A)- • • Lr(A), where the products L, • • • L,_, and L, • • • Lr are given by (17.1.5) and (17.1.4), respectively. Then L( A) = M2( A)A^2( A) = • • • = Mr( A)N,( A) We show first that the coefficients of M,(A), jV,( A), i = 1,. . . , r- 1 are locally Lipschitz continuous. Observe that in the representations (17.1.4) and (17.1.5) the coefficients of Mk and N, are uniformly bounded in some neighbourhood of (Mr,. . . , M2, L(A)). It is then easily seen that in order to establish the local Lipschitz continuity of the coefficients of M, and Ni it is sufficient to verify the following assertion: for a fixed (Mr,. . . , M2, L( A)) GWr( there exist positive constants S and C such that, for a set of subspaces i?r_,, . . . , i£, satisfying 0(j^, Mt) < S for / = 2,. . . , r, it follows that •e;.+ »,_,,= <;"' Here «,_, = {(0,. . . ,0, «„ . . . , «„„_,_,> G $"' | «, G (p) and \\PX<- PM || ^ C6(^, Mj), where P^ (resp. PM) is the projector on J2j (resp. J*,) along ^,_(. But this conclusion follows from Theorem 13.1.3. Hence the coefficients of A/,(A) and N,(A) are locally Lipschitz continuous functions of an element in Wr e. In particular, L, = M2 and Lr = Nr are locally Lipschitz continuous.
To prove this property for L_2, …, L_{r-1}, note that

M_i(λ)L_i(λ) = M_{i+1}(λ), i = 2, …, r−1        (17.1.6)

Regard the equalities (17.1.6) as a system of linear equations

Ax = b        (17.1.7)

where A and b are formed by the entries of the coefficients of M_i(λ) and M_{i+1}(λ) for i = 2, …, r−1, and the unknown vector x is formed by the entries of the coefficients of L_2, …, L_{r-1}. The system (17.1.7) has a unique solution x; hence the matrix A is left invertible. So x = A^{I}b, where A^{I} is a left inverse of A. Observe that every matrix B with ‖B − A‖ ≤ ½‖A^{I}‖^{-1} is also left invertible, with a left inverse B^{I} satisfying

‖B^{I} − A^{I}‖ ≤ 2‖A^{I}‖² ‖B − A‖

(cf. the proof of Theorem 13.5.1). This inequality shows that x is a locally Lipschitz continuous function of (M_r, …, M_2, L(λ)) ∈ W_{r,ξ}, because A and b have this property.
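The step x = A^{I}b can be imitated numerically: a left-invertible matrix is one of full column rank, and a least-squares solve recovers x and inherits the Lipschitz dependence on (A, b). A small illustration with hypothetical data:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 4))       # tall, full column rank: left invertible
x_true = rng.standard_normal(4)
b = A @ x_true

x, *_ = np.linalg.lstsq(A, b, rcond=None)    # x = A^I b for a consistent system
print(np.allclose(x, x_true))                 # True

# small perturbations of (A, b) move the solution by the same order
for eps in (1e-4, 1e-6):
    dA = eps * rng.standard_normal(A.shape)
    db = eps * rng.standard_normal(b.shape)
    x_pert, *_ = np.linalg.lstsq(A + dA, b + db, rcond=None)
    print(eps, np.linalg.norm(x_pert - x))    # roughly proportional to eps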
To establish the local Lipschitz continuity of F_ξ^{-1}, consider a fixed element (L_1, …, L_r) ∈ 𝒫_ξ. It is apparent that the polynomial L = L_1L_2 ⋯ L_r is a Lipschitz continuous function of L_1, …, L_r in a neighbourhood of this fixed element. Further, let M_r ⊂ ⋯ ⊂ M_2 be the chain of C_L-invariant subspaces corresponding to the factorization L = L_1L_2 ⋯ L_r. Let N_i = L_iL_{i+1} ⋯ L_r for i = 2, …, r, and let

𝒢_{l-m_i} = {(0, …, 0, a_1, …, a_{n(l-m_i)}) ∈ ℂ^{nl} | a_j ∈ ℂ}

where l is the degree of L and m_i is the degree of N_i. The projector P_{M_i} on M_i along 𝒢_{l-m_i} is given by the formula

P_{M_i} = col[X_0C_{N_i}^{j-1}]_{j=1}^{l} [I 0]        (17.1.8)

where X_0 = [I 0 ⋯ 0], C_{N_i} is the companion matrix of N_i, and [I 0] denotes the nm_i × nl matrix that selects the first m_i coordinate blocks. Indeed, P_{M_i} is obviously a projector and Ker P_{M_i} = 𝒢_{l-m_i}. Let us check that Im P_{M_i} = M_i. Recall (see the proof of the converse statement of Theorem 5.3.2) that M_i is given by the formula

M_i = Im{ [col[X_0C_L^{j-1}]_{j=1}^{l}]^{-1} col[X_0C_{N_i}^{j-1}]_{j=1}^{l} }

As

col[X_0C_L^{j-1}]_{j=1}^{l} = I and col[X_0C_{N_i}^{j-1}]_{j=1}^{m_i} = I

we find that M_i = Im col[X_0C_{N_i}^{j-1}]_{j=1}^{l} = Im P_{M_i}. Formula (17.1.8) implies the local Lipschitz continuity of P_{M_i} (as a function of (L_1, …, L_r)) and, therefore, also of M_i (cf. Theorem 13.1.1). □

17.2 STABLE FACTORIZATIONS OF MATRIX POLYNOMIALS: MAIN RESULTS

We say that a factorization

L(λ) = L_1(λ)L_2(λ) ⋯ L_r(λ)        (17.2.1)

of a monic matrix polynomial L(λ), where the L_i(λ) are monic matrix polynomials as well, is stable if for every ε > 0 there exists a δ > 0 such that any monic matrix polynomial L̃(λ) with σ_l(L̃, L) < δ admits a factorization L̃(λ) = L̃_1(λ) ⋯ L̃_r(λ), where the L̃_i(λ) are monic matrix polynomials satisfying

max(σ_{l-l_2}(L̃_1, L_1), σ_{l_2-l_3}(L̃_2, L_2), …, σ_{l_{r-1}-l_r}(L̃_{r-1}, L_{r-1}), σ_{l_r}(L̃_r, L_r)) < ε

Here l is the degree of L and L̃, whereas, for i = 2, …, r, l_i is the degree of the products L_i ⋯ L_r and L̃_i ⋯ L̃_r.
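Both σ_l and the factorization itself live at the level of coefficients: the product of monic matrix polynomials is a convolution of coefficient lists. A sketch of these two operations (the helper names poly_mul and sigma are ours; coefficients are listed from lowest degree up, with the leading I left implicit):

import numpy as np

def poly_mul(P, Q):
    # Non-leading coefficients of the product of two monic matrix polynomials.
    n = P[0].shape[0]
    I = np.eye(n)
    Pc, Qc = P + [I], Q + [I]            # full coefficient lists, lowest degree first
    R = [np.zeros((n, n)) for _ in range(len(Pc) + len(Qc) - 1)]
    for i, Ai in enumerate(Pc):
        for j, Bj in enumerate(Qc):
            R[i + j] = R[i + j] + Ai @ Bj
    return R[:-1]                        # drop the leading I again

def sigma(P, Q):
    # Metric sigma_l(P, Q): sum of norms of coefficient differences.
    return sum(np.linalg.norm(a - b, ord=2) for a, b in zip(P, Q))

L1 = [np.diag([1.0, 2.0])]                        # L1(lam) = lam I + diag(1, 2)
L2 = [np.diag([3.0, 4.0])]                        # L2(lam) = lam I + diag(3, 4)
L = poly_mul(L1, L2)                              # monic product of degree 2
L2p = [np.diag([3.0, 4.0]) + 1e-3 * np.eye(2)]    # perturbed second factor
print(sigma(L, poly_mul(L1, L2p)))                # product moves by O(1e-3)

Perturbing one factor by order 10^{-3} moves the product by the same order, which is the easy direction of the stability question.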
Recall the definition of a stable chain of invariant subspaces given in Section 15.6.

Theorem 17.2.1
Let equality (17.2.1) be a factorization of the monic matrix polynomial L(λ), and let (M_r, …, M_2, L(λ)) = F_ξ^{-1}(L_1, …, L_r) be the corresponding chain of C_L-invariant subspaces. Then the factorization (17.2.1) is stable if and only if the chain

M_r ⊂ ⋯ ⊂ M_2        (17.2.2)

is stable.

Proof. If the chain (17.2.2) is stable, then by Theorem 17.1.2 the factorization (17.2.1) is stable. Now conversely, suppose that the factorization (17.2.1) is stable but the chain (17.2.2) is not. Then there exist an ε > 0 and a sequence of matrices {C_m}_{m=1}^{∞} such that lim_{m→∞} C_m = C_L and for every chain 𝓛_r^{(m)} ⊂ ⋯ ⊂ 𝓛_2^{(m)} of C_m-invariant subspaces the inequality

max(θ(M_r, 𝓛_r^{(m)}), …, θ(M_2, 𝓛_2^{(m)})) ≥ ε

holds. Put Q = [I 0 ⋯ 0] (an n × nl matrix) and

S_m = col[QC_m^{i-1}]_{i=1}^{l}, m = 1, 2, …

Then S_m converges to col[QC_L^{i-1}]_{i=1}^{l}, which is equal to the unit nl × nl matrix. So without loss of generality we may assume that S_m is nonsingular for all m. Let S_m^{-1} = [U_{m1}, U_{m2}, …, U_{ml}], and note that

QC_m^{i-1}U_{mj} = δ_{ij}I, i, j = 1, …, l        (17.2.3)

A straightforward calculation shows that S_mC_mS_m^{-1} is the companion matrix associated with the monic matrix polynomial

M_m(λ) = λ^{l}I − Σ_{i=0}^{l-1} λ^{i} QC_m^{l}U_{m,i+1}

From (17.2.3) and the fact that C_m → C_L it follows that σ_l(M_m, L) → 0. But then we may assume that for all m the polynomial M_m admits a factorization

M_m(λ) = L_{1m}(λ) ⋯ L_{rm}(λ)        (17.2.4)

where σ_{p_i}(L_{im}(λ), L_i(λ)) → 0 for i = 1, …, r (here p_i is the degree of L_i, which is also equal to the degree of L_{im} for m = 1, 2, …). Let M_{rm} ⊂ ⋯ ⊂ M_{2m} be the chain of C_{M_m}-invariant subspaces corresponding to the factorization (17.2.4), that is,

F_ξ(M_{rm}, …, M_{2m}, M_m(λ)) = (L_{1m}(λ), …, L_{rm}(λ))

By Theorem 17.1.2 we have

lim_{m→∞} θ(M_{im}, M_i) = 0, i = 2, …, r

Put Y_{im} = S_m^{-1}M_{im} for i = 2, …, r and m = 1, 2, …. Then Y_{im} is an invariant subspace for C_m for each m. Moreover, it follows from S_m → I that, for i = 2, …, r, θ(Y_{im}, M_{im}) → 0 as m → ∞. (Indeed, by Theorem 13.1.1

θ(Y_{im}, M_{im}) ≤ ‖S_m^{-1}P_{im}S_m − P_{im}‖

where P_{im} is the orthogonal projector on M_{im}. Now

‖S_m^{-1}P_{im}S_m − P_{im}‖ ≤ ‖(S_m^{-1} − I)P_{im}S_m‖ + ‖P_{im}(I − S_m)‖ ≤ max_m ‖S_m‖ · ‖S_m^{-1} − I‖ + ‖I − S_m‖

which tends to zero as m tends to infinity.) But then θ(Y_{im}, M_i) → 0 as m → ∞, for i = 2, …, r. This contradicts the choice of the C_m, and the proof of Theorem 17.2.1 is complete. □
Comparing Theorem 17.2.1 with Corollary 14.6.2 and Theorem 14.2.1, we obtain the next result.

Corollary 17.2.2
A factorization L(λ) = L_1(λ)L_2(λ) ⋯ L_r(λ) with monic matrix polynomials L(λ), L_1(λ), …, L_r(λ) is stable if and only if the corresponding chain M_r ⊂ ⋯ ⊂ M_2 of C_L-invariant subspaces satisfies the following condition: for every eigenvalue λ_0 of C_L with dim Ker(C_L − λ_0I) > 1 and for every i (2 ≤ i ≤ r), either M_i ⊇ ℛ_{λ_0}(C_L) or M_i ∩ ℛ_{λ_0}(C_L) = {0}.

One can formulate a criterion for the stability of factorizations of this kind in terms of the eigenvalues of the polynomials L_i(λ) rather than of the companion matrix (as was done in Corollary 17.2.2), as follows.

Theorem 17.2.3
A factorization (17.2.1) is stable if and only if, for every common eigenvalue λ_0 of a pair L_i(λ), L_j(λ) (i ≠ j), we have dim Ker L(λ_0) = 1.
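The criterion of Theorem 17.2.3 is directly checkable: collect the eigenvalues of each factor from its companion matrix and, at each common eigenvalue of two distinct factors, test whether Ker L(λ_0) is one dimensional. A sketch reusing the companion helper from the earlier sketch (tolerances are ad hoc):

import numpy as np

def common_eig_ok(L_coeffs, factor_coeffs, tol=1e-8):
    # Theorem 17.2.3: for every common eigenvalue lam0 of two distinct
    # factors, dim Ker L(lam0) must equal 1.  All polynomials are monic,
    # given by their non-leading coefficients, lowest degree first.
    def eval_poly(coeffs, lam):
        n = coeffs[0].shape[0]
        V = lam ** len(coeffs) * np.eye(n, dtype=complex)
        for j, Aj in enumerate(coeffs):
            V = V + lam ** j * Aj
        return V
    spectra = [np.linalg.eigvals(companion(c)) for c in factor_coeffs]
    n = L_coeffs[0].shape[0]
    for i in range(len(spectra)):
        for j in range(i + 1, len(spectra)):
            for lam in spectra[i]:
                if np.min(np.abs(spectra[j] - lam)) < tol:   # common eigenvalue
                    rank = np.linalg.matrix_rank(eval_poly(L_coeffs, lam), tol=1e-6)
                    if n - rank != 1:
                        return False
    return True

I2 = np.eye(2)
f = [-I2]                             # the factor lam I - I
Lc = [I2, -2 * I2]                    # L(lam) = (lam I - I)^2
print(common_eig_ok(Lc, [f, f]))      # False: dim Ker L(1) = 2, an unstable factorization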
The proof of Theorem 17.2.3 is based on the following lemma.

Lemma 17.2.4
Let

A = [A_1 A_{12}; 0 A_2]

be a transformation from ℂ^m into ℂ^m, written in matrix form with respect to the decomposition ℂ^m = ℂ^{m_1} ⊕ ℂ^{m_2} (m_1 + m_2 = m). Then ℂ^{m_1} is a stable invariant subspace for A if and only if for each common eigenvalue λ_0 of A_1 and A_2 the condition dim Ker(λ_0I − A) = 1 is satisfied.

Proof. It is clear that ℂ^{m_1} is an invariant subspace for A. We know from Theorem 15.2.1 that ℂ^{m_1} is stable if and only if for each Riesz projector P of A corresponding to an eigenvalue λ_0 with dim Ker(λ_0I − A) ≥ 2 we have Pℂ^{m_1} = {0} or Pℂ^{m_1} = Im P. Let P be the Riesz projector of A corresponding to an arbitrary eigenvalue λ_0. Also, for i = 1, 2, let P_i be the Riesz projector associated with A_i and λ_0:

P_i = (1/2πi) ∫_{|λ−λ_0|=ε} (Iλ − A_i)^{-1} dλ, i = 1, 2

where ε > 0 is sufficiently small. Then

P = (1/2πi) ∫_{|λ−λ_0|=ε} [ (Iλ − A_1)^{-1}  (Iλ − A_1)^{-1}A_{12}(Iλ − A_2)^{-1} ; 0  (Iλ − A_2)^{-1} ] dλ

Observe that for i = 1, 2 the Laurent expansion of (Iλ − A_i)^{-1} at λ_0 has the form

(Iλ − A_i)^{-1} = Σ_{j≥1} (λ − λ_0)^{-j} P_iQ_{ij}P_i + ⋯        (17.2.5)

where the Q_{ij} are some transformations of Im P_i into itself and the ellipsis on the right-hand side of (17.2.5) represents a series in nonnegative powers of (λ − λ_0). From (17.2.5) one sees that P has the form

P = [P_1  P_1Q_1 + Q_2P_2 ; 0  P_2]

where Q_1 and Q_2 are certain transformations acting from ℂ^{m_2} into ℂ^{m_1}. It follows that {0} ≠ Pℂ^{m_1} ≠ Im P if and only if λ_0 ∈ σ(A_1) ∩ σ(A_2). Now appeal to Theorem 15.2.1 (see the first paragraph of the proof) to finish the proof. □
Proof of Theorem 17.2.3. Let (M_r, …, M_2, L(λ)) = F_ξ^{-1}(L_1, …, L_r) be the chain of C_L-invariant subspaces corresponding to the factorization (17.2.1). From Theorem 17.2.1 (taking into account Corollary 17.2.2) we know that this factorization is stable if and only if M_2, …, M_r are stable C_L-invariant subspaces. Let l be the degree of L, let r_i be the degree of L_1L_2 ⋯ L_i, and let

𝒢_i = {(x_1, …, x_l) ∈ ℂ^{nl} | x_1 = ⋯ = x_{l-r_i} = 0}

Then ℂ^{nl} = M_{i+1} ∔ 𝒢_i. With respect to this decomposition, write

C_L = [C_{1i} ∗ ; 0 C_{2i}]

As we know (see Corollary 5.3.3), σ(L_{i+1} ⋯ L_r) = σ(C_{1i}) and σ(L_1 ⋯ L_i) = σ(C_{2i}). Also σ(C_L) = σ(L); the desired result is now obtained by applying Lemma 17.2.4. □

Another characterization of stable factorizations of monic matrix polynomials can be given in terms of isolatedness. Consider a factorization

L(λ) = L_1(λ)L_2(λ) ⋯ L_r(λ)        (17.2.6)

of a monic matrix polynomial L(λ) into a product of monic matrix polynomials L_1(λ), …, L_r(λ), and let p_i be the degree of L_i for i = 1, …, r. This factorization is called isolated if there exists an ε > 0 such that any factorization

L(λ) = M_1(λ)M_2(λ) ⋯ M_r(λ)

of L(λ) with monic matrix polynomials M_i(λ) satisfying σ_{p_i}(L_i(λ), M_i(λ)) < ε (it is assumed that the degree of M_i is p_i) coincides with (17.2.6); that is, M_i(λ) = L_i(λ) for i = 1, …, r.

Theorem 17.2.5
A factorization (17.2.6) is stable if and only if it is isolated.

Proof. Let (M_r, …, M_2, L(λ)) = F_ξ^{-1}(L_1, L_2, …, L_r) be the corresponding chain of C_L-invariant subspaces. By Theorems 17.1.2 and 17.2.1, the factorization (17.2.6) is isolated if and only if each M_i satisfies the condition that either M_i ⊇ ℛ_{λ_0}(C_L) or M_i ∩ ℛ_{λ_0}(C_L) = {0} for every eigenvalue λ_0 of C_L with dim Ker(C_L − λ_0I) > 1. Now it remains to appeal to Corollary 17.2.2. □

We conclude this section with a statement concerning the stability of the property that a given factorization of a monic matrix polynomial is stable.
Theorem 17.2.6
Assume that

L(λ) = L_1(λ)L_2(λ) ⋯ L_r(λ)

is a stable factorization with monic matrix polynomials L_1(λ), L_2(λ), …, L_r(λ). Then there exists an ε > 0 such that every factorization

M(λ) = M_1(λ)M_2(λ) ⋯ M_r(λ)

with monic matrix polynomials M_1(λ), …, M_r(λ) is stable provided

σ_{l-l_2}(M_1, L_1) + σ_{l_2-l_3}(M_2, L_2) + ⋯ + σ_{l_{r-1}-l_r}(M_{r-1}, L_{r-1}) + σ_{l_r}(M_r, L_r) < ε

where, for i = 2, …, r, l_i is the degree of the products L_i ⋯ L_r and M_i ⋯ M_r.

The proof of Theorem 17.2.6 is obtained by combining Theorem 17.2.1 and Corollary 15.4.2.

17.3 LIPSCHITZ STABLE FACTORIZATIONS OF MONIC MATRIX POLYNOMIALS

A factorization

L(λ) = L_1(λ)L_2(λ) ⋯ L_r(λ)        (17.3.1)

of the monic matrix polynomial L(λ), where L_1(λ), …, L_r(λ) are monic matrix polynomials as well, is called Lipschitz stable if there exist positive constants ε and K such that any monic matrix polynomial L̃(λ) with σ_l(L̃, L) < ε admits a factorization L̃(λ) = L̃_1(λ) ⋯ L̃_r(λ) with monic matrix polynomials L̃_i(λ) satisfying

max{σ_{l-l_2}(L̃_1, L_1), σ_{l_2-l_3}(L̃_2, L_2), …, σ_{l_r}(L̃_r, L_r)} ≤ K σ_l(L̃, L)

Obviously, every Lipschitz stable factorization is stable. The converse is not true in general, as one can see from the results of this section.

We start with the correspondence between the factorization (17.3.1) and chains of C_L-invariant subspaces, where C_L is the companion matrix of L(λ), as described in Section 17.1.

Theorem 17.3.1
The factorization (17.3.1) is Lipschitz stable if and only if the corresponding chain of C_L-invariant subspaces

M_r ⊂ M_{r-1} ⊂ ⋯ ⊂ M_2        (17.3.2)

is Lipschitz stable.
526 Applications The Lipschitz stability of (17.3.2) is understood in the sense of Lipschitz stability of lattices of invariant subspaces (Section 15.6). In the particular case of chains, the chain (17.3.2) is, by definition, Lipschitz stable if there exist positive constants e and K [that depend on CL and the chain (17.3.2)] with the property that every nl x nl matrix A with \[A - CL\\ < e has a chain 5£r C • • • C %2 of invariant subspaces such that max(8(Mr, %r), ..., 6(M2, .%))< K\\A - CL\\ Proof. If the chain (17.3.2) is Lipschitz stable, then by Theorem 17.1.2 the factorization (17.3.1) is Lipschitz stable. Conversely, assume that the factorization (17.3.1) is Lipschitz stable but the chain (17.3.2) is not. Then there exists a sequence {Cm}^_, of nl x nl matrices such that \\Cm - CL\\ < (1/m) and for every chain 3?rC---C%2 of Cm-invariant subspaces the inequality max(0(Mr, <£r),..., 6(M2, i?2)) > m\\Cm - CL\\ (17.3.3) holds. We continue now with an argument analogous to that used in the proof of Theorem 17.2.1. Putting Sm=co\[QCim,]'i=i, where Q = col[5n]j=1, we verify that Sm is nonsingular (at least for large m) and that 5mCm5^' is the companion matrix associated with the matrix polynomial i-\ Mm(\) = \lI-2\,QC'mUmJ+l i = 0 where [Uml, Um2, . . . , Uml] = 5m'. We assume that Sm is nonsingular for m = 1, 2,. . . . Observe that col[QC"L-1][=, is the unit matrix /; so it is not difficult to check that for m = 1, 2,. . . <r,{Mm,L)*K,\\Cm-CL\\ (17.3.4) Here and in the sequel we denote certain positive constants independent of m by K-y, K2,.... As the factorization (17.3.1) is Lipschitz stable, for m sufficiently large the polynomial Mm(k) admits a factorization AUA) = Mlm(A)---Mrm(A) (17.3.5) with monic matrix polynomials Mlm(\),. . . , Mrm(A) such that max(cr (Mlm, L,),. . . , crPr(Mrm, Lr)) < K2<r,(Mm, L) (17.3.6)
Monic Matrix Polynomials 527 Let Mr m C • ■ • C M2 m be the chain of CM -invariant subspaces corresponding to the factorization (17.3.5). By Theorem 17.1.2 we have 2 6(Mjm, <2>) + <rt{Mm, L) < K\t crJMjm, L;)l (17.3.7) From (17.3.4), (17.3.6), and (17.3.7) one obtains t e(Mjm, ^)< rKxK2K3\\Cm - CL\\ (17.3.8) Put Tim = Sm'Mim for / - 2,. . . , r and m = 1, 2,. . . . Then Tim is Cm invariant for each m. Further, the formula for Sm shows that ||/-Sj|</g|Cm-Cj| (17.3.9) Indeed I~Sm=co\[Q(Ci-'-Ci-')]'i=] = col[Q(CL-2(CL -Cm)+ CL-3(CL - Cm)Cm + ■■■ + CL(CL - CJC'-3 + (Q - Cm)C-2%-_x and (17.3.9) follows. Now (cf. the proof of Theorem 17.2.1) fl(^,m,^,J^rnax||5m||-||C-/|| + l|/-5m|| <(max||5m||max||5;1|| + l)||/-5j|<A:5||Cm-Cj| Using this inequality and (17.3.8), we obtain r r 2 6(Yhm, %) < 2 [6^, M^m) + 6(Mim, #,)] < Kb\\Cm - CL\\ f-2 i-2 a contradiction with (17.3.3). □ Combining Theorem 17.3.1 with Theorems 15.6.2 and 15.5.1, we obtain the following corollary. Corollary 17.3.2 For the factorization (17.3.1) and the corresponding chain of CL-invariant subspaces (17.3.2), the following statements are equivalent: (a) the factorization (17.3.1) is Lipschitz stable; (b) all the CL-invariant subspaces
M_2, …, M_r are spectral; (c) for every ε > 0 sufficiently small there exists a δ > 0 with the property that every nl × nl matrix B with ‖B − C_L‖ < δ has a unique chain of invariant subspaces N_r ⊂ N_{r-1} ⊂ ⋯ ⊂ N_2 such that max(θ(M_r, N_r), …, θ(M_2, N_2)) < ε.

Now we are ready to state and prove the main result of this section, namely, the description of Lipschitz stable factorizations. (Recall the definition of the metric σ_k on matrix polynomials given in Section 17.1.)

Theorem 17.3.3
The following statements are equivalent for a factorization

L(λ) = L_1(λ) ⋯ L_r(λ)        (17.3.10)

of the monic n × n matrix polynomial L(λ) of degree l, where L_1(λ), …, L_r(λ) are also monic matrix polynomials, of degrees p_1, …, p_r, respectively: (a) the factorization (17.3.10) is Lipschitz stable; (b) σ(L_j) ∩ σ(L_k) = ∅ for j ≠ k; (c) for every ε > 0 sufficiently small there exists a δ > 0 such that any monic matrix polynomial L̃(λ) with σ_l(L̃, L) < δ has a unique factorization L̃(λ) = L̃_1(λ) ⋯ L̃_r(λ) with the property that max(σ_{p_1}(L̃_1, L_1), …, σ_{p_r}(L̃_r, L_r)) < ε.

Proof. Observe that, for j = 2, …, r,

σ(C_L|_{M_j}) = σ(L_j ⋯ L_r)

where M_r ⊂ ⋯ ⊂ M_2 is the chain of C_L-invariant subspaces corresponding to the factorization (17.3.10) (see formula (17.1.4)). Also, denoting by M'_j a direct complement to M_j in M_{j-1} for j = 2, …, r, defining M_1 = ℂ^{nl}, and letting P_j: M_{j-1} → M'_j be the projector on M'_j along M_j, we have

σ(P_jC_L|_{M'_j}) = σ(L_{j-1})

So the subspaces M_j are spectral if and only if σ(L_j) ∩ σ(L_k) = ∅ for j ≠ k. Hence the equivalence (a)⇔(b) in Theorem 17.3.3 follows from the equivalence (a)⇔(b) in Corollary 17.3.2. Similarly, the equivalence (a)⇔(c) in Theorem 17.3.3 follows from the corresponding equivalence in Corollary 17.3.2, taking account of Theorem 17.1.2. □
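Condition (b) of Theorem 17.3.3 is the easiest of the three to test in practice. A sketch, again reusing the companion helper from Section 17.1 (tolerance ad hoc):

import numpy as np

def lipschitz_stable(factor_coeffs, tol=1e-8):
    # Theorem 17.3.3(b): Lipschitz stable iff the spectra of the factors
    # are pairwise disjoint.  Each factor is monic, given by its
    # non-leading coefficients, lowest degree first.
    spectra = [np.linalg.eigvals(companion(c)) for c in factor_coeffs]
    for i in range(len(spectra)):
        for j in range(i + 1, len(spectra)):
            dist = np.abs(spectra[i][:, None] - spectra[j][None, :])
            if dist.min() < tol:
                return False
    return True

f1 = [np.array([[-1.0]])]                # lam - 1
f2 = [np.array([[-2.0]])]                # lam - 2
print(lipschitz_stable([f1, f2]))        # True: disjoint spectra
print(lipschitz_stable([f1, f1]))        # False: common eigenvalue 1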
17.4 STABLE MINIMAL FACTORIZATIONS OF RATIONAL MATRIX FUNCTIONS: THE MAIN RESULT

Throughout this section W_0(λ), W_{01}(λ), W_{02}(λ), …, W_{0k}(λ) are rational n × n matrix functions that take the value I at infinity. We assume that

W_0(λ) = W_{01}(λ)W_{02}(λ) ⋯ W_{0k}(λ)

and that this factorization is minimal. The following notion of stability of this factorization is natural. Let

W_0(λ) = I_n + C_0(λI_δ − A_0)^{-1}B_0        (17.4.1)

W_{0i}(λ) = I_n + C_{0i}(λI_{δ_i} − A_{0i})^{-1}B_{0i}, i = 1, …, k        (17.4.2)

be minimal realizations for W_0 and W_{01}, W_{02}, …, W_{0k} (so δ is the McMillan degree of W_0, and δ_i is the McMillan degree of W_{0i} for i = 1, …, k). The minimal factorization W_0 = W_{01} ⋯ W_{0k} is called stable if for each ε > 0 there exists an ω > 0 such that ‖A − A_0‖ + ‖B − B_0‖ + ‖C − C_0‖ < ω implies that the realization

W(λ) = I_n + C(λI_δ − A)^{-1}B

is minimal and W admits a minimal factorization W = W_1W_2 ⋯ W_k, where, for i = 1, …, k, the rational matrix function W_i(λ) has a minimal realization

W_i(λ) = I_n + C_i(λI_{δ_i} − A_i)^{-1}B_i

with the extra property that ‖A_i − A_{0i}‖ + ‖B_i − B_{0i}‖ + ‖C_i − C_{0i}‖ < ε. Since all minimal realizations of a given rational matrix function are mutually similar (Theorems 7.1.4 and 7.1.5), this definition does not depend on the choice of the minimal realizations (17.4.1) and (17.4.2).

The next theorem characterizes stability of minimal factorizations in terms of spectral data.

Theorem 17.4.1
The minimal factorization W_0(λ) = W_{01}(λ)W_{02}(λ) ⋯ W_{0k}(λ) is stable if and only if each common pole (zero) of W_{0j} and W_{0p} (j ≠ p) is a pole (zero) of W_0 of geometric multiplicity 1.

The geometric multiplicity of a pole (zero) λ_0 of a rational matrix function W(λ) is the number of negative (positive) partial multiplicities of W(λ) at λ_0 (see Section 7.2).
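Minimal realizations can be evaluated and multiplied numerically. The following sketch evaluates W(λ) = I + C(λI − A)^{-1}B and confirms that the standard product (cascade) realization — the formula referred to as (7.3.4) later in this chapter — realizes W_1(λ)W_2(λ); the particular matrices are illustrative only:

import numpy as np

def W_eval(A, B, C, lam):
    # W(lam) = I + C (lam I - A)^{-1} B
    n, d = C.shape[0], A.shape[0]
    return np.eye(n, dtype=complex) + C @ np.linalg.solve(lam * np.eye(d) - A, B)

# two degree-one factors (delta_1 = delta_2 = 1, n = 2) and their cascade
A1, B1, C1 = np.array([[1.0]]), np.array([[1.0, 0.0]]), np.array([[1.0], [0.0]])
A2, B2, C2 = np.array([[2.0]]), np.array([[0.0, 1.0]]), np.array([[0.0], [1.0]])
A = np.block([[A1, B1 @ C2], [np.zeros((1, 1)), A2]])
B = np.vstack([B1, B2])
C = np.hstack([C1, C2])
lam = 0.37 + 0.5j
print(np.allclose(W_eval(A, B, C, lam),
                  W_eval(A1, B1, C1, lam) @ W_eval(A2, B2, C2, lam)))   # True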
We need some preliminary discussion before starting the proof of Theorem 17.4.1. As we have seen in Theorem 7.5.1, the minimal factorizations

W_0(λ) = W_{01}(λ) ⋯ W_{0k}(λ)        (17.4.3)

of W_0(λ) are in one-to-one correspondence with those direct sum decompositions

ℂ^δ = 𝓛_1 ∔ ⋯ ∔ 𝓛_k        (17.4.4)

for which the subspaces 𝓛_1 ∔ ⋯ ∔ 𝓛_p (p = 1, …, k) are A_0 invariant and the subspaces 𝓛_p ∔ 𝓛_{p+1} ∔ ⋯ ∔ 𝓛_k (p = k, …, 1) are A_0^× invariant, where A_0^× = A_0 − B_0C_0. Moreover, the minimal factorization (17.4.3) corresponding to the direct sum decomposition (17.4.4) is given by

W_{0j}(λ) = I + C_0π_j(λI − A_0)^{-1}π_jB_0, j = 1, …, k        (17.4.5)

where π_j is the projector on 𝓛_j along 𝓛_1 ∔ ⋯ ∔ 𝓛_{j-1} ∔ 𝓛_{j+1} ∔ ⋯ ∔ 𝓛_k; note that the realizations (17.4.5) are necessarily minimal. In the formula (17.4.5) the transformations C_0π_j: 𝓛_j → ℂ^n, π_jA_0|_{𝓛_j}: 𝓛_j → 𝓛_j, and π_jB_0: ℂ^n → 𝓛_j are understood as matrices of sizes n × l_j, l_j × l_j, and l_j × n, respectively, where l_j = dim 𝓛_j, with respect to some basis in 𝓛_j.

Let (A, B, C) be a triple of matrices of sizes δ × δ, δ × n, n × δ, respectively. Consider an ordered k-tuple Π = (π_1, …, π_k) of projectors in ℂ^δ. We say that Π is a supporting k-tuple of projectors with respect to the triple of matrices (A, B, C) if π_jπ_i = π_iπ_j = 0 for i ≠ j, π_1 + ⋯ + π_k = I, the subspaces Im(π_1 + ⋯ + π_p) for p = 1, 2, …, k are A invariant, and the subspaces Im(π_p + π_{p+1} + ⋯ + π_k), p = 1, …, k, are A^× invariant, where A^× = A − BC. Clearly, Π is a supporting k-tuple of projectors with respect to (A_0, B_0, C_0) if and only if the subspaces 𝓛_j = Im π_j (j = 1, …, k) form a direct sum decomposition of ℂ^δ as in (17.4.4).

A supporting k-tuple of projectors Π = (π_1, …, π_k) with respect to (A, B, C) will be called stable if for every ε > 0 there exists an ω > 0 such that, for any triple of matrices (A', B', C') of sizes δ × δ, δ × n, n × δ, respectively, with ‖A − A'‖ + ‖B − B'‖ + ‖C − C'‖ < ω, there exists a supporting k-tuple of projectors Π' = (π'_1, …, π'_k) with respect to (A', B', C') such that

Σ_{j=1}^{k} ‖π'_j − π_j‖ < ε

The first step in the proof of Theorem 17.4.1 is the following lemma.

Lemma 17.4.2
Let (17.4.1) be a minimal realization for W_0(λ), and let Π = (π_1, …, π_k) be a supporting k-tuple of projectors with respect to (A_0, B_0, C_0), with the corresponding minimal factorization

W_0(λ) = W_{01}(λ)W_{02}(λ) ⋯ W_{0k}(λ)        (17.4.6)

(so that, for j = 1, …, k, W_{0j}(λ) = I + C_0π_j(λI − A_0)^{-1}π_jB_0 with respect to some basis x_1^{(j)}, …, x_{l_j}^{(j)} in Im π_j). Then Π is stable if and only if the factorization (17.4.6) is stable.
Rational Matrix Functions: The Main Result 531 The proof of Lemma 17.4.2 is rather long and technical and is given in the next section. Next, we make the connection with stable invariant subspaces. Lemma 17.4.3 Let 11 = (7r1; . . . , irk) be a supporting k-tuple of projectors with respect to (A0, BQ, C0). Then n is stable if and only if the A0-invariant subspaces Im(7r, + • • • + 77;), ; = 1, . . . , k are stable and the Aq -invariant subspaces Im^ + 7T/+1 + • ■ • + irk), j = 1,. . . , k are stable as well (as before, A^ = ■■4(1 — B0CU). Again, it will be convenient to relegate the proof of Lemma 17.4.3 to the next section. Proof of Theorem 17.4.1. Let 11 = (ttx, . . . , Trk) be the supporting /c-tuple of projectors with respect to (A0, BQy C0) that corresponds to the minimal factorization W0(A) = W01(A)W02(A)---W0A(A) (17.4.7) By Lemmas 17.4.2 and 17.4.3 the factorization (17.4.7) is stable if and only if the A0-invariant subspaces ^ = Im^ + ■ • • + 7^),;' = 1, . . . , k are stable and the Aq -invariant subspacesM, = Im(7r; + irj+i H + irk),j= 1, . . . , k are stable as well. With respect to the decomposition <p6 = Im 7r, + Im ir2 + ■ ■ ■ + Im trk, write An A12 L 0 22 0 AuL. K = A x A x 0 0 1kk In view of Lemma 17.2.4, !£■ is stable if and only if, for every common eigenvalue A0 of In 0 0 Al2 ' A22 • 0 • •• Aul " A.2> ■ An- and 'A> + i, 0 0 ; + i •^/ + i,y+2 Aj + 2,i + 2 0 / + ],* i + 2,k lkk we have dim Ker( A0/ - A) = 1. So all the subspaces if,,. . . , iPt are stable if and only if every common eigenvalue of Afi and A (j^p) is an eigenvalue of A0 of geometric multiplicity 1. Similarly, all the subspaces Mx,. . . , Mk
532 Applications are stable if and only if every common eigenvalue of A* and A* with j ^ p is an eigenvalue of Aq of geometric multiplicity 1. It follows that the factorization (17.4.7) is stable if and only if every common eigenvalue of A- and App (resp. of A*} and A*p) with / ¥^p is an eigenvalue of A0 (resp. of Aq) of geometric multiplicity 1. To finish the proof, observe that the realizations (17.4.5) are minimal and hence, by Theorem 7.2.3, the poles (resp. zeros) of WQj(\) coincide with the eigenvalues of ttjAqttj = An (resp. eigenvalues of ttjA^tti - A*). Also, the partial multiplicities of a pole (resp. zero) A0 of Woj are equal to the partial multiplicities of A0 as an eigenvalue of Ajj (resp. Ajj). Analogous statements hold for the poles and zeros of W0(A) and eigenvalues of A0 and A%. □ 17.5 PROOF OF THE AUXILIARY LEMMAS We start with the proof of Lemma 17.4.2. Assume that II is stable. Given e >0, let e' be a positive number that we fix later. By Lemma 13.3.2 there exists an w, >0 with the property that, for any projector Tr'f such that || -rrj'. — 77^ || < &>,, there exists an invert- ible transformation Sf. (pa->- (pa with 5y(Im irj = Im tr\ and ||/ - 5y|| < e'. We also assume that Wj ^ min(e\ 1). Further, let a>2 be the number corresponding to to, as defined by the stability of II. As the realization (17.4.1) is minimal, in view of Theorem 7.1.5 the matrix col[C0^;0]p0' is left invertible, where p is the degree of the minimal polynomial for A0, and the matrix [B0, A0B0,. . . , Ap0 lBa] is right invertible. Since the left (right) invertibility of a matrix X is stable under small perturbations [indeed, if \\Y- X\\ < ||A''||~1, then Y is also left (right) invertible), there exists a t >0 such that the realization W(X)=I„ + C(\Is-AylN (17.5.1) is minimal provided \\A - A0\\ + \\B - B0|| + \\C - CQ\\ < r. Now put a) = min(w,, <o2, 7, e') and let (A, B, C) be such that ||/l - ^0|| +||fl - B0|| + ||C-C0|| <« Then the realization (17.5.1) is minimal. By the stability of II, there exists a supporting /c-tuple of projectors IT = (tt\, . . . , ir'k) with respect to (.4, B, C) such that 2lk;-«ill<«, / = i For ; = 1,..., let 5;: (p6—»(p6 be invertible transformations with Sy(Im tt,-) = Im it) and ||/ - 5;|| < e'. Now put
Proof of the Auxiliary Lemmas 533 Wj(k) = /„ + Cir,Sy( A/ - S;,ir'iATT'jSjylS:,TT'iB (17.5.2) for each j, where the transformation 5y is understood as 5y: Im it;—»Im 7rJ. Also, we regard the rational functions (17.5.2) as matrix functions with respect to the basis introduced for Im 7r;. We have the minimal factorization W(A)=W1(A)W2(A)---W'il(A) Moreover, writingp = max(||A0||, ||flj, ||C0||, ||7r,||,. . . , ||7rJ|), we obtain < ||C07ry(/- S,.)|| + ||C0(ir; - ir;.)5;|| + ||(C- C0)«-;5,|| + ysrV^oCfl) - 7r;)5;|| + ||5rV;(^0 - >t>«-;sy|| +1|(/- s;VyBj + ||5r,(7r;-7r;.)B0|| + ||5r17r;(B0-B)|| < pV + pw,(l + e') + o>(w, + p)(l + e') + ||(/- s:')||P3 + ||5-'||p36' + ||sr'|kp2(l + e') + II^'IIK + pWi +*') + 1157'IIK + p)2»(i +O + P- 5;MIp2 + ||5:,||a,1p + ||5:1||(Wl+p)W Use the inequalities w, < e', <o s e' and the inequalities HSr'H < (1 - e')~", ||7— 5rl|| < e'(l - e')_1 (assuming e'< 1; cf. the proof of Theorem 16.2.1) to get ||C0ir, - Cir'jSjW + WirjA^ - SJ^AvftW + ||ir,B0 - Sy"V;B|| < p2e' + pe'(l + e') + e'(e' + p)(l + e')+2e'(l - e')"'p3 + (1 - 6')" VP2(1 + 6') + (1 - *')" V + p)p€'(l + 6') + (1 - e')~ V + P)2^'(l + «') + «'(1 - e')"'p2 + (1 - «')" Vp + (l-6T,(e' + P)6' It remains to choose e' < 1 in such a way that this expression is less than e, and the stability of factorization (17.4.6) is proved. Conversely, let the factorization (17.4.6) be stable and assume that II is not stable. Then there exist an e>0 and sequences {Am}°^=l, {Bm}°°m^l, {Cm)Z = i such that Km{\\Am - A0\\ + \\Bm - B0|| + ||Cm - C0||} =0 (17.5.3) and
534 Applications i-l where II = (ir[, . . . , ir'k) is any supporting /c-tuple of projectors with respect to at least one of the triples (Am, Bm, Cm), m = 1, 2, . . . . Since (17.5.1) is a minimal realization, we can assume (using Theorem 7.1.5 and the fact that the full-range and null kernel properties of a pair of transformations are preserved under sufficiently small perturbation of this pair) that WJ\)d=In + Cm(\Is-Am)-lBm is minimal for all m. In view of the stability of (17.4.6), we can also assume that each Wm(\) admits a minimal factorization *UA) = Wml(K)Wm2{k)-- ■ Wmk{K) (17.5.4) where for /' = 1, 2,. . . k, we obtain Wmi(\) = I+Cmi(U-Amjy1Bm/ (17.5.5) and cm{-Im "/-* <p" , Amr Im fl)-*Im irt, Bmj: (p"^Im ■ni are transformations written as matrices with respect to the basis introduced for Im iTj with the property that Urn (He,,. - C07ry|| + \\Amj - Vt07r;.|| + \\Bmi - ^B0\\} = 0 (17.5.6) For fixed m, consider the minimal realization Wm(\) = I+Cm(\I-Am) lB„ where *-m l^ml' ^m2' ■ • • > ^mk\ A_ = Am\ BmlCm2 "ml^mk BmlCmk B_ = Bml ml Bm-, Bmk-
Proof of the Auxiliary Lemmas 535 obtained from the minimal factorization (17.5.3) [cf. formula (7.3.4)]. As any two minimal realizations of Wm(\) are similar, there exists an invertible transformation Sm: Im ttx 4- • • ■ 4- Im irk—*- <p such that L-.A-, — C_ , om Amj_ — j4_ , im om — o„ Actually, such an Sm is unique, and from the explicit formula for Sm (Theorem 7.1.3) we find, using (17.5.3) and (17.5.6), that Sm —*■ I as m —> ». Now let n(m) = (tr\m),. . . , trkm)) be the supporting /c-tuple of projectors with respect to (Am,Bm,Cm), which corresponds to the minimal factorization (17.5.4). Thus, for / = 1,. . . , k we have Wmj(X) = I+ Cm7r<m)(A/- ^Am^y^Bm (17.5.7) and hence 7r;(m) = SmTrjSm1. We find that £*=1 ||77;(m) - «ry||-*0 as /n^oo, a contradiction with the choice of (Am, Bm, Cm). Lemma 17.4.2 is proved. We pass on to the proof of Lemma 17.4.3. Assume that the subspaces Im(7r, + • • • + 7T;), j = 1,. . . , k are stable ^-invariant subspaces and that Im(7j-:+ 77/+1 H + TTk) are stable Aq -invariant subspaces. Arguing by contradiction, assume that II is not stable. Then there exist an e>0 and sequences {AJl = l,{Bm}l=l, and {CJ^., such that Hm {\\Am - A0\\ + \\Bm - B0\\ + \\Cm- C0\\) =0 and 2 Ik;- ir{\\ >e (17.5.8) for every supporting /c-tuple of projectors (ir[,. . . , ir'k) with respect to (Am,Bm,C), m = l,2, .... Then clearly Am^AQ and A* = Am - BmCm-* Aq as m—*«>. By assumption, and using Theorem 15.6.1, for each positive integer m there exists a sequence of chains of subspaces {0} C £\m) C---C £km_\ C %\m) = <p6, such that &?\ ..., %[m) are Am invariant and Km 0(#y<M),Im(ir1 +■■• + !>))) = {) for ; = 1, ...,* (17.5.9) Similarly, there exists a sequence of chains of subspaces ^s = M(r)DM^)D---DM[m)D{0}, rn = l,2,... such that M^m), j= I,. . . ,k are A^ invariant and Mm d{M<"\ ImCfly + 7r; + I + - • • + ttJ) = 0, / = 1, . . . , /c (17.5.10)
536 Applications As Im(7r, + ■ • • + 7r;) + Im(7r/+1 + ■ • • + irk) = $s, for / = 1,. . . , k - 1 and sufficiently large m, we find, using Lemma 13.3.2, that Now let jv<»> = ^<»)n^{"), 7 = 1,..., k It is easy to see that ^<m)ni?;(m> = {0}, j = 2,...,k (17.5.11) Furthermore ^(,m) + • • ■ + jV<m> = jjp<m) , / = 1 it (17.5.12) Indeed, (17.5.12) obviously holds for ;'=1. Assuming that (17.5.12) is proved for / = p - 1, we have jf™ + ..- + j{W = #£>, + (#<,"■> n J< <m)) where is clearly contained in ifj,m). Take xE££^\ and write jc = y + z, where y e i^m', and z6i,w. Then z = x-y E-S?*"0, and x E <e<?_\ + (g^nM^). So (17.5.12) is proved. Combining (17.5.11) and (17.5.12), we find that Jf\m) + ■■■ + Jf™ = <fa Developing an analog of the proof of (17.5.12), one proves that Jf)m) + JV£> + • ■ • + JVf' = M<;m), j = 1,... , k For sufficiently large m, let 7rjm) be the projector on ^V;(m) along JV"(,m) 4- • • • 4-^V)mJ 4-^V]^ +--- + Jf^"\ Then the fc-tuple of projectors (iriM>, 7r<m),. . . , 7r(r') is supporting for (,4m, Bm, Cm). Denoting by T<m) the projector on %f> along .*<"{ (/= 1, ...,*-1), we have T;(m) = 7r(,m) + • - • + 7r]m). On the other hand, (17.5.9) and (17.5.10) imply, in view of Theorem 13.4.3, that for /' = 1,. . . , k - 1. lim||r<m,-(7r1+-- + 7ry)||=0 and so limm_0O ||7r/(m) - 7ry|| =0, a contradiction with (17.5.8). Conversely, assume that n is stable, but one of the j40-invariant sub- spaces Im 7r,,. . . , 1111(77-, + ■ ■ ■ + irk), say, Im(7r, + • • ■ + Try), is not stable.
Rational Matrix Fnnctions: Further Deductions 537 Then there exist an e>0 and a sequence {Am}*°m=x such that ||/lm- .<40||-»0 as m-*°° and 6)(J<,Im(7r1+-- + 7r/))>6 (17.5.13) for every j4m-invariant subspace M (m = 1, 2,. . .). As II is stable, there exists a sequence of k-tuples of projectors II<m) = (7r(,m>,. . . , 7r^m)), m = 1,2,... such that II<m) is supporting for (Am, BQ, C0) and \immJi\\^-^\\]=0 Hence for the v4m-invariant subspace Im(ir(1m) + • • • + 7rjm') we have Jim fl(Im(fl-(1",) + • ■ • + 7r;(m)), Im(7r, + • ■ • + ny)) = 0 a contradiction with (17.5.13). In a similar way, one arrives at a contradiction if n is stable but one of the Aq -invariant subspaces Im(?7v + ir-+, + • - - + 7rA), j - 1,. . . , /c, is not stable. Lemma 17.4.3 is proved completely. 17.6 STABLE MINIMAL FACTORIZATIONS OF RATIONAL MATRIX FUNCTIONS: FURTHER DEDUCTIONS In this section we use Theorem 17.4.1 and its proof to derive some useful information on stable minimal factorizations of rational matrix functions. First, let us make Theorem 17.4.1 more precise in the sense that if the minimal factorization Wo(A) = W01(A)---W0i(A) (17.6.1) is stable, then so is every minimal factorization sufficiently close to (17.6.1). Theorem 17.6.1 Assume that (17.6.1) is a stable minimal factorization, and let W0{\) = /„ + C0(A/S - AoylB0 (17.6.2) and W0>(A) = /„ + C0/(A/(. - AQjy'B0j, / = !,...,* (17.6.3)
538 Applications be minimal realizations of W0(A) and W0;(A). Then every minimal factorization W(\)=Wi(\)-Wk(\) with minimal realizations W(\) = I„ + C(XIS-A)-1B and Wt(\) = /„ + C,(A/(/- A/)'lBj, j = 1,. . . , k is stable provided ||/l->l0|| + ||fl-fl0|| + ||C-CJ + J. {\\Ar A0j\\ + \\Br B0i\\ + \\Cr C0j\\) is small enough. The proof of this result is obtained by combining Corollary 15.4.2 with Lemmas 17.4.2 and 17.4.3. Let us clarify the connection between isolatedness and stability for minimal factorizations. The minimal factorization (17.6.1) is called isolated if the following holds: given minimal realizations W = /. + WrM"\ for j: = 1,. . . , k, there exists e > 0 such that, if is a minimal factorization with rational matrix functions VVo,(A)- • • WQk(X) that admit minimal realizations WQi(\) = /„ + C0;(A/,;- AQj)-lBQj (17.6.4) such that 2 {\\A0i - Ay\ + \\B0j - BJ + \\C0j - CQj\\} < e then necessarily W0j(\) = W0j(\) for each j. It is easily seen that this definition does not depend on the choice of the minimal realization (17.6.4).
Rational Matrix Functions: Further Deductions 539 From the proof of Theorem 17.4.1 and the fact that the stable invariant subspaces coincide with the isolated ones (Section 14.3), it is found that this property also holds for stable minimal factorizations: Theorem 17.6.2 The minimal factorization (17.6.1) is stable if and only if it is isolated. Consider again the minimal factorization (17.6.1) with given minimal realizations (17.6.2) and (17.6.3) for W0(\) and W01(A),. . . , W0k(\). We say that (17.6.1) is Lipschitz stable if there exist positive constants e and K with the following property: for every triple of matrices (A, B, C) with appropriate sizes and with \\A - j40|| + ||fl - B0|| + \\C - C0\\ < e, the realization W(\) = I„ + C(\IS-A)~,B is minimal and W(X) admits a minimal factorization W= W,W2 • • • Wk such that, for /' = 1, . . . , k, W^( A) has a minimal realization Wj(\) = In + Cj(Ulj-Aj)1Bj where, for each / U^A^W + WB.-Bj + Wq-Cj *K{\\A-A0\\ + \\B-B0\\ + \\C-C0\\} Again, the proof of Theorem 17.4.1, together with the description of Lipschitz stable invariant subspaces (Theorem 15.5.1), yields a characterization of Lipschitz stable minimal factorizations, as follows. Theorem 17.6.3 For the minimal factorization (17.6.1), the following statements are equivalent: (a) equation (17.6.1) is Lipschitz stable; (b) for every pair of indices j¥"p, the rational functions W0/(A) and WQp(\) have no common zeros and no common poles; (c) given minimal realizations (17.6.2) and (17.6.3) of W0( A) and W01(A),. . . , W0k(\), for every sufficiently small e>0 there exists an <o>0 such that for any triple (A, B, C) with \\A - AQ\\ + \\B - B0\\ + \\C - C0\\ < (o the realization W(X) = In + C(XI8-Ay,B is minimal and W( A) admits a unique minimal factorization W( A) = Wl(X)W2(X) ■ ■ • Wk(X) with the property that for j = 1,. . . , k each W^k) has a minimal realization
540 Applications wi(\) = in + cj(\i,i-Aiy1Bi satisfying K~ ^11 + 11^-^11 + IICf-C0J\\<e 17.7 STABILITY OF LINEAR FRACTIONAL DECOMPOSITIONS OF RATIONAL MATRIX FUNCTIONS Let £/(A) be a rational ^xj matrix function with finite value at infinity. In this section we study stability of minimal linear fractional decompositions U(\) = &W(V) (17.7.1) where W(\) and V(A) are rational matrix functions of suitable sizes that take finite values at infinity. (See Sections 7.6-7.8 for the definition and basic facts on linear fractional decompositions.) In informal terms, the stability of (17.7.1) means that any rational matrix function t/(A) sufficiently close to U(X) admits a minimal linear fractional decomposition U(X) = &W(V), where the rational matrix functions W(X) and V(X) are as close as we wish to W(X) and V(X), respectively. To make this notion precise, we resort to minimal realizations for the matrix functions involved. Thus let U(X) = 8 +y(\I-a)',B (17.7.2) be a minimal realization of U( A), where a, B, y, and 8 are matrices of sizes / x /, / x s, q x /, and q X s, respectively. Also, let and W(X) = D + C(XI-A)~lB V(X) = d + c(XI-aylb be minimal realizations of W(\) and V(X). We say that the minimal linear fractional decomposition (17.7.1) is Lipschitz stable if there exist positive constants e and K such that any q x s rational matrix function U( A) that admits a realization U(A) = 8 + y(Xl-aylp (17.7.3) with
Decompositions of Rational Matrix Functions 541 max{||S-S||,||y-y||, ||/3-/3||,||«-a||) <€ (17.7.4) has a minimal linear fractional decomposition where the rational matrix functions W(\) and V(\) admit realizations W(\) = D + C(\I- AylB , Vi^^d + ciXI-dy'b with the property that max{||D - D||, \\C-C\\, \\B- B\\, \\A - A\\, \\d-d\\, \\c - c\\, ||6-ftl|,||a-fl||}sJfmax{|lS-S||,||f-y||, H/3-jB||, Hd-a||} (17.7.5) It is assumed, of course, that the sizes of two matrices coincide each time their difference appears in the preceding inequalities. Since any two minimal realizations of the same rational matrix function are similar (Theorems 7.1.4 and 7.1.5), it is easily seen that the definition of Lipschitz stability does not depend on the particular choice of minimal realizations for £/(A), W(\), and V(A). It is remarkable that a large class of minimal linear fractional decompositions is Lipschitz stable, as opposed to the factorization of monic matrix polynomials and the minimal factorization of rational matrix functions, where Lipschitz stability is exceptional in a certain sense (Sections 17.3 and 17.6). Theorem 17.7.1 Let U(\)=&w(V) = W2l(\) + W22(\)V(\)(I-Wn(\)V(\))-'Wu(\) (17.7.6) be a minimal linear fractional decomposition, where W»A)-[^> "^ MA) W22(\)l is a suitable partition of W(X). Assume that the rational matrix functions W(X) and U(\) take finite values at infinity, and assume, in addition, that the matrices Wu(°°) and W22(^>) are invertible. Then (17.7.6) is Lipschitz stable.
542 Applications Proof. We make use of Theorem 7.7.1, which describes minimal linear fractional decompositions in terms of reducing pairs of subspaces with respect to the minimal realization (17.7.2). Thus there exists an [a /3]- invariant subspace Mx C (p' and an -invariant subspace M2 C <p', which are direct complements to each other and such that for some transformations F:<p'-><p* and G: <£*-» <p' with (a + pF)MlCMl and (a+ Gy)M2CM2 the formulas (7.7.5)-(7.7.10) hold. Moreover, one can choose F and G in such a way that M, is a spectral invariant subspaces (i.e., a sum of root subspaces) for a + /3F and J<2 is a spectral invariant subspace for a + Gy. Indeed, Theorem 7.7.2 shows that the linear fractional decomposition (17.7.6) depends on (Mj, M2; F\M[, Qm G) only, where QM is the projector on Mx along M2. [Of course, it is assumed that the minimal realization (17.7.2) of £/(A) is fixed.] But the proof of Theorem 15.8.1 shows that there exists a transformation F': $'-*$s such that F'x = 0 for all xEMl and the (a + 0(F + F'))-invariant subspace Mx is spectral. So we can replace F by F + F'. Similarly, one proves that G can be chosen with spectral (a + Gy)-invariant subspace M2. In the rest of the proof we assume that F and G satisfy this additional property. Now let £/( A) be another rational q x s matrix function with finite value at infinity that admits a realization (17.7.3) with the property (17.7.4). Here the positive number e > 0 is sufficiently small and is chosen later. First, observe that for e >0 small enough the realization (17.7.3) is also minimal. Indeed, by Theorem 7.6.1 we have 2lm(a'jB) = 4:\ H Ker(y«')={0} which means the right invertibility of [/}, a/},. . . , a'-1/}] and the left invertibility of J ya .ya Since one-sided invertibility of a matrix is a property that is preserved under small perturbations of that matrix, our conclusion concerning minimality of (17.7.3) follows. Recall (Theorem 15.8.1) that the spectral invariant subspaces Ml and M2 for (a + /3F) and for (a + Gy), respectively, are Lipschitz stable. It follows that there exists a constant AT,>0 such that d + /3F and a + Gy have invariant subspaces Mx and M2, respectively, with the property that
Decompositions of Rational Matrix Functions 543 0(Ml,M1) + 6(M2, M2)^K, max{||d - «||, ||j§ - fi\\, \\y - y\\, \\S - S\\) provided e is small enough. By Lemma 13.3.2, by choosing sufficiently small e we ensure that Ji, and Ji2 are again direct complements to each other. In other words, (MX,M2) is a reducing pair with respect to the realization (17.7.3). Let d = d, Du = £>„, £>22 = D22, Dl2 = Dn, and D2l = 8 - D22d(I-D12d)~lDu Also, put F=F, G = G. By Theorem 7.7.1 we obtain a minimal linear fractional decomposition £/(A) = ^^,(V), where the functions W(\) and V(\) are given by formulas (7.7.5)-(7.7.10) except that each letter (with the exception of <pV<P*> <P') nas a tilde. These formulas show that for e>0 small enough there is a positive constant K satisfying (17.7.5) provided F and G satisfy the following property: given a basis /,,..., fk in Mt, there exists a positive constant K2 (which depends on this basis only) such that ||F, - F,|| + ||G, - G,|| < K2{e(MuMx) + 6(M2, M2)} (17.7.7) Here F, = F\M : M^ —*■ (p1 and G, = QM G: $q—*■ M^, where QM stands for the projector on Mx along M2, are transformations written as matrices with respect to the basis /,,. . . , fk (and the standard orthonormal bases in (p* and (p*), and /", = F\Mt: Mx-^> (p1, G, = QUG: <p*—»^, are similarly defined matrices with respect to some basis gx,. . . , gk'm Mx, where QM is the projector on Ml along M2. To prove the existence of a constant ^2>0 with the property (17.7.7), we appeal to Lemma 13.3.2. In view of this lemma, in case Jtl and M2 are sufficiently close to Mx and M2, respectively, there exists a constant K3>0 (depending on Mx and M2 only) such that max(||/- 5||, ||/- 5"'||)< K3{e{Mx, Mx) + 6(M2, M2)} for some invertible transformation S:(p'—»<p' such that SM, = Mx and SM2 — Jt2. It remains to choose g, = S/j,.. . , gk = Sfk. O It is instructive to compare Theorem 17.7.1 with Theorems 17.4.1 and 17.6.3. Thus any minimal factorization f/(A) = f/,(A)f/2(A), where £/,(A) and t/2(A) are nx n rational matrix functions with value / at infinity, is Lipschitz stable in the class of minimal linear fractional decompositions. In contrast, this minimal factorization need not be Lipschitz stable (or even stable) in the class of minimal factorizations. The following example illustrates this point:
544 EXAMPLE 17.7.1. Applications Let «™-[,+.A"' , 0 1 It is easily seen that U(X) admits a minimal factorization «™-['r x 1+v.] (17.7.8) This minimal factorization is not stable because the perturbed rational matrix function does not have nontrivial minimal factorizations at all. On the other hand, (17.7.8) can be represented as a minimal linear fractional decomposition U(k) = 9w(V) with W(A) = diag[l,l,l + A-,,l]; V(A)=[J 1+°a_,] Observe that W(\) has a minimal realization W(A) = / + 0 0 1 OJ (A-0)"'[0 0 1 0] Now f/e(A) also admits a minimal linear fractional decomposition U€(\) = &W(V), where W.(A) = 0 0 0 Moreover, W€(A) has a minimal realization 1 0 0 0 1 -6A"1 0 0 1 + A"1 0 0 6A"1 W.(A) = / + ro- 0 1 Lo. (A-ori[o,-6,i,6} Hence, as predicted by Theorem 17.7.1, the minimal factorization (17.7.8) is Lipschitz stable when understood as a minimal linear fractional decomposition. □
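The linear fractional operation itself is a one-line computation at each point λ. A sketch of (17.7.6), with all blocks supplied as callables and purely illustrative data (we read the last factor of (17.7.6) as W_{12}, matching the partition of W(λ) displayed there):

import numpy as np

def lft(W11, W12, W21, W22, V, lam):
    # F_W(V)(lam) = W21 + W22 V (I - W11 V)^{-1} W12, blocks as callables
    V_ = V(lam)
    M = np.eye(W11(lam).shape[0]) - W11(lam) @ V_
    return W21(lam) + W22(lam) @ V_ @ np.linalg.solve(M, W12(lam))

n = 2
I2 = lambda lam: np.eye(n)
Z2 = lambda lam: np.zeros((n, n))
V = lambda lam: np.array([[0.0, 1.0], [1.0, 1.0 + 1.0 / lam]])

# with W11 = 0 and W12 = W22 = I the map reduces to W21 + V
print(np.allclose(lft(Z2, I2, Z2, I2, V, 2.0), V(2.0)))               # True
print(np.allclose(lft(Z2, I2, I2, I2, V, 2.0), np.eye(n) + V(2.0)))   # True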
17.8 ISOLATED SOLUTIONS OF MATRIX QUADRATIC EQUATIONS

Consider the matrix quadratic equation

XBX + XA − DX − C = 0        (17.8.1)

where A, B, C, D are known matrices of sizes n × n, n × m, m × n, m × m, respectively, and X is a matrix of size m × n to be found. For any m × n matrix X, let

G(X) = { [x; Xx] | x ∈ ℂ^n } ⊂ ℂ^n ⊕ ℂ^m

be the graph of X. The following proposition connects the solutions of (17.8.1) with invariant subspaces of the (m + n) × (m + n) matrix

T = [A B; C D]

Proposition 17.8.1
For an m × n matrix X, the subspace G(X) is T invariant if and only if X satisfies (17.8.1).

Proof. Assume that G(X) is T invariant. Then for every x ∈ ℂ^n there exists a y ∈ ℂ^n such that

T [x; Xx] = [y; Xy]

The correspondence x → y is clearly linear, so y = Zx for some n × n matrix Z, and we have

[A B; C D] [x; Xx] = [Zx; XZx]

for all x ∈ ℂ^n, or

[A B; C D] [I; X] = [I; X] Z        (17.8.2)

This implies Z = A + BX and

C + DX = XZ = X(A + BX)

which means that (17.8.1) holds. Conversely, if (17.8.1) holds and Z := A + BX, then (17.8.2) holds. This implies the T invariance of G(X). □
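Proposition 17.8.1 can be verified mechanically: compute the residual of (17.8.1) and test whether T maps the columns of [I; X] back into their span. A sketch on a small instance (the same data reappears in Example 17.8.1 below):

import numpy as np

def riccati_residual(X, A, B, C, D):
    # Residual of (17.8.1): X B X + X A - D X - C
    return X @ B @ X + X @ A - D @ X - C

def graph_is_invariant(X, T, tol=1e-10):
    # T-invariance of G(X): T [I; X] must lie in the column span of [I; X]
    n = X.shape[1]
    G = np.vstack([np.eye(n), X])          # basis of the graph of X
    TG = T @ G
    coeff, *_ = np.linalg.lstsq(G, TG, rcond=None)   # T G = G Z, Z = A + B X by (17.8.2)
    return np.linalg.norm(G @ coeff - TG) < tol

A = np.array([[1.0]]); B = np.array([[1.0, 1.0]])
C = np.zeros((2, 1));  D = np.array([[1.0, 0.0], [0.0, 0.0]])
T = np.block([[A, B], [C, D]])
X = np.array([[0.0], [-1.0]])              # one of the two solutions found below
print(np.linalg.norm(riccati_residual(X, A, B, C, D)))   # ~0
print(graph_is_invariant(X, T))                           # True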
To take advantage of Proposition 17.8.1 in describing isolated solutions of (17.8.1), we need a preliminary result.

Lemma 17.8.2
Define a map 𝒢 from the set M_{m×n} of all m × n matrices to the set of all subspaces in ℂ^n ⊕ ℂ^m by 𝒢(X) = G(X), the graph of X. Then 𝒢 is a homeomorphism (i.e., a bijective map that is continuous together with its inverse) between M_{m×n} and the set of all subspaces M ⊂ ℂ^n ⊕ ℂ^m with the property that θ(M, ℋ) < 1, where ℋ = ℂ^n ⊕ {0}. Here θ(M, N) is the gap between M and N (see Chapter 13).

Proof. The continuity of 𝒢 and 𝒢^{-1} follows from the easily verified fact that the orthogonal projector P on G(X) is given by

P = [L  LX* ; XL  XLX*]        (17.8.3)

where L = (I + X*X)^{-1}.
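Formula (17.8.3) is convenient computationally, since it gives the orthogonal projector onto G(X) in closed form. A sketch that also confirms θ(G(X), ℋ) < 1, as the lemma asserts:

import numpy as np

def graph_projector(X):
    # Orthogonal projector onto G(X) as in (17.8.3):
    # P = [[L, L X*], [X L, X L X*]] with L = (I + X* X)^{-1}
    n = X.shape[1]
    L = np.linalg.inv(np.eye(n) + X.conj().T @ X)
    return np.block([[L, L @ X.conj().T],
                     [X @ L, X @ L @ X.conj().T]])

rng = np.random.default_rng(1)
n, m = 2, 3
X = rng.standard_normal((m, n))
P = graph_projector(X)
print(np.allclose(P, P @ P), np.allclose(P, P.conj().T))   # an orthogonal projector

PH = np.zeros((n + m, n + m)); PH[:n, :n] = np.eye(n)      # projector onto H
print(np.linalg.norm(P - PH, ord=2) < 1)                   # True: gap to H is < 1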
Let us check that θ(G(X), ℋ) < 1. By Theorem 13.1.1

θ(G(X), ℋ) = max{ sup_{x∈ℋ, ‖x‖=1} ‖(I − P)x‖ , sup_{x∈G(X), ‖x‖=1} ‖(I − P_ℋ)x‖ }        (17.8.4)

where P_ℋ is the orthogonal projector on ℋ. The second supremum is

sup ‖(I − P_ℋ)[y; Xy]‖ = sup ‖Xy‖ , where ‖[y; Xy]‖ = 1, that is, ‖Xy‖² = 1 − ‖y‖²

As ‖Xy‖/‖y‖ ≤ ‖X‖ is uniformly bounded, it follows that ‖y‖ is bounded away from zero. Hence the second supremum in (17.8.4) is less than 1. To show that the first supremum in (17.8.4) is also less than 1, assume (arguing by contradiction) that

sup_{‖x‖=1} ‖(I − P)[x; 0]‖ = 1

As ‖(I − P)[x; 0]‖² + ‖P[x; 0]‖² = ‖[x; 0]‖² = 1, it follows that

inf_{‖x‖=1} ‖P[x; 0]‖ = 0

and by formula (17.8.3)

inf_{‖x‖=1} ‖[Lx; XLx]‖ = 0        (17.8.5)

But L is invertible, so

‖x‖ = ‖L^{-1}Lx‖ ≤ ‖L^{-1}‖ ‖Lx‖

and (17.8.5) is impossible. Thus θ(G(X), ℋ) < 1, as claimed.

Now we must show that every subspace M ⊂ ℂ^n ⊕ ℂ^m with θ(M, ℋ) = α < 1 is a graph subspace, that is, M = G(X) for some X. First, Theorem 13.1.2 shows that dim M = dim ℋ = n. Further, assume that P_ℋx = 0 for some x ∈ M. Denoting by P_M the orthogonal projector on M, we have

‖x‖ = ‖(P_M − P_ℋ)x‖

which, in view of the condition θ(M, ℋ) = ‖P_M − P_ℋ‖ < 1, implies x = 0. Hence Q := P_ℋ|_M : M → ℋ is an invertible linear transformation. Now M = G((I − P_ℋ)Q^{-1}). Indeed, if x ∈ M, then

x = Qx + (I − P_ℋ)Q^{-1} · Qx ∈ G((I − P_ℋ)Q^{-1})

On the other hand, if for some u ∈ ℋ

y = u + (I − P_ℋ)Q^{-1}u

then the vector v = Q^{-1}u has the property that v ∈ M, P_ℋy = u = P_ℋv, and (I − P_ℋ)y = (I − P_ℋ)Q^{-1}u = (I − P_ℋ)v. So y = v; therefore, y belongs to M. □
A solution X of (17.8.1) is called isolated if there exists a neighbourhood of X in the linear space M_{m×n} of all m × n matrices that does not contain other solutions of (17.8.1). A solution X is called inaccessible if the only continuous function φ: [0, 1] → M_{m×n} such that φ(0) = X and φ(t) is a solution of (17.8.1) for every t ∈ [0, 1] is the constant function φ(t) ≡ X. Clearly, every isolated solution is inaccessible.

We now have a characterization of isolated and inaccessible solutions of (17.8.1).

Theorem 17.8.3
The following statements are equivalent: (a) X_0 is an isolated solution of (17.8.1); (b) X_0 is an inaccessible solution of (17.8.1); (c) for every eigenvalue λ_0 of the matrix

T_0 = [A + BX_0  B ; 0  D − X_0B]

with dim Ker(T_0 − λ_0I) > 1, either

ℛ_{λ_0}(T_0) ∩ (ℂ^n ⊕ {0}) = {0} or ℛ_{λ_0}(T_0) ⊆ ℂ^n ⊕ {0}

(d) every common eigenvalue of A + BX_0 and D − X_0B has geometric multiplicity 1 as an eigenvalue of T_0.

Proof. Making the change of variable Y = X − X_0, we see that X satisfies (17.8.1) if and only if Y satisfies the equation

YBY + Y(A + BX_0) − (D − X_0B)Y = 0        (17.8.6)

Hence X_0 is an isolated (or inaccessible) solution of (17.8.1) if and only if 0 is an isolated (or inaccessible) solution of (17.8.6). By Proposition 17.8.1 and Lemma 17.8.2, the correspondence Y → G(Y) is a homeomorphism between the set of all solutions Y of (17.8.6) and the set of T_0-invariant subspaces M such that θ(M, ℋ) < 1, where ℋ = ℂ^n ⊕ {0}. Hence 0 is an isolated (resp. inaccessible) solution of (17.8.6) if and only if ℋ is an isolated (resp. inaccessible) T_0-invariant subspace. An application of Theorem 14.3.1 and Proposition 14.3.3 shows that (a), (b), and (c) are equivalent.

Further, the characteristic polynomial of T_0 is the product of the characteristic polynomials of A + BX_0 and D − X_0B. As the multiplicity of λ_0 as a zero of the characteristic polynomial of a matrix S is equal to the dimension of ℛ_{λ_0}(S), it follows that λ_0 is a common eigenvalue of A + BX_0 and D − X_0B if and only if

{0} ≠ ℛ_{λ_0}(T_0) ∩ ℋ ≠ ℛ_{λ_0}(T_0)

So (c) and (d) are equivalent. □
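Condition (d) is the practical test: a sketch that forms T_0 and examines the geometric multiplicity of each common eigenvalue of A + BX_0 and D − X_0B (tolerances ad hoc; the data is that of Example 17.8.1 below):

import numpy as np

def geometric_multiplicity(M, lam, tol=1e-8):
    N = M.shape[0]
    return N - np.linalg.matrix_rank(M - lam * np.eye(N), tol=tol)

def solution_is_isolated(X0, A, B, C, D, tol=1e-7):
    # Theorem 17.8.3(d): every common eigenvalue of A + B X0 and D - X0 B
    # must have geometric multiplicity 1 as an eigenvalue of T0.
    A1, A2 = A + B @ X0, D - X0 @ B
    n, m = A.shape[0], D.shape[0]
    T0 = np.block([[A1, B], [np.zeros((m, n)), A2]])
    eig2 = np.linalg.eigvals(A2)
    for lam in np.linalg.eigvals(A1):
        if np.min(np.abs(eig2 - lam)) < tol and geometric_multiplicity(T0, lam, tol) != 1:
            return False
    return True

A = np.array([[1.0]]); B = np.array([[1.0, 1.0]])
C = np.zeros((2, 1));  D = np.array([[1.0, 0.0], [0.0, 0.0]])
print(solution_is_isolated(np.array([[0.0], [-1.0]]), A, B, C, D))   # True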
An interesting particular case appears when B = 0. Then we have the equation

XA − DX = C        (17.8.7)

which is a system of linear equations in the entries of X. It is well known from the theory of linear equations that equation (17.8.7) either has no solutions, has a unique solution, or has infinitely many solutions. [In the latter case the homogeneous equation

YA − DY = 0        (17.8.8)

has nontrivial solutions, and the general form of solutions of (17.8.7) is X_0 + Y, where X_0 is a particular solution of (17.8.7) and Y is the general solution of the homogeneous equation.] Clearly, a solution X of (17.8.7) is isolated if and only if (17.8.8) has only the trivial solution. Using the criterion of Theorem 17.8.3, we obtain the following well-known result.

Corollary 17.8.4
The equation YA − DY = 0 has only the trivial solution Y = 0 if and only if σ(A) ∩ σ(D) = ∅.

Reconsidering the general case of equation (17.8.1), let us give some sufficient conditions for isolatedness of the solutions.

Corollary 17.8.5
If the matrix

T = [A B; C D]

is nonderogatory [i.e., dim Ker(T − λ_0I) = 1 for every eigenvalue λ_0 of T], then the number of solutions of (17.8.1) (if they exist) is finite and, consequently, every solution is isolated.

Proof. The matrix T has only a finite number of invariant subspaces; namely, there are exactly Π_{i=1}^{r} (dim ℛ_{λ_i}(T) + 1) of them, where λ_1, …, λ_r are all the distinct eigenvalues of T. It remains to appeal to Proposition 17.8.1. □
EXAMPLE 17.8.1. Consider the equation

X [1 1] X + X − [1 0; 0 0] X = [0; 0]        (17.8.9)

for an unknown 2 × 1 matrix X; here B = [1 1], A = 1 (a 1 × 1 matrix), D = [1 0; 0 0], and C = [0; 0]. So

T = [1 1 1; 0 1 0; 0 0 0]

The only one-dimensional T-invariant subspaces are M_1 = Span{e_1} and M_2 = Span{e_1 − e_3}. Defining ℋ = Span{e_1}, we have θ(M_1, ℋ) = 0 and

θ(M_2, ℋ) = ‖ [1/2 0 −1/2; 0 0 0; −1/2 0 1/2] − [1 0 0; 0 0 0; 0 0 0] ‖ = 1/√2 < 1

so by Proposition 17.8.1 and Lemma 17.8.2 there exist exactly two solutions [y_1; y_2] and [y'_1; y'_2], given by

M_1 = { (t, y_1t, y_2t) | t ∈ ℂ } ; M_2 = { (t, y'_1t, y'_2t) | t ∈ ℂ }

Hence

[y_1; y_2] = [0; 0] ; [y'_1; y'_2] = [0; −1]

As expected from Corollary 17.8.5, the number of solutions of (17.8.9) is finite. □

Another particular case of (17.8.1) is of interest. Consider the equation

X² + A_1X + A_0 = 0        (17.8.10)

where A_1 and A_0 are given n × n matrices and X is an n × n matrix to be found. Equation (17.8.10) is a particular case of (17.8.1) with B = I, C = −A_0, D = −A_1, and A = 0, and it is sometimes described as "unilateral." The matrix T turns out to be just the companion matrix of the matrix polynomial L(λ) = λ²I + λA_1 + A_0:

T = [0 I; −A_0 −A_1]

Proposition 17.8.1 gives a one-to-one correspondence between the set of solutions X of (17.8.10) and the set of T-invariant subspaces of the form

{ [x; Xx] | x ∈ ℂ^n }

We remark that a T-invariant subspace M has this form if and only if the transformation [I 0]|_M : M → ℂ^n is invertible. In this way we recover the description of right divisors of L(λ) given in Section 5.3. Similarly, the equation X² + XA_1 + A_0 = 0, considered as a particular case of (17.8.1), gives rise (by using Proposition 17.8.1) to a description of left divisors of the matrix polynomial λ²I + λA_1 + A_0.
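For this example the solutions can be recovered numerically from the eigenvectors of T: every eigenvector whose first coordinate block is nonzero spans a graph subspace, and scaling it so that the top block equals I reads off X. (This shortcut works because n = 1 here; in general one needs n-dimensional invariant subspaces, not single eigenvectors.)

import numpy as np

A = np.array([[1.0]]); B = np.array([[1.0, 1.0]])
C = np.zeros((2, 1));  D = np.array([[1.0, 0.0], [0.0, 0.0]])
T = np.block([[A, B], [C, D]])
n = 1                                  # X is 2 x 1 here

_, vecs = np.linalg.eig(T)
solutions = []
for v in vecs.T:
    top, bottom = v[:n], v[n:]
    if abs(top[0]) > 1e-10:            # the spanned subspace is a graph over C^n
        X = np.real_if_close(np.outer(bottom / top[0], [1.0]))
        if not any(np.allclose(X, S) for S in solutions):
            solutions.append(X)
print(solutions)    # [0; 0] and [0; -1], as found in Example 17.8.1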
17.9 STABILITY OF SOLUTIONS OF MATRIX QUADRATIC EQUATIONS

Consider the equation

XBX + XA − DX − C = 0        (17.9.1)

with the same assumptions on the matrices A, B, C, D as in the preceding section. We say that a solution X of (17.9.1) is stable if for any ε > 0 there is a δ > 0 such that, whenever A', B', C', D' are matrices of appropriate sizes with

max{‖A − A'‖, ‖B − B'‖, ‖C − C'‖, ‖D − D'‖} < δ

the equation

YB'Y + YA' − D'Y − C' = 0

has a solution Y for which ‖Y − X‖ < ε. It turns out that the situation with regard to stability and isolatedness is analogous to that for invariant subspaces.

Theorem 17.9.1
A solution X of equation (17.9.1) is stable if and only if X is isolated.

Proof. It is sufficient to prove the theorem for the case in which C = 0 and the solution X is the zero matrix (see the proof of Theorem 17.8.3). In this case G(X) = ℂ^n ⊕ {0}, so the homeomorphism described in Lemma 17.8.2 implies that X = 0 is a stable (resp. isolated) solution of

XBX + XA − DX = 0

if and only if ℂ^n ⊕ {0} is a stable (resp. isolated) invariant subspace of the matrix

[A B; 0 D]

Now use the fact that the isolated invariant subspaces of a linear transformation coincide with the stable ones (Theorems 15.2.1 and 14.3.1). □
In view of Theorem 17.9.1, statements (c) and (d) in Theorem 17.8.3 describe the stable solutions of equation (17.9.1). In the particular case when B = 0 we find that the solution X of XA − DX = C is stable if and only if σ(A) ∩ σ(D) = ∅.

As a solution X of (17.9.1) is stable if and only if the subspace Im[I; X] is stable as a T-invariant subspace, where

T = [A B; C D]

we can deduce some properties of stable solutions of (17.9.1) from the corresponding properties of stable T-invariant subspaces. For instance, the set of stable solutions of (17.9.1) is always finite (it may also be empty), and the number of stable solutions of (17.9.1) does not exceed the number γ(T) of n-dimensional stable T-invariant subspaces, which can be calculated as follows. Let λ_1, …, λ_p be all the distinct eigenvalues of T, with algebraic multiplicities m_1, …, m_p, respectively; then γ(T) is the number of sequences of the type (q_1, …, q_p), where the q_i are nonnegative integers with the properties that q_i ≤ m_i, either q_i = 0 or q_i = m_i for every i such that dim Ker(λ_iI − T) > 1, and q_1 + ⋯ + q_p = n.

Using Corollary 15.4.2, we obtain the following property of stable solutions of (17.9.1).

Theorem 17.9.2
Let X be a stable solution of (17.9.1). Then every solution Y of the equation

YB'Y + YA' − D'Y − C' = 0

where A', B', C', D' are matrices of appropriate sizes, is stable provided

‖Y − X‖ + ‖A − A'‖ + ‖B − B'‖ + ‖C − C'‖ + ‖D − D'‖

is small enough.

The notion of Lipschitz stability of solutions of (17.9.1) is introduced naturally: a solution X of (17.9.1) is called Lipschitz stable if there exist positive constants ε and K such that, for any matrices A', B', C', D' of appropriate sizes with

max{‖A − A'‖, ‖B − B'‖, ‖C − C'‖, ‖D − D'‖} < ε

the equation YB'Y + YA' − D'Y − C' = 0 has a solution Y satisfying

‖X − Y‖ ≤ K(‖A − A'‖ + ‖B − B'‖ + ‖C − C'‖ + ‖D − D'‖)
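The count γ(T) described above is a finite combinatorial computation over the eigenvalue data of T. A sketch (the tolerance-based eigenvalue clustering is ad hoc), applied to the matrix T of Example 17.8.1:

import numpy as np
from itertools import product

def gamma(T, n, tol=1e-7):
    # Number of n-dimensional stable T-invariant subspaces, counted from
    # the eigenvalue data of T as described above.
    lams = np.linalg.eigvals(T)
    distinct = []
    for lam in lams:                       # cluster numerically equal eigenvalues
        for d in distinct:
            if abs(d[0] - lam) < tol:
                d[1] += 1
                break
        else:
            distinct.append([lam, 1])
    N = T.shape[0]
    choices = []
    for lam, m in distinct:
        geo = N - np.linalg.matrix_rank(T - lam * np.eye(N), tol=tol)
        choices.append([0, m] if geo > 1 else list(range(m + 1)))
    return sum(1 for q in product(*choices) if sum(q) == n)

T = np.array([[1.0, 1.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
print(gamma(T, 1))    # 2: both solutions in Example 17.8.1 are stable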
Theorem 17.9.3
A solution X of (17.9.1) is Lipschitz stable if and only if σ(A + BX) ∩ σ(D − XB) = ∅.

Proof. Again, we can assume without loss of generality that C = 0 and X = 0. Formula (17.8.3) shows that the map 𝒢 introduced in Lemma 17.8.2 is locally Lipschitz continuous; that is, for every m × n matrix Y there exist a neighbourhood 𝒰 of Y and a positive constant K such that

θ(G(Z), G(Y)) ≤ K‖Z − Y‖

for every Z ∈ 𝒰. The inverse map 𝒢^{-1} is locally Lipschitz continuous as well. So the zero matrix is a Lipschitz stable solution of (17.9.1) (where C = 0) if and only if the subspace ℋ = ℂ^n ⊕ {0} is Lipschitz stable as an invariant subspace of the matrix

T = [A B; 0 D]

By Theorem 15.5.1, ℋ is Lipschitz stable if and only if it is a spectral invariant subspace of T. This means that σ(A) ∩ σ(D) = ∅. Indeed, if σ(A) ∩ σ(D) ≠ ∅, then there exists a T-invariant subspace 𝓛 strictly bigger than ℋ and such that σ(T|_𝓛) = σ(T|_ℋ) [e.g., 𝓛 = ℋ + Span{x_0}, where x_0 = 0 ⊕ v and v is an eigenvector of D corresponding to an eigenvalue λ_0 ∈ σ(A) ∩ σ(D)]. So ℋ is not spectral. Conversely, if σ(A) ∩ σ(D) = ∅, then, with the use of Lemma 4.1.3, it follows that ℋ is spectral. □

Similarly, one can obtain the following fact from Theorem 15.5.1: the solution X of (17.9.1) is Lipschitz stable if and only if for every sufficiently small ε > 0 there exists a δ > 0 such that

‖A − A'‖ + ‖B − B'‖ + ‖C − C'‖ + ‖D − D'‖ < δ

implies that the equation YB'Y + YA' − D'Y − C' = 0 has a unique solution Y satisfying ‖Y − X‖ < ε.

17.10 THE REAL CASE

In this section we quickly review some real analogs of the results obtained in this chapter.
554 Applications Let L{ A) be a monic matrix polynomial whose coefficients are real n x n matrices, and consider a factorization L(A)=L,(A)L2(A)---Lr(A) (17.10.1) where L^(A) are monic matrix polynomials with real coefficients. Using the results of Section 15.9 and the approach developed in the proof of Theorem 17.3.1, one obtains necessary and sufficient conditions for stability of the factorization (17.10.1) (the analog of Corollary 17.2.2). The definition of a stable factorization of real monic matrix polynomials is the same as in the complex case, except that now only real matrix polynomials are allowed as perturbations of L(X) and as factors in a factorization of the perturbed polynomial. Theorem 17.10.1 Let CL be the companion matrix of L(A), and let MrCMr_lC---CM2 be the chain of CL-invariant subspaces in $" [where I is the degree of L(A)] corresponding to the factorization (17.10.1). Then (17.10.1) is stable if and only if the following conditions are satisfied: (a) for every eigenvalue A0 of CL with geometric multiplicity greater than 1 and for every i (2^i^r), either Mt D 9?Ao(CL) or M, (1 9?Ao(CL) = {0}; (b) for every real eigenvalue A0 of CL with geometric multiplicity of 1 and even algebraic multiplicity, the algebraic multiplicity of A0 as an eigenvalue of each restriction CL\M. (if A0 is an eigenvalue of CL\M at all) is also even. In contrast with the complex case (Theorem 17.2.5), not every isolated real factorization (17.10.1) is stable. Using the description of isolated invariant subspaces for real transformations (Section 15.9), one finds that (17.10.1) is isolated if and only if the condition (a) in Theorem 17.10.1 holds. Now we pass to the stability of minimal factorizations W0(A) = W0.(A)W01(A) • • • W0k( A) (17.10.2) of a rational matrix function W0(A) such that the entries of W0(A) are real for real A. (In short, such rational matrix functions are called real.) The functions W0/(A) are also assumed to be real, and, in addition, we require that all rational matrix functions involved are n x n and take value / at infinity. Again, the stability of (17.10.2) is defined as in the complex case with only real rational matrix functions allowed. The main result on stability of (17.10.2) is the following analog of Theorem 17.4.1.
Theorem 17.10.2

The minimal factorization (17.10.2) of the real rational matrix function W₀(λ) with W₀(∞) = I, where for j = 1, 2, …, k, W₀ⱼ(λ) is also a real rational matrix function with W₀ⱼ(∞) = I, is stable if and only if the following conditions hold: (a) each common pole (zero) of W₀ⱼ and W₀ₚ (j ≠ p) is a pole (zero) of W₀ of geometric multiplicity 1; (b) each even order real pole λ₀ of W₀ (resp. of W₀⁻¹) is also a pole of each W₀ⱼ (resp. of each W₀ⱼ⁻¹) of even order (if λ₀ is a pole of W₀ⱼ or of W₀ⱼ⁻¹ at all).

Recall that the geometric multiplicity of a pole (zero) λ₀ of a rational matrix function W(λ) is the number of negative (positive) partial multiplicities of W(λ) at λ₀. In connection with condition (b), observe that the order of a pole λ₀ of W₀(λ) is the least positive integer p such that (λ − λ₀)ᵖ W₀(λ) is analytic in a neighbourhood of λ₀. It coincides with the greatest absolute value of a negative partial multiplicity of W₀(λ) at λ₀, as one can easily see using the local Smith form for W₀(λ) at λ₀.

We omit the proof of Theorem 17.10.2. It can be obtained in a way similar to the proof of Theorem 17.4.1, by using the description of stable invariant subspaces for real transformations presented in Section 15.9. As in the case of matrix polynomials, not every isolated minimal factorization of a real rational matrix function with real factors is stable (in the class of real factorizations). It is found that (17.10.2) is isolated if and only if condition (a) of Theorem 17.10.2 holds. Let us give an example of an isolated but not stable minimal factorization of real rational matrix functions.

EXAMPLE 17.10.1. Let

    W₀(λ) = [ 1   λ⁻¹ + λ⁻² ]    W₀₁(λ) = [ 1  λ⁻¹ ]    W₀₂(λ) = [ 1     0      ]
            [ 0   1 + λ⁻¹   ]             [ 0   1  ]             [ 0  1 + λ⁻¹  ]

One verifies easily that W₀(λ) = W₀₁(λ)W₀₂(λ) and that this factorization is minimal [indeed, the McMillan degree of W₀(λ) is 2, whereas the McMillan degrees of W₀₁(λ) and W₀₂(λ) are each 1]. Furthermore,

    W₀₁(λ)⁻¹ = [ 1  −λ⁻¹ ]    W₀₂(λ)⁻¹ = [ 1      0       ]
               [ 0    1  ]               [ 0  λ(λ + 1)⁻¹  ]

so W₀₁(λ) and W₀₂(λ) do not have common zeros. It is easily seen that λ₀ = 0 is a common pole of W₀(λ), W₀₁(λ), and W₀₂(λ) and that the only negative partial multiplicities of W₀(λ), W₀₁(λ), and W₀₂(λ) at λ₀ are −2, −1, and −1, respectively. Hence condition (a) of Theorem 17.10.2 is satisfied, but condition (b) is not. It follows that the factorization W₀(λ) = W₀₁(λ)W₀₂(λ) is isolated but not stable in the class of minimal factorizations of real rational matrix functions. □

Finally, consider the matrix quadratic equation

    XBX + XA − DX − C = 0   (17.10.3)

where A, B, C, D are known real matrices of sizes n × n, n × m, m × n, m × m, respectively, and X is a real matrix of size m × n to be found. The solution X of (17.10.3) is called isolated if there exists ε > 0 such that the set of all real matrices Y satisfying ‖X − Y‖ < ε contains no solutions of (17.10.3) other than X. The solution of (17.10.3) is called stable if for any ε > 0 there is a δ > 0 such that whenever A′, B′, C′, D′ are real matrices of appropriate sizes with

    max{‖A − A′‖, ‖B − B′‖, ‖C − C′‖, ‖D − D′‖} < δ

the equation YB′Y + YA′ − D′Y − C′ = 0 has a real solution Y for which ‖Y − X‖ < ε. The isolated and stable solutions can be characterized as follows.

Theorem 17.10.3

The solution X₀ of (17.10.3) is isolated if and only if every common eigenvalue of A + BX₀ and D − X₀B has geometric multiplicity 1 as an eigenvalue of the matrix

    T = [ A  B ]
        [ C  D ]

The solution X₀ is stable if and only if it is isolated and, in addition, for every real eigenvalue λ₀ of T with even algebraic multiplicity the algebraic multiplicity of λ₀ as an eigenvalue of A + BX₀ (or of D − X₀B) is even (if λ₀ is an eigenvalue of A + BX₀, or of D − X₀B, at all).

In connection with the second statement in this theorem, observe that

    [  I   0 ] [ A  B ] [ I   0 ]     [ A + BX₀      B     ]
    [ −X₀  I ] [ C  D ] [ X₀  I ]  =  [    0      D − X₀B  ]   (17.10.4)

and thus the algebraic multiplicity m(T; λ₀) of the eigenvalue λ₀ of T is equal to the sum of the algebraic multiplicities m(A + BX₀; λ₀) and m(D − X₀B; λ₀). Consequently, if m(T; λ₀) is even, then the evenness of one of the numbers m(A + BX₀; λ₀) and m(D − X₀B; λ₀) implies the evenness of the other.

Again, we omit the proof of Theorem 17.10.3. It can be obtained by an argument similar to the proofs of Theorems 17.8.3 and 17.9.1, using the description of stable and isolated invariant subspaces for real transformations (Section 15.9) and taking into account equation (17.10.4).

17.11 EXERCISES

17.1 Find all stable factorizations (whose factors are linear matrix polynomials) of the monic matrix polynomial

    L(λ) = [ λ² − λ   −λ + 1 ]
           [   0       λ² − λ ]

Does L(λ) have a nonstable factorization?

17.2 Solve Exercise 17.1 for the matrix polynomial

    L(λ) = [ λ² − 2λ   −λ + 1  ]
           [    0       λ² − 2λ ]

17.3 Let L(λ) be a monic n × n matrix polynomial of degree l such that C_L has nl distinct eigenvalues. Show that any factorization of L(λ) (whose factors are monic matrix polynomials as well) is stable.

17.4 Is any factorization of a monic matrix polynomial L(λ) stable if C_L is diagonable?

17.5 Show that the factorization L = L₁L₂L₃ of a monic matrix polynomial L(λ) is stable if and only if each of the factorizations L = L₁M and M = L₂L₃ is stable, where M = L₁⁻¹L.

17.6 Is the property expressed in Exercise 17.5 true for Lipschitz stability?

17.7 Show that a factorization of 2 × 2 monic matrix polynomials L = L₁L₂ is stable if and only if at least one of L₁(λ₀) and L₂(λ₀) is invertible for every λ₀ ∈ ℂ such that det L(λ₀) = 0.

17.8 Let L(λ) = Iλˡ + Σ_{j=0}^{l−1} A_jλʲ be an n × n matrix polynomial whose coefficients A_j are circulant matrices. Show that any factorization

    L(λ) = L₁(λ)L₂(λ) ··· L_r(λ)

where for j = 1, …, r, L_j(λ) is a monic matrix polynomial with circulant coefficients, is stable in the algebra of circulant matrices, in the following sense: for every ε > 0 there exists a δ > 0 such that every monic matrix polynomial L̃(λ) of degree l with circulant coefficients that satisfies σ_l(L, L̃) < δ admits a factorization

    L̃(λ) = L̃₁(λ)L̃₂(λ) ··· L̃_r(λ)

where L̃₁(λ), …, L̃_r(λ) are monic matrix polynomials with circulant coefficients and such that

    σ_{p₁}(L₁, L̃₁) + ··· + σ_{p_r}(L_r, L̃_r) < ε

(Here p_j is the degree of L_j and of L̃_j for j = 1, …, r.)

17.9 Give an example of a nonstable factorization of an n × n matrix polynomial with circulant coefficients.

17.10 Let L(λ) = diag[M₁(λ), M₂(λ)], where M₁(λ) and M₂(λ) are monic matrix polynomials of sizes n₁ × n₁ and n₂ × n₂, respectively, and let

    L(λ) = diag[M₁₁(λ), M₂₁(λ)] ··· diag[M₁ᵣ(λ), M₂ᵣ(λ)]   (1)

be a factorization of L(λ), where for j = 1, …, r, M₁ⱼ(λ) and M₂ⱼ(λ) have sizes n₁ × n₁ and n₂ × n₂, respectively.
(a) Prove that if (1) is stable, then each of the factorizations

    M₁(λ) = M₁₁(λ) ··· M₁ᵣ(λ);   M₂(λ) = M₂₁(λ) ··· M₂ᵣ(λ)   (2)

is stable as well.
(b) Show that the converse of statement (a) is generally false.
(c) Show that the factorization (1) is stable in the algebra of all matrices of type

    [ A₁   0 ]
    [  0  A₂ ]   (3)

where A₁ (resp. A₂) is any n₁ × n₁ (resp. n₂ × n₂) matrix, if and only if each factorization (2) is stable. [Stability in the algebra of all matrices of type (3) is understood in the same way as stability in the algebra of circulant matrices, as explained in Exercise 17.8.]

17.11 Let V be the algebra of all n × n matrices of type

    [ α₁      0     ···     0      β₁ ]
    [ 0       α₂    ···     β₂     0  ]
    [ ⋮                            ⋮  ]
    [ 0     β_{n−1} ···  α_{n−1}   0  ]
    [ β_n     0     ···     0      α_n ]

where the α_j and β_j are complex numbers, and let L(λ) be a monic matrix polynomial with coefficients from the algebra V. Describe the factorizations of L(λ) that are stable in the algebra V. (Hint: Use Exercise 17.7.)
17.12 Find all stable minimal factorizations of the rational matrix function

    W(λ) = [ 1 + λ⁻¹     (λ − 1)⁻¹ + λ⁻² ]
           [ (λ − 1)⁻²         1         ]

Is there a nonstable factorization of this function?

17.13 Prove that every minimal factorization of a scalar rational function with value 1 at infinity is stable. (It is assumed that the factors are scalar rational functions with value 1 at infinity as well.)

17.14 Let W(λ) be a rational matrix function with value I at infinity. Assume that W(λ) has δ distinct zeros and δ distinct poles, where δ is the McMillan degree of W(λ). Show that every minimal factorization of W(λ) is stable.

17.15 Let W(λ) be an n × n rational matrix function with value I at infinity that is a circulant, that is, of type

    [ w₁(λ)   w₂(λ)  ···  w_n(λ)     ]
    [ w_n(λ)  w₁(λ)  ···  w_{n−1}(λ) ]
    [   ⋮       ⋮             ⋮      ]
    [ w₂(λ)   w₃(λ)  ···  w₁(λ)      ]

where w₁(λ), w₂(λ), …, w_n(λ) are scalar rational functions. Show that every minimal factorization of W(λ) is stable in the class of circulant rational matrix functions.

17.16 Give an example of a nonstable minimal factorization of a circulant rational matrix function with value I at infinity whose factors are also from this class.

17.17 Let W(λ) be a rational matrix function with W(∞) = I, and let

    W(λ) = W₁(λ) ··· W_r(λ)   (4)

be a factorization of W(λ), where the W_j(λ) are also rational matrix functions with value I at infinity. Show that if

    W(λ²) = W₁(λ²) ··· W_r(λ²)

is a minimal factorization, then (4) is also minimal. Is the converse true?

17.18 (a) Find all solutions of the matrix quadratic equation
(b) Find all stable solutions of this equation.
(c) Find all Lipschitz stable solutions of this equation.
17.19 (a) Describe all circulant solutions of the equation

    XBX + XA − DX − C = 0   (5)

with circulant matrices A, B, C, and D.
(b) Can one obtain all circulant solutions of (5), in the event that B is invertible, by the formula

    ½(D − A)B⁻¹ + (¼(D − A)²B⁻² + 4BC)^{1/2} ?

17.20 Solve the quadratic equation
Notes to Part 3

Chapter 13. This chapter contains mainly well-known results. The main ideas and results concerning the metric space of subspaces appeared first in the infinite-dimensional framework [see Krein, Krasnoselskii and Milman (1948); Gohberg and Markus (1959); and also Gohberg and Krein (1957)], and they are adapted here for the finite-dimensional case. The contents of Sections 13.1 and 13.4 are standard. The exposition presented here is based on that of Chapter S4 in the authors' book (1982) [see also Kato (1976)]. Theorem 13.2.3 is from Gohberg and Markus (1959). The exposition in Section 13.3 follows Section 7.2 in Bart, Gohberg, and Kaashoek (1979). Theorem 13.6.3, along with other related results, was obtained in Gohberg and Leiterer (1972) as a consequence of general properties of cocycles in certain algebras of continuous matrix functions. Theorem 13.5.1 appears in the infinite-dimensional framework in Gohberg and Krupnik (1979); here we follow the authors' book (1983b). The material on normed spaces presented in Section 13.8 is standard knowledge. For the first part of this section we made use of the exposition in Lancaster and Tismenetsky (1985).

Chapter 14. The description of connected components in the set of invariant subspaces (Sections 14.1 and 14.2) is found in Douglas and Pearcy (1968) [see also Shayman (1982)]. An identification of isolated invariant subspaces is given in Douglas and Pearcy (1968). Note that in the infinite-dimensional framework (Hilbert space and bounded linear operators) there exist inaccessible invariant subspaces that are not isolated [see Douglas and Pearcy (1968)]. Theorem 14.3.5 was originally proved in the infinite-dimensional case [Douglas and Pearcy (1968)]. The results on coinvariant and semiinvariant subspaces in Section 14.5 appear here for the first time.

Chapter 15. Theorem 15.2.1 appeared in Bart, Gohberg and Kaashoek (1978) and Campbell and Daughtry (1979). The proof presented here follows the exposition in Bart, Gohberg and Kaashoek (1979). The implication (a)⇒(b) of Theorem 15.5.1 was first proved in Kaashoek, van der Mee and Rodman (1982). The statement of Theorem 15.5.1 and the remaining proof is taken from Ran and Rodman (1983). Theorem 15.7.1 was proved in Conway and Halmos (1980). Theorem 15.8.1, although not stated in this way, was proved in Gohberg and Rubinstein (1985). The material of Section 15.9 is based on Bart, Gohberg and Kaashoek (1979). Theorem 15.10.1 was proved in den Boer and Thijsse (1980) and Markus and Parilis (1980). Theorem 15.10.2 is suggested by Theorem 2.4 in den Boer and Thijsse (1980). The results of this chapter play an important role in explicit numerical computation of invariant subspaces. However, we do not touch the topic of numerical computation in this book and refer the reader to the following sources: Bart, Gohberg, Kaashoek and van Dooren (1980); Golub and Wilkinson (1976); Ruhe (1970, 1970b); van Dooren (1981, 1983); and Golub and van Loan (1983).

Chapter 16. Most of the results and expositions of the material in this chapter are taken from Gohberg and Rodman (1986). Corollary 16.1.3 appeared in Brickman and Fillmore (1967). Lemma 16.5.1 is a particular case of a result due to Ostrowski [see pages 334-335 in Ostrowski (1973)].

Chapter 17. The main results of Section 17.2 [where the case of factorization into the product of two factors L(λ) = L₁(λ)L₂(λ) was considered] are from Bart, Gohberg and Kaashoek (1978). The exposition of Sections 17.1 and 17.2 follows Gohberg, Lancaster, and Rodman (1982), where only the case of two factors was considered [see also the authors' paper (1979)]. The results of Section 17.3 are presented here probably for the first time. The main part of the contents of Section 17.4, as well as Theorems 17.6.1 and 17.6.2, is taken from Bart, Gohberg and Kaashoek (1979). Lemma 17.8.2 is taken from Campbell and Daughtry (1979). The main results of Section 17.7 are from Gohberg and Rubinstein (1985). Example 17.10.1 is taken from Chapter 9 in Bart, Gohberg and Kaashoek (1979).
Part Four

Analytic Properties of Invariant Subspaces

This part is devoted to the study of transformations that depend analytically on a parameter, and to the dependence of their invariant subspaces on the parameter. We begin with the simplest invariant subspaces, the kernel and image of the transformation, and this already requires the development of a theory of analytic families of invariant subspaces. Also, the solution of some basic problems is required, such as the existence of analytic bases and analytic complements for analytic families of subspaces. This material is all presented in Chapter 18 and is probably presented in a book on linear algebra for the first time. More generally, these results appeared first in the theory of analytic fibre bundles.

The study of more sophisticated objects and their dependence on the complex parameter z is the subject of Chapter 19. These include irreducible subspaces, the Jordan form, and Jordan bases. These results can be viewed as extensions of perturbation theory for analytic families of transformations.

The final chapter of Part 4 (and of the book) contains applications of the two preceding chapters to problems that have already appeared in earlier chapters, but now in the context of analytic dependence on a parameter. These applications include the factorization of matrix polynomials and rational matrix functions and the solution of quadratic matrix equations.
Chapter Eighteen

Analytic Families of Subspaces

In this chapter we study analytic families of transformations and analytic families of their invariant subspaces. For this purpose, the basic notion of an analytic family of subspaces is introduced and studied. This notion is of a local character, and the analysis of its global properties is one of the main problems of this chapter. In the proofs of Lemmas 18.4.2 and 18.5.2 (only) we use some basic methods from the theory of infinite-dimensional spaces, and this leads us beyond the prerequisites in linear algebra required up to this point.

It is shown that the kernel and image of an analytic family of transformations form two analytic families of subspaces (possibly after correction at a discrete set of points). Other classes of invariant subspaces whose behaviour is analytic (at least locally) are also studied. In Section 18.8 we analyze the case when the whole lattice of invariant subspaces behaves analytically. This occurs for analytic families of transformations with a fixed Jordan structure.

18.1 DEFINITION AND EXAMPLES

Let Ω be a domain (i.e., a connected open set) in the complex plane ℂ, and assume that for every z ∈ Ω a transformation A(z): ℂⁿ → ℂᵐ is given. We say that A(z) is an analytic family on Ω if in a neighbourhood U_{z₀} of each point z₀ ∈ Ω the transformation-valued function A(z) admits representation as a power series

    A(z) = Σ_{j=0}^{∞} A_j (z − z₀)ʲ,  z ∈ U_{z₀}

where A₀, A₁, … are transformations from ℂⁿ into ℂᵐ. Equivalently, A(z) is said to depend analytically on z in Ω if the entries in the matrix representing A(z) in fixed bases in ℂᵐ and ℂⁿ are analytic functions of z on the domain Ω. Obviously, this definition does not depend on the choice of these bases.

Now let {M(z)}_{z∈Ω} be a family of subspaces in ℂⁿ; so for every z in Ω, M(z) is a subspace in ℂⁿ. We say that the family {M(z)}_{z∈Ω} is analytic on Ω if for every z₀ ∈ Ω there exist a neighbourhood U_{z₀} ⊆ Ω of z₀, a subspace M ⊆ ℂⁿ, and an invertible transformation A(z): ℂⁿ → ℂⁿ that depends analytically on z in U_{z₀} such that

    M(z) = A(z)M,  z ∈ U_{z₀}   (18.1.1)

It is easily seen that for an analytic family of subspaces {M(z)}_{z∈Ω} the dimension of M(z) is independent of z. Indeed, (18.1.1) shows that dim M(z) is fixed for z belonging to the neighbourhood U_{z₀} of z₀. Since Ω is connected, for any two points z′, z″ ∈ Ω there is a sequence z₀ = z′, z₁, …, z_k = z″ of points in Ω such that the intersections U_{z_{i−1}} ∩ U_{z_i}, i = 1, …, k, are not empty. Then obviously dim M(z_i) = dim M(z_{i−1}), i = 1, …, k, and hence dim M(z′) = dim M(z″).

Let us give some examples of analytic families of subspaces.

Proposition 18.1.1

Let x₁(z), …, x_p(z) be analytic functions of z on the domain Ω whose values are n-dimensional vectors. If for every z₀ ∈ Ω the vectors x₁(z₀), …, x_p(z₀) are linearly independent, then

    Span{x₁(z), …, x_p(z)},  z ∈ Ω

is an analytic family of subspaces.

Proof. Take z₀ ∈ Ω, and let y_{p+1}, …, y_n be vectors in ℂⁿ such that x₁(z₀), …, x_p(z₀), y_{p+1}, …, y_n form a basis in ℂⁿ. Then

    det[x₁(z₀) ··· x_p(z₀) y_{p+1} ··· y_n] ≠ 0

As the determinant is a continuous function of its entries, and the x_j(z), j = 1, …, p, are analytic (and hence continuous) functions of z on Ω, it follows that

    det[x₁(z) ··· x_p(z) y_{p+1} ··· y_n] ≠ 0

for all z belonging to some neighbourhood U of z₀. Hence

    Span{x₁(z), …, x_p(z)} = [x₁(z) ··· x_p(z) y_{p+1} ··· y_n]M,  z ∈ U

where M is spanned by the first p coordinate unit vectors in ℂⁿ, and Span{x₁(z), …, x_p(z)} is, by definition, analytic on Ω. □

We see later that the property described in Proposition 18.1.1 is characteristic in the sense that for every analytic family of subspaces there exists a basis that consists of analytic vector functions.

Proposition 18.1.2

Let A(z): ℂⁿ → ℂᵐ be an analytic family of transformations on Ω, and assume that dim Ker A(z) is constant (i.e., independent of z for z in Ω). Then Ker A(z) is an analytic family of subspaces (of ℂⁿ) on Ω, whereas Im A(z) is an analytic family of subspaces (of ℂᵐ) on Ω.

Note that dim Ker A(z) is constant on Ω if and only if the rank of A(z) is constant or, equivalently, the dimension of Im A(z) is constant.

Proof. Write A(z) as an m × n matrix with respect to fixed bases in ℂᵐ and ℂⁿ. Take z₀ ∈ Ω. There exists a nonzero minor of size p × p of A(z₀), where, by assumption, p = rank A(z) is independent of z. For simplicity of notation assume that this minor is in the upper left corner of A(z₀). As the entries of A(z) depend analytically on z, this p × p minor is also nonzero for all z in a sufficiently small neighbourhood U₀ of z₀. So for any z ∈ U₀ [here we use the assumption that rank A(z) is independent of z], we obtain

    Im A(z) = Span{a₁(z), …, a_p(z)}

where a_j(z) is the jth column of A(z). Let b_{p+1}, …, b_m be m-dimensional vectors such that a₁(z₀), …, a_p(z₀), b_{p+1}, …, b_m form a basis in ℂᵐ, that is,

    det[a₁(z₀), …, a_p(z₀), b_{p+1}, …, b_m] ≠ 0

Again, by the analyticity of a₁(z), …, a_p(z), there exists a neighbourhood V₀ ⊆ U₀ such that

    det[a₁(z), …, a_p(z), b_{p+1}, …, b_m] ≠ 0

for all z ∈ V₀. Now for z ∈ V₀ we have

    Im A(z) = [a₁(z), …, a_p(z), b_{p+1}, …, b_m]M

where M = Span{e₁, …, e_p} ⊆ ℂᵐ. So, by definition, Im A(z) is an analytic family of subspaces.

Now consider Ker A(z) and fix a z₀ in Ω. There exists a nonzero minor of size p × p of A(z₀), which will be supposed to lie in the upper left corner of A(z₀). Partition A(z) accordingly:

    A(z) = [ B(z)  C(z) ]
           [ D(z)  E(z) ]

where B(z), C(z), D(z), and E(z) are matrix functions of sizes p × p, p × (n − p), (m − p) × p, (m − p) × (n − p), respectively, and are analytic on Ω. For some neighbourhood U of z₀ we have det B(z) ≠ 0 for z ∈ U. If the vector [x; y], x ∈ ℂᵖ, y ∈ ℂⁿ⁻ᵖ, belongs to Ker A(z) and z ∈ U, then

    B(z)x + C(z)y = 0,  D(z)x + E(z)y = 0

or

    x = −B(z)⁻¹C(z)y,  [−D(z)B(z)⁻¹C(z) + E(z)]y = 0

It follows that dim Ker A(z) = dim Ker[−D(z)B(z)⁻¹C(z) + E(z)]. But dim Ker A(z) is independent of z and equal to n − p; consequently, −D(z)B(z)⁻¹C(z) + E(z) = 0 for all z ∈ U. Now, obviously,

    Ker A(z) = [ I  −B(z)⁻¹C(z) ] N,  z ∈ U
               [ 0       I      ]

where N = { [0; y] | y ∈ ℂⁿ⁻ᵖ }. Hence Ker A(z) is an analytic family on Ω. □

We see later that the examples of analytic families of subspaces given in Proposition 18.1.2 are basic. In fact, any analytic family of subspaces is the image (or the kernel) of an analytic transformation whose values are projectors. In general, however, without the extra assumption that the dimension of Ker A(z) is independent of z, the families of subspaces Ker A(z) and Im A(z), where A(z): ℂⁿ → ℂᵐ is an analytic family on Ω, are not analytic on Ω. Let us give a simple example illustrating this fact.

EXAMPLE 18.1.1. Let

    A(z) = [ z   z² ]
           [ z²  z³ ],  z ∈ ℂ

Obviously, A(z): ℂ² → ℂ² is an analytic family on ℂ (written as a matrix in the standard basis in ℂ²). We have

    Im A(z) = Span{[1; z]} for z ≠ 0,  Im A(0) = {0}
    Ker A(z) = Span{[−z; 1]} for z ≠ 0,  Ker A(0) = ℂ²

As dim Im A(z) is not constant, the family of subspaces Im A(z) is not analytic on ℂ. Similarly, Ker A(z) is not analytic on ℂ. Note, however, that by changing Im A(z) at the single point z = 0 (replacing {0} by Span{[1; 0]}), we obtain the family of one-dimensional subspaces Span{[1; z]} that is analytic on ℂ (indeed, Span{[1; z]} = [1 0; z 1] Span{e₁}). Similarly, by changing Ker A(z) at the single point z = 0 we obtain an analytic family of subspaces Span{[−z; 1]}, z ∈ ℂ. □

18.2 KERNEL AND IMAGE OF ANALYTIC FAMILIES OF TRANSFORMATIONS

We have observed in the preceding section that, if A(z): ℂⁿ → ℂᵐ is an analytic family of transformations, then, in general, Ker A(z) and Im A(z) are not analytic families of subspaces. However, Example 18.1.1 suggests that after a change at certain points Ker A(z) and Im A(z) become analytic families. It turns out that this is true in general. To make this statement more precise, it is convenient to introduce some terminology.

Let A(z): ℂⁿ → ℂᵐ be an analytic family of transformations on Ω. The singular set S(A) of A(z) is the set of all z₀ ∈ Ω for which

    rank A(z₀) < max_{z∈Ω} rank A(z)

Note that the singular set is discrete; that is, for every z₀ ∈ S(A) there is a neighbourhood U ⊆ Ω of z₀ such that U ∩ S(A) = {z₀}.

Theorem 18.2.1

Let A(z): ℂⁿ → ℂᵐ be an analytic family of transformations on Ω, and let r = max_{z∈Ω} rank A(z). Then there exist m-dimensional vector-valued functions y₁(z), …, y_r(z) and n-dimensional vector-valued functions x₁(z), …, x_{n−r}(z) that are all analytic on Ω and have the following properties: (a) y₁(z), …, y_r(z) are linearly independent for every z ∈ Ω; (b) x₁(z), …, x_{n−r}(z) are linearly independent for every z ∈ Ω; (c) for every z not belonging to the singular set of A(z),

    Span{y₁(z), …, y_r(z)} = Im A(z)   (18.2.1)

and

    Span{x₁(z), …, x_{n−r}(z)} = Ker A(z)   (18.2.2)

For any z belonging to the singular set of A(z) the inclusions

    Span{y₁(z), …, y_r(z)} ⊇ Im A(z)   (18.2.3)

and

    Span{x₁(z), …, x_{n−r}(z)} ⊆ Ker A(z)   (18.2.4)

hold.

In particular (Proposition 18.1.1), Span{y₁(z), …, y_r(z)} is an analytic family of subspaces that coincides with Im A(z) outside the singular set of A(z). Similarly, Span{x₁(z), …, x_{n−r}(z)} is an analytic family of subspaces that coincides with Ker A(z) outside S(A).

The proof of Theorem 18.2.1 is based on the following lemma.

Lemma 18.2.2

Let x₁(z), …, x_r(z) be n-dimensional vector-valued functions that are analytic on a domain Ω in the complex plane. Assume that for some z₀ ∈ Ω the vectors x₁(z₀), …, x_r(z₀) are linearly independent. Then there exist n-dimensional vector functions y₁(z), …, y_r(z) with the following properties: (a) y₁(z), …, y_r(z) are analytic on Ω; (b) y₁(z), …, y_r(z) are linearly independent for every z ∈ Ω; (c)

    Span{y₁(z), …, y_r(z)} = Span{x₁(z), …, x_r(z)} (⊆ ℂⁿ)

for every z ∈ Ω ∖ Ω₀, where Ω₀ = {z ∈ Ω | x₁(z), …, x_r(z) are linearly dependent}.

If, in addition, for some s (s ≤ r) the vector functions x₁(z), …, x_s(z) are linearly independent for all z ∈ Ω, then the y_i(z), i = 1, …, r, can be chosen in such a way that (a)-(c) hold and, moreover, y₁(z) = x₁(z), …, y_s(z) = x_s(z) for all z ∈ Ω.

In the proof of Lemma 18.2.2 we use two classical results [see Chapter 3 of Markushevich (1965), Vol. 3, for example] from the theory of analytic and meromorphic functions that are stated here for the reader's convenience.
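Before turning to those classical results, note that Theorem 18.2.1 itself can be explored symbolically on the matrix of Example 18.1.1. The following sketch is not part of the original text and assumes the sympy library; it exhibits the generic rank, the singular set, and a corrected kernel family that is analytic everywhere:

    # The singular set and corrected kernel family for A(z) of Example 18.1.1.
    import sympy as sp

    z = sp.symbols('z')
    A = sp.Matrix([[z, z**2], [z**2, z**3]])

    print(A.rank())              # 1: the generic (maximal) rank r
    print(A.subs(z, 0).rank())   # 0: so z = 0 belongs to the singular set S(A)

    x1 = sp.Matrix([-z, 1])      # candidate analytic basis for Ker A(z)
    print(sp.simplify(A * x1))   # zero vector for ALL z, as inclusion (18.2.4) predicts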
Recall that a set S ⊆ Ω is called discrete if for every z ∈ S there is a neighbourhood V of z such that V ∩ S = {z}. (In particular, the empty set and finite sets are discrete.) Note also that a discrete set is at most countable.

Lemma 18.2.3 (Weierstrass' theorem)

Let S ⊆ Ω be a discrete set, and for every z₀ ∈ S let a positive integer s(z₀) be given. Then there exists a (scalar) function f(z) that is analytic on Ω, for which the set of zeros of f(z) coincides with S, and for every z₀ ∈ S the multiplicity of z₀ as a zero of f(z) is exactly s(z₀).

Lemma 18.2.4 (Mittag-Leffler theorem)

Let S ⊆ Ω be a discrete set, and for every z₀ ∈ S let a rational function of type

    q_{z₀}(z) = Σ_{j=1}^{k} a_j (z − z₀)⁻ʲ

be given, where k is a positive integer (depending on z₀) and the a_j are complex numbers (also depending on z₀). Then there exists a function f(z) that is meromorphic on Ω, for which the set of poles of f(z) coincides with S and, for every z₀ ∈ S, the singular part of f(z) at z₀ coincides with q_{z₀}(z); that is, f(z) − q_{z₀}(z) is analytic at z₀.

Proof of Lemma 18.2.2. We proceed by induction on r. Consider first the case r = 1. Let g(z) be an analytic scalar function on Ω with the property that every zero of g(z) is also a zero of x₁(z) having the same multiplicity, and vice versa. The existence of such a g(z) is ensured by the Weierstrass theorem given above. Put y₁(z) = (g(z))⁻¹x₁(z) to prove Lemma 18.2.2 in the case r = 1.

Now we can pass to the general case. Using the induction assumption, we can suppose that x₁(z), …, x_{r−1}(z) are linearly independent for every z ∈ Ω. Let X₀(z) be an r × r submatrix of the n × r matrix [x₁(z), …, x_r(z)] such that det X₀(z₀) ≠ 0. It is well known in the theory of analytic functions that the set of zeros of the not identically zero analytic function det X₀(z) is discrete. Since det X₀(z₀) ≠ 0 implies that the vectors x₁(z₀), …, x_r(z₀) are linearly independent, it follows that the set

    Ω₀ = {z ∈ Ω | x₁(z), …, x_r(z) are linearly dependent}

is also discrete. Disregarding the trivial case when Ω₀ is empty, we can write Ω₀ = {ζ₁, ζ₂, …}, where ζ_i ∈ Ω, i = 1, 2, …, is a finite or countable sequence with no limit points inside Ω.

Let us show that for every j = 1, 2, … there exist a positive integer s_j and scalar functions α_{1j}(z), …, α_{r−1,j}(z) that are analytic in a neighbourhood of ζ_j such that the system of n-dimensional analytic vector functions

    x₁(z), …, x_{r−1}(z),  (z − ζ_j)^{−s_j} [x_r(z) + Σ_{i=1}^{r−1} α_{ij}(z)x_i(z)]   (18.2.5)

has the following properties: for each z ≠ ζ_j it is linearly equivalent to the system x₁(z), …, x_r(z) (i.e., both systems span the same subspace in ℂⁿ); for z = ζ_j it is linearly independent.

Indeed, consider the n × r matrix B(z) whose columns are formed by x₁(z), …, x_r(z). By the induction hypothesis, there exists an (r − 1) × (r − 1) submatrix B₀(z) in the first r − 1 columns of B(z) such that det B₀(ζ_j) ≠ 0. For simplicity of notation suppose that B₀(z) is formed by the first r − 1 columns and rows of B(z); so

    B(z) = [ B₀(z)  B₁(z) ]
           [ B₂(z)  B₃(z) ]

where B₁(z), B₂(z), and B₃(z) are of sizes (r − 1) × 1, (n − r + 1) × (r − 1), and (n − r + 1) × 1, respectively. Since B₀(z) is invertible in a neighbourhood of ζ_j, we can write

    B(z) = [ I             0 ] [ B₀(z)    0  ] [ I  B₀(z)⁻¹B₁(z) ]
           [ B₂(z)B₀(z)⁻¹  I ] [   0    W(z) ] [ 0       1       ]   (18.2.6)

where W(z) = B₃(z) − B₂(z)B₀(z)⁻¹B₁(z) is an (n − r + 1) × 1 matrix. Let s_j be the multiplicity of ζ_j as a zero of the vector function W(z). Consider the matrix function

    B̃(z) = [ I             0 ] [ B₀(z)          0            ]
           [ B₂(z)B₀(z)⁻¹  I ] [   0    (z − ζ_j)^{−s_j}W(z) ]

Clearly, the columns b₁(z), …, b_r(z) of B̃(z) are analytic and linearly independent vector functions in a neighbourhood V(ζ_j) of ζ_j. From formula (18.2.6) it is clear that

    Span{x₁(z), …, x_r(z)} = Span{b₁(z), …, b_r(z)}

for z ∈ V(ζ_j) ∖ {ζ_j}. Further, from (18.2.6) we obtain

    B̃(z) = [ B₀(z)            0          ]
           [ B₂(z)  (z − ζ_j)^{−s_j}W(z) ]

and

    [ 0; (z − ζ_j)^{−s_j}W(z) ] = (z − ζ_j)^{−s_j} B(z) [ −B₀(z)⁻¹B₁(z); 1 ]

So the columns b₁(z), …, b_r(z) of B̃(z) have the form (18.2.5), where the α_{ij}(z) are analytic scalar functions in a neighbourhood of ζ_j.

Now choose y₁(z), …, y_r(z) in the form

    y₁(z) = x₁(z), …, y_{r−1}(z) = x_{r−1}(z),  y_r(z) = Σ_{i=1}^{r} g_i(z)x_i(z)

where the scalar functions g_i(z) are constructed as follows: (a) g_r(z) is analytic and different from zero in Ω except for the set of poles ζ₁, ζ₂, …, with corresponding multiplicities s₁, s₂, …; (b) the functions g_i(z) (for i = 1, …, r − 1) are analytic in Ω except for the poles ζ₁, ζ₂, …, and the singular part of g_i(z) at ζ_j (for j = 1, 2, …) is equal to the singular part of α_{ij}(z)g_r(z) at ζ_j.

Let us check the existence of such functions g_i(z). Let g_r(z) be the inverse of an analytic function with zeros at ζ₁, ζ₂, …, with corresponding multiplicities s₁, s₂, … (such an analytic function exists by Lemma 18.2.3). The functions g₁(z), …, g_{r−1}(z) are constructed by using the Mittag-Leffler theorem (Lemma 18.2.4).

Property (a) ensures that y₁(z), …, y_r(z) are linearly independent for every z ∈ Ω ∖ {ζ₁, ζ₂, …}. In a neighbourhood of each ζ_j we have

    y_r(z) = Σ_{i=1}^{r−1} (g_i(z) − α_{ij}(z)g_r(z))x_i(z) + g_r(z)[x_r(z) + Σ_{i=1}^{r−1} α_{ij}(z)x_i(z)]
           = (z − ζ_j)^{−s_j} h_j(z)[x_r(z) + Σ_{i=1}^{r−1} α_{ij}(z)x_i(z)]
             + {a linear combination of x₁(ζ_j), …, x_{r−1}(ζ_j)} + ···   (18.2.7)

where h_j(z) = (z − ζ_j)^{s_j} g_r(z) is analytic and nonzero at ζ_j, and the final ellipsis denotes a vector function that is analytic in a neighbourhood of ζ_j and assumes the value zero at ζ_j. Formula (18.2.7) and the linear independence of the vectors (18.2.5) for z = ζ_j ensure that y₁(ζ_j), …, y_r(ζ_j) are linearly independent.

Finally, the last statement of Lemma 18.2.2 follows from the proof of the first part of this lemma. □

Proof of Theorem 18.2.1. Let A₀(z) be an r × r submatrix of A(z) that is nonsingular for some z ∈ Ω, that is, det A₀(z) ≢ 0. So the set Ω₀ of zeros of the analytic function det A₀(z) is either empty or consists of isolated points. In what follows we assume for simplicity that A₀(z) is located in the top left corner of A(z). Let x₁(z), …, x_r(z) be the first r columns of A(z), and let y₁(z), …, y_r(z) be the vector functions constructed in Lemma 18.2.2. Then for each z ∈ Ω ∖ Ω₀ we have

    Span{y₁(z), …, y_r(z)} = Span{x₁(z), …, x_r(z)} = Im A(z)   (18.2.8)

[The last equality follows from the linear independence of x₁(z), …, x_r(z) for z ∈ Ω ∖ Ω₀.] We now prove that

    Span{y₁(z), …, y_r(z)} ⊇ Im A(z),  z ∈ Ω   (18.2.9)

Equality (18.2.8) means that for every z ∈ Ω ∖ Ω₀ there exists an r × n matrix B(z) such that

    Y(z)B(z) = A(z),  z ∈ Ω ∖ Ω₀   (18.2.10)

where Y(z) = [y₁(z), …, y_r(z)]. Note that B(z) is necessarily unique. [Indeed, if B′(z) also satisfies (18.2.10), we have Y(z)(B(z) − B′(z)) = 0 and, in view of the linear independence of the columns of Y(z), B(z) = B′(z).] Further, B(z) is analytic on Ω ∖ Ω₀. To check this, pick an arbitrary z′ ∈ Ω ∖ Ω₀, and let Y₀(z) be an r × r submatrix of Y(z) such that det Y₀(z′) ≠ 0. [For simplicity of notation assume that Y₀(z) occupies the top r rows of Y(z).] Then det Y₀(z) ≠ 0 in some neighbourhood V of z′, and (Y₀(z))⁻¹ is analytic on V. Now Y(z)^{−L} := [(Y₀(z))⁻¹, 0] is a left inverse of Y(z); premultiplying (18.2.10) by Y(z)^{−L}, we obtain

    B(z) = Y(z)^{−L} A(z),  z ∈ V   (18.2.11)

So B(z) is analytic on V; since z′ ∈ Ω ∖ Ω₀ was arbitrary, B(z) is analytic on Ω ∖ Ω₀. Moreover, B(z) admits analytic continuation to the whole of Ω, as follows. Let z₀ ∈ Ω₀, and let Y(z)^{−L} be a left inverse of Y(z) that is analytic in a neighbourhood V₀ of z₀. [The existence of such a left inverse is proved as above.] Define B(z) as Y(z)^{−L}A(z) for z ∈ V₀. Clearly, B(z) is analytic on V₀, and for z ∈ V₀ ∖ {z₀} this definition coincides with (18.2.11) in view of the uniqueness of B(z). So B(z) is analytic on Ω. Now it is clear that (18.2.10) holds also for z ∈ Ω₀, which proves (18.2.9). Consideration of dimensions shows that in fact we have equality in (18.2.9) unless rank A(z) < r. Thus (18.2.1) and (18.2.3) are proved.

We pass now to the proof of the existence of y_{r+1}(z), …, y_n(z) such that (b), (18.2.2), and (18.2.4) hold. Let a₁(z), …, a_r(z) be the first r rows of A(z). By assumption, a₁(z), …, a_r(z) are linearly independent for some ζ ∈ Ω. Apply Lemma 18.2.2 to construct n-dimensional analytic row functions b₁(z), …, b_r(z) such that for all z ∈ Ω the rows b₁(z), …, b_r(z) are linearly independent and, for z ∈ Ω ∖ Ω₀,

    Span{b₁(z)ᵀ, …, b_r(z)ᵀ} = Span{a₁(z)ᵀ, …, a_r(z)ᵀ}   (18.2.12)
Fix z₀ ∈ Ω, and let b_{r+1}, …, b_n be n-dimensional rows such that the vectors b₁(z₀)ᵀ, …, b_r(z₀)ᵀ, b_{r+1}ᵀ, …, b_nᵀ form a basis in ℂⁿ. Applying Lemma 18.2.2 again [for x₁(z) = b₁(z)ᵀ, …, x_r(z) = b_r(z)ᵀ, x_{r+1}(z) = b_{r+1}ᵀ, …, x_n(z) = b_nᵀ], we construct n-dimensional analytic row functions b_{r+1}(z), …, b_n(z) such that the n × n matrix

    B(z) = [ b₁(z) ]
           [ b₂(z) ]
           [   ⋮   ]
           [ b_n(z) ]

is nonsingular for all z ∈ Ω. Then the inverse B(z)⁻¹ is analytic on Ω. Let y_{r+1}(z), …, y_n(z) be the last n − r columns of B(z)⁻¹. We claim that (b), (18.2.2), and (18.2.4) are satisfied with this choice. Indeed, (b) is evident. Take z ∈ Ω ∖ Ω₀; from (18.2.12) and the construction of y_{r+1}(z), …, y_n(z) it follows that the kernel of the r × n matrix formed by the rows a₁(z), …, a_r(z) contains Span{y_{r+1}(z), …, y_n(z)}. But since z ∉ Ω₀, every row of A(z) is a linear combination of the first r rows. So in fact

    Ker A(z) ⊇ Span{y_{r+1}(z), …, y_n(z)}   (18.2.13)

Now (18.2.13) implies that for z ∈ Ω ∖ Ω₀

    A(z)[y_{r+1}(z), …, y_n(z)] = 0   (18.2.14)

Passing to the limit as z approaches a point of Ω₀, we find that (18.2.14), as well as the inclusion (18.2.13), holds for every z ∈ Ω. Consideration of dimensions shows that equality holds in (18.2.13) if and only if rank A(z) = r. □

18.3 GLOBAL PROPERTIES OF ANALYTIC FAMILIES OF SUBSPACES

In the definition of an analytic family of subspaces, the transformation A(z) and the subspace M depend on z₀, so the definition of an analytic family of subspaces has a local character. However, it turns out that for a given analytic family of subspaces M(z) there exist an analytic family A(z) and a subspace M independent of z₀ for which the equality M(z) = A(z)M holds.
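For the corrected image family Span{[1; z]} of Example 18.1.1, such a global pair A(z), M can be written down explicitly. The following sketch is illustrative only and is not part of the original text (it assumes the sympy library):

    # One global representation M(z) = A(z)M, in the spirit of Theorem 18.3.1,
    # for the analytic family M(z) = Span{(1, z)} from Example 18.1.1.
    import sympy as sp

    z = sp.symbols('z')
    A = sp.Matrix([[1, 0], [z, 1]])   # entire in z; det A(z) = 1, so invertible for every z
    e1 = sp.Matrix([1, 0])            # M = Span{e1}, one fixed subspace for all z

    print(A.det())                    # 1
    print(A * e1)                     # Matrix([[1], [z]]): A(z)M = Span{(1, z)}

Here a single subspace M = Span{e₁} and a globally invertible analytic A(z) replace the local pairs from the definition of Section 18.1.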
Theorem 18.3.1

Let {M(z)}_{z∈Ω} be an analytic family of subspaces (of ℂⁿ) on Ω. Then there exist invertible transformations A(z): ℂⁿ → ℂⁿ that are analytic on Ω, and a subspace M ⊆ ℂⁿ, such that M(z) = A(z)M for all z ∈ Ω.

The lengthy proof of Theorem 18.3.1 is relegated to the next two sections. First, we wish to emphasize that this is a particularly important result concerning analytic families of subspaces, and it has many consequences, some of which we describe now.

Theorem 18.3.2

For an analytic family of subspaces M(z) (of ℂⁿ) on Ω the following properties hold: (a) there exist n-dimensional vector functions x₁(z), …, x_p(z) that are analytic on Ω and such that, for each z ∈ Ω, the vectors x₁(z), …, x_p(z) are linearly independent and

    M(z) = Span{x₁(z), …, x_p(z)}

(b) there is an analytic family of projectors P(z) defined on Ω such that M(z) = Im P(z) for all z ∈ Ω; (c) for every z ∈ Ω there exists a direct complement N(z) to M(z) in ℂⁿ such that the family of subspaces N(z) is analytic.

Proof. Let A(z) and M be as in Theorem 18.3.1, and let x₁, …, x_p be a basis in M. Then x_i(z) = A(z)x_i, i = 1, …, p, satisfy (a). To satisfy (b), put P(z) = A(z)PA(z)⁻¹, where P is a projector onto M. Finally, the family of subspaces N(z) = A(z)N, where N is a direct complement of M in ℂⁿ, satisfies (c). □

Note that property (b) [as well as property (a)] is characteristic for analytic families of subspaces. So, if P(z) is an analytic family of projectors on Ω, then Im P(z) is an analytic family of subspaces. We leave the verification of this statement to the reader.

In connection with Theorem 18.3.2(c), note that the orthogonal complement M(z)^⊥ is usually not an analytic family, as the next example shows.

EXAMPLE 18.3.1. For any z ∈ ℂ let

    M(z) = Span{[1; z]} ⊆ ℂ²

Then

    M(z)^⊥ = Span{[−z̄; 1]}

which is not analytic. Indeed, if M(z)^⊥ were analytic, then for z in a neighbourhood U of each point z₀ ∈ ℂ we would have

    Span{[−z̄; 1]} = A(z)M

where A(z) is a 2 × 2 analytic family of invertible matrices and M is a fixed one-dimensional subspace that, without loss of generality, may be assumed equal to Span{e₁}. So

    Span{[−z̄; 1]} = Span{[a₁(z); a₂(z)]}

on U, where [a₁(z); a₂(z)] is the first column of A(z). Hence a₂(z) ≠ 0 and, for all z ∈ U,

    z̄ = −a₁(z)a₂(z)⁻¹   (18.3.1)

However, the function z̄ is not analytic in U, so (18.3.1) cannot happen. □

In the next section we need the following generalization of Theorem 18.3.2.

Theorem 18.3.3

Let M(z) and N(z) be analytic families of subspaces (of ℂⁿ) on Ω such that M(z) ⊆ N(z) for all z ∈ Ω. Then there exist n-dimensional vector functions x₁(z), …, x_p(z) [where p = dim N(z) − dim M(z)] that are analytic on Ω and such that, for each z ∈ Ω, the vectors x₁(z), …, x_p(z) form a basis in N(z) modulo M(z).

Proof. By Theorem 18.3.2 there are bases y₁(z), …, y_s(z) in M(z) and v₁(z), …, v_t(z) in N(z) that are analytic on Ω. By Lemma 18.2.2 there exist analytic vector functions y_{s+1}(z), …, y_t(z) such that y₁(z), …, y_t(z) are linearly independent for each z ∈ Ω and

    Span{y₁(z), …, y_t(z)} = Span{v₁(z), …, v_t(z)} = N(z)

Obviously, y_{s+1}(z), …, y_t(z) is the desired analytic basis in N(z) modulo M(z). □

We note one more consequence of Theorem 18.3.1.

Corollary 18.3.4

Let M₁(z), …, M_k(z) be analytic families of subspaces (of ℂⁿ) on Ω, and assume that for each z ∈ Ω, ℂⁿ is a direct sum of M₁(z), …, M_k(z). Then, given z₀ ∈ Ω, there exists a family of invertible transformations S(z): ℂⁿ → ℂⁿ that is analytic on Ω and for which S(z)M_i(z₀) = M_i(z) on Ω, i = 1, …, k, and S(z₀) = I.

Proof. It follows from Theorem 18.3.1 that there exist analytic families of invertible transformations S_i(z): ℂⁿ → ℂⁿ, i = 1, …, k, such that S_i(z₀) = I and S_i(z)M_i(z₀) = M_i(z) for all z ∈ Ω. Now the transformation S(z): ℂⁿ → ℂⁿ defined by the property that S(z)x = S_i(z)x for all x ∈ M_i(z₀) satisfies the requirements of Corollary 18.3.4. □

18.4 PROOF OF THEOREM 18.3.1 (COMPACT SETS)

As a first step towards the proof of Theorem 18.3.1, a result is proved in this section that can be considered a weaker version of that theorem. We say that a function f(z) (whose values may be vectors or transformations) is analytic on a compact set K ⊆ Ω if f(z) is analytic on some open set containing K.

Theorem 18.4.1

Let K ⊆ Ω be a compact set, and let M(z) ⊆ ℂⁿ be an analytic family of subspaces on Ω. Then there exist vector functions f₁(z), …, f_r(z) ∈ ℂⁿ that are analytic on K and such that f₁(z), …, f_r(z) is a basis of M(z) for every z ∈ K.

In turn, we need some preliminaries for the proof of Theorem 18.4.1. First, we introduce the notion of an incomplete factorization. Let A(z) be an n × n matrix function that is analytic on a neighbourhood of the unit circle and nonsingular on the unit circle. An incomplete factorization of A(z) is a representation of the form

    A(z) = ⁻A(z) ⁺A(z)   (18.4.1)

that holds whenever |z| = 1, where the family ⁺A(z) is nonsingular and analytic on the disc |z| ≤ 1 and the family ⁻A(z) is nonsingular and analytic on the annulus 1 ≤ |z| < ∞.

Lemma 18.4.2

Every n × n matrix function A(z) that is analytic and nonsingular on a neighbourhood of the unit circle admits an incomplete factorization.

Proof. Consider first the case when A(z) is analytic on the disc |z| ≤ 1. Let z₀ be a zero of det A(z) with |z₀| < 1. Then for some invertible matrix T₀ the first row of T₀A(z) is zero at the point z₀. Put

    ⁺A₁(z) = diag[(z − z₀)⁻¹, 1, …, 1] T₀ A(z),   ⁻A₁(z) = T₀⁻¹ diag[(z − z₀), 1, …, 1]

Then A(z) = ⁻A₁(z) ⁺A₁(z); moreover, ⁻A₁(z) is analytic and invertible for 1 ≤ |z| < ∞, ⁺A₁(z) is analytic for |z| ≤ 1, and the number of zeros of det ⁺A₁(z) inside the unit circle is strictly less than that of det A(z). If det ⁺A₁(z) ≠ 0 for |z| < 1, then A(z) = ⁻A₁(z) ⁺A₁(z) is an incomplete factorization of A(z). Otherwise, we apply the construction above to ⁺A₁(z), and after a finite number of steps an incomplete factorization of A(z) is obtained.

Now it is easy to prove Lemma 18.4.2 in the case when A(z) is meromorphic in the disc |z| < 1 (more exactly, admits a meromorphic continuation into the disc). Indeed, let z₁, …, z_k be all the poles of A(z) inside the unit disc, with orders α₁, …, α_k, respectively. Then the function B(z) = Π_{i=1}^{k} (z − z_i)^{α_i} A(z) is analytic for |z| ≤ 1 and thus (according to the assertion proved in the preceding paragraph) admits an incomplete factorization B(z) = ⁻B(z) ⁺B(z). So (18.4.1) with

    ⁻A(z) = {Π_{i=1}^{k} (z − z_i)^{−α_i}} ⁻B(z);   ⁺A(z) = ⁺B(z)

is an incomplete factorization of A(z).

Now consider the general case. Let ε > 0 be such that A(z) is analytic and invertible in the closed annulus Φ̄ = {z ∈ ℂ | 1 − ε ≤ |z| ≤ 1 + ε}. In the sequel we use some basic and elementary facts about the structure of the set C_Φ of all n × n matrix functions X(z) that are continuous on the closed annulus Φ̄ and analytic in the open annulus Φ = {z ∈ ℂ | 1 − ε < |z| < 1 + ε}. The set C_Φ is an algebra with pointwise addition and multiplication of matrices and multiplication by scalars; that is, for z ∈ Φ̄ and X(z), Y(z) ∈ C_Φ we define

    (XY)(z) = X(z)Y(z),  (X + Y)(z) = X(z) + Y(z),  (αX)(z) = αX(z)

Introduce the following norm in C_Φ:

    ‖X‖_{C_Φ} = max_{z∈Φ̄} ‖X(z)‖

where X(z) ∈ C_Φ. It is easily seen that this is indeed a norm; that is, the axioms (a)-(c) of Section 13.8 are satisfied. Moreover,

    ‖XY‖_{C_Φ} ≤ ‖X‖_{C_Φ} ‖Y‖_{C_Φ}

for X, Y ∈ C_Φ. In fact, the normed algebra C_Φ is a Banach algebra, which means that each Cauchy sequence converges in the norm ‖·‖_{C_Φ} to some function in C_Φ. This follows from the fact that the uniform limit of continuous functions on Φ̄ is itself a continuous function on Φ̄, and the limit of analytic functions on Φ that is uniform on each compact set in Φ is itself analytic on Φ.
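The decomposition of C_Φ used in the remainder of the proof splits a Laurent expansion into its nonnegative and negative powers; this is easy to experiment with numerically. A minimal sketch, assuming numpy and not part of the original text:

    # The projector P+ introduced next keeps the nonnegative Laurent powers.
    # On samples over the unit circle, Laurent coefficients come from the FFT.
    import numpy as np

    N = 256
    theta = 2 * np.pi * np.arange(N) / N
    z = np.exp(1j * theta)                  # samples on |z| = 1

    x = 3.0 / z + 2.0 + z + 0.5 * z**2      # X- = 3/z and X+ = 2 + z + z**2 / 2
    c = np.fft.fft(x) / N                   # c[k] is the coefficient of z**k (mod N)

    x_plus = c[0] + c[1] * z + c[2] * z**2  # P+(x): nonnegative powers
    x_minus = c[N - 1] / z                  # the negative-power part
    assert np.allclose(x, x_plus + x_minus)

Here the FFT recovers the Laurent coefficients from samples on the circle, and P₊ simply retains the nonnegative powers, exactly as in the decomposition C_Φ = M₊ + M₋ introduced next.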
Let M₊ be the set of all matrix functions from C_Φ that admit an analytic continuation to the set {z ∈ ℂ | |z| < 1 − ε}, and let M₋ be the set of all matrix functions from C_Φ that admit an analytic continuation to the set {z ∈ ℂ | |z| > 1 + ε} ∪ {∞} and assume the zero value at infinity. It is easily seen (as for C_Φ) that M₊ and M₋ are closed subspaces in the norm ‖·‖_{C_Φ}. Clearly, M₊ ∩ M₋ = {0} (here 0 stands for the identically zero n × n matrix function on Φ̄). Furthermore, M₊ + M₋ = C_Φ. Indeed, recall that every function X(z) ∈ C_Φ can be developed into the Laurent series

    X(z) = Σ_{i=−∞}^{∞} zⁱ X_i   (1 − ε < |z| < 1 + ε)

where the functions

    X₊(z) = Σ_{i=0}^{∞} zⁱ X_i  and  X₋(z) = Σ_{i=−∞}^{−1} zⁱ X_i

belong to M₊ and M₋, respectively. Denoting P₊(X(z)) = X₊(z), we obtain a projector P₊: C_Φ → C_Φ with Im P₊ = M₊ and Ker P₊ = M₋. It turns out that P₊ is bounded, that is,

    ‖P₊‖ := sup{‖P₊(X)‖_{C_Φ} | X(z) ∈ C_Φ, ‖X‖_{C_Φ} = 1} < ∞

[See page 225 in Gohberg and Goldberg (1981), for example; the proof is based on Banach's theorem that every bounded linear operator that maps a Banach space onto itself, and is one-to-one, has a bounded inverse.]

Return to our original matrix function A(z). Clearly, A(z)⁻¹ ∈ C_Φ, and the Laurent series A(z)⁻¹ = Σ_{i=−∞}^{∞} zⁱ A_i converges uniformly in the annulus 1 − ε ≤ |z| ≤ 1 + ε. Therefore, for some N the matrix function A_N(z) = Σ_{i=−N}^{N} zⁱ A_i has the following properties: det A_N(z) ≠ 0 for 1 − ε ≤ |z| ≤ 1 + ε and

    A(z)⁻¹ = A_N(z)(I − M(z))

where M(z) ∈ C_Φ and

    ‖M‖_{C_Φ} < (4‖P₊‖)⁻¹   (18.4.2)

Let

    ⁺N = P₊M + P₊((P₊M)M) + ··· ∈ C_Φ

Because of (18.4.2), ‖⁺N‖_{C_Φ} < ½, and hence I + ⁺N is invertible in the algebra C_Φ. (Here I represents the constant n × n identity matrix.) Denote ⁺G = (I + ⁺N)⁻¹. Then ⁺G and (⁺G)⁻¹ belong to the image of P₊. In particular, ⁺G and (⁺G)⁻¹ are analytic in the disc |z| < 1. Furthermore, one checks easily that

    P₊[(I + ⁺N)(I − M)] = I

so the function ⁻G = (I + ⁺N)(I − M) is analytic for 1 < |z| < ∞ and at infinity. As

    ‖⁻G − I‖_{C_Φ} ≤ ‖⁺N‖_{C_Φ} + ‖M‖_{C_Φ} + ‖⁺N M‖_{C_Φ} < ½ + ¼ + ⅛ < 1

⁻G is invertible in C_Φ. Since both ⁻G and I belong to the (closed) subalgebra C⁻ = {αI + Ker P₊ | α ∈ ℂ} of C_Φ, also (⁻G)⁻¹ ∈ C⁻. Now write A(z)⁻¹ = A_N(z) ⁺G(z) ⁻G(z), or

    A(z) = (⁻G(z))⁻¹ (⁺G(z))⁻¹ (A_N(z))⁻¹

and use the fact (proved in the preceding paragraph) that the function (⁺G(z))⁻¹(A_N(z))⁻¹, which is meromorphic on the unit disc, admits an incomplete factorization. □

Lemma 18.4.3

Let f₁(z), …, f_r(z) and g₁(z), …, g_r(z) be two systems of n-dimensional vector functions, analytic and linearly independent on Ω, such that

    Span{f₁(z), …, f_r(z)} = Span{g₁(z), …, g_r(z)}   (18.4.3)

for z ∈ Ω₀, where Ω₀ ⊆ Ω is a set with at least one limit point inside Ω. Then

    Span{f₁(z), …, f_r(z)} = Span{g₁(z), …, g_r(z)}

for every z ∈ Ω, and

    [f₁(z) ··· f_r(z)] = [g₁(z) ··· g_r(z)] · A(z)   (18.4.4)

where A(z) is an r × r matrix function that is invertible and analytic on Ω.

Proof. Consider the system Π = {f₁, …, f_r, g₁, …, g_r} of 2r n-dimensional vector functions. Then rank Π(z) = r for z ∈ Ω₀. On the other hand, the set {z₀ ∈ Ω | rank Π(z₀) < max_{z∈Ω} rank Π(z)} is discrete. Thus r = max_{z∈Ω} rank Π(z), and (18.4.3) holds for every z ∈ Ω because both systems f₁(z), …, f_r(z) and g₁(z), …, g_r(z) are linearly independent. Consequently, there exists a unique matrix function A(z) such that (18.4.4) holds. It remains to prove that A(z) is analytic on Ω. Let z₀ ∈ Ω and suppose, for example, that the square matrix X(z) formed by the upper r rows of [g₁(z), …, g_r(z)] is invertible for z = z₀. Computing A(z) in a neighbourhood of z₀ by Cramer's formulas, we see that A(z) is analytic in a neighbourhood of z₀. Thus A(z) is analytic on Ω. □

Proof of Theorem 18.4.1. Without loss of generality we can suppose that K is a connected set (otherwise consider a larger compact set). Fix z₀ ∈ K, and let N₀ be some direct complement of M(z₀) in ℂⁿ. Then

    ℂⁿ = M(z) ∔ N₀   (18.4.5)

is a direct sum decomposition for every z ∈ K except possibly for a finite set of points z₁, …, z_k. Indeed, by the definition of an analytic family of subspaces, for every η ∈ K there exist a neighbourhood U_η of η and an analytic and invertible matrix function B_η(z) defined on U_η such that B_η(z)M = M(z) on U_η, where M is a fixed subspace of ℂⁿ. We can assume [by changing B_η(z) if necessary] that the subspace M is independent of η. [Here we use the fact that dim M(z) is constant because of the connectedness of Ω.] Actually, we assume M = M(z₀). Let x₁, …, x_r be some basis in M(z₀), and let x_{r+1}, …, x_n be a basis in N₀. Then for z ∈ U_η the subspaces M(z) and N₀ are direct complements of each other if and only if

    D_η(z) = det[B_η(z)x₁, …, B_η(z)x_r, x_{r+1}, …, x_n] ≠ 0

Two cases can occur: (a) D_η(z) ≡ 0 for z ∈ U_η; (b) D_η(z) ≢ 0, and then we can suppose (taking U_η smaller if necessary) that D_η(z) = 0 only at a finite number of points of U_η. Let us call the points η for which (a) holds points of the first kind, and the points η for which (b) holds points of the second kind. Since K is connected, all η ∈ K are of the same kind, and since z₀ is of the second kind, all η ∈ K are of the second kind. Further, let U_{η₁}, …, U_{η_l} be a finite covering of the compact set K. Since D_{η_j}(z) = 0 only at a finite number of points z ∈ U_{η_j}, j = 1, …, l, we find that (18.4.5) holds for every z ∈ K except possibly for a finite number of points z₁, …, z_k ∈ K.

By the definition of an analytic family of subspaces, there exist neighbourhoods U(z₁), …, U(z_k) of z₁, …, z_k, respectively, and functions B⁽¹⁾(z), …, B⁽ᵏ⁾(z) that are invertible and analytic on U(z₁), …, U(z_k), respectively, such that

    B⁽ʲ⁾(z)M(z_j) = M(z),  z ∈ U(z_j),  j = 1, …, k

Let x₁⁽ʲ⁾, …, x_r⁽ʲ⁾ be some basis of the subspace M(z_j), and let g_i⁽ʲ⁾(z) = B⁽ʲ⁾(z)x_i⁽ʲ⁾ (i = 1, …, r; z ∈ U(z_j); j = 1, …, k). Then for ρ > 0 small enough we have

    Span{g₁⁽ʲ⁾(z), …, g_r⁽ʲ⁾(z)} = M(z)

as long as |z − z_j| ≤ ρ, for j = 1, …, k. Let

    S_j = {z ∈ ℂ | |z − z_j| < ρ};  S = S₁ ∪ ··· ∪ S_k

For every z ∈ K ∖ S let P(z) be the projector onto M(z) along N₀. We claim that P(z) is an analytic function on K ∖ S. Indeed, we have to prove this assertion in a neighbourhood of every μ₀ ∈ K ∖ S. Let U₀ be a neighbourhood of μ₀ in the set K ∖ S such that, for z ∈ U₀, M(z) = B(z)M(μ₀) for some analytic and invertible matrix function B(z) on U₀. The matrix function B̃(z) defined on U₀ by the properties that B̃(z)x = B(z)x for all x ∈ M(μ₀) and B̃(z)y = y for all y ∈ N₀ is analytic and invertible. As P(z) = B̃(z)P₀(B̃(z))⁻¹, where P₀ is the projector onto M(μ₀) along N₀, the analyticity of P(z) on U₀ follows.

Let us now prove that there exist vector functions f₁⁽⁰⁾(z), …, f_r⁽⁰⁾(z) that are analytic on K ∖ S and for which

    Span{f₁⁽⁰⁾(z), …, f_r⁽⁰⁾(z)} = Im P(z) = M(z)

where z ∈ K ∖ S. Indeed, let z₀ ∈ K ∖ S be a fixed point. Then dim Im P(z₀) = r; let g₁⁽⁰⁾(z), …, g_r⁽⁰⁾(z) be columns of P(z) that are linearly independent for z = z₀. In view of Lemma 18.2.2, there exist analytic and linearly independent vector functions f₁⁽⁰⁾(z), …, f_r⁽⁰⁾(z) defined on K ∖ S such that

    Span{f₁⁽⁰⁾(z), …, f_r⁽⁰⁾(z)} = Span{g₁⁽⁰⁾(z), …, g_r⁽⁰⁾(z)}

for every z ∈ K ∖ S, except maybe for a finite set of points. (The set of exceptional points is at most finite because of the compactness of K ∖ S.) But from the choice of g₁⁽⁰⁾, …, g_r⁽⁰⁾ it follows that

    Span{g₁⁽⁰⁾(z), …, g_r⁽⁰⁾(z)} = Im P(z) = M(z)

for every z ∈ K ∖ S, except perhaps for a finite set of points [viz., those points z for which the vectors g₁⁽⁰⁾(z), …, g_r⁽⁰⁾(z) are not linearly independent]. Thus

    Span{f₁⁽⁰⁾(z), …, f_r⁽⁰⁾(z)} = M(z)   (18.4.6)

for every z ∈ K ∖ S except maybe for a finite number of points. As both sides of (18.4.6) are analytic families of subspaces on K ∖ S (Proposition 18.1.1), it is easily seen that, in fact, (18.4.6) holds for every z ∈ K ∖ S.

Consider now the systems {f₁⁽⁰⁾(z), …, f_r⁽⁰⁾(z)} and {g₁⁽¹⁾(z), …, g_r⁽¹⁾(z)}. These systems form two bases for M(z) that are analytic in a neighbourhood of the circle |z − z₁| = ρ. Therefore, by Lemma 18.4.3 there exists an r × r matrix function A(z), analytic and invertible on a neighbourhood U of the set {z ∈ ℂ | |z − z₁| = ρ}, such that, for all z ∈ U,

    [g₁⁽¹⁾(z), …, g_r⁽¹⁾(z)] = [f₁⁽⁰⁾(z), …, f_r⁽⁰⁾(z)] A(z)   (18.4.7)

By Lemma 18.4.2, the function A(z) admits an incomplete factorization relative to the circle {z | |z − z₁| = ρ}: A(z) = ⁻A(z) · ⁺A(z) (|z − z₁| = ρ). In view of (18.4.7), we find that, when |z − z₁| = ρ,

    [f₁⁽¹⁾(z) ··· f_r⁽¹⁾(z)] := [g₁⁽¹⁾(z) ··· g_r⁽¹⁾(z)](⁺A(z))⁻¹ = [f₁⁽⁰⁾(z) ··· f_r⁽⁰⁾(z)] ⁻A(z)

Clearly, the functions f₁⁽¹⁾(z), …, f_r⁽¹⁾(z) can be continued analytically to the set K ∖ (S₂ ∪ ··· ∪ S_k). Moreover, since ⁺A(z) [resp. ⁻A(z)] is invertible for |z − z₁| ≤ ρ (resp. |z − z₁| ≥ ρ), the set f₁⁽¹⁾(z), …, f_r⁽¹⁾(z) is linearly independent for every z ∈ K ∖ (S₂ ∪ ··· ∪ S_k). Furthermore, for any z ∈ K ∖ (S₂ ∪ ··· ∪ S_k), we obtain

    Span{f₁⁽¹⁾(z), …, f_r⁽¹⁾(z)} = M(z)

Now take the point z₂ and apply similar arguments, and so on. After k steps one obtains the conclusion of Theorem 18.4.1. □

18.5 PROOF OF THEOREM 18.3.1 (GENERAL CASE)

In this section we finish the proof of Theorem 18.3.1. The main idea is to pass from the case of compact sets (Theorem 18.4.1) to the case of a general domain Ω. To this end we need some approximation theorems.

A set M ⊆ ℂ is called finitely connected if M is connected and ℂ ∖ M consists of a finite number of connected components. A set N ⊆ M is called simply connected relative to M if for every connected component Y of ℂ ∖ N the set Y ∩ (ℂ ∖ M) is not empty. The first of the necessary approximation theorems is the following.

Lemma 18.5.1

Let K ⊆ Ω be a finitely connected compact set that is also simply connected relative to Ω. Let Y₁, …, Y_s be all the bounded components of ℂ ∖ K and, for j = 1, …, s, let z_j ∈ Y_j ∖ Ω be fixed points. Let A(z) be an m × n matrix function that is analytic on K. Then for every ε > 0 there exists a rational matrix function B(z) of size m × n such that B(z) is analytic on ℂ ∖ {z₁, …, z_s} and, for any z ∈ K,

    ‖B(z) − A(z)‖ < ε   (18.5.1)
Proof. Without loss of generality we suppose that m = n = 1, that is, the functions A(z) and B(z) are scalar. We prove that it is possible to choose a rational function of the form

    R(z) = Σ_{v=0}^{k₀} x_v zᵛ + Σ_{j=1}^{s} Σ_{v=1}^{k_j} x_{jv}(z − z_j)⁻ᵛ   (18.5.2)

where the x_v, x_{jv} ∈ ℂ, such that |A(z) − R(z)| < ε for any z ∈ K. Let U ⊆ ℂ ∖ {z₁, …, z_s} be a neighbourhood of K whose boundary ∂U consists of s + 1 closed simple rectifiable contours. Then for z ∈ K we obtain

    A(z) = (2πi)⁻¹ ∫_{∂U} A(η)(η − z)⁻¹ dη

Since this integral can be uniformly approximated by Riemann sums, we have only to prove that the function (η − z)⁻¹ can be uniformly approximated by functions of the form Σ_{v=0}^{k} x_v(z − z_j)⁻ᵛ, x_v ∈ ℂ, where η ∈ ∂U ∩ Y_j (j = 1, …, s), and that (η − z)⁻¹ can be approximated uniformly by polynomials Σ_{v=0}^{k} x_v zᵛ (x_v ∈ ℂ), where η ∈ ∂U ∩ (ℂ ∖ (K ∪ Y₁ ∪ ··· ∪ Y_s)). But these assertions follow from Runge's theorem [Chapter 4 of Markushevich (1965), Vol. 1], which states that, given a simply connected domain Γ in ℂ ∪ {∞} and a point ζ in the interior of (ℂ ∪ {∞}) ∖ Γ, any analytic function f(z) on Γ is the limit of a sequence of rational functions with their only pole at ζ, and the convergence of this sequence to f(z) is uniform on every compact subset of Γ. Indeed, for j = 1, …, s the set bounded by the contour ∂U ∩ Y_j is simply connected, as is the set (ℂ ∪ {∞}) ∖ (K ∪ Y₁ ∪ ··· ∪ Y_s). □

Lemma 18.5.2

Let K and z₁, …, z_s be as in Lemma 18.5.1. If A(z) is an n × n matrix function that is analytic on K and invertible for every z ∈ K, then for every ε > 0 there exists an analytic and invertible matrix function B(z) defined on ℂ ∖ {z₁, …, z_s} such that (18.5.1) holds for any z ∈ K.

Proof. Denote by G the group of all n × n matrix functions M(z) that are analytic on K and invertible for every z ∈ K, together with the topology induced by the norm ‖M‖_G = max_{z∈K} ‖M(z)‖. Let G₁ be the connected component of the topological space G that contains I, the constant n × n identity matrix. In fact

    G₁ = {X ∈ G | there exist an integer v > 0 and M₁, …, M_v ∈ G with ‖M_j‖_G < 1, j = 1, …, v, such that X = Π_{j=1}^{v} (I − M_j)}   (18.5.3)
586 Analytic Families of Subspaces Indeed, denoting the right-hand side of (18.5.3) by G0, let us prove first that G0 is both a closed and an open set in G. Let FE.G0 and HEGbe such that \\H - F\\G < ||F~'||G'. Then H = (/- M)F, where M = I-HF'\ We have ||M||G = ||/-//F-1||G = ||(//-F)F,||G<||//-F||G||F-1||G<1 that is, H G G0. So GQ is open. Suppose now that Ff G G0, j = 1,2, . . . and ||Fy — F|| -»0 for some F G G. Let ;0 be large enough such that ||F; - F|| < ||Fr'||-'. Then F = (/-M)F/o where ||M||C = ||/-FFr,||c<l,°that is, FE. G0. So GQ is a closed set. Now let us prove that G0 is connected. Let V X = U{I-Mi)SG0, \\Mj\\G<l then V ^(0 = I1(/-/M.), ?G[0,1] ;=i is a continuous function that connects X and /in G0. So G0 is connected and thus is the connected component of G that contains /. So (18.5.3) is proved. As a side observation, note that G0 is also a subgroup of G. Indeed, let X, YE.G0. Then the set X-G^1 is connected and contains /; therefore, X- G0 ' C G0. In particular, XY~l G G0, which means that G0 is a subgroup of G. Now let j4(z) be as in Lemma 18.5.2, and suppose first that A G G,. Then A = (I-Ml)---(I-Mv) for some M,, . . . , MvElG with ||Af^jlc < 1 for ;' = 1,. . . , v. Rewrite this representation in the form A = exp(ln(/ - M,)) • • ■ exp(ln(/ - MJ) where ln(/-A#y)=Z -M) k = \ K By Lemma 18.5.1, for each /' = 1,. . . , v there exists a rational n x n matrix function D- whose poles are contained in {z,,. . . , zs,»}, with the property that DJ approximates the analytic function ln(/ - M;(z)) well enough to ensure that the analytic matrix function B(z) = exp(D,(z))- ■ • exp(D„(z))
We now pass to the general case. Let $G_A$ be the connected component of $G$ that contains $A(z)$. It suffices to show that there exists an $n \times n$ matrix function $D(z)$ that is analytic and invertible in $\mathbb{C} \smallsetminus \{z_1, \ldots, z_s\}$ and such that $D(z) \in G_A$. Indeed, then $A(z)D(z)^{-1} \in G_I$ and, as we have seen already, there exists an analytic and invertible matrix function $\tilde{B}(z)$ in $\mathbb{C} \smallsetminus \{z_1, \ldots, z_s\}$ with the property that $\|\tilde{B} - AD^{-1}\|_G < \varepsilon \|D\|_G^{-1}$. The matrix function $B(z) = \tilde{B}(z)D(z)$ is the desired one.

Thus let us prove the existence of $D(z)$. According to Lemma 18.5.1, for every $\delta > 0$ there exists a rational matrix function $D_0(z)$ that is analytic on $\mathbb{C} \smallsetminus \{z_1, \ldots, z_s\}$ and such that $\|D_0(z) - A(z)\| < \delta$ when $z \in K$. Choose $\delta > 0$ small enough to ensure that $D_0(z)$ is invertible for $z \in K$ and $D_0 \in G_A$. Since $D_0(z)$ is a rational function, $\det D_0(z) \ne 0$ for every $z \in \mathbb{C} \smallsetminus \{z_1, \ldots, z_s\}$ except perhaps for a finite set of points $\eta_1, \ldots, \eta_m \in \mathbb{C} \smallsetminus \{z_1, \ldots, z_s\}$, which do not belong to $K$. Denote by $Y(\eta_1)$ the connected component of $\mathbb{C} \smallsetminus K$ that contains $\eta_1$, and let $z(\eta_1)$ be the point from $\{\infty, z_1, \ldots, z_s\}$ that belongs to $Y(\eta_1)$. Let $\rho > 0$ be such that the disc $\{z \in \mathbb{C} \mid |z - \eta_1| \le \rho\}$ is contained in $Y(\eta_1) \smallsetminus \{z(\eta_1), \eta_2, \ldots, \eta_m\}$. By Lemma 18.4.2 there exists an incomplete factorization of $D_0(z)$ with respect to the circle $|z - \eta_1| = \rho$:

$$D_0(z) = {}^-\!D_0(z) \cdot {}^+\!D_0(z) \qquad (|z - \eta_1| = \rho) \qquad (18.5.4)$$

where ${}^+\!D_0(z)$ is analytic and invertible in the disc $\{z \in \mathbb{C} \mid |z - \eta_1| < \rho\}$ and ${}^-\!D_0(z)$ is analytic and invertible for $\rho < |z - \eta_1| \le \infty$. The equality ${}^-\!D_0 = D_0({}^+\!D_0)^{-1}$ shows that ${}^-\!D_0$ admits analytic continuation to the whole of $\mathbb{C}$ and that ${}^-\!D_0(z)$ is invertible for all $z \ne \eta_1$. Also, ${}^+\!D_0$ is analytic and invertible on $\mathbb{C} \smallsetminus \{z_1, \ldots, z_s, \eta_2, \ldots, \eta_m\} \supset K$.

Let $\gamma(t)$, $0 \le t \le 1$, be a continuous function with values in $Y(\eta_1)$ such that $\gamma(0) = \eta_1$, $\gamma(1) = z(\eta_1)$. Then the formula

$$F_t(z) = {}^-\!D_0(z + \eta_1 - \gamma(t)), \qquad z \in K, \quad 0 \le t \le 1$$

defines a continuous map $F: [0, 1] \to G$ with $F_0 = {}^-\!D_0$. Hence $D_1 = F_1 \cdot {}^+\!D_0 \in G_A$. As ${}^+\!D_0$ is invertible on $\mathbb{C} \smallsetminus \{z_1, \ldots, z_s, \eta_2, \ldots, \eta_m\}$ and $F_1(z) = {}^-\!D_0(z + \eta_1 - z(\eta_1))$ is invertible on $\mathbb{C} \smallsetminus \{z(\eta_1)\}$, it follows that $D_1(z)$ is analytic and invertible on $\mathbb{C} \smallsetminus \{z_1, \ldots, z_s, \eta_2, \ldots, \eta_m\}$. Repeating this argument $m - 1$ times with respect to the points $\eta_2, \ldots, \eta_m$, we obtain the desired function $D(z)$. □

The following lemma is the main approximation result that will be used in the transition from compact sets in $\Omega$ to the domain $\Omega$ itself.
588 Analytic Families of Subspaces Lemma 18.5.3 Let KC ft be a finitely connected compact set that is also simply connected relative to ft. Let M C (p" be a fixed subspace and A(z) be an n X n matrix function that is analytic and invertible on K and such that A(z)M = M for 2 E K. Then for every e > 0 there exists a matrix function B(z) that is analytic and invertible on ft and such that \\B(z)-A{z)\\<e for all z G K and B(z)M = M for all zEil. Proof. Without loss of generality, we can assume that M = Span{<?,,. . . , er), for some r. Then in the 2x2 block matrix formed by representation with respect to the direct sum decomposition M + M ± = (p" we have Because A{z) is invertible when z G K, so are A^(z) and A2(z). Use Lemma 18.5.2 to find matrix functions fl,(z) and B2(z) that are analytic and invertible on Q and such that \\B,(z) - ^,(z)|| < e/3 for z G K; i = 1, 2. By Lemma 18.5.1 there exists an analytic matrix function Bl2(z) on ft such that HBi2(2)-^i2(2)II <e/3 for zG K. Then fl,(z) Bl2(z) 0 B2(z) B(z) satisfies the requirements of Lemma 18.5.3. □ The following result allows us to pass from the compact sets in ft to ft itself. Lemma 18.5.4 Let K,C K2C- ■ ■ Cft be a sequence of finitely connected compact sets K-, which are also simply connected relative to ft. For m = 1,2,..., let Gm(z) be an n x n matrix function that is analytic and invertible on Km and satisfies Gm{z)M = Ji for z G Km and for some fixed subspace M C (pn. Then for m = 1, 2,. . . , there exists an n x n matrix function Dm{z) that is analytic and invertible on Km and such that, whenever zE.Km DJz)M=M and Gm(z) = Dm(z)Dml+l(z) Proof. We need the following simple assertion. Let Xl, X2,... , be a sequence of n x n matrices such that
$$\alpha \stackrel{\mathrm{def}}{=} \sum_{m=1}^{\infty} \|X_m\| < \infty \qquad (18.5.5)$$

Then the infinite product $Y = \prod_{m=1}^{\infty} (I + X_m)$ converges and $\|I - Y\| \le \alpha e^{\alpha}$. Indeed, for the matrices $Y_m = \prod_{l=1}^{m} (I + X_l)$ we have the estimates

$$\|Y_m\| \le \prod_{l=1}^{m} (1 + \|X_l\|) \le \exp\Big( \sum_{l=1}^{m} \|X_l\| \Big) \le e^{\alpha} \qquad (m = 1, 2, \ldots)$$

$$\|Y_m - Y_{m+1}\| = \|Y_m - Y_m(I + X_{m+1})\| = \|Y_m X_{m+1}\| \le \|Y_m\| \cdot \|X_{m+1}\|$$

Thus, in view of (18.5.5), the infinite product $Y = \prod_{m=1}^{\infty} (I + X_m)$ converges. Moreover

$$\|I - Y\| \le \|I - Y_1\| + \sum_{m=1}^{\infty} \|Y_m - Y_{m+1}\| \le \|X_1\| + \sum_{m=1}^{\infty} \|X_{m+1}\| e^{\alpha} \le \alpha e^{\alpha}$$

We now prove Lemma 18.5.4 itself. Applying Lemma 18.5.3 repeatedly, we find for $m = 1, 2, \ldots$ a matrix function $H_m(z)$ that is analytic and invertible on $K_m$, for which $H_1(z) \equiv I$, and, for $z \in K_m$, $H_m(z)\mathcal{M} = \mathcal{M}$ and

$$\|I - H_m^{-1}(z) G_m(z) H_{m+1}(z)\| < 2^{-(m+1)}$$

The assertion proved in the preceding paragraph ensures that for every $m = 1, 2, \ldots$ the infinite product

$$E_m = \prod_{l=0}^{\infty} \big( H_{m+l}^{-1} G_{m+l} H_{m+l+1} \big)$$

converges uniformly on $K_m$, and $\|I - E_m(z)\| \le 2^{-m} \exp(2^{-m}) < 1$ for $z \in K_m$. Consequently, $E_m(z)$ is invertible for every $z \in K_m$. Further, $E_m(z)\mathcal{M} = \mathcal{M}$ ($z \in K_m$; $m = 1, 2, \ldots$). Indeed, since $E_m(z)$ is invertible, it is sufficient to prove that $E_m(z)\mathcal{M} \subset \mathcal{M}$. But this follows from the equalities $H_m(z)\mathcal{M} = \mathcal{M}$, $G_m(z)\mathcal{M} = \mathcal{M}$ and the definition of $E_m$. Now we can put $D_m(z) = H_m(z)E_m(z)$, because $E_m = H_m^{-1} G_m H_{m+1} E_{m+1}$ and consequently $G_m = (H_m E_m)(H_{m+1} E_{m+1})^{-1}$. □

We are now prepared to prove Theorem 18.3.1.

Proof of Theorem 18.3.1. Let us show first that there exists a sequence of compact sets $K_1 \subset K_2 \subset \cdots$ that are finitely connected, simply connected relative to $\Omega$, and for which $\bigcup_{m=1}^{\infty} K_m = \Omega$. To this end choose a sequence of closed discs $S_m \subset \Omega$, $m = 1, 2, \ldots$, such that $\bigcup_{m=1}^{\infty} S_m = \Omega$. It is sufficient to construct $K_m$ in such a way that $K_m \supset S_m$, $m = 1, 2, \ldots$. Put $K_1 = S_1$,
590 Analytic Families of Subspaces suppose that Ki,. . . , Km are already constructed, and Ki D Sj for ;' = 1,. . . , m. Let M be a connected compact set such that M D Km U 5m+1, and let Vx, . . . , Vk Cil be a finite set of closed discs from {5m}^=1 such that N = U*=1 Vj D M. Clearly, A' is a finitely connected compact set. If N is also simply connected relative to il, then put Km+y = N. Otherwise, put Km+} = NUF,U---UFS, where V,,. . . , Ys are all the bounded connected components of the set <£""- N, which are entirely in il. Given the sequence Kx C K2 C • ■ • constructed in the preceding paragraph. Choose z0 G K{ and put M0 = M(zQ) [here M(z) is the analytic family of subspaces (of <p") on Q given in Theorem 18.3.1]. Without loss of generality we can assume that MQ = Span{e,,. . . , er}. By Theorem 18.4.1, there exist analytic vector functions /, \z),. . . , /*m)(z) in Km that form a basis in M(z) for every z G Km. Using Lemma 18.2.2, we find analytic vector functions f%\(z),. . . , f„(z) defined on Km such that the vectors /(,m)(z),. . . , /im)(z) form a basis in <f" for every zEKm (indeed, apply Lemma 18.2.2 with *,(*) = /(1m)(z),. . . , x,{z) = /<m)(z), *,+1(z) = g,,. . . ,xn{z) = gn_r, where g,,. . . , gn_r is a basis in a fixed direct complement to M0). Then the matrix function Am(z) = t/im)(z)> /2m)(z). ■ ■ • . /im)(^)] ^ analytic and invertible on Km and satisfies i(z) = /lm(z)ia (18.5.6) where z<EKm. Put Gm(z) = i4^l(z)i4m+1(z) for zEKm. Then (18.5.6) ensures that Gm(z)Ma = M0 (z E Km,m = 1,2,. . .). By Lemma 18.5.4 (for m = 1,2,. . .) there exists an analytic and invertible matrix function Dm(z) on Km such that Gm = DmD~m\x and, for z G Km Dm(z)M0 = M0 (18.5.7) Since ,4m+1(z)Dm+1(z) = Am(z)Dm(z) (zEKm; m = 1, 2,. . .) the relation j4(z) = j4m(z)Dm(z), which holds for all zEKm, defines an analytic and invertible matrix function ^4(z) on il. Now the relation A(z)M0 = M(z) for z G11 follows from (18.5.6) and (18.5.7). □ 18.6 DIRECT COMPLEMENTS FOR ANALYTIC FAMILIES OF SUBSPACES Let M(z) be an analytic family of subspaces of <(7" defined on a domain il. If Jf is a direct complement to M{za) and z0 G il, then the results of Chapter 13 (Theorem 13.1.3) imply that N is also a direct complement to M(z) as long as z is sufficiently close to z0. This local property of direct complements raises the corresponding global question: does there exist a subspace Jf of (p" that is a direct complement to M(z) for all zEil? The simple example below shows that the answer is generally no.
EXAMPLE 18.6.1. Let

$$\mathcal{M}(z) = \operatorname{Im} \begin{bmatrix} z^2 - z + 1 \\ z^2 + z \end{bmatrix} \subset \mathbb{C}^2, \qquad z \in \mathbb{C}$$

As the polynomials $z^2 - z + 1$ and $z^2 + z$ do not have common zeros, it follows that $\mathcal{M}(z)$ is an analytic family of subspaces. Indeed, if $z_0$ is such that $z_0^2 + z_0 \ne 0$, then in a neighbourhood of $z_0$ we have

$$\mathcal{M}(z) = \begin{bmatrix} 1 & z^2 - z + 1 \\ 0 & z^2 + z \end{bmatrix} \Big( \operatorname{Span} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \Big)$$

and if $z_0$ is such that $z_0^2 - z_0 + 1 \ne 0$, then there is a neighbourhood of $z_0$ in which

$$\mathcal{M}(z) = \begin{bmatrix} z^2 - z + 1 & 0 \\ z^2 + z & 1 \end{bmatrix} \Big( \operatorname{Span} \begin{bmatrix} 1 \\ 0 \end{bmatrix} \Big)$$

However, there is no one-dimensional subspace $\operatorname{Span}\begin{bmatrix} a \\ b \end{bmatrix}$ (with at least one of the complex numbers $a$, $b$ nonzero) such that

$$\operatorname{Span} \begin{bmatrix} a \\ b \end{bmatrix} \dotplus \mathcal{M}(z) = \mathbb{C}^2 \qquad (18.6.1)$$

for all $z \in \mathbb{C}$. Indeed, (18.6.1) means

$$\det \begin{bmatrix} a & z^2 - z + 1 \\ b & z^2 + z \end{bmatrix} = (a - b)z^2 + (a + b)z - b \ne 0$$

for all $z \in \mathbb{C}$, which is impossible. □

It turns out that although one common direct complement for an analytic family of subspaces may not exist, only two subspaces are needed to serve as "alternate" direct complements for each member of the analytic family.

Theorem 18.6.1
For an analytic family of subspaces $\{\mathcal{M}(z)\}_{z \in \Omega}$ of $\mathbb{C}^n$ there exist two subspaces $\mathcal{N}_1, \mathcal{N}_2 \subset \mathbb{C}^n$ such that for each $z \in \Omega$ either $\mathcal{M}(z) \dotplus \mathcal{N}_1 = \mathbb{C}^n$ or $\mathcal{M}(z) \dotplus \mathcal{N}_2 = \mathbb{C}^n$ holds.

Proof. To prove this we first need the following observation: for any $k$-dimensional subspace $\mathcal{L} \subset \mathbb{C}^n$, the set $DC(\mathcal{L})$ of all direct complements to $\mathcal{L}$ in $\mathbb{C}^n$ is open and dense in the set of all $(n-k)$-dimensional subspaces. Indeed, the openness of $DC(\mathcal{L})$ follows immediately from Theorem 13.1.3. To prove denseness, let $\mathcal{N}$ be an $(n-k)$-dimensional subspace in $\mathbb{C}^n$ with
basis $f_1, \ldots, f_{n-k}$, and let $\mathcal{N}_0$ be a direct complement to $\mathcal{L}$ with basis $g_1, \ldots, g_{n-k}$. For a complex number $\varepsilon$ put

$$\mathcal{N}(\varepsilon) = \operatorname{Span}\{f_1 + \varepsilon g_1, \ldots, f_{n-k} + \varepsilon g_{n-k}\}$$

Clearly, the vectors $f_i + \varepsilon g_i$, $i = 1, \ldots, n-k$, are linearly independent for $\varepsilon$ close enough to 0, so $\dim \mathcal{N}(\varepsilon) = n - k$. Moreover, Theorem 13.4.2 shows that

$$\lim_{\varepsilon \to 0} \theta(\mathcal{N}(\varepsilon), \mathcal{N}) = 0$$

It remains to show that $\mathcal{N}(\varepsilon)$ belongs to $DC(\mathcal{L})$. To this end pick a basis $h_1, \ldots, h_k$ in $\mathcal{L}$, and consider the $n \times n$ matrix

$$G(\varepsilon) = [f_1 + \varepsilon g_1, \ldots, f_{n-k} + \varepsilon g_{n-k}, h_1, \ldots, h_k]$$

As $\det[g_1, \ldots, g_{n-k}, h_1, \ldots, h_k] \ne 0$ (recall that $\mathcal{N}_0 \dotplus \mathcal{L} = \mathbb{C}^n$), also

$$\det[\varepsilon^{-1} f_1 + g_1, \ldots, \varepsilon^{-1} f_{n-k} + g_{n-k}, h_1, \ldots, h_k] \ne 0$$

for $|\varepsilon|$ sufficiently large. Hence $\det G(\varepsilon) \ne 0$ for $|\varepsilon|$ large enough. We find that $\det G(\varepsilon) \not\equiv 0$, and since $\det G(\varepsilon)$ is a polynomial in $\varepsilon$, it follows that $\det G(\varepsilon) \ne 0$ for $\varepsilon \ne 0$ and sufficiently close to zero. Obviously, $\mathcal{N}(\varepsilon) \in DC(\mathcal{L})$ for such $\varepsilon$.

Now we start to prove Theorem 18.6.1 itself. Fix $z_0 \in \Omega$, and let $\mathcal{N}_1$ be a direct complement to $\mathcal{M}(z_0)$ in $\mathbb{C}^n$. By Theorem 18.3.2 it is possible to pick vector functions $x_1(z), \ldots, x_p(z) \in \mathbb{C}^n$ that are analytic on $\Omega$ and such that, for every $z \in \Omega$, the vectors $x_1(z), \ldots, x_p(z)$ form a basis in $\mathcal{M}(z)$. Letting $f_1, \ldots, f_{n-p}$ be a basis in $\mathcal{N}_1$, consider the $n \times n$ matrix function

$$G(z) = [f_1, \ldots, f_{n-p}, x_1(z), \ldots, x_p(z)]$$

which is analytic on $\Omega$. As $\det G(z_0) \ne 0$, the determinant of $G(z)$ is not identically zero, and thus the number of distinct zeros of $\det G(z)$ is at most countable. Let $z_1, z_2, \ldots \in \Omega$ be all of these zeros. Then $\mathcal{N}_1$ is a direct complement to $\mathcal{M}(z)$ for $z \notin \{z_1, z_2, \ldots\}$. On the other hand, we have seen that, for $i = 1, 2, \ldots$, the sets $DC(\mathcal{M}(z_i))$ are open and dense in the set of all $(n-p)$-dimensional subspaces in $\mathbb{C}^n$. As the latter set is a complete metric space in the gap topology (Section 13.4), it follows that the intersection $\bigcap_{i=1}^{\infty} DC(\mathcal{M}(z_i))$ is again dense [the Baire category theorem; e.g., see Kelley (1955)]. In particular, this intersection is not empty, so there exists a subspace $\mathcal{N}_2 \subset \mathbb{C}^n$ that is simultaneously a direct complement to all of $\mathcal{M}(z_1), \mathcal{M}(z_2), \ldots$ □
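For the family of Example 18.6.1, Theorem 18.6.1 can be checked concretely. A minimal symbolic sketch (the choice $\mathcal{N}_1 = \operatorname{Span}\{e_1\}$, $\mathcal{N}_2 = \operatorname{Span}\{e_2\}$ is ours, made for illustration):

import sympy as sp

z = sp.symbols('z')
m = sp.Matrix([z**2 - z + 1, z**2 + z])   # spans M(z) in Example 18.6.1
e1 = sp.Matrix([1, 0])
e2 = sp.Matrix([0, 1])

d1 = sp.Matrix.hstack(e1, m).det()        # N1 = Span{e1} fails exactly at the roots
print(sp.solve(d1, z))                    # [-1, 0]

# at those exceptional points, N2 = Span{e2} still works:
d2 = sp.Matrix.hstack(e2, m).det()
print([d2.subs(z, r) for r in sp.solve(d1, z)])   # [-3, -1], both nonzero

So $\mathcal{N}_1$ complements $\mathcal{M}(z)$ except at $z = 0, -1$, and at those two points $\mathcal{N}_2$ does, exactly as the theorem predicts.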
The following result shows that for analytic families of subspaces that appear as the kernel or the image of a linear matrix function there exists a common direct complement. As Example 18.6.1 shows, the result is not necessarily valid for nonlinear matrix functions.

Theorem 18.6.2
Let $T_1$ and $T_2$ be $m \times n$ matrices such that the dimension of $\operatorname{Ker}(T_1 + zT_2)$ is constant, that is, independent of $z$ on $\mathbb{C}$ [and then the same is automatically true for $\dim \operatorname{Im}(T_1 + zT_2)$]. Then there exist subspaces $\mathcal{N}_1 \subset \mathbb{C}^n$, $\mathcal{N}_2 \subset \mathbb{C}^m$ such that

$$\mathcal{N}_1 \dotplus \operatorname{Ker}(T_1 + zT_2) = \mathbb{C}^n, \qquad \mathcal{N}_2 \dotplus \operatorname{Im}(T_1 + zT_2) = \mathbb{C}^m$$

for all $z \in \mathbb{C}$.

Note that in view of Proposition 18.1.1 and Theorem 18.2.1 the families of subspaces $\operatorname{Ker}(T_1 + zT_2)$ and $\operatorname{Im}(T_1 + zT_2)$ are analytic on $\mathbb{C}$.

Proof. For the proof of Theorem 18.6.2 we use the Kronecker canonical form for linear matrix polynomials under strict equivalence (which is developed in the appendix to this book). As $\dim \operatorname{Ker}(T_1 + zT_2)$ is independent of $z \in \mathbb{C}$, the canonical form of $T_1 + zT_2$ does not have the term $zI + J$. So, in the notation of Theorem A.7.3, there exist invertible matrices $Q_1$ and $Q_2$ such that

$$Q_1(T_1 + zT_2)Q_2 = 0_{u \times v} \oplus L_{p_1} \oplus \cdots \oplus L_{p_k} \oplus L_{q_1}^T \oplus \cdots \oplus L_{q_l}^T \oplus (I_{r_1} + zJ_{r_1}(0)) \oplus \cdots \oplus (I_{r_t} + zJ_{r_t}(0)) \qquad (18.6.2)$$

It is easily seen that

$$\operatorname{Ker} L_{q_i}^T = \{0\}, \qquad \operatorname{Ker}(I_{r_i} + zJ_{r_i}(0)) = \{0\}$$

for all $z \in \mathbb{C}$, and that

$$\operatorname{Ker} L_{p_i} = \operatorname{Span}\{e_1 - ze_2 + z^2 e_3 - \cdots + (-z)^{p_i - 1} e_{p_i}\}, \qquad z \in \mathbb{C}$$
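For instance, if $L_p$ has the usual form of the singular Kronecker block (an assumption on the appendix's notational convention), then for $p = 3$:

$$L_3(z) = \begin{bmatrix} z & 1 & 0 \\ 0 & z & 1 \end{bmatrix}, \qquad L_3(z) \begin{bmatrix} 1 \\ -z \\ z^2 \end{bmatrix} = \begin{bmatrix} z - z \\ -z^2 + z^2 \end{bmatrix} = 0$$

so $\operatorname{Ker} L_3 = \operatorname{Span}\{e_1 - ze_2 + z^2 e_3\}$ for every $z$, and $\operatorname{Span}\{e_2, e_3\}$ is a direct complement to it simultaneously for all $z$; this is exactly how $\mathcal{M}_1$ is assembled below.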
So there exists a direct complement $\mathcal{M}_1$ to $\operatorname{Ker}[Q_1(T_1 + zT_2)Q_2]$ valid for all $z \in \mathbb{C}$, given as follows:

$$\mathcal{M}_1 = \operatorname{Span}\{e_{v+2}, \ldots, e_{v+p_1},\ e_{v+p_1+2}, \ldots, e_{v+p_1+p_2},\ \ldots,\ e_{v+p_1+\cdots+p_{k-1}+2}, \ldots, e_{v+p_1+\cdots+p_k},\ e_j \text{ with } j > v + p_1 + \cdots + p_k\}$$

As

$$\operatorname{Ker}(T_1 + zT_2) = Q_2\big(\operatorname{Ker}[Q_1(T_1 + zT_2)Q_2]\big)$$

it follows that

$$Q_2\mathcal{M}_1 \dotplus \operatorname{Ker}(T_1 + zT_2) = \mathbb{C}^n, \qquad z \in \mathbb{C}$$

The part of Theorem 18.6.2 concerning $\operatorname{Im}(T_1 + zT_2)$ is proved similarly, taking into account the facts that

$$\operatorname{Im} L_{p_i} = \mathbb{C}^{p_i - 1}, \qquad \operatorname{Im}(I_{r_i} + zJ_{r_i}(0)) = \mathbb{C}^{r_i}$$

and that for each $z \in \mathbb{C}$, $\operatorname{Im} L_{q_i}^T$ has a direct complement $\operatorname{Span}\{e_1\}$. □

18.7 ANALYTIC FAMILIES OF INVARIANT SUBSPACES

Let $A(z): \mathbb{C}^n \to \mathbb{C}^n$ be an analytic family of transformations on $\Omega$. Our next topic concerns the analytic properties (as functions of $z$) of certain invariant subspaces of $A(z)$. We have already seen some first results in this direction in Section 18.1. Namely, if the rank of $A(z)$ is independent of $z$, then $\operatorname{Im} A(z)$ and $\operatorname{Ker} A(z)$ are analytic families of subspaces. In the general case, $\operatorname{Im} A(z)$ and $\operatorname{Ker} A(z)$ become analytic families of subspaces if corrected on the singular set of $A(z)$. The next theorem is mainly a reformulation of this statement. For convenience, let us introduce another definition: an analytic family of subspaces $\{\mathcal{M}(z)\}_{z \in \Omega}$ is called $A(z)$ invariant on $\Omega$ if the subspace $\mathcal{M}(z)$ is $A(z)$ invariant for every $z \in \Omega$.

Theorem 18.7.1
There exist $A(z)$-invariant analytic families $\{\mathcal{M}(z)\}_{z \in \Omega}$ and $\{\mathcal{N}(z)\}_{z \in \Omega}$ such that $\mathcal{M}(z) = \operatorname{Im} A(z)$ and $\mathcal{N}(z) = \operatorname{Ker} A(z)$ for every $z$ not belonging to the singular set of $A(z)$.

Proof. In view of Theorem 18.2.1 we have only to prove that $\mathcal{M}(z_0)$ and $\mathcal{N}(z_0)$ are $A(z_0)$ invariant for every $z_0 \in S(A)$. But this follows from Theorem 15.1.1 because $\lim_{z \to z_0} A(z) = A(z_0)$ and

$$\lim_{z \to z_0} \theta(\mathcal{M}(z), \mathcal{M}(z_0)) = \lim_{z \to z_0} \theta(\mathcal{N}(z), \mathcal{N}(z_0)) = 0 \qquad \square$$

Another class of $A(z)$-invariant subspaces whose behaviour is analytic (at least locally) includes spectral subspaces, as follows.

Theorem 18.7.2
Let $\Gamma$ be a contour in the complex plane such that $\Gamma \cap \sigma(A(z_0)) = \emptyset$ for a fixed $z_0 \in \Omega$. Then the sum $\mathcal{M}_\Gamma(z)$ of the root subspaces of $A(z)$
corresponding to the eigenvalues inside $\Gamma$ is an $A(z)$-invariant analytic family of subspaces in a neighbourhood $U$ of $z_0$.

Proof. As $A(z)$ is a continuous function of $z$ on $\Omega$, the eigenvalues of $A(z)$ also depend continuously on $z$. Hence there is a neighbourhood $U$ of $z_0$ such that $A(z)$ has no eigenvalues on $\Gamma$ for any $z$ in the closure of $U$. Now for $z \in U$ we have

$$\mathcal{M}_\Gamma(z) = \operatorname{Im}\Big[ \frac{1}{2\pi i} \int_\Gamma (\lambda I - A(z))^{-1}\, d\lambda \Big] \qquad (18.7.1)$$

We have seen in Section 2.4 that

$$P(z) = \frac{1}{2\pi i} \int_\Gamma (\lambda I - A(z))^{-1}\, d\lambda \qquad (18.7.2)$$

is a projector for every $z \in U$. So, to prove that $\mathcal{M}_\Gamma(z)$ is an analytic family in $U$, it is sufficient to check that $P(z)$ is an analytic function on $U$. Indeed, $|\det(\lambda I - A(z))| \ge \delta > 0$ for every $\lambda \in \Gamma$ and $z \in U$, where $\delta$ is independent of $\lambda$ and $z$. Hence $\|(\lambda I - A(z))^{-1}\|$ is bounded for $\lambda \in \Gamma$ and $z \in U$, and consequently the Riemann sums

$$\frac{1}{2\pi i} \sum_{j=0}^{m-1} (\lambda_{j+1} - \lambda_j)(\lambda_j I - A(z))^{-1}$$

where $\lambda_0, \ldots, \lambda_m$ are consecutive points in the positive direction on $\Gamma$ with $\lambda_m = \lambda_0$, converge to the integral (18.7.2) uniformly on every compact set in $U$. As each Riemann sum is obviously analytic on $U$, so is the integral (18.7.2). □

In view of Theorems 18.7.1 and 18.7.2, the following question arises naturally: does there exist an $A(z)$-invariant analytic family that is nontrivial (i.e., different from $\{0\}$ and $\mathbb{C}^n$)? Without restrictions on $A(z)$ the answer is no, as the following example shows.

EXAMPLE 18.7.1. Define an analytic family on $\mathbb{C}$ by

$$A(z) = \begin{bmatrix} 0 & 1 \\ z & 0 \end{bmatrix}$$

Here the $A(z)$-invariant subspaces (for a fixed $z$) are easy to find: the only nontrivial invariant subspace of $A(0)$ is $\operatorname{Span}\{e_1\}$, and, when $z \ne 0$, the only nontrivial invariant subspaces of $A(z)$ are

$$\operatorname{Span}\Big\{ \begin{bmatrix} 1 \\ u_1 \end{bmatrix} \Big\} \quad \text{and} \quad \operatorname{Span}\Big\{ \begin{bmatrix} 1 \\ u_2 \end{bmatrix} \Big\}$$

where $u_1$ and $u_2$ are the square roots of $z$. It is easily seen that there is no nontrivial, $A(z)$-invariant, analytic family of subspaces on $\mathbb{C}$. □
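The contour integral (18.7.2) is also convenient computationally. A minimal NumPy sketch (ours, for illustration; trapezoidal quadrature on a circle, applied to the family of Example 18.7.1 at a point where $\Gamma$ separates the two eigenvalues):

import numpy as np

def riesz_projector(A, center, radius, m=400):
    # P = (1/2*pi*i) * integral of (lambda*I - A)^{-1} over |lambda - center| = radius
    th = 2 * np.pi * np.arange(m) / m
    lam = center + radius * np.exp(1j * th)
    dlam = 1j * radius * np.exp(1j * th) * (2 * np.pi / m)
    I = np.eye(A.shape[0])
    P = sum(np.linalg.solve(lam[j] * I - A, I) * dlam[j] for j in range(m))
    return P / (2j * np.pi)

z = 1.1 + 0.2j                        # a point near z0 = 1
A = np.array([[0, 1], [z, 0]])
P = riesz_projector(A, center=1.0, radius=0.5)   # Gamma encloses +sqrt(z) only
print(np.linalg.norm(P @ P - P))      # ~ 0: P is a projector
print(np.round(np.trace(P).real))     # 1.0: one eigenvalue inside Gamma

Since the integrand depends analytically on $z$, so does $P(z)$, which is the content of the proof above.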
In the next section we study $A(z)$-invariant analytic families of subspaces under the extra condition that $A(z)$ have the same Jordan structure for all $z \in \Omega$. We shall see that, in this case, nontrivial $A(z)$-invariant analytic families of subspaces always exist. On the other hand, we have seen in Example 18.7.1 that there exists a nontrivial $A(z)$-invariant family of subspaces that is analytic in $\mathbb{C}$ except for the branch point at zero. Such phenomena occur more generally and are studied in detail in Chapter 19.

18.8 ANALYTIC DEPENDENCE OF THE SET OF INVARIANT SUBSPACES AND FIXED JORDAN STRUCTURE

Given a family of transformations $A(z): \mathbb{C}^n \to \mathbb{C}^n$ that depends analytically on the parameter $z$ in a domain $\Omega \subset \mathbb{C}$, we say that the lattice $\operatorname{Inv}(A(z))$ depends analytically on $z \in \Omega$ if there exists an invertible transformation $S(z): \mathbb{C}^n \to \mathbb{C}^n$ that is analytic on $\Omega$ and such that

$$\operatorname{Inv}(A(z)) = S(z)\big(\operatorname{Inv}(A(z_0))\big)$$

for all $z \in \Omega$ and some fixed point $z_0 \in \Omega$. This definition does not depend on the choice of $z_0$. Indeed, if $\operatorname{Inv}(A(z)) = S(z)(\operatorname{Inv}(A(z_0)))$, then for every $z_0' \in \Omega$ we have

$$\operatorname{Inv}(A(z)) = S(z)\big(S(z_0')\big)^{-1}\big(\operatorname{Inv}(A(z_0'))\big)$$

Also, replacing $S(z)$ by $S(z)S(z_0)^{-1}$, we can require in the definition of analytic dependence of $\operatorname{Inv}(A(z))$ that $S(z_0) = I$.

Since $\operatorname{Inv}(A)$ and $\operatorname{Inv}(B)$ are linearly isomorphic if and only if $A$ and $B$ have the same Jordan structure (Theorem 16.1.2), a necessary condition for analytic dependence of $\operatorname{Inv}(A(z))$ on $z$ is that $A(z)$ have fixed Jordan structure, that is, the number $m$ of different eigenvalues of $A(z)$ is independent of $z$ on $\Omega$, and for every pair $z_1, z_2 \in \Omega$ the distinct eigenvalues $\lambda_1(z_1), \ldots, \lambda_m(z_1)$ and $\lambda_1(z_2), \ldots, \lambda_m(z_2)$ of $A(z_1)$ and $A(z_2)$, respectively, can be enumerated so that the partial multiplicities of $\lambda_j(z_1)$ [as an eigenvalue of $A(z_1)$] coincide with the partial multiplicities of $\lambda_j(z_2)$ [as an eigenvalue of $A(z_2)$], for $j = 1, \ldots, m$.

Using Theorem 16.1.2, we find that the family $A(z)$ has fixed Jordan structure if and only if, for every $z_1, z_2 \in \Omega$, the lattices $\operatorname{Inv}(A(z_1))$ and $\operatorname{Inv}(A(z_2))$ are isomorphic. Clearly, this property is necessary for the lattice $\operatorname{Inv}(A(z))$ to depend analytically on $z \in \Omega$. The following result shows that this property is also sufficient as long as $\Omega$ is simply connected.

Theorem 18.8.1
Let $\Omega$ be a simply connected domain in $\mathbb{C}$, and let $A(z): \mathbb{C}^n \to \mathbb{C}^n$ be an analytic family of transformations on $\Omega$. Then $\operatorname{Inv}(A(z))$ depends analytically on $z \in \Omega$ if and only if $A(z)$ has fixed Jordan structure.
In particular, the condition of a fixed Jordan structure ensures the existence of at least as many $A(z)$-invariant analytic families of subspaces as there are $A(z_0)$-invariant subspaces.

Proof. We assume that $A(z)$ is represented as a matrix-valued function with respect to some basis in $\mathbb{C}^n$ that is independent of $z$ on $\Omega$. Fix a $z_0$ in $\Omega$. Let $\lambda_1, \ldots, \lambda_p$ be all the distinct eigenvalues of $A(z_0)$, and let $\Gamma_i$ be a circle around $\lambda_i$ chosen so small that $\Gamma_i \cap \Gamma_j = \emptyset$ for $i \ne j$. As the proof of Theorem 16.3.1 shows, there exists an $\varepsilon > 0$ with the property that if $B: \mathbb{C}^n \to \mathbb{C}^n$ is a transformation with the same Jordan structure as $A(z_0)$, and if $\|B - A(z_0)\| < \varepsilon$, then there is a unique eigenvalue $\mu_i(B)$ of $B$ in each circle $\Gamma_i$ ($1 \le i \le p$) and, moreover, the partial multiplicities of $\mu_i(B)$ (as an eigenvalue of $B$) coincide with the partial multiplicities of $\lambda_i$ (as an eigenvalue of $A(z_0)$). Hence, for every $z$ from some neighbourhood $U_1$ of $z_0$, there is a unique eigenvalue [denoted by $\mu_i(z)$] of $A(z)$ in the circle $\Gamma_i$ ($1 \le i \le p$), and the partial multiplicities of $\mu_i(z)$ coincide with those of $\lambda_i$. Obviously, $\mu_i(z_0) = \lambda_i$.

Let us prove that $\mu_i(z)$ is analytic on $U_1$. Indeed, denoting by $m_i$ the algebraic multiplicity of $\lambda_i$ [as an eigenvalue of $A(z_0)$], we have

$$\mu_i(z) = \frac{1}{m_i} \operatorname{tr}\Big[ \frac{1}{2\pi i} \int_{\Gamma_i} \lambda(\lambda I - A(z))^{-1}\, d\lambda \Big]$$

which is an analytic function of $z$ on $U_1$. We have proved that in a neighbourhood of each point $z_0 \in \Omega$ the distinct eigenvalues of $A(z)$ are analytic functions of $z$. It follows that the eigenvalues of $A(z)$ admit analytic continuation along any curve in $\Omega$. By the monodromy theorem [see, e.g., Rudin (1974); this is where the simple connectedness of $\Omega$ is used] the distinct eigenvalues $\mu_1(z), \ldots, \mu_p(z)$ of $A(z)$ are analytic functions on $\Omega$.

Now fix $z_0 \in \Omega$ and define the family of transformations $B(z): \mathbb{C}^n \to \mathbb{C}^n$, $z \in \Omega$, by the requirement that $B(z)x = [\mu_i(z_0) - \mu_i(z)]x$ for any $x$ belonging to the root subspace of $A(z)$ corresponding to the eigenvalue $\mu_i(z)$. It is easily seen that $B(z)$ is analytic on $\Omega$. Indeed, for every $z_1 \in \Omega$ let $\Gamma_1', \ldots, \Gamma_p'$ be circles around $\mu_1(z_1), \ldots, \mu_p(z_1)$, respectively, so small that $\mu_j(z_1)$ is the only eigenvalue of $A(z_1)$ inside or on the circle $\Gamma_j'$, for $j = 1, \ldots, p$. There is a neighbourhood $V$ of $z_1$ such that any $A(z)$ with $z \in V$ has the unique eigenvalue $\mu_j(z)$ inside the circle $\Gamma_j'$, $j = 1, \ldots, p$. Then

$$B(z) = \sum_{j=1}^{p} \frac{1}{2\pi i} \int_{\Gamma_j'} [\mu_j(z_0) - \mu_j(z)](\lambda I - A(z))^{-1}\, d\lambda, \qquad z \in V$$

which is analytic on $V$ in view of the analyticity of $A(z)$ and $\mu_j(z)$ for $j = 1, \ldots, p$. Put $\tilde{A}(z) = A(z) + B(z)$. Obviously, the set of $\tilde{A}(z)$-invariant subspaces coincides with the set of $A(z)$-invariant subspaces for all $z \in \Omega$, so it is sufficient to prove Theorem 18.8.1 for $\tilde{A}(z)$ instead of $A(z)$.
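To see the role of $B(z)$ in the simplest situation (our illustration, not taken from the text): if $A(z) = \operatorname{diag}(\mu_1(z), \mu_2(z))$ with $\mu_1(z) \ne \mu_2(z)$ on $\Omega$, then the root subspaces are $\operatorname{Span}\{e_1\}$ and $\operatorname{Span}\{e_2\}$, so

$$B(z) = \operatorname{diag}\big(\mu_1(z_0) - \mu_1(z),\ \mu_2(z_0) - \mu_2(z)\big), \qquad \tilde{A}(z) = A(z) + B(z) = \operatorname{diag}\big(\mu_1(z_0),\ \mu_2(z_0)\big)$$

and $\tilde{A}(z)$ has constant eigenvalues, while $\operatorname{Inv}(\tilde{A}(z)) = \operatorname{Inv}(A(z))$ for every $z \in \Omega$.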
From the definition of $\tilde{A}(z)$ it is clear that the eigenvalues of $\tilde{A}(z)$ are $\mu_1(z_0), \ldots, \mu_p(z_0)$, that is, they do not depend on $z$, and, moreover, the partial multiplicities of $\mu_j(z_0)$ as eigenvalues of $\tilde{A}(z)$ do not depend on $z$ either. In other words, in Theorem 18.8.1 we may assume that $A(z)$ is similar to $A(z_0)$ for all $z \in \Omega$.

For $j = 1, \ldots, p$, let $m_j$ be the maximal partial multiplicity of $\mu_j(z_0)$ as an eigenvalue of $A(z_0)$ [and hence as an eigenvalue of $A(z)$ for all $z$ in $\Omega$]. Note that since $A(z)$ is similar to $A(z_0)$ for all $z \in \Omega$, by Proposition 18.1.2 there is an analytic basis in $\operatorname{Ker}(A(z) - \mu_j(z_0)I)^m$ for $m = 0, 1, 2, \ldots$ (i.e., for each fixed $j$ and $m$). By Theorem 18.3.3 there exists a basis $x_1^{(j)}(z), \ldots, x_{q_j}^{(j)}(z)$ in $\operatorname{Ker}(A(z) - \mu_j(z_0)I)^{m_j}$ modulo $\operatorname{Ker}(A(z) - \mu_j(z_0)I)^{m_j - 1}$ that is analytic on $\Omega$. It is easily seen that the vectors

$$(A(z) - \mu_j(z_0)I)x_r^{(j)}(z), \qquad r = 1, \ldots, q_j$$

are linearly independent modulo $\operatorname{Ker}(A(z) - \mu_j(z_0)I)^{m_j - 2}$ for all $z \in \Omega$, and belong to $\operatorname{Ker}(A(z) - \mu_j(z_0)I)^{m_j - 1}$. Hence, by Theorem 18.3.3 again, there is a basis $y_1^{(j)}(z), \ldots, y_{t_j}^{(j)}(z)$ in $\operatorname{Ker}(A(z) - \mu_j(z_0)I)^{m_j - 1}$ modulo

$$\operatorname{Ker}(A(z) - \mu_j(z_0)I)^{m_j - 2} + \operatorname{Span}\{(A(z) - \mu_j(z_0)I)x_r^{(j)}(z),\ r = 1, \ldots, q_j\}$$

which is analytic on $\Omega$. Next we find an analytic basis $u_1^{(j)}(z), \ldots, u_{s_j}^{(j)}(z)$ in $\operatorname{Ker}(A(z) - \mu_j(z_0)I)^{m_j - 2}$ modulo

$$\operatorname{Ker}(A(z) - \mu_j(z_0)I)^{m_j - 3} + \operatorname{Span}\{(A(z) - \mu_j(z_0)I)^2 x_r^{(j)}(z),\ r = 1, \ldots, q_j;\ (A(z) - \mu_j(z_0)I)y_s^{(j)}(z),\ s = 1, \ldots, t_j\}$$

and so on. Now define the $n \times n$ matrix $T(z)$ formed by the columns

$$(A(z) - \mu_j(z_0)I)^{m_j - 1}x_1^{(j)}(z), \ldots, (A(z) - \mu_j(z_0)I)x_1^{(j)}(z),\ x_1^{(j)}(z);\ \ldots;\ (A(z) - \mu_j(z_0)I)^{m_j - 1}x_{q_j}^{(j)}(z), \ldots, x_{q_j}^{(j)}(z);$$
$$(A(z) - \mu_j(z_0)I)^{m_j - 2}y_1^{(j)}(z), \ldots, y_1^{(j)}(z);\ \ldots;\ (A(z) - \mu_j(z_0)I)^{m_j - 2}y_{t_j}^{(j)}(z), \ldots, y_{t_j}^{(j)}(z);\ \ldots \qquad (18.8.1)$$

where $j = 1, \ldots, p$. As the proof of the Jordan form of a matrix shows (see Section 2.3), the columns of $T(z)$ form a Jordan basis of $A(z)$. In particular,
$T(z)$ is invertible for all $z \in \Omega$. Clearly, $T(z)$ is analytic on $\Omega$. As $T(z)^{-1}A(z)T(z)$ is a constant matrix (i.e., independent of $z$) and is in Jordan form, the assertion of Theorem 18.8.1 follows. □

In the course of the proof of Theorem 18.8.1 we have also proved the following result on analytic families of similar transformations.

Corollary 18.8.2
Let $A(z): \mathbb{C}^n \to \mathbb{C}^n$ be an analytic family of transformations on $\Omega$, where $\Omega$ is a simply connected domain. Assume that, for a fixed point $z_0 \in \Omega$, $A(z)$ is similar to $A(z_0)$ for all $z \in \Omega$. Then there exists an invertible transformation $T(z): \mathbb{C}^n \to \mathbb{C}^n$ that is analytic on $\Omega$ and such that $T(z_0) = I$ and

$$T(z)^{-1}A(z)T(z) = A(z_0) \quad \text{for all } z \in \Omega$$

The assumption that $\Omega$ is simply connected in Theorem 18.8.1 is necessary, as the next example shows.

EXAMPLE 18.8.1. Let $\Omega = \mathbb{C} \smallsetminus \{0\}$, and let

$$A(z) = \begin{bmatrix} 0 & 1 \\ z & 0 \end{bmatrix}$$

Clearly, $A(z)$ has fixed Jordan structure on $\Omega$ (the eigenvalues being the two square roots of $z$). The nontrivial $A(z)$-invariant subspaces are

$$\operatorname{Span}\Big\{ \begin{bmatrix} 1 \\ u_1 \end{bmatrix} \Big\} \quad \text{and} \quad \operatorname{Span}\Big\{ \begin{bmatrix} 1 \\ u_2 \end{bmatrix} \Big\}$$

where $u_1$ and $u_2$ are the square roots of $z$. Clearly, there is no (single-valued) invertible $2 \times 2$ matrix function $S(z)$ that is analytic on $\mathbb{C} \smallsetminus \{0\}$ and satisfies the conditions of Theorem 18.8.1. □

Note that in the proof of Theorem 18.8.1 the existence of an analytic Jordan basis [formula (18.8.1)] of $A(z)$ also follows from a general result on analytic perturbations of matrices (see Section 19.2).

18.9 ANALYTIC DEPENDENCE ON A REAL VARIABLE

The results presented in Sections 18.1-18.8 include the case when the families of transformations $\mathbb{C}^n \to \mathbb{C}^m$ and subspaces of $\mathbb{C}^n$ are analytic in a real variable on an open interval $(a, b)$ of the real axis. The definition of analyticity is analogous to that in the complex case: representation as a power series (this time with real coefficients) in a real neighbourhood of each point $t_0 \in (a, b)$. As the radius of convergence of this power series is positive, it converges also in some complex neighbourhood of $t_0$. Con-
600 Analytic Families of Subspaces sequently, a family of transformations from <p" into <f"" (or of subspaces of <jT") that is analytic on (a, b) can be extended to a family of linear transformations (or subspaces) that is analytic in some complex neighbourhood ft of (a, b), and the results presented in Sections 18.1-18.8 do apply. It is noteworthy that, in contrast to the complex variable case, the orthogonal complement preserves analyticity, as follows. Theorem 18.9.1 Let M(t) be a family of subspaces (of$a) that is analytic in the real variable t on (a, b). Then the orthogonal complement M(t)± is an analytic family of subspaces on (a, b) as well. Proof. Let t0 E (a, b). Then in some real neighbourhood U{ of t0 there exists an analytic family of invertible transformations A(t): <p"—»<p" such that M(t) = A(t)M, (£(/, for a fixed subspace M C <f"\ Assume (without loss of generality) that M — Span{e,,. . . , ep} for some p, and write A(t) as the n x n matrix with entries that are analytic on (a, b) with respect to the standard basis in <p". Then M(t) = Im B{t) for (£(/,, where B(t) is formed by the first p columns of A(t). As A{t) is invertible, the columns of B(t) are linearly independent. For notational simplicity, assume that the top p rows of B(t0) are linearly independent and hence form a nonsingular pxp matrix. Then there is a real neighbourhood U2 C Ul of t0 such that the top rows of B(t) form a nonsingular pxp matrix C(t) as well. So for te(/2,we obtain M(t) = Imf 2^1 = Im[ I v ,1 w lD(t)\ LD(0C(0 J where D{i) is the (n - p)x p matrix formed by the bottom n — p rows of B(t). Denoting X(t) = D(t)C(t)~\ consider the p x p matrix function 5(0 = (/ + X(t)*X(t))~i for t e U2. Note that / + X(t)*X(t) is positive definite and thus invertible. Clearly, 5(f) is positive definite and analytic on U2. Let T be a contour that lies in the open right half plane, is symmetrical with respect to the real axis, and contains all the eigenvalues of 5(f0) in its interior. Then all eigenvalues of 5(f), where f is taken from some neighbourhood U3 C U2 of f0, will also be in the interior of I\ For such a t the integral Z(0=2^/rA1/2(A/-5(f))-'dA where A1'2 is the analytic branch of the square root that takes positive values for A positive, is well defined and Z(t)2 = 5(f) (see Section 2.10). Moreover, because of the symmetry of T, the matrix Z{t) is positive definite for all t G U3. Also, Z(t) is an analytic family of matrices on U3. Now one sees easily that, for f E U3
Exercises 601 -.if z(t) z(t)x(ty ] u Ix(t)zu) x(t)z(t)xu)* J x(t)Z(t) x(t)Z(t)x(ty is the orthogonal projector on M(t). Indeed, a straightforward computation verifies that P(tf = P(t) = P{t)*. So P(t) is an orthogonal projector. Furthermore, it is clear that ImP(0DIm[^0](=^(0) (18.9.1) and since rank P(t) is easily seen to be p, equality (rather than inclusion) holds in (18.9.1). Consequently, M{tY is the image of the analytic family of projectors l-P{t), and thus M(t)L is analytic on U3. As t0E(a, b) was arbitrary, the analyticity of M(t)L on (a, b) follows. □ One can also consider families of real transformations from ft" into %m, as well as families of subspaces in the real vector space ft", which are analytic in a real variable t on (a,b). For such families of real linear transformations and subspaces the results of Sections 18.1-18.8 hold also. However, in Theorem 18.7.2 the contour T should be symmetrical with respect to the real axis; and in the definition of fixed Jordan structure one has to require, in addition, that the enumeration A,(2 J,. . . , Am(z,) and A,(z2),. . . , Am(z2) of distinct eigenvalues of A(zx) and A(z2), respectively, is such that A,(z,) = A;(2,) holds if and only if A,(z2) = A;(z2). 18.10 EXERCISES 18.1 Let r 1 z U2 1 l 2z 4z2\ A(z)= z 2z :<p2-><p3, 2£<p be an analytic family of transformations written as a matrix in the standard orthonormal bases in <p2 and <p3. (a) Are Im A(z) and Ker A(z) analytic families of subspaces? (b) Find an analytic vector function y(z) such that y(z) ^ 0 for all z e <p and Span{y(z)} = Ker A(z) for all z E <p with the exception of a discrete set. (c) Find linearly independent and analytic (in <p) vector functions yi(z)» yi(z)sucn that sPan(.yi(z)> y2(2)}= Im Mz) f°r a11 * ^ <p with the exception of a discrete set. [Hint: Use the Smith form for the matrix polynomial v4(z).]
602 Analytic Families of Subspaces 18.2 Solve Exercise 18.1 for A{z) = z + 1 z Z 2-1 1 Z 18.3 Let P(z) be an analytic family of projectors. Show that Im P(z) is an analytic family of subspaces. 18.4 Let A(z) = diag[A,(z), A2(z),..., Ak(z)] where for/ = 1,. . . , k, A^z) is an analytic family of transformations on a domain ft. Prove that the following statements are equivalent: (a) Im A{z) and Ker A(z) are analytic families of subspaces. (b) Im Aj(z) is an analytic family of subspaces, for j = 1,. . . , k. (c) Ker Aj(z) is an analytic family of subspaces, for j = 1,. . . , k. 18.5 Let A(z): <p"—»<p" be an analytic family of transformations on ft such that A(z)2 = I for all z £ ft. Prove that the families of subspaces Im(/i(z) - /) and Im(A(z) + /) are analytic on ft. 18.6 Let A(z) be an analytic family of transformations on ft such that p(A(z)) = 0 for all z£ft, where p(A) is a scalar polynomial of degree m with distinct zeros A,,. . . , Am. Prove that the families of subspaces Ker( A;/ - A(z)), j = l,...,m are analytic on ft. 18.7 Does the result of Exercise 18.6 hold if p( A) has less than m distinct zeros? 18.8 Given matrices A and B of sizes n x n and nx m, respectively, show that Ker[A/ + A, B] is an analytic family of subspaces if and only if (A, B) is a full-range pair. 18.9 Given matrices C and A of sizes p x n and nx n, respectively, show that Im is an analytic family of subspaces if and only if (C, A) is a null kernel pair. 18.10 Given an analytic n x n matrix function A(z) on ft that is upper triangular for all zEil, when is Ker A(z) analytic on ft? 18.11 For the following analytic vector functions *,(z), x2(z), where z E £, find analytic vector functions y,(z), y2(z) of z £ <p such that y,(z) and y2(z) are linearly independent for every z £ <p and Span{x,(z), x2(z)} = Spanfy^z), y2(z)} for every z £ <p except for a discrete set: (a) ^(z) = <z2,l-z,0), *2(z) = <z3,l-z2,z2-2> (b) X,f»=<l,-2,2>, *2(Z) = <1,Z2,Z2 + Z> [Hint: Use the Smith form for the matrix polynomial [*,(z), *2(2)]-]
Exercises 603 12 Let jc,(2), . . . , xk(z) be n-dimensional vector polynomials such that, for at least one value z0£ (p, the vectors xx(z0),. . ., xk(zQ) are linearly independent. Prove that one can construct n-dimensional vector polynomials y,(2),.. . , yk(z) such that yx(z),. . ., yk(z) are linearly independent for all 2 E <p and Spanfy^z),. . . , yk(z)} = Span{x,(2),. . . , xk(z)} for all 2 £ <p with the possible exception of a finite set, as follows. Let [Xl(z),...,xk(z)] = E(z)D(z)F(z) be the Smith form of the n x k matrix [^,(2),. . . , xk(z)\; then put [y,(z), . . . , yk(z)] = E(z)F(z) 13 Complete the following linearly independent analytic families of vectors in <p (depending on the complex variable 2 E <p) to analytic families of vectors that form a basis in <p4 for every 2 E <p: (a) xl(z) = (l,z,z\z3); x2(2)=(l,22,422,l> (b) x,(2) = <l,z,0,l>; *2(2) =(-1,0,2-1,2); x3(z) = (-1,0,0, 2 + 1) 14 For the following analytic families M(z) of subspaces in <p" that depend on 2 E <p, find two subspaces ^T, and Jf2 such that for every 2 E <p at least one of M(z) + JV, = £" or ^(2) + ^V2 = <f" holds: (a) ^(2) = Im 1 2 22 L23 1 " 22 422 1 . (b) M(z) = Im 1 2 0 1 -1 0 2-1 2 -1 ■ 0 0 2 + 1- 15 For each n > 2 give an example of an analytic family of transformations A(z): (p"-*^" defined on ft that has no nontrivial A(z)- invariant analytic family of subspaces on ft. 16 Let A(z) be an analytic family of transformations defined on ft such that p(A(z)) = 0 for all zEil, where p( A) is a scalar polynomial of degree m with m distinct zeros. Prove that there are at least 2m j4(2)-invariant analytic families of subspaces on ft.
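Several of the exercises above (18.1, 18.11, 18.12) ask for polynomial vector functions spanning $\operatorname{Ker} A(z)$ or $\operatorname{Im} A(z)$ off a discrete set. A minimal symbolic sketch of such a computation (the $2 \times 3$ matrix below is a sample chosen here for illustration, and a nullspace computation stands in for the Smith form of the hints):

import sympy as sp

z = sp.symbols('z')
B = sp.Matrix([[1, z, z**2],
               [1, 2*z, 4*z**2]])   # sample polynomial matrix, illustrative only

y = B.nullspace()[0]                 # one vector spanning Ker B(z) for generic z
print(sp.simplify(y.T))              # Matrix([[2*z**2, -3*z, 1]])
print(sp.simplify(B * y))            # zero vector: y(z) lies in Ker B(z) for all z

The resulting vector $y(z) = \langle 2z^2, -3z, 1 \rangle$ is polynomial and nonzero for every $z$, and spans $\operatorname{Ker} B(z)$ for all $z$ outside the discrete set where the rank of $B(z)$ drops.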
Chapter Nineteen Jordan Form of Analytic Matrix Functions In this chapter we study the behaviour of eigenvalues and eigenvectors of a transformation that depends analytically on a parameter in both the local and global frameworks. It turns out that this behaviour is analytic except for isolated singularities that are described in detail. The results obtained allow us to solve (at least partially) the problem of analytic extendability of an invariant subspace. In turn, the solution of this problem is used in Chapter 20 for the solution of various problems concerning divisors of monic matrix polynomials, minimal factorization of rational matrix functions, and solutions of matrix quadratic equations, all of which involve analytic dependence on a parameter. Clearly, the material of this chapter relies on more advanced complex analysis than does that of the preceding chapters. However, this is not a prerequisite for understanding the main results. 19.1 LOCAL BEHAVIOUR OF EIGENVALUES AND EIGENVECTORS Let A(z): §"-* <p" be a family of transformations that is analytic on a domain ft. In this section we study the behaviour of eigenvalues and eigenvectors as functions of z in a neighbourhood of a fixed point z0 G ft. First let us state the main result in this direction. Theorem 19.1.1 Let p.x,. . . , fik be all the distinct eigenvalues of A(z0), that is, the distinct zeros of the equation det( fil- A(z0)) -0, where k < n, and let r,(/ = l,...,k) be the multiplicity of ^ as a zero of det(/i/- A(z0)) = 0 (so r, + • • • + rk = n). Then there is a neighbourhood °U of z0 in ft with the following properties: (a) there exist positive integers mn,. . . , mu ; 604
$m_{21}, \ldots, m_{2s_2}; \ldots; m_{k1}, \ldots, m_{ks_k}$ such that the $n$ eigenvalues (not necessarily distinct) of $A(z)$, for $z \in \mathcal{U} \smallsetminus \{z_0\}$, are given by the fractional power series

$$\mu_{ij\sigma}(z) = \mu_i + \sum_{\alpha=1}^{\infty} a_{\alpha ij}\big[(z - z_0)^{1/m_{ij}}\big]_{\sigma}^{\alpha}; \qquad \sigma = 1, \ldots, m_{ij};\ j = 1, \ldots, s_i;\ i = 1, \ldots, k \qquad (19.1.1)$$

where $a_{\alpha ij} \in \mathbb{C}$ and $[(z - z_0)^{1/m_{ij}}]_{\sigma}$, $\sigma = 1, \ldots, m_{ij}$, are the branches of the root $(z - z_0)^{1/m_{ij}}$;

(b) the dimension $\gamma_{ij}$ of $\operatorname{Ker}(A(z) - \mu_{ij\sigma}(z)I)$, as well as the partial multiplicities $m_{ij}^{(1)} \ge \cdots \ge m_{ij}^{(\gamma_{ij})}$ $(> 0)$ of the eigenvalue $\mu_{ij\sigma}(z)$ of $A(z)$, do not depend on $z$ (for $z \in \mathcal{U} \smallsetminus \{z_0\}$) and do not depend on $\sigma$;

(c) for each $i = 1, \ldots, k$ and $j = 1, \ldots, s_i$ there exist vector-valued fractional power series converging for $z \in \mathcal{U}$:

$$x_{ij\sigma}^{(\gamma\beta)}(z) = \sum_{l=0}^{\infty} x_{ij}^{(\gamma\beta)l}\big[(z - z_0)^{1/m_{ij}}\big]_{\sigma}^{l}; \qquad \beta = 1, \ldots, m_{ij}^{(\gamma)};\ \gamma = 1, \ldots, \gamma_{ij};\ \sigma = 1, \ldots, m_{ij} \qquad (19.1.2)$$

where $x_{ij}^{(\gamma\beta)l} \in \mathbb{C}^n$, such that for each $\gamma$ and each $z \in \mathcal{U} \smallsetminus \{z_0\}$ the vectors $x_{ij\sigma}^{(\gamma 1)}(z), \ldots, x_{ij\sigma}^{(\gamma m_{ij}^{(\gamma)})}(z)$ form a Jordan chain of $A(z)$ corresponding to $\mu_{ij\sigma}(z)$:

$$(A(z) - \mu_{ij\sigma}(z)I)x_{ij\sigma}^{(\gamma\beta)}(z) = x_{ij\sigma}^{(\gamma,\beta-1)}(z), \qquad \beta = 1, \ldots, m_{ij}^{(\gamma)};\ \gamma = 1, \ldots, \gamma_{ij};\ \sigma = 1, \ldots, m_{ij} \qquad (19.1.3)$$

where by definition $x_{ij\sigma}^{(\gamma 0)}(z) = 0$ and $x_{ij\sigma}^{(\gamma 1)}(z) \ne 0$. Moreover, for every $z \in \mathcal{U} \smallsetminus \{z_0\}$ the vectors

$$x_{ij\sigma}^{(\gamma\beta)}(z); \qquad \beta = 1, \ldots, m_{ij}^{(\gamma)};\ \gamma = 1, \ldots, \gamma_{ij};\ \sigma = 1, \ldots, m_{ij};\ j = 1, \ldots, s_i;\ i = 1, \ldots, k$$

form a basis in $\mathbb{C}^n$.

The full proof of Theorem 19.1.1 is too long to be presented here. We refer the reader to the book of Baumgärtel (1985), and especially Section IX.3 there, for a complete proof.
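As a quick worked instance of (19.1.1), consider again the family $A(z) = \begin{bmatrix} 0 & 1 \\ z & 0 \end{bmatrix}$ of Example 18.7.1 with $z_0 = 0$:

$$\det(\mu I - A(z)) = \mu^2 - z, \qquad \mu_{11\sigma}(z) = \big[z^{1/2}\big]_{\sigma}, \quad \sigma = 1, 2$$

so $k = 1$, $s_1 = 1$, $m_{11} = 2$: the two eigenvalues are the two branches of a single fractional power series (with $a_{1,1,1} = 1$ and all other coefficients zero), and the corresponding eigenvectors $\langle 1, [z^{1/2}]_{\sigma} \rangle$ are fractional power series of the form (19.1.2).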
Let us make some remarks concerning this important theorem. First, in the expansion (19.1.1), if $m_{ij} > 1$, then the greatest common divisor of all positive integers $\alpha$ such that $a_{\alpha ij} \ne 0$ is 1 [so the functions $\mu_{ij\sigma}(z)$, $\sigma = 1, \ldots, m_{ij}$, have a branch point at $z_0$ of branch multiplicity $m_{ij}$ and not less]. If $m_{ij} = 1$, then the $\mu_{ij\sigma}(z)$ are analytic in a neighbourhood of $z_0$; it may even happen that $\mu_{ij\sigma}(z)$ is the constant function $\mu_i$ (see Example 19.1.2).

Second, the theorem does not say anything explicit about the partial multiplicities $p_{i1} \ge \cdots \ge p_{it_i}$ of the eigenvalue $\mu_i$ of $A(z_0)$ (we know only that $\sum_{j=1}^{t_i} p_{ij} = r_i$ for $i = 1, \ldots, k$). However, there is a connection between the partial multiplicities $m_{ij}^{(1)} \ge \cdots \ge m_{ij}^{(\gamma_{ij})}$ of the eigenvalues $\mu_{ij\sigma}(z)$ of $A(z)$ ($z \in \mathcal{U} \smallsetminus \{z_0\}$) and the partial multiplicities of the eigenvalue $\mu_i$ of $A(z_0)$. This connection is given by the following formula (see Theorem 15.10.2):

$$\sum_{k' \ge l} p_{ik'} \ge \sum_{j=1}^{s_i} m_{ij} \sum_{q \ge l} m_{ij}^{(q)}, \qquad l = 1, 2, \ldots$$

where $p_{ik'}$ is interpreted as zero for $k' > t_i$, and similarly for $m_{ij}^{(q)}$ when $q > \gamma_{ij}$. As the total sum of partial multiplicities of eigenvalues near $\mu_i$ does not change after a small perturbation of the transformation, we also have the equality

$$r_i = \sum_{j=1}^{s_i} m_{ij} \sum_{q=1}^{\gamma_{ij}} m_{ij}^{(q)}$$

Let us illustrate Theorem 19.1.1 with an example.

EXAMPLE 19.1.1. Let $z_0 = 0$ and

$$A(z) = \begin{bmatrix} 0 & 1 & 1 & 0 \\ z & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & z & 0 \end{bmatrix}$$

The only eigenvalue of $A(0)$ is zero, with partial multiplicities 3 and 1. [The easiest way to find the partial multiplicities of $A(0)$ is to observe that $\operatorname{rank} A(0) = 2$ and $A(0)^2 \ne 0$.] To find the eigenvalues of $A(z)$, we have to solve the equation $\det(\mu I - A(z)) = 0$, which gives (in the notation of Theorem 19.1.1)

$$\mu_{11\sigma}(z) = \big[z^{1/2}\big]_{\sigma}, \qquad j = 1;\ i = 1;\ \sigma = 1, 2$$

(so we have $k = 1$, $s_1 = 1$, $m_{11} = 2$). It is not difficult to see that the only partial multiplicity of $\mu_{11\sigma}(z)$ is $m_{11}^{(1)} = 2$. The Jordan chain of $A(z)$ corresponding to $\mu_{11\sigma}(z)$ is

$$x_{\sigma}^{(1)}(z) = \langle 1, [z^{1/2}]_{\sigma}, 0, 0 \rangle; \qquad x_{\sigma}^{(2)}(z) = \langle 0, 0, 1, [z^{1/2}]_{\sigma} \rangle \qquad \square$$
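A quick numerical look at Example 19.1.1 (for illustration only) confirms the square-root behaviour of the eigenvalues and the chain relations (19.1.3):

import numpy as np

def A(z):
    return np.array([[0, 1, 1, 0],
                     [z, 0, 0, 1],
                     [0, 0, 0, 1],
                     [0, 0, z, 0]], dtype=complex)

for z in [1e-2, 1e-4]:
    print(z, np.sort_complex(np.linalg.eigvals(A(z))))
    # two copies each of +sqrt(z) and -sqrt(z)

# chain check at z = 0.25 with the branch u = +sqrt(z):
z = 0.25; u = np.sqrt(z)
x1 = np.array([1, u, 0, 0]); x2 = np.array([0, 0, 1, u])
print(np.linalg.norm((A(z) - u*np.eye(4)) @ x1))        # ~ 0: x1 is an eigenvector
print(np.linalg.norm((A(z) - u*np.eye(4)) @ x2 - x1))   # ~ 0: (A - uI)x2 = x1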
An important particular case of Theorem 19.1.1 appears when the eigenvalues of $A(z)$ are analytic in a neighbourhood of $z_0$, that is, when all the integers $m_{ij}$ are equal to 1, as follows.

Corollary 19.1.2
Assume that all the eigenvalues of $A(z)$ are analytic in a neighbourhood of $z_0$. Then the distinct eigenvalues $\mu_1(z), \ldots, \mu_k(z)$ of $A(z)$, $z \ne z_0$, can be enumerated so that $\mu_i(z)$ is an analytic function in a neighbourhood $\mathcal{U}_1$ of $z_0$. Further, assuming that the enumeration of the distinct eigenvalues of $A(z)$ for $z \ne z_0$ is as above, there exist analytic $n$-dimensional vector functions

$$y_{ij}^{(\gamma)}(z); \qquad i = 1, \ldots, k;\ j = 1, \ldots, s_i;\ \gamma = 1, \ldots, \gamma_{ij} \qquad (19.1.4)$$

in a neighbourhood $\mathcal{U}_2 \subset \mathcal{U}_1$ of $z_0$ with the following properties: (a) for every $z \in \mathcal{U}_2 \smallsetminus \{z_0\}$, and for $i = 1, \ldots, k$; $j = 1, \ldots, s_i$, the vectors $y_{ij}^{(1)}(z), \ldots, y_{ij}^{(\gamma_{ij})}(z)$ form a Jordan chain of $A(z)$ corresponding to the eigenvalue $\mu_i(z)$; (b) for every $z \in \mathcal{U}_2 \smallsetminus \{z_0\}$ the vectors (19.1.4) form a basis in $\mathbb{C}^n$.

The following example illustrates this corollary.

EXAMPLE 19.1.2. Let

$$A(z) = \begin{bmatrix} 0 & z \\ 0 & 0 \end{bmatrix}, \qquad z \in \mathbb{C}$$

Obviously, the eigenvalues of $A(z)$ are analytic (even constant). It is easy to find analytic vector functions $y_{ij}^{(\gamma)}(z)$ as in Corollary 19.1.2: we have $k = 1$, $s_1 = 1$, $\gamma_{11} = 2$, and

$$y_{11}^{(1)}(z) = \begin{bmatrix} z \\ 0 \end{bmatrix}, \qquad y_{11}^{(2)}(z) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Note that $y_{11}^{(1)}(z)$, $y_{11}^{(2)}(z)$ do not form a basis in $\mathbb{C}^2$ for $z = 0$; also, $y_{11}^{(1)}(0)$, $y_{11}^{(2)}(0)$ do not form a Jordan chain of $A(0)$. This shows that in (a) and (b) of Corollary 19.1.2 one cannot, in general, replace $\mathcal{U}_2 \smallsetminus \{z_0\}$ by $\mathcal{U}_2$. □

19.2 GLOBAL BEHAVIOUR OF EIGENVALUES AND EIGENVECTORS

The result of Theorem 19.1.1 allows us to derive some global properties of the eigenvalues and eigenvectors of an analytic family of transformations $A(z): \mathbb{C}^n \to \mathbb{C}^n$ defined on $\Omega$. As before, $\Omega$ is a domain in the complex plane.
For a transformation $X: \mathbb{C}^n \to \mathbb{C}^n$ we denote by $\nu(X)$ the number of distinct eigenvalues of $X$. Obviously, $1 \le \nu(X) \le n$.

Theorem 19.2.1
Let $A(z): \mathbb{C}^n \to \mathbb{C}^n$ be an analytic family of transformations on $\Omega$. Then for all $z \in \Omega$ except a discrete set $S_0$ we have

$$\nu(A(z)) = \max_{w \in \Omega} \nu(A(w))$$

while for $z_0 \in S_0$ we have

$$\nu(A(z_0)) < \max_{w \in \Omega} \nu(A(w))$$

Proof. Theorem 19.1.1 shows that for every $z_0 \in \Omega$ there is a neighbourhood $\mathcal{U}_{z_0}$ of $z_0$ such that $\nu(A(z))$ is constant (equal to $\nu_0$, say) for $z \in \mathcal{U}_{z_0} \smallsetminus \{z_0\}$ and $\nu(A(z_0)) \le \nu_0$. A priori, it appears that $\nu_0$ may depend on $z_0$. Let us show that actually $\nu_0$ is independent of $z_0$. For $\nu = 1, \ldots, n$, let $\Upsilon_\nu = \bigcup \mathcal{U}_{z_0}$, where the union is taken over all $z_0 \in \Omega$ such that $\nu(A(z)) = \nu$ in a deleted neighbourhood $\mathcal{U}_{z_0} \smallsetminus \{z_0\}$ of $z_0$. Obviously, $\Upsilon_1, \ldots, \Upsilon_n$ are open sets whose union is $\Omega$, and it is easily seen that they are mutually disjoint. As $\Omega$ is connected, this can happen only if all the $\Upsilon_j$ are empty except for one, $\Upsilon_{\nu_0}$ say; therefore $\Upsilon_{\nu_0} = \Omega$. It is clear also that

$$\nu_0 = \max_{z \in \Omega} \nu(A(z))$$

Now if $\nu(A(z')) < \nu_0$ for some $z' \in \Omega$, then by Theorem 19.1.1 we have $\nu(A(z)) = \nu_0$ in a deleted neighbourhood of $z'$. This shows that the set $S_0$ of all $z \in \Omega$ for which $\nu(A(z)) < \nu_0$ is indeed discrete. □

The points of $S_0$ are called the multiple points of the analytic family of transformations $A(z)$, because at these points the eigenvalues of $A(z)$ attain higher multiplicity than "usual."

Another way to prove Theorem 19.2.1 is by examining a suitable resultant matrix. Let

$$\det(\mu I - A(z)) = \mu^n + \sum_{j=0}^{n-1} a_j(z)\mu^j$$

for some scalar functions $a_j(z)$ that are analytic on $\Omega$, and consider the $(2n-1) \times (2n-1)$ matrix $R(z)$, whose entries are analytic functions on $\Omega$:
$$R(z) = \begin{bmatrix}
a_0(z) & a_1(z) & \cdots & a_{n-1}(z) & 1 & & & \\
& a_0(z) & a_1(z) & \cdots & a_{n-1}(z) & 1 & & \\
& & \ddots & & & & \ddots & \\
& & & a_0(z) & a_1(z) & \cdots & a_{n-1}(z) & 1 \\
a_1(z) & 2a_2(z) & \cdots & (n-1)a_{n-1}(z) & n & & & \\
& a_1(z) & 2a_2(z) & \cdots & (n-1)a_{n-1}(z) & n & & \\
& & \ddots & & & & \ddots & \\
& & & a_1(z) & 2a_2(z) & \cdots & (n-1)a_{n-1}(z) & n
\end{bmatrix}$$

(the unmarked entries are zeros; there are $n - 1$ rows built from the coefficients of $\det(\mu I - A(z))$ and $n$ rows built from the coefficients of its derivative). This is the resultant matrix of the two scalar polynomials in $\mu$: $\det(\mu I - A(z))$ and $(d/d\mu)\det(\mu I - A(z))$. A well-known property of resultant matrices [see, e.g., Gohberg and Heinig (1975)] states that $2n - 1 - \operatorname{rank} R(z)$ is equal to the number of common zeros of these two polynomials in $\mu$ (counting multiplicities). In other words,

$$2n - 1 - \operatorname{rank} R(z) = n - \nu(A(z))$$

or

$$\operatorname{rank} R(z) = n - 1 + \nu(A(z)) \qquad (19.2.1)$$

Now let $k$ ($n \le k \le 2n - 1$) be the largest size of a square submatrix in $R(z)$ whose determinant is not identically zero. Denoting by $S_1(z), \ldots, S_t(z)$ all such submatrices in $R(z)$, we obviously have $\operatorname{rank} R(z) = k$ if at least one of $\det S_1(z), \ldots, \det S_t(z)$ is different from zero, and $\operatorname{rank} R(z) < k$ otherwise. Comparing with (19.2.1), we obtain: $\nu(A(z)) = k - n + 1$ if not all the numbers $\det S_1(z), \ldots, \det S_t(z)$ are zeros; $\nu(A(z)) < k - n + 1$ otherwise. Since the set of common zeros of $\det S_1(z), \ldots, \det S_t(z)$ is discrete, Theorem 19.2.1 follows. □
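The criterion (19.2.1) is convenient for symbolic computation; equivalently, $\nu(A(z)) = n - \deg \gcd(p, p')$ where $p(\mu) = \det(\mu I - A(z))$. A minimal sketch (illustration only, applied to the family of Example 19.1.1):

import sympy as sp

z, mu = sp.symbols('z mu')
A = sp.Matrix([[0, 1, 1, 0],
               [z, 0, 0, 1],
               [0, 0, 0, 1],
               [0, 0, z, 0]])
p = A.charpoly(mu).as_expr()
g = sp.gcd(p, sp.diff(p, mu))             # common factor of p and p'
nu = sp.degree(p, mu) - sp.degree(g, mu)  # = nu(A(z)) for generic z
print(sp.factor(p), nu)                   # (mu**2 - z)**2, 2 distinct eigenvalues

Here the generic count is $\nu(A(z)) = 2$, and the multiple points are where the gcd degenerates further, namely $z = 0$.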
Theorem 19.1.1 shows that the distinct eigenvalues $\mu_1(z), \ldots, \mu_\nu(z)$ of $A(z)$ (where $\nu = \max_{z \in \Omega} \nu(A(z))$) are analytic on $\Omega \smallsetminus S_0$, where $S_0$ is taken from Theorem 19.2.1, and have at most algebraic branch points in $S_0$. [Some of the functions $\mu_1(z), \ldots, \mu_\nu(z)$ may also be analytic at certain points of $S_0$.] Denote by $S_1$ the subset of $S_0$ consisting of all the points $z_0$ such that at least one of the functions $\mu_j(z)$, $j = 1, \ldots, \nu$, is not analytic at $z_0$. As a subset of a discrete set, $S_1$ is itself discrete. The set $S_1$ will be called the first exceptional set of the analytic family of linear transformations $A(z)$, $z \in \Omega$. It may happen that $S_1 \ne S_0$, as shown in the following example.

EXAMPLE 19.2.1. Let $\Omega = \mathbb{C}$ and

$$A(z) = \begin{bmatrix} 0 & 1 \\ z^2 & 0 \end{bmatrix}$$

The eigenvalues of $A(z)$ are $\pm z$, so in this case $S_0 = \{0\}$ but $S_1 = \emptyset$. □

Example 19.1.2 shows that, in general, for $z \in \Omega \smallsetminus S_1$ one cannot expect that there will be a Jordan basis of $A(z)$ that depends analytically on $z$. To achieve that we must exclude from consideration a second exceptional set, which is described now.

Theorem 19.2.2
Let $A(z): \mathbb{C}^n \to \mathbb{C}^n$ be an analytic family of transformations on $\Omega$ with the set $S_0$ of multiple points, and let $\mu_1(z), \ldots, \mu_\nu(z)$ be the distinct eigenvalues of $A(z)$, analytic on $\Omega \smallsetminus S_0$ and having at most branch points in $S_0$. Let $m_{j1}(z) \ge \cdots \ge m_{j\gamma}(z)$, $\gamma = \gamma(j, z)$, be the partial multiplicities of the eigenvalue $\mu_j(z)$ of $A(z)$ for $j = 1, \ldots, \nu$; $z \notin S_0$. Then there exists a discrete set $\tilde{S}_2$ in $\Omega$ such that $\tilde{S}_2 \subset \Omega \smallsetminus S_0$ and the number $\gamma(j, z)$ of partial multiplicities and the partial multiplicities $m_{jk}(z)$ themselves, $k = 1, \ldots, \gamma(j, z)$, do not depend on $z$ in $\Omega \smallsetminus (S_0 \cup \tilde{S}_2)$, for $j = 1, \ldots, \nu$.

Proof. The proof follows the pattern of the proof of Theorem 19.2.1. In view of Theorem 19.1.1, for every $z_0 \in \Omega$ there is a neighbourhood $\mathcal{U}_{z_0}$ of $z_0$ such that the number of distinct eigenvalues $\nu = \nu(z_0)$, as well as the number $\gamma_j = \gamma_j(z_0)$ of partial multiplicities and the partial multiplicities themselves $m_{j1} \ge \cdots \ge m_{j\gamma_j}$, $m_{jk} = m_{jk}(z_0)$, corresponding to the $j$th eigenvalue, are constant for $z \in \mathcal{U}_{z_0} \smallsetminus \{z_0\}$. It is assumed that the distinct eigenvalues of $A(z)$ for $z \in \mathcal{U}_{z_0} \smallsetminus \{z_0\}$ are enumerated so that they are analytic and $\gamma_1 \ge \cdots \ge \gamma_\nu$. Denote by $\Delta$ the (finite) set of all sequences of type

$$\delta = \{\nu;\ \gamma_1, \ldots, \gamma_\nu;\ m_{11}, \ldots, m_{1\gamma_1};\ \ldots;\ m_{\nu 1}, \ldots, m_{\nu\gamma_\nu}\} \qquad (19.2.2)$$

where $\nu, \gamma_j, m_{jk}$ are positive integers with the properties that $\nu \le n$; $\gamma_1 \ge \cdots \ge \gamma_\nu$; $m_{j1} \ge \cdots \ge m_{j\gamma_j}$, $j = 1, \ldots, \nu$; $\sum_{j,k} m_{jk} = n$. For any sequence $\delta \in \Delta$ as in (19.2.2) let $\Upsilon_\delta = \bigcup \mathcal{U}_{z_0}$, where the union is taken over all $z_0 \in \Omega$ such that $\nu = \nu(z_0)$; $\gamma_j = \gamma_j(z_0)$, $j = 1, \ldots, \nu$; $m_{jk} = m_{jk}(z_0)$, $k = 1, \ldots, \gamma_j$; $j = 1, \ldots, \nu$. Obviously, $\Upsilon_\delta$ is open and $\bigcup_{\delta \in \Delta} \Upsilon_\delta = \Omega$. Also, the sets $\Upsilon_\delta$, $\delta \in \Delta$, are mutually disjoint. As $\Omega$ is connected, this means that all the $\Upsilon_\delta$, except for one of them, are empty. So Theorem 19.2.2 follows. □

The set $S_2 = \tilde{S}_2 \cup (S_0 \smallsetminus S_1)$, where $\tilde{S}_2$ is taken from Theorem 19.2.2 and $S_0$ and $S_1$ are the set of multiple points and the first exceptional set of $A(z)$, respectively, is called the second exceptional set of $A(z)$. Note that $S_2 \cap S_1 = \emptyset$.
The second exceptional set is characterized by the following properties: the distinct analytic eigenvalues of $A(z)$ can be continued analytically into any point $z_0 \in S_2$, but for every $z_0 \in S_2$, either $\nu(A(z_0)) < \max_{z \in \Omega} \nu(A(z))$, or $\nu(A(z_0)) = \max_{z \in \Omega} \nu(A(z))$ and for at least one analytic eigenvalue $\mu_j(z)$ of $A(z)$ the partial multiplicities of $\mu_j(z_0)$ are different from the partial multiplicities of $\mu_j(z)$, $z \ne z_0$, in a neighbourhood of $z_0$.

EXAMPLE 19.2.2. Let

$$A(z) = \begin{bmatrix} p_1(z) & q_1(z) & & 0 \\ & p_2(z) & q_2(z) & \\ & & \ddots & q_{n-1}(z) \\ 0 & & & p_n(z) \end{bmatrix}, \qquad z \in \mathbb{C}$$

where the $p_i(z)$ and $q_i(z)$ are not identically zero polynomials such that

$$p_1(z) = \cdots = p_{k_1}(z);\quad p_{k_1+1}(z) = \cdots = p_{k_2}(z);\quad \ldots;\quad p_{k_{q-1}+1}(z) = \cdots = p_{k_q}(z)$$

for all $z \in \mathbb{C}$, where $1 \le k_1 < k_2 < \cdots < k_{q-1} < k_q = n$. We also assume that the polynomials $p_{k_1}(z), \ldots, p_{k_q}(z)$ are all different. We have the set of multiple points

$$S_0 = \{z \in \mathbb{C} \mid p_{k_i}(z) = p_{k_j}(z) \text{ for some } i \ne j\}$$

the first exceptional set $S_1$ is empty, and the second exceptional set $S_2$ is the union of $S_0$ and the set

$$\{z \in \mathbb{C} \smallsetminus S_0 \mid q_l(z) = 0 \text{ for some } l \text{ with } k_p + 1 \le l \le k_{p+1} - 1 \text{ and some } p\} \qquad \square$$

Now we state the result on the existence of an analytic Jordan basis for an analytic family of transformations.

Theorem 19.2.3
Let $A(z): \mathbb{C}^n \to \mathbb{C}^n$ be an analytic family of transformations on $\Omega$ with first exceptional set $S_1$ and second exceptional set $S_2$. Let $\mu_1(z), \ldots, \mu_\nu(z)$ be the distinct eigenvalues of $A(z)$ (apart from the multiple points), which are analytic on $\Omega \smallsetminus S_1$ and have at most algebraic branch points in $S_1$. Then there exist $n$-dimensional vector functions

$$x_{11}^{(j)}(z), \ldots, x_{1m_{j1}}^{(j)}(z),\ x_{21}^{(j)}(z), \ldots, x_{2m_{j2}}^{(j)}(z),\ \ldots,\ x_{\gamma_j 1}^{(j)}(z), \ldots, x_{\gamma_j m_{j\gamma_j}}^{(j)}(z) \qquad (19.2.3)$$

$j = 1, \ldots, \nu$, where $m_{j1} \ge \cdots \ge m_{j\gamma_j}$ are positive integers, with the following properties: (a) the functions (19.2.3) are analytic on $\Omega \smallsetminus S_1$ and have at most
612 Jordan Form of Analytic Matrix Functions algebraic branch points in S,; (b) for every z£(i^(S,US2) the vectors (19.2.3) form a basis in <p"; (c) for every z E ft ^ (5, U S2) the vectors x[[\z),...,x[%ik(z) form a Jordan chain of the transformation A(z) corresponding to the eigenvalue fi^z), for k = 1,. . . , yt; j = 1, . . . , v. It is easily seen that if /x;-(z) has an algebraic branch point at z0 £ 5U then all eigenvectors X\[\z),X2[\z),...,X^{z) of A(z) corresponding to ^(z) also have an algebraic branch point at z0. Indeed, let y(z) be some (say, the /cth) coordinate of x\[\z) that is not identically zero. The equality [A(z) - fij(z)]x[[)(z) = 0 for z in ft^S2 implies that , , ak{z)x\\\z) M>(2)= ky{^K (19.2.4) where ak(z) is the fcth row of A(z). If x\'?(z) were analytic at z0, then (19.2.4) implies that fij(z) is also analytic at z0, a contradiction. The proof of Theorem 19.2.3 is given in the next section. In the particular case when A(z) is diagonable (i.e., similar to a diagonal matrix) for every z0Sl U52, the conclusions of Theorem 19.2.3 can be strengthened, as follows. Theorem 19.2.4 Let A(z) be as in Theorem 19.2.3, and assume that A(z) is diagonable for all z0Sj U S2. Then there exist n-dimensional vector functions x['\z),...,x[;.\z), /=1,...,„ (19.2.5) with the following properties: (a) the functions (19.2.5) are analytic on ft"-- 5, and have at most algebraic branch points in 5,; (b) for every z E ft and every j = 1,. .. , v the vectors x\n(z),. . . , x^,;)(z) are linearly independent; (c) for every zeft"-(5,U52) the vectors x\'}(z),. . . ,x^z) form a basis in Keiifi^z)! - A(z)). In particular, the vectors (19.2.5) form a basis in <p" for every z eft -- (5, U S2). The strengthening of Theorem 19.2.3 arises in statement (b), where the linear independence is asserted for all z E ft and not only for z E ft"-- (Sl U S2) as asserted in Theorem 19.2.3. The proof of Theorem 19.2.4 is obtained in the course of the proof of Theorem 19.2.3. We illustrate Theorem 19.2.4 with a simple example.
EXAMPLE 19.2.3. Let

$$A(z) = \begin{bmatrix} 0 & z \\ 0 & z^2 \end{bmatrix}$$

Here $S_1 = \emptyset$; $S_2 = \{0\}$. The eigenvectors

$$x_1(z) = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad x_2(z) = \begin{bmatrix} 1 \\ z \end{bmatrix}$$

corresponding to the eigenvalues 0 and $z^2$ of $A(z)$, respectively, are analytic and nonzero for all $z \in \mathbb{C}$ (including the point $z = 0$), as ensured by Theorem 19.2.4. However, $x_1(z)$ and $x_2(z)$ are not linearly independent for $z = 0$. □

19.3 PROOF OF THEOREM 19.2.3

We need some preparation for the proof of Theorem 19.2.3. A family of transformations $B(z): \mathbb{C}^n \to \mathbb{C}^m$ is called branch analytic on $\Omega$ if $B(z)$ is analytic on $\Omega$ except for a discrete set of algebraic (as opposed to logarithmic) branch points. The same definition applies to $n$-dimensional vector functions as well. The singular set of a family of transformations $B(z): \mathbb{C}^n \to \mathbb{C}^m$ that is branch analytic on $\Omega$ is, by definition, the set of all $z_0 \in \Omega$ such that

$$\dim \operatorname{Im} B(z_0) < \max_{z \in \Omega} \dim \operatorname{Im} B(z)$$

It is easily seen that the singular set is discrete and coincides with the set of all $z_0 \in \Omega$ with

$$\dim \operatorname{Ker} B(z_0) > \min_{z \in \Omega} \dim \operatorname{Ker} B(z)$$

We use the notation $S(B)$ to designate the singular set of $B(z)$.

Lemma 19.3.1
Let $B(z): \mathbb{C}^n \to \mathbb{C}^m$ be a branch analytic family of transformations on $\Omega$. Then there exist $m$-dimensional branch analytic vector-valued functions $y_1(z), \ldots, y_r(z)$ on $\Omega$ and $n$-dimensional branch analytic vector-valued functions $x_1(z), \ldots, x_{n-r}(z)$ on $\Omega$ with the following properties: (a) each branch point of any function $y_j(z)$, $j = 1, \ldots, r$, or $x_k(z)$, $k = 1, \ldots, n-r$, is also a branch point of $B(z)$; (b) $y_1(z), \ldots, y_r(z)$ are linearly independent for every $z \in \Omega$; (c) $x_1(z), \ldots, x_{n-r}(z)$ are linearly independent for every $z \in \Omega$; (d)

$$\operatorname{Span}\{y_1(z), \ldots, y_r(z)\} = \operatorname{Im} B(z) \quad \text{and} \quad \operatorname{Span}\{x_1(z), \ldots, x_{n-r}(z)\} = \operatorname{Ker} B(z)$$

for every $z$ not belonging to $S(B)$.

The proof of this lemma can be obtained by repeating the proofs of Lemma 18.2.2 and Theorem 18.2.1 with the following modification: in place
614 Jordan Form of Analytic Matrix Functions of the Weierstrass and Mittag-Leffler theorems (Lemmas 18.2.3 and 18.2.4), one must use the branch analytic and branch meromorphic versions of these theorems. [In the context of Riemann surfaces, these versions can be found in Kra (1972).] Lemma 19.3.2 Let fi,(z): $"^>$m and B2(z): ^"-*^m be branch analytic families of transformations on ft, such that Ker B,(z)D Ker fi2(z) for every z G ft that does not belong to the union of the singular sets of Bx(z) and B2(z). Then there exist branch analytic n-dimensional vector functions xx(z),. . . , xs(z), z G ft with the following properties: (a) every branch point of any xf(z), j= I,. . . ,s is also a branch point of at least one of fi,(z) and B2(z); (b) *,(z)> • •■ > xs(z) are linearly independent for every z Gft; (c) for every zGft that does not belong to S{Bl)\JS{B2) the vectors x{(z),. . . , xs(z) form a basis in Ker B,(z) modulo Ker B2{z). An analogous result also holds in case Im B,(z) D Im B2(z), for all z G ft with the possible exception of singular points of B{(z) and B2(z). Proof. We regard fi,(z) and B2(z) as m x n matrix functions, with respect to fixed bases in <p" and <pm. By Lemma 19.3.1, find linearly independent branch analytic vector-valued functions ;y,(z), . . . , yu(z) on ft such that Span^Cz),. • • , yv(z)) = Ker B2(z) (19.3.1) for all zGft not belonging to the singular set of B2{z). Fix z0Gft, and choose xv+l,. . . , x„ in such a way that y,(z0),. . . , yv(z0), xu+l,... ,x„ form a basis in <p". Using the branch analytic version of Lemma 18.2.2 (cf. the paragraph following Lemma 19.3.1), find branch analytic vector functions y„+1(z),. . . , yn(z), zGil such that yx{z),..., yv(z), ^u + iC2)' • • • > yn(z) f°rm a basis in <p" for every z Gft. If necessary, replace Bt{z) by Bt(z)s\z), i = 1,2, where S(z) = [yx{z)■ • • yn(z)] is an invertible n x n matrix function, and we can assume that Bi(z) = [OmxvBi(z)], i = l,2 where Bt(z) are branch analytic m x (n - v) matrix functions, and Ker B2(z) = 0 for all z G ft with the possible exception of a discrete set of points. By Lemma 19.3.1 again, find branch analytic linearly independent
Proof of Theorem 19.2.3 615 £" "-valued functions i,(z),. . • , xs(z), z&Q, such that x,(z),. . . , xs(z) is a basis in Kerfl,(z) for all zEft except for the singular points of J3,(z). Then the vector functions satisfy the requirements of Lemma 19.3.2. □ Lemma 19.3.3 Let fl,(z) and B2(z) be as in Lemma 19.3.2, and let xx(z),. . . , x,(z) be branch analytic n-dimensional vector functions with the following properties: (a) every branch point of any x^z), j = 1,. . . , t is also a branch point of at least one of Bx{z) and B2(z); (b) there exists a discrete set T D S(Bl) U S(B2) such that *,(z), . . . , x,(z) belong to Ker B,(z) and are linearly independent modulo Ker B2(z) for every zEft^T. Then there exist branch analytic n-dimensional vector functions xl+l(z),. . . , xs(z) such that every point of any jc;(z), /' = t + 1,. . . , s is a branch point of at least one ofBt(z) and B2(z) and for every z E ft ^ T the set xx(z),. . . , x,(z), xt+l(z), . . . , xs(z) forms a basis in Ker B,(z) modulo Ker B2(z). The case t - 0 [when the set xx(z),. . . , xt(z) does not appear] is not excluded in Lemma 19.3.3. Proof. Arguing as in the proof of Lemma 19.3.2, we can assume that Kerfi2(z) = 0 for every Z0S(B2). Replacing T by TUS(B2), we can assume that S(B2) =0. Further, by the branch analytic version of Lemma 18.2.2, there exist branch analytic and linearly independent vector functions y^z),. . . , y,(z) with Spanfx^z), . . . , *,(*)} = Span{y,(.z),. . . , y,(z)} , z e Q - T There exist branch analytic vector functions y,+1(z),. . . , y„(z) such that yi(z),...,yn(z) form a basis in <f"" for every z E ft (cf. the proof of Lemma 19.3.2). By replacing B,(z) by B,(z)[y,(z)- • • y„(z)], we can assume that fi1(z) = [Onx,fi,(z)] and the proof is reduced to the case t = 0. But then Lemma 19.3.1 is applicable. □ We are ready now to prove Theorem 19.2.3. The main idea is to mimic the proof of the Jordan form for a transformation (Section 2.3) using Lemma 19.3.2 when necessary.
Proof of Theorem 19.2.3. For a fixed $j$ ($j = 1, \ldots, v$) let $m_{j1}$ be the maximal positive integer $p$ such that

$$\operatorname{Ker}(\mu_j(z)I - A(z))^p \neq \operatorname{Ker}(\mu_j(z)I - A(z))^{p-1}$$

for all $z \notin S_1 \cup S_2$. By Theorem 19.2.1 and the definition of $S_2$, the number $m_{j1}$ is well defined. By Lemma 19.3.2, there exist branch analytic vector functions $x^{(j)}_{1,m_{j1}}(z), \ldots, x^{(j)}_{k_1,m_{j1}}(z)$ on $\Omega$ that are linearly independent for every $z \in \Omega$, can have branch points only in $S_1$, and are such that

$$x^{(j)}_{1,m_{j1}}(z), \ldots, x^{(j)}_{k_1,m_{j1}}(z)$$

form a basis in $\operatorname{Ker}(\mu_j(z)I - A(z))^{m_{j1}}$ modulo $\operatorname{Ker}(\mu_j(z)I - A(z))^{m_{j1}-1}$ for every $z \in \Omega$ that does not belong to $S((\mu_j(z)I - A(z))^{m_{j1}}) \cup S((\mu_j(z)I - A(z))^{m_{j1}-1})$. As we have seen in the proof of the Jordan form, the vectors

$$x^{(j)}_{q,m_{j1}-1}(z) = (-\mu_j(z)I + A(z))\,x^{(j)}_{q,m_{j1}}(z), \qquad q = 1, \ldots, k_1$$

are linearly independent modulo $\operatorname{Ker}(\mu_j(z)I - A(z))^{m_{j1}-2}$ for every $z \notin S_1 \cup S_2$ (we assume here that $m_{j1} \geq 2$). By Lemma 19.3.3, there exist branch analytic vector functions on $\Omega$

$$x^{(j)}_{k_1+1,m_{j1}-1}(z), \ldots, x^{(j)}_{k_2,m_{j1}-1}(z)$$

with branch points only in $S_1$ and such that for every $z \notin S_1 \cup S_2$ the vectors

$$x^{(j)}_{1,m_{j1}-1}(z), \ldots, x^{(j)}_{k_1,m_{j1}-1}(z),\; x^{(j)}_{k_1+1,m_{j1}-1}(z), \ldots, x^{(j)}_{k_2,m_{j1}-1}(z)$$

form a basis in $\operatorname{Ker}(\mu_j(z)I - A(z))^{m_{j1}-1}$ modulo $\operatorname{Ker}(\mu_j(z)I - A(z))^{m_{j1}-2}$. Continuing this process as in the proof of the Jordan form, we obtain the vector functions (19.2.3) with the desired properties. □

19.4 ANALYTIC EXTENDABILITY OF INVARIANT SUBSPACES

In this section we study the following problem: given an analytic family of transformations $A(z)$ on $\Omega$ and an invariant subspace $\mathcal{M}_0$ of $A(z_0)$, when is there a family of subspaces $\mathcal{M}(z)$, analytic in some domain $\Omega' \subseteq \Omega$ with $z_0 \in \Omega'$, such that $\mathcal{M}(z_0) = \mathcal{M}_0$ and $\mathcal{M}(z)$ is $A(z)$ invariant for all $z \in \Omega'$? (As before, $\Omega$ is a domain in $\mathbb{C}$.) If this happens, we say that $\mathcal{M}_0$ is extendable to an analytic $A(z)$-invariant family of subspaces on $\Omega'$. The main result in this direction is given in the following theorem.
Theorem 19.4.1

Let $A(z)\colon \mathbb{C}^n \to \mathbb{C}^n$ be an analytic family of transformations on $\Omega$, with first and second exceptional sets $S_1$ and $S_2$, respectively. Then, provided $z_0 \in \Omega \setminus (S_2 \cup S_1)$, every $A(z_0)$-invariant subspace $\mathcal{M}_0$ is extendable to an analytic $A(z)$-invariant family of subspaces on $\Omega \setminus S_1$.

Proof. For $j = 1, \ldots, v$, let

$$x^{(j)}_{11}(z), \ldots, x^{(j)}_{1m_{j1}}(z),\; x^{(j)}_{21}(z), \ldots, x^{(j)}_{2m_{j2}}(z),\; \ldots,\; x^{(j)}_{k_j 1}(z), \ldots, x^{(j)}_{k_j m_{jk_j}}(z) \qquad (19.4.1)$$

be $n$-dimensional vector functions as in Theorem 19.2.3. We consider $A(z)$ and the vectors (19.4.1) as an $n \times n$ matrix function and $n$-dimensional vector functions, respectively, written in the standard orthonormal basis in $\mathbb{C}^n$. Let $z_0 \in \Omega \setminus (S_2 \cup S_1)$, and let $J$ be the Jordan form of $A(z_0)$:

$$J = \operatorname{diag}[J_1, \ldots, J_v], \qquad J_j = \operatorname{diag}[J_{m_{j1}}(\mu_j(z_0)), \ldots, J_{m_{jk_j}}(\mu_j(z_0))]$$

where $J_k(\mu)$ is the $k \times k$ Jordan block with eigenvalue $\mu$. For $z \in \Omega \setminus S_1$ let $T(z)$ be the $n \times n$ matrix whose columns are the vectors (19.4.1) (in this order). Observe that $T(z)$ is analytic on $\Omega \setminus S_1$ with algebraic branch points in $S_1$, and that $T(z)$ is invertible for $z \in \Omega \setminus (S_2 \cup S_1)$ [the function $T(z)$ is analytic but not necessarily invertible at points of $S_2$]. Then we have $A(z_0)T(z_0) = T(z_0)J$.

Given an $A(z_0)$-invariant subspace $\mathcal{M}_0$ and any $z \in \Omega \setminus (S_1 \cup S_2)$, define

$$\mathcal{M}(z) = T(z)T(z_0)^{-1}\mathcal{M}_0$$

Clearly, $\mathcal{M}(z)$ is analytic and $A(z)$ invariant for $z \in \Omega \setminus (S_1 \cup S_2)$, and also $\mathcal{M}(z_0) = \mathcal{M}_0$. We show that $\mathcal{M}(z)$ admits an analytic and $A(z)$-invariant continuation into the set $S_2$. Let $f_1, \ldots, f_k$ be a basis in $\mathcal{M}_0$; then the vectors

$$g_1(z) = T(z)T(z_0)^{-1}f_1, \; \ldots, \; g_k(z) = T(z)T(z_0)^{-1}f_k$$

form a basis in $\mathcal{M}(z)$ for every $z \in \Omega \setminus (S_1 \cup S_2)$. Note that $g_1(z), \ldots, g_k(z)$ are analytic in $\Omega \setminus S_1$. By Lemma 18.2.2 there exist $n$-dimensional vector functions $h_1(z), \ldots, h_k(z)$ that are analytic on $\Omega \setminus S_1$, linearly independent for every $z \in \Omega \setminus S_1$, and for which
$$\operatorname{Span}\{h_1(z), \ldots, h_k(z)\} = \operatorname{Span}\{g_1(z), \ldots, g_k(z)\}$$

whenever $z \notin S_1 \cup S_2$. Putting $\mathcal{M}(z) = \operatorname{Span}\{h_1(z), \ldots, h_k(z)\}$ for $z \in S_2$, we clearly obtain an analytic extension of the analytic family $\{\mathcal{M}(z)\}_{z \in \Omega \setminus (S_2 \cup S_1)}$ to the points of $S_2$. As for a fixed $z_0 \in S_2$ we have

$$\lim_{z \to z_0} A(z) = A(z_0), \qquad \lim_{z \to z_0} \theta(\mathcal{M}(z), \mathcal{M}(z_0)) = 0$$

it follows, in view of Theorem 13.4.2, that $\mathcal{M}(z_0)$ is $A(z_0)$ invariant. □

The proof of Theorem 19.4.1 shows that the analytic $A(z)$-invariant family of subspaces $\mathcal{M}(z)$ on $\Omega \setminus S_1$ with $\mathcal{M}(z_0) = \mathcal{M}_0$ has at most algebraic branch points in $S_1$, in the following sense. For every $z' \in S_1$, either $\mathcal{M}(z)$ can be analytically continued into $z'$ (i.e., there exists a subspace $\mathcal{M}'$, which is necessarily $A(z')$ invariant, for which the family of subspaces $\mathcal{N}(z)$, $z \in (\Omega \setminus S_1) \cup \{z'\}$, defined by $\mathcal{N}(z) = \mathcal{M}(z)$ on $\Omega \setminus S_1$ and $\mathcal{N}(z') = \mathcal{M}'$, is analytic on $(\Omega \setminus S_1) \cup \{z'\}$), or else $\mathcal{M}(z) = S(z)\mathcal{M}_0$ in a neighbourhood of $z'$, where $S(z)$ is an invertible family of transformations that is analytic on a deleted neighbourhood of $z'$ and has an algebraic branch point at $z'$.

Looking ahead to the applications of the next chapter, we introduce the notion of analytic extendability of chains of invariant subspaces. Let $A(z)\colon \mathbb{C}^n \to \mathbb{C}^n$ be an analytic family of transformations on $\Omega$, and let $\Lambda_0 = \{\mathcal{M}_{01} \subseteq \mathcal{M}_{02} \subseteq \cdots \subseteq \mathcal{M}_{0r}\}$ be a chain of $A(z_0)$-invariant subspaces. We say that $\Lambda_0$ is extendable to an analytic chain of $A(z)$-invariant subspaces on a set $\Omega' \subseteq \Omega$ containing $z_0$ if there exist analytic families of subspaces $\mathcal{M}_{01}(z), \ldots, \mathcal{M}_{0r}(z)$ on $\Omega'$ such that $\mathcal{M}_{0j}(z_0) = \mathcal{M}_{0j}$ for $j = 1, \ldots, r$; $\mathcal{M}_{0j}(z) \subseteq \mathcal{M}_{0k}(z)$ for $j < k$ and $z \in \Omega'$; and $\mathcal{M}_{0j}(z)$ is $A(z)$ invariant for all $z \in \Omega'$. Clearly, this is a generalization of the notion of extendability of a single invariant subspace dealt with in Theorem 19.4.1. The arguments used in the proof of Theorem 19.4.1 also prove the following result on analytic extendability of chains of invariant subspaces.

Theorem 19.4.2

Let $A(z)$, $S_1$, and $S_2$ be as in Theorem 19.4.1. Then every chain of $A(z_0)$-invariant subspaces, where $z_0 \in \Omega \setminus (S_2 \cup S_1)$, is extendable to an analytic chain of $A(z)$-invariant subspaces on $\Omega \setminus S_1$. Moreover, the analytic families of subspaces that form this analytic chain have at most algebraic branch points at $S_1$ (in the sense explained after the proof of Theorem 19.4.1).

Chains consisting of spectral subspaces are important examples of chains of subspaces that are always analytically extendable. Recall that an $A$-invariant subspace $\mathcal{M}$ is called spectral if $\mathcal{M}$ is a sum of root subspaces of $A$.
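The construction $\mathcal{M}(z) = T(z)T(z_0)^{-1}\mathcal{M}_0$ from the proof of Theorem 19.4.1 can be made concrete in a small numerical experiment. The Python sketch below is our own illustration under assumed data (the family $A(z) = \begin{bmatrix} 0 & 1 \\ z & 0 \end{bmatrix}$, whose eigenvalues $\pm\sqrt{z}$ give $S_1 = \{0\}$, and the eigenvector matrix $T(z)$ are our choices): it continues an eigenspace of $A(z_0)$ along $z$ and verifies $A(z)$-invariance of the continued subspace.

```python
import numpy as np

def A(z):
    return np.array([[0.0, 1.0], [z, 0.0]], dtype=complex)

def T(z):
    w = np.sqrt(complex(z))           # branch point at z = 0, i.e. S_1 = {0}
    return np.array([[1.0, 1.0], [w, -w]], dtype=complex)  # eigenvector columns

z0 = 1.0
M0 = T(z0)[:, [0]]                    # A(z0)-invariant: the eigenspace span{(1, 1)^T}

for z in [1.0, 2.0, 0.25, 4.0 + 1.0j]:
    v = T(z) @ np.linalg.solve(T(z0), M0)   # basis of M(z) = T(z) T(z0)^{-1} M0
    w = A(z) @ v
    c = (v.conj().T @ w) / (v.conj().T @ v) # best multiple of v matching A(z)v
    print(f"z = {z}: invariance defect = {np.linalg.norm(w - v @ c):.1e}")
```

The defect is at rounding level at every sample point, while the family itself inherits the square-root branch point of $T(z)$ at $z = 0$, matching the "at most algebraic branch points in $S_1$" statement above.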
Theorem 19.4.3

Let $A(z)$ and $S_1$ be as in Theorem 19.4.1. Then every chain $\Lambda_0 = \{\mathcal{M}_{01} \subseteq \cdots \subseteq \mathcal{M}_{0r}\}$ of spectral subspaces of $A(z_0)$, where $z_0 \in \Omega$, is extendable to an analytic chain of $A(z)$-invariant subspaces on $\Omega \setminus (S_1 \setminus \{z_0\})$ that has at most algebraic branch points at $S_1 \setminus \{z_0\}$.

Proof. For $j = 1, \ldots, r$ write $\mathcal{M}_{0j} = \operatorname{Im} P_{\Gamma_j}(z_0)$, where

$$P_{\Gamma_j}(z_0) = \frac{1}{2\pi i}\int_{\Gamma_j} (\lambda I - A(z_0))^{-1}\, d\lambda$$

is the Riesz projector of $A(z_0)$ corresponding to a suitable simple rectifiable contour $\Gamma_j$. We can assume that $\Gamma_j$ lies in the interior of $\Gamma_k$ for $j < k$. Let $\mathcal{U} \subseteq \Omega$ be a neighbourhood of $z_0$ so small that $A(z)$ has no eigenvalues on $\Gamma_1 \cup \cdots \cup \Gamma_r$ for $z \in \mathcal{U}$. Clearly, for $z \in \mathcal{U}$,

$$\Lambda(z) = \{\mathcal{M}_1(z) \subseteq \cdots \subseteq \mathcal{M}_r(z)\}, \qquad \mathcal{M}_j(z) = \operatorname{Im} P_{\Gamma_j}(z)$$

forms an analytic chain of $A(z)$-invariant subspaces in $\mathcal{U}$. Fix $z \in \mathcal{U} \setminus (S_1 \cup S_2)$, and let $\widetilde{\mathcal{M}}_j(z)$ be the analytic $A(z)$-invariant family of subspaces (cf. the proof of Theorem 19.4.1) to which $\mathcal{M}_j(z)$ is extendable. It is easily seen that $\widetilde{\mathcal{M}}_j(z) = \mathcal{M}_j(z)$ for $z \in \mathcal{U} \setminus (S_1 \cup S_2)$, so $\Lambda_0$ admits the desired extension. □

To analyze the extendability of $A(z_0)$-invariant subspaces when $z_0 \in S_2$, we need the following notion. An invariant subspace $\mathcal{M}_0$ of $A(z_0)$, $z_0 \in \Omega$, is called sequentially isolated (in $\Omega$) if there is no sequence $z_m \neq z_0$, $m = 1, 2, \ldots$, of points in $\Omega$ tending to $z_0$ such that, for some $A(z_m)$-invariant subspaces $\mathcal{M}_m$ ($m = 1, 2, \ldots$), we have $\lim_{m \to \infty} \theta(\mathcal{M}_m, \mathcal{M}_0) = 0$. Theorem 19.4.1 shows, in particular, that every $A(z_0)$-invariant subspace with $z_0 \in \Omega \setminus (S_1 \cup S_2)$ is sequentially nonisolated. However, certain $A(z_0)$-invariant subspaces with $z_0 \in S_2$ may be sequentially isolated, as follows.

EXAMPLE 19.4.1. Let

$$A(z) = \begin{bmatrix} 0 & z \\ 0 & 0 \end{bmatrix}, \qquad z \in \mathbb{C}$$

Here $S_1$ is empty and $S_2 = \{0\}$. Any $A(0)$-invariant subspace of the form $\operatorname{Span}\left\{\begin{bmatrix} \gamma \\ 1 \end{bmatrix}\right\}$, where $\gamma$ is a complex number, is sequentially isolated. On the other hand, the $A(0)$-invariant subspace $\operatorname{Span}\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right\}$ is sequentially nonisolated. □
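The Riesz projector that drives the proof of Theorem 19.4.3 is straightforward to approximate numerically. The Python sketch below is our own illustration (the matrix, contour, and quadrature step count are arbitrary choices): it evaluates $P_\Gamma = \frac{1}{2\pi i}\oint_\Gamma (\lambda I - A)^{-1}\,d\lambda$ by the trapezoidal rule on a circle $\Gamma$ enclosing only the eigenvalue $1$ of $A$, then checks the projector and commutation properties that make $\operatorname{Im} P_\Gamma$ a spectral subspace.

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0]])     # eigenvalues 1 and 3
center, radius, N = 1.0, 1.0, 200          # circle enclosing only the eigenvalue 1

P = np.zeros((2, 2), dtype=complex)
for k in range(N):
    lam = center + radius * np.exp(2j * np.pi * k / N)
    dlam = 1j * (lam - center) * (2 * np.pi / N)       # d(lambda) along the circle
    P += np.linalg.inv(lam * np.eye(2) - A) * dlam
P /= 2j * np.pi

print("||P^2 - P|| =", np.linalg.norm(P @ P - P))      # P is a projector
print("||AP - PA|| =", np.linalg.norm(A @ P - P @ A))  # P commutes with A
```

Since the integrand is analytic in a neighbourhood of the contour, the trapezoidal rule converges extremely fast here; both residuals come out at rounding level.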
Clearly, a sequentially isolated $A(z_0)$-invariant subspace is not extendable to an analytic $A(z)$-invariant family of subspaces on a neighbourhood of $z_0$. We conjecture that these are the only nonextendable invariant subspaces.

Conjecture 19.4.4

Let $A(z)$, $S_1$, and $S_2$ be as in Theorem 19.4.1. Then every sequentially nonisolated $A(z_0)$-invariant subspace $\mathcal{M}_0$, where $z_0 \in S_2$, is extendable to an analytic $A(z)$-invariant family of subspaces on $\Omega \setminus S_1$ that has at most algebraic branch points in $S_1$ (in the same sense as in the remark following the proof of Theorem 19.4.1).

Theorem 19.4.3 verifies this conjecture in case $\mathcal{M}_0$ is a spectral subspace.

19.5 ANALYTIC MATRIX FUNCTIONS OF A REAL VARIABLE

The results of Sections 19.1-19.4 hold also for $n \times n$ matrix functions $A(t)$ that are analytic in the real variable $t$ on an open interval $\Omega$ of the real line. Of particular interest is the case when all eigenvalues of $A(t)$ are real, as follows.

Theorem 19.5.1

Let $A(t)$ be an $n \times n$ matrix function that is analytic in the real variable $t$ on $\Omega$. Assume that, for all $t \in \Omega$, all eigenvalues of $A(t)$ are real. Then the eigenvalues of $A(t)$ are also analytic functions of $t$ on $\Omega$.

Proof. Let $t_0 \in \Omega$. By Theorem 19.1.1, all eigenvalues of $A(t)$, for $t$ in a neighbourhood of $t_0$, are given by fractional power series of the form

$$\lambda(t) = \lambda_0 + \sum_{j=1}^{\infty} c_j\bigl[(t - t_0)^{1/\alpha}\bigr]^j$$

where the $c_j$ are complex numbers. Let $j_1$ be the first index such that $c_{j_1} \neq 0$. [If all $c_j$ are zeros, then $\lambda(t) = \lambda_0$ is obviously analytic at $t_0$.] Then

$$\lim_{t \to t_0} \frac{\lambda(t) - \lambda_0}{\bigl[(t - t_0)^{1/\alpha}\bigr]^{j_1}} = c_{j_1} \qquad (19.5.1)$$

Take $t > t_0$ and $(t - t_0)^{1/\alpha}$ positive. Since $\lambda(t)$ and $\lambda_0$ are real, we find that $c_{j_1}$ must be real. In (19.5.1) we now take $t < t_0$ and $(t - t_0)^{1/\alpha} = |t - t_0|^{1/\alpha}\bigl(\cos(2\pi/\alpha) + i\sin(2\pi/\alpha)\bigr)$. We obtain a contradiction with the fact
that $c_{j_1}$ is real, unless $j_1$ is a multiple of $\alpha$. If $j_2 > j_1$ is the minimal integer with $c_{j_2} \neq 0$, then

$$\lim_{t \to t_0} \frac{\lambda(t) - \lambda_0 - c_{j_1}(t - t_0)^{j_1/\alpha}}{\bigl[(t - t_0)^{1/\alpha}\bigr]^{j_2}} = c_{j_2}$$

and the preceding argument shows that $c_{j_2}$ is real and $j_2$ is a multiple of $\alpha$. Continuing in this way, we conclude that $\lambda(t)$ is analytic in a neighbourhood of $t_0$. As $t_0$ was arbitrary in $\Omega$, the analyticity of $\lambda(t)$ on $\Omega$ follows. □

Combining this result with Theorems 19.2.3 and 19.4.1, we have the following corollary.

Corollary 19.5.2

Let $A(t)$ be an analytic $n \times n$ matrix function of a real variable $t$ on $\Omega$, and assume that all eigenvalues of $A(t)$ are real when $t \in \Omega$. Let $S_2$ be the discrete set of points in $\Omega$ defined by the property that either

$$\nu(t_0) < \max_{t \in \Omega} \nu(t), \qquad t_0 \in S_2$$

where $\nu(t)$ is the number of distinct eigenvalues of $A(t)$, or

$$\nu(t_0) = \max_{t \in \Omega} \nu(t), \qquad t_0 \in S_2$$

but for at least one analytic eigenvalue $\mu_j(t)$ of $A(t)$ the partial multiplicities of $\mu_j(t_0)$ are different from the partial multiplicities of $\mu_j(t)$, $t \neq t_0$, in a real neighbourhood of $t_0$. Then there exist analytic $n$-dimensional vector functions

$$x_{11}(t), \ldots, x_{1m_1}(t);\; \ldots;\; x_{r1}(t), \ldots, x_{rm_r}(t) \qquad (19.5.2)$$

on $\Omega$ such that for every $t \in \Omega \setminus S_2$ the vectors (19.5.2) form a basis in $\mathbb{C}^n$ and, for $j = 1, \ldots, r$, $x_{j1}(t), \ldots, x_{jm_j}(t)$ is a Jordan chain of the transformation $A(t)$. Moreover, when $t_0 \in \Omega \setminus S_2$, every $A(t_0)$-invariant subspace $\mathcal{M}_0$ is extendable to an analytic $A(t)$-invariant family of subspaces on $\Omega$.

In particular, the conclusions of Corollary 19.5.2 hold for an analytic $n \times n$ matrix function $A(t)$ of the real variable $t \in \Omega$ that is diagonable and all eigenvalues of which are real for every $t \in \Omega$. These properties are satisfied, for example, if $A(t)$ is an analytic matrix function on $\Omega$ that is hermitian for all $t \in \Omega$.
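Theorem 19.5.1 can be checked numerically on a toy hermitian family of our own choosing, $A(t) = \begin{bmatrix} 0 & t \\ t & 0 \end{bmatrix}$: its eigenvalues are $t$ and $-t$, which are analytic through the crossing at $t = 0$, whereas the pointwise-sorted eigenvalues returned by a numerical routine are $-|t|$ and $|t|$ and have a kink there. The Python sketch below makes the distinction visible.

```python
import numpy as np

def A(t):
    return np.array([[0.0, t], [t, 0.0]])   # hermitian, eigenvalues +t and -t

for t in [-0.2, -0.1, 0.0, 0.1, 0.2]:
    sorted_eigs = np.linalg.eigvalsh(A(t))  # numerical routine: increasing order
    analytic = (t, -t)                      # analytic branches of Theorem 19.5.1
    print(f"t = {t:+.1f}: sorted = {sorted_eigs}, analytic = {analytic}")

# The sorted curves are -|t| and |t|, with a kink at t = 0; reordering them
# as t and -t removes the kink, as Theorem 19.5.1 guarantees is possible.
```

The point of the theorem is precisely that such an analytic relabelling of the eigenvalue branches always exists when all eigenvalues are real.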
19.6 EXERCISES

19.1 Find the first and second exceptional sets for the following analytic families of transformations:

(a) $A(z) = \begin{bmatrix} \;\cdot\; & \;\cdot\; \\ \;\cdot\; & \;\cdot\; \end{bmatrix}$ (a $2 \times 2$ family)

(b) $A(z) = \begin{bmatrix} 0 & 0 & z^4 - 3z^2 \\ 1 & 0 & 0 \\ 1 & 3z & z - z^2 \end{bmatrix}$

19.2 In Exercise 19.1(a) find a basis in $\mathbb{C}^2$ that is analytic on $\mathbb{C}$ (with the possible exception of branch points) and consists of eigenvectors of $A(z)$ (with the possible exception of a discrete set of values of $z$).

19.3 Describe the first and second exceptional sets for the following types of analytic families of transformations $A(z)\colon \mathbb{C}^n \to \mathbb{C}^n$ on $\Omega$:

(a) $A(z) = \operatorname{diag}[a_1(z), \ldots, a_n(z)]$ is a diagonal matrix.

(b) $A(z)$ is a circulant matrix (with respect to a fixed basis in $\mathbb{C}^n$) for every $z \in \Omega$.

(c) $A(z)$ is an upper triangular Toeplitz matrix for every $z \in \Omega$.

(d) For every $z \in \Omega$, all the entries of $A(z)$, with the possible exception of the entries $(i, j)$ with $i = j$ or with $i + j = n + 1$, are zeros.

19.4 Show that an analytic matrix function of the type $\alpha(z)I + \beta(z)A$, where $\alpha(z)$ and $\beta(z)$ are scalar analytic functions and $A$ is a fixed $n \times n$ matrix, has all eigenvalues analytic.

19.5 Show that if $A(z) = \alpha(z)I + \beta(z)A$ is the function of Exercise 19.4 and $\beta(z)$ is a polynomial of degree $l$, then the second exceptional set of $A(z)$ contains not more than $l$ points.

19.6 Prove that the number of exceptional points of a polynomial family of transformations $\sum_{j=0}^{k} z^j A_j$, $z \in \mathbb{C}$, is always finite. [Hint: Use the approach based on the resultant matrix (Section 19.2).]

19.7 Let $A(z)$ be an analytic $n \times n$ matrix function defined on $\Omega$ whose values are circulant matrices. When is every $A(z_0)$-invariant subspace analytically extendable for every $z_0 \in \Omega$?

19.8 Describe the analytically extendable $A(z_0)$-invariant subspaces, where $A(z)$ is an analytic $n \times n$ matrix function on $\Omega$ with upper triangular Toeplitz values, and $z_0 \in \Omega$.

19.9 Let $A(z)\colon \mathbb{C}^n \to \mathbb{C}^n$ be an analytic family of transformations defined on $\Omega$, and assume that $A(z_0)$ is nonderogatory for some $z_0 \in \Omega$. Prove that every $A(z_0)$-invariant subspace is sequentially nonisolated. (Hint: Use Theorem 15.2.3.)
19.10 Let $A(z)$ be an analytic $n \times n$ matrix function of the real variable $z \in \Omega$, where $\Omega$ is an open interval on the real line, such that $A(z)$ is hermitian for every $z \in \Omega$. Prove that there exist analytic families $x_1(z), \ldots, x_n(z)$ of $n$-dimensional vectors on $\Omega$ such that for every $z_0 \in \Omega$ the vectors $x_1(z_0), \ldots, x_n(z_0)$ form an orthonormal basis of eigenvectors of $A(z_0)$. [Hint: Let $\lambda_0(z)$ be an eigenvalue of $A(z)$ that is analytic on $\Omega$ (one exists by Theorem 19.5.1). Choose an analytic vector function $x_1(z) \in \operatorname{Ker}(A(z) - \lambda_0(z)I)$ on $\Omega$ with $\|x_1(z)\| = 1$. Repeat this argument for the restriction of $A(z)$ to $\operatorname{Span}\{x_1(z)\}^{\perp}$ (recall that $\operatorname{Span}\{x_1(z)\}^{\perp}$ is an analytic family of subspaces on $\Omega$), and so on.]

19.11 Let $A$ and $B$ be hermitian $n \times n$ matrices, and assume that $A$ has $n$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Show that in the power series

$$\lambda_k(z) = \lambda_k + \sum_{j=1}^{\infty} \lambda_k^{(j)} z^j \qquad \text{and} \qquad f_k(z) = f_k + \sum_{j=1}^{\infty} f_k^{(j)} z^j$$

representing the eigenvalue $\lambda_k(z)$ of $A + zB$ and the corresponding eigenvector $f_k(z)$ of $A + zB$, for $z$ sufficiently close to zero, we have

$$\lambda_k^{(1)} = (Bf_k, f_k), \qquad f_k^{(1)} = \sum_{i \neq k} \frac{(Bf_k, f_i)}{\lambda_k - \lambda_i}\, f_i + a_k f_k$$

where the $a_k$ are pure imaginary numbers. It is assumed that $\|f_k(z)\| = 1$ for real $z$ sufficiently close to zero. [Hint: By Exercise 19.10, the eigenvalue $\lambda_k(z)$ and the corresponding eigenvector $f_k(z)$ are analytic functions of $z$. Show that the equality

$$Af_k^{(1)} + Bf_k = \lambda_k f_k^{(1)} + \lambda_k^{(1)} f_k \qquad (1)$$

holds. Find $\lambda_k^{(1)}$ by taking the scalar product of (1) with $f_k$. By taking the scalar product of (1) with $f_i$ ($i \neq k$) it is found that

$$(f_k^{(1)}, f_i) = \frac{(Bf_k, f_i)}{\lambda_k - \lambda_i}$$

The condition $\|f_k(z)\| = 1$ gives $(f_k^{(1)}, f_k) + (f_k, f_k^{(1)}) = 0$.]
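The first-order perturbation formulas of Exercise 19.11 are easy to test numerically. The Python sketch below is our own check on assumed data (a diagonal $A$ with distinct eigenvalues, so that $f_k = e_k$, and a random hermitian $B$): it compares $\lambda_k^{(1)} = (Bf_k, f_k)$ against a central finite difference of the eigenvalues of $A + zB$ at $z = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
B = (X + X.conj().T) / 2                             # hermitian perturbation
A = np.diag([1.0, 2.0, 4.0, 7.0]).astype(complex)    # distinct eigenvalues; f_k = e_k

eps = 1e-6
dlam = (np.linalg.eigvalsh(A + eps * B) - np.linalg.eigvalsh(A - eps * B)) / (2 * eps)

print("finite-difference lambda_k^(1):", dlam)
print("predicted (B f_k, f_k)        :", np.real(np.diag(B)))
```

For small enough `eps` the two rows agree to many digits; the ascending order returned by `eigvalsh` matches the labelling $\lambda_1 < \cdots < \lambda_n$ because the eigenvalues of $A$ are well separated.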
Chapter Twenty

Applications

This chapter contains applications of the results of the previous two chapters. These applications are concerned with problems of factorization of monic matrix polynomials and rational matrix functions depending analytically on a parameter. The main problem is the analysis of the analytic properties of divisors. Solutions of a matrix quadratic equation with coefficients depending analytically on a parameter are also analyzed.

20.1 FACTORIZATION OF MONIC MATRIX POLYNOMIALS

Consider a monic matrix polynomial $L(\lambda) = I\lambda^l + \sum_{j=0}^{l-1} A_j\lambda^j$, where $A_0, \ldots, A_{l-1}$ are $n \times n$ matrices that depend analytically on the parameter $z$ for $z \in \Omega$, and $\Omega$ is a domain in the complex plane. We write $A_j = A_j(z)$ and $L(\lambda) = L(\lambda, z)$. In this section we study the behaviour of factorizations

$$L(\lambda, z) = L_1(\lambda, z) \cdots L_r(\lambda, z)$$

of $L(\lambda, z)$ as functions of $z$. Our attention is focused on the problem of analytic extension of factorizations from a given $z_0 \in \Omega$. Let

$$C(z) = \begin{bmatrix} 0 & I & 0 & \cdots & 0 \\ 0 & 0 & I & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I \\ -A_0(z) & -A_1(z) & -A_2(z) & \cdots & -A_{l-1}(z) \end{bmatrix}$$

be the companion matrix of $L(\lambda, z)$. Obviously, $C(z)$ is an analytic $nl \times nl$ matrix function on $\Omega$. The first (resp. second) exceptional set of $C(z)$ is called the first (resp. second) exceptional set of $L(\lambda, z)$. In other words (see Chapter 19), $z_0 \in \Omega$ belongs to the first exceptional set $S_1$ of $L(\lambda, z)$ if and only if not all solutions of $\det L(\lambda, z) = 0$ (as functions of $z$) are analytic at $z_0$. The point $z_0$ belongs to the second exceptional set $S_2$ of $L(\lambda, z)$ if and
only if all solutions of $\det L(\lambda, z) = 0$ are analytic in a neighbourhood of $z_0$ and, denoting by $\lambda_1(z), \ldots, \lambda_r(z)$ all the distinct analytic functions in a neighbourhood of $z_0$ satisfying $\det L(\lambda_j(z), z) = 0$, $j = 1, \ldots, r$, we have either (a) $\lambda_j(z_0) = \lambda_k(z_0)$ for some $j \neq k$, or (b) all the numbers $\lambda_1(z_0), \ldots, \lambda_r(z_0)$ are different, but for at least one $\lambda_j(z)$ the partial multiplicities of $L(\lambda, z)$ at $\lambda_j(z)$ are not the same for $z = z_0$ as for $z \neq z_0$ (with $z$ sufficiently close to $z_0$).

Now we state the main result on analytic extendability of factorizations of $L(\lambda, z)$.

Theorem 20.1.1

Let $z_0 \in \Omega \setminus (S_1 \cup S_2)$ and

$$L(\lambda, z_0) = L_1(\lambda) \cdots L_r(\lambda) \qquad (20.1.1)$$

where $L_j(\lambda)$, $j = 1, \ldots, r$, are monic matrix polynomials and $S_1$ (resp. $S_2$) is the first (resp. second) exceptional set of $L(\lambda, z)$. Then there exist monic matrix polynomials $L_1(\lambda, z), \ldots, L_r(\lambda, z)$ whose coefficients are analytic functions on $\Omega \setminus (S_1 \cup S)$ (where $S$ is some discrete subset of $\Omega \setminus \{z_0\}$), having at most poles in $S$ and at most algebraic branch points in $S_1$, such that

$$L(\lambda, z) = L_1(\lambda, z) \cdots L_r(\lambda, z) \qquad \text{for } z \in \Omega \setminus S$$

and $L_j(\lambda, z_0) = L_j(\lambda)$ for $j = 1, \ldots, r$.

Note that the case $S_1 \cap S \neq \emptyset$ is not excluded. This means that the coefficients $A_{jk}(z)$ of $L_j(\lambda, z)$ may have an algebraic branch point and a pole at the same point $z'$ simultaneously; that is, there is a power series representation of the type

$$A_{jk}(z) = \sum_{i=-q}^{\infty} B_i\,(z - z')^{i/p}$$

in a deleted neighbourhood of $z'$, where $p$ and $q$ are positive integers.

Proof. We use the description of factorizations of monic matrix polynomials in terms of invariant subspaces developed in Chapter 5. Let

$$X = [\,I \;\; 0 \;\; \cdots \;\; 0\,], \qquad C(z), \qquad Y = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ I \end{bmatrix} \qquad (20.1.2)$$

be a standard triple for $L(\lambda, z)$, and let
$$\mathcal{M}_1 \subseteq \cdots \subseteq \mathcal{M}_{r-1} \qquad (20.1.3)$$

be the chain of $C(z_0)$-invariant subspaces corresponding to the factorization (20.1.1) [with respect to the triple (20.1.2)]. In particular, for $j = 1, \ldots, r-1$, the transformations

$$\operatorname{col}\bigl(XC(z_0)^{i}\bigr)_{i=0}^{p_j - 1}\Big|_{\mathcal{M}_j}\colon \mathcal{M}_j \to \mathbb{C}^{np_j}$$

are invertible, where $p_j$ is the sum of the degrees of the matrix polynomials $L_{r-j+1}(\lambda), \ldots, L_r(\lambda)$. By Theorem 19.4.2 the chain (20.1.3) is extendable to a chain $\mathcal{M}_1(z) \subseteq \cdots \subseteq \mathcal{M}_{r-1}(z)$ of $C(z)$-invariant subspaces that is analytic in $\Omega \setminus S_1$ and has at most algebraic branch points in $S_1$. Let $S = S^{(1)} \cup \cdots \cup S^{(r-1)}$, where $S^{(j)}$ is the discrete set of all $z \in \Omega$ for which the transformation

$$\operatorname{col}\bigl(XC(z)^{i}\bigr)_{i=0}^{p_j - 1}\Big|_{\mathcal{M}_j(z)}\colon \mathcal{M}_j(z) \to \mathbb{C}^{np_j}$$

is not invertible. For $z \in \Omega \setminus S$, let

$$L(\lambda, z) = L_1(\lambda, z) \cdots L_r(\lambda, z)$$

be the factorization of $L(\lambda, z)$ that corresponds to the chain $\mathcal{M}_1(z) \subseteq \cdots \subseteq \mathcal{M}_{r-1}(z)$ of $C(z)$-invariant subspaces [with respect to the triple (20.1.2)]. Formulas (5.6.3) and (5.6.5) show that the coefficients of $L_j(\lambda, z)$ have all the desired properties. □

In the same way (using Theorem 19.4.3 in place of Theorem 19.4.2) one proves the analytic extendability of spectral factorizations, as follows.

Theorem 20.1.2

Let $z_0 \in \Omega$ and

$$L(\lambda, z_0) = L_1(\lambda) \cdots L_r(\lambda)$$

where $\sigma(L_j) \cap \sigma(L_k) = \emptyset$ for $j \neq k$. Then there exist monic matrix polynomials $L_1(\lambda, z), \ldots, L_r(\lambda, z)$ with the same properties as in Theorem 20.1.1, whose coefficients are, in addition, analytic at $z_0$.
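The companion matrix that underlies the proofs above is easy to form explicitly. The following Python sketch is our own illustration with arbitrarily chosen coefficients (not an example from the book): it builds $C$ for a monic $2 \times 2$ quadratic $L(\lambda) = I\lambda^2 + A_1\lambda + A_0$ and verifies the linearization property behind the standard triple (20.1.2): if $(x_1^{\mathsf T}, x_2^{\mathsf T})^{\mathsf T}$ is an eigenvector of $C$ with eigenvalue $\lambda$, then $L(\lambda)x_1 = 0$.

```python
import numpy as np

def companion(coeffs):
    """coeffs = [A_0, ..., A_{l-1}] for L(lam) = I*lam**l + sum_j A_j*lam**j."""
    n, l = coeffs[0].shape[0], len(coeffs)
    C = np.zeros((n * l, n * l), dtype=complex)
    for i in range(l - 1):
        C[i * n:(i + 1) * n, (i + 1) * n:(i + 2) * n] = np.eye(n)  # superdiagonal I blocks
    for j, Aj in enumerate(coeffs):
        C[(l - 1) * n:, j * n:(j + 1) * n] = -Aj                   # last block row
    return C

A0 = np.array([[2.0, 1.0], [0.0, 3.0]])
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
C = companion([A0, A1])

lams, vecs = np.linalg.eig(C)
for lam, v in zip(lams, vecs.T):
    x1 = v[:2]                                      # top block of the eigenvector
    L = lam**2 * np.eye(2) + lam * A1 + A0
    print(f"lambda = {lam:.3f}: ||L(lambda) x1|| = {np.linalg.norm(L @ x1):.1e}")
```

Making the coefficients depend analytically on a parameter $z$ then makes $C(z)$ an analytic family, which is the object to which Theorems 19.4.2 and 19.4.3 are applied.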
We say that a factorization

$$L(\lambda, z_0) = L_1(\lambda) \cdots L_r(\lambda), \qquad z_0 \in \Omega \qquad (20.1.4)$$

of monic matrix polynomials $L_j(\lambda) = I\lambda^{l_j} + \sum_{k=0}^{l_j - 1} A_{jk}\lambda^k$, $j = 1, \ldots, r$, is sequentially nonisolated if there is a sequence of points $\{z_m\}_{m=1}^{\infty}$ in $\Omega \setminus \{z_0\}$ such that $\lim_{m \to \infty} z_m = z_0$, and a sequence of factorizations

$$L(\lambda, z_m) = L_1^{(m)}(\lambda) \cdots L_r^{(m)}(\lambda), \qquad m = 1, 2, \ldots$$

where

$$L_j^{(m)}(\lambda) = I\lambda^{l_j} + \sum_{k=0}^{l_j - 1} A_{jk}^{(m)}\lambda^k, \qquad j = 1, \ldots, r$$

with $\lim_{m \to \infty} A_{jk}^{(m)} = A_{jk}$ for $k = 0, \ldots, l_j - 1$ and $j = 1, \ldots, r$. Theorem 20.1.1 shows, in particular, that every factorization (20.1.4) with $z_0 \notin S_1 \cup S_2$ is sequentially nonisolated. Simple examples show that sequentially isolated factorizations do exist, for instance:

EXAMPLE 20.1.1. Let $C(z)$ be any matrix depending analytically on $z$ in a domain $\Omega$ with the property that for $z = z_0 \in \Omega$, $C(z_0)$ has a square root, while for $z \neq z_0$ in a neighbourhood of $z_0$, $C(z)$ has no square root. The prime example here is

$$z_0 = 0, \qquad C(z) = \begin{bmatrix} 0 & z \\ 0 & 0 \end{bmatrix}$$

Then define $L(\lambda, z) = I\lambda^2 - C(z)$. It is easily seen that if $L(\lambda, z)$ has a right divisor $I\lambda - A(z)$, then $L(\lambda, z) = I\lambda^2 - A^2(z)$, and hence $L(\lambda, z)$ has a monic right divisor if and only if $C(z)$ has a square root. Thus, under the hypotheses stated, $L(\lambda, z)$ has an isolated divisor at $z_0$. □

It is an open question whether every sequentially nonisolated factorization $L(\lambda, z_0) = L_1(\lambda) \cdots L_r(\lambda)$ of monic matrix polynomials, with $z_0$ belonging to the second exceptional set $S_2$ of $L(\lambda, z)$, is analytically extendable in the sense of Theorem 20.1.1. (It is clear that sequential nonisolatedness is a necessary condition for analytic extendability.) A proof of Conjecture 19.4.4 would answer this question in the affirmative.

20.2 RATIONAL MATRIX FUNCTIONS DEPENDING ANALYTICALLY ON A PARAMETER

In this section we study the realizations and exceptional points of rational matrix functions that depend analytically on a parameter. This will serve as
a background for the study of analytic extendability of minimal factorizations of such functions, dealt with in the next section.

Let $W(\lambda, z) = [w_{ij}(\lambda, z)]_{i,j=1}^n$ be a rational $n \times n$ matrix function that depends analytically on the parameter $z$ for $z \in \Omega$, where $\Omega$ is a domain in $\mathbb{C}$. That is, each entry $w_{ij}(\lambda, z)$ is a function of the type $p_{ij}(\lambda, z)/q_{ij}(\lambda, z)$, where $p_{ij}(\lambda, z)$ and $q_{ij}(\lambda, z)$ are (scalar) polynomials in $\lambda$ whose coefficients are analytic functions of $z$ on $\Omega$. We assume the following:

(a) For each $i$ and $j$ and for all $z \in \Omega$, the polynomial $q_{ij}(\lambda, z)$ in $\lambda$ is not identically zero, so the rational matrix function $W(\lambda, z)$ is well defined for every $z \in \Omega$.

(b) It is convenient to make the further assumption that for each pair of indices $i, j$ ($1 \leq i, j \leq n$) there exists a $z_0 \in \Omega$ such that the leading coefficient of $q_{ij}(\lambda, z)$ is nonzero at $z = z_0$ and the polynomials $p_{ij}(\lambda, z_0)$ and $q_{ij}(\lambda, z_0)$ are coprime, that is, have no common zeros. In particular, this assumption rules out the case when $p_{ij}(\lambda, z)$ and $q_{ij}(\lambda, z)$ have a nontrivial common divisor whose coefficients depend analytically on $z$ for $z \in \Omega$.

(c) Finally, we assume that for every $z \in \Omega$ the rational matrix function $W(\lambda, z)$ (as a function of $\lambda$) is analytic at infinity and $W(\infty, z) = I$.

Assumptions (a), (b), and (c) are maintained throughout this section. It can happen that $W(\lambda, z)$ has zeros and poles tending to infinity as $z$ tends to a certain point $z_0 \in \Omega$. This is illustrated in the next example.

EXAMPLE 20.2.1. Let

$$W(\lambda, z) = \frac{1 + \lambda z}{z + 1 + \lambda z}$$

Obviously, $W(\lambda, z)$ satisfies conditions (a), (b), and (c). Specifically, $W(\lambda, z)$ depends analytically on $z$ for $z \in \mathbb{C}$, $W(\infty, z) = 1$ for all $z \in \mathbb{C}$, and the polynomials $1 + \lambda z$ and $z + 1 + \lambda z$ have no common zeros for $z = 1$. However, $W(\lambda, z)$ has a zero at $\lambda = -z^{-1}$ and a pole at $\lambda = -(z + 1)z^{-1}$, and both tend to infinity as $z \to 0$. □

A convenient criterion for boundedness of the zeros and poles in a neighbourhood of each point of $\Omega$ can be given in terms of the entries of $W(\lambda, z)$, as follows.

Proposition 20.2.1

The poles and zeros of $W(\lambda, z)$ are bounded in a neighbourhood of each point of $\Omega$ if and only if, for each entry $p_{ij}(\lambda, z)/q_{ij}(\lambda, z)$ of $W(\lambda, z)$, the leading coefficient of the polynomial $q_{ij}(\lambda, z)$ has no zeros in $\Omega$ (as an analytic function of $z$ on $\Omega$).
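The proof that follows uses the resultant (Sylvester) matrix to detect coprimeness, and Example 20.2.1 gives a handy test case. The Python sketch below is our own check, with one common layout of the resultant matrix (coefficients listed highest power first; the helper `sylvester` is our naming): for $p(\lambda) = z\lambda + 1$ and $q(\lambda) = z\lambda + (z + 1)$ the resultant works out to $z^2$, so the pair is coprime exactly for $z \neq 0$, and $z = 0$ is also where the leading coefficient $z$ vanishes.

```python
import numpy as np

def sylvester(p, q):
    """Sylvester matrix of two polynomials, coefficients listed highest power first."""
    s, t = len(p) - 1, len(q) - 1          # degrees of p and q
    R = np.zeros((s + t, s + t), dtype=complex)
    for i in range(t):                     # t shifted copies of the coefficients of p
        R[i, i:i + s + 1] = p
    for i in range(s):                     # s shifted copies of the coefficients of q
        R[t + i, i:i + t + 1] = q
    return R

for z in [1.0, 0.5, 1e-3, 0.0]:
    p = [z, 1.0]                # p(lam) = z*lam + 1
    q = [z, z + 1.0]            # q(lam) = z*lam + (z + 1)
    det = np.linalg.det(sylvester(p, q)).real
    print(f"z = {z}: det R(z) = {det:.3e}   (z^2 = {z**2:.3e})")
```

The vanishing of the determinant as $z \to 0$ mirrors the escape of the zero and the pole of $W(\lambda, z)$ to infinity in Example 20.2.1.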
Proof. Assume that the leading coefficient of each $q_{ij}(\lambda, z)$ has no zeros in $\Omega$. Fix $z_0 \in \Omega$. Write $q_{ij}(\lambda, z) = \sum_{k=0}^{s} q_{ijk}(z)\lambda^k$, where, in general, $s$ depends on $i$ and $j$. As $q_{ijs}(z_0) \neq 0$, the zeros of $q_{ij}(\lambda, z)$ are bounded in a neighbourhood of $z_0$. Indeed, writing $r_k(z) = q_{ijk}(z)/q_{ijs}(z)$ for $k = 0, \ldots, s-1$, the zeros of the polynomial

$$\lambda^s + \sum_{k=0}^{s-1} r_k(z)\lambda^k, \qquad z \in \mathcal{U}$$

are all in the disc $|\lambda| \leq 1 + \max_{z \in \mathcal{U}}\bigl(|r_0(z)|, \ldots, |r_{s-1}(z)|\bigr)$, where $\mathcal{U}$ is a suitably chosen neighbourhood of $z_0$. As the poles of $W(\lambda, z)$ must also be zeros of at least one of the polynomials $q_{ij}(\lambda, z)$, $i, j = 1, \ldots, n$, it follows that there exists an $M > 0$ such that the poles of $W(\lambda, z)$ all lie in the disc $|\lambda| < M$ for every $z \in \mathcal{U}$.

Arguing by contradiction, assume that the zeros of $W(\lambda, z)$ are not bounded in any neighbourhood of $z_0$. Then there exist sequences $\{z_m\}_{m=1}^{\infty}$ and $\{\lambda_m\}_{m=1}^{\infty}$ such that $z_m \to z_0$, $|\lambda_m| > M$, $\lambda_m \to \infty$, and $\lambda_m$ is a zero of $W(\lambda, z_m)$. Then $W(\lambda_m, z_m)x_m = 0$ for some vector $x_m \in \mathbb{C}^n$ of norm 1. [Here we use the fact that $\lambda_m$ is not a pole of $W(\lambda, z_m)$.] Passing to a subsequence, if necessary, we can assume that $x_m \to x_0$ for some $x_0 \in \mathbb{C}^n$, $\|x_0\| = 1$. Using also the fact that $W(\infty, z) = I$ for all $z \in \mathcal{U}$, it follows that $W(\lambda, z)$ is continuous on the set $(\lambda, z) \in (\{\lambda \in \mathbb{C} : |\lambda| > M\} \cup \{\infty\}) \times \mathcal{U}$. A quick way to verify this is to use a general result stating that if a function $f(z_1, \ldots, z_m)$ of the complex variables $z_1, \ldots, z_m$, defined on $V_1 \times \cdots \times V_m$, where each $V_j$ is a domain in $\mathbb{C}$, is analytic in each variable separately (when all other variables are fixed), then $f(z_1, \ldots, z_m)$ is analytic (in particular, continuous) on $V_1 \times \cdots \times V_m$. For the proof of this result see, for example, Bochner and Martin (1948). Now the continuity of $W(\lambda, z)$ implies $W(\infty, z_0)x_0 = 0$, a contradiction with the fact that $W(\infty, z_0) = I$.

Conversely, let $z_0 \in \Omega$ be a zero of the leading coefficient of some $q_{ij}(\lambda, z)$. Then there is a zero $\lambda_0 = \lambda_0(z)$ of the polynomial $q_{ij}(\lambda, z)$ such that $\lambda_0(z)$ tends to infinity as $z$ tends to $z_0$. As $\lambda_0(z)$ is a pole of $W(\lambda, z)$ provided $p_{ij}(\lambda_0(z), z) \neq 0$, we have only to show that $\lambda_0(z)$ is not a zero of $p_{ij}(\lambda, z)$ for $z \neq z_0$ sufficiently close to $z_0$. To this end, use the existence of a point $z_1 \in \Omega$ such that $q_{ijs}(z_1) \neq 0$ and the polynomials $p_{ij}(\lambda, z_1)$ and $q_{ij}(\lambda, z_1)$ are coprime. The coprimeness of $p_{ij}(\lambda, z) = \sum_{k=0}^{t} p_{ijk}(z)\lambda^k$ and $q_{ij}(\lambda, z)$ is equivalent to the invertibility of the $(s + t) \times (s + t)$ resultant matrix (whose first $t$ rows carry the coefficients of $q_{ij}$, and whose last $s$ rows carry the coefficients of $p_{ij}$)

$$R_{ij}(z) = \begin{bmatrix}
q_{ij0}(z) & q_{ij1}(z) & \cdots & q_{ijs}(z) & 0 & \cdots & 0 \\
0 & q_{ij0}(z) & \cdots & q_{ij,s-1}(z) & q_{ijs}(z) & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & q_{ij0}(z) & q_{ij1}(z) & \cdots & q_{ijs}(z) \\
p_{ij0}(z) & p_{ij1}(z) & \cdots & p_{ijt}(z) & 0 & \cdots & 0 \\
0 & p_{ij0}(z) & \cdots & p_{ij,t-1}(z) & p_{ijt}(z) & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & p_{ij0}(z) & p_{ij1}(z) & \cdots & p_{ijt}(z)
\end{bmatrix}$$
as long as $q_{ijs}(z) \neq 0$ [e.g., see Uspensky (1978)]. So $\det R_{ij}(z_1) \neq 0$, and since $\det R_{ij}(z)$ is an analytic function of $z$ on $\Omega$, it follows that $\det R_{ij}(z) \neq 0$ for all $z \neq z_0$ sufficiently close to $z_0$. Hence, indeed, $p_{ij}(\lambda_0(z), z) \neq 0$ for $z \neq z_0$ in some neighbourhood of $z_0$. □

It turns out that boundedness of the poles and zeros of $W(\lambda, z)$ is precisely the condition needed for the existence of an analytic minimal realization, in the following sense.

Theorem 20.2.2

Let $W(\lambda, z)$ be a rational $n \times n$ matrix function that depends analytically on the parameter $z \in \Omega$ and satisfies assumptions (a), (b), and (c). Let the zeros and poles of $W(\lambda, z)$ be bounded in a neighbourhood of every point of $\Omega$. Then there exist analytic matrix functions $A(z)$, $B(z)$, and $C(z)$ on $\Omega$, of sizes $m \times m$, $m \times n$, and $n \times m$, respectively, such that

$$W(\lambda, z) = I + C(z)(\lambda I - A(z))^{-1}B(z), \qquad z \in \Omega \qquad (20.2.1)$$

and for every $z \in \Omega$, with the possible exception of a discrete set $S$, the realization (20.2.1) is minimal. Conversely, if (20.2.1) holds for some matrix functions $A(z)$, $B(z)$, and $C(z)$ of appropriate sizes that are analytic on $\Omega$, then the zeros and poles of $W(\lambda, z)$ are bounded in a neighbourhood of every point of $\Omega$.

Proof. By Theorem 7.1.2, for every $z \in \Omega$ there exists a realization

$$W(\lambda, z) = I + C_0(z)(\lambda I - A_0(z))^{-1}B_0(z)$$

for some matrices $C_0(z)$, $A_0(z)$, and $B_0(z)$. Further, by Proposition 20.2.1, the leading coefficients of the denominators of the entries of $W(\lambda, z)$ have no zeros in $\Omega$. Because of this, the proof of Theorem 7.1.2 shows that $A_0(z)$, $B_0(z)$, and $C_0(z)$ can be chosen to be analytic matrix functions of $z$ on $\Omega$. Let $p \times p$ be the size of $A_0(z)$. By Theorem 18.2.1 we can find families of subspaces $\mathcal{K}(z)$ and $\mathcal{R}(z)$ of $\mathbb{C}^p$, analytic on $\Omega$, such that for every $z \in \Omega$, with the possible exception of a discrete set $S_1$, we have

$$\mathcal{K}(z) = \bigcap_{i=0}^{p-1} \operatorname{Ker}\bigl(C_0(z)(A_0(z))^i\bigr) = \operatorname{Ker}\begin{bmatrix} C_0(z) \\ C_0(z)A_0(z) \\ \vdots \\ C_0(z)(A_0(z))^{p-1} \end{bmatrix}$$

and
$$\mathcal{R}(z) = \sum_{i=0}^{p-1} \operatorname{Im}\bigl((A_0(z))^i B_0(z)\bigr) = \operatorname{Im}\bigl[\,B_0(z),\; A_0(z)B_0(z),\; \ldots,\; (A_0(z))^{p-1}B_0(z)\,\bigr]$$

For $z \in S_1$ we have

$$\mathcal{K}(z) \subseteq \bigcap_{i=0}^{p-1} \operatorname{Ker}\bigl(C_0(z)(A_0(z))^i\bigr) \qquad \text{and} \qquad \mathcal{R}(z) \supseteq \sum_{i=0}^{p-1} \operatorname{Im}\bigl((A_0(z))^i B_0(z)\bigr)$$

By Theorem 18.3.2, for $z \in \Omega$ we may write

$$\mathcal{K}(z) = \operatorname{Im} P(z) = \operatorname{Ker}(I - P(z)), \qquad \mathcal{R}(z) = \operatorname{Ker}(I - Q(z))$$

where $P(z)\colon \mathbb{C}^p \to \mathbb{C}^p$ and $Q(z)\colon \mathbb{C}^p \to \mathbb{C}^p$ are analytic families of projectors on $\Omega$. Using the same Theorem 18.2.1, we find an analytic family of subspaces $\mathcal{L}(z)$ on $\Omega$ such that

$$\mathcal{L}(z) = \mathcal{K}(z) \cap \mathcal{R}(z) = \operatorname{Ker}\begin{bmatrix} I - P(z) \\ I - Q(z) \end{bmatrix}$$

for every $z \in \Omega$ except possibly for a discrete set $S_2 \subseteq \Omega$. For each $z \in S_2$ we have

$$\mathcal{L}(z) \subseteq \mathcal{K}(z) \cap \mathcal{R}(z)$$

In view of Theorem 18.3.2 there exists an analytic family of subspaces $\mathcal{N}(z)$ on $\Omega$ such that $\mathbb{C}^p = \mathcal{R}(z) \dotplus \mathcal{N}(z)$ for all $z \in \Omega$. Also, Lemma 19.3.2 ensures the existence of an analytic family of subspaces $\mathcal{M}(z)$ on $\Omega$ such that $\mathcal{R}(z) = \mathcal{L}(z) \dotplus \mathcal{M}(z)$ for all $z \in \Omega$. Let
$$A(z) = P_{\mathcal{M}(z)}A_0(z)\big|_{\mathcal{M}(z)}\colon \mathcal{M}(z) \to \mathcal{M}(z)$$
$$B(z) = P_{\mathcal{M}(z)}B_0(z)\colon \mathbb{C}^n \to \mathcal{M}(z)$$
$$C(z) = C_0(z)\big|_{\mathcal{M}(z)}\colon \mathcal{M}(z) \to \mathbb{C}^n$$

where $P_{\mathcal{M}(z)}$ is the projector on $\mathcal{M}(z)$ along $\mathcal{L}(z) \dotplus \mathcal{N}(z)$. We regard $A(z)$, $B(z)$, and $C(z)$ as matrices with respect to a fixed basis $x_1(z), \ldots, x_m(z)$ in $\mathcal{M}(z)$ such that the $x_j(z)$ are analytic functions on $\Omega$ (such a basis exists in view of Theorem 18.3.2). It is easily seen that $A(z)$, $B(z)$, and $C(z)$ are analytic on $\Omega$. The proof of Theorem 6.1.3, together with Theorem 6.1.5, shows that

$$W(\lambda, z) = I + C(z)(\lambda I - A(z))^{-1}B(z) \qquad (20.2.2)$$

for every $z \in \Omega \setminus (S_1 \cup S_2)$, and that (20.2.2) is a minimal realization of $W(\lambda, z)$ when $z \notin S_1 \cup S_2$. By continuity, equality (20.2.2) holds also for $z \in S_1 \cup S_2$, and the first part of Theorem 20.2.2 is proved.

Assume now that (20.2.1) holds for some analytic matrix functions $A(z)$, $B(z)$, and $C(z)$. It follows from Theorem 7.2.3 that every pole of $W(\lambda, z)$ is an eigenvalue of $A(z)$ and every zero of $W(\lambda, z)$ is an eigenvalue of $A(z) - B(z)C(z)$ (although the converse need not be true). As the eigenvalues of $A(z)$ and of $A(z) - B(z)C(z)$ depend continuously on $z$, they are bounded in a neighbourhood of each point of $\Omega$, and the converse statement of Theorem 20.2.2 follows. □

As the proof of Theorem 20.2.2 shows, the converse statement of this theorem remains true if the matrix functions $A(z)$, $B(z)$, and $C(z)$ satisfying (20.2.1) are merely assumed to be continuous on $\Omega$. The discrete set $S$ of Theorem 20.2.2 consists of exactly those points where the McMillan degree of $W(\lambda, z)$ is less than $m$. This follows from Theorems 7.1.3 and 7.1.5. Note also that the McMillan degree of $W(\lambda, z)$ is equal to $m$ for every $z \in \Omega \setminus S$.

From now on we assume (in addition to the assumptions made at the beginning of this section) that the zeros and poles of $W(\lambda, z)$ are bounded in a neighbourhood of each point of $\Omega$. Let

$$W(\lambda, z) = I + C(z)(\lambda I - A(z))^{-1}B(z) \qquad (20.2.3)$$

be a minimal realization of $W(\lambda, z)$ for $z \in \Omega \setminus S$, as in Theorem 20.2.2. Here $S$ is the set of all $z \in \Omega$ for which the realization (20.2.3) is not minimal. Denote by $S_1$ and $S_2$ the first and second exceptional sets, respectively, of the analytic matrix function $A(z)$, as defined in Section 19.2. Similarly, let $S_1^{\times}$ and $S_2^{\times}$ be the first and second exceptional sets, respectively, of $A(z)^{\times} \stackrel{\mathrm{def}}{=} A(z) - B(z)C(z)$, $z \in \Omega$. The set $S_1 \cup S_1^{\times}$ is called the first exceptional set $T_1$ of $W(\lambda, z)$. As the poles (resp. zeros) of $W(\lambda, z)$, for $z \in \Omega \setminus S$, are exactly the eigenvalues of $A(z)$ [resp. of
$A(z)^{\times}$] (see Section 7.2), it follows that a point $z_0 \in \Omega$ belongs to the first exceptional set of $W(\lambda, z)$ if and only if there is a pole or a zero $\lambda_0(z)$ of $W(\lambda, z)$, $z \in \mathcal{U} \setminus \{z_0\}$, where $\mathcal{U}$ is a neighbourhood of $z_0$, such that $z_0$ is an algebraic branch point of $\lambda_0(z)$. Note that it can happen that (20.2.3) fails to be a minimal realization for some $z$ belonging to the first exceptional set of $W(\lambda, z)$ (see Example 20.2.2).

The set $(S_2 \setminus S_1^{\times}) \cup (S_2^{\times} \setminus S_1) \cup (S \setminus (S_1^{\times} \cup S_1))$ is called the second exceptional set $T_2$ of $W(\lambda, z)$. Denoting by $\delta(z)$ the McMillan degree of $W(\lambda, z)$, we obtain the following description of the points of the second exceptional set: $z_0 \in T_2$ if and only if all poles and zeros of $W(\lambda, z)$ can be continued analytically (as functions of $z$) to $z_0$, and either

$$\delta(z_0) < \max_{z \in \Omega} \delta(z) \qquad \text{or} \qquad \delta(z_0) = \max_{z \in \Omega} \delta(z)$$

and for at least one zero (or pole) $\lambda_0(z)$ that is analytic in a neighbourhood $\mathcal{U}$ of $z_0$, the zero (or pole) multiplicities of $W(\lambda, z)$ corresponding to $\lambda_0(z)$ ($z \in \mathcal{U} \setminus \{z_0\}$) are different from the zero (or pole) multiplicities of $W(\lambda, z_0)$ at $\lambda_0(z_0)$. Again, it can happen that $T_2$ intersects the set of points where the realization (20.2.3) is not minimal. Clearly, both $T_1$ and $T_2$ are discrete sets. Note also that the set $T_1 \cup T_2$ contains all the points $z_0$ for which $\delta(z_0) < \max_{z \in \Omega} \delta(z)$.

EXAMPLE 20.2.2. Let

$$W(\lambda, z) = 1 + [\,0 \;\; z \;\; 0 \;\; z\,]\left(\lambda I - \begin{bmatrix} 0 & 1 & 0 & 0 \\ z & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -z+2 & 0 \end{bmatrix}\right)^{-1}\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} = 1 + \frac{z^2}{\lambda^2 - z} - \frac{z(z - 2)}{\lambda^2 + z - 2}$$

be a scalar rational function depending analytically on $z$ for $z \in \mathbb{C}$. Clearly, $W(\infty, z) = 1$, and the zeros and poles of $W(\lambda, z)$ are bounded in a neighbourhood of each point $z \in \mathbb{C}$ (cf. Proposition 20.2.1). In the notation introduced above we have $S_1 = \{0, 2\}$, $S_2 = \{1\}$, $S = \{0\}$. Further,

$$A(z)^{\times} = \begin{bmatrix} 0 & 1-z & 0 & -z \\ z & 0 & 0 & 0 \\ 0 & -z & 0 & 1-z \\ 0 & 0 & -z+2 & 0 \end{bmatrix}$$

and a calculation shows that
$$\det(\lambda I - A(z)^{\times}) = \lambda^4 + 2\lambda^2(z - 1) + z(z - 2)(2z - 1)$$

So the eigenvalues of $A(z)^{\times}$ are given by the formula

$$\lambda = \pm\Bigl(-(z - 1) \pm \sqrt{-2z^3 + 6z^2 - 4z + 1}\Bigr)^{1/2}$$

It is easily seen that $S_1^{\times} = \{0, 2, \tfrac{1}{2}, z_1, z_2, z_3\}$, where $z_1, z_2, z_3$ are the zeros of the polynomial $-2z^3 + 6z^2 - 4z + 1$, and $S_2^{\times}$ is empty. The first exceptional set of $W(\lambda, z)$ is $\{0, 2, \tfrac{1}{2}, z_1, z_2, z_3\}$, whereas the second exceptional set of $W(\lambda, z)$ consists of the single point $\{1\}$. □

20.3 MINIMAL FACTORIZATIONS OF RATIONAL MATRIX FUNCTIONS

Let $W(\lambda, z)$ be a rational $n \times n$ matrix function depending analytically on the parameter $z$ for $z \in \Omega$, as in the preceding section. Let

$$W(\lambda, z_0) = W_{10}(\lambda) \cdots W_{r0}(\lambda) \qquad (20.3.1)$$

be a minimal factorization of $W(\lambda, z_0)$ for some $z_0 \in \Omega$. Here $W_{10}(\lambda), \ldots, W_{r0}(\lambda)$ are $n \times n$ rational matrix functions with value $I$ at infinity. We study the problem of continuation of (20.3.1) to an analytic family of minimal factorizations. In case $z_0$ does not belong to the exceptional sets of $W(\lambda, z)$, such a continuation is always possible, as the following theorem shows.

Theorem 20.3.1

Let $W(\lambda, z)$ be a rational $n \times n$ matrix function that depends analytically on $z$ for $z \in \Omega$ and such that $W(\infty, z) = I$ for $z \in \Omega$. Assume that the denominator and numerator of each entry of $W(\lambda, z)$ are coprime for some $z_0 \in \Omega$ that is not a zero of the leading coefficient of the denominator. Assume, in addition, that the zeros and poles of $W(\lambda, z)$ are bounded in a neighbourhood of each point of $\Omega$. Let

$$S = \bigl\{z_0 \in \Omega \;\big|\; \delta(z_0) < \max_{z \in \Omega} \delta(z)\bigr\}$$

where $\delta(z)$ is the McMillan degree of $W(\lambda, z)$, and let $T_1$ and $T_2$ be the first and second exceptional sets of $W(\lambda, z)$, respectively. Consider a minimal factorization (20.3.1) with $z_0 \in \Omega \setminus (T_1 \cup T_2)$. Then there exist rational matrix functions $W_1(\lambda, z), \ldots, W_r(\lambda, z)$, the entries of which depend analytically on $z$ in $\Omega$ (with the possible exception of algebraic branch points in $T_1$ and of a discrete set $D \subseteq \Omega$ of poles), having the following properties: (a) $W_j(\infty, z) = I$ for $j = 1, \ldots, r$ and every $z \in \Omega \setminus D$; (b) the point $z_0$ does not
belong to $D$, and $W_j(\lambda, z_0) = W_{j0}(\lambda)$ for $j = 1, \ldots, r$; (c) $W(\lambda, z) = W_1(\lambda, z) \cdots W_r(\lambda, z)$ for every $z \in \Omega \setminus D$. Moreover, this factorization is minimal for every $z \in \Omega \setminus (D \cup S)$.

The set $D$ of poles of the $W_j(\lambda, z)$ in Theorem 20.3.1 generally depends on the factorization (20.3.1), and not only on the original function $W(\lambda, z)$. This is in contrast with the sets $T_1$ and $T_2$, which depend on $W(\lambda, z)$ only.

Proof. Let $A(z)$, $B(z)$, and $C(z)$ be as in Theorem 20.2.2, so that the realization (20.2.1) is minimal for all $z \in \Omega \setminus S$. Using Theorem 7.5.1, let

$$\mathbb{C}^m = \mathcal{L}_{10} \dotplus \cdots \dotplus \mathcal{L}_{r0}$$

be the direct sum decomposition corresponding to the minimal factorization (20.3.1), with respect to the minimal realization

$$W(\lambda, z_0) = I + C(z_0)(\lambda I - A(z_0))^{-1}B(z_0)$$

Thus, for $j = 1, \ldots, r-1$ the subspaces

$$\mathcal{M}_{j0} = \mathcal{L}_{10} \dotplus \cdots \dotplus \mathcal{L}_{j0}$$

are $A(z_0)$ invariant, whereas the subspaces

$$\mathcal{N}_{20} = \mathcal{L}_{20} \dotplus \cdots \dotplus \mathcal{L}_{r0}, \quad \ldots, \quad \mathcal{N}_{r-1,0} = \mathcal{L}_{r-1,0} \dotplus \mathcal{L}_{r0}, \quad \mathcal{N}_{r0} = \mathcal{L}_{r0}$$

are $A(z_0)^{\times}$ invariant. [Here, as usual, $A(z)^{\times} = A(z) - B(z)C(z)$.] Now, by Theorem 19.4.2, there exist families of subspaces $\mathcal{M}_j(z)$ for $j = 1, \ldots, r-1$ and $\mathcal{N}_j(z)$ for $j = 2, \ldots, r$ that are analytic on $\Omega$, except possibly for algebraic branch points in $T_1$, and have the following properties: (a) $\mathcal{M}_1(z) \subseteq \cdots \subseteq \mathcal{M}_{r-1}(z)$ for all $z \in \Omega$; (b) $\mathcal{N}_2(z) \supseteq \cdots \supseteq \mathcal{N}_r(z)$ for all $z \in \Omega$; (c) the $\mathcal{M}_j(z)$ are $A(z)$ invariant and the $\mathcal{N}_j(z)$ are $A(z)^{\times}$ invariant; (d) $\mathcal{M}_j(z_0) = \mathcal{M}_{j0}$, $j = 1, \ldots, r-1$, and $\mathcal{N}_j(z_0) = \mathcal{N}_{j0}$, $j = 2, \ldots, r$.

Let $m_j$ be the dimension of $\mathcal{L}_{j0}$, $j = 1, \ldots, r$ (so $m_1 + \cdots + m_r = m$). It follows from the proof of Theorem 19.4.2 that

$$\mathcal{M}_j(z) = \operatorname{Span}\{x_1^{(j)}(z), \ldots, x_{p_j}^{(j)}(z)\}, \qquad z \in \Omega, \quad j = 1, \ldots, r-1 \qquad (20.3.2)$$

$$\mathcal{N}_j(z) = \operatorname{Span}\{y_1^{(j)}(z), \ldots, y_{q_j}^{(j)}(z)\}, \qquad z \in \Omega, \quad j = 2, \ldots, r \qquad (20.3.3)$$

where for each $j$ the vector functions $x_1^{(j)}(z), \ldots, x_{p_j}^{(j)}(z)$, as well as $y_1^{(j)}(z), \ldots, y_{q_j}^{(j)}(z)$, are linearly independent for every $z \in \Omega$ and analytic on $\Omega$, except possibly for algebraic branch points in $T_1$. Here $p_j = m_1 +$
$\cdots + m_j$ is the dimension of $\mathcal{M}_j(z)$, and $q_j = m_j + m_{j+1} + \cdots + m_r$ is the dimension of $\mathcal{N}_j(z)$. Our next observation is that

$$\mathcal{M}_j(z) \dotplus \mathcal{N}_{j+1}(z) = \mathbb{C}^m, \qquad z \in \Omega \setminus D_j, \quad j = 1, \ldots, r-1 \qquad (20.3.4)$$

where $D_j$ is a discrete set in $\Omega$. [Note that the sum in (20.3.4) is direct.] Indeed, by (20.3.2) and (20.3.3) we have

$$\dim\bigl(\mathcal{M}_j(z) + \mathcal{N}_{j+1}(z)\bigr) = \operatorname{rank} F_j(z), \qquad z \in \Omega$$

where

$$F_j(z) = \bigl[\,x_1^{(j)}(z) \cdots x_{p_j}^{(j)}(z)\;\; y_1^{(j+1)}(z) \cdots y_{q_{j+1}}^{(j+1)}(z)\,\bigr] \qquad (20.3.5)$$

is a matrix function of size $m \times (p_j + q_{j+1}) = m \times m$. It remains to observe that $\det F_j(z)$ is not identically zero [because $\det F_j(z_0) \neq 0$] and that (20.3.4) holds with $D_j$ the set of zeros of $\det F_j(z)$.

Let $D = D_1 \cup \cdots \cup D_{r-1}$. In particular, we have

$$\mathcal{M}_j(z) + \mathcal{N}_j(z) = \mathbb{C}^m, \qquad z \in \Omega \setminus D, \quad j = 2, \ldots, r-1 \qquad (20.3.6)$$

Note also that $z_0$ does not belong to $D$. Consider the subspaces $\mathcal{L}_j(z) = \mathcal{M}_j(z) \cap \mathcal{N}_j(z)$ for $j = 2, \ldots, r-1$. First, it is clear that

$$\mathcal{L}_j(z_0) = \mathcal{L}_{j0}, \qquad j = 2, \ldots, r-1$$

Second,

$$\mathcal{L}_1(z) \dotplus \cdots \dotplus \mathcal{L}_r(z) = \mathbb{C}^m, \qquad z \in \Omega \setminus D \qquad (20.3.7)$$

where we put $\mathcal{L}_1(z) = \mathcal{M}_1(z)$ and $\mathcal{L}_r(z) = \mathcal{N}_r(z)$. Indeed, it is sufficient to verify that

$$\mathcal{M}_{j-1}(z) \dotplus \mathcal{L}_j(z) = \mathcal{M}_j(z), \qquad z \in \Omega \setminus D, \quad j = 2, \ldots, r \qquad (20.3.8)$$

[By definition, $\mathcal{M}_r(z) = \mathbb{C}^m$.] The inclusion $\subseteq$ in (20.3.8) is evident from the definition of $\mathcal{L}_j(z)$. Further, for $z \in \Omega \setminus D$ we have

$$\mathcal{M}_{j-1}(z) \cap \mathcal{L}_j(z) \subseteq \mathcal{M}_{j-1}(z) \cap \mathcal{N}_j(z) = \{0\}$$

in view of (20.3.4). Now, using (20.3.6), we have, for $z \in \Omega \setminus D$,
$$\dim \mathcal{L}_j(z) = \dim \mathcal{M}_j(z) + \dim \mathcal{N}_j(z) - m = m_j$$

so

$$\dim \mathcal{M}_{j-1}(z) + \dim \mathcal{L}_j(z) = p_{j-1} + m_j = p_j = \dim \mathcal{M}_j(z)$$

and (20.3.8) follows. By Theorem 7.5.1, for $z \in \Omega \setminus (D \cup S)$ there exists a minimal factorization

$$W(\lambda, z) = W_1(\lambda, z) \cdots W_r(\lambda, z) \qquad (20.3.9)$$

which corresponds to the direct sum decomposition (20.3.7), with respect to the minimal realization

$$W(\lambda, z) = I + C(z)(\lambda I - A(z))^{-1}B(z)$$

If we show that each projector $\pi_j(z)$ on $\mathcal{L}_j(z)$ along $\mathcal{L}_1(z) \dotplus \cdots \dotplus \mathcal{L}_{j-1}(z) \dotplus \mathcal{L}_{j+1}(z) \dotplus \cdots \dotplus \mathcal{L}_r(z)$ is analytic in $\Omega$, except possibly for algebraic branch points in $T_1$ and poles in $D$, then formula (7.5.5) shows that the $W_j(\lambda, z)$ have all the properties required in Theorem 20.3.1. [Note that, by continuity, the factorization (20.3.9) holds also for $z \in S \setminus D$, but it is not minimal at these points.]

To verify these properties of $\pi_j(z)$, introduce, for $z \in \Omega \setminus D$, the projector $Q_j(z)$ on $\mathcal{M}_j(z)$ along $\mathcal{N}_{j+1}(z)$, $j = 1, \ldots, r-1$. Define also $Q_0(z) = 0$ and $Q_r(z) = I$. One checks easily that for $j = 1, \ldots, r$

$$Q_j(z)Q_{j-1}(z) = Q_{j-1}(z)Q_j(z), \qquad z \in \Omega \setminus D \qquad (20.3.10)$$

[Indeed, both sides of (20.3.10) take the value 0 on vectors from $\mathcal{N}_{j+1}(z)$ and from $\mathcal{L}_j(z)$, and take the value $x$ on each vector $x$ from $\mathcal{M}_{j-1}(z)$.] Therefore, $(I - Q_{j-1}(z))Q_j(z)$ is a projector, and it coincides with $\pi_j(z)$ for $j = 1, \ldots, r$. But

$$Q_j(z) = F_j(z)\begin{bmatrix} I_{p_j} & 0 \\ 0 & 0 \end{bmatrix}\bigl(F_j(z)\bigr)^{-1}, \qquad j = 1, \ldots, r-1$$

where $F_j(z)$ is given by (20.3.5); so $Q_j(z)$ is analytic on $\Omega$, except possibly for algebraic branch points in $T_1$ and poles in $D$. Hence $\pi_j(z)$ also enjoys these properties. □

Consider now an important case of analytic continuation of minimal factorizations that can be achieved even when $z_0 \in T_1 \cup T_2$.

Theorem 20.3.2

Let $W(\lambda, z)$ be as in Theorem 20.3.1, and let
$$W(\lambda, z_0) = W_{10}(\lambda) \cdots W_{r0}(\lambda)$$

be a minimal factorization of $W(\lambda, z_0)$, where $z_0 \in \Omega$. [As usual, the $W_{j0}(\lambda)$ are rational matrix functions with value $I$ at infinity.] Assume that $W_{j0}(\lambda)$ and $W_{k0}(\lambda)$ have no common zeros and no common poles whenever $j \neq k$. Then there exist rational matrix functions $W_j(\lambda, z)$, $j = 1, \ldots, r$, with the properties described in Theorem 20.3.1 that, in addition, are analytic in a neighbourhood of $z_0$.

The proof is obtained in the same way as the proof of Theorem 20.3.1, using Theorem 19.4.3 in place of Theorem 19.4.2.

To conclude this section we discuss minimal factorizations (20.3.1) that cannot be continued analytically (as in Theorem 20.3.1). We say that the minimal factorization (20.3.1) is sequentially nonisolated if there is a sequence of points $\{z_m\}_{m=1}^{\infty}$ in $\Omega \setminus \{z_0\}$ such that $z_m \to z_0$, and sequences of rational matrix functions $\{W_{jm}(\lambda)\}_{m=1}^{\infty}$, $j = 1, \ldots, r$, with value $I$ at infinity, such that

$$W(\lambda, z_m) = W_{1m}(\lambda) \cdots W_{rm}(\lambda)$$

is a minimal factorization of $W(\lambda, z_m)$, $m = 1, 2, \ldots$, and for $j = 1, \ldots, r$

$$\lim_{m \to \infty} W_{jm}(\lambda) = W_{j0}(\lambda) \qquad (20.3.11)$$

Equation (20.3.11) is understood in the sense that for each pair of indices $k, l$ ($1 \leq k, l \leq n$) the $(k, l)$ entry of $W_{jm}(\lambda)$ has the form

$$\frac{\sum_{p=0}^{u} \alpha_{pm}\lambda^p}{\sum_{q=0}^{v} \beta_{qm}\lambda^q}$$

where $\alpha_{pm}$ and $\beta_{qm}$ are complex numbers (depending, of course, on $j$, $k$, and $l$) such that $\lim_{m \to \infty} \alpha_{pm} = \alpha_p$ and $\lim_{m \to \infty} \beta_{qm} = \beta_q$ ($p = 0, \ldots, u$; $q = 0, \ldots, v$), and the $(k, l)$ entry of $W_{j0}(\lambda)$ is

$$\frac{\sum_{p=0}^{u} \alpha_p\lambda^p}{\sum_{q=0}^{v} \beta_q\lambda^q}$$

Clearly, if an analytic continuation (as in Theorem 20.3.1) of the minimal factorization (20.3.1) exists, then this factorization is sequentially nonisolated. In particular, Theorem 20.3.1 shows that every minimal factorization (20.3.1) with $z_0 \in \Omega \setminus (T_1 \cup T_2)$ is sequentially nonisolated. Also, Theorem 20.3.2 shows that a minimal factorization (20.3.1) is sequentially nonisolated provided $W_{j0}(\lambda)$ and $W_{k0}(\lambda)$ have no common zeros and no common poles for $j \neq k$.

It turns out that not every minimal factorization of $W(\lambda, z_0)$ ($z_0 \in \Omega$) can
be continued analytically; indeed, we exhibit next a sequentially isolated minimal factorization.

EXAMPLE 20.3.1. Let

$$W(\lambda, z) = \begin{bmatrix} 1 + (\lambda - z)^{-1} & 0 \\ 0 & 1 + \lambda^{-1} \end{bmatrix}$$

and consider the minimal factorization of $W(\lambda, 0)$:

$$W(\lambda, 0) = \begin{bmatrix} 1 + \lambda^{-1} & 0 \\ 0 & 1 + \lambda^{-1} \end{bmatrix} = \begin{bmatrix} 1 & \lambda^{-1} \\ 0 & 1 + \lambda^{-1} \end{bmatrix}\begin{bmatrix} 1 + \lambda^{-1} & -\lambda^{-1} \\ 0 & 1 \end{bmatrix} \qquad (20.3.12)$$

We verify that this factorization is sequentially isolated. To this end we find all minimal factorizations of $W(\lambda, z)$, where $z \neq 0$. A minimal realization of $W(\lambda, z)$ is easily found:

$$W(\lambda, z) = I + I\left(\lambda I - \begin{bmatrix} z & 0 \\ 0 & 0 \end{bmatrix}\right)^{-1} I$$

In the notation of Theorem 20.2.2, we have

$$A(z) = \begin{bmatrix} z & 0 \\ 0 & 0 \end{bmatrix}, \qquad B(z) = C(z) = I, \qquad A(z)^{\times} = \begin{bmatrix} z - 1 & 0 \\ 0 & -1 \end{bmatrix}$$

Theorem 7.5.1 shows that all nontrivial minimal factorizations of $W(\lambda, z)$ ($z \neq 0$) are given by the formulas

$$W(\lambda, z) = \begin{bmatrix} 1 + (\lambda - z)^{-1} & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 + \lambda^{-1} \end{bmatrix}, \qquad W(\lambda, z) = \begin{bmatrix} 1 & 0 \\ 0 & 1 + \lambda^{-1} \end{bmatrix}\begin{bmatrix} 1 + (\lambda - z)^{-1} & 0 \\ 0 & 1 \end{bmatrix}$$

So the minimal factorization (20.3.12) is indeed sequentially isolated. □

20.4 MATRIX QUADRATIC EQUATIONS

Consider the matrix quadratic equation

$$XBX + XA - DX - C = 0$$

where $A$, $B$, $C$, $D$ are known matrices of sizes $n \times n$, $n \times m$, $m \times n$, and $m \times m$, respectively, and $X$ is an $m \times n$ matrix to be found. We assume that
$A = A(z)$, $B = B(z)$, $C = C(z)$, and $D = D(z)$ are analytic functions of $z$ on $\Omega$, where $\Omega$ is a domain in the complex plane. The analytic properties of the solutions $X$ as functions of $z$ are studied. Let

$$T(z) = \begin{bmatrix} A(z) & B(z) \\ C(z) & D(z) \end{bmatrix}, \qquad z \in \Omega$$

be the $(m + n) \times (m + n)$ analytic matrix function, and let $S_1$ and $S_2$ be the first and second exceptional sets of $T(z)$, as defined in Section 19.2. We have the following main result.

Theorem 20.4.1

For every $z_0 \in \Omega \setminus (S_1 \cup S_2)$ and every solution $X_0$ of

$$XB(z_0)X + XA(z_0) - D(z_0)X - C(z_0) = 0 \qquad (20.4.1)$$

there exists an $m \times n$ matrix function $X(z)$ that is analytic on $\Omega$, except possibly for algebraic branch points in $S_1$ and a discrete set of poles in $\Omega$, such that $X(z_0) = X_0$ and

$$X(z)B(z)X(z) + X(z)A(z) - D(z)X(z) - C(z) = 0 \qquad (20.4.2)$$

for every $z \in \Omega$ that is not a pole of $X(z)$. [The case when a point $z_0 \in S_1$ is also a pole of $X(z)$ is not excluded.]

Proof. By Proposition 17.8.1, the subspace

$$\mathcal{M}_0 = \operatorname{Im}\begin{bmatrix} I \\ X_0 \end{bmatrix} \subseteq \mathbb{C}^{n+m}$$

is $T(z_0)$ invariant. By Theorem 19.4.1, there is a family of subspaces $\mathcal{M}(z)$ that is analytic on $\Omega$, except possibly for algebraic branch points in $S_1$, for which $\mathcal{M}(z_0) = \mathcal{M}_0$ and for which $\mathcal{M}(z)$ is $T(z)$ invariant for all $z \in \Omega$. By Theorem 18.3.2 there exists an $(m + n) \times n$ analytic matrix function $S(z)$ on $\Omega$ with linearly independent columns such that $\mathcal{M}(z) = \operatorname{Im} S(z)$ for all $z \in \Omega$. Write

$$S(z) = \begin{bmatrix} S_1(z) \\ S_2(z) \end{bmatrix}$$

where $S_1(z)$ is of size $n \times n$ and $S_2(z)$ is of size $m \times n$, and observe that $\det S_1(z) \not\equiv 0$ [we may normalize $S(z)$ so that $S_1(z_0) = I$]. Now, by the same Proposition 17.8.1,
$$X(z) = S_2(z)S_1(z)^{-1}$$

is the desired solution of (20.4.2). □

Consider an example.

EXAMPLE 20.4.1. Let $C(z)$ be an $n \times n$ analytic matrix function on $\Omega$ with $\det C(z) \not\equiv 0$, and assume that the eigenvalues of $C(z)$ are analytic functions. [This is the case if, for instance, $C(z)$ has an upper triangular form.] Assume, in addition, that $C(z)$ has $n$ distinct eigenvalues for every $z \in \Omega$. Consider the equation

$$X^2 = C(z) \qquad (20.4.3)$$

Here

$$T(z) = \begin{bmatrix} 0 & I \\ C(z) & 0 \end{bmatrix}$$

and it is easily seen that $\det(\lambda I - T(z)) = \det(\lambda^2 I - C(z))$. So $\lambda_0$ is an eigenvalue of $T(z_0)$ if and only if $\lambda_0^2$ is an eigenvalue of $C(z_0)$. It follows that the first exceptional set of $T(z)$ is contained in the set $S = \{z \in \Omega \mid \det C(z) = 0\}$. As the matrix $T(z)$ has $2n$ distinct eigenvalues for every $z \in \Omega \setminus S$, it follows that the second exceptional set of $T(z)$ is also contained in $S$. By Theorem 20.4.1, every solution $X_0$ of (20.4.3) with $z = z_0 \in \Omega \setminus S$ can be extended to a family of solutions $X(z)$ of (20.4.3) that is meromorphic on $\Omega$, except possibly for algebraic branch points in $S$. □

In addition, let us indicate a case when an analytic extension of a solution of (20.4.1) is always possible.

Theorem 20.4.2

Let $X_0$ be a solution of (20.4.1), where $z_0 \in \Omega$, and assume that the $T(z_0)$-invariant subspace $\operatorname{Im}\begin{bmatrix} I \\ X_0 \end{bmatrix}$ is spectral. Then there exists an $m \times n$ matrix function $X(z)$ with the properties described in Theorem 20.4.1 that, in addition, is analytic in a neighbourhood of $z_0$.

The proof of Theorem 20.4.2 is obtained in the same way as the proof of Theorem 20.4.1, but using Theorem 19.4.3 in place of Theorem 19.4.1. In connection with Theorem 20.4.2, note the following fact. Assume that $m = n$. If $X_1$ and $X_2$ are solutions of (20.4.1) such that

$$\sigma\bigl(A(z_0) + B(z_0)X_1\bigr) \cap \sigma\bigl(A(z_0) + B(z_0)X_2\bigr) = \emptyset$$

then both $T(z_0)$-invariant subspaces
$$\mathcal{M}_i = \operatorname{Im}\begin{bmatrix} I \\ X_i \end{bmatrix}, \qquad i = 1, 2$$

are spectral. Indeed,

$$T(z_0)\begin{bmatrix} I \\ X_i \end{bmatrix} = \begin{bmatrix} I \\ X_i \end{bmatrix}\bigl(A(z_0) + B(z_0)X_i\bigr), \qquad i = 1, 2$$

so $\sigma(T(z_0)|_{\mathcal{M}_1}) \cap \sigma(T(z_0)|_{\mathcal{M}_2}) = \emptyset$. In particular, $\mathcal{M}_1 \cap \mathcal{M}_2 = \{0\}$. As $\dim \mathcal{M}_1 = \dim \mathcal{M}_2 = n$, it follows that $\mathcal{M}_1 \dotplus \mathcal{M}_2 = \mathbb{C}^{2n}$. (Here we use the assumption that $m = n$.) Hence both $\mathcal{M}_1$ and $\mathcal{M}_2$ are spectral.

The following example shows that not every solution of the equation

$$XB(z_0)X + XA(z_0) - D(z_0)X - C(z_0) = 0, \qquad z_0 \in \Omega$$

can be continued analytically as in Theorem 20.4.1. (Of course, it is then necessary that $z_0 \in S_1 \cup S_2$.)

EXAMPLE 20.4.2. Consider the scalar equation

$$zx^2 = 0 \qquad (20.4.4)$$

The solution $x = 1$ of (20.4.4) with $z_0 = 0$ cannot be continued analytically. □

20.5 EXERCISES

20.1 Let

$$L(\lambda, z) = \begin{bmatrix} \lambda^2 & -\lambda z \\ \;\cdot\; & \;\cdot\; \end{bmatrix}$$

Find the analytic continuation (as in Theorem 20.1.1) of the factorization

$$L(\lambda, z_0) = \left(I\lambda - \begin{bmatrix} \cdot & \cdot \\ \cdot & \cdot \end{bmatrix}\right)\left(I\lambda - \begin{bmatrix} \cdot & \cdot \\ \cdot & \cdot \end{bmatrix}\right)$$

What are the poles of this analytic continuation?

20.2 Let $L(\lambda, z)$ be a monic $n \times n$ matrix polynomial of degree $l$ whose coefficients are analytic on $\Omega$, and assume that for every $z \in \Omega$, $\det L(\lambda, z)$ has $nl$ distinct zeros. Prove that for every factorization $L(\lambda, z_0) = L_1(\lambda) \cdots L_r(\lambda)$, where $z_0 \in \Omega$ and the $L_j(\lambda)$ are monic matrix polynomials, there exist monic matrix polynomials $L_1(\lambda, z), \ldots, L_r(\lambda, z)$ whose coefficients are analytic on $\Omega$ and such that $L_j(\lambda, z_0) = L_j(\lambda)$ for $j = 1, \ldots, r$.
20.3 Show that if the polynomial $L(\lambda, z)$ of Theorem 20.1.1 is scalar, then the analytic continuations of the $L_j(\lambda)$ do not have poles in $\Omega$ (i.e., $S = \emptyset$ in the notation of Theorem 20.1.1).

20.4 Let $L(\lambda, z)$ be a monic matrix polynomial whose coefficients are circulant matrices analytic on $\Omega$. Prove that the analytic continuation of every factorization $L(\lambda, z_0) = L_1(\lambda) \cdots L_r(\lambda)$, where $z_0 \in \Omega$ (as in Theorem 20.1.1), has no poles in $\Omega$.

20.5 Prove that every factorization of a monic scalar polynomial $L(\lambda, z)$ with coefficients depending analytically on $z \in \Omega$ is sequentially nonisolated. (Hint: Use Exercise 19.9.)

20.6 Find the first and second exceptional sets for the following rational matrix functions depending analytically on a parameter $z \in \mathbb{C}$:

(a) $W(\lambda, z) = 1 + \dfrac{\;\cdot\;}{\;\cdot\;} + \dfrac{\;\cdot\;}{\;\cdot\;}$

(b) $W(\lambda, z) = 1 + \dfrac{\lambda^2 - z^2}{\;\cdot\;} + \dfrac{\lambda + 1}{1 + \lambda^2 + z^2}$

20.7 Let $W(\lambda, z)$ be as in Exercise 20.6(a). Find the analytic continuations (as in Theorem 20.3.1) of all minimal factorizations of the rational matrix function $W(\lambda, z)$.

20.8 Let $W(\lambda, z)$ be a rational matrix function that satisfies the hypotheses of Theorem 20.3.1. Assume that for some $z_0 \in \Omega$, $W(\lambda, z_0)$ has $\delta$ distinct zeros and $\delta$ distinct poles, where $\delta$ is the maximum of the McMillan degrees of $W(\lambda, z)$ for $z \in \Omega$. Prove that every minimal factorization

$$W(\lambda, z_0) = W_1(\lambda) \cdots W_r(\lambda)$$

admits an analytic continuation into a neighbourhood of $z_0$; that is, there exist rational matrix functions $W_1(\lambda, z), \ldots, W_r(\lambda, z)$ that are analytic in $z$ on a neighbourhood $\mathcal{U}$ of $z_0$ such that

$$W(\lambda, z) = W_1(\lambda, z) \cdots W_r(\lambda, z)$$

is a minimal factorization for every $z \in \mathcal{U}$, and $W_j(\lambda, z_0) = W_j(\lambda)$ for $j = 1, \ldots, r$.

20.9 Let

$$XB(z)X + XA(z) - D(z)X - C(z) = 0 \qquad (1)$$
be a matrix equation, where $A(z)$, $B(z)$, $C(z)$, and $D(z)$ are analytic matrix functions (of appropriate sizes) on a domain $\Omega \subseteq \mathbb{C}$. Assume that all eigenvalues of the matrix

$$T(z) = \begin{bmatrix} A(z) & B(z) \\ C(z) & D(z) \end{bmatrix}$$

are distinct, for every $z \in \Omega$. Prove that, given a solution $X_0$ of (1) with $z = z_0 \in \Omega$, there exists an analytic matrix function $X(z)$ on $\Omega$ such that $X(z)$ is a solution of (1) for every $z \in \Omega$ and $X(z_0) = X_0$.

20.10 We say that a solution $X_0$ of (1) with $z = z_0 \in \Omega$ is sequentially nonisolated if there exist a sequence $\{z_m\}_{m=1}^{\infty}$ such that $z_m \to z_0$ as $m \to \infty$ and $z_m \neq z_0$ for $m = 1, 2, \ldots$, and a sequence $\{X_m\}_{m=1}^{\infty}$ such that

$$X_m B(z_m)X_m + X_m A(z_m) - D(z_m)X_m - C(z_m) = 0$$

for $m = 1, 2, \ldots$, which satisfies

$$\lim_{m \to \infty} X_m = X_0$$

Prove that if the matrix

$$\begin{bmatrix} A(z_0) & B(z_0) \\ C(z_0) & D(z_0) \end{bmatrix}$$

is nonderogatory, then every solution of (1) with $z = z_0$ is sequentially nonisolated.

20.11 Give an example of a solution of (1) that is sequentially isolated.
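In connection with Section 20.4 and Exercises 20.9-20.11, the invariant-subspace construction behind Theorem 20.4.1 is also how solutions are computed in practice: one picks an $n$-dimensional $T(z)$-invariant subspace with basis matrix $\begin{bmatrix} S_1 \\ S_2 \end{bmatrix}$ and sets $X = S_2 S_1^{-1}$. The Python sketch below is our own illustration for the equation $X^2 = C(z)$ of Example 20.4.1 (the function name and the sample matrix $C$ are assumptions of ours); the invariant subspace is spanned by eigenvectors of $T(z)$ whose eigenvalues have the largest real parts, which yields the principal square root.

```python
import numpy as np

def sqrt_via_invariant_subspace(C):
    """Solve X @ X = C through an invariant subspace of T = [[0, I], [C, 0]]."""
    n = C.shape[0]
    T = np.block([[np.zeros((n, n)), np.eye(n)], [C, np.zeros((n, n))]])
    lam, V = np.linalg.eig(T)
    idx = np.argsort(-lam.real)[:n]       # n eigenvalues with largest real part
    S = V[:, idx]
    S1, S2 = S[:n, :], S[n:, :]
    return S2 @ np.linalg.inv(S1)         # X = S2 S1^{-1}, as in Theorem 20.4.1

z = 0.7
C = np.array([[1.0 + z, 2.0], [0.0, 4.0]], dtype=complex)   # distinct nonzero eigenvalues
X = sqrt_via_invariant_subspace(C)
print("||X @ X - C|| =", np.linalg.norm(X @ X - C))
```

Selecting a different set of $n$ eigenvectors (one from each $\pm$ pair) yields the other square roots, which corresponds to choosing a different $T(z)$-invariant subspace in Theorem 20.4.1.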
Notes to Part 4

Chapter 18. This chapter is an introduction to the basic facts on analytic families of subspaces. The main result is Theorem 18.3.1, which connects the local and global properties of an analytic family of subspaces. This result (in a more general framework) appeared first in the theory of analytic fibre bundles [Grauert (1958), Allan (1967), Shubin (1979)]. Here we follow Gohberg and Leiterer (1972, 1973) in the proof of this theorem. The result of Theorem 18.2.1 goes back to Shmuljan (1957) [see also Gohberg and Rodman (1981)]. The proof of Theorem 18.2.1 presented here is from the authors' book (1982). The results of Section 18.6 seem to be new. In the case of a function $\lambda I - A$, where $A$ is a bounded linear operator acting in an infinite-dimensional Banach space, the result of Theorem 18.6.2 was proved in Saphar (1965).

Chapter 19. The starting point for the material in this chapter (Theorem 19.1.1) is taken from the book by Baumgärtel (1985). Theorem 19.5.1 was proved in Porsching (1968). The analytic extendability problem for invariant subspaces is probably treated here for the first time.

Chapter 20. We consider in this chapter some of the applications dealt with in Chapters 5, 7, and 17, but in the new circumstances when the matrices involved depend analytically on a complex parameter. All the results (except those in Section 20.1) seem to be new. In Section 20.1 we adapt and generalize the results developed in Chapter 5 of Gohberg, Lancaster, and Rodman (1982). Example 20.1.1 is Example 20.5.4 of the authors' book (1982).
Appendix

Equivalence of Matrix Polynomials

To make this work more self-contained, we present in this appendix the basic facts about equivalence of matrix polynomials that are used in the main body of the book. Two concepts of equivalence are discussed. For the first of these, two matrix polynomials $A(\lambda)$ and $B(\lambda)$ are said to be equivalent if one is obtained from the other by premultiplication and postmultiplication by square matrix polynomials having constant nonzero determinants. Elementary divisors (or, alternatively, invariant polynomials) form the full set of invariants for this concept of equivalence, and the Smith form (which is diagonal) is the canonical form. This equivalence is studied in detail in Sections A.1-A.4.

The second concept of equivalence is the strict equivalence of linear matrix polynomials $A + \lambda B$ and $A_1 + \lambda B_1$. This means that $P(A + \lambda B)Q = A_1 + \lambda B_1$ for some invertible matrices $P$ and $Q$. For strict equivalence the full set of invariants comprises the minimal column indices, the minimal row indices, the elementary divisors, and the elementary divisors at infinity. The Kronecker form (which is block diagonal) is the canonical form. A thorough treatment of strict equivalence is presented in Sections A.5-A.7. The canonical form for equivalence of matrix polynomials is a natural prerequisite for this presentation.

A.1 THE SMITH FORM: EXISTENCE

In this and subsequent sections we consider matrix polynomials $A(\lambda) = \sum_{j=0}^{s} A_j\lambda^j$, where the $A_j$ are $m \times n$ matrices whose entries are complex numbers (so we admit the case of rectangular matrices $A_j$). Of course, the sizes of all the $A_j$ must be the same. Two $m \times n$ matrix polynomials $A(\lambda)$ and $B(\lambda)$ are said to be equivalent if
$$A(\lambda) = E(\lambda)B(\lambda)F(\lambda) \quad \text{for all } \lambda \in \mathbb{C} \qquad (A.1.1)$$

for some matrix polynomials $E(\lambda)$ and $F(\lambda)$ of sizes $m \times m$ and $n \times n$, respectively, with constant nonzero determinants (i.e., independent of $\lambda$). We use the symbol $\sim$, writing $A(\lambda) \sim B(\lambda)$, to mean that $A(\lambda)$ and $B(\lambda)$ are equivalent. It is easy to see that $\sim$ is an equivalence relation, that is: (a) $A(\lambda) \sim A(\lambda)$ for every matrix polynomial $A(\lambda)$; (b) $A(\lambda) \sim B(\lambda)$ implies $B(\lambda) \sim A(\lambda)$; (c) $A(\lambda) \sim B(\lambda)$ and $B(\lambda) \sim C(\lambda)$ imply $A(\lambda) \sim C(\lambda)$.

Indeed, if $B(\lambda) = A(\lambda)$, then (A.1.1) holds with $E(\lambda) = I_m$, $F(\lambda) = I_n$. Further, assume that (A.1.1) holds for matrix polynomials $A(\lambda)$ and $B(\lambda)$. As $\det E(\lambda) = \text{const} \neq 0$, the formula for the inverse matrix in terms of cofactors implies that $E(\lambda)^{-1}$ is a matrix polynomial as well, and since $\det E(\lambda)\cdot\det E(\lambda)^{-1} = 1$, it follows that $\det E(\lambda)^{-1}$ is also a nonzero constant. Similarly, $F(\lambda)^{-1}$ is a matrix polynomial for which $\det F(\lambda)^{-1}$ is a nonzero constant. Now we have

$$B(\lambda) = E(\lambda)^{-1}A(\lambda)F(\lambda)^{-1}$$

which means that $B(\lambda) \sim A(\lambda)$. Finally, let us check (c). We have

$$A(\lambda) = E_1(\lambda)B(\lambda)F_1(\lambda), \qquad B(\lambda) = E_2(\lambda)C(\lambda)F_2(\lambda)$$

where $E_1(\lambda)$, $E_2(\lambda)$, $F_1(\lambda)$, $F_2(\lambda)$ have constant nonzero determinants. Then $A(\lambda) = E(\lambda)C(\lambda)F(\lambda)$ with $E(\lambda) = E_1(\lambda)E_2(\lambda)$ and $F(\lambda) = F_2(\lambda)F_1(\lambda)$. So $A(\lambda) \sim C(\lambda)$.

The central result on equivalence of matrix polynomials is the Smith form, which describes the simplest matrix polynomial in each equivalence class, as follows.

Theorem A.1.1

Every $m \times n$ matrix polynomial $A(\lambda)$ is equivalent to a unique $m \times n$ matrix polynomial of the form

$$D(\lambda) = \operatorname{diag}[d_1(\lambda), \ldots, d_r(\lambda), 0, \ldots, 0] \qquad (A.1.2)$$

a diagonal polynomial matrix with monic scalar polynomials $d_i(\lambda)$ such that $d_i(\lambda)$ is divisible by $d_{i-1}(\lambda)$ for $i = 2, \ldots, r$.

In other words, for every matrix polynomial $A(\lambda)$ there exist matrix polynomials $E(\lambda)$ and $F(\lambda)$ with constant nonzero determinants such that
$$E(\lambda)A(\lambda)F(\lambda) = D(\lambda) \qquad (A.1.3)$$

has the form (A.1.2), and this form is uniquely determined by $A(\lambda)$.

The matrix polynomial $D(\lambda)$ of (A.1.2) is called the Smith form of $A(\lambda)$ and plays an important role in the analysis of matrix polynomials. Note that $E(\lambda)$ and $F(\lambda)$ in (A.1.3) are not unique in general. Note also that the zeros on the main diagonal of $D(\lambda)$ are absent in case $A(\lambda_0)$ has full rank for some $\lambda_0 \in \mathbb{C}$. [In particular, this happens if $A(\lambda)$ is an $n \times n$ matrix polynomial with leading coefficient $I$.]

Proof of Theorem A.1.1 (First Part). Here we prove the existence of a $D(\lambda)$ of the form (A.1.2) that is equivalent to a given $A(\lambda)$. We use the following elementary transformations of a matrix polynomial $A(\lambda)$ of size $m \times n$: (a) interchange two rows; (b) add to some row another row multiplied by a scalar polynomial; (c) multiply a row by a nonzero complex number; together with the three corresponding operations on columns. Note that each of these transformations is equivalent to multiplication of $A(\lambda)$ by an invertible matrix, as follows. Interchange of rows (columns) $i$ and $j$ in $A(\lambda)$ is equivalent to multiplication on the left (right) by the matrix obtained from the identity matrix by interchanging its $i$th and $j$th rows:

$$P_{ij} \qquad (A.1.4)$$

Adding to the $i$th row of $A(\lambda)$ the $j$th row multiplied by the polynomial $f(\lambda)$ is equivalent to multiplication on the left by the matrix obtained from the identity by placing $f(\lambda)$ in the $(i, j)$ entry:

$$I + f(\lambda)e_{ij} \qquad (A.1.5)$$

the same operation for columns is equivalent to multiplication on the right by the matrix obtained from the identity by placing $f(\lambda)$ in the $(j, i)$ entry:

$$I + f(\lambda)e_{ji} \qquad (A.1.6)$$

Finally, multiplication of the $i$th row (column) of $A(\lambda)$ by a number $\alpha \neq 0$ is equivalent to multiplication on the left (right) by the matrix obtained from the identity by replacing the $(i, i)$ entry with $\alpha$:

$$I + (\alpha - 1)e_{ii} \qquad (A.1.7)$$

[Here $e_{ij}$ denotes the matrix with 1 in the $(i, j)$ entry and zeros elsewhere.] Matrices of the form (A.1.4)-(A.1.7) are called elementary. It is apparent that the determinant of any elementary matrix is a nonzero constant. Consequently, it is sufficient to prove that, by applying a sequence of elementary transformations, every matrix polynomial $A(\lambda)$ can be reduced to a diagonal form $\operatorname{diag}[d_1(\lambda), \ldots, d_r(\lambda), 0, \ldots, 0]$, where $d_1(\lambda), \ldots, d_r(\lambda)$ are scalar polynomials such that the quotients $d_i(\lambda)/d_{i-1}(\lambda)$, $i = 2, \ldots, r$, are also scalar polynomials.

We prove this statement by induction on $m$ and $n$. For $m = n = 1$ it is evident. Consider now the case $m = 1$, $n > 1$; that is,

$$A(\lambda) = [\,a_1(\lambda)\;\; a_2(\lambda)\;\; \cdots\;\; a_n(\lambda)\,]$$

If all the $a_j(\lambda)$ are zeros, there is nothing to prove. Suppose that not all the $a_j(\lambda)$ are zeros, and let $a_{j_0}(\lambda)$ be a polynomial of minimal degree among the nonzero entries of $A(\lambda)$. We may suppose that $j_0 = 1$. [Otherwise, interchange columns in $A(\lambda)$.] By elementary transformations it is possible to
The same operation for columns [adding to the $j$th column the $i$th column multiplied by $f(\lambda)$] is equivalent to multiplication on the right by the matrix that differs from the identity only by the entry $f(\lambda)$ in the $(i, j)$ position:
\[
\begin{bmatrix}
1 & f(\lambda) & & \\
 & \ddots & & \\
 & & \ddots & \\
 & & & 1
\end{bmatrix}
\tag{A.1.6}
\]
Finally, multiplication of the $i$th row (column) of $A(\lambda)$ by a number $\alpha \ne 0$ is equivalent to multiplication on the left (right) by
\[
\begin{bmatrix}
1 & & & & \\
 & \ddots & & & \\
 & & \alpha & & \\
 & & & \ddots & \\
 & & & & 1
\end{bmatrix}
\tag{A.1.7}
\]
[Empty spaces in (A.1.4)-(A.1.7) are assumed to be zeros.] Matrices of the form (A.1.4)-(A.1.7) are called elementary. It is apparent that the determinant of any elementary matrix is a nonzero constant. Consequently, it is sufficient to prove that, by applying a sequence of elementary transformations, every matrix polynomial $A(\lambda)$ can be reduced to a diagonal form $\operatorname{diag}[d_1(\lambda), \ldots, d_r(\lambda), 0, \ldots, 0]$, where $d_1(\lambda), \ldots, d_r(\lambda)$ are scalar polynomials such that the quotients $d_i(\lambda)/d_{i-1}(\lambda)$, $i = 2, \ldots, r$, are also scalar polynomials.

We prove this statement by induction on $m$ and $n$. For $m = n = 1$ it is evident. Consider now the case $m = 1$, $n > 1$; that is,
\[
A(\lambda) = [\,a_1(\lambda)\;\; a_2(\lambda)\;\; \cdots\;\; a_n(\lambda)\,]
\]
If all the $a_j(\lambda)$ are zeros, there is nothing to prove. Suppose that not all the $a_j(\lambda)$ are zeros, and let $a_{j_0}(\lambda)$ be a polynomial of minimal degree among the nonzero entries of $A(\lambda)$. We can suppose that $j_0 = 1$. [Otherwise, interchange columns in $A(\lambda)$.] By elementary transformations it is possible to
replace all the other entries of $A(\lambda)$ by zero. Indeed, let $a_j(\lambda) \ne 0$. Divide $a_j(\lambda)$ by $a_1(\lambda)$: $a_j(\lambda) = b_j(\lambda)a_1(\lambda) + r_j(\lambda)$, where $r_j(\lambda)$ is the remainder, whose degree is less than the degree of $a_1(\lambda)$, or $r_j(\lambda) = 0$. Add to the $j$th column the first column multiplied by $-b_j(\lambda)$. Then $r_j(\lambda)$ appears in the $j$th position of the new matrix. If $r_j(\lambda) \ne 0$, put $r_j(\lambda)$ in the first position, and if there is still a nonzero entry [different from $r_j(\lambda)$], apply the same argument again. Namely, divide this (say, the $k$th) entry by $r_j(\lambda)$ and add to the $k$th column the first column multiplied by minus the quotient of the division, and so on. Since the degrees of the remainders decrease, after a finite number of steps [not more than the degree of $a_1(\lambda)$] we find that all the entries of our matrix, except the first, are zeros. This proves Theorem A.1.1 in the case $m = 1$, $n > 1$. The case $m > 1$, $n = 1$ is treated in a similar way.
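The argument just given is an effective procedure, and it can be played out in code. The following sketch (the name reduce_row is ours; the sympy library is assumed) carries out the column operations of the preceding paragraphs on a $1 \times n$ polynomial row.

    from sympy import symbols, Poly

    lam = symbols('lam')

    def reduce_row(entries):
        """Clear a 1 x n polynomial row to [g, 0, ..., 0] by the column
        operations of the proof; g is then a g.c.d. of the entries."""
        row = [Poly(p, lam) for p in entries]
        while True:
            nz = [j for j, p in enumerate(row) if not p.is_zero]
            if not nz:
                return [p.as_expr() for p in row]
            piv = min(nz, key=lambda j: row[j].degree())  # minimal degree
            row[0], row[piv] = row[piv], row[0]           # column interchange
            if len(nz) == 1:
                return [p.as_expr() for p in row]
            for j in range(1, len(row)):
                if not row[j].is_zero:
                    q, r = row[j].div(row[0])  # row[j] = q*row[0] + r
                    row[j] = r                 # add -q times the first column

    print(reduce_row([lam**2 - 1, lam**2 + lam, lam + 1]))
    # expected: [lam + 1, 0, 0]

Each pass either annihilates all entries after the first or strictly lowers the minimal degree, so the loop terminates, exactly as in the proof.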
Assume now that $m, n > 1$, and assume that the theorem is proved for matrices with $m - 1$ rows and $n - 1$ columns. We can suppose that the $(1,1)$ entry of $A(\lambda)$ is nonzero and has minimal degree among the nonzero entries of $A(\lambda)$. [Indeed, if $A(\lambda) \not\equiv 0$, we can reach this condition by interchanging rows and/or columns in $A(\lambda)$; if $A(\lambda) \equiv 0$, Theorem A.1.1 is trivial.] With the help of the procedure described in the previous paragraph [applied to the first row and the first column of $A(\lambda)$], a finite number of elementary transformations reduces $A(\lambda)$ to the form
\[
A_1(\lambda) = \begin{bmatrix}
a_{11}^{(1)}(\lambda) & 0 & \cdots & 0 \\
0 & a_{22}^{(1)}(\lambda) & \cdots & a_{2n}^{(1)}(\lambda) \\
\vdots & \vdots & & \vdots \\
0 & a_{m2}^{(1)}(\lambda) & \cdots & a_{mn}^{(1)}(\lambda)
\end{bmatrix}
\]
Suppose that for some $i, j > 1$ the entry $a_{ij}^{(1)}(\lambda) \ne 0$ is not divisible by $a_{11}^{(1)}(\lambda)$ (without remainder). Then add the $i$th row to the first row and apply the above arguments again. We obtain a matrix polynomial of the form
\[
A_2(\lambda) = \begin{bmatrix}
a_{11}^{(2)}(\lambda) & 0 & \cdots & 0 \\
0 & a_{22}^{(2)}(\lambda) & \cdots & a_{2n}^{(2)}(\lambda) \\
\vdots & \vdots & & \vdots \\
0 & a_{m2}^{(2)}(\lambda) & \cdots & a_{mn}^{(2)}(\lambda)
\end{bmatrix}
\]
where the degree of $a_{11}^{(2)}(\lambda)$ is less than the degree of $a_{11}^{(1)}(\lambda)$. If there still exists an entry $a_{ij}^{(2)}(\lambda)$ that is not divisible by $a_{11}^{(2)}(\lambda)$, repeat the same procedure once more, and so on. After a finite number of steps we obtain a matrix
\[
A_3(\lambda) = \begin{bmatrix}
a_{11}^{(3)}(\lambda) & 0 & \cdots & 0 \\
0 & a_{22}^{(3)}(\lambda) & \cdots & a_{2n}^{(3)}(\lambda) \\
\vdots & \vdots & & \vdots \\
0 & a_{m2}^{(3)}(\lambda) & \cdots & a_{mn}^{(3)}(\lambda)
\end{bmatrix}
\]
where every $a_{ij}^{(3)}(\lambda)$ is divisible by $a_{11}^{(3)}(\lambda)$. Multiply the first row (or column) by a nonzero constant to make the leading coefficient of the polynomial $a_{11}^{(3)}(\lambda)$ equal to 1. Now define the $(m-1) \times (n-1)$ matrix polynomial
\[
A_4(\lambda) = \begin{bmatrix}
a_{22}^{(3)}(\lambda) & \cdots & a_{2n}^{(3)}(\lambda) \\
\vdots & & \vdots \\
a_{m2}^{(3)}(\lambda) & \cdots & a_{mn}^{(3)}(\lambda)
\end{bmatrix}
\]
and apply the induction hypothesis to $A_4(\lambda)$ to complete the proof of the existence of a Smith form $D(\lambda)$. $\square$

A.2 THE SMITH FORM: UNIQUENESS

We need some preparation to prove the uniqueness of the Smith form $D(\lambda)$ in Theorem A.1.1. Let $A = [a_{ij}]_{i,j=1}^{m,n}$ be an $m \times n$ matrix with complex entries. Choose $k$ rows, $1 \le i_1 < \cdots < i_k \le m$, and $k$ columns, $1 \le j_1 < \cdots < j_k \le n$, of $A$, and consider the determinant $\det[a_{i_p j_q}]_{p,q=1}^{k}$ of the $k \times k$ submatrix of $A$ formed by these rows and columns. This determinant is called a minor of $A$. Loosely speaking, we can say that this minor is of order $k$ and is composed of the rows $i_1, \ldots, i_k$ and columns $j_1, \ldots, j_k$ of $A$. It is denoted by
\[
A\begin{pmatrix} i_1 & \cdots & i_k \\ j_1 & \cdots & j_k \end{pmatrix}
\]
We establish the important Binet-Cauchy formula, which expresses the minors of a product of two matrices in terms of the minors of each factor, as follows.

Theorem A.2.1
Let $A = BC$, where $B$ is an $m \times p$ matrix and $C$ is a $p \times n$ matrix. Then for every $k$, $1 \le k \le \min(m, n)$, and every minor of order $k$ we have
\[
A\begin{pmatrix} i_1 & \cdots & i_k \\ j_1 & \cdots & j_k \end{pmatrix}
= \sum B\begin{pmatrix} i_1 & \cdots & i_k \\ \alpha_1 & \cdots & \alpha_k \end{pmatrix}
C\begin{pmatrix} \alpha_1 & \cdots & \alpha_k \\ j_1 & \cdots & j_k \end{pmatrix}
\tag{A.2.1}
\]
where the sum is taken over all sequences $\{\alpha_q\}_{q=1}^{k}$ of integers satisfying $1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_k \le p$. In particular, if $k > p$, then the sum on the right-hand side of (A.2.1) is empty and the equation is interpreted as
\[
A\begin{pmatrix} i_1 & \cdots & i_k \\ j_1 & \cdots & j_k \end{pmatrix} = 0
\]
Note that for $k = 1$ formula (A.2.1) is just the rule of multiplication of two matrices. On the other hand, if $m = p = n$ and $k = n$, then (A.2.1) gives the familiar multiplication formula for determinants: $\det(BC) = \det B \cdot \det C$.

Proof. As the rank of $A$ does not exceed $p$, we have
\[
A\begin{pmatrix} i_1 & \cdots & i_k \\ j_1 & \cdots & j_k \end{pmatrix} = 0
\]
as long as $k > p$. So we can assume $k \le p$. For simplicity of notation assume also $i_q = j_q = q$, $q = 1, \ldots, k$. Letting $A = [a_{ij}]_{i,j=1}^{m,n}$, $B = [b_{ij}]_{i,j=1}^{m,p}$, $C = [c_{ij}]_{i,j=1}^{p,n}$, we may write
\[
A\begin{pmatrix} 1 & \cdots & k \\ 1 & \cdots & k \end{pmatrix}
= \det\Bigl[\,\sum_{\alpha=1}^{p} b_{i\alpha}c_{\alpha j}\,\Bigr]_{i,j=1}^{k}
\]
and, using the linearity of the determinant as a function of each column, this expression is easily seen to be equal to
\[
\sum \det\begin{bmatrix}
b_{1\alpha_1}c_{\alpha_1 1} & b_{1\alpha_2}c_{\alpha_2 2} & \cdots & b_{1\alpha_k}c_{\alpha_k k} \\
\vdots & \vdots & & \vdots \\
b_{k\alpha_1}c_{\alpha_1 1} & b_{k\alpha_2}c_{\alpha_2 2} & \cdots & b_{k\alpha_k}c_{\alpha_k k}
\end{bmatrix}
= \sum B\begin{pmatrix} 1 & \cdots & k \\ \alpha_1 & \cdots & \alpha_k \end{pmatrix}
c_{\alpha_1 1}\cdots c_{\alpha_k k}
\tag{A.2.2}
\]
where the sum is taken over all $k$-tuples of integers $(\alpha_1, \ldots, \alpha_k)$ such that $1 \le \alpha_q \le p$. [Here we use the notation $B\begin{pmatrix} 1 & \cdots & k \\ \alpha_1 & \cdots & \alpha_k \end{pmatrix}$ to denote $\det[b_{q\alpha_s}]_{q,s=1}^{k}$ even when the sequence $\{\alpha_q\}_{q=1}^{k}$ is not increasing, or when it contains repetitions.] If not all of $\alpha_1, \alpha_2, \ldots, \alpha_k$ are different, then clearly $B\begin{pmatrix} 1 & \cdots & k \\ \alpha_1 & \cdots & \alpha_k \end{pmatrix} = 0$. Ignoring these summands in (A.2.2), split the remaining terms into groups of $k!$ terms each, in such a way that the summands in the same group differ only in the order of the indices $\alpha_1, \alpha_2, \ldots, \alpha_k$. We obtain
\[
A\begin{pmatrix} 1 & \cdots & k \\ 1 & \cdots & k \end{pmatrix}
= \sum_{1 \le \alpha_1 < \cdots < \alpha_k \le p}\;\sum_{\pi}
B\begin{pmatrix} 1 & \cdots & k \\ \alpha_{\pi(1)} & \cdots & \alpha_{\pi(k)} \end{pmatrix}
c_{\alpha_{\pi(1)} 1}\cdots c_{\alpha_{\pi(k)} k}
\tag{A.2.3}
\]
where the internal summation is over all permutations $\pi$ of $\{1, 2, \ldots, k\}$. Denoting by $\varepsilon(\pi)$ the sign of $\pi$ ($\varepsilon(\pi)$ is 1 if $\pi$ is even and $-1$ if $\pi$ is odd), we find that the right-hand side of (A.2.3) is
\[
\sum_{1 \le \alpha_1 < \cdots < \alpha_k \le p}
B\begin{pmatrix} 1 & \cdots & k \\ \alpha_1 & \cdots & \alpha_k \end{pmatrix}
\sum_{\pi}\varepsilon(\pi)\,c_{\alpha_{\pi(1)} 1}\cdots c_{\alpha_{\pi(k)} k}
= \sum_{1 \le \alpha_1 < \cdots < \alpha_k \le p}
B\begin{pmatrix} 1 & \cdots & k \\ \alpha_1 & \cdots & \alpha_k \end{pmatrix}
C\begin{pmatrix} \alpha_1 & \cdots & \alpha_k \\ 1 & \cdots & k \end{pmatrix}
\]
and the theorem is proved. $\square$

Returning to matrix polynomials, observe that the minors of a matrix polynomial $A(\lambda)$ are (scalar) polynomials, so we can speak of their greatest common divisors.

Theorem A.2.2
Let $A(\lambda)$ be an $m \times n$ matrix polynomial. Let $p_k(\lambda)$ be the greatest common divisor (with leading coefficient 1) of the minors of $A(\lambda)$ of order $k$, if not all of them are zeros, and let $p_k(\lambda) = 0$ if all the minors of order $k$ of $A(\lambda)$ are zeros. Let $p_0(\lambda) = 1$, and let $D(\lambda) = \operatorname{diag}[d_1(\lambda), \ldots, d_r(\lambda), 0, \ldots, 0]$ be a Smith form of $A(\lambda)$ (which exists by the part of Theorem A.1.1 already proved). Then $r$ is the maximal integer such that $p_r(\lambda) \not\equiv 0$, and
\[
d_i(\lambda) = \frac{p_i(\lambda)}{p_{i-1}(\lambda)}\,,\qquad i = 1, \ldots, r
\tag{A.2.4}
\]
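Formula (A.2.4) is directly computable for small matrices. The sketch below (the function name is ours; the sympy library is assumed) forms every $k \times k$ minor, takes monic greatest common divisors, and divides successive ones, exactly as in the statement of Theorem A.2.2.

    from functools import reduce
    from itertools import combinations
    from sympy import Matrix, symbols, gcd, cancel, Poly

    lam = symbols('lam')

    def invariant_polynomials(A):
        """Invariant polynomials d_i = p_i / p_{i-1}, where p_k is the
        monic g.c.d. of all k x k minors of A (Theorem A.2.2)."""
        m, n = A.shape
        p = [1]                                   # p_0 = 1
        for k in range(1, min(m, n) + 1):
            minors = [A.extract(r, c).det()
                      for r in combinations(range(m), k)
                      for c in combinations(range(n), k)]
            g = reduce(gcd, minors)
            if g == 0:                            # all order-k minors vanish
                break
            p.append(Poly(g, lam).monic().as_expr())
        return [cancel(p[i] / p[i - 1]) for i in range(1, len(p))]

    A = Matrix([[lam, 1], [0, lam]])
    print(invariant_polynomials(A))   # expected: [1, lam**2]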
Proof. Let us show that if $A_1(\lambda)$ and $A_2(\lambda)$ are equivalent matrix polynomials, then the greatest common divisors $p_{k,1}(\lambda)$ and $p_{k,2}(\lambda)$ of the minors of order $k$ of $A_1(\lambda)$ and $A_2(\lambda)$, respectively, are equal. Indeed, we have $A_1(\lambda) = E(\lambda)A_2(\lambda)F(\lambda)$ for some matrix polynomials $E(\lambda)$ and $F(\lambda)$ with constant nonzero determinants. Apply Theorem A.2.1 twice to express a minor of $A_1(\lambda)$ of order $k$ as a linear combination of minors of $A_2(\lambda)$ of the same order. Therefore, it follows that $p_{k,2}(\lambda)$ is a divisor of $p_{k,1}(\lambda)$. But the equation $A_2(\lambda) = E^{-1}(\lambda)A_1(\lambda)F^{-1}(\lambda)$ implies that $p_{k,1}(\lambda)$ is a divisor of $p_{k,2}(\lambda)$. So $p_{k,1}(\lambda) = p_{k,2}(\lambda)$. In the same way one shows that the maximal integer $r_1$ such that $p_{r_1,1}(\lambda) \not\equiv 0$ coincides with the maximal integer $r_2$ such that $p_{r_2,2}(\lambda) \not\equiv 0$.

Now apply this observation to the matrix polynomials $A(\lambda)$ and $D(\lambda)$. It follows that we have to prove Theorem A.2.2 only in the case where $A(\lambda)$ itself is in the diagonal form $A(\lambda) = D(\lambda)$. From the structure of $D(\lambda)$ it is clear that
\[
d_1(\lambda)d_2(\lambda)\cdots d_s(\lambda)\,,\qquad s = 1, \ldots, r
\]
is the greatest common divisor of the minors of $D(\lambda)$ of order $s$. So $p_s(\lambda) = d_1(\lambda)\cdots d_s(\lambda)$, $s = 1, \ldots, r$, and (A.2.4) follows. $\square$

Theorem A.2.2 immediately implies the uniqueness of the Smith form (A.1.2). Indeed, Theorem A.2.2 shows that the number $r$ of not identically zero entries in the Smith form of $A(\lambda)$, as well as the entries $d_1(\lambda), \ldots, d_r(\lambda)$ themselves, can be expressed explicitly in terms of $A(\lambda)$; that is, $r$ and $d_1(\lambda), \ldots, d_r(\lambda)$ are uniquely determined by $A(\lambda)$.

A.3 INVARIANT POLYNOMIALS, ELEMENTARY DIVISORS, AND PARTIAL MULTIPLICITIES

In this section we study various invariants appearing in the Smith form of a matrix polynomial. Let $A(\lambda)$ be an $m \times n$ matrix polynomial with Smith form $D(\lambda)$. The diagonal elements $d_1(\lambda), \ldots, d_r(\lambda)$ of $D(\lambda)$ are called the invariant polynomials of $A(\lambda)$. The number $r$ of invariant polynomials can be defined as
\[
r = \max_{\lambda \in \mathbb{C}}\{\operatorname{rank} A(\lambda)\}
\tag{A.3.1}
\]
Indeed, since $E(\lambda)$ and $F(\lambda)$ from (A.1.3) are invertible matrices for every $\lambda$, we have $\operatorname{rank} A(\lambda) = \operatorname{rank} D(\lambda)$ for every $\lambda \in \mathbb{C}$. On the other hand, it is clear that $\operatorname{rank} D(\lambda) = r$ if $\lambda$ is not a zero of any of the invariant polynomials, and $\operatorname{rank} D(\lambda) < r$ otherwise. So (A.3.1) follows.

The set of invariant polynomials forms a complete invariant for equivalence of matrix polynomials of the same size.

Theorem A.3.1
Matrix polynomials $A(\lambda)$ and $B(\lambda)$ of the same size are equivalent if and only if the invariant polynomials of $A(\lambda)$ and $B(\lambda)$ are the same.
Proof. Suppose the invariant polynomials of $A(\lambda)$ and $B(\lambda)$ are the same. Then their Smith forms are equal:
\[
A(\lambda) = E_1(\lambda)D(\lambda)F_1(\lambda)\,,\qquad B(\lambda) = E_2(\lambda)D(\lambda)F_2(\lambda)
\]
where $\det E_i(\lambda) = \text{const} \ne 0$ and $\det F_i(\lambda) = \text{const} \ne 0$, $i = 1, 2$. Consequently,
\[
(E_1(\lambda))^{-1}A(\lambda)(F_1(\lambda))^{-1} = (E_2(\lambda))^{-1}B(\lambda)(F_2(\lambda))^{-1}\;(= D(\lambda))
\]
and $A(\lambda) = E(\lambda)B(\lambda)F(\lambda)$, where $E(\lambda) = E_1(\lambda)(E_2(\lambda))^{-1}$ and $F(\lambda) = (F_2(\lambda))^{-1}F_1(\lambda)$. Since $E_2(\lambda)$ and $F_2(\lambda)$ are matrix polynomials with constant nonzero determinants, the same is true for $E_2^{-1}(\lambda)$ and $F_2^{-1}(\lambda)$ and, consequently, for $E(\lambda)$ and $F(\lambda)$. So $A(\lambda) \sim B(\lambda)$.

Conversely, suppose $A(\lambda) = E(\lambda)B(\lambda)F(\lambda)$, where $\det E(\lambda) = \text{const} \ne 0$, $\det F(\lambda) = \text{const} \ne 0$. Let $D(\lambda)$ be the Smith form of $B(\lambda)$:
\[
B(\lambda) = E_1(\lambda)D(\lambda)F_1(\lambda)
\]
Then $D(\lambda)$ is also the Smith form of $A(\lambda)$:
\[
A(\lambda) = E(\lambda)E_1(\lambda)D(\lambda)F_1(\lambda)F(\lambda)
\]
By the uniqueness of the Smith form of $A(\lambda)$ [more exactly, by the uniqueness of the invariant polynomials of $A(\lambda)$], it follows that the invariant polynomials of $A(\lambda)$ are the same as those of $B(\lambda)$. $\square$

We now take advantage of the fact that the polynomial entries of $A(\lambda)$ and its Smith form $D(\lambda)$ are over $\mathbb{C}$ to represent each invariant polynomial $d_i(\lambda)$ as a product of linear factors:
\[
d_i(\lambda) = (\lambda - \lambda_{i1})^{\alpha_{i1}}\cdots(\lambda - \lambda_{i k_i})^{\alpha_{i k_i}}\,,\qquad i = 1, \ldots, r
\]
where $\lambda_{i1}, \ldots, \lambda_{i k_i}$ are distinct complex numbers and $\alpha_{i1}, \ldots, \alpha_{i k_i}$ are positive integers. The factors $(\lambda - \lambda_{ij})^{\alpha_{ij}}$, $j = 1, \ldots, k_i$, $i = 1, \ldots, r$, are called the elementary divisors of $A(\lambda)$. Some different elementary divisors may involve the same polynomial $(\lambda - \lambda_0)^{\alpha}$ [this happens, for example, in case $d_i(\lambda) = d_{i+1}(\lambda)$ for some $i$]; the total number of elementary divisors of $A(\lambda)$ is thus $\sum_{i=1}^{r} k_i$. The degrees $\alpha_{ij}$ of the elementary divisors form an important characteristic of the matrix polynomial $A(\lambda)$.
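Passing from invariant polynomials to elementary divisors is a matter of factoring. A small sketch follows (the function name is ours; sympy assumed). Note that factor_list factors over the rationals, so a factor that is irreducible over $\mathbb{Q}$ but not over $\mathbb{C}$, such as $\lambda^2 + 1$, would need its complex roots extracted separately.

    from sympy import symbols, factor_list, sympify

    lam = symbols('lam')

    def elementary_divisors(invariant_polys):
        """Split invariant polynomials into (base, exponent) pairs, each
        encoding an elementary divisor base**exponent."""
        divisors = []
        for d in map(sympify, invariant_polys):
            if d.is_number:        # constant polynomials contribute nothing
                continue
            _, factors = factor_list(d, lam)
            divisors += factors
        return divisors

    print(elementary_divisors([1, lam*(lam - 1)**2]))
    # expected: [(lam, 1), (lam - 1, 2)]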
Here we mention only the following simple property of the elementary divisors, whose verification is left to the reader.

Proposition A.3.2
Let $A(\lambda)$ be an $n \times n$ matrix polynomial such that $\det A(\lambda) \not\equiv 0$. Then the sum $\sum_{i=1}^{r}\sum_{j=1}^{k_i}\alpha_{ij}$ of the degrees of its elementary divisors $(\lambda - \lambda_{ij})^{\alpha_{ij}}$ coincides with the degree of $\det A(\lambda)$.

Note that the knowledge of the elementary divisors of $A(\lambda)$ and of the number $r$ of its invariant polynomials $d_1(\lambda), \ldots, d_r(\lambda)$ is sufficient to construct $d_1(\lambda), \ldots, d_r(\lambda)$. In this construction we use the fact that $d_i(\lambda)$ is divisible by $d_{i-1}(\lambda)$. Let $\lambda_1, \ldots, \lambda_p$ be all the distinct complex numbers that appear in the elementary divisors, and let $(\lambda - \lambda_i)^{\alpha_{i1}}, \ldots, (\lambda - \lambda_i)^{\alpha_{i k_i}}$ ($i = 1, \ldots, p$) be the elementary divisors containing the number $\lambda_i$, ordered in descending order of the degrees: $\alpha_{i1} \ge \cdots \ge \alpha_{i k_i} > 0$. Clearly, the number $r$ of invariant polynomials must be greater than or equal to $\max\{k_1, \ldots, k_p\}$. Under this condition, the invariant polynomials $d_1(\lambda), \ldots, d_r(\lambda)$ are given by the formulas
\[
d_l(\lambda) = \prod_{i=1}^{p}(\lambda - \lambda_i)^{\alpha_{i,\,r+1-l}}\,,\qquad l = 1, \ldots, r
\]
where we put $(\lambda - \lambda_i)^{\alpha_{ij}} = 1$ for $j > k_i$.

The following property of the elementary divisors is used subsequently.

Proposition A.3.3
Let $A(\lambda)$ and $B(\lambda)$ be matrix polynomials, and let $C(\lambda) = \operatorname{diag}[A(\lambda), B(\lambda)]$, a block-diagonal matrix polynomial. Then the set of elementary divisors of $C(\lambda)$ is the union of the elementary divisors of $A(\lambda)$ and $B(\lambda)$.

Proof. Let $D_1(\lambda)$ and $D_2(\lambda)$ be the Smith forms of $A(\lambda)$ and $B(\lambda)$, respectively. Then clearly
\[
C(\lambda) = E(\lambda)\begin{bmatrix} D_1(\lambda) & 0 \\ 0 & D_2(\lambda) \end{bmatrix}F(\lambda)
\]
for some matrix polynomials $E(\lambda)$ and $F(\lambda)$ with constant nonzero determinants. Let $(\lambda - \lambda_0)^{\alpha_1}, \ldots, (\lambda - \lambda_0)^{\alpha_p}$ and $(\lambda - \lambda_0)^{\beta_1}, \ldots, (\lambda - \lambda_0)^{\beta_q}$ be the elementary divisors of $D_1(\lambda)$ and $D_2(\lambda)$, respectively, corresponding to the same complex number $\lambda_0$. Arrange the set of exponents $\alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q$ in nondecreasing order:
\[
\{\alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q\} = \{\gamma_1, \ldots, \gamma_{p+q}\}\,,\qquad 0 < \gamma_1 \le \cdots \le \gamma_{p+q}
\]
Using Theorem A.2.2, it is clear that in the Smith form $D = \operatorname{diag}[d_1(\lambda), \ldots, d_r(\lambda), 0, \ldots, 0]$ of $\operatorname{diag}[D_1(\lambda), D_2(\lambda)]$, the invariant polynomial $d_r(\lambda)$ is divisible by $(\lambda - \lambda_0)^{\gamma_{p+q}}$ but not by $(\lambda -
\lambda_0)^{\gamma_{p+q}+1}$, $d_{r-1}(\lambda)$ is divisible by $(\lambda - \lambda_0)^{\gamma_{p+q-1}}$ but not by $(\lambda - \lambda_0)^{\gamma_{p+q-1}+1}$, and so on. It follows that the elementary divisors of
\[
\begin{bmatrix} D_1(\lambda) & 0 \\ 0 & D_2(\lambda) \end{bmatrix}
\]
[and thus also those of $C(\lambda)$] corresponding to $\lambda_0$ are just $(\lambda - \lambda_0)^{\gamma_1}, \ldots, (\lambda - \lambda_0)^{\gamma_{p+q}}$, and Proposition A.3.3 is proved. $\square$

In the rest of this section we assume that (as in Proposition A.3.2) the matrix polynomial $A(\lambda)$ is square and that the determinant of $A(\lambda)$ is not identically zero. In this case, complex numbers $\lambda_0$ such that $\det A(\lambda_0) = 0$ are called the eigenvalues of $A(\lambda)$. Clearly, the set of eigenvalues is finite [it contains not more than $\deg(\det A(\lambda))$ points], and $\lambda_0$ is an eigenvalue of $A(\lambda)$ if and only if there is an elementary divisor of $A(\lambda)$ of type $(\lambda - \lambda_0)^{\alpha}$.

Let $\lambda_0$ be an eigenvalue of $A(\lambda)$, and let $(\lambda - \lambda_0)^{\alpha_1}, \ldots, (\lambda - \lambda_0)^{\alpha_p}$ be all the elementary divisors of $A(\lambda)$ that are divisible by $\lambda - \lambda_0$. The exponents $\alpha_1, \ldots, \alpha_p$ are called the partial multiplicities of $A(\lambda)$ corresponding to $\lambda_0$. Recall that some of the numbers $\alpha_1, \ldots, \alpha_p$ may be equal; the number $\alpha_j$ appears in the list of partial multiplicities as many times as there are elementary divisors $(\lambda - \lambda_0)^{\alpha_j}$ of $A(\lambda)$. The partial multiplicities play an important role in the following representation of matrix polynomials.

Theorem A.3.4
Let $A(\lambda)$ be an $n \times n$ matrix polynomial with $\det A(\lambda) \not\equiv 0$. Then for every $\lambda_0 \in \mathbb{C}$, $A(\lambda)$ admits the representation
\[
A(\lambda) = E_{\lambda_0}(\lambda)\operatorname{diag}[(\lambda - \lambda_0)^{\kappa_1}, \ldots, (\lambda - \lambda_0)^{\kappa_n}]F_{\lambda_0}(\lambda)
\tag{A.3.2}
\]
where $E_{\lambda_0}(\lambda)$ and $F_{\lambda_0}(\lambda)$ are matrix polynomials invertible at $\lambda_0$, and $\kappa_1 \le \cdots \le \kappa_n$ are nonnegative integers, which coincide (after striking off zeros) with the partial multiplicities of $A(\lambda)$ corresponding to $\lambda_0$.
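Before turning to the proof, note that in computational terms the partial multiplicities at $\lambda_0$ are simply the multiplicities of $\lambda_0$ as a zero of the diagonal entries of the Smith form. A sketch, under the same assumption about sympy's smith_normal_form as before (the function name is ours):

    from sympy import Matrix, symbols, QQ, simplify
    from sympy.matrices.normalforms import smith_normal_form

    lam = symbols('lam')

    def partial_multiplicities(A, lam0):
        """Multiplicity of lam0 as a zero of each invariant polynomial;
        by Theorem A.3.4 the nonzero values are the partial
        multiplicities of A(lam) at lam0."""
        D = smith_normal_form(A, domain=QQ[lam])
        mults = []
        for i in range(min(D.shape)):
            d, kappa = D[i, i], 0
            while d != 0 and simplify(d.subs(lam, lam0)) == 0:
                d = simplify(d / (lam - lam0))   # exact division
                kappa += 1
            mults.append(kappa)
        return sorted(k for k in mults if k > 0)

    A = Matrix([[lam**2, 0], [0, lam*(lam - 1)]])
    print(partial_multiplicities(A, 0))   # expected: [1, 2]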
Proof. The existence of the representation (A.3.2) follows easily from the Smith form. Namely, let $D(\lambda) = \operatorname{diag}[d_1(\lambda), \ldots, d_n(\lambda)]$ be the Smith form of $A(\lambda)$, and let
\[
A(\lambda) = E(\lambda)D(\lambda)F(\lambda)
\tag{A.3.3}
\]
where $\det E(\lambda) = \text{const} \ne 0$, $\det F(\lambda) = \text{const} \ne 0$. Represent each $d_i(\lambda)$ in the form
\[
d_i(\lambda) = (\lambda - \lambda_0)^{\kappa_i}\tilde d_i(\lambda)\,,\qquad i = 1, \ldots, n
\]
where $\tilde d_i(\lambda_0) \ne 0$ and $\kappa_i \ge 0$. Since $d_i(\lambda)$ is divisible by $d_{i-1}(\lambda)$, we have $\kappa_i \ge \kappa_{i-1}$. Now (A.3.2) follows from (A.3.3), where
\[
E_{\lambda_0}(\lambda) = E(\lambda)\operatorname{diag}[\tilde d_1(\lambda), \ldots, \tilde d_n(\lambda)]\,,\qquad F_{\lambda_0}(\lambda) = F(\lambda)
\]
It remains to show that the $\kappa_i$ coincide (after striking off zeros) with the degrees of the elementary divisors of $A(\lambda)$ corresponding to $\lambda_0$. To this end we show that any factorization of $A(\lambda)$ of type (A.3.2) with $\kappa_1 \le \cdots \le \kappa_n$ implies that $\kappa_i$ is the multiplicity of $\lambda_0$ as a zero of $d_i(\lambda)$, $i = 1, \ldots, n$, where $D(\lambda) = \operatorname{diag}[d_1(\lambda), \ldots, d_n(\lambda)]$ is the Smith form of $A(\lambda)$. Indeed, let $A(\lambda) = E(\lambda)D(\lambda)F(\lambda)$, where $E(\lambda)$ and $F(\lambda)$ are matrix polynomials with constant nonzero determinants. Comparing with (A.3.2), write
\[
\operatorname{diag}[d_1(\lambda), \ldots, d_n(\lambda)]
= \hat E_{\lambda_0}(\lambda)\operatorname{diag}[(\lambda - \lambda_0)^{\kappa_1}, \ldots, (\lambda - \lambda_0)^{\kappa_n}]\hat F_{\lambda_0}(\lambda)
\tag{A.3.4}
\]
where $\hat E_{\lambda_0}(\lambda) = (E(\lambda))^{-1}E_{\lambda_0}(\lambda)$ and $\hat F_{\lambda_0}(\lambda) = F_{\lambda_0}(\lambda)(F(\lambda))^{-1}$ are matrix polynomials invertible at $\lambda_0$. Applying Theorem A.2.1, we obtain
\[
d_1(\lambda)d_2(\lambda)\cdots d_{i_0}(\lambda) = \sum m_{i,E}(\lambda)\cdot m_{j,D}(\lambda)\cdot m_{k,F}(\lambda)\,,\qquad i_0 = 1, 2, \ldots, n
\tag{A.3.5}
\]
where $m_{i,E}(\lambda)$ [resp. $m_{j,D}(\lambda)$, $m_{k,F}(\lambda)$] is a minor of order $i_0$ of $\hat E_{\lambda_0}(\lambda)$ [resp. of $\operatorname{diag}[(\lambda - \lambda_0)^{\kappa_1}, \ldots, (\lambda - \lambda_0)^{\kappa_n}]$, of $\hat F_{\lambda_0}(\lambda)$], and the sum in (A.3.5) is taken over a certain set of triples $(i, j, k)$. It follows from (A.3.5) and the condition $\kappa_1 \le \cdots \le \kappa_n$ that $\lambda_0$ is a zero of the product $d_1(\lambda)d_2(\lambda)\cdots d_{i_0}(\lambda)$ of multiplicity at least $\kappa_1 + \kappa_2 + \cdots + \kappa_{i_0}$. Rewrite (A.3.4) in the form
\[
(\hat E_{\lambda_0}(\lambda))^{-1}\operatorname{diag}[d_1(\lambda), \ldots, d_n(\lambda)](\hat F_{\lambda_0}(\lambda))^{-1}
= \operatorname{diag}[(\lambda - \lambda_0)^{\kappa_1}, \ldots, (\lambda - \lambda_0)^{\kappa_n}]
\]
and apply Theorem A.2.1 again. Using the fact that $(\hat E_{\lambda_0}(\lambda))^{-1}$ and $(\hat F_{\lambda_0}(\lambda))^{-1}$ are rational matrix functions that are defined and invertible at $\lambda = \lambda_0$, and that $d_i(\lambda)$ is a divisor of $d_{i+1}(\lambda)$, we deduce that
\[
(\lambda - \lambda_0)^{\kappa_1 + \cdots + \kappa_{i_0}} = d_1(\lambda)d_2(\lambda)\cdots d_{i_0}(\lambda)\,\phi_{i_0}(\lambda)
\]
where $\phi_{i_0}(\lambda)$ is a rational function defined at $\lambda = \lambda_0$ (i.e., $\lambda_0$ is not a pole of
$\phi_{i_0}(\lambda)$). It follows that $\lambda_0$ is a zero of $d_1(\lambda)d_2(\lambda)\cdots d_{i_0}(\lambda)$ of multiplicity exactly $\kappa_1 + \kappa_2 + \cdots + \kappa_{i_0}$, $i_0 = 1, \ldots, n$. Hence $\kappa_i$ is exactly the multiplicity of $\lambda_0$ as a zero of $d_i(\lambda)$ for $i = 1, \ldots, n$; that is, the nonzero numbers (if any) among $\kappa_1, \ldots, \kappa_n$ are the partial multiplicities of $A(\lambda)$ corresponding to $\lambda_0$. $\square$

As a consequence of Theorem A.3.4, note that there are nonzero $\kappa_i$ in the representation (A.3.2) if and only if $\lambda_0$ is an eigenvalue of $A(\lambda)$.

A.4 EQUIVALENCE OF LINEAR MATRIX POLYNOMIALS

We study here equivalence and the Smith form for matrix polynomials of type $I\lambda - A$, where $A$ is an $n \times n$ matrix. It turns out that for such matrix polynomials the notion of equivalence is closely related to similarity.

Theorem A.4.1
$I\lambda - A \sim I\lambda - B$ if and only if $A$ and $B$ are similar.

To prove this theorem, we have to introduce division of matrix polynomials. We restrict ourselves to the case where the dividend is a general matrix polynomial $A(\lambda) = \sum_{j=0}^{l} A_j\lambda^j$ and the divisor is a matrix polynomial of type $I\lambda + X$, where $X$ is a constant $n \times n$ matrix. In this case the following representation holds:
\[
A(\lambda) = Q_r(\lambda)(I\lambda + X) + R_r
\tag{A.4.1}
\]
where $Q_r(\lambda)$ is a matrix polynomial, which is called the right quotient, and $R_r$ is a constant matrix, which is called the right remainder, on division of $A(\lambda)$ by $I\lambda + X$. Also,
\[
A(\lambda) = (I\lambda + X)Q_l(\lambda) + R_l
\tag{A.4.2}
\]
where $Q_l(\lambda)$ is the left quotient, and the constant matrix $R_l$ is the left remainder.

Let us check the existence of the representation (A.4.1); (A.4.2) can be checked in a similar way. If $l = 0$ [i.e., $A(\lambda)$ is constant], put $Q_r(\lambda) = 0$ and $R_r = A(\lambda)$. So we can suppose $l \ge 1$. Write $Q_r(\lambda) = \sum_{j=0}^{l-1} Q_j^{(r)}\lambda^j$. Comparing the coefficients of powers of $\lambda$ on the right- and left-hand sides of (A.4.1), we can rewrite this relation as follows:
\[
A_l = Q_{l-1}^{(r)}\,,\quad A_{l-1} = Q_{l-2}^{(r)} + Q_{l-1}^{(r)}X\,,\;\ldots,\;
A_1 = Q_0^{(r)} + Q_1^{(r)}X\,,\quad A_0 = Q_0^{(r)}X + R_r
\]
Clearly, these relations define $Q_{l-1}^{(r)}, \ldots, Q_1^{(r)}, Q_0^{(r)}$, and $R_r$ sequentially. It follows from this argument that the left and right quotients and remainders are uniquely defined.
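The recursion just displayed is itself an algorithm. The following sketch (our naming; sympy assumed) implements it and verifies the identity (A.4.1) on a small example.

    from sympy import Matrix, symbols, zeros, eye, expand

    lam = symbols('lam')

    def right_divide(coeffs, X):
        """Right division A(lam) = Q_r(lam)*(I*lam + X) + R_r via the
        recursion above; coeffs = [A_0, A_1, ..., A_l] as sympy Matrices."""
        l = len(coeffs) - 1
        n = X.shape[0]
        Q = [zeros(n, n) for _ in range(l)]
        Q[l - 1] = coeffs[l]                     # A_l = Q_{l-1}
        for j in range(l - 2, -1, -1):
            Q[j] = coeffs[j + 1] - Q[j + 1]*X    # A_{j+1} = Q_j + Q_{j+1} X
        R = coeffs[0] - Q[0]*X                   # A_0 = Q_0 X + R_r
        return Q, R

    A0 = Matrix([[1, 2], [3, 4]]); A1 = Matrix([[0, 1], [1, 0]]); A2 = eye(2)
    X = Matrix([[1, 1], [0, 2]])
    Q, R = right_divide([A0, A1, A2], X)
    Alam = A0 + A1*lam + A2*lam**2
    Qlam = Q[0] + Q[1]*lam
    assert expand(Alam - (Qlam*(eye(2)*lam + X) + R)) == zeros(2, 2)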
Proof of Theorem A.4.1. In one direction this result is immediate: if $A = SBS^{-1}$ for some nonsingular $S$, then the equality $I\lambda - A = S(I\lambda - B)S^{-1}$ proves the equivalence of $I\lambda - A$ and $I\lambda - B$.

Conversely, suppose $I\lambda - A \sim I\lambda - B$. Then for some matrix polynomials $E(\lambda)$ and $F(\lambda)$ with constant nonzero determinants we have
\[
E(\lambda)(I\lambda - A)F(\lambda) = I\lambda - B
\]
Suppose that division of $(E(\lambda))^{-1}$ on the left by $I\lambda - A$, and of $F(\lambda)$ on the right by $I\lambda - B$, yields
\[
(E(\lambda))^{-1} = (I\lambda - A)S(\lambda) + E_0\,,\qquad F(\lambda) = T(\lambda)(I\lambda - B) + F_0
\tag{A.4.3}
\]
Substituting in the equation
\[
(E(\lambda))^{-1}(I\lambda - B) = (I\lambda - A)F(\lambda)
\]
we obtain
\[
\{(I\lambda - A)S(\lambda) + E_0\}(I\lambda - B) = (I\lambda - A)\{T(\lambda)(I\lambda - B) + F_0\}
\]
whence
\[
(I\lambda - A)(S(\lambda) - T(\lambda))(I\lambda - B) = (I\lambda - A)F_0 - E_0(I\lambda - B)
\]
Since the degree of the matrix polynomial on the right-hand side here is at most 1, it follows that $S(\lambda) = T(\lambda)$; otherwise, the degree of the matrix polynomial on the left would be at least 2. Hence
\[
(I\lambda - A)F_0 = E_0(I\lambda - B)
\]
so that $F_0 = E_0$ and $AF_0 = E_0B$; thus $AE_0 = E_0B$.

It remains only to prove that $E_0$ is nonsingular. To this end divide $E(\lambda)$ on the left by $I\lambda - B$:
\[
E(\lambda) = (I\lambda - B)U(\lambda) + R_0
\tag{A.4.4}
\]
Then, using (A.4.3) and (A.4.4), we have
\[
\begin{aligned}
I = (E(\lambda))^{-1}E(\lambda)
&= \{(I\lambda - A)S(\lambda) + E_0\}\{(I\lambda - B)U(\lambda) + R_0\} \\
&= (I\lambda - A)S(\lambda)(I\lambda - B)U(\lambda) + (I\lambda - A)F_0U(\lambda) + (I\lambda - A)S(\lambda)R_0 + E_0R_0 \\
&= (I\lambda - A)[S(\lambda)(I\lambda - B)U(\lambda) + F_0U(\lambda) + S(\lambda)R_0] + E_0R_0
\end{aligned}
\]
Hence the matrix polynomial in the square brackets is zero, and $E_0R_0 = I$. It follows that $E_0$ is nonsingular. $\square$

The definitions of eigenvalues and partial multiplicities made in the preceding section can be applied to an $n \times n$ matrix polynomial of the form $I\lambda - A$. On the other hand, as an $n \times n$ matrix (or as a transformation represented by this matrix in the standard basis $e_1, \ldots, e_n$), $A$ has eigenvalues and partial multiplicities as defined in Sections 1.2 and 2.2. It is an important fact that these notions for $I\lambda - A$ and for $A$ coincide.

Theorem A.4.2
A complex number $\lambda_0$ is an eigenvalue of $I\lambda - A$ if and only if it is an eigenvalue of $A$. Moreover, the partial multiplicities of $I\lambda - A$ corresponding to its eigenvalue $\lambda_0$ coincide with the partial multiplicities of $A$ corresponding to $\lambda_0$.

Proof. The first statement follows from the definitions: $\lambda_0$ is an eigenvalue of $I\lambda - A$ if and only if $\det(I\lambda_0 - A) = 0$, which is exactly the definition of an eigenvalue of $A$. For the proof of the second statement, we can assume that $A$ is in Jordan form. Further, using Proposition A.3.3, we reduce the proof to the case when $A$ is a single Jordan block of size $n \times n$:
\[
A = \begin{bmatrix}
\lambda_0 & 1 & & 0 \\
 & \lambda_0 & \ddots & \\
 & & \ddots & 1 \\
0 & & & \lambda_0
\end{bmatrix}
\]
The partial multiplicity of $A$ is clearly $n$, corresponding to the eigenvalue $\lambda_0$. To find the partial multiplicities of $I\lambda - A$, observe that
\[
I\lambda - A = \begin{bmatrix}
\lambda - \lambda_0 & -1 & & 0 \\
 & \lambda - \lambda_0 & \ddots & \\
 & & \ddots & -1 \\
0 & & & \lambda - \lambda_0
\end{bmatrix}
\]
has a nonzero minor of order $n - 1$ that is independent of $\lambda$ (namely, the
minor formed by crossing out the first column and the last row of $I\lambda - A$). As $\det(I\lambda - A) = (\lambda - \lambda_0)^n$, Theorem A.2.2 implies that the Smith form of $I\lambda - A$ is $\operatorname{diag}[1, 1, \ldots, 1, (\lambda - \lambda_0)^n]$. So the only partial multiplicity of $I\lambda - A$ is $n$, which corresponds to $\lambda_0$. $\square$

We also need the following connection between the partial multiplicities of a matrix $A$ and submatrices of $I\lambda - A$.

Theorem A.4.3
Let $A$ be an $n \times n$ matrix. Let $\alpha_1 \ge \cdots \ge \alpha_m$ be the partial multiplicities of an eigenvalue $\lambda_0$ of $A$, and put $\alpha_i = 0$ for $i = m + 1, \ldots, n$. Then $\alpha_n + \alpha_{n-1} + \cdots + \alpha_{n-p+1}$ is the minimal multiplicity of $\lambda_0$ as a zero of the determinant (considered as a polynomial in $\lambda$) of any $p \times p$ submatrix of $I\lambda - A$.

Proof. By Theorems A.4.2 and A.3.4 we have the representation
\[
I\lambda - A = E_{\lambda_0}(\lambda)\operatorname{diag}[(\lambda - \lambda_0)^{\alpha_n}, (\lambda - \lambda_0)^{\alpha_{n-1}}, \ldots, (\lambda - \lambda_0)^{\alpha_1}]F_{\lambda_0}(\lambda)
\tag{A.4.5}
\]
where $E_{\lambda_0}(\lambda)$ and $F_{\lambda_0}(\lambda)$ are matrix polynomials invertible for $\lambda = \lambda_0$. Now the Binet-Cauchy formula (Theorem A.2.1) implies that the multiplicity of $\lambda_0$ as a zero of the determinant of any $p \times p$ submatrix of $I\lambda - A$ is at least $\alpha_n + \alpha_{n-1} + \cdots + \alpha_{n-p+1}$. Rewriting (A.4.5) in the form
\[
E_{\lambda_0}(\lambda)^{-1}(I\lambda - A)F_{\lambda_0}(\lambda)^{-1}
= \operatorname{diag}[(\lambda - \lambda_0)^{\alpha_n}, (\lambda - \lambda_0)^{\alpha_{n-1}}, \ldots, (\lambda - \lambda_0)^{\alpha_1}]
\]
and using the Binet-Cauchy formula again, we find that
\[
(\lambda - \lambda_0)^{\alpha_n + \alpha_{n-1} + \cdots + \alpha_{n-p+1}} = \sum_{i=1}^{s}\varphi_i(\lambda)\det(A_i(\lambda))
\tag{A.4.6}
\]
where $A_1(\lambda), \ldots, A_s(\lambda)$ are certain $p \times p$ submatrices of $I\lambda - A$, and $\varphi_i(\lambda)$, $i = 1, \ldots, s$, are rational functions defined at $\lambda_0$ [so $\lambda_0$ is not a pole of any $\varphi_i(\lambda)$]. It follows from equation (A.4.6) that at least one of the minors $\det(A_i(\lambda))$ has a zero at $\lambda_0$ with multiplicity exactly equal to $\alpha_n + \alpha_{n-1} + \cdots + \alpha_{n-p+1}$. $\square$
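Theorem A.4.2 is easy to test numerically: for a matrix $A$ with known Jordan structure, the Smith form of $I\lambda - A$ should display the partial multiplicities on its diagonal. A sketch, under the same assumption about sympy's smith_normal_form as before:

    from sympy import Matrix, symbols, eye, QQ
    from sympy.matrices.normalforms import smith_normal_form

    lam = symbols('lam')

    # A with Jordan blocks J_2(3) and J_1(3): partial multiplicities 2 and 1
    A = Matrix([[3, 1, 0],
                [0, 3, 0],
                [0, 0, 3]])
    print(smith_normal_form(eye(3)*lam - A, domain=QQ[lam]))
    # expected, up to unit factors and expansion of the entries:
    # diag(1, lam - 3, (lam - 3)**2)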
A.5 STRICT EQUIVALENCE OF LINEAR MATRIX POLYNOMIALS: REGULAR CASE

Let $A + \lambda B$ and $A_1 + \lambda B_1$ be two linear matrix polynomials of the same size $m \times n$. We say that $A + \lambda B$ and $A_1 + \lambda B_1$ are strictly equivalent if there exist invertible matrices $P$ and $Q$ of sizes $m \times m$ and $n \times n$, respectively, independent of $\lambda$, such that, for all $\lambda \in \mathbb{C}$,
\[
P(A + \lambda B)Q = A_1 + \lambda B_1
\]
We denote strict equivalence by $A + \lambda B \overset{s}{\sim} A_1 + \lambda B_1$. It is easily seen that strict equivalence is indeed an equivalence relation, that is, that the three following properties hold: $A + \lambda B \overset{s}{\sim} A + \lambda B$ for every polynomial $A + \lambda B$; if $A + \lambda B \overset{s}{\sim} A_1 + \lambda B_1$, then also $A_1 + \lambda B_1 \overset{s}{\sim} A + \lambda B$; if $A + \lambda B \overset{s}{\sim} A_1 + \lambda B_1$ and $A_1 + \lambda B_1 \overset{s}{\sim} A_2 + \lambda B_2$, then $A + \lambda B \overset{s}{\sim} A_2 + \lambda B_2$.

Obviously, strict equivalence of linear matrix polynomials implies their equivalence. The converse is not true in general, as we shall see later in this section. In this and subsequent sections we find the invariants of strict equivalence, as well as the simplest representative (the canonical form) in each class of strictly equivalent linear matrix polynomials.

This section is devoted to the regular case, that is, when $A$ and $B$ are square matrices and $\det(A + \lambda B)$ does not vanish identically. In particular, the polynomials $A + \lambda B$ with square matrices $A$ and $B$ and $\det B \ne 0$ are regular. This hypothesis is used in our first result.

Proposition A.5.1
Two regular polynomials $A + \lambda B$ and $A_1 + \lambda B_1$ with $\det B \ne 0$, $\det B_1 \ne 0$ are strictly equivalent if and only if they have the same invariant polynomials (or, equivalently, the same elementary divisors).

The proof is easily obtained by combining Theorems A.3.1 and A.4.1. However, the result of Proposition A.5.1 is false, in general, if we omit the conditions $\det B \ne 0$, $\det B_1 \ne 0$ and require only that the polynomials be regular.

EXAMPLE A.5.1. Let
\[
A = A_1 = I_2\,,\qquad B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\,,\qquad B_1 = 0
\]
The polynomials
\[
A + \lambda B = \begin{bmatrix} 1 & \lambda \\ 0 & 1 \end{bmatrix}
\qquad\text{and}\qquad
A_1 + \lambda B_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]
are obviously regular, and both have the Smith form $I_2$, that is, the same invariant polynomials. However, they cannot be strictly equivalent because $B$ and $B_1$ have different ranks. (If $A + \lambda B$ and $A_1 + \lambda B_1$ were strictly equivalent, we would have $B = PB_1Q$ for some invertible $P$ and $Q$, and this would imply the equality of the ranks of $B$ and $B_1$.) $\square$
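Example A.5.1 can be checked by direct computation. For a $2 \times 2$ pencil the invariant polynomials are simply the monic g.c.d. of the entries and the cofactor of that g.c.d. in the determinant; the sketch below (sympy assumed) confirms that both pencils have Smith form $I_2$ while the ranks of $B$ and $B_1$ differ.

    from functools import reduce
    from sympy import Matrix, symbols, eye, zeros, gcd, cancel

    lam = symbols('lam')

    P1 = Matrix([[1, lam], [0, 1]])    # A + lam*B with B = [[0, 1], [0, 0]]
    P2 = eye(2)                        # A1 + lam*B1 with B1 = 0

    for P in (P1, P2):
        d1 = reduce(gcd, list(P))      # g.c.d. of the order-1 minors
        d2 = cancel(P.det() / d1)      # d1*d2 equals the only order-2 minor
        print([d1, d2])                # both pencils: [1, 1]

    print(Matrix([[0, 1], [0, 0]]).rank(), zeros(2, 2).rank())   # 1 and 0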
To extend the result of Proposition A.5.1 to the class of all regular polynomials $A + \lambda B$, we must introduce the elementary divisors at infinity. We say that $\lambda^p$ is an elementary divisor at infinity of a regular polynomial $A + \lambda B$ if $\lambda^p$ is an elementary divisor of $\lambda A + B$. Clearly, there exist elementary divisors at infinity of $A + \lambda B$ if and only if $\det B = 0$.

Theorem A.5.2
Two regular polynomials $A + \lambda B$ and $A_1 + \lambda B_1$ are strictly equivalent if and only if the elementary divisors of $A + \lambda B$ and $A_1 + \lambda B_1$ are the same and their elementary divisors at infinity are the same.

Proof. Assume that $A + \lambda B$ and $A_1 + \lambda B_1$ are strictly equivalent. Then obviously $A + \lambda B$ and $A_1 + \lambda B_1$ are equivalent, so by Theorem A.3.1 they have the same elementary divisors. Moreover, $\lambda A + B$ and $\lambda A_1 + B_1$ are equivalent as well, so, by the same Theorem A.3.1, $A + \lambda B$ and $A_1 + \lambda B_1$ have the same elementary divisors at infinity.

To prove the second part of the theorem, we introduce homogeneous linear matrix polynomials. Thus we consider the polynomial $\mu A + \lambda B$, where $\mu, \lambda \in \mathbb{C}$. Note that every minor $m(\lambda, \mu)$ of order $r$ of $\mu A + \lambda B$ is a polynomial of the two complex variables $\mu$ and $\lambda$ that is homogeneous of order $r$, in the sense that $m(a\lambda, a\mu) = a^r m(\lambda, \mu)$ for every $a, \lambda, \mu \in \mathbb{C}$. For a fixed $r$, $1 \le r \le n$, let $p_r(\lambda, \mu)$ be the greatest common divisor, in the set of homogeneous polynomials, of all the nonzero minors $m_1(\lambda, \mu), \ldots, m_s(\lambda, \mu)$ of order $r$ of $\mu A + \lambda B$. In other words, $p_r(\lambda, \mu)$ is a homogeneous polynomial that divides each $m_i(\lambda, \mu)$, and if $q(\lambda, \mu)$ is another homogeneous polynomial with this property, then $q(\lambda, \mu)$ divides $p_r(\lambda, \mu)$. Clearly, $p_{r-1}(\lambda, \mu)$ divides $p_r(\lambda, \mu)$. The polynomials $p_1(\lambda, \mu), \ldots, p_n(\lambda, \mu)$ are called the invariant polynomials of $\mu A + \lambda B$. As each minor $m(\lambda, \mu)$ of $\mu A + \lambda B$ is a homogeneous polynomial in $\lambda$ and $\mu$, it admits factorizations of the form
\[
m(\lambda, \mu) = \mu^{\gamma_0}\prod_{j=1}^{q}(\lambda + \alpha_j\mu)^{\gamma_j}
= \lambda^{\gamma_0'}\prod_{j=1}^{q'}(\mu + \alpha_j'\lambda)^{\gamma_j'}
\]
for some complex numbers $\alpha_j$ and $\alpha_j'$. (In fact, the nonzero $\alpha_j'$ values are the reciprocals of the nonzero $\alpha_j$ values.) Using factorizations of this kind, it is easily seen that $p_1(\lambda, 1), \ldots, p_n(\lambda, 1)$ are the invariant polynomials of $A + \lambda B$, whereas $p_1(1, \mu), \ldots, p_n(1, \mu)$ are the invariant polynomials of $\mu A + B$.
Returning to the proof of Theorem A.5.2, assume that the elementary divisors of $A + \lambda B$ and $A_1 + \lambda B_1$, including those at infinity, are the same. This means that the invariant polynomials of $A + \lambda B$ and $A_1 + \lambda B_1$ are the same, and so are the invariant polynomials of $\mu A + B$ and $\mu A_1 + B_1$. Since a homogeneous polynomial $p(\lambda, \mu)$ in $\lambda$ and $\mu$ is uniquely defined by $p(\lambda, 1)$ and $p(1, \mu)$, it follows from the discussion in the preceding paragraph that the invariant polynomials of $\mu A + \lambda B$ and of $\mu A_1 + \lambda B_1$ are the same.

Now we make a change of variables: $\lambda = x_1\tilde\lambda + x_2\tilde\mu$, $\mu = y_1\tilde\lambda + y_2\tilde\mu$, where $x_1y_2 - x_2y_1 \ne 0$. Then the invariant polynomials of $\tilde\mu\tilde A + \tilde\lambda\tilde B$ and of $\tilde\mu\tilde A_1 + \tilde\lambda\tilde B_1$ are again the same, where
\[
\tilde A = y_2A + x_2B\,,\quad \tilde B = y_1A + x_1B\,,\quad
\tilde A_1 = y_2A_1 + x_2B_1\,,\quad \tilde B_1 = y_1A_1 + x_1B_1
\]
As the polynomials $A + \lambda B$ and $A_1 + \lambda B_1$ are regular, we can choose $x_1$ and $y_1$ in such a way that $\det\tilde B \ne 0$ and $\det\tilde B_1 \ne 0$. Apply Proposition A.5.1 to deduce that $\tilde A + \lambda\tilde B$ and $\tilde A_1 + \lambda\tilde B_1$ are strictly equivalent:
\[
P\tilde AQ = \tilde A_1\,,\qquad P\tilde BQ = \tilde B_1
\]
for some invertible matrices $P$ and $Q$. Since
\[
A = \frac{1}{\Delta}(x_1\tilde A - x_2\tilde B)\,,\qquad B = \frac{1}{\Delta}(-y_1\tilde A + y_2\tilde B)
\]
where $\Delta = x_1y_2 - x_2y_1$, and similarly for $A_1$ and $B_1$, we obtain $PAQ = A_1$, $PBQ = B_1$, and the strict equivalence of $A + \lambda B$ and $A_1 + \lambda B_1$ follows. $\square$

Theorem A.5.2 allows us to obtain the canonical form for strict equivalence of regular linear matrix polynomials, as follows.

Theorem A.5.3
Every regular linear matrix polynomial $A + \lambda B$ is strictly equivalent to a linear polynomial of the form
\[
(I_{k_1} + \lambda J_{k_1}(0)) \oplus \cdots \oplus (I_{k_p} + \lambda J_{k_p}(0))
\oplus (\lambda I_{l_1} + J_{l_1}(\lambda_1)) \oplus \cdots \oplus (\lambda I_{l_q} + J_{l_q}(\lambda_q))
\tag{A.5.1}
\]
where $J_k(\lambda)$ is the $k \times k$ Jordan block with eigenvalue $\lambda$. The linear polynomial (A.5.1) is uniquely determined by $A + \lambda B$. In fact, $\lambda^{k_1}, \ldots, \lambda^{k_p}$ are the elementary divisors at infinity of $A + \lambda B$, whereas $(\lambda + \lambda_i)^{l_i}$, $i = 1, \ldots, q$, are the elementary divisors of $A + \lambda B$.

Proof. Let $A_1 + \lambda B_1$ be the polynomial (A.5.1). Using Proposition A.3.3, we see immediately that $(\lambda + \lambda_i)^{l_i}$, $i = 1, \ldots, q$, are the elementary divisors of $A_1 + \lambda B_1$ and $\lambda^{k_i}$, $i = 1, \ldots, p$, are its elementary divisors at infinity. If the strict equivalence claimed by the theorem holds, it follows from Theorem A.5.2 that (A.5.1) is uniquely determined by $A + \lambda B$, and that $A + \lambda B$ must have the specified elementary divisors.
It remains to prove that there is a strict equivalence of the required form. Let $c \in \mathbb{C}$ be such that $\det(A + cB) \ne 0$. Write $A + \lambda B = (A + cB) + (\lambda - c)B$, multiply on the left by $(A + cB)^{-1}$, and apply a similarity transformation reducing $(A + cB)^{-1}B$ to Jordan form. We obtain
\[
A + \lambda B \overset{s}{\sim} (I + (\lambda - c)J_0) \oplus (I + (\lambda - c)J_1)
\tag{A.5.2}
\]
where $J_0$ is a nilpotent Jordan matrix (i.e., $J_0^l = 0$ for some $l$) and $J_1$ is an invertible Jordan matrix. Multiply the first diagonal block on the right-hand side of (A.5.2) by $(I - cJ_0)^{-1}$. It is easily verified that
\[
\{I + (\lambda - c)J_0\}(I - cJ_0)^{-1} = I + \lambda J_0(I - cJ_0)^{-1}
\]
and since $J_0(I - cJ_0)^{-1}$ is also nilpotent, $I + \lambda J_0(I - cJ_0)^{-1}$ is similar to a matrix polynomial of the form
\[
(I_{k_1} + \lambda J_{k_1}(0)) \oplus \cdots \oplus (I_{k_p} + \lambda J_{k_p}(0))
\]
Multiply the second diagonal block on the right-hand side of (A.5.2) by $J_1^{-1}$ and reduce $J_1^{-1}$ to its Jordan form by similarity. We find that
\[
I + (\lambda - c)J_1 \overset{s}{\sim} (\lambda I_{l_1} + J_{l_1}(\lambda_1)) \oplus \cdots \oplus (\lambda I_{l_q} + J_{l_q}(\lambda_q))
\]
for some complex numbers $\lambda_1, \ldots, \lambda_q$ and some positive integers $l_1, \ldots, l_q$. $\square$

A.6 THE REDUCTION THEOREM FOR SINGULAR POLYNOMIALS

Consider now the singular polynomial $A + \lambda B$, where $A$ and $B$ are $m \times n$ matrices. Singularity means that either $m \ne n$, or $m = n$ but $\det(A + \lambda B)$ is identically zero. Let $r$ be the rank of $A + \lambda B$, that is, the size of the largest minors of $A + \lambda B$ that do not vanish identically. Then either $r < m$ or $r < n$ holds (or both). Assume $r < n$. Then the columns of the matrix polynomial $A + \lambda B$ are linearly dependent; that is, the equation
\[
(A + \lambda B)x = 0\,,\qquad \lambda \in \mathbb{C}
\tag{A.6.1}
\]
where $x$ is an unknown vector, has a nonzero solution.

Let us check first that there is a vector polynomial $x = x(\lambda) \not\equiv 0$ for which (A.6.1) is satisfied. For this purpose we can use the Smith form $D(\lambda)$ of $A + \lambda B$ in place of $A + \lambda B$ itself (see Theorem A.1.1). But because of the
assumption $r < n$, the last column of $D(\lambda)$ is zero. Hence $D(\lambda)x = 0$ is satisfied with $x = \langle 0, \ldots, 0, 1\rangle$. The following example is important in the sequel.

EXAMPLE A.6.1. Let
\[
L_\varepsilon(\lambda) = \begin{bmatrix}
\lambda & 1 & 0 & \cdots & 0 & 0 \\
0 & \lambda & 1 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & \lambda & 1
\end{bmatrix}
\]
be an $\varepsilon \times (\varepsilon + 1)$ linear matrix polynomial ($\varepsilon = 1, 2, \ldots$). We claim that the minimal degree of a nonzero vector polynomial solution $x(\lambda)$ of the equation $L_\varepsilon(\lambda)x(\lambda) = 0$ is $\varepsilon$. Indeed, rewrite this equation in the form
\[
\lambda x_1(\lambda) + x_2(\lambda) = 0\,,\quad \lambda x_2(\lambda) + x_3(\lambda) = 0\,,\;\ldots,\;
\lambda x_\varepsilon(\lambda) + x_{\varepsilon+1}(\lambda) = 0
\]
where $x_j(\lambda)$ is the $j$th coordinate of $x(\lambda)$. So
\[
x_k(\lambda) = (-1)^{k-1}\lambda^{k-1}x_1(\lambda)\,,\qquad k = 1, 2, \ldots, \varepsilon + 1
\]
and the minimal degree for $x(\lambda)$ (which is equal to $\varepsilon$) is obtained by taking $x_1(\lambda)$ to be a nonzero constant. $\square$
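The claim of Example A.6.1 can be reproduced mechanically: the nullspace of $L_\varepsilon(\lambda)$ over the rational functions, after clearing denominators, is spanned by the solution of degree $\varepsilon$. A sketch (the helper name L is ours; sympy assumed):

    from functools import reduce
    from sympy import Matrix, symbols, zeros, lcm, denom

    lam = symbols('lam')

    def L(eps):
        """The eps x (eps + 1) polynomial L_eps of Example A.6.1."""
        M = zeros(eps, eps + 1)
        for i in range(eps):
            M[i, i], M[i, i + 1] = lam, 1
        return M

    v = L(3).nullspace()[0]                      # rational-function entries
    v = v * reduce(lcm, [denom(e) for e in v])   # clear denominators
    print(v.T)
    # expected, up to a nonzero scalar factor: [[1, -lam, lam**2, -lam**3]]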
Among all not identically zero polynomial solutions $x(\lambda)$ of (A.6.1), we choose one of least degree $\varepsilon$ and write
\[
x(\lambda) = x_0 - \lambda x_1 + \lambda^2x_2 - \cdots + (-1)^\varepsilon\lambda^\varepsilon x_\varepsilon\,,\qquad x_\varepsilon \ne 0
\tag{A.6.2}
\]
The following reduction theorem holds.

Theorem A.6.1
If $\varepsilon$ is the minimal degree of a nonzero polynomial solution of (A.6.1), and if $\varepsilon > 0$, then $A + \lambda B$ is strictly equivalent to a linear matrix polynomial of the form
\[
\begin{bmatrix} L_\varepsilon & 0 \\ 0 & \widetilde A + \lambda\widetilde B \end{bmatrix}
\tag{A.6.3}
\]
where
\[
L_\varepsilon = \begin{bmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda & 1
\end{bmatrix}
\tag{A.6.4}
\]
is an $\varepsilon \times (\varepsilon + 1)$ matrix, and the equation
\[
(\widetilde A + \lambda\widetilde B)x = 0
\tag{A.6.5}
\]
has no nonzero polynomial solutions of degree less than $\varepsilon$.

It is convenient to state and prove a lemma to be used in the proof of Theorem A.6.1. For an $m \times n$ matrix polynomial $U + \lambda V$, let
\[
M_i[U + \lambda V] = \begin{bmatrix}
U & 0 & \cdots & 0 \\
V & U & & \vdots \\
0 & V & \ddots & 0 \\
\vdots & & \ddots & U \\
0 & \cdots & 0 & V
\end{bmatrix}
\]
be a matrix of size $m(i + 2) \times n(i + 1)$, for $i = 0, 1, 2, \ldots$.

Lemma A.6.2
Assume that the rank of $U + \lambda V$ is less than $n$. Then $\varepsilon$ is the minimal degree of nonzero polynomial solutions $y(\lambda)$ of
\[
(U + \lambda V)y(\lambda) = 0\,,\qquad \lambda \in \mathbb{C}
\tag{A.6.6}
\]
if and only if
\[
\operatorname{rank} M_i[U + \lambda V] = (i + 1)n\,,\;\; i = 0, \ldots, \varepsilon - 1\,,
\qquad\text{and}\qquad
\operatorname{rank} M_\varepsilon[U + \lambda V] < (\varepsilon + 1)n
\tag{A.6.7}
\]
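Lemma A.6.2 turns the search for the least degree $\varepsilon$ into a sequence of constant-matrix rank computations, which is essentially how Kronecker structure is explored numerically. A sketch (the helper names are ours; sympy assumed):

    from sympy import Matrix, zeros

    def M(U, V, i):
        """Block matrix M_i[U + lam*V]: i+1 block columns, U on the block
        diagonal and V directly below it; size m(i+2) x n(i+1)."""
        m, n = U.shape
        out = zeros(m*(i + 2), n*(i + 1))
        for j in range(i + 1):
            out[j*m:(j + 1)*m, j*n:(j + 1)*n] = U
            out[(j + 1)*m:(j + 2)*m, j*n:(j + 1)*n] = V
        return out

    def minimal_degree(U, V, max_deg=10):
        """Least degree of a nonzero polynomial solution of
        (U + lam*V)y(lam) = 0, via the rank criterion (A.6.7)."""
        n = U.shape[1]
        for i in range(max_deg + 1):
            if M(U, V, i).rank() < (i + 1)*n:
                return i
        return None

    # L_3 = U + lam*V with U the shift matrix and V = [I_3  0]:
    U = Matrix([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
    V = Matrix([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
    print(minimal_degree(U, V))   # expected: 3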
Proof. Let $y(\lambda) = \sum_{j=0}^{\varepsilon}\lambda^jy_j$ be a nonzero polynomial solution of (A.6.6) of least degree $\varepsilon$. Then
\[
Uy_0 = 0\,;\quad Vy_0 + Uy_1 = 0\,,\;\ldots,\; Vy_{\varepsilon-1} + Uy_\varepsilon = 0\,;\quad Vy_\varepsilon = 0
\]
or, equivalently,
\[
M_\varepsilon[U + \lambda V]\operatorname{col}[y_0, y_1, \ldots, y_\varepsilon] = 0
\]
Not all the vectors $y_j$ are zero, and so (A.6.7) follows. Conversely, if (A.6.7) holds, we may reverse the argument and obtain a nonzero polynomial solution of (A.6.6) of degree $\varepsilon$. $\square$

Proof of Theorem A.6.1. The proof is given in three steps. In the first step we show that
\[
A + \lambda B \overset{s}{\sim} \begin{bmatrix} L_\varepsilon & D + \lambda F \\ 0 & \widetilde A + \lambda\widetilde B \end{bmatrix}
\]
for suitable matrices $\widetilde A$, $\widetilde B$, $D$, and $F$; then we show that $\widetilde A + \lambda\widetilde B$ satisfies the conclusions of Theorem A.6.1; and finally we prove that
\[
\begin{bmatrix} L_\varepsilon & D + \lambda F \\ 0 & \widetilde A + \lambda\widetilde B \end{bmatrix}
\overset{s}{\sim}
\begin{bmatrix} L_\varepsilon & 0 \\ 0 & \widetilde A + \lambda\widetilde B \end{bmatrix}
\]
(a) Let (A.6.2) be a vector polynomial satisfying (A.6.1):
\[
(A + \lambda B)(x_0 - \lambda x_1 + \lambda^2x_2 - \cdots + (-1)^\varepsilon\lambda^\varepsilon x_\varepsilon) = 0\,,\qquad \lambda \in \mathbb{C}
\]
where $x_\varepsilon \ne 0$. This is equivalent to
\[
Ax_0 = 0\,,\quad Ax_1 = Bx_0\,,\;\ldots,\; Ax_\varepsilon = Bx_{\varepsilon-1}\,,\quad Bx_\varepsilon = 0
\tag{A.6.8}
\]
We claim that the vectors
\[
Ax_1, Ax_2, \ldots, Ax_\varepsilon
\tag{A.6.9}
\]
are linearly independent. Assume the contrary, and let $Ax_h$ ($h \ge 1$) be the first vector in (A.6.9) that is linearly dependent on the preceding ones:
\[
Ax_h = \alpha_1Ax_{h-1} + \alpha_2Ax_{h-2} + \cdots + \alpha_{h-1}Ax_1
\]
By (A.6.8) this equation can be rewritten as follows:
\[
Bx_{h-1} = \alpha_1Bx_{h-2} + \alpha_2Bx_{h-3} + \cdots + \alpha_{h-1}Bx_0
\]
that is, $Bx^*_{h-1} = 0$, where
\[
x^*_{h-1} = x_{h-1} - \alpha_1x_{h-2} - \alpha_2x_{h-3} - \cdots - \alpha_{h-1}x_0
\]
Furthermore, again by (A.6.8), we have
\[
Ax^*_{h-1} = B(x_{h-2} - \alpha_1x_{h-3} - \cdots - \alpha_{h-2}x_0) = Bx^*_{h-2}
\]
where
\[
x^*_{h-2} = x_{h-2} - \alpha_1x_{h-3} - \cdots - \alpha_{h-2}x_0
\]
Continuing the process and introducing the vectors
\[
x^*_{h-3} = x_{h-3} - \alpha_1x_{h-4} - \cdots - \alpha_{h-3}x_0\,,\;\ldots,\;
x^*_1 = x_1 - \alpha_1x_0\,,\quad x^*_0 = x_0
\]
we obtain the equations
\[
Bx^*_{h-1} = 0\,,\quad Ax^*_{h-1} = Bx^*_{h-2}\,,\;\ldots,\; Ax^*_1 = Bx^*_0\,,\quad Ax^*_0 = 0
\tag{A.6.10}
\]
From (A.6.10) it follows that
\[
x^*(\lambda) = x^*_0 - \lambda x^*_1 + \cdots + (-1)^{h-1}\lambda^{h-1}x^*_{h-1}
\]
is a nonzero solution of (A.6.1) of degree not exceeding $h - 1 < \varepsilon$, which is impossible. [The fact that this solution is not identically zero follows because $x^*_0 = x_0 \ne 0$; for if $x_0$ were zero, then $\lambda^{-1}x(\lambda)$ would be a polynomial solution of (A.6.1) of degree less than $\varepsilon$.] Thus the vectors (A.6.9) are linearly independent.

But then the vectors $x_0, \ldots, x_\varepsilon$ are linearly independent as well. Indeed, if $\sum_{i=0}^{\varepsilon}\alpha_ix_i = 0$, then $\sum_{i=1}^{\varepsilon}\alpha_iAx_i = 0$, and by the linear independence of (A.6.9), $\alpha_1 = \cdots = \alpha_\varepsilon = 0$. So $\alpha_0x_0 = 0$, and since $x_0 \ne 0$ we find that also $\alpha_0 = 0$.

Now write $A + \lambda B$ in a basis of $\mathbb{C}^n$ whose first $\varepsilon + 1$ vectors are $x_0, x_1, \ldots, x_\varepsilon$, and in a basis of $\mathbb{C}^m$ whose first $\varepsilon$ vectors are $Ax_1, \ldots, Ax_\varepsilon$. In view of equations (A.6.8), the polynomial $A + \lambda B$ in the new bases has the form
\[
\begin{bmatrix} L_\varepsilon & D + \lambda F \\ 0 & \widetilde A + \lambda\widetilde B \end{bmatrix}
\]
for some $D$, $F$, $\widetilde A$, and $\widetilde B$.

In the second step we show that the equation $(\widetilde A + \lambda\widetilde B)x = 0$ has no nonzero polynomial solutions of degree less than $\varepsilon$. Note that
\[
M_{\varepsilon-1}\Bigl[\begin{bmatrix} L_\varepsilon & D + \lambda F \\ 0 & \widetilde A + \lambda\widetilde B \end{bmatrix}\Bigr]
\tag{A.6.11}
\]
is obtained from
\[
\begin{bmatrix}
M_{\varepsilon-1}[L_\varepsilon] & M_{\varepsilon-1}[D + \lambda F] \\
0 & M_{\varepsilon-1}[\widetilde A + \lambda\widetilde B]
\end{bmatrix}
\tag{A.6.12}
\]
by a suitable permutation of rows and columns. By Lemma A.6.2 the rank of (A.6.11) is equal to $\varepsilon n$; that is, the columns of (A.6.11) are linearly independent. By the same lemma, taking into account Example A.6.1, $\operatorname{rank} M_{\varepsilon-1}[L_\varepsilon] = \varepsilon(\varepsilon + 1)$; that is, the square $\varepsilon(\varepsilon + 1) \times \varepsilon(\varepsilon + 1)$ matrix $M_{\varepsilon-1}[L_\varepsilon]$ is invertible. As the columns of (A.6.12) are linearly independent as well, we find that the columns of $M_{\varepsilon-1}[\widetilde A + \lambda\widetilde B]$ are linearly independent, that is, $\operatorname{rank} M_{\varepsilon-1}[\widetilde A + \lambda\widetilde B] = \varepsilon(n - \varepsilon - 1)$. Using Lemma A.6.2 again, we find that $(\widetilde A + \lambda\widetilde B)x = 0$ has no nonzero solutions of degree less than $\varepsilon$.

In the third step, replacing
\[
\begin{bmatrix} L_\varepsilon & D + \lambda F \\ 0 & \widetilde A + \lambda\widetilde B \end{bmatrix}
\]
by
\[
\begin{bmatrix} I & Y \\ 0 & I \end{bmatrix}
\begin{bmatrix} L_\varepsilon & D + \lambda F \\ 0 & \widetilde A + \lambda\widetilde B \end{bmatrix}
\begin{bmatrix} I & -X \\ 0 & I \end{bmatrix}
= \begin{bmatrix} L_\varepsilon & D + \lambda F + Y(\widetilde A + \lambda\widetilde B) - L_\varepsilon X \\ 0 & \widetilde A + \lambda\widetilde B \end{bmatrix}
\]
for suitable matrices $X$ and $Y$, we see that Theorem A.6.1 will be completely proved if we can show that $X$ and $Y$ can be chosen so that the matrix equation
\[
L_\varepsilon X = D + \lambda F + Y(\widetilde A + \lambda\widetilde B)
\tag{A.6.13}
\]
holds. We introduce notation for the elements of $D$, $F$, $X$, and also for the rows of $Y$ and the columns of $\widetilde A$ and $\widetilde B$:
\[
D = [d_{ik}]_{i,k=1}^{\varepsilon,\,n-\varepsilon-1}\,,\quad
F = [f_{ik}]_{i,k=1}^{\varepsilon,\,n-\varepsilon-1}\,,\quad
X = [x_{jk}]_{j,k=1}^{\varepsilon+1,\,n-\varepsilon-1}
\]
\[
Y = \operatorname{col}[y_1, \ldots, y_\varepsilon]\,,\quad
\widetilde A = [a_1, a_2, \ldots, a_{n-\varepsilon-1}]\,,\quad
\widetilde B = [b_1, b_2, \ldots, b_{n-\varepsilon-1}]
\]
Then the matrix equation (A.6.13) can be replaced by a system of scalar equations that expresses the equality of the elements of the $k$th columns on the right- and left-hand sides of (A.6.13). For $k = 1, 2, \ldots, n - \varepsilon - 1$ we obtain
\[
\begin{aligned}
\lambda x_{1k} + x_{2k} &= d_{1k} + \lambda f_{1k} + y_1a_k + \lambda y_1b_k \\
\lambda x_{2k} + x_{3k} &= d_{2k} + \lambda f_{2k} + y_2a_k + \lambda y_2b_k \\
&\;\;\vdots \\
\lambda x_{\varepsilon k} + x_{\varepsilon+1,k} &= d_{\varepsilon k} + \lambda f_{\varepsilon k} + y_\varepsilon a_k + \lambda y_\varepsilon b_k
\end{aligned}
\tag{A.6.14}
\]
The left-hand sides of these equations are linear polynomials in $\lambda$. The free term of each of the first $\varepsilon - 1$ of these polynomials is equal to the coefficient of $\lambda$ in the next polynomial. But then the right-hand sides must also satisfy this condition. Therefore, for $k = 1, 2, \ldots, n - \varepsilon - 1$, we obtain
\[
\begin{aligned}
y_1a_k - y_2b_k &= f_{2k} - d_{1k} \\
y_2a_k - y_3b_k &= f_{3k} - d_{2k} \\
&\;\;\vdots \\
y_{\varepsilon-1}a_k - y_\varepsilon b_k &= f_{\varepsilon k} - d_{\varepsilon-1,k}
\end{aligned}
\tag{A.6.15}
\]
If (A.6.15) holds, then the required elements of $X$ can obviously be determined from (A.6.14). It now remains to show that the system of equations (A.6.15) for the elements of $Y$ always has a solution for arbitrary $d_{ik}$ and $f_{jk}$ ($i = 1, 2, \ldots, \varepsilon$; $k = 1, 2, \ldots, n - \varepsilon - 1$). Rewrite (A.6.15) in the form
\[
[\,y_1, -y_2, y_3, \ldots, (-1)^{\varepsilon-1}y_\varepsilon\,]M_{\varepsilon-2}[\widetilde A + \lambda\widetilde B] = [H_1 \cdots H_{\varepsilon-1}]
\]
where
\[
H_j = [\,f_{j+1,1} - d_{j,1}, \ldots, f_{j+1,\,n-\varepsilon-1} - d_{j,\,n-\varepsilon-1}\,]\,,\qquad j = 1, \ldots, \varepsilon - 1
\]
and use the left invertibility of $M_{\varepsilon-2}[\widetilde A + \lambda\widetilde B]$ (ensured by Lemma A.6.2) to verify that (A.6.15) has a solution
\[
[\,y_1, -y_2, \ldots, (-1)^{\varepsilon-1}y_\varepsilon\,] = [H_1 \cdots H_{\varepsilon-1}]\{M_{\varepsilon-2}[\widetilde A + \lambda\widetilde B]\}_L^{-1}
\]
where the subscript $L$ denotes a left inverse. Theorem A.6.1 is now proved completely. $\square$
A.7 MINIMAL INDICES AND STRICT EQUIVALENCE OF LINEAR MATRIX POLYNOMIALS (GENERAL CASE)

We introduce the important notion of minimal indices for linear matrix polynomials. Let $A + \lambda B$ be an arbitrary linear matrix polynomial of size $m \times n$. Then $k$ polynomial columns $x_1(\lambda), x_2(\lambda), \ldots, x_k(\lambda)$ that are solutions of the equation
\[
(A + \lambda B)x = 0
\tag{A.7.1}
\]
are called linearly dependent if the rank of the polynomial matrix formed from these columns,
\[
X(\lambda) = [x_1(\lambda), x_2(\lambda), \ldots, x_k(\lambda)]
\]
is less than $k$. In that case there exist $k$ polynomials $p_1(\lambda), p_2(\lambda), \ldots, p_k(\lambda)$, not all identically zero, such that
\[
p_1(\lambda)x_1(\lambda) + p_2(\lambda)x_2(\lambda) + \cdots + p_k(\lambda)x_k(\lambda) \equiv 0
\tag{A.7.2}
\]
Indeed, let $X(\lambda) = E(\lambda)D(\lambda)F(\lambda)$ be the Smith form of $X(\lambda)$, where $E(\lambda)$ [resp. $F(\lambda)$] is an $n \times n$ (resp. $k \times k$) matrix polynomial with constant nonzero determinant, and
\[
D(\lambda) = \begin{bmatrix} \operatorname{diag}[d_1(\lambda), \ldots, d_r(\lambda)] & 0 \\ 0 & 0 \end{bmatrix}
\]
with nonzero polynomials $d_1(\lambda), \ldots, d_r(\lambda)$. As the rank $r$ of $X(\lambda)$ is less than $k$, the last column of $D(\lambda)$ is zero. One verifies that (A.7.2) is satisfied with
\[
(p_1(\lambda), \ldots, p_k(\lambda)) = F(\lambda)^{-1}(0, 0, \ldots, 0, 1)
\]
If polynomials $p_i(\lambda)$ (not all zero) with the property (A.7.2) do not exist, then the rank of $X(\lambda)$ is $k$, and we say that the solutions $x_1(\lambda), \ldots, x_k(\lambda)$ are linearly independent.

Among all the polynomial solutions of (A.7.1) we choose a nonzero solution $x_1(\lambda)$ of least degree $\varepsilon_1$. Among all polynomial solutions $x(\lambda)$ of the same equation for which $x_1(\lambda)$ and $x(\lambda)$ are linearly independent, we take a solution $x_2(\lambda)$ of least degree $\varepsilon_2$. Obviously, $\varepsilon_1 \le \varepsilon_2$. We continue the process, choosing from the polynomial solutions $x(\lambda)$ for which $x_1(\lambda)$, $x_2(\lambda)$, and $x(\lambda)$ are linearly independent a solution $x_3(\lambda)$ of minimal degree $\varepsilon_3$, and so on. Since the number of linearly independent solutions of (A.7.1) is always at most $n$, the process must come to an end. We obtain a fundamental series of solutions of (A.7.1),
\[
x_1(\lambda), x_2(\lambda), \ldots, x_p(\lambda)
\tag{A.7.3}
\]
having the degrees
\[
\varepsilon_1 \le \varepsilon_2 \le \cdots \le \varepsilon_p
\tag{A.7.4}
\]
Note that it may happen that some of the degrees $\varepsilon_1, \ldots, \varepsilon_p$ are zeros. [This is the case when (A.7.1) admits constant nonzero solutions.] In general, a fundamental series of solutions is not uniquely determined (even to within scalar factors) by the pencil $A + \lambda B$. However, note the following.
Proposition A.7.1
Two distinct fundamental series of solutions always have the same series of degrees $\varepsilon_1, \ldots, \varepsilon_p$.

Proof. In addition to (A.7.3), consider another fundamental series of solutions $\tilde x_1(\lambda), \tilde x_2(\lambda), \ldots$ with the degrees $\tilde\varepsilon_1, \tilde\varepsilon_2, \ldots$. Suppose that
\[
\varepsilon_1 = \cdots = \varepsilon_{n_1} < \varepsilon_{n_1+1} = \cdots = \varepsilon_{n_2} < \varepsilon_{n_2+1} = \cdots
\]
in (A.7.4) and, similarly,
\[
\tilde\varepsilon_1 = \cdots = \tilde\varepsilon_{m_1} < \tilde\varepsilon_{m_1+1} = \cdots = \tilde\varepsilon_{m_2} < \tilde\varepsilon_{m_2+1} = \cdots
\]
in the series $\tilde\varepsilon_1, \tilde\varepsilon_2, \ldots$. Obviously, $\varepsilon_1 = \tilde\varepsilon_1$. For every vector $\tilde x_i(\lambda)$ ($i = 1, \ldots, m_1$) there exists a polynomial $q_i(\lambda) \not\equiv 0$ such that
\[
q_i(\lambda)\tilde x_i(\lambda) = \sum_{j=1}^{n_1}p_{ij}(\lambda)x_j(\lambda)\,,\qquad i = 1, \ldots, m_1
\tag{A.7.5}
\]
for some polynomials $p_{ij}(\lambda)$. [Otherwise, $\tilde x_i, x_1, \ldots, x_{n_1}$ would be linearly independent, and one could replace $x_{n_1+1}$ by $\tilde x_i$, which is of smaller degree, contrary to the definition of $x_{n_1+1}$.] Rewrite (A.7.5) in the form
\[
[\tilde x_1(\lambda) \cdots \tilde x_{m_1}(\lambda)]Q(\lambda) = [x_1(\lambda) \cdots x_{n_1}(\lambda)]P(\lambda)
\tag{A.7.6}
\]
where $Q(\lambda) = \operatorname{diag}[q_1(\lambda), \ldots, q_{m_1}(\lambda)]$ and $P(\lambda)$ is the $n_1 \times m_1$ matrix polynomial with $p_{ij}(\lambda)$ in the $(j, i)$ position. As $\tilde x_1(\lambda), \ldots, \tilde x_{m_1}(\lambda)$ are linearly independent, there is a nonzero minor $f(\lambda)$ of order $m_1$ of $[\tilde x_1(\lambda) \cdots \tilde x_{m_1}(\lambda)]$. So for every $\lambda \in \mathbb{C}$ that is not a zero of any of the polynomials $f(\lambda), q_1(\lambda), \ldots, q_{m_1}(\lambda)$, the rank of the matrix on the left-hand side of (A.7.6) is $m_1$. Hence (A.7.6) implies $m_1 \le n_1$. Interchanging the roles of $\tilde x_i(\lambda)$ and $x_i(\lambda)$, we find the opposite inequality $m_1 \ge n_1$. As $m_1 = n_1$, we have $\varepsilon_{n_1+1} = \tilde\varepsilon_{m_1+1}$, and we can repeat the above argument with $n_2$ and $m_2$ in place of $n_1$ and $m_1$, respectively, and so on. $\square$

The degrees $\varepsilon_1 \le \cdots \le \varepsilon_p$ of the polynomials in any fundamental series of polynomial solutions of (A.7.1) are called the minimal column indices of $A + \lambda B$. As Proposition A.7.1 shows, the number $p$ of the minimal column indices and the indices themselves do not depend on the choice of the fundamental series. If there are no nonzero solutions of (A.7.1) (i.e., the rank of $A + \lambda B$ is equal to $n$), we say that the number of minimal column indices is zero; in this case no such indices are defined. We define the minimal row indices of $A + \lambda B$ as the minimal column indices of $A^* + \lambda B^*$.
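The minimal column indices can be computed from the nullities of the block matrices $M_d[A + \lambda B]$ of Section A.6: the nullity $\nu_d$ counts the polynomial solutions of degree at most $d$, and $\nu_d - \nu_{d-1}$ equals the number of indices at most $d$, a standard fact that we use here without proof. The sketch below reuses the helper M(U, V, i) from the sketch following Lemma A.6.2, and the generic rank of the pencil is computed symbolically.

    from sympy import Matrix, symbols

    lam = symbols('lam')

    def minimal_column_indices(A, B, max_deg=20):
        """Recover the minimal column indices of A + lam*B from the
        nullities of M_d[A + lam*B] (M as in the earlier sketch)."""
        n = A.shape[1]
        p = n - (A + lam*B).rank()      # number of indices (generic rank)
        indices, nu_prev, count_prev = [], 0, 0
        for d in range(max_deg + 1):
            nu = (d + 1)*n - M(A, B, d).rank()   # solutions of degree <= d
            count = nu - nu_prev                 # number of indices <= d
            indices += [d]*(count - count_prev)
            nu_prev, count_prev = nu, count
            if len(indices) == p:
                break
        return indices

    A = Matrix([[0, 1, 0], [0, 0, 1]])   # L_2 = A + lam*B
    B = Matrix([[1, 0, 0], [0, 1, 0]])
    print(minimal_column_indices(A, B))  # expected: [2]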
EXAMPLE A.7.1. Let $L_\varepsilon$ be as in Example A.6.1. The polynomial $L_\varepsilon$ has the single minimal column index $\varepsilon$, whereas the minimal row indices are absent. Indeed, as in Example A.6.1, observe that every nonzero polynomial solution $x(\lambda) = (x_1(\lambda), \ldots, x_{\varepsilon+1}(\lambda))$ of
\[
L_\varepsilon x(\lambda) = 0
\tag{A.7.7}
\]
has the form
\[
x_k(\lambda) = (-1)^{k-1}\lambda^{k-1}x_1(\lambda)\,,\qquad k = 1, \ldots, \varepsilon + 1
\tag{A.7.8}
\]
and a solution $\hat x(\lambda)$ of minimal degree $\varepsilon$ is obtained by taking $x_1(\lambda) = 1$. Hence the first minimal column index of $L_\varepsilon$ is $\varepsilon$. As (A.7.8) shows, every other solution $x(\lambda)$ of (A.7.7) has the form $x(\lambda) = x_1(\lambda)\hat x(\lambda)$, where $x_1(\lambda)$ is the first coordinate of $x(\lambda)$. So $x(\lambda)$ and $\hat x(\lambda)$ are linearly dependent, which means that there are no more minimal column indices. As the rows of $L_\varepsilon$ are linearly independent for every $\lambda$, the minimal row indices are absent. Similarly, we conclude that the transposed polynomial $L_\varepsilon^T$ has the single minimal row index $\varepsilon$ and no minimal column indices. $\square$

The importance of minimal indices stems from their invariance under strict equivalence, as follows.

Proposition A.7.2
If $A + \lambda B \overset{s}{\sim} A_1 + \lambda B_1$, then the minimal column indices of the polynomials $A + \lambda B$ and $A_1 + \lambda B_1$ are the same, and the minimal row indices of these polynomials are also the same.

The proof is immediate: if $P(A + \lambda B)Q = A_1 + \lambda B_1$ for invertible matrices $P$ and $Q$, then the solutions of $(A + \lambda B)x(\lambda) = 0$ are obtained from the solutions of $(A_1 + \lambda B_1)y(\lambda) = 0$ by multiplication by $Q$: $x(\lambda) = Qy(\lambda)$, which preserves linear dependence and independence and also implies that $x(\lambda)$ and $y(\lambda)$ have the same degree.

We are now in a position to state and prove the main result concerning strict equivalence of linear matrix polynomials in general. We denote by $L_\varepsilon$ the $\varepsilon \times (\varepsilon + 1)$ linear polynomial
\[
L_\varepsilon = \begin{bmatrix}
\lambda & 1 & 0 & \cdots & 0 & 0 \\
0 & \lambda & 1 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & \lambda & 1
\end{bmatrix}
\tag{A.7.9}
\]
and $L_\eta^T$ is its transpose [which is an $(\eta + 1) \times \eta$ linear polynomial]. Then $0_{u \times v}$
will denote the zero $u \times v$ matrix. As before, $J_k(\lambda_0)$ represents the $k \times k$ Jordan block with eigenvalue $\lambda_0$.

Theorem A.7.3
Every $m \times n$ linear matrix polynomial $A + \lambda B$ is strictly equivalent to a unique linear matrix polynomial of type
\[
0_{u \times v} \oplus L_{\varepsilon_1} \oplus \cdots \oplus L_{\varepsilon_p}
\oplus L_{\eta_1}^T \oplus \cdots \oplus L_{\eta_q}^T
\oplus (I_{k_1} + \lambda J_{k_1}(0)) \oplus \cdots \oplus (I_{k_r} + \lambda J_{k_r}(0))
\oplus (\lambda I_{l_1} + J_{l_1}(\lambda_1)) \oplus \cdots \oplus (\lambda I_{l_s} + J_{l_s}(\lambda_s))
\tag{A.7.10}
\]
Here $\varepsilon_1 \le \cdots \le \varepsilon_p$ and $\eta_1 \le \cdots \le \eta_q$ are positive integers; $k_1, \ldots, k_r$ and $l_1, \ldots, l_s$ are positive integers; $\lambda_1, \ldots, \lambda_s$ are complex numbers.

The uniqueness of the linear matrix polynomial of type (A.7.10) to which $A + \lambda B$ is strictly equivalent means that the parameters $u$, $v$, $p$, $q$, $r$, $s$, $\{\varepsilon_i\}_{i=1}^{p}$, $\{\eta_j\}_{j=1}^{q}$, $\{k_i\}_{i=1}^{r}$, $\{l_j\}_{j=1}^{s}$, and $\{\lambda_j\}_{j=1}^{s}$ are uniquely determined by the polynomial $A + \lambda B$. It may happen that some of the numbers $u$, $v$, $p$, $q$, $r$, and $s$ are zeros. This means that the corresponding part is missing from formula (A.7.10).

Proof of Theorem A.7.3. Let $x_1, \ldots, x_v \in \mathbb{C}^n$ be a basis of the linear space of all constant solutions of the equation
\[
(A + \lambda B)x = 0\,,\qquad \lambda \in \mathbb{C}
\tag{A.7.11}
\]
that is, of all solutions that are independent of $\lambda$. Note that (A.7.11) is equivalent to the simultaneous equations
\[
Ax = 0\,,\qquad Bx = 0
\]
Likewise, let $y_1, \ldots, y_u \in \mathbb{C}^m$ be a basis of the linear space of all constant solutions of
\[
(A^* + \lambda B^*)y = 0\,,\qquad \lambda \in \mathbb{C}
\]
or, what is the same, of the simultaneous equations
\[
A^*y = 0\,,\qquad B^*y = 0
\]
Write $A + \lambda B$ (understood for each $\lambda \in \mathbb{C}$ as a transformation written in the standard orthonormal bases of $\mathbb{C}^n$ and $\mathbb{C}^m$) as a matrix with respect to a basis of $\mathbb{C}^n$ whose first $v$ vectors are $x_1, \ldots, x_v$ and a basis of $\mathbb{C}^m$ whose
first $u$ vectors are $y_1, \ldots, y_u$, the others being orthogonal to $\operatorname{Span}\{y_1, \ldots, y_u\}$. Because $\operatorname{Im} A = (\operatorname{Ker} A^*)^{\perp} \subseteq (\operatorname{Span}\{y_1, \ldots, y_u\})^{\perp}$ and also $\operatorname{Im} B \subseteq (\operatorname{Span}\{y_1, \ldots, y_u\})^{\perp}$, it follows that, with respect to the indicated bases, $A + \lambda B$ has the form $0_{u \times v} \oplus (A_1 + \lambda B_1)$. Here $A_1 + \lambda B_1$ has the property that neither $(A_1 + \lambda B_1)x = 0$ nor $(A_1^* + \lambda B_1^*)y = 0$ has constant nonzero solutions.

If the rank of $A_1 + \lambda B_1$ is less than the number of columns of $A_1 + \lambda B_1$, apply the reduction theorem (Theorem A.6.1) several times to show that
\[
A_1 + \lambda B_1 \overset{s}{\sim} L_{\varepsilon_1} \oplus \cdots \oplus L_{\varepsilon_p} \oplus (A_2 + \lambda B_2)
\]
where $A_2 + \lambda B_2$ is such that the equation $(A_2 + \lambda B_2)x = 0$ has no nonzero polynomial solutions $x = x(\lambda)$. From the property of $\widetilde A + \lambda\widetilde B$ in Theorem A.6.1 it is clear that $\varepsilon_1 \le \cdots \le \varepsilon_p$. It is also clear that the process of consecutive applications of Theorem A.6.1 must terminate, for the simple reason that the size of $A_1 + \lambda B_1$ is finite. The Smith form of $A_2 + \lambda B_2$ (Theorem A.1.1) shows that the number of columns of the polynomial $A_2 + \lambda B_2$ coincides with its rank. If it happens that the rank of $A_2 + \lambda B_2$ is less than the number of its rows, apply the above procedure to $A_2^* + \lambda B_2^*$. After taking adjoints, we find that
\[
A_2 + \lambda B_2 \overset{s}{\sim} L_{\eta_1}^T \oplus \cdots \oplus L_{\eta_q}^T \oplus (A_3 + \lambda B_3)
\]
where $0 < \eta_1 \le \cdots \le \eta_q$ and the rank of $A_3 + \lambda B_3$ coincides with both the number of columns and the number of rows of $A_3 + \lambda B_3$. In other words, $A_3 + \lambda B_3$ is regular. It remains to apply Theorem A.5.3 in order to show that the original polynomial $A + \lambda B$ is strictly equivalent to a polynomial of type (A.7.10).

It remains to show that such a polynomial (A.7.10) is unique. Proposition A.7.2 and Example A.7.1 show that the minimal column indices of $A + \lambda B$ are $0, \ldots, 0, \varepsilon_1, \ldots, \varepsilon_p$ (where 0 appears $v$ times) and the minimal row indices of $A + \lambda B$ are $0, \ldots, 0, \eta_1, \ldots, \eta_q$ (where 0 appears $u$ times). Hence the parameters $u$, $v$, $p$, $q$, $\{\varepsilon_i\}_{i=1}^{p}$, and $\{\eta_j\}_{j=1}^{q}$ are uniquely determined by $A + \lambda B$. Further, observe that $L_\varepsilon$ and $L_\eta^T$ have no elementary divisors; that is, their Smith forms are $[I_\varepsilon \;\; 0]$ and $\begin{bmatrix} I_\eta \\ 0 \end{bmatrix}$, respectively. (This follows from Theorem A.2.2, since both $L_\varepsilon$ and $L_\varepsilon^T$ have an $\varepsilon \times \varepsilon$ minor that is equal to 1.) Using Proposition A.3.3, we see that the elementary divisors of (A.7.10) are $(\lambda + \lambda_1)^{l_1}, \ldots, (\lambda + \lambda_s)^{l_s}$, which must coincide with the elementary divisors of $A + \lambda B$ because of the strict equivalence of $A + \lambda B$ and (A.7.10) (Theorem A.3.1). Hence the parameters $s$, $\{l_j\}_{j=1}^{s}$, and $\{\lambda_j\}_{j=1}^{s}$ are also uniquely determined by $A + \lambda B$. Applying this argument to $\lambda A + B$ in place of $A + \lambda B$, we see that $r$ and $\{k_i\}_{i=1}^{r}$ are uniquely determined by $A + \lambda B$ as well. $\square$
The matrix polynomial (A.7.10) is called the Kronecker canonical form of $A + \lambda B$. Here $0, \ldots, 0, \varepsilon_1, \ldots, \varepsilon_p$ ($v$ times 0) are the minimal column indices of $A + \lambda B$; $0, \ldots, 0, \eta_1, \ldots, \eta_q$ ($u$ times 0) are the minimal row indices of $A + \lambda B$; $\lambda^{k_1}, \ldots, \lambda^{k_r}$ are the elementary divisors of $A + \lambda B$ at infinity; and $(\lambda + \lambda_1)^{l_1}, \ldots, (\lambda + \lambda_s)^{l_s}$ are the (finite) elementary divisors of $A + \lambda B$.

We obtain the following corollary from Theorem A.7.3.

Corollary A.7.4
We have $A + \lambda B \overset{s}{\sim} A_1 + \lambda B_1$ if and only if the polynomials $A + \lambda B$ and $A_1 + \lambda B_1$ have the same minimal column indices, the same minimal row indices, the same elementary divisors, and the same elementary divisors at infinity.

Thus Corollary A.7.4 describes the full set of invariants for strict equivalence of linear matrix polynomials.
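Conversely, given the invariants one can write down the Kronecker canonical form directly. The sketch below (our naming; sympy assumed, including its jordan_cell and block diag helpers) assembles (A.7.10) from lists of minimal indices and elementary divisors.

    from sympy import Matrix, symbols, eye, zeros, diag, jordan_cell

    lam = symbols('lam')

    def L(eps):
        """The eps x (eps + 1) block L_eps of (A.7.9)."""
        M = zeros(eps, eps + 1)
        for i in range(eps):
            M[i, i], M[i, i + 1] = lam, 1
        return M

    def kronecker_form(u, v, col_indices, row_indices, inf_divs, fin_divs):
        """Assemble (A.7.10); fin_divs is a list of pairs (l, lam0), each
        encoding the finite elementary divisor (lam + lam0)**l."""
        blocks = [zeros(u, v)]
        blocks += [L(e) for e in col_indices]
        blocks += [L(e).T for e in row_indices]
        blocks += [eye(k) + lam*jordan_cell(0, k) for k in inf_divs]
        blocks += [lam*eye(l) + jordan_cell(lam0, l) for l, lam0 in fin_divs]
        return diag(*blocks)

    # one zero block, column index 1, divisor lam**2 at infinity,
    # and the finite elementary divisor (lam - 3):
    print(kronecker_form(1, 1, [1], [], [2], [(1, -3)]))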
A.8 NOTES TO THE APPENDIX

This appendix contains well-known results on matrix polynomials. Essentially the entire material can be found in Chapters 6 and 12 of Gantmacher (1959), for example. In our exposition of Sections A.5-A.7 we follow this book. In the exposition of Sections A.1-A.4 we follow Gohberg, Lancaster, and Rodman (1982).

List of Notations and Conventions

$X \subseteq Y$ : inclusion between sets $X$ and $Y$ (equality not excluded)
$\mathbb{R}$ : the field of real numbers
$\mathbb{R}^n$ : the space of all $n$-dimensional real column vectors
$\mathbb{C}$ : the field of complex numbers
$\mathbb{C}^n$ : the space of all $n$-dimensional complex column vectors
$\bar x$ : complex conjugate of the complex number $x$
$\Re x = \tfrac{1}{2}(x + \bar x)$ : the real part of $x$
$\Im x = \tfrac{1}{2i}(x - \bar x)$ : the imaginary part of $x$
$\langle a_1, \ldots, a_n\rangle$ : the $n$-dimensional column vector with components $a_1, \ldots, a_n$
$(\cdot\,,\cdot)$ : the standard scalar product in $\mathbb{C}^n$: $(\langle a_1, \ldots, a_n\rangle, \langle b_1, \ldots, b_n\rangle) = \sum_{i=1}^{n} a_i\bar b_i$
$\|x\| = (x, x)^{1/2}$ : the norm of a vector $x = \langle x_1, \ldots, x_n\rangle \in \mathbb{C}^n$
$e_i = \langle 0, \ldots, 0, 1, 0, \ldots, 0\rangle$ (with 1 in the $i$th place) : the $i$th unit coordinate vector in $\mathbb{C}^n$; its size $n$ will be clear from the context
"Linear transformation" : often abbreviated to "transformation"; when convenient, a linear transformation from $\mathbb{C}^m$ into $\mathbb{C}^n$ is assumed to be given by an $n \times m$ matrix with respect to the bases $e_1, \ldots, e_m$ in $\mathbb{C}^m$ and $e_1, \ldots, e_n$ in $\mathbb{C}^n$; consequently, when convenient, an $n \times m$ matrix will be considered as a linear transformation written in the standard bases
$[a_{ij}]_{i,j=1}^{m,n}$ : the $m \times n$ matrix whose entry in the $(i, j)$ place is $a_{ij}$
$I$ : unit matrix; identity linear transformation (the size of $I$ is understood from the context)
$I_k$ : the $k \times k$ unit matrix
$A^T$ : the transpose of a matrix $A$
$A^*$ : the adjoint of a transformation $A$; the conjugate transpose of a matrix $A$
$\bar A$ : complex conjugate in every entry of a matrix $A$
$A^{-L}$ : left inverse of a matrix (or transformation) $A$
$A^{-R}$ : right inverse of a matrix (or transformation) $A$
$A^{I}$ : one-sided inverse (left or right) of $A$; generalized inverse of $A$
$\operatorname{tr} A$ : the trace of a matrix (or transformation) $A$
$\|A\| = \max_{x \ne 0}\|Ax\|/\|x\|$ : the norm of a transformation $A$
$A|_{\mathcal M}$ : the restriction of a transformation $A$ to its invariant subspace $\mathcal M$
$\operatorname{Im} A = \{Ay \mid y \in \mathbb{C}^m\}$ : the image of a transformation $A: \mathbb{C}^m \to \mathbb{C}^n$
$\operatorname{Ker} A = \{y \mid Ay = 0\}$ : the kernel of a transformation $A$
$\sigma(A) = \{\lambda \in \mathbb{C} \mid \operatorname{Ker}(A - \lambda I) \ne \{0\}\}$ : the spectrum of a matrix (or transformation) $A$
$\mathcal R_{\lambda}(A)$ : the root subspace of $A$ corresponding to its eigenvalue $\lambda$
$J_k(\lambda)$ : the Jordan block of size $k \times k$ with eigenvalue $\lambda$
$\operatorname{diag}[A_1, \ldots, A_p] = A_1 \oplus \cdots \oplus A_p$ : the block diagonal matrix with the matrices $A_1, \ldots, A_p$ along the main diagonal; or the direct sum of the linear transformations $A_1, \ldots, A_p$
$\operatorname{col}[z_i]_{i=1}^{p}$ : a block column matrix
$\operatorname{Inv}(A)$ : the set of all $A$-invariant subspaces
$\operatorname{Inv}_p(A)$ : the set of all $p$-dimensional $A$-invariant subspaces
$\operatorname{Cinv}(A)$ : the set of all coinvariant subspaces for $A$
$\operatorname{Sinv}(A)$ : the set of all semiinvariant subspaces for $A$
$\operatorname{Rinv}(A)$ : the set of all reducing invariant subspaces for $A$
$\operatorname{Rinv}_p(A)$ : the set of all $p$-dimensional reducing invariant subspaces for $A$
$\operatorname{Hinv}(A)$ : the set of all hyperinvariant subspaces for $A$
$\operatorname{Inv}^{\mathbb R}(A)$ : the set of all real invariant subspaces for a real transformation $A$
$\mathcal C(A)$ : the set of all transformations (or matrices) that commute with a transformation (or matrix) $A$
$\{0\}$ : the zero subspace
$\mathcal M^{\perp}$ : the orthogonal complement of a subspace $\mathcal M$
$\mathcal M \dotplus \mathcal N$ : direct sum of subspaces $\mathcal M$ and $\mathcal N$
$\mathcal M \oplus \mathcal N$ : orthogonal sum of subspaces $\mathcal M$ and $\mathcal N$
$S_{\mathcal M}$ : the unit sphere in a subspace $\mathcal M$
$d(x, Z) = \inf_{y \in Z}\|x - y\|$ : the distance between a point $x \in \mathbb{C}^n$ and a set $Z \subseteq \mathbb{C}^n$
$d(X, Y)$ : the distance between sets $X$ and $Y$
$\theta(\mathcal L, \mathcal M)$ : the gap between $\mathcal L$ and $\mathcal M$
$\eta(\mathcal L, \mathcal M)$ : the minimal opening between $\mathcal L$ and $\mathcal M$
$\hat\theta(\mathcal L, \mathcal M)$ : the spherical gap between $\mathcal L$ and $\mathcal M$
$\varphi_{\min}(\mathcal L, \mathcal M)$ : the minimal angle between subspaces $\mathcal L$ and $\mathcal M$
the metric space of all subspaces in $\mathbb{C}^n$
the set of all $m$-dimensional subspaces of $\mathbb{C}^n$
$\operatorname{Span}\{x_1, \ldots, x_k\}$ : the subspace spanned by the vectors $x_1, \ldots, x_k$
the algebra of all $n \times n$ matrices
the algebra of all transformations on a linear space $\mathcal L$
the algebra of all upper triangular Toeplitz matrices of size $j \times j$
$\operatorname{Inv}(V)$ : the lattice of all invariant subspaces for an algebra $V$
$\operatorname{Alg}(\Lambda)$ : the algebra of all transformations for which every subspace from a lattice $\Lambda$ is invariant
the set of all $n \times n$ unitary matrices
the set of all $n \times n$ real orthogonal matrices with determinant 1
$GL_{\mathbb R}(n)$ : the set of all real invertible $n \times n$ matrices
$\delta(W)$ : the McMillan degree of a rational matrix function $W(\lambda)$
$S(A)$ : the singular set of an analytic family of transformations $A(z)$
$\delta_{ij}$ : Kronecker index: $\delta_{ij} = 0$ if $i \ne j$; $\delta_{ii} = 1$
$\binom{u}{v} = \dfrac{u!}{v!\,(u - v)!}$ : ($u \ge v$ are positive integers; $0! = 1$)
the number of distinct elements in a finite set $K$
$\square$ : end of a proof or an example
References Alien, G, R., "Hoiomorphic vector-valued functions on a domain of holomorphy," J. London Math. Soc. 42, 509-513 (1967), Bart, H., I. Gohberg, and M. A, Kaashoek, "Stable factorization of monk matrix polynomials and stable invariant subspaces," Integral Equations and Operator Theory 1, 496-517 0978). Bart, H., I. Gohberg, and M. A. Kaashoek, Minimal Factorization of Matrix and Operator Functions (Operator Theory: Advances and Applications, Vol, 1) Birkhauser, Base!, 1979, Bart, H., I. Gohberg, M. A, Kaashoek, and P, Van Dooren, "Factorizations of transfer functions," SI AM J. Control Optim. 18(6), 675-696 (1980). Bauntgartei, H. Analytic Perturbation Theory for Matrices and Operators (Operator Theory: Advances and Applications, Vol. 15) Birkhauser, Basel-Boston-Stuttgart, 1985. den Boer, H., aad G. Ph. A. Thijsse, "Semistability of sums of partial multiplicities under additive perturbations," Integral Equations and Operator Theory 3, 23-42 (1980). Bochner, S., and W, T. Martin, Several Complex Variables, Princeton University Press, Princeton, NJ, 1948. Brickman, L., and P. A. Fillmore, "The invariant subspace lattice of a linear transformaton," Canad. J. Math, 19, 810-822 (1967). Brockett, R., Finite Dimensional Linear Systems, John Wiley & Sons, New York, 1970. Brunovsky, P., "A classification of linear controllable systems," Kybernetika (Praha) 3, 173-187 (1970). Campbell, S., and J. Daughtry, "The stable solutions of quadratic matrix equations," Proc. AMS 74, 19-23 (1979). Choi, M.-D., C. Laurie, and H. Radjavi, "On comnttttators and invariant subspaces," Linear and Multilinear Algebra 9, 329-340 (1981). Coddington, E, A., and N. Levinson, Theory of Ordinary Differential Equations, McGraw- Hill, New York, 1955. Conway, J. B., and P. R, Halntos, "Finite-dimensional points of continuity of Lat," Linear Algebra Appt. 31, 93-102 (1980). Djaferis, T. E., and S. K. Miner, "Some generic invariant factor assignment results using dynamic output feedback," Linear Algebra Appl. 5t, 103-131 (1983). Donnellan, T., Lattice Theory, Pergatnon Press, Oxford, 1968. Douglas, R. G., and C. Pearcy, "On a topology for invariant subspaces," J. Functional Anaty. 2, 323-341 (1968). Fillmore, P. A., D. A. Herrero, and W. E. Longstaff, "The hyperinvariant subspaces lattice of a linear transformation," Linear Algebra Appl. 17, 125-132 (1977). Ganttnacher, F. R., The Theory of Matrices, Vols. I and II, Chelsea, New York, 1959. Gochberg, L Z., and J. Leiterer, "Uber Algebren stetiger Operatorfuncttonen," Studia Mathematica, Vol. LVI1, 1-26, 1976. Gohberg, L, and S. Goldberg, Basic Operator Theory, Birkhauser, Basel, 1981. 683
Gohberg, I., and G. Heinig, "The resultant matrix and its generalizations, I. The resultant operator for matrix polynomials," Acta Sci. Math. (Szeged) 37, 41-61 (Russian) (1975).
Gohberg, I., and M. A. Kaashoek, "Unsolved problems in matrix and operator theory, II. Partial multiplicities of a product," Integral Equations and Operator Theory 2, 116-120 (1979).
Gohberg, I., M. A. Kaashoek, and F. van Schagen, "Similarity of operator blocks and canonical forms. I. General results, feedback equivalence and Kronecker indices," Integral Equations and Operator Theory 3, 350-396 (1980).
Gohberg, I., M. A. Kaashoek, and F. van Schagen, "Similarity of operator blocks and canonical forms. II. Infinite dimensional case and Wiener-Hopf factorization," in Topics in Modern Operator Theory. Operator Theory: Advances and Applications, Vol. 2, Birkhäuser-Verlag, 1981, pp. 121-170.
Gohberg, I., M. A. Kaashoek, and F. van Schagen, "Rational matrix and operator functions with prescribed singularities," Integral Equations and Operator Theory 5, 673-717 (1982).
Gohberg, I. C., and M. G. Krein, "The basic propositions on defect numbers, root numbers and indices of linear operators," Uspehi Mat. Nauk 12, 43-118 (1957); translation, Russian Math. Surveys 13, 185-264 (1960).
Gohberg, I., and N. Krupnik, Einführung in die Theorie der eindimensionalen singulären Integraloperatoren, Birkhäuser, Basel, 1979.
Gohberg, I., P. Lancaster, and L. Rodman, "Perturbation theory for divisors of operator polynomials," SIAM J. Math. Anal. 10, 1161-1183 (1979).
Gohberg, I., P. Lancaster, and L. Rodman, Matrix Polynomials, Academic Press, New York, 1982.
Gohberg, I., P. Lancaster, and L. Rodman, "A sign characteristic for self-adjoint meromorphic matrix functions," Applicable Analysis 16, 165-185 (1983a).
Gohberg, I., P. Lancaster, and L. Rodman, Matrices and Indefinite Scalar Products (Operator Theory: Advances and Applications, Vol. 8), Birkhäuser-Verlag, Basel, 1983b.
Gohberg, I., and Ju. Leiterer, "On holomorphic vector-functions of one variable, I. Functions on a compact set," Matem. Issled. 7, 60-84 (Russian) (1972).
Gohberg, I., and Ju. Leiterer, "On holomorphic vector-functions of one variable, II. Functions on domains," Matem. Issled. 8, 37-58 (Russian) (1973).
Gohberg, I. C., and A. S. Markus, "Two theorems on the gap between subspaces of a Banach space," Uspehi Mat. Nauk 14, 135-140 (Russian) (1959).
Gohberg, I., and L. Rodman, "Analytic matrix functions with prescribed local data," J. d'Analyse Math. 40, 90-128 (1981).
Gohberg, I., and L. Rodman, "On distance between lattices of invariant subspaces of matrices," Linear Algebra Appl. 76, 85-120 (1986).
Gohberg, I., and S. Rubinstein, "Stability of minimal fractional decompositions of rational matrix functions," in Operator Theory: Advances and Applications, Vol. 18, Birkhäuser, Basel, 1986, pp. 249-270.
Golub, G. H., and C. F. van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 1983.
Golub, G. H., and J. H. Wilkinson, "Ill-conditioned eigensystems and the computation of the Jordan canonical form," SIAM Review 18, 578-619 (1976).
Grauert, H., "Analytische Faserungen über holomorph vollständigen Räumen," Math. Ann. 135, 263-273 (1958).
Guralnick, R. M., "A note on pairs of matrices with rank one commutator," Linear and Multilinear Algebra 8, 97-99 (1979).
Halmos, P. R., "Reflexive lattices of subspaces," J. London Math. Soc. 4, 257-263 (1971).
Halperin, I., and P. Rosenthal, "Burnside's theorem on algebras of matrices," Am. Math. Monthly 87, 810 (1980).
Harrison, K. J., "Certain distributive lattices of subspaces are reflexive," J. London Math. Soc. 8, 51-56 (1974).
Hautus, M. L. J., "Controllability and observability conditions of linear autonomous systems," Ned. Akad. Wet. Proc., Ser. A, 72, 443-448 (1969).
Helton, J. W., and J. A. Ball, "The cascade decompositions of a given system vs the linear fractional decompositions of its transfer function," Integral Equations and Operator Theory 5, 341-385 (1982).
Hoffman, K., and R. Kunze, Linear Algebra, Prentice-Hall of India, New Delhi, 1967.
Jacobson, N., Lectures in Abstract Algebra II: Linear Algebra, Van Nostrand, Princeton, NJ, 1953.
Johnson, R. E., "Distinguished rings of linear transformations," Trans. Am. Math. Soc. 111, 400-412 (1964).
Kaashoek, M. A., C. V. M. van der Mee, and L. Rodman, "Analytic operator functions with compact spectrum, II. Spectral pairs and factorization," Integral Equations and Operator Theory 5, 791-827 (1982).
Kailath, T., Linear Systems, Prentice-Hall, Englewood Cliffs, NJ, 1980.
Kalman, R. E., "Mathematical description of linear dynamical systems," SIAM J. Control 1, 152-192 (1963).
Kalman, R. E., "Kronecker invariants and feedback," Proceedings of Conference on Ordinary Differential Equations, Math. Research Center, Naval Research Laboratory, Washington, DC, 1971.
Kalman, R. E., P. L. Falb, and M. A. Arbib, Topics in Mathematical System Theory, McGraw-Hill, New York, 1969.
Kato, T., Perturbation Theory for Linear Operators, 2nd ed., Springer-Verlag, Berlin, 1976.
Kelley, J. L., General Topology, Van Nostrand, New York, 1955.
Kra, I., Automorphic Forms and Kleinian Groups, Benjamin, Reading, MA, 1972.
Krein, M. G., "Introduction to the geometry of indefinite J-spaces and to the theory of operators in these spaces," Am. Math. Soc. Translations (2) 93, 103-176 (1970).
Krein, M. G., M. A. Krasnoselskii, and D. P. Milman, "On the defect numbers of linear operators in Banach space and on some geometric problems," Sbornik Trud. Inst. Mat. Akad. Nauk Ukr. SSR 11, 97-112 (Russian) (1948).
Kurosh, A. G., Lectures in General Algebra, Pergamon Press, Oxford, 1965.
Laffey, T. J., "Simultaneous triangularization of matrices - low rank cases and the nonderogatory case," Linear and Multilinear Algebra 6, 269-305 (1978).
Lancaster, P., Theory of Matrices, Academic Press, New York, 1969.
Lancaster, P., and M. Tismenetsky, The Theory of Matrices with Applications, Academic Press, New York, 1985.
Lidskii, V. B., "Inequalities for eigenvalues and singular values," appendix in F. R. Gantmacher, The Theory of Matrices, Moscow, Nauka, 1966, pp. 535-559 (Russian).
Markus, A. S., and E. E. Parilis, "Change in the Jordan structure of a matrix under small perturbations," Matem. Issled. 54, 98-109 (Russian) (1980).
Markushevich, A. I., Theory of Analytic Functions, Vols. I-III, Prentice-Hall, Englewood Cliffs, NJ, 1965.
Marsden, J. E., Basic Complex Analysis, Freeman, San Francisco, 1973.
Ostrowski, A. M., Solution of Equations in Euclidean and Banach Space, Academic Press, New York, 1973.
Porsching, T. A., "Analytic eigenvalues and eigenvectors," Duke Math. J. 35, 363-367 (1968).
Radjavi, H., and P. Rosenthal, Invariant Subspaces, Springer-Verlag, Berlin, 1973.
Ran, A. C. M., and L. Rodman, "Stability of neutral invariant subspaces in indefinite inner products and stable symmetric factorizations," Integral Equations and Operator Theory 6, 536-571 (1983).
Rodman, L., and M. Schaps, "On the partial multiplicities of a product of two matrix polynomials," Integral Equations and Operator Theory 2, 565-599 (1979).
Rosenbrock, H. H., State-Space and Multivariable Theory, Nelson, London, 1970.
Rosenbrock, H. H., and C. E. Hayton, "The general problem of pole assignment," Intern. J. Control 27, 837-852 (1978).
Rosenthal, E., "A remark on Burnside's theorem on matrix algebras," Linear Algebra Appl. 63, 175-177 (1984).
Rudin, W., Real and Complex Analysis, 2nd ed., Tata McGraw-Hill, New Delhi.
Ruhe, A., "Perturbation bounds for means of eigenvalues and invariant subspaces," Nordisk Tidskrift för Informationsbehandling (BIT) 10, 343-354 (1970a).
Ruhe, A., "An algorithm for numerical determination of the structure of a general matrix," Nordisk Tidskrift för Informationsbehandling (BIT) 10, 196-216 (1970b).
Saphar, P., "Sur les applications linéaires dans un espace de Banach. II," Ann. Sci. École Norm. Sup. 82, 205-240 (1965).
Sarason, D., "On spectral sets having connected complement," Acta Sci. Math. (Szeged) 26, 289-299 (1965).
Shayman, M. A., "On the variety of invariant subspaces of a finite-dimensional linear operator," Trans. AMS 274, 721-747 (1982).
Shmuljan, Yu. L., "Finite dimensional operators depending analytically on a parameter," Ukrainian Math. J. 9(2), 195-204 (Russian) (1957).
Shubin, M. A., "On holomorphic families of subspaces of a Banach space," Integral Equations and Operator Theory 2, 407-420 (translation from Russian) (1979).
Sigal, E. I., "Partial multiplicities of a product of operator functions," Matem. Issled. 8(3), 65-79 (Russian) (1973).
Soltan, V. P., "The Jordan form of matrices and its connection with lattice theory," Matem. Issled. 8(27), 152-170 (Russian) (1973a).
Soltan, V. P., "On finite dimensional linear operators with the same invariant subspaces," Matem. Issled. 8(30), 80-100 (Russian) (1973b).
Soltan, V. P., "On finite dimensional linear operators in real space with the same invariant subspaces," Matem. Issled. 9, 153-189 (Russian) (1974).
Soltan, V. P., "The structure of hyperinvariant subspaces of a finite dimensional operator," in Nonselfadjoint Operators, Kishinev, Stiinca, 1976, pp. 192-203 (Russian).
Soltan, V. P., "The lattice of hyperinvariant subspaces for a real finite dimensional operator," Matem. Issled. 61, 148-154, Stiinca, Kishinev (Russian) (1981).
Thijsse, G. Ph. A., "Rules for the partial multiplicities of the product of holomorphic matrix functions," Integral Equations and Operator Theory 3, 515-528 (1980).
Thijsse, G. Ph. A., Partial Multiplicities of Products of Holomorphic Matrix Functions, Habilitationsschrift, Dortmund, 1984.
Thompson, R. C., "Author vs. referee: A case history for middle level mathematicians," Am. Math. Monthly 90(10), 661-668 (1983).
Thompson, R. C., "Some invariants of a product of integral matrices," in Proceedings of the 1984 Joint Summer Research Conference on Linear Algebra and its Role in Systems Theory, 1985.
Uspensky, J. V., Theory of Equations, McGraw-Hill, New York, 1978.
Van Dooren, P., "The generalized eigenstructure problem in linear system theory," IEEE Trans. Automat. Contr. AC-26, 111-129 (1981).
Van Dooren, P., "Reducing subspaces: Definitions, properties and algorithms," in A. Ruhe and B. Kågström, Eds., Matrix Pencils, Lecture Notes in Mathematics, Vol. 973, Springer, New York, 1983, pp. 58-73.
Wells, R. O., Differential Analysis on Complex Manifolds, Springer-Verlag, New York, 1980.
Wonham, W. M., Linear Multivariable Control: A Geometric Approach, Springer-Verlag, Berlin, 1979.
Author Index

Allan, G.R., 645
Arbib, M.A., 292
Ball, J.A., 292
Bart, H., 290, 292, 561, 562
Baumgärtel, H., 605, 645
Bochner, S., 629
den Boer, H., 562
Brickman, L., 562
Brockett, R., 292
Brunovsky, P., 292
Campbell, S., 561, 562
Choi, M.D., 384
Coddington, E.A., 262
Conway, J.B., 561
Daughtry, J., 561, 562
Djaferis, T.E., 292
Donnellan, T., 313
Douglas, R.G., 561
Falb, P.L., 292
Fillmore, P.A., 384, 562
Gantmacher, F.R., 115, 290, 384, 678
Gohberg, I., 290, 291, 292, 410, 561, 562, 580, 609, 645, 678
Goldberg, S., 580
Golub, G., 562
Grauert, H., 645
Guralnick, R.M., 384
Halmos, P., 384, 561
Halperin, I., 384
Harrison, K.J., 348
Hautus, M.L.J., 292
Hayton, C.E., 292
Heinig, G., 609
Helton, J.W., 292
Herrero, D.A., 384
Hoffman, K., 427
Jacobson, N., 384
Johnson, R.E., 348
Kaashoek, M.A., 290, 291, 292, 561, 562
Kailath, T., 291, 292
Kalman, R.E., 292
Kato, T., 561
Kelley, J.L., 592
Kra, I., 614
Krasnoselskii, M.A., 561
Krein, M.G., 290, 561
Krupnik, N., 561
Kunze, R., 427
Kurosh, A.G., 290
Laffey, T.J., 384
Lancaster, P., 122, 290, 291, 327, 384, 561, 562, 645, 678
Laurie, C., 384
Leiterer, Ju., 410, 561, 645
Levinson, N., 262
Lidskii, V.B., 136
Longstaff, W.E., 384
Markus, A.S., 561, 562
Markushevich, A.I., 570, 585
Marsden, J.E., 477
Martin, W.T., 629
Milman, D.P., 561
Mitter, S.K., 292
Ostrowski, A.M., 562
Parilis, E.E., 562
Pearcy, C., 561
Porsching, T.A., 645
Radjavi, H., 384
Ran, A.C.M., 561
Rodman, L., 136, 291, 561, 562, 645, 678
Rosenbrock, H., 292
Rosenthal, E., 384
Rosenthal, P., 384
Rubinstein, S., 292, 561, 562
Rudin, W., 597
Ruhe, A., 562
Saphar, P., 645
Sarason, D., 291
Schaps, M., 136, 291
Shayman, M.A., 434, 561
Shmuljan, Yu.L., 645
Shubin, M.A., 645
Sigal, E.I., 291
Soltan, V.P., 290, 380, 384
Thijsse, G.Ph.A., 136, 562
Thompson, R.C., 291
Tismenetsky, M., 122, 290, 327, 384, 561
Uspensky, J.V., 630
van der Mee, C.V.M., 561
van Dooren, P., 562
van Loan, C.F., 562
van Schagen, F., 292
Wells, R.O., 434
Wilkinson, J.H., 562
Wonham, W.M., 291, 292
Subject Index

Algebra, 339
  k-transitive, 344
  reductive, 351
  self-adjoint, 351
  see also Boolean algebra
Analytic family:
  of subspaces, 566
    A(z)-invariant, 594
    direct complement for, 590
    real, 600
  of transformations, 565, 599, 604
    analytic Jordan basis for, 611
    diagonable, 612
    eigenvalues of, 604, 609
    eigenvectors of, 605
    first exceptional set, 609, 624, 632
    image of, 569
    incomplete factorization of, 578
    kernel of, 569
    multiple points of, 608
    real, 600
    second exceptional set of, 610, 624, 633
    singular set of, 569
Angular subspace, 25
Angular transformation, 27, 398
Atom, 349
Baire category theorem, 592
Binet-Cauchy formula, 651
Block similarity, 193, 208, 383
Boolean algebra, 349
  atomic, 349
Branch analytic family, 613
  singular set of, 613
Brunovsky canonical form, 196, 359, 383
Burnside's theorem, 341
Cascade (of linear systems), 273
  minimal, 274
  simple, 270
Chain (of subspaces), 33
  almost invariant, 209
  analytic extendability of, 618
  complete, 35, 348, 449
  Lipschitz stable, 526
  maximal, 35
  stable, 464
Characteristic polynomial, 10
Circulant matrix, 43, 96, 256, 260
Coextension, 128
Coinvariant subspace, 105, 437, 490
  orthogonally, 108
Col, 147
Column indices, minimal, 674
Commutator, 303
Commuting matrices, 295, 371
Companion matrix, 146, 515
  second, 150
Completion, 128
Complexification, 366
Compression, 106
Connected components, 426, 442
Connected set, 423
  finitely, 584
  simply, 584
Connected subspaces, 405, 423, 437
Continuous families:
  of subspaces, 408, 445
  of transformations, 412
Controllable pair, 290
Controllable system, 267
Diagonable transformation, 109, 366
Difference equation, 180
Differential equation, 175
Dilation, 128
  of linear system, 263
Direct sum of subspaces, 20
Distance:
  between sets of subspaces, 465
  between subspaces, 397
  from point to set, 388
Disturbance decoupling, 275
Eigenvalue, 10, 146, 361, 604, 609, 657, 661
Eigenvector, 10, 361, 605
  generalized, 12, 13
Elementary divisors, 298, 655, 665
  at infinity, 664, 665
Elementary matrices, 694
Equivalent matrix polynomials, 646
  strictly, 195, 382, 662, 665
Extension, 121
Factorization:
  of matrix polynomials, 159, 160, 171, 554, 624
    analytic extendability, 625, 626
    isolated, 524, 554
    Lipschitz stable, 525
    sequentially nonisolated, 627
    stable, 520, 524, 554
  of rational matrix functions, 226, 554
    analytic continuation, 634
    isolated, 538, 539, 555
    Lipschitz stable, 539
    minimal, 226, 529, 634
    sequentially nonisolated, 638
    stable, 529, 537, 539, 554
Factor space, 29
Feedback, 275, 277, 279
Fractional power series, 605
Full range pair, 81, 197, 290, 468
Gap, 387, 417
  spherical, 393, 418
Generalized inverse, 24
  continuity of, 411, 413
Generators, 69, 100
  minimal, 69
Graph (of matrix), 545
Height:
  of eigenvalue, 86
  of transformation, 498, 513
Hyperinvariant subspace, 305-313, 374, 431, 490
Ideal, in algebra, 343
Image, 5, 406
Incomplete factorization, 578
Input (of linear system), 262
Invariant polynomials, 654, 664
Invariant subspace, 5, 359
  of algebra, 340
  u stable, 513
  analytic extendability of, 616
  B-stable, 480
  common to different matrices, 301, 378
  cyclic, 69
  inaccessible, 431
  intersect v, 208
  irreducible, 65, 365
  isolated, 428, 442, 473
  Jordan, 54
  Lipschitz stable, 459, 473
  marked, 83
  maximal, 72
  minimal, 78
  mod v, 191
  orthogonal reducing, 111
  real, 359
  reducible, 65
  reducing, 109, 298, 432, 490
  sequentially isolated, 619
  spectral, 60, 365, 458, 618
  stable, 447
  supporting, 187
Jordan block, 6, 52
Jordan chain, 13, 361
Jordan form, 53
  real, 365
Jordan indices, 196
Jordan part (of Brunovsky form), 196
Jordan structure, 482
  derogatory, 497
  fixed, 596
Jordan structure sequence, 477, 483
  derogatory part, 512
Jordan subspace, 54
Kernel, 5, 406
Kronecker canonical form, 678
Kronecker indices, 196, 199
Kronecker part (of Brunovsky form), 196
Laplace transform, 265
Lattice, 31
  analytic dependence, 596
  distributive, 311, 348
  linear isomorphism, 484
  reflexive, 348
  self-dual, 311
Lattice homomorphism, 483
Lattice of invariant subspaces, 463, 470
  analytic dependence, 596
  Lipschitz stable, 464
    in metric, 467
  stable, 464
    in metric, 465
Lattice isomorphism, 463, 483, 596
Left inverse, 216
  continuity of, 414
Left quotient, 659
Left remainder, 659
Linear equation (in matrices), 548, 551
Linear fractional decomposition, 244, 274
  Lipschitz stable, 540
  minimal, 245, 274
Linear fractional transformation, 238
Linear isomorphism (of lattices), 484
Linearization, 144
Linear system, 262
  controllable, 267
  disturbance decoupled, 275
  minimal, 264
  observable, 266
  similar, 263
Linear transformation:
  diagonable, 109, 366
  normal, 39, 363
  self-adjoint, 363
  unitary, 363
Lipschitz continuous map, locally, 518
Lipschitz stability, 467
Lyapunov equation, see Linear equation
McMillan degree, 225, 245, 632
Matrix:
  block:
    circulant, 98
    tridiagonal, 210
  circulant, 96, 97, 314
  companion, 98, 100, 299, 314
  cyclic, 299
  diagonable, 90
  hermitian, 20
  nonderogatory, 299, 449, 465, 499
  normal, 100, 111, 117, 303
  orthogonal, 363, 405
  Toeplitz, 317
Matrix polynomial, 646
  monic, 144
  see also Factorization, of matrix polynomials
Metric, 387
Metric space:
  compact, 400
  complete, 401
  connected, 405
Minimal angle, 392, 419
Minimal opening, 396, 451
Minimal polynomial, 74
Minimal realization, 218, 219
Minimal system, 264
Minor, 651
Mittag-Leffler theorem, 571, 614
Monodromy theorem, 597
Multiplicity:
  algebraic, 53, 365
  geometric, 53, 365
  partial, 53, 365
Norm, 88, 415
Normed space, 415
Null function, 220
  associated, 222
  canonical, 220
  order of, 220
Null kernel pair, 75, 81, 209, 290
Null vector, 220
Observable pair, 290
Observable system, 266
Output (of linear system), 262
Output stabilization, 279
Partial multiplicities, 154, 219, 657, 661
  stability of, 475
Pole (of rational function), 219, 223
  geometric multiplicity of, 529
Projector, 20
  complementary, 22
  orthogonal, 21
Quadratic equation (in matrices), 27, 545, 637
  inaccessible solution of, 547
  isolated solution of, 547, 551, 556
  Lipschitz stable solution of, 552
  stable solution of, 551, 552, 556
  unilateral, 550
Rational matrix function, 212
  analytic dependence, 628
  analytic minimal realization of, 630
  exceptional sets of, 632, 633
  minimal realization of, 218
  partial multiplicities of, 219
  pole of, 219
  realization of, 212
  zero of, 219
  see also Factorization, of rational matrix functions
Reachable vector, 276
Realization, see Rational matrix function
Reducing subspaces, 245, 251
Reduction:
  of linear system, 263
  of realization, 215
Regular linear matrix polynomial, 663
Resolvent form, 147
Restriction of transformation, 121
Riccati equation, see Quadratic equation
Riesz projector, 64, 447, 452
Right inverse, 216
  continuity of, 414
Right quotient, 659
Right remainder, 659
Root subspace, 46, 363, 490
Rotation matrix, 54
Row indices, minimal, 674
Scalar product, 391
Schmidt-Ore theorem, 290
Self-adjoint transformation, 20
Semiinvariant subspace, 112, 438, 490
  orthogonally, 115
Sigal inequalities, 133
Similarity, 17
  of standard triple, 147
  of systems, 263
Simply connected set, 584
Smith canonical form, 647
  local, 218
  uniqueness of, 651
Spectral assignment, 203, 383
Spectral factorization, 187
Spectral shifting, 204
Spectral subspace, 60
Spectrum, 10
Standard pair, 183
Standard triple, 147
  similarity of, 147
State vector, 262
Subspace:
  [A B]-invariant, 190, 481
  -invariant, 192
  angular, 25
  coinvariant, 105, 437, 490
  complementary, 20
  controllable, 204
  irreducible, 65
  Jordan, 54
  orthogonally coinvariant, 108
  orthogonally semiinvariant, 115
  reducible, 65
  root, 46, 363, 490
  semiinvariant, 112, 438, 490
  spectral, 60
  see also Invariant subspace
Supporting k-tuple, 530
  stable, 530
Supporting quadruple, 249
Toeplitz matrix, 40, 317
  upper triangular, 297, 317
Trace, 427
Transfer function, 265
Transformation:
  adjoint, 18
  angular, 27, 398
  coextension of, 128
  diagonable, 90, 100
  dilation of, 128
  extension of, 121, 190, 208
  function of, 85
  induced, 30
  nonderogatory, 299, 449, 465, 499
  normal, 39, 303
  orthogonally unicellular, 117
  reduction of, 128
  self-adjoint, 20
  unicellular, 67
Triinvariant decomposition, 112, 253
  orthogonal, 115
  supporting, 156, 277
Unitary matrix, 37
Vandermonde, 72, 98
Weierstrass' theorem, 571, 614
Zero:
  geometric multiplicity of, 529
  of rational function, 219, 223