                    Diffusions, Markov Processes,
and Martingales
Volume 1: FOUNDATIONS
2nd Edition


L. C. G. ROGERS and DAVID WILLIAMS
School of Mathematical Sciences, University of Bath

JOHN WILEY & SONS
Chichester · New York · Brisbane · Toronto · Singapore
Copyright © 1979, 1994 by John Wiley & Sons Ltd,
Baffins Lane, Chichester,
West Sussex PO19 1UD, England
National Chichester (0243) 779777
International (+44) 243 779777

All rights reserved. No part of this book may be reproduced by any means, or transmitted, or translated into a machine language without the written permission of the publisher.

Other Wiley Editorial Offices
John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, USA
Jacaranda Wiley Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Canada) Ltd, 22 Worcester Road, Rexdale, Ontario M9W 1L1, Canada
John Wiley & Sons (SEA) Pte Ltd, 37 Jalan Pemimpin #05-04, Block B, Union Industrial Building, Singapore 2057

Library of Congress Cataloging-in-Publication Data
Rogers, L. C. G.
Diffusions, Markov processes, and martingales / L. C. G. Rogers, David Williams. -- 2nd ed.
p. cm. -- (Wiley series in probability and mathematical statistics)
New ed. of: Diffusions, Markov processes, and martingales / David Williams. 1979-
Includes bibliographical references and index.
Contents: v. 1. Foundations.
ISBN 0 471 95061 0 (v. 1)
1. Markov processes. 2. Diffusion processes. 3. Martingales (Mathematics) I. Williams, D. (David), 1938- . II. Williams, D. (David), 1938- Diffusions, Markov processes, and martingales. III. Title. IV. Series.
QA274.7.W54 1994 94-28241
519.2'33--dc20 CIP

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0 471 95061 0

Typeset in 10/12pt Times by Thomson Press (India) Ltd., New Delhi
Printed and bound in Great Britain by Biddles Ltd., Guildford and King's Lynn
For our parents
From the Original (1979) Preface

Long ago (or so it seems today), Chung wrote on page 196 of his book [1]: 'One wonders if the present theory of stochastic processes is not still too difficult for applications.' Advances in the theory since that time have been phenomenal, but these have been accompanied by an increase in the technical difficulty of the subject so bewildering as to give a quaint charm to Chung's use of the word 'still'. Meyer writes in the preface to his definitive account of stochastic integral theory: '... il faut ... un cours de six mois sur les définitions. Que peut-on y faire?'

I have thought up as intuitive a picture of the subject as I can, written it down at speed, and refused to be lured back by piety (or even by wit!) to cancel half a line. 'First' intuition, which is what you need when you are learning the subject, is raw, rough and ready; and, as you have guessed, I make the excuse that it demands a compatible style and lack of polish.

Note that I wrote 'first intuition'. Consider an example. Meyer's concept of a right process is exactly right for Markov process theory, but the concept is the result of a long evolution. To understand it properly, you need a highly developed intuition, and that takes time to acquire. The difficulty with the best advanced literature is that its authors have too much intuition; never make the mistake of thinking otherwise.

My aim then is to sharpen your intuition to a point where the advanced abstract literature becomes accessible, enjoyable and 'relevant'. Like my expository article [1], this is a missionary tract not a theological treatise. (Those of you who have read my article [1] will see that this book often follows it very closely, except that now I have the time and the duty to be more obviously appreciative of the abstract theory!)

I believe that, in the end, it is applications which justify mathematics.
The 'artistic' justification of pure mathematics in terms of intrinsic qualities like elegance and generality rings rather hollow in my ears when I compare the best mathematics with the greatest music. Many applied workers will regard this book as extremely 'pure', but I see it as one stage in shunting pure theory over towards applications. The shunting is not always necessary: time and again, one finds 'applied' papers which 'solve' problems long since solved for 'purely
theoretical' purposes. Moral: the pure/applied division of probability theory (as of mathematics in general) is a nonsense.

Acknowledgements. This is an appropriate place at which to thank David Kendall and Harry Reuter for teaching me probability theory and for giving me an enthusiasm for the subject which is wearing well. My best way to thank them is to try to share that enthusiasm.

I have to say another huge 'thank you' to David Kendall for the immense amount of work he has done in making editorial comments on the original manuscript. I now see that my determination to convey a sense of adventure did need to be tempered by a greater concern for the reader's sense of security. So I have acceded to many of David Kendall's requests for 'more details'; and as a result, you will learn more techniques of calculation and have a clearer idea of several concepts. (But I still see it as part of my job to keep you on your toes!)

I am very grateful to Ronald Getoor and André Meyer for clearing up some confusions.

I have been extremely fortunate in having been able to rely on the superb typing skills of Sheila Campbell, Eileen Jenkins and Gladys Maddocks; my thanks and best wishes to them.

I thank Springer-Verlag and the authors for granting me permission to quote from Chung [1] in Section III.44, from Getoor [1] in Section III.54, and from Chung [1] and Meyer [1] earlier in this preface.

Finally, I have to thank James Cameron and Wiley for encouragement and great patience; and subeditors, copy-editors, and printers, whose skills have much impressed me.

David Williams
Swansea, 1978
Preface to the Second Edition This second edition differs profoundly from the first—and not only in having two authors rather than one. We retain the Gallic tradition of dividing the volume into three massive chapters: Chapter I, which says why the subject is worth studying; Chapter II, which provides background; and Chapter III, which presents an account of Markov processes. Chapter I is now much more extensive and wide-ranging, and covers much work done since the first edition appeared. Chapter II is now a highly systematic account, with detailed proofs, of what every young probabilist must know. It is rather unashamedly a sequel to DW's Probability with Martingales, Cambridge University Press having been very generous in allowing us to follow that account closely (but without many proofs, without the examples, etc.). It is perfectly possible to read Chapter II before Chapter I if you so wish. We would suggest however that you try things in the order 'heuristics then rigour': Our doubts are traitors, And make us lose the good we oft might win, Through fearing to attempt. (W. Shakespeare, Measure for Measure.) Chapter III seems to have been regarded as the most successful part of the original; and it is reproduced here without much modification (except that some of the functional analysis is given fuller treatment). It was always intended as a missionary tract on Markov processes. The full theory may be found in Sharpe [1] and in the final two volumes of the probabilist's bible, Dellacherie and Meyer [1]. All kinds of important developments are ignored in Chapter III: they would require another complete volume, and will be, or are, covered by greater experts. Dawson's eagerly awaited treatment [1] of measure-valued processes has now appeared; Mark Davis has a very nice new book [4] on piecewise-deterministic Markov processes; and so on. You can access the huge literature on measure-valued processes via Dawson's account. 
The musical allusions in the first edition have been excised. Apparently many people found them annoying. 'Would David Williams like a book on mathematics
filled with references to baseball?', they say. (To which the answer is, of course, 'Yes.') So, this is Mathematics all the way from A to Zzzz, or from Ω on, if you want to be rigorous.

Our thanks to Sue Collins and Wolfgang Stummer, and to other colleagues at Bath, Cambridge, and Queen Mary and Westfield College, London. Our thanks too to Helen Ramsey and other Wiley staff for suggesting this new version; and the copy-editor and printer whose skills have impressed us.

Chris Rogers
David Williams
November 1993
Contents

Some Frequently Used Notation

CHAPTER I. BROWNIAN MOTION

1. INTRODUCTION
  1. What is Brownian motion, and why study it?
  2. Brownian motion as a martingale
  3. Brownian motion as a Gaussian process
  4. Brownian motion as a Markov process
  5. Brownian motion as a diffusion (and martingale)

2. BASICS ABOUT BROWNIAN MOTION
  6. Existence and uniqueness of Brownian motion
  7. Skorokhod embedding
  8. Donsker's Invariance Principle
  9. Exponential martingales and first-passage distributions
  10. Some sample-path properties
  11. Quadratic variation
  12. The strong Markov property
  13. Reflection
  14. Reflecting Brownian motion and local time
  15. Kolmogorov's test
  16. Brownian exponential martingales and the Law of the Iterated Logarithm

3. BROWNIAN MOTION IN HIGHER DIMENSIONS
  17. Some martingales for Brownian motion
  18. Recurrence and transience in higher dimensions
  19. Some applications of Brownian motion to complex analysis
  20. Windings of planar Brownian motion
  21. Multiple points, cone points, cut points
  22. Potential theory of Brownian motion in $\mathbb{R}^d$ ($d \ge 3$)
  23. Brownian motion and physical diffusion

4. GAUSSIAN PROCESSES AND LÉVY PROCESSES
  Gaussian processes
  24. Existence results for Gaussian processes
  25. Continuity results
  26. Isotropic random flows
  27. Dynkin's Isomorphism Theorem
  Lévy processes
  28. Lévy processes
  29. Fluctuation theory and Wiener-Hopf factorisation
  30. Local time of Lévy processes

CHAPTER II. SOME CLASSICAL THEORY

1. BASIC MEASURE THEORY
  Measurability and measure
  1. Measurable spaces; σ-algebras; π-systems; d-systems
  2. Measurable functions
  3. Monotone-Class Theorems
  4. Measures; the uniqueness lemma; almost everywhere; a.e.(μ, Σ)
  5. Carathéodory's Extension Theorem
  6. Inner and outer μ-measures; completion
  Integration
  7. Definition of the integral $\int f \, d\mu$
  8. Convergence theorems
  9. The Radon-Nikodým Theorem; absolute continuity; $\lambda \ll \mu$ notation; equivalent measures
  10. Inequalities; $\mathcal{L}^p$ and $L^p$ spaces ($p \ge 1$)
  Product structures
  11. Product σ-algebras
  12. Product measure; Fubini's Theorem
  13. Exercises

2. BASIC PROBABILITY THEORY
  Probability and expectation
  14. Probability triple; almost surely (a.s.); a.s.(P), a.s.(P, $\mathcal{F}$)
  15. $\limsup E_n$; First Borel-Cantelli Lemma
  16. Law of random variable; distribution function; joint law
  17. Expectation; E(X; F)
  18. Inequalities: Markov, Jensen, Schwarz, Tchebychev
  19. Modes of convergence of random variables
  Uniform integrability and $\mathcal{L}^1$ convergence
  20. Uniform integrability
  21. $\mathcal{L}^1$ convergence
  Independence
  22. Independence of σ-algebras and of random variables
  23. Existence of families of independent variables
  24. Exercises

3. STOCHASTIC PROCESSES
  The Daniell-Kolmogorov Theorem
  25. $(E^T, \mathcal{E}^T)$; σ-algebras on function space; cylinders and σ-cylinders
  26. Infinite products of probability triples
  27. Stochastic process; sample function; law
  28. Canonical process
  29. Finite-dimensional distributions; sufficiency; compatibility
  30. The Daniell-Kolmogorov (DK) Theorem: 'compact metrizable' case
  31. The Daniell-Kolmogorov (DK) Theorem: general case
  32. Gaussian processes; pre-Brownian motion
  33. Pre-Poisson set functions
  Beyond the DK Theorem
  34. Limitations of the DK Theorem
  35. The role of outer measures
  36. Modifications; indistinguishability
  37. Direct construction of Poisson measures and subordinators, and of local time from the zero set; Azéma's martingale
  38. Exercises

4. DISCRETE-PARAMETER MARTINGALE THEORY
  Conditional expectation
  39. Fundamental theorem and definition
  40. Notation; agreement with elementary usage
  41. Properties of conditional expectation: a list
  42. The role of versions; regular conditional probabilities and pdfs
  43. A counterexample
  44. A uniform-integrability property of conditional expectations
  (Discrete-parameter) martingales and supermartingales
  45. Filtration; filtered space; adapted process; natural filtration
  46. Martingale; supermartingale; submartingale
  47. Previsible process; gambling strategy; a fundamental principle
  48. Doob's Upcrossing Lemma
  49. Doob's Supermartingale-Convergence Theorem
  50. $\mathcal{L}^1$ convergence and the UI property
  51. The Lévy-Doob Downward Theorem
  52. Doob's Submartingale and $\mathcal{L}^p$ Inequalities
  53. Martingales in $\mathcal{L}^2$; orthogonality of increments
  54. Doob decomposition
  55. The $\langle M \rangle$ and $[M]$ processes
  Stopping times, optional stopping and optional sampling
  56. Stopping time
  57. Optional-stopping theorems
  58. The pre-$T$ σ-algebra $\mathcal{F}_T$
  59. Optional sampling
  60. Exercises

5. CONTINUOUS-PARAMETER SUPERMARTINGALES
  Regularisation: R-supermartingales
  61. Orientation
  62. Some real-variable results
  63. Filtrations; supermartingales; R-processes, R-supermartingales
  64. Some important examples
  65. Doob's Regularity Theorem: Part 1
  66. Partial augmentation
  67. Usual conditions; R-filtered space; usual augmentation; R-regularisation
  68. A necessary pause for thought
  69. Convergence theorems for R-supermartingales
  70. Inequalities and $\mathcal{L}^p$ convergence for R-submartingales
  71. Martingale proof of Wiener's Theorem; canonical Brownian motion
  72. Brownian motion relative to a filtered space
  Stopping times
  73. Stopping time $T$; pre-$T$ σ-algebra $\mathcal{G}_T$; progressive process
  74. First-entrance (début) times; hitting times; first-approach times: the easy cases
  75. Why 'completion' in the usual conditions has to be introduced
  76. Début and Section Theorems
  77. Optional Sampling for R-supermartingales under the usual conditions
  78. Two important results for Markov-process theory
  79. Exercises

6. PROBABILITY MEASURE ON LUSIN SPACES
  'Weak convergence'
  80. C(J) and Pr(J) when J is compact Hausdorff
  81. C(J) and Pr(J) when J is compact metrizable
  82. Polish and Lusin spaces
  83. The $C_b(S)$ topology of Pr(S) when S is a Lusin space; Prohorov's Theorem
  84. Some useful convergence results
  85. Tightness in Pr(W) when W is the path-space $W := C([0,\infty); \mathbb{R})$
  86. The Skorokhod representation of $C_b(S)$ convergence on Pr(S)
  87. Weak convergence versus convergence of finite-dimensional distributions
  Regular conditional probabilities
  88. Some preliminaries
  89. The main existence theorem
  90. Canonical Brownian Motion CBM($\mathbb{R}^N$); Markov property of $P^x$ laws
  91. Exercises

CHAPTER III. MARKOV PROCESSES

1. TRANSITION FUNCTIONS AND RESOLVENTS
  1. What is a (continuous-time) Markov process?
  2. The finite-state-space Markov chain
  3. Transition functions and their resolvents
  4. Contraction semigroups on Banach spaces
  5. The Hille-Yosida Theorem

2. FELLER-DYNKIN PROCESSES
  6. Feller-Dynkin (FD) semigroups
  7. The existence theorem: canonical FD processes
  8. Strong Markov property: preliminary version
  9. Strong Markov property: full version; Blumenthal's 0-1 Law
  10. Some fundamental martingales; Dynkin's formula
  11. Quasi-left-continuity
  12. Characteristic operator
  13. Feller-Dynkin diffusions
  14. Characterisation of continuous real Lévy processes
  15. Consolidation

3. ADDITIVE FUNCTIONALS
  16. PCHAFs; λ-excessive functions; Brownian local time
  17. Proof of the Volkonskii-Šur-Meyer Theorem
  18. Killing
  19. The Feynman-Kac formula
  20. A Ciesielski-Taylor Theorem
  21. Time-substitution
  22. Reflecting Brownian motion
  23. The Feller-McKean chain
  24. Elastic Brownian motion; the arcsine law

4. APPROACH TO RAY PROCESSES: THE MARTIN BOUNDARY
  25. Ray processes and Markov chains
  26. Important example: birth process
  27. Excessive functions, the Martin kernel and Choquet theory
  28. The Martin compactification
  29. The Martin representation; Doob-Hunt explanation
  30. R. S. Martin's boundary
  31. Doob-Hunt theory for Brownian motion
  32. Ray processes and right processes

5. RAY PROCESSES
  33. Orientation
  34. Ray resolvents
  35. The Ray-Knight compactification
  Ray's Theorem: analytical part
  36. From semigroup to resolvent
  37. Branch-points
  38. Choquet representation of 1-excessive probability measures
  Ray's Theorem: probabilistic part
  39. The Ray process associated with a given entrance law
  40. Strong Markov property of Ray processes
  41. The role of branch-points

6. APPLICATIONS
  Martin boundary theory in retrospect
  42. From discrete to continuous time
  43. Proof of the Doob-Hunt Convergence Theorem
  44. The Choquet representation of Π-excessive functions
  45. Doob h-transforms
  Time reversal and related topics
  46. Nagasawa's formula for chains
  47. Strong Markov property under time reversal
  48. Equilibrium charge
  49. BM($\mathbb{R}$) and BES(3): splitting times
  A first look at Markov-chain theory
  50. Chains as Ray processes
  51. Significance of $q_i$
  52. Taboo probabilities; first-entrance decomposition
  53. The Q-matrix; DK conditions
  54. Local-character condition for Q
  55. Totally instantaneous Q-matrices
  56. Last exits
  57. Excursions from b
  58. Kingman's solution of the 'Markov characterization problem'
  59. Symmetrisable chains
  60. An open problem

References for Volumes 1 and 2
Index to Volumes 1 and 2
Some Frequently Used Notation

We use ':=' to mean 'is defined to equal'. This Pascal notation can also be used in reverse. We define

$$\mathbb{Z}^+ := \{0, 1, 2, \ldots\} \supset \{1, 2, 3, \ldots\} =: \mathbb{N},$$
$$\mathbb{R}^+ := [0, \infty), \qquad \mathbb{R}^{++} := (0, \infty), \qquad \mathbb{Q}^+ := \mathbb{Q} \cap \mathbb{R}^+.$$

We neaten layout, and make things easier for our printers, by the use of alternative notations:

$X(t_1, \omega)$ for $X_{t_1}(\omega)$, $f_n(i)$ for $f_{ni}$, $P(T_1)$ for $P_{T_1}$, $P_t f(x)$ for $(P_t f)(x)$, etc.

Once things are underway, such switches in notation will be made without comment.

The composition notation $f \circ g(t) := f(g(t))$ will often be used for tidiness.

If $f$ and $g$ are real numbers or real-valued functions, we define

$$f \vee g := \max(f, g), \quad f \wedge g := \min(f, g), \quad f^+ := f \vee 0, \quad f^- := (-f) \vee 0;$$

hence $f = f^+ - f^-$ and $|f| = f^+ + f^-$.

If $\mathcal{H}$ is a set of real-valued functions, we write
$\mathcal{H}^+$ for the set of non-negative elements of $\mathcal{H}$,
$b\mathcal{H}$ for the set of bounded elements in $\mathcal{H}$.

If Σ is a σ-algebra, we write
mΣ for the set of real-valued (or perhaps $[-\infty, \infty]$-valued) Σ-measurable functions,
bΣ for the space of bounded Σ-measurable functions.

If $S$ is a topological space, we write
$C(S)$ for the space of all continuous functions from $S$ to $\mathbb{R}$,
$C_b(S)$ for the space of all bounded continuous functions from $S$ to $\mathbb{R}$.

Monotone convergence. We write '$s \downarrow t$' to signify that $s \to t$, $s \ge t$; and '$s \uparrow\uparrow t$' to signify that $s \to t$, $s < t$. If $(s_n)$ is a sequence then '$s_n \uparrow t$' signifies that $s_n \to t$, $s_n \le s_{n+1} \le t$, while '$s_n \uparrow\uparrow t$' signifies that $s_n \to t$, $s_n \le s_{n+1} < t$. If $f_n$ and $f$ are real-valued functions then (for example) $f = \uparrow \lim f_n$ signifies that $f_n \uparrow f$ pointwise.
CHAPTER I

Brownian Motion

1. INTRODUCTION

1. What is Brownian motion, and why study it? The first thing is to define Brownian motion. We assume given some probability triple $(\Omega, \mathcal{F}, \mathbf{P})$.

(1.1) DEFINITION. A real-valued stochastic process $\{B_t : t \in \mathbb{R}^+\}$ is a Brownian motion if it has the properties

(1.2)(i) $B_0(\omega) = 0$, $\forall \omega$;
(1.2)(ii) the map $t \mapsto B_t(\omega)$ is a continuous function of $t \in \mathbb{R}^+$ for all $\omega$;
(1.2)(iii) for every $t, h \ge 0$, $B_{t+h} - B_t$ is independent of $\{B_u : 0 \le u \le t\}$, and has a Gaussian distribution with mean 0 and variance $h$.

The conditions (1.2)(ii) and (1.2)(iii) are the really essential ones; if $B = \{B_t : t \in \mathbb{R}^+\}$ is a Brownian motion, we frequently speak of $\{\xi + B_t : t \in \mathbb{R}^+\}$ as a Brownian motion (started at $\xi$); the starting point $\xi$ can be a fixed real, or a random variable independent of $B$.

Now that we know what a Brownian motion is, questions of existence and uniqueness (answered in Section 6) are less important than an answer to the second question of the title, 'Why study it?' There are many answers to this question, but to us there seem to be four main ones:

(i) Virtually every interesting class of processes contains Brownian motion: Brownian motion is a martingale, a Gaussian process, a Markov process, a diffusion, a Lévy process, ...;
(ii) Brownian motion is sufficiently concrete that one can do explicit calculations, which are impossible for more general objects;
(iii) Brownian motion can be used as a building block for other processes (indeed, a number of the most important results on Brownian motion state that the most general process in a certain class can be obtained from Brownian motion by some sequence of transformations);
(iv) last but not least, Brownian motion is a rich and beautiful mathematical object in its own right.

The aim of this chapter is to expand on these reasons, and convince you that
Brownian motion is indeed worthy of study; and the rest of this introduction gives a brief outline of some of the main points of the chapter.

2. Brownian motion as a martingale. Let $\{B_t : t \ge 0\}$ be a Brownian motion, and define $\mathcal{B}_t := \sigma(\{B_s : s \le t\})$. Then $(B_t, \mathcal{B}_t)_{t \ge 0}$ is a martingale. We shall have a lot more to say about martingales in Chapter II, but for now we need little of the theory developed there. Let us just check that $(B_t, \mathcal{B}_t)_{t \ge 0}$ is a martingale (cf. Section II.63); first, $B_t \in \mathcal{L}^1$ for all $t$, because, from (1.2)(i) and (1.2)(iii), $B_t \sim N(0, t)$, and, secondly, for $0 \le s \le t$,

$$\mathbf{E}[B_t - B_s \mid \mathcal{B}_s] = 0, \quad \text{equivalently,} \quad \mathbf{E}[B_t \mid \mathcal{B}_s] = B_s,$$

since $B_t - B_s$ is independent of $\mathcal{B}_s$ by (1.2)(iii). Likewise, since $B_t - B_s \sim N(0, t - s)$ independently of $\mathcal{B}_s$, we have

$$\mathbf{E}[(B_t - B_s)^2 \mid \mathcal{B}_s] = t - s.$$

But

$$\mathbf{E}[(B_t - B_s)^2 \mid \mathcal{B}_s] = \mathbf{E}[B_t^2 - 2 B_t B_s + B_s^2 \mid \mathcal{B}_s] = \mathbf{E}[B_t^2 \mid \mathcal{B}_s] - B_s^2,$$

using properties of conditional expectation (Section II.41), so we have (almost surely) that $\mathbf{E}[B_t^2 - t \mid \mathcal{B}_s] = B_s^2 - s$, and we conclude that

(2.1) $B_t^2 - t$ is a martingale.

This simple fact is a pointer to the development of stochastic integrals; once that theory is developed, we shall be in a position to prove the following startling converse to (2.1).

(2.2) THEOREM (Lévy). Let $(X_t)_{t \ge 0}$ be a continuous martingale, $X_0 = 0$, and suppose that $X_t^2 - t$ is a martingale. Then $X$ is a Brownian motion.

By a continuous martingale, we mean of course one such that $t \mapsto X_t(\omega)$ is a continuous map for all $\omega$. We have not been too specific about the filtration $(\mathcal{F}_t)_{t \ge 0}$ with respect to which $X$ is a martingale, but this is not necessary; if $X$ is a martingale with respect to $(\mathcal{F}_t)_{t \ge 0}$, and satisfies the hypotheses of Theorem 2.2, then $X$ is an $(\mathcal{F}_t)$ Brownian motion; that is, $X$ satisfies (1.2)(i), (1.2)(ii) and the stronger condition

(1.2)(iii)′ for any $t, h \ge 0$, $X_{t+h} - X_t$ is independent of $\mathcal{F}_t$ and has a Gaussian distribution with mean zero and variance $h$.
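The martingale statements above can be sanity-checked numerically. The following is a minimal sketch, not part of the original text: it builds a discretised path from independent Gaussian increments as in (1.2)(i) and (1.2)(iii), then estimates $\mathbf{E}[B_t]$ and $\mathbf{E}[B_t^2 - t]$ by Monte Carlo. The function names `brownian_path` and `martingale_means`, the grid size and the sample size are all illustrative assumptions. The check covers only the unconditional means (the case $s = 0$ of the martingale property), so it illustrates (2.1) rather than proving it.

```python
import math
import random


def brownian_path(t_end, n_steps, rng):
    """Sample B on the grid k * t_end / n_steps, k = 0..n_steps.

    Uses (1.2)(i) for the starting value and (1.2)(iii) for the increments.
    """
    dt = t_end / n_steps
    path = [0.0]  # (1.2)(i): B_0 = 0
    for _ in range(n_steps):
        # (1.2)(iii): an independent N(0, dt) increment
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def martingale_means(t_end, n_steps, n_paths, seed=0):
    """Monte Carlo estimates of E[B_t] and E[B_t^2 - t] at t = t_end."""
    rng = random.Random(seed)
    sum_b = 0.0
    sum_b2_minus_t = 0.0
    for _ in range(n_paths):
        b = brownian_path(t_end, n_steps, rng)[-1]
        sum_b += b
        sum_b2_minus_t += b * b - t_end
    return sum_b / n_paths, sum_b2_minus_t / n_paths


if __name__ == "__main__":
    mean_b, mean_b2_minus_t = martingale_means(1.0, 20, 20000, seed=1)
    print(mean_b, mean_b2_minus_t)  # both should be close to 0
```

Both estimates should be near zero, up to Monte Carlo error of order $n^{-1/2}$ in the number of paths $n$.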
The Kunita-Watanabe proof of Theorem 2.2 is given in Section IV.33; a more elementary proof without using stochastic calculus appears in Doob [1]. A remarkable consequence of Theorem 2.2 is that
(2.3) every continuous martingale is a time-change of Brownian motion.

For a statement and proof of this, see Section IV.34. One extremely useful consequence is that, since

$$\mathbf{P}\Big(\limsup_{t \to \infty} B_t = +\infty, \ \liminf_{t \to \infty} B_t = -\infty\Big) = 1$$

(as we shall see in Lemma 3.6), if $X$ is a continuous martingale for which $\mathbf{P}(\liminf X_t = -\infty) > 0$ then we must have $\mathbf{P}(\limsup X_t = +\infty) > 0$. See Section IV.34 for a full discussion.

The elementary arguments that gave (2.1) also show that for any $\theta \in \mathbb{R}$ (or indeed, for $\theta \in \mathbb{C}$)

(2.4) $\exp(\theta B_t - \tfrac{1}{2}\theta^2 t)$ is a martingale;

all one needs is that $\mathbf{E}(\exp[\theta(B_t - B_s)]) = \exp[\tfrac{1}{2}\theta^2 (t - s)]$ for $0 \le s \le t$, which is just the moment-generating function of a Gaussian distribution. These exponential martingales are extremely useful in many ways; in Section 9 we use them to compute the Brownian first-passage distribution to a level, and in Section 16 we derive the Law of the Iterated Logarithm using them.

One small point to note here in connection with the exponential martingales (2.4) is that if we define the Hermite polynomials $H_n(t, x)$ by

$$\exp(\theta x - \tfrac{1}{2}\theta^2 t) := \sum_{n \ge 0} \frac{\theta^n}{n!} H_n(t, x),$$

then, for $0 \le s \le t$,

$$\mathbf{E}\big(\exp(\theta B_t - \tfrac{1}{2}\theta^2 t) \mid \mathcal{B}_s\big) = \sum_{n \ge 0} \frac{\theta^n}{n!} \mathbf{E}\big(H_n(t, B_t) \mid \mathcal{B}_s\big) = \exp(\theta B_s - \tfrac{1}{2}\theta^2 s) = \sum_{n \ge 0} \frac{\theta^n}{n!} H_n(s, B_s),$$

so, by comparing coefficients of $\theta^n$, we deduce that $H_n(t, B_t)$ is a martingale for each $n$. It is easy to check that $H_1(t, x) = x$ and $H_2(t, x) = x^2 - t$, so, in particular, (2.4) ⇒ (2.1); Lévy's Theorem 2.2 is essentially the converse to this.

(2.5) Remark. If $(N_t)_{t \ge 0}$ is a standard Poisson process then $X_t := N_t - t$ satisfies all of the hypotheses of Theorem 2.2 except for continuity of the paths.

3. Brownian motion as a Gaussian process. In complete generality, a (real-valued) process $(X_t)_{t \in T}$ indexed by some set $T$ is said to be a Gaussian process
if, for any $t_1, \ldots, t_n \in T$, the law of $(X(t_1), \ldots, X(t_n))$ is multivariate Gaussian. Thus the law of the process $X$ is specified by the functions

$$\mu(t) := \mathbf{E} X_t, \qquad \rho(s, t) := \operatorname{cov}(X_s, X_t).$$

(By this, we mean no more than that if we were told $\mu$ and $\rho$, we could work out the law of $(X(t_1), \ldots, X(t_n))$ for any $t_1, \ldots, t_n \in T$.) In the study of Gaussian processes, one usually assumes that $\mu = 0$, to which the general case can be reduced by considering the Gaussian process $X_t - \mu(t)$.

It is obvious that $(B_t)_{t \ge 0}$ is a Gaussian process, with mean zero, and covariance

$$\rho(s, t) = s \wedge t \qquad (s, t \ge 0). \tag{3.1}$$

Any continuous real-valued process $(X_t)_{t \ge 0}$ that is a zero-mean Gaussian process with covariance (3.1) is a Brownian motion: just check the definition! This simple fact turns out to be an extremely efficient means of checking when a process is a Brownian motion, and the following four simple but extremely important examples serve to illustrate this:

(3.2) the process $(-B_t)_{t \ge 0}$ is a Brownian motion;
(3.3) for any $a \ge 0$, the process $(B_{t+a} - B_a)_{t \ge 0}$ is a Brownian motion;
(3.4) for any $c \ne 0$, $(c B_{t/c^2})_{t \ge 0}$ is a Brownian motion (Brownian scaling);
(3.5) the process $(\tilde{B}_t)_{t \ge 0}$ defined by $\tilde{B}_0 = 0$, $\tilde{B}_t = t B_{1/t}$ for $t > 0$, is a Brownian motion.

The proofs of these properties are trivial exercises, with the sole exception of the proof of continuity at 0 of $\tilde{B}$. But this is not difficult, because the event that $\tilde{B} \to 0$ at 0 is

$$\tilde{F} := \bigcap_n \bigcup_m \bigcap_{q \in \mathbb{Q} \cap (0, 1/m]} \{ |\tilde{B}_q| \le 1/n \},$$

since $\tilde{B}$ is certainly continuous in $(0, \infty)$. But the processes $(B_t)_{t > 0}$ and $(\tilde{B}_t)_{t > 0}$ are continuous, and have the same distribution (they are Gaussian processes with the same covariance!), so $\mathbf{P}(\tilde{F}) = \mathbf{P}(F) = 1$, where $F$ is the event $\bigcap_n \bigcup_m \bigcap_{q \in \mathbb{Q} \cap (0, 1/m]} \{ |B_q| \le 1/n \}$ that $B \to 0$ at 0, which, by the definition of $B$, is certain.

The most important by far of the properties (3.2)-(3.5) is the Brownian scaling property (3.4). We shall give here an easy but striking consequence.

(3.6) LEMMA. We have

$$\mathbf{P}\Big(\sup_t B_t = +\infty, \ \inf_t B_t = -\infty\Big) = 1.$$
Proof. Let $Z := \sup_t B_t$. By Brownian scaling, for any $c > 0$, we have $cZ \stackrel{\mathcal{D}}{=} Z$, so the law of $Z$ is concentrated on $\{0, +\infty\}$. Let $p = \mathbf{P}(Z = 0)$. Then

$$\mathbf{P}(Z = 0) \le \mathbf{P}[B_1 \le 0 \text{ and } B_u \le 0 \text{ for all } u \ge 1] = \mathbf{P}\Big(B_1 \le 0 \text{ and } \sup_{t \ge 0} \{B_{1+t} - B_1\} = 0\Big),$$

because $(B_{1+t} - B_1)_{t \ge 0}$ is a Brownian motion, whose supremum is therefore 0 or $+\infty$. But $(B_{1+t} - B_1)_{t \ge 0}$ is independent of $(B_u)_{u \le 1}$, so we deduce that

$$p = \mathbf{P}(Z = 0) \le \mathbf{P}(B_1 \le 0)\,\mathbf{P}(Z = 0) = \tfrac{1}{2} p,$$

whence $p = 0$. Combining with (3.2) gives the stated result. □

Lemma 3.6 implies straight away that, almost surely, for each $a \in \mathbb{R}$, $\{t : B_t = a\}$ is not bounded above. Thus Brownian motion is recurrent: it keeps returning to its starting point.

We shall have more to say about Gaussian processes in Part 4 of this chapter, but point out now that the discussion there is by way of an interesting digression from our main theme; the general setting for Gaussian processes is too general to permit full exploitation of the special features of Brownian motion (notably a completely ordered index set).

4. Brownian motion as a Markov process. Brownian motion is a (time-homogeneous) Markov process; for any bounded Borel $f : \mathbb{R} \to \mathbb{R}$, and $s, t \ge 0$,

$$\mathbf{E}[f(B_{t+s}) \mid \mathcal{B}_s] = P_t f(B_s), \tag{4.1}$$

where the transition semigroup $(P_t)_{t \ge 0}$ is defined by

$$P_t f(x) := \begin{cases} \displaystyle\int_{-\infty}^{\infty} p_t(x, y) f(y) \, dy & (t > 0), \\ f(x) & (t = 0), \end{cases}$$

where

$$p_t(x, y) := (2\pi t)^{-1/2} \exp\left[-\frac{(x - y)^2}{2t}\right] \tag{4.2}$$

is the Brownian transition density. The Markov property (4.1) is immediate from the definition of Brownian motion. It is easy to confirm that $(P_t)_{t \ge 0}$ is a semigroup:

$$P_{t+s} = P_t P_s = P_s P_t \qquad (s, t \ge 0), \tag{4.3}$$

the so-called Chapman-Kolmogorov equations. The semigroup property (4.3)
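suggests that we ought in some sense to have

$$\frac{d}{dt} P_t = \lim_{s \downarrow 0} \frac{1}{s}(P_{t+s} - P_t) = P_t \mathcal{G} = \mathcal{G} P_t, \tag{4.4}$$

where

$$\mathcal{G} := \lim_{t \downarrow 0} \frac{1}{t}(P_t - I) \tag{4.5}$$

is the (infinitesimal) generator of $(P_t)_{t \ge 0}$. This is indeed true in complete generality, when suitably interpreted; the suitable interpretation involves us in some fairly careful analysis, because in general $\mathcal{G}$ is not defined for all functions, and much of the classical early work on Markov processes struggled with these technicalities. This functional-analytic viewpoint has many merits, not least that it can suggest quickly what things are likely to be true, but we shall not stress it too much because it is not a very convenient framework in which to prove the conjectures to which it leads.

But, for now, let us illustrate the notion by working out the generator of Brownian motion. From (4.5), we should define $\mathcal{G} f$ for suitable $f$ by

$$\mathcal{G} f := \lim_{t \downarrow 0} \frac{1}{t}(P_t f - f)$$

and, indeed, if $f \in C_K^2(\mathbb{R})$ then

$$\lim_{t \downarrow 0} \frac{1}{t}(P_t f - f)(x) = \lim_{t \downarrow 0} \int_{-\infty}^{\infty} \frac{1}{t}\{f(x + y\sqrt{t}) - f(x)\} \exp(-\tfrac{1}{2} y^2) \frac{dy}{\sqrt{2\pi}}$$
$$= \lim_{t \downarrow 0} \int_{-\infty}^{\infty} \{y t^{-1/2} f'(x) + \tfrac{1}{2} y^2 f''(x + \theta y \sqrt{t})\} \exp(-\tfrac{1}{2} y^2) \frac{dy}{\sqrt{2\pi}} = \tfrac{1}{2} f''(x)$$

(where $\theta \in (0, 1)$ depends on $y\sqrt{t}$). Thus the infinitesimal generator of Brownian motion is

$$\frac{1}{2} \frac{d^2}{dx^2},$$

at least when applied to $C_K^2(\mathbb{R})$. From (4.4), we find that, for $f \in C_K^2(\mathbb{R})$,

$$\frac{\partial}{\partial t} P_t f(x) = \mathcal{G} P_t f(x) = \tfrac{1}{2} (P_t f)''(x),$$

which leads to Kolmogorov's backward equation for the Brownian transition density:

$$\frac{\partial}{\partial t} p_t(x, y) = \frac{1}{2} \frac{\partial^2}{\partial x^2} p_t(x, y), \tag{4.6}$$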
since $f$ is arbitrary. Using the other part of (4.4) gives us

$$\frac{\partial}{\partial t} P_t f(x) = P_t \mathcal{G} f(x) = \tfrac{1}{2} P_t f''(x),$$

and an integration by parts now yields Kolmogorov's forward equation for the Brownian transition density:

$$\frac{\partial}{\partial t} p_t(x, y) = \frac{1}{2} \frac{\partial^2}{\partial y^2} p_t(x, y). \tag{4.7}$$

This equation is familiar in physics, where it is known as the heat equation, or the diffusion equation, so called because it determines the physical flow of heat, or the physical diffusion of particles in solution, in a homogeneous medium. Many of the notions of diffusion that probabilists use every day were known to physicists long ago, and amount to the same things in different language (see, for example, the classic book by Crank [1] for a physicist's exposition, and a broad selection of fascinating and challenging questions). It is however important to stress that we are not simply going to be rederiving results well known in physics; probability provides techniques for the study of individual diffusing particles, which are far more flexible and powerful than the classical analysis of the heat equation, which is only a statement about the average behaviour of a large number of diffusing particles.

5. Brownian motion as a diffusion (and martingale). Without trying to be too precise, a diffusion (on the real line for now) is a continuous time-homogeneous Markov process $X$ that is 'characterised' in some sense by its local infinitesimal drift $b$ and variance $a$: for small $h$,

(5.1)(i) $\mathbf{E}[X_{t+h} - X_t \mid \mathcal{F}_t] = h\,b(X_t)$,
(5.1)(ii) $\mathbf{E}[\{X_{t+h} - X_t - h\,b(X_t)\}^2 \mid \mathcal{F}_t] = h\,a(X_t)$.

If $a$ and $b$ were constant functions then $X_t = \sigma B_t + bt$, $\sigma := a^{1/2}$, would satisfy the description (5.1); the more general diffusion is rather similar except that the drift and variance may now depend on position. It is unnecessary to impose conditions on moments of the increment $X_{t+h} - X_t$ beyond the second, which you will certainly accept as plausible if you recall Lévy's Theorem (2.2), which said that Brownian motion is characterised by $b = 0$, $a = 1$.
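As a quick check of the constant-coefficient claim, here is a short worked computation (an added illustration, taking $\mathcal{F}_t$ to be the natural filtration of $B$, which is an assumption made only for this check). With $X_t = \sigma B_t + bt$ and $\sigma^2 = a$,

$$X_{t+h} - X_t = \sigma (B_{t+h} - B_t) + bh,$$

so by (1.2)(iii) the increment $B_{t+h} - B_t$ is independent of $\mathcal{F}_t$ with law $N(0, h)$, giving

$$\mathbf{E}[X_{t+h} - X_t \mid \mathcal{F}_t] = \sigma \cdot 0 + bh = hb,$$
$$\mathbf{E}[\{X_{t+h} - X_t - hb\}^2 \mid \mathcal{F}_t] = \sigma^2\,\mathbf{E}[(B_{t+h} - B_t)^2] = \sigma^2 h = ha.$$

In this special case both parts of (5.1) hold exactly for every $h \ge 0$, not merely for small $h$.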
Broadly speaking, there are three approaches to diffusions: the stochastic differential equation (SDE) approach, the martingale-problem approach, and the partial differential equation (PDE) approach. Each has its merits and peculiar techniques. The SDE approach constructs the diffusion $X$ with given infinitesimal characteristics $a$ and $b$ by solving
(5.2) $X_t = X_0 + \int_0^t\sigma(X_s)\,dB_s + \int_0^t b(X_s)\,ds,$
where $\sigma := a^{1/2}$, an equation that is commonly written in 'differential' form
(5.3) $dX_t = \sigma(X_t)\,dB_t + b(X_t)\,dt.$
Thus $X$ has infinitesimal drift $b$ and infinitesimal variance $a$, since the increment $X_{t+h} - X_t$ is (approximately) $\sigma(X_t)(B_{t+h} - B_t) + hb(X_t)$. There is a lot of work involved in defining what the second term on the right-hand side of (5.2) means, and in verifying existence and uniqueness of a solution under suitable conditions on $\sigma$ and $b$; we shall have almost nothing to say on this until Volume 2.

The martingale-problem approach and the PDE approach both begin from the same trivial calculation based on (5.1). For any $f\in C_K^2$,
(5.4) $\mathbf{E}[f(X_{t+h}) - f(X_t)\mid\mathcal{F}_t] = \mathbf{E}[f'(X_t)(X_{t+h} - X_t) + \tfrac12 f''(\theta X_{t+h} + (1-\theta)X_t)(X_{t+h} - X_t)^2\mid\mathcal{F}_t]$
(where $\theta\in(0,1)$ is random)
$= f'(X_t)hb(X_t) + \tfrac12 f''(X_t)[ha(X_t) + h^2b(X_t)^2]$
$= h\mathcal{L}f(X_t) + O(h^2),$
where $\mathcal{L}$ is the second-order elliptic operator
(5.5) $\mathcal{L}f(x) := \frac12 a(x)\frac{d^2f}{dx^2}(x) + b(x)\frac{df}{dx}(x).$
The martingale-problem approach takes (5.4) and re-expresses it as
$\mathbf{E}\left[f(X_{t+h}) - f(X_t) - \int_t^{t+h}\mathcal{L}f(X_s)\,ds\,\Big|\,\mathcal{F}_t\right] = o(h),$
so that the martingale-problem 'definition' of a diffusion $X$ with drift $b$ and variance $a$ is that $X$ is a continuous process such that, for all $f\in C_K^2$,
(5.6) $f(X_t) - \int_0^t\mathcal{L}f(X_s)\,ds$ is a martingale.
The PDE approach takes expectations on both sides of (5.4) to get
(5.7) $P_hf(x) - f(x) = h\mathcal{L}f(x) + O(h^2),$
so that, dividing by $h$ and letting $h\downarrow 0$,
(5.8) the infinitesimal generator $\mathcal{G}$ of $X$ is $\mathcal{L}$.
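The remark that the increment of $X$ is approximately $\sigma(X_t)(B_{t+h} - B_t) + hb(X_t)$ suggests the simplest numerical scheme for (5.3), usually called the Euler-Maruyama scheme. The sketch below is an illustration under stated assumptions rather than part of the theory: the example coefficients $b(x) = -x$, $\sigma \equiv 1$, the step count, the seed and all function names are choices made here. For these coefficients the mean satisfies $\frac{d}{dt}\mathbf{E}X_t = -\mathbf{E}X_t$, so $\mathbf{E}X_1 = e^{-1}X_0$, which gives something to compare against.

```python
import math
import random


def euler_maruyama(x0, b, sigma, t_end, n_steps, rng):
    """One Euler-Maruyama path of dX = sigma(X) dB + b(X) dt; returns X at time t_end."""
    h = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        d_b = rng.gauss(0.0, math.sqrt(h))  # Brownian increment over a step of length h
        x += sigma(x) * d_b + h * b(x)
    return x


def sample_mean(n_paths=2000, seed=0):
    """Monte Carlo estimate of E X_1 for drift b(x) = -x, sigma = 1, X_0 = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += euler_maruyama(1.0, lambda x: -x, lambda x: 1.0, 1.0, 100, rng)
    return total / n_paths


estimate = sample_mean()
print("estimate of E X_1:", estimate, " exact value:", math.exp(-1.0))
```

The agreement is only up to Monte Carlo and discretisation error; nothing here addresses the existence and uniqueness questions mentioned in the text.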
The PDE approach is now ready to go, with all of the arsenal of PDE techniques at its disposal; for example, one may begin by looking for the fundamental solution $p_t(x,y)$ to
$\frac{\partial}{\partial t}p_t(x,y) = \mathcal{L}_xp_t(x,y), \quad p_0(x,y) = \delta_y(x), \quad p_t\ge 0,$
where $\mathcal{L}_x$ is the operator $\mathcal{L}$ acting on the $x$-variable, and $\delta_y$ is the Dirac delta function at $y$. This fundamental solution is the transition density of the diffusion, from which one can obtain much information; see Chapter 3 of Stroock and Varadhan [1], which is also the definitive account of the martingale-problem method applied to multidimensional diffusions.

There are still real problems of definition, existence and uniqueness for each of the three approaches, least severe for the PDE approach. But the additional price to be paid for using stochastic methods is worth it; the conditions imposed on $a$ and $b$ to get a PDE result to work are generally of a global nature, whereas the diffusion, being continuous, should only care about local behaviour. The stochastic methods are just right for this: once a diffusion leaves a region where everything is nice, we can stop it, and solve in the nice region, thereby giving results under only local conditions. We have great admiration and respect for the PDE approach: the analysts' fine results are not valid only for second-order elliptic operators, as the probabilistic results are. The last word for the moment on the comparison between the three methods must be with Sid Port: 'The one thing probabilists can do which analysts can't is stop, and they never forgive us for it.'

You will realise by now that one can perfectly well have diffusions in dimension greater than one, but the one-dimensional diffusion theory is essentially complete, thanks to Brownian local time. The existence and properties of Brownian local time form the first non-trivial result in the theory (after the existence of Brownian motion itself).

(5.9) THEOREM (Trotter).
There exists a process $\{l(t,x): t\ge 0, x\in\mathbb{R}\}$ such that
(5.10)(i) $(t,x)\mapsto l(t,x)$ is jointly continuous;
(5.10)(ii) for any bounded measurable $f$ and $t\ge 0$,
$\int_0^t f(B_s)\,ds = \int_{-\infty}^{\infty}f(x)l(t,x)\,dx.$
This is a deep result, whose proof using stochastic calculus we shall finally give in Section IV.44. The key property (5.10)(ii) is the occupation density formula; we shall discuss some of the implications for the Brownian sample path in Section 10, but for now we describe the most general regular diffusion (see Sections V.44-54 for the whole story).

(5.11) THEOREM. A (regular) one-dimensional diffusion $X$ on an interval $I$ can
be obtained from Brownian motion $B$ as
$X_t = s^{-1}(B(\tau_t)),$
where the scale function $s: I\to\mathbb{R}$ is continuous and strictly increasing and
$\tau_t = \inf\{u: A_u > t\}$, where $A_u = \int m(dx)\,l(u,s(x))$
for some measure $m$ (the speed measure) that puts positive finite mass on bounded non-empty open subintervals of $I$ which exclude the endpoints of $I$. Any pair $(s,m)$ gives rise to a regular diffusion, which is uniquely characterised by $(s,m)$.

The theory of diffusions in dimension greater than one is still much less complete, and doubtless will remain so.

2. BASICS ABOUT BROWNIAN MOTION

6. Existence and uniqueness of Brownian motion. The existence proof for Brownian motion that we now give (due to Ciesielski [1]) is the ultimate refinement of Wiener's original idea of representing Brownian motion as a random Fourier series.

(6.1) THEOREM. There exists a probability space on which it is possible to define a process $(B_t)_{0\le t\le 1}$ with the properties
(i) $B_0(\omega) = 0$ for all $\omega$;
(ii) the map $t\mapsto B_t(\omega)$ is a continuous function of $t\in[0,1]$ for all $\omega$;
(iii) for every $0\le s\le t\le 1$, $B_t - B_s$ is independent of $\{B_u: u\le s\}$ and has a $N(0,t-s)$ distribution.

Proof. Take some probability space on which there is defined an infinite sequence of independent $N(0,1)$ random variables. For reasons that will soon be apparent, we assume that they are indexed as $\{Z_{k,n}: n\in\mathbb{Z}^+,\ k\text{ odd},\ k\le 2^n\}$. Now define
$g_{1,0}(t) \equiv 1,$
$g_{k,n}(t) = \begin{cases} 2^{(n-1)/2} & ((k-1)2^{-n} < t\le k2^{-n}), \\ -2^{(n-1)/2} & (k2^{-n} < t\le (k+1)2^{-n}), \\ 0 & \text{otherwise,}\end{cases}$
for $n\ge 1$, $k\le 2^n$, $k$ odd. For notational convenience, let $S_n = \{(k,n): k\text{ odd},\ k\le 2^n\}$, $S = \bigcup_{n\ge 0}S_n$. The first thing to notice is that $\{g_{k,n}: (k,n)\in S\}$ is a complete
orthonormal system in $L^2[0,1]$. The orthonormality of the $g_{k,n}$ is easy to check; and, for completeness, if $f\in L^2[0,1]$ were orthogonal to all the $g_{k,n}$ then $F(t) = \int_0^t f(u)\,du$ would vanish at 0 and 1 (since $f\perp g_{1,0}$), and also at $\frac12$ (since $f\perp g_{1,1}$); and also at $\frac14, \frac34$ (since $f\perp g_{1,2}, g_{3,2}$), .... Thus $F\equiv 0$, and $f = 0$. Now define $f_{k,n}(t) := \int_0^t g_{k,n}(u)\,du$, and the approximations $B_n(\cdot)$ to Brownian motion by
$B_n(t) = \sum_{m=0}^{n}\sum_{(k,m)\in S_m}Z_{k,m}f_{k,m}(t).$
Let us describe what these approximations are doing. The first approximation $B_0$ is simply $tZ_{1,0}$, a straight line. The next approximation is obtained by adding on a Gaussian multiple of $f_{1,1}$, which is a tent-shaped function, vanishing at 0 and 1. The next approximation is obtained by adding on two Gaussian multiples of tent-shaped functions, which both vanish at 0, $\frac12$ and 1. The first three approximations are illustrated in Fig. 1.1. So what is happening is that the $n$th approximation is piecewise-linear and continuous, and is equal to the limit value at each point of the form $j2^{-n}$.

[Fig. 1.1: the first three approximations to Brownian motion.]

The next stage of the proof is to establish that the $B_n$ converge uniformly almost surely. Indeed, for any positive constant $a_n$,
$\mathbf{P}\left(\sup_{0\le t\le 1}|B_n(t) - B_{n-1}(t)| > a_n\right) = \mathbf{P}\left(\sup_{(k,n)\in S_n}|Z_{k,n}| > 2^{(n+1)/2}a_n\right)$
(since the $f_{k,n}$ are all at most $2^{-(n+1)/2}$)
$\le 2^{n-1}\mathbf{P}(|Z_{1,n}| > 2^{(n+1)/2}a_n)$
$\le (4\pi)^{-1/2}2^{n/2}a_n^{-1}\exp(-a_n^22^n),$
by the elementary estimate $\int_x^\infty\exp(-\tfrac12 y^2)\,dy\le x^{-1}\exp(-\tfrac12 x^2)$. We now aim to choose the constants $a_n$ in such a way that
$\sum_n 2^{n/2}a_n^{-1}\exp(-a_n^22^n) < \infty, \quad \sum_n a_n < \infty.$
The first of these conditions will ensure that, almost surely, $\sup_t|B_n(t) - B_{n-1}(t)|\le a_n$ for all large enough $n$; the second will guarantee that the $B_n$ converge uniformly (almost surely) to a limit $B$, which is therefore continuous. But these conditions are satisfied by the choice $a_n = (n2^{-n})^{1/2}$, for example.

Thus we have proved that, almost surely, the $B_n$ converge uniformly to some continuous limit $B$, which we now must show is Brownian motion. As we saw in Section 3, the simplest way to do this is to check that $B$ is a zero-mean Gaussian process with covariance structure $\mathbf{E}(B_sB_t) = s\wedge t$. Obviously, each $B_n$ is a zero-mean Gaussian process: the vector $(B_n(t_1),\ldots,B_n(t_k))$ is multivariate Gaussian. This converges almost surely (and so in distribution) to $(B(t_1),\ldots,B(t_k))$, which also has a zero-mean Gaussian law, and the limit of the covariances of the $B_n$ gives the covariance of $B$. But
$\mathbf{E}[B_n(s)B_n(t)] = \sum_{m=0}^{n}\sum_{(k,m)\in S_m}f_{k,m}(s)f_{k,m}(t),$
by independence of the $Z_{k,m}$, and this converges as $n\uparrow\infty$ to
$\sum_{(k,m)\in S}f_{k,m}(s)f_{k,m}(t) = \int_0^1 I_{[0,s]}(u)I_{[0,t]}(u)\,du = s\wedge t,$
since $f_{k,n}(s) = \int_0^1 I_{[0,s]}(u)g_{k,n}(u)\,du$ is the Fourier coefficient of $g_{k,n}$ in the representation of $I_{[0,s]}$ in terms of the complete orthonormal system $\{g_{k,n}: (k,n)\in S\}$. Parseval's identity concludes the proof. □

(6.2) COROLLARY. There exists a probability space on which Brownian motion can be defined.

The idea is obvious; we can build on some probability space independent copies of the process constructed in Theorem 6.1, and then stick them together
to make a process $(B_t)_{t\ge 0}$. We leave the reader to satisfy himself or herself that this can easily be done.

(6.3) Remark. The observant reader will notice that in Theorem 6.1 we obtained $B$ as the almost sure uniform limit of continuous functions, and may be worrying what happens on the null set: define $B\equiv 0$ there!

Now we turn to the uniqueness of Brownian motion, a much simpler matter, once one decides the sense in which Brownian motion is unique. But this can only be in the distributional sense: there is a unique probability measure $\mathbf{P}$ (called Wiener measure) on the space $C(\mathbb{R}^+,\mathbb{R}) = \{\text{continuous } x:\mathbb{R}^+\to\mathbb{R}\}$ such that under $\mathbf{P}$
(6.4)(i) $x(0) = 0$, a.s.;
(6.4)(ii) for $t_0 = 0 < t_1 < \cdots < t_n$, $(x(t_j) - x(t_{j-1}))_{j=1}^n$ are independent zero-mean Gaussian variables with variances $t_j - t_{j-1}$ $(j = 1,\ldots,n)$.
If we define $\pi_t: C(\mathbb{R}^+,\mathbb{R})\to\mathbb{R}$ to be the projection map $\pi_t(x) = x(t)$ and $\mathcal{C}$ to be the collection of all cylinder sets
$\{x: x(t_j)\in A_j \text{ for } j = 1,\ldots,n\}$
as $n$ runs through $\mathbb{N}$, $t_1,\ldots,t_n$ run through $\mathbb{R}^+$, and $A_1,\ldots,A_n$ run through $\mathcal{B}(\mathbb{R})$, then $\mathcal{C}$ is a $\pi$-system that generates the $\sigma$-field $\mathcal{A} = \sigma(\{\pi_t: t\ge 0\})$. Any two measures $\mathbf{P}$ and $\mathbf{P}'$ on $C(\mathbb{R}^+,\mathbb{R})$ with properties (6.4)(i) and (ii) must agree on $\mathcal{C}$ and therefore on $\mathcal{A}$ (see Lemma II.4.6). Thus $\mathbf{P}$ is unique.

(6.5) Remarks. It is also true that $\mathcal{A}$ is the Borel $\sigma$-field on $C(\mathbb{R}^+,\mathbb{R})$ when this space is equipped with the topology of uniform convergence on compacts (see Lemma II.82.3).

7. Skorokhod embedding. In many ways, a zero-mean finite-variance random walk looks like Brownian motion, and we shall shortly see this made precise. The key to understanding this is the celebrated Skorokhod embedding, which allows one to embed any random walk with zero mean and finite variance into a given Brownian motion in a very well-controlled way. We suppose that $F$ is the distribution of the steps of the random walk; thus
$\int_{-\infty}^{\infty}xF(dx) = 0, \quad \int_{-\infty}^{\infty}x^2F(dx) = \sigma^2 < \infty.$
The aim is to find a stopping time $T$ for Brownian motion $B$ such that $B_T\sim F$ and $\mathbf{E}T = \sigma^2$.
The requirement that $\mathbf{E}T = \sigma^2$ is essential to our subsequent use of the Skorokhod embedding, and prevents the problem from being a triviality. (As Doob has remarked, we could just take $T := \inf\{t > 1: B_t = h(B_1)\}$, where $h$ is chosen to make $h(B_1)\sim F$. But for this $T$, $\mathbf{E}T = \infty$.) There are a number of different ways to build the stopping time $T$ based on the path of $B$ alone; we discuss the beautiful Azéma-Yor construction in Section VI.51 to illustrate this by excursion theory. For now, though, we take a simpler though less elegant approach and assume that we are given some further randomisation independent of $B$, which we use to pick a pair $\alpha < 0 < \beta$ according to the distribution
(7.1) $\mu(da,db) = \gamma(b - a)F_+(db)F_-(da),$
where $F_\pm$ are the restrictions of $F$ to $[0,\infty)$, $(-\infty,0)$ respectively, and
(7.2) $\gamma^{-1} = \int_0^\infty bF_+(db) = -\int_{-\infty}^0 aF_-(da) = \frac12\int_{-\infty}^{\infty}|x|F(dx)$
is the appropriate normalisation. To see why this is a good thing to do, we first need to do a couple of trivial martingale calculations.

(7.3) PROPOSITION. If $\tau = \inf\{t: B_t\notin(a,b)\}$, where $a < 0 < b$ are fixed, then $\tau < \infty$, a.s., and
(7.4)(i) $\mathbf{P}(B_\tau = b) = \frac{-a}{b - a},$
(7.4)(ii) $\mathbf{E}\tau = |ab|.$

Proof. Finiteness of $\tau$ follows from Lemma 3.6. For (7.4)(i), use the Optional Sampling Theorem on the martingale $B$:
$0 = \mathbf{E}(B_{\tau\wedge n}) = b\mathbf{P}(B_\tau = b, \tau\le n) + a\mathbf{P}(B_\tau = a, \tau\le n) + \mathbf{E}[B_n: \tau > n]$
$\to b\mathbf{P}(B_\tau = b) + a\mathbf{P}(B_\tau = a)$ as $n\to\infty$,
since $B_{\tau\wedge n}$ is bounded. To obtain (7.4)(ii), consider the martingale $M_t = (B_t - a)(b - B_t) + t$, and use the Optional Sampling Theorem again:
$\mathbf{E}M_0 = -ab = \mathbf{E}M_{\tau\wedge n} = \mathbf{E}(\tau\wedge n) + \mathbf{E}(B_{\tau\wedge n} - a)(b - B_{\tau\wedge n})$
$\to\mathbf{E}\tau$ as $n\to\infty$. □
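To see concretely why the weighting (7.1) is 'a good thing to do', one can mix the two-point exit laws of Proposition 7.3 over pairs $(a,b)$ drawn from $\mu$ and check, in exact rational arithmetic, that the mixture gives back $F$ and that the mean exit time is $\sigma^2$. The sketch below does this for a small discrete step distribution; the particular atoms and the function name `embedding_check` are illustrative assumptions.

```python
from fractions import Fraction

# A zero-mean step distribution on four atoms (an arbitrary example):
# mean = -2/8 - 1/2 + 1/4 + 4/8 = 0, variance = 13/4.
F = {-2: Fraction(1, 8), -1: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 8)}


def embedding_check(F):
    """Return (law of B_T, E T) when (a, b) is drawn from mu of (7.1) and T is the exit time of (a, b)."""
    neg = {a: p for a, p in F.items() if a < 0}
    pos = {b: p for b, p in F.items() if b >= 0}
    gamma = 1 / sum(b * p for b, p in pos.items())   # normalisation (7.2)
    law = {x: Fraction(0) for x in F}
    mean_T = Fraction(0)
    for a, pa in neg.items():
        for b, pb in pos.items():
            mu = gamma * (b - a) * pb * pa           # weight of the pair (a, b), as in (7.1)
            p_up = Fraction(-a, b - a)               # P(B_tau = b), from (7.4)(i)
            law[b] += mu * p_up
            law[a] += mu * (1 - p_up)
            mean_T += mu * abs(a * b)                # E tau = |ab|, from (7.4)(ii)
    return law, mean_T


law, mean_T = embedding_check(F)
variance = sum(x * x * p for x, p in F.items())
print(law == F, mean_T == variance)
```

The same bookkeeping, written with integrals instead of sums, handles a general step distribution.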
The Skorokhod embedding is described as follows. Take the two points $\alpha < 0 < \beta$, and run $B$ until it hits one or other:
$T := \inf\{u: B_u\notin(\alpha,\beta)\}.$

(7.5) THEOREM (Skorokhod embedding). The law of $B_T$ is $F$, and $\mathbf{E}T = \sigma^2$.

Proof. From (7.1) and (7.4)(i), we see immediately that, for $b > 0$,
$\mathbf{P}(B_T\in db) = \int_{(-\infty,0)}\frac{-a}{b - a}\,\gamma(b - a)F_+(db)F_-(da) = F_+(db),$
using the definition of $\gamma$. The same calculation with the roles of $a$ and $b$ exchanged gives $\mathbf{P}(B_T\in da) = F_-(da)$ for $a < 0$. Thus $B_T$ has law $F$. Next,
$\mathbf{E}T = \int\!\!\int\mu(da,db)|ab| = \gamma\int_{[0,\infty)}F_+(db)\int_{(-\infty,0)}F_-(da)(b - a)|ba| = \int_{-\infty}^{\infty}x^2F(dx) = \sigma^2$
as required. □

Now we see how to embed the random walk into Brownian motion; we take $T_1 := T$ as just described, and then perform the same construction on the Brownian motion $(B_{T_1+t} - B_{T_1})_{t\ge 0}$ to obtain a stopping time $T_2'$ with mean $\sigma^2$ and such that $B_{T_1+T_2'} - B_{T_1}\sim F$. Now set $T_2 := T_1 + T_2'$, and proceed to carry out the same construction on the Brownian motion $(B_{T_2+t} - B_{T_2})_{t\ge 0}$ (the fact that this is a Brownian motion does not follow from (3.3), since $T_1$ and $T_2$ are random; we need the strong Markov property of Brownian motion, for which see Section 12, to justify this intuitively obvious fact). We go on doing this, ending up with a sequence $T_0 = 0\le T_1\le T_2\le\cdots$ of stopping times. To summarise then, we have the following result.

(7.6) THEOREM. The process $(S_n)_{n\ge 0} := (B(T_n))_{n\ge 0}$ is a random walk with step distribution $F$, and $\mathbf{E}T_n = n\sigma^2$.

The Skorokhod embedding just described reduces many statements about zero-mean finite-variance random walks to trivialities; for example, since $\mathbf{P}[\sup B_t = +\infty, \inf B_t = -\infty] = 1$,
(7.7) $\mathbf{P}[\sup S_n = +\infty, \inf S_n = -\infty] = 1$
for any such random walk! But we need to be a bit careful here: might the sequence $(T_n)$ not miss all those times when $B$ was above 20, say? In principle, yes; but, because we ensured that the $T_n - T_{n-1}$ are independent and identically distributed (IID) with finite mean, that cannot happen. This is the only place where applying the Skorokhod embedding is in the least bit delicate, and in our proof in Sections 8 and 17 of the big classical limit results for IID summands you will see how we deal with this. Generalising to independent summands that have different distributions tends to become more involved; the limit problem for $(S_n)$ gets converted into a limit problem for $(T_n)$! Nonetheless, if this can be solved, the functional form of the limit theorem often follows very easily. In any case, knowing what Brownian motion does will give a very good idea of what the random walk should do!

8. Donsker's Invariance Principle. It is an important fact that Brownian motion is a weak limit of random walks, both for our intuitive understanding, and for the easy proof it permits of various limit theorems, using the Continuous-Mapping Theorem (see Lemma II.84.2). Theorem 7.6 tells us that a random walk can be embedded in a Brownian motion, so the result cannot be said now to be a big surprise; and, indeed, all that is needed is a little care over the details. So we shall suppose given any zero-mean step distribution $F$ with unit variance, and use Theorem 7.6 to give us a random walk $S_n := B(T_n)$ with this step distribution. Now for each $n$ define the random function
(8.1) $S^{(n)}(t) = n^{-1/2}\{S_k + (nt - k)(S_{k+1} - S_k)\}, \quad \frac{k}{n}\le t\le\frac{k+1}{n}.$
This is a piecewise-linear continuous function, equal to $n^{-1/2}S_k$ at each point of the form $t = k/n$.

(8.2) THEOREM (Donsker's Invariance Principle). The processes $(S^{(n)}(t))_{0\le t\le 1}$ converge weakly to $(B_t)_{0\le t\le 1}$ as $n\to\infty$.

Weak convergence is studied in Chapter 2, Part 6.

Proof.
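We shall prove that, given $\varepsilon > 0$, there is some $n_0 = n_0(\varepsilon)$ so large that, for $n\ge n_0$,
(8.3) $\mathbf{P}\left[\sup_{0\le t\le 1}|S^{(n)}(t) - B^{(n)}(t)| > \varepsilon\right]\le\varepsilon,$
where $B^{(n)}$ is a Brownian motion; in fact, we define $B^{(n)}_t := n^{-1/2}B_{nt}$. To begin with, note that we can choose $\delta > 0$ so small that
(8.4) $\mathbf{P}[|B_s - B_t| > \varepsilon \text{ for some } s,t\in[0,1],\ |s - t|\le\delta]\le\tfrac12\varepsilon$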
since $B$ is continuous. Note also that, since the embedding times $T_n$ constitute a random walk, by the Strong Law of Large Numbers we have
$\frac{T_n}{n}\to 1$ a.s.,
so that
$\frac1n\sup_{k\le n}|T_k - k|\to 0$ a.s.
Thus, for some $n_1$, for all $n\ge n_1$,
$\mathbf{P}\left[\frac1n\sup_{k\le n}|T_k - k| > \tfrac13\delta\right]\le\varepsilon/2.$
The third simple fact we need is that, for any $t\in[k/n,(k+1)/n]$, there is some $nu\in[T_k,T_{k+1}]$ such that
(8.5) $S^{(n)}(t) = n^{-1/2}B_{nu} =: B^{(n)}_u.$
Thus we estimate
$\mathbf{P}\left[\sup_{0\le t\le 1}|S^{(n)}(t) - B^{(n)}(t)| > \varepsilon\right]$
$\le\mathbf{P}\left[\sup_{0\le t\le 1}|S^{(n)}(t) - B^{(n)}(t)| > \varepsilon,\ |T_k - k|\le\tfrac13\delta n \text{ for all } k\le n\right] + \mathbf{P}\left[\frac1n\sup_{k\le n}|T_k - k| > \tfrac13\delta\right]$
$\le\mathbf{P}[|B^{(n)}_u - B^{(n)}_t| > \varepsilon \text{ for some } u,t\in[0,1],\ |u - t|\le\delta] + \varepsilon/2,$
at least for $n\ge n_0 = n_1\vee(3/\delta)$. Indeed, if there is some $t\in[k/n,(k+1)/n]$ such that $|S^{(n)}(t) - B^{(n)}(t)| > \varepsilon$ then, using (8.5), there is some $u$ in $[n^{-1}T_k, n^{-1}T_{k+1}]$ such that $|B^{(n)}_u - B^{(n)}_t| > \varepsilon$; and, since $|n^{-1}T_k - k/n|\le\tfrac13\delta$ for all $k$ (at least off the unlikely event $\{n^{-1}\sup_{k\le n}|T_k - k| > \tfrac13\delta\}$), it follows that $|u - t|\le\delta$. Using (8.4) now yields (8.3):
$\mathbf{P}\left[\sup_{0\le t\le 1}|S^{(n)}(t) - B^{(n)}(t)| > \varepsilon\right]\le\varepsilon$ for all $n\ge n_0$. □

Remarks. Notice that Theorem 8.2 implies the Central Limit Theorem for IID zero-mean finite-variance random variables, and the proof makes no use anywhere of characteristic-function methods!
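As an informal illustration of Theorem 8.2 together with the Continuous-Mapping Theorem, take the functional 'supremum over $[0,1]$'. For the piecewise-linear path $S^{(n)}$ this is just $\max_{k\le n}S_k/\sqrt{n}$, and the reflection principle (Section 13) gives $\mathbf{P}[\sup_{t\le 1}B_t\le x] = 2\Phi(x) - 1$, which can be written with the error function as $\operatorname{erf}(x/\sqrt{2})$. The sketch below compares the two for a simple symmetric random walk; the walk length, number of walks, seed and function names are assumptions made for the illustration, and the agreement is only up to Monte Carlo error and a finite-$n$ bias.

```python
import math
import random


def scaled_walk_max(n, rng):
    """max over 0 <= k <= n of S_k / sqrt(n), for a simple symmetric random walk S."""
    s = 0
    best = 0
    for _ in range(n):
        s += 1 if rng.random() < 0.5 else -1
        if s > best:
            best = s
    return best / math.sqrt(n)


def fraction_below(x, n=400, n_walks=3000, seed=1):
    """Empirical P[max_k S_k / sqrt(n) <= x] over n_walks independent walks."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_walks) if scaled_walk_max(n, rng) <= x)
    return hits / n_walks


empirical = fraction_below(1.0)
limit = math.erf(1.0 / math.sqrt(2.0))  # 2 * Phi(1) - 1 for the Brownian supremum
print("random walk:", empirical, " Brownian limit:", limit)
```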
9. Exponential martingales and first-passage distributions. We shall use the Brownian exponential martingales (2.4) to derive the distribution of
$H_x := \inf\{t > 0: X_t = x\}$
for $x > 0$, where $X_t := B_t + ct$ is a Brownian motion with drift, and also to derive the distribution of $H_a\wedge H_b$, where $a < 0 < b$. Let us fix some $\lambda > 0$. From (2.4),
$\exp(\theta X_t - \lambda t) = \exp[\theta B_t - (\lambda - \theta c)t]$
is a martingale provided that $\lambda - \theta c = \tfrac12\theta^2$, that is,
$\theta = \beta := \sqrt{c^2 + 2\lambda} - c$, or $\theta = \alpha := -c - \sqrt{c^2 + 2\lambda}.$
Note that $\alpha < 0 < \beta$. Thus the martingale $\exp(\beta X_t - \lambda t)$ is bounded on $[0,H_x]$, so we can use the Optional Sampling Theorem to conclude that
$1 = \mathbf{E}\exp[\beta X(H_x) - \lambda H_x] = e^{\beta x}\mathbf{E}e^{-\lambda H_x},$
from which
(9.1) $\mathbf{E}e^{-\lambda H_x} = \exp\{-x(\sqrt{c^2 + 2\lambda} - c)\}.$
The Laplace transform can be inverted explicitly to give
(9.2) $\mathbf{P}(H_x\in dt)/dt = \frac{x}{\sqrt{2\pi t^3}}\exp[-(x - ct)^2/2t].$
From (9.1), taking the limit as $\lambda\downarrow 0$, we conclude that
$\mathbf{P}(H_x < \infty) = \begin{cases}1 & (c\ge 0), \\ e^{-2|c|x} & (c < 0).\end{cases}$
Thus, if the drift is negative, the drifting Brownian motion will with positive probability fail to hit some named positive level. This is not very surprising, since $t^{-1}X_t\to c$, a.s. Next, if we fix $a < 0 < b$, and let $T := H_a\wedge H_b$, then the process
$M_t := (e^{\beta b} - e^{\beta a})e^{\alpha X_t - \lambda t} + (e^{\alpha a} - e^{\alpha b})e^{\beta X_t - \lambda t}$
is a martingale of the form $f(X_t)e^{-\lambda t}$, where $f$ is constructed so that
$f(a) = f(b) = e^{\beta b + \alpha a} - e^{\beta a + \alpha b}.$
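The statement that (9.1) inverts to (9.2) is easy to test numerically: integrate $e^{-\lambda t}$ against the density (9.2) and compare with the closed form (9.1). The sketch below uses a plain midpoint rule; the truncation point `t_max`, the number of subintervals and the function names are assumptions chosen for this illustration, and the parameter values are arbitrary.

```python
import math


def first_passage_density(t, x, c):
    """Density (9.2) of H_x for X_t = B_t + c t, with x > 0."""
    return x / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-(x - c * t) ** 2 / (2.0 * t))


def laplace_numeric(lam, x, c, t_max=60.0, n=200_000):
    """Midpoint-rule approximation of E exp(-lam * H_x), integrating (9.2) over (0, t_max)."""
    h = t_max / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += math.exp(-lam * t) * first_passage_density(t, x, c)
    return total * h


def laplace_closed_form(lam, x, c):
    """Right-hand side of (9.1)."""
    return math.exp(-x * (math.sqrt(c * c + 2.0 * lam) - c))


numeric = laplace_numeric(1.0, 1.0, 0.5)
exact = laplace_closed_form(1.0, 1.0, 0.5)
print("numerical:", numeric, " closed form:", exact)
```

With $\lambda = 1$, $c = \frac12$ and $x = 1$ the closed form is $\exp(-1)$, since $\sqrt{c^2 + 2\lambda} = \frac32$.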
Another application of the Optional Sampling Theorem gives
(9.3) $\mathbf{E}e^{-\lambda T} = \frac{e^{\beta b} - e^{\beta a} + e^{\alpha a} - e^{\alpha b}}{e^{\beta b + \alpha a} - e^{\beta a + \alpha b}}.$
This time, there is no explicit inversion of the Laplace transform in any particularly useable form, even in the special case $a = -b$, $c = 0$, when we have the simpler statement
(9.4) $\mathbf{E}e^{-\lambda T} = \operatorname{sech}(b\sqrt{2\lambda}).$

Remarks. Everything in this section is entirely classical, but is a good illustration of martingale techniques in conjunction with the Brownian exponential martingales. We shall later see a completely different derivation of (9.2) which is far more illuminating (see Section 14).

10. Some sample-path properties. The fine structure of Brownian sample paths exerts a mesmeric fascination, which you will understand after reading this section; Brownian paths are wilder than we can imagine! As a small aperitif, we give a soft argument to prove that, almost surely, $B$ is not differentiable at zero. Notice first that, by the time-inversion property (3.5) and the oscillation of the Brownian path near infinity (Lemma 3.6),
$\mathbf{P}[\text{for each }\varepsilon > 0,\ \exists\, s,t\le\varepsilon \text{ such that } B_s < 0 < B_t] = 1,$
so the only possible derivative at zero would be zero. But if $B_0' = 0$, we have that, for all small enough $t$, $|B_t|\le t$. Time inversion again translates this into the statement that, for all large enough $s$, $|\tilde B_s|\le 1$ $(\tilde B_s := sB_{1/s})$, contradicting Lemma 3.6. By Fubini's Theorem, we conclude that, almost surely, $B$ is almost everywhere non-differentiable. What we really want is true, but needs another approach.

(10.1) THEOREM. Almost surely, $B$ is not Lipschitz-continuous anywhere. In particular, $B$ is nowhere differentiable.

Proof. Fixing some (large) $K > 0$, define, for each $n\ge 2$, the event
$A_n := \{\text{for some } s\in[0,1],\ |B_t - B_s|\le K|t - s| \text{ whenever } |t - s|\le 2/n\},$
and let $\Delta_{k,n} := |B(k/n) - B((k-1)/n)|$, $k = 1,\ldots,n$. Then
$A_n\subseteq\bigcup_{1<k<n}\{\Delta_{j,n}\le 4K/n \text{ for } j = k-1, k, k+1\},$
and so
$\mathbf{P}(A_n)\le(n-2)\mathbf{P}(\Delta_{1,n}\le 4K/n)^3 = (n-2)\mathbf{P}(|B_1|\le 4K/\sqrt{n})^3,$
which tends to 0 as $n\to\infty$. But the $A_n$ are increasing, so $\mathbf{P}(A_n) = 0$ for all $n$. □

This neat argument is due to Dvoretzky, Erdős and Kakutani. An elegant proof using the joint continuity of Brownian local time can be found in Geman and Horowitz [1].

The exact modulus of continuity of the Brownian path is an altogether more delicate result; you will find a proof of the following result in Section 1.6 of McKean [1].

(10.2) THEOREM (Lévy's Modulus-of-Continuity Theorem)
$\mathbf{P}\left[\limsup_{\delta\downarrow 0}\sup_{0\le t\le 1}\frac{|B_{t+\delta} - B_t|}{(2\delta\log(1/\delta))^{1/2}} = 1\right] = 1.$

This exact modulus of continuity should be compared with the celebrated Law of the Iterated Logarithm:
(10.3) $\mathbf{P}\left[\limsup_{t\downarrow 0}\frac{B_t}{(2t\log\log(1/t))^{1/2}} = +1\right] = 1.$
We prove this result in Section 16; do note that, although the Law of the Iterated Logarithm (10.3) must hold at almost every point of the path, by Theorem 10.2 there will be places where the oscillation will be wilder than $[2\delta\log\log(1/\delta)]^{1/2}$. Indeed, there will also be places where the oscillation will be like $c\delta^{1/2}$ (if $c$ is small enough); for a fascinating analysis of such slow points of Brownian motion, see Greenwood and Perkins [1,2].

One other sample path property worth mentioning is the following.

(10.4) THEOREM. Almost surely, Brownian motion has no point of increase:
$\mathbf{P}[\exists\,\delta > 0,\ s > 0 \text{ such that } B_{s-h}\le B_s\le B_{s+h} \text{ for all } h\in[0,\delta]] = 0.$

Of the many proofs, we give that of Burdzy [3] in Section 12.

(10.5) Remark. If $(B^1_t)_{t\ge 0}$ and $(B^2_t)_{t\ge 0}$ are independent Brownian motions, then $(B_t)_{t\ge 0} = (B^1_t, B^2_t)_{t\ge 0}$ is Brownian motion in the plane. For any $v\in\mathbb{R}^2$, $|v| = 1$, the process $(v\cdot B_t)$ is a real-valued Brownian motion, and therefore has no points of increase with probability 1. James Taylor has posed the question 'Is there a positive probability that for some $v$ the Brownian motion $v\cdot B$ has a point of
increase?' As far as we are aware, this problem is unresolved. If true, it would say that, with positive probability, the trace $B[0,1] := \{B_t: 0\le t\le 1\}$ of two-dimensional Brownian motion could be cut with a straight line; the best that is known so far is Burdzy's result that $B[0,1]$ can be cut with a Lipschitz curve.

11. Quadratic variation. While the sole result of this section could have been included in the previous section, we separate it out because of its fundamental importance in the construction of stochastic integrals. Define $t^n_k := t\wedge(k2^{-n})$, and
$[B]^n_t := \sum_{k\ge 1}[B(t^n_k) - B(t^n_{k-1})]^2.$

(11.1) LEMMA (Lévy). With probability 1, as $n\to\infty$,
$[B]^n_t\to t$
uniformly on compact $t$-intervals.

The proof is given in Section IV.2, so we defer the details. Since they are not hard, the reader may wish to have a go at proving Lemma 11.1 now.

(11.2) Remarks (i) If we take the quadratic variation of $B$ down the dyadic partitions, we get the answer $t$. If, on the other hand, we take
$\sup\left\{\sum_{k=1}^{N}|B(t_k) - B(t_{k-1})|^2: 0 = t_0 < t_1 < \cdots < t_N = 1\right\},$
we get the answer $+\infty$; the true quadratic variation of $B$ is infinite almost surely. See Lévy [2, p. 190] or Freedman [1, p. 48].
(ii) Of course, it follows immediately from Lemma 11.1 that the variation of almost every Brownian path is infinite on every interval.

12. The strong Markov property. We give here a quick proof of the strong Markov property for Brownian motion, exploiting heavily the special features of Brownian motion, and standard results on martingales. The strong Markov property holds for a much wider class of processes (we shall see in Chapter III that it holds for all Feller-Dynkin processes, and for all Ray processes), and is proved by approximating a stopping time by a dyadic-rational stopping time. Since this approximation procedure is used in the martingale results to which we appeal, the result we now give contains all the same ingredients.
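Returning for a moment to Lemma 11.1 and Remark (11.2)(ii), a small simulation makes the contrast between quadratic variation and ordinary variation vivid. The sketch below samples the increments of $B$ over the dyadic partition of $[0,1]$ at level $n$ and returns both the sum of squared increments (which should be close to $t = 1$) and the sum of absolute increments (which grows like $\sqrt{2\cdot 2^n/\pi}$). The level, the seed and the function name are assumptions made for the illustration.

```python
import math
import random


def dyadic_variations(n_levels=14, seed=2):
    """Sample B on the grid k 2^-n of [0, 1]; return (sum of squared increments, sum of |increments|)."""
    rng = random.Random(seed)
    n = 2 ** n_levels
    sd = math.sqrt(1.0 / n)  # each increment is N(0, 2^-n)
    quad = 0.0
    total = 0.0
    for _ in range(n):
        inc = rng.gauss(0.0, sd)
        quad += inc * inc
        total += abs(inc)
    return quad, total


quad, total = dyadic_variations()
print("sum of squares:", quad, " sum of absolute values:", total)
```

Increasing `n_levels` should leave the first number near 1 while the second keeps growing, in line with the infinite variation of the path.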
For the definition of Brownian motion relative to a filtered probability space, see Section II.72, or look back to (1.2)(iii)'.

(12.1) THEOREM. Let $(B_t)_{t\ge 0}$ be Brownian motion on some filtered probability space $(\Omega,\mathcal{F},(\mathcal{F}_t),\mathbf{P})$, and let $T$ be a finite-valued $(\mathcal{F}_t)$ stopping time (see
Section II.73.1). Then the process
$B^{(T)}_t := B_{T+t} - B_T, \quad t\ge 0,$
is a Brownian motion independent of $\mathcal{F}_T$.

Proof. Fix $0 = t_0 < t_1 < \cdots < t_n$, and reals $\theta_1,\ldots,\theta_n$, and take some $\xi\in b\mathcal{F}_T$. Let $s_j := t_j - t_{j-1}$ $(j = 1,\ldots,n)$. Let $S$ be a stopping time. By applying the Optional Sampling Theorem to the martingale $M_t := \exp(i\theta B_t + \tfrac12\theta^2t)$ and the stopping time $S\wedge N$, we obtain
$\mathbf{E}[\exp\{i\theta B((S\wedge N) + t) + \tfrac12\theta^2((S\wedge N) + t)\}\mid\mathcal{F}(S\wedge N)] = \exp\{i\theta B(S\wedge N) + \tfrac12\theta^2(S\wedge N)\}.$
Rearranging and letting $N\uparrow\infty$ yields
(12.2) $\mathbf{E}[\exp\{i\theta(B_{S+t} - B_S) + \tfrac12\theta^2t\}\mid\mathcal{F}_S] = 1.$
Thus, if $\xi\in b\mathcal{F}_T$, we have, with $Z_j$ denoting $B^{(T)}(t_j) - B^{(T)}(t_{j-1})$,
$\mathbf{E}\left[\xi\exp\sum_{j=1}^{n}(i\theta_jZ_j + \tfrac12\theta_j^2s_j)\right] = \mathbf{E}\left[\xi\exp\sum_{j=1}^{n-1}(i\theta_jZ_j + \tfrac12\theta_j^2s_j)\right]$
by taking $S = T + t_{n-1}$, $t = s_n$ in (12.2), and evaluating the expectation by conditioning first on $\mathcal{F}_S$. Conditioning successively on $\mathcal{F}(T + t_k)$ $(k = n-2,\ldots,0)$ gives
$\mathbf{E}\left[\xi\exp\sum_{j=1}^{n}(i\theta_jZ_j + \tfrac12\theta_j^2s_j)\right] = \mathbf{E}\xi.$
Hence, conditional on $\mathcal{F}_T$, $Z_j$ $(j = 1,\ldots,n)$ are independent Gaussian variables with mean 0 and variances $s_j$ $(j = 1,\ldots,n)$. □

(12.3) Remarks (i) The restriction to finite-valued stopping times is not really essential, being included only to save worrying over the definition of $B^{(T)}$.
(ii) If one takes a more general continuous Markov process, the strong Markov property is formulated as follows. Let $\Omega = C(\mathbb{R}^+,\mathbb{R})$, with the canonical process $X_t(\omega) = \omega(t)$, filtration $\mathcal{F}^\circ_t = \sigma(\{X_u: u\le t\})$ and measure $\mathbf{P}^x$ for the process started at $x$. The shift maps $\theta_t:\Omega\to\Omega$ are defined by $(\theta_t\omega)(s) := \omega(t + s)$, and the strong Markov property says that, for any $(\mathcal{F}^\circ_{t+})_{t\ge 0}$ stopping time $T$, any $\xi\in b\mathcal{F}^\circ_{T+}$, $\eta\in b\mathcal{F}^\circ$ and $x\in\mathbb{R}$,
(12.4) $\mathbf{E}^x[\xi\,\eta\circ\theta_T: T < \infty] = \mathbf{E}^x[\xi\,\mathbf{E}^{X(T)}(\eta): T < \infty].$
This formulation turns out to be ideal for applications. As an example, consider the celebrated Blumenthal 0-1 Law, which says that $\mathcal{F}^\circ_{0+} := \bigcap_{s>0}\mathcal{F}^\circ_s$ is trivial under $\mathbf{P}^x$. The proof is not hard either; if $\Lambda\in\mathcal{F}^\circ_{0+}$ then, taking $\xi = \eta = I_\Lambda$ in (12.4), we obtain (with $T\equiv 0$)
$\mathbf{E}^x[I_\Lambda] = \mathbf{E}^x[I_\Lambda\mathbf{E}^{X(0)}I_\Lambda] = \mathbf{P}^x(\Lambda)^2,$
so that $\mathbf{P}^x(\Lambda) = \mathbf{P}^x(\Lambda)^2$ and $\mathbf{P}^x(\Lambda) = 0$ or 1! The consequences of this result for Brownian motion are far-reaching. In the general setting, though, the measurability questions implicit in (12.4) need to be carefully considered; we return to this in Chapter III.

As a first application of the strong Markov property, we give Burdzy's [3] beautiful proof of Theorem 10.4, that with probability 1 Brownian motion has no point of increase. The aim is to prove that $\mathbf{P}^0(A_0) = 0$, where
$A_0 = \{\text{for some } 0 < t < u,\ B_s\le B_t\le 1 \text{ for } s\le t,\ B_s\ge B_t \text{ for } t\le s\le u,\ B_u\ge B_t + 2\}.$
This will give the result, since if Brownian motion has a point of increase with positive probability then with positive probability it will have a point of increase before it reaches 1 (scaling), and then with positive probability it will rise at least 2 before returning to the level of the point of increase. For fixed $\varepsilon\in(0,1)$, define the a.s. finite stopping times $T_k$, $U_k$, and non-negative random variables $M_k$ by $M_0 = U_0 = 0$;
$T_k = \inf\{t > U_k: B_t = M_k - \varepsilon \text{ or } B_t = M_k + 2\}, \quad k\ge 0,$
$M_{k+1} = \sup\{B_t: t\le T_k\}, \quad k\ge 0,$
$U_{k+1} = \inf\{t > T_k: B_t = M_{k+1}\}, \quad k\ge 0.$
Figure 1.2 illustrates the situation up to the $k$th stage. Because of the strong Markov property of Brownian motion, the pieces of path $\{B(U_k + s) - B(U_k): 0\le s\le T_k - U_k\}$ are independent and identically distributed. Thus the random variables
$X_k = M_{k+1} - M_k, \quad k\ge 0,$
are also IID, and each is equal in law to the maximum of Brownian motion until it leaves $[-\varepsilon,2]$. Thus (see Proposition 7.3)
$\mathbf{P}(X_k > x) = \begin{cases}\varepsilon/(\varepsilon + x) & (0\le x < 2), \\ 0 & (x\ge 2).\end{cases}$
The idea is that the unlikely event that $X_k = 2$ (which has probability $\varepsilon/(\varepsilon + 2)$) is an approximate point of increase at time $U_k$; convince yourself of the key fact that if $A_0$ happens then
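$A_\varepsilon := \{\text{for some } k,\ X_k = 2,\ M_k\le 1\}$
also happens. [Hint: If $\tau$ is the largest $t$ such that $B_s\le B_t\le 1$ for all $s\le t$, and $B_s\ge B_t$ for $t\le s\le u$, $B_u\ge B_t + 2$, and if $\xi := B_\tau$ then consider $\inf\{M_k: M_k > \xi\} =: m$. Argue that $m\ge 2$.] But now $\mathbf{P}(A_0)\le\mathbf{P}(A_\varepsilon)$ and
$\mathbf{P}(A_\varepsilon)\le\sum_{k\ge 0}\mathbf{P}(M_k\le 1, X_k = 2) = \sum_{k\ge 0}\mathbf{P}(M_k\le 1)\mathbf{P}(X_k = 2)$
(since $X_k$ is independent of $M_k$ by the strong Markov property)
(12.5) $= \frac{\varepsilon}{2 + \varepsilon}\mathbf{E}(N + 1),$
where $N$ is the number of $k\ge 1$ for which $M_k\le 1$. But $M_k = \sum_{j=0}^{k-1}X_j$, and $N + 1$ is a stopping time for the random walk $(M_k)_{k\ge 0}$, so, by Wald's identity (a special case of the Optional Sampling Theorem),
$\mathbf{E}M_{N+1} = \mathbf{E}(N + 1)\mathbf{E}X_0,$
so that, since $M_{N+1}\le 3$ and $\mathbf{E}X_0 = \int_0^2\frac{\varepsilon}{\varepsilon + x}\,dx = \varepsilon\log\frac{2 + \varepsilon}{\varepsilon}\ge\varepsilon|\log\varepsilon|$,
$\mathbf{E}(N + 1) = \frac{\mathbf{E}(M_{N+1})}{\mathbf{E}X_0}\le\frac{3}{\varepsilon|\log\varepsilon|}.$
Feeding this into (12.5) gives
$\mathbf{P}(A_0)\le\mathbf{P}(A_\varepsilon)\le\frac{3}{(2 + \varepsilon)|\log\varepsilon|}\to 0$ as $\varepsilon\downarrow 0$. □

[Fig. 1.2: the construction of $T_k$, $U_k$ and $M_k$ up to the $k$th stage.]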
13. Reflection. The basic idea of the reflection principle for Brownian motion is familiar from simple symmetric random walk, and only the technical machinery is any more difficult. For $a\in\mathbb{R}$, define the hitting time of $a$:
$H_a := \inf\{t > 0: B_t = a\}.$

(13.1) THEOREM. Fix $a\in\mathbb{R}$. The process
(13.2) $\tilde B_t := \begin{cases}B_t & (t\le H_a), \\ 2a - B_t & (t > H_a)\end{cases}$
is a Brownian motion.

Proof. Consider the processes
$Y_t := B_t \ (0\le t\le H_a), \quad Z_t := B(t + H_a) - a.$
By the strong Markov property, Theorem 12.1, $Z$ is a Brownian motion independent of $Y$. By (3.2), $-Z$ is also a Brownian motion, also independent of $Y$. Thus $(Y,Z)$ and $(Y,-Z)$ have the same law. The map
$\varphi: (Y,Z)\mapsto(Y_tI_{\{t\le H_a\}} + (a + Z_{t-H_a})I_{\{t>H_a\}})_{t\ge 0}$
produces a continuous process, which will therefore have the same law as $\varphi(Y,-Z)$. But $\varphi(Y,Z) = B$, and $\varphi(Y,-Z) = \tilde B$. □

(13.3) COROLLARY. Define $S_t := \sup\{B_u: u\le t\}$. Then, for $a, y\ge 0$, $t > 0$,
(13.4) $\mathbf{P}[S_t\ge a, B_t\le a - y] = \mathbf{P}[B_t\ge a + y].$

Proof.
$\mathbf{P}[S_t\ge a, B_t\le a - y] = \mathbf{P}[S_t\ge a, \tilde B_t\ge a + y]$, with $\tilde B$ given by (13.2),
$= \mathbf{P}[B_t\ge a + y],$
by drawing a picture. □

Remarks (i) Immediately from (13.4), we obtain, for $a > 0$,
$\mathbf{P}[H_a\le t] = \mathbf{P}[S_t\ge a] = \mathbf{P}[S_t\ge a, B_t\le a] + \mathbf{P}[S_t\ge a, B_t > a] = 2\mathbf{P}[B_t\ge a] = 2\mathbf{P}[B_1\ge a/\sqrt{t}]$
by Brownian scaling. Differentiating with respect to $t$ gives
(13.5) $\mathbf{P}[H_a\in dt]/dt = \frac{a}{\sqrt{2\pi t^3}}\exp\left(-\frac{a^2}{2t}\right).$
(ii) The joint distribution of $S_t$, $B_t$ follows easily from (13.4):
(13.6) $\mathbf{P}[S_t\in da, B_t\in dx] = \frac{2(2a - x)}{\sqrt{2\pi t^3}}\exp\left[-\frac{(2a - x)^2}{2t}\right]da\,dx,$
for $a\ge 0$, $x\le a$.
(iii) Let $\mathbf{P}^x$ denote the law of Brownian motion started at $x$. Then it follows easily from (13.4) that, for $x, y > 0$,
$\mathbf{P}^x[B_t\ge y, H_0\le t] = \mathbf{P}^x[B_t\le -y],$
from which
(13.7) $\mathbf{P}^x[B_t\in dy, H_0 > t]/dy = p_t(x,y) - p_t(x,-y),$
where $p_t$ is, of course, the Brownian transition density (4.2). The transition density (13.7) is often called the taboo transition density.
(iv) It is possible to obtain the analogue of (13.7) when $H_0$ is replaced by the first exit of Brownian motion from an interval $(a,b)$; the transition density is
(13.8) $\sum_{n\in\mathbb{Z}}\{p_t(x,y + 2n\delta) - p_t(x,2b - y + 2n\delta)\}, \quad a < x, y < b,$
where $\delta := b - a$. We omit the proof of this result, which we shall not be using: see Freedman [1], p. 26 for a proof.
(v) There is a routine way to derive the analogous results for Brownian motion with drift $c$ from the ones we have just calculated. If $\Omega := C(\mathbb{R}^+,\mathbb{R})$, $X_t(\omega) := \omega(t)$, $\mathcal{F}^\circ_t := \sigma(\{X_u: u\le t\})$ is the canonical space, and if $\mathbf{P}^{x,c}$ is the law on $(\Omega,\mathcal{F}^\circ)$ of Brownian motion started at $x$ with drift $c\in\mathbb{R}$, then on $\mathcal{F}^\circ_t$ the laws $(\mathbf{P}^{x,c})_{c\in\mathbb{R}}$ are equivalent with density with respect to Wiener measure given by
(13.9) $\frac{d\mathbf{P}^{x,c}}{d\mathbf{P}^{x,0}}\Big|_{\mathcal{F}^\circ_t} = \exp[c(X_t - x) - \tfrac12c^2t].$
This is a special case of the celebrated Cameron-Martin-Girsanov formula, which we discuss in depth in Chapter IV. The reader may like to try proving (13.9) now. (Hint: consider a cylinder set.) Combining (13.9) and (13.6), we deduce that, for $a\ge 0$, $x\le a$,
(13.10) $\mathbf{P}^{0,c}[S_t\in da, X_t\in dx] = \frac{2(2a - x)}{\sqrt{2\pi t^3}}\exp\left[-\frac{(2a - x)^2}{2t} + cx - \tfrac12c^2t\right]da\,dx.$
Can you see how to deduce (9.2) from (13.5) and (13.9)?
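Remark (i) gives two expressions for the law of $H_a$: the reflection identity $\mathbf{P}[H_a\le t] = 2\mathbf{P}[B_1\ge a/\sqrt{t}]$ and the density (13.5). A numerical comparison is a cheap way to catch slips in the differentiation. In the sketch below the Gaussian tail is written with the complementary error function, $2\mathbf{P}[B_1\ge z] = \operatorname{erfc}(z/\sqrt{2})$; the midpoint rule, the number of subintervals and the function names are assumptions made for the illustration.

```python
import math


def hitting_density(t, a):
    """Density (13.5) of the hitting time H_a, for a > 0."""
    return a / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-a * a / (2.0 * t))


def hitting_cdf_numeric(t, a, n=100_000):
    """Midpoint-rule integral of (13.5) over (0, t)."""
    h = t / n
    return h * sum(hitting_density((i + 0.5) * h, a) for i in range(n))


def hitting_cdf_reflection(t, a):
    """2 P[B_1 >= a / sqrt(t)], expressed through the complementary error function."""
    return math.erfc(a / math.sqrt(2.0 * t))


numeric = hitting_cdf_numeric(2.0, 1.0)
closed = hitting_cdf_reflection(2.0, 1.0)
print("integral of (13.5):", numeric, " reflection formula:", closed)
```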
14. Reflecting Brownian motion and local time. If $(B_t)_{t\ge 0}$ is a Brownian motion then a reflecting Brownian motion is a continuous process identical in law to $(|B_t|)_{t\ge 0}$. It is a continuous non-negative process, and is in fact also a Markov process. The Markov property is not immediately obvious, because we have taken a function of a Markov process ($x\mapsto|x|$ in this case), which will not in general be Markovian. The following simple result covers this situation.

(14.1) LEMMA. Let $(S,\mathcal{S})$ and $(S',\mathcal{S}')$ be measurable spaces, and suppose that the measurable function $\Phi: (S,\mathcal{S})\to(S',\mathcal{S}')$ is onto, $(P_t)_{t\ge 0}$ is a Markov transition semigroup on $(S,\mathcal{S})$, and $(Q_t)_{t\ge 0}$ is a collection of probability kernels on $S'$ such that, for all $f\in b\mathcal{S}'$,
(14.2) $P_t(f\circ\Phi) = (Q_tf)\circ\Phi.$
Then $(Q_t)_{t\ge 0}$ is a Markov transition semigroup, and if $X$ is a Markov process with transition semigroup $(P_t)$ then $\Phi(X)$ is Markov with transition semigroup $(Q_t)$.

(A probability kernel $Q$ on $S'$ is a map $Q: S'\times\mathcal{S}'\to[0,1]$ such that $Q(x,\cdot)$ is a probability measure for each $x\in S'$, and $Q(\cdot,A)$ is measurable for each $A\in\mathcal{S}'$.)

Proof. Using the semigroup property of $(P_t)$ and (14.2),
$(Q_{t+s}f)\circ\Phi = P_{t+s}(f\circ\Phi) = P_tP_s(f\circ\Phi) = P_t((Q_sf)\circ\Phi) = (Q_tQ_sf)\circ\Phi.$
Since $\Phi$ is onto, $Q_{t+s}f = Q_tQ_sf$. To check that $\Phi(X)$ is Markov, just take $0 = t_0\le t_1\le\cdots\le t_n$, $s_k = t_k - t_{k-1}$, $f_k\in b\mathcal{S}'$ and compute
$\mathbf{E}\left[\prod_{k=1}^{n}f_k(\Phi(X_{t_k}))\,\Big|\,X_0 = x\right] = (Q_{s_1}f_1Q_{s_2}f_2\cdots Q_{s_n}f_n)(\Phi(x))$
by repeated application of (14.2). □

In our case, the transition semigroup is the Brownian transition semigroup (4.2), and $\Phi(x) = |x|$. If we define, for $f\in C_b(\mathbb{R}^+)$,
$Q_tf(x) = \int_0^\infty(p_t(x,y) + p_t(x,-y))f(y)\,dy = \int_0^\infty 2(2\pi t)^{-1/2}\exp\left(-\frac{x^2 + y^2}{2t}\right)\cosh\left(\frac{xy}{t}\right)f(y)\,dy$
then it is trivial to check (14.2). The criterion of Lemma 14.1 for $\Phi(X)$ to be Markov is a very obvious one; see Rogers and Pitman [1] for a discussion of more interesting criteria.

The strong Markov property of $|B|$ follows from the strong Markov property of Brownian motion, since any stopping time for the filtration of $|B|$ is a stopping time for the filtration of $B$.
Now let us consider the process $((S_t, S_t - B_t))_{t \ge 0}$, which is a continuous strong Markov process in $(\mathbb{R}^+)^2$. (Here, $S_t := \sup\{B_u : u \le t\}$, as in the previous section.) To see the strong Markov property, observe that, for any finite stopping time $T$,

(14.3) $(S_{T+t},\ S_{T+t} - B_{T+t}) = \left(S_T \vee (B_T + \tilde S_t),\ ((S_T - B_T) \vee \tilde S_t) - \tilde B_t\right),$

where $\tilde B_t := B_{T+t} - B_T$ and $\tilde S_t := \sup\{\tilde B_u : u \le t\}$. Since $\tilde B$ is independent of $\mathcal{F}_T$, the law of the future of $(S, S - B)$ given $\mathcal{F}_T$ depends only on $(S_T, B_T)$.

It is possible to write out explicitly the transition semigroup of $(S, S - B)$, using the results of the last section, and then to confirm, using Lemma 14.1, that $S - B$ is a Markov process, with the same transition semigroup as reflecting Brownian motion $|B|$. We skip the details, because we give a much neater proof in Section V.6. For now, you will not be surprised that $S - B$ is a reflecting Brownian motion even if you do not carry through the calculations we indicated, because $S - B$ is a continuous non-negative process that clearly behaves like Brownian motion in $(0, \infty)$.

What is most important for now is the remarkable fact (discovered by Lévy) that by looking at the path of $S - B$, we can work out what $S$ is! Let us see how this can be done. Fix $\varepsilon > 0$ and define

$T := \inf\{u : S_u - B_u > \varepsilon\}.$

Then it is not hard to see that $S_T$ must have an exponential distribution. Indeed, for any $a > 0$,

$P[S_T - a > x \mid S_T > a] = P[T > H_{a+x} \mid T > H_a]$
$= P[S_u - B_u \le \varepsilon$ for $H_a \le u \le H_{a+x} \mid S_u - B_u \le \varepsilon$ for $u \le H_a]$
$= P[\tilde S_u - \tilde B_u \le \varepsilon$ for $0 \le u \le \tilde H_x \mid S_u - B_u \le \varepsilon$ for $u \le H_a]$ (using (14.3) with $T = H_a$, $\tilde H_x := \inf\{u : \tilde B_u = x\}$)
$= P[\tilde S_u - \tilde B_u \le \varepsilon$ for $0 \le u \le \tilde H_x]$ (since $\tilde B$ is independent of $\mathcal{F}(H_a)$)
$= P[S_T > x].$

Since $S_T$ has an exponential law, we can compute its mean using the Optional Sampling Theorem:

$0 = E B_{T \wedge u} = E[S_{T \wedge u} - (S_{T \wedge u} - B_{T \wedge u})] = E S_{T \wedge u} - E(S_{T \wedge u} - B_{T \wedge u}),$

whence

$E S_{T \wedge u} = E(S_{T \wedge u} - B_{T \wedge u}),$

and, letting $u \uparrow \infty$, using monotone convergence on the left and dominated convergence on the right, we obtain $E S_T = \varepsilon$, so that $S_T$ is an exponential random variable of rate $\varepsilon^{-1}$.
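The conclusion $E S_T = \varepsilon$ has a discrete analogue that is easy to simulate. The sketch below is an assumption-laden stand-in, not the Brownian statement itself: Brownian motion is replaced by a simple symmetric random walk and $\varepsilon$ by an integer $m$, so that $T$ becomes the first time the walk is $m$ below its running maximum. For the walk, an elementary gambler's-ruin calculation gives $P[S_T \ge k] = (m/(m+1))^k$, a geometric (hence memoryless) law with mean $m$. The function name, seed, and sample size are arbitrary choices.

```python
import random

def max_at_first_drawdown(m, rng):
    """Run a simple symmetric random walk until (max - walk) first equals m; return the max."""
    b = s = 0
    while s - b < m:
        b += 1 if rng.random() < 0.5 else -1
        if b > s:
            s = b
    return s

rng = random.Random(12345)      # fixed seed so that the run is reproducible
m, trials = 5, 20000
samples = [max_at_first_drawdown(m, rng) for _ in range(trials)]

mean = sum(samples) / trials                      # should be close to m
n_ge_m = sum(x >= m for x in samples)
p_ge_m = n_ge_m / trials                          # close to (m / (m + 1)) ** m
p_ge_2m_given_ge_m = sum(x >= 2 * m for x in samples) / max(1, n_ge_m)

print(mean, p_ge_m, p_ge_2m_given_ge_m)
```

The last two printed numbers being close to each other is the lack-of-memory property used in the display above; the first being close to $m$ mirrors $E S_T = \varepsilon$.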
Now let us define

$T_1'(\varepsilon) := 0,$
$T_n(\varepsilon) := \inf\{u > T_n'(\varepsilon) : S_u - B_u > \varepsilon\},$
$T_{n+1}'(\varepsilon) := \inf\{u > T_n(\varepsilon) : S_u - B_u = 0\},$
so that $T_1(\varepsilon), T_2(\varepsilon), \dots$ are the successive times at which $S - B$ achieves an upcrossing of $[0, \varepsilon]$ (see Fig. 1.3). Define

$U(t, \varepsilon) := \sup\{k : T_k(\varepsilon) \le t\},$

the number of upcrossings made by $S - B$ before $t$. Note that $U(\cdot, \varepsilon)$ is increasing for each $\varepsilon > 0$. Now it does not take too long to realise (using the strong Markov property) that the random variables

$S(T_1(\varepsilon)),\ S(T_2(\varepsilon)) - S(T_1(\varepsilon)),\ S(T_3(\varepsilon)) - S(T_2(\varepsilon)),\ \dots$

are IID exponentials of rate $\varepsilon^{-1}$. Thus

$U(H_a, \varepsilon) = \sup\{k : S(T_k(\varepsilon)) \le a\}$

will have a Poisson distribution with mean $a/\varepsilon$. Now hold $a$ fixed, and consider

$Z_n = 2^{-n} U(H_a, 2^{-n}),$

which has mean $a$. If $\mathcal{G}_n := \sigma(\{Z_k : k \ge n\})$, we claim that $(Z_n, \mathcal{G}_n)$ is a reversed martingale. Indeed, if we consider an upcrossing of $S - B$ from 0 to $2^{-n}$ then with probability exactly $\tfrac12$ the path of $S - B$ will go on up to $2^{-n+1}$ before it returns to 0; thus, given $\mathcal{G}_n$, the number of upcrossings to $2^{-n+1}$, $U(H_a, 2^{-n+1})$, has a $B(U(H_a, 2^{-n}), \tfrac12)$ distribution. Hence

$E[U(H_a, 2^{-n+1}) \mid \mathcal{G}_n] = \tfrac12 U(H_a, 2^{-n}),$

which implies $E[Z_{n-1} \mid \mathcal{G}_n] = Z_n$. Thus (see Section II.51)

$Z_n \to Z_\infty$, a.s. and in $L^1$.

This ensures that $E Z_\infty = a$; but $\operatorname{var}(Z_n) = a 2^{-n} \to 0$ and $E Z_n = a$, so $Z_n \to a$ in $L^2$, implying $Z_\infty = a$, a.s. To summarise,

(14.4) $\lim_{n \to \infty} 2^{-n} U(H_a, 2^{-n}) = a$, a.s.

[Fig. 1.3: a path of $S - B$.]
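The two distributional facts used above fit together: if $U(H_a, 2^{-n})$ is Poisson with mean $a 2^n$ and each upcrossing independently "survives" to level $2^{-n+1}$ with probability $\tfrac12$, then the surviving count must be Poisson with mean $a 2^{n-1}$. The following sketch checks this thinning identity numerically and tabulates $\operatorname{var}(Z_n) = a 2^{-n}$. The helper names, the truncation point `kmax`, and the values of `a` and `n` are illustrative assumptions.

```python
import math

def poisson_pmf(lam, k):
    """P[N = k] for N ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def binomial_pmf(n, p, j):
    """P[M = j] for M ~ Binomial(n, p)."""
    return math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)

def thinned_pmf(lam, j, kmax=200):
    """P[M = j] when N ~ Poisson(lam) and, given N, M ~ Binomial(N, 1/2) (sum truncated at kmax)."""
    return sum(poisson_pmf(lam, k) * binomial_pmf(k, 0.5, j) for k in range(j, kmax))

a, n = 1.5, 4
lam_fine = a * 2 ** n            # mean of U(H_a, 2^-n)
lam_coarse = a * 2 ** (n - 1)    # mean of U(H_a, 2^-(n-1))

max_gap = max(abs(thinned_pmf(lam_fine, j) - poisson_pmf(lam_coarse, j)) for j in range(40))

# var(Z_k) = 2^(-2k) * var(U(H_a, 2^-k)) = 2^(-2k) * a * 2^k = a * 2^(-k)
var_Z = [a * 2.0 ** (-k) for k in range(1, 11)]

print(max_gap)
print(var_Z)
```

The printed gap should be at the level of floating-point rounding, and the variances halve at each step, which is the quantitative content of $Z_n \to a$.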
Hence we immediately have that

$P\left[\lim_{n \to \infty} 2^{-n} U(H_a, 2^{-n}) = a \text{ for all } a \in \mathbb{Q}^+\right] = 1,$

and by the fact that $U(\cdot, \varepsilon)$ is increasing we conclude that

(14.5) $P\left[\lim_{n \to \infty} 2^{-n} U(t, 2^{-n}) = S_t \text{ for all } t \ge 0\right] = 1.$

(For a different martingale argument, see Exercise E79.71c.)

To summarize, we can by looking at $S - B$ count the number $U(t, \varepsilon)$ of upcrossings of $[0, \varepsilon]$ by time $t$, and then, according to (14.5), we can work out $S_t$ simply by the recipe $S_t = \lim 2^{-n} U(t, 2^{-n})$. But a moment's reflection will show that, since we could reconstruct $S$ just from $S - B$, we could apply the same construction to the process $|B|$ (which has the same law as $S - B$) and thereby obtain some process $l$ with the properties

(14.6)(i) $l$ is continuous increasing,
(14.6)(ii) $l$ grows only when $|B| = 0$,
(14.6)(iii) $|B_t| - l_t$ is a Brownian motion,

since each of these properties is immediate for the pair $(S - B, S)$, which has the same law as $(|B|, l)$. We have therefore proved the following celebrated result of Lévy.

(14.7) THEOREM (Lévy). There exists a (unique) continuous increasing process $l$ such that $|B_t| - l_t$ is a Brownian motion. The process $l$ grows only when $|B| = 0$, and can be recovered from $|B|$ by the recipe

(14.8) $l_t = \lim_{n \to \infty} 2^{-n} U(t, 2^{-n}),$

where $U(t, \varepsilon)$ is the number of upcrossings of $[0, \varepsilon]$ by $|B|$ before time $t$.

(14.9) Remarks (i) The uniqueness assertion remains unproved, but is an immediate consequence of a general result that a finite-variation continuous martingale is constant (IV.30.4).

(ii) The reversed martingale argument given here is due in a more general setting to Greenwood and Pitman [1]. See also Itô and McKean [1, Section 12].

(iii) The process $l$ constructed in Theorem 14.7 is called Brownian local time (at zero). In various places in the literature (and in particular, Itô and McKean [1]), Brownian local time at zero is taken to be $\tfrac12 l$, so be careful when moving from one account to another.
(iv) We could evidently repeat the construction (14.8) at other levels by looking at the upcrossings of $[0, 2^{-n}]$ by $|B_t - x|$; this would give us a process $l(t, x)$, the local time at $x$, defined except on some null set that may depend on $x$. Now look again at Trotter's Theorem 5.9, which says that, almost surely, $l(t, x)$ can be defined simultaneously for all $x$, and in such a way as to be jointly continuous: a far stronger result.

15. Kolmogorov's test. Suppose $h : \mathbb{R}^+ \to \mathbb{R}$ has the property that $t^{-1/2} h(t) \uparrow$ as $t \downarrow 0$, and let

$\Lambda := \{B_t \le h(t) \text{ near } 0\} = \bigcup_{\delta > 0} \{B_s \le h(s) \text{ for all } s \le \delta\}.$

Then Kolmogorov's test says that $P(\Lambda) = 0$ or 1 according to whether the integral

(15.1) $\int_{0+} t^{-3/2} h(t) \exp\left(-\dfrac{h(t)^2}{2t}\right) dt$

diverges or converges. The correct way to prove this result is by excursion theory, but it would be premature to discuss this here. The essence of the excursion theory ideas is contained in the fine proof due to Motoo [2]; see also Itô and McKean [1, p. 33]. We omit the proof.

(15.2) Remarks (i) Since, for each $t > 0$,

$\Lambda = \bigcup_{\delta \in \mathbb{Q} \cap (0,t]}\ \bigcap_{s \in \mathbb{Q} \cap (0,\delta]} \{B_s \le h(s)\},$

we see that $\Lambda \in \mathcal{F}_t^\circ$, and hence $\Lambda \in \mathcal{F}_{0+}^\circ$; by the Blumenthal 0-1 Law, therefore, $P(\Lambda) = 0$ or 1.

(ii) Since

$P[H_a \in dt]/dt = a (2\pi t^3)^{-1/2} \exp(-a^2/2t),$

the integral in (15.1) has a simple interpretation, making Kolmogorov's test easy to remember; the special case $h(t) = a > 0$ for all $t$ will help you to remember that $P(\Lambda) = 1$ if and only if the integral converges.

(iii) It is a simple consequence of Kolmogorov's test that

(15.3) $P\left[\limsup_{t \downarrow 0} \dfrac{B_t}{\sqrt{2t \log\log(1/t)}} = 1\right] = 1,$

the celebrated Law of the Iterated Logarithm (LIL) for Brownian motion. Prove (15.3) now as an exercise using Kolmogorov's test; we shall give a direct proof in the next section.

16. Brownian exponential martingales and the Law of the Iterated Logarithm. The main aim of this section is to give McKean's [1] neat proof of the classical
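Law of the Iterated Logarithm (LIL) for Brownian motion. Recall the statement (15.3):

(16.1) $P\left[\limsup_{t \downarrow 0} \dfrac{B_t}{\sqrt{2t \log\log(1/t)}} = 1\right] = 1.$

Because of the time-inversion property (3.5), this is equivalent to the statement

(16.2) $P\left[\limsup_{t \uparrow \infty} \dfrac{B_t}{\sqrt{2t \log\log t}} = 1\right] = 1.$

Proof of (16.1). Write $h(t) = [2t \log\log(1/t)]^{1/2}$. The first part of the proof is to show that $\limsup_{t \downarrow 0} B_t/h(t)$ cannot be bigger than 1. Using the Doob submartingale maximal inequality (see Section II.70) on the exponential martingale $Z_t := \exp(\alpha B_t - \tfrac12 \alpha^2 t)$, we obtain

(16.3) $P\left[\sup_{s \le t} (B_s - \tfrac12 \alpha s) > \beta\right] = P\left[\sup_{s \le t} Z_s > e^{\alpha\beta}\right] \le e^{-\alpha\beta} E Z_t = e^{-\alpha\beta}.$

Now fix $\theta, \delta \in (0,1)$, and apply (16.3) with $t = \theta^n$, $\alpha = \theta^{-n}(1+\delta) h(\theta^n)$ and $\beta = \tfrac12 h(\theta^n)$. Since $\alpha\beta = \text{constant} + (1+\delta)\log n$,

$P\left[\sup_{s \le \theta^n} (B_s - \tfrac12 \alpha s) > \beta\right] \le \text{constant} \times n^{-(1+\delta)},$

and so, by the Borel-Cantelli Lemma, it is almost surely true that, for all large enough $n$,

$\sup_{s \le \theta^n} \left[B_s - \tfrac12 s (1+\delta) \theta^{-n} h(\theta^n)\right] \le \tfrac12 h(\theta^n).$

Thus, for $\theta^{n+1} < t \le \theta^n$, we have

$B_t \le \sup_{s \le \theta^n} B_s \le \tfrac12 (2+\delta) h(\theta^n) \le \tfrac12 \theta^{-1/2} (2+\delta) h(t),$

so that

$\limsup_{t \downarrow 0} B_t/h(t) \le \tfrac12 \theta^{-1/2} (2+\delta)$, a.s.

Letting $\theta \uparrow 1$, $\delta \downarrow 0$ through countable sequences, we conclude that

(16.4) $\limsup_{t \downarrow 0} B_t/h(t) \le 1$, a.s.

We now turn to the second part of the proof, that $P[\limsup B_t/h(t) \ge 1] = 1$. Once again, choosing $\theta \in (0,1)$, we let $A_n$ be the event

$A_n = \{B(\theta^n) - B(\theta^{n+1}) > (1-\theta)^{1/2} h(\theta^n)\}.$

The events $A_n$ are independent, and, since $B(\theta^n) - B(\theta^{n+1})$ has an $N(0, \theta^n(1-\theta))$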
law, we have, with $a_n := \theta^{-n/2} h(\theta^n)$,

$P(A_n) = P(B_1 > a_n) = (2\pi)^{-1/2} \int_{a_n}^\infty \exp(-\tfrac12 y^2)\,dy$
$> (2\pi)^{-1/2} \int_{a_n}^\infty (1 - 3y^{-4}) \exp(-\tfrac12 y^2)\,dy$
$= (2\pi)^{-1/2} \exp(-\tfrac12 a_n^2)\, a_n^{-1} (1 - a_n^{-2}) := \gamma_n$, say.

Since

$\tfrac12 a_n^2 = \log n + \log\log(\theta^{-1}),$

we conclude that $\sum_n \gamma_n = +\infty$, by comparison with $\sum_n (n \log n)^{-1}$. Thus, almost surely, for infinitely many $n$,

$B(\theta^n) - B(\theta^{n+1}) > (1-\theta)^{1/2} h(\theta^n).$

But on applying (16.4) to $-B$, we conclude that, for all large enough $n$,

$B(\theta^{n+1}) \ge -2h(\theta^{n+1}) > -4\theta^{1/2} h(\theta^n).$

Thus, for infinitely many $n$,

$B(\theta^n) > [(1-\theta)^{1/2} - 4\theta^{1/2}]\, h(\theta^n),$

and now the result follows by letting $\theta \downarrow 0$. □

(16.5) COROLLARY. Let $Y_1, Y_2, \dots$ be IID zero-mean random variables with variance 1, and let $S_n = \sum_{j=1}^n Y_j$. Then

(16.6) $P\left[\limsup_{n \to \infty} \dfrac{S_n}{\sqrt{2n \log\log n}} = 1\right] = 1.$

Proof. Please review the Skorokhod embedding of Section 7. We are going to use the construction and notation of that section, and, in particular, we shall assume that $(S_n)_{n \ge 0} = (B(T_n))_{n \ge 0}$; that is, the random walk is embedded in $B$. By the Strong Law, $n^{-1} T_n \to E T_1 = 1$, and so, with $h(t) := \sqrt{2t \log\log t}$,

$\dfrac{h(T_n)}{h(n)} \to 1$, a.s. $(n \to \infty)$.

Trivially, then,

$\limsup_{n \to \infty} \dfrac{S_n}{h(n)} = \limsup_{n \to \infty} \dfrac{B(T_n)}{h(T_n)} \le \limsup_{t \to \infty} \dfrac{B(t)}{h(t)} = 1.$

The other inequality needs a little more care, but is still not hard. Note that $\rho := \limsup_{n \to \infty} S_n/h(n)$ is a tail-measurable random variable, which is therefore constant a.s. (by the Kolmogorov 0-1 Law). Suppose that $\rho < 1$; then, for all
large enough $n$, $B(T_n)/h(n) < \rho + \varepsilon$, for any prescribed $\varepsilon > 0$. This will contradict (16.2) provided we can prove that in the interval $[T_n, T_{n+1}]$ the Brownian motion cannot rise too far. To estimate this, using the explicit form of the embedding, we have, for $x > 0$,

$\varphi(x) := P\left[\sup_{t \le T_1} B_t > x\right] = \gamma \int_{-\infty}^0 F_-(da) \int_x^\infty F_+(db)\,(b-a)\,\dfrac{-a}{x-a},$

since $(-a)/(x-a)$ is the probability that the Brownian motion reaches $x$ before $a$. We aim to prove that, for any $\varepsilon > 0$,

(16.7) $\sum_n \varphi\left(\varepsilon\sqrt{2n \log\log n}\right) < \infty,$

which, since $\varphi$ is clearly decreasing, will follow from

(16.8) $\sum_n \varphi(\varepsilon\sqrt{n}) < \infty.$

The point of this is that if we let

$A_n := \left\{\sup_{T_n \le u \le T_{n+1}} (B_u - B_{T_n})/h(n) > \varepsilon\right\},$

then $P(A_n) = \varphi(\varepsilon h(n))$, and (16.7) will imply that almost surely only finitely many of the $A_n$ occur. Thus, for all large enough $n$,

$\sup_{T_n \le u \le T_{n+1}} \dfrac{B_u}{h(n)} \le \dfrac{B(T_n)}{h(n)} + \varepsilon < \rho + 2\varepsilon,$

whence (recalling that $h(T_n)/h(n) \to 1$)

$\limsup_{u \to \infty} B_u/h(u) \le \rho + 2\varepsilon.$

If we have chosen $\varepsilon$ so small that $\rho + 2\varepsilon < 1$, this contradicts (16.2). The proof will therefore be complete once we have (16.8). We write

$\varphi(x) = \gamma \int_{-\infty}^0 F_-(da) \int_x^\infty F_+(db)\,(b-x)\,\dfrac{-a}{x-a} + \gamma \int_{-\infty}^0 F_-(da) \int_x^\infty F_+(db)\,(-a)$
$:= \varphi_1(x) + \varphi_2(x)$, say.
Now $\varphi_2(x) = 1 - F(x)$, so

$\sum_n \varphi_2(\varepsilon\sqrt{n}) < \infty \iff \int^\infty \varphi_2(\varepsilon\sqrt{x})\,dx < \infty \iff \int^\infty (1 - F(t))\,t\,dt < \infty;$

and the last statement is true because $F$ has a finite second moment. As for $\varphi_1$, we have easily

$\varphi_1(x) \le x^{-1} \int_x^\infty F_+(db)\,(b - x),$

and

$\sum_n \varphi_1(\varepsilon\sqrt{n}) < \infty \Longleftarrow \int^\infty \dfrac{dx}{\varepsilon\sqrt{x}} \int_{\varepsilon\sqrt{x}}^\infty F_+(db)\,(b - \varepsilon\sqrt{x}) < \infty$
$\iff \int^\infty dt \int_t^\infty F_+(db)\,(b - t) < \infty$
$\iff \int^\infty b^2 F_+(db) < \infty;$

and this last statement is true, again by the finiteness of the second moment of $F$. □

(16.9) Remarks. The Law of the Iterated Logarithm is just one aspect of a much bigger picture, and is a simple consequence of the following much deeper result.

(16.10) THEOREM (Strassen). For $n \ge 3$, define

$X^{(n)}(t) := B(nt)\,(2n \log\log n)^{-1/2}, \quad 0 \le t \le 1,$

a random element of $C([0,1])$. With probability 1, the set of limit points of $(X^{(n)})_{n \ge 3}$ is the set

$K := \left\{f \in C[0,1] : f \text{ is absolutely continuous},\ f(0) = 0,\ \int_0^1 f'(t)^2\,dt \le 1\right\}.$

See Strassen [1] or Freedman [1] for a proof.

You will not be surprised to learn that if $(S_n)$ is a zero-mean unit-variance random walk, and

$S_n(t) := (nt - j) S_{j+1} + (j + 1 - nt) S_j \quad (j/n \le t \le (j+1)/n)$

is the piecewise-linear interpolation of $(S_n)$, then almost surely the set of limit points of $((2n \log\log n)^{-1/2} S_n(\cdot))_{n \ge 3}$ is $K$. This is the Strassen Invariance Principle. You will also not be surprised to learn that one deduces the Strassen Invariance Principle from Theorem 16.10 in the same way that we deduced the classical LIL, Corollary 16.5, from (16.2), namely by Skorokhod embedding of the random walk in Brownian motion.
The proof of Theorem 16.10 that you will find in either of the above references is specific to the Brownian situation, of course. But there is a sense in which this result can be seen as part of the much more general theory of large deviations, pioneered by Cramér, Schilder, Donsker and Varadhan, Ventcel and Freidlin, and developed further by Donsker, Varadhan, Stroock, among others. The excellent account by Deuschel and Stroock [1] contains a proof of Theorem 16.10 from a large-deviations point of view, and is delightfully clear.

3. BROWNIAN MOTION IN HIGHER DIMENSIONS

17. Some martingales for Brownian motion. By Brownian motion in $\mathbb{R}^d$ we mean a process

$B_t := (B_t^1, \dots, B_t^d),$

where each of the $(B_t^j)_{t \ge 0}$ $(j = 1, \dots, d)$ is a Brownian motion, independent of all the others.

To study Brownian motion in $\mathbb{R}^d$, we are going to need martingales, and the purpose of this section is to derive a result that gives us all the martingales we shall need. This result can be seen as a special case of general results in Markov process theory or in stochastic calculus, but we shall prove it here using the special structure of Brownian motion, since we do not yet have the general results.

(17.1) THEOREM. Suppose that $f : \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{R}$ is $C^{1,2}$, and that there exists a constant $K$ such that, for all $t \ge 0$, $x \in \mathbb{R}^d$,

(17.2) $|f(t,x)| + \left|\dfrac{\partial f}{\partial t}(t,x)\right| + \sum_{j=1}^d \left|\dfrac{\partial f}{\partial x_j}(t,x)\right| + \sum_{i=1}^d \sum_{j=1}^d \left|\dfrac{\partial^2 f}{\partial x_i \partial x_j}(t,x)\right| \le K e^{K(t + |x|)}.$

Then the process

(17.3) $C_t^f := f(t, B_t) - f(0, B_0) - \int_0^t \mathcal{G} f(s, B_s)\,ds$ is a martingale,

where

(17.4) $\mathcal{G} := \dfrac{\partial}{\partial s} + \tfrac12 \Delta = \dfrac{\partial}{\partial s} + \tfrac12 \sum_{i=1}^d \dfrac{\partial^2}{\partial x_i^2}.$

Remarks. The class $C^{1,2}$ is, of course, the class of functions $f(t,x)$ with continuous partial derivatives of all orders up to 1 in $t$ and up to 2 in $x$. The exponential growth condition (17.2) will be seen to be unnecessary provided we relax the statement (17.3) to say that $C^f$ is a local martingale. We shall not digress to define this now.
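To get a feel for which $f$ make $f(t, B_t)$ itself a martingale, one can evaluate $\mathcal{G} f = \partial f/\partial t + \tfrac12 \Delta f$ by finite differences. The sketch below does this in $d = 3$ for three illustrative functions, all of which satisfy the growth condition (17.2): the exponential $\exp(\theta \cdot x - \tfrac12 |\theta|^2 t)$ and the compensated square $|x|^2 - dt$, for which $\mathcal{G} f = 0$, and the plain square $|x|^2$, for which $\mathcal{G} f = d$. The operator name `G`, the step size, the vector `theta`, and the evaluation point are assumptions of the example.

```python
import math

def G(f, t, x, h=1e-3):
    """Central-difference approximation to (d/dt + (1/2) Laplacian) f at (t, x)."""
    dt = (f(t + h, x) - f(t - h, x)) / (2.0 * h)
    lap = 0.0
    for i in range(len(x)):
        up = list(x); up[i] += h
        dn = list(x); dn[i] -= h
        lap += (f(t, up) - 2.0 * f(t, x) + f(t, dn)) / (h * h)
    return dt + 0.5 * lap

theta = [0.7, -0.4, 1.1]          # arbitrary vector in R^3

def exp_mart(t, x):
    """exp(theta . x - |theta|^2 t / 2)."""
    dot = sum(a * b for a, b in zip(theta, x))
    return math.exp(dot - 0.5 * sum(a * a for a in theta) * t)

def sq_norm(t, x):
    """|x|^2 (no time dependence)."""
    return sum(c * c for c in x)

def compensated(t, x):
    """|x|^2 - d t."""
    return sum(c * c for c in x) - len(x) * t

t0, x0 = 0.5, [0.3, -1.2, 0.8]
print(G(exp_mart, t0, x0), G(sq_norm, t0, x0), G(compensated, t0, x0))
```

In the notation of (17.3), the first and third functions give $C_t^f = f(t, B_t) - f(0, B_0)$, while for the second the compensator $\int_0^t \mathcal{G} f\,ds = dt$ is exactly what has to be subtracted.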
In dimension $d = 1$, the only functions $f$ of $x$ for which $f(B_t)$ is a martingale are the linear functions, but in dimension $d \ge 2$ we shall see that there is a very rich family of $f$ for which $f(B_t)$ is a martingale.

Proof. We must prove that, for $0 \le s \le t$,

$E[C_t^f - C_s^f \mid \mathcal{F}_s] = 0,$
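for which, by the independent-increments property of $B$, it will suffice to prove that, for any $x \in \mathbb{R}^d$ and $t \ge 0$,

(17.5) $E^x[C_t^f] = 0,$

where $P^x$ is the law of Brownian motion started at $x$. Without loss of generality, we can take $x = 0$ (and write $P$ for $P^0$), and we shall prove that, for $0 < \varepsilon < t$,

(17.6) $E[C_t^f - C_\varepsilon^f] = 0.$

Using the assumption (17.2), the fact that $P[\sup_{u \le t} |B_u| \ge a] \le c\,P[|B_1| \ge a/\sqrt{t}]$ (see (13.4)), and dominated convergence, (17.6) implies (17.5).

Letting $p_t(x) := (2\pi t)^{-d/2} \exp(-|x|^2/2t)$ denote the $d$-dimensional Brownian transition density, we observe that, for $t > 0$, $x \in \mathbb{R}^d$,

(17.7) $\dfrac{\partial p_t}{\partial t}(x) = \tfrac12 \Delta p_t(x).$

Hence

$E[C_t^f - C_\varepsilon^f] = E\left[f(t, B_t) - f(\varepsilon, B_\varepsilon) - \int_\varepsilon^t \mathcal{G} f(s, B_s)\,ds\right]$
$= \int [p_t(x) f(t,x) - p_\varepsilon(x) f(\varepsilon, x)]\,dx - \int_\varepsilon^t ds \int p_s(x) \left[\dfrac{\partial f}{\partial s}(s,x) + \tfrac12 \Delta f(s,x)\right] dx.$

But

$\int p_s(x)\,\tfrac12 \Delta f(s,x)\,dx = \int \tfrac12 \Delta p_s(x)\, f(s,x)\,dx$ (integrating twice by parts and using (17.2))
$= \int \dfrac{\partial p_s}{\partial s}(x)\, f(s,x)\,dx$, using (17.7).

Thus

$E[C_t^f - C_\varepsilon^f] = \int [p_t(x) f(t,x) - p_\varepsilon(x) f(\varepsilon,x)]\,dx - \int_\varepsilon^t ds \int \left\{p_s(x) \dfrac{\partial f}{\partial s}(s,x) + f(s,x) \dfrac{\partial p_s}{\partial s}(x)\right\} dx$
$= \int [p_t(x) f(t,x) - p_\varepsilon(x) f(\varepsilon,x)]\,dx - \int_\varepsilon^t ds \int \dfrac{\partial}{\partial s}\big(p_s(x) f(s,x)\big)\,dx$
$= \int [p_t(x) f(t,x) - p_\varepsilon(x) f(\varepsilon,x)]\,dx - \int dx \int_\varepsilon^t \dfrac{\partial}{\partial s}\big(p_s(x) f(s,x)\big)\,ds$
$= 0.$ □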
Note that the interchange of the $s$-integral and the $x$-integral in the penultimate line is only justified because we have ensured $s \ge \varepsilon > 0$ and so $p_s(x)$ and $\partial p_s(x)/\partial s$ are bounded; this is why we had to proceed to the natural (17.5) via the slightly clumsy (17.6).

18. Recurrence and transience in higher dimensions. As a first application of Theorem 17.1, we shall show in this section that Brownian motion is recurrent in dimension $d = 2$, and transient in dimension $d \ge 3$. If $B$ is Brownian motion in $\mathbb{R}^d$, let

$H_a := \inf\{t > 0 : |B_t| = a\}.$

(18.1) THEOREM. For $0 < a < |x| < b$,

(18.2) $P^x(H_a < H_b) = \dfrac{\log b - \log |x|}{\log b - \log a}$ $(d = 2)$, $\qquad P^x(H_a < H_b) = \dfrac{|x|^{2-d} - b^{2-d}}{a^{2-d} - b^{2-d}}$ $(d \ge 3)$.

Proof. Let $f : \mathbb{R}^d \to \mathbb{R}$ be a $C_K^\infty$ function such that, for $a \le |x| \le b$,

$f(x) = \log |x|$ $(d = 2)$, $\qquad f(x) = |x|^{2-d}$ $(d \ge 3)$.

Then $f$ satisfies the conditions of Theorem 17.1, and moreover, $\Delta f(x) = 0$ for $a \le |x| \le b$, as is readily verified. So, using the Optional Sampling Theorem on the martingale $C^f$ at the stopping time $\tau := H_a \wedge H_b$ yields, for $|x| \in (a, b)$,

$0 = E^x[C_0^f] = E^x[C_\tau^f] = E^x[f(B_{H_a}) : H_a < H_b] + E^x[f(B_{H_b}) : H_b < H_a] - f(x),$

since $\tau \le H_b < \infty$, a.s. (one-dimensional Brownian motion certainly leaves $[-b, b]$ in finite time!). So, in the case $d = 2$,

$f(x) = \log |x| = P^x[H_a < H_b] \log a + P^x[H_b < H_a] \log b,$

and rearrangement gives (18.2), with analogous reasoning for the case $d \ge 3$. □

(18.3) COROLLARY. Brownian motion in dimension $d = 2$ is recurrent, and in dimension $d \ge 3$ it is transient; more precisely, for any $0 < a < |x|$,

(18.4) $P^x[H_a < \infty] = 1$ $(d = 2)$, $\qquad P^x[H_a < \infty] = (a/|x|)^{d-2}$ $(d \ge 3)$.
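It is instructive to put numbers into (18.2) and watch what happens as the outer radius grows. The short sketch below evaluates the formula for $a = 1$, $|x| = 2$ and increasing $b$; the function name and the chosen radii are illustrative assumptions. In dimension 2 the probability creeps up to 1, but only logarithmically slowly, whereas in dimension 3 it settles at $a/|x|$.

```python
import math

def hit_inner_first(d, a, r, b):
    """P^x(H_a < H_b) from (18.2), for |x| = r with 0 < a < r < b and d >= 2."""
    if d == 2:
        return (math.log(b) - math.log(r)) / (math.log(b) - math.log(a))
    return (r ** (2 - d) - b ** (2 - d)) / (a ** (2 - d) - b ** (2 - d))

a, r = 1.0, 2.0
for b in (1e1, 1e3, 1e6, 1e12):
    print(f"b = {b:.0e}   d = 2: {hit_inner_first(2, a, r, b):.4f}"
          f"   d = 3: {hit_inner_first(3, a, r, b):.4f}"
          f"   d = 4: {hit_inner_first(4, a, r, b):.4f}")
```

The slow convergence in the planar case is a first hint of how delicately balanced two-dimensional recurrence is.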
Proof. Since $\{H_a < \infty\} = \bigcup_n \{H_a < H_n\}$, (18.4) follows immediately from (18.2) by letting $b \uparrow \infty$.

(18.5) Remarks. It follows immediately that Brownian motion in the plane visits every non-empty open set $U$ with probability 1, and, in fact, keeps returning to $U$: there is no last visit to $U$. In dimension greater than 2, any bounded set will ultimately be left for ever. Thus, in dimension $d \ge 3$, Brownian motion is transient. In dimension 2, Brownian motion is recurrent (or, more accurately, neighbourhood-recurrent), since Brownian motion in the plane does not hit points, as we now show.

(18.6) COROLLARY. For Brownian motion in $\mathbb{R}^2$, $|x| > 0$,

$P^x[H_0 < \infty] = 0.$

Proof. We have

$\{H_0 < \infty\} = \bigcup_n \{H_0 \le H_n\} \subseteq \bigcup_n \Big(\bigcap_{m > |x|^{-1}} \{H_{1/m} \le H_n\}\Big),$

and

$P^x\Big[\bigcap_{m > |x|^{-1}} \{H_{1/m} \le H_n\}\Big] = \lim_{m \to \infty} P^x[H_{1/m} \le H_n] = 0,$

using (18.2). □

(18.7) Remarks. We are seeing here the first signs of the important result that if $(B_t)_{t \ge 0}$ is Brownian motion in $\mathbb{R}^d$ $(d \ge 2)$ then $(|B_t|)_{t \ge 0}$ is a diffusion process, called a $d$-dimensional Bessel process (denoted BES($d$)). We postpone discussion of this until we have a better idea of what a diffusion process is; see Section V.48 for a detailed analysis and Section VI.52 for the celebrated Ray-Knight Theorem, where Bessel processes enter in a wholly natural way to describe the diffusion property in the spatial parameter of Brownian local time $(l(\tau, x))_{x \ge 0}$ taken at some suitable stopping time $\tau$. The Bessel processes are the most important one-dimensional diffusions apart from Brownian motion; Pitman and Yor [1] provide a detailed study of many of their properties. See also Revuz and Yor [1].

The fact that the Bessel process is a diffusion in its own right is due to the fact that Brownian motion in $\mathbb{R}^d$ is rotation-invariant; if $R \in O(d)$ is fixed then $(RB_t)_{t \ge 0}$ is a Brownian motion too. The proof is trivial. Lemma 14.1 now allows us to conclude that $|B_t|$ is a Markov process.

19. Some applications of Brownian motion to complex analysis.
One of the richest areas of application of Brownian motion is complex analysis. The fundamental observation is that if $f$ is an analytic function in some domain $D$ then each of the functions $\operatorname{Re} f$ and $\operatorname{Im} f$ satisfies Laplace's equation $\Delta u = 0$ in $D$. Thus, in view of Theorem 17.1, $\operatorname{Re} f(Z_t)$ and $\operatorname{Im} f(Z_t)$ are local martingales, where $Z_t := X_t + iY_t$ is complex Brownian motion.

(19.1) Remark. Even if $f$ were defined on the whole of $\mathbb{C}$, there would be no reason for $\operatorname{Re} f$ to satisfy the growth condition (17.2) (as an example, take $f(z) = \exp(z^3)$). This is not really a problem, because if we take an open $D_0 \subset D$, with $\bar D_0$ compact, and take $\tilde f$ to be $C_K^\infty$, equal to $f$ in $D_0$, then certainly $\operatorname{Re} \tilde f$ satisfies (17.2), and so $\operatorname{Re} f(Z_{t \wedge \tau})$ is a martingale, where $\tau := \inf\{t : Z_t \notin D_0\}$. In all the applications we make to complex analysis, this sort of localisation will often be implicit, and we shall not dwell on the details.

(19.2) PROPOSITION (Maximum Modulus Theorem). Suppose that $f : D \to \mathbb{C}$ is analytic, and that $D \supseteq \bar D_r := \{z \in \mathbb{C} : |z| \le r\}$, the closure of the open disc $D_r$. Then

(19.3) $\max\{|f(z)| : z \in \bar D_r\} = \max\{|f(z)| : |z| = r\}.$

Proof. If $f$ were constant, there would be nothing to prove; so suppose $f$ is not constant, and take $z_0 \in \bar D_r$ such that $|f(z_0)| = \max\{|f(z)| : z \in \bar D_r\}$. Since $f$ is bounded on $\bar D_r$, we may add $\lambda f(z_0)$ to $f$ and assume (by taking $\lambda > 0$ large enough) that $f(\bar D_r)$ is contained in a half-space distant at least 1 from 0. Thus $\log f$ is a well-defined analytic function on $D_r$, continuous on $\bar D_r$. Hence $h := \operatorname{Re} \log f = \log |f|$ is harmonic in $D_r$ and continuous in $\bar D_r$. The Maximum Modulus Principle will follow once we show the stronger result:

(19.4) if $U \subseteq \mathbb{C}$ is open, and $h : U \to \mathbb{R}$ is harmonic, then $h$ has no local maximum in $U$: if $z_0 \in D(z_0, \varepsilon) := \{z : |z - z_0| < \varepsilon\} \subset U$ and $h(z_0) \ge h(z)$ for all $z \in D(z_0, \varepsilon)$ then $h(z_0) = h(z)$ for all $z \in D(z_0, \varepsilon)$.

The proof of this is a trivial application of the Optional Sampling Theorem. If $\tau := \inf\{t : |Z_t - z_0| = \delta\}$, where $Z_0 = z_0$, then, since $h(Z_{t \wedge \tau})$ is a martingale (Theorem 17.1), we have

$h(z_0) = E h(Z_\tau) = \int_0^{2\pi} \dfrac{d\theta}{2\pi}\, h(z_0 + \delta e^{i\theta}),$

since, by symmetry, the exit distribution from $D(z_0, \delta)$ is uniform on the boundary.
But, since $h$ is continuous and $h(z_0 + \delta e^{i\theta}) \le h(z_0)$ for $\delta < \varepsilon$, it must be that $h(z_0 + \delta e^{i\theta}) = h(z_0)$ for $\delta < \varepsilon$, establishing (19.4).

Thus if $|f|$ had a local maximum inside $D_r$ there would be a disc where $\log |f|$ was constant, implying that $\log f$ (and therefore $f$) was constant in the disc, and hence throughout $D$. □

(19.5) PROPOSITION (Fundamental Theorem of Algebra). Suppose that $f : \mathbb{C} \to \mathbb{C}$ is a non-constant polynomial. Then there exists $z_0 \in \mathbb{C}$ such that $f(z_0) = 0$.

Proof. Suppose the contrary, that $f$ is non-vanishing in $\mathbb{C}$. Then $g := 1/f$ is a well-defined analytic function, tending to zero at infinity (since $f$ is a polynomial) and therefore bounded. Since $f$ is non-constant, we may find disjoint discs $D_1$ and $D_2$ and $\alpha < \beta$ such that

$\operatorname{Re} g(z_1) \le \alpha < \beta \le \operatorname{Re} g(z_2), \quad z_i \in D_i.$

Let $Z$ be Brownian motion in $\mathbb{C}$, and consider the bounded martingale $\operatorname{Re} g(Z_t)$. The Martingale Convergence Theorem says that this is almost surely convergent, yet, by Corollary 18.3, the process $Z$ keeps visiting $D_1$ and $D_2$; there is no last visit to $D_1$ or $D_2$. Hence

$\liminf \operatorname{Re} g(Z_t) \le \alpha < \beta \le \limsup \operatorname{Re} g(Z_t),$

a contradiction. □

So far, we have not really been using the full strength of the Brownian-motion/complex-analysis combination; we have only applied harmonic functions to Brownian motion, and an analytic function comprises two very intimately related harmonic functions. See the theory of conformal martingales (Getoor and Sharpe [5]) and Chapter 4. The connection with Brownian motion goes much deeper, as the following result shows.

(19.6) THEOREM. Let $f : D \to \tilde D$ be analytic, and let $Z$ be Brownian motion in $D$. Then there is a Brownian motion $\tilde Z$ in $\tilde D$ such that

$f(Z_t) = \tilde Z\left(\int_0^t |f'(Z_u)|^2\,du\right).$

Remarks. A Brownian motion in a domain $D \subseteq \mathbb{C}$ is only defined up until the first exit time from $D$. The importance of Theorem 19.6 is that the image of Brownian motion under an analytic map is another Brownian motion (to within a time change). See Section IV.34 for a proof. A common and powerful use of this is that the exit distribution for one domain gets mapped to the exit distribution of some other domain by an analytic map. Let us see an example of this.

(19.7) PROPOSITION (Poisson integral formula).
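Let $\mathbb{H} = \{z \in \mathbb{C} : \operatorname{Im} z > 0\}$, let $(Z_t)_{t \ge 0}$ be Brownian motion started at $y = a + ib \in \mathbb{H}$, and let $\tau := \inf\{u : Z_u \notin \mathbb{H}\}$. Then

(19.8) $P[\operatorname{Re} Z_\tau \in dx] = \dfrac{b\,dx}{\pi[b^2 + (x-a)^2]} = \operatorname{Im}\left(\dfrac{1}{x - y}\right) \dfrac{dx}{\pi}.$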
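Formula (19.8) says that $\operatorname{Re} Z_\tau$ has a Cauchy law centred at $a$ with scale $b$. One way to sanity-check it numerically is to mix the hitting-time density (13.5) of the imaginary part (started at height $b$) with the Gaussian density of the independent real part, and to integrate over the hitting time. The sketch below does this with a midpoint rule after the substitution $t = u/(1-u)$; the function names, the substitution, and the sample points are choices made for this illustration only.

```python
import math

def hitting_density(b, t):
    """Density (13.5) of the first time a Brownian motion started at b > 0 hits 0."""
    return b / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-b * b / (2.0 * t))

def gauss(t, u):
    """N(0, t) density at u."""
    return math.exp(-u * u / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def exit_density_by_mixing(a, b, x, n=20000):
    """Integrate hitting_density(b, t) * gauss(t, x - a) over t in (0, infinity)."""
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n          # midpoint rule on (0, 1)
        t = u / (1.0 - u)          # maps (0, 1) onto (0, infinity); dt = du / (1 - u)^2
        total += hitting_density(b, t) * gauss(t, x - a) / (1.0 - u) ** 2
    return total / n

def cauchy_density(a, b, x):
    """Right-hand side of (19.8) divided by dx."""
    return b / (math.pi * (b * b + (x - a) ** 2))

for (a, b, x) in [(0.0, 1.0, 0.0), (0.5, 2.0, -1.0), (1.0, 0.3, 1.2)]:
    print(exit_density_by_mixing(a, b, x), cauchy_density(a, b, x))
```

Each printed pair should agree to high accuracy, whatever starting point $a + ib$ and boundary point $x$ are chosen.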
First proof. The stopping time $\tau$ is simply the first time that $Y := \operatorname{Im} Z$ hits zero. But this time has a density that we know (13.5). Meanwhile, $X = \operatorname{Re} Z$ is moving like an independent Brownian motion, so

$P[X_\tau \in dx]/dx = \int_0^\infty \dfrac{b\,e^{-b^2/2t}}{\sqrt{2\pi t^3}} \cdot \dfrac{e^{-(x-a)^2/2t}}{\sqrt{2\pi t}}\,dt = \dfrac{b}{\pi[b^2 + (x-a)^2]}.$

Second proof. The map

$z \mapsto w(z) := \dfrac{z - y}{z - \bar y}$

maps $\mathbb{H}$ one-one onto $\mathbb{D} = \{w : |w| < 1\}$, and takes $y$ to 0. Thus $\{w(Z_t) : t < \tau\}$ is a time transformation of Brownian motion on $\mathbb{C}$ started at 0 and run until it exits the disc $\mathbb{D}$. Thus, for $x \in \mathbb{R}$,

$P(Z_\tau \in dx) = \dfrac{1}{2\pi} |w'(x)|\,dx,$

which agrees with (19.8). □

By using the Riemann Mapping Theorem, we can find the exit distribution from any simply connected domain by transforming the problem to one for Brownian motion in $\mathbb{D}$ started at 0. Suppose, for example, that we want the exit distribution from $\mathbb{D}$ for Brownian motion started at $\zeta$ in $\mathbb{D}$. The map

$z \mapsto g(z) := \dfrac{f(z) - f(\zeta)}{f(z) - \overline{f(\zeta)}}, \quad \text{where } f(z) := i\,\dfrac{1+z}{1-z} \in \mathbb{H},$

maps $\mathbb{D}$ one-one onto $\mathbb{D}$, taking $\zeta$ to 0. Thus, if $z = e^{i\theta}$,

(19.9) $P^\zeta(\text{Brownian motion exits } \mathbb{D} \text{ in } d\theta) = \dfrac{1}{2\pi} |g'(z)|\,d\theta = \dfrac{1 - |\zeta|^2}{|z - \zeta|^n}\,\mu_n(d\theta),$

where $n = 2$ and $\mu_2$ is the normalized Lebesgue measure on $\partial\mathbb{D}$.

Of course, complex analysts are very familiar with the use of the Riemann Mapping Theorem to obtain exit distributions (or Poisson kernels or harmonic measure). Like them, we have to be sensible in a case such as that in which the domain is $\mathbb{D} \setminus (-1, 0]$ and has the wrong topology, the upper and lower parts of the cut needing to be separated.

Martin boundary theory will clarify the matter of correct topologies, show that (19.9) holds in all dimensions (check it when $n = 1$ now!), and prove that every positive harmonic function $h$ on $\mathbb{D}_n := \{x \in \mathbb{R}^n : |x| < 1\}$ with $h(0) = 1$ has
a unique representation

$h(\zeta) = \int_{\partial\mathbb{D}_n} \dfrac{1 - |\zeta|^2}{|z - \zeta|^n}\,\nu(dz),$

where $\nu$ is a probability measure on $\partial\mathbb{D}_n$. When $h \equiv 1$ on $\mathbb{D}_n$, $\nu$ is the normalized Lebesgue measure on $\partial\mathbb{D}_n$.

20. Windings of planar Brownian motion. Let $(Z_t)_{t \ge 0}$ be complex Brownian motion, $Z_0 \ne 0$. Then there is a continuous determination of $\theta_t := \arg(Z_t)$, and a unique one with $\theta_0 \in [0, 2\pi)$. The angle $\theta_t$ keeps track of the winding of Brownian motion about 0 up to time $t$. The earliest result on the windings of Brownian motion is the following remarkable theorem of Spitzer [2].

(20.1) THEOREM (Spitzer). Suppose that $Z_0 = 1$. Then

(20.2) $\dfrac{2\theta_t}{\log t} \Rightarrow C_1 \quad (t \to \infty),$

where $C_1$ is the standard Cauchy law with density $[\pi(1 + x^2)]^{-1}$.

Proof. Let $P^z$ denote the law of complex Brownian motion started at $z \in \mathbb{C}$. For $x > 0$, and Brownian motion $Z$ started at 1, $\tilde Z_t := x Z_{t/x^2}$ is Brownian motion started at $x$, and $\tilde\theta_t = \theta_{t/x^2}$. Thus an equivalent statement to (20.2) is

(20.3) the $P^x$-law of $(\log 1/x)^{-1} \theta_1$ converges to $C_1$ as $x \downarrow 0$,

and it is this that we prove.

The idea of the proof is to fix some very small disc of radius $a$ about the origin, which (with high probability) Brownian motion will leave before time 1. As $x \downarrow 0$, the contribution to $\theta_1$ that comes after hitting the circle of radius $a$ is negligible, so we want the limiting behaviour of $\theta_T$, where $T = \inf\{t : |Z_t| = a\}$. But, by Theorem 19.6, we could realize the Brownian motion $Z$ started at $x > 0$ as a time change of $\exp(\zeta)$, where $\zeta$ is complex Brownian motion started at $\log x$. In particular, the argument of $Z_T$ is equal to the imaginary part of $\zeta$ where $\zeta$ first hits the line $\operatorname{Re} z = \log a$, and the law of this is Cauchy with parameter $\log(a/x)$, from Proposition 19.7; this is where the Cauchy distribution comes from.

Now we implement this sketched proof. Suppose given $\varepsilon > 0$ and some bounded uniformly continuous test function $f : \mathbb{R} \to [0,1]$. Choose $\delta > 0$ so small that $|x - y| \le \delta \Rightarrow |f(x) - f(y)| \le \tfrac18 \varepsilon$. Now fix $a \in (0,1)$ so small that, for all $|z| \le a$,

$P^z[\,|Z_t| \le a \text{ for all } t \le 1\,] \le \tfrac14 \varepsilon.$
As we explained above, for all $x$ in $(0, a]$,

the $P^x$-law of $\dfrac{\theta_T}{\log(a/x)}$ is $C_1$.
Now pick $K$ so large that $P^a[\,|\theta_t| \ge K \text{ for some } t \le 1\,] \le \tfrac14 \varepsilon$, and $x_0 > 0$ so small that $K \le \delta \log(1/x_0)$ and, for $x \le x_0$,

$E^x\left| f\left(\dfrac{\theta_T}{\log(a/x)}\right) - f\left(\dfrac{\theta_T}{\log(1/x)}\right) \right| \le \tfrac14 \varepsilon.$

Then, for $x \le x_0$,

$\left| E^x f\left(\dfrac{\theta_T}{\log(a/x)}\right) - E^x f\left(\dfrac{\theta_1}{\log(1/x)}\right) \right|$
$\le E^x\left| f\left(\dfrac{\theta_T}{\log(a/x)}\right) - f\left(\dfrac{\theta_T}{\log(1/x)}\right) \right| + E^x\left| f\left(\dfrac{\theta_T}{\log(1/x)}\right) - f\left(\dfrac{\theta_1}{\log(1/x)}\right) \right|$
$\le \tfrac14 \varepsilon + P^x[T > 1] + P^x[\,|\theta_t - \theta_T| > K \text{ for some } T \le t \le 1,\ T \le 1\,]$
$\quad + E^x\left[\left| f\left(\dfrac{\theta_T}{\log(1/x)}\right) - f\left(\dfrac{\theta_1}{\log(1/x)}\right) \right| : T \le 1,\ |\theta_t - \theta_T| \le K \text{ for } T \le t \le 1\right]$
$\le \varepsilon.$ □

(20.4) Remarks (i) The key to this simple proof of Spitzer's theorem is the Brownian Mapping Theorem 19.6. Spitzer's original proof was based on complicated calculations (see Itô and McKean [1, pp. 270-271]), but several proofs based on the Brownian Mapping Theorem have since been given; see Durrett [1] and Messulam and Yor [1].

(ii) If one takes Brownian motion in $\{z : |z| \ge 1\}$ reflected in the unit circle then the above argument goes through with the obvious changes to show that if $Z$ starts at 1 then

(20.5) $\dfrac{2\theta_t}{\log t} \Rightarrow S \quad (t \to \infty),$

where $S$ is the distribution with density $(2\cosh \tfrac12 \pi y)^{-1}$ (remarkably, the Fourier transform of this distribution is $\operatorname{sech} \theta$: see Feller [1, Vol. 2, p. 503], who remarks that the distribution is 'of no importance'!). The result (20.5) was pointed out to us by Kalvis Jansons. The limit law $S$ has all moments, in contrast to $C_1$, which is evidence that windings that happen near zero make a large contribution.

(iii) The study of Brownian windings has been carried a very long way in recent years by Pitman and Yor [3,4] and Le Gall and Yor [1,2]. As a sample of the kinds of results achieved, we give the asymptotic joint distribution of the windings about $n$ points.

(20.6) THEOREM (Pitman and Yor [3]). Let $z_1, \dots, z_n$ be distinct points of $\mathbb{C}$,
$\theta_t^j$ the winding of $Z$ about $z_j$ by time $t$. Then

(20.7) $\dfrac{2}{\log t}\,(\theta_t^1, \dots, \theta_t^n) \Rightarrow (V_1, \dots, V_n),$

where $V_j = U + H Y_j$, the variables $Y_j$ are independent standard Cauchy, independent of the pair $(U, H)$, which have joint distribution characterised by

(20.8) $E \exp(-\alpha H + i v U) = \left(\cosh v + \dfrac{\alpha \sinh v}{v}\right)^{-1}$

for $\alpha \ge 0$, $v \in \mathbb{R}$.

(iv) For a proof using Brownian windings of Picard's Little Theorem (if $f : \mathbb{C} \to \mathbb{C}$ is analytic and non-constant then the range of $f$ can omit at most one value), see Davis [1] and also Durrett [1]. For a proof of Picard's Great Theorem, see Davis [2].

21. Multiple points, cone points, cut points. In this section we give without proof a number of fascinating and beautiful results that illuminate the behaviour of the Brownian path, and provide a few references to an area in a bewildering state of development.

(21.1) THEOREM (Dvoretsky-Erdős-Kakutani [1-3]; Dvoretsky-Erdős-Kakutani-Taylor [1])
(a) Brownian motion in two dimensions has points of all multiplicities $2, 3, \dots, \mathfrak{c}$, where $\mathfrak{c}$ denotes the multiplicity of the continuum.
(b) Brownian motion in three dimensions has double points but no triple points.
(c) Brownian motion in dimension greater than three has no double points.

(21.2) Remarks. Numerous proofs of all or part of Theorem 21.1 have appeared since the first ones, many of them valid for more general Lévy processes; see Hawkes [1], Evans [1], Le Gall, Rosen and Shieh [1] and Rogers [5] for a sample. The existence of multiple points of Brownian motion has been but a small part of a much more profound study of the existence and properties of intersection local time, carried out by Dynkin, Le Gall, Rosen, Wolpert, Yor and others; we refer the interested reader to some of the papers cited in the bibliography for more information on these topics.
Applications include questions related to quantum field theory (Dynkin [5,7,8,9]), the asymptotics of the 'Wiener sausage' (Le Gall [4]) and the exact Hausdorff measure of the set of Brownian multiple points (Le Gall [5]).

For $\alpha \in (0, \pi)$ let $C_\alpha$ denote the wedge $\{re^{i\theta} : r \ge 0, |\theta| \le \alpha\}$ of angle $2\alpha$. If $Z$ is Brownian motion in $\mathbb{C}$ then a time $t$ such that

$Z_u \in Z_t - C_\alpha$ for all $u \le t$
is called a cone time, the set of all such being denoted $H_\alpha$. A cone time $t$ is a time at which the path-so-far lies in the shifted cone $-C_\alpha$ with vertex at the current position, $Z_t$. The position $Z_t$ is then called a cone point. When do cone points exist? The following result of Burdzy [1] and Shimura [1] answers this.

(21.3) THEOREM (Burdzy [1]; Shimura [1]). Cone points exist if and only if $\alpha > \tfrac14 \pi$.

(21.4) Remarks. Le Gall [6] shows how this result follows easily from the criterion of Varadhan and Williams [1] for a reflecting Brownian motion in a wedge to hit the corner. Evans [2] and Le Gall [6] study the Hausdorff dimension of the set of Brownian cone points and the construction and properties of a local time on the set $H_\alpha$ of cone times.

And lastly in this rushed survey of interesting Brownian motion properties, we give the fine result of Burdzy [2] on cut points of the two-dimensional Brownian path.

(21.5) THEOREM (Burdzy [2]). Let $Z$ be Brownian motion in $\mathbb{C}$. Then almost surely there exists $t \in (0,1)$ such that

$\{Z_s : 0 \le s < t\} \cap \{Z_s : t < s \le 1\} = \emptyset.$

22. Potential theory of Brownian motion in $\mathbb{R}^d$ $(d \ge 3)$. This brief section can do no more than provide some heuristic sketches of an immense topic; the books of Blumenthal and Getoor [1], Dellacherie and Meyer [1], Helms [1], Meyer [2] and Port and Stone [3] are just a few of the many written on this subject. We shall later provide proofs of some of the results listed here.

Basically, any transient Markov process has a potential theory, but we shall discuss here only Brownian motion in at least three dimensions. The key concept of potential theory is the Green kernel $G$ defined by

(22.1) $Gf(x) = \int_0^\infty P_t f(x)\,dt, \quad f \in C_b(\mathbb{R}^d),$

where $(P_t)_{t \ge 0}$ is the transition semigroup of our Markov process. For BM($\mathbb{R}^d$), we have

(22.2) $Gf(x) = \int g(x,y) f(y)\,dy,$

where

(22.3) $g(x,y) = \int_0^\infty p_t(x,y)\,dt = \int_0^\infty (2\pi t)^{-d/2} \exp\left(-\dfrac{|x-y|^2}{2t}\right) dt$
1.22 BROWNIAN MOTION IN HIGHER DIMENSIONS 47 = ^~d/2\y-x\2-d\ e'uu'2+df2du Ге-«и-2+а Jo = ^n-dl2\y-x\2-dT{\d-\\ (Note that our Green function is based on the probabilists' normalisation ^Δ, rather than the physicists' Δ.) The same calculation in dimension 1 or 2 gives an infinite answer, because Brownian motion in those dimensions is recurrent. The probabilistic interpretation (22.4) С^(х) = Е^ГГ^(В,)л1 = Ε* [time spent in A by B.~\ is a useful one to bear in mind. A heuristic but suggestive calculation gives from the notion (4.4) of the generator У = ±A as the derivative of the semigroup that (22.5) Pt = ±Δ Pt=>P, = exp φΔ) Ptdt:=G = (-±A)~\ Jo and thinking of the Green kernel as the inverse of — ^ will never lead you astray, though usually a proof must be sought elsewhere. (22.6) Exercise. For /eC£(Rd), prove that ±AGf = G(±Af)=-f. Hint. Use analysis to prove the first equality, then (22.4) and Theorem 17.1) for the second. Fix now some path wise-connected compact subset К ^ Rd, which we think of in physical terms as a conducting body. A classical problem of electrostatics is to determine the equilibrium charge distribution for K, and the equilibrium potential; if a charge is placed on K, then the charge will flow in К very rapidly so as to equate the electrostatic potential everywhere within K. (If the potential were not constant, charge would flow between regions where it differed until everything was evened out.) This equilibrium charge distribution will minimise the energy of the charge, and the potential associated with it is called the equilibrium potential. How are these physical concepts related to the probabilistic ones? The following theorems do not require connectedness of K. (22.7) THEOREM (Hunt). The function PK\ on Rd defined by (22.8) PK\(x):= Px(BteK for some t > 0)
is expressible as the potential

(22.9) $P_K1=G\mu_K=\int g(\cdot,y)\,\mu_K(dy)$

of a unique measure $\mu_K$ on $K$. The measure $\mu_K$ is concentrated on $\partial K$. If $\mu$ is any other measure concentrated on $K$, and satisfying $G\mu\le 1$ on $K$, then $G\mu\le G\mu_K$ on $\mathbb{R}^d$.

(The restriction in (22.8) to $t>0$ is essential: consider the case $K=\{0\}$!)

The measure $\mu_K$ appearing in Theorem 22.7 is, of course, what we shall call the equilibrium charge distribution. The capacity $C(K)$ of $K$ is defined to be $\mu_K(K)$, and is characterised by the extremal property

(22.10) $C(K):=\max\{\mu(K): \mu \text{ is concentrated on } K,\ G\mu\le 1 \text{ on } K\}$
$\qquad\quad\ =\max\{\mu(K): \mu \text{ is concentrated on } K,\ G\mu\le 1 \text{ on } \mathbb{R}^d\}.$

These concepts are neatly related to the concept of energy: the energy $\mathcal{E}(\mu)$ of a measure $\mu$ on $\mathbb{R}^d$ is defined by

(22.11) $\mathcal{E}(\mu):=\iint \mu(dx)\,g(x,y)\,\mu(dy).$

Then

(22.12) $C(K)^{-1}=\min\{\mathcal{E}(\mu): \mu \text{ concentrated on } K,\ \mu(K)=1\},$

and the minimum is uniquely attained at $\mu=\mu_K/C(K)$.

There is a beautiful probabilistic interpretation of the equilibrium charge, due to Chung [2] and Getoor and Sharpe [1], which improves on Hunt's Theorem. Define
$\sigma:=\sup\{t>0: B_t\in K\},$
with the convention that $\sup\emptyset=0$.

(22.13) THEOREM. For $x\in\mathbb{R}^d$, $y\in K$, $t>0$,
$P^x(B_\sigma\in dy,\ \sigma\in dt)=p_t(x,y)\,\mu_K(dy)\,dt.$

Since $\{\sigma>0\}=\{B_t\in K \text{ for some } t>0\}$, (22.9) follows immediately by integrating with respect to $t$. We shall give a proof of Theorem 22.13, and most of Theorem 22.7, in Section VI.35, but before that in Section III.46 we give an argument based on time reversal that gives a clear intuitive picture of these results.

(22.14) Remark. It is true that
$C(K)=\max\{\nu(K): \nu \text{ concentrated on } K,\ G\nu\le 1\},$
and that
$C(K)=\inf\{\mathcal{E}(\mu): G\mu\ge 1 \text{ on } K\}.$
Why is the second one inf, rather than min? Why not just take $\mu_K$, whose energy we know is $C(K)$? Unfortunately, it is not in general true that $G\mu_K\ge 1$ on $K$, as the example where $K$ is a ball together with a distant point illustrates. This is a typical feature of potential theory: it is essential to be very careful in order to make a statement that is totally correct, as you will see if you consult any of the references cited above.

We have seen transparent probabilistic interpretations of the equilibrium charge and equilibrium potential; now we provide an equally transparent interpretation of capacity.

(22.15) THEOREM (Spitzer-Kesten-Whitman). Consider the 'Wiener sausage' $(S_t)_{t\ge0}$ of $(B_t)_{t\ge0}$ defined by
$S_t:=\bigcup_{u\le t}(K+B_u),$
where $K$ is some fixed compact set. Then

(22.16) $t^{-1}|S_t|\to C(K)$, a.s.,

where $|A|$ denotes the Lebesgue measure of $A$. (Recall that the physicists' capacity of $K$ will be twice ours.)

The proof of this is based on the Subadditive Ergodic Theorem of Hammersley and Kingman; see Durrett [3] for a well-motivated account and fascinating examples of this, as of many other topics. The verification of the conditions of the Subadditive Ergodic Theorem is a triviality, and the interest is in identifying the almost-sure limit, which is simply

(22.17) $\gamma=\lim_{t\to\infty}t^{-1}E|S_t|.$

Now

(22.18) $E|S_t| = E\int dy\, I_{\{y\in S_t\}}$
$\qquad = \int dy\, P(y\in B_u+K \text{ for some } 0<u\le t)$
$\qquad = \int dy\, P(B_u\in y-K \text{ for some } 0<u\le t)$
$\qquad = \int dy\, P^y(B_u\in K \text{ for some } 0<u\le t)$
$\qquad = \int dy\, P^y(\tau\le t)$ (where $\tau:=\inf\{t>0: B_t\in K\}$)
$\qquad \ge \int dy\, P^y(0<\sigma\le t) = tC(K),$
from (22.13). Thus immediately $\gamma\ge C(K)$, and the proof will be complete if we can prove that
$\int dy\,P^y(\tau\le t<\sigma)=o(t).$
The key to proving this is to show by splitting the path at time $t$ and using the reversibility of Brownian motion with respect to Lebesgue measure (compare Exercise II.E39.29) that

(22.19) $\int dy\,P^y(\tau\le t<\sigma)=\int dz\,P^z(\tau\le t)\,P^z(\sigma>0).$

From (22.3) and (22.9), we obtain that, as $|z|\to\infty$,
$P_K1(z):=P^z(\sigma>0)\sim\tfrac12\pi^{-d/2}\,\Gamma(\tfrac12 d-1)\,C(K)\,|z|^{2-d},$
so it is not very surprising that (22.19) is of smaller order than (22.18), and this is indeed the case, though the verification involves us in techniques that lie ahead, so we shall leave these last steps.

Finally, no introduction to potential theory would be complete without a few words about the Dirichlet problem. The problem is this. Suppose given a bounded open connected set $D\subseteq\mathbb{R}^d$, and a bounded measurable $\varphi:\partial D\to\mathbb{R}$; can one find a harmonic function $h$ on $D$ such that $\lim_{x\to b}h(x)=\varphi(b)$ where $b\in\partial D$ and $x$ converges in $D$ to $b$?

There is a simple solution to this problem using Brownian motion. For any Borel set $A$, and bounded measurable $g$, define (noting once again the '$t>0$' condition)

(22.20) $H_A:=\inf\{t>0: B_t\in A\},\qquad P_Ag(x):=E^x[g(B(H_A)): H_A<\infty].$

Let $V=\mathbb{R}^d\setminus D$. Call a point $b$ of $\partial D$ a regular boundary point of $D$ if $b$ is regular for $V$ for Brownian motion, that is, if $P^b(H_V=0)=1$. It is not hard to show that if $b$ is the tip of a cone lying entirely within $V$ then $b$ is regular. That the probabilistic definition of regular boundary point agrees with the classical definition in terms of Wiener's test is proved in Section 7.10 of Itô and McKean [1].

The optimal solution of the Dirichlet problem is the following.

(22.21) THEOREM (Wiener). Let $D$ be a bounded domain in $\mathbb{R}^n$. Let $\varphi$ be a bounded measurable function on $\partial D$ that is continuous at each regular boundary point. Then there exists a unique harmonic function $h$ on $D$ such that

(22.22) $\lim_{D\ni x\to b}h(x)=\varphi(b)$

for every regular boundary point $b$.
Doob's idea is to prove Theorem 22.21 by establishing the explicit formula

(22.23) $h=P_V\varphi=P_{\partial D}\varphi$ in $D$.

You will find a careful proof of all steps of this Theorem in Section 13.6 of Dynkin [2], among other places. The main points of the argument are first to prove (using the strong Markov property and rotational invariance of Brownian motion) that $h(x)$ is the average of $h$ over any ball $B(x,r)$, centred at $x$ with radius $r$, contained entirely in $D$; secondly to use this together with convolution with a smooth function to show that $h$ is $C^\infty$ inside $D$, and hence satisfies Laplace's equation; and thirdly to prove (22.22) by an $\varepsilon$-$\delta$ argument.

As for the uniqueness assertion, it is clear that if $h$ is any solution, and $G\subset D$ is a subdomain, $\bar G\subset D$, then $P_{\partial G}h=h$ on $G$, since $h$ is continuous on $\bar G$, and, by the Optional-Stopping Theorem,
$h(x)=E^xh(B(H_{\partial G}))\quad(x\in G).$
Now we let $G\uparrow D$ and use the fact that

(22.24) $B(H_V)$ is regular for $V$, $P^x$ a.s.

The proof of (22.24) needs Hunt's Theorem 22.7, though if every point of $\partial V$ is regular, no proof is needed, of course. Itô and McKean [1] and Port and Stone [3] give several complements to, and extensions of, Theorem 22.21.

Probabilistic potential theory is a big and important subject, essentially originating in Hunt's profound papers [1], which explain the basic principles. Blumenthal and Getoor [1] provide a complete account of Hunt's theory; other standard references include Dellacherie and Meyer [1], Fukushima [1], Kellogg [1], Port and Stone [3], Silverstein [1] and Helms [1]. Hunt emphasised the role of a dual process, which is a kind of time-reversal of the basic process being studied, though the self-duality of Brownian motion tends to obscure things here. See the papers of Mitro [1,2] for a lucid account.
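The representation $h=P_{\partial D}\varphi$ of (22.23) can be explored numerically. The sketch below is only an illustration under stated assumptions, not a method taken from the text: it takes $D$ to be the open unit ball in $\mathbb{R}^3$ (every boundary point of which is regular) and uses the 'walk on spheres' device, which replaces the Brownian path by successive uniform exit points from the largest ball centred at the current position and contained in $D$; this is justified by the rotational invariance and strong Markov property mentioned above. The function names, the tolerance `eps` and the test boundary function are all hypothetical choices.

```python
import math
import random


def uniform_on_sphere(rng, d=3):
    """Uniform point on the unit sphere in R^d (a normalised Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]


def walk_on_spheres(x, phi, rng, eps=1e-4):
    """One approximate sample of phi(B(H_dD)) for D the open unit ball, B_0 = x.

    The walk stops once it is within eps of the boundary (an assumed
    tolerance), and the final position is projected onto the unit sphere.
    """
    pos = list(x)
    while True:
        r = 1.0 - math.sqrt(sum(c * c for c in pos))  # distance to boundary
        if r < eps:
            break
        u = uniform_on_sphere(rng, len(pos))
        # Exit point of Brownian motion from the ball of radius r about pos.
        pos = [p + r * c for p, c in zip(pos, u)]
    norm = math.sqrt(sum(c * c for c in pos))
    return phi([c / norm for c in pos])


def dirichlet_estimate(x, phi, n_paths=8000, seed=0):
    """Monte Carlo estimate of h(x) = E^x[phi(B(H_dD))]."""
    rng = random.Random(seed)
    total = sum(walk_on_spheres(x, phi, rng) for _ in range(n_paths))
    return total / n_paths
```

For the boundary data $\varphi(y)=y_1^2-y_2^2$, the harmonic extension is $h(x)=x_1^2-x_2^2$, so `dirichlet_estimate((0.3, 0.2, 0.1), lambda y: y[0]**2 - y[1]**2)` should be close to $0.05$, up to Monte Carlo error.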
In Chapter III, we shall use time reversal to present the Martin boundary for continuous-time chains; Martin boundary theory describes all possible positive harmonic functions on $D$, and is thus deeper than Theorem 22.21, which only characterises bounded harmonic functions with boundary regularity.

23. Brownian motion and physical diffusion. Brownian motion as we have been studying it is very closely related to what a physicist would understand by the term 'diffusion'; the connection is the celebrated diffusion equation of mathematical physics, which we shall now derive.

Consider the diffusion of some substance (for example, a dye) through a medium (which could be water, or a crystal). Let $p(t,x)$ be the concentration of dye at position $x$ at time $t$, and let us suppose initially that the medium is isotropic (no preferred directions, uniform throughout space). Consider some plane in the medium, perpendicular to the $x_1$-direction, say; this plane is constantly traversed
by molecules of dye, which pass from one side of the plane to the other. If the concentration to the left of the plane is higher than that to the right, there will be a net flux of particles of dye from left to right; and the greater the difference in concentration, the greater this flux from left to right will be. Fick's Law of diffusion says that the flux is equal to $-\tfrac12 a\,\partial p/\partial x_1$. More generally, the flux $F(t,x)$ is a vector quantity and obeys

(23.1) $F(t,x)=-\tfrac12 a\nabla p(t,x).$

The vector field $F$ specifies the direction and strength of the net flux of dye, and to find the flow of particles across a plane perpendicular to the unit vector $u$, we simply form the scalar product $u\cdot F$. Now if we consider a small volume $V$ around the point $x$, the total amount of dye in $V$ is $\int_V p(t,x)\,dx$, so

rate of change of amount of dye in $V$ $=\frac{\partial}{\partial t}\int_V p(t,x)\,dx$
$\qquad = -(\text{integral of flux around } \partial V) = -\int_{\partial V}F(t,x)\cdot dn$
$\qquad = -\int_V \nabla\cdot F(t,x)\,dx,$

by the Divergence Theorem. Since $V$ is arbitrary, we deduce the diffusion equation

(23.2) $\frac{\partial p}{\partial t}(t,x)=\tfrac12\nabla\cdot(a\nabla p)(t,x).$

In the case $a\equiv 1$, we have the Kolmogorov (forward) equation for the evolution of the Brownian transition density:

(23.3) $\frac{\partial}{\partial t}p_t(x,y)=\tfrac12\Delta_y p_t(x,y),$

as we argued at (4.7).

We have derived the diffusion equation (23.2) under the assumption that $a$ is a constant, but it may equally well depend on position (if the diffusivity of the medium varies), and may even be a matrix-valued function of position. The latter could arise in a crystal, where the preferred directions of the crystal will tend to distort the concentration gradient and produce a flux that is not aligned exactly with the concentration gradient. The derivation of (23.2) remains unchanged.

(23.4) Remarks. The special case $a=1$ of the diffusion equation gives us the Kolmogorov (forward) equation for the Brownian transition density, so you
will be wondering whether the general statement of the diffusion equation (23.2) has a similar probabilistic interpretation. It does indeed, and the interpretation is the analogue of the interpretation given in Section 4 for Brownian motion. Without going into too much detail here, we are concerned with a diffusion whose infinitesimal generator $\mathcal{G}$ has adjoint

(23.5) $\mathcal{G}^*:=\tfrac12\nabla\cdot(a\nabla).$

What this means is as explained in Section 4: there is a transition semigroup $(P_t)_{t\ge0}$ such that $\mathcal{G}f=\lim_{t\downarrow0}t^{-1}(P_tf-f)$, at least for some class of $f$. As a process, $(X_t)_{t\ge0}$ satisfies

$f(X_t)-f(X_0)-\int_0^t\mathcal{G}f(X_s)\,ds$ is a martingale

for all $f$ in some class. This is a sample-path formulation of the statement (4.4):
$\frac{d}{dt}P_t=\mathcal{G}P_t.$
The Markov process $X$ may be considered as describing the motion of a single particle of dye. (Note in passing that in general there is no closed-form expression for $P_t$, unlike the Brownian case, so an alternative prescription such as the generator $\mathcal{G}$ is very necessary!) Formally, the arguments establishing Kolmogorov's backward and forward equations (4.6) and (4.7) now run as before, but do note that the analogue of the forward equation (4.7) should read
$\frac{\partial}{\partial t}p_t(x,y)=\mathcal{G}^*_y\,p_t(x,y),$
where $\mathcal{G}^*$ is the formal adjoint of $\mathcal{G}$:
$\int f\,\mathcal{G}g=\int g\,\mathcal{G}^*f\quad\text{for } f,g\in C_K^\infty.$
This distinction did not arise for Brownian motion, whose generator $\tfrac12\Delta$ is self-adjoint, nor will it arise here if the matrix $a$ is symmetric (since then $\mathcal{G}$ is again self-adjoint). It is the received wisdom of physics that $a$ is symmetric and non-negative definite. Diffusions with 'divergence-form' generators (that is, of the form (23.5) with $a$ symmetric non-negative definite) are particularly tractable because they are amenable to the theory of Dirichlet forms. The ideas of Dirichlet-form theory are drawn from physical notions about energy.
While we are discussing the physical aspects of diffusion, it is worth pointing out that no physicist would accept Brownian motion as a literal model for the movement of a particle, since the path has infinite variation. However, a more satisfactory model can be built by making the velocity $v_t$ of the particle into Brownian motion; but an even more satisfactory model can be made by making
$v$ solve

(23.6) $dv_t=dB_t-\lambda v_t\,dt,$

where $\lambda>0$ is the viscous drag coefficient, and $B$ is Brownian motion on $\mathbb{R}$, $B_0=0$. This permits the velocity to wriggle around, but makes it unlikely to get too big. The correct interpretation of (23.6) is of course

(23.7) $v_t=v_0+B_t-\lambda\int_0^t v_u\,du,$

which is solved explicitly by

(23.8) $v_t=v_0e^{-\lambda t}+e^{-\lambda t}\int_0^t e^{\lambda s}\,dB_s,$

where the stochastic integral appearing on the right is here properly interpreted by integrating by parts:

(23.9) $\int_0^t e^{\lambda s}\,dB_s:=e^{\lambda t}B_t-\lambda\int_0^t e^{\lambda s}B_s\,ds.$

The stochastic differential equation (23.6) is called the Ornstein-Uhlenbeck stochastic differential equation, and its solution is called the Ornstein-Uhlenbeck (OU) process. Assume for simplicity that we work in one dimension, and that $v_0$ is zero-mean Gaussian with variance $(2\lambda)^{-1}$, independent of $B$. Then $v$ is a stationary zero-mean Gaussian process with covariance

(23.10) $\operatorname{cov}(v_s,v_t)=(2\lambda)^{-1}\exp(-\lambda|t-s|).$

The physicist would then take the process
$X_t:=\int_0^t v_s\,ds$
as a model for the diffusion of a particle. This integrated OU process, being non-Markovian, is an altogether less obliging process than Brownian motion, and few of the functionals of $X$ have closed-form distributions. However, it is not too hard to prove that
$\left(\lambda n^{-1/2}X(nt)\right)_{t\ge0}\Rightarrow(B_t)_{t\ge0}\quad(n\to\infty)$
(where the sense of convergence in distribution is fully explained in Part 6 of Chapter II), so for large time scales, Brownian motion may be accepted as a model of physical diffusion.

(23.11) Exercise. Confirm that (23.8) solves (23.6), and that the covariance is (23.10).
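A numerical sanity check can accompany Exercise 23.11 without replacing the proof. The sketch below is an illustration only; the function names, step counts and seed are assumptions. It uses the fact, read off from (23.8), that over a time step $h$ the velocity satisfies $v_{t+h}=e^{-\lambda h}v_t+\xi$, where $\xi$ is an independent zero-mean Gaussian with variance $(1-e^{-2\lambda h})/(2\lambda)$, and it compares a Monte Carlo estimate of $\operatorname{cov}(v_0,v_t)$ with (23.10).

```python
import math
import random


def ou_covariance(s, t, lam):
    """Stationary covariance (23.10): (2*lam)^-1 * exp(-lam*|t - s|)."""
    return math.exp(-lam * abs(t - s)) / (2.0 * lam)


def ou_step(v, h, lam, rng):
    """Exact transition of dv = dB - lam*v dt over a step of length h.

    From (23.8): the new value is the old one damped by exp(-lam*h), plus
    an independent Gaussian with variance (1 - exp(-2*lam*h)) / (2*lam).
    """
    decay = math.exp(-lam * h)
    noise_sd = math.sqrt((1.0 - decay * decay) / (2.0 * lam))
    return decay * v + noise_sd * rng.gauss(0.0, 1.0)


def estimate_covariance(t, lam, n_paths=20000, n_steps=10, seed=0):
    """Monte Carlo estimate of cov(v_0, v_t) for the stationary OU process."""
    rng = random.Random(seed)
    h = t / n_steps
    total = 0.0
    for _ in range(n_paths):
        # Stationary start: v_0 ~ N(0, 1/(2*lam)), independent of the noise.
        v0 = rng.gauss(0.0, math.sqrt(1.0 / (2.0 * lam)))
        v = v0
        for _ in range(n_steps):
            v = ou_step(v, h, lam, rng)
        total += v0 * v  # both factors have zero mean
    return total / n_paths
```

With $\lambda=1$ and $t=1$, `estimate_covariance(1.0, 1.0)` should agree with `ou_covariance(0.0, 1.0, 1.0)`, that is $\tfrac12e^{-1}$, up to Monte Carlo error.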
4. GAUSSIAN PROCESSES AND LÉVY PROCESSES

Gaussian processes

24. Existence results for Gaussian processes. In general, a Gaussian process is a process $(X_t)_{t\in T}$ indexed by a general set, such that, for any $t_1,\dots,t_n\in T$, $(X_{t_1},\dots,X_{t_n})$ has a multivariate Gaussian distribution. Thus the distribution is specified by the mean and covariance:

(24.1) $\mu(t):=EX_t,\qquad \rho(s,t):=\operatorname{cov}(X_s,X_t),\quad s,t\in T,$

since we could write down the joint density of $(X_{t_1},\dots,X_{t_n})$ in terms of $\mu$ and $\rho$. It is customary to assume that $\mu\equiv0$, since the general case can be reduced to this by taking the process $X'_t:=X_t-\mu(t)$; we shall follow this custom and henceforth assume that $\mu=0$.

(24.2) PROPOSITION. The function $\rho:T\times T\to\mathbb{R}$ is the covariance of a Gaussian process if and only if $\rho$ is non-negative definite:

(24.3) for any $t_1,\dots,t_n$, $(\rho(t_i,t_j))_{i,j=1}^n$ is a non-negative definite matrix.

Proof. Necessity is immediate. Sufficiency uses the Daniell-Kolmogorov Theorem II.31.1. To check the conditions of that theorem, notice that the state space $\mathbb{R}$ is Polish, and that, for $J=\{t_1,\dots,t_n\}\subset I=\{t_1,\dots,t_N\}$ ($N>n$), the law of $(X_{t_1},\dots,X_{t_n})$ regarded as the projection of $(X_{t_1},\dots,X_{t_N})$ down to $(X_{t_1},\dots,X_{t_n})$ is just $N(0,V)$, where $V_{ij}=\rho(t_i,t_j)$ ($i,j=1,\dots,n$). But this is the law of $(X_{t_1},\dots,X_{t_n})$ regarded as $\{X_t: t\in J\}$, and so the consistency condition of the Daniell-Kolmogorov Theorem holds. □

The limitation of Proposition 24.2 is that the condition (24.3) is not in general easy to check. One situation where this can be done is the following. The consequences are far-ranging.

(24.4) LEMMA. Let $E$ be a measurable space with a $\sigma$-finite measure $m$, and suppose that $(P_t)_{t\ge0}$ is a sub-Markovian transition semigroup that has a density $p_t(\cdot,\cdot)$ with respect to $m$:
$(P_tf)(x)=\int p_t(x,y)f(y)\,m(dy),\quad f\in b\mathcal{E},\ t>0.$
Suppose further that

(24.5)(i) $p_t(x,y)=p_t(y,x)$ for $t>0$, $x,y\in E$;

(24.5)(ii) $g(x,y):=\int_0^\infty p_t(x,y)\,dt<\infty$ for all $x,y\in E$.
Then g is the covariance of a Gaussian process on E.
Proof. We need only check the condition (24.3). But, for any $a\in\mathbb{R}^n$, $x_1,\dots,x_n\in E$,

$\sum_{i=1}^n\sum_{j=1}^n a_i\,g(x_i,x_j)\,a_j=\int_0^\infty dt\sum_{i=1}^n\sum_{j=1}^n a_i\,p_t(x_i,x_j)\,a_j$
$\qquad=\int_0^\infty dt\sum_{i=1}^n\sum_{j=1}^n\int m(dy)\,a_i\,p_{t/2}(x_i,y)\,p_{t/2}(y,x_j)\,a_j$
$\qquad=\int_0^\infty dt\int m(dy)\left(\sum_{i=1}^n a_i\,p_{t/2}(x_i,y)\right)^2\ \ge 0,$

using the symmetry of $p_t$ at the last step. □

(We have looked ahead to Section III.3 for the definition of a sub-Markovian transition semigroup, but we have already seen the essentials in Section 4.)

As an example of a situation to which Lemma 24.4 would apply, consider a random walk on a finite connected graph $G$, when a jump from vertex $i$ to a neighbouring vertex $j$ takes place at unit rate, and where the process is killed at a constant rate $\delta$. The measure $m$ is simply the counting measure on the vertices of $G$.

As an example of a situation to which Lemma 24.4 would not apply, consider Brownian motion in $\mathbb{R}^d$ ($d\ge3$); the condition (24.5)(ii) fails for $x=y$. Broadly speaking, the finiteness of $g(x,y)$ for all $x\ne y$ is equivalent to the transience of the process, and $g(x,x)<\infty$ is equivalent to the process visiting $x$ with positive probability (though, without enough conditions to ensure nice sample paths, we cannot yet begin to make rigorous sense of this in general).

The argument of Lemma 24.4 is too pretty for us to abandon all hope of making it apply to Brownian motion. Some sense can be made of it, but we have to consider a generalised random field, indexed by the vector space

(24.6) $\mathcal{D}(\mathcal{E}):=\{\sigma\text{-finite measures } \mu \text{ on } \mathbb{R}^d \text{ s.t. } \mathcal{E}(\mu,\mu)<\infty\},$

where $\mathcal{E}$ is the energy functional encountered in Section 22:

(24.7) $\mathcal{E}(\mu,\nu):=\iint\mu(dx)\,g(x,y)\,\nu(dy).$

The inequality

(24.8) $\mathcal{E}(\mu,\nu)^2\le\mathcal{E}(\mu,\mu)\,\mathcal{E}(\nu,\nu)$

follows from the Cauchy-Schwarz inequality and a simple modification of the idea of Lemma 24.4, as you are invited to check, and provides a proof that $\mathcal{D}(\mathcal{E})$ is closed under addition. We can now consider the Gaussian field $\{X_\mu:\mu\in\mathcal{D}(\mathcal{E})\}$, with covariance
$E(X_\mu X_\nu)=\mathcal{E}(\mu,\nu).$
The intuitive interpretation
$X_\mu=\int X_x\,\mu(dx)$
guides us, but has no strict sense; there does not exist a process $(X_x)_{x\in\mathbb{R}^d}$. Nevertheless $(X_\mu)_{\mu\in\mathcal{D}(\mathcal{E})}$ is a perfectly good Gaussian field, whose existence is confirmed by checking the condition (24.3) and appealing to the Daniell-Kolmogorov Theorem. We shall say some more about these random fields whose covariance comes from a symmetric Green function in the next section. For more on the relevance to quantum field theory, see Symanzik [1], Brydges, Fröhlich and Spencer [1] and Dynkin [5].

One other existence result that we cannot do without is the celebrated theorem of Bochner. We consider now only the case where $T=\mathbb{R}^d$, and where the covariance structure is stationary: for all $x,y\in\mathbb{R}^d$,
$\rho(x,y)=\rho(0,y-x)=:\rho(y-x)$
for brevity. We give the full form of Bochner's Theorem.

(24.9) THEOREM (Bochner). Let $\varphi:\mathbb{R}^d\to\mathbb{C}$ be bounded and continuous. Then the following are equivalent.

(24.10)(i) There exists a finite measure $\mu$ on $\mathbb{R}^d$ such that
$\varphi(\theta)=\int\mu(dx)\,e^{i\theta\cdot x}.$

(24.10)(ii) For any $a_1,\dots,a_n\in\mathbb{C}$, $x_1,\dots,x_n\in\mathbb{R}^d$,
$\sum_{j=1}^n\sum_{k=1}^n a_j\,\varphi(x_j-x_k)\,\bar a_k\ge0.$
(We say that $\varphi$ is non-negative definite.)

Before proving this we apply it to the representation of a stationary Gaussian process on $\mathbb{R}^d$.

(24.11) COROLLARY. Suppose that $\rho:\mathbb{R}^d\to\mathbb{R}$ is continuous. In order that $\rho$ should be the covariance of a stationary Gaussian process on $\mathbb{R}^d$, it is necessary and sufficient that $\rho$ may be represented in the form

(24.12) $\rho(x)=\int F(d\theta)\,e^{i\theta\cdot x},$

where $F$ is a finite non-negative symmetric measure on $\mathbb{R}^d$. (The measure $F$ is called the spectral measure of the Gaussian process.)
Proof. If the representation (24.12) holds, the criterion (24.3) is easy to prove. Conversely, if (24.3) holds then $\rho$ is non-negative definite, continuous (by hypothesis) and bounded (since $|\rho(x)|\le\rho(0)=EX_0^2<\infty$), so, by Bochner's Theorem, $\rho$ is a Fourier transform of some non-negative measure, which is symmetric since $\rho$ is. □

Proof of Theorem 24.9. The implication (24.10)(i) ⇒ (24.10)(ii) is trivial. For the converse, the aim should be to get the inverse Fourier transform of $\varphi$. We approach this by taking some large integers $K,n>0$, and (with $\delta:=1/n$) noticing that (24.10)(ii) implies, for any $\theta\in\mathbb{R}^d$,
$0\le n^{-2d}(2K)^{-d}\sum\nolimits_{K,n}e^{-i\delta\theta\cdot l}\,\varphi(\delta l-\delta j)\,e^{i\delta\theta\cdot j},$
where $\sum_{K,n}$ denotes the sum over all pairs $(l,j)\in(\mathbb{Z}^d)^2$ such that $\|l\|_\infty:=\sup\{|l_r|: r=1,\dots,d\}\le Kn$, $\|j\|_\infty\le Kn$. But, as $n\to\infty$ with $K$ fixed, this expression converges to

(24.13) $(2K)^{-d}\int_{\{\|x\|_\infty\le K\}}dx\int_{\{\|y\|_\infty\le K\}}dy\;e^{-i\theta\cdot(x-y)}\varphi(x-y)=\int_{\{\|v\|_\infty\le2K\}}dv\;e^{-i\theta\cdot v}\varphi(v)\prod_{j=1}^d\left(1-\frac{|v_j|}{2K}\right),$

which is thus non-negative. (We use the continuity of $\varphi$ to get the convergence.) But (24.13) is (to within powers of $2\pi$) the inverse Fourier transform of $\varphi(v)\prod_{j=1}^d(1-|v_j|/2K)^+$, and $\prod_{j=1}^d(1-|v_j|)^+=E\exp(iv\cdot X)$, where the components of $X$ are independent, with density
$f(v)=(1-\cos v)(\pi v^2)^{-1}.$
Thus if $f_{2K}(v):=2Kf(2Kv)$, we have that (24.13) is (a multiple of) $(\hat\varphi*F_{2K})(\theta)$, where $F_{2K}(v)=\prod_{j=1}^df_{2K}(v_j)$, and $\hat\varphi$ is the inverse Fourier transform of $\varphi$. But the distributions with density $F_{2K}$ converge weakly to the point mass at 0, and the density of $\hat\varphi*F_{2K}$ is non-negative. Hence $\hat\varphi$ is a non-negative measure, which is what we sought. □

(24.14) Remark. The function $\rho(x)=I_{\{x=0\}}$ ($x\in\mathbb{R}^d$) is the covariance function of a Gaussian process $\{X_x: x\in\mathbb{R}^d\}$ for which $X_{x_1},\dots,X_{x_n}$ are independent $N(0,1)$ for any distinct $x_1,\dots,x_n$. However, $\rho$ is not representable in the form (24.12). This may appear to be a limitation of the representation result Corollary 24.11, but it is not particularly grave; the process $X$ cannot have a version with any sensible regularity properties, and so is essentially useless.
It is a simple exercise to prove
that if $X$ is a stationary Gaussian process for which $x\mapsto X_x(\omega)$ is continuous for almost all $\omega$ then $\rho$ must be continuous. Thus if we want a stationary Gaussian process with continuous paths, continuity of $\rho$ is necessary, and, as we shall see next, we need only strengthen continuity to a mild form of Hölder continuity to obtain a sufficient condition.

(24.15) Exercises

(i) Spectral measure of the one-dimensional Ornstein-Uhlenbeck process. Confirm that by taking $F(d\theta)=(\lambda/\pi)(\lambda^2+\theta^2)^{-1}\,d\theta$ in (24.12), we recover the (Ornstein-Uhlenbeck) covariance $\rho(x)=e^{-\lambda|x|}$.

(ii) Lévy's Brownian motion. With $F(d\theta)=(2\pi)^{-n}\exp(-\tfrac12t|\theta|^2)\,d\theta$ in (24.12), show that
$\rho_t(x)=(2\pi t)^{-n/2}\exp\!\left(-\frac{|x|^2}{2t}\right)$
defines a stationary covariance function on $\mathbb{R}^n$. Deduce that, for $\lambda=\tfrac12\gamma^2>0$,
$\gamma\int_0^\infty e^{-\lambda t}(2\pi t)^{-1/2}\exp\!\left(-\frac{|x|^2}{2t}\right)dt=e^{-\gamma|x|}$
defines a stationary covariance function, for which
$E[(X_x-X_y)^2]=2(1-\exp\{-\gamma|x-y|\}).$
Prove that there exists a Gaussian process $(Y_x: x\in\mathbb{R}^n)$, Lévy's Brownian motion, such that
$Y_0=0,\qquad E[(Y_x-Y_y)^2]=|x-y|.$
A wonderful paper by McKean [4] discusses Markov properties of Lévy's Brownian motion, showing that it behaves very differently in even and odd dimensions.

25. Continuity results. Let $(X_t)_{t\in\mathbb{R}^n}$ be a stochastic process with values in a complete separable metric space $(S,\rho)$. We say that $X$ has a continuous version if there exists an $S$-valued stochastic process $(X'_t)_{t\in\mathbb{R}^n}$ such that

(25.1)(i) $t\mapsto X'_t(\omega)$ is continuous for almost all $\omega$;

(25.1)(ii) $\rho(X'_t(\omega),X_t(\omega))=0$, a.s. for all $t\in\mathbb{R}^n$.

If a process has a continuous version, we generally discard the original process and work with the continuous version instead, because the original was too irregular to work with. We now give a simple but powerful result that is usually sufficient to decide when a process has a continuous version.

(25.2) THEOREM (Kolmogorov's Lemma). If $(X_t)_{t\in\mathbb{R}^n}$ is a stochastic process
with values in a complete separable metric space $(S,\rho)$, and if there exist positive constants $\alpha,C,\varepsilon$ such that, for all $s,t\in\mathbb{R}^n$,

(25.3) $E\rho(X_s,X_t)^\alpha\le C|s-t|^{n+\varepsilon},$

then there exists a continuous version of $X$. This version is Hölder continuous of order $\theta$ for each $\theta<\varepsilon/\alpha$.

Proof. Let $\mathbb{D}:=\bigcup_{k\ge0}\mathbb{D}_k$ be the set of dyadic rational points in $\mathbb{R}^n$, where $\mathbb{D}_k:=2^{-k}\mathbb{Z}^n$. The idea of the proof is to show that the restriction of $X$ to $\mathbb{D}\cap[0,1)^n$ is Hölder($\theta$) for any $\theta<\varepsilon/\alpha$; we then extend $X$ by continuity to $[0,1)^n$, and then apply the same argument to a cube of arbitrary size. So we fix $\theta\in(0,\varepsilon/\alpha)$, and define
$A_k:=\{\text{for some } i,j\in\mathbb{Z}^n,\ |i-j|=1,\ i2^{-k} \text{ and } j2^{-k}\in[0,1)^n, \text{ and } \rho(X(i2^{-k}),X(j2^{-k}))>2^{-k\theta}\}.$
Then
$P(A_k)\le\sum_{i\in[0,2^k)^n}\ \sum_{j\in\mathbb{Z}^n,\,|j-i|=1}P\big(\rho(X(i2^{-k}),X(j2^{-k}))>2^{-k\theta}\big)$
$\qquad\le2^{nk}\cdot2n\cdot2^{k\theta\alpha}\,C\,2^{-k(n+\varepsilon)}=2nC\,2^{-k(\varepsilon-\theta\alpha)},$
and so by the Borel-Cantelli Lemma with probability 1 only finitely many of the $A_k$ happen, so that, for some $K=K(\omega)$,

(25.4) $\rho(X(i2^{-k}),X(j2^{-k}))\le K2^{-k\theta}$ for all $k\in\mathbb{N}$, $i,j\in[0,2^k)^n$, $|i-j|=1$.

All that remains is to extend (25.4) from neighbouring dyadic rationals to any. For this, let us assume for notational simplicity that $n=1$; the general result is an immediate consequence of this. If we take $0\le x<y<1$, with $x,y\in\mathbb{D}$, then, for some $k$, $2^{-k-1}<y-x\le2^{-k}$, and so there exists $i$ such that
$x\le i2^{-k-1}<(i+1)2^{-k-1}\le y,$
and
$\rho\big(X(i2^{-k-1}),X((i+1)2^{-k-1})\big)\le K2^{-(k+1)\theta},$
using (25.4). Now we similarly analyse the intervals $[x,i2^{-k-1}]$ and $[(i+1)2^{-k-1},y]$ by chipping off the largest dyadic-rational intervals in each (of length at most $2^{-k-2}$ in each case). Continuing thus, we have
$\rho(X_y,X_x)\le2K2^{-(k+1)\theta}+2K\sum_{r\ge k+2}2^{-r\theta}\le K'|y-x|^\theta.$
The Hölder continuity allows us to extend the process $X$ now to the whole of $[0,1)^n$.
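As a brief worked illustration of how (25.3) is used (added here as an example, not taken from the text), consider standard Brownian motion $B$ on $\mathbb{R}$, so that $n=1$. The only ingredient is the moment formula for a centred Gaussian variable; the constant $C_m$ below is named for convenience.

```latex
% Worked example (illustration only): Brownian motion on R, so n = 1.
% Since B_t - B_s ~ N(0, |t-s|), for each integer m >= 1:
\[
  E\,|B_t - B_s|^{2m}
  \;=\; \frac{(2m)!}{2^{m}\,m!}\,|t-s|^{m}
  \;=\; C_m\,|t-s|^{\,1+(m-1)} .
\]
% Hence (25.3) holds with alpha = 2m and epsilon = m - 1 (take m >= 2),
% and Kolmogorov's Lemma gives Hölder continuity of every order
\[
  \theta \;<\; \frac{\varepsilon}{\alpha} \;=\; \frac{m-1}{2m}
  \;\uparrow\; \tfrac12 \qquad (m \to \infty).
\]
```

Letting $m$ grow, one obtains a version that is Hölder continuous on bounded intervals of every order $\theta<\tfrac12$.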
(25.5) Remarks

(i) A function that is Hölder continuous of order $\theta>1$ is constant.

(ii) A more general and more powerful way of proving the existence of continuous versions has been discovered by Garsia, Rodemich and Rumsey [1], and is extremely useful. You will find nice accounts in Stroock and Varadhan [1] and Walsh [3].

We are going to use Kolmogorov's Lemma to derive sufficient conditions for the existence of a continuous version of a Gaussian process. These conditions are not necessary, but the gap is unimportant in most examples that arise. Necessary and sufficient conditions for the continuity of a Gaussian process are now known; work of Fernique, Dudley and others enabled Talagrand to reach the summit in [1]. The conditions are of a technical nature, so we refer the interested reader to Talagrand's paper, or to the books by Adler [1,2], which are full of other interesting results on Gaussian processes.

(25.6) COROLLARY. Let $(X_t)_{t\in\mathbb{R}^n}$ be a (zero-mean) Gaussian process with covariance function $\rho(s,t):=EX_sX_t$. A sufficient condition for the existence of a continuous version is that $\rho$ should be locally Hölder continuous: for each $N\in\mathbb{N}$ there exists $\theta=\theta(N)>0$ and $C=C(N)$ such that, for $|t|,|s|\le N$,

(25.7) $|\rho(s,t)-\rho(t,t)|\le C|s-t|^\theta.$

Proof. We have
$E(|X_t-X_s|^2)=\rho(t,t)-2\rho(t,s)+\rho(s,s)\le2C|t-s|^\theta.$
Since $X_t-X_s$ is Gaussian, there exist constants $a_m$ such that $E|X_t-X_s|^{2m}=a_m(E|X_t-X_s|^2)^m$, and so
$E(|X_t-X_s|^{2m})\le(2C)^ma_m|s-t|^{m\theta}.$
For large enough $m$, $m\theta>n$, and we can use Kolmogorov's Lemma. □

For a stationary Gaussian process, $\rho(s,t)=\rho(t-s)$, the condition (25.7) reduces to the Hölder continuity at 0 of $\rho$ (which implies easily that $\rho$ is Hölder continuous everywhere).

In the case of a stationary Gaussian process, it is often convenient to have a condition in terms of the measure $F$ that represents $\rho$, (24.12). Here is one such.

(25.8) COROLLARY. Suppose that, for some $\varepsilon\in(0,1)$,

(25.9) $\int|x|^\varepsilon F(dx)<\infty.$

Then $X$ has a continuous version.
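Proof. Let $A:=\int|x|^\varepsilon F(dx)$, $B_r=\{x\in\mathbb{R}^n: |x|\le r\}$, and note that (25.9) implies
$F(B_r^c)\le Ar^{-\varepsilon}.$
Let us now take the case $n=1$, indicating later how the general case follows: for $x>0$,
$0\le\rho(0)-\rho(x)=\int(1-\cos\theta x)\,F(d\theta)\le\int\big((\tfrac12\theta^2x^2)\wedge2\big)F(d\theta)$
$\qquad\le\int_{-2/x}^{2/x}\tfrac12\theta^2x^2\,F(d\theta)+2F(B^c_{2/x})$
$\qquad\le\tfrac12x^2\int_{-2/x}^{2/x}\theta^2F(d\theta)+2A(\tfrac12x)^\varepsilon.$
But the estimation
$\int_{-2/x}^{2/x}\theta^2F(d\theta)=2\int_0^{2/x}F(d\theta)\int_0^\theta2y\,dy\le4\int_0^{2/x}yF(B_y^c)\,dy\le4A\int_0^{2/x}y^{1-\varepsilon}\,dy=4A(2-\varepsilon)^{-1}(2/x)^{2-\varepsilon}$
gives us
$0\le\rho(0)-\rho(x)\le x^\varepsilon2^{1-\varepsilon}A\,\frac{6-\varepsilon}{2-\varepsilon},$
so that $\rho$ is Hölder continuous at zero. For general $n$, fix some unit vector $v$, and let $F_v$ be the image of the measure $F$ under the map $\theta\mapsto\theta\cdot v$. Thus the measure $F_v$ satisfies the bound
$F_v(\{x: |x|>r\})\le F(B_r^c)\le Ar^{-\varepsilon},$
and so, as before,
$0\le\rho(0)-\rho(xv)\le x^\varepsilon2^{1-\varepsilon}A(6-\varepsilon)/(2-\varepsilon).$
Since the constant does not depend on $v$, the Hölder continuity at 0 of $\rho$ now follows. □

We therefore have quite useable criteria sufficient to ensure the existence of a continuous version of a stationary Gaussian process $\{X_t: t\in\mathbb{R}^n\}$. Can we obtain similarly simple criteria for the existence of a $C^k$ version?

(25.10) THEOREM. Let $\rho(x)=\int e^{i\theta\cdot x}F(d\theta)$ be the covariance function of a stationary Gaussian process on $\mathbb{R}^n$, let $\alpha=(\alpha_1,\dots,\alpha_n)$ be a multi-index and let $\varepsilon\in(0,1)$ be such that

(25.11) $\int\Big(\prod_{j=1}^n\theta_j^{2\alpha_j}\Big)|\theta|^\varepsilon F(d\theta)<\infty.$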
Then there is a version $\{X_t: t\in\mathbb{R}^n\}$ of the process for which $D^\alpha X(t)$ exists and is continuous. The process $\{D^\alpha X(t): t\in\mathbb{R}^n\}$ is a stationary Gaussian process with spectral measure

(25.12) $F_\alpha(d\theta):=\Big(\prod_{j=1}^n\theta_j^{2\alpha_j}\Big)F(d\theta).$

Proof. It is clearly sufficient to prove only the case $\alpha=(0,\dots,0,1)$. For notational convenience, we write a point of $\mathbb{R}^n$ as $(\tau,t)$, where $\tau\in\mathbb{R}^{n-1}$ and $t\in\mathbb{R}$. We build a stationary Gaussian process $\{(\xi_\tau,Y_{\tau,t}): (\tau,t)\in\mathbb{R}^n\}$ with zero mean and covariance structure

(25.13)(i) $E(\xi_0\xi_\tau)=E(X_{0,0}X_{\tau,0})=\rho(\tau,0),$

(25.13)(ii) $E(\xi_0Y_{\tau,t})=\frac{\partial\rho}{\partial t}(\tau,t),$

(25.13)(iii) $E(Y_{0,0}Y_{\tau,t})=-\frac{\partial^2\rho}{\partial t^2}(\tau,t).$

(The fact that $\int\theta_n^2F(d\theta)<\infty$ implies that the first two partial derivatives of $\rho$ with respect to $t$ exist and are continuous.)

In order to see that (25.13) really does give the covariance of a Gaussian process, we see at the same time why (25.13) was chosen. Indeed, if we fix some $h>0$ and consider the process
$(\xi_\tau,Y^h_{\tau,t}):=\big(X_{\tau,0},\ h^{-1}(X_{\tau,t+h}-X_{\tau,t})\big),$
then the process clearly exists, so its covariance is non-negative definite and satisfies

(25.14)(i) $E(\xi_0\xi_\tau)=\rho(\tau,0),$

(25.14)(ii) $E(\xi_0Y^h_{\tau,t})=h^{-1}[\rho(\tau,t+h)-\rho(\tau,t)],$

(25.14)(iii) $E[Y^h_{0,0}Y^h_{\tau,t}]=h^{-2}[2\rho(\tau,t)-\rho(\tau,t+h)-\rho(\tau,t-h)].$

Thus the limiting form of the covariance (25.13) is also non-negative definite, and therefore is the covariance structure of some Gaussian process.

In view of the integrability assumption (25.11) and Corollary 25.8, there is a continuous version of $(Y_{\tau,t})$, and the spectral measure of $(Y_{\tau,t})$ is just $\theta_n^2F(d\theta)$, since the covariance function of $Y$ is $-\partial^2\rho/\partial t^2$. There is a continuous version of $\xi$ because there is a continuous version of $X$. Now we simply define

(25.15) $\tilde X_{\tau,t}:=\xi_\tau+\int_0^tY_{\tau,s}\,ds.$

It is immediate that $\tilde X$ is a continuous Gaussian process, with a continuous derivative with respect to $t$, and it is a simple exercise to confirm that $\tilde X$ has the same covariance as $X$. □
(25.16) Remarks

(i) John Kent [3] has obtained attractive sufficient conditions in terms of the covariance structure of an arbitrary (non-Gaussian) stationary process for the existence of a continuous version. His result is as follows. If $\rho$ is $C^n$, and $p_n(h)$ is the polynomial of degree $n$ given by the Taylor expansion of $\rho$ about 0, and if there exists $\gamma>0$ such that

(25.17) $|\rho(h)-p_n(h)|=O\big(r^n/|\log r|^{3+\gamma}\big)$ as $r=|h|\to0$,

then there exists a continuous version of the random field $\{X_t: t\in\mathbb{R}^n\}$.

(ii) Everything we have done in this section and the previous section goes through with minor modification for vector-valued Gaussian processes. Thus if $\{X_t: t\in\mathbb{R}^d\}$ is a $k$-vector stationary Gaussian process, we have that, for each $j=1,\dots,k$, $\{X^j_t: t\in\mathbb{R}^d\}$ is a stationary real Gaussian process, and so its covariance can be represented as in (24.12):
$\rho_{jj}(t):=EX^j(0)X^j(t)=\int F_{jj}(d\theta)\,e^{i\theta\cdot t}.$
By considering more generally the real stationary Gaussian process $a\cdot X_t$, where $a\in\mathbb{R}^k$ is fixed, we deduce the representation

(25.18) $\rho_{jl}(t):=EX^j(0)X^l(t)=\int F_{jl}(d\theta)\,e^{i\theta\cdot t},$

where, for each Borel $B\subseteq\mathbb{R}^d$, $(F_{jl}(B))$ is a non-negative definite matrix. Likewise, the condition
$\sum_j\int F_{jj}(d\theta)\,|\theta|^{2r+\varepsilon}<\infty$
will ensure the existence of a $C^r$ version of the vector Gaussian random field.

(25.19) Example: Brownian bridge. A Brownian bridge is an $\mathbb{R}^n$-valued Gaussian process $(X_t)_{0\le t\le T}$ such that, for $s,t\in[0,T]$,

(25.20) $EX_t=at,\qquad E(X_sX_t^*)-st\,aa^*=\Big(s\wedge t-\frac{st}{T}\Big)I,$

where $a\in\mathbb{R}^n$ is fixed, and $T>0$ is fixed. Does such a process exist, and does it have a continuous version? To answer this, we may assume without loss of generality that $a=0$ (because we could always add the function $t\mapsto at$ to the zero-mean process), and that $n=1$ (because we could construct each component of the motion separately). There are many ways of proving that such a process exists and has a continuous version (look at Theorem IV.40.3 for four different representations!)
but for now we use the methods developed for general Gaussian processes to prove this.
First, if $\eta$ is any bounded signed measure on $[0, T]$, we see that

$\int_{[0,T]} \int_{[0,T]} \eta(dx)\, \eta(dy)\, \rho(x, y) := \int_{[0,T]} \int_{[0,T]} \eta(dx)\, \eta(dy) \Bigl(x \wedge y - \dfrac{xy}{T}\Bigr)$
$\qquad = \int_0^T dv \int_v^T \eta(dx) \int_v^T \eta(dy) - \dfrac{1}{T} \Bigl[\int_0^T dv \int_v^T \eta(dx)\Bigr]^2 \ge 0$

by the Cauchy-Schwarz inequality. Hence, by Proposition 24.2, the function $\rho(s, t) := s \wedge t - st/T$ is the covariance of a Gaussian process. (We could also use Lemma 24.4, since $\rho$ is (a multiple of) the Green function of Brownian motion in $[0, T]$, killed when it exits $(0, T)$; this approach is less elementary, though.) As to the existence of a continuous version, the condition (25.7) of Corollary 25.6 is trivial to verify, and delivers the result immediately.

(25.21) Exercise. If $B$ is Brownian motion, verify directly that

$X(t) := at + \dfrac{T - t}{T}\, B\Bigl(\dfrac{tT}{T - t}\Bigr)$

satisfies (25.20), and conclude that there exists a continuous version of the Brownian bridge.

(25.22) Example: Brownian sheet. The Brownian sheet is a real-valued two-parameter zero-mean Gaussian process $\{B(s, t) : s, t \ge 0\}$ such that

$\rho((s, t), (u, v)) := E[B(s, t) B(u, v)] = (s \wedge u)(t \wedge v).$

The existence of such a Gaussian process follows because

$\int_{(\mathbb{R}^+)^2} \eta(ds, dt) \int_{(\mathbb{R}^+)^2} \eta(du, dv)\, (s \wedge u)(t \wedge v) = \int_0^\infty \int_0^\infty dx\, dy \Bigl[\int_{s=x}^\infty \int_{t=y}^\infty \eta(ds, dt)\Bigr]^2 \ge 0$

proves that the covariance function $\rho$ is non-negative definite (Proposition 24.2). The continuity follows again easily from Corollary 25.6, since $\rho$ is Lipschitz continuous. It is worth remarking that the process

$X_\tau(t) := X(\tau, t) := e^{-\tau/2} B(e^\tau, t)$

is a continuous Gaussian process such that, for each $\tau \in \mathbb{R}$, $(X_\tau(t))_{t \ge 0}$ is a standard Brownian motion. Moreover, $\tau \mapsto X_\tau$ is a stationary process, as is easily verified. This 'Brownian motion of Brownian motions' arises in many contexts; see the expository papers in Williams [13], and Walsh [3].
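For a discrete signed measure, the Cauchy-Schwarz identity used for the Brownian bridge covariance, and the covariance claimed in Exercise 25.21 (with $a = 0$), can both be checked by direct arithmetic. The following is a minimal sketch, not a proof; the atoms, weights and function names are hypothetical choices of ours.

```python
def bridge_cov(s, t, T):
    """Brownian bridge covariance rho(s, t) = min(s, t) - s t / T."""
    return min(s, t) - s * t / T


def quad_form(points, weights, T):
    """Double sum of w_i w_j rho(x_i, x_j) for eta = sum_i w_i delta_{x_i}."""
    return sum(wi * wj * bridge_cov(xi, xj, T)
               for xi, wi in zip(points, weights)
               for xj, wj in zip(points, weights))


def tail_form(points, weights, T):
    """int_0^T g(v)^2 dv - (int_0^T g(v) dv)^2 / T, where g(v) = eta((v, T]).

    g is piecewise constant between consecutive atoms, so both integrals
    are exact finite sums. All atoms are assumed to lie in [0, T].
    """
    atoms = sorted(zip(points, weights))
    knots = [0.0] + [x for x, _ in atoms]
    int_g, int_g2 = 0.0, 0.0
    for k in range(len(atoms)):
        g = sum(w for _, w in atoms[k:])      # mass to the right of the interval
        width = knots[k + 1] - knots[k]
        int_g += g * width
        int_g2 += g * g * width
    return int_g2 - int_g * int_g / T


def time_changed_cov(s, t, T):
    """Covariance of ((T - t)/T) B(tT/(T - t)) at times s, t < T, for standard BM B."""
    clock_s, clock_t = s * T / (T - s), t * T / (T - t)
    return ((T - s) / T) * ((T - t) / T) * min(clock_s, clock_t)


if __name__ == "__main__":
    pts, wts, T = [0.2, 0.5, 0.9], [1.0, -2.0, 0.5], 1.0
    print(quad_form(pts, wts, T), tail_form(pts, wts, T))
    print(bridge_cov(0.3, 1.7, 2.0), time_changed_cov(0.3, 1.7, 2.0))
```

The two printed pairs agree up to rounding, and the quadratic form is non-negative, as the Cauchy-Schwarz argument predicts.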
(25.23) Exercise. Satisfy yourself that an $n$-parameter Brownian 'sheet' $\{B(t_1, \ldots, t_n) : t_i \in \mathbb{R}^+\}$ can be defined just as easily.

26. Isotropic random flows. The study of turbulent fluid flow using stochastic methods has a long history, and has involved many great names in probability and fluid dynamics, including Kolmogorov [1, 2], Taylor [1], Batchelor [1], Itô [8] and Yaglom [1]. The first objective of this work is to construct and classify Gaussian random fields $U : \mathbb{R}^d \to \mathbb{R}^d$ that are not only stationary (with respect to all shifts of the parameter $x \in \mathbb{R}^d$), but also isotropic, which means that, for each $G \in O(d)$,

(26.1)   $(GU(x))_{x \in \mathbb{R}^d} \overset{\mathcal{D}}{=} (U(Gx))_{x \in \mathbb{R}^d}.$

Physically, this means that the random field $U$ 'looks the same' in all coordinate systems. If we assume as usual that $U$ is zero-mean, and define the covariance function

(26.2)   $\rho^{jk}(x) := E[U^j(0) U^k(x)]$

as before, then the condition (26.1) is equivalent to

(26.3)   $\rho(Gx) = G \rho(x) G^T, \quad \forall G \in O(d),\ \forall x \in \mathbb{R}^d.$

In terms of the spectral measure representation (25.18), if we could assume that $F$ had a smooth density, $F_{jk}(d\theta) = f_{jk}(\theta)\, d\theta$, then the isotropy condition (26.1) would be equivalent to

(26.4)   $f(G\theta) = G f(\theta) G^T, \quad \forall G \in O(d),\ \forall \theta \in \mathbb{R}^d.$

(The assumption that $F$ has a smooth density is harmless; if $\rho$ is an isotropic covariance then $\rho_\varepsilon(x) := e^{-\varepsilon |x|^2/2} \rho(x)$ is another isotropic covariance with a spectral measure that does have a smooth density.) For concreteness, we shall from now on assume that, for some $\varepsilon > 0$,

(26.5)   $\sum_j \int F_{jj}(d\theta)\, |\theta|^{2+\varepsilon} < \infty,$

so that the random field $U$ has a $C^1$ version. What does an isotropic random field look like? The next result partly answers this.

(26.6) PROPOSITION. If $\rho$ is isotropic then it may be represented in the form

(26.7)   $\rho^{jk}(x) = \rho_L(r)\, \dfrac{x^j x^k}{r^2} + \rho_N(r) \Bigl(\delta^{jk} - \dfrac{x^j x^k}{r^2}\Bigr),$

where $r := |x|$, and $\rho_L$, $\rho_N$ are two continuous functions such that $\rho_L(0) = \rho_N(0)$.

Proof. Take $e_1 = (1, 0, \ldots, 0)^T \in \mathbb{R}^d$ and note that by (26.3),

(26.8)   $\rho(r e_1) = G \rho(r e_1) G^T$
for all $G \in O(d)$ for which $G e_1 = e_1$. But the $G$ that fix $e_1$ are exactly those expressible as

(26.9)   $G = \begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix},$

where $R \in O(d - 1)$. If (26.8) holds for all $G$ of the form (26.9), it is easy to deduce that $\rho(r e_1)$ must be of the form

$\rho(r e_1) = \begin{pmatrix} \rho_L(r) & 0 \\ 0 & \rho_N(r) I_{d-1} \end{pmatrix} = \rho_L(r)\, e_1 e_1^T + \rho_N(r) (I - e_1 e_1^T),$

where $I_{d-1}$ is the $(d - 1) \times (d - 1)$ identity matrix, and $\rho_L$, $\rho_N$ are some continuous functions. The result now follows by rotating the generic $x \in \mathbb{R}^d$ to be a multiple of $e_1$. □

This result is far from a complete characterisation of an isotropic covariance, since we know little about the functions $\rho_L$ and $\rho_N$. The following result leads us to a complete description of $\rho_L$ and $\rho_N$.

(26.10) COROLLARY. Let $\sigma$ be the surface Lebesgue measure on $S^{d-1}$. Then the spectral measure $(F_{jk}(\cdot))_{j,k = 1, \ldots, d}$ is the spectral measure of an isotropic covariance if and only if there exist measures $\mu_P$ and $\mu_S$ on $(0, \infty)$ and a constant $\gamma \ge 0$ such that, for any $h \in C_K(\mathbb{R}^d)$,

(26.11)   $\int h(\theta) F_{jk}(d\theta) = \int_{S^{d-1}} \sigma(du) \int_{(0,\infty)} h(ru) \bigl[u^j u^k \mu_P(dr) + (\delta_{jk} - u^j u^k) \mu_S(dr)\bigr] + \gamma h(0) \delta_{jk}.$

Proof. Suppose first that $F$ is isotropic. If $F$ has a smooth density $f$ then $f$ satisfies (26.4). But then Proposition 26.6 implies that

(26.12)   $f_{jk}(\theta) = \varphi_P(|\theta|)\, \theta^j \theta^k |\theta|^{-2} + \varphi_S(|\theta|) \bigl(\delta_{jk} - \theta^j \theta^k |\theta|^{-2}\bigr)$

for some smooth $\varphi_P$ and $\varphi_S$. Thus, for $h \in C_K(\mathbb{R}^d)$,

$\int h(\theta) F_{jk}(d\theta) = \int_{S^{d-1}} \sigma(du) \int_0^\infty r^{d-1}\, dr\, h(ru) f_{jk}(ru),$

which is of the form (26.11) with $\gamma = 0$, $\mu_P(dr) := r^{d-1} \varphi_P(r)\, dr$, and $\mu_S(dr) := r^{d-1} \varphi_S(r)\, dr$. Moreover, for radially symmetric $h$,

(26.13)   $\sum_j \int h(\theta) F_{jj}(d\theta) = c_d \int_0^\infty h(r) \bigl[\mu_P(dr) + (d - 1) \mu_S(dr)\bigr],$

where $c_d := \sigma(S^{d-1})$. To dispense with the assumption of a smooth density, let $F^\varepsilon$ be the spectral measure corresponding to covariance $e^{-\varepsilon |x|^2/2} \rho(x)$ and observe
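that (26.13) with $h = 1$ shows that the measures $\mu_P^\varepsilon$ and $\mu_S^\varepsilon$ have bounded total mass, and taking $h(\theta) = |\theta|^2$ in (26.13) shows that $\mu_P^\varepsilon$ and $\mu_S^\varepsilon$ have bounded second moment and so are tight. Taking a weakly convergent subsequence, we derive the limiting form (26.11), the constant $\gamma$ appearing because in the limit $\mu_P$ and $\mu_S$ could put mass on 0.

For the converse statement, if $F$ has the form (26.11) then

(26.14)   $\rho^{jk}(x) = \int_{S^{d-1}} \sigma(du) \int_0^\infty \cos(ru \cdot x) \bigl[u^j u^k \mu_P(dr) + (\delta^{jk} - u^j u^k) \mu_S(dr)\bigr] + \gamma \delta^{jk}$
$\qquad\qquad = \delta^{jk} \int_0^\infty g(rx)\, \mu_S(dr) + \int_0^\infty g^{jk}(rx)\, (\mu_P - \mu_S)(dr) + \gamma \delta^{jk},$

where

$g(v) := \int_{S^{d-1}} \cos(v \cdot u)\, \sigma(du), \qquad g^{jk}(v) := \int_{S^{d-1}} u^j u^k \cos(v \cdot u)\, \sigma(du).$

Note that we can express

$g(v) = (2\pi)^{d/2} |v|^{-(d-2)/2} J_{(d-2)/2}(|v|) := \psi(|v|),$

$g^{jk}(v) = -\dfrac{\partial^2 g}{\partial v^j \partial v^k}(v) = -\dfrac{\psi'(|v|)}{|v|} \Bigl(\delta^{jk} - \dfrac{v^j v^k}{|v|^2}\Bigr) - \psi''(|v|)\, \dfrac{v^j v^k}{|v|^2}$
$\qquad\quad = (2\pi)^{d/2} \bigl[\delta^{jk} |v|^{-d/2} J_{d/2}(|v|) - v^j v^k |v|^{-(d+2)/2} J_{(d+2)/2}(|v|)\bigr],$

where $J_\nu(\cdot)$ is the standard Bessel function (see Watson [1]), using the well-known identities

$J_{\nu-1}(z) + J_{\nu+1}(z) = \dfrac{2\nu}{z} J_\nu(z), \qquad J_{\nu-1}(z) - J_{\nu+1}(z) = 2 J_\nu'(z).$

Abbreviating $(2\pi)^{d/2}$ to $a_d$, $\tfrac{1}{2}(d - 2)$ to $\nu$ and setting

$H_\nu(x) = a_d J_\nu(|x|)\, |x|^{-\nu}, \quad x \in \mathbb{R}^d,$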
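we obtain from (26.14), after some calculations,

(26.15)   $\rho^{jk}(x) = \gamma \delta^{jk} + \Bigl(\delta^{jk} - \dfrac{x^j x^k}{|x|^2}\Bigr) \Bigl\{\int_0^\infty \mu_S(dr) \bigl[H_\nu(rx) - H_{\nu+1}(rx)\bigr] + \int_0^\infty \mu_P(dr)\, H_{\nu+1}(rx)\Bigr\}$
$\qquad\qquad + \dfrac{x^j x^k}{|x|^2} \Bigl\{\int_0^\infty \mu_S(dr)\, (d - 1) H_{\nu+1}(rx) + \int_0^\infty \mu_P(dr) \bigl[H_{\nu+1}(rx) - |rx|^2 H_{\nu+2}(rx)\bigr]\Bigr\}.$

Thus we have expressed $\rho^{jk}$ in the form (26.7) with

(26.16)   $\rho_L(|x|) := \rho_{PL}(|x|) + \rho_{SL}(|x|), \qquad \rho_N(|x|) := \rho_{PN}(|x|) + \rho_{SN}(|x|),$

where

(26.17)(i)   $\rho_{PL}(|x|) = \int_0^\infty \bigl[H_{\nu+1}(rx) - |rx|^2 H_{\nu+2}(rx)\bigr] \mu_P(dr),$
(26.17)(ii)   $\rho_{SL}(|x|) = \int_0^\infty (d - 1) H_{\nu+1}(rx)\, \mu_S(dr),$
(26.17)(iii)   $\rho_{PN}(|x|) = \int_0^\infty H_{\nu+1}(rx)\, \mu_P(dr),$
(26.17)(iv)   $\rho_{SN}(|x|) = \int_0^\infty \bigl[H_\nu(rx) - H_{\nu+1}(rx)\bigr] \mu_S(dr).$ □

Thus Corollary 26.10 not only characterises completely the spectral measures of an isotropic covariance, but also gives a representation (26.16), (26.17) for the possible isotropic covariances themselves.

Let us now explain the choice of the subscripts $P$ and $S$ for the measures; $P$ stands for 'potential', $S$ stands for 'solenoidal'. Indeed, the general isotropic covariance can be expressed, according to (26.15), as

(26.18)   $\rho^{jk}(x) = \gamma \delta^{jk} + \rho_P^{jk}(x) + \rho_S^{jk}(x),$

where

$\rho_P^{jk}(x) := \Bigl(\delta^{jk} - \dfrac{x^j x^k}{|x|^2}\Bigr) \rho_{PN}(|x|) + \dfrac{x^j x^k}{|x|^2}\, \rho_{PL}(|x|), \qquad \rho_S^{jk}(x) := \Bigl(\delta^{jk} - \dfrac{x^j x^k}{|x|^2}\Bigr) \rho_{SN}(|x|) + \dfrac{x^j x^k}{|x|^2}\, \rho_{SL}(|x|),$

are the covariances of two isotropic Gaussian random fields, as is $\gamma \delta^{jk}$ (the latter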
being the covariance of the trivial field $U(x) = Y$, $\forall x$, where $Y \sim N(0, \gamma I)$). Thus we have decomposed the general isotropic Gaussian random field into the sum of three independent isotropic Gaussian random fields, with covariances $\rho_P$, $\rho_S$ and $\gamma I$ respectively. The random field with covariance $\rho_S$ actually is solenoidal (that is, divergence-free), as we confirm by computing (with $\partial_j := \partial/\partial x^j$)

$E[\operatorname{div} U(0)]^2 = E\Bigl(\sum_j \partial_j U^j(0)\Bigr)^2 = \sum_j \sum_k E[\partial_j U^j(0)\, \partial_k U^k(0)] = -\sum_j \sum_k \partial_j \partial_k \rho^{jk}(0) = \sum_j \sum_k \int \theta_j \theta_k F_{jk}(d\theta) = 0$

if $F$ is given by (26.11) with $\mu_P = 0$ and $\gamma = 0$. Next, we check that the random field with covariance $\rho_P$ is actually a potential (that is, curl-free: $\partial_j U^k = \partial_k U^j$, $\forall j, k$). For this, we just need to compute

$E[\partial_j U^k(0) - \partial_k U^j(0)]^2 = -\partial_j \partial_j \rho^{kk}(0) + 2 \partial_j \partial_k \rho^{jk}(0) - \partial_k \partial_k \rho^{jj}(0) = \int \bigl[\theta_j^2 F_{kk}(d\theta) - 2 \theta_j \theta_k F_{jk}(d\theta) + \theta_k^2 F_{jj}(d\theta)\bigr] = 0$

if $F$ is given by (26.11) with $\mu_S = 0$ and $\gamma = 0$.

If a $C^1$ vector field $U$ is curl-free, then there is some real $C^2$ function $V : \mathbb{R}^d \to \mathbb{R}$ such that $U = \operatorname{grad} V$. In this context, it is natural to ask whether, for a curl-free isotropic Gaussian random field $U$, we could find a stationary Gaussian process $(V(x))_{x \in \mathbb{R}^d}$ such that $U = \operatorname{grad} V$. This cannot always be done; as an exercise, check that the condition that permits such a representation is

$\int \sum_j F_{jj}(d\theta)\, |\theta|^{-2} = c_d \int_0^\infty r^{-2} \mu_P(dr) < \infty.$

The interest of this decomposition is that the solenoidal flows preserve Lebesgue measure and so correspond to an incompressible flow of fluid.

Let us recall that isotropic Gaussian fields are of interest as a model for turbulent fluid flow; and any turbulent fluid flow must be expected to be changing in time, so should be modelled by some random field $(U(t, x) : t \in \mathbb{R},\ x \in \mathbb{R}^d)$ with values in $\mathbb{R}^d$. In view of what we have already done, we can set up a simple model for such a time-varying Gaussian random field by taking

(26.19)   $E[U^j(t, x) U^k(s, y)] = \gamma(s, t)\, \rho^{jk}(x - y),$
where $\rho$ is an isotropic covariance, and $\gamma$ is the covariance of a real Gaussian process indexed by $\mathbb{R}$. The reason we do not assume that $\gamma$ is the covariance of a stationary Gaussian process is that we may wish to use $\gamma(s, t) := s \wedge t$, the Brownian-motion covariance. Indeed, if we do so then, for each $x$, $(U(t, x))_{t \ge 0}$ is a Brownian motion in $\mathbb{R}^d$, and the correlation between the different Brownian motions $U(\cdot, x)$ is given by $\rho$. This suggests the intriguing notion of studying the motion of a particle dropped into this turbulent flow; if the particle starts at $x$, and is at position $X_t(x)$ at time $t$, then, in some sense, we should have infinitesimally

(26.20)   $d(X_t(x)) = dU(t, X_t(x)).$

Good sense can be made of this; see Baxendale and Harris [1] and Kunita [3]. It will not come as any surprise that the process $(X_t(x))_{t \ge 0}$ is a Brownian motion. But much more interesting is to study the flow $(X_t(x))_{t \ge 0,\ x \in \mathbb{R}^d}$, which tells us not only how individual particles move, but also how they move relative to each other. There are many fascinating and beautiful results here; see Baxendale [1], Baxendale and Harris [1], Carverhill [1, 2], Harris [1], Kunita [3], Le Jan [1-4], Le Jan and Watanabe [1] and the many references therein for a view of what is known.

We have already looked ahead way beyond the scope of this volume, and must now end our discussion of isotropic random fields; we hope that what we have said will help you get started on papers such as Baxendale and Harris [1], Le Jan [4] and Yaglom [1].

(26.21) Exercise. Confirm that (26.19) is the covariance of a Gaussian process.

27. Dynkin's Isomorphism Theorem. Since our entire discussion of Gaussian processes is by way of a digression motivated by interest, we make no apologies for describing briefly here (a caricature of) Dynkin's work on Gaussian processes and local-time fields, even though we shall develop it no further.
This is appropriate for the mysterious but powerful result that we are about to discuss, since it is clear that its full potential is yet to be explored; we point to the paper of Marcus and Rosen [1], where continuity results for local-time fields are deduced from continuity results for Gaussian processes using Dynkin's ideas, and to the paper of Sheppard [1], which deduces the classical Ray-Knight Theorem on Brownian local time using Dynkin's result.

We shall consider a sub-Markovian process $(X_t)_{t \ge 0}$ with values in a finite set $I$, and a transition semigroup $(P_t)_{t \ge 0}$ that is symmetric and assumed to be integrable and irreducible:

$0 < g(x, y) := \int_0^\infty p_t(x, y)\, dt < \infty, \quad \forall x, y \in I.$

Let $Q$ denote the $Q$-matrix of the chain $X$: $Q = P'(0)$.
We shall also consider a zero-mean Gaussian process $(\varphi_x)_{x \in I}$ with covariance $E(\varphi_x \varphi_y) = g_{xy}$, independent of $X$, and introduce the notation $P^{x \to y}$ for the law of $X$ started at $x$ and conditioned to die at $y$; precisely, this is the process with the sub-Markov transition function

(27.1)   $\tilde p_t(a, b) = p_t(a, b)\, g(b, y) / g(a, y).$

(27.2) THEOREM (Dynkin). If $F : \mathbb{R}^I \to \mathbb{R}^+$ is any bounded measurable function then, for any $x, y \in I$,

(27.3)   $E \varphi_x \varphi_y F(\tfrac{1}{2} \varphi^2) = E^{x \to y} F(\tfrac{1}{2} \varphi^2 + T)\, g_{xy},$

where $T_a := \int_0^\infty I_{\{X_t = a\}}\, dt$ is the occupation field of the chain $X$.

(27.4) Clarification. On the left-hand side of (27.3), the expectation is only over the Gaussian field $\varphi$. On the right-hand side, $X$ has law $P^{x \to y}$, and $\varphi$ is independent of $X$.

Proof. It clearly suffices to prove the theorem only for $F$ of the form

$F(\xi) = \exp\Bigl(-\sum_a \lambda_a \xi_a\Bigr),$

which we now assume. Let $\Lambda$ denote the diagonal matrix $\operatorname{diag}(\lambda_a)$. The inclusion of the weighting $F(\tfrac{1}{2} \varphi^2)$ changes the law of the Gaussian field from $N(0, G)$ to $N(0, (\Lambda - Q)^{-1})$; in particular,

(27.5)   $\dfrac{E \varphi_x \varphi_y F(\tfrac{1}{2} \varphi^2)}{E F(\tfrac{1}{2} \varphi^2)} = (\Lambda - Q)^{-1}(x, y).$

This leaves us just to confirm that

(27.6)   $g_{xy} E^{x \to y} F(T) = (\Lambda - Q)^{-1}(x, y),$

since then (27.3) follows immediately from (27.5) and (27.6). Under $P^{x \to y}$, the process $X$ is a Markov chain with $Q$-matrix $\tilde Q = D^{-1} Q D$, where $D = \operatorname{diag}(g(a, y))$, as we deduce immediately from (27.1). The problem is thus to evaluate

(27.7)   $\tilde E^x \exp\Bigl\{-\int_0^\zeta \lambda(X_s)\, ds\Bigr\},$

where $X$ is a chain with $Q$-matrix $\tilde Q$, dying at $\zeta$, whereupon it is sent to a graveyard $\partial$. The interpretation of (27.7) is that the process $X$ is also being killed at rate $\lambda(\cdot)$ and being sent to a graveyard $\partial'$; and (27.7) is the probability that the process ends in $\partial$ rather than $\partial'$. The $Q$-matrix of the process on
$I \cup \{\partial, \partial'\}$ is thus

$\begin{array}{c|ccc} & I & \partial & \partial' \\ \hline I & \tilde Q - \Lambda & -\tilde Q \mathbf{1} & \Lambda \mathbf{1} \\ \partial & 0 & 0 & 0 \\ \partial' & 0 & 0 & 0 \end{array}$

and from this we compute immediately

$\tilde P^x(X \text{ ends in } \partial) = \bigl((\Lambda - \tilde Q)^{-1} (-\tilde Q \mathbf{1})\bigr)_x = \bigl(D^{-1} (\Lambda - Q)^{-1} D (-\tilde Q \mathbf{1})\bigr)_x.$

But

$(\tilde Q \mathbf{1})_x = \sum_a \tilde q_{xa} = \sum_a q_{xa}\, g_{ay} / g_{xy} = -\delta_{xy} / g_{xy},$

so that

$\tilde P^x(X \text{ ends in } \partial) = \dfrac{1}{g_{xy}} (\Lambda - Q)^{-1}(x, y),$

completing the proof. □

Lévy processes

28. Lévy processes. The aim of this section is to give the briefest of introductions to the theory of Lévy processes. Inclusion of this material is justified because Brownian motion is a Lévy process; Lévy processes also provide one of the most important examples of the Markov processes studied in Chapter III and the semimartingales of Chapter VI.

(28.1) DEFINITION. A process $(X_t)_{t \ge 0}$ with values in $\mathbb{R}^d$ is called a Lévy process (or process with stationary independent increments) if it has the properties

(28.2)(i)   for almost all $\omega$, $t \mapsto X_t(\omega)$ is right continuous on $[0, \infty)$, with left limits on $(0, \infty)$;
(28.2)(ii)   for $0 \le t_0 < t_1 < \cdots < t_n$, the random variables $Y_j := X_{t_j} - X_{t_{j-1}}$ $(j = 1, \ldots, n)$ are independent;
(28.2)(iii)   the law of $X_{t+h} - X_t$ depends on $h$, but not on $t$.

The analytic theory of the semigroups associated with Lévy processes is the same as the theory of infinitely divisible distributions.

(28.3) DEFINITION. A probability measure $\mu$ on $\mathbb{R}^d$ is infinitely divisible if for
each $n$, there is a probability $\mu_n$ on $\mathbb{R}^d$ such that if $V_1, \ldots, V_n$ are independent with law $\mu_n$ then

$V_1 + \cdots + V_n \sim \mu.$

It is clear that if $X$ is a Lévy process then the law of $X_1$ is infinitely divisible. The converse, that any infinitely divisible law is the law of $X_1$ for some Lévy process $X$, will follow from the central result of the theory of Lévy processes.

(28.4) THEOREM (Lévy-Khinchin representation). For each $b \in \mathbb{R}^d$, each non-negative definite symmetric $d \times d$ matrix $S$, and each measure $\nu$ on $\mathbb{R}^d \setminus \{0\}$ satisfying the integrability condition

(28.5)   $\int (|x|^2 \wedge 1)\, \nu(dx) < \infty,$

the function

(28.6)   $\varphi(\theta) = \exp[\psi(\theta)], \quad \theta \in \mathbb{R}^d,$

is the characteristic function of an infinitely divisible law (here we write

(28.7)   $\psi(\theta) := i b \cdot \theta - \tfrac{1}{2} \theta^T S \theta + \int \bigl\{e^{i \theta \cdot x} - 1 - i \theta \cdot x\, I_{(|x| < 1)}\bigr\}\, \nu(dx),$

the characteristic exponent of the law). Moreover, the characteristic function of any infinitely divisible law on $\mathbb{R}^d$ may be represented in this way, with the representing triple $(b, S, \nu)$ being uniquely determined.

Proof. We refer the reader to Section 9.5 of Breiman [1] or Section XVII.2 of Feller [1] for the one-dimensional case. For the general situation, see Fristedt [1]. □

The measure $\nu$ in (28.7) is called the Lévy measure of the infinitely divisible law. An important extension of the definition of infinite divisibility is the following.

(28.8) THEOREM. Let $X$ be a random variable with the property that, for each $n$,

(28.9)   $X \overset{\mathcal{D}}{=} \sum_{j=1}^{n} X_{nj},$

where the $X_{nj}$ are independent, and, for each $\varepsilon > 0$,

(28.10)   $\lim_{n \to \infty} \sup_{j \le n} P(|X_{nj}| > \varepsilon) = 0.$

Then $X$ is infinitely divisible.

Proof. See Breiman [1].
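Definition 28.3 can be made concrete with the Poisson law, whose $n$th convolution root is again Poisson with the rate divided by $n$. The sketch below checks this by direct convolution of probability mass functions; the rate, the truncation level and the function names are our own illustrative assumptions.

```python
import math


def poisson_pmf(rate, kmax):
    """Poisson(rate) probabilities for k = 0, ..., kmax."""
    pmf = [math.exp(-rate)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * rate / k)
    return pmf


def convolve(p, q):
    """Convolution of two pmfs on {0, 1, ...}, both given on 0..kmax.

    Entries k <= kmax of the convolution only involve entries <= k of p and q,
    so the truncated result is exact on 0..kmax.
    """
    return [sum(p[i] * q[k - i] for i in range(k + 1)) for k in range(len(p))]


def n_fold(p, n):
    """Law of V_1 + ... + V_n for independent V_i with common pmf p."""
    out = p
    for _ in range(n - 1):
        out = convolve(out, p)
    return out


if __name__ == "__main__":
    rate, n, kmax = 3.0, 5, 20
    root = poisson_pmf(rate / n, kmax)      # candidate mu_n
    target = poisson_pmf(rate, kmax)        # mu
    print(max(abs(a - b) for a, b in zip(n_fold(root, n), target)))
```

The printed discrepancy is at the level of floating-point rounding, in line with $V_1 + \cdots + V_n \sim \mu$ for this choice of $\mu_n$.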
The uniform asymptotic negligibility condition (28.10) is commonly encountered in limit theory; it is the simplest condition one could impose to prevent any one of the summands $X_{nj}$ from contributing noticeably to the sum. As an example, the first passage times of one-dimensional diffusions are clearly infinitely divisible using Theorem 28.8, but not obviously so using Definition 28.3. As an exercise, check that if, for each $n$, the $(X_{nj})$ have a common distribution then (28.10) holds.

We now show how any infinitely divisible law (with characteristic exponent $\psi$ of the form (28.7)) may be realised as the law of $X_1$ for some Lévy process $X$. The law of $X_t$ must be given by

$E \exp(i \theta \cdot X_t) = \exp[t \psi(\theta)],$

the right-hand side being the characteristic function of an infinitely divisible law by Theorem 28.4. Thus if $\mu_t$ is the law of $X_t$, then, for any $t, s \ge 0$,

$\mu_t * \mu_s = \mu_{t+s},$

so we may define a Markovian semigroup $(P_t)_{t \ge 0}$ by

$P_t f(x) := \int f(x + y)\, \mu_t(dy), \quad \forall f \in C_0(\mathbb{R}^d).$

It is easy to see that, for each $t \ge 0$,

$P_t : C_0(\mathbb{R}^d) := \{\text{continuous functions on } \mathbb{R}^d \text{ which vanish at } \infty\} \to C_0(\mathbb{R}^d).$

Hence $(P_t)_{t \ge 0}$ is a Feller-Dynkin semigroup (see Section III.6), and by Theorem III.7.17, there exists a process $X$ with paths that are right-continuous on $[0, \infty)$ with left limits on $(0, \infty)$ and with transition semigroup $(P_t)_{t \ge 0}$, and such that $X_0 = 0$. It is now clear that $X$ is a Lévy process, and $X_1 \sim \mu_1$ as claimed.

The result to which we have just appealed is quite technical, but once we have got it, it allows us to pass from an analytical description of the process (in terms of the convolution semigroup of infinitely divisible laws) to a sample-path description, which is far more powerful. To understand what we have gained by doing this, we are going to prove Lévy's Theorem that the only continuous Lévy processes are drifting Brownian motions (indeed, without sample paths, what does this theorem mean?!).
First, though, we discuss the 'building-block' Lévy process: the compound Poisson process. Consider a process that is constructed from a standard Poisson counting process $(N_t)_{t \ge 0}$ of rate $\lambda > 0$, and an independent sequence $Y_1, Y_2, \ldots$ of IID random variables with distribution function $F$ as follows: we define

(28.11)   $X_t = \sum_{j=1}^{N_t} Y_j.$

It is clear that the paths of $X$ are right-continuous with left limits. Moreover, the increments of $X$ over disjoint intervals are independent, and $X_{t+s} - X_t$ has
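the law of a sum of a Poisson number of copies of $Y_1$:

$E \exp[i \theta \cdot (X_{t+s} - X_t)] = \sum_{n=0}^{\infty} \dfrac{e^{-\lambda s} (\lambda s)^n}{n!} \Bigl(\int e^{i \theta \cdot x} F(dx)\Bigr)^n = \exp\Bigl[s \int (e^{i \theta \cdot x} - 1)\, \lambda F(dx)\Bigr].$

Thus $X$ is a Lévy process, and if we compare with (28.7), we see that in the representation of $X$, $S \equiv 0$, and $\nu = \lambda F$ has finite mass.

We can now prove Lévy's Theorem.

(28.12) THEOREM (Lévy). If $X$ is a continuous Lévy process in $\mathbb{R}^d$ then $X$ is expressible in the form

(28.13)   $X_t = \sigma B_t + bt$

for some $b \in \mathbb{R}^d$, and $\sigma$ a $d \times d$ matrix.

Proof. Let us suppose that the characteristic exponent of $X$ is given by (28.7); our task is then to prove that $\nu = 0$. Fixing $\varepsilon \in (0, 1)$, we construct on some suitable probability space two independent Lévy processes $X^\varepsilon$ and $Y^\varepsilon$ with characteristic exponents

$\psi_\varepsilon(\theta) := i b \cdot \theta - \tfrac{1}{2} \theta^T S \theta + \int_{|x| \le \varepsilon} (e^{i \theta \cdot x} - 1 - i \theta \cdot x)\, \nu(dx), \qquad \tilde\psi_\varepsilon(\theta) := \psi(\theta) - \psi_\varepsilon(\theta)$

respectively. Then

$\tilde\psi_\varepsilon(\theta) = \int_{|x| > \varepsilon} \bigl\{e^{i \theta \cdot x} - 1 - i \theta \cdot x\, I_{(|x| < 1)}\bigr\}\, \nu(dx)$

is the characteristic exponent of a compound Poisson process (together with a deterministic linear drift coming from the compensating term), since $\nu(\{x : |x| > \varepsilon\}) < \infty$. Thus $Y^\varepsilon$ has only finitely many jumps in any bounded time interval. Since $X^\varepsilon$ and $Y^\varepsilon$ are independent, their sum is a Lévy process with characteristic exponent $\psi_\varepsilon + \tilde\psi_\varepsilon = \psi$, so that $X^\varepsilon + Y^\varepsilon$ is (a copy of) $X$. Now, since $X^\varepsilon$ is a Lévy process, and so has right-continuous paths with left limits, there can be only countably many discontinuities of $X^\varepsilon$ in $[0, 1]$. Let $\mathcal{D}$ denote this countable random set. Since the jumps of $Y^\varepsilon$ come at the times of a Poisson process independent of $X^\varepsilon$, we have immediately that, almost surely, no jump time of $Y^\varepsilon$ falls in $\mathcal{D}$. Thus any jump time of $Y^\varepsilon$ will actually be a jump time of $X^\varepsilon + Y^\varepsilon = X$; but $X$ is supposed to be continuous. The only possibility then is that $Y^\varepsilon$ has no jumps, which is to say

$\nu(\{x : |x| > \varepsilon\}) = 0.$

Since $\varepsilon > 0$ was arbitrary, the expression (28.7) for the characteristic exponent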
of $X$ collapses to

$\psi(\theta) = i b \cdot \theta - \tfrac{1}{2} \theta^T S \theta,$

and taking $\sigma = S^{1/2}$ yields the form (28.13), as required. □

We conclude this section with a few examples of common Lévy processes (or, equivalently, infinitely divisible distributions). The Gaussian distribution we already know about.

(28.14) Stable processes. A real-valued Lévy process $X$ is said to be stable of index $\alpha \in (0, 2]$ if, for any $c > 0$,

(28.15)   $(X_{ct})_{t \ge 0} \overset{\mathcal{D}}{=} (c^{1/\alpha} X_t)_{t \ge 0}.$

The family of all (real-valued) stable($\alpha$) processes is given by the characteristic exponents

(28.16)   $\psi(\theta) = -c |\theta|^\alpha \bigl[1 - i \beta \operatorname{sgn}(\theta) \tan \tfrac{1}{2} \pi \alpha\bigr],$

where $1 < \alpha < 2$, $-1 \le \beta \le 1$, or $0 < \alpha < 1$, $-1 \le \beta \le 1$. For $\alpha = 1$, the exponent is of the form $\psi(\theta) = -c |\theta| + i \mu \theta$. The representation $(b, S, \nu)$ in terms of $c$, $\beta$ and $\alpha$ is, for $0 < \alpha < 1$,

(28.17)   $\dfrac{\nu(dx)}{dx} = |x|^{-1-\alpha} \bigl[\tfrac{1}{2}(1 + \beta) I_{(x > 0)} + \tfrac{1}{2}(1 - \beta) I_{(x < 0)}\bigr] \dfrac{\alpha c}{\Gamma(1 - \alpha) \cos \tfrac{1}{2} \alpha \pi}, \qquad S = 0,$

(28.18)   $b = \int_{-1}^{1} x\, \nu(dx) = \dfrac{\beta}{1 - \alpha} \cdot \dfrac{\alpha c}{\Gamma(1 - \alpha) \cos \tfrac{1}{2} \alpha \pi},$

and, for $1 < \alpha < 2$, exactly the same formula is valid (now, note, $\cos \tfrac{1}{2} \alpha \pi < 0$). The representation of $\psi(\theta) = -c |\theta| + i \mu \theta$ in the case $\alpha = 1$ is achieved by

$\dfrac{\nu(dx)}{dx} = \dfrac{c}{\pi x^2}, \qquad S = 0, \qquad b = \mu.$

The case $\alpha = 2$ is just the Brownian case. One special case is that of the symmetric Cauchy process:

(28.19)   $\psi(\theta) = -|\theta|.$

The Cauchy process arises naturally in two-dimensional Brownian motion; if $X$ and $Y$ are independent BM$_0$ processes, $\tau_t := \inf\{u : X_u = t\}$, then $Y(\tau_t)$ is a Cauchy process, as you can easily verify from the fact that, for $\lambda > 0$,

$E \exp(-\lambda \tau_t) = \exp(-t \sqrt{2 \lambda}).$

(See (9.1).) The asymmetric Cauchy process has

(28.20)   $\psi(\theta) = -\tfrac{1}{2} \pi |\theta| - i \theta \log |\theta|$

and

(28.21)   $\nu(dx) = x^{-2}\, dx\, I_{(x > 0)}, \qquad b = S = 0.$

Note that the asymmetric Cauchy process is not stable.
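In terms of characteristic exponents, the scaling property (28.15) says that $k \psi(\theta) = \psi(k^{1/\alpha} \theta)$ for all $k > 0$ (we write $k$ for the scaling constant, to avoid a clash with the coefficient $c$ of (28.16)). The sketch below evaluates the defect $k \psi(\theta) - \psi(k^{1/\alpha} \theta)$ for the exponents (28.16) and (28.20); it is only a numerical illustration, and the function names and parameter values are hypothetical choices of ours.

```python
import math


def psi_stable(theta, alpha, c=1.0, beta=0.0):
    """Exponent (28.16); intended for alpha in (0, 1) or (1, 2)."""
    if theta == 0.0:
        return 0j
    sgn = 1.0 if theta > 0 else -1.0
    skew = 1.0 - 1j * beta * sgn * math.tan(0.5 * math.pi * alpha)
    return -c * abs(theta) ** alpha * skew


def psi_asym_cauchy(theta):
    """Exponent (28.20) of the asymmetric Cauchy process."""
    if theta == 0.0:
        return 0j
    return -0.5 * math.pi * abs(theta) - 1j * theta * math.log(abs(theta))


def scaling_defect(psi, alpha, k, theta):
    """k psi(theta) - psi(k^(1/alpha) theta); zero for every k, theta under (28.15)."""
    return k * psi(theta) - psi(k ** (1.0 / alpha) * theta)


if __name__ == "__main__":
    def stable(theta):
        return psi_stable(theta, alpha=1.5, c=2.0, beta=0.5)

    # Vanishes up to rounding for the exponent (28.16).
    print(abs(scaling_defect(stable, 1.5, 3.0, -0.7)))
    # For (28.20) with alpha = 1 the defect is i k theta log k, a pure drift term.
    print(scaling_defect(psi_asym_cauchy, 1.0, 2.0, 1.0))
```

For (28.20) the defect is exactly $i k \theta \log k$, so rescaling the asymmetric Cauchy process reproduces its law only up to an added linear drift, which is the sense in which it fails (28.15).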
In view of the interpretation of $\nu$ as the 'jump measure', it is natural (and correct) to think that the asymmetric Cauchy process $Z$ can only jump upward; and it is natural (and incorrect) to think that, for some large enough $c > 0$, $Z_t + ct$ is increasing. The reason that this is incorrect is somewhat mysterious, and is to do with the fact that, for any $t > 0$, $\sum_{s \le t} |\Delta Z_s| = +\infty$, almost surely. We discuss this further in Section VI.2, but raise the issue here to emphasise that Lévy processes for which

$\int (|x|^2 \wedge 1)\, \nu(dx) < \infty, \qquad \int (|x| \wedge 1)\, \nu(dx) = +\infty$

are the most mysterious of all. We shall prove straight away that we cannot make $Z_t + ct$ increasing, however large $c$ is, by characterising all increasing Lévy processes.

(28.22) Subordinators. A subordinator is simply an increasing Lévy process; examples include $X_t = t$ and compound Poisson processes with positive jumps. We shall prove the following characterisation.

(28.23) THEOREM. The distribution $F$ on $\mathbb{R}^+$ is infinitely divisible if and only if there is a representation

(28.24)   $\int_{[0,\infty)} e^{-\lambda x} F(dx) = \exp\Bigl[-c \lambda - \int_0^\infty (1 - e^{-\lambda x})\, \mu(dx)\Bigr]$

for some $c \ge 0$, and measure $\mu$ on $(0, \infty)$ satisfying the integrability condition

(28.25)   $\int_0^\infty (x \wedge 1)\, \mu(dx) < \infty.$

(28.26) Remarks. (i) The function

(28.27)   $\gamma(\lambda) := c \lambda + \int_0^\infty (1 - e^{-\lambda x})\, \mu(dx)$

is called the Laplace exponent of the infinitely divisible law. (Our notation $\gamma$ is not standard.)

(ii) The method of proof of Theorem 28.23 contains most of the essential steps of the proof of Theorem 28.4, but simplified by the fact that we work with Laplace transforms.

(iii) The measure (28.21) does not satisfy the integrability condition (28.25), so the law of the asymmetric Cauchy process with positive drift $c$ cannot be an infinitely divisible law on $\mathbb{R}^+$.

Proof of Theorem 28.23. Suppose first that $F$ is infinitely divisible, $F(0) = 0$.
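Define

$\hat F(\lambda) := \int_{[0,\infty)} e^{-\lambda x} F(dx) \in (0, 1]$

and observe that, for each $n \in \mathbb{N}$, there is an $n$th root $F_n$ of $F$ with

$\hat F_n(\lambda) := \int_{[0,\infty)} e^{-\lambda x} F_n(dx) = \hat F(\lambda)^{1/n}.$

Thus as $n \to \infty$, $\hat F_n(\lambda) \to 1$ uniformly on compact sets, and

(28.28)   $\log \hat F(\lambda) = n \log \hat F_n(\lambda) = n \log\{1 - [1 - \hat F_n(\lambda)]\} \le -n [1 - \hat F_n(\lambda)].$

But since $\hat F_n(\lambda) \to 1$ uniformly on compacts, we can assert that, for any $\varepsilon > 0$ and $K > 0$, there is some $n_0$ such that, for $n \ge n_0$, $\lambda \le K$,

$1 - \hat F_n(\lambda) < \delta,$

where $\delta > 0$ is such that, for all $0 \le x < \delta$,

$\log(1 - x) \ge -(1 + \varepsilon) x.$

Hence, for $n \ge n_0$, $\lambda \le K$, we have

(28.29)   $\log \hat F(\lambda) = n \log \hat F_n(\lambda) \ge -n (1 + \varepsilon) [1 - \hat F_n(\lambda)].$

The conclusion from (28.28) and (28.29) is that

$n [1 - \hat F_n(\lambda)] \to -\log \hat F(\lambda) \quad (n \to \infty).$

But

$n [1 - \hat F_n(\lambda)] = n \int_{(0,\infty)} (1 - e^{-\lambda x})\, F_n(dx) = \int_{(0,\infty)} \dfrac{1 - e^{-\lambda x}}{1 - e^{-x}}\, n (1 - e^{-x})\, F_n(dx).$

Thus the measures $m_n(dx) := n (1 - e^{-x}) F_n(dx)$ on $(0, \infty)$ have bounded total mass (indeed, $m_n(0, \infty) \to -\log \hat F(1)$), so there is a subsequence down which $m_n \Rightarrow m$, a measure on $[0, \infty]$, and we conclude that

$n [1 - \hat F_n(\lambda)] \to m(\{0\}) \lambda + \int_{(0,\infty)} \dfrac{1 - e^{-\lambda x}}{1 - e^{-x}}\, m(dx) + m(\{\infty\}).$

Writing $c := m(\{0\})$ and $\mu(dx) := (1 - e^{-x})^{-1} m(dx)$ gives the form (28.24) for $-\log \hat F(\lambda) := \gamma(\lambda)$, except for the presence of $m(\{\infty\})$. But letting $\lambda \downarrow 0$ shows that, in fact, $m(\{\infty\}) = 0$, and the proof is complete. □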
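The key limit in the proof, $n[1 - \hat F_n(\lambda)] \to -\log \hat F(\lambda)$, together with the one-sided bound (28.28), is easy to observe numerically for a specific infinitely divisible law on $\mathbb{R}^+$. The sketch below uses the exponential law, whose Laplace transform is $\alpha/(\alpha + \lambda)$; that choice and the function names are our own assumptions made for illustration.

```python
import math


def laplace_exponential(lam, alpha=1.0):
    """Laplace transform of the exponential(alpha) law: alpha / (alpha + lam)."""
    return alpha / (alpha + lam)


def scaled_defect(lam, n, transform=laplace_exponential):
    """n [1 - F_n-hat(lam)], where F_n-hat = (F-hat)^(1/n) is the nth-root transform."""
    return n * (1.0 - transform(lam) ** (1.0 / n))


if __name__ == "__main__":
    lam = 2.0
    target = -math.log(laplace_exponential(lam))    # equals log 3 here
    for n in (1, 10, 1000, 10 ** 6):
        value = scaled_defect(lam, n)
        # By (28.28), value never exceeds target, and it increases to it with n.
        print(n, value, target - value)
```

The gap printed in the last column decreases to zero, matching the sandwich between (28.28) and (28.29).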
(28.30) The gamma process. Many of the common families of distributions of statistics are actually infinitely divisible, the gamma distribution included. Since the gamma law is concentrated on $\mathbb{R}^+$, we see from Theorem 28.23 that if $X_t$ is a gamma random variable with scale parameter $\alpha$ and shape parameter $t$ (that is, $X_t$ has density $\alpha^t x^{t-1} e^{-\alpha x} / \Gamma(t)$) then

(28.31)   $E e^{-\lambda X_t} = \alpha^t (\lambda + \alpha)^{-t} = \exp\Bigl[-c \lambda t - t \int_0^\infty (1 - e^{-\lambda x})\, \mu(dx)\Bigr]$

for some $c \ge 0$, and $\mu$ satisfying (28.25). The reader should have no difficulty in confirming that

(28.32)   $\mu(dx) = e^{-\alpha x}\, \dfrac{dx}{x}, \qquad c = 0,$

makes (28.31) true. The fact that the mean and variance of a gamma law are proportional to $t$ is now obvious from the interpretation of the gamma law as the law of a Lévy process.

(28.33) Among other common distributions, the $t$-distribution, the lognormal, and reciprocals of gammas are all infinitely divisible; see respectively Grosswald [1], Thorin [1] and Bondesson [1]. Although the Lévy-Khinchin representation looks like all that there is to say about infinite divisibility, it is unfortunately rare that one can exhibit the characteristic function in a sufficiently explicit form to be able to decide infinite divisibility. For that reason, attention has focused on various subclasses of the infinitely divisible laws; see Bondesson [1] for a survey. Pitman and Yor [1] succeeded in explaining the analytical results of Ismail and Kelker [1], for example that, for $\nu > -1$,

$\lambda^{\nu/2} \big/ \bigl[2^\nu\, \Gamma(\nu + 1)\, I_\nu(\sqrt{\lambda})\bigr]$

is the Laplace transform of an infinitely divisible law, by finding a diffusion additive functional with this law, which was thus obviously infinitely divisible. Pitman and Yor also discovered a whole host of other infinitely divisible laws. This opened a whole new vein in the study of infinite divisibility, and it is fair to say that it is still far from exhausted.

29. Fluctuation theory and Wiener-Hopf factorisation.
At the very beginning of this chapter, one of the reasons we gave for studying Brownian motion was that it was sufficiently concrete that many calculations can be done explicitly; in particular, the law of the maximum by time $t$, or the law of the first passage time to a level, were quite easy to derive using the reflection principle. Generalising to Lévy processes, these problems become very much harder to answer (despite the explicit Lévy-Khinchin representation), and lead us into the realm of Wiener-Hopf factorisation. The whole area is notorious for the
lack of good closed-form answers, although various general formulae are known. We shall here present without proof a selection of largely classical results that illustrate what is known. We have drawn extensively on the excellent surveys by Bingham [1] and Fristedt [1].

Let $X$ be a (real-valued) Lévy process, $\bar X_t := \sup_{s \le t} X_s$, $\underline{X}_t := \inf_{s \le t} X_s$, and let $\tau_a := \inf\{t > 0 : X_t > a\}$. Take $T$ to be an exponential random variable of mean $\eta^{-1}$, independent of $X$.

(29.1) THEOREM (Spitzer, Rogozin, Pecherskii).

(29.2)(i)   $E e^{i \theta X(T)} = \eta [\eta - \psi(\theta)]^{-1}$
(29.2)(ii)   $\qquad = E e^{i \theta \bar X(T)}\, E e^{i \theta \{X(T) - \bar X(T)\}}$
(29.2)(iii)   $\qquad = E e^{i \theta \bar X(T)}\, E e^{i \theta \underline{X}(T)}.$

Moreover, we have the Spitzer-Rogozin identity

(29.3)   $E e^{i \theta \bar X(T)} = \exp\Bigl[\int_0^\infty t^{-1} e^{-\eta t}\, dt \int_0^\infty (e^{i \theta x} - 1)\, P(X_t \in dx)\Bigr],$

with the analogous expression for the characteristic function of $\underline{X}(T)$.

(29.4) Remarks. The equality (29.2)(i) is immediate from the definitions. The equality (29.2)(iii) follows because $\underline{X}(T) \overset{\mathcal{D}}{=} X(T) - \bar X(T)$; draw a picture of the sample path and turn it upside down! The equality (29.2)(ii) is the profound statement; although evidently $X_T = \bar X_T + (X_T - \bar X_T)$, the factorisation (29.2)(ii) follows from the less obvious fact that

(29.5)   $\bar X_T$ and $X_T - \bar X_T$ are independent.

This fundamental fact is best understood by excursion theory, and the account given by Greenwood and Pitman [2] is the definitive reference. The excursion-theoretic standpoint is the key to most of the distributional identities concerning Lévy processes.

We omit the very important identity of Fristedt [1] because we do not yet have the necessary notions to state it, but record here a simple but useful identity; for $\lambda \ge 0$,

(29.6)   $E e^{-\lambda \bar X(T)} = \eta \Bigl[\eta + \kappa(\eta) \lambda + \int_{(-\infty, 0]} P(\underline{X}(T) \in dy) \int_{-y}^{\infty} \nu(dx)\, (1 - e^{-\lambda (x + y)})\Bigr]^{-1},$

where $\kappa$ is a non-negative increasing function. See Rogers [5] for a derivation and the explanation of the significance of $\kappa$, which is related to Fristedt's identity. Spitzer's book [1] gives the random-walk version of (29.3), which was extended to Lévy processes by Rogozin [1].
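For standard Brownian motion, where $\psi(\theta) = -\tfrac{1}{2} \theta^2$, the factorisation (29.2) can be written out in full, because $\bar X(T)$ is then exponential of rate $\sqrt{2 \eta}$ (a consequence of $E \exp(-\eta \tau_a) = \exp(-a \sqrt{2 \eta})$) and $\underline{X}(T)$ has the law of $-\bar X(T)$. The sketch below checks the product of the two factors against $\eta [\eta - \psi(\theta)]^{-1}$; it is an illustration for this one example only, with function names of our own choosing.

```python
import math


def transform_at_T(theta, eta):
    """(29.2)(i) for Brownian motion: eta / (eta - psi(theta)), psi(theta) = -theta^2/2."""
    return eta / (eta + 0.5 * theta * theta)


def sup_factor(theta, eta):
    """Characteristic function of the supremum at T, an exponential law of rate sqrt(2 eta)."""
    rate = math.sqrt(2.0 * eta)
    return rate / (rate - 1j * theta)


def inf_factor(theta, eta):
    """Characteristic function of the infimum at T, which has the law of minus the supremum."""
    rate = math.sqrt(2.0 * eta)
    return rate / (rate + 1j * theta)


if __name__ == "__main__":
    for eta in (0.5, 2.0):
        for theta in (-2.0, 0.3, 5.0):
            product = sup_factor(theta, eta) * inf_factor(theta, eta)
            # The product of the two Wiener-Hopf factors reproduces (29.2)(i).
            print(eta, theta, abs(product - transform_at_T(theta, eta)))
```

Here both factors are available in closed form, which is exactly what fails for a general Lévy process.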
Although the Wiener-Hopf factor (29.3) can rarely be computed in closed form, the identity (29.3) does yield useful information. By extending from $\theta \in \mathbb{R}$
to $\theta \in \mathbb{H} = \{z \in \mathbb{C} : \operatorname{Im} z \ge 0\}$ and letting $\theta = i \lambda \to i \infty$, $\lambda > 0$, we deduce that

(29.7)   $P(\tau_0 > 0) = 0$ or $1$ according as $\int_{0+} (dt/t)\, P(X_t > 0) = +\infty$ or $< +\infty$.

Rogozin [1] deduces the test

(29.8)   $P(\bar X_\infty = +\infty) = 0$ or $1$ according as $\int^{\infty} (dt/t)\, P(X_t > 0) < \infty$ or $= +\infty$.

If one restricts attention to the case of spectrally one-sided Lévy processes (those for which the measure $\nu$ in the Lévy-Khinchin representation (28.7) puts no mass on one or other of the half-lines) then more complete results may be obtained, simply because, if $\nu(\mathbb{R}^+) = 0$, say, the law of $\bar X(T)$ is exponential. This is probabilistically obvious (why? Recall that $X$ has no upward jumps), but can be seen immediately from (29.6). It may not be easy to work out the rate of the exponential, but at least one has in principle both of the Wiener-Hopf factors, and some expressions for them. For example, if $X$ were spectrally positive, and $-\underline{X}(T) \sim \exp[\beta(\eta)]$, then, from (29.6),

$E \exp[-\lambda \bar X(T)] = \eta \Bigl[\eta + \lambda \kappa(\eta) + \int_0^\infty \nu(dx) \int_0^x \beta e^{-\beta y} (1 - e^{-\lambda (x - y)})\, dy\Bigr]^{-1},$

and it can be shown that $\kappa(\eta) > 0$ if and only if $\sigma > 0$ (that is, there is a Brownian component; see Rogers [5, Theorem 3]).

Distributional results for Lévy processes continue to be discovered; see Doney [1, 2] for particularly interesting recent ones.

30. Local times of Lévy processes. Of the many sample-path properties of Lévy processes, some of the most interesting are to do with existence and properties of a local-time process. In this section, we give a brief introduction to the main results in this area, which are due to Kesten, Bretagnolle, Blumenthal and Getoor, Barlow and Hawkes.

Throughout, we assume that the (real-valued) Lévy process $X$ is not a compound Poisson process, this case being a trivial complication. Define $C \subseteq \mathbb{R}$ by

$C := \{x \in \mathbb{R} : P(X_t = x \text{ for some } t > 0) > 0\}.$

(30.1) THEOREM (Kesten, Bretagnolle). Either $C = \emptyset$, or else $\operatorname{Leb}(C) > 0$. If the latter alternative obtains then $C$ is one of $\mathbb{R}$, $(0, \infty)$ or $(-\infty, 0)$.
A necessary and sufficient condition for $\operatorname{Leb}(C) > 0$ is

(30.2) $\int_{-\infty}^{\infty} \operatorname{Re}\Big[\frac{1}{1 + \psi(\theta)}\Big]\, d\theta < \infty.$

The original proof of this result is to be found in Kesten [1]; Bretagnolle [1] was able to simplify Kesten's approach by working directly with the potentials of singletons, rather than little intervals.
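To get a feel for the integral test (30.2), one can evaluate a truncated version of the integral numerically. The sketch below is only an illustration under stated assumptions: it takes the symmetric stable exponent $\psi(\theta) = |\theta|^\alpha$ (with the convention $E\exp(i\theta X_t) = \exp(-t\psi(\theta))$), so that the integrand is $1/(1 + |\theta|^\alpha)$; the function name `truncated_integral` and the use of the midpoint rule are our own choices and do not come from the text.

```python
import math


def truncated_integral(alpha, cutoff, steps=200_000):
    """Midpoint-rule value of the integral of 1/(1+|theta|^alpha) over [-cutoff, cutoff].

    Assumption (not from the text): psi(theta) = |theta|**alpha, the
    symmetric stable exponent, so Re[1/(1+psi)] = 1/(1+|theta|**alpha).
    """
    width = 2.0 * cutoff / steps
    total = 0.0
    for k in range(steps):
        theta = -cutoff + (k + 0.5) * width  # midpoint of the k-th cell
        total += 1.0 / (1.0 + abs(theta) ** alpha)
    return total * width


if __name__ == "__main__":
    # alpha = 2 (Brownian scaling) versus alpha = 1 (Cauchy scaling)
    for alpha in (2.0, 1.0):
        values = [truncated_integral(alpha, c) for c in (10.0, 100.0, 1000.0)]
        print(alpha, [round(v, 3) for v in values])
```

Under these assumptions the truncated integrals settle down as the cutoff grows when $\alpha = 2$, but keep growing like a logarithm when $\alpha = 1$, which is consistent with the test being satisfied only when $\alpha > 1$.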
Let $m_t$ (respectively $m$) denote the occupation measure by time $t$ (respectively the 1-discounted occupation measure):

$m_t(A) := \int_0^t I_A(X_s)\, ds, \qquad m(A) := \int_0^\infty e^{-t} I_A(X_t)\, dt.$

Let $L_t$ (respectively $L$) denote the density of $m_t$ (respectively $m$), if such densities exist. The following attractive result decides when an occupation density exists.

(30.3) THEOREM (Hawkes [3]). A local time exists if and only if

(30.4) $\int_{-\infty}^{\infty} \operatorname{Re}\Big[\frac{1}{1 + \psi(\theta)}\Big]\, d\theta < \infty.$

Moreover, $L(\cdot)$ is almost surely square-integrable.

Proof. Writing $\hat m$ for the Fourier transform of $m$, we have

$E|\hat m(\theta)|^2 = E \int_0^\infty \!\! \int_0^\infty e^{-s-t}\, e^{i\theta(X_t - X_s)}\, ds\, dt = \operatorname{Re}\Big[\frac{1}{1 + \psi(\theta)}\Big].$

Thus the condition (30.4) (which is the same as (30.2)) implies that almost surely $\hat m \in L^2(\mathbb{R})$, and so $m$ has an $L^2(\mathbb{R})$ density. Conversely, if a local time exists, the range of $X$ has positive Lebesgue measure almost surely, implying $\operatorname{Leb}(C) > 0$, and hence (30.2), by Theorem 30.1. □

Millar and Tran [1] show that the local time of the asymmetric Cauchy process (which exists by applying the test (30.4) to the characteristic exponent (28.20)) has the property that $x \mapsto L_t(x)$ is unbounded on every interval; thus the local time is far from continuous in general. Under the condition

(30.5) 0 is regular for $\{0\}$,

Blumenthal and Getoor [2] proved the existence of a jointly measurable process $\{L(t,x) : t \ge 0,\ x \in \mathbb{R}\}$ that is an occupation density and such that, for each $x$, $t \mapsto L(t,x)$ is continuous increasing (and, indeed, an additive functional; see Section III.16). Thus the condition (30.5) ensures the continuity of $L$ in the time variable, leaving just the continuity in the space variable. (The condition (30.5) means $P^0(H_0 = 0) = 1$, where, as usual, $H_x := \inf\{t > 0 : X_t = x\}$. It can be shown (see Rogozin [1]) that (30.5) is equivalent to

(30.6) $\int_{-\infty}^{\infty} \operatorname{Re}\Big[\frac{1}{1 + \psi(\theta)}\Big]\, d\theta < \infty$
and either $\sigma^2 > 0$ or $\int (|x| \wedge 1)\, \nu(dx) = +\infty$.)

Let us assume that (30.4) holds, and that the function $\varphi$ is defined by

$\varphi(x) := \Big\{ \frac{1}{\pi} \int_{-\infty}^{\infty} (1 - \cos \theta x)\, \operatorname{Re}\Big[\frac{1}{1 + \psi(\theta)}\Big]\, d\theta \Big\}^{1/2}.$

Let $\bar\varphi$ be the monotone rearrangement of $\varphi$, that is,

$\bar\varphi(t) = \inf\{s : \operatorname{Leb}\{u : \varphi(u) \le s\} > t\}.$

The sufficiency of the following condition is due to Barlow and Hawkes [1], the necessity to Barlow [5].

(30.7) THEOREM (Barlow, Barlow and Hawkes). Assume (30.4) and (30.5). There exists a jointly continuous version of the local-time process $\{L(t,x) : t \ge 0,\ x \in \mathbb{R}\}$ if and only if

(30.8) $\int_{0+} \frac{\bar\varphi(u)\, du}{u \sqrt{\log(1/u)}} < \infty.$

The condition (30.8) was suggested by results on the continuity of Gaussian processes; see Hawkes [3] for a well-motivated survey. Barlow [5] also establishes the following modulus-of-continuity result.

(30.9) THEOREM. If $\varphi(x) = |x|^{\alpha} f(x)$, where $f$ is slowly varying at 0, then, for a proper subinterval $I$ of $\mathbb{R}$,

$\lim_{\delta \downarrow 0}\ \sup_{a, b \in I,\ |a - b| < \delta} \frac{|L(s,a) - L(s,b)|}{\varphi(b - a)\, (-\log|b - a|)^{1/2}} = 2 \Big[ \sup_{x \in I} L(s,x) \Big]^{1/2}.$

We have already mentioned important work of Marcus and Rosen [1] which utilises the Dynkin isomorphism theorem in deep studies of local time.
CHAPTER II

Some Classical Theory

This chapter is a reminder of what every probabilist should know, with the emphasis on things that tend to be neglected. The considerable length of the chapter (and it is now much more extensive than in the first edition) should be sufficient guarantee that 'reminder' is used in the usual 'courtesy' sense! Because things are now developed in strictly logical order, you may sometimes have to wait a little time for applications. (We very occasionally cheat just a little in Exercises by using things that you may feel are not yet proved with full rigour, but we always clear these up later.) Exercises are very much part of the text: please do them!

So many reminders of standard definitions are included that we break with our usual definition format except when we wish to give special emphasis to particularly important material that may not be so familiar.

1. BASIC MEASURE THEORY

The basic results of measure theory are summarised here, with commentary, but mostly without proofs. A full account, with all results proved, may be found, for example, in Williams [15], referred to as [W] throughout this chapter; that account has the advantage that its notation and terminology are the same as those used here. Neveu [1] is a marvellous account of measure theory for probabilists; and, for the definitive account of the full theory, including Choquet capacitability theory (which is needed for the Debut and Section Theorems), see Volume 1 of Dellacherie and Meyer [1].

If you have studied Measure for Measure from such classics as Halmos [1] or Dunford and Schwartz [1] then this part of the chapter can serve to remind you of probabilists' language.

Measurability and measure

1. Measurable spaces; σ-algebras; π-systems; d-systems. Measurability will, in a sense, be much more important to us than measure. The emphasis in probability is therefore very different from that of courses that aim straight for the Dominated-Convergence Theorem.
(1.1) Algebra; σ-algebra; $\sigma(\mathcal{C})$; measurable space. Let $S$ be a set. A collection $\Sigma_0$ of subsets of $S$ is called an algebra on $S$ (or algebra of subsets of $S$) if the following three conditions hold:

$S \in \Sigma_0$,
$F \in \Sigma_0 \Rightarrow F^c := S \setminus F \in \Sigma_0$,
$F, G \in \Sigma_0 \Rightarrow F \cup G \in \Sigma_0$.

Note that $\emptyset = S^c \in \Sigma_0$ and that $F, G \in \Sigma_0 \Rightarrow F \cap G = (F^c \cup G^c)^c \in \Sigma_0$. Thus an algebra on $S$ is a family of subsets of $S$ stable under finitely many set operations. (Note: Some authors use 'field' for 'algebra' and 'σ-field' for 'σ-algebra'.)

A collection $\Sigma$ of subsets of $S$ is called a σ-algebra on $S$ (or σ-algebra of subsets of $S$) if $\Sigma$ is an algebra on $S$ such that whenever $F_n \in \Sigma$ ($n \in \mathbb{N}$),

$\bigcup_n F_n \in \Sigma.$

Note that if $\Sigma$ is a σ-algebra on $S$ and $F_n \in \Sigma$ for $n \in \mathbb{N}$, then $\bigcap_n F_n = \big(\bigcup_n F_n^c\big)^c \in \Sigma$. Thus a σ-algebra on $S$ is a family of subsets of $S$ 'stable under any countable collection of set operations'.

A pair $(S, \Sigma)$, where $S$ is a set and $\Sigma$ is a σ-algebra on $S$, is called a measurable space. An element of $\Sigma$ is called a $\Sigma$-measurable subset of $S$.

Let $\mathcal{C}$ be a class of subsets of $S$. Then $\sigma(\mathcal{C})$, the σ-algebra generated by $\mathcal{C}$, is the smallest σ-algebra $\Sigma$ on $S$ such that $\mathcal{C} \subseteq \Sigma$. It is the intersection of all σ-algebras on $S$ that have $\mathcal{C}$ as a subclass. (Obviously, the class $\mathcal{P}(S)$ of all subsets of $S$ is a σ-algebra that extends $\mathcal{C}$.)

(1.2) Borel σ-algebras; $\mathcal{B}(S)$; $\mathcal{B} = \mathcal{B}(\mathbb{R})$. Let $S$ be a topological space. Then $\mathcal{B}(S)$, the Borel σ-algebra on $S$, is the σ-algebra generated by the family of open subsets of $S$. With slight abuse of notation,

(1.3) $\mathcal{B}(S) := \sigma(\text{open sets}).$

It is standard shorthand that $\mathcal{B} := \mathcal{B}(\mathbb{R})$.

The σ-algebra $\mathcal{B}$ is the most important of all σ-algebras. Every subset of $\mathbb{R}$ that you meet in everyday use is an element of $\mathcal{B}$; and indeed it is difficult (but possible!) to find a subset of $\mathbb{R}$ constructed explicitly (without the Axiom of Choice) that is not in $\mathcal{B}$. To construct a subset of $\mathbb{R}$ that is not Lebesgue-measurable (as we do in Exercise E20.6b), one must use the Axiom of Choice. See Durrett [3, p. 411].
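On a finite set the definition of $\sigma(\mathcal{C})$ in (1.1) can be made completely concrete, because an algebra on a finite set is automatically a σ-algebra. The following sketch (our own illustration; the function name `generated_sigma_algebra` and the example data are hypothetical) closes a class of subsets under complements and pairwise unions until nothing new appears.

```python
from itertools import combinations


def generated_sigma_algebra(space, generators):
    """Return sigma(generators) on a finite set `space` as a set of frozensets."""
    space = frozenset(space)
    family = {frozenset(), space} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        snapshot = list(family)
        # close under complements
        for a in snapshot:
            complement = space - a
            if complement not in family:
                family.add(complement)
                changed = True
        # close under pairwise (hence finite) unions
        for a, b in combinations(snapshot, 2):
            union = a | b
            if union not in family:
                family.add(union)
                changed = True
    return family


if __name__ == "__main__":
    sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
    for member in sorted(sigma, key=lambda x: (len(x), sorted(x))):
        print(sorted(member))
```

In this example the generated σ-algebra cannot separate the points 3 and 4, so its 'atoms' are $\{1\}$, $\{2\}$ and $\{3, 4\}$. No such brute-force description is available for $\mathcal{B}$, which is why the π-system technique developed next is so useful.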
Elements of $\mathcal{B}$ can be quite complicated, and it is not possible to write down the 'generic' element of $\mathcal{B}$ in practicable fashion. However, the collection

(1.4) $\pi(\mathbb{R}) := \{(-\infty, x] : x \in \mathbb{R}\}$

(not a standard notation, but a key example of a 'π-system') is very easy to understand, and it is often the case that all we need to know about $\mathcal{B}$ is the almost obvious result that

(1.5) $\mathcal{B} = \sigma(\pi(\mathbb{R})).$
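(1.6) π-systems and d-systems. Here we develop the point just made into a very useful technique.

Let $S$ be a set. A collection $\mathcal{J}$ of subsets of $S$ is called a π-system if $\mathcal{J}$ is stable under finite intersections: whenever $A, B \in \mathcal{J}$, we have $A \cap B \in \mathcal{J}$.

A collection $\mathcal{D}$ of subsets of $S$ is called a d-system (on $S$) if

$S \in \mathcal{D}$,
if $A, B \in \mathcal{D}$ and $A \subseteq B$ then $B \setminus A \in \mathcal{D}$,
if $A_n \in \mathcal{D}$ and $A_n \uparrow A$ then $A \in \mathcal{D}$.

Recall that $A_n \uparrow A$ means $A_n \subseteq A_{n+1}$ ($\forall n$) and $\bigcup A_n = A$.

(1.7) PROPOSITION. A collection $\Sigma$ of subsets of $S$ is a σ-algebra if and only if $\Sigma$ is both a π-system and a d-system.

Proof. The 'only if' part is trivial, so we prove only the 'if' part. Suppose that $\Sigma$ is both a π-system and a d-system, and that $E$, $F$ and $E_n$ ($n \in \mathbb{N}$) are in $\Sigma$. Then $E^c := S \setminus E \in \Sigma$, and $E \cup F = S \setminus (E^c \cap F^c) \in \Sigma$. Hence $G_n := E_1 \cup \cdots \cup E_n \in \Sigma$, and, since $G_n \uparrow \bigcup E_k$, we see that $\bigcup E_k \in \Sigma$. □

If $\mathcal{C}$ is a class of subsets of $S$, we define $d(\mathcal{C})$ to be the intersection of all d-systems that contain $\mathcal{C}$. Obviously, $d(\mathcal{C})$ is a d-system, the smallest d-system containing $\mathcal{C}$. It is also obvious that $d(\mathcal{C}) \subseteq \sigma(\mathcal{C})$.

(1.8) LEMMA (Dynkin). If $\mathcal{J}$ is a π-system, then $d(\mathcal{J}) = \sigma(\mathcal{J})$. Thus any d-system that contains a π-system contains the σ-algebra generated by that π-system.

Proof. Because of Proposition 1.7, we need only prove that $d(\mathcal{J})$ is a π-system.

Step 1: Let $\mathcal{D}_1 := \{B \in d(\mathcal{J}) : B \cap C \in d(\mathcal{J}),\ \forall C \in \mathcal{J}\}$. Because $\mathcal{J}$ is a π-system, $\mathcal{D}_1 \supseteq \mathcal{J}$. It is easily checked that $\mathcal{D}_1$ inherits the d-system structure from $d(\mathcal{J})$. [For, clearly, $S \in \mathcal{D}_1$. Next, if $B_1, B_2 \in \mathcal{D}_1$ and $B_1 \subseteq B_2$, then, for $C$ in $\mathcal{J}$,

$(B_2 \setminus B_1) \cap C = (B_2 \cap C) \setminus (B_1 \cap C);$

and, since $B_2 \cap C \in d(\mathcal{J})$, $B_1 \cap C \in d(\mathcal{J})$ and $d(\mathcal{J})$ is a d-system, we see that $(B_2 \setminus B_1) \cap C \in d(\mathcal{J})$, so that $B_2 \setminus B_1 \in \mathcal{D}_1$. Finally, if $B_n \in \mathcal{D}_1$ ($n \in \mathbb{N}$) and $B_n \uparrow B$ then,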
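for $C \in \mathcal{J}$, $(B_n \cap C) \uparrow (B \cap C)$, so that $B \cap C \in d(\mathcal{J})$ and $B \in \mathcal{D}_1$.] We have shown that $\mathcal{D}_1$ is a d-system containing $\mathcal{J}$, so that (since $\mathcal{D}_1 \subseteq d(\mathcal{J})$ by definition) $\mathcal{D}_1 = d(\mathcal{J})$.

Step 2: Let $\mathcal{D}_2 := \{A \in d(\mathcal{J}) : A \cap B \in d(\mathcal{J}),\ \forall B \in d(\mathcal{J})\}$. Step 1 showed that $\mathcal{D}_2$ contains $\mathcal{J}$. But, just as in Step 1, we can prove that $\mathcal{D}_2$ inherits the d-system structure from $d(\mathcal{J})$, and that therefore $\mathcal{D}_2 = d(\mathcal{J})$. But the fact that $\mathcal{D}_2 = d(\mathcal{J})$ says that $d(\mathcal{J})$ is a π-system. □

2. Measurable functions. This section is largely a matter of acquainting you with our notation.

(2.1) DEFINITION (Measurable functions, $m(\Sigma_1/\Sigma_2)$). Suppose that $(S_1, \Sigma_1)$ and $(S_2, \Sigma_2)$ are measurable spaces, and that $h$ is a map $h : S_1 \to S_2$. Then $h$ is called $\Sigma_1/\Sigma_2$-measurable (or just measurable when $\Sigma_1$ and $\Sigma_2$ are understood), and we write $h \in m(\Sigma_1/\Sigma_2)$, if $h^{-1} : \Sigma_2 \to \Sigma_1$; that is, if the inverse image

$h^{-1}(A) := \{s \in S_1 : h(s) \in A\}$

of every set $A \in \Sigma_2$ is in $\Sigma_1$.

This definition is exactly analogous to the definition of continuity.

(2.2) PROPOSITION. The map $h^{-1}$ preserves all set operations:

$h^{-1}\big(\bigcup_\alpha A_\alpha\big) = \bigcup_\alpha h^{-1}(A_\alpha), \qquad h^{-1}(A^c) = (h^{-1}(A))^c,$ etc.

Proof. This is just definition chasing. □

(2.3) PROPOSITION. If $\mathcal{C} \subseteq \Sigma_2$, $\sigma(\mathcal{C}) = \Sigma_2$ and $h^{-1} : \mathcal{C} \to \Sigma_1$, then $h \in m(\Sigma_1/\Sigma_2)$.

Proof. Let $\mathcal{E}$ be the class of elements $F$ in $\Sigma_2$ such that $h^{-1}(F) \in \Sigma_1$. By (2.2), $\mathcal{E}$ is a σ-algebra, and, by hypothesis, $\mathcal{E} \supseteq \mathcal{C}$. □

(2.4) PROPOSITION (Composition Lemma). If $(S_1, \Sigma_1)$, $(S_2, \Sigma_2)$ and $(S_3, \Sigma_3)$ are measurable spaces, and if $h_1$ is measurable from $(S_1, \Sigma_1)$ to $(S_2, \Sigma_2)$ and $h_2$ is measurable from $(S_2, \Sigma_2)$ to $(S_3, \Sigma_3)$, then $h_2 \circ h_1$ is measurable from $(S_1, \Sigma_1)$ to $(S_3, \Sigma_3)$.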
Proof. This is obvious. □

(2.5) $\mathbb{R}$-valued functions; $m\Sigma$; $(m\Sigma)^+$; $b\Sigma$. Let $(S, \Sigma)$ be a measurable space. A function $h : S \to \mathbb{R}$ is called $\Sigma$-measurable, and we write $h \in m\Sigma$, if $h^{-1} : \mathcal{B} \to \Sigma$, that is, if $h \in m(\Sigma/\mathcal{B})$. We write $(m\Sigma)^+$ for the class of non-negative elements in $m\Sigma$, and $b\Sigma$ for the class of bounded $\Sigma$-measurable functions on $S$.

Note. Because lim sups of sequences even of finite-valued functions may be infinite, and for other reasons, it is convenient to extend these definitions to functions $h$ taking values in $[-\infty, \infty]$ in the obvious way: $h$ is called $\Sigma$-measurable if $h^{-1} : \mathcal{B}[-\infty, \infty] \to \Sigma$. Which of the various results stated for real-valued functions extend to functions with values in $[-\infty, \infty]$, and what these extensions are, should be obvious.

(2.6) PROPOSITION. Our function $h : S \to \mathbb{R}$ is $\Sigma$-measurable if and only if

$\{h \le c\} := \{s \in S : h(s) \le c\} \in \Sigma \quad (\forall c \in \mathbb{R}).$

Proof. Take $\mathcal{C}$ to be the class $\pi(\mathbb{R})$ of intervals of the form $(-\infty, c]$, $c \in \mathbb{R}$, and apply 2.3. □

Note. Obviously, similar results apply in which $\{h \le c\}$ is replaced by $\{h > c\}$, $\{h \ge c\}$ etc.

(2.7) LEMMA. Sums and products of measurable $\mathbb{R}$-valued functions are measurable: in other words, $m\Sigma$ is an algebra over $\mathbb{R}$. Thus if $\lambda \in \mathbb{R}$ and $h, h_1, h_2 \in m\Sigma$, then

$h_1 + h_2 \in m\Sigma, \qquad h_1 h_2 \in m\Sigma, \qquad \lambda h \in m\Sigma.$

Example of proof. Let $c \in \mathbb{R}$. Then for $s \in S$ it is clear that $h_1(s) + h_2(s) > c$ if and only if for some rational $q$ we have $h_1(s) > q > c - h_2(s)$. In other words,

$\{h_1 + h_2 > c\} = \bigcup_{q \in \mathbb{Q}} \big(\{h_1 > q\} \cap \{h_2 > c - q\}\big),$

a countable union of elements of $\Sigma$. □

(2.8) LEMMA (measurability of infs, lim infs of functions). Let $(h_n : n \in \mathbb{N})$ be a sequence of elements of $m\Sigma$. Then

(i) $\inf h_n$, (ii) $\liminf h_n$, (iii) $\limsup h_n$

are $\Sigma$-measurable (into $([-\infty, \infty], \mathcal{B}[-\infty, \infty])$, but we shall still write $\inf h_n \in m\Sigma$
(for example)). Further,

(iv) $\{s : \lim h_n(s) \text{ exists in } \mathbb{R}\} \in \Sigma$.

Proof. (i) $\{\inf h_n \ge c\} = \bigcap_n \{h_n \ge c\}$.

(ii) Let $L_n(s) := \inf\{h_r(s) : r \ge n\}$. Then $L_n \in m\Sigma$, by part (i). But

$L(s) := \liminf h_n(s) = \uparrow\lim L_n(s) = \sup L_n(s),$

and $\{L \le c\} = \bigcap_n \{L_n \le c\} \in \Sigma$.

(iii) This part is now obvious.

(iv) This is also clear because the set on which $\lim h_n$ exists in $\mathbb{R}$ is

$\{\limsup h_n < \infty\} \cap \{\liminf h_n > -\infty\} \cap g^{-1}(\{0\}),$

where $g := \limsup h_n - \liminf h_n$. □

(2.9) σ-algebra generated by a collection of functions on S. This important idea is analogous to the weakest topology that makes every function in a given family continuous, etc.

Generally, if we have a collection $(Y_\gamma : \gamma \in C)$ of maps $Y_\gamma : \Omega \to \mathbb{R}$, then

$\mathcal{Y} := \sigma(Y_\gamma : \gamma \in C)$

is defined to be the smallest σ-algebra $\mathcal{Y}$ on $\Omega$ such that each map $Y_\gamma$ ($\gamma \in C$) is $\mathcal{Y}$-measurable. Clearly,

$\sigma(Y_\gamma : \gamma \in C) = \sigma\big(\{\omega \in \Omega : Y_\gamma(\omega) \in B\} : \gamma \in C,\ B \in \mathcal{B}\big).$

(2.10) Borel functions. A function $h$ from a topological space $S$ to $\mathbb{R}$ is called Borel if $h$ is $\mathcal{B}(S)$-measurable. The most important case is when $S$ itself is $\mathbb{R}$.

(2.11) PROPOSITION. If $S$ is topological and $h : S \to \mathbb{R}$ is continuous, then $h$ is Borel.

Proof. Take $\mathcal{C}$ to be the class of open subsets of $\mathbb{R}$, and apply Proposition 2.3. □

3. Monotone-Class Theorems. The following elementary Monotone-Class Theorem allows us to deduce results about general measurable functions from results about indicators of elements of π-systems.

(3.1) THEOREM. Let $\mathcal{H}$ be a class of bounded functions from a set $S$ into $\mathbb{R}$ satisfying the following conditions:
(i) $\mathcal{H}$ is a vector space over $\mathbb{R}$;
(ii) the constant function 1 is an element of $\mathcal{H}$;
(iii) if $(f_n)$ is a sequence of non-negative functions in $\mathcal{H}$ such that $f_n \uparrow f$, where $f$ is a bounded function on $S$, then $f \in \mathcal{H}$.
Suppose further that $\mathcal{H}$ contains the indicator function of every set in some π-system $\mathcal{J}$. Then $\mathcal{H}$ contains every bounded $\sigma(\mathcal{J})$-measurable function on $S$.

Sketch of proof. Let $\mathcal{D}$ be the class of subsets $D$ of $S$ such that $I_D \in \mathcal{H}$. Then the listed properties of $\mathcal{H}$ guarantee that $\mathcal{D}$ is a d-system. Since $\mathcal{D}$ contains $\mathcal{J}$ by hypothesis, Dynkin's Lemma shows that $\mathcal{D}$ contains $\sigma(\mathcal{J})$.

Suppose that $f$ is a $\sigma(\mathcal{J})$-measurable function such that for some $K$ in $\mathbb{N}$,

$0 \le f(s) \le K, \quad \forall s \in S.$

For $n \in \mathbb{N}$, define

$f_n(s) := \sum_{i=0}^{K 2^n} i 2^{-n} I_{D(n,i)}(s),$

where

$D(n,i) := \{s : i 2^{-n} \le f(s) < (i+1) 2^{-n}\}.$

Since $f$ is $\sigma(\mathcal{J})$-measurable, every $D(n,i) \in \sigma(\mathcal{J})$, so that $I_{D(n,i)} \in \mathcal{H}$. Since $\mathcal{H}$ is a vector space, every $f_n \in \mathcal{H}$. But $0 \le f_n \uparrow f$, so that $f \in \mathcal{H}$.

If $f \in b\sigma(\mathcal{J})$, we may write $f = f^+ - f^-$, where $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$. Then $f^+, f^- \in b\sigma(\mathcal{J})$ and $f^+, f^- \ge 0$, so that $f^+, f^- \in \mathcal{H}$ by what we established above. □

For certain applications, it is very useful to have more sophisticated forms of monotone-class theorem. Here is one.

(3.2) THEOREM. Let $\mathcal{H}$ be a vector space of bounded real-valued functions on a set $S$. Suppose that $\mathcal{H}$ contains constant functions, is closed under uniform convergence, and has the following property: for a uniformly bounded sequence $(f_n)$ of non-negative functions in $\mathcal{H}$ such that $f_n(s) \uparrow f(s)$ ($\forall s$), we must have $f \in \mathcal{H}$. If $\mathcal{H}$ contains a subset $\mathcal{C}$ that is closed under multiplication, then $\mathcal{H}$ contains every bounded $\sigma(\mathcal{C})$-measurable function from $S$ to $\mathbb{R}$.

You are invited to prove this in Section 13. The hypothesis that $\mathcal{H}$ is closed under uniform convergence (usually one that is easily verified) may be dropped. See Dellacherie and Meyer [1].

4. Measures; the uniqueness lemma; almost everywhere; a.e. ($\mu$, $\Sigma$). This is fairly familiar material, but watch the use of π-systems in Lemma 4.6.

(4.1) Set functions: additivity; σ-additivity; monotone convergence. Let $S$ be a set,
let $\Sigma_0$ be an algebra on $S$, and let $\mu_0 : \Sigma_0 \to [0, \infty]$ be a 'non-negative set function'. Then $\mu_0$ is called additive if $\mu_0(\emptyset) = 0$ and, for $F, G \in \Sigma_0$,

$F \cap G = \emptyset \Rightarrow \mu_0(F \cup G) = \mu_0(F) + \mu_0(G).$

The map $\mu_0$ is called countably additive (or σ-additive) if $\mu_0(\emptyset) = 0$ and if, whenever $(F_n : n \in \mathbb{N})$ is a sequence of disjoint sets in $\Sigma_0$ with union $F = \bigcup F_n$ in $\Sigma_0$ (note that this is an assumption since $\Sigma_0$ need not be a σ-algebra), then

$\mu_0(F) = \sum_n \mu_0(F_n).$

(4.2) LEMMA. Suppose that $\mu_0$ is additive on $(S, \Sigma_0)$, where $\Sigma_0$ is an algebra on $S$. Then $\mu_0$ is σ-additive on $\Sigma_0$ if and only if whenever $F_n \in \Sigma_0$ ($n \in \mathbb{N}$) and $F_n \uparrow F$, where $F \in \Sigma_0$, we have $\mu_0(F_n) \uparrow \mu_0(F)$.

Recall that $F_n \uparrow F$ means $F_n \subseteq F_{n+1}$ ($\forall n \in \mathbb{N}$), $\bigcup F_n = F$. Lemma 4.2 is the fundamental property of measure.

Proof of 'only if' part. Write $G_1 := F_1$, $G_n := F_n \setminus F_{n-1}$ ($n \ge 2$). Then the sets $G_n$ ($n \in \mathbb{N}$) are disjoint, and

$\mu_0(F_n) = \mu_0(G_1 \cup G_2 \cup \cdots \cup G_n) = \sum_{k \le n} \mu_0(G_k) \uparrow \sum_{k < \infty} \mu_0(G_k) = \mu_0(F).$

It is now obvious how to prove the 'if' part. □

(4.3) LEMMA. If $\mu_0(S) < \infty$ then $\mu_0$ is σ-additive on $\Sigma_0$ if and only if whenever $G_n \in \Sigma_0$ ($n \in \mathbb{N}$) and $G_n \downarrow \emptyset$, we have $\mu_0(G_n) \downarrow 0$.

You prove this!

(4.4) Measure space; finite and σ-finite measures. Let $(S, \Sigma)$ be a measurable space, so that $\Sigma$ is a σ-algebra on $S$. A map $\mu : \Sigma \to [0, \infty]$ is called a measure on $(S, \Sigma)$ if $\mu$ is countably additive. The triple $(S, \Sigma, \mu)$ is then called a measure space.

Now let $(S, \Sigma, \mu)$ be a measure space. Then $\mu$ (or indeed the measure space $(S, \Sigma, \mu)$) is called
finite if $\mu(S) < \infty$,
σ-finite if there is a sequence $(S_n : n \in \mathbb{N})$ of elements of $\Sigma$ such that $\mu(S_n) < \infty$ ($\forall n \in \mathbb{N}$) and $\bigcup S_n = S$.
Intuition is usually good for finite measures, and adapts well for σ-finite measures. However, measures which are not σ-finite can be rather crazy.

An element $F$ of $\Sigma$ is called $\mu$-null if $\mu(F) = 0$. It is easily shown by the use of Lemma 4.2 that a countable union of $\mu$-null sets is $\mu$-null. A statement $\mathcal{S}$ about points $s$ of $S$ is said to hold $\mu$-almost everywhere (a.e. ($\mu$)) if

$F := \{s : \mathcal{S}(s) \text{ is false}\} \in \Sigma$ and $\mu(F) = 0$.

If we wish to emphasise that $F \in \Sigma$, we say that $\mathcal{S}$ holds a.e. ($\mu$, $\Sigma$).

(4.5) A fundamental uniqueness lemma. The point here is that σ-algebras are 'difficult', but π-systems are 'easy': one can often write down in closed form the general element of a π-system, while the general element even of $\mathcal{B}$ is impossibly complicated.

(4.6) LEMMA. Let $\mathcal{J}$ be a π-system on a set $S$, and let $\Sigma := \sigma(\mathcal{J})$. Suppose that $\mu_1$ and $\mu_2$ are measures on $(S, \Sigma)$ such that $\mu_1(S) = \mu_2(S) < \infty$ and $\mu_1 = \mu_2$ on $\mathcal{J}$. Then $\mu_1 = \mu_2$ on $\Sigma$.

Proof. The class of elements $F$ of $\Sigma$ for which $\mu_1(F) = \mu_2(F)$ is a d-system containing $\mathcal{J}$. Dynkin's Lemma 1.8 now gives the result. □

(4.7) COROLLARY (The Uniqueness Lemma). If two probability measures agree on a π-system, then they agree on the σ-algebra generated by that π-system.

This result will play an extremely important role.

5. Caratheodory's Extension Theorem. The following result underpins the existence of every non-trivial probabilistic model. We shall see its use in the celebrated Daniell-Kolmogorov Theorem on the existence of stochastic processes.

(5.1) THEOREM (Caratheodory). Let $S$ be a set, let $\Sigma_0$ be an algebra on $S$, and let $\Sigma := \sigma(\Sigma_0)$. If $\mu_0$ is a countably additive map $\mu_0 : \Sigma_0 \to [0, \infty]$, then there exists a measure $\mu$ on $(S, \Sigma)$ such that $\mu = \mu_0$ on $\Sigma_0$. If $\mu_0(S) < \infty$ then, by Lemma 4.6, this extension is unique: an algebra is a π-system!

For a proof see for example [W; A1.5-1.8].
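The role of the π-system hypothesis in Lemma 4.6 (and hence in the uniqueness part of Theorem 5.1) can be seen from a small finite example. The sketch below is our own illustration with hypothetical data: on a four-point set it exhibits two probability measures that agree on a generating class which is not a π-system, and yet differ on the σ-algebra it generates.

```python
from fractions import Fraction


def is_pi_system(family):
    """True if the family of frozensets is stable under pairwise intersection."""
    return all((a & b) in family for a in family for b in family)


def measure(weights, subset):
    """Measure of `subset` under point masses `weights` (a dict point -> mass)."""
    return sum((weights[s] for s in subset), Fraction(0))


# Hypothetical example on S = {1, 2, 3, 4}: two probability measures.
MU1 = {s: Fraction(1, 4) for s in (1, 2, 3, 4)}
MU2 = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2), 4: Fraction(0)}

# This class generates all subsets of S, but it is not a pi-system,
# because {1, 2} & {2, 3} = {2} is missing.
FAMILY = {frozenset({1, 2}), frozenset({2, 3})}

if __name__ == "__main__":
    print("pi-system?", is_pi_system(FAMILY))
    for a in FAMILY:
        print(sorted(a), measure(MU1, a), measure(MU2, a))
    print("on {2}:", measure(MU1, {2}), measure(MU2, {2}))
```

The two measures give mass 1/2 to each generating set but disagree on the intersection $\{2\}$. Once $\{2\}$ is added to the class it becomes a π-system, and agreement on the class is then no longer possible for these two measures, as Lemma 4.6 requires.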
(5.2) Lebesgue measure Leb on $((0,1], \mathcal{B}(0,1])$. Let $S = (0,1]$. For $F \subseteq S$, say that $F \in \Sigma_0$ if $F$ may be written as a finite union

(5.3) $F = (a_1, b_1] \cup \cdots \cup (a_r, b_r],$

where $r \in \mathbb{N}$, $0 \le a_1 \le b_1 \le \cdots \le a_r \le b_r \le 1$. Then $\Sigma_0$ is an algebra on $(0,1]$ and

$\Sigma := \sigma(\Sigma_0) = \mathcal{B}(0,1].$

(We write $\mathcal{B}(0,1]$ instead of $\mathcal{B}((0,1])$.) For $F$ as in (5.3), let

$\mu_0(F) = \sum_{k \le r} (b_k - a_k).$

Then $\mu_0$ is well-defined and additive on $\Sigma_0$ (this is easy). Moreover (see [W, A1.9]), $\mu_0$ is countably additive on $\Sigma_0$. (To prove this is not trivial. Our proof of the Daniell-Kolmogorov Theorem will remind you how it is done.) Hence, by Theorem 5.1, there exists a unique measure $\mu$ on $((0,1], \mathcal{B}(0,1])$ extending $\mu_0$ on $\Sigma_0$. This measure $\mu$ is called the Lebesgue measure on $((0,1], \mathcal{B}(0,1])$ or (loosely) the Lebesgue measure on $(0,1]$. We shall often denote $\mu$ by Leb. The Lebesgue measure (still denoted by Leb) on $([0,1], \mathcal{B}[0,1])$ is of course obtained by a trivial modification, the set $\{0\}$ having Lebesgue measure 0.

In a similar way, we can construct the (σ-finite) Lebesgue measure (which we also denote by Leb) on $\mathbb{R}$ (more strictly, on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$).

6. Inner and outer μ-measures; completion. Results in this section should be proved as an (easy) exercise. They are, however, important. Let $(S, \Sigma, \mu)$ be a measure space. For $G \subseteq S$, define the inner $\mu$-measure $\mu_*(G)$ of $G$ via

$\mu_*(G) := \sup\{\mu(F) : F \in \Sigma;\ F \subseteq G\},$

and the outer $\mu$-measure $\mu^*(G)$ of $G$ via

$\mu^*(G) := \inf\{\mu(H) : H \in \Sigma;\ H \supseteq G\}.$

The function $\mu^*$ on $\mathcal{P}(S)$ is sub-σ-additive (or countably sub-additive) in that

$\mu^*\big(\bigcup_n G_n\big) \le \sum_n \mu^*(G_n)$

for any sequence $(G_n : n \in \mathbb{N})$. If $\mu_*(G) = \mu^*(G)$, we say that $G$ is $\mu$-measurable, and write $G \in \Sigma^\mu$. Then $\Sigma^\mu$ is a σ-algebra, and we can extend $\mu$ to a measure, still denoted by $\mu$, on $\Sigma^\mu$ by writing

$\mu(G) := \mu_*(G) = \mu^*(G)$

for $G$ in $\Sigma^\mu$. The triple $(S, \Sigma^\mu, \mu)$ is called the completion of $(S, \Sigma, \mu)$. The σ-algebra $\Sigma^\mu$ is the smallest σ-algebra that extends $\Sigma$ and contains every set of outer $\mu$-measure 0.

We end this section with a lemma that is very significant for probability theory. Its proof is an easy exercise.
(6.1) LEMMA. Suppose that $\mu(S) = 1$ and that $G$ is a subset of $S$ with $\mu^*(G) = 1$. Then for $F \in \Sigma$, $\mu^*(G \cap F) = \mu(F)$. Moreover, $(G, \mathcal{G}, \mu^*)$ is a measure space, where $\mathcal{G}$ is the class of subsets of $G$ of the form $G \cap F$, where $F \in \Sigma$.

Integration

7. Definition of the integral $\int f\, d\mu$. Let $(S, \Sigma, \mu)$ be a measure space.

(7.1) Notation etc: $\mu(f) :=: \int f\, d\mu$; $\mu(f; A)$. We are interested in defining for suitable elements $f$ in $m\Sigma$ the integral of $f$ with respect to $\mu$, for which we shall use the alternative notations

$\mu(f) :=: \int_S f(s)\, \mu(ds) :=: \int f\, d\mu.$

It is worth mentioning now that we shall also use the following equivalent notations for $A \in \Sigma$:

$\int_A f(s)\, \mu(ds) :=: \int_A f\, d\mu :=: \mu(f; A) := \mu(f I_A)$

(with a true definition on the extreme right!) It should be clear that, for example,

$\mu(f; f > x) := \mu(f; A),$ where $A = \{s \in S : f(s) > x\}$.

(7.2) Integrals of non-negative simple functions; $SF^+$. If $A$ is an element of $\Sigma$, we define

$\mu_0(I_A) := \mu(A) \le \infty.$

The use of $\mu_0$ rather than $\mu$ signifies that we currently have only a naive integral defined for simple functions.

An element $f$ of $(m\Sigma)^+$ is called simple, and we shall then write $f \in SF^+$, if $f$ may be written as a finite sum

(7.3) $f = \sum_k a_k I_{A_k},$

where $a_k \in [0, \infty]$ and $A_k \in \Sigma$. We then define

(7.4) $\mu_0(f) = \sum_k a_k \mu(A_k) \le \infty$ (with $0 \cdot \infty := 0 =: \infty \cdot 0$).

Of course, it needs to be checked that $\mu_0(f)$ is well defined, since $f$ will have many different representations of the form (7.3), and we must ensure that they yield the same value of $\mu_0(f)$ in (7.4).

(7.5) Integrals of non-negative functions. For $f \in (m\Sigma)^+$, we define

$\mu(f) := \sup\{\mu_0(h) : h \in SF^+,\ h \le f\} \le \infty.$
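The well-definedness issue raised after (7.4) is easy to watch in action on a finite set. The following sketch is only an illustration: the three-point space, the point masses and the helper names `naive_integral` and `evaluate` are our own hypothetical choices, and only finite coefficients $a_k$ are handled. It computes $\mu_0(f)$ from two different representations of the same simple function.

```python
from fractions import Fraction


def naive_integral(representation, weights):
    """mu_0(f) = sum_k a_k * mu(A_k) for f = sum_k a_k * I_{A_k}.

    `representation` is a list of (a_k, A_k) pairs with finite a_k;
    `weights` gives the mass of each point of a finite set S
    (a hypothetical discrete measure).
    """
    total = Fraction(0)
    for coefficient, subset in representation:
        mass = sum((weights[s] for s in subset), Fraction(0))
        total += Fraction(coefficient) * mass
    return total


def evaluate(representation, point):
    """Value at `point` of the simple function given by `representation`."""
    return sum((Fraction(a) for a, subset in representation if point in subset),
               Fraction(0))


WEIGHTS = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
REP_OVERLAPPING = [(2, {0, 1}), (3, {1, 2})]      # sets overlap at the point 1
REP_DISJOINT = [(2, {0}), (5, {1}), (3, {2})]     # same function, disjoint sets

if __name__ == "__main__":
    print(naive_integral(REP_OVERLAPPING, WEIGHTS))
    print(naive_integral(REP_DISJOINT, WEIGHTS))
```

Both representations describe the same function on $S$, and the two values of $\mu_0$ coincide, as the additivity of $\mu$ guarantees in general.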
Clearly, for $f \in SF^+$, we have $\mu(f) = \mu_0(f)$.

(7.6) DEFINITION ($\mu$-integrable functions; $\mathcal{L}^1(S, \Sigma, \mu)$). For $f \in m\Sigma$, we write $f = f^+ - f^-$, where

$f^+(s) := \max(f(s), 0), \qquad f^-(s) := \max(-f(s), 0).$

Then $f^+, f^- \in (m\Sigma)^+$ and $|f| = f^+ + f^-$.

For $f \in m\Sigma$, we say that $f$ is $\mu$-integrable, and write $f \in \mathcal{L}^1(S, \Sigma, \mu)$, if

$\mu(|f|) = \mu(f^+) + \mu(f^-) < \infty,$

and then we define

$\int f\, d\mu := \mu(f) := \mu(f^+) - \mu(f^-).$

Note that, for $f \in \mathcal{L}^1(S, \Sigma, \mu)$,

$|\mu(f)| \le \mu(|f|),$

the familiar rule that the modulus of the integral is less than or equal to the integral of the modulus.

(7.7) LEMMA (Linearity). For $\alpha, \beta \in \mathbb{R}$ and $f, g \in \mathcal{L}^1(S, \Sigma, \mu)$,

$\alpha f + \beta g \in \mathcal{L}^1(S, \Sigma, \mu)$ and $\mu(\alpha f + \beta g) = \alpha \mu(f) + \beta \mu(g)$.

(7.8) Note. There is a slight problem here. For some $s$, the expression $(\alpha f + \beta g)(s)$ may lead to the undefined $\infty - \infty$. Please defer worrying about this until Section 10.

8. Convergence theorems. We recall the standard results.

(8.1) THEOREM (The Monotone-Convergence Theorem). If $(f_n)$ is a sequence of elements of $(m\Sigma)^+$ such that $f_n \uparrow f$ then

$\mu(f_n) \uparrow \mu(f) \le \infty,$

or, in other notation,

$\int_S f_n(s)\, \mu(ds) \uparrow \int_S f(s)\, \mu(ds).$

[W, A5] contains a proof. This theorem is really all there is to integration
theory. We shall see that other key results such as the Fatou Lemma and the Dominated-Convergence Theorem follow trivially from it.

(8.2) LEMMA (The Fatou Lemma for functions). For a sequence $(f_n)$ in $(m\Sigma)^+$,

$\mu(\liminf f_n) \le \liminf \mu(f_n).$

Proof. We have

(8.3) $\liminf_n f_n = \uparrow\lim_k g_k,$ where $g_k := \inf_{n \ge k} f_n$.

For $n \ge k$, we have $f_n \ge g_k$, so that $\mu(f_n) \ge \mu(g_k)$, whence

$\mu(g_k) \le \inf_{n \ge k} \mu(f_n);$

and, on combining this with an application of Theorem 8.1 to (8.3), we obtain

$\mu\big(\liminf_n f_n\big) = \uparrow\lim_k \mu(g_k) \le \uparrow\lim_k \inf_{n \ge k} \mu(f_n) =: \liminf_n \mu(f_n).$ □

(8.4) LEMMA ('Reverse Fatou' Lemma). If $(f_n)$ is a sequence in $(m\Sigma)^+$ such that for some $g$ in $(m\Sigma)^+$, we have $f_n \le g$, $\forall n$, and $\mu(g) < \infty$, then

$\mu(\limsup f_n) \ge \limsup \mu(f_n).$

Proof. Apply Lemma 8.2 to the sequence $(g - f_n)$. □

(8.5) THEOREM (The Dominated-Convergence Theorem). Suppose that $f_n, f \in m\Sigma$, that $f_n(s) \to f(s)$ for $\mu$-almost every $s$ in $S$, and that the sequence $(f_n)$ is dominated by an element $g$ of $\mathcal{L}^1(S, \Sigma, \mu)^+$:

$|f_n(s)| \le g(s), \quad \forall s \in S,\ \forall n \in \mathbb{N},$

where $\mu(g) < \infty$. Then $f_n \to f$ in $\mathcal{L}^1(S, \Sigma, \mu)$; that is, $\mu(|f_n - f|) \to 0$, whence

$\mu(f_n) \to \mu(f).$

Note. This theorem is central to many applications of measure theory. For us, it will be superseded by the uniform-integrability result: Theorem 21.2.

Proof. We have $|f_n - f| \le 2g$, where $\mu(2g) < \infty$, so, by the reverse Fatou Lemma 8.4,

$\limsup \mu(|f_n - f|) \le \mu(\limsup |f_n - f|) = \mu(0) = 0.$
Since

$|\mu(f_n) - \mu(f)| = |\mu(f_n - f)| \le \mu(|f_n - f|),$

the theorem is proved. □

Here is a useful result.

(8.6) LEMMA (Scheffe's Lemma). Suppose that $f_n, f \in \mathcal{L}^1(S, \Sigma, \mu)$ and that $f_n \to f$ (a.e.($\mu$)). Then

$\mu(|f_n - f|) \to 0$ if and only if $\mu(|f_n|) \to \mu(|f|)$.

Exercise. Prove Scheffe's Lemma. Consider first the case in which $f_n$ and $f$ are non-negative. Note that then $(f_n - f)^- \le f$.

(8.7) The standard machine. What we call the standard machine is a much cruder alternative to the Monotone-Class Theorem. The idea is that to prove that a 'linear' result is true for all functions $h$ in a space such as $\mathcal{L}^1(S, \Sigma, \mu)$,
(i) we first show the result is true for the case when $h$ is an indicator function, which it normally is by definition;
(ii) we then use linearity to obtain the result for $h$ in $SF^+$;
(iii) we then use the Monotone-Convergence Theorem to obtain the result for $h \in (m\Sigma)^+$, integrability conditions on $h$ usually being superfluous at this stage;
(iv) finally, we show, by writing $h = h^+ - h^-$ and using linearity, that the claimed result is true.

When it works, it is easier to 'watch the standard machine work' than to appeal to the monotone-class result, though there are times when the greater subtlety of the Monotone-Class Theorem is essential.

9. The Radon-Nikodym Theorem; absolute continuity; $\lambda \ll \mu$ notation; equivalent measures. Let $(S, \Sigma, \mu)$ be a measure space. If $f \in (m\Sigma)^+$, then, by linearity and the Monotone-Convergence Theorem,

(9.1) $(f\mu)(F) := \mu(f; F) := \mu(f I_F) \quad (F \in \Sigma)$

defines a measure $f\mu$ on $(S, \Sigma)$. Note that

(9.2) $\mu(F) = 0$ implies that $(f\mu)(F) = 0$.

The Radon-Nikodym Theorem is a very important converse for σ-finite measures.

(9.3) THEOREM (The Radon-Nikodym Theorem) and DEFINITION
(absolute continuity, $\ll$, $d\lambda/d\mu$). Let $(S, \Sigma)$ be a measurable space, and let $\mu$ and $\lambda$ be σ-finite measures on $(S, \Sigma)$. Then the following statements are equivalent:
(i) for $F \in \Sigma$, $\mu(F) = 0$ implies that $\lambda(F) = 0$;
(ii) $\lambda = f\mu$ for some $f$ in $(m\Sigma)^+$.
Suppose that statements (i) and (ii) hold. We then say that $\lambda$ is absolutely continuous relative to $\mu$, and write $\lambda \ll \mu$. The function $f$ is defined uniquely modulo $\mu$-null sets: we say that $f$ is a version of the (Radon-Nikodym) density of $\lambda$ relative to $\mu$, and write

$f = \frac{d\lambda}{d\mu}$ a.e.($\mu$).

You can see the relevance of σ-finiteness if you consider $\mu$ to be a measure that counts the number of elements in a set.

For the classical proof see Halmos [1]. Meyer [2] and [W, Section 14.13] are among the books that give a martingale proof.

(9.4) LEMMA. Suppose that $\lambda$ and $\mu$ are finite measures on a measurable space $(S, \Sigma)$. Then $\lambda \ll \mu$ if and only if for $\varepsilon > 0$ we can find a $\delta > 0$ such that $F \in \Sigma$ and $\mu(F) < \delta$ imply that $\lambda(F) < \varepsilon$.

(9.5) LEMMA and DEFINITION (equivalent measures). Again suppose that $\lambda$ and $\mu$ are finite measures on a measurable space $(S, \Sigma)$. Suppose further that $\lambda \ll \mu$ and $\mu \ll \lambda$. We then say that $\lambda$ and $\mu$ are equivalent. Note that a.e.($\mu$) and a.e.($\lambda$) now mean the same thing: we write a.e. If $f$ is a version of $d\lambda/d\mu$ and $g$ is a version of $d\mu/d\lambda$ then $0 < f < \infty$ a.e. and $g = 1/f$ a.e.

You are invited to prove these lemmas in Section 13.

10. Inequalities; $\mathcal{L}^p$ and $L^p$ spaces ($p \ge 1$). Continue to let $(S, \Sigma, \mu)$ be a measure space, and let $p \in [1, \infty)$. For $f \in m\Sigma$, write $f \in \mathcal{L}^p := \mathcal{L}^p(S, \Sigma, \mu)$ if

$\|f\|_p := \{\mu(|f|^p)\}^{1/p} < \infty.$

(10.1) LEMMA (Minkowski's Inequality). We have

$\|f + g\|_p \le \|f\|_p + \|g\|_p.$

(10.2) LEMMA (Holder's Inequality). If $p > 1$ and $q > 1$ satisfy $p^{-1} + q^{-1} = 1$ then, for $f, g \in m\Sigma$,

$|\mu(fg)| \le \mu(|fg|) \le \|f\|_p \|g\|_q.$

The Schwarz inequality is the case when $p = q = 2$.
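A quick numerical sanity check of (10.1) and (10.2) is possible when $S$ is finite and $\mu$ is given by point masses, so that every integral is a finite sum. The sketch below is an illustration only; the three-point space, the masses and the functions `F` and `G` are hypothetical choices of ours.

```python
def lp_norm(f, weights, p):
    """||f||_p = (sum_s |f(s)|^p * mu({s}))^(1/p) on a finite set."""
    return sum(abs(f[s]) ** p * weights[s] for s in weights) ** (1.0 / p)


def integral(f, weights):
    """mu(f) = sum_s f(s) * mu({s})."""
    return sum(f[s] * weights[s] for s in weights)


# Hypothetical finite measure space and two functions on it.
WEIGHTS = {"a": 0.5, "b": 1.5, "c": 2.0}
F = {"a": 1.0, "b": -2.0, "c": 0.5}
G = {"a": -3.0, "b": 1.0, "c": 4.0}

if __name__ == "__main__":
    p, q = 3.0, 1.5  # conjugate exponents: 1/3 + 2/3 = 1
    fg = {s: F[s] * G[s] for s in WEIGHTS}
    abs_fg = {s: abs(v) for s, v in fg.items()}
    f_plus_g = {s: F[s] + G[s] for s in WEIGHTS}
    print("Holder:   ", abs(integral(fg, WEIGHTS)), "<=",
          integral(abs_fg, WEIGHTS), "<=",
          lp_norm(F, WEIGHTS, p) * lp_norm(G, WEIGHTS, q))
    print("Minkowski:", lp_norm(f_plus_g, WEIGHTS, p), "<=",
          lp_norm(F, WEIGHTS, p) + lp_norm(G, WEIGHTS, p))
```

Of course a check of this kind proves nothing; it merely shows the quantities appearing in the two inequalities in a case where they can be computed by hand.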
The best way to view these classical inequalities is as consequences of Jensen's inequality (18.3). See [W, 6.13].

An immediate consequence of Jensen's inequality ([W, 6.7]) is the following result.

(10.3) If $\mu$ is a finite measure and $1 \le p \le r$ then, for $f \in m\Sigma$,

$\|f\|_p \le \mu(S)^c \|f\|_r,$ where $c := p^{-1} - r^{-1}$.

We now need some standard results from functional analysis. See, for example, Dunford and Schwartz [1] or Halmos [1].

Define an equivalence relation on $\mathcal{L}^p$ as follows:

$f \equiv g$ if and only if $\|f - g\|_p = 0$,

equivalently,

$f \equiv g$ if and only if $f = g$, a.e.($\mu$).

Let $[f]$ be the equivalence class in $\mathcal{L}^p$ containing $f$, and let $L^p$ be the set of equivalence classes. Since, for $f \in \mathcal{L}^p$, $\mu(|f| = \infty) = 0$, every equivalence class $[f]$ contains a representative $f^*$ taking only finite values. With obvious notation, we can define

$\alpha[f] + \beta[g] := [\alpha f^* + \beta g^*], \qquad \|[f]\|_p := \|f\|_p.$

The use of equivalence classes avoids both the $\infty - \infty$ problem mentioned in (7.8) and the associated lack of associativity. The set $L^p$ now becomes a normed vector space. But more is true:

(10.4) $L^p$ is a Banach space; in particular, $L^p$ is a complete metric space under the distance $d([f], [g]) := \|[f - g]\|_p$.

The following characterisation of the dual of $L^p$ is important.

(10.5) LEMMA. For $p > 1$, the dual space $(L^p)^*$ of $L^p$ is the space $L^q$ where $p^{-1} + q^{-1} = 1$; if $\Lambda$ is a bounded linear functional on $L^p$, then there exists $g \in L^q$ such that

$\Lambda(f) = \mu(fg), \quad \forall f \in L^p.$

What happens when $p = 1$ and $q = \infty$? We say that $f \in \mathcal{L}^\infty$ if the $\mu$-essential supremum norm of $f$ is finite:

$\|f\|_\infty := \mu\text{-ess sup}(f) := \sup\{x > 0 : \mu(|f| > x) > 0\} := \inf\{x \ge 0 : \mu(|f| \ge x) = 0\} < \infty.$

Build $L^\infty$ in the obvious way. Then

(10.6) $(L^1)^* = L^\infty$;

but, except in trivial cases, $(L^\infty)^*$ will be much bigger than $L^1$.
A key application is to combine these results with the Hahn-Banach Theorem as follows.

(10.7) LEMMA. Let $(S, \Sigma, \mu)$ be a measure space, let $p \in [1, \infty)$, and let $V$ be a vector subspace of $L^p$. Define $q \in (1, \infty]$ by $p^{-1} + q^{-1} = 1$. Suppose that, for $g \in L^q$,

$(\mu(fg) = 0,\ \forall f \in V) \Rightarrow (g = 0).$

Then $V$ is dense in $L^p$.

Here is a way of putting this into practice.

(10.8) LEMMA. Let $(S, \Sigma, \mu)$ be a finite measure space, and let $\mathcal{J}$ be a π-system on $S$ such that $\sigma(\mathcal{J}) = \Sigma$. Let $p \in [1, \infty)$. Let $V$ be the vector subspace of $L^p$ spanned by the indicator functions of elements of $\mathcal{J}$. Then $V$ is dense in $L^p$.

You are invited to prove this in Section 13.

(10.9) Important discussion; $\mathcal{L}^p$ versus $L^p$. If we ask the problem 'Does a certain real-valued stochastic process $\{X_t : t \ge 0\}$ have continuous paths?', we are asking about the map $t \mapsto X(t, \omega)$. The question forces us to regard random variables as true functions, not as equivalence classes. All interesting problems in continuous time are invisible to the 'elegant' equivalence-class approach of functional analysis. So, we must use $\mathcal{L}^p$ rather than $L^p$. The $\infty - \infty$ problem will not worry us, because whenever we need to subtract random variables, they will have all values finite.

Product structures

11. Product σ-algebras. Product structures are especially important in probability theory because of their close connection with the concept of independence.

(11.1) Finite-product σ-algebras. This is an area in which we really need the Monotone-Class Theorems; the standard machine is not good enough.

Let $(S_1, \Sigma_1)$ and $(S_2, \Sigma_2)$ be measurable spaces. Let $S$ denote the Cartesian product $S := S_1 \times S_2$. For $i = 1, 2$, let $\rho_i$ denote the $i$th coordinate map, so that

$\rho_1(s_1, s_2) := s_1, \qquad \rho_2(s_1, s_2) := s_2.$

The fundamental definition of $\Sigma = \Sigma_1 \times \Sigma_2$ is as the σ-algebra

(11.2) $\Sigma = \sigma(\rho_1, \rho_2).$

Thus $\Sigma$ is generated by sets of the form

$\rho_1^{-1}(B_1) = B_1 \times S_2 \quad (B_1 \in \Sigma_1)$
together with sets of the form $\rho_2^{-1}(B_2) = S_1 \times B_2$ $(B_2 \in \Sigma_2)$. Generally, a product σ-algebra is generated by Cartesian products in which one factor is allowed to vary over the σ-algebra corresponding to that factor, and all other factors are whole spaces. In the case of our product of two factors, we have

(11.3) $(B_1 \times S_2) \cap (S_1 \times B_2) = B_1 \times B_2$,

and you can easily check that $\mathcal{I} = \{B_1 \times B_2 : B_i \in \Sigma_i\}$ is a π-system generating $\Sigma = \Sigma_1 \times \Sigma_2$. A similar remark would apply for a countable product $\prod \Sigma_n$, but you can see that, since we may only take countable intersections in analogues of (11.3), products of uncountable families of σ-algebras cause problems. The fundamental definition analogous to (11.2) still works.

(11.4) LEMMA. Let $\mathcal{H}$ denote the class of functions $f : S \to \mathbb{R}$ that are in $b\Sigma$ and that are such that for each $s_1$ in $S_1$, the map $s_2 \mapsto f(s_1,s_2)$ is $\Sigma_2$-measurable on $S_2$, for each $s_2$ in $S_2$, the map $s_1 \mapsto f(s_1,s_2)$ is $\Sigma_1$-measurable on $S_1$. Then $\mathcal{H} = b\Sigma$.

Proof. It is clear that if $A \in \mathcal{I}$ then $I_A \in \mathcal{H}$. Verification that $\mathcal{H}$ satisfies the hypotheses of the Monotone-Class Theorem 3.1 is straightforward. Since $\Sigma = \sigma(\mathcal{I})$, the result follows. □

Extension of the above concepts to general finite products is obvious, as are canonical identifications: $(S_1 \times S_2, \Sigma_1 \times \Sigma_2) \times (S_3,\Sigma_3) = (S_1,\Sigma_1) \times (S_2 \times S_3, \Sigma_2 \times \Sigma_3) = (S_1 \times S_2 \times S_3, \Sigma_1 \times \Sigma_2 \times \Sigma_3)$.

12. Product measure; Fubini's Theorem. We continue with the notation of the preceding section. We suppose that for $i = 1,2$, $\mu_i$ is a finite measure on $(S_i,\Sigma_i)$. We know from the preceding section that, for $f \in b\Sigma$, we may define the integrals $I_1^f(s_1) := \int_{S_2} f(s_1,s_2)\,\mu_2(ds_2)$, $I_2^f(s_2) := \int_{S_1} f(s_1,s_2)\,\mu_1(ds_1)$.

(12.1) LEMMA. Let $\mathcal{H}$ be the class of elements in $b\Sigma$ such that the following
property holds: $I_1^f(\cdot) \in b\Sigma_1$ and $I_2^f(\cdot) \in b\Sigma_2$ and $\int_{S_1} I_1^f(s_1)\,\mu_1(ds_1) = \int_{S_2} I_2^f(s_2)\,\mu_2(ds_2)$. Then $\mathcal{H} = b\Sigma$.

Proof. If $A \in \mathcal{I}$ then, trivially, $I_A \in \mathcal{H}$. Verification of the conditions of the Monotone-Class Theorem 3.1 is straightforward. □

For $F \in \Sigma$ with indicator function $f := I_F$, we now define $\mu(F) := \int_{S_1} I_1^f(s_1)\,\mu_1(ds_1) = \int_{S_2} I_2^f(s_2)\,\mu_2(ds_2)$.

(12.2) THEOREM (Fubini's Theorem; Product measure). Recall that, for $i = 1,2$, $\mu_i$ is a finite measure on $(S_i,\Sigma_i)$. The set function $\mu$ is a measure on $(S,\Sigma)$ called the product measure of $\mu_1$ and $\mu_2$, and we write $\mu = \mu_1 \times \mu_2$ and $(S,\Sigma,\mu) = (S_1,\Sigma_1,\mu_1) \times (S_2,\Sigma_2,\mu_2)$. Moreover, $\mu$ is the unique measure on $(S,\Sigma)$ for which

(12.3) $\mu(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2)$, $A_i \in \Sigma_i$.

If $f \in (m\Sigma)^+$, then, with the obvious definitions of $I_1^f$ and $I_2^f$, we have

(12.4) $\mu(f) = \int_{S_1} I_1^f(s_1)\,\mu_1(ds_1) = \int_{S_2} I_2^f(s_2)\,\mu_2(ds_2)$,

in $[0,\infty]$. If $f \in m\Sigma$ and $\mu(|f|) < \infty$, then (12.4) is valid (with all terms in $\mathbb{R}$).

Proof. The fact that $\mu$ is a measure is a consequence of linearity and the Monotone-Convergence Theorem. The fact that $\mu$ is then uniquely specified by (12.3) is obvious from the Uniqueness Lemma 4.7 and the fact that $\sigma(\mathcal{I}) = \Sigma$. The result (12.4) is automatic for $f = I_A$, where $A \in \mathcal{I}$. The Monotone-Class Theorem 3.1 shows that it is therefore valid for $f \in b\Sigma$, and in particular for $f$ in the $SF^+$ space for $(S,\Sigma,\mu)$. The Monotone-Convergence Theorem then shows that it is valid for $f \in (m\Sigma)^+$; and linearity shows that (12.4) is valid if $\mu(|f|) < \infty$.

(12.5) (Extension). All of Fubini's Theorem will work if the $(S_i,\Sigma_i,\mu_i)$ are σ-finite measure spaces. We can prove this by breaking up σ-finite spaces into countable unions of disjoint finite blocks.

(12.6) Lebesgue measure on $\mathbb{R}^n$. We have $\mathcal{B}(\mathbb{R}^n) = \mathcal{B}^n := \mathcal{B}(\mathbb{R})^n$. For further study of such matters, see the exercises in the next section. We define the Lebesgue measure on $\mathbb{R}^n$ as $\mathrm{Leb}^n$, but often denote this by Leb when $n$ is understood.
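For readers who like to see (12.3) and (12.4) in the simplest possible setting, here is a minimal sketch on two finite sets carrying point masses, where every integral is a finite sum. The weights in `mu1` and `mu2` and the test function `f` are arbitrary illustrative choices, not taken from the text; exact rational arithmetic is used so that the equalities can be checked without rounding.

```python
from fractions import Fraction
from itertools import product

# Two finite measure spaces given by point masses (illustrative values only).
mu1 = {"a": Fraction(1, 2), "b": Fraction(3, 2)}
mu2 = {0: Fraction(1, 3), 1: Fraction(2, 3), 2: Fraction(1)}


def f(s1, s2):
    """A bounded test function on S1 x S2 (arbitrary choice)."""
    return (1 if s1 == "a" else 2) * (s2 + 1) ** 2


def I1(s1):
    """I_1^f(s1): integrate f(s1, .) against mu2."""
    return sum(f(s1, s2) * w for s2, w in mu2.items())


def I2(s2):
    """I_2^f(s2): integrate f(., s2) against mu1."""
    return sum(f(s1, s2) * w for s1, w in mu1.items())


def product_measure(A):
    """mu(A) for a subset A of S1 x S2: sum of mu1{s1} * mu2{s2} over A.

    On finite spaces this agrees with the iterated integral of the
    indicator function of A.
    """
    return sum(mu1[s1] * mu2[s2] for (s1, s2) in A)


# The two iterated integrals in (12.4) and the direct double sum.
iterated_12 = sum(I1(s1) * w for s1, w in mu1.items())
iterated_21 = sum(I2(s2) * w for s2, w in mu2.items())
double = sum(f(s1, s2) * mu1[s1] * mu2[s2] for s1, s2 in product(mu1, mu2))
```

In this toy case the three quantities `iterated_12`, `iterated_21` and `double` coincide, and `product_measure` multiplies on rectangles as in (12.3). The content of the theorem is, of course, that the same holds far beyond finite sums.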
13. Exercises. All of these exercises play a part later. Hints are given at the end of this section. The number of an exercise indicates when (that is, at the end of which section in the main text) you can attempt the question.

(E13.2a) R-functions on $\mathbb{R}$. By an R-function $F$ on $\mathbb{R}$, we mean a right-continuous function on $\mathbb{R}$ such that the left limit $F(x-)$ exists for every $x \in \mathbb{R}$. Prove that an R-function is Borel-measurable.

(E13.2b) Prove that if $S$ is a metric space, and if $C = C_b(S)$, the space of bounded continuous functions on $S$, then $\sigma(C) = \mathcal{B}(S)$.

(E13.3a) Prove the Monotone-Class Theorem 3.2.

(E13.3b) Let $X : S \to \mathbb{R}$. Prove that $\sigma(X) = \{X^{-1}(B) : B \in \mathcal{B}\}$. Prove that if $Y : S \to \mathbb{R}$, then $Y$ is $\sigma(X)$-measurable if and only if $Y = f(X)$ for some Borel-measurable $f$ on $\mathbb{R}$.

(E13.5a) From Lebesgue to Lebesgue-Stieltjes measure. Let $F$ be a right-continuous non-decreasing function on $\mathbb{R}$. Let $a := \inf_x F(x)$ and $b := \sup_x F(x)$. For $y \in I := [a,b) \cap \mathbb{R}$, define $\phi(y) := \inf\{x : F(x) \ge y\}$. Prove that $\phi$ is a left-continuous (therefore Borel) function on $I$. Assume that Lebesgue measure $\mu$ exists on $(I,\mathcal{B}(I))$, and define $\mu_F(B) := (\mu \circ \phi^{-1})(B) := \mu\{y : \phi(y) \in B\}$ $(B \in \mathcal{B})$. Prove that $\mu_F$ is a measure on $(\mathbb{R},\mathcal{B})$, and that it is the unique such measure with $\mu_F(u,v] = F(v) - F(u)$ $(-\infty < u < v < \infty)$.

(E13.5b) Functions of finite variation. For a sub-interval $(a,b]$ of $\mathbb{R}$, we define the total variation $V_F(a,b]$ of $F$ over $(a,b]$ to be $V_F(a,b] := \sup\{\sum_{i=1}^{n} |F(t_i) - F(t_{i-1})| : n \in \mathbb{N},\ a < t_0 < t_1 < \cdots < t_n \le b\}$. A function $F$ is said to be of finite variation (or an FV function) if it is an R-function such that $V_F(a,b] < \infty$ for every finite subinterval $(a,b]$ of $\mathbb{R}$. Let $F$ be an FV function. Prove that, for $a \in \mathbb{R}$, (i) $b \mapsto V_F(a,b]$ is an R-function on $[a,\infty)$, (ii) the functions $b \mapsto V_F(a,b] + F(b)$ and $b \mapsto V_F(a,b] - F(b)$ are non-decreasing on $[a,\infty)$. It is now trivial that $F$ is an FV function if and only if it is the difference of two non-decreasing R-functions. 
This makes clear how an FV function $F$ induces a signed measure $\mu_F$, the difference of two measures.
(E13.5c) Continuous FV functions. Let $F$ be a continuous FV function. Prove that $V_F(a,b]$ is continuous in $b$ for $b \ge a$. Define the quadratic variation $Q_F(a,b]$ over the interval $(a,b]$ by $Q_F(a,b] := \sup\{\sum |F(t_i) - F(t_{i-1})|^2 : n \in \mathbb{N},\ a \le t_1 < t_2 < \cdots < t_n \le b\}$. Prove that $Q_F(a,b] = 0$ for all intervals $(a,b]$. (Whence, Brownian paths are not of finite variation.)

(E13.6a) Prove Lemma 6.1.

(E13.6b) Bad subsets of $[0,1]$. Many of our counterexamples start off: 'Take a subset of $[0,1]$ of outer (Lebesgue) measure 1 and inner measure 0.' Here is a way of constructing such a set. At the moment, this exercise is rather hard. We shall see later (Exercise 60.50) that martingales make it easy. Let $\rho$ be an irrational number. Define an equivalence relation on $[0,1]$ by saying that $x_1 \sim x_2$ if and only if $x_1 - x_2 = m + n\rho$ for some $m, n \in \mathbb{Z}$, in other words, if $x_1 - x_2 \equiv n\rho \bmod 1$ for some $n \in \mathbb{Z}$. Use the Axiom of Choice to create a set $A$ with one element from each equivalence class. Define $B := \{(2n\rho + a) \bmod 1 : n \in \mathbb{Z}, a \in A\}$, $C := \{(\rho + \beta) \bmod 1 : \beta \in B\}$. Note that $B \cap C = \emptyset$, $B \cup C = [0,1]$, $B = \{(\rho + \gamma) \bmod 1 : \gamma \in C\}$, so that each of $B$ and $C$ is 'half' of $[0,1]$. Prove that $B$ has outer Lebesgue measure 1 and inner Lebesgue measure 0.

(E13.8) Prove Scheffé's Lemma 8.6.

(E13.9a) Prove Lemma 9.4.

(E13.9b) Prove Lemma 9.5.

(E13.9c) Integrals under change of measure. Let $(S,\Sigma,\mu)$ be a measure space, let $f \in (m\Sigma)^+$, and let $\lambda = (f\mu)$, the measure in (9.1). Show that, for $g \in m\Sigma$, we have $g \in \mathcal{L}^1(S,\Sigma,\lambda)$ if and only if $fg \in \mathcal{L}^1(S,\Sigma,\mu)$, and then $\lambda(g) = \mu(fg)$.

(E13.9d) Absolutely continuous functions on $\mathbb{R}$. A function $F$ on $\mathbb{R}$ is called absolutely continuous if there exists a function $f$ in $\mathcal{L}^1_{\mathrm{loc}}(\mathbb{R},\mathcal{B},\mathrm{Leb})$, in the sense that $f I_{[a,b]} \in \mathcal{L}^1(\mathbb{R},\mathcal{B},\mathrm{Leb})$ for all finite subintervals $[a,b]$ of $\mathbb{R}$, such that $F(b) - F(a) = \int_a^b f(x)\,dx$ $(-\infty < a \le b < \infty)$. We then call $f$ a derivative of $F$ (obviously, in an extension of the Newton-Leibniz sense) and write $F' = f$, a.e. Prove that an absolutely continuous function
is continuous. Prove that if $F$ is an absolutely continuous non-decreasing function, then the measure $\mu_F$ of Exercise 5a is absolutely continuous with respect to Lebesgue measure with $d\mu_F/d\,\mathrm{Leb} = F'$, a.e.

(E13.9e) Show that if $F$ is an absolutely continuous function then it is a continuous FV function with $V_F(a,b] = \int_a^b |F'(x)|\,dx$. The last part is tricky now, but easy using martingale theory.

(E13.10a) Prove Lemma 10.8.

Topology and measure

The ideas behind the following exercises are important throughout the book. There is always a possibility of 'conflict' between topology, which allows certain uncountable operations (the union of an uncountable number of open sets is still open), and measure theory, which allows only countably many operations. For measures on separable metric spaces, things are as one might hope. Recall that a metric space $(S,\rho)$ is called separable if it has a countable dense subset.

(E13.11a) Let $(S,\rho)$ be a separable metric space. Prove that $\rho : S \times S \to \mathbb{R}$ is $\mathcal{B}(S) \times \mathcal{B}(S)$ measurable. Show that any subset $A$ of $S$ has a countable dense subset.

(E13.11b) Let $(S_1,\Sigma_1)$ and $(S_2,\Sigma_2)$ be measurable spaces. Define $(S,\Sigma) := (S_1,\Sigma_1) \times (S_2,\Sigma_2)$. Prove that the map $(x,y) = ((x_1,x_2),(y_1,y_2)) \mapsto (x_1,y_1)$ of $S \times S$ into $S_1 \times S_1$ is $(\Sigma \times \Sigma)/(\Sigma_1 \times \Sigma_1)$-measurable.

(E13.11c) Let $(S_1,\rho_1)$ and $(S_2,\rho_2)$ be separable metric spaces. Then the product topology on $S := S_1 \times S_2$ arises from the metric $\rho(x,y) = \rho((x_1,x_2),(y_1,y_2)) := \rho_1(x_1,y_1) + \rho_2(x_2,y_2)$. Define $\Sigma_1 := \mathcal{B}(S_1)$, $\Sigma_2 := \mathcal{B}(S_2)$ and $\Sigma := \Sigma_1 \times \Sigma_2$. Prove that $\rho$ is $(\Sigma \times \Sigma)$-measurable. Deduce that $\mathcal{B}(S_1 \times S_2) = \mathcal{B}(S_1) \times \mathcal{B}(S_2)$.
(Generalisation). Suppose that, for $n \in \mathbb{N}$, $(S_n,\rho_n)$ is a separable metric space. The product topology on $S := \prod_{n \in \mathbb{N}} S_n$ arises from the metric $\rho$, where $\rho(x,y) := \sum 2^{-n}(\cdots)$. Show that $S$ is separable. Check (at least in principle!) that $\mathcal{B}(S) = \prod \mathcal{B}(S_n)$.

(E13.11d) Let $S$ be the non-separable metric space with cardinality greater than that of the continuum, and with the discrete metric $\rho(x,y) := 1$ if $x \ne y$, $0$ if $x = y$. Let $\Delta := \{(x,y) \in S \times S : x = y\}$. Then $\Delta$ is closed and therefore $\Delta \in \mathcal{B}(S \times S)$. Convince yourself that it is plausible (it is true!) that $\Delta \notin \mathcal{B}(S) \times \mathcal{B}(S)$ and that $\rho$ is (therefore) not $(\mathcal{B}(S) \times \mathcal{B}(S))$-measurable.

Hints for selected exercises

(H13.2a) $F$ is the limit of step functions $F(2^{-n}([2^n t] + 1))$, $[x]$ denoting $\sup\{n \in \mathbb{Z} : n \le x\}$.

(H13.2b) If $F$ is a closed set and $\rho$ is the distance function, then $\max(0, 1 - n\rho(x,F)) \downarrow I_F$.

(H13.3a) Consider the π-system of sets of the form (*) $\bigcap_{i=1}^{n} \{x : c_i(x) \in (a_i,b_i)\}$, where $n \in \mathbb{N}$ and, for $1 \le i \le n$, $c_i \in \mathcal{C}$ and $-\infty < a_i < b_i < \infty$. If we can prove that the indicator function of every set of the form (*) is in $\mathcal{H}$ then the desired result will follow from Theorem 3.1. For any open subinterval $(a,b)$ of $\mathbb{R}$, we can find continuous functions $g_m$ on $\mathbb{R}$ with $g_m \uparrow I_{(a,b)}$. Let $c \in \mathcal{C}$. By the Weierstrass theorem, we can find polynomials $p_{m,k}$ such that $p_{m,k} \to g_m$ uniformly on $[-\|c\|, \|c\|]$. Then $p_{m,k} \circ c \to g_m \circ c$ uniformly on $S$, whence $g_m \circ c \in \mathcal{H}$; and now it follows that $I_{(a,b)} \circ c \in \mathcal{H}$. The rest is easy.

(H13.3b) Since $\{X^{-1}(B) : B \in \mathcal{B}\}$ is a σ-algebra, the first part is easy. Let $\mathcal{H}$ consist of functions of the form $f \circ X$, where $f$ is bounded Borel-measurable. Then $\mathcal{H}$ satisfies conditions (i)-(iii) of Theorem 3.1.

(H13.5a) We have $u < \phi(y) \le v$ if and only if $F(u) < y \le F(v)$.

(H13.5b) $V_F(a,b]$ (note the 'open at $a$') is decreasing as $b \downarrow\downarrow a$. Suppose that, for
some $a$, $\lim_{b \downarrow\downarrow a} V_F(a,b] = \varepsilon_0 > 0$. Find a partition $a < t_0 < t_1 < \cdots < t_n = b$ on which $\sum_{i=1}^{n} |F(t_i) - F(t_{i-1})| \cdots$ Now choose a partition of $(a,t_0]$ on which the analogous sum is at least $\cdots\varepsilon_0$ to arrive at a contradiction.

(H13.5c) We have $\sum |F(t_i) - F(t_{i-1})|^2 \le \sup_i |F(t_i) - F(t_{i-1})| \cdots$

(H13.6b) Wait until the exercises on discrete-parameter martingales.

(H13.8) The hint was given after Lemma 8.6.

(H13.9a) Peep ahead to the proof of Lemma 20.1 to get the idea.

(H13.9c) If $g = I_F$ $(F \in \Sigma)$, then $\lambda(g) = \mu(fg)$ by definition. Now use the standard machine.

(H13.9e) It is obvious that $V_F \le \int \cdots$, but why do we have equality? Again wait for martingales to come to the rescue.

(H13.11a) Let $(s_n)$ be a sequence dense in $S$. Then $\rho(x,y) = \inf_n [\rho(x,s_n) + \rho(s_n,y)]$. For each $n$, $x \mapsto \rho(x,s_n)$ is continuous, and $(x,y) \mapsto \rho(x,s_n)$ is $\mathcal{B}(S) \times \mathcal{B}(S)$-measurable.

(H13.11c) See (E13.11b).

(H13.11d) See Billingsley [2].

2. BASIC PROBABILITY THEORY

Probability and expectation

14. Probability triple; almost surely (a.s.), a.s.($\mathbf{P}$), a.s.($\mathbf{P},\mathcal{F}$). By a probability triple, we mean a measure space $(\Omega,\mathcal{F},\mathbf{P})$ of total mass $\mathbf{P}(\Omega) = 1$.
From now on, $(\Omega,\mathcal{F},\mathbf{P})$ will always denote a probability triple. Unless otherwise stated, $\mathcal{L}^p := \mathcal{L}^p(\Omega,\mathcal{F},\mathbf{P})$, $L^p := L^p(\Omega,\mathcal{F},\mathbf{P})$, and $\|\cdot\|_p$ will refer to these spaces. An element $E$ of $\mathcal{F}$ will be called an event, and $\mathbf{P}(E)$ will be called the probability of the event $E$. A statement $S$ about outcomes $\omega$ in $\Omega$ is said to be true almost surely (a.s.) if $F := \{\omega : S(\omega) \text{ is true}\} \in \mathcal{F}$ and $\mathbf{P}(F) = 1$. If we wish to emphasise which probability measure we are talking about, we write 'a.s.($\mathbf{P}$)', and if we further wish to emphasise that the truth set of $S$ is in $\mathcal{F}$, we write 'a.s.($\mathbf{P},\mathcal{F}$)'. It is easily shown that if $F_n \in \mathcal{F}$ $(n \in \mathbb{N})$ and $\mathbf{P}(F_n) = 1$, $\forall n$, then $\mathbf{P}(\bigcap_n F_n) = 1$.

The intuitive meaning. We assume that the intuitive meaning is familiar to you from more elementary books. Chance is regarded as having chosen a particular point $\omega$ (the actual realisation) of $\Omega$ 'according to the law $\mathbf{P}$' before the experiment modelled by $(\Omega,\mathcal{F},\mathbf{P})$ is performed. For an event $F$, $F$ occurs in reality if and only if the chosen $\omega$ is in $F$.

15. $\limsup E_n$; First Borel-Cantelli Lemma. Suppose now that $(E_n : n \in \mathbb{N})$ is a sequence of events. We define $\limsup E_n := \bigcap_m \bigcup_{n \ge m} E_n = \{\omega : \text{for every } m,\ \exists n(\omega) \ge m \text{ such that } \omega \in E_{n(\omega)}\} = \{\omega : \omega \in E_n \text{ for infinitely many } n\}$. Two important results relate to $\limsup E_n$.

(15.1) LEMMA (Reverse Fatou Lemma for sets): $\mathbf{P}(\limsup E_n) \ge \limsup \mathbf{P}(E_n)$.

Proof. Let $G_m := \bigcup_{n \ge m} E_n$. Then $G_m \downarrow G$, where $G := \limsup E_n$. By result (4.3), $\mathbf{P}(G_m) \downarrow \mathbf{P}(G)$. But, clearly, $\mathbf{P}(G_m) \ge \sup_{n \ge m} \mathbf{P}(E_n)$. Hence $\mathbf{P}(G) \ge \downarrow\lim_m \{\sup_{n \ge m} \mathbf{P}(E_n)\} =: \limsup \mathbf{P}(E_n)$. □

(15.2) LEMMA (First Borel-Cantelli Lemma). Let $(E_n : n \in \mathbb{N})$ be a sequence of
events such that $\sum_n \mathbf{P}(E_n) < \infty$. Then $\mathbf{P}(\limsup E_n) = \mathbf{P}(E_n, \text{i.o.}) = 0$.

Proof. We have, for each $m$, $\mathbf{P}(G) \le \mathbf{P}(G_m) \le \sum_{n \ge m} \mathbf{P}(E_n)$. (Convince yourself of the rigour.) Now let $m \uparrow \infty$. □

16. Law of random variable; distribution function; joint law

(16.1) DEFINITION (Random variable; law). Let $(E,\mathcal{E})$ be a measurable space. By an $(E,\mathcal{E})$-valued random variable $X$ (or $E$-valued random variable $X$, when $\mathcal{E}$ is understood) carried by our probability triple $(\Omega,\mathcal{F},\mathbf{P})$, we mean an $(\mathcal{F}/\mathcal{E})$-measurable map $X$ from $\Omega$ to $E$, so that $X^{-1} : \mathcal{E} \to \mathcal{F}$. By the law $\Lambda_X$ of $X$, we mean the probability measure $\Lambda_X := \mathbf{P} \circ X^{-1}$ on $(E,\mathcal{E})$, so that $\Lambda_X(A) = \mathbf{P}(X \in A) := \mathbf{P}\{\omega : X(\omega) \in A\}$ $(A \in \mathcal{E})$.

Suppose that for $i = 1,2$, $(E_i,\mathcal{E}_i)$ is a measurable space and that $X_i$ is an $(E_i,\mathcal{E}_i)$-valued random variable. Let $E := E_1 \times E_2$, $\mathcal{E} := \mathcal{E}_1 \times \mathcal{E}_2$ and $X(\omega) := (X_1(\omega),X_2(\omega)) \in E$. Check that $X$ is an $(E,\mathcal{E})$-valued random variable. The joint law $\Lambda_{X_1,X_2}$ of $X_1$ and $X_2$ is then defined to be the law of $X$. There are obvious extensions, some of which we study in great detail later.

If our variable $X$ is $\mathbb{R}$-valued (that is, $(\mathbb{R},\mathcal{B})$-valued) then it follows from the Uniqueness Lemma 4.7 and the fact that $\mathcal{B} = \sigma(\pi(\mathbb{R}))$, where $\pi(\mathbb{R})$ is at (1.4), that the law of $X$ is determined by the distribution function $F_X$ of $X$: $F_X(x) := \mathbf{P}(X \le x)$ $(x \in \mathbb{R})$.

17. Expectation; $\mathbf{E}(X;F)$. We introduce some notation used throughout the book.

(17.1) DEFINITION (Expectation; $\mathbf{E}(X)$). For a random variable $X \in \mathcal{L}^1 = \mathcal{L}^1(\Omega,\mathcal{F},\mathbf{P})$, we define the expectation $\mathbf{E}(X)$ of $X$ by $\mathbf{E}(X) := \int_\Omega X\,d\mathbf{P} = \int_\Omega X(\omega)\,\mathbf{P}(d\omega)$. We also define $\mathbf{E}(X)$ $(\le \infty)$ for $X \in (m\mathcal{F})^+$. In earlier notation, $\mathbf{E}(X) = \mathbf{P}(X)$.
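The definitions of law, distribution function and expectation can be made concrete on a finite probability triple, where every measurability question disappears. The following is only a sketch under that simplifying assumption: the triple (two throws of a fair die) and the function names `law`, `distribution_function` and `expectation` are illustrative choices, not notation from the text.

```python
from collections import defaultdict
from fractions import Fraction

# A finite probability triple: two throws of a fair die, all subsets measurable.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}


def X(w):
    """An R-valued random variable: the sum of the two throws."""
    return w[0] + w[1]


def law(X, P):
    """The law P o X^{-1}, stored as point masses on the values of X."""
    lam = defaultdict(Fraction)
    for w, p in P.items():
        lam[X(w)] += p
    return dict(lam)


def distribution_function(lam, x):
    """F_X(x) = P(X <= x), computed from the law."""
    return sum(p for v, p in lam.items() if v <= x)


def expectation(X, P):
    """E(X) as the integral (here a finite sum) of X against P."""
    return sum(X(w) * p for w, p in P.items())
```

Here `law(X, P)` plays the role of $\Lambda_X$, and the distribution function is read off from it, in line with the remark that the law of a real-valued variable is determined by $F_X$.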
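(17.2) DEFINITION (Notation $\mathbf{E}(X;F)$). For $X \in \mathcal{L}^1$ (or $(m\mathcal{F})^+$) and $F \in \mathcal{F}$, we define $\mathbf{E}(X;F) := \int_F X(\omega)\,\mathbf{P}(d\omega) := \mathbf{E}(X I_F)$, where, as ever, $I_F(\omega) := 1$ if $\omega \in F$, $0$ if $\omega \notin F$.

(17.3) LEMMA. If $h \in (m\mathcal{E})^+$, then

(17.4) $\mathbf{E}h(X) = \int_E h(x)\,\Lambda_X(dx) \le \infty$.

For $h \in m\mathcal{E}$, $h(X) \in \mathcal{L}^1(\Omega,\mathcal{F},\mathbf{P})$ if and only if $h \in \mathcal{L}^1(E,\mathcal{E},\Lambda_X)$, and then $\mathbf{E}h(X) = \int_E h(x)\,\Lambda_X(dx)$.

Proof. Use the standard machine at (8.7). If $h = I_A$ for some $A \in \mathcal{E}$ then (17.4) is true by definition of $\Lambda_X$; etc. □

An important case of this lemma is when $(E,\mathcal{E}) = (\mathbb{R},\mathcal{B})$ and $h$ is the identity function on $\mathbb{R}$.

18. Inequalities: Markov, Jensen, Schwarz, Tchebychev. These inequalities will be used repeatedly in estimates.

(18.1) LEMMA (Markov's inequality). Suppose that $Z \in m\mathcal{F}$ and that $g : \mathbb{R} \to [0,\infty]$ is $\mathcal{B}$-measurable and non-decreasing. (We know that $g(Z) = g \circ Z \in (m\mathcal{F})^+$.) Then $\mathbf{E}g(Z) \ge \mathbf{E}(g(Z); Z \ge c) \ge g(c)\mathbf{P}(Z \ge c)$.

Examples: for $Z \in (m\mathcal{F})^+$, $c\mathbf{P}(Z \ge c) \le \mathbf{E}(Z)$ $(c > 0)$; for $X \in \mathcal{L}^1$, $c\mathbf{P}(|X| \ge c) \le \mathbf{E}(|X|)$ $(c > 0)$. Considerable strength can often be obtained by choosing the optimum $\theta$ for $c$ in $\mathbf{P}(Y > c) \le e^{-\theta c}\mathbf{E}(e^{\theta Y})$ $(\theta > 0, c \in \mathbb{R})$.

(18.2) Jensen's inequality for convex functions. A function $c : G \to \mathbb{R}$, where $G$ is an open subinterval of $\mathbb{R}$, is called convex on $G$ if its graph lies below any of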
its chords: for $x, y \in G$ and $0 \le p = 1 - q \le 1$, $c(px + qy) \le pc(x) + qc(y)$. Then $c$ is automatically continuous on $G$. If $c$ is twice-differentiable on $G$ then $c$ is convex if and only if $c'' \ge 0$. Important examples of convex functions are $|x|$, $x^2$ and $e^{\theta x}$ $(\theta \in \mathbb{R})$.

(18.3) THEOREM (Jensen's inequality). Suppose that $c : G \to \mathbb{R}$ is a convex function on an open subinterval $G$ of $\mathbb{R}$ and that $X$ is a random variable such that $\mathbf{E}(|X|) < \infty$, $\mathbf{P}(X \in G) = 1$, $\mathbf{E}|c(X)| < \infty$. Then $\mathbf{E}c(X) \ge c(\mathbf{E}(X))$.

See [W; Section 6.6] for a full proof. The point is that, since there is a supporting hyperplane for $c$ at $(\mu, c(\mu))$, where $\mu = \mathbf{E}(X)$, there exists an $m$ in $\mathbb{R}$ such that $c(X) \ge m(X - \mu) + c(\mu)$; and Jensen's inequality follows on taking expectations.

(18.4) LEMMA (Monotonicity of norms). If $1 \le p \le r$ and $Y \in m\mathcal{F}$, then $\|Y\|_p \le \|Y\|_r$.

Proof. Apply Jensen's inequality with $X = |Y|^p$ and $c(x) = x^{r/p}$ $(x > 0)$. □

(18.5) Familiar facts. We recall three results that are consequences of Jensen's inequality (see Section 24): for $p > 1$ and $p^{-1} + q^{-1} = 1$,
Hölder: $|\mathbf{E}(XY)| \le \mathbf{E}|XY| \le \|X\|_p \|Y\|_q$,
Schwarz: $|\mathbf{E}(XY)| \le \mathbf{E}|XY| \le \|X\|_2 \|Y\|_2$,
Minkowski: $\|X + Y\|_p \le \|X\|_p + \|Y\|_p$.

(18.6) Variance; covariance; Tchebychev's inequality. If $X, Y \in \mathcal{L}^2$ then, by the monotonicity of norms, $X, Y \in \mathcal{L}^1$, so that we may define $\mu_X := \mathbf{E}(X)$, $\mu_Y := \mathbf{E}(Y)$. Since the constant functions with values $\mu_X$ and $\mu_Y$ are in $\mathcal{L}^2$, we see that $\tilde{X} := X - \mu_X$, $\tilde{Y} := Y - \mu_Y$ are in $\mathcal{L}^2$. By the Schwarz inequality, $\tilde{X}\tilde{Y} \in \mathcal{L}^1$, and so we may define $\mathrm{Cov}(X,Y) := \mathbf{E}(\tilde{X}\tilde{Y}) = \mathbf{E}[(X - \mu_X)(Y - \mu_Y)]$. The Schwarz inequality further justifies expanding out the product in the final
[ ] bracket to yield the alternative formula $\mathrm{Cov}(X,Y) = \mathbf{E}(XY) - \mu_X\mu_Y$. As you know, the variance of $X$ is defined by $\mathrm{Var}(X) := \mathbf{E}[(X - \mu_X)^2] = \mathbf{E}(X^2) - \mu_X^2 = \mathrm{Cov}(X,X)$. You also know Tchebychev's inequality:

(18.7) $c^2\mathbf{P}(|X - \mu_X| > c) \le \mathrm{Var}(X)$ $(c > 0)$.

19. Modes of convergence of random variables. Let $(X_n : n \in \mathbb{N})$ be a sequence of random variables and let $X$ be a random variable, all carried by our triple $(\Omega,\mathcal{F},\mathbf{P})$ and all $\mathbb{R}$-valued. Recall that we say that $X_n \to X$ almost surely if

(19.1) $\mathbf{P}(X_n \to X) = 1$.

We say that $X_n \to X$ in probability if, for every $\varepsilon > 0$,

(19.2) $\mathbf{P}(|X_n - X| > \varepsilon) \to 0$ as $n \to \infty$.

We say that $X_n \to X$ in $\mathcal{L}^p$ if each $X_n$ is in $\mathcal{L}^p$ and $X \in \mathcal{L}^p$ and $\|X_n - X\|_p \to 0$ as $n \to \infty$, or, equivalently,

(19.3) $\mathbf{E}(|X_n - X|^p) \to 0$ as $n \to \infty$.

Some relationships between these modes of convergence will now be stated. Regard the proofs as exercises. See [W; EA13.1]. Convergence in probability is the weakest of the above forms of convergence. Thus

(19.4) $(X_n \to X, \text{a.s.}) \Rightarrow (X_n \to X \text{ in prob})$,

(19.5) $(X_n \to X \text{ in } \mathcal{L}^p) \Rightarrow (X_n \to X \text{ in prob})$.

No other implication between any two of our three forms of convergence is valid. But, of course, for $r \ge p \ge 1$, monotonicity of norms shows that

(19.6) $(X_n \to X \text{ in } \mathcal{L}^r) \Rightarrow (X_n \to X \text{ in } \mathcal{L}^p)$.

'Fast convergence in probability' does imply almost sure convergence:

(19.7) $(\sum_n \mathbf{P}(|X_n - X| > \varepsilon) < \infty,\ \forall \varepsilon > 0) \Rightarrow (X_n \to X, \text{a.s.})$.

Property (19.7) is used in proving the following result:

(19.8) $X_n \to X$ in probability if and only if every subsequence of $(X_n)$ contains a further subsequence along which we have almost sure convergence to $X$.
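A standard illustration of why convergence in probability does not imply almost sure convergence, and of how (19.8) rescues the situation, is the 'moving block' sequence on $([0,1), \mathcal{B}[0,1), \mathrm{Leb})$: for $n = 2^m + k$ with $0 \le k < 2^m$, let $X_n$ be the indicator of $[k2^{-m}, (k+1)2^{-m})$. This example is not taken from the text; the sketch below, with hypothetical helper names, simply computes with it exactly.

```python
from fractions import Fraction


def block(n):
    """Write n = 2**m + k with 0 <= k < 2**m (n >= 1) and return (m, k)."""
    m = n.bit_length() - 1
    return m, n - (1 << m)


def X(n, omega):
    """X_n(omega): indicator of [k/2^m, (k+1)/2^m) at a point omega of [0, 1)."""
    m, k = block(n)
    return 1 if Fraction(k, 2**m) <= omega < Fraction(k + 1, 2**m) else 0


def prob_exceeds(n, eps):
    """Leb(|X_n| > eps) for 0 < eps < 1: the length of the block carrying X_n."""
    assert 0 < eps < 1
    m, _ = block(n)
    return Fraction(1, 2**m)
```

Since `prob_exceeds(n, eps)` tends to 0, we have $X_n \to 0$ in probability. Yet for every fixed $\omega$, $X_n(\omega) = 1$ for exactly one $n$ in each range $2^m \le n < 2^{m+1}$, so $X_n(\omega)$ does not converge. Along the subsequence $n_j = 2^j$, on the other hand, $X_{n_j}(\omega) = 0$ as soon as $2^{-j} \le \omega$, which gives almost sure convergence to 0, as (19.8) predicts.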
Uniform integrability and $\mathcal{L}^1$ convergence

20. Uniform integrability. We begin with a lemma.

(20.1) LEMMA. Suppose that $X \in \mathcal{L}^1 = \mathcal{L}^1(\Omega,\mathcal{F},\mathbf{P})$. Then, given $\varepsilon > 0$, there exists a $\delta > 0$ such that for $F \in \mathcal{F}$, $\mathbf{P}(F) < \delta$ implies that $\mathbf{E}(|X|;F) < \varepsilon$.

Proof. If the conclusion is false, then, for some $\varepsilon_0 > 0$, we can find a sequence $(F_n)$ of elements of $\mathcal{F}$ such that $\mathbf{P}(F_n) < 2^{-n}$ and $\mathbf{E}(|X|;F_n) \ge \varepsilon_0$. Let $H := \limsup F_n$. Then the First Borel-Cantelli Lemma shows that $\mathbf{P}(H) = 0$, but the 'Reverse Fatou' Lemma 8.4 shows that $\mathbf{E}(|X|;H) \ge \varepsilon_0$; and we have arrived at the required contradiction. □

(20.2) COROLLARY. Suppose that $X \in \mathcal{L}^1$ and that $\varepsilon > 0$. Then there exists $K$ in $[0,\infty)$ such that $\mathbf{E}(|X|;|X| > K) < \varepsilon$.

Proof. Let $\delta$ be as in Lemma 20.1. Since $K\mathbf{P}(|X| > K) \le \mathbf{E}(|X|)$, we can choose $K$ such that $\mathbf{P}(|X| > K) < \delta$. □

(20.3) DEFINITION (UI family). A class $\mathcal{C}$ of $\mathbb{R}$-valued random variables is called uniformly integrable (UI) if given $\varepsilon > 0$, there exists $K$ in $[0,\infty)$ such that $\mathbf{E}(|X|;|X| > K) < \varepsilon$, $\forall X \in \mathcal{C}$.

We note that for such a class $\mathcal{C}$, we have (with $K_1$ relating to $\varepsilon = 1$), for every $X \in \mathcal{C}$, $\mathbf{E}(|X|) = \mathbf{E}(|X|;|X| > K_1) + \mathbf{E}(|X|;|X| \le K_1) \le 1 + K_1$. Thus a UI family is bounded in $\mathcal{L}^1$. It is not true that a family bounded in $\mathcal{L}^1$ is UI.

(20.4) Example. Take $(\Omega,\mathcal{F},\mathbf{P}) = ([0,1],\mathcal{B}[0,1],\mathrm{Leb})$. Let $E_n = (0,n^{-1})$, $X_n = nI_{E_n}$. Then $\mathbf{E}(|X_n|) = 1$, $\forall n$, so that $(X_n)$ is bounded in $\mathcal{L}^1$. However, for any $K > 0$, we have, for $n > K$, $\mathbf{E}(|X_n|;|X_n| > K) = n\mathbf{P}(E_n) = 1$, so that $(X_n)$ is not UI. Here $X_n \to 0$ a.s., but $\mathbf{E}(X_n) \not\to 0$. □
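The quantities in Example 20.4 can be written down exactly, which makes the failure of uniform integrability easy to inspect. The following sketch only encodes the closed forms from the example; the function names are ours.

```python
from fractions import Fraction


def X(n, omega):
    """X_n(omega) = n * I_{(0, 1/n)}(omega) for omega in [0, 1]."""
    return n if 0 < omega < Fraction(1, n) else 0


def expectation_abs(n):
    """E|X_n|: the value n on a set of Lebesgue measure 1/n."""
    return n * Fraction(1, n)


def tail_expectation(n, K):
    """E(|X_n|; |X_n| > K): all of the mass if n > K, and none otherwise."""
    return n * Fraction(1, n) if n > K else Fraction(0)


def sup_tail(K, n_max):
    """max over 1 <= n <= n_max of E(|X_n|; |X_n| > K)."""
    return max(tail_expectation(n, K) for n in range(1, n_max + 1))
```

However large $K$ is taken, `sup_tail(K, n_max)` equals 1 as soon as `n_max > K`, so no single $K$ can make the tails uniformly small, even though `expectation_abs(n)` is 1 for every $n$ and `X(n, omega)` vanishes for each fixed `omega` once $n \ge 1/\omega$.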
We now give two simple sufficient conditions for the UI property.

(20.5) LEMMA. Suppose that $\mathcal{C}$ is a class of random variables that is bounded in $\mathcal{L}^p$ for some $p > 1$; thus, for some $A \in [0,\infty)$, $\mathbf{E}(|X|^p) < A$, $\forall X \in \mathcal{C}$. Then $\mathcal{C}$ is UI.

Proof. If $v \ge K > 0$ then $v \le K^{1-p}v^p$ (obviously!). Hence, for $K > 0$ and $X \in \mathcal{C}$, we have $\mathbf{E}(|X|;|X| > K) \le K^{1-p}\mathbf{E}(|X|^p;|X| > K) \le K^{1-p}A$. The result follows. □

(20.6) LEMMA. Suppose that $\mathcal{C}$ is a class of random variables that is dominated by an integrable non-negative variable $Y$: $|X(\omega)| \le Y(\omega)$, $\forall X \in \mathcal{C}$ and $\mathbf{E}(Y) < \infty$. Then $\mathcal{C}$ is UI.

Proof. It is obvious that, for $K > 0$ and $X \in \mathcal{C}$, $\mathbf{E}(|X|;|X| > K) \le \mathbf{E}(Y;Y > K)$, and now it is only necessary to apply (20.2) to $Y$. □

(20.7) LEMMA. A class $\mathcal{C}$ of random variables is UI if and only if the following two conditions hold: (i) $\mathcal{C}$ is bounded in $\mathcal{L}^1$; (ii) given $\varepsilon > 0$, there exists $\delta > 0$ such that, whenever $X \in \mathcal{C}$, and $F \in \mathcal{F}$ is such that $\mathbf{P}(F) < \delta$, we have $\mathbf{E}(|X|;F) < \varepsilon$.

(20.8) LEMMA. If $\mathcal{C}$ and $\mathcal{D}$ are UI families of random variables, then $\mathcal{C} + \mathcal{D} := \{X + Y : X \in \mathcal{C}, Y \in \mathcal{D}\}$ is UI.

Proofs of Lemmas 20.7 and 20.8 are left as easy exercises.

21. $\mathcal{L}^1$ convergence. We begin with what is (in view of (19.8)) a consequence of the Dominated-Convergence Theorem.

(21.1) THEOREM (Bounded-Convergence Theorem). Let $(X_n)$ be a sequence of random variables, and let $X$ be a random variable. Suppose that $X_n \to X$ in probability and that, for some $K$ in $[0,\infty)$, we have for every $n$ and $\omega$, $|X_n(\omega)| \le K$. Then $\mathbf{E}(|X_n - X|) \to 0$.
Proof. You check that $\mathbf{P}(|X| \le K) = 1$. Let $\varepsilon > 0$ be given. Choose $n_0$ such that $\mathbf{P}(|X_n - X| > \tfrac{1}{3}\varepsilon) < \varepsilon/(3K)$ when $n \ge n_0$. Then, for $n \ge n_0$, $\mathbf{E}(|X_n - X|) = \mathbf{E}(|X_n - X|;|X_n - X| > \tfrac{1}{3}\varepsilon) + \mathbf{E}(|X_n - X|;|X_n - X| \le \tfrac{1}{3}\varepsilon) \le 2K\mathbf{P}(|X_n - X| > \tfrac{1}{3}\varepsilon) + \tfrac{1}{3}\varepsilon \le \varepsilon$. The proof is finished. □

(21.2) THEOREM (A necessary and sufficient condition for $\mathcal{L}^1$ convergence). Let $(X_n)$ be a sequence in $\mathcal{L}^1$, and let $X \in \mathcal{L}^1$. Then $X_n \to X$ in $\mathcal{L}^1$, or, equivalently $\mathbf{E}(|X_n - X|) \to 0$, if and only if the following two conditions are satisfied: (i) $X_n \to X$ in probability; (ii) the sequence $(X_n)$ is UI.

It is of course the 'if' part of the theorem that is useful. Since the result is 'best possible', it must improve on the Dominated-Convergence Theorem for our $(\Omega,\mathcal{F},\mathbf{P})$ triple; and, of course, the result (20.6) makes this explicit.

Proof of 'if' part. Suppose that conditions (i) and (ii) are satisfied. For $K \in [0,\infty)$, define a function $\varphi_K : \mathbb{R} \to [-K,K]$ as follows: $\varphi_K(x) := K$ if $x > K$, $x$ if $|x| \le K$, $-K$ if $x < -K$. Let $\varepsilon > 0$ be given. By the UI property of the $(X_n)$ sequence and (20.2), we can choose $K$ so that $\mathbf{E}\{|\varphi_K(X_n) - X_n|\} < \tfrac{1}{3}\varepsilon$, $\forall n$; $\mathbf{E}\{|\varphi_K(X) - X|\} < \tfrac{1}{3}\varepsilon$. But, since $|\varphi_K(x) - \varphi_K(y)| \le |x - y|$, we see that $\varphi_K(X_n) \to \varphi_K(X)$ in probability; and, by Theorem 21.1, we can choose $n_0$ such that, for $n \ge n_0$, $\mathbf{E}\{|\varphi_K(X_n) - \varphi_K(X)|\} < \tfrac{1}{3}\varepsilon$. The triangle inequality therefore implies that, for $n \ge n_0$, $\mathbf{E}(|X_n - X|) < \varepsilon$, and the proof is complete. □

Independence

22. Independence of σ-algebras and of random variables. Here are the key definitions of independence. Sub-σ-algebras $\mathcal{G}_1,\mathcal{G}_2,\ldots$ of $\mathcal{F}$ are called independent if, whenever $G_i \in \mathcal{G}_i$ $(i \in \mathbb{N})$ and $i_1,\ldots,i_n$ are distinct, $\mathbf{P}(G_{i_1} \cap \cdots \cap G_{i_n}) = \prod_{k=1}^{n} \mathbf{P}(G_{i_k})$.
Random variables $X_1,X_2,\ldots$ are called independent if the σ-algebras $\sigma(X_1),\sigma(X_2),\ldots$ are independent. Events $E_1,E_2,\ldots$ are called independent if the σ-algebras $\mathcal{E}_1,\mathcal{E}_2,\ldots$ are independent, where $\mathcal{E}_n$ is the σ-algebra $\{\emptyset,E_n,\Omega \setminus E_n,\Omega\}$. Since $\mathcal{E}_n = \sigma(I_{E_n})$, it follows that events $E_1,E_2,\ldots$ are independent if and only if the random variables $I_{E_1},I_{E_2},\ldots$ are independent.

(22.1) The π-system Lemma. We know from elementary theory that events $E_1,E_2,\ldots$ are independent if and only if whenever $n \in \mathbb{N}$ and $i_1,\ldots,i_n$ are distinct, $\mathbf{P}(E_{i_1} \cap \cdots \cap E_{i_n}) = \prod_{k=1}^{n} \mathbf{P}(E_{i_k})$, corresponding results involving complements of the $E_i$ etc., being consequences of this. We now use the Uniqueness Lemma 4.7 to obtain a significant generalisation of this idea, allowing us to study independence via (manageable) π-systems rather than (awkward) σ-algebras. Let us concentrate on the case of two σ-algebras.

(22.2) LEMMA. Suppose that $\mathcal{G}$ and $\mathcal{H}$ are sub-σ-algebras of $\mathcal{F}$, and that $\mathcal{I}$ and $\mathcal{J}$ are π-systems with $\sigma(\mathcal{I}) = \mathcal{G}$, $\sigma(\mathcal{J}) = \mathcal{H}$. Then $\mathcal{G}$ and $\mathcal{H}$ are independent if and only if $\mathcal{I}$ and $\mathcal{J}$ are independent in that $\mathbf{P}(I \cap J) = \mathbf{P}(I)\mathbf{P}(J)$, $I \in \mathcal{I}$, $J \in \mathcal{J}$.

Proof. Suppose that $\mathcal{I}$ and $\mathcal{J}$ are independent. For fixed $I$ in $\mathcal{I}$, the measures (check that they are measures!) $H \mapsto \mathbf{P}(I \cap H)$ and $H \mapsto \mathbf{P}(I)\mathbf{P}(H)$ on $(\Omega,\mathcal{H})$ have the same total mass $\mathbf{P}(I)$, and agree on $\mathcal{J}$. Therefore, by the Uniqueness Lemma 4.7, they agree on $\sigma(\mathcal{J}) = \mathcal{H}$. Hence $\mathbf{P}(I \cap H) = \mathbf{P}(I)\mathbf{P}(H)$, $I \in \mathcal{I}$, $H \in \mathcal{H}$. Thus, for fixed $H$ in $\mathcal{H}$, the measures $G \mapsto \mathbf{P}(G \cap H)$ and $G \mapsto \mathbf{P}(G)\mathbf{P}(H)$ on $(\Omega,\mathcal{G})$ have the same total mass $\mathbf{P}(H)$, and agree on $\mathcal{I}$. They therefore agree on $\sigma(\mathcal{I}) = \mathcal{G}$; and this is what we set out to prove. □
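On a finite sample space one can watch Lemma 22.2 at work by brute force: check the product rule on two small π-systems, generate the σ-algebras, and check the product rule again on every pair of generated sets. The sketch below does this for a 3 × 3 grid with product weights; the weights and all function names are illustrative assumptions. It verifies one instance and is in no sense a proof.

```python
from fractions import Fraction
from itertools import product

# Sample space: a 3 x 3 grid whose point masses are products p[i] * q[j].
omega = frozenset(product(range(3), range(3)))
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
q = [Fraction(1, 5), Fraction(2, 5), Fraction(2, 5)]
P_point = {(i, j): p[i] * q[j] for (i, j) in omega}


def P(A):
    """Probability of a subset A of omega."""
    return sum(P_point[w] for w in A)


# Two pi-systems (nested families, hence closed under intersection):
# {first coordinate <= x} and {second coordinate <= y}.
I_sys = [frozenset(w for w in omega if w[0] <= x) for x in range(3)]
J_sys = [frozenset(w for w in omega if w[1] <= y) for y in range(3)]


def generated_sigma_algebra(sets):
    """Close a family of subsets of the finite set omega under complements
    and pairwise unions; on a finite set this yields the generated sigma-algebra."""
    sigma = {frozenset(), omega, *sets}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        for A in current:
            for B in [omega - A] + [A | C for C in current]:
                if B not in sigma:
                    sigma.add(B)
                    changed = True
    return sigma


def independent(fam1, fam2):
    """Check P(A & B) == P(A) * P(B) for every A in fam1 and B in fam2."""
    return all(P(A & B) == P(A) * P(B) for A in fam1 for B in fam2)
```

Each generated σ-algebra has 8 members, and `independent` returns `True` both on the pair of π-systems and on the pair of σ-algebras, as the lemma says it must.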
Suppose now that $X$ and $Y$ are two real-valued random variables on $(\Omega,\mathcal{F},\mathbf{P})$ such that, whenever $x, y \in \mathbb{R}$,

(22.3) $\mathbf{P}(X \le x; Y \le y) = \mathbf{P}(X \le x)\mathbf{P}(Y \le y)$.

Now, (22.3) says that the π-systems $\pi(X) := \{X^{-1}((-\infty,x]) : x \in \mathbb{R}\}$ and $\pi(Y)$ are independent. Hence $\sigma(X)$ and $\sigma(Y)$ are independent: that is, $X$ and $Y$ are independent in our new 'abstract' sense. In the same way, we can prove that random variables $X_1,X_2,\ldots,X_n$ are independent if and only if $\mathbf{P}(X_k \le x_k : 1 \le k \le n) = \prod_{k=1}^{n} \mathbf{P}(X_k \le x_k)$, and all the familiar things from elementary theory.

(22.4) Independence and product measure. This is the ultimate form of the 'independence means multiply' idea. Suppose that, for $i = 1,2$, $(E_i,\mathcal{E}_i)$ is a measurable space and that $X_i$ is an $(E_i,\mathcal{E}_i)$-valued random variable. Recall from Section 16 the definitions of the laws $\Lambda_{X_1}$ and $\Lambda_{X_2}$ of $X_1$ and $X_2$, and of the joint law $\Lambda_{X_1,X_2}$ on $(E,\mathcal{E}) := (E_1 \times E_2, \mathcal{E}_1 \times \mathcal{E}_2)$.

(22.5) THEOREM. The variables $X_1$ and $X_2$ are independent if and only if $\Lambda_{X_1,X_2} = \Lambda_{X_1} \times \Lambda_{X_2}$. If $X_1$ and $X_2$ are independent and if for $i = 1,2$, $h_i \in (m\mathcal{E}_i)^+$ then

(22.6) $\mathbf{E}h_1(X_1)h_2(X_2) = \mathbf{E}h_1(X_1) \cdot \mathbf{E}h_2(X_2) \le \infty$.

You prove the first statement. Result (22.6) follows from Fubini's Theorem together with the ideas in (17.3): if we define $h$ on $E$ via $h(x) := h_1(x_1)h_2(x_2)$, where $x = (x_1,x_2)$, then $\mathbf{E}h_1(X_1)h_2(X_2) = \int_E h(x)\,\Lambda_{X_1,X_2}(dx) = \int_{E_1}\int_{E_2} h_1(x_1)h_2(x_2)\,\Lambda_{X_1}(dx_1)\Lambda_{X_2}(dx_2) = \mathbf{E}h_1(X_1) \cdot \mathbf{E}h_2(X_2)$.

There are obvious generalisations of the theorem. If $X$ and $Y$ are independent elements of $\mathcal{L}^1(\Omega,\mathcal{F},\mathbf{P})$, then $XY \in \mathcal{L}^1$ and $\mathbf{E}(XY) = \mathbf{E}(X)\mathbf{E}(Y)$. If, further, $X, Y \in \mathcal{L}^2$, then $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$; and so on, and so forth...

23. Existence of families of independent variables. From Section I.6 on Ciesielski's construction of Brownian motion onwards, we have required models that
support families of independent variables with prescribed laws. Theorem 26.1 gives the elegant and proper way of doing this; and the strength of that theorem is needed in, for example, the direct construction of Poisson measures in Section 37. However, it is often the case that all we require is a model that supports the existence of a sequence of real-valued random variables with prescribed distribution functions. We now recall briefly the well-known trick for achieving this; [W] gives some more details.

Let $(\Omega,\mathcal{F},\mathbf{P}) = ([0,1],\mathcal{B}[0,1],\mathrm{Leb})$. Expand $\omega$ in $\Omega$ in binary, and write $X(\omega) := \omega$: $X(\omega) := \omega = \cdot\omega_1\omega_2\omega_3\cdots = \sum 2^{-k}\omega_k$. (Conventions made about dyadic rationals are irrelevant.) Then the variables $(\pi_n : n \in \mathbb{N})$, where $\pi_n(\omega) := \omega_n$, are independent coin-tossing variables, each taking the values 0 and 1 with probability $\tfrac{1}{2}$ each. Thus $X_1(\omega) := \cdot\omega_1\omega_3\omega_6\ldots$, $X_2(\omega) := \cdot\omega_2\omega_5\omega_9\ldots$, $X_3(\omega) := \cdot\omega_4\omega_8\omega_{13}\ldots$, etc. defines an independent sequence of variables each with the same distribution as $X$, that is, each with the uniform distribution on $[0,1]$. If $(F_n : n \in \mathbb{N})$ is a sequence of distribution functions on $\mathbb{R}$ then the definitions $Y_n := \sup\{y : F_n(y) \le X_n\}$ produce a sequence of independent variables $(Y_n)$, $Y_n$ having distribution function $F_n$.

All that is needed to prove these statements in this section is the Uniqueness Lemma 4.7. Note that we now have all the theoretical equipment for the proof of the existence of Wiener measure in Section I.6.

24. Exercises. Do all the exercises preceding Exercise E9.1 in [W]. Of course, the exercises in our Section 13 have important consequences for probability. For example, E13.11a shows that if $X$ and $Y$ are random variables taking values in $(S,\mathcal{B}(S))$, where $(S,\rho)$ is a separable metric space, then $\rho(X,Y)$ is a real-valued random variable.

3. STOCHASTIC PROCESSES

The Daniell-Kolmogorov Theorem

25. $(E^T,\mathcal{E}^T)$; σ-algebras on function space; cylinders and σ-cylinders. Let $(E,\mathcal{E})$ be a measurable space, and let $T$ be a set. 
Recall that $E^T$ is the set of all functions
$f$ from $T$ to $E$. For $t \in T$, define $\pi_t : E^T \to E$ to be the evaluation map

(25.1) $\pi_t(f) := f(t)$.

(25.2) DEFINITION (the σ-algebra $\mathcal{E}^T$). Define the σ-algebra $\mathcal{E}^T := \sigma\{\pi_t : t \in T\}$ on $E^T$. Thus $\mathcal{E}^T$ is the smallest σ-algebra on $E^T$ such that each $\pi_t$ is $(\mathcal{E}^T/\mathcal{E})$-measurable. For $\emptyset \ne S \subseteq T$, define $\pi_S : E^T \to E^S$ to be the restriction map

(25.3) $\pi_S(f) := f|_S$.

Then (check!) $\pi_S$ is $(\mathcal{E}^T/\mathcal{E}^S)$-measurable.

(25.4) DEFINITION (cylinder, special cylinder). We say that a subset $F$ of $E^T$ is a cylinder if $F$ has the form

(25.5) $F = \pi_S^{-1}A_S = A_S \times E^{T \setminus S}$,

for some non-empty finite subset $S$ of $T$ and some $A_S$ in $\mathcal{E}^S$. We say that $F$ is a special cylinder if it has the form

(25.6) $\bigcap_{t \in S} \pi_t^{-1}H_t = \left(\prod_{t \in S} H_t\right) \times E^{T \setminus S}$,

where $S$ is a finite subset of $T$ and $H_t \in \mathcal{E}$ for $t \in S$.

(25.7) LEMMA. The cylinder sets form an algebra that generates $\mathcal{E}^T$. The special cylinder sets form a π-system that generates $\mathcal{E}^T$.

The proof is left as a simple exercise.

(25.8) DEFINITION (σ-cylinder). We say that $F$ is a σ-cylinder if it has the form $F = \pi_S^{-1}A_S$ for some non-empty countable subset $S$ of $T$ and some $A_S$ in $\mathcal{E}^S$.

(25.9) LEMMA. $\mathcal{E}^T$ is precisely the collection of σ-cylinders: thus membership of an element of $\mathcal{E}^T$ imposes restriction on the values of $f$ only at countably many $t$-values.

This result is very important. Its proof is easy. One need only show that the σ-cylinders form a σ-algebra. The key point is that if $(S(n))$ is a sequence of countable subsets of $T$ and for each $n$, $A(n) \in \mathcal{E}^{S(n)}$, then $\bigcap_n \pi_{S(n)}^{-1}A(n) = \pi_S^{-1}A$,
where $S = \bigcup_n S(n)$ and $A = \bigcap_n [A(n) \times E^{S \setminus S(n)}] \in \mathcal{E}^S$.

The notation introduced in this section will be carried forward for some time.

26. Infinite products of probability triples. Section 25 had to be the first section of this discussion of stochastic processes. The present section requires the definition of $\mathcal{E}^T$, but otherwise would belong more properly at the end of part 2 of this chapter.

(26.1) THEOREM. For each $t$ in $T$, let $\mu_t$ be a probability measure on $(E,\mathcal{E})$. Then there exists a unique probability measure $\mu$ on $(E^T,\mathcal{E}^T)$ such that, whenever $S$ is a finite subset of $T$ and $H_t \in \mathcal{E}$ for $t \in S$,

(26.2) $\mu\left(\bigcap_{t \in S} \pi_t^{-1}H_t\right) = \prod_{t \in S} \mu_t(H_t)$.

The uniqueness of $\mu$ is an immediate consequence of Lemma 4.7 and the fact that the special cylinders form a π-system that generates $\mathcal{E}^T$. The existence of $\mu$ is quite a deep matter. Fubini's Theorem implies the existence of a finitely additive measure $\mu_0$ on the algebra of cylinder sets such that the analogue of (26.2) holds for $\mu_0$. Carathéodory's Theorem 5.1 shows that we need only (!) prove that $\mu_0$ is σ-additive on the collection of cylinder sets. After a little thought about Lemma 25.9, we realise that we need only prove the result for the case when $T$ is countable. Compare Observation 30.2 below. A leisurely proof is given in [W; Chapter A9] (you can regard the $(\mathbb{R},\mathcal{B})$ there as a notation for $(E,\mathcal{E})$ for this purpose). You should compare and contrast that proof with the proof given below for the Daniell-Kolmogorov Theorem, for which topological assumptions are necessary. When $(E,\mathcal{E})$ does have suitable topological properties, as is always the case in practice, Theorem 26.1 follows from the DK Theorem.

Note that

(26.3) the $(E,\mathcal{E})$-valued random variables $(\pi_t : t \in T)$ on the probability triple $(E^T,\mathcal{E}^T,\mu)$ are independent, $\pi_t$ having law $\mu_t$.

The problem of the existence of 'completely independent' stochastic processes (and, in particular, of independent, identically distributed (IID) sequences) is therefore settled. 
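For sequences of real-valued variables, the more elementary route recalled in Section 23 (split the binary digits of a single $\omega$ into disjoint subsequences, then apply $Y_n := \sup\{y : F_n(y) \le X_n\}$) is easy to sketch numerically. The code below is an illustration under stated assumptions: pseudo-random bits stand in for the binary digits of $\omega$, each $X_r$ is truncated to finitely many digits, the sup is approximated by bisection on a bracketing interval, and the exponential distribution function is an arbitrary choice of $F_n$. All function names are ours.

```python
import math
import random


def digit_index(row, col):
    """1-based position, in the expansion of omega, of the col-th digit of X_row.

    Digits are handed out along anti-diagonals, so that X_1 uses positions
    1, 3, 6, ..., X_2 uses 2, 5, 9, ..., and X_3 uses 4, 8, 13, ...
    """
    d = row + col - 1
    return d * (d - 1) // 2 + col


def split_uniforms(bits, n_vars, n_digits):
    """Build truncated X_1, ..., X_{n_vars} from bits, where bits[k-1] = omega_k."""
    return [
        sum(bits[digit_index(r, c) - 1] * 2.0 ** (-c) for c in range(1, n_digits + 1))
        for r in range(1, n_vars + 1)
    ]


def sup_inverse(F, u, lo, hi, iters=60):
    """Approximate sup{y : F(y) <= u} by bisection.

    Assumes F is non-decreasing and F(lo) <= u < F(hi).
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if F(mid) <= u:
            lo = mid
        else:
            hi = mid
    return lo


def exp_cdf(y):
    """Distribution function of the exponential law of rate 1 (illustrative F_n)."""
    return 1.0 - math.exp(-y) if y >= 0 else 0.0


# Pseudo-random bits as a stand-in for the binary digits of omega.
n_vars, n_digits = 3, 16
rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(digit_index(n_vars, n_digits))]
xs = split_uniforms(bits, n_vars, n_digits)
ys = [sup_inverse(exp_cdf, x, 0.0, 50.0) for x in xs]
```

The point of the anti-diagonal indexing is that distinct pairs `(row, col)` never share a digit, which is what makes the resulting variables independent. `digit_index` reproduces the index patterns $1,3,6,\ldots$; $2,5,9,\ldots$; $4,8,13,\ldots$ quoted in Section 23.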
We now turn to the study of more general processes. 27. Stochastic process; sample function; law. There are many different ways, all important, of regarding a stochastic process. (27.1) CLASSICAL DEFINITION (stochastic process; state-space; parameter
set; carrier triple). Let $T$ be a set, $(E, \mathcal{E})$ a measurable space and $(\Omega, \mathcal{F}, P)$ a probability triple. The traditional definition of a stochastic process with time-parameter set $T$, state-space $(E, \mathcal{E})$ and carrier triple $(\Omega, \mathcal{F}, P)$ is as a collection $\{X_t : t \in T\}$ of $(E, \mathcal{E})$-valued random variables carried by our triple $(\Omega, \mathcal{F}, P)$. Thus, we have, for each $t$, the picture
$X_t : \Omega \to E$, $X_t^{-1} : \mathcal{E} \to \mathcal{F}$.
(27.2) DEFINITION (sample function; sample path; realisation). Let $X$ be as in (27.1). For $\omega \in \Omega$, the map (element of $E^T$)
$X(\omega) : T \to E$, $t \mapsto X_t(\omega)$
is called the sample function of $X$ (or, especially when $T$ is a time-parameter set, the sample path of $X$) or realisation of $X$ corresponding to $\omega$.
This leads to an alternative view of $X$, namely as the map
(27.3) $X : \Omega \to E^T$, $X^{-1} : \mathcal{E}^T \to \mathcal{F}$, $\omega \mapsto X(\omega)$;
in other words, as an $(E^T, \mathcal{E}^T)$-valued random variable. You can easily check that $X$ is $(\mathcal{F}/\mathcal{E}^T)$-measurable as a map from $\Omega$ to $E^T$ if and only if each $X_t$ is $(\mathcal{F}/\mathcal{E})$-measurable as a map from $\Omega$ to $E$.
(27.4) DEFINITION (law of stochastic process). The law of the stochastic process $X$ in Definition 27.1 is the probability measure
(27.5) $\mu := P \circ X^{-1}$ on $(E^T, \mathcal{E}^T)$;
in other words, it is the law of the $(E^T, \mathcal{E}^T)$-valued random variable $X$.
If we wish to emphasise the role of $(\Omega, \mathcal{F}, P)$ and/or $T$, we shall use such notations as $(\Omega, \mathcal{F}, P; X)$ or $(\Omega, \mathcal{F}, P; \{X_t : t \in T\})$ to signify our process $X$.
28. Canonical process. Let $X$ be the process of Section 27, and let $\mu$ be its law. The process
(28.1) $(E^T, \mathcal{E}^T, \mu; \pi_t : t \in T)$
trivially has the same law $\mu$ as $X$; it is called the canonical process with law $\mu$. A canonical process is completely determined by its law; and, for a canonical process, the idea that a sample point 'is' the outcome of the experiment is
restored. Canonical processes are certainly nice. However, probability theory gets most of its depth from being able to construct (certainly non-canonical!) processes from other processes by time transformation, or as solutions of SDEs, etc.
Important note on terminology. It is important that we are currently working with the space $E^T$ of all functions from a set $T$ into a space $E$ that carries a measurable structure $\mathcal{E}$. When we speak of canonical Brownian motion, we usually mean the set-up where $C = C([0, \infty); \mathbb{R})$ is the space of continuous paths $w : [0, \infty) \to \mathbb{R}$, $\pi_t : C \to \mathbb{R}$ is the evaluation map $\pi_t(w) = w(t)$, $\mathcal{A} = \sigma\{\pi_t : t \in [0, \infty)\}$ and $W$ is Wiener measure. You must keep in mind that at the moment, all paths are allowed.
29. Finite-dimensional distributions, sufficiency; compatibility. We continue with the notation of the last few sections. For a non-empty subset $S$ of $T$, define $\pi_S X : \Omega \to E^S$ via
(29.1) $(\pi_S X)(\omega) := \pi_S(X(\omega)) = X(\omega)|_S$
and
(29.2) $\mu_S := P \circ (\pi_S X)^{-1}$ on $(E^S, \mathcal{E}^S)$.
(29.3) DEFINITION (Fin($T$), finite-dimensional distributions). Let Fin($T$) denote the set of non-empty finite subsets of $T$. The probability measures $\{\mu_S : S \in \mathrm{Fin}(T)\}$ are called the finite-dimensional distributions of $X$.
For $S \in \mathrm{Fin}(T)$,
(29.4) $\mu_S = \mu \circ \pi_S^{-1}$ on $(E^S, \mathcal{E}^S)$.
If we know the finite-dimensional distributions of $X$ then we know the value of $\mu$ on all cylinders, and hence, by Lemmas 4.7 and 25.7, we know $\mu$ on the whole of $(E^T, \mathcal{E}^T)$:
(29.5) the finite-dimensional distributions determine the law.
(This is what is meant by 'sufficiency' in the title of this section: it has nothing to do with Fisher's brilliant concept in statistics.) Do Exercise E38.29.
Note that if $U, V \in \mathrm{Fin}(T)$ and $U \subseteq V$, and if $\pi^V_U$ denotes the restriction map from $E^V$ to $E^U$, then we have the compatibility condition or projective property:
(29.6) $\mu_U = \mu_V \circ (\pi^V_U)^{-1}$.
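The projective property (29.6) is easy to watch at work in a finite example. The sketch below makes assumptions that are ours alone: a two-state Markov chain with an arbitrarily chosen initial law `init` and transition matrix `trans`, whose finite-dimensional distributions are computed directly from $n$-step transition probabilities, and hypothetical helper names `fdd` and `push_forward`.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-state Markov chain on E = {0, 1}: initial law and transition matrix.
init = [Fraction(1, 3), Fraction(2, 3)]
trans = [[Fraction(3, 4), Fraction(1, 4)],
         [Fraction(1, 2), Fraction(1, 2)]]

def n_step(n):
    """n-step transition matrix, by repeated multiplication of 2 x 2 matrices."""
    result = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    for _ in range(n):
        result = [[sum(result[i][k] * trans[k][j] for k in range(2)) for j in range(2)]
                  for i in range(2)]
    return result

def fdd(times):
    """mu_S for S = times (a sorted tuple of non-negative integers), as a dict
    sending each f in E^S (a tuple of states) to its probability."""
    law = {}
    for states in product((0, 1), repeat=len(times)):
        prob = sum(init[i] * n_step(times[0])[i][states[0]] for i in range(2))
        for k in range(1, len(times)):
            prob *= n_step(times[k] - times[k - 1])[states[k - 1]][states[k]]
        law[states] = prob
    return law

def push_forward(mu_V, V, U):
    """mu_V o (pi^V_U)^{-1}: the image of mu_V under restriction from E^V to E^U."""
    keep = [V.index(t) for t in U]
    mu_U = {}
    for f, prob in mu_V.items():
        g = tuple(f[i] for i in keep)  # g is f restricted to U
        mu_U[g] = mu_U.get(g, Fraction(0)) + prob
    return mu_U

V, U = (0, 1, 2), (0, 2)
compatible = push_forward(fdd(V), V, U) == fdd(U)
```

Here `fdd(U)` is computed without reference to `fdd(V)`, so the equality recorded in `compatible` is a real check of (29.6) for this family; it amounts to the Chapman-Kolmogorov equation for the chain.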
The fundamental Daniell-Kolmogorov Theorem considers the following problem. Suppose that we have a family of probability measures as in (29.3) that satisfies the compatibility condition (29.6): does there exist a measure $\mu$ on $(E^T, \mathcal{E}^T)$ such that (29.4) holds? In the language of category theory, we are asking whether a projective system has a projective limit. Rather surprisingly, the answer is 'Not in general'. In order to obtain a positive result, we have to make a topological assumption about the measurable space $(E, \mathcal{E})$; we need the following inner regularity with respect to compact sets.
(29.7) LEMMA. Let $J$ be a compact metric space, and let $B$ be a Borel subset of $J$. If $m$ is a finite measure on $J$ and $\varepsilon > 0$, then there exists a compact subset $K$ of $B$ such that $m(K) > m(B) - \varepsilon$.
This standard result is proved in Section 81.
30. The Daniell-Kolmogorov (DK) Theorem; 'compact metrisable' case. The DK Theorem is the essential first step in constructing stochastic processes. The general case of the theorem is given in the next section. It is well worth presenting the present 'simple' case on its own.
Recall that Fin($T$) is the family of non-empty finite subsets of $T$.
(30.1) THEOREM (Daniell, Kolmogorov). Let $E$ be a compact metrisable space, and let $\mathcal{E} = \mathcal{B}(E)$. Let $T$ be a set. Suppose that for each $S$ in Fin($T$), there exists a probability measure $\mu_S$ on $(E^S, \mathcal{E}^S)$, and that the measures $\{\mu_S : S \in \mathrm{Fin}(T)\}$ are compatible or projective in that
(30.2) $\mu_U = \mu_V \circ (\pi^V_U)^{-1}$
holds whenever $U, V \in \mathrm{Fin}(T)$ and $U \subseteq V$. Here $\pi^V_U$ is the restriction map from $E^V$ into $E^U$. Then there exists a unique measure $\mu$ on $(E^T, \mathcal{E}^T)$ such that
(30.3) $\mu_S = \mu \circ \pi_S^{-1}$ on $(E^S, \mathcal{E}^S)$,
where $\pi_S$ is the restriction map from $E^T$ to $E^S$.
Start of Proof. For any cylinder set $F$, we have for some $S \in \mathrm{Fin}(T)$ and some $A_S \in \mathcal{E}^S$,
(30.4) $F = \pi_S^{-1} A_S = A_S \times E^{T \setminus S}$.
For such an $F$, set
(30.5) $\mu_0(F) := \mu_S(A_S)$.
The compatibility condition (30.2) guarantees that this definition is independent of the particular representation of $F$ used in (30.4). Moreover, it is obvious that
$\mu_0$ is finitely additive on the algebra $\mathcal{C}$ of cylinder sets. We need only show that
(30.6) $\mu_0$ is countably additive on $\mathcal{C}$
since then Carathéodory's Theorem does the rest. The result (4.3) makes it clear that Theorem 30.1 is implied by the following lemma.
(30.7) LEMMA. Suppose that
(i) $F_n \in \mathcal{C}$ $(n \in \mathbb{N})$; $F_n \supseteq F_{n+1}$ $(\forall n)$;
(ii) for some $\varepsilon > 0$, $\mu_0(F_n) > 2\varepsilon$ $(\forall n)$.
Then $\bigcap_n F_n \ne \emptyset$.
Proof of Lemma. Let $(F_n)$ satisfy the hypotheses of Lemma 30.7. We have
(30.8) $F_n = \pi_{S(n)}^{-1} A_n = A_n \times E^{T \setminus S(n)}$
for some $S(n) \in \mathrm{Fin}(T)$ and some $A_n$ in $\mathcal{E}^{S(n)}$. Now, $\mu_{S(n)}$ is a probability measure on the compact metrisable space $E^{S(n)}$, and so, by Lemma 29.7, there is a compact subset $K_n$ of $A_n$ such that
$\mu_{S(n)}(K_n) > \mu_{S(n)}(A_n) - 2^{-n}\varepsilon$.
In other words,
(30.9) $\mu_0(H_n) > \mu_0(F_n) - 2^{-n}\varepsilon$,
where
(30.10) $H_n := K_n \times E^{T \setminus S(n)}$.
Note that $H_n$ is compact, by Tychonov's Theorem. You can easily combine the hypotheses of Lemma 30.7 with (30.9) to prove that
$\mu_0(H_1 \cap \cdots \cap H_n) > \varepsilon$, $\forall n$.
Thus
(30.11) $H_1 \cap \cdots \cap H_n \ne \emptyset$, $\forall n$.
If $\bigcap_k H_k = \emptyset$ then $\bigcup_k H_k^c = E^T$, whence the fact that $E^T$ is compact forces
$\bigcup_{k \le n} H_k^c = E^T$ for some $n$,
contradicting (30.11). Hence $\bigcap_k H_k \ne \emptyset$, whence, a fortiori, $\bigcap_k F_k \ne \emptyset$. Thus Lemma 30.7 and Theorem 30.1 are true. □
Now do Exercise E38.30.
31. The Daniell-Kolmogorov Theorem: general case. The case now to be presented is not the most general known, but it is good enough for us.
(31.1) THEOREM (Daniell, Kolmogorov). Theorem 30.1 remains true if the assumption that $E$ is compact metrisable is replaced by the assumption that $E$ is a Lusin space, that is, $E$ is homeomorphic to a Borel subset of a compact metrisable space.
Remark. Of course, $\mathbb{R}^n$ is homeomorphic to a Borel (indeed, open) subset of a compact metric space. For example, use stereographic projection of the sphere $S^n$ in $\mathbb{R}^{n+1}$.
The following observation will prove useful.
(31.2) OBSERVATION. In proving the theorem, we may assume that $T$ is countable.
Justification of Observation. All of the remarks made up to and including the statement of Lemma 30.7 transfer to the present case. Proving Lemma 30.7 for a fixed sequence $(F_n)$ as in (30.8) is identical to proving the same result when $T = \bigcup_n S(n)$. □
Proof of Theorem 31.1. We suppose that $E \in \mathcal{J}$, where $\mathcal{J} := \mathcal{B}(J)$, $J$ being a compact metrisable space. We do (as we may) assume that $T$ is countable. We derive the theorem directly from Theorem 30.1, making no further use of Lemma 30.7.
For each $S$ in Fin($T$), we extend $\mu_S$ on $(E^S, \mathcal{E}^S)$ to $\hat\mu_S$ on $(J^S, \mathcal{J}^S)$ in the obvious way:
$\hat\mu_S(A_S) := \mu_S(A_S \cap E^S)$ $(A_S \in \mathcal{J}^S)$.
Define $\hat\mu_0$ on the algebra $\hat{\mathcal{C}}$ of cylinder sets associated with $(J, \mathcal{J})$ via the obvious analogue of (30.5). Since $J$ is compact metrisable, we know from Theorem 30.1 that $\hat\mu_0$ has a unique countably additive extension $\hat\mu$ to $(J^T, \mathcal{J}^T)$.
Now, $T$ is countable. We may therefore find a sequence $(T(k))$ of finite sets with $T(k) \uparrow T$. But then
$\hat\mu(E^T) = \downarrow\lim \hat\mu\big(E^{T(k)} \times J^{T \setminus T(k)}\big) = \downarrow\lim \hat\mu_{T(k)}\big(E^{T(k)}\big) = \downarrow\lim 1 = 1$,
and
$\mu(L) := \hat\mu(L \cap E^T)$ $(L \in \mathcal{E}^T)$
obviously defines the required probability measure on $(E^T, \mathcal{E}^T)$ asserted by the theorem.
The proof of the DK Theorem is finished. □
Discussion. Suppose that $E$ is not compact, and that $T$ is uncountable. Then $E^T$ is not an element of $\mathcal{J}^T$, and we cannot say that $\hat\mu(E^T) = 1$. If however, $F$ is any element of $\mathcal{J}^T$ such that $E^T \subseteq F$ then, for some countable subset $S$ of $T$, $F \supseteq E^S \times J^{T \setminus S}$, and $\hat\mu(F) = 1$. Thus the outer $\hat\mu$-measure $\hat\mu^*(E^T)$ of $E^T$ is equal to 1, and what is happening is that
$\mu(F) = \hat\mu^*(F)$ $(F \in \mathcal{E}^T)$.
This kind of thing will keep on happening!
32. Gaussian processes; pre-Brownian motion. In this section and the next we look at some applications of the DK Theorem. We shall soon see in Section 34 that these applications are, as yet, extremely unsatisfactory.
(32.1) Gaussian processes. Let $T$ be a parameter set. Let $m : T \to \mathbb{R}$, and let $V$ be a symmetric non-negative-definite function from $T \times T$ to $\mathbb{R}$, so that, for any finite subset $S$ of $T$ and any function $f$ on $S$,
$\sum_{r \in S} \sum_{s \in S} V(r, s) f(r) f(s) \ge 0$.
We know from elementary theory that, for $S \in \mathrm{Fin}(T)$, there exists a unique measure $\mu_S$ on $(\mathbb{R}^S, \mathcal{B}^S)$ such that, for $\theta \in \mathbb{R}^S$,
(32.2) $\int_{\mathbb{R}^S} \exp\Big[ i \sum_{s \in S} \theta(s) f(s) \Big] \mu_S(df) = \exp\Big[ i \sum_{r \in S} \theta(r) m(r) - \tfrac12 \sum_{r \in S} \sum_{s \in S} \theta(r) V(r, s) \theta(s) \Big]$.
Indeed, if the restriction $V_S$ of $V$ to $S \times S$ is strictly positive-definite then $\mu_S$ has density
(32.3) $(2\pi)^{-|S|/2} (\det V_S)^{-1/2} \exp\Big\{ -\tfrac12 \sum_{r \in S} \sum_{s \in S} [f(r) - m(r)] (V_S^{-1})(r, s) [f(s) - m(s)] \Big\}$
relative to the Lebesgue measure on $\mathbb{R}^S$. We also know from elementary theory that the measures $\{\mu_S : S \in \mathrm{Fin}(T)\}$ are compatible (projective) in the sense of the DK Theorem. Hence we can construct the Gaussian process $(\mathbb{R}^T, \mathcal{B}^T, \mu; \pi_t : t \in T)$ with mean function $m$ and covariance function $V$; this has the projective limit $\mu$ as its law. That $m$ is the mean function and $V$ is the covariance function is confirmed by
$\mu(\pi_t) = m(t)$, $\mu(\pi_s \pi_t) - \mu(\pi_s)\mu(\pi_t) = V(s, t)$.
(32.4) Orthogonality and independence. If $T_1$ and $T_2$ are disjoint subsets of $T$
and $V(t_1, t_2) = 0$ whenever $t_i \in T_i$, $i = 1, 2$, then $\{\pi_{t_1} : t_1 \in T_1\}$ and $\{\pi_{t_2} : t_2 \in T_2\}$ are independent processes.
(32.5) Pre-Brownian motion. If we take
$T = [0, \infty)$, $m(t) = 0$, $V(s, t) = \min(s, t)$
then, as we already know, $V$ is positive-definite. We call the associated canonical process $(\mathbb{R}^T, \mathcal{B}^T, \mu; \pi_t : t \in T)$ pre-Brownian motion. We adjoin the prefix 'pre-' because this process has all possible functions from $[0, \infty)$ to $\mathbb{R}$ as paths, not just continuous functions.
33. Pre-Poisson set functions. Let $(W, \mathcal{W}, \lambda)$ be a σ-finite measure space such that every singleton set $\{x\}$ $(x \in W)$ is in $\mathcal{W}$. By a pre-Poisson set function on $(W, \mathcal{W})$ with intensity measure $\lambda$, we mean a $(\mathbb{Z}^+ \cup \{\infty\})$-valued process (if one exists) $\{\Lambda(B) : B \in \mathcal{W}\}$ with the following properties:
(33.1)(i) for every $B$ in $\mathcal{W}$, $\Lambda(B)$ is a $\mathbb{Z}^+$-valued random variable with the Poisson distribution of parameter $\lambda(B)$:
$\mathrm{Prob}(\Lambda(B) = k) = e^{-\lambda(B)} \lambda(B)^k / k!$ if $\lambda(B) < \infty$,
$\mathrm{Prob}(\Lambda(B) = \infty) = 1$ if $\lambda(B) = \infty$;
(33.1)(ii) if $B_1, \ldots, B_n$ are disjoint elements of $\mathcal{W}$ then $\Lambda(B_1), \ldots, \Lambda(B_n)$ are independent random variables;
(33.1)(iii) whenever $B_1$ and $B_2$ are disjoint elements of $\mathcal{W}$,
$P[\Lambda(B_1 \cup B_2) = \Lambda(B_1) + \Lambda(B_2)] = 1$.
If $S$ is a finite subset of $\mathcal{W}$ then we can easily specify the desired law $\mu_S$ of $\{\Lambda(B) : B \in S\}$. We know from elementary theory that the sum of independent Poisson variables is again Poisson, from which it follows that the family $\{\mu_S : S \in \mathrm{Fin}(\mathcal{W})\}$ is projective. Hence, we can construct a canonical pre-Poisson set function
(33.2) $\big( (\mathbb{Z}^+ \cup \{\infty\})^{\mathcal{W}}, \mathcal{P}(\mathbb{Z}^+ \cup \{\infty\})^{\mathcal{W}}, \mu; \pi_B : B \in \mathcal{W} \big)$
with intensity measure $\lambda$.
Beyond the DK Theorem
34. Limitations of the DK Theorem. Under its hypothesis, the DK Theorem 31.1 provides us with a canonical process $(E^T, \mathcal{E}^T, \mu; \pi_t : t \in T)$ that has all possible functions in $E^T$ as sample functions. Moreover, we know
from Lemma 25.9 that $F \in \mathcal{E}^T$ if and only if $F$ is a σ-cylinder, that is, if and only if $F = \pi_S^{-1} A_S$ for some countable set $S$ and some $A_S$ in $\mathcal{E}^S$.
(34.1) Difficulties with path continuity. Consider the canonical pre-Brownian process $(\mathbb{R}^{[0,\infty)}, \mathcal{B}^{[0,\infty)}, \mu; \pi_t : t \in [0, \infty))$ in (32.5). Let $C$ be the set of continuous functions from $[0, \infty)$ to $\mathbb{R}$. Life would be simple if it were the case that $C \in \mathcal{B}^{[0,\infty)}$ and $\mu(C) = 1$. However, $C \notin \mathcal{B}^{[0,\infty)}$ because $C$ is not a σ-cylinder. Suppose that $F \in \mathcal{B}^{[0,\infty)}$ and $F \subseteq C$. Then
(i) $F = \pi_S^{-1} A_S$ for some countable set $S$ and some $A_S$ in $\mathcal{B}^S$;
(ii) every element of $F$ is continuous on $[0, \infty)$.
Since property (i) tells us nothing about the behaviour of elements of $F$ off the set $S$, we conclude that $F = \emptyset$: if $F$ contained some $f$, then altering $f$ at a single point of $[0, \infty) \setminus S$ would give a discontinuous function that still belongs to $F$, contradicting (ii). We have proved that
(34.2) $F \in \mathcal{B}^{[0,\infty)}$ and $F \subseteq C$ imply that $F = \emptyset$.
Thus $C$ has inner $\mu$-measure 0, and completion certainly will not help us.
(34.3) Difficulties with Poisson measures. Recall the canonical pre-Poisson set function in Section 33. What we really want is that $B \mapsto \pi_B(\omega)$ is a measure for each $\omega$. Let $\mathcal{M} \subseteq (\mathbb{Z}^+ \cup \{\infty\})^{\mathcal{W}}$ be the set of $(\mathbb{Z}^+ \cup \{\infty\})$-valued measures on $(W, \mathcal{W})$. Life would be simple if it were the case that $\mathcal{M} \in \mathcal{P}(\mathbb{Z}^+ \cup \{\infty\})^{\mathcal{W}}$ and $\mu(\mathcal{M}) = 1$. However, you can easily prove, in analogy with (34.2), that if $\mathcal{W}$ is uncountable, as it will be in every case of interest, then
(34.4) $F \in \mathcal{P}(\mathbb{Z}^+ \cup \{\infty\})^{\mathcal{W}}$ and $F \subseteq \mathcal{M}$ imply that $F = \emptyset$.
Note that if $\mathcal{M}_0$ denotes the set of finitely additive measures on $(W, \mathcal{W})$ then, if $\mathcal{W}$ is uncountable, the analogue of (34.4) will hold for $\mathcal{M}_0$.
35. The role of outer measures. Again let $(E, \mathcal{E})$ be a measurable space and $T$ a set. Let $\mu$ be a probability measure on $(E^T, \mathcal{E}^T)$. Let $G \subseteq E^T$. Think of $G$ as a class of good sample functions. Thus $G$ might be $C$ in the context of (34.1), or $\mathcal{M}$ in the context of (34.3). In many contexts, $G$ will be the set of right-continuous paths on $[0, \infty)$, which is a more natural class for probability theory than the set of continuous paths.
Before reading the next lemma, please reread Lemma 6.1.
(35.1) LEMMA and DEFINITION. A process with law $\mu$ exists with all its sample functions in $G$ if and only if the outer $\mu$-measure $\mu^*(G)$ of $G$ is 1. Then $(G, G \cap \mathcal{E}^T, \mu^*; \pi_t : t \in T)$ is the canonical process with path-space $G$ and law $\mu$.
(35.2) COROLLARY. Because of Wiener's Theorem I.6.1, we have $\mu^*(C) = 1$ if $\mu$ is the pre-Brownian law.
It is never at all easy in practice to decide whether or not $\mu^*(G) = 1$. Lemma 35.1 is more a matter of clarifying structure than a useful tool.
36. Modification; indistinguishability. Let $Y$ be a stochastic process with parameter set $T$ and state-space $(E, \mathcal{E})$ carried by the triple $(\Omega, \mathcal{F}, P)$.
(36.1) Important discussion. To indicate the way in which things develop, suppose that $Y$ is an $(\mathbb{R}, \mathcal{B})$-valued process with time-parameter set $[0, \infty)$ and law $\mu$, carried by a triple $(\Omega, \mathcal{F}, P)$. It is often possible, by making heavy use of the structure of $\mu$, to show that there exists a set $\Omega_G$ in $\mathcal{F}$ with $P(\Omega_G) = 1$ such that, for every $\omega \in \Omega_G$, the map $q \mapsto Y_q(\omega)$ from $\mathbb{Q} \cap [0, \infty)$ to $\mathbb{R}$ has a right-continuous extension $t \mapsto X_t(\omega)$ from $[0, \infty)$ to $\mathbb{R}$. If $\omega \notin \Omega_G$, set $X_t(\omega) = 0$ for all $t$. Then all paths of $X$ are right-continuous. It is often further possible to show, again by using the structure of the particular $\mu$, that
$P(X_t = Y_t) = 1$, $\forall t \in T$.
Then $X$ will be a process with law $\mu$, all paths of which are right-continuous. (The set $\Omega_C$ of $\omega$ for which $t \mapsto X_t(\omega)$ is continuous will then be an element of $\mathcal{F}$, so that $P(\Omega_C)$ is meaningful.)
The 'regularisation' method just described, and due to Kolmogorov and Doob, is one of the most powerful and widely used ways of obtaining processes with right-continuous paths. Often, however, we use direct methods of construction such as Ciesielski's proof of the existence of path-continuous Brownian motion in Section I.6. Having obtained this Brownian motion, we can then construct path-continuous diffusions by solving SDEs; and so on.
The good sense of the following definition is now evident.
(36.2) DEFINITION (modification). A process $X$ is called a modification of $Y$ if $X$ has the same state-space, parameter set and carrier triple, and also
$P(X_t = Y_t) = 1$ for every $t \in T$.
Clearly, two processes that are modifications of each other have the same law.
(36.3) Note. It is not necessarily the case (even if $T$ is a singleton set) that if $Y$ has law $\mu$ and $G \subseteq E^T$ satisfies $\mu^*(G) = 1$ then $Y$ has a modification with all paths in $G$. See Exercise E38.36.
The most stringent form of 'near-equality' of processes will now be introduced.
(36.4) DEFINITION (indistinguishable processes). Let $X$ and $Y$ be two processes with the same state-space, parameter set $T$ and carrier triple. We say that $X$ and
$Y$ are indistinguishable (or are equal modulo indistinguishability) if
$P(X_t = Y_t \text{ for all } t \in T) = 1$.
(36.5) PROPOSITION. The following statements hold.
(i) Two indistinguishable processes are modifications of each other.
(ii) If the parameter set is countable then two processes that are modifications of each other are indistinguishable.
(iii) If $X$ and $Y$ are right-continuous processes with values in some Hausdorff space $(E, \mathcal{B}(E))$ then $X$ and $Y$ are modifications of each other if and only if they are indistinguishable.
37. Direct construction of Poisson measures and subordinators, and of local time from the zero set; heuristics; Azéma's martingale. Because Poisson measures are the foundation for excursion theory, this topic is very important for us.
As in Section 33, let $(W, \mathcal{W}, \lambda)$ be a σ-finite measure space in which all singleton sets belong to $\mathcal{W}$. We want to construct a pre-Poisson process $\Lambda$ on $\mathcal{W}$ with intensity measure $\lambda$ such that $B \mapsto \Lambda(B)$ is a measure. We follow Kingman [2]; see also Kingman [4].
First suppose that $\lambda(W) < \infty$. Use Theorem 26.1 to construct a sequence $(N, Z_1, Z_2, \ldots)$ of independent variables on some triple $(\Omega, \mathcal{F}, P)$ where
(i) $N$ has the Poisson distribution with parameter $\lambda(W)$;
(ii) each $Z_k$ takes values in $(W, \mathcal{W})$ and has law $\lambda / \lambda(W)$.
Thus
$P(N = m) = e^{-\lambda(W)} \lambda(W)^m / m!$ $(m = 0, 1, 2, \ldots)$,
$P(Z_k \in B) = \lambda(B) / \lambda(W)$ $(B \in \mathcal{W}, k \in \mathbb{N})$.
We now define $\Lambda$ to be the measure on $(W, \mathcal{W})$ with
$\Lambda(B, \omega) := \sum_{k=1}^{N(\omega)} I_B(Z_k(\omega))$ $(B \in \mathcal{W})$.
Then, for $B \in \mathcal{W}$ and $r, m \in \mathbb{Z}^+$,
$P[\Lambda(B) = r; \Lambda(W \setminus B) = m] = P[N = r + m] \, P[\Lambda(B) = r \mid N = r + m]$
$= \dfrac{e^{-\lambda(W)} \lambda(W)^{r+m}}{(r+m)!} \cdot \dfrac{(r+m)!}{r! \, m!} \Big[ \dfrac{\lambda(B)}{\lambda(W)} \Big]^r \Big[ 1 - \dfrac{\lambda(B)}{\lambda(W)} \Big]^m$
$= \dfrac{e^{-\lambda(B)} \lambda(B)^r}{r!} \cdot \dfrac{e^{-\lambda(W \setminus B)} \lambda(W \setminus B)^m}{m!}$,
so that $\Lambda(B)$ and $\Lambda(W \setminus B)$ are independent and have Poisson distributions with parameters $\lambda(B)$ and $\lambda(W \setminus B)$ respectively. It is easy to give a full proof that $\Lambda$ has the required 'pre-Poisson' (and, indeed, now proper Poisson-measure) structure.
Now consider the case when $\lambda$ is assumed only σ-finite. Write $\lambda = \sum_{n \in \mathbb{N}} \lambda_n$, where each $\lambda_n$ is a finite measure on $(W, \mathcal{W})$. Use Theorem 26.1 and the construction just described for the case '$\lambda$ finite' to construct independent Poisson measures $\Lambda_n$ on $(W, \mathcal{W})$, $\Lambda_n$ having intensity measure $\lambda_n$. Then it is easily verified that
$\Lambda := \sum_n \Lambda_n$
is a Poisson measure with intensity measure $\lambda$.
Of course, it now follows (but it hardly matters) that $\mu^*(\mathcal{M}) = 1$ in the context of (34.3).
Subordinators. Recall from Section I.28 that a Lévy process is a right-continuous process with stationary independent increments and that a subordinator is a Lévy process with non-decreasing paths.
[Note. There are never enough symbols to go round in mathematics. When we combine different ideas, we often find conflict of commonly used notations. In the following discussion, we adjust the notation of Section I.28 so that it does not conflict with that which we have recently been using.]
Let $X$ be a subordinator with $X(0) = 0$. The distribution of $X(1)$ is infinitely divisible: for each $n$, it is the sum of $n$ independent random variables each with the distribution of $X(1/n)$. For $\theta > 0$, we have
(37.1) $E \exp[-\theta X(t)] = \exp[-t \Psi(\theta)]$,
where we can regard the Laplace exponent $\Psi$ as defined by (37.1) with $t = 1$. From Theorem I.28.3, we have, for $\theta > 0$,
(37.2) $\Psi(\theta) = c\theta + \int_{(0,\infty)} (1 - e^{-\theta x}) \, \nu(dx)$,
for some $c \ge 0$ and some measure $\nu$ on $(0, \infty)$, the Lévy measure of $X$, with
(37.3) $\int_{(0,\infty)} \min(x, 1) \, \nu(dx) < \infty$,
the condition guaranteeing that, for some (then all) $\theta > 0$, the integral appearing on the right-hand side of (37.2) is finite.
(37.4) THEOREM (Lévy, Itô). Let $\nu$ satisfy the condition (37.3).
Let $\Lambda$ be a Poisson measure on $(0, \infty) \times (0, \infty)$ with intensity measure $\mathrm{Leb} \times \nu$. Let $c \ge 0$. Define
(37.5) $X(t) := ct + \int_{s \in (0,t]} \int_{x \in (0,\infty)} x \, \Lambda(ds \times dx)$.
Then $X$ is a subordinator with Laplace exponent $\Psi$ as at (37.2).
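When the Lévy measure is finite, the construction of a Poisson measure from a Poisson number of independent points, followed by the recipe (37.5), can be simulated directly. What follows is a rough Monte Carlo sketch and not part of the theory: the choice $\nu(dx) = a e^{-bx} dx$, the parameter values and every function name are assumptions made purely for illustration, and the agreement with (37.1) is only up to sampling error.

```python
import math
import random

def sample_poisson(mean, rng):
    """Sample N ~ Poisson(mean) by counting unit-rate exponential arrivals in [0, mean]."""
    count, clock = 0, rng.expovariate(1.0)
    while clock <= mean:
        count += 1
        clock += rng.expovariate(1.0)
    return count

def sample_poisson_points(t, a, b, rng):
    """Atoms of a Poisson measure on W = (0, t] x (0, inf) with intensity Leb x nu,
    where nu(dx) = a * exp(-b x) dx is an assumed finite Levy measure of mass a / b."""
    total_mass = t * a / b                  # lambda(W)
    n = sample_poisson(total_mass, rng)     # N ~ Poisson(lambda(W))
    # Each atom has law lambda / lambda(W): uniform time, Exp(b) jump size.
    return [(rng.uniform(0.0, t), rng.expovariate(b)) for _ in range(n)]

def subordinator_value(t, c, points):
    """X(t) = c t + sum of the jump sizes x over atoms (s, x) with s <= t, as in (37.5)."""
    return c * t + sum(x for s, x in points if s <= t)

def laplace_exponent(theta, c, a, b):
    """Psi(theta) = c theta + int (1 - e^{-theta x}) a e^{-b x} dx = c theta + a theta / (b (b + theta))."""
    return c * theta + a * theta / (b * (b + theta))

def check(num_paths=20000, t=2.0, c=0.5, a=3.0, b=1.5, theta=1.0, seed=0):
    """Return (Monte Carlo estimate of E exp[-theta X(t)], exact value exp[-t Psi(theta)])."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(num_paths):
        pts = sample_poisson_points(t, a, b, rng)
        acc += math.exp(-theta * subordinator_value(t, c, pts))
    return acc / num_paths, math.exp(-t * laplace_exponent(theta, c, a, b))
```

Calling `check()` returns the pair (estimate, exact value); the two should be close. The sketch covers only the compound Poisson case; an infinite Lévy measure satisfying (37.3) would need a truncation of the small jumps, which is not attempted here.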
Heuristic proof. It is (truly!) obvious from the independence properties of the Poisson measure that $X$ is a subordinator. Formally, the number $J(t, dx)$ of jumps of size between $x$ and $x + dx$ made by $X$ during time-interval $[0, t]$ is Poisson with parameter $\beta := t\nu(dx)$. Now $X(t) = ct + \int x J(t, dx)$, the 'sum' of independent bits. Moreover,
$E \exp[-\theta x J(t, dx)] = \sum_n \dfrac{e^{-\theta x n} e^{-\beta} \beta^n}{n!} = \exp[-\beta(1 - e^{-\theta x})] = \exp[-t(1 - e^{-\theta x}) \nu(dx)]$,
and the result follows.
Exercise. Make this heuristic proof rigorous. First consider the compound Poisson process (see Section I.28) obtained by removing all jumps of size less than $\varepsilon$ from $X$.
The process $H^+ = \{H^+_a : a \ge 0\}$ for Brownian motion. Let $B$ be a path-continuous Brownian motion on $\mathbb{R}$, starting at 0. For $a \ge 0$, define
(37.6) $H^+_a := \inf\{t > 0 : B_t > a\}$.
Then $H^+$ is a right-continuous non-decreasing process; and it is clear from the strong Markov theorem and the spatial homogeneity of Brownian motion that $H^+$ is a subordinator. From (I.9.1), with the $c$ there equal to 0, we have
(37.7) $E \exp(-\theta H^+_a) = \exp[-a \Psi(\theta)]$,
where
(37.8) $\Psi(\theta) = (2\theta)^{1/2} = \int_0^\infty (1 - e^{-\theta x}) (2\pi x^3)^{-1/2} \, dx$,
so that our $c$ equals 0 and $\nu(dx) = (2\pi x^3)^{-1/2} dx$.
Define the continuous non-decreasing process
$S_t := \sup_{s \le t} B_s$
as usual. Recall from (I.14) Lévy's result that $Y := S - B$ defines a reflecting Brownian motion $Y$. The jumps of $H^+$ correspond to intervals of constancy of $S$ and to the intervals between visits to 0 by $Y$.
Let $\mathcal{Z} := \{t : Y_t = 0\}$, the zero-set for the reflecting Brownian motion $Y$. For $t \ge 0$ and $\varepsilon > 0$, let $N(t, \varepsilon)$ denote the number of component intervals of $[0, t] \setminus \mathcal{Z}$ with length greater than $\varepsilon$. We know that $N(H^+_a, \varepsilon)$ has a Poisson distribution of mean $a\nu(\varepsilon, \infty) = a(\tfrac12 \pi \varepsilon)^{-1/2}$. It is easy to prove by using martingale techniques and exploiting monotonicity (see Exercise (79.71c)) that, almost surely,
(37.9) $(\tfrac12 \pi \varepsilon)^{1/2} N(H^+_a, \varepsilon) \to a = S(H^+_a)$ $(\varepsilon \to 0)$
uniformly on compact $a$-intervals, and it follows that, almost surely,
$(\tfrac12 \pi \varepsilon)^{1/2} N(t, \varepsilon) \to S(t)$ $(\varepsilon \to 0)$
uniformly on compact $t$-intervals. We have therefore constructed the local time $\ell = S$ for the reflecting Brownian motion $Y$ directly from the set $\mathcal{Z}$ of times at which $Y = 0$. A striking and difficult result due to Lévy, Wendel, Taylor and Hawkes (see Hawkes [2]) tells us that $\ell(t)$ is the Hausdorff $h$-measure of $\mathcal{Z} \cap [0, t]$ associated with the function
$h(\delta) := [2\delta \log\log(1/\delta)]^{1/2}$.
Heuristics. Chapter II is, as you will agree, following a Definition-Lemma-Theorem approach. For the remainder of this section, however, we cast off the shackles of rigour, for interest's sake. We shall return later to many of the points considered here.
The intervals comprising the set $[0, \infty) \setminus \mathcal{Z}$ are the excursion intervals of $Y$ (away from 0). The lengths of these intervals are determined by the measure $\nu$. We might therefore conclude as a heuristic principle that, given that an excursion interval is of length at least $a$, the probability that it is of length at least $y$ is
(37.10) $\dfrac{\nu(y, \infty)}{\nu(a, \infty)} = \Big( \dfrac{a}{y} \Big)^{1/2}$.
We are now going to work with our BM$_0$ process $B$ rather than $Y$. The zero-set for $B$ is the same as that for the reflecting Brownian motion $|B|$, and so has the same structure as $\mathcal{Z}$. Let $t > 0$, and define
(37.11) $\alpha_t := t - \sup\{s \le t : B_s = 0\}$, $\beta_t := \inf\{u > t : B_u = 0\} - t$.
You might guess on the basis of (37.10) that
(37.12) $P(\beta_t > y \mid \alpha_t = a) = \Big( \dfrac{a}{a + y} \Big)^{1/2}$,
and Exercise (37b) in Section 38 shows that you would be right.
Azéma's martingale. Define Azéma's process $J$ and its natural filtration $\{\mathcal{J}_t\}$ by
(37.13) $J_t := \mathrm{sgn}(B_t)(2\alpha_t)^{1/2}$, $\mathcal{J}_t := \sigma\{J_s : s \le t\}$.
(37.14) THEOREM (Azéma). $\{J_t\}$ and $\{J_t^2 - t\}$ are martingales relative to $\{\mathcal{J}_t\}$.
These processes are not martingales relative to the natural filtration of $B$.
Let us see how Azéma's Theorem ties in with (37.12).
(37.15) Suppose that $\alpha_t = a$ for some fixed $t$ and $a$ with $0 < a < t$. Let $u > a$, and let $T$ be the first time after time $t$ that $\alpha_T$ is either $u$ or 0.
Then, either, and with probability $(a/u)^{1/2}$,
$\alpha_T = u$ and $T - t = u - a$,
or, and with probability $(a/4v^3)^{1/2} dv$, for some $v$ with $a < v < u$,
$\alpha_T = 0$ and $T - t \in (v, v + dv) - a$.
Elementary calculations now show that, conditionally on (37.15),
$E J_T = J_t$, $E(J_T^2 - T) = J_t^2 - t$,
and these results start to make Azéma's Theorem plausible.
A much better explanation of Azéma's result is provided by the following facts: for $x \in \mathbb{R}$ and $a \le t$,
(37.16) $P(\alpha_t \in da; B_t \in dx) = \dfrac{1}{\pi [(t - a) a]^{1/2}} \cdot \dfrac{|x|}{2a} \exp\Big( -\dfrac{x^2}{2a} \Big) \, da \, dx$,
whence
(37.17) $P(B_t \in dx \mid \alpha_t \in da, \mathrm{sgn}(B_t) > 0) = \dfrac{x}{a} \exp\Big( -\dfrac{x^2}{2a} \Big) dx$ $(x > 0)$.
See Exercise (37a) in the next section. If $\xi$ is a positive random variable with the probability density function on the right-hand side of (37.17) then
$E(\xi) = (\tfrac12 \pi a)^{1/2}$ and $E(\xi^2) = 2a$.
This strongly suggests that
(37.18) $E(B_t \mid \mathcal{J}_t) = (\tfrac14 \pi)^{1/2} J_t$, $E(B_t^2 - t \mid \mathcal{J}_t) = J_t^2 - t$;
and hence $\{J_t\}$ and $\{J_t^2 - t\}$ inherit their martingale properties relative to $\{\mathcal{J}_t\}$ from those of $\{B_t\}$ and $\{B_t^2 - t\}$ relative to the natural filtration of $B$. See Exercise E79.71b.
If $\ell$ now denotes local time at 0 for $B$, and $\tau_t := \inf\{u : \ell(u) > t\}$, then the process $R$, where $R_t$ is the sum of the moduli of the jumps of $J$ by time $\tau_t$, is clearly a subordinator. However, the number of jumps of modulus greater than $\varepsilon$ made by $J$ by time $\tau_t$ equals the number of jumps of modulus greater than $\tfrac12 \varepsilon^2$ made by $\alpha$ by time $\tau_t$; and this, like $N(H^+_t, \tfrac12 \varepsilon^2)$, has a Poisson distribution of mean $t(\tfrac14 \pi)^{-1/2} \varepsilon^{-1}$. Thus the Lévy measure $\nu_R$ of $R$ satisfies $\nu_R(\varepsilon, \infty) = (\tfrac14 \pi)^{-1/2} \varepsilon^{-1}$; but this fails the integrability condition (37.3). You see the problem:
(37.19) Azéma's martingale $J$ is not of finite variation.
Azéma's martingale has been the source of much interest recently, particularly to workers in quantum probability. See Azéma [1], Azéma and Yor [3], Emery [2], Meyer [12] and Revuz and Yor [1].
38. Exercises.
(E38.25) Extend exercise E13.3b as follows. Show that if $(E, \mathcal{E})$ is a measurable space and $T$ is a parameter set, then a function $\xi : E^T \to \mathbb{R}$ is $\mathcal{E}^T$-measurable if and only if $\xi = f \circ \pi_S$ for some countable subset $S$ of $T$ and some $\mathcal{E}^S$-measurable function $f : E^S \to \mathbb{R}$.
(E38.29) Time-reversal for Brownian motion. Fix $t \ge 0$. We consider Brownian motion with time-parameter set $[0, t]$. Let $\Omega := C([0, t]; \mathbb{R})$, and, for $\omega \in \Omega$ and $s \in [0, t]$, write $B_s(\omega) := \omega(s)$, and define $\mathcal{A} := \sigma\{B_s : s \le t\}$. Let $P^x$ be the law of Brownian motion starting at $x$, so that $P^x$ is the unique measure on $(\Omega, \mathcal{A})$ such that for $n \in \mathbb{N}$, for $0 = s_0 < s_1 < \cdots < s_n$ and $x_0, x_1, \ldots, x_n \in \mathbb{R}$ with $x_0 = x$, we have
$P^x\Big( \bigcap_{i=1}^n \{B(s_i) \in dx_i\} \Big) = \prod_{i=1}^n p(s_i - s_{i-1}; x_{i-1}, x_i) \, dx_i$,
this making rigorous sense when integrated over $x_1, x_2, \ldots, x_n$ in a Borel subset of $\mathbb{R}^n$. Let $\hat{\ }$ be the time-reversal map on $\Omega$, so that
$\hat\omega(s) := \omega(t - s)$ $(0 \le s \le t)$.
For $\xi \in m\mathcal{A}$, define $\hat\xi(\omega) := \xi(\hat\omega)$. Prove that, for $\xi \in (m\mathcal{A})^+$,
$\int_{\mathbb{R}} E^x(\xi) \, dx = \int_{\mathbb{R}} E^y(\hat\xi) \, dy$ $(\le \infty)$.
There are some measurability questions involved, which we shall study in detail later; do not fret over these.
Hint. First take $\xi = I_A(B_0) I_H(B_s) I_C(B_t)$, where $0 \le s \le t$, and $A$, $H$ and $C$ are Borel subsets of $\mathbb{R}$. Because $\int P^x \, dx$ is not a finite measure, you will have to do some truncation. But the idea is the same as that used to show that the finite-dimensional distributions determine the law of a process.
(E38.30) Lebesgue measure from coin tossing. Show that one can reverse the argument in Section 23 as follows. Use the DK theorem to construct a sequence of independent variables $(\xi_k : k \in \mathbb{N})$ each taking the values 0 and 1 with probability $\tfrac12$ each. Define
$X := \sum_k 2^{-k} \xi_k$.
Then Lebesgue measure on $([0, 1], \mathcal{B}[0, 1])$ is the law of $X$.
(E38.36) The object of this exercise is to confirm the point made in Note 36.3. Let
$\Omega := [0, 1]$, $\mathcal{H} := \mathcal{B}[0, 1]$, $\mu := \mathrm{Leb}$ on $(\Omega, \mathcal{H})$,
and let $\mu^*$ be the outer measure associated with $(\mu, \mathcal{H})$. Let $G$ be a subset of $\Omega$ with $\mu^*(G) = 1$ and $\mu^*(G^c) = 1$, where, of course, $G^c := \Omega \setminus G$. Define
$\mathcal{F} := \sigma(\mathcal{H}, G)$ and $P(F) := \mu^*(G^c \cap F)$ $(F \in \mathcal{F})$.
Prove that $(\Omega, \mathcal{F}, P)$ is a probability triple. See Lemma 6.1. Let $Y(\omega) = \omega$ for
$\omega \in \Omega$. Prove that $Y$ has law $\mu$, but that, even though $\mu^*(G) = 1$, there is no modification of $Y$ taking all its values in $G$.
(E38.37a) Use Exercise E38.29 and the result (I.9.2) (with $c$ there equal to 0) to prove (37.16). Now deduce (37.17).
Hint. Consider $\xi := f(B_0) g(\alpha_t) h(B_t) \eta$, where
$\eta := 1$ if $B_s = 0$ for some $s \in [0, t]$, and $\eta := 0$ otherwise.
(E38.37b) Prove (37.12).
(E38.37c) Modify the last two exercises to cope with the case when the Brownian motion $B$ has drift $c$.
4. DISCRETE-PARAMETER MARTINGALE THEORY
Again, we follow [W] very closely. There, you will find the same notation, all proofs not given here, and many illustrative examples. Neveu [5] gives a fine broader picture of the scope of discrete-parameter martingale theory. Of course, Doob [1] is the classic account. In that account, Doob emphasises the debt we owe to Sparre Andersen and Jessen for their work on uniformly integrable martingales.
After revising the theory of conditional expectation (due, of course, to Kolmogorov), we concentrate on the Upcrossing Lemma, the Submartingale Inequality, results on uniform integrability, the Optional-Stopping Theorem, the Optional-Sampling Theorem (all, of course, due to Doob), and the 'Downward' Convergence Theorem for supermartingales due to Lévy and Doob. These results are central to the extension to the continuous-parameter theory, which occupies Part 5 of this chapter and dominates the remainder of both volumes.
CONVENTION: Until further notice, all random variables are $(\mathbb{R}, \mathcal{B})$-valued.
Conditional expectation
39. Fundamental theorem and definition. The following theorem and definition constitute the greatest of Kolmogorov's many contributions to the subject.
(39.1) THEOREM and DEFINITION (a version of the conditional expectation $E(X \mid \mathcal{G})$). Let $(\Omega, \mathcal{F}, P)$ be a triple, and $X$ a random variable with $E(|X|) < \infty$.
Let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. Then there exists a random variable $Y$ such that
(i) $Y$ is $\mathcal{G}$-measurable;
(ii) $E(|Y|) < \infty$;
(iii) for every set $G$ in $\mathcal{G}$ (equivalently, for every set $G$ in some π-system that contains $\Omega$ and generates $\mathcal{G}$), we have
$E(Y; G) = E(X; G)$.
Moreover, if $\tilde Y$ is another random variable with these properties then $\tilde Y = Y$, a.s., that is, $P[\tilde Y = Y] = 1$. A random variable $Y$ with properties (i)-(iii) is called a version of the conditional expectation $E(X \mid \mathcal{G})$ of $X$ given $\mathcal{G}$, and we write $Y = E(X \mid \mathcal{G})$, a.s.
The Radon-Nikodým proof will be given shortly.
(39.2) The intuitive meaning. An experiment has been performed. The only information available to you regarding which sample point $\omega$ has been chosen is the set of values $Z(\omega)$ for every $\mathcal{G}$-measurable random variable $Z$, or, equivalently, the values $I_G(\omega)$ for every $G \in \mathcal{G}$. Then $Y(\omega) = E(X \mid \mathcal{G})(\omega)$ is regarded as (almost surely equal to) the 'expected value of $X(\omega)$ given this information'.
Note that if $\mathcal{G}$ is the trivial σ-algebra $\{\emptyset, \Omega\}$ (which contains no information) then $E(X \mid \mathcal{G})(\omega) = E(X)$ for all $\omega$.
Proof of Theorem 39.1.
Existence. Suppose that $X \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$. Consider first the case when $X \ge 0$. Then, as we saw in Section 9, the map $G \mapsto E(X; G)$ is a finite measure on $(\Omega, \mathcal{G})$ that is absolutely continuous with respect to $P$. Hence, by the Radon-Nikodým Theorem 9.3, there exists a $Y$ in $\mathcal{L}^1(\Omega, \mathcal{G}, P)$ such that
(39.3) $E(Y; G) = E(X; G)$ for all $G \in \mathcal{G}$.
The existence of $Y$ is therefore established; and the general case when $X \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$ follows by linearity.
Uniqueness. If $\tilde Y$ and $Y$ are in $\mathcal{L}^1(\Omega, \mathcal{G}, P)$ and $E(\tilde Y - Y; G) \le 0$ for every $G$ in $\mathcal{G}$ then $\tilde Y \le Y$, a.s. For consider $E(\tilde Y - Y; G_n)$, where $G_n := \{\omega : (\tilde Y - Y)(\omega) > n^{-1}\}$, etc.
The π-system formulation. Suppose that $X \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$, $Y \in \mathcal{L}^1(\Omega, \mathcal{G}, P)$, and that $E(Y; G) = E(X; G)$ for every $G$ in some π-system containing $\Omega$ and generating $\mathcal{G}$. By the Dominated-Convergence Theorem 8.5, the class of sets $G$ in $\mathcal{G}$ for which (39.3) holds is a d-system in the sense at (1.6).
By Dynkin's Lemma 1.8, this class must coincide with $\mathcal{G}$. □

40. Notation; agreement with elementary usage. We often write $E(X|Z)$ for $E(X|\sigma(Z))$, $E(X|Z_1, Z_2, \ldots)$ for $E(X|\sigma(Z_1, Z_2, \ldots))$, etc. The case of two RVs will suffice to illustrate the connection between the abstract definition 39.1 and elementary conditional expectation. So suppose
that $X$ and $Z$ are RVs that have a joint probability density function (pdf) $f_{X,Z}(x, z)$. Then $f_Z(z) = \int_{\mathbb{R}} f_{X,Z}(x, z)\,dx$ acts as a probability density function for $Z$. Define the elementary conditional pdf $f_{X|Z}$ of $X$ given $Z$ via
$$f_{X|Z}(x|z) := \begin{cases} f_{X,Z}(x, z)/f_Z(z) & \text{if } f_Z(z) \ne 0, \\ 0 & \text{otherwise.} \end{cases}$$
Let $h$ be a Borel function on $\mathbb{R}$ such that
$$E|h(X)| = \int_{\mathbb{R}} |h(x)| f_X(x)\,dx < \infty,$$
where of course $f_X(x) = \int_{\mathbb{R}} f_{X,Z}(x, z)\,dz$ gives a pdf for $X$. Set
$$g(z) := \int_{\mathbb{R}} h(x) f_{X|Z}(x|z)\,dx.$$
Then $Y := g(Z)$ is a version of the conditional expectation of $h(X)$ given $\sigma(Z)$.

Proof. The typical element of $\sigma(Z)$ has the form $\{\omega : Z(\omega) \in B\}$, where $B \in \mathcal{B}$. Hence we must show that $E[h(X) I_B(Z)] = E[g(Z) I_B(Z)]$. But this follows from Fubini's Theorem. □

41. Properties of conditional expectation: a list. This is the same list of properties as in Section 9.7 (and on the back cover!) of [W]. All $X$s satisfy $E(|X|) < \infty$ in this list of properties. Of course, $\mathcal{G}$ and $\mathcal{H}$ denote sub-σ-algebras of $\mathcal{F}$. (The use of 'c' to denote 'conditional' in (cMon) etc. is obvious.)

(41)(a) If $Y$ is any version of $E(X|\mathcal{G})$ then $E(Y) = E(X)$.
(41)(b) If $X$ is $\mathcal{G}$-measurable then $E(X|\mathcal{G}) = X$, a.s.
(41)(c) (Linearity) $E(a_1 X_1 + a_2 X_2|\mathcal{G}) = a_1 E(X_1|\mathcal{G}) + a_2 E(X_2|\mathcal{G})$, a.s. Clarification. If $Y_1$ is a version of $E(X_1|\mathcal{G})$ and $Y_2$ is a version of $E(X_2|\mathcal{G})$ then $a_1 Y_1 + a_2 Y_2$ is a version of $E(a_1 X_1 + a_2 X_2|\mathcal{G})$.
(41)(d) (Positivity) If $X \ge 0$ then $E(X|\mathcal{G}) \ge 0$, a.s.
(41)(e) (cMon) If $0 \le X_n \uparrow X$ then $E(X_n|\mathcal{G}) \uparrow E(X|\mathcal{G})$, a.s.
(41)(f) (cFatou) If $X_n \ge 0$ then $E[\liminf X_n|\mathcal{G}] \le \liminf E[X_n|\mathcal{G}]$, a.s.
(41)(g) (cDom) If $|X_n(\omega)| \le K(\omega)$, $\forall n$, $EK < \infty$, and $X_n \to X$, a.s., then $E(X_n|\mathcal{G}) \to E(X|\mathcal{G})$, a.s.
(41)(h) (cJensen) If $c : \mathbb{R} \to \mathbb{R}$ is convex and $E|c(X)| < \infty$ then $E[c(X)|\mathcal{G}] \ge c(E[X|\mathcal{G}])$, a.s. Important corollary. $\|E(X|\mathcal{G})\|_p \le \|X\|_p$ for $p \ge 1$.
(41)(i) (Tower Property) If $\mathcal{H} \subseteq \mathcal{G} \subseteq \mathcal{F}$ then $E[E(X|\mathcal{G})|\mathcal{H}] = E[X|\mathcal{H}]$, a.s. Note. We shorten the left-hand side to $E[X|\mathcal{G}|\mathcal{H}]$ for tidiness.
(41)(j) ('Taking out what is known') If $Z$ is $\mathcal{G}$-measurable and bounded then
(*) $E[ZX|\mathcal{G}] = Z E[X|\mathcal{G}]$, a.s.
If $p > 1$, $p^{-1} + q^{-1} = 1$, $X \in \mathcal{L}^p(\Omega, \mathcal{F}, P)$ and $Z \in \mathcal{L}^q(\Omega, \mathcal{G}, P)$ then (*) again holds. If $X \in (m\mathcal{F})^+$, $Z \in (m\mathcal{G})^+$, $E(X) < \infty$ and $E(ZX) < \infty$ then (*) holds.
(41)(k) (Role of independence) If $\mathcal{H}$ is independent of $\sigma(\sigma(X), \mathcal{G})$ then $E[X|\sigma(\mathcal{G}, \mathcal{H})] = E(X|\mathcal{G})$, a.s. In particular, if $X$ is independent of $\mathcal{H}$ then $E(X|\mathcal{H}) = E(X)$, a.s.

For proofs of all the above properties, see Section 9.8 of [W]. Do Exercise E60.41 now.

42. The role of versions; regular conditional probabilities and pdfs. If we consider conditional expectation as a map from $L^1(\Omega, \mathcal{F}, P)$ to $L^1(\Omega, \mathcal{G}, P)$, these spaces being the proper Banach spaces of equivalence classes of functions, then this map is truly uniquely defined with no untidy 'almost sure' qualifications and no need for 'versions'. So why not do this in this 'elegant' way? The answer is that the ability to choose 'good' versions (of the functions, not the equivalence classes) is absolutely crucial to the whole theory. We shall repeatedly see cases where we get the good results by modifying random variables on null sets. Thus, for example, we want modifications of martingales that have right-continuous paths; and concepts of path regularity are meaningless if we work with equivalence classes.

In the remainder of this section, and in the next, we consider a rather important case that in some, though not all, respects parallels our earlier discussion of Poisson measures.

(42.1) DEFINITION (a version of conditional probability, $P(F|\mathcal{G})$). Let $F \in \mathcal{F}$ and let $\mathcal{G} \subseteq \mathcal{F}$. We call any version of $E(I_F|\mathcal{G})$ a version of the conditional probability of $F$ given $\mathcal{G}$, and write $P(F|\mathcal{G}) = E(I_F|\mathcal{G})$, a.s.
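On a finite sample space the definitions above become completely explicit: a sub-σ-algebra is generated by a partition, and a version of $E(X|\mathcal{G})$ is the $P$-weighted average of $X$ over each atom. The following is a minimal sketch under that assumption, using three fair coin tosses; the names `cond_exp`, `expect`, `G` and `H` are illustrative and do not come from the text. It checks properties (41)(a), (41)(i), (41)(j) and (41)(k) in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Sample space: three fair coin tosses, each outcome of probability 1/8.
OMEGA = list(product((0, 1), repeat=3))
P = {w: Fraction(1, 8) for w in OMEGA}

def cond_exp(X, key):
    """A version of E(X | sigma(key)) on the finite space OMEGA.

    The sigma-algebra is generated by the level sets of `key`; on each
    such atom the conditional expectation is the weighted average of X.
    """
    mass, total = {}, {}
    for w in OMEGA:
        k = key(w)
        mass[k] = mass.get(k, 0) + P[w]
        total[k] = total.get(k, 0) + P[w] * X(w)
    return lambda w: total[key(w)] / mass[key(w)]

def expect(X):
    return sum(P[w] * X(w) for w in OMEGA)

X = lambda w: w[0] + 2 * w[1] + 4 * w[2]   # an arbitrary random variable
G = lambda w: (w[0], w[1])                 # information: first two tosses
H = lambda w: w[0]                         # coarser: first toss only

Y = cond_exp(X, G)                         # a version of E(X|G)
tower_lhs = cond_exp(Y, H)                 # E[E(X|G)|H]
tower_rhs = cond_exp(X, H)                 # E[X|H]
Z = lambda w: 1 + w[1]                     # bounded and G-measurable
taking_out_lhs = cond_exp(lambda w: Z(w) * X(w), G)
P_F = cond_exp(lambda w: w[2], G)          # P(F|G) for F = {third toss is 1}
```

Here `P_F` is the constant 1/2, as (41)(k) predicts, because the third toss is independent of the first two. On a finite space with no null atoms there is only one version, so the 'almost sure' qualifications of the general theory do not show up in this sketch.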
By (41)(c) and the 'cMon' result (41)(e), we can show that for a fixed sequence $(F_n)$ of disjoint elements of $\mathcal{F}$, we have
(42.2) $P(\bigcup F_n|\mathcal{G}) = \sum P(F_n|\mathcal{G})$, a.s.
Except in trivial cases, there are uncountably many sequences of disjoint sets; and it is therefore not at all clear that we can choose a good modification $\{(P|\mathcal{G})(F) : F \in \mathcal{F}\}$ of the process $\{P(F|\mathcal{G}) : F \in \mathcal{F}\}$. Let us formulate what we mean by a good modification.
(42.3) DEFINITION (regular conditional probability given $\mathcal{G}$). Let $(\Omega, \mathcal{F}, P)$ be a triple and let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. By a regular conditional probability $(P|\mathcal{G})(\cdot, \cdot)$ given $\mathcal{G}$, we mean a map
(42.4)(a) $(P|\mathcal{G}) : \mathcal{F} \times \Omega \to [0, 1]$
such that
(42.4)(b) for $F \in \mathcal{F}$, the function $\omega \mapsto (P|\mathcal{G})(F, \omega)$ is a version of $P(F|\mathcal{G})$;
(42.4)(c) for almost every $\omega$, the map $F \mapsto (P|\mathcal{G})(F, \omega)$ is a probability measure on $\mathcal{F}$.

It is known, and is proved in Section 89, that regular conditional probabilities exist under most conditions encountered in practice, but, as we shall see in the next section, they do not always exist.

Note. The elementary conditional pdf $f_{X|Z}(x|z)$ of Section 40 is a regular conditional pdf for $X$ given $Z$ in that for every $A$ in $\mathcal{B}$,
$$\omega \mapsto \int_A f_{X|Z}(x|Z(\omega))\,dx \text{ is a version of } P(X \in A|Z).$$
Proof. Take $h = I_A$ in Section 40.

43. A counterexample. This counterexample, for which Halmos, Dieudonné, Andersen and Jessen share the credit, exhibits a situation in which no regular conditional probability given $\mathcal{G}$ exists. It helps emphasise why we need some extra 'topological' hypothesis such as that used for the positive result in Section 89.

Take $(\Omega, \mathcal{G}) := ([0, 1], \mathcal{B}[0, 1])$. Let $\mu$ denote Lebesgue measure on $(\Omega, \mathcal{G})$. Let $Z$ be a subset of $\Omega$ of inner $\mu$-measure 0 and outer $\mu$-measure 1. (We are assuming the Axiom of Choice!) Let $\mathcal{F}$ be the smallest σ-algebra on $\Omega$ extending $\mathcal{G}$ and containing $Z$, so that a typical element $\Lambda$ of $\mathcal{F}$ may be written (with $Z^c$ denoting $[0, 1] \setminus Z$)
$$\Lambda = (Z \cap A) \cup (Z^c \cap B), \text{ where } A, B \in \mathcal{G}.$$
The fact that $Z$ and $Z^c$ have outer measure 1 implies (see Lemma 6.1) that
$$\mu^*(Z \cap \Lambda) = \mu^*(Z \cap A) = \mu(A), \qquad \mu^*(Z^c \cap \Lambda) = \mu^*(Z^c \cap B) = \mu(B).$$
Hence we can define a probability measure $P$ on $(\Omega, \mathcal{F})$ by
$$P(\Lambda) := \tfrac12 \mu^*(Z \cap \Lambda) + \tfrac12 \mu^*(Z^c \cap \Lambda) = \tfrac12 \mu(A) + \tfrac12 \mu(B).$$
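Assume that $(P|\mathcal{G}) : \mathcal{F} \times \Omega \to [0, 1]$ is a regular conditional probability given $\mathcal{G}$. We shall show that this assumption leads to a contradiction. Let $\Gamma \in \mathcal{G}$. Then, for $G \in \mathcal{G}$,
$$E((P|\mathcal{G})(Z \cap \Gamma); G) = P(Z \cap \Gamma \cap G) = \tfrac12 \mu^*(Z \cap \Gamma \cap G) = \tfrac12 \mu(\Gamma \cap G) = E(\tfrac12 I_\Gamma; G),$$
so that
$$(P|\mathcal{G})(Z \cap \Gamma, \omega) = \tfrac12 I_\Gamma(\omega), \text{ a.s.}$$
Since $\mathcal{G}$ is generated by a countable π-system $\mathcal{I}$, and since
$$\Gamma \mapsto (P|\mathcal{G})(Z \cap \Gamma, \omega), \qquad \Gamma \mapsto \tfrac12 I_\Gamma(\omega)$$
are measures for every $\omega$ (the first because of our assumption), the set
$$J := \{\omega : (P|\mathcal{G})(Z \cap \Gamma)(\omega) = \tfrac12 I_\Gamma(\omega), \forall \Gamma \in \mathcal{G}\} = \{\omega : (P|\mathcal{G})(Z \cap \Gamma)(\omega) = \tfrac12 I_\Gamma(\omega), \forall \Gamma \in \mathcal{I}\}$$
is in $\mathcal{G}$, and $P(J) = \mu(J) = 1$. (The argument now takes on a 'Russell's paradox' appearance. The set $J$ is itself an element of $\mathcal{G}$.) If $\omega \in J$ then
$$(P|\mathcal{G})(Z \cap J, \omega) = \tfrac12 I_J(\omega) \ne \tfrac12 I_{J \setminus \{\omega\}}(\omega) = (P|\mathcal{G})(Z \cap [J \setminus \{\omega\}], \omega),$$
so that $Z \cap J \ne Z \cap [J \setminus \{\omega\}]$; in other words, $\omega \in Z$. Hence $J$, which is an element of $\mathcal{G}$ of measure 1, is a subset of $Z$, contradicting the fact that $Z$ has inner measure 0.

44. A uniform-integrability property of conditional expectations. The reason that the martingale and UI properties tie in so well is the following.

(44.1) THEOREM (Doob). Let $X \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$. Then the family
$$\{E(X|\mathcal{G}) : \mathcal{G} \text{ is a sub-σ-algebra of } \mathcal{F}\}$$
is uniformly integrable.

Clarification. Because of the business of versions, a formal description of the family in question would be the set of all random variables $Y$ with the property that for some σ-algebra $\mathcal{G} \subseteq \mathcal{F}$, $Y = E(X|\mathcal{G})$, a.s.

Proof. Let $\varepsilon > 0$ be given. Use Lemma 20.1 to choose $\delta > 0$ such that, for $F \in \mathcal{F}$, $P(F) < \delta$ implies that $E(|X|; F) < \varepsilon$. Choose $K$ so that $K^{-1} E(|X|) < \delta$. Now let $\mathcal{G} \subseteq \mathcal{F}$ and let $Y$ be a version of $E(X|\mathcal{G})$. By the 'cJensen' property in Section 41,
(44.2) $|Y| \le E(|X|\,|\mathcal{G})$, a.s.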
Hence, as in fact we already know, $E(|Y|) \le E(|X|)$, and
$$K P(|Y| > K) \le E(|Y|) \le E(|X|),$$
so that $P(|Y| > K) < \delta$. But $\{\omega : |Y(\omega)| > K\} \in \mathcal{G}$, and, from (44.2) and the definition of conditional expectation,
$$E(|Y|; |Y| > K) \le E(|X|; |Y| > K) < \varepsilon,$$
and this is the desired uniform-integrability property. □

(Discrete-parameter) martingales and supermartingales

45. Filtration, filtered space; adapted process; natural filtration. Let $(\Omega, \mathcal{F}, P)$ be given.

(45.1) DEFINITIONS (filtration, filtered space). By a filtration on $(\Omega, \mathcal{F}, P)$, we mean an increasing family $\{\mathcal{F}_n : n \in \mathbb{Z}^+\}$ of sub-σ-algebras of $\mathcal{F}$:
$$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}.$$
The set-up $(\Omega, \mathcal{F}, P, \{\mathcal{F}_n : n \in \mathbb{Z}^+\})$ is then called a filtered space.

(45.2) CONVENTION. Until further notice, we assume given a filtered space $(\Omega, \mathcal{F}, P, \{\mathcal{F}_n : n \in \mathbb{Z}^+\})$. All of our martingales, supermartingales etc. will be defined relative to this set-up.

(45.3) DEFINITION (adapted process). A process $X = \{X_n : n \in \mathbb{Z}^+\}$ carried by $(\Omega, \mathcal{F}, P)$ is said to be adapted (to our given filtration) if for every $n \in \mathbb{Z}^+$, $X_n$ is $\mathcal{F}_n$-measurable.

(45.4) DEFINITION (natural filtration). Let $W = \{W_n : n \in \mathbb{Z}^+\}$ be a stochastic process carried by our triple $(\Omega, \mathcal{F}, P)$. The natural filtration $\{\mathcal{W}_n : n \in \mathbb{Z}^+\}$ of $W$ is defined to be the smallest filtration relative to which $W$ is adapted, so that $\mathcal{W}_n = \sigma(W_0, W_1, \ldots, W_n)$.

(45.5) The intuitive meanings. The information about the chosen $\omega$ that is available to us at time $n$ consists of the values $Z_n(\omega)$ for every $\mathcal{F}_n$-measurable random variable $Z_n$. A process $X$ is adapted if the value $X_n(\omega)$ is known to us at time $n$. Usually, $\{\mathcal{F}_n\}$ is the natural filtration $\{\mathcal{W}_n : n \in \mathbb{Z}^+\}$ of some process $W$, and then the information about $\omega$ which we have at time $n$ consists of the values $W_0(\omega), W_1(\omega), \ldots, W_n(\omega)$. A process $X$ is then adapted if and only if, for each $n$, $X_n = f_n(W_0, \ldots, W_n)$ for some $f_n \in m\mathcal{B}^{n+1}$. See E38.25.
46. Martingale; supermartingale; submartingale. As already explained, these concepts are defined relative to our given filtered space in (45.2).

(46.1) THE KEY DEFINITIONS. A process $X$ is called a martingale if
(i) $X$ is adapted;
(ii) $E(|X_n|) < \infty$, $\forall n$;
(iii) $E[X_n|\mathcal{F}_{n-1}] = X_{n-1}$, a.s. $(n \ge 1)$.
A supermartingale is defined similarly, except that (iii) is replaced by
$$E[X_n|\mathcal{F}_{n-1}] \le X_{n-1}, \text{ a.s. } (n \ge 1),$$
and a submartingale is defined with (iii) replaced by
$$E[X_n|\mathcal{F}_{n-1}] \ge X_{n-1}, \text{ a.s. } (n \ge 1).$$
A supermartingale 'decreases on average'; a submartingale 'increases on average'. The 'p' points down, the 'b' up! Of course, Chapter I has explained how 'superharmonic' corresponds to 'local supermartingale', which was the reason for this choice of terminology.

Note that $X$ is a supermartingale if and only if $-X$ is a submartingale, and that $X$ is a martingale if and only if it is both a supermartingale and a submartingale. It is important to note that a process $X$ for which $X_0 \in \mathcal{L}^1(\Omega, \mathcal{F}_0, P)$ is a martingale (respectively, supermartingale, submartingale) if and only if the process $X - X_0 = (X_n - X_0 : n \in \mathbb{Z}^+)$ has the same property. So we can focus attention on processes that are null at 0.

If $X$ is, for example, a supermartingale, then the Tower Property of conditional expectations, (41)(i), shows that, for $m < n$,
$$E[X_n|\mathcal{F}_m] = E[X_n|\mathcal{F}_{n-1}|\mathcal{F}_m] \le E[X_{n-1}|\mathcal{F}_m] \le \cdots \le X_m, \text{ a.s.}$$

(46.2) Gambling interpretation. Think of $X_n - X_{n-1}$ as your winnings per unit stake on a gambling game. The game is unfavourable to you if $X$ is a supermartingale, favourable to you if $X$ is a submartingale, and fair if $X$ is a martingale.

47. Previsible process; gambling strategy; a fundamental principle. We now study the discrete-parameter analogue of stochastic integrals.

(47.1) DEFINITION (previsible process). We call a process $(C_n : n \in \mathbb{N})$ previsible if, for each $n \in \mathbb{N}$, $C_n$ is $\mathcal{F}_{n-1}$-measurable.

Note that $C_0$ is not defined. Think of $C_n$ as your stake on game $n$.
You have to decide on the value of $C_n$ based on the history up to (and including) time $n - 1$. This is the intuitive significance of the 'previsible' character of $C$. Your winnings on game $n$ are
$C_n(X_n - X_{n-1})$ and your total winnings up to time $n$ are
(47.2) $Y_n = \sum_{1 \le k \le n} C_k(X_k - X_{k-1}) =: (C \bullet X)_n$.
Note that $(C \bullet X)_0 = 0$, and that $Y_n - Y_{n-1} = C_n(X_n - X_{n-1})$.

(47.3) DEFINITION (martingale transform, stochastic integral). The process $C \bullet X$ is called the martingale transform of $X$ by $C$, or the (discrete) stochastic integral of $C$ with respect to $X$.

(47.4) THEOREM. You can't beat the system!
(i) Let $C$ be a bounded non-negative previsible process, so that, for some $K$ in $[0, \infty)$, $|C_n(\omega)| \le K$ for every $n$ and every $\omega$. Let $X$ be a supermartingale (respectively martingale). Then $C \bullet X$ is a supermartingale (martingale) null at 0.
(ii) If $C$ is a bounded previsible process and $X$ is a martingale, then $C \bullet X$ is a martingale null at 0.
(iii) In (i) and (ii) the boundedness condition on $C$ may be replaced by the condition $C_n \in \mathcal{L}^2$, $\forall n$, provided we also insist that $X_n \in \mathcal{L}^2$, $\forall n$.

Proof of (i). Write $Y$ for $C \bullet X$. Since $C_n$ is bounded non-negative and $\mathcal{F}_{n-1}$-measurable, we have, from (41)(j),
$$E[Y_n - Y_{n-1}|\mathcal{F}_{n-1}] = C_n E[X_n - X_{n-1}|\mathcal{F}_{n-1}] \le 0 \quad (\text{resp. } = 0).$$
Proofs of (ii) and (iii) are now obvious. (Look again at (41)(j).) □

48. Doob's Upcrossing Lemma. Doob's use of upcrossings is one of the most sparkling things in the theory.

(48.1) DEFINITION (number of upcrossings). Let $X$ be a supermartingale. The number $U_N(X; [a, b])(\omega)$ of upcrossings of $[a, b]$ made by $n \mapsto X_n(\omega)$ by time $N$ is defined to be the largest $k$ in $\mathbb{Z}^+$ such that we can find
$$0 \le s_1 < t_1 < s_2 < t_2 < \cdots < s_k < t_k \le N$$
with
$$X_{s_i}(\omega) < a, \quad X_{t_i}(\omega) > b \quad (1 \le i \le k).$$

Regard $X_n - X_{n-1}$ as representing your winnings per unit stake on game $n$. Consider your total-winnings process $Y := C \bullet X$ under the previsible strategy $C$ described as follows: Pick two numbers $a$ and $b$ with $a < b$.
Repeat
    Wait until $X$ gets below $a$
    Play unit stakes until $X$ gets above $b$ and stop playing
Until False (that is, forever!).

To be more formal (and to prove inductively that $C$ is previsible), define
$$C_1 := I_{\{X_0 < a\}},$$
and, for $n \ge 2$,
$$C_n := I_{\{C_{n-1} = 1\}} I_{\{X_{n-1} \le b\}} + I_{\{C_{n-1} = 0\}} I_{\{X_{n-1} < a\}}.$$
The fundamental inequality (recall that $Y_0(\omega) := 0$)
(48.2) $Y_N(\omega) \ge (b - a) U_N(X; [a, b])(\omega) - [X_N(\omega) - a]^-$
is now obvious: every upcrossing of $[a, b]$ increases the $Y$-value by at least $b - a$, while the $[X_N(\omega) - a]^-$ overemphasises the loss during the last 'interval of play'. (Draw a picture, or see [W].)

(48.3) THEOREM (Doob's Upcrossing Lemma). Let $X$ be a supermartingale. Let $U_N(X; [a, b])$ be the number of upcrossings of $[a, b]$ by time $N$. Then
$$(b - a) E U_N(X; [a, b]) \le E[(X_N - a)^-].$$

Very Important Note. The number of steps does not feature directly on the right-hand side; only the final variable $X_N$ appears. It is the fact that we get a bound independent of the number of steps that makes this result so powerful.

Proof. The process $C$ is previsible, bounded and non-negative, and $Y = C \bullet X$. Hence $Y$ is a supermartingale, and $E(Y_N) \le 0$. The result now follows from (48.2). □

(48.4) COROLLARY. Let $X$ be a supermartingale that is bounded in $\mathcal{L}^1$ in that $\sup_n E(|X_n|) < \infty$. Let $a, b \in \mathbb{Q}$ with $a < b$. Then, with $U_\infty(X; [a, b]) := \uparrow \lim_N U_N(X; [a, b])$,
$$(b - a) E U_\infty(X; [a, b]) \le |a| + \sup_n E(|X_n|) < \infty,$$
so that $P(U_\infty(X; [a, b]) = \infty) = 0$.

Proof. By Lemma 48.3, we have, for $N \in \mathbb{N}$,
$$(b - a) E U_N(X; [a, b]) \le |a| + E(|X_N|) \le |a| + \sup_n E(|X_n|).$$
Now let $N \uparrow \infty$, using the Monotone-Convergence Theorem. □

49. Doob's Supermartingale-Convergence Theorem. Doob's proof is worthy of the result.
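Since that proof rests on the betting strategy of Section 48, it may help first to check the strategy mechanically. The sketch below is an illustration only: the function names are hypothetical, and the paths are simple $\pm 1$ walks started at 0 rather than a general supermartingale. It computes the stakes $C_n$ from the inductive formula, the winnings $Y_N$ and the upcrossing count $U_N$ for a given path, and then verifies the pathwise inequality (48.2) on every walk of 12 steps.

```python
from itertools import product

def upcrossing_strategy(path, a, b):
    """Return (stakes C_1..C_N, winnings Y_N, upcrossing count U_N)."""
    stakes, prev = [], 0
    for n in range(1, len(path)):
        x = path[n - 1]                  # only X_{n-1} is consulted: previsible
        prev = 1 if (prev == 1 and x <= b) or (prev == 0 and x < a) else 0
        stakes.append(prev)
    winnings = sum(c * (path[n] - path[n - 1])
                   for n, c in enumerate(stakes, start=1))
    count, waiting_for_low = 0, True
    for x in path:                       # greedy count of completed upcrossings
        if waiting_for_low and x < a:
            waiting_for_low = False
        elif not waiting_for_low and x > b:
            count += 1
            waiting_for_low = True
    return stakes, winnings, count

def inequality_48_2_holds(path, a, b):
    _, winnings, count = upcrossing_strategy(path, a, b)
    last_loss = max(a - path[-1], 0)     # the negative part [X_N - a]^-
    return winnings >= (b - a) * count - last_loss

def walk_paths(n_steps):
    """All +1/-1 walks of n_steps steps started at 0."""
    for steps in product((1, -1), repeat=n_steps):
        path = [0]
        for s in steps:
            path.append(path[-1] + s)
        yield path

violations = [p for p in walk_paths(12) if not inequality_48_2_holds(p, -1, 1)]
```

The list `violations` is empty, as (48.2) requires. The inequality is pathwise, so no probability enters this check; the supermartingale hypothesis is needed only to pass from (48.2) to the bound in Theorem 48.3.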
(49.1) THEOREM (Doob's Supermartingale-Convergence Theorem). Let $X$ be a supermartingale bounded in $\mathcal{L}^1$: $\sup_n E(|X_n|) < \infty$. Then, almost surely, $X_\infty := \lim X_n$ exists and is finite. For definiteness, we define $X_\infty(\omega) := \limsup X_n(\omega)$, $\forall \omega$, so that $X_\infty$ is $\mathcal{F}_\infty$-measurable and $X_\infty = \lim X_n$, a.s.

Proof (Doob). Write (noting the use of $[-\infty, \infty]$):
$$\Lambda := \{\omega : X_n(\omega) \text{ does not converge to a limit in } [-\infty, \infty]\} = \{\omega : \liminf X_n(\omega) < \limsup X_n(\omega)\}$$
$$= \bigcup_{\{a, b \in \mathbb{Q} : a < b\}} \{\omega : \liminf X_n(\omega) < a < b < \limsup X_n(\omega)\} =: \bigcup \Lambda_{a,b} \text{ (say)}.$$
But
$$\Lambda_{a,b} \subseteq \{\omega : U_\infty(X; [a, b])(\omega) = \infty\},$$
so that, by (48.4), $P(\Lambda_{a,b}) = 0$. Since $\Lambda$ is a countable union of sets $\Lambda_{a,b}$, we see that $P(\Lambda) = 0$, whence $X_\infty := \lim X_n$ exists a.s. in $[-\infty, \infty]$. But Fatou's Lemma shows that
$$E(|X_\infty|) = E(\liminf |X_n|) \le \liminf E(|X_n|) \le \sup E(|X_n|) < \infty,$$
so that $P(X_\infty \text{ is finite}) = 1$. □

Note. There are other proofs for the discrete-parameter case. None of these is as probabilistic, and none shares the central importance of this one for the continuous-parameter case.

(49.2) COROLLARY. If $X$ is a non-negative supermartingale, then $X_\infty := \lim X_n$ exists almost surely.

Proof. $X$ is obviously bounded in $\mathcal{L}^1$, since $E(|X_n|) = E(X_n) \le E(X_0)$. □

50. $\mathcal{L}^1$ convergence and the UI property. It is important to know when supermartingales converge in $\mathcal{L}^1$.

(50.1) THEOREM. Let $X$ be a supermartingale bounded in $\mathcal{L}^1$, so that $X_\infty := \lim X_n$ exists a.s. Then $X_n \to X_\infty$ in $\mathcal{L}^1$ if and only if $X = \{X_n : n \in \mathbb{Z}^+\}$ is uniformly integrable, and then, for $n \in \mathbb{Z}^+$,
(50.2) $E(X_\infty|\mathcal{F}_n) \le X_n$, a.s.
with a.s. equality if $X$ is a (UI) martingale.
Proof. Because of Theorem 21.2 on the equivalence of $\mathcal{L}^1$ convergence and the UI property, all that remains is to prove that (50.2) holds if $X_n \to X_\infty$ in $\mathcal{L}^1$. But then, for $F \in \mathcal{F}_n$, and $r \ge n$, $E(X_r; F) \le E(X_n; F)$, and (50.2) follows on letting $r \to \infty$. □

(50.3) THEOREM (Lévy's 'Upward' Theorem). Let $\xi \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$, and, for $n \ge 0$, define $M_n := E(\xi|\mathcal{F}_n)$, a.s. Then $M$ is a UI martingale and
$$M_n \to \eta := E(\xi|\mathcal{F}_\infty), \text{ almost surely and in } \mathcal{L}^1.$$

Proof. We know that $M$ is a martingale because of the Tower Property 41(i). We know from Theorem 44.1 that $M$ is UI. Hence $M_\infty := \lim M_n$ exists a.s. and in $\mathcal{L}^1$, and it remains only to prove that $M_\infty = \eta$, a.s., where $\eta := E(\xi|\mathcal{F}_\infty)$. However, for $F \in \mathcal{F}_n$,
$$E(\xi; F) = E(M_n; F) = E(M_\infty; F),$$
so that $E(\eta; F) = E(\xi; F) = E(M_\infty; F)$ for all $F$ in the π-system $\bigcup \mathcal{F}_n$ that generates $\mathcal{F}_\infty$. By property 39.1(iii) in the definition of conditional expectation, the result follows. □

(50.4) THEOREM (Kolmogorov's 0-1 Law). Let $X_1, X_2, \ldots$ be a sequence of independent RVs. Define
$$\mathcal{T}_n := \sigma(X_{n+1}, X_{n+2}, \ldots), \qquad \mathcal{T} := \bigcap_n \mathcal{T}_n.$$
Then if $F \in \mathcal{T}$, $P(F) = 0$ or 1.

Proof. Define $\mathcal{F}_n := \sigma(X_1, X_2, \ldots, X_n)$. Let $F \in \mathcal{T}$, and let $\eta := I_F$. Since $\eta \in b\mathcal{F}_\infty$, Lévy's Upward Theorem shows that
$$\eta = E(\eta|\mathcal{F}_\infty) = \lim E(\eta|\mathcal{F}_n), \text{ a.s.}$$
However, for each $n$, $\eta$ is $\mathcal{T}_n$-measurable, and hence is independent of $\mathcal{F}_n$. Hence, by 41(k),
$$E(\eta|\mathcal{F}_n) = E(\eta) = P(F), \text{ a.s.}$$
Hence $\eta = P(F)$, a.s.; and since $\eta$ only takes the values 0 and 1, the result follows. □

For another nice application of the Upward Theorem, see Exercise 60.50.

51. The Lévy-Doob Downward Theorem. This theorem is crucial for the continuous-parameter theory. We follow the account at T.V.21 in Meyer [2].

(51.1) THEOREM (Lévy-Doob Downward Theorem). Suppose that $(\Omega, \mathcal{F}, P)$ is a probability triple, and that $\{\mathcal{G}_n : n \in -\mathbb{N}\}$ is a collection of sub-σ-algebras of
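$\mathcal{F}$ such that (for $k, n \in \mathbb{N}$)
$$\mathcal{G}_{-\infty} := \bigcap_k \mathcal{G}_{-k} \subseteq \cdots \subseteq \mathcal{G}_{-(n+1)} \subseteq \mathcal{G}_{-n} \subseteq \cdots \subseteq \mathcal{G}_{-1}.$$
Let $X = \{X_n : n \in -\mathbb{N}\}$ be a supermartingale relative to $\{\mathcal{G}_n : n \in -\mathbb{N}\}$, so that
$$E(X_n|\mathcal{G}_m) \le X_m, \text{ a.s. } (m \le n \le -1).$$
Assume that $\sup_{n \le -1} E(X_n) < \infty$. Then the process $X$ is UI, and the limit
$$X_{-\infty} := \lim_{n \to -\infty} X_n$$
exists a.s. and in $\mathcal{L}^1$. Further, for $n \le -1$,
$$E(X_n|\mathcal{G}_{-\infty}) \le X_{-\infty}, \text{ a.s.},$$
with a.s. equality if $X$ is a martingale.

Proof. We prove the UI property; the existence (a.s. and $\mathcal{L}^1$) of $X_{-\infty}$ then follows from the Upcrossing Lemma just as in the case of the Supermartingale-Convergence Theorem 49.1.

Let $\varepsilon > 0$ be given. Since $\uparrow \lim_{n \downarrow -\infty} E(X_n) < \infty$, there exists $k$ such that
(51.2) $0 \le E(X_n) - E(X_k) \le \tfrac12 \varepsilon$ for all $n \le k$.
Now, for $n \le k$ and $\lambda > 0$,
$$E(|X_n|; |X_n| > \lambda) = -E(X_n; X_n < -\lambda) + E(X_n) - E(X_n; X_n \le \lambda)$$
$$\le -E(X_k; X_n < -\lambda) + E(X_n) - E(X_k; X_n \le \lambda),$$
by the supermartingale property. Hence, by (51.2),
$$E(|X_n|; |X_n| > \lambda) \le E(|X_k|; |X_n| > \lambda) + \tfrac12 \varepsilon.$$
Since $X_k \in \mathcal{L}^1$, Lemma 20.1 shows that we can find $\delta > 0$ such that $P(F) < \delta$ implies that $E(|X_k|; F) < \tfrac12 \varepsilon$. But $P(|X_n| > \lambda) \le \lambda^{-1} E(|X_n|)$, and, since $X^- := \max\{-X, 0\}$ is a submartingale by 'cJensen',
$$E(|X_n|) = E(X_n) + 2E(X_n^-) \le \sup_n E(X_n) + 2E(X_{-1}^-).$$
We may therefore choose $K$ such that
$$P(|X_n| > K) < \delta \text{ whenever } n \le k, \qquad E(|X_j|; |X_j| > K) < \varepsilon \text{ whenever } j \ge k.$$
Then $E(|X_n|; |X_n| > K) < \varepsilon$ for every $n \le -1$, so that $X$ is UI. □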
Kolmogorov's Strong Law of Large Numbers is a consequence.

(51.3) THEOREM (SLLN). Let $X_1, X_2, \ldots$ be independent, identically distributed random variables, with $E(|X_k|) < \infty$ for some (then every) $k$. Let $\mu$ be the common value of $E(X_n)$. Write $S_n := X_1 + X_2 + \cdots + X_n$. Then $n^{-1} S_n \to \mu$ a.s. and in $\mathcal{L}^1$.

Proof. Define
$$\mathcal{G}_{-n} := \sigma(S_n, S_{n+1}, S_{n+2}, \ldots), \qquad \mathcal{G}_{-\infty} := \bigcap_n \mathcal{G}_{-n}.$$
Then, for $n \ge 1$,
$$E(X_1|\mathcal{G}_{-n}) = E(X_2|\mathcal{G}_{-n}) = \cdots = E(X_n|\mathcal{G}_{-n}) = n^{-1} E(S_n|\mathcal{G}_{-n}) = n^{-1} S_n, \text{ a.s.}$$
Hence $L := \lim n^{-1} S_n$ exists a.s. and in $\mathcal{L}^1$. For definiteness, define $L := \limsup n^{-1} S_n$ for every $\omega$. Then, for each $k$,
$$L = \limsup_n \frac{X_{k+1} + \cdots + X_{k+n}}{n},$$
so that $L \in m\mathcal{T}_k$, where $\mathcal{T}_k = \sigma(X_{k+1}, X_{k+2}, \ldots)$. By Kolmogorov's 0-1 law, $P(L = c) = 1$ for some $c$ in $\mathbb{R}$. But
$$c = E(L) = \lim E(n^{-1} S_n) = \mu. \quad □$$

Remarks. See Meyer [2] for important extensions and applications of the results given so far in this chapter. These extensions include the Hewitt-Savage 0-1 Law, de Finetti's Theorem on exchangeable random variables, and the Choquet-Deny Theorem on bounded harmonic functions for random walks on groups.

52. Doob's Submartingale and $\mathcal{L}^p$ Inequalities. Many uses are made of the inequalities in this section. We return to the standard situation in which our time-parameter set is $\mathbb{Z}^+$.

(52.1) THEOREM (Doob's Submartingale Inequality). Let $Z$ be a non-negative submartingale. Then, for $c > 0$ and $n \in \mathbb{Z}^+$,
$$c P\Big(\sup_{k \le n} Z_k \ge c\Big) \le E\Big(Z_n; \sup_{k \le n} Z_k \ge c\Big) \le E(Z_n).$$

Important Notes. The number $n$ of steps does not feature directly in the last two expressions. This is what gives the result its power. The proof will show that the assumption that $Z$ is non-negative is not needed for the first inequality.

Proof. Let $F := \{\sup_{k \le n} Z_k \ge c\}$. Then $F$ is a disjoint union
$$F = F_0 \cup F_1 \cup \cdots \cup F_n,$$
where
$$F_0 := \{Z_0 \ge c\}, \qquad F_k := \{Z_0 < c\} \cap \{Z_1 < c\} \cap \cdots \cap \{Z_{k-1} < c\} \cap \{Z_k \ge c\}.$$
Now, $F_k \in \mathcal{F}_k$, and $Z_k \ge c$ on $F_k$. Hence
$$E(Z_n; F_k) \ge E(Z_k; F_k) \ge c P(F_k).$$
Summing over $k$ now yields the result. □

The main reason for the usefulness of the above theorem is the following.

(52.2) LEMMA. If $M$ is a martingale, $c$ is a convex function and $E|c(M_n)| < \infty$, $\forall n$, then $c(M)$ is a submartingale.

Proof. Apply the conditional form of Jensen's inequality in Table 41. □

In preparation for Doob's $\mathcal{L}^p$ inequality, we now establish a consequence of Hölder's inequality.

(52.3) LEMMA. Suppose that $X$ and $Y$ are non-negative random variables such that
$$c P(X \ge c) \le E(Y; X \ge c) \text{ for every } c > 0.$$
Then, for $p > 1$ and $p^{-1} + q^{-1} = 1$, we have
$$\|X\|_p \le q \|Y\|_p.$$

Proof. We obviously have
(52.4) $L := \int_{c=0}^\infty p c^{p-1} P(X \ge c)\,dc \le \int_{c=0}^\infty p c^{p-2} E(Y; X \ge c)\,dc =: R$.
Using Fubini's Theorem with non-negative integrands, we obtain
$$L = \int_{c=0}^\infty \Big(\int_\Omega I_{\{X \ge c\}}(\omega) P(d\omega)\Big) p c^{p-1}\,dc = \int_\Omega \Big(\int_{c=0}^{X(\omega)} p c^{p-1}\,dc\Big) P(d\omega) = E(X^p).$$
Exactly similarly, we find that
$$R = E(q X^{p-1} Y).$$
We apply Hölder's inequality to conclude that
(52.5) $E(X^p) \le E(q X^{p-1} Y) \le q \|Y\|_p \|X^{p-1}\|_q$.
Suppose that $\|Y\|_p < \infty$, and suppose for now that $\|X\|_p < \infty$ also. Then, since $(p - 1)q = p$, we have
$$\|X^{p-1}\|_q = E(X^p)^{1/q},$$
so (52.5) implies that $\|X\|_p \le q \|Y\|_p$. For general $X$, note that the hypothesis remains true for $X \wedge n$. Hence $\|X \wedge n\|_p \le q \|Y\|_p$ for all $n$, and the result follows using the Monotone-Convergence Theorem. □

(52.6) THEOREM (Doob's $\mathcal{L}^p$ inequality). Let $p > 1$ and define $q$ so that $p^{-1} + q^{-1} = 1$.
(a) Let $Z$ be a non-negative submartingale bounded in $\mathcal{L}^p$, and define (this is standard notation)
$$Z^* := \sup_k Z_k.$$
Then $Z^* \in \mathcal{L}^p$, and indeed
(52.7) $\|Z^*\|_p \le q \sup_r \|Z_r\|_p$.
The submartingale $Z$ is therefore dominated by the element $Z^*$ of $\mathcal{L}^p$. Also, $Z_\infty := \lim Z_n$ exists a.s. and in $\mathcal{L}^p$, and
$$\|Z_\infty\|_p = \sup_r \|Z_r\|_p = \uparrow \lim_r \|Z_r\|_p.$$
(b) If $Z$ is of the form $|M|$, where $M$ is a martingale bounded in $\mathcal{L}^p$, then $M_\infty := \lim M_n$ exists a.s. and in $\mathcal{L}^p$, and of course $Z_\infty = |M_\infty|$, a.s.

Proof. For $n \in \mathbb{Z}^+$, define $Z_n^* := \sup_{k \le n} Z_k$. From Doob's Submartingale Inequality and the above Lemma, we see that
$$\|Z_n^*\|_p \le q \|Z_n\|_p \le q \sup_r \|Z_r\|_p.$$
Property (52.7) now follows from the Monotone-Convergence Theorem. Since $(-Z)$ is a supermartingale bounded in $\mathcal{L}^p$, and therefore in $\mathcal{L}^1$, we know that $Z_\infty := \lim Z_n$ exists a.s. However,
$$|Z_n - Z_\infty|^p \le (2Z^*)^p \in \mathcal{L}^1,$$
so that the Dominated-Convergence Theorem shows that $Z_n \to Z_\infty$ in $\mathcal{L}^p$. Jensen's inequality shows that $\|Z_r\|_p$ is non-decreasing in $r$, and all the rest is straightforward. □

53. Martingales in $\mathcal{L}^2$; orthogonality of increments. Let $M = (M_n : n \ge 0)$ be a martingale in $\mathcal{L}^2$ in that each $M_n$ is in $\mathcal{L}^2$ so that $E(M_n^2) < \infty$, $\forall n$. Then for $s, t, u, v \in \mathbb{Z}^+$, with $s \le t \le u \le v$, we know from properties (a) and (j) of Table 41 that for $Z \in \mathcal{L}^2(\mathcal{F}_u)$,
$$E(Z M_v) = E E(Z M_v|\mathcal{F}_u) = E[Z E(M_v|\mathcal{F}_u)] = E(Z M_u),$$
so that $M_v - M_u$ is orthogonal to $\mathcal{L}^2(\mathcal{F}_u)$ and, in particular,
(53.1) $E[(M_t - M_s)(M_v - M_u)] = 0$.
Hence the formula
$$M_n = M_0 + \sum_{k=1}^n (M_k - M_{k-1})$$
expresses $M_n$ as the sum of orthogonal terms, and Pythagoras's theorem yields
(53.2) $E(M_n^2) = E(M_0^2) + \sum_{k=1}^n E[(M_k - M_{k-1})^2]$.
The following theorem is therefore obvious.

(53.3) THEOREM. Let $M$ be a martingale for which $M_n \in \mathcal{L}^2$, $\forall n$. Then $M$ is bounded in $\mathcal{L}^2$ if and only if
$$\sum E[(M_k - M_{k-1})^2] < \infty;$$
and when this obtains, Theorem 52.6 implies that $M_n \to M_\infty$ almost surely and in $\mathcal{L}^2$.

54. Doob decomposition. In the following theorem, the statement that '$A$ is a previsible process null at 0' means of course that $A_0 = 0$ and $A_n \in m\mathcal{F}_{n-1}$ $(n \in \mathbb{N})$.

(54.1) THEOREM (Doob decomposition). (i) Let $(X_n : n \in \mathbb{Z}^+)$ be an adapted process with $X_n \in \mathcal{L}^1$, $\forall n$. Then $X$ has a Doob decomposition
(54.2) $X = X_0 + M + A$,
where $M$ is a martingale null at 0, and $A$ is a previsible process null at 0. Moreover, this decomposition is unique modulo indistinguishability in the sense that if $X = X_0 + \tilde{M} + \tilde{A}$ is another such decomposition then
$$P(M_n = \tilde{M}_n, A_n = \tilde{A}_n, \forall n) = 1.$$
(ii) The process $X$ is a submartingale if and only if $A$ is an increasing process in the sense that
$$P(A_n \le A_{n+1}, \forall n) = 1.$$

Proof. If $X$ has a Doob decomposition as at (54.2) then, since $M$ is a martingale and $A$ is previsible, we have, almost surely,
$$E(X_n - X_{n-1}|\mathcal{F}_{n-1}) = E(M_n - M_{n-1}|\mathcal{F}_{n-1}) + E(A_n - A_{n-1}|\mathcal{F}_{n-1}) = 0 + (A_n - A_{n-1}).$$
Hence
(54.3) $A_n = \sum_{k=1}^n E(X_k - X_{k-1}|\mathcal{F}_{k-1})$, a.s.,
and if we use (54.3) to define $A$, we obtain the required decomposition of $X$. The 'submartingale' result in Part (ii) of the theorem is now obvious. □

(54.4) Remark. The Doob-Meyer decomposition, which expresses a submartingale in continuous time as the sum of a local martingale and a previsible increasing process, is a deep result that is the foundation stone for stochastic-integral theory. It is proved in full generality in Chapter VI.

A useful estimate. The following estimate, which makes no positivity hypothesis, is often useful.

(54.5) LEMMA. If $X$ is a submartingale or supermartingale then, for $N \in \mathbb{Z}^+$ and $c > 0$,
$$c P\Big(\sup_{k \le N} |X_k| \ge 3c\Big) \le 4E(|X_0|) + 3E(|X_N|).$$

Proof. Let $X$ be a submartingale with Doob decomposition $X = X_0 + M + A$, where $A$ is increasing. Then
$$\sup_{k \le N} |X_k| \le |X_0| + \sup_{k \le N} |M_k| + \sup_{k \le N} |A_k| \le |X_0| + \sup_{k \le N} |M_k| + A_N.$$
Thus, using the fact that $|M|$ is a non-negative submartingale and the Submartingale Inequality, we have, for $c > 0$,
$$c P\Big(\sup_{k \le N} |X_k| \ge 3c\Big) \le c P(|X_0| \ge c) + c P\Big(\sup_{k \le N} |M_k| \ge c\Big) + c P(A_N \ge c)$$
$$\le E(|X_0|) + E(|M_N|) + E(A_N)$$
$$\le E(|X_0|) + E(|X_N - X_0 - A_N|) + E(A_N)$$
$$\le E(|X_0|) + E(|X_N|) + E(|X_0|) + 2E(A_N)$$
$$\le 2E(|X_0|) + E(|X_N|) + 2E(X_N - X_0)$$
$$\le 4E(|X_0|) + 3E(|X_N|).$$
If $X$ is a supermartingale, apply the result just obtained to the submartingale $(-X)$. □

55. The $\langle M \rangle$ and $[M]$ processes. The continuous-parameter analogues of these processes allow the stochastic integral to be defined. In both discrete and continuous time, a host of celebrated inequalities (Burkholder-Davis-Gundy, John-Nirenberg etc.) are associated with them. See, for example, Garsia [1] and Neveu [5] for the discrete case.
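Before these processes are defined, the Doob decomposition of Section 54 can be made concrete in a case where formula (54.3) is computable by hand. The sketch below is illustrative only (the function names are not from the text), and it assumes that $S$ is a simple symmetric random walk started at 0 and that $X_n = f(S_n)$, so that the conditional expectation in (54.3) is an average over the two equally likely next steps.

```python
from fractions import Fraction
from itertools import product

def doob_decomposition(f, steps):
    """Return (M, A) along one path for X_n = f(S_n), with X = X_0 + M + A.

    `steps` is the tuple of +1/-1 increments of the walk S started at 0.
    """
    s = 0
    A, M = [Fraction(0)], [Fraction(0)]
    for step in steps:
        # E(X_k - X_{k-1} | F_{k-1}): average over the two possible steps.
        drift = Fraction(f(s + 1) + f(s - 1), 2) - f(s)
        s += step
        A.append(A[-1] + drift)            # formula (54.3)
        M.append(f(s) - f(0) - A[-1])      # M = X - X_0 - A
    return M, A

def is_martingale(f, n):
    """Check E(M_k | F_{k-1}) = M_{k-1} on every history, for k = 1..n."""
    for k in range(1, n + 1):
        for history in product((1, -1), repeat=k - 1):
            up, _ = doob_decomposition(f, history + (1,))
            down, _ = doob_decomposition(f, history + (-1,))
            if (up[-1] + down[-1]) / 2 != up[-2]:
                return False
    return True

# X = S^2 along the path with steps +1, +1, -1, +1: here A_n = n.
M_sq, A_sq = doob_decomposition(lambda s: s * s, (1, 1, -1, 1))
# X = |S| along the path +1, -1, -1, +1: A_n counts visits to 0 before time n.
M_abs, A_abs = doob_decomposition(abs, (1, -1, -1, 1))
```

For $f(s) = s^2$ the previsible part is $A_n = n$, which is the angle-brackets process of the walk in the sense of Definition 55.1 and agrees with (55.3), since each squared increment equals 1. For $f(s) = |s|$ the process $A$ increases only at visits to 0, in line with Part (ii) of Theorem 54.1.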
(55.1) DEFINITION (the angle-brackets process $\langle M \rangle$). Let $M$ be a martingale in $\mathcal{L}^2$ and null at 0. Then $M^2$ is a submartingale with (essentially unique) Doob decomposition
$$M^2 = N + A,$$
where $N$ is a martingale and $A$ is a previsible increasing process, both $N$ and $A$ being null at 0. The process $A$ is written $\langle M \rangle$, and called the angle-brackets process of $M$.

Define $A_\infty := \uparrow \lim A_n$, a.s. Since $E(M_n^2) = E(A_n)$, we see that
(55.2) $M$ is bounded in $\mathcal{L}^2$ if and only if $E(A_\infty) < \infty$.
It is important to note that
(55.3) $A_n - A_{n-1} = E(M_n^2 - M_{n-1}^2|\mathcal{F}_{n-1}) = E[(M_n - M_{n-1})^2|\mathcal{F}_{n-1}]$.
This is reflected in the following result.

(55.4) THEOREM and DEFINITION (the $[M]$ process). Again, let $M$ be a martingale in $\mathcal{L}^2$ and null at 0. Define
$$[M]_n := \sum_{k=1}^n (M_k - M_{k-1})^2.$$
Then
(55.5) $M^2 - [M] = C \bullet M$, where $C_n := 2M_{n-1}$,
and $V := M^2 - [M]$ is a martingale. If $M$ is bounded in $\mathcal{L}^2$ then the martingale $V$ is uniformly integrable.

Proof. The result (55.5) is elementary, and implies that $V$ is a martingale because of Theorem 47.4(iii). If $M$ is bounded in $\mathcal{L}^2$ then, by Doob's $\mathcal{L}^2$ Inequality, the process $M^2$ is dominated by $(M^*)^2$, which is in $\mathcal{L}^1$, and the process $[M]$ is dominated by $[M]_\infty$, which is also in $\mathcal{L}^1$. Hence $V$ is dominated in $\mathcal{L}^1$, and is therefore UI. □

Stopping times, optional stopping and optional sampling

56. Stopping time. The discrete-parameter theory is easy. The continuous-parameter theory is more challenging.

(56.1) DEFINITION (stopping time). A map $T : \Omega \to \{0, 1, 2, \ldots; \infty\}$ is called a stopping time if
(56.2) $\{T \le n\} = \{\omega : T(\omega) \le n\} \in \mathcal{F}_n$, $\forall n \le \infty$,
equivalently, if
(56.3) $\{T = n\} = \{\omega : T(\omega) = n\} \in \mathcal{F}_n$, $\forall n \le \infty$.
Note that $T$ can be $\infty$. The equivalence of (56.2) and (56.3) is trivial.

Example. Suppose that $(A_n)$ is an adapted process, and that $B \in \mathcal{B}$. Let
$$T = \inf\{n \ge 0 : A_n \in B\} = \text{time of first entry of } A \text{ into set } B.$$
By convention, $\inf(\emptyset) = \infty$, so that $T = \infty$ if $A$ never enters set $B$. Obviously,
$$\{T \le n\} = \bigcup_{k \le n} \{A_k \in B\} \in \mathcal{F}_n,$$
so that $T$ is a stopping time. Conversely, if $T$ is a stopping time, and we define the process $I_{[T, \infty)}$ by writing, for $0 \le n < \infty$ and $\omega \in \Omega$,
$$I_{[T, \infty)}(n, \omega) := \begin{cases} 1 & \text{if } n \ge T(\omega), \\ 0 & \text{otherwise,} \end{cases}$$
then $I_{[T, \infty)}$ is adapted (check!), and $T$ is the first entry time of this process into the set $\{1\}$.

57. Optional-stopping theorems. Let $X$ be a supermartingale, and let $T$ be a stopping time. For $n \ge 1$, regard $X_n - X_{n-1}$ as your fortune per unit stake on game $n$. Suppose that you always bet 1 unit and quit playing at (immediately after) time $T$. Then your 'stake process' is $C^{(T)}$, where, for $n \in \mathbb{N}$,
$$C_n^{(T)} = I_{\{n \le T\}}, \text{ so that } C_n^{(T)}(\omega) = \begin{cases} 1 & \text{if } n \le T(\omega), \\ 0 & \text{otherwise.} \end{cases}$$
Your 'winnings process' is the process with value at time $n$ equal to
$$(C^{(T)} \bullet X)_n = X_{T \wedge n} - X_0.$$
If $X^T$ denotes the process $X$ stopped at $T$,
$$X_n^T(\omega) := X_{T(\omega) \wedge n}(\omega),$$
then
$$C^{(T)} \bullet X = X^T - X_0.$$
Now $C^{(T)}$ is clearly bounded (by 1) and non-negative. Moreover, $C^{(T)}$ is previsible because $C_n^{(T)}$ can only be 0 or 1 and, for $n \in \mathbb{N}$,
$$\{C_n^{(T)} = 0\} = \{T \le n - 1\} \in \mathcal{F}_{n-1}.$$
Theorem 47.4 now yields the following result.

(57.1) THEOREM (stopped supermartingales are supermartingales). The following results hold.
(i) If $X$ is a supermartingale and $T$ is a stopping time, then the stopped process $X^T = (X_{T \wedge n} : n \in \mathbb{Z}^+)$ is a supermartingale, so that, in particular,
(57.2) $E(X_{T \wedge n}) \le E(X_0)$, $\forall n$.
(ii) If $X$ is a martingale and $T$ is a stopping time, then $X^T$ is a martingale, so that, in particular,
(57.3) $E(X_{T \wedge n}) = E(X_0)$, $\forall n$.

It is important to notice that this theorem imposes no extra integrability conditions whatsoever (except of course for those implicit in the definition of supermartingale and martingale).

But we have to be careful. Let $X$ be a simple random walk on $\mathbb{Z}$ (with probabilities $\tfrac12$ of jumping to a nearest neighbour), starting at 0. Then $X$ is a martingale relative to its natural filtration. Let $T$ be the stopping time:
$$T := \inf\{n : X_n = 1\}.$$
It is well known that $P(T < \infty) = 1$. However, even though (57.3) holds for every $n$, we have
$$1 = E(X_T) \ne E(X_0) = 0.$$
It is important to know when we can say that $E(X_T) = E(X_0)$ for a martingale $X$. The following theorem gives some sufficient conditions.

(57.4) THEOREM (Doob's Optional-Stopping Theorem). The following results hold.
(a) Let $T$ be a stopping time. Let $X$ be a supermartingale. Then $X_T$ is integrable and
$$E(X_T) \le E(X_0)$$
in each of the following situations:
(i) $T$ is bounded (for some $N$ in $\mathbb{N}$, $T(\omega) \le N$, $\forall \omega$);
(ii) $X$ is bounded (for some $K$ in $\mathbb{R}^+$, $|X_n(\omega)| \le K$ for every $n$ and every $\omega$) and $T$ is a.s. finite;
(iii) $E(T) < \infty$, and, for some $K$ in $\mathbb{R}^+$, $|X_n(\omega) - X_{n-1}(\omega)| \le K$, $\forall (n, \omega)$.
(b) If any of the conditions (i)-(iii) holds and $X$ is a martingale then
$$E(X_T) = E(X_0).$$

Proof of (a). We know that $X_{T \wedge n}$ is integrable, and
(57.5) $E(X_{T \wedge n} - X_0) \le 0$.
For (i), we can take $n = N$. For (ii), we can let $n \to \infty$ in (57.5) using the Bounded-Convergence Theorem 21.1. For (iii), we have
$$|X_{T \wedge n} - X_0| = \Big|\sum_{k=1}^{T \wedge n} (X_k - X_{k-1})\Big| \le KT$$
and $E(KT) < \infty$, so that the Dominated-Convergence Theorem 8.5 justifies letting $n \to \infty$ in (57.5) to obtain the answer we want.

Proof of (b). Apply (a) to $X$ and to $(-X)$. □

(57.6) Awaiting the almost inevitable. In order to be able to apply result (iii) of Part (a) of the above theorem, we need ways of proving that (when true!) $E(T) < \infty$. The following announcement of the principle that 'whatever always stands a reasonable chance of happening will almost surely happen, sooner rather than later' is often useful.

(57.7) LEMMA. Suppose that $T$ is a stopping time such that for some $N$ in $\mathbb{N}$ and some $\varepsilon > 0$, we have, for every $n$ in $\mathbb{N}$,
$$P(T \le n + N|\mathcal{F}_n) > \varepsilon, \text{ a.s.}$$
Then $E(T) < \infty$.

The proof is left as an exercise.

(57.8) THEOREM (Doob's Supermartingale Inequalities). Let $X$ be a non-negative supermartingale and $T$ a stopping time. Then
(57.9) $E(X_T) \le E(X_0)$.
Moreover, for $c \ge 0$,
(57.10) $c P\big(\sup_k X_k > c\big) \le E(X_0)$.

Proof. The result (57.9) is obtained by applying Fatou's Lemma to (57.2). The result (57.10) is then obtained by taking $T := \inf\{n : X_n > c\}$ in (57.9). □

58. The pre-$T$ σ-algebra $\mathcal{F}_T$. Let $T$ be a stopping time.

(58.1) DEFINITION (the pre-$T$ σ-algebra $\mathcal{F}_T$). For $F \subseteq \Omega$, we write $F \in \mathcal{F}_T$ if
$$F \cap \{T \le n\} \in \mathcal{F}_n \text{ for every } n \in \mathbb{Z}^+ \cup \{\infty\},$$
or, equivalently, if
$$F \cap \{T = n\} \in \mathcal{F}_n \text{ for every } n \in \mathbb{Z}^+ \cup \{\infty\}.$$

(58.2) The intuitive meaning. We regard the σ-algebra $\mathcal{F}_T$ as consisting of those events whose occurrence or non-occurrence can be decided from what our observer has seen up to and including time $T$. Note how well the following lemma therefore ties in with our intuition.

(58.3) LEMMA. Let $S$ and $T$ be stopping times.
(i) If $X$ is an adapted process then $X_T \in \mathrm{m}\mathcal{F}_T$.
(ii) If $S \le T$ then $\mathcal{F}_S \subseteq \mathcal{F}_T$.
(iii) $\mathcal{F}_{S\wedge T} = \mathcal{F}_S \cap \mathcal{F}_T$.
(iv) If $F \in \mathcal{F}_{S\vee T}$ then $F \cap \{S \le T\} \in \mathcal{F}_T$.
The proof is left as an easy exercise.
59. Optional sampling. This is an area in which one has to be careful. It is easy to devise fallacious quick 'proofs' of the following theorem that, one then realises, assume Part (ii) of the theorem. One is, for example, tempted to presuppose that $X_{T\wedge n}$ converges to $X_T$ in $\mathcal{L}^1$. The fact that Part (ii) of the theorem is false in continuous time (though it is then still true for UI martingales) helps indicate the need for care. We give a proof that very clearly connects the so-called 'class (D) property' in Part (ii) with the existence of a Doob decomposition.
(59.1) THEOREM (Doob's Optional-Sampling Theorem). Let $S$ and $T$ be stopping times with $0 \le S \le T \le \infty$.
(i) Let $M$ be a UI martingale. Then
(59.2) $$E(M_T \mid \mathcal{F}_S) = M_S, \quad \text{a.s.}$$
Moreover, $M$ is of class (D) in that the family ($M_T$: $T$ a stopping time) is UI.
(ii) Let $X$ be a UI supermartingale, and let
(59.3) $$X = X_0 + M - A$$
be its Doob decomposition (with $M$ a martingale null at 0, and $A$ a previsible increasing process null at 0). Then $A$ is integrable in that $E(A_\infty) < \infty$, and $M$ is UI, whence $X$ is of class (D) in that the family ($X_T$: $T$ a stopping time) is UI. Moreover,
$$E(X_T \mid \mathcal{F}_S) \le X_S, \quad \text{a.s.}$$
Proof of Part (i) for the case when $0 \le S \le T \le k$ for some $k \in \mathbb{N}$. Suppose that $0 \le S \le T \le k$ for some $k \in \mathbb{N}$. Then $M_T$ and $M_S$ are in $\mathcal{L}^1$, because each is dominated by $|M_0| + \cdots + |M_k|$. Let $F \in \mathcal{F}_S$, and define the process $C := I_F I_{(S,T]}$, that is,
$$C_n(\omega) := \begin{cases} 1 & \text{if } \omega \in F \text{ and } S(\omega) < n \le T(\omega), \\ 0 & \text{otherwise.} \end{cases}$$
Then (check!) $C$ is previsible (and bounded and non-negative), whence
$$E(C \bullet M)_k = E(M_T - M_S; F) = 0,$$
and Part (i) follows for this case.
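The contrast between bounded and unbounded stopping times can be checked numerically. The following is a minimal Monte Carlo sketch, not taken from the text: it simulates the simple random walk of Section 57 with $T := \inf\{n : X_n = 1\}$ and estimates $E(X_{T\wedge k})$ for a fixed horizon $k$. The function name, trial count and seed are illustrative assumptions. Since $T \wedge k$ is bounded, the estimate should be near $E(X_0) = 0$, even though most paths have already reached 1.

```python
import random


def stopped_walk_stats(k, trials=20000, seed=0):
    """Estimate E(X_{T ^ k}) and P(T <= k) for simple random walk from 0,
    where T is the first time the walk hits 1 (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0
    hits = 0
    for _ in range(trials):
        x = 0
        for _ in range(k):
            if x == 1:          # T has occurred: the stopped walk stays at 1
                break
            x += rng.choice((-1, 1))
        total += x              # x is now X_{T ^ k}
        hits += (x == 1)
    return total / trials, hits / trials


if __name__ == "__main__":
    mean, hit_fraction = stopped_walk_stats(k=100)
    print("estimate of E(X_{T^k}):", mean)
    print("estimate of P(T <= k): ", hit_fraction)
```

The rare paths that have not yet hit 1 by time $k$ sit far below 0 and balance the many paths stopped at 1; letting $k \to \infty$ pushes that mass off to $-\infty$, which is why $E(X_T) = 1$ fails to equal $E(X_0)$.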
Completion of Proof of Part (i). We have, using the result just proved, the Tower Property (41)(i) and Theorem 50.1,
$$M_{T\wedge k} = E(M_k \mid \mathcal{F}_{T\wedge k}) = E\bigl(E(M_\infty \mid \mathcal{F}_k) \mid \mathcal{F}_{T\wedge k}\bigr) = E(M_\infty \mid \mathcal{F}_{T\wedge k}).$$
By Lévy's Upward Theorem 50.3, we have $\mathcal{L}^1$ convergence of this equation to
$$M_T = E(M_\infty \mid \mathcal{G}), \quad \text{where } \mathcal{G} := \sigma\Bigl(\bigcup_k \mathcal{F}_{T\wedge k}\Bigr).$$
Of course, $\mathcal{G} \subseteq \mathcal{F}_T$. Suppose that $F \in \mathcal{F}_T$. Then (check!) $F \cap \{T \le k\} \in \mathcal{F}_{T\wedge k}$, so that $F \cap \{T < \infty\} \in \mathcal{G}$ and
$$E(M_T; F \cap \{T < \infty\}) = E(M_\infty; F \cap \{T < \infty\}).$$
Of course it is tautological that
$$E(M_T; F \cap \{T = \infty\}) = E(M_\infty; F \cap \{T = \infty\}).$$
Hence $E(M_T; F) = E(M_\infty; F)$, and we have proved (59.2). The class (D) property of $M$ now follows because of Theorem 44.1. □
Exercise. Prove that if $F \in \mathcal{F}_T$ then there exists $G \in \mathcal{G}$ such that $(G \setminus F) \cup (F \setminus G)$ is $P$-null.
Proof of Part (ii). We have $E(A_n) = E(X_0) - E(X_n)$, and since $X$ is UI and therefore bounded in $\mathcal{L}^1$, we have
$$E(A_\infty) = \uparrow\lim E(A_n) < \infty.$$
Thus the process $A$ is dominated by the element $A_\infty$ of $\mathcal{L}^1$, so that $A$ is UI. Since $X$ is also UI, it follows that $M$ is UI. We now know that the families ($M_T$: $T$ a stopping time) and ($A_T$: $T$ a stopping time) are UI, the latter being dominated by $A_\infty$. Hence ($X_T$: $T$ a stopping time) is UI, and certainly each $X_T$ is in $\mathcal{L}^1$. Next,
$$E(X_T \mid \mathcal{F}_S) = X_0 + E(M_T \mid \mathcal{F}_S) - E(A_T \mid \mathcal{F}_S) \le X_0 + M_S - E(A_S \mid \mathcal{F}_S) = X_0 + M_S - A_S = X_S,$$
and the theorem is proved. □
(59.4) (Riesz decomposition; potentials). Again let $X$ be a UI supermartingale. Then $X$ has a unique Riesz decomposition
$$X = Y + Z,$$
where $Y$ is a martingale, and $Z$ is a potential, that is, $Z$ is a non-negative UI
supermartingale with $Z_\infty = 0$, a.s. In our discrete-parameter situation, $Z$ is of class (D), and $Z$ is the potential
$$Z_n := E(A_\infty - A_n \mid \mathcal{F}_n)$$
of our integrable previsible increasing process $A$. The proof is an easy exercise.
(59.5) THEOREM. Let $X$ be a non-negative supermartingale, and let $S$ and $T$ be stopping times with $S \le T$. Then
$$E(X_T \mid \mathcal{F}_S) \le X_S, \quad \text{a.s.}$$
Note. Equality need not hold a.s. here, even if $X$ is a non-negative martingale.
Proof. As in (59.2),
$$E(X_{T\wedge n} \mid \mathcal{F}_{S\wedge n}) \le X_{S\wedge n}, \quad \text{a.s.},$$
and, by modifying the arguments for (59.3), we indeed have
$$E(X_{T\wedge n} \mid \mathcal{F}_S) \le X_{S\wedge n}, \quad \text{a.s.}$$
Now let $n \to \infty$, and use the conditional form of Fatou's Lemma. □
The following commutativity property is often useful.
(59.6) THEOREM. For a stopping time $T$, define
$$E_T : L^1(\Omega, \mathcal{F}, P) \to L^1(\Omega, \mathcal{F}_T, P)$$
(note the $L^1$ rather than $\mathcal{L}^1$) by making $E_T(\xi)$ the equivalence class containing $E(\xi \mid \mathcal{F}_T)$, the distinction between $\xi$ and its equivalence class already being blurred! Then, for stopping times $S$ and $T$, we have
$$E_S E_T = E_T E_S = E_{S\wedge T}.$$
Proof. We make repeated use of Theorem 59.1 for UI martingales. Let $\xi \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$ and let $\xi_n := E(\xi \mid \mathcal{F}_n)$, a.s. Let $\eta = E(\xi \mid \mathcal{F}_T)$, a.s., so that, by (59.1), $\eta = \xi_T$, a.s. Moreover, $E(\eta \mid \mathcal{F}_n)$ and $\xi_{n\wedge T}$ are UI martingales, both with limiting value (a.s. equal to) $\xi_T$. Hence $E(\eta \mid \mathcal{F}_n) = \xi_{n\wedge T}$, a.s. for all $n$, and, by (59.1) yet again,
$$E(\eta \mid \mathcal{F}_S) = \xi_{S\wedge T}.$$
□
Note. In the continuous-parameter situation, we need a right-continuous version of $(\xi_t)$, and have to mirror the use of $\mathcal{L}^1$ rather than $L^1$ during the proof.
60. Exercises. There are a lot of exercises on this material in [W]. The following important exercises are not given there.
(E60.41) Conditional independence. Let $(\Omega, \mathcal{F}, P)$ be a probability triple, and let $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$ be sub-σ-algebras of $\mathcal{F}$. Show that the conditions
(i) $P(A \cap C \mid \mathcal{B}) = P(A \mid \mathcal{B})\,P(C \mid \mathcal{B})$, a.s. ($\forall A \in \mathcal{A}$, $\forall C \in \mathcal{C}$),
(ii) $P(C \mid \sigma(\mathcal{A}, \mathcal{B})) = P(C \mid \mathcal{B})$, a.s. ($\forall C \in \mathcal{C}$),
are equivalent. When one (then each) of these conditions holds, we say that $\mathcal{A}$ and $\mathcal{C}$ are conditionally independent given $\mathcal{B}$. The crucial application is to Markov-process theory, when $\mathcal{A}$ represents the Past, $\mathcal{B}$ the Present and $\mathcal{C}$ the Future.
(E60.50) Martingales and differentiation. Let $f \in \mathcal{L}^1([0,1], \mathcal{B}[0,1], \mathrm{Leb})$. Define
$$f_n(x) := 2^n \int_{(k-1)2^{-n}}^{k2^{-n}} f(y)\,dy \quad \text{if } (k-1)2^{-n} \le x < k2^{-n},$$
with (say) $f_n(1) := f(1)$. Prove that $f_n \to f$, a.e. and in $\mathcal{L}^1$. Use this result to complete Exercises E13.6b and E13.9c.
Hints
(H60.41) Suppose that (i) holds. We want to prove (ii). Now, sets of the form $A \cap B$, where $A \in \mathcal{A}$ and $B \in \mathcal{B}$, form a π-system that generates $\sigma(\mathcal{A}, \mathcal{B})$, so we need only prove, using (i), that
$$E\bigl(P(C \mid \mathcal{B}); A \cap B\bigr) = P(A \cap B \cap C).$$
This and the 'converse' part are exercises in using the 'Taking out what is known' property (41)(j).
(H60.50) Take $\Omega := [0,1]$, $\mathcal{F} := \mathcal{B}[0,1]$ and $P := \mathrm{Leb}$. Define
$$\mathcal{F}_n := \sigma\bigl([(k-1)2^{-n}, k2^{-n}) : 1 \le k \le 2^n\bigr).$$
Then $f_n = E(f \mid \mathcal{F}_n)$, and Lévy's Upward Theorem shows that $f_n \to f$, a.s. and in $\mathcal{L}^1$.
Hint for E13.6b. Suppose that $F \in \mathcal{F}$ satisfies $F \subseteq B$ and $P(F) > 0$. Let $f := I_F$ and let $\varepsilon > 0$. Since $f_n \to f$, a.s., there will be some large $n$ and some $k$ such that
$$f_n(x) > 1 - \tfrac12\varepsilon \quad \text{for } (k-1)2^{-n} \le x < k2^{-n}.$$
But $(F + 2mp) \bmod 1$ is a subset of $B$, and, for $1 \le i \le 2^n$, we can, by suitable choice of $m$, show that
$$P\bigl(B \cap [(i-1)2^{-n}, i2^{-n})\bigr) > (1 - \varepsilon)2^{-n}.$$
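The convergence $f_n \to f$ in $\mathcal{L}^1$ of Exercise E60.50 can be illustrated numerically. The sketch below is only an illustration under stated assumptions, not part of the exercises: it approximates each dyadic cell average by a midpoint rule, and the names `dyadic_averages`, `l1_error`, the test function `indicator` (the indicator of $[0, 1/3)$) and the sample count are all hypothetical choices.

```python
def dyadic_averages(f, n, samples_per_cell=64):
    """Approximate f_n = E(f | F_n): one average of f per dyadic cell
    [(k-1) 2^-n, k 2^-n), computed with a midpoint rule (an approximation)."""
    cells = 2 ** n
    width = 1.0 / cells
    step = width / samples_per_cell
    averages = []
    for k in range(cells):
        points = (k * width + (j + 0.5) * step for j in range(samples_per_cell))
        averages.append(sum(f(x) for x in points) / samples_per_cell)
    return averages


def l1_error(f, n, samples_per_cell=64):
    """Approximate the L^1 distance between f_n and f on [0, 1]."""
    cells = 2 ** n
    width = 1.0 / cells
    step = width / samples_per_cell
    averages = dyadic_averages(f, n, samples_per_cell)
    error = 0.0
    for k, average in enumerate(averages):
        for j in range(samples_per_cell):
            x = k * width + (j + 0.5) * step
            error += abs(average - f(x)) * step
    return error


def indicator(x):
    """Indicator function of [0, 1/3), chosen so that one cell always straddles the jump."""
    return 1.0 if x < 1.0 / 3.0 else 0.0


if __name__ == "__main__":
    for n in (3, 6, 9):
        print(n, l1_error(indicator, n))
```

For this choice of $f$ only the single cell containing $1/3$ contributes to the error, so the printed values shrink roughly like $2^{-n}$.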
5. CONTINUOUS-PARAMETER SUPERMARTINGALES
Regularisation: R-supermartingales
61. Orientation. You should not proceed with this Part 5 until you are very secure with the material of Part 4. Everything there will find applications here.
The essential first step, Doob's Regularity Theorem, helps establish that we can concentrate on right-continuous supermartingales relative to right-continuous filtrations. Right continuity of paths and filtrations will then allow us to transfer results from the discrete-parameter context of Part 4. We have to work with 'R-filtrations', filtrations satisfying the 'usual conditions', to obtain an adequate theory of stopping times.
(61.1) A guide to our notation. We shall begin by assuming given a 'rough' supermartingale $\{Y_t : t \in \mathbb{R}^+\}$ relative to a 'rough' filtration $\{\mathcal{G}_t : t \in \mathbb{R}^+\}$. This 'rough' setup is the 'obvious' generalisation of the discrete-parameter case; but it is not at all adequate. We shall show that, for almost all $\omega$, the limit (through rational times)
$$X_t(\omega) := \lim_{q \downarrow\downarrow t} Y_q(\omega) := \lim_{\mathbb{Q} \ni q \to t,\ q > t} Y_q(\omega)$$
exists simultaneously for all $t$, and defines a right-continuous supermartingale $X$ relative to the 'usual augmentation' $\{\mathcal{F}_t : t \in \mathbb{R}^+\}$ of $\{\mathcal{G}_t\}$. We call the setup $\{(X_t, \mathcal{F}_t) : t \in \mathbb{R}^+\}$ the R-regularisation of $\{(Y_t, \mathcal{G}_t)\}$. If the map $t \mapsto Y_t$ is right-continuous from $[0, \infty)$ into $\mathcal{L}^1$, as it will be in all cases of interest, then $X$ is a modification of $Y$.
The 'obvious' analogue of everything in Part 4 holds, except for the 'obvious' analogue of the Doob decomposition and the 'class (D) property of UI supermartingales' in Part (ii) of Theorem 59.1. The great (Doob-)Meyer Decomposition Theorem, which gives the correct generalisation of these things, is proved in full generality in Chapter VI. The theory of stopping times explodes into a huge subject, many of the deeper parts of which are also studied in Chapter VI.
Note. We use 'rough' with the connotation of 'rough diamond', one that can be 'polished'. Please note that 'raw' is a different technical term meaning 'not necessarily adapted'. In a sense, 'raw' means 'untameable'.
62. Some real-variable results. We collect here some elementary, but essential, real-variable results. We begin by defining the appropriate regularity property for the paths of supermartingales.
(62.1) DEFINITION (R-function on $\mathbb{R}^+$). A function $x : \mathbb{R}^+ \to \mathbb{R}$ will be called
an R-function if
$$x_t = \lim_{u \downarrow\downarrow t} x_u \quad \text{for every } t \ge 0,$$
$$x_{t-} := \lim_{s \uparrow\uparrow t} x_s \quad \text{exists finitely for every } t > 0.$$
The double-arrow notation used above will now be clarified.
(62.2) NOTATION (one-sided limits; $\downarrow\downarrow$, etc.). For a function $x : \mathbb{R}^+ \to \mathbb{R}$, we define
$$\limsup_{u \downarrow\downarrow t} x_u := \limsup_{u \to t,\ u > t} x_u = \inf_{v > t} \sup\{x_u : t < u \le v\}.$$
The corresponding lim inf is defined in the obvious way, and the corresponding lim exists if and only if the lim sup and lim inf have the same finite value. Of course, the $\uparrow\uparrow$ notation is used in a similar way.
Remarks. The French call an R-function càdlàg (continu à droite et pourvu de limites à gauche). Some writers use corlol (continuous on the right with limits on the left). In the first edition of this book, R-functions were called Skorokhod functions. Our current notation agrees with that in Volume 2.
We have already examined the difficulties caused by the fact that in studying processes with time-parameter set $\mathbb{R}^+$, we can only utilise σ-cylinders: essentially, we can only consider the behaviour of our process on a countable subset of times. For convenience, we work with the set $\mathbb{Q}^+$ of non-negative rational times. (In many ways, the set of dyadic-rational times would be more convenient!)
(62.3) IMPORTANT CONVENTION. The symbol $q$ as a subscript under a lim or lim sup or lim inf or $\bigcap$ stands for a RATIONAL number. We shall use $s$ or $u$ (or anything other than $q$) to signify a real number in this subscript context.
(62.4) More notation. Let $y$ be a function on $\mathbb{Q}^+$. We combine the notational conventions at (62.2) and (62.3) by writing, for example,
$$\limsup_{q \downarrow\downarrow t} y_q := \inf_{v > t} \sup\{y_q : q \in \mathbb{Q}^+,\ t < q \le v\}.$$
Here the double arrow notation helps emphasise that if $t$ happens to be rational then the value $y_t$ is not relevant to the lim sup just described. The definition of analogous things will be obvious.
(62.5) DEFINITION (Regularisable function on $\mathbb{Q}^+$). Let $y : \mathbb{Q}^+ \to \mathbb{R}$. We shall call $y$ regularisable if
$$\lim_{q \downarrow\downarrow t} y_q \quad \text{exists finitely for every real } t \ge 0,$$
$$\lim_{q \uparrow\uparrow t} y_q \quad \text{exists finitely for every real } t > 0.$$
(62.6) DEFINITION (Upcrossings; $U_N(y; [a,b])$ where $y : \mathbb{Q}^+ \to \mathbb{R}$). Let $y : \mathbb{Q}^+ \to \mathbb{R}$. Let $N \in \mathbb{N}$ and let $a, b \in \mathbb{R}$, where $a < b$. We define the number $U_N(y; [a,b])$ of upcrossings of $[a,b]$ by $y$ during the interval $[0,N]$ to be the supremum of $k \in \mathbb{Z}^+$ such that we can find rationals
$$0 \le q_1 < r_1 < q_2 < r_2 < \cdots < q_k < r_k \le N$$
with
$$y(q_i) < a, \quad y(r_i) > b \quad (1 \le i \le k).$$
Remarks. We use $y(q)$ rather than $y_q$ when typographically more convenient. Note that $U_N(y; [a,b])$ may well be $\infty$.
(62.7) THEOREM. Let $y : \mathbb{Q}^+ \to \mathbb{R}$. Then $y$ is regularisable if and only if whenever $N \in \mathbb{N}$ and $a, b \in \mathbb{Q}$ with $a < b$, we have both
(62.8) $$\sup\{|y_q| : q \in \mathbb{Q}^+ \cap [0,N]\} < \infty$$
and
(62.9) $$U_N(y; [a,b]) < \infty.$$
Proof. Now that you know the proof of Doob's Convergence Theorem 49.1, this is an easy exercise.
The 'if' part. Suppose that, whenever $N \in \mathbb{N}$ and $a, b \in \mathbb{Q}$ with $a < b$, statements (62.8) and (62.9) hold. Suppose for the purposes of contradiction that, for some $t$,
(62.10) $$\limsup_{q \downarrow\downarrow t} y_q > \liminf_{q \downarrow\downarrow t} y_q.$$
If we choose $a$ and $b$ in $\mathbb{Q}$ so that $a < b$ and both $a$ and $b$ lie strictly between the lim inf and lim sup in (62.10) then, for $N > t$, we shall have $U_N(y; [a,b]) = \infty$, contradicting (62.9). Hence (62.10) is false. We therefore see that for every $t \ge 0$,
$$\lim_{q \downarrow\downarrow t} y_q \quad \text{exists in } [-\infty, \infty],$$
and (62.8) guarantees that this limit is finite. The proof for the $\uparrow\uparrow$ limits is similar.
The 'only if' part. If $y$ is unbounded on $\mathbb{Q}^+ \cap [0,N]$ for some $N \in \mathbb{N}$, we can choose $q(n)$ in $\mathbb{Q}^+ \cap [0,N]$ such that $|y_{q(n)}| > n$. Let $t$ be an accumulation point of the set $\{q(n)\}$. Then at least one of the limits
(62.11) $$\lim_{q \downarrow\downarrow t} y_q, \quad \lim_{q \uparrow\uparrow t} y_q$$
must fail to exist in $\mathbb{R}$.
Suppose that, for some $a, b \in \mathbb{Q}$ with $a < b$ and for some $N \in \mathbb{N}$, $U_N(y; [a,b]) =$
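$\infty$. Define $t := \inf\{r \in \mathbb{R}^+ : U_r(y; [a,b]) = \infty\}$, the definition of $U_r(y; [a,b])$ being obvious. Then at least one of the limits at (62.11) must fail to exist, the lim sup being at least $b$ and the lim inf at most $a$. □
(62.12) COROLLARY. If $\{Y_q : q \in \mathbb{Q}^+\}$ is an $\mathbb{R}$-valued stochastic process carried by $(\Omega, \mathcal{G}, P)$ then
$$G := \{\omega : q \mapsto Y_q(\omega) \text{ is regularisable}\}$$
is an element of $\mathcal{G}$.
Proof. Theorem 62.7 allows us to exhibit $G$ as a σ-cylinder, which is then automatically in $\mathcal{G}$. Do think this through.
(62.13) THEOREM. Let $y : \mathbb{Q}^+ \to \mathbb{R}$ be a regularisable function. Then
$$x_t := \lim_{q \downarrow\downarrow t} y_q \quad (t \in \mathbb{R}^+)$$
defines an R-function $x$.
63. Filtrations; supermartingales; R-processes; R-supermartingales. Let $(\Omega, \mathcal{G}, P)$ be a probability triple.
(63.1) DEFINITION (filtration $\{\mathcal{G}_t : t \in \mathbb{R}^+\}$; filtered space). By a filtration $\{\mathcal{G}_t : t \in \mathbb{R}^+\}$ on $(\Omega, \mathcal{G}, P)$, we mean an increasing family of sub-σ-algebras of $\mathcal{G}$:
(63.2) $$\text{for } 0 \le s \le t, \quad \mathcal{G}_s \subseteq \mathcal{G}_t \subseteq \mathcal{G}_\infty := \sigma\Bigl(\bigcup_{u \in \mathbb{R}^+} \mathcal{G}_u\Bigr) \subseteq \mathcal{G}.$$
The setup $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t : t \in \mathbb{R}^+\})$ is then called a filtered space.
(63.3) DEFINITION (martingale, supermartingale, adapted process). Let $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t : t \in \mathbb{R}^+\})$ be a filtered space. By a martingale (relative to this setup), we mean an $\mathbb{R}$-valued process $\{Y_t : t \in \mathbb{R}^+\}$ such that
(i) $Y_t \in \mathcal{L}^1$ for every $t$;
(ii) $\{Y_t\}$ is $\{\mathcal{G}_t\}$-adapted in that $Y_t \in \mathrm{m}\mathcal{G}_t$ for every $t$;
(iii) for $0 \le s \le t$, $E(Y_t \mid \mathcal{G}_s) = Y_s$, a.s.
For the definition of supermartingale (respectively, submartingale), the '$=$' sign in (iii) is replaced by '$\le$' (respectively '$\ge$').
(63.4) LEMMA. Let $Y$ be a supermartingale relative to the filtered space $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t : t \in \mathbb{R}^+\})$. Let $t \in [0, \infty)$ and let $(q(n) : n \in -\mathbb{N})$ be a sequence of rationals with $q(n) \downarrow\downarrow t$ as $n \downarrow\downarrow -\infty$. Then
$$\lim_{n \to -\infty} Y_{q(n)} \quad \text{exists a.s. and in } \mathcal{L}^1.$$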
Proof. Simply apply the Lévy-Doob Downward Theorem 51.1 to the setup $(Y_{q(n)}, \mathcal{G}_{q(n)})$, noting that $\sup_{n \le -1} E(Y_{q(n)}) \le E(Y_t)$. □
(63.5) DEFINITION (R-process, R-supermartingale). A process is called an R-process if all its sample functions are R-functions. By an R-supermartingale, we mean a process that is both an R-process and a supermartingale.
64. Some important examples. We look at Brownian and Poisson examples, at hazard functions, and at an example of a UI martingale with a jump that is not in $\mathcal{L}^1$.
(64.1) Example: pre-Brownian motion. Consider the pre-Brownian motion
$$(\Omega, \mathcal{G}, P; \{Y_t : t \ge 0\}) := (\mathbb{R}^T, \mathcal{B}^T, \mu; \{\pi_t : t \in T\}) \quad (T := [0, \infty))$$
in (32.5). For $0 \le s \le t$, $Y_t - Y_s$ is independent of any finite subfamily of $\{Y_r : r \le s\}$, and therefore of the σ-algebra $\mathcal{G}_s := \sigma(Y_r : r \le s)$. Hence
$$E(Y_t - Y_s \mid \mathcal{G}_s) = E(Y_t - Y_s) = 0, \quad \text{a.s.},$$
and so $Y$ is a martingale relative to its natural filtration $\{\mathcal{G}_t\}$. Note that, since $E[(Y_t - Y_s)^2] = t - s$, the map $t \mapsto Y_t$ is continuous into $\mathcal{L}^2$, and therefore into $\mathcal{L}^1$.
(64.2) Example: compensated Poisson counting process. Use Section 37 to construct a Poisson measure $\Lambda$ on $[0, \infty)$ with intensity measure equal to the Lebesgue measure $\lambda$. Write $(\Omega, \mathcal{G}, P)$ for the carrier triple. Define $N_t := \Lambda[0,t]$, the number of Poisson occurrences during time interval $[0,t]$. Note that the function $t \mapsto N(t, \omega)$ is here already an R-function for every $\omega$. For $0 \le s \le t$,
$$N_t - N_s = \Lambda(s,t]$$
is independent of $\mathcal{G}_s := \sigma(N_r : r \le s)$, and $N_t - N_s$ has mean $t - s$. Hence $\{N_t - t\}$ is an R-martingale relative to the filtration $\{\mathcal{G}_t\}$. The process $\{N_t - t\}$ is called the compensated Poisson counting process. A serious study of 'compensators' is made in Chapter VI.
(64.3) Example: hazard functions. A more extended study of hazard or 'cumulative-risk' functions is made in Section VI.22.
Let $(\Omega, \mathcal{G}, P)$ be a probability triple, and let $T : \Omega \to (0, \infty)$ be a positive random variable. Let $F$ be the distribution function of $T$: $F(t) := P(T \le t)$. Define the right-continuous process $A := I_{[T,\infty)}$ via
$$A_t(\omega) := \begin{cases} 1 & \text{if } t \ge T(\omega), \\ 0 & \text{if } t < T(\omega); \end{cases}$$
and put $\mathcal{G}_t := \sigma(A_s : s \le t)$. Note that $\{\mathcal{G}_t\}$ is the smallest filtration relative to which $T$ is a stopping time in that $\{T \le t\} \in \mathcal{G}_t$ for every $t$.
Define the hazard or cumulative-risk function
$$h(t) := \int_{(0,t]} \frac{dF(u)}{1 - F(u-)}.$$
(64.4) LEMMA. The process $M$, where
$$M_t := A_t - h(T \wedge t),$$
is a martingale relative to the filtration $\{\mathcal{G}_t\}$.
Proof. For $s \le t$, the σ-algebra $\mathcal{G}_s$ is generated by the π-system consisting of $\Omega$ together with all sets of the form $\{T \le r\}$, where $r \le s$. Now, for $r \le s$, we have
$$E[M_t; T \le r] = E[1; T \le r] - E[h(T); T \le r] = E[M_s; T \le r].$$
Next, we have
$$E(M_t) = P(T \le t) - E[h(T); T \le t] - E[h(t); T > t] = F(t) - \int_{(0,t]} h(r)\,dF(r) - h(t)[1 - F(t)].$$
However,
$$\int_{(0,t]} h(r)\,dF(r) = \int_{(0,t]} dF(r) \int_{(0,r]} dF(v)\,[1 - F(v-)]^{-1}$$
$$= \int_{(0,t]} dF(v)\,[1 - F(v-)]^{-1} \int_{[v,t]} dF(r)$$
$$= \int_{(0,t]} dF(v)\,[1 - F(v-)]^{-1}\,[F(t) - F(v-)]$$
$$= \int_{(0,t]} dF(v)\,[1 - F(v-)]^{-1}\,\{1 - F(v-) - [1 - F(t)]\}$$
$$= \int_{(0,t]} dF(v) - h(t)[1 - F(t)]$$
$$= F(t) - h(t)[1 - F(t)],$$
so that $E(M_t; \Omega) = E(M_s; \Omega) = 0$. Hence, by the π-system characterisation of conditional expectation in Definition 39.1, we have $E(M_t \mid \mathcal{G}_s) = M_s$, a.s. □
(64.5) Example. Consider the previous example when $T$ has the exponential distribution of rate 1, so that, for $t > 0$,
$$F(t) = 1 - e^{-t}, \quad h(t) = t, \quad M_t = A_t - (T \wedge t).$$
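In the exponential case the identity $E(M_t) = 0$ from the proof of Lemma 64.4 is easy to check by simulation. The following is a minimal Monte Carlo sketch under the assumptions of Example 64.5 (rate 1, so $h(t) = t$); the function name, trial count and seed are illustrative choices, not part of the text.

```python
import random


def mean_compensated_indicator(t, trials=100000, seed=1):
    """Monte Carlo estimate of E(M_t), where M_t = I{T <= t} - min(T, t)
    and T is exponential of rate 1 (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        T = rng.expovariate(1.0)            # exponential random variable, rate 1
        a_t = 1.0 if T <= t else 0.0        # A_t = I_{[T, infinity)}(t)
        total += a_t - min(T, t)            # M_t = A_t - h(T ^ t), with h(u) = u
    return total / trials


if __name__ == "__main__":
    for t in (0.5, 1.0, 3.0):
        print(t, mean_compensated_indicator(t))
```

Each printed estimate should be close to 0, up to sampling error of order `trials ** -0.5`.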
Let $g$ be a deterministic function on $[0, \infty)$, let $G(t) := \int_0^t g(s)\,ds$, and define
$$Z_t := \int_0^t g(s)\,dM_s = g(T)\,I_{\{T \le t\}} - G(T \wedge t) = \begin{cases} g(T) - G(T) & \text{if } T \le t, \\ -G(t) & \text{if } T > t. \end{cases}$$
Now choose $g$ so that
$$G(t) = \frac{e^t - 1}{1 + t}, \quad \text{whence} \quad g(t) = \frac{te^t + 1}{(1 + t)^2}.$$
(64.6) LEMMA. The process $Z$ is then a UI martingale such that $Z(T-)$ is not in $\mathcal{L}^1$.
Proof. We have
$$|g(T) - G(T)| = \frac{|2 + T - e^T|}{(1 + T)^2} \le \frac{e^T}{(1 + T)^2}.$$
Since
$$E\,\frac{e^T}{(1 + T)^2} = \int_0^\infty \frac{e^s}{(1 + s)^2}\,e^{-s}\,ds = 1,$$
we see that the process $\{Z_t I_{\{T \le t\}}\}$ is dominated in $\mathcal{L}^1$. Moreover,
$$E(|Z_t I_{\{T > t\}}|; |Z_t| > K) = \begin{cases} (1 + t)^{-1}(e^t - 1)e^{-t} & \text{if } (1 + t)^{-1}(e^t - 1) > K, \\ 0 & \text{otherwise.} \end{cases}$$
Thus both the processes $\{Z_t I_{\{T \le t\}}\}$ and $\{Z_t I_{\{T > t\}}\}$ are UI, whence $Z$ is UI.
Proof of the martingale property of $Z$, along the lines of the proof of Lemma 64.4, is left as an exercise. Note that
$$E\,Z(T-) = -E\,G(T-) = -\int_0^\infty \frac{e^s - 1}{1 + s}\,e^{-s}\,ds = -\infty.$$
□
Remarks. Examples similar to $Z$ were studied by Dellacherie and by Doléans-Dade. We can obtain the same phenomenon in discrete time with $Z(T - 1)$ instead of $Z(T-)$.
65. Doob's Regularity Theorem: Part 1. Now the full power of the Upcrossing Lemma is brought into play.
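(65.1) THEOREM (Doob's Regularity Theorem: Part 1). Let $\{Y_t : t \in \mathbb{R}^+\}$ be a supermartingale carried by the filtered space $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t : t \in \mathbb{R}^+\})$. Let
$$G := \{\omega : \text{the map } q \mapsto Y_q(\omega) \text{ from } \mathbb{Q}^+ \text{ to } \mathbb{R} \text{ is regularisable}\}.$$
Then $G \in \mathcal{G}$ and $P(G) = 1$. For $t \in \mathbb{R}^+$, define
$$X_t(\omega) := \begin{cases} \lim_{q \downarrow\downarrow t} Y_q(\omega) & \text{if } \omega \in G, \\ 0 & \text{if } \omega \notin G. \end{cases}$$
Then $X$ is an R-process in that all sample paths of $X$ are R-functions.
Proof. Look again at Theorem 62.7. Because there are only countably many triples $(N, a, b)$ where $N \in \mathbb{N}$ and $a, b \in \mathbb{Q}$ with $a < b$, we need only show that, for fixed $N \in \mathbb{N}$ and fixed $a, b \in \mathbb{Q}$ with $a < b$, we have
(65.2) $$P\bigl(\sup\{|Y_q| : q \in \mathbb{Q}^+ \cap [0,N]\} < \infty\bigr) = 1,$$
(65.3) $$P\bigl(U_N(Y|_{\mathbb{Q}^+}; [a,b]) < \infty\bigr) = 1,$$
where the number of upcrossings relates to the restriction of $t \mapsto Y_t(\omega)$ to the set $\mathbb{Q}^+ \cap [0,N]$.
Let $(D(m))$ be a sequence of finite subsets of $\mathbb{Q}^+ \cap [0,N]$, each containing 0 and $N$, and with $D(m) \uparrow \mathbb{Q}^+ \cap [0,N]$. Then, by Lemma 54.5 applied to $\{Y_q : q \in D(m)\}$, we have, for $c > 0$,
$$c\,P\bigl(\sup\{|Y_q| : q \in \mathbb{Q}^+ \cap [0,N]\} > 3c\bigr) = \uparrow\lim_m c\,P\bigl(\sup\{|Y_q| : q \in D(m)\} > 3c\bigr) \le 4E(|Y_0|) + 3E(|Y_N|),$$
and (65.2) follows. By the Upcrossing Lemma 48.3, we find that
$$E\,U_N(Y|_{\mathbb{Q}^+}; [a,b]) = \uparrow\lim_m E\,U_N(Y|_{D(m)}; [a,b]) \le \frac{E(|Y_N|) + |a|}{b - a},$$
and (65.3) follows.
(65.4) Example. Suppose that $\Omega = \{+1, -1\}$, that $P(\{+1\}) = P(\{-1\}) = \tfrac12$, that $\mathcal{G}_t = \{\emptyset, \Omega\}$ when $t \le 1$ and that $\mathcal{G}_t = \mathcal{P}(\Omega)$ when $t > 1$. Suppose that, for $\omega \in \Omega$,
$$Y_t(\omega) := \begin{cases} 0 & \text{if } t \le 1, \\ \omega & \text{if } t > 1. \end{cases}$$
Then $Y$ is a martingale relative to the filtration $\{\mathcal{G}_t\}$, and
$$X_t(\omega) = \begin{cases} 0 & \text{if } t < 1, \\ \omega & \text{if } t \ge 1. \end{cases}$$
Note that $X_1$ is not $\mathcal{G}_1$-measurable, so that $X$ is not a martingale relative to the filtration $\{\mathcal{G}_t\}$. Moreover, $P(X_1 = Y_1) = 0$, so that $X$ is not a modification of $Y$. This example explains our next concerns.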
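The proof of Theorem 65.1 works with upcrossing counts over finite time sets $D(m)$, and for a finite set the count in Definition 62.6 can be computed directly. The following is a small illustrative sketch, not part of the text: `count_upcrossings` is a hypothetical helper that takes the values of $y$ listed in increasing time order and counts upcrossings of $[a,b]$ greedily, which attains the supremum in the definition for a finite sequence.

```python
def count_upcrossings(values, a, b):
    """Number of upcrossings of [a, b] by a finite sequence of values,
    listed in time order: a visit strictly below a followed later by a
    visit strictly above b counts once (hypothetical helper)."""
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    below = False                 # True once the path has dipped below a
    for v in values:
        if not below:
            if v < a:
                below = True      # found the next q_i with y(q_i) < a
        elif v > b:
            count += 1            # found the matching r_i with y(r_i) > b
            below = False
    return count


if __name__ == "__main__":
    path = [0.0, 2.0, 0.0, 2.0, 0.0]
    print(count_upcrossings(path, 0.5, 1.5))
```

Enlarging the finite time set can only increase the count, which is the monotonicity behind the $\uparrow\lim$ in the estimate for (65.3).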
66. Partial augmentation. We continue with the notation and assumptions of Theorem 65.1. The set $G$ in that theorem is an element of the σ-algebra $\mathcal{N}(\mathcal{G}_\infty)$ of $P$-trivial sets in $\mathcal{G}_\infty$, that is, sets in $\mathcal{G}_\infty$ of $P$-measure 0 or 1. The definition of $X$ in Theorem 65.1 makes the following lemma obvious.
(66.1) LEMMA and DEFINITION (partial augmentation). The process $X$ is adapted to the filtration $\{\mathcal{H}_t\}$, where
$$\mathcal{H}_t := \sigma(\mathcal{G}_{t+}, \mathcal{N}(\mathcal{G}_\infty)), \quad \text{and} \quad \mathcal{G}_{t+} := \bigcap_{v > t} \mathcal{G}_v = \bigcap_{q > t} \mathcal{G}_q.$$
We call $\{\mathcal{H}_t\}$ the partial augmentation of $\{\mathcal{G}_t\}$.
(66.2) THEOREM (Doob's Regularity Theorem: Part 2). The process $X$ is a supermartingale relative to $\{\mathcal{H}_t\}$. Moreover, $X$ is a modification of $Y$ if and only if the map $t \mapsto Y_t$ is right-continuous into $\mathcal{L}^1$, that is, if and only if
$$\lim_{v \downarrow\downarrow t} E(|Y_v - Y_t|) = 0 \quad \text{for every } t \ge 0.$$
Proof. For the moment, fix $v$ and $t$ with $v > t \ge 0$. Suppose that $(q(n) : n \in -\mathbb{N})$ is a sequence of rationals with $v > q(n) \downarrow\downarrow t$ as $n \downarrow\downarrow -\infty$. (For sequences, the $\downarrow\downarrow$ notation is understood to imply monotonicity: $q(n) \ge q(n-1) > t$.) We have
$$E(Y_v \mid \mathcal{G}_{q(n)}) \le Y_{q(n)}, \quad \text{a.s.}$$
Using the Lévy-Doob Downward Theorem 51.1 (for martingales!), we have
$$E(Y_v \mid \mathcal{G}_{t+}) \le X_t, \quad \text{a.s.},$$
whence, trivially,
(66.3) $$E(Y_v \mid \mathcal{H}_t) \le X_t, \quad \text{a.s.}$$
Now suppose instead that $u \ge t$ and that $(q(n))$ is a sequence of rationals with $q(n) \downarrow\downarrow u$. From (66.3), we have
(66.4) $$E(Y_{q(n)} \mid \mathcal{H}_t) \le X_t, \quad \text{a.s.}$$
However, we know from Lemma 63.4 and Theorem 65.1 that $Y_{q(n)} \to X_u$ in $\mathcal{L}^1$; and, by the $\mathcal{L}^1$ continuity of conditional expectations, we now have
$$E(X_u \mid \mathcal{H}_t) \le X_t, \quad \text{a.s.}$$
Hence $X$ is a supermartingale relative to $\{\mathcal{H}_t\}$.
It now follows from Lemma 63.4 and the right-continuity of $X$ that $X$ is right-continuous in $\mathcal{L}^1$. Since we also know that if $q(n) \downarrow\downarrow t$ then $Y_{q(n)} \to X_t$ in $\mathcal{L}^1$, it follows that $X$ is a modification of $Y$ if and only if $Y$ is right-continuous in $\mathcal{L}^1$. □
67. Usual conditions; R-filtered space; usual augmentation; R-regularisation. We continue the discussion in the last two sections. The partial augmentation $\{\mathcal{H}_t\}$ of $\{\mathcal{G}_t\}$ does not allow a sufficiently rich class of stopping times. We need the so-called 'usual augmentation'.
(67.1) DEFINITION (usual conditions; R-filtered space). A filtered space $(\Omega, \mathcal{F}, P; \{\mathcal{F}_t : t \in \mathbb{R}^+\})$ is said to satisfy the usual conditions and then to be an R-filtered space if, in addition to the filtration property
$$\text{for } 0 \le s \le t, \quad \mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}_\infty := \sigma\Bigl(\bigcup_{u \in \mathbb{R}^+} \mathcal{F}_u\Bigr) \subseteq \mathcal{F},$$
the following properties hold:
(i) the σ-algebra $\mathcal{F}$ is $P$-complete;
(ii) $\mathcal{F}_0$ contains all $P$-null sets in $\mathcal{F}$;
(iii) $\{\mathcal{F}_t\}$ is right-continuous in that
$$\mathcal{F}_t = \mathcal{F}_{t+} := \bigcap_{u > t} \mathcal{F}_u \quad \text{for all } t \ge 0.$$
(67.2) Remark. The 'usual conditions' described above are those standard in the literature. In some contexts, however, a more appropriate definition is obtained by replacing $\mathcal{F}$ by $\mathcal{F}_\infty$ in properties (i) and (ii). We stick to Definition 67.1, though the current Remark will haunt us in Markov-process theory.
(67.3) DEFINITION (usual augmentation, R-regularisation). Let $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$ be a filtered space. The usual augmentation, or R-regularisation $(\Omega, \mathcal{F}, P, \{\mathcal{F}_t\})$ of this setup is the minimal enlargement ('enlargement' in that $\mathcal{G} \subseteq \mathcal{F}$ and $\mathcal{G}_t \subseteq \mathcal{F}_t$ for every $t$) that satisfies the usual conditions.
(67.4) LEMMA. The usual augmentation of $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$ is obtained by making $\mathcal{F}$ the $P$-completion of $\mathcal{G}$, and, with $\mathcal{N}$ denoting the collection of $P$-null sets in $\mathcal{F}$, setting
(67.5) $$\mathcal{F}_t = \bigcap_{u > t} \sigma(\mathcal{G}_u, \mathcal{N}) = \sigma(\mathcal{G}_{t+}, \mathcal{N}).$$
If $t \ge 0$ and $F \in \mathcal{F}_t$ then there exists $G$ in $\mathcal{G}_{t+}$ such that
$$F \triangle G := (F \setminus G) \cup (G \setminus F) \in \mathcal{N}.$$
Important Note. The equality of the last two terms of (67.5) needs proof because it is not true in general that if $(\Sigma_n)$ is a decreasing sequence of σ-algebras on a set $S$ with intersection $\Sigma$, and $\mathcal{S}$ is a σ-algebra on $S$, then $\bigcap_n \sigma(\Sigma_n, \mathcal{S}) = \sigma(\Sigma, \mathcal{S})$. See, for example, Section 4.12 of [W]. Of course, in (67.5), $\mathcal{N}$ consists of all $P$-null sets, and this makes (67.5) easy to prove. Go through the Exercise E79.67b of proving the lemma in full.
(67.6) PROPOSITION. In the context of Parts 1 and 2 of the Regularity Theorem, $X$ is an $\{\mathcal{F}_t\}$ supermartingale. The structure $\{(X_t, \mathcal{F}_t)\}$ is called the R-regularisation of $\{(Y_t, \mathcal{G}_t)\}$.
Proof. This is obvious because, for each $t$, $\mathcal{H}_t$ and $\mathcal{F}_t$ differ only by null sets, as was explained at the end of Lemma 67.4. □
We now examine what happens when our underlying filtration does satisfy the usual conditions.
(67.7) THEOREM (Doob's Regularity Theorem: Part 3). Let $(\Omega, \mathcal{F}, P, \{\mathcal{F}_t\})$ be an R-filtered space relative to which supermartingales etc. are defined. Let $Y$ be a supermartingale. Then $Y$ has an R-process modification $Z$ if and only if the map $t \mapsto E(Y_t)$ from $[0, \infty)$ to $\mathbb{R}$ is right-continuous, and then $Z$ is an R-supermartingale.
Proof. From the supermartingale property of $Y$, we have, for $u > t$,
(67.8) $$E(Y_u \mid \mathcal{F}_t) \le Y_t, \quad \text{a.s.}$$
Now let $X$ be constructed from $Y$ as in Parts 1 and 2 of the Regularity Theorem. Let $u \downarrow\downarrow t$ in (67.8), and use the fact that $Y_u \to X_t$ in $\mathcal{L}^1$, to obtain
$$E(X_t \mid \mathcal{F}_t) \le Y_t, \quad \text{a.s.}$$
However, $\{X_t\}$ is $\{\mathcal{F}_t\}$-adapted, the usual augmentation of $\{\mathcal{F}_t\}$ being $\{\mathcal{F}_t\}$ itself. Hence $X_t \le Y_t$, a.s. But, if $t \mapsto E(Y_t)$ is right-continuous then, since $Y_u \to X_t$ in $\mathcal{L}^1$, we have
(67.9) $$E(X_t) = \lim_{u \downarrow\downarrow t} E(Y_u) = E(Y_t),$$
and, on comparing (67.8) with (67.9), we see that $X_t = Y_t$, a.s. Thus, if the map $t \mapsto E(Y_t)$ is right-continuous then $X$ is an R-supermartingale modification of $Y$. The rest is trivial. □
The following lemma answers a frequently occurring question.
(67.10) LEMMA. Suppose that $Y$ is an R-supermartingale relative to a filtered space $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$. Then $Y$ is also an R-supermartingale relative to the usual augmentation $(\Omega, \mathcal{F}, P, \{\mathcal{F}_t\})$.
Proof. Let $0 \le t < v$. Then, for $u(n) \downarrow\downarrow t$ with $u(n) < v$ for every $n$, we have
$$E(Y_v \mid \mathcal{G}_{u(n)}) \le Y_{u(n)}, \quad \text{a.s.},$$
whence, applying the Downward Theorem on the left-hand side and right
continuity on the right-hand side, we obtain
$$E(Y_v \mid \mathcal{G}_{t+}) \le Y_t, \quad \text{a.s.}$$
Hence $E(Y_v \mid \mathcal{F}_t) \le Y_t$, a.s. □
68. A necessary pause for thought. We must not rush too quickly to claim that we can henceforth consider only R-supermartingales relative to R-filtered spaces.
Let us consider rather carefully the canonical pre-Brownian motion
(68.1) $$(\Omega, \mathcal{G}, P; \{Y_t : t \ge 0\})$$
of (64.1). We define the natural filtration $\mathcal{G}_t := \sigma(Y_r : r \le t)$ of $Y$. Then $Y$ is a martingale relative to this filtration. Since the martingale $Y$ is also continuous in $\mathcal{L}^1$, we can find a modification $X$ of $Y$ that is an R-supermartingale relative to the usual augmentation $(\mathcal{F}, \{\mathcal{F}_t\})$ of $(\mathcal{G}, \{\mathcal{G}_t\})$.
But $\mathcal{F}_t$ contains information about what happens just after time $t$, and various questions are raised. Have we destroyed the Markov property? We know that, for $t, u > 0$, $Y_{t+u} - Y_t$ is independent of $\mathcal{G}_t$, and it is easy to see that $X_{t+u} - X_t$ is independent of $\mathcal{G}_t$. But is $X_{t+u} - X_t$ independent of $\mathcal{F}_t$? More fundamentally, we can ask whether $\mathcal{F}_t$ really looks into the future, or whether it is true that $\mathcal{F}_t = \sigma(\mathcal{G}_t, \mathcal{N})$, where $\mathcal{N}$ is the collection of $P$-null sets in $\mathcal{F}$. We know, of course, that $\mathcal{F}_t = \sigma(\mathcal{G}_{t+}, \mathcal{N})$.
We now resolve these questions. It seems best to state the results first as they relate to the original setup (68.1).
(68.2) THEOREM. For the canonical pre-Brownian motion setup at (68.1), with its natural filtration $\{\mathcal{G}_t\}$, the following results hold.
(i) For $t \ge 0$, $\mathcal{U}_t := \sigma(Y_{t+u} - Y_t : u \ge 0)$ is independent of $\mathcal{G}_{t+}$.
(ii) For $t \ge 0$, $\mathcal{G}_{t+} \subseteq \sigma(\mathcal{G}_t, \mathcal{N}(\mathcal{G}_\infty))$, where $\mathcal{N}(\mathcal{G}_\infty)$ denotes the collection of $P$-null subsets in $\mathcal{G}_\infty$.
We can interpret (i) as stating that looking ahead an infinitesimal amount by replacing $\mathcal{G}_t$ by $\mathcal{G}_{t+}$ does not destroy the crucial independence property. The result (ii) makes it clear that $\mathcal{G}_{t+}$ does not really look ahead of $t$: every $\mathcal{G}_{t+}$ set differs from some $\mathcal{G}_t$ set by a null set.
Proof of (i).
It is clear from the independence properties of $Y$ and from the construction of $X$ in Part 1 of the Regularity Theorem that, for $t, u \ge 0$ and $\varepsilon > 0$, $X_{t+u+\varepsilon} - X_{t+\varepsilon}$ is independent of $\mathcal{G}_{t+(1/2)\varepsilon}$ and hence of $\mathcal{G}_{t+}$. Hence, for any $G$ in $\mathcal{G}_{t+}$ and any bounded continuous function $f$ on $\mathbb{R}$,
$$E[f(X_{t+u+\varepsilon} - X_{t+\varepsilon}); G] = P(G)\,E[f(X_{t+u+\varepsilon} - X_{t+\varepsilon})].$$
We use the right-continuity of $X$ and the Bounded-Convergence Theorem to
deduce that
(68.3) $$E[f(X_{t+u} - X_t); G] = P(G)\,E[f(X_{t+u} - X_t)].$$
The Monotone-Class Theorem shows that for each fixed $G$, the result (68.3) holds for every bounded Borel function $f$, whence $X_{t+u} - X_t$ is independent of $\mathcal{G}_{t+}$. Since $Y$ is a modification of $X$, $Y_{t+u} - Y_t$ is independent of $\mathcal{G}_{t+}$.
But now, for $t, u, v \ge 0$, the variable $Y_{t+u+v} - Y_{t+u}$ is independent of $\mathcal{G}_{(t+u)+}$; and $\mathcal{G}_{t+}$ and $\sigma(Y_{t+u} - Y_t)$ are independent sub-σ-algebras of $\mathcal{G}_{(t+u)+}$; etc. Full proof that $\mathcal{U}_t$ is independent of $\mathcal{G}_{t+}$ is now left as an exercise.
Proof of (ii). Note that $\mathcal{G}_\infty = \sigma(\mathcal{G}_t, \mathcal{U}_t)$, so that $\mathcal{G}_\infty$ is generated by the π-system $\mathcal{I}_t$ of sets of the form $G_t \cap A_t$, where $G_t \in \mathcal{G}_t$ and $A_t \in \mathcal{U}_t$.
Let $\eta$ be bounded $(\mathcal{G}_{t+})$-measurable, and let $\xi$ be a version of $\eta - E(\eta \mid \mathcal{G}_t)$. All that we need to show is that $\xi = 0$, a.s. Since $\xi$ is $(\mathcal{G}_{t+})$-measurable, $\xi$ is independent of $\mathcal{U}_t$. So, for $G_t \in \mathcal{G}_t$ and $A_t \in \mathcal{U}_t$, we have
$$E(\xi; G_t \cap A_t) = P(A_t)\,E(\xi; G_t) = 0,$$
the last equality holding because of the definition of conditional expectation. Hence $\xi$ is orthogonal to the indicator function of any element of the π-system $\mathcal{I}_t$, and hence to the indicator function of every element of $\mathcal{G}_\infty$. Since $\xi$ is $(\mathcal{G}_\infty)$-measurable, $\xi = 0$, a.s. □
Theorem 68.2 has the following consequences for the R-regularisation $(X, \mathcal{F}, \{\mathcal{F}_t\})$ of $(Y, \mathcal{G}, \{\mathcal{G}_t\})$.
(68.4) THEOREM. For the R-regularisation $(X, \mathcal{F}, \{\mathcal{F}_t\})$ of the canonical pre-Brownian motion $(Y, \mathcal{G}, \{\mathcal{G}_t\})$, the following statements are true.
(i) For $t \ge 0$, $\sigma(X_{t+u} - X_t : u \ge 0)$ is independent of $\mathcal{F}_t$.
(ii) For $t \ge 0$, $\mathcal{F}_t = \sigma(\mathcal{G}_t, \mathcal{N})$, where $\mathcal{N}$ is the collection of $P$-null elements of $\mathcal{F}$ (which here equals $\mathcal{F}_\infty$).
Note that since $X$ is a modification of $Y$, we have proved the following result.
(68.5) LEMMA. There exists an R-process $X$ with Wiener measure as its law.
We shall see later that almost all paths of this process $X$ are continuous.
Results such as Theorem 68.4 are very much part of the reason that we can concentrate on R-supermartingales relative to R-filtrations. We need considerable extensions of Theorem 68.4 later.
69. Convergence theorems for R-supermartingales. Right-continuity of paths allows us to prove the convergence theorem for continuous-parameter supermartingales in the same way as we proved the discrete-parameter result.
(69.1) THEOREM (Doob's Supermartingale-Convergence Theorem; compare Theorem 49.1). Let $X$ be an R-supermartingale relative to a filtered space $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$. Suppose further that $X$ is bounded in $\mathcal{L}^1$: $\sup_t E(|X_t|) < \infty$. Then $X_\infty := \lim_{t \to \infty} X_t$ exists (in $\mathbb{R}$), a.s.
Proof. Since $t \mapsto X_t(\omega)$ is right-continuous,
$$\limsup_{t \to \infty} X_t = \limsup_{q \to \infty} X_q, \quad \liminf_{t \to \infty} X_t = \liminf_{q \to \infty} X_q.$$
If, therefore, $\lim_t X_t(\omega)$ did not exist in $[-\infty, \infty]$, we could find rationals $a, b$ with $a < b$ such that
$$\liminf_{q \to \infty} X_q < a < b < \limsup_{q \to \infty} X_q,$$
whence the number $U_\infty(X|_{\mathbb{Q}^+}; [a,b])$ of upcrossings of $[a,b]$ by the restriction $X|_{\mathbb{Q}^+}$ would be infinite. However, by the Upcrossing Lemma and familiar arguments,
$$E\,U_\infty(X|_{\mathbb{Q}^+}; [a,b]) \le (b - a)^{-1}\bigl(\sup_t E(|X_t|) + |a|\bigr) < \infty.$$
Thus $X_\infty$ exists, a.s., in $[-\infty, \infty]$. That $X_\infty \in \mathbb{R}$, a.s. follows as usual by Fatou's Lemma. □
(69.2) THEOREM (Doob's Convergence Theorem: Part 2; compare Theorem 50.1). Continue with the notation and assumptions of Theorem 69.1.
(i) If $\{X_t : t \ge 0\}$ is further assumed to be UI then $X_t \to X_\infty$ in $\mathcal{L}^1$, and, for every $t$, $E(X_\infty \mid \mathcal{G}_t) \le X_t$, a.s., with a.s. equality if $X$ is a martingale.
(ii) If $X$ is a martingale and $X_t \to X_\infty$ in $\mathcal{L}^1$ then $X$ is UI.
Proof of (i) is an obvious modification of the proof of Theorem 50.1. Part (ii) is an immediate consequence of the $\mathcal{L}^1$ continuity of conditional expectations and the UI property established in Theorem 44.1.
(69.3) Warning: in continuous time, an $\mathcal{L}^1$-convergent supermartingale need not be UI. See Exercise 79.64 or part (v) of E79.66a.
(69.4) THEOREM (compare the Downward Theorem 51.1). Suppose that we have an R-supermartingale $(\Omega, \mathcal{G}, P; \{(\mathcal{G}_t, X_t) : t > 0\})$ with parameter set $(0, \infty)$ open at 0. Suppose further that $\sup_{t > 0} E(|X_t|) < \infty$. Then $X_{0+} := \lim_{t \downarrow\downarrow 0} X_t$ exists a.s. and in $\mathcal{L}^1$, and, for every $t$,
$$E(X_t \mid \mathcal{G}_{0+}) \le X_{0+}, \quad \text{a.s.}$$
Proof. Yet again, the existence of the limit in $[-\infty, \infty]$ follows from the Upcrossing Lemma. The rest follows closely the proof of Theorem 51.1. □

(69.5) THEOREM (Upward Theorem for martingales). Suppose that $(\Omega, \mathcal{F}, P; \{\mathcal{F}_t : t \ge 0\})$ satisfies the usual conditions. Let $\xi \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$. Then there exists a UI R-martingale $\{\xi_t : t \ge 0\}$ with $\xi_t = E(\xi | \mathcal{F}_t)$, a.s. As $t \to \infty$, $\xi_t \to E(\xi | \mathcal{F}_\infty)$, a.s. and in $\mathcal{L}^1$.

This is now an easy exercise.

70. Inequalities and $\mathcal{L}^p$ convergence for R-submartingales. Right continuity of paths also allows us to transfer the standard inequalities from the discrete-parameter case.

(70.1) THEOREM (Doob's Submartingale Inequality; compare Theorem 52.1). Let $Z$ be a non-negative R-submartingale relative to the filtered space $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t\})$. Then, for $c > 0$ and $t \ge 0$,
$c\,P(\sup_{s \le t} Z_s \ge c) \le E(Z_t; \sup_{s \le t} Z_s \ge c) \le E(Z_t)$.

Proof. Let $(D(m))$ be an increasing sequence of finite subsets of $[0,t]$ each containing the points 0 and $t$, and insist that the union of the $D(m)$ is dense in $[0,t]$. Then
$\sup_{s \in [0,t]} Z_s(\omega) = \sup_m \sup_{s \in D(m)} Z_s(\omega)$,
so that
$\{\sup_{s \in [0,t]} Z_s > c\} = \uparrow\lim_m \{\sup_{s \in D(m)} Z_s > c\}$.
Note that we need '$>$' not '$\ge$' for this logic. Thus, for $y > 0$,
$P\{\sup_{s \in [0,t]} Z_s > y\} = \uparrow\lim_m P\{\sup_{s \in D(m)} Z_s > y\} \le \lim_m y^{-1} E(Z_t; \sup_{s \in D(m)} Z_s > y) = y^{-1} E(Z_t; \sup_{s \in [0,t]} Z_s > y)$.
Now let $y \uparrow\uparrow c$. □

(70.2) THEOREM (Doob's $\mathcal{L}^p$ inequality; compare Theorem 52.3). Let $p > 1$ and define $q$ so that $p^{-1} + q^{-1} = 1$. Let $Z$ be a nonnegative R-submartingale
relative to some filtered space $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t\})$. Assume further that $Z$ is bounded in $\mathcal{L}^p$. Define $Z^* := \sup_t |Z_t| = \sup_t Z_t$. Then $Z^* \in \mathcal{L}^p$, and indeed
(70.3) $\|Z^*\|_p \le q \sup_t \|Z_t\|_p$.
The submartingale $Z$ is therefore dominated by the element $Z^*$ of $\mathcal{L}^p$. Also, $Z_\infty := \lim Z_t$ exists a.s. and in $\mathcal{L}^p$, and
$\|Z_\infty\|_p = \sup_t \|Z_t\|_p = \lim_t \|Z_t\|_p$.
If $Z$ is of the form $|M|$, where $M$ is an R-martingale bounded in $\mathcal{L}^p$, then $M_\infty := \lim M_t$ exists a.s. and in $\mathcal{L}^p$, and $E(M_\infty | \mathcal{G}_t) = M_t$, a.s.

The proof is left as an exercise.

71. Martingale proof of Wiener's Theorem; canonical Brownian motion. We gave the Lévy-Ciesielski proof of Wiener's Theorem in Section I.6. Now, we give a martingale proof. We know that the pre-Brownian motion $Y$ has an R-process modification $X$ that is a martingale relative to the usual augmentation $\{\mathcal{F}_t\}$ of the natural filtration of $Y$. Perhaps it is here most natural to work with the natural filtration $\{\mathcal{X}_t\}$ of $X$, defining martingales etc. with respect to $\{\mathcal{X}_t\}$. However, you can work with $\{\mathcal{F}_t\}$ if you wish: it does not matter.

(71.1) THEOREM. P-almost all paths of $X$ are continuous.

Proof. The process $X^4$ is an R-submartingale, by the conditional form of Jensen's inequality. Hence, by the Submartingale Inequality, we have for $\varepsilon > 0$,
$\varepsilon^4 P(\sup_{s \le \delta} |X_s| > \varepsilon) = \varepsilon^4 P(\sup_{s \le \delta} X_s^4 > \varepsilon^4) \le E(X_\delta^4) = K\delta^2$,
where $K = E\xi^4$, $\xi$ denoting a random variable with the standard normal distribution. (In fact, $K = 3$.) Thus if $D(n) := \{k2^{-n} : 0 \le k < 2^n\} \subset [0,1]$, we have with $\delta_n := 2^{-n}$ and $\varepsilon_n := n^{-1}$,
$P(\sup_{r \in D(n)} \sup_{s \le \delta_n} |X_{r+s} - X_r| > \varepsilon_n) \le 2^n P(\sup_{s \le \delta_n} |X_s - X_0| > \varepsilon_n) \le 2^n K \varepsilon_n^{-4} \delta_n^2 = 2^n K n^4 2^{-2n} = K n^4 2^{-n}$.
However, $\sum K n^4 2^{-n}$ converges, so, by the First Borel-Cantelli Lemma, there exists a subset $\Omega_0$ of $\Omega$ with $P(\Omega_0) = 1$ such that, for $\omega \in \Omega_0$, there exists an $n_0(\omega)$
such that, for $n \ge n_0(\omega)$,
$\sup_{r \in D(n)} \sup_{s \le 2^{-n}} |X_{r+s}(\omega) - X_r(\omega)| \le n^{-1}$,
whence
$\sup\{|X_r(\omega) - X_s(\omega)| : r, s \in [0,1],\ |r - s| \le 2^{-n}\} \le 3n^{-1}$.
Thus, for $\omega \in \Omega_0$, $t \mapsto X_t(\omega)$ is (uniformly) continuous on $[0,1]$. It is obvious now that P-almost all paths of $X$ are continuous on $[0, \infty)$. □

It is obvious that we can modify $X$ so as to have all its paths continuous. (Take $X_\cdot(\omega) = 0$ for bad $\omega$.) By using exponential martingales instead of the $X^4$ submartingale, and a lot of ingenuity, one can refine this argument to obtain Lévy's Modulus-of-Continuity Theorem. See, for example, McKean [1].

(71.2) Canonical Brownian motion. Let $C := C(\mathbb{R}^+, \mathbb{R})$ be the space of all continuous functions from $[0, \infty)$ to $\mathbb{R}$. For $w \in C$ and $t \ge 0$, define $\pi_t(w) := w(t)$, and define the σ-algebras
$\mathcal{A}_t := \sigma(\pi_s : s \le t)$ $(t \ge 0)$, $\mathcal{A} := \sigma(\pi_s : s \ge 0)$.
If $X$ is either the process of the last section or the process $B$ of Section I.6 (in either case, with all paths made continuous) then Wiener measure $W$ on $(C, \mathcal{A})$ is given by the law of $X$:
$W = P \circ X^{-1}$,
when we regard $X$ as the measurable map $\omega \mapsto X_\cdot(\omega)$ from $(\Omega, \mathcal{F})$ to $(C, \mathcal{A})$.

(71.3) DEFINITION (canonical Brownian motion started at 0) and LEMMA. The setup $(C, \mathcal{A}, W, \{\mathcal{A}_t\}; \{\pi_t : t \ge 0\})$ is called canonical Brownian motion on $\mathbb{R}$ started at 0. The Wiener measure on $(C, \mathcal{A})$ is the unique probability measure on $(C, \mathcal{A})$ such that, under $W$, (i) $\pi_0 = 0$, a.s., and (ii) whenever $t, u \ge 0$, $\pi_{t+u} - \pi_t$ is independent of $\mathcal{A}_t$ and has the $N(0, u)$ distribution.

We know this. Now check out the following lemma.

(71.4) LEMMA. For canonical Brownian motion, for every $t \ge 0$ the process $\{\pi_{t+u} - \pi_t : u \ge 0\}$ has law $W$ and is independent of $\mathcal{A}_t$.
We know that such things are rather tiresome (and there are more such in the next section). However, you should be sure that you can prove such results as that just given. Of course, we more or less did this example during the proof of Theorem 68.2. There are good reasons for not insisting that $X_0(\omega) = 0$ for every $\omega$.

72. Brownian motion relative to a filtered space. Just when you thought that things were getting clean and tidy... Already during our discussion of Skorokhod embedding in Section I.7, we had to use non-canonical Brownian motion. We needed variables $\alpha$, $\beta$ etc. that are independent of our Brownian motion, and our canonical space will not carry these. So, more definitions...

(72.1) DEFINITION (Brownian motion relative to a filtered space). Let $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$ be a filtered space. By a Brownian motion (on $\mathbb{R}$, starting at 0) relative to this setup, we mean a process $X$ such that (i) $X(0) = 0$, a.s.; (ii) all paths of $X$ are continuous; (iii) whenever $t, u \ge 0$, $X_{t+u} - X_t$ is independent of $\mathcal{G}_t$ and has the $N(0, u)$ distribution.

(72.2) LEMMA. If $X$ is as in (72.1) then, for each $t \ge 0$, the process $\{X_{t+u} - X_t : u \ge 0\}$ has the Wiener measure as its law, and is independent of $\mathcal{G}_t$. Moreover, $X$ is a Brownian motion relative to the usual augmentation of the setup $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$.

We essentially know all this. We mention that Lévy's characterisation of Brownian motion (I.2.2) extends to this situation.

(72.3) LEMMA. Let $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$ be a filtered space. Let $X$ be a continuous process adapted to this filtration such that $X_0 = 0$ almost surely. Then $X$ is a Brownian motion (started at 0) relative to the setup $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$ if and only if both $\{X_t\}$ and $\{X_t^2 - t\}$ are martingales relative to this setup.

The proof is in Section IV.33. Here is the result we need to begin to play the Skorokhod-embedding game of Section I.7 properly. (We also require the Strong Markov Theorem.)

(72.4) THEOREM. Let $X^*$ be a Brownian motion relative to $(\Omega^*, \mathcal{G}^*, P^*, \{\mathcal{G}^*_t\})$.
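Let $(\Omega^{**}, \mathcal{G}^{**}, P^{**})$ be a probability triple carrying a family $\{\alpha^{**}_\lambda : \lambda \in \Lambda\}$ of random variables. Let $\Omega := \Omega^* \times \Omega^{**}$, with typical point $\omega = (\omega^*, \omega^{**})$. Define
$\mathcal{G}_t := \mathcal{G}^*_t \times \mathcal{G}^{**}$, $X_t(\omega) := X^*_t(\omega^*)$, $\alpha_\lambda(\omega) := \alpha^{**}_\lambda(\omega^{**})$,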
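and put $P := P^* \times P^{**}$ on $\mathcal{G} := \mathcal{G}^* \times \mathcal{G}^{**}$. Then (i) $\{X_t\}$ is a Brownian motion relative to $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t\})$; (ii) the family $\{\alpha_\lambda : \lambda \in \Lambda\}$ has the same P-law as $\{\alpha^{**}_\lambda : \lambda \in \Lambda\}$ has $P^{**}$-law; (iii) $\{X_t\}$ and $\{\alpha_\lambda : \lambda \in \Lambda\}$ are independent families.

Proof. For $t \ge 0$, let $\mathcal{A}_t := \sigma(X_{t+u} - X_t : u \ge 0)$. Then $\mathcal{A}_t = \mathcal{A}^*_t \times \{\emptyset, \Omega^{**}\}$, with an obvious notation for $\mathcal{A}^*_t$, because $\mathcal{A}_t$ has no information about $\omega^{**}$. For $A^* \in \mathcal{A}^*_t$, $G^* \in \mathcal{G}^*_t$ and $G^{**} \in \mathcal{G}^{**}$, we have
$P[(A^* \times \Omega^{**}) \cap (G^* \times G^{**})] = P[(A^* \cap G^*) \times G^{**}]$ (logic!)
$= P^*(A^* \cap G^*) P^{**}(G^{**})$ (definition of $P$)
$= P^*(A^*) P^*(G^*) P^{**}(G^{**})$ ($P^*$-independence of $\mathcal{A}^*_t$ and $\mathcal{G}^*_t$)
$= P(A^* \times \Omega^{**}) P(G^* \times G^{**})$ (definition of $P$).
That $\mathcal{A}_t$ and $\mathcal{G}_t$ are independent now follows from the π-system Lemma 22.2. The rest is easy. □

Stopping times

In continuous time, the theory of stopping times becomes a massive and very deep subject in its own right, many of the crucial parts of which we develop in Chapter VI. Here we concentrate on developing those results that we need in Chapters III and IV (the latter being on stochastic-integral theory for continuous processes).

73. Stopping time $T$, pre-$T$ σ-algebra $\mathcal{G}_T$, progressive process. Let $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$ be a filtered space. At the moment, we make no assumptions about usual conditions.

(73.1) DEFINITION (stopping time $T$, σ-algebra $\mathcal{G}_T$). A map $T : \Omega \to [0, \infty]$ is called a $\{\mathcal{G}_t\}$ stopping time if
(73.2) $\{T \le t\} := \{\omega : T(\omega) \le t\} \in \mathcal{G}_t$ for every $t \le \infty$.
We then define the pre-$T$ σ-algebra $\mathcal{G}_T$ via
(73.3) $\Lambda \in \mathcal{G}_T$ if and only if $\Lambda \cap \{T \le t\} \in \mathcal{G}_t$ for every $t \le \infty$.

(73.4) LEMMA. The following results hold. (i) If $S \le T$ then $\mathcal{G}_S \subseteq \mathcal{G}_T$. (ii) $\mathcal{G}_{S \wedge T} = \mathcal{G}_S \cap \mathcal{G}_T$.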
(iii) If $\Lambda \in \mathcal{G}_{S \vee T}$ then $\Lambda \cap \{S \le T\} \in \mathcal{G}_T$. (iv) $\mathcal{G}_{S \vee T} = \sigma(\mathcal{G}_S, \mathcal{G}_T)$. (Compare Lemma 58.3, carefully!)

(73.5) $\{\mathcal{G}_{t+}\}$ stopping times. Note that $\{\mathcal{G}_{t+}\}$ is also a filtration, that $T$ is a $\{\mathcal{G}_{t+}\}$ stopping time if and only if
(73.6) $\{T < t\} \in \mathcal{G}_t$ for every $t \le \infty$,
and that then, with $\mathcal{G}_{T+} := (\mathcal{G}_{\cdot+})_T$ in the obvious sense,
(73.7) $\Lambda \in \mathcal{G}_{T+}$ if and only if $\Lambda \cap \{T \le t\} \in \mathcal{G}_{t+}$, $\forall t$, if and only if $\Lambda \cap \{T < t\} \in \mathcal{G}_t$, $\forall t$.

(73.8) LEMMA. Let $(S_n : n \in \mathbb{N})$ be a sequence of $\{\mathcal{G}_t\}$ stopping times. (i) If $S_n \uparrow S$ then $S$ is a $\{\mathcal{G}_t\}$ stopping time. (ii) If $S_n \downarrow S$ then $S$ is a $\{\mathcal{G}_{t+}\}$ stopping time and $\mathcal{G}_{S+} = \bigcap_n \mathcal{G}_{S_n+}$.

The proof is left as Exercise E79.73.

Note. Of course, if $S_n \uparrow S$, it need not be true that $\mathcal{G}_S = \sigma(\mathcal{G}_{S_n} : n \in \mathbb{N})$.

A very important part is played in the subject by a sequence
Adapted ⊇ Progressive ⊇ Optional ⊇ Previsible
of ever-more-restrictive notions of 'non-anticipating'. We now meet the second of these.

(73.9) DEFINITION (progressive process). A process $X = \{X_t : t \ge 0\}$ (with values in an arbitrary measurable space $(E, \mathcal{E})$) is called $\{\mathcal{G}_t\}$-progressive if for every $t \ge 0$, the restriction of $(s, \omega) \mapsto X(s, \omega)$ to $[0,t] \times \Omega$ is $(\mathcal{B}[0,t] \times \mathcal{G}_t)$-measurable.

Note that a $\{\mathcal{G}_t\}$-progressive process is automatically $\{\mathcal{G}_t\}$-adapted: see Lemma 11.4. An important example of a progressive process (in fact, of an optional process!) is provided by the following lemma.

(73.10) LEMMA. A right-continuous adapted process with values in a metrisable space $(E, \mathcal{B}(E))$ is progressive.

Proof. Fix $t \ge 0$. For $n \in \mathbb{N}$, define, for $s < t$,
$Y^{(n)}(s, \omega) := X([k+1]2^{-n}t, \omega)$ if $k2^{-n}t \le s < [k+1]2^{-n}t$,
and put $Y^{(n)}(t, \omega) := X(t, \omega)$. Then $Y^{(n)}$ is trivially $(\mathcal{B}[0,t] \times \mathcal{G}_t)$-measurable, and $X = \lim Y^{(n)}$. □
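The approximation in the proof of Lemma 73.10 is easy to try out on a single path. What follows is only a minimal sketch, under our own assumptions: the function name `dyadic_right_approx` and the two test paths are hypothetical and do not come from the text, and we take $t > 0$. For one fixed $\omega$ it builds $s \mapsto Y^{(n)}(s, \omega)$ from the path $s \mapsto X(s, \omega)$ by sampling at the right-hand endpoint of each cell of width $2^{-n}t$.

```python
import math


def dyadic_right_approx(x, t, n):
    """Return s -> Y^(n)(s) on [0, t] for a single path x (t > 0 assumed).

    For s < t lying in [k 2^-n t, (k+1) 2^-n t), the value is
    x((k+1) 2^-n t); at s = t the value is x(t).
    """
    steps = 2 ** n

    def y(s):
        if not 0.0 <= s <= t:
            raise ValueError("s must lie in [0, t]")
        if s == t:
            return x(t)
        # Index of the cell containing s; the min() guards against
        # floating-point rounding when s is extremely close to t.
        k = min(math.floor(s * steps / t), steps - 1)
        # Sample the path at the right-hand endpoint of that cell.
        return x((k + 1) * t / steps)

    return y


# Hypothetical test paths (illustrations only, not from the text).
right_cts = math.floor  # jumps at the integers, right-continuous there
left_cts = math.ceil    # jumps at the integers, left-continuous there
```

With `y = dyadic_right_approx(right_cts, 3.0, 20)`, the values `y(0.0)`, `y(1.0)`, `y(2.5)` and `y(3.0)` agree with those of the path itself, as the lemma leads one to expect. For the left-continuous path the same recipe gives the wrong value at the jump time `s = 1.0` however large `n` is, which is one way to see why right-continuity is the hypothesis used in the proof.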
You did of course notice that no analogue of Part (i) of Lemma 58.3 was given at (73.4). Here is the appropriate analogue.

(73.11) LEMMA. If $X$ is $\{\mathcal{G}_t\}$-progressive and $T$ is a $\{\mathcal{G}_t\}$ stopping time then $X_T$ is $\mathcal{G}_T$-measurable.

Of course, $X_T(\omega)$ is defined to equal $X_{T(\omega)}(\omega)$ if $T(\omega) < \infty$. If, for example, $X$ is a supermartingale such that $X_\infty$ exists, we define $X_T(\omega) = X_\infty(\omega)$ when $T(\omega) = \infty$. In other cases, we would set $X_\infty(\omega)$ to be identically 0.

Proof. Fix $t \ge 0$. Define $\Omega_t := \{\omega : T(\omega) \le t\}$, and let $\tilde{\mathcal{G}}_t$ be the σ-algebra of subsets of $\Omega_t$ that are in $\mathcal{G}_t$. Define the map $\rho : \Omega_t \to [0,t] \times \Omega_t$ by $\rho(\omega) := (T(\omega), \omega)$, and define the map $X^{(t)} : [0,t] \times \Omega_t \to E$ by $X^{(t)}(s, \omega) := X(s, \omega)$. Then we have the pictures
$\Omega_t \xrightarrow{\ \rho\ } [0,t] \times \Omega_t \xrightarrow{\ X^{(t)}\ } E$, $\tilde{\mathcal{G}}_t \xleftarrow{\ \rho^{-1}\ } \mathcal{B}[0,t] \times \tilde{\mathcal{G}}_t \xleftarrow{\ (X^{(t)})^{-1}\ } \mathcal{E}$,
whence, for $\Gamma \in \mathcal{E}$,
$\{\omega : X_T(\omega) \in \Gamma\} \cap \{T \le t\} = (X^{(t)} \circ \rho)^{-1}(\Gamma) \in \tilde{\mathcal{G}}_t \subseteq \mathcal{G}_t$. □

74. First-entrance (debut) times; hitting times; first-approach times: the easy cases. We now consider some important examples of stopping times.

(74.1) DEFINITION (first-entrance (debut) times; hitting times). If $\{X_t\}$ is a process with values in a measurable space $(E, \mathcal{E})$, and if $\Gamma \in \mathcal{E}$, we define
$D_\Gamma(\omega) := \inf\{t \ge 0 : X_t(\omega) \in \Gamma\}$, $H_\Gamma(\omega) := \inf\{t > 0 : X_t(\omega) \in \Gamma\}$,
with the usual convention that the infimum of the empty set is $\infty$. We call $D_\Gamma$ the debut of $\Gamma$ for $X$ or first-entrance time of $X$ into $\Gamma$ and $H_\Gamma$ the hitting time of $\Gamma$ by $X$.

The definition of hitting time may seem rather bizarre. However, it is the one that matters in potential theory. Please do note carefully the hypotheses and conclusions of the following lemmas.

(74.2) LEMMA. The first-entrance time $D_F$ into a closed set $F$ for a continuous $\{\mathcal{G}_t\}$ adapted process with values in a metric space $(E, \rho)$ is a $\{\mathcal{G}_t\}$ stopping time. The hitting time $H_F$ of $F$ is a $\{\mathcal{G}_{t+}\}$ stopping time.

Proof. Since $x \mapsto \rho(x, F)$ is continuous, $\omega \mapsto \rho(X_q(\omega), F)$ is $\mathcal{G}_q$-measurable, for
$q \in \mathbb{Q}^+$. But now, for $t \ge 0$, we have by path continuity,
$D_F(\omega) \le t$ if and only if $\inf\{\rho(X_q(\omega), F) : q \in \mathbb{Q} \cap [0,t]\} = 0$,
and the stopping-time property of $D_F$ follows. Proof of the result for $H_F$ is left to you. □

(74.3) LEMMA. The first-entrance time $D_G$ into an open set $G$ for a right-continuous $\{\mathcal{G}_t\}$ adapted process with values in a topological space $(E, \mathcal{B}(E))$ is a $\{\mathcal{G}_{t+}\}$ stopping time.

Proof. Right continuity of paths implies that
$\{D_G < t\} = \bigcup_{q < t} \{X_q \in G\} \in \mathcal{G}_t$,
the union being over rational $q$, so that $D_G$ is a $\{\mathcal{G}_{t+}\}$ stopping time. □

Even if all paths of $X$ are continuous, $D_G$ need not be a $\{\mathcal{G}_t\}$ stopping time. (For example, suppose that $X$ is real-valued, that for some $\omega$, $X_t(\omega) \le 1$ for $t \le 1$, and that $X_1(\omega) = 1$. Let $G = (1, \infty)$. We cannot tell without looking slightly ahead of time 1 whether or not $D_G = 1$.)

(74.4) LEMMA and DEFINITION (first-approach time). Let $X$ be an adapted R-process with values in a metric space $(E, \rho)$. Let $K$ be a compact subset of $E$. Define the first-approach time $A_K$ for $K$ as
$A_K := \inf\{t \ge 0 : \text{either } X_t \text{ or } X_{t-} \text{ is in } K\}$.
(We define $X_{0-} := X_0$.) Then $A_K$ is a $\{\mathcal{G}_t\}$ stopping time.

Proof. We have $A_K \le t$ if and only if $\inf\{\rho(X_q(\omega), K) : q \in \mathbb{Q} \cap [0,t]\} = 0$ or $X_t \in K$. □

75. Why 'completion' in the usual conditions has to be introduced. Now for the result that explains why we need the 'completion' part of the usual conditions. You are very strongly recommended to examine its proof, so that you start to appreciate the need for the 'transfinite' methods that find their proper form in proofs of the Debut and Section Theorems.

(75.1) LEMMA. Again suppose that $X$ is an R-process with values in a separable metric space $(E, \rho)$ and that $K$ is a compact subset of $E$. Suppose that $X$ is adapted to a filtration $\{\mathcal{F}_t\}$ that satisfies the usual conditions. Then $D_K$ is a $\{\mathcal{F}_t\}$ stopping time.

Proof. Let
$S_1(\omega) := A_K := \inf\{t \ge 0 : \text{either } X_t \text{ or } X_{t-} \text{ is in } K\}$.
The most intuitive feeling for ordinal numbers will suffice for our discussion.
Recall that one can count through countable ordinals as follows:
$1, 2, 3, \ldots, \alpha, \alpha+1, \alpha+2, \ldots, 2\alpha, \ldots, 3\alpha, \ldots, 4\alpha, \ldots, \alpha^2 (= \alpha\alpha), \ldots, \alpha^3, \ldots, \alpha^4, \ldots, \alpha^\alpha, \alpha^\alpha + 1, \ldots$, etc., etc.,
where $\alpha$ is the first infinite ordinal. Each countable ordinal is either the successor $\eta + 1$ of some $\eta$ or a limit ordinal, which is the supremum of ordinals less than it. Define $S_\eta$ for countable ordinals $\eta$ as follows:
$S_{\eta+1} := \inf\{t \ge S_\eta : \text{either } X_t \text{ or } X_{t-} \text{ is in } K\}$,
$S_\eta := \uparrow\lim_{\gamma \uparrow\uparrow \eta} S_\gamma$ if $\eta$ is a limit ordinal.
The argument used to prove Lemma 74.4 shows that each $S_\eta$ is a $\{\mathcal{F}_t\}$ stopping time. The process $X$ may approach $K$ at $S_\beta$, and 'jump away at the last minute', in which case $S_{\beta+1} > S_\beta$; but it can only make countably many such jumps away. (It is easy to prove that if the sum of an arbitrary sequence of non-negative terms is finite then only countably many terms can be non-zero.) We have
$D_K(\omega) = S_{\delta(\omega)}(\omega)$,
where $\delta(\omega)$ is the first (necessarily countable) ordinal such that
$S_{\delta(\omega)}(\omega) = S_{\delta(\omega)+1}(\omega)$.
The problem is that the number of countable ordinals, the number of possible values of $\delta(\omega)$, is uncountable (in much the same way that the number of finite ordinals is infinite).

The probability measure $P$ now comes into play. Let
$c_\eta := E\exp(-S_\eta)$, $c := \inf_\eta c_\eta$,
the infimum being over all countable ordinals. For $n \in \mathbb{N}$, we can find $\eta(n)$ such that $c_{\eta(n)} < c + n^{-1}$. Let $\eta(\infty)$ be the countable ordinal $\lim \eta(n)$. Then $\eta(\infty)$ is independent of $\omega$, and, since $c_{\eta(\infty)} = c$, we have
$S_{\eta(\infty)} = \sup_\eta S_\eta$, a.s.(P).
It is now clear that $D_K$ is almost surely equal to the $\{\mathcal{F}_t\}$ stopping time $S_{\eta(\infty)}$. □

It is not at all easy to prove that $D_K$ need not be $\mathcal{G}_\infty$-measurable if $\{\mathcal{G}_t\}$ is the unaugmented natural filtration of $X$, but Dellacherie [2] succeeded in doing so.

(75.2) The effect of usual augmentation on stopping times. Proofs of Strong Markov Theorems rely heavily on the following result.

(75.3) THEOREM (Dynkin). Let $T$ be a $\{\mathcal{F}_t\}$ stopping time, where $\{\mathcal{F}_t\}$ is the usual augmentation of $\{\mathcal{G}_t\}$.
Then there exists a $\{\mathcal{G}_{t+}\}$ stopping time $S$ such that $P(S = T) = 1$. Furthermore, $\mathcal{F}_T$ is the smallest σ-algebra containing $\mathcal{G}_{S+}$ and all P-null sets in $\mathcal{F}$.

Proof. For $n \in \mathbb{N}$, define
(75.4) $T^{(n)}(\omega) := (k+1)2^{-n}$ if $k2^{-n} \le T(\omega) < (k+1)2^{-n}$,
with $T^{(n)}(\omega) := \infty$ if $T(\omega) = \infty$. Then
$\Lambda_{k,n} := \{\omega : T^{(n)}(\omega) = k2^{-n}\} \in \mathcal{F}_{k2^{-n}}$ $(k \in \mathbb{N} \cup \{\infty\})$,
so that we can find $\Lambda^*_{k,n}$ in $\mathcal{G}_{k2^{-n}+}$ such that $\Lambda^*_{k,n} = \Lambda_{k,n}$, a.s. (in that their indicator functions agree, a.s.). Define
$R^{(n)}(\omega) := k2^{-n}$ on $\Lambda^*_{k,n} \setminus \bigcup_{i<k} \Lambda^*_{i,n}$ $(k \in \mathbb{N})$, $R^{(n)}(\omega) := \infty$ on $\Omega \setminus \bigcup_k \Lambda^*_{k,n}$.
Then $R^{(n)}$ is a $\{\mathcal{G}_{t+}\}$ stopping time. Put $S^{(n)} := \inf_{m \le n} R^{(m)}$. Then $S^{(n)}$ is a $\{\mathcal{G}_{t+}\}$ stopping time. It is easily checked that $S^{(n)} = T^{(n)}$ a.s.(P), and we know from Lemma 73.8 that $S := \downarrow\lim S^{(n)}$ is a $\{\mathcal{G}_{t+}\}$ stopping time that clearly satisfies $S = T$, a.s.(P). Proof of the statement about $\mathcal{F}_T$ is now an easy exercise. □

76. Debut and Section Theorems. The Debut and Section Theorems are the fundamental 'measurability' results for stopping-time theory. The proper techniques for deriving these results involve Choquet capacitability theory and earlier methods of Sierpinski and others. We owe the theory principally to Sierpinski, Choquet, Ray, Doob, Hunt, Dynkin, Meyer and Dellacherie. Dellacherie and Meyer [1] is the definitive account. It is rather hard going, and it might be advisable to read (say) the appendix to Dynkin [1] first. Perhaps the best thing to do is to take the Debut and Section Theorems on trust for a while until you understand how such things are used.

(76.1) THEOREM (Debut Theorem). Let $X$ be a progressive process relative to a filtration $\{\mathcal{G}_t\}$, and with values in some topological space $E$ (with its Borel σ-algebra $\mathcal{B}(E)$). Then, for $B \in \mathcal{B}(E)$, the equation
$D_B(\omega) := \inf\{t \ge 0 : X_t(\omega) \in B\}$
defines an $\{\mathcal{F}_t\}$ stopping time, where $\{\mathcal{F}_t\}$ is the usual augmentation of $\{\mathcal{G}_t\}$.

For a proof, see T.IV.50 of Dellacherie and Meyer [1]. (Note that $I_B \circ X$ is a progressive process with values in $\{0, 1\}$.)

(76.2) THEOREM (Section Theorem: first draft). Let $X$ be a right-continuous process, $\{\mathcal{G}_t\}$-adapted, with values in some complete separable metric space $E$. Let $B$ be a Borel subset of $E$. Then, for $\varepsilon > 0$, there exists a $\{\mathcal{G}_t\}$ stopping time $T$ such that (i) $X_{T(\omega)}(\omega) \in B$ on $\{T < \infty\}$; (ii) $P(T < \infty) \ge P(D_B < \infty) - \varepsilon$.
If $\{\mathcal{G}_t\}$ satisfies the usual conditions then we often can, and do, take $T = D_K$, where $K$ is some 'large' compact subset of $B$.
The hypotheses of Theorem 76.2 are not the natural ones; hence the 'first draft' in the title. It is important to note that Theorem 76.2 would be false if $X$ were only assumed progressive: the correct measurability requirement on $X$ is that it be 'optional'. We look further into these matters in Chapter VI.

The significance of the Debut Theorem will already be clear to you. The Section Theorem may not convey much at the moment. Let us therefore give you now an illustration of its use. We shall take for granted the strong Markov property of a certain process; this is proved in Chapter III.

(76.3) Example. For $n \in \mathbb{N}$, let $X_n = \{X_n(t) : t \ge 0\}$ be a Markov chain with two states $\{1, -1\}$ and Q-matrix
$\begin{pmatrix} -q_n & q_n \\ q_n & -q_n \end{pmatrix}$.
Assume that the chains $X_n$ are independent of one another. Let $E$ be the multiplicative group $E = \{-1, 1\}^{\mathbb{N}}$ with the obvious product topology, so that $E$ is homeomorphic to the Cantor set. Let
$X(t) := (X_1(t), X_2(t), \ldots)$.
Then $X$ is a strong Markov process on $E$; more precisely, if $\{\mathcal{F}_t\}$ denotes the usual augmentation of the natural filtration of $X$, and if $T$ is an $\{\mathcal{F}_t\}$ stopping time, then the process $\{X(T+t)/X(T)\}$ is independent of $\mathcal{F}_T$ and has the same law as $\{X(t)/X(0)\}$. (You should consider the adjustments to be made when $T$ can be $\infty$.)

If the $q_n$ grow sufficiently rapidly, then one can prove by local-time techniques that
$P(\sup_{x \in E} D_{\{x\}} < \infty) = 1$,
so that, almost surely, by some finite random time, $X$ will have visited all points of $E$.

What we do now is investigate the 'robustness' of the Strong Law of Large Numbers (SLLN) when, as we now assume, $q_n = 1$ for all $n$. Then
$P[X_n(t) = X_n(0)] = \tfrac{1}{2}(1 + e^{-2t})$, $\forall (n, t)$,
and one can easily use the SLLN to prove that, for fixed $t$,
$P[\limsup n^{-1} \sum_{k \le n} X_k(t) = e^{-2t} \limsup n^{-1} \sum_{k \le n} X_k(0)] = 1$.
In particular, if $T$ is a stopping time then, for fixed $t$,
(76.4) $P[\limsup n^{-1} \sum_{k \le n} X_k(T+t) = e^{-2t} \limsup n^{-1} \sum_{k \le n} X_k(T) \mid T < \infty] = 1$.
Suppose henceforth that
$P[X_n(0) = \pm 1] = \tfrac{1}{2}$, $\forall n$.
Let
$\Gamma := \{x \in E : n^{-1} \sum_{k \le n} x_k \to 0 \text{ as } n \to \infty\}$.
Then, by the SLLN, for fixed $t$, $P[X(t) \in \Gamma] = 1$, so, by Fubini's Theorem,
(76.5) $P[X(t) \in \Gamma$ for almost all $t] = 1$.
However, if $T$ is a stopping time then, by (76.4),
$P[X(T+t) \notin \Gamma \mid X(T) \notin \Gamma] = 1$, $\forall t \ge 0$,
whence, a fortiori,
(76.6) $P[X(t) \in \Gamma$ for almost all $t$; $X(T) \notin \Gamma$, $T < \infty] = 0$.
On comparing (76.5) with (76.6), we see that, for any stopping time $T$,
$P[T < \infty; X(T) \notin \Gamma] = 0$.
By the Section Theorem,
$P[X(t) \in \Gamma$ for all $t \ge 0] = P(D_{\Gamma^c} = \infty) = 1$.
Thus, when $q_n = 1$ for all $n$, the SLLN is preserved at all times. We saw earlier that if the $q_n$ grow rapidly then there will be a random set of measure zero of times on which the SLLN fails in every possible way.

This kind of discussion takes us into the area of 'quasi' potential theory: see, for example, Fukushima [2] and Lyons [3].

77. Optional Sampling for R-supermartingales under the usual conditions. Let $(\Omega, \mathcal{F}, P, \{\mathcal{F}_t\})$ satisfy the usual conditions. Martingales, supermartingales, stopping times etc. are defined relative to $\{\mathcal{F}_t\}$.

(77.1) THEOREM (approximation from above). Let $X$ be an R-supermartingale, and let $T$ be a stopping time. For $n \in \mathbb{N}$, let
$D_n^+ := \{k2^{-n} : k \in \mathbb{Z}^+\}$
be the set of non-negative dyadic rationals of order (less than or equal to) $n$. Fix
$t \ge 0$. Define
(77.2) $T^{(n)}(\omega) := \inf\{q \in D_n^+ : q > T(\omega)\}$, $t^{(n)} := \inf\{q \in D_n^+ : q > t\}$.
Then $T^{(n)}$ is a stopping time relative to $\{\mathcal{F}_q : q \in D_n^+\}$, $T^{(n)} \downarrow T$ and $\mathcal{F}(T^{(n)}) \downarrow \mathcal{F}(T)$. Moreover,
(77.3) $X(T^{(n)} \wedge t^{(n)}) \to X(T \wedge t)$, a.s. and in $\mathcal{L}^1$.
In particular, $X(T \wedge t) \in \mathcal{L}^1$.

Proof. The fact that $\mathcal{F}(T^{(n)}) \downarrow \mathcal{F}(T)$ follows from Lemma 73.8. Applying the argument used at (59.2) to the finite discrete-parameter set $D_{n+1}^+ \cap [0, t+1]$, we obtain
$E[X(T^{(n)} \wedge t^{(n)}) \mid \mathcal{F}(T^{(n+1)} \wedge t^{(n+1)})] \le X(T^{(n+1)} \wedge t^{(n+1)})$, a.s.,
and
$E X(T^{(n)} \wedge t^{(n)}) \le E X(0)$.
The result (77.3) now follows from the Downward Theorem 51.1. (Our new labelling reverses the 'direction' of the $n$s!) □

(77.4) THEOREM (stopped supermartingales are supermartingales). Let $X$ be an R-supermartingale, and let $T$ be a stopping time. Then $X^T := \{X_{T \wedge t} : t \ge 0\}$ is an R-supermartingale. Of course, $X^T$ is an R-martingale if $X$ is an R-martingale.

Proof. Fix $0 \le s \le t$. Define $T^{(n)}$ and $t^{(n)}$ as in Theorem 77.1, and define $s^{(n)}$ analogously. By discrete-parameter theory, we have for $n, r \in \mathbb{N}$ with $r \ge n$,
$E[X(T^{(n)} \wedge t^{(n)}) \mid \mathcal{F}(s^{(r)})] \le X(T^{(n)} \wedge s^{(r)})$, a.s.
Letting $r \uparrow \infty$ and using the Downward Theorem for martingales, we have
$E[X(T^{(n)} \wedge t^{(n)}) \mid \mathcal{F}_s] \le X(T^{(n)} \wedge s)$, a.s.
But we may now use the result (77.3) on the left-hand side and right-continuity of paths on the right-hand side to obtain
$E[X(T \wedge t) \mid \mathcal{F}_s] \le X(T \wedge s)$, a.s.,
as required. □

(77.5) THEOREM (Doob's Optional-Sampling Theorem for UI R-supermartingales). Suppose that $X$ is an R-supermartingale, and that either $X$ is UI, so that $\{X_t : t \ge 0\}$ is a UI family, or $X$ is non-negative. Let $S$ and $T$ be stopping times with $S \le T$. Then $X_T \in \mathcal{L}^1$, and
$E(X_\infty | \mathcal{F}_T) \le X_T$, a.s., $E(X_T | \mathcal{F}_S) \le X_S$, a.s.,
with a.s. equality in both places if $X$ is a UI R-martingale. In particular, a UI
martingale $M$ is of class (D) in that the family $\{M_T : T \text{ a stopping time}\}$ is UI.

Remark. You have already been warned that the 'obvious' analogue of the Doob decomposition for supermartingales is false in continuous time. See Exercise E79.77c. We therefore cannot proceed as we did in discrete time by deriving the supermartingale result from the martingale result. We combine approximation with the discrete-parameter result for supermartingales.

Proof. From the discrete-parameter results (59.1) and (59.5) for $D_{n+1}^+$, we have
$E[X(T^{(n)}) \mid \mathcal{F}(T^{(n+1)})] \le X(T^{(n+1)})$, a.s.,
and $E X(T^{(n)}) \le E X(0)$. Hence, by the Downward Theorem, $X(T^{(n)}) \to X(T)$ in $\mathcal{L}^1$ as well as almost surely (by right-continuity). Next, we have
$E[X(\infty) \mid \mathcal{F}(T^{(n)})] \le X(T^{(n)})$, a.s.,
whence
$E[X(\infty) \mid \mathcal{F}(T)] \le X(T)$, a.s.
We also have, for $n, r \in \mathbb{N}$ with $r \ge n$,
$E[X(T^{(n)}) \mid \mathcal{F}(S^{(r)})] \le X(S^{(r)})$, a.s.,
whence, as previously, on letting $r \uparrow \infty$ and then $n \uparrow \infty$, we obtain the desired result. □

The martingale case of the above Optional-Sampling Theorem is used repeatedly in stochastic-integral theory. The following little result will also be found useful in that subject.

(77.6) THEOREM. Let $\{M_t : 0 \le t \le \infty\}$ be a progressive process such that for each (finite or infinite) stopping time $T$, we have $E(|M_T|) < \infty$ and $E(M_T) = 0$. Then $M$ is a UI martingale.

Proof. Let $t \ge 0$ and $F \in \mathcal{F}_t$. Define
$T(\omega) := t$ if $\omega \in F$, $T(\omega) := \infty$ if $\omega \in F^c := \Omega \setminus F$.
Then (check!) $T$ is a stopping time, and
$E(M_T) = E(M_t; F) + E(M_\infty; F^c) = 0$,
$E(M_\infty) = E(M_\infty; F) + E(M_\infty; F^c) = 0$,
whence $E(M_\infty; F) = E(M_t; F)$, and so, $M_t = E(M_\infty | \mathcal{F}_t)$, a.s. □

Do the exercise of extending the commutativity property in Theorem 59.6 to the following continuous-time result.

(77.7) THEOREM. Continue to assume the usual conditions. Let $S$ and $T$ be stopping times. Then
$E_S E_T = E_T E_S = E_{S \wedge T}$,
where $E_T$ denotes the conditional expectation map $E(\,\cdot \mid \mathcal{F}_T)$.

See the note following Theorem 59.6.

78. Two important results for Markov-process theory. We apologise for 'floating in and out of the usual conditions'. We shall see that there are reasons for so doing.

(78.1) THEOREM. The following results hold.
(i) Let $Z$ be an R-process on a triple $(\Omega, \mathcal{G}, P)$. Set
$U(\omega) := \inf\{t \ge 0 : Z_t(\omega) = 0 \text{ or } Z_{t-}(\omega) = 0\}$,
so that $U$ is the time of first approach by $Z$ to 0. Then $U$ is $\mathcal{G}$-measurable and
$G := \{\omega : Z_\cdot(\omega) = 0 \text{ on } [U, \infty)\} \in \mathcal{G}$.
(ii) Let $X$ be a non-negative R-supermartingale relative to a setup $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t : t \ge 0\})$. Set
$T(\omega) := \inf\{t \ge 0 : X_t(\omega) = 0 \text{ or } X_{t-}(\omega) = 0\}$.
Then $P(X = 0 \text{ on } [T, \infty)) = 1$.

Proof of (i). That $U$ is $\mathcal{G}$-measurable is obvious from the fact that we could take every $\mathcal{G}_t$ to be $\mathcal{G}$ in Lemma 74.4. Then, with $\varepsilon$, $q_1$ and $q_2$ rational, we have
$G = \bigcap_{\varepsilon > 0} \bigcup_{q_1 \ge 0} \bigcap_{q_2 > q_1 + \varepsilon} \{\omega : q_1 \le U(\omega) < q_1 + \varepsilon;\ Z_{q_2}(\omega) = 0\}$. □

Proof of (ii). We know that $X$ remains a supermartingale relative to the
usual augmentation $(\Omega, \mathcal{F}, P, \{\mathcal{F}_t : t \ge 0\})$ of $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t : t \ge 0\})$, and that $T$ is an $\{\mathcal{F}_t\}$ stopping time. For $n \in \mathbb{N}$, set $S_n := \inf\{t : X_t < n^{-1}\}$. Then $S_n$ is a stopping time with $S_n \le T$. For any rational $q \ge 0$, $S_n$ and $T + q$ are stopping times, and $S_n \le T + q$. Hence, for every $n$,
$E X(T + q) \le E X(S_n) \le n^{-1}$,
whence $P[X(T + q) = 0] = 1$. The rest is obvious. □

The next result has been very influential in shaping the development of Markov-process theory, particularly in connection with the problem of determining the 'correct' hypotheses for that theory.

(78.2) THEOREM (Meyer). Let $\{\mathcal{F}_t\}$ satisfy the usual conditions. For each $n \in \mathbb{N}$, let $X^n$ be an R-supermartingale relative to $\{\mathcal{F}_t\}$. Suppose that the sequence $(X^n : n \in \mathbb{N})$ is increasing in that
$X^n(t, \omega) \le X^{n+1}(t, \omega)$, $\forall (n, t, \omega)$.
Define
$X(t, \omega) := \sup_n X^n(t, \omega) \le \infty$.
Then P-almost all paths of the $(-\infty, \infty]$-valued process $X$ are R-functions.

The original proof in Meyer [1] remains the simplest. There is a nice, more sophisticated, proof in Getoor [1].

79. Exercises. Substantial hints for most of the exercises are given at the end of the section.

(E79.63) Let $B$ be a path-continuous BM($\mathbb{R}$). Show that
$X_t := e^{t/2} \cos B_t$ and $Y_t := e^{t/2} \sin B_t$
define martingales $X$ and $Y$ relative to the natural filtration of $B$. Thus $(X, Y)$ is a martingale in $\mathbb{R}^2$ and $X_t^2 + Y_t^2 = e^t$. This is behaviour very different from that of continuous martingales on $\mathbb{R}$, which, as we shall see, all resemble Brownian motion on $\mathbb{R}$.

(E79.64) Empirical distributions. Let $n \in \mathbb{N}$. Let $X_1, X_2, \ldots, X_n$ be independent random variables each with the uniform distribution on $[0,1]$. For $0 \le t < 1$, define
$G_n(t) := n^{-1} \#\{k \le n : X_k \le t\}$, $A_n(t) := n^{1/2}[G_n(t) - t]$.
Let $\mathcal{G}_n(t) := \sigma(G_n(s) : s \le t)$. Prove that
(i) $M_n(t) := \dfrac{A_n(t)}{1-t}$, $B_n(t) := A_n(t) + \displaystyle\int_0^t \frac{A_n(s)}{1-s}\,ds$, $V_n(t) := B_n(t)^2 - G_n(t)$,
define martingales $M_n$, $B_n$ and $V_n$ with parameter set $[0,1)$ relative to $\{\mathcal{G}_n(t) :$
$t \in [0,1)\}$. Which of these martingales are UI on $[0,1)$? Note that if we set $M_n(t) := 0$ and $\mathcal{G}_n(t) := \sigma(X_1, \ldots, X_n)$ for $1 \le t \le \infty$ then $M_n$ is a supermartingale $\mathcal{L}^1$-convergent at $\infty$, but is not UI.

Note. It is intuitively clear that, as $n \to \infty$, we should have, in some sense,
$G_n(t) \to t$, $B_n \to B$, $A_n \to A$ (on parameter set $[0,1)$),
where $B$ is a Brownian motion and $A$ is a Brownian bridge.

(E79.66a) Test your understanding of the regularisation results by proving the following 'L-analogues'. Part (iii) is particularly instructive. We shall say that $x : \mathbb{R}^+ \to \mathbb{R}$ is an L-function if
$x_t = \lim_{s \uparrow\uparrow t} x_s$ $(\forall t > 0)$, $x_{t+} := \lim_{u \downarrow\downarrow t} x_u$ exists in $\mathbb{R}$ $(\forall t \ge 0)$.
(i) If $y : \mathbb{Q}^+ \to \mathbb{R}$ is regularisable, then
$z_t := \lim_{q \uparrow\uparrow t} y_q$ if $t > 0$, $z_t := y_0$ if $t = 0$,
defines an L-function $z$.
(ii) If $\{Y_t : t \in \mathbb{R}^+\}$ is a supermartingale carried by $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t : t \in \mathbb{R}^+\})$ then the set $G$ of those $\omega$ for which $q \mapsto Y_q(\omega)$ is regularisable is in $\mathcal{G}$, and $P(G) = 1$. If we set
$Z_t := \lim_{q \uparrow\uparrow t} Y_q$ if $t > 0$, $Z_t := Y_0$ if $t = 0$,
then $Z$ is an L-process.
(iii) Define $\mathcal{G}_{t-} := \sigma(\mathcal{G}_s : s < t)$ for $t > 0$ and $\mathcal{G}_{0-} := \mathcal{G}_0$, and set
$\mathcal{J}_t := \sigma(\mathcal{G}_{t-}, \mathcal{N}(\mathcal{G}_\infty))$,
where $\mathcal{N}(\mathcal{G}_\infty)$ is the collection of P-null sets in $\mathcal{G}_\infty$. Then $Z$ is adapted to $\{\mathcal{J}_t\}$. Moreover, $Z$ is a supermartingale relative to $\{\mathcal{J}_t\}$.
(iv) $Z$ is a modification of $Y$ if the map $t \mapsto Y_t$ is left-continuous into $\mathcal{L}^1$.
(v) In contrast to the case of R-regularisation, $Z$ can be a modification of $Y$ even though $t \mapsto Y_t$ is not left-continuous into $\mathcal{L}^1$. Convince yourself of this by taking an example in which
$Y_t := B\left(\dfrac{t}{1-t} \wedge \tau\right)$ $(t < 1)$, $Y_t := -1$ $(t \ge 1)$,
where $B$ is a BM$_0$($\mathbb{R}$) and $\tau := \inf\{u : B_u = -1\}$.

(E79.66b) Let $\{\mathcal{G}_t\}$ be any filtration, and define $\mathcal{H}_t := \sigma(\mathcal{G}_s : s < t)$ for $t > 0$. Prove that, for every $t$, $\mathcal{H}_{t+} = \mathcal{G}_{t+}$.
(E79.67a) Let $(\Omega, \mathcal{F}, P)$ be a triple in which $\mathcal{F}$ is P-complete. Let $\mathcal{N}$ be the collection of all P-null sets in $\mathcal{F}$. Let $\mathcal{K}$ be a sub-σ-algebra of $\mathcal{F}$. Prove that
$L := \sigma(\mathcal{K}, \mathcal{N}) = \{U \in \mathcal{F} : U \triangle K \in \mathcal{N} \text{ for some } K \in \mathcal{K}\} =: R$.
Note that, since $\mathcal{K} \cup \mathcal{N} \subseteq R \subseteq L$, you need only prove that $R$ is a σ-algebra.

(E79.67b) Prove Lemma 67.4.

(E79.71a) Likelihood-ratio martingales. Let $(\Omega, \mathcal{G}, \{\mathcal{G}_t\})$ be a filtered space. Suppose that $P$ and $Q$ are probability measures on $(\Omega, \mathcal{G})$ such that $Q$ is absolutely continuous relative to $P$ on every $\mathcal{G}_t$ (though not necessarily on all of $\mathcal{G}$). Define $M_t$ to be a version of the Radon-Nikodym derivative:
$M_t := \dfrac{dQ}{dP}$ on $\mathcal{G}_t$, a.s.
Prove that $M$ is a $(P, \{\mathcal{G}_t\})$ martingale.
(i) Let $(\Omega, \mathcal{G}, \{\mathcal{G}_t\}) = (C, \mathcal{A}, \{\mathcal{A}_t\})$, the canonical filtered space for continuous processes on $\mathbb{R}$. Let $P$ be the Wiener law of BM$_0$, and let $Q$ be the law of Brownian motion with drift $c$ starting at 0. Prove that
$\dfrac{dQ}{dP}$ on $\mathcal{G}_t$ $= \exp(c\pi_t - \tfrac{1}{2}c^2 t)$.
(ii) Now let $\Omega$ be the space of all right-continuous functions $w : \mathbb{R}^+ \to \mathbb{Z}^+$ such that $w(0) = 0$ and that $w$ is constant except for a series of upward jumps of size 1. Let
$\pi_t(w) := w(t)$, $\mathcal{G} := \sigma(\pi_s : s \ge 0)$, $\mathcal{G}_t := \sigma(\pi_s : s \le t)$.
Let $\lambda$ and $\mu$ be positive constants. Let $P$ (respectively $Q$) be the law of the Poisson process of rate $\lambda$ (respectively $\mu$). Calculate $dQ/dP$ on $\mathcal{G}_t$.

(E79.71b) Projecting onto smaller filtrations. Let $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t\})$ be a filtered space carrying a supermartingale $Y$. Suppose that $\{\mathcal{K}_t\}$ is a smaller filtration in that $\mathcal{K}_t \subseteq \mathcal{G}_t$ for every $t$. Prove that
$Z_t := E(Y_t | \mathcal{K}_t)$, a.s.,
defines a supermartingale $Z$ relative to $\{\mathcal{K}_t\}$.

You have already seen at an intuitive level how this applies to give Azéma's martingale in Section 37. Here is another application. Let $X_t := B_t + ct$, where $B$ is a BM$_0$($\mathbb{R}$) and $c > 0$. Let $\{\mathcal{K}_t\}$ be the natural filtration of $B$ (equivalently of $X$). Let
$\sigma := \sup\{t : X_t = 0\}$,
with the convention that $\sup \emptyset = 0$. Prove that $Z_t := P(\sigma > t \mid \mathcal{K}_t)$ defines a supermartingale $Z$ relative to $\{\mathcal{K}_t\}$. Obtain the explicit formula
$Z_t = \exp(-2cX_t^+)$, a.s.
Deduce that
(*) $P(\sigma \in dt) = c(2\pi t)^{-1/2} \exp(-\tfrac{1}{2}c^2 t)\,dt$.
Give an immediate direct proof of (*): 'And the last shall be first'. (Note. Excursion theory allows one to write down formulae analogous to (*) in very general situations.)

(E79.71c) This exercise gives a simple proof of Lévy's descriptions of local time in (I.14.7) and (37.9). Suppose that $\Lambda$ is a Poisson measure on $[0, \infty) \times (0, \infty)$ with intensity measure Leb $\times\, \nu$, where $\nu \ne 0$, and where
$c(r) := \lim_{\varepsilon \downarrow 0} \dfrac{\nu(\varepsilon, \infty)}{\nu(r\varepsilon, \infty)}$ exists for $\tfrac{1}{2} \le r \le 1$
and defines a strictly increasing continuous function $c$ on $[\tfrac{1}{2}, 1]$. Set
$M(t, \varepsilon) := \nu(\varepsilon, \infty)^{-1} \Lambda([0,t] \times (\varepsilon, \infty)) - t$.
Show that $M(\cdot, \varepsilon)$ is a martingale, and that, for $\tfrac{1}{2} \le r < 1$, it is almost surely true that $M(\cdot, r^n) \to 0$ as $n \to \infty$, uniformly on compact $t$-intervals. Deduce that almost surely, $M(t, \varepsilon) \to 0$ uniformly on compact $t$-intervals.

(E79.71d) Read Section IV.2 of Volume 2, and make the sketched proof of Lévy's quadratic-variation result in that section rigorous. (Volume 2 does give rigorous proofs of much more general results.)

(E79.73) Prove Lemma 73.8.

(E79.77a) Let $M$ be a continuous martingale such that
$P(\sup_t M_t = \infty, \inf_t M_t = -\infty) = 1$.
Define
$T(0) := 0$, $T(n) := \inf\{t > T(n-1) : |M_t - M_{T(n-1)}| = 1\}$.
Prove that the law of $\{M_{T(n)} : n \in \mathbb{Z}^+\}$ is the (Markovian!) law of Simple Random Walk.

(E79.77b) Let $M$ be a continuous martingale such that
$P(\sup_t M_t > 1) = P(\inf_t M_t < -1) = 1$.
Define $U_t := \sup_{s \le t} M_s$, $L_t := \inf_{s \le t} M_s$ and
$T := \inf\{t : U_t - L_t = 1\}$.
What is the distribution of $M_T$? (This question arose, and was solved, in work by LCGR on financial problems. We now have many solutions.)

(E79.77c) The Helms-Johnson example. This is a first look at one of the most celebrated counterexamples in the subject, one to which we return in Sections III.31, IV.14 and VI.33.
Suppose that $X$ is a UI supermartingale, and that
(i) $X = M - A$,
where $M$ is a martingale and $A$ a process with non-decreasing paths. Show that $E(A_\infty) < \infty$, and deduce that $M$ is UI. Prove that $X$ is of class (D) in that the family $\{X_T : T \text{ a stopping time}\}$ is UI.
Let $B$ be a BM$_0(\mathbb{R}^3)$, and define the Helms-Johnson process
(ii) $X_t := 1/|B_{t+1}|$.
Prove that $X$ is a UI supermartingale relative to its own natural filtration, but that $X$ is not of class (D), so that $X$ does not have a 'Doob decomposition' as at (i).
Hints. Show that
(iii) $E(X_t^2) = 1/(t+1)$.
If $T_n := \inf\{t : X_t \ge n\}$ then, by Corollary I.18.3 (which we assume),
(iv) $P(T_n < \infty \mid X_0) = \min(n^{-1}X_0, 1)$.

(E79.77d) More on likelihood-ratio martingales. Let $(\Omega, \mathcal{F}, P; \{\mathcal{F}_t\})$ satisfy the usual conditions. Martingales, a.s., etc. are defined relative to this setup. Let $Q$ be a probability measure on $(\Omega, \mathcal{F})$ such that $Q \ll P$ on every $\mathcal{F}_t$. Choose a right-continuous martingale $M$ with $dQ/dP = M_t$ on every $\mathcal{F}_t$. Show that if $T$ is an almost surely finite stopping time such that $\{M_{T \wedge t} : t \ge 0\}$ is UI then $Q \ll P$ on $\mathcal{F}_T$ and
$$\frac{dQ}{dP} = M_T, \quad \text{a.s. on } \mathcal{F}_T.$$
Prove Reuter's Theorem that if $X$ is a Brownian motion on $\mathbb{R}^n$ with constant drift vector $c$ and started at 0, and if $T := \inf\{t : |X_t| = 1\}$ is the hitting time of the unit sphere, then $T$ and $X_T$ are independent variables. Why does this not conflict with intuition?

Hints for selected exercises

(H79.63) We know that $\exp(i\theta B_t + \tfrac{1}{2}\theta^2 t)$ is a martingale with values in $\mathbb{C}$. Now take $\theta = 1$.

(H79.64) For $0 \le t \le u \le 1$,
$$E(G_n(u) \mid \mathcal{G}_t) = G_n(t) + [1 - G_n(t)]\,\frac{u - t}{1 - t}.$$
11.79 CONTINUOUS-PARAMETER SUPERMARTINGALES 197 This rearranges to say that Mn is a martingale. The real reason that Bn is a martingale is the SDE dBn = {l-t)dMn. But we can prove the property directly: for 0 < t < и ^ 1, 1-ί Jo 1-5 J, 1-i (Think briefly on how to justify this step.) There is an SDE reason for the martingale property of Vn too, but you can obtain this directly using the variance of a binomial distribution. Of course, Mn(l -) = 0, a.s., so M„ cannot be UI on [0,1): if it were, we would have M„(i) = E(M„(1 - )\9J = 0, a.s. We know that E{B„(i)2} = EG„(i) = t, so that Bn is 5£2 bounded and therefore UI. A bit more calculation shows that Vn is also $£2 bounded. {H79.66a) (Hint for Part (iii), which is the tricky bit.) Fix и > 0. For 0 ^ r < u, with red}, write Vr:=Yr-E(Yu\<Zr). Check that К is a non-negative supermartingale with parameter set Qn[0,w). Hence, for 0 < q < t < r < u, with qeQ, we have E(Ku_|<g = E( liminfKr|^ WliminfE(Kr|^)< Vr a.s. V rtT« / But, by the Upward Theorem, as г||и, Yr=Vr + E(Yu\9r)->Yu- = Vu-+E(Yu\9n-)9 a.s., whence Е(Уа_ \Vq) = E(KU_ |^) + Е(Уа|Зд< Vq + Е(У„|зд = yr Now let q]]t to get the required result that Ζ is a {/t) supermartingale. {H79.67b) For и > ί, &u must extend а(^и,Ж), whence However, {jfj satisfies the usual conditions, so that Jft = ^"t,Vi. Suppose now that HeJft and that u(n)||i. By Exercise 79.67a, we can find Gm in ^а(и) such that Nu(n):= HAGuin)eJT. Now set G:= f) Gu(n) and JV:= У Nm. Then Ge^t + and ЛТеЖ, and, since you can easily check that HAG^N, the proof is finished. (H79.71a) For every i, let M, = </Q/<iP on 9„ a.s. Then, for s < i, we have, for
$G_s \in \mathcal{G}_s$,
$$E(M_t; G_s) = Q(G_s) = E(M_s; G_s),$$
so that $M$ is a martingale.
(i) If $p(t; x, y)$ is the transition density function for Brownian motion, and $q_c(t; x, y)$ that for Brownian motion with drift $c$, then
$$q_c(t; x, y) = \exp[c(y - x) - \tfrac{1}{2}c^2 t]\, p(t; x, y),$$
and, for $n \in \mathbb{N}$, $0 = t_0 < t_1 < \cdots < t_n = t$, and $x_0, x_1, \ldots, x_n \in \mathbb{R}$ with $x_0 = 0$,
$$Q\Big(\bigcap_{i=1}^n \{\pi_{t_i} \in dx_i\}\Big) = \prod_{i=1}^n q_c(t_i - t_{i-1}; x_{i-1}, x_i)\,dx_i = \cdots.$$
(ii) For $j, n \in \mathbb{Z}^+$, we have
$$p_\mu(t; j, j+n) = \exp(-\mu t)\frac{(\mu t)^n}{n!} = \exp[(\lambda - \mu)t]\Big(\frac{\mu}{\lambda}\Big)^n p_\lambda(t; j, j+n);$$
and the answer is $\exp[(\lambda - \mu)t](\mu/\lambda)^{\pi_t}$.

(H79.71b) The 'theory' part is just the Tower Property of conditional expectations: for $0 \le s \le t$,
$$E(Z_t \mid \mathcal{H}_s) = E(Y_t \mid \mathcal{H}_t \mid \mathcal{H}_s) = E(Y_t \mid \mathcal{H}_s) = E(Y_t \mid \mathcal{G}_s \mid \mathcal{H}_s) \le E(Y_s \mid \mathcal{H}_s) = Z_s, \quad \text{a.s.}$$
As regards the example, $Y_t := I_{\{\sigma > t\}}$ is decreasing, and is obviously a supermartingale relative to the filtration with $\mathcal{G}_t = \mathcal{H}_\infty$ for every $t$. Hence $Z$ is a supermartingale. That $Z_t = \exp(-2cX_t^+)$ you know from Section I.9. The formula (*) now follows from
$$P(\sigma > t) = E\,E(Y_t \mid \mathcal{H}_t) = E(Z_t)$$
and some integration. Now for the immediate proof. Let $\tilde{B}$ be the BM$_0$ defined as $\tilde{B}(t) = tB(1/t)$. Then
$$\sigma = \sup\{t : B(t) + ct = 0\} = \sup\{t : t\tilde{B}(1/t) + ct = 0\} = \sup\{t : \tilde{B}(1/t) = -c\} = 1/\inf\{t : \tilde{B}(t) = -c\}.$$

(H79.71c) Since $E\{M(t, \varepsilon)^2\} = \nu(\varepsilon, \infty)^{-1} t$, the fact that $M(\cdot, r^n) \to 0$ uniformly on compact $t$-intervals follows from the Submartingale Inequality and the Borel-Cantelli Lemma. Interpolating for $\varepsilon$ between $r^{n+1}$ and $r^n$ for $r$ very close to 1, by using monotonicity, is left to you.

(H79.73) If $S_n \uparrow S$ then $\{S \le t\} = \bigcap_n \{S_n \le t\}$. If $S_n \downarrow S$ then $\{S < t\} = \bigcup_n \{S_n < t\}$.
11.79 CONTINUOUS-PARAMETER SUPERMARTINGALES 199 If S„[S and Gef]^Sn+ then, for every η and every i, Gn{Sn<t}e<Z„ whence Gn{S <t} = (J(Gn{S„ < t})e<Zt. (H.79.77b) Simon Harris found this direct solution. Suppose that 0 < χ < у < 1. If Мге[х, у] then (i) Μ hits j — 1 before it hits y; and (ii) after hitting у — 1, it hits χ before it hits χ — 1. Hence, by the logic of the previous question, P(MTelx,y])*iy(y-x). On the other hand, if the following three conditions hold: (i) Μ hits у — 1 before it hits x; (ii) after hitting у — 1, Μ hits 0 before it hits χ — 1; (iii) after hitting у — 1 and then 0, Μ hits у before у — 1; then Мге[х,у]. Hence Ρ(ΛίΓε|>,?])>— ^(1-?). 1 4-х — у 1 —χ The rest is easy. {H.79.77c) If X is a UI supermartingale satisfying (i), then X is bounded in JS? \ and, since Е(Л) = Е(Л0) + Е(1г-10), we must have E^^) < oo. But now X and A are UI, whence Μ is also UI, and, by Theorem 77.5, Μ is of class (D). Since A is trivially of class (D), it follows that X is of class (D). In the example, E{Xf) =| r2(27ri)-3/2exp( —- ]4nr2dr = —, so that X is bounded in 5£2, and so is UI. Clearly, Хю = 0, a.s., so that, if X is of class (D) then, for any sequence {Tn) of stopping times with Гп| oo, we shall have X(Tn)-*0 in $£*. But, for the given sequence of stopping times, we have lim inf EX(Tn) ^ lim inf nP(Tn < oo) ^ E(X0) > 0. (Jti.79.77d) Let Fe^T. Then Fn {Τ < ί}6^ΓΛ t = ^η/Γ,
and so
$$Q(F \cap \{T \le t\}) = E(M_t; F \cap \{T \le t\}) = E(M_{T \wedge t}; F \cap \{T \le t\}).$$
Now let $t \uparrow \infty$ to get
$$Q(F \cap \{T < \infty\}) = E(M_T; F \cap \{T < \infty\}).$$
For the proof of Reuter's Theorem see IV.39.6 in Volume 2.

6. PROBABILITY MEASURES ON LUSIN SPACES

This Part consists of two main themes: 'weak' convergence of measures, and existence of regular conditional probabilities. These themes are linked via their use of inner regularity of measures relative to compact sets. In the Daniell-Kolmogorov Theorem 31.1, we were able to utilize this inner regularity by making the assumption that the state-space $E$ was Lusin. Now we are going to make this assumption on the sample space $\Omega$. The product sample space from the DK Theorem is useless for this purpose, but we have now learnt that we can work with the space of continuous paths or that of R-paths, both of which spaces are Lusin (and even Polish).

We present, fully and in the simplest way we could devise, all the 'theory' of 'weak convergence', and how it applies to the space of continuous paths, the only case that we shall need. See 'Pointers to the main results' below. For the way in which the theory applies to the space of R-functions with the Skorokhod topology, you will have to see the excellent books by Billingsley [2], Parthasarathy [1] and Ethier and Kurtz [1].

Let $X_1, \ldots, X_n$ be independent identically distributed real-valued random variables each with mean 0 and variance 1. Let $\mu_n$ be the law of
$$Y_n := n^{-1/2}(X_1 + \cdots + X_n).$$
The Central Limit Theorem tells us that $\mu_n$ converges 'weakly' to the law $\mu$ of a standard normal $N(0, 1)$ random variable. If, for example, each $X_i$ is $+1$ or $-1$ with probability $\tfrac{1}{2}$ each, and if $A$ denotes the set of algebraic numbers, then
$$\mu_n(A) = P(Y_n \in A) = 1 \not\to 0 = \mu(A).$$
For which Borel sets $B$ do we have $\mu_n(B) \to \mu(B)$? How do we formulate 'weak convergence' generally so that it will apply, for example, to the Donsker Invariance Principle of Section I.8 and allow us to derive interesting consequences (look ahead to (84.7)) from it?
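The coin-tossing example can be made concrete. The Python sketch below (function names are ours) computes the exact law of the normalised sum for $\pm 1$ steps, assuming the normalisation $n^{-1/2}(X_1 + \cdots + X_n)$. Every atom $s/\sqrt{n}$ is algebraic, so the set of algebraic numbers always carries mass 1, yet the distribution functions approach that of $N(0,1)$.

```python
import math
from fractions import Fraction

def law_of_sum(n: int) -> dict:
    """Exact law of X_1 + ... + X_n for independent X_i = +1 or -1 with probability 1/2.

    Returns {s: P(sum = s)}; the corresponding atom of the normalised sum is s / sqrt(n).
    """
    return {2 * k - n: Fraction(math.comb(n, k), 2 ** n) for k in range(n + 1)}

def cdf_normalised(n: int, x: float) -> float:
    """P(n^(-1/2) (X_1 + ... + X_n) <= x), computed from the exact law."""
    bound = x * math.sqrt(n)
    return float(sum(p for s, p in law_of_sum(n).items() if s <= bound))

def Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))
```

Comparing `cdf_normalised(n, x)` with `Phi(x)` for increasing `n` shows convergence of the distribution functions, even though the atoms themselves never leave the (null for $N(0,1)$) set of algebraic numbers.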
These are among the questions that we shall address. We use weak convergence to obtain existence of solutions to martingale problems in Section V.23 of Volume 2. Regular conditional probabilities provide an important language for probability theory and an important technique for martingale problems etc. See the
PROBABILITY MEASURES ON LUSIN SPACES 201 proof that solutions of martingale problems are Markovian in Section V.21 in Volume 2. We begin by recalling the Stone-Weierstrass Theorem and the Riesz Representation Theorem, results that we shall also need in Markov-process theory. For the case where J is a compact metrisable space, we shall put in explicit form (also useful later) the fact that the set Pr(J) of probability measures on (J,^?(J)) is again a compact metrisable space. Results for the case in which we are most interested, that of probability measures on a Polish space S (a space homeomorphic to a complete separable metric space), will be deduced by embedding S as a Borel subset of a compact metrisable space K. Therefore the only property of S that we need is that S is a Lusin space (a space homeomorphic to a Borel subset of a compact metrisable space!). Since Lusin spaces occur frequently, we use the 'Lusin' hypothesis—but the main reason for doing so is that it makes things easier! The vexed question of terminology. There have always been conflicts of terminology in this area. Probabilists have always used 'weak convergence' for something close to functional analysts' 'weak* convergence' (and very different from functional analysts' 'weak convergence'). Since our treatment depends crucially on the fact that, in functional analysts' language, 'the unit ball in the dual space is weak* compact', and since we rely on functional analysis more in this part of the chapter than elsewhere in the book, we use terminology consistent with that of functional analysis while 'doing the work'. Having made everything clear (we hope), we shall then, from a clearly signalled point on, regress to probabilists' terminology. Note on functional analysis. Through no fault of their own, many research students these days are less familiar with functional analysis than students were when the first edition of this book was published. 
We therefore include a more systematic linking of our results with those in the functional-analysis texts. We emphasize that one does not know a priori that, for a Polish space S, the Cb(S) topology on Pr(S) is metrisable. This is why we are forced to use nets rather than sequences. Pointers to the main results. The 'weak', or Cb(S), topology on the set Pr(S) of probability measures on a Lusin space S is defined in (83.1). This topology is shown to be metrisable in (83.7). Prohorov's sufficient condition (also necessary when S is Polish) for conditional compactness of subsets of Pr(S) is given in (83.10). The Continuous-Mapping Principle is given in (84.2); and Skorokhod's Representation Theorem (which gives a clear picture of the Continuous- Mapping Theorem and of much else) is found in Section 86. It is shown in Section 82 that, with the topology of uniform convergence on compacts, the
space $W = C([0, \infty); \mathbb{R})$ is Polish, with the usual $\sigma$-algebra of $\sigma$-cylinders as its Borel $\sigma$-algebra. Prohorov's theorem on 'weak' compactness is translated into practicable form for $\mathrm{Pr}(W)$ in Section 85. Finally, the relation between 'weak' convergence and convergence of finite-dimensional distributions for $W$ is explained in Section 87.

'Weak convergence'

80. $C(J)$ and $\mathrm{Pr}(J)$ when $J$ is compact Hausdorff. Let $J$ be a compact Hausdorff space. This is a standard setting in functional analysis. We recall various fundamental results, for which D&S (that is, Dunford and Schwartz [1]) remains a superb reference. (Nowadays, you have to read 'sphere' there as 'ball'.)

Let $C(J)$ denote the Banach algebra of continuous (and necessarily bounded) real-valued functions on $J$ with the usual supremum norm.

(80.1) THEOREM (Stone-Weierstrass Theorem, D&S IV.6.16). Let $A$ be a subalgebra of $C(J)$ that contains constant functions and separates points of $J$: for distinct $x, y \in J$, there exists an element $f$ in $A$ such that $f(x) \ne f(y)$. Then $A$ is dense in $C(J)$.

(80.2) DEFINITION ($\mathrm{Pr}(J)$, inner regularity for a single measure). Let $\mathrm{Pr}(J)$ denote the set of probability measures on $(J, \mathcal{B}(J))$. An element $\mu$ of $\mathrm{Pr}(J)$ is called inner regular if, for every $B \in \mathcal{B}(J)$,
$$\mu(B) = \sup\{\mu(K) : K \text{ compact},\ K \subseteq B\}.$$

(80.3) THEOREM (Riesz Representation Theorem, D&S IV.6.3). Let $\phi$ be a linear increasing functional $\phi : C(J) \to \mathbb{R}$ such that $\phi(1) = 1$. Then there exists a unique inner regular element $\mu$ of $\mathrm{Pr}(J)$ such that
$$\phi(f) = \mu(f) = \int f\,d\mu.$$
Of course, 1 denotes the constant function equal to 1 on $J$; and '$\phi$ is increasing' means that $f \le g$ on $J$ implies that $\phi(f) \le \phi(g)$. The hypotheses on $\phi$ force $\phi$ to be a bounded linear functional of norm 1: $\phi \in C(J)^*$.

(80.4) Discussion. What is going on here? Why the mysterious inner regularity? The answer is that the smallest $\sigma$-algebra on $J$ with respect to which all continuous functions are measurable, the so-called Baire $\sigma$-algebra on $J$, may well be smaller than $\mathcal{B}(J)$.
A probability measure on the Baire σ-algebra has a unique extension to an inner regular element of Pr(J). Does this matter to probabilists? Yes, it does. These ideas can be used in a very illuminating alternative approach to the proof of the Daniell-Kolmogorov Theorem and
to a study of its limitations and how to deal with them. Nelson [1] deserves much of the credit for this. See also Meyer [1] and Tjur [1].

(80.5) The $C(J)$, or weak*, topology on $C(J)^*$. The $C(J)$ topology on the space $C(J)^*$ of all bounded linear functionals on the Banach space $C(J)$ is obtained by making sets of the form
(80.6) $\{\phi \in C(J)^* : |\phi(f_i) - \phi_0(f_i)| < \varepsilon_i,\ 1 \le i \le n\}$
a basis for neighbourhoods of the point $\phi_0$ in $C(J)^*$. Note that this topology is automatically Hausdorff.

We can study general topology in much the same way as we studied elementary metric topology provided that we replace convergence of sequences by convergence of nets (generalised sequences). See D&S I.7. A directed set $D$ is a partially ordered set for which every finite subset has an upper bound in $D$. A net is a family $(x_\alpha : \alpha \in D)$ parametrised by some directed set $D$. If $(x_\alpha : \alpha \in D)$ is a net of points in a topological space $E$, we say that $x_\alpha \to x$ if, for every open set $G$ containing $x$, there exists $\alpha_0$ in $D$ such that $x_\alpha \in G$ whenever $\alpha \ge \alpha_0$.

The $C(J)$ topology of $C(J)^*$ may therefore be described by saying that a net $(\phi_\alpha)$ of elements of $C(J)^*$ converges to the element $\phi$ of $C(J)^*$ if and only if
(80.7) $\phi_\alpha(f) \to \phi(f)$ for all $f$ in $C(J)$.
For $f \in C(J)$, the statement that $\phi_\alpha(f) \to \phi(f)$ means that, given $\varepsilon > 0$, there exists an element $\alpha_0$ in $D$ such that $|\phi_\alpha(f) - \phi(f)| < \varepsilon$ whenever $\alpha_0 \le \alpha$.

(80.8) THEOREM (Alaoglu, D&S V.4.2). The unit ball $\{\phi \in C(J)^* : \|\phi\| \le 1\}$ in $C(J)^*$ is compact in the $C(J)$ topology.

In effect, this is just Tychonov's Theorem.

We can identify the set of inner regular elements of $\mathrm{Pr}(J)$ with the closed set of elements $\phi \in C(J)^*$ such that $\phi$ is increasing and maps 1 to 1. Hence the inner regular elements of $\mathrm{Pr}(J)$ form a compact set in the $C(J)$ topology.

81. $C(J)$ and $\mathrm{Pr}(J)$ when $J$ is compact metrisable. In this section, we assume that $J$ is a compact metrisable space. Things become much simpler. It is difficult to overstate the importance of the following result.
(81.1) THEOREM. Every element of Pr(J) is inner regular.
Note. The Baire and Borel σ-algebras on J agree.
Proof. Let $\mu \in \mathrm{Pr}(J)$. Let $\mathcal{A}$ be the class of $B$ in $\mathcal{B}(J)$ such that, for every $\varepsilon > 0$, there exist a compact set $K$ with $K \subseteq B$ and an open set $G$ with $G \supseteq B$ such that
(81.2) $\mu(B \setminus K) < \varepsilon, \quad \mu(G \setminus B) < \varepsilon.$
It is immediate that the complement of a set in $\mathcal{A}$ is again in $\mathcal{A}$.

Suppose that $B_1$ and $B_2$ are in $\mathcal{A}$, and let $B := B_1 \cap B_2$. Let $\varepsilon > 0$. For $i = 1, 2$, choose a compact $K_i$ and an open $G_i$ with
$$K_i \subseteq B_i \subseteq G_i, \quad \mu(B_i \setminus K_i) < \tfrac{1}{2}\varepsilon, \quad \mu(G_i \setminus B_i) < \tfrac{1}{2}\varepsilon.$$
Then if $K := K_1 \cap K_2$ and $G := G_1 \cap G_2$, we have (81.2). By this stage we know that $\mathcal{A}$ is an algebra.

Now let $(B_n)$ be an increasing sequence of elements of $\mathcal{A}$ with union $B$. For each $n$, choose a compact $K_n \subseteq B_n$ and an open $G_n \supseteq B_n$ with
$$\mu(B_n \setminus K_n) < \varepsilon 2^{-n-1}, \quad \mu(G_n \setminus B_n) < \varepsilon 2^{-n}.$$
Then $G := \bigcup G_n$ is open and $\mu(G \setminus B) < \varepsilon$. The set $L := \bigcup K_n$ satisfies $\mu(B \setminus L) < \tfrac{1}{2}\varepsilon$, but $L$ need not be compact. However, for some $N$, $K := \bigcup_{n \le N} K_n$ is compact with $\mu(K) > \mu(L) - \tfrac{1}{2}\varepsilon$; and we have proved (81.2). We now know that $\mathcal{A}$ is a $\sigma$-algebra.

If $K$ is a closed (hence compact) subset of $J$ then $K$ is the intersection of the sequence $(G_n)$ of open sets:
$$K = \bigcap G_n, \quad G_n := \{x \in J : \rho(x, K) < n^{-1}\},$$
where $\rho$ is the metric on $J$. Hence every closed set is in $\mathcal{A}$, and $\mathcal{A} = \mathcal{B}(J)$. □

(81.3) THEOREM. $C(J)$ is separable, and $\mathrm{Pr}(J)$ in its $C(J)$ topology is compact metrizable.

Proof. Let $(x_n)$ be a countable dense subset of $J$ (why is there such a set?), and define
$$h_k(x) := \rho(x, x_k).$$
Note that the functions $h_k$ separate points of $J$. Let $A$ be the collection of functions in $C(J)$ that are finite sums of the form
$$q + \sum q(k_1, \ldots, k_r)\, h_{k_1} h_{k_2} \cdots h_{k_r},$$
where $q$ and the $q(\cdot, \cdot, \ldots)$ are rational constants. Then the closure of $A$ is an algebra containing all constant functions and separating points of $J$. By the Stone-Weierstrass Theorem, $A$ is dense in $C(J)$. Since $A$ is countable, $C(J)$ is a separable metric space.

Let $(f_n)$ be a countable dense subset of $C(J)$. Consider the map
(81.4) $\mathrm{Pr}(J) \ni \mu \mapsto (\mu(f_1), \mu(f_2), \ldots) \in V := \prod_n [-\|f_n\|, \|f_n\|].$
This map is clearly one-one (why?). Moreover, for a net $(\mu_\alpha)$ in $\mathrm{Pr}(J)$,
$$\mu_\alpha(f) \to \mu(f),\ \forall f \in C(J), \quad \text{if and only if} \quad \mu_\alpha(f_n) \to \mu(f_n),\ \forall n.$$
Hence the map (81.4) is a homeomorphism onto its image, and $\mathrm{Pr}(J)$ is homeomorphic to a compact subset of the metrizable space $V$. Note that the Riesz Representation Theorem is still necessary to obtain the fact that the image of $\mathrm{Pr}(J)$ in $V$ is closed. □

82. Polish and Lusin spaces. We begin by recalling the definitions.

(82.1) DEFINITION (Polish space; Lusin space). Let $S$ be a topological space. Then $S$ is called a Polish space if the topology of $S$ arises from a metric with respect to which $S$ is complete and separable. The space $S$ is called a Lusin space if $S$ is homeomorphic to a Borel subset of a compact metric space $J$.

(82.2) The space $(W, \mathcal{A})$. By far the most important example for us is the case when $S$ is the space
$$W := C([0, \infty), \mathbb{R})$$
of all continuous functions from $[0, \infty)$ to $\mathbb{R}$. This is the path space for (1-dimensional) Brownian motion and diffusions.

(82.3) LEMMA. In the topology of uniform convergence on compact sets, $W$ is a Polish space, and the $\sigma$-algebra of $\sigma$-cylinders,
$$\mathcal{A} := \sigma(\pi_t : t \ge 0), \quad \pi_t(w) := w(t) \quad (w \in W),$$
is the Borel $\sigma$-algebra $\mathcal{B}(W)$ on $W$.

Proof. A suitable metric on $W$ is given by
$$\rho(w_1, w_2) := \sum_n 2^{-n} \rho_n(w_1, w_2)[1 + \rho_n(w_1, w_2)]^{-1},$$
where
$$\rho_n(w_1, w_2) := \sup_{t \in [0, n]} |w_1(t) - w_2(t)|.$$
That $W$ is complete and separable follows from the corresponding result for $C([0, n])$. That $\mathcal{A} \subseteq \mathcal{B}(W)$ follows because each $\pi_t$ is a continuous map from $W$ to $\mathbb{R}$. Next note that if $w_1 \in W$ then
$$\rho_n(w, w_1) = \sup_{q \in \mathbb{Q} \cap [0, n]} |\pi_q(w) - \pi_q(w_1)|,$$
so that each $\rho_n(\cdot, w_1)$, and therefore also $\rho(\cdot, w_1)$, is $\mathcal{A}$-measurable. If $F$ is a closed subset
of $W$ and $\{w_n\}$ is a countable dense subset of $F$ then
$$F = \{w \in W : \inf_n \rho(w, w_n) = 0\},$$
and $F \in \mathcal{A}$. Thus $\mathcal{A} = \mathcal{B}(W)$. □

(82.4) LEMMA. If $S$ is a Lusin space then every probability measure on $S$ is inner regular.

Proof. This is an immediate consequence of Theorem 81.1. A probability measure $\mu$ on $(S, \mathcal{B}(S))$ has a canonical extension to a probability measure $\bar{\mu}$ on $(J, \mathcal{B}(J))$ with $\bar{\mu}(J \setminus S) = 0$. □

(82.5) THEOREM. A topological space $S$ is Polish if and only if it is homeomorphic to a $G_\delta$ subset (countable intersection of open sets) of a compact metric space $J$. In particular, every Polish space is a Lusin space.

The 'if' part is not necessary for us. You can find it in Section 6, No. 1, Theorem 1 of Bourbaki [1]. Our proof of the 'only if' part is an extended version of one found there.

Proof of the 'only if' part. Let $S$ be our Polish space, with complete separable metric $\rho$. We prove that $S$ may be embedded as a $G_\delta$ subset of the compact metrisable space $J := [0, 1]^{\mathbb{N}}$. This result is so important for us that we provide every detail of the proof.

Step 1: Put $\hat{\rho} := \rho/(1 + \rho)$. Then $\hat{\rho}$ is also a metric under which $S$ is complete and separable, and $0 \le \hat{\rho} \le 1$. Choose a countable dense subset $\{x_n : n \in \mathbb{N}\}$ of $S$, and let $\alpha$ be the map $\alpha : S \to J$ defined as follows:
$$\alpha(x) := (\hat{\rho}(x, x_1), \hat{\rho}(x, x_2), \ldots).$$
Let us prove that $\alpha$ is a homeomorphism of $S$ to $\alpha(S)$. We need only show that if $(x(n))$ is a sequence of elements of $S$ and $x \in S$, then the statements
(82.6) $x(n) \to x$,
(82.7) $\hat{\rho}(x(n), x_k) \to \hat{\rho}(x, x_k)$ for every $k$
are equivalent. Since each $\hat{\rho}(\cdot, x_k)$ is continuous on $S$, it is immediate that (82.6) implies (82.7). Suppose now that (82.7) holds. Since
$$\hat{\rho}(x(n), x) \le \hat{\rho}(x(n), x_k) + \hat{\rho}(x_k, x),$$
we have from (82.7)
$$\limsup_n \hat{\rho}(x(n), x) \le 2\hat{\rho}(x, x_k), \quad \forall k.$$
Now let $x_k \to x$ to see that $\hat{\rho}(x(n), x) \to 0$.

Step 2: For the moment fix $x \in S$. Let $d$ be a metric giving the topology of $J$.
11.82,83 PROBABILITY MEASURES ON LUSIN SPACES 207 Since a"1 is continuous on a(S) at a(x), for ε > 0, we can find <5(ε) > 0 such that yeS and d(a(x), <x(y)) < δ(ε) imply that p(x,y)<s. In particular, if neIN, then, taking ε = (2n)~* and <5 = min(<5(e), ε), we see that if BJd(a(x), δ) is the open ball in J of d-radius <5, then В7^(а(х), <5) has d-diameter at most 1/n and a(S)n B7 d(a(x), <5) has p-diameter at most 1/n. Step 3: Now think of S as identified with a(S) sitting on J. For neN and xeS, the closure of S in J, put χ in C/n if χ has a J-neighbourhood JVXtll of d-diameter less than 1/n such that the p-diameter oi SnNxn is also less than 1/n. We have already proved at Step 2 that Un Ώ. S, Now we claim that Un is open in S. So suppose that xeUn with Nxn as above. Any zeS sufficiently close to χ in the d-metric will also belong to Nx%n, and we can take Nz%n — Nx%n. Hence U„ is open in S. Suppose that xe(^\n U„. For each n, pick a point xn of S (remember that xeS\) in f]k^nNXtk. Then d{x, x„) ^ 1/n, so that xn -*χ in (J,d). But also, for r ^ n, the points xr and xn are in NXt„, so that p(xr, xn) < 1/n. Hence (xr) is a Cauchy sequence in (S,p), a complete metric space. Hence, for some x0eS,xn-*x0 in (S, p). But, since α is a homeomorphism, we must also have xn -* x0 in J. Thus χ = x0eS. Since U„ is open in S, we must have Un = Sn Vm where Kn is open in J. Hence To show that S is a G6 in J, we need only show that S is a G^ in J; and this is obvious because S = H{);gJ:^S)<1M. D π 83. The Cb(S) topology of Pr (S) when S is a Lusin space; Prohorov's Theorem. Throughout this important section. S denotes a Lusin space, so that S is a Borel subset of a compact metric space (J, p). (83.1) DEFINITION (the Cb(S) topology of Pr(S)). We denote the space of bounded continuous functions on S by Cb(S). We denote by Pr(S) the set of probability measures on (S, @(S)). 
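If $(\mu_\alpha)$ is a net of elements of $\mathrm{Pr}(S)$ and $\mu \in \mathrm{Pr}(S)$, we say that $\mu_\alpha$ converges to $\mu$ in the $C_b(S)$ topology if
$$\mu_\alpha(f) \to \mu(f) \quad \text{for all } f \in C_b(S).$$
A basis of neighbourhoods of the point $\mu_0 \in \mathrm{Pr}(S)$ is therefore provided by sets of the form
$$\{\mu \in \mathrm{Pr}(S) : |\mu(f_i) - \mu_0(f_i)| < \varepsilon_i,\ 1 \le i \le n\},$$
where $n \in \mathbb{N}$, each $f_i \in C_b(S)$ and each $\varepsilon_i > 0$.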
The most obvious way to guarantee $C_b(S)$ convergence of sequences of elements in $\mathrm{Pr}(S)$ is as follows.

(83.2) LEMMA. Suppose that $(X_n)$ is a sequence of $(S, \mathcal{B}(S))$-valued random variables on some probability triple $(\Omega, \mathcal{F}, P)$, and that $X_n \to X$, a.s. Then the law $\mu_n$ of $X_n$ converges to the law $\mu$ of $X$ in the $C_b(S)$ topology. The same conclusion holds if we only have $X_n \to X$ in probability in that, for every $\varepsilon > 0$,
$$P[\rho(X_n, X) > \varepsilon] \to 0 \quad \text{as } n \to \infty.$$

Proof. First assume almost sure convergence. Then, for $f \in C_b(S)$, we have
$$\mu_n(f) = E f(X_n) \to E f(X) = \mu(f).$$
If we assume convergence in probability then we shall have almost sure convergence along any sufficiently fast subsequence, etc. □

(83.3) Example. Let $S = C([0, 1], \mathbb{R})$ with the supremum-norm topology. Let $S^{(n)}$ be the normalized random walk in Section I.8. Then, by (I.8.3), we see that the law of $S^{(n)}$ converges in the $C_b(S)$ topology to the law of Brownian motion with parameter set $[0, 1]$.

Now, back to the theory! If $(x_\alpha : \alpha \in D)$ is a net in $\mathbb{R}$, we define
$$\limsup x_\alpha := \inf_{\alpha_0 \in D} \sup\{x_\alpha : \alpha \ge \alpha_0\},$$
and we define the lim inf analogously. We have $x_\alpha \to x$ if and only if $\limsup x_\alpha = \liminf x_\alpha = x$.

(83.4) THEOREM. The following three conditions on a net $(\mu_\alpha)$ of elements of $\mathrm{Pr}(S)$ are equivalent:
(83.5)(i) $\mu_\alpha \to \mu$ in the $C_b(S)$ topology;
(83.5)(ii) $\limsup \mu_\alpha(F) \le \mu(F)$, for every closed $F \subseteq S$;
(83.5)(iii) $\liminf \mu_\alpha(G) \ge \mu(G)$, for every open $G \subseteq S$.
If therefore (83.5)(i) holds and $B \in \mathcal{B}(S)$ satisfies $\mu(\partial B) = 0$, where $\partial B$ is the frontier (closure\interior) of $B$, then $\mu_\alpha(B) \to \mu(B)$.

Proof. The equivalence of (83.5)(ii) and (83.5)(iii) is trivial. Now suppose that (83.5)(i) holds, and that $F$ is closed in $S$. For each $n$, the function $f_n$ defined by
$$f_n(x) = \max(0, 1 - n\rho(x, F))$$
is an element of $C_b(S)$, and $f_n \downarrow I_F$ (the indicator function of $F$) as $n \uparrow \infty$. For each $n$,
$$\limsup_\alpha \mu_\alpha(I_F) \le \lim_\alpha \mu_\alpha(f_n) = \mu(f_n),$$
so that (83.5)(ii) follows on letting $n \uparrow \infty$.
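The approximating functions $f_n$ of this step, and the fact that the inequality in (83.5)(ii) can be strict, can be seen in a toy case. The Python sketch below assumes $S = \mathbb{R}$ with its usual metric, $F = \{0\}$, unit masses at $1/k$ converging to the unit mass at 0; these choices and all function names are ours.

```python
def f_n(n: int, x: float, dist) -> float:
    """f_n(x) = max(0, 1 - n * rho(x, F)): continuous, bounded, decreasing in n to I_F."""
    return max(0.0, 1.0 - n * dist(x))

def dist_F(x: float) -> float:
    """rho(x, F) for the closed set F = {0} in R."""
    return abs(x)

def mass_at_inverse(g, k: int) -> float:
    """Integral of g against the unit mass at 1/k."""
    return g(1.0 / k)

def mass_at_zero(g) -> float:
    """Integral of g against the unit mass at 0 (the limit law)."""
    return g(0.0)

def indicator_F(x: float) -> float:
    """I_F for F = {0}."""
    return 1.0 if x == 0.0 else 0.0
```

Here the integral of each fixed `f_n` against the mass at `1/k` tends to its integral against the mass at 0, as (83.5)(i) requires, while the masses of $F$ itself are 0 along the sequence and 1 in the limit: the inequality in (83.5)(ii) holds strictly.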
To finish the proof, we need to show that (83.5)(ii) implies (83.5)(i); and for this, we follow Billingsley [2]. Assume (83.5)(ii). We first show that
(83.6) $\limsup \mu_\alpha(f) \le \mu(f), \quad \forall f \in C_b(S).$
By replacing $f$ by a suitable linear combination $af + b$, where $a > 0$, we see that it is enough to prove (83.6) when $0 < f < 1$. Pick such an $f$, and define
$$F_i := \{s \in S : f(s) \ge i/k\} \quad (0 \le i \le k),$$
where $k$ is a temporarily fixed positive integer. Then $F_0 = S$, $F_k = \emptyset$, and $k^{-1}(i - 1) \le f < k^{-1} i$ on $F_{i-1} \setminus F_i$, so that
$$\sum_{i=1}^k k^{-1}(i-1)[\mu(F_{i-1}) - \mu(F_i)] \le \mu(f) \le \sum_{i=1}^k k^{-1} i\,[\mu(F_{i-1}) - \mu(F_i)].$$
By partial summation, we find that
$$k^{-1}\sum_{i=1}^k \mu(F_i) \le \mu(f) \le k^{-1} + k^{-1}\sum_{i=1}^k \mu(F_i),$$
and the same bounds hold with $\mu_\alpha$ in place of $\mu$. Thus, since each $F_i$ is closed and (83.5)(ii) holds,
$$\mu(f) \ge k^{-1}\sum_i \mu(F_i) \ge k^{-1}\sum_i \limsup \mu_\alpha(F_i) \ge \limsup k^{-1}\sum_i \mu_\alpha(F_i) \ge \limsup \mu_\alpha(f) - k^{-1}.$$
Since $k$ is arbitrary, (83.6) follows, and, by applying (83.6) both to $f$ and to $-f$, (83.5)(i) follows. □

(83.7) THEOREM. For $\mu \in \mathrm{Pr}(S)$, let $\bar{\mu}$ be the extension of $\mu$ to an element of $\mathrm{Pr}(J)$ with $\bar{\mu}(J \setminus S) = 0$. Then the map $\mu \mapsto \bar{\mu}$ is a homeomorphism of $\mathrm{Pr}(S)$ with its $C_b(S)$ topology to the subset $\{\nu : \nu(S) = 1\}$ of $\mathrm{Pr}(J)$ with its $C_b(J)$ topology. Hence the $C_b(S)$ topology of $\mathrm{Pr}(S)$ is metrisable.

Proof. We must show that if $(\mu_\alpha)$ is a net in $\mathrm{Pr}(S)$ and $\mu \in \mathrm{Pr}(S)$ then the conditions
(83.8) $\mu_\alpha(f) \to \mu(f), \quad \forall f \in C_b(S),$
(83.9) $\bar{\mu}_\alpha(f) \to \bar{\mu}(f), \quad \forall f \in C_b(J)$
are equivalent. Since the restriction to $S$ of an element of $C_b(J)$ is automatically in $C_b(S)$, it is obvious that (83.8) implies (83.9). Now assume that (83.9) holds. A closed subset $F$ of $S$ is of the form $S \cap Y$, where $Y$ is closed in $J$. By Theorem 83.4,
$$\limsup \mu_\alpha(F) = \limsup \bar{\mu}_\alpha(Y) \le \bar{\mu}(Y) = \mu(F),$$
so that, again by Theorem 83.4, the result (83.8) holds. □
210 SOME CLASSICAL THEORY 11.83 Of course the metrisability of Pr (S) is a relief: we can return to working with sequences. 'Let the ungodly fall into their own nets together; and let me ever escape them* (Book of Psalms). (83.10) THEOREM (Prohorov's Theorem) and DEFINITION (tightness). A sufficient condition for a subset Η ofPr(S) to be conditionally compact (that is, for its closure to be compact) in the Cb(S) topology is that Η be tight in the following sense: {83.11) for each ε >0 there exists a compact subset Κε of S such that μ(Κε)>1-ε, V/хеЯ. // S is Polish then this tightness condition is also necessary. It is the 'sufficiency' part that is important for us. Proof of sufficiency. Suppose that (83.11) holds. Since Pr(J) and Pr(S) are metrisable, conditional compactness is the same as conditional sequential compactness. Further, we know from Theorem 81.3 that every subset of Pr(J) is conditionally sequentially compact. It now follows from Theorem 83.7 that we need only show that if μηβΗ and μπ-*ν in Pr(J) then v(S) = 1. This is, however, almost obvious: from Theorem 83.4, we have v(KB)>limsupμη(Κε)^ I - ε. Hence v(S)= 1, as required. Π Proof of necessity when S is Polish. Let S be Polish. Let ρ be a metric on S such that (S,p) is complete with countable dense set {xn}. Let К be a compact subset of Pr(S) in the Cb(S) topology, and let ε > 0 be given. For each reN, the open subsets G" of S, where G> U Bp(xp 1/r), where Bp(y, <5):= {xeS: p(x, y) < δ}, satisfy Gnr]S as η|οο, so that υηΓ:={μβν:μ(&;)>1-ε2-'}ϊν as η|οο. However, it is clear from the result (83.5)(iii) that {μβν:μ(0ΐ)^1-ε2-'} is closed in V9 whence U" is open in V. Since V is compact, U"ir) = V for some n(r), so that /i(Grn(r))>l-e2"r, V/ieK
Now put
$$K := \bigcap_r \overline{G_r^{n(r)}},$$
the 'bar' signifying closure in $S$. Then $\mu(K) > 1 - \varepsilon$. Moreover, $K$ is closed in $S$, and therefore complete under $\rho$. For every $r$,
$$K \subseteq \bigcup_{j \le n(r)} B_\rho(x_j, 2/r),$$
so that $K$ is totally bounded. Because $K$ is complete and totally bounded for the metric $\rho$, $K$ is compact (D&S, I.6.14). The proof is complete. □

84. Some useful convergence results. Again, let $S$ be a Lusin space.

(84.1) LEMMA. Suppose that $h$ is a measurable function from $(S, \mathcal{B}(S))$ to $(\tilde{S}, \mathcal{B}(\tilde{S}))$ where $\tilde{S}$ is another Lusin space, with metric $\tilde{\rho}$. Then the set of points $D_h$ at which $h$ is discontinuous is in $\mathcal{B}(S)$.

Proof. With $\delta$ and $\varepsilon$ denoting positive rationals, we have
$$D_h = \bigcup_\varepsilon \bigcap_\delta A_{\varepsilon,\delta},$$
where $A_{\varepsilon,\delta}$ is the open subset of $S$ consisting of those $x$ in $S$ for which there exist $y, z$ in $S$ such that $\rho(x, y) < \delta$, $\rho(x, z) < \delta$ and $\tilde{\rho}(h(y), h(z)) > \varepsilon$. □

(84.2) LEMMA (Continuous-Mapping Principle). Let $h$ and $D_h$ be as in Lemma 84.1. Suppose that $(\mu_n)$ is a sequence in $\mathrm{Pr}(S)$ with $\mu_n \to \mu$, and that $\mu(D_h) = 0$. Then
(84.3) $\mu_n \circ h^{-1} \to \mu \circ h^{-1}$
in the $C_b(\tilde{S})$ topology of $\mathrm{Pr}(\tilde{S})$.

Proof. Let $\Gamma$ be a closed set in $\tilde{S}$. Let $F$ be the closure in $S$ of $h^{-1}(\Gamma)$. Then
$$h^{-1}(\Gamma) \subseteq F \subseteq h^{-1}(\Gamma) \cup D_h.$$
From (83.5)(ii),
$$\limsup \mu_n \circ h^{-1}(\Gamma) \le \limsup \mu_n(F) \le \mu(F) = \mu \circ h^{-1}(\Gamma),$$
so that, by Theorem 83.4, the result (84.3) holds. □

(84.4) $C_b(\mathbb{R})$ convergence in $\mathrm{Pr}(\mathbb{R})$. Let $\mu$ be an element of $\mathrm{Pr}(\mathbb{R})$, and introduce the associated distribution function
$$F(x) := \mu(-\infty, x].$$
A point $a \in \mathbb{R}$ is called an atom of $\mu$ (or of $F$) if
$$\mu(\{a\}) = F(a) - F(a-) > 0,$$
or equivalently, if $F$ is discontinuous at $a$. Since $\mu$ can have at most $n$ atoms of mass at least $1/n$, the number of atoms of $\mu$ is countable, so that the set of non-atoms of $\mu$ is dense in $\mathbb{R}$.
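For a finitely supported law, the distribution function, its left limits and its atoms can be computed directly. The Python sketch below is only an illustration under that assumption; the representation of a law as a dictionary `{point: mass}` and the function names are ours.

```python
from fractions import Fraction

def distribution_function(law: dict):
    """Return F with F(x) = mu(-inf, x] for a finitely supported law {point: mass}."""
    def F(x):
        return sum((m for a, m in law.items() if a <= x), Fraction(0))
    return F

def left_limit(law: dict):
    """Return x -> F(x-) = mu(-inf, x) for the same law."""
    def F_minus(x):
        return sum((m for a, m in law.items() if a < x), Fraction(0))
    return F_minus

def atoms(law: dict, candidates) -> dict:
    """Those candidate points a with jump F(a) - F(a-) > 0, together with the jump size."""
    F, F_minus = distribution_function(law), left_limit(law)
    jumps = {a: F(a) - F_minus(a) for a in candidates}
    return {a: j for a, j in jumps.items() if j > 0}

def heavy_atoms(law: dict, n: int) -> list:
    """Atoms of mass at least 1/n; there can never be more than n of them."""
    return [a for a, m in law.items() if m >= Fraction(1, n)]
```

For example, the law `{0: 1/2, 1: 1/3, 2: 1/6}` (with `Fraction` masses) has exactly three atoms, and only two of them have mass at least 1/3, in line with the counting argument above.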
212 SOME CLASSICAL THEORY 11.84 (84.5) LEMMA. Let (μη) be a sequence of elements o/Pr(R), and let μ€ΡΓ(Κ). Introduce the associated distribution functions F„ and F. Then the following conditions are equivalent: (84.6) (i) μη -» μ in the Cb(R) topology of Pr(R); (84.6)(ii) Fn(x) -» F{x) at every non-atom ofF. (Skorokhod representation for Cb(R) convergence in Pr (R)). Moreover, if (84.6) (ii) holds then we can find a probability triple (Ω, ^", Ρ) carrying (R, @)-valued random variables Xn with law μη and X with law μ such that Xn-*X almost surely. For the general Skorokhod representation for Cb(S) convergence in Pr(S), with S a Lusin space, see Section 86. Proof that (84.6) (i) implies (84.6) (ii). This is an immediate application of the last sentence of Lemma (83.5), taking В there to be (οο,χ]. Proof that (84.6) (ii) implies (84.6) (i). It is clearly enough to prove the last sentence of the Lemma. Suppose that (84.6)(ii) holds. Let (Ω,^,Ρ) = ([0,1], Щ[0,1]), Leb). For ωεΩ, define Χ+(ω):= sup {x:F{x) < ω} = inf {x:F{x) > ω}, Χ~(ω):= sup {x:F(x) < ω} = inf {x:F(x) ^ ω}; and make the analogous definitions for X*. If ζ > Χ "(ω) then F{z) ^ ω, so that, by right-continuity of F, we have F(X~(co)) ^ ω, and X "(ω) < с implies that ω < F{X ~ (ω))< F(c). We now see that X " (ω) ^ с if and only if ω < F(c), whence Ρ [Χ " (ω) < с] = F(c). If ω < F{c) then Χ+{ω) < с, so that F(c) = Ρ[ω < F(c)] < Ρ[*+(ω) < с]. But, since X" < X + 9 we must have P[*+(ω) < с] < P[Jf "(ω) < с] = F(c), and it is now clear that equality must hold throughout, so that both X~ and X+ have law μ. Since, for every rational c, we have P(JST ^c<X+) = P(X~ ^c)-P{X+ ^c) = 0, it is clear that X+ and X~ are almost surely equal. Fix ω€Ω. Let ζ be a non-atom of F with ζ > Χ+(ω). Then F(z) > ω, so that (by (84.6)(ii)), for large n, we shall have F„(z) > ω and Χ*(ω) < ζ. Hence limsupA^a))^.
11.84,85 PROBABILITY MEASURES ON LUSIN SPACES 213 But we can choose non-atoms ζ with ζ[[Χ+(ω) to get limsupXf{cD)^X+{(D). Finally, since Μπ\ίηϊΧ~{ω)^Χ~{ω) follows similarly and X+=X~ almost surely, we have, with Xn [respectively, X] denoting either of X*, X~ [respectively, X\X~\ Xn^X, a.s. D (84.7) The arcsine law. Consider the situation in (83.3) in which S:= C([0,1];R) with the supremum norm, and in which S(n) is the normalised random walk of Section 1.8. Let μ„ be the law of Sin) on (S,^(S)) and μ the Wiener law of BM0. For weS, define h{w):= Leb {s:0 < s < 1; w(s) > 0}. Then (why?) h is a monotone limit of continuous functions on S, and so is Borel from S to [0,1]. If wk -* w in S, then {5 < 1: w{s) > 0} с Hm inf {5 < 1: wk{s) > 0} £ lim sup {5 < 1: wk{s) > 0} <={s<l:w(s)>0}. Hence /i(w) < lim inf h(wk) ^ lim sup h(wk) < /i(w) + /{o}(wW) dt. Jo Fubini's Theorem shows that Г Г1 Г1 /ι(Λν) /{0}(w(t))A= μ{νν:νν(ί) = 0}</ί = 0, J5 Jo Jo so that μφ/,) = 0. Hence, by Lemmas 84.2 and 84.5, we have, for 0 < и < 1, μπ{νν:/ι(νν) < и) ->μ{Η^:/ι(νν) < u} = (2/7r)arcsinu1/2. The last equality is Levy's arcsine law, which is proved in Section 111.23, and, in a more illuminating way by excursion theory, in Section VI.53. 85. Tightness in Pr (W) when W is the path-space W:= C( [0,00); R). The Arzela- Ascoli Theorem allows us to translate Prohorov's Theorem into 'practicable' terms when S = W with the Frechet topology in Section 82. (85.1) THEOREM (Arzela-Ascoli Theorem, D&S, IV.6.7). A subset TofW is conditionally compact if and only if the following two conditions hold: (85.2(i) sup{|w(0)|:wer}<oo; (85.2)(ii) for each NeN, lim sup Δ(<5, Ν\ w) = 0, <U0 weT
where
$$\Delta(\delta, N; w) := \sup\{|w(t_1) - w(t_2)| : t_1, t_2 \in [0, N];\ |t_1 - t_2| < \delta\}.$$
And here is how it is applied.

(85.3) THEOREM. A subset $H$ of $\mathrm{Pr}(W)$ is conditionally compact if and only if the following two conditions hold:
(85.4)(i) $\lim_{a \uparrow \infty} \sup_{\mu \in H} \mu\{w : |w(0)| > a\} = 0$;
(85.4)(ii) for every $\varepsilon > 0$ and $N \in \mathbb{N}$, $\lim_{\delta \downarrow 0} \sup_{\mu \in H} \mu\{w : \Delta(\delta, N; w) > \varepsilon\} = 0$.
Only the 'if' part matters to us.

Proof of 'if' part. By Prohorov's Theorem, we must show that if $\eta > 0$ is given then we can find a conditionally compact subset $\Gamma$ of $W$ with $\mu(\Gamma) > 1 - \eta$, $\forall \mu \in H$. So suppose that conditions (85.4) hold and that $\eta > 0$. Choose $a$ so that if $A := \{w : |w(0)| \le a\}$ then $\mu(A) > 1 - \tfrac{1}{2}\eta$, $\forall \mu \in H$. Choose $\delta = \delta(n, N)$ such that if
$$A_{n,N} := \{w : \Delta(\delta, N; w) \le 1/n\},$$
then $\mu(A_{n,N}) > 1 - \eta\, 2^{-(n+N+2)}$, $\forall \mu \in H$. Put
$$\Gamma := A \cap \bigcap_{n,N} A_{n,N}.$$
Then $\mu(\Gamma) > 1 - \eta$, and since $\Gamma$ satisfies the conditions (85.2), $\Gamma$ is conditionally compact in $W$. □

Martingale methods will provide the best way of establishing the conditions (85.4) in the cases that concern us. But moment criteria and the deep Garsia-Rodemich-Rumsey inequality provide very important ways of establishing these conditions in other contexts. Here, as an example, is a result motivated by Kolmogorov's criterion (I.25.2) for path continuity.

(85.5) THEOREM. Let $H$ be a subset of $\mathrm{Pr}(W)$ such that (85.4)(i) holds and that, for every $N \in \mathbb{N}$, there exist constants $\gamma_N$, $\delta_N$ and $C_N$ in $(0, \infty)$ such that
$$\sup_{\mu \in H} \int |w(t_1) - w(t_2)|^{\gamma_N}\, \mu(dw) \le C_N |t_1 - t_2|^{1 + \delta_N} \qquad (\forall t_1, t_2 \in [0, N]).$$
Then $H$ is conditionally compact in $\mathrm{Pr}(W)$.
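The modulus of continuity $\Delta(\delta, N; w)$ that drives conditions (85.2) and (85.4) can be approximated on a grid. The following is only a minimal sketch: the function name `modulus_of_continuity`, the sampling convention `path[k] = w(k * dt)` and the test path are illustrative assumptions, and a grid maximum can only underestimate the true supremum.

```python
def modulus_of_continuity(path, dt, delta):
    """Grid approximation of Delta(delta, N; w): the largest |w(t1) - w(t2)|
    over sample times t1, t2 with |t1 - t2| < delta.

    Assumption: path[k] holds w(k * dt), so that N = (len(path) - 1) * dt.
    """
    n = len(path)
    best = 0.0
    for lag in range(1, n):
        if lag * dt >= delta:  # strict inequality, as in the definition
            break
        for k in range(n - lag):
            best = max(best, abs(path[k + lag] - path[k]))
    return best


# Illustrative path: w(t) = t on [0, 1], sampled with step 0.1.
line = [0.1 * k for k in range(11)]
coarse = modulus_of_continuity(line, dt=0.1, delta=0.25)  # lags 1 and 2 qualify
fine = modulus_of_continuity(line, dt=0.1, delta=0.15)    # only lag 1 qualifies
print(coarse, fine)
```

For this straight-line path the approximation shrinks with $\delta$, which is the behaviour that condition (85.2)(ii) demands uniformly over the set of paths.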
86. The Skorokhod representation of $C_b(S)$ convergence on $\mathrm{Pr}(S)$. The following theorem gives a nice (and useful) way of thinking about $C_b(S)$ convergence on $\mathrm{Pr}(S)$.

(86.1) THEOREM (Skorokhod). Suppose that $S$ is a Lusin space, that $\mu_n$ $(n \in \mathbb{N})$ and $\mu$ are elements of $\mathrm{Pr}(S)$, and that $\mu_n \to \mu$ in the $C_b(S)$ topology. Then there exists a triple $(\Omega, \mathcal{F}, P)$ carrying $(S, \mathcal{B}(S))$-valued random variables $X_n$ with law $\mu_n$ and $X$ with law $\mu$ such that $X_n \to X$ almost surely.

(86.2) OBSERVATION. If $S$ is some fixed Lusin space for which the theorem is true (for arbitrary $\mu_n$ and $\mu$ in $\mathrm{Pr}(S)$), then the theorem remains true when $S$ is replaced by an element of $\mathcal{B}(S)$, that is, by a Borel subset of $S$.

Proof of Observation. This is obvious from the proof of Theorem 83.7. □

Proof of Theorem. We already know from Lemma 84.5 that the theorem is true when $S = \mathbb{R}$. In particular, it is true when $S$ is the Cantor set $C \subset [0,1]$. Now, the space $\{0,1\}^{\mathbb{N}}$ is homeomorphic to $C$ via the map $(i_n) \mapsto 2\sum_n i_n 3^{-n}$, and it is therefore clear that $C^{\mathbb{N}}$ is homeomorphic to $C$. By now, we know that the theorem is true when $S = C^{\mathbb{N}}$.

We also know that we need only prove the theorem when $S$ is a compact metrisable space $(J, d)$. However, the map from $J$ to $[0, \tfrac12]^{\mathbb{N}}$ defined by
$$x \mapsto \left(\tfrac12\, d(x, x_k)[1 + d(x, x_k)]^{-1} : k \in \mathbb{N}\right),$$
where $\{x_n\}$ is a dense subset of $J$, is a homeomorphism of $J$ to a compact (and therefore Borel) subset of $[0, \tfrac12]^{\mathbb{N}}$. Hence all we need do is deduce the result for $S = [0, \tfrac12]^{\mathbb{N}}$ from the known result for $C^{\mathbb{N}}$.

So, let $S := [0, \tfrac12]^{\mathbb{N}}$, and let $\mu_n \to \mu$ in the $C_b(S)$ topology of $\mathrm{Pr}(S)$. For each $k$ in $\mathbb{N}$, there can be at most countably many real numbers $a$ such that $\mu\{x \in S : x_k = a\} > 0$. Hence, if $D$ denotes the set of dyadic rationals then we can find $r$ in $[0, \tfrac12]$ such that
$$\mu\{x \in S : x_k + r \in [0,1] \setminus D,\ \forall k \in \mathbb{N}\} = 1.$$
Now let $c$ be the continuous standard Cantor function $c : [0,1] \to [0,1]$, and let
$$\gamma(t) := \inf\{u \in [0,1] : c(u) \ge t\}, \qquad t \in [0,1].$$
Then $\gamma : [0,1] \to C$, $\gamma$ is Borel, $\gamma$ is continuous at points of $[0,1] \setminus D$, and $c \circ \gamma = \mathrm{id}$ on $[0,1]$. The map $\tau : S \to C^{\mathbb{N}}$, where $\tau(x) := (\gamma(x_k + r) : k \in \mathbb{N})$, is Borel-measurable (look at the inverse image of special cylinders!), and the
set of its discontinuities has $\mu$-measure 0. Hence, by Lemma 84.2, $\mu_n \circ \tau^{-1} \to \mu \circ \tau^{-1}$ in the $C_b(C^{\mathbb{N}})$ topology of $\mathrm{Pr}(C^{\mathbb{N}})$. Since our theorem is true when $S = C^{\mathbb{N}}$, we can find $(\Omega, \mathcal{F}, P)$ carrying $(C^{\mathbb{N}}, \mathcal{B}(C^{\mathbb{N}}))$-valued random variables $Y_n$ with law $\mu_n \circ \tau^{-1}$ and $Y$ with law $\mu \circ \tau^{-1}$ such that $Y_n \to Y$ in $C^{\mathbb{N}}$, almost surely.

Now consider the map $\phi : C^{\mathbb{N}} \to [-1, 1]^{\mathbb{N}}$ defined by
$$\phi(y) := (c(y_k) - r : k \in \mathbb{N}).$$
Note that $\phi \circ \tau = \mathrm{id}$ on $S$. Since $\phi$ is continuous, $\phi(Y_n) \to \phi(Y)$ almost surely. If $V$ is an $(S, \mathcal{B}(S))$ random variable with law $\mu$, then $\tau(V)$ has law $\mu \circ \tau^{-1}$, just like $Y$, and $\phi(\tau(V)) = V$. Hence the random variable $\phi(Y)$ has law $\mu$, the obvious extension of $\mu$ from $S$ to $[-1, 1]^{\mathbb{N}}$. If we now throw away any $\omega$ for which either some $Y_n(\omega)$ or $Y(\omega)$ is not in $S$, we have completed the requisite construction. □

Note. We did not need to consider, for example, whether the image of $S$ under $\tau$ is Borel in $C^{\mathbb{N}}$.

87. Weak convergence versus convergence of finite-dimensional distributions. We now revert to probabilists' terminology.

(87.1) TERMINOLOGY (weak convergence, convergence in law). Let $S$ be a Lusin space and let $\mu_n$ $(n \in \mathbb{N})$ and $\mu$ be elements of $\mathrm{Pr}(S)$. We say that $\mu_n$ converges weakly to $\mu$, and write
$$\mu_n \Rightarrow \mu,$$
if and only if $\mu_n$ converges to $\mu$ in the $C_b(S)$ topology. If $X_n$, carried by $(\Omega_n, \mathcal{F}_n, P_n)$ and with law $\mu_n$, and $X$, carried by $(\Omega, \mathcal{F}, P)$ and with law $\mu$, are $(S, \mathcal{B}(S))$-valued random variables, and if also $\mu_n \Rightarrow \mu$, then we say that $X_n$ converges in law to $X$.

For $W = C(\mathbb{R}^+, \mathbb{R})$ and for each finite subset $U$ of $\mathbb{R}^+$, we have the restriction map $\pi_U : W \to \mathbb{R}^U$ as in our study of the DK Theorem.

(87.2) DEFINITION (convergence of finite-dimensional distributions). Let $W$ be the path-space $W = C(\mathbb{R}^+, \mathbb{R})$. Let $\mu_n$ $(n \in \mathbb{N})$ and $\mu$ be elements of $\mathrm{Pr}(W)$. We say that the finite-dimensional distributions of $\mu_n$ converge to those of $\mu$
if, for every finite subset $U$ of $\mathbb{R}^+$,
$$\mu_n \circ \pi_U^{-1} \to \mu \circ \pi_U^{-1}$$
(in the $C_b(\mathbb{R}^U)$ topology of $\mathrm{Pr}(\mathbb{R}^U)$).

The following result might clarify certain things.

(87.3) LEMMA. Preserve the meaning of $W$, $\mu_n$ and $\mu$. Then $\mu_n \Rightarrow \mu$ if and only if both of the following conditions hold:
(87.4)(i) the finite-dimensional distributions of $\mu_n$ converge to those of $\mu$;
(87.4)(ii) the family $(\mu_n : n \in \mathbb{N})$ is tight.

Proof of 'only if' part. Suppose that $\mu_n \Rightarrow \mu$. Since each map $\pi_U$ is continuous, convergence of the finite-dimensional distributions follows from Lemma 84.2. The tightness follows from the 'necessity' part of Prohorov's Theorem, $W$ being Polish. □

Proof of 'if' part. Suppose that (87.4)(i) and (87.4)(ii) hold. Then, by the 'sufficiency' part of Prohorov's Theorem, $(\mu_n)$ is conditionally sequentially compact. But if $\mu_{n(k)} \Rightarrow \nu$, then the finite-dimensional distributions of $\mu_{n(k)}$ converge to those of $\nu$; and, since $\mu$ and $\nu$ therefore have the same finite-dimensional distributions, they are equal. □

Now do Exercise E91.87.

Regular conditional probabilities

Here we prove the existence of regular conditional probabilities under topological assumptions. (Recall that we know from Section 43 that regular conditional probabilities do not always exist.) We also discuss the Markov property of Brownian motion using the language of regular conditional probabilities: this is meant to prepare you for the next chapter on Markov processes. We are even going to start switching now to 'Markov-process' notation. It seems to be impossible to study Markov processes rigorously without a baroque extravaganza of $\sigma$-algebras and filtrations: one uses symbols with 'degrees' such as $\mathcal{F}^\circ$ for 'uncompleted' $\sigma$-algebras (rather like those we have hitherto called $\mathcal{F}$), $\mathcal{F}^\mu$ for the completion of $\mathcal{F}^\circ$ with respect to a certain $P^\mu$ measure, etc. So, we are going to work with a basic $\sigma$-algebra $\mathcal{F}^\circ$ and a sub-$\sigma$-algebra $\mathcal{G}^\circ$.

88. Some preliminaries. Let $\Omega$ be a set.

(88.1) DEFINITION (countably generated $\sigma$-algebra; atom of a $\sigma$-algebra). Let $\mathcal{G}^\circ$ be a $\sigma$-algebra on $\Omega$. Then $\mathcal{G}^\circ$ is said to be countably generated if there exists
a sequence $G_1, G_2, \ldots$ of elements of $\mathcal{G}^\circ$ such that $\mathcal{G}^\circ = \sigma\{G_1, G_2, \ldots\}$. For $\omega \in \Omega$, the atom $A(\omega)$ of $\mathcal{G}^\circ$ containing $\omega$ is defined to be the set
$$A(\omega) := \bigcap\{G \in \mathcal{G}^\circ : \omega \in G\}.$$

Exercises
(i) Prove that if $S$ is a compact metrisable space then $\mathcal{B}(S)$ is countably generated. Deduce that the same holds for a Lusin space $S$.
(ii) Find a sub-$\sigma$-algebra of $\mathcal{B}[0,1]$ that is not countably generated; there is an obvious one.
(iii) In general, $A(\omega)$ need not be an element of $\mathcal{G}^\circ$. Prove that $A(\omega) \in \mathcal{G}^\circ$ if $\mathcal{G}^\circ$ is countably generated. This is done later in this section.

The proof of the main theorem in the next section requires a slight modification of the Riesz Representation Theorem 80.3. Suppose that $\Omega$ is a compact metrizable space. Suppose that $\mathcal{H}$ is a countable dense subset of $C(\Omega)$ such that $1 \in \mathcal{H}$ and $\mathcal{H}$ is a vector space over the rational field $\mathbb{Q}$. For example, $\mathcal{H}$ could be the algebra $A$ used in the proof of Theorem 81.3. Suppose that $\phi : \mathcal{H} \to \mathbb{R}$ is $\mathbb{Q}$-linear, increasing, and satisfies $\phi(1) = 1$. Let us prove that for $h \in C(\Omega)$,
$$\phi^*(h) := \inf\{\phi(g) : g \in \mathcal{H},\ g \ge h\} = \sup\{\phi(f) : f \in \mathcal{H},\ f \le h\}.$$
[For, by adding to $h$ a suitable multiple of 1, we can suppose that $h \ge 1$ on $\Omega$. Then, for rational $\varepsilon > 0$, we can find $f$ and $g$ in $\mathcal{H}$ such that
$$(1 - \varepsilon)h \le f \le h \le g \le (1 + \varepsilon)h,$$
whence $f \ge (1 - \varepsilon)(1 + \varepsilon)^{-1} g$.] By applying the Riesz theorem to $\phi^*$, we find that there is a unique probability measure $\mu$ on $(\Omega, \mathcal{B}(\Omega))$ such that
$$\phi(f) = \int_\Omega f\,d\mu, \qquad \forall f \in \mathcal{H}.$$

89. The main existence theorem. We recall the definition of regular conditional probability within the theorem.

(89.1) THEOREM (Doob, ..., Kuratowski, ...). Let $(\Omega, \mathcal{F}^\circ, P)$ be a probability triple in which $\Omega$ is a Lusin space and $\mathcal{F}^\circ = \mathcal{B}(\Omega)$. Then there exists a regular conditional probability $(P|\mathcal{G}^\circ)$ of $P$ given $\mathcal{G}^\circ$, that is, a function
$$(P|\mathcal{G}^\circ) : \mathcal{F}^\circ \times \Omega \to [0,1]$$
such that
(i) for each $F \in \mathcal{F}^\circ$, the function $\omega \mapsto (P|\mathcal{G}^\circ)(F, \omega)$ is a version of $P(F|\mathcal{G}^\circ)$;
(ii) for every $\omega$, the map $F \mapsto (P|\mathcal{G}^\circ)(F, \omega)$ is a probability measure on $\mathcal{F}^\circ$.
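The stochastic process $\{(P|\mathcal{G}^\circ)(F, \cdot) : F \in \mathcal{F}^\circ\}$ is, modulo indistinguishability, the unique modification of the process $\{P(F|\mathcal{G}^\circ) : F \in \mathcal{F}^\circ\}$ with the regularity properties (i) and (ii).

Assume further that $\mathcal{G}^\circ$ is a countably generated sub-$\sigma$-algebra of $\mathcal{F}^\circ$. Then $(P|\mathcal{G}^\circ)$ has the further properties:
(iii) $\{\omega : (P|\mathcal{G}^\circ)(G, \omega) = I_G(\omega),\ \forall G \in \mathcal{G}^\circ\}$ is a set in $\mathcal{G}^\circ$ of $P$-measure 1.
(iv) if $A(\omega)$ denotes the atom of $\mathcal{G}^\circ$ containing $\omega$ then $\{\omega : (P|\mathcal{G}^\circ)(A(\omega), \omega) = 1\}$ is in $\mathcal{G}^\circ$ and
$$P\{\omega : (P|\mathcal{G}^\circ)(A(\omega), \omega) = 1\} = 1.$$

Proof when $\Omega$ is a compact metrisable space. Assume that $\Omega$ is a compact metrisable space. Let $\mathcal{H}$ be a countable dense subset of $C(\Omega)$ containing 1 and such that $\mathcal{H}$ is a vector space over $\mathbb{Q}$. For each $f \in \mathcal{H}$, choose and fix some version of $E(f|\mathcal{G}^\circ)$ and write
$$\phi_\omega(f) := E(f|\mathcal{G}^\circ)(\omega).$$
The set $\Gamma$ of 'good' $\omega$ for which the statements
$$\phi_\omega(q_1 f_1 + q_2 f_2) = q_1 \phi_\omega(f_1) + q_2 \phi_\omega(f_2), \qquad \forall q_1, q_2 \in \mathbb{Q},\ \forall f_1, f_2 \in \mathcal{H},$$
$$\phi_\omega(f_1) \le \phi_\omega(f_2) \quad \text{whenever } f_1, f_2 \in \mathcal{H} \text{ and } f_1 \le f_2,$$
$$\phi_\omega(1) = 1,$$
are simultaneously true is in $\mathcal{G}^\circ$, and elementary properties of conditional expectations show that it has probability 1. For $\omega \in \Gamma$, the map $\phi_\omega$ on $\mathcal{H}$ is $\mathbb{Q}$-linear and increasing, with $\phi_\omega(1) = 1$, so that from the discussion in Section 88, there exists a probability measure $(P|\mathcal{G}^\circ)(\cdot, \omega)$ such that
$$\phi_\omega(f) = E(f|\mathcal{G}^\circ)(\omega) = \int f(\tilde\omega)\,(P|\mathcal{G}^\circ)(d\tilde\omega, \omega), \qquad \forall f \in \mathcal{H}.$$
For $\omega \notin \Gamma$, define $(P|\mathcal{G}^\circ)(\cdot, \omega) := \nu$, for some arbitrary but fixed element $\nu$ of $\mathrm{Pr}(\Omega)$. It is now trivial that, for fixed $\xi$,
$$E(\xi|\mathcal{G}^\circ) = \int \xi(\tilde\omega)\,(P|\mathcal{G}^\circ)(d\tilde\omega, \omega), \qquad \text{a.s. } (P, \mathcal{G}^\circ),$$
first for each $\xi$ in $C(\Omega)$ by uniform convergence, then for each $\xi \in b\mathcal{F}^\circ$ by monotone-class arguments, then for each $\xi$ in $\mathcal{L}^1(\Omega, \mathcal{F}^\circ, P)$ by truncation.

Now we must prove properties (iii) and (iv) under the assumption that $\mathcal{G}^\circ$ is generated by a countable sequence $G_1, G_2, \ldots$. Thus $\mathcal{G}^\circ$ is a fortiori generated by the countable $\pi$-system $\mathcal{K}$ consisting of all finite intersections of the $G_i$. Now let
$$\Omega_1 := \{\omega : (P|\mathcal{G}^\circ)(K, \omega) = I_K(\omega),\ \forall K \in \mathcal{K}\} \in \mathcal{G}^\circ,$$
where, as usual, $I_K$ denotes the indicator function of $K$. Since $\mathcal{K}$ is countable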
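and, obviously, $E(I_K|\mathcal{G}^\circ) = I_K$, a.s. $(P, \mathcal{G}^\circ)$, it follows that $P(\Omega_1) = 1$. For each $\omega \in \Omega_1$, the set of those $G \in \mathcal{G}^\circ$ on which the measures
$$G \mapsto (P|\mathcal{G}^\circ)(G, \omega), \qquad G \mapsto I_G(\omega)$$
agree is a d-system; and, since this d-system includes the $\pi$-system $\mathcal{K}$, it includes $\sigma(\mathcal{K}) = \mathcal{G}^\circ$, by Dynkin's Lemma. Hence, for $\omega \in \Omega_1$,
$$(P|\mathcal{G}^\circ)(G, \omega) = I_G(\omega), \qquad \forall G \in \mathcal{G}^\circ.$$
Now let $A(\omega)$ be the atom of $\mathcal{G}^\circ$ containing $\omega$. Then, for $\omega \in \Omega_1$, we can conclude that
$$(P|\mathcal{G}^\circ)(A(\omega), \omega) = I_{A(\omega)}(\omega) = 1$$
once we know that $A(\omega) \in \mathcal{G}^\circ$. But you have already done this exercise by showing that
$$A(\omega) = \bigcap_{n:\, \omega \in G_n} G_n \ \cap \bigcap_{n:\, \omega \notin G_n} G_n^c,$$
where $\mathcal{G}^\circ = \sigma(G_1, G_2, \ldots)$ as before and $G_n^c := \Omega \setminus G_n$. All further details are left to you. □

Proof when $\Omega$ is a Lusin space. Assume that $\Omega$ is a Lusin space. Thus $\Omega$ is in $\mathcal{B}(J)$ for some compact metrisable space $J$. Regard $P$ as extended to $(J, \mathcal{B}(J))$ in the obvious way, and apply the 'compact' result already obtained to the triple $(J, \mathcal{B}(J), P)$ with subalgebra $\tilde{\mathcal{G}}^\circ := \sigma(\mathcal{G}^\circ)$ on $J$. On the $P$-null, $\tilde{\mathcal{G}}^\circ$-measurable set of $\omega^*$ in $J$ for which $(P|\tilde{\mathcal{G}}^\circ)(\Omega, \omega^*) \ne 1$, redefine
$$(P|\tilde{\mathcal{G}}^\circ)(\cdot, \omega^*) := \nu,$$
where $\nu$ is an arbitrary but fixed element of $\mathrm{Pr}(J)$ with $\nu(\Omega) = 1$. Set
$$(P|\mathcal{G}^\circ)(F, \omega) := (P|\tilde{\mathcal{G}}^\circ)(F, \omega) \qquad (F \in \mathcal{F}^\circ,\ \omega \in \Omega),$$
and we are finished. □

See Dellacherie and Meyer [1], Stroock and Varadhan [1] and Parthasarathy [1] for other accounts of essentially the same proof. Stroock and Varadhan illuminate the relevance of tightness to the DK Theorem by relating that theorem to Theorem 89.1.

Remark. The theory of regular conditional probabilities is sometimes called la théorie de désintégration des mesures.

90. Canonical Brownian Motion CBM($\mathbb{R}^N$); Markov property of $P^x$ laws. We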
end this chapter (except for an important set of exercises) by revising our view of canonical Brownian motion in the light of the ideas we have studied. On no account skip this section. It is important for your later understanding.

We now use the basic notation:
(90.1)(i) $\Omega := C(\mathbb{R}^+, \mathbb{R}^N)$, $X_t(\omega) := X(t, \omega) := \omega(t)$ $(t \ge 0;\ \omega \in \Omega)$;
(90.1)(ii) $\mathcal{F}^\circ := \sigma(X_s : s \ge 0)$, $\mathcal{F}_t^\circ := \sigma(X_s : s \le t)$ $(t \ge 0)$;
(90.1)(iii) $p(t; x, y) := (2\pi t)^{-N/2} \exp\left(-\dfrac{|y - x|^2}{2t}\right)$ $(t > 0;\ x, y \in \mathbb{R}^N)$;
(90.1)(iv) $P(t; x, dy) := p(t; x, y)\,dy$ if $t > 0$, and $P(t; x, dy) := \varepsilon_x(dy)$ if $t = 0$.
Here $\varepsilon_x$ is the unit mass at $x$. Thus $p(t; x, y)$ is the Brownian transition-density function, and $P(t; x, B)$ is the transition function from $x$ into Borel set $B$ in time $t$.

By Wiener's Theorem, for each $x$ in $\mathbb{R}^N$, there exists a unique measure $P^x$ on $(\Omega, \mathcal{F}^\circ)$ such that for $n \in \mathbb{N}$, for $0 \le t_1 \le \cdots \le t_n$ and for $x_1, \ldots, x_n \in \mathbb{R}^N$,
$$(90.2)\qquad P^x\left(\bigcap_{i=1}^n \{X(t_i) \in dx_i\}\right) = \prod_{i=1}^n P(t_i - t_{i-1}; x_{i-1}, dx_i),$$
where $t_0 := 0$ and $x_0 := x$. Equation (90.2) makes rigorous sense when integrated over a Borel subset of $(\mathbb{R}^N)^n$. Let $E^x$ be the expectation associated with $P^x$.

It is obvious from (90.2) that, for a special cylinder set $F$, the map
(90.3) $x \mapsto P^x(F)$
is Borel-measurable on $\mathbb{R}^N$. The class of all $F \in \mathcal{F}^\circ$ for which (90.3) holds is clearly a d-system; and, since it contains the $\pi$-system of special cylinders, we see that
(90.4) $x \mapsto P^x(F)$ is $\mathcal{B}(\mathbb{R}^N)$-measurable for every $F \in \mathcal{F}^\circ$.

Space shifts $(\sigma_x : x \in \mathbb{R}^N)$. For $x \in \mathbb{R}^N$ and $\omega \in \Omega$, define
(90.5) $\sigma_x : \Omega \to \Omega$, $(\sigma_x \omega)(t) := \omega(t) + x$ $(t \ge 0)$.
Then $\sigma_x$ is a continuous map from $\Omega$ to $\Omega$ if we use the usual topology of uniform convergence on compacts. Moreover,
$$P^x = P^0 \circ \sigma_x^{-1}, \qquad E^x \xi = E^0(\xi \circ \sigma_x) \quad (\xi \in b\mathcal{F}^\circ).$$
It is now clear that
(90.6) the map $x \mapsto P^x$ is continuous from $\mathbb{R}^N$ to $\mathrm{Pr}(\Omega)$ with its weak, $C_b(\Omega)$, topology.

Towards the Markov property. There are two equivalent ways of formulating the conditional independence of past and future given the present, which is the
Markov property:
(90.7) P(past and future | present) = P(past | present) P(future | present);
(90.8) P(future | past and present) = P(future | present).
See Exercise E60.41. We are going to concentrate on the latter formulation, but make it more precise by using the language of regular conditional probabilities. We also build the time-homogeneity property into our formulation.

Time shifts $(\theta_t : t \ge 0)$. For $t \ge 0$, define the map
(90.9) $\theta_t : \Omega \to \Omega$, $(\theta_t \omega)(u) := \omega(t + u)$ $(u \ge 0)$.
If $\Lambda$ denotes an event then $\theta_t^{-1}(\Lambda)$ denotes that event shifted through time $t$: thus if $\Lambda = \{X_h \in B\}$ then $\theta_t^{-1}(\Lambda) = \{X_{t+h} \in B\}$. If $\eta$ is a function on $\Omega$, we write $\theta_t \eta$ for $\eta \circ \theta_t$:
(90.10) $\theta_t \eta(\omega) := (\theta_t \eta)(\omega) := \eta(\theta_t \omega)$.

Note that, since $\Omega$ is a Polish space and $\mathcal{F}^\circ = \mathcal{B}(\Omega)$, and since also $\mathcal{F}_t^\circ$ is countably generated (why?), a regular conditional probability $(P^x|\mathcal{F}_t^\circ)$ must exist for $x \in \mathbb{R}^N$. One of the most intuitive statements of the time-homogeneous Markov property is that $(P^x|\mathcal{F}_t^\circ) \circ \theta_t^{-1}$ is (indistinguishable from) $P^{X(t)}$. This is part of our next theorem. Because we are considering CBM($\mathbb{R}^N$) already equipped with all its $P^x$ laws, we do not need here the existence theorem on regular conditional probabilities. However, we shall need that theorem later, in the Stroock-Varadhan theory of martingale problems. Roughly speaking, we shall need to say there that CBM($\mathbb{R}^N$) with just one given law (say the $P^0$ law) can sense the family of $P^x$ laws.

(90.11) THEOREM (Markov properties of CBM($\mathbb{R}^N$)). The following results hold.
(i) For every $x \in \mathbb{R}^N$ and every $t \ge 0$, we have (modulo indistinguishability)
(90.12) $(P^x|\mathcal{F}_t^\circ) \circ \theta_t^{-1} = P^{X(t, \omega)}$ on $\mathcal{F}^\circ$.
(ii) For $x \in \mathbb{R}^N$ and $t \ge 0$, and for $\xi \in b\mathcal{F}_t^\circ$ and $\eta \in b\mathcal{F}^\circ$, we have
(90.13) $E^x[\xi\, \theta_t \eta] = E^x[\xi\, E^{X(t)}(\eta)]$.
The same result holds if $\xi \in (m\mathcal{F}_t^\circ)^+$ and $\eta \in (m\mathcal{F}^\circ)^+$.

Discussion and proof. Both parts of the theorem describe the way in which the Markov property knits together the various $P^x$ laws. The formula (90.13) is the most useful statement of the Markov property for doing calculations.
You will soon acquire fluency in its use. At first sight, (90.13) looks rather complicated; and, even though it will look more complicated when written out in full, it is
worth expanding its statement for clarity. It says that
$$\int_{\omega \in \Omega} \xi(\omega)\,\eta(\theta_t \omega)\,P^x(d\omega) = \int_{\omega \in \Omega} \xi(\omega) \left( \int_{\tilde\omega \in \Omega} \eta(\tilde\omega)\,P^{X(t,\omega)}(d\tilde\omega) \right) P^x(d\omega).$$
A number of measurability and other questions will have occurred to you in connection with the statement of the theorem. Indeed, once one is clear what the theorem means, its proof is almost obvious!

The result (90.4) extends by monotone-class arguments to yield the fact that
(90.14) for $\eta \in b\mathcal{F}^\circ$, the map $x \mapsto E^x \eta$ is $\mathcal{B}(\mathbb{R}^N)$-measurable.
We know that $\omega \mapsto X(t, \omega)$ is $\mathcal{F}_t^\circ$-measurable, so that
(90.15) for $\eta \in b\mathcal{F}^\circ$, the map $\omega \mapsto E^{X(t,\omega)}(\eta)$ is $\mathcal{F}_t^\circ$-measurable.
Because $F \mapsto P^{X(t)}(F)$ is already a measure on $\mathcal{F}^\circ$ and because $\mathcal{F}^\circ$ is countably generated, to prove part (i), we need only prove that, for fixed $F \in \mathcal{F}^\circ$,
(90.16) $P^x(\theta_t^{-1} F \mid \mathcal{F}_t^\circ) = P^{X(t)}(F)$, a.s. $(P^x, \mathcal{F}_t^\circ)$,
a statement about ordinary (as opposed to regular) conditional probabilities. To prove (90.16), we must show that, for $G \in \mathcal{F}_t^\circ$, we have
(90.17) $P^x(G \cap \theta_t^{-1} F) = E^x(P^{X(t)}(F);\ G)$.
It is enough to prove this when $F$ and $G$ are special cylinders. To avoid lots of integrals, we assume formally that
$$G = \{X(t) \in dy\} \cap \bigcap_{i=1}^{j} \{X(s_i) \in dx_i\}, \qquad F = \bigcap_{k=1}^{m} \{X(u_k) \in dz_k\}.$$
Then
$$P^{X(t,\omega)}(F) = \prod_{k=1}^{m} P(u_k - u_{k-1}; z_{k-1}, dz_k) \qquad (u_0 := 0,\ z_0 := X(t, \omega))$$
and, with $s_0 := 0$ and $x_0 := x$ as in (90.2),
$$E^x(P^{X(t)}(F);\ G) = \left\{ \prod_{i=1}^{j} P(s_i - s_{i-1}; x_{i-1}, dx_i) \right\} P(t - s_j; x_j, dy)\,P^y(F),$$
and (check!) this agrees with the left-hand side of (90.17).

When $\xi = I_G$ and $\eta = I_F$, (90.13) reduces to (90.17). Monotone-class arguments round everything off. □

91. Exercises. In these exercises, we use the probabilistic notation and terminology explained in Section 87. For numerous exercises on weak convergence of measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, see [W].

(E91.82) Prove that if $S$ is a $G_\delta$ subset (countable intersection of open sets) of
a compact metric space $J$, then $S$ is Polish. (Hint. Begin with the case when $S$ is an open subset.) Deduce that the set $\mathbb{R} \setminus \mathbb{Q}$ of irrational numbers, with its usual topology, is Polish.

(E91.83a) The space $\mathrm{Pr}(S)$ shares many of the properties of $S$. We know that if $J$ is compact metrisable then $\mathrm{Pr}(J)$ is compact metrizable. Show that if $G$ is open in $J$ and $\eta \in \mathbb{R}$, then $\{\mu \in \mathrm{Pr}(J) : \mu(G) > \eta\}$ is open in $\mathrm{Pr}(J)$. Deduce that if $S$ is a $G_\delta$ subset of $J$ then $\{\mu \in \mathrm{Pr}(J) : \mu(S) = 1\}$ is a $G_\delta$ in $\mathrm{Pr}(J)$: in other words, if $S$ is Polish then $\mathrm{Pr}(S)$ is Polish. Use the Monotone-Class Theorem to prove that the set of $f$ in $b\mathcal{B}(J)$ for which $\mu \mapsto \mu(f)$ is $\mathcal{B}(\mathrm{Pr}(J))$ measurable coincides with $b\mathcal{B}(J)$. Hence, if $S$ is a Borel subset of $J$ then $\{\mu \in \mathrm{Pr}(J) : \mu(S) = 1\}$ is Borel in $\mathrm{Pr}(J)$: in other words, if $S$ is a Lusin space, then so is $\mathrm{Pr}(S)$. It should be noted that if $S$ is Polish then there is a natural metric, the Prohorov metric, under which $\mathrm{Pr}(S)$ is complete and separable. See, for example, Ethier and Kurtz [1].

(E91.83b) Weak convergence of empirical distributions. Let $X_1, X_2, \ldots$ be independent identically distributed real-valued random variables carried by $(\Omega, \mathcal{F}, P)$. Let $\mu$ be the common law of the $X_k$. For $\omega \in \Omega$, let $\mu_n(\omega)$ be the empirical distribution
$$\mu_n(\omega)(B) := n^{-1}\,\#\{k \le n : X_k(\omega) \in B\} \qquad (B \in \mathcal{B}).$$
Prove that $\{\omega : \mu_n(\omega) \Rightarrow \mu\} \in \mathcal{F}$ and $P(\mu_n \Rightarrow \mu) = 1$.

(E91.83c) Let $S = C([0,1]; \mathbb{R})$. For $\lambda < 1$ and $w \in S$, define
$$\{\tau(\lambda) w\}(t) := \lambda^{-1} w(\lambda^2 t), \qquad t \in [0,1].$$
Thus $\tau(\lambda) : S \to S$. Fix $c < 1$, and, for $w \in S$, define a probability measure $\mu_n(w)$ on $(S, \mathcal{B}(S))$ via
$$\mu_n(w) := n^{-1} \sum_{k=1}^{n} \varepsilon_{\tau(c)^k w}.$$
Let $P$ be the Wiener measure on $(S, \mathcal{B}(S))$. We know that $\tau(c)$ preserves $P$:
$$P \circ \tau(c)^{-1} = P \quad \text{on } (S, \mathcal{B}(S)).$$
Prove that if $f$ is a bounded measurable function from $(S, \mathcal{B}(S))$ to $\mathbb{R}$ such that $f(\tau(c) w) = f(w)$ for every $w$ then $f$ is constant a.s. $(P)$. It now follows from the Ergodic Theorem that $\mu_n(w)(f) \to E(f)$, a.s. It therefore follows (why?) that $P(\mu_n \Rightarrow P) = 1$.

(E91.86) The Continuous-Mapping Principle (84.2) was used in the proof of the Skorokhod Representation Theorem in Section 86.
Even though it is a circular argument, convince yourself that the Skorokhod result greatly illuminates Lemma 84.2. Now use the Skorokhod Representation Theorem to prove the following
result, which we need in Section V.23 of Volume 2 and which is often used in the literature. Let $\{g_k : k \ge 0\}$ be a uniformly bounded sequence of functions on a Lusin space $S$, and suppose that $\mu_k \in \mathrm{Pr}(S)$ and $\mu_k \Rightarrow \mu$. If $\{g_k : k \ge 0\}$ is equicontinuous at each point of $S$ and $g_k \to g$ pointwise then
$$\int g_k\,d\mu_k \to \int g\,d\mu.$$

(E91.87) The purpose of this exercise is to give an example in which we have convergence of finite-dimensional distributions without convergence in law. (Clearly, tightness must fail in such a case.) 'Almost anything will do', but we give a very concrete example. Let $h$ be the 'tent function'
$$h(x) := \begin{cases} 1 - |x| & \text{if } |x| \le 1, \\ 0 & \text{if } |x| > 1. \end{cases}$$
Let $U$ be a random variable carried by some $(\Omega, \mathcal{F}, P)$ and uniformly distributed on $[\tfrac13, \tfrac23]$. For $\omega \in \Omega$ and $t \in [0,1]$, define for $n \in \mathbb{N}$,
$$X_n(t, \omega) := h(3n\{t - U(\omega)\}), \qquad X(t, \omega) := 0.$$
Regard $X_n$ and $X$ as $C[0,1]$-valued random variables. Prove that, for every $t$, $X_n(t) \to X(t)$ almost surely, so that the finite-dimensional distributions of $X_n$ converge to those of $X$. Prove that $X_n$ does not converge in law to $X$.

(E91.90) Let $X$ be CBM($\mathbb{R}^N$) under the $P^0$ measure. Prove that a regular conditional probability of $P^0$ given $\sigma(X(1))$ will agree with $Q$ on $\mathcal{F}_1^\circ$, where $Q$ is the law of the Brownian bridge from 0 to $X(1)$ in time 1.

Some hints

(H91.82) For an open subset $G$ of $J$ define
$$\rho_G(x, y) := \rho(x, y) + \left| \frac{1}{\rho(x, G^c)} - \frac{1}{\rho(y, G^c)} \right|,$$
where $G^c := J \setminus G$. For $S = \bigcap_n G(n)$, define
$$\rho_S(x, y) := \sum_n 2^{-n}\, \frac{\rho_{G(n)}(x, y)}{1 + \rho_{G(n)}(x, y)}.$$

(H91.83b) Think of $\mathbb{R}$ as an open subset of $J := [-\infty, \infty]$. Choose a countable
dense subset $\{f_r\}$ of $C(J)$. For each $r$,
$$\frac{1}{n} \sum_{k=1}^{n} f_r(X_k) \to \mu(f_r), \qquad \text{a.s.},$$
by the Strong Law.

(H91.83c) If $f(w) = f(\tau(c) w)$ for every $w$ then, for every $n$, $f(w) = f(\tau(c^n) w)$, so that $f \in m\sigma\{w(s) : s \le c^n\}$. But (see part (ii) of Theorem 68.2) $\bigcap_n \sigma\{w(s) : s \le c^n\}$ is $P$-trivial, so $f$ is a.s. constant.
CHAPTER III

Markov Processes

As explained in the Preface to this edition, this chapter remains very much as it was originally: the only significant difference is that the functional analysis, in Hille-Yosida theory and in Ray's Theorem, is given much fuller treatment. There is a sense, therefore, in which the chapter is caught in a 'time warp'; but we hope and trust that it will still serve as a useful introduction to Markov processes. The Preface advised further reading.

The table of contents is a good guide to what this chapter contains.

1. TRANSITION FUNCTIONS AND RESOLVENTS

1. What is a (continuous-time) Markov process? The first two sections are intended to explain and motivate something of what follows. Details may be left vague, at least for the time being!

Informally, a Markov process models the motion of a particle that moves around in a measurable space $(E, \mathcal{E})$ in a memoryless way. As carrier set-up, we need a filtered space $(\Omega, \{\mathcal{F}_t\})$, on which, for every $t \ge 0$, an $\mathcal{F}_t$-measurable random variable $X_t$, which gives the position of our particle at time $t$, is defined. It proves necessary to introduce a probability law $P^x$ for each point $x$ in $E$. (In the Feller-Dynkin context that we meet first, $P^x$ denotes the law of the process when it starts at $x$. In the Ray context, this idea has to be modified somewhat.) The Markov property ties together the various $P^x$ laws. It is all very similar to what we have just studied for the Brownian case in Section 90 of Chapter II.

The following definition gives the idea of a Markov process and a precise definition of a transition function. The '$P_t(x, E) \le 1$' condition allows for the possible death of our particle.

(1.1) DEFINITION (Markov process, transition function). A Markov process
$$X = (\Omega, \{\mathcal{F}_t : t \ge 0\}, \{X_t : t \ge 0\}, \{P_t : t \ge 0\}, \{P^x : x \in E\})$$
with state-space $(E, \mathcal{E})$ is an $E$-valued stochastic process adapted to $\{\mathcal{F}_t\}$ such that, for $s, t \ge 0$, $f \in b\mathcal{E}$ and $x \in E$,
(1.2) $E^x[f(X_{s+t}) \mid \mathcal{F}_s] = (P_t f)(X_s)$, $P^x$ a.s.,
where $\{P_t\}$ is a transition function on $(E, \mathcal{E})$, a family of kernels $P_t : E \times \mathcal{E} \to [0,1]$ such that
(1.3)(i) for $t \ge 0$ and $x \in E$, $P_t(x, \cdot)$ is a measure on $\mathcal{E}$ with $P_t(x, E) \le 1$;
(1.3)(ii) for $t \ge 0$ and $\Gamma \in \mathcal{E}$, $P_t(\cdot, \Gamma)$ is $\mathcal{E}$-measurable;
(1.3)(iii) for $s, t \ge 0$, $x \in E$ and $\Gamma \in \mathcal{E}$,
$$P_{s+t}(x, \Gamma) = \int_E P_s(x, dy)\,P_t(y, \Gamma).$$
Equation (1.3)(iii) is called the Chapman-Kolmogorov equation.

We can equally think of the transition function as inducing/being a family $\{P_t\}$ of positive bounded operators of norm less than or equal to 1 on $b\mathcal{E}$, with
$$P_t f(x) := (P_t f)(x) := \int P_t(x, dy)\,f(y),$$
in which case the Chapman-Kolmogorov equation becomes the semigroup property
$$P_s P_t = P_{s+t} \qquad (s, t \ge 0).$$

Much of the interest of the subject arises from the interplay between the analysis of transition functions and the sample-path description of the Markov process. The starting point is sometimes the one, sometimes the other. For example, a diffusion $X$ is often given to us 'pathwise' as the solution of some stochastic differential equation; and, except in a few trivial cases, there will be no closed-form expression for the transition function $\{P_t\}$ of $X$. (Often, there will be no need to know about $\{P_t\}$.) On the other hand, in Markov-chain theory, we begin with a semigroup satisfying some minimal regularity properties; this time, it is the process that is hard to get hold of, and indeed the existence and properties of this process will need deep and difficult results, which it is the aim of this chapter to develop.

The most important results of the chapter say that if a transition semigroup $\{P_t\}$ is given then, under certain mild regularity conditions, there will exist on some probability space a Markov process $X$ with R-paths (in $E$ or perhaps in a suitably-topologised compactification of $E$) and such that the strong Markov property holds:
$$E^x[f(X_{S+t}) \mid \mathcal{F}_S] = (P_t f)(X_S), \qquad P^x \text{ a.s.},$$
whenever $S$ is a finite stopping time. The chapter will explain numerous methods (time-substitution, Feynman-Kac formula etc.)
that may be applied to this good version X. 2. The finite-state-space Markov chain. To illustrate certain concepts, we start with the simplest possible case. A Markov process whose state-space (before compactification!) is a countable set is called a Markov chain. (But beware: some
authors use 'chain' to signify 'with discrete time parameter'.) To keep things really simple, we shall in this section assume that $E$ is a finite set, and we shall use the notations
$$p_{ij}(t) := P_t(i, \{j\}), \qquad P(t) := \{p_{ij}(t) : i, j \in E\}.$$
We shall assume that $\{P_t\}$ is honest in that $P_t(i, E) = 1$ for all $i$ and $t$. A transition semigroup is now a semigroup of $E \times E$ matrices.

That some regularity is necessary is obvious if we consider the case when $E$ has two points and
$$P(0) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad P(t) = \begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{pmatrix} \quad (t > 0);$$
then no chain $X$ with transition function $\{P_t\}$ can have a right-continuous version. Some basic regularity assumption is needed: we consider only semigroups that are standard in that
(2.1) $p_{ij}(t) \to \delta_{ij}$ $(t \downarrow 0)$.
It can be shown (see later in this chapter) that the condition (2.1) implies that
(2.2) $p'_{ij}(0) =: q_{ij}$ exists for $i, j \in E$.
The matrix $Q := \{q_{ij} : i, j \in E\}$ is the Q-matrix or infinitesimal generator of $\{P_t\}$, and has the properties
(2.3) $q_{ij} \ge 0$ $(i \ne j)$, $\sum_k q_{ik} = 0$ $(i \in E)$.

Starting from the transition function, how would we construct an associated Markov process/chain $X$? One remark is that, for an initial distribution $\mu$ of $X_0$, the finite-dimensional distributions of $X$ are determined via
$$(2.4)\qquad E^\mu \prod_{i=0}^{n} f_i(X_{t_i}) = \mu f_0 P_{s_1} f_1 P_{s_2} f_2 \cdots P_{s_n} f_n,$$
where $0 = t_0 \le t_1 \le \cdots \le t_n$ and $s_j := t_j - t_{j-1}$. It is not hard to check (using the semigroup property) that the finite-dimensional distributions are consistent in the sense of the Daniell-Kolmogorov Theorem, whence, by that theorem (II.31.1), there exists a process $X$ with the required law. But can we suppose that $X$ has R-paths and the strong Markov property? If not, we can do nothing with $X$. The italicised question is not trivial; of course, that the answer is 'Yes' follows from theory developed below.

Could we start from the process? We could indeed. Let $q_i := -q_{ii}$, as usual.
The well-known 'jump-hold' construction of a Markov chain starts with a discrete time chain $\{Y(n)\}$ with $Y(0)$ having initial distribution $\mu$ and transition matrix $J$, where
$$J_{ij} := \begin{cases} q_{ij}/q_i & \text{if } i \ne j, \\ 0 & \text{if } i = j. \end{cases}$$
(We assume $q_i \ne 0$ for each $i$.) Let $\{V_r : r \in \mathbb{Z}^+\}$ be a family of independent positive variables each exponentially distributed with rate 1. Set
$$H_n := \sum_{r=0}^{n} q(Y_r)^{-1} V_r, \qquad \tau_t := \inf\{n : H_n > t\}, \qquad X_t := Y(\tau_t).$$
Then $X$ has R-paths (we use the discrete topology on $E$). Moreover, it is true that $X$ is Markovian with Q-matrix $Q$; but how do we prove the Markov property, that the Q-matrix really is $Q$, etc.? Can we prove that $X$ has the strong Markov property? The reader who tries to answer these questions will see that there are fundamental and non-trivial issues even in this simple example; we need an adequate theory of Markov processes to take us clear of these fundamental (but not very exciting) questions.

Before abandoning this example, we can learn from it important elements of the structure of a general Markov process. Since $P'(0)$ exists and is equal to $Q$, it follows that, for $t \ge 0$,
$$(2.5)\qquad P'(t) = \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}[P(t + \varepsilon) - P(t)] = \lim_{\varepsilon \downarrow 0} \varepsilon^{-1} P(t)\{P(\varepsilon) - I\} = P(t) Q;$$
and, solving this equation, we find that
(2.6) $P(t) = \exp(tQ)$;
the semigroup is here generated by its generator $Q$ in a very simple way. Next, we define the resolvent $\{R_\lambda : \lambda > 0\}$ of the semigroup by
$$(2.7)\qquad R_\lambda := \int_0^\infty e^{-\lambda t} P_t\,dt.$$
This is just the (componentwise) Laplace transform of the semigroup; or we may regard it (more helpfully) as follows:
$$(2.8)\qquad (\lambda R_\lambda)_{ij} = \int_0^\infty \lambda e^{-\lambda t} p_{ij}(t)\,dt = P(X_T = j \mid X_0 = i),$$
where $T$ is a random variable independent of $X$ with the exponential distribution of rate $\lambda$. In view of (2.6), it is immediate that
(2.9) $R_\lambda = (\lambda - Q)^{-1}$,
and various simple algebraic properties follow immediately, most notably the resolvent equation
(2.10) $R_\lambda - R_\mu = (\mu - \lambda) R_\lambda R_\mu$ $(\lambda, \mu > 0)$.

The structure we have just described for a finite-state-space Markov chain is indeed, when suitably interpreted, a feature of all Feller-Dynkin processes: the (infinitesimal) generator of a semigroup is just its derivative at 0, the resolvent
is given by (2.9), and the semigroup is then found by inverting the Laplace transform as in (2.7). Sense can be made of the exponential formula (2.6). The resolvent equation holds in complete generality; and we shall see that resolvents are 'smoother', and in many ways more fundamental, than semigroups. Technical problems arise in the general situation because the derivative with respect to $t$ of $P_t f$ does not exist for all $f$, so the generator is defined only on a subspace of $b\mathcal{E}$, .... The Hille-Yosida Theorem and Ray's marvellous extension of it will allow us to cope with these analytical problems. Once we have the semigroup, and hence a crude Daniell-Kolmogorov version of our desired process, we find that there are always enough supermartingales around to allow us to 'smooth' our process by Doob's regularisation theorems.

3. Transition functions and their resolvents. Let $\{P_t\}$ be a transition function on $(E, \mathcal{E})$. We shall say that $\{P_t\}$ is honest ('conservative' and 'strictly Markovian' are often used) if
$$P_t(x, E) = 1, \qquad \forall t, x.$$
The possibility $P_t(x, E) < 1$ must be allowed in the theory for reasons that will become clear later. The intuitive significance is that $1 - P_t(x, E)$ represents the probability that our Markov particle has 'died' before or at time $t$.

It is convenient (even when $\{P_t\}$ is honest, for we need licence to kill an honest process!) to adjoin a coffin state $\partial$ to $E$ producing an extended state-space $E_\partial := E \cup \partial$. Let $\mathcal{E}_\partial := \sigma(\mathcal{E}, \partial)$ be the smallest $\sigma$-algebra on $E_\partial$ extending $\mathcal{E}$ and containing $\{\partial\}$. The transition function $\{P_t\}$ extends to an honest transition function $\{P_t^{+\partial}\}$ on $(E_\partial, \mathcal{E}_\partial)$ in the obvious way: for $t \ge 0$,
(3.1)(i) $P_t^{+\partial}(x, \partial) := 1 - P_t(x, E)$ $(x \in E,\ t \ge 0)$,
(3.1)(ii) $P_t^{+\partial}(\partial, \cdot) := \varepsilon_\partial$, the unit mass at $\partial$,
(3.1)(iii) $P_t^{+\partial}(\cdot, \cdot) := P_t(\cdot, \cdot)$ on $E \times \mathcal{E}$.

It is profitable to think about our (unextended) transition function $\{P_t\}$ on $(E, \mathcal{E})$ in another way.
It is easy to see that the equation

$P_t f(x) = \int P_t(x, dy) f(y)$

sets up a one-to-one correspondence between transition functions on $(E, \mathcal{E})$ and sub-Markov semigroups on $b\mathcal{E}$. A (one-parameter) sub-Markov semigroup $\{P_t : t \ge 0\}$ on $b\mathcal{E}$ is a family of bounded linear operators on $b\mathcal{E}$ such that

(3.2)(i) $P_t : b\mathcal{E} \to b\mathcal{E}$,
(3.2)(ii) $0 \le f \le 1 \Rightarrow 0 \le P_t f \le 1$,
(3.2)(iii) $P_{s+t} = P_s P_t$,
(3.2)(iv) $f_n \downarrow 0 \Rightarrow P_t f_n \downarrow 0, \quad \forall t$.

In this formulation, $\{P_t\}$ is honest if and only if $P_t 1 = 1$, $\forall t$.

(3.3) 'Normal' transition functions. In all cases of interest, $\mathcal{E}$ contains all singleton sets $\{x\}$. We say that our transition function $\{P_t\}$ is 'normal' if $P_0(x, \cdot) = \varepsilon_x(\cdot)$, the unit mass at $x$, for all $x$ in $E$. In semigroup language, this means that $P_0 = I$, the identity map on $b\mathcal{E}$.

(3.4) Remark. Note that (3.2)(iii) implies that $P_0^2 = P_0$ for every transition function $\{P_t\}$. In the theory of Feller processes, which we develop first, we shall have the 'normal' situation: $P_0 = I$. In the theory of Ray processes, the condition $P_0 = I$ can fail.

(3.5) Example. We have already met the Brownian transition function on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ defined as follows:

$P_t(x, \Gamma) := \int_\Gamma p(t; x, y)\,dy \quad (t > 0;\ x \in \mathbb{R}^n;\ \Gamma \in \mathcal{B}(\mathbb{R}^n)),$

where $p$ is the Brownian transition-density function:

$p(t; x, y) := (2\pi t)^{-n/2} \exp(-|y - x|^2 / 2t), \qquad P_0(x, \cdot) := \varepsilon_x(\cdot).$

(3.6) Example. Let $I$ be a countable set and let $\mathcal{I}$ be the set of all subsets of $I$. Let $\{p_{ij}(t) : t \ge 0;\ i, j \in I\}$ be a transition matrix function on $I$, so that the following three conditions are satisfied:

(3.7)(i) $p_{ij}(t) \ge 0, \quad \forall i, j, t$,
(3.7)(ii) $\sum_j p_{ij}(t) \le 1, \quad \forall i, t$,
(3.7)(iii) $p_{ij}(s + t) = \sum_{k \in I} p_{ik}(s) p_{kj}(t), \quad \forall i, j, s, t$.

Then

$P_t(i, J) := \sum_{j \in J} p_{ij}(t) \quad (t \ge 0;\ i \in I;\ J \in \mathcal{I})$

defines a transition function on $(I, \mathcal{I})$, and all such transition functions arise in this way. The only transition functions on countable sets of any interest are those that satisfy the continuity requirement:

(3.7)(iv) $\lim_{t \downarrow 0} p_{ij}(t) = p_{ij}(0) = \delta_{ij}$.
These are what Chung [1] calls 'standard' transition functions. We are going to take the view that for transition functions on countable sets, (3.7)(iv) comes as part of the definition. It is interesting that the full weight of Ray theory (or something like it) is needed to handle Markov chains. As we shall see, the true state-space of the Markov process is generally much larger than $I$. (We reserve the symbol $E$ for that space.)

(3.8) Resolvents. Suppose now that $\{P_t\}$ is a measurable transition function on the measurable space $(E, \mathcal{E})$, so that, in addition to the conditions (1.3), we have the measurability requirement:

(3.9) $\forall \Gamma \in \mathcal{E}$, the map $(x, t) \mapsto P_t(x, \Gamma)$ is $(\mathcal{E} \times \mathcal{B}[0, \infty))$-measurable from $E \times [0, \infty)$ to $\mathbb{R}$.

For $\lambda > 0$, we can then define a map $R_\lambda : b\mathcal{E} \to b\mathcal{E}$ as follows: for $x \in E$,

(3.10) $R_\lambda f(x) := \int_{[0,\infty)} e^{-\lambda t} P_t f(x)\,dt = \int_E R_\lambda(x, dy) f(y),$

where

$R_\lambda(x, \Gamma) := \int_{[0,\infty)} e^{-\lambda t} P_t(x, \Gamma)\,dt.$

(Trivial applications of monotone-class theorems and of Fubini's theorem are now made without comment.) Then $\{R_\lambda : \lambda > 0\}$ is a sub-Markovian resolvent on $b\mathcal{E}$:

(3.11)(i) $R_\lambda : b\mathcal{E} \to b\mathcal{E}$;
(3.11)(ii) $0 \le f \le 1 \Rightarrow 0 \le \lambda R_\lambda f \le 1$;
(3.11)(iii) the resolvent equation holds: $R_\lambda - R_\mu + (\lambda - \mu) R_\lambda R_\mu = 0$;
(3.11)(iv) $f_n \downarrow 0 \Rightarrow R_\lambda f_n \downarrow 0 \ (\forall \lambda)$.

Of course we have the following characterisation of the honest (or strictly Markovian) situation:

$P_t 1 = 1,\ \forall t \iff \lambda R_\lambda 1 = 1,\ \forall \lambda.$

Terminology. $R_\lambda$ is often called the λ-potential operator associated with $\{P_t\}$. We shall call $R_\lambda$ the λ-resolvent of $\{P_t\}$.

(3.12) Note. If $S$ and $T$ are independent $[0, \infty)$-valued random variables with the exponential distributions of rate $\lambda$ and $\mu$ respectively, then, for $\lambda \neq \mu$,

$E P_S f(x) = \lambda R_\lambda f(x), \qquad P(S + T \in du) = \lambda\mu\,\frac{e^{-\lambda u} - e^{-\mu u}}{\mu - \lambda}\,du.$
This gives a probabilistic interpretation of the resolvent equation since

$E(P_{S+T} f) = E\,E(P_S P_T f \mid S) = E P_S\,\mu R_\mu f = \lambda\mu R_\lambda R_\mu f.$

(3.13) Exercise. Prove that for BM($\mathbb{R}$), and for $\lambda > 0$,

$R_\lambda f(x) = \int_{\mathbb{R}} r_\lambda(x, y) f(y)\,dy,$

where $r_\lambda(x, y) = \gamma^{-1} \exp[-\gamma |y - x|]$, $\gamma$ denoting $(2\lambda)^{1/2}$.

(Hint. For $x > 0$ and $\gamma > 0$,

$I := \int_0^\infty (2\pi t)^{-1/2} \exp[-\tfrac12 \gamma^2 t - \tfrac12 x^2 t^{-1}]\,dt = 2(2\pi\gamma)^{-1/2} x^{1/2} e^{-\gamma x} \int_0^\infty \exp[-\tfrac12 \gamma x (s - s^{-1})^2]\,ds,$

by putting $t = x s^2 / \gamma$. But now the map $s \mapsto u(s) := s - s^{-1}$ maps $(0, \infty)$ one-one onto $(-\infty, \infty)$ and the inverse map $u \mapsto s(u)$ satisfies $s(u) = u + s(-u)$, whence $s'(u) + s'(-u) = 1$. Hence

$I = 2(2\pi\gamma)^{-1/2} x^{1/2} e^{-\gamma x} \int_0^\infty \exp[-\tfrac12 \gamma x u^2]\,du = \gamma^{-1} e^{-\gamma x}.)$

4. Contraction semigroups on Banach spaces. In many respects, resolvents are more fundamental than transition functions. In the theory of Ray processes, we shall construct transition functions from resolvents. The starting-point for this construction will be the Hille-Yosida theorem on contraction semigroups of operators on Banach spaces. That theorem is proved in Section 5. Here we introduce contraction semigroups and their resolvents and infinitesimal generators, and explain how these concepts are related. The formal equations of the theory, namely

$P_s P_t = P_{s+t}, \qquad P_t = \exp(t\mathcal{G}), \qquad P_0' = \mathcal{G}, \qquad R_\lambda = \int_0^\infty e^{-\lambda t} P_t\,dt = (\lambda - \mathcal{G})^{-1},$

are known to us, but we must now translate them into rigorous mathematics. (But see Note (4.16) below.)

(4.1) DEFINITION (strongly continuous contraction semigroup (SCCSG)). Let $B_0$ be a Banach space. A family $\{P_t : t \ge 0\}$ of bounded linear operators $P_t : B_0 \to B_0$ is called a (one-parameter) strongly continuous contraction semigroup (SCCSG) if the following conditions hold:

(4.2)(i) for $f \in B_0$, $\|P_t f - f\| \to 0$ as $t \downarrow 0$ (strong continuity);
(4.2)(ii) $\|P_t\| \le 1$ for $t \ge 0$ (contraction property);
(4.2)(iii) $P_s P_t = P_{s+t}$ for $s, t \ge 0$ (semigroup property).

(By 'strongly continuous', we therefore mean 'of class $C_0$' in the terminology of Hille and Phillips [1].)

Suppose that $\{P_t : t \ge 0\}$ is an SCCSG on $B_0$. Then, for $t \ge s \ge 0$, and $f \in B_0$,

$\|P_t f - P_s f\| = \|P_s(P_{t-s} f - f)\| \le \|P_{t-s} f - f\|,$

from which it follows that the map $t \mapsto P_t f$ is continuous from $[0, \infty)$ into $B_0$. We may therefore define the resolvent $\{R_\lambda : \lambda > 0\}$ of $\{P_t : t \ge 0\}$ via

(4.3) $R_\lambda f := \int_0^\infty e^{-\lambda t} P_t f\,dt,$

the integral being the limit (in the (strong) topology of $B_0$) of approximating Riemann sums.

(4.4) DEFINITION (contraction resolvent). Let $B$ be a Banach space, and let $\{R_\lambda : \lambda > 0\}$ be a family of bounded linear operators $R_\lambda : B \to B$. We call $\{R_\lambda : \lambda > 0\}$ a contraction resolvent if

(4.5)(i) $\|\lambda R_\lambda\| \le 1$ for $\lambda > 0$, and
(4.5)(ii) the resolvent equation holds: $R_\lambda - R_\mu + (\lambda - \mu) R_\lambda R_\mu = 0 \quad (\lambda, \mu > 0)$.

We already know how to prove that if $\{P_t : t \ge 0\}$ is an SCCSG on $B_0$, then its resolvent defined in (4.3) is a contraction resolvent on $B_0$. But since then

$\lambda R_\lambda f - f = \int_0^\infty \lambda e^{-\lambda t} (P_t f - f)\,dt,$

it is clear that the resolvent of an SCCSG is a strongly continuous contraction resolvent (SCCR) in the sense of the following definition.

(4.6) DEFINITION (strongly continuous contraction resolvent (SCCR)). By a strongly continuous contraction resolvent (SCCR) on a Banach space $B_0$, we mean a contraction resolvent $\{R_\lambda : \lambda > 0\}$ on $B_0$ with the additional property

(4.7) $\|\lambda R_\lambda f - f\| \to 0$ as $\lambda \to \infty$ (strong continuity).

The Hille-Yosida theorem gives the 'Tauberian' converse to the 'Abelian' result we have just seen by showing that

(4.8) an SCCR is the resolvent of an SCCSG.
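The 'Abelian' direction just described can be checked by hand in the simplest possible case. The sketch below assumes a two-state chain with Q-matrix $Q = [[-a, a], [b, -b]]$, for which $\exp(tQ)$ and $(\lambda - Q)^{-1}$ have elementary closed forms; the rates, the helper names and the Simpson quadrature are illustrative choices, not anything prescribed by the text. It compares the Laplace transform (4.3) with $(\lambda - Q)^{-1}$, evaluates the residual of the resolvent equation (4.5)(ii), and looks at $\lambda R_\lambda - I$ for large $\lambda$ as in (4.7).

```python
import math

a, b = 1.0, 2.0   # illustrative jump rates for the assumed two-state chain
c = a + b


def P(t):
    """Closed form of exp(tQ) for Q = [[-a, a], [b, -b]]."""
    e = math.exp(-c * t)
    return [[(b + a * e) / c, (a - a * e) / c],
            [(b - b * e) / c, (a + b * e) / c]]


def R(lam):
    """Closed form of (lam*I - Q)^{-1}; the determinant is lam*(lam + c)."""
    det = lam * (lam + c)
    return [[(lam + b) / det, a / det],
            [b / det, (lam + a) / det]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))


def laplace_of_P(lam, n=20000):
    """Simpson approximation to the integral of exp(-lam*t) P(t) over [0, 50/lam]."""
    h = (50.0 / lam) / n
    total = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        t = k * h
        Pt = P(t)
        d = weight * math.exp(-lam * t)
        for i in range(2):
            for j in range(2):
                total[i][j] += d * Pt[i][j]
    return [[total[i][j] * h / 3 for j in range(2)] for i in range(2)]


lam, mu = 1.5, 0.7
Rl, Rm = R(lam), R(mu)
RlRm = matmul(Rl, Rm)

# (4.3) against the closed form of the resolvent
laplace_error = max_abs_diff(laplace_of_P(lam), Rl)
# residual of R_lam - R_mu + (lam - mu) R_lam R_mu = 0
resolvent_residual = max(abs(Rl[i][j] - Rm[i][j] + (lam - mu) * RlRm[i][j])
                         for i in range(2) for j in range(2))
# rows of lam*R_lam sum to 1 for this honest chain, so the sup-norm is 1
row_sums = [lam * sum(row) for row in Rl]
# (4.7): lam*R_lam approaches the identity as lam grows
big = 1e6
identity_gap = max_abs_diff([[big * x for x in row] for row in R(big)],
                            [[1.0, 0.0], [0.0, 1.0]])
```

For this chain $\lambda R_\lambda - I$ has entries of size at most $\max(a, b)/(\lambda + c)$, so `identity_gap` shrinks like $1/\lambda$.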
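Suppose now that $\{R_\lambda : \lambda > 0\}$ is a contraction resolvent on a Banach space $B$. Since

$R_\mu = R_\lambda [I + (\lambda - \mu) R_\mu],$

it is clear that

(4.9) the range $R_\lambda B$ of $R_\lambda$ is a space $\mathcal{R}$ independent of $\lambda$.

Since, for $g \in B$,

$(\lambda R_\lambda - I) R_\mu g = \frac{\mu}{\lambda - \mu} R_\mu g - \frac{\lambda}{\lambda - \mu} R_\lambda g,$

we see that for $f \in \mathcal{R}$, and therefore for $f$ in the closure $\bar{\mathcal{R}}$ of $\mathcal{R}$, $\lambda R_\lambda f \to f$ as $\lambda \to \infty$. Indeed, it is now obvious that

(4.10) $B_0 := \{h \in B : \lambda R_\lambda h \to h \text{ as } \lambda \to \infty\} = \bar{\mathcal{R}}.$

(4.11) DEFINITION ((infinitesimal) generator of an SCCSG). Let $B_0$ be a Banach space and let $\{P_t : t \ge 0\}$ be an SCCSG on $B_0$. The (infinitesimal) generator $\mathcal{G}$ of $\{P_t : t \ge 0\}$ is the (generally unbounded) operator $\mathcal{G} : \mathcal{D}(\mathcal{G}) \to B_0$ defined as follows. We write $f \in \mathcal{D}(\mathcal{G})$ if, for some $g$ in $B_0$, we have

$\|\varepsilon^{-1}(P_\varepsilon f - f) - g\| \to 0 \quad \text{as } \varepsilon \downarrow 0,$

and we then define $\mathcal{G} f$ to equal $g$.

Let $\{P_t : t \ge 0\}$ be an SCCSG on $B_0$. We are now going to (formulate and) prove that, for $\lambda > 0$,

(4.12) the operators $R_\lambda$ and $(\lambda - \mathcal{G})$ are inverses.

For $g \in B_0$, we have as $\varepsilon \downarrow 0$,

(4.13) $\varepsilon^{-1}(R_\lambda g - e^{-\lambda\varepsilon} P_\varepsilon R_\lambda g) = \varepsilon^{-1} \int_0^\varepsilon e^{-\lambda s} P_s g\,ds \to g;$

and it is clear that

(4.14) for $\lambda > 0$ and $g \in B_0$, $R_\lambda g \in \mathcal{D}(\mathcal{G})$ and $(\lambda - \mathcal{G}) R_\lambda g = g$.

For $f \in \mathcal{D}(\mathcal{G})$, and $t > 0$,

$\varepsilon^{-1}(P_{t+\varepsilon} f - P_t f) = P_t\,\varepsilon^{-1}(P_\varepsilon f - f) \to P_t \mathcal{G} f,$

and it is easy to obtain the results

$P_t f \in \mathcal{D}(\mathcal{G}), \qquad \frac{d}{dt} P_t f = P_t \mathcal{G} f = \mathcal{G} P_t f, \qquad P_t f - f = \int_0^t P_s \mathcal{G} f\,ds = \int_0^t \mathcal{G} P_s f\,ds.$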
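On taking Laplace transforms of the last equation, we obtain

$R_\lambda f - \lambda^{-1} f = \lambda^{-1} R_\lambda \mathcal{G} f;$

in other words,

(4.15) if $f \in \mathcal{D}(\mathcal{G})$, then $R_\lambda (\lambda - \mathcal{G}) f = f$.

(4.16) Note. In this section, we have used without proof properties of strong Riemann integrals. If you want the full theory of these, see Hille and Phillips [1] or Dynkin [2].

We end this section with a useful lemma that sometimes enables us to calculate the precise domain $\mathcal{D}(\mathcal{G})$ for an SCCSG $\{P_t : t \ge 0\}$.

(4.17) LEMMA (Dynkin, Reuter). Suppose that $\mathcal{C}$ is an extension of $\mathcal{G}$. Thus suppose that $\mathcal{C}$ is a linear map from $\mathcal{D}(\mathcal{C})$, with $\mathcal{D}(\mathcal{G}) \subseteq \mathcal{D}(\mathcal{C}) \subseteq B_0$, into $B_0$ and that $\mathcal{C} f = \mathcal{G} f$, $\forall f \in \mathcal{D}(\mathcal{G})$. Suppose also that, for $f \in \mathcal{D}(\mathcal{C})$, $\mathcal{C} f = f \Rightarrow f = 0$. Then $\mathcal{C} = \mathcal{G}$; or, equivalently, $\mathcal{D}(\mathcal{C}) = \mathcal{D}(\mathcal{G})$.

Proof. Suppose that $f \in \mathcal{D}(\mathcal{C})$. Put $g := f - \mathcal{C} f$. Then $h := R_1 g \in \mathcal{D}(\mathcal{G})$ and

$h - \mathcal{C} h = h - \mathcal{G} h = g = f - \mathcal{C} f,$

so that $f - h = \mathcal{C}(f - h)$ and $f = h \in \mathcal{D}(\mathcal{G})$. □

5. The Hille-Yosida Theorem. Here now is the route from resolvent to semigroup.

(5.1) THEOREM (Hille-Yosida). Let $\{R_\lambda : \lambda > 0\}$ be a strongly continuous contraction resolvent family on $B_0$. Then there exists a unique strongly continuous contraction semigroup (SCCSG) $\{P_t : t \ge 0\}$ on $B_0$ such that

(5.2) $\int_{[0,\infty)} e^{-\lambda t} P_t f\,dt = R_\lambda f, \quad \forall \lambda > 0,\ \forall f \in B_0.$

Indeed, if we define

(5.3) $G_\lambda := \lambda(\lambda R_\lambda - I),$
(5.4) $P_{t,\lambda} := \exp(t G_\lambda) = e^{-\lambda t} \sum_{n=0}^\infty (\lambda t)^n (\lambda R_\lambda)^n / n!,$

then, for each $f$ in $B_0$,

(5.5) $P_t f = \lim_{\lambda \to \infty} P_{t,\lambda} f, \quad \forall t \ge 0.$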
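The approximation scheme (5.3)-(5.5) can be watched converging in a toy case. The sketch below again assumes a two-state chain with Q-matrix $Q = [[-a, a], [b, -b]]$, whose resolvent and semigroup are known in closed form; the rates, the helper names and the truncated Taylor series used for the matrix exponential are illustrative assumptions. It forms $\exp(tG_\lambda)$ directly rather than through the series in (5.4), since $e^{-\lambda t}$ underflows for large $\lambda t$.

```python
import math

a, b = 1.0, 2.0   # illustrative jump rates for the assumed two-state chain
c = a + b
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def R(lam):
    """Closed form of (lam*I - Q)^{-1} for Q = [[-a, a], [b, -b]]."""
    det = lam * (lam + c)
    return [[(lam + b) / det, a / det],
            [b / det, (lam + a) / det]]


def G(lam):
    """Yosida approximation (5.3): the bounded operator lam*(lam*R_lam - I)."""
    Rl = R(lam)
    return [[lam * (lam * Rl[i][j] - IDENTITY[i][j]) for j in range(2)]
            for i in range(2)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_exp(A, terms=60):
    """Truncated Taylor series for exp(A); adequate because A has small norm here."""
    result = [row[:] for row in IDENTITY]
    term = [row[:] for row in IDENTITY]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


def P_exact(t):
    """Closed form of exp(tQ), the semigroup that (5.5) should recover."""
    e = math.exp(-c * t)
    return [[(b + a * e) / c, (a - a * e) / c],
            [(b - b * e) / c, (a + b * e) / c]]


t = 1.0
exact = P_exact(t)
errors = []
for lam in (10.0, 100.0, 1000.0):
    tG = [[t * x for x in row] for row in G(lam)]
    approx = mat_exp(tG)
    errors.append(max(abs(approx[i][j] - exact[i][j])
                      for i in range(2) for j in range(2)))
```

In this example $G_\lambda = \lambda(\lambda + c)^{-1} Q$, so the entries of `errors` decrease roughly like $1/\lambda$.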
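(5.6) Preliminaries. In the last section, we obtained the fact (4.12) that if $\{R_\lambda : \lambda > 0\}$ and $\mathcal{G}$ are the resolvent and generator of an already given SCCSG $\{P_t : t \ge 0\}$, then $R_\lambda$ and $\lambda - \mathcal{G}$ are inverse operators. In our current situation, we are given an SCCR $\{R_\lambda : \lambda > 0\}$ as data. No semigroup $\{P_t : t \ge 0\}$ is yet available. Even so, we are guided by (4.12).

We know that the range $R_\lambda B_0$ of $R_\lambda$ is a space $\mathcal{R}$ independent of $\lambda$, and that $\mathcal{R}$ is dense in $B_0$ (since $\lambda R_\lambda f \to f$ as $\lambda \to \infty$). If $h \in B_0$ and $R_\lambda h = 0$ for some $\lambda$, then

$R_\mu h = \{I + (\lambda - \mu) R_\mu\} R_\lambda h = 0$

for every $\mu$; and, since $\mu R_\mu h \to h$ as $\mu \to \infty$, we must have $h = 0$. Thus the map $R_\lambda : B_0 \to \mathcal{R}$ is a bijection. On combining this fact with the resolvent equation, we see that there is a uniquely defined operator $\mathcal{G}$ with domain $\mathcal{D}(\mathcal{G}) := \mathcal{R}$ such that

$(\lambda - \mathcal{G})^{-1} = R_\lambda$

in that (4.14) and (4.15) hold for our new situation.

The operator $G_\lambda$ in (5.3) is bounded. What makes the result (5.5) a particularly satisfying interpretation of $P_t = \exp(t\mathcal{G})$ is the fact that, for $f \in B_0$,

(5.7) $f \in \mathcal{D}(\mathcal{G})$ if and only if $g := \lim_{\lambda \to \infty} G_\lambda f$ exists, and then $\mathcal{G} f = g$.

Proof of (5.7). Suppose first that $f \in \mathcal{D}(\mathcal{G})$. Then $G_\lambda f = \lambda R_\lambda \mathcal{G} f \to \mathcal{G} f$. Suppose conversely that $G_\lambda f \to g$. Then, for fixed $\mu > 0$, the resolvent equation shows that, as $\lambda \to \infty$,

$R_\mu G_\lambda f = \frac{\lambda}{\lambda - \mu}(\mu R_\mu f - \lambda R_\lambda f) \to \mu R_\mu f - f.$

Since we also have $R_\mu G_\lambda f \to R_\mu g$, it must be the case that $f = R_\mu(\mu f - g)$, whence $f \in \mathcal{D}(\mathcal{G})$ and $\mathcal{G} f = g$. □

Proof of Theorem 5.1. Since $G_\lambda$ is a bounded operator, it is well known that

(5.8)(i) $P_{s,\lambda} P_{t,\lambda} = P_{s+t,\lambda}$;
(5.8)(ii) $\lim_{h \downarrow 0} h^{-1}(P_{h,\lambda} - I) = G_\lambda$ (uniform operator topology);
(5.8)(iii) $P_{t,\lambda} - I = \int_0^t G_\lambda P_{s,\lambda}\,ds$.

Since $\|\lambda R_\lambda\| \le 1$, it is clear from (5.4) that

(5.8)(iv) $\|P_{t,\lambda}\| \le 1$.

Recall that $R_\lambda$ and $R_\mu$ commute, whence $G_\lambda$ and $G_\mu$ commute and $P_{t,\lambda}$ commutes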
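with $P_{s,\mu}$. Thus we may calculate, with $P(t, \lambda)$ standing for $P_{t,\lambda}$ when convenient,

$P(t, \lambda) f - P(t, \mu) f = \sum_{k=1}^{n} P\big(\tfrac{(k-1)t}{n}, \lambda\big)\,P\big(\tfrac{(n-k)t}{n}, \mu\big)\,\big[P\big(\tfrac{t}{n}, \lambda\big) f - P\big(\tfrac{t}{n}, \mu\big) f\big],$

so that, by (5.8)(iv),

$\|P_{t,\lambda} f - P_{t,\mu} f\| \le n \big\| \big[P\big(\tfrac{t}{n}, \lambda\big) f - f\big] - \big[P\big(\tfrac{t}{n}, \mu\big) f - f\big] \big\|.$

Letting $n \to \infty$ and using (5.8)(ii), we find that

$\|P_{t,\lambda} f - P_{t,\mu} f\| \le t\,\|G_\lambda f - G_\mu f\|.$

It now follows from (5.7) that, for $f \in \mathcal{D}(\mathcal{G})$, the limit

(5.9) $P_t f := \lim_{\lambda \to \infty} P_{t,\lambda} f$

exists uniformly over compact $t$-intervals, so that $t \mapsto P_t f$ is continuous for $f \in \mathcal{D}(\mathcal{G})$. Since $\mathcal{D}(\mathcal{G})$ is dense in $B_0$, the limit in (5.9) exists for each $f$ in $B_0$, and $t \mapsto P_t f$ is continuous for $f \in B_0$. That (5.2) holds now follows from the fact that

$\int_0^\infty e^{-\lambda t} P_{t,\mu} f\,dt = (\lambda - G_\mu)^{-1} f \to R_\lambda f \quad (\mu \to \infty).$

Exercise. Prove this by using the resolvent equation to show that, with $\gamma := \lambda\mu(\lambda + \mu)^{-1}$,

$(\lambda - G_\mu)^{-1} = (\lambda + \mu)^{-2} \mu^2 R_\gamma + (\lambda + \mu)^{-1} I.$

The proof of the Hille-Yosida Theorem is now complete. □

(5.10) LEMMA. The operator $\mathcal{G}$ does generate $\{P_t\}$ in the sense that $\{P_t\}$ is uniquely determined by $(\mathcal{G}, \mathcal{D}(\mathcal{G}))$.

Proof. For $f$ in $B_0$, we can (for each $\lambda > 0$) determine $R_\lambda f$ as the unique solution in $\mathcal{D}(\mathcal{G})$ of the equation $(\lambda - \mathcal{G}) R_\lambda f = f$. The function $t \mapsto P_t f$ is then uniquely specified by the fact that its Laplace transform is $R_\lambda f$. Of course, in this special 'semigroup' situation, inversion of the Laplace transform may be effected directly via (5.5). □

Note. There is no need to worry about the uniqueness theorem for Laplace transforms in the Banach-space context: just apply an element of the dual space and use the real-variable result.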
(5.11) Limitations of the HY Theorem. There are two reasons why the HY theorem may not be entirely appropriate. The first is that $\mathcal{D}(\mathcal{G})$ may be too small for many purposes. This is certainly the case in Markov-chain theory, where we need to introduce a 'natural' generator extending $\mathcal{G}$. The second reason is (of course!) that $\mathcal{D}(\mathcal{G})$ may be too large. This is the case in diffusion theory in dimension $n \ge 2$, where the 'differential generator', a contraction of $\mathcal{G}$, is more tractable and often contains all the relevant information.

2. FELLER-DYNKIN PROCESSES

6. Feller-Dynkin (FD) semigroups. Until further notice, suppose that $E$ is a locally compact Hausdorff space with countable base (LCCB) and that $\mathcal{E} = \mathcal{B}(E)$. It is well known that if $E$ is not compact, then we can adjoin to $E$ a point $\partial$ so that $E_\partial := E \cup \partial$ is compact metrisable. Thus $\partial$ is the point at infinity in the one-point compactification of $E$. The notation is meant to indicate that $\partial$ can be used as a coffin state. If $E$ is compact, make $\partial$ a point isolated from $E$. In either case, $E$ is σ-compact and Polish. We write:

$C(E)$ for the space of all ($\mathbb{R}$-valued) continuous functions on $E$;
$C_b(E)$ for the space of bounded continuous functions on $E$;
$C_0(E)$ for the space of (bounded) continuous functions on $E$ which vanish at infinity;
$C_K(E)$ for the space of continuous functions on $E$ with compact support.

As an extension of the Riesz representation theorem (II.80.3) we have the following result. Again see Theorem IV.6.3 of Dunford and Schwartz [1].

(6.1) THEOREM. A bounded linear functional $\varphi$ on $C_0(E)$ may be written uniquely in the form

$\varphi(f) = \mu(f) := \int_E f(x)\,\mu(dx),$

where $\mu$ is a signed measure on $E$ of finite total variation.

By a sub-Markov kernel $V$ on $(E, \mathcal{E})$, we mean a map $V : E \times \mathcal{E} \to [0, 1]$ such that

(i) $\forall x \in E$, $V(x, \cdot)$ is a subprobability measure on $(E, \mathcal{E})$, so that $V(x, E) \le 1$;
(ii) $\forall \Gamma \in \mathcal{E}$, $V(\cdot, \Gamma)$ is $\mathcal{E}$-measurable.

Exercise. Derive the following theorem from Theorem 6.1 by using the Monotone-Class Theorem II.3.1.

(6.2) THEOREM. 
Suppose that $V : C_0(E) \to b\mathcal{E}$ is a (bounded) linear operator
that is sub-Markov in the sense that $0 \le f \le 1$ implies $0 \le Vf \le 1$. Then there exists a unique sub-Markov kernel (also denoted by) $V$ on $(E, \mathcal{E})$ such that

$Vf(x) = \int V(x, dy) f(y), \quad \forall f \in C_0(E),\ \forall x \in E.$

Hence $V$ has a canonical extension (via the integral) to a map $V : b\mathcal{E} \to b\mathcal{E}$.

Every author has his or her own definition of 'Feller semigroup', so be careful when moving from book to book. The modern trend is to mean by the Feller property of a transition function on $(E, \mathcal{E})$ the property

(6.3) $P_t : C_b(E) \to C_b(E), \quad \forall t \ge 0,$

and by the strong Feller property the property

(6.4) $P_t : b\mathcal{E} \to C_b(E), \quad \forall t \ge 0.$

There are good reasons for using the 'Feller' label for all kinds of subtle modifications of these statements. To avoid causing still further terminological clashes, let us give a new (and perfectly just) name to a favourite class of semigroups.

(6.5) DEFINITION (Feller-Dynkin semigroup). A Feller-Dynkin (FD) semigroup is a strongly continuous, sub-Markov semigroup $\{P_t : t \ge 0\}$ of linear operators on $C_0(E)$:

(6.6)(i) $P_t : C_0(E) \to C_0(E)$;
(6.6)(ii) $\forall f \in C_0(E)$, $0 \le f \le 1 \Rightarrow 0 \le P_t f \le 1$;
(6.6)(iii) $P_s P_t = P_{s+t}$, $\forall s, t \ge 0$; $P_0 = I$, the identity on $C_0(E)$;
(6.6)(iv) $\forall f \in C_0(E)$, $\|P_t f - f\| \to 0$ as $t \downarrow 0$.

(Here then we have a situation to which the HY theorem applies with $B = B_0 = C_0(E)$.)

It follows easily from Theorem 6.2 that to any FD semigroup there corresponds a 'Feller-Dynkin' transition function on $(E, \mathcal{E})$. The following lemma is very important for verifying conditions (6.6) in practice.

(6.7) LEMMA. If $\{P_t : t \ge 0\}$ is a sub-Markov semigroup on $C_0(E)$ satisfying (6.6)(i)-(iii), then (6.6)(iv) is implied by the apparently weaker condition

(6.6)(iv)* $\forall f \in C_0(E)$, $\forall x \in E$, $P_t f(x) \to f(x)$ as $t \downarrow 0$.

Proof. If $f \in C_0(E)$ and $x \to y$ in $E$ then, by the Dominated-Convergence Theorem,

$(R_\lambda f)(x) := \int_0^\infty e^{-\lambda t} P_t f(x)\,dt \to \int_0^\infty e^{-\lambda t} P_t f(y)\,dt = R_\lambda f(y).$
It is therefore clear that $R_\lambda : C_0(E) \to C_0(E)$, and $\{R_\lambda : \lambda > 0\}$ is a contraction resolvent on $C_0(E)$. We know from the Hille-Yosida Theorem that the common domain $B_0$ of strong continuity of $\{P_t : t \ge 0\}$ and $\{R_\lambda : \lambda > 0\}$ is given by $B_0 = \overline{R_\lambda C_0(E)}$ for every $\lambda$. We need to prove that $B_0 = C_0(E)$. If $B_0 \neq C_0(E)$ then, by the Hahn-Banach theorem, we can find a non-trivial linear functional $\varphi$ on $C_0(E)$ such that $\varphi$ annihilates $B_0$. If $\mu$ is the signed measure that represents $\varphi$ in the Riesz theorem, we shall have

$\int \lambda R_\lambda f(x)\,\mu(dx) = \varphi(\lambda R_\lambda f) = 0$

for every $f$ in $C_0(E)$ and every $\lambda > 0$. However,

$\lambda R_\lambda f(x) = \int_0^\infty e^{-s} P_{s/\lambda} f(x)\,ds \to f(x) \quad (\lambda \to \infty)$

by the assumption (6.6)(iv)*. Hence $\varphi(f) = 0$ for every $f$ in $C_0(E)$, contradicting the fact that $\varphi$ is non-trivial. Hence $B_0$ does equal $C_0(E)$. □

You might like to try to prove Lemma 6.7 directly, without using Hille-Yosida machinery.

Let $\{P_t\}$ be an FD transition function on $(E, \mathcal{E})$. Let $\mathcal{G}$ be the (strong) generator of the FD semigroup $\{P_t\}$. Then, for $f \in \mathcal{D}(\mathcal{G})$,

$\mathcal{G} f(x) = \lim_{t \downarrow 0} t^{-1} \Big[ \int_E P_t(x, dy) f(y) - f(x) \Big].$

Let $f \in \mathcal{D}(\mathcal{G}) \subseteq C_0(E)$ and let $f$ attain its supremum (as it must) at the point $x$. Then if $f(x) \ge 0$, we must have $\mathcal{G} f(x) \le 0$. (If $\{P_t\}$ is honest then we will have $\mathcal{G} f(x) \le 0$ irrespective of the sign of $f(x)$.) This fact motivates

(6.8) LEMMA (Dynkin's Maximum Principle). Suppose that $\mathcal{C} : \mathcal{D}(\mathcal{C}) \to C_0(E)$ is a linear operator extending $\mathcal{G}$. Suppose that if $f \in \mathcal{D}(\mathcal{C})$ and $f$ attains its maximum at $x$ and $f(x) \ge 0$, then $\mathcal{C} f(x) \le 0$. Then $\mathcal{C} = \mathcal{G}$.

Proof. By Lemma 4.17, we need only prove that $f \in \mathcal{D}(\mathcal{C})$ and $\mathcal{C} f = f$ imply that $f = 0$. So suppose that $f \in \mathcal{D}(\mathcal{C})$ and $\mathcal{C} f = f$. Let $f$ attain its maximum at $x$. If $f(x) \ge 0$ then $\mathcal{C} f(x) \le 0$, so that $f(x) = \mathcal{C} f(x) = 0$. By applying the same argument to $-f$, we see that $f = 0$. □

(6.9) Generator of Brownian motion. Let $E = \mathbb{R}^n$. Let $P_t(x, dy)$ denote the transition function of CBM($\mathbb{R}^n$). See Example 1.6. For $f \in C_0(\mathbb{R}^n)$, set

$P_t f(x) := \int P_t(x, dy) f(y) = W^x f(X_t).$

Here $X$ is CBM($\mathbb{R}^n$) and $W^x$ is the Wiener measure corresponding to starting
position $x$. The fact that $P_t : C_0 \to C_0$ (where $C_0 = C_0(\mathbb{R}^n)$) is easily established by analysis, and is an immediate consequence of the already established fact that $x \mapsto W^x$ is continuous from $\mathbb{R}^n$ to $\Pr(W)$. The fact that

$\lim_{t \downarrow 0} P_t f(x) = f(x) \quad (f \in C_0(\mathbb{R}^n),\ x \in \mathbb{R}^n)$

is also easy to establish analytically, and is probabilistically obvious because of the (right-)continuity of $X_t$ at 0. Thus $\{P_t\}$ is an FD semigroup on $C_0(\mathbb{R}^n)$.

The natural domain in $C_0 = C_0(\mathbb{R}^n)$ of the operator $\tfrac12\Delta$ ($\Delta$ being Laplace's operator) is defined to be

$\mathcal{D}(\tfrac12\Delta) := \{f \in C_0 : \tfrac12\Delta f \text{ exists and } \tfrac12\Delta f \in C_0\}.$

In a moment, we shall prove that if $n = 1$ then $\mathcal{G} = \tfrac12\Delta$. The situation in dimension $n \ge 2$ is more complicated. The operator $\mathcal{G}$ is the closure of $\tfrac12\Delta$: thus $f \in \mathcal{D}(\mathcal{G})$ if and only if there exist functions $f_n$ in $\mathcal{D}(\tfrac12\Delta)$ and a function $g$ in $C_0$ such that $\|f_n - f\| \to 0$ and $\|\tfrac12\Delta f_n - g\| \to 0$; and then $\mathcal{G} f = g$. We examine this case later. The moral is that for dimension $n \ge 2$, infinitesimal generators are not really the right things to look at. The Stroock-Varadhan theory tells us how we should view things.

Now consider the 1-dimensional case. From (3.13), it follows that

$R_\lambda f(x) = \int_{\mathbb{R}} r_\lambda(x, y) f(y)\,dy \quad (\lambda > 0,\ f \in C_0),$

where

$r_\lambda(x, y) := \gamma^{-1} \exp(-\gamma |y - x|), \qquad \gamma := (2\lambda)^{1/2}.$

Fix $\lambda > 0$. Suppose that $h \in \mathcal{D}(\mathcal{G})$, so that $h = R_\lambda f$ for some $f$ in $C_0$. (In the present context, we have $B = B_0 = C_0$.) Then

$h'(x) = \int_{\mathbb{R}} \gamma\,r_\lambda(x, y)\,\mathrm{sgn}(y - x)\,f(y)\,dy,$

where $\mathrm{sgn}\,x := 1$ if $x > 0$, $-1$ if $x < 0$, and $0$ if $x = 0$. On differentiating again, we find that

$\lambda h - \tfrac12 h'' = f = \lambda h - \mathcal{G} h.$

Hence $\tfrac12\Delta$ is an extension of $\mathcal{G}$. By Dynkin's maximum principle (or by direct application of Lemma 4.17), $\mathcal{G} = \tfrac12\Delta$.

7. The existence theorem: canonical FD processes. Let $E$ continue to denote an LCCB and let $\mathcal{E} := \mathcal{B}(E)$. Suppose that $\{P_t\}$ is an FD semigroup on $C_0 := C_0(E)$. We shall show that there exists a strong Markov, $E_\partial$-valued R-process $X$ with
transition function $\{P_t\}$. (Strictly speaking, the transition function of $X$ is $\{P_t^{+\partial}\}$, but, conventionally, we say that $\{P_t\}$ is the transition function, and $\{P_t^{+\partial}\}$ the extended transition function, of $X$.) We then obtain Dynkin's simple and extremely illuminating formula for the (strong) generator $\mathcal{G}$ of $\{P_t\}$.

The technique used for establishing the existence of $X$ is the same as that which we used for CBM($\mathbb{R}$). We first construct the Daniell-Kolmogorov (DK) process $Y$ associated with $\{P_t\}$ and then obtain $X$ by smoothing the paths of $Y$ via the regularity theorem for supermartingales.

(7.1) Use of the DK Theorem. Let $\Omega := E_\partial^{[0,\infty)}$, the space of all functions $\omega$ from $[0, \infty)$ to $E_\partial$. For $t \ge 0$, let $Y_t$ be the coordinate projection mapping $\Omega$ to $E_\partial$ via $Y_t(\omega) := \omega(t)$. Set

$\mathcal{G}^\circ := \sigma\{Y_s : s \ge 0\}, \qquad \mathcal{G}^\circ_t := \sigma\{Y_s : s \le t\}.$

For every probability measure $\mu$ on $(E_\partial, \mathcal{E}_\partial)$, the DK theorem guarantees the existence of a unique probability measure $P^\mu$ on $(\Omega, \mathcal{G}^\circ)$ such that, for $n \in \mathbb{N}$, $0 \le t_1 \le t_2 \le \cdots \le t_n$ and $x_0, x_1, \ldots, x_n \in E_\partial$,

(7.2) $P^\mu[Y(0) \in dx_0;\ Y(t_1) \in dx_1;\ \ldots;\ Y(t_n) \in dx_n] = \mu(dx_0)\,P^{+\partial}_{t_1}(x_0, dx_1) \cdots P^{+\partial}_{t_n - t_{n-1}}(x_{n-1}, dx_n).$

The semigroup (Chapman-Kolmogorov) property guarantees the required consistency, and the fact that $E_\partial$ is compact metric gives more than adequate topological structure. We write

(7.3) $P^x := P^{\varepsilon_x}$, $\varepsilon_x$ being the unit mass at $x$.

We can verify by the usual monotone-class methods (Exercise!) that the map $x \mapsto P^x(\Lambda)$ is $\mathcal{E}_\partial$-measurable for every $\Lambda$ in $\mathcal{G}^\circ$. (Problems concerning the weak continuity of the map $x \mapsto P^x$ are discussed in Section 13.) Hence, for $\eta \in b\mathcal{G}^\circ$ and $t \ge 0$, the map $\omega \mapsto E^{Y(t,\omega)}\eta$ is $\mathcal{G}^\circ_t$-measurable, where $E^x$ (respectively $E^\mu$) denotes the expectation corresponding to $P^x$ (respectively $P^\mu$). The Markov property knitting the laws $\{P^x : x \in E_\partial\}$ can now be expressed: for $\eta \in b\mathcal{G}^\circ$, $\mu \in \Pr(E_\partial)$ and $t \ge 0$,

(7.4) $E^\mu[\eta \circ \theta_t \mid \mathcal{G}^\circ_t] = E^{Y(t)}\eta, \quad \text{a.s.}(P^\mu).$

Here, of course, $\theta_t$ is the time-shift map:

$\theta_t : \Omega \to \Omega, \qquad \theta_t\omega(s) := \omega(t + s).$

See Section II.90 for (7.4). (The 'Brownian' proof obviously transfers!) 
In particular, we have, for $f \in C_0$ and $s, t \ge 0$,

(7.5)(i) $E^\mu[f(Y_{t+s}) \mid \mathcal{G}^\circ_t] = P_s f(Y_t), \quad \text{a.s.}(P^\mu),$
and this, together with the fact that

(7.5)(ii) $P^\mu \circ Y_0^{-1} = \mu,$

completely determines all the laws $P^\mu$.

(7.6) Path regularisation. Suppose that $h$ is of the form $R_1 g$, where $g \in C_0^+$ (the set of non-negative elements in $C_0$). Then

(7.7) $h$ is 1-super-median for $\{P_t\}$:

by definition, this means that $0 \le e^{-s} P_s h \le h$, $\forall s \ge 0$.

Proof of (7.7).

$e^{-s} P_s R_1 g = e^{-s} P_s \int_0^\infty e^{-u} P_u g\,du = \int_s^\infty e^{-v} P_v g\,dv \le R_1 g.$ □

Hence, for every $\mu$,

$E^\mu[e^{-(s+t)} h(Y_{t+s}) \mid \mathcal{G}^\circ_t] = e^{-(s+t)} P_s h(Y_t) \le e^{-t} h(Y_t),$

so that

(7.8) $e^{-t} h(Y_t)$ is a supermartingale relative to $(\mathcal{G}^\circ_t, P^\mu)$.

Hence (see Section II.65.1),

(7.9) a.s.($P^\mu$), the following statement holds: the limit $\lim_{\mathbb{Q} \ni q \downarrow\downarrow t} e^{-q} h(Y_q)$ exists for all $t$ and defines an R-function of $t$.

Now let $g_0, g_1, g_2, \ldots$ be a countable dense subset of $C_0^+$ with $g_0 > 0$ on $E$. Put $h_n = R_1 g_n$ and let $\mathcal{H} := \{h_0, h_1, h_2, \ldots\}$. Then $h_0 > 0$ on $E$ and $\mathcal{H}$ separates points of $E_\partial$ because $\mathcal{H} - \mathcal{H}$ is dense in $\mathcal{D}(\mathcal{G})$ and $\mathcal{D}(\mathcal{G})$ is dense in $C_0$. Since $\mathcal{H}$ is countable, it follows that for every $\mu$,

(7.10) a.s.($P^\mu$), (7.9) holds for all $h \in \mathcal{H}$.

But the map

$x \mapsto (h_0(x), h_1(x), \ldots)$

(with $h(\partial) = 0$, $\forall h \in \mathcal{H}$) is a homeomorphism of $E_\partial$ onto a subset of $\mathbb{R}^\infty$. Hence we can conclude that, a.s.($P^\mu$),

(7.11) $X_t := \lim_{\mathbb{Q} \ni q \downarrow\downarrow t} Y_q$ exists, $\forall t$, and defines an R-process $X$.

It is worth explaining things in a little more detail. Let $\Omega_0$ be the set of $\omega$ in $\Omega$ for which the limit $X_t(\omega)$ exists for every $t$ and defines an R-map $t \mapsto X_t(\omega)$. Then

$\Omega_0 \in \mathcal{G}^\circ$ and $P^\mu(\Omega_0) = 1$ for all $\mu \in \Pr(E_\partial)$.

For $\omega \in \Omega \setminus \Omega_0$, define $X_t(\omega) := \partial$, $\forall t$. Then $X$ is an R-process and $X_t$ is $\mathcal{G}^\circ$-measurable for each $t$.

The crucial result that, for each $\mu$, $X$ is a ($P^\mu$) modification of $Y$:

(7.12) $P^\mu[X_t = Y_t] = 1, \quad \forall t, \forall \mu,$

must be established directly, since we cannot appeal to Theorem II.67.7 in the absence of the usual conditions.
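Proof of (7.12). For $f_1, f_2 \in C_0(E)$ and $\mathbb{Q} \ni q \downarrow\downarrow t$,

$E^\mu[f_1(Y_t) f_2(X_t)] = \lim E^\mu[f_1(Y_t) f_2(Y_q)] = \lim E^\mu[f_1(Y_t) P_{q-t} f_2(Y_t)] = E^\mu[f_1(Y_t) f_2(Y_t)].$

By monotone-class arguments, $E^\mu f(Y_t, X_t) = E^\mu f(Y_t, Y_t)$ for $f \in b(\mathcal{E}_\partial \times \mathcal{E}_\partial)$, and (7.12) follows. □

Finally, note that, since $h_0 := R_1 g_0 > 0$ on $E$, we can conclude from Theorem II.78.1 that, for every $\mu$, it is true a.s.($P^\mu$) that

(7.13) $\forall t$, if either $X_{t-}$ or $X_t = \partial$, then $X_u = \partial$, $\forall u \ge t$.

According to that theorem, the statement (7.13) corresponds to a $\mathcal{G}^\circ$-measurable set.

(7.14) Canonical FD processes. The DK theorem has served its purpose. The clumsy space $\Omega := E_\partial^{[0,\infty)}$ is no longer needed. As in the switch from $\mathbb{R}^{[0,\infty)}$ to $C$ for CBM($\mathbb{R}$) in Section II.71, we can now tidy things up.

(7.15) Let $\Omega$ now denote the space of R-paths $\omega : [0, \infty) \to E_\partial$ such that if either $\omega(t-)$ or $\omega(t) = \partial$ then $\omega(u) = \partial$, $\forall u \ge t$. By convention, each $\omega$ in $\Omega$ is extended to a map $\omega : [0, \infty] \to E_\partial$ by setting $\omega(\infty) := \partial$.

Note. It is important that we do not require the existence of the limit $\lim_{t \uparrow\uparrow \infty} \omega(t)$ for $\omega$ in $\Omega$.

For $\omega \in \Omega$ and $t \ge 0$, define

(7.16)(i) $X_t(\omega) := \omega(t)$,
(7.16)(ii) $\mathcal{F}^\circ := \sigma\{X_s : 0 \le s < \infty\} = \sigma\{X_s : 0 \le s \le \infty\}$,
(7.16)(iii) $\mathcal{F}^\circ_t := \sigma\{X_s : s \le t\}$.

(7.17) THEOREM (Dynkin, Kinney, Blumenthal). For $\mu \in \Pr(E_\partial)$, there exists a unique probability measure $P^\mu$ on $(\Omega, \mathcal{F}^\circ)$ such that, for $n \in \mathbb{N}$, $0 \le t_1 \le t_2 \le \cdots \le t_n$ and $x_0, x_1, \ldots, x_n \in E_\partial$,

(7.18) $P^\mu[X(0) \in dx_0;\ X(t_1) \in dx_1;\ \ldots;\ X(t_n) \in dx_n] = \mu(dx_0)\,P^{+\partial}_{t_1}(x_0, dx_1) \cdots P^{+\partial}_{t_n - t_{n-1}}(x_{n-1}, dx_n).$

This very important theorem merely reinterprets the results obtained above. The new $P^\mu$ is the old $P^\mu$ law of $X$! The set-up

$X = (\Omega, \mathcal{F}^\circ, \{X_t : 0 \le t \le \infty\}, \{P^\mu : \mu \in \Pr(E_\partial)\})$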
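is called the canonical FD process associated with the FD semigroup $\{P_t\}$. Of course, $X$ has the same simple Markov properties as the process $Y$. In particular, for $s, t \ge 0$, $\xi \in b\mathcal{F}^\circ_t$ and $f \in C_0$,

(7.19) $E^\mu[\xi f(X_{t+s})] = E^\mu[\xi P_s f(X_t)].$

This formula will allow us to utilize the smoothness of the semigroup $\{P_t\}$ and the right-continuity of $\{X_t\}$ to show that $X$ (unlike $Y$) has the strong Markov property.

(7.20) Lifetime. The random time

$\zeta := \inf\{t : X(t) = \partial\} = \inf\{t : X(t-) = \partial \text{ or } X(t) = \partial\}$

is called the lifetime of $X$.

(7.21) Note that if $t < \zeta(\omega)$ then the set of values $\{X(s, \omega) : s \le t\}$ is precompact in $E$.

8. Strong Markov property: preliminary version. Recall that an $\{\mathcal{F}^\circ_{t+}\}$ stopping time is a map $T : \Omega \to [0, \infty]$ such that

$\{\omega : T(\omega) \le t\} \in \mathcal{F}^\circ_{t+}, \quad \forall t \in [0, \infty).$

(Here $\mathcal{F}^\circ_{\infty+} := \mathcal{F}^\circ_\infty := \mathcal{F}^\circ$.) Equivalently, $T$ is a map $T : \Omega \to [0, \infty]$ such that

(8.1)(i) $\{\omega : T(\omega) < t\} \in \mathcal{F}^\circ_t, \quad \forall t \in [0, \infty].$

For such a $T$, we define $\mathcal{F}^\circ_{T+}$ to be the σ-algebra of sets $\Lambda$ in $\mathcal{F}^\circ$ for which

(8.1)(ii) $\Lambda \cap \{\omega : T(\omega) < t\} \in \mathcal{F}^\circ_t, \quad \forall t \in [0, \infty].$

For $0 \le t \le \infty$, define $\theta_t : \Omega \to \Omega$ as usual:

(8.2)(i) $\theta_t\omega(s) := \omega(t + s), \quad \forall s,$

where, of course, $\infty + s = s + \infty = \infty$, $\forall s$. If $T$ is a map from $\Omega$ to $[0, \infty]$, define

(8.2)(ii) $\theta_T\omega = \theta_{T(\omega)}\omega.$

Recall that, for a function $\eta$ on $\Omega$, we write $\theta_T\eta$ for $\eta \circ \theta_T$ and that, for example,

(8.2)(iii) $(\xi\theta_T\eta)(\omega) := \xi(\omega)\,\eta(\theta_{T(\omega)}\omega).$

(8.3) THEOREM (Strong Markov Theorem; Dynkin, Yuskevic, Blumenthal). Let $T$ be an $\{\mathcal{F}^\circ_{t+}\}$ stopping time. Then $\forall \mu \in \Pr(E_\partial)$, $\forall \eta \in b\mathcal{F}^\circ$,

(8.4) $E^\mu[\theta_T\eta \mid \mathcal{F}^\circ_{T+}] = E^{X(T)}[\eta], \quad \text{a.s.}(P^\mu).$

Equivalently, $\forall \mu \in \Pr(E_\partial)$, $\forall \xi \in b\mathcal{F}^\circ_{T+}$, $\forall \eta \in b\mathcal{F}^\circ$,

(8.5) $E^\mu[\xi\theta_T\eta] = E^\mu[\xi E^{X(T)}\eta].$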
Notes. (i) We have already mentioned that (8.5) expresses the strong Markov theorem in a form ideally suited to applications. Certain slight variants of (8.5) are sometimes required. Thus (8.5) will obviously hold if $\xi \in m^+\mathcal{F}^\circ_{T+}$ (the set of non-negative $\mathcal{F}^\circ_{T+}$-measurable functions from $\Omega$ to $[0, \infty]$) and $\eta \in m^+\mathcal{F}^\circ$.

(ii) The debut and section theorems make it clear that Theorem 8.3 needs to be extended to take account of completions of σ-algebras before it is of any real use for discontinuous processes. See Section 9 for the appropriate extension.

Proof of Theorem 8.3. As we used to do for martingales, put

$T^{(n)}(\omega) := k2^{-n}$ if $(k-1)2^{-n} \le T(\omega) < k2^{-n}$, $k \in \mathbb{N}$; $\quad T^{(n)}(\omega) := \infty$ if $T(\omega) = \infty$.

Suppose that $\Lambda \in \mathcal{F}^\circ_{T+}$. Then

$\Lambda_{n,k} := \{\omega : T^{(n)}(\omega) = k2^{-n}\} \cap \Lambda \in \mathcal{F}^\circ_{k2^{-n}}.$

Thus, applying the simple Markov property (7.19) with $\xi_{n,k}$ as the indicator function of $\Lambda_{n,k}$, we find that, for $\mu \in \Pr(E_\partial)$ and $f \in C_0$,

$E^\mu[f \circ X(T^{(n)} + s); \Lambda] = \sum_{k \le \infty} E^\mu[f \circ X(k2^{-n} + s); \Lambda_{n,k}] = \sum_{k \le \infty} E^\mu[P_s f \circ X(k2^{-n}); \Lambda_{n,k}] = E^\mu[(P_s f) \circ X(T^{(n)}); \Lambda].$

Keep $\mu$ and $s$ fixed and let $n \to \infty$. By right-continuity of paths,

$X(T^{(n)} + s) \to X(T + s), \qquad X(T^{(n)}) \to X(T).$

Since $f \in C_0$, we have $P_s f \in C_0$ by the Feller-Dynkin property, so

$f \circ X(T^{(n)} + s) \to f(X_{T+s}), \qquad P_s f \circ X(T^{(n)}) \to P_s f(X_T).$

Hence, by the Dominated-Convergence Theorem,

(8.6) $E^\mu[f(X_{T+s}); \Lambda] = E^\mu[P_s f(X_T); \Lambda],$

and monotone-class arguments give

(8.7) $E^\mu[\xi f(X_{T+s})] = E^\mu[\xi P_s f(X_T)], \quad \forall \xi \in b\mathcal{F}^\circ_{T+}.$

Now consider the expression

$V := E^\mu[\xi f(X_{T+s}) g(X_{T+s+u})],$

where $f, g \in C_0$ and $\xi \in b\mathcal{F}^\circ_{T+}$. We can apply (8.7) with $T + s$ playing the role of $T$ and $\xi f(X_{T+s})$ playing the role of $\xi$ to obtain

$V = E^\mu[\xi f(X_{T+s}) P_u g(X_{T+s})].$

Now we can apply (8.7) again with $T$ as $T$, and $\xi$ as $\xi$, but with $f(x)$ replaced
by $f(x)(P_u g)(x)$ to find that

$V = E^\mu[\xi \times (\{P_s(f \times P_u g)\} \circ X_T)].$

The brackets are meant to help clarify the structure, but you can ignore as many as you wish! You can now check that we have just established (8.5) in the case when

$\eta(\omega) = f(X_s(\omega))\,g(X_{s+u}(\omega)),$

and you can see that the case when

(8.8) $\eta = f_1(X_{s_1}) f_2(X_{s_2}) \cdots f_n(X_{s_n}) \quad (f_1, f_2, \ldots, f_n \in C_0)$

can be established similarly. Now let $\mathcal{H}$ be the algebra of functions $\eta$ that are sums of products of the form (8.8) and apply the Monotone-Class Theorem II.3.2 to obtain the general case of (8.5). You should check that $\partial$ causes no trouble in this proof. □

9. Strong Markov property: full version; Blumenthal's 0-1 Law. The Strong Markov Theorem (8.3) is inadequate because it only applies to $\{\mathcal{F}^\circ_{t+}\}$-stopping times, whereas, for example, the debut of a compact set for an R-process is not an $\{\mathcal{F}^\circ_{t+}\}$-stopping time (see Section II.75). We therefore need the extension to be described in this section.

For each $\mu$ in $\Pr(E_\partial)$, we now define

(9.1) $(\Omega, \mathcal{F}^\mu, \{\mathcal{F}^\mu_t\})$ to be the usual $P^\mu$ augmentation of $(\Omega, \mathcal{F}^\circ, \{\mathcal{F}^\circ_t\})$.

First, then, $\mathcal{F}^\mu$ is the $P^\mu$ completion of $\mathcal{F}^\circ$. This means that $\Lambda \in \mathcal{F}^\mu$ if and only if there exist $\Lambda_{1,\mu}, \Lambda_{2,\mu}$ in $\mathcal{F}^\circ$ with

$\Lambda_{1,\mu} \subseteq \Lambda \subseteq \Lambda_{2,\mu}, \qquad P^\mu(\Lambda_{1,\mu}) = P^\mu(\Lambda_{2,\mu});$

and then we set $P^\mu(\Lambda) := P^\mu(\Lambda_{1,\mu})$. Further, for $t \ge 0$, $\mathcal{F}^\mu_t$ is the smallest σ-algebra on $\Omega$ extending $\mathcal{F}^\circ_{t+}$ and containing all $P^\mu$ null sets in $\mathcal{F}^\mu$. Now put

(9.2) $\mathcal{F} := \bigcap_\mu \mathcal{F}^\mu, \qquad \mathcal{F}_t := \bigcap_\mu \mathcal{F}^\mu_t,$

the intersections being taken over all $\mu$ in $\Pr(E_\partial)$.

Although for each $\mu$, $(\Omega, \mathcal{F}^\mu, \{\mathcal{F}^\mu_t\}, P^\mu)$ satisfies the usual conditions (see Section II.67), $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, P^\mu)$ does not satisfy the usual conditions (except in trivial cases). However, we do have $\mathcal{F}_{t+} = \mathcal{F}_t$, because $\mathcal{F}^\mu_{t+} = \mathcal{F}^\mu_t$ $(\forall \mu)$.

(9.3) THEOREM (Debut Theorem, see II.76). For $B \in \mathcal{B}(E_\partial)$, set

$D_B := \inf\{t \ge 0 : X_t \in B\}, \qquad H_B := \inf\{t > 0 : X_t \in B\}.$
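Then $D_B$ and $H_B$ are $\{\mathcal{F}^\mu_t\}$ stopping times for every $\mu$, so that $D_B$ and $H_B$ are $\{\mathcal{F}_t\}$ stopping times.

(9.4) THEOREM (Strong Markov Theorem for $X$, definitive form). Let $T$ be an $\{\mathcal{F}_t\}$ stopping time; then, for $\mu \in \Pr(E_\partial)$, $\xi \in b\mathcal{F}_T$, $\eta \in b\mathcal{F}$, we have

(9.5)(i) $E^\mu[\eta \circ \theta_T \mid \mathcal{F}_T] = E^{X(T)}[\eta], \quad \text{a.s.}(P^\mu),$
(9.5)(ii) $E^\mu[\xi\theta_T\eta] = E^\mu[\xi E^{X(T)}\eta].$

It is necessary first to give careful thought to the technical problem of what Theorem 9.4 means, because many measurability properties are implicit in its statement. (We are sympathetic up to a point if you regard the whole business of completions as an unavoidable and artificial nuisance in a subject, probability theory, which is fundamentally a branch of applied mathematics. We shall therefore try to deal with this area in as succinct a way as possible.) Routine applications of monotone-class arguments are skipped in the following discussion.

We know from Lemma II.75.3 that

(9.6) for $\mu \in \Pr(E_\partial)$, there exists an $\{\mathcal{F}^\circ_{t+}\}$ stopping time $T(\mu)$ such that $P^\mu[T(\mu) = T] = 1$.

Further, it is easily shown that, for $\mu \in \Pr(E_\partial)$, we can find a function $\eta_\mu$ in $b\mathcal{F}^\circ$ with $P^\mu[\eta_\mu = \eta] = 1$. Then $\eta \circ \theta_T$ is, a.s.($P^\mu$), equal to the $\mathcal{F}^\circ$-measurable function $\eta_\mu \circ \theta_{T(\mu)}$, so that $\eta \circ \theta_T$ is $\mathcal{F}^\mu$-measurable. Hence

(9.7) $\eta \circ \theta_T$ is $\mathcal{F}$-measurable.

Thus the conditional expectation $E^\mu[\eta \circ \theta_T \mid \mathcal{F}_T]$ can be interpreted by reference either to the 'carrier' triple $(\Omega, \mathcal{F}, P^\mu)$ or to $(\Omega, \mathcal{F}^\mu, P^\mu)$.

Next, we prove that

(9.8) $\forall \Lambda \in \mathcal{F}$, the map $x \mapsto P^x(\Lambda)$ is universally ($\mathcal{E}^*_\partial$) measurable on $E_\partial$.

(Recall that $\mathcal{E}^*_\partial$, the universal completion of $\mathcal{E}_\partial$, is defined as

(9.9) $\mathcal{E}^*_\partial := \bigcap\{\mathcal{E}^\nu_\partial : \nu \in \Pr(E_\partial)\},$

where $(E_\partial, \mathcal{E}^\nu_\partial)$ is the $\nu$-completion of $(E_\partial, \mathcal{E}_\partial)$.)

Proof of (9.8). If $\nu \in \Pr(E_\partial)$ and $\Lambda \in \mathcal{F}$ $(\subseteq \mathcal{F}^\nu)$, we can find $\Lambda_{1,\nu}$ and $\Lambda_{2,\nu}$ in $\mathcal{F}^\circ$ with

$\Lambda_{1,\nu} \subseteq \Lambda \subseteq \Lambda_{2,\nu}$ and $P^\nu(\Lambda_{1,\nu}) = P^\nu(\Lambda_{2,\nu}).$

But it is clear from the definition of $P^\nu$ on $\mathcal{F}^\circ$ that, for $k = 1$ or $2$,

$P^\nu(\Lambda_{k,\nu}) = \int P^x(\Lambda_{k,\nu})\,\nu(dx).$

Hence

$P^x(\Lambda_{1,\nu}) \le P^x(\Lambda) \le P^x(\Lambda_{2,\nu}), \quad \forall x,$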
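and

$$\int P^x(\Lambda_{1,\nu})\,\nu(dx) = \int P^x(\Lambda_{2,\nu})\,\nu(dx).$$

Since $x \mapsto P^x(\Lambda_{k,\nu})$ is $\mathcal E_\partial$-measurable for $k = 1, 2$, it follows that $x \mapsto P^x(\Lambda)$ is $\mathcal E^\nu_\partial$-measurable. Since $\nu$ is arbitrary, (9.8) follows. □

The above proof of (9.8) yields the intuitively obvious result:

(9.10) $\forall \Lambda \in \mathcal F,\ \forall \nu \in \Pr(E_\partial)$, $\quad P^\nu(\Lambda) = \int_{E_\partial} P^x(\Lambda)\,\nu(dx).$

We know (II.73.11) that if $S$ is an $\{\mathcal F^\circ_{t+}\}$ stopping time then

(9.11) $X_S$ is $\mathcal F^\circ_{S+}$-measurable from $\Omega$ to $(E_\partial, \mathcal E_\partial)$.

By some obvious further arguments based on (9.6), we can show that

(9.12) $X_T$ is $\mathcal F_T$-measurable from $\Omega$ to $(E_\partial, \mathcal E^*_\partial)$.

By composition, it follows from (9.8) and (9.12) that

(9.13) $\forall \Lambda \in \mathcal F$, the map $\omega \mapsto P^{X(T(\omega),\omega)}(\Lambda)$ is $\mathcal F_T$-measurable.

Finally, (9.13) implies that

(9.14) $\forall \eta \in \mathrm b\mathcal F$, $\quad E^{X(T)}[\eta] \in \mathrm b\mathcal F_T$.

The measurability implications of Theorem 9.4 are now clear. Of course, the proof of the theorem is now trivial from (9.6) and the 'algebraic' Strong Markov Theorem (8.3).

(9.15) THEOREM (Blumenthal's 0-1 Law). If $\Lambda \in \mathcal F_0$ then $\forall x \in E_\partial$, $P^x(\Lambda) = 0$ or $1$.

Proof. Apply the Strong Markov Theorem with $T = 0$, $\xi = I_\Lambda$ and $\eta = I_\Lambda$. Since $P^x[X(0) = x] = 1$, $\forall x$,

$$E^x[I_\Lambda] = E^x[I_\Lambda\,\theta_0 I_\Lambda] = E^x[I_\Lambda\,E^x I_\Lambda] = (E^x[I_\Lambda])^2.$$ □

(9.16) COROLLARY. If $T$ is an $\{\mathcal F_t\}$ stopping time then $\forall x \in E_\partial$, $P^x[T = 0] = 0$ or $1$.

Proof. $\{T = 0\} \in \mathcal F_0$. □

(9.17) Example. Let $x \in E$, $B \in \mathcal E$. Prove that either $P^x[H_B = 0] = 1$, in which case $x$ is called regular for $B$, or $P^x[H_B = 0] = 0$, in which case $x$ is called irregular for $B$.

It is often difficult to decide which alternative obtains as classical examples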
like Lebesgue's thorn (see Section 7.11 of Itô and McKean [1]) demonstrate. The connection between the present concept of 'regular' and that of a 'regular' boundary point in the Dirichlet problem is described in Section I.22.

(9.18) Almost surely. A statement $S$ about points $\omega$ in $\Omega$ will be said to hold almost surely (a.s.) if

$$\Lambda := \{\omega : S(\omega) \text{ is true}\} \in \mathcal F \quad\text{and}\quad P^\mu(\Lambda) = 1,\ \forall \mu.$$

Thus 'a.s.' means 'a.s.($P^\mu$), $\forall \mu$'.

Exercises

(9.19) Let $X$ be CBM($\mathbb R^3$). Let $V \subseteq \mathbb R^3$ and let $b$ be a point of $\mathbb R^3$ such that $b$ is the tip of a cone that lies entirely within $V$. Prove that $b$ is regular for $V$ (for $X$). Why is it obvious that if $L$ is a line in $\mathbb R^3$ then no point $b$ is regular for $L$ (for $X$)?

(9.20) Let $X$ be a canonical FD process, and let $x \in E$. Set

$$U_x := \inf\{t > 0 : X_t \ne x\}.$$

Prove that

$$P^x[U_x > s + t] = P^x[U_x > s]\,P^x[U_x > t]$$

and deduce that $P^x[U_x > t] = \exp(-q_x t)$ for some $q_x$ with $0 \le q_x \le \infty$. Explain why if $X$ is honest and has continuous paths then $q_x = 0$ or $\infty$.

10. Some fundamental martingales; Dynkin's formula. It should not surprise you to learn that, at various levels of sophistication, the Strong Markov Theorem can be presented as just a corollary of the optional stopping theorem for martingales. This does not matter too much to us now, since we already have the Strong Markov Theorem. What does matter is that it is very advantageous to regard some of the traditional consequences of the Strong Markov Theorem as martingale results. This will be a recurring theme in this book. For now, let us adopt a martingale approach (guided by Meyer's book [3]) to Dynkin's formula and Blumenthal's Quasi-left-continuity Theorem.

$X$ continues to denote our FD process. Let us write

(10.1) $\mathrm b\mathcal E^* := \mathrm b\mathcal E^*_\partial \cap \{f : f(\partial) = 0\}, \qquad \mathrm m^+\mathcal E^* := \mathrm m^+\mathcal E^*_\partial \cap \{f : f(\partial) = 0\}.$

Recall that $\mathrm m^+\mathcal E^*_\partial$ denotes the set of $\mathcal E^*_\partial$-measurable functions from $E_\partial$ to $[0, \infty]$. For $f$ in $\mathrm b\mathcal E^*$ (or $\mathrm m^+\mathcal E^*$), we have

(10.2) $P_t f(x) = \int P_t(x, dy)\,f(y) = E^x f(X_t)$

for $x$ in $E$ (and, by convention, for $x = \partial$ with $P_t f(\partial) = 0$). Then $P_t : \mathrm b\mathcal E^* \to \mathrm b\mathcal E^*$.
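For an $\{\mathcal F_t\}$ stopping time $T$, and for $\lambda > 0$, we set

(10.3) $P^\lambda_T f(x) := E^x[e^{-\lambda T} f(X_T)], \qquad P_T f(x) := E^x[f(X_T)]$:

here again, we allow $f \in \mathrm b\mathcal E^*$ or $f \in \mathrm m^+\mathcal E^*$. In particular, $P^\lambda_t = e^{-\lambda t} P_t$. We have $P^\lambda_T : \mathrm b\mathcal E^* \to \mathrm b\mathcal E^*$. If $B$ is a Borel subset of $E$, we write

(10.4) $P^\lambda_B$ for $P^\lambda_{H_B}$, $\qquad H_B := \inf\{t > 0 : X_t \in B\}$.

You will appreciate that it is the necessity of completion in the Debut Theorem that forces our present concern with universally measurable functions.

It follows from (10.2) that, for $g \in C_0$ and $\lambda > 0$,

(10.5) $R_\lambda g(x) = E^x\left[\int_0^\infty e^{-\lambda t} g(X_t)\,dt\right].$

By appeal to the general form of Fubini's Theorem, we can extend (10.5) to the case when $g \in \mathrm b\mathcal E^*$ or $\mathrm m^+\mathcal E^*$.

(10.6) Exercise (simple proof of Dynkin's formula). Deduce from the Strong Markov Theorem that if $T$ is an $\{\mathcal F_t\}$ stopping time then

$$P^\lambda_T P^\lambda_t = P^\lambda_{T+t}$$

(but show that $P^\lambda_T P^\lambda_t \ne P^\lambda_t P^\lambda_T$ in general). Hence obtain Dynkin's formula: for $g \in C_0$, $\lambda > 0$, $x \in E$,

(10.7) $R_\lambda g(x) = E^x \int_0^T e^{-\lambda t} g(X_t)\,dt + P^\lambda_T R_\lambda g(x).$

Of course, $P^\lambda_T R_\lambda g(x)$ means $(P^\lambda_T R_\lambda g)(x)$.

The alternative proof we now give of Dynkin's formula (10.7) is the key to many of the deepest results in the subject. For the moment, fix $g \in C_0$ and $\lambda > 0$, and put

(10.8) $\eta := \int_0^\infty e^{-\lambda s} g(X_s)\,ds \in \mathrm b\mathcal F^\circ.$

Then $R_\lambda g(x) = E^x \eta$, $\forall x$. Since

$$\eta = \int_0^t e^{-\lambda s} g(X_s)\,ds + e^{-\lambda t}\,\eta\circ\theta_t,$$

we can use the simple Markov property to find the following:

(10.9) for every $x$, $\quad t \mapsto \int_0^t e^{-\lambda s} g(X_s)\,ds + e^{-\lambda t} R_\lambda g(X_t)$ is an R-modification of the UI martingale $t \mapsto E^x[\eta \mid \mathcal F_t]$.

By the Optional-Stopping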
Theorem, if $T$ is an $\{\mathcal F_t\}$ stopping time (and hence an $\{\mathcal F^x_t\}$ stopping time),

$$E^x \int_0^T e^{-\lambda s} g(X_s)\,ds + E^x[e^{-\lambda T} R_\lambda g(X_T)] = R_\lambda g(x);$$

in other words, Dynkin's formula (10.7) holds.

Now pick $f \in \mathcal D(\mathcal G)$ and $\lambda > 0$, and apply (10.9) to $g = (\lambda - \mathcal G) f$, for which $R_\lambda g = f$. We see that

(10.10) for $\lambda > 0$, $f \in \mathcal D(\mathcal G)$ and $x \in E$, $\quad C^{\lambda,f}_t := e^{-\lambda t} f(X_t) - f(X_0) + \int_0^t e^{-\lambda s} (\lambda - \mathcal G) f\circ X_s\,ds$ defines a UI R-martingale $C^{\lambda,f}$ relative to $(\{\mathcal F_t\}, P^x)$.

By the Optional-Stopping Theorem, if $T$ is an $\{\mathcal F_t\}$ stopping time then

(10.11) $E^x[e^{-\lambda T} f(X_T)] - f(x) = E^x \int_0^T e^{-\lambda s} (\mathcal G - \lambda) f\circ X_s\,ds.$

If $E^x(T) < \infty$ for some $x$, we can let $\lambda \downarrow 0$ to obtain (for such $x$):

(10.12) $E^x f(X_T) - f(x) = E^x \int_0^T \mathcal G f\circ X_s\,ds.$

The formula (10.12) is also called Dynkin's formula. Since (10.7) and (10.11) are the same and (10.12) is an immediate corollary, we shall mean by 'Dynkin's formula' any or all of (10.7), (10.11) and (10.12).

It is easy to verify the following directly:

(10.13) for $f \in \mathcal D(\mathcal G)$, $\quad C^f_t := f(X_t) - f(X_0) - \int_0^t \mathcal G f\circ X_s\,ds$ defines a martingale relative to $(\{\mathcal F_t\}, P^x)$ for all $x$.

This corresponds to the analytical fact that, for $f \in \mathcal D(\mathcal G)$,

$$P_t f - f - \int_0^t P_s \mathcal G f\,ds = 0.$$

(10.14) Example. Let $X$ be CBM($\mathbb R$) so that $\mathcal G = \tfrac12\,d^2/dx^2$ on its natural domain in $C_0(\mathbb R)$. Fix $b > 0$ and $\lambda > 0$. We can certainly find $f$ in $\mathcal D(\mathcal G)$ with

$$f(x) = \exp[x(2\lambda)^{1/2}] \quad\text{for } -\infty < x \le b.$$

Apply (10.11) with $x = 0$ and $T = H_b$ to find that

$$E^0[e^{-\lambda H_b}]\,f(b) - f(0) = 0,$$

since $(\mathcal G - \lambda) f = 0$ on $(-\infty, b)$. Hence

$$E^0 e^{-\lambda H_b} = \exp[-b(2\lambda)^{1/2}],$$

in agreement with our earlier findings. □
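The Laplace transform obtained in Example 10.14 can be checked by simulation. The following is a minimal sketch, not part of the original argument: it assumes the standard fact (a consequence of the reflection principle, taken here as given) that under $P^0$ the hitting time $H_b$ has the same law as $b^2/Z^2$ with $Z$ standard normal. The function names, sample size and seed are illustrative choices.

```python
import math
import random


def laplace_of_hitting_time(b: float, lam: float, n_samples: int, seed: int) -> float:
    """Monte Carlo estimate of E^0[exp(-lam * H_b)] for standard Brownian motion.

    Assumption (not from the text): H_b has the law of b**2 / Z**2 with Z
    standard normal, so no path discretisation is needed.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        if z == 0.0:
            # H_b would be +infinity, contributing exp(-infinity) = 0.
            continue
        total += math.exp(-lam * b * b / (z * z))
    return total / n_samples


def closed_form(b: float, lam: float) -> float:
    """The value exp(-b * sqrt(2 * lam)) from Example 10.14."""
    return math.exp(-b * math.sqrt(2.0 * lam))


estimate = laplace_of_hitting_time(b=1.0, lam=0.5, n_samples=200_000, seed=12345)
exact = closed_form(b=1.0, lam=0.5)
print(f"Monte Carlo: {estimate:.4f}   exp(-b*sqrt(2*lam)): {exact:.4f}")
```

Sampling $H_b$ directly from its law avoids the bias that a time-discretised path would introduce when detecting the first passage, so the two printed numbers should agree up to Monte Carlo error.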
11. Quasi-left-continuity. We shall return to obviously applicable ideas very shortly (and there are a lot of applications coming up soon). First, it is convenient to prove Theorem 11.1, which (precisely) asserts that $X$ is quasi-left-continuous (qlc). Later in this chapter, we shall see how the modification of the qlc property needed for Ray processes clarifies the role of branch-points. However, it is only when we begin to consider the modern 'general theory of processes' in Volume 2 that we find what 'qlc' is really about. Still, the historical order is a good one to follow when learning.

(11.1) THEOREM (Blumenthal's qlc Theorem). Let $(T_n)$ be a strictly increasing sequence of stopping times with limit $T$. Then $X(T_n) \to X(T)$, a.s. on $\{T < \infty\}$.

Note. 'Stopping time' here means, of course, '$\{\mathcal F_t\}$ stopping time'.

Proof. It is enough to prove the theorem when $T \le c$ for some non-random constant $c$. (For the general case, we can then replace $T_n$ by $T_n \wedge (c - c/n)$ and finally let $c \uparrow \infty$ through a countable sequence.) So assume that $T \le c$ for some $c$.

Since $X$ has R-paths, $\lim X(T_n)$ exists and equals $X(T-)$. Thus we must prove that, a.s., $X_T = X_{T-}$. Define (but note that this is not the fundamental definition of $\mathcal F_{T-}$; see Volume 2):

(11.2) $\mathcal F_{T-} := \sigma(\mathcal F_{T_n} : n = 1, 2, 3, \ldots).$

Fix $x$ and, for the moment, fix $f$ in $\mathcal D(\mathcal G)$. By the Martingale-Convergence Theorem (Theorem II.69.2) we have

(11.3) $E^x[f(X_T) \mid \mathcal F_{T_n}] \to E^x[f(X_T) \mid \mathcal F_{T-}]$, a.s.($P^x$).

But, by (10.13) and the Optional-Stopping Theorem (using the condition $T \le c$ for justification),

(11.4) $E^x[f(X_T) \mid \mathcal F_{T_n}] = f(X_{T_n}) + E^x\left[\int_{T_n}^T \mathcal G f\circ X_s\,ds \,\Big|\, \mathcal F_{T_n}\right]$, a.s.($P^x$).

Now we can choose a subsequence $(n(k))$ with

$$E^x\left|\int_{T_{n(k)}}^T \mathcal G f\circ X_s\,ds\right| < 2^{-2k},$$

so that (by the Borel-Cantelli Lemma and the contraction property of conditional expectations), a.s.($P^x$),

$$\left|E^x\left[\int_{T_{n(k)}}^T \mathcal G f\circ X_s\,ds \,\Big|\, \mathcal F_{T_{n(k)}}\right]\right| < 2^{-k}$$

for all large $k$ (greater than $k_0(\omega)$). Hence, letting $n$ tend to $\infty$ through $(n(k))$
in (11.4), we obtain

(11.5) $E^x[f(X_T) \mid \mathcal F_{T-}] = f(X_{T-})$, a.s.($P^x$).

Since $\mathcal D(\mathcal G)$ is separable and dense in $C_0$, it now follows that (for our fixed $x$) (11.5) is true if $f \in C_0$. Hence, for $f \in C_0$ (applying (11.5) to both $f$ and $f^2$),

$$E^x[\{f(X_T) - f(X_{T-})\}^2 \mid \mathcal F_{T-}] = f(X_{T-})^2 - 2f(X_{T-})^2 + f(X_{T-})^2 = 0, \quad\text{a.s.}(P^x).$$

The rest is trivial. □

Not only is $X$ quasi-left-continuous, but the filtration $\{\mathcal F_t\}$ is quasi-left-continuous.

(11.6) THEOREM (Meyer). The filtration $\{\mathcal F_t\}$ is qlc: if $(T_n)$ is an increasing sequence of stopping times with limit $T$ then

$$\mathcal F_T = \sigma(\mathcal F_{T_n} : n = 1, 2, 3, \ldots).$$

See Theorem VI.18.2 in Volume 2.

12. Characteristic operator. A point $x$ of $E$ is called absorbing if either of the following two equivalent conditions holds:

(i) $P^x[X(t) = x, \forall t] = 1$;
(ii) $P_t(x, \{x\}) = 1$, $\forall t$.

(12.1) LEMMA (Dynkin). Let $x \in E$ and let $d$ be a metric giving the topology of $E$. If $x$ is not absorbing then, for all sufficiently small $\eta > 0$, $E^x V_{\eta,x} < \infty$, where

$$V_{\eta,x} := \inf\{t : d(x, X_t) > \eta\}.$$

So as not to interrupt things, we defer the proof of this lemma to (12.4).

We now define Dynkin's characteristic operator $\mathcal C$ of $X$. If $x$ is absorbing, define

$$\mathcal C f(x) := 0, \quad \forall f \in C_0.$$

If $x$ is not absorbing, define

$$\mathcal C f(x) := \lim_{\eta \downarrow 0} \frac{E^x[f\circ X(V_{\eta,x})] - f(x)}{E^x V_{\eta,x}}$$

if the limit exists. The domain $\mathcal D(\mathcal C)$ is defined to be the set of those $f$ in $C_0$ for which $\mathcal C f(x)$ exists for every $x$ and for which $\mathcal C f(\cdot) \in C_0$.

(12.2) THEOREM (Dynkin's Characteristic-Operator Theorem for FD processes). We have $\mathcal G = \mathcal C$.
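Proof. It is clear from the definition of $\mathcal C$ that $\mathcal C$ satisfies Dynkin's Maximum Principle (6.8). Hence it is sufficient to show that $\mathcal C$ extends $\mathcal G$. However, this fact is an immediate consequence of Dynkin's formula (10.12) with $T = V_{\eta,x}$, since $\mathcal G f$ is continuous at $x$. □

Dynkin's splendid theorem has all sorts of important consequences. Let us first see what it has to say for CBM($\mathbb R^n$).

(12.3) Example. Let $X$ be CBM($\mathbb R^n$). Choose $d$ to be the Euclidean metric on $\mathbb R^n$. Since $B_t^2 - t$ is a martingale if $B$ is a BM$_0(\mathbb R)$, it is 'obvious' that

$$E^x V_{\eta,x} = \eta^2/n.$$

Exercise. Give rigorous proof by the optional-stopping theorem.

Further, as we have seen before, the $P^x$ distribution of the variable $X(V_{\eta,x})$ is the uniform probability distribution $\mu_{\eta,x}$ (say) on the sphere $S_{\eta,x} := \{y : d(x, y) = \eta\}$. Now, if $f \in \mathcal D(\tfrac12\Delta)$, then the Gauss-Green Theorem shows that

$$\lim_{\eta \downarrow 0} n\eta^{-2}\left[\int f(y)\,\mu_{\eta,x}(dy) - f(x)\right] = \tfrac12 \Delta f(x),$$

so that $f \in \mathcal D(\mathcal C)$ and $\mathcal C f = \tfrac12 \Delta f$. Hence $\mathcal G\ (= \mathcal C)$ is an extension of $\tfrac12\Delta$. We already know that $\mathcal G = \tfrac12\Delta$ if $n = 1$. See Section 7.2 of Itô and McKean [1] for the fact that if $n \ge 2$ then $\mathcal G$ is the closure of $\tfrac12\Delta$ and a proper extension of $\tfrac12\Delta$.

(12.4) Proof of Lemma 12.1. Suppose that $x$ is not absorbing. Set

$$B_\varepsilon(x) := \{y : d(x, y) < \varepsilon\}.$$

Then, for some $\varepsilon > 0$, $t > 0$ and $\alpha > 0$,

$$P^{+\partial}_t(x, E_\partial \setminus \bar B_\varepsilon(x)) > \alpha,$$

where $\bar B_\varepsilon(x)$ is the closure of $B_\varepsilon(x)$ in $E_\partial$. Let $G$ be the open set $G := E_\partial \setminus \bar B_\varepsilon(x)$. Let $(h_n)$ be a sequence of continuous functions on $E_\partial$ increasing to the indicator function of $G$. Then $P^{+\partial}_t h_n \uparrow P^{+\partial}_t(\cdot, G)$, so that

$$\{y : P^{+\partial}_t(y, G) > \alpha\} = \bigcup_n \{y : P^{+\partial}_t h_n(y) > \alpha\}$$

is open. Hence, for some positive $\eta$, which we can and do suppose to be less than $\varepsilon$,

$$P^{+\partial}_t(y, E_\partial \setminus \bar B_\varepsilon(x)) > \alpha, \quad \forall y \in B_\eta(x).$$

An obvious use of the simple Markov property now shows that

$$P^x[X_{kt} \in B_\eta(x),\ \forall k \le n] \le (1 - \alpha)^n,$$

and it is an elementary consequence that $E^x V_{\eta,x} < \infty$. □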
13. Feller-Dynkin diffusions. We now assume that $E = \mathbb R^n$, but $E$ could equally well be an $n$-dimensional $C^\infty$ manifold. Recall that the lifetime $\zeta$ of our process $X$ is defined as follows:

$$\zeta(\omega) := \inf\{t : X_t(\omega) = \partial\}.$$

By an FD diffusion on $\mathbb R^n$, we mean an FD process $X$ with the following additional properties:

(13.1)(i) the paths $t \mapsto X_t(\omega)$ are continuous on $[0, \zeta)$;
(13.1)(ii) the domain $\mathcal D(\mathcal G)$ of the generator $\mathcal G$ of $X$ contains $C^\infty_K := C^\infty_K(\mathbb R^n)$, the space of infinitely differentiable functions of compact support.

Let $X$ be an FD diffusion. Then the restriction $\mathcal L$ (say) of $\mathcal G$ to $C^\infty_K$ satisfies the following conditions:

(13.2)(i) $\mathcal L$ is a linear map from $C^\infty_K$ to $C_0$;
(13.2)(ii) $\mathcal L$ is local: if functions $f$ and $g$ in $C^\infty_K$ agree in some neighbourhood of a point $x$, then $\mathcal L f(x) = \mathcal L g(x)$;
(13.2)(iii) $\mathcal L$ satisfies the maximum principle: if $f$ in $C^\infty_K$ attains its maximum at $x$ and $f(x) \ge 0$, then $\mathcal L f(x) \le 0$.

The property (13.2)(i) is obvious, and (13.2)(iii) is already familiar to us. Since $X$ has continuous paths, it is clear from the definition of $\mathcal C$ that $\mathcal C$ is local. The property (13.2)(ii) now follows because $\mathcal L \subseteq \mathcal G = \mathcal C$.

The three properties (13.2) imply the following theorem.

(13.3) THEOREM (Dynkin). The restriction $\mathcal L$ of $\mathcal G$ to $C^\infty_K$ is a second-order elliptic operator of the form

$$\mathcal L f(x) = \tfrac12 \sum_i \sum_j a_{ij}(x)\,\partial_i \partial_j f(x) + \sum_i b_i(x)\,\partial_i f(x) - c(x) f(x),$$

where $\partial_i$ denotes $\partial/\partial x_i$ and

(13.4)(i) $\forall i, j$, the functions $a_{ij}(\cdot)$, $b_i(\cdot)$ and $c(\cdot)$ are continuous;
(13.4)(ii) $\forall x$, the matrix $\{a_{ij}(x) : 1 \le i, j \le n\}$ is non-negative definite symmetric;
(13.4)(iii) $\forall x$, $c(x) \ge 0$.

Proof. Note that it follows from the local and maximum-principle properties of $\mathcal L$ that $\mathcal L$ satisfies the Local-Maximum Principle: if $f$ in $\mathcal D(\mathcal L)$ has a local maximum at $x$ and $f(x) \ge 0$, then $\mathcal L f(x) \le 0$.

For $x$ in $E$, we can find $\varphi$ in $C^\infty_K$ with $\varphi = 1$ in a neighbourhood of $x$. For such a $\varphi$, we can define $c(x) = -\mathcal L \varphi(x)$; this defines $c(x)$ independently of the particular $\varphi$ chosen.

For fixed $x = (x_1, x_2, \ldots, x_n)$ in $E$, set $b_i(x) = \mathcal L \varphi_i(x)$, where $\varphi_i \in C^\infty_K$ and $\varphi_i(y) = y_i - x_i$ near $x$, and $a_{ij}(x) := \mathcal L(\varphi_i \varphi_j)(x)$.
Then $c$, $b_i$ and $a_{ij}$ are continuous. For $\lambda_1, \lambda_2, \ldots, \lambda_n \in \mathbb R$, the function $h$ with

$$h(y) := -\Bigl(\sum_i \lambda_i \varphi_i(y)\Bigr)^2$$

has a local maximum at $x$, so that

$$\mathcal L h(x) = -\sum_i \sum_j \lambda_i \lambda_j a_{ij}(x) \le 0.$$

Hence the symmetric matrix $\{a_{ij}(x) : 1 \le i, j \le n\}$ is non-negative definite.

Now, if $f \in C^\infty_K$, Taylor's formula gives (for $y$ near $x$)

$$f(y) = \psi(y) + o(|y - x|^2),$$

where

$$\psi(y) := f(x)\varphi(y) + \sum_i \partial_i f(x)\,\varphi_i(y) + \tfrac12 \sum_i \sum_j \partial_i \partial_j f(x)\,\varphi_i(y)\varphi_j(y).$$

(Recall that $\varphi(y) = 1$ near $x$.) Note that

$$\mathcal L \psi(x) = -c(x) f(x) + \sum_i b_i(x)\,\partial_i f(x) + \tfrac12 \sum_i \sum_j a_{ij}(x)\,\partial_i \partial_j f(x).$$

For $\varepsilon > 0$, a function in $C^\infty_K$ defined near $x$ by

$$y \mapsto f(y) - \psi(y) - \varepsilon |y - x|^2$$

has a local maximum at $x$. Hence

$$\mathcal L f(x) - \mathcal L \psi(x) - \varepsilon \sum_i a_{ii}(x) \le 0.$$

You can see why $\mathcal L f(x) = \mathcal L \psi(x)$, so the proof is complete. □

Suppose given a second-order elliptic operator $\mathcal L$ from $C^\infty_K(\mathbb R^n)$ to $C_0(\mathbb R^n)$ of the form described in Theorem 13.3 (equivalently, satisfying the conditions (13.2)). Suppose that $\{P_t\}$ is an FD semigroup on $\mathbb R^n$ with generator $\mathcal G$ extending $\mathcal L$ and that $X$ is the canonical FD process associated with $\{P_t\}$. We now prove that, a.s., $X$ has continuous paths up to time $\zeta$, so that (ignoring null sets) $X$ is an FD diffusion.

(13.5) THEOREM (Dynkin, Kinney). Let $\mathcal L : C^\infty_K \to C_0$ be an elliptic operator of the type described in Theorem 13.3. Suppose that $\{P_t\}$ is an FD semigroup with generator extending $\mathcal L$ and that $X$ is the associated FD process. Then, a.s., the paths of $X$ are continuous on $[0, \zeta)$.

Proof. To avoid annoyances, we give the proof when $\{P_t\}$ is further assumed to be honest, so $\zeta = \infty$, a.s. (Since then $P_t 1 = 1$, $\forall t$, it is easy to see that $c = 0$. However, the condition '$c = 0$' does not imply honesty, because it does not preclude explosion in which $X$ reaches infinity 'continuously' in a finite time. More about explosion, and more about the case when $c \ne 0$, later.)
Let $K$ be a closed ball in $\mathbb R^n$ and let $G$ be an open ball containing $K$. It is well known that there exists $f \in C^\infty_K$ with $f = 1$ on $K$, $f = 0$ on $\mathbb R^n \setminus G$, $0 \le f \le 1$ everywhere. Since $c = 0$, we have $\mathcal L f = 0$ on $K$. For $x \in K$, we have

$$P_t(x, \mathbb R^n \setminus G) \le f(x) - P_t f(x) = -\int_0^t P_s \mathcal L f(x)\,ds.$$

Since $\|P_s \mathcal L f - \mathcal L f\| \to 0$ $(s \downarrow 0)$, it is now clear that

(13.6) $\sup_{x \in K} t^{-1} P_t(x, \mathbb R^n \setminus G) \to 0 \quad (t \downarrow 0).$

We now wish to prove that, for each compact $K$, for $\varepsilon, u > 0$ and for $x \in K$,

(13.7) $P^x\Bigl[\bigcup_{k=0}^{n-1} \bigl\{|X(ku/n) - X((k+1)u/n)| > 3\varepsilon\bigr\};\ X_s \in K,\ \forall s \le u\Bigr] \to 0$ as $n \uparrow \infty$.

The theorem will then be obvious. (See (7.21).) The probability in (13.7) is dominated by

$$n \sup_{y \in K} P_{u/n}(y, \mathbb R^n \setminus B_{3\varepsilon}(y)),$$

where $B_\varepsilon(x)$ denotes the open ball of radius $\varepsilon$ around $x$. Hence we need only show that

$$t^{-1} \sup_{y \in K} P_t(y, \mathbb R^n \setminus B_{3\varepsilon}(y)) \to 0 \quad (t \downarrow 0).$$

This is an immediate consequence of (13.6). For let $x_1, x_2, \ldots, x_r$ in $K$ be such that $B_\varepsilon(x_1), B_\varepsilon(x_2), \ldots, B_\varepsilon(x_r)$ cover $K$. Apply (13.6) to the case where $K = \bar B_\varepsilon(x_k)$ and $G = B_{2\varepsilon}(x_k)$. Let $\eta > 0$ be given. Then there exists $\delta_k > 0$ such that, $\forall t < \delta_k$,

$$\sup_{y \in B_\varepsilon(x_k)} t^{-1} P_t(y, \mathbb R^n \setminus B_{3\varepsilon}(y)) \le \sup_{y \in B_\varepsilon(x_k)} t^{-1} P_t(y, \mathbb R^n \setminus B_{2\varepsilon}(x_k)) < \eta.$$

Now take $\delta := \min(\delta_1, \delta_2, \ldots, \delta_r)$, etc. □

Wiener's Theorem is obviously a corollary of Theorem 13.5.

Two problems remain.

(i) Does there exist an FD semigroup with generator $\mathcal G$ extending $\mathcal L$?
(ii) If so, is there only one such semigroup?

See Section V.22 for the answers.

(13.8) The weak-continuity problem for $\{P^x\}$. It would be wrong to leave the present theoretical discussion of the Feller property without mentioning the connection with weak continuity. (This connection will be clearly apparent in our later treatment of Stroock-Varadhan theory. References for the special case of diffusions will be given at that stage.)

If $X$ is an honest FD diffusion for which $P_t : C_b(\mathbb R^n) \to C_b(\mathbb R^n)$, then the map
$x \mapsto P^x$ is continuous from $\mathbb R^n$ to $\Pr(W)$, where $W$ is the space of continuous paths in $\mathbb R^n$. For all FD diffusions, we can make the same statement, provided we make a suitable slight adjustment of the concept of weak convergence.

It has been mentioned earlier that the theory of weak convergence is very highly developed in the case when $W$ is replaced by the space $D$ of R-paths (with values in a compact metric space $E_\partial$) with the Skorokhod $J_1$ topology. This provides the appropriate setting for studying weak continuity of $\{P^x\}$ for general FD semigroups. See Skorokhod's classic paper [2], and Billingsley [3], Aldous [1] and Ethier and Kurtz [1] for interesting work.

14. Characterisation of continuous real Lévy processes. Let $X$ be a continuous 1-dimensional Lévy process, so that $X$ has stationary independent increments. Then $X$ is a continuous Markov process with shift-invariant transition function

$$P_t(x, x + \Gamma) = P_t(0, \Gamma) \quad (t \ge 0,\ x \in \mathbb R,\ \Gamma \in \mathcal B).$$

We shall use Dynkin's characteristic-operator formula to prove Lévy's theorem that $X_t = \sigma B_t + \mu t$ for some Brownian motion $B$ and constants $\sigma$ and $\mu$.

It is strictly elementary to show that $\{P_t\}$ has the FD property; and, since $X$ is continuous, $X$ is strong Markov. We do not yet know that the domain of the generator of $X$ extends $C^\infty_K$. Write

$$\mathcal D := \{f \in C_0 : f'' \in C_0\}.$$

Note that, for $f \in \mathcal D$, $f' \in C_0$ because

$$f(x + 1) - f(x) = f'(x) + \tfrac12 f''(x + \theta_x)$$

for some $\theta_x$ in $(0, 1)$. We shall prove that every element of $\mathcal D$ belongs to the domain $\mathcal D(\mathcal C)$ of the characteristic operator $\mathcal C$ of $X$ and that there exist constants $\sigma \in \mathbb R^+$ and $\mu \in \mathbb R$ such that

$$\mathcal C f = \tfrac12 \sigma^2 f'' + \mu f' \quad (f \in \mathcal D).$$

Since the operator $\tfrac12 \sigma^2\,d^2/dx^2 + \mu\,d/dx$ with domain $\mathcal D$ is exactly the generator of the FD semigroup of $\sigma B_t + \mu t$, the desired result follows.

We shall assume that, for $a < x < b$,

$$0 < P^x[H_a < H_b] = 1 - P^x[H_b < H_a] < 1,$$

where $H_y := \inf\{t : X_t = y\}$. (The remaining cases are easily shown to be trivial in that, for them, $X_t = X_0 + \mu t$ for some $\mu \in \mathbb R$.)

For $h > 0$, put

$$\tau^h_0 := 0, \qquad \tau^h_{n+1} := \inf\{t > \tau^h_n : |X(t) - X(\tau^h_n)| = h\}.$$
Then $\{X(\tau^h_n) : n = 0, 1, 2, \ldots\}$ is just a simple (Bernoulli) random walk. From standard elementary results on gambler's ruin (see Sections 1-3 of Chapter XIV of Feller [1]), we can show that there exist constants $\gamma \in \mathbb R$ and $\beta \in \mathbb R^{++}$ such that, for $a < x < b$,

(14.1) $P^x[H_a < H_b] = \dfrac{e^{\gamma b} - e^{\gamma x}}{e^{\gamma b} - e^{\gamma a}} \quad \Bigl(:= \dfrac{b - x}{b - a} \text{ if } \gamma = 0\Bigr),$

(14.2) $E^x[H_a \wedge H_b] = \dfrac{\beta}{\gamma} \cdot \dfrac{(b - a)e^{\gamma x} - (b - x)e^{\gamma a} - (x - a)e^{\gamma b}}{e^{\gamma a} - e^{\gamma b}} \quad \bigl(:= \tfrac12 \beta (b - x)(x - a) \text{ if } \gamma = 0\bigr).$

To prove (14.1), (14.2), first obtain $P^x[H_a < H_b]$ and $E^x[H_a \wedge H_b]$ (in terms of $E^0[H_{-h} \wedge H_h]$) when $x - a$ and $b - x$ are both multiples of $h$, and employ obvious monotonicity properties in 'letting $h \downarrow 0$'. You will find that $\beta$ can be defined as

(14.3) $\beta := \lim_{h \downarrow 0} 2h^{-2} E^0[H_{-h} \wedge H_h] > 0;$

and it is indeed the existence of the limit at (14.3) rather than the much more informative (14.2) that we really require. Recall that $\mathcal C f(x)$ is defined as

(14.4) $\mathcal C f(x) := \lim_{\eta \downarrow 0} \dfrac{E^x f(X(\tau^\eta_1)) - f(x)}{E^x[\tau^\eta_1]},$

provided the limit exists. But, from (14.1) and (14.2), we find that (for $\gamma \ne 0$)

$$\mathcal C f(x) = \frac{\gamma}{\beta} \lim_{\eta \downarrow 0} \frac{(e^{\gamma \eta} - 1) f(x - \eta) + (1 - e^{-\gamma \eta}) f(x + \eta) - (e^{\gamma \eta} - e^{-\gamma \eta}) f(x)}{\eta\,(e^{\gamma \eta} + e^{-\gamma \eta} - 2)},$$

and now it follows (by Taylor series expansion) that, for $f \in \mathcal D$, we have $f \in \mathcal D(\mathcal C)$ and

$$\mathcal C f = \tfrac12 \sigma^2 f'' + \mu f',$$

where $\sigma^2 := 2\beta^{-1}$, $\mu := -\gamma \beta^{-1}$. The proof of Lévy's Theorem is complete. □

Exercise. Explain why the 1-dimensional result just proved implies the $n$-dimensional case (I.28.12) of Lévy's result.

For left-invariant diffusions on Lie groups, see Section V.35 in Volume 2.

15. Consolidation. We can profitably draw together a few threads. You will notice that the first sentence of the next section reads: 'Let $X = (X_t, \Omega, \{\mathcal F_t\}, P^x : x \in E_\partial)$ be an FD process with transition function $\{P_t\}$.' Let us revise what
is involved in this statement.

The space $E$ is an LCCB and the space $E_\partial := E \cup \partial$ is compact metrisable. We can regard $\{P_t\}$ as a semigroup on $C_0(E)$ satisfying (6.6)(i)-(iv) (but note particularly the significance of (6.6)(iv)*) or as the corresponding transition function derived via (6.2). We can therefore also regard $P_t$ as a map $P_t : \mathrm b\mathcal E \to \mathrm b\mathcal E$. The extended transition function $\{P^{+\partial}_t\}$ is an FD transition function on $E_\partial$.

Since we have so far met only canonical FD processes, we take $X$ to be the canonical process described in Section 7. The process $X$ has R-paths, and if either $X(s-, \omega) = \partial$ or $X(s, \omega) = \partial$ then $X(t, \omega) = \partial$ $(\forall t \ge s)$. The lifetime $\zeta$ of $X$ is defined as

$$\zeta := \inf\{t : X_t = \partial\}.$$

The $\sigma$-algebra $\mathcal F^\circ_t$ is as in (7.16), and $\mathcal F_t$ is as in (9.2). We note particularly the Debut Theorem 9.3.

The process $X$ is strong Markov relative to the filtration $\{\mathcal F_t\}$ in the sense described in Theorem 9.4, and is quasi-left-continuous in the sense of Theorem 11.1. Moreover (Theorem 11.6), the filtration $\{\mathcal F_t\}$ is also qlc.

The resolvent of $\{P_t\}$ (or of $X$) is defined in (3.10). Dynkin's formula (10.7), which 'decomposes the resolvent at a stopping time $T$', is extremely important.

Finally, the generator $\mathcal G$ of $X$ may be defined via either (4.11) or Dynkin's characteristic-operator formula (12.2) (which are equivalent for our FD process $X$).

3. ADDITIVE FUNCTIONALS

16. PCHAFs; λ-excessive functions; Brownian local time. Let

$$X = (X_t, \Omega, \{\mathcal F_t\}, P^x : x \in E_\partial)$$

be an FD process with transition function $\{P_t\}$. (You will see that the FD property is not really used in our results on additive functionals, so that we can (and shall) apply these results in more general contexts when these arise. We have a clear idea of what an FD process is, and it provides a good enough context to be getting on with.)

Let $c$ be a measurable function from $E$ to $[0, \infty)$. (Take $c(\partial) = 0$ by convention.) Define

(16.1) $A_t(\omega) := \int_0^t c\circ X_s(\omega)\,ds$, or $A_t := \int_0^t c(X_s)\,ds$ for short.
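The integral (16.1) is easy to imitate on a sampled path. The sketch below is only an illustration: the path is a made-up list of values on a uniform time grid, the integral is replaced by a left-endpoint Riemann sum, and the shift operator is modelled by dropping the first few samples. With $c$ an indicator function the result is a discrete occupation time, and the sum inherits the additivity that the continuous-time integral has under time shifts. All names are hypothetical.

```python
from typing import Callable, List, Sequence


def additive_functional(
    path: Sequence[float], dt: float, c: Callable[[float], float]
) -> List[float]:
    """Left-endpoint Riemann sums: result[k] approximates the integral of c(X_s) over [0, k*dt]."""
    values = [0.0]
    for x in path[:-1]:
        values.append(values[-1] + c(x) * dt)
    return values


def shift(path: Sequence[float], k: int) -> Sequence[float]:
    """Discrete stand-in for the shift operator: the path restarted at grid index k."""
    return path[k:]


def indicator_unit_interval(x: float) -> float:
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


# A made-up sampled path on a grid of mesh 0.25.
sample_path = [0.0, 0.5, 1.2, 0.8, -0.3, 0.4, 1.5, 2.0, 0.9]
dt = 0.25

A = additive_functional(sample_path, dt, indicator_unit_interval)
A_shifted = additive_functional(shift(sample_path, 3), dt, indicator_unit_interval)

# Additivity on the grid: A[s + t] equals A[s] plus the functional of the shifted path at t.
print(A[8], A[3] + A_shifted[5])
```

The two printed numbers coincide because the Riemann sum over the first eight grid cells splits into the sum over the first three cells plus the sum over the next five, which is the discrete form of additivity under shifts.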
Expressions of this type occur as cost functions in control theory, and represent occupation times when $c$ is an indicator function. In practice, we often wish to find the $P^x$ distribution of $A_t$ for each $x$. For this purpose we use the Feynman-Kac (FK) formula, first developed for quantum-theoretic applications. We prove the FK formula in Section 19.

A second major application of additive functionals is as compensators of certain potentials. For example, if $A_t$ is defined as in (16.1) and $h(x) := E^x A_\infty$, assumed everywhere finite, then

(16.2) $E(A_\infty \mid \mathcal F_t) = A_t + h(X_t)$
expresses the Doob decomposition of the supermartingale $h(X_t)$ (you should check that this is a supermartingale!). More commonly, we first discount time to ensure convergence, forming, for example,

$$\varphi(x) := E^x \int_0^\infty e^{-\lambda t} f(X_t)\,dt =: R_\lambda f(x).$$

The analogue of (16.2) now is

$$E\left[\int_0^\infty e^{-\lambda u} f(X_u)\,du \,\Big|\, \mathcal F_t\right] = \int_0^t e^{-\lambda u} f(X_u)\,du + e^{-\lambda t} R_\lambda f(X_t),$$

expressing the Doob decomposition of the supermartingale $e^{-\lambda t} R_\lambda f(X_t)$. To say that $e^{-\lambda t} R_\lambda f(X_t)$ is a supermartingale amounts to saying that $g := R_\lambda f \ge e^{-\lambda t} P_t g$ for all $t \ge 0$. It is almost the case that if $\psi$ is a function such that $\psi \ge e^{-\lambda t} P_t \psi$ for all $t \ge 0$ then $\psi = R_\lambda f$ for some $f$; the exact statement is the Volkonskii-Sur-Meyer Theorem 16.7.

The third major use of additive functionals is as random clocks to time-change Markov processes; we saw a very important use of this technology in I.5.13, and will explore the technique thoroughly in Sections 22-25, and again in Volume 2.

(16.3) DEFINITION (PCHAF). A perfect continuous homogeneous additive functional (PCHAF) of $X$ is an $\{\mathcal F_t\}$-adapted process $A$ such that, for some set $\Omega_0$ in $\mathcal F$ with $P^x(\Omega_0) = 1$, $\forall x$, the following properties hold for every $\omega$ in $\Omega_0$:

(i) $t \mapsto A_t(\omega)$ is continuous, non-decreasing and $A_0(\omega) = 0$;
(ii) $\forall s$, $\forall t$, $A_{s+t}(\omega) = A_s(\omega) + A_t(\theta_s \omega)$;
(iii) $A(\omega)$ is constant on $[\zeta(\omega), \infty)$.

Terminological note. In Dynkin [2], 'perfect' refers to the 'adapted' property of $A$. In Blumenthal and Getoor [1], 'perfect' refers to the fact that the 'exceptional' set $\Omega \setminus \Omega_0$ in (ii) can be chosen independently of $s$ and $t$. Dynkin calls this 'strict homogeneity'. We win either way. Everyone would agree as to what a PCHAF is.

If, for example, the function $c$ in (16.1) is (non-negative and) bounded then (16.1) defines a PCHAF $A$. (Some boundedness condition on $c$ is needed to keep $A_t$ finite.) It is true that all PCHAFs are 'limits' of such 'integral' PCHAFs, and this is reflected in the important Existence and Uniqueness Theorem 16.7.

(16.4) DEFINITION (uniformly λ-excessive). Let $\lambda > 0$ be fixed.
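An $\mathcal E^*$-measurable function $f$ from $E$ to $\mathbb R$ is called uniformly λ-excessive if

(i) $f$ is bounded;
(ii) $f$ is λ-super-median (see Section 7): $0 \le e^{-\lambda t} P_t f(x) \le f(x)$, $\forall t \ge 0$, $\forall x \in E$;
(iii) $\|e^{-\lambda t} P_t f - f\| \to 0$ as $t \downarrow 0$.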
The norm in (iii) is of course the supremum norm. We take $f(\partial) = 0$ by convention.

(16.5) Example. For any FD process, $R_\lambda h$ is uniformly λ-excessive for $h \in C_0^+$.

(16.6) Example. Let $X$ be CBM($\mathbb R$). Then

$$f(x) := \gamma^{-1} \exp(-\gamma |x|), \qquad \gamma := (2\lambda)^{1/2},$$

defines a uniformly λ-excessive function $f$. (See Exercise 16.12.)

(16.7) THEOREM (Volkonskii, Sur, Meyer). Fix $\lambda > 0$. Let $f$ be a uniformly λ-excessive function on $E$. Then there exists a unique PCHAF $A$ of $X$ such that $f$ is the λ-potential of $A$:

(16.8) $f(x) = E^x \int_0^\infty e^{-\lambda t}\,dA_t.$

Uniqueness means that if $B$ is another PCHAF for which (16.8) holds then

$$P^x[A_t = B_t,\ \forall t] = 1, \quad \forall x.$$

Note that our conventions force both sides of (16.8) to equal 0 when $x = \partial$.

Before proving Theorem 16.7, let us look at Examples 16.5 and 16.6. In the case of (16.5),

$$A_t = \int_0^t h(X_s)\,ds.$$

(16.9) Brownian local time at 0. Let $A$ be the unique PCHAF of CBM($\mathbb R$) with λ-potential $f$ as in (16.6). Now $f$ was chosen so that

$$f(x) = E^x[\exp(-\lambda H_0)]\,f(0) = P^\lambda_{H_0} f(x)$$

in the notation of Section 10. By 'Dynkin's formula' (see Exercise 16.13),

$$E^x \int_{H_0}^\infty e^{-\lambda t}\,dA_t = P^\lambda_{H_0} f(x) = f(x) = E^x \int_0^\infty e^{-\lambda t}\,dA_t,$$

so that $P^x[A(H_0) = 0] = 1$. It is now obvious (see Exercise 16.14) that (a.s.) $A$ grows only when $X$ is at 0:

(16.10) $A_t = \int_0^t I_{\{0\}}(X_s)\,dA_s.$

It is further obvious from the 'uniqueness' part of Theorem 16.7 that, up to constant multiples, $A$ is the only PCHAF satisfying (16.10).

(16.11) Remarks. In dimension $n \ge 2$, we cannot define Brownian local time at a point (why?), but we can define the local time spent on certain sets.
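Conditions (ii) and (iii) of Definition 16.4 can be inspected numerically for the function of Example 16.6. The sketch below is offered as a check rather than a proof, and it rests on an assumption not spelled out in the text: for Brownian motion, completing the square in the Gaussian integral gives the closed form $e^{-\lambda t} P_t f(x) = \gamma^{-1}\bigl[e^{-\gamma x}\Phi((x - \gamma t)/\sqrt t) + e^{\gamma x}\Phi((-x - \gamma t)/\sqrt t)\bigr]$, where $\Phi$ is the standard normal distribution function. The function names and the grid are illustrative.

```python
import math
from typing import Sequence


def norm_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def f(x: float, lam: float) -> float:
    """The function of Example 16.6: exp(-gamma * |x|) / gamma with gamma = sqrt(2 * lam)."""
    gamma = math.sqrt(2.0 * lam)
    return math.exp(-gamma * abs(x)) / gamma


def discounted_semigroup(x: float, t: float, lam: float) -> float:
    """exp(-lam * t) * P_t f(x) for Brownian motion, via the closed form assumed above."""
    gamma = math.sqrt(2.0 * lam)
    root_t = math.sqrt(t)
    upper = math.exp(-gamma * x) * norm_cdf((x - gamma * t) / root_t)
    lower = math.exp(gamma * x) * norm_cdf((-x - gamma * t) / root_t)
    return (upper + lower) / gamma


def sup_gap(t: float, lam: float, grid: Sequence[float]) -> float:
    """Largest value of f - exp(-lam * t) * P_t f over the grid."""
    return max(f(x, lam) - discounted_semigroup(x, t, lam) for x in grid)


lam = 1.0
grid = [k / 10.0 for k in range(-50, 51)]
for t in (1.0, 0.1, 0.01, 0.001):
    print(f"t={t:<6} sup gap on grid = {sup_gap(t, lam, grid):.6f}")
```

On the grid the gap is non-negative, which reflects the super-median property, and it shrinks as $t$ decreases, in line with the uniform convergence required in (iii). The largest gap sits at $x = 0$, where $f$ has its kink.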
Exercises. These exercises provide good practice in the use of $\theta_t$ operators.

(16.12) Prove that the function $f$ of Example 16.6 is uniformly λ-excessive for $X$ = CBM($\mathbb R$).

Hint. Use probabilistic (as opposed to analytical) reasoning to show that $f$ is λ-super-median.

Solution. Since $f$ is in $C_0(\mathbb R)$ and $X$ is FD, the only point that needs proof is that $f$ is λ-super-median. Since, as we have already seen,

$$f(x) = E^x[\exp(-\lambda H_0)]\,f(0),$$

it is enough to prove that

$$g(x) := E^x[\exp(-\lambda H_0)]$$

defines a λ-super-median function $g$. Now, for $t \ge 0$,

$$H_0 \le \inf\{s > t : X_s = 0\} = t + H_0\circ\theta_t.$$

Hence, with $\eta := \exp(-\lambda H_0)$, we have

$$g(x) = E^x[\eta] \ge e^{-\lambda t} E^x[\theta_t \eta] = e^{-\lambda t} E^x[E^{X(t)} \eta] = e^{-\lambda t} E^x[g(X_t)] = e^{-\lambda t} P_t g(x),$$

as required. □

(16.13) Though we are at present primarily interested in the 'Brownian local time' case, this exercise covers the general situation. Show that if $A$ is a PCHAF (of some FD process $X$) with finite λ-potential $f$ then, for a stopping time $T$,

$$E^x \int_T^\infty e^{-\lambda t}\,dA_t = P^\lambda_T f(x).$$

This generalises Dynkin's formula (10.7), which becomes the special case when $A$ is an integral PCHAF of the form (16.1).

Solution. We calculate

$$\int_T^\infty e^{-\lambda t}\,dA_t = \int_T^\infty \lambda e^{-\lambda t} (A_t - A_T)\,dt = e^{-\lambda T} \int_0^\infty \lambda e^{-\lambda s} (A_{T+s} - A_T)\,ds = e^{-\lambda T} \int_0^\infty \lambda e^{-\lambda s} A_s\circ\theta_T\,ds = e^{-\lambda T}\,\theta_T \eta,$$

where

$$\eta := \int_0^\infty \lambda e^{-\lambda s} A_s\,ds = \int_0^\infty e^{-\lambda s}\,dA_s.$$
By the Strong Markov Theorem,

$$E^x \int_T^\infty e^{-\lambda t}\,dA_t = E^x[e^{-\lambda T}\,\theta_T \eta] = E^x[e^{-\lambda T} E^{X(T)} \eta] = E^x[e^{-\lambda T} f(X_T)] = P^\lambda_T f(x),$$

as required. □

Note. On taking $T = u$, we find that

$$f(x) = E^x \int_0^\infty e^{-\lambda t}\,dA_t \ge E^x \int_u^\infty e^{-\lambda t}\,dA_t = e^{-\lambda u} P_u f(x),$$

so that $f$ must be λ-super-median. Because of the Monotone-Convergence Theorem, we can see that $f$ is even λ-excessive in that

$$e^{-\lambda u} P_u f(x) \uparrow f(x) \quad\text{as } u \downarrow 0;$$

but it need not be true that $f$ is uniformly λ-excessive.

(16.14) Now return to the 'Brownian local time' case and prove (16.10).

Solution. Since $P^y[A(H_0) = 0] = 1$, $\forall y$, we have

$$P^x[A(t + H_0\circ\theta_t) - A(t) = 0] = E^x P^{X(t)}[A(H_0) = 0] = 1.$$

Thus, for every $x$,

$$P^x[A(q + H_0\circ\theta_q) = A(q),\ \forall q \in \mathbb Q^+] = 1.$$

The rest is easy. □

17. Proof of the Volkonskii-Sur-Meyer Theorem. If the λ-excessive function $f$ were of the form $f = R_\lambda g$, we would recover $g$ by taking $g = (\lambda - \mathcal G) f$. The idea in general is to approximate $\lambda - \mathcal G$ by $n(I - P^\lambda_{1/n})$, where we recall $P^\lambda_t := e^{-\lambda t} P_t$. Thus we define

$$g_n := n(f - P^\lambda_{1/n} f) \ge 0,$$

and then define

$$f_n := R_\lambda g_n = n \int_0^{1/n} P^\lambda_t f\,dt.$$

It is easy and important that $f_n \uparrow f$ uniformly on $E$. Then

$$f_n(x) = R_\lambda g_n(x) = E^x \int_0^\infty e^{-\lambda t} g_n(X_t)\,dt = E^x \int_0^\infty e^{-\lambda t}\,dA_n(t),$$

where

$$A_n(t) := \int_0^t g_n(X_s)\,ds.$$
Put

$$C_n(t) := \int_0^t e^{-\lambda s}\,dA_n(s) = \int_0^t e^{-\lambda s} g_n(X_s)\,ds,$$

so that

(17.1) $f_n(x) = E^x[C_n(\infty)].$

For $n \ge m$, and with the shorthand

$$g_{m,n} := g_n - g_m \le g_n, \qquad f_{m,n} := f_n - f_m \ge 0,$$

we make the estimate (with $\|f\|$ and $\|f - f_m\|$ denoting supremum norms)

$$0 \le E^x\bigl([C_n(\infty) - C_m(\infty)]^2\bigr) = 2E^x \int_{s=0}^\infty \int_{t=0}^\infty e^{-\lambda s} g_{m,n}(X_s)\,e^{-\lambda(s+t)} g_{m,n}(X_{s+t})\,ds\,dt$$

$$= 2E^x \int_{s=0}^\infty \int_{t=0}^\infty e^{-\lambda(2s+t)} g_{m,n}(X_s)\,P_t g_{m,n}(X_s)\,ds\,dt \quad\text{(because of the simple Markov property)}$$

$$= 2E^x \int_{s=0}^\infty e^{-2\lambda s} g_{m,n}(X_s)\,f_{m,n}(X_s)\,ds \quad\text{(since } f_{m,n} = R_\lambda g_{m,n})$$

$$\le 2E^x \int_{s=0}^\infty e^{-2\lambda s} g_n(X_s)\,f_{m,n}(X_s)\,ds \le 2\|f - f_m\|\,E^x \int_{s=0}^\infty e^{-\lambda s} g_n(X_s)\,ds = 2\|f - f_m\|\,f_n(x) \le 2\|f\|\,\|f - f_m\|.$$

(The 'Fubini' operations are justified because $\|g_{m,n}\| \le (m + n)\|f\| < \infty$.) But

(17.2) $E^x[C_n(\infty) \mid \mathcal F_t] = C_n(t) + e^{-\lambda t} f_n(X_t)$, a.s.($P^x$),

so that, for each $x$,

(17.3) $M_{m,n}(t) := C_n(t) - C_m(t) + e^{-\lambda t} f_n(X_t) - e^{-\lambda t} f_m(X_t)$

defines a martingale relative to $(\{\mathcal F_t\}, P^x)$ with terminal value $C_n(\infty) - C_m(\infty)$. For the moment, assume that the martingale $M_{m,n}$ has R-paths. Set

$$M^*_{m,n} := \sup_t |M_{m,n}(t)|.$$

Then, by Doob's $L^2$ inequality (II.70.2),

$$E^x M^*_{m,n} \le [E^x(M^{*2}_{m,n})]^{1/2} \le (8\|f\|\,\|f - f_m\|)^{1/2}.$$

If we choose a sequence $n(k)$ such that $\sum \|f - f_{n(k)}\|^{1/2}$ converges then (Borel-Cantelli), almost surely (a.s.($P^x$), $\forall x$),

$$\sum_k M_{n(k),n(k+1)}(t)$$

converges uniformly over $t \in [0, \infty]$. But

$$\sup_t |e^{-\lambda t} f_{n(k+1)}(X_t) - e^{-\lambda t} f_{n(k)}(X_t)| \le \|f_{n(k+1)} - f_{n(k)}\| \le \|f - f_{n(k)}\|,$$

and, since $\sum \|f - f_{n(k)}\| < \infty$, we deduce that, almost surely,

$$C(t) := \lim_k C_{n(k)}(t)$$

exists uniformly over $t \in [0, \infty]$ and defines a continuous increasing process $C$. It is easy to check that

$$A(t) := \int_0^t e^{\lambda s}\,dC_s$$

satisfies (a.s.)

$$A(t) = \lim_k A_{n(k)}(t),$$

the limit existing uniformly over compact intervals. Hence $A$ is a PCHAF. That

$$f(x) = E^x[C(\infty)] = E^x \int_0^\infty e^{-\lambda t}\,dA(t)$$

follows from (17.1) because $C_n(\infty) \to C(\infty)$ in $\mathcal L^2$ and hence in $\mathcal L^1$.

(17.4) R-property of $M_{m,n}$. In the cases that will concern us, $f_m$ and $f_n$ are continuous functions on $E$, and then $M_{m,n}$ inherits the R-property from $X$. If we assume only that $f_m$ and $f_n$ are Borel (as will happen if $f$ is Borel), then we shall see in Volume 2 that the section theorem implies that $M_{m,n}$ is an R-process. However, the R-property of $M_{m,n}$ can be established in general: $f$ is nearly Borel. See Theorem II.2.12 of Blumenthal and Getoor [1].

(17.5) Connection with the Meyer decomposition. We have

$$e^{-\lambda t} f(X_t) = E^x[C(\infty) \mid \mathcal F_t] - C(t).$$

This is (see Volume 2) the Meyer decomposition of the 'regular class (D) potential' $t \mapsto e^{-\lambda t} f(X_t)$ on $(\Omega, \{\mathcal F_t\}, P^x)$ as the 'potential' of the continuous integrable increasing process $C$. The uniqueness part of the Meyer decomposition theorem implies the uniqueness of $A$ asserted in Theorem 16.7. Though it is not difficult to prove the uniqueness of $A$ directly (you should be able to sense how to do it from the proof of the existence part of Theorem 16.7), it is best thought of in terms of martingale theory, so we take the uniqueness result for granted for now. □

18. Killing. We now construct a process $\tilde X$ that represents '$X$ killed at rate $dA_t$'. ($A$ now denotes some fixed PCHAF of $X$.) The intuitive idea is that $\tilde X$ agrees
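with $X$ up to time $\tilde\zeta \le \zeta$ and $\tilde X(t) = \partial$, $\forall t \ge \tilde\zeta$, where

(18.1) $P[\tilde\zeta > t \mid \mathcal F] = \prod_{s \le t} [1 - dA(s)] = e^{-A(t)}.$

Set

$$M_t := M(t) := e^{-A(t)},$$

so that $M$ is a PCHMF of $X$. The 'M' in PCHMF stands of course for multiplicative and reflects the property

(18.2) $M_{s+t}(\omega) = M_s(\omega)\,M_t(\theta_s \omega).$

The type of construction required is standard and straightforward, but even the most intuitive ideas look tedious in probability theory. Blumenthal and Getoor [1] do their killing as quickly and humanely as possible, and we shall follow them. Needless to say, they do a much more thorough post-mortem, if you want all the gory details.

The situation is rather curious because one normally works with $\sigma$-algebras $\{\tilde{\mathcal F}_t\}$ for $\tilde X$ that are, so to speak, 'half-completed'. (See Blumenthal and Getoor [1].) We shall keep things 'algebraic' by assuming that $A$ (equivalently, $M$) is $\{\mathcal F^\circ_t\}$-adapted (as it will be in all cases of interest to us). We can then work with $\{\mathcal F^\circ_t\}$ and Borel functions on $E$ instead of $\{\mathcal F_t\}$ and universally-measurable functions on $E$. We further assume that we have 'weeded out' (rejected) a null set so that for every $\omega$, $M_0(\omega) = 1$ and $t \mapsto M_t(\omega)$ is continuous non-increasing.

Let $\tilde\Omega := \Omega \times [0, \infty]$ and let $\tilde\omega = (\omega, \xi)$ denote the typical point of $\tilde\Omega$. Let $\mathcal R$ be the Borel $\sigma$-algebra on $[0, \infty]$ and set $\tilde{\mathcal F}^\circ := \mathcal F^\circ \times \mathcal R$. Define

$$\tilde\zeta(\tilde\omega) := \zeta(\omega) \wedge \xi,$$

and put

$$\tilde X_t(\tilde\omega) := \begin{cases} X_t(\omega) & \text{if } t < \xi, \\ \partial & \text{if } t \ge \xi. \end{cases}$$

Define $\tilde\theta_t \tilde\omega := (\theta_t \omega, (\xi - t) \vee 0)$. Then

(18.3) $\tilde X_t\circ\tilde\theta_h = \tilde X_{t+h}$, $\forall t$, $\forall h$.

Let $\tilde\Omega_t := \Omega \times (t, \infty] \in \tilde{\mathcal F}^\circ$, and, for $\tilde\Lambda \in \tilde{\mathcal F}^\circ$, put $\tilde\Lambda$ in $\tilde{\mathcal F}^\circ_t$ if there exists $\Lambda$ in $\mathcal F^\circ_t$ such that

$$\tilde\Lambda \cap \tilde\Omega_t = \Lambda \times (t, \infty].$$

Then

(18.4) $\{\tilde{\mathcal F}^\circ_t\}$ is a filtration of $(\tilde\Omega, \tilde{\mathcal F}^\circ)$ and $\tilde X$ is $\{\tilde{\mathcal F}^\circ_t\}$-adapted.

For $\omega \in \Omega$, define a probability measure $\alpha_\omega$ on $[0, \infty]$ by setting $\alpha_\omega(\{0\}) := 0$ and

$$\alpha_\omega(t, \infty] := M_t(\omega) \quad (t \in [0, \infty]).$$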
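If $\tilde\Lambda \in \tilde{\mathcal F}^0 = \mathcal F^0 \times \mathcal B$, then, for each $\omega$,
$$\tilde\Lambda_\omega := \{\xi \in [0, \infty] : (\omega, \xi) \in \tilde\Lambda\} \in \mathcal B,$$
and we can easily check by monotone-class arguments that $\omega \mapsto \alpha_\omega(\tilde\Lambda_\omega)$ is $\mathcal F^0$-measurable. Define
(18.5) $$\tilde P^x(\tilde\Lambda) := E^x[\alpha_\omega(\tilde\Lambda_\omega)] \quad (x \in E_\partial).$$
Then $\tilde P^x$ is a probability measure on $(\tilde\Omega, \tilde{\mathcal F}^0)$ for each $x$ in $E_\partial$.

Recall that we use $b\mathcal E_0$ to denote the space of bounded $\mathcal E_\partial$-measurable functions $f$ on $E_\partial$ such that $f(\partial) = 0$.

(18.6) THEOREM. $\tilde X = (\tilde\Omega, \tilde{\mathcal F}^0, \tilde{\mathcal F}_t^0, \tilde X_t, \tilde\theta_t, \tilde P^x)$ is a Markov process on $E$ with transition semigroup $\{\tilde P_t\}$, where
(18.7) $$\tilde P_t f(x) := E^x[M_t f(X_t)] \quad (f \in b\mathcal E_0,\ x \in E).$$

Note. You will appreciate that the notation $(\tilde\Omega, \tilde{\mathcal F}^0, \{\tilde{\mathcal F}_t^0\}, \{\tilde X_t\}, \ldots)$ would be just too unwieldy.

Clarification of Theorem 18.6. Since we have not previously met a sextuple like $\tilde X$ except in the case of canonical FD processes, a word of clarification is necessary. We are not going to spell out all the axiomatics. (See Blumenthal and Getoor [1].) It is clear that, for example, statements (18.3) and (18.4) form part of the definition of the statement that the sextuple $\tilde X$ forms a Markov process. But only one property really matters:
(18.8) $$\tilde E^x[f(\tilde X_{s+t}) \mid \tilde{\mathcal F}_s^0] = \tilde P_t f(\tilde X_s) \quad (f \in b\mathcal E_0).$$

Proof of (18.8). Let $\tilde\Lambda \in \tilde{\mathcal F}_s^0$, so that $\tilde\Lambda \cap \tilde\Omega_s = \Lambda \times (s, \infty]$ for some $\Lambda$ in $\mathcal F_s^0$. Then, since $f(\partial) = 0$, you can check that
$$\tilde E^x[f(\tilde X_{s+t}); \tilde\Lambda] = E^x[f(X_{s+t}) M_{s+t}; \Lambda] = E^x[(M_s I_\Lambda)\,(M_t f(X_t)) \circ \theta_s] = E^x[(M_s I_\Lambda)\,\tilde P_t f(X_s)] = \tilde E^x[\tilde P_t f(\tilde X_s); \tilde\Lambda]. \quad \square$$

(18.9) Illustrative examples. The Feynman-Kac formula will allow us to calculate the infinitesimal generator of $\{\tilde P_t\}$ in important cases.

(18.10) Example. We shall see that (modulo certain technical qualifications) if $X$ is an FD diffusion with 'differential' generator $\mathcal L$ satisfying
(18.11) $$\mathcal L f(x) = \tfrac12 \sum_i \sum_j a_{ij}(x)\, D_i D_j f(x) + \sum_i b_i(x)\, D_i f(x),$$
and
(18.12) $$A(t) := \int_0^t c(X_s)\,ds \quad (c \text{ non-negative})$$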
then $\tilde X$ is a 'locally' FD diffusion with differential generator
(18.13) $$\tilde{\mathcal L} f(x) = \mathcal L f(x) - c(x) f(x).$$
The significance of the operation of killing is that, in connection with the big problems mentioned in Section 13, we can restrict attention to the case $c(x) = 0$, $\forall x$.

(18.14) Killing at constant rate. Perhaps the most frequently used application of killing occurs when we take $c$ to be a constant function: $c(x) = \lambda$ $(> 0)$, $\forall x$. ($X$ now denotes an arbitrary FD process.) The killing operation in this case simply involves constructing a variable $\xi$ that, under each $\tilde P^x$, is independent of $X$ and has the exponential distribution of rate $\lambda$. ($\xi = \infty$ if $\lambda = 0$.) We then put
$$\tilde X_t := \begin{cases} X_t & (t < \xi), \\ \partial & (t \ge \xi). \end{cases}$$
Note that our convention that $A$ be constant on $[\zeta, \infty]$ leads us to take $A_t = \lambda(t \wedge \zeta)$ instead of $A_t = \lambda t$, but it does not matter: 'After the first death, there is no other.' We have
$$\tilde P_t = e^{-\lambda t} P_t, \qquad \tilde{\mathcal G} = \mathcal G - \lambda.$$
The Feynman-Kac formula (Section 19) localizes this idea.

(18.15) Exercise. Let $G$ be an open set, and let $F := E_\partial \setminus G$. Define
$$X^G_t := \begin{cases} X_t & (t < H_F), \\ \partial & (t \ge H_F). \end{cases}$$
Thus $X^G$ is the process $X$ killed on first leaving $G$. Formulate and prove the result that $X^G$ is Markov, noting especially how the terminal time property
(18.16) $$H_F - t = H_F \circ \theta_t \quad \text{on } \{H_F > t\}$$
is used in your proof. Explain how $X^G$ corresponds to the $\tilde X$ process obtained by killing $X$ in accordance with the right-continuous multiplicative functional $M$, where
$$M_t := \begin{cases} 1 & (t < H_F), \\ 0 & (t \ge H_F). \end{cases}$$
See III.3.7 of Blumenthal and Getoor [1].

19. The Feynman-Kac formula. We shall examine three approaches to the Feynman-Kac formula: one analytical, one via Markov-process theory and (inevitably) one via martingale theory. Inevitably too, the martingale approach is the one that best corresponds to our intuition.
Analytical approach. We first consider a simple situation with hypotheses that are much too strong for certain applications. Let $X$ be an FD process with transition semigroup $\{P_t\}$ acting on $C_0$ and with generator $\mathcal G$. Let $v$ be a bounded continuous non-negative function on $E$ and set
$$A_t := \int_0^t v(X_s)\,ds, \qquad M_t := e^{-A(t)}.$$
We prove that (note that our notation is consistent with our previous notation if $v$ is a constant function)
(19.1) $$P^v_t f(x) := E^x[M_t f(X_t)]$$
defines an FD semigroup $\{P^v_t\}$ with generator $\mathcal G^v$ satisfying
(19.2) $$\mathcal D(\mathcal G^v) = \mathcal D(\mathcal G), \qquad \mathcal G^v f(x) = \mathcal G f(x) - v(x) f(x).$$
In fact, since $f \mapsto vf$ is a bounded operator (which we denote by $v$), it is a standard piece of semigroup theory that $\mathcal G^v$, considered as defined by (19.2), generates an FD semigroup $\{P^v_t\}$ with
(19.3) $$P^v_t f = \lim_{n \uparrow \infty} (e^{-tv/n} P_{t/n})^n f \quad (\text{in } C_0),$$
where, of course, $(e^{-tv/n} f)(x) := e^{-tv(x)/n} f(x)$.

(19.4) Exercise. Convince yourself that (19.3) is the analytical counterpart to (19.1). Do not worry about rigour, because we are now going to prove something much better.

Markov-process approach. Let us reformulate (19.2) in terms of resolvents:
(19.5) $$R^v_\lambda = R_\lambda - R_\lambda v R^v_\lambda.$$
We can prove (19.5) directly and under wide conditions. It is important that we can drop the continuity requirement on $v$. So let us assume that $v$ is non-negative and $\mathcal E$-measurable and that $R_\lambda v(x) < \infty$, $\forall x$. As for $X$, we do not need anything as strong as the FD property (though you can assume it for now). The only hypothesis really needed on $X$ is that $(t, \omega) \mapsto X(t, \omega)$ be measurable relative to the respective σ-algebras $\mathcal B[0, \infty) \times \mathcal F$ and $\mathcal E$. We use the property
$$A_{s+u} - A_s = A_u \circ \theta_s$$
and the simple Markov property of $X$ to calculate, for $f \in b\mathcal E_0$,
$$R_\lambda f(x) - R^v_\lambda f(x) = E^x \int_0^\infty e^{-\lambda t - A(t)} f(X_t)\,(e^{A(t)} - 1)\,dt$$
$$= E^x \int_0^\infty dt\, e^{-\lambda t - A(t)} f(X_t) \int_0^t ds\, v(X_s)\, e^{A(s)}$$
$$= E^x \int_0^\infty ds\, e^{-\lambda s} v(X_s) \int_0^\infty du\, \exp(-\lambda u - A_u \circ \theta_s)\, f(X_u \circ \theta_s)$$
$$= E^x \int_0^\infty e^{-\lambda s} v(X_s)\, R^v_\lambda f(X_s)\,ds = R_\lambda v R^v_\lambda f(x).$$

(19.6) Exercise. Prove the easier, but less useful, result
(19.7) $$R_\lambda f(x) - R^v_\lambda f(x) = R^v_\lambda v R_\lambda f(x).$$

Martingale approach. Let us now give the 'right' treatment based on stochastic integral theory which is justified later. Though we now use the language of infinitesimal generators, the present approach is just as general as the Markov-process approach. The idea is to use the fact that, for $f \in \mathcal D(\mathcal G)$,
$$C^f_t := f(X_t) - \int_0^t \mathcal G f(X_s)\,ds$$
defines a martingale. Then
$$d[e^{-A(t)} f(X_t)] = e^{-A(t)} [\mathcal G f(X_t) - v(X_t) f(X_t)]\,dt + e^{-A(t)}\,dC^f_t.$$
(If you do not know that stochastic calculus obeys different rules from Newton's, you will not be surprised by this calculation.) Thus
(19.8) $$M_t f(X_t) - f(X_0) = \int_0^t M_s\, \mathcal G^v f(X_s)\,ds + N_t,$$
where
$$\mathcal G^v f(x) := \mathcal G f(x) - v(x) f(x), \qquad N_t := \int_0^t e^{-A(s)}\,dC^f_s.$$
(Continuity of $v$ is not required in order for this to make sense.) The key fact (proved in Volume 2) is that $N$, as the stochastic integral of a bounded continuous adapted process relative to a martingale in $\mathcal L^2$, is itself a martingale in $\mathcal L^2$ (for each $P^x$). Taking $P^x$ expectations in (19.8),
$$P^v_t f(x) - f(x) = \int_0^t P^v_s\, \mathcal G^v f(x)\,ds.$$
This is essentially (19.7).

Exercise. How do you obtain (19.5) by the martingale method?

Note. By the Feynman-Kac formula, we mean any or all of (19.2), (19.5) and (19.7).
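The resolvent forms (19.5) and (19.7) can be checked numerically in a finite-state setting. The sketch below assumes a finite Markov chain with Q-matrix `Q` and a non-negative vector `v`, and it assumes (in line with (19.2)) that the killed resolvent is the matrix $(\lambda + V - Q)^{-1}$ with $V = \mathrm{diag}(v)$. The particular `Q`, `v` and `lam` are made-up illustrative values, and the small linear-algebra helpers are written out only to keep the example self-contained.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_inv(m):
    """Inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def resolvent(Q, lam, v=None):
    """(lam + V - Q)^{-1} with V = diag(v); v=None gives the unkilled resolvent."""
    n = len(Q)
    if v is None:
        v = [0.0] * n
    return mat_inv([[(lam + v[i] if i == j else 0.0) - Q[i][j] for j in range(n)]
                    for i in range(n)])


def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Illustrative three-state chain (rows of Q sum to zero) and killing rates.
Q = [[-1.0, 1.0, 0.0],
     [0.5, -1.5, 1.0],
     [0.0, 2.0, -2.0]]
v = [0.0, 3.0, 1.0]
lam = 0.7
n = len(Q)
V = [[v[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

R = resolvent(Q, lam)        # R_lambda
Rv = resolvent(Q, lam, v)    # R^v_lambda
lhs = [[R[i][j] - Rv[i][j] for j in range(n)] for i in range(n)]
err_19_5 = max_abs_diff(lhs, mat_mul(mat_mul(R, V), Rv))   # R - R^v = R v R^v
err_19_7 = max_abs_diff(lhs, mat_mul(mat_mul(Rv, V), R))   # R - R^v = R^v v R
print(err_19_5, err_19_7)
```

Both errors should be at the level of floating-point rounding. In matrix terms the two identities are just the two ways of writing $A^{-1} - B^{-1}$ with $B - A = V$, which is why (19.5) and (19.7) hold together here.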
20. A Ciesielski-Taylor Theorem. The following strange result was discovered by Ciesielski and Taylor [1] by explicit calculation of the distributions involved. Only in the case when $n = 1$ is a simple non-computational explanation known (Williams [7]; see Section 9; but see also Getoor and Sharpe [2] and Biane [1]).

(20.1) THEOREM. The hitting time of the sphere $\{|x| = 1\}$ by a $\mathrm{BM}_0(\mathbb R^n)$ process has the same distribution as the total time spent in the ball $\{|x| \le 1\}$ by a $\mathrm{BM}_0(\mathbb R^{n+2})$ process.

Let us now use the Feynman-Kac formula to obtain the distribution of the time spent in $\{|x| \le 1\}$ by CBM($\mathbb R^3$). The general case of CBM($\mathbb R^n$) $(n \ge 3)$ may be studied in exactly the same way, but we spare you (explicit use of) Bessel functions. Use of the Feynman-Kac formula is simpler than the Kac 'method of moments' technique employed by Ciesielski and Taylor.

So let $X$ be CBM($\mathbb R^3$). Let $B$ be the unit ball in $\mathbb R^3$ and let
$$\varphi_t := \operatorname{meas}\{s \le t : X_s \in B\} = \int_0^t I_B(X_s)\,ds.$$
Fix $\alpha > 0$ for the moment and put
$$A_t := \alpha\varphi_t := \int_0^t v(X_s)\,ds, \qquad v := \alpha I_B.$$
Introduce
$$R^v_\lambda 1(x) := E^x \int_0^\infty e^{-\lambda t - A(t)}\,dt.$$
Put
$$A(\infty) := \uparrow\lim A(t) = \alpha\varphi(\infty).$$
Then
$$h(x) := E^x[e^{-A(\infty)}] = \lim_{\lambda \downarrow\downarrow 0} \lambda R^v_\lambda 1(x).$$
By (19.5),
$$R_\lambda 1 - R^v_\lambda 1 = R_\lambda v R^v_\lambda 1,$$
so that, since $\lambda R_\lambda 1 = 1$,
$$1 - \lambda R^v_\lambda 1 = \lambda R_\lambda v R^v_\lambda 1.$$
On letting $\lambda \downarrow\downarrow 0$ (justification is completely trivial)
(20.2) $$1 - h = R_0 v h = \alpha \int_B g(x, y)\, h(y)\,dy,$$
where $g$ is the free-space Green function for $\tfrac12\Delta$. See Section 1.22. Spitzer [1]
gives a neat treatment of (20.2), which we now follow. It is clear from the definition of $h$ that $h$ is spherically symmetric: $h(x) = f(|x|)$ for some $f$. The right-hand side of (20.2) is therefore the gravitational potential due to a spherically symmetric mass distribution. Gauss found the way to deal with such potentials. First, the potential outside a spherical shell due to a symmetric mass distribution on the shell is the same as if the whole mass of the shell were concentrated at its centre. Secondly, the potential inside a spherical shell due to a mass distribution* on the shell is constant, and therefore equal to the value at the shell's centre. We may therefore calculate, for $0 < |x| < 1$,
$$1 - h(x) = \alpha (2\pi|x|)^{-1} \int_{\{|y| < |x|\}} h(y)\,dy + \alpha \int_{\{|x| < |y| < 1\}} (2\pi|y|)^{-1} h(y)\,dy.$$
Thus, with $u(r) = r f(r)$, where $h(x) = f(|x|)$, we have, for $0 < r < 1$,
$$r - u(r) = 2\alpha \int_0^r \rho\, u(\rho)\,d\rho + 2\alpha r \int_r^1 u(\rho)\,d\rho,$$
whence $u'' = 2\alpha u$ on $(0, 1)$. We now easily obtain
(20.3) $$E^0 e^{-\alpha\varphi(\infty)} = \operatorname{sech}\delta, \qquad \delta := (2\alpha)^{1/2}.$$
Of course, the problem is really 1-dimensional, and we could have transformed the Gauss results into statements about BES(3); but it would not have been such fun.

(20.4) Exercise. Let $X$ be CBM($\mathbb R$). Let
$$T := \inf\{t : |X_t| = 1\} = H_1 \wedge H_{-1}.$$
Let $\alpha > 0$ and $\delta := (2\alpha)^{1/2}$. Explain why
$$E^0[\exp(-\alpha H_1)] = E^0[\exp(-\alpha H_1); H_1 < H_{-1}] + E^0[\exp(-\alpha H_{-1}); H_{-1} < H_1]\, e^{-2\delta},$$
and deduce that $E^0 e^{-\alpha T} = \operatorname{sech}\delta$. Comparison of this result with (20.3) clinches the CT theorem in the case $n = 1$.

One of the classic applications of the Feynman-Kac formulae is to the arcsine law (see Itô and McKean [1]). However, we prove the arcsine law by using local-time theory, both in Section 24 and, by a better method, in Section VI.63 of Volume 2.

*Still assumed symmetric.
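The step from the integral equation for $u$ to (20.3) can be spelled out as follows. This is only a sketch: the boundary conditions $u(0) = 0$ and $u'(1) = 1$ are read off from the integral equation itself (and from $u(r) = r f(r)$ with $f$ bounded), rather than quoted from the text.

```latex
\begin{align*}
u(r) &= r - 2\alpha\int_0^r \rho\,u(\rho)\,d\rho - 2\alpha r\int_r^1 u(\rho)\,d\rho, \\
u'(r) &= 1 - 2\alpha\int_r^1 u(\rho)\,d\rho,
  \qquad u''(r) = 2\alpha\,u(r) = \delta^2 u(r), \\
u(0) &= 0,\quad u'(1) = 1
  \;\Longrightarrow\; u(r) = \frac{\sinh(\delta r)}{\delta\cosh\delta}, \\
E^0 e^{-\alpha\varphi(\infty)} &= h(0) = \lim_{r\downarrow 0}\frac{u(r)}{r} = u'(0)
  = \operatorname{sech}\delta .
\end{align*}
```

In the second line the two terms $\mp 2\alpha r u(r)$ produced by differentiating the two integrals cancel, which is why $u'$ has such a simple form.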
21. Time-substitution. Suppose that $A$ is a PCHAF of $X$ and that a null set has been weeded out so that properties (16.3)(i)-(iii) hold for all $\omega$. Set
$$\tau_t(\omega) := \tau(t, \omega) := \inf\{s : A_s > t\},$$
so that $\tau$ is the right-continuous function inverse to $A$. Since $\{\tau_s < t\} = \{A_t > s\}$ and $\{\mathcal F_t\}$ is right-continuous, for each $s$, $\tau_s$ is an $\{\mathcal F_t\}$ stopping time. Set
$$\hat X_t := X_{\tau(t)}, \qquad \hat{\mathcal F}_t := \mathcal F_{\tau(t)} \qquad \text{and} \qquad \hat\theta_t := \theta_{\tau(t)}.$$
Then $\hat X$ is a strong Markov process on $(E, \mathcal E)$. The point is that if $T$ is an $\{\hat{\mathcal F}_t\}$ stopping time then $\hat T := \tau(T)$ is an $\{\mathcal F_t\}$ stopping time and $\hat{\mathcal F}(T) = \mathcal F(\hat T) := \mathcal F_{\hat T}$; further,
$$\tau(T + t) = \hat T + \tau_t \circ \theta_{\hat T}.$$
Thus $\hat X$ inherits the strong Markov property from $X$. As an exercise, write out all the details of the present argument; see Section X.5 of Dynkin [2]. This result is due to Volkonskii.

If $A$ is strictly increasing, in which case $\tau$ is continuous, then it is clear that $X$ and $\hat X$ have the same hitting distributions:
(21.1) for every compact $K$, $\forall x$, $\forall B \in \mathcal B(K)$, $$P^x[\hat X \circ \hat H_K \in B] = P^x[X \circ H_K \in B].$$
Indeed $\hat X \circ \hat H_K = X \circ H_K$. Of course, $\hat H_K := \inf\{t : \hat X_t \in K\}$.

The converse theorem is very deep. (See Theorem V.5.1 of Blumenthal and Getoor [1].)

(21.2) THEOREM (Blumenthal-Getoor-McKean). If $Y$ is a 'standard' process with the same hitting distributions as $X$ then, for some strictly increasing PCHAF $A$ of $X$, $Y$ has the same laws as the $\hat X$ process associated with $A$.

For the definition of 'standard process', see Blumenthal and Getoor [1]. (FD processes are certainly standard.) Incidentally, the fact that if $X$ is standard and $A$ is strictly increasing then $\hat X$ is standard emphasises the need to axiomatise processes by probabilistic axioms (as is done for standard processes and, better, right processes) instead of axiomatising via analytical properties like the FD property.

(21.3) The generator of $\hat X$. The preceding sentence stands as a piece of pure mathematics. However, when we wish to talk about the generator of $\hat X$, we should consider for example how the FD property behaves under time-substitution.

(21.4) Volkonskii's formula.
Suppose that $X$ is an FD process and that
$$A(t) = \int_0^t v(X_s)\,ds,$$
where $v$ is a positive continuous function on $E$ bounded away from 0. Then, for some $K$, $\tau_t \le Kt$, $\forall t$. For $f \in \mathcal D(\mathcal G)$,
$$C^f_t := f(X_t) - \int_0^t \mathcal G f(X_s)\,ds$$
is an R-martingale relative to $(\Omega, \{\mathcal F_t\}, P^\mu)$ for every $\mu$. See (10.13). If we consider a fixed interval $[0, a]$ for $t$ then $\tau_t \in [0, Ka]$, and we can apply the Optional Stopping Theorem II.77.5 to $C^f(t \wedge Ka)$. In this way, we deduce that
(21.5) $$C^f(\tau_t) = f(\hat X_t) - \int_0^t v(\hat X_s)^{-1}\, \mathcal G f(\hat X_s)\,ds$$
($t$ running over the whole half-line $[0, \infty)$) is a martingale relative to each $(\Omega, \{\hat{\mathcal F}_t\}, P^\mu)$. For $f \in \mathcal D(\mathcal G)$, write
(21.6)(i) $$Gf(x) := v(x)^{-1}\, \mathcal G f(x);$$
then $Gf \in C_0$. Taking $P^x$ expectations of the martingale $C^f(\tau_t)$, we find that
$$\hat P_t f(x) - f(x) = \int_0^t \hat P_s\, Gf(x)\,ds.$$
If we assume that (in the cases we study) the transition semigroup $\{\hat P_t\}$ of $\hat X$ is an FD semigroup with generator $\hat{\mathcal G}$ (say), it is now clear that
(21.6)(ii) $$\hat{\mathcal G} \supseteq G.$$
The results (21.6) comprise Volkonskii's formula. For Volkonskii's proof via Dynkin's formula, see X.10.24 in Dynkin [2]. Of course, if $v$ is also bounded away from $\infty$, we can reverse the roles of $X$ and $\hat X$ to show that $\hat{\mathcal G} = G$.

(21.7) Finite Markov chains. The time change of a finite Markov chain with Q-matrix $Q$ by the additive functional $A_t = \int_0^t v(X_s)\,ds$ is, according to (21.6)(i), the Markov chain with Q-matrix $\hat Q$, where
$$\hat Q(i, j) := v(i)^{-1}\, Q(i, j).$$
In terms of the jump-hold description of the chain (Section 2), the effect of the time change is easy to specify: when the time-changed chain visits $i$, it resides there for an exponential amount of time with mean $v(i)/q(i)$, compared with the mean $1/q(i)$ for the original chain.

22. Reflecting Brownian motion. Let $X$ be CBM($\mathbb R$). Let
$$A_t := \int_0^t I_{[0, \infty)}(X_s)\,ds = \operatorname{meas}\{s \le t : X_s \ge 0\}.$$
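Let $\tau$ be the right-continuous inverse of $A$ and let $\hat X_t := X \circ \tau_t$. It is intuitively obvious that $\hat X$ must be reflecting Brownian motion on $[0, \infty)$. Assume for now (see Note below) that $\hat X$ is an FD process on $[0, \infty)$. Dynkin's formula for the generator $\hat{\mathcal G}$ of $\hat X$ shows that
$$\hat{\mathcal G} f(0) = \lim_{\varepsilon \downarrow\downarrow 0} \frac{f(\varepsilon) - f(0)}{E^0[A(H_\varepsilon)]}.$$
Now, by the formula (1.13.7) for the transition-density function of Brownian motion killed at 0,
$$E^0[A(H_\varepsilon)] = \int_{t=0}^\infty \int_{y=0}^\varepsilon [p(t; 0, \varepsilon - y) - p(t; 0, \varepsilon + y)]\,dy\,dt,$$
where $p$ is the transition-density function of CBM($\mathbb R$). But, for $\lambda > 0$ and with $\gamma := (2\lambda)^{1/2}$, formula (3.13) gives
$$\int_0^\infty e^{-\lambda t} [p(t; 0, \varepsilon - y) - p(t; 0, \varepsilon + y)]\,dt = 2\gamma^{-1} e^{-\gamma\varepsilon} \sinh(\gamma y).$$
On letting $\lambda \downarrow\downarrow 0$, we obtain
$$\int_0^\infty [p(t; 0, \varepsilon - y) - p(t; 0, \varepsilon + y)]\,dt = 2y,$$
so that
$$E^0[A(H_\varepsilon)] = \varepsilon^2.$$
Hence, for $f$ in $C_0[0, \infty)$, we have $f \in \mathcal D(\hat{\mathcal G})$ if and only if the formulae
(22.1)(i) $$\hat{\mathcal G} f(x) = \tfrac12 f''(x) \quad (x > 0)$$
(22.1)(ii) $$\hat{\mathcal G} f(0) = \lim_{\varepsilon \downarrow\downarrow 0} \varepsilon^{-2} [f(\varepsilon) - f(0)]$$
make sense and define a function $\hat{\mathcal G} f$ in $C_0[0, \infty)$. Note that $f$ in $\mathcal D(\hat{\mathcal G})$ satisfies
$$f'_+(0) := \lim_{\varepsilon \downarrow\downarrow 0} \varepsilon^{-1} [f(\varepsilon) - f(0)] = 0,$$
and observe how the formula for $\hat{\mathcal G}$ checks with l'Hôpital's rule.

We saw in Section 1.41.1 that reflecting Brownian motion $|X|$ is Markovian. From the explicit formula (1.14) for the transition-density function for $|X|$, it is clear that $|X|$ is an FD process. That $|X|$ has generator $\hat{\mathcal G}$ as in (22.1) will be immediate once we show that
$$E^0[H_\varepsilon \wedge H_{-\varepsilon}] = \varepsilon^2.$$
This follows from the more precise result that for $\lambda > 0$ and with $\gamma := (2\lambda)^{1/2}$ and $T := H_\varepsilon \wedge H_{-\varepsilon}$, we have
$$E^0[\exp(-\lambda T)] = \operatorname{sech} \gamma\varepsilon.$$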
See Exercise 20.4. The proof that $\hat X$ and $|X|$ have the same generator $\hat{\mathcal G}$, and hence the same transition function, is complete.

Important note on preservation of FD property. Deciding whether or not the FD property is preserved under probabilistic operations such as time-substitution is generally a very difficult problem. In the 1-dimensional case, special arguments apply that allow easy settlement of the matter, so that the FD property for simple transformations of CBM($\mathbb R$) is usually taken for granted in the literature. Let us work through the details for this example.

So let $X$ be CBM($\mathbb R$), and define
$$A_t := \operatorname{meas}\{s \le t : X_s \ge 0\}, \qquad \tau_t := \inf\{u : A_u > t\}, \qquad \hat X_t := X(\tau_t).$$
Our intuitive idea is this: if $\hat X$ starts at a point $x$ of $[0, \infty)$ then, with high probability, $\hat X$ will very soon hit $y$.† We use this idea to show that $\hat X$ has FD resolvent, whence, by the Hille-Yosida Theorem, $\hat X$ has FD transition function $\{\hat P_t\}$. (Warning: if you try to express the same intuitive idea directly in terms of the transition function, you will encounter some very awkward technical problems.)

For $y \in [0, \infty)$, put $\hat H_y := \inf\{t : \hat X_t = y\}$. By applying the strong Markov property of $\hat X$ at time $\hat H_y$ (which we know amounts to applying the strong Markov property of $X$ at time $\tau(\hat H_y)$), we find that (with obvious notation) for $f \in C_0[0, \infty)$, $\lambda > 0$, and $x, y \in [0, \infty)$,
$$\hat R_\lambda f(x) = E^x \int_0^{\hat H_y} e^{-\lambda t} f(\hat X_t)\,dt + E^x[\exp(-\lambda \hat H_y)]\, \hat R_\lambda f(y).$$
(You recognise that this is just Dynkin's formula for $\hat X$.) Hence
(22.2) $$|\hat R_\lambda f(x) - \hat R_\lambda f(y)| \le \|f\|\, E^x \int_0^{\hat H_y} e^{-\lambda t}\,dt + |\hat R_\lambda f(y)|\, E^x[1 - \exp(-\lambda \hat H_y)]$$
$$\le (\lambda^{-1} \|f\| + \|\hat R_\lambda f\|)\, E^x[1 - \exp(-\lambda \hat H_y)]$$
$$\le 2\lambda^{-1} \|f\|\, E^x[1 - \exp(-\lambda \hat H_y)].$$
That $\hat R_\lambda f$ is continuous will therefore follow once we show that
(22.3) $$E^x[\exp(-\lambda \hat H_y)] \to 1 \quad \text{as } x \to y.$$
But $\hat H_y = A(H_y) \le H_y$, a.s., so (22.3) follows from the corresponding property for $X$. It is trivial to show that, for $f \in C_0[0, \infty)$, $\hat R_\lambda f(x) \to 0$ as $x \to \infty$. Hence
$$\hat R_\lambda : C_0[0, \infty) \to C_0[0, \infty).$$
By Lemma 6.7, you will see that it is enough now to show that, for $f \in C_0[0, \infty)$,
(22.5) $$\hat P_t f(x) \to f(x) \quad (t \downarrow\downarrow 0).$$
†Here, $y$ is a point of $[0, \infty)$ very close to $x$.

(It will then follow that as $\lambda \uparrow\uparrow \infty$, $\lambda \hat R_\lambda f \to f$ not only pointwise but also in the
supremum norm.) However, since $\hat X$ is right-continuous and $\hat P_t f(x) = E^x[f(\hat X_t)]$, the result (22.5) is obvious.

The same argument covers the case of $\hat X^\lambda$ and the case of elastic Brownian motion in Section 24. The Feller-McKean example (now to be discussed) would require a little more thought, but whether or not it is FD (in fact, it is!) is irrelevant.

23. The Feller-McKean chain. Again let $X$ be CBM($\mathbb R$). We assume Trotter's Theorem 1.5.9. Let $l(t, x)$ be a jointly continuous local time for $X$ and set
$$A_t := \int_{\mathbb R} l(t, x)\, m(dx) = \sum_q l(t, q)\, m\{q\},$$
where $m$ is a probability measure concentrated on the rationals with $m\{q\} > 0$, $\forall q \in \mathbb Q$. Then $A$ is a strictly increasing PCHAF of $X$. (We suppress 'almost surely' qualifying phrases here.) Let $\hat X$ be the corresponding time-transformation of $X$. Then $\hat X$ is a continuous process that spends almost all its time in $\mathbb Q$ and can be regarded as a Markov chain with 'minimal' state-space $\mathbb Q$. The generator of $\hat X$ (considered as an FD process on $\mathbb R$) is a highly singular second-order 'elliptic' operator
$$\frac12 \frac{d}{dm} \frac{d}{dx}.$$
Itô and McKean [1] contains the definitive account of such operators. Breiman [1] and Freedman [1] contain good easy introductions.

The Feller-McKean chain $\hat X$ is historically important as the first chain with all states instantaneous:
$$q_i := -q_{ii} := \lim_{t \downarrow\downarrow 0} t^{-1} [1 - p_{ii}(t)] = \infty, \quad \forall i,$$
where $\{p_{ij}(t)\}$ is the transition matrix function of $\hat X$. (Some basic chain theory is recalled later in this chapter.) Since $\hat X$ moves continuously and therefore does not jump, we have
$$q_{ij} := \lim_{t \downarrow\downarrow 0} t^{-1} p_{ij}(t) = 0 \quad (\forall i, j \in \mathbb Q;\ i \ne j).$$
Thus the Q-matrix $Q$ of $\hat X$ satisfies
$$Q = \begin{pmatrix} -\infty & 0 & 0 & \cdots \\ 0 & -\infty & 0 & \cdots \\ 0 & 0 & -\infty & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}.$$
In Theorem 55.1, we describe all possible totally instantaneous Q-matrices.
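24. Elastic Brownian motion; the arcsine law. Let $X$ continue to denote CBM($\mathbb R$) and let $A$ continue to denote time spent by $X$ above 0. Let $\xi$ be an exponentially distributed variable of rate $\lambda > 0$ independent of the $X$ process. Let
$$X^\lambda_t := \begin{cases} X_t & (t < \xi), \\ \partial & (t \ge \xi). \end{cases}$$
In other words, we have the 'killing at constant rate' situation of (18.14). Set
$$A^\lambda_t := \int_0^t I_{[0, \infty)}(X^\lambda_s)\,ds$$
and let $\tau^\lambda$ be the right-continuous inverse to $A^\lambda$. Let $\hat{\mathcal G}^\lambda$ be the generator of $\hat X^\lambda := X^\lambda \circ \tau^\lambda$. Then
(24.1)(i) $$\hat{\mathcal G}^\lambda f(x) = \tfrac12 f''(x) - \lambda f(x) \quad (x > 0)$$
(24.1)(ii) $$\hat{\mathcal G}^\lambda f(0) = \lim_{\varepsilon \downarrow\downarrow 0} \varepsilon^{-2} [e^{-\gamma\varepsilon} f(\varepsilon) - f(0)], \qquad \gamma := (2\lambda)^{1/2}$$
because
$$E^0[A(H_\varepsilon \wedge \xi)] \sim \varepsilon^2 \quad (\varepsilon \downarrow\downarrow 0), \qquad P^0[H_\varepsilon < \xi] = E^0[\exp(-\lambda H_\varepsilon)] = e^{-\gamma\varepsilon}.$$
Note the elastic boundary condition
(24.1)(iii) $$f'_+(0) = \gamma f(0)$$
satisfied by elements $f$ of $\mathcal D(\hat{\mathcal G}^\lambda)$. Again note how the formula for $\hat{\mathcal G}^\lambda$ checks with l'Hôpital's rule.

Now let us explain why this example is interesting. The lifetime of $\hat X^\lambda$ is $A(\xi)$. If $X(\xi) > 0$ then $\hat X^\lambda$ dies at position $X(\xi)$. The exciting thing is that if $X(\xi) < 0$, which happens for example with $P^0$ probability $\tfrac12$, then $\hat X^\lambda$ dies at 0. In other words, $\hat X^\lambda$ must be obtained by killing $\hat X$ at rate $\lambda$ while $\hat X$ is away from 0, but in a way depending on the local time at 0 when $\hat X$ is at 0. Indeed, if $\hat l_t = l_{\tau(t)}$ denotes the local time at 0 for $\hat X$ then
(24.2) $$P^x[A(\xi) > t \mid \hat X] = \exp(-\lambda t - \tfrac12 \gamma \hat l_t), \quad \forall x \ge 0.$$
This formula, taken from Williams [6], is one way of introducing local time from global considerations! We prove (24.2) below. Note that it may be reformulated in a way that obviates the need for introducing $\xi$:
(24.3) $$E^x[\exp(-\lambda \tau_t) \mid \hat X] = \exp(-\lambda t - \tfrac12 \gamma\, l_{\tau(t)}), \quad \forall x \ge 0.$$

(24.4) Arcsine law. Lévy's arcsine law
$$P^0[A(u) \le t] = \frac{2}{\pi} \arcsin \sqrt{t/u} \quad (t \le u)$$
is an immediate consequence of (24.2). Take $x = 0$ in (24.2) and then take $P^0$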
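expectations to get
$$\int_0^\infty \lambda e^{-\lambda u} P^0[A(u) > t]\,du = e^{-\lambda t} E^0[\exp(-\tfrac12 \gamma \hat l_t)].$$
All that remains is to identify the law of $\tfrac12 \hat l_t$. In fact, $\tfrac12 \hat l_t$ has the same law as $S_t := \sup_{u \le t} X_u$; you may consider this to be completely obvious in view of the construction of local time from upcrossings given in Section 1.14. On the other hand, it may take Exercise 24.5 below to convince you. However, given the equality in law of $\tfrac12 \hat l_t$ and $S_t$, the arcsine law now follows by consulting Laplace transform tables, or, better, by calculating a few integrals.

(24.5) Exercise. (i) Let $U_+(t, \varepsilon)$ be the number of upcrossings of $[0, \varepsilon]$ by $X$ before time $t$, and let $U_-(t, \varepsilon)$ be the number of downcrossings of $[-\varepsilon, 0]$ by $X$ before time $t$. By the strong Markov property of $X$, argue that, conditional on
$$U(t, \varepsilon) := U_+(t, \varepsilon) + U_-(t, \varepsilon) = n,$$
$U_+(t, \varepsilon)$ has a $B(n, \tfrac12)$ distribution.
(ii) Observing that $U(t, \varepsilon)$ is the number of upcrossings of $[0, \varepsilon]$ by $|X|$ before time $t$, show, as in Lévy's Theorem 1.14.7, that
$$\lim_{n \to \infty} 2^{-n} U_+(t, 2^{-n}) = \tfrac12 l_t.$$
(iii) From the characterisation in Section 22 of reflecting Brownian motion, we have $\hat X(t) = X(\tau_t) \overset{\mathcal D}{=} |X_t|$. Thus
$$\lim 2^{-n} U_+(\tau_t, 2^{-n}) = \tfrac12 \hat l_t \overset{\mathcal D}{=} \lim 2^{-n} U(t, 2^{-n}) = l_t.$$

(24.6) Elastic Brownian motion. Let $X$ still denote Brownian motion and $l_t$ the local time at 0 for $X$. Fix $\gamma > 0$. Let $\tilde X$ denote $|X|$ killed at rate $\gamma\,dl_t$, so that $\tilde X$ has transition semigroup $\{\tilde P_t\}$ with
$$\tilde P_t f(x) = E^x[\exp(-\gamma l_t) f(|X_t|)] = E^x[\exp(-\tfrac12 \gamma \hat l_t) f(\hat X_t)].$$
Obviously $\tilde{\mathcal G} f = \tfrac12 f''$ away from 0. The situation at 0 is interesting.
$$\tilde{\mathcal G} f(0) = \lim_{\varepsilon \downarrow\downarrow 0} \frac{E^0[\exp(-\gamma\, l \circ \tilde H_\varepsilon)]\, f(\varepsilon) - f(0)}{E^0[\tilde H_\varepsilon \wedge \tilde\zeta]},$$
where $\tilde H_\varepsilon$ and $H_\varepsilon$ are of course the hitting times of $\varepsilon$ by $|X|$ and $X$ respectively. During the course of our proof of Lévy's Downcrossing Theorem (Section 1.14), we showed that the $P^0$ distribution of $l(\tilde H_\varepsilon)$ is exactly exponential with mean $\varepsilon$, so that
$$E^0[\exp(-\gamma\, l \circ \tilde H_\varepsilon)] = (1 + \gamma\varepsilon)^{-1}.$$
Further (why?),
$$E^0[\tilde H_\varepsilon \wedge \tilde\zeta] \sim \varepsilon^2 \quad (\varepsilon \downarrow\downarrow 0).$$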
Hence, for $f \in \mathcal D(\tilde{\mathcal G})$,
$$f'(0) = \gamma f(0), \qquad \tilde{\mathcal G} f(0) = \tfrac12 f''(0+).$$
The formula (24.2), which involves an extra killing at constant rate $\lambda$, is now obvious.

4. APPROACH TO RAY PROCESSES: THE MARTIN BOUNDARY

25. Ray processes and Markov chains. We now move on from the familiar FD semigroups and processes to Ray semigroups and processes. Quite rightly, you first want certain questions answered:
(25.1)(i) Are there important examples of Ray processes that are not FD?
(25.1)(ii) Does the theory of Ray processes provide new information on FD processes?
(25.1)(iii) Do we have to move on again from Ray processes to still more general objects?
The answers must be (i) Yes, (ii) Yes, (iii) No, or we would not have asked the questions. You will of course realise in regard to (iii) that 'No, never' must yield to 'Hardly ever' if the point is pressed.

For Ray processes, both the analysis and the probability theory are much richer than for the FD situation; and, for the pure mathematicians among you, this may be justification enough for studying the Ray theory. However, motivation never harmed anyone (least of all, pure mathematicians), and we propose to answer questions (i), (ii) and (iii) in some detail before we develop the theory.

In 1966, Chung made some shrewd and prophetic comments in his book [1] on Markov chains:

The second edition of this book appears at a time when boundary theory (envisaged in this book as a study in depth of the behaviour of sample functions in relation to the 'infinities') has just begun to take shape. This vital theme, already announced in the preface to the first edition, will no doubt be the most challenging part of the theory to come. I have chosen not to enter into it in detail in the belief that such a development needs more time to mature.
In this regard, it may be a timely observation that the theory of Markov processes in general state space, which flourished in recent years and has built up a powerful machinery, has had to date little impact on the denumerable [chain] case. This is because the prevailing assumptions would allow the sample paths of chains virtually no other discontinuities than jumps, a situation which would make a trite object of a chain. On the other hand, the special theory of Markov chains has
yet to adapt its methodology to a broader context suitable for the general state-space. Thus there exists at the moment a state of mutual detachment which surely must not be allowed to continue. Future progress in the field looks to a meaningful fusion of these two aspects of the Markovian phenomenon.

The meaningful fusion is achieved in the theory of Ray processes. The benefits to chain theory are enormous. In the other direction, last-exit theory provides one of many examples where the general theory has benefited from adapting the methodology of chain theory. The prophetic character of Chung's comments will be seen throughout this book.

Anyone familiar with Chung's philosophy (one with which we are in full agreement) will know the stress to be laid on sample functions in the quotation. It is on the behaviour of sample functions rather than on that of (for example) excessive functions that we, as probabilists, must concentrate. It is therefore satisfying that the general theory will allow us to treat chains in a manner that greatly clarifies the probabilistic significance of $q_i$, $q_{ij}$, $p_{ij}(t)$ etc. and suppresses much of the analysis on which many previous treatments have relied. To be sure, Ray's theorem makes very heavy use of analysis in its early stages. The point is that, once it gets underway, the probability theory is more or less self-sufficient; and, by then, it trounces analysis at its own game.

In connection with question (25.1)(i), it is necessary to understand Chung's statement that the then-prevailing assumptions covered only 'trite' chains. This relates to the following fact (discussed briefly and illustrated in a moment, and explained fully later in this chapter). If a transition matrix function $\{p_{ij}(t)\}$ on a countable set $I$ has the FD property relative to the discrete topology of $I$, then the associated chain $X$ is totally stable and Feller minimal.
Though nearly all chains that can serve as models for real-world phenomena are totally stable and Feller minimal, such chains are 'trite' from a pure-mathematical standpoint. (We shall attempt to provide a strong justification for the study of 'non-trite' chains later!)

The statement that $X$ is totally stable means of course that every state $i$ in $I$ is stable:
$$q_i := \lim_{t \downarrow\downarrow 0} t^{-1} [1 - p_{ii}(t)] < \infty, \quad \forall i.$$
You can see why this has to be. Since $X$ is FD, $X$ is right-continuous in the discrete topology of $I$, so that if $X$ starts in state $i$ then $X$ must stay at $i$ throughout a time interval. The well-known fact that this time interval is exponentially distributed with rate $q_i$ is proved (along with the existence of $q_i$) for general chains in Section 82. The Feller-minimal property refers to the fact that 'the behaviour of $X$ is completely determined by $Q$' in that if $X$ explodes then $X$ dies at its explosion time. In the next section we clarify these points in regard to a special example.
(25.2) The Ray-Knight topology. It is hard to believe now that Ray's tremendous paper [1] was published as long ago as 1959. Ray's choice of axioms for what is now called a Ray process was astonishingly perceptive. There was unfortunately an error in Ray's attempt to show that every 'acceptable' Markov process (and, in particular, every chain) could be made to accord with these axioms by introducing a suitable topology on, and compactification of, the state space. This extraordinary claim is however essentially true. The gap in Ray's paper was corrected (for different particular situations and in different ways) by several people including Ray himself. It was Knight [1] who got things just right, and, especially after the appearance of the 1967 Kunita and T. Watanabe paper [1], the Ray-Knight compactification was firmly established. In particular it was known that all chains are Ray processes, which gives a strong 'Yes' to question (25.1)(i).

You can see the problem involved if we consider the Feller-McKean example (Section 23). The Feller-McKean process $\hat X$ is a continuous process on $\mathbb R$ that can be regarded as a chain on $\mathbb Q$. Now suppose that someone relabels $\mathbb Q$ as, say, $\mathbb N$ and presents us with the Feller-McKean transition matrix function $\{p_{ij}(t)\}$. Could we recognize from $\{p_{ij}(t)\}$ that we should 'unscramble' the situation by imbedding $\mathbb N$ as $\mathbb Q$ in $\mathbb R$? Yes; the Ray-Knight compactification will do the unscrambling for us.

26. Important example: birth process. The example we now discuss is extremely simple, but it serves well to illustrate points in the theory of Ray processes and (later) in the classification of stopping times and in the theory of jumps of martingales. We are sure that you know enough elementary chain theory (from Volume 1 of Feller [1], which discusses this example) to follow this account without difficulty. We are only concerned with intuitive understanding, and skip some details of rigour (along with some 'a.s.' phrases).
Let $I = \{1, 2, 3, \ldots\}$ and let $Q$ be the $I \times I$ matrix
$$Q = \begin{pmatrix} -q_1 & q_1 & 0 & 0 & \cdots \\ 0 & -q_2 & q_2 & 0 & \cdots \\ 0 & 0 & -q_3 & q_3 & \cdots \\ \vdots & & & & \end{pmatrix}, \quad \text{where } 0 < q_i < \infty,\ \forall i.$$
Let $X$ be a right-continuous chain with Q-matrix $Q$, let $H_n := \inf\{t : X_t = n\}$ and let $\eta := \lim_n H_n \le \infty$ be the first explosion time of $X$. Up to time $\eta$, the paths of $X$ are non-decreasing. Under the $P^i$ law, the successive holding times $H_{i+1} - H_i,\ H_{i+2} - H_{i+1},\ \ldots$ are independent and are exponentially distributed with rates $q_i, q_{i+1}, \ldots$ respectively. Thus
(26.1) $$E^i[e^{-\lambda\eta}] = \prod_{k \ge i} (1 + \lambda q_k^{-1})^{-1} \quad (\lambda > 0),$$
so that (as we see by letting $\lambda \downarrow\downarrow 0$)
if $\sum q_k^{-1} = \infty$ then $\eta = \infty$, a.s.,
if $\sum q_k^{-1} < \infty$ then $\eta < \infty$, a.s.

In the case when $\eta = \infty$, a.s., there is nothing more to say: $X$ is the unique chain with Q-matrix $Q$ and $X$ is FD because
$$|P_t f(i)| \le \max_{j \ge i} |f(j)|$$
and because (6.6)(iv)* is automatic.

We now devote our attention to the case when $\eta < \infty$, a.s. If $X$ is FD (for the discrete topology on $I$) then, by Theorem 11.1, $X$ is quasi-left-continuous on $[0, \infty)$, so that
$$X(\eta) = \lim_n X(H_n) = \infty = \partial;$$
by the coffin condition, $X(t) = \partial = \infty$, $\forall t \ge \eta$. Thus there is only one FD chain, namely the 'Feller-minimal' chain killed at time $\eta$.

(26.2) Exercise. Show that the Feller-minimal chain $X^{\min}$ has resolvent
$$r^{\min}_{ij}(\lambda) := \int_0^\infty e^{-\lambda t} p^{\min}_{ij}(t)\,dt \quad (\lambda > 0),$$
satisfying
$$r^{\min}_{ij}(\lambda) = \begin{cases} 0 & (j < i), \\ (\lambda + q_j)^{-1} \prod_{i \le k < j} [(1 + \lambda q_k^{-1})^{-1}] & (j > i), \end{cases} \qquad r^{\min}_{jj}(\lambda) = (\lambda + q_j)^{-1}. \quad \square$$

When $\eta < \infty$ (a.s.), there are other (non-FD) chains $X$ with Q-matrix $Q$. We now write $E := \{1, 2, 3, \ldots; \infty\}$ for the one-point compactification of $I$ and $\partial$ for a point isolated from $E$. Let $\mu$ be a probability measure on $I \cup \{\partial\}$ with $\mu\{\partial\} < 1$. For each such $\mu$, we can construct a chain $X$ with Q-matrix $Q$ by Doob's immediate-return procedure: we choose $X(\eta) \in I \cup \{\partial\}$ with distribution $\mu$; if $X(\eta) = \partial$ then $X(t) = \partial$, $\forall t > \eta$; if $X(\eta) \in I$ then we run $X$ according to the 'old' rules until the next explosion time $\eta_2$ (say); we choose $X(\eta_2)$ with distribution $\mu$; etc., etc. For this chain $X$, quasi-left-continuity breaks down and takes the modified form
(26.3) $$P^i\Big[X(\lim H_n) = j \,\Big|\, \bigvee \mathcal F(H_n)\Big] = \mu_j := \mu(\{j\}) \quad (j \in I \cup \partial),$$
where $\bigvee \mathcal F(H_n)$ is the smallest σ-algebra containing every $\mathcal F(H_n)$. We say that
$\infty$ is a branch-point with branching measure $\mu$ and write
$$P_0(\infty, \{j\}) = \mu_j.$$
We note that $X$ never visits the branch-point $\infty$. (It approaches $\infty$ but branches at the last moment.) The only sensible interpretation of the $P^\infty$ law is as
$$P^\infty = \sum_{k \in I \cup \partial} \mu_k P^k.$$
Note that now we generally have $\mathcal F(\eta) \ne \bigvee \mathcal F(H_n)$ (in contrast to the 'FD' situation in (11.6)).

(26.4) Exercise. For $k \in I \cup \partial$, put $\chi_k(\lambda) := E^k[e^{-\lambda\eta}]$, so that $\chi_\partial(\lambda) = 0$. Show that, for $i, j \in I$,
$$r_{ij}(\lambda) = r^{\min}_{ij}(\lambda) + \chi_i(\lambda) \sum_{k \in I} \mu_k r_{kj}(\lambda).$$
(Substitute the first equation into itself.)

(26.5) Warning. More complex ways of return from infinity are possible even for this example.

(26.6) Note. We must mention one further point in connection with this example. Consider the immediate-return process with $\mu_1 = 1$, so that $X$ returns to 1 after each explosion. Then $X$ will have the FD property (and so will be quasi-left-continuous) if $I$ is topologised as the compact metric space with 1 as the unique limit-point of the sequence 2, 3, 4, ... Check this by using (26.4) and the HY theorem. The Ray-Knight topology will automatically make the point 1 an accumulation-point of $I$, as it should be.

In short, the Ray-Knight compactification will detect Feller properties when these are present. (More is true. It can happen that when a process $X$ on (say) a compact metric space is constructed by complicated probabilistic methods, we are unable to prove directly that $X$ has the FD property, but can prove the equivalent result that $X$ is a Ray process without branch-points.) But the value of the Ray-Knight compactification is of course that it always works.

27. Excessive functions, the Martin kernel and Choquet theory. Let us do more than answer 'Yes' to question (25.1)(ii) by telling you that Ray theory yields a much simpler and more intuitive account of Martin (-Doob-Hunt) boundary theory, even in the case of discrete-parameter chains!
To be honest, this most elementary case of Martin boundary theory is much the most interesting: it has many delightful and important applications. It will be helpful to run through some of the basic analysis for this case now. For one thing, it will help us
understand the original Martin boundary of R. S. Martin. Later in this chapter, we shall derive both the analysis and the more interesting probability theory for the chain case from Ray's Theorem. The 'Ray' treatment will be independent of Choquet theory, but we now explain how Choquet's famous theorem on integral representation of elements of simplexes shows that a Martin representation of harmonic (more generally, excessive) functions must hold. Meyer [2] and Phelps [1] have fine accounts of Choquet theory. Choquet's Theorem has been useful in establishing the existence and/or uniqueness of integral representations in many areas of probability theory. Inevitably, its very generality prevents its being useful in pinning down the explicit form of extremal elements.

Let $I$ be a countable set and let $\Pi$ be a substochastic $I \times I$ matrix. Define the Green kernel $\Gamma$ of $\Pi$ as the $I \times I$ matrix with
\[ \Gamma(i,j) := \sum_{n=0}^{\infty} \Pi^n(i,j) \le \infty, \quad \forall i, j, \]
so that, formally, $\Gamma = (I - \Pi)^{-1}$. Compare (I.22.1). The probabilistic interpretation is obvious. Let $X = (X_n : n = 0, 1, 2, \ldots)$ be a Markov chain on $I$ (with coffin state $\partial$ adjoined) with 1-step transition matrix $\Pi$, so that, for $n \in \mathbb{N}$ and $i_0, i_1, i_2, \ldots, i_n \in I$,
\[ \mathbf{P}^{i_0}[X_1 = i_1; \ldots; X_n = i_n] = \Pi(i_0, i_1)\Pi(i_1, i_2) \cdots \Pi(i_{n-1}, i_n). \]
(The Daniell-Kolmogorov Theorem immediately gives an appropriate $X$.) Then
\[ \Gamma(i,j) = \mathbf{E}^i[\text{time } (\ge 0) \text{ spent by } X \text{ in } j]. \]

(27A) ASSUMPTION. There exists a reference point $b$ in $I$ such that
\[ 0 < \Gamma(b,j) < \infty, \quad \forall j \in I. \]
This assumption is made throughout the remainder of Section 27. It says: (i) state $b$ can feed into $j$ (ultimately) ($\forall j \neq b$); (ii) every state is transient.

The easily-established strong Markov property of $X$ shows that
\[ \Gamma(i,j) = \mathbf{P}^i[D_j < \infty]\,\Gamma(j,j) \le \Gamma(j,j), \quad \forall i, j, \tag{27.1} \]
where $D_j := \inf\{n \ge 0 : X_n = j\}$. Now define the Martin kernel $\kappa$ on $I \times I$ as
\[ \kappa(i,j) := \Gamma(i,j)/\Gamma(b,j). \tag{27.2} \]
It follows from (27A) and (27.1) that
\[ \kappa(i,j) \le \kappa(j,j) < \infty, \quad \forall i, j. \tag{27.3} \]
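As a concrete check of (27.1)-(27.3), here is a minimal Python sketch on a hypothetical four-state substochastic matrix. The matrix `PI`, the reference state `b = 0` and the helper names are illustrative assumptions, not taken from the text. The sketch sums the series for the Green kernel numerically and then forms the Martin kernel by dividing each column by its entry in row $b$.

```python
# Hypothetical example: PI and the reference state b = 0 are assumptions
# chosen for illustration. Every row sum is < 1, so every state is transient.
PI = [
    [0.0, 0.5, 0.3, 0.0],
    [0.2, 0.0, 0.4, 0.2],
    [0.0, 0.3, 0.0, 0.5],
    [0.1, 0.0, 0.2, 0.3],
]


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def green_kernel(pi, tol=1e-15, max_terms=10_000):
    """Approximate Gamma = sum_{n >= 0} PI^n by truncating the series."""
    n = len(pi)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for _ in range(max_terms):
        term = mat_mul(term, pi)
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j]
        if max(max(row) for row in term) < tol:
            break
    return total


def martin_kernel(gamma, b):
    """kappa(i, j) = Gamma(i, j) / Gamma(b, j), as in (27.2)."""
    n = len(gamma)
    return [[gamma[i][j] / gamma[b][j] for j in range(n)] for i in range(n)]


GAMMA = green_kernel(PI)
KAPPA = martin_kernel(GAMMA, b=0)
```

In this example `KAPPA[i][j] <= KAPPA[j][j]` for every pair, in line with (27.3), and `KAPPA[0][j]` equals 1 for every `j` because of the normalisation at $b$.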
It is another easy consequence of the strong Markov property that
\[ \kappa(i,j) \le \kappa(i,i) < \infty, \quad \forall i, j. \tag{27.4} \]
Exercise. Prove (27.4) probabilistically. Can you give a neat algebraic proof?

A function $f$ from $I$ to $\mathbb{R}$ is called excessive (respectively, regular) (for $\Pi$) if (i) $0 \le f < \infty$; (ii) $\Pi f \le f$ (respectively $\Pi f = f$). The set of excessive functions forms a cone $C$ in $\mathbb{R}^I$. For the topology of $C$, we take that of $\mathbb{R}^I$, that is, the topology of pointwise convergence. Because of Assumption 27A,
\[ \theta(j) := \sup_n \Pi^n(b,j) > 0, \]
and since a function $f$ in $C$ satisfies $f \ge \Pi^n f$ for every $n$, we have
\[ f(b) \ge \theta(j) f(j), \quad \forall j. \tag{27.5} \]
In particular, every $f$ in $C$ may be written as
\[ f = f(b) f^*, \quad f^* \in S := \{f \in C : f(b) = 1\}. \]
The study of the cone $C$ thus reduces to the study of its section $S$.

(27.6) PROPOSITION. The set $S$ is a compact convex metrisable subset of the locally convex linear topological space $\mathbb{R}^I$.

This proposition, which is an immediate consequence of (27.5) and Fatou's Lemma, states exactly that $S$ satisfies the hypothesis of the metrisable case of Choquet's Theorem on the existence of integral representations. Recall that an element $f$ of $S$ is called extremal if the equation
\[ f = \tfrac12 f_1 + \tfrac12 f_2 \quad (f_1, f_2 \in S) \]
implies that $f = f_1 = f_2$. For our special situation, Choquet's Existence Theorem takes the following form. See Meyer [2] and Phelps [1].

(27.7) THEOREM. The set $S_e$ of extremal elements of $S$ is a $G_\delta$ in $S$. If $f \in S$ then there exists a probability measure $\nu$ on $\mathcal{B}(S_e)$ such that
\[ f(i) = \int_{S_e} \xi(i)\,\nu(d\xi), \quad \forall i. \tag{27.8} \]
(Note that the map $\xi \mapsto \xi(i)$ is continuous on $S$.)

We wish to add the following theorem.
(27.9) THEOREM. Further, $\nu$ is uniquely determined by $f$.

Choquet's Uniqueness Theorem states that Theorem 27.9 is equivalent to the following lemma.

(27.10) LEMMA. The cone $C$ is a lattice in its intrinsic order.

Note. The intrinsic order $\ll$ on $C$ is defined as follows: for $x, z \in C$, we write $x \ll z$ if $\exists u \in C$ with $x + u = z$.

How to prove Lemma 27.10 will be explained in a moment. (As an exercise, and not quite as easy as it may look, try proving it now.) The key technique for studying $C$ is provided by the Riesz Decomposition Theorem 27.14.

Let $\mu$ be a (non-negative) measure on $I$ such that
\[ \Gamma\mu(i) := \sum_j \Gamma(i,j)\mu(j) < \infty, \quad \forall i. \]
Then $\Gamma\mu$ is called the potential (due to the charge $\mu$). Since
\[ \Pi\Gamma\mu = \Gamma\mu - \mu \le \Gamma\mu, \tag{27.11} \]
the function $\Gamma\mu$ is excessive. Note that the equation
\[ \mu = \Gamma\mu - \Pi\Gamma\mu \tag{27.12} \]
determines $\mu$ from $\Gamma\mu$, and that
\[ \Pi^n\Gamma\mu = \sum_{k \ge n} \Pi^k\mu \downarrow 0 \quad (n \uparrow \infty). \tag{27.13} \]

(27.14) THEOREM (Riesz Decomposition Theorem). If $f$ is excessive then $f$ has a unique decomposition
\[ f = v + \Gamma\mu, \tag{27.15} \]
where $v$ is regular and $\mu$ is a measure on $I$. Indeed,
\[ v = \lim_n \Pi^n f, \tag{27.16} \]
\[ \mu = f - \Pi f. \tag{27.17} \]

Proof. Define $\mu$ by (27.17). Then $\mu(i) \ge 0$, $\forall i$, and
\[ (I + \Pi + \cdots + \Pi^n)\mu = f - \Pi^{n+1} f. \]
The Monotone-Convergence Theorem yields (27.15) with $v$ as in (27.16). Properties (27.12) and/or (27.13) make the uniqueness assertion obvious. $\square$

(27.18) Exercise. Now prove Lemma 27.10 (if you did not do so earlier) by showing that if
\[ f_1 = v_1 + \Gamma\mu_1, \quad f_2 = v_2 + \Gamma\mu_2 \]
then the lattice structure of $C$ in its intrinsic order is exhibited by the equations
\[ f_1 \wedge\!\wedge f_2 = \lim_n \Pi^n(v_1 \wedge v_2) + \Gamma(\mu_1 \wedge \mu_2), \]
\[ f_1 \vee\!\vee f_2 = f_1 + f_2 - f_1 \wedge\!\wedge f_2, \]
where
\[ (v_1 \wedge v_2)(i) := v_1(i) \wedge v_2(i), \quad (\mu_1 \wedge \mu_2)(i) := \mu_1(i) \wedge \mu_2(i). \]
In this connection, we must mention Feller's historic paper [2], the impressive first attempt to define an appropriate 'boundary' for $C$ by lattice methods.

Hints for exercise. First prove that if $v \le \Gamma\mu$ $(< \infty)$ then $v = 0$. Next deal separately with the cases (i) $v_1 = v_2 = 0$ and (ii) $\mu_1 = \mu_2 = 0$.

As an immediate consequence of the Riesz decomposition, we have the following proposition.

(27.19) PROPOSITION. For each $j$ in $I$, the function $\kappa(\cdot, j)$ is a (non-regular) extremal element of $S$. Every extremal element of $S$ that is not of the form $\kappa(\cdot, j)$ for some $j$ in $I$ is regular.

28. The Martin compactification. (We continue to assume 27A.) Since potential determines charge, the map
\[ \varphi : I \to S \subset \mathbb{R}^I, \quad \varphi(j) := \kappa(\cdot, j) \]
is one-one. We now identify $I$ with $\varphi(I)$ and let $F$ be the compact closure of $I$ $(= \varphi(I))$ in $S$. The set $F$ is called the Martin compactification of $I$, though, in this context, the theory is due to Doob and Hunt. Since the topology of $F$ is inherited from that of $\mathbb{R}^I$,

(28.1) for each $i$, the map $\kappa(i, \cdot)$ extends continuously to $F$;

then $\kappa : I \times F \to \mathbb{R}$, and, for $\xi \in F \setminus I$, we have the alternative notations: $\kappa(i, \xi) = \xi(i)$.

(28.2) THEOREM (Doob, Hunt). Every extremal element of $S$ is of the form $\kappa(\cdot, \xi)$ for some $\xi$ in $F$. The following result therefore holds. Let $F_e$ be the set of $\xi$ in $F$ for which $\kappa(\cdot, \xi)$ is extremal. Then each $f$ in $S$ can be written uniquely as
\[ f = \int_{F_e} \kappa(\cdot, \xi)\,\nu(d\xi) = v + \Gamma\mu, \]
where $\nu$ is a probability measure on $\mathcal{B}(F_e)$,
\[ v := \int_{F_e \setminus I} \kappa(\cdot, \xi)\,\nu(d\xi) \]
is regular, and $\mu(j) = \nu(\{j\})/\Gamma(b, j)$ for $j \in I$.
Once we establish the first sentence of Theorem 28.2, the remainder of the theorem follows from the Choquet results (27.7) and (27.9); and we then have $S_e = F_e \subset F$. (We are presently assuming the Choquet results, but recall that we later, in Section 44, give a full independent proof of Theorems 27.7 and 27.9.) We now argue that it is enough to prove that

(28.3) every element $f$ of $S$ may be written as $f = \int_F \kappa(\cdot, \xi)\,\nu(d\xi)$ for some (not necessarily unique) probability measure $\nu$ on $\mathcal{B}(F)$.

First argument. From (28.3), it follows that $S$ is the closed convex hull of $F$. By a standard result (much more elementary than Choquet theory; see Theorem V.8.5 of Dunford and Schwartz [1]), the extremal elements of $S$ are contained in $F$. $\square$

Second argument. First, we clarify notation by (temporarily) writing $\alpha$ for a typical element of $S_e$ (not yet known to belong to $F$) and $\xi$ for a typical element of $F$. Let $\beta$ be an element of $S_e$. Its unique Choquet representing measure on $\mathcal{B}(S_e)$ is of course the unit mass at $\beta$, denoted by $\varepsilon_\beta(\cdot)$. However, by (28.3), there is at least one probability measure $\nu_\beta$ on $\mathcal{B}(F)$ such that
\[ \beta(i) = \int_{\xi \in F} \kappa(i, \xi)\,\nu_\beta(d\xi) = \int_{\xi \in F} \nu_\beta(d\xi) \int_{\alpha \in S_e} \alpha(i)\,\mu_\xi(d\alpha), \quad \forall i, \]
where $\mu_\xi$ on $\mathcal{B}(S_e)$ is the Choquet representing measure for $\kappa(\cdot, \xi)$. Hence
\[ \varepsilon_\beta(\cdot) = \int_{\xi \in F} \nu_\beta(d\xi)\,\mu_\xi(\cdot) \quad \text{on } \mathcal{B}(S_e), \]
and so, for $\nu_\beta$-almost-all $\xi$ in $F$, $\mu_\xi = \varepsilon_\beta$. But, for any $\xi$ in $F$ for which $\mu_\xi = \varepsilon_\beta$, we have
\[ \kappa(i, \xi) = \int_{\alpha \in S_e} \alpha(i)\,\varepsilon_\beta(d\alpha) = \beta(i), \quad \forall i, \]
so that $\beta = \xi \in F$. $\square$

We have now reduced the problem of proving Theorem 28.2 to that of proving the statement (28.3).

Proof of (28.3). Fix $f$ in $S$. Choose a measure $\beta$ such that
\[ 0 < \Gamma\beta(i) < \infty, \quad \forall i. \]
(By (27.3), it is enough to choose $\beta$ so that $\beta(j) > 0$, $\forall j$, and $\sum_j \Gamma(j,j)\beta(j) < \infty$.) Let
\[ f_n(i) := \min(f(i), n\Gamma\beta(i)). \]
Then $f_n$ is excessive, and since $f_n$ is dominated by the potential $n\Gamma\beta$, it follows from (27.13) and the Riesz theorem that $f_n$ is a potential:
\[ f_n(i) = \sum_j \Gamma(i,j)\mu_n(j) = \sum_j \kappa(i,j)\nu_n(j), \tag{28.4} \]
where $\nu_n(j) = \Gamma(b,j)\mu_n(j)$. Since $f_n(b) = f(b) = 1$ for large $n$, and $\kappa(b,j) = 1$, $\forall j$, it follows that (for large $n$) $\nu_n$ is a probability measure on $F$ with $\nu_n(I) = 1$. Since $F$ is compact metrisable, $\Pr(F)$ is compact metrisable in the weak topology. Let $\nu$ be a subsequential limit of $(\nu_n)$ in $\Pr(F)$. Then (28.3) follows from (28.4) and (28.1). $\square$

The following analytical problem remains: how can we determine the 'extremal part' $F_e$ of $F$? A striking probabilistic solution is provided as one part of the Doob-Hunt probabilistic theory of the Martin boundary for this case. (See Section 29.)

(28.5) Example. Let $X$ be a simple random walk on $\mathbb{Z}^d$ such that
\[ \Pi(i,j) = \begin{cases} (2d)^{-1} & \text{if } |j - i| = 1, \\ 0 & \text{otherwise.} \end{cases} \]
We now prove that every regular function (that is, every solution of $\Pi f = f \ge 0$) is constant.

Case 1: $d = 1$ or 2. In this case, it is well known that $X$ is recurrent. If $f$ is regular then (under each $\mathbf{P}^i$) $f(X_n)$ is a non-negative martingale, so that $\lim f(X_n)$ exists. Since $X$ visits every point of $\mathbb{Z}^d$ infinitely often, the only possible explanation is that $f$ is constant on $\mathbb{Z}^d$. This proof is (of course) due to Doob.

Case 2: $d \ge 3$. In this case, $X$ is transient and Assumption 27A holds with $b = 0$ (say). It is well known (see Spitzer [1]) that
\[ \Gamma(i,j) \sim \text{constant} \cdot |j - i|^{2-d} \quad (|j - i| \to \infty), \]
as one might expect by analogy with the Brownian-motion results. Since
\[ \kappa(i,j) \sim |j - i|^{2-d} / |j|^{2-d}, \]
it is clear that $F$ is the one-point compactification $I \cup \{\infty\}$ of $I$ and that $\kappa(i, \infty) = 1$, $\forall i$. The desired result follows. It is worth mentioning that no particularly simple proof is known. See Spitzer [1] for a specialisation of the Martin-boundary argument to this case and for an Ito-McKean proof.

(28.6) Example. We now consider 'space-time coin-tossing'.
Think of $X_n$ as $(H_n, n)$, where $H_n$ represents the number of heads in $n$ tosses. We put
\[ I := \{(m,n) \in \mathbb{Z}^2 : 0 \le m \le n\}, \]
\[ \Pi((m,n);(m+1,n+1)) = 1 - \Pi((m,n);(m,n+1)) = \tfrac12. \]
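The space-time chain just defined visits each level $n$ exactly once, so its Green kernel is simply a hitting probability. The following minimal Python sketch computes it exactly, assuming the binomial count of fair-coin paths, and forms the Martin kernel (27.2) with the reference point $b = (0,0)$. The function names and the candidate limit $2^n t^m (1-t)^{n-m}$ used for comparison are illustrative labels introduced here.

```python
from fractions import Fraction
from math import comb


def green(m, n, r, s):
    """Probability that fair space-time coin-tossing started at (m, n)
    ever visits (r, s); this equals the expected time spent at (r, s)."""
    up, steps = r - m, s - n
    if steps < 0 or up < 0 or up > steps:
        return Fraction(0)
    return Fraction(comb(steps, up), 2 ** steps)


def martin(m, n, r, s):
    """Martin kernel (27.2) with reference point b = (0, 0); needs 0 <= r <= s."""
    return green(m, n, r, s) / green(0, 0, r, s)


def limit_candidate(t, m, n):
    """Candidate limit of martin(m, n, r, s) as s grows with r/s -> t."""
    return 2 ** n * t ** m * (1 - t) ** (n - m)
```

For instance, `float(martin(1, 3, 1000, 4000))` is close to `limit_candidate(Fraction(1, 4), 1, 3)`, which equals 9/8, and the agreement improves as `s` grows with `r/s` held at 1/4.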
Then, for $(m,n)$ and $(r,s)$ in $I$,
\[ \Gamma((m,n);(r,s)) = \begin{cases} 2^{-(s-n)} \binom{s-n}{r-m} & \text{if } 0 \le r - m \le s - n, \\ 0 & \text{otherwise.} \end{cases} \]
Taking $b = (0,0)$, we find from Stirling's formula that if $s \to \infty$ and $r/s \to t \in [0,1]$ then
\[ \kappa((m,n);(r,s)) \to h_t(m,n) := 2^n t^m (1-t)^{n-m}. \]
It is now clear (why?) that the Martin topology can be regarded as identifying $(m,n)$ in $I$ with $(1+n)^{-1}(m,n) \in \mathbb{R}^2$, with $F \setminus I = [0,1] \times \{1\}$ and
\[ h_t = \kappa(\cdot, \xi), \quad \xi = (t, 1) \in F \setminus I. \]
Thus $f$ is a regular element of $S$ if and only if there exists a probability measure $\nu$ on $\mathcal{B}[0,1]$ such that
\[ f(m,n) = \int_0^1 2^n t^m (1-t)^{n-m}\,\nu(dt). \tag{28.7} \]
This result yields an immediate solution of the Hausdorff moment problem. (See Spitzer [1].) The Weierstrass Approximation Theorem makes it obvious that $\nu$ is uniquely determined by $f$ in (28.7). Hence, for every $t \in [0,1]$, $h_t$ is extremal in $S$.

29. The Martin representation: Doob-Hunt explanation. We retain the notation of Section 28. In particular, we have
\[ \zeta := \inf\{n : X_n = \partial\}. \]

(29.1) THEOREM (Doob, Hunt). Almost surely on $\{\zeta = \infty\}$,
\[ X_{\zeta-} := \lim X_n \]
exists in the topology of $F$, and $X_{\zeta-} \in F_e$.

(Recall that the probabilistic results of this section and the analytical results of the last section will all soon (Section 43) be exhibited as consequences of Ray's Theorem.) As explained in Section 8.5 of Ito and McKean [1], the point is that, while $F_e$ is large enough to allow representation of regular functions, it is also small enough (think!) to describe the exits of $X$.

Let us agree to define $X_{\zeta-} := X_{\zeta-1}$ on $\{\zeta < \infty\}$. (We are not interested in the case when $X$ starts at $\partial$.) Recall that $b$ is our 'reference' point in terms of which the Martin kernel is defined.
(29.2) THEOREM (Doob, Hunt). Let
\[ 1 = \int_{F_e} \kappa(\cdot, \xi)\,\nu_1(d\xi) \]
be the Martin representation of the (excessive) constant function 1. Then
\[ \mathbf{P}^b[X_{\zeta-} \in B] = \nu_1(B), \quad \forall B \in \mathcal{B}(F_e). \]

Example. To get $f(\cdot,\cdot) = 1$ in (28.7), you have to choose $\nu$ to be the unit mass at $\tfrac12$. Thus Theorems 29.1 and 29.2 contain the strong law for tossing a fair coin.

Now let $h \in S$. To avoid trivial nuisances, we assume that $h$ is strictly positive on $I$. The Doob $h$-transform $\Pi_h$ of $\Pi$ is defined as follows:
\[ \Pi_h(i,j) := h(i)^{-1}\,\Pi(i,j)\,h(j). \]
Then $\Pi_h$ is substochastic. We have, with obvious notation,
\[ \kappa_h(i,j) = \frac{1}{h(i)}\,\kappa(i,j), \]
and $f \in S_h$ if and only if $hf \in S$. Thus $F$ and $F_e$ are unaffected if we change from $\Pi$ to $\Pi_h$. (You should formulate this a little more carefully.) Hence if $X^{(h)}$ is a chain with one-step transition matrix $\Pi_h$ then $X^{(h)}(\zeta^{(h)}-)$ exists in $F_e$ almost surely. Here $\zeta^{(h)}$ denotes $\inf\{n : X^{(h)}(n) = \partial\}$.

(29.3) THEOREM (Doob). A strictly positive function $h$ in $S$ is extremal in $S$ if and only if for some single point $\xi$ of $F$, we have
\[ X^{(h)}(\zeta^{(h)}-) = \xi, \quad \text{almost surely.} \]
Then $\xi \in F_e$ and $h = \kappa(\cdot, \xi)$. We then say that $X^{(h)}$ represents $X$ conditioned to converge to $\xi$.

(29.4) Notes. (a) If $h = \kappa(\cdot, c)$, where $c \in I$, then $X^{(h)}$ has the same laws as $\{X_n : n \le \sigma_c\}$, where $\sigma_c$ is the time of the last visit by $X$ to state $c$. (Prove this as an exercise.) This gives the correct interpretation of $X$ conditioned by $\{X(\zeta-) = c\}$ for $c \in I$.

(b) Theorems 29.2 and 29.3 are obviously closely related. (Investigate the connection.)

(29.5) Example. Return to Example 28.6 and take $h = h_t$ for some fixed $t \in [0,1]$. Then $h = \kappa(\cdot, \xi)$, where $\xi = (t,1) \in F \setminus I$. Then
\[ \Pi_h((m,n);(m+1,n+1)) = 1 - \Pi_h((m,n);(m,n+1)) = t. \]
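The computation in this example can be reproduced exactly. The following is a minimal Python sketch, assuming the kernel $h_t(m,n) = 2^n t^m (1-t)^{n-m}$ of Example 28.6 and the fair one-step matrix of space-time coin-tossing; the function names are hypothetical. It applies the rule $\Pi_h(i,j) = \Pi(i,j)h(j)/h(i)$ with rational arithmetic.

```python
from fractions import Fraction


def h(t, m, n):
    """The function h_t(m, n) = 2^n t^m (1 - t)^(n - m)."""
    return 2 ** n * t ** m * (1 - t) ** (n - m)


def pi_step(m, n):
    """One-step transitions of fair space-time coin-tossing from (m, n)."""
    return {(m + 1, n + 1): Fraction(1, 2), (m, n + 1): Fraction(1, 2)}


def pi_h_step(t, m, n):
    """Doob h-transform of pi_step: Pi_h(i, j) = Pi(i, j) h(j) / h(i)."""
    here = h(t, m, n)
    return {j: p * h(t, *j) / here for j, p in pi_step(m, n).items()}
```

With `t = Fraction(1, 3)`, `pi_h_step(t, 2, 5)` assigns probability 1/3 to a head and 2/3 to a tail. The two probabilities sum to 1, which is another way of saying that $h_t$ is regular.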
Thus

(29.6) the $h_t$-transform corresponds to the case of space-time coin-tossing for a coin with probability $t$ of heads.

By the strong law of large numbers, $X^{(h)} \to \xi$, so that $h_t$ is extremal.

(29.7) A statistical interpretation. The property (29.6) illustrates the attractive idea that the Doob $h$-transforms correspond to the set of 'appropriate' alternative hypotheses in hypothesis-testing contexts. This idea has been developed to cover maximum-likelihood estimation, sufficiency etc. by members of the Copenhagen school.

30. R. S. Martin's boundary. The original Martin boundary (Martin [1]) was introduced to describe the non-negative harmonic functions on a Greenian domain $D$ in $\mathbb{R}^n$. Let us run quickly (and in heuristic fashion) through some of the basic ideas. We take $n = 3$, which case provides the most familiar potential theory. Every domain $D$ in $\mathbb{R}^3$ is Greenian, so we do not need to explain what 'Greenian' means.

Let $g$ be the free-space Green function for $\mathbb{R}^3$ for $\tfrac12\Delta$:
\[ g(x,y) := (2\pi|y - x|)^{-1}. \]
Then the Green function $g_D$ for $D$ is the smallest non-negative function on $D \times D$ such that

(30.1)(i) $g - g_D$ is bounded in the neighbourhood of each 'diagonal' point $(x,x)$ of $D \times D$;

(30.1)(ii) $\tfrac12\Delta_x(g - g_D) = \tfrac12\Delta_y(g - g_D) = 0$ on $D$;

(30.1)(iii) $g_D(x,y) = g_D(y,x)$.

Such a function $g_D$ is known to exist. See Section 7.4 of Ito and McKean [1]. Let $V := \mathbb{R}^3 \setminus D$. The physical significance of $g_D(x,y)$ is as the potential at $x$ due to a unit charge (we are using 'probabilistic' units; see Section I.22) placed at $y$ when $V$ is earthed. The probabilistic significance of $g_D$ is that if $X$ is a Brownian motion on $\mathbb{R}^3$ then, for $x, y \in D$,
\[ \mathbf{E}^x[\text{time spent by } X \text{ in } dy \text{ before time } H_V] = g_D(x,y)\,dy. \tag{30.2} \]
The strong Markov theorem yields the probabilistic formula
\[ g_D(x,y) = g(x,y) - \mathbf{E}^x g(X(H_V), y), \tag{30.3} \]
or, in better notation,
\[ g_D(\cdot, y) = g(\cdot, y) - P_V g(\cdot, y). \tag{30.3$'$} \]
(Of course there are all sorts of details of rigour to be chased up, but, for the moment, who cares?)
Classical Martin boundary theory reduces to the analytical
part of the Martin-Doob-Hunt theory of the process $\{X_t : t < H_V\}$ killed on leaving $D$. Pick a reference point $b$ in $D$, introduce the Martin kernel
\[ \kappa(x,y) := g_D(x,y)/g_D(b,y), \]
and take the Gelfand-Stone-Cech (GSC) compactification $F$ of $D$ determined by the separable class of functions $\{\arctan \kappa(x, \cdot) : x \in D\}$. Full details of such GSC compactifications are recalled later in this chapter. The point is that each function $\kappa(x, \cdot)$ extends continuously to $F$. Martin's Theorem is that every non-negative harmonic function $f$ on $D$ with $f(b) = 1$ has a unique representation
\[ f(x) = \int_{F_e \setminus D} \kappa(x, \xi)\,\nu(d\xi), \]
where $F_e$ is what you expect, and $\nu$ is a probability measure on $\mathcal{B}(F_e \setminus D)$.

(30.4) Example. Let $D$ be the open ball $\{x \in \mathbb{R}^3 : |x| < 1\}$. Then Kelvin's method of images shows that, for $y \in D$,
\[ g_D(x,y) = \begin{cases} g(x,y) - |x|^{-1} g(x^*, y) & (x \neq 0), \\ g(0,y) - (2\pi)^{-1} & (x = 0), \end{cases} \]
where $x^* := |x|^{-2}x$ is the point inverse to $x$ in the sphere $\partial D = \{\xi \in \mathbb{R}^3 : |\xi| = 1\}$. Taking $b = 0$, you find that as $y \to \xi \in \partial D$ (in the Euclidean topology),
\[ \kappa(x,y) \to \kappa(x,\xi) := \frac{1 - |x|^2}{|\xi - x|^3}. \]
Hence (why?) $F = D \cup \partial D$ with the Euclidean topology, and every positive harmonic function $f$ on $D$ with $f(0) = 1$ may be written as
\[ f(x) = \int_{\partial D} \kappa(x, \xi)\,\nu(d\xi). \tag{30.5} \]
Invariance under SO(3) implies that every point of $\partial D$ is in $F_e$, so that the representation (30.5) of $f$ is unique. We investigate this example further in Section 31.

31. Doob-Hunt theory for Brownian motion. It is easy to guess the basic form of the Doob-Hunt theorems for Brownian motion or indeed for a general Markov process. However, we need to be careful about the precise formulation
of the concepts of harmonic function and excessive function in the context of continuous-parameter processes. In Example 30.4, '$f$ is harmonic' means of course that $\tfrac12\Delta f = 0$ (on $D$). The correct probabilistic formulation relies on the mean-value property. Here are some guidelines.

Let $\{P_t\}$ be a transition function (in the 'abstract' sense of (1.1)) on a measurable space $(E, \mathcal{E})$. An $\mathcal{E}^*$-measurable function $f$ from $E$ to $[0, \infty]$ is called excessive (for $\{P_t\}$) if

(31.1)(i) $f$ is supermedian: $P_t f \le f$, $\forall t$,

(31.1)(ii) $\lim_{t \downarrow 0} P_t f(x) = f(x)$, $\forall x$.

Of course, if $f$ is excessive, then $P_t f(x) \uparrow f(x)$ $(t \downarrow 0)$. Do note that, for excessive functions, the 'smoothness' condition (31.1)(ii) is imposed in addition to the 'supermedian' condition (31.1)(i). In Ray theory the difference between supermedian and excessive becomes critically important.

Now let $X$ be a 'nice' process on a nice space $E$. Let $\{P_t\}$ be the transition function of $X$. If $f$ is an $\mathcal{E}^*$-measurable function from $E$ to $[0, \infty]$ satisfying the smoothness condition (31.1)(ii), then $f$ is excessive if and only if $f$ is superharmonic in the following sense:
\[ P_{E \setminus A} f \le f \]
whenever $A$ is an open subset of $E$ with compact closure $\bar{A}$. We call an $\mathcal{E}^*$-measurable function from $E$ to $[0, \infty]$ satisfying (31.1)(ii) harmonic if the mean-value property
\[ P_{E \setminus A} f = f \]
holds whenever $A$ is an open subset of $E$ with compact closure $\bar{A}$. It is important to notice that, in general, a harmonic function will not satisfy
\[ P_t f = f, \quad \forall t. \]
Example 31.4 will clarify this matter.

After all that, the analogues of Theorems 29.1-29.3 for the situation discussed in Section 30 are obvious (in form!). Let $D$ be our domain in $\mathbb{R}^3$. Let $V := \mathbb{R}^3 \setminus D$ and let $F$ be the Martin compactification of $D$. Then, for Brownian motion $X$ started inside $D$, it is a.s. true that
\[ X(H_V-) := \lim_{t \uparrow H_V} X(t) \]
exists in $F$ and $X(H_V-) \in F_e \setminus D$. This should set you wondering about the relation between regular points for the Dirichlet problem and extremal points of the Martin boundary.

The following exercise should make you think further about Dirichlet-Martin connections.

(31.2) Exercise. Return to the case where $D$ is the open ball $\{|x| < 1\}$ for which
$F_e = D \cup \partial D$ with the Euclidean topology. By considering the effect of changing the reference point $b$ from 0 to a point $c$ of $D$, show that
\[ \mathbf{P}^c[X(H_{\partial D}) \in d\xi] = \kappa(c, \xi)\,\mu(d\xi), \]
where $\mu$ is a normalised surface area measure on $\partial D$. If $g$ is a continuous function on $\partial D$ then the unique continuous function $h$ on $\bar{D}$ with $\Delta h = 0$ on $D$ and $h = g$ on $\partial D$ is therefore given by the Poisson formula
\[ h(x) = \int_{\partial D} \kappa(x, \xi)\,g(\xi)\,\mu(d\xi). \tag{31.3} \]
For fixed $\xi \in \partial D$, calculate explicitly the differential generator
\[ \tfrac12 \kappa(x,\xi)^{-1} \Delta \kappa(x,\xi) = \tfrac12\Delta + \kappa(x,\xi)^{-1}\,\mathrm{grad}\,\kappa(x,\xi) \cdot \mathrm{grad} \]
of Brownian motion conditioned to hit $\partial D$ at $\xi$, and convince yourself that the 'extra' term behaves as it should.

(31.4) The Helms-Johnson example revisited (see EII.79.77). Take $D := \mathbb{R}^3 \setminus \{0\}$. Let $X$ be Brownian motion in $\mathbb{R}^3$ started in $D$. Since $X$ never visits 0, the Green function for $D$ is just the restriction to $D \times D$ of the free-space Green function
\[ g_D(x,y) = (2\pi|y - x|)^{-1} \quad (x, y \in D). \]
The Martin compactification adjoins two points 0 and $\infty$ to $D$, producing the expected one-point compactification of $\mathbb{R}^3$. If the reference point $b$ is chosen with $|b| = 1$ then
\[ \kappa(x, 0) = |x|^{-1}, \quad \kappa(x, \infty) = 1, \quad \forall x \in D. \]
The function $f$ on $D$ with $f(x) = |x|^{-1}$ is harmonic in $D$, but
\[ P_t f(x) \neq f(x), \quad \forall t > 0,\ \forall x \in D. \]
Indeed, you can check that
\[ P_t f(x) = \left[\Phi(|x| t^{-1/2}) - \Phi(-|x| t^{-1/2})\right] f(x), \quad \forall t > 0,\ x \in D, \tag{31.5} \]
where $\Phi$ is the normal distribution function:
\[ \Phi(y) = (2\pi)^{-1/2} \int_{-\infty}^{y} \exp(-\tfrac12 u^2)\,du. \]
Equation (31.5) makes it clear that $f$ is excessive for $\{P_t\}$. The fact that $f$ is supermedian implies that, $\forall x \in D$, $f(X_t)$ is a supermartingale relative to $(\{\mathcal{F}_t^\circ\}, \mathbf{P}^x)$. The fact that $f$ is not invariant under $\{P_t\}$ implies that, $\forall x \in D$, $f(X_t)$ is not a martingale relative to $(\{\mathcal{F}_t^\circ\}, \mathbf{P}^x)$. The 'intermediate' fact that $f$ is harmonic corresponds to the statement that $f(X_t)$ is a local martingale. It is in local-martingale theory that the main illustrative importance of the present example lies.

Let
\[ T_n := \inf\{t : |X_t| = n^{-1}\} = \inf\{t : f(X_t) = n\} \le \infty. \]
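A quick numerical sanity check of the strict supermedian property of $f(x) = |x|^{-1}$ is possible by simulation. The following minimal Python sketch assumes that $X$ is standard Brownian motion in $\mathbb{R}^3$ (generator $\tfrac12\Delta$), so that $X_t$ started at $x$ is $x$ plus a centred Gaussian vector with independent coordinates of variance $t$. It compares a Monte Carlo estimate of $P_t f(x)$ with the closed form $[\Phi(|x| t^{-1/2}) - \Phi(-|x| t^{-1/2})]\,|x|^{-1}$, with $\Phi$ the standard normal distribution function. The function names, sample size and seed are illustrative choices.

```python
import math
import random


def std_normal_cdf(y):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def closed_form(r, t):
    """[Phi(r / sqrt(t)) - Phi(-r / sqrt(t))] / r, for a start at distance r from 0."""
    z = r / math.sqrt(t)
    return (std_normal_cdf(z) - std_normal_cdf(-z)) / r


def monte_carlo(r, t, samples=100_000, seed=0):
    """Estimate E[1 / |X_t|] for 3-d Brownian motion started at (r, 0, 0)."""
    rng = random.Random(seed)
    sd = math.sqrt(t)
    total = 0.0
    for _ in range(samples):
        x = r + rng.gauss(0.0, sd)
        y = rng.gauss(0.0, sd)
        z = rng.gauss(0.0, sd)
        total += 1.0 / math.sqrt(x * x + y * y + z * z)
    return total / samples
```

For `r = 1.0` and `t = 1.0` the closed form is about 0.683, strictly below $f(x) = 1$, and the simulated value agrees with it to within Monte Carlo error.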
We can find $g_n$ in $\mathcal{D}(\tfrac12\Delta)$ with $g_n = f$ on $\{|y| \ge n^{-1}\}$. For $|x| \ge n^{-1}$, we have, by Dynkin's formula (10.12),
\[ P_{t \wedge T_n} f(x) - f(x) = \mathbf{E}^x f(X(t \wedge T_n)) - f(x) = \mathbf{E}^x g_n(X(t \wedge T_n)) - g_n(x) = \mathbf{E}^x \int_0^{t \wedge T_n} \tfrac12\Delta g_n(X_s)\,ds = 0, \]
since $\tfrac12\Delta g_n = \tfrac12\Delta f = 0$ on $\{|y| \ge n^{-1}\}$. Hence, for each fixed $x$, we have
\[ P_{t \wedge T_n} f(x) = f(x), \quad \forall t,\ \forall n > |x|^{-1}. \tag{31.6} \]
Since $T_n \uparrow \infty$, this property can be regarded as an appropriate ('local') correction of the false statement that $P_t f(x) = f(x)$, $\forall x$, $\forall t$. Note further that, on letting $n \uparrow \infty$ in (31.6), we obtain
\[ P_t f(x) \le f(x), \quad \forall t,\ \forall x \in D, \]
by Fatou's Lemma. Since $P_t f(x) \neq f(x)$, $\forall t > 0$, $\forall x \in D$, it follows that

(31.7) for each $t$ and each $x \in D$, the sequence $\{f(X_{t \wedge T_n}) : n \in \mathbb{N}\}$ is not uniformly integrable relative to $\mathbf{P}^x$.

We shall appreciate the full significance of (31.7) only when we come to study local-martingale theory.

It is worth noting that, since $|X_t|^{-1}$ is a continuous supermartingale relative to each $\mathbf{P}^x$ $(x \in D)$, $\lim_{t \to \infty} |X_t|^{-1}$ exists a.s. $(\mathbf{P}^x)$.

Exercise. Explain why the limit must be 0, a.s. $(\mathbf{P}^x)$.

Thus a 3-dimensional Brownian motion started away from 0 will never hit 0 and will drift to $\infty$. The corresponding result is true all the more in dimension $n > 3$. Of course, we have known and used these properties since Section I.18.

Next consider '$X$ Doob-conditioned to converge to 0' with generator $f^{-1}(\tfrac12\Delta)f$. The radial part of this process is nothing other than a 1-dimensional Brownian motion absorbed at 0. (Check this!) We can see this working in reverse. Let $\beta$ be a 1-dimensional Brownian motion on $(0, \infty)$ with 0 as killing boundary (see Section I.13). The Martin compactification is of course $[0, \infty]$, and if we take $b = 1$, we obtain
\[ \kappa(x, 0) = 1, \quad \kappa(x, \infty) = x. \]
Thus 'Brownian motion on $(0, \infty)$ Doob-conditioned to converge to $\infty$' has generator
\[ x^{-1}\left(\tfrac12 \frac{d^2}{dx^2}\right) x = \tfrac12 \frac{d^2}{dx^2} + x^{-1}\frac{d}{dx}; \]
in other words, it is a BES(3) process.
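The operator identity behind the BES(3) statement, namely $x^{-1}(\tfrac12 d^2/dx^2)(x u) = \tfrac12 u'' + x^{-1}u'$, can be checked numerically. The following is a minimal Python sketch using central finite differences; the step size, the test function and the function names are illustrative assumptions, and the check is approximate rather than symbolic.

```python
import math


def second_derivative(g, x, step=1e-4):
    """Central-difference approximation to g''(x)."""
    return (g(x + step) - 2.0 * g(x) + g(x - step)) / (step * step)


def first_derivative(g, x, step=1e-4):
    """Central-difference approximation to g'(x)."""
    return (g(x + step) - g(x - step)) / (2.0 * step)


def h_transform_generator(u, x):
    """x^{-1} (1/2 d^2/dx^2) applied to y -> y * u(y), evaluated at x."""
    return 0.5 * second_derivative(lambda y: y * u(y), x) / x


def bes3_generator(u, x):
    """(1/2) u''(x) + u'(x) / x, the BES(3) generator applied to u."""
    return 0.5 * second_derivative(u, x) + first_derivative(u, x) / x
```

For a smooth test function such as `math.sin`, the two functions agree at any `x > 0` up to discretisation and rounding error.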
(31.8) Comments. Martin-Doob-Hunt-Ray-Kunita-Watanabe boundary theory may be developed under assumptions of extreme generality. We present proofs only for the discrete-parameter chain case. But we do so by the most modern continuous(!)-parameter methods. We then give a list of references in which you may chase up the general theory.

The long account of Martin-Doob-Hunt theory that we have already given has been a gentle introduction to compactifications. It helps prepare the way for the much more sophisticated Ray-Knight compactification. Further, it will be reassuring after working through the Ray-Knight theory to find that in the case of discrete-parameter chains we obtain exactly (there are many boundaries!) the familiar results of Sections 27-29.

After all this talk of Martin boundary theory, let us remind you that it is but one application of Ray theory. Indeed, it is not the one that chiefly concerns us in this book!

32. Ray processes and right processes. It remains to answer question (25.1)(iii). Blumenthal and Getoor [1] present a theory based, not on assumptions like the FD property that rely on analytical properties of the transition function, but on probabilistic axioms for a setup. The fundamental concept in Blumenthal and Getoor [1] is that of a standard process. Now there are obvious advantages in using probabilistic axioms: for example, the property of being standard is preserved under various probabilistic operations that do not preserve the FD property.

Briefly, the situation is this. Some generalisation of standard process was needed in order to cope with branch-points. Meyer introduced the concept of a right process (process satisfying les hypotheses droites) as the 'natural' concept for Markov-process theory. Ray's hypotheses remained the most general analytical hypotheses on transition functions.
Ray, Knight, Meyer, Shih, Walsh, Getoor and Sharpe all participated in work that culminated in proving that the concepts of Ray process and right process are essentially the same. This is explained very clearly in the books by Getoor [1], Sharpe [1] and Dellacherie and Meyer [1]. Let us quote part of Getoor's book (perhaps you will not mind if we emphasise the obvious fact that the set $D$ of non-branch-points has no connection with our domain $D$ in the classical Martin-boundary case!):

One has the following inclusions among the various classes of processes: (Feller) $\subset$ (Hunt) $\subset$ (special standard) $\subset$ (standard) $\subset$ (right). ... These different types of processes were introduced at various stages during the development of the modern theory of Markov processes. In view of the theory to be developed in the sections it seems to me that [except for right processes] they are now mainly of historic interest. A
Ray process $Y$ on a compact metric space $E$ is not necessarily a right process since $P_0 \neq I$ [in general]. However, if one restricts $Y$ to the set of non-branch points $D$, then it is a Borel right process. Since $Y_t \in D$ for all $t \ge 0$, this amounts to considering initial measures that are carried by $D$. In what follows we shall see that the converse is true in the sense that if $X$ is a right process with state-space $E$, then by changing the topology on $E$ one can essentially regard $X$ as a Ray process restricted to its set of non-branch points.

There is thus a perfect match between the probabilistic and analytical parts of the theory. However, at least in special circumstances, one can achieve such a match in other ways, as is illustrated by the important books by Fukushima [1], Silverstein [1,2] on symmetrisable processes. But the moral is that the subject's masters regard its theoretical foundations as being fully achieved by the essentially equivalent theories of Ray processes and right processes. We content ourselves with introducing you to the more concrete Ray processes; the masters can then set you right.

5. RAY PROCESSES

33. Orientation. Our basic datum will be a resolvent $\{R_\lambda : \lambda > 0\}$ on a space $I$. In general, there will not exist a nice (right) process with resolvent $\{R_\lambda\}$ and taking all its values in $I$. In order to construct a nice (Ray) process with resolvent $\{R_\lambda\}$, we must generally allow the process to take values in a suitable compactification $F$ of $I$. (If this process starts in $I$, and if $I$ is a Borel subset of $F$, then the set of times spent in $F \setminus I$ before the death-time of the process will be of measure zero.)

First we must construct the Ray-Knight compactification $F$ of $I$ determined by $\{R_\lambda\}$, extend $\{R_\lambda\}$ to a 'Ray' resolvent on $F$, and construct the 'Ray' transition function $\{P_t\}$ on $F$ with resolvent $\{R_\lambda\}$. Then we shall construct the Ray process on $F$ with transition function $\{P_t\}$.
The main case in which we are interested is that when we begin with a 'standard' transition function on a countable set $I$; and our notation for the general case reflects that for chains.

Here (as a guide: you are not expected to understand about RK compactifications, branch-points etc. now!) is a list of our notations for chains:

$I$: countable 'minimal' state space;
$\{P_t\}$: ('standard') transition function on $I$ with resolvent $\{R_\lambda\}$;
$F$: Ray-Knight compactification of $I$ determined by $\{R_\lambda\}$;
$F_e$: set of non-branch- (or extremal) points in $F$;
$F_{br}$: set of branch-points in $F$;
$E := \{\xi \in F : P_t(\xi, F \setminus I) = 0,\ \forall t > 0\}$.

In the definition of $E$, we have used the notation $\{P_t\}$ to denote the Ray extension on $(F, \mathcal{B}(F))$ of the original transition function on $I$.
(33.1) Important note. Let $I$ (for reasons explained in Section 32, we abandon the notation '$D$ for domain') be a bounded domain in $\mathbb{R}^3$ and let $V := \mathbb{R}^3 \setminus I$. To obtain the Martin-Doob-Hunt results for Brownian motion on $I$, we apply Ray theory not to $\{X_t : t < H_V;\ \mathbf{P}^b\}$ but to its time-reversal
\[ \hat{X} := \{X(H_V - t) : 0 < t < H_V;\ \mathbf{P}^b\}, \]
which is also Markovian with stationary transition probabilities and which has Green function
\[ \hat{u}_I(y, x) = g_I(b, x)\,g_I(x, y)/g_I(b, y) = g_I(b, x)\,\kappa(x, y). \]
Note the order in which $y$ and $x$ appear, and also the appearance of the Martin kernel. Time-reversal will be studied in Part 6 of this chapter. Ray theory is an entrance-boundary theory, and time reversal has to be invoked to apply it to the theory of the Martin exit boundary. Where $X$ goes to is where $\hat{X}$ comes from.

If $I$ is an arbitrary (possibly unbounded) domain in $\mathbb{R}^3$, we first speed up $X$ via a time substitution of the type described in Section 21, so as to ensure that $X$ exits $I$ (or else dies) within a finite time. Then we reverse $X$ to produce $\hat{X}$. The faster the speeding up, the smoother will be the analytic properties of $\hat{X}$.

As we have already mentioned, you will see all the tricks during our 'Ray' treatment of the discrete-parameter chain case.

(33.2) Our plan. We first describe the 'good' situation, that in which we have a Ray resolvent on a compact metric space $F$. Then we explain how a resolvent on a measurable space $(I, \mathcal{I})$ may, under minimal conditions, be extended to a Ray resolvent on a Ray-Knight compactification $F$ of $I$. We then construct the Ray semigroup and Ray process on $F$.

34. Ray resolvents. Let $F$ now denote an arbitrary compact metric space. Let $\{R_\lambda : \lambda > 0\}$ be an honest Feller resolvent on $C(F)$:
\[ R_\lambda : C(F) \to C(F), \quad 0 \le f \le 1 \Rightarrow 0 \le \lambda R_\lambda f \le 1, \quad \lambda R_\lambda 1 = 1, \]
\[ R_\lambda - R_\mu + (\lambda - \mu) R_\lambda R_\mu = 0. \]

(34.1) DEFINITION (continuous $\alpha$-supermedian function). For $\alpha \ge 0$, an element $f$ of $C(F)$ is called a (continuous) $\alpha$-supermedian function relative to $\{R_\lambda\}$, and we write $f \in \mathrm{CSM}^\alpha$, if
\[ 0 \le \lambda R_{\lambda+\alpha} f \le f \quad (\forall \lambda > 0). \]
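For $\beta > \alpha$,
\[ \lambda R_{\lambda+\beta} f = \lambda R_{\lambda+\alpha} f - \lambda(\beta - \alpha) R_{\lambda+\alpha} R_{\lambda+\beta} f, \tag{34.2} \]
so that
\[ \mathrm{CSM}^\alpha = \bigcap_{\beta > \alpha} \mathrm{CSM}^\beta. \]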
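For a finite state space every function is continuous, so the identity (34.2) and the $\alpha$-supermedian condition can be tested directly. The following minimal Python sketch uses a hypothetical two-state chain that jumps from state 0 to state 1 at rate `A` and back at rate `B`, for which the resolvent $(\lambda - Q)^{-1}$ has a closed form. The rates, function names and the finite grid of $\lambda$ values are illustrative assumptions, so the supermedian test is only a spot check.

```python
A, B = 2.0, 3.0  # hypothetical jump rates 0 -> 1 and 1 -> 0


def resolvent(lam):
    """R_lam = (lam - Q)^{-1} for Q = [[-A, A], [B, -B]], as a 2 x 2 matrix."""
    det = lam * (lam + A + B)
    return [[(lam + B) / det, A / det], [B / det, (lam + A) / det]]


def act(matrix, f):
    """Apply a 2 x 2 matrix to a function f given as a list of two values."""
    return [sum(matrix[i][k] * f[k] for k in range(2)) for i in range(2)]


def sides_of_34_2(f, lam, alpha, beta):
    """Return the left-hand and right-hand sides of identity (34.2)."""
    left = [lam * v for v in act(resolvent(lam + beta), f)]
    first = act(resolvent(lam + alpha), f)
    second = act(resolvent(lam + alpha), act(resolvent(lam + beta), f))
    right = [lam * a - lam * (beta - alpha) * c for a, c in zip(first, second)]
    return left, right


def looks_alpha_supermedian(f, alpha, lams=(0.1, 1.0, 10.0, 100.0), tol=1e-12):
    """Spot check of 0 <= lam * R_{lam + alpha} f <= f on a finite grid of lam."""
    for lam in lams:
        g = [lam * v for v in act(resolvent(lam + alpha), f)]
        if any(gv < -tol or gv > fv + tol for gv, fv in zip(g, f)):
            return False
    return True
```

In this sketch, `act(resolvent(0.5), [1.0, 4.0])` passes the spot check with `alpha = 0.5`, as any function of the form $R_\alpha g$ with $g \ge 0$ should by the resolvent equation, whereas the indicator `[1.0, 0.0]` fails it.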
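(34.3) DEFINITION (Ray resolvent). Our honest Feller resolvent on $F$ is called a Ray resolvent if

(34.4) $\bigcup_{\alpha \ge 0} \mathrm{CSM}^\alpha$ separates points of $F$.

(34.5) LEMMA. $\{R_\lambda\}$ is a Ray resolvent if and only if $\mathrm{CSM}^\alpha$ separates points of $F$ for each fixed strictly positive $\alpha$.

Lemma 34.5 obviously follows from the following result.

(34.6) LEMMA. The vector space
\[ \mathcal{L} := \mathrm{CSM}^\alpha - \mathrm{CSM}^\alpha \quad (\alpha > 0) \]
is independent of $\alpha > 0$.

Proof. Let $0 < \alpha < \beta$. Since $\mathrm{CSM}^\alpha \subset \mathrm{CSM}^\beta$, it is enough to prove that if (as we now assume) $f \in \mathrm{CSM}^\beta$ then $f$ may be written as the difference of two elements of $\mathrm{CSM}^\alpha$. But
\[ f = [f + (\beta - \alpha) R_\alpha f] - (\beta - \alpha) R_\alpha f \]
and, since $(\beta - \alpha) R_\alpha f \in \mathrm{CSM}^\alpha$ because of the resolvent equation, it is enough to show that $f + (\beta - \alpha) R_\alpha f \in \mathrm{CSM}^\alpha$. Now, since $f \in \mathrm{CSM}^\beta$, it follows from (34.2) that
\[ \lambda R_{\lambda+\alpha} f \le f + (\beta - \alpha) R_{\lambda+\alpha} f. \]
Using this fact and the resolvent equation again, we obtain
\[ \lambda R_{\lambda+\alpha}[f + (\beta - \alpha) R_\alpha f] \le f + (\beta - \alpha) R_{\lambda+\alpha} f + (\beta - \alpha)(R_\alpha f - R_{\lambda+\alpha} f) = f + (\beta - \alpha) R_\alpha f, \]
so that $f + (\beta - \alpha) R_\alpha f \in \mathrm{CSM}^\alpha$ as required. $\square$

(34.7) LEMMA. If $\{R_\lambda\}$ is a Ray resolvent then $\mathcal{L}$ is a dense subspace of the Banach space $C(F)$.

Proof. For each $\alpha$, $\mathrm{CSM}^\alpha$ is obviously closed under the operation $\wedge$. For $f, g, h, k \in \mathrm{CSM}^\alpha$, we have
\[ (f - g) \wedge (h - k) = [(f + k) - (g + k)] \wedge [(h + g) - (g + k)] = [(f + k) \wedge (h + g)] - (g + k) \in \mathcal{L}. \]
Hence $\mathcal{L}$ is closed under $\wedge$, and, since $\mathcal{L}$ is a vector space, $\mathcal{L}$ is a lattice. Since $\mathcal{L}$ contains constant functions and $\mathcal{L}$ separates points of $F$ because of the Ray hypothesis, the lattice form of the Stone-Weierstrass Theorem gives the result. $\square$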
35. The Ray-Knight compactification. We make the following hypothesis.

(35.1) GENERAL HYPOTHESIS. Suppose given
(i) a measurable space $(I, \mathcal{I})$;
(ii) an honest resolvent $\{R_\lambda : \lambda > 0\}$ on $(I, \mathcal{I})$, so that in particular $R_\lambda : b\mathcal{I} \to b\mathcal{I}$;
(iii) a sequence $S = (f_k : k \in \mathbb{N})$ of elements of $b\mathcal{I}$ such that, for $x, y \in I$ with $x \ne y$, there exist $\lambda > 0$ and $k \in \mathbb{N}$ such that $R_\lambda f_k(x) \ne R_\lambda f_k(y)$.

Heuristic comment. If this last property failed for some $x$ and $y$ then the process started at $x$ would be identical to that started from $y$ on the time-parameter set $(0, \infty)$; so we would identify $x$ and $y$.

We want to view $\{R_\lambda\}$ as a Ray resolvent on a certain compactification $F$ of $I$. For this purpose, we introduce certain Banach subalgebras of $b\mathcal{I}$. Let $A(\cdot)$ denote 'Banach algebra generated by'. Define inductively
$$(35.2)\qquad Z_1 := A\Big(1, \bigcup_\lambda R_\lambda S\Big), \qquad Z_{n+1} := A\Big(Z_n, \bigcup_\lambda R_\lambda Z_n\Big).$$
It is an immediate consequence of the resolvent equation that the family $\{R_\lambda : \lambda > 0\}$ of bounded operators on the Banach space $b\mathcal{I}$ is separable in the uniform operator topology; and it follows easily that each $Z_n$ is separable. Put
$$(35.3)\qquad Z := \text{closure}\Big(\bigcup_n Z_n\Big).$$
Then it is easily verified that the separable Banach algebra $Z$ is the smallest Banach subalgebra of $b\mathcal{I}$ that contains constant functions, contains $R_\lambda S$ for each $\lambda > 0$, and satisfies $R_\lambda : Z \to Z$ for each $\lambda > 0$. Further, it is immediate from Hypothesis 35.1 that $Z$ separates points of $I$.

Let $\{g_n : n \in \mathbb{N}\}$ be a countable dense subset of $Z$. Define the map $\varphi : I \to \mathbb{R}^{\mathbb{N}}$ as follows:
$$\varphi(x) := (g_1(x), g_2(x), \ldots) \in \mathbb{R}^{\mathbb{N}}.$$
Since $Z$ separates points of $I$, $\varphi$ is one-one. We now identify $I$ with $\varphi(I)$. Since
$$\varphi(I) \subseteq \prod_n [-\|g_n\|, \|g_n\|],$$
the closure $F$ of $\varphi(I)$ is compact in $\mathbb{R}^{\mathbb{N}}$. We call $F$ the Ray-Knight compactification of $I = \varphi(I)$ induced by $S$. We skip discussion of the influence of $S$: it has no practical importance for us.

Since every $g_n$ has a unique continuous extension from $I$ to $F$, it follows that every $g$ in $Z$ has a continuous extension to $F$. Thus $Z \subseteq C(F)$ in an obvious sense.
However, the closed algebra $Z$ contains all constant functions, and, by construction of $F$, separates points of $F$. Hence, by the Stone-Weierstrass
Theorem,
$$(35.4)\qquad Z = C(F).$$

(35.5) LEMMA. $\{R_\lambda : \lambda > 0\}$ is a Ray resolvent on $C(F)$.

Proof. Since $R_\lambda : Z \to Z$ for each $\lambda > 0$, $\{R_\lambda : \lambda > 0\}$ is a Feller resolvent on $F$. Now let $\mathrm{CSM}^\alpha$ denote the set of continuous $\alpha$-supermedian functions for $\{R_\lambda : \lambda > 0\}$ on $F$, and let $\mathcal{L} := \mathrm{CSM}^\alpha - \mathrm{CSM}^\alpha$ (independently of $\alpha$) as in Lemma 34.6. Suppose that there exist two distinct points $\xi$ and $\eta$ of $F$ that are not separated by $\mathcal{L}$. Now, for $h \in C(F) \cap S$,
$$R_\lambda h = R_\lambda(h^+) - R_\lambda(h^-) \in \mathcal{L}.$$
We can therefore show inductively that $\xi$ and $\eta$ are not separated by $Z_1$, or by $Z_2$, etc. But this leads to the ridiculous conclusion that $\xi$ and $\eta$ are not separated by $Z = C(F)$. □

Here is an important situation in which we may use the above construction.

(35.6) SPECIAL HYPOTHESIS. Suppose that
(i) $I$ is an LCCB;
(ii) $\{R_\lambda : \lambda > 0\}$ is a Feller resolvent on $C_b(I)$;
(iii) $\lambda R_\lambda 1 = 1$;
(iv) $\lim_{\lambda \to \infty} \lambda R_\lambda f(x) = f(x)$, $\forall f \in C_b(I)$, $\forall x \in I$.

One case in which this special hypothesis holds is that in which we have a Markov chain on a countable set $I$ with standard transition function and we begin with the discrete topology on $I$, taking $S$ to be the set of indicator functions of elements of $I$.

Suppose now that the special hypothesis (35.6) obtains. Then we may take $S$ to be a countable dense subset of $C_0(I)$. Then the general hypothesis (35.1) holds. Since every function $g_n$ will be continuous on $I$, the map $\varphi$ will here be continuous: the topology on $I$ induced by $F$ will be at least as coarse as the original topology. We emphasize that in this situation, it can happen that
$$\lim_{\lambda \to \infty} \lambda R_\lambda f(\xi) \ne f(\xi)$$
for some $\xi \in F$.

Note that if the special hypothesis (35.6) holds, and $I$ is compact, then, by the proof of Lemma 6.7, $\{R_\lambda : \lambda > 0\}$ will be a strongly continuous resolvent on $C(I)$, and we shall be back in the FD situation.

(35.7) The Feller-McKean chain (see Section 23).
If we are given the Feller-McKean chain viewed as a chain on $\mathbb{Q}$ with its discrete topology, and take $S$ to be the collection of indicator functions $(I_q : q \in \mathbb{Q})$, then the Ray-Knight compactification of $\mathbb{Q}$ will be the usual one-point compactification of $\mathbb{R}$. Hence the Ray-Knight topology of $\mathbb{Q}$ (that induced from the topology of $F$) will be
coarser than the original topology. (The set $\{0\}$ was open in the discrete topology.) The proof of the statements just made follows from results on one-dimensional diffusions $X$ in Section V.50 in Volume 2, the essential point being that, by Dynkin's formula,
$$|R_\lambda f(x) - R_\lambda f(y)| = \Big| E^x \int_0^{H_y} e^{-\lambda s} f(X_s)\, ds + [E^x(e^{-\lambda H_y}) - 1] R_\lambda f(y) \Big| \le 2\lambda^{-1} \|f\| [1 - E^x(e^{-\lambda H_y})].$$

(35.8) Example. We now look at a famous example which illustrates a case where our Ray hypotheses do not hold. Let $I := [0, \infty)$. Our process $X$ stays at 0 for an exponential holding time of rate 1 and then drifts towards $\infty$ at rate 1. Thus
$$P_t f(x) = \begin{cases} f(x + t) & \text{if } x > 0, \\ e^{-t} f(0) + \int_0^t e^{-s} f(t - s)\, ds & \text{if } x = 0. \end{cases}$$
This time, $R_\lambda$ does not map $C_b(I)$ to $C_b(I)$. Let us concentrate on the situation when $X$ starts at 0. Put $T := \inf\{t : X_t \ne 0\}$. Then $T$ is an $\{\mathcal{F}^\circ_{t+}\}$ stopping time, but, since $X(T) = 0$ and $X(T + \varepsilon) \ne 0$, $\forall \varepsilon > 0$, the process $X$ does not start afresh at time $T$. (This example is the standard example of a simple Markov process that is not strong Markov relative to the $\{\mathcal{F}^\circ_{t+}\}$ filtration.)

For this example, what we need the Ray-Knight compactification to do is to enforce the strong Markov property by tearing $[0, \infty)$ apart at 0, producing a 'corrected' state space
$$F := \{0\} \cup [0+, \infty),$$
where $[0+, \infty)$ is homeomorphic to the usual half-line $[0, \infty)$ and where 0 is now a point isolated from $[0+, \infty)$. Our process (started at 0) is then modified by setting
$$X_t := 0 \in F \ (t < T), \qquad X_T := 0+, \qquad X_t := t - T \ (t > T);$$
and it is now strong Markov, as required. In this example, the mapping $I \to \varphi(I)$ is not continuous: the Ray-Knight topology on $I$ is now bigger than the original topology because $\{0\}$ is open in the RK topology.

(35.9) Lévy's diagonal Q-matrix. Lévy 'jazzed up' Example 35.8 to produce a remarkable illustrative example for Markov chain theory. Let $I := \mathbb{Q} \cap [0, 1]$. Let $q$ be a strictly positive function on $I \setminus \{1\}$ such that
$$\sum_{i \in I \setminus \{1\}} q_i^{-1} < \infty.$$
Put $q_1 := 0$. Our process takes values in $[0, 1]$.
Its sample paths are continuous increasing functions which spend almost all their time in $I$. (The paths are
'random Cantor functions'.) We now describe the law of the process started at 0. Let $\{T_j : j \in I\}$ be independent exponentially distributed variables, $T_j$ having rate $q_j$, so that, in particular, $T_1 = \infty$. Define
$$X_t := j \quad \text{if} \quad \sum_{k < j} T_k \le t < \sum_{k \le j} T_k$$
and interpolate $X$ by continuity (or monotonicity). Then $X$ is a simple Markov process with resolvent satisfying
$$R_\lambda(i, j) = (\lambda + q_j)^{-1} \prod_{i \le k < j} \frac{q_k}{\lambda + q_k} \qquad (i \le j),$$
as for the pure-birth process in Section 26.

The discussion of Example 35.8 makes it clear that the Ray-Knight compactification should tear the set $[0, 1]$ apart at each point $i$ of $I \setminus \{1\}$, so that the process will spend an exponentially distributed time at $i$, then jump to $i+$ and leave $i+$ immediately. The RK topology on $I$ arises from a metric $d$, which, in this example, is given by
$$d(i, j) = E^i(H_j) = \sum_{i \le k < j} q_k^{-1} \qquad (i < j).$$
Whether or not we deal in compactifications, the process $X$ cannot jump from a point $i$ of $I$ to a point $j$ of $I$. (The point $i+$ is of course not in $I$; it is a 'fictitious' state from the point of view of chain theory.) Hence the Q-matrix $Q$ of $X$ satisfies $q_{ij} = 0$ whenever $i \ne j$. Thus $Q$ is the diagonal $I \times I$ matrix
$$Q = \mathrm{diag}(-q_i) = (-q_i \delta_{ij}).$$

Ray's Theorem: analytical part

36. From semigroup to resolvent

Here is Part 1 of Ray's amazing achievement.

(36.1) THEOREM (Ray's Theorem: Part 1). Let $\{R_\lambda\}$ be a Ray resolvent on a compact metric space $F$. Thus $\{R_\lambda\}$ is an honest Feller resolvent, and the space $\mathcal{L}$ in Lemma 34.6 is dense in $C(F)$. Then there exists a unique honest measurable transition function $\{P_t\}$ on $(F, \mathcal{B}(F))$ (see Section 3) such that
(i) $t \mapsto P_t f(x)$ is right-continuous on $[0, \infty)$ for $x \in F$ and $f \in C(F)$;
(ii) $R_\lambda f(x) = \int_0^\infty e^{-\lambda t} P_t f(x)\, dt$ $(f \in C(F),\ x \in F,\ \lambda > 0)$.

Points to note
(a) It is not true in general that $P_t : C(F) \to C(F)$.
(b) It is not true in general that $P_0 = I$, though, of course, $P_0 P_t = P_t P_0 = P_t$, $\forall t \ge 0$.
(c) It is not true in general that $t \mapsto P_t f(x)$ is continuous for $f \in C(F)$ and $x \in F$.

An example illustrating these points will be given at the end of the next section.

Getoor [1] and Dellacherie and Meyer [1, Chapter XII] prove the theorem from first principles, that is, without using the Hille-Yosida Theorem. Since we already have the HY Theorem, we give a proof based on it, guided in part by Meyer [2].

Proof of Theorem 36.1. Because of the importance of the theorem, we go carefully through the proof. Throughout the proof $\alpha$ denotes a fixed positive number. Though $\alpha$ is used in its construction, the final transition semigroup $\{P_t\}$ will, of course, be independent of $\alpha$.

We make use of the well-known fact that if $(x_{mn})$ is a double sequence such that both $m \mapsto x_{mn}$ and $n \mapsto x_{mn}$ are monotone non-decreasing then
$$\lim_m \lim_n x_{mn} = \lim_n \lim_m x_{mn}.$$
(This is of course what underlies the Monotone-Convergence Theorem.) As a consequence, we see that

(36.2) the limit of a non-decreasing sequence of right-continuous non-increasing functions mapping $[0, \infty)$ into $[0, \infty)$ is right-continuous.

Strategy. The idea of the proof is to begin by using the Hille-Yosida Theorem to establish results on the domain $Z_0$ of strong continuity of the resolvent; to extend these results by monotonicity to functions in $\mathrm{CSM}^\alpha$; then by linearity to functions in $\mathcal{L}^\alpha := \mathrm{CSM}^\alpha - \mathrm{CSM}^\alpha$; then by continuity to functions in $C(F)$, since $\mathcal{L}^\alpha$ is dense in $C(F)$; then by the Monotone-Class Theorem to functions in $b\mathcal{B}(F)$. But we have to be careful of the order in which this strategy is applied.

Step 1: Write
$$(36.3)\qquad \mathcal{R} := R_\lambda C(F) \ \text{(independently of } \lambda > 0), \qquad Z_0 := \overline{\mathcal{R}},$$
the closure being in $C(F)$ or $b\mathcal{B}(F)$. By the Hille-Yosida Theorem, there exists a strongly continuous contraction semigroup $\{Q_t : t \ge 0\}$ on $Z_0$ with resolvent the restriction of $\{R_\lambda\}$ to $Z_0$. It is clear from (5.4) and (5.5) that each $Q_t$ is positive on $Z_0$ in that if $f \in Z_0^+$ (that is, $f \in Z_0$ and $f \ge 0$) then $Q_t f \ge 0$.
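The constant function 1 is in $Z_0$, and $Q_t 1 = 1$ for all $t$. Finally, for $f \in Z_0$, $\lambda R_{\lambda+\alpha} f \to f$ as $\lambda \to \infty$, whence

(36.4) for $f \in Z_0$ and $t > 0$, $Q_t(\lambda R_{\lambda+\alpha} f) \to Q_t f$ (strong topology).

Step 2: Let $t \ge 0$ and let $f \in \mathrm{CSM}^\alpha$. The map $\lambda \mapsto \lambda R_{\lambda+\alpha} f$ is non-decreasing because, for $0 < \lambda < \nu$,
$$\nu R_{\nu+\alpha} f - \lambda R_{\lambda+\alpha} f = (\nu - \lambda) R_{\nu+\alpha} [f - \lambda R_{\lambda+\alpha} f] \ge 0.$$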
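Since $\lambda R_{\lambda+\alpha} f \in \mathcal{R}$, so that $Q_t(\lambda R_{\lambda+\alpha} f)$ is defined, we may (as we must!) define

(36.5) for $f \in \mathrm{CSM}^\alpha$ and $x \in F$,
$$P_t f(x) := \uparrow \lim_{\lambda \uparrow\uparrow \infty} \{Q_t(\lambda R_{\lambda+\alpha} f)\}(x).$$

The formula (36.5) and the linearity and positivity of $\{Q_t\}$ make it clear that if $f_1, f_2 \in \mathrm{CSM}^\alpha$ and $c_1, c_2 \in [0, \infty)$ then
$$P_t(c_1 f_1 + c_2 f_2) = c_1 P_t f_1 + c_2 P_t f_2,$$
and if $f, g \in \mathrm{CSM}^\alpha$ with $f \ge g$ then $P_t f \ge P_t g$. It is clear on comparing (36.5) with (36.4) that $P_t = Q_t$ on $R_\alpha(C(F)^+)$.

The map $P_t$ extends uniquely by linearity to a positive map $P_t : \mathcal{L}^\alpha \to b\mathcal{B}(F)$, where $\mathcal{L}^\alpha := \mathrm{CSM}^\alpha - \mathrm{CSM}^\alpha$, and $P_t$ agrees with $Q_t$ on $\mathcal{R}$. Because $P_t$ is positive and linear and $P_t 1 = 1$, and since $\mathcal{L}^\alpha$ is dense in $C(F)$ (by Lemma 34.7),

(36.6) $P_t$ has a unique extension to a bounded linear operator $P_t : C(F) \to b\mathcal{B}(F)$.

We know from Theorem 6.2 that there exists a unique kernel $P_t : F \times \mathcal{B}(F) \to [0, 1]$ such that
$x \mapsto P_t(x, B)$ is $\mathcal{B}(F)$-measurable for each $B \in \mathcal{B}(F)$,
$B \mapsto P_t(x, B)$ is a probability measure on $(F, \mathcal{B}(F))$ for each $x \in F$,
$$P_t f(x) = \int P_t(x, dy) f(y), \qquad \forall f \in C(F).$$
The map $f \mapsto P_t f$ from $C(F)$ to $b\mathcal{B}(F)$ therefore extends canonically to a map from $b\mathcal{B}(F)$ to $b\mathcal{B}(F)$.

Step 3: We claim that the linear operators $\{P_t : t \ge 0\}$ on $b\mathcal{B}(F)$ have the semigroup property:
$$(36.7)\qquad P_s P_t = P_{s+t} \qquad (s \ge 0,\ t \ge 0).$$

Proof of (36.7). We know from the Hille-Yosida Theorem that $P_s P_t f = P_{s+t} f$ for $f \in Z_0$. Now let $f \in \mathrm{CSM}^\alpha$. Then $\lambda R_{\lambda+\alpha} f \in Z_0$, and
$$P_s P_t(\lambda R_{\lambda+\alpha} f) = P_{s+t}(\lambda R_{\lambda+\alpha} f).$$
We have $P_t(\lambda R_{\lambda+\alpha} f) \uparrow P_t f$ by (36.5); and, since $P_s$ arises from a kernel, the Monotone-Convergence Theorem yields
$$(P_s P_t f)(x) = (P_{s+t} f)(x) \qquad (f \in \mathrm{CSM}^\alpha,\ x \in F).$$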
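The result extends according to the remainder of our strategy: by linearity to $f \in \mathcal{L}^\alpha$, thence by continuity to $f \in C(F)$, and thence by the Monotone-Class Theorem to $f \in b\mathcal{B}(F)$.

Step 4: We now show that

(36.8) for $g \in C(F)^+$ and $x \in F$, $e^{-\alpha t} P_t R_\alpha g(x) \le R_\alpha g(x)$, and the map $t \mapsto e^{-\alpha t} P_t R_\alpha g(x)$ is right-continuous and non-increasing.

Proof of (36.8). For $\lambda > 0$, let $h_\lambda := \lambda R_{\lambda+\alpha} g \in Z_0$. By the Hille-Yosida Theorem, we have
$$e^{-\alpha t} P_t(\lambda R_{\lambda+\alpha} R_\alpha g) = e^{-\alpha t} P_t R_\alpha h_\lambda = \int_t^\infty e^{-\alpha r} P_r h_\lambda\, dr \le R_\alpha h_\lambda = R_\alpha g - R_{\lambda+\alpha} g \le R_\alpha g.$$
But $R_\alpha g \in \mathrm{CSM}^\alpha$, so that, by (36.5),
$$(e^{-\alpha t} P_t R_\alpha g)(x) \le (R_\alpha g)(x) \qquad (\forall x).$$
That
$$e^{-\alpha(s+t)} P_{s+t} R_\alpha g(x) \le e^{-\alpha s} P_s R_\alpha g(x)$$
is now clear from the semigroup property. By the principle (36.2), the map $t \mapsto e^{-\alpha t} P_t R_\alpha g(x)$, the non-decreasing limit of the right-continuous non-increasing maps $t \mapsto e^{-\alpha t} P_t(\lambda R_{\lambda+\alpha} R_\alpha g)(x)$, is right-continuous.

Step 5: Next we prove the following.

(36.9) LEMMA. For $f \in \mathrm{CSM}^\alpha$ and $x \in F$, the map $t \mapsto e^{-\alpha t} P_t f(x)$ is non-increasing and right-continuous.

Remarks. This fact is of central importance in the probabilistic theory.

Proof of Lemma 36.9. We have
$$\lambda R_{\lambda+\alpha} f = R_\alpha[\lambda(f - \lambda R_{\lambda+\alpha} f)],$$
and $\lambda(f - \lambda R_{\lambda+\alpha} f) \in C(F)^+$, whence, by (36.8), $t \mapsto e^{-\alpha t} P_t \lambda R_{\lambda+\alpha} f(x)$ is non-increasing and right-continuous. The desired result now follows from (36.5) and the principle (36.2).

Step 6: It is now clear, since $\mathcal{L}^\alpha$ is dense in $C(F)$, that

(36.10) for $f \in C(F)$, the map $t \mapsto P_t f(x)$ is right-continuous.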
The Monotone-Class Theorem shows that $P_t$ defines a measurable transition function on $(F, \mathcal{B}(F))$, and all that remains is to confirm that, for $f \in C(F)$ (or for $f \in b\mathcal{B}(F)$), for $x \in F$ and $\lambda > 0$, we have
$$(36.11)\qquad \int_0^\infty e^{-\lambda t} P_t f(x)\, dt = R_\lambda f(x).$$

Proof of (36.11). If $f \in Z_0$ then (36.11) holds by the Hille-Yosida Theorem. The argument is now completed by the machinery in our strategy. □

(36.12) Exercise. Prove that $Z_0 = \{f \in C(F) : P_0 f = f\}$. Hint. Compare Lemma 6.7.

37. Branch-points. We continue with the notation and hypotheses of Section 36. The set $F_e$ of non-branch-points of $F$ (for $\{P_t\}$) is defined as follows:
$$F_e := \{x \in F : P_0 f(x) = f(x),\ \forall f \in C(F)\} = \{x \in F : P_0(x, \cdot) = \varepsilon_x\}.$$
The set $F_{br}$ of branch-points is defined as $F_{br} := F \setminus F_e$. The proper explanation of the role of branch-points escapes the analysis, and has to wait for the probability: you need to see the paths of the process.

(37.1) LEMMA. A semigroup on a compact metric space $F$ is FD if and only if it is a Ray semigroup without branch-points.

(Of course, a 'Ray semigroup' is a semigroup derived from a Ray resolvent as described in Theorem 36.1.) Lemma 37.1 is an immediate consequence of the proof of Lemma 6.7.

Now let $(h_m)$ be a dense sequence in $\mathrm{CSM}^1$. Then
$$P_0 h_m = \uparrow \lim_{\lambda} \lambda R_{\lambda+1} h_m \le h_m, \qquad \forall m.$$
Since $\mathcal{L} = \mathrm{CSM}^1 - \mathrm{CSM}^1$ is dense in $C(F)$,
$$F_e = \bigcap_m \{P_0 h_m = h_m\} = \bigcap_m \bigcap_n \{P_0 h_m > h_m - n^{-1}\} = \bigcap_m \bigcap_n \Big( \bigcup_\lambda \{\lambda R_{\lambda+1} h_m > h_m - n^{-1}\} \Big),$$
so that, since the set in large parentheses is open,

(37.2) $F_e$ is a $G_\delta$ in $F$. In particular, $F_e \in \mathcal{B}(F)$.
Let $\mu$ be a probability measure on $(F, \mathcal{B}(F))$. We write $\mu P_t$ for the measure
$$\mu P_t(\cdot) := \int \mu(dx) P_t(x, \cdot).$$
However, it is better to think in terms of the functional notation
$$(\mu, g) := \mu(g) := \int g\, d\mu$$
for measures, and define $\mu P_t$ via $(\mu P_t, f) = (\mu, P_t f)$.

(37.3) LEMMA. For $\mu \in \Pr(F)$, $\mu P_0 = \mu$ if and only if $\mu(F_{br}) = 0$.

Proof. With $(h_m)$ as above, we have
$$\mu P_0 = \mu \iff (\mu P_0, h_m) = (\mu, h_m),\ \forall m, \iff \mu\{P_0 h_m > h_m - n^{-1}\} = 1,\ \forall m, n.$$

(37.4) COROLLARY. $P_t(x, F_{br}) = 0$, $\forall x \in F$, $\forall t \ge 0$.

Proof. Since $P_t P_0 = P_t$, the measure $\mu := \varepsilon_x P_t$ satisfies $\mu P_0 = \mu$. □

(37.5) Example. Here is a rather artificial example to illustrate points (a), (b) and (c) made after the statement of Theorem 36.1. Let $F := [-1, 1]$. We take a process that while away from 0 drifts towards 0 at rate 1, and which on approaching 0 jumps (or branches) to $+1$ or $-1$ with probability $\tfrac12$ each. For $x \ge 0$,
$$P_t f(x) = \begin{cases} f(x - t) & (t < x), \\ \tfrac12 f(1 - \langle t - x \rangle) + \tfrac12 f(-1 + \langle t - x \rangle) & (t \ge x), \end{cases}$$
where $\langle \cdot \rangle$ denotes 'fractional part of'. For $x < 0$,
$$P_t f(x) = \begin{cases} f(x + t) & (t < |x|), \\ P_t f(-x) & (t \ge |x|). \end{cases}$$
You can see that $P_t$ does not map $C(F)$ to $C(F)$ and that $t \mapsto P_t f(x)$ has points of discontinuity. Now, for $x > 0$,
$$R_\lambda f(x) = \int_0^x e^{-\lambda t} f(x - t)\, dt + e^{-\lambda x} R_\lambda f(0),$$
with a similar 'Dynkin formula' for $x < 0$. It follows that $R_\lambda : C(F) \to C(F)$. The result $Z_0 = \{f \in C(F) : P_0 f = f\}$ of Ray theory leads us to believe that
$$Z_0 := \text{closure } R_\lambda C(F) = \{f \in C(F) : f(0) = \tfrac12 f(1) + \tfrac12 f(-1)\},$$
and this result is easily verified directly. The fact that $\{R_\lambda\}$ is indeed a Ray resolvent now follows immediately from the fact that $Z_0$ separates points of $F$.

38. Choquet representation of 1-excessive probability measures. We continue with the notation of Sections 36 and 37. For $\alpha > 0$, a probability measure $\mu$ on $(F, \mathcal{B}(F))$ is called $\alpha$-supermedian (relative to $\{R_\lambda\}$) if
$$\lambda \mu R_{\lambda+\alpha} \le \mu, \qquad \forall \lambda > 0,$$
that is, if
$$\mu(\lambda R_{\lambda+\alpha} f) \le \mu(f), \qquad \forall f \in C(F)^+,\ \forall \lambda.$$
For every $\mu$ in $\Pr(F)$ and every $\alpha \ge 0$, we have (as $\lambda \to \infty$)
$$\mu(\lambda R_{\lambda+\alpha} f) \to \mu(P_0 f), \qquad \forall f \in C(F),$$
so that, in the weak topology of $\Pr(F)$, $\lambda \mu R_{\lambda+\alpha} \to \mu P_0$. Hence a probability measure $\mu$ on $(F, \mathcal{B}(F))$ will be called $\alpha$-excessive (relative to $\{R_\lambda\}$) if
(38.1)(i) $\mu$ is $\alpha$-supermedian;
(38.1)(ii) $\mu = \mu P_0$, or equivalently (Lemma 37.3) $\mu(F_{br}) = 0$.

Fix $\alpha = 1$ for convenience. The set of 1-excessive probability measures on $F$ is easily shown to satisfy the hypotheses of both the existence and uniqueness parts of Choquet's representation theorem. However, we can now obtain the explicit form of the representation by simple direct methods.

(38.2) THEOREM (Ray's theorem: Part 2). Let $\mu$ be a 1-excessive element of $\Pr(F)$. Then there exists a unique element $\nu$ of $\Pr(F)$ such that $\nu(F_{br}) = 0$ and
$$\mu = \nu R_1.$$

Proof. For $\lambda > 0$, set
$$\nu_\lambda := (\lambda + 1)(\mu - \lambda \mu R_{\lambda+1}).$$
Because $\{R_\lambda\}$ is honest, $\nu_\lambda \in \Pr(F)$. Recall that $\Pr(F)$ is compact and let $\tilde\nu$ be any limit point of $\nu_\lambda$ as $\lambda \to \infty$. The resolvent equation shows that
$$\nu_\lambda R_1 = (\lambda + 1) \mu R_{\lambda+1},$$
so that on letting $\lambda \to \infty$ through a suitable sequence,
$$\tilde\nu R_1 = \mu P_0 = \mu.$$
Put $\nu := \tilde\nu P_0$. Since $\nu = \nu P_0$, we have $\nu(F_{br}) = 0$; and, since $P_0 R_1 = R_1$, we have
$$\nu R_1 = \tilde\nu P_0 R_1 = \tilde\nu R_1 = \mu.$$
Now if $\nu^* R_1 = \mu$ and $\nu^* P_0 = \nu^*$ for some $\nu^* \in \Pr(F)$ then (as $\lambda \to \infty$)
$$\nu_\lambda = (\lambda + 1) \nu^* (R_1 - \lambda R_1 R_{\lambda+1}) = (\lambda + 1) \nu^* R_{\lambda+1} \to \nu^* P_0 = \nu^*,$$
so that $\nu^* = \nu$. □

Ray's Theorem: Probabilistic part

39. The Ray process associated with a given entrance law. Our treatment of FD processes was designed to make the transition to Ray processes as easy as possible; and we shall not dwell on those 'FD' arguments which apply in the present context. We shall highlight those places where the Ray theory is more subtle.

For the Martin-boundary application (and for other purposes), we need to deal with processes with time-parameter set $(0, \infty)$ open at 0. (The $\hat X$ process in (33.1) is not defined at time 0.) Let $p = \{p_t : t > 0\}$ be a probability entrance law for $\{P_t\}$, that is, a family of elements of $\Pr(F)$ satisfying
$$(39.1)\qquad p_{s+t} = p_s P_t, \qquad \forall s, t > 0.$$
Note that, for $0 < \varepsilon < t$,
$$p_t P_0 = p_\varepsilon P_{t-\varepsilon} P_0 = p_\varepsilon P_{t-\varepsilon} = p_t,$$
so that
$$(39.2)\qquad p_t(F_e) = 1, \qquad \forall t > 0.$$

Step 1: use of Daniell-Kolmogorov Theorem. For $\omega \in F^{(0,\infty)}$, write $Y_t(\omega) = \omega(t)$; and put $\mathcal{F}^{(0,\infty)} := \sigma\{Y_t : t > 0\}$. The Daniell-Kolmogorov theorem implies the existence of a unique measure $P^p$ on $(F^{(0,\infty)}, \mathcal{F}^{(0,\infty)})$ such that, for $0 < t_1 < t_2 < \cdots < t_n$ and $x_1, x_2, \ldots, x_n \in F$,
$$(39.3)\qquad P^p[Y_{t_1} \in dx_1;\ Y_{t_2} \in dx_2;\ \ldots;\ Y_{t_n} \in dx_n] = p_{t_1}(dx_1) P_{t_2 - t_1}(x_1, dx_2) \cdots P_{t_n - t_{n-1}}(x_{n-1}, dx_n).$$
In particular, (39.2) shows that
$$(39.4)\qquad P^p[Y_t \in F_e] = p_t(F_e) = 1, \qquad \forall t > 0.$$

Step 2: regularising $Y$ to produce $X$. If $h \in \mathrm{CSM}^1$ then (Lemma 36.9) $e^{-t} P_t h \le h$, $\forall t$,
so that

(39.5) $\{e^{-t} h(Y_t) : 0 < t < \infty\}$ is a $P^p$ supermartingale relative to the filtration induced by the $Y$-process.

The crucial property that $\mathrm{CSM}^1$ separates points now allows us to repeat the argument of (7.9) to show that, a.s.($P^p$), the limit
$$X_t := \lim_{\mathbb{Q} \ni q \downarrow\downarrow t} Y_q$$
exists for all $t > 0$ and $t \mapsto X_t$ is Skorokhod from $(0, \infty)$ to $F$. The proof that
$$P^p[X_t = Y_t] = 1, \qquad \forall t > 0,$$
proceeds as did the proof of (7.12), but you find that you now need to use (39.4).

Step 3: extending $X$ to $t = 0$. By applying (II.69.4) to supermartingales of the form (39.5), we see that
$$(39.6)\qquad X_0 := \lim_{t \downarrow\downarrow 0} X_t \text{ exists in } F$$
and the process $X$ is an R-process in $[0, \infty)$. Set $p_0 := P^p \circ X_0^{-1}$. By right-continuity of paths,
$$(39.7)\qquad p_0 = \lim_{t \downarrow\downarrow 0} p_t \text{ in } \Pr(F).$$
We must now check that
$$(39.8)\qquad p_0 P_t = p_t, \qquad \forall t,$$
so that (39.3), with $X$ replacing $Y$, extends to the case when $0 \le t_1 \le t_2 \le \cdots \le t_n$. It is tempting to write
$$(p_\varepsilon, P_t f) = (p_{t+\varepsilon}, f), \qquad f \in C(F),$$
and to try to deduce (39.8) by letting $\varepsilon \downarrow\downarrow 0$. However, $P_t f$ need not be continuous. We can prove (39.8), with important extra information, as follows. We have, for $\lambda > 0$, $\varepsilon > 0$ and $f \in C(F)$,
$$(p_\varepsilon, R_\lambda f) = \int_0^\infty e^{-\lambda t} (p_{t+\varepsilon}, f)\, dt = E^p \int_0^\infty e^{-\lambda t} f(X_{t+\varepsilon})\, dt;$$
and we can let $\varepsilon \downarrow\downarrow 0$, using the facts that $R_\lambda f$ is continuous on $F$ and that $X$ is right-continuous, to obtain
$$(p_0, R_\lambda f) = E^p \int_0^\infty e^{-\lambda t} f(X_t)\, dt = \int_0^\infty e^{-\lambda t} (p_t, f)\, dt.$$
Now $p_0 R_1$ is clearly 1-supermedian, and, because of (39.2), it is 1-excessive. Hence, by Theorem 38.2, $p_0 R_1 = \nu R_1$ for a unique $\nu$ carried by $F_e$. From the resolvent equation, $p_0 R_\lambda = \nu R_\lambda$ for every $\lambda$. We now know that for $f \in C(F)$, we have, for almost all $t$,
$$(39.9)\qquad (\nu, P_t f) = (p_0, P_t f) = E^p f(X_t) = (p_t, f).$$
But $t \mapsto P_t f(x)$ is right-continuous for each $x$, and $X$ is an R-process. Hence, by right-continuity, equality holds in (39.9) for all $t \ge 0$ and all $f \in C(F)$. Thus, (39.8) holds. Since $\nu P_0 = \nu$, we see that $p_0 = P^p \circ X_0^{-1} = \nu$.

Since $p_0 = \nu$, so that $p_0(F_e) = 1$, we obtain the following improvement of (39.6):
$$(39.10)\qquad X_0 := \lim_{t \downarrow\downarrow 0} X_t \in F_e, \qquad \text{a.s.}(P^p).$$
It is (39.10), applied in reversed time, that gives the Doob-Hunt Convergence Theorem for the Martin boundary.

40. Strong Markov property of Ray processes. Let $\Omega$ now denote the space of R-paths $\omega$ from $[0, \infty)$ to $F$. (For $\omega \in \Omega$, we define $\omega(\infty) = \partial$, where $\partial$ is either a 'new' coffin state or the 'old' coffin state already adjoined to arrange honesty; it does not matter.) Let $X$ now denote the coordinate process, $X_t(\omega) := \omega(t)$, let $\mathcal{F}^\circ := \sigma\{X_t : t \ge 0\}$ and let $\mathcal{F}^\circ_t := \sigma\{X_s : s \le t\}$.

For any $\mu \in \Pr(F)$,
$$(40.1)\qquad p_t := \mu P_t$$
defines an entrance law $p$. The results of Section 39 show that we can define a corresponding law $P^p$ on $(\Omega, \mathcal{F}^\circ)$. Our new $P^p$ is what we called $P^p \circ X^{-1}$ in Section 39. Now, with $p$ as in (40.1), it is natural to do what we did in FD theory and write $P^\mu$ for the measure $P^p$ on $(\Omega, \mathcal{F}^\circ)$. The significance of $\mu$ as 'initial law' is no longer accurate, however, since
$$P^\mu \circ X_0^{-1} = \mu P_0.$$
(We shall see shortly that we can think of $\mu$ as the $P^\mu$ law of $X$ instantaneously before time 0.)

As usual, we shall write $P^x$ $(x \in F)$ for $P^{\varepsilon_x}$. Then
$$P^x[X_0 \in \Gamma] = P_0(x, \Gamma) \qquad (\Gamma \in \mathcal{B}(F)),$$
and, for $0 \le t_1 \le t_2 \le \cdots \le t_n$,
$$P^x[X_0 \in dx_0;\ X_{t_1} \in dx_1;\ \ldots;\ X_{t_n} \in dx_n] = P_0(x, dx_0) P_{t_1}(x_0, dx_1) \cdots P_{t_n - t_{n-1}}(x_{n-1}, dx_n).$$
Of course
$$P^\mu(\cdot) = \int P^x(\cdot)\, \mu(dx) \quad \text{on } \mathcal{F}^\circ.$$
The setup $(\Omega, \mathcal{F}^\circ, \mathcal{F}^\circ_t, P^x, \theta_t, X_t)$ is the Ray process with transition function $\{P_t\}$. It has the simple Markov property.
Because $P_t$ need not map $C(F)$ to $C(F)$, we have to use a 'Laplace-transformed' version of the proof of Theorem 8.3 for FD processes. Here are the necessary modifications to that proof. For $T$ an a.s. finite $\{\mathcal{F}^\circ_{t+}\}$ stopping time, $T^{(n)}$ its $n$th dyadic approximation, $\Lambda \in \mathcal{F}^\circ_{T+}$ and $f \in C(F)$, we have
$$\int_0^\infty e^{-\lambda s} E^\mu[f \circ X(T^{(n)} + s);\ \Lambda]\, ds = E^\mu[(R_\lambda f) \circ X(T^{(n)});\ \Lambda].$$
Letting $n \to \infty$ and using right-continuity of paths and the fact that $R_\lambda f \in C(F)$, we find that
$$\int_0^\infty e^{-\lambda s} E^\mu[f \circ X(T + s);\ \Lambda]\, ds = E^\mu[(R_\lambda f) \circ X(T);\ \Lambda] = \int_0^\infty e^{-\lambda s} E^\mu[(P_s f) \circ X(T);\ \Lambda]\, ds.$$
Hence
$$(40.2)\qquad E^\mu[f \circ X(T + s);\ \Lambda] = E^\mu[P_s f \circ X(T);\ \Lambda]$$
for almost all $s$. But (same old argument!) each side of (40.2) is right-continuous in $s$, so that (40.2) holds for all $s$.

We can now follow exactly the same course as we did in Section 9, introducing various completions etc., and establish that the 'Ray process proper' $(\Omega, \mathcal{F}, \mathcal{F}_t, P^x, \theta_t, X_t)$ is strong Markov. Assume this done.

41. The role of branch-points. We already know that for fixed $t \ge 0$, $X_t \in F_e$ a.s. We now prove the much stronger result that

(41.1) almost surely, $X$ never visits $F_{br}$:
$$P^x[X_t \in F_e,\ \forall t \ge 0] = 1, \qquad \forall x.$$

Proof of (41.1). The 'right' way to prove this is by using Meyer's section theorem (Theorem II.76.2), but there is an elementary proof avoiding capacitability theory (see the exercise below). The Section Theorem shows that it is enough to prove that if $T$ is a finite stopping time and $x \in F$ then
$$(41.2)\qquad P^x[X(T) \in F_{br}] = 0.$$
But $p_t := P^x \circ X_{T+t}^{-1}$ defines a probability entrance law because of the Strong Markov Theorem, and (41.2) now follows immediately from Step 3 of Section 39.

Exercise. Deduce (41.1) from (41.2) by using Lemma II.75.1 and the fact (37.2) that $F_{br}$ is $K_\sigma$ (countable union of compact sets).

The 'exploding birth process' example in Section 26 shows that $X$ can have
left limits in $F_{br}$. Thus $X$ may be able to approach $F_{br}$, but it will branch at the moment of approach. The following theorem, which is the 'Ray' analogue of Blumenthal's Quasi-Left-Continuity Theorem (Theorem 11.1), is the true explanation of the role of branch-points.

(41.3) THEOREM. If $(T_n : n \in \mathbb{N})$ is a strictly increasing sequence of $\{\mathcal{F}_t\}$ stopping times with $T_n \uparrow\uparrow T < \infty$ then, for $x \in F$ and $\Gamma \in \mathcal{B}(F)$,
$$P^x\Big[X_T \in \Gamma \,\Big|\, \bigvee \mathcal{F}_{T(n)}\Big] = P_0(X_{T-}, \Gamma),$$
where $\bigvee \mathcal{F}_{T(n)} := \sigma\{\mathcal{F}_{T(n)} : n \in \mathbb{N}\}$.

Proof. Let $\Lambda \in \mathcal{F}_{T(k)}$ for some $k$. Then, for $f \in C(F)$, we have, for $n \ge k$,
$$\int_0^\infty e^{-\lambda t} E^x[f \circ X(T_n + t);\ \Lambda]\, dt = E^x[R_\lambda f \circ X(T_n);\ \Lambda].$$
Since the set of discontinuities of $X$ is countable (why?), and hence of measure 0, we can let $n \to \infty$ to obtain
$$\int_0^\infty e^{-\lambda t} E^x[f \circ X(T + t);\ \Lambda]\, dt = E^x[R_\lambda f \circ X(T-);\ \Lambda] = \int_0^\infty e^{-\lambda t} E^x[P_t f \circ X(T-);\ \Lambda]\, dt.$$
Hence, by the old right-continuity argument yet again,
$$E^x[f \circ X(T + t);\ \Lambda] = E^x[P_t f \circ X(T-);\ \Lambda]$$
for all $t \ge 0$. In particular,
$$(41.4)\qquad E^x[f(X_T);\ \Lambda] = E^x[P_0 f(X_{T-});\ \Lambda].$$
Monotone-class arguments now show that (41.4) is true for all $\Lambda \in \bigvee \mathcal{F}_{T(k)}$ and all $f \in b\mathcal{B}(F)$. On taking $f = I_\Gamma$, we obtain the desired result. □

(41.5) Note. If we set $X_{0-} := X_0$ then Theorem 41.3 and its proof apply to the case when '$T_n \uparrow\uparrow T$' is interpreted in the slightly wider sense that
(i) when $T(\omega) = 0$, $T_n(\omega) = 0$, $\forall n$;
(ii) when $T(\omega) > 0$, $T_n(\omega) \to T(\omega)$ and $T_n(\omega) \le T_{n+1}(\omega) < T(\omega)$, $\forall n$.
From now on, we always allow this wider interpretation of '$T_n \uparrow\uparrow T$' for stopping times.

We have at last finished Ray's Theorem (though Ray's amazing 1959 paper goes further in certain directions). Recall that we have utilised ideas of Knight and Kunita and Watanabe as well as those of Ray.
6. APPLICATIONS

Martin boundary theory in retrospect

42. From discrete to continuous time. The idea of combining time reversal with Ray theory to produce Martin-boundary results goes back to the very important Kunita and T. Watanabe papers [2,3]. A fine account appears in Meyer [4]. The optimal way to handle the discrete-parameter chain case has been known in folklore for some time, and the extremely effective device of using time transformation to make the resolvents into compact operators finds general expression in the paper [1] by Garcia Alvarez and Meyer.

As in Section 27, let $I$ be a countable set, let $\Pi$ be a substochastic $I \times I$ matrix with Green matrix $\Gamma := \sum_n \Pi^n$ and make the following assumption:

there exists a (reference) point $b$ in $I$ such that $0 < \Gamma(b, j) < \infty$, $\forall j$.

Define the Martin kernel $\kappa$ on $I \times I$:
$$\kappa(i, j) := \Gamma(i, j) / \Gamma(b, j).$$
Let $J$ (rather than $X$; $J$ stands for 'jump chain') now denote a discrete-parameter chain with one-step transition matrix $\Pi$. By observing $J$ with a (discrete) clock that 'ticks' only when $J$ changes position (and which therefore 'ignores' times $n$ when $J(n) = J(n - 1)$), we produce a new chain $\tilde J$. You can easily check that
$$\tilde\Pi(i, j) = [1 - \delta(i, j)]\, [1 - \Pi(i, i)]^{-1}\, \Pi(i, j)$$
and that the change from $\Pi$ to $\tilde\Pi$ preserves excessive and regular functions and also preserves the Martin kernel. Since, further, results of Doob-Hunt type are invariant under the change from $J$ to $\tilde J$, we may as well make the assumption
$$\Pi(i, i) = 0, \qquad \forall i.$$
Now let $q(\cdot)$ be a strictly positive function on $I$ such that
$$(42.1)\qquad \sum_j \frac{\Gamma(j, j)}{q(j)} < \infty.$$
We shall often write $q$ for the diagonal $I \times I$ matrix $\mathrm{diag}\{q(i)\}$. Introduce the $I \times I$ matrix
$$Q := -q + q\Pi.$$
We now let $X$ be a (right-continuous, continuous-parameter) chain on $I$ with Q-matrix $Q$. It is well known that each visit by $X$ to state $i$ in $I$ is exponentially distributed with rate $q(i)$ and is independent of the behaviour of $X$ prior to that particular visit. It is also well known that the jumps of $X$ are made in accordance
with the law of $J$. The lifetime $\zeta$ of $X$ is a.s. finite. Indeed, for each $i$,
$$E^i \zeta = \sum_j \frac{\Gamma(i, j)}{q(j)} \le \sum_j \frac{\Gamma(j, j)}{q(j)} < \infty.$$
The $P^i$ probability that $X$ is at $j$ at time $t$ having made the $n$ $(\ge 0)$ jumps $i = i_0$ to $i_1$, $i_1$ to $i_2$, ..., $i_{n-1}$ to $i_n = j$ has (as a function of $t$) Laplace transform
$$\Pi_\lambda(i_0, i_1)\, \Pi_\lambda(i_1, i_2) \cdots \Pi_\lambda(i_{n-1}, i_n)\, [\lambda + q(j)]^{-1},$$
where $\Pi_\lambda$ denotes the substochastic matrix:
$$\Pi_\lambda(i, j) := [\lambda + q(i)]^{-1} q(i)\, \Pi(i, j).$$
Thus the resolvent $\{R_\lambda\}$ of $X$ satisfies
$$(42.2)\qquad R_\lambda(i, j) = \frac{\Gamma_\lambda(i, j)}{\lambda + q(j)},$$
where $\Gamma_\lambda$ is the Green matrix of $\Pi_\lambda$: $\Gamma_\lambda := \sum_n \Pi_\lambda^n$. The formula (42.2) is intuitively obvious from the formal calculation
$$R_\lambda = (\lambda - Q)^{-1} = (\lambda + q - q\Pi)^{-1} = [(\lambda + q)(I - \Pi_\lambda)]^{-1}.$$
All of these formulae are due to Feller [3].

We write $\{P_t\}$ for the transition (matrix) function of $X$, $P_t(i, j) := P^i\{X_t = j\}$, and $G$ for the Green matrix of $X$,
$$G(i, j) = R_0(i, j) = \Gamma(i, j) / q(j).$$
Note that the Martin kernel for $X$ is again $\kappa$.

(42.3) Time-reversal. We now insist that $X$ starts at $b$, so we work with the $P^b$ law. Define
$$\hat Y(t) := \begin{cases} X(\zeta - t) & (0 < t \le \zeta), \\ \partial & (t > \zeta), \end{cases}$$
where $\partial$ is a new coffin state isolated from $I$. Since $\hat Y$ is left-continuous and we wish to work with right-continuous processes, we put
$$\hat X(t) := \hat Y(t+) \qquad (t > 0).$$
We do not define $\hat X(0)$. It is easy to believe that $\{\hat X(t) : t > 0\}$ is a right-continuous ($P^b$) modification of $\{\hat Y(t) : t > 0\}$.
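Feller's formula (42.2), and the claim that the Martin kernel of $X$ coincides with that of the jump chain, can be checked numerically on a small example. The following is a minimal sketch, assuming a hypothetical three-state substochastic matrix `Pi` with zero diagonal and hypothetical rates `q` (neither comes from the text); on a finite state space $\Gamma = (I - \Pi)^{-1}$, and exact rational arithmetic makes the comparison exact.

```python
from fractions import Fraction as Fr

def inverse(A):
    """Exact Gauss-Jordan inverse of a small square matrix of Fractions."""
    n = len(A)
    M = [list(A[i]) + [Fr(int(i == j)) for j in range(n)] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # find a pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                fac = M[r][c]
                M[r] = [x - fac * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Hypothetical data: strictly substochastic Pi with Pi(i,i) = 0, and rates q.
Pi = [[Fr(0), Fr(1, 2), Fr(1, 4)],
      [Fr(1, 3), Fr(0), Fr(1, 3)],
      [Fr(1, 2), Fr(1, 4), Fr(0)]]
q = [Fr(1), Fr(2), Fr(3)]
n = 3

def resolvent_direct(lam):
    """R_lam = (lam - Q)^{-1} with Q = -q + q Pi."""
    return inverse([[(lam + q[i]) * (i == j) - q[i] * Pi[i][j]
                     for j in range(n)] for i in range(n)])

def resolvent_feller(lam):
    """Right-hand side of (42.2): Gamma_lam(i, j) / (lam + q(j))."""
    Pi_lam = [[q[i] / (lam + q[i]) * Pi[i][j] for j in range(n)]
              for i in range(n)]
    Gamma_lam = inverse([[(i == j) - Pi_lam[i][j] for j in range(n)]
                         for i in range(n)])
    return [[Gamma_lam[i][j] / (lam + q[j]) for j in range(n)]
            for i in range(n)]

def martin_kernel(M, b=0):
    """kappa(i, j) = M(i, j) / M(b, j) for a Green matrix M, reference point b."""
    return [[M[i][j] / M[b][j] for j in range(n)] for i in range(n)]

Gamma = inverse([[(i == j) - Pi[i][j] for j in range(n)] for i in range(n)])
G = resolvent_direct(Fr(0))  # Green matrix of X
```

Comparing `martin_kernel(G)` with `martin_kernel(Gamma)` illustrates why the choice of $q$ does not affect $\kappa$: the factor $1/q(j)$ in $G(i,j) = \Gamma(i,j)/q(j)$ cancels in the ratio.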
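For $t > 0$, define
$$\hat p_t(j) := P^b[\hat X(t) = j] \qquad (j \in I \cup \partial).$$
Then, as we shall see in Section 45, we have the following.

(42.4) Nagasawa's formula. For $0 < t_1 < t_2 < \cdots < t_n$ and $j_1, j_2, \ldots, j_n \in I \cup \partial$,
$$P^b[\hat X(t_1) = j_1, \ldots, \hat X(t_n) = j_n] = \hat p_{t_1}(j_1)\, \hat P_{t_2 - t_1}(j_1, j_2) \cdots \hat P_{t_n - t_{n-1}}(j_{n-1}, j_n),$$
where
$$(42.5)\qquad \hat P_t(i, j) := G(b, j) P_t(j, i) / G(b, i) \ (i, j \in I), \qquad \hat P_t(i, \partial) := 1 - \hat P_t(i, I), \qquad \hat P_t(\partial, \cdot) = \text{unit mass at } \partial.$$
You can easily check that $\{\hat P_t\}$ is an honest transition function on $I \cup \partial$. It follows from (42.4) that $\{\hat p_t : t > 0\}$ is a probability entrance law for $\{\hat P_t\}$ on $I \cup \partial$.

The resolvent $\{\hat R_\lambda\}$ of $\hat X$ is obtained by taking Laplace transforms in (42.5). Thus
$$(42.6)\qquad \hat R_\lambda(i, j) = G(b, j) R_\lambda(j, i) / G(b, i) \ (i, j \in I), \qquad \hat R_\lambda(i, \partial) = \lambda^{-1} - \hat R_\lambda(i, I), \quad \text{etc.}$$
In particular, $\hat X$ has Green function $\hat G$ on $I \times I$ satisfying
$$\hat G(i, j) = \hat R_0(i, j) = G(b, j)\, \kappa(j, i).$$

(42.7) Exercise. Explain (intuitively; we do not yet have a strong Markov theorem for $\hat X$) why the fact that $\hat X$ can reach $\partial$ from a point $i$ of $I$ only via $b$ corresponds precisely to the result
$$\hat R_\lambda(i, \partial) = \hat R_\lambda(i, b)\, [\hat R_\lambda(b, b)]^{-1}\, \hat R_\lambda(b, \partial) \le \hat R_\lambda(b, \partial),$$
and prove this result analytically.

43. Proof of the Doob-Hunt Convergence Theorem. The plan should now be obvious to you. Take the discrete topology on $I \cup \partial$. Then $I \cup \partial$ is an LCCB and $\{\hat R_\lambda\}$ is a resolvent on $I \cup \partial$ satisfying the special 'pre-Ray' hypotheses (35.6). We can therefore build the Ray-Knight compactification $F \cup \partial$ (say) of $I \cup \partial$ determined by $\{\hat R_\lambda\}$. Since the indicator function of the set $\{\partial\}$ is continuous and of compact support on $I \cup \partial$, it follows from the definition of $Z_1$ in (35.2) that

(43.1) $i \mapsto \hat R_\lambda(i, \partial)$ extends continuously from $I \cup \partial$ to $F \cup \partial$.

But since, by (42.7),
$$\hat R_\lambda(i, \partial) \le \hat R_\lambda(b, \partial) < \hat R_\lambda(\partial, \partial), \qquad \forall i \in I,$$
it is clear that $\partial$ is isolated in $F \cup \partial$. Hence $F$ is a compactification of $I$.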
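The reversed resolvent (42.6), the identity of Exercise 42.7 and the bound used just above to isolate $\partial$ can all be tested on a finite example. The following is a minimal sketch under the same kind of assumptions as before: the matrix `Pi`, the rates `q` and the reference point `b = 0` are hypothetical illustrations, and `reversed_resolvent` is our own helper name.

```python
from fractions import Fraction as Fr

def inverse(A):
    """Exact Gauss-Jordan inverse of a small square matrix of Fractions."""
    n = len(A)
    M = [list(A[i]) + [Fr(int(i == j)) for j in range(n)] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # find a pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                fac = M[r][c]
                M[r] = [x - fac * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Hypothetical data: strictly substochastic Pi with Pi(i,i) = 0, and rates q.
Pi = [[Fr(0), Fr(1, 2), Fr(1, 4)],
      [Fr(1, 3), Fr(0), Fr(1, 3)],
      [Fr(1, 2), Fr(1, 4), Fr(0)]]
q = [Fr(1), Fr(2), Fr(3)]
n = 3

def resolvent(lam):
    """R_lam = (lam - Q)^{-1} with Q = -q + q Pi; lam = 0 gives G."""
    return inverse([[(lam + q[i]) * (i == j) - q[i] * Pi[i][j]
                     for j in range(n)] for i in range(n)])

def reversed_resolvent(lam, b=0):
    """Return (Rhat on I x I, [Rhat(i, coffin) for i in I]) following (42.6)."""
    R, G = resolvent(lam), resolvent(Fr(0))
    Rhat = [[G[b][j] * R[j][i] / G[b][i] for j in range(n)] for i in range(n)]
    to_coffin = [1 / lam - sum(row) for row in Rhat]
    return Rhat, to_coffin

lam = Fr(2)
Rhat, dead = reversed_resolvent(lam)
R, G = resolvent(lam), resolvent(Fr(0))
```

Here `dead[i]` plays the role of $\hat R_\lambda(i, \partial)$. From the resolvent equation $\lambda G R_\lambda = G - R_\lambda$ one expects `dead[i]` to equal $R_\lambda(b, i) / (\lambda G(b, i))$, which is the analytic route to Exercise 42.7; the sketch only confirms this for one value of $\lambda$.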
(43.2) THEOREM. The Ray-Knight compactification $F$ of $I$ is identical to the Martin compactification $F_M$ (say) of $I$ based on $\kappa$. That is, there exists a homeomorphism of $F$ to $F_M$ that leaves points of $I$ invariant.

Proof. Let $\ell^1(I)$ be the usual Banach space of absolutely convergent series $(u_i : i \in I)$ with norm
$$\|u\| := \sum_i |u_i| < \infty.$$
Usually, but not always, we think of $\ell^1(I)$ as the space of signed measures $u$ on $I$ with $\|u\|$ as total-variation norm.

The whole of the present proof could be based on the fact that each $\hat R_\lambda$ $(\lambda \ge 0)$, acting on the right by multiplication $u \mapsto u \hat R_\lambda$, is a compact operator on $\ell^1(I)$. (This follows immediately from (43.3) below and the well-known Cohen-Dunford criterion.) However, we shall phrase the argument in terms of uniform integrability, which is the concept underlying the Cohen-Dunford result.

For $\lambda \ge 0$, we have
$$(43.3)\qquad \hat R_\lambda(i, j) \le \hat R_0(i, j) \le \hat R_0(j, j) = \frac{\Gamma(j, j)}{q(j)},$$
and, since $\sum_j \Gamma(j, j)/q(j) < \infty$, the functions $\{\hat R_\lambda(i, \cdot) : \lambda \ge 0,\ i \in I\}$ on $I$ are uniformly integrable with respect to counting measure on $I$. Hence

(43.4) if $(i_n)$ is a sequence of $I$ and $\lambda \ge 0$ is fixed then $\lim_n \hat R_\lambda(i_n, \cdot)$ exists in $\ell^1(I)$ if and only if $\lim_n \hat R_\lambda(i_n, j)$ exists $\forall j$.

Indeed, this follows directly from (43.3) and the Dominated-Convergence Theorem.

It is immediate from (43.3) that each $\hat R_\lambda$ $(\lambda \ge 0)$ acting on the right by multiplication on $\ell^1(I)$ is a bounded operator, and it is easy to see that the resolvent equation extends to give
$$\hat R_0(I - \lambda \hat R_\lambda) = \hat R_\lambda, \qquad \hat R_0 = \hat R_\lambda(I + \lambda \hat R_0)$$
in the language of bounded operators on $\ell^1(I)$. Since, for example,
$$\hat R_0(i_n, \cdot) = \hat R_\lambda(i_n, \cdot)(I + \lambda \hat R_0),$$
we can improve (43.4) to the following form:

(43.5) $\lim_n \hat R_\lambda(i_n, \cdot)$ exists in $\ell^1(I)$ for some (and then all) $\lambda \ge 0$ if and only if $\lim_n \hat R_0(i_n, j) = \lim_n G(b, j)\, \kappa(j, i_n)$ exists $\forall j$.

The argument leading to (43.1) with $j$ replacing $\partial$ shows that if $(i_n)$ is a sequence in $I$ converging to $\xi_{RK}$ in the Ray-Knight topology of $F$, then
$$\hat R_\lambda(\xi_{RK}, j) = \lim_n \hat R_\lambda(i_n, j) \text{ exists } \forall j \in I.$$
Hence $\lim \kappa(j, i_n)$ exists $\forall j$, so that $(i_n)$ converges to a point $\xi_M$ of $F_M$ in the Martin topology and

(43.6) $\kappa(j, \xi_M) = G(b, j)^{-1} \sum_{k \in I} \hat R_\lambda(\xi_{RK}, k)(I + \lambda \hat R_0)(k, j), \quad \forall j \in I,$

for every $\lambda > 0$.

Conversely, suppose that a sequence $(i_n)$ in $I$ converges to a point $\xi_M$ of $F_M$ (in the topology of $F_M$). Then, from (43.5) (you can check that the extra $\partial$ term causes no trouble),
$$\lim_n \sum_{j \in I \cup \partial} |\hat R_\lambda(i_n, j) - \hat R_\lambda(\xi_M, j)| = 0,$$
where

(43.7) $\hat R_\lambda(\xi_M, j) := \sum_{k \in I} G(b, k)\,\kappa(k, \xi_M)(I - \lambda \hat R_\lambda)(k, j)$ for $j \in I$, and $\hat R_\lambda(\xi_M, \partial) := \lambda^{-1} - \hat R_\lambda(\xi_M, I)$.

Hence, for every bounded function $f$ on $I$,
$$\lim_n \hat R_\lambda f(i_n) = \hat R_\lambda f(\xi_M) := \sum_j \hat R_\lambda(\xi_M, j) f(j).$$
By the construction (Section 35) of the Knight algebra $C(F \cup \partial)$, it now follows that $\lim g(i_n)$ exists for every $g$ in $C(F \cup \partial)$. Hence $\xi_{RK} := \lim i_n$ exists in $F$. You can round off the argument (that $\xi_M = \xi_{RK}$ etc.) to your own satisfaction, but (43.6) and (43.7) really say it all. In particular, these results show that $F_e$ and $(F_M)_e$ agree. □

The Doob-Hunt Convergence Theorem 29.1 now follows immediately from (39.10) because $X(\zeta-) = \hat X(0+)$. □

44. The Choquet representation of Π-excessive functions. Suppose that $0 \le \Pi f \le f$ and $f(b) = 1$; thus, in the notation of Section 27, $f \in S$. It is an elementary fact (used in the proof of Theorem 28.2) that
$$f(i) = \uparrow\lim_n \sum_j \Gamma(i, j)\,\beta_n(j)$$
for some non-negative 'charges' $\beta_n$ on $I$. Thus
$$f(i) = \uparrow\lim_n \sum_j G(i, j)\,q(j)\,\beta_n(j)$$
and, since $\lambda R_\lambda G = G - R_\lambda \le G$, it follows that $f$ is supermedian for $\{R_\lambda\}$:
$$\lambda R_\lambda f \le f \text{ on } I.$$
Recall from (27.5) that $f(j) \le \theta(j)^{-1}$. It is now convenient to assume (and we do assume) $q(\cdot)$ chosen so that (in addition to (42.1)) we have
$$\sum_j G(j, j)\,\theta(j)^{-1} = \sum_j \Gamma(j, j)\,q(j)^{-1}\theta(j)^{-1} < \infty.$$
Then
$$R_0 f(i) = \sum_j G(i, j) f(j) \le \sum_j G(j, j)\,\theta(j)^{-1},$$
so that $R_0 f$ is bounded on $I$ (indeed, uniformly over $f$ in $S$). If
$$f^1 := f - R_1 f \ (\ge 0) \text{ on } I,$$
then $f^1$ satisfies $\lambda R_{\lambda+1} f^1 \le f^1$ and also $f = f^1 + R_0 f^1$.

The resolvents $\{R_\lambda\}$ and $\{\hat R_\lambda\}$ are 'in duality relative to the measure $G(b, \cdot)$ on $I$' in the sense that
$$\langle h_1, R_\lambda h_2 \rangle_{G(b,\cdot)} = \langle \hat R_\lambda h_1, h_2 \rangle_{G(b,\cdot)},$$
where $h_1$ and $h_2$ are non-negative functions on $I$ and
$$\langle k_1, k_2 \rangle_{G(b,\cdot)} := \sum_i k_1(i)\,k_2(i)\,G(b, i).$$
It is a general principle of potential theory that a 1-excessive function for $\{R_\lambda\}$ is the density relative to $G(b, \cdot)$ of a 1-excessive measure for $\{\hat R_\lambda\}$. We can use (respectively, understand) the idea behind this general principle in (via) our simple situation. Put
$$\mu^1(j) := f^1(j)\,G(b, j) \qquad (j \in I).$$
Then, for $j \in I$,

(44.1) $\lambda \mu^1 \hat R_{\lambda+1}(j) = G(b, j)\,\lambda R_{\lambda+1} f^1(j) \le \mu^1(j),$

so that $\mu^1$ is (at least) 1-supermedian for $\{\hat R_\lambda\}$ on $I$. If we ignore technical details, the general principle is 'trivial', but it is extremely useful, as we shall soon see.

Let us now make the cunning definition
$$\mu^1(\partial) := f^1(b).$$
By several applications of the resolvent equation, you can check that
$$\mu^1(\partial) - \lambda \mu^1 \hat R_{\lambda+1}(\partial) = (\lambda + 1)^{-1}[f(b) - (\lambda + 1) R_{\lambda+1} f(b)] \ge 0,$$
so that $\mu^1$ is 1-supermedian for $\{\hat R_\lambda\}$ on $I \cup \partial$. Note that
$$\mu^1(I \cup \partial) = f^1(b) + G f^1(b) = f(b) = 1,$$
so that $\mu^1$ is a probability measure on $I \cup \partial$. Now every point of $I \cup \partial$ is obviously a non-branch-point of $\{\hat R_\lambda\}$ now considered as extended to $F \cup \partial$, so $\mu^1$ can be considered as a 1-excessive probability measure on $F \cup \partial$. By Theorem 38.2,
$$\mu^1 = \int_{F_e \cup \partial} \nu(d\xi)\,\hat R_1(\xi, \cdot)$$
for some probability measure $\nu$ on $\mathcal{B}(F_e \cup \partial)$. Define
$$\mu(j) := G(b, j) f(j) \qquad (j \in I).$$
Then, on $I$ (not $I \cup \partial$), we have
$$\mu = \mu^1 [I + \hat R_0] = \int \nu(d\xi)\,\hat R_0(\xi, \cdot),$$
whence

(44.2) $f(i) = \int_{F_e} \kappa(i, \xi)\,\nu(d\xi)$

and $\nu(F_e) = f(b) = 1$. That the 'Choquet' representation (44.2) is unique follows easily from the 'uniqueness' part of Theorem 38.2. Our direct proof (avoiding Choquet theory) of the analytical results in Section 27 is now complete.

45. Doob's h-transforms. Now the 'Ray' approach to Martin boundary theory really pays dividends.

Let $\nu$ be a probability measure on $F_e$ and let
$$h(i) := \int \kappa(i, \xi)\,\nu(d\xi)$$
denote the excessive function (with $h(b) = 1$) that $\nu$ represents. We can define the $\hat{\mathbf{P}}^\nu$ law of '$\hat X$ started according to the law $\nu$' in the usual way. It is automatic from (42.7) that $\hat X$ dies at $b$:
$$\hat{\mathbf{P}}^\nu[\hat X(\hat\zeta-) = b] = 1, \quad \text{where } \hat\zeta := \inf\{t : \hat X(t) = \partial\}.$$
Now write $Y^h$ for the time-reversal of $(\hat X, \hat{\mathbf{P}}^\nu)$:
$$Y^h(t) := \hat X(\hat\zeta - t) \ (0 < t \le \hat\zeta), \qquad Y^h(t) := \partial \ (t > \hat\zeta),$$
where $\partial$ is here a 'forward' coffin state. Put $X^h(t) := Y^h(t+)$, so that, for $t = 0$, $X^h(0) = b$ (a.s.($\hat{\mathbf{P}}^\nu$)). By Nagasawa's formula, $X^h$ (starts at $b$ and) has transition probabilities
$$P^h(t; i, j) = h(i)^{-1} P(t; i, j)\,h(j).$$
In particular, the Q-matrix $Q^h$ of $X^h$ satisfies
$$Q^h(i, j) = h(i)^{-1} Q(i, j)\,h(j),$$
so that the 'jump chain' $J^h$ of successive states visited by $X^h$ has one-step transition
matrix $\Pi^h$, where
$$\Pi^h(i, j) = h(i)^{-1}\,\Pi(i, j)\,h(j).$$
We have proved the following theorem.

(45.1) THEOREM (Doob). The probability measure $\nu$ that represents an element $h$ of $S$ is the distribution of $J^h(\zeta-)$, where $J^h$ is a discrete-parameter chain with transition matrix $\Pi^h$ and started at $b$.

Theorems 29.2 and 29.3 follow immediately.

We have presented a full account of Martin-boundary theory in its simplest setting, but by the most powerful methods. You will wish to follow up the general theory in Meyer [4]. To get a first idea of the scope of applications, see Blackwell and Kendall [1] on Polya's urn and population growth, Revuz [1] for the culmination of work of Kesten, Spitzer, Brunel and Revuz on random walks on groups, and Dynkin [3] for an extraordinary and deep result on random deformations of ellipsoids. Norris, Rogers and Williams [1] gives a simpler proof of Dynkin's result.

Time reversal and related topics

It is something of a mystery that a number of results are much clearer in reversed time than in forward time. Martin-boundary theory has already illustrated this; and we shall shortly see other examples.

Our concern is with how to use the idea of time reversal, without full discussion of certain technical difficulties, which tend to blur the subject. Chung and Walsh [1] is a very important paper on identifying and resolving those difficulties. The last two volumes of Dellacherie and Meyer [1] have the latest news.

Time reversal is deeply connected with duality, a major topic from the work of Hunt on. See Blumenthal and Getoor [1]. Joanna Mitro's papers [1,2] greatly clarify duality.

46. Nagasawa's formula for chains. Let $I$ be a countable set and let $\{p_{ij}(t)\}$ be a transition matrix function on $I$ (satisfying the usual continuity condition: $p_{ij}(0+) = \delta_{ij}$). Let $\{\rho(t)\}$ be a probability entrance law for the usual extension $\{P_t^{+\partial}\}$ of $\{P_t\}$ to $I \cup \partial$. But let us write $P_t$ instead of the clumsy $P_t^{+\partial}$.
Let $\{X_t : t > 0;\ \Omega, \mathcal{F}^\circ, \mathbf{P}^\rho\}$ be a process on $I \cup \partial$ with
$$\mathbf{P}^\rho[X_s = i;\ X_{s+t} = j;\ X_{s+t+u} = k;\ \ldots] = \rho_i(s)\,p_{ij}(t)\,p_{jk}(u) \cdots$$
whenever $s, t, u, \ldots > 0$ and $i, j, k, \ldots \in I \cup \partial$. If you like, you can imagine $X$ to be the Ray process taking values in the Ray-Knight compactification of $I \cup \partial$ determined by $\{R_\lambda\}$. (The precise details of this are clarified in Section 50.) But let us not be too specific about the technicalities: assume that $X$ is 'sufficiently
smooth'.

It is well known (see Section 52 for proof) that, for $k \in I$, there exists a continuous function $f_{k\partial}(\cdot)$ on $(0, \infty)$ such that
$$\mathbf{P}^k[\zeta \in dt] = f_{k\partial}(t)\,dt, \qquad \zeta := \inf\{t : X_t = \partial\}.$$
(Strictly speaking, we should be concentrating on a particular triple $(\Omega, \mathcal{F}^\circ, \mathbf{P}^\rho)$, but we assume that $\mathbf{P}^k$ can be defined on $(\Omega, \mathcal{F}^\circ)$, as will happen when $(\Omega, \mathcal{F}^\circ)$ is the usual path-space. As far as we are concerned, this point is rather academic. When you read the Chung-Walsh paper, you will see why we have mentioned it, and you will recall comments on regular conditional probabilities made before Theorem II.90.11.)

Let us suppose that $\mathbf{P}^\rho(0 < \zeta < \infty) = 1$. We now consider the process $\{Y(t), \mathbf{P}^\rho\}$, where
$$Y(t) := X(\zeta - t) \ (0 < t \le \zeta), \qquad Y(t) := \partial \ (t > \zeta).$$
It is plausible that, for $k \in I$,

(46.1) $\mathbf{P}^\rho[Y(t) = k] = \int_0^\infty \mathbf{P}^\rho[X(\zeta - t) = k;\ \zeta \in t + dv] = \int_{v=0}^\infty \mathbf{P}^\rho[X(v) = k]\,(\mathbf{P}^k[\zeta \in dt]/dt)\,dv = g_k f_{k\partial}(t),$

where

(46.2) $g_k := \int_0^\infty \rho_k(v)\,dv.$

We do not need to tell you that (46.1) is not a rigorous calculation. One way (Chung-Walsh) in which to make it rigorous involves approximating the integrals by Riemann sums, using right-continuity of $X$ to push things through. Another way (Meyer [4]) is to justify directly the result obtained when both sides of (46.1) are multiplied by an arbitrary measurable non-negative function of $t$ and integrated over $(0, \infty)$. In either case, we can use left-continuity of $Y$ to show that (46.1) must hold for all (not merely almost all) $t$.

Without any further difficulty, we can calculate for $i, j, k \in I$ and $s, t, u > 0$,

(46.3) $\mathbf{P}^\rho[Y_s = i;\ Y_{s+t} = j;\ Y_{s+t+u} = k]$
$$= \int_{v=0}^\infty \mathbf{P}^\rho[X_{\zeta - s} = i;\ X_{\zeta - s - t} = j;\ X_{\zeta - s - t - u} = k;\ \zeta \in s + t + u + dv]$$
$$= \int_{v=0}^\infty \mathbf{P}^\rho[X_v = k;\ X_{v+u} = j;\ X_{v+u+t} = i]\,f_{i\partial}(s)\,dv$$
$$= g_i f_{i\partial}(s)\,\hat p_{ij}(t)\,\hat p_{jk}(u),$$
where
$$\hat p_{ij}(t) := g_j\,p_{ji}(t)/g_i$$
(with some arbitrary conventions for 0/0). The extension of (46.3) 'to $n$ terms' is obvious. Thus

(46.4) $\{Y_t : t > 0;\ \mathbf{P}^\rho\}$ is Markovian with stationary transition probabilities $\{\hat p_{ij}(t)\}$ and with probability entrance law on $I \cup \partial$ determined by (46.1).

47. Strong Markov property under time reversal. Consider a process $X$ that starts at 1, drifts towards 0 at constant rate 1, stays at 0 for an exponentially distributed time $S$ of rate 1, and then dies. Thus
$$X_t = 1 - t \ (0 \le t < 1); \qquad X_t = 0 \ (1 \le t < 1 + S); \qquad X_t = \partial \ (t \ge 1 + S).$$
It is easy to see that $X$ has FD transition function, so that $X$ is strong Markov. The time-reversal $Y$ of $X$ satisfies
$$Y_t = 0 \ (0 < t \le S); \qquad Y_t = t - S \ (S < t \le S + 1); \qquad Y_t = \partial \ (t > S + 1).$$
The right-continuous modification $\hat X$ of $Y$ satisfies
$$\hat X_t = 0 \ (0 \le t < S); \qquad \hat X_t = t - S \ (S \le t < S + 1); \qquad \hat X_t = \partial \ (t \ge S + 1).$$
Note that 1 acts as a branch-point for $\hat X$, with $\hat P_0(1, \partial) = 1$.

The process $\hat X$ is very similar to the process in Example 35.8. In particular, $\hat X$ is not strong Markov relative to its natural $\{\mathcal{F}_{t+}\}$-algebras, because $\hat X$ does not start afresh at time $S$. Thus time reversal can destroy the strong Markov property.

Chung and Walsh [1] show that the time-reversal (made right-continuous) of a strong Markov process has a Markov property intermediate between the simple and strong Markov properties: the so-called moderately strong Markov property. This interesting concept is 'correct' from the point of view of the theory of previsible processes.

There is however another way out of the 'difficulty', which is more satisfactory: Doob's simultaneous compactification. The problem with time reversal is that the Ray-Knight compactifications associated with $X$ and $\hat X$ may be totally different and may induce different topologies on the initially given state-space. For the example that we have just been discussing, we know from (35.8) that, in order to 'force' the strong Markov property of $\hat X$, we must tear $[0, 1]$ apart at 0, producing a state-space $\{0\} \cup [0+, 1]$.
(Recall that $[0+, 1]$ is the same as the conventional $[0, 1]$ and that 0 now denotes a point isolated from $[0+, 1]$.) The important thing is that both $X$ and $\hat X$ have good (right-continuous, strong Markov) modifications with state-space $\{0\} \cup [0+, 1]$, with $0+$ made a branch-point for $X$ from which $X$ branches to 0, and that these modifications are properly related: each is the 'time-reversal made right-continuous' of the other.

Here you have a clue to Doob's idea (see Doob [2]) of constructing an entrance-exit space for a general process $X$ by using a 'simultaneous' compactification based on both $X$ and a time-reversal $\hat X$. We leave aside discussion of the worries you are beginning to have about whether we will now have to abandon the Ray property.

48. Equilibrium charge. We now give the promised intuitive explanation (expanded from Williams [1]) of Hunt's Theorem I.22.7 on equilibrium potential and the Chung-Getoor-Sharpe Theorem I.22.13 on equilibrium charge. Though we skip rigour for interest's sake, it is not too hard to supply it even in much more general contexts.

Take $n \ge 3$ and let $g$ be the free-space Green function for $\Delta$ (in $\mathbb{R}^n$). Let $b \in \mathbb{R}^n$ and let $B$ be canonical Brownian motion in $\mathbb{R}^n$ starting at $b$. Let $v$ be a strictly positive $C^\infty$ function on $\mathbb{R}^n$ such that
$$\mathbf{E}^b A_\infty = 2 \int g(b, y)\,v(y)\,dy < \infty, \quad \text{where } A_t := \int_0^t v(B_s)\,ds.$$
Put $\tau_t := \inf\{s : A_s > t\}$. Then, by Volkonskii's Theorem, $X_t := B \circ \tau_t$ defines a continuous strong Markov process with finite lifetime $A_\infty$. The Green function of $X$ with respect to the measure $2v(y)\,dy$ is just $g$. Now let $\hat X$ be the time-reversal of $X$, with Green function $\hat g$ relative to the measure $2v(y)\,dy$ given by the Nagasawa formula:

(48.1) $\hat g(y, z) = g(b, z)\,g(z, y)/g(b, y).$

Define
$$H_K := \inf\{s > 0 : X_s \in K\}, \qquad \hat H_K := \inf\{s > 0 : \hat X_s \in K\},$$
$$L_K^B := \sup\{s > 0 : B_s \in K\}, \qquad L_K^X := \sup\{s > 0 : X_s \in K\}.$$
For $\Gamma \in \mathcal{B}(\partial K)$, put
$$\lambda(\Gamma) := \mathbf{P}^b[B(L_K^B) \in \Gamma] = \mathbf{P}^b[X(L_K^X) \in \Gamma] = \mathbf{P}^b[\hat X(\hat H_K) \in \Gamma].$$
Then, using the (unproved but 'obvious') strong Markov property of $\hat X$, we find that
$$2v(z)\,dz \int_{\partial K} \lambda(dy)\,\hat g(y, z) = \mathbf{E}^b[\text{time spent by } X \text{ in } dz \text{ before } L_K^X]$$
$$= \mathbf{E}^b[\text{time spent by } X \text{ in } dz]\,\mathbf{P}^z[H_K < \infty] = 2 g(b, z)\,v(z)\,dz\,\mathbf{P}^z[H_K < \infty].$$
On substituting the formula (48.1) for $\hat g(y, z)$, we see that, for almost all $z$,

(48.2) $\mathbf{P}^z[H_K < \infty] = \int_{\partial K} g(z, y)\,e_K(dy),$
where

(48.3) $e_K(dy) := g(b, y)^{-1}\,\mathbf{P}^b[B(L_K^B) \in dy].$

Since $\mathbf{P}^z[H_K < \infty]$ has the same significance for $B$ as for $X$, the main results of Theorems I.22.7 and I.22.13 are more or less proved. Rounding off Theorem I.22.7 is largely a case of repeating arguments used in connection with the Dirichlet problem.

Let us emphasise one thing, however. The two sides of (48.2) are discontinuous at sufficiently singular points of $\partial K$, as Lebesgue's thorn shows. To obtain (48.2) for all $z$, which is important because of such singularities, we use the fact that both sides of (48.2) are excessive (for $B$ or for $X$).

Exercise. Explain the 'excessive' property and why it implies equality for all $z$ in (48.2).

You should now be convinced that time reversal provides the natural approach to many problems.

49. BM(R) and BES(3); splitting times. Many relations exist between 1-dimensional Brownian motion and the 3-dimensional Bessel process. Here is a first one. (A $\mathrm{BM}_b(\mathbb{R})$ has starting position $b$, etc.)

(49.1) THEOREM. Let $B$ be a $\mathrm{BM}_0(\mathbb{R})$ and define $H_1^B := \inf\{t : B_t = 1\}$. Let $R$ be a $\mathrm{BES}_0(3)$ and define $L_1^R := \sup\{t : R_t = 1\}$. Then the processes
$$\{1 - B(H_1^B - t) : 0 \le t < H_1^B\} \quad \text{and} \quad \{R(t) : 0 \le t < L_1^R\}$$
are identical in law.

After the discussion at the end of Section 31, it should be clear to you that Martin-boundary theory makes this result extremely plausible. It is not difficult (Exercise!) to prove it directly by bare-hand computation; see Williams [4], where the result is applied to local-time theory.

For $0 \le t < \infty$, put
$$A_t := \operatorname{meas}\{s \le t : B_s \ge 0\}, \qquad \tau_t := \inf\{s : A_s > t\}.$$
Then (see Section 22) $Y_t := B \circ \tau_t$ defines a reflecting Brownian motion $Y$. In other words, $Y$ is a $\mathrm{BES}_0(1)$ process. Put $H_1^Y := \inf\{t : Y_t = 1\}$. Then
$$H_1^Y = A(H_1^B) \sim \operatorname{meas}\{t \le L_1^R : R(t) \le 1\} = \operatorname{meas}\{t < \infty : R(t) \le 1\},$$
'$\sim$' signifying equality in law. This is the special case '$n = 1$' of the Ciesielski-Taylor Theorem that if $R_n$ is a $\mathrm{BES}_0(n)$ and $R_{n+2}$ is a $\mathrm{BES}_0(n+2)$ then

(49.2) $\inf\{t : R_n(t) = 1\} \sim \operatorname{meas}\{t < \infty : R_{n+2}(t) \le 1\}.$
See (11.20) and Ciesielski and Taylor [1]. Many proofs of (49.2) are now known. None seems to provide a clear geometrical explanation, and one conjectures that no such explanation exists.
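As a consistency check on (49.2), and nothing more than that, one can compare the means of the two sides. The computation below is a sketch under the assumption that $\mathrm{BES}_0(d)$ is the radial part of a standard Brownian motion in $\mathbb{R}^d$ (generator $\tfrac12\Delta$), so that the Green function from the origin is $\Gamma(d/2 - 1)\,|y|^{2-d}/(2\pi^{d/2})$ for $d \ge 3$; the symbols $T$ and $d$ are introduced here for illustration only.

```latex
% Left side of (49.2): T := inf{t : R_n(t) = 1}.
% R_n(t)^2 - n t is a martingale, so optional stopping gives
\mathbf{E}[T] = \frac{1}{n}\,\mathbf{E}\bigl[R_n(T)^2\bigr] = \frac{1}{n}.

% Right side of (49.2): occupation time of the unit ball in dimension d = n + 2.
\mathbf{E}\,\operatorname{meas}\{t : R_{n+2}(t) \le 1\}
  = \int_{|y| \le 1} \frac{\Gamma(d/2 - 1)}{2\pi^{d/2}}\,|y|^{2-d}\,dy
  = \frac{\Gamma(d/2 - 1)}{2\pi^{d/2}} \cdot \frac{2\pi^{d/2}}{\Gamma(d/2)}
    \int_0^1 r^{2-d}\,r^{d-1}\,dr
  = \frac{1}{d/2 - 1}\cdot\frac{1}{2}
  = \frac{1}{d - 2} = \frac{1}{n}.
```

The two means agree for every $n \ge 1$, as (49.2) requires; equality of first moments is of course much weaker than the equality in law that the theorem asserts.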
The following result (Williams [4,7]) was needed for certain applications to excursion theory and local-time theory.

(49.3) THEOREM. Fix $b$ in $(0, \infty)$. On a suitable probability triple $(\Omega, \mathcal{F}, \mathbf{P})$, set up three independent random elements (see Fig. III.1):
a random variable $\gamma$ uniformly distributed on $(0, b)$;
a $\mathrm{BM}_b(\mathbb{R})$ $\{B(t) : t \ge 0\}$;
a $\mathrm{BES}_0(3)$ $\{R(t) : t \ge 0\}$.
Define
$$\rho := \inf\{t : B(t) = \gamma\}, \qquad X(t) := B(t) \ (t < \rho), \qquad X(t) := R(t - \rho) + \gamma \ (t \ge \rho).$$
Then $\{X(t) : t \ge 0\}$ is a $\mathrm{BES}_b(3)$.

Figure III.1

We regard this result as providing a path decomposition of the $\mathrm{BES}_b(3)$ $X$ at the time $\rho$ at which $X$ attains its minimum value $\gamma$.

Williams [4,7] proved Theorem 49.3 by bare-hand calculation, but did try (unsuccessfully) to produce a theory of splitting-times as the 'natural' times at which one might expect to have path decompositions of Markov processes. His idea was to call a random time $\rho$ an algebraic splitting-time (for a Markov process $X$) if, for every $t$, we can write
$$\{\rho = t\} = F_t \cap G_t, \quad \text{where } F_t \in \sigma\{X_s : s \le t\},\ G_t \in \sigma\{X_u : u \ge t\},$$
the σ-algebras being uncompleted. The way in which this definition would be ruined if we allowed completion of σ-algebras is a first indication of how much more difficult it must be to prove a splitting-time theorem than the Stopping-Time (that is, Strong Markov) Theorem.
Jacobsen [1] gave a nice formulation of 'splitting-time' in terms of the 'crossover' property, and also gave a more illuminating proof of Theorem 49.3. Pitman [1] then proved Theorem 49.3 by using a clever random-walk approximation, and in [2] he described other applications of the splitting-time idea. Millar has done some fine work in this area (see his survey [1] and papers referred to there) and has (Millar [2]) the definitive proof of results on path decomposition at times of minima for a wide class of Markov processes based on work of Getoor and Sharpe.

New proofs of Theorem 49.3 appear regularly. See, for example, Walsh [1], le Gall [7], Ikeda and Watanabe [1], Revuz and Yor [1], and Section VI.55 in Volume 2.

Non-standard analysis provides a promising approach to splitting times; see Cutland and Kendall [1].

Theorem 49.3 helped motivate some of the original work on grossissements (enlargements of filtrations). See Barlow [1,2], Jeulin [1], Jeulin and Yor [1]. Jeulin gives a martingale proof of Theorem 49.3.

A first look at Markov-chain theory

Many key ideas of modern Markov-process theory (last-exit decompositions, excursion laws, boundary theory etc.) first appeared in clear form in Markov-chain theory; and chain theory still seems to us the ideal vehicle for learning process theory and assessing its achievements. Like number theory, chain theory is at the same time 'concrete' and sufficiently rich to accommodate the most sophisticated ideas.

50. Chains as Ray processes. Let $I$ be a countable set, so that $I$ is an LCCB in its discrete topology. Let $\{P_t\} = \{p_{ij}(t)\}$ be a transition matrix function on $I$, assumed 'standard' in that
$$p_{ij}(0+) = p_{ij}(0) = \delta_{ij}, \qquad \forall i, j \in I.$$
Without loss of generality, we can assume that $\{P_t\}$ is honest. So let us assume that
$$\sum_{j \in I} p_{ij}(t) = 1 \qquad (i \in I,\ t \ge 0).$$
Let $\{R_\lambda\}$ be the resolvent of $\{P_t\}$ acting on $C(I) = B(I)$.
Then the conditions (35.6) are obviously satisfied, so that we can construct the Ray-Knight compactification $F$ of $I$ based on $\{R_\lambda\}$, the Ray transition function $\{P_t\}$ on $F$, and the Ray process $X$ with transition function $\{P_t\}$.

The space $F$ may include 'irrelevant' points, and we shall see that we can, with advantage, restrict attention to the space $E$ defined as follows:

(50.1) $E := \{x \in F : P_t(x, I) = 1,\ \forall t > 0\} \supseteq I.$
For $x \in F$, the map $t \mapsto P_t(x, I)$ is obviously (why?) non-decreasing in $t$, so that

(50.2) $E = \{x \in F : R_1(x, I) = 1\}.$

(If, for example, we have the Poisson chain on $\mathbb{Z}^+$ with
$$p_{ij}(t) = e^{-t}\,t^{j-i}/(j - i)! \ \ (j \ge i), \qquad p_{ij}(t) = 0 \ \text{ otherwise},$$
then $F$ is the one-point compactification of $\mathbb{Z}^+$, and $P_t(\infty, \{\infty\}) = 1$ for $t \ge 0$; here, $E = \mathbb{Z}^+$.)

The space $E$ may be described very simply. Think of the point $i$ of $I$ as identified with the element
$$r_i(1) = R_1(i, \{\cdot\}) \in l_1(I).$$
Then the arguments in Section 43 show that (up to homeomorphism) $E$ is nothing other than the closure of $I$ in $l_1(I)$. The map
$$x \mapsto r_x(1) = R_1(x, \{\cdot\})$$
is therefore a homeomorphism of $E$ to $l_1(I)$.

(50.3) THEOREM (Neveu [5]). For $t > 0$, the map $x \mapsto p_{x\cdot}(t) = P_t(x, \{\cdot\})$ from $E$ to $l_1(I)$ is continuous. Hence, for every $f$ in $B(I)$ and every $t > 0$, the map $x \mapsto P_t f(x)$ is continuous on $E$.

Proof. We are guided by Neveu [5]. With slight (but allowable) misuse of notation, let $x \to y$ in $E$. Then
$$r_x(1) \to r_y(1) \quad (\text{in } l_1(I)).$$
Since, for $0 < u < v$,
$$\int_u^v e^{-s} p_{x\cdot}(s)\,ds = r_x(1)\,[e^{-u} P(u) - e^{-v} P(v)],$$
we have, for $0 < \delta < t$,

(50.4) $\delta^{-1} \int_{t-\delta}^t e^{t-s} p_{xj}(s)\,ds \to \delta^{-1} \int_{t-\delta}^t e^{t-s} p_{yj}(s)\,ds, \quad \forall j \in I.$

Fix $j \in I$. Let $\varepsilon > 0$ be given, and let $\delta > 0$ be so small that $p_{jj}(u) \ge 1 - \varepsilon$ for $u \le \delta$. Then, for $t - \delta < s < t$,
$$p_{xj}(t) \ge p_{xj}(s)\,p_{jj}(t - s) \ge (1 - \varepsilon)\,p_{xj}(s),$$
whence, from (50.4),
$$\liminf_{x \to y} p_{xj}(t) \ge (1 - \varepsilon)\,e^{-\delta}\,\delta^{-1} \int_{t-\delta}^t e^{t-s} p_{yj}(s)\,ds.$$
Now let $\varepsilon \downarrow 0$ (and insist that $\delta \downarrow 0$) to obtain

(50.5) $\liminf_{x \to y} p_{xj}(t) \ge p_{yj}(t).$

Fatou's Lemma combined with $\sum_j p_{xj}(t) = 1$ shows that equality must hold in (50.5) and this implies (why?) that we can replace 'lim inf' in (50.5) by 'lim'. □

The point of Theorem 50.3 is that $\{P_t\}$ has nice analytical properties on $E$. We must now show that 'only $E$ matters' in the probabilistic theory.

(50.6) THEOREM (Ray, Kunita-Watanabe, Meyer). For every $x$ in $E$,

(50.7) $\mathbf{P}^x[X_t \in E,\ \forall t > 0] = 1,$

(50.8) $\mathbf{P}^x[X_{t-} \in E,\ \forall t > 0] = 1.$

Proof. The proof is a fine illustration of the need for Meyer's section theorems.

Let $x \in E$. Then, with $\chi_I$ denoting the characteristic (indicator) function of $I$, we have, for any stopping time $T$,
$$1 = R_1 \chi_I(x) = \mathbf{E}^x \int_0^T e^{-t} \chi_I(X_t)\,dt + \mathbf{E}^x[e^{-T} R_1 \chi_I(X_T)] \le \mathbf{E}^x[1 - e^{-T}] + \mathbf{E}^x[e^{-T}] = 1.$$
You can see that, because of (50.2),
$$\mathbf{P}^x[T < \infty,\ X_T \in F \setminus E] = 0,$$
and the result (50.7) now follows from the Section Theorem II.76.2.

To prove (50.8), we need some of the general theory of processes, which we study further in Chapter VI. Since the process $\{X_{t-}\}$ is left-continuous, it is 'previsible'. Hence if
$$\mathbf{P}^x[X_{t-} \in F \setminus E \text{ for some } t] > 0$$
then (by Meyer's 'previsible' section theorem; see Dellacherie and Meyer [1]) there exists a sequence $(T_n)$ of stopping times with $T_n \uparrow T$ (a.s.($\mathbf{P}^x$)) such that
$$\mathbf{P}^x[T < \infty,\ X_{T-} \in F \setminus E] > 0.$$
However, this contradicts the 'Quasi-Left-Continuity' Theorem 41.3, because (as we saw above) $X_T \in E$ (a.s.($\mathbf{P}^x$)), whereas
$$P_0(X_{T-}, E) < 1 \quad (\text{if } X_{T-} \in F \setminus E).$$
Clarification. Since $R_1 \chi_I \ge \chi_E$, we have, for $y \in F \setminus E$,
$$1 > R_1(y, I) = \varepsilon_y R_1 \chi_I = \varepsilon_y P_0 R_1 \chi_I \ge \varepsilon_y P_0 \chi_E = P_0(y, E).$$ □

We may (and shall) now regard $E$ as the state-space of our process (chain?!) $X$. Of course, $E$ need not be compact, but $E$ is Polish. We write $E_{br}$ for the set of branch-points in $E$, and $E_e$ for the set of non-branch-points in $E$.

51. Significance of $q_i$. Let $i \in I$ and define

(51.1) $S_i := \inf\{t : X_t \ne i\}.$

(Then $S_i$ is an $\{\mathcal{F}^\circ_{t+}\}$ stopping time.) By the simple Markov property, we have, for $s, t > 0$,
$$\mathbf{P}^i[S_i > s + t] = \mathbf{P}^i[S_i > s]\,\mathbf{P}^i[S_i > t],$$
so that

(51.2) $\mathbf{P}^i[S_i > t] = e^{-q_i t}$

for some $q_i \in [0, \infty]$. Now let $\varepsilon_n \downarrow 0$. Then it is clear that
$$\mathbf{P}^i[S_i > t] = \lim_n p_{ii}(\varepsilon_n)^{[t/\varepsilon_n] + 1},$$
so that
$$-q_i = \lim_n \varepsilon_n^{-1} \log p_{ii}(\varepsilon_n).$$
It is now trivial that

(51.3) $q_i = \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}[1 - p_{ii}(\varepsilon)],$

the existence of the limit being part of our conclusion. Of course, we have

(51.4) $p_{ii}(t) \ge \mathbf{P}^i[S_i > t] = e^{-q_i t}.$

We make the usual classification:

(51.5) a state $i$ of $I$ is called stable if $q_i < \infty$ and instantaneous if $q_i = \infty$.

Since paths are right-continuous, it is clear that

(51.6) a point of $I$ that is isolated in $I$ in the Ray-Knight topology is a stable state.

Of course (give an example!) a stable state may be a point of accumulation of $I$.

52. Taboo probabilities; first-entrance decomposition. We stick to DW's traditional use of 'b', the first letter in the alphabet to cause printers (and readers) no trouble, for a state in $I$ on which we concentrate our attention. (Of course, $b$ is not a branch-point of $X$.) We try to abide by Chung's terminology and notation for chains.
So fix $b$ in $I$. Introduce Chung's taboo transition probability:

(52.1) ${}_b p_{ij}(t) := \mathbf{P}^i[X_t = j,\ H_b > t] \qquad (i, j \in I \setminus b),$

where $H_b$ is the hitting-time of $b$. Then, because $H_b$ has the terminal-time property

(52.2) $H_b = t + H_b \circ \theta_t$ on $\{H_b > t\},$

it follows (Exercise, but see Section 54) that $\{{}_b p_{ij}(t)\}$ is a 'standard' transition matrix function on $I \setminus b$ and that, for $s, t > 0$,

(52.3) $F_{ib}(t + s) - F_{ib}(t) = \sum_{j \in I \setminus b} {}_b p_{ij}(t)\,F_{jb}(s) \qquad (i \in I \setminus b),$

where

(52.4) $F_{ib}(t) := \mathbf{P}^i[H_b \le t].$

Chung and Neveu independently showed that (52.3) may be differentiated with respect to $s$ in the following precise sense: for $i \ne b$, there exists a (finite) continuous function $f_{ib}(\cdot)$ on $[0, \infty)$ such that

(52.5) $F_{ib}(t) = \int_0^t f_{ib}(s)\,ds;$

further, for $s > 0$, but not necessarily for $s = 0$,

(52.6) $f_{ib}(t + s) = \sum_{j \in I \setminus b} {}_b p_{ij}(t)\,f_{jb}(s).$

We skip the analytical derivation of (52.5) and (52.6) from (52.3). See Neveu [3,4], where the idea is that we can write $f_{ib}$ explicitly as
$$f_{ib}(t) = \sum_{j \in I \setminus b} t^{-1} \int_0^t {}_b p_{ij}(t - s)\,dF_{jb}(s),$$
or Chung [1; Theorem II.12.4] for proof by appeal to classical (Fubini) theorems on differentiation.

The important fact that $f_{ib}$ extends continuously from $(0, \infty)$ to $[0, \infty)$ with $f_{ib}(0) < \infty$ needs special pleading. We reproduce the argument from Chung and Neveu. We have
$$f_{ib}(s) \ge {}_b p_{ii}(s - u)\,f_{ib}(u) \qquad (0 < u < s).$$
Hence, as we see by letting $u \downarrow 0$ (through a suitable subsequence),

(52.7) $f_{ib}(s) \ge {}_b p_{ii}(s)\,\bigl[\limsup_{u \downarrow 0} f_{ib}(u)\bigr],$

and now letting $s \downarrow 0$,
$$\liminf_{s \downarrow 0} f_{ib}(s) \ge \limsup_{u \downarrow 0} f_{ib}(u).$$
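Put

(52.8) $q_{ib} := f_{ib}(0) := f_{ib}(0+) \qquad (i \in I \setminus b).$

We see from (52.7) that

(52.9) $0 \le q_{ib} < \infty \qquad (i \in I \setminus b).$ □

If we let $s \downarrow 0$ in (52.6) and apply Fatou's Lemma, we obtain

(52.10) $f_{ib}(t) \ge \sum_{j \in I \setminus b} {}_b p_{ij}(t)\,q_{jb}.$

Strict inequality may obtain here. The right-hand side of (52.10) represents the $\mathbf{P}^i$ probability density of first entering $b$ via a jump from $I \setminus b$ at time $t$. (Accept this intuitively obvious fact for now.) Note how 'wrong' things go in the case of the Feller-McKean process.

Let us now agree to use the 'tilde' notation for Laplace transforms. So, for $\lambda > 0$, write

(52.11) $\tilde f_{ib}(\lambda) := \int_0^\infty e^{-\lambda t} f_{ib}(t)\,dt = \mathbf{E}^i[\exp(-\lambda H_b)];$

(52.12) $\tilde p_{ij}(\lambda) := \int_0^\infty e^{-\lambda t} p_{ij}(t)\,dt = r_{ij}(\lambda).$

Dynkin's formula gives
$$\tilde p_{ib}(\lambda) = R_\lambda \chi_b(i) = \mathbf{E}^i[\exp(-\lambda H_b)]\,R_\lambda \chi_b(b) = \tilde f_{ib}(\lambda)\,\tilde p_{bb}(\lambda),$$
and, on inverting the Laplace transform, we obtain the first-entrance decomposition:

(52.13) $p_{ib}(t) = \int_0^t f_{ib}(s)\,p_{bb}(t - s)\,ds \qquad (i \in I \setminus b),$

which (by continuity) is valid for all $t > 0$.

(52.14) Exercise. Prove the more precise result
$$\mathbf{P}^i[H_b \in ds;\ X(t) = b] = f_{ib}(s)\,p_{bb}(t - s)\,ds \qquad (0 < s < t).$$

53. The Q-matrix; DK conditions. We have already seen that

(53.1) $q_{ii} := \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}[p_{ii}(\varepsilon) - 1] = -q_i \in [-\infty, 0].$

Since $q_{ib} := f_{ib}(0+)$ exists in $[0, \infty)$, it follows immediately from (52.13) that

(53.2) $\lim_{\varepsilon \downarrow 0} \varepsilon^{-1} p_{ib}(\varepsilon) = q_{ib} \in [0, \infty).$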
The existence of the limits in (53.1) and (53.2) was first established by Kolmogorov and Doob. We write

(53.3) $Q = P'(0),$

with the interpretation provided by (53.1) and (53.2). We call the conditions

(DK1) $0 \le q_{ij} < \infty \quad (i \ne j);$

(DK2) $\sum_{j \ne i} q_{ij} \le q_i \le \infty$

the Doob-Kolmogorov conditions. The condition (DK2) is obtained by letting $\varepsilon \downarrow 0$ in the equation

(53.4) $\sum_{j \ne i} \varepsilon^{-1} p_{ij}(\varepsilon) \le \varepsilon^{-1}[1 - p_{ii}(\varepsilon)]$

and applying Fatou's Lemma. (Under our assumption that $\{P_t\}$ is honest, equality holds in (53.4).)

The Feller-McKean example shows that (even for an honest $\{P_t\}$) we can have the 'worst' possible situation:
$$q_i = \infty \ (\forall i), \qquad q_{ij} = 0 \ (\forall i \ne j).$$
Actually this is the 'best' possible situation for chains with all states instantaneous. (See Theorem 55.1.)

The probabilistic interpretation of $q_{ij}$ $(j \ne i)$ is given in Section 57.

54. Local-character condition for Q. Let $G$ be an open subset of $E$. Define
$$p^G_{ij}(t) := \mathbf{P}^i[X(t) = j,\ H_{E \setminus G} > t] \qquad (i, j \in I \cap G),$$
where, as usual, $H_{E \setminus G}$ denotes the hitting-time of $E \setminus G$. Then
$$p^G_{ii}(t) \ge p_{ii}(t) - \mathbf{P}^i[H_{E \setminus G} \le t] \qquad (i \in I \cap G).$$
Since $G$ is open and $X$ is right-continuous, $\mathbf{P}^i[H_{E \setminus G} > 0] = 1$, so that

(54.1) $p^G_{ii}(t) \to 1 \quad (t \to 0)$

for $i \in I \cap G$. That $\{P^G(t)\} := \{p^G_{ij}(t) : i, j \in I \cap G\}$ is a transition matrix function on $I \cap G$ is easily shown (recall Exercise 18.15). Note that (54.1) ensures that $\{P^G(t)\}$ is 'standard'.

Extend $\{P^G(t)\}$ to $(I \cap G) \cup \partial$ in the usual way and observe that, for $b \in I \cap G$,
$$p^G_{b\partial}(\varepsilon) \ge \mathbf{P}^b[X_\varepsilon \in I \setminus G] = \sum_{j \in I \setminus G} p_{bj}(\varepsilon).$$
Multiply through by $\varepsilon^{-1}$, let $\varepsilon \downarrow 0$, and apply Fatou's Lemma and (DK1) to obtain
$$Q(b, I \setminus G) := \sum_{j \in I \setminus G} q_{bj} \le q^G_{b\partial} < \infty.$$
We have proved the following result (Williams [8,9]).

(54.2) LEMMA (local-character condition). Let $b \in I$, and let $G$ be an open subset of $E$ containing $b$. Then
$$Q(b, I \setminus G) < \infty.$$

The result is interesting, not for its own sake, but for its 'concrete' applications; we follow Williams [9].

(54.3) COROLLARY 1. If $a$ and $b$ are distinct points of $I$, then

(N) $\sum_{j \notin \{a, b\}} q_{aj} \wedge q_{bj} < \infty.$

Note. The 'N' is in deference to Neveu, in whose work this condition is implicit but nowhere explicit.

Proof of Corollary 1. Since $E$ is Hausdorff, there exist disjoint open subsets $G_a$, $G_b$ of $E$ with $a \in G_a$, $b \in G_b$. Then
$$\sum_{j \notin \{a, b\}} q_{aj} \wedge q_{bj} \le Q(a, I \setminus G_a) + Q(b, I \setminus G_b) < \infty.$$ □

(54.4) COROLLARY 2. Suppose that $H$ is a finite subset of $I$ such that

(54.5) $\liminf_j \sum_{h \in H} q_{hj} > 0.$

Then every state $i$ in $I \setminus H$ is stable.

Note. The meaning of 'lim inf' should be obvious: for some $\varepsilon > 0$,
$$\sum_{h \in H} q_{hj} < \varepsilon \text{ for only finitely many } j.$$

Proof of Corollary 2. It is clearly enough to suppose that $I$ is infinite and $H$ is minimal subject to the requirement (54.5). Then every state in $H$ is instantaneous (by (DK2)), and is therefore a point of accumulation of $I$. Let $G$ be an open subset of $E$ that contains $H$. Then Lemma 54.2 and the condition (54.5) imply that $E \setminus G$ contains only finitely many points of $I$. Thus $I$ is homeomorphic to the disjoint union of $|H|$ copies of $\{1, 2, 3, \ldots; \infty\}$ and is already compact: $I = F = E$. (Thus $X$ takes all its values in $I$.) Any point $i$ of $I \setminus H$ is isolated in the Ray-Knight topology, and is therefore stable. □

It is important that

(54.6) under the hypotheses of (54.4), we can add that

(54.7) $\sum_{j \in I \setminus \{i\}} q_{ij} = q_i \ (< \infty), \quad \forall i \in I \setminus H.$
This follows because, since $X$ takes all its values in $I$, $X$ must exit the stable state $i$ in $I \setminus H$ by jumping to another state of $I$. (Since $\{P_t\}$ is honest, $X$ cannot jump to $\partial$; but the real point is that there are no 'fictitious' states in $E \setminus I$ to which $X$ can jump from $i$.) We are sure that you already know that (54.7) is equivalent to the statement that $X$ exits $i$ by jumping to another state of $I$. (If not, wait for something very much better in Section 57.) □

55. Totally instantaneous Q-matrices. Recall that the conditions

(DK1) $0 \le q_{ij} < \infty, \quad \forall i, j : i \ne j,$

(N) $\sum_{j \notin \{a, b\}} q_{aj} \wedge q_{bj} < \infty, \quad \forall a, b : a \ne b,$

hold. Suppose now that every state of $I$ is instantaneous:

(TI) $q_i = -q_{ii} = \infty, \quad \forall i \in I.$

Then we can argue that the following 'safety' condition holds:

(S) There exists an infinite subset $J$ of $I$ with
$$Q(i, J \setminus i) := \sum_{j \in J \setminus i} q_{ij} < \infty, \quad \forall i \in I.$$
'$J$ is a large set (comparatively) safe from hits'.

Proof of (S) (Williams [9]). Label $I$ as $\mathbb{N}$. Since (TI) holds, it follows from (54.4) that there is an infinite set $J = \{j(1), j(2), \ldots\} \subset I$ such that $j(n) > n$ and
$$\sum_{i \le n} q_{i\,j(n)} < 2^{-n}.$$
Then $Q(i, J \setminus i) < \infty$, $\forall i$. □

It was known back in 1967 (see Williams [10]) that if $Q$ is a Q-matrix that satisfies (TI) and (DK1) then the conditions (N) and (S) hold. Swayed by the then-prevalent belief that totally instantaneous chains are impossibly complicated, DW spent the next seven years trying to find additional necessary conditions. He was then somewhat annoyed to discover that there are none.

(55.1) THEOREM. Let $Q$ be an $I \times I$ matrix satisfying (TI) and (DK1). Then $Q = P'(0)$ for some transition function $\{P(t)\}$ on $I$ if and only if the conditions (N) and (S) hold; and then $\{P(t)\}$ may be chosen to be honest.

It is obvious that the 'if' part is proved by bare-hands construction of a suitable $\{P(t)\}$. (See Williams [9].)

Theorem 55.1 is all very well as an analytical result, but it does not probe
deeply into the probabilistic structure of chains. For a much more challenging problem than that solved by Theorem 55.1, see Section 60.

Note that the easiest way to guarantee conditions (N) and (S) is to take $q_{ij} = 0$ $(i \ne j)$. Thus the Feller-McKean Q-matrix is the most 'likely' candidate for a TI Q-matrix, not the least likely, as was once thought. This explains remarks at the end of Section 53.

56. Last exits. We now wish to prove the following result, which is 'dual' to (52.13): for $b, j \in I$ with $j \ne b$, there exists a continuous function $g_{bj}(\cdot)$ on $[0, \infty)$ such that

(56.1) $p_{bj}(t) = \int_0^t p_{bb}(s)\,g_{bj}(t - s)\,ds.$

One interpretation is provided by the dual of (52.14):

(56.2) $\mathbf{P}^b[\sigma_b(t) \in ds;\ X(t) = j] = p_{bb}(s)\,g_{bj}(t - s)\,ds,$

where

(56.3) $\sigma_b(t) := \sup\{s < t : X(s) = b\}.$

The intuitively obvious (but very unfashionable!) thing to do is to derive these results from those in Section 52 by time reversal. We used the same idea in connection with the Chung-Getoor-Sharpe description of equilibrium charge in terms of (spatial) last-exit distribution.

Since the 'hat' notation is standard both for Laplace transforms and for time reversal, let us follow the notation of Doob [2], using the 'tilde' notation for Laplace transforms (as we did in Section 52), and the 'star' notation for time-reversed processes. It is appropriate that we should follow Doob's paper [2], because we now make rather trivial use of a method fully developed there.

Let $\xi$ be an exponentially distributed random variable of rate $\alpha > 0$ and independent of $X$. Let $Y^*$ and $X^*$ be $(E \cup \partial^*)$-valued processes defined as follows:
$$Y^*(t) := X(\xi - t) \ (0 < t \le \xi); \qquad Y^*(t) := \partial^* \ (t > \xi); \qquad X^*(t) := Y^*(t+).$$
By Nagasawa's formula, the process $X^*$ under the $\mathbf{P}^b$ law is Markovian with transition matrix function $\{p^*(t)\}$ satisfying

(56.4) $p^*_{ij}(t) = e^{-\alpha t}\,p_{ji}(t)\,\tilde p_{bj}(\alpha)/\tilde p_{bi}(\alpha).$

Apply the first-entrance decomposition to $X^*$ to obtain
$$\tilde p^*_{jb}(\lambda) = \tilde f^*_{jb}(\lambda)\,\tilde p^*_{bb}(\lambda),$$
whence
$$\tilde p_{bj}(\lambda + \alpha) = \tilde f^*_{jb}(\lambda)\,\tilde p_{bb}(\lambda + \alpha)\,\tilde p_{bj}(\alpha)/\tilde p_{bb}(\alpha).$$
Inversion of the Laplace transform yields (56.1), with
$$g_{bj}(t) := e^{\alpha t}\,f^*_{jb}(t)\,\tilde p_{bj}(\alpha)/\tilde p_{bb}(\alpha).$$
It is formally obvious how (56.2) follows from (52.14). However, one has to be rather careful here because of the difficulty mentioned earlier that $X^*$ need not be strong Markov. Recall that the Ray-Knight compactification induced by $X^*$ may be quite different from that induced by $X$, that we need Doob's double (or simultaneous) compactification to do things properly, and so on.

Of course, our proof that (for $j \ne b$) there exists a continuous function $g_{bj}$ on $[0, \infty)$ satisfying (56.1) is totally rigorous. We could have made the argument independent of the concept of time reversal by making (56.4) an analytical definition of a transition matrix function $\{p^*(t)\}$.

Important exercise. Deduce from (56.1) that

(56.5) $g_{bj}(0) = q_{bj}.$

Next prove that, for $s > 0$ and $t \ge 0$,

(56.6) $g_{bj}(s + t) = \sum_{i \ne b} g_{bi}(s)\,{}_b p_{ij}(t) \qquad (j \ne b),$

and deduce that

(56.7) $g_{bj}(t) \ge \sum_{i \ne b} q_{bi}\,{}_b p_{ij}(t) \qquad (j \ne b).$

(Hint for (56.6): compare (52.6).) Remembering that $\{P(t)\}$ is assumed honest, show that (56.1) implies that

(56.8) $1 - p_{bb}(t) = \int_0^t p_{bb}(s)\,g_b(t - s)\,ds,$

where

(56.9) $g_b(t) := \sum_{j \ne b} g_{bj}(t) \qquad (t > 0).$

Deduce from (56.6) that $g_b$ is non-increasing; then from (56.8) that $g_b$ is finite on $(0, \infty)$; then from (56.6) that $g_b$ is continuous on $(0, \infty)$. Use (56.8) to show that

(56.10) $g_b(0) := g_b(0+) = q_b \le \infty.$

Note that (56.7) does not necessarily extend to $t = 0$. Let $\nu_b$ be the measure on $(0, \infty]$ defined by

(56.11) $\nu_b(t, \infty] := g_b(t) \qquad (t > 0).$

Deduce from (56.8) that, for $\lambda > 0$,

(56.12) $\tilde p_{bb}(\lambda) = \Bigl[\lambda + \int_{(0, \infty]} (1 - e^{-\lambda l})\,\nu_b(dl)\Bigr]^{-1}.$
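The decompositions (52.13), (56.1) and (56.12) can be sanity-checked numerically on the simplest possible chain. The sketch below assumes a two-state chain on $\{j, b\}$ with hypothetical rates `alpha` (from $j$ to $b$) and `beta` (from $b$ to $j$), for which every ingredient has a closed form: $f_{jb}(t) = \alpha e^{-\alpha t}$, and $g_{bj}(t) = \beta e^{-\alpha t}$ (the jump rate $q_{bj}$ times the taboo probability of staying at $j$, which is what one expects in this toy case where $b$ is stable). The helper `convolve` and all numerical values are illustrative choices, not taken from the text.

```python
import math

# Hypothetical rates for a two-state chain on {j, b}.
alpha, beta = 2.0, 3.0   # j -> b at rate alpha, b -> j at rate beta
r = alpha + beta


def p_bb(t):
    return alpha / r + beta / r * math.exp(-r * t)


def p_bj(t):
    return beta / r * (1.0 - math.exp(-r * t))


def p_jb(t):
    return alpha / r * (1.0 - math.exp(-r * t))


def f_jb(t):
    # Density of the hitting time H_b under P^j.
    return alpha * math.exp(-alpha * t)


def g_bj(t):
    # Last-exit density for this toy chain: q_bj times the taboo probability.
    return beta * math.exp(-alpha * t)


def convolve(u, v, t, n=20000):
    """Midpoint-rule approximation of the integral of u(s) v(t - s) over (0, t)."""
    h = t / n
    return h * sum(u((k + 0.5) * h) * v(t - (k + 0.5) * h) for k in range(n))


t = 0.7
first_entrance = convolve(f_jb, p_bb, t)   # right-hand side of (52.13)
last_exit = convolve(p_bb, g_bj, t)        # right-hand side of (56.1)

# (56.12): here nu_b(dl) = beta * alpha * exp(-alpha * l) dl, so the integral
# of (1 - exp(-lam * l)) against nu_b equals beta * lam / (alpha + lam).
lam = 1.3
laplace_p_bb = alpha / (r * lam) + beta / (r * (lam + r))
levy_integral = beta * lam / (alpha + lam)
```

In this example `first_entrance` should agree with `p_jb(t)`, `last_exit` with `p_bj(t)`, and `laplace_p_bb` with `1 / (lam + levy_integral)`, up to quadrature and rounding error.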
57. Excursions from b. Neveu's papers [2-5] are perhaps the finest written on chains. Our concern here is to describe what is in modern terminology Neveu's description of the Itô excursion law from the point $b$ of $I$.

Let

(57.1) $L_b(t) := \mathrm{meas}\{s \le t : X_s = b\},$

and, for $\tau \ge 0$, write

(57.2) $\rho(\tau) := \inf\{t : L_b(t) > \tau\} \le \infty.$

The strong Markov property shows that $\{\rho(\tau) : \tau \ge 0;\ \mathbf{P}^b\}$ is a subordinator in the sense of Section II.37, but with a slight generalisation (which you can easily make) to allow $\rho$ to take the value $\infty$. Exactly as in Section II.37, we find that, for $\lambda > 0$,

(57.3) $\mathbf{E}^b[e^{-\lambda\rho(\tau)}] = e^{-\tau\psi(\lambda)}$

for some function $\psi$. Hence

(57.4) $\psi(\lambda)^{-1} = \mathbf{E}^b \int_0^\infty e^{-\lambda\rho(\tau)}\, d\tau = \mathbf{E}^b \int_0^\infty e^{-\lambda t}\, dL_b(t) = \tilde p_{bb}(\lambda),$

so that, from (56.12),

(57.5) $\psi(\lambda) = \lambda + \int_{(0,\infty]} (1 - e^{-\lambda t})\, \nu_b(dt).$

The Lévy-Itô formula (II.37.4) now shows that we can write

(57.6) $\rho(\tau) = \tau + \int_{(0,\infty]} l\, N((0,\tau] \times dl),$

where $N$ is a Poisson measure on $(0, \infty) \times (0, \infty]$ with expectation measure $d\tau\, \nu_b(dl)$. Of course, $\rho(\tau)$ is finite if and only if no atom of $N$ lies in $(0,\tau] \times \{\infty\}$. It is therefore clear that

$\mathbf{P}^b[\rho(\tau) < \infty] = \mathbf{P}^b[L_b(\infty) > \tau] = \exp(-\tau\nu_b\{\infty\}).$

The jumps made by $\rho(\cdot)$ correspond to the lengths of the excursions made by $X$ from $b$, so our description of the Itô excursion law of $X$ at $b$ must be consistent with (57.6).

(57.7) Excursion space. Let $U$ be the excursion space of Skorokhod maps $e$ from $[0, \infty)$ to $E \cup \partial$ such that
(i) if $e(s) = \partial$ then $e(t) = \partial$, $\forall t \ge s$;
(ii) $e(s) \neq b$ for $s > 0$.
An excursion of $X$ from $b$ is to be considered as killed (sent to $\partial$) at its lifetime, so it cannot return to $b$.
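The representation (57.6) can be tried out by simulation in the simplest case. The sketch below assumes a two-state chain in which $b$ is left at rate `a` and the other state returns to $b$ at rate `c`, so that $\nu_b(dl) = a c\, e^{-cl}\, dl$ with no mass at $\infty$ and $\psi(\lambda) = \lambda + a\lambda/(\lambda + c)$. The function names, parameter values and sample size are illustrative assumptions only; the Monte Carlo estimate of $\mathbf{E}^b[e^{-\lambda\rho(\tau)}]$ should be close to $e^{-\tau\psi(\lambda)}$, as in (57.3).

```python
import math
import random

def sample_rho(tau, a, c, rng):
    """One draw of rho(tau) = tau + (sum of excursion lengths), cf. (57.6).

    Atoms of the Poisson measure arrive at rate a on the local-time axis,
    and each carries an independent Exp(c) excursion length.
    """
    total = tau
    clock = rng.expovariate(a)          # local time of the first excursion
    while clock <= tau:
        total += rng.expovariate(c)     # length of this excursion
        clock += rng.expovariate(a)     # local time of the next excursion
    return total

def laplace_estimate(lam, tau, a, c, n=40000, seed=0):
    """Monte Carlo estimate of E[exp(-lam * rho(tau))]."""
    rng = random.Random(seed)
    return sum(math.exp(-lam * sample_rho(tau, a, c, rng)) for _ in range(n)) / n

def psi(lam, a, c):
    """Exponent lam + int (1 - e^{-lam l}) a c e^{-c l} dl for the two-state chain."""
    return lam + a * lam / (lam + c)

estimate = laplace_estimate(0.5, 1.0, 1.5, 0.8)
exact = math.exp(-1.0 * psi(0.5, 1.5, 0.8))
print(abs(estimate - exact) < 0.01)
```

For this chain one can also confirm (57.4) by hand: $\tilde p_{bb}(\lambda) = (\lambda + c)/(\lambda(\lambda + a + c)) = \psi(\lambda)^{-1}$.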
Define $\zeta_e := \inf\{t : e(t) = \partial\}$. Let $\mathcal{U}^\circ$ be the smallest σ-algebra on $U$ measuring each projection $e \mapsto e(t)$.

(57.8) THEOREM (Neveu). There exists a unique σ-finite measure $n$ on $(U, \mathcal{U}^\circ)$ such that, for $0 < t_1 < t_2 < \cdots < t_m$ and $i_1, i_2, \ldots, i_m \in I \setminus b$,

$n\{e \in U : e(t_k) = i_k\ (1 \le k \le m)\} = g_{bi_1}(t_1)\, {}_bp_{i_1 i_2}(t_2 - t_1) \cdots {}_bp_{i_{m-1} i_m}(t_m - t_{m-1}).$

The following statements hold:

(57.9) $n(U) = q_b, \quad n\{e : e(0) = j\} = q_{bj}$ $(j \in I \setminus b);$

(57.10) $n\{e : \zeta_e > t\} = g_b(t), \quad n\{e : e(t) = j\} = g_{bj}(t)$ $(j \in I \setminus b);$

(57.11) $n\{e : \zeta_e \in dt\} = \nu_b(dt) = \eta_b(t)\, dt$ on $(0, \infty)$,

where $\eta_b(t)$ $(t > 0)$ is defined independently of $s$ in $(0, t)$ by

(57.12) $\eta_b(t) := \sum_{j \in I \setminus b} g_{bj}(s)\, f_{jb}(t - s).$

The measure $n$ is the Itô excursion law of $X$ at $b$ in the sense explained in (57.13) below.

The business of assigning credit is terribly tortuous. What is certain is that Neveu's papers have received much less credit than they deserve. However, we must be careful not to overcompensate and thereby do injustice to later work. (See, for example, Freedman's interpretation of $q_{bj}$ described below.) We are expressing our belief that if Itô excursion laws had been discovered when Neveu was writing on chains, then he would have expressed his ideas in the form of Theorem 57.8. (Recall what Gauss did with ideal-class groups!)

(57.13) The role of $n$ as Itô excursion law from $b$. Part 8 of Chapter VI in Volume 2 is an extensive study of Itô excursion theory; and you will have to look there for proofs, and for much fuller explanation, of the statements now to be made.

We describe how a process $Z$ with the $\mathbf{P}^b$ law of $X$ may be built out of excursions from $b$. Construct, as in Section II.37, a Poisson random measure $\Lambda$ on $((0, \infty) \times U,\ \mathcal{B}(0, \infty) \times \mathcal{U}^\circ)$ with intensity measure $\mathrm{Leb} \times n$. Let $A$ be the set of atoms of $\Lambda$, the 'points' of the Poisson point process. A typical point of $A$ is, of course, a pair $(\sigma, e)$ where $\sigma \in (0, \infty)$ and $e$ is an element of $U$ with lifetime $\zeta_e \le \infty$. For $\tau \ge 0$, define

$\gamma(\tau) := \tau + \sum\{\zeta_e : (\sigma, e) \in A,\ \sigma \le \tau\}.$
Because of (56.11), (56.12), (57.6), (57.10) and (57.11), $\gamma$ is a subordinator identical
in law to the 'inverse local time' process $(\rho, \mathbf{P}^b)$. The jumps of $\gamma$ correspond to excursions of $Z$ from $b$, and it remains only to interpolate within these excursions. For $t \ge 0$, we define

$Z(t) := e(t - \gamma(\tau-))$

if, for some $(\tau, e) \in A$, we have $\gamma(\tau-) \le t < \gamma(\tau)$; define $Z(t) := b$ otherwise. Then $Z$ has the $\mathbf{P}^b$ law of $X$.

In particular, if $C$ is a measurable subset of $U$ then the local time at $b$ for $X$ before the first excursion from $b$ that lies in $C$ has the exponential distribution with rate parameter $n(C)$. The second equation in (57.9) shows that if we write $T_{bj}$ for the time of the first jump made by $X$ from $b$ to $j$,

$T_{bj} := \inf\{t > 0 : X(t-) = b,\ X(t) = j\},$

then

(57.14) $\mathbf{P}^b[L_b(T_{bj}) > \tau \mid T_{bj} < \infty] = \exp(-\tau q_{bj}).$

This is the interpretation of $q_{bj}$ discovered by Freedman [2].

Another (and very closely related) interpretation of $q_{bj}$ is provided by the theory of Lévy kernels. Let $J_{bj}(t)$ be the number of jumps made by $X$ from $b$ to $j$ during time $t$. Then the idea of $q_{bj}$ as 'jump intensity' is perfectly captured by the statement that

(57.15) $J_{bj}(t) - q_{bj} L_b(t)$ is a martingale

(relative to $(\{\mathcal{F}_t^\circ\}, \mathbf{P}^\mu)$ for every $\mu$). In particular, for every probability measure $\mu$ on $E$,

(57.16) $\mathbf{E}^\mu J_{bj}(t) = \int_E \mu(dx) \int_0^t p_{xb}(s)\, q_{bj}\, ds.$

The object of Lévy kernel theory is to describe (simultaneously) all the jumps of $X$. The Q-matrix $Q$, which describes the $I$-to-$I$ jumps, is just the restriction to $I \times \mathcal{B}(I)$ of the Lévy kernel of $X$, which describes all possible $E$-to-$E$ jumps. See Benveniste and Jacod [1] and Volumes IV and V of Dellacherie and Meyer [1].

58. Kingman's solution of the 'Markov characterisation problem'. Kingman calls a function $p$ on $[0, \infty)$ a Markov p-function if there exists a ('standard') transition function on a countable set $I$ and a state $b$ in $I$ such that $p_{bb}(t) = p(t)$, $\forall t$.

(58.1) THEOREM (Kingman).
A continuous function $p$ on $[0, \infty)$ is a Markov p-function if and only if its Laplace transform may be written as

$\tilde p(\lambda) = \left[\lambda + \int_{(0,\infty)} (1 - e^{-\lambda t})\, \eta(t)\, dt + \nu(\{\infty\})\right]^{-1}$ $(\lambda > 0),$

where $\nu(\{\infty\}) \ge 0$ and where $\eta$ is a lower-semicontinuous function on $(0, \infty)$ such
that either
(i) $\eta(t) = 0$, $\forall t$, or
(ii) for some $\alpha$ in $(0, \infty)$, $\eta(t) > 0$ $(0 < t < 1)$, $\eta(t) \ge e^{-\alpha t}$ $(t \ge 1)$.

You can see that the 'only if' part of the theorem is largely a consequence of Neveu's work. The 'if' part is surprising and very much more difficult, and Kingman's proof of it is a splendid tour de force. You will find this proof and much else of interest in Kingman's book [3].

59. Symmetrisable chains. Our paper, Rogers and Williams [2], was designed to advertise the power of Dirichlet-form theory initiated by Beurling and Deny and spectacularly developed for use in probability theory by Fukushima [1], Silverstein [1,2]. We proved the following theorem by a finite-state approximation technique originally used by Reuter and Ledermann for birth-and-death processes.

(59.1) THEOREM. Let $Q$ be an $I \times I$ matrix such that

(59.2) $q_{ij} \ge 0$ $(i \neq j)$, $\quad \sum_{j \neq i} q_{ij} = -q_{ii} \le \infty$, $\forall i$,

and such that $Q$ is m-symmetrisable in that

$m_i q_{ij} = m_j q_{ji}$ $(i, j \in I)$

for some strictly positive numbers $(m_i : i \in I)$. Then there exists a standard transition function $\{P(t)\}$ on $I$ with Q-matrix $Q$ and m-symmetrisable in that

$m_i p_{ij}(t) = m_j p_{ji}(t)$ $(i, j \in I;\ t \ge 0)$

if and only if

(59.3) $\mathcal{D}(\mathcal{E}) := \{f \in \ell^2(m) : \mathcal{E}(f, f) < \infty\}$ is dense in $\ell^2(m)$,

where $\mathcal{E}$ is the Dirichlet form or energy norm associated with $Q$:

$\mathcal{E}(f, f) = \sum_i \sum_j m_i q_{ij} (f_j - f_i)^2.$

Though, in general, $\{P(t)\}$ is by no means unique, there is a 'canonical' $\{P(t)\}$ with the properties described. If $m$ is a finite measure, the canonical $\{P(t)\}$ is honest.

It was too glibly stated in our paper that if every $q_{ii}$ is finite then the canonical $\{P(t)\}$ process corresponds to the chain reflected off its Martin boundary. Ivor McGillivray has pointed out that we should have said 'reflected off its Kuramochi boundary', the Kuramochi boundary being a boundary analogous to the Martin boundary (and agreeing with it in the cases most frequently encountered) but specially tailored for symmetrisable processes.
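The m-symmetry condition and the energy form of Theorem 59.1 are easy to inspect on a finite birth-and-death chain, where a symmetrising $m$ comes from detailed balance. The sketch below is only an illustration under that finite, conservative assumption: the rates and the names `birth_death_Q`, `symmetrising_weights` and `energy` are hypothetical. It also checks that, in this setting, the double sum $\sum_i \sum_{j \neq i} m_i q_{ij}(f_j - f_i)^2$ equals $-2 \sum_i m_i f_i (Qf)_i$.

```python
from fractions import Fraction as F

def birth_death_Q(births, deaths):
    """Conservative Q-matrix on {0, ..., n} with the given up and down rates."""
    n = len(births)
    Q = [[F(0)] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        Q[i][i + 1] = F(births[i])      # rate i -> i+1
        Q[i + 1][i] = F(deaths[i])      # rate i+1 -> i
    for i in range(n + 1):
        Q[i][i] = -sum(Q[i][j] for j in range(n + 1) if j != i)
    return Q

def symmetrising_weights(births, deaths):
    """Weights with m_i q_{i,i+1} = m_{i+1} q_{i+1,i} (detailed balance)."""
    m = [F(1)]
    for up, down in zip(births, deaths):
        m.append(m[-1] * F(up) / F(down))
    return m

def energy(Q, m, f):
    """Double sum of m_i q_ij (f_j - f_i)^2 over i != j."""
    n = len(Q)
    return sum(m[i] * Q[i][j] * (f[j] - f[i]) ** 2
               for i in range(n) for j in range(n) if j != i)

births, deaths = [2, 1, 3], [1, 4, 2]   # hypothetical rates
Q = birth_death_Q(births, deaths)
m = symmetrising_weights(births, deaths)
f = [F(1), F(-2), F(0), F(3)]

# m-symmetry: m_i q_ij = m_j q_ji for all i, j.
print(all(m[i] * Q[i][j] == m[j] * Q[j][i] for i in range(4) for j in range(4)))
# Energy versus the weighted inner product <f, Qf>_m.
inner = sum(m[i] * f[i] * sum(Q[i][j] * f[j] for j in range(4)) for i in range(4))
print(energy(Q, m, f) == -2 * inner)
```

On a finite state space every $f$ has finite energy, so the density condition (59.3) holds trivially; the condition only has content for infinite $I$.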
60. An open problem. Here, to end with, is a problem we should like to solve. Suppose that $m$ is a probability measure on $I$, and that $Q$ is an $I \times I$ matrix satisfying (59.2) and also

(60.1) $\sum_{j \neq i} m_j q_{ji} = -m_i q_{ii} \le \infty$, $\forall i$.

When does there exist a (positive-recurrent) chain $X$ with Q-matrix $Q$ and with $m$ as invariant measure? The condition (59.3) is necessary; see Rogers and Williams [2].
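Condition (60.1) is strictly weaker than m-symmetrisability, which is part of what makes the problem harder than Theorem 59.1. A small finite example makes the gap concrete; the cyclic chain, its rates and the helper names below are hypothetical choices for illustration, and in the finite case the existence question is of course trivial.

```python
from fractions import Fraction as F

def satisfies_60_1(Q, m):
    """Check sum_{j != i} m_j q_ji = -m_i q_ii for every i, as in (60.1)."""
    n = len(Q)
    return all(sum(m[j] * Q[j][i] for j in range(n) if j != i) == -m[i] * Q[i][i]
               for i in range(n))

def is_m_symmetric(Q, m):
    """Check m_i q_ij = m_j q_ji for all i, j."""
    n = len(Q)
    return all(m[i] * Q[i][j] == m[j] * Q[j][i] for i in range(n) for j in range(n))

# Cyclic chain 0 -> 1 -> 2 -> 0 with rates 1, 2, 3 (hypothetical).
Q = [[F(-1), F(1), F(0)],
     [F(0), F(-2), F(2)],
     [F(3), F(0), F(-3)]]
# Probability measure proportional to (1, 1/2, 1/3).
m = [F(6, 11), F(3, 11), F(2, 11)]

print(satisfies_60_1(Q, m))   # m satisfies the balance equations (60.1)
print(is_m_symmetric(Q, m))   # but Q is not m-symmetrisable
```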
References for Volumes 1 and 2

Abraham, R. and Robbin, J.
[1] Transversal Mappings and Flows, Benjamin, New York, Amsterdam, 1967.
Adler, R. J.
[1] The Geometry of Random Fields, Wiley, Chichester, 1981.
[2] An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes, IMS Lecture Notes-Monograph Series Vol. 12, IMS, Hayward, Calif., 1990.
Aizenman, M. and Simon, B.
[1] Brownian motion and the Harnack inequality for Schrödinger operators, Comm. Pure and Appl. Math., 35, 209-273 (1982).
Albeverio, S., Blanchard, P. and Høegh-Krohn, R.
[1] Newtonian diffusions and planets, with a remark on non-standard Dirichlet forms and polymers, Stochastic Analysis and Applications: Lecture Notes in Mathematics 1095, Springer, Berlin, 1984, pp. 1-24.
Albeverio, S., Fenstad, J. E., Høegh-Krohn, R. and Lindstrøm, T.
[1] Nonstandard Methods in Probability and Mathematical Physics, Academic Press, New York (1986).
Aldous, D. J.
[1] Stopping times and tightness, Ann. Prob., 6, 335-40 (1978).
Ancona, A.
[1] Negatively curved manifolds, elliptic operators and Martin boundary, Ann. Math., 125, 495-536 (1987).
Arnold, L. and Wihstutz, V. (editors)
[1] Lyapunov Exponents (Proceedings): Lecture Notes in Mathematics 1186, Springer, Berlin, 1986.
Azéma, J.
[1] Sur les fermés aléatoires, Séminaire de Probabilités XIX: Lecture Notes in Mathematics 1123, Springer, Berlin, 1985, pp. 297-495.
Azéma, J. and Yor, M.
[1] Une solution simple au problème de Skorokhod, Séminaire de Probabilités XIII: Lecture Notes in Mathematics 721, Springer, Berlin, 1979, pp. 90-115, 625-633.
[2] (editors) Temps locaux, Astérisque 52-53, Société Mathématique de France (1978).
[3] Étude d'une martingale remarquable, Séminaire de Probabilités XXIII: Lecture Notes in Mathematics 1372, Springer, Berlin, 1989, pp. 88-130.
Azencott, R.
[1] Grandes déviations et applications, École d'Été de Probabilités de Saint-Flour VIII: Lecture Notes in Mathematics 774, Springer, Berlin, 1980.
Barlow, M. T.
[1] Study of a filtration expanded to include an honest time, Z. Wahrscheinlichkeitstheorie, 44, 307-323 (1978).
[2] Decomposition of a Markov process at an honest time (unpublished).
[3] One dimensional stochastic differential equation with no strong solution, J. London Math. Soc., 26, 335-347 (1982).
[4] On Brownian local time, Séminaire de Probabilités XV: Lecture Notes in Mathematics 850, Springer, Berlin, 1981, pp. 189-190.
[5] Necessary and sufficient conditions for the continuity of local time of Lévy processes, Ann. Prob., 16, 1389-1427 (1988).
Barlow, M. T. and Hawkes, J.
[1] Application d'entropie métrique à la continuité des temps locaux des processus de Lévy, C.R. Acad. Sci. Paris Ser. I, 301, 237-239 (1985).
Barlow, M. T., Jacka, S. and Yor, M.
[1] Inequalities for a pair of processes stopped at a random time, Proc. London Math. Soc., 52, 142-172 (1986).
[2] Inégalités pour un couple de processus arrêtés à un temps quelconque, C.R. Acad. Sci., 299, 351-354 (1984).
Barlow, M. T. and Perkins, E.
[1] One-dimensional stochastic differential equations involving a singular increasing process, Stochastics, 12, 229-249 (1984).
[2] Strong existence, uniqueness and non-uniqueness in an equation involving local time, Séminaire de Probabilités XVII: Lecture Notes in Mathematics 986, Springer, Berlin, 1983, pp. 32-66.
Barlow, M. T. and Yor, M.
[1] (Semi-) martingale inequalities and local times, Z. Wahrscheinlichkeitstheorie, 55, 237-254 (1981).
[2] Semi-martingale inequalities via the Garsia-Rodemich-Rumsey lemma and applications to local times, J. Funct. Anal., 49, 198-229 (1982).
Bass, R. and Cranston, M.
[1] The Malliavin calculus for pure jump processes and applications to local time, Ann. Prob., 14, 490-532 (1986).
Batchelor, G. K.
[1] Kolmogoroff's theory of locally isotropic turbulence, Proc. Camb. Phil. Soc., 43, 555-559 (1947).
Baxendale, P.
[1] Asymptotic behaviour of stochastic flows of diffeomorphisms; two case studies, Prob. Th. Rel. Fields, 73, 51-85 (1986).
[2] Moment stability and large deviations for linear stochastic differential equations, Proc. Taniguchi Symposium on Probabilistic Methods in Mathematical Physics, Katata and Kyoto, 1985 (ed. N. Ikeda), Kinokuniya, Tokyo, 31-54 (1986).
[3] The Lyapunov spectrum of a stochastic flow of diffeomorphisms, in Arnold and Wihstutz [1], pp. 322-337 (1986).
[4] Brownian motions on the diffeomorphism group, I, Compos. Math., 53, 19-50 (1984).
Baxendale, P. and Harris, T. E.
[1] Isotropic stochastic flows, Ann. Prob., 14, 1155-1179 (1986).
Baxendale, P. and Stroock, D. W.
[1] Large deviations and stochastic flows of diffeomorphisms, Prob. Th. Rel. Fields, 80, 169-215 (1988).
Bensoussan, A.
[1] Lectures on stochastic control, Nonlinear Filtering and Stochastic Control: Lecture Notes in Mathematics 972, Springer, Berlin, 1982, pp. 1-62.
Beneš, V. E., Shepp, L. A. and Witsenhausen, H. S.
[1] Some solvable stochastic control problems, Stochastics, 4, 39-83 (1980).
Benveniste, A. and Jacod, J.
[1] Systèmes de Lévy des processus de Markov, Invent. Math., 21, 183-198 (1973).
Berman, S. M.
[1] Local times and sample function properties of stationary Gaussian processes, Trans. Amer. Math. Soc., 137, 277-300 (1969).
[2] Harmonic analysis of local times and sample functions of Gaussian processes, Trans. Amer. Math. Soc., 143, 269-281 (1969).
[3] Gaussian processes with stationary increments: local times and sample function properties, Ann. Math. Statist., 41, 1260-1272 (1970).
Biane, P.
[1] Comparaison entre temps d'atteinte et temps de séjour de certaines diffusions réelles, Séminaire de Probabilités XIX: Lecture Notes in Mathematics 1123, Springer, Berlin, 1985, pp. 291-296.
Bichteler, K.
[1] Stochastic integration and $L^p$-theory of semi-martingales, Ann. Prob., 9, 49-89 (1981).
Bichteler, K. and Fonken, D.
[1] A simple version of the Malliavin calculus in dimension one, Martingale Theory in Harmonic Analysis and Banach Spaces: Lecture Notes in Mathematics 939, Springer, Berlin, 1982, pp. 6-12.
Bichteler, K. and Jacod, J.
[1] Calcul de Malliavin pour les diffusions avec sauts: Existence d'une densité dans le cas unidimensionnel, Séminaire de Probabilités XVII: Lecture Notes in Mathematics 986, Springer, Berlin, 1983, pp. 132-157.
Billingsley, P.
[1] Ergodic Theory and Information, Wiley, New York, 1965.
[2] Convergence of Probability Measures, Wiley, New York, 1968.
[3] Conditional distributions and tightness, Ann. Prob., 2, 480-485 (1974).
Bingham, N. H.
[1] Fluctuation theory in continuous time, Adv. Appl. Prob., 7, 705-766 (1975).
Bingham, N. H. and Doney, R. A.
[1] On higher-dimensional analogues of the arc-sine law, J. Appl. Prob., 25, 120-131 (1988).
Bishop, R. and Crittenden, R. J.
[1] Geometry of Manifolds, Academic Press, New York, 1964.
Bismut, J.-M.
[1] Mécanique Aléatoire: Lecture Notes in Mathematics 866, Springer, Berlin, 1981.
[2] Martingales, the Malliavin calculus and hypoellipticity under general Hörmander's conditions, Z. Wahrscheinlichkeitstheorie, 56, 469-505 (1981).
[3] Calcul de variations stochastiques et processus de sauts, Z. Wahrscheinlichkeitstheorie, 56, 469-505 (1983).
[4] Large deviations and the Malliavin calculus, Progress in Mathematics, Birkhäuser, Boston, 1984.
[5] The Atiyah-Singer theorems; a probabilistic approach: I, The index theorem, J. Funct. Anal., 57, 56-98 (1984); II, The Lefschetz fixed-point formulas, ibid, 329-348.
Bismut, J.-M. and Michel, D.
[1] Diffusions conditionnelles, I, II, J. Funct. Anal., 44, 174-211 (1981), 45, 274-292 (1981).
Blackwell, D. and Kendall, D. G.
[1] The Martin boundary for Polya's urn scheme and an application to stochastic population growth, J. Appl. Prob., 1, 284-296 (1964).
Blumenthal, R. M. and Getoor, R. K.
[1] Markov Processes and Potential Theory, Academic Press, New York, 1968.
[2] Local times for Markov processes, Z. Wahrscheinlichkeitstheorie verw. Geb., 3, 50-74 (1964).
Bondesson, L.
[1] Classes of infinitely divisible distributions and densities, Z. Wahrscheinlichkeitstheorie verw. Geb., 57, 39-71 (1981).
Bougerol, P. and Lacroix, J.
[1] Products of Random Matrices with Applications to Schrödinger Operators, Birkhäuser, Boston, 1985.
Bourbaki, N.
[1] Topologie générale, in Éléments de Mathématique, Hermann, Paris, 1958, Chap. IX, 2nd edition.
Breiman, L.
[1] Probability, Addison-Wesley, Reading, Mass., 1968.
Brémaud, P.
[1] Point Processes and Queues: Martingale Dynamics, Springer, New York, 1981.
Bretagnolle, J.
[1] Résultats de Kesten sur les processus à accroissements indépendants, Séminaire de Probabilités V: Lecture Notes in Mathematics 191, Springer, Berlin, 1971, pp. 21-36.
Brydges, D., Fröhlich, J. and Spencer, T.
[1] The random walk representation of classical spin systems and correlation inequalities, Comm. Math. Phys., 83, 123-150 (1982).
Burdzy, K.
[1] On nonincrease of Brownian motion, Ann. Prob., 18, 978-980 (1990).
[2] Brownian paths and cones, Ann. Prob., 13, 1006-1010 (1985).
[3] Cut points on Brownian paths, Ann. Prob., 17, 1012-1036 (1989).
Burkholder, D.
[1] Distribution function inequalities for martingales, Ann. Prob., 1, 19-42 (1973).
Carlen, E. A.
[1] Conservative diffusions, Comm. Math. Phys., 94, 293-315 (1984).
[2] Potential scattering in quantum mechanics, Ann. Inst. H. Poincaré, 42, 407-428 (1985).
Carverhill, A. P.
[1] Flows of stochastic dynamical systems: ergodic theory, Stochastics, 14, 273-318 (1985).
[2] A formula for the Lyapunov exponents of a stochastic flow. Application to a perturbation theorem, Stochastics, 14, 209-226 (1985).
[3] A nonrandom Lyapunov spectrum for nonlinear stochastic dynamical systems, Stochastics, 17, 209-226 (1986).
Carverhill, A. P., Chappell, M. J. and Elworthy, K. D.
[1] Characteristic exponents for stochastic flows, Proceedings, BiBoS I: Stochastic Processes.
Carverhill, A. P. and Elworthy, K. D.
[1] Flows of stochastic dynamical systems: the functional analytic approach, Z. Wahrscheinlichkeitstheorie, 65, 245-268 (1983).
Chaleyat-Maurel, M.
[1] La condition d'hypoellipticité d'Hörmander, Astérisque, 84-85, 189-202 (1981).
Chaleyat-Maurel, M. and El Karoui, N.
[1] Un problème de réflexion et ses applications au temps local et aux équations différentielles stochastiques sur R, cas continu. In Azéma and Yor [2], pp. 117-144.
Cheeger, J. and Ebin, D. G.
[1] Comparison Theorems in Riemannian Geometry, North-Holland, Amsterdam, 1975.
Chung, K. L.
[1] Markov Chains with Stationary Transition Probabilities, 2nd edition, Springer, Berlin, 1967.
[2] Probabilistic approach in potential theory to the equilibrium problem, Ann. Inst. Fourier, Grenoble, 23, 313-322 (1973).
[3] Excursions in Brownian motion, Ark. Mat., 14, 155-177 (1976).
Chung, K. L. and Getoor, R. K.
[1] The condenser problem, Ann. Prob., 5, 82-86 (1977).
Chung, K. L. and Walsh, J. B.
[1] To reverse a Markov process, Acta Math., 123, 225-251 (1969).
[2] Meyer's theorem on previsibility, Z. Wahrscheinlichkeitstheorie, 29, 253-256 (1974).
Chung, K. L. and Williams, R. J.
[1] Introduction to Stochastic Integration, Birkhäuser, Boston, 1983.
Ciesielski, Z.
[1] Hölder conditions for realisations of Gaussian processes, Trans. Amer. Math. Soc., 99, 403-413 (1961).
Ciesielski, Z. and Taylor, S. J.
[1] First passage times and sojourn times for Brownian motion in space and the exact Hausdorff measure of the sample path, Trans. Amer. Math. Soc., 103, 434-450 (1962).
Çinlar, E., Chung, K. L. and Getoor, R. K. (editors)
[1] Seminars on Stochastic Processes 1981, 1982, 1983, 1984 (four volumes), Birkhäuser, Boston, 1982, 1983, 1984, 1985.
Çinlar, E., Chung, K. L., Getoor, R. K. and Glover, J. (editors)
[1] Seminar on Stochastic Processes 1986, Birkhäuser, Boston, 1987.
Çinlar, E., Jacod, J., Protter, P. and Sharpe, M. J.
[1] Semimartingales and Markov processes, Z. Wahrscheinlichkeitstheorie, 54, 161-220 (1980).
Clark, J. M. C.
[1] The representation of functionals of Brownian motion by stochastic integrals, Ann.
Math. Stat., 41, 1282-1295 (1970); 42, 1778 (1971).
[2] An introduction to stochastic differential equations on manifolds, Geometric Methods in Systems Theory (eds. D. Q. Mayne and R. W. Brockett), Reidel, Dordrecht, 1973.
[3] The design of robust approximations to the stochastic differential equations of nonlinear filtering, Communications Systems and Random Process Theory (ed. J. Skwirzynski), Sijthoff and Noordhoff, Alphen aan den Rijn, 1978.
Clarkson, B. (editor)
[1] Stochastic Problems in Dynamics, Pitman, London, 1977.
Cocozza, C. and Yor, M.
[1] Démonstration simplifiée d'un théorème de Knight, Séminaire de Probabilités XIV: Lecture Notes in Mathematics 721, Springer, Berlin, 1980, pp. 496-499.
Crank, J.
[1] The Mathematics of Diffusion, 2nd ed., Oxford University Press, Oxford (1975).
Cranston, M.
[1] On the means of approach of Brownian motion, Ann. Probab., 15, 1009-1013 (1987).
Cutland, N.
[1] Non-standard measure theory and its applications, Bull. London Math. Soc., 15, 529-589 (1983).
Cutland, N. and Kendall, W. S.
[1] A non-standard proof of one of David Williams' splitting-time theorems, in D. G. Kendall [5], pp. 37-48.
Darling, R. W. R.
[1] Martingales in manifolds - definition, examples, and behaviour under maps, Séminaire de Probabilités XVI Supplément: Lecture Notes in Mathematics 921, Springer, Berlin, 1982, pp. 217-236.
Davies, E. B. and Simon, B.
[1] Ultracontractivity and the heat kernel for Schrödinger operators and Dirichlet Laplacians, J. Funct. Anal., 59, 335-395 (1984).
Davis, B.
[1] Picard's theorem and Brownian motion, Trans. Amer. Math. Soc., 213, 353-362 (1975).
[2] Applications of the conformal invariance of Brownian motion, Harmonic analysis in Euclidean Space.
Davis, M. H. A.
[1] On a multiplicative functional transformation arising in non-linear filtering theory, Z. Wahrscheinlichkeitstheorie, 54, 125-139 (1980).
[2] Pathwise non-linear filtering, Stochastic Systems: the Mathematics of Filtering and Identification and Applications (eds. M. Hazewinkel and J. C. Willems), Reidel, Dordrecht, 1981.
[3] Some current issues in stochastic control theory, Stochastics.
[4] Markov Models and Optimization, Chapman & Hall, London, 1993.
Davis, M. H. A. and Varaiya, P.
[1] Dynamic programming conditions for partially observed stochastic systems, SIAM J. Control, 11, 226-261 (1973).
Dawson, D. A.
[1] Measure-valued Markov processes, École d'Été de Probabilités de Saint-Flour XXI, 1993 (ed. P. L. Hennequin), Lecture Notes in Mathematics 1541, 1993.
Dawson, D. A. and Gärtner, J.
[1] Large deviations from the McKean-Vlasov limit for weakly-interacting diffusions, Stochastics, 20, 247-308 (1987).
Dellacherie, C.
[1] Capacités et Processus Stochastiques, Springer, Berlin, 1972.
[2] Quelques exemples familiers en probabilités d'ensembles analytiques non-Boréliens, Séminaire de Probabilités XII: Lecture Notes in Mathematics, Springer, Berlin, 1978, pp. 742-745.
[3] Un survol de la théorie de l'intégrale stochastique, Stoch. Proc. Appl., 10, 115-144 (1980).
Dellacherie, C., Doléans(-Dade), Catherine, Letta, G. and Meyer, P. A.
[1] Diffusions à coefficients continus d'après D. W. Stroock et S. R. S. Varadhan, Séminaire de Probabilités IV: Lecture Notes in Mathematics 124, Springer, Berlin, 1970, pp. 241-282.
Dellacherie, C. and Meyer, P. A.
[1] Probabilités et Potentiel, Chaps. I-IV, Hermann, Paris, 1975; Chaps. V-VIII, Hermann, Paris, 1980; Chaps. IX-XI, Hermann, Paris, 1983; Chaps. XII-XVI, Hermann, Paris, 1987; Chaps. XVII-XXIV, Hermann, Paris, 1993.
Deuschel, J.-D. and Stroock, D. W.
[1] Large Deviations, Academic Press, Boston, 1989.
De Witt-Morette, C. and Elworthy, K. D. (editors)
[1] New stochastic methods in physics, Phys. Rep., 11, 121-382 (1981).
Doléans(-Dade), C.
[1] Existence du processus croissant naturel associé à un potentiel de la classe (D), Z. Wahrscheinlichkeitstheorie, 9, 309-314 (1968).
[2] Quelques applications de la formule de changement de variables pour les semimartingales, Z. Wahrscheinlichkeitstheorie, 16, 181-194 (1970).
Doléans-Dade, C. and Meyer, P. A.
[1] Équations différentielles stochastiques, Séminaire de Probabilités XI: Lecture Notes in Mathematics 581, Springer, Berlin, 1977, pp. 376-382.
Doney, R. A.
[1] On the maxima of random walks and stable processes and the arc-sine law, Bull. London Math. Soc., 19, 177-182 (1987).
[2] A path decomposition for Lévy processes, Stoch. Proc. Appl., 47, 167-181 (1993).
Doob, J. L.
[1] Stochastic Processes, Wiley, New York, 1953.
[2] State-spaces for Markov chains, Trans. Amer. Math. Soc., 149, 279-305 (1970).
[3] Classical Potential Theory and its Probabilistic Counterpart, Springer, New York, 1981.
Doss, H.
[1] Liens entre équations différentielles stochastiques et ordinaires, Ann. Inst. Henri Poincaré B, 13, 99-126 (1977).
Dubins, L. and Schwarz, G.
[1] On continuous martingales, Proc. Natl. Acad. Sci. USA, 53, 913-916 (1965).
Dunford, N. and Schwartz, J. T.
[1] Linear Operators: Part I, General Theory, Interscience, New York, 1958.
Durrett, R.
[1] Brownian Motion and Martingales in Analysis, Wadsworth, Belmont, Calif., 1984.
[2] (editor) Particle systems, random media, large deviations, Contemp. Math. 41, Amer. Math. Soc., Providence, RI, 1985.
[3] Probability: Theory and Examples, Wadsworth & Brooks Cole, Pacific Grove, Calif., 1991.
Dvoretsky, A., Erdös, P. and Kakutani, S.
[1] Double points of paths of Brownian motion in $n$-space, Acta Sci. Math. (Szeged), 12, 64-81 (1950).
[2] Multiple points of paths of Brownian motion in the plane, Bull. Res. Council Isr. Sect. F, 3, 364-371 (1954).
[3] Points of multiplicity c of plane Brownian paths, Bull. Res. Council Isr. Sect. F, 7, 175-180 (1958).
Dvoretsky, A., Erdös, P., Kakutani, S. and Taylor, S. J.
[1] Triple points of Brownian motion in 3-space, Proc. Camb. Phil. Soc., 53, 856-862 (1957).
Dynkin, E. B.
[1] Theory of Markov Processes, Pergamon Press, Oxford, 1960.
[2] Markov Processes (two volumes), Springer, Berlin, 1965.
[3] Non-negative eigenfunctions of the Laplace-Beltrami operator and Brownian motion in certain symmetric spaces (in Russian), Dokl. Akad. Nauk SSSR, 141, 288-291 (1961).
[4] Diffusion of tensors, Dokl. Akad. Nauk SSSR, 179, 1264-1267 (1968).
[5] Local times and quantum fields, in Çinlar, Chung and Getoor [1, 1983].
[6] Gaussian and non-Gaussian random fields associated with Markov processes, J. Funct. Anal., 55, 344-376 (1984).
[7] Self-intersection local times, occupation fields and stochastic integrals (to appear in Adv. App. Math.).
[8] Random fields associated with multiple points of the Brownian motion, J. Funct. Anal., 62, 397-434 (1985).
[9] Local times and quantum fields, in Çinlar, Chung and Getoor [1, 1984].
Elliott, R. J.
[1] Stochastic Calculus and Applications, Springer, Berlin, 1982.
Elliott, R. J. and Anderson, B. D. O.
[1] Reverse time diffusions, Stochastic Processes and their Applications, 19, 327-339 (1985).
Elworthy, K. D.
[1] Stochastic Differential Equations on Manifolds, London Mathematical Society Lecture Note Series 20, Cambridge University Press, Cambridge, 1982.
[2] (editor) From Local Time to Global Geometry, Control and Physics, Proceedings, Warwick Symposium 1984/85, Longman, Harlow/Wiley, New York, 1986.
Elworthy, K. D. and Stroock, D. W.
[1] Large deviation theory for mean exponents of stochastic flows, Appendix to Carverhill, Chappell and Elworthy [1].
Elworthy, K. D. and Truman, A.
[1] Classical mechanics, the diffusion (heat) equation and the Schrödinger equation on a Riemannian manifold, J. Math. Phys., 22, 2144-2166 (1981).
[2] The diffusion equation and classical mechanics: an elementary formula, Stochastic processes in quantum theory and statistical physics (ed. S. Albeverio et al.), Lecture Notes in Physics 173, Springer, Berlin, 1982, pp. 136-146.
Emery, M.
[1] Annonçabilité des temps prévisibles: deux contre-exemples, Séminaire de Probabilités XIV: Lecture Notes in Mathematics 784, Springer, Berlin, 1980, pp. 318-323.
[2] On the Azéma martingales, Séminaire de Probabilités XXIII: Lecture Notes in Mathematics 1372, Springer, Berlin, 1989, pp. 66-88.
Ethier, S. N. and Kurtz, T. G.
[1] Markov Processes: Characterization and Convergence, Wiley, New York, 1986.
Evans, S. N.
[1] On the Hausdorff dimension of Brownian cone points, Math. Proc. Camb. Phil. Soc., 98, 343-353 (1985).
[2] Multiple points in the sample paths of a Lévy process, Prob. Th. Rel. Fields, 76, 359-367 (1987).
Feller, W.
[1] Introduction to Probability Theory and its Applications, Vol. 1, 2nd edition, Wiley, New York, 1957; Vol. 2, Wiley, New York, 1966.
[2] Boundaries induced by non-negative matrices, Trans. Amer. Math. Soc., 83, 19-54 (1956).
[3] On boundaries and lateral conditions for the Kolmogorov equations, Ann. Math., Ser. II, 65, 527-570 (1957).
[4] Generalized second-order differential operators and their lateral conditions, Illinois J. Math., 1, 459-504 (1957).
Fleming, W. H. and Rishel, R. W.
[1] Deterministic and Stochastic Optimal Control, Springer, Berlin, 1975.
Föllmer, H.
[1] Calcul d'Itô sans probabilités, Séminaire de Probabilités XV: Lecture Notes in Mathematics 850, Springer, Berlin, 1981, pp. 143-150.
Freedman, D.
[1] Brownian Motion and Diffusion, Holden-Day, San Francisco, 1971.
[2] Approximating Countable Markov Chains, Holden-Day, San Francisco, 1972.
Friedman, A.
[1] Stochastic Differential Equations and Applications (two volumes), Academic Press, New York, 1975.
Fristedt, B.
[1] Sample functions of stochastic processes with stationary independent increments, Adv. Prob., 3, 241-396 (1973).
Fujisaki, M., Kallianpur, G. and Kunita, H.
[1] Stochastic differential equations for the non-linear filtering problem, Osaka J. Math., 9, 19-40 (1972).
Fukushima, M.
[1] Dirichlet Forms and Markov Processes, Kodansha, Tokyo, 1980.
[2] Basic properties of Brownian motion and a capacity on the Wiener space, J. Math. Soc. Japan, 36, 161-176 (1984).
Garcia Alvarez, M. A. and Meyer, P. A.
[1] Une théorie de la dualité à un ensemble polaire près: I, Ann. Prob., 1, 207-222 (1973).
Garsia, A.
[1] Martingale Inequalities: Seminar Notes on Recent Progress, Benjamin, Reading, Mass., 1973.
Garsia, A., Rodemich, E. and Rumsey, H. Jr
[1] A real variable lemma and the continuity of paths of some Gaussian processes, Indiana Univ. Math. J., 20, 565-578 (1970).
Geman, D. and Horowitz, J.
[1] Occupation densities, Ann. Prob., 8, 1-67 (1980).
Geman, D., Horowitz, J. and Rosen, J.
[1] A local time analysis of intersections of Brownian paths in the plane, Ann. Prob., 12, 86-107 (1984).
Getoor, R. K.
[1] Markov Processes: Ray Processes and Right Processes: Lecture Notes in Mathematics 440, Springer, Berlin, 1975.
[2] Excursions of a Markov process, Ann. Prob., 8, 244-266 (1979).
[3] Splitting times and shift functionals, Z. Wahrscheinlichkeitstheorie, 47, 69-81 (1979).
Getoor, R. K. and Sharpe, M. J.
[1] Last exit times and additive functionals, Ann. Prob., 1, 550-569 (1973).
[2] Excursions of Brownian motion and Bessel processes, Z. Wahrscheinlichkeitstheorie, 47, 83-106 (1979).
[3] Last exit decompositions and distributions, Indiana Univ. Math. J., 23, 377-404 (1973).
[4] Excursions of dual processes, Adv. Math., 45, 259-309 (1982).
[5] Conformal martingales, Invent. Math., 16, 271-308 (1972).
Gikhman, I. I. and Skorokhod, A. V.
[1] The Theory of Stochastic Processes (three volumes), Springer, Berlin, 1979.
Gray, A., Karp, L. and Pinsky, M. A.
[1] The mean exit time from a tube in a Riemannian manifold, Probability and Harmonic Analysis (eds. J. Chao and W. Woyczynski), Dekker, 1986, pp. 113-137.
Gray, A. and Pinsky, M. A.
[1] The mean exit time from a small geodesic ball in a Riemannian manifold, Bull. Sci. Math., 107, 345-370 (1983).
Greenwood, P. and Perkins, E.
[1] A conditional limit theorem for random walk and Brownian local time on square root boundaries, Ann. Prob., 11, 227-261 (1982).
[2] Limit theorems for excursions from a moving boundary, Th. Prob. Appl., 29, 703-714 (1984).
Greenwood, P. and Pitman, J. W.
[1] Construction of local time and Poisson point processes from nested arrays, J. London Math. Soc. (2), 22, 182-192 (1980).
[2] Fluctuation identities for Lévy processes and splitting at the maximum, Adv. Appl. Prob., 12, 893-902 (1980).
Grenander, U.
[1] Probabilities on Algebraic Structures, Wiley, New York, 1963.
Griffeath, D.
[1] Coupling methods for Markov processes, Advances in Mathematics Supplementary Studies: Studies in Probability and Ergodic Theory, Vol. 2, Academic Press, New York, 1978, pp. 1-43.
Gromov, M. and Rohlin, V. A.
[1] Russian Math. Surveys, 25, 1-57 (1970).
Grosswald, E.
[1] The Student $t$-distribution of any degree of freedom is infinitely divisible, Z. Wahrscheinlichkeitstheorie verw. Geb., 36, 103-109 (1976).
Halmos, P.
[1] Measure Theory, Van Nostrand, Princeton, NJ, 1959.
Harris, T. E.
[1] Brownian motions on the homeomorphisms of the plane, Ann. Prob., 9, 232-254 (1981).
Haussmann, U.
[1] On the integral representation of Itô processes, Stochastics, 3, 17-7 (1979).
[2] A Stochastic Maximum Principle for Optimal Control of Diffusions, Longman, Harlow, 1986.
Hawkes, J.
[1] Multiple points for symmetric Lévy processes, Math. Proc. Camb. Phil. Soc., 83, 83-90 (1978).
[2] The measure of the range of a subordinator, Bull. London Math. Soc., 5, 21-28 (1973).
[3] Local times as stationary processes, From Local to Global Geometry, Control and Physics, Research Notes in Math. 150, Pitman, Harlow, 1986, pp. 111-120. Hazewinkel, M. and Willems, J. C. (editors) [1] Stochastic Systems: The Mathematics of Filtering and Identification and Applications, Reidel, Dordrecht, 1981. Helgason, S. [1] Differential Geometry and Symmetric Spaces, Academic Press, New York, 1962. Helms, L. L. [1] Introduction to Potential Theory, Robert E. Krieger, Huntington, NY, 1975.
Hille, E. and Phillips, R. S. [1] Functional Analysis and Semigroups, Amer. Math. Soc., Providence, RI, 1957. Holley, R., Stroock, D. W. and Williams, D. [1] Applications of dual processes to diffusion theory, Proc. Amer. Math. Soc. Prob. Symp., Urbana, 1976, pp. 23-36. Hörmander, L. [1] Hypoelliptic second-order differential equations, Acta Math., 117, 147-171 (1967). Hsu, P. [1] On excursions of reflecting Brownian motion, Trans. Amer. Math. Soc., 296, 239-264 (1986). [2] Brownian motion and the index theorem (to appear). Hunt, G. A. [1] Markoff processes and potentials: I, II, III, Illinois J. Math., 1, 44-93; 316-369 (1957); 2, 151-213 (1958). Ikeda, N. and Watanabe, S. [1] Stochastic Differential Equations and Diffusion Processes, North Holland-Kodansha, Amsterdam and Tokyo, 1981. [2] Malliavin calculus of Wiener functionals and its applications, in Elworthy [2], pp. 132-178. Ismail, M. E. and Kelker, D. H. [1] The Bessel polynomials and the Student t-distribution, SIAM J. Math. Anal., 7, 82-91 (1976). Itô, K. [1] Stochastic integral, Proc. Imp. Acad. Tokyo, 20, 519-524 (1944). [2] On a stochastic integral equation, Proc. Imp. Acad. Tokyo, 22, 32-35 (1946). [3] Stochastic differential equations in a differential manifold, Nagoya Math. J., 1, 35-47 (1950). [4] The Brownian motion and tensor fields on a Riemannian manifold, Proc. Int. Congr. Math., Stockholm, 1963, pp. 536-539. [5] Stochastic parallel displacement, Probabilistic Methods in Differential Equations: Lecture Notes in Mathematics 451, Springer, Berlin, 1975, pp. 1-7. [6] Poisson point processes attached to Markov processes, Proc. 6th Berkeley Symp. Math. Statist. Prob., Vol. 3, University of California Press, Berkeley, 1971, pp. 225-240. [7] (editor) Proceedings of the 1982 Taniguchi Int. Symp. on Stochastic Analysis, Kinokuniya-Wiley, 1984. [8] Stationary random distributions, Mem. Coll. Sci. Kyoto Univ. Ser. A, 28, 209-223 (1954). Itô, K. and McKean, H. P.
[1] Diffusion Processes and their Sample Paths, Springer, Berlin, 1965. Jacka, S. [1] A finite fuel stochastic control problem, Stochastics, 10, 103-113 (1983). [2] A local time inequality for martingales, Seminaire de Probabilites XVII: Lecture Notes in Mathematics 986, Springer, Berlin, 1983. Jacobsen, M. [1] Splitting times for Markov processes and a generalised Markov property for diffusions, Z. Wahrscheinlichkeitstheorie, 30, 27-43 (1974). [2] Statistical Analysis of Counting Processes: Lecture Notes in Mathematics 12, Springer, New York, 1982. Jacod, J. [1] A general theorem of representation for martingales, Proc. Amer. Math. Soc. Prob. Symp., Urbana, 1976, pp. 37-53.
[2] Calcul Stochastique et Problemes de Martingales: Lecture Notes in Mathematics 714, Springer, Berlin, 1979. Jacod, J. and Yor, M. [1] Etude des solutions extremales et representation integrale des solutions pour certains problemes de martingales, Z. Wahrscheinlichkeitstheorie, 38, 83-125 (1977). Jeulin, T. [1] Semimartingales et Grossissement d'une Filtration: Lecture Notes in Mathematics 833, Springer, Berlin, 1980. Jeulin, T. and Yor, M. [1] Grossissement d'une filtration et semi-martingales: formules explicites, Seminaire de Probabilites XII: Lecture Notes in Mathematics 649, Springer, Berlin, 1978, pp. 78-97. [2] (editors) Grossissements de Filtrations: Exemples et Applications: Lecture Notes in Mathematics 1118, Springer, Berlin, 1985. Johnson, G. and Helms, L. L. [1] Class (D) supermartingales, Bull. Amer. Math. Soc., 69, 59-62 (1963). Kailath, T. [1] An innovations approach to least squares estimation, Part I: Linear filtering with additive white noise, IEEE Trans. Autom. Control, 13, 646-655 (1968). Kallianpur, G. [1] Stochastic Filtering Theory, Springer, Berlin, 1980. Karatzas, I. and Shreve, S. E. [1] Brownian Motion and Stochastic Calculus, Springer, Berlin, 1988. Kellogg, O. D. [1] Foundations of Potential Theory, Dover, New York, 1953. Kendall, D. G. [1] Pole-seeking Brownian motion and bird navigation (with discussion), J. Roy. Statist. Soc. B, 36, 365-417 (1974). [2] The diffusion of shape, Adv. Appl. Prob., 9, 428-430 (1979). [3] Shape manifolds, Procrustean metrics, and complex projective spaces, Bull. London Math. Soc., 16, 81-121 (1984). [4] A totally unstable Markov process, Quart. J. Math. Oxford, 9, 149-160 (1958). [5] (editor) Analytic and Geometric Stochastics (special supplement to Adv. Appl. Prob. to honour G. E. H. Reuter), Appl. Prob. Trust, 1986. Kendall, D. G. and Reuter, G. E. H.
[1] Some pathological Markov processes with a denumerable infinity of states and the associated contraction semigroups of operators on l, Proc. Int. Congr. Math. 1954 (Amsterdam), 3, 377-415 (1956). Kendall, W. S. [1] Knotting of Brownian motion in 3-space, J. London Math. Soc. (2), 19, 378-384 (1979). [2] Brownian motion, negative curvature, and harmonic maps, Stochastic Integrals: Lecture Notes in Mathematics 851, Springer, Berlin, 1981, pp. 479-491. [3] Brownian motion on a surface of negative curvature, Seminaire de Probabilites XVIII: Lecture Notes in Mathematics 1059, Springer, Berlin, 1984, pp. 70-76. [4] Survey article on stochastic differential geometry (to appear). Kent, J. [1] Some probabilistic properties of Bessel functions, Ann. Prob., 6, 760-770 (1978). [2] The infinite divisibility of the von Mises-Fisher distribution for all values of the parameter in all dimensions, Proc. London Math. Soc., 3, 359-384 (1977). [3] Continuity properties for random fields, Ann. Prob., 17, 1432-1440 (1989).
Kesten, H. [1] Hitting probabilities of single points for processes with stationary independent increments, Mem. Amer. Math. Soc., 93 (1969). Khasminskii, R. Z. [1] Ergodic properties of recurrent diffusion processes and stabilization of the solution of the Cauchy problem for parabolic equations, Th. Prob. Appl., 5, 179-196 (1960). [2] Stochastic Stability of Differential Equations, Sijthoff and Noordhoff, Alphen aan den Rijn, 1980. Kifer, Y. [1] Brownian motion and positive harmonic functions on complete manifolds of non-positive curvature, in Elworthy [2], pp. 187-232. Kingman, J. F. C. [1] Subadditive ergodic theory, Ann. Prob., 1, 883-909 (1973). [2] Completely random measures, Pacific J. Math., 21, 59-78 (1967). [3] Regenerative Phenomena, Wiley, New York, 1972. [4] Poisson Processes, Oxford University Press, Oxford, 1993. Knight, F. B. [1] Note on regularisation of Markov processes, Illinois J. Math., 9, 548-552 (1965). [2] A reduction of continuous square-integrable martingales to Brownian motion, Martingales: A Report on a Meeting at Oberwolfach (ed. H. Dinges): Lecture Notes in Mathematics 190, Springer, Berlin, 1971, pp. 19-31. [3] Random walks and the sojourn density process of Brownian motion, Trans. Amer. Math. Soc., 107, 56-86 (1963). Knight, F. B. and Pittenger, A. O. [1] Excision of a strong Markov process, Z. Wahrscheinlichkeitstheorie, 23, 114-120 (1972). Kobayashi, S. and Nomizu, K. [1] Foundations of Differential Geometry (two volumes), Wiley-Interscience, New York, 1963, 1969. Kolmogorov, A. N. [1] The local structure of turbulence in an incompressible fluid at very large Reynolds numbers, Dokl. Akad. Nauk SSSR, 30, 229-303 (1941). [2] The distribution of energy in locally isotropic turbulence, Dokl. Akad. Nauk SSSR, 32, 19-21 (1941). Kozin, F. and Prodromou, S. [1] Necessary and sufficient conditions for almost sure sample stability of linear Ito equations, SIAM J. Appl. Math., 21, 413-425 (1971). Krylov, N. V.
[1] Controlled Diffusion Processes, Springer, New York, 1980. Kuelbs, J. [1] The law of the iterated logarithm for Banach space valued random variables, Probability in Banach Spaces: Lecture Notes in Mathematics 526, Springer, Berlin, 1976, pp. 131-142. Kunita, H. [1] On the decomposition of the solutions of stochastic differential equations, Stochastic Integrals: Lecture Notes in Mathematics 851, Springer, Berlin, 1981, pp. 213-255. [2] On backward stochastic differential equations, Stochastics, 6, 293-313 (1982). [3] Stochastic differential equations and stochastic flows of homeomorphisms. [4] Stochastic partial differential equations connected with nonlinear filtering, in Mitter and Moro [1].
[5] Stochastic Flows and Stochastic Differential Equations, Cambridge University Press, Cambridge, 1990. Kunita, H. and Watanabe, S. [1] On square integrable martingales, Nagoya Math. J., 30, 209-245 (1967). Kunita, H. and Watanabe, T. [1] Some theorems concerning resolvents over locally compact spaces, Proc. 5th Berkeley Symp. Math. Statist. Prob., Vol. 2, Part 2, University of California Press, Berkeley, 1967, pp. 131-164. [2] Markov processes and Martin boundaries, I, Illinois J. Math., 9, 485-526 (1965). [3] On certain reversed processes and their application to potential theory and boundary theory, J. Math. Mech., 15, 393-434 (1966). Kusuoka, S. and Stroock, D. [1] Applications of the Malliavin calculus, Part I, Proceedings of the 1982 Taniguchi Int. Symp. on Stochastic Analysis (ed. K. Ito), Kinokuniya-Wiley, 1984, pp. 271-306. [2] Applications of the Malliavin calculus, Part II, J. Fac. Sci. Univ. Tokyo (IA), 32, 1-76 (1985). le Gall, J.-F. [1] Applications du temps local aux equations differentielles stochastiques unidimensionelles, Seminaire de Probabilites XVII: Lecture Notes in Mathematics 986, Springer, Berlin, 1983, pp. 15-31. [2] Sur la saucisse de Wiener et les points multiples du mouvement Brownien plan et la methode de renormalization de Varadhan, Seminaire de Probabilites XIX: Lecture Notes in Mathematics 1123, Springer, Berlin, 1985, pp. 314-331. [4] Fluctuation results for the Wiener sausage, Ann. Prob., 16, 991-1018 (1988). [5] The exact Hausdorff measure of Brownian multiple points, in Cinlar, Chung, and Getoor and Glover [1], pp. 107-137. [6] Planar Brownian motion, cones and stable processes, C. R. Acad. Sci. Paris Ser. I, 302, 641-643 (1986). [7] Une approche elementaire des theoremes de decomposition de Williams, Seminaire de Probabilites XX: Lecture Notes in Mathematics 1204, Springer, Berlin, 1986, pp. 447-464. le Gall, J.-F., Rosen, J. and Shieh, N. R. [1] Multiple points of Levy processes, Ann.
Prob., 17, 503-515 (1989). le Gall, J.-F. and Yor, M. [1] Etude asymptotique de certains mouvements browniens complexes avec drift, Prob. Th. Rel. Fields, 71, 183-229 (1986). [2] Etude asymptotique des enlacements du mouvement brownien autour des droites de l'espace, Prob. Th. Rel. Fields, 74, 617-635 (1987). le Jan, Y. [1] Flots de diffusion dans R^d, C. R. Acad. Sci. Paris Ser. I, 294, 697-699 (1982). [2] Equilibre et exposants de Lyapounov de certains flots Browniens, C. R. Acad. Sci. Paris Ser. I, 298, 361-364 (1984). [3] Exposants de Lyapounov pour les mouvements Browniens isotropes, C. R. Acad. Sci. Paris Ser. I, 299, 947-949 (1984). [4] On isotropic Brownian motions, Z. Wahrscheinlichkeitstheorie verw. Geb., 70, 609-620 (1985). le Jan, Y. and Watanabe, S. [1] Stochastic flows of diffeomorphisms, Proceedings of the 1982 Taniguchi Int. Symp. on Stochastic Analysis, 1984, pp. 307-332. Lenglart, E., Lepingle, D. and Pratelli, M. [1] Presentation unifiee de certaines inegalites de la theorie des martingales, Seminaire
de Probabilites XIV: Lecture Notes in Mathematics 784, Springer, Berlin, 1980. Levy, P. [1] Theorie de l'Addition des Variables Aleatoires, Gauthier Villars, Paris, 1954. [2] Processus Stochastiques et Mouvement Brownien, Gauthier Villars, Paris, 1965. [3] Systemes markoviens et stationnaires. Cas denombrable, Ann. Ecole Norm. Sup. (3), 68, 327-381 (1951); 69, 203-212 (1952). [4] Processus markoviens et stationnaires du cinquieme type (infinite denombrable des etats possibles, parametre continu), C. R. Acad. Sci. Paris, 236, 1630-1632 (1953). [5] Processus markoviens et stationnaires. Cas denombrable, Ann. Inst. H. Poincare, 16, 7-25 (1958). Lewis, J. T. [1] Brownian motion on a submanifold of Euclidean space, Bull. London Math. Soc., 18, 616-620 (1986). Liggett, T. [1] Interacting Particle Systems, Springer, New York, 1985. Lindvall, T. [1] On coupling of diffusion processes, J. Appl. Prob., 20, 82-93 (1983). Liptser, R. S. and Shiryayev, A. N. [1] Statistics of Random Processes, I, Springer, Berlin, 1977. London, R. R., McKean, H. P., Rogers, L. C. G. and Williams, D. [1] A martingale approach to some Wiener-Hopf problems, I, Seminaire de Probabilites XVI: Lecture Notes in Mathematics 920, Springer, Berlin, 1982, pp. 41-67. Lyons, T. J. [1] Finely holomorphic functions, J. Funct. Anal., 37, 1-18 (1980). [2] Instability of the Liouville property for quasi-isometric Riemannian manifolds and reversible Markov chains, J. Diff. Geom., 26, 33-66 (1987). [3] The critical dimension at which quasi-every path is self-avoiding, in D. G. Kendall [5], pp. 87-100. Lyons, T. J. and McKean, H. P. [1] Windings of the plane Brownian motion, Adv. Math., 51, 212-225 (1984). McGill, P. [1] Calculation of some conditional excursion formulae, Z. Wahrscheinlichkeitstheorie, 61, 255-260 (1982). [2] Markov properties of diffusion local time: a martingale approach, Adv. Appl. Prob., 14, 789-810 (1980).
[3] Integral representation of martingales in the Brownian excursion filtration, Seminaire de Probabilites XX: Lecture Notes in Mathematics 1204, Springer, Berlin, 1986, pp. 465-502. McKean, H. P. [1] Stochastic Integrals, Academic Press, New York, 1969. [2] Excursions of a non-singular diffusion, Z. Wahrscheinlichkeitstheorie, 1, 230-239 (1963). [3] Brownian local times, Adv. Math., 16, 91-111 (1975). [4] Brownian motion with a several-dimensional time, Teor. Veroyatnost., 4(4), 357-378 (1963). McNamara, J. M. [1] A regularity condition on the transition probability measure of a diffusion process, Stochastics, 15, 161-182 (1985). Maisonneuve, B. [1] Systemes regeneratifs, Asterisque, Soc. Mathematique de France, 15 (1974).
Maisonneuve, B. and Meyer, P.-A. [1] Ensembles aleatoires markoviens homogenes, Seminaire de Probabilites VIII: Lecture Notes in Mathematics 381, Springer, Berlin, 1974, pp. 172-261. Malliavin, M. P. and Malliavin, P. [1] Factorisations, et lois limites de la diffusion horizontale au dessus d'un espace riemannien symmetrique, Lecture Notes in Mathematics 404, Springer, Berlin, 1974, pp. 166-217. Malliavin, P. [1] Stochastic calculus of variation and hypo-elliptic operators, Proc. Int. Symp. Stoch. Diff. Equations, Kyoto, 1976 (ed. K. Ito), Kinokuniya-Wiley, 1978, pp. 195-263. [2] C*-hypoellipticity with degeneracy, Stochastic Analysis (eds. A. Friedman and M. Pinsky), Academic Press, New York, 1978, pp. 199-214. [3] Formule de la moyenne, calcul de perturbations et theoremes d'annulation pour les formes harmoniques, J. Funct. Anal., 17, 274-291 (1974). Marcus, M. B. and Rosen, J. [1] Sample path properties of the local times of strongly symmetric Markov processes via Gaussian processes, Ann. Prob., 20, 1603-1684 (1992). Mandl, P. [1] Analytic Treatment of One-Dimensional Markov Processes, Springer, Berlin, 1968. Meleard, S. [1] Application du calcul stochastique a l'etude de processus de Markov reguliers sur [0,1], Stochastics, 19, 41-82 (1986). Messulam, P. and Yor, M. [1] On D. Williams' 'pinching method' and some applications, J. London Math. Soc., 26, 348-364 (1982). Metivier, M. and Pellaumail, J. [1] Stochastic Integration, Academic Press, New York, 1979. Meyer, P. A. [1] Un cours sur les integrales stochastiques, Seminaire de Probabilites X: Lecture Notes in Mathematics 511, Springer, Berlin, 1976, pp. 245-400. [2] Probability and Potential, Blaisdell, Waltham, Mass., 1966. [3] Processus de Markov: Lecture Notes in Mathematics 26, Springer, Berlin, 1967. [4] Processus de Markov: la Frontiere de Martin: Lecture Notes in Mathematics 77, Springer, Berlin, 1970.
[5] Demonstration simplifiee d'un theoreme de Knight, Seminaire de Probabilites V: Lecture Notes in Mathematics 191, Springer, Berlin, 1971, pp. 191-195. [6] Demonstration probabiliste de certaines inegalites de Littlewood-Paley, Seminaire de Probabilites X: Lecture Notes in Mathematics 511, Springer, Berlin, 1976, pp. 125-183. [7] Flot d'une equation differentielle stochastique, Seminaire de Probabilites XV: Lecture Notes in Mathematics 850, Springer, Berlin, 1981, pp. 103-117. [8] Sur la demonstration de previsibilite de Chung and Walsh, Seminaire de Probabilites IX: Lecture Notes in Mathematics 465, Springer, Berlin, 1975, pp. 530-533. [9] Geometrie stochastique sans larmes, Seminaire de Probabilites XV: Lecture Notes in Mathematics 850, Springer, Berlin, 1981, pp. 44-102. [10] Geometrie stochastique sans larmes (bis), Seminaire de Probabilites XVI: Supplement, Lecture Notes in Mathematics 921, Springer, Berlin, 1982, pp. 165-207. [11] Elements de probabilites quantiques, Seminaire de Probabilites XX: Lecture Notes in Mathematics 1204, Springer, Berlin, 1986, pp. 186-312.
[12] Quantum Theory for Probabilists, Lecture Notes in Mathematics 1538, Springer, Berlin, 1993. Mihlstein, G. N. [1] Approximate integration of stochastic differential equations, Th. Prob. Appl., 19, 557-562 (1974). Millar, P. W. [1] Random times and decomposition theorems, in Probability: Proc. Symp. Pure Math. XXXI, Amer. Math. Soc., Providence, RI, 1977, pp. 91-103. [2] A path decomposition for Markov processes, Ann. Prob., 6, 345-348 (1978). Millar, P. W. and Tran, L. T. [1] Unbounded local times, Z. Wahrscheinlichkeitstheorie verw. Geb., 30, 87-92 (1974). Mitro, J. [1] Dual Markov processes: construction of a useful auxiliary process, Z. Wahrscheinlichkeitstheorie, 47, 139-156 (1979). [2] Dual Markov functions: applications of a useful auxiliary process, Z. Wahrscheinlichkeitstheorie, 48, 97-114 (1979). Mitter, S. K. [1] Lectures on non-linear filtering and stochastic control, in Mitter and Moro [1], pp. 170-207. Mitter, S. K. and Moro, A. (editors) [1] Non-linear Filtering and Stochastic Control: Lecture Notes in Mathematics 972, Springer, Berlin, 1982. Motoo, M. [1] Application of additive functionals to the boundary problem of Markov processes (Levy's system of U-processes), Proc. 5th Berkeley Symp. Math. Statist. Prob., Vol. 2, Part 2, Univ. of California Press, Berkeley, 1967, pp. 75-110. [2] Proof of the law of iterated logarithm through diffusion equation, Ann. Inst. Statist. Math., 10, 21-28 (1959). Motoo, M. and Watanabe, S. [1] On a class of additive functionals of Markov processes, J. Math. Kyoto Univ., 4, 429-469 (1965). Nakao, S. [1] On the pathwise uniqueness of solutions of one-dimensional stochastic differential equations, Osaka J. Math., 9, 513-518 (1972). Nash, J. F. [1] The imbedding problem for Riemannian manifolds, Ann. Math., 63, 20-63 (1956). Nelson, E. [1] Dynamical Theories of Brownian Motion, Princeton University Press, 1967. [2] Quantum Fluctuations, Princeton University Press, 1984. Neveu, J.
[1] Bases Mathematiques du Calcul des Probabilites, Masson, Paris, 1964. [2] Sur les etats d'entree et les etats fictifs d'un processus de Markov, Ann. Inst. Henri Poincare, 17, 323-337 (1962). [3] Lattice methods and submarkovian processes, Proc. 4th Berkeley Symp. Math. Statist. Prob., Vol. 2, University of California Press, Berkeley, 1960, pp. 347-391. [4] Une generalisation des processus a accroissements positifs independants, Abh. Math. Sem. Univ. Hamburg, 25, 36-61 (1961). [5] Entrance, exit and fictitious states for Markov chains, Proc. Aarhus Colloq. Combin. Prob., 1962, pp. 64-68.
Norris, J. R. [1] Simplified Malliavin calculus, Seminaire de Probabilites XX: Lecture Notes in Mathematics 1204, Springer, Berlin, 1986, pp. 101-130. Norris, J. R., Rogers, L. C. G. and Williams, D. [1] Brownian motion of ellipsoids, Trans. Amer. Math. Soc., 294, 757-765 (1986). [2] Self-avoiding random walk: a Brownian motion model with local time drift, Prob. Th. Rel. Fields, 74, 271-287 (1987). Ocone, D. [1] Malliavin's calculus and stochastic integral representations of functionals of diffusion processes, Stochastics, 12, 161-185 (1984). Orihara, A. [1] On random ellipsoid, J. Fac. Sci. Univ. Tokyo, Sect. IA Math., 17, 73-85 (1970). Pardoux, E. [1] Stochastic differential equations and filtering of diffusion processes, Stochastics, 3, 127-167 (1979). [2] Grossissement d'une filtration et retournement du temps d'une diffusion, Seminaire de Probabilites XX: Lecture Notes in Mathematics 1204, Springer, Berlin, 1986, pp. 48-55. [3] Equations of non-linear filtering, and applications to stochastic control with partial observations, in Mitter and Moro [1], pp. 208-248. Pardoux, E. and Talay, D. [1] Discretization and simulation of stochastic differential equations (to appear in Acta Appl. Math.). Parthasarathy, K. R. [1] Probability Measures on Metric Spaces, Academic Press, New York, 1967. Pauwels, E. and Rogers, L. C. G. [1] Skew-product decompositions of Brownian motions, Contemp. Math., 73, 237-262 (1988). Perkins, E. [1] Local time and pathwise uniqueness for stochastic differential equations, Seminaire de Probabilites XVI: Lecture Notes in Mathematics 920, Springer, Berlin, 1982, pp. 201-208. [2] Local time is a semimartingale, Z. Wahrscheinlichkeitstheorie, 60, 79-117 (1982). Phelps, R. R. [1] Lectures on Choquet's Theorem, Van Nostrand, Princeton, NJ, 1966. Pinsky, M. A. [1] Homogenization and stochastic parallel displacement, in Williams [13], pp. 271-284.
[2] Stochastic Riemannian geometry, Probabilistic Analysis and Related Topics, 1 (ed. A. T. Bharucha-Reid), Academic Press, New York, 1978. Pitman, J. W. [1] One-dimensional Brownian motion and the three-dimensional Bessel process, J. Appl. Prob., 1, 511-526 (1975). [2] Path decomposition for conditional Brownian motion, Inst. Math. Statist. Univ. Copenhagen, Preprint No. 11 (1974). [3] Levy systems and path decompositions, in Cinlar, Chung and Getoor [1, 1981]. Pitman, J. W. and Yor, M. [1] Bessel processes and infinitely divisible laws, Stochastic Integrals (ed. D. Williams), Lecture Notes in Mathematics 851, Springer, Berlin, 1981, pp. 285-370. [2] A decomposition of Bessel bridges, Z. Wahrscheinlichkeitstheorie, 59, 425-457 (1982).
[3] The asymptotic joint distribution of windings of planar Brownian motion, Bull. Amer. Math. Soc., 10, 109-111 (1984). [4] Asymptotic laws of planar Brownian motion, Ann. Prob., 14, 733-779 (1986). Pittenger, A. O. and Shih, C. T. [1] Coterminal families and the strong Markov property, Trans. Amer. Math. Soc., 182, 1-42 (1973). Poor, W. A. [1] Differential Geometric Structures, McGraw-Hill, New York, 1981. Port, S. C. and Stone, C. J. [1] Classical potential theory and Brownian motion, Proc. 6th Berkeley Symp. Math. Statist. Prob., Vol. 3, University of California Press, Berkeley, 1972, pp. 143-176. [2] Logarithmic potentials and planar Brownian motion, Proc. 6th Berkeley Symp. Math. Statist. Prob., Vol. 3, University of California Press, Berkeley, 1972, pp. 177-192. [3] Brownian Motion and Classical Potential Theory, Academic Press, New York, 1978. Price, G. C. and Williams, D. [1] Rolling with 'slipping': I, Seminaire de Probabilites XVII: Lecture Notes in Mathematics 986, Springer, Berlin, 1983, pp. 194-297. Prohorov, Yu. V. [1] Convergence of random processes and limit theorems in probability, Th. Prob. Appl., 1, 157-214 (1956). Protter, P. [1] On the existence, uniqueness, convergence and explosions of solutions of stochastic differential equations, Ann. Prob., 5, 243-261 (1977). Rao, K. M. [1] On decomposition theorems of Meyer, Math. Scand., 24, 66-78 (1969). [2] Quasimartingales, Math. Scand., 24, 79-92 (1969). Ray, D. B. [1] Resolvents, transition functions and strongly Markovian processes, Ann. Math., 70, 43-72 (1959). [2] Sojourn times of a diffusion process, Illinois J. Math., 7, 615-630 (1963). Reuter, G. E. H. [1] Denumerable Markov processes, II, J. London Math. Soc., 34, 81-91 (1959). Revuz, D. [1] The Martin boundary of a recurrent random walk has one or two points, Probability: Proc. Symp. Pure Math. XXXI, Amer. Math. Soc., Providence, RI, 1977, pp. 125-130. Revuz, D. and Yor, M.
[1] Continuous Martingales and Brownian Motion, Springer, Berlin, 1991. Rogers, L. C. G. [1] Williams' characterization of the Brownian excursion law: proof and applications, Seminaire de Probabilites XV: Lecture Notes in Mathematics 850, Springer, Berlin, 1981, pp. 227-250. [2] Ito excursion theory via resolvents, Z. Wahrscheinlichkeitstheorie, 63, 237-255 (1983). [3] Smooth transition densities for one-dimensional diffusions, Bull. London Math. Soc., 17, 157-161 (1985). [4] Continuity of martingales in the Brownian excursion filtration, Prob. Th. Rel. Fields, 16, 291-298 (1987). [5] Multiple points of Markov processes in a complete metric space, Seminaire de Probabilites XXIII: Lecture Notes in Mathematics 1372, Springer, Berlin, 1989, pp. 186-197. [6] A new identity for real Levy processes, Ann. Inst. Henri Poincare, 20, 21-34 (1984).
Rogers, L. C. G. and Pitman, J. W. [1] Markov functions, Ann. Prob., 9, 573-582 (1981). Rogers, L. C. G. and Williams, D. [1] Diffusions, Markov Processes, and Martingales: Volume 2: Itô Calculus, Wiley, Chichester, 1987. [2] Construction and approximation of transition matrix functions, in D. G. Kendall [5], pp. 133-160. Rogozin, B. A. [1] On the distribution of functionals related to boundary problems for processes with independent increments, Th. Prob. Appl., 11, 580-591 (1966). Rosen, J. [1] A local time approach to self-intersections of Brownian paths in space, Comm. Math. Phys., 88, 327-338 (1983). Schwartz, L. [1] Geometrie differentielle du 2ieme ordre, semimartingales et equations differentielles stochastiques sur une variete differentielle, Seminaire de Probabilites XVI, Supplement: Lecture Notes in Mathematics 921, Springer, Berlin, 1982, pp. 1-148. Sharpe, M. J. [1] General Theory of Markov Processes, Academic Press, New York, 1988. Sheppard, P. [1] On the Ray-Knight property of local times, J. London Math. Soc., 31, 377-384 (1985). Shiga, T. and Watanabe, S. [1] Bessel diffusions as a one-parameter family of diffusion processes, Z. Wahrscheinlichkeitstheorie, 27, 37-46 (1973). Shigekawa, I. [1] Derivatives of Wiener functionals and absolute continuity of induced measure, J. Math. Kyoto Univ., 20, 263-289 (1980). Shimura, M. [1] Excursions in a cone for two-dimensional Brownian motion, J. Math. Kyoto Univ., 25, 433-443 (1985). Silverstein, M. L. [1] Symmetric Markov Processes: Lecture Notes in Mathematics 426, Springer, Berlin, 1974. [2] Boundary Theory for Symmetric Markov Processes: Lecture Notes in Mathematics 516, Springer, Berlin, 1976. Simon, B. [1] Functional Integration and Quantum Physics, Academic Press, New York, 1979. [2] Semiclassical analysis of low-lying eigenvalues, II. Tunneling, Ann. Math., 120, 89-118 (1984). Skorokhod, A. V. [1] Limit theorems for stochastic processes, Th. Prob. Appl., 1, 261-290 (1956).
[2] Limit theorems for Markov processes, Th. Prob. Appl., 3, 202-246 (1958). Spitzer, F. [1] Principles of Random Walk, Van Nostrand, Princeton, NJ, 1964. [2] Some theorems concerning two-dimensional Brownian motion, Trans. Amer. Math. Soc., 87, 187-197 (1958). Strassen, V. [1] An invariance principle for the law of the iterated logarithm, Z. Wahrscheinlichkeitstheorie, 3, 211-226 (1964). [2] Almost sure behaviour of sums of independent random variables and martingales,
Proc. 5th Berkeley Symp. Math. Statist. Prob., Vol. 2, Part 1, University of California Press, Berkeley, 1966, pp. 315-343. Stroock, D. W. [1] The Malliavin calculus and its applications to second-order parabolic differential operators I, II, Math. System Theory, 14, 25-65, 141-171 (1981). [2] The Malliavin calculus; a functional analytical approach, J. Funct. Anal., 44, 217-257 (1981). [3] Diffusion processes associated with Levy generators, Z. Wahrscheinlichkeitstheorie, 32, 209-244 (1975). [4] An Introduction to the Theory of Large Deviations, Springer, Berlin, New York, 1984. Stroock, D. W. and Varadhan, S. R. S. [1] Multidimensional Diffusion Processes, Springer, New York, 1979. [2] On the support of diffusion processes with applications to the strong maximum principle, Proc. 6th Berkeley Symp. Math. Statist. Prob., Vol. 3, University of California Press, Berkeley, 1972, pp. 333-359. [3] Diffusion processes with boundary conditions, Comm. Pure Appl. Math., 24, 147-225 (1971). Stroock, D. W. and Yor, M. [1] Some remarkable martingales, Seminaire de Probabilites XV: Lecture Notes in Mathematics 850, Springer, Berlin, 1981, pp. 590-603. Sussmann, H. J. [1] On the gap between deterministic and stochastic ordinary differential equations, Ann. Prob., 6, 19-41 (1978). Symanzik, K. [1] Euclidean quantum field theory, Local Quantum Theory (ed. R. Jost), Academic Press, New York, 1969. Talagrand, M. [1] Regularity of Gaussian processes, Acta Math., 159, 99-149 (1987). Taylor, G. I. [1] Statistical theory of turbulence, Proc. Roy. Soc. London A, 151, 421-478 (1935). Taylor, H. M. [1] A stopped Brownian motion formula, Ann. Prob., 3, 234-246 (1975). Taylor, S. J. [1] Sample path properties of processes with stationary independent increments, Stochastic Analysis (eds. D. G. Kendall and E. F. Harding), Wiley, New York, 1973, pp. 387-414. Thorin, O. [1] On the infinite divisibility of the lognormal distribution, Scand.
Actuarial J., 121-148 (1977). Tsirel'son, B. S. [1] An example of the stochastic equation having no strong solution, Teoria Verojatn. i Primenen., 20, 427-430 (1975). Van Den Berg, M. and Lewis, J. T. [1] Brownian motion on a hypersurface, Bull. London Math. Soc., 17, 144-150 (1985). Varadhan, S. R. S. [1] Large Deviations and Applications, SIAM, Philadelphia, 1984. Varadhan, S. R. S. and Williams, R. J. [1] Brownian motion in a wedge with oblique reflection, Comm. Pure Appl. Math., 38, 405-443 (1985).
Walsh, J. B. [1] Excursions and local time, in Azema and Yor [2], pp. 159-192. [2] Stochastic integration with respect to local time, in Cinlar, Chung and Getoor [1, 1983]. [3] An introduction to stochastic partial differential equations, Ecole d'Ete de Probabilites de St Flour XIV-1984, Lecture Notes in Mathematics 1180, Springer, Berlin, 1986. Warner, F. W. [1] Foundations of Differentiable Manifolds and Lie Groups, Springer, Berlin, 1983. Watanabe, S. [1] On discontinuous additive functionals and Levy measures of a Markov process, Jap. J. Math., 34, 53-79 (1964). Watson, G. N. [1] A Treatise on the Theory of Bessel Functions, Cambridge University Press, Cambridge, 1966. Whitney, H. [1] Geometric Integration Theory, Princeton University Press, Princeton, NJ, 1957. Whittle, P. [1] Optimization over Time (two volumes), Wiley, Chichester, 1982, 1983. Williams, D. [1] Brownian motions and diffusions as Markov processes, Bull. London Math. Soc., 6, 257-303 (1974). [2] Some basic theorems on harnesses, Stochastic Analysis (eds. D. G. Kendall and E. F. Harding), Wiley, New York, 1973, pp. 349-366. [3] On Levy's downcrossing theorem, Z. Wahrscheinlichkeitstheorie, 40, 157-158 (1977). [4] Path decomposition and continuity of local time for one-dimensional diffusions, I, Proc. London Math. Soc., Ser. 3, 28, 738-768 (1974). [5] On the stopped Brownian motion formula of H. M. Taylor, Seminaire de Probabilites X: Lecture Notes in Mathematics 511, Springer, Berlin, 1976, pp. 235-239. [6] Markov properties of Brownian local time, Bull. Amer. Math. Soc., 75, 1035-1036 (1969). [7] Decomposing the Brownian path, Bull. Amer. Math. Soc., 16, 871-873 (1970). [8] The Q-matrix problem for Markov chains, Bull. Amer. Math. Soc., 81, 1115-1118 (1975). [9] The Q-matrix problem, Seminaire de Probabilites X: Lecture Notes in Mathematics 511, Springer, Berlin, 1976, pp. 216-234. [10] A note on the Q-matrices of Markov chains, Z.
Wahrscheinlichkeitstheorie, 7, 116-121 (1967). [11] Some Q-matrix problems, Probability: Proc. Symp. Pure Math. XXXI, Amer. Math. Soc., Providence, RI, 1977, pp. 165-169. [12] Diffusions, Markov Processes, and Martingales, Volume 1: Foundations, Wiley, Chichester, 1979. [13] (editor) Stochastic Integrals: Proceedings, LMS Durham Symposium, Lecture Notes in Mathematics 851, Springer, Berlin, 1981. [14] Conditional excursion theory, Seminaire de Probabilites XIII: Lecture Notes in Mathematics 721, Springer, Berlin, 1979, pp. 490-494. [15] (= [W]) Probability with Martingales, Cambridge University Press, Cambridge, 1991.
Yaglom, A. M. [1] Some classes of random fields in n-dimensional space, related to stationary random processes, Th. Prob. Appl., 2, 273-319 (1957). Yamada, T. [1] On a comparison theorem for solutions of stochastic differential equations and its applications, J. Math. Kyoto Univ., 13, 497-512 (1973). Yamada, T. and Ogura, Y. [1] On the strong comparison theorems for solutions of stochastic differential equations, Z. Wahrscheinlichkeitstheorie, 56, 3-19 (1981). Yamada, T. and Watanabe, S. [1] On the uniqueness of solutions of stochastic differential equations, J. Math. Kyoto Univ., 11, 155-167 (1971). Yor, M. [1] Sur certains commutateurs d'une filtration, Seminaire de Probabilites XV: Lecture Notes in Mathematics 850, Springer, Berlin, 1981, pp. 526-528. [2] Sur la continuite des temps locaux associes a certaines semimartingales, in Azema and Yor [2], pp. 23-35. [3] Rappel et preliminaires generaux, in Azema and Yor [2], pp. 17-22. [4] Precisions sur l'existence et la continuite des temps locaux d'intersection du mouvement Brownien dans R^2, Seminaire de Probabilites XX: Lecture Notes in Mathematics 1204, Springer, Berlin, 1986, pp. 532-542. [5] Sur la representation comme integrales stochastiques des temps d'occupation du mouvement Brownien dans R^d, ibid, pp. 543-552. Yosida, K. [1] Functional Analysis, Springer, Berlin, 1965. [2] Brownian motion in homogeneous Riemannian space, Pacific J. Math., 2, 263-296 (1952). Zakai, M. [1] The Malliavin calculus, Acta Appl. Math., 3, 175-207 (1985). Zheng, W. A. and Meyer, P.-A. [1] Quelques resultats de 'mecanique stochastique', Seminaire de Probabilites XVIII: Lecture Notes in Mathematics 1059, Springer, Berlin, 1984, pp. 223-244. Zvonkin, A. K. [1] A transformation of the phase space of a diffusion process that removes the drift, Math. USSR Sbornik, 22, 129-149 (1974).
Index to Volumes 1 and 2
Absolute continuity: II.9. Absorbing state: III.12. Accessible stopping time: VI.13-14. Adapted process: II.45. Additive functional: III.16; construction from λ-potential, III.16, I.17. Affine group: V.35. Algebra: II.1. Almost surely: II.14, III.9. Announceable time: VI.12. Approximation to compensators: VI.31. Arcsine law: II.34, III.24, V.53. Arzela-Ascoli theorem: II.85. Atlas: V.34. Atom: II.88. Awaiting the almost inevitable: II.57. Azema's martingale: II.37. Backward equation: I.4. Barlow's example: V.41. Basic integrands: IV.5. Bessel process: IV.35, V.48; time reversal, III.49. Bi-invariant metric: V.35. Birth process: III.26, VI.14. Blumenthal's 0-1 Law: for Brownian motion, I.12; for FD processes, III.9. Bochner's theorem: I.24. Bochner's horizontal Laplacian: V.34. Borel-Cantelli Lemmas: II.15. Boundary points of one-dimensional diffusions: V.47, V.51. Boundary theory: see Martin-Doob-Hunt theory. Branch points: definition, III.37; illustrative example, III.37; probabilistic significance, III.41. Brownian motion: definition, I.1; on affine group, V.35; arcsine law, II.34, III.24, VI.53; Brownian bridge, I.25, II.91, IV.40; canonical, II.90; complex, see complex Brownian motion; and continuous martingales, I.2, IV.34; Dirichlet problem, I.22; elastic Brownian motion, III.24; of ellipses, V.36, V.37; energy of charge, I.22; excursion law, VI.50; exponential martingales, I.2, I.9; Feller Brownian motions, VI.57; on filtered probability space, I.2, II.72; first-passage distribution, I.9, I.14, III.10; Gaussian description, I.3; generator, I.4, III.6; Green function, I.22; iterated-logarithm laws, I.16; Kolmogorov's
backward and forward equations, I.4; Kolmogorov's test, I.13; Levy's characterization, IV.33; on Lie groups, V.35; local time, I.5, I.14, III.16; on a manifold, V.30, V.31; martingales of, I.17; martingale characterisations, I.2; martingale representation, IV.36, IV.41; modulus of continuity, I.10; no-increase property, I.10; nowhere-differentiability, I.12; on the orthonormal frame bundle, V.30, V.33, V.34; path decomposition, VI.55; potential theory, I.22; quadratic variation, I.11, IV.2; Ray-Knight Theorems, VI.52; recurrence, I.3; reflecting Brownian motion, I.14, III.22, V.6; reflection principle, I.13; resolvent, III.3; rotational invariance, I.18; scaled Brownian excursion, IV.40; scaling, I.3; skew-product representation, V.31; on SO(3), V.35; Skorokhod embedding, I.7, VI.51; slow points, I.10; strong Markov property, I.12; on a surface, V.4, V.31; time-reversed, II.38; transition density, I.4; unbounded variation, I.11, IV.2; wandering to infinity, I.18. Brownian sheet: I.25. Burkholder-Davis-Gundy inequalities: IV.42. Cadlag maps: see R-paths. Cameron-Martin-Girsanov change of measure: IV.38-41, V.27. Canonical decomposition of a special semimartingale: VI.40. Canonical process: II.28, II.71, III.7. Capacity: I.22. Caratheodory's Extension Theorem: II.5. Carverhill's noisy North-South flow: V.14. Cauchy law: I.20. Cauchy process: I.28, VI.2, VI.28. Change of time scale: see time substitution. Chapman-Kolmogorov equations: I.4, III.1. Characteristic exponent: I.28. Characteristic operator: III.12. Charge: I.22, III.27; see also equilibrium charge. Chart: V.34. Choquet capacitability theory: III.76. Choquet representation of λ-excessive functions: III.44. Choquet representation of 1-excessive probabilities: III.38. Choquet's theorem on integral representations: III.27. Christoffel symbols: V.31, V.34. Ciesielski-Taylor Theorem: III.20, III.49. Clark's Theorem on Brownian martingale representation: IV.41. Coffin state: III.3. 
Comparison theorem: V.43. Compensated Poisson process: II.64. Compensator: VI.29, VI.31; see also dual previsible projection. Completions: II.75. Complex Brownian motion: I.19; cone point, I.21; cut point, I.21; multiple points, I.21; Spitzer's theorem, I.20; windings of, I.20. Compound Poisson process: I.28. Condition N: III.54. Condition S: III.55. Conditional expectations and probabilities: II.41, II.44; regular, II.42, II.43. Conditional independence: II.60. Cone point: I.21.
Conformal martingales: IV.34. Connection: V.32, V.34. Continuous Levy processes, characterization of: I.28, III.14. Continuous local martingale: pure local martingales, IV.34; quadratic-variation process, IV.30; as time change of Brownian motion, IV.34. Continuous mapping principle: II.84. Continuous semimartingale: canonical decomposition, IV.30, VI.24; Ito's formula, IV.32; local time, IV.43. Contraction resolvent: III.4; strongly continuous (SCCR), III.4. Contraction semigroup: III.4; strongly continuous (SCCSG), III.4. Control problems: see stochastic control. Controlled variance problem: V.6, V.42. Convergence of random variables: II.19. Coupling inequality: V.54. Coupling of one-dimensional diffusions: V.54. Covariance of a diffusion: V.1. Covariant differentiation: V.32, V.34. Cumulative risk: II.64, VI.22. Curvature: V.38. Cut point: I.21. Cylinder: II.25. d-system: II.1. Daniell-Kolmogorov Theorem: II.30, II.31; limitations of, II.34. Debut: of open set for R-process, II.74; of compact set of R-process, II.75; of progressive set, VI.3. Debut Theorem: II.76, III.9, VI.3. De Finetti's Theorem: II.51. Diffeomorphism: V.34. Diffeomorphism Theorem: V.13. Diffusion equation: I.4. Diffusion: III.13, V.1, V.2; diffusion SDE, V.8; in one dimension, see one-dimensional diffusions; physical, I.23. Directed set: II.80. Dirichlet form: I.23; for Markov chains, III.59. Dirichlet problem: I.22. Distribution function: II.16. Doleans' characterization of FV processes: VI.20, VI.25-27. Doleans exponential: IV.19. Doleans' proof of the Meyer Decomposition Theorem: VI.30. Dominated-Convergence Theorem: II.8. Donsker's Invariance Principle: I.8. Doob decomposition of a submartingale: II.54. Doob h-transform: III.29, III.45, IV.39. Doss-Sussmann method: V.28. Downcrossing Theorem (Levy): I.14. Drift of a diffusion: V.1. Dual previsible projection: VI.1, VI.21, VI.23. Dynkin's formula: III.10. Dynkin's Local-Maximum Principle: III.13.
Dynkin's Isomorphism Theorem: I.27. Dynkin's Maximum Principle: III.6. Elastic boundary: III.24. Elementary process: IV.6, IV.25. Elworthy's example: V.13. Empirical distribution: II.91. Entrance laws: III.39. Equilibrium charge and potential: I.22, III.48, VI.35. Ergodic Theorem for one-dimensional diffusions: V.53. Evanescent process: IV.13. Excessive functions: III.27; representation, see Martin-Doob-Hunt theory; Riesz decomposition, III.27; uniformly λ-excessive, III.16. Excessive measures: III.38. Excursion intervals: VI.42. Excursion law: VI.47, VI.50; for Brownian motion, VI.50, VI.55; for Markov chain, VI.43, VI.50. Excursion theory: Ch.VI; censoring and reweighting of excursion laws, VI.58; characteristic measure, VI.47; excursion filtration, VI.59; for a finite Markov chain, VI.43; lifetime, VI.47; marked excursions, VI.49; for a Markov chain, III.57; Markovian character of excursion law, VI.48; path decomposition for Brownian excursions, VI.55; from a point which is not regular extremal, VI.50; Poisson point process, VI.43, VI.47; starred excursion, VI.49; by stochastic calculus, VI.59. Excursion space: VI.43, VI.47. Expectation: II.17. Exponential map: V.34. Exponential semimartingale: IV.19, IV.22, IV.37. Extending the generator: III.4. FD (Feller-Dynkin) diffusions: III.13, V.22; martingale representation, V.25. FD processes: existence, III.7; strong Markov property, III.8, III.9. FD semigroups: III.6. FV: see finite-variation. Fair stopping time: VI.12. Fatou Lemma: II.8; for non-negative supermartingales, IV.14. Feller Brownian motions: VI.57. Feller property: III.6. Feller-McKean chain: III.23, III.35. Feynman-Kac formula: III.19; for Markov chains, IV.22. Field: see algebra. Fick's Law: I.23. Filtering: VI.8; Bayesian approach, VI.10; change-detection filter, V.10, V.22; Kalman-Bucy filter, VI.9; robust filtering, VI.11. Filtration: II.45, II.63; natural, II.45. Finite-dimensional distributions: II.29, II.87. 
Finite fuel control problem: V.7, V.15. Finite-variation functions: II.13. Finite-variation processes: IV.7; Doleans' characterization, VI.20. First-approach times: II.74.
First-entrance decomposition: III.52. First-entrance times: see debut. First-hitting times: II.74. Forward equation: I.4, I.23. Freedman's interpretation of qbj: III.57. Fubini's Theorem: II.12. Fundamental Theorem of Algebra: I.20. Gamma process: I.28. Gaussian process: definition, I.3. Gaussian random fields, isotropic: I.26. Generator: see infinitesimal generator. Geodesic: V.32, V.34. Girsanov SDE: V.26. Good λ inequality: IV.42. Green function: I.22, III.27, III.30. Gronwall's lemma: V.11. Harmonic function: III.31. Hausdorff moment problem: III.28. Hazard function: II.64. Heat equation: I.4. Helms-Johnson example: II.79, III.31, IV.14, VI.33. Hermite polynomials: I.2. Hewitt-Savage 0-1 law: II.51. Hille-Yosida Theorem: III.5. Holder inequality: II.10. Honest transition function: III.3. Horizontal lift: V.34. Horizontal vector field: V.34. Hormander's Theorem: V.38. Hunt's Theorem: VI.35. Hyperbolic plane: V.34, V.35, V.36. Hyperboloid sheet: V.36. Hypotheses droites: VI.46. Identical hitting-distributions: III.21. Imbedding: V.34. Independence: II.21, II.23. Indistinguishable processes: II.36, IV.13. Infinite divisibility: I.28. Infinitesimal generator: III.2, III.4; Brownian motion, I.4, III.6; one-dimensional diffusion, V.47, V.50. Inner measure: II.6. Inner regularity of measures: II.80. Innovations process: VI.8. Instantaneous state of a Markov chain: III.51. Integral: II.7. Integrable-variation processes: IV.7.
Integral curve: V.34. Integration by parts: IV.2, VI.38; for continuous semimartingales, IV.32; for finite-variation processes, IV.18. Integrators: IV.16. Isometric imbedding: V.34. Isotropic Gaussian random fields: I.26. Ito's formula: IV.3, VI.39; for continuous semimartingales, IV.32; for convex functions, IV.45, V.47; for FV processes, IV.18. Ito integral: see stochastic integral. Jensen's inequality: II.18, II.41, II.52. Joint law: II.16. Kalman-Bucy filter: VI.9. Khasminskii's method for stability: V.37. Khasminskii's test for explosion: V.52. Killing: III.18. Kingman's Markov Characterization Theorem: III.58. Knight's Theorem on continuous local martingales: IV.34. Kolmogorov's backward equations: I.4. Kolmogorov's forward equations: I.4. Kolmogorov's lemma: I.25, II.85, IV.44. Kolmogorov's test for Brownian motion: I.13. Kolmogorov's 0-1 Law: II.50. Krylov's example: V.29. Kunita-Watanabe inequalities: IV.28. L-process, L-path: IV, Introduction. λ-potential operator: see resolvent. Laplace exponent: I.28, I.37. Laplace-Beltrami operator: V.30, V.34. Last-exit decomposition: III.56, VI.43, VI.48. Last-exit distribution for Brownian motion: VI.35. Law of the Iterated Logarithm: I.16. Law of process: II.27. Law of a random variable: II.16. LCCB: locally compact Hausdorff space with countable base, II.6. Lebesgue measure: II.5. Lebesgue's thorn: III.9. Left-invariant vector field: V.35. Levy Brownian motion: I.24. Levy's characterization of Brownian motion: I.2, IV.33. Levy-Doob 'Downward' Theorem: II.51. Levy kernels: III.57, IV.21. Levy-Hincin formula: I.28, VI.2. Levy measure: I.28, II.37, VI.2. Levy system: VI.28. Levy process: I.28, I.29, I.30, II.37, VI.2. Levy's 'Upward' Theorem: II.50. Lie algebra: V.35. Lie bracket: V.34, 38.
Lie group: V.35. Lifetime: III.7. Likelihood ratio: II.79, IV.17; for Markov chains, IV.22. Lipschitz square root: V.12. Local martingale: IV.1, IV.14; on a manifold, V.30, V.33. Local time: for Brownian motion, I.5, I.14; for continuous semimartingales, IV.43-4; growth set, VI.45; for Levy processes, I.30; Markovian local time, IV.43; as an occupation density, IV.45; for one-dimensional diffusions, V.49; at regular extreme point of a Ray process, VI.45; from upcrossings of Brownian motion, I.14, II.79. Localization: IV.9. Locally bounded previsible process: IV.10. Lusin space: II.31, II.82. Lyapunov exponent: V.37. Malliavin-Bismut integration-by-parts: V.38. Malliavin calculus: V.38. Manifold: V.34. Marked excursions: VI.49. Markov chains: III.2; birth process, IV.26; Dirichlet form, III.59; Feller-McKean chain, III.23, III.35; Levy's diagonal Q-matrix, III.35, IV.35; Martin boundary, III.48; martingale problem, IV.20-22; as Ray processes, III.50; stable and instantaneous states, III.51; see also Q-matrices, standard transition functions. Markov p-function: III.58. Markov inequality: II.18. Markov processes: III.1; see also FD processes, Ray processes. Martin compactification: III.28. Martin kernel: III.27; for Brownian motion in the unit ball, III.30. Martin-Doob-Hunt theory: for discrete-parameter chains, III.28, III.29, III.42; for Brownian motion, III.30, III.31. Martingales: definitions, II.46, II.63; for Brownian motion, I.17; convergence theorems, II.49, II.50, II.51, II.69; in L', II.53; regularity of paths, II.65, II.66, II.67; for FD processes, III.10; for Brownian motion, I.17. Martingale inequalities: Burkholder-Davis-Gundy inequality, IV.42; Doob's L^p inequality, II.52, II.70; Doob's submartingale inequality, II.52, II.54, II.70; Doob's Upcrossing Lemma, II.48. 
Martingale problem: V.19; existence of solutions, V.23; for Markov chains, IV.20; Markov property of solution, V.21; relationship to weak solutions of SDEs, V.19-20; Stroock-Varadhan Theorem, V.24; well-posed, V.19. Martingale representation: for Brownian motion, IV.36, 41; for FD diffusion, V.25; for Markov chains, IV.21. Maximum-Modulus Theorem: I.20. Maximum Principle: III.13. McGill's Lemma: VI.59. Mean curvature: V.4. Measurable function: II.2. Measurable space: II.1. Measurable transition function: III.3. Measure space: II.4. Meyer decomposition: III.17, VI.29, VI.32, VI.46. Meyer's Previsibility Theorem: VI.15.
Minkowski inequality: II.10. Moderate function: IV.42. Modification: II.36. Monotone-Class Theorems: II.3. Monotone Convergence Theorem: II.8. Multiple points of Brownian motion: I.21. Multiplicative functional: see PCHMF. Nagasawa's formula: III.42, III.46. Natural scale: V.46. Net: II.80. Normal coordinates: V.34. Normal transition function: III.3. Observation process: VI.8. Occupation density formula: I.5, IV.45. One-dimensional diffusions: I.5, V.44-54; absorbing, inaccessible, reflecting end-points, V.47, V.51; exit, entrance boundary points, V.51; infinitesimal generator, V.47, V.50; natural scale, V.46; regular diffusion, V.45; resolvent, V.50, VI.54; scale function, V.46; speed measure, V.47; time substitution, V.47. One-parameter subgroup: V.35. Optional processes, σ-algebra: VI.4. Optional projection: VI.7. Optional-Sampling Theorem: II.59, II.77. Optional Section Theorem: VI.5. Optional-Stopping Theorem: II.57. Optional time: VI.4; see also stopping time. Orthonormal frame bundle: V.30, V.33, V.34. Ornstein-Uhlenbeck process: I.23, V.5; spectral measure, I.24. Outer measure: I.6, II.35. π-system: II.1. Parallel-displacement: V.32, 34. Parallel transport: V.32. Path decomposition: III.49, VI.55. Path regularization: II.67, III.7. Path-space: II.28, V.8. Pathwise-exact SDE: see SDE. Pathwise uniqueness: V.9, V.17; Nakao theorem, V.41; Yamada-Watanabe Theorem, V.40. PCHAF (perfect, continuous, homogeneous, additive functional): III.16. PCHMF (perfect, continuous, homogeneous, multiplicative functional): III.18. PFA theorem: VI.12, VI.16. Picard's Theorem: I.20. Polish space: II.82. Poisson measures: II.37. Potential (supermartingale): II.59. Potential theory: I.22; see also Dirichlet problem, Martin-Doob-Hunt theory. Poisson kernel for the half plane: I.19. Poisson kernel for the unit ball: III.30.
Poisson measure, process: II.37, VI.2. Polish space: II.82; characterization of, II.82. Probability triple: II.14. Pre-Brownian motion: II.32, II.68. Pre-Poisson set function: II.33. Pre-T σ-algebra: II.58, II.73, VI.17. Previsible: II.47; path functionals, V.8; processes, σ-algebra, IV.6; Section Theorem, VI.19; stopping time, VI.12. Previsible projection: VI.19. Product σ-algebras: II.11; product measures, II.12, II.22. Progressive process, σ-algebra: II.73, VI.3. Prohorov's Theorem: II.83. Pseudo-Riemannian metric: V.36. Pure local martingale: IV.34, IV.35, V.28. Purely discontinuous martingales: IV.24. Q-matrices: III.1; DK conditions, III.53; local-character condition, III.54; probabilistic significance, III.57, IV.21; of symmetric chains, III.59; of totally instantaneous chains, III.55. Quadratic-covariation process: IV.26. Quadratic variation: I.11. Quadratic-variation process: IV.26, VI.36; for continuous local martingales, IV.30; previsible angle-bracket process, VI.34. Quantum fluctuations: V.5. Quasi-left-continuity: for FD processes, III.11; of filtrations, VI.18; for Ray processes, III.41, III.50. Quasimartingales: VI.41. Quaternions: V.35. R-filtered space: II.67. R-path, R-process: II.62, II.63, Introduction to Chapter IV. R-regularisation: II.67. R-supermartingale convergence theorem: II.69. Radon-Nikodym Theorem: II.9. Random field: I.24. Random walk, Martin boundary of: III.28. Ray-Knight compactification: III.35. Ray-Knight Theorem on local times: VI.52. Ray processes: III.36; application to chains, III.50. Ray resolvent: III.34. Ray's Theorem: III.36, III.38. Reducing sequence: IV.11. Reduction: IV.11, IV.29. Reflecting Brownian motion: see Brownian motion. Reflection principle: I.13. Regular conditional probabilities: existence theorem, II.89; counterexample, II.43. Regular class (D) submartingale: VI.31. Regular diffusion: V.45. Regular increasing process: VI.21. Regular point: I.22.
Regular function: III.27. Regularizable path: II.62. Resolvent: III.2, III.3; of Brownian motion, III.3. Resolvent equation: III.2, III.3. Reuter's Theorem on drifting Brownian motion: IV.39. Reversed Martingale Convergence Theorem: II.51. Riemannian connection: V.32, V.34. Riemannian manifold: V.31. Riemann mapping theorem: I.19. Riemannian metric: V.34. Riemannian structure induced by non-singular diffusion: V.34. Riesz decomposition of excessive functions: III.27. Riesz decomposition of a UI supermartingale: II.59. Riesz representation Theorem: II.80, III.6. Rolling without slipping: V.33. σ-additivity: II.4. σ-algebra: II.1; countably-generated, II.88. σ-field: see σ-algebra. SDE: of diffusion type, V.8; exact, V.9, V.17; Ito's Theorem on existence and uniqueness of solutions, V.11; links with martingale problem, V.19-20; with (locally) Lipschitz coefficients, V.11-13; Markov property of solutions, V.13; pathwise uniqueness, V.9, V.17; strong solution, V.10; Tanaka's SDE, V.16; time-reversal, V.13; Tsirel'son's SDE, V.18; uniqueness in law, V.16; weak solution, V.16. Scale function: V.28, V.46. Scheffe's Lemma: II.8. Section Theorem: II.76; Optional Section Theorem, VI.5; Previsible Section Theorem, VI.19. Semimartingale: IV.15; continuous, see continuous semimartingale; as integrator, IV.16; local time, IV.43; in a manifold, IV.15. Signal process: VI.8. Skew product of Brownian motion: IV.35. Skorokhod embedding: I.7, VI.51. Skorokhod's equation: V.6. Special semimartingale: VI.40. Spectral measure of stationary Gaussian process: I.24. Spitzer-Rogozin identity for Levy processes: I.29. Splitting time: III.49. Stable process: I.28. Stable subspace: IV.24. Standard process: III.49. 'Standard' transition matrix function: III.2. Stochastic control, optimality principle: V.15. Stochastic development: V.33. Stochastic differential equation: see SDE. Stochastic differentials: IV.32, V.1. Stochastic flows: V.13. 
Stochastic integral: IV.27, VI.36-38; Riemann-sum approximation, IV.47. Stochastic partial differential equations: VI.11. Stochastic process: II.27.
Stone-Weierstrass Theorem: II.80. Stopping time: II.56, II.73. Strassen's Law: I.16; invariance principle, I.16. Stratonovich calculus: IV.46; switch to Ito, V.30. Strong Law of Large Numbers: II.51. Strong Markov property: for Brownian motion, I.12; for FD processes, III.8, III.9; for Ray processes, III.40; under time reversal, III.47. Strong reduction: VI.37. Structural constants for Lie groups: V.35. Structural equations: V.34. Subadditive Ergodic Theorem: I.22. Submanifold: V.34; regular submanifold, V.31. Sub-Markov semigroup: III.3. Submartingale: II.46, II.63. Subordinator: I.28, II.37, VI.43. Summation convention: V.1. Superharmonic function: III.31. Supermartingale: convergence theorem, II.49; definition, II.46, II.63; sup of a sequence of, II.78. Supermedian function: III.34. Symmetrisable Q-matrix, transition matrix function: III.59. Taboo probabilities: III.52. Tanaka's formula: IV.43. Tanaka's SDE: V.16. Tangent bundle: V.34. Tangent vector: V.30, V.34. Tchebychev inequality: II.18. Terminal time: III.18. Tightness: II.83, II.85. Time change: see time substitution. Time reversal: III.42, III.47, III.49. Time-reversed Brownian motion: II.38. Time substitution: III.21, IV.30, V.26. Torsion: V.34. Totally inaccessible stopping time: V.21, VI.13-14. Tower property of conditional expectation: II.41. Transition function: III.1; measurable, III.3. Trotter's Theorem: I.5. Tsirel'son's SDE: V.18. Uniform asymptotic negligibility: I.28. UI: see uniform integrability. Uniform integrability: II.20, II.29, II.21, II.44. Uniqueness in law: V.16. Universal completion: III.9. Upcrossings: II.62. Upcrossing Lemma: II.48. Usual augmentation: II.67, II.75. Usual conditions: II.67, IV, Introduction.
Volkonskii's formula: III.21. Volkonskii-Sur-Meyer Theorem: III.16, III.17. Volume element: V.34. Von Mises distribution: IV.39. Weak convergence: II.83; Prohorov's Theorem, II.83; in W, II.85; Skorokhod's interpretation, II.84, II.86. Weak* topology: II.80. Weyl's Lemma: V.38. Whittle's flypaper example: V.7, V.15. Wiener-Hopf factorisation of Levy processes: I.29. Wiener measure: I.6. Wiener process: see Brownian motion. Wiener's Theorem: I.6; proofs, I.6, II.71. Yor's addition formula: IV.19. Yor's Theorem on semimartingale local time: IV.44. Zvonkin's observation: V.18, V.28.