Author: Williams D. Rogers L.C.G.
Tags: mathematical analysis discrete mathematics probability theory mathematical statistics
ISBN: 0-471-95061-0
Year: 1979
Diffusions, Markov Processes,
and Martingales
Volume 1: FOUNDATIONS
2nd Edition
L. C. G. ROGERS
and
DAVID WILLIAMS
School of Mathematical Sciences,
University of Bath
JOHN WILEY & SONS
Chichester · New York · Brisbane · Toronto · Singapore
Copyright © 1979, 1994 by John Wiley & Sons Ltd,
Baffins Lane, Chichester,
West Sussex PO19 1UD, England
National Chichester (0243) 779777
International (+ 44) 243 779777
All rights reserved.
No part of this book may be reproduced by any means,
or transmitted, or translated into a machine language
without the written permission of the publisher.
Other Wiley Editorial Offices
John Wiley & Sons, Inc., 605 Third Avenue,
New York, NY 10158-0012, USA
Jacaranda Wiley Ltd, 33 Park Road, Milton,
Queensland 4064, Australia
John Wiley & Sons (Canada) Ltd, 22 Worcester Road,
Rexdale, Ontario M9W 1L1, Canada
John Wiley & Sons (SEA) Pte Ltd, 37 Jalan Pemimpin #05-04,
Block B, Union Industrial Building, Singapore 2057
Library of Congress Cataloging-in-Publication Data
Rogers, L. C. G.
Diffusions, Markov processes, and martingales / L. C. G. Rogers,
David Williams. — 2nd ed.
p. cm.—(Wiley series in probability and mathematical
statistics)
New ed. of: Diffusions, Markov processes, and martingales / David
Williams. 1979-
Includes bibliographical references and index.
Contents: v. 1. Foundations.
ISBN 0 471 95061 0 (v. 1)
1. Markov processes. 2. Diffusion processes. 3. Martingales
(Mathematics) I. Williams, D. (David), 1938- . II. Williams, D.
(David), 1938- Diffusions, Markov processes, and martingales.
III. Title. IV. Series.
QA274.7.W54 1994 94-28241
519.2'33—dc20 CIP
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0 471 95061 0
Typeset in 10/12pt Times by Thomson Press (India) Ltd., New Delhi
Printed and bound in Great Britain by Biddles Ltd., Guildford and Kings Lynn
For our parents
From the Original (1979)
Preface
Long ago (or so it seems today), Chung wrote on page 196 of his book [1]:
'One wonders if the present theory of stochastic processes is not still too difficult
for applications.' Advances in the theory since that time have been phenomenal,
but these have been accompanied by an increase in the technical difficulty of
the subject so bewildering as to give a quaint charm to Chung's use of the word
'still'. Meyer writes in the preface to his definitive account of stochastic integral
theory: '... il faut ... un cours de six mois sur les définitions. Que peut-on y faire?'
I have thought up as intuitive a picture of the subject as I can, written it
down at speed, and refused to be lured back by piety (or even by wit!) to cancel
half a line. 'First' intuition, which is what you need when you are learning the
subject, is raw, rough and ready; and, as you have guessed, I make the excuse
that it demands a compatible style and lack of polish.
Note that I wrote 'first intuition'. Consider an example. Meyer's concept of
a right process is exactly right for Markov process theory, but the concept is
the result of a long evolution. To understand it properly, you need a highly
developed intuition, and that takes time to acquire. The difficulty with the best
advanced literature is that its authors have too much intuition; never make the
mistake of thinking otherwise.
My aim then is to sharpen your intuition to a point where the advanced
abstract literature becomes accessible, enjoyable and 'relevant'. Like my expository
article [1], this is a missionary tract not a theological treatise. (Those of you
who have read my article [1] will see that this book often follows it very closely,
except that now I have the time and the duty to be more obviously appreciative
of the abstract theory!)
I believe that, in the end, it is applications which justify mathematics. The
'artistic' justification of pure mathematics in terms of intrinsic qualities like
elegance and generality rings rather hollow in my ears when I compare the best
mathematics with the greatest music. Many applied workers will regard this
book as extremely 'pure', but I see it as one stage in shunting pure theory over
towards applications. The shunting is not always necessary: time and again, one
finds 'applied' papers which 'solve' problems long since solved for 'purely
theoretical' purposes. Moral: the pure/applied division of probability theory (as
of mathematics in general) is a nonsense.
Acknowledgements. This is an appropriate place at which to thank David
Kendall and Harry Reuter for teaching me probability theory and for giving
me an enthusiasm for the subject which is wearing well. My best way to thank
them is to try to share that enthusiasm.
I have to say another huge 'thank you' to David Kendall for the immense
amount of work he has done in making editorial comments on the original
manuscript. I now see that my determination to convey a sense of adventure
did need to be tempered by a greater concern for the reader's sense of security.
So I have acceded to many of David Kendall's requests for 'more details'; and
as a result, you will learn more techniques of calculation and have a clearer
idea of several concepts. (But I still see it as part of my job to keep you on
your toes!)
I am very grateful to Ronald Getoor and Andre Meyer for clearing up some
confusions.
I have been extremely fortunate in having been able to rely on the superb
typing skills of Sheila Campbell, Eileen Jenkins and Gladys Maddocks; my
thanks and best wishes to them.
I thank Springer-Verlag and the authors for granting me permission to quote
from Chung [1] in Section III.44, from Getoor [1] in Section III.54, and from
Chung [1] and Meyer [1] earlier in this preface.
Finally, I have to thank James Cameron and Wiley for encouragement and
great patience; and subeditors, copy-editors, and printers, whose skills have
much impressed me.
David Williams
Swansea, 1978
Preface to the Second
Edition
This second edition differs profoundly from the first—and not only in having
two authors rather than one. We retain the Gallic tradition of dividing the
volume into three massive chapters: Chapter I, which says why the subject is
worth studying; Chapter II, which provides background; and Chapter III, which
presents an account of Markov processes. Chapter I is now much more extensive
and wide-ranging, and covers much work done since the first edition appeared.
Chapter II is now a highly systematic account, with detailed proofs, of what
every young probabilist must know. It is rather unashamedly a sequel to DW's
Probability with Martingales, Cambridge University Press having been very
generous in allowing us to follow that account closely (but without many proofs,
without the examples, etc.). It is perfectly possible to read Chapter II before
Chapter I if you so wish. We would suggest however that you try things in the
order 'heuristics then rigour':
Our doubts are traitors,
And make us lose the good we oft might win,
Through fearing to attempt.
(W. Shakespeare, Measure for Measure.)
Chapter III seems to have been regarded as the most successful part of the
original; and it is reproduced here without much modification (except that some
of the functional analysis is given fuller treatment). It was always intended as
a missionary tract on Markov processes. The full theory may be found in Sharpe
[1] and in the final two volumes of the probabilist's bible, Dellacherie and
Meyer [1]. All kinds of important developments are ignored in Chapter III:
they would require another complete volume, and will be, or are, covered by
greater experts. Dawson's eagerly awaited treatment [1] of measure-valued
processes has now appeared; Mark Davis has a very nice new book [4] on
piecewise-deterministic Markov processes; and so on. You can access the huge
literature on measure-valued processes via Dawson's account.
The musical allusions in the first edition have been excised. Apparently many
people found them annoying. 'Would David Williams like a book on mathematics
filled with references to baseball?', they say. (To which the answer is, of course,
'Yes.') So, this is Mathematics all the way from A to Zzzz—or from Ω on, if
you want to be rigorous.
Our thanks to Sue Collins and Wolfgang Stummer, and to other colleagues
at Bath, Cambridge, and Queen Mary and Westfield College, London. Our
thanks too to Helen Ramsey and other Wiley staff for suggesting this new
version; and the copy-editor and printer whose skills have impressed us.
Chris Rogers
David Williams
November 1993
Contents
Some Frequently Used Notation xix
CHAPTER I. BROWNIAN MOTION
1. INTRODUCTION 1
1. What is Brownian motion, and why study it? 1
2. Brownian motion as a martingale 2
3. Brownian motion as a Gaussian process 3
4. Brownian motion as a Markov process 5
5. Brownian motion as a diffusion (and martingale) 7
2. BASICS ABOUT BROWNIAN MOTION 10
6. Existence and uniqueness of Brownian motion 10
7. Skorokhod embedding 13
8. Donsker's Invariance Principle 16
9. Exponential martingales and first-passage distributions 18
10. Some sample-path properties 19
11. Quadratic variation 21
12. The strong Markov property 21
13. Reflection 25
14. Reflecting Brownian motion and local time 27
15. Kolmogorov's test 31
16. Brownian exponential martingales and the Law of the
Iterated Logarithm 31
3. BROWNIAN MOTION IN HIGHER DIMENSIONS 36
17. Some martingales for Brownian motion 36
18. Recurrence and transience in higher dimensions 38
19. Some applications of Brownian motion to complex analysis 39
20. Windings of planar Brownian motion 43
21. Multiple points, cone points, cut points 45
22. Potential theory of Brownian motion in $\mathbb{R}^d$ ($d \geq 3$) 46
23. Brownian motion and physical diffusion 51
4. GAUSSIAN PROCESSES AND LEVY PROCESSES 55
Gaussian processes
24. Existence results for Gaussian processes 55
25. Continuity results 59
26. Isotropic random flows 66
27. Dynkin's Isomorphism Theorem 71
Levy processes
28. Levy processes 73
29. Fluctuation theory and Wiener-Hopf factorisation 80
30. Local time of Levy processes 82
CHAPTER II. SOME CLASSICAL THEORY
1. BASIC MEASURE THEORY 85
Measurability and measure
1. Measurable spaces; σ-algebras; π-systems; d-systems 85
2. Measurable functions 88
3. Monotone-Class Theorems 90
4. Measures; the uniqueness lemma; almost everywhere; a.e.(μ, Σ) 91
5. Caratheodory's Extension Theorem 93
6. Inner and outer μ-measures; completion 94
Integration
7. Definition of the integral $\int f\,d\mu$ 95
8. Convergence theorems 96
9. The Radon-Nikodym Theorem; absolute continuity;
$\lambda \ll \mu$ notation; equivalent measures 98
10. Inequalities; $\mathcal{L}^p$ and $L^p$ spaces ($p \geq 1$) 99
Product structures
11. Product σ-algebras 101
12. Product measure; Fubini's Theorem 102
13. Exercises 104
2. BASIC PROBABILITY THEORY 108
Probability and expectation
14. Probability triple; almost surely (a.s.); a.s.(P), a.s.(P, $\mathcal{F}$) 108
15. lim sup $E_n$; First Borel-Cantelli Lemma 109
16. Law of random variable; distribution function; joint law 110
17. Expectation; E(X; F) 110
18. Inequalities: Markov, Jensen, Schwarz, Tchebychev 111
19. Modes of convergence of random variables 113
Uniform integrability and $\mathcal{L}^1$ convergence
20. Uniform integrability 114
21. $\mathcal{L}^1$ convergence 115
Independence
22. Independence of σ-algebras and of random variables 116
23. Existence of families of independent variables 118
24. Exercises 119
3. STOCHASTIC PROCESSES 119
The Daniell-Kolmogorov Theorem
25. $(E^T, \mathcal{E}^T)$; σ-algebras on function space; cylinders and σ-cylinders 119
26. Infinite products of probability triples 121
27. Stochastic process; sample function; law 121
28. Canonical process 122
29. Finite-dimensional distributions; sufficiency; compatibility 123
30. The Daniell-Kolmogorov (DK) Theorem: 'compact
metrizable' case 124
31. The Daniell-Kolmogorov (DK) Theorem: general case 126
32. Gaussian processes; pre-Brownian motion 127
33. Pre-Poisson set functions 128
Beyond the DK Theorem
34. Limitations of the DK Theorem 128
35. The role of outer measures 129
36. Modifications; indistinguishability 130
37. Direct construction of Poisson measures and subordinators,
and of local time from the zero set; Azema's martingale 131
38. Exercises 136
4. DISCRETE-PARAMETER MARTINGALE THEORY 137
Conditional expectation
39. Fundamental theorem and definition 137
40. Notation; agreement with elementary usage 138
41. Properties of conditional expectation: a list 139
42. The role of versions; regular conditional probabilities and pdfs 140
43. A counterexample 141
44. A uniform-integrability property of conditional expectations 142
(Discrete-parameter) martingales and supermartingales
45. Filtration; filtered space; adapted process; natural filtration 143
46. Martingale; supermartingale; submartingale 144
47. Previsible process; gambling strategy; a fundamental principle 144
48. Doob's Upcrossing Lemma 145
49. Doob's Supermartingale-Convergence Theorem 146
50. $\mathcal{L}^1$ convergence and the UI property 147
51. The Levy-Doob Downward Theorem 148
52. Doob's Submartingale and $\mathcal{L}^p$ Inequalities 150
53. Martingales in $\mathcal{L}^2$; orthogonality of increments 152
54. Doob decomposition 153
55. The <M> and [M] processes 154
Stopping times, optional stopping and optional sampling
56. Stopping time 155
57. Optional-stopping theorems 156
58. The pre-T σ-algebra $\mathcal{F}_T$ 158
59. Optional sampling 159
60. Exercises 161
5. CONTINUOUS-PARAMETER SUPERMARTINGALES 163
Regularisation: R-supermartingales
61. Orientation 163
62. Some real-variable results 163
63. Filtrations; supermartingales; R-processes, R-supermartingales 166
64. Some important examples 167
65. Doob's Regularity Theorem: Part 1 169
66. Partial augmentation 171
67. Usual conditions; R-filtered space; usual augmentation;
R-regularisation 172
68. A necessary pause for thought 174
69. Convergence theorems for R-supermartingales 175
70. Inequalities and $\mathcal{L}^p$ convergence for R-submartingales 177
71. Martingale proof of Wiener's Theorem; canonical
Brownian motion 178
72. Brownian motion relative to a filtered space 180
Stopping times
73. Stopping time T; pre-T σ-algebra $\mathcal{G}_T$; progressive process 181
74. First-entrance (debut) times; hitting times; first-approach times:
the easy cases 183
75. Why 'completion' in the usual conditions has to be introduced 184
76. Debut and Section Theorems 186
77. Optional Sampling for R-supermartingales under the
usual conditions 188
78. Two important results for Markov-process theory 191
79. Exercises 192
6. PROBABILITY MEASURE ON LUSIN SPACES 200
'Weak convergence'
80. C(J) and Pr(J) when J is compact Hausdorff 202
81. C(J) and Pr(J) when J is compact metrizable 203
82. Polish and Lusin spaces 205
83. The $C_b(S)$ topology of Pr(S) when S is a Lusin space;
Prohorov's Theorem 207
84. Some useful convergence results 211
85. Tightness in Pr(W) when W is the path-space $W := C([0,\infty); \mathbb{R})$ 213
86. The Skorokhod representation of $C_b(S)$ convergence on Pr(S) 215
87. Weak convergence versus convergence of finite-dimensional
distributions 216
Regular conditional probabilities
88. Some preliminaries 217
89. The main existence theorem 218
90. Canonical Brownian Motion CBM($\mathbb{R}^N$); Markov property of
$P^x$ laws 220
91. Exercises 222
CHAPTER III. MARKOV PROCESSES
1. TRANSITION FUNCTIONS AND RESOLVENTS 227
1. What is a (continuous-time) Markov process? 227
2. The finite-state-space Markov chain 228
3. Transition functions and their resolvents 231
4. Contraction semigroups on Banach spaces 234
5. The Hille-Yosida Theorem 237
2. FELLER-DYNKIN PROCESSES 240
6. Feller-Dynkin (FD) semigroups 240
7. The existence theorem: canonical FD processes 243
8. Strong Markov property: preliminary version 247
9. Strong Markov property: full version; Blumenthal's 0-1 Law 249
10. Some fundamental martingales; Dynkin's formula 252
11. Quasi-left-continuity 255
12. Characteristic operator 256
13. Feller-Dynkin diffusions 258
14. Characterisation of continuous real Levy processes 261
15. Consolidation 262
3. ADDITIVE FUNCTIONALS 263
16. PCHAFs; λ-excessive functions; Brownian local time 263
17. Proof of the Volkonskii-Sur-Meyer Theorem 267
18. Killing 269
19. The Feynman-Kac formula 272
20. A Ciesielski-Taylor Theorem 275
21. Time-substitution 277
22. Reflecting Brownian motion 278
23. The Feller-McKean chain 281
24. Elastic Brownian motion; the arcsine law 282
4. APPROACH TO RAY PROCESSES:
THE MARTIN BOUNDARY 284
25. Ray processes and Markov chains 284
26. Important example: birth process 286
27. Excessive functions, the Martin kernel and Choquet theory 288
28. The Martin compactification 292
29. The Martin representation; Doob-Hunt explanation 295
30. R. S. Martin's boundary 297
31. Doob-Hunt theory for Brownian motion 298
32. Ray processes and right processes 302
5. RAY PROCESSES 303
33. Orientation 303
34. Ray resolvents 304
35. The Ray-Knight compactification 306
Ray's Theorem: analytical part
36. From semigroup to resolvent 309
37. Branch-points 313
38. Choquet representation of 1-excessive probability measures 315
Ray's Theorem: probabilistic part
39. The Ray process associated with a given entrance law 316
40. Strong Markov property of Ray processes 318
41. The role of branch-points 319
6. APPLICATIONS 321
Martin boundary theory in retrospect
42. From discrete to continuous time 321
43. Proof of the Doob-Hunt Convergence Theorem 323
44. The Choquet representation of Π-excessive functions 325
45. Doob h-transforms 327
Time reversal and related topics
46. Nagasawa's formula for chains 328
47. Strong Markov property under time reversal 330
48. Equilibrium charge 331
49. BM($\mathbb{R}$) and BES(3): splitting times 332
A first look at Markov-chain theory
50. Chains as Ray processes 334
51. Significance of $q_i$ 337
52. Taboo probabilities; first-entrance decomposition 337
53. The Q-matrix; DK conditions 339
54. Local-character condition for Q 340
55. Totally instantaneous Q-matrices 342
56. Last exits 343
57. Excursions from b 345
58. Kingman's solution of the 'Markov characterization problem' 347
59. Symmetrisable chains 348
60. An open problem 349
References for Volumes 1 and 2 351
Index to Volumes 1 and 2 375
Some Frequently Used
Notation
We use ':=' to mean 'is defined to equal'. This Pascal notation can also be used
in reverse. We define
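$\mathbb{Z}^+ := \{0, 1, 2, \ldots\} \supset \{1, 2, 3, \ldots\} =: \mathbb{N}$,
$\mathbb{R}^+ := [0, \infty)$, $\mathbb{R}^{++} := (0, \infty)$, $\mathbb{Q}^+ := \mathbb{Q} \cap \mathbb{R}^+$.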
We neaten layout, and make things easier for our printers, by the use of
alternative notations:
$X(t_1, \omega)$ for $X_{t_1}(\omega)$, $f_n(i)$ for $f_{ni}$, $\mathcal{F}(T_1)$ for $\mathcal{F}_{T_1}$, $P_t f(x)$ for $(P_t f)(x)$,
etc. Once things are underway, such switches in notation will be made without
comment. The composition notation
$f \circ g(t) := f(g(t))$
will often be used for tidiness.
If $f$ and $g$ are real numbers or real-valued functions, we define
$f \vee g := \max(f, g)$, $f \wedge g := \min(f, g)$, $f^+ := f \vee 0$, $f^- := (-f) \vee 0$;
hence $f = f^+ - f^-$ and $|f| = f^+ + f^-$.
If $\mathcal{H}$ is a set of real-valued functions, we write
$\mathcal{H}^+$ for the set of non-negative elements of $\mathcal{H}$,
$b\mathcal{H}$ for the set of bounded elements in $\mathcal{H}$.
If Σ is a σ-algebra, we write
$m\Sigma$ for the set of real-valued (or perhaps $[-\infty, \infty]$-valued)
Σ-measurable functions,
$b\Sigma$ for the space of bounded Σ-measurable functions.
If S is a topological space, we write
C(S) for the space of all continuous functions from S to R.
$C_b(S)$ for the space of all bounded continuous functions from S to R.
Monotone convergence. We write '$s \downarrow t$' to signify that $s \to t$, $s \geq t$; and '$s \uparrow\uparrow t$' to
signify that $s \to t$, $s < t$. If $(s_n)$ is a sequence then '$s_n \uparrow t$' signifies that $s_n \to t$,
$s_n \leq s_{n+1} \leq t$, while '$s_n \uparrow\uparrow t$' signifies that $s_n \to t$, $s_n \leq s_{n+1} < t$. If $f_n$ and $f$ are real-valued
functions then (for example) $f = \uparrow \lim f_n$ signifies that $f_n \uparrow f$ pointwise.
CHAPTER I
Brownian Motion
1. INTRODUCTION
1. What is Brownian motion, and why study it? The first thing is to define
Brownian motion. We assume given some probability triple $(\Omega, \mathcal{F}, P)$.
(1.1) DEFINITION. A real-valued stochastic process $\{B_t : t \in \mathbb{R}^+\}$ is a Brownian
motion if it has the properties
(1.2)(i) $B_0(\omega) = 0$, $\forall \omega$;
(1.2)(ii) the map $t \mapsto B_t(\omega)$ is a continuous function of $t \in \mathbb{R}^+$ for all $\omega$;
(1.2)(iii) for every $t, h \geq 0$, $B_{t+h} - B_t$ is independent of $\{B_u : 0 \leq u \leq t\}$, and has a
Gaussian distribution with mean 0 and variance $h$.
The conditions (1.2)(ii) and (1.2)(iii) are the really essential ones; if $B = \{B_t : t \in \mathbb{R}^+\}$
is a Brownian motion, we frequently speak of $\{\xi + B_t : t \in \mathbb{R}^+\}$ as a Brownian
motion (started at $\xi$); the starting point $\xi$ can be a fixed real, or a random
variable independent of $B$.
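As a rough illustration of Definition (1.1), the following sketch samples a Brownian path on a finite time grid by summing independent Gaussian increments, as (1.2)(iii) prescribes. The function name `brownian_path`, the grid, and the use of Python's standard `random` module are assumptions made for the example; the grid values say nothing about the path between grid points, so continuity (1.2)(ii) is not captured.

```python
import math
import random


def brownian_path(n_steps, horizon, rng):
    """Sample B at the grid points k * horizon / n_steps, k = 0..n_steps.

    Illustrative helper (not from the text): each increment over a step of
    length dt is an independent N(0, dt) draw, mirroring (1.2)(iii).
    """
    dt = horizon / n_steps
    times = [k * dt for k in range(n_steps + 1)]
    values = [0.0]  # (1.2)(i): the path starts at 0
    for _ in range(n_steps):
        # rng.gauss takes the standard deviation, hence sqrt(dt)
        values.append(values[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return times, values
```

Adding a constant `xi` to every entry of `values` gives the grid values of a Brownian motion started at `xi`.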
Now that we know what a Brownian motion is, questions of existence and
uniqueness (answered in Section 6) are less important than an answer to the
second question of the title, 'Why study it?' There are many answers to this
question, but to us there seem to be four main ones:
(i) Virtually every interesting class of processes contains Brownian motion—
Brownian motion is a martingale, a Gaussian process, a Markov process, a
diffusion, a Levy process,...;
(ii) Brownian motion is sufficiently concrete that one can do explicit calculations,
which are impossible for more general objects;
(iii) Brownian motion can be used as a building block for other processes
(indeed, a number of the most important results on Brownian motion state
that the most general process in a certain class can be obtained from
Brownian motion by some sequence of transformations);
(iv) last but not least, Brownian motion is a rich and beautiful mathematical
object in its own right.
The aim of this chapter is to expand on these reasons, and convince you that
Brownian motion is indeed worthy of study; and the rest of this introduction
gives a brief outline of some of the main points of the chapter.
2. Brownian motion as a martingale. Let $\{B_t : t \geq 0\}$ be a Brownian motion, and
define $\mathcal{B}_t = \sigma(\{B_s : s \leq t\})$. Then $(B_t, \mathcal{B}_t)_{t \geq 0}$ is a martingale. We shall have a lot
more to say about martingales in Chapter II, but for now we need little of the
theory developed there. Let us just check that $(B_t, \mathcal{B}_t)_{t \geq 0}$ is a martingale
(cf. Section II.63); first, $B_t \in L^1$ for all $t$, because, from (1.2)(i) and (1.2)(iii),
$B_t \sim N(0, t)$, and, secondly, for $0 \leq s \leq t$,
$E[B_t - B_s \mid \mathcal{B}_s] = 0$, equivalently, $E[B_t \mid \mathcal{B}_s] = B_s$,
since $B_t - B_s$ is independent of $\mathcal{B}_s$ by (1.2)(iii). Likewise, since $B_t - B_s \sim N(0, t - s)$
independently of $\mathcal{B}_s$, we have
$E[(B_t - B_s)^2 \mid \mathcal{B}_s] = t - s$.
But
$E[(B_t - B_s)^2 \mid \mathcal{B}_s] = E[B_t^2 - 2 B_t B_s + B_s^2 \mid \mathcal{B}_s] = E[B_t^2 \mid \mathcal{B}_s] - B_s^2$,
using properties of conditional expectation (Section II.41), so since we have
(almost surely) that $E[B_t^2 - t \mid \mathcal{B}_s] = B_s^2 - s$, we conclude that
(2.1) $B_t^2 - t$ is a martingale.
This simple fact is a pointer to the development of stochastic integrals; once
that theory is developed, we shall be in a position to prove the following
startling converse to (2.1).
(2.2) THEOREM. (Levy) Let $(X_t)_{t \geq 0}$ be a continuous martingale, $X_0 = 0$, and
suppose that
$X_t^2 - t$ is a martingale.
Then $X$ is a Brownian motion.
By a continuous martingale, we mean of course one such that $t \mapsto X_t(\omega)$ is a
continuous map for all $\omega$. We have not been too specific about the filtration
$(\mathcal{F}_t)_{t \geq 0}$ with respect to which $X$ is a martingale, but this is not necessary; if $X$
is a martingale with respect to $(\mathcal{F}_t)_{t \geq 0}$, and satisfies the hypotheses of Theorem 2.2
then $X$ is an $(\mathcal{F}_t)$ Brownian motion; that is, $X$ satisfies (1.2)(i), (1.2)(ii) and the
stronger condition
(1.2)(iii)' for any $t, h \geq 0$, $X_{t+h} - X_t$ is independent of $\mathcal{F}_t$ and has a Gaussian
distribution with mean zero and variance $h$.
The Kunita-Watanabe proof of Theorem 2.2 is given in Section IV.33; a more
elementary proof without using stochastic calculus appears in Doob [1].
A remarkable consequence of Theorem 2.2 is that
(2.3) every continuous martingale is a time-change of Brownian motion.
For a statement and proof of this, see Section IV.34. One extremely useful
consequence is that, since
$P(\limsup B_t = +\infty,\ \liminf B_t = -\infty) = 1$
(as we shall see in Lemma 3.6), if $X$ is a continuous martingale for which
$P(\liminf X_t = -\infty) > 0$ then we must have $P(\limsup X_t = +\infty) > 0$. See Section
IV.34 for a full discussion.
The elementary arguments that gave (2.1) also show that for any $\theta \in \mathbb{R}$
(or indeed, for $\theta \in \mathbb{C}$)
(2.4) $\exp(\theta B_t - \tfrac{1}{2}\theta^2 t)$ is a martingale;
all one needs is that $E(\exp[\theta(B_t - B_s)]) = \exp[\tfrac{1}{2}\theta^2 (t - s)]$ for $0 \leq s \leq t$, which
is just the moment-generating function of a Gaussian distribution. These
exponential martingales are extremely useful in many ways; in Section 9 we use
them to compute the Brownian first-passage distribution to a level, and in
Section 16 we derive the Law of the Iterated Logarithm using them.
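The one computation behind (2.4), the Gaussian moment-generating function, can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the function names, the trapezoidal rule, and the grid parameters `n` and `width` are choices made here, not anything taken from the text.

```python
import math


def gaussian_mgf_numeric(theta, var, n=20001, width=12.0):
    """Approximate E[exp(theta * Z)] for Z ~ N(0, var) by the trapezoidal rule."""
    sd = math.sqrt(var)
    # exp(theta * x) times the N(0, var) density peaks at x = theta * var,
    # so the integration window is centred there.
    lo = theta * var - width * sd
    hi = theta * var + width * sd
    step = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * math.exp(theta * x - x * x / (2.0 * var))
    return total * step / math.sqrt(2.0 * math.pi * var)


def exponential_martingale_mean(theta, s, t):
    """E[exp(theta (B_t - B_s) - theta^2 (t - s) / 2)], which should equal 1."""
    return gaussian_mgf_numeric(theta, t - s) * math.exp(-0.5 * theta ** 2 * (t - s))
```

A value of `exponential_martingale_mean` close to 1 for several choices of `theta`, `s` and `t` is consistent with (2.4), though of course it proves nothing.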
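One small point to note here in connection with the exponential martingales
(2.4) is that if we define the Hermite polynomials $H_n(t, x)$ by
$\exp(\theta x - \tfrac{1}{2}\theta^2 t) := \sum_{n \geq 0} \frac{\theta^n}{n!} H_n(t, x)$,
then, for $0 \leq s \leq t$,
$E(\exp(\theta B_t - \tfrac{1}{2}\theta^2 t) \mid \mathcal{B}_s) = \sum_{n \geq 0} \frac{\theta^n}{n!} E(H_n(t, B_t) \mid \mathcal{B}_s)$
$= \exp(\theta B_s - \tfrac{1}{2}\theta^2 s)$
$= \sum_{n \geq 0} \frac{\theta^n}{n!} H_n(s, B_s)$,
so, by comparing coefficients of $\theta^n$, we deduce that
$H_n(t, B_t)$ is a martingale for each $n$.
It is easy to check that $H_1(t, x) = x$ and $H_2(t, x) = x^2 - t$, so, in particular,
(2.4) $\Rightarrow$ (2.1); Levy's Theorem 2.2 is essentially the converse to this.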
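The polynomials $H_n(t, x)$ are easy to tabulate. The sketch below assumes the three-term recursion $H_{n+1} = x H_n - n t H_{n-1}$, which is not stated in the text but follows from differentiating the generating function in $\theta$; the function names are likewise illustrative.

```python
import math


def hermite(n, t, x):
    """Evaluate H_n(t, x) via the assumed recursion H_{k+1} = x H_k - k t H_{k-1}."""
    h_prev, h_curr = 1.0, x  # H_0 and H_1
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h_curr = h_curr, x * h_curr - k * t * h_prev
    return h_curr


def generating_sum(theta, t, x, terms=40):
    """Partial sum of theta^n / n! * H_n(t, x) over n = 0..terms-1."""
    total, coeff = 0.0, 1.0  # coeff holds theta^n / n!
    for n in range(terms):
        total += coeff * hermite(n, t, x)
        coeff *= theta / (n + 1)
    return total


def generating_function(theta, t, x):
    """The closed form exp(theta x - theta^2 t / 2) that defines the H_n."""
    return math.exp(theta * x - 0.5 * theta ** 2 * t)
```

Comparing `generating_sum` with `generating_function` for small `theta` gives a quick check that the recursion reproduces the defining series, and `hermite(2, t, x)` returns `x * x - t` as in the text.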
(2.5) Remark. If $(N_t)_{t \geq 0}$ is a standard Poisson process then $X_t := N_t - t$ satisfies
all of the hypotheses of Theorem 2.2 except for continuity of the paths.
3. Brownian motion as a Gaussian process. In complete generality, a (real-valued)
process $(X_t)_{t \in T}$ indexed by some set $T$ is said to be a Gaussian process
if, for any $t_1, \ldots, t_n \in T$, the law of $(X(t_1), \ldots, X(t_n))$ is multivariate Gaussian. Thus
the law of the process $X$ is specified by the functions
$\mu(t) := E X_t$, $\rho(s, t) := \mathrm{cov}(X_s, X_t)$.
(By this, we mean no more than that if we were told $\mu$ and $\rho$, we could work
out the law of $(X(t_1), \ldots, X(t_n))$ for any $t_1, \ldots, t_n \in T$.) In the study of Gaussian
processes, one usually assumes that $\mu = 0$, to which the general case can be
reduced by considering the Gaussian process $X_t - \mu(t)$.
It is obvious that $(B_t)_{t \geq 0}$ is a Gaussian process, with mean zero, and covariance
(3.1) $\rho(s, t) = s \wedge t$ $(s, t \geq 0)$.
Any continuous real-valued process $(X_t)_{t \geq 0}$ that is a zero-mean Gaussian process
with covariance (3.1) is a Brownian motion: just check the definition! This
simple fact turns out to be an extremely efficient means of checking when a
process is a Brownian motion, and the following four simple but extremely
important examples serve to illustrate this:
(3.2) the process $(-B_t)_{t \geq 0}$ is a Brownian motion;
(3.3) for any $a \geq 0$, the process $(B_{t+a} - B_a)_{t \geq 0}$ is a Brownian motion;
(3.4) for any $c \neq 0$, $(c B_{t/c^2})_{t \geq 0}$ is a Brownian motion (Brownian scaling);
(3.5) the process $(\tilde{B}_t)_{t \geq 0}$ defined by
$\tilde{B}_0 = 0$,
$\tilde{B}_t = t B_{1/t}$ for $t > 0$,
is a Brownian motion.
The proofs of these properties are trivial exercises, with the sole exception of
the proof of continuity at 0 of $\tilde{B}$. But this is not difficult, because the event that
$\tilde{B} \to 0$ at 0 is
$\tilde{F} := \bigcap_n \bigcup_m \bigcap_{q \in \mathbb{Q} \cap (0, 1/m]} \{|\tilde{B}_q| \leq 1/n\}$,
since $\tilde{B}$ is certainly continuous in $(0, \infty)$. But the processes $(B_t)_{t > 0}$ and $(\tilde{B}_t)_{t > 0}$ are
continuous, and have the same distribution (they are Gaussian processes with
the same covariance!), so
$P(\tilde{F}) = P(F) = 1$,
where $F$ is the event $\bigcap_n \bigcup_m \bigcap_{q \in \mathbb{Q} \cap (0, 1/m]} \{|B_q| \leq 1/n\}$ that $B \to 0$ at 0, which, by
the definition of $B$, is certain.
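The 'trivial exercises' behind (3.2)-(3.5) all come down to checking that each transformed process again has covariance $s \wedge t$. A small sketch of those covariance computations follows, using only bilinearity of covariance and (3.1); the function names and the helper `matches_brownian` are hypothetical, introduced for the illustration.

```python
import math


def cov_bm(s, t):
    """Covariance (3.1) of Brownian motion: min(s, t)."""
    return min(s, t)


def cov_negated(s, t):
    """Covariance of -B_s and -B_t, as in (3.2)."""
    return (-1.0) * (-1.0) * cov_bm(s, t)


def cov_shifted(s, t, a):
    """Covariance of B_{s+a} - B_a and B_{t+a} - B_a, as in (3.3)."""
    return cov_bm(s + a, t + a) - cov_bm(s + a, a) - cov_bm(a, t + a) + cov_bm(a, a)


def cov_scaled(s, t, c):
    """Covariance of c B_{s/c^2} and c B_{t/c^2}, as in (3.4); requires c != 0."""
    return c * c * cov_bm(s / (c * c), t / (c * c))


def cov_inverted(s, t):
    """Covariance of s B_{1/s} and t B_{1/t} for s, t > 0, as in (3.5)."""
    return s * t * cov_bm(1.0 / s, 1.0 / t)


def matches_brownian(cov, pairs):
    """True if cov(s, t) agrees with min(s, t), up to rounding, on every pair."""
    return all(math.isclose(cov(s, t), min(s, t), rel_tol=1e-9) for s, t in pairs)
```

For instance, `matches_brownian(cov_inverted, [(0.5, 2.0), (3.0, 0.25)])` checks the time-inversion covariance at two pairs of times. Continuity at 0 of the time-inverted process is a separate matter, handled by the argument above.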
The most important by far of the properties (3.2)-(3.5) is the Brownian scaling
property (3.4). We shall give here an easy but striking consequence.
(3.6) LEMMA. We have
$P(\sup_t B_t = +\infty,\ \inf_t B_t = -\infty) = 1$.
Proof. Let $Z := \sup_t B_t$. By Brownian scaling, for any $c > 0$, we have
$cZ \overset{\mathcal{D}}{=} Z$,
so the law of $Z$ is concentrated on $\{0, +\infty\}$. Let $p = P(Z = 0)$. Then
$P(Z = 0) \leq P[B_1 \leq 0 \text{ and } B_u \leq 0 \text{ for all } u \geq 1]$
$= P[B_1 \leq 0 \text{ and } \sup_{t \geq 0} (B_{1+t} - B_1) = 0]$
because $(B_{1+t} - B_1)_{t \geq 0}$ is a Brownian motion, whose supremum is therefore 0
or $+\infty$. But $(B_{1+t} - B_1)_{t \geq 0}$ is independent of $(B_u)_{u \leq 1}$, so we deduce that
$p = P(Z = 0) \leq P(B_1 \leq 0) P(Z = 0) = \tfrac{1}{2} p$,
whence $p = 0$. Combining with (3.2) gives the stated result. □
Lemma 3.6 implies straight away that, almost surely, for each $a \in \mathbb{R}$, $\{t : B_t = a\}$
is not bounded above. Thus Brownian motion is recurrent: it keeps returning
to its starting point.
We shall have more to say about Gaussian processes in Part 4 of this chapter,
but point out now that the discussion there is by way of an interesting digression
from our main theme; the general setting for Gaussian processes is too general
to permit full exploitation of the special features of Brownian motion (notably
a completely ordered index set).
4. Brownian motion as a Markov process. Brownian motion is a
(time-homogeneous) Markov process; for any bounded Borel $f : \mathbb{R} \to \mathbb{R}$, and $s, t \geq 0$,
(4.1) $E[f(B_{t+s}) \mid \mathcal{B}_s] = P_t f(B_s)$
where the transition semigroup $(P_t)_{t \geq 0}$ is defined by
$P_t f(x) := \int_{\mathbb{R}} p_t(x, y) f(y)\,dy$ $(t > 0)$, $P_t f(x) := f(x)$ $(t = 0)$,
where
(4.2) $p_t(x, y) := (2\pi t)^{-1/2} \exp\left[-\frac{(x - y)^2}{2t}\right]$
is the Brownian transition density. The Markov property (4.1) is immediate
from the definition of Brownian motion. It is easy to confirm that $(P_t)_{t \geq 0}$ is a
semigroup:
(4.3) $P_{t+s} = P_t P_s = P_s P_t$ $(s, t \geq 0)$,
the so-called Chapman-Kolmogorov equations. The semigroup property (4.3)
suggests that we ought in some sense to have
(4.4) $\frac{d}{dt} P_t = \lim_{s \downarrow 0} \frac{1}{s}(P_{t+s} - P_t) = P_t \mathcal{G} = \mathcal{G} P_t$,
where
(4.5) $\mathcal{G} := \lim_{t \downarrow 0} \frac{1}{t}(P_t - I)$
is the (infinitesimal) generator of $(P_t)_{t \geq 0}$. This is indeed true in complete generality,
when suitably interpreted; the suitable interpretation involves us in some fairly
careful analysis, because in general $\mathcal{G}$ is not defined for all functions, and much
of the classical early work on Markov processes struggled with these technicalities.
This functional-analytic viewpoint has many merits, not least that it can suggest
quickly what things are likely to be true, but we shall not stress it too much
because it is not a very convenient framework in which to prove the conjectures
to which it leads. But, for now, let us illustrate the notion by working out the
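generator of Brownian motion. From (4.5), we should define $\mathcal{G}f$ for suitable $f$ by
$\mathcal{G}f := \lim_{t \downarrow 0} \frac{1}{t}(P_t f - f)$
and, indeed, if $f \in C_K^2(\mathbb{R})$ then
$\lim_{t \downarrow 0} \frac{1}{t}(P_t f - f)(x) = \lim_{t \downarrow 0} \int_{-\infty}^{\infty} \frac{1}{t}\{f(x + y\sqrt{t}) - f(x)\} \exp(-\tfrac{1}{2} y^2) \frac{dy}{\sqrt{2\pi}}$
$= \lim_{t \downarrow 0} \int_{-\infty}^{\infty} \{y t^{-1/2} f'(x) + \tfrac{1}{2} y^2 f''(x + \theta y \sqrt{t})\} \exp(-\tfrac{1}{2} y^2) \frac{dy}{\sqrt{2\pi}}$
$= \tfrac{1}{2} f''(x)$
(where $\theta \in (0, 1)$ depends on $y\sqrt{t}$).
Thus the infinitesimal generator of Brownian motion is
$\frac{1}{2} \frac{d^2}{dx^2}$,
at least when applied to $C_K^2(\mathbb{R})$. From (4.4), we find that, for $f \in C_K^2(\mathbb{R})$,
$\frac{\partial}{\partial t} P_t f(x) = \mathcal{G} P_t f(x) = \tfrac{1}{2} (P_t f)''(x)$,
which leads to Kolmogorov's backward equation for the Brownian transition
density:
(4.6) $\frac{\partial}{\partial t} p_t(x, y) = \frac{1}{2} \frac{\partial^2}{\partial x^2} p_t(x, y)$,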
since $f$ is arbitrary. Using the other part of (4.4) gives us
$\frac{\partial}{\partial t} P_t f(x) = P_t \mathcal{G} f(x) = \tfrac{1}{2} P_t f''(x)$,
and an integration by parts now yields Kolmogorov's forward equation for the
Brownian transition density:
(4.7) $\frac{\partial}{\partial t} p_t(x, y) = \frac{1}{2} \frac{\partial^2}{\partial y^2} p_t(x, y)$.
This equation is familiar in physics, where it is known as the heat equation, or
the diffusion equation, so called because it determines the physical flow of heat,
or the physical diffusion of particles in solution, in a homogeneous medium.
Many of the notions of diffusion that probabilists use everyday were known
to physicists long ago, and amount to the same things in different language
(see, for example, the classic book by Crank [1] for a physicists' exposition—and
a broad selection of fascinating and challenging questions). It is however
important to stress that we are not simply going to be rederiving results well
known in physics; probability provides techniques for the study of individual
diffusing particles, which are far more flexible and powerful than the classical
analysis of the heat equation, which is only a statement about the average
behaviour of a large number of diffusing particles.
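Both the Chapman-Kolmogorov equations (4.3) and the heat equation can be spot-checked numerically from the explicit density (4.2). The sketch below is a rough illustration only: the function names, the trapezoidal quadrature and the finite-difference step `h` are assumptions of the example.

```python
import math


def p(t, x, y):
    """Brownian transition density (4.2)."""
    return math.exp(-(x - y) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def chapman_kolmogorov_integral(s, t, x, y, n=4001, width=10.0):
    """Trapezoidal approximation to the integral over z of p_s(x, z) p_t(z, y)."""
    centre = 0.5 * (x + y)
    half = width * math.sqrt(s + t) + abs(x - y)  # generous integration window
    step = 2.0 * half / (n - 1)
    total = 0.0
    for k in range(n):
        z = centre - half + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * p(s, x, z) * p(t, z, y)
    return total * step


def heat_residual(t, x, y, h=1e-3):
    """Central-difference estimate of d/dt p - (1/2) d^2/dx^2 p at (t, x, y)."""
    dp_dt = (p(t + h, x, y) - p(t - h, x, y)) / (2.0 * h)
    d2p_dx2 = (p(t, x + h, y) - 2.0 * p(t, x, y) + p(t, x - h, y)) / (h * h)
    return dp_dt - 0.5 * d2p_dx2
```

The quadrature should agree with `p(s + t, x, y)`, and `heat_residual` should be small, up to discretisation error. Because $p_t(x, y)$ is symmetric in $x$ and $y$, the same residual serves for the backward equation (4.6) and the forward equation (4.7).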
5. Brownian motion as a diffusion (and martingale). Without trying to be too
precise, a diffusion (on the real line for now) is a continuous time-homogeneous
Markov process X that is 'characterised' in some sense by its local infinitesimal
drift b and variance a: for small h,
(5.1)(i) $E[X_{t+h} - X_t \mid \mathcal{F}_t] = h\,b(X_t)$,
(5.1)(ii) $E[\{X_{t+h} - X_t - h\,b(X_t)\}^2 \mid \mathcal{F}_t] = h\,a(X_t)$.
If $a$ and $b$ were constant functions then
$X_t = \sigma B_t + bt$, $\sigma := a^{1/2}$,
would satisfy the description (5.1); the more general diffusion is rather similar
except that the drift and variance may now depend on position. It is unnecessary
to impose conditions on moments of the increment Xt+h — Xt beyond the
second, which you will certainly accept as plausible if you recall Lévy's Theorem
(2.2), which said that Brownian motion is characterised by b = 0, a = 1.
Broadly speaking, there are three approaches to diffusions: the stochastic
differential equation (SDE) approach, the martingale-problem approach, and
the partial differential equation (PDE) approach. Each has its merits and peculiar
techniques.
The SDE approach constructs the diffusion X with given infinitesimal characteristics a and b by solving
(5.2)  $X_t = X_0 + \int_0^t \sigma(X_s)\,dB_s + \int_0^t b(X_s)\,ds,$
where $\sigma := a^{1/2}$, an equation that is commonly written in 'differential' form
(5.3)  $dX_t = \sigma(X_t)\,dB_t + b(X_t)\,dt.$
Thus X has infinitesimal drift b and infinitesimal variance a, since the increment
$X_{t+h} - X_t$ is (approximately)
$\sigma(X_t)(B_{t+h} - B_t) + h\,b(X_t).$
There is a lot of work involved in defining what the second term on the
right-hand side of (5.2) means, and in verifying existence and uniqueness of a
solution under suitable conditions on σ and b; we shall have almost nothing
to say on this until Volume 2.
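The approximate increment just displayed suggests a simple way to simulate (5.3) on a time grid. The sketch below uses the standard Euler-Maruyama scheme, which is a numerical device and not the construction used in this book; the function name and its parameters are illustrative.

```python
import math
import random

def euler_maruyama(sigma, b, x0, T, n_steps, rng):
    """Approximate a path of dX = sigma(X) dB + b(X) dt on [0, T].

    Each step adds sigma(X_t)(B_{t+h} - B_t) + h b(X_t), with the Brownian
    increment sampled as an independent N(0, h) variable.
    """
    h = T / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(h))
        x = x + sigma(x) * dB + b(x) * h
        path.append(x)
    return path

# With sigma = 0 the scheme reduces to the deterministic motion x0 + b t.
drift_only = euler_maruyama(lambda x: 0.0, lambda x: 2.0, 1.0, 1.0, 100, random.Random(0))
```

For constant coefficients the scheme reproduces the law of σB_t + bt exactly at the grid points; for position-dependent coefficients it is only an approximation.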
The martingale problem approach and the PDE approach both begin from
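the same trivial calculation based on (5.1). For any $f \in C_K^\infty$,
(5.4)  $E[f(X_{t+h}) - f(X_t) \mid \mathcal{F}_t]$
$= E[f'(X_t)(X_{t+h} - X_t) + \tfrac12 f''(\theta X_{t+h} + (1 - \theta)X_t)(X_{t+h} - X_t)^2 \mid \mathcal{F}_t]$
(where $\theta \in (0,1)$ is random)
$= f'(X_t)\,h\,b(X_t) + \tfrac12 f''(X_t)\{h\,a(X_t) + h^2 b(X_t)^2\}$
$= h\,\mathcal{L}f(X_t) + O(h^2),$
where $\mathcal{L}$ is the second-order elliptic operator,
(5.5)  $\mathcal{L}f(x) := \tfrac12 a(x)\frac{d^2 f}{dx^2}(x) + b(x)\frac{df}{dx}(x).$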
The martingale-problem approach takes (5.4) and re-expresses it as
$E[f(X_{t+h}) - f(X_t) - \int_t^{t+h} \mathcal{L}f(X_s)\,ds \mid \mathcal{F}_t] = o(h),$
so that the martingale-problem 'definition' of a diffusion X with drift b and
variance a is that X is a continuous process such that, for all $f \in C_K^\infty$,
(5.6)  $f(X_t) - \int_0^t \mathcal{L}f(X_s)\,ds$ is a martingale.
The PDE approach takes expectations on both sides of (5.4) to get
(5.7)  $P_h f(x) - f(x) = h\,\mathcal{L}f(x) + O(h^2),$
so that, dividing by h and letting $h \downarrow 0$,
(5.8)  the infinitesimal generator $\mathcal{G}$ of X is $\mathcal{L}$.
The PDE approach is now ready to go, with all of the arsenal of PDE techniques
at its disposal; for example, one may begin by looking for the fundamental
solution pt(x, y) to
$\frac{\partial}{\partial t} p_t(x, y) = \mathcal{L}_x p_t(x, y), \quad p_0(x, y) = \delta_y(x), \quad p_t \ge 0,$
where $\mathcal{L}_x$ is the operator $\mathcal{L}$ acting on the x-variable, and $\delta_y$ is the Dirac delta
function at y. This fundamental solution is the transition density of the diffusion,
from which one can obtain much information; see Chapter 3 of Stroock and
Varadhan [1], which is also the definitive account of the martingale-problem
method applied to multidimensional diffusions. There are still real problems of
definition, existence and uniqueness for each of the three approaches—least
severe for the PDE approach. But the additional price to be paid for using
stochastic methods is worth it; the conditions imposed on a and b to get a PDE
result to work are generally of a global nature, whereas the diffusion, being
continuous, should only care about local behaviour. The stochastic methods are
just right for this—once a diffusion leaves a region where everything is nice,
we can stop it, and solve in the nice region, thereby giving results under only
local conditions. We have great admiration and respect for the PDE approach—
the analysts' fine results are not just valid for second-order elliptic operators,
which is the case with the probabilistic results. The last word for the moment
on the comparison between the three methods must be with Sid Port: 'The one
thing probabilists can do which analysts can't is stop—and they never forgive
us for it.'
You will realise by now that one can perfectly well have diffusions in dimension
greater than one, but the one-dimensional diffusion theory is essentially complete,
thanks to Brownian local time. The existence and properties of Brownian local
time form the first non-trivial result in the theory (after the existence of Brownian
motion itself).
(5.9) THEOREM (Trotter). There exists a process $\{l(t, x): t \ge 0, x \in \mathbb{R}\}$ such that
(5.10)(i)  $(t, x) \mapsto l(t, x)$ is jointly continuous;
(5.10)(ii)  for any bounded measurable f and $t \ge 0$,
$\int_0^t f(B_s)\,ds = \int_{-\infty}^{\infty} f(x)\,l(t, x)\,dx.$
This is a deep result, whose proof using stochastic calculus we shall finally give
in Section IV.44. The key property (5.10)(ii) is the occupation density formula;
we shall discuss some of the implications for the Brownian sample path in
Section 10, but for now we describe the most general regular diffusion (see
Sections V.44-54 for the whole story).
(5.11) THEOREM. A (regular) one-dimensional diffusion X on an interval I can
be obtained from Brownian motion B as
$X_t = s^{-1}(B(\tau_t)),$
where the scale function $s: I \to \mathbb{R}$ is continuous and strictly increasing and
$\tau_t = \inf\{u: A_u > t\}$, where
$A_u = \int m(dx)\,l(u, x)$
for some measure m (the speed measure) that puts positive finite mass on bounded
non-empty open subintervals of I which exclude the endpoints of I.
Any pair (s, m) gives rise to a regular diffusion, which is uniquely characterised
by (s, m).
The theory of diffusions in dimension greater than one is still much less
complete, and doubtless will remain so.
2. BASICS ABOUT BROWNIAN MOTION
6. Existence and uniqueness of Brownian motion. The existence proof for
Brownian motion that we now give (due to Ciesielski [1]) is the ultimate
refinement of Wiener's original idea of representing Brownian motion as a
random Fourier series.
(6.1) THEOREM. There exists a probability space on which it is possible to
define a process $(B_t)_{0 \le t \le 1}$ with the properties
(i) $B_0(\omega) = 0$ for all $\omega$;
(ii) the map $t \mapsto B_t(\omega)$ is a continuous function of $t \in [0, 1]$ for all $\omega$;
(iii) for every $0 \le s \le t \le 1$, $B_t - B_s$ is independent of $\{B_u: u \le s\}$ and has a
$N(0, t - s)$ distribution.
Proof. Take some probability space on which there is defined an infinite
sequence of independent N(0,1) random variables. For reasons that will soon
be apparent, we assume that they are indexed as $\{Z_{k,n}: n \in \mathbb{Z}^+, k \text{ odd}, k \le 2^n\}$.
Now define
$g_{1,0}(t) = 1,$
$g_{k,n}(t) = \begin{cases} 2^{(n-1)/2} & ((k-1)2^{-n} < t \le k2^{-n}), \\ -2^{(n-1)/2} & (k2^{-n} < t \le (k+1)2^{-n}), \\ 0 & \text{otherwise}, \end{cases}$
for $n \ge 1$, $k \le 2^n$, k odd. For notational convenience, let $S_n = \{(k, n): k \text{ odd}, k \le 2^n\}$,
$S = \bigcup_{n \ge 0} S_n$. The first thing to notice is that $\{g_{k,n}: (k, n) \in S\}$ is a complete
orthonormal system in $L^2[0, 1]$. The orthonormality of the $g_{k,n}$ is easy to check;
and, for completeness, if $f \in L^2[0, 1]$ were orthogonal to all the $g_{k,n}$ then
$F(t) = \int_0^t f(u)\,du$ would vanish at 0 and 1 (since $f \perp g_{1,0}$), and also at $\tfrac12$ (since
$f \perp g_{1,1}$); and also at $\tfrac14, \tfrac34$ (since $f \perp g_{1,2}, g_{3,2}$), .... Thus F = 0, and f = 0.
Now define $f_{k,n}(t) := \int_0^t g_{k,n}(u)\,du$ and the approximations $B_n(\cdot)$ to Brownian
motion by
$B_n(t) = \sum_{m=0}^{n} \sum_{(k,m) \in S_m} Z_{k,m} f_{k,m}(t).$
Let us describe what these approximations are doing.
The first approximation $B_0$ is simply $tZ_{1,0}$, a straight line. The next
approximation is obtained by adding on a Gaussian multiple of $f_{1,1}$, which is a
tent-shaped function, vanishing at 0 and 1. The next approximation is obtained
by adding on two Gaussian multiples of tent-shaped functions, which both vanish
at 0, $\tfrac12$ and 1. The first three approximations are illustrated in Fig. 1.1. So what
is happening is that the nth approximation is piecewise-linear and continuous,
and is equal to the limit value at each point of the form $j2^{-n}$.
The next stage of the proof is to establish that the Bn converge uniformly
almost surely. Indeed, for any positive constant $a_n$,
$P(\sup_{0 \le t \le 1} |B_n(t) - B_{n-1}(t)| > a_n)$
$= P(\sup_{(k,n) \in S_n} |Z_{k,n}| > 2^{(n+1)/2} a_n)$
(since the $f_{k,n}$ are all at most $2^{-(n+1)/2}$)
$\le 2^{n-1} P(|Z_{1,n}| > 2^{(n+1)/2} a_n)$
$\le (4\pi)^{-1/2}\,2^{n/2}\,a_n^{-1} \exp(-a_n^2 2^n)$
by the elementary estimate
$\int_x^{\infty} \exp(-\tfrac12 y^2)\,dy \le x^{-1} \exp(-\tfrac12 x^2).$
[Fig. 1.1: the first three approximations to Brownian motion]
We now aim to choose the constants $a_n$ in such a way that
$\sum_n 2^{n/2} a_n^{-1} \exp(-a_n^2 2^n) < \infty, \qquad \sum_n a_n < \infty.$
The first of these conditions will ensure that, almost surely,
$\sup_{0 \le t \le 1} |B_n(t) - B_{n-1}(t)| \le a_n$ for all large enough n;
the second will guarantee that the $B_n$ converge uniformly (almost surely) to a
limit B, which is therefore continuous. But these conditions are satisfied by the
choice $a_n = (n2^{-n})^{1/2}$, for example.
Thus we have proved that, almost surely, the Bn converge uniformly to some
continuous limit B, which we now must show is Brownian motion. As we saw
in Section 3, the simplest way to do this is to check that B is a zero-mean
Gaussian process with covariance structure $E(B_sB_t) = s \wedge t$. Obviously, each $B_n$
is a zero-mean Gaussian process: the vector $(B_n(t_1), \ldots, B_n(t_k))$ is multivariate
Gaussian. This converges almost surely (and so in distribution) to $(B(t_1), \ldots, B(t_k))$,
which also has a zero-mean Gaussian law, and the limit of the covariances of
the $B_n$ gives the covariance of B. But
$E[B_n(s)B_n(t)] = \sum_{m=0}^{n} \sum_{(k,m) \in S_m} f_{k,m}(s) f_{k,m}(t),$
by independence of the $Z_{k,m}$, and this converges as $n \uparrow \infty$ to
$\sum_{(k,m) \in S} f_{k,m}(s) f_{k,m}(t) = \int_0^1 I_{[0,s]}(u) I_{[0,t]}(u)\,du = s \wedge t,$
since $f_{k,n}(s) = \int_0^1 I_{[0,s]}(u) g_{k,n}(u)\,du$ is the Fourier coefficient of $g_{k,n}$ in the
representation of $I_{[0,s]}$ in terms of the complete orthonormal system $\{g_{k,n}: (k, n) \in S\}$.
Parseval's identity concludes the proof. □
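The construction in this proof is concrete enough to run. The following sketch builds the tent functions $f_{k,n}$, samples an approximation $B_N$, and evaluates the partial Parseval sum for the covariance; all function names are illustrative, and the pseudo-random Gaussian coefficients stand in for the $Z_{k,n}$.

```python
import math
import random

def index_set(n):
    """S_n = {(k, n): k odd, k <= 2^n}; for n = 0 this is just (1, 0)."""
    return [(1, 0)] if n == 0 else [(k, n) for k in range(1, 2 ** n, 2)]

def schauder(k, n, t):
    """f_{k,n}(t): the integral over [0, t] of the Haar function g_{k,n}."""
    if n == 0:
        return t  # g_{1,0} = 1, so f_{1,0}(t) = t
    left, mid, right = (k - 1) * 2.0 ** -n, k * 2.0 ** -n, (k + 1) * 2.0 ** -n
    height = 2.0 ** ((n - 1) / 2)  # |g_{k,n}| on its support
    if t <= left or t >= right:
        return 0.0
    return height * (t - left) if t <= mid else height * (right - t)

def approximation(N, rng):
    """Return B_N as a function of t, using fresh N(0,1) coefficients Z_{k,m}."""
    Z = {km: rng.gauss(0.0, 1.0) for n in range(N + 1) for km in index_set(n)}
    return lambda t: sum(z * schauder(k, m, t) for (k, m), z in Z.items())

def partial_covariance(s, t, N):
    """E[B_N(s) B_N(t)]: the sum of f_{k,m}(s) f_{k,m}(t) over levels m <= N."""
    return sum(schauder(k, m, s) * schauder(k, m, t)
               for n in range(N + 1) for (k, m) in index_set(n))

B4 = approximation(4, random.Random(1))
```

At dyadic points of level N the tent functions of higher level vanish, so the partial sum up to level N already equals s ∧ t there; for example `partial_covariance(0.25, 0.75, 2)` is 0.25 up to rounding.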
(6.2) COROLLARY. There exists a probability space on which Brownian motion
can be defined.
The idea is obvious; we can build on some probability space independent
copies of the process constructed in Theorem 6.1, and then stick them together
to make a process $(B_t)_{t \ge 0}$. We leave the reader to satisfy himself or herself that
this can easily be done.
(6.3) Remark. The observant reader will notice that in Theorem 6.1 we obtained
B as the almost sure uniform limit of continuous functions, and may be worrying
what happens on the null set: define B = 0 there!
Now we turn to the uniqueness of Brownian motion, a much simpler matter,
once one decides the sense in which Brownian motion is unique. But this can
only be in the distributional sense: there is a unique probability measure P
(called Wiener measure) on the space $C(\mathbb{R}^+, \mathbb{R}) = \{\text{continuous } x: \mathbb{R}^+ \to \mathbb{R}\}$ such
that under P
(6.4)(i)  $x(0) = 0$, a.s.;
(6.4)(ii)  for $t_0 = 0 < t_1 < \cdots < t_n$, $(x(t_j) - x(t_{j-1}))_{j=1}^{n}$ are independent zero-mean
Gaussian variables with variances $t_j - t_{j-1}$ (j = 1,..., n).
If we define $\pi_t: C(\mathbb{R}^+, \mathbb{R}) \to \mathbb{R}$ to be the projection map
$\pi_t(x) = x(t)$
and $\mathcal{C}$ to be the collection of all cylinder sets
$\{x: x(t_j) \in A_j \text{ for } j = 1, \ldots, n\}$
as n runs through $\mathbb{N}$, $t_1, \ldots, t_n$ run through $\mathbb{R}^+$, and $A_1, \ldots, A_n$ run through $\mathcal{B}(\mathbb{R})$,
then $\mathcal{C}$ is a π-system that generates the σ-field $\mathcal{A} = \sigma(\{\pi_t: t \ge 0\})$. Any two
measures P and P' on $C(\mathbb{R}^+, \mathbb{R})$ with properties (6.4)(i) and (ii) must agree on $\mathcal{C}$
and therefore on $\mathcal{A}$ (see Lemma II.4.6). Thus P is unique.
(6.5) Remarks. It is also true that $\mathcal{A}$ is the Borel σ-field on $C(\mathbb{R}^+, \mathbb{R})$ when this
space is equipped with the topology of uniform convergence on compacts (see
Lemma II.82.3).
7. Skorokhod embedding. In many ways, a zero-mean finite-variance random
walk looks like Brownian motion, and we shall shortly see this made precise.
The key to understanding this is the celebrated Skorokhod embedding, which
allows one to embed any random walk with zero mean and finite variance into
a given Brownian motion in a very well-controlled way.
We suppose that F is the distribution of the steps of the random walk; thus
$\int_{-\infty}^{\infty} x\,F(dx) = 0, \qquad \int_{-\infty}^{\infty} x^2\,F(dx) = \sigma^2 < \infty.$
The aim is to find a stopping time T for Brownian motion B such that
$B_T \sim F$ and $ET = \sigma^2$.
The requirement that $ET = \sigma^2$ is essential to our subsequent use of the
Skorokhod embedding, and prevents the problem from being a triviality. (As
Doob has remarked, we could just take $T := \inf\{t > 1: B_t = h(B_1)\}$, where h is
chosen to make $h(B_1) \sim F$. But for this T, $ET = \infty$.) There are a number of
different ways to build the stopping time T based on the path of B alone; we
discuss the beautiful Azéma-Yor construction in Section VI.51 to illustrate this
by excursion theory. For now, though, we take a simpler though less elegant
approach and assume that we are given some further randomisation independent
of B, which we use to pick a pair $\alpha < 0 < \beta$ according to the distribution
(7.1)  $\mu(da, db) = \gamma(b - a)F_+(db)F_-(da),$
where $F_\pm$ are the restrictions of F to $[0, \infty)$, $(-\infty, 0)$ respectively, and
(7.2)  $\gamma^{-1} = \int_0^{\infty} b\,F_+(db) = -\int_{-\infty}^{0} a\,F_-(da) = \tfrac12 \int_{-\infty}^{\infty} |x|\,F(dx)$
is the appropriate normalisation.
To see why this is a good thing to do, we first need to do a couple of trivial
martingale calculations.
(7.3) PROPOSITION. If $\tau = \inf\{t: B_t \notin (a, b)\}$, where $a < 0 < b$ are fixed, then
$\tau < \infty$, a.s., and
(7.4)(i)  $P(B_\tau = b) = \frac{-a}{b - a},$
(7.4)(ii)  $E\tau = |ab|.$
Proof. Finiteness of τ follows from Lemma 3.6. For (7.4)(i), use the Optional
Sampling Theorem on the martingale B:
$0 = E(B_{\tau \wedge n})$
$= bP(B_\tau = b, \tau \le n) + aP(B_\tau = a, \tau \le n) + E[B_n; \tau > n]$
$\to bP(B_\tau = b) + aP(B_\tau = a)$
as $n \to \infty$, since $B_{\tau \wedge n}$ is bounded.
To obtain (7.4)(ii), consider the martingale
$M_t = (B_t - a)(b - B_t) + t,$
and use the Optional Sampling Theorem again:
$EM_0 = -ab = EM_{\tau \wedge n}$
$= E(\tau \wedge n) + E(B_{\tau \wedge n} - a)(b - B_{\tau \wedge n})$
$\to E\tau$
as $n \to \infty$. □
The Skorokhod embedding is described as follows. Take the two points
$\alpha < 0 < \beta$, and run B until it hits one or other:
$T := \inf\{u: B_u \notin (\alpha, \beta)\}.$
(7.5) THEOREM (Skorokhod embedding). The law of $B_T$ is F, and $ET = \sigma^2$.
Proof. From (7.1) and (7.4)(i), we see immediately that, for b > 0,
$P(B_T \in db) = \int_{(-\infty,0)} \frac{-a}{b - a}\,\gamma(b - a)F_+(db)F_-(da) = F_+(db),$
using the definition of γ. Thus $B_T$ has law F. Next,
$ET = \iint \mu(da, db)\,|ab|$
$= \gamma \int_{[0,\infty)} F_+(db) \int_{(-\infty,0)} F_-(da)\,(b - a)|ba|$
$= \int_{-\infty}^{\infty} x^2\,F(dx)$
$= \sigma^2$
as required. □
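For a step distribution with finitely many atoms, the two computations in this proof can be carried out exactly. The sketch below does so in rational arithmetic for an illustrative mean-zero F with atoms at -2, 0 and 1; the function name and the example distribution are assumptions made for the illustration.

```python
from fractions import Fraction as Fr

def skorokhod_check(F):
    """F maps integer atoms to Fraction probabilities; F must have mean zero."""
    F_plus = {b: p for b, p in F.items() if b >= 0}    # restriction of F to [0, oo)
    F_minus = {a: p for a, p in F.items() if a < 0}    # restriction of F to (-oo, 0)
    gamma = 1 / sum(b * p for b, p in F_plus.items())  # normalisation (7.2)
    law, ET = {}, Fr(0)
    for a, pa in F_minus.items():
        for b, pb in F_plus.items():
            weight = gamma * (b - a) * pb * pa         # mass of (a, b) under (7.1)
            p_up = Fr(-a, b - a)                       # P(B_tau = b), from (7.4)(i)
            law[b] = law.get(b, Fr(0)) + weight * p_up
            law[a] = law.get(a, Fr(0)) + weight * (1 - p_up)
            ET += weight * abs(a * b)                  # E tau = |ab|, from (7.4)(ii)
    return law, ET

# Mean zero: -2/4 + 0/4 + 1/2 = 0; variance: 4/4 + 1/2 = 3/2.
F = {-2: Fr(1, 4), 0: Fr(1, 4), 1: Fr(1, 2)}
law, ET = skorokhod_check(F)
```

Here `law` reproduces F exactly and `ET` equals the variance 3/2. The atom at 0 is handled by the pair (a, 0), for which T = 0.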
Now we see how to embed the random walk into Brownian motion; we take
$T_1 := T$ as just described, and then perform the same construction on the
Brownian motion $(B_{T_1 + t} - B_{T_1})_{t \ge 0}$ to obtain a stopping time $T_2'$ with mean $\sigma^2$
and such that $B_{T_1 + T_2'} - B_{T_1} \sim F$. Now set $T_2 := T_1 + T_2'$, and proceed to carry
out the same construction on the Brownian motion $(B_{T_2 + t} - B_{T_2})_{t \ge 0}$ (the fact
that this is a Brownian motion does not follow from (3.3), since $T_1$ and $T_2$ are
random; we need the strong Markov property of Brownian motion, see Section
12, to justify this intuitively obvious fact).
We go on doing this, ending up with a sequence $T_0 = 0 \le T_1 \le T_2 \le \cdots$ of
stopping times. To summarise then, we have the following result.
(7.6) THEOREM. The process $(S_n)_{n \ge 0} := (B(T_n))_{n \ge 0}$ is a random walk with step
distribution F, and $ET_n = n\sigma^2$.
The Skorokhod embedding just described reduces many statements about
zero-mean finite-variance random walks to trivialities; for example, since
$P[\sup_t B_t = +\infty, \inf_t B_t = -\infty] = 1,$
(7.7)  $P[\sup_n S_n = +\infty, \inf_n S_n = -\infty] = 1$
for any such random walk! But we need to be a bit careful here: might the
sequence $(T_n)$ not miss all those times when B was above 20, say? In principle,
yes; but, because we ensured that the $T_n - T_{n-1}$ are independent and identically
distributed (IID) with finite mean, that cannot happen. This is the only place
where applying the Skorokhod embedding is in the least bit delicate, and in
our proof in Sections 8 and 17 of the big classical limit results for IID summands
you will see how we deal with this. Generalising to independent summands that
have different distributions tends to become more involved; the limit problem
for $(S_n)$ gets converted into a limit problem for $(T_n)$! Nonetheless, if this can be
solved, the functional form of the limit theorem often follows very easily. In
any case, knowing what Brownian motion does will give a very good idea of
what the random walk should do!
8. Donsker's Invariance Principle. It is an important fact that Brownian motion
is a weak limit of random walks, both for our intuitive understanding, and for
the easy proof it permits of various limit theorems, using the Continuous-
Mapping Theorem (see Lemma II.84.2). Theorem 7.6 tells us that a random walk
can be embedded in a Brownian motion, so the result cannot be said now to
be a big surprise; and, indeed, all that is needed is a little care over the details.
So we shall suppose given any zero-mean step distribution F with unit
variance, and use Theorem 7.6 to give us a random walk $S_n := B(T_n)$ with this
step distribution. Now for each n define the random function
(8.1)  $S^{(n)}(t) = n^{1/2}\left\{\left(\frac{k+1}{n} - t\right)S_k + \left(t - \frac{k}{n}\right)S_{k+1}\right\}, \qquad \frac{k}{n} \le t \le \frac{k+1}{n}.$
This is a piecewise-linear continuous function, equal to $n^{-1/2}S_k$ at each point
of the form $t = k/n$.
(8.2) THEOREM (Donsker's Invariance Principle). The processes $(S^{(n)}(t))_{0 \le t \le 1}$
converge weakly to $(B_t)_{0 \le t \le 1}$ as $n \to \infty$.
Weak convergence is studied in Chapter 2, Part 6.
Proof. We shall prove that, given $\varepsilon > 0$, there is some $n_0 = n_0(\varepsilon)$ so large that,
for $n \ge n_0$,
(8.3)  $P[\sup_{0 \le t \le 1} |S^{(n)}(t) - B^{(n)}(t)| > \varepsilon] \le \varepsilon,$
where $B^{(n)}$ is a Brownian motion; in fact, we define
$B^{(n)}_t := n^{-1/2}B_{nt}.$
To begin with, note that we can choose $\delta > 0$ so small that
(8.4)  $P[|B_s - B_t| > \varepsilon \text{ for some } s, t \in [0, 1], |s - t| \le \delta] \le \tfrac12\varepsilon$
since B is continuous. Note also that, since the embedding times $T_n$ constitute
a random walk, by the Strong Law of Large Numbers we have
$T_n/n \to 1 \quad \text{a.s.},$
so that
$\frac{1}{n} \sup_{k \le n} |T_k - k| \to 0 \quad \text{a.s.}$
Thus, for some $n_1$, for all $n \ge n_1$,
$P[\frac{1}{n} \sup_{k \le n} |T_k - k| > \tfrac13\delta] \le \varepsilon/2.$
The third simple fact we need is that, for any $t \in [k/n, (k+1)/n]$, there is some
$nu \in [T_k, T_{k+1}]$ such that
(8.5)  $S^{(n)}(t) = n^{-1/2}B_{nu} =: B^{(n)}_u.$
Thus we estimate
$P[\sup_{0 \le t \le 1} |S^{(n)}(t) - B^{(n)}(t)| > \varepsilon]$
$\le P[\sup_{0 \le t \le 1} |S^{(n)}(t) - B^{(n)}(t)| > \varepsilon,\ |T_k - k| \le \tfrac13 n\delta \text{ for all } k \le n] + P[\frac{1}{n} \sup_{k \le n} |T_k - k| > \tfrac13\delta]$
$\le P[|B^{(n)}_u - B^{(n)}_t| > \varepsilon \text{ for some } u, t \in [0, 1], |u - t| \le \delta] + \varepsilon/2,$
at least for $n \ge n_0 = n_1 \vee (3/\delta)$. Indeed, if there is some $t \in [k/n, (k+1)/n]$ such
that $|S^{(n)}(t) - B^{(n)}(t)| > \varepsilon$ then, using (8.5), there is some u in $[n^{-1}T_k, n^{-1}T_{k+1}]$
such that $|B^{(n)}_u - B^{(n)}_t| > \varepsilon$; and, since $|n^{-1}T_k - k/n| \le \tfrac13\delta$ for all k (at least off the
unlikely event $\{n^{-1} \sup_{k \le n} |T_k - k| > \tfrac13\delta\}$), it follows that $|u - t| \le \delta$. Using (8.4)
now yields (8.3):
$P[\sup_{0 \le t \le 1} |S^{(n)}(t) - B^{(n)}(t)| > \varepsilon] \le \varepsilon$
for all $n \ge n_0$. □
Remarks. Notice that Theorem 8.2 implies the Central Limit Theorem for IID
zero-mean finite-variance random variables, and the proof makes no use
anywhere of characteristic-function methods!
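The rescaled, linearly interpolated walk of (8.1) is easy to build from a list of steps. A minimal sketch follows; the function name is illustrative, and the steps are supplied by the caller, so no particular step distribution is assumed.

```python
import math

def rescaled_walk(steps):
    """Return the function t -> S^(n)(t) on [0, 1] for the walk with these steps.

    With n = len(steps) and S_k the partial sums, the result equals n^{-1/2} S_k
    at t = k/n and is linear on each interval [k/n, (k+1)/n], as in (8.1).
    """
    n = len(steps)
    S = [0.0]
    for x in steps:
        S.append(S[-1] + x)

    def S_n(t):
        k = min(int(n * t), n - 1)   # index of the interval containing t
        frac = n * t - k             # position within that interval, in [0, 1]
        return ((1.0 - frac) * S[k] + frac * S[k + 1]) / math.sqrt(n)

    return S_n

walk = rescaled_walk([1, -1, 1, 1])
```

For these four steps the partial sums are 0, 1, 0, 1, 2, so `walk(0.5)` is 0 and `walk(1.0)` is 1. Feeding in a long list of IID zero-mean, unit-variance steps gives one sample of the process appearing in Theorem 8.2.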
9. Exponential martingales and first-passage distributions. We shall use the
Brownian exponential martingales (2.4) to derive the distribution of
$H_x := \inf\{t > 0: X_t = x\}$
for $x > 0$, where $X_t := B_t + ct$ is a Brownian motion with drift, and also to derive
the distribution of $H_a \wedge H_b$, where $a < 0 < b$.
Let us fix some $\lambda > 0$. From (2.4),
$\exp(\theta X_t - \lambda t) = \exp[\theta B_t - (\lambda - \theta c)t]$
is a martingale provided that
$\lambda - \theta c = \tfrac12\theta^2,$
that is,
$\theta = \beta := \sqrt{c^2 + 2\lambda} - c, \quad \text{or} \quad \theta = \alpha := -c - \sqrt{c^2 + 2\lambda}.$
Note that $\alpha < 0 < \beta$. Thus the martingale $\exp(\beta X_t - \lambda t)$ is bounded on $[0, H_x]$,
so we can use the Optional Sampling Theorem to conclude that
$1 = E \exp[\beta X(H_x) - \lambda H_x] = e^{\beta x} E e^{-\lambda H_x},$
from which
(9.1)  $E e^{-\lambda H_x} = \exp\{-x(\sqrt{c^2 + 2\lambda} - c)\}.$
The Laplace transform can be inverted explicitly to give
(9.2)  $P(H_x \in dt)/dt = \frac{x}{\sqrt{2\pi t^3}} \exp[-(x - ct)^2/2t].$
From (9.1), taking the limit as $\lambda \downarrow 0$, we conclude that
$P(H_x < \infty) = \begin{cases} 1 & (c \ge 0), \\ e^{-2|c|x} & (c < 0). \end{cases}$
Thus, if the drift is negative, the drifting Brownian motion will with positive
probability fail to hit some named positive level. This is not very surprising,
since $t^{-1}X_t \to c$, a.s.
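The agreement between the density (9.2) and the Laplace transform (9.1) can be checked numerically. The sketch below uses a plain midpoint rule; the truncation point `t_max` and the number of grid points are ad hoc choices, and the function names are illustrative.

```python
import math

def first_passage_density(t, x, c):
    """The density (9.2) of H_x for Brownian motion with drift c, for x > 0."""
    return x / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-(x - c * t) ** 2 / (2.0 * t))

def laplace_numeric(lam, x, c, t_max=60.0, n=200_000):
    """Midpoint-rule approximation to the integral of e^{-lam t} times (9.2) over (0, t_max)."""
    h = t_max / n
    return h * sum(math.exp(-lam * (i + 0.5) * h) * first_passage_density((i + 0.5) * h, x, c)
                   for i in range(n))

def laplace_formula(lam, x, c):
    """The closed form (9.1); with lam = 0 it gives P(H_x < infinity)."""
    return math.exp(-x * (math.sqrt(c * c + 2.0 * lam) - c))

numeric = laplace_numeric(1.0, 1.0, 0.5)   # lam = 1, x = 1, c = 1/2
exact = laplace_formula(1.0, 1.0, 0.5)     # sqrt(c^2 + 2 lam) = 3/2, so this is e^{-1}
```

Setting `lam = 0` in `laplace_formula` with negative drift recovers the hitting probability $e^{-2|c|x}$.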
Next, if we fix $a < 0 < b$, and let $T := H_a \wedge H_b$, then the process
$M_t := (e^{\beta b} - e^{\beta a})e^{\alpha X_t - \lambda t} + (e^{\alpha a} - e^{\alpha b})e^{\beta X_t - \lambda t}$
is a martingale of the form $f(X_t)e^{-\lambda t}$, where f is constructed so that
$f(a) = f(b) = e^{\beta b + \alpha a} - e^{\beta a + \alpha b}.$
Another application of the Optional Sampling Theorem gives
(9.3)  $E e^{-\lambda T} = \frac{e^{\beta b} - e^{\beta a} + e^{\alpha a} - e^{\alpha b}}{e^{\beta b + \alpha a} - e^{\beta a + \alpha b}}.$
This time, there is no explicit inversion of the Laplace transform in any
particularly useable form, even in the special case $a = -b$, $c = 0$, when we have
the simpler statement
(9.4)  $E e^{-\lambda T} = \operatorname{sech}(b\sqrt{2\lambda}).$
Remarks. Everything in this section is entirely classical, but is a good illustration
of martingale techniques in conjunction with the Brownian exponential martingales.
We shall later see a completely different derivation of (9.2) which is far more
illuminating (see Section 14).
10. Some sample-path properties. The fine structure of Brownian sample paths
exerts a mesmeric fascination, which you will understand after reading this
section; Brownian paths are wilder than we can imagine! As a small aperitif,
we give a soft argument to prove that, almost surely, B is not differentiable at
zero. Notice first that, by the time-inversion property (3.5) and the oscillation of
the Brownian path near infinity (Lemma 3.6),
$P[\text{for each } \varepsilon > 0,\ \exists\, s, t \le \varepsilon \text{ such that } B_s < 0 < B_t] = 1,$
so the only possible derivative at zero would be zero. But if $B_0' = 0$, we have
that, for all small enough t, $|B_t| \le t$. Time inversion again translates this into
the statement that, for all large enough s, $|\tilde{B}_s| \le 1$ ($\tilde{B}_s := sB_{1/s}$), contradicting
Lemma 3.6.
By Fubini's Theorem, we conclude that, almost surely, B is almost everywhere
non-differentiable. What we really want is true, but needs another approach.
(10.1) THEOREM. Almost surely, B is not Lipschitz-continuous anywhere. In
particular, B is nowhere differentiable.
Proof. Fixing some (large) K > 0, define, for each $n \ge 2$, the event
$A_n := \{\text{for some } s \in [0, 1],\ |B_t - B_s| \le K|t - s| \text{ whenever } |t - s| \le 2/n\},$
and let
$\Delta_{k,n} := |B(k/n) - B((k-1)/n)|, \quad k = 1, \ldots, n.$
Then
$A_n \subseteq \bigcup_{k=2}^{n-1} \{\Delta_{j,n} \le 4K/n \text{ for } j = k-1, k, k+1\},$
and so
$P(A_n) \le (n - 2)[P(\Delta_{1,n} \le 4K/n)]^3 = (n - 2)P(|B_1| \le 4K/\sqrt{n})^3 \to 0 \quad (n \to \infty).$
But the $A_n$ are increasing, so $P(A_n) = 0$ for all n. □
This neat argument is due to Dvoretzky, Erdős and Kakutani. An elegant
proof using the joint continuity of Brownian local time can be found in Geman
and Horowitz [1].
The exact modulus of continuity of the Brownian path is an altogether more
delicate result; you will find a proof of the following result in Section 1.6 of
McKean [1].
(10.2) THEOREM (Lévy's Modulus-of-Continuity Theorem)
$P\left[\limsup_{\delta \downarrow 0} \sup_{0 \le t \le 1} \frac{|B_{t+\delta} - B_t|}{[2\delta \log(1/\delta)]^{1/2}} = 1\right] = 1.$
This exact modulus of continuity should be compared with the celebrated Law
of the Iterated Logarithm:
(10.3)  $P\left[\limsup_{t \downarrow 0}\,[2t \log\log(1/t)]^{-1/2} B_t = +1\right] = 1.$
We prove this result in Section 16; do note that, although the Law of the Iterated
Logarithm (10.3) must hold at almost every point of the path, by Theorem 10.2
there will be places where the oscillation will be wilder than $[2\delta \log\log(1/\delta)]^{1/2}$.
Indeed, there will also be places where the oscillation will be like $c\delta^{1/2}$ (if c is
small enough); for a fascinating analysis of such slow points of Brownian motion,
see Greenwood and Perkins [1,2].
One other sample path property worth mentioning is the following.
(10.4) THEOREM. Almost surely, Brownian motion has no point of increase:
$P[\exists\, \delta > 0, s > 0 \text{ such that } B_{s-h} \le B_s \le B_{s+h} \text{ for all } h \in [0, \delta]] = 0.$
Of the many proofs, we give that of Burdzy [3] in Section 12.
(10.5) Remark. If $(B_t^1)_{t \ge 0}$ and $(B_t^2)_{t \ge 0}$ are independent Brownian motions, then
$(B_t)_{t \ge 0} = (B_t^1, B_t^2)_{t \ge 0}$ is Brownian motion in the plane. For any $v \in \mathbb{R}^2$, $|v| = 1$,
the process $(v \cdot B_t)$ is a real-valued Brownian motion, and therefore has no points
of increase with probability 1. James Taylor has posed the question 'Is there a
positive probability that for some v the Brownian motion $v \cdot B$ has a point of
increase?' As far as we are aware, this problem is unresolved. If true, it would
say that, with positive probability, the trace $B[0, 1] := \{B_t: 0 \le t \le 1\}$ of
two-dimensional Brownian motion could be cut with a straight line; the best that
is known so far is Burdzy's result that B[0,1] can be cut with a Lipschitz curve.
11. Quadratic variation. While the sole result of this section could have been
included in the previous section, we separate it out because of its fundamental
importance in the construction of stochastic integrals. Define $t_k^n := t \wedge (k2^{-n})$, and
$[B]_t^n := \sum_{k \ge 1} [B(t_k^n) - B(t_{k-1}^n)]^2.$
(11.1) LEMMA (Lévy). With probability 1, as $n \to \infty$,
$[B]_t^n \to t$ uniformly on compact t-intervals.
The proof is given in Section IV.2, so we defer the details. Since they are not
hard, the reader may wish to have a go at proving Lemma 11.1 now.
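Before attempting the proof, it may help to see the convergence numerically at t = 1. The sketch below samples the $2^n$ dyadic increments of B over [0, 1] directly as independent $N(0, 2^{-n})$ variables, which is all that the sum $[B]_1^n$ depends on; the function name is illustrative.

```python
import math
import random

def dyadic_quadratic_variation(n, rng):
    """[B]^n_1: the sum of squared increments of B over the 2^n dyadic intervals of [0, 1]."""
    sd = math.sqrt(2.0 ** -n)  # each increment is N(0, 2^{-n})
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(2 ** n))

qv = dyadic_quadratic_variation(14, random.Random(0))
```

The sum has mean 1 and variance $2^{1-n}$, so for n = 14 its standard deviation is about 0.011 and `qv` should lie close to 1. This only illustrates the limit at a single time; the lemma asserts almost sure convergence, uniformly on compact t-intervals.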
(11.2) Remarks
(i) If we take the quadratic variation of B down the dyadic partitions, we get
the answer t. If, on the other hand, we take
$\sup\{\sum_{k=1}^{N} |B(t_k) - B(t_{k-1})|^2: 0 = t_0 < t_1 < \cdots < t_N = 1\},$
we get the answer +∞; the true quadratic variation of B is infinite almost
surely. See Lévy [2, p. 190] or Freedman [1, p. 48].
(ii) Of course, it follows immediately from Lemma 11.1 that the variation of
almost every Brownian path is infinite on every interval.
12. The strong Markov property. We give here a quick proof of the strong
Markov property for Brownian motion, exploiting heavily the special features
of Brownian motion, and standard results on martingales. The strong Markov
property holds for a much wider class of processes (we shall see in Chapter III
that it holds for all Feller-Dynkin processes, and for all Ray processes), and is
proved by approximating a stopping time by a dyadic-rational stopping time.
Since this approximation procedure is used in the martingale results to which
we appeal, the result we now give contains all the same ingredients. For the
definition of Brownian motion relative to a filtered probability space, see Section
II.72, or look back to (1.2)(iii)'.
(12.1) THEOREM. Let $(B_t)_{t \ge 0}$ be Brownian motion on some filtered
probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$, and let T be a finite-valued $(\mathcal{F}_t)$ stopping time (see
Section II.73.1). Then the process
$B^{(T)}_t := B_{T+t} - B_T, \quad t \ge 0,$
is a Brownian motion independent of $\mathcal{F}_T$.
Proof. Fix $0 = t_0 < t_1 < \cdots < t_n$, and reals $\theta_1, \ldots, \theta_n$, and take some $\xi \in b\mathcal{F}_T$.
Let $s_j := t_j - t_{j-1}$ (j = 1,..., n). Let S be a stopping time. By applying the
Optional Sampling Theorem to the martingale
$M_t := \exp(i\theta B_t + \tfrac12\theta^2 t)$
and the stopping time $S \wedge N$, we obtain
$E[\exp\{i\theta B((S \wedge N) + t) + \tfrac12\theta^2((S \wedge N) + t)\} \mid \mathcal{F}(S \wedge N)] = \exp\{i\theta B(S \wedge N) + \tfrac12\theta^2(S \wedge N)\}.$
Rearranging and letting $N \uparrow \infty$ yields
(12.2)  $E[\exp\{i\theta(B_{S+t} - B_S) + \tfrac12\theta^2 t\} \mid \mathcal{F}_S] = 1.$
Thus, if $\xi \in b\mathcal{F}_T$, we have, with $Z_j$ denoting $B^{(T)}(t_j) - B^{(T)}(t_{j-1})$,
$E[\xi \exp \sum_{j=1}^{n} \{i\theta_j Z_j + \tfrac12\theta_j^2 s_j\}] = E[\xi \exp \sum_{j=1}^{n-1} \{i\theta_j Z_j + \tfrac12\theta_j^2 s_j\}]$
by taking $S = T + t_{n-1}$, $t = s_n$ in (12.2), and evaluating the expectation by
conditioning first on $\mathcal{F}_S$. Conditioning successively on $\mathcal{F}(T + t_k)$ (k = n − 2,..., 0)
gives
$E[\xi \exp \sum_{j=1}^{n} \{i\theta_j Z_j + \tfrac12\theta_j^2 s_j\}] = E\xi.$
Hence, conditional on $\mathcal{F}_T$, $Z_j$ (j = 1,..., n) are independent Gaussian variables
with mean 0 and variances $s_j$ (j = 1,..., n). □
(12.3) Remarks
(i) The restriction to finite-valued stopping times is not really essential, being
included only to save worrying over the definition of $B^{(T)}$.
(ii) If one takes a more general continuous Markov process, the strong Markov
property is formulated as follows. Let $\Omega = C(\mathbb{R}^+, \mathbb{R})$, with the canonical
process $X_t(\omega) = \omega(t)$, filtration $\mathcal{F}_t^\circ = \sigma(\{X_u: u \le t\})$ and measure $P^x$ for the
process started at x. The shift maps $\theta_t: \Omega \to \Omega$ are defined by $(\theta_t\omega)(s) := \omega(t + s)$,
and the strong Markov property says that, for any $(\mathcal{F}_{t+}^\circ)_{t \ge 0}$ stopping time
T, any $\xi \in b\mathcal{F}_{T+}^\circ$, $\eta \in b\mathcal{F}^\circ$ and $x \in \mathbb{R}$,
(12.4)  $E^x[\xi\,\eta \circ \theta_T; T < \infty] = E^x[\xi\,E^{X(T)}(\eta); T < \infty].$
This formulation turns out to be ideal for applications. As an example, consider
the celebrated Blumenthal 0-1 Law, which says that $\mathcal{F}_{0+}^\circ := \bigcap_{s > 0} \mathcal{F}_s^\circ$
is trivial under $P^x$. The proof is not hard either; if $\Lambda \in \mathcal{F}_{0+}^\circ$ then, taking
$\xi = \eta = I_\Lambda$ in (12.4), we obtain (with T = 0)
$E^x[I_\Lambda] = E^x[I_\Lambda E^{X(0)} I_\Lambda] = P^x(\Lambda)^2,$
so that
$P^x(\Lambda) = P^x(\Lambda)^2$ and $P^x(\Lambda) = 0$ or 1!
The consequences of this result for Brownian motion are far-reaching. In
the general setting, though, the measurability questions implicit in (12.4)
need to be carefully considered; we return to this in Chapter III.
As a first application of the strong Markov property, we give Burdzy's [3]
beautiful proof of Theorem 10.4, that with probability 1 Brownian motion has
no point of increase. The aim is to prove that $P^0(A_0) = 0$, where
$A_0 = \{\text{for some } 0 < t < u,\ B_s \le B_t \le 1 \text{ for } s \le t,\ B_s \ge B_t \text{ for } t \le s \le u,\ B_u \ge B_t + 2\}.$
This will give the result, since if Brownian motion has a point of increase
with positive probability then with positive probability it will have a point of
increase before it reaches 1 (scaling), and then with positive probability it will
rise at least 2 before returning to the level of the point of increase. For fixed
$\varepsilon \in (0, 1)$, define the a.s. finite stopping times $T_k$, $U_k$, and non-negative random
variables $M_k$ by
$M_0 = U_0 = 0;$
$T_k = \inf\{t > U_k: B_t = M_k - \varepsilon \text{ or } B_t = M_k + 2\}, \quad k \ge 0,$
$M_{k+1} = \sup\{B_t: t \le T_k\}, \quad k \ge 0,$
$U_{k+1} = \inf\{t > T_k: B_t = M_{k+1}\}, \quad k \ge 0.$
Figure 1.2 illustrates the situation up to the kth stage.
Because of the strong Markov property of Brownian motion, the pieces of path
$\{B(U_k + s) - B(U_k): 0 \le s \le T_k - U_k\}$ are independent, identically-distributed.
Thus the random variables
$X_k = M_{k+1} - M_k, \quad k \ge 0,$
are also IID, and each is equal in law to the maximum of Brownian motion
until it leaves $[-\varepsilon, 2]$. Thus (see Proposition 7.3)
$P(X_k > x) = \begin{cases} \varepsilon/(\varepsilon + x) & (0 \le x < 2), \\ 0 & (x \ge 2). \end{cases}$
[Fig. 1.2: the stopping times $T_k$, $U_k$ and the levels $M_k$ up to the kth stage]
The idea is that the unlikely event that $X_k = 2$ (which has probability $\varepsilon/(\varepsilon + 2)$)
is an approximate point of increase at time $U_k$; convince yourself of the key
fact that if $A_0$ happens then
$A_\varepsilon = \{\text{for some } k,\ X_k = 2,\ M_k \le 1\}$
also happens. [Hint: If τ is the largest t such that $B_s \le B_t \le 1$ for all $s \le t$,
and $B_s \ge B_t$ for $t \le s \le u$, $B_u \ge B_t + 2$, and if $\xi := B_\tau$ then consider
$\inf\{M_k: M_k > \xi\} =: m$. Argue that $m \ge 2$.] But now $P(A_0) \le P(A_\varepsilon)$ and
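$P(A_\varepsilon) \le \sum_{k \ge 0} P(M_k \le 1, X_k = 2)$
$= \sum_{k \ge 0} P(M_k \le 1)P(X_k = 2)$
(since $X_k$ is independent of $M_k$ by the strong Markov property)
(12.5)  $= \frac{\varepsilon}{2 + \varepsilon} E(N + 1),$
where N is the number of $k \ge 1$ for which $M_k \le 1$. But $M_k = \sum_{j=0}^{k-1} X_j$, and N + 1
is a stopping time for the random walk $(M_k)_{k \ge 0}$, so, by Wald's identity (a
special case of the Optional Sampling Theorem),
$EM_{N+1} = E(N + 1)EX_0,$
so that, since $M_{N+1} \le 3$ and $EX_0 = \varepsilon \log((2 + \varepsilon)/\varepsilon) \ge \varepsilon|\log\varepsilon|$,
$E(N + 1) = \frac{E(M_{N+1})}{EX_0} \le \frac{3}{\varepsilon|\log\varepsilon|}.$
Feeding this into (12.5) gives
$P(A_0) \le P(A_\varepsilon) \le \frac{3}{(2 + \varepsilon)|\log\varepsilon|} \to 0$ as $\varepsilon \downarrow 0$. □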
13. Reflection. The basic idea of the reflection principle for Brownian motion
is familiar from simple symmetric random walk, and only the technical machinery
is any more difficult. For $a \in \mathbb{R}$, define the hitting time of a:
$H_a := \inf\{t > 0: B_t = a\}.$
(13.1) THEOREM. Fix $a \in \mathbb{R}$. The process
(13.2)  $\tilde{B}_t := \begin{cases} B_t & (t \le H_a), \\ 2a - B_t & (t \ge H_a) \end{cases}$
is a Brownian motion.
Proof. Consider the processes
$Y_t := B_t \quad (0 \le t \le H_a), \qquad Z_t := B(t + H_a) - a.$
By the Strong Markov property, Theorem 12.1, Z is a Brownian motion
independent of Y. By (3.2), −Z is also a Brownian motion, also independent of Y.
Thus (Y, Z) and (Y, −Z) have the same law. The map
$\varphi: (Y, Z) \mapsto (Y_t I_{\{t \le H_a\}} + (a + Z_{t - H_a}) I_{\{t > H_a\}})_{t \ge 0}$
produces a continuous process, which will therefore have the same law as
$\varphi(Y, -Z)$. But $\varphi(Y, Z) = B$, and $\varphi(Y, -Z) = \tilde{B}$. □
(13.3) COROLLARY. Define
$S_t := \sup\{B_u: u \le t\}.$
Then, for $a, y \ge 0$, $t > 0$,
(13.4)  $P[S_t \ge a, B_t \le a - y] = P[B_t \ge a + y].$
Proof.
$P[S_t \ge a, B_t \le a - y] = P[S_t \ge a, \tilde{B}_t \ge a + y]$, with $\tilde{B}$ given by (13.2),
$= P[B_t \ge a + y],$
by drawing a picture. □
Remarks
(i) Immediately from (13.4), we obtain, for a > 0,
$P[H_a \le t] = P[S_t \ge a]$
$= P[S_t \ge a, B_t \le a] + P[S_t \ge a, B_t > a]$
$= 2P[B_t \ge a]$
$= 2P[B_1 \ge a/\sqrt{t}]$
by Brownian scaling. Differentiating with respect to t gives
(13.5)  $P[H_a \in dt]/dt = \frac{a}{\sqrt{2\pi t^3}} \exp\left(-\frac{a^2}{2t}\right).$
(ii) The joint distribution of $S_t$, $B_t$ follows easily from (13.4):
(13.6)  $P[S_t \in da, B_t \in dx] = \frac{2(2a - x)}{\sqrt{2\pi t^3}} \exp\left[-\frac{(2a - x)^2}{2t}\right] da\,dx,$
for $a \ge 0$, $x \le a$.
(iii) Let $P^x$ denote the law of Brownian motion started at x. Then it follows
easily from (13.4) that, for x, y > 0,
$P^x[B_t \ge y, H_0 \le t] = P^x[B_t \le -y],$
from which
(13.7)  $P^x[B_t \in dy, H_0 > t]/dy = p_t(x, y) - p_t(x, -y),$
where $p_t$ is, of course, the Brownian transition density (4.2). The transition
density (13.7) is often called the taboo transition density.
(iv) It is possible to obtain the analogue of (13.7) when $H_0$ is replaced by the
first exit of Brownian motion from an interval (a, b); the transition density is
(13.8)  $\sum_{n \in \mathbb{Z}} \{p_t(x, y + 2n\delta) - p_t(x, 2b - y + 2n\delta)\}, \quad a < x, y < b,$
where δ := b − a. We omit the proof of this result, which we shall not be
using: see Freedman [1], p. 26 for a proof.
(v) There is a routine way to derive the analogous results for Brownian motion
with drift c from the ones we have just calculated. If $\Omega := C(\mathbb{R}^+, \mathbb{R})$,
$X_t(\omega) := \omega(t)$, $\mathcal{F}_t^\circ := \sigma(\{X_u: u \le t\})$ is the canonical space, and if $P^{x,c}$ is the
law on $(\Omega, \mathcal{F}^\circ)$ of Brownian motion started at x with drift $c \in \mathbb{R}$, then on
$\mathcal{F}_t^\circ$ the laws $(P^{x,c})_{c \in \mathbb{R}}$ are equivalent with density with respect to Wiener
measure given by
(13.9)  $\frac{dP^{x,c}}{dP^{x,0}} = \exp[c(X_t - x) - \tfrac12 c^2 t].$
This is a special case of the celebrated Cameron-Martin-Girsanov formula,
which we discuss in depth in Chapter IV. The reader may like to try proving
(13.9) now. (Hint: consider a cylinder set.) Combining (13.9) and (13.6), we
deduce that, for $a \ge 0$, $x \le a$,
(13.10)  $P^{0,c}[S_t \in da, X_t \in dx] = \frac{2(2a - x)}{\sqrt{2\pi t^3}} \exp\left[-\frac{(2a - x)^2}{2t} + cx - \tfrac12 c^2 t\right] da\,dx.$
Can you see how to deduce (9.2) from (13.5) and (13.9)?
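The formulas in Remarks (i), (ii) and (v) can be sanity-checked numerically. The following is a minimal sketch, not part of the original text: the function names, the grid sizes and the truncation limits are illustrative assumptions. It compares a finite-difference derivative of $P[H_a \le t]$ with (13.5), and integrates the joint density (13.6)/(13.10) by a midpoint rule to see that its total mass is close to 1.

```python
import math

def hitting_cdf(a, t):
    """P[H_a <= t] = 2 P[B_1 > a / sqrt(t)], written with erfc."""
    return math.erfc(a / math.sqrt(2.0 * t))

def hitting_density(a, t):
    """Right-hand side of (13.5)."""
    return a / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-a * a / (2.0 * t))

def joint_density(a, x, t, c=0.0):
    """Density of (S_t, X_t) at (a, x): (13.6) when c = 0, (13.10) otherwise."""
    if a < 0.0 or x > a:
        return 0.0
    u = 2.0 * a - x
    return (2.0 * u / math.sqrt(2.0 * math.pi * t ** 3)
            * math.exp(-u * u / (2.0 * t) + c * x - 0.5 * c * c * t))

def total_mass(t, c, n=400, a_max=10.0, v_max=20.0):
    """Midpoint-rule integral over a in (0, a_max) and x = a - v, v in (0, v_max).

    The truncation limits a_max and v_max are assumptions suited to t near 1.
    """
    da, dv = a_max / n, v_max / n
    total = 0.0
    for i in range(n):
        a = (i + 0.5) * da
        for j in range(n):
            v = (j + 0.5) * dv
            total += joint_density(a, a - v, t, c)
    return total * da * dv
```

For instance, `total_mass(1.0, 0.5)` should be close to 1, reflecting the fact that the density in (13.9) has expectation 1 under Wiener measure.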
14. Reflecting Brownian motion and local time. If $(B_t)_{t \ge 0}$ is a Brownian motion
then a reflecting Brownian motion is a continuous process identical in law to
$(|B_t|)_{t \ge 0}$. It is a continuous non-negative process, and is in fact also a Markov
process. The Markov property is not immediately obvious, because we have
taken a function of a Markov process ($x \mapsto |x|$ in this case), which will not in
general be Markovian. The following simple result covers this situation.
(14.1) LEMMA. Let $(S, \mathcal{S})$ and $(S', \mathcal{S}')$ be measurable spaces, and suppose that
the measurable function $\Phi : (S, \mathcal{S}) \to (S', \mathcal{S}')$ is onto, $(P_t)_{t \ge 0}$ is a Markov transition
semigroup on $(S, \mathcal{S})$, and $(Q_t)_{t \ge 0}$ is a collection of probability kernels on $S'$ such
that, for all $f \in b\mathcal{S}'$,
(14.2) $$P_t(f \circ \Phi) = (Q_t f) \circ \Phi.$$
Then $(Q_t)_{t \ge 0}$ is a Markov transition semigroup, and if $X$ is a Markov process
with transition semigroup $(P_t)$ then $\Phi(X)$ is Markov with transition semigroup $(Q_t)$.
(A probability kernel $Q$ on $S'$ is a map $Q : S' \times \mathcal{S}' \to [0,1]$ such that $Q(x, \cdot)$
is a probability measure for each $x \in S'$, and $Q(\cdot, A)$ is measurable for each $A \in \mathcal{S}'$.)
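Proof. Using the semigroup property of $(P_t)$ and (14.2),
$$P_{t+s}(f \circ \Phi) = (Q_{t+s} f) \circ \Phi = P_t P_s(f \circ \Phi) = P_t((Q_s f) \circ \Phi) = (Q_t Q_s f) \circ \Phi.$$
Since $\Phi$ is onto, $Q_{t+s} f = Q_t Q_s f$. To check that $\Phi(X)$ is Markov, just take
$0 = t_0 \le t_1 \le \cdots \le t_n$, $s_k = t_k - t_{k-1}$, $f_k \in b\mathcal{S}'$ and compute
$$E^x\Bigl[\prod_{k=1}^n f_k(\Phi(X_{t_k}))\Bigr] = (Q_{s_1} f_1 Q_{s_2} f_2 \cdots Q_{s_n} f_n)(\Phi(x))$$
by repeated application of (14.2). □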
In our case, the transition semigroup is the Brownian transition semigroup
(4.2), and $\Phi(x) = |x|$. If we define, for $f \in C_b(\mathbb{R}^+)$,
$$Q_t f(x) = \int_0^\infty (p_t(x, y) + p_t(x, -y)) f(y)\,dy = \int_0^\infty 2(2\pi t)^{-1/2} \exp\left(-\frac{x^2 + y^2}{2t}\right) \cosh\left(\frac{xy}{t}\right) f(y)\,dy,$$
then it is trivial to check (14.2). The criterion of Lemma (14.1) for $\Phi(X)$ to be
Markov is a very obvious one; see Rogers and Pitman [1] for a discussion of
more interesting criteria.
The strong Markov property of $|B|$ follows from the strong Markov property
of Brownian motion, since any stopping time for the filtration of $|B|$ is a stopping
time for the filtration of $B$.
Now let us consider the process $((S_t, S_t - B_t))_{t \ge 0}$, which is a continuous
strong Markov process in $(\mathbb{R}^+)^2$. (Here, $S_t := \sup\{B_u : u \le t\}$, as in the previous
section.) To see the strong Markov property, observe that, for any finite stopping
time $T$,
(14.3) $$(S_{T+t},\ S_{T+t} - B_{T+t}) = \bigl(S_T \vee (B_T + \tilde S_t),\ ((S_T - B_T) \vee \tilde S_t) - \tilde B_t\bigr),$$
where $\tilde B_t := B_{T+t} - B_T$ and $\tilde S_t := \sup\{\tilde B_u : u \le t\}$. Since $\tilde B$ is independent of $\mathcal{F}_T$,
the law of the future of $(S, S - B)$ given $\mathcal{F}_T$ depends only on $(S_T, B_T)$. It is
possible to write out explicitly the transition semigroup of $(S, S - B)$, using the
results of the last section, and then to confirm, using Lemma 14.1, that $S - B$
is a Markov process, with the same transition semigroup as reflecting Brownian
motion $|B|$. We skip the details, because we give a much neater proof in
Section V.6. For now, you will not be surprised that $S - B$ is a reflecting Brownian
motion even if you do not carry through the calculations we indicated, because
$S - B$ is a continuous non-negative process that clearly behaves like Brownian
motion in $(0, \infty)$. What is most important for now is the remarkable fact
(discovered by Lévy) that by looking at the path of $S - B$, we can work out what
$S$ is! Let us see how this can be done. Fix $\varepsilon > 0$ and define
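$$T := \inf\{u : S_u - B_u > \varepsilon\}.$$
Then it is not hard to see that $S_T$ must have an exponential distribution. Indeed, for
any $a > 0$,
$$P[S_T - a > x \mid S_T > a] = P[T > H_{a+x} \mid T > H_a]$$
$$= P[S_u - B_u \le \varepsilon \text{ for } H_a \le u \le H_{a+x} \mid S_u - B_u \le \varepsilon \text{ for } u \le H_a]$$
$$= P[\tilde S_u - \tilde B_u \le \varepsilon \text{ for } 0 \le u \le \tilde H_x \mid S_u - B_u \le \varepsilon \text{ for } u \le H_a]$$
(using (14.3) with $T = H_a$, $\tilde H_x := \inf\{u : \tilde B_u = x\}$)
$$= P[\tilde S_u - \tilde B_u \le \varepsilon \text{ for } 0 \le u \le \tilde H_x] \quad (\text{since } \tilde B \text{ is independent of } \mathcal{F}(H_a))$$
$$= P[S_T > x].$$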
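Since $S_T$ has an exponential law, we can compute its mean using the Optional
Sampling Theorem:
$$0 = E B_{T \wedge n} = E[S_{T \wedge n} - (S_{T \wedge n} - B_{T \wedge n})] = E S_{T \wedge n} - E(S_{T \wedge n} - B_{T \wedge n}),$$
whence
$$E S_{T \wedge n} = E(S_{T \wedge n} - B_{T \wedge n}),$$
and, letting $n \uparrow \infty$, using monotone convergence on the left and dominated
convergence on the right, we obtain
$$E S_T = \varepsilon,$$
so that $S_T$ is an exponential random variable of rate $\varepsilon^{-1}$. Now let us define
$$T_1'(\varepsilon) := 0, \qquad T_n(\varepsilon) := \inf\{u > T_n'(\varepsilon) : S_u - B_u > \varepsilon\},$$
$$T_{n+1}'(\varepsilon) := \inf\{u > T_n(\varepsilon) : S_u - B_u = 0\},$$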
so that $T_1(\varepsilon), T_2(\varepsilon), \ldots$ are the successive times at which $S - B$ achieves an
upcrossing of $[0, \varepsilon]$ (see Fig. 1.3). Define
$$U(t, \varepsilon) := \sup\{k : T_k(\varepsilon) \le t\},$$
the number of upcrossings made by $S - B$ before $t$. Note that $U(\cdot, \varepsilon)$ is increasing
for each $\varepsilon > 0$. Now it does not take too long to realise (using the strong Markov
property) that the random variables $S(T_1(\varepsilon)),\ S(T_2(\varepsilon)) - S(T_1(\varepsilon)),\ S(T_3(\varepsilon)) - S(T_2(\varepsilon)), \ldots$
are IID exponentials, rate $\varepsilon^{-1}$. Thus
$$U(H_a, \varepsilon) = \sup\{k : S(T_k(\varepsilon)) \le a\}$$
will have a Poisson distribution with mean $a/\varepsilon$. Now hold $a$ fixed, and consider
$$Z_n := 2^{-n} U(H_a, 2^{-n}),$$
which has mean $a$. If $\mathcal{G}_n := \sigma(\{Z_k : k \ge n\})$, we claim that $(Z_n, \mathcal{G}_n)$ is a reversed
martingale. Indeed, if we consider an upcrossing of $S - B$ from 0 to $2^{-n}$ then
with probability exactly $\tfrac12$ the path of $S - B$ will go on up to $2^{-n+1}$ before it
returns to 0; thus, given $\mathcal{G}_n$, the number of upcrossings to $2^{-n+1}$, $U(H_a, 2^{-n+1})$,
has a $B(U(H_a, 2^{-n}), \tfrac12)$ distribution. Hence
$$E[U(H_a, 2^{-n+1}) \mid \mathcal{G}_n] = \tfrac12 U(H_a, 2^{-n}),$$
which implies $E[Z_{n-1} \mid \mathcal{G}_n] = Z_n$. Thus (see Section II.51)
$$Z_n \to Z_\infty, \quad \text{a.s. and in } L^1.$$
This ensures that $E Z_\infty = a$; but $\operatorname{var}(Z_n) = a 2^{-n} \to 0$, $E Z_n = a$, $Z_n \to Z_\infty$, implying
$Z_\infty = a$, a.s. To summarise,
(14.4) $$\lim_{n \to \infty} 2^{-n} U(H_a, 2^{-n}) = a, \quad \text{a.s.}$$
[Fig. 1.3]
Hence we immediately have that
$$P\Bigl[\lim_{n \to \infty} 2^{-n} U(H_a, 2^{-n}) = a \text{ for all } a \in \mathbb{Q}^+\Bigr] = 1,$$
and by the fact that $U(\cdot, \varepsilon)$ is increasing we conclude that
(14.5) $$P\Bigl[\lim_{n \to \infty} 2^{-n} U(t, 2^{-n}) = S_t \text{ for all } t \ge 0\Bigr] = 1.$$
(For a different martingale argument, see Exercise E79.71c.) To summarize, we
can by looking at $S - B$ count the number $U(t, \varepsilon)$ of upcrossings of $[0, \varepsilon]$ by
time $t$, and then, according to (14.5), we can work out $S_t$ simply by the recipe
$$S_t = \lim_{n \to \infty} 2^{-n} U(t, 2^{-n}).$$
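The recipe can be tried on a simulated path. The following is a rough sketch under stated assumptions: the Brownian path is replaced by a Gaussian random walk on a fine grid, so the levels $2^{-n}$ must stay well above the step size, and all names (`count_upcrossings`, `simulate_gap_path`) are hypothetical. The counter follows the definition of the times $T_k(\varepsilon)$: a new upcrossing is recorded when the path first exceeds $\varepsilon$ after a visit to 0.

```python
import random

def count_upcrossings(r_path, eps):
    """Number of times T_k(eps) in the path: passages from 0 to above eps."""
    count, waiting_for_zero = 0, False
    for r in r_path:
        if not waiting_for_zero and r > eps:
            count += 1                  # the path has just exceeded eps
            waiting_for_zero = True     # now wait for a return to 0
        elif waiting_for_zero and r <= 0.0:
            waiting_for_zero = False
    return count

def simulate_gap_path(n_steps, t=1.0, seed=0):
    """Random-walk approximation on [0, t]; returns (S_t, list of S - B values)."""
    rng = random.Random(seed)
    sd = (t / n_steps) ** 0.5
    b = s = 0.0
    gaps = [0.0]
    for _ in range(n_steps):
        b += rng.gauss(0.0, sd)
        s = max(s, b)
        gaps.append(s - b)              # exactly 0.0 whenever b is a new maximum
    return s, gaps
```

A possible use is `s, gaps = simulate_gap_path(200000)` followed by printing `2 ** -n * count_upcrossings(gaps, 2 ** -n)` next to `s` for moderate `n`; the agreement is only approximate, since for fixed $n$ the count is a random quantity and the discretisation introduces its own error.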
But a moment's reflection will show that, since we could reconstruct $S$ just from
$S - B$, we could apply the same construction to the process $|B|$ (which has the
same law as $S - B$) and thereby obtain some process $l$ with the properties
(14.6)(i) $l$ is continuous increasing,
(14.6)(ii) $l$ grows only when $|B| = 0$,
(14.6)(iii) $|B_t| - l_t$ is a Brownian motion,
since each of these properties is immediate for the pair $(S - B, S)$, which has
the same law as $(|B|, l)$. We have therefore proved the following celebrated result
of Lévy.
(14.7) THEOREM (Lévy). There exists a (unique) continuous increasing process
$l$ such that $|B_t| - l_t$ is a Brownian motion. The process $l$ grows only when $|B| = 0$,
and can be recovered from $|B|$ by the recipe
(14.8) $$l_t = \lim_{n \to \infty} 2^{-n} U(t, 2^{-n}),$$
where $U(t, \varepsilon)$ is the number of upcrossings of $[0, \varepsilon]$ by $|B|$ before time $t$.
(14.9) Remarks
(i) The uniqueness assertion remains unproved, but is an immediate
consequence of a general result that a finite-variation continuous martingale is
constant (IV.30.4).
(ii) The reversed martingale argument given here is due in a more general
setting to Greenwood and Pitman [1]. See also Itô and McKean [1,
Section 12].
(iii) The process $l$ constructed in Theorem 14.7 is called Brownian local time (at
zero). In various places in the literature (and in particular, Itô and McKean
[1]), Brownian local time at zero is taken to be $\tfrac12 l$, so be careful when
moving from one account to another.
(iv) We could evidently repeat the construction (14.8) at other levels by looking
at the upcrossings of $[0, 2^{-n}]$ by $|B_t - x|$; this would give us a process $l(t, x)$,
the local time at $x$, defined except on some null set that may depend on $x$.
Now look again at Trotter's Theorem 5.9, which says that, almost surely,
$l(t, x)$ can be defined simultaneously for all $x$, and in such a way as to be
jointly continuous, a far stronger result.
15. Kolmogorov's test. Suppose $h : \mathbb{R}^+ \to \mathbb{R}$ has the property that $t^{-1/2} h(t) \uparrow$ as
$t \downarrow 0$, and let
$$A := \{B_t \le h(t) \text{ near } 0\} = \bigcup_{\delta > 0} \{B_s \le h(s) \text{ for all } s \le \delta\}.$$
Then Kolmogorov's test says that $P(A) = 0$ or 1 according to whether the integral
(15.1) $$\int_{0+} t^{-3/2} h(t) \exp\left(-\frac{h(t)^2}{2t}\right) dt$$
diverges or converges. The correct way to prove this result is by excursion theory,
but it would be premature to discuss this here. The essence of the excursion
theory ideas is contained in the fine proof due to Motoo [2]; see also Itô and
McKean [1, p. 33]. We omit the proof.
(15.2) Remarks
(i) Since, for each $t > 0$,
$$A = \bigcup_{\delta \in \mathbb{Q} \cap (0, t]}\ \bigcap_{s \in \mathbb{Q} \cap (0, \delta]} \{B_s \le h(s)\},$$
we see that $A \in \mathcal{F}_t^\circ$, and hence $A \in \mathcal{F}_{0+}^\circ$; by the Blumenthal 0-1 Law,
therefore $P(A) = 0$ or 1.
(ii) Since $P[H_a \in dt]/dt = a(2\pi t^3)^{-1/2} \exp(-a^2/2t)$, the integral in (15.1) has a
simple interpretation, making Kolmogorov's test easy to remember; the
special case $h(t) = a > 0$ for all $t$ will help you to remember that $P(A) = 1$ if
and only if the integral converges.
(iii) It is a simple consequence of Kolmogorov's test that
(15.3) $$P\left[\limsup_{t \downarrow 0} \frac{B_t}{\sqrt{2t \log\log(1/t)}} = 1\right] = 1,$$
the celebrated Law of the Iterated Logarithm (LIL) for Brownian motion.
Prove (15.3) now as an exercise using Kolmogorov's test; we shall give a
direct proof in the next section.
16. Brownian exponential martingales and the Law of the Iterated Logarithm.
The main aim of this section is to give McKean's [1] neat proof of the classical
Law of the Iterated Logarithm (LIL) for Brownian motion. Recall the statement
(10.3):
(16.1) $$P\left[\limsup_{t \downarrow 0} \frac{B_t}{\sqrt{2t \log\log(1/t)}} = 1\right] = 1.$$
Because of the time-inversion property (3.5), this is equivalent to the statement
(16.2) $$P\left[\limsup_{t \uparrow \infty} \frac{B_t}{\sqrt{2t \log\log t}} = 1\right] = 1.$$
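Proof of (16.1). Write $h(t) = [2t \log\log(1/t)]^{1/2}$. The first part of the proof is to
show that $\limsup_{t \downarrow 0} B_t/h(t)$ cannot be bigger than 1. Using the Doob
submartingale maximal inequality (see Section II.70) on the exponential martingale
$Z_t := \exp(\alpha B_t - \tfrac12 \alpha^2 t)$, we obtain
(16.3) $$P\Bigl[\sup_{s \le t}(B_s - \tfrac12 \alpha s) > \beta\Bigr] = P\Bigl[\sup_{s \le t} Z_s > e^{\alpha\beta}\Bigr] \le e^{-\alpha\beta} E Z_t = e^{-\alpha\beta}.$$
Now fix $\theta, \delta \in (0,1)$, and apply (16.3) with $t = \theta^n$, $\alpha = \theta^{-n}(1 + \delta) h(\theta^n)$ and $\beta = \tfrac12 h(\theta^n)$.
Since $\alpha\beta = \text{constant} + (1 + \delta) \log n$,
$$P\Bigl[\sup_{s \le \theta^n}(B_s - \tfrac12 \alpha s) > \beta\Bigr] \le \text{constant} \times n^{-(1+\delta)}$$
and so, by the Borel-Cantelli Lemma, it is almost surely true that, for all large
enough $n$,
$$\sup_{s \le \theta^n}\bigl[B_s - \tfrac12 s(1 + \delta)\theta^{-n} h(\theta^n)\bigr] \le \tfrac12 h(\theta^n).$$
Thus, for $\theta^{n+1} < t \le \theta^n$, we have
$$B_t \le \sup_{s \le \theta^n} B_s \le \tfrac12 (2 + \delta) h(\theta^n) \le \tfrac12 \theta^{-1/2} (2 + \delta) h(t),$$
so that
$$\limsup_{t \downarrow 0} B_t/h(t) \le \tfrac12 \theta^{-1/2}(2 + \delta), \quad \text{a.s.}$$
Letting $\theta \uparrow 1$, $\delta \downarrow 0$ through countable sequences, we conclude that
(16.4) $$\limsup_{t \downarrow 0} B_t/h(t) \le 1, \quad \text{a.s.}$$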
We now turn to the second part of the proof, that $P[\limsup B_t/h(t) \ge 1] = 1$.
Once again, choosing $\theta \in (0,1)$, we let $A_n$ be the event
$$A_n = \{B(\theta^n) - B(\theta^{n+1}) > (1 - \theta)^{1/2} h(\theta^n)\}.$$
The events $A_n$ are independent, and, since $B(\theta^n) - B(\theta^{n+1})$ has an $N(0, \theta^n(1 - \theta))$
law, we have, with $a_n := \theta^{-n/2} h(\theta^n)$,
$$P(A_n) = P(B_1 > a_n) = (2\pi)^{-1/2} \int_{a_n}^\infty \exp(-\tfrac12 y^2)\,dy$$
$$> (2\pi)^{-1/2} \int_{a_n}^\infty (1 - 3y^{-4}) \exp(-\tfrac12 y^2)\,dy$$
$$= (2\pi)^{-1/2} \exp(-\tfrac12 a_n^2)\, a_n^{-1}(1 - a_n^{-2}) := \gamma_n, \text{ say.}$$
Since
$$\tfrac12 a_n^2 = \log n + \log\log(\theta^{-1}),$$
we conclude that $\sum_n \gamma_n = +\infty$, by comparison with $\sum_n (n \log n)^{-1}$. Thus, almost
surely, for infinitely many $n$,
$$B(\theta^n) - B(\theta^{n+1}) > (1 - \theta)^{1/2} h(\theta^n).$$
But on applying (16.4) to $-B$, we conclude that, for all large enough $n$,
$$B(\theta^{n+1}) \ge -2h(\theta^{n+1}) > -4\theta^{1/2} h(\theta^n).$$
Thus, for infinitely many $n$,
$$B(\theta^n) > [(1 - \theta)^{1/2} - 4\theta^{1/2}] h(\theta^n),$$
and now the result follows by letting $\theta \downarrow 0$. □
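Two computational steps in the proof are easy to confirm numerically: the Gaussian tail lower bound defining $\gamma_n$, and the identity $\tfrac12 a_n^2 = \log n + \log\log(\theta^{-1})$. The sketch below is an illustration only; the function names are assumptions, and $h$ is evaluated only for $0 < t < 1/e$, where $\log\log(1/t)$ is positive.

```python
import math

def gaussian_tail(a):
    """P[B_1 > a] for a standard normal variable B_1."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def gamma_lower(a):
    """The lower bound (2 pi)^{-1/2} exp(-a^2 / 2) a^{-1} (1 - a^{-2})."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi) * (1.0 - a ** -2) / a

def h(t):
    """h(t) = [2 t log log(1/t)]^{1/2}, for 0 < t < 1/e."""
    return math.sqrt(2.0 * t * math.log(math.log(1.0 / t)))

def a_n(theta, n):
    """a_n = theta^{-n/2} h(theta^n)."""
    return theta ** (-n / 2.0) * h(theta ** n)
```

For example, with `theta = 0.5` and `n = 10`, the quantity `0.5 * a_n(theta, n) ** 2` agrees with `math.log(n) + math.log(math.log(1 / theta))` up to rounding.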
(16.5) COROLLARY. Let $Y_1, Y_2, \ldots$ be IID zero-mean random variables with
variance 1, and let $S_n = \sum_{j=1}^n Y_j$. Then
$$P\left[\limsup_{n \to \infty} \frac{S_n}{\sqrt{2n \log\log n}} = 1\right] = 1.$$
Proof. Please review the Skorokhod embedding of Section 7. We are going to
use the construction and notation of that section, and, in particular, we shall
assume that $(S_n)_{n \ge 0} = (B(T_n))_{n \ge 0}$; that is, the random walk is embedded in $B$.
By the Strong Law, $n^{-1} T_n \to E T_1 = 1$, and so, with $h(t) := \sqrt{2t \log\log t}$,
$$\frac{h(T_n)}{h(n)} \to 1, \quad \text{a.s. } (n \to \infty).$$
Trivially, then,
$$\limsup_{n \to \infty} \frac{S_n}{h(n)} = \limsup_{n \to \infty} \frac{B(T_n)}{h(T_n)} \le \limsup_{t \to \infty} \frac{B(t)}{h(t)} = 1.$$
The other inequality needs a little more care, but is still not hard. Note that
$\rho := \limsup_{n \to \infty} S_n/h(n)$ is a tail measurable random variable, which is therefore
constant a.s. (by the Kolmogorov 0-1 Law). Suppose that $\rho < 1$; then, for all
large enough $n$,
над m<f.
This will contradict (16.2) provided we can prove that in the interval $[T_n, T_{n+1}]$
the Brownian motion cannot rise too far. To estimate this, using the explicit
form of the embedding, we have, for $x > 0$,
$$\varphi(x) := P\Bigl[\sup_{t \le T_1} B_t > x\Bigr] = \gamma \int_{-\infty}^0 F_-(da) \int_x^\infty F_+(db)\,(b - a)\,\frac{-a}{x - a},$$
since $(-a)/(x - a)$ is the probability that the Brownian motion reaches $x$ before
$a$. We aim to prove that, for any $\varepsilon > 0$,
(16.7) $$\sum_n \varphi(\varepsilon\sqrt{2n \log\log n}) < \infty,$$
which, since $\varphi$ is clearly decreasing, will follow from
(16.8) $$\sum_n \varphi(\varepsilon\sqrt{n}) < \infty.$$
The point of this is that if we let
$$A_n := \Bigl\{\sup_{T_n \le u \le T_{n+1}} (B_u - B_{T_n})/h(n) > \varepsilon\Bigr\}$$
then
$$P(A_n) = \varphi(\varepsilon h(n)),$$
and (16.7) will imply that almost surely only finitely many of the $A_n$ occur. Thus,
for all large enough $n$,
$$\sup_{T_n \le u \le T_{n+1}} \frac{B_u}{h(n)} \le \frac{B(T_n)}{h(n)} + \varepsilon < \rho + \varepsilon,$$
whence for all large enough η
sup BJh(u) < ρ + ε.
If we have chosen ε so small that ρ + ε < 1, this contradicts (16.2). The proof
will therefore be complete once we have (16.8).
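We write
$$\varphi(x) = \gamma \int_{-\infty}^0 F_-(da) \int_x^\infty F_+(db)\,(b - x)\,\frac{-a}{x - a} + \gamma \int_{-\infty}^0 F_-(da) \int_x^\infty F_+(db)\,(-a)$$
$$:= \varphi_1(x) + \varphi_2(x), \text{ say.}$$
Now $\varphi_2(x) = 1 - F(x)$, so
$$\sum_n \varphi_2(\varepsilon\sqrt{n}) < \infty \iff \int^\infty \varphi_2(\varepsilon\sqrt{x})\,dx < \infty \iff \int^\infty (1 - F(t))\,t\,dt < \infty;$$
and the last statement is true because $F$ has a finite second moment.
As for $\varphi_1$, we have easily
$$\varphi_1(x) \le x^{-1} \int_x^\infty F_+(db)\,(b - x),$$
and
$$\sum_n \varphi_1(\varepsilon\sqrt{n}) < \infty \Longleftarrow \int^\infty \frac{dx}{\sqrt{x}} \int_{\varepsilon\sqrt{x}}^\infty F_+(db)\,(b - \varepsilon\sqrt{x}) < \infty$$
$$\iff \int^\infty dt \int_t^\infty F_+(db)\,(b - t) < \infty \iff \int b^2 F_+(db) < \infty;$$
and this last statement is true, again by the finiteness of the second moment
of $F$. □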
(16.9) Remarks. The Law of the Iterated Logarithm is just one aspect of a much
bigger picture, and is a simple consequence of the following much deeper result.
(16.10) THEOREM (Strassen). For $n \ge 3$, define
$$X^{(n)}(t) := B(nt)(2n \log\log n)^{-1/2}, \qquad 0 \le t \le 1,$$
a random element of $C([0,1])$. With probability 1, the set of limit points of $(X^{(n)})_{n \ge 3}$
is the set
$$K := \Bigl\{f \in C[0,1] : f \text{ is absolutely continuous},\ f(0) = 0,\ \int_0^1 f'(t)^2\,dt \le 1\Bigr\}.$$
See Strassen [1] or Freedman [1] for a proof. You will not be surprised to
learn that if $(S_n)$ is a zero-mean unit-variance random walk, and
$$S_n(t) := (nt - j) S_{j+1} + (j + 1 - nt) S_j \qquad (j/n \le t \le (j+1)/n)$$
is the piecewise-linear interpolation of $(S_j)$, then almost surely the set of limit
points of $((2n \log\log n)^{-1/2} S_n(\cdot))_{n \ge 3}$ is $K$. This is the Strassen Invariance Principle.
You will also not be surprised to learn that one deduces the Strassen invariance
principle from Theorem 16.10 in the same way that we deduced the classical
LIL, Corollary 16.5, from (16.2), namely by Skorokhod embedding of the random
walk in Brownian motion.
The proof of Theorem 16.10 that you will find in either of the above references
is specific to the Brownian situation, of course. But there is a sense in which
this result can be seen as part of the much more general theory of large deviations,
pioneered by Cramer, Schilder, Donsker and Varadhan, Ventcel and Freidlin,
and developed further by Donsker, Varadhan, Stroock, among others. The
excellent account by Deuschel and Stroock [1] contains a proof of Theorem
16.10 from a large-deviations point of view, and is delightfully clear.
3. BROWNIAN MOTION IN HIGHER DIMENSIONS
17. Some martingales for Brownian motion. By Brownian motion in $\mathbb{R}^d$ we
mean a process $B_t := (B_t^1, \ldots, B_t^d)$ where each of the $(B_t^j)_{t \ge 0}$ $(j = 1, \ldots, d)$ is a
Brownian motion, independent of all the others. To study Brownian motion in
$\mathbb{R}^d$, we are going to need martingales, and the purpose of this section is to
derive a result that gives us all the martingales we shall need. This result can be
seen as a special case of general results in Markov process theory or in stochastic
calculus, but we shall prove it here using the special structure of Brownian
motion, since we do not yet have the general results.
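(17.1) THEOREM. Suppose that $f : \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{R}$ is $C^{1,2}$, and that there exists
a constant $K$ such that, for all $t \ge 0$, $x \in \mathbb{R}^d$,
(17.2) $$|f(t,x)| + \left|\frac{\partial f}{\partial t}(t,x)\right| + \sum_{j=1}^d \left|\frac{\partial f}{\partial x_j}(t,x)\right| + \sum_{i=1}^d \sum_{j=1}^d \left|\frac{\partial^2 f}{\partial x_i \partial x_j}(t,x)\right| \le K e^{K(t + |x|)}.$$
Then the process
(17.3) $$C_t^f := f(t, B_t) - f(0, B_0) - \int_0^t \mathcal{G} f(s, B_s)\,ds \quad \text{is a martingale,}$$
where
(17.4) $$\mathcal{G} f := \frac{\partial f}{\partial t} + \tfrac12 \Delta f.$$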
Remarks. The class $C^{1,2}$ is, of course, the class of functions $f(t,x)$ with
continuous partial derivatives of all orders up to 1 in $t$ and up to 2 in $x$. The
exponential growth condition (17.2) will be seen to be unnecessary provided we
relax the statement (17.3) to say that $C^f$ is a local martingale. We shall not
digress to define this now. In dimension $d = 1$, the only functions of $x$ for which
$f(B_t)$ is a martingale are the linear functions, but in dimension $d \ge 2$ we shall
see that there is a very rich family of $f$ for which $f(B_t)$ is a martingale.
Proof. We must prove that, for $0 \le s \le t$,
$$E[C_t^f - C_s^f \mid \mathcal{F}_s] = 0,$$
for which, by the independent-increments property of $B$, it will suffice to prove
that, for any $x \in \mathbb{R}^d$ and $t \ge 0$,
(17.5) $$E^x[C_t^f] = 0,$$
where $P^x$ is the law of Brownian motion started at $x$. Without loss of generality,
we can take $x = 0$ (and write $P$ for $P^0$), and we shall prove that, for $0 < \varepsilon < t$,
(17.6) $$E[C_t^f - C_\varepsilon^f] = 0.$$
Using the assumption (17.2), the fact that $P[\sup_{u \le t} |B_u| \ge a] \le cP[|B_1| \ge a/\sqrt{t}]$
(see (13.4)), and dominated convergence, (17.6) implies (17.5).
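Letting $p_t(x) := (2\pi t)^{-d/2} \exp(-|x|^2/2t)$ denote the $d$-dimensional Brownian
transition density, we observe that, for $t > 0$, $x \in \mathbb{R}^d$,
(17.7) $$\frac{\partial p_t}{\partial t} = \tfrac12 \Delta p_t.$$
Hence
$$E[C_t^f - C_\varepsilon^f] = E\Bigl[f(t, B_t) - f(\varepsilon, B_\varepsilon) - \int_\varepsilon^t \mathcal{G} f(s, B_s)\,ds\Bigr]$$
$$= \int [p_t(x) f(t,x) - p_\varepsilon(x) f(\varepsilon, x)]\,dx - \int_\varepsilon^t ds \int p_s(x)\Bigl[\frac{\partial f}{\partial t}(s,x) + \tfrac12 \Delta f(s,x)\Bigr] dx.$$
But
$$\int p_s(x)\,\tfrac12 \Delta f(s,x)\,dx = \int \tfrac12 \Delta p_s(x)\,f(s,x)\,dx$$
(integrating twice by parts and using (17.2))
$$= \int \frac{\partial p_s}{\partial t}(x)\,f(s,x)\,dx,$$
using (17.7). Thus
$$E[C_t^f - C_\varepsilon^f] = \int [p_t(x) f(t,x) - p_\varepsilon(x) f(\varepsilon,x)]\,dx - \int_\varepsilon^t ds \int \Bigl[p_s(x) \frac{\partial f}{\partial t}(s,x) + f(s,x) \frac{\partial p_s}{\partial t}(x)\Bigr] dx$$
$$= \int [p_t(x) f(t,x) - p_\varepsilon(x) f(\varepsilon,x)]\,dx - \int_\varepsilon^t ds \int \frac{\partial}{\partial s}\bigl(p_s(x) f(s,x)\bigr)\,dx$$
$$= \int [p_t(x) f(t,x) - p_\varepsilon(x) f(\varepsilon,x)]\,dx - \int dx \int_\varepsilon^t ds\,\frac{\partial}{\partial s}\bigl(p_s(x) f(s,x)\bigr)$$
$$= 0. \qquad \Box$$
Note that the interchange of the $s$-integral and the $x$-integral in the penultimate
line is only justified because we have ensured $s \ge \varepsilon > 0$ and so $p_s(x)$ and $\partial p_s(x)/\partial t$
are bounded; this is why we had to proceed to the natural (17.5) via the slightly
clumsy (17.6).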
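The key identity (17.7) can be checked by finite differences. The sketch below is illustrative only: the function names and step sizes are assumptions, and the derivatives are approximated by central differences rather than computed exactly. It evaluates both sides of $\partial p_t/\partial t = \tfrac12 \Delta p_t$ at a point of $\mathbb{R}^d$.

```python
import math

def p(t, x):
    """d-dimensional Brownian transition density p_t(x); x is a sequence of length d."""
    d = len(x)
    r2 = sum(xi * xi for xi in x)
    return (2.0 * math.pi * t) ** (-d / 2.0) * math.exp(-r2 / (2.0 * t))

def dp_dt(t, x, h=1e-4):
    """Central-difference approximation to the time derivative of p_t(x)."""
    return (p(t + h, x) - p(t - h, x)) / (2.0 * h)

def half_laplacian(t, x, h=1e-3):
    """Central-difference approximation to (1/2) * Laplacian of p_t at x."""
    total = 0.0
    for j in range(len(x)):
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        total += (p(t, up) - 2.0 * p(t, x) + p(t, down)) / (h * h)
    return 0.5 * total
```

For example, `dp_dt(0.7, (0.3, -0.5, 1.1))` and `half_laplacian(0.7, (0.3, -0.5, 1.1))` should agree to several decimal places.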
18. Recurrence and transience in higher dimensions. As a first application of
Theorem 17.1, we shall show in this section that Brownian motion is recurrent
in dimension $d = 2$, and transient in dimension $d \ge 3$. If $B$ is Brownian motion
in $\mathbb{R}^d$, let
$$H_a := \inf\{t > 0 : |B_t| = a\}.$$
(18.1) THEOREM. For $0 < a < |x| < b$,
(18.2) $$P^x(H_a < H_b) = \begin{cases} \dfrac{\log b - \log|x|}{\log b - \log a} & (d = 2), \\[2ex] \dfrac{|x|^{2-d} - b^{2-d}}{a^{2-d} - b^{2-d}} & (d \ge 3). \end{cases}$$
Proof. Let $f : \mathbb{R}^d \to \mathbb{R}$ be a $C_K^2$ function such that, for $a \le |x| \le b$,
$$f(x) = \begin{cases} \log|x| & (d = 2), \\ |x|^{2-d} & (d \ge 3). \end{cases}$$
Then $f$ satisfies the conditions of Theorem 17.1, and moreover,
$$\Delta f(x) = 0 \quad \text{for } a \le |x| \le b,$$
as is readily verified. So, using the Optional Sampling Theorem on the martingale
$C^f$ at the stopping time $\tau := H_a \wedge H_b$ yields, for $|x| \in (a, b)$,
$$0 = E^x[C_0^f] = E^x[C_\tau^f] = E^x[f(B_{H_a}) : H_a < H_b] + E^x[f(B_{H_b}) : H_b < H_a] - f(x),$$
since $\tau \le H_b < \infty$, a.s. (one-dimensional Brownian motion certainly leaves
$[-b, b]$ in finite time!). So, in the case $d = 2$,
$$f(x) = \log|x| = P^x[H_a < H_b] \log a + P^x[H_b < H_a] \log b,$$
and rearrangement gives (18.2), with analogous reasoning for the case $d \ge 3$. □
(18.3) COROLLARY. Brownian motion in dimension $d = 2$ is recurrent, and in
dimension $d \ge 3$ it is transient; more precisely, for any $0 < a < |x|$,
(18.4) $$P^x[H_a < \infty] = \begin{cases} 1 & (d = 2), \\ (a/|x|)^{d-2} & (d \ge 3). \end{cases}$$
Proof. Since $\{H_a < \infty\} = \bigcup_n \{H_a < H_n\}$, (18.4) follows immediately from (18.2)
by letting $b \uparrow \infty$.
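The passage from (18.2) to (18.4) can be illustrated by evaluating the formulas for increasing $b$. This is a small sketch with hypothetical function names; it only encodes the right-hand sides of (18.2) and (18.4) and makes no claim beyond them.

```python
import math

def prob_inner_first(x_norm, a, b, d):
    """Right-hand side of (18.2): P^x(H_a < H_b), where x_norm = |x| and a < |x| < b."""
    if d == 2:
        return (math.log(b) - math.log(x_norm)) / (math.log(b) - math.log(a))
    if d >= 3:
        e = 2 - d
        return (x_norm ** e - b ** e) / (a ** e - b ** e)
    raise ValueError("(18.2) is stated for d >= 2")

def prob_ever_hit(x_norm, a, d):
    """Right-hand side of (18.4): P^x(H_a < infinity) for 0 < a < |x|."""
    if d == 2:
        return 1.0
    if d >= 3:
        return (a / x_norm) ** (d - 2)
    raise ValueError("(18.4) is stated for d >= 2")
```

In dimension 2 the convergence to 1 as $b \uparrow \infty$ is only logarithmic, which `prob_inner_first(2.0, 1.0, b, 2)` makes visible for, say, `b = 1e2, 1e4, 1e8`.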
(18.5) Remarks. It follows immediately that Brownian motion in the plane visits
every non-empty open set U with probability 1, and, in fact, keeps returning to
U: there is no last visit to U. In dimension greater than 2, any bounded set
will ultimately be left for ever. Thus, in dimension $d \ge 3$, Brownian motion is
transient. In dimension 2, Brownian motion is recurrent—or, more accurately,
neighbourhood-recurrent, since Brownian motion in the plane does not hit points,
as we now show.
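(18.6) COROLLARY. For Brownian motion in $\mathbb{R}^2$, $|x| > 0$,
$$P^x[H_0 < \infty] = 0.$$
Proof.
$$\{H_0 < \infty\} = \bigcup_n \{H_0 \le H_n\} = \bigcup_n \Bigl(\bigcap_{m > |x|^{-1}} \{H_{1/m} \le H_n\}\Bigr),$$
and
$$P^x\Bigl[\bigcap_{m > |x|^{-1}} \{H_{1/m} \le H_n\}\Bigr] = \lim_{m \to \infty} P^x[H_{1/m} \le H_n] = 0,$$
using (18.2). □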
(18.7) Remarks. We are seeing here the first signs of the important result that
if $(B_t)_{t \ge 0}$ is Brownian motion in $\mathbb{R}^d$ $(d \ge 2)$ then $(|B_t|)_{t \ge 0}$ is a diffusion process,
called a $d$-dimensional Bessel process (denoted BES($d$)). We postpone discussion
of this until we have a better idea of what a diffusion process is; see Section V.48
for a detailed analysis and Section VI.52 for the celebrated Ray-Knight Theorem,
where Bessel processes enter in a wholly natural way to describe the diffusion
property in the spatial parameter of Brownian local time $(l(\tau, x))_{x \ge 0}$ taken at
some suitable stopping time $\tau$.
The Bessel processes are the most important one-dimensional diffusions apart
from Brownian motion; Pitman and Yor [1] provide a detailed study of many
of their properties. See also Revuz and Yor [1].
The fact that the Bessel process is a diffusion in its own right is due to the
fact that Brownian motion in $\mathbb{R}^d$ is rotation-invariant; if $R \in O(d)$ is fixed then
$(RB_t)_{t \ge 0}$ is a Brownian motion too. The proof is trivial. Lemma 14.1 now allows
us to conclude that $|B_t|$ is a Markov process.
19. Some applications of Brownian motion to complex analysis. One of the
richest areas of application of Brownian motion is complex analysis. The fundamental
observation is that if $f$ is an analytic function in some domain $D$ then
each of the functions $\operatorname{Re} f$ and $\operatorname{Im} f$ satisfies Laplace's equation $\Delta u = 0$ in $D$.
Thus, in view of Theorem 17.1, $\operatorname{Re} f(Z_t)$ and $\operatorname{Im} f(Z_t)$ are local martingales, where
$Z_t := X_t + iY_t$ is complex Brownian motion.
(19.1) Remark. Even if $f$ were defined on the whole of $\mathbb{C}$, there would be no
reason for $\operatorname{Re} f$ to satisfy the growth condition (17.2) (as an example, take
$f(z) = \exp(z^3)$). This is not really a problem, because if we take an open $D_0 \subset D$,
with $\bar D_0$ compact, and take $\tilde f$ to be $C_K^2$, equal to $f$ in $D_0$, then certainly $\operatorname{Re} \tilde f$
satisfies (17.2), and so $\operatorname{Re} f(Z_{t \wedge \tau})$ is a martingale, $\tau := \inf\{t : Z_t \notin D_0\}$. In all the
applications we make to complex analysis, this sort of localisation will often
be implicit, and we shall not dwell on the details.
(19.2) PROPOSITION (Maximum Modulus Theorem). Suppose that $f : D \to \mathbb{C}$
is analytic, and that $D \supseteq \bar D_r := \{z \in \mathbb{C} : |z| \le r\}$. Then
(19.3) $$\max\{|f(z)| : z \in \bar D_r\} = \max\{|f(z)| : |z| = r\}.$$
Proof. If $f$ were constant, there would be nothing to prove; so suppose $f$ is not
constant, and take $z_0 \in D_r$ such that
$$|f(z_0)| = \max\{|f(z)| : z \in \bar D_r\}.$$
Since $f$ is bounded on $\bar D_r$, we may add $\lambda f(z_0)$ to $f$ and assume (by taking $\lambda > 0$
large enough) that $f(\bar D_r)$ is contained in a half-space distant at least 1 from 0.
Thus $\log f$ is a well-defined analytic function on $D_r$, continuous on $\bar D_r$. Hence
$h := \operatorname{Re} \log f = \log|f|$ is harmonic in $D_r$ and continuous in $\bar D_r$. The Maximum
Modulus Principle will follow once we show the stronger result:
(19.4) if $U \subseteq \mathbb{C}$ is open, and $h : U \to \mathbb{R}$ is harmonic then $h$ has no local maximum
in $U$: if $z_0 \in D(z_0, \varepsilon) := \{z : |z - z_0| < \varepsilon\} \subseteq U$ and $h(z_0) \ge h(z)$ for all $z \in D(z_0, \varepsilon)$
then $h(z_0) = h(z)$ for all $z \in D(z_0, \varepsilon)$.
The proof of this is a trivial application of the Optional Sampling Theorem. If
$\tau := \inf\{t : |Z_t - z_0| = \delta\}$, where $Z_0 = z_0$ then, since $h(Z_{t \wedge \tau})$ is a martingale (Theorem
17.1), we have
$$h(z_0) = E h(Z_\tau) = \int_0^{2\pi} \frac{d\theta}{2\pi}\,h(z_0 + \delta e^{i\theta}),$$
since, by symmetry, the exit distribution from $D(z_0, \delta)$ is uniform on the
boundary. But, since $h$ is continuous and $h(z_0 + \delta e^{i\theta}) \le h(z_0)$ for $\delta < \varepsilon$, it must
be that $h(z_0 + \delta e^{i\theta}) = h(z_0)$ for $\delta < \varepsilon$, establishing (19.4).
Thus if $|f|$ had a local maximum inside $D_r$, there would be a disc where
$\log|f|$ was constant, implying that $\log f$ (and therefore $f$) was constant in the
disc, and hence throughout $D$. □
(19.5) PROPOSITION (Fundamental Theorem of Algebra). Suppose that $f : \mathbb{C} \to \mathbb{C}$
is a non-constant polynomial. Then there exists $z_0 \in \mathbb{C}$ such that $f(z_0) = 0$.
Proof. Suppose the contrary, that $f$ is non-vanishing in $\mathbb{C}$. Then $g := 1/f$ is a
well-defined analytic function, tending to zero at infinity (since $f$ is a polynomial)
and therefore bounded. Since $f$ is non-constant, we may find disjoint discs $D_1$
and $D_2$ and $\alpha < \beta$ such that
$$\operatorname{Re} g(z_1) \le \alpha < \beta \le \operatorname{Re} g(z_2), \qquad z_i \in D_i.$$
Let $Z$ be Brownian motion in $\mathbb{C}$, and consider the bounded martingale $\operatorname{Re} g(Z_t)$.
The Martingale Convergence Theorem says that this is almost surely convergent,
yet, by Corollary 18.3, the process $Z$ keeps visiting $D_1$ and $D_2$; there is no last
visit to $D_1$ or $D_2$. Hence $\liminf \operatorname{Re} g(Z_t) \le \alpha < \beta \le \limsup \operatorname{Re} g(Z_t)$, a
contradiction. □
So far, we have not really been using the full strength of the Brownian-
motion/complex-analysis combination; we have only applied harmonic functions
to Brownian motion, and an analytic function comprises two very intimately
related harmonic functions. See the theory of conformal martingales (Getoor
and Sharpe [5]) and Chapter 4. The connection with Brownian motion goes
much deeper, as the following result shows.
(19.6) THEOREM. Let $f : D \to \tilde D$ be analytic, and let $Z$ be Brownian motion in
$D$. Then there is a Brownian motion $\tilde Z$ in $\tilde D$ such that
$$f(Z_t) = \tilde Z\Bigl(\int_0^t |f'(Z_u)|^2\,du\Bigr).$$
Remarks. A Brownian motion in a domain $D \subseteq \mathbb{C}$ is only defined up until the
first exit time from D. The importance of (19.7) is that the image of Brownian
motion under an analytic map is another Brownian motion (to within a time
change). See Section IV.34 for a proof. A common and powerful use of this is
that the exit distribution for one domain gets mapped to the exit distribution
of some other domain by an analytic map. Let us see an example of this.
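(19.7) PROPOSITION (Poisson integral formula). Let $\mathbb{H} = \{z \in \mathbb{C} : \operatorname{Im} z > 0\}$,
$(Z_t)_{t \ge 0}$ be Brownian motion started at $y = a + ib \in \mathbb{H}$, and $\tau := \inf\{u : Z_u \notin \mathbb{H}\}$.
Then
(19.8) $$P[\operatorname{Re} Z_\tau \in dx] = \frac{b\,dx}{\pi[b^2 + (x - a)^2]} = \operatorname{Im}\Bigl(\frac{1}{x - y}\Bigr) \frac{dx}{\pi}.$$
First proof. The stopping time $\tau$ is simply the first time that $Y := \operatorname{Im} Z$ hits zero.
But this time has a density that we know (13.5). Meanwhile, $X = \operatorname{Re} Z$ is moving
like an independent Brownian motion, so
$$P[X_\tau \in dx]/dx = \int_0^\infty \frac{b e^{-b^2/2t}}{\sqrt{2\pi t^3}} \cdot \frac{e^{-(x-a)^2/2t}}{\sqrt{2\pi t}}\,dt = \frac{b}{\pi[b^2 + (x - a)^2]}. \qquad \Box$$
Second proof. The map
$$z \mapsto w(z) := \frac{z - y}{z - \bar y}$$
maps $\mathbb{H}$ one-one onto $\mathbb{D} = \{w : |w| < 1\}$, and takes $y$ to 0. Thus $\{w(Z_t) : t < \tau\}$
is a time transformation of Brownian motion on $\mathbb{C}$ started at 0 and run until
it exits the disc $\mathbb{D}$. Thus, for $x \in \mathbb{R}$,
$$P(Z_\tau \in dx) = \frac{1}{2\pi} |w'(x)|\,dx,$$
which agrees with (19.8). □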
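The two proofs, and the two forms of (19.8), can be compared numerically. The following sketch uses hypothetical function names; the derivative $w'(x)$ is approximated by a central difference, and the time integral of the first proof is evaluated by a midpoint rule after the substitution $u = 1/t$, under which the integrand becomes $b\,e^{-r^2 u/2}/(2\pi)$ with $r^2 = b^2 + (x - a)^2$.

```python
import math

def poisson_density(x, y):
    """b / (pi [b^2 + (x - a)^2]) for the complex starting point y = a + ib."""
    a, b = y.real, y.imag
    return b / (math.pi * (b * b + (x - a) ** 2))

def via_imaginary_part(x, y):
    """Im(1 / (x - y)) / pi, the second form in (19.8)."""
    return (1.0 / (x - y)).imag / math.pi

def w(z, y):
    """The map z -> (z - y) / (z - conj(y)) of the second proof."""
    return (z - y) / (z - y.conjugate())

def via_conformal_map(x, y, h=1e-6):
    """|w'(x)| / (2 pi), with w' approximated by a central difference."""
    return abs(w(x + h, y) - w(x - h, y)) / (2.0 * h) / (2.0 * math.pi)

def via_time_integral(x, y, n=20000):
    """First proof: midpoint rule in u = 1/t, truncated where the integrand is negligible."""
    a, b = y.real, y.imag
    r2 = b * b + (x - a) ** 2
    du = (80.0 / r2) / n
    total = sum(math.exp(-0.5 * r2 * (k + 0.5) * du) for k in range(n))
    return b * total * du / (2.0 * math.pi)
```

With, say, `y = 0.3 + 1.2j` and `x = -0.8`, the four functions should return the same value up to numerical error.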
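By using the Riemann Mapping Theorem, we can find the exit distribution
from any connected simply connected domain by transforming the problem to
one for Brownian motion in $\mathbb{D}$ started at 0. Suppose, for example, that we want
the exit distribution from $\mathbb{D}$ for Brownian motion started at $\zeta$ in $\mathbb{D}$. The
map
$$z \mapsto g(z) := \frac{f(z) - f(\zeta)}{f(z) - \overline{f(\zeta)}}, \quad \text{where } f(z) := i\,\frac{1 + z}{1 - z} \in \mathbb{H},$$
maps $\mathbb{D}$ one-one onto $\mathbb{D}$, taking $\zeta$ to 0. Thus, if $z = e^{i\theta}$,
(19.9) $$P^\zeta(\text{Brownian motion exits } \mathbb{D} \text{ in } d\theta) = \frac{1}{2\pi} |g'(z)|\,d\theta = \frac{1 - |\zeta|^2}{|z - \zeta|^n}\,\mu_n(d\theta),$$
where $n = 2$ and $\mu_2$ is the normalized Lebesgue measure on $\partial\mathbb{D}$.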
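The identity in (19.9) between $|g'(z)|/2\pi$ and the Poisson kernel of the disc can be checked numerically. The sketch below is illustrative only: the names are assumptions, and $|g'(z)|$ is approximated by a central difference in $\theta$ along the unit circle (where $|dz/d\theta| = 1$), avoiding $\theta = 0$, at which $f$ has its pole.

```python
import cmath
import math

def cayley(z):
    """f(z) = i (1 + z) / (1 - z), mapping the unit disc onto the upper half-plane."""
    return 1j * (1 + z) / (1 - z)

def g(z, zeta):
    """g(z) = (f(z) - f(zeta)) / (f(z) - conj(f(zeta)))."""
    fz, fzeta = cayley(z), cayley(zeta)
    return (fz - fzeta) / (fz - fzeta.conjugate())

def exit_density_via_map(theta, zeta, h=1e-6):
    """|g'(z)| / (2 pi) at z = exp(i theta), by a central difference in theta."""
    zp = cmath.exp(1j * (theta + h))
    zm = cmath.exp(1j * (theta - h))
    return abs(g(zp, zeta) - g(zm, zeta)) / (2.0 * h) / (2.0 * math.pi)

def poisson_kernel(theta, zeta):
    """(1 - |zeta|^2) / |z - zeta|^2 times the normalised Lebesgue density 1 / (2 pi)."""
    z = cmath.exp(1j * theta)
    return (1.0 - abs(zeta) ** 2) / abs(z - zeta) ** 2 / (2.0 * math.pi)
```

Summing `poisson_kernel(theta, zeta)` over a uniform grid of angles and multiplying by the grid spacing should give a value close to 1, as befits an exit distribution.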
Of course, complex analysts are very familiar with the use of the Riemann
Mapping Theorem to obtain exit distributions (or Poisson kernels or harmonic
measure). Like them, we have to be sensible in a case such as that in which the
domain is D\(— 1,0] and has the wrong topology, the upper and lower parts
of the cut needing to be separated.
Martin boundary theory will clarify the matter of correct topologies, show
that (19.9) holds in all dimensions (check it when $n = 1$ now!), and prove that
every positive harmonic function $h$ on $\mathbb{D}^n := \{x \in \mathbb{R}^n : |x| < 1\}$ with $h(0) = 1$ has
a unique representation
$$h(\zeta) = \int_{\partial\mathbb{D}^n} \frac{1 - |\zeta|^2}{|z - \zeta|^n}\,\nu(dz),$$
where $\nu$ is a probability measure on $\partial\mathbb{D}^n$. When $h \equiv 1$ on $\mathbb{D}^n$, $\nu$ is the normalized
Lebesgue measure on $\partial\mathbb{D}^n$.
20. Windings of planar Brownian motion. Let $(Z_t)_{t \ge 0}$ be complex Brownian
motion, $Z_0 \ne 0$. Then there is a continuous determination of $\theta_t := \arg(Z_t)$, and a
unique one with $\theta_0 \in [0, 2\pi)$. The angle $\theta_t$ keeps track of the winding of Brownian
motion about 0 up to time $t$. The earliest result on the windings of Brownian
motion is the following remarkable theorem of Spitzer [2].
(20.1) THEOREM (Spitzer). Suppose that $Z_0 = 1$. Then
(20.2) $$\frac{2\theta_t}{\log t} \xrightarrow{\mathcal{D}} C_1 \quad (t \to \infty),$$
where $C_1$ is the standard Cauchy law with density $[\pi(1 + x^2)]^{-1}$.
Proof. Let $P^z$ denote the law of complex Brownian motion started at $z \in \mathbb{C}$. For
$x > 0$, and Brownian motion $Z$ started at 1, $\tilde Z_t := xZ_{t/x^2}$ is Brownian motion
started at $x$, and $\tilde\theta_t = \theta_{t/x^2}$. Thus an equivalent statement to (20.2) is
(20.3) the $P^x$-law of $(\log 1/x)^{-1}\theta_1$ converges to $C_1$ as $x \downarrow 0$,
and it is this that we prove. The idea of the proof is to fix some very small disc
of radius $a$ about the origin, which (with high probability) Brownian motion
will leave before time 1. As $x \downarrow 0$, the contribution to $\theta_1$ that comes after hitting
the circle of radius $a$ is negligible, so we want the limiting behaviour of $\theta_T$,
where $T = \inf\{t : |Z_t| = a\}$. But, by Theorem 19.6, we could realize the Brownian
motion $Z$ started at $x > 0$ as a time change of $\exp(\zeta)$, where $\zeta$ is complex
Brownian motion started at $\log x$. In particular, the argument of $Z_T$ is equal
to the imaginary part of $\zeta$ where $\zeta$ first hits the line $\operatorname{Re} z = \log a$, and the law
of this is Cauchy with parameter $\log(a/x)$, from Proposition 19.7; this is where
the Cauchy distribution comes from.
Now we implement this sketched proof. Suppose given ε > 0 and some
bounded uniformly continuous test function /: R -* [0,1]. Choose δ > 0 so small
that|x->;|^(5=>|/(x)-/(};)|^^8.NowfixaG(0,l)sosmallthat,forall|z|^a,
P'CIZJ^a forall ί^1]<^ε.
As we explained above, for all χ in (0, a\
theV.lawqf-^—^^^isC,.
log (α/χ) log (α/χ)
Now pick К so large that
Pe[| 0,| ^ К for some t ^ 1] ^ £ε,
and x0 > 0 so small that К ^ δ log (l/x0) and, for χ ^ x0,
log (α/χ) \,
i>-' (»S)
Then, for χ ^ x0,
θ τ
Ε
/
log (α/χ)
-/
^
log(l/x)
<ΕΊ/
0T
-/
^7
+ E>
f
θγ
log(l/x)
■/
0i
log(l/x)
1оЕ(а/х)У 4log(l/x)
<±ε + P*[T> 1] + Px[|0t - θτ\ > Κ for some Τ < ί < 1, Τ ^ 1]
Pj \ / 0j
+ E·
ίΚ
log(l/x)
■/
log(l/x)
:T^1, \вг-вт\^К for Γ^ί^Ι
D
(20.4) Remarks
(i) The key to this simple proof of Spitzer's theorem is the Brownian Mapping
Theorem 19.6. Spitzer's original proof was based on complicated calculations
(see Ito and McKean [1, pp. 270-271]), but several proofs based on the
Brownian Mapping Theorem have since been given; see Durrett [1] and
Messulam and Yor [1].
(ii) If one takes Brownian motion in {z:\z\ ^ 1} reflected in the unit circle
then the above argument goes through with the obvious changes to show that
if Ζ starts at 1 then
(20.5) $$\frac{2\theta_t}{\log t} \xrightarrow{\mathcal{D}} S \quad (t \to \infty),$$
where S is the distribution with density (Zncoshy)'1 (remarkably, the Fourier
transform of this distribution is sech 0: see Feller [1, Vol. 2, p. 503] who remarks
that the distribution is 'of no importance'!) The result (20.5) was pointed out to
us by Kalvis Jansons. The limit law $S$ has all moments, in contrast to $C_1$, which
is evidence that windings that happen near zero make a large contribution.
(iii) The study of Brownian windings has been carried a very long way in
recent years by Pitman and Yor [3,4] and Le Gall and Yor [1,2],
As a sample of the kinds of results achieved, we give the asymptotic joint
distribution of the windings about η points.
(20.6) THEOREM (Pitman and Yor [3]). Let $z_1, \ldots, z_n$ be distinct points of $\mathbb{C}$,
$\theta_t^j$ the winding of $Z$ about $z_j$ by time $t$. Then
(20.7) $$\frac{2}{\log t}(\theta_t^1, \ldots, \theta_t^n) \xrightarrow{\mathcal{D}} (V_1, \ldots, V_n),$$
where $V_j = U + H Y_j$, the variables $Y_j$ are independent standard Cauchy, independent
of the pair $(U, H)$, which has joint distribution characterised by
(20.8) Eexp(- aH + ivU) = ( cosh ν + °^L1 J
for α ^ 0, veR.
(iv) For a proof using Brownian windings of Picard's Little Theorem (if
/:<C-><C is analytic and non-constant then the range of / can omit at most
one value), see Davis [1] and also Durrett [1]. For a proof of Picard's Great
Theorem, see Davis [2].
21. Multiple points, cone points, cut points. In this section we give without proof
a number of fascinating and beautiful results that illuminate the behaviour of
the Brownian path, and provide a few references to an area in a bewildering state
of development.
(21.1) THEOREM (Dvoretsky-Erdos-Kakutani [1-3]; Dvoretsky-Erdos-
Kakutani-Taylor [1])
(a) Brownian motion in two dimensions has points of all multiplicities $2, 3, \ldots, \mathfrak{c}$,
where $\mathfrak{c}$ denotes the multiplicity of the continuum.
(b) Brownian motion in three dimensions has double points but no triple points.
(c) Brownian motion in dimension greater than three has no double points.
(21.2) Remarks. Numerous proofs of all or part of Theorem 21.1 have appeared
since the first ones, many of them valid for more general Levy processes; see
Hawkes [1], Evans [1], Le Gall, Rosen and Shieh [1] and Rogers [5] for a
sample. The existence of multiple points of Brownian motion has been but a
small part of a much more profound study of the existence and properties of
intersection local time, carried out by Dynkin, Le Gall, Rosen, Wolpert, Yor
and others; we refer the interested reader to some of the papers cited in the
bibliography for more information on these topics. Applications include questions
related to quantum field theory (Dynkin [5,7,8,9]), the asymptotics of the
'Wiener sausage' (Le Gall [4]) and the exact Hausdorff measure of the set of
Brownian multiple points (Le Gall [5]).
For $\alpha \in (0, \pi)$ let $C_\alpha$ denote the wedge $\{re^{i\theta} : r \ge 0, |\theta| \le \alpha\}$ of angle $2\alpha$. If $Z$ is
Brownian motion in $\mathbb{C}$ then a time $t$ such that
    $Z_u \in Z_t - C_\alpha$ for all $u \le t$
is called a cone time, the set of all such being denoted $H_\alpha$. A cone time $t$ is a
time at which the path-so-far lies in the shifted cone $-C_\alpha$ with vertex at the
current position, $Z_t$. The position $Z_t$ is then called a cone point. When do cone
points exist? The following result of Burdzy [1] and Shimura [1] answers this.
(21.3) THEOREM (Burdzy [1]; Shimura [1]). Cone points exist if and only if
$\alpha > \tfrac14\pi$.
(21.4) Remarks. Le Gall [6] shows how this result follows easily from the
criterion of Varadhan and Williams [1] for a reflecting Brownian motion in a
wedge to hit the corner. Evans [2] and Le Gall [6] study the Hausdorff dimension
of the set of Brownian cone points and the construction and properties of a
local time on the set $H_\alpha$ of cone times.
And lastly in this rushed survey of interesting Brownian motion properties,
we give the fine result of Burdzy [2] on cut points of the two-dimensional
Brownian path.
(21.5) THEOREM (Burdzy [2]). Let $Z$ be Brownian motion in $\mathbb{C}$. Then almost
surely there exists $t \in (0, 1)$ such that
    $\{Z_s : 0 \le s < t\} \cap \{Z_s : t < s \le 1\} = \emptyset.$
22. Potential theory of Brownian motion in $\mathbb{R}^d$ ($d \ge 3$). This brief section can
do no more than provide some heuristic sketches of an immense topic; the
books of Blumenthal and Getoor [1], Dellacherie and Meyer [1], Helms [1],
Meyer [2] and Port and Stone [3] are just a few of the many written on this
subject. We shall later provide proofs of some of the results listed here.
Basically, any transient Markov process has a potential theory, but we shall
discuss here only Brownian motion in at least three dimensions. The key
concept of potential theory is the Green kernel G defined by
(22.1)    $Gf(x) = \int_0^\infty P_t f(x)\,dt, \qquad f \in C_b(\mathbb{R}^d),$
where $(P_t)_{t \ge 0}$ is the transition semigroup of our Markov process. For BM($\mathbb{R}^d$),
we have
(22.2)    $Gf(x) = \int g(x, y) f(y)\,dy,$
where
(22.3)    $g(x, y) = \int_0^\infty p_t(x, y)\,dt$
              $= \int_0^\infty (2\pi t)^{-d/2} \exp\left(-\dfrac{|y - x|^2}{2t}\right) dt$
              $= \tfrac12 \pi^{-d/2} |y - x|^{2-d} \int_0^\infty e^{-u} u^{-2 + d/2}\,du$
              $= \tfrac12 \pi^{-d/2} |y - x|^{2-d}\, \Gamma(\tfrac12 d - 1).$
(Note that our Green function is based on the probabilists' normalisation $\tfrac12\Delta$,
rather than the physicists' $\Delta$.) The same calculation in dimension 1 or 2 gives
an infinite answer, because Brownian motion in those dimensions is recurrent.
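The closed form for $g$ is easy to check numerically. The following Python sketch is an illustration only: the function names, the substitution $t = e^s$ and the truncation of the $s$-range are our own choices, not part of the text. It integrates the Brownian transition density over time and compares the result with $\tfrac12 \pi^{-d/2} |y - x|^{2-d}\, \Gamma(\tfrac12 d - 1)$.

```python
import math


def transition_density(t, r, d):
    """Brownian transition density p_t(x, y) in R^d, where r = |y - x|."""
    return (2.0 * math.pi * t) ** (-d / 2.0) * math.exp(-r * r / (2.0 * t))


def green_closed_form(r, d):
    """Closed form of g(x, y): (1/2) pi^(-d/2) r^(2-d) Gamma(d/2 - 1), d >= 3."""
    return 0.5 * math.pi ** (-d / 2.0) * r ** (2 - d) * math.gamma(d / 2.0 - 1.0)


def green_numeric(r, d, steps=60_000, s_min=-30.0, s_max=30.0):
    """Midpoint rule for int_0^infinity p_t dt after substituting t = exp(s).

    The range [s_min, s_max] is an assumed truncation; for d >= 3 the
    neglected tails are negligible at the accuracy used here.
    """
    h = (s_max - s_min) / steps
    total = 0.0
    for k in range(steps):
        t = math.exp(s_min + (k + 0.5) * h)
        total += transition_density(t, r, d) * t * h  # dt = t ds
    return total
```

For $d = 3$ the closed form reduces to $1/(2\pi|y - x|)$, which the numerical integral reproduces to several digits.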
The probabilistic interpretation
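(22.4)    $G I_A(x) = E^x\left[\int_0^\infty I_A(B_t)\,dt\right]$
                  $= E^x[\text{time spent in } A \text{ by } B]$
is a useful one to bear in mind. A heuristic but suggestive calculation gives from
the notion (4.4) of the generator $\mathcal{G} = \tfrac12\Delta$ as the derivative of the semigroup
that
(22.5)    $\dfrac{d}{dt}P_t = \tfrac12\Delta P_t \;\Rightarrow\; P_t = \exp(\tfrac12 t\Delta) \;\Rightarrow\; \int_0^\infty P_t\,dt =: G = (-\tfrac12\Delta)^{-1},$
and thinking of the Green kernel as the inverse of $-\tfrac12\Delta$ will never lead you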
astray, though usually a proof must be sought elsewhere.
(22.6) Exercise. For $f \in C_K^\infty(\mathbb{R}^d)$, prove that
    $\tfrac12\Delta Gf = G(\tfrac12\Delta f) = -f.$
Hint. Use analysis to prove the first equality, then (22.4) and Theorem 17.1 for
the second.
Fix now some pathwise-connected compact subset $K \subseteq \mathbb{R}^d$, which we think
of in physical terms as a conducting body. A classical problem of electrostatics
is to determine the equilibrium charge distribution for $K$, and the equilibrium
potential; if a charge is placed on $K$, then the charge will flow in $K$ very rapidly
so as to equate the electrostatic potential everywhere within $K$. (If the potential
were not constant, charge would flow between regions where it differed until
everything was evened out.) This equilibrium charge distribution will minimise
the energy of the charge, and the potential associated with it is called the
equilibrium potential.
How are these physical concepts related to the probabilistic ones? The
following theorems do not require connectedness of K.
(22.7) THEOREM (Hunt). The function $P_K 1$ on $\mathbb{R}^d$ defined by
(22.8)    $P_K 1(x) := P^x(B_t \in K \text{ for some } t > 0)$
is expressible as the potential
(22.9)    $P_K 1 = G\mu_K = \int g(\cdot, y)\,\mu_K(dy)$
of a unique measure $\mu_K$ on $K$. The measure $\mu_K$ is concentrated on $\partial K$. If $\mu$ is any
other measure concentrated on $K$, and satisfying $G\mu \le 1$ on $K$, then $G\mu \le G\mu_K$
on $\mathbb{R}^d$.
(The restriction in (22.8) to $t > 0$ is essential: consider the case $K = \{0\}$!)
The measure $\mu_K$ appearing in Theorem 22.7 is, of course, what we shall call
the equilibrium charge distribution. The capacity $C(K)$ of $K$ is defined to be
$\mu_K(K)$, and is characterised by the extremal property
(22.10)    $C(K) := \max\{\mu(K) : \mu \text{ is concentrated on } K,\ G\mu \le 1 \text{ on } K\}$
                 $= \max\{\mu(K) : \mu \text{ is concentrated on } K,\ G\mu \le 1 \text{ on } \mathbb{R}^d\}.$
These concepts are neatly related to the concept of energy: the energy $\mathcal{E}(\mu)$ of
a measure $\mu$ on $\mathbb{R}^d$ is defined by
(22.11)    $\mathcal{E}(\mu) := \iint \mu(dx)\,g(x, y)\,\mu(dy).$
Then
(22.12)    $C(K)^{-1} = \min\{\mathcal{E}(\mu) : \mu \text{ concentrated on } K,\ \mu(K) = 1\},$
and the minimum is uniquely attained at $\mu = \mu_K / C(K)$.
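A concrete case may help fix ideas. The sketch below is our own illustration, not taken from the text: it takes $K$ to be a ball of radius $r$ in $\mathbb{R}^3$, where the Green function is $1/(2\pi|x - y|)$, and checks numerically that the uniform measure on the sphere with total mass $2\pi r$ has potential 1 inside the ball and $r/|x|$ outside. By Hunt's Theorem this identifies that measure as the equilibrium charge, so the capacity of the ball in our normalisation is $2\pi r$. The function names and the quadrature rule are assumptions made for the example.

```python
import math


def shell_potential(rho, r, mass, steps=20_000):
    """Potential at distance rho from the centre of a uniform spherical shell.

    The shell has radius r and total mass `mass`; the kernel is the d = 3
    Green function g(x, y) = 1 / (2 pi |x - y|).  With u = cos(angle), the
    uniform measure on the sphere has density 1/2 in u on [-1, 1], and we
    integrate in u by the midpoint rule (so rho should differ from r).
    """
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        u = -1.0 + (k + 0.5) * h
        dist = math.sqrt(r * r + rho * rho - 2.0 * r * rho * u)
        total += 0.5 * h / (2.0 * math.pi * dist)
    return mass * total


def ball_capacity(r):
    """Capacity of a ball of radius r in R^3 under the (1/2)Delta normalisation."""
    return 2.0 * math.pi * r
```

With `mass = ball_capacity(1.0)`, the potential evaluates to about 1 at `rho = 0.3` and to about 0.4 at `rho = 2.5`, the latter being the familiar hitting probability $r/|x|$.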
There is a beautiful probabilistic interpretation of the equilibrium charge,
due to Chung [2] and Getoor and Sharpe [1], which improves on Hunt's
Theorem. Define
    $\sigma := \sup\{t > 0 : B_t \in K\},$
with the convention that $\sup\emptyset = 0$.
(22.13) THEOREM. For $x \in \mathbb{R}^d$, $y \in K$, $t > 0$,
    $P^x(B_\sigma \in dy,\ \sigma \in dt) = p_t(x, y)\,\mu_K(dy)\,dt.$
Since
    $\{\sigma > 0\} = \{B_t \in K \text{ for some } t > 0\},$
(22.9) follows immediately by integrating with respect to $t$. We shall give a proof
of Theorem 22.13, and most of Theorem 22.7, in Section VI.35, but before that
in Section III.46 we give an argument based on time reversal that gives a clear
intuitive picture of these results.
(22.14) Remark. It is true that
    $C(K) = \max\{\nu(K) : \nu \text{ concentrated on } K,\ G\nu \le 1\},$
and that
    $C(K) = \inf\{\mathcal{E}(\mu) : G\mu \ge 1 \text{ on } K\}.$
Why is the second one inf, rather than min? Why not just take $\mu_K$, whose energy
we know is $C(K)$? Unfortunately, it is not in general true that $G\mu_K \ge 1$ on $K$,
as the example where $K$ is a ball together with a distant point illustrates. This
is a typical feature of potential theory—it is essential to be very careful in order
to make a statement that is totally correct, as you will see if you consult any of
the references cited above.
We have seen transparent probabilistic interpretations of the equilibrium
charge and equilibrium potential; now we provide an equally transparent
interpretation of capacity.
(22.15) THEOREM (Spitzer-Kesten-Whitman). Consider the 'Wiener sausage'
$(S_t)_{t \ge 0}$ of $(B_t)_{t \ge 0}$ defined by
    $S_t := \bigcup_{u \le t} (K + B_u),$
where $K$ is some fixed compact set. Then
(22.16)    $t^{-1}|S_t| \to C(K), \quad \text{a.s.},$
where $|A|$ denotes the Lebesgue measure of $A$. (Recall that the physicists' capacity
of $K$ will be twice ours.)
The proof of this is based on the Subadditive Ergodic Theorem of Hammersley
and Kingman; see Durrett [3] for a well-motivated account and fascinating
examples of this, as of many other topics. The verification of the conditions of
the Subadditive Ergodic Theorem is a triviality, and the interest is in identifying
the almost-sure limit, which is simply
(22.17)    $\gamma = \lim_{t \to \infty} t^{-1} E|S_t|.$
Now
(22.18)    $E|S_t| = E\int dy\, I_{\{y \in S_t\}}$
                 $= \int dy\, P(y \in B_u + K \text{ for some } 0 < u \le t)$
                 $= \int dy\, P(B_u \in y - K \text{ for some } 0 < u \le t)$
                 $= \int dy\, P^y(B_u \in K \text{ for some } 0 < u \le t)$
                 $= \int dy\, P^y(\tau \le t)$
                     (where $\tau := \inf\{t > 0 : B_t \in K\}$)
                 $\ge \int dy\, P^y(0 < \sigma \le t)$
                 $= t\,C(K),$
from (22.13). Thus immediately $\gamma \ge C(K)$, and the proof will be complete if we
can prove that
    $\int dy\, P^y(\tau \le t < \sigma) = o(t).$
The key to proving this is to show by splitting the path at time t and using the
reversibility of Brownian motion with respect to Lebesgue measure (compare
Exercise II.E39.29) that
(22.19)    $\int dy\, P^y(\tau \le t < \sigma) = \int dz\, P^z(\tau \le t)\,P^z(\sigma > 0).$
From (22.3) and (22.9), we obtain that
    $P_K 1(z) := P^z(\sigma > 0) \sim \tfrac12 \pi^{-d/2}\,\Gamma(\tfrac12 d - 1)\,C(K)\,|z|^{2-d},$
so it is not very surprising that (22.19) is of smaller order than (22.18), and this
is indeed the case, though the verification involves us in techniques that lie
ahead, so we shall leave these last steps.
Finally, no introduction to potential theory would be complete without a
few words about the Dirichlet problem. The problem is this. Suppose given a
bounded open connected set $D \subseteq \mathbb{R}^d$, and a bounded measurable $\varphi : \partial D \to \mathbb{R}$;
can one find a harmonic function $h$ on $D$ such that $\lim_{x \to b} h(x) = \varphi(b)$ where
$b \in \partial D$ and $x$ converges in $D$ to $b$? There is a simple solution to this problem
using Brownian motion. For any Borel set $A$, and bounded measurable $g$, define
(noting once again the '$t > 0$' condition)
(22.20)    $H_A := \inf\{t > 0 : B_t \in A\}, \qquad P_A g(x) := E^x[g(B(H_A)) : H_A < \infty].$
Let $V = \mathbb{R}^d \setminus D$. Call a point $b$ of $\partial D$ a regular boundary point of $D$ if $b$ is regular
for $V$ for Brownian motion, that is, if
    $P^b(H_V = 0) = 1.$
It is not hard to show that if b is the tip of a cone lying entirely within V then
b is regular. That the probabilistic definition of regular boundary point agrees
with the classical definition in terms of Wiener's test is proved in Section 7.10
of Ito and McKean [1].
The optimal solution of the Dirichlet problem is the following.
(22.21) THEOREM (Wiener). Let $D$ be a bounded domain in $\mathbb{R}^n$. Let $\varphi$ be a
bounded measurable function on $\partial D$ that is continuous at each regular boundary
point. Then there exists a unique harmonic function $h$ on $D$ such that
(22.22)    $\lim_{D \ni x \to b} h(x) = \varphi(b)$
for every regular boundary point $b$.
Doob's idea is to prove Theorem 22.21 by establishing the explicit formula
(22.23)    $h = P_V\varphi = P_{\partial D}\varphi$ in $D$.
You will find a careful proof of all steps of this Theorem in Section 13.6 of
Dynkin [2], among other places. The main points of the argument are first to
prove (using the strong Markov property and rotational invariance of Brownian
motion) that $h(x)$ is the average of $h$ over any ball $B(x, r)$, centred at $x$ with
radius $r$, contained entirely in $D$; secondly to use this together with convolution
with a smooth function to show that $h$ is $C^\infty$ inside $D$, and hence satisfies
Laplace's equation; and thirdly to prove (22.22) by an $\varepsilon$-$\delta$ argument.
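The probabilistic formula $h(x) = E^x \varphi(B(H_{\partial D}))$ lends itself to simulation. The sketch below is a hedged illustration under our own assumptions: $D$ is the unit ball in $\mathbb{R}^3$, the boundary data are the restriction of the harmonic polynomial $x^2 - y^2 + z$ (so the exact solution is known), and the exit point is sampled by the 'walk on spheres' device, which uses exactly the rotational invariance mentioned above. The function names, the tolerance `eps` and the sample size are hypothetical choices.

```python
import math
import random


def uniform_on_sphere(rng):
    """Uniform random unit vector in R^3 (a normalised Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def boundary_data(p):
    """Illustrative boundary function; x^2 - y^2 + z is harmonic in R^3."""
    x, y, z = p
    return x * x - y * y + z


def dirichlet_estimate(start, walks=4000, eps=1e-4, seed=0):
    """Monte Carlo estimate of E^x[phi(B(H_dD))] for the unit ball D.

    Walk on spheres: started at p, Brownian motion leaves the largest ball
    B(p, r) contained in D at a uniformly distributed point of its boundary,
    so we jump there directly and stop once within eps of the unit sphere.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        p = list(start)
        while True:
            norm = math.sqrt(sum(c * c for c in p))
            r = 1.0 - norm
            if r < eps:
                break
            step = uniform_on_sphere(rng)
            p = [c + r * s for c, s in zip(p, step)]
        # Project the stopped point onto the unit sphere before evaluating phi.
        total += boundary_data([c / norm for c in p])
    return total / walks
```

Started at $(0.3, 0.2, 0.1)$, the estimate should lie close to the exact value $0.09 - 0.04 + 0.1 = 0.15$, up to Monte Carlo error.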
As for the uniqueness assertion, it is clear that if $h$ is any solution, and $G \subset D$
is a subdomain, $\bar{G} \subset D$, then $P_{\partial G} h = h$ on $G$, since $h$ is continuous on $\bar{G}$, and,
by the Optional-Stopping Theorem,
    $h(x) = E^x h(B(H_{\partial G})) \quad (x \in G).$
Now we let $G \uparrow D$ and use the fact that
(22.24)    $B(H_V)$ is regular for $V$, $P^x$ a.s.
The proof of (22.24) needs Hunt's Theorem 22.7, though if every point of $\partial V$
is regular, no proof is needed, of course. Ito and McKean [1] and Port and
Stone [3] give several complements to, and extensions of, Theorem 22.21.
Probabilistic potential theory is a big and important subject, essentially
originating in Hunt's profound papers [1], which explain the basic principles.
Blumenthal and Getoor [1] provide a complete account of Hunt's theory; other
standard references include Dellacherie and Meyer [1], Fukushima [1], Kellogg
[1], Port and Stone [3], Silverstein [1] and Helms [1]. Hunt emphasised the
role of a dual process, which is a kind of time-reversal of the basic process being
studied, though the self-duality of Brownian motion tends to obscure things
here. See the papers of Mitro [1,2] for a lucid account. In Chapter III, we shall
use time reversal to present the Martin boundary for continuous-time chains;
Martin boundary theory describes all possible positive harmonic functions on
$D$, and is thus deeper than Theorem 22.21, which only characterises bounded
harmonic functions with boundary regularity.
23. Brownian motion and physical diffusion. Brownian motion as we have been
studying it is very closely related to what a physicist would understand by the
term 'diffusion'; the connection is the celebrated diffusion equation of
mathematical physics, which we shall now derive.
Consider the diffusion of some substance (for example, a dye) through a medium
(which could be water, or a crystal). Let $p(t, x)$ be the concentration of dye at
position $x$ at time $t$, and let us suppose initially that the medium is isotropic
(no preferred directions, uniform throughout space). Consider some plane in the
medium, perpendicular to the $x_1$-direction, say; this plane is constantly traversed
by molecules of dye, which pass from one side of the plane to the other. If the
concentration to the left of the plane is higher than that to the right, there will
be a net flux of particles of dye from left to right; and the greater the difference
in concentration, the greater this flux from left to right will be. Fick's Law of
diffusion says that the flux is equal to $-\tfrac12 a\,\partial p/\partial x_1$. More generally, the flux
$F(t, x)$ is a vector quantity and obeys
(23.1)    $F(t, x) = -\tfrac12 a \nabla p(t, x).$
The vector field F specifies the direction and strength of the net flux of dye,
and to find the flow of particles across a plane perpendicular to the unit vector
$u$, we simply form the scalar product $u \cdot F$. Now if we consider a small volume
$V$ around the point $x$, the total amount of dye in $V$ is $\int_V p(t, x)\,dx$, so
    rate of change of amount of dye in $V$
        $= \dfrac{\partial}{\partial t}\int_V p(t, x)\,dx$
        = integral of flux around $\partial V$
        $= -\int_{\partial V} F(t, x) \cdot dn$
        $= -\int_V \nabla \cdot F(t, x)\,dx,$
by the Divergence Theorem. Since $V$ is arbitrary, we deduce the diffusion
equation
(23.2)    $\dfrac{\partial p}{\partial t}(t, x) = \tfrac12 \nabla \cdot (a \nabla p)(t, x).$
In the case $a \equiv 1$, we have the Kolmogorov (forward) equation for the evolution
of the Brownian transition density:
(23.3)    $\dfrac{\partial}{\partial t} p_t(x, y) = \tfrac12 \Delta_y p_t(x, y),$
as we argued at (4.7). We have derived the diffusion equation (23.2) under the
assumption that a is a constant, but it may equally well depend on position (if
the diffusivity of the medium varies), and may even be a matrix-valued function
of position. The latter could arise in a crystal, where the preferred directions of
the crystal will tend to distort the concentration gradient and produce a flux
that is not aligned exactly with the concentration gradient. The derivation of
(23.2) remains unchanged.
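In the simplest case (one dimension, $a \equiv 1$) the diffusion equation can be checked directly against the Brownian transition density. The following sketch is our own illustration; the function names and the step size `h` are assumptions. It forms central finite differences of $p_t(x, y)$ and evaluates the residual $\partial p/\partial t - \tfrac12\,\partial^2 p/\partial y^2$, which should be close to zero.

```python
import math


def p(t, x, y):
    """One-dimensional Brownian transition density p_t(x, y)."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def forward_equation_residual(t, x, y, h=1e-3):
    """Central-difference estimate of dp/dt - (1/2) d^2p/dy^2 at (t, x, y)."""
    dp_dt = (p(t + h, x, y) - p(t - h, x, y)) / (2.0 * h)
    d2p_dy2 = (p(t, x, y + h) - 2.0 * p(t, x, y) + p(t, x, y - h)) / (h * h)
    return dp_dt - 0.5 * d2p_dy2
```

The residual is of the order of the discretisation error, $O(h^2)$, rather than exactly zero.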
(23.4) Remarks. The special case $a = 1$ of the diffusion equation gives us the
Kolmogorov (forward) equation for the Brownian transition density, so you
will be wondering whether the general statement of the diffusion equation (23.2)
has a similar probabilistic interpretation. It does indeed, and the interpretation
is the analogue of the interpretation given in Section 4 for Brownian motion.
Without going into too much detail here, we are concerned with a diffusion
whose infinitesimal generator $\mathcal{G}$ has adjoint
(23.5)    $\mathcal{G}^* := \tfrac12 \nabla \cdot (a \nabla).$
What this means is as explained in Section 4: there is a transition semigroup
$(P_t)_{t \ge 0}$ such that $\mathcal{G}f = \lim_{t \downarrow 0} t^{-1}(P_t f - f)$, at least for some class of $f$. As a
process, $(X_t)_{t \ge 0}$ satisfies
    $f(X_t) - f(X_0) - \int_0^t \mathcal{G}f(X_s)\,ds$ is a martingale
for all $f$ in some class. This is a sample-path formulation of the statement (4.4).
The Markov process X may be considered as describing the motion of a single
particle of dye. (Note in passing that in general there is no closed-form expression
for Pt, unlike the Brownian case, so an alternative prescription such as the
generator $\mathcal{G}$ is very necessary!) Formally, the arguments establishing Kolmogorov's
backward and forward equations (4.6) and (4.7) now run as before, but do note
that the analogue of the forward equation (4.7) should read
    $\dfrac{\partial}{\partial t} p_t(x, y) = \mathcal{G}^*_y\, p_t(x, y),$
where $\mathcal{G}^*$ is the formal adjoint of $\mathcal{G}$:
    $\int f\,\mathcal{G}g = \int g\,\mathcal{G}^* f$ for $f, g \in C_K^\infty$.
This distinction did not arise for Brownian motion, whose generator $\tfrac12\Delta$ is self-adjoint,
nor will it arise here if the matrix $a$ is symmetric (since then $\mathcal{G}$ is again
self-adjoint). It is the received wisdom of physics that $a$ is symmetric and
non-negative definite. Diffusions with 'divergence-form' generators (that is, of the
form (23.5) with a symmetric non-negative definite) are particularly tractable
because they are amenable to the theory of Dirichlet forms. The ideas of
Dirichlet-form theory are drawn from physical notions about energy.
While we are discussing the physical aspects of diffusion, it is worth pointing
out that no physicist would accept Brownian motion as a literal model for the
movement of a particle, since the path has infinite variation. However, a more
satisfactory model can be built by making the velocity vt of the particle into
Brownian motion—but an even more satisfactory model can be made by making
$v$ solve
(23.6)    $dv_t = dB_t - \lambda v_t\,dt,$
where $\lambda > 0$ is the viscous drag coefficient, and $B$ is Brownian motion on
$\mathbb{R}$, $B_0 = 0$. This permits the velocity to wriggle around, but makes it unlikely to
get too big. The correct interpretation of (23.6) is of course
(23.7)    $v_t = v_0 + B_t - \lambda\int_0^t v_u\,du,$
which is solved explicitly by
(23.8)    $v_t = v_0 e^{-\lambda t} + e^{-\lambda t}\int_0^t e^{\lambda s}\,dB_s,$
where the stochastic integral appearing on the right is here properly interpreted
by integrating by parts:
(23.9)    $\int_0^t e^{\lambda s}\,dB_s := e^{\lambda t}B_t - \lambda\int_0^t e^{\lambda s}B_s\,ds.$
The stochastic differential equation (23.6) is called the Ornstein-Uhlenbeck
stochastic differential equation, and its solution is called the Ornstein-Uhlenbeck
(OU) process. Assume for simplicity that we work in one dimension, and that
$v_0$ is zero-mean Gaussian with variance $(2\lambda)^{-1}$, independent of $B$. Then $v$ is a
stationary zero-mean Gaussian process with covariance
(23.10)    $\operatorname{cov}(v_s, v_t) = (2\lambda)^{-1}\exp(-\lambda|t - s|).$
The physicist would then take the process
    $X_t := \int_0^t v_s\,ds$
as a model for the diffusion of a particle. This integrated OU process, being
non-Markovian, is an altogether less obliging process than Brownian motion,
and few of the functionals of $X$ have closed-form distributions. However, it is
not too hard to prove that
    $\left(\dfrac{\lambda}{\sqrt{n}}\,X(nt)\right)_{t \ge 0} \Rightarrow (B_t)_{t \ge 0} \quad (n \to \infty)$
(where the sense of convergence in distribution is fully explained in Part 6 of
Chapter II), so for large time scales, Brownian motion may be accepted as a model
of physical diffusion.
(23.11) Exercise. Confirm that (23.8) solves (23.6), and that the covariance
is (23.10).
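A simulation cannot replace the exercise, but it gives a useful sanity check of (23.10). The sketch below is illustrative only; the function names, sample size and seed are our own assumptions. It uses the fact, read off from (23.8), that given $v_0$ the variable $v_t$ is Gaussian with mean $v_0 e^{-\lambda t}$ and variance $(1 - e^{-2\lambda t})/(2\lambda)$, and it estimates $\operatorname{cov}(v_0, v_t)$ empirically.

```python
import math
import random


def ou_pair(lam, t, rng):
    """Sample (v_0, v_t) for the stationary OU process of (23.6).

    v_0 ~ N(0, 1/(2 lam)); by (23.8), given v_0, v_t is Gaussian with mean
    v_0 exp(-lam t) and variance (1 - exp(-2 lam t)) / (2 lam).
    """
    v0 = rng.gauss(0.0, math.sqrt(1.0 / (2.0 * lam)))
    noise_sd = math.sqrt((1.0 - math.exp(-2.0 * lam * t)) / (2.0 * lam))
    vt = v0 * math.exp(-lam * t) + rng.gauss(0.0, noise_sd)
    return v0, vt


def empirical_covariance(lam, t, samples=40_000, seed=1):
    """Sample covariance of (v_0, v_t) over independent draws."""
    rng = random.Random(seed)
    pairs = [ou_pair(lam, t, rng) for _ in range(samples)]
    mean0 = sum(a for a, _ in pairs) / samples
    mean1 = sum(b for _, b in pairs) / samples
    return sum((a - mean0) * (b - mean1) for a, b in pairs) / samples


def stationary_covariance(lam, s, t):
    """Right-hand side of (23.10)."""
    return math.exp(-lam * abs(t - s)) / (2.0 * lam)
```

For moderate sample sizes the two quantities agree to within the usual Monte Carlo error of order `samples ** -0.5`.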
4. GAUSSIAN PROCESSES AND LEVY PROCESSES
Gaussian processes
24. Existence results for Gaussian processes. In general, a Gaussian process is
a process $(X_t)_{t \in T}$ indexed by a general set, such that, for any $t_1, \ldots, t_n \in T$,
$(X_{t_1}, \ldots, X_{t_n})$ has a multivariate Gaussian distribution. Thus the distribution is
specified by the mean and covariance:
(24.1)    $\mu(t) := E X_t, \qquad \rho(s, t) := \operatorname{cov}(X_s, X_t), \qquad s, t \in T,$
since we could write down the joint density of $(X_{t_1}, \ldots, X_{t_n})$ in terms of $\mu$ and
$\rho$. It is customary to assume that $\mu \equiv 0$, since the general case can be reduced
to this by taking the process $X'_t := X_t - \mu(t)$; we shall follow this custom and
henceforth assume that $\mu = 0$.
(24.2) PROPOSITION. The function $\rho : T \times T \to \mathbb{R}$ is the covariance of a
Gaussian process if and only if $\rho$ is non-negative definite:
(24.3)    for any $t_1, \ldots, t_n$, $(\rho(t_i, t_j))_{i,j=1}^n$ is a non-negative definite matrix.
Proof. Necessity is immediate. Sufficiency uses the Daniell-Kolmogorov
Theorem II.31.1. To check the conditions of that theorem, notice that the state
space $\mathbb{R}$ is Polish, and that, for $J = \{t_1, \ldots, t_n\} \subset I = \{t_1, \ldots, t_N\}$ ($N > n$), the law
of $(X_{t_1}, \ldots, X_{t_n})$ regarded as the projection of $(X_{t_1}, \ldots, X_{t_N})$ down to $(X_{t_1}, \ldots, X_{t_n})$
is just $N(0, V)$, where $V_{ij} = \rho(t_i, t_j)$ ($i, j = 1, \ldots, n$). But this is the law of $(X_{t_1}, \ldots, X_{t_n})$
regarded as $\{X_t : t \in J\}$, and so the consistency condition of the Daniell-
Kolmogorov Theorem holds. $\Box$
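On a finite set of times the content of Proposition 24.2 is quite concrete: a covariance matrix can be factorised, and the factor turns independent $N(0,1)$ variables into a Gaussian vector with the prescribed covariance. The sketch below is an illustration under our own assumptions: it uses a plain Cholesky factorisation, which requires strict positive definiteness (a slightly stronger condition than (24.3)), and the function names are hypothetical. With $\rho(s, t) = \min(s, t)$ it samples Brownian motion at finitely many times.

```python
import math
import random


def cholesky(matrix, tol=1e-12):
    """Lower-triangular L with L L^T = matrix.

    Raises ValueError if the matrix is not (numerically) positive definite.
    """
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def sample_gaussian(times, rho, rng):
    """One draw of (X_{t_1}, ..., X_{t_n}) with covariance rho(t_i, t_j)."""
    v = [[rho(s, t) for t in times] for s in times]
    low = cholesky(v)
    z = [rng.gauss(0.0, 1.0) for _ in times]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(times))]
```

For example, `sample_gaussian([0.5, 1.0, 2.0], min, random.Random(0))` returns one sample of $(B_{0.5}, B_1, B_2)$, while `cholesky([[1.0, 2.0], [2.0, 1.0]])` raises `ValueError`, since that matrix has a negative eigenvalue.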
The limitation of Proposition 24.2 is that the condition (24.3) is not in general
easy to check. One situation where this can be done is the following. The
consequences are far-ranging.
(24.4) LEMMA. Let $E$ be a measurable space with a $\sigma$-finite measure $m$, and
suppose that $(P_t)_{t \ge 0}$ is a sub-Markovian transition semigroup that has a density
$p_t(\cdot, \cdot)$ with respect to $m$:
    $(P_t f)(x) = \int p_t(x, y) f(y)\,m(dy), \qquad f \in b\mathcal{E},\ t > 0.$
Suppose further that
(24.5)(i)    $p_t(x, y) = p_t(y, x)$ for $t > 0$, $x, y \in E$;
(24.5)(ii)   $g(x, y) := \int_0^\infty p_t(x, y)\,dt < \infty$ for all $x, y \in E$.
Then $g$ is the covariance of a Gaussian process on $E$.
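Proof. We need only check the condition (24.3). But, for any $a \in \mathbb{R}^n$, $x_1, \ldots, x_n \in E$,
    $\sum_{i=1}^n \sum_{j=1}^n a_i\,g(x_i, x_j)\,a_j = \int_0^\infty dt \sum_{i=1}^n \sum_{j=1}^n a_i\,p_t(x_i, x_j)\,a_j$
        $= \int_0^\infty dt \sum_{i=1}^n \sum_{j=1}^n \int m(dy)\, a_i\,p_{t/2}(x_i, y)\,p_{t/2}(y, x_j)\,a_j$
        $= \int_0^\infty dt \int m(dy) \left(\sum_{i=1}^n a_i\,p_{t/2}(x_i, y)\right)^2 \ge 0,$
using the symmetry of $p_t$ at the last step. $\Box$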
(We have looked ahead to Section III.3 for the definition of a sub-Markovian
transition semigroup, but we have already seen the essentials in Section 4.)
As an example of a situation to which Lemma 24.4 would apply, consider a
random walk on a finite connected graph $G$, when a jump from vertex $i$ to a
neighbouring vertex $j$ takes place at unit rate, and where the process is killed at
a constant rate $\delta$. The measure $m$ is simply the counting measure on the vertices
of G.
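For the graph example the Green function is a finite matrix and can be computed explicitly. The sketch below rests on our own reading of the example, stated here as an assumption: with adjacency matrix $A$ (no self-loops) and diagonal degree matrix $D$, the killed walk has generator $A - D - \delta I$, so that $g = \int_0^\infty P_t\,dt = (\delta I + D - A)^{-1}$. The function names are hypothetical, and the inversion is a plain Gauss-Jordan elimination suitable only for small graphs.

```python
def green_matrix(adjacency, delta):
    """Invert delta*I + D - A by Gauss-Jordan elimination (small graphs only).

    `adjacency` is a symmetric 0/1 matrix with zero diagonal; delta > 0 makes
    the matrix strictly diagonally dominant, so every pivot is positive.
    """
    n = len(adjacency)
    m = [[-float(adjacency[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] += delta + sum(adjacency[i])
    # Augment with the identity and reduce the left half to the identity.
    aug = [m[i] + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def quadratic_form(g, a):
    """The sum over i, j of a_i g(i, j) a_j appearing in condition (24.3)."""
    n = len(a)
    return sum(a[i] * g[i][j] * a[j] for i in range(n) for j in range(n))
```

On the 4-cycle with $\delta = \tfrac12$, each row of $g$ sums to $1/\delta = 2$ (the expected lifetime of the walk), and the quadratic form is positive for every non-zero vector tried, in line with Lemma 24.4.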
As an example of a situation to which Lemma 24.4 would not apply, consider
Brownian motion in $\mathbb{R}^d$ ($d \ge 3$); the condition (24.5)(ii) fails for $x = y$. Broadly
speaking, the finiteness of $g(x, y)$ for all $x \ne y$ is equivalent to the transience of
the process, and $g(x, x) < \infty$ is equivalent to the process visiting $x$ with positive
probability (though, without enough conditions to ensure nice sample paths,
we cannot yet begin to make rigorous sense of this in general.)
The argument of Lemma 24.4 is too pretty for us to abandon all hope of
making it apply to Brownian motion. Some sense can be made of it, but we
have to consider a generalised random field, indexed by the vector space
(24.6)    $\mathcal{D}(\mathcal{E}) := \{\sigma\text{-finite measures } \mu \text{ on } \mathbb{R}^d \text{ s.t. } \mathcal{E}(\mu, \mu) < \infty\},$
where $\mathcal{E}$ is the energy functional encountered in Section 22:
(24.7)    $\mathcal{E}(\mu, \nu) := \iint \mu(dx)\,g(x, y)\,\nu(dy).$
The inequality
(24.8)    $\mathcal{E}(\mu, \nu)^2 \le \mathcal{E}(\mu, \mu)\,\mathcal{E}(\nu, \nu)$
follows from the Cauchy-Schwarz inequality and a simple modification of the
idea of Lemma 24.4, as you are invited to check, and provides a proof that
$\mathcal{D}(\mathcal{E})$ is closed under addition.
We can now consider the Gaussian field $\{X_\mu : \mu \in \mathcal{D}(\mathcal{E})\}$, with covariance
    $E(X_\mu X_\nu) = \mathcal{E}(\mu, \nu).$
The intuitive interpretation
    $X_\mu = \int X_x\,\mu(dx)$
guides us, but has no strict sense; there does not exist a process $(X_x)_{x \in \mathbb{R}^d}$.
Nevertheless $(X_\mu)_{\mu \in \mathcal{D}(\mathcal{E})}$ is a perfectly good Gaussian field, whose existence is confirmed
by checking the condition (24.3) and appealing to the Daniell-Kolmogorov
Theorem. We shall say some more about these random fields whose covariance
comes from a symmetric Green function in the next section. For more on the
relevance to quantum field theory, see Symanzik [1], Brydges, Fröhlich and
Spencer [1] and Dynkin [5].
One other existence result that we cannot do without is the celebrated theorem
of Bochner. We consider now only the case where $T = \mathbb{R}^d$, and where the
covariance structure is stationary: for all $x, y \in \mathbb{R}^d$,
    $\rho(x, y) = \rho(0, y - x) =: \rho(y - x)$
for brevity. We give the full form of Bochner's Theorem.
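(24.9) THEOREM (Bochner). Let $\varphi : \mathbb{R}^d \to \mathbb{C}$ be bounded and continuous. Then
the following are equivalent.
(24.10)(i)    There exists a finite measure $\mu$ on $\mathbb{R}^d$ such that
    $\varphi(\theta) = \int \mu(dx)\,e^{i\theta \cdot x}.$
(24.10)(ii)   For any $a_1, \ldots, a_n \in \mathbb{C}$, $x_1, \ldots, x_n \in \mathbb{R}^d$,
    $\sum_{i=1}^n \sum_{j=1}^n a_i\,\varphi(x_i - x_j)\,\bar{a}_j \ge 0.$
(We say that $\varphi$ is non-negative definite.)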
Before proving this we apply it to the representation of a stationary Gaussian
process on Rd.
(24.11) COROLLARY. Suppose that $\rho : \mathbb{R}^d \to \mathbb{R}$ is continuous. In order that $\rho$
should be the covariance of a stationary Gaussian process on $\mathbb{R}^d$, it is necessary
and sufficient that $\rho$ may be represented in the form
(24.12)    $\rho(x) = \int F(d\theta)\,e^{i\theta \cdot x},$
where $F$ is a finite non-negative symmetric measure on $\mathbb{R}^d$. (The measure $F$ is
called the spectral measure of the Gaussian process.)
Proof. If the representation (24.12) holds, the criterion (24.3) is easy to prove.
Conversely, if (24.3) holds then $\rho$ is non-negative definite, continuous (by
hypothesis) and bounded (since $|\rho(x)| \le \rho(0) = E X_0^2 < \infty$), so, by Bochner's Theorem,
$\rho$ is a Fourier transform of some non-negative measure, which is symmetric
since $\rho$ is. $\Box$
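Proof of Theorem 24.9. The implication (24.10)(i) $\Rightarrow$ (24.10)(ii) is trivial.
For the converse, the aim should be to get the inverse Fourier transform of
$\varphi$. We approach this by taking some large integers $K, n > 0$, and (with $\delta := 1/n$)
noticing that (24.10)(ii) implies, for any $\theta \in \mathbb{R}^d$,
    $0 \le (2n + 1)^{-2d} \sum_{K,n} e^{-i\delta\theta \cdot l}\,\varphi(\delta l - \delta j)\,e^{i\delta\theta \cdot j}\,(2K)^{-d},$
where $\sum_{K,n}$ denotes the sum over all pairs $(l, j) \in (\mathbb{Z}^d)^2$ such that
$\|l\|_\infty := \sup\{|l_r| : r = 1, \ldots, d\} \le Kn$, $\|j\|_\infty \le Kn$. But, as $n \to \infty$ with $K$ fixed, this
expression converges to (a positive multiple of)
(24.13)    $(2K)^{-d} \int_{\{\|x\|_\infty \le K\}} dx \int_{\{\|y\|_\infty \le K\}} dy\; e^{-i\theta \cdot (x - y)}\,\varphi(x - y)$
               $= \int_{\{\|v\|_\infty \le 2K\}} dv\; e^{-i\theta \cdot v}\,\varphi(v) \prod_{j=1}^d \left(1 - \dfrac{|v_j|}{2K}\right),$
which is thus non-negative. (We use the continuity of $\varphi$ to get the convergence.)
But (24.13) is (to within powers of $2\pi$) the inverse Fourier transform of
$\varphi(v) \prod_{j=1}^d (1 - |v_j|/2K)^+$, and
    $E\exp(i\theta \cdot X) = \prod_{j=1}^d (1 - |\theta_j|)^+,$
where the components of $X$ are independent, with density
    $f(v) = (1 - \cos v)(\pi v^2)^{-1}.$
Thus if $f_{2K}(v) := 2K f(2Kv)$, we have that (24.13) is (a multiple of) $(\hat\varphi * F_{2K})(\theta)$,
where $F_{2K}(v) = \prod_{j=1}^d f_{2K}(v_j)$, and $\hat\varphi$ is the inverse Fourier transform of $\varphi$. But
the distributions with density $F_{2K}$ converge weakly to the point mass at 0, and
the density of $\hat\varphi * F_{2K}$ is non-negative. Hence $\hat\varphi$ is a non-negative measure, which
is what we sought. $\Box$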
(24.14) Remark. The function $\rho(x) = I_{\{x = 0\}}$ ($x \in \mathbb{R}^d$) is the covariance function
of a Gaussian process $\{X_x : x \in \mathbb{R}^d\}$ for which $X_{x_1}, \ldots, X_{x_n}$ are independent $N(0, 1)$
for any distinct $x_1, \ldots, x_n$. However, $\rho$ is not representable in the form (24.12). This may
appear to be a limitation of the representation result Corollary 24.11, but it is
not particularly grave; the process $X$ cannot have a version with any sensible
regularity properties, and so is essentially useless. It is a simple exercise to prove
that if $X$ is a stationary Gaussian process for which $x \mapsto X_x(\omega)$ is continuous
for almost all $\omega$ then $\rho$ must be continuous. Thus if we want a stationary
Gaussian process with continuous paths, continuity of $\rho$ is necessary, and, as
we shall see next, we need only strengthen continuity to a mild form of Hölder
continuity to obtain a sufficient condition.
(24.15) Exercises
(i) Spectral measure of the one-dimensional Ornstein-Uhlenbeck process. Confirm
that by taking $F(d\theta) = (\lambda/\pi)(\lambda^2 + \theta^2)^{-1}\,d\theta$ in (24.12), we recover the
(Ornstein-Uhlenbeck) covariance $\rho(x) = e^{-\lambda|x|}$.
(ii) Lévy's Brownian motion. With $F(d\theta) = (2\pi)^{-n}\exp(-\tfrac12 t|\theta|^2)\,d\theta$ in (24.12), show
that
    $\rho_t(x) = (2\pi t)^{-n/2}\exp\left(-\dfrac{|x|^2}{2t}\right)$
defines a stationary covariance function on $\mathbb{R}^n$. Deduce that, for $\lambda = \tfrac12\gamma^2 > 0$,
    $\gamma\int_0^\infty e^{-\lambda t}(2\pi t)^{-1/2}\exp\left(-\dfrac{|x|^2}{2t}\right) dt = e^{-\gamma|x|}$
defines a stationary covariance function, for which
    $E[(X_x - X_y)^2] = 2(1 - \exp\{-\gamma|x - y|\}).$
Prove that there exists a Gaussian process $(Y_x : x \in \mathbb{R}^n)$, Lévy's Brownian
motion, such that
    $Y_0 = 0, \qquad E[(Y_x - Y_y)^2] = |x - y|.$
A wonderful paper by McKean [4] discusses Markov properties of Lévy's
Brownian motion, showing that it behaves very differently in even and odd
dimensions.
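The spectral representation (24.12) can be tested numerically in the one case above where both sides are explicit, namely the Ornstein-Uhlenbeck covariance of Exercise (i). The sketch below is an illustration only, with assumed function names and an assumed truncation of the $\theta$-range at `cutoff`; it approximates $\int \cos(\theta x)\,F(d\theta)$ by the trapezoidal rule and can be compared with $e^{-\lambda|x|}$.

```python
import math


def ou_spectral_density(theta, lam):
    """Density of F in Exercise (24.15)(i): (lam / pi) / (lam^2 + theta^2)."""
    return (lam / math.pi) / (lam * lam + theta * theta)


def covariance_from_spectrum(x, lam, cutoff=2000.0, step=0.01):
    """Trapezoidal approximation of the integral of cos(theta x) F(d theta).

    F is symmetric, so the imaginary part of exp(i theta x) integrates to
    zero and it suffices to double the integral over [0, cutoff].  The
    cutoff is an assumed truncation; the neglected tail is at most
    2 * lam / (pi * cutoff).
    """
    n = int(round(cutoff / step))
    total = 0.5 * (ou_spectral_density(0.0, lam)
                   + math.cos(cutoff * x) * ou_spectral_density(cutoff, lam))
    for k in range(1, n):
        theta = k * step
        total += math.cos(theta * x) * ou_spectral_density(theta, lam)
    return 2.0 * step * total
```

With $\lambda = 1$ and $x = 1.3$ the result agrees with $e^{-1.3}$ to about three decimal places; the agreement at $x = 0$ is limited by the truncated Cauchy tail.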
25. Continuity results. Let $(X_t)_{t \in \mathbb{R}^n}$ be a stochastic process with values in a
complete separable metric space $(S, \rho)$. We say that $X$ has a continuous version
if there exists an $S$-valued stochastic process $(X'_t)_{t \in \mathbb{R}^n}$ such that
(25.1)(i)    $t \mapsto X'_t(\omega)$ is continuous for almost all $\omega$;
(25.1)(ii)   $\rho(X'_t(\omega), X_t(\omega)) = 0$, a.s. for all $t \in \mathbb{R}^n$.
If a process has a continuous version, we generally discard the original process
and work with the continuous version instead, because the original was too
irregular to work with. We now give a simple but powerful result that is usually
sufficient to decide when a process has a continuous version.
(25.2) THEOREM (Kolmogorov's Lemma). If $(X_t)_{t \in \mathbb{R}^n}$ is a stochastic process
with values in a complete separable metric space $(S, \rho)$, and if there exist positive
constants $\alpha, C, \varepsilon$ such that, for all $s, t \in \mathbb{R}^n$,
(25.3)    $E\rho(X_s, X_t)^\alpha \le C|s - t|^{n + \varepsilon},$
then there exists a continuous version of $X$. This version is Hölder continuous of
order $\theta$ for each $\theta < \varepsilon/\alpha$.
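Proof. Let $\mathbb{D} := \bigcup_{k \ge 0} \mathbb{D}_k$ be the set of dyadic rational points in $\mathbb{R}^n$, where
$\mathbb{D}_k := 2^{-k}\mathbb{Z}^n$. The idea of the proof is to show that the restriction of $X$ to
$\mathbb{D} \cap [0, 1)^n$ is Hölder($\theta$) for any $\theta < \varepsilon/\alpha$; we then extend $X$ by continuity to
$[0, 1)^n$, and then apply the same argument to a cube of arbitrary size. So we fix
$\theta \in (0, \varepsilon/\alpha)$, and define
    $A_k := \{\text{for some } i, j \in \mathbb{Z}^n,\ |i - j| = 1,\ i2^{-k} \text{ and } j2^{-k} \in [0, 1)^n,$
              $\text{and } \rho(X(i2^{-k}), X(j2^{-k})) > 2^{-k\theta}\}.$
Then
    $P(A_k) \le \sum_{i \in [0, 2^k)^n} \sum_{j \in \mathbb{Z}^n,\, |j - i| = 1} P(\rho(X(i2^{-k}), X(j2^{-k})) > 2^{-k\theta})$
            $\le 2^{nk} \cdot 2n \cdot 2^{k\theta\alpha}\,C\,2^{-k(n + \varepsilon)}$
            $= 2nC\,2^{-k(\varepsilon - \theta\alpha)},$
and so by the Borel-Cantelli Lemma with probability 1 only finitely many of
the $A_k$ happen, so that, for some $K = K(\omega)$,
(25.4)    $\rho(X(i2^{-k}), X(j2^{-k})) \le K2^{-k\theta}$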
for all $k \in \mathbb{N}$, $i, j \in [0, 2^k]^n$, $|i - j| = 1$. All that remains is to extend (25.4) from
neighbouring dyadic rationals to any. For this, let us assume for notational
simplicity that $n = 1$; the general result is an immediate consequence of this. If
we take $0 \le x < y < 1$, with $x, y \in \mathbb{D}$, then, for some $k$, $2^{-k-1} < y - x \le 2^{-k}$, and
so there exists $i$ such that
    $x \le i2^{-k-1} < (i + 1)2^{-k-1} \le y,$
and
    $\rho(X(i2^{-k-1}), X((i + 1)2^{-k-1})) \le K2^{-(k+1)\theta},$
using (25.4). Now we similarly analyse the intervals $[x, i2^{-k-1}]$ and $[(i + 1)2^{-k-1}, y]$
by chipping off the largest dyadic-rational intervals in each (of length at most
$2^{-k-2}$ in each case). Continuing thus, we have
    $\rho(X_y, X_x) \le 2K2^{-(k+1)\theta} + 2K\sum_{r \ge k+2} 2^{-r\theta}$
               $\le K'|y - x|^\theta.$
The Hölder continuity allows us to extend the process $X$ now to the whole of
$[0, 1)^n$.
(25.5) Remarks
(i) A function that is Hölder continuous of order $\theta > 1$ is constant.
(ii) A more general and more powerful way of proving the existence of
continuous versions has been discovered by Garsia, Rodemich and Rumsey
[1], and is extremely useful. You will find nice accounts in Stroock and
Varadhan [1] and Walsh [3].
We are going to use Kolmogorov's Lemma to derive sufficient conditions
for the existence of a continuous version of a Gaussian process. These
conditions are not necessary, but the gap is unimportant in most examples that arise.
Necessary and sufficient conditions for the continuity of a Gaussian process
are now known; work of Fernique, Dudley and others enabled Talagrand to
reach the summit in [1]. The conditions are of a technical nature, so we refer
the interested reader to Talagrand's paper, or to the books by Adler [1,2],
which are full of other interesting results on Gaussian processes.
(25.6) COROLLARY. Let $(X_t)_{t \in \mathbb{R}^n}$ be a (zero-mean) Gaussian process with
covariance function $\rho(s, t) := E X_s X_t$. A sufficient condition for the existence of a
continuous version is that $\rho$ should be locally Hölder continuous: for each $N \in \mathbb{N}$
there exists $\theta = \theta(N) > 0$ and $C = C(N)$ such that, for $|t|, |s| \le N$,
(25.7)    $|\rho(s, t) - \rho(t, t)| \le C|s - t|^\theta.$
Proof. We have
    $E(|X_t - X_s|^2) = \rho(t, t) - 2\rho(t, s) + \rho(s, s)$
                     $\le 2C|t - s|^\theta.$
Since $X_t - X_s$ is Gaussian, there exist constants $a_m$ such that
$E|X_t - X_s|^{2m} = a_m(E|X_t - X_s|^2)^m$, and so
    $E(|X_t - X_s|^{2m}) \le (2C)^m a_m |s - t|^{m\theta}.$
For large enough $m$, $m\theta > n$, and we can use Kolmogorov's Lemma. $\Box$
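The bookkeeping in this proof is worth making explicit. The sketch below is our own illustration with hypothetical function names. It computes the Gaussian moment constants, which for a centred Gaussian variable are $a_m = (2m - 1)!! = 1 \cdot 3 \cdots (2m - 1)$, and the Hölder order that Kolmogorov's Lemma delivers from the $2m$-th moment bound: with $\alpha = 2m$ and $n + \varepsilon = m\theta$, every order below $(m\theta - n)/(2m)$ is obtained.

```python
def gaussian_moment_constant(m):
    """a_m = (2m - 1)!!, so that E|Z|^(2m) = a_m (E|Z|^2)^m for centred Gaussian Z."""
    result = 1
    for k in range(1, 2 * m, 2):
        result *= k
    return result


def holder_order_bound(theta, m, n):
    """Supremum of the Hölder orders given by Kolmogorov's Lemma.

    Uses the 2m-th moment bound of the proof: alpha = 2m and
    n + eps = m * theta, so the orders obtained are those below eps / alpha.
    """
    eps = m * theta - n
    if eps <= 0:
        raise ValueError("need m * theta > n")
    return eps / (2.0 * m)
```

As $m \to \infty$ the bound increases to $\tfrac12\theta$. For Brownian motion ($n = 1$, $\theta = 1$) this recovers Hölder continuity of every order less than $\tfrac12$.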
For a stationary Gaussian process, $\rho(s, t) = \rho(t - s)$, the condition (25.7)
reduces to the Hölder continuity at 0 of $\rho$ (which implies easily that $\rho$ is Hölder
continuous everywhere). In the case of a stationary Gaussian process, it is often
convenient to have a condition in terms of the measure $F$ that represents $\rho$,
(24.12). Here is one such.
(25.8) COROLLARY. Suppose that, for some $\varepsilon \in (0, 1)$,
(25.9)    $\int |x|^\varepsilon F(dx) < \infty.$
Then $X$ has a continuous version.
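Proof. Let $A := \int |x|^\varepsilon F(dx)$, $B_r = \{x \in \mathbb{R}^n : |x| < r\}$, and note that (25.9) implies
    $F(B_r^c) \le A r^{-\varepsilon}.$
Let us now take the case $n = 1$, indicating later how the general case follows:
    $0 \le \rho(0) - \rho(x) = \int (1 - \cos\theta x)\,F(d\theta)$
        $\le \int \left(\tfrac12(\theta x)^2 \wedge 2\right) F(d\theta)$
        $\le \int_{-2/x}^{2/x} \tfrac12\theta^2 x^2\,F(d\theta) + 2F(B_{2/x}^c)$
        $\le \tfrac12 x^2 \int_{-2/x}^{2/x} \theta^2\,F(d\theta) + 2A(\tfrac12 x)^\varepsilon.$
But the estimation
    $\int_{-N}^{N} \theta^2\,F(d\theta) = 2\int_0^N F(d\theta) \int_0^\theta 2y\,dy$
        $\le 4A \int_0^N y^{1 - \varepsilon}\,dy$
        $= 4A(2 - \varepsilon)^{-1} N^{2 - \varepsilon}$
gives us
    $0 \le \rho(0) - \rho(x) \le x^\varepsilon\,2^{1 - \varepsilon} A\,\dfrac{6 - \varepsilon}{2 - \varepsilon},$
so that $\rho$ is Hölder continuous at zero.
For general $n$, fix some unit vector $v$, and let $F_v$ be the image of the measure $F$
under the map $\theta \mapsto \theta \cdot v$. Thus the measure $F_v$ satisfies the bound
$F_v(\{x : |x| > r\}) \le F(B_r^c) \le A r^{-\varepsilon}$, and so, as before,
$0 \le \rho(0) - \rho(xv) \le x^\varepsilon\,2^{1 - \varepsilon} A(6 - \varepsilon)/(2 - \varepsilon)$. Since
the constant does not depend on $v$, the Hölder continuity at 0 of $\rho$ now follows.
$\Box$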
We therefore have quite useable criteria sufficient to ensure the existence of
a continuous version of a stationary Gaussian process $\{X_t : t\in\mathbb{R}^n\}$. Can we
obtain similarly simple criteria for the existence of a $C^k$ version?
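(25.10) THEOREM. Let $\rho(x) = \int e^{i\theta\cdot x}F(d\theta)$ be the covariance function of a
stationary Gaussian process on $\mathbb{R}^n$, let $\alpha = (\alpha_1,\ldots,\alpha_n)$ be a multi-index and let
$\varepsilon\in(0,1)$ be such that
(25.11) $\int \Big(\prod_{j=1}^n \theta_j^{2\alpha_j}\Big)|\theta|^{\varepsilon}\,F(d\theta) < \infty.$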
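Then there is a version $\{X_t : t\in\mathbb{R}^n\}$ of the process for which $D^{\alpha}X(t)$ exists and is
continuous. The process $\{D^{\alpha}X(t) : t\in\mathbb{R}^n\}$ is a stationary Gaussian process with
spectral measure
(25.12) $F_{\alpha}(d\theta) := \Big(\prod_{j=1}^n \theta_j^{2\alpha_j}\Big)F(d\theta).$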
Proof. It is clearly sufficient to prove only the case $\alpha = (0,\ldots,0,1)$. For notational
convenience, we write a point of $\mathbb{R}^n$ as $(\tau,t)$, where $\tau\in\mathbb{R}^{n-1}$ and $t\in\mathbb{R}$. We build
a stationary Gaussian process $\{(\xi_\tau, Y_{\tau,t}) : (\tau,t)\in\mathbb{R}^n\}$ with zero mean and covariance
structure
(25.13)(i) $E(\xi_0\xi_\tau) = E(X_{0,0}X_{\tau,0}) = \rho(\tau,0),$
(25.13)(ii) $E(\xi_0 Y_{\tau,t}) = \frac{\partial\rho}{\partial t}(\tau,t),$
(25.13)(iii) $E(Y_{0,0}Y_{\tau,t}) = -\frac{\partial^2\rho}{\partial t^2}(\tau,t).$
(The fact that $\int\theta_n^2 F(d\theta) < \infty$ implies that the first two partial derivatives of $\rho$
with respect to $t$ exist and are continuous.) In order to see that (25.13) really
does give the covariance of a Gaussian process, we see at the same time why
(25.13) was chosen. Indeed, if we fix some $h > 0$ and consider the process
$(\xi_\tau, Y^h_{\tau,t}) := (X_{\tau,0},\, h^{-1}(X_{\tau,t+h} - X_{\tau,t}))$
then the process clearly exists, so its covariance is non-negative definite and
satisfies
(25.14)(i) $E(\xi_0\xi_\tau) = \rho(\tau,0),$
(25.14)(ii) $E(\xi_0 Y^h_{\tau,t}) = h^{-1}[\rho(\tau,t+h) - \rho(\tau,t)],$
(25.14)(iii) $E[Y^h_{0,0}Y^h_{\tau,t}] = h^{-2}[2\rho(\tau,t) - \rho(\tau,t+h) - \rho(\tau,t-h)].$
Thus the limiting form of the covariance (25.13) is also non-negative definite,
and therefore is the covariance structure of some Gaussian process. In view of
the integrability assumption (25.11) and Corollary 25.8, there is a continuous
version of $(Y_{\tau,t})$, and the spectral measure of $(Y_{\tau,t})$ is just $\theta_n^2F(d\theta)$, since the
covariance function of $Y$ is $-\partial^2\rho/\partial t^2$. There is a continuous version of $\xi$ because
there is a continuous version of $X$. Now we simply define
(25.15) $\tilde X_{\tau,t} := \xi_\tau + \int_0^t Y_{\tau,s}\,ds.$
It is immediate that $\tilde X$ is a continuous Gaussian process, with a continuous
derivative with respect to $t$, and it is a simple exercise to confirm that $\tilde X$ has
the same covariance as $X$. □
(25.16) Remarks
(i) John Kent [3] has obtained attractive sufficient conditions in terms of the
covariance structure of an arbitrary (non-Gaussian) stationary process for
the existence of a continuous version. His result is as follows. If $\rho$ is $C^n$, and
$p_n(h)$ is the polynomial of degree $n$ given by the Taylor expansion of $\rho$ about
0, and if there exists $\gamma > 0$ such that
(25.17) $|\rho(h) - p_n(h)| = O(r^n/|\log r|^{3+\gamma})$ as $r = |h|\to 0,$
then there exists a continuous version of the random field $\{X_t : t\in\mathbb{R}^n\}$.
(ii) Everything we have done in this section and the previous section goes
through with minor modification for vector-valued Gaussian processes. Thus
if $\{X_t : t\in\mathbb{R}^d\}$ is a $k$-vector stationary Gaussian process, we have that, for each
$j = 1,\ldots,k$, $\{X^j_t : t\in\mathbb{R}^d\}$ is a stationary real Gaussian process, and so its
covariance can be represented as in (24.12):
$\rho^{jj}(t) := E X^j(0)X^j(t) = \int F_{jj}(d\theta)\,e^{i\theta\cdot t}.$
By considering more generally the real stationary Gaussian process $a\cdot X_t$,
where $a\in\mathbb{R}^k$ is fixed, we deduce the representation
(25.18) $\rho^{jl}(t) := E X^j(0)X^l(t) = \int F_{jl}(d\theta)\,e^{i\theta\cdot t},$
where, for each Borel $B\subseteq\mathbb{R}^d$, $(F_{jl}(B))$ is a non-negative definite matrix.
Likewise, the condition
$\sum_j \int F_{jj}(d\theta)\,|\theta|^{2r+\varepsilon} < \infty$
will ensure the existence of a $C^r$ version of the vector Gaussian random field.
(25.19) Example: Brownian bridge. A Brownian bridge is an $\mathbb{R}^n$-valued Gaussian
process $(X_t)_{0\le t\le T}$ such that, for $s,t\in[0,T]$,
(25.20) $EX_t = at, \quad E(X_sX_t^*) - st\,aa^* = \Big(s\wedge t - \frac{st}{T}\Big)I,$
where $a\in\mathbb{R}^n$ is fixed, and $T > 0$ is fixed. Does such a process exist, and does it
have a continuous version? To answer this, we may assume without loss of
generality that $a = 0$ (because we could always add the function $t\mapsto at$ to the
zero-mean process), and that $n = 1$ (because we could construct each component
of the motion separately). There are many ways of proving that such a process
exists and has a continuous version (look at Theorem IV.40.3 for four different
representations!) but for now we use the methods developed for general Gaussian
processes to prove this.
First, if $\eta$ is any bounded signed measure on $[0,T]$, we see that
$\int_{[0,T]}\int_{[0,T]} \eta(dx)\,\eta(dy)\,\rho(x,y) := \int_{[0,T]}\int_{[0,T]} \eta(dx)\,\eta(dy)\Big(x\wedge y - \frac{xy}{T}\Big)$
$= \int_0^T dv\int_v^T\eta(dx)\int_v^T\eta(dy) - \frac{1}{T}\Big[\int_0^T dv\int_v^T\eta(dx)\Big]^2 \ge 0$
by the Cauchy-Schwarz inequality. Hence, by Proposition 24.2, the function
$\rho(s,t) := s\wedge t - st/T$ is the covariance of a Gaussian process. (We
could also use Lemma 24.4, since $\rho$ is (a multiple of) the Green function of
Brownian motion in $[0,T]$, killed when it exits $(0,T)$; this approach is less
elementary, though.)
As to the existence of a continuous version, the condition (25.7) of Corollary
25.6 is trivial to verify, and delivers the result immediately.
(25.21) Exercise. If $B$ is Brownian motion, verify directly that
$X(t) := at + \frac{T-t}{T}\,B\Big(\frac{tT}{T-t}\Big)$
satisfies (25.20), and conclude that there exists a continuous version of the
Brownian bridge.
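A numerical sanity check of this exercise can be sketched as follows, taking $a = 0$ and $n = 1$ and using only $E[B_uB_v] = u\wedge v$. The grid, the value $T = 2$ and the function names are illustrative assumptions. The check also confirms that the bridge covariance is non-negative definite on the grid, as the Cauchy-Schwarz argument above predicts.

```python
import numpy as np


def bridge_cov(s, t, T):
    """Target covariance from (25.20) with a = 0, n = 1: min(s, t) - s*t/T."""
    return min(s, t) - s * t / T


def time_changed_cov(s, t, T):
    """Covariance of ((T-t)/T) * B(t*T/(T-t)) for s, t in [0, T),
    computed from E[B_u B_v] = min(u, v)."""
    u, v = s * T / (T - s), t * T / (T - t)
    return (T - s) * (T - t) / T**2 * min(u, v)


T = 2.0
grid = np.linspace(0.0, T, 41)[:-1]  # drop t = T, where the time change blows up
target = np.array([[bridge_cov(s, t, T) for t in grid] for s in grid])
direct = np.array([[time_changed_cov(s, t, T) for t in grid] for s in grid])

max_gap = float(np.max(np.abs(target - direct)))   # should be rounding error only
min_eig = float(np.linalg.eigvalsh(target).min())  # should be >= 0 up to rounding
```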
(25.22) Example: Brownian sheet. The Brownian sheet is a real-valued two-parameter
zero-mean Gaussian process $\{B(s,t) : s,t\ge 0\}$ such that
$\rho((s,t),(u,v)) := E[B(s,t)B(u,v)] = (s\wedge u)(t\wedge v).$
The existence of such a Gaussian process follows because
$\int_{(\mathbb{R}^+)^2}\eta(ds,dt)\int_{(\mathbb{R}^+)^2}\eta(du,dv)\,(s\wedge u)(t\wedge v) = \int_{(\mathbb{R}^+)^2} dx\,dy\,\Big[\int_{s=x}^{\infty}\int_{t=y}^{\infty}\eta(ds,dt)\Big]^2 \ge 0$
proves that the covariance function $\rho$ is non-negative definite (Proposition 24.2).
The continuity follows again easily from Corollary 25.6, since $\rho$ is Lipschitz
continuous.
It is worth remarking that the process
$X_\tau(t) := X(\tau,t) := e^{-\tau/2}B(e^{\tau},t)$
is a continuous Gaussian process such that, for each $\tau\in\mathbb{R}$, $(X_\tau(t))_{t\ge 0}$ is a standard
Brownian motion. Moreover, $X_\tau$ is a stationary process, as is easily verified.
This 'Brownian motion of Brownian motions' arises in many contexts; see the
expository papers in Williams [13], and Walsh [3].
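The stationarity claim can be checked directly from the sheet covariance. The sketch below assumes only $E[B(s,t)B(u,v)] = (s\wedge u)(t\wedge v)$ and compares the covariance of $X_\tau(t) = e^{-\tau/2}B(e^{\tau},t)$ with the expression $e^{-|\tau-\sigma|/2}(t\wedge u)$, which depends on $\tau$ and $\sigma$ only through $\tau-\sigma$. The function names are illustrative.

```python
import math


def sheet_cov(s, t, u, v):
    """Brownian sheet covariance E[B(s,t) B(u,v)] = min(s,u) * min(t,v)."""
    return min(s, u) * min(t, v)


def x_cov(tau, t, sigma, u):
    """Covariance of X_tau(t) and X_sigma(u), where
    X_tau(t) = exp(-tau/2) * B(exp(tau), t)."""
    scale = math.exp(-(tau + sigma) / 2.0)
    return scale * sheet_cov(math.exp(tau), t, math.exp(sigma), u)


def stationary_form(tau, t, sigma, u):
    """Conjectured closed form: exp(-|tau - sigma|/2) * min(t, u)."""
    return math.exp(-abs(tau - sigma) / 2.0) * min(t, u)
```

Setting $\sigma = \tau$ recovers $t\wedge u$, the Brownian covariance in the second parameter.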
(25.23) Exercise. Satisfy yourself that an $n$-parameter Brownian 'sheet'
$\{B(t_1,\ldots,t_n) : t_i\in\mathbb{R}^+\}$ can be defined just as easily.
26. Isotropic random flows. The study of turbulent fluid flow using stochastic
methods has a long history, and has involved many great names in probability
and fluid dynamics, including Kolmogorov [1,2], Taylor [1], Batchelor [1], Ito
[8] and Yaglom [1]. The first objective of this work is to construct and classify
Gaussian random fields $U : \mathbb{R}^d\to\mathbb{R}^d$ that are not only stationary (with respect
to all shifts of the parameter $x\in\mathbb{R}^d$), but also isotropic, which means that, for
each $G\in O(d)$,
(26.1) $(GU(x))_{x\in\mathbb{R}^d} \overset{\mathcal{D}}{=} (U(Gx))_{x\in\mathbb{R}^d}.$
Physically, this means that the random field $U$ 'looks the same' in all coordinate
systems. If we assume as usual that $U$ is zero-mean, and define the covariance
function
(26.2) $\rho^{jk}(x) := E[U^j(0)U^k(x)]$
as before, then the condition (26.1) is equivalent to
(26.3) $\rho(Gx) = G\rho(x)G^T, \quad \forall G\in O(d),\ \forall x\in\mathbb{R}^d.$
In terms of the spectral measure representation (25.18), if we could assume that
$F$ had a smooth density $F_{jk}(d\theta) = f_{jk}(\theta)\,d\theta$, then the isotropy condition (26.1)
would be equivalent to
(26.4) $f(G\theta) = Gf(\theta)G^T, \quad \forall G\in O(d),\ \forall\theta\in\mathbb{R}^d.$
(The assumption that $F$ has a smooth density is harmless; if $\rho$ is an isotropic
covariance then $\rho_\varepsilon(x) := e^{-\varepsilon|x|^2/2}\rho(x)$ is another isotropic covariance with a spectral
measure that does have a smooth density.) For concreteness, we shall from now
on assume that, for some $\varepsilon > 0$,
(26.5) $\sum_j\int F_{jj}(d\theta)\,|\theta|^{2+\varepsilon} < \infty,$
so that the random field $U$ has a $C^1$ version. What does an isotropic random
field look like? The next result partly answers this.
(26.6) PROPOSITION. If $\rho$ is isotropic then it may be represented in the form
(26.7) $\rho^{jk}(x) = \rho_L(r)\frac{x^jx^k}{r^2} + \rho_N(r)\Big(\delta_{jk} - \frac{x^jx^k}{r^2}\Big),$
where $r := |x|$, and $\rho_L, \rho_N$ are two continuous functions such that $\rho_L(0) = \rho_N(0)$.
Proof. Take $e_1 = (1,0,\ldots,0)^T\in\mathbb{R}^d$ and note that by (26.3),
(26.8) $\rho(re_1) = G\rho(re_1)G^T$
for all $G\in O(d)$ for which $Ge_1 = e_1$. But the $G$ that fix $e_1$ are exactly those
expressible as
(26.9) $G = \begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix},$
where $R\in O(d-1)$. If (26.8) holds for all $G$ of the form (26.9), it is easy to deduce
that $\rho(re_1)$ must be of the form
$\rho(re_1) = \begin{pmatrix} \rho_L(r) & 0 \\ 0 & \rho_N(r)I_{d-1} \end{pmatrix} = \rho_L(r)e_1e_1^T + \rho_N(r)(I - e_1e_1^T),$
where $I_{d-1}$ is the $(d-1)\times(d-1)$ identity matrix, and $\rho_L, \rho_N$ are some continuous
functions. The result now follows by rotating the generic $x\in\mathbb{R}^d$ to be a multiple
of $e_1$. □
This result is far from a complete characterisation of an isotropic covariance,
since we know little about the functions pL and pN. The following result leads
us to a complete description of pL and pN.
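(26.10) COROLLARY. Let $\sigma$ be the surface Lebesgue measure on $S^{d-1}$. Then the
spectral measure $(F_{jk}(\cdot))_{j,k=1,\ldots,d}$ is the spectral measure of an isotropic covariance
if and only if there exist measures $\mu_P$ and $\mu_S$ on $(0,\infty)$ and a constant $\gamma\ge 0$ such
that, for any $h\in C_K^{\infty}(\mathbb{R}^d)$,
(26.11) $\int h(\theta)F_{jk}(d\theta) = \int_{S^{d-1}}\sigma(du)\int_{(0,\infty)} h(ru)\,[u^ju^k\mu_P(dr) + (\delta_{jk} - u^ju^k)\mu_S(dr)] + \gamma h(0)\delta_{jk}.$
Proof. Suppose first that $F$ is isotropic. If $F$ has a smooth density $f$ then $f$
satisfies (26.4). But then Proposition 26.6 implies that
(26.12) $f_{jk}(\theta) = \varphi_P(|\theta|)\,\theta^j\theta^k|\theta|^{-2} + \varphi_S(|\theta|)\,(\delta_{jk} - \theta^j\theta^k|\theta|^{-2})$
for some smooth $\varphi_P$ and $\varphi_S$. Thus, for $h\in C_K^{\infty}(\mathbb{R}^d)$,
$\int h(\theta)F_{jk}(d\theta) = \int_{S^{d-1}}\sigma(du)\int_0^{\infty} r^{d-1}\,dr\,h(ru)f_{jk}(ru),$
which is of the form (26.11) with $\gamma = 0$, $\mu_P(dr) := r^{d-1}\varphi_P(r)\,dr$, and
$\mu_S(dr) := r^{d-1}\varphi_S(r)\,dr$. Moreover, for radially symmetric $h$,
(26.13) $\sum_j\int h(\theta)F_{jj}(d\theta) = c_d\int_0^{\infty} h(r)\,[\mu_P(dr) + (d-1)\mu_S(dr)],$
where $c_d := \sigma(S^{d-1})$. To dispense with the assumption of a smooth density, let
$F^{\varepsilon}$ be the spectral measure corresponding to covariance $e^{-\varepsilon|x|^2/2}\rho(x)$ and observe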
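that (26.13) with $h = 1$ shows that the measures $\mu_P^{\varepsilon}$ and $\mu_S^{\varepsilon}$ have bounded total
mass, and taking $h(\theta) = |\theta|^2$ in (26.13) shows that $\mu_P^{\varepsilon}$ and $\mu_S^{\varepsilon}$ have bounded
second moment and so are tight. Taking a weakly convergent subsequence, we
derive the limiting form (26.11), the constant $\gamma$ appearing because in the limit
$\mu_P$ and $\mu_S$ could put mass on 0.
For the converse statement, if $F$ has the form (26.11) then
(26.14) $\rho^{jk}(x) = \int_{S^{d-1}}\sigma(du)\int_0^{\infty}\cos(ru\cdot x)\,[u^ju^k\mu_P(dr) + (\delta_{jk} - u^ju^k)\mu_S(dr)] + \gamma\delta_{jk}$
$:= \delta_{jk}\int_0^{\infty} g(rx)\,\mu_S(dr) + \int_0^{\infty} g_{jk}(rx)\,(\mu_P - \mu_S)(dr) + \gamma\delta_{jk},$
where
$g(v) := \int_{S^{d-1}}\cos(v\cdot u)\,\sigma(du), \qquad g_{jk}(v) := \int_{S^{d-1}} u^ju^k\cos(v\cdot u)\,\sigma(du).$
Note that we can express
$g(v) = (2\pi)^{d/2}|v|^{-(d-2)/2}J_{(d-2)/2}(|v|) := \psi(|v|),$
$g_{jk}(v) = -\Big[\frac{\psi'(|v|)}{|v|}\Big(\delta_{jk} - \frac{v^jv^k}{|v|^2}\Big) + \psi''(|v|)\frac{v^jv^k}{|v|^2}\Big]$
$= (2\pi)^{d/2}\Big[\delta_{jk}|v|^{-d/2}J_{d/2}(|v|) - v^jv^k|v|^{-(d+2)/2}J_{(d+2)/2}(|v|)\Big],$
where $J_\nu(\cdot)$ is the standard Bessel function (see Watson [1]), using the
well-known identities
$J_{\nu-1}(z) + J_{\nu+1}(z) = \frac{2\nu}{z}J_\nu(z),$
$J_{\nu-1}(z) - J_{\nu+1}(z) = 2J_\nu'(z).$
Abbreviating $(2\pi)^{d/2}$ to $a_d$, $\tfrac12(d-2)$ to $\nu$ and setting
$H_\nu(x) = a_dJ_\nu(|x|)\,|x|^{-\nu}, \quad x\in\mathbb{R}^d,$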
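we obtain from (26.14), after some calculations,
(26.15) $\rho^{jk}(x) = \gamma\delta_{jk} + \Big(\delta_{jk} - \frac{x^jx^k}{|x|^2}\Big)\Big\{\int_0^{\infty}\mu_S(dr)\,[H_\nu(rx) - H_{\nu+1}(rx)] + \int_0^{\infty}\mu_P(dr)\,H_{\nu+1}(rx)\Big\}$
$\qquad + \frac{x^jx^k}{|x|^2}\Big\{\int_0^{\infty}\mu_S(dr)\,(d-1)H_{\nu+1}(rx) + \int_0^{\infty}\mu_P(dr)\,[H_{\nu+1}(rx) - |rx|^2H_{\nu+2}(rx)]\Big\}.$
Thus we have expressed $\rho^{jk}$ in the form (26.7) with
(26.16) $\rho_L(|x|) := \rho_{PL}(|x|) + \rho_{SL}(|x|), \qquad \rho_N(|x|) := \rho_{PN}(|x|) + \rho_{SN}(|x|),$
where
(26.17)(i) $\rho_{PL}(|x|) = \int_0^{\infty} [H_{\nu+1}(rx) - |rx|^2H_{\nu+2}(rx)]\,\mu_P(dr),$
(26.17)(ii) $\rho_{SL}(|x|) = \int_0^{\infty} (d-1)H_{\nu+1}(rx)\,\mu_S(dr),$
(26.17)(iii) $\rho_{PN}(|x|) = \int_0^{\infty} H_{\nu+1}(rx)\,\mu_P(dr),$
(26.17)(iv) $\rho_{SN}(|x|) = \int_0^{\infty} [H_\nu(rx) - H_{\nu+1}(rx)]\,\mu_S(dr).$ □
Thus Corollary 26.10 not only characterises completely the spectral measures
of an isotropic covariance, but also gives a representation (26.16), (26.17) for
the possible isotropic covariances themselves. Let us now explain the choice of
the subscripts $P$ and $S$ for the measures; $P$ stands for 'potential', $S$ stands for
'solenoidal'. Indeed, the general isotropic covariance can be expressed, according
to (26.15), as
(26.18) $\rho^{jk}(x) = \gamma\delta_{jk} + \rho_P^{jk}(x) + \rho_S^{jk}(x),$
where
$\rho_P^{jk}(x) := \Big(\delta_{jk} - \frac{x^jx^k}{|x|^2}\Big)\rho_{PN}(|x|) + \frac{x^jx^k}{|x|^2}\rho_{PL}(|x|),$
$\rho_S^{jk}(x) := \Big(\delta_{jk} - \frac{x^jx^k}{|x|^2}\Big)\rho_{SN}(|x|) + \frac{x^jx^k}{|x|^2}\rho_{SL}(|x|)$
are the covariances of two isotropic Gaussian random fields, as is $\gamma\delta_{jk}$ (the latter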
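being the covariance of the trivial field $U(x) = Y$, $\forall x$, where $Y\sim N(0,\gamma I)$). Thus
we have decomposed the general isotropic Gaussian random field into the sum of
three independent isotropic Gaussian random fields, with covariances $\rho_P$, $\rho_S$ and $\gamma I$
respectively. The random field with covariance $\rho_S$ actually is solenoidal (that is,
divergence-free), as we confirm by computing ($\partial_j := \partial/\partial x^j$)
$E[\operatorname{div}U(0)]^2 = E\Big(\sum_j\partial_jU^j(0)\Big)^2$
$= \sum_j\sum_k E[\partial_jU^j(0)\,\partial_kU^k(0)]$
$= -\sum_j\sum_k \partial_j\partial_k\rho^{jk}(0)$
$= \sum_j\sum_k\int\theta^j\theta^kF_{jk}(d\theta)$
$= 0$
if $F$ is given by (26.11) with $\mu_P = 0$ and $\gamma = 0$.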
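The final equality can be spelled out as follows. This is a sketch that applies (26.11) formally with $h(\theta) = \theta^j\theta^k$, which is not compactly supported; it relies on the moment assumption (26.5) to justify the extension.

```latex
\sum_{j,k}\int \theta^j\theta^k\,F_{jk}(d\theta)
  = \int_{S^{d-1}}\sigma(du)\int_{(0,\infty)} r^2
      \sum_{j,k} u^j u^k\,(\delta_{jk} - u^j u^k)\,\mu_S(dr)
  = \int_{S^{d-1}}\sigma(du)\int_{(0,\infty)} r^2\,\bigl(|u|^2 - |u|^4\bigr)\,\mu_S(dr)
  = 0,
\qquad\text{since } |u| = 1 \text{ on } S^{d-1}.
```

The same substitution with $\mu_S = 0$ gives the integrand $u_j^2u_k^2 - 2u_j^2u_k^2 + u_k^2u_j^2 = 0$ in the curl computation.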
Next, we check that the random field with covariance $\rho_P$ is actually a potential
(that is, curl-free: $\partial_jU^k = \partial_kU^j$, $\forall j,k$). For this, we just need to compute
$E[\partial_jU^k(0) - \partial_kU^j(0)]^2 = -\partial_j\partial_j\rho^{kk}(0) + 2\partial_j\partial_k\rho^{jk}(0) - \partial_k\partial_k\rho^{jj}(0)$
$= \int[\theta_j^2F_{kk}(d\theta) - 2\theta_j\theta_kF_{jk}(d\theta) + \theta_k^2F_{jj}(d\theta)]$
$= 0$
if $F$ is given by (26.11) with $\mu_S = 0$ and $\gamma = 0$. If a $C^1$ vector field $U$ is curl-free,
then there is some real $C^2$ function $V : \mathbb{R}^d\to\mathbb{R}$ such that $U = \operatorname{grad}V$. In this
context, it is natural to ask whether, for a curl-free isotropic Gaussian random
field $U$, we could find a stationary Gaussian process $(V(x))_{x\in\mathbb{R}^d}$ such that
$U = \operatorname{grad}V$. This cannot always be done; as an exercise, check that the condition
that permits such a representation is
$\int\sum_jF_{jj}(d\theta)\,|\theta|^{-2} = c_d\int_0^{\infty} r^{-2}\mu_P(dr) < \infty.$
The interest of this decomposition is that the solenoidal flows preserve
Lebesgue measure and so correspond to an incompressible flow of fluid.
Let us recall that the reason that isotropic Gaussian fields are of interest is
as a model for turbulent fluid flow; and any turbulent fluid flow must be expected
to be changing in time, so should be modelled by some random field $(U(t,x) : t\in\mathbb{R},
x\in\mathbb{R}^d)$ with values in $\mathbb{R}^d$. In view of what we have already done, we can set up
a simple model for such a time-varying Gaussian random field by taking
(26.19) $E[U^j(t,x)U^k(s,y)] = \gamma(s,t)\,\rho^{jk}(x-y),$
where $\rho$ is an isotropic covariance, and $\gamma$ is the covariance of a real Gaussian
process indexed by $\mathbb{R}$. The reason we do not assume that $\gamma$ is the covariance
of a stationary Gaussian process is that we may wish to use $\gamma(s,t) := s\wedge t$, the
Brownian-motion covariance. Indeed, if we do so then, for each $x$, $(U(t,x))_{t\ge 0}$
is a Brownian motion in $\mathbb{R}^d$, and the correlation between the different Brownian
motions $U(\cdot,x)$ is given by $\rho$. This suggests the intriguing notion of studying
the motion of a particle dropped into this turbulent flow; if the particle starts
at $x$, and is at position $X_t(x)$ at time $t$, then, in some sense, we should have
infinitesimally
(26.20) $d(X_t(x)) = dU(t,X_t(x)).$
Good sense can be made of this; see Baxendale and Harris [1] and Kunita
[3]. It will not come as any surprise that the process $(X_t(x))_{t\ge 0}$ is a Brownian
motion. But much more interesting is to study the flow $(X_t(x))_{t\ge 0,\,x\in\mathbb{R}^d}$, which
tells us not only how individual particles move, but also how they move relative
to each other. There are many fascinating and beautiful results here; see
Baxendale [1], Baxendale and Harris [1], Carverhill [1,2], Harris [1], Kunita
[3], Le Jan [1-4], Le Jan and Watanabe [1] and the many references therein
for a view of what is known. We have already looked ahead way beyond the
scope of this volume, and must now end our discussion of isotropic random
fields; we hope that what we have said will help you get started on papers such
as Baxendale and Harris [1], Le Jan [4] and Yaglom [1].
(26.21) Exercise. Confirm that (26.19) is the covariance of a Gaussian process.
27. Dynkin's Isomorphism Theorem. Since our entire discussion of Gaussian
processes is by way of a digression motivated by interest, we make no apologies
for describing briefly here (a caricature of) Dynkin's work on Gaussian processes
and local-time fields, even though we shall develop it no further. This is
appropriate for the mysterious but powerful result that we are about to discuss,
since it is clear that its full potential is yet to be explored; we point to the paper
of Marcus and Rosen [1], where continuity results for local-time fields are
deduced from continuity results for Gaussian processes using Dynkin's ideas,
and to the paper of Sheppard [1], which deduces the classical Ray-Knight
Theorem on Brownian local time using Dynkin's result.
We shall consider a sub-Markovian process $(X_t)_{t\ge 0}$ with values in a finite
set $I$, and a transition semigroup $(P_t)_{t\ge 0}$ that is symmetric and assumed to be
integrable and irreducible:
$0 < g(x,y) := \int_0^{\infty} p_t(x,y)\,dt < \infty, \quad \forall x,y\in I.$
Let $Q$ denote the $Q$-matrix of the chain $X$: $Q = P'(0)$.
We shall also consider a zero-mean Gaussian process $(\varphi_x)_{x\in I}$ with covariance
$E(\varphi_x\varphi_y) = g_{xy}$, independent of $X$, and introduce the notation $P^{x\to y}$ for the law
of $X$ started at $x$ and conditioned to die at $y$; precisely, this is the process with
the sub-Markov transition function
(27.1) $p_t^y(a,b) = p_t(a,b)\,g(b,y)/g(a,y).$
(27.2) THEOREM (Dynkin). If $F : \mathbb{R}^I\to\mathbb{R}^+$ is any bounded measurable function
then, for any $x,y\in I$,
(27.3) $E\,\varphi_x\varphi_yF(\tfrac12\varphi^2) = E^{x\to y}F(\tfrac12\varphi^2 + T)\,g_{xy},$
where $T_a := \int_0^{\infty} I_{\{X_t = a\}}\,dt$ is the occupation field of the chain $X$.
(27.4) Clarification. On the left-hand side of (27.3), the expectation is only over
the Gaussian field $\varphi$. On the right-hand side, $X$ has law $P^{x\to y}$, and $\varphi$ is
independent of $X$.
Proof. It clearly suffices to prove the theorem only for $F$ of the form
$F(\xi) = \exp\Big(-\sum_a\lambda_a\xi_a\Big),$
which we now assume. Let $\Lambda$ denote the diagonal matrix $(\operatorname{diag}(\lambda_a))$.
The inclusion of the weighting $F(\tfrac12\varphi^2)$ changes the law of the Gaussian field
from $N(0,G)$ to $N(0,(\Lambda - Q)^{-1})$; in particular,
(27.5) $\frac{E\,\varphi_x\varphi_yF(\tfrac12\varphi^2)}{E\,F(\tfrac12\varphi^2)} = (\Lambda - Q)^{-1}(x,y).$
This leaves us just to confirm that
(27.6) $g_{xy}\,E^{x\to y}F(T) = (\Lambda - Q)^{-1}(x,y),$
since then (27.3) follows immediately from (27.5) and (27.6). Under $P^{x\to y}$, the
process $X$ is a Markov chain with $Q$-matrix
$\tilde Q = D^{-1}QD,$
where $D = \operatorname{diag}(g(a,y))$, as we deduce immediately from (27.1). The problem is
thus to evaluate
(27.7) $E^x\exp\Big\{-\int_0^{\zeta}\lambda(X_s)\,ds\Big\},$
where $X$ is a chain with $Q$-matrix $\tilde Q$, dying at $\zeta$, whereupon it is sent to a
graveyard $\partial$. The interpretation of (27.7) is that the process $X$ is also being
killed at rate $\lambda(\cdot)$ and being sent to a graveyard $\partial'$; and (27.7) is the probability
that the process ends in $\partial$ rather than $\partial'$. The $Q$-matrix of the process on
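$I\cup\{\partial,\partial'\}$ is thus
$\begin{array}{c|ccc} & I & \partial & \partial' \\ \hline I & \tilde Q - \Lambda & -\tilde Q\mathbf{1} & \Lambda\mathbf{1} \\ \partial & 0 & 0 & 0 \\ \partial' & 0 & 0 & 0 \end{array}$
and from this we compute immediately
$P^x(X\text{ ends in }\partial) = [(\Lambda - \tilde Q)^{-1}(-\tilde Q\mathbf{1})]_x$
$= [D^{-1}(\Lambda - Q)^{-1}D(-\tilde Q\mathbf{1})]_x.$
But
$(\tilde Q\mathbf{1})_x = \sum_a\tilde q_{xa} = \sum_a q_{xa}\,g_{ay}/g_{xy} = -\delta_{xy}/g_{xy},$
so that
$P^x(X\text{ ends in }\partial) = \frac{1}{g_{xy}}(\Lambda - Q)^{-1}(x,y),$
completing the proof. □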
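The linear-algebra step at the end of this proof lends itself to a numerical check. The sketch below builds a small random symmetric sub-Markovian $Q$-matrix (the size, seed and rate ranges are arbitrary illustrative choices), forms $G = (-Q)^{-1}$ and the conditioned generator $\tilde Q = D^{-1}QD$, and compares the absorption probabilities $(\Lambda - \tilde Q)^{-1}(-\tilde Q\mathbf{1})$ with $(\Lambda - Q)^{-1}(x,y)/g_{xy}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Symmetric off-diagonal jump rates plus strictly positive killing rates,
# so that Q is the Q-matrix of an irreducible sub-Markovian chain.
W = rng.uniform(0.5, 1.5, size=(n, n))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
kill = rng.uniform(0.2, 1.0, size=n)
Q = W - np.diag(W.sum(axis=1) + kill)

G = np.linalg.inv(-Q)                 # Green function g(x, y)
lam = rng.uniform(0.1, 2.0, size=n)   # the weights lambda_a
Lam = np.diag(lam)


def prob_end_in_graveyard(y):
    """Vector over x of P^x(X ends in the graveyard of the conditioned chain),
    computed from the conditioned generator Q_tilde = D^{-1} Q D."""
    D = np.diag(G[:, y])
    Q_tilde = np.linalg.inv(D) @ Q @ D
    return np.linalg.solve(Lam - Q_tilde, -Q_tilde @ np.ones(n))


def closed_form(y):
    """Vector over x of (Lam - Q)^{-1}(x, y) / g(x, y)."""
    return np.linalg.inv(Lam - Q)[:, y] / G[:, y]
```

Both vectors should agree for every target state $y$, and every entry should lie in $[0,1]$, as befits a probability.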
Levy processes
28. Levy processes. The aim of this section is to give the briefest of introductions
to the theory of Levy processes. Inclusion of this material is justified because
Brownian motion is a Levy process; Levy processes also provide one of the most
important examples of the Markov processes studied in Chapter III and the
semimartingales of Chapter VI.
(28.1) DEFINITION. A process $(X_t)_{t\ge 0}$ with values in $\mathbb{R}^d$ is called a Levy
process (or process with stationary independent increments) if it has the
properties
(28.2)(i) for almost all $\omega$, $t\mapsto X_t(\omega)$ is right continuous on $[0,\infty)$, with left limits
on $(0,\infty)$;
(28.2)(ii) for $0\le t_0\le t_1 < \cdots < t_n$, the random variables
$Y_j := X_{t_j} - X_{t_{j-1}}$ $(j = 1,\ldots,n)$ are independent;
(28.2)(iii) the law of $X_{t+h} - X_t$ depends on $h$, but not on $t$.
The analytic theory of the semigroups associated with Levy processes is the
same as the theory of infinitely divisible distributions.
(28.3) DEFINITION. A probability measure $\mu$ on $\mathbb{R}^d$ is infinitely divisible if for
each $n$, there is a probability $\mu_n$ on $\mathbb{R}^d$ such that if $V_1,\ldots,V_n$ are independent with
law $\mu_n$ then
$V_1 + \cdots + V_n \sim \mu.$
It is clear that if $X$ is a Levy process then the law of $X_1$ is infinitely divisible.
The converse, that any infinitely divisible law is the law of $X_1$ for some Levy
process $X$, will follow from the central result of the theory of Levy processes.
(28.4) THEOREM (Levy-Khinchin representation). For each $b\in\mathbb{R}^d$, each
non-negative definite symmetric $d\times d$ matrix $\Sigma$ and each measure $\nu$ on $\mathbb{R}^d\setminus\{0\}$
satisfying the integrability condition
(28.5) $\int(|x|^2\wedge 1)\,\nu(dx) < \infty,$
the function
(28.6) $\varphi(\theta) = \exp[\psi(\theta)], \quad \theta\in\mathbb{R}^d,$
is the characteristic function of an infinitely divisible law (here we write
(28.7) $\psi(\theta) := ib\cdot\theta - \tfrac12\theta^T\Sigma\theta + \int\{e^{i\theta\cdot x} - 1 - i\theta\cdot x\,I_{(|x|<1)}\}\,\nu(dx),$
the characteristic exponent of the law). Moreover, the characteristic function of
any infinitely divisible law on $\mathbb{R}^d$ may be represented in this way, with the
representing triple $(b,\Sigma,\nu)$ being uniquely determined.
Proof. We refer the reader to Section 9.5 of Breiman [1] or Section XVII.2
of Feller [1] for the one-dimensional case. For the general situation, see
Fristedt [1]. □
The measure $\nu$ in (28.7) is called the Levy measure of the infinitely divisible law.
An important extension of the definition of infinite divisibility is the following.
(28.8) THEOREM. Let $X$ be a random variable with the property that, for each $n$,
(28.9) $X \overset{\mathcal{D}}{=} \sum_{j=1}^n X_{nj},$
where the $X_{nj}$ are independent, and, for each $\varepsilon > 0$,
(28.10) $\lim_{n\to\infty}\sup_{j\le n}P(|X_{nj}| > \varepsilon) = 0.$
Then $X$ is infinitely divisible.
Proof. See Breiman [1].
The uniform asymptotic negligibility condition (28.10) is commonly encountered
in limit theory; it is the simplest condition one could impose to prevent any
one of the summands Xnj from contributing noticeably to the sum. As an
example, the first passage times of one-dimensional diffusions are clearly infinitely
divisible using Theorem 28.8, but not obviously so using Definition 28.3. As an
exercise, check that if, for each n, the (Xnj) have a common distribution then
(28.10) holds.
We now show how any infinitely divisible law (with characteristic exponent
$\psi$ of the form (28.7)) may be realised as the law of $X_1$ for some Levy process
$X$. The law of $X_t$ must be given by
$E\exp(i\theta\cdot X_t) = \exp[t\psi(\theta)],$
the right-hand side being the characteristic function of an infinitely divisible
law by Theorem 28.4. Thus if $\mu_t$ is the law of $X_t$, then for any $t,s\ge 0$,
$\mu_{t+s} = \mu_t * \mu_s,$
so we may define a Markovian semigroup $(P_t)_{t\ge 0}$ by
$P_tf(x) := \int f(x+y)\,\mu_t(dy), \quad \forall f\in C_0(\mathbb{R}^d).$
It is easy to see that, for each $t\ge 0$, $P_t : C_0(\mathbb{R}^d) := \{$continuous functions on $\mathbb{R}^d$
which vanish at $\infty\}\to C_0(\mathbb{R}^d)$. Hence $(P_t)_{t\ge 0}$ is a Feller-Dynkin semigroup (see
Section III.6), and by Theorem III.7.17, there exists a process $X$ with paths that
are right-continuous on $[0,\infty)$ with left limits on $(0,\infty)$ and with transition
semigroup $(P_t)_{t\ge 0}$, and such that $X_0 = 0$. It is now clear that $X$ is a Levy process,
and $X_1\sim\mu_1$ as claimed.
The result to which we have just appealed is quite technical, but once we
have got it, it allows us to pass from an analytical description of the process
(in terms of the convolution semigroup of infinitely divisible laws) to a
sample-path description, which is far more powerful. To understand what we
have gained by doing this, we are going to prove Levy's Theorem that the only
continuous Levy processes are drifting Brownian motions (indeed, without
sample paths, what does this theorem mean?!) First, though, we discuss the
'building-block' Levy process—the compound Poisson process.
Consider a process that is constructed from a standard Poisson counting
process $(N_t)_{t\ge 0}$ of rate $\lambda > 0$, and an independent sequence $Y_1, Y_2,\ldots$ of IID
random variables with distribution function $F$ as follows: we define
(28.11) $X_t = \sum_{j=1}^{N_t}Y_j.$
It is clear that the paths of $X$ are right-continuous with left limits. Moreover,
the increments of $X$ over disjoint intervals are independent, and $X_{t+s} - X_t$ has
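the law of a sum of a Poisson number of copies of $Y_1$:
$E\exp[i\theta\cdot(X_{t+s} - X_t)] = \sum_{n=0}^{\infty} e^{-\lambda s}\frac{(\lambda s)^n}{n!}\Big(\int e^{i\theta\cdot x}F(dx)\Big)^n$
$= \exp\Big[s\int(e^{i\theta\cdot x} - 1)\,\lambda F(dx)\Big].$
Thus $X$ is a Levy process, and if we compare with (28.7), we see that in the
representation of $X$, $\Sigma\equiv 0$, and $\nu$ has finite mass. We can now prove Levy's
Theorem.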
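The passage from the Poisson mixture to the exponential form can be checked numerically. The sketch below assumes a one-dimensional jump law with finitely many atoms (the atoms, weights and rate in the usage note are illustrative only) and compares the truncated Poisson series with the closed form.

```python
import cmath
import math


def cf_series(theta, s, lam, jumps, probs, terms=80):
    """Truncated Poisson mixture: sum over n of
    exp(-lam*s) (lam*s)^n / n! * phi_Y(theta)^n."""
    phi = sum(p * cmath.exp(1j * theta * x) for x, p in zip(jumps, probs))
    total = 0j
    weight = math.exp(-lam * s)  # Poisson probability of n = 0 jumps
    power = 1 + 0j               # phi ** n, built up iteratively
    for n in range(terms):
        total += weight * power
        weight *= lam * s / (n + 1)
        power *= phi
    return total


def cf_closed_form(theta, s, lam, jumps, probs):
    """exp( s * integral of (exp(i*theta*x) - 1) lam F(dx) ) for a discrete F."""
    integral = sum(p * (cmath.exp(1j * theta * x) - 1) for x, p in zip(jumps, probs))
    return cmath.exp(s * lam * integral)
```

For instance, with atoms `[-1.0, 0.5, 2.0]`, weights `[0.2, 0.5, 0.3]`, rate `lam = 2.0` and time increment `s = 1.5`, the two functions should agree to rounding error, because the Poisson tail beyond 80 terms is negligible when `lam * s = 3`.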
(28.12) THEOREM (Levy). If $X$ is a continuous Levy process in $\mathbb{R}^d$ then $X$ is
expressible in the form
(28.13) $X_t = \sigma B_t + bt$
for some $b\in\mathbb{R}^d$, and $\sigma$ a $d\times d$ matrix.
Proof. Let us suppose that the characteristic exponent of $X$ is given by (28.7);
our task is then to prove that $\nu = 0$. Fixing $\varepsilon\in(0,1)$, we construct on some
suitable probability space two independent Levy processes $X^{\varepsilon}$ and $Y^{\varepsilon}$ with
characteristic exponents
$\psi_{\varepsilon}(\theta) := ib\cdot\theta - \tfrac12\theta^T\Sigma\theta + \int_{|x|\le\varepsilon}(e^{i\theta\cdot x} - 1 - i\theta\cdot x)\,\nu(dx),$
$\psi^{\varepsilon}(\theta) := \psi(\theta) - \psi_{\varepsilon}(\theta)$
respectively. Then
$\psi^{\varepsilon}(\theta) = \int_{|x|>\varepsilon}\{e^{i\theta\cdot x} - 1 - i\theta\cdot x\,I_{(|x|<1)}\}\,\nu(dx)$
is the characteristic exponent of a compound Poisson process (with an added
linear drift), since $\nu(\{x : |x| > \varepsilon\}) < \infty$. Thus $Y^{\varepsilon}$ has only finitely many jumps in any
bounded time interval.
Since $X^{\varepsilon}$ and $Y^{\varepsilon}$ are independent, their sum is a Levy process with characteristic
exponent $\psi_{\varepsilon} + \psi^{\varepsilon} = \psi$, so that $X^{\varepsilon} + Y^{\varepsilon}$ is (a copy of) $X$. Now, since $X^{\varepsilon}$ is a Levy
process, and so has right-continuous paths with left limits, there can be only countably
many discontinuities of $X^{\varepsilon}$ in $[0,1]$. Let $D_{\varepsilon}$ denote this countable random set.
Since the jumps of $Y^{\varepsilon}$ come at the times of a Poisson process independent of
$X^{\varepsilon}$, we have immediately that, almost surely, no jump time of $Y^{\varepsilon}$ falls in $D_{\varepsilon}$.
Thus any jump time of $Y^{\varepsilon}$ will actually be a jump time of $X^{\varepsilon} + Y^{\varepsilon} = X$; but $X$
is supposed to be continuous. The only possibility then is that $Y^{\varepsilon}$ has no jumps,
which is to say
$\nu(\{x : |x| > \varepsilon\}) = 0.$
Since $\varepsilon > 0$ was arbitrary, the expression (28.7) for the characteristic exponent
of $X$ collapses to
$\psi(\theta) = ib\cdot\theta - \tfrac12\theta^T\Sigma\theta,$
and taking $\sigma = \Sigma^{1/2}$ yields the form (28.13), as required. □
We conclude this section with a few examples of common Levy processes (or,
equivalently, infinitely divisible distributions). The Gaussian distribution we
already know about.
(28.14) Stable processes. A real-valued Levy process $X$ is said to be stable of
index $\alpha\in(0,2]$ if, for any $c > 0$,
(28.15) $(X_{ct})_{t\ge 0} \overset{\mathcal{D}}{=} (c^{1/\alpha}X_t)_{t\ge 0}.$
The family of all (real-valued) stable($\alpha$) processes is given by the characteristic
exponents
(28.16) $\psi(\theta) = -c|\theta|^{\alpha}[1 - i\beta\,\mathrm{sgn}(\theta)\tan(\tfrac12\pi\alpha)]$
where $1 < \alpha < 2$, $-1\le\beta\le 1$, or $0 < \alpha < 1$, $-1\le\beta\le 1$. For $\alpha = 1$, the exponent
is of the form $\psi(\theta) = -c|\theta| + i\mu\theta$.
The representation $(b,\Sigma,\nu)$ in terms of $c$, $\beta$ and $\alpha$ is, for $0 < \alpha < 1$,
(28.17) $\frac{\nu(dx)}{dx} = |x|^{-1-\alpha}\big[\tfrac12(1+\beta)I_{(x>0)} + \tfrac12(1-\beta)I_{(x<0)}\big]\frac{\alpha c}{\Gamma(1-\alpha)\cos(\tfrac12\alpha\pi)},$
$\Sigma = 0, \qquad b = \int_{-1}^{1}x\,\nu(dx) = \frac{\beta}{1-\alpha}\,\frac{\alpha c}{\Gamma(1-\alpha)\cos(\tfrac12\alpha\pi)},$
and, for $1 < \alpha < 2$, exactly the same formula is valid (now, note, $\cos(\tfrac12\alpha\pi) < 0$).
The representation of $\psi(\theta) = -c|\theta| + i\mu\theta$ in the case $\alpha = 1$ is achieved by
(28.18) $\frac{\nu(dx)}{dx} = \frac{c}{\pi x^2}, \qquad \Sigma = 0, \qquad b = \mu.$
The case $\alpha = 2$ is just the Brownian case.
One special case is that of the symmetric Cauchy process:
(28.19) $\psi(\theta) = -|\theta|.$
The Cauchy process arises naturally in two-dimensional Brownian motion; if $X$
and $Y$ are independent BM$_0$ processes, $\tau_t := \inf\{u : X_u = t\}$, then $Y(\tau_t)$ is a Cauchy
process, as you can easily verify from the fact that, for $\lambda > 0$,
$E\exp(-\lambda\tau_t) = \exp(-t\sqrt{2\lambda}).$
(See (9.1).) The asymmetric Cauchy process has
(28.20) $\psi(\theta) = -\tfrac12\pi|\theta| - i\theta\log|\theta|$
and
(28.21) $\nu(dx) = x^{-2}\,dx\,I_{(x>0)}, \qquad b = \Sigma = 0.$
Note that the asymmetric Cauchy process is not stable.
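For the symmetric Cauchy process obtained above as $Y(\tau_t)$, the verification is a one-line conditioning argument. The sketch below uses only the independence of $Y$ and $\tau_t$, the Gaussian characteristic function of $Y_u$, and the Laplace transform of $\tau_t$ quoted from (9.1), applied with $\lambda = \tfrac12\theta^2$.

```latex
E\,e^{i\theta Y(\tau_t)}
  = E\bigl[\,E\bigl(e^{i\theta Y(\tau_t)} \,\big|\, \tau_t\bigr)\bigr]
  = E\,e^{-\frac12\theta^2\tau_t}
  = \exp\bigl(-t\sqrt{2\cdot\tfrac12\theta^2}\,\bigr)
  = e^{-t|\theta|},
```

which is $\exp[t\psi(\theta)]$ with $\psi$ as in (28.19).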
In view of the interpretation of $\nu$ as the 'jump measure', it is natural (and
correct) to think that the asymmetric Cauchy process $Z$ can only jump upward;
and it is natural (and incorrect) to think that, for some large enough $c > 0$,
$Z_t + ct$ is increasing. The reason that this is incorrect is somewhat mysterious,
and is to do with the fact that, for any $t > 0$, $\sum_{s\le t}|\Delta Z_s| = +\infty$, almost surely.
We discuss this further in Section VI.2, but raise the issue here to emphasise
that Levy processes for which
$\int(|x|^2\wedge 1)\,\nu(dx) < \infty, \qquad \int(|x|\wedge 1)\,\nu(dx) = +\infty$
are the most mysterious of all. We shall prove straight away that we cannot
make $Z_t + ct$ increasing, however large $c$ is, by characterising all increasing Levy
processes.
(28.22) Subordinators. A subordinator is simply an increasing Levy process;
examples include Xt = t and compound Poisson processes with positive jumps.
We shall prove the following characterisation.
(28.23) THEOREM. The distribution $F$ on $\mathbb{R}^+$ is infinitely divisible if and only
if there is a representation
(28.24) $\int_{[0,\infty)} e^{-\lambda x}F(dx) = \exp\Big[-c\lambda - \int_0^{\infty}(1 - e^{-\lambda x})\,\mu(dx)\Big]$
for some $c\ge 0$, and measure $\mu$ on $(0,\infty)$ satisfying the integrability condition
(28.25) $\int_0^{\infty}(x\wedge 1)\,\mu(dx) < \infty.$
(28.26) Remarks
(i) The function
(28.27) $\gamma(\lambda) := c\lambda + \int_0^{\infty}(1 - e^{-\lambda x})\,\mu(dx)$
is called the Laplace exponent of the infinitely divisible law. (Our notation
$\gamma$ is not standard.)
(ii) The method of proof of Theorem 28.23 contains most of the essential steps
of the proof of Theorem 28.4, but simplified by the fact that we work with
Laplace transforms.
(iii) The measure (28.21) does not satisfy the integrability condition (28.25), so
the law of the asymmetric Cauchy process with positive drift $c$ cannot be
an infinitely divisible law on $\mathbb{R}^+$.
Proof of Theorem 28.23. Suppose first that $F$ is infinitely divisible, $F(0) = 0$.
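Define
$\hat F(\lambda) := \int_{[0,\infty)} e^{-\lambda x}F(dx)\in(0,1)$
and observe that, for each $n\in\mathbb{N}$, there is an $n$th root $F_n$ of $F$ with
$\hat F_n(\lambda) := \int_{[0,\infty)} e^{-\lambda x}F_n(dx) = \hat F(\lambda)^{1/n}.$
Thus as $n\to\infty$, $\hat F_n(\lambda)\to 1$ uniformly on compact sets, and
(28.28) $\log\hat F(\lambda) = n\log\hat F_n(\lambda)$
$= n\log\{1 - [1 - \hat F_n(\lambda)]\}$
$\le -n[1 - \hat F_n(\lambda)].$
But since $\hat F_n(\lambda)\to 1$ uniformly on compacts, we can assert that, for any $\varepsilon > 0$, and
$K > 0$, there is some $n_0$ such that, for $n\ge n_0$, $\lambda\le K$,
$1 - \hat F_n(\lambda) < \delta,$
where $\delta > 0$ is such that, for all $0\le x < \delta$,
$\log(1-x)\ge -(1+\varepsilon)x.$
Hence, for $n\ge n_0$, $\lambda\le K$, we have
(28.29) $\log\hat F(\lambda) = n\log\hat F_n(\lambda)\ge -n(1+\varepsilon)[1 - \hat F_n(\lambda)].$
The conclusion from (28.28) and (28.29) is that
$n[1 - \hat F_n(\lambda)]\to -\log\hat F(\lambda) \quad (n\to\infty).$
But
$n[1 - \hat F_n(\lambda)] = n\int_{(0,\infty)}(1 - e^{-\lambda x})\,F_n(dx) = \int_{(0,\infty)}\frac{1 - e^{-\lambda x}}{1 - e^{-x}}\,m_n(dx),$
where the measures $m_n(dx) := n(1 - e^{-x})F_n(dx)$ on $(0,\infty)$ have bounded total mass
(indeed, $m_n(0,\infty)\to -\log\hat F(1)$), so there is a subsequence down which $m_n\Rightarrow m$, a
measure on $[0,\infty]$, and we conclude that
$n[1 - \hat F_n(\lambda)]\to m(\{0\})\lambda + \int_{(0,\infty)}\frac{1 - e^{-\lambda x}}{1 - e^{-x}}\,m(dx) + m(\{\infty\}).$
Writing $c := m(\{0\})$ and $\mu(dx) := (1 - e^{-x})^{-1}m(dx)$ gives the form (28.24) for
$-\log\hat F(\lambda) := \gamma(\lambda)$, except for the presence of $m(\{\infty\})$. But letting $\lambda\downarrow 0$ shows that,
in fact, $m(\{\infty\}) = 0$, and the proof is complete. □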
(28.30) The gamma process. Many of the common families of distributions of
statistics are actually infinitely divisible, the gamma distribution included. Since
the gamma law is concentrated on $\mathbb{R}^+$, we see from Theorem 28.23 that if $X_t$
is a gamma random variable with scale parameter $\alpha$ and shape parameter $t$
(that is, $X_t$ has density $\alpha^tx^{t-1}e^{-\alpha x}/\Gamma(t)$) then
(28.31) $E\,e^{-\lambda X_t} = \alpha^t(\lambda + \alpha)^{-t} = \exp\Big[-c\lambda t - t\int_0^{\infty}(1 - e^{-\lambda x})\,\mu(dx)\Big]$
for some $c\ge 0$, and $\mu$ satisfying (28.25). The reader should have no difficulty in
confirming that
(28.32) $\mu(dx) = e^{-\alpha x}\,\frac{dx}{x}, \qquad c = 0,$
makes (28.31) true. The fact that the mean and variance of a gamma law are
proportional to $t$ is now obvious from the interpretation of the gamma law as
the law of a Levy process.
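The confirmation asked of the reader amounts to the Frullani-type identity $\int_0^{\infty}(1 - e^{-\lambda x})e^{-\alpha x}x^{-1}dx = \log(1 + \lambda/\alpha)$. A numerical sketch is given below; it uses a plain trapezoidal rule on a truncated range, and the cutoff and grid size are ad hoc choices rather than anything prescribed by the text.

```python
import numpy as np


def levy_integral(lam, alpha, upper=80.0, points=400_001):
    """Trapezoidal approximation of the integral over (0, inf) of
    (1 - exp(-lam*x)) * exp(-alpha*x) / x dx, truncated at x = upper/alpha."""
    x = np.linspace(1e-12, upper / alpha, points)
    integrand = -np.expm1(-lam * x) * np.exp(-alpha * x) / x
    return float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(x)) / 2.0)


def gamma_laplace_exponent(lam, alpha):
    """Per-unit-time exponent in (28.31): -log(alpha/(lam+alpha)) = log(1 + lam/alpha)."""
    return float(np.log1p(lam / alpha))
```

For example, `levy_integral(3.0, 2.0)` and `gamma_laplace_exponent(3.0, 2.0)` should agree to several decimal places.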
(28.33) Among other common distributions, the $t$-distribution, the lognormal,
and reciprocals of gammas are all infinitely divisible; see respectively Grosswald
[1], Thorin [1] and Bondesson [1].
Although the Levy-Khinchin representation looks like all that there is to
say about infinite divisibility, it is unfortunately rare that one can exhibit the
characteristic function in a sufficiently explicit form to be able to decide infinite
divisibility. For that reason, attention has focused on various subclasses of the
infinitely divisible laws; see Bondesson [1] for a survey. Pitman and Yor [1]
succeeded in explaining the analytical results of Ismail and Kelker [1], for
example that, for ν > — 1,
$\lambda^{\nu/2}\big/\big[2^{\nu}\,I_{\nu}(\sqrt{\lambda})\,\Gamma(\nu+1)\big]$
is the Laplace transform of an infinitely divisible law, by finding a diffusion
additive functional with this law, which was thus obviously infinitely divisible.
Pitman and Yor also discovered a whole host of other infinitely divisible laws.
This opened a whole new vein in the study of infinite divisibility, and it is fair to
say that it is still far from exhausted.
29. Fluctuation theory and Wiener-Hopf factorisation. At the very beginning of
this chapter, one of the reasons we gave for studying Brownian motion was that
it was sufficiently concrete that many calculations can be done explicitly; in
particular, the law of the maximum by time $t$, or the law of the first passage
time to a level, were quite easy to derive using the reflection principle.
Generalising to Levy processes, these problems become very much harder to
answer (despite the explicit Levy-Khinchin representation), and lead us into
the realm of Wiener-Hopf factorisation. The whole area is notorious for the
lack of good closed-form answers, although various general formulae are known.
We shall here present without proof a selection of largely classical results that
illustrate what is known. We have drawn extensively on the excellent surveys
by Bingham [1] and Fristedt [1].
Let $X$ be a (real-valued) Lévy process, $\bar{X}_t := \sup_{s\le t}X_s$, $\underline{X}_t := \inf_{s\le t}X_s$ and let
$\tau_a := \inf\{t>0 : X_t > a\}$. Take $T$ to be an exponential random variable of mean
$\eta^{-1}$, independent of $X$.
(29.1) THEOREM (Spitzer, Rogozin, Pecherskii)
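(29.2)(i) $E e^{i\theta X(T)} = \eta[\eta-\psi(\theta)]^{-1}$
(29.2)(ii) $\qquad = E e^{i\theta\bar{X}(T)}\,E e^{i\theta\{X(T)-\bar{X}(T)\}}$
(29.2)(iii) $\qquad = E e^{i\theta\bar{X}(T)}\,E e^{i\theta\underline{X}(T)}$.
Moreover, we have the Spitzer-Rogozin identity
(29.3) $E e^{i\theta\bar{X}(T)} = \exp\left[\int_0^\infty \frac{e^{-\eta t}}{t}\,dt\int_0^\infty (e^{i\theta x}-1)\,P(X_t\in dx)\right]$,
with the analogous expression for the characteristic function of $\underline{X}(T)$.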
(29.4) Remarks. The equality (29.2)(i) is immediate from the definitions. The
equality (29.2)(iii) follows because $\underline{X}(T)$ has the same law as $X(T)-\bar{X}(T)$; draw a picture of the
sample path and turn it upside down! The equality (29.2)(ii) is the profound
statement; although evidently $X_T = \bar{X}_T + (X_T - \bar{X}_T)$, the factorisation (29.2)(ii)
follows from the less obvious fact that
(29.5) $\bar{X}_T$ and $X_T - \bar{X}_T$ are independent.
This fundamental fact is best understood by excursion theory, and the account
given by Greenwood and Pitman [2] is the definitive reference. The excursion-theoretic
standpoint is the key to most of the distributional identities concerning
Levy processes. We omit the very important identity of Fristedt [1] because
we have not the necessary notions yet to state it, but record here a simple but
useful identity; for λ ^ 0,
(29.6) $E e^{-\lambda\bar{X}(T)} = \eta\left[\eta + \kappa(\eta)\lambda + \int_{(-\infty,0]} P(\underline{X}(T)\in dy)\int_{-y}^{\infty}\nu(dx)\,(1-e^{-\lambda(x+y)})\right]^{-1}$,
where $\kappa$ is a non-negative increasing function. See Rogers [5] for a derivation
and the explanation of the significance of $\kappa$, which is related to Fristedt's identity.
Spitzer's book [1] gives the random-walk version of (29.3), which was extended
to Levy processes by Rogozin [1].
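For standard Brownian motion the factorisation (29.2) can be written down explicitly, which gives a quick check of the formulae. The sketch below assumes the convention $E e^{i\theta X_t} = e^{t\psi(\theta)}$ with $\psi(\theta) = -\theta^2/2$, and uses the fact (a consequence of the reflection principle) that the maximum of Brownian motion at an independent exponential time of rate $\eta$ is exponential with rate $\sqrt{2\eta}$. The function names are hypothetical.

```python
import math

def lhs(theta, eta):
    """eta / (eta - psi(theta)) with psi(theta) = -theta^2 / 2."""
    psi = -0.5 * theta * theta
    return eta / (eta - psi)

def factor_sup(theta, eta):
    """Characteristic function of an exponential law of rate sqrt(2*eta):
    the assumed law of the supremum at time T."""
    gamma = math.sqrt(2.0 * eta)
    return gamma / (gamma - 1j * theta)

def factor_inf(theta, eta):
    """Characteristic function of minus an exponential variable of rate
    sqrt(2*eta): the assumed law of the infimum at time T."""
    gamma = math.sqrt(2.0 * eta)
    return gamma / (gamma + 1j * theta)

# Largest discrepancy between the two sides over a small grid of test values.
worst = max(
    abs(lhs(th, eta) - factor_sup(th, eta) * factor_inf(th, eta))
    for eta in (0.3, 1.0, 4.0)
    for th in (-2.5, 0.0, 0.7, 3.0)
)
print(worst)
```

The product of the two factors is $\gamma^2/(\gamma^2+\theta^2)$ with $\gamma^2 = 2\eta$, which is exactly $\eta/(\eta+\theta^2/2)$, so the discrepancy is at the level of rounding error.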
Although the Wiener-Hopf factor (29.3) can rarely be computed in closed
form, the identity (29.3) does yield useful information. By extending from $\theta\in\mathbb{R}$
to $\theta\in\mathbb{H} = \{z\in\mathbb{C} : \operatorname{Im} z\ge 0\}$ and letting $\theta = i\lambda\to i\infty$, $\lambda>0$, we deduce that
(29.7) $P(\tau_0>0) = 0$ or $1$ according as $\int_{0+}(dt/t)\,P(X_t>0) = +\infty$ or $<+\infty$.
Rogozin [1] deduces the test
(29.8) $P(\bar{X}_\infty = +\infty) = 0$ or $1$ according as $\int^{\infty}(dt/t)\,P(X_t>0) < \infty$ or $= +\infty$.
If one restricts attention to the case of spectrally one-sided Levy processes
(those for which the measure $\nu$ in the Lévy-Khinchin representation (28.7) puts
no mass on one or other of the half-lines) then more complete results may be
obtained, simply because, if $\nu(\mathbb{R}^+) = 0$, say, the law of $\bar{X}(T)$ is exponential. This
is probabilistically obvious (why? Recall that $X$ has no upward jumps), but
can be seen immediately from (29.6). It may not be easy to work out the rate
of the exponential, but at least one has in principle both of the Wiener-Hopf
factors, and some expressions for them. For example, if $X$ were spectrally
positive, and $-\underline{X}(T)\sim\exp[\beta(\eta)]$, then, from (29.6),
$E\exp[-\lambda\bar{X}(T)] = \eta\left[\eta + \lambda\kappa(\eta) + \int_0^\infty \nu(dx)\int_0^x \beta e^{-\beta y}(1-e^{-\lambda(x-y)})\,dy\right]^{-1}$
and it can be shown that κ(η) > 0 if and only if σ > 0 (that is, there is a Brownian
component; see Rogers [5, Theorem 3]).
Distributional results for Levy processes continue to be discovered; see Doney
[1,2] for particularly interesting recent ones.
30. Local times of Levy processes. Of the many sample-path properties of
Levy processes, some of the most interesting are to do with existence and
properties of a local-time process. In this section, we give a brief introduction
to the main results in this area, which are due to Kesten, Bretagnolle, Blumenthal
and Getoor, Barlow and Hawkes. Throughout, we assume that the (real-valued)
Levy process X is not a compound Poisson process, this case being a trivial
complication.
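Define $C\subseteq\mathbb{R}$ by
$C := \{x\in\mathbb{R} : P(X_t = x \text{ for some } t>0) > 0\}$.
(30.1) THEOREM (Kesten, Bretagnolle). Either $C = \emptyset$, or else $\mathrm{Leb}(C) > 0$. If
the latter alternative obtains then $C$ is one of $\mathbb{R}$, $(0,\infty)$ or $(-\infty,0)$. A necessary
and sufficient condition for $\mathrm{Leb}(C) > 0$ is
(30.2) $\int_{-\infty}^{\infty}\operatorname{Re}\left[\frac{1}{1-\psi(\theta)}\right]d\theta < \infty$.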
The original proof of this result is to be found in Kesten [1]; Bretagnolle [1]
was able to simplify Kesten's approach by working directly with the potentials
of singletons, rather than little intervals.
Let mt (respectively m) denote the occupation measure by time t (respectively
the 1-discounted occupation measure):
$m_t(A) := \int_0^t I_A(X_s)\,ds$, $\quad m(A) := \int_0^\infty e^{-t}I_A(X_t)\,dt$.
Let $L_t$ (respectively $L$) denote the density of $m_t$ (respectively $m$), if such densities
exist. The following attractive result decides when an occupation density exists.
(30.3) THEOREM (Hawkes [3]). A local time exists if and only if
(30.4) $\int_{-\infty}^{\infty}\operatorname{Re}\left[\frac{1}{1-\psi(\theta)}\right]d\theta < \infty$.
Moreover, $L(\cdot)$ is almost surely square-integrable.
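Proof.
$E|\hat{m}(\theta)|^2 = E\int_0^\infty\!\!\int_0^\infty e^{-s-t}e^{i\theta(X_t-X_s)}\,ds\,dt = \operatorname{Re}\left[\frac{1}{1-\psi(\theta)}\right]$.
Thus the condition (30.4) (which is the same as (30.2)) implies that almost surely
$\hat{m}\in L^2(\mathbb{R})$, and so $m$ has an $L^2(\mathbb{R})$ density.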
Conversely, if a local time exists, the range of X has positive Lebesgue measure
almost surely, implying Leb(C) > 0, and hence (30.2), by Theorem 30.1. Π
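The second-moment computation in the proof can be spelled out as follows. This is a sketch under two notational assumptions that are not stated explicitly at this point of the text: the characteristic exponent is normalised by $E e^{i\theta X_t} = e^{t\psi(\theta)}$ (as suggested by (29.2)(i)), and $\hat m(\theta) := \int_0^\infty e^{-t}e^{i\theta X_t}\,dt$ denotes the Fourier transform of $m$.

```latex
\begin{align*}
E|\hat m(\theta)|^2
  &= \int_0^\infty\!\!\int_0^\infty e^{-s-t}\,E\,e^{i\theta(X_t-X_s)}\,ds\,dt \\
  &= 2\operatorname{Re}\int_0^\infty e^{-2s}\,ds\int_0^\infty e^{-u}\,e^{u\psi(\theta)}\,du
     \qquad (u = t-s\ge 0) \\
  &= 2\cdot\tfrac12\cdot\operatorname{Re}\frac{1}{1-\psi(\theta)}
   = \operatorname{Re}\frac{1}{1-\psi(\theta)} .
\end{align*}
```

Here the region $t<s$ contributes the complex conjugate of the region $t>s$, by stationarity and independence of increments, which accounts for the factor $2\operatorname{Re}$. Integrating over $\theta$ and using Fubini's theorem then shows that (30.4) forces $\int|\hat m(\theta)|^2\,d\theta<\infty$ almost surely.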
Millar and Tran [1] show that the local time of the asymmetric Cauchy
process (which exists by applying the test (30.4) to the characteristic exponent
(28.20)) has the property that $x\mapsto L_t(x)$ is unbounded on every interval; thus
the local time is far from continuous in general.
Under the condition
(30.5) 0 is regular for {0},
Blumenthal and Getoor [2] proved the existence of a jointly measurable process
$\{L(t,x) : t\ge 0,\ x\in\mathbb{R}\}$ that is an occupation density and such that, for each
$x$, $t\mapsto L(t,x)$ is continuous increasing and, indeed, an additive functional (see
Section III.16). Thus the condition (30.5) ensures the continuity of $L$ in the time
variable, leaving just the continuity in the space variable. (The condition (30.5)
means $P^0(H_0 = 0) = 1$, where, as usual, $H_x := \inf\{t>0 : X_t = x\}$. It can be shown
(see Rogozin [1]) that (30.5) is equivalent to
(30.6) $\int_{-\infty}^{\infty}\operatorname{Re}\left[\frac{1}{1-\psi(\theta)}\right]d\theta < \infty$
and either
$\sigma^2 > 0$ or $\int(|x|\wedge 1)\,\nu(dx) = +\infty$.)
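Let us assume that (30.4) holds, and that the function $\varphi$ is defined by
$\varphi(x) := \left\{\frac{1}{\pi}\int_{-\infty}^{\infty}(1-\cos\theta x)\operatorname{Re}\left[\frac{1}{1-\psi(\theta)}\right]d\theta\right\}^{1/2}$.
Let $\bar\varphi$ be the monotone rearrangement of $\varphi$, that is,
$\bar\varphi(t) = \inf\{s : \lambda(s) > t\}$,
where
$\lambda(s) := \mathrm{Leb}\{u : \varphi(u)\le s\}$.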
The sufficiency of the following condition is due to Barlow and Hawkes [1], the
necessity to Barlow [5].
(30.7) THEOREM (Barlow, Barlow and Hawkes). Assume (30.4) and (30.5).
There exists a jointly continuous version of the local-time process
$\{L(t,x) : t\ge 0,\ x\in\mathbb{R}\}$ if and only if
(30.8) $\int_{0+}\frac{\bar\varphi(u)\,du}{u\sqrt{\log(1/u)}} < \infty$.
The condition (30.8) was suggested by results in the continuity of Gaussian
processes; see Hawkes [3] for a well-motivated survey.
Barlow [5] also establishes the following modulus-of-continuity result.
(30.9) THEOREM. If $\varphi(x) = |x|^{\alpha}f(x)$, where $f$ is slowly varying at 0, then, for
a proper subinterval $I$ of $\mathbb{R}$,
|L(s,a)-L(s,b)| Γ F/ Ί1/2
km sup ' \ \ ] " =2 supL(r,x) .
SiO a,bel ψ{θ - θ){ -log| Ь - fl |)1/2 |_ xel J
\a-b\<S
We have already mentioned important work of Marcus and Rosen [1] which
utilises the Dynkin isomorphism theorem in deep studies of local-time.
CHAPTER II
Some Classical Theory
This chapter is a reminder of what every probabilist should know, with the
emphasis on things that tend to be neglected. The considerable length of the
chapter—and it is now much more extensive than in the first edition—should
be sufficient guarantee that 'reminder' is used in the usual 'courtesy' sense!
Because things are now developed in strictly logical order, you may sometimes
have to wait a little time for applications. (We very occasionally cheat just a
little in Exercises by using things that you may feel are not yet proved with full
rigour, but we always clear these up later.) Exercises are very much part of the
text—please do them!
So many reminders of standard definitions are included that we break with
our usual definition format except when we wish to give special emphasis to
particularly important material that may not be so familiar.
1. BASIC MEASURE THEORY
The basic results of measure theory are summarised here, with commentary,
but mostly without proofs. A full account, with all results proved, may be found,
for example, in Williams [15], referred to as [W] throughout this chapter; that
account has the advantage that its notation and terminology are the same as
those used here. Neveu [1] is a marvellous account of measure theory for
probabilists; and, for the definitive account of the full theory, including Choquet
capacitability theory (which is needed for the Debut and Section Theorems),
see Volume 1 of Dellacherie and Meyer [1]. If you have studied Measure for
Measure from such classics as Halmos [1] or Dunford and Schwartz [1] then
this part of the chapter can serve to remind you of probabilists' language.
Measurability and measure
1. Measurable spaces; σ-algebras; π-systems; d-systems. Measurability will, in
a sense, be much more important to us than measure. The emphasis in
probability is therefore very different from that of courses that aim straight for
the Dominated-Convergence Theorem.
(1.1) Algebra; σ-algebra; $\sigma(\mathcal{C})$; measurable space. Let $S$ be a set. A collection $\Sigma_0$
of subsets of $S$ is called an algebra on $S$ (or algebra of subsets of $S$) if the following
three conditions hold:
$S\in\Sigma_0$,
$F\in\Sigma_0 \Rightarrow F^c := S\setminus F\in\Sigma_0$,
$F, G\in\Sigma_0 \Rightarrow F\cup G\in\Sigma_0$.
Note that $\emptyset = S^c\in\Sigma_0$ and that $F, G\in\Sigma_0 \Rightarrow F\cap G = (F^c\cup G^c)^c\in\Sigma_0$. Thus an algebra
on $S$ is a family of subsets of $S$ stable under finitely many set operations. (Note:
Some authors use 'field' for 'algebra' and 'σ-field' for 'σ-algebra'.)
A collection $\Sigma$ of subsets of $S$ is called a σ-algebra on $S$ (or σ-algebra of subsets
of $S$) if $\Sigma$ is an algebra on $S$ such that whenever $F_n\in\Sigma$ ($n\in\mathbb{N}$), then
$\bigcup_n F_n\in\Sigma$.
Note that if $\Sigma$ is a σ-algebra on $S$ and $F_n\in\Sigma$ for $n\in\mathbb{N}$, then $\bigcap_n F_n = \left(\bigcup_n F_n^c\right)^c\in\Sigma$.
Thus a σ-algebra on S is a family of subsets of S 'stable under any countable
collection of set operations'. A pair (S, Σ), where S is a set and Σ is a σ-algebra
on S, is called a measurable space. An element of Σ is called a Σ-measurable
subset of S.
Let $\mathcal{C}$ be a class of subsets of $S$. Then $\sigma(\mathcal{C})$, the σ-algebra generated by $\mathcal{C}$, is
the smallest σ-algebra $\Sigma$ on $S$ such that $\mathcal{C}\subseteq\Sigma$. It is the intersection of all
σ-algebras on $S$ that have $\mathcal{C}$ as a subclass. (Obviously, the class $\mathcal{P}(S)$ of all
subsets of $S$ is a σ-algebra that extends $\mathcal{C}$.)
(1.2) Borel σ-algebras; $\mathcal{B}(S)$; $\mathcal{B} = \mathcal{B}(\mathbb{R})$. Let $S$ be a topological space. Then $\mathcal{B}(S)$,
the Borel σ-algebra on $S$, is the σ-algebra generated by the family of open
subsets of $S$. With slight abuse of notation,
(1.3) $\mathcal{B}(S) := \sigma(\text{open sets})$.
It is standard shorthand that $\mathcal{B} := \mathcal{B}(\mathbb{R})$. The σ-algebra $\mathcal{B}$ is the most
important of all σ-algebras. Every subset of $\mathbb{R}$ that you meet in everyday use
is an element of $\mathcal{B}$; and indeed it is difficult (but possible!) to find a subset of
$\mathbb{R}$ constructed explicitly (without the Axiom of Choice) that is not in $\mathcal{B}$. To
construct a subset of $\mathbb{R}$ that is not Lebesgue-measurable (as we do in Exercise
E20.6b), one must use the Axiom of Choice. See Durrett [3, p. 411].
Elements of $\mathcal{B}$ can be quite complicated, and it is not possible to write down
the 'generic' element of $\mathcal{B}$ in practicable fashion. However, the collection
(1.4) $\pi(\mathbb{R}) := \{(-\infty,x] : x\in\mathbb{R}\}$
(not a standard notation, but a key example of a 'π-system') is very easy to
understand, and it is often the case that all we need to know about $\mathcal{B}$ is the
almost obvious result that
(1.5) $\mathcal{B} = \sigma(\pi(\mathbb{R}))$.
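(1.6) π-systems and d-systems. Here we develop the point just made into a very
useful technique.
Let $S$ be a set. A collection $\mathcal{I}$ of subsets of $S$ is called a π-system if $\mathcal{I}$ is stable
under finite intersections:
whenever $A, B\in\mathcal{I}$, we have $A\cap B\in\mathcal{I}$.
A collection $\mathcal{D}$ of subsets of $S$ is called a d-system (on $S$) if
$S\in\mathcal{D}$,
if $A, B\in\mathcal{D}$ and $A\subseteq B$ then $B\setminus A\in\mathcal{D}$,
if $A_n\in\mathcal{D}$ and $A_n\uparrow A$ then $A\in\mathcal{D}$.
Recall that $A_n\uparrow A$ means $A_n\subseteq A_{n+1}$ ($\forall n$) and $\bigcup A_n = A$.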
(1.7) PROPOSITION. A collection Σ of subsets of S is a σ-algebra if and only
if Σ is both a π-system and a d-system.
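Proof. The 'only if' part is trivial, so we prove only the 'if' part.
Suppose that $\Sigma$ is both a π-system and a d-system, and that $E$, $F$ and
$E_n$ ($n\in\mathbb{N}$) are in $\Sigma$. Then $E^c := S\setminus E\in\Sigma$, and
$E\cup F = S\setminus(E^c\cap F^c)\in\Sigma$.
Hence $G_n := E_1\cup\cdots\cup E_n\in\Sigma$, and, since $G_n\uparrow\bigcup E_k$, we see that $\bigcup E_k\in\Sigma$.
If $\mathcal{C}$ is a class of subsets of $S$, we define $d(\mathcal{C})$ to be the intersection of all
d-systems that contain $\mathcal{C}$. Obviously, $d(\mathcal{C})$ is a d-system, the smallest d-system
containing $\mathcal{C}$. It is also obvious that
$d(\mathcal{C})\subseteq\sigma(\mathcal{C})$.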
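(1.8) LEMMA (Dynkin). If $\mathcal{I}$ is a π-system, then
$d(\mathcal{I}) = \sigma(\mathcal{I})$.
Thus any d-system that contains a π-system contains the σ-algebra generated by
that π-system.
Proof. Because of Proposition 1.7, we need only prove that $d(\mathcal{I})$ is a π-system.
Step 1: Let $\mathcal{D}_1 := \{B\in d(\mathcal{I}) : B\cap C\in d(\mathcal{I}),\ \forall C\in\mathcal{I}\}$. Because $\mathcal{I}$ is a π-system, $\mathcal{D}_1\supseteq\mathcal{I}$.
It is easily checked that $\mathcal{D}_1$ inherits the d-system structure from $d(\mathcal{I})$. [For,
clearly, $S\in\mathcal{D}_1$. Next, if $B_1, B_2\in\mathcal{D}_1$ and $B_1\subseteq B_2$, then, for $C$ in $\mathcal{I}$,
$(B_2\setminus B_1)\cap C = (B_2\cap C)\setminus(B_1\cap C)$;
and, since $B_2\cap C\in d(\mathcal{I})$, $B_1\cap C\in d(\mathcal{I})$ and $d(\mathcal{I})$ is a d-system, we see that
$(B_2\setminus B_1)\cap C\in d(\mathcal{I})$, so that $B_2\setminus B_1\in\mathcal{D}_1$. Finally, if $B_n\in\mathcal{D}_1$ ($n\in\mathbb{N}$) and $B_n\uparrow B$ then,
for $C\in\mathcal{I}$,
$(B_n\cap C)\uparrow(B\cap C)$
so that $B\cap C\in d(\mathcal{I})$ and $B\in\mathcal{D}_1$.] We have shown that $\mathcal{D}_1$ is a d-system containing
$\mathcal{I}$, so that (since $\mathcal{D}_1\subseteq d(\mathcal{I})$ by definition) $\mathcal{D}_1 = d(\mathcal{I})$.
Step 2: Let $\mathcal{D}_2 := \{A\in d(\mathcal{I}) : A\cap B\in d(\mathcal{I}),\ \forall B\in d(\mathcal{I})\}$. Step 1 showed that $\mathcal{D}_2$
contains $\mathcal{I}$. But, just as in Step 1, we can prove that $\mathcal{D}_2$ inherits the d-system
structure from $d(\mathcal{I})$, and that therefore $\mathcal{D}_2 = d(\mathcal{I})$. But the fact that $\mathcal{D}_2 = d(\mathcal{I})$
says that $d(\mathcal{I})$ is a π-system. □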
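On a finite set, Dynkin's Lemma can be checked by brute force, and the same computation shows that the π-system hypothesis matters. The sketch below is an illustration with hypothetical function names; it assumes the definition of a d-system that includes $S$ itself, and it relies on the fact that on a finite set the monotone-limit condition is automatic and countable unions reduce to finite ones.

```python
from itertools import combinations

def sigma_closure(S, gens):
    """Smallest family containing gens that is closed under complement
    and pairwise union (a sigma-algebra, since S is finite)."""
    S = frozenset(S)
    fam = {S, frozenset()} | {frozenset(g) for g in gens}
    changed = True
    while changed:
        changed = False
        for A in list(fam):
            if S - A not in fam:
                fam.add(S - A)
                changed = True
        for A, B in combinations(list(fam), 2):
            if A | B not in fam:
                fam.add(A | B)
                changed = True
    return fam

def d_closure(S, gens):
    """Smallest family containing S and gens that is closed under proper
    differences B \\ A with A a subset of B (a d-system, since S is finite)."""
    S = frozenset(S)
    fam = {S} | {frozenset(g) for g in gens}
    changed = True
    while changed:
        changed = False
        for A in list(fam):
            for B in list(fam):
                if A <= B and B - A not in fam:
                    fam.add(B - A)
                    changed = True
    return fam

S = {1, 2, 3, 4}
pi_system = [{1, 2}, {2, 3}, {2}]   # closed under intersection
not_pi = [{1, 2}, {2, 3}]           # {2} = {1,2} & {2,3} is missing

print(d_closure(S, pi_system) == sigma_closure(S, pi_system))
print(len(d_closure(S, not_pi)), len(sigma_closure(S, not_pi)))
```

For the π-system the two closures coincide, as the lemma asserts; for the second class the d-system closure is strictly smaller than the generated σ-algebra.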
2. Measurable functions. This section is largely a matter of acquainting you
with our notation.
(2.1) DEFINITION (Measurable functions, $\mathrm{m}(\Sigma_1/\Sigma_2)$). Suppose that $(S_1,\Sigma_1)$
and $(S_2,\Sigma_2)$ are measurable spaces, and that $h$ is a map
$h : S_1\to S_2$.
Then $h$ is called $\Sigma_1/\Sigma_2$-measurable (or just measurable when $\Sigma_1$ and $\Sigma_2$ are
understood), and we write $h\in\mathrm{m}(\Sigma_1/\Sigma_2)$, if
$h^{-1} : \Sigma_2\to\Sigma_1$;
that is, if the inverse image
$h^{-1}(A) := \{s\in S_1 : h(s)\in A\}$
of every set $A\in\Sigma_2$ is in $\Sigma_1$.
This definition is exactly analogous to the definition of continuity.
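(2.2) PROPOSITION. The map $h^{-1}$ preserves all set operations:
$h^{-1}\left(\bigcup_\alpha A_\alpha\right) = \bigcup_\alpha h^{-1}(A_\alpha)$, $\quad h^{-1}(A^c) = (h^{-1}(A))^c$, etc.
Proof. This is just definition chasing. □
(2.3) PROPOSITION. If $\mathcal{C}\subseteq\Sigma_2$, $\sigma(\mathcal{C}) = \Sigma_2$ and $h^{-1} : \mathcal{C}\to\Sigma_1$, then $h\in\mathrm{m}(\Sigma_1/\Sigma_2)$.
Proof. Let $\mathcal{E}$ be the class of elements $F$ in $\Sigma_2$ such that $h^{-1}(F)\in\Sigma_1$. By (2.2), $\mathcal{E}$
is a σ-algebra, and, by hypothesis, $\mathcal{E}\supseteq\mathcal{C}$. □
(2.4) PROPOSITION (Composition Lemma). If $(S_1,\Sigma_1)$, $(S_2,\Sigma_2)$ and $(S_3,\Sigma_3)$
are measurable spaces, and if $h_1$ is measurable from $(S_1,\Sigma_1)$ to $(S_2,\Sigma_2)$ and $h_2$ is
measurable from $(S_2,\Sigma_2)$ to $(S_3,\Sigma_3)$, then $h_2\circ h_1$ is measurable from $(S_1,\Sigma_1)$ to
$(S_3,\Sigma_3)$.
Proof. This is obvious. □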
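(2.5) $\mathbb{R}$-valued functions; $\mathrm{m}\Sigma$; $(\mathrm{m}\Sigma)^+$; $\mathrm{b}\Sigma$. Let $(S,\Sigma)$ be a measurable space. A
function $h : S\to\mathbb{R}$ is called $\Sigma$-measurable, and we write $h\in\mathrm{m}\Sigma$, if $h^{-1} : \mathcal{B}\to\Sigma$, that
is, if $h\in\mathrm{m}(\Sigma/\mathcal{B})$. We write $(\mathrm{m}\Sigma)^+$ for the class of non-negative elements in $\mathrm{m}\Sigma$,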
and bΣ for the class of bounded Σ-measurable functions on S.
Note. Because lim sups of sequences even of finite-valued functions may be
infinite, and for other reasons, it is convenient to extend these definitions to
functions $h$ taking values in $[-\infty,\infty]$ in the obvious way: $h$ is called $\Sigma$-measurable
if $h^{-1} : \mathcal{B}[-\infty,\infty]\to\Sigma$. Which of the various results stated for
real-valued functions extend to functions with values in $[-\infty,\infty]$, and what
these extensions are, should be obvious.
(2.6) PROPOSITION. Our function $h : S\to\mathbb{R}$ is $\Sigma$-measurable if and only if
$\{h\le c\} := \{s\in S : h(s)\le c\}\in\Sigma$ ($\forall c\in\mathbb{R}$).
Proof. Take $\mathcal{C}$ to be the class $\pi(\mathbb{R})$ of intervals of the form $(-\infty,c]$, $c\in\mathbb{R}$, and
apply 2.3.
Note. Obviously, similar results apply in which $\{h\le c\}$ is replaced by $\{h>c\}$,
$\{h\ge c\}$ etc.
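(2.7) LEMMA. Sums and products of measurable $\mathbb{R}$-valued functions are
measurable: in other words, $\mathrm{m}\Sigma$ is an algebra over $\mathbb{R}$. Thus if $\lambda\in\mathbb{R}$ and $h$, $h_1$,
$h_2\in\mathrm{m}\Sigma$, then
$h_1+h_2\in\mathrm{m}\Sigma$, $\quad h_1h_2\in\mathrm{m}\Sigma$, $\quad\lambda h\in\mathrm{m}\Sigma$.
Example of proof. Let $c\in\mathbb{R}$. Then for $s\in S$ it is clear that $h_1(s)+h_2(s) > c$ if and
only if for some rational $q$ we have
$h_1(s) > q > c - h_2(s)$.
In other words,
$\{h_1+h_2>c\} = \bigcup_{q\in\mathbb{Q}}(\{h_1>q\}\cap\{h_2>c-q\})$,
a countable union of elements of $\Sigma$. □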
(2.8) LEMMA (measurability of infs, lim infs of functions). Let $(h_n : n\in\mathbb{N})$ be a
sequence of elements of $\mathrm{m}\Sigma$. Then
(i) $\inf h_n$, (ii) $\liminf h_n$, (iii) $\limsup h_n$
are $\Sigma$-measurable (into $([-\infty,\infty],\mathcal{B}[-\infty,\infty])$, but we shall still write $\inf h_n\in\mathrm{m}\Sigma$
(for example)). Further,
(iv) $\{s : \lim h_n(s) \text{ exists in } \mathbb{R}\}\in\Sigma$.
Proof. (i) $\{\inf h_n\ge c\} = \bigcap_n\{h_n\ge c\}$.
(ii) Let $L_n(s) := \inf\{h_r(s) : r\ge n\}$. Then $L_n\in\mathrm{m}\Sigma$, by part (i). But
$L(s) := \liminf h_n(s) = \uparrow\lim L_n(s) = \sup_n L_n(s)$,
and $\{L\le c\} = \bigcap_n\{L_n\le c\}\in\Sigma$.
(iii) This part is now obvious.
(iv) This is also clear because the set on which $\lim h_n$ exists in $\mathbb{R}$ is
$\{\limsup h_n<\infty\}\cap\{\liminf h_n>-\infty\}\cap g^{-1}(\{0\})$,
where
$g := \limsup h_n - \liminf h_n$. □
(2.9) σ-algebra generated by a collection of functions on S. This important idea
is analogous to the weakest topology that makes every function in a given family
continuous, etc.
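Generally, if we have a collection $(Y_\gamma : \gamma\in C)$ of maps $Y_\gamma : \Omega\to\mathbb{R}$, then
$\mathcal{Y} := \sigma(Y_\gamma : \gamma\in C)$
is defined to be the smallest σ-algebra $\mathcal{Y}$ on $\Omega$ such that each map $Y_\gamma$ ($\gamma\in C$) is
$\mathcal{Y}$-measurable. Clearly,
$\sigma(Y_\gamma : \gamma\in C) = \sigma(\{\{\omega\in\Omega : Y_\gamma(\omega)\in B\} : \gamma\in C,\ B\in\mathcal{B}\})$.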
(2.10) Borel functions. A function h from a topological space S to R is called
Borel if $h$ is $\mathcal{B}(S)$-measurable. The most important case is when $S$ itself is $\mathbb{R}$.
(2.11) PROPOSITION. If $S$ is topological and $h : S\to\mathbb{R}$ is continuous, then $h$ is
Borel.
Proof. Take $\mathcal{C}$ to be the class of open subsets of $\mathbb{R}$, and apply Proposition 2.3. □
3. Monotone-Class Theorems. The following elementary Monotone-Class
Theorem allows us to deduce results about general measurable functions from
results about indicators of elements of π-systems.
(3.1) THEOREM. Let $\mathcal{H}$ be a class of bounded functions from a set $S$ into $\mathbb{R}$
satisfying the following conditions:
(i) $\mathcal{H}$ is a vector space over $\mathbb{R}$;
(ii) the constant function 1 is an element of $\mathcal{H}$;
(iii) if $(f_n)$ is a sequence of non-negative functions in $\mathcal{H}$ such that $f_n\uparrow f$, where
$f$ is a bounded function on $S$, then $f\in\mathcal{H}$.
Suppose further that $\mathcal{H}$ contains the indicator function of every set in some π-system
$\mathcal{I}$. Then $\mathcal{H}$ contains every bounded $\sigma(\mathcal{I})$-measurable function on $S$.
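Sketch of proof. Let $\mathcal{D}$ be the class of subsets $D$ of $S$ such that $I_D\in\mathcal{H}$. Then
the listed properties of $\mathcal{H}$ guarantee that $\mathcal{D}$ is a d-system. Since $\mathcal{D}$ contains $\mathcal{I}$
by hypothesis, Dynkin's Lemma shows that $\mathcal{D}$ contains $\sigma(\mathcal{I})$.
Suppose that $f$ is a $\sigma(\mathcal{I})$-measurable function such that for some $K$ in $\mathbb{N}$,
$0\le f(s)\le K$, $\forall s\in S$.
For $n\in\mathbb{N}$, define
$f_n(s) := \sum_{i=0}^{K2^n} i2^{-n}I_{D(n,i)}(s)$,
where
$D(n,i) := \{s : i2^{-n}\le f(s) < (i+1)2^{-n}\}$.
Since $f$ is $\sigma(\mathcal{I})$-measurable, every $D(n,i)\in\sigma(\mathcal{I})$, so that $I_{D(n,i)}\in\mathcal{H}$. Since $\mathcal{H}$ is a
vector space, every $f_n\in\mathcal{H}$. But $0\le f_n\uparrow f$, so that $f\in\mathcal{H}$.
If $f\in\mathrm{b}\sigma(\mathcal{I})$, we may write $f = f^+ - f^-$, where $f^+ = \max(f,0)$ and
$f^- = \max(-f,0)$. Then $f^+, f^-\in\mathrm{b}\sigma(\mathcal{I})$ and $f^+, f^-\ge 0$, so that $f^+, f^-\in\mathcal{H}$ by what
we established above. □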
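The dyadic approximation used in the sketch of proof is easy to experiment with. The following is a minimal illustration, assuming a hypothetical test function `f` on $[0,1]$ with $0\le f\le 1$ (so $K = 1$); the helper name `dyadic_approx` is introduced here and is not from the text.

```python
import math

def dyadic_approx(f, n):
    """Return f_n, where f_n(s) = i * 2**-n on the set
    D(n, i) = {s : i * 2**-n <= f(s) < (i + 1) * 2**-n}."""
    scale = 2 ** n
    return lambda s: math.floor(f(s) * scale) / scale

# Hypothetical bounded function with values in [0.25, 1] on [0, 1].
f = lambda s: 3.0 * s * (1.0 - s) + 0.25

points = [k / 97 for k in range(98)]
for n in (1, 2, 4, 8):
    f_n = dyadic_approx(f, n)
    gap = max(f(s) - f_n(s) for s in points)
    print(n, gap)  # the gap is non-negative and below 2**-n
```

Each $f_n$ takes finitely many values, $f_n\le f_{n+1}\le f$, and $f - f_n < 2^{-n}$, which is the monotone convergence $0\le f_n\uparrow f$ used in the proof.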
For certain applications, it is very useful to have more sophisticated forms of
monotone-class theorem. Here is one.
(3.2) THEOREM. Let $\mathcal{H}$ be a vector space of bounded real-valued functions on
a set $S$. Suppose that $\mathcal{H}$ contains constant functions, is closed under uniform
convergence, and has the following property: for a uniformly bounded sequence
$(f_n)$ of non-negative functions in $\mathcal{H}$ such that $f_n(s)\uparrow f(s)$ ($\forall s$), we must have $f\in\mathcal{H}$.
If $\mathcal{H}$ contains a subset $\mathcal{C}$ that is closed under multiplication, then $\mathcal{H}$ contains
every bounded $\sigma(\mathcal{C})$-measurable function from $S$ to $\mathbb{R}$.
You are invited to prove this in Section 13. The hypothesis that $\mathcal{H}$ is closed
under uniform convergence (usually one that is easily verified) may be dropped.
See Dellacherie and Meyer [1].
4. Measures; the uniqueness lemma; almost everywhere; a.e. (μ, Σ). This is fairly
familiar material, but watch the use of π-systems in Lemma 4.6.
(4.1) Set functions: additivity; σ-additivity; monotone convergence. Let $S$ be a set,
let $\Sigma_0$ be an algebra on $S$, and let
$\mu_0 : \Sigma_0\to[0,\infty]$
be a 'non-negative set function'. Then $\mu_0$ is called additive if $\mu_0(\emptyset) = 0$ and, for
$F, G\in\Sigma_0$,
$F\cap G = \emptyset \Rightarrow \mu_0(F\cup G) = \mu_0(F)+\mu_0(G)$.
The map $\mu_0$ is called countably additive (or σ-additive) if $\mu_0(\emptyset) = 0$ and if, whenever
$(F_n : n\in\mathbb{N})$ is a sequence of disjoint sets in $\Sigma_0$ with union $F = \bigcup F_n$ in $\Sigma_0$ (note
that this is an assumption since $\Sigma_0$ need not be a σ-algebra), then
$\mu_0(F) = \sum_n\mu_0(F_n)$.
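(4.2) LEMMA. Suppose that $\mu_0$ is additive on $(S,\Sigma_0)$, where $\Sigma_0$ is an algebra on
$S$. Then $\mu_0$ is σ-additive on $\Sigma_0$ if and only if whenever $F_n\in\Sigma_0$ ($n\in\mathbb{N}$) and $F_n\uparrow F$,
where $F\in\Sigma_0$, we have $\mu_0(F_n)\uparrow\mu_0(F)$.
Recall that $F_n\uparrow F$ means $F_n\subseteq F_{n+1}$ ($\forall n\in\mathbb{N}$), $\bigcup F_n = F$. Lemma 4.2 is the
fundamental property of measure.
Proof of 'only if' part. Write $G_1 := F_1$, $G_n := F_n\setminus F_{n-1}$ ($n\ge 2$). Then the sets $G_n$
($n\in\mathbb{N}$) are disjoint, and
$\mu_0(F_n) = \mu_0(G_1\cup G_2\cup\cdots\cup G_n) = \sum_{k\le n}\mu_0(G_k)\uparrow\sum_{k<\infty}\mu_0(G_k) = \mu_0(F)$. □
It is now obvious how to prove the 'if' part.
(4.3) LEMMA. If $\mu_0(S)<\infty$ then $\mu_0$ is σ-additive on $\Sigma_0$ if and only if whenever
$G_n\in\Sigma_0$ ($n\in\mathbb{N}$) and $G_n\downarrow\emptyset$, we have $\mu_0(G_n)\downarrow 0$.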
You prove this!
(4.4) Measure space; finite and σ-finite measures. Let (S, Σ) be a measurable
space, so that $\Sigma$ is a σ-algebra on $S$. A map
$\mu : \Sigma\to[0,\infty]$
is called a measure on $(S,\Sigma)$ if $\mu$ is countably additive. The triple $(S,\Sigma,\mu)$ is then
called a measure space.
Now let $(S,\Sigma,\mu)$ be a measure space. Then $\mu$ (or indeed the measure space
$(S,\Sigma,\mu)$) is called
finite if $\mu(S)<\infty$,
σ-finite if there is a sequence $(S_n : n\in\mathbb{N})$ of elements of $\Sigma$ such that
$\mu(S_n)<\infty$ ($\forall n\in\mathbb{N}$) and $\bigcup S_n = S$.
Intuition is usually good for finite measures, and adapts well for σ-finite
measures. However, measures which are not σ-finite can be rather crazy.
An element $F$ of $\Sigma$ is called μ-null if $\mu(F) = 0$. It is easily shown by the use
of Lemma 4.2 that a countable union of μ-null sets is μ-null.
A statement $\mathcal{S}$ about points $s$ of $S$ is said to hold μ-almost everywhere (a.e.
(μ)) if
$F := \{s : \mathcal{S}(s) \text{ is false}\}\in\Sigma$ and $\mu(F) = 0$.
If we wish to emphasise that $F\in\Sigma$, we say that $\mathcal{S}$ holds a.e. $(\mu,\Sigma)$.
(4.5) A fundamental uniqueness lemma. The point here is that σ-algebras are
'difficult', but π-systems are 'easy': one can often write down in closed form the
general element of a π-system, while the general element even of $\mathcal{B}$ is impossibly
complicated.
(4.6) LEMMA. Let $\mathcal{I}$ be a π-system on a set $S$, and let $\Sigma := \sigma(\mathcal{I})$. Suppose that
$\mu_1$ and $\mu_2$ are measures on $(S,\Sigma)$ such that $\mu_1(S) = \mu_2(S)<\infty$ and $\mu_1 = \mu_2$ on $\mathcal{I}$.
Then
$\mu_1 = \mu_2$ on $\Sigma$.
Proof. The class of elements $F$ of $\Sigma$ for which $\mu_1(F) = \mu_2(F)$ is a d-system
containing $\mathcal{I}$. Dynkin's Lemma 1.8 now gives the result. □
(4.7) COROLLARY (The Uniqueness Lemma). If two probability measures
agree on a π-system, then they agree on the σ-algebra generated by that π-system.
This result will play an extremely important role.
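The π-system hypothesis in Lemma 4.6 cannot simply be dropped. The following small sketch (the four-point space and the two measures are chosen here for illustration and are not from the text) exhibits two probability measures that agree on a generating class which is not a π-system, yet differ on the generated σ-algebra.

```python
from fractions import Fraction

S = [1, 2, 3, 4]

# Two probability measures on S, given by their point masses.
mu1 = {s: Fraction(1, 4) for s in S}
mu2 = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2), 4: Fraction(0)}

def measure(mu, A):
    """Measure of a subset A of S under the point masses mu."""
    return sum(mu[s] for s in A)

# Not a pi-system: the intersection {2} of its two members is missing.
C = [{1, 2}, {2, 3}]

for A in C:
    print(sorted(A), measure(mu1, A), measure(mu2, A))   # equal on C
print(measure(mu1, {2}), measure(mu2, {2}))              # differ on {2}
```

Since $\{2\} = \{1,2\}\cap\{2,3\}$ lies in $\sigma(\mathcal{C})$, the two measures disagree on the generated σ-algebra even though they agree on $\mathcal{C}$ and have the same total mass.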
5. Caratheodory's Extension Theorem. The following result underpins the
existence of every non-trivial probabilistic model. We shall see its use in the
celebrated Daniell-Kolmogorov Theorem on the existence of stochastic processes.
(5.1) THEOREM (Carathéodory). Let $S$ be a set, let $\Sigma_0$ be an algebra on $S$, and
let
$\Sigma := \sigma(\Sigma_0)$.
If $\mu_0$ is a countably additive map $\mu_0 : \Sigma_0\to[0,\infty]$, then there exists a measure $\mu$
on $(S,\Sigma)$ such that
$\mu = \mu_0$ on $\Sigma_0$.
If $\mu_0(S)<\infty$ then, by Lemma 4.6, this extension is unique: an algebra is a
π-system!
For a proof see for example [W; Al.5-1.8].
(5.2) Lebesgue measure Leb on $((0,1],\mathcal{B}(0,1])$. Let $S = (0,1]$. For $F\subseteq S$, say that
$F\in\Sigma_0$ if $F$ may be written as a finite union
(5.3) $F = (a_1,b_1]\cup\cdots\cup(a_r,b_r]$,
where $r\in\mathbb{N}$, $0\le a_1\le b_1\le\cdots\le a_r\le b_r\le 1$. Then $\Sigma_0$ is an algebra on $(0,1]$ and
$\Sigma := \sigma(\Sigma_0) = \mathcal{B}(0,1]$.
(We write $\mathcal{B}(0,1]$ instead of $\mathcal{B}((0,1])$.) For $F$ as in (5.3), let
$\mu_0(F) = \sum_{k\le r}(b_k-a_k)$.
Then $\mu_0$ is well-defined and additive on $\Sigma_0$ (this is easy). Moreover (see [W,
A1.9]), $\mu_0$ is countably additive on $\Sigma_0$. (To prove this is not trivial. Our proof
of the Daniell-Kolmogorov Theorem will remind you how it is done.) Hence,
by Theorem 5.1, there exists a unique measure $\mu$ on $((0,1],\mathcal{B}(0,1])$ extending
$\mu_0$ on $\Sigma_0$. This measure $\mu$ is called the Lebesgue measure on $((0,1],\mathcal{B}(0,1])$ or
(loosely) the Lebesgue measure on $(0,1]$. We shall often denote $\mu$ by Leb. The
Lebesgue measure (still denoted by Leb) on $([0,1],\mathcal{B}[0,1])$ is of course obtained
by a trivial modification, the set $\{0\}$ having Lebesgue measure 0.
In a similar way, we can construct the (σ-finite) Lebesgue measure (which we
also denote by Leb) on $\mathbb{R}$ (more strictly, on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$).
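The claim that $\mu_0$ is well-defined and additive on $\Sigma_0$ can be made concrete. The sketch below represents an element of $\Sigma_0$ as a list of pairs $(a,b)$ standing for half-open intervals $(a,b]$; the helper names `normalise` and `mu0` are hypothetical, and, as a convenience not required by (5.3), overlapping or abutting pieces are merged before lengths are summed.

```python
from fractions import Fraction

def normalise(intervals):
    """Merge a finite union of half-open intervals (a, b] into sorted,
    disjoint, non-abutting pieces."""
    pieces = sorted((a, b) for a, b in intervals if a < b)
    merged = []
    for a, b in pieces:
        if merged and a <= merged[-1][1]:
            # (a, b] overlaps or abuts the previous piece: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

def mu0(intervals):
    """Total length of the set represented by the list of intervals."""
    return sum((b - a for a, b in normalise(intervals)), Fraction(0))

half, quarter = Fraction(1, 2), Fraction(1, 4)
# Two different representations of the same set (0, 1/2] u (3/4, 1].
F1 = [(Fraction(0), quarter), (quarter, half), (Fraction(3, 4), Fraction(1))]
F2 = [(Fraction(0), half), (Fraction(3, 4), Fraction(1))]
G = [(half, Fraction(3, 4))]          # disjoint from F2

print(mu0(F1), mu0(F2))               # same value for both representations
print(mu0(F2 + G), mu0(F2) + mu0(G))  # additivity on disjoint sets
```

Exact rational arithmetic is used so that the comparisons are not affected by rounding.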
6. Inner and outer μ-measures; completion. Results in this section should be
proved as an (easy) exercise. They are, however, important.
Let $(S,\Sigma,\mu)$ be a measure space. For $G\subseteq S$, define the inner μ-measure $\mu_*(G)$
of $G$ via
$\mu_*(G) := \sup\{\mu(F) : F\in\Sigma;\ F\subseteq G\}$,
and the outer μ-measure $\mu^*(G)$ of $G$ via
$\mu^*(G) := \inf\{\mu(H) : H\in\Sigma;\ H\supseteq G\}$.
The function $\mu^*$ on $\mathcal{P}(S)$ is sub-σ-additive (or countably sub-additive) in that
$\mu^*\left(\bigcup_n G_n\right)\le\sum_n\mu^*(G_n)$
for any sequence $(G_n : n\in\mathbb{N})$.
If $\mu_*(G) = \mu^*(G)$, we say that $G$ is μ-measurable, and write $G\in\Sigma^\mu$. Then $\Sigma^\mu$ is
a σ-algebra, and we can extend $\mu$ to a measure, still denoted by $\mu$, on $\Sigma^\mu$ by writing
$\mu(G) := \mu_*(G) = \mu^*(G)$ for $G$ in $\Sigma^\mu$.
The triple $(S,\Sigma^\mu,\mu)$ is called the completion of $(S,\Sigma,\mu)$. The σ-algebra $\Sigma^\mu$ is the
smallest σ-algebra that extends $\Sigma$ and contains every set of outer μ-measure 0.
We end this section with a lemma that is very significant for probability
theory. Its proof is an easy exercise.
(6.1) LEMMA. Suppose that $\mu(S) = 1$ and that $G$ is a subset of $S$ with $\mu^*(G) = 1$.
Then for $F\in\Sigma$, $\mu^*(G\cap F) = \mu(F)$. Moreover, $(G,\mathcal{G},\mu^*)$ is a measure space, where
$\mathcal{G}$ is the class of subsets of $G$ of the form $G\cap F$, where $F\in\Sigma$.
Integration
7. Definition of the integral $\int f\,d\mu$. Let $(S,\Sigma,\mu)$ be a measure space.
(7.1) Notation etc: $\mu(f) :=: \int f\,d\mu$; $\mu(f;A)$. We are interested in defining for suitable
elements $f$ in $\mathrm{m}\Sigma$ the integral of $f$ with respect to $\mu$, for which we shall use
the alternative notations
$\mu(f) :=: \int_S f(s)\,\mu(ds) :=: \int_S f\,d\mu$.
It is worth mentioning now that we shall also use the following equivalent
notations for $A\in\Sigma$:
$\int_A f(s)\,\mu(ds) :=: \int_A f\,d\mu :=: \mu(f;A) := \mu(fI_A)$
(with a true definition on the extreme right!) It should be clear that, for example,
$\mu(f; f>x) := \mu(f;A)$, where $A = \{s\in S : f(s)>x\}$.
(7.2) Integrals of non-negative simple functions; $SF^+$. If $A$ is an element of $\Sigma$,
we define
$\mu_0(I_A) := \mu(A)\le\infty$.
The use of $\mu_0$ rather than $\mu$ signifies that we currently have only a naive integral
defined for simple functions.
An element $f$ of $(\mathrm{m}\Sigma)^+$ is called simple, and we shall then write $f\in SF^+$, if $f$
may be written as a finite sum
(7.3) $f = \sum_k a_kI_{A_k}$,
where $a_k\in[0,\infty]$ and $A_k\in\Sigma$. We then define
(7.4) $\mu_0(f) = \sum_k a_k\mu(A_k)\le\infty$ (with $0.\infty := 0 =: \infty.0$).
Of course, it needs to be checked that $\mu_0(f)$ is well defined, since $f$ will have
many different representations of the form (7.3), and we must ensure that they
yield the same value of $\mu_0(f)$ in (7.4).
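The well-definedness of (7.4) and the role of the convention $0.\infty := 0$ can be seen on a toy example. The sketch below uses a hypothetical three-point space with one point of infinite mass; the names `times`, `mu0`, `weights` and the representations `rep1` to `rep3` are all introduced here for illustration.

```python
import math

def times(a, m):
    """Product with the convention 0 . inf := 0 =: inf . 0 of (7.4)."""
    return 0.0 if a == 0 or m == 0 else a * m

def mu0(rep, mu):
    """Naive integral of the simple function sum_k a_k I_{A_k},
    where rep is a list of pairs (a_k, A_k)."""
    return sum(times(a, mu(A)) for a, A in rep)

# Hypothetical measure on {0, 1, 2}: the point 2 carries infinite mass.
weights = {0: 1.0, 1: 2.0, 2: math.inf}
mu = lambda A: sum(weights[s] for s in A)

# Three representations of the same function f = 3 on {0, 1}, f = 0 on {2}.
rep1 = [(3.0, {0, 1}), (0.0, {2})]
rep2 = [(3.0, {0}), (3.0, {1})]
rep3 = [(1.0, {0, 1}), (2.0, {0}), (2.0, {1}), (0.0, {2})]

print(mu0(rep1, mu), mu0(rep2, mu), mu0(rep3, mu))
```

Without the convention, the term `0.0 * math.inf` would evaluate to `nan` in floating-point arithmetic, and the first and third representations would no longer give a meaningful value.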
(7.5) Integrals of non-negative functions. For $f\in(\mathrm{m}\Sigma)^+$, we define
$\mu(f) := \sup\{\mu_0(h) : h\in SF^+,\ h\le f\}\le\infty$.
Clearly, for $f\in SF^+$, we have $\mu(f) = \mu_0(f)$.
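(7.6) DEFINITION (μ-integrable functions; $\mathcal{L}^1(S,\Sigma,\mu)$). For $f\in\mathrm{m}\Sigma$, we write
$f = f^+ - f^-$, where
$f^+(s) := \max(f(s),0)$, $\quad f^-(s) := \max(-f(s),0)$.
Then $f^+, f^-\in(\mathrm{m}\Sigma)^+$ and $|f| = f^+ + f^-$.
For $f\in\mathrm{m}\Sigma$, we say that $f$ is μ-integrable, and write
$f\in\mathcal{L}^1(S,\Sigma,\mu)$,
if
$\mu(|f|) = \mu(f^+)+\mu(f^-)<\infty$,
and then we define
$\int f\,d\mu := \mu(f) := \mu(f^+)-\mu(f^-)$.
Note that, for $f\in\mathcal{L}^1(S,\Sigma,\mu)$,
$|\mu(f)|\le\mu(|f|)$,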
the familiar rule that the modulus of the integral is less than or equal to the
integral of the modulus.
(7.7) LEMMA (Linearity). For $\alpha,\beta\in\mathbb{R}$ and $f, g\in\mathcal{L}^1(S,\Sigma,\mu)$,
$\alpha f+\beta g\in\mathcal{L}^1(S,\Sigma,\mu)$
and
$\mu(\alpha f+\beta g) = \alpha\mu(f)+\beta\mu(g)$.
(7.8) Note. There is a slight problem here. For some $s$, the expression $(\alpha f+\beta g)(s)$
may lead to the undefined $\infty-\infty$. Please defer worrying about this until
Section 10.
8. Convergence theorems. We recall the standard results.
(8.1) THEOREM (The Monotone-Convergence Theorem). If $(f_n)$ is a sequence
of elements of $(\mathrm{m}\Sigma)^+$ such that $f_n\uparrow f$ then
$\mu(f_n)\uparrow\mu(f)\le\infty$,
or, in other notation,
$\int_S f_n(s)\,\mu(ds)\uparrow\int_S f(s)\,\mu(ds)$.
[W, A5] contains a proof. This theorem is really all there is to integration
theory. We shall see that other key results such as the Fatou Lemma and the
Dominated-Convergence Theorem follow trivially from it.
(8.2) LEMMA (The Fatou Lemma for functions). For a sequence $(f_n)$ in $(\mathrm{m}\Sigma)^+$,
$\mu(\liminf f_n)\le\liminf\mu(f_n)$.
Proof. We have
(8.3) $\liminf_n f_n = \uparrow\lim_k g_k$, where $g_k := \inf_{n\ge k}f_n$.
For $n\ge k$, we have $f_n\ge g_k$, so that $\mu(f_n)\ge\mu(g_k)$, whence
$\mu(g_k)\le\inf_{n\ge k}\mu(f_n)$;
and, on combining this with an application of Theorem 8.1 to (8.3), we obtain
$\mu\left(\liminf_n f_n\right) = \uparrow\lim_k\mu(g_k)\le\uparrow\lim_k\inf_{n\ge k}\mu(f_n) =: \liminf_n\mu(f_n)$. □
(8.4) LEMMA ('Reverse Fatou' Lemma). If $(f_n)$ is a sequence in $(\mathrm{m}\Sigma)^+$ such that
for some $g$ in $(\mathrm{m}\Sigma)^+$, we have $f_n\le g$, $\forall n$, and $\mu(g)<\infty$, then
$\mu(\limsup f_n)\ge\limsup\mu(f_n)$.
Proof. Apply Lemma 8.2 to the sequence $(g-f_n)$. □
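The inequality in the Fatou Lemma can be strict, and the domination hypothesis in the reverse lemma cannot be omitted. A standard illustration, not taken from the text, uses Lebesgue measure on $(0,1]$:

```latex
\[
f_n := n\,I_{(0,1/n]}, \qquad
\liminf_n f_n = \limsup_n f_n = 0 \ \text{ on } (0,1], \qquad
\mathrm{Leb}\bigl(\lim\nolimits_n f_n\bigr) = 0 \;<\; 1 = \lim_n \mathrm{Leb}(f_n).
\]
```

For each fixed $s\in(0,1]$ we have $f_n(s) = 0$ as soon as $n > 1/s$, while $\mathrm{Leb}(f_n) = n\cdot(1/n) = 1$ for every $n$. The conclusion of the reverse lemma fails here, which is consistent with the statement: the smallest candidate for a dominating function, $\sup_n f_n$, is not integrable.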
(8.5) THEOREM (The Dominated-Convergence Theorem). Suppose that $f_n$,
$f\in\mathrm{m}\Sigma$, that $f_n(s)\to f(s)$ for μ-almost every $s$ in $S$, and that the sequence $(f_n)$ is
dominated by an element $g$ of $\mathcal{L}^1(S,\Sigma,\mu)^+$:
$|f_n(s)|\le g(s)$, $\forall s\in S$, $\forall n\in\mathbb{N}$,
where $\mu(g)<\infty$. Then
$f_n\to f$ in $\mathcal{L}^1(S,\Sigma,\mu)$; that is, $\mu(|f_n-f|)\to 0$;
$\mu(f_n)\to\mu(f)$.
Note. This theorem is central to many applications of measure theory. For us,
it will be superseded by the uniform-integrability result: Theorem 21.2.
Proof. We have $|f_n-f|\le 2g$, where $\mu(2g)<\infty$, so, by the reverse Fatou
Lemma 8.4,
$\limsup\mu(|f_n-f|)\le\mu(\limsup|f_n-f|) = \mu(0) = 0$.
Since
$|\mu(f_n)-\mu(f)| = |\mu(f_n-f)|\le\mu(|f_n-f|)$,
the theorem is proved. □
Here is a useful result.
(8.6) LEMMA (Scheffé's Lemma). Suppose that $f_n$, $f\in\mathcal{L}^1(S,\Sigma,\mu)$ and that $f_n\to f$
(a.e.(μ)). Then
$\mu(|f_n-f|)\to 0$ if and only if $\mu(|f_n|)\to\mu(|f|)$.
Exercise. Prove Scheffé's Lemma. Consider first the case in which $f_n$ and $f$ are
non-negative. Note that then $(f_n-f)^-\le f$.
(8.7) The standard machine. What we call the standard machine is a much
cruder alternative to the Monotone-Class Theorem.
The idea is that to prove that a 'linear' result is true for all functions h in a
space such as $\mathcal{L}^1(S,\Sigma,\mu)$,
(i) we first show the result is true for the case when h is an indicator function—
which it normally is by definition;
(ii) we then use linearity to obtain the result for h in SF+;
(iii) we then use the Monotone-Convergence Theorem to obtain the result for
$h\in(\mathrm{m}\Sigma)^+$, integrability conditions on $h$ usually being superfluous at this
stage;
(iv) finally, we show, by writing $h = h^+ - h^-$ and using linearity, that the claimed
result is true.
When it works, it is easier to 'watch the standard machine work' than to appeal
to the monotone-class result, though there are times when the greater subtlety
of the Monotone-Class Theorem is essential.
9. The Radon-Nikodym Theorem; absolute continuity; $\lambda\ll\mu$ notation; equivalent
measures. Let $(S,\Sigma,\mu)$ be a measure space. If $f\in(\mathrm{m}\Sigma)^+$, then, by linearity and
the Monotone-Convergence Theorem,
(9.1) $(f\mu)(F) := \mu(f;F) := \mu(fI_F)$ ($F\in\Sigma$)
defines a measure $f\mu$ on $(S,\Sigma)$. Note that
(9.2) $\mu(F) = 0$ implies that $(f\mu)(F) = 0$.
The Radon-Nikodym Theorem is a very important converse for σ-finite
measures.
(9.3) THEOREM (The Radon-Nikodym Theorem) and DEFINITION
(absolute continuity, $\ll$, $d\lambda/d\mu$). Let $(S,\Sigma)$ be a measurable space, and let $\mu$ and
$\lambda$ be σ-finite measures on $(S,\Sigma)$. Then the following statements are equivalent:
(i) for $F\in\Sigma$, $\mu(F) = 0$ implies that $\lambda(F) = 0$;
(ii) $\lambda = f\mu$ for some $f$ in $(\mathrm{m}\Sigma)^+$.
Suppose that statements (i) and (ii) hold. We then say that $\lambda$ is absolutely
continuous relative to $\mu$. The function $f$ is defined uniquely modulo μ-null sets:
we say that $f$ is a version of the (Radon-Nikodym) density of $\lambda$ relative to $\mu$,
and write
$f = \frac{d\lambda}{d\mu}$ a.e.(μ).
You can see the relevance of σ-finiteness if you consider μ to be a measure
that counts the number of elements in a set.
For the classical proof see Halmos [1]. Meyer [2] and [W, Section 14.13]
are among the books that give a martingale proof.
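On a finite set the theorem reduces to dividing point masses, and that case is worth seeing once. The following is a minimal sketch under our own assumptions: measures are represented as dictionaries of point masses, and the names `density` and `measure_of` are hypothetical helpers rather than anything from the text.

```python
from fractions import Fraction as Fr

def density(lam, mu):
    """A version of d(lam)/d(mu) on a finite set, assuming lam << mu."""
    for s in mu:
        if mu[s] == 0 and lam.get(s, 0) != 0:
            raise ValueError("lam is not absolutely continuous relative to mu")
    # On mu-null points any value will do; we pick 0.
    return {s: (lam.get(s, Fr(0)) / mu[s] if mu[s] != 0 else Fr(0)) for s in mu}

def measure_of(f, mu, F):
    """(f mu)(F) = mu(f; F), the sum over s in F of f(s) * mu{s}."""
    return sum(f[s] * mu[s] for s in F)

mu = {"a": Fr(1, 2), "b": Fr(1, 4), "c": Fr(1, 4), "d": Fr(0)}
lam = {"a": Fr(1, 3), "b": Fr(2, 3), "c": Fr(0), "d": Fr(0)}
f = density(lam, mu)
```

The value chosen at the μ-null point "d" is arbitrary, which is exactly the sense in which the density is unique only modulo μ-null sets.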
(9.4) LEMMA. Suppose that λ and μ are finite measures on a measurable space
$(S,\Sigma)$. Then $\lambda \ll \mu$ if and only if for $\varepsilon > 0$ we can find a $\delta > 0$ such that
$F \in \Sigma$ and $\mu(F) < \delta$ imply that $\lambda(F) < \varepsilon$.
(9.5) LEMMA and DEFINITION (equivalent measures). Again suppose that
λ and μ are finite measures on a measurable space $(S,\Sigma)$. Suppose further that
$\lambda \ll \mu$ and $\mu \ll \lambda$. We then say that λ and μ are equivalent. Note that a.e.($\mu$) and
a.e.($\lambda$) now mean the same thing: we write a.e. If $f$ is a version of $d\lambda/d\mu$ and $g$ is
a version of $d\mu/d\lambda$ then $0 < f < \infty$ a.e. and $g = 1/f$ a.e.
You are invited to prove these lemmas in Section 13.
10. Inequalities; $\mathcal{L}^p$ and $L^p$ spaces ($p \ge 1$). Continue to let $(S,\Sigma,\mu)$ be a measure
space, and let $p \in [1,\infty)$. For $f \in m\Sigma$, write $f \in \mathcal{L}^p := \mathcal{L}^p(S,\Sigma,\mu)$ if
$\|f\|_p := \{\mu(|f|^p)\}^{1/p} < \infty$.
(10.1) LEMMA (Minkowski's Inequality). We have
$\|f + g\|_p \le \|f\|_p + \|g\|_p$.
(10.2) LEMMA (Hölder's Inequality). If $p > 1$ and $q > 1$ satisfy $p^{-1} + q^{-1} = 1$
then, for $f, g \in m\Sigma$,
$|\mu(fg)| \le \mu(|fg|) \le \|f\|_p \|g\|_q$.
The Schwarz inequality is the case when $p = q = 2$.
The best way to view these classical inequalities is as consequences of Jensen's
inequality (18.3). See [W, 6.13]. An immediate consequence of Jensen's inequality
([W, 6.7]) is the following result.
(10.3) If μ is a finite measure and $1 \le p \le r$ then, for $f \in m\Sigma$,
$\|f\|_p \le \mu(S)^c \|f\|_r$, where $c := p^{-1} - r^{-1}$.
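Hölder's inequality (10.2) and the comparison of norms (10.3) are easy to check by hand on a finite measure space. The sketch below does so in floating point; the measure `mu`, the functions `f` and `g`, and the exponents are arbitrary illustrative choices of ours.

```python
def integral(mu, f):
    """mu(f) for a measure given by point masses on a finite set."""
    return sum(m * f(s) for s, m in mu.items())

def norm(mu, f, p):
    """The norm ||f||_p = mu(|f|^p)^(1/p)."""
    return integral(mu, lambda s: abs(f(s)) ** p) ** (1.0 / p)

mu = {0: 0.5, 1: 1.5, 2: 2.0, 3: 0.25}       # a finite measure, not a probability
f = lambda s: s - 1.5
g = lambda s: (-1) ** s * (s + 1)

p, q = 3.0, 1.5                               # conjugate exponents: 1/3 + 2/3 = 1
holder_lhs = integral(mu, lambda s: abs(f(s) * g(s)))
holder_rhs = norm(mu, f, p) * norm(mu, g, q)

r = 5.0                                       # 1 <= p <= r
total_mass = sum(mu.values())
comparison_lhs = norm(mu, f, p)
comparison_rhs = total_mass ** (1 / p - 1 / r) * norm(mu, f, r)
```

Note that the factor $\mu(S)^c$ cannot be dropped here, because the total mass exceeds 1.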
We now need some standard results from functional analysis. See, for example,
Dunford and Schwartz [1] or Halmos [1].
Define an equivalence relation on $\mathcal{L}^p$ as follows:
$f \sim g$ if and only if $\|f - g\|_p = 0$,
equivalently,
$f \sim g$ if and only if $f = g$, a.e.($\mu$).
Let $[f]$ be the equivalence class in $\mathcal{L}^p$ containing $f$, and let $L^p$ be the set of
equivalence classes. Since, for $f \in \mathcal{L}^p$, $\mu(|f| = \infty) = 0$, every equivalence class
$[f]$ contains a representative $f^*$ taking only finite values. With obvious
notation, we can define
$\alpha[f] + \beta[g] = [\alpha f^* + \beta g^*]$, $\|[f]\|_p := \|f\|_p$.
The use of equivalence classes avoids both the $\infty - \infty$ problem mentioned in
(7.8) and the associated lack of associativity. The set $L^p$ now becomes a normed
vector space. But more is true:
(10.4) $L^p$ is a Banach space; in particular, $L^p$ is a complete metric space under the
distance
$d([f],[g]) := \|[f - g]\|_p$.
The following characterisation of the dual of $L^p$ is important.
(10.5) LEMMA. For $p > 1$, the dual space $(L^p)^*$ of $L^p$ is the space $L^q$ where
$p^{-1} + q^{-1} = 1$; if $\Lambda$ is a bounded linear functional on $L^p$, then there exists $g \in L^q$
such that
$\Lambda(f) = \mu(fg)$, $\forall f \in L^p$.
What happens when $p = 1$ and $q = \infty$? We say that $f \in \mathcal{L}^\infty$ if the μ-essential
supremum norm of $f$ is finite:
$\|f\|_\infty := \mu\text{-ess sup}(f) := \sup\{x > 0: \mu(|f| > x) > 0\}$
$= \inf\{x \ge 0: \mu(|f| \ge x) = 0\} < \infty$.
Build $L^\infty$ in the obvious way. Then
(10.6) $(L^1)^* = L^\infty$; but, except in trivial cases, $(L^\infty)^*$ will be much bigger than $L^1$.
A key application is to combine these results with the Hahn-Banach Theorem
as follows.
(10.7) LEMMA. Let $(S,\Sigma,\mu)$ be a measure space, let $p \in [1,\infty)$, and let $V$ be a
vector subspace of $L^p$. Define $q \in (1,\infty]$ by $p^{-1} + q^{-1} = 1$. Suppose that, for $g \in L^q$,
$(\mu(fg) = 0, \forall f \in V) \Rightarrow (g = 0)$.
Then $V$ is dense in $L^p$.
Here is a way of putting this into practice.
(10.8) LEMMA. Let $(S,\Sigma,\mu)$ be a finite measure space, and let $\mathcal{I}$ be a π-system
on $S$ such that $\sigma(\mathcal{I}) = \Sigma$. Let $p \in [1,\infty)$. Let $V$ be the vector subspace of $L^p$ spanned
by the indicator functions of elements of $\mathcal{I}$. Then $V$ is dense in $L^p$.
You are invited to prove this in Section 13.
(10.9) Important discussion; $\mathcal{L}^p$ versus $L^p$. If we ask the problem 'Does a certain
real-valued stochastic process $\{X_t: t \ge 0\}$ have continuous paths?', we are asking
about the map $t \mapsto X(t,\omega)$. The question forces us to regard random variables
as true functions, not as equivalence classes. All interesting problems in
continuous time are invisible to the 'elegant' equivalence-class approach of functional
analysis. So, we must use $\mathcal{L}^p$ rather than $L^p$. The $\infty - \infty$ problem will not
worry us, because whenever we need to subtract random variables, they will
have all values finite.
Product structures
11. Product σ-algebras. Product structures are especially important in
probability theory because of their close connection with the concept of independence.
(11.1) Finite-product σ-algebras. This is an area in which we really need the
Monotone-Class Theorems; the standard machine is not good enough.
Let $(S_1,\Sigma_1)$ and $(S_2,\Sigma_2)$ be measurable spaces. Let $S$ denote the Cartesian
product $S := S_1 \times S_2$. For $i = 1,2$, let $\rho_i$ denote the $i$th coordinate map, so that
$\rho_1(s_1,s_2) := s_1$, $\rho_2(s_1,s_2) := s_2$.
The fundamental definition of $\Sigma = \Sigma_1 \times \Sigma_2$ is as the σ-algebra
(11.2) $\Sigma = \sigma(\rho_1,\rho_2)$.
Thus Σ is generated by sets of the form
$\rho_1^{-1}(B_1) = B_1 \times S_2$ $(B_1 \in \Sigma_1)$
together with sets of the form
$\rho_2^{-1}(B_2) = S_1 \times B_2$ $(B_2 \in \Sigma_2)$.
Generally, a product σ-algebra is generated by Cartesian products in which
one factor is allowed to vary over the σ-algebra corresponding to that factor, and
all other factors are whole spaces. In the case of our product of two factors, we
have
(11.3) $(B_1 \times S_2) \cap (S_1 \times B_2) = B_1 \times B_2$,
and you can easily check that
$\mathcal{I} = \{B_1 \times B_2: B_i \in \Sigma_i\}$
is a π-system generating $\Sigma = \Sigma_1 \times \Sigma_2$. A similar remark would apply for a
countable product $\prod \Sigma_n$, but you can see that, since we may only take countable
intersections in analogues of (11.3), products of uncountable families of σ-algebras
cause problems. The fundamental definition analogous to (11.2) still
works.
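(11.4) LEMMA. Let $\mathcal{H}$ denote the class of functions $f: S \to \mathbb{R}$ that are in $b\Sigma$ and
that are such that
for each $s_1$ in $S_1$, the map $s_2 \mapsto f(s_1,s_2)$ is $\Sigma_2$-measurable on $S_2$,
for each $s_2$ in $S_2$, the map $s_1 \mapsto f(s_1,s_2)$ is $\Sigma_1$-measurable on $S_1$.
Then $\mathcal{H} = b\Sigma$.
Proof. It is clear that if $A \in \mathcal{I}$ then $I_A \in \mathcal{H}$. Verification that $\mathcal{H}$ satisfies the
hypotheses of the Monotone-Class Theorem 3.1 is straightforward. Since
$\Sigma = \sigma(\mathcal{I})$, the result follows. □
Extension of the above concepts to general finite products is obvious, as are
canonical identifications:
$(S_1 \times S_2, \Sigma_1 \times \Sigma_2) \times (S_3,\Sigma_3) = (S_1,\Sigma_1) \times (S_2 \times S_3, \Sigma_2 \times \Sigma_3)$
$= (S_1 \times S_2 \times S_3, \Sigma_1 \times \Sigma_2 \times \Sigma_3)$.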
12. Product measure; Fubini's Theorem. We continue with the notation of the
preceding section. We suppose that for $i = 1,2$, $\mu_i$ is a finite measure on $(S_i,\Sigma_i)$.
We know from the preceding section that, for $f \in b\Sigma$, we may define the
integrals
$I_1^f(s_1) := \int_{S_2} f(s_1,s_2)\,\mu_2(ds_2)$, $I_2^f(s_2) := \int_{S_1} f(s_1,s_2)\,\mu_1(ds_1)$.
(12.1) LEMMA. Let $\mathcal{H}$ be the class of elements in $b\Sigma$ such that the following
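property holds:
$I_1^f(\cdot) \in b\Sigma_1$ and $I_2^f(\cdot) \in b\Sigma_2$ and $\int_{S_1} I_1^f(s_1)\,\mu_1(ds_1) = \int_{S_2} I_2^f(s_2)\,\mu_2(ds_2)$.
Then $\mathcal{H} = b\Sigma$.
Proof. If $A \in \mathcal{I}$ then, trivially, $I_A \in \mathcal{H}$. Verification of the conditions of the
Monotone-Class Theorem 3.1 is straightforward. □
For $F \in \Sigma$ with indicator function $f := I_F$, we now define
$\mu(F) := \int_{S_1} I_1^f(s_1)\,\mu_1(ds_1) = \int_{S_2} I_2^f(s_2)\,\mu_2(ds_2)$.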
(12.2) THEOREM (Fubini's Theorem; Product measure). Recall that, for $i = 1,2$,
$\mu_i$ is a finite measure on $(S_i,\Sigma_i)$. The set function μ is a measure on $(S,\Sigma)$
called the product measure of $\mu_1$ and $\mu_2$, and we write $\mu = \mu_1 \times \mu_2$ and
$(S,\Sigma,\mu) = (S_1,\Sigma_1,\mu_1) \times (S_2,\Sigma_2,\mu_2)$.
Moreover, μ is the unique measure on $(S,\Sigma)$ for which
(12.3) $\mu(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2)$, $A_i \in \Sigma_i$.
If $f \in (m\Sigma)^+$, then, with the obvious definitions of $I_1^f$ and $I_2^f$, we have
(12.4) $\mu(f) = \int_{S_1} I_1^f(s_1)\,\mu_1(ds_1) = \int_{S_2} I_2^f(s_2)\,\mu_2(ds_2)$,
in $[0,\infty]$. If $f \in m\Sigma$ and $\mu(|f|) < \infty$, then (12.4) is valid (with all terms in $\mathbb{R}$).
Proof. The fact that μ is a measure is a consequence of linearity and the
Monotone-Convergence Theorem. The fact that μ is then uniquely specified by
(12.3) is obvious from the Uniqueness Lemma 4.7 and the fact that $\sigma(\mathcal{I}) = \Sigma$.
The result (12.4) is automatic for $f = I_A$, where $A \in \mathcal{I}$. The Monotone-Class
Theorem 3.1 shows that it is therefore valid for $f \in b\Sigma$, and in particular for $f$
in the $SF^+$ space for $(S,\Sigma,\mu)$. The Monotone-Convergence Theorem then shows
that it is valid for $f \in (m\Sigma)^+$; and linearity shows that (12.4) is valid if $\mu(|f|) < \infty$.
(12.5) (Extension). All of Fubini's Theorem will work if the $(S_i,\Sigma_i,\mu_i)$ are σ-finite
measure spaces.
We can prove this by breaking up σ-finite spaces into countable unions of
disjoint finite blocks.
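For measures with finitely many atoms, (12.4) says no more than that a double sum may be evaluated in either order. The sketch below checks this in exact arithmetic; the two measures and the function `f` are arbitrary choices made for illustration only.

```python
from fractions import Fraction as Fr

mu1 = {"x": Fr(1, 2), "y": Fr(3, 2)}                  # finite measure on S1
mu2 = {0: Fr(1, 3), 1: Fr(1, 3), 2: Fr(4, 3)}         # finite measure on S2

def f(s1, s2):
    """A non-negative function on the product space."""
    return (s2 + 1) ** 2 if s1 == "x" else 3 - s2

def I1(s1):
    """Integrate out the second coordinate."""
    return sum(f(s1, s2) * m2 for s2, m2 in mu2.items())

def I2(s2):
    """Integrate out the first coordinate."""
    return sum(f(s1, s2) * m1 for s1, m1 in mu1.items())

iterated_12 = sum(I1(s1) * m1 for s1, m1 in mu1.items())
iterated_21 = sum(I2(s2) * m2 for s2, m2 in mu2.items())
product_integral = sum(f(s1, s2) * m1 * m2
                       for s1, m1 in mu1.items() for s2, m2 in mu2.items())
```

Here `product_integral` is the integral of `f` against the measure determined by (12.3) on rectangles.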
(12.6) Lebesgue measure on $\mathbb{R}^n$. We have $\mathcal{B}(\mathbb{R}^n) = \mathcal{B}^n := \mathcal{B}(\mathbb{R})^n$. For further study
of such matters, see the exercises in the next section. We define the Lebesgue
measure on $\mathbb{R}^n$ as $\mathrm{Leb}^n$, but often denote this by Leb when $n$ is understood.
13. Exercises. All of these exercises play a part later. Hints are given at the
end of this section. The number of an exercise indicates when (that is, at the
end of which section in the main text) you can attempt the question.
(E13.2a) R-functions on R. By an R-function $F$ on $\mathbb{R}$, we mean a right-continuous
function on $\mathbb{R}$ such that the left limit $F(x-)$ exists for every $x \in \mathbb{R}$.
Prove that an R-function is Borel-measurable.
(E13.2b) Prove that if $S$ is a metric space, and if $C = C_b(S)$, the space of bounded
continuous functions on $S$, then $\sigma(C) = \mathcal{B}(S)$.
(E13.3a) Prove the Monotone-Class Theorem 3.2.
(E13.3b) Let $X: S \to \mathbb{R}$. Prove that $\sigma(X) = \{X^{-1}(B): B \in \mathcal{B}\}$. Prove that if
$Y: S \to \mathbb{R}$, then $Y$ is $\sigma(X)$-measurable if and only if $Y = f(X)$ for some Borel-measurable
$f$ on $\mathbb{R}$.
(E13.5a) From Lebesgue to Lebesgue-Stieltjes measure. Let $F$ be a right-continuous
non-decreasing function on $\mathbb{R}$. Let $a := \inf_x F(x)$ and $b := \sup_x F(x)$.
For $y \in I := [a,b) \cap \mathbb{R}$, define
$\phi(y) := \inf\{x: F(x) \ge y\}$.
Prove that $\phi$ is a left-continuous (therefore Borel) function on $I$. Assume that
Lebesgue measure μ exists on $(I, \mathcal{B}(I))$, and define
$\mu_F(B) := (\mu \circ \phi^{-1})(B) := \mu\{y: \phi(y) \in B\}$ $(B \in \mathcal{B})$.
Prove that $\mu_F$ is a measure on $(\mathbb{R},\mathcal{B})$, and that it is the unique such measure with
$\mu_F(u,v] = F(v) - F(u)$ $(-\infty < u < v < \infty)$.
(E13.5b) Functions of finite variation. For a sub-interval $(a,b]$ of $\mathbb{R}$, we define
the total variation $V_F(a,b]$ of $F$ over $(a,b]$ to be
$V_F(a,b] := \sup\{\sum_i |F(t_i) - F(t_{i-1})|: n \in \mathbb{N}, a < t_1 < t_2 < \cdots < t_n \le b\}$.
A function $F$ is said to be of finite variation (or an FV function) if it is an R-function
such that $V_F(a,b] < \infty$ for every finite subinterval $(a,b]$ of $\mathbb{R}$. Let $F$
be an FV function. Prove that, for $a \in \mathbb{R}$,
(i) $b \mapsto V_F(a,b]$ is an R-function on $[a,\infty)$,
(ii) the functions $b \mapsto V_F(a,b] + F(b)$ and $b \mapsto V_F(a,b] - F(b)$ are non-decreasing
on $[a,\infty)$.
It is now trivial that $F$ is an FV function if and only if it is the difference of
two non-decreasing R-functions. This makes clear how an FV function $F$ induces
a signed measure $\mu_F$, the difference of two measures.
(E13.5c) Continuous FV functions. Let $F$ be a continuous FV function. Prove
that $V_F(a,b]$ is continuous in $b$ for $b \ge a$. Define the quadratic variation $Q_F(a,b]$
over the interval $(a,b]$ by
$Q_F(a,b] := \sup\{\sum_i |F(t_i) - F(t_{i-1})|^2: n \in \mathbb{N}, a < t_1 < t_2 < \cdots < t_n \le b\}$.
Prove that $Q_F(a,b] = 0$ for all intervals $(a,b]$. (Whence, Brownian paths are not
of finite variation.)
(E13.6a) Prove Lemma 6.1.
(E13.6b) Bad subsets of [0,1]. Many of our counterexamples start off: 'Take a
subset of [0,1] of outer (Lebesgue) measure 1 and inner measure 0.' Here is a
way of constructing such a set. At the moment, this exercise is rather hard. We
shall see later (Exercise 60.50) that martingales make it easy.
Let $\rho$ be an irrational number. Define an equivalence relation on [0,1] by
saying that $x_1 \sim x_2$ if and only if $x_1 - x_2 = m + n\rho$ for some $m, n \in \mathbb{Z}$, in other
words, if $x_1 - x_2 = n\rho \bmod 1$ for some $n \in \mathbb{Z}$. Use the Axiom of Choice to create
a set $A$ with one element from each equivalence class. Define
$B := \{(2n\rho + a) \bmod 1: n \in \mathbb{Z}, a \in A\}$,
$C := \{(\rho + \beta) \bmod 1: \beta \in B\}$.
Note that $B \cap C = \emptyset$, $B \cup C = [0,1]$, $B = \{(\rho + \gamma) \bmod 1: \gamma \in C\}$, so that each of
$B$ and $C$ is 'half' of [0,1]. Prove that $B$ has outer Lebesgue measure 1 and
inner Lebesgue measure 0.
(E13.8) Prove Scheffé's Lemma 8.6.
(E13.9a) Prove Lemma 9.4.
(E13.9b) Prove Lemma 9.5.
(E13.9c) Integrals under change of measure. Let $(S,\Sigma,\mu)$ be a measure space, let
$f \in (m\Sigma)^+$, and let $\lambda = (f\mu)$, the measure in (9.1). Show that, for $g \in m\Sigma$, we have
$g \in \mathcal{L}^1(S,\Sigma,\lambda)$ if and only if $fg \in \mathcal{L}^1(S,\Sigma,\mu)$, and then $\lambda(g) = \mu(fg)$.
(E13.9d) Absolutely continuous functions on R. A function $F$ on $\mathbb{R}$ is called
absolutely continuous if there exists a function $f$ in $\mathcal{L}^1_{\mathrm{loc}}(\mathbb{R},\mathcal{B},\mathrm{Leb})$, in the sense
that $f I_{[a,b]} \in \mathcal{L}^1(\mathbb{R},\mathcal{B},\mathrm{Leb})$ for all finite subintervals $[a,b]$ of $\mathbb{R}$, such that
$F(b) - F(a) = \int_a^b f(x)\,dx$ $(-\infty < a \le b < \infty)$.
We then call $f$ a derivative of $F$ (obviously, in an extension of the Newton-Leibniz
sense) and write $F' = f$, a.e. Prove that an absolutely continuous function
$F$ is continuous. Prove that if $F$ is an absolutely continuous non-decreasing
function, then the measure $\mu_F$ of Exercise 5a is absolutely continuous with respect
to Lebesgue measure with $d\mu_F/d\mathrm{Leb} = F'$, a.e.
(E13.9e) Show that if $F$ is an absolutely continuous function then it is a
continuous FV function with
$V_F(a,b] = \int_a^b |F'(x)|\,dx$.
The last part is tricky now, but easy using martingale theory.
(E13.10a) Prove Lemma 10.8.
Topology and measure
The ideas behind the following exercises are important throughout the book.
There is always a possibility of 'conflict' between topology, which allows
certain uncountable operations (the union of an uncountable number of open
sets is still open), and measure theory, which allows only countably many
operations. For measures on separable metric spaces, things are as one might
hope. Recall that a metric space (S, p) is called separable if it has a countable
dense subset.
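(E13.11a) Let $(S,\rho)$ be a separable metric space. Prove that $\rho: S \times S \to \mathbb{R}$ is
$\mathcal{B}(S) \times \mathcal{B}(S)$ measurable. Show that any subset $A$ of $S$ has a countable dense
subset.
(E13.11b) Let $(S_1,\Sigma_1)$ and $(S_2,\Sigma_2)$ be measurable spaces. Define
$(S,\Sigma) := (S_1,\Sigma_1) \times (S_2,\Sigma_2)$.
Prove that the map
$(x,y) = ((x_1,x_2),(y_1,y_2)) \mapsto (x_1,y_1)$
of $S \times S$ into $S_1 \times S_1$ is $(\Sigma \times \Sigma)/(\Sigma_1 \times \Sigma_1)$-measurable.
(E13.11c) Let $(S_1,\rho_1)$ and $(S_2,\rho_2)$ be separable metric spaces. Then the product
topology on $S := S_1 \times S_2$ arises from the metric
$\rho(x,y) = \rho((x_1,x_2),(y_1,y_2)) := \rho_1(x_1,y_1) + \rho_2(x_2,y_2)$.
Define $\Sigma_1 := \mathcal{B}(S_1)$, $\Sigma_2 := \mathcal{B}(S_2)$ and $\Sigma := \Sigma_1 \times \Sigma_2$. Prove that $\rho$ is $(\Sigma \times \Sigma)$-measurable.
Deduce that
$\mathcal{B}(S_1 \times S_2) = \mathcal{B}(S_1) \times \mathcal{B}(S_2)$.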
(Generalisation). Suppose that, for $n \in \mathbb{N}$, $(S_n,\rho_n)$ is a separable metric space. The
product topology on $S := \prod_{n \in \mathbb{N}} S_n$ arises from the metric $\rho$, where
P{x,y):= Σ 2- ><*»·*>
Show that $S$ is separable. Check (at least in principle!) that $\mathcal{B}(S) = \prod \mathcal{B}(S_n)$.
(E13.11d) Let $S$ be a non-separable metric space with cardinality greater than
that of the continuum, and with the discrete metric
$\rho(x,y) := 1$ if $x \ne y$, $\rho(x,y) := 0$ if $x = y$.
Let $\Delta := \{(x,y) \in S \times S: x = y\}$. Then $\Delta$ is closed and therefore $\Delta \in \mathcal{B}(S \times S)$.
Convince yourself that it is plausible (it is true!) that $\Delta \notin \mathcal{B}(S) \times \mathcal{B}(S)$ and that
$\rho$ is (therefore) not $(\mathcal{B}(S) \times \mathcal{B}(S))$-measurable.
Hints for selected exercises
(H13.2a) $F$ is the limit of step functions $F(2^{-n}([2^n t] + 1))$, $[x]$ denoting
$\sup\{n \in \mathbb{Z}: n \le x\}$.
(H13.2b) If $F$ is a closed set and $\rho$ is the distance function, then
$\max(0, 1 - n\rho(x,F)) \downarrow I_F$.
(H13.3a) Consider the π-system of sets of the form
(*) $\bigcap_{i=1}^n \{s: c_i(s) \in (a_i,b_i)\}$,
where $n \in \mathbb{N}$ and, for $1 \le i \le n$, $c_i \in \mathcal{C}$ and $-\infty < a_i < b_i < \infty$. If we can prove that
the indicator function of every set of the form (*) is in $\mathcal{H}$ then the desired result
will follow from Theorem 3.1.
For any open subinterval $(a,b)$ of $\mathbb{R}$, we can find continuous functions $g_m$ on $\mathbb{R}$
with $g_m \uparrow I_{(a,b)}$. Let $c \in \mathcal{C}$. By the Weierstrass theorem, we can find polynomials
$p_{m,k}$ such that $p_{m,k} \to g_m$ uniformly on $[-\|c\|, \|c\|]$. Then $p_{m,k} \circ c \to g_m \circ c$ uniformly
on $S$, whence $g_m \circ c \in \mathcal{H}$; and now it follows that $I_{(a,b)} \circ c \in \mathcal{H}$. The rest is easy.
(H13.3b) Since $\{X^{-1}(B): B \in \mathcal{B}\}$ is a σ-algebra, the first part is easy. Let $\mathcal{H}$
consist of functions of the form $f \circ X$, where $f$ is bounded Borel-measurable.
Then $\mathcal{H}$ satisfies conditions (i)-(iii) of Theorem 3.1.
(H13.5a) We have $u < \phi(y) \le v$ if and only if $F(u) < y \le F(v)$.
(H13.5b) $V_F(a,b]$ (note the 'open at a') is decreasing as $b \downarrow\downarrow a$. Suppose that, for
some a,
$\lim_{b \downarrow\downarrow a} V_F(a,b] = \varepsilon_0 > 0$.
Find a partition $a < t_0 < t_1 < \cdots < t_n = b$ on which
i=l
Now choose a partition of (a, i0] on which the analogous sum is at least |ε0
to arrive at a contradiction.
(HI 3.5c) We have
ZTO-*to-i)l2<sup[|^
(H13.6b) Wait until the exercises on discrete-parameter martingales.
(H13.8) The hint was given after Lemma 8.6.
(H13.9a) Peep ahead to the proof of Lemma 20.1 to get the idea.
(H13.9c) If $g = I_F$ $(F \in \Sigma)$, then $\lambda(g) = \mu(fg)$ by definition. Now use the standard
machine.
(H13.9e) It is obvious that $V_F \le \int \ldots$, but why do we have equality? Again wait
for martingales to come to the rescue.
(H13.11a) Let $(s_n)$ be a sequence dense in $S$. Then
$\rho(x,y) = \inf_n [\rho(x,s_n) + \rho(s_n,y)]$.
For each $n$, $x \mapsto \rho(x,s_n)$ is continuous, and $(x,y) \mapsto \rho(x,s_n)$ is $\mathcal{B}(S) \times \mathcal{B}(S)$-measurable.
(H13.11c) See (E13.11b).
(H13.11d) See Billingsley [2].
2. BASIC PROBABILITY THEORY
Probability and expectation
14. Probability triple; almost surely (a.s.), a.s.($\mathbf{P}$), a.s.($\mathbf{P},\mathcal{F}$). By a probability
triple, we mean a measure space $(\Omega,\mathcal{F},\mathbf{P})$ of total mass $\mathbf{P}(\Omega) = 1$.
From now on, $(\Omega,\mathcal{F},\mathbf{P})$ will always denote a probability triple. Unless
otherwise stated,
$\mathcal{L}^p := \mathcal{L}^p(\Omega,\mathcal{F},\mathbf{P})$, $L^p := L^p(\Omega,\mathcal{F},\mathbf{P})$,
and $\|\cdot\|_p$ will refer to these spaces.
An element $E$ of $\mathcal{F}$ will be called an event, and $\mathbf{P}(E)$ will be called the
probability of the event $E$. A statement $S$ about outcomes ω in Ω is said to be true
almost surely (a.s.) if
$F := \{\omega: S(\omega) \text{ is true}\} \in \mathcal{F}$ and $\mathbf{P}(F) = 1$.
If we wish to emphasise which probability measure we are talking about, we
write 'a.s.($\mathbf{P}$)', and if we further wish to emphasise that the truth set of $S$ is in
$\mathcal{F}$, we write 'a.s.($\mathbf{P},\mathcal{F}$)'. It is easily shown that if $F_n \in \mathcal{F}$ $(n \in \mathbb{N})$ and $\mathbf{P}(F_n) = 1$, $\forall n$,
then $\mathbf{P}(\bigcap_n F_n) = 1$.
The intuitive meaning. We assume that the intuitive meaning is familiar to you
from more elementary books. Chance is regarded as having chosen a particular
point ω (the actual realisation) of Ω 'according to the law $\mathbf{P}$' before the
experiment modelled by $(\Omega,\mathcal{F},\mathbf{P})$ is performed. For an event $F$, $F$ occurs in reality if
and only if the chosen ω is in F.
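15. lim sup $E_n$; First Borel-Cantelli Lemma. Suppose now that $(E_n: n \in \mathbb{N})$ is a
sequence of events. We define
$\limsup E_n := \bigcap_m \bigcup_{n \ge m} E_n$
$= \{\omega: \text{for every } m, \exists n(\omega) \ge m \text{ such that } \omega \in E_{n(\omega)}\}$
$= \{\omega: \omega \in E_n \text{ for infinitely many } n\}$.
Two important results relate to lim sup $E_n$.
(15.1) LEMMA (Reverse Fatou Lemma for sets):
$\mathbf{P}(\limsup E_n) \ge \limsup \mathbf{P}(E_n)$.
Proof. Let $G_m := \bigcup_{n \ge m} E_n$. Then $G_m \downarrow G$, where $G := \limsup E_n$. By result (4.3),
$\mathbf{P}(G_m) \downarrow \mathbf{P}(G)$. But, clearly,
$\mathbf{P}(G_m) \ge \sup_{n \ge m} \mathbf{P}(E_n)$.
Hence
$\mathbf{P}(G) \ge \downarrow\lim_m \{\sup_{n \ge m} \mathbf{P}(E_n)\} =: \limsup \mathbf{P}(E_n)$. □
(15.2) LEMMA (First Borel-Cantelli Lemma). Let $(E_n: n \in \mathbb{N})$ be a sequence of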
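events such that $\sum_n \mathbf{P}(E_n) < \infty$. Then
$\mathbf{P}(\limsup E_n) = \mathbf{P}(E_n, \text{ i.o.}) = 0$.
Proof. We have, for each $m$,
$\mathbf{P}(G) \le \mathbf{P}(G_m) \le \sum_{n \ge m} \mathbf{P}(E_n)$.
(Convince yourself of the rigour.) Now let $m \uparrow \infty$. □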
16. Law of random variable; distribution function; joint law
(16.1) DEFINITION (Random variable; law). Let $(E,\mathcal{E})$ be a measurable space.
By an $(E,\mathcal{E})$-valued random variable $X$ (or $E$-valued random variable $X$, when
$\mathcal{E}$ is understood) carried by our probability triple $(\Omega,\mathcal{F},\mathbf{P})$, we mean an
$(\mathcal{F}/\mathcal{E})$-measurable map $X$ from Ω to $E$, so that $X^{-1}: \mathcal{E} \to \mathcal{F}$.
By the law $\Lambda_X$ of $X$, we mean the probability measure $\Lambda_X := \mathbf{P} \circ X^{-1}$ on $(E,\mathcal{E})$,
so that
$\Lambda_X(A) = \mathbf{P}(X \in A) := \mathbf{P}\{\omega: X(\omega) \in A\}$ $(A \in \mathcal{E})$.
Suppose that for $i = 1,2$, $(E_i,\mathcal{E}_i)$ is a measurable space and that $X_i$ is an $(E_i,\mathcal{E}_i)$-valued
random variable. Let $E := E_1 \times E_2$, $\mathcal{E} := \mathcal{E}_1 \times \mathcal{E}_2$ and
$X(\omega) := (X_1(\omega), X_2(\omega)) \in E$.
Check that $X$ is an $(E,\mathcal{E})$-valued random variable. The joint law $\Lambda_{X_1,X_2}$ of $X_1$
and $X_2$ is then defined to be the law of $X$. There are obvious extensions, some
of which we study in great detail later.
If our variable $X$ is $\mathbb{R}$-valued (that is, $(\mathbb{R},\mathcal{B})$-valued) then it follows from the
Uniqueness Lemma 4.7 and the fact that $\mathcal{B} = \sigma(\pi(\mathbb{R}))$, where $\pi(\mathbb{R})$ is at (1.4),
that the law of $X$ is determined by the distribution function $F_X$ of $X$:
$F_X(x) := \mathbf{P}(X \le x)$ $(x \in \mathbb{R})$.
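When Ω is finite, the law of $X$ is obtained by collecting the mass of every ω with the same value of $X(\omega)$. The following sketch makes that concrete for two fair dice; the representation of the triple as a dictionary and the function names `law` and `distribution_function` are our own illustrative choices.

```python
from fractions import Fraction as Fr
from collections import defaultdict

def law(P, X):
    """The image measure of P under X, as a dict value -> probability."""
    out = defaultdict(Fr)
    for omega, p in P.items():
        out[X(omega)] += p
    return dict(out)

def distribution_function(P, X):
    """The map x -> P(X <= x)."""
    lam = law(P, X)
    return lambda x: sum(p for v, p in lam.items() if v <= x)

# Two fair dice; X is the larger of the two scores.
P = {(i, j): Fr(1, 36) for i in range(1, 7) for j in range(1, 7)}
X = lambda omega: max(omega)
law_X = law(P, X)
F_X = distribution_function(P, X)
```

The 36-point space has been replaced by a 6-point law, and `F_X` is recovered from that law alone.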
17. Expectation; E(X,F). We introduce some notation used throughout the
book.
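(17.1) DEFINITION (Expectation; $\mathbf{E}(X)$). For a random variable
$X \in \mathcal{L}^1 = \mathcal{L}^1(\Omega,\mathcal{F},\mathbf{P})$,
we define the expectation $\mathbf{E}(X)$ of $X$ by
$\mathbf{E}(X) := \int_\Omega X\,d\mathbf{P} = \int_\Omega X(\omega)\,\mathbf{P}(d\omega)$.
We also define $\mathbf{E}(X)$ $(\le \infty)$ for $X \in (m\mathcal{F})^+$. In earlier notation, $\mathbf{E}(X) = \mathbf{P}(X)$.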
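(17.2) DEFINITION (Notation $\mathbf{E}(X;F)$). For $X \in \mathcal{L}^1$ (or $(m\mathcal{F})^+$) and $F \in \mathcal{F}$,
we define
$\mathbf{E}(X;F) := \int_F X(\omega)\,\mathbf{P}(d\omega) := \mathbf{E}(X I_F)$,
where, as ever,
$I_F(\omega) := 1$ if $\omega \in F$, $I_F(\omega) := 0$ if $\omega \notin F$.
(17.3) LEMMA. If $h \in (m\mathcal{E})^+$, then
(17.4) $\mathbf{E}h(X) = \int_E h(x)\,\Lambda_X(dx) \le \infty$.
For $h \in m\mathcal{E}$, $h(X) \in \mathcal{L}^1(\Omega,\mathcal{F},\mathbf{P})$ if and only if $h \in \mathcal{L}^1(E,\mathcal{E},\Lambda_X)$, and then
$\mathbf{E}h(X) = \int_E h(x)\,\Lambda_X(dx)$.
Proof. Use the standard machine at (8.7). If $h = I_A$ for some $A \in \mathcal{E}$ then (17.4)
is true by definition of $\Lambda_X$; etc. □
An important case of this lemma is when $(E,\mathcal{E}) = (\mathbb{R},\mathcal{B})$ and $h$ is the identity
function on $\mathbb{R}$.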
18. Inequalities: Markov, Jensen, Schwarz, Tchebychev. These inequalities will
be used repeatedly in estimates.
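(18.1) LEMMA (Markov's inequality). Suppose that $Z \in m\mathcal{F}$ and that $g: \mathbb{R} \to
[0,\infty]$ is $\mathcal{B}$-measurable and non-decreasing. (We know that $g(Z) = g \circ Z \in (m\mathcal{F})^+$.)
Then
$\mathbf{E}g(Z) \ge \mathbf{E}(g(Z); Z \ge c) \ge g(c)\mathbf{P}(Z \ge c)$.
Examples
for $Z \in (m\mathcal{F})^+$, $c\mathbf{P}(Z \ge c) \le \mathbf{E}(Z)$ $(c > 0)$,
for $X \in \mathcal{L}^1$, $c\mathbf{P}(|X| \ge c) \le \mathbf{E}(|X|)$ $(c > 0)$.
Considerable strength can often be obtained by choosing the optimum θ for $c$ in
$\mathbf{P}(Y > c) \le e^{-\theta c}\mathbf{E}(e^{\theta Y})$ $(\theta > 0, c \in \mathbb{R})$.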
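To see how much is gained by optimising θ, one can compare the plain Markov bound with the exponential bound in a case where the tail is known exactly. The sketch below takes $Y$ to be the number of heads in 100 fair coin tosses, an example of our own choosing, and searches a grid of θ values; the grid and the helper names are assumptions made for illustration.

```python
import math

def binomial_pmf(n, k):
    """P(Y = k) for Y the number of heads in n fair coin tosses."""
    return math.comb(n, k) / 2**n

def tail(n, c):
    """Exact P(Y > c)."""
    return sum(binomial_pmf(n, k) for k in range(n + 1) if k > c)

def exponential_bound(n, c, theta):
    """e^{-theta c} E(e^{theta Y}), using E(e^{theta Y}) = ((1 + e^theta)/2)^n."""
    return math.exp(-theta * c) * ((1 + math.exp(theta)) / 2) ** n

n, c = 100, 70
exact = tail(n, c)
markov = (n / 2) / c                       # from c P(Y >= c) <= E(Y)
thetas = [k / 100 for k in range(1, 300)]
best_theta = min(thetas, key=lambda t: exponential_bound(n, c, t))
chernoff = exponential_bound(n, c, best_theta)
```

Setting the derivative in θ to zero gives $e^\theta = c/(n-c)$, so the grid search should settle near $\log(7/3)$, and the optimised bound is smaller than the Markov bound by several orders of magnitude.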
(18.2) Jensen's inequality for convex functions. A function $c: G \to \mathbb{R}$, where $G$ is
an open subinterval of $\mathbb{R}$, is called convex on $G$ if its graph lies below any of
its chords: for $x, y \in G$ and $0 \le p = 1 - q \le 1$,
$c(px + qy) \le pc(x) + qc(y)$.
Then $c$ is automatically continuous on $G$. If $c$ is twice-differentiable on $G$ then
$c$ is convex if and only if $c'' \ge 0$. Important examples of convex functions are
$|x|$, $x^2$ and $e^{\theta x}$ $(\theta \in \mathbb{R})$.
(18.3) THEOREM (Jensen's inequality). Suppose that $c: G \to \mathbb{R}$ is a convex
function on an open subinterval $G$ of $\mathbb{R}$ and that $X$ is a random variable such that
$\mathbf{E}(|X|) < \infty$, $\mathbf{P}(X \in G) = 1$, $\mathbf{E}|c(X)| < \infty$.
Then
$\mathbf{E}c(X) \ge c(\mathbf{E}(X))$.
See [W; Section 6.6] for a full proof. The point is that, since there is a supporting
hyperplane for $c$ at $(\mu, c(\mu))$, where $\mu = \mathbf{E}(X)$, there exists an $m$ in $\mathbb{R}$ such that
$c(X) \ge m(X - \mu) + c(\mu)$;
and Jensen's inequality follows on taking expectations.
(18.4) LEMMA (Monotonicity of norms). If $1 \le p \le r$ and $Y \in m\mathcal{F}$, then
$\|Y\|_p \le \|Y\|_r$.
Proof. Apply Jensen's inequality with $X = |Y|^p$ and $c(x) = x^{r/p}$ $(x > 0)$. □
(18.5) Familiar facts. We recall three results that are consequences of Jensen's
inequality (see Section 24): for $p > 1$ and $p^{-1} + q^{-1} = 1$,
Hölder: $|\mathbf{E}(XY)| \le \mathbf{E}|XY| \le \|X\|_p \|Y\|_q$,
Schwarz: $|\mathbf{E}(XY)| \le \mathbf{E}|XY| \le \|X\|_2 \|Y\|_2$,
Minkowski: $\|X + Y\|_p \le \|X\|_p + \|Y\|_p$.
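(18.6) Variance; covariance; Tchebychev's inequality. If $X, Y \in \mathcal{L}^2$ then, by the
monotonicity of norms, $X, Y \in \mathcal{L}^1$, so that we may define
$\mu_X := \mathbf{E}(X)$, $\mu_Y := \mathbf{E}(Y)$.
Since the constant functions with values $\mu_X$ and $\mu_Y$ are in $\mathcal{L}^2$, we see that
$\tilde{X} := X - \mu_X$, $\tilde{Y} := Y - \mu_Y$
are in $\mathcal{L}^2$. By the Schwarz inequality, $\tilde{X}\tilde{Y} \in \mathcal{L}^1$, and so we may define
$\mathrm{Cov}(X,Y) := \mathbf{E}(\tilde{X}\tilde{Y}) = \mathbf{E}[(X - \mu_X)(Y - \mu_Y)]$.
The Schwarz inequality further justifies expanding out the product in the final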
[ ] bracket to yield the alternative formula
$\mathrm{Cov}(X,Y) = \mathbf{E}(XY) - \mu_X\mu_Y$.
As you know, the variance of $X$ is defined by
$\mathrm{Var}(X) := \mathbf{E}[(X - \mu_X)^2] = \mathbf{E}(X^2) - \mu_X^2 = \mathrm{Cov}(X,X)$.
You also know Tchebychev's inequality:
(18.7) $c^2\mathbf{P}(|X - \mu_X| > c) \le \mathrm{Var}(X)$ $(c > 0)$.
19. Modes of convergence of random variables. Let $(X_n: n \in \mathbb{N})$ be a sequence of
random variables and let $X$ be a random variable, all carried by our triple
$(\Omega,\mathcal{F},\mathbf{P})$ and all $\mathbb{R}$-valued.
Recall that we say that $X_n \to X$ almost surely if
(19.1) $\mathbf{P}(X_n \to X) = 1$.
We say that $X_n \to X$ in probability if, for every $\varepsilon > 0$,
(19.2) $\mathbf{P}(|X_n - X| > \varepsilon) \to 0$ as $n \to \infty$.
We say that $X_n \to X$ in $\mathcal{L}^p$ if each $X_n$ is in $\mathcal{L}^p$ and $X \in \mathcal{L}^p$ and
$\|X_n - X\|_p \to 0$ as $n \to \infty$,
or, equivalently,
(19.3) $\mathbf{E}(|X_n - X|^p) \to 0$ as $n \to \infty$.
Some relationships between these modes of convergence will now be stated.
Regard the proofs as exercises. See [W; EA13.1]. Convergence in probability
is the weakest of the above forms of convergence. Thus
(19.4) $(X_n \to X, \text{ a.s.}) \Rightarrow (X_n \to X \text{ in prob})$
(19.5) $(X_n \to X \text{ in } \mathcal{L}^p) \Rightarrow (X_n \to X \text{ in prob})$.
No other implication between any two of our three forms of convergence is
valid. But, of course, for $r \ge p \ge 1$, monotonicity of norms shows that
(19.6) $(X_n \to X \text{ in } \mathcal{L}^r) \Rightarrow (X_n \to X \text{ in } \mathcal{L}^p)$.
'Fast convergence in probability' does imply almost sure convergence:
(19.7) $(\sum_n \mathbf{P}(|X_n - X| > \varepsilon) < \infty, \forall \varepsilon > 0) \Rightarrow (X_n \to X, \text{ a.s.})$.
Property (19.7) is used in proving the following result:
(19.8) $X_n \to X$ in probability if and only if every subsequence of $(X_n)$ contains a
further subsequence along which we have almost sure convergence to $X$.
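A standard counterexample, not given in the text, shows why convergence in probability does not force almost sure convergence: indicators of dyadic intervals that sweep repeatedly across $[0,1)$ under Lebesgue measure. The sketch below sets up this 'typewriter' sequence with our own indexing $n = 2^j + k$ and counts how often a fixed ω is hit.

```python
from fractions import Fraction as Fr

def typewriter(n):
    """Return (j, k) with n = 2**j + k and 0 <= k < 2**j, for n >= 1."""
    j = n.bit_length() - 1
    return j, n - 2**j

def X(n, omega):
    """Indicator of the dyadic interval [k 2^-j, (k+1) 2^-j) at the point omega."""
    j, k = typewriter(n)
    return 1 if Fr(k, 2**j) <= omega < Fr(k + 1, 2**j) else 0

def prob_nonzero(n):
    """Leb(X_n != 0) = 2^-j, the length of the interval."""
    j, _ = typewriter(n)
    return Fr(1, 2**j)

omega = Fr(1, 3)
hits = [n for n in range(1, 2**10) if X(n, omega) == 1]
```

The probabilities `prob_nonzero(n)` tend to 0, so $X_n \to 0$ in probability, yet every ω in $[0,1)$ is hit once at each dyadic level and so $X_n(\omega) = 1$ infinitely often. Along the subsequence $n = 2^j$ the convergence to 0 is almost sure, in line with (19.8).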
Uniform integrability and $\mathcal{L}^1$ convergence
20. Uniform integrability. We begin with a lemma.
(20.1) LEMMA. Suppose that $X \in \mathcal{L}^1 = \mathcal{L}^1(\Omega,\mathcal{F},\mathbf{P})$. Then, given $\varepsilon > 0$, there
exists a $\delta > 0$ such that for $F \in \mathcal{F}$, $\mathbf{P}(F) < \delta$ implies that $\mathbf{E}(|X|;F) < \varepsilon$.
Proof. If the conclusion is false, then, for some $\varepsilon_0 > 0$, we can find a sequence
$(F_n)$ of elements of $\mathcal{F}$ such that
$\mathbf{P}(F_n) < 2^{-n}$ and $\mathbf{E}(|X|;F_n) \ge \varepsilon_0$.
Let $H := \limsup F_n$. Then the First Borel-Cantelli Lemma shows that $\mathbf{P}(H) = 0$,
but the 'Reverse Fatou' Lemma 8.4 shows that
$\mathbf{E}(|X|;H) \ge \varepsilon_0$;
and we have arrived at the required contradiction. □
(20.2) COROLLARY. Suppose that $X \in \mathcal{L}^1$ and that $\varepsilon > 0$. Then there exists $K$
in $[0,\infty)$ such that
$\mathbf{E}(|X|;|X| > K) < \varepsilon$.
Proof. Let δ be as in Lemma (20.1). Since $K\mathbf{P}(|X| > K) \le \mathbf{E}(|X|)$, we can choose
$K$ such that $\mathbf{P}(|X| > K) < \delta$. □
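(20.3) DEFINITION (UI family). A class $\mathcal{C}$ of $\mathbb{R}$-valued random variables is
called uniformly integrable (UI) if given $\varepsilon > 0$, there exists $K$ in $[0,\infty)$ such that
$\mathbf{E}(|X|;|X| > K) < \varepsilon$, $\forall X \in \mathcal{C}$.
We note that for such a class $\mathcal{C}$, we have (with $K_1$ relating to $\varepsilon = 1$), for every
$X \in \mathcal{C}$,
$\mathbf{E}(|X|) = \mathbf{E}(|X|;|X| > K_1) + \mathbf{E}(|X|;|X| \le K_1) \le 1 + K_1$.
Thus a UI family is bounded in $\mathcal{L}^1$.
It is not true that a family bounded in $\mathcal{L}^1$ is UI.
(20.4) Example. Take $(\Omega,\mathcal{F},\mathbf{P}) = ([0,1],\mathcal{B}[0,1],\mathrm{Leb})$. Let
$E_n = (0,n^{-1})$, $X_n = nI_{E_n}$.
Then $\mathbf{E}(|X_n|) = 1$, $\forall n$, so that $(X_n)$ is bounded in $\mathcal{L}^1$. However, for any $K > 0$,
we have, for $n > K$,
$\mathbf{E}(|X_n|;|X_n| > K) = n\mathbf{P}(E_n) = 1$,
so that $(X_n)$ is not UI. Here $X_n \to 0$ a.s., but $\mathbf{E}(X_n) \not\to 0$. □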
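The failure of uniform integrability in Example 20.4 can be tabulated directly, since each $X_n$ takes only the values $n$ and 0. The helper names below are ours, and the computation is a sketch of the example rather than anything further.

```python
from fractions import Fraction as Fr

def tail_expectation(n, K):
    """E(|X_n|; |X_n| > K) for X_n = n I_{(0, 1/n)} under Lebesgue measure on [0,1]."""
    # X_n equals n on a set of measure 1/n and 0 elsewhere.
    return n * Fr(1, n) if n > K else Fr(0)

def worst_tail(K, n_max):
    """The largest value of E(|X_n|; |X_n| > K) over n <= n_max."""
    return max(tail_expectation(n, K) for n in range(1, n_max + 1))

l1_norms = [n * Fr(1, n) for n in range(1, 50)]    # E|X_n| = 1 for every n
```

However large `K` is taken, `worst_tail(K, n_max)` returns to 1 as soon as `n_max` exceeds `K`, so no single `K` works for the whole family.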
We now give two simple sufficient conditions for the UI property.
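(20.5) LEMMA. Suppose that $\mathcal{C}$ is a class of random variables that is bounded
in $\mathcal{L}^p$ for some $p > 1$; thus, for some $A \in [0,\infty)$,
$\mathbf{E}(|X|^p) < A$, $\forall X \in \mathcal{C}$.
Then $\mathcal{C}$ is UI.
Proof. If $v \ge K > 0$ then $v \le K^{1-p}v^p$ (obviously!). Hence, for $K > 0$ and $X \in \mathcal{C}$,
we have
$\mathbf{E}(|X|;|X| > K) \le K^{1-p}\mathbf{E}(|X|^p;|X| > K) \le K^{1-p}A$.
The result follows. □
(20.6) LEMMA. Suppose that $\mathcal{C}$ is a class of random variables that is dominated
by an integrable non-negative variable $Y$: $|X(\omega)| \le Y(\omega)$, $\forall X \in \mathcal{C}$ and $\mathbf{E}(Y) < \infty$.
Then $\mathcal{C}$ is UI.
Proof. It is obvious that, for $K > 0$ and $X \in \mathcal{C}$, $\mathbf{E}(|X|;|X| > K) \le \mathbf{E}(Y;Y > K)$,
and now it is only necessary to apply (20.2) to $Y$. □
(20.7) LEMMA. A class $\mathcal{C}$ of random variables is UI if and only if the following
two conditions hold:
(i) $\mathcal{C}$ is bounded in $\mathcal{L}^1$;
(ii) given $\varepsilon > 0$, there exists $\delta > 0$ such that, whenever $X \in \mathcal{C}$, and $F \in \mathcal{F}$ is such
that $\mathbf{P}(F) < \delta$, we have $\mathbf{E}(|X|;F) < \varepsilon$.
(20.8) LEMMA. If $\mathcal{C}$ and $\mathcal{D}$ are UI families of random variables, then
$\mathcal{C} + \mathcal{D} := \{X + Y: X \in \mathcal{C}, Y \in \mathcal{D}\}$
is UI.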
Proofs of Lemmas 20.7 and 20.8 are left as easy exercises.
21. $\mathcal{L}^1$ convergence. We begin with what is (in view of (19.8)) a consequence
of the Dominated-Convergence Theorem.
(21.1) THEOREM (Bounded-Convergence Theorem). Let $(X_n)$ be a sequence
of random variables, and let $X$ be a random variable. Suppose that $X_n \to X$ in
probability and that, for some $K$ in $[0,\infty)$, we have for every $n$ and ω,
$|X_n(\omega)| \le K$.
Then
$\mathbf{E}(|X_n - X|) \to 0$.
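Proof. You check that $\mathbf{P}(|X| \le K) = 1$. Let $\varepsilon > 0$ be given. Choose $n_0$ such that
$\mathbf{P}(|X_n - X| > \tfrac{1}{3}\varepsilon) < \frac{\varepsilon}{3K}$ when $n \ge n_0$.
Then, for $n \ge n_0$,
$\mathbf{E}(|X_n - X|) = \mathbf{E}(|X_n - X|;|X_n - X| > \tfrac{1}{3}\varepsilon) + \mathbf{E}(|X_n - X|;|X_n - X| \le \tfrac{1}{3}\varepsilon)$
$\le 2K\mathbf{P}(|X_n - X| > \tfrac{1}{3}\varepsilon) + \tfrac{1}{3}\varepsilon \le \varepsilon$.
The proof is finished. □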
(21.2) THEOREM (A necessary and sufficient condition for $\mathcal{L}^1$ convergence).
Let $(X_n)$ be a sequence in $\mathcal{L}^1$, and let $X \in \mathcal{L}^1$. Then $X_n \to X$ in $\mathcal{L}^1$, or, equivalently
$\mathbf{E}(|X_n - X|) \to 0$, if and only if the following two conditions are satisfied:
(i) $X_n \to X$ in probability;
(ii) the sequence $(X_n)$ is UI.
It is of course the 'if' part of the theorem that is useful. Since the result is 'best
possible', it must improve on the Dominated-Convergence Theorem for our
$(\Omega,\mathcal{F},\mathbf{P})$ triple; and, of course, the result (20.6) makes this explicit.
Proof of 'if' part. Suppose that conditions (i) and (ii) are satisfied. For $K \in [0,\infty)$,
define a function $\varphi_K: \mathbb{R} \to [-K,K]$ as follows:
$\varphi_K(x) := K$ if $x > K$,
$\varphi_K(x) := x$ if $|x| \le K$,
$\varphi_K(x) := -K$ if $x < -K$.
Let $\varepsilon > 0$ be given. By the UI property of the $(X_n)$ sequence and (20.2), we can
choose $K$ so that
$\mathbf{E}\{|\varphi_K(X_n) - X_n|\} < \tfrac{1}{3}\varepsilon$, $\forall n$; $\mathbf{E}\{|\varphi_K(X) - X|\} < \tfrac{1}{3}\varepsilon$.
But, since $|\varphi_K(x) - \varphi_K(y)| \le |x - y|$, we see that $\varphi_K(X_n) \to \varphi_K(X)$ in probability;
and, by Theorem 21.1, we can choose $n_0$ such that, for $n \ge n_0$,
$\mathbf{E}\{|\varphi_K(X_n) - \varphi_K(X)|\} < \tfrac{1}{3}\varepsilon$.
The triangle inequality therefore implies that, for $n \ge n_0$, $\mathbf{E}(|X_n - X|) < \varepsilon$, and
the proof is complete. □
Independence
22. Independence of σ-algebras and of random variables. Here are the key
definitions of independence.
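Sub-σ-algebras $\mathcal{G}_1, \mathcal{G}_2, \ldots$ of $\mathcal{F}$ are called independent if, whenever $G_i \in \mathcal{G}_i$
$(i \in \mathbb{N})$ and $i_1, \ldots, i_n$ are distinct,
$\mathbf{P}(G_{i_1} \cap \cdots \cap G_{i_n}) = \prod_{k=1}^n \mathbf{P}(G_{i_k})$.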
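Random variables $X_1, X_2, \ldots$ are called independent if the σ-algebras
$\sigma(X_1), \sigma(X_2), \ldots$
are independent.
Events $E_1, E_2, \ldots$ are called independent if the σ-algebras $\mathcal{E}_1, \mathcal{E}_2, \ldots$ are
independent, where
$\mathcal{E}_n$ is the σ-algebra $\{\emptyset, E_n, \Omega \setminus E_n, \Omega\}$.
Since $\mathcal{E}_n = \sigma(I_{E_n})$, it follows that events $E_1, E_2, \ldots$ are independent if and only
if the random variables $I_{E_1}, I_{E_2}, \ldots$ are independent.
(22.1) The π-system Lemma. We know from elementary theory that events
$E_1, E_2, \ldots$ are independent if and only if whenever $n \in \mathbb{N}$ and $i_1, \ldots, i_n$ are distinct,
$\mathbf{P}(E_{i_1} \cap \cdots \cap E_{i_n}) = \prod_{k=1}^n \mathbf{P}(E_{i_k})$,
corresponding results involving complements of the $E_i$, etc., being consequences
of this.
We now use the Uniqueness Lemma 4.7 to obtain a significant generalisation
of this idea, allowing us to study independence via (manageable) π-systems
rather than (awkward) σ-algebras.
Let us concentrate on the case of two σ-algebras.
(22.2) LEMMA. Suppose that $\mathcal{G}$ and $\mathcal{H}$ are sub-σ-algebras of $\mathcal{F}$, and that $\mathcal{I}$
and $\mathcal{J}$ are π-systems with
$\sigma(\mathcal{I}) = \mathcal{G}$, $\sigma(\mathcal{J}) = \mathcal{H}$.
Then $\mathcal{G}$ and $\mathcal{H}$ are independent if and only if $\mathcal{I}$ and $\mathcal{J}$ are independent in that
$\mathbf{P}(I \cap J) = \mathbf{P}(I)\mathbf{P}(J)$, $I \in \mathcal{I}$, $J \in \mathcal{J}$.
Proof. Suppose that $\mathcal{I}$ and $\mathcal{J}$ are independent. For fixed $I$ in $\mathcal{I}$, the measures
(check that they are measures!)
$H \mapsto \mathbf{P}(I \cap H)$ and $H \mapsto \mathbf{P}(I)\mathbf{P}(H)$
on $(\Omega,\mathcal{H})$ have the same total mass $\mathbf{P}(I)$, and agree on $\mathcal{J}$. Therefore, by the
Uniqueness Lemma 4.7, they agree on $\sigma(\mathcal{J}) = \mathcal{H}$. Hence
$\mathbf{P}(I \cap H) = \mathbf{P}(I)\mathbf{P}(H)$, $I \in \mathcal{I}$, $H \in \mathcal{H}$.
Thus, for fixed $H$ in $\mathcal{H}$, the measures
$G \mapsto \mathbf{P}(G \cap H)$ and $G \mapsto \mathbf{P}(G)\mathbf{P}(H)$
on $(\Omega,\mathcal{G})$ have the same total mass $\mathbf{P}(H)$, and agree on $\mathcal{I}$. They therefore agree
on $\sigma(\mathcal{I}) = \mathcal{G}$; and this is what we set out to prove. □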
Suppose now that $X$ and $Y$ are two real-valued random variables on $(\Omega,\mathcal{F},\mathbf{P})$
such that, whenever $x, y \in \mathbb{R}$,
(22.3) $\mathbf{P}(X \le x; Y \le y) = \mathbf{P}(X \le x)\mathbf{P}(Y \le y)$.
Now, (22.3) says that the π-systems $\pi(X) := \{X^{-1}((-\infty,x]): x \in \mathbb{R}\}$ and $\pi(Y)$ are
independent. Hence $\sigma(X)$ and $\sigma(Y)$ are independent: that is, $X$ and $Y$ are
independent in our new 'abstract' sense.
In the same way, we can prove that random variables $X_1, X_2, \ldots, X_n$ are
independent if and only if
$\mathbf{P}(X_k \le x_k: 1 \le k \le n) = \prod_{k=1}^n \mathbf{P}(X_k \le x_k)$,
and all the familiar things from elementary theory.
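(22.4) Independence and product measure. This is the ultimate form of the
'independence means multiply' idea. Suppose that, for $i = 1,2$, $(E_i,\mathcal{E}_i)$ is a
measurable space and that $X_i$ is an $(E_i,\mathcal{E}_i)$-valued random variable. Recall from
Section 16 the definitions of the laws $\Lambda_{X_1}$ and $\Lambda_{X_2}$ of $X_1$ and $X_2$, and of the
joint law $\Lambda_{X_1,X_2}$ on $(E,\mathcal{E}) := (E_1 \times E_2, \mathcal{E}_1 \times \mathcal{E}_2)$.
(22.5) THEOREM. The variables $X_1$ and $X_2$ are independent if and only if
$\Lambda_{X_1,X_2} = \Lambda_{X_1} \times \Lambda_{X_2}$.
If $X_1$ and $X_2$ are independent and if, for $i = 1,2$, $h_i \in (m\mathcal{E}_i)^+$ then
(22.6) $\mathbf{E}h_1(X_1)h_2(X_2) = \mathbf{E}h_1(X_1) \cdot \mathbf{E}h_2(X_2) \le \infty$.
You prove the first statement. Result (22.6) follows from Fubini's Theorem
together with the ideas in (17.3): if we define $h$ on $E$ via $h(x) := h_1(x_1)h_2(x_2)$,
where $x = (x_1,x_2)$, then
$\mathbf{E}h_1(X_1)h_2(X_2) = \int_E h(x)\,\Lambda_{X_1,X_2}(dx)$
$= \int_{E_1}\int_{E_2} h_1(x_1)h_2(x_2)\,\Lambda_{X_1}(dx_1)\,\Lambda_{X_2}(dx_2)$
$= \mathbf{E}h_1(X_1) \cdot \mathbf{E}h_2(X_2)$.
There are obvious generalisations of the theorem.
If $X$ and $Y$ are independent elements of $\mathcal{L}^1(\Omega,\mathcal{F},\mathbf{P})$, then $XY \in \mathcal{L}^1$ and
$\mathbf{E}(XY) = \mathbf{E}(X)\mathbf{E}(Y)$. If, further, $X, Y \in \mathcal{L}^2$, then
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$;
and so on, and so forth...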
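The 'independence means multiply' idea can be verified exactly when the laws have finitely many atoms. In the sketch below the two laws are arbitrary choices of ours, and independence is imposed by taking the joint law to be the product measure.

```python
from fractions import Fraction as Fr
from itertools import product

law_X = {-1: Fr(1, 4), 0: Fr(1, 4), 2: Fr(1, 2)}
law_Y = {1: Fr(1, 3), 4: Fr(2, 3)}

# Under independence the joint law is the product of the two laws.
joint = {(x, y): px * py
         for (x, px), (y, py) in product(law_X.items(), law_Y.items())}

def E(h):
    """Expectation of h(X, Y) under the joint law."""
    return sum(h(x, y) * p for (x, y), p in joint.items())

def var(h):
    """Variance of h(X, Y), computed as E(h^2) - E(h)^2."""
    return E(lambda x, y: h(x, y) ** 2) - E(h) ** 2

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
EXY = E(lambda x, y: x * y)
var_sum = var(lambda x, y: x + y)
```

Exact rational arithmetic is used so that the identities hold with equality rather than up to rounding.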
23. Existence of families of independent variables. From Section 1.6 on Ciesielski's
construction of Brownian motion onwards, we have required models that
support families of independent variables with prescribed laws. Theorem 26.1
gives the elegant and proper way of doing this; and the strength of that theorem is
needed in, for example, the direct construction of Poisson measures in Section 37.
However, it is often the case that all we require is a model that supports the
existence of a sequence of real-valued random variables with prescribed
distribution functions. We now recall briefly the well-known trick for achieving this;
[W] gives some more details.
Let
$(\Omega, \mathcal{F}, P) = ([0,1], \mathcal{B}[0,1], \mathrm{Leb})$.
Expand $\omega$ in $\Omega$ in binary, and write $X(\omega) := \omega$:
$X(\omega) := \omega = 0.\omega_1\omega_2\omega_3\ldots = \sum_k 2^{-k}\omega_k$.
(Conventions made about dyadic rationals are irrelevant.) Then the variables
$(\pi_n : n \in \mathbb{N})$, where $\pi_n(\omega) := \omega_n$, are independent coin-tossing variables, each taking
the values 0 and 1 with probability $\tfrac12$ each. Thus
$X_1(\omega) := 0.\omega_1\omega_3\omega_6\ldots$,
$X_2(\omega) := 0.\omega_2\omega_5\omega_9\ldots$,
$X_3(\omega) := 0.\omega_4\omega_8\omega_{13}\ldots$
etc. defines an independent sequence of variables each with the same distribution
as $X$, that is, each with the uniform distribution on $[0,1]$.
If $(F_n : n \in \mathbb{N})$ is a sequence of distribution functions on $\mathbb{R}$ then the definitions
$Y_n := \sup\{y : F_n(y) \le X_n\}$
produce a sequence of independent variables $(Y_n)$, $Y_n$ having distribution function $F_n$.
All that is needed to prove these statements in this section is the Uniqueness
Lemma 4.7.
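The trick of this section can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it works with a finite list of bits rather than an infinite binary expansion, it deals the bits round-robin into finitely many streams (a simplification of the interleaving above, which serves infinitely many variables), and it approximates the generalised inverse of a distribution function by bisection on a bounded interval. The names split_bits, bits_to_uniform and quantile are hypothetical.

```python
import math
import random


def split_bits(bits, n_streams):
    """Deal one bit sequence into n_streams disjoint subsequences.

    Bit k (0-based) goes to stream k % n_streams. Disjoint sets of
    independent fair bits give independent streams.
    """
    return [bits[j::n_streams] for j in range(n_streams)]


def bits_to_uniform(bits):
    """Read bits b1 b2 b3 ... as the binary expansion 0.b1b2b3..."""
    return sum(b * 2.0 ** -(k + 1) for k, b in enumerate(bits))


def quantile(F, x, lo=-1e6, hi=1e6, tol=1e-9):
    """Approximate sup{y : F(y) <= x} by bisection.

    Assumes F is nondecreasing and that the supremum lies in [lo, hi].
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if F(mid) <= x:
            lo = mid
        else:
            hi = mid
    return lo


def exponential_df(y):
    """Distribution function of the exponential law of rate 1 (an example)."""
    return 1.0 - math.exp(-y) if y > 0 else 0.0


rng = random.Random(0)                        # fixed seed, for reproducibility
coin_tosses = [rng.randint(0, 1) for _ in range(150)]
uniforms = [bits_to_uniform(s) for s in split_bits(coin_tosses, 3)]
samples = [quantile(exponential_df, u) for u in uniforms]
```

With 50 bits per stream, each entry of uniforms approximates a uniform variable to within 2^-50, and samples then holds three (approximately) independent exponential variables.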
Note that we now have all the theoretical equipment for the proof of the
existence of Wiener measure in Section 1.6.
24. Exercises. Do all the exercises preceding Exercise E9.1 in [W].
Of course, the exercises in our Section 13 have important consequences for
probability. For example, E13.11a shows that if $X$ and $Y$ are random variables
taking values in $(S, \mathcal{B}(S))$, where $(S, \rho)$ is a separable metric space, then $\rho(X, Y)$
is a real-valued random variable.
3. STOCHASTIC PROCESSES
The Daniell-Kolmogorov Theorem
25. $(E^T, \mathcal{E}^T)$; σ-algebras on function space; cylinders and σ-cylinders. Let $(E, \mathcal{E})$
be a measurable space, and let $T$ be a set. Recall that $E^T$ is the set of all functions
$f$ from $T$ to $E$. For $t \in T$, define $\pi_t : E^T \to E$ to be the evaluation map
(25.1) $\pi_t(f) := f(t)$.
(25.2) DEFINITION (the σ-algebra $\mathcal{E}^T$). Define the σ-algebra
$\mathcal{E}^T := \sigma\{\pi_t : t \in T\}$
on $E^T$. Thus $\mathcal{E}^T$ is the smallest σ-algebra on $E^T$ such that each $\pi_t$ is
$(\mathcal{E}^T/\mathcal{E})$-measurable.
For $\emptyset \ne S \subseteq T$, define $\pi_S : E^T \to E^S$ to be the restriction map
(25.3) $\pi_S(f) := f|_S$.
Then (check!) $\pi_S$ is $(\mathcal{E}^T/\mathcal{E}^S)$-measurable.
(25.4) DEFINITION (cylinder, special cylinder). We say that a subset $F$ of $E^T$
is a cylinder if $F$ has the form
(25.5) $F = \pi_S^{-1}A_S = A_S \times E^{T \setminus S}$,
for some non-empty finite subset $S$ of $T$ and some $A_S$ in $\mathcal{E}^S$. We say that $F$ is a
special cylinder if it has the form
(25.6) $\bigcap_{t \in S} \pi_t^{-1}H_t = \bigl(\prod_{t \in S} H_t\bigr) \times E^{T \setminus S}$,
where $S$ is a finite subset of $T$ and $H_t \in \mathcal{E}$ for $t \in S$.
(25.7) LEMMA. The cylinder sets form an algebra that generates $\mathcal{E}^T$. The special
cylinder sets form a π-system that generates $\mathcal{E}^T$.
The proof is left as a simple exercise.
(25.8) DEFINITION (σ-cylinder). We say that $F$ is a σ-cylinder if it has the form
$F = \pi_S^{-1}A_S$
for some non-empty countable subset $S$ of $T$ and some $A_S$ in $\mathcal{E}^S$.
(25.9) LEMMA. $\mathcal{E}^T$ is precisely the collection of σ-cylinders: thus membership of
an element of $\mathcal{E}^T$ imposes restriction on the values of $f$ only at countably many
$t$-values.
This result is very important. Its proof is easy. One need only show that the
σ-cylinders form a σ-algebra. The key point is that if $(S(n))$ is a sequence of
countable subsets of $T$ and, for each $n$, $A(n) \in \mathcal{E}^{S(n)}$, then
$\bigcap_n \pi_{S(n)}^{-1}A(n) = \pi_S^{-1}A$,
where $S = \bigcup S(n)$ and
$A = \bigcap_n \bigl[A(n) \times E^{S \setminus S(n)}\bigr] \in \mathcal{E}^S$.
The notation introduced in this section will be carried forward for some time.
26. Infinite products of probability triples. Section 25 had to be the first section
of this discussion of stochastic processes. The present section requires the
definition of $\mathcal{E}^T$; but otherwise would belong more properly at the end of part
2 of this chapter.
(26.1) THEOREM. For each $t$ in $T$, let $\mu_t$ be a probability measure on $(E, \mathcal{E})$.
Then there exists a unique probability measure $\mu$ on $(E^T, \mathcal{E}^T)$ such that, whenever
$S$ is a finite subset of $T$ and $H_t \in \mathcal{E}$ for $t \in S$,
(26.2) $\mu\bigl(\bigcap_{t \in S} \pi_t^{-1}H_t\bigr) = \prod_{t \in S} \mu_t(H_t)$.
The uniqueness of $\mu$ is an immediate consequence of Lemma 4.7 and the fact
that the special cylinders form a π-system that generates $\mathcal{E}^T$.
The existence of $\mu$ is quite a deep matter. Fubini's Theorem implies the
existence of a finitely additive measure $\mu_0$ on the algebra of cylinder sets such
that the analogue of (26.2) holds for $\mu_0$. Caratheodory's Theorem 5.1 shows
that we need only (!) prove that $\mu_0$ is σ-additive on the collection of cylinder
sets. After a little thought about Lemma 25.9, we realise that we need only
prove the result for the case when $T$ is countable. Compare Observation 31.2
below. A leisurely proof is given in [W; Chapter A9] (you can regard the $(\mathbb{R}, \mathcal{B})$
there as a notation for $(E, \mathcal{E})$ for this purpose). You should compare and contrast
that proof with the proof given below for the Daniell-Kolmogorov Theorem,
for which topological assumptions are necessary. When $(E, \mathcal{E})$ does have suitable
topological properties, as is always the case in practice, Theorem 26.1 follows
from the DK Theorem.
Note that
(26.3) the $(E, \mathcal{E})$-valued random variables $(\pi_t : t \in T)$ on the probability triple $(E^T, \mathcal{E}^T, \mu)$
are independent, $\pi_t$ having law $\mu_t$.
The problem of the existence of 'completely independent' stochastic processes
(and, in particular, of independent, identically distributed (IID) sequences) is
therefore settled. We now turn to the study of more general processes.
27. Stochastic process; sample function; law. There are many different ways, all
important, of regarding a stochastic process.
(27.1) CLASSICAL DEFINITION (stochastic process; state-space; parameter
set; carrier triple). Let $T$ be a set, $(E, \mathcal{E})$ a measurable space and $(\Omega, \mathcal{F}, P)$ a
probability triple. The traditional definition of a stochastic process with
time-parameter set $T$, state-space $(E, \mathcal{E})$ and carrier triple $(\Omega, \mathcal{F}, P)$ is as a collection
$\{X_t : t \in T\}$ of $(E, \mathcal{E})$-valued random variables carried by our triple $(\Omega, \mathcal{F}, P)$.
Thus, we have, for each $t$, the picture
$X_t : \Omega \to E$, $X_t^{-1} : \mathcal{E} \to \mathcal{F}$.
(27.2) DEFINITION (sample function; sample path; realisation). Let $X$ be as
in (27.1). For $\omega \in \Omega$, the map (element of $E^T$)
$X(\omega) : T \to E$,
$t \mapsto X_t(\omega)$
is called the sample function of $X$ (or, especially when $T$ is a time-parameter
set, the sample path of $X$) or realisation of $X$ corresponding to $\omega$.
This leads to an alternative view of $X$, namely as the map
(27.3) $X : \Omega \to E^T$, $X^{-1} : \mathcal{E}^T \to \mathcal{F}$,
$\omega \mapsto X(\omega)$;
in other words, as an $(E^T, \mathcal{E}^T)$-valued random variable. You can easily check
that $X$ is $(\mathcal{F}/\mathcal{E}^T)$-measurable as a map from $\Omega$ to $E^T$ if and only if each $X_t$ is
$(\mathcal{F}/\mathcal{E})$-measurable as a map from $\Omega$ to $E$.
(27.4) DEFINITION (law of stochastic process). The law of the stochastic
process $X$ in Definition 27.1 is the probability measure
(27.5) $\mu := P \circ X^{-1}$ on $(E^T, \mathcal{E}^T)$;
in other words, it is the law of the $(E^T, \mathcal{E}^T)$-valued random variable $X$.
If we wish to emphasise the role of $(\Omega, \mathcal{F}, P)$ and/or $T$, we shall use such
notations as
$(\Omega, \mathcal{F}, P; X)$ or $(\Omega, \mathcal{F}, P; \{X_t : t \in T\})$
to signify our process $X$.
28. Canonical process. Let X be the process of Section 27, and let μ be its law.
The process
(28.1) $(E^T, \mathcal{E}^T, \mu; \pi_t : t \in T)$
trivially has the same law μ as X; it is called the canonical process with law μ.
A canonical process is completely determined by its law; and, for a canonical
process, the idea that a sample point 'is' the outcome of the experiment is
restored. Canonical processes are certainly nice. However, probability theory
gets most of its depth from being able to construct (certainly non-canonical!)
processes from other processes by time transformation, or as solutions of SDEs,
etc.
Important note on terminology. It is important that we are currently working
with the space $E^T$ of all functions from a set $T$ into a space $E$ that carries a
measurable structure $\mathcal{E}$. When we speak of canonical Brownian motion, we
usually mean the set-up
$(C, \mathcal{A}, W; \pi_t : t \in [0,\infty))$,
where $C = C([0,\infty); \mathbb{R})$ is the space of continuous paths $w : [0,\infty) \to \mathbb{R}$, $\pi_t : C \to \mathbb{R}$
is the evaluation map $\pi_t(w) = w(t)$, $\mathcal{A} = \sigma\{\pi_t : t \in [0,\infty)\}$ and $W$ is Wiener measure.
You must keep in mind that at the moment, all paths are allowed.
29. Finite-dimensional distributions, sufficiency; compatibility. We continue with
the notation of the last few sections.
For a non-empty subset $S$ of $T$, define $\pi_S X : \Omega \to E^S$ via
(29.1) $(\pi_S X)(\omega) := \pi_S(X(\omega)) = X(\omega)|_S$
and
(29.2) $\mu_S := P \circ (\pi_S X)^{-1}$ on $(E^S, \mathcal{E}^S)$.
(29.3) DEFINITION (Fin($T$), finite-dimensional distributions). Let Fin($T$)
denote the set of non-empty finite subsets of $T$. The probability measures
$\{\mu_S : S \in \mathrm{Fin}(T)\}$
are called the finite-dimensional distributions of $X$.
For $S \in \mathrm{Fin}(T)$,
(29.4) $\mu_S = \mu \circ \pi_S^{-1}$ on $(E^S, \mathcal{E}^S)$.
If we know the finite-dimensional distributions of $X$ then we know the value
of $\mu$ on all cylinders, and hence, by Lemmas 4.7 and 25.7, we know $\mu$ on the
whole of $(E^T, \mathcal{E}^T)$:
(29.5) the finite-dimensional distributions determine the law.
(This is what is meant by 'sufficiency' in the title of this section: it has nothing to
do with Fisher's brilliant concept in statistics). Do Exercise E38.29.
Note that if $U, V \in \mathrm{Fin}(T)$ and $U \subseteq V$, and if $\pi^V_U$ denotes the restriction map
from $E^V$ to $E^U$, then we have the compatibility condition or projective property:
(29.6) $\mu_U = \mu_V \circ (\pi^V_U)^{-1}$.
The fundamental Daniell-Kolmogorov Theorem considers the following
problem. Suppose that we have a family of probability measures as in (29.3)
that satisfies the compatibility condition (29.6): does there exist a measure $\mu$ on
$(E^T, \mathcal{E}^T)$ such that (29.4) holds? In the language of category theory, we are asking
whether a projective system has a projective limit. Rather surprisingly, the
answer is 'Not in general'. In order to obtain a positive result, we have to make
a topological assumption about the measurable space $(E, \mathcal{E})$; we need the
following inner regularity with respect to compact sets.
(29.7) LEMMA. Let $J$ be a compact metric space, and let $B$ be a Borel subset
of $J$. If $m$ is a finite measure on $J$ and $\varepsilon > 0$, then there exists a compact subset $K$
of $B$ such that
$m(K) > m(B) - \varepsilon$.
This standard result is proved in Section 81.
30. The Daniell-Kolmogorov (DK) Theorem; 'compact metrisable' case. The
DK Theorem is the essential first step in constructing stochastic processes.
The general case of the theorem is given in the next section. It is well worth
presenting the present 'simple' case on its own.
Recall that Fin($T$) is the family of non-empty finite subsets of $T$.
(30.1) THEOREM (Daniell, Kolmogorov). Let $E$ be a compact metrisable space,
and let $\mathcal{E} = \mathcal{B}(E)$. Let $T$ be a set. Suppose that for each $S$ in Fin($T$), there exists
a probability measure $\mu_S$ on $(E^S, \mathcal{E}^S)$, and that the measures $\{\mu_S : S \in \mathrm{Fin}(T)\}$ are
compatible or projective in that
(30.2) $\mu_U = \mu_V \circ (\pi^V_U)^{-1}$
holds whenever $U, V \in \mathrm{Fin}(T)$ and $U \subseteq V$. Here $\pi^V_U$ is the restriction map from $E^V$
into $E^U$. Then there exists a unique measure $\mu$ on $(E^T, \mathcal{E}^T)$ such that
(30.3) $\mu_S = \mu \circ \pi_S^{-1}$ on $(E^S, \mathcal{E}^S)$,
where $\pi_S$ is the restriction map from $E^T$ to $E^S$.
Start of Proof. For any cylinder set $F$, we have, for some $S \in \mathrm{Fin}(T)$ and some
$A_S \in \mathcal{E}^S$,
(30.4) $F = \pi_S^{-1}A_S = A_S \times E^{T \setminus S}$.
For such an $F$, set
(30.5) $\mu_0(F) := \mu_S(A_S)$.
The compatibility condition (30.2) guarantees that this definition is independent
of the particular representation of $F$ used in (30.4). Moreover, it is obvious that
$\mu_0$ is finitely additive on the algebra $\mathcal{C}$ of cylinder sets. We need only show that
(30.6) $\mu_0$ is countably additive on $\mathcal{C}$
since then Caratheodory's Theorem does the rest.
The result (4.3) makes it clear that Theorem 30.1 is implied by the following
lemma.
(30.7) LEMMA. Suppose that
(i) $F_n \in \mathcal{C}$ ($n \in \mathbb{N}$); $F_n \supseteq F_{n+1}$ ($\forall n$);
(ii) for some $\varepsilon > 0$, $\mu_0(F_n) > 2\varepsilon$ ($\forall n$).
Then $\bigcap_n F_n \ne \emptyset$.
Proof of Lemma. Let $(F_n)$ satisfy the hypotheses of Lemma 30.7. We have
(30.8) $F_n = \pi_{S(n)}^{-1}A_n = A_n \times E^{T \setminus S(n)}$
for some $S(n) \in \mathrm{Fin}(T)$ and some $A_n$ in $\mathcal{E}^{S(n)}$. Now, $\mu_{S(n)}$ is a probability measure
on the compact metrisable space $E^{S(n)}$, and so, by Lemma 29.7, there is a compact
subset $K_n$ of $A_n$ such that
$\mu_{S(n)}(K_n) > \mu_{S(n)}(A_n) - 2^{-n}\varepsilon$.
In other words,
(30.9) $\mu_0(H_n) > \mu_0(F_n) - 2^{-n}\varepsilon$,
where
(30.10) $H_n := K_n \times E^{T \setminus S(n)}$.
Note that $H_n$ is compact, by Tychonov's Theorem.
You can easily combine the hypotheses of Lemma 30.7 with (30.9) to prove
that
$\mu_0(H_1 \cap \cdots \cap H_n) > \varepsilon$, $\forall n$.
Thus
(30.11) $H_1 \cap \cdots \cap H_n \ne \emptyset$, $\forall n$.
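The estimate that precedes (30.11) can be spelled out as follows. This is only a sketch of one way to organise the combination of hypothesis (ii) with (30.9); it uses nothing beyond the finite additivity of $\mu_0$ on the algebra of cylinder sets, the inclusions $H_k \subseteq F_k$, and the fact that the sets $F_k$ decrease.

```latex
% Since H_k \subseteq F_k and F_n \subseteq F_k for k \le n:
\begin{align*}
\mu_0\Bigl(F_n \setminus \bigcap_{k \le n} H_k\Bigr)
  &= \mu_0\Bigl(\bigcup_{k \le n} (F_n \setminus H_k)\Bigr)
   \le \sum_{k \le n} \mu_0(F_k \setminus H_k) \\
  &= \sum_{k \le n} \bigl[\mu_0(F_k) - \mu_0(H_k)\bigr]
   < \sum_{k \le n} 2^{-k}\varepsilon < \varepsilon .
\end{align*}
% Because \bigcap_{k \le n} H_k \subseteq F_n and \mu_0(F_n) > 2\varepsilon:
\[
\mu_0\Bigl(\bigcap_{k \le n} H_k\Bigr)
  = \mu_0(F_n) - \mu_0\Bigl(F_n \setminus \bigcap_{k \le n} H_k\Bigr)
  > 2\varepsilon - \varepsilon = \varepsilon .
\]
```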
If $\bigcap_k H_k = \emptyset$ then $\bigcup_k H_k^c = E^T$, whence the fact that $E^T$ is compact forces
$\bigcup_{k \le n} H_k^c = E^T$ for some $n$,
contradicting (30.11). Hence $\bigcap_k H_k \ne \emptyset$, whence, a fortiori, $\bigcap_k F_k \ne \emptyset$. Thus
Lemma 30.7 and Theorem 30.1 are true. □
Now do Exercise E38.30.
31. The Daniell-Kolmogorov Theorem: general case. The case now to be presented
is not the most general known, but it is good enough for us.
(31.1) THEOREM (Daniell, Kolmogorov). Theorem 30.1 remains true if the
assumption that Ε is compact metrisable is replaced by the assumption that Ε is
a Lusin space, that is, Ε is homeomorphic to a Borel subset of a compact metrisable
space.
Remark. Of course, $\mathbb{R}^n$ is homeomorphic to a Borel (indeed, open) subset of a
compact metric space. For example, use stereographic projection of the sphere $S^n$
in $\mathbb{R}^{n+1}$.
The following observation will prove useful.
(31.2) OBSERVATION. In proving the theorem, we may assume that Τ is
countable.
Justification of Observation. All of the remarks made up to and including the
statement of Lemma 30.7 transfer to the present case. Proving Lemma 30.7 for
a fixed sequence $(F_n)$ as in (30.8) is identical to proving the same result when
$T = \bigcup S(n)$. □
Proof of Theorem 31.1. We suppose that $E \in \mathcal{J}$, where $\mathcal{J} := \mathcal{B}(J)$, $J$ being a
compact metrisable space.
We do (as we may) assume that $T$ is countable.
We derive the theorem directly from Theorem 30.1, making no further use
of Lemma 30.7.
For each $S$ in Fin($T$), we extend $\mu_S$ on $(E^S, \mathcal{E}^S)$ to $\hat{\mu}_S$ on $(J^S, \mathcal{J}^S)$ in the obvious
way:
$\hat{\mu}_S(A_S) := \mu_S(A_S \cap E^S)$ ($A_S \in \mathcal{J}^S$).
Define $\hat{\mu}_0$ on the algebra $\hat{\mathcal{C}}$ of cylinder sets associated with $(J, \mathcal{J})$ via the obvious
analogue of (30.5). Since $J$ is compact metrisable, we know from Theorem 30.1
that $\hat{\mu}_0$ has a unique countably additive extension $\hat{\mu}$ to $(J^T, \mathcal{J}^T)$.
Now, $T$ is countable. We may therefore find a sequence $(T(k))$ of finite sets
with $T(k) \uparrow T$. But then
$\hat{\mu}(E^T) = \downarrow\lim \hat{\mu}\bigl(E^{T(k)} \times J^{T \setminus T(k)}\bigr) = \downarrow\lim \mu_{T(k)}\bigl(E^{T(k)}\bigr)$
$= \downarrow\lim 1 = 1$,
and
$\mu(L) := \hat{\mu}(L \cap E^T)$ ($L \in \mathcal{E}^T$)
obviously defines the required probability measure on $(E^T, \mathcal{E}^T)$ asserted by the
theorem.
The proof of the DK Theorem is finished. □
Discussion. Suppose that $E$ is not compact, and that $T$ is uncountable. Then
$E^T$ is not an element of $\mathcal{J}^T$, and we cannot say that $\hat{\mu}(E^T) = 1$. If however, $F$
is any element of $\mathcal{J}^T$ such that $E^T \subseteq F$ then, for some countable subset $S$ of $T$,
$F \supseteq E^S \times J^{T \setminus S}$, and $\hat{\mu}(F) = 1$. Thus the outer $\hat{\mu}$-measure $\hat{\mu}^*(E^T)$ of $E^T$ is equal to
1, and what is happening is that
$\mu(F) = \hat{\mu}^*(F)$ ($F \in \mathcal{E}^T$).
This kind of thing will keep on happening!
32. Gaussian processes; pre-Brownian motion. In this section and the next we
look at some applications of the DK Theorem. We shall soon see in Section
34 that these applications are, as yet, extremely unsatisfactory.
(32.1) Gaussian processes. Let $T$ be a parameter set. Let $m : T \to \mathbb{R}$, and let $V$
be a symmetric non-negative-definite function from $T \times T$ to $\mathbb{R}$, so that, for
any finite subset $S$ of $T$ and any function $f$ on $S$,
$\sum_{r \in S}\sum_{s \in S} f(r)V(r,s)f(s) \ge 0$.
We know from elementary theory that, for $S \in \mathrm{Fin}(T)$, there exists a unique
measure $\mu_S$ on $(\mathbb{R}^S, \mathcal{B}^S)$ such that, for $\theta \in \mathbb{R}^S$,
(32.2) $\int_{\mathbb{R}^S} \exp\bigl[i \sum_{s \in S} \theta(s)f(s)\bigr]\,\mu_S(df)$
$= \exp\bigl[i \sum_{r \in S} \theta(r)m(r) - \tfrac12 \sum_{r \in S}\sum_{s \in S} \theta(r)V(r,s)\theta(s)\bigr]$.
Indeed, if the restriction $V_S$ of $V$ to $S \times S$ is strictly positive-definite then $\mu_S$ has
density
(32.3) $(2\pi)^{-|S|/2}(\det V_S)^{-1/2} \exp\bigl\{-\tfrac12 \sum_{r \in S}\sum_{s \in S} [f(r) - m(r)]\,(V_S^{-1})(r,s)\,[f(s) - m(s)]\bigr\}$
relative to the Lebesgue measure on $\mathbb{R}^S$.
We also know from elementary theory that the measures $\{\mu_S : S \in \mathrm{Fin}(T)\}$ are
compatible (projective) in the sense of the DK Theorem. Hence we can
construct the Gaussian process $(\mathbb{R}^T, \mathcal{B}^T, \mu; \pi_t : t \in T)$ with mean function $m$ and
covariance function $V$; this has the projective limit $\mu$ as its law. That $m$ is the
mean function and $V$ is the covariance function is confirmed by
$\mu(\pi_t) = m(t)$, $\mu(\pi_s\pi_t) - \mu(\pi_s)\mu(\pi_t) = V(s,t)$.
(32.4) Orthogonality and independence. If $T_1$ and $T_2$ are disjoint subsets of $T$
and $V(t_1, t_2) = 0$ whenever $t_i \in T_i$ ($i = 1, 2$), then $\{\pi_{t_1} : t_1 \in T_1\}$ and $\{\pi_{t_2} : t_2 \in T_2\}$ are
independent processes.
(32.5) Pre-Brownian motion. If we take
$T = [0,\infty)$, $m(t) = 0$, $V(s,t) = \min(s,t)$
then, as we already know, $V$ is positive-definite. We call the associated canonical
process $(\mathbb{R}^T, \mathcal{B}^T, \mu; \pi_t : t \in T)$ pre-Brownian motion. We adjoin the prefix 'pre-'
because this process has all possible functions from $[0,\infty)$ to $\mathbb{R}$ as paths, not
just continuous functions.
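The finite-dimensional distributions of pre-Brownian motion are easy to sample, and doing so makes the compatibility condition concrete. The following is a minimal sketch, not a construction of the process itself: it assumes distinct, strictly positive times (so that the matrix of values min(s, t) is strictly positive-definite), and the helper names cov_matrix, cholesky and sample_fdd are hypothetical. It factorises the covariance matrix as L L^T and applies L to independent standard normal variables.

```python
import math
import random


def cov_matrix(times):
    """Covariance V(s, t) = min(s, t) restricted to a finite set of times."""
    return [[min(s, t) for t in times] for s in times]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a strictly positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - partial)
            else:
                low[i][j] = (a[i][j] - partial) / low[j][j]
    return low


def sample_fdd(times, rng):
    """One draw from the centred Gaussian law on R^S with covariance min(s, t)."""
    low = cholesky(cov_matrix(times))
    z = [rng.gauss(0.0, 1.0) for _ in times]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(times))]


rng = random.Random(1)                 # fixed seed, for reproducibility
draw = sample_fdd([0.5, 1.0, 2.0], rng)
```

Compatibility is visible at the level of covariances: for U contained in V, cov_matrix(U) is the corresponding submatrix of cov_matrix(V), so the Gaussian law indexed by V projects onto the one indexed by U.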
33. Pre-Poisson set functions. Let $(W, \mathcal{W}, \lambda)$ be a σ-finite measure space such
that every singleton set $\{x\}$ ($x \in W$) is in $\mathcal{W}$. By a pre-Poisson set function on
$(W, \mathcal{W})$ with intensity measure $\lambda$, we mean a $(\mathbb{Z}^+ \cup \{\infty\})$-valued process (if one
exists) $\{\Lambda(B) : B \in \mathcal{W}\}$ with the following properties:
(33.1)(i) for every $B$ in $\mathcal{W}$, $\Lambda(B)$ is a $(\mathbb{Z}^+ \cup \{\infty\})$-valued random variable with the
Poisson distribution of parameter $\lambda(B)$:
$\mathrm{Prob}(\Lambda(B) = k) = e^{-\lambda(B)}\lambda(B)^k / k!$ if $\lambda(B) < \infty$,
$\mathrm{Prob}(\Lambda(B) = \infty) = 1$ if $\lambda(B) = \infty$;
(33.1)(ii) if $B_1, \ldots, B_n$ are disjoint elements of $\mathcal{W}$ then $\Lambda(B_1), \ldots, \Lambda(B_n)$ are
independent random variables;
(33.1)(iii) whenever $B_1$ and $B_2$ are disjoint elements of $\mathcal{W}$,
$P[\Lambda(B_1 \cup B_2) = \Lambda(B_1) + \Lambda(B_2)] = 1$.
If $S$ is a finite subset of $\mathcal{W}$ then we can easily specify the desired law $\mu_S$ of
$\{\Lambda(B) : B \in S\}$. We know from elementary theory that the sum of independent
Poisson variables is again Poisson, from which it follows that the family
$\{\mu_S : S \in \mathrm{Fin}(\mathcal{W})\}$ is projective. Hence, we can construct a canonical pre-Poisson
set function
(33.2) $\bigl((\mathbb{Z}^+ \cup \{\infty\})^{\mathcal{W}}, \mathcal{P}(\mathbb{Z}^+ \cup \{\infty\})^{\mathcal{W}}, \mu; \pi_B : B \in \mathcal{W}\bigr)$
with intensity measure $\lambda$.
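The elementary fact about sums of Poisson variables can be checked directly; the computation below is the standard one, included only as a reminder. The symbols $N_1$, $N_2$, $a$ and $b$ are introduced for this illustration: $N_1$ and $N_2$ are independent Poisson variables with finite parameters $a$ and $b$.

```latex
\begin{align*}
P(N_1 + N_2 = k)
  &= \sum_{j=0}^{k} P(N_1 = j)\,P(N_2 = k - j)
   = \sum_{j=0}^{k} \frac{e^{-a}a^{j}}{j!}\cdot\frac{e^{-b}b^{k-j}}{(k-j)!} \\
  &= \frac{e^{-(a+b)}}{k!}\sum_{j=0}^{k}\binom{k}{j}a^{j}b^{k-j}
   = \frac{e^{-(a+b)}(a+b)^{k}}{k!}.
\end{align*}
```

Taking $a$ and $b$ to be the intensity-measure values of two disjoint sets $B_1$ and $B_2$ shows that the sum of the two counts has the Poisson law prescribed for the count of $B_1 \cup B_2$, which is the consistency needed between properties (i) and (iii).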
Beyond the DK Theorem
34. Limitations of the DK Theorem. Under its hypothesis, the DK Theorem
31.1 provides us with a canonical process
$(E^T, \mathcal{E}^T, \mu; \pi_t : t \in T)$
that has all possible functions in $E^T$ as sample functions. Moreover, we know
from Lemma 25.9 that $F \in \mathcal{E}^T$ if and only if $F$ is a σ-cylinder, that is, if and only
if $F = \pi_S^{-1}A_S$ for some countable set $S$ and some $A_S$ in $\mathcal{E}^S$.
(34.1) Difficulties with path continuity. Consider the canonical pre-Brownian
process
$(\mathbb{R}^{[0,\infty)}, \mathcal{B}^{[0,\infty)}, \mu; \pi_t : t \in [0,\infty))$
in (32.5). Let $C$ be the set of continuous functions from $[0,\infty)$ to $\mathbb{R}$. Life would
be simple if it were the case that $C \in \mathcal{B}^{[0,\infty)}$ and $\mu(C) = 1$. However, $C \notin \mathcal{B}^{[0,\infty)}$
because $C$ is not a σ-cylinder.
Suppose that $F \in \mathcal{B}^{[0,\infty)}$ and $F \subseteq C$. Then
(i) $F = \pi_S^{-1}A_S$ for some countable set $S$ and some $A_S$ in $\mathcal{B}^S$;
(ii) every element of $F$ is continuous on $[0,\infty)$.
Since property (i) tells us nothing about the behaviour of elements of $F$ off the
set $S$, we conclude that $F = \emptyset$. We have proved that
(34.2) $F \in \mathcal{B}^{[0,\infty)}$ and $F \subseteq C$ imply that $F = \emptyset$.
Thus $C$ has inner $\mu$-measure 0, and completion certainly will not help us.
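(34.3) Difficulties with Poisson measures. Recall the canonical pre-Poisson set
function in Section 33. What we really want is that $B \mapsto \pi_B(\omega)$ is a measure
for each $\omega$. Let $\mathcal{M} \subseteq (\mathbb{Z}^+ \cup \{\infty\})^{\mathcal{W}}$ be the set of $(\mathbb{Z}^+ \cup \{\infty\})$-valued measures
on $(W, \mathcal{W})$. Life would be simple if it were the case that $\mathcal{M} \in \mathcal{P}(\mathbb{Z}^+ \cup \{\infty\})^{\mathcal{W}}$ and
$\mu(\mathcal{M}) = 1$. However, you can easily prove, in analogy with (34.2), that if $\mathcal{W}$ is
uncountable, as it will be in every case of interest, then
(34.4) $F \in \mathcal{P}(\mathbb{Z}^+ \cup \{\infty\})^{\mathcal{W}}$ and $F \subseteq \mathcal{M}$ imply that $F = \emptyset$.
Note that if $\mathcal{M}_0$ denotes the set of finitely additive measures on $(W, \mathcal{W})$ then,
if $\mathcal{W}$ is uncountable, the analogue of (34.4) will hold for $\mathcal{M}_0$.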
35. The role of outer measures. Again let $(E, \mathcal{E})$ be a measurable space and $T$
a set. Let $\mu$ be a probability measure on $(E^T, \mathcal{E}^T)$.
Let $G \subseteq E^T$. Think of $G$ as a class of good sample functions. Thus $G$ might
be $C$ in the context of (34.1), or $\mathcal{M}$ in the context of (34.3). In many contexts,
$G$ will be the set of right-continuous paths on $[0,\infty)$, which is a more natural
class for probability theory than the set of continuous paths.
Before reading the next lemma, please reread Lemma 6.1.
(35.1) LEMMA and DEFINITION. A process with law $\mu$ exists with all its
sample functions in $G$ if and only if the outer $\mu$-measure $\mu^*(G)$ of $G$ is 1. Then
is the canonical process with path-space G and law μ.
(35.2) COROLLARY. Because of Wiener's Theorem 1.6.1, we have $\mu^*(C) = 1$
if $\mu$ is the pre-Brownian law.
It is never at all easy in practice to decide whether or not $\mu^*(G) = 1$. Lemma
35.1 is more a matter of clarifying structure than a useful tool.
36. Modification; indistinguishability. Let $Y$ be a stochastic process with
parameter set $T$ and state-space $(E, \mathcal{E})$ carried by the triple $(\Omega, \mathcal{F}, P)$.
(36.1) Important discussion. To indicate the way in which things develop,
suppose that $Y$ is an $(\mathbb{R}, \mathcal{B})$-valued process with time-parameter set $[0,\infty)$ and
law $\mu$, carried by a triple $(\Omega, \mathcal{F}, P)$. It is often possible, by making heavy use of
the structure of $\mu$, to show that there exists a set $\Omega_G$ in $\mathcal{F}$ with $P(\Omega_G) = 1$ such
that, for every $\omega \in \Omega_G$, the map
$q \mapsto Y_q(\omega)$ from $\mathbb{Q} \cap [0,\infty)$ to $\mathbb{R}$
has a right-continuous extension $t \mapsto X_t(\omega)$ from $[0,\infty)$ to $\mathbb{R}$. If $\omega \notin \Omega_G$, set
$X_t(\omega) = 0$ for all $t$. Then all paths of $X$ are right-continuous. It is often further
possible to show, again by using the structure of the particular $\mu$, that
$P(X_t = Y_t) = 1$, $\forall t \in T$. Then $X$ will be a process with law $\mu$, all paths of which
are right-continuous. (The set $\Omega_C$ of $\omega$ for which $t \mapsto X_t(\omega)$ is continuous will
then be an element of $\mathcal{F}$, so that $P(\Omega_C)$ is meaningful.)
The 'regularisation' method just described, and due to Kolmogorov and
Doob, is one of the most powerful and widely used ways of obtaining processes
with right-continuous paths. Often, however, we use direct methods of construction
such as Ciesielski's proof of the existence of path-continuous Brownian
motion in Section 1.6. Having obtained this Brownian motion, we can then
construct path-continuous diffusions by solving SDEs; and so on.
The good sense of the following definition is now evident.
(36.2) DEFINITION (modification). A process $X$ is called a modification of $Y$
if $X$ has the same state-space, parameter set and carrier triple, and also
$P(X_t = Y_t) = 1$ for every $t \in T$.
Clearly, two processes that are modifications of each other have the same law.
(36.3) Note. It is not necessarily the case (even if $T$ is a singleton set) that if $Y$
has law $\mu$ and $G \subseteq E^T$ satisfies $\mu^*(G) = 1$ then $Y$ has a modification with all
paths in $G$. See Exercise E38.36.
The most stringent form of 'near-equality' of processes will now be introduced.
(36.4) DEFINITION (indistinguishable processes). Let $X$ and $Y$ be two processes
with the same state-space, parameter set $T$ and carrier triple. We say that $X$ and
$Y$ are indistinguishable (or are equal modulo indistinguishability) if
$P(X_t = Y_t \text{ for all } t \in T) = 1$.
(36.5) PROPOSITION. The following statements hold.
(i) Two indistinguishable processes are modifications of each other.
(ii) If the parameter set is countable then two processes that are modifications of
each other are indistinguishable.
(iii) If $X$ and $Y$ are right-continuous processes with values in some Hausdorff
space $(E, \mathcal{B}(E))$ then $X$ and $Y$ are modifications of each other if and only
if they are indistinguishable.
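A standard example (not taken from the text above; the symbols below are chosen for this illustration only) shows that statement (ii) can fail when the parameter set is uncountable. Take the parameter set $[0,1]$ and let $U$ be a random variable uniformly distributed on $[0,1]$.

```latex
\[
X_t(\omega) := 0, \qquad
Y_t(\omega) := \begin{cases} 1 & \text{if } t = U(\omega), \\ 0 & \text{otherwise.} \end{cases}
\]
% For each fixed t, the two processes agree unless U = t:
\[
P(X_t = Y_t) = P(U \ne t) = 1 \qquad (t \in [0,1]),
\]
% but for every \omega the paths differ at the random time t = U(\omega):
\[
P\bigl(X_t = Y_t \text{ for all } t \in [0,1]\bigr) = P(\emptyset) = 0 .
\]
```

So these two processes are modifications of each other without being indistinguishable. The second process is not right-continuous, which is consistent with statement (iii).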
37. Direct construction of Poisson measures and subordinators, and of local time
from the zero set; heuristics; Azema's martingale. Because Poisson measures
are the foundation for excursion theory, this topic is very important for us.
As in Section 33, let $(W, \mathcal{W}, \lambda)$ be a σ-finite measure space in which all singleton
sets belong to $\mathcal{W}$. We want to construct a pre-Poisson process $\Lambda$ on $\mathcal{W}$ with
intensity measure $\lambda$ such that $B \mapsto \Lambda(B)$ is a measure. We follow Kingman [2];
see also Kingman [4].
First suppose that $\lambda(W) < \infty$. Use Theorem 26.1 to construct a sequence
$(N, Z_1, Z_2, \ldots)$
of independent variables on some triple $(\Omega, \mathcal{F}, P)$ where
(i) $N$ has the Poisson distribution with parameter $\lambda(W)$;
(ii) each $Z_k$ takes values in $(W, \mathcal{W})$ and has law $\lambda/\lambda(W)$.
Thus
$P(N = m) = e^{-\lambda(W)}\lambda(W)^m/m!$ ($m = 0, 1, 2, \ldots$),
$P(Z_k \in B) = \lambda(B)/\lambda(W)$ ($B \in \mathcal{W}$, $k \in \mathbb{N}$).
We now define $\Lambda$ to be the measure on $(W, \mathcal{W})$ with
$\Lambda(B, \omega) := \sum_{k=1}^{N(\omega)} I_B(Z_k(\omega))$ ($B \in \mathcal{W}$).
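Then, for $B \in \mathcal{W}$ and $r, m \in \mathbb{Z}^+$,
$P[\Lambda(B) = r;\ \Lambda(W \setminus B) = m] = P[N = r + m]\,P[\Lambda(B) = r \mid N = r + m]$
$= \dfrac{e^{-\lambda(W)}\lambda(W)^{r+m}}{(r+m)!} \cdot \dfrac{(r+m)!}{r!\,m!} \left[\dfrac{\lambda(B)}{\lambda(W)}\right]^r \left[1 - \dfrac{\lambda(B)}{\lambda(W)}\right]^m$
$= \dfrac{e^{-\lambda(B)}\lambda(B)^r}{r!} \cdot \dfrac{e^{-\lambda(W \setminus B)}\lambda(W \setminus B)^m}{m!}$,
so that $\Lambda(B)$ and $\Lambda(W \setminus B)$ are independent and have Poisson distributions with
parameters $\lambda(B)$ and $\lambda(W \setminus B)$ respectively. It is easy to give a full proof that $\Lambda$
has the required 'pre-Poisson' (and, indeed, now proper Poisson-measure)
structure.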
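The finite-intensity construction translates directly into a simulation. The sketch below is illustrative only: it takes the space to be the unit interval with intensity equal to 5 times Lebesgue measure (an assumption made for the example), samples the Poisson number of atoms by Knuth's multiplication method, and represents a realisation of the random measure by its list of atoms. The function names are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Poisson variate by Knuth's multiplication method (moderate means only)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_poisson_measure(total_mass, sample_point, rng):
    """Atoms Z_1, ..., Z_N of one realisation: N is Poisson(total_mass),
    and the Z_k are independent draws from the normalised intensity."""
    n = sample_poisson(total_mass, rng)
    return [sample_point(rng) for _ in range(n)]


def count(points, indicator):
    """Mass given to a set B: the number of atoms that fall in B."""
    return sum(1 for z in points if indicator(z))


rng = random.Random(0)                               # fixed seed
atoms = sample_poisson_measure(5.0, lambda r: r.random(), rng)
in_b = count(atoms, lambda z: z < 0.3)               # B = [0, 0.3)
outside_b = count(atoms, lambda z: z >= 0.3)         # complement of B
```

By construction in_b + outside_b equals the total number of atoms, and over repeated runs the two counts behave as independent Poisson variables with means 1.5 and 3.5, as the displayed calculation predicts.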
Now consider the case when $\lambda$ is assumed only σ-finite. Write $\lambda = \sum_{n \in \mathbb{N}} \lambda_n$,
where each $\lambda_n$ is a finite measure on $(W, \mathcal{W})$. Use Theorem 26.1 and the
construction just described for the case '$\lambda$ finite' to construct independent
Poisson measures $\Lambda_n$ on $(W, \mathcal{W})$, $\Lambda_n$ having intensity measure $\lambda_n$. Then it is
easily verified that $\Lambda := \sum \Lambda_n$ is a Poisson measure with intensity measure $\lambda$.
Of course, it now follows (but it hardly matters) that $\mu^*(\mathcal{M}) = 1$ in the context
of (34.3).
Subordinators. Recall from Section 1.28 that a Levy process is a right-continuous
process with stationary independent increments and that a subordinator is a
Levy process with non-decreasing paths. [Note. There are never enough symbols
to go round in mathematics. When we combine different ideas, we often find
conflict of commonly used notations. In the following discussion, we adjust the
notation of Section 1.28 so that it does not conflict with that which we have
recently been using.]
Let $X$ be a subordinator with $X(0) = 0$. The distribution of $X(1)$ is infinitely
divisible: for each $n$, it is the sum of $n$ independent random variables each with
the distribution of $X(1/n)$. For $\theta > 0$, we have
(37.1) $E \exp[-\theta X(t)] = \exp[-t\Psi(\theta)]$,
where we can regard the Laplace exponent $\Psi$ as defined by (37.1) with $t = 1$.
From Theorem 1.28.3, we have, for $\theta > 0$,
(37.2) $\Psi(\theta) = c\theta + \int_{(0,\infty)} (1 - e^{-\theta x})\,\nu(dx)$,
for some $c \ge 0$ and some measure $\nu$ on $(0,\infty)$, the Levy measure of $X$, with
(37.3) $\int_{(0,\infty)} \min(x, 1)\,\nu(dx) < \infty$,
the condition guaranteeing that, for some (then all) $\theta > 0$, the integral appearing
on the right-hand side of (37.2) is finite.
(37.4) THEOREM (Levy, Ito). Let $\nu$ satisfy the condition (37.3). Let $\Lambda$ be a
Poisson measure on $(0,\infty) \times (0,\infty)$ with intensity measure $\mathrm{Leb} \times \nu$. Let $c \ge 0$.
Define
(37.5) $X(t) := ct + \int_{s \in (0,t]}\int_{x \in (0,\infty)} x\,\Lambda(ds \times dx)$.
Then $X$ is a subordinator with Laplace exponent $\Psi$ as at (37.2).
Heuristic proof. It is (truly!) obvious from the independence properties of the
Poisson measure that $X$ is a subordinator. Formally, the number $J(t, dx)$ of
jumps of size between $x$ and $x + dx$ made by $X$ during time-interval $[0, t]$ is
Poisson with parameter $\beta := t\,\nu(dx)$. Now $X(t) = ct + \int x\,J(t, dx)$, the 'sum' of
independent bits. Moreover,
$E \exp[-\theta x J(t, dx)] = \sum_n e^{-\theta x n}\,\dfrac{e^{-\beta}\beta^n}{n!} = \exp[-\beta(1 - e^{-\theta x})]$
$= \exp[-t(1 - e^{-\theta x})\,\nu(dx)]$,
and the result follows.
Exercise. Make this heuristic proof rigorous. First consider the compound
Poisson process (see Section 1.28) obtained by removing all jumps of size less
than ε from X.
The process $H^+ = \{H^+_a : a \ge 0\}$ for Brownian motion. Let $B$ be a path-continuous
Brownian motion on $\mathbb{R}$, starting at 0. For $a \ge 0$, define
(37.6) $H^+(a) := \inf\{t > 0 : B_t > a\}$.
Then $H^+$ is a right-continuous non-decreasing process; and it is clear from the
strong Markov theorem and the spatial homogeneity of Brownian motion that
$H^+$ is a subordinator. From (1.9.1), with the $c$ there equal to 0, we have
(37.7) $E \exp(-\theta H^+_a) = \exp[-a\Psi(\theta)]$,
where
(37.8) $\Psi(\theta) = (2\theta)^{1/2} = \int_0^\infty (1 - e^{-\theta x})(2\pi x^3)^{-1/2}\,dx$,
so that our $c$ equals 0 and $\nu(dx) = (2\pi x^3)^{-1/2}\,dx$.
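The integral identity in (37.8) can be checked numerically. The sketch below is only a sanity check under stated assumptions, not part of any proof: it substitutes x = u^2 to remove the singularity at 0, applies composite Simpson's rule on a bounded interval, and adds the tail analytically. The function name laplace_exponent and the parameters upper and steps are hypothetical choices.

```python
import math


def laplace_exponent(theta, upper=50.0, steps=20000):
    """Approximate the integral of (1 - exp(-theta x)) (2 pi x^3)^(-1/2) over (0, inf).

    With x = u^2 the integrand becomes c * g(u), where c = 2 / sqrt(2 pi) and
    g(u) = (1 - exp(-theta u^2)) / u^2, which tends to theta as u -> 0 and
    decays like u^(-2). The number of steps must be even.
    """
    c = 2.0 / math.sqrt(2.0 * math.pi)

    def g(u):
        if u == 0.0:
            return theta                      # limiting value at the origin
        return -math.expm1(-theta * u * u) / (u * u)

    h = upper / steps
    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    body = total * h / 3.0                    # Simpson's rule on [0, upper]
    tail = 1.0 / upper                        # integral of u^(-2) over [upper, inf);
                                              # exp(-theta * upper^2) is negligible
    return c * (body + tail)


approx = laplace_exponent(2.0)
exact = math.sqrt(2.0 * 2.0)                  # (2 theta)^(1/2) at theta = 2
```

The tail correction assumes that theta times upper squared is large, so that the exponential term beyond the cut-off can be ignored; for very small theta the cut-off would need to be increased.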
Define the continuous non-decreasing process
$S_t := \sup_{s \le t} B_s$
as usual. Recall from (1.14) Levy's result that $Y := S - B$ defines a reflecting
Brownian motion $Y$. The jumps of $H^+$ correspond to intervals of constancy of
$S$ and to the intervals between visits to 0 by $Y$. Let $\mathcal{Z} := \{t : Y_t = 0\}$, the zero-set
for the reflecting Brownian motion $Y$. For $t \ge 0$ and $\varepsilon > 0$, let $N(t, \varepsilon)$ denote the
number of component intervals of $[0, t] \setminus \mathcal{Z}$ with length greater than $\varepsilon$. We
know that $N(H^+_a, \varepsilon)$ has a Poisson distribution of mean $a\,\nu(\varepsilon, \infty) = a(\tfrac12\pi\varepsilon)^{-1/2}$.
It is easy to prove by using martingale techniques and exploiting monotonicity
(see Exercise (79.71c)) that, almost surely,
(37.9) $(\tfrac12\pi\varepsilon)^{1/2} N(H^+_a, \varepsilon) \to a = S(H^+_a)$ ($\varepsilon \to 0$)
uniformly on compact $a$-intervals, and it follows that, almost surely,
$(\tfrac12\pi\varepsilon)^{1/2} N(t, \varepsilon) \to S(t)$ ($\varepsilon \to 0$)
uniformly on compact $t$-intervals. We have therefore constructed the local time
$\ell = S$ for the reflecting Brownian motion $Y$ directly from the set $\mathcal{Z}$ of times at
which $Y = 0$.
A striking and difficult result due to Levy, Wendel, Taylor and Hawkes (see
Hawkes [2]) tells us that $\ell(t)$ is the Hausdorff $h$-measure of $\mathcal{Z} \cap [0, t]$ associated
with the function
$h(\delta) := [2\delta \log\log(1/\delta)]^{1/2}$.
Heuristics. Chapter II is, as you will agree, following a Definition-Lemma-
Theorem approach. For the remainder of this section, however, we cast off the
shackles of rigour—for interest's sake. We shall return later to many of the
points considered here.
The intervals comprising the set $[0,\infty) \setminus \mathcal{Z}$ are the excursion intervals of $Y$
(away from 0). The lengths of these intervals are determined by the measure $\nu$.
We might therefore conclude as a heuristic principle that, given that an excursion
interval is of length at least $a$, the probability that it is of length at least $y$ is
(37.10) $\dfrac{\nu(y,\infty)}{\nu(a,\infty)} = \left(\dfrac{a}{y}\right)^{1/2}$.
We are now going to work with our BM$_0$ process $B$ rather than $Y$. The zero-set
for $B$ is the same as that for the reflecting Brownian motion $|B|$, and so has the
same structure as $\mathcal{Z}$.
Let $t > 0$, and define
(37.11) $\alpha_t := t - \sup\{s \le t : B_s = 0\}$,
$\beta_t := \inf\{u > t : B_u = 0\} - t$.
You might guess on the basis of (37.10) that
(37.12) $P(\beta_t > b \mid \alpha_t = a) = \left(\dfrac{a}{a+b}\right)^{1/2}$,
and Exercise (37b) in Section 38 shows that you would be right.
Azema's martingale. Define Azema's process $J$ and its natural filtration $\{\mathcal{J}_t\}$ by
(37.13) $J_t := \mathrm{sgn}(B_t)(2\alpha_t)^{1/2}$, $\mathcal{J}_t := \sigma\{J_s : s \le t\}$.
(37.14) THEOREM (Azema). $\{J_t\}$ and $\{J_t^2 - t\}$ are martingales relative to $\{\mathcal{J}_t\}$.
These processes are not martingales relative to the natural filtration of $B$.
Let us see how Azema's Theorem ties in with (37.12).
(37.15) Suppose that $\alpha_t = a$ for some fixed $t$ and $a$ with $0 < a < t$. Let $u > a$, and
let $T$ be the first time after time $t$ that $\alpha_T$ is either $u$ or 0. Then,
either, and with probability $(a/u)^{1/2}$,
$\alpha_T = u$ and $T - t = u - a$,
or, and with probability $(a/4v^3)^{1/2}\,dv$, for some $v$ with $a < v < u$,
$\alpha_T = 0$ and $T - t \in (v, v + dv) - a$.
Elementary calculations now show that, conditionally on (37.15),
$E J_T = J_t$, $E(J_T^2 - T) = J_t^2 - t$,
and these results start to make Azema's Theorem plausible.
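A much better explanation of Azema's result is provided by the following
facts: for $x \in \mathbb{R}$ and $a \le t$,
(37.16) $P(\alpha_t \in da;\ B_t \in dx) = \left[\dfrac{1}{\pi^2(t-a)a}\right]^{1/2} \dfrac{|x|}{2a} \exp\left(-\dfrac{x^2}{2a}\right) da\,dx$,
whence
(37.17) $P(B_t \in dx \mid \alpha_t \in da,\ \mathrm{sgn}(B_t) > 0) = \dfrac{x}{a} \exp\left(-\dfrac{x^2}{2a}\right) dx$ ($x > 0$).
See Exercise (37a) in the next section.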
If $\xi$ is a positive random variable with the probability density function on
the right-hand side of (37.17) then $E(\xi) = (\tfrac12\pi a)^{1/2}$ and $E(\xi^2) = 2a$. This strongly
suggests that
(37.18) $E(B_t \mid \mathcal{J}_t) = (\tfrac14\pi)^{1/2} J_t$, $E(B_t^2 - t \mid \mathcal{J}_t) = J_t^2 - t$;
and hence $\{J_t\}$ and $\{J_t^2 - t\}$ inherit their martingale properties relative to $\{\mathcal{J}_t\}$
from those of $\{B_t\}$ and $\{B_t^2 - t\}$ relative to the natural filtration of $B$. See
Exercise E79.71b.
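The two moments quoted for the density on the right-hand side of (37.17) follow from elementary integration. The computation below is a sketch; the substitution variable $u$ is introduced here for the purpose.

```latex
% Density of \xi on (0,\infty): (x/a) e^{-x^2/2a}.
\begin{align*}
E(\xi) &= \int_0^\infty x \cdot \frac{x}{a}\, e^{-x^2/2a}\,dx
        = \Bigl[-x e^{-x^2/2a}\Bigr]_0^\infty + \int_0^\infty e^{-x^2/2a}\,dx
        = \tfrac12 (2\pi a)^{1/2} = (\tfrac12 \pi a)^{1/2}, \\
E(\xi^2) &= \int_0^\infty x^2 \cdot \frac{x}{a}\, e^{-x^2/2a}\,dx
        = \int_0^\infty 2a\,u\,e^{-u}\,du = 2a
        \qquad (u = x^2/2a).
\end{align*}
```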
If $\ell$ now denotes local time at 0 for $B$, and $\tau_t := \inf\{u : \ell(u) > t\}$, then the
process $R$, where $R_t$ is the sum of the moduli of the jumps of $J$ by time $\tau_t$, is
clearly a subordinator. However, the number of jumps of modulus greater than
$\varepsilon$ made by $J$ by time $\tau_t$ equals the number of jumps of modulus greater than
$\tfrac12\varepsilon^2$ made by $\alpha$ by time $\tau_t$; and this, like $N(H^+_t, \tfrac12\varepsilon^2)$, has a Poisson distribution of
mean $t(\tfrac14\pi)^{-1/2}\varepsilon^{-1}$. Thus the Levy measure $\nu_R$ of $R$ satisfies $\nu_R(\varepsilon, \infty) = (\tfrac14\pi)^{-1/2}\varepsilon^{-1}$;
but this fails the integrability condition (37.3). You see the problem:
(37.19) Azema's martingale J is not of finite variation.
Azema's martingale has been the source of much interest recently, particularly
to workers in quantum probability. See Azema [1], Azema and Yor [3], Emery
[2], Meyer [12] and Revuz and Yor [1].
38. Exercises. (E38.25) Extend exercise E13.3b as follows. Show that if $(E, \mathcal{E})$ is
a measurable space and $T$ is a parameter set, then a function $\xi : E^T \to \mathbb{R}$ is
$\mathcal{E}^T$-measurable if and only if $\xi = f \circ \pi_S$ for some countable subset $S$ of $T$ and
some $\mathcal{E}^S$-measurable function $f : E^S \to \mathbb{R}$.
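(E38.29) Time-reversal for Brownian motion. Fix $t \ge 0$. We consider Brownian
motion with time-parameter set $[0, t]$. Let $\Omega := C([0, t]; \mathbb{R})$, and, for $\omega \in \Omega$ and
$s \in [0, t]$, write $B_s(\omega) := \omega(s)$, and define $\mathcal{A} := \sigma\{B_s : s \le t\}$. Let $P^x$ be the law of
Brownian motion starting at $x$, so that $P^x$ is the unique measure on $(\Omega, \mathcal{A})$
such that for $n \in \mathbb{N}$, for $0 = s_0 < s_1 < \cdots < s_n$ and $x_0, x_1, \ldots, x_n \in \mathbb{R}$ with $x_0 = x$,
we have
$P^x\left( \bigcap_{i=1}^n \{B(s_i) \in dx_i\} \right) = \prod_{i=1}^n p(s_i - s_{i-1}, x_{i-1}, x_i)\, dx_i,$
this making rigorous sense when integrated over $(x_1, x_2, \ldots, x_n)$ in a Borel subset
of $\mathbb{R}^n$.
Let $\hat{\ }$ be the time-reversal map on $\Omega$, so that
$\hat\omega(s) := \omega(t - s) \quad (0 \le s \le t).$
For $\xi \in m\mathcal{A}$, define $\hat\xi(\omega) := \xi(\hat\omega)$. Prove that, for $\xi \in (m\mathcal{A})^+$,
$\int_{\mathbb{R}} E^x(\xi)\, dx = \int_{\mathbb{R}} E^y(\hat\xi)\, dy \quad (\le \infty).$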
There are some measurability questions involved, which we shall study in detail
later—do not fret over these.
Hint. First take $\xi = I_A(B_0) I_H(B_s) I_C(B_t)$, where $0 \le s \le t$, and $A$, $H$ and $C$ are
Borel subsets of $\mathbb{R}$. Because $\int P^x\, dx$ is not a finite measure, you will have to do
some truncation. But the idea is the same as that used to show that the finite-
dimensional distributions determine the law of a process.
(E38.30) Lebesgue measure from coin tossing. Show that one can reverse the
argument in Section 23 as follows. Use the DK theorem to construct a sequence
of independent variables $(\xi_k : k \in \mathbb{N})$ each taking the values 0 and 1 with probability
$\tfrac12$ each. Define $X := \sum 2^{-k} \xi_k$. Then Lebesgue measure on $([0, 1], \mathcal{B}[0, 1])$ is the
law of $X$.
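The claim in (E38.30) can be made plausible by a finite computation. The sketch below is not a solution of the exercise: it truncates the series at `n` tosses (an assumption made so that the law can be enumerated exactly) and compares the distribution function of the truncated sum with that of the uniform law at a few arbitrary test points.

```python
from fractions import Fraction
from itertools import product

def truncated_law(n):
    """Exact law of X_n = sum_{k=1}^n 2^{-k} xi_k as a dict {value: probability}."""
    law = {}
    for bits in product((0, 1), repeat=n):
        x = sum(Fraction(b, 2**k) for k, b in enumerate(bits, start=1))
        law[x] = law.get(x, Fraction(0)) + Fraction(1, 2**n)
    return law

def cdf(law, x):
    """P(X_n <= x) for the finite law computed above."""
    return sum(p for value, p in law.items() if value <= x)

n = 10
law = truncated_law(n)
grid = [Fraction(j, 37) for j in range(38)]          # arbitrary test points in [0, 1]
max_gap = max(abs(cdf(law, x) - x) for x in grid)    # distance from the uniform cdf
```

Since $X$ and its truncation differ by at most $2^{-n}$, the fact that `max_gap` is at most $2^{-n}$ is consistent with $X$ having the uniform law.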
(E38.36) The object of this exercise is to confirm the point made in Note 36.3.
Let $\Omega := [0, 1]$, $\mathcal{H} := \mathcal{B}[0, 1]$, $\mu :=$ Leb on $(\Omega, \mathcal{H})$, and let $\mu^*$ be the outer
measure associated with $(\mu, \mathcal{H})$. Let $G$ be a subset of $\Omega$ with $\mu^*(G) = 1$ and
$\mu^*(G^c) = 1$, where, of course, $G^c := \Omega \setminus G$. Define $\mathcal{F} := \sigma(\mathcal{H}, G)$ and
$P(F) := \mu^*(G^c \cap F) \quad (F \in \mathcal{F}).$
Prove that $(\Omega, \mathcal{F}, P)$ is a probability triple. See Lemma 6.1. Let $Y(\omega) = \omega$ for
$\omega \in \Omega$. Prove that $Y$ has law $\mu$, but that, even though $\mu^*(G) = 1$, there is no
modification of $Y$ taking all its values in $G$.
(E38.37a) Use Exercise E38.29 and the result (1.9.2) (with с there equal to 0) to
prove (37.15). Now deduce (37.16).
Hint. Consider $\xi := f(B_0)\, g(\alpha_t)\, h(B_t)\, \eta$, where
$\eta := 1$ if $B_s = 0$ for some $s \in [0, t]$, and $\eta := 0$ otherwise.
(E38.37b) Prove (37.11).
(E38.37c) Modify the last two exercises to cope with the case when the Brownian
motion $B$ has drift $c$.
4. DISCRETE-PARAMETER MARTINGALE THEORY
Again, we follow [W] very closely. There, you will find the same notation, all
proofs not given here, and many illustrative examples. Neveu [5] gives a fine
broader picture of the scope of discrete-parameter martingale theory. Of course,
Doob [1] is the classic account. In that account, Doob emphasises the debt
we owe to Sparre Andersen and Jessen for their work on uniformly integrable
martingales.
After revising the theory of conditional expectation (due, of course, to
Kolmogorov), we concentrate on the Upcrossing Lemma, the Submartingale
Inequality, results on uniform integrability, the Optional-Stopping Theorem, the
Optional-Sampling Theorem (all, of course, due to Doob), and the 'Downward'
Convergence Theorem for supermartingales due to Levy and Doob. These
results are central to the extension to the continuous-parameter theory, which
occupies Part 5 of this chapter and dominates the remainder of both volumes.
CONVENTION: Until further notice, all random variables are $(\mathbb{R}, \mathcal{B})$-valued.
Conditional expectation
39. Fundamental theorem and definition. The following theorem and definition
constitute the greatest of Kolmogorov's many contributions to the subject.
(39.1) THEOREM and DEFINITION (a version of the conditional expectation
$E(X \mid \mathcal{G})$). Let $(\Omega, \mathcal{F}, P)$ be a triple, and $X$ a random variable with $E(|X|) < \infty$.
Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. Then there exists a random variable $Y$ such that
(i) $Y$ is $\mathcal{G}$-measurable;
(ii) $E(|Y|) < \infty$;
(iii) for every set $G$ in $\mathcal{G}$ (equivalently, for every set $G$ in some $\pi$-system that
contains $\Omega$ and generates $\mathcal{G}$), we have $E(Y; G) = E(X; G)$.
Moreover, if $\tilde{Y}$ is another random variable with these properties then $\tilde{Y} = Y$, a.s.,
that is, $P[\tilde{Y} = Y] = 1$. A random variable $Y$ with properties (i)-(iii) is called a
version of the conditional expectation $E(X \mid \mathcal{G})$ of $X$ given $\mathcal{G}$, and we write
$Y = E(X \mid \mathcal{G})$, a.s.
The Radon-Nikodym proof will be given shortly.
(39.2) The intuitive meaning. An experiment has been performed. The only
information available to you regarding which sample point $\omega$ has been chosen
is the set of values $Z(\omega)$ for every $\mathcal{G}$-measurable random variable $Z$, or,
equivalently, the values $I_G(\omega)$ for every $G \in \mathcal{G}$. Then $Y(\omega) = E(X \mid \mathcal{G})(\omega)$ is regarded
as (almost surely equal to) the 'expected value of $X(\omega)$ given this information'.
Note that if $\mathcal{G}$ is the trivial $\sigma$-algebra $\{\emptyset, \Omega\}$ (which contains no information)
then $E(X \mid \mathcal{G})(\omega) = E(X)$ for all $\omega$.
Proof of Theorem 39.1. Existence. Suppose that $X \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$. Consider
first the case when $X \ge 0$. Then, as we saw in Section 9, the map $G \mapsto E(X; G)$
is a finite measure on $(\Omega, \mathcal{G})$ that is absolutely continuous with respect to $P$.
Hence, by the Radon-Nikodym Theorem 9.3, there exists a $Y$ in $\mathcal{L}^1(\Omega, \mathcal{G}, P)$
such that
(39.3) $E(Y; G) = E(X; G)$
for all $G \in \mathcal{G}$. The existence of $Y$ is therefore established; and the general case
when $X \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$ follows by linearity.
Uniqueness. If $Y$ and $\tilde{Y}$ are in $\mathcal{L}^1(\Omega, \mathcal{G}, P)$ and $E(\tilde{Y} - Y; G) \le 0$ for every $G$ in
$\mathcal{G}$ then $\tilde{Y} \le Y$, a.s. For consider $E(\tilde{Y} - Y; G_n)$, where $G_n := \{\omega : (\tilde{Y} - Y)(\omega) > n^{-1}\}$,
etc.
The $\pi$-system formulation. Suppose that $X \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$, $Y \in \mathcal{L}^1(\Omega, \mathcal{G}, P)$, and
that $E(Y; G) = E(X; G)$ for every $G$ in some $\pi$-system containing $\Omega$ and generating
$\mathcal{G}$. By the Dominated-Convergence Theorem 8.5, the class of sets $G$ in $\mathcal{G}$ for
which (39.3) holds is a d-system in the sense at (1.6). By Dynkin's Lemma 1.8,
this class must coincide with $\mathcal{G}$. □
40. Notation; agreement with elementary usage. We often write $E(X \mid Z)$ for
$E(X \mid \sigma(Z))$, $E(X \mid Z_1, Z_2, \ldots)$ for $E(X \mid \sigma(Z_1, Z_2, \ldots))$, etc.
The case of two RVs will suffice to illustrate the connection between the
abstract definition 39.1 and elementary conditional expectation. So suppose
that $X$ and $Z$ are RVs that have a joint probability density function (pdf)
$f_{X,Z}(x, z)$. Then $f_Z(z) = \int_{\mathbb{R}} f_{X,Z}(x, z)\, dx$ acts as a probability density function for
$Z$. Define the elementary conditional pdf $f_{X|Z}$ of $X$ given $Z$ via
$f_{X|Z}(x \mid z) := f_{X,Z}(x, z) / f_Z(z)$ if $f_Z(z) \ne 0$, and $f_{X|Z}(x \mid z) := 0$ otherwise.
Let $h$ be a Borel function on $\mathbb{R}$ such that
$E|h(X)| = \int_{\mathbb{R}} |h(x)| f_X(x)\, dx < \infty,$
where of course $f_X(x) = \int_{\mathbb{R}} f_{X,Z}(x, z)\, dz$ gives a pdf for $X$. Set
$g(z) := \int_{\mathbb{R}} h(x) f_{X|Z}(x \mid z)\, dx.$
Then $Y := g(Z)$ is a version of the conditional expectation of $h(X)$ given $\sigma(Z)$.
Proof. The typical element of $\sigma(Z)$ has the form $\{\omega : Z(\omega) \in B\}$, where $B \in \mathcal{B}$.
Hence we must show that $E[h(X) I_B(Z)] = E[g(Z) I_B(Z)]$. But this follows from
Fubini's Theorem. □
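A discrete analogue may help fix ideas. In the sketch below the joint pdf is replaced by a joint probability mass function on a small grid (an assumption made so that everything can be computed exactly); the table `joint` and the function `h` are hypothetical. The function `g` is the elementary conditional expectation, and `check` compares the two sides of the defining property for a set $B$ of $z$-values.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Z); the weights sum to 1.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(1, 8), (2, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 8), (2, 1): Fraction(1, 8),
}

def h(x):
    return x * x          # any function of X will do here

def marginal_z(z):
    """Discrete counterpart of f_Z(z)."""
    return sum(p for (x, zz), p in joint.items() if zz == z)

def g(z):
    """Elementary conditional expectation of h(X) given Z = z (0 where f_Z vanishes)."""
    fz = marginal_z(z)
    if fz == 0:
        return Fraction(0)
    return sum(h(x) * p / fz for (x, zz), p in joint.items() if zz == z)

def check(B):
    """Return (E[h(X); Z in B], E[g(Z); Z in B]) for a set B of z-values."""
    lhs = sum(h(x) * p for (x, z), p in joint.items() if z in B)
    rhs = sum(g(z) * p for (x, z), p in joint.items() if z in B)
    return lhs, rhs
```

For every subset `B` of `{0, 1}` the two entries returned by `check(B)` agree, which is property (iii) of (39.1) for $Y = g(Z)$ in this finite setting.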
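41. Properties of conditional expectation: a list. This is the same list of properties
as in Section 9.7 (and on the back cover!) of [W]. All $X$s satisfy $E(|X|) < \infty$ in
this list of properties. Of course, $\mathcal{G}$ and $\mathcal{H}$ denote sub-$\sigma$-algebras of $\mathcal{F}$. (The
use of 'c' to denote 'conditional' in (cMon) etc. is obvious.)
(41)(a) If $Y$ is any version of $E(X \mid \mathcal{G})$ then $E(Y) = E(X)$.
(41)(b) If $X$ is $\mathcal{G}$-measurable then $E(X \mid \mathcal{G}) = X$, a.s.
(41)(c) (Linearity) $E(a_1 X_1 + a_2 X_2 \mid \mathcal{G}) = a_1 E(X_1 \mid \mathcal{G}) + a_2 E(X_2 \mid \mathcal{G})$, a.s.
Clarification. If $Y_1$ is a version of $E(X_1 \mid \mathcal{G})$ and $Y_2$ is a version of $E(X_2 \mid \mathcal{G})$ then
$a_1 Y_1 + a_2 Y_2$ is a version of $E(a_1 X_1 + a_2 X_2 \mid \mathcal{G})$.
(41)(d) (Positivity) If $X \ge 0$ then $E(X \mid \mathcal{G}) \ge 0$, a.s.
(41)(e) (cMon) If $0 \le X_n \uparrow X$ then $E(X_n \mid \mathcal{G}) \uparrow E(X \mid \mathcal{G})$, a.s.
(41)(f) (cFatou) If $X_n \ge 0$ then $E[\liminf X_n \mid \mathcal{G}] \le \liminf E[X_n \mid \mathcal{G}]$, a.s.
(41)(g) (cDom) If $|X_n(\omega)| \le K(\omega)$, $\forall n$, $E K < \infty$, and $X_n \to X$, a.s., then
$E(X_n \mid \mathcal{G}) \to E(X \mid \mathcal{G})$, a.s.
(41)(h) (cJensen) If $c : \mathbb{R} \to \mathbb{R}$ is convex and $E|c(X)| < \infty$ then
$E[c(X) \mid \mathcal{G}] \ge c(E[X \mid \mathcal{G}])$, a.s.
Important corollary. $\|E(X \mid \mathcal{G})\|_p \le \|X\|_p$ for $p \ge 1$.
(41)(i) (Tower Property) If $\mathcal{H} \subseteq \mathcal{G} \subseteq \mathcal{F}$ then
$E[E(X \mid \mathcal{G}) \mid \mathcal{H}] = E[X \mid \mathcal{H}]$, a.s.
Note. We shorten the left-hand side to $E[X \mid \mathcal{G} \mid \mathcal{H}]$ for tidiness.
(41)(j) ('Taking out what is known') If $Z$ is $\mathcal{G}$-measurable and bounded then
(*) $E[ZX \mid \mathcal{G}] = Z E[X \mid \mathcal{G}]$, a.s.
If $p > 1$, $p^{-1} + q^{-1} = 1$, $X \in \mathcal{L}^p(\Omega, \mathcal{F}, P)$ and $Z \in \mathcal{L}^q(\Omega, \mathcal{G}, P)$ then (*) again holds.
If $X \in (m\mathcal{F})^+$, $Z \in (m\mathcal{G})^+$, $E(X) < \infty$ and $E(ZX) < \infty$ then (*) holds.
(41)(k) (Role of independence) If $\mathcal{H}$ is independent of $\sigma(\sigma(X), \mathcal{G})$ then
$E[X \mid \sigma(\mathcal{G}, \mathcal{H})] = E(X \mid \mathcal{G})$, a.s.
In particular, if $X$ is independent of $\mathcal{H}$ then $E(X \mid \mathcal{H}) = E(X)$, a.s.
For proofs of all the above properties, see Section 9.8 of [W]. Do Exercise
E60.41 now.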
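On a finite sample space, a sub-$\sigma$-algebra is generated by a partition and conditional expectation is just averaging over the blocks, so properties such as (41)(a) and the Tower Property (41)(i) can be checked directly. The following sketch uses a hypothetical six-point space; the values of `X` and the two partitions are illustrative only.

```python
from fractions import Fraction

# Hypothetical finite probability space: six equally likely sample points.
omega = range(6)
prob = {w: Fraction(1, 6) for w in omega}
X = {0: 3, 1: -1, 2: 4, 3: 1, 4: -5, 5: 9}

# Partitions generating sigma-algebras H (coarse) and G (fine), with H inside G.
G_blocks = [{0, 1}, {2}, {3, 4}, {5}]
H_blocks = [{0, 1, 2}, {3, 4, 5}]

def cond_exp(f, blocks):
    """Conditional expectation of f given the sigma-algebra generated by a partition."""
    out = {}
    for block in blocks:
        mass = sum(prob[w] for w in block)
        avg = sum(f[w] * prob[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out

def expectation(f):
    return sum(f[w] * prob[w] for w in omega)

tower_lhs = cond_exp(cond_exp(X, G_blocks), H_blocks)   # E[E(X|G)|H]
tower_rhs = cond_exp(X, H_blocks)                       # E[X|H]
```

Here `tower_lhs` and `tower_rhs` coincide pointwise, and `expectation(cond_exp(X, G_blocks))` equals `expectation(X)`, in line with (41)(i) and (41)(a).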
42. The role of versions; regular conditional probabilities and pdfs. If we consider
conditional expectation as a map from $L^1(\Omega, \mathcal{F}, P)$ to $L^1(\Omega, \mathcal{G}, P)$, these spaces
being the proper Banach spaces of equivalence classes of functions, then this
map is truly uniquely defined with no untidy 'almost sure' qualifications and no
need for 'versions'. So why not do this in this 'elegant' way?
The answer is that the ability to choose 'good' versions—of the functions,
not the equivalence classes—is absolutely crucial to the whole theory. We shall
repeatedly see cases where we get the good results by modifying random
variables on null sets. Thus, for example, we want modifications of martingales
that have right-continuous paths; and concepts of path regularity are meaningless
if we work with equivalence classes.
In the remainder of this section, and in the next, we consider a rather important
case that in some, though not all, respects parallels our earlier discussion of
Poisson measures.
(42.1) DEFINITION (a version of conditional probability, $P(F \mid \mathcal{G})$). Let $F \in \mathcal{F}$
and let $\mathcal{G} \subseteq \mathcal{F}$. We call any version of $E(I_F \mid \mathcal{G})$ a version of the conditional
probability of $F$ given $\mathcal{G}$, and write $P(F \mid \mathcal{G}) = E(I_F \mid \mathcal{G})$, a.s.
By (41)(c) and the 'cMon' result (41)(e), we can show that for a fixed sequence
$(F_n)$ of disjoint elements of $\mathcal{F}$, we have
(42.2) $P(\bigcup F_n \mid \mathcal{G}) = \sum P(F_n \mid \mathcal{G})$, a.s.
Except in trivial cases, there are uncountably many sequences of disjoint sets;
and it is therefore not at all clear that we can choose a good modification
$\{(P \mid \mathcal{G})(F) : F \in \mathcal{F}\}$ of the process $\{P(F \mid \mathcal{G}) : F \in \mathcal{F}\}$. Let us formulate what we mean
by a good modification.
(42.3) DEFINITION (regular conditional probability given $\mathcal{G}$). Let $(\Omega, \mathcal{F}, P)$
be a triple and let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. By a regular conditional probability
$(P \mid \mathcal{G})(\cdot, \cdot)$ given $\mathcal{G}$, we mean a map
(42.4)(a) $(P \mid \mathcal{G}) : \mathcal{F} \times \Omega \to [0, 1]$
such that
(42.4)(b) for $F \in \mathcal{F}$, the function $\omega \mapsto (P \mid \mathcal{G})(F, \omega)$ is a version of $P(F \mid \mathcal{G})$; for almost
every $\omega$, the map
(42.4)(c) $F \mapsto (P \mid \mathcal{G})(F, \omega)$
is a probability measure on $\mathcal{F}$.
It is known, and is proved in Section 89, that regular conditional probabilities
exist under most conditions encountered in practice, but, as we shall see in the
next section, they do not always exist.
Note. The elementary conditional pdf $f_{X|Z}(x \mid z)$ of Section 40 is a regular
conditional pdf for $X$ given $Z$ in that for every $A$ in $\mathcal{B}$,
$\omega \mapsto \int_A f_{X|Z}(x \mid Z(\omega))\, dx$ is a version of $P(X \in A \mid Z)$.
Proof. Take $h = I_A$ in Section 40.
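43. A counterexample. This counterexample, for which Halmos, Dieudonné,
Andersen and Jessen share the credit, exhibits a situation in which no regular
conditional probability given $\mathcal{G}$ exists. It helps emphasise why we need some
extra 'topological' hypothesis such as that used for the positive result in
Section 89.
Take
$(\Omega, \mathcal{G}) := ([0, 1], \mathcal{B}[0, 1]).$
Let $\mu$ denote Lebesgue measure on $(\Omega, \mathcal{G})$. Let $Z$ be a subset of $\Omega$ of inner
$\mu$-measure 0 and outer $\mu$-measure 1. (We are assuming the Axiom of Choice!)
Let $\mathcal{F}$ be the smallest $\sigma$-algebra on $\Omega$ extending $\mathcal{G}$ and containing $Z$, so that
a typical element $\Lambda$ of $\mathcal{F}$ may be written (with $Z^c$ denoting $[0, 1] \setminus Z$)
$\Lambda = (Z \cap A) \cup (Z^c \cap B)$, where $A, B \in \mathcal{G}$.
The fact that $Z$ and $Z^c$ have outer measure 1 implies (see Lemma 6.1) that
$\mu^*(Z \cap \Lambda) = \mu^*(Z \cap A) = \mu(A),$
$\mu^*(Z^c \cap \Lambda) = \mu^*(Z^c \cap B) = \mu(B).$
Hence we can define a probability measure $P$ on $(\Omega, \mathcal{F})$ by
$P(\Lambda) := \tfrac12 \mu^*(Z \cap \Lambda) + \tfrac12 \mu^*(Z^c \cap \Lambda) = \tfrac12 \mu(A) + \tfrac12 \mu(B).$
Assume that $(P \mid \mathcal{G}) : \mathcal{F} \times \Omega \to [0, 1]$ is a regular conditional probability given $\mathcal{G}$.
We shall show that this assumption leads to a contradiction.
Let $\Gamma \in \mathcal{G}$. Then, for $G \in \mathcal{G}$,
$E((P \mid \mathcal{G})(Z \cap \Gamma); G) = P(Z \cap \Gamma \cap G) = \tfrac12 \mu^*(Z \cap \Gamma \cap G) = \tfrac12 \mu(\Gamma \cap G) = E(\tfrac12 I_\Gamma; G),$
so that
$(P \mid \mathcal{G})(Z \cap \Gamma, \omega) = \tfrac12 I_\Gamma(\omega)$, a.s.
Since $\mathcal{G}$ is generated by a countable $\pi$-system $\mathcal{I}$, and since
$\Gamma \mapsto (P \mid \mathcal{G})(Z \cap \Gamma, \omega), \quad \Gamma \mapsto \tfrac12 I_\Gamma(\omega)$
are measures for every $\omega$ (the first because of our assumption), the set
$J := \{\omega : (P \mid \mathcal{G})(Z \cap \Gamma)(\omega) = \tfrac12 I_\Gamma(\omega), \forall \Gamma \in \mathcal{G}\}$
$= \{\omega : (P \mid \mathcal{G})(Z \cap \Gamma)(\omega) = \tfrac12 I_\Gamma(\omega), \forall \Gamma \in \mathcal{I}\}$
is in $\mathcal{G}$, and $P(J) = \mu(J) = 1$. (The argument now takes on a 'Russell's paradox'
appearance. The set $J$ is itself an element of $\mathcal{G}$.) If $\omega \in J$ then
$(P \mid \mathcal{G})(Z \cap J, \omega) = \tfrac12 I_J(\omega) \ne \tfrac12 I_{J \setminus \{\omega\}}(\omega) = (P \mid \mathcal{G})(Z \cap [J \setminus \{\omega\}], \omega),$
so that $Z \cap J \ne Z \cap [J \setminus \{\omega\}]$; in other words, $\omega \in Z$. Hence $J$, which is an element
of $\mathcal{G}$ of measure 1, is a subset of $Z$, contradicting the fact that $Z$ has inner
measure 0.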
44. A uniform-integrability property of conditional expectations. The reason that
the martingale and UI properties tie in so well is the following.
(44.1) THEOREM (Doob). Let $X \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$. Then the family
$\{E(X \mid \mathcal{G}) : \mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{F}\}$
is uniformly integrable.
Clarification. Because of the business of versions, a formal description of the
family in question would be the set of all random variables $Y$ with the property
that for some $\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$, $Y = E(X \mid \mathcal{G})$, a.s.
Proof. Let $\varepsilon > 0$ be given. Use Lemma 20.1 to choose $\delta > 0$ such that, for $F \in \mathcal{F}$,
$P(F) < \delta$ implies that $E(|X|; F) < \varepsilon$.
Choose $K$ so that $K^{-1} E(|X|) < \delta$.
Now let $\mathcal{G} \subseteq \mathcal{F}$ and let $Y$ be a version of $E(X \mid \mathcal{G})$. By the 'cJensen' property
in Section 41,
(44.2) $|Y| \le E(|X| \mid \mathcal{G})$, a.s.
Hence, as in fact we already know, $E(|Y|) \le E(|X|)$, and
$K P(|Y| > K) \le E(|Y|) \le E(|X|),$
so that $P(|Y| > K) < \delta$. But $\{\omega : |Y(\omega)| > K\} \in \mathcal{G}$, and, from (44.2) and the
definition of conditional expectation,
$E(|Y|; |Y| > K) \le E(|X|; |Y| > K) < \varepsilon,$
and this is the desired uniform-integrability property. □
(Discrete-parameter) martingales and supermartingales
45. Filtration, filtered space; adapted process; natural filtration. Let $(\Omega, \mathcal{F}, P)$
be given.
(45.1) DEFINITIONS (filtration, filtered space). By a filtration on $(\Omega, \mathcal{F}, P)$,
we mean an increasing family $\{\mathcal{F}_n : n \in \mathbb{Z}^+\}$ of sub-$\sigma$-algebras of $\mathcal{F}$:
$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}.$
The set-up $(\Omega, \mathcal{F}, P, \{\mathcal{F}_n : n \in \mathbb{Z}^+\})$ is then called a filtered space.
(45.2) CONVENTION. Until further notice, we assume given a filtered space
$(\Omega, \mathcal{F}, P, \{\mathcal{F}_n : n \in \mathbb{Z}^+\}).$
All of our martingales, supermartingales etc. will be defined relative to this set-up.
(45.3) DEFINITION (adapted process). A process $X = \{X_n : n \in \mathbb{Z}^+\}$ carried by
$(\Omega, \mathcal{F}, P)$ is said to be adapted (to our given filtration) if for every $n \in \mathbb{Z}^+$, $X_n$ is
$\mathcal{F}_n$-measurable.
(45.4) DEFINITION (natural filtration). Let $W = \{W_n : n \in \mathbb{Z}^+\}$ be a stochastic
process carried by our triple $(\Omega, \mathcal{F}, P)$. The natural filtration $\{\mathcal{W}_n : n \in \mathbb{Z}^+\}$ of $W$
is defined to be the smallest filtration relative to which $W$ is adapted, so that
$\mathcal{W}_n = \sigma(W_0, W_1, \ldots, W_n).$
(45.5) The intuitive meanings. The information about the chosen $\omega$ that is
available to us at time $n$ consists of the values $Z_n(\omega)$ for every $\mathcal{F}_n$-measurable
random variable $Z_n$. A process $X$ is adapted if the value $X_n(\omega)$ is known to us
at time $n$. Usually, $\{\mathcal{F}_n\}$ is the natural filtration $\{\mathcal{W}_n : n \in \mathbb{Z}^+\}$ of some process
$W$, and then the information about $\omega$ which we have at time $n$ consists of the
values $W_0(\omega), W_1(\omega), \ldots, W_n(\omega)$. A process $X$ is then adapted if and only if, for
each $n$, $X_n = f_n(W_0, \ldots, W_n)$ for some $f_n \in m\mathcal{B}^{n+1}$. See E38.25.
46. Martingale; supermartingale; submartingale. As already explained, these
concepts are defined relative to our given filtered space in (45.2).
(46.1) THE KEY DEFINITIONS. A process $X$ is called a martingale if
(i) $X$ is adapted;
(ii) $E(|X_n|) < \infty$, $\forall n$;
(iii) $E[X_n \mid \mathcal{F}_{n-1}] = X_{n-1}$, a.s. $(n \ge 1)$.
A supermartingale is defined similarly, except that (iii) is replaced by
$E[X_n \mid \mathcal{F}_{n-1}] \le X_{n-1}$, a.s. $(n \ge 1)$,
and a submartingale is defined with (iii) replaced by
$E[X_n \mid \mathcal{F}_{n-1}] \ge X_{n-1}$, a.s. $(n \ge 1)$.
A supermartingale 'decreases on average'; a submartingale 'increases on average'.
The 'p' points down, the 'b' up! Of course, Chapter I has explained how
'superharmonic' corresponds to 'local supermartingale', which was the reason
for this choice of terminology.
Note that $X$ is a supermartingale if and only if $-X$ is a submartingale, and
that $X$ is a martingale if and only if it is both a supermartingale and a
submartingale. It is important to note that a process $X$ for which $X_0 \in \mathcal{L}^1(\Omega, \mathcal{F}_0, P)$
is a martingale (respectively, supermartingale, submartingale) if and only if the
process $X - X_0 = (X_n - X_0 : n \in \mathbb{Z}^+)$ has the same property. So we can focus
attention on processes that are null at 0.
If $X$ is, for example, a supermartingale, then the Tower Property of conditional
expectations, (41)(i), shows that, for $m < n$,
$E[X_n \mid \mathcal{F}_m] = E[X_n \mid \mathcal{F}_{n-1} \mid \mathcal{F}_m] \le E[X_{n-1} \mid \mathcal{F}_m] \le \cdots \le X_m$, a.s.
(46.2) Gambling interpretation. Think of $X_n - X_{n-1}$ as your winnings per unit
stake on a gambling game. The game is unfavourable to you if X is a
supermartingale, favourable to you if X is a submartingale, and fair if X is a
martingale.
47. Previsible process; gambling strategy; a fundamental principle. We now study
the discrete-parameter analogue of stochastic integrals.
(47.1) DEFINITION (previsible process). We call a process $(C_n : n \in \mathbb{N})$ previsible
if, for each $n \in \mathbb{N}$, $C_n$ is $\mathcal{F}_{n-1}$-measurable.
Note that $C_0$ is not defined.
Think of $C_n$ as your stake on game $n$. You have to decide on the value of
$C_n$ based on the history up to (and including) time $n - 1$. This is the intuitive
significance of the 'previsible' character of $C$. Your winnings on game $n$ are
$C_n (X_n - X_{n-1})$ and your total winnings up to time $n$ are
(47.2) $Y_n = \sum_{1 \le k \le n} C_k (X_k - X_{k-1}) =: (C \bullet X)_n.$
Note that $(C \bullet X)_0 = 0$, and that
$Y_n - Y_{n-1} = C_n (X_n - X_{n-1}).$
(47.3) DEFINITION (martingale transform, stochastic integral). The process
$C \bullet X$ is called the martingale transform of $X$ by $C$, or the (discrete) stochastic
integral of $C$ with respect to $X$.
(47.4) THEOREM. You can't beat the system!
(i) Let $C$ be a bounded non-negative previsible process, so that, for some $K$ in
$[0, \infty)$, $|C_n(\omega)| \le K$ for every $n$ and every $\omega$. Let $X$ be a supermartingale
(respectively martingale). Then $C \bullet X$ is a supermartingale (martingale) null
at 0.
(ii) If $C$ is a bounded previsible process and $X$ is a martingale, then $(C \bullet X)$ is a
martingale null at 0.
(iii) In (i) and (ii) the boundedness condition on $C$ may be replaced by the
condition $C_n \in \mathcal{L}^2$, $\forall n$, provided we also insist that $X_n \in \mathcal{L}^2$, $\forall n$.
Proof of (i). Write $Y$ for $C \bullet X$. Since $C_n$ is bounded non-negative and $\mathcal{F}_{n-1}$-
measurable, we have, from (41)(j),
$E[Y_n - Y_{n-1} \mid \mathcal{F}_{n-1}] = C_n E[X_n - X_{n-1} \mid \mathcal{F}_{n-1}] \le 0$ (resp. $= 0$).
Proofs of (ii) and (iii) are now obvious. (Look again at (41)(j).) □
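The content of Theorem 47.4 can be illustrated by exact enumeration. In the sketch below, $X$ is taken to be a simple symmetric random walk (a martingale) and the stake is a bounded 'doubling' rule; both are hypothetical choices made for illustration, as are the names `transform` and `doubling`. The stake for game $n$ is computed from the first $n - 1$ increments only, which is what previsibility requires.

```python
from fractions import Fraction
from itertools import product

def transform(stake, steps):
    """Return the path of (C.X)_n, n = 0..N, for a path of increments of X.

    stake(history) gives C_n from the increments seen up to time n - 1.
    """
    total, path = 0, [0]
    for n, step in enumerate(steps):
        total += stake(steps[:n]) * step
        path.append(total)
    return path

def doubling(history):
    """Hypothetical bounded strategy: double the stake after each loss, capped at 8."""
    stake = 1
    for step in history:
        stake = min(2 * stake, 8) if step < 0 else 1
    return stake

N = 8
mean_final = Fraction(0)
for steps in product((-1, 1), repeat=N):      # all 2^N equally likely paths
    mean_final += Fraction(transform(doubling, steps)[-1], 2**N)
```

The value of `mean_final` is exactly 0: whatever bounded previsible stake is used, the expected total winnings on a fair game vanish.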
48. Doob's Upcrossing Lemma. Doob's use of upcrossings is one of the most
sparkling things in the theory.
(48.1) DEFINITION (number of upcrossings). Let $X$ be a supermartingale. The
number $U_N(X; [a, b])(\omega)$ of upcrossings of $[a, b]$ made by $n \mapsto X_n(\omega)$ by time $N$
is defined to be the largest $k$ in $\mathbb{Z}^+$ such that we can find
$0 \le s_1 < t_1 < s_2 < t_2 < \cdots < s_k < t_k \le N$
with
$X_{s_i}(\omega) < a, \quad X_{t_i}(\omega) > b \quad (1 \le i \le k).$
Regard $X_n - X_{n-1}$ as representing your winnings per unit stake on game $n$.
Consider your total-winnings process $Y := C \bullet X$ under the previsible strategy $C$
described as follows:
Pick two numbers $a$ and $b$ with $a < b$.
Repeat
Wait until $X$ gets below $a$
Play unit stakes until $X$ gets above $b$ and stop playing
Until False (that is, forever!).
To be more formal (and to prove inductively that $C$ is previsible), define
$C_1 := I_{\{X_0 < a\}},$
and, for $n \ge 2$,
$C_n := I_{\{C_{n-1} = 1\}} I_{\{X_{n-1} < b\}} + I_{\{C_{n-1} = 0\}} I_{\{X_{n-1} < a\}}.$
The fundamental inequality (recall that $Y_0(\omega) := 0$)
(48.2) $Y_N(\omega) \ge (b - a) U_N(X; [a, b])(\omega) - [X_N(\omega) - a]^-$
is now obvious: every upcrossing of $[a, b]$ increases the $Y$-value by at least $b - a$,
while the $[X_N(\omega) - a]^-$ overemphasises the loss during the last 'interval of play'.
(Draw a picture, or see [W].)
(48.3) THEOREM (Doob's Upcrossing Lemma). Let $X$ be a supermartingale.
Let $U_N(X; [a, b])$ be the number of upcrossings of $[a, b]$ by time $N$. Then
$(b - a) E U_N(X; [a, b]) \le E[(X_N - a)^-].$
Very Important Note. The number of steps does not feature directly on the
right-hand side; only the final variable $X_N$ appears. It is the fact that we get a
bound independent of the number of steps that makes this result so powerful.
Proof. The process $C$ is previsible, bounded and non-negative, and $Y = C \bullet X$.
Hence $Y$ is a supermartingale, and $E(Y_N) \le 0$. The result now follows from (48.2).
□
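The pathwise inequality (48.2) holds for every sequence of real numbers, not only for supermartingale paths, so it can be checked on simulated paths. The sketch below counts upcrossings greedily and runs the 'wait below $a$, play until $b$ is reached' strategy; the random-walk paths, the seed and the levels `a`, `b` are illustrative assumptions.

```python
import random

def upcrossings(path, a, b):
    """Count completed upcrossings of [a, b]: a value below a followed later by a value above b."""
    count, below = 0, False
    for x in path:
        if not below and x < a:
            below = True
        elif below and x > b:
            below = False
            count += 1
    return count

def winnings(path, a, b):
    """Total winnings Y_N of the strategy C defined before (48.2)."""
    total, stake = 0.0, 0
    for n in range(1, len(path)):
        prev = path[n - 1]
        # C_n depends only on C_{n-1} and X_{n-1}, so the strategy is previsible.
        if stake == 1:
            stake = 1 if prev < b else 0
        else:
            stake = 1 if prev < a else 0
        total += stake * (path[n] - path[n - 1])
    return total

rng = random.Random(0)              # fixed seed so that the run is reproducible
a, b = -1.5, 1.5
ok = True
for _ in range(200):
    path = [0]
    for _ in range(60):
        path.append(path[-1] + rng.choice((-1, 1)))
    shortfall = max(a - path[-1], 0.0)                      # [X_N - a]^-
    ok = ok and winnings(path, a, b) >= (b - a) * upcrossings(path, a, b) - shortfall
```

For the short path `[0, -2, -1, 0, 2, 3, -2, -3]` the strategy wins 3 in total, there is one upcrossing, and the right-hand side of (48.2) is $3 - 1.5 = 1.5$.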
(48.4) COROLLARY. Let $X$ be a supermartingale that is bounded in $\mathcal{L}^1$ in that
$\sup_n E(|X_n|) < \infty$. Let $a, b \in \mathbb{Q}$ with $a < b$. Then, with $U_\infty(X; [a, b]) := \uparrow\lim_N U_N(X; [a, b])$,
$(b - a) E U_\infty(X; [a, b]) \le |a| + \sup_n E(|X_n|) < \infty,$
so that $P(U_\infty(X; [a, b]) = \infty) = 0$.
Proof. By Lemma 48.3, we have, for $N \in \mathbb{N}$,
$(b - a) E U_N(X; [a, b]) \le |a| + E(|X_N|) \le |a| + \sup_n E(|X_n|).$
Now let $N \uparrow \infty$, using the Monotone-Convergence Theorem. □
49. Doob's Supermartingale-Convergence Theorem. Doob's proof is worthy of
the result.
(49.1) THEOREM (Doob's Supermartingale-Convergence Theorem). Let $X$ be
a supermartingale bounded in $\mathcal{L}^1$: $\sup_n E(|X_n|) < \infty$. Then, almost surely, $X_\infty :=
\lim X_n$ exists and is finite. For definiteness, we define $X_\infty(\omega) := \limsup X_n(\omega)$, $\forall \omega$,
so that $X_\infty$ is $\mathcal{F}_\infty$-measurable and $X_\infty = \lim X_n$, a.s.
Proof (Doob). Write (noting the use of $[-\infty, \infty]$):
$\Lambda := \{\omega : X_n(\omega)$ does not converge to a limit in $[-\infty, \infty]\}$
$= \{\omega : \liminf X_n(\omega) < \limsup X_n(\omega)\}$
$= \bigcup_{\{a, b \in \mathbb{Q} : a < b\}} \{\omega : \liminf X_n(\omega) < a < b < \limsup X_n(\omega)\}$
$=: \bigcup \Lambda_{a,b}$ (say).
But
$\Lambda_{a,b} \subseteq \{\omega : U_\infty(X; [a, b])(\omega) = \infty\},$
so that, by Corollary 48.4, $P(\Lambda_{a,b}) = 0$. Since $\Lambda$ is a countable union of sets $\Lambda_{a,b}$, we see
that $P(\Lambda) = 0$, whence
$X_\infty := \lim X_n$ exists a.s. in $[-\infty, \infty]$.
But Fatou's Lemma shows that
$E(|X_\infty|) = E(\liminf |X_n|) \le \liminf E(|X_n|) \le \sup E(|X_n|) < \infty,$
so that $P(X_\infty$ is finite$) = 1$. □
Note. There are other proofs for the discrete-parameter case. None of these is
as probabilistic, and none shares the central importance of this one for the
continuous-parameter case.
(49.2) COROLLARY. If $X$ is a non-negative supermartingale, then $X_\infty := \lim X_n$
exists almost surely.
Proof. $X$ is obviously bounded in $\mathcal{L}^1$, since $E(|X_n|) = E(X_n) \le E(X_0)$. □
50. $\mathcal{L}^1$ convergence and the UI property. It is important to know when
supermartingales converge in $\mathcal{L}^1$.
(50.1) THEOREM. Let $X$ be a supermartingale bounded in $\mathcal{L}^1$, so that $X_\infty :=
\lim X_n$ exists a.s. Then $X_n \to X_\infty$ in $\mathcal{L}^1$ if and only if $X = \{X_n : n \in \mathbb{Z}^+\}$ is uniformly
integrable, and then, for $n \in \mathbb{Z}^+$,
(50.2) $E(X_\infty \mid \mathcal{F}_n) \le X_n$, a.s.
with a.s. equality if $X$ is a (UI) martingale.
Proof. Because of Theorem 21.2 on the equivalence of $\mathcal{L}^1$ convergence and the
UI property, all that remains is to prove that (50.2) holds if $X_n \to X_\infty$ in $\mathcal{L}^1$.
But then, for $F \in \mathcal{F}_n$, and $r \ge n$,
$E(X_r; F) \le E(X_n; F),$
and (50.2) follows on letting $r \to \infty$.
(50.3) THEOREM (Lévy's 'Upward' Theorem). Let $\xi \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$, and, for
$n \ge 0$, define $M_n := E(\xi \mid \mathcal{F}_n)$, a.s. Then $M$ is a UI martingale and
$M_n \to \eta := E(\xi \mid \mathcal{F}_\infty),$
almost surely and in $\mathcal{L}^1$.
Proof. We know that $M$ is a martingale because of the Tower Property 41(i).
We know from Theorem 44.1 that $M$ is UI. Hence $M_\infty := \lim M_n$ exists a.s. and
in $\mathcal{L}^1$, and it remains only to prove that $M_\infty = \eta$, a.s., where $\eta := E(\xi \mid \mathcal{F}_\infty)$.
However, for $F \in \mathcal{F}_n$,
$E(\xi; F) = E(M_n; F) = E(M_\infty; F),$
so that $E(\eta; F) = E(\xi; F) = E(M_\infty; F)$ for all $F$ in the $\pi$-system $\bigcup \mathcal{F}_n$ that generates $\mathcal{F}_\infty$. By
property 39.1(iii) in the definition of conditional expectation, the result follows.
(50.4) THEOREM (Kolmogorov's 0-1 Law). Let $X_1, X_2, \ldots$ be a sequence of
independent RVs. Define
$\mathcal{T}_n := \sigma(X_{n+1}, X_{n+2}, \ldots), \quad \mathcal{T} := \bigcap_n \mathcal{T}_n.$
Then if $F \in \mathcal{T}$, $P(F) = 0$ or 1.
Proof. Define $\mathcal{F}_n := \sigma(X_1, X_2, \ldots, X_n)$. Let $F \in \mathcal{T}$, and let $\eta := I_F$. Since $\eta \in b\mathcal{F}_\infty$,
Lévy's Upward Theorem shows that
$\eta = E(\eta \mid \mathcal{F}_\infty) = \lim E(\eta \mid \mathcal{F}_n)$, a.s.
However, for each $n$, $\eta$ is $\mathcal{T}_n$-measurable, and hence is independent of $\mathcal{F}_n$. Hence,
by 41(k),
$E(\eta \mid \mathcal{F}_n) = E(\eta) = P(F)$, a.s.
Hence $\eta = P(F)$, a.s.; and since $\eta$ only takes the values 0 and 1, the result follows.
□
For another nice application of the Upward Theorem, see Exercise 60.50.
51. The Lévy-Doob Downward Theorem. This theorem is crucial for the
continuous-parameter theory. We follow the account at T.V.21 in Meyer [2].
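(51.1) THEOREM (Lévy-Doob Downward Theorem). Suppose that $(\Omega, \mathcal{F}, P)$
is a probability triple, and that $\{\mathcal{G}_n : n \in -\mathbb{N}\}$ is a collection of sub-$\sigma$-algebras of
$\mathcal{F}$ such that (for $k, n \in \mathbb{N}$)
$\mathcal{G}_{-\infty} := \bigcap_k \mathcal{G}_{-k} \subseteq \cdots \subseteq \mathcal{G}_{-(n+1)} \subseteq \mathcal{G}_{-n} \subseteq \cdots \subseteq \mathcal{G}_{-1}.$
Let $X = \{X_n : n \in -\mathbb{N}\}$ be a supermartingale relative to $\{\mathcal{G}_n : n \in -\mathbb{N}\}$, so that
$E(X_n \mid \mathcal{G}_m) \le X_m$, a.s. $(m \le n \le -1)$.
Assume that $\sup_{n \le -1} E(X_n) < \infty$. Then the process $X$ is UI, and the limit
$X_{-\infty} := \lim_{n \to -\infty} X_n$
exists a.s. and in $\mathcal{L}^1$. Further, for $n \le -1$,
$E(X_n \mid \mathcal{G}_{-\infty}) \le X_{-\infty}$, a.s.,
with a.s. equality if $X$ is a martingale.
Proof. We prove the UI property; the existence (a.s. and $\mathcal{L}^1$) of $X_{-\infty}$ then
follows from the Upcrossing Lemma just as in the case of the Supermartingale-
Convergence Theorem 49.1.
Let $\varepsilon > 0$ be given. Since
$\uparrow\lim_{n \downarrow -\infty} E X_n < \infty,$
there exists $k$ such that
(51.2) $0 \le E(X_n) - E(X_k) \le \tfrac12 \varepsilon$ for all $n \le k$.
Now, for $n \le k$ and $\lambda > 0$,
$E(|X_n|; |X_n| > \lambda) = -E(X_n; X_n < -\lambda) + E(X_n) - E(X_n; X_n \le \lambda)$
$\le -E(X_k; X_n < -\lambda) + E(X_n) - E(X_k; X_n \le \lambda),$
by the supermartingale property. Hence, by (51.2),
$E(|X_n|; |X_n| > \lambda) \le E(|X_k|; |X_n| > \lambda) + \tfrac12 \varepsilon.$
Since $X_k \in \mathcal{L}^1$, Lemma 20.1 shows that we can find $\delta > 0$ such that
$P(F) < \delta$ implies that $E(|X_k|; F) < \tfrac12 \varepsilon$.
But $P(|X_n| > \lambda) \le \lambda^{-1} E(|X_n|)$, and, since $X^- := \max\{-X, 0\}$ is a submartingale
by 'cJensen',
$E(|X_n|) = E(X_n) + 2 E(X_n^-) \le \sup_n E(X_n) + 2 E(X_{-1}^-).$
We may therefore choose $K$ such that
$P(|X_n| > K) < \delta$ whenever $n \le k$,
$E(|X_j|; |X_j| > K) < \varepsilon$ whenever $j > k$.
Then $E(|X_n|; |X_n| > K) < \varepsilon$ for every $n \le -1$, so that $X$ is UI. □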
Kolmogorov's Strong Law of Large Numbers is a consequence.
(51.3) THEOREM (SLLN). Let $X_1, X_2, \ldots$ be independent, identically distributed
random variables, with $E(|X_k|) < \infty$ for some (then every) $k$. Let $\mu$ be the common
value of $E(X_n)$. Write $S_n := X_1 + X_2 + \cdots + X_n$. Then
$n^{-1} S_n \to \mu$ a.s. and in $\mathcal{L}^1$.
Proof. Define
$\mathcal{G}_{-n} := \sigma(S_n, S_{n+1}, S_{n+2}, \ldots), \quad \mathcal{G}_{-\infty} := \bigcap_n \mathcal{G}_{-n}.$
Then, for $n \ge 1$,
$E(X_1 \mid \mathcal{G}_{-n}) = E(X_2 \mid \mathcal{G}_{-n}) = \cdots = E(X_n \mid \mathcal{G}_{-n}) = n^{-1} E(S_n \mid \mathcal{G}_{-n}) = n^{-1} S_n$, a.s.
Hence $L := \lim n^{-1} S_n$ exists a.s. and in $\mathcal{L}^1$. For definiteness, define $L := \limsup n^{-1} S_n$
for every $\omega$. Then, for each $k$,
$L = \limsup_n \frac{X_{k+1} + \cdots + X_{k+n}}{n},$
so that $L \in m\mathcal{T}_k$, where $\mathcal{T}_k = \sigma(X_{k+1}, X_{k+2}, \ldots)$. By Kolmogorov's 0-1 law,
$P(L = c) = 1$ for some $c$ in $\mathbb{R}$. But
$c = E(L) = \lim E(n^{-1} S_n) = \mu$. □
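A simulation gives a feel for the statement of (51.3), though of course it proves nothing. The sketch below assumes samples that are uniform on $[0, 1)$, so that $\mu = \tfrac12$; the seed and sample size are arbitrary.

```python
import random

def running_means(samples):
    """Return the sequence n^{-1} S_n for the partial sums S_n of `samples`."""
    total, means = 0.0, []
    for n, x in enumerate(samples, start=1):
        total += x
        means.append(total / n)
    return means

rng = random.Random(2024)                         # fixed seed, so the run is reproducible
samples = [rng.random() for _ in range(20_000)]   # i.i.d. uniform on [0, 1), mean 1/2
means = running_means(samples)
```

The last entries of `means` lie close to $\tfrac12$, as the theorem leads one to expect.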
Remarks. See Meyer [2] for important extensions and applications of the results
given so far in this chapter. These extensions include the Hewitt-Savage 0-1
Law, de Finetti's Theorem on exchangeable random variables, and the Choquet-
Deny Theorem on bounded harmonic functions for random walks on groups.
52. Doob's Submartingale and $\mathcal{L}^p$ Inequalities. Many uses are made of the
inequalities in this section. We return to the standard situation in which our
time-parameter set is $\mathbb{Z}^+$.
(52.1) THEOREM (Doob's Submartingale Inequality). Let $Z$ be a non-negative
submartingale. Then, for $c > 0$ and $n \in \mathbb{Z}^+$,
$c P\left( \sup_{k \le n} Z_k \ge c \right) \le E\left( Z_n; \sup_{k \le n} Z_k \ge c \right) \le E(Z_n).$
Important Notes. The number $n$ of steps does not feature directly in the last
two expressions. This is what gives the result its power. The proof will show
that the assumption that $Z$ is non-negative is not needed for the first inequality.
Proof. Let $F := \{\sup_{k \le n} Z_k \ge c\}$. Then $F$ is a disjoint union
$F = F_0 \cup F_1 \cup \cdots \cup F_n,$
where
$F_0 := \{Z_0 \ge c\},$
$F_k := \{Z_0 < c\} \cap \{Z_1 < c\} \cap \cdots \cap \{Z_{k-1} < c\} \cap \{Z_k \ge c\}.$
Now, $F_k \in \mathcal{F}_k$, and $Z_k \ge c$ on $F_k$. Hence
$E(Z_n; F_k) \ge E(Z_k; F_k) \ge c P(F_k).$
Summing over $k$ now yields the result. □
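The three quantities in (52.1) can be computed exactly in a small case. The sketch below takes $Z = |S|$, where $S$ is a simple symmetric random walk started at 0 (an illustrative choice, for which $Z$ is a non-negative submartingale), and enumerates all $2^n$ paths; the function name `doob_terms` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def doob_terms(n, c):
    """Exact values of c*P(max_k Z_k >= c), E(Z_n; max_k Z_k >= c) and E(Z_n) for Z = |S|."""
    weight = Fraction(1, 2**n)
    prob_hit = restricted_mean = mean = Fraction(0)
    for steps in product((-1, 1), repeat=n):
        s, running_max = 0, 0
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))
        mean += abs(s) * weight
        if running_max >= c:
            prob_hit += weight
            restricted_mean += abs(s) * weight
    return c * prob_hit, restricted_mean, mean

left, middle, right = doob_terms(n=10, c=3)
```

The returned values satisfy `left <= middle <= right`; with `n=2` and `c=2` all three equal 1, so the inequality can be sharp.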
The main reason for the usefulness of the above theorem is the following.
(52.2) LEMMA. If Μ is a martingale, с is a convex function and E\c(Mn)\ < oo, Vn,
then c(M) is a submartingale.
Proof. Apply the conditional form of Jensen's inequality in Table 41. Π
In preparation for Doob's $\mathcal{L}^p$ inequality, we now establish a consequence of
Hölder's inequality.
(52.3) LEMMA. Suppose that $X$ and $Y$ are non-negative random variables such
that
$c P(X \ge c) \le E(Y; X \ge c)$ for every $c > 0$.
Then, for $p > 1$ and $p^{-1} + q^{-1} = 1$, we have
$\|X\|_p \le q \|Y\|_p.$
Proof. We obviously have
(52.4) $L := \int_{c=0}^\infty p c^{p-1} P(X \ge c)\, dc \le \int_{c=0}^\infty p c^{p-2} E(Y; X \ge c)\, dc =: R.$
Using Fubini's Theorem with non-negative integrands, we obtain
$L = \int_{c=0}^\infty \left( \int_\Omega I_{\{X \ge c\}}(\omega) P(d\omega) \right) p c^{p-1}\, dc$
$= \int_\Omega \left( \int_{c=0}^{X(\omega)} p c^{p-1}\, dc \right) P(d\omega) = E(X^p).$
Exactly similarly, we find that
$R = E(q X^{p-1} Y).$
We apply Hölder's inequality to conclude that
(52.5) $E(X^p) \le E(q X^{p-1} Y) \le q \|Y\|_p \|X^{p-1}\|_q.$
Suppose that $\|Y\|_p < \infty$, and suppose for now that $\|X\|_p < \infty$ also. Then, since
$(p - 1) q = p$, we have
$\|X^{p-1}\|_q = E(X^p)^{1/q},$
so (52.5) implies that $\|X\|_p \le q \|Y\|_p$. For general $X$, note that the hypothesis
remains true for $X \wedge n$. Hence $\|X \wedge n\|_p \le q \|Y\|_p$ for all $n$, and the result follows
using the Monotone-Convergence Theorem. □
(52.6) THEOREM (Doob's $\mathcal{L}^p$ inequality). Let $p > 1$ and define $q$ so that
$p^{-1} + q^{-1} = 1$. (a) Let $Z$ be a non-negative submartingale bounded in $\mathcal{L}^p$, and define
(this is standard notation)
$Z^* := \sup_k Z_k.$
Then $Z^* \in \mathcal{L}^p$, and indeed
(52.7) $\|Z^*\|_p \le q \sup_r \|Z_r\|_p.$
The submartingale $Z$ is therefore dominated by the element $Z^*$ of $\mathcal{L}^p$. Also,
$Z_\infty := \lim Z_n$ exists a.s. and in $\mathcal{L}^p$, and
$\|Z_\infty\|_p = \sup_r \|Z_r\|_p = \uparrow\lim_r \|Z_r\|_p.$
(b) If $Z$ is of the form $|M|$, where $M$ is a martingale bounded in $\mathcal{L}^p$, then
$M_\infty := \lim M_n$ exists a.s. and in $\mathcal{L}^p$, and of course $Z_\infty = |M_\infty|$, a.s.
Proof. For $n \in \mathbb{Z}^+$, define $Z_n^* := \sup_{k \le n} Z_k$. From Doob's Submartingale Inequality
and the above Lemma, we see that
$\|Z_n^*\|_p \le q \|Z_n\|_p \le q \sup_r \|Z_r\|_p.$
Property (52.7) now follows from the Monotone-Convergence Theorem. Since
$(-Z)$ is a supermartingale bounded in $\mathcal{L}^p$, and therefore in $\mathcal{L}^1$, we know that
$Z_\infty := \lim Z_n$ exists a.s. However,
$|Z_n - Z_\infty|^p \le (2 Z^*)^p \in \mathcal{L}^1,$
so that the Dominated-Convergence Theorem shows that $Z_n \to Z_\infty$ in $\mathcal{L}^p$. Jensen's
inequality shows that $\|Z_r\|_p$ is non-decreasing in $r$, and all the rest is
straightforward. □
53. Martingales in $\mathcal{L}^2$; orthogonality of increments. Let $M = (M_n : n \ge 0)$ be a
martingale in $\mathcal{L}^2$ in that each $M_n$ is in $\mathcal{L}^2$ so that $E(M_n^2) < \infty$, $\forall n$. Then for
$s, t, u, v \in \mathbb{Z}^+$, with $s \le t \le u \le v$, we know from properties (a) and (j) of Table 41
that for $Z \in \mathcal{L}^2(\mathcal{F}_u)$,
$E(Z M_v) = E E(Z M_v \mid \mathcal{F}_u) = E[Z E(M_v \mid \mathcal{F}_u)] = E(Z M_u),$
so that $M_v - M_u$ is orthogonal to $\mathcal{L}^2(\mathcal{F}_u)$ and, in particular,
(53.1) $E[(M_t - M_s)(M_v - M_u)] = 0.$
Hence the formula
$M_n = M_0 + \sum_{k=1}^n (M_k - M_{k-1})$
expresses $M_n$ as the sum of orthogonal terms, and Pythagoras's theorem yields
(53.2) $E(M_n^2) = E(M_0^2) + \sum_{k=1}^n E[(M_k - M_{k-1})^2].$
The following theorem is therefore obvious.
(53.3) THEOREM. Let $M$ be a martingale for which $M_n \in \mathcal{L}^2$, $\forall n$. Then $M$ is
bounded in $\mathcal{L}^2$ if and only if
$\sum E[(M_k - M_{k-1})^2] < \infty;$
and when this obtains, Theorem 52.6 implies that
$M_n \to M_\infty$ almost surely and in $\mathcal{L}^2$.
54. Doob decomposition. In the following theorem, the statement that '$A$ is a
previsible process null at 0' means of course that $A_0 = 0$ and $A_n \in m\mathcal{F}_{n-1}$ $(n \in \mathbb{N})$.
(54.1) THEOREM (Doob decomposition). (i) Let $(X_n : n \in \mathbb{Z}^+)$ be an adapted
process with $X_n \in \mathcal{L}^1$, $\forall n$. Then $X$ has a Doob decomposition
(54.2) $X = X_0 + M + A,$
where $M$ is a martingale null at 0, and $A$ is a previsible process null at 0. Moreover,
this decomposition is unique modulo indistinguishability in the sense that if
$X = X_0 + \tilde{M} + \tilde{A}$ is another such decomposition then
$P(M_n = \tilde{M}_n, A_n = \tilde{A}_n, \forall n) = 1.$
(ii) The process $X$ is a submartingale if and only if $A$ is an increasing process
in the sense that
$P(A_n \le A_{n+1}, \forall n) = 1.$
Proof. If $X$ has a Doob decomposition as at (54.2) then, since $M$ is a martingale
and $A$ is previsible, we have, almost surely,
$E(X_n - X_{n-1} \mid \mathcal{F}_{n-1}) = E(M_n - M_{n-1} \mid \mathcal{F}_{n-1}) + E(A_n - A_{n-1} \mid \mathcal{F}_{n-1})$
$= 0 + (A_n - A_{n-1}).$
Hence
(54.3) $A_n = \sum_{k=1}^n E(X_k - X_{k-1} \mid \mathcal{F}_{k-1})$, a.s.,
and if we use (54.3) to define $A$, we obtain the required decomposition of $X$.
The 'submartingale' result in Part (ii) of the theorem is now obvious. □
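Formula (54.3) is constructive, and on a finite coin-tossing space it can be evaluated exactly. The sketch below takes $X_n = S_n^2$ for a simple symmetric random walk $S$ (an illustrative choice) and computes each conditional expectation by averaging over the paths that share the first $k - 1$ tosses; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

N = 5
paths = list(product((-1, 1), repeat=N))   # equally likely coin-toss sequences

def X(n, path):
    """X_n = S_n^2, where S_n is the sum of the first n tosses of `path`."""
    return sum(path[:n]) ** 2

def cond_increment(k, path):
    """E(X_k - X_{k-1} | F_{k-1}) on `path`: average over paths sharing the first k-1 tosses."""
    same = [q for q in paths if q[:k - 1] == path[:k - 1]]
    return Fraction(sum(X(k, q) - X(k - 1, q) for q in same), len(same))

def A(n, path):
    """Previsible part, computed from (54.3)."""
    return sum((cond_increment(k, path) for k in range(1, n + 1)), Fraction(0))

def M(n, path):
    """Martingale part M = X - X_0 - A."""
    return X(n, path) - X(0, path) - A(n, path)
```

In this example the computation returns $A_n = n$ on every path, so that $M_n = S_n^2 - n$, and $A$ is increasing as Part (ii) predicts for the submartingale $S^2$.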
(54.4) Remark. The Doob-Meyer decomposition, which expresses a submartingale
in continuous time as the sum of a local martingale and a previsible
increasing process, is a deep result that is the foundation stone for stochastic-integral
theory. It is proved in full generality in Chapter VI.
A useful estimate. The following estimate, which makes no positivity hypothesis,
is often useful.
(54.5) LEMMA. If $X$ is a submartingale or supermartingale then, for $N \in \mathbb{Z}^+$ and
$c > 0$,
$c P\left( \sup_{k \le N} |X_k| \ge 3c \right) \le 4 E(|X_0|) + 3 E(|X_N|).$
Proof. Let $X$ be a submartingale with Doob decomposition
$X = X_0 + M + A,$
where $A$ is increasing. Then
$\sup_{k \le N} |X_k| \le |X_0| + \sup_{k \le N} |M_k| + \sup_{k \le N} |A_k| = |X_0| + \sup_{k \le N} |M_k| + A_N.$
Thus, using the fact that $|M|$ is a non-negative submartingale and the
Submartingale Inequality, we have, for $c > 0$,
$c P\left( \sup_{k \le N} |X_k| \ge 3c \right) \le c P(|X_0| \ge c) + c P\left( \sup_{k \le N} |M_k| \ge c \right) + c P(A_N \ge c)$
$\le E(|X_0|) + E(|M_N|) + E(A_N)$
$= E(|X_0|) + E(|X_N - X_0 - A_N|) + E(A_N)$
$\le E(|X_0|) + E(|X_N|) + E(|X_0|) + 2 E(A_N)$
$= 2 E(|X_0|) + E(|X_N|) + 2 E(X_N - X_0)$
$\le 4 E(|X_0|) + 3 E(|X_N|).$
If $X$ is a supermartingale, apply the result just obtained to the submartingale
$(-X)$. □
55. The <M> and [M] processes. The continuous-parameter analogues of these
processes allow the stochastic integral to be defined. In both discrete and
continuous time, a host of celebrated inequalities (Burkholder-Davis-Gundy,
John-Nirenberg etc.) are associated with them. See, for example, Garsia [1]
and Neveu [5] for the discrete case.
(55.1) DEFINITION (the angle-brackets process $\langle M \rangle$). Let M be a martingale
in $\mathcal{L}^2$ and null at 0. Then $M^2$ is a submartingale with (essentially unique) Doob
decomposition
$$M^2 = N + A,$$
where N is a martingale and A is a previsible increasing process, both N and A
being null at 0. The process A is written $\langle M \rangle$, and called the angle-brackets
process of M.
Define $A_\infty := \uparrow\lim A_n$, a.s. Since $E(M_n^2) = E(A_n)$, we see that
(55.2) M is bounded in $\mathcal{L}^2$ if and only if $E(A_\infty) < \infty$.
It is important to note that
(55.3) $A_n - A_{n-1} = E(M_n^2 - M_{n-1}^2 \mid \mathcal{F}_{n-1}) = E[(M_n - M_{n-1})^2 \mid \mathcal{F}_{n-1}].$
This is reflected in the following result.
(55.4) THEOREM and DEFINITION (the [M] process). Again, let M be a
martingale in $\mathcal{L}^2$ and null at 0. Define
$$[M]_n := \sum_{k=1}^{n} (M_k - M_{k-1})^2.$$
Then
(55.5) $M^2 - [M] = C \bullet M$, where $C_n := 2M_{n-1}$,
and
$$V := M^2 - [M]$$
is a martingale. If M is bounded in $\mathcal{L}^2$ then the martingale V is uniformly integrable.
Proof. The result (55.5) is elementary, and implies that V is a martingale because
of Theorem 47.4(iii). If M is bounded in $\mathcal{L}^2$ then, by Doob's $\mathcal{L}^2$ Inequality,
the process $M^2$ is dominated by $M^{*2}$, which is in $\mathcal{L}^1$, and the process [M] is
dominated by $[M]_\infty$, which is also in $\mathcal{L}^1$. Hence V is dominated in $\mathcal{L}^1$, and is
therefore UI. □
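The elementary identity (55.5) is a statement about each individual path, so it can be checked by direct arithmetic. A minimal sketch follows; the function name `square_bracket_check` and the integer increments are hypothetical choices made only for illustration.

```python
def square_bracket_check(increments):
    """Return (M_n^2, [M]_n, (C.M)_n) along one path with M_0 = 0 and C_k = 2 M_{k-1}."""
    m = 0            # current value M_{k-1}
    bracket = 0      # running value of [M]
    transform = 0    # running value of the martingale transform C . M
    for d in increments:
        transform += 2 * m * d      # C_k (M_k - M_{k-1}), using M_{k-1} before the update
        bracket += d * d            # (M_k - M_{k-1})^2
        m += d
    return m * m, bracket, transform

m2, qv, cm = square_bracket_check([1, -1, -1, 2, 3, -5])
```

For this path `m2 - qv` and `cm` agree, as (55.5) requires; the identity is just $M_k^2 - M_{k-1}^2 = 2M_{k-1}(M_k - M_{k-1}) + (M_k - M_{k-1})^2$ summed over k.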
Stopping times, optional stopping and optional sampling
56. Stopping time. The discrete-parameter theory is easy. The continuous-
parameter theory is more challenging.
(56.1) DEFINITION (stopping time). A map $T: \Omega \to \{0, 1, 2, \ldots; \infty\}$ is called a
stopping time if
(56.2) $\{T \le n\} = \{\omega : T(\omega) \le n\} \in \mathcal{F}_n$, $\forall n \le \infty$,
equivalently, if
(56.3) $\{T = n\} = \{\omega : T(\omega) = n\} \in \mathcal{F}_n$, $\forall n \le \infty$.
Note that T can be ∞. The equivalence of (56.2) and (56.3) is trivial.
Example. Suppose that $(A_n)$ is an adapted process, and that $B \in \mathcal{B}$. Let
$$T = \inf\{n \ge 0 : A_n \in B\} = \text{time of first entry of A into set B}.$$
By convention, $\inf(\emptyset) = \infty$, so that $T = \infty$ if A never enters set B. Obviously,
$$\{T \le n\} = \bigcup_{k \le n} \{A_k \in B\} \in \mathcal{F}_n,$$
so that T is a stopping time.
Conversely, if T is a stopping time, and we define the process $I_{[T,\infty)}$ by writing,
for $0 \le n < \infty$ and $\omega \in \Omega$,
$$I_{[T,\infty)}(n, \omega) := \begin{cases} 1 & \text{if } n \ge T(\omega), \\ 0 & \text{otherwise,} \end{cases}$$
then $I_{[T,\infty)}$ is adapted (check!), and T is the first entry time of this process into
the set {1}.
57. Optional-stopping theorems. Let X be a supermartingale, and let T be a
stopping time. For $n \ge 1$, regard $X_n - X_{n-1}$ as your fortune per unit stake on
game n. Suppose that you always bet 1 unit and quit playing at (immediately
after) time T. Then your 'stake process' is $C^{(T)}$, where, for $n \in \mathbb{N}$,
$$C^{(T)}_n = I_{\{n \le T\}}, \quad \text{so that} \quad C^{(T)}_n(\omega) = \begin{cases} 1 & \text{if } n \le T(\omega), \\ 0 & \text{otherwise.} \end{cases}$$
Your 'winnings process' is the process with value at time n equal to
$$(C^{(T)} \bullet X)_n = X_{T \wedge n} - X_0.$$
If $X^T$ denotes the process X stopped at T,
$$X^T_n(\omega) := X_{T(\omega) \wedge n}(\omega),$$
then
$$C^{(T)} \bullet X = X^T - X_0.$$
Now $C^{(T)}$ is clearly bounded (by 1) and non-negative. Moreover, $C^{(T)}$ is previsible
because $C^{(T)}_n$ can only be 0 or 1 and, for $n \in \mathbb{N}$,
$$\{C^{(T)}_n = 0\} = \{T \le n - 1\} \in \mathcal{F}_{n-1}.$$
Theorem 47.4 now yields the following result.
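The identity between the winnings process and the stopped process can be seen on a single path. The sketch below is illustrative only: the helper names `first_entry_time` and `winnings`, the sample path and the target set are assumptions, and `None` is used to stand for $T = \infty$.

```python
def first_entry_time(path, target):
    """T = inf{n >= 0 : path[n] in target}; None stands for T = infinity."""
    for n, value in enumerate(path):
        if value in target:
            return n
    return None

def winnings(path, T):
    """(C^(T) . X)_n for n = 0..len(path)-1, with stake C^(T)_k = 1 if k <= T else 0."""
    total, out = 0, [0]
    for k in range(1, len(path)):
        stake = 1 if (T is None or k <= T) else 0
        total += stake * (path[k] - path[k - 1])
        out.append(total)
    return out

path = [0, 1, 0, -1, -2, -1, -2, -3]          # one sample path of X
T = first_entry_time(path, {-2})              # first entry into the set {-2}
stopped = [path[min(n, T)] for n in range(len(path))]   # the stopped path X^T
```

On this path `winnings(path, T)` coincides term by term with `stopped` minus `path[0]`. Note that the stake for game k is decided by whether $T \le k - 1$, which is information available at time k - 1.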
(57.1) THEOREM (stopped supermartingales are supermartingales). The
following results hold.
(i) If X is a supermartingale and T is a stopping time, then the stopped process
$X^T = (X_{T \wedge n} : n \in \mathbb{Z}^+)$ is a supermartingale, so that, in particular,
(57.2) $E(X_{T \wedge n}) \le E(X_0)$, $\forall n$.
(ii) If X is a martingale and T is a stopping time, then $X^T$ is a martingale, so
that, in particular,
(57.3) $E(X_{T \wedge n}) = E(X_0)$, $\forall n$.
It is important to notice that this theorem imposes no extra integrability
conditions whatsoever (except of course for those implicit in the definition of
supermartingale and martingale).
But we have to be careful. Let X be a simple random walk on $\mathbb{Z}$ (with
probabilities $\tfrac{1}{2}$ of jumping to a nearest neighbour), starting at 0. Then X is a martingale
relative to its natural filtration. Let T be the stopping time:
$$T := \inf\{n : X_n = 1\}.$$
It is well known that $P(T < \infty) = 1$. However, even though (57.3) holds for every
n, we have $1 = E(X_T) \ne E(X_0) = 0$. It is important to know when we can say
that
$$E(X_T) = E(X_0)$$
for a martingale X. The following theorem gives some sufficient conditions.
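The contrast between (57.3) and the failure of $E(X_T) = E(X_0)$ can be made concrete by computing the law of $X_{T \wedge n}$ exactly. This is a sketch under stated assumptions: the function name `stopped_walk_law` and the horizon 50 are hypothetical choices.

```python
from fractions import Fraction

def stopped_walk_law(n):
    """Exact law of X_{T ^ n} for simple random walk from 0, T = inf{k : X_k = 1}."""
    law = {0: Fraction(1)}
    half = Fraction(1, 2)
    for _ in range(n):
        new = {}
        for x, p in law.items():
            if x == 1:                          # already stopped: remain at 1
                new[1] = new.get(1, 0) + p
            else:                               # still running: fair +-1 step
                for y in (x - 1, x + 1):
                    new[y] = new.get(y, 0) + half * p
        law = new
    return law

law = stopped_walk_law(50)
mean = sum(x * p for x, p in law.items())       # E(X_{T ^ 50})
prob_stopped = law[1]                           # P(T <= 50)
```

The mean is exactly 0 for every horizon, in agreement with (57.3), while `prob_stopped` is already large: most of the mass sits at 1, and it is balanced by a small amount of mass far to the left. That escaping mass is what prevents passage to the limit in (57.3).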
(57.4) THEOREM (Doob's Optional-Stopping Theorem). The following results
hold. (a) Let T be a stopping time. Let X be a supermartingale. Then $X_T$ is
integrable and
$$E(X_T) \le E(X_0)$$
in each of the following situations:
(i) T is bounded (for some N in $\mathbb{N}$, $T(\omega) \le N$, $\forall \omega$);
(ii) X is bounded (for some K in $\mathbb{R}^+$, $|X_n(\omega)| \le K$ for every n and every ω) and
T is a.s. finite;
(iii) $E(T) < \infty$, and, for some K in $\mathbb{R}^+$,
$$|X_n(\omega) - X_{n-1}(\omega)| \le K, \quad \forall (n, \omega).$$
(b) If any of the conditions (i)-(iii) holds and X is a martingale then $E(X_T) = E(X_0)$.
Proof of (a). We know that $X_{T \wedge n}$ is integrable, and
(57.5) $E(X_{T \wedge n} - X_0) \le 0.$
For (i), we can take n = N. For (ii), we can let $n \to \infty$ in (57.5) using the Bounded-
Convergence Theorem 21.1. For (iii), we have
$$|X_{T \wedge n} - X_0| = \Big| \sum_{k=1}^{T \wedge n} (X_k - X_{k-1}) \Big| \le KT$$
and $E(KT) < \infty$, so that the Dominated-Convergence Theorem 8.5 justifies
letting $n \to \infty$ in (57.5) to obtain the answer we want.
Proof of (b). Apply (a) to X and to (-X). □
(57.6) Awaiting the almost inevitable. In order to be able to apply result (iii) of
Part (a) of the above theorem, we need ways of proving that (when true!)
$E(T) < \infty$. The following announcement of the principle that 'whatever always
stands a reasonable chance of happening will almost surely happen, sooner rather
than later' is often useful.
(57.7) LEMMA. Suppose that T is a stopping time such that for some N in $\mathbb{N}$
and some $\varepsilon > 0$, we have, for every n in $\mathbb{N}$,
$$P(T \le n + N \mid \mathcal{F}_n) > \varepsilon, \quad \text{a.s.}$$
Then $E(T) < \infty$.
The proof is left as an exercise.
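As an illustration of the hypothesis (not a proof of the lemma), take T to be the exit time of simple random walk from the interval (-a, b). From any interior point, at most a + b - 1 consecutive upward steps force an exit, so the hypothesis holds with N = a + b and ε = 2^{-(a+b)}. The sketch below, in which the function name `survival_probabilities` and the values a = 2, b = 3 are assumptions, computes P(T > k) exactly and compares it with the geometric bound $(1 - \varepsilon)^k$ at times kN.

```python
from fractions import Fraction

def survival_probabilities(a, b, steps):
    """P(T > k), k = 0..steps, for T = exit time of simple random walk from (-a, b)."""
    half = Fraction(1, 2)
    alive = {0: Fraction(1)}            # law of X_k restricted to the event {T > k}
    out = [Fraction(1)]
    for _ in range(steps):
        new = {}
        for x, p in alive.items():
            for y in (x - 1, x + 1):
                if -a < y < b:          # keep only mass that has not yet exited
                    new[y] = new.get(y, 0) + half * p
        alive = new
        out.append(sum(alive.values(), Fraction(0)))
    return out

a, b = 2, 3
N, eps = a + b, Fraction(1, 2 ** (a + b))
surv = survival_probabilities(a, b, 6 * N)
```

Since $E(T) = \sum_{k \ge 0} P(T > k)$, the partial sum `sum(surv)` is a lower bound for E(T); for this walk the exact value is known to be ab.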
(57.8) THEOREM (Doob's Supermartingale Inequalities). Let X be a non-
negative supermartingale and T a stopping time. Then
(57.9) $E(X_T) \le E(X_0).$
Moreover, for $c \ge 0$,
(57.10) $c\,P\big(\sup_k X_k > c\big) \le E(X_0).$
Proof. The result (57.9) is obtained by applying Fatou's Lemma to (57.2). The
result (57.10) is then obtained by taking $T := \inf\{n : X_n > c\}$ in (57.9). □
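Inequality (57.10) can be checked exactly over a finite horizon for a simple non-negative martingale. In this sketch the product martingale $X_k = \prod_{j \le k} (1 + s_j/2)$ with fair signs $s_j$, the function name `max_exceedance_probability`, and the values of the horizon and c are all illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def max_exceedance_probability(horizon, c):
    """P(max_{k <= horizon} X_k > c) for X_k = prod_{j <= k} (1 + s_j / 2), X_0 = 1."""
    hits = 0
    for signs in product((-1, 1), repeat=horizon):
        x, exceeded = Fraction(1), False
        for s in signs:
            x *= Fraction(2 + s, 2)     # factor 1/2 or 3/2, each with probability 1/2
            if x > c:
                exceeded = True
                break
        hits += exceeded
    return Fraction(hits, 2 ** horizon)

c = 3
p = max_exceedance_probability(8, c)
```

Because the maximum over a finite horizon is at most the supremum over all time, (57.10) gives `c * p <= 1` here, with $E(X_0) = 1$.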
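58. The pre-T σ-algebra $\mathcal{F}_T$. Let T be a stopping time.
(58.1) DEFINITION (the pre-T σ-algebra $\mathcal{F}_T$). For $F \subseteq \Omega$, we write $F \in \mathcal{F}_T$ if
$$F \cap \{T \le n\} \in \mathcal{F}_n \quad \text{for every } n \in \mathbb{Z}^+ \cup \{\infty\},$$
or, equivalently, if
$$F \cap \{T = n\} \in \mathcal{F}_n \quad \text{for every } n \in \mathbb{Z}^+ \cup \{\infty\}.$$
(58.2) The intuitive meaning. We regard the σ-algebra $\mathcal{F}_T$ as consisting of those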
events whose occurrence or non-occurrence can be decided from what our
observer has seen up to and including time T. Note how well the following
lemma therefore ties in with our intuition.
(58.3) LEMMA. Let S and T be stopping times.
(i) If X is an adapted process then $X_T \in m\mathcal{F}_T$.
(ii) If $S \le T$ then $\mathcal{F}_S \subseteq \mathcal{F}_T$.
(iii) $\mathcal{F}_{S \wedge T} = \mathcal{F}_S \cap \mathcal{F}_T$.
(iv) If $F \in \mathcal{F}_{S \vee T}$ then $F \cap \{S \le T\} \in \mathcal{F}_T$.
The proof is left as an easy exercise.
59. Optional sampling. This is an area in which one has to be careful. It is easy
to devise fallacious quick 'proofs' of the following theorem that, one then realises,
assume Part (ii) of the theorem. One is, for example, tempted to presuppose
that $X_{T \wedge n}$ converges to $X_T$ in $\mathcal{L}^1$. The fact that Part (ii) of the theorem is false
in continuous time (though it is then still true for UI martingales) helps indicate
the need for care. We give a proof that very clearly connects the so-called 'class
(D) property' in Part (ii) with the existence of a Doob decomposition.
(59.1) THEOREM (Doob's Optional-Sampling Theorem). Let S and T be
stopping times with $0 \le S \le T \le \infty$.
(i) Let M be a UI martingale. Then
(59.2) $E(M_T \mid \mathcal{F}_S) = M_S$, a.s.
Moreover, M is of class (D) in that the family ($M_T$: T a stopping time) is UI.
(ii) Let X be a UI supermartingale, and let
(59.3) $X = X_0 + M - A$
be its Doob decomposition (with M a martingale null at 0, and A a previsible
increasing process null at 0). Then A is integrable in that $E(A_\infty) < \infty$, and
M is UI, whence X is of class (D) in that
the family ($X_T$: T a stopping time) is UI.
Moreover,
$$E(X_T \mid \mathcal{F}_S) \le X_S, \quad \text{a.s.}$$
Proof of Part (i) for the case when $0 \le S \le T \le k$ for some $k \in \mathbb{N}$. Suppose that
$0 \le S \le T \le k$ for some $k \in \mathbb{N}$. Then $M_T$ and $M_S$ are in $\mathcal{L}^1$, because each is
dominated by $|M_0| + \cdots + |M_k|$. Let $F \in \mathcal{F}_S$, and define the process $C := I_F I_{(S,T]}$,
that is,
$$C_n(\omega) := \begin{cases} 1 & \text{if } \omega \in F \text{ and } S(\omega) < n \le T(\omega), \\ 0 & \text{otherwise.} \end{cases}$$
Then (check!) C is previsible (and bounded and non-negative), whence
$$E(C \bullet M)_k = E(M_T - M_S; F) = 0,$$
and Part (i) follows for this case.
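Completion of Proof of Part (i). We have, using the result just proved, the
Tower Property (41)(i) and Theorem 50.1,
$$M_{T \wedge k} = E(M_k \mid \mathcal{F}_{T \wedge k}) = E(M_\infty \mid \mathcal{F}_k \mid \mathcal{F}_{T \wedge k}) = E(M_\infty \mid \mathcal{F}_{T \wedge k}).$$
By Lévy's Upward Theorem 50.3, we have $\mathcal{L}^1$ convergence of this equation to
$$M_T = E(M_\infty \mid \mathcal{G}), \quad \text{where } \mathcal{G} := \sigma\Big(\bigcup_k \mathcal{F}_{T \wedge k}\Big).$$
Of course, $\mathcal{G} \subseteq \mathcal{F}_T$. Suppose that $F \in \mathcal{F}_T$. Then (check!) $F \cap \{T \le k\} \in \mathcal{F}_{T \wedge k}$, so
that $F \cap \{T < \infty\} \in \mathcal{G}$ and
$$E(M_T; F \cap \{T < \infty\}) = E(M_\infty; F \cap \{T < \infty\}).$$
Of course it is tautological that
$$E(M_T; F \cap \{T = \infty\}) = E(M_\infty; F \cap \{T = \infty\}).$$
Hence $E(M_T; F) = E(M_\infty; F)$, and we have proved (59.2).
The class (D) property of M now follows because of Theorem 44.1. □
Exercise. Prove that if $F \in \mathcal{F}_T$ then there exists $G \in \mathcal{G}$ such that $(G \setminus F) \cup (F \setminus G)$
is P-null.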
Proof of Part (ii). We have $E(A_n) = E(X_0) - E(X_n)$, and since X is UI and
therefore bounded in $\mathcal{L}^1$, we have
$$E(A_\infty) = \uparrow\lim E(A_n) < \infty.$$
Thus the process A is dominated by the element $A_\infty$ of $\mathcal{L}^1$, so that A is UI.
Since X is also UI, it follows that M is UI. We now know that the families
($M_T$: T a stopping time) and ($A_T$: T a stopping time)
are UI, the latter being dominated by $A_\infty$. Hence
($X_T$: T a stopping time)
is UI, and certainly each $X_T$ is in $\mathcal{L}^1$. Next,
$$E(X_T \mid \mathcal{F}_S) = X_0 + E(M_T \mid \mathcal{F}_S) - E(A_T \mid \mathcal{F}_S)$$
$$\le X_0 + M_S - E(A_S \mid \mathcal{F}_S)$$
$$= X_0 + M_S - A_S = X_S,$$
and the theorem is proved. □
(59.4) (Riesz decomposition; potentials). Again let X be a UI supermartingale.
Then X has a unique Riesz decomposition
$$X = Y + Z,$$
where Y is a martingale, and Z is a potential, that is, Z is a non-negative UI
supermartingale with $Z_\infty = 0$, a.s. In our discrete-parameter situation, Z is of
class (D), and Z is the potential
$$Z_n := E(A_\infty - A_n \mid \mathcal{F}_n)$$
of our integrable previsible increasing process A.
The proof is an easy exercise.
(59.5) THEOREM. Let X be a non-negative supermartingale, and let S and T
be stopping times with $S \le T$. Then
$$E(X_T \mid \mathcal{F}_S) \le X_S, \quad \text{a.s.}$$
Note. Now there is no a.s. equality if X is a non-negative martingale.
Proof. As in (59.2),
$$E(X_{T \wedge n} \mid \mathcal{F}_{S \wedge n}) \le X_{S \wedge n}, \quad \text{a.s.},$$
and, by modifying the arguments for (59.3), we indeed have
$$E(X_{T \wedge n} \mid \mathcal{F}_S) \le X_{S \wedge n}, \quad \text{a.s.}$$
Now let $n \to \infty$, and use the conditional form of Fatou's Lemma. □
The following commutativity property is often useful.
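(59.6) THEOREM. For a stopping time T, define
$$E_T : L^1(\Omega, \mathcal{F}, P) \to L^1(\Omega, \mathcal{F}_T, P)$$
(note the $L^1$ rather than $\mathcal{L}^1$) by making $E_T(\xi)$ the equivalence class containing
$E(\xi \mid \mathcal{F}_T)$, the distinction between ξ and its equivalence class already being blurred!
Then, for stopping times S and T, we have
$$E_S E_T = E_T E_S = E_{S \wedge T}.$$
Proof. We make repeated use of Theorem 59.1 for UI martingales. Let $\xi \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$
and let $\xi_n := E(\xi \mid \mathcal{F}_n)$, a.s. Let $\eta = E(\xi \mid \mathcal{F}_T)$, a.s., so that, by (59.1),
$\eta = \xi_T$, a.s. Moreover, $E(\eta \mid \mathcal{F}_n)$ and $\xi_{n \wedge T}$ are UI martingales, both with limiting
value (a.s. equal to) $\xi_T$. Hence $E(\eta \mid \mathcal{F}_n) = \xi_{n \wedge T}$, a.s. for all n, and, by (59.1) yet
again, $E(\eta \mid \mathcal{F}_S) = \xi_{S \wedge T}$. □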
Note. In the continuous-parameter situation, we need a right-continuous version
of $\xi_t$, and have to mirror the use of $\mathcal{L}^1$ rather than $L^1$ during the proof.
60. Exercises. There are a lot of exercises on this material in [W]. The following
important exercises are not given there.
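(E60.41) Conditional independence. Let $(\Omega, \mathcal{F}, P)$ be a probability triple, and let
$\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$ be sub-σ-algebras of $\mathcal{F}$. Show that the conditions
(i) $P(A \cap C \mid \mathcal{B}) = P(A \mid \mathcal{B}) P(C \mid \mathcal{B})$, a.s. ($\forall A \in \mathcal{A}$, $\forall C \in \mathcal{C}$),
(ii) $P(C \mid \sigma(\mathcal{A}, \mathcal{B})) = P(C \mid \mathcal{B})$, a.s. ($\forall C \in \mathcal{C}$),
are equivalent. When one (then each) of these conditions holds, we say that $\mathcal{A}$
and $\mathcal{C}$ are conditionally independent given $\mathcal{B}$.
The crucial application is to Markov-process theory, when $\mathcal{A}$ represents the
Past, $\mathcal{B}$ the Present and $\mathcal{C}$ the Future.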
(E60.50) Martingales and differentiation. Let $f \in \mathcal{L}^1([0,1], \mathcal{B}[0,1], \mathrm{Leb})$. Define
$$f_n(x) := 2^n \int_{(k-1)2^{-n}}^{k2^{-n}} f(x)\,dx \quad \text{if } (k-1)2^{-n} \le x < k2^{-n},$$
with (say) $f_n(1) := f(1)$. Prove that $f_n \to f$, a.e. and in $\mathcal{L}^1$. Use this result to
complete Exercises E13.6b and E13.9c.
Hints
(H60.41) Suppose that (i) holds. We want to prove (ii). Now, sets of the form
$A \cap B$, where $A \in \mathcal{A}$ and $B \in \mathcal{B}$, form a π-system that generates $\sigma(\mathcal{A}, \mathcal{B})$, so we
need only prove, using (i), that
$$E(P(C \mid \mathcal{B}); A \cap B) = P(A \cap B \cap C).$$
This and the 'converse' part are exercises in using the 'Taking out what is
known' property (41)(j).
(H60.50) Take $\Omega := [0,1]$, $\mathcal{F} := \mathcal{B}[0,1]$ and $P := \mathrm{Leb}$. Define
$$\mathcal{F}_n := \sigma\big([(k-1)2^{-n}, k2^{-n}) : 1 \le k \le 2^n\big).$$
Then $f_n = E(f \mid \mathcal{F}_n)$, and Lévy's Upward Theorem shows that $f_n \to f$, a.s. and
in $\mathcal{L}^1$.
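The statement $f_n = E(f \mid \mathcal{F}_n)$ says in particular that $(f_n)$ is a martingale: averaging $f_{n+1}$ over the two halves of a dyadic interval recovers $f_n$ on that interval. A minimal sketch with exact rational arithmetic follows; the function names and the choice $f(x) = x^2$ (through its antiderivative) are assumptions made for illustration.

```python
from fractions import Fraction

def dyadic_average(antiderivative, n, k):
    """f_n on [(k-1)2^-n, k 2^-n): 2^n times the integral of f over that interval."""
    left, right = Fraction(k - 1, 2 ** n), Fraction(k, 2 ** n)
    return 2 ** n * (antiderivative(right) - antiderivative(left))

def cube_over_three(x):
    """Antiderivative of the illustrative choice f(x) = x^2."""
    return x ** 3 / 3

n, k = 3, 5                                     # the dyadic interval [4/8, 5/8)
parent = dyadic_average(cube_over_three, n, k)
children = [dyadic_average(cube_over_three, n + 1, j) for j in (2 * k - 1, 2 * k)]
```

The two children are the values of $f_{n+1}$ on the left and right halves of the interval; their mean equals `parent`.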
Hint for E13.6b. Suppose that $F \in \mathcal{F}$ satisfies $F \subseteq B$ and $P(F) > 0$. Let $f := I_F$
and let $\varepsilon > 0$. Since $f_n \to f$, a.s., there will be some large n and some k such that
$$f_n(x) > 1 - \tfrac{1}{2}\varepsilon \quad \text{for } (k-1)2^{-n} \le x < k2^{-n}.$$
But (F + 2mp) mod 1 is a subset of B, and, for $1 \le i \le 2^n$, we can, by suitable
choice of m, show that
$$P(B \cap [(i-1)2^{-n}, i2^{-n})) > (1 - \varepsilon)2^{-n}.$$
5. CONTINUOUS-PARAMETER SUPERMARTINGALES
Regularisation: R-supermartingales
61. Orientation. You should not proceed with this Part 5 until you are very
secure with the material of Part 4. Everything there will find applications here.
The essential first step, Doob's Regularity Theorem, helps establish that we
can concentrate on right-continuous supermartingales relative to right-continuous
filtrations. Right continuity of paths and filtrations will then allow us to transfer
results from the discrete-parameter context of Part 4. We have to work with
'R-filtrations', filtrations satisfying the 'usual conditions', to obtain an adequate
theory of stopping times.
(61.1) A guide to our notation. We shall begin by assuming given a 'rough'
supermartingale $\{Y_t : t \in \mathbb{R}^+\}$ relative to a 'rough' filtration $\{\mathcal{G}_t : t \in \mathbb{R}^+\}$. This
'rough' setup is the 'obvious' generalisation of the discrete-parameter case; but
it is not at all adequate. We shall show that, for almost all ω, the limit (through
rational times)
$$X_t(\omega) := \lim_{q \downarrow\downarrow t} Y_q(\omega) := \lim_{\mathbb{Q} \ni q \to t,\ q > t} Y_q(\omega)$$
exists simultaneously for all t, and defines a modification of Y that is a right-
continuous supermartingale relative to the 'usual augmentation' $\{\mathcal{F}_t : t \in \mathbb{R}^+\}$
of $\{\mathcal{G}_t\}$. We call the setup $\{(X_t, \mathcal{F}_t) : t \in \mathbb{R}^+\}$ the R-regularisation of $\{(Y_t, \mathcal{G}_t)\}$.
If the map $t \mapsto Y_t$ is right-continuous from $[0, \infty)$ into $\mathcal{L}^1$, as it will be in all
cases of interest, then X is a modification of Y.
The 'obvious' analogue of everything in Part 4 holds, except for the 'obvious'
analogue of the Doob decomposition and the 'class (D) property of UI
supermartingales' in Part (ii) of Theorem 59.1. The great (Doob-)Meyer Decomposition
Theorem, which gives the correct generalisation of these things, is proved
in full generality in Chapter VI. The theory of stopping times explodes into
a huge subject, many of the deeper parts of which are also studied in
Chapter VI.
Note. We use 'rough' with the connotation of 'rough diamond'—one that can
be 'polished'. Please note that 'raw' is a different technical term meaning 'not
necessarily adapted'. In a sense, 'raw' means 'untameable'.
62. Some real-variable results. We collect here some elementary, but essential,
real-variable results. We begin by defining the appropriate regularity property
for the paths of supermartingales.
(62.1) DEFINITION (R-function on $\mathbb{R}^+$). A function $x : \mathbb{R}^+ \to \mathbb{R}$ will be called
an R-function if
$$x_t = \lim_{u \downarrow\downarrow t} x_u \quad \text{for every } t \ge 0,$$
$$x_{t-} := \lim_{s \uparrow\uparrow t} x_s \quad \text{exists finitely for every } t > 0.$$
The double-arrow notation used above will now be clarified.
(62.2) NOTATION (one-sided limits; $\downarrow\downarrow$, etc.). For a function $x : \mathbb{R}^+ \to \mathbb{R}$, we
define
$$\limsup_{u \downarrow\downarrow t} x_u := \limsup_{u \to t,\ u > t} x_u = \inf_{v > t} \sup\{x_u : t < u \le v\}.$$
The corresponding lim inf is defined in the obvious way, and the corresponding
lim exists if and only if the lim sup and lim inf have the same finite value.
Of course, the $\uparrow\uparrow$ notation is used in a similar way.
Remarks. The French call an R-function càdlàg (continu à droite et pourvu de
limites à gauche). Some writers use corlol (continuous on the right with limits
on the left). In the first edition of this book, R-functions were called Skorokhod
functions. Our current notation agrees with that in Volume 2.
We have already examined the difficulties caused by the fact that in studying
processes with time-parameter set $\mathbb{R}^+$, we can only utilise σ-cylinders: essentially,
we can only consider the behaviour of our process on a countable subset of
times. For convenience, we work with the set Q+ of non-negative rational times.
(In many ways, the set of dyadic-rational times would be more convenient!)
(62.3) IMPORTANT CONVENTION. The symbol q as a subscript under a
lim or lim sup or lim inf or $\bigcap$ stands for a RATIONAL number. We shall use s
or u (or anything other than q) to signify a real number in this subscript context.
(62.4) More notation. Let y be a function on $\mathbb{Q}^+$. We combine the notational
conventions at (62.2) and (62.3) by writing, for example,
$$\limsup_{q \downarrow\downarrow t} y_q := \inf_{v > t} \sup\{y_q : q \in \mathbb{Q}^+,\ t < q \le v\}.$$
Here the double arrow notation helps emphasise that if t happens to be rational
then the value yt is not relevant to the lim sup just described. The definition of
analogous things will be obvious.
(62.5) DEFINITION (Regularisable function on $\mathbb{Q}^+$). Let $y : \mathbb{Q}^+ \to \mathbb{R}$. We shall
call y regularisable if
$$\lim_{q \downarrow\downarrow t} y_q \quad \text{exists finitely for every real } t \ge 0,$$
$$\lim_{q \uparrow\uparrow t} y_q \quad \text{exists finitely for every real } t > 0.$$
(62.6) DEFINITION (Upcrossings; $U_N(y; [a, b])$ where $y : \mathbb{Q}^+ \to \mathbb{R}$). Let $y : \mathbb{Q}^+ \to \mathbb{R}$.
Let $N \in \mathbb{N}$ and let $a, b \in \mathbb{R}$, where $a < b$. We define the number $U_N(y; [a, b])$
of upcrossings of [a, b] by y during the interval [0, N] to be the supremum of
$k \in \mathbb{Z}^+$ such that we can find rationals
$$0 \le q_1 < r_1 < q_2 < r_2 < \cdots < q_k < r_k \le N$$
with
$$y(q_i) < a, \quad y(r_i) > b \quad (1 \le i \le k).$$
Remarks. We use y(q) rather than $y_q$ when typographically more convenient.
Note that $U_N(y; [a, b])$ may well be ∞.
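When y is only inspected at finitely many times, the supremum in Definition 62.6 is attained by a greedy scan: wait for a value strictly below a, then for a value strictly above b, and repeat. The sketch below implements that scan for a finite list of values taken in time order; the function name `count_upcrossings` and the sample data are hypothetical.

```python
def count_upcrossings(values, a, b):
    """Number of completed upcrossings of [a, b] by the finite sequence `values`."""
    assert a < b
    count = 0
    below = False                # True once the path has been strictly below a
    for y in values:
        if not below and y < a:
            below = True         # start of a potential upcrossing: y(q_i) < a
        elif below and y > b:
            count += 1           # upcrossing completed: y(r_i) > b
            below = False
    return count

ys = [0, 3, -1, 2, 5, 1, -2, 6, 6, -3]
```

For example, `count_upcrossings(ys, 0, 4)` counts the passages from below 0 to above 4. Restricting y to increasing finite sets D(m) and taking the monotone limit of such counts is the device used in the proof of Theorem 65.1.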
(62.7) THEOREM. Let $y : \mathbb{Q}^+ \to \mathbb{R}$. Then y is regularisable if and only if whenever
$N \in \mathbb{N}$ and $a, b \in \mathbb{Q}$ with $a < b$, we have both
(62.8) $\sup\{|y_q| : q \in \mathbb{Q}^+ \cap [0, N]\} < \infty$
and
(62.9) $U_N(y; [a, b]) < \infty.$
Proof. Now that you know the proof of Doob's Convergence Theorem 49.1,
this is an easy exercise.
The 'if' part. Suppose that, whenever $N \in \mathbb{N}$ and $a, b \in \mathbb{Q}$ with $a < b$, statements
(62.8) and (62.9) hold. Suppose for the purposes of contradiction that, for some t,
(62.10) $\limsup_{q \downarrow\downarrow t} y_q > \liminf_{q \downarrow\downarrow t} y_q.$
If we choose a and b in $\mathbb{Q}$ so that $a < b$ and both a and b lie strictly between
the lim inf and lim sup in (62.10) then, for $N > t$, we shall have $U_N(y; [a, b]) = \infty$,
contradicting (62.9). Hence (62.10) is false. We therefore see that for every $t \ge 0$,
$$\lim_{q \downarrow\downarrow t} y_q \quad \text{exists in } [-\infty, \infty],$$
and (62.8) guarantees that this limit is finite. The proof for the $\uparrow\uparrow$ limits is
similar.
The 'only if' part. If y is unbounded on $\mathbb{Q}^+ \cap [0, N]$ for some $N \in \mathbb{N}$, we can
choose q(n) in $\mathbb{Q}^+ \cap [0, N]$ such that $|y_{q(n)}| > n$. Let t be an accumulation point
of the set {q(n)}. Then at least one of the limits
(62.11) $\lim_{q \downarrow\downarrow t} y_q, \quad \lim_{q \uparrow\uparrow t} y_q$
must fail to exist in $\mathbb{R}$.
Suppose that, for some $a, b \in \mathbb{Q}$ with $a < b$ and for some $N \in \mathbb{N}$, $U_N(y; [a, b]) = \infty$.
Define $t := \inf\{r \in \mathbb{R}^+ : U_r(y; [a, b]) = \infty\}$, the definition of $U_r(y; [a, b])$ being
obvious. Then at least one of the limits at (62.11) must fail to exist, the lim sup
being at least b and the lim inf at most a. □
(62.12) COROLLARY. If $\{Y_q : q \in \mathbb{Q}^+\}$ is an $\mathbb{R}$-valued stochastic process carried
by $(\Omega, \mathcal{G}, P)$ then
$$G := \{\omega : q \mapsto Y_q(\omega) \text{ is regularisable}\}$$
is an element of $\mathcal{G}$.
Proof. Theorem 62.7 allows us to exhibit G as a σ-cylinder, which is then
automatically in $\mathcal{G}$. Do think this through.
(62.13) THEOREM. Let $y : \mathbb{Q}^+ \to \mathbb{R}$ be a regularisable function. Then
$$x_t := \lim_{q \downarrow\downarrow t} y_q \quad (t \in \mathbb{R}^+)$$
defines an R-function x.
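63. Filtrations; supermartingales; R-processes; R-supermartingales. Let $(\Omega, \mathcal{G}, P)$
be a probability triple.
(63.1) DEFINITION (filtration $\{\mathcal{G}_t : t \in \mathbb{R}^+\}$; filtered space). By a filtration
$\{\mathcal{G}_t : t \in \mathbb{R}^+\}$ on $(\Omega, \mathcal{G}, P)$, we mean an increasing family of sub-σ-algebras of $\mathcal{G}$:
(63.2) for $0 \le s \le t$, $\mathcal{G}_s \subseteq \mathcal{G}_t \subseteq \mathcal{G}_\infty := \sigma\big(\bigcup_{u \in \mathbb{R}^+} \mathcal{G}_u\big) \subseteq \mathcal{G}.$
The setup $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t : t \in \mathbb{R}^+\})$ is then called a filtered space.
(63.3) DEFINITION (martingale, supermartingale, adapted process). Let
$(\Omega, \mathcal{G}, P; \{\mathcal{G}_t : t \in \mathbb{R}^+\})$ be a filtered space. By a martingale (relative to this setup),
we mean an $\mathbb{R}$-valued process $\{Y_t : t \in \mathbb{R}^+\}$ such that
(i) $Y_t \in \mathcal{L}^1$ for every t;
(ii) $\{Y_t\}$ is $\{\mathcal{G}_t\}$-adapted in that $Y_t \in m\mathcal{G}_t$ for every t;
(iii) for $0 \le s \le t$, $E(Y_t \mid \mathcal{G}_s) = Y_s$, a.s.
For the definition of supermartingale (respectively, submartingale), the '=' sign
in (iii) is replaced by '≤' (respectively '≥').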
(63.4) LEMMA. Let Y be a supermartingale relative to the filtered space
$(\Omega, \mathcal{G}, P; \{\mathcal{G}_t : t \in \mathbb{R}^+\})$. Let $t \in [0, \infty)$ and let $(q(n) : n \in -\mathbb{N})$ be a sequence of
rationals with $q(n) \downarrow\downarrow t$ as $n \downarrow\downarrow -\infty$. Then
$$\lim_{n \to -\infty} Y_{q(n)} \quad \text{exists a.s. and in } \mathcal{L}^1.$$
Proof. Simply apply the Lévy-Doob Downward Theorem 51.1 to the setup
$(Y_{q(n)}, \mathcal{G}_{q(n)})$, noting that $\sup_{n \le -1} E(Y_{q(n)}) \le E(Y_t)$. □
(63.5) DEFINITION (R-process, R-supermartingale). A process is called an
R-process if all its sample functions are R-functions. By an R-supermartingale,
we mean a process that is both an R-process and a supermartingale.
64. Some important examples. We look at Brownian and Poisson examples, at
hazard functions, and at an example of a UI martingale with a jump that is
not in $\mathcal{L}^1$.
(64.1) Example: pre-Brownian motion. Consider the pre-Brownian motion
$$(\Omega, \mathcal{G}, P; \{Y_t : t \ge 0\}) := (\mathbb{R}^T, \mathcal{B}^T, \mu; \{\pi_t : t \in T\}) \quad (T := [0, \infty))$$
in (32.5). For $0 \le s \le t$, $Y_t - Y_s$ is independent of any finite subfamily of $\{Y_r : r \le s\}$,
and therefore of the σ-algebra $\mathcal{G}_s := \sigma(Y_r : r \le s)$. Hence
$$E(Y_t - Y_s \mid \mathcal{G}_s) = E(Y_t - Y_s) = 0, \quad \text{a.s.},$$
and so Y is a martingale relative to its natural filtration $\{\mathcal{G}_t\}$. Note that, since
$E[(Y_t - Y_s)^2] = t - s$, the map $t \mapsto Y_t$ is continuous into $\mathcal{L}^2$, and therefore into
$\mathcal{L}^1$.
(64.2) Example: compensated Poisson counting process. Use Section 37 to
construct a Poisson measure Λ on $[0, \infty)$ with intensity measure equal to the
Lebesgue measure λ. Write $(\Omega, \mathcal{G}, P)$ for the carrier triple. Define $N_t := \Lambda[0, t]$,
the number of Poisson occurrences during time interval [0, t]. Note that the
function $t \mapsto N(t, \omega)$ is here already an R-function for every ω. For $0 \le s \le t$,
$$N_t - N_s = \Lambda(s, t]$$
is independent of $\mathcal{G}_s := \sigma(N_r : r \le s)$, and $N_t - N_s$ has mean $t - s$. Hence $\{N_t - t\}$
is an R-martingale relative to the filtration $\{\mathcal{G}_t\}$. The process $\{N_t - t\}$ is called
the compensated Poisson counting process. A serious study of 'compensators'
is made in Chapter VI.
(64.3) Example: hazard functions. A more extended study of hazard or 'cumulative-
risk' functions is made in Section VI.22.
Let $(\Omega, \mathcal{G}, P)$ be a probability triple, and let $T : \Omega \to (0, \infty)$ be a positive random
variable. Let F be the distribution function of T: $F(t) := P(T \le t)$. Define the
right-continuous process $A := I_{[T, \infty)}$ via
$$A_t(\omega) := \begin{cases} 1 & \text{if } t \ge T(\omega), \\ 0 & \text{if } t < T(\omega); \end{cases}$$
and put $\mathcal{G}_t := \sigma(A_s : s \le t)$. Note that $\{\mathcal{G}_t\}$ is the smallest filtration relative to
which T is a stopping time in that $\{T \le t\} \in \mathcal{G}_t$ for every t.
Define the hazard or cumulative-risk function
$$h(u) := \int_{(0,u]} \frac{dF(v)}{1 - F(v-)}.$$
(64.4) LEMMA. The process M, where
$$M_t := A_t - h(T \wedge t),$$
is a martingale relative to the filtration $\{\mathcal{G}_t\}$.
Proof. For $s \le t$, the σ-algebra $\mathcal{G}_s$ is generated by the π-system consisting of Ω
together with all sets of the form $\{T \le r\}$, where $r \le s$. Now, for $r \le s$, we have
$$E[M_t; T \le r] = E[1; T \le r] - E[h(T); T \le r] = E[M_s; T \le r].$$
Next, we have
$$E(M_t) = P(T \le t) - E[h(T); T \le t] - E[h(t); T > t]$$
$$= F(t) - \int_{(0,t]} h(r)\,dF(r) - h(t)[1 - F(t)].$$
However,
$$\int_{(0,t]} h(r)\,dF(r) = \int_{(0,t]} dF(r) \int_{(0,r]} dF(v)\,[1 - F(v-)]^{-1}$$
$$= \int_{(0,t]} dF(v)\,[1 - F(v-)]^{-1} \int_{[v,t]} dF(r)$$
$$= \int_{(0,t]} dF(v)\,[1 - F(v-)]^{-1}\,[F(t) - F(v-)]$$
$$= \int_{(0,t]} dF(v)\,[1 - F(v-)]^{-1}\,\{1 - F(v-) - [1 - F(t)]\}$$
$$= \int_{(0,t]} dF(v) - h(t)[1 - F(t)]$$
$$= F(t) - h(t)[1 - F(t)],$$
so that $E(M_t; \Omega) = E(M_s; \Omega) = 0$. Hence, by the π-system characterisation of
conditional expectation in Definition 39.1, we have $E(M_t \mid \mathcal{G}_s) = M_s$, a.s. □
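The computation $E(M_t) = 0$ can be checked exactly when T has finitely many atoms, since the integral defining h is then a finite sum in which $1 - F(v-) = P(T \ge v)$. The sketch below is only an illustration: the function names `hazard` and `mean_of_M` and the particular three-point law are assumptions.

```python
from fractions import Fraction

def hazard(pmf, u):
    """h(u) = sum over atoms v <= u of p_v / (1 - F(v-)), for a law with finitely many atoms."""
    total, mass_before = Fraction(0), Fraction(0)
    for v in sorted(pmf):
        if v > u:
            break
        total += pmf[v] / (1 - mass_before)     # 1 - F(v-) = P(T >= v)
        mass_before += pmf[v]
    return total

def mean_of_M(pmf, t):
    """E(M_t) = E[A_t - h(T ^ t)], computed by summing over the atoms of T."""
    return sum(p * ((1 if v <= t else 0) - hazard(pmf, min(v, t)))
               for v, p in pmf.items())

# Illustrative law of T: P(T=1) = 1/2, P(T=2) = 1/3, P(T=4) = 1/6
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 4: Fraction(1, 6)}
```

For this law `mean_of_M(pmf, t)` vanishes for every t, including times between and beyond the atoms, in line with the final display of the proof.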
(64.5) Example. Consider the previous example when T has the exponential
distribution of rate 1, so that, for $t \ge 0$,
$$F(t) = 1 - e^{-t}, \quad h(t) = t, \quad M_t = A_t - (T \wedge t).$$
Let g be a deterministic function on $[0, \infty)$, let $G(t) := \int_0^t g(s)\,ds$, and define
$$Z_t := \int_0^t g(s)\,dM_s = g(T) I_{\{T \le t\}} - G(T \wedge t) = \begin{cases} g(T) - G(T) & \text{if } T \le t, \\ -G(t) & \text{if } T > t. \end{cases}$$
Now choose g so that
$$G(t) = \frac{e^t - 1}{1 + t}, \quad \text{whence} \quad g(t) = \frac{t e^t + 1}{(1 + t)^2}.$$
(64.6) LEMMA. The process Z is then a UI martingale such that Z(T-) is not
in $\mathcal{L}^1$.
Proof. We have
$$|Z_t I_{\{T \le t\}}| \le \frac{|2 + T - e^T|}{(1 + T)^2} \le \frac{e^T}{(1 + T)^2}.$$
Since
$$E\,\frac{e^T}{(1 + T)^2} = \int_0^\infty \frac{e^s}{(1 + s)^2}\,e^{-s}\,ds = 1,$$
we see that the process $\{Z_t I_{\{T \le t\}}\}$ is dominated in $\mathcal{L}^1$. Moreover,
$$E(|Z_t I_{\{T > t\}}|; |Z_t| > K) = \begin{cases} (1 + t)^{-1}(e^t - 1)e^{-t} & \text{if } (1 + t)^{-1}(e^t - 1) > K, \\ 0 & \text{otherwise.} \end{cases}$$
Thus both the processes $\{Z_t I_{\{T \le t\}}\}$ and $\{Z_t I_{\{T > t\}}\}$ are UI, whence Z is UI.
Proof of the martingale property of Z, along the lines of the proof of Lemma
64.4, is left as an exercise. Note that
$$E\,Z(T-) = -E\,G(T-) = -\int_0^\infty \frac{e^s - 1}{1 + s}\,e^{-s}\,ds = -\infty. \quad □$$
Remarks. Examples similar to Z were studied by Dellacherie and by Doléans-Dade.
We can obtain the same phenomenon in discrete time with Z(T - 1)
instead of Z(T-).
65. Doob's Regularity Theorem: Part 1. Now the full power of the Upcrossing
Lemma is brought into play.
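(65.1) THEOREM (Doob's Regularity Theorem: Part 1). Let $\{Y_t : t \in \mathbb{R}^+\}$ be a
supermartingale carried by the filtered space $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t : t \in \mathbb{R}^+\})$. Let
$$G := \{\omega : \text{the map } q \mapsto Y_q(\omega) \text{ from } \mathbb{Q}^+ \text{ to } \mathbb{R} \text{ is regularisable}\}.$$
Then $G \in \mathcal{G}$ and $P(G) = 1$. For $t \in \mathbb{R}^+$, define
$$X_t(\omega) := \begin{cases} \lim_{q \downarrow\downarrow t} Y_q(\omega) & \text{if } \omega \in G, \\ 0 & \text{if } \omega \notin G. \end{cases}$$
Then X is an R-process in that all sample paths of X are R-functions.
Proof. Look again at Theorem 62.7. Because there are only countably many
triples (N, a, b) where $N \in \mathbb{N}$ and $a, b \in \mathbb{Q}$ with $a < b$, we need only show that, for
fixed $N \in \mathbb{N}$ and fixed $a, b \in \mathbb{Q}$ with $a < b$, we have
(65.2) $P(\sup\{|Y_q| : q \in \mathbb{Q}^+ \cap [0, N]\} < \infty) = 1,$
(65.3) $P(U_N(Y|_{\mathbb{Q}^+}; [a, b]) < \infty) = 1,$
where the number of upcrossings relates to the restriction of $t \mapsto Y_t(\omega)$ to the
set $\mathbb{Q}^+ \cap [0, N]$.
Let (D(m)) be a sequence of finite subsets of $\mathbb{Q}^+ \cap [0, N]$, each containing 0 and
N, and with $D(m) \uparrow \mathbb{Q}^+ \cap [0, N]$. Then, by Lemma 54.5 applied to $\{Y_q : q \in D(m)\}$,
we have, for $c > 0$,
$$c\,P(\sup\{|Y_q| : q \in \mathbb{Q}^+ \cap [0, N]\} > 3c) = \uparrow\lim_m c\,P(\sup\{|Y_q| : q \in D(m)\} > 3c) \le 4E(|Y_0|) + 3E(|Y_N|),$$
and (65.2) follows. By the Upcrossing Lemma 48.3, we find that
$$E\,U_N(Y|_{\mathbb{Q}^+}; [a, b]) = \uparrow\lim_m E\,U_N(Y|_{D(m)}; [a, b]) \le \frac{E(|Y_N|) + |a|}{b - a},$$
and (65.3) follows.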
(65.4) Example. Suppose that $\Omega = \{+1, -1\}$, that $P(\{+1\}) = P(\{-1\}) = \tfrac{1}{2}$,
that $\mathcal{G}_t = \{\emptyset, \Omega\}$ when $t \le 1$ and that $\mathcal{G}_t = \mathcal{P}(\Omega)$ when $t > 1$. Suppose that, for $\omega \in \Omega$,
$$Y_t(\omega) := \begin{cases} 0 & \text{if } t \le 1, \\ \omega & \text{if } t > 1. \end{cases}$$
Then Y is a martingale relative to the filtration $\{\mathcal{G}_t\}$, and
$$X_t(\omega) = \begin{cases} 0 & \text{if } t < 1, \\ \omega & \text{if } t \ge 1. \end{cases}$$
Note that $X_1$ is not $\mathcal{G}_1$-measurable, so that X is not a martingale relative to
the filtration $\{\mathcal{G}_t\}$. Moreover, $P(X_1 = Y_1) = 0$, so that X is not a modification
of Y.
This example explains our next concerns.
66. Partial augmentation. We continue with the notation and assumptions of
Theorem 65.1.
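The set G in that theorem is an element of the σ-algebra $\mathcal{N}(\mathcal{G}_\infty)$ of P-trivial
sets in $\mathcal{G}_\infty$, that is, sets in $\mathcal{G}_\infty$ of P-measure 0 or 1. The definition of X in
Theorem 65.1 makes the following lemma obvious.
(66.1) LEMMA and DEFINITION (partial augmentation). The process X is
adapted to the filtration $\{\mathcal{H}_t\}$, where
$$\mathcal{H}_t := \sigma(\mathcal{G}_{t+}, \mathcal{N}(\mathcal{G}_\infty)), \quad \text{where } \mathcal{G}_{t+} := \bigcap_{v > t} \mathcal{G}_v = \bigcap_{q > t} \mathcal{G}_q.$$
We call $\{\mathcal{H}_t\}$ the partial augmentation of $\{\mathcal{G}_t\}$.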
(66.2) THEOREM (Doob's Regularity Theorem: Part 2). The process X is a
supermartingale relative to $\{\mathcal{H}_t\}$. Moreover, X is a modification of Y if and only
if the map $t \mapsto Y_t$ is right-continuous into $\mathcal{L}^1$, that is, if and only if
$$\lim_{v \downarrow\downarrow t} E(|Y_v - Y_t|) = 0 \quad \text{for every } t \ge 0.$$
Proof. For the moment, fix v and t with $v > t \ge 0$.
Suppose that $(q(n) : n \in -\mathbb{N})$ is a sequence of rationals with $v > q(n) \downarrow\downarrow t$ as
$n \downarrow\downarrow -\infty$. (For sequences, the $\downarrow\downarrow$ notation is understood to imply monotonicity:
$q(n) \ge q(n-1) > t$.) We have
$$E(Y_v \mid \mathcal{G}_{q(n)}) \le Y_{q(n)}, \quad \text{a.s.}$$
Using the Lévy-Doob Downward Theorem 51.1 (for martingales!), we have
$$E(Y_v \mid \mathcal{G}_{t+}) \le X_t, \quad \text{a.s.},$$
whence, trivially,
(66.3) $E(Y_v \mid \mathcal{H}_t) \le X_t$, a.s.
Now suppose instead that $u \ge t$ and that (q(n)) is a sequence of rationals with
$q(n) \downarrow\downarrow u$. From (66.3), we have
(66.4) $E(Y_{q(n)} \mid \mathcal{H}_t) \le X_t$, a.s.
However, we know from Lemma 63.4 and Theorem 65.1 that $Y_{q(n)} \to X_u$ in $\mathcal{L}^1$;
and, by the $\mathcal{L}^1$ continuity of conditional expectations, we now have
$$E(X_u \mid \mathcal{H}_t) \le X_t, \quad \text{a.s.}$$
Hence X is a supermartingale relative to $\{\mathcal{H}_u\}$.
It now follows from Lemma 63.4 and the right-continuity of X that X is right-
continuous in $\mathcal{L}^1$. Since we also know that if $q(n) \downarrow\downarrow t$ then $Y_{q(n)} \to X_t$ in $\mathcal{L}^1$, it
follows that X is a modification of Y if and only if Y is right-continuous in $\mathcal{L}^1$. □
67. Usual conditions; R-filtered space; usual augmentation; R-regularisation. We
continue the discussion in the last two sections.
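The partial augmentation $\{\mathcal{H}_t\}$ of $\{\mathcal{G}_t\}$ does not allow a sufficiently rich class
of stopping times. We need the so-called 'usual augmentation'.
(67.1) DEFINITION (usual conditions; R-filtered space). A filtered space
$(\Omega, \mathcal{F}, P; \{\mathcal{F}_t : t \in \mathbb{R}^+\})$ is said to satisfy the usual conditions and then to be an
R-filtered space if, in addition to the filtration property
$$\text{for } 0 \le s \le t, \quad \mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}_\infty := \sigma\Big(\bigcup_{u \in \mathbb{R}^+} \mathcal{F}_u\Big) \subseteq \mathcal{F},$$
the following properties hold:
(i) the σ-algebra $\mathcal{F}$ is P-complete;
(ii) $\mathcal{F}_0$ contains all P-null sets in $\mathcal{F}$;
(iii) $\{\mathcal{F}_t\}$ is right-continuous in that
$$\mathcal{F}_t = \mathcal{F}_{t+} := \bigcap_{u > t} \mathcal{F}_u \quad \text{for all } t \ge 0.$$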
(67.2) Remark. The 'usual conditions' described above are those standard in
the literature. In some contexts, however, a more appropriate definition is
obtained by replacing $\mathcal{F}$ by $\mathcal{F}_\infty$ in properties (i) and (ii). We stick to Definition
67.1, though the current Remark will haunt us in Markov-process theory.
(67.3) DEFINITION (usual augmentation, R-regularisation). Let $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$
be a filtered space. The usual augmentation, or R-regularisation, $(\Omega, \mathcal{F}, P, \{\mathcal{F}_t\})$
of this setup is the minimal enlargement ('enlargement' in that $\mathcal{G} \subseteq \mathcal{F}$ and $\mathcal{G}_t \subseteq \mathcal{F}_t$
for every t) that satisfies the usual conditions.
(67.4) LEMMA. The usual augmentation of $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$ is obtained by making
$\mathcal{F}$ the P-completion of $\mathcal{G}$, and, with $\mathcal{N}$ denoting the collection of P-null sets in
$\mathcal{F}$, setting
(67.5) $\mathcal{F}_t = \bigcap_{u > t} \sigma(\mathcal{G}_u, \mathcal{N}) = \sigma(\mathcal{G}_{t+}, \mathcal{N}).$
If $t \ge 0$ and $F \in \mathcal{F}_t$ then there exists G in $\mathcal{G}_{t+}$ such that
$$F \,\Delta\, G := (F \setminus G) \cup (G \setminus F) \in \mathcal{N}.$$
Important Note. The equality on the last two terms of (67.5) needs proof because
it is not true in general that if $(\Sigma_n)$ is a decreasing sequence of σ-algebras on a
set S with intersection Σ, and $\mathcal{S}$ is a σ-algebra on S, then $\bigcap \sigma(\Sigma_n, \mathcal{S}) = \sigma(\Sigma, \mathcal{S})$.
See, for example, Section 4.12 of [W]. Of course, in (67.5), $\mathcal{N}$ consists of all
P-null sets, and this makes (67.5) easy to prove. Go through the Exercise E79.67b
of proving the lemma in full.
(67.6) PROPOSITION. In the context of Parts 1 and 2 of the Regularity Theorem,
X is an $\{\mathcal{F}_t\}$ supermartingale. The structure $\{(X_t, \mathcal{F}_t)\}$ is called the R-regularisation
of $\{(Y_t, \mathcal{G}_t)\}$.
Proof. This is obvious because, for each t, $\mathcal{H}_t$ and $\mathcal{F}_t$ differ only by null sets,
as was explained at the end of Lemma 67.4. □
We now examine what happens when our underlying filtration does satisfy
the usual conditions.
(67.7) THEOREM (Doob's Regularity Theorem: Part 3). Let $(\Omega, \mathcal{F}, P, \{\mathcal{F}_t\})$
be an R-filtered space relative to which supermartingales etc. are defined. Let $Y$
be a supermartingale. Then $Y$ has an R-process modification $Z$ if and only if the
map $t \mapsto E(Y_t)$ from $[0, \infty)$ to $\mathbb{R}$ is right-continuous, and then $Z$ is an
R-supermartingale.
Proof. From the supermartingale property of $Y$, we have, for $u > t$,
(67.8) $E(Y_u \mid \mathcal{F}_t) \le Y_t$, a.s.
Now let $X$ be constructed from $Y$ as in Parts 1 and 2 of the Regularity Theorem.
Let $u \downarrow\downarrow t$ in (67.8), and use the fact that $Y_u \to X_t$ in $\mathcal{L}^1$, to obtain
$E(X_t \mid \mathcal{F}_t) \le Y_t$, a.s.
However, $\{X_t\}$ is $\{\mathcal{F}_t\}$-adapted, the usual augmentation of $\{\mathcal{F}_t\}$ being $\{\mathcal{F}_t\}$
itself. Hence
$X_t \le Y_t$, a.s.
But, if $t \mapsto E(Y_t)$ is right-continuous then, since $Y_u \to X_t$ in $\mathcal{L}^1$, we have
(67.9) $E(X_t) = \lim_{u \downarrow\downarrow t} E(Y_u) = E(Y_t)$,
and, on comparing (67.8) with (67.9), we see that $X_t = Y_t$, a.s. Thus, if the
map $t \mapsto E(Y_t)$ is right-continuous then $X$ is an R-supermartingale modification
of $Y$. The rest is trivial. □
The following lemma answers a frequently occurring question.
(67.10) LEMMA. Suppose that $Y$ is an R-supermartingale relative to a filtered
space $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$. Then $Y$ is also an R-supermartingale relative to the usual
augmentation $(\Omega, \mathcal{F}, P, \{\mathcal{F}_t\})$.
Proof. Let $0 \le t < v$. Then, for $u(n) \downarrow\downarrow t$ with $u(n) < v$ for every $n$, we have
$E(Y_v \mid \mathcal{G}_{u(n)}) \le Y_{u(n)}$, a.s.,
whence, applying the Downward Theorem on the left-hand side and right
continuity on the right-hand side, we obtain
$E(Y_v \mid \mathcal{G}_{t+}) \le Y_t$, a.s.
Hence $E(Y_v \mid \mathcal{F}_t) \le Y_t$, a.s. □
68. A necessary pause for thought. We must not rush too quickly to claim that
we can henceforth consider only R-supermartingales relative to R-filtered spaces.
Let us consider rather carefully the canonical pre-Brownian motion
(68.1) $(\Omega, \mathcal{G}, P; \{Y_t : t \ge 0\})$
of (64.1). We define the natural filtration $\mathcal{G}_t := \sigma(Y_r : r \le t)$ of $Y$. Then $Y$ is a
martingale relative to this filtration. Since the martingale $Y$ is also continuous
in $\mathcal{L}^1$, we can find a modification $X$ of $Y$ that is an R-supermartingale relative
to the usual augmentation $(\mathcal{F}, \{\mathcal{F}_t\})$ of $(\mathcal{G}, \{\mathcal{G}_t\})$.
But $\mathcal{F}_t$ contains information about what happens just after time $t$, and various
questions are raised. Have we destroyed the Markov property? We know that,
for $t, u \ge 0$, $Y_{t+u} - Y_t$ is independent of $\mathcal{G}_t$, and it is easy to see that $X_{t+u} - X_t$
is independent of $\mathcal{G}_t$. But is $X_{t+u} - X_t$ independent of $\mathcal{F}_t$? More fundamentally,
we can ask whether $\mathcal{F}_t$ really looks into the future, or whether it is true that
$\mathcal{F}_t = \sigma(\mathcal{G}_t, \mathcal{N})$, where $\mathcal{N}$ is the collection of P-null sets in $\mathcal{F}$. We know, of
course, that $\mathcal{F}_t = \sigma(\mathcal{G}_{t+}, \mathcal{N})$.
We now resolve these questions. It seems best to state the results first as they
relate to the original setup (68.1).
(68.2) THEOREM. For the canonical pre-Brownian motion setup at (68.1), with
its natural filtration $\{\mathcal{G}_t\}$, the following results hold.
(i) For $t \ge 0$, $\mathcal{U}_t := \sigma\{Y_{t+u} - Y_t : u \ge 0\}$ is independent of $\mathcal{G}_{t+}$.
(ii) For $t \ge 0$, $\mathcal{G}_{t+} \subseteq \sigma(\mathcal{G}_t, \mathcal{N}(\mathcal{G}_\infty))$, where $\mathcal{N}(\mathcal{G}_\infty)$ denotes the collection of P-null
subsets in $\mathcal{G}_\infty$.
We can interpret (i) as stating that looking ahead an infinitesimal amount
by replacing $\mathcal{G}_t$ by $\mathcal{G}_{t+}$ does not destroy the crucial independence property.
The result (ii) makes it clear that $\mathcal{G}_{t+}$ does not really look ahead of $t$: every
$\mathcal{G}_{t+}$ set differs from some $\mathcal{G}_t$ set by a null set.
Proof of (i). It is clear from the independence properties of $Y$ and from the
construction of $X$ in Part 1 of the Regularity Theorem that, for $t, u \ge 0$ and
$\varepsilon > 0$,
$X_{t+u+\varepsilon} - X_{t+\varepsilon}$ is independent of $\mathcal{G}_{t+(1/2)\varepsilon}$ and hence of $\mathcal{G}_{t+}$.
Hence, for any $G$ in $\mathcal{G}_{t+}$ and any bounded continuous function $f$ on $\mathbb{R}$,
$E[f(X_{t+u+\varepsilon} - X_{t+\varepsilon}); G] = P(G)\, E[f(X_{t+u+\varepsilon} - X_{t+\varepsilon})]$.
We use the right-continuity of $X$ and the Bounded-Convergence Theorem to
deduce that
(68.3) $E[f(X_{t+u} - X_t); G] = P(G)\, E[f(X_{t+u} - X_t)]$.
The Monotone-Class Theorem shows that for each fixed $G$, the result (68.3)
holds for every bounded Borel function $f$, whence $X_{t+u} - X_t$ is independent of
$\mathcal{G}_{t+}$. Since $Y$ is a modification of $X$, $Y_{t+u} - Y_t$ is independent of $\mathcal{G}_{t+}$.
But now, for $t, u, v \ge 0$, the variable $Y_{t+u+v} - Y_{t+u}$ is independent of $\mathcal{G}_{(t+u)+}$;
and $\mathcal{G}_{t+}$ and $\sigma(Y_{t+u} - Y_t)$ are independent sub-σ-algebras of $\mathcal{G}_{(t+u)+}$; etc. Full
proof that $\mathcal{U}_t$ is independent of $\mathcal{G}_{t+}$ is now left as an exercise.
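Proof of (ii). Note that $\mathcal{G}_\infty = \sigma(\mathcal{G}_t, \mathcal{U}_t)$, so that $\mathcal{G}_\infty$ is generated by the π-system
$\mathcal{I}_t$ of sets of the form $G_t \cap A_t$ where $G_t \in \mathcal{G}_t$ and $A_t \in \mathcal{U}_t$.
Let $\eta$ be bounded $\mathcal{G}_{t+}$-measurable, and let $\xi$ be a version of $\eta - E(\eta \mid \mathcal{G}_t)$. All
that we need to show is that $\xi = 0$, a.s. Since $\xi$ is $\mathcal{G}_{t+}$-measurable, $\xi$ is
independent of $\mathcal{U}_t$. So, for $G_t \in \mathcal{G}_t$ and $A_t \in \mathcal{U}_t$, we have
$E(\xi; G_t \cap A_t) = P(A_t)\, E(\xi; G_t) = 0$,
the last equality holding because of the definition of conditional expectation.
Hence $\xi$ is orthogonal to the indicator function of any element of the π-system
$\mathcal{I}_t$, and hence to the indicator function of every element of $\mathcal{G}_\infty$. Since $\xi$ is
$\mathcal{G}_\infty$-measurable, $\xi = 0$, a.s. □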
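Theorem 68.2 has the following consequences for the R-regularisation
$(X, \mathcal{F}, \{\mathcal{F}_t\})$ of $(Y, \mathcal{G}, \{\mathcal{G}_t\})$.
(68.4) THEOREM. For the R-regularisation $(X, \mathcal{F}, \{\mathcal{F}_t\})$ of the canonical
pre-Brownian motion $(Y, \mathcal{G}, \{\mathcal{G}_t\})$, the following statements are true.
(i) For $t \ge 0$, $\sigma(X_{t+u} - X_t : u \ge 0)$ is independent of $\mathcal{F}_t$.
(ii) For $t \ge 0$, $\mathcal{F}_t = \sigma(\mathcal{G}_t, \mathcal{N})$, where $\mathcal{N}$ is the collection of P-null elements of $\mathcal{F}$
(which here equals $\mathcal{F}_\infty$).
Note that since $X$ is a modification of $Y$, we have proved the following result.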
(68.5) LEMMA. There exists an R-process X with Wiener measure as its law.
We shall see later that almost all paths of this process X are continuous.
Results such as Theorem 68.4 are very much part of the reason that we can
concentrate on R-supermartingales relative to R-filtrations. We need considerable
extensions of Theorem 68.4 later.
69. Convergence theorems for R-supermartingales. Right-continuity of paths
allows us to prove the convergence theorem for continuous-parameter super-
martingales in the same way as we proved the discrete-parameter result.
(69.1) THEOREM (Doob's Supermartingale-Convergence Theorem; compare
Theorem 49.1). Let $X$ be an R-supermartingale relative to a filtered space
$(\Omega, \mathcal{F}, P, \{\mathcal{F}_t\})$. Suppose further that $X$ is bounded in $\mathcal{L}^1$: $\sup_t E(|X_t|) < \infty$. Then
$X_\infty := \lim_{t \to \infty} X_t$ exists (in $\mathbb{R}$), a.s.
Proof. Since $t \mapsto X_t(\omega)$ is right-continuous,
$\limsup_{t \to \infty} X_t = \limsup_{q \to \infty} X_q$, $\quad \liminf_{t \to \infty} X_t = \liminf_{q \to \infty} X_q$,
where $q$ runs through the non-negative rationals $\mathbb{Q}^+$.
If, therefore, $\lim_t X_t(\omega)$ did not exist in $[-\infty, \infty]$, we could find rationals $a, b$
with $a < b$ such that
$\liminf_{q \to \infty} X_q < a < b < \limsup_{q \to \infty} X_q$,
whence the number $U_\infty(X|_{\mathbb{Q}^+}; [a, b])$ of upcrossings of $[a, b]$ by the restriction
$X|_{\mathbb{Q}^+}$ would be infinite. However, by the Upcrossing Lemma and familiar
arguments,
$E\, U_\infty(X|_{\mathbb{Q}^+}; [a, b]) \le (b - a)^{-1} \bigl( \sup_t E(|X_t|) + |a| \bigr) < \infty$.
Thus $X_\infty$ exists, a.s., in $[-\infty, \infty]$. That $X_\infty \in \mathbb{R}$, a.s., follows as usual by Fatou's
Lemma. □
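The upcrossing count that drives this proof is easy to compute for a finite sequence of values. The following is only an illustrative sketch: the function name `count_upcrossings` is our own, and an upcrossing is taken to be a passage from strictly below $a$ to strictly above $b$.

```python
def count_upcrossings(values, a, b):
    """Count completed upcrossings of [a, b] by a finite sequence of numbers.

    An upcrossing is completed each time the sequence, having been strictly
    below a, later rises strictly above b.
    """
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    below = False  # True once the sequence has dipped below a since the last upcrossing
    for x in values:
        if not below and x < a:
            below = True
        elif below and x > b:
            count += 1
            below = False
    return count


# An oscillating sequence makes one upcrossing per period; a sequence that
# settles down makes only finitely many, however long it is observed.
oscillating = [(-1) ** (k + 1) for k in range(10)]   # -1, 1, -1, 1, ...
settling = [1.0 / k for k in range(1, 50)]
print(count_upcrossings(oscillating, -0.5, 0.5))
print(count_upcrossings(settling, 0.2, 0.4))
```

A path whose lim inf lies below $a$ and whose lim sup lies above $b$ would make this count grow without bound along the rationals, which is exactly what the bound on the expected number of upcrossings rules out.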
(69.2) THEOREM (Doob's Convergence Theorem: Part 2; compare Theorem
50.1). Continue with the notation and assumptions of Theorem 69.1.
(i) If $\{X_t : t \ge 0\}$ is further assumed to be UI then
$X_t \to X_\infty$ in $\mathcal{L}^1$,
and, for every $t$, $E(X_\infty \mid \mathcal{F}_t) \le X_t$, a.s., with a.s. equality if $X$ is a martingale.
(ii) If $X$ is a martingale and $X_t \to X_\infty$ in $\mathcal{L}^1$ then $X$ is UI.
Proof of (i) is an obvious modification of the proof of Theorem 50.1. Part
(ii) is an immediate consequence of the $\mathcal{L}^1$ continuity of conditional expectations
and the UI property established in Theorem 44.1.
(69.3) Warning: in continuous time, an $\mathcal{L}^1$-convergent supermartingale need not
be UI. See Exercise 79.64 or part (v) of E79.66a.
(69.4) THEOREM (compare the Downward Theorem 51.1). Suppose that we
have an R-supermartingale $(\Omega, \mathcal{F}, P; \{(\mathcal{F}_t, X_t) : t > 0\})$ with parameter set $(0, \infty)$ open
at 0. Suppose further that $\sup_{t > 0} E(X_t) < \infty$. Then
$X_{0+} := \lim_{t \downarrow\downarrow 0} X_t$ exists a.s. and in $\mathcal{L}^1$,
and, for every $t$, $E(X_t \mid \mathcal{F}_{0+}) \le X_{0+}$, a.s.
Proof. Yet again, the existence of the limit in $[-\infty, \infty]$ follows from the
Upcrossing Lemma. The rest follows closely the proof of Theorem 51.1. □
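(69.5) THEOREM (Upward Theorem for martingales). Suppose that $(\Omega, \mathcal{F},
P; \{\mathcal{F}_t : t \ge 0\})$ satisfies the usual conditions. Let $\xi \in \mathcal{L}^1(\Omega, \mathcal{F}, P)$. Then there exists
a UI R-martingale $\{\xi_t : t \ge 0\}$ with $\xi_t = E(\xi \mid \mathcal{F}_t)$, a.s. As $t \to \infty$, $\xi_t \to E(\xi \mid \mathcal{F}_\infty)$, a.s.
and in $\mathcal{L}^1$.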
This is now an easy exercise.
70. Inequalities and $\mathcal{L}^p$ convergence for R-submartingales. Right continuity of
paths also allows us to transfer the standard inequalities from the
discrete-parameter case.
(70.1) THEOREM (Doob's Submartingale Inequality; compare Theorem 52.1).
Let $Z$ be a non-negative R-submartingale relative to the filtered space $(\Omega, \mathcal{F}, P; \{\mathcal{F}_t\})$.
Then, for $c > 0$ and $t \ge 0$,
$c\, P\bigl( \sup_{s \le t} Z_s \ge c \bigr) \le E\bigl( Z_t; \sup_{s \le t} Z_s \ge c \bigr) \le E(Z_t)$.
Proof. Let $(D(m))$ be an increasing sequence of finite subsets of $[0, t]$ each
containing the points 0 and $t$, and insist that the union of the $D(m)$ is dense in
$[0, t]$. Then
$\sup_{s \in [0, t]} Z_s(\omega) = \sup_m \sup_{s \in D(m)} Z_s(\omega)$,
so that
$\bigl\{ \sup_{s \in [0, t]} Z_s > c \bigr\} = \uparrow\lim_m \bigl\{ \sup_{s \in D(m)} Z_s > c \bigr\}$.
Note that we need '>' not '≥' for this logic. Thus, for $y > 0$,
$P\bigl\{ \sup_{s \in [0, t]} Z_s > y \bigr\} = \uparrow\lim_m P\bigl\{ \sup_{s \in D(m)} Z_s > y \bigr\}$
$\le \lim_m y^{-1} E\bigl( Z_t; \sup_{s \in D(m)} Z_s > y \bigr)$
$= y^{-1} E\bigl( Z_t; \sup_{s \in [0, t]} Z_s > y \bigr)$.
Now let $y \uparrow\uparrow c$. □
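The proof reduces the continuous-parameter inequality to the discrete-parameter one on the finite sets $D(m)$. As a sanity check of that discrete ingredient, the sketch below verifies the two inequalities exactly, by enumerating all paths, for the non-negative submartingale $Z_k = |S_k|$ built from a simple symmetric random walk $S$. The choice of this walk, and the helper name `doob_terms`, are our own assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import product


def doob_terms(n, c):
    """Exact values of c*P(max_k Z_k >= c), E(Z_n; max_k Z_k >= c) and E(Z_n)
    for Z_k = |S_k|, S a simple symmetric random walk, 0 <= k <= n."""
    weight = Fraction(1, 2 ** n)          # probability of each +-1 path
    lhs = mid = rhs = Fraction(0)
    for steps in product((-1, 1), repeat=n):
        s, running_max = 0, 0
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))
        z_n = abs(s)
        rhs += weight * z_n
        if running_max >= c:
            lhs += weight * c
            mid += weight * z_n
    return lhs, mid, rhs


for c in range(1, 6):
    lhs, mid, rhs = doob_terms(10, c)
    print(c, lhs <= mid <= rhs)
```

Exact rational arithmetic is used so that the comparison is not blurred by rounding; the enumeration is only feasible for small $n$.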
(70.2) THEOREM (Doob's $\mathcal{L}^p$ inequality; compare Theorem 52.3). Let $p > 1$
and define $q$ so that $p^{-1} + q^{-1} = 1$. Let $Z$ be a nonnegative R-submartingale
relative to some filtered space $(\Omega, \mathcal{F}, P; \{\mathcal{F}_t\})$. Assume further that $Z$ is bounded
in $\mathcal{L}^p$. Define
$Z^* := \sup_t |Z_t| = \sup_t Z_t$.
Then $Z^* \in \mathcal{L}^p$, and indeed
(70.3) $\|Z^*\|_p \le q \sup_t \|Z_t\|_p$.
The submartingale $Z$ is therefore dominated by the element $Z^*$ of $\mathcal{L}^p$. Also,
$Z_\infty := \lim Z_t$ exists a.s. and in $\mathcal{L}^p$, and
$\|Z_\infty\|_p = \sup_t \|Z_t\|_p = \uparrow\lim_t \|Z_t\|_p$.
If $Z$ is of the form $|M|$, where $M$ is an R-martingale bounded in $\mathcal{L}^p$, then
$M_\infty := \lim M_t$ exists a.s. and in $\mathcal{L}^p$, and $E(M_\infty \mid \mathcal{F}_t) = M_t$, a.s.
The proof is left as an exercise.
71. Martingale proof of Wiener's Theorem; canonical Brownian motion. We gave
the Lévy-Ciesielski proof of Wiener's Theorem in Section I.6. Now, we give a
martingale proof.
We know that the pre-Brownian motion $Y$ has an R-process modification
$X$ that is a martingale relative to the usual augmentation $\{\mathcal{F}_t\}$ of the natural
filtration of $Y$. Perhaps it is here most natural to work with the natural filtration
$\{\mathcal{X}_t\}$ of $X$, defining martingales etc. with respect to $\{\mathcal{X}_t\}$. However, you can work
with $\{\mathcal{F}_t\}$ if you wish: it does not matter.
(71.1) THEOREM. P-almost all paths of X are continuous.
Proof. The process $X^4$ is an R-submartingale, by the conditional form of
Jensen's inequality. Hence, by the Submartingale Inequality, we have for $\varepsilon > 0$,
$\varepsilon^4 P\bigl( \sup_{s \le \delta} |X_s| > \varepsilon \bigr) = \varepsilon^4 P\bigl( \sup_{s \le \delta} X_s^4 > \varepsilon^4 \bigr) \le E(X_\delta^4) = K \delta^2$,
where $K = E\xi^4$, $\xi$ denoting a random variable with the standard normal
distribution. (In fact, $K = 3$.) Thus if $D(n) := \{k 2^{-n} : 0 \le k < 2^n\} \subset [0, 1]$, we have
with $\delta_n := 2^{-n}$ and $\varepsilon_n := n^{-1}$,
$P\bigl( \sup_{r \in D(n)} \sup_{s \le \delta_n} |X_{r+s} - X_r| > \varepsilon_n \bigr) \le 2^n P\bigl( \sup_{s \le \delta_n} |X_s - X_0| > \varepsilon_n \bigr)$
$\le 2^n K \varepsilon_n^{-4} \delta_n^2 = 2^n K n^4 2^{-2n} = K n^4 2^{-n}$.
However, $\sum K n^4 2^{-n}$ converges, so, by the First Borel-Cantelli Lemma, there
exists a subset $\Omega_0$ of $\Omega$ with $P(\Omega_0) = 1$ such that, for $\omega \in \Omega_0$, there exists an $n_0(\omega)$
such that, for $n \ge n_0(\omega)$,
$\sup_{r \in D(n)} \sup_{s \le 2^{-n}} |X_{r+s}(\omega) - X_r(\omega)| \le n^{-1}$,
whence
$\sup \{ |X_s(\omega) - X_r(\omega)| : r, s \in [0, 1], |r - s| \le 2^{-n} \} \le 3 n^{-1}$.
Thus, for $\omega \in \Omega_0$, $t \mapsto X_t(\omega)$ is (uniformly) continuous on $[0, 1]$. It is obvious now
that P-almost-all paths of $X$ are continuous on $[0, \infty)$. □
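Two numerical facts are used in this proof: that the fourth moment $K$ of a standard normal variable equals 3, and that the series $\sum K n^4 2^{-n}$ converges. The sketch below checks both numerically. The quadrature rule, truncation width and function names are our own choices for illustration, not part of the argument.

```python
import math


def fourth_moment_standard_normal(half_width=12.0, steps=20_000):
    """Approximate E[xi^4], xi ~ N(0, 1), by the composite trapezoidal rule
    on [-half_width, half_width] (the tails beyond are negligible)."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * x ** 4 * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2 * math.pi)


def borel_cantelli_sum(K, terms=200):
    """Partial sum of sum_{n >= 1} K * n^4 * 2^{-n}."""
    return sum(K * n ** 4 * 2.0 ** (-n) for n in range(1, terms + 1))


print(fourth_moment_standard_normal())
print(borel_cantelli_sum(3.0))
```

Since $\sum_{n \ge 1} n^4 2^{-n} = 150$, the partial sums with $K = 3$ stabilise near 450; only the finiteness of the sum matters for the Borel-Cantelli step.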
It is obvious that we can modify $X$ so as to have all its paths continuous.
(Take $X_\cdot(\omega) = 0$ for bad $\omega$.) By using exponential martingales instead of the $X^4$
submartingale, and a lot of ingenuity, one can refine this argument to obtain
Lévy's Modulus-of-Continuity Theorem. See, for example, McKean [1].
(71.2) Canonical Brownian motion. Let $C := C(\mathbb{R}^+, \mathbb{R})$ be the space of all
continuous functions from $[0, \infty)$ to $\mathbb{R}$. For $w \in C$ and $t \ge 0$, define
$\pi_t(w) := w(t)$,
and define the σ-algebras
$\mathcal{A}_t := \sigma(\pi_s : s \le t)$ $(t \ge 0)$, $\quad \mathcal{A}_\infty := \sigma(\pi_s : s \ge 0)$.
If $X$ is either the process of the last section or the process $B$ of Section I.6 (in
either case, with all paths made continuous) then Wiener measure $W$ on $(C, \mathcal{A}_\infty)$
is given by the law of $X$:
$W = P \circ X^{-1}$,
when we regard $X$ as the measurable map $\omega \mapsto X_\cdot(\omega)$ from $(\Omega, \mathcal{F})$ to $(C, \mathcal{A}_\infty)$.
(71.3) DEFINITION (canonical Brownian motion started at 0) and LEMMA.
The setup
$(C, \mathcal{A}_\infty, \{\mathcal{A}_t\}, W; \{\pi_t : t \ge 0\})$
is called canonical Brownian motion on $\mathbb{R}$ started at 0. The Wiener measure
on $(C, \mathcal{A}_\infty)$ is the unique probability measure on $(C, \mathcal{A}_\infty)$ such that, under $W$,
(i) $\pi_0 = 0$, a.s., and
(ii) whenever $t, u \ge 0$, $\pi_{t+u} - \pi_t$ is independent of $\mathcal{A}_t$ and has the $N(0, u)$
distribution.
We know this. Now check out the following lemma.
(71.4) LEMMA. For canonical Brownian motion, for every $t \ge 0$ the process
$\{\pi_{t+u} - \pi_t : u \ge 0\}$
has law $W$ and is independent of $\mathcal{A}_t$.
We know that such things are rather tiresome (and there are more such in
the next section). However, you should be sure that you can prove such results
as that just given. Of course, we more or less did this example during the proof
of Theorem 68.2.
There are good reasons for not insisting that Χ0(ω) = 0 for every ω.
72. Brownian motion relative to a filtered space. Just when you thought that
things were getting clean and tidy... Already during our discussion of
Skorokhod embedding in Section I.7, we had to use non-canonical Brownian
motion. We needed variables α, β etc. that are independent of our Brownian
motion, and our canonical space will not carry these. So, more definitions...
(72.1) DEFINITION (Brownian motion relative to a filtered space). Let $(\Omega, \mathcal{G},
P, \{\mathcal{G}_t\})$ be a filtered space. By a Brownian motion (on $\mathbb{R}$, starting at 0)
relative to this setup, we mean a process $X$ such that
(i) $X(0) = 0$, a.s.;
(ii) all paths of $X$ are continuous;
(iii) whenever $t, u \ge 0$, $X_{t+u} - X_t$ is independent of $\mathcal{G}_t$ and has the $N(0, u)$
distribution.
(72.2) LEMMA. If $X$ is as in (72.1) then, for each $t \ge 0$, the process $\{X_{t+u} - X_t :
u \ge 0\}$ has the Wiener measure as its law, and is independent of $\mathcal{G}_t$. Moreover, $X$
is a Brownian motion relative to the usual augmentation of the setup $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$.
We essentially know all this.
We mention that Lévy's characterisation of Brownian motion (I.2.2) extends
to this situation.
(72.3) LEMMA. Let $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$ be a filtered space. Let $X$ be a continuous
process adapted to this filtration such that $X_0 = 0$ almost surely. Then $X$ is a
Brownian motion (started at 0) relative to the setup $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$ if and only if
both $\{X_t\}$ and $\{X_t^2 - t\}$ are martingales relative to this setup.
The proof is in Section IV.33.
Here is the result we need to begin to play the Skorokhod-embedding game
of Section 1.7 properly. (We also require the Strong Markov Theorem.)
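(72.4) THEOREM. Let $X^*$ be a Brownian motion relative to $(\Omega^*, \mathcal{G}^*, P^*, \{\mathcal{G}^*_t\})$.
Let $(\Omega^{**}, \mathcal{G}^{**}, P^{**})$ be a probability triple carrying a family $\{\alpha^{**}_\lambda : \lambda \in \Lambda\}$ of random
variables.
Let $\Omega := \Omega^* \times \Omega^{**}$, with typical point $\omega = (\omega^*, \omega^{**})$. Define
$\mathcal{G}_t := \mathcal{G}^*_t \times \mathcal{G}^{**}$, $\quad X_t(\omega) := X^*_t(\omega^*)$, $\quad \alpha_\lambda(\omega) := \alpha^{**}_\lambda(\omega^{**})$,
and put $P := P^* \times P^{**}$ on $\mathcal{G} := \mathcal{G}^* \times \mathcal{G}^{**}$. Then
(i) $\{X_t\}$ is a Brownian motion relative to $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t\})$;
(ii) the family $\{\alpha_\lambda : \lambda \in \Lambda\}$ has the same P-law as $\{\alpha^{**}_\lambda : \lambda \in \Lambda\}$ has $P^{**}$-law;
(iii) $\{X_t\}$ and $\{\alpha_\lambda : \lambda \in \Lambda\}$ are independent families.
Proof. For $t \ge 0$, let
$\mathcal{A}_t := \sigma(X_{t+u} - X_t : u \ge 0)$.
Then $\mathcal{A}_t = \mathcal{A}^*_t \times \{\emptyset, \Omega^{**}\}$, with an obvious notation for $\mathcal{A}^*_t$, because $\mathcal{A}_t$ has
no information about $\omega^{**}$. For $A^* \in \mathcal{A}^*_t$, $G^* \in \mathcal{G}^*_t$ and $G^{**} \in \mathcal{G}^{**}$, we have
$P[(A^* \times \Omega^{**}) \cap (G^* \times G^{**})]$
$= P[(A^* \cap G^*) \times G^{**}]$ (logic!)
$= P^*(A^* \cap G^*)\, P^{**}(G^{**})$ (definition of $P$)
$= P^*(A^*)\, P^*(G^*)\, P^{**}(G^{**})$ ($P^*$-independence of $\mathcal{A}^*_t$ and $\mathcal{G}^*_t$)
$= P(A^* \times \Omega^{**})\, P(G^* \times G^{**})$ (definition of $P$).
That $\mathcal{A}_t$ and $\mathcal{G}_t$ are independent now follows from the π-system Lemma 22.2.
The rest is easy. □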
Stopping times
In continuous time, the theory of stopping times becomes a massive and very
deep subject in its own right, many of the crucial parts of which we develop in
Chapter VI. Here we concentrate on developing those results that we need in
Chapters III and IV (the latter being on stochastic-integral theory for continuous
processes).
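73. Stopping time $T$, pre-$T$ σ-algebra $\mathcal{G}_T$, progressive process. Let $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t\})$ be
a filtered space. At the moment, we make no assumptions about usual conditions.
(73.1) DEFINITION (stopping time $T$, σ-algebra $\mathcal{G}_T$). A map $T : \Omega \to [0, \infty]$ is
called a $\{\mathcal{G}_t\}$ stopping time if
(73.2) $\{T \le t\} := \{\omega : T(\omega) \le t\} \in \mathcal{G}_t$ for every $t \le \infty$.
We then define the pre-$T$ σ-algebra $\mathcal{G}_T$ via
(73.3) $A \in \mathcal{G}_T$ if and only if $A \cap \{T \le t\} \in \mathcal{G}_t$ for every $t \le \infty$.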
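(73.4) LEMMA. The following results hold.
(i) If $S \le T$ then $\mathcal{G}_S \subseteq \mathcal{G}_T$.
(ii) $\mathcal{G}_{S \wedge T} = \mathcal{G}_S \cap \mathcal{G}_T$.
(iii) If $A \in \mathcal{G}_{S \vee T}$ then $A \cap \{S \le T\} \in \mathcal{G}_T$.
(iv) $\mathcal{G}_{S \vee T} = \sigma(\mathcal{G}_S, \mathcal{G}_T)$.
(Compare Lemma 58.3, carefully!)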
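(73.5) $\{\mathcal{G}_{t+}\}$ stopping times. Note that $\{\mathcal{G}_{t+}\}$ is also a filtration, that $T$ is a $\{\mathcal{G}_{t+}\}$
stopping time if and only if
(73.6) $\{T < t\} \in \mathcal{G}_t$ for every $t \le \infty$,
and that then, with $\mathcal{G}_{T+} := (\mathcal{G}_{\cdot+})_T$ in the obvious sense,
(73.7) $A \in \mathcal{G}_{T+}$ if and only if $A \cap \{T \le t\} \in \mathcal{G}_{t+}$, $\forall t$,
if and only if $A \cap \{T < t\} \in \mathcal{G}_t$, $\forall t$.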
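(73.8) LEMMA. Let $(S_n : n \in \mathbb{N})$ be a sequence of $\{\mathcal{G}_t\}$ stopping times.
(i) If $S_n \uparrow S$ then $S$ is a $\{\mathcal{G}_t\}$ stopping time.
(ii) If $S_n \downarrow S$ then $S$ is a $\{\mathcal{G}_{t+}\}$ stopping time and $\mathcal{G}_{S+} = \bigcap_n \mathcal{G}_{S_n+}$.
The proof is left as Exercise E79.73.
Note. Of course, if $S_n \uparrow S$, it need not be true that $\mathcal{G}_S = \sigma(\mathcal{G}_{S_n} : n \in \mathbb{N})$.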
A very important part is played in the subject by a sequence
Adapted ⊇ Progressive ⊇ Optional ⊇ Previsible
of ever-more-restrictive notions of 'non-anticipating'. We now meet the second
of these.
(73.9) DEFINITION (progressive process). A process $X = \{X_t : t \ge 0\}$ (with
values in an arbitrary measurable space $(E, \mathcal{E})$) is called $\{\mathcal{G}_t\}$-progressive if for
every $t \ge 0$, the restriction of $(s, \omega) \mapsto X(s, \omega)$ to $[0, t] \times \Omega$ is $(\mathcal{B}[0, t] \times \mathcal{G}_t)$-measurable.
Note that a $\{\mathcal{G}_t\}$-progressive process is automatically $\{\mathcal{G}_t\}$-adapted; see
Lemma 11.4.
An important example of a progressive process (in fact, of an optional process!)
is provided by the following lemma.
(73.10) LEMMA. A right-continuous adapted process with values in a metrisable
space $(E, \mathcal{B}(E))$ is progressive.
Proof. Fix $t \ge 0$. For $n \in \mathbb{N}$, define, for $s < t$,
$Y^{(n)}(s, \omega) := X([k+1] 2^{-n} t, \omega)$ if $k 2^{-n} t \le s < [k+1] 2^{-n} t$,
and put $Y^{(n)}(t, \omega) := X(t, \omega)$. Then $Y^{(n)}$ is trivially $(\mathcal{B}[0, t] \times \mathcal{G}_t)$-measurable, and
$X = \lim Y^{(n)}$. □
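The approximation in this proof samples the path at the right-hand end of the dyadic interval containing $s$, and it is right-continuity that makes the samples converge to $X(s, \omega)$. The sketch below illustrates this for a single deterministic path; the two step functions and the name `dyadic_step_approximation` are hypothetical examples of our own.

```python
import math


def dyadic_step_approximation(path, t, n, s):
    """Y^(n)(s) := path((k+1) 2^-n t) if k 2^-n t <= s < (k+1) 2^-n t,
    and Y^(n)(t) := path(t)."""
    if not 0 <= s <= t:
        raise ValueError("need 0 <= s <= t")
    if s == t:
        return path(t)
    k = math.floor(s * 2 ** n / t)
    return path((k + 1) * t / 2 ** n)


def right_continuous_jump(s):
    """Jumps to 1 at time 0.3 and takes the new value at the jump time."""
    return 1.0 if s >= 0.3 else 0.0


def left_continuous_jump(s):
    """Same jump, but the path still has the old value at time 0.3."""
    return 1.0 if s > 0.3 else 0.0


n = 20
for s in (0.0, 0.29, 0.3, 0.7, 1.0):
    print(s, dyadic_step_approximation(right_continuous_jump, 1.0, n, s),
          right_continuous_jump(s))
```

For the left-continuous variant the approximation at $s = 0.3$ stays at 1 for every $n$ while the path value is 0, so the limit statement $X = \lim Y^{(n)}$ genuinely relies on right-continuity.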
You did of course notice that no analogue of Part (i) of Lemma 58.3 was
given at (73.4). Here is the appropriate analogue.
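(73.11) LEMMA. If $X$ is $\{\mathcal{G}_t\}$-progressive and $T$ is a $\{\mathcal{G}_t\}$ stopping time then
$X_T$ is $\mathcal{G}_T$-measurable.
Of course, $X_T(\omega)$ is defined to equal $X_{T(\omega)}(\omega)$ if $T(\omega) < \infty$. If, for example, $X$ is
a supermartingale such that $X_\infty$ exists, we define $X_T(\omega) = X_\infty(\omega)$ when $T(\omega) = \infty$. In other cases,
we would set $X_\infty(\omega)$ to be identically 0.
Proof. Fix $t \ge 0$. Define $\Omega_t := \{\omega : T(\omega) \le t\}$, and let $\hat{\mathcal{G}}_t$ be the σ-algebra of subsets
of $\Omega_t$ that are in $\mathcal{G}_t$. Define the map $\rho : \Omega_t \to [0, t] \times \Omega_t$ by $\rho(\omega) := (T(\omega), \omega)$, and
define the map $X^{(t)} : [0, t] \times \Omega_t \to E$ by $X^{(t)}(s, \omega) := X(s, \omega)$. Then we have the
pictures
$\Omega_t \xrightarrow{\ \rho\ } [0, t] \times \Omega_t \xrightarrow{\ X^{(t)}\ } E$, $\qquad \hat{\mathcal{G}}_t \xleftarrow{\ \rho^{-1}\ } \mathcal{B}[0, t] \times \hat{\mathcal{G}}_t \xleftarrow{\ (X^{(t)})^{-1}\ } \mathcal{E}$,
whence, for $\Gamma \in \mathcal{E}$,
$\{\omega : X_T(\omega) \in \Gamma\} \cap \{T \le t\} = (X^{(t)} \circ \rho)^{-1}(\Gamma) \in \hat{\mathcal{G}}_t \subseteq \mathcal{G}_t$.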
74. First-entrance (debut) times; hitting times; first-approach times: the easy cases.
We now consider some important examples of stopping times.
(74.1) DEFINITION (first-entrance (debut) times; hitting times). If $\{X_t\}$ is a
process with values in a measurable space $(E, \mathcal{E})$, and if $\Gamma \in \mathcal{E}$, we define
$D_\Gamma(\omega) := \inf\{t \ge 0 : X_t(\omega) \in \Gamma\}$,
$H_\Gamma(\omega) := \inf\{t > 0 : X_t(\omega) \in \Gamma\}$,
with the usual convention that the infimum of the empty set is $\infty$. We call $D_\Gamma$ the
debut of $\Gamma$ for $X$ or first-entrance time of $X$ into $\Gamma$ and $H_\Gamma$ the hitting time of
$\Gamma$ by $X$.
The definition of hitting time may seem rather bizarre. However, it is the one
that matters in potential theory.
Please do note carefully the hypotheses and conclusions of the following
lemmas.
(74.2) LEMMA. The first-entrance time $D_F$ into a closed set $F$ for a continuous
$\{\mathcal{G}_t\}$ adapted process with values in a metric space $(E, \rho)$ is a $\{\mathcal{G}_t\}$ stopping time.
The hitting time $H_F$ of $F$ is a $\{\mathcal{G}_{t+}\}$ stopping time.
Proof. Since $x \mapsto \rho(x, F)$ is continuous, $\omega \mapsto \rho(X_q(\omega), F)$ is $\mathcal{G}_q$-measurable, for
$q \in \mathbb{Q}^+$. But now, for $t \ge 0$, we have by path continuity,
$D_F(\omega) \le t$ if and only if $\inf\{\rho(X_q(\omega), F) : q \in \mathbb{Q} \cap [0, t]\} = 0$,
and the stopping-time property of $D_F$ follows. Proof of the result for $H_F$ is left
to you. □
(74.3) LEMMA. The first-entrance time $D_G$ into an open set $G$ for a
right-continuous $\{\mathcal{G}_t\}$ adapted process with values in a topological space $(E, \mathcal{B}(E))$ is a
$\{\mathcal{G}_{t+}\}$ stopping time.
Proof. Right continuity of paths implies that
$\{D_G < t\} = \bigcup_{q \in \mathbb{Q}^+,\, q < t} \{X_q \in G\} \in \mathcal{G}_t$,
so that $D_G$ is a $\{\mathcal{G}_{t+}\}$ stopping time. Even if all paths of $X$ are continuous, $D_G$
need not be a $\{\mathcal{G}_t\}$ stopping time. (For example, suppose that $X$ is real-valued,
that for some $\omega$, $X_t(\omega) \le 1$ for $t \le 1$, and that $X_1(\omega) = 1$. Let $G = (1, \infty)$. We
cannot tell without looking slightly ahead of time 1 whether or not $D_G = 1$.)
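The parenthetical example can be made concrete with two deterministic continuous paths that agree on $[0, 1]$ but have different debut times into $G = (1, \infty)$. The sketch below is only an illustration under our own assumptions: the two paths are hypothetical, and the debut is approximated on a finite grid, so the value returned for the second path is the first grid point after time 1 rather than the true infimum 1.

```python
import math


def debut_on_grid(path, indicator, horizon, step):
    """First grid time i*step <= horizon at which the path lies in the set;
    math.inf if it is never seen in the set on the grid."""
    n = int(round(horizon / step))
    for i in range(n + 1):
        t = i * step
        if indicator(path(t)):
            return t
    return math.inf


def in_G(x):
    """Indicator of the open set G = (1, oo)."""
    return x > 1.0


def path_a(t):
    """Rises to 1 at time 1, then turns back: never enters G."""
    return t if t <= 1.0 else 2.0 - t


def path_b(t):
    """Rises through level 1 at time 1: D_G = 1, though X_1 is not in G."""
    return t


step = 1.0 / 1024
agree_up_to_1 = all(path_a(i * step) == path_b(i * step) for i in range(1025))
print(agree_up_to_1)
print(debut_on_grid(path_a, in_G, 2.0, step))
print(debut_on_grid(path_b, in_G, 2.0, step))
```

No functional of the path restricted to $[0, 1]$ can separate these two cases, which is why $\{D_G \le 1\}$ need not belong to $\mathcal{G}_1$ while $\{D_G < t\}$ always belongs to $\mathcal{G}_t$.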
(74.4) LEMMA and DEFINITION (first-approach time). Let $X$ be an adapted
R-process with values in a metric space $(E, \rho)$. Let $K$ be a compact subset of $E$.
Define the first-approach time $A_K$ for $K$ as
$A_K := \inf\{t \ge 0 : \text{either } X_t \text{ or } X_{t-} \text{ is in } K\}$.
(We define $X_{0-} := X_0$.) Then $A_K$ is a $\{\mathcal{G}_t\}$ stopping time.
Proof. We have $A_K \le t$ if and only if $\inf\{\rho(X_q(\omega), K) : q \in \mathbb{Q} \cap [0, t]\} = 0$
or $X_t \in K$. □
75. Why 'completion' in the usual conditions has to be introduced. Now for the
result that explains why we need the 'completion' part of the usual conditions.
You are very strongly recommended to examine its proof, so that you start to
appreciate the need for the 'transfinite' methods that find their proper form in
proofs of the Debut and Section Theorems.
(75.1) LEMMA. Again suppose that $X$ is an R-process with values in a separable
metric space $(E, \rho)$ and that $K$ is a compact subset of $E$. Suppose that $X$ is adapted
to a filtration $\{\mathcal{F}_t\}$ that satisfies the usual conditions. Then $D_K$ is a $\{\mathcal{F}_t\}$ stopping
time.
Proof. Let
$S_1 := A_K := \inf\{t \ge 0 : \text{either } X_t \text{ or } X_{t-} \text{ is in } K\}$.
The most intuitive feeling for ordinal numbers will suffice for our discussion.
Recall that one can count through countable ordinals as follows:
$1, 2, 3, \ldots, \alpha, \alpha + 1, \alpha + 2, \ldots, 2\alpha, \ldots, 3\alpha, \ldots, 4\alpha, \ldots, \alpha^2 (= \alpha\alpha)$,
$\ldots, \alpha^3, \ldots, \alpha^4, \ldots, \alpha^\alpha, \alpha^\alpha + 1, \ldots$, etc., etc.,
where $\alpha$ is the first infinite ordinal. Each countable ordinal is either the successor
$\eta + 1$ of some $\eta$ or a limit ordinal, which is the supremum of ordinals less than
it. Define $S_\eta$ for countable ordinals $\eta$ as follows:
$S_{\eta+1} := \inf\{t \ge S_\eta : \text{either } X_t \text{ or } X_{t-} \text{ is in } K\}$,
$S_\eta := \uparrow\lim_{\gamma \uparrow \eta} S_\gamma$ if $\eta$ is a limit ordinal.
The argument used to prove Lemma 74.4 shows that each $S_\eta$ is a $\{\mathcal{F}_t\}$ stopping
time.
The process $X$ may approach $K$ at $S_\beta$, and 'jump away at the last minute',
in which case $S_{\beta+1} > S_\beta$; but it can only make countably many such jumps
away. (It is easy to prove that if the sum of an arbitrary sequence of non-negative
terms is finite then only countably many terms can be non-zero.) We have
$D_K(\omega) = S_{\delta(\omega)}(\omega)$, where $\delta(\omega)$ is the first (necessarily countable) ordinal such
that $S_{\delta(\omega)}(\omega) = S_{\delta(\omega)+1}(\omega)$. The problem is that the number of countable ordinals,
the number of possible values of $\delta(\omega)$, is uncountable (in much the same way
that the number of finite ordinals is infinite).
The probability measure $P$ now comes into play. Let
$c_\eta := E \exp(-S_\eta)$, $\quad c := \inf_\eta c_\eta$,
the infimum being over all countable ordinals. For $n \in \mathbb{N}$, we can find $\eta(n)$ such
that $c_{\eta(n)} < c + n^{-1}$. Let $\eta(\infty)$ be the countable ordinal $\lim \eta(n)$. Then $\eta(\infty)$ is
independent of $\omega$, and, since $c_{\eta(\infty)} = c$, we have
$S_{\eta(\infty)} = \sup_\eta S_\eta$, a.s.(P).
It is now clear that $D_K$ is almost surely equal to the $\{\mathcal{F}_t\}$ stopping time $S_{\eta(\infty)}$.
□
It is not at all easy to prove that $D_K$ need not be $\mathcal{G}_\infty$-measurable if $\{\mathcal{G}_t\}$ is the
unaugmented natural filtration of $X$, but Dellacherie [2] succeeded in doing so.
(75.2) The effect of usual augmentation on stopping times. Proofs of Strong
Markov Theorems rely heavily on the following result.
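(75.3) THEOREM (Dynkin). Let $T$ be a $\{\mathcal{F}_t\}$ stopping time, where $\{\mathcal{F}_t\}$ is the
usual augmentation of $\{\mathcal{G}_t\}$. Then there exists a $\{\mathcal{G}_{t+}\}$ stopping time $S$ such that
$P(S = T) = 1$. Furthermore, $\mathcal{F}_T$ is the smallest σ-algebra containing $\mathcal{G}_{S+}$ and all
P-null sets in $\mathcal{F}$.
Proof. For $n \in \mathbb{N}$, define
(75.4) $T^{(n)}(\omega) := (k + 1) 2^{-n}$ if $k 2^{-n} \le T(\omega) < (k + 1) 2^{-n}$,
with $T^{(n)}(\omega) := \infty$ if $T(\omega) = \infty$. Then
$\Lambda_{k,n} := \{\omega : T^{(n)}(\omega) = k 2^{-n}\} \in \mathcal{F}_{k 2^{-n}}$ $\quad (k \in \mathbb{N} \cup \{\infty\})$,
so that we can find $\Lambda^*_{k,n}$ in $\mathcal{G}_{k 2^{-n}+}$ such that $\Lambda^*_{k,n} = \Lambda_{k,n}$, a.s. (in that their indicator
functions agree, a.s.). Define
$R^{(n)}(\omega) := k 2^{-n}$ on $\Lambda^*_{k,n} \setminus \bigcup_{i < k} \Lambda^*_{i,n}$ $(k \in \mathbb{N})$, $\quad R^{(n)}(\omega) := \infty$ on $\Omega \setminus \bigcup_k \Lambda^*_{k,n}$.
Then $R^{(n)}$ is a $\{\mathcal{G}_{t+}\}$ stopping time. Put $S^{(n)} := \inf_{m \ge n} R^{(m)}$. Then $S^{(n)}$ is a $\{\mathcal{G}_{t+}\}$
stopping time. It is easily checked that $R^{(n)} = T^{(n)}$, a.s.(P), so that $S^{(n)} = T$, a.s.(P), and we know from
Lemma 73.8 that $S := \uparrow\lim S^{(n)}$ is a $\{\mathcal{G}_{t+}\}$ stopping time that clearly satisfies
$S = T$, a.s.(P). Proof of the statement about $\mathcal{F}_T$ is now an easy exercise. □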
76. Debut and Section Theorems. The Debut and Section Theorems are the
fundamental 'measurability' results for stopping-time theory. The proper
techniques for deriving these results involve Choquet capacitability theory and earlier
methods of Sierpinski and others. We owe the theory principally to Sierpinski,
Choquet, Ray, Doob, Hunt, Dynkin, Meyer and Dellacherie. Dellacherie and
Meyer [1] is the definitive account. It is rather hard going, and it might be
advisable to read (say) the appendix to Dynkin [1] first. Perhaps the best thing
to do is to take the Debut and Section Theorems on trust for a while until you
understand how such things are used.
(76.1) THEOREM (Debut Theorem). Let $X$ be a progressive process relative
to a filtration $\{\mathcal{G}_t\}$, and with values in some topological space $E$ (with its Borel
σ-algebra $\mathcal{B}(E)$). Then, for $B \in \mathcal{B}(E)$, the equation
$D_B(\omega) := \inf\{t \ge 0 : X_t(\omega) \in B\}$
defines an $\{\mathcal{F}_t\}$ stopping time, where $\{\mathcal{F}_t\}$ is the usual augmentation of $\{\mathcal{G}_t\}$.
For a proof, see T.IV.50 of Dellacherie and Meyer [1]. (Note that $I_B \circ X$ is
a progressive process with values in $\{0, 1\}$.)
(76.2) THEOREM (Section Theorem: first draft). Let $X$ be a right-continuous
process, $\{\mathcal{G}_t\}$-adapted, with values in some complete separable metric space $E$. Let
$B$ be a Borel subset of $E$. Then, for $\varepsilon > 0$, there exists a $\{\mathcal{G}_t\}$ stopping time $T$ such
that
(i) $X_{T(\omega)}(\omega) \in B$ on $\{T < \infty\}$;
(ii) $P(T < \infty) \ge P(D_B < \infty) - \varepsilon$.
If $\{\mathcal{G}_t\}$ satisfies the usual conditions then we often can, and do, take $T = D_K$,
where $K$ is some 'large' compact subset of $B$.
The hypotheses of Theorem 76.2 are not the natural ones, hence the 'first
draft' in the title. It is important to note that Theorem 76.2 would be false if $X$
were only assumed progressive: the correct measurability requirement on $X$ is that
it be 'optional'. We look further into these matters in Chapter VI.
The significance of the Debut Theorem will already be clear to you. The
Section Theorem may not convey much at the moment. Let us therefore give
you now an illustration of its use. We shall take for granted the strong Markov
property of a certain process; this is proved in Chapter III.
(76.3) Example. For $n \in \mathbb{N}$, let $X_n = \{X_n(t) : t \ge 0\}$ be a Markov chain with two
states $\{1, -1\}$ and Q-matrix
$\begin{pmatrix} -q_n & q_n \\ q_n & -q_n \end{pmatrix}$.
Assume that the chains $X_n$ are independent of one another. Let $E$ be the
multiplicative group $E = \{-1, 1\}^{\mathbb{N}}$ with the obvious product topology, so that
$E$ is homeomorphic to the Cantor set. Let
$X(t) := (X_1(t), X_2(t), \ldots)$.
Then $X$ is a strong Markov process on $E$; more precisely, if $\{\mathcal{F}_t\}$ denotes the
usual augmentation of the natural filtration of $X$, and if $T$ is an $\{\mathcal{F}_t\}$ stopping
time, then the process $\{X(T + t)/X(T)\}$ is independent of $\mathcal{F}_T$ and has the same
law as $\{X(t)/X(0)\}$. (You should consider the adjustments to be made when $T$
can be $\infty$.)
If the $q_n$ grow sufficiently rapidly, then one can prove by local-time techniques
that
$P\bigl( \sup_{x \in E} D_{\{x\}} < \infty \bigr) = 1$,
so that, almost surely, by some finite random time, $X$ will have visited all points
of $E$.
What we do now is investigate the 'robustness' of the Strong Law of Large
Numbers (SLLN) when, as we now assume,
$q_n = 1$ for all $n$.
Then
$P[X_n(t) = X_n(0)] = \tfrac{1}{2}(1 + e^{-2t})$, $\quad \forall (n, t)$,
and one can easily use the SLLN to prove that, for fixed $t$,
$P\bigl[ \limsup n^{-1} \sum_{k \le n} X_k(t) = e^{-2t} \limsup n^{-1} \sum_{k \le n} X_k(0) \bigr] = 1$.
In particular, if $T$ is a stopping time then, for fixed $t$,
(76.4) $P\bigl[ \limsup n^{-1} \sum_{k \le n} X_k(T + t) = e^{-2t} \limsup n^{-1} \sum_{k \le n} X_k(T) \bigm| T < \infty \bigr] = 1$.
Suppose henceforth that
$P[X_n(0) = \pm 1] = \tfrac{1}{2}$, $\quad \forall n$.
Let
$\Gamma := \bigl\{ x \in E : n^{-1} \sum_{k \le n} x_k \to 0 \text{ as } n \to \infty \bigr\}$.
Then, by the SLLN, for fixed $t$,
$P[X(t) \in \Gamma] = 1$,
so, by Fubini's Theorem,
(76.5) $P[X(t) \in \Gamma \text{ for almost all } t] = 1$.
However, if $T$ is a stopping time then, by (76.4),
$P[X(T + t) \notin \Gamma \mid X(T) \notin \Gamma] = 1$, $\quad \forall t \ge 0$,
whence, a fortiori,
(76.6) $P[X(t) \in \Gamma \text{ for almost all } t;\ X(T) \notin \Gamma,\ T < \infty] = 0$.
On comparing (76.5) with (76.6), we see that, for any stopping time $T$,
$P[T < \infty;\ X(T) \notin \Gamma] = 0$.
By the Section Theorem,
$P[X(t) \in \Gamma \text{ for all } t \ge 0] = P(D_{E \setminus \Gamma} = \infty) = 1$.
Thus, when qn = 1 for all n, the SLLN is preserved at all times. We saw earlier
that if the qn grow rapidly then there will be a random set of measure zero of
times on which the SLLN fails in every possible way. This kind of discussion
takes us into the area of'quasi' potential theory—see, for example, Fukushima
[2] and Lyons [3].
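The formula $P[X_n(t) = X_n(0)] = \tfrac{1}{2}(1 + e^{-2t})$ used in this example for $q_n = 1$ can be checked numerically from the Q-matrix by computing the transition matrix $\exp(tQ)$. The sketch below does this with a truncated power series in plain Python; the truncation length and the helper names are our own choices, and a library matrix exponential would serve equally well.

```python
import math


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transition_matrix(q, t, terms=60):
    """exp(tQ) for Q = [[-q, q], [q, -q]], via the series sum_n (tQ)^n / n!."""
    Q = [[-q, q], [q, -q]]
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]          # holds (tQ)^n / n!
    for n in range(1, terms):
        scaled = [[t * Q[i][j] / n for j in range(2)] for i in range(2)]
        term = mat_mul(term, scaled)
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


for t in (0.1, 0.5, 1.0, 2.0):
    P = transition_matrix(1.0, t)
    print(t, P[0][0], 0.5 * (1 + math.exp(-2 * t)))
```

The difference of the two entries in a row is $e^{-2t}$, which is the factor appearing in the SLLN statement relating the averages at time $t$ to those at time 0.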
77. Optional Sampling for R-supermartingales under the usual conditions. Let
$(\Omega, \mathcal{F}, P, \{\mathcal{F}_t\})$ satisfy the usual conditions. Martingales, supermartingales,
stopping times etc. are defined relative to $\{\mathcal{F}_t\}$.
(77.1) THEOREM (approximation from above). Let $X$ be an R-supermartingale,
and let $T$ be a stopping time. For $n \in \mathbb{N}$, let
$D_n^+ := \{k 2^{-n} : k \in \mathbb{Z}^+\}$
be the set of non-negative dyadic rationals of order (less than or equal to) $n$. Fix
$t \ge 0$. Define
(77.2) $T^{(n)}(\omega) := \inf\{q \in D_n^+ : q > T(\omega)\}$, $\quad t^{(n)} := \inf\{q \in D_n^+ : q > t\}$.
Then $T^{(n)}$ is a stopping time relative to $\{\mathcal{F}_q : q \in D_n^+\}$, $T^{(n)} \downarrow T$ and $\mathcal{F}(T^{(n)}) \downarrow \mathcal{F}(T)$.
Moreover,
(77.3) $X(T^{(n)} \wedge t^{(n)}) \to X(T \wedge t)$, a.s. and in $\mathcal{L}^1$.
In particular, $X(T \wedge t) \in \mathcal{L}^1$.
Proof. The fact that $\mathcal{F}(T^{(n)}) \downarrow \mathcal{F}(T)$ follows from Lemma 73.8. Applying the
argument used at (59.2) to the finite discrete-parameter set $D_{n+1}^+ \cap [0, t + 1]$,
we obtain
$E[X(T^{(n)} \wedge t^{(n)}) \mid \mathcal{F}(T^{(n+1)} \wedge t^{(n+1)})] \le X(T^{(n+1)} \wedge t^{(n+1)})$, a.s.,
and
$E X(T^{(n)} \wedge t^{(n)}) \le E X(0)$.
The result (77.3) now follows from the Downward Theorem 51.1. (Our new
labelling reverses the 'direction' of the $n$s!) □
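The dyadic approximation (77.2) is simple to compute for a given value of $T(\omega)$. The sketch below, with a helper name of our own choosing, uses exact rational arithmetic to exhibit the three properties the proof relies on: $T^{(n)} > T$, $T^{(n)}$ is non-increasing in $n$, and $T^{(n)} - T \le 2^{-n}$.

```python
from fractions import Fraction
import math


def dyadic_from_above(T, n):
    """inf{q in D_n^+ : q > T}, where D_n^+ = {k 2^-n : k = 0, 1, 2, ...};
    returns math.inf when T is infinite."""
    if T == math.inf:
        return math.inf
    T = Fraction(T)
    k = math.floor(T * 2 ** n) + 1     # smallest k with k 2^-n strictly above T
    return Fraction(k, 2 ** n)


T = Fraction(1, 3)
for n in range(6):
    print(n, dyadic_from_above(T, n))
```

Note the strict inequality $q > T$: even when $T(\omega)$ is itself dyadic, the approximation lies strictly above it, which is what makes $T^{(n)}$ a stopping time for the discrete skeleton $\{\mathcal{F}_q : q \in D_n^+\}$.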
(77.4) THEOREM (stopped supermartingales are supermartingales). Let $X$ be
an R-supermartingale, and let $T$ be a stopping time. Then $X^T := \{X_{T \wedge t} : t \ge 0\}$ is
an R-supermartingale. Of course, $X^T$ is an R-martingale if $X$ is an R-martingale.
Proof. Fix $0 \le s \le t$. Define $T^{(n)}$ and $t^{(n)}$ as in Theorem 77.1, and define $s^{(n)}$
analogously. By discrete-parameter theory, we have for $n, r \in \mathbb{N}$ with $r \ge n$,
$E[X(T^{(n)} \wedge t^{(n)}) \mid \mathcal{F}(s^{(r)})] \le X(T^{(n)} \wedge s^{(r)})$, a.s.
Letting $r \uparrow \infty$ and using the Downward Theorem for martingales, we have
$E[X(T^{(n)} \wedge t^{(n)}) \mid \mathcal{F}_s] \le X(T^{(n)} \wedge s)$, a.s.
But we may now use the result (77.3) on the left-hand side and right-continuity
of paths on the right-hand side to obtain
$E[X(T \wedge t) \mid \mathcal{F}_s] \le X(T \wedge s)$, a.s.,
as required. □
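The proof rests on the discrete-parameter fact that a stopped martingale is a martingale. As an illustration of that ingredient only, the sketch below checks exactly, by enumerating paths, that $E[M_{T \wedge n}] = 0$ for the discrete-time martingale $M_k = S_k^2 - k$, where $S$ is a simple symmetric random walk and $T$ is the first time $|S|$ reaches a given level. The walk, the level and the function name are our own assumptions.

```python
from fractions import Fraction
from itertools import product


def stopped_martingale_mean(n, level):
    """Exact E[S_{T ^ n}^2 - (T ^ n)] for a simple symmetric random walk S,
    with T the first time |S| reaches `level`."""
    weight = Fraction(1, 2 ** n)          # probability of each +-1 path
    total = Fraction(0)
    for steps in product((-1, 1), repeat=n):
        s, elapsed = 0, 0
        for step in steps:
            if abs(s) >= level:           # already stopped: freeze the path
                break
            s += step
            elapsed += 1
        total += weight * (s * s - elapsed)
    return total


print([stopped_martingale_mean(n, 2) for n in range(1, 9)])
```

The process $S_k^2 - k$ is the discrete counterpart of the martingale $X_t^2 - t$ appearing in Lévy's characterisation, so the same check is a small-scale analogue of stopping Brownian motion at an exit time.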
(77.5) THEOREM (Doob's Optional-Sampling Theorem for UI
R-supermartingales). Suppose that $X$ is an R-supermartingale, and that either $X$ is UI, so
that $\{X_t : t \ge 0\}$ is a UI family, or $X$ is non-negative. Let $S$ and $T$ be stopping
times with $S \le T$. Then $X_T \in \mathcal{L}^1$, and
$E(X_\infty \mid \mathcal{F}_T) \le X_T$, a.s., $\quad E(X_T \mid \mathcal{F}_S) \le X_S$, a.s.,
with a.s. equality in both places if $X$ is a UI R-martingale. In particular, a UI
martingale $M$ is of class (D) in that the family
$\{M_T : T \text{ a stopping time}\}$
is UI.
Remark. You have already been warned that the 'obvious' analogue of the
Doob decomposition for supermartingales is false in continuous time. See
Exercise E79.77a. We therefore cannot proceed as we did in discrete time by
deriving the supermartingale result from the martingale result. We combine
approximation with the discrete-parameter result for supermartingales.
Proof. From the discrete-parameter results (59.1) and (59.5) for $D_{n+1}^+$, we have
$E[X(T^{(n)}) \mid \mathcal{F}(T^{(n+1)})] \le X(T^{(n+1)})$, a.s.,
and
$E X(T^{(n)}) \le E X(0)$.
Hence, by the Downward Theorem, $X(T^{(n)}) \to X(T)$ in $\mathcal{L}^1$ as well as almost
surely (by right-continuity). Next, we have
$E[X(\infty) \mid \mathcal{F}(T^{(n)})] \le X(T^{(n)})$, a.s.,
whence
$E[X(\infty) \mid \mathcal{F}(T)] \le X(T)$, a.s.
We also have, for $n, r \in \mathbb{N}$ with $r \ge n$,
$E[X(T^{(n)}) \mid \mathcal{F}(S^{(r)})] \le X(S^{(r)})$, a.s.,
whence, as previously, on letting $r \uparrow \infty$ and then $n \uparrow \infty$, we obtain the desired
result. □
The martingale case of the above Optional-Sampling Theorem is used
repeatedly in stochastic-integral theory. The following little result will also be
found useful in that subject.
(77.6) THEOREM. Let $\{M_t : 0 \le t \le \infty\}$ be a progressive process such that for
each (finite or infinite) stopping time $T$, we have $E(|M_T|) < \infty$ and $E(M_T) = 0$.
Then $M$ is a UI martingale.
Proof. Let $t \ge 0$ and $F \in \mathcal{F}_t$. Define
$T(\omega) := t$ if $\omega \in F$, $\quad T(\omega) := \infty$ if $\omega \in F^c := \Omega \setminus F$.
Then (check!) $T$ is a stopping time, and
$E(M_T) = E(M_t; F) + E(M_\infty; F^c) = 0$,
$E(M_\infty) = E(M_\infty; F) + E(M_\infty; F^c) = 0$,
whence $E(M_\infty; F) = E(M_t; F)$, and so, $M_t = E(M_\infty \mid \mathcal{F}_t)$, a.s. □
Do the exercise of extending the commutativity property in Theorem 59.6 to
the following continuous-time result.
(77.7) THEOREM. Continue to assume the usual conditions. Let S and T be
stopping times. Then
$E_S E_T = E_T E_S = E_{S \wedge T}$,
where $E_T$ denotes the conditional expectation map $E(\,\cdot \mid \mathcal{F}_T)$.
See the note following Theorem 59.6.
78. Two important results for Markov-process theory. We apologise for 'floating
in and out of the usual conditions'. We shall see that there are reasons for so
doing.
(78.1) THEOREM. The following results hold.
(i) Let Z be an R-process on a triple $(\Omega, \mathcal{G}, P)$. Set
$U(\omega) := \inf\{t \ge 0 : Z_t(\omega) = 0 \text{ or } Z_{t-}(\omega) = 0\}$,
so that U is the time of first approach by Z to 0. Then U is $\mathcal{G}$-measurable and
$G := \{\omega : Z_\cdot(\omega) = 0 \text{ on } [U, \infty)\} \in \mathcal{G}$.
(ii) Let X be a non-negative R-supermartingale relative to a setup
$(\Omega, \mathcal{G}, P, \{\mathcal{G}_t : t \ge 0\})$. Set
$T(\omega) := \inf\{t \ge 0 : X_t(\omega) = 0 \text{ or } X_{t-}(\omega) = 0\}$.
Then
$P(X = 0 \text{ on } [T, \infty)) = 1$.
Proof of (i). That U is $\mathcal{G}$-measurable is obvious from the fact that we could
take every $\mathcal{G}_t$ to be $\mathcal{G}$ in Lemma 74.4. Then, with $\varepsilon$, $q_1$ and $q_2$ rational, we have
$G = \bigcap_{\varepsilon > 0} \bigcup_{q_1 \ge 0} \bigcap_{q_2 > q_1 + \varepsilon} \{\omega : q_1 < U(\omega) < q_1 + \varepsilon;\ Z_{q_2}(\omega) = 0\}$. □
Proof of (ii). We know that X remains a supermartingale relative to the
usual augmentation $(\Omega, \mathcal{F}, P, \{\mathcal{F}_t : t \ge 0\})$ of $(\Omega, \mathcal{G}, P, \{\mathcal{G}_t : t \ge 0\})$, and that T is
an $\{\mathcal{F}_t\}$ stopping time. For $n \in \mathbb{N}$, set $S_n := \inf\{t : X_t < n^{-1}\}$. Then $S_n$ is a
stopping time with $S_n \le T$. For any rational $q \ge 0$, $S_n$ and $T + q$ are stopping
times, and $S_n \le T + q$. Hence, for every n, $E X(T + q) \le E X(S_n) \le n^{-1}$, whence
$P[X(T + q) = 0] = 1$. The rest is obvious. □
The next result has been very influential in shaping the development of
Markov-process theory, particularly in connection with the problem of
determining the 'correct' hypotheses for that theory.
(78.2) THEOREM (Meyer). Let $\{\mathcal{F}_t\}$ satisfy the usual conditions. For each
$n \in \mathbb{N}$, let $X^n$ be an R-supermartingale relative to $\{\mathcal{F}_t\}$. Suppose that the sequence
$(X^n : n \in \mathbb{N})$ is increasing in that
$X^n(t, \omega) \le X^{n+1}(t, \omega)$, $\forall (n, t, \omega)$.
Define
$X(t, \omega) := \sup_n X^n(t, \omega) \le \infty$.
Then P-almost all paths of the $(-\infty, \infty]$-valued process X are R-functions.
The original proof in Meyer [1] remains the simplest. There is a nice, more
sophisticated, proof in Getoor [1].
79. Exercises. Substantial hints for most of the exercises are given at the end
of the section.
(E79.63) Let B be a path-continuous BM($\mathbb{R}$). Show that $X_t := e^{t/2}\cos B_t$ and
$Y_t := e^{t/2}\sin B_t$ define martingales X and Y relative to the natural filtration of
B. Thus (X, Y) is a martingale in $\mathbb{R}^2$ and $X_t^2 + Y_t^2 = e^t$. This is behaviour very
different from that of continuous martingales on $\mathbb{R}$, which, as we shall see, all
resemble Brownian motion on $\mathbb{R}$.
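As a numerical sanity check of the claim in (E79.63), here is a minimal Python sketch. It is not from the text: the function names, the sample size and the seed are illustrative assumptions. It samples $B_t \sim N(0, t)$ at a fixed time and compares the sample means of $e^{t/2}\cos B_t$ and $e^{t/2}\sin B_t$ with the values 1 and 0 that a martingale started at (1, 0) must have. Constant expectation is only a necessary condition for the martingale property, so this supports the exercise but does not prove it.

```python
import math
import random


def sample_xy(t, rng):
    """Draw (X_t, Y_t) = (e^{t/2} cos B_t, e^{t/2} sin B_t) with B_t ~ N(0, t)."""
    b = rng.gauss(0.0, math.sqrt(t))
    scale = math.exp(t / 2.0)
    return scale * math.cos(b), scale * math.sin(b)


def monte_carlo_means(t, n_samples=200_000, seed=0):
    """Sample means of X_t and Y_t (hypothetical helper, fixed seed for repeatability)."""
    rng = random.Random(seed)
    sum_x = sum_y = 0.0
    for _ in range(n_samples):
        x, y = sample_xy(t, rng)
        sum_x += x
        sum_y += y
    return sum_x / n_samples, sum_y / n_samples


mean_x, mean_y = monte_carlo_means(t=1.0)
print(f"mean of X_1 = {mean_x:.4f}, mean of Y_1 = {mean_y:.4f}")
```

The identity $X_t^2 + Y_t^2 = e^t$ holds path by path, so the pair moves on an expanding circle while each coordinate keeps a constant mean.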
(E79.64) Empirical distributions. Let $n \in \mathbb{N}$. Let $X_1, X_2, \ldots, X_n$ be independent
random variables each with the uniform distribution on [0,1]. For $0 \le t \le 1$,
define
$G_n(t) := n^{-1}\#\{k \le n : X_k \le t\}$, $A_n(t) := n^{1/2}[G_n(t) - t]$.
Let $\mathcal{G}_n(t) := \sigma(G_n(s) : s \le t)$. Prove that
$M_n(t) := \frac{A_n(t)}{1-t}$, $B_n(t) := A_n(t) + \int_0^t \frac{A_n(s)}{1-s}\,ds$, $V_n(t) := B_n(t)^2 - G_n(t)$,
define martingales $M_n$, $B_n$ and $V_n$ with parameter set [0,1) relative to $\{\mathcal{G}_n(t) : t \in [0,1)\}$.
Which of these martingales are UI on [0,1)? Note that if we set
$M_n(t) := 0$ and $\mathcal{G}_n(t) := \sigma(X_1, \ldots, X_n)$ for $1 \le t \le \infty$ then $M_n$ is a supermartingale, $\mathcal{L}^1$
convergent at $\infty$, but is not UI.
Note. It is intuitively clear that, as $n \to \infty$, we should have, in some sense,
$G_n(t) \to t$, $B_n \to B$, $A_n \to A$
(on parameter set [0,1)), where B is a Brownian motion and A is a Brownian
bridge.
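(E79.66a) Test your understanding of the regularisation results by proving the
following 'L-analogues'. Part (iii) is particularly instructive. We shall say that
$x : \mathbb{R}^+ \to \mathbb{R}$ is an L-function if
$x_t = \lim_{s \uparrow\uparrow t} x_s$ ($\forall t > 0$), $x_{t+} := \lim_{u \downarrow\downarrow t} x_u$ exists in $\mathbb{R}$ ($\forall t \ge 0$).
(i) If $y : \mathbb{Q}^+ \to \mathbb{R}$ is regularisable, then
$z_t := \lim_{q \uparrow\uparrow t} y_q$ if $t > 0$, $z_t := y_0$ if $t = 0$,
defines an L-function z.
(ii) If $\{Y_t : t \in \mathbb{R}^+\}$ is a supermartingale carried by $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t : t \in \mathbb{R}^+\})$ then the
set G of those $\omega$ for which $q \mapsto Y_q(\omega)$ is regularisable is in $\mathcal{G}$, and P(G) = 1. If we set
$Z_t := \lim_{q \uparrow\uparrow t} Y_q$ if $t > 0$, $Z_t := Y_0$ if $t = 0$,
then Z is an L-process.
(iii) Define $\mathcal{G}_{t-} := \sigma(\mathcal{G}_s : s < t)$ for $t > 0$ and $\mathcal{G}_{0-} := \mathcal{G}_0$, and set $\mathcal{J}_t := \sigma(\mathcal{G}_{t-}, \mathcal{N}(\mathcal{G}_\infty))$,
where $\mathcal{N}(\mathcal{G}_\infty)$ is the collection of P-null sets in $\mathcal{G}_\infty$. Then Z is adapted to $\{\mathcal{J}_t\}$.
Moreover, Z is a supermartingale relative to $\{\mathcal{J}_t\}$.
(iv) Z is a modification of Y if the map $t \mapsto Y_t$ is left-continuous into $\mathcal{L}^1$.
(v) In contrast to the case of R-regularisation, Z can be a modification of Y
even though $t \mapsto Y_t$ is not left-continuous into $\mathcal{L}^1$. Convince yourself of this by
taking an example in which
$Y_t := B\left(\frac{t}{1-t} \wedge \tau\right)$ ($t < 1$), $Y_t := -1$ ($t \ge 1$),
where B is a BM$_0(\mathbb{R})$ and $\tau := \inf\{u : B_u = -1\}$.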
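(E79.66b) Let $\{\mathcal{G}_t\}$ be any filtration, and define $\mathcal{H}_t := \sigma(\mathcal{G}_s : s < t)$ for $t > 0$.
Prove that, for every t, $\mathcal{H}_{t+} = \mathcal{G}_{t+}$.
(E79.67a) Let $(\Omega, \mathcal{F}, P)$ be a triple in which $\mathcal{F}$ is P-complete. Let $\mathcal{N}$ be the
collection of all P-null sets in $\mathcal{F}$. Let $\mathcal{H}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. Prove that
$L := \sigma(\mathcal{H}, \mathcal{N}) = \{U \in \mathcal{F} : U \,\Delta\, K \in \mathcal{N} \text{ for some } K \in \mathcal{H}\} =: R$.
Note that, since $\mathcal{H} \cup \mathcal{N} \subseteq R \subseteq L$, you need only prove that R is a $\sigma$-algebra.
(E79.67b) Prove Lemma 67.4.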
(E79.71a) Likelihood-ratio martingales. Let $(\Omega, \mathcal{G}, \{\mathcal{G}_t\})$ be a filtered space.
Suppose that P and Q are probability measures on $(\Omega, \mathcal{G})$ such that Q is
absolutely continuous relative to P on every $\mathcal{G}_t$ (though not necessarily on all
of $\mathcal{G}$). Define $M_t$ to be a version of the Radon-Nikodym derivative:
$M_t := dQ/dP$ on $\mathcal{G}_t$, a.s.
Prove that M is a $(P, \{\mathcal{G}_t\})$ martingale.
(i) Let $(\Omega, \mathcal{G}, \{\mathcal{G}_t\}) = (C, \mathcal{A}, \{\mathcal{A}_t\})$, the canonical filtered space for continuous
processes on $\mathbb{R}$. Let P be the Wiener law of BM$_0$, and let Q be the law of
Brownian motion with drift c starting at 0. Prove that
$dQ/dP$ on $\mathcal{G}_t$ $= \exp(c\pi_t - \tfrac{1}{2}c^2 t)$.
(ii) Now let $\Omega$ be the space of all right-continuous functions $w : \mathbb{R}^+ \to \mathbb{Z}^+$ such
that w(0) = 0 and that w is constant except for a series of upward jumps of size 1.
Let
$\pi_t(w) := w(t)$, $\mathcal{G} := \sigma(\pi_s : s \ge 0)$, $\mathcal{G}_t := \sigma(\pi_s : s \le t)$.
Let $\lambda$ and $\mu$ be positive constants. Let P (respectively Q) be the law of the Poisson
process of rate $\lambda$ (respectively $\mu$). Calculate $dQ/dP$ on $\mathcal{G}_t$.
(E79.71b) Projecting onto smaller filtrations. Let $(\Omega, \mathcal{G}, P; \{\mathcal{G}_t\})$ be a filtered space
carrying a supermartingale Y. Suppose that $\{\mathcal{H}_t\}$ is a smaller filtration in that
$\mathcal{H}_t \subseteq \mathcal{G}_t$ for every t. Prove that
$Z_t := E(Y_t \mid \mathcal{H}_t)$, a.s.,
defines a supermartingale Z relative to $\{\mathcal{H}_t\}$.
You have already seen at an intuitive level how this applies to give Azéma's
martingale in Section 37. Here is another application.
Let $X_t := B_t + ct$, where B is a BM$_0(\mathbb{R})$ and $c > 0$. Let $\{\mathcal{H}_t\}$ be the natural
filtration of B (equivalently of X). Let $\sigma := \sup\{t : X_t = 0\}$, with the convention
that $\sup \emptyset = 0$. Prove that $Z_t := P(\sigma > t \mid \mathcal{H}_t)$ defines a supermartingale Z relative
to $\{\mathcal{H}_t\}$. Obtain the explicit formula $Z_t = \exp(-2cX_t^+)$, a.s. Deduce that
(*) $P(\sigma \in dt) = c(2\pi t)^{-1/2}\exp(-\tfrac{1}{2}c^2 t)\,dt$.
Give an immediate direct proof of (*): 'And the last shall be first'. (Note. Excursion
theory allows one to write down formulae analogous to (*) in very general
situations.)
(E79.71c) This exercise gives a simple proof of Lévy's descriptions of local time
in (I.14.7) and (37.9). Suppose that $\Lambda$ is a Poisson measure on $[0, \infty) \times (0, \infty)$
with intensity measure Leb $\times\ \nu$, where $\nu \ne 0$, and where
$c(r) := \lim_{\varepsilon \downarrow 0} \frac{\nu(\varepsilon, \infty)}{\nu(r\varepsilon, \infty)}$ exists for $\tfrac{1}{2} \le r \le 1$
and defines a strictly increasing continuous function c on $[\tfrac{1}{2}, 1]$. Set
$M(t, \varepsilon) := \nu(\varepsilon, \infty)^{-1}\Lambda([0, t] \times (\varepsilon, \infty)) - t$.
Show that $M(\cdot, \varepsilon)$ is a martingale, and that, for $\tfrac{1}{2} \le r < 1$, it is almost surely
true that $M(\cdot, r^n) \to 0$ as $n \to \infty$, uniformly on compact t-intervals. Deduce that
almost surely, $M(t, \varepsilon) \to 0$ uniformly on compact t-intervals.
(E79.71d) Read Section IV.2 of Volume 2, and make the sketched proof of
Levy's quadratic-variation result in that section rigorous. (Volume 2 does give
rigorous proofs of much more general results.)
(E79.73) Prove Lemma 73.8.
(E79.77a) Let M be a continuous martingale such that
$P(\sup M_t = \infty, \inf M_t = -\infty) = 1$.
Define
$T(0) := 0$, $T(n) := \inf\{t > T(n-1) : |M_t - M_{T(n-1)}| = 1\}$.
Prove that the law of $\{M_{T(n)} : n \in \mathbb{Z}^+\}$ is the (Markovian!) law of Simple Random
Walk.
(E79.77b) Let M be a continuous martingale such that
$P(\sup M_t > 1) = P(\inf M_t < -1) = 1$.
Define $U_t := \sup_{s \le t} M_s$, $L_t := \inf_{s \le t} M_s$ and
$T := \inf\{t : U_t - L_t = 1\}$.
What is the distribution of $M_T$? (This question arose, and was solved, in work
by LCGR on financial problems. We now have many solutions.)
(E79.77c) The Helms-Johnson example. This is a first look at one of the most
celebrated counterexamples in the subject, one to which we return in Sections III.31,
IV.14 and VI.33.
Suppose that X is a UI supermartingale, and that
(i) $X = M - A$,
where M is a martingale and A a process with non-decreasing paths. Show that
$E(A_\infty) < \infty$, and deduce that M is UI. Prove that X is of class (D) in that the
family
$\{X_T : T \text{ a stopping time}\}$
is UI.
Let B be a BM$_0(\mathbb{R}^3)$, and define the Helms-Johnson process
(ii) $X_t := 1/|B_{t+1}|$.
Prove that X is a UI supermartingale relative to its own natural filtration,
but that X is not of class (D), so that X does not have a 'Doob decomposition'
as (i). Hints. Show that
(iii) $E(X_t^2) = 1/(t + 1)$.
If $T_n := \inf\{t : X_t \ge n\}$ then, by Corollary I.18.3 (which we assume),
(iv) $P(T_n < \infty \mid X_0) = \min(n^{-1}X_0, 1)$.
(E79.77d) More on likelihood-ratio martingales. Let $(\Omega, \mathcal{F}, P; \{\mathcal{F}_t\})$ satisfy the
usual conditions. Martingales, a.s., etc. are defined relative to this setup. Let Q
be a probability measure on $(\Omega, \mathcal{F})$ such that $Q \ll P$ on every $\mathcal{F}_t$. Choose a
right-continuous martingale M with $dQ/dP = M_t$ on every $\mathcal{F}_t$. Show that if T
is an almost surely finite stopping time such that $\{M_{T \wedge t} : t \ge 0\}$ is UI then $Q \ll P$
on $\mathcal{F}_T$ and
$dQ/dP = M_T$, a.s. on $\mathcal{F}_T$.
Prove Reuter's Theorem that if X is a Brownian motion on $\mathbb{R}^n$ with constant
drift vector c and started at 0, and if $T := \inf\{t : |X_t| = 1\}$ is the hitting time of
the unit sphere, then T and $X_T$ are independent variables. Why does this not
conflict with intuition?
Hints for selected exercises
(H79.63) We know that $\exp(i\theta B_t + \tfrac{1}{2}\theta^2 t)$ is a martingale with values in $\mathbb{C}$. Now
take $\theta = 1$.
(H79.64) For $0 \le t \le u \le 1$,
$E(G_n(u) \mid \mathcal{G}_n(t)) = G_n(t) + [1 - G_n(t)]\frac{u - t}{1 - t}$.
This rearranges to say that $M_n$ is a martingale. The real reason that $B_n$ is a
martingale is the SDE
$dB_n = (1 - t)\,dM_n$.
But we can prove the property directly: for $0 \le t < u \le 1$,
$E(B_n(u) \mid \mathcal{G}_n(t)) = \frac{1-u}{1-t}A_n(t) + \int_0^t \frac{A_n(s)}{1-s}\,ds + \int_t^u \frac{A_n(t)}{1-t}\,ds = B_n(t)$.
(Think briefly on how to justify this step.) There is an SDE reason for the
martingale property of $V_n$ too, but you can obtain this directly using the variance
of a binomial distribution.
Of course, $M_n(1-) = 0$, a.s., so $M_n$ cannot be UI on [0,1): if it were, we
would have $M_n(t) = E(M_n(1-) \mid \mathcal{G}_n(t)) = 0$, a.s. We know that $E\{B_n(t)^2\} = E\,G_n(t) = t$,
so that $B_n$ is $\mathcal{L}^2$ bounded and therefore UI. A bit more calculation shows that
$V_n$ is also $\mathcal{L}^2$ bounded.
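(H79.66a) (Hint for Part (iii), which is the tricky bit.) Fix $u > 0$. For $0 \le r < u$,
with $r \in \mathbb{Q}$, write
$V_r := Y_r - E(Y_u \mid \mathcal{G}_r)$.
Check that V is a non-negative supermartingale with parameter set $\mathbb{Q} \cap [0, u)$.
Hence, for $0 \le q < t < r < u$, with $q \in \mathbb{Q}$, we have
$E(V_{u-} \mid \mathcal{G}_q) = E(\liminf_{r \uparrow\uparrow u} V_r \mid \mathcal{G}_q) \le \liminf_{r \uparrow\uparrow u} E(V_r \mid \mathcal{G}_q) \le V_q$, a.s.
But, by the Upward Theorem, as $r \uparrow\uparrow u$,
$Y_r = V_r + E(Y_u \mid \mathcal{G}_r) \to Y_{u-} = V_{u-} + E(Y_u \mid \mathcal{G}_{u-})$, a.s.,
whence
$E(Y_{u-} \mid \mathcal{G}_q) = E(V_{u-} \mid \mathcal{G}_q) + E(Y_u \mid \mathcal{G}_q) \le V_q + E(Y_u \mid \mathcal{G}_q) = Y_q$.
Now let $q \uparrow\uparrow t$ to get the required result that Z is a $\{\mathcal{J}_t\}$ supermartingale.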
{H79.67b) For и > ί, &u must extend а(^и,Ж), whence
However, {jfj satisfies the usual conditions, so that Jft = ^"t,Vi.
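Suppose now that $H \in \mathcal{H}_t$ and that $u(n) \downarrow\downarrow t$. By Exercise 79.67a, we can find
$G_{u(n)}$ in $\mathcal{G}_{u(n)}$ such that $N_{u(n)} := H \,\Delta\, G_{u(n)} \in \mathcal{N}$. Now set $G := \bigcap G_{u(n)}$ and $N := \bigcup N_{u(n)}$.
Then $G \in \mathcal{G}_{t+}$ and $N \in \mathcal{N}$, and, since you can easily check that $H \,\Delta\, G \subseteq N$, the
proof is finished.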
(H79.71a) For every t, let $M_t = dQ/dP$ on $\mathcal{G}_t$, a.s. Then, for $s \le t$, we have, for
$G_s \in \mathcal{G}_s$,
$E(M_t; G_s) = Q(G_s) = E(M_s; G_s)$,
so that M is a martingale.
(i) If $p(t; x, y)$ is the transition density function for Brownian motion, and
$q_c(t; x, y)$ that for Brownian motion with drift c, then
$q_c(t; x, y) = \exp[c(y - x) - \tfrac{1}{2}c^2 t]\,p(t; x, y)$,
and, for $n \in \mathbb{N}$, $0 = t_0 < t_1 < \cdots < t_n = t$, and $x_0, x_1, \ldots, x_n \in \mathbb{R}$ with $x_0 = 0$,
$Q\left(\bigcap_i \{\pi_{t_i} \in dx_i\}\right) = \prod_i q_c(t_i - t_{i-1}; x_{i-1}, x_i)\,dx_i = \cdots$.
(ii) For $j, n \in \mathbb{Z}^+$, we have
$p_\mu(t; j, j + n) = \exp(-\mu t)\frac{(\mu t)^n}{n!} = \exp[(\lambda - \mu)t]\left(\frac{\mu}{\lambda}\right)^n p_\lambda(t; j, j + n)$;
and the answer is $\exp[(\lambda - \mu)t](\mu/\lambda)^{\pi_t}$.
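The Poisson answer can be checked numerically. The sketch below is not from the text, and its function names, parameter values and the truncation level `k_max` are illustrative assumptions. It verifies two things for the candidate density $\exp[(\lambda - \mu)t](\mu/\lambda)^{k}$ on the event $\{\pi_t = k\}$: that it reweights the Poisson($\lambda t$) probabilities into the Poisson($\mu t$) probabilities, and that its P-expectation is 1, as the mean of any likelihood-ratio martingale must be.

```python
import math


def poisson_pmf(k, mean):
    """P(N = k) for N ~ Poisson(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def likelihood_ratio(k, t, lam, mu):
    """Candidate value of dQ/dP on G_t when pi_t = k."""
    return math.exp((lam - mu) * t) * (mu / lam) ** k


def check(t, lam, mu, k_max=120):
    """Return (E_P[M_t] truncated at k_max, worst reweighting error)."""
    total_mass = 0.0
    worst_error = 0.0
    for k in range(k_max + 1):
        p = poisson_pmf(k, lam * t)   # law of pi_t under P
        q = poisson_pmf(k, mu * t)    # law of pi_t under Q
        m = likelihood_ratio(k, t, lam, mu)
        total_mass += m * p
        worst_error = max(worst_error, abs(m * p - q))
    return total_mass, worst_error


total_mass, worst_error = check(t=1.5, lam=2.0, mu=3.0)
print(total_mass, worst_error)
```

Only the one-dimensional law of $\pi_t$ is checked here; the full statement on $\mathcal{G}_t$ needs the product form over finitely many times, as in part (i) of the hint.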
(H79.71b) The 'theory' part is just the Tower Property of conditional
expectations: for $0 \le s \le t$,
$E(Z_t \mid \mathcal{H}_s) = E(Y_t \mid \mathcal{H}_t \mid \mathcal{H}_s) = E(Y_t \mid \mathcal{H}_s)$
$= E(Y_t \mid \mathcal{G}_s \mid \mathcal{H}_s) \le E(Y_s \mid \mathcal{H}_s) = Z_s$, a.s.
As regards the example, $Y_t := I_{\{\sigma > t\}}$ is decreasing, and is obviously a
supermartingale relative to the filtration with $\mathcal{G}_t = \mathcal{H}_\infty$ for every t. Hence Z is a
supermartingale. That $Z_t = \exp(-2cX_t^+)$ you know from Section I.9. The
formula (*) now follows from
$P(\sigma > t) = E\,E(Y_t \mid \mathcal{H}_t) = E(Z_t)$
and some integration.
Now for the immediate proof. Let $\tilde{B}$ be the BM$_0$ defined as $\tilde{B}(t) = tB(1/t)$.
Then
$\sigma = \sup\{t : B(t) + ct = 0\} = \sup\{t : t\tilde{B}(1/t) + ct = 0\}$
$= \sup\{t : \tilde{B}(1/t) = -c\} = 1/\inf\{t : \tilde{B}(t) = -c\}$.
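The last display says that $\sigma$ is the reciprocal of a Brownian first-passage time to a level at distance c. The sketch below is not from the text. It assumes the standard first-passage density $c(2\pi s^3)^{-1/2}\exp(-c^2/2s)$, and all function names are illustrative. It checks that the change of variables $s = 1/t$ turns that density into the right-hand side of (*), and that (*) integrates to 1.

```python
import math


def sigma_density(t, c):
    """Right-hand side of (*): c (2 pi t)^(-1/2) exp(-c^2 t / 2)."""
    return c * (2.0 * math.pi * t) ** -0.5 * math.exp(-0.5 * c * c * t)


def hitting_density(s, c):
    """Assumed standard density of the first passage of BM_0 to a level at distance c."""
    return c * (2.0 * math.pi * s**3) ** -0.5 * math.exp(-c * c / (2.0 * s))


def sigma_density_via_inversion(t, c):
    """Density of 1/tau obtained from hitting_density with s = 1/t, |ds/dt| = t^-2."""
    return hitting_density(1.0 / t, c) / t**2


def total_mass(c, n=20_000, width=12.0):
    """Midpoint rule for the integral of (*) after the substitution t = u^2."""
    u_max = width / c
    h = u_max / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += sigma_density(u * u, c) * 2.0 * u   # dt = 2 u du
    return total * h


for t in (0.1, 1.0, 7.5):
    print(t, sigma_density(t, 0.7), sigma_density_via_inversion(t, 0.7))
print(total_mass(0.7))
```

The substitution $t = u^2$ removes the integrable singularity of (*) at $t = 0$, which is why a plain midpoint rule is enough here.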
(H79.71c) Since $E\{M(t, \varepsilon)^2\} = \nu(\varepsilon, \infty)^{-1}t$, the fact that $M(\cdot, r^n) \to 0$ uniformly
on compact t-intervals follows from the Submartingale Inequality and the
Borel-Cantelli Lemma. Interpolating for $\varepsilon$ between $r^{n+1}$ and $r^n$ for r very
close to 1, by using monotonicity, is left to you.
(H79.73) If $S_n \uparrow S$ then $\{S \le t\} = \bigcap\{S_n \le t\}$.
If $S_n \downarrow S$ then $\{S < t\} = \bigcup\{S_n < t\}$.
If $S_n \downarrow S$ and $G \in \bigcap \mathcal{G}_{S_n+}$ then, for every n and every t,
$G \cap \{S_n < t\} \in \mathcal{G}_t$,
whence $G \cap \{S < t\} = \bigcup(G \cap \{S_n < t\}) \in \mathcal{G}_t$.
(H79.77b) Simon Harris found this direct solution. Suppose that $0 < x < y < 1$.
If $M_T \in [x, y]$ then
(i) M hits $y - 1$ before it hits y; and
(ii) after hitting $y - 1$, it hits x before it hits $x - 1$.
Hence, by the logic of the previous question,
$P(M_T \in [x, y]) \le y(y - x)$.
On the other hand, if the following three conditions hold:
(i) M hits $y - 1$ before it hits x;
(ii) after hitting $y - 1$, M hits 0 before it hits $x - 1$;
(iii) after hitting $y - 1$ and then 0, M hits y before $y - 1$;
then $M_T \in [x, y]$. Hence
$P(M_T \in [x, y]) \ge \frac{x}{1 + x - y} \cdot \frac{y - x}{1 - x} \cdot (1 - y)$.
The rest is easy.
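Dividing both bounds by $y - x$ and letting $y \downarrow x$ squeezes them to the common value x, which points to a density $|x|$ on $(-1, 1)$ for $M_T$ and hence to $E|M_T| = 2/3$. The simulation below is a hedged illustration, not the text's method. It replaces M by a simple random walk on a lattice with N steps per unit (an assumption), for which the same gambler's-ruin argument gives $P(M_T = k/N) = |k|/(N(N+1))$ and $E|M_T| = (2N+1)/(3N)$. All names, the seed and the path count are illustrative.

```python
import random


def stopped_value(n_levels, rng):
    """Run a simple random walk from 0 until (max - min) first equals n_levels.

    Returns the stopped position rescaled by n_levels, so the range at stopping is 1.
    """
    pos = hi = lo = 0
    while hi - lo < n_levels:
        pos += 1 if rng.random() < 0.5 else -1
        hi = max(hi, pos)
        lo = min(lo, pos)
    return pos / n_levels


def mean_abs_stopped(n_levels=20, n_paths=5000, seed=0):
    """Monte Carlo estimate of E|M_T| for the lattice walk."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += abs(stopped_value(n_levels, rng))
    return total / n_paths


n_levels = 20
estimate = mean_abs_stopped(n_levels)
lattice_value = (2 * n_levels + 1) / (3 * n_levels)   # tends to 2/3 as n_levels grows
print(estimate, lattice_value)
```

At the stopping time the walk sits at its running maximum or its running minimum, which is the structural fact the hint exploits.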
(H79.77c) If X is a UI supermartingale satisfying (i), then X is bounded in
$\mathcal{L}^1$, and, since
$E(A_t) = E(A_0) + E(X_0 - X_t)$,
we must have $E(A_\infty) < \infty$. But now X and A are UI, whence M is also UI,
and, by Theorem 77.5, M is of class (D). Since A is trivially of class (D), it
follows that X is of class (D).
In the example,
$E(X_t^2) = \int_0^\infty r^{-2}(2\pi(t+1))^{-3/2}\exp\left(-\frac{r^2}{2(t+1)}\right)4\pi r^2\,dr = \frac{1}{t+1}$,
so that X is bounded in $\mathcal{L}^2$, and so is UI. Clearly, $X_\infty = 0$, a.s., so that, if X is
of class (D) then, for any sequence $(T_n)$ of stopping times with $T_n \uparrow \infty$, we shall
have $X(T_n) \to 0$ in $\mathcal{L}^1$. But, for the given sequence of stopping times, we have
$\liminf E X(T_n) \ge \liminf nP(T_n < \infty) \ge E(X_0) > 0$.
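The second-moment formula (iii) is easy to confirm by quadrature. The sketch below is not from the text, and its function name, grid size and cut-off are illustrative assumptions. It integrates $r^{-2}$ against the radial density of a centred Gaussian in $\mathbb{R}^3$ with variance $t + 1$ per coordinate, which is the law of $|B_{t+1}|$ for a BM$_0(\mathbb{R}^3)$, and compares the result with $1/(t+1)$.

```python
import math


def second_moment(t, n=20_000, width=12.0):
    """Midpoint-rule value of E(X_t^2) = E(|B_{t+1}|^-2) for B a BM_0 in R^3."""
    var = t + 1.0                            # variance of each coordinate of B_{t+1}
    r_max = width * math.sqrt(var)           # the Gaussian tail beyond this is negligible
    h = r_max / n
    norm = (2.0 * math.pi * var) ** -1.5     # normalising constant of the 3-d Gaussian
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        shell = 4.0 * math.pi * r * r        # surface area of the sphere of radius r
        total += r**-2 * norm * math.exp(-r * r / (2.0 * var)) * shell
    return total * h


for t in (0.0, 1.0, 3.5):
    print(t, second_moment(t), 1.0 / (t + 1.0))
```

The factor $r^{-2}$ is cancelled exactly by the surface-area factor $r^2$, which is why the second moment is finite in three dimensions.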
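(H79.77d) Let $F \in \mathcal{F}_T$. Then
$F \cap \{T \le t\} \in \mathcal{F}_{T \wedge t} = \mathcal{F}_T \cap \mathcal{F}_t$,
and so
$Q(F \cap \{T \le t\}) = E(M_t; F \cap \{T \le t\}) = E(M_{T \wedge t}; F \cap \{T \le t\})$.
Now let $t \uparrow \infty$ to get
$Q(F \cap \{T < \infty\}) = E(M_T; F \cap \{T < \infty\})$.
For the proof of Reuter's Theorem see IV.39.6 in Volume 2.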
6. PROBABILITY MEASURES ON LUSIN SPACES
This Part consists of two main themes: 'weak' convergence of measures, and
existence of regular conditional probabilities. These themes are linked via their
use of inner regularity of measures relative to compact sets. In the Daniell-
Kolmogorov Theorem 31.1, we were able to utilize this inner regularity by
making the assumption that the state-space Ε was Lusin. Now we are going
to make this assumption on the sample space Ω. The product sample space
from the DK Theorem is useless for this purpose, but we have now learnt
that we can work with the space of continuous paths or that of R-paths, both
of which spaces are Lusin (and even Polish).
We present—fully, and in the simplest way we could devise—all the 'theory'
of 'weak convergence', and how it applies to the space of continuous paths, the
only case that we shall need. See 'Pointers to the main results' below. For the
way in which the theory applies to the space of R-functions with the Skorokhod
topology, you will have to see the excellent books by Billingsley [2], Parthasarathy
[1] and Ethier and Kurtz [1].
Let $X_1, \ldots, X_n$ be independent identically distributed real-valued random
variables each with mean 0 and variance 1. Let $\mu_n$ be the law of
$Y_n := n^{-1/2}(X_1 + \cdots + X_n)$.
The Central Limit Theorem tells us that $\mu_n$ converges 'weakly' to the law $\mu$ of
a standard normal N(0,1) random variable. If, for example, each $X_k$ is $+1$ or
$-1$ with probability $\tfrac{1}{2}$ each, and if A denotes the set of algebraic numbers,
then
$\mu_n(A) = P(Y_n \in A) = 1 \ne 0 = \mu(A)$.
For which Borel sets B do we have $\mu_n(B) \to \mu(B)$? How do we formulate 'weak
convergence' generally so that it will apply, for example, to the Donsker
Invariance Principle of Section I.8 and allow us to derive interesting consequences
(look ahead to (84.7)) from it? These are among the questions that we shall
address. We use weak convergence to obtain existence of solutions to martingale
problems in Section V.23 of Volume 2.
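The coin-tossing example can be made concrete. In the sketch below (not from the text; function names and the choice n = 2500 are illustrative assumptions) the law $\mu_n$ of $Y_n$ is computed exactly from binomial coefficients. Every atom of $\mu_n$ sits at a point $m/\sqrt{n}$ with m an integer, which is an algebraic number, yet the mass that $\mu_n$ gives to a half-line is already close to the corresponding normal probability.

```python
import math


def law_of_yn_cdf(n, y):
    """Exact P(Y_n <= y) for Y_n = (X_1 + ... + X_n)/sqrt(n), X_i = +1 or -1 fairly."""
    # Y_n = (2K - n)/sqrt(n) with K ~ Binomial(n, 1/2), so Y_n <= y iff K <= k_max.
    k_max = math.floor((y * math.sqrt(n) + n) / 2.0)
    k_max = min(max(k_max, -1), n)
    count = 0
    binom = 1                                 # C(n, 0)
    for k in range(k_max + 1):
        count += binom
        binom = binom * (n - k) // (k + 1)    # C(n, k + 1), exact integer arithmetic
    return count / 2**n


def normal_cdf(y):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


n = 2500
print(law_of_yn_cdf(n, 1.0), normal_cdf(1.0))
```

The half-line $(-\infty, 1]$ has boundary $\{1\}$, which the normal law does not charge; the set of algebraic numbers is dense with dense complement, so its boundary is the whole line.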
Regular conditional probabilities provide an important language for
probability theory and an important technique for martingale problems etc. See the
proof that solutions of martingale problems are Markovian in Section V.21 in
Volume 2.
We begin by recalling the Stone-Weierstrass Theorem and the Riesz
Representation Theorem, results that we shall also need in Markov-process theory.
For the case where J is a compact metrisable space, we shall put in explicit
form (also useful later) the fact that the set Pr(J) of probability measures on
$(J, \mathcal{B}(J))$ is again a compact metrisable space. Results for the case in which we
are most interested, that of probability measures on a Polish space S (a space
homeomorphic to a complete separable metric space), will be deduced by
embedding S as a Borel subset of a compact metrisable space K. Therefore the
only property of S that we need is that S is a Lusin space (a space homeomorphic
to a Borel subset of a compact metrisable space!). Since Lusin spaces occur
frequently, we use the 'Lusin' hypothesis—but the main reason for doing so is
that it makes things easier!
The vexed question of terminology. There have always been conflicts of
terminology in this area. Probabilists have always used 'weak convergence' for something
close to functional analysts' 'weak* convergence' (and very different from
functional analysts' 'weak convergence'). Since our treatment depends crucially
on the fact that, in functional analysts' language, 'the unit ball in the dual space
is weak* compact', and since we rely on functional analysis more in this part
of the chapter than elsewhere in the book, we use terminology consistent with
that of functional analysis while 'doing the work'. Having made everything clear
(we hope), we shall then, from a clearly signalled point on, regress to probabilists'
terminology.
Note on functional analysis. Through no fault of their own, many research
students these days are less familiar with functional analysis than students were
when the first edition of this book was published. We therefore include a more
systematic linking of our results with those in the functional-analysis texts. We
emphasize that one does not know a priori that, for a Polish space S, the Cb(S)
topology on Pr(S) is metrisable. This is why we are forced to use nets rather
than sequences.
Pointers to the main results. The 'weak', or Cb(S), topology on the set Pr(S) of
probability measures on a Lusin space S is defined in (83.1). This topology is
shown to be metrisable in (83.7). Prohorov's sufficient condition (also necessary
when S is Polish) for conditional compactness of subsets of Pr(S) is given in
(83.10). The Continuous-Mapping Principle is given in (84.2); and Skorokhod's
Representation Theorem (which gives a clear picture of the Continuous-
Mapping Theorem and of much else) is found in Section 86. It is shown in
Section 82 that, with the topology of uniform convergence on compacts, the
space $W = C([0, \infty); \mathbb{R})$ is Polish, with the usual algebra of σ-cylinders as its
Borel σ-algebra. Prohorov's theorem on 'weak' compactness is translated into
practicable form for Pr(W) in Section 85. Finally, the relation between 'weak'
convergence and convergence of finite-dimensional distributions for W is
explained in Section 87.
'Weak convergence'
80. C(J) and Pr(J) when J is compact Hausdorff. Let J be a compact Hausdorff
space. This is a standard setting in functional analysis. We recall various
fundamental results, for which D&S (that is, Dunford and Schwartz [1]), remains
a superb reference. (Nowadays, you have to read 'sphere' there as 'ball'.)
Let C(J) denote the Banach algebra of continuous (and necessarily bounded)
real-valued functions on J with the usual supremum norm.
(80.1) THEOREM (Stone-Weierstrass Theorem, D&S IV.6.16). Let A be a
subalgebra of C(J) that contains constant functions and separates points of J: for
distinct $x, y \in J$, there exists an element f in A such that $f(x) \ne f(y)$. Then A is dense in
C(J).
(80.2) DEFINITION (Pr(J), inner regularity for a single measure). Let Pr(J)
denote the set of probability measures on $(J, \mathcal{B}(J))$. An element $\mu$ of Pr(J) is called
inner regular if, for every $B \in \mathcal{B}(J)$,
$\mu(B) = \sup\{\mu(K) : K \text{ compact}, K \subseteq B\}$.
(80.3) THEOREM (Riesz Representation Theorem, D&S IV.6.3). Let $\phi$ be a
linear increasing functional $\phi : C(J) \to \mathbb{R}$ such that $\phi(1) = 1$. Then there exists a
unique inner regular element $\mu$ of Pr(J) such that
$\phi(f) = \mu(f) = \int_J f\,d\mu$.
Of course, 1 denotes the constant function equal to 1 on J; and '$\phi$ is increasing'
means that $f \le g$ on J implies that $\phi(f) \le \phi(g)$. The hypotheses on $\phi$ force $\phi$
to be a bounded linear functional of norm 1: $\phi \in C(J)^*$.
(80.4) Discussion. What is going on here? Why the mysterious inner regularity?
The answer is that the smallest σ-algebra on J with respect to which all
continuous functions are measurable, the so-called Baire σ-algebra on J, may
well be smaller than $\mathcal{B}(J)$. A probability measure on the Baire σ-algebra has
a unique extension to an inner regular element of Pr(J). Does this matter to
probabilists? Yes, it does. These ideas can be used in a very illuminating
alternative approach to the proof of the Daniell-Kolmogorov Theorem and
to a study of its limitations and how to deal with them. Nelson [1] deserves
much of the credit for this. See also Meyer [1] and Tjur [1].
(80.5) The C(J), or weak*, topology on C(J)*. The C(J) topology on the space
C(J)* of all bounded linear functionals on the Banach space C(J) is obtained
by making sets of the form
(80.6) $\{\phi \in C(J)^* : |\phi(f_i) - \phi_0(f_i)| < \varepsilon_i,\ 1 \le i \le n\}$
a basis for neighbourhoods of the point $\phi_0$ in C(J)*. Note that this topology
is automatically Hausdorff.
We can study general topology in much the same way as we studied
elementary metric topology provided that we replace convergence of sequences
by convergence of nets (generalised sequences). See D&S I.7. A directed set D
is a partially ordered set for which every finite subset has an upper bound in
D. A net is a family $(x_\alpha : \alpha \in D)$ parametrised by some directed set D. If $(x_\alpha : \alpha \in D)$
is a net of points in a topological space E, we say that $x_\alpha \to x$ if, for every open
set G containing x, there exists $\alpha_0$ in D such that $x_\alpha \in G$ whenever $\alpha \ge \alpha_0$.
The C(J) topology of C(J)* may therefore be described by saying that a net
$(\phi_\alpha)$ of elements of C(J)* converges to the element $\phi$ of C(J)* if and only if
(80.7) $\phi_\alpha(f) \to \phi(f)$ for all f in C(J).
For $f \in C(J)$, the statement that $\phi_\alpha(f) \to \phi(f)$ means that, given $\varepsilon > 0$, there exists
an element $\alpha_0$ in D such that
$|\phi_\alpha(f) - \phi(f)| < \varepsilon$ whenever $\alpha_0 \le \alpha$.
(80.8) THEOREM (Alaoglu, D&S V.4.2). The unit ball
$\{\phi \in C(J)^* : \|\phi\| \le 1\}$
in C(J)* is compact in the C(J) topology.
In effect, this is just Tychonov's Theorem. We can identify the set of inner
regular elements of Pr(J) with the closed set of elements $\phi \in C(J)^*$
such that $\phi$ is increasing and maps 1 to 1. Hence the inner regular elements of
Pr(J) form a compact set in the C(J) topology.
81. C(J) and Pr(J) when J is compact metrisable. In this section, we assume
that J is a compact metrisable space. Things become much simpler. It is difficult
to overstate the importance of the following result.
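(81.1) THEOREM. Every element of Pr(J) is inner regular.
Note. The Baire and Borel σ-algebras on J agree.
Proof. Let $\mu \in \mathrm{Pr}(J)$. Let $\mathcal{A}$ be the class of B in $\mathcal{B}(J)$ such that, for every $\varepsilon > 0$,
there exist a compact set K with $K \subseteq B$ and an open set G with $G \supseteq B$ such that
(81.2) $\mu(B \setminus K) < \varepsilon$, $\mu(G \setminus B) < \varepsilon$.
It is immediate that the complement of a set in $\mathcal{A}$ is again in $\mathcal{A}$. Suppose
that $B_1$ and $B_2$ are in $\mathcal{A}$, and let $B := B_1 \cap B_2$. Let $\varepsilon > 0$. For i = 1, 2, choose a
compact $K_i$ and an open $G_i$ with
$K_i \subseteq B_i \subseteq G_i$, $\mu(B_i \setminus K_i) < \tfrac{1}{2}\varepsilon$, $\mu(G_i \setminus B_i) < \tfrac{1}{2}\varepsilon$.
Then if $K := K_1 \cap K_2$ and $G := G_1 \cap G_2$, we have (81.2). By this stage we know
that $\mathcal{A}$ is an algebra.
Now let $(B_n)$ be an increasing sequence of elements of $\mathcal{A}$ with union B. For
each n, choose a compact $K_n \subseteq B_n$ and an open $G_n$ with $G_n \supseteq B_n$ with
$\mu(B_n \setminus K_n) < \varepsilon 2^{-n-1}$, $\mu(G_n \setminus B_n) < \varepsilon 2^{-n}$.
Then $G := \bigcup G_n$ is open and $\mu(G \setminus B) < \varepsilon$. The set $L := \bigcup K_n$ satisfies $\mu(B \setminus L) < \tfrac{1}{2}\varepsilon$,
but L need not be compact. However, for some N, $K := \bigcup_{n \le N} K_n$ is compact
with $\mu(K) > \mu(L) - \tfrac{1}{2}\varepsilon$; and we have proved (81.2). We now know that $\mathcal{A}$ is a
σ-algebra.
If K is a closed (hence compact) subset of J then K is the intersection of the
sequence $(G_n)$ of open sets:
$K = \bigcap G_n$, $G_n := \{x \in J : \rho(x, K) < n^{-1}\}$,
where $\rho$ is the metric on J. Hence every closed set is in $\mathcal{A}$, and $\mathcal{A} = \mathcal{B}(J)$. □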
(81.3) THEOREM. C(J) is separable, and Pr(J) in its C(J) topology is compact
metrizable.
Proof. Let $(x_n)$ be a countable dense subset of J (why is there such a set?), and
define
$h_k(x) := \rho(x, x_k)$.
Note that the functions $h_k$ separate points of J.
Let A be the collection of functions in C(J) that are finite sums of the form
$q + \sum q(k_1, \ldots, k_r; n_1, \ldots, n_r)\, h_{k_1}^{n_1} \cdots h_{k_r}^{n_r}$,
where q and the q(·,·,...) are rational constants. Then the closure of A is an
algebra containing all constant functions and separating points of J. By the
Stone-Weierstrass Theorem, A is dense in C(J). Since A is countable, C(J) is
a separable metric space.
Let $(f_n)$ be a countable dense subset of C(J). Consider the map
(81.4) $\mathrm{Pr}(J) \ni \mu \mapsto (\mu(f_1), \mu(f_2), \ldots) \in V := \prod_n [-\|f_n\|, \|f_n\|]$.
This map is clearly one-one (why?). Moreover, for a net $(\mu_\alpha)$ in Pr(J),
$\mu_\alpha(f) \to \mu(f)$, $\forall f \in C(J)$, if and only if $\mu_\alpha(f_n) \to \mu(f_n)$, $\forall n$.
Hence the map (81.4) is a homeomorphism, and Pr(J) is homeomorphic to
a compact subset of the metrizable space V. Note that the Riesz Representation
Theorem is still necessary to obtain the fact that the image of Pr(J) in V is closed.
□
82. Polish and Lusin spaces. We begin by recalling the definitions.
(82.1) DEFINITION (Polish space; Lusin space). Let S be a topological space.
Then S is called a Polish space if the topology of S arises from a metric with
respect to which S is complete and separable. The space S is called a Lusin space if S is
homeomorphic to a Borel subset of a compact metric space J.
(82.2) The space $(W, \mathcal{A})$. By far the most important example for us is the case
when S is the space
$W := C([0, \infty), \mathbb{R})$
of all real-valued continuous functions on $[0, \infty)$. This is the path space for (1-dimensional)
Brownian motion and diffusions.
(82.3) LEMMA. In the topology of uniform convergence on compact sets, W is
a Polish space, and the σ-algebra of σ-cylinders,
$\mathcal{A} := \sigma(\pi_t : t \ge 0)$, $\pi_t(w) := w(t)$ ($w \in W$),
is the Borel σ-algebra $\mathcal{B}(W)$ on W.
Proof. A suitable metric on W is given by
$\rho(w_1, w_2) := \sum_n 2^{-n}\rho_n(w_1, w_2)[1 + \rho_n(w_1, w_2)]^{-1}$,
where
$\rho_n(w_1, w_2) := \sup_{t \in [0, n]} |w_1(t) - w_2(t)|$.
That W is complete and separable follows from the corresponding result for
C([0,n]).
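To see how this metric behaves, here is a hedged Python sketch; it is an approximation, not the metric itself. Paths are represented as Python callables, each supremum $\rho_n$ is replaced by a maximum over a finite grid, the sum is taken over $n \ge 1$ and truncated after `n_terms` terms. All of these choices, and the function names, are assumptions made for illustration.

```python
import math


def rho_n(w1, w2, n, grid_per_unit=200):
    """Grid approximation to sup over t in [0, n] of |w1(t) - w2(t)|."""
    steps = n * grid_per_unit
    return max(abs(w1(i / grid_per_unit) - w2(i / grid_per_unit))
               for i in range(steps + 1))


def rho(w1, w2, n_terms=12, grid_per_unit=200):
    """Truncated version of the sum over n >= 1 of 2^-n rho_n / (1 + rho_n)."""
    total = 0.0
    for n in range(1, n_terms + 1):
        r = rho_n(w1, w2, n, grid_per_unit)
        total += 2.0**-n * r / (1.0 + r)
    return total


def zero_path(t):
    return 0.0


def late_path(t):
    """Agrees with zero_path on [0, 5] and is far from it afterwards."""
    return 100.0 * max(0.0, t - 5.0)


print(rho(zero_path, math.sin), rho(zero_path, late_path))
```

Two paths that agree on [0, 5] are within $2^{-5}$ of each other however wildly they differ later, which is exactly what uniform convergence on compact sets requires of the metric.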
That $\mathcal{A} \subseteq \mathcal{B}(W)$ follows because each $\pi_t$ is a continuous map from W to $\mathbb{R}$.
Next note that if $w_1 \in W$ then
$\rho_n(w, w_1) = \sup_{q \in \mathbb{Q} \cap [0, n]} |\pi_q(w) - \pi_q(w_1)|$,
so that each $\rho_n$, and therefore also $\rho$, is $\mathcal{A}$-measurable. If F is a closed subset
of W and $\{w_n\}$ is a countable dense subset of F then
$F = \{w \in W : \inf_n \rho(w, w_n) = 0\}$,
and $F \in \mathcal{A}$. Thus $\mathcal{A} = \mathcal{B}(W)$.
(82.4) LEMMA. If S is a Lusin space then every probability measure on S is
inner regular.
Proof. This is an immediate consequence of Theorem 81.1. A probability
measure $\mu$ on $(S, \mathcal{B}(S))$ has a canonical extension to a probability measure on
$(J, \mathcal{B}(J))$ with $\mu(J \setminus S) = 0$. □
(82.5) THEOREM. A topological space S is Polish if and only if it is homeomorphic
to a $G_\delta$ subset (countable intersection of open sets) of a compact metric
space J. In particular, every Polish space is a Lusin space.
The 'if' part is not necessary for us. You can find it in Section 6, No. 1,
Theorem 1 of Bourbaki [1]. Our proof of the 'only if' part is an extended
version of one found there.
Proof of the 'only if' part. Let S be our Polish space. We prove that S may be
embedded as a $G_\delta$ subset of the compact metrisable space $J := [0, 1]^{\mathbb{N}}$. This
result is so important for us that we provide every detail of the proof.
Step 1: Let $\tilde{\rho}$ be a metric for S under which S is complete and separable, and
put $\rho := \tilde{\rho}/(1 + \tilde{\rho})$. Then $\rho$ is also a metric under which S is complete
and separable, and $0 \le \rho \le 1$. Choose a countable dense subset $\{x_n : n \in \mathbb{N}\}$ of
S, and let $\alpha$ be the map $\alpha : S \to J$ defined as follows:
$\alpha(x) := (\rho(x, x_1), \rho(x, x_2), \ldots)$.
Let us prove that $\alpha$ is a homeomorphism of S to $\alpha(S)$. We need only show that
if (x(n)) is a sequence of elements of S and $x \in S$, then the statements
(82.6) $x(n) \to x$,
(82.7) $\rho(x(n), x_k) \to \rho(x, x_k)$ for every k
are equivalent. Since each $\rho(\cdot, x_k)$ is continuous on S, it is immediate that (82.6)
implies (82.7). Suppose now that (82.7) holds. Since
$\rho(x(n), x) \le \rho(x(n), x_k) + \rho(x_k, x)$,
we have from (82.7)
$\limsup_n \rho(x(n), x) \le 2\rho(x, x_k)$, $\forall k$.
Now let $x_k \to x$ to see that $\rho(x(n), x) \to 0$.
Step 2: For the moment fix $x \in S$. Let d be a metric giving the topology of J.
Since $\alpha^{-1}$ is continuous on $\alpha(S)$ at $\alpha(x)$, for $\varepsilon > 0$, we can find $\delta(\varepsilon) > 0$ such that
$y \in S$ and $d(\alpha(x), \alpha(y)) < \delta(\varepsilon)$ imply that $\rho(x, y) < \varepsilon$. In particular, if $n \in \mathbb{N}$, then,
taking $\varepsilon = (2n)^{-1}$ and $\delta = \min(\delta(\varepsilon), \varepsilon)$, we see that if $B_{J,d}(\alpha(x), \delta)$ is the open ball
in J of d-radius $\delta$, then $B_{J,d}(\alpha(x), \delta)$ has d-diameter at most 1/n and $\alpha(S) \cap
B_{J,d}(\alpha(x), \delta)$ has $\rho$-diameter at most 1/n.
Step 3: Now think of S as identified with $\alpha(S)$ sitting in J. For $n \in \mathbb{N}$ and $x \in \bar{S}$,
the closure of S in J, put x in $U_n$ if x has a J-neighbourhood $N_{x,n}$ of d-diameter
less than 1/n such that the $\rho$-diameter of $S \cap N_{x,n}$ is also less than 1/n. We have
already proved at Step 2 that $U_n \supseteq S$. Now we claim that $U_n$ is open in $\bar{S}$. So
suppose that $x \in U_n$ with $N_{x,n}$ as above. Any $z \in \bar{S}$ sufficiently close to x in the
d-metric will also belong to $N_{x,n}$, and we can take $N_{z,n} = N_{x,n}$. Hence $U_n$ is open
in $\bar{S}$.
Suppose that $x \in \bigcap_n U_n$. For each n, pick a point $x_n$ of S (remember that $x \in \bar{S}$!)
in $\bigcap_{k \le n} N_{x,k}$. Then $d(x, x_n) \le 1/n$, so that $x_n \to x$ in (J, d). But also, for $r \ge n$, the
points $x_r$ and $x_n$ are in $N_{x,n}$, so that $\rho(x_r, x_n) < 1/n$. Hence $(x_r)$ is a Cauchy
sequence in $(S, \rho)$, a complete metric space. Hence, for some $x_0 \in S$, $x_n \to x_0$ in
$(S, \rho)$. But, since $\alpha$ is a homeomorphism, we must also have $x_n \to x_0$ in J. Thus
$x = x_0 \in S$. Since $U_n$ is open in $\bar{S}$, we must have $U_n = \bar{S} \cap V_n$ where $V_n$ is open in
J. Hence
$S = \bigcap_n U_n = \bar{S} \cap \bigcap_n V_n$.
To show that S is a $G_\delta$ in J, we need only show that $\bar{S}$ is a $G_\delta$ in J; and this is
obvious because
$\bar{S} = \bigcap_n \{y \in J : d(y, S) < 1/n\}$. □
83. The $C_b(S)$ topology of Pr(S) when S is a Lusin space; Prohorov's Theorem.
Throughout this important section, S denotes a Lusin space, so that S is a Borel
subset of a compact metric space $(J, \rho)$.
(83.1) DEFINITION (the $C_b(S)$ topology of Pr(S)). We denote the space of
bounded continuous functions on S by $C_b(S)$. We denote by Pr(S) the set of
probability measures on $(S, \mathcal{B}(S))$. If $(\mu_\alpha)$ is a net of elements of Pr(S) and $\mu \in \mathrm{Pr}(S)$,
we say that $\mu_\alpha$ converges to $\mu$ in the $C_b(S)$ topology if
$\mu_\alpha(f) \to \mu(f)$ for all $f \in C_b(S)$.
A basis of neighbourhoods of the point $\mu_0 \in \mathrm{Pr}(S)$ is therefore provided by sets of
the form
$\{\mu \in \mathrm{Pr}(S) : |\mu(f_i) - \mu_0(f_i)| < \varepsilon_i,\ 1 \le i \le n\}$,
where $n \in \mathbb{N}$, each $f_i \in C_b(S)$ and each $\varepsilon_i > 0$.
The most obvious way to guarantee $C_b(S)$ convergence of sequences of
elements in Pr(S) is as follows.
(83.2) LEMMA. Suppose that (X„) is a sequence of (S,@(S))-valued random
variables on some probability triple (Ω, &, P), and that Xn -* X, a.s. Then the law
μη ofXn converges to the law μ of X in the Cb(S) topology. The same conclusion
holds if we only have Xn-+X in probability in that, for every ε > 0,
Ρ[ρ(Χη,Χ)>ε]->0 as n->oo.
Proof. First assume almost sure convergence. Then, for feCb(S), we have
μπ(/) = Ε/(*„ΗΕ/(Χ) = μ(/).
If we assume convergence in probability then we shall have almost sure
convergence along any sufficiently fast subsequence, etc. □
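A small numerical illustration of Lemma 83.2 may help. The sketch below is not part of the text's argument: it assumes S = R, an arbitrary bounded continuous test function f(x) = 1/(1 + x²), and the hypothetical coupling X_n = X + Z/n (so that X_n → X surely), and it estimates |μ_n(f) − μ(f)| on a common sample.

```python
import math
import random

def f(x):
    """A bounded continuous test function on R (an arbitrary choice)."""
    return 1.0 / (1.0 + x * x)

def law_gap(n, xs, zs):
    """|mu_n(f) - mu(f)| estimated on a common sample, where X_n = X + Z/n."""
    m = len(xs)
    mu_n_f = sum(f(x + z / n) for x, z in zip(xs, zs)) / m
    mu_f = sum(f(x) for x in xs) / m
    return abs(mu_n_f - mu_f)

rng = random.Random(0)
xs = [rng.random() for _ in range(10_000)]          # samples of X, uniform on [0, 1]
zs = [rng.gauss(0.0, 1.0) for _ in range(10_000)]   # samples of the perturbation Z
gaps = {n: law_gap(n, xs, zs) for n in (1, 10, 100, 1000)}
```

Since f is Lipschitz with constant 3√3/8, each gap is at most that constant times the sample mean of |Z|, divided by n, so the estimated gaps shrink at rate 1/n in this particular coupling.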
(83.3) Example. Let $S = C([0,1], \mathbb{R})$ with the supremum-norm topology. Let
S(n) be the normalized random walk in Section I.8. Then, by (I.8.3), we see that
the law of S(n) converges in the Cb(S) topology to the law of Brownian motion
with parameter set [0,1].
Now, back to the theory! If $(x_\alpha : \alpha \in D)$ is a net in R, we define
$\limsup x_\alpha := \inf_{\alpha_0 \in D} \sup \{x_\alpha : \alpha \ge \alpha_0\}$,
and we define the lim inf analogously. We have $x_\alpha \to x$ if and only if
$\limsup x_\alpha = \liminf x_\alpha = x$.
(83.4) THEOREM. The following three conditions on a net $(\mu_\alpha)$ of elements of
Pr(S) are equivalent:
(83.5)(i) $\mu_\alpha \to \mu$ in the $C_b(S)$ topology;
(83.5)(ii) $\limsup \mu_\alpha(F) \le \mu(F)$ for every closed $F \subseteq S$;
(83.5)(iii) $\liminf \mu_\alpha(G) \ge \mu(G)$ for every open $G \subseteq S$.
If therefore (83.5)(i) holds and $B \in \mathcal{B}(S)$ satisfies $\mu(\partial B) = 0$, where $\partial B$ is the
frontier (closure\interior) of B, then $\mu_\alpha(B) \to \mu(B)$.
Proof. The equivalence of (83.5)(ii) and (83.5)(iii) is trivial.
Now suppose that (83.5)(i) holds, and that F is closed in S. For each n, the
function $f_n$ defined by
$f_n(x) = \max(0, 1 - n\rho(x, F))$
is an element of $C_b(S)$, and $f_n \downarrow I_F$ (the indicator function of F) as $n \uparrow \infty$. For each n,
$\limsup_\alpha \mu_\alpha(I_F) \le \lim_\alpha \mu_\alpha(f_n) = \mu(f_n)$,
so that (83.5)(ii) follows on letting $n \uparrow \infty$.
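To finish the proof, we need to show that (83.5)(ii) implies (83.5)(i); and for
this, we follow Billingsley [2]. Assume (83.5)(ii). We first show that
(83.6) $\limsup \mu_\alpha(f) \le \mu(f)$, $\forall f \in C_b(S)$.
By replacing f by a suitable linear combination af + b, where a > 0, we see
that it is enough to prove (83.6) when 0 < f < 1. Pick such an f, and define
$F_i := \{s \in S : f(s) \ge i/k\}$ $(0 \le i \le k)$,
where k is a temporarily fixed positive integer. Then $F_i \downarrow$ as $i \uparrow$ (with $F_0 = S$ and $F_k = \emptyset$), and
$k^{-1}(i-1) \le f < k^{-1} i$ on $F_{i-1} \setminus F_i$,
so that
$\sum_{i=1}^{k} k^{-1}(i-1)\,[\mu(F_{i-1}) - \mu(F_i)] \le \mu(f) \le \sum_{i=1}^{k} k^{-1} i\,[\mu(F_{i-1}) - \mu(F_i)]$.
By partial summation, we find that
$k^{-1} \sum_{i=1}^{k} \mu(F_i) \le \mu(f) \le k^{-1} + k^{-1} \sum_{i=1}^{k} \mu(F_i)$.
The same inequalities hold with μ replaced by any $\mu_\alpha$. Thus, since each $F_i$ is closed and (83.5)(ii) holds,
$\mu(f) \ge k^{-1} \sum_i \mu(F_i) \ge k^{-1} \sum_i \limsup \mu_\alpha(F_i) \ge \limsup k^{-1} \sum_i \mu_\alpha(F_i)$
$\ge \limsup \mu_\alpha(f) - k^{-1}$.
Since k is arbitrary, (83.6) follows, and, by applying (83.6) both to f and to −f,
(83.5)(i) follows. □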
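To see that the inequalities in (83.5)(ii) and (83.5)(iii) can be strict, here is a minimal sketch under assumptions of our own choosing: S = R, μ_n the unit mass at 1/n and μ the unit mass at 0, with F = {0} closed and G = (0, 2) open. The function `f` is the approximating function max(0, 1 − kρ(x, F)) from the proof, for this particular F.

```python
def mu_n(n, indicator):
    """mu_n = unit mass at 1/n, evaluated on a set given by its indicator."""
    return 1.0 if indicator(1.0 / n) else 0.0

def mu(indicator):
    """mu = unit mass at 0."""
    return 1.0 if indicator(0.0) else 0.0

def closed_F(x):
    """Indicator of the closed set F = {0}."""
    return x == 0.0

def open_G(x):
    """Indicator of the open set G = (0, 2)."""
    return 0.0 < x < 2.0

def f(k, x):
    """f_k(x) = max(0, 1 - k * dist(x, F)) for F = {0}."""
    return max(0.0, 1.0 - k * abs(x))

tail = range(100, 200)  # a finite stand-in for "n large"
limsup_F = max(mu_n(n, closed_F) for n in tail)   # 0, while mu(F) = 1
liminf_G = min(mu_n(n, open_G) for n in tail)     # 1, while mu(G) = 0
```

Here μ_n → μ in the C_b topology (μ_n(g) = g(1/n) → g(0) for continuous g), yet μ_n(F) = 0 for every n although μ(F) = 1; the frontier of F carries μ-mass, so the last sentence of Theorem 83.4 does not apply to B = F.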
(83.7) THEOREM. For $\mu \in \Pr(S)$, let $\hat\mu$ be the extension of μ to an element of
Pr(J) with $\hat\mu(J \setminus S) = 0$. Then the map $\mu \mapsto \hat\mu$ is a homeomorphism of Pr(S) with its
$C_b(S)$ topology to the subset $\{\nu : \nu(S) = 1\}$ of Pr(J) with its $C_b(J)$ topology. Hence
the $C_b(S)$ topology of Pr(S) is metrisable.
Proof. We must show that if $(\mu_\alpha)$ is a net in Pr(S) and $\mu \in \Pr(S)$ then the conditions
(83.8) $\mu_\alpha(f) \to \mu(f)$, $\forall f \in C_b(S)$,
(83.9) $\hat\mu_\alpha(f) \to \hat\mu(f)$, $\forall f \in C_b(J)$,
are equivalent. Since the restriction to S of an element of $C_b(J)$ is automatically
in $C_b(S)$, it is obvious that (83.8) implies (83.9).
Now assume that (83.9) holds. A closed subset F of S is of the form $S \cap Y$,
where Y is closed in J. By Theorem 83.4,
$\limsup \mu_\alpha(F) = \limsup \hat\mu_\alpha(Y) \le \hat\mu(Y) = \mu(F)$,
so that, again by Theorem 83.4, the result (83.8) holds. □
Of course the metrisability of Pr (S) is a relief: we can return to working with
sequences. 'Let the ungodly fall into their own nets together; and let me ever
escape them' (Book of Psalms).
(83.10) THEOREM (Prohorov's Theorem) and DEFINITION (tightness). A
sufficient condition for a subset H of Pr(S) to be conditionally compact (that is,
for its closure to be compact) in the $C_b(S)$ topology is that H be tight in the following
sense:
(83.11) for each ε > 0 there exists a compact subset $K_\varepsilon$ of S such that
$\mu(K_\varepsilon) > 1 - \varepsilon$, $\forall \mu \in H$.
If S is Polish then this tightness condition is also necessary.
It is the 'sufficiency' part that is important for us.
Proof of sufficiency. Suppose that (83.11) holds. Since Pr(J) and Pr(S) are
metrisable, conditional compactness is the same as conditional sequential
compactness. Further, we know from Theorem 81.3 that every subset of Pr(J)
is conditionally sequentially compact. It now follows from Theorem 83.7 that
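we need only show that if $\mu_n \in H$ and $\mu_n \to \nu$ in Pr(J) then $\nu(S) = 1$. This is,
however, almost obvious: from Theorem 83.4, we have, $K_\varepsilon$ being compact and hence closed in J,
$\nu(K_\varepsilon) \ge \limsup \mu_n(K_\varepsilon) \ge 1 - \varepsilon$.
Hence $\nu(S) = 1$, as required. □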
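The tightness condition (83.11) is easy to probe numerically on S = R. The sketch below is an illustration under our own assumptions, not an example from the text: it compares the family {N(0, s²) : s ≤ 1} with a finite stand-in for the drifting family {N(m, 1) : m ∈ N}, using compact sets of the form [−a, a]. The helper names are hypothetical.

```python
import math

def normal_mass(a, mean, sd):
    """Mass that N(mean, sd^2) gives to the compact interval [-a, a]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))
    return cdf(a) - cdf(-a)

def worst_mass(a, params):
    """Infimum over the family of mu([-a, a])."""
    return min(normal_mass(a, m, s) for m, s in params)

tight_family = [(0.0, s / 10.0) for s in range(1, 11)]     # N(0, s^2), s = 0.1, ..., 1
drifting_family = [(float(m), 1.0) for m in range(0, 50)]  # N(m, 1), m = 0, ..., 49
```

For the first family a single interval [−4, 4] already has mass above 1 − 10⁻³ under every member. For the drifting family the mass of any fixed [−a, a] under N(m, 1) tends to 0 as m grows, so no single compact set can serve all members of the full infinite family, and indeed N(m, 1) has no weakly convergent subsequence.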
Proof of necessity when S is Polish. Let S be Polish. Let ρ be a metric on S
such that (S, ρ) is complete with countable dense set $\{x_n\}$. Let V be a compact
subset of Pr(S) in the $C_b(S)$ topology, and let ε > 0 be given.
For each $r \in \mathbb{N}$, the open subsets $G_r^n$ of S, where
$G_r^n := \bigcup_{j \le n} B_\rho(x_j, 1/r)$, where $B_\rho(y, \delta) := \{x \in S : \rho(x, y) < \delta\}$,
satisfy $G_r^n \uparrow S$ as $n \uparrow \infty$, so that
$U_r^n := \{\mu \in V : \mu(G_r^n) > 1 - \varepsilon 2^{-r}\} \uparrow V$ as $n \uparrow \infty$.
However, it is clear from the result (83.5)(iii) that
$\{\mu \in V : \mu(G_r^n) \le 1 - \varepsilon 2^{-r}\}$
is closed in V, whence $U_r^n$ is open in V. Since V is compact, $U_r^{n(r)} = V$ for some
n(r), so that
$\mu(G_r^{n(r)}) > 1 - \varepsilon 2^{-r}$, $\forall \mu \in V$.
Now put
$K := \bigcap_r \overline{G_r^{n(r)}}$,
the 'bar' signifying closure in S. Then $\mu(K) > 1 - \varepsilon$ for every $\mu \in V$. Moreover, K is closed in S,
and therefore complete under ρ. For every r,
$K \subseteq \bigcup_{j \le n(r)} B_\rho(x_j, 2/r)$,
so that K is totally bounded. Because K is complete and totally bounded for
the metric ρ, K is compact (D&S, I.6.14). The proof is complete. □
84. Some useful convergence results. Again, let S be a Lusin space.
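(84.1) LEMMA. Suppose that h is a measurable function from $(S, \mathcal{B}(S))$ to $(\tilde{S}, \mathcal{B}(\tilde{S}))$
where $\tilde{S}$ is another Lusin space, with metric $\tilde\rho$. Then the set of points $D_h$ at which
h is discontinuous is in $\mathcal{B}(S)$.
Proof. With δ and ε denoting positive rationals, we have
$D_h = \bigcup_\varepsilon \bigcap_\delta A_{\varepsilon,\delta}$,
where $A_{\varepsilon,\delta}$ is the open subset of S consisting of those x in S for which there
exist y, z in S such that $\rho(x, y) < \delta$, $\rho(x, z) < \delta$ and $\tilde\rho(h(y), h(z)) > \varepsilon$. □
(84.2) LEMMA (Continuous-Mapping Principle). Let h and $D_h$ be as in Lemma
84.1. Suppose that $(\mu_n)$ is a sequence in Pr(S) with $\mu_n \to \mu$, and that $\mu(D_h) = 0$. Then
(84.3) $\mu_n \circ h^{-1} \to \mu \circ h^{-1}$ in the $C_b(\tilde{S})$ topology of $\Pr(\tilde{S})$.
Proof. Let Γ be a closed set in $\tilde{S}$. Let F be the closure in S of $h^{-1}(\Gamma)$. Then
$h^{-1}(\Gamma) \subseteq F \subseteq h^{-1}(\Gamma) \cup D_h$.
From (83.5)(ii),
$\limsup \mu_n \circ h^{-1}(\Gamma) \le \limsup \mu_n(F) \le \mu(F) = \mu \circ h^{-1}(\Gamma)$,
so that, by Theorem 83.4, the result (84.3) holds. □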
(84.4) $C_b(\mathbb{R})$ convergence in Pr(R). Let μ be an element of Pr(R), and introduce
the associated distribution function $F(x) := \mu(-\infty, x]$. A point $a \in \mathbb{R}$ is called an
atom of μ (or of F) if
$\mu(\{a\}) = F(a) - F(a-) > 0$,
or equivalently, if F is discontinuous at a. Since μ can have at most n atoms
of mass at least 1/n, the number of atoms of μ is countable, so that the set of non-atoms
of μ is dense in R.
(84.5) LEMMA. Let $(\mu_n)$ be a sequence of elements of Pr(R), and let $\mu \in \Pr(\mathbb{R})$.
Introduce the associated distribution functions $F_n$ and F. Then the following
conditions are equivalent:
(84.6)(i) $\mu_n \to \mu$ in the $C_b(\mathbb{R})$ topology of Pr(R);
(84.6)(ii) $F_n(x) \to F(x)$ at every non-atom x of F.
(Skorokhod representation for $C_b(\mathbb{R})$ convergence in Pr(R)). Moreover, if (84.6)(ii)
holds then we can find a probability triple $(\Omega, \mathcal{F}, P)$ carrying $(\mathbb{R}, \mathcal{B})$-valued random
variables $X_n$ with law $\mu_n$ and X with law μ such that $X_n \to X$ almost surely.
For the general Skorokhod representation for $C_b(S)$ convergence in Pr(S),
with S a Lusin space, see Section 86.
Proof that (84.6)(i) implies (84.6)(ii). This is an immediate application of the
last sentence of Theorem 83.4, taking B there to be $(-\infty, x]$.
Proof that (84.6)(ii) implies (84.6)(i). It is clearly enough to prove the last
sentence of the Lemma. Suppose that (84.6)(ii) holds.
Let $(\Omega, \mathcal{F}, P) = ([0,1], \mathcal{B}[0,1], \mathrm{Leb})$. For $\omega \in \Omega$, define
$X^+(\omega) := \sup\{x : F(x) \le \omega\} = \inf\{x : F(x) > \omega\}$,
$X^-(\omega) := \sup\{x : F(x) < \omega\} = \inf\{x : F(x) \ge \omega\}$;
and make the analogous definitions for $X_n^\pm$. If $z > X^-(\omega)$ then $F(z) \ge \omega$, so that,
by right-continuity of F, we have $F(X^-(\omega)) \ge \omega$, and
$X^-(\omega) \le c$ implies that $\omega \le F(X^-(\omega)) \le F(c)$.
We now see that $X^-(\omega) \le c$ if and only if $\omega \le F(c)$, whence $P[X^-(\omega) \le c] = F(c)$.
If $\omega < F(c)$ then $X^+(\omega) \le c$, so that
$F(c) = P[\omega < F(c)] \le P[X^+(\omega) \le c]$.
But, since $X^- \le X^+$, we must have
$P[X^+(\omega) \le c] \le P[X^-(\omega) \le c] = F(c)$,
and it is now clear that equality must hold throughout, so that both $X^-$ and
$X^+$ have law μ. Since, for every rational c, we have
$P(X^- \le c < X^+) = P(X^- \le c) - P(X^+ \le c) = 0$,
it is clear that $X^+$ and $X^-$ are almost surely equal.
Fix $\omega \in \Omega$. Let z be a non-atom of F with $z > X^+(\omega)$. Then $F(z) > \omega$, so that
(by (84.6)(ii)), for large n, we shall have $F_n(z) > \omega$ and $X_n^+(\omega) \le z$. Hence
$\limsup X_n^+(\omega) \le z$.
But we can choose non-atoms z with $z \downarrow\downarrow X^+(\omega)$ to get
$\limsup X_n^+(\omega) \le X^+(\omega)$.
Finally, since $\liminf X_n^-(\omega) \ge X^-(\omega)$ follows similarly and $X^+ = X^-$ almost
surely, we have, with $X_n$ [respectively, X] denoting either of $X_n^+$, $X_n^-$ [respectively,
$X^+$, $X^-$],
$X_n \to X$, a.s. □
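The quantile construction in this proof is concrete enough to run. The following sketch makes assumptions that are not in the text: μ_n is taken to be the uniform law on {1/n, 2/n, ..., 1}, whose weak limit is the uniform law on [0, 1], so that X(ω) = ω. The function `quantile_minus` (a hypothetical name) computes X⁻(ω) = inf{x : F(x) ≥ ω} for a finitely supported law.

```python
import bisect

def quantile_minus(atoms, probs, omega):
    """X^-(omega) = inf{x : F(x) >= omega} for a finitely supported law.

    `atoms` must be sorted increasingly; `probs` are the matching masses.
    """
    cum, total = [], 0.0
    for p in probs:
        total += p
        cum.append(total)                  # cum[i] = F(atoms[i])
    i = bisect.bisect_left(cum, omega)     # first index with F(atoms[i]) >= omega
    return atoms[min(i, len(atoms) - 1)]

def x_n(n, omega):
    """Quantile coupling for mu_n = uniform law on {1/n, 2/n, ..., 1}."""
    atoms = [k / n for k in range(1, n + 1)]
    probs = [1.0 / n] * n
    return quantile_minus(atoms, probs, omega)
```

On the single probability space ([0, 1], Leb) one then has |X_n(ω) − ω| ≤ 1/n (up to rounding) for every ω, which is the almost sure convergence asserted by the lemma in this special case.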
(84.7) The arcsine law. Consider the situation in (83.3) in which $S := C([0,1]; \mathbb{R})$
with the supremum norm, and in which S(n) is the normalised random walk of
Section I.8. Let $\mu_n$ be the law of S(n) on $(S, \mathcal{B}(S))$ and μ the Wiener law of $BM_0$.
For $w \in S$, define
$h(w) := \mathrm{Leb}\{s : 0 \le s \le 1;\ w(s) > 0\}$.
Then (why?) h is a monotone limit of continuous functions on S, and so is Borel
from S to [0,1]. If $w_k \to w$ in S, then
$\{s \le 1 : w(s) > 0\} \subseteq \liminf \{s \le 1 : w_k(s) > 0\}$
$\subseteq \limsup \{s \le 1 : w_k(s) > 0\}$
$\subseteq \{s \le 1 : w(s) \ge 0\}$.
Hence
$h(w) \le \liminf h(w_k) \le \limsup h(w_k) \le h(w) + \int_0^1 I_{\{0\}}(w(t))\,dt$.
Fubini's Theorem shows that
$\int_S \mu(dw) \int_0^1 I_{\{0\}}(w(t))\,dt = \int_0^1 \mu\{w : w(t) = 0\}\,dt = 0$,
so that $\mu(D_h) = 0$. Hence, by Lemmas 84.2 and 84.5, we have, for 0 < u < 1,
$\mu_n\{w : h(w) \le u\} \to \mu\{w : h(w) \le u\} = (2/\pi) \arcsin u^{1/2}$.
The last equality is Lévy's arcsine law, which is proved in Section III.23, and,
in a more illuminating way by excursion theory, in Section VI.53.
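The convergence in (84.7) can be watched in a simulation. The sketch below is only an illustration under stated assumptions: the walk has ±1 steps (one admissible choice for the walk of Section I.8), the path is linearly interpolated, and the sample sizes and seed are arbitrary. Since scaling a path by a positive constant does not change h, the normalisation of the walk plays no role.

```python
import math
import random

def fraction_positive(steps, rng):
    """h(w) for the linearly interpolated path of a +-1 random walk.

    A unit-time segment lies above 0 (except possibly at an endpoint)
    exactly when one of its endpoints is > 0, so h is the proportion
    of such segments.
    """
    s, positive = 0, 0
    for _ in range(steps):
        prev = s
        s += 1 if rng.random() < 0.5 else -1
        if prev > 0 or s > 0:
            positive += 1
    return positive / steps

def empirical_cdf(u, steps=400, paths=2000, seed=1):
    """Monte Carlo estimate of mu_n{w : h(w) <= u} with n = steps."""
    rng = random.Random(seed)
    hits = sum(fraction_positive(steps, rng) <= u for _ in range(paths))
    return hits / paths

def arcsine_cdf(u):
    """The limit (2/pi) arcsin(sqrt(u)) from the arcsine law."""
    return (2.0 / math.pi) * math.asin(math.sqrt(u))
```

For instance, `arcsine_cdf(0.25)` equals 1/3, and `empirical_cdf(0.25)` should land near that value, up to Monte Carlo error and a small finite-n bias.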
85. Tightness in Pr(W) when W is the path-space $W := C([0, \infty); \mathbb{R})$. The Arzelà-Ascoli
Theorem allows us to translate Prohorov's Theorem into 'practicable'
terms when S = W with the Fréchet topology in Section 82.
(85.1) THEOREM (Arzelà-Ascoli Theorem, D&S, IV.6.7). A subset Γ of W is
conditionally compact if and only if the following two conditions hold:
(85.2)(i) $\sup\{|w(0)| : w \in \Gamma\} < \infty$;
(85.2)(ii) for each $N \in \mathbb{N}$, $\lim_{\delta \downarrow 0} \sup_{w \in \Gamma} \Delta(\delta, N; w) = 0$,
where
$\Delta(\delta, N; w) := \sup\{|w(t_1) - w(t_2)| : t_1, t_2 \in [0, N],\ |t_1 - t_2| < \delta\}$.
And here is how it is applied.
(85.3) THEOREM. A subset H of Pr(W) is conditionally compact if and only if
the following two conditions hold:
(85.4)(i) $\lim_{a \uparrow \infty} \sup_{\mu \in H} \mu\{w : |w(0)| > a\} = 0$;
(85.4)(ii) for every ε > 0 and $N \in \mathbb{N}$, $\lim_{\delta \downarrow 0} \sup_{\mu \in H} \mu\{w : \Delta(\delta, N; w) > \varepsilon\} = 0$.
Only the 'if' part matters to us.
Proof of 'if' part. By Prohorov's Theorem, we must show that if η > 0 is given
then we can find a conditionally compact subset Γ of W with $\mu(\Gamma) > 1 - \eta$, $\forall \mu \in H$.
So suppose that conditions (85.4) hold and that η > 0. Choose a so that
if
$A := \{w : |w(0)| \le a\}$
then $\mu(A) > 1 - \tfrac12 \eta$, $\forall \mu \in H$. Choose δ = δ(n, N) such that if
$A_{n,N} := \{w : \Delta(\delta, N; w) \le 1/n\}$,
then $\mu(A_{n,N}) > 1 - \eta 2^{-(n+N+2)}$, $\forall \mu \in H$. Put
$\Gamma := A \cap \bigcap_{n,N} A_{n,N}$.
Then $\mu(\Gamma) > 1 - \eta$ (the excluded masses sum to less than $\tfrac12\eta + \tfrac14\eta$), and since Γ satisfies the conditions (85.2), Γ is conditionally
compact in W. □
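The modulus Δ(δ, N; w) that drives both theorems is simple to compute for a path sampled on a grid. The sketch below is a discretised stand-in under our own assumptions: times are restricted to a uniform grid on [0, 1], and the constraint |t₁ − t₂| < δ is replaced by "at most `lag` grid steps apart", so it only approximates the supremum in the definition. The two sample paths are arbitrary choices.

```python
import math

def modulus(path, lag):
    """Grid version of Delta(delta, N; w): the largest |w(t1) - w(t2)| over
    grid times at most `lag` grid steps apart (so delta is about lag * mesh)."""
    last = len(path) - 1
    return max(
        abs(path[j] - path[i])
        for i in range(len(path))
        for j in range(i, min(i + lag, last) + 1)
    )

mesh = 0.01
times = [k * mesh for k in range(101)]     # uniform grid on [0, 1]
linear = [2.0 * t for t in times]          # w(t) = 2t, Lipschitz
root = [math.sqrt(t) for t in times]       # w(t) = sqrt(t), Holder of order 1/2
```

With ten grid steps (δ about 0.1) the linear path has modulus about 0.2, while the square-root path already has modulus about 0.1 at a single grid step; both moduli decrease to 0 with the lag, as the Arzelà-Ascoli condition (85.2)(ii) requires of each individual continuous path.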
Martingale methods will provide the best way of establishing the conditions
(85.4) in the cases that concern us. But moment criteria and the deep Garsia-
Rodemich-Rumsey inequality provide very important ways of establishing these
conditions in other contexts. Here, as an example, is a result motivated by
Kolmogorov's criterion (I.25.2) for path continuity.
(85.5) THEOREM. Let H be a subset of Pr(W) such that (85.4)(i) holds and
that, for every $N \in \mathbb{N}$, there exist constants $\gamma_N$, $\delta_N$ and $C_N$ in (0, ∞) such that
$\sup_{\mu \in H} \int_W |w(t_1) - w(t_2)|^{\gamma_N}\, \mu(dw) \le C_N |t_1 - t_2|^{1 + \delta_N}$ $(\forall t_1, t_2 \in [0, N])$.
Then H is conditionally compact in Pr(W).
86. The Skorokhod representation of Cb(S) convergence on Pr (S). The following
theorem gives a nice (and useful) way of thinking about Cb(S) convergence on
Pr(S).
(86.1) THEOREM (Skorokhod). Suppose that S is a Lusin space, that $\mu_n$ $(n \in \mathbb{N})$
and μ are elements of Pr(S), and that $\mu_n \to \mu$ in the $C_b(S)$ topology. Then there
exists a triple $(\Omega, \mathcal{F}, P)$ carrying $(S, \mathcal{B}(S))$-valued random variables $X_n$ with law
$\mu_n$ and X with law μ such that $X_n \to X$ almost surely.
(86.2) OBSERVATION. If S is some fixed Lusin space for which the theorem is
true (for arbitrary $\mu_n$ and μ in Pr(S)), then the theorem remains true when S is
replaced by an element $\tilde{S}$ of $\mathcal{B}(S)$.
Proof of Observation. This is obvious from the proof of Theorem 83.7. □
Proof of Theorem. We already know from Lemma 84.5 that the theorem is
true when S = R. In particular, it is true when S is the Cantor set $C \subset [0,1]$.
Now, the space $\{0,1\}^{\mathbb{N}}$ is homeomorphic to C via the map
$(\varepsilon_n) \mapsto 2 \sum_n \varepsilon_n 3^{-n}$,
and it is therefore clear that $C^{\mathbb{N}}$ is homeomorphic to C.
By now, we know that the theorem is true when $S = C^{\mathbb{N}}$. We also know that
we need only prove the theorem when S is a compact metrisable space (J, d).
However, the map from J to $[0, \tfrac12]^{\mathbb{N}}$ defined by
$x \mapsto (\tfrac12 d(x, x_k)[1 + d(x, x_k)]^{-1} : k \in \mathbb{N})$,
where $\{x_n\}$ is a dense subset of J, is a homeomorphism of J to a compact (and
therefore Borel) subset of $[0, \tfrac12]^{\mathbb{N}}$. Hence all we need do is deduce the result for
$S = [0, \tfrac12]^{\mathbb{N}}$ from the known result for $C^{\mathbb{N}}$.
So, let $S := [0, \tfrac12]^{\mathbb{N}}$, and let $\mu_n \to \mu$ in the $C_b(S)$ topology of Pr(S). For each k
in N, there can be at most countably many real numbers a such that
$\mu\{x \in S : x_k = a\} > 0$. Hence, if D denotes the set of dyadic rationals then we can
find r in $[0, \tfrac12]$ such that
$\mu\{x \in S : x_k + r \in [0,1] \setminus D,\ \forall k \in \mathbb{N}\} = 1$.
Now let c be the continuous standard Cantor function $c : [0,1] \to [0,1]$, and
let
$\gamma(t) := \inf\{u \in [0,1] : c(u) \ge t\}$, $t \in [0,1]$.
Then $\gamma : [0,1] \to C$, γ is Borel, γ is continuous at points of $[0,1] \setminus D$, and $c \circ \gamma = \mathrm{id}$
on [0,1]. The map $\tau : S \to C^{\mathbb{N}}$, where
$\tau(x) := (\gamma(x_k + r) : k \in \mathbb{N})$,
is Borel-measurable (look at the inverse image of special cylinders!), and the
set of its discontinuities has μ-measure 0. Hence, by Lemma 84.2,
$\mu_n \circ \tau^{-1} \to \mu \circ \tau^{-1}$ in the $C_b(C^{\mathbb{N}})$ topology of $\Pr(C^{\mathbb{N}})$.
Since our theorem is true when $S = C^{\mathbb{N}}$, we can find $(\Omega, \mathcal{F}, P)$ carrying
$(C^{\mathbb{N}}, \mathcal{B}(C^{\mathbb{N}}))$-valued random variables $Y_n$ with law $\mu_n \circ \tau^{-1}$ and Y with law $\mu \circ \tau^{-1}$
such that
$Y_n \to Y$ in $C^{\mathbb{N}}$, almost surely.
Now consider the map $\phi : C^{\mathbb{N}} \to [-1, 1]^{\mathbb{N}}$ defined by
$\phi(y) := (c(y_k) - r : k \in \mathbb{N})$.
Note that $\phi \circ \tau = \mathrm{id}$ on S. Since φ is continuous, $\phi(Y_n) \to \phi(Y)$ almost surely. If
V is an $(S, \mathcal{B}(S))$ random variable with law μ, then τ(V) has law $\mu \circ \tau^{-1}$, just like
Y, and $\phi(\tau(V)) = V$. Hence the random variable φ(Y) has law $\hat\mu$, the obvious
extension of μ from S to $[-1, 1]^{\mathbb{N}}$. If we now throw away any ω for which
either some $Y_n(\omega)$ or $Y(\omega)$ is not in S, we have completed the requisite
construction. □
Note. We did not need to consider, for example, whether the image of S under
τ is Borel in $C^{\mathbb{N}}$.
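The two maps at the heart of this proof, the coding of 0/1 sequences into the Cantor set and the Cantor function c, can be checked in exact arithmetic. The sketch below works with finite sequences only, which is an assumption of the illustration (the proof itself needs infinite sequences); the function names are hypothetical.

```python
from fractions import Fraction

def cantor_point(bits):
    """Image of a finite 0/1 sequence (eps_1, ..., eps_m) under
    (eps_n) -> 2 * sum eps_n 3^{-n}; the result lies in the Cantor set."""
    return sum((Fraction(2 * e, 3 ** n) for n, e in enumerate(bits, start=1)),
               Fraction(0))

def cantor_function(x, max_digits=64):
    """Standard Cantor function c on [0, 1], read off the ternary digits of x.

    Exact for rationals whose ternary expansion meets a digit 1, or
    terminates, within `max_digits` digits."""
    x, value = Fraction(x), Fraction(0)
    if x == 1:
        return Fraction(1)
    for n in range(1, max_digits + 1):
        x *= 3
        digit = int(x)                   # next ternary digit (0, 1 or 2)
        x -= digit
        if digit == 1:                   # x lies in a removed middle third
            return value + Fraction(1, 2 ** n)
        value += Fraction(digit // 2, 2 ** n)
    return value
```

For the sequence (1, 0, 1, 1) the coded point is 62/81, and c(62/81) = 1/2 + 1/8 + 1/16 = 11/16, which is the number with binary digits 1, 0, 1, 1. This is the identity c∘γ = id restricted to such points.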
87. Weak convergence versus convergence of finite-dimensional distributions. We
now revert to probabilists' terminology.
(87.1) TERMINOLOGY (weak convergence, convergence in law). Let S be a
Lusin space and let $\mu_n$ $(n \in \mathbb{N})$ and μ be elements of Pr(S). We say that
$\mu_n$ converges weakly to μ,
and write
$\mu_n \Rightarrow \mu$,
if and only if
$\mu_n$ converges to μ in the $C_b(S)$ topology.
If $X_n$, carried by $(\Omega_n, \mathcal{F}_n, P_n)$ and with law $\mu_n$, and X, carried by $(\Omega, \mathcal{F}, P)$ and with
law μ, are $(S, \mathcal{B}(S))$-valued random variables, and if also $\mu_n \Rightarrow \mu$, then we say that
$X_n$ converges in law to X.
For $W = C(\mathbb{R}^+, \mathbb{R})$ and for each finite subset U of $\mathbb{R}^+$, we have the restriction
map $\pi_U : W \to \mathbb{R}^U$ as in our study of the DK Theorem.
(87.2) DEFINITION (convergence of finite-dimensional distributions). Let W
be the path-space $W = C(\mathbb{R}^+, \mathbb{R})$. Let $\mu_n$ $(n \in \mathbb{N})$ and μ be elements of Pr(W). We
say that
the finite-dimensional distributions of $\mu_n$ converge to those of μ
if, for every finite subset U of $\mathbb{R}^+$,
$\mu_n \circ \pi_U^{-1} \to \mu \circ \pi_U^{-1}$
(in the $C_b(\mathbb{R}^U)$ topology of $\Pr(\mathbb{R}^U)$).
The following result might clarify certain things.
(87.3) LEMMA. Preserve the meaning of W, $\mu_n$ and μ. Then $\mu_n \Rightarrow \mu$ if and only
if both of the following conditions hold:
(87.4)(i) the finite-dimensional distributions of $\mu_n$ converge to those of μ;
(87.4)(ii) the family $(\mu_n : n \in \mathbb{N})$ is tight.
Proof of 'only if' part. Suppose that $\mu_n \Rightarrow \mu$. Since each map $\pi_U$ is continuous,
convergence of the finite-dimensional distributions follows from Lemma 84.2.
The tightness follows from the 'necessity' part of Prohorov's Theorem, W being
Polish. □
Proof of 'if' part. Suppose that (87.4)(i) and (87.4)(ii) hold. Then, by the
'sufficiency' part of Prohorov's Theorem, $(\mu_n)$ is conditionally sequentially
compact. But if $\mu_{n(k)} \Rightarrow \nu$, then the finite-dimensional distributions of $\mu_{n(k)}$ converge
to those of ν; and, since μ and ν therefore have the same finite-dimensional
distributions, they are equal. □
Now do Exercise E91.87.
Regular conditional probabilities
Here we prove the existence of regular conditional probabilities under topological
assumptions. (Recall that we know from Section 43 that regular conditional
probabilities do not always exist.) We also discuss the Markov property of
Brownian motion using the language of regular conditional probabilities: this
is meant to prepare you for the next chapter on Markov processes.
We are even going to start switching now to 'Markov-process' notation. It
seems to be impossible to study Markov processes rigorously without a baroque
extravaganza of σ-algebras and filtrations: one uses symbols with 'degrees' such
as $\mathcal{F}^\circ$ for 'uncompleted' σ-algebras (rather like those we have hitherto called
$\mathcal{F}$), $\mathcal{F}^\mu$ for the completion of $\mathcal{F}^\circ$ with respect to a certain $P^\mu$ measure, etc.
So, we are going to work with a basic σ-algebra $\mathcal{F}^\circ$ and a sub-σ-algebra $\mathcal{G}^\circ$.
88. Some preliminaries. Let Ω be a set.
(88.1) DEFINITION (countably generated σ-algebra; atom of a σ-algebra). Let
$\mathcal{G}^\circ$ be a σ-algebra on Ω. Then $\mathcal{G}^\circ$ is said to be countably generated if there exists
a sequence $G_1, G_2, \ldots$ of elements of $\mathcal{G}^\circ$ such that $\mathcal{G}^\circ = \sigma\{G_1, G_2, \ldots\}$. For $\omega \in \Omega$,
the atom A(ω) of $\mathcal{G}^\circ$ containing ω is defined to be the set
$A(\omega) := \bigcap \{G \in \mathcal{G}^\circ : \omega \in G\}$.
Exercises
(i) Prove that if S is a compact metrisable space then $\mathcal{B}(S)$ is countably
generated. Deduce that the same holds for a Lusin space S.
(ii) Find a sub-σ-algebra of $\mathcal{B}[0,1]$ that is not countably generated: there is
an obvious one.
(iii) In general, A(ω) need not be an element of $\mathcal{G}^\circ$. Prove that $A(\omega) \in \mathcal{G}^\circ$ if $\mathcal{G}^\circ$ is
countably generated. This is done later in this section.
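On a finite set the atoms of a σ-algebra generated by finitely many sets can be listed directly, which makes Definition 88.1 tangible. The sketch below is a toy illustration on an eight-point universe of our own choosing: it intersects each generator that contains ω with the complement of each generator that does not, and the names are hypothetical.

```python
def atom(omega, generators, universe):
    """Atom A(omega) of sigma(G_1, ..., G_m) on a finite universe: intersect
    each generator that contains omega with the complement of each one that
    does not."""
    result = set(universe)
    for g in generators:
        result &= g if omega in g else (universe - g)
    return frozenset(result)

universe = frozenset(range(8))
generators = [frozenset({0, 1, 2, 3}), frozenset({2, 3, 4, 5})]
atoms = {atom(w, generators, universe) for w in universe}
```

Here the atoms are {0, 1}, {2, 3}, {4, 5} and {6, 7}; they partition the universe, and each is a finite intersection of generators and complements of generators, hence a member of the σ-algebra.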
The proof of the main theorem in the next section requires a slight modification
of the Riesz Representation Theorem 80.3. Suppose that Ω is a compact
metrizable space. Suppose that $\mathcal{C}$ is a countable dense subset of C(Ω) such that
$1 \in \mathcal{C}$ and $\mathcal{C}$ is a vector space over the rational field Q. For example, $\mathcal{C}$ could
be the algebra A used in the proof of Theorem 81.3. Suppose that $\phi : \mathcal{C} \to \mathbb{R}$ is
Q-linear, increasing, and satisfies φ(1) = 1. Let us prove that for $h \in C(\Omega)$,
$\phi^*(h) := \inf\{\phi(g) : g \in \mathcal{C},\ g \ge h\} = \sup\{\phi(f) : f \in \mathcal{C},\ f \le h\}$.
[For, by adding to h a suitable multiple of 1, we can suppose that $h \ge 1$ on Ω.
Then, for rational ε > 0, we can find f and g in $\mathcal{C}$ such that
$(1 - \varepsilon)h \le f \le h \le g \le (1 + \varepsilon)h$,
whence $f \ge (1 - \varepsilon)(1 + \varepsilon)^{-1} g$.] By applying the Riesz theorem to $\phi^*$, we find
that there is a unique probability measure μ on $(\Omega, \mathcal{B}(\Omega))$ such that
$\phi^*(h) = \int_\Omega h\, d\mu$ for every $h \in C(\Omega)$.
89. The main existence theorem. We recall the definition of regular conditional
probability within the theorem.
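(89.1) THEOREM (Doob,..., Kuratowski,...). Let $(\Omega, \mathcal{F}^\circ, P)$ be a probability
triple in which Ω is a Lusin space and $\mathcal{F}^\circ = \mathcal{B}(\Omega)$. Then there exists a regular
conditional probability $(P|\mathcal{G}^\circ)$ of P given $\mathcal{G}^\circ$, that is, a function
$(P|\mathcal{G}^\circ) : \mathcal{F}^\circ \times \Omega \to [0,1]$
such that
(i) for each $F \in \mathcal{F}^\circ$, the function $\omega \mapsto (P|\mathcal{G}^\circ)(F, \omega)$ is a version of $P(F|\mathcal{G}^\circ)$;
(ii) for every ω, the map $F \mapsto (P|\mathcal{G}^\circ)(F, \omega)$ is a probability measure on $\mathcal{F}^\circ$.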
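The stochastic process $\{(P|\mathcal{G}^\circ)(F, \cdot) : F \in \mathcal{F}^\circ\}$ is, modulo indistinguishability,
the unique modification of the process $\{P(F|\mathcal{G}^\circ) : F \in \mathcal{F}^\circ\}$ with the regularity
properties (i) and (ii).
Assume further that $\mathcal{G}^\circ$ is a countably generated sub-σ-algebra of $\mathcal{F}^\circ$. Then
$(P|\mathcal{G}^\circ)$ has the further properties:
(iii) $\{\omega : (P|\mathcal{G}^\circ)(G, \omega) = I_G(\omega),\ \forall G \in \mathcal{G}^\circ\}$ is a set in $\mathcal{G}^\circ$ of P-measure 1;
(iv) if A(ω) denotes the atom of $\mathcal{G}^\circ$ containing ω then $\{\omega : (P|\mathcal{G}^\circ)(A(\omega), \omega) = 1\}$ is
in $\mathcal{G}^\circ$ and
$P\{\omega : (P|\mathcal{G}^\circ)(A(\omega), \omega) = 1\} = 1$.
Proof when Ω is a compact metrisable space. Assume that Ω is a compact
metrisable space. Let $\mathcal{C}$ be a countable dense subset of C(Ω) containing 1 and
such that $\mathcal{C}$ is a vector space over Q. For each $f \in \mathcal{C}$, choose and fix some version
of $E(f|\mathcal{G}^\circ)$ and write
$\phi_\omega(f) := E(f|\mathcal{G}^\circ)(\omega)$.
The set Γ of 'good' ω for which the statements
$\phi_\omega(q_1 f_1 + q_2 f_2) = q_1 \phi_\omega(f_1) + q_2 \phi_\omega(f_2)$, $\forall q_1, q_2 \in \mathbb{Q}$, $\forall f_1, f_2 \in \mathcal{C}$,
$\phi_\omega(f_1) \le \phi_\omega(f_2)$ whenever $f_1, f_2 \in \mathcal{C}$ and $f_1 \le f_2$,
$\phi_\omega(1) = 1$,
are simultaneously true is in $\mathcal{G}^\circ$, and elementary properties of conditional
expectations show that it has probability 1. For $\omega \in \Gamma$, the map $\phi_\omega$ on $\mathcal{C}$ is
Q-linear and increasing, with $\phi_\omega(1) = 1$, so that from the discussion in Section
88, there exists a probability measure $(P|\mathcal{G}^\circ)(\cdot, \omega)$ such that
$\phi_\omega(f) = E(f|\mathcal{G}^\circ)(\omega) = \int f(\tilde\omega)\,(P|\mathcal{G}^\circ)(d\tilde\omega, \omega)$, $\forall f \in \mathcal{C}$.
For $\omega \notin \Gamma$, define $(P|\mathcal{G}^\circ)(\cdot, \omega) := \nu$, for some arbitrary but fixed element ν of Pr(Ω).
It is now trivial that, for fixed ξ,
$E(\xi|\mathcal{G}^\circ) = \int \xi(\tilde\omega)\,(P|\mathcal{G}^\circ)(d\tilde\omega, \omega)$, a.s. $(P, \mathcal{G}^\circ)$,
first for each ξ in C(Ω) by uniform convergence, then for each $\xi \in b\mathcal{F}^\circ$ by
monotone-class arguments, then for each ξ in $\mathcal{L}^1(\Omega, \mathcal{F}^\circ, P)$ by truncation.
Now we must prove properties (iii) and (iv) under the assumption that $\mathcal{G}^\circ$ is
generated by a countable sequence $G_1, G_2, \ldots$ . Thus $\mathcal{G}^\circ$ is a fortiori generated
by the countable π-system $\mathcal{K}$ consisting of all finite intersections of the $G_i$. Now
let
$\Omega_1 := \{\omega : (P|\mathcal{G}^\circ)(K, \omega) = I_K(\omega),\ \forall K \in \mathcal{K}\} \in \mathcal{G}^\circ$,
where, as usual, $I_K$ denotes the indicator function of K. Since $\mathcal{K}$ is countable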
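and, obviously,
$E(I_K|\mathcal{G}^\circ) = I_K$, a.s. $(P, \mathcal{G}^\circ)$,
it follows that $P(\Omega_1) = 1$. For each $\omega \in \Omega_1$, the set of those $G \in \mathcal{G}^\circ$ on which the
measures
$G \mapsto (P|\mathcal{G}^\circ)(G, \omega)$, $G \mapsto I_G(\omega)$
agree is a d-system; and, since this d-system includes the π-system $\mathcal{K}$, it includes
$\sigma(\mathcal{K}) = \mathcal{G}^\circ$, by Dynkin's Lemma. Hence, for $\omega \in \Omega_1$,
$(P|\mathcal{G}^\circ)(G, \omega) = I_G(\omega)$, $\forall G \in \mathcal{G}^\circ$.
Now let A(ω) be the atom of $\mathcal{G}^\circ$ containing ω. Then, for $\omega \in \Omega_1$, we can conclude
that
$(P|\mathcal{G}^\circ)(A(\omega), \omega) = I_{A(\omega)}(\omega) = 1$
once we know that $A(\omega) \in \mathcal{G}^\circ$. But you have already done this exercise by showing
that
$A(\omega) = \bigcap_{n : \omega \in G_n} G_n \cap \bigcap_{n : \omega \notin G_n} G_n^c$,
where $\mathcal{G}^\circ = \sigma(G_1, G_2, \ldots)$ as before and $G_n^c := \Omega \setminus G_n$.
All further details are left to you. □
Proof when Ω is a Lusin space. Assume that Ω is a Lusin space. Thus Ω is in
$\mathcal{B}(J)$ for some compact metrisable space J. Regard P as extended to $(J, \mathcal{B}(J))$
in the obvious way, and apply the 'compact' result already obtained to the
triple $(J, \mathcal{B}(J), P)$ with sub-σ-algebra $\mathcal{H}^\circ := \sigma(\mathcal{G}^\circ)$ on J. On the P-null, $\mathcal{H}^\circ$-
measurable set of ω* in J for which $(P|\mathcal{H}^\circ)(\Omega, \omega^*) \ne 1$, redefine $(P|\mathcal{H}^\circ)(\cdot, \omega^*) := \nu$,
where ν is an arbitrary but fixed element of Pr(J) with ν(Ω) = 1. Set
$(P|\mathcal{G}^\circ)(F, \omega) := (P|\mathcal{H}^\circ)(F, \omega)$ $(F \in \mathcal{F}^\circ,\ \omega \in \Omega)$,
and we are finished. □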
See Dellacherie and Meyer [1], Stroock and Varadhan [1] and Parthasarathy
[1] for other accounts of essentially the same proof. Stroock and Varadhan
illuminate the relevance of tightness to the DK Theorem by relating that
theorem to Theorem 89.1.
Remark. The theory of regular conditional probabilities is sometimes called la
théorie de désintégration des mesures.
90. Canonical Brownian Motion CBM($\mathbb{R}^N$); Markov property of $P^x$ laws. We
end this chapter (except for an important set of exercises) by revising our view
of canonical Brownian motion in the light of the ideas we have studied.
On no account skip this section. It is important for your later understanding.
We now use the basic notation:
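(90.1)(i) $\Omega := C(\mathbb{R}^+, \mathbb{R}^N)$, $X_t(\omega) := X(t, \omega) := \omega(t)$ $(t \ge 0;\ \omega \in \Omega)$;
(90.1)(ii) $\mathcal{F}^\circ := \sigma(X_s : s \ge 0)$, $\mathcal{F}_t^\circ := \sigma(X_s : s \le t)$ $(t \ge 0)$;
(90.1)(iii) $p(t; x, y) := (2\pi t)^{-N/2} \exp\left(-\frac{|y - x|^2}{2t}\right)$ $(t > 0;\ x, y \in \mathbb{R}^N)$;
(90.1)(iv) $P(t; x, dy) := p(t; x, y)\,dy$ if t > 0, and $P(t; x, dy) := \varepsilon_x(dy)$ if t = 0.
Here $\varepsilon_x$ is the unit mass at x. Thus p(t; x, y) is the Brownian transition-density
function, and P(t; x, B) is the transition function from x into Borel set B in time t.
By Wiener's Theorem, for each x in $\mathbb{R}^N$, there exists a unique measure $P^x$ on
$(\Omega, \mathcal{F}^\circ)$ such that for $n \in \mathbb{N}$, for $0 \le t_1 \le \cdots \le t_n$ and for $x_1, \ldots, x_n \in \mathbb{R}^N$,
(90.2) $P^x\left( \bigcap_{i=1}^{n} \{X(t_i) \in dx_i\} \right) = \prod_{i=1}^{n} P(t_i - t_{i-1}; x_{i-1}, dx_i)$,
where $t_0 := 0$ and $x_0 := x$. Equation (90.2) makes rigorous sense when integrated
over a Borel subset of $(\mathbb{R}^N)^n$. Let $E^x$ be the expectation associated with $P^x$.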
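The product formula (90.2) is consistent only because the Brownian transition density satisfies the Chapman-Kolmogorov identity: integrating p(s; x, y) p(t; y, z) over y gives p(s + t; x, z). The sketch below checks this numerically in dimension N = 1 by a midpoint rule; the truncation width and cell count are arbitrary choices of the illustration, and `two_step` is a hypothetical helper name.

```python
import math

def p(t, x, y):
    """Brownian transition density (90.1)(iii) with N = 1."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def two_step(s, t, x, z, half_width=12.0, cells=4000):
    """Midpoint-rule value of the integral of p(s; x, y) p(t; y, z) over y,
    truncated to the interval [x - half_width, x + half_width]."""
    dy = 2.0 * half_width / cells
    ys = (x - half_width + (k + 0.5) * dy for k in range(cells))
    return sum(p(s, x, y) * p(t, y, z) for y in ys) * dy
```

For example, `two_step(0.5, 1.5, 0.0, 1.0)` agrees with `p(2.0, 0.0, 1.0)` to high accuracy, which is the statement that dropping the intermediate time from a two-time cylinder in (90.2) leaves a valid one-time law.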
It is obvious from (90.2) that, for a special cylinder set F, the map
(90.3) $x \mapsto P^x(F)$ is Borel-measurable on $\mathbb{R}^N$.
The class of all $F \in \mathcal{F}^\circ$ for which (90.3) holds is clearly a d-system; and, since it
contains the π-system of special cylinders, we see that
(90.4) $x \mapsto P^x(F)$ is $\mathcal{B}(\mathbb{R}^N)$-measurable for every $F \in \mathcal{F}^\circ$.
Space shifts $(\sigma_x : x \in \mathbb{R}^N)$. For $x \in \mathbb{R}^N$ and $\omega \in \Omega$, define
(90.5) $\sigma_x : \Omega \to \Omega$, $(\sigma_x \omega)(t) := \omega(t) + x$ $(t \ge 0)$.
Then $\sigma_x$ is a continuous map from Ω to Ω if we use the usual topology of uniform
convergence on compacts. Moreover,
$P^x = P^0 \circ \sigma_x^{-1}$, $E^x \xi = E^0\, \xi \circ \sigma_x$ $(\xi \in b\mathcal{F}^\circ)$.
It is now clear that
(90.6) the map $x \mapsto P^x$ is continuous from $\mathbb{R}^N$ to Pr(Ω)
with its weak, $C_b(\Omega)$, topology.
Towards the Markov property. There are two equivalent ways of formulating
the conditional independence of past and future given the present, which is the
Markov property:
(90.7) P(past and future | present) = P(past | present) P(future | present);
(90.8) P(future | past and present) = P(future | present).
See Exercise E60.41. We are going to concentrate on the latter formulation, but
make it more precise by using the language of regular conditional probabilities.
We also build the time-homogeneity property into our formulation.
Time shifts $(\theta_t : t \ge 0)$. For $t \ge 0$, define the map
(90.9) $\theta_t : \Omega \to \Omega$, $(\theta_t \omega)(u) := \omega(t + u)$ $(u \ge 0)$.
If Λ denotes an event then $\theta_t^{-1}(\Lambda)$ denotes that event shifted through time t:
thus if $\Lambda = \{X_h \in B\}$ then $\theta_t^{-1}(\Lambda) = \{X_{t+h} \in B\}$. If η is a function on Ω, we write
$\theta_t \eta$ for $\eta \circ \theta_t$:
(90.10) $\theta_t \eta(\omega) := (\theta_t \eta)(\omega) := \eta(\theta_t \omega)$.
Note that, since Ω is a Polish space and $\mathcal{F}^\circ = \mathcal{B}(\Omega)$, and since also $\mathcal{F}_t^\circ$ is
countably generated (why?), a regular conditional probability $(P^x|\mathcal{F}_t^\circ)$ must
exist for $x \in \mathbb{R}^N$. One of the most intuitive statements of the time-homogeneous
Markov property is that $(P^x|\mathcal{F}_t^\circ) \circ \theta_t^{-1}$ is (indistinguishable from) $P^{X(t)}$. This is
part of our next theorem. Because we are considering CBM($\mathbb{R}^N$) already
equipped with all its $P^x$ laws, we do not need here the existence theorem on
regular conditional probabilities. However, we shall need that theorem later,
in the Stroock-Varadhan theory of martingale problems. Roughly speaking,
we shall need to say there that CBM($\mathbb{R}^N$) with just one given law (say the $P^0$
law) can sense the family of $P^x$ laws.
(90.11) THEOREM (Markov properties of CBM($\mathbb{R}^N$)). The following results
hold.
(i) For every $x \in \mathbb{R}^N$ and every $t \ge 0$, we have (modulo indistinguishability)
(90.12) $(P^x|\mathcal{F}_t^\circ) \circ \theta_t^{-1} = P^{X(t, \omega)}$ on $\mathcal{F}^\circ$.
(ii) For $x \in \mathbb{R}^N$ and $t \ge 0$, and for $\xi \in b\mathcal{F}_t^\circ$ and $\eta \in b\mathcal{F}^\circ$, we have
(90.13) $E^x[\xi\, \theta_t \eta] = E^x[\xi\, E^{X(t)}(\eta)]$.
The same result holds if $\xi \in (m\mathcal{F}_t^\circ)^+$ and $\eta \in (m\mathcal{F}^\circ)^+$.
Discussion and proof. Both parts of the theorem describe the way in which the
Markov property knits together the various $P^x$ laws. The formula (90.13) is the
most useful statement of the Markov property for doing calculations. You will
soon acquire fluency in its use. At first sight, (90.13) looks rather complicated;
and, even though it will look more complicated when written out in full, it is
worth expanding its statement for clarity. It says that
$\int_{\omega \in \Omega} \xi(\omega)\, \eta(\theta_t \omega)\, P^x(d\omega) = \int_{\omega \in \Omega} \xi(\omega) \left( \int_{\tilde\omega \in \Omega} \eta(\tilde\omega)\, P^{X(t, \omega)}(d\tilde\omega) \right) P^x(d\omega)$.
A number of measurability and other questions will have occurred to you in
connection with the statement of the theorem. Indeed, once one is clear what
the theorem means, its proof is almost obvious!
The result (90.4) extends by monotone-class arguments to yield the fact that
(90.14) for $\eta \in b\mathcal{F}^\circ$, the map $x \mapsto E^x \eta$ is $\mathcal{B}(\mathbb{R}^N)$-measurable.
We know that $\omega \mapsto X(t, \omega)$ is $\mathcal{F}_t^\circ$-measurable, so that
(90.15) for $\eta \in b\mathcal{F}^\circ$, the map $\omega \mapsto E^{X(t, \omega)}(\eta)$ is $\mathcal{F}_t^\circ$-measurable.
Because $F \mapsto P^{X(t)}(F)$ is already a measure on $\mathcal{F}^\circ$ and because $\mathcal{F}^\circ$ is countably
generated, to prove part (i), we need only prove that, for fixed $F \in \mathcal{F}^\circ$,
(90.16) $P^x(\theta_t^{-1} F \,|\, \mathcal{F}_t^\circ) = P^{X(t)}(F)$, a.s. $(P^x, \mathcal{F}_t^\circ)$,
a statement about ordinary (as opposed to regular) conditional probabilities.
To prove (90.16), we must show that, for $G \in \mathcal{F}_t^\circ$, we have
(90.17) $P^x(G \cap \theta_t^{-1} F) = E^x(P^{X(t)}(F); G)$.
It is enough to prove this when F and G are special cylinders. To avoid lots of
integrals, we assume formally that
$G = \{X(t) \in dy\} \cap \bigcap_{i=1}^{j} \{X(s_i) \in dx_i\}$, $F = \bigcap_{k=1}^{m} \{X(u_k) \in dz_k\}$.
Then
$P^{X(t, \omega)}(F) = \prod_{k=1}^{m} P(u_k - u_{k-1}; z_{k-1}, dz_k)$ $(u_0 := 0,\ z_0 := X(t, \omega))$
and
$E^x(P^{X(t)}(F); G) = \left\{ \prod_{i=1}^{j} P(s_i - s_{i-1}; x_{i-1}, dx_i) \right\} P(t - s_j; x_j, dy)\, P^y(F)$,
and (check!) this agrees with the left-hand side of (90.17).
When $\xi = I_G$ and $\eta = I_F$, (90.13) reduces to (90.17). Monotone-class arguments
round everything off. □
91. Exercises. In these exercises, we use the probabilistic notation and
terminology explained in Section 87. For numerous exercises on weak convergence
of measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, see [W].
(E91.82) Prove that if S is a $G_\delta$ subset (countable intersection of open sets) of
a compact metric space J, then S is Polish. (Hint. Begin with the case when S
is an open subset.) Deduce that the set $\mathbb{R} \setminus \mathbb{Q}$ of irrational numbers, with its
usual topology, is Polish.
(E91.83a) The space Pr(S) shares many of the properties of S. We know that
if J is compact metrisable then Pr(J) is compact metrizable. Show that if G is
open in J and $\eta \in \mathbb{R}$, then $\{\mu \in \Pr(J) : \mu(G) > \eta\}$ is open in Pr(J). Deduce that if
S is a $G_\delta$ subset of J then $\{\mu \in \Pr(J) : \mu(S) = 1\}$ is a $G_\delta$ in Pr(J): in other words,
if S is Polish then Pr(S) is Polish. Use the Monotone-Class Theorem to prove
that the set of f in $b\mathcal{B}(J)$ for which $\mu \mapsto \mu(f)$ is $\mathcal{B}(\Pr(J))$ measurable coincides
with $b\mathcal{B}(J)$. Hence, if S is a Borel subset of J then $\{\mu \in \Pr(J) : \mu(S) = 1\}$ is Borel
in Pr(J): in other words, if S is a Lusin space, then so is Pr(S).
It should be noted that if S is Polish then there is a natural metric, the
Prohorov metric, under which Pr(S) is complete and separable. See, for example,
Ethier and Kurtz [1].
(E91.83b) Weak convergence of empirical distributions. Let $X_1, X_2, \ldots$ be
independent identically distributed real-valued random variables carried by $(\Omega, \mathcal{F}, P)$.
Let μ be the common law of the $X_k$. For $\omega \in \Omega$, let $\mu_n(\omega)$ be the empirical
distribution
$\mu_n(\omega)(B) := n^{-1}\, \#\{k \le n : X_k(\omega) \in B\}$ $(B \in \mathcal{B})$.
Prove that $\{\omega : \mu_n(\omega) \Rightarrow \mu\} \in \mathcal{F}$ and $P(\mu_n \Rightarrow \mu) = 1$.
(E91.83c) Let $S = C([0,1]; \mathbb{R})$. For λ < 1 and $w \in S$, define
$\{\tau(\lambda) w\}(t) := \lambda^{-1} w(\lambda^2 t)$, $t \in [0,1]$.
Thus $\tau(\lambda) : S \to S$. Fix c < 1, and, for $w \in S$, define a probability measure $\mu_n(w)$
on $(S, \mathcal{B}(S))$ via
Let P be the Wiener measure on $(S, \mathcal{B}(S))$. We know that τ(c) preserves P:
$P \circ \tau(c)^{-1} = P$ on $(S, \mathcal{B}(S))$. Prove that if f is a bounded measurable function
from $(S, \mathcal{B}(S))$ to R such that f(τ(c)w) = f(w) for every w then f is constant a.s.
(P). It now follows from the Ergodic Theorem that $\mu_n(w)(f) \to E(f)$, a.s. It
therefore follows (why?) that $P(\mu_n \Rightarrow P) = 1$.
(E91.86) The Continuous-Mapping Principle (84.2) was used in the proof of the
Skorokhod Representation Theorem in Section 86. Even though it is a circular
argument, convince yourself that the Skorokhod result greatly illuminates
Lemma 84.2.
Now use the Skorokhod Representation Theorem to prove the following
result, which we need in Section V.23 of Volume 2 and which is often used in
the literature.
Let $\{g_k : k \ge 0\}$ be a uniformly bounded sequence of functions on a Lusin
space S, and suppose that $\mu_k \in \Pr(S)$ and $\mu_k \Rightarrow \mu$. If $\{g_k : k \ge 0\}$ is equicontinuous
at each point of S and $g_k \to g$ pointwise then
$\int g_k\, d\mu_k \to \int g\, d\mu$.
(E91.87) The purpose of this exercise is to give an example in which we have
convergence of finite-dimensional distributions without convergence in law.
(Clearly, tightness must fail in such a case.) 'Almost anything will do', but we
give a very concrete example.
Let h be the 'tent function'
$h(x) := 1 - |x|$ if $|x| \le 1$, and $h(x) := 0$ if $|x| > 1$.
Let U be a random variable carried by some $(\Omega, \mathcal{F}, P)$ and uniformly distributed
on $[\tfrac13, \tfrac23]$. For $\omega \in \Omega$ and $t \in [0,1]$, define for $n \in \mathbb{N}$,
$X_n(t, \omega) := h(3n\{t - U(\omega)\})$, $X(t, \omega) := 0$.
Regard $X_n$ and X as C[0,1]-valued random variables. Prove that, for every t,
$X_n(t) \to X(t)$ almost surely, so that the finite-dimensional distributions of $X_n$
converge to those of X. Prove that $X_n$ does not converge in law to X.
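A quick numerical look at this example may help before attempting the proof. The sketch below fixes one realised value u of U, assumed here to be 0.4, and evaluates X_n at a few fixed times and at the moving time t = u; the particular times and n are arbitrary choices of the illustration.

```python
def tent(x):
    """h(x) = 1 - |x| for |x| <= 1, and 0 otherwise."""
    return max(0.0, 1.0 - abs(x))

def x_n(n, t, u):
    """X_n(t) = h(3n (t - U)) for a realised value u of U."""
    return tent(3 * n * (t - u))

u = 0.4                               # sample value of U, assumed in [1/3, 2/3]
fixed_times = [0.0, 0.25, 0.41, 1.0]
values_at_fixed_times = [x_n(1000, t, u) for t in fixed_times]
peak_values = [x_n(n, u, u) for n in (1, 10, 1000)]
```

At every fixed t different from u the value X_n(t) is eventually 0, whereas the peak X_n(U) equals 1 for every n. Since the supremum norm is a continuous functional on C[0,1], this suggests comparing the law of sup X_n with that of sup X.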
(E91.90) Let X be CBM($\mathbb{R}^N$) under the $P^0$ measure. Prove that a regular
conditional probability of $P^0$ given σ(X(1)) will agree with Q on $\mathcal{F}_1^\circ$, where Q
is the law of the Brownian bridge from 0 to X(1) in time 1.
Some hints
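(H91.82) For an open subset G of J define
$\rho_G(x, y) := \rho(x, y) + \left| \frac{1}{\rho(x, G^c)} - \frac{1}{\rho(y, G^c)} \right|$,
where $G^c := J \setminus G$. For $S = \bigcap_n G(n)$, define
$\rho_S(x, y) := \sum_n 2^{-n} \frac{\rho_{G(n)}(x, y)}{1 + \rho_{G(n)}(x, y)}$.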
(H91.83b) Think of $\mathbb{R}$ as an open subset of $J := [-\infty, \infty]$. Choose a countable
dense subset $\{f_r\}$ of $C(J)$. For each $r$,
$$\frac{1}{n} \sum_{k=1}^{n} f_r(X_k) \to \mu(f_r), \quad \text{a.s.},$$
by the Strong Law.
(H91.83c) If $f(w) = f(\tau(c)w)$ for every $w$ then, for every $n$,
$f(w) = f(\tau(c^n)w)$, so that $f \in \mathrm{m}\sigma\{w(s): s \le c^n\}$.
But (see part (ii) of Theorem 68.2) $\bigcap_n \sigma\{w(s): s \le c^n\}$ is $P$-trivial, so $f$ is a.s. constant.
CHAPTER III
Markov Processes
As explained in the Preface to this edition, this chapter remains very much as
it was originally: the only significant difference is that the functional analysis—
in Hille-Yosida theory and in Ray's Theorem—is given much fuller treatment.
There is a sense, therefore, in which the chapter is caught in a 'time warp'; but
we hope and trust that it will still serve as a useful introduction to Markov
processes. The Preface advised further reading. The table of contents is a good
guide to what this chapter contains.
1. TRANSITION FUNCTIONS AND RESOLVENTS
1. What is a (continuous-time) Markov process? The first two sections are
intended to explain and motivate something of what follows. Details may be
left vague—at least for the time being!
Informally, a Markov process models the motion of a particle that moves
around in a measurable space $(E, \mathcal{E})$ in a memoryless way. As carrier set-up,
we need a filtered space $(\Omega, \{\mathcal{G}_t\})$, on which, for every $t \ge 0$, a $\mathcal{G}_t$-measurable
random variable $X_t$, which gives the position of our particle at time $t$, is defined.
It proves necessary to introduce a probability law $P^x$ for each point $x$ in $E$. (In
the Feller-Dynkin context that we meet first, $P^x$ denotes the law of the process
when it starts at $x$. In the Ray context, this idea has to be modified somewhat.)
The Markov property ties together the various $P^x$ laws. It is all very similar to
what we have just studied for the Brownian case in Section 90 of Chapter II. The
following definition gives the idea of a Markov process and a precise definition
of a transition function. The '$P_t(x, E) \le 1$' condition allows for the possible death
of our particle.
(1.1) DEFINITION (Markov process, transition function). A Markov process
$$X = (\Omega, \{\mathcal{G}_t: t \ge 0\}, \{X_t: t \ge 0\}, \{P_t: t \ge 0\}, \{P^x: x \in E\})$$
with state-space $(E, \mathcal{E})$ is an $E$-valued stochastic process adapted to $\{\mathcal{G}_t\}$ such
that, for $s, t \ge 0$, $f \in b\mathcal{E}$ and $x \in E$,
(1.2) $E^x[f(X_{s+t}) \mid \mathcal{G}_s] = (P_t f)(X_s)$, $P^x$ a.s.,
where $\{P_t\}$ is a transition function on $(E, \mathcal{E})$, that is, a family of kernels $P_t: E \times \mathcal{E} \to [0,1]$
such that
(1.3)(i) for $t \ge 0$ and $x \in E$, $P_t(x, \cdot)$ is a measure on $\mathcal{E}$ with $P_t(x, E) \le 1$;
(1.3)(ii) for $t \ge 0$ and $\Gamma \in \mathcal{E}$, $P_t(\cdot, \Gamma)$ is $\mathcal{E}$-measurable;
(1.3)(iii) for $s, t \ge 0$, $x \in E$ and $\Gamma \in \mathcal{E}$,
$$P_{s+t}(x, \Gamma) = \int_E P_s(x, dy) P_t(y, \Gamma).$$
Equation (1.3)(iii) is called the Chapman-Kolmogorov equation. We can equally
think of the transition function as inducing/being a family $\{P_t\}$ of positive
bounded operators of norm less than or equal to 1 on $b\mathcal{E}$, with
$$P_t f(x) := (P_t f)(x) = \int_E P_t(x, dy) f(y),$$
in which case the Chapman-Kolmogorov equation becomes the semigroup
property
$$P_s P_t = P_{s+t} \quad (s, t \ge 0).$$
Much of the interest of the subject arises from the interplay between the
analysis of transition functions and the sample-path description of the Markov
process. The starting point is sometimes the one, sometimes the other. For
example, a diffusion X is often given to us 'pathwise' as the solution of some
stochastic differential equation; and, except in a few trivial cases, there will be
no closed-form expression for the transition function {Pt} of X. (Often, there
will be no need to know about {Pt}.) On the other hand, in Markov-chain
theory, we begin with a semigroup satisfying some minimal regularity properties;
this time, it is the process that is hard to get hold of, and indeed the existence
and properties of this process will need deep and difficult results, which it is
the aim of this chapter to develop.
The most important results of the chapter say that if a transition semigroup
{Pt} is given then, under certain mild regularity conditions, there will exist on
some probability space a Markov process X with R-paths (in Ε or perhaps in
a suitably-topologised compactification of E) and such that the strong Markov
property holds:
$$E^x[f(X_{S+t}) \mid \mathcal{G}_S] = (P_t f)(X_S), \quad P^x \text{ a.s.},$$
whenever S is a finite stopping time. The chapter will explain numerous methods
(time-substitution, Feynman-Kac formula etc.) that may be applied to this good
version X.
2. The finite-state-space Markov chain. To illustrate certain concepts, we start
with the simplest possible case. A Markov process whose state-space (before
compactification!) is a countable set is called a Markov chain. (But beware: some
authors use 'chain' to signify 'with discrete time parameter'.) To keep things
really simple, we shall in this section assume that $E$ is a finite set, and we shall
use the notations
$$p_{ij}(t) := P_t(i, \{j\}), \qquad P(t) := \{p_{ij}(t): i, j \in E\}.$$
We shall assume that $\{P_t\}$ is honest in that $P_t(i, E) = 1$ for all $i$ and $t$. A transition
semigroup is now a semigroup of $E \times E$ matrices. That some regularity is
necessary is obvious if we consider the case when $E$ has two points and
$$P(0) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad P(t) = \begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{pmatrix} \quad (t > 0);$$
then no chain $X$ with transition function $\{P_t\}$ can have a right-continuous
version. Some basic regularity assumption is needed: we consider only semigroups
that are standard in that
(2.1) $p_{ij}(t) \to \delta_{ij}$ $(t \downarrow 0)$.
It can be shown (see later in this chapter) that the condition (2.1) implies that
(2.2) $p'_{ij}(0) := q_{ij}$ exists for $i, j \in E$.
The matrix $Q := \{q_{ij}: i, j \in E\}$ is the Q-matrix or infinitesimal generator of $\{P_t\}$,
and has the properties
(2.3) $q_{ij} \ge 0$ $(i \ne j)$, $\quad \sum_k q_{ik} = 0$ $(i \in E)$.
Starting from the transition function, how would we construct an associated
Markov process/chain $X$? One remark is that, for an initial distribution $\mu$ of
$X_0$, the finite-dimensional distributions of $X$ are determined via
(2.4) $E^\mu \prod_{i=0}^{n} f_i(X_{t_i}) = \mu f_0 P_{s_1} f_1 P_{s_2} f_2 \cdots P_{s_n} f_n,$
where $0 = t_0 < t_1 < \cdots < t_n$ and $s_j := t_j - t_{j-1}$. It is not hard to check (using the
semigroup property) that the finite-dimensional distributions are consistent in
the sense of the Daniell-Kolmogorov Theorem, whence, by that theorem
(II.31.1), there exists a process $X$ with the required law. But can we suppose that
X has R-paths and the strong Markov property? If not, we can do nothing with
X. The italicised question is not trivial; of course, that the answer is 'Yes' follows
from theory developed below.
Could we start from the process? We could indeed. Let $q_i := -q_{ii}$, as usual.
The well-known 'jump-hold' construction of a Markov chain starts with a
discrete time chain $\{Y(n)\}$ with $Y(0)$ having initial distribution $\mu$ and transition
matrix $J$, where
$$J_{ij} := \begin{cases} q_{ij}/q_i & \text{if } i \ne j, \\ 0 & \text{if } i = j. \end{cases}$$
(We assume $q_i \ne 0$ for each $i$.) Let $\{V_r: r \in \mathbb{Z}^+\}$ be a family of independent positive
variables each exponentially distributed with rate 1. Set
$$H_n := \sum_{r=0}^{n} q(Y(r))^{-1} V_r, \qquad \tau_t := \inf\{n: H_n > t\}, \qquad X_t := Y(\tau_t).$$
Then $X$ has R-paths (we use the discrete topology on $E$). Moreover, it is true
that X is Markovian with Q-matrix Q; but how do we prove the Markov
property, that the Q-matrix really is Q, etc.? Can we prove that X has the strong
Markov property?
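The jump-hold recipe translates directly into a simulation. What follows is a minimal sketch under stated assumptions, not a proof of anything asked above: the helper names `jump_hold_path` and `state_at`, the two-state Q-matrix, and the seed are all choices made here for illustration. Each holding time is an independent rate-1 exponential divided by $q_i$, and the next state is drawn from row $i$ of the jump matrix $J$.

```python
import numpy as np

def jump_hold_path(Q, i0, t_max, rng):
    """Simulate one jump-hold path on [0, t_max].

    Q is a finite Q-matrix with q_i = -Q[i, i] > 0 for every i.
    Returns (jump_times, states): the path sits in states[k] on
    [jump_times[k], jump_times[k + 1]).
    """
    Q = np.asarray(Q, dtype=float)
    q = -np.diag(Q)
    J = Q / q[:, None]            # J[i, j] = q_ij / q_i for i != j
    np.fill_diagonal(J, 0.0)      # J[i, i] = 0
    times, states = [0.0], [i0]
    t, i = 0.0, i0
    while True:
        t += rng.exponential(1.0) / q[i]   # holding time V_r / q(Y(r))
        if t > t_max:
            break
        i = int(rng.choice(len(q), p=J[i]))
        times.append(t)
        states.append(i)
    return np.array(times), np.array(states)

def state_at(times, states, t):
    """Right-continuous evaluation X_t of the simulated path."""
    return states[np.searchsorted(times, t, side="right") - 1]

# Hypothetical two-state example: rate 1 from state 0 to 1, rate 2 back.
Q_example = np.array([[-1.0, 1.0], [2.0, -2.0]])
rng_example = np.random.default_rng(1)
times_example, states_example = jump_hold_path(Q_example, 0, 5.0, rng_example)
print(times_example, states_example)
```

For this two-state example the transition probability is known in closed form, $p_{00}(t) = (2 + e^{-3t})/3$, so a Monte Carlo average of `state_at(...) == 0` over many simulated paths gives a quick sanity check of the construction.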
The reader who tries to answer these questions will see that there are
fundamental and non-trivial issues even in this simple example; we need an adequate
theory of Markov processes to take us clear of these fundamental (but not very
exciting) questions.
Before abandoning this example, we can learn from it important elements of
the structure of a general Markov process. Since $P'(0)$ exists and is equal to $Q$,
it follows that, for $t \ge 0$,
(2.5) $P'(t) = \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}[P(t + \varepsilon) - P(t)] = \lim_{\varepsilon \downarrow 0} \varepsilon^{-1} P(t)\{P(\varepsilon) - I\} = P(t)Q;$
and, solving this equation, we find that
(2.6) $P(t) = \exp(tQ);$
the semigroup is here generated by its generator $Q$ in a very simple way. Next,
we define the resolvent $\{R_\lambda: \lambda > 0\}$ of the semigroup by
(2.7) $R_\lambda := \int_0^\infty e^{-\lambda t} P_t \, dt.$
This is just the (componentwise) Laplace transform of the semigroup; or we
may regard it (more helpfully) as follows:
(2.8) $(\lambda R_\lambda)_{ij} = \int_0^\infty \lambda e^{-\lambda t} p_{ij}(t) \, dt = P(X_T = j \mid X_0 = i),$
where $T$ is a random variable independent of $X$ with the exponential distribution
of rate $\lambda$. In view of (2.6), it is immediate that
(2.9) $R_\lambda = (\lambda - Q)^{-1},$
and various simple algebraic properties follow immediately, most notably the
resolvent equation
(2.10) $R_\lambda - R_\mu = (\mu - \lambda) R_\lambda R_\mu \quad (\lambda, \mu > 0).$
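The relations (2.6), (2.7), (2.9) and (2.10) are easy to check numerically for a small chain. The sketch below is illustrative only: the 3-state Q-matrix, the helper `expm` (a plain scaling-and-squaring Taylor series, written out so that only NumPy is needed), and the quadrature step sizes are assumptions made here.

```python
import numpy as np

def expm(A, terms=20):
    """Matrix exponential via scaling and squaring with a Taylor series."""
    A = np.asarray(A, dtype=float)
    norm = np.linalg.norm(A, ord=np.inf)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2.0**squarings
    term = np.eye(len(A))
    total = np.eye(len(A))
    for k in range(1, terms):
        term = term @ B / k
        total = total + term
    for _ in range(squarings):
        total = total @ total
    return total

# Hypothetical Q-matrix: non-negative off-diagonal entries, zero row sums.
Q = np.array([[-3.0, 2.0, 1.0],
              [1.0, -1.0, 0.0],
              [0.5, 0.5, -1.0]])
I = np.eye(3)

def P(t):
    """Semigroup P(t) = exp(tQ), as in (2.6)."""
    return expm(t * Q)

def R(lam):
    """Resolvent R_lambda = (lambda - Q)^{-1}, as in (2.9)."""
    return np.linalg.inv(lam * I - Q)

def laplace_transform(lam, dt=0.01, t_max=40.0):
    """Trapezoidal approximation to the integral in (2.7)."""
    n = int(round(t_max / dt))
    step = P(dt)
    current = np.eye(len(Q))
    total = 0.5 * current                  # trapezoid weight at t = 0
    for k in range(1, n + 1):
        current = current @ step           # P(k dt) by the semigroup property
        weight = 0.5 if k == n else 1.0
        total = total + weight * np.exp(-lam * k * dt) * current
    return total * dt

print(np.abs(P(0.3) @ P(0.7) - P(1.0)).max())                      # semigroup property
print(np.abs(R(1.0) - R(2.5) - 1.5 * R(1.0) @ R(2.5)).max())       # resolvent equation
print(np.abs(laplace_transform(1.0) - R(1.0)).max())               # (2.7) against (2.9)
```

All three printed discrepancies should be small; the last is limited by the quadrature step rather than by the identity itself.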
The structure we have just described for a finite-state-space Markov chain is
indeed, when suitably interpreted, a feature of all Feller-Dynkin processes: the
(infinitesimal) generator of a semigroup is just its derivative at 0, the resolvent
is given by (2.9), and the semigroup is then found by inverting the Laplace
transform as in (2.7). Sense can be made of the exponential formula (2.6). The
resolvent equation holds in complete generality; and we shall see that resolvents
are 'smoother', and in many ways more fundamental, than semigroups. Technical
problems arise in the general situation because the derivative with respect to t
of $P_t f$ does not exist for all $f$, so the generator is defined only on a subspace
of $b\mathcal{E}$, ....
The Hille-Yosida Theorem and Ray's marvellous extension of it will allow
us to cope with these analytical problems. Once we have the semigroup, and
hence a crude Daniell-Kolmogorov version of our desired process, we find that
there are always enough supermartingales around to allow us to 'smooth' our
process by Doob's regularisation theorems.
3. Transition functions and their resolvents. Let $\{P_t\}$ be a transition function
on $(E, \mathcal{E})$. We shall say that $\{P_t\}$ is honest ('conservative' and 'strictly Markovian'
are often used) if
$$P_t(x, E) = 1, \quad \forall t, x.$$
The possibility $P_t(x, E) < 1$ must be allowed in the theory for reasons that will
become clear later. The intuitive significance is that $1 - P_t(x, E)$ represents the
probability that our Markov particle has 'died' before or at time $t$. It is
convenient (even when $\{P_t\}$ is honest, for we need licence to kill an honest process!)
to adjoin a coffin state $\partial$ to $E$ producing an extended state-space $E_\partial := E \cup \partial$.
Let $\mathcal{E}_\partial := \sigma(\mathcal{E}, \partial)$ be the smallest $\sigma$-algebra on $E_\partial$ extending $\mathcal{E}$ and containing
$\{\partial\}$. The transition function $\{P_t\}$ extends to an honest transition function $\{P_t^{+\partial}\}$
on $(E_\partial, \mathcal{E}_\partial)$ in the obvious way: for $t \ge 0$,
(3.1)(i) $P_t^{+\partial}(x, \partial) := 1 - P_t(x, E)$ $(x \in E, t \ge 0)$,
(3.1)(ii) $P_t^{+\partial}(\partial, \cdot) := \varepsilon_\partial$, the unit mass at $\partial$,
(3.1)(iii) $P_t^{+\partial}(\cdot, \cdot) := P_t(\cdot, \cdot)$ on $E \times \mathcal{E}$.
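On a finite state-space the extension (3.1) is just the addition of one absorbing row and column. The following sketch is an illustration under assumptions made here (the function name `extend_with_coffin` and the two substochastic matrices are hypothetical); it also checks that extending commutes with matrix multiplication, which is why the extended family is again a semigroup.

```python
import numpy as np

def extend_with_coffin(P):
    """Extend a substochastic matrix P by a coffin state, as in (3.1).

    The coffin state is given the last index; it receives the missing
    mass 1 - P(x, E) from each x and is absorbing.
    """
    P = np.asarray(P, dtype=float)
    n = len(P)
    ext = np.zeros((n + 1, n + 1))
    ext[:n, :n] = P                       # (3.1)(iii): unchanged on E x E
    ext[:n, n] = 1.0 - P.sum(axis=1)      # (3.1)(i): mass sent to the coffin
    ext[n, n] = 1.0                       # (3.1)(ii): unit mass at the coffin
    return ext

# Hypothetical substochastic matrices (row sums at most 1).
P1 = np.array([[0.5, 0.3], [0.2, 0.6]])
P2 = np.array([[0.9, 0.0], [0.1, 0.7]])
print(extend_with_coffin(P1))
print(np.allclose(extend_with_coffin(P1) @ extend_with_coffin(P2),
                  extend_with_coffin(P1 @ P2)))
```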
It is profitable to think about our (unextended) transition function $\{P_t\}$ on
$(E, \mathcal{E})$ in another way. It is easy to see that the equation
$$P_t f(x) = \int P_t(x, dy) f(y)$$
sets up a one-to-one correspondence between transition functions on $(E, \mathcal{E})$ and
sub-Markov semigroups on $b\mathcal{E}$. A (one-parameter) sub-Markov semigroup
$\{P_t: t \ge 0\}$ on $b\mathcal{E}$ is a family of bounded linear operators on $b\mathcal{E}$ such that
(3.2)(i) $P_t: b\mathcal{E} \to b\mathcal{E}$,
(3.2)(ii) $0 \le f \le 1 \Rightarrow 0 \le P_t f \le 1$,
(3.2)(iii) $P_{s+t} = P_s P_t$,
(3.2)(iv) $f_n \downarrow 0 \Rightarrow P_t f_n \downarrow 0$, $\forall t$.
In this formulation, $\{P_t\}$ is honest if and only if $P_t 1 = 1$, $\forall t$.
(3.3) 'Normal' transition functions. In all cases of interest, $\mathcal{E}$ contains all singleton
sets $\{x\}$. We say that our transition function $\{P_t\}$ is 'normal' if $P_0(x, \cdot) = \varepsilon_x(\cdot)$,
the unit mass at $x$, for all $x$ in $E$. In semigroup language, this means that $P_0 = I$,
the identity map on $b\mathcal{E}$.
(3.4) Remark. Note that (3.2)(iii) implies that $P_0^2 = P_0$ for every transition
function $\{P_t\}$. In the theory of Feller processes, which we develop first, we shall
have the 'normal' situation: $P_0 = I$. In the theory of Ray processes, the condition
$P_0 = I$ can fail.
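(3.5) Example. We have already met the Brownian transition function on
$(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ defined as follows:
$$P_t(x, \Gamma) := \int_\Gamma p(t; x, y) \, dy \quad (t > 0; \ x \in \mathbb{R}^n; \ \Gamma \in \mathcal{B}(\mathbb{R}^n)),$$
where $p$ is the Brownian transition-density function:
$$p(t; x, y) := (2\pi t)^{-n/2} \exp(-|y - x|^2 / 2t),$$
$$P_0(x, \cdot) := \varepsilon_x(\cdot).$$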
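For the Brownian density the Chapman-Kolmogorov equation (1.3)(iii) reads $\int p(s; x, y) p(t; y, z) \, dy = p(s + t; x, z)$, and this can be checked by direct quadrature. The sketch below treats the case $n = 1$ only; the function names, the integration window, and the grid size are assumptions made here for illustration.

```python
import numpy as np

def p(t, x, y):
    """One-dimensional Brownian transition density p(t; x, y)."""
    return np.exp(-(y - x) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

def chapman_kolmogorov_lhs(s, t, x, z, half_width=12.0, n=20001):
    """Riemann-sum approximation of the integral of p(s;x,y) p(t;y,z) over y."""
    y = np.linspace(x - half_width, x + half_width, n)
    dy = y[1] - y[0]
    return np.sum(p(s, x, y) * p(t, y, z)) * dy

lhs = chapman_kolmogorov_lhs(0.3, 0.7, 0.1, -0.4)
rhs = p(1.0, 0.1, -0.4)
print(lhs, rhs)
```

The two printed numbers should agree to many decimal places, since the integrand is smooth and negligible outside the chosen window.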
(3.6) Example. Let $I$ be a countable set and let $\mathcal{I}$ be the set of all subsets of
$I$. Let $\{p_{ij}(t): t \ge 0; \ i, j \in I\}$ be a transition matrix function on $I$, so that the following
three conditions are satisfied:
(3.7)(i) $p_{ij}(t) \ge 0$, $\forall i, j, t$,
(3.7)(ii) $\sum_j p_{ij}(t) \le 1$, $\forall i, t$,
(3.7)(iii) $p_{ij}(s + t) = \sum_{k \in I} p_{ik}(s) p_{kj}(t)$, $\forall i, j, s, t$.
Then
$$P_t(i, J) := \sum_{j \in J} p_{ij}(t) \quad (t \ge 0; \ i \in I; \ J \in \mathcal{I})$$
defines a transition function on $(I, \mathcal{I})$, and all such transition functions arise in
this way.
The only transition functions on countable sets of any interest are those that
satisfy the continuity requirement:
(3.7)(iv) $\lim_{t \downarrow 0} p_{ij}(t) = p_{ij}(0) = \delta_{ij}$.
These are what Chung [1] calls 'standard' transition functions. We are going
to take the view that for transition functions on countable sets, (3.7)(iv) comes
as part of the definition. It is interesting that the full weight of Ray theory (or
something like it) is needed to handle Markov chains. As we shall see, the true
state-space of the Markov process is generally much larger than $I$. (We reserve
the symbol Ε for that space.)
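(3.8) Resolvents. Suppose now that $\{P_t\}$ is a measurable transition function on
the measurable space $(E, \mathcal{E})$, so that, in addition to the conditions (1.3), we have
the measurability requirement:
(3.9) $\forall \Gamma \in \mathcal{E}$, the map $(x, t) \mapsto P_t(x, \Gamma)$ is $(\mathcal{E} \times \mathcal{B}[0, \infty))$-measurable from $E \times [0, \infty)$
to $\mathbb{R}$.
For $\lambda > 0$, we can then define a map $R_\lambda: b\mathcal{E} \to b\mathcal{E}$ as follows: for $x \in E$,
(3.10) $R_\lambda f(x) := \int_{[0,\infty)} e^{-\lambda t} P_t f(x) \, dt = \int_E R_\lambda(x, dy) f(y),$
where
$$R_\lambda(x, \Gamma) := \int_{[0,\infty)} e^{-\lambda t} P_t(x, \Gamma) \, dt.$$
(Trivial applications of monotone-class theorems and of Fubini's theorem are
now made without comment.)
Then $\{R_\lambda: \lambda > 0\}$ is a sub-Markovian resolvent on $b\mathcal{E}$:
(3.11)(i) $R_\lambda: b\mathcal{E} \to b\mathcal{E}$;
(3.11)(ii) $0 \le f \le 1 \Rightarrow 0 \le \lambda R_\lambda f \le 1$;
(3.11)(iii) the resolvent equation holds:
$$R_\lambda - R_\mu + (\lambda - \mu) R_\lambda R_\mu = 0;$$
(3.11)(iv) $f_n \downarrow 0 \Rightarrow R_\lambda f_n \downarrow 0$ $(\forall \lambda)$.
Of course we have the following characterisation of the honest (or strictly
Markovian) situation:
$$P_t 1 = 1, \ \forall t \iff \lambda R_\lambda 1 = 1, \ \forall \lambda.$$
Terminology. $R_\lambda$ is often called the $\lambda$-potential operator associated with $\{P_t\}$.
We shall call $R_\lambda$ the $\lambda$-resolvent of $\{P_t\}$.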
(3.12) Note. If $S$ and $T$ are independent $[0, \infty)$-valued random variables with
the exponential distributions of rate $\lambda$ and $\mu$ respectively, then, for $\lambda \ne \mu$,
$$E P_S f(x) = \lambda R_\lambda f(x), \qquad P(S + T \in du) = \lambda\mu \, \frac{e^{-\lambda u} - e^{-\mu u}}{\mu - \lambda} \, du.$$
This gives a probabilistic interpretation of the resolvent equation since
$$E(P_{S+T} f) = E \, E(P_S P_T f \mid S) = E P_S \mu R_\mu f = \lambda\mu R_\lambda R_\mu f.$$
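(3.13) Exercise. Prove that for BM($\mathbb{R}$), and for $\lambda > 0$,
$$R_\lambda f(x) = \int_{\mathbb{R}} r_\lambda(x, y) f(y) \, dy, \quad \text{where } r_\lambda(x, y) = \gamma^{-1} \exp[-\gamma |y - x|],$$
$\gamma$ denoting $(2\lambda)^{1/2}$. (Hint. For $x > 0$ and $\gamma > 0$,
$$I := \int_0^\infty (2\pi t)^{-1/2} \exp[-\tfrac12 \gamma^2 t - \tfrac12 x^2 t^{-1}] \, dt
= 2(2\pi\gamma)^{-1/2} x^{1/2} e^{-\gamma x} \int_0^\infty \exp[-\tfrac12 \gamma x (s - s^{-1})^2] \, ds,$$
by putting $t = x s^2 / \gamma$. But now the map $s \mapsto u(s) := s - s^{-1}$ maps $(0, \infty)$ one-one
onto $(-\infty, \infty)$ and the inverse map $u \mapsto s(u)$ satisfies
$$s(u) = u + s(-u), \quad \text{whence } s'(u) + s'(-u) = 1.$$
Hence
$$I = 2(2\pi\gamma)^{-1/2} x^{1/2} e^{-\gamma x} \int_0^\infty \exp[-\tfrac12 \gamma x u^2] \, du = \gamma^{-1} e^{-\gamma x}.)$$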
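The closed form in (3.13) can be compared with a direct numerical Laplace transform of the one-dimensional Brownian density. This is a sanity check rather than a proof, and the function names, the substitution $t = e^u$, and the integration limits are assumptions made here to keep the quadrature well behaved near $t = 0$.

```python
import numpy as np

def r_lambda_numeric(lam, d, n=40001):
    """Approximate the integral over t of e^{-lam t} p(t; x, y), with d = |y - x|.

    Uses the substitution t = e^u and a Riemann sum in u.
    """
    u = np.linspace(-40.0, 8.0, n)
    t = np.exp(u)
    density = np.exp(-lam * t - d * d / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    du = u[1] - u[0]
    return np.sum(density * t) * du      # dt = t du

def r_lambda_closed(lam, d):
    """Closed form gamma^{-1} exp(-gamma d) with gamma = sqrt(2 lam)."""
    gamma = np.sqrt(2.0 * lam)
    return np.exp(-gamma * d) / gamma

for lam, d in [(1.0, 0.5), (2.0, 1.3), (0.5, 0.0)]:
    print(lam, d, r_lambda_numeric(lam, d), r_lambda_closed(lam, d))
```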
4. Contraction semigroups on Banach spaces. In many respects, resolvents are
more fundamental than transition functions. In the theory of Ray processes,
we shall construct transition functions from resolvents. The starting-point for
this construction will be the Hille-Yosida theorem on contraction semigroups
of operators on Banach spaces. That theorem is proved in Section 5. Here we
introduce contraction semigroups and their resolvents and infinitesimal
generators, and explain how these concepts are related. The formal equations of the
theory, namely
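$$P_s P_t = P_{s+t}, \qquad P_t = \exp(t\mathcal{G}), \qquad P'_0 = \mathcal{G},$$
$$R_\lambda = \int_0^\infty e^{-\lambda t} P_t \, dt = (\lambda - \mathcal{G})^{-1}$$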
are known to us, but we must now translate them into rigorous mathematics.
(But see Note (4.16) below.)
(4.1) DEFINITION (strongly continuous contraction semigroup (SCCSG)).
Let $B_0$ be a Banach space. A family $\{P_t: t \ge 0\}$ of bounded linear operators
$P_t: B_0 \to B_0$ is called a (one-parameter) strongly continuous contraction
semigroup (SCCSG) if the following conditions hold:
(4.2)(i) for $f \in B_0$, $\|P_t f - f\| \to 0$ as $t \downarrow 0$ (strong continuity);
(4.2)(ii) $\|P_t\| \le 1$ for $t \ge 0$ (contraction property);
(4.2)(iii) $P_s P_t = P_{s+t}$ for $s, t \ge 0$ (semigroup property).
(By 'strongly continuous', we therefore mean 'of class C0' in the terminology
of Hille and Phillips [1].)
Suppose that $\{P_t: t \ge 0\}$ is an SCCSG on $B_0$. Then, for $t \ge s \ge 0$, and $f \in B_0$,
$$\|P_t f - P_s f\| = \|P_s(P_{t-s} f - f)\| \le \|P_{t-s} f - f\|,$$
from which it follows that the map $t \mapsto P_t f$ is continuous from $[0, \infty)$ into $B_0$.
We may therefore define the resolvent $\{R_\lambda: \lambda > 0\}$ of $\{P_t: t \ge 0\}$ via
(4.3) $R_\lambda f := \int_0^\infty e^{-\lambda t} P_t f \, dt,$
the integral being the limit (in the (strong) topology of B0) of approximating
Riemann sums.
(4.4) DEFINITION (contraction resolvent). Let $B$ be a Banach space, and let
$\{R_\lambda: \lambda > 0\}$ be a family of bounded linear operators $R_\lambda: B \to B$. We call $\{R_\lambda: \lambda > 0\}$
a contraction resolvent if
(4.5)(i) $\|\lambda R_\lambda\| \le 1$ for $\lambda > 0$, and
(4.5)(ii) the resolvent equation holds:
$$R_\lambda - R_\mu + (\lambda - \mu) R_\lambda R_\mu = 0 \quad (\lambda, \mu > 0).$$
We already know how to prove that if $\{P_t: t \ge 0\}$ is an SCCSG on $B_0$, then
its resolvent defined in (4.3) is a contraction resolvent on $B_0$. But since then
$$\lambda R_\lambda f - f = \int_0^\infty e^{-s} (P_{s/\lambda} f - f) \, ds,$$
it is clear that the resolvent of an SCCSG is a strongly continuous contraction
resolvent (SCCR) in the sense of the following definition.
(4.6) DEFINITION (strongly continuous contraction resolvent (SCCR)). By a
strongly continuous contraction resolvent (SCCR) on a Banach space $B_0$, we
mean a contraction resolvent $\{R_\lambda: \lambda > 0\}$ on $B_0$ with the additional property
(4.7) $\|\lambda R_\lambda f - f\| \to 0$ as $\lambda \to \infty$ (strong continuity).
The Hille-Yosida theorem gives the 'Tauberian' converse to the 'Abelian'
result we have just seen by showing that
(4.8) an SCCR is the resolvent of an SCCSG.
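Suppose now that $\{R_\lambda: \lambda > 0\}$ is a contraction resolvent on a Banach space
$B$. Since
$$R_\mu = R_\lambda \{I + (\lambda - \mu) R_\mu\},$$
it is clear that
(4.9) the range $R_\lambda B$ of $R_\lambda$ is a space $\mathcal{R}$ independent of $\lambda$.
Since, for $g \in B$,
$$(\lambda R_\lambda - I) R_\mu g = \frac{\mu}{\lambda - \mu} R_\mu g - \frac{\lambda}{\lambda - \mu} R_\lambda g,$$
we see that for $f \in \mathcal{R}$, and therefore for $f$ in the closure $\bar{\mathcal{R}}$ of $\mathcal{R}$, $\lambda R_\lambda f \to f$ as
$\lambda \to \infty$. Indeed, it is now obvious that
(4.10) $B_0 := \{h \in B: \lambda R_\lambda h \to h \text{ as } \lambda \to \infty\} = \bar{\mathcal{R}}.$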
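(4.11) DEFINITION ((infinitesimal) generator of an SCCSG). Let $B_0$ be a
Banach space and let $\{P_t: t \ge 0\}$ be an SCCSG on $B_0$. The (infinitesimal) generator
$\mathcal{G}$ of $\{P_t: t \ge 0\}$ is the (generally unbounded) operator $\mathcal{G}: \mathcal{D}(\mathcal{G}) \to B_0$ defined as
follows. We write $f \in \mathcal{D}(\mathcal{G})$ if, for some $g$ in $B_0$, we have
$$\|\varepsilon^{-1}(P_\varepsilon f - f) - g\| \to 0 \quad \text{as } \varepsilon \downarrow 0,$$
and we then define $\mathcal{G}f$ to equal $g$.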
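Let $\{P_t: t \ge 0\}$ be an SCCSG on $B_0$. We are now going to (formulate and)
prove that, for $\lambda > 0$,
(4.12) the operators $R_\lambda$ and $(\lambda - \mathcal{G})$ are inverses.
For $g \in B_0$, we have as $\varepsilon \downarrow 0$,
(4.13) $\varepsilon^{-1}(R_\lambda g - e^{-\lambda\varepsilon} P_\varepsilon R_\lambda g) = \varepsilon^{-1} \int_0^\varepsilon e^{-\lambda s} P_s g \, ds \to g;$
and it is clear that
(4.14) for $\lambda > 0$ and $g \in B_0$, $R_\lambda g \in \mathcal{D}(\mathcal{G})$ and $(\lambda - \mathcal{G}) R_\lambda g = g$.
For $f \in \mathcal{D}(\mathcal{G})$, and $t \ge 0$,
$$\varepsilon^{-1}(P_{t+\varepsilon} f - P_t f) = P_t \varepsilon^{-1}(P_\varepsilon f - f) \to P_t \mathcal{G} f$$
and it is easy to obtain the results
$$P_t f \in \mathcal{D}(\mathcal{G}), \qquad \frac{d}{dt} P_t f = P_t \mathcal{G} f = \mathcal{G} P_t f,$$
$$P_t f - f = \int_0^t P_s \mathcal{G} f \, ds = \int_0^t \mathcal{G} P_s f \, ds.$$
On taking Laplace transforms of the last equation, we obtain
$$R_\lambda f - \lambda^{-1} f = \lambda^{-1} R_\lambda \mathcal{G} f;$$
in other words,
(4.15) if $f \in \mathcal{D}(\mathcal{G})$, then $R_\lambda(\lambda - \mathcal{G}) f = f$.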
(4.16) Note. In this section, we have used without proof properties of strong
Riemann integrals. If you want the full theory of these, see Hille and Phillips
[1] or Dynkin [2].
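We end this section with a useful lemma that sometimes enables us to calculate
the precise domain $\mathcal{D}(\mathcal{G})$ for an SCCSG $\{P_t: t \ge 0\}$.
(4.17) LEMMA (Dynkin, Reuter). Suppose that $\mathcal{C}$ is an extension of $\mathcal{G}$. Thus
suppose that $\mathcal{C}$ is a linear map from $\mathcal{D}(\mathcal{C})$, with $\mathcal{D}(\mathcal{G}) \subseteq \mathcal{D}(\mathcal{C}) \subseteq B_0$, into $B_0$ and
that $\mathcal{C}f = \mathcal{G}f$, $\forall f \in \mathcal{D}(\mathcal{G})$. Suppose also that, for $f \in \mathcal{D}(\mathcal{C})$, $\mathcal{C}f = f \Rightarrow f = 0$. Then
$\mathcal{C} = \mathcal{G}$; or, equivalently, $\mathcal{D}(\mathcal{C}) = \mathcal{D}(\mathcal{G})$.
Proof. Suppose that $f \in \mathcal{D}(\mathcal{C})$. Put $g := f - \mathcal{C}f$. Then $h := R_1 g \in \mathcal{D}(\mathcal{G})$ and
$$h - \mathcal{C}h = h - \mathcal{G}h = g = f - \mathcal{C}f$$
so that
$$f - h = \mathcal{C}(f - h)$$
and $f = h \in \mathcal{D}(\mathcal{G})$. □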
5. The Hille-Yosida Theorem. Here now is the route from resolvent to
semigroup.
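(5.1) THEOREM (Hille-Yosida). Let $\{R_\lambda: \lambda > 0\}$ be a strongly continuous
contraction resolvent family on $B_0$. Then there exists a unique strongly continuous
contraction semigroup (SCCSG) $\{P_t: t \ge 0\}$ on $B_0$ such that
(5.2) $\int_{[0,\infty)} e^{-\lambda t} P_t f \, dt = R_\lambda f \quad \forall \lambda > 0, \ \forall f \in B_0.$
Indeed, if we define
(5.3) $G_\lambda := \lambda(\lambda R_\lambda - I),$
(5.4) $P_{t,\lambda} := \exp(t G_\lambda) = e^{-\lambda t} \sum_{n=0}^\infty (\lambda t)^n (\lambda R_\lambda)^n / n!,$
then, for each $f$ in $B_0$,
(5.5) $P_t f = \lim_{\lambda \to \infty} P_{t,\lambda} f, \quad \forall t \ge 0.$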
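In the finite-state setting of Section 2, where $R_\lambda = (\lambda - Q)^{-1}$, the approximation (5.3)-(5.5) can be watched converging. The sketch below is only an illustration: the Q-matrix, the helper names `expm`, `yosida_semigroup` and `yosida_error`, and the chosen values of $\lambda$ are assumptions made here, and in finite dimensions the limit is of course just $\exp(tQ)$.

```python
import numpy as np

def expm(A, terms=20):
    """Matrix exponential via scaling and squaring with a Taylor series."""
    A = np.asarray(A, dtype=float)
    norm = np.linalg.norm(A, ord=np.inf)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2.0**squarings
    term = np.eye(len(A))
    total = np.eye(len(A))
    for k in range(1, terms):
        term = term @ B / k
        total = total + term
    for _ in range(squarings):
        total = total @ total
    return total

# Hypothetical Q-matrix standing in for the generator.
Q = np.array([[-3.0, 2.0, 1.0],
              [1.0, -1.0, 0.0],
              [0.5, 0.5, -1.0]])
I = np.eye(3)

def yosida_semigroup(t, lam):
    """P_{t,lambda} = exp(t G_lambda) with G_lambda = lambda(lambda R_lambda - I)."""
    R = np.linalg.inv(lam * I - Q)
    G = lam * (lam * R - I)
    return expm(t * G)

def yosida_error(t, lam):
    """Largest entrywise gap between P_{t,lambda} and exp(tQ)."""
    return np.abs(yosida_semigroup(t, lam) - expm(t * Q)).max()

for lam in (10.0, 100.0, 1000.0):
    print(lam, yosida_error(1.0, lam))
```

The printed errors should shrink roughly like $1/\lambda$, in line with the bound $\|P_{t,\lambda} f - P_{t,\mu} f\| \le t \|G_\lambda f - G_\mu f\|$ used in the proof.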
(5.6) Preliminaries. In the last section, we obtained the fact (4.12) that if
$\{R_\lambda: \lambda > 0\}$ and $\mathcal{G}$ are the resolvent and generator of an already given SCCSG
$\{P_t: t \ge 0\}$, then $R_\lambda$ and $\lambda - \mathcal{G}$ are inverse operators.
In our current situation, we are given an SCCR $\{R_\lambda: \lambda > 0\}$ as data. No
semigroup $\{P_t: t \ge 0\}$ is yet available. Even so, we are guided by (4.12).
We know that the range $R_\lambda B_0$ of $R_\lambda$ is a space $\mathcal{R}$ independent of $\lambda$, and that
$\mathcal{R}$ is dense in $B_0$ (since $\lambda R_\lambda f \to f$ as $\lambda \to \infty$). If $h \in B_0$ and $R_\lambda h = 0$ for some $\lambda$, then
$$R_\mu h = \{I + (\lambda - \mu) R_\mu\} R_\lambda h = 0$$
for every $\mu$; and, since $\mu R_\mu h \to h$ as $\mu \to \infty$, we must have $h = 0$. Thus the map
$R_\lambda: B_0 \to \mathcal{R}$ is a bijection. On combining this fact with the resolvent equation,
we see that there is a uniquely defined operator $\mathcal{G}$ with domain $\mathcal{D}(\mathcal{G}) := \mathcal{R}$ such
that
$$(\lambda - \mathcal{G})^{-1} = R_\lambda$$
in that (4.14) and (4.15) hold for our new situation.
The operator $G_\lambda$ in (5.3) is bounded. What makes the result (5.5) a particularly
satisfying interpretation of $P_t = \exp(t\mathcal{G})$ is the fact that, for $f \in B_0$,
(5.7) $f \in \mathcal{D}(\mathcal{G})$ if and only if $g := \lim_{\lambda \to \infty} G_\lambda f$ exists,
and then $\mathcal{G}f = g$.
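Proof of (5.7). Suppose first that $f \in \mathcal{D}(\mathcal{G})$. Then
$$G_\lambda f = \lambda R_\lambda \mathcal{G} f \to \mathcal{G} f.$$
Suppose conversely that $G_\lambda f \to g$. Then, for fixed $\mu > 0$, the resolvent equation
shows that, as $\lambda \to \infty$,
$$R_\mu G_\lambda f = \frac{\lambda}{\lambda - \mu} (\mu R_\mu f - \lambda R_\lambda f) \to \mu R_\mu f - f.$$
Since we also have $R_\mu G_\lambda f \to R_\mu g$, it must be the case that $f = R_\mu(\mu f - g)$, whence
$f \in \mathcal{D}(\mathcal{G})$ and $\mathcal{G}f = g$. □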
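Proof of Theorem 5.1. Since $G_\lambda$ is a bounded operator, it is well known that
(5.8)(i) $P_{s,\lambda} P_{t,\lambda} = P_{s+t,\lambda}$;
(5.8)(ii) $\lim_{h \downarrow 0} h^{-1}(P_{h,\lambda} - I) = G_\lambda$ (uniform operator topology);
(5.8)(iii) $P_{t,\lambda} - I = \int_0^t P_{s,\lambda} G_\lambda \, ds$.
Since $\|\lambda R_\lambda\| \le 1$, it is clear from (5.4) that
(5.8)(iv) $\|P_{t,\lambda}\| \le 1$.
Recall that $R_\lambda$ and $R_\mu$ commute, whence $G_\lambda$ and $G_\mu$ commute and $P_{t,\lambda}$ commutes
with $P_{s,\mu}$. Thus we may calculate, with $P(t, \lambda)$ standing for $P_{t,\lambda}$ when convenient,
$$P_{t,\lambda} f - P_{t,\mu} f = \sum_{k=1}^{n} P\!\left(\tfrac{(k-1)t}{n}, \lambda\right) P\!\left(\tfrac{(n-k)t}{n}, \mu\right) \left[ P\!\left(\tfrac{t}{n}, \lambda\right) f - P\!\left(\tfrac{t}{n}, \mu\right) f \right],$$
so that, by (5.8)(iv),
$$\|P_{t,\lambda} f - P_{t,\mu} f\| \le n \left\| \left[ P\!\left(\tfrac{t}{n}, \lambda\right) f - f \right] - \left[ P\!\left(\tfrac{t}{n}, \mu\right) f - f \right] \right\|.$$
Letting $n \to \infty$ and using (5.8)(ii), we find that
$$\|P_{t,\lambda} f - P_{t,\mu} f\| \le t \|G_\lambda f - G_\mu f\|.$$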
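It now follows from (5.7) that, for $f \in \mathcal{D}(\mathcal{G})$, the limit
(5.9) $P_t f := \lim_{\lambda \to \infty} P_{t,\lambda} f$
exists uniformly over compact $t$-intervals, so that $t \mapsto P_t f$ is continuous for
$f \in \mathcal{D}(\mathcal{G})$. Since $\mathcal{D}(\mathcal{G})$ is dense in $B_0$, the limit in (5.9) exists for each $f$ in $B_0$, and
$t \mapsto P_t f$ is continuous for $f \in B_0$. That (5.2) holds now follows from the fact
that
$$\int_0^\infty e^{-\lambda t} P_{t,\mu} f \, dt = (\lambda - G_\mu)^{-1} f \to R_\lambda f \quad (\mu \to \infty).$$
Exercise. Prove this by using the resolvent equation to show that, with $\gamma :=
\lambda\mu(\lambda + \mu)^{-1}$,
$$(\lambda - G_\mu)^{-1} = (\lambda + \mu)^{-2} \mu^2 R_\gamma + (\lambda + \mu)^{-1} I.$$
The proof of the Hille-Yosida Theorem is now complete. □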
(5.10) LEMMA. The operator $\mathcal{G}$ does generate $\{P_t\}$ in the sense that $\{P_t\}$ is
uniquely determined by $(\mathcal{G}, \mathcal{D}(\mathcal{G}))$.
Proof. For $f$ in $B_0$, we can (for each $\lambda > 0$) determine $R_\lambda f$ as the unique solution
in $\mathcal{D}(\mathcal{G})$ of the equation
$$(\lambda - \mathcal{G}) R_\lambda f = f.$$
The function $t \mapsto P_t f$ is then uniquely specified by the fact that its Laplace
transform is $R_\lambda f$. Of course, in this special 'semigroup' situation, inversion of
the Laplace transform may be effected directly via (5.5). □
Note. There is no need to worry about the uniqueness theorem for Laplace
transforms in the Banach-space context—just apply an element of the dual
space and use the real-variable result.
(5.11) Limitations of the HY Theorem. There are two reasons why the HY
theorem may not be entirely appropriate. The first is that $\mathcal{D}(\mathcal{G})$ may be too
small for many purposes. This is certainly the case in Markov-chain theory,
where we need to introduce a 'natural' generator extending $\mathcal{G}$. The second
reason is (of course!) that $\mathcal{D}(\mathcal{G})$ may be too large. This is the case in diffusion
theory in dimension $n \ge 2$, where the 'differential generator', a contraction of
$\mathcal{G}$, is more tractable and often contains all the relevant information.
2. FELLER-DYNKIN PROCESSES
6. Feller-Dynkin (FD) semigroups. Until further notice, suppose that $E$ is a
locally compact Hausdorff space with countable base (LCCB) and that $\mathcal{E} = \mathcal{B}(E)$.
It is well known that if $E$ is not compact, then we can adjoin to $E$ a point $\partial$ so
that $E_\partial := E \cup \partial$ is compact metrisable. Thus $\partial$ is the point at infinity in the
one-point compactification of $E$. The notation is meant to indicate that $\partial$ can be
used as a coffin state. If $E$ is compact, make $\partial$ a point isolated from $E$. In either
case, $E$ is $\sigma$-compact and Polish.
We write:
$C(E)$ for the space of all ($\mathbb{R}$-valued) continuous functions on $E$;
$C_b(E)$ for the space of bounded continuous functions on $E$;
$C_0(E)$ for the space of (bounded) continuous functions on $E$ which vanish at
infinity;
$C_K(E)$ for the space of continuous functions on $E$ with compact support.
As an extension of the Riesz representation theorem (II.80.3) we have the
following result. Again see Theorem IV.6.3 of Dunford and Schwartz [1].
(6.1) THEOREM. A bounded linear functional $\varphi$ on $C_0(E)$ may be written
uniquely in the form
$$\varphi(f) = \mu(f) := \int_E f(x) \, \mu(dx)$$
where $\mu$ is a signed measure on $E$ of finite total variation.
By a sub-Markov kernel $V$ on $(E, \mathcal{E})$, we mean a map $V: E \times \mathcal{E} \to [0,1]$ such
that
(i) $\forall x \in E$, $V(x, \cdot)$ is a subprobability measure on $(E, \mathcal{E})$ so that $V(x, E) \le 1$;
(ii) $\forall \Gamma \in \mathcal{E}$, $V(\cdot, \Gamma)$ is $\mathcal{E}$-measurable.
Exercise. Derive the following theorem from Theorem 6.1 by using the
Monotone-Class Theorem II.3.1.
(6.2) THEOREM. Suppose that $V: C_0(E) \to b\mathcal{E}$ is a (bounded) linear operator
that is sub-Markov in the sense that $0 \le f \le 1$ implies $0 \le Vf \le 1$. Then there
exists a unique sub-Markov kernel (also denoted by) $V$ on $(E, \mathcal{E})$ such that
$$Vf(x) = \int V(x, dy) f(y), \quad \forall f \in C_0(E), \ \forall x \in E.$$
Hence $V$ has a canonical extension (via the integral) to a map $V: b\mathcal{E} \to b\mathcal{E}$.
Every author has his or her own definition of 'Feller semigroup', so be careful
when moving from book to book. The modern trend is to mean by the Feller
property of a transition function on $(E, \mathcal{E})$ the property
(6.3) $P_t: C_b(E) \to C_b(E)$, $\forall t \ge 0$,
and by the strong Feller property the property
(6.4) $P_t: b\mathcal{E} \to C_b(E)$, $\forall t \ge 0$.
There are good reasons for using the 'Feller' label for all kinds of subtle
modifications of these statements.
To avoid causing still further terminological clashes, let us give a new (and
perfectly just) name to a favourite class of semigroups.
(6.5) DEFINITION (Feller-Dynkin semigroup). A Feller-Dynkin (FD)
semigroup is a strongly continuous, sub-Markov semigroup $\{P_t: t \ge 0\}$ of linear
operators on $C_0(E)$:
(6.6)(i) $P_t: C_0(E) \to C_0(E)$;
(6.6)(ii) $\forall f \in C_0(E)$, $0 \le f \le 1 \Rightarrow 0 \le P_t f \le 1$;
(6.6)(iii) $P_s P_t = P_{s+t}$, $\forall s, t \ge 0$; $P_0 = I$, the identity on $C_0(E)$;
(6.6)(iv) $\forall f \in C_0(E)$, $\|P_t f - f\| \to 0$ as $t \downarrow 0$.
(Here then we have a situation to which the HY theorem applies with $B =
B_0 = C_0(E)$.)
It follows easily from Theorem 6.2 that to any FD semigroup there corresponds
a 'Feller-Dynkin' transition function on $(E, \mathcal{E})$.
The following lemma is very important for verifying conditions (6.6) in
practice.
(6.7) LEMMA. If $\{P_t: t \ge 0\}$ is a sub-Markov semigroup on $C_0(E)$ satisfying
(6.6)(i)-(iii) then (6.6)(iv) is implied by the apparently weaker condition
(6.6)(iv)* $\forall f \in C_0(E)$, $\forall x \in E$, $P_t f(x) \to f(x)$ as $t \downarrow 0$.
Proof. If $f \in C_0(E)$ and $x \to y$ in $E$ then, by the Dominated-Convergence Theorem,
$$(R_\lambda f)(x) := \int_0^\infty e^{-\lambda t} P_t f(x) \, dt \to \int_0^\infty e^{-\lambda t} P_t f(y) \, dt = R_\lambda f(y).$$
It is therefore clear that $R_\lambda: C_0(E) \to C_0(E)$, and $\{R_\lambda: \lambda > 0\}$ is a contraction
resolvent on $C_0(E)$. We know from the Hille-Yosida Theorem that the common
domain $B_0$ of strong continuity of $\{P_t: t \ge 0\}$ and $\{R_\lambda: \lambda > 0\}$ is given by
$B_0 = \overline{R_\lambda C_0(E)}$ for every $\lambda$. We need to prove that $B_0 = C_0(E)$. If $B_0 \ne C_0(E)$ then,
by the Hahn-Banach theorem, we can find a non-trivial linear functional $\varphi$ on
$C_0(E)$ such that $\varphi$ annihilates $B_0$. If $\mu$ is the signed measure that represents $\varphi$
in the Riesz theorem, we shall have
$$\int \lambda R_\lambda f(x) \, \mu(dx) = \varphi(\lambda R_\lambda f) = 0$$
for every $f$ in $C_0(E)$ and every $\lambda > 0$. However,
$$\lambda R_\lambda f(x) = \int_0^\infty e^{-s} P_{s/\lambda} f(x) \, ds \to f(x) \quad (\lambda \to \infty)$$
by the assumption (6.6)(iv)*. Hence $\varphi(f) = 0$ for every $f$ in $C_0(E)$, contradicting
the fact that $\varphi$ is non-trivial. Hence $B_0$ does equal $C_0(E)$. □
You might like to try to prove Lemma 6.7 directly, without using Hille-Yosida
machinery.
Let $\{P_t\}$ be an FD transition function on $(E, \mathcal{E})$. Let $\mathcal{G}$ be the (strong) generator
of the FD semigroup $\{P_t\}$. Then, for $f \in \mathcal{D}(\mathcal{G})$,
$$\mathcal{G}f(x) = \lim_{t \downarrow 0} t^{-1} \left[ \int_E P_t(x, dy) f(y) - f(x) \right].$$
Let $f \in \mathcal{D}(\mathcal{G}) \subseteq C_0(E)$ and let $f$ attain its supremum (as it must) at the point $x$.
Then if $f(x) \ge 0$, we must have $\mathcal{G}f(x) \le 0$. (If $\{P_t\}$ is honest then we will have
$\mathcal{G}f(x) \le 0$ irrespective of the sign of $f(x)$.)
This fact motivates
(6.8) LEMMA (Dynkin's Maximum Principle). Suppose that $\mathcal{C}: \mathcal{D}(\mathcal{C}) \to C_0(E)$
is a linear operator extending $\mathcal{G}$. Suppose that if $f \in \mathcal{D}(\mathcal{C})$ and $f$ attains its maximum
at $x$ and $f(x) \ge 0$, then $\mathcal{C}f(x) \le 0$. Then $\mathcal{C} = \mathcal{G}$.
Proof. By Lemma 4.17, we need only prove that $f \in \mathcal{D}(\mathcal{C})$ and $\mathcal{C}f = f$ imply
that $f = 0$. So suppose that $f \in \mathcal{D}(\mathcal{C})$ and $\mathcal{C}f = f$. Let $f$ attain its maximum at
$x$. If $f(x) \ge 0$ then $\mathcal{C}f(x) \le 0$, so that $f(x) = \mathcal{C}f(x) = 0$. By applying the same
argument to $-f$, we see that $f = 0$. □
(6.9) Generator of Brownian motion. Let $E = \mathbb{R}^n$. Let $P_t(x, dy)$ denote the
transition function of CBM($\mathbb{R}^n$). See Example 1.6. For $f \in C_0(\mathbb{R}^n)$, set
$$P_t f(x) := \int P_t(x, dy) f(y) = W^x f(X_t).$$
Here $X$ is CBM($\mathbb{R}^n$) and $W^x$ is the Wiener measure corresponding to starting
position $x$. The fact that $P_t: C_0 \to C_0$ (where $C_0 = C_0(\mathbb{R}^n)$) is easily established
by analysis, and is an immediate consequence of the already established fact
that $x \mapsto W^x$ is continuous from $\mathbb{R}^n$ to $\Pr(W)$. The fact that
$$\lim_{t \downarrow 0} P_t f(x) = f(x) \quad (f \in C_0(\mathbb{R}^n), \ x \in \mathbb{R}^n)$$
is also easy to establish analytically, and is probabilistically obvious because of
the (right)-continuity of $X_t$ at 0. Thus $\{P_t\}$ is an FD semigroup on $C_0(\mathbb{R}^n)$.
The natural domain in $C_0 = C_0(\mathbb{R}^n)$ of the operator $\tfrac12\Delta$ ($\Delta$ being Laplace's
operator) is defined to be
$$\mathcal{D}(\tfrac12\Delta) := \{f \in C_0: \tfrac12\Delta f \text{ exists and } \tfrac12\Delta f \in C_0\}.$$
In a moment, we shall prove that if $n = 1$ then $\mathcal{G} = \tfrac12\Delta$. The situation in dimension
$n \ge 2$ is more complicated. The operator $\mathcal{G}$ is the closure of $\tfrac12\Delta$: thus $f \in \mathcal{D}(\mathcal{G})$
if and only if there exist functions $f_n$ in $\mathcal{D}(\tfrac12\Delta)$ and a function $g$ in $C_0$ such that
$\|f_n - f\| \to 0$ and $\|\tfrac12\Delta f_n - g\| \to 0$; and then $\mathcal{G}f = g$. We examine this case later.
The moral is that for dimension $n \ge 2$, infinitesimal generators are not really the
right things to look at. The Stroock-Varadhan theory tells us how we should
view things.
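Now consider the 1-dimensional case. From (3.13), it follows that
$$R_\lambda f(x) = \int_{\mathbb{R}} r_\lambda(x, y) f(y) \, dy \quad (\lambda > 0, \ f \in C_0),$$
where
$$r_\lambda(x, y) := \gamma^{-1} \exp(-\gamma |y - x|), \qquad \gamma := (2\lambda)^{1/2}.$$
Fix $\lambda > 0$. Suppose that $h \in \mathcal{D}(\mathcal{G})$, so that $h = R_\lambda f$ for some $f$ in $C_0$. (In the
present context, we have $B = B_0 = C_0$.) Then
$$h'(x) = \int_{\mathbb{R}} \gamma r_\lambda(x, y) \, \mathrm{sgn}(y - x) f(y) \, dy,$$
where
$$\mathrm{sgn}\, x := \begin{cases} 1 & \text{if } x > 0, \\ -1 & \text{if } x < 0, \\ 0 & \text{if } x = 0. \end{cases}$$
On differentiating again, we find that
$$\lambda h - \tfrac12 h'' = f = \lambda h - \mathcal{G}h.$$
Hence $\tfrac12\Delta$ is an extension of $\mathcal{G}$. By Dynkin's maximum principle (or by direct
application of Lemma 4.17), $\mathcal{G} = \tfrac12\Delta$.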
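The identification of the generator with $\tfrac12\Delta$ in dimension 1 can be illustrated by computing the difference quotient $t^{-1}(P_t f - f)$ for a smooth test function. The sketch below is an assumption-laden illustration, not part of the argument: the test function $f(x) = e^{-x^2}$, the use of Gauss-Hermite quadrature for $P_t f(x) = E f(x + \sqrt{t} Z)$ with $Z$ standard normal, and all function names are choices made here.

```python
import numpy as np

# Gauss-Hermite nodes and weights for integrals against exp(-z^2).
nodes, weights = np.polynomial.hermite.hermgauss(60)

def f(x):
    """Test function in C_0(R)."""
    return np.exp(-x ** 2)

def half_laplacian_f(x):
    """(1/2) f''(x) for f(x) = exp(-x^2)."""
    return (2.0 * x ** 2 - 1.0) * np.exp(-x ** 2)

def P_t(func, t, x):
    """Brownian semigroup P_t func(x) = E func(x + sqrt(t) Z), by quadrature."""
    return np.sum(weights * func(x + np.sqrt(2.0 * t) * nodes)) / np.sqrt(np.pi)

def generator_quotient(t, x):
    """Difference quotient t^{-1}(P_t f(x) - f(x))."""
    return (P_t(f, t, x) - f(x)) / t

x0 = 0.3
for t in (1e-2, 1e-3, 1e-4):
    print(t, generator_quotient(t, x0), half_laplacian_f(x0))
```

As $t$ decreases the quotient approaches $\tfrac12 f''(x_0)$, with an error of order $t$.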
7. The existence theorem: canonical FD processes. Let $E$ continue to denote an
LCCB and let $\mathcal{E} := \mathcal{B}(E)$. Suppose that $\{P_t\}$ is an FD semigroup on $C_0 := C_0(E)$.
We shall show that there exists a strong Markov, $E_\partial$-valued R-process $X$ with
transition function $\{P_t\}$. (Strictly speaking, the transition function of $X$ is $\{P_t^{+\partial}\}$,
but, conventionally, we say that $\{P_t\}$ is the transition function, and $\{P_t^{+\partial}\}$ the
extended transition function, of $X$.) We then obtain Dynkin's simple and
extremely illuminating formula for the (strong) generator $\mathcal{G}$ of $\{P_t\}$.
The technique used for establishing the existence of $X$ is the same as that
which we used for CBM($\mathbb{R}$). We first construct the Daniell-Kolmogorov (DK)
process $Y$ associated with $\{P_t\}$ and then obtain $X$ by smoothing the paths of
$Y$ via the regularity theorem for supermartingales.
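(7.1) Use of the DK Theorem. Let $\Omega := E_\partial^{[0,\infty)}$, the space of all functions $\omega$ from
$[0, \infty)$ to $E_\partial$. For $t \ge 0$, let $Y_t$ be the coordinate projection mapping $\Omega$ to $E_\partial$
via $Y_t(\omega) := \omega(t)$. Set
$$\mathcal{G}^\circ := \sigma\{Y_s: s \ge 0\}, \qquad \mathcal{G}^\circ_t := \sigma\{Y_s: s \le t\}.$$
For every probability measure $\mu$ on $(E_\partial, \mathcal{E}_\partial)$, the DK theorem guarantees
the existence of a unique probability measure $P^\mu$ on $(\Omega, \mathcal{G}^\circ)$ such that, for $n \in \mathbb{N}$,
$0 \le t_1 \le t_2 \le \cdots \le t_n$ and $x_0, x_1, \ldots, x_n \in E_\partial$,
(7.2) $P^\mu[Y(0) \in dx_0; \ Y(t_1) \in dx_1; \ \ldots; \ Y(t_n) \in dx_n]$
$= \mu(dx_0) P^{+\partial}_{t_1}(x_0, dx_1) \cdots P^{+\partial}_{t_n - t_{n-1}}(x_{n-1}, dx_n).$
The semigroup (Chapman-Kolmogorov) property guarantees the required
consistency, and the fact that $E_\partial$ is compact metric gives more than adequate
topological structure.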
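We write
(7.3) $P^x := P^{\varepsilon_x}$, $\varepsilon_x$ being the unit mass at $x$.
We can verify by the usual monotone-class methods (Exercise!) that the map
$x \mapsto P^x(\Lambda)$ is $\mathcal{E}_\partial$-measurable for every $\Lambda$ in $\mathcal{G}^\circ$. (Problems concerning the weak
continuity of the map $x \mapsto P^x$ are discussed in Section 13.) Hence, for $\eta \in b\mathcal{G}^\circ$
and $t \ge 0$, the map
$$\omega \mapsto E^{Y(t,\omega)} \eta$$
is $\mathcal{G}^\circ_t$-measurable, where $E^x$ (respectively $E^\mu$) denotes the expectation
corresponding to $P^x$ (respectively $P^\mu$).
The Markov property knitting the laws $\{P^x: x \in E_\partial\}$ can now be expressed:
for $\eta \in b\mathcal{G}^\circ$, $\mu \in \Pr(E_\partial)$ and $t \ge 0$,
(7.4) $E^\mu[\eta \circ \theta_t \mid \mathcal{G}^\circ_t] = E^{Y(t)} \eta$, a.s.($P^\mu$).
Here, of course, $\theta_t$ is the time-shift map:
$$\theta_t: \Omega \to \Omega, \qquad \theta_t\omega(s) := \omega(t + s).$$
See Section II.90 for (7.4). (The 'Brownian' proof obviously transfers!)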
In particular, we have, for $f \in C_0$ and $s, t \ge 0$,
(7.5)(i) $$E^\mu[f \circ Y_{t+s} \mid \mathcal{G}_t^\circ] = P_s f(Y_t), \quad \text{a.s.}(P^\mu),$$
and this, together with the fact that
(7.5)(ii) $$P^\mu \circ Y_0^{-1} = \mu,$$
completely determines all the laws $P^\mu$.
(7.6) Path regularisation. Suppose that $h$ is of the form $R_1 g$, where $g \in C_0^+$ (the
set of non-negative elements in $C_0$). Then
(7.7) $h$ is 1-super-median for $\{P_t\}$: by definition, this means that
$$0 \le e^{-s} P_s h \le h, \quad \forall s \ge 0.$$
Proof of (7.7).
$$e^{-s} P_s R_1 g = e^{-s} P_s \int_0^\infty e^{-u} P_u g \, du = \int_s^\infty e^{-v} P_v g \, dv \le R_1 g. \qquad \Box$$
Hence, for every $\mu$,
$$E^\mu[e^{-(s+t)} h(Y_{t+s}) \mid \mathcal{G}_t^\circ] = e^{-(s+t)} P_s h(Y_t) \le e^{-t} h(Y_t),$$
so that
(7.8) $e^{-t} h(Y_t)$ is a supermartingale relative to $(\{\mathcal{G}_t^\circ\}, P^\mu)$.
Hence (see Section II.65.1),
(7.9) a.s.$(P^\mu)$, the following statement holds: the limit $\lim_{\mathbb{Q} \ni q \downarrow\downarrow t} e^{-q} h(Y_q)$ exists for
all $t$ and defines an R-function of $t$.
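Now let $g_0, g_1, g_2, \ldots$ be a countable dense subset of $C_0^+$ with $g_0 > 0$ on $E$.
Put $h_n = R_1 g_n$ and let $\mathcal{H} := \{h_0, h_1, h_2, \ldots\}$. Then $h_0 > 0$ on $E$ and $\mathcal{H}$ separates
points of $E_\partial$ because $\mathcal{H} - \mathcal{H}$ is dense in $\mathcal{D}(\mathcal{G})$ and $\mathcal{D}(\mathcal{G})$ is dense in $C_0$. Since
$\mathcal{H}$ is countable, it follows that for every $\mu$,
(7.10) a.s.$(P^\mu)$, (7.9) holds for all $h \in \mathcal{H}$.
But the map $x \mapsto (h_0(x), h_1(x), \ldots)$ (with $h(\partial) = 0$, $\forall h \in \mathcal{H}$) is a homeomorphism of
$E_\partial$ onto a subset of $\mathbb{R}^\infty$. Hence we can conclude that, a.s.$(P^\mu)$,
(7.11) $X_t := \lim_{\mathbb{Q} \ni q \downarrow\downarrow t} Y_q$ exists, $\forall t$, and defines an R-process $X$.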
It is worth explaining things in a little more detail. Let $\Omega_0$ be the set of $\omega$ in
$\Omega$ for which the limit $X_t(\omega)$ exists for every $t$ and defines an R-map $t \mapsto X_t(\omega)$.
Then $\Omega_0 \in \mathcal{G}^\circ$ and $P^\mu(\Omega_0) = 1$ for all $\mu \in \Pr(E_\partial)$. For $\omega \in \Omega \setminus \Omega_0$, define $X_t(\omega) := \partial$, $\forall t$.
Then $X$ is an R-process and $X_t$ is $\mathcal{G}^\circ$-measurable for each $t$.
The crucial result that, for each $\mu$, $X$ is a $(P^\mu)$ modification of $Y$:
(7.12) $$P^\mu[X_t = Y_t] = 1, \quad \forall t, \forall \mu,$$
must be established directly, since we cannot appeal to Theorem II.67.7 in the
absence of the usual conditions.
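Proof of (7.12). For $f_1, f_2 \in C_0(E)$ and $\mathbb{Q} \ni q \downarrow\downarrow t$,
$$E^\mu[f_1(Y_t) f_2(X_t)] = \lim E^\mu[f_1(Y_t) f_2(Y_q)] = \lim E^\mu[f_1(Y_t) P_{q-t} f_2(Y_t)] = E^\mu[f_1(Y_t) f_2(Y_t)].$$
By monotone-class arguments, $E^\mu f(Y_t, X_t) = E^\mu f(Y_t, Y_t)$ for $f \in b(\mathcal{E}_\partial \times \mathcal{E}_\partial)$, and
(7.12) follows. $\Box$
Finally, note that, since $h_0 := R_1 g_0 > 0$ on $E$, we can conclude from Theorem
II.78.1 that, for every $\mu$, it is true a.s.$(P^\mu)$ that
(7.13) $\forall t$, if either $X_{t-}$ or $X_t = \partial$, then $X_u = \partial$, $\forall u \ge t$.
According to that theorem, the statement (7.13) corresponds to a $\mathcal{G}^\circ$-measurable
set.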
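(7.14) Canonical FD processes. The DK theorem has served its purpose. The
clumsy space $\Omega := E_\partial^{[0,\infty)}$ is no longer needed. As in the switch from $\mathbb{R}^{[0,\infty)}$ to
$C$ for CBM($\mathbb{R}$) in Section II.71, we can now tidy things up.
(7.15) Let $\Omega$ now denote the space of R-paths:
$$\omega : [0, \infty) \to E_\partial,$$
such that if either $\omega(t-)$ or $\omega(t) = \partial$ then $\omega(u) = \partial$, $\forall u \ge t$. By convention, each $\omega$
in $\Omega$ is extended to a map $\omega : [0, \infty] \to E_\partial$ by setting $\omega(\infty) := \partial$.
Note. It is important that we do not require the existence of the limit $\lim_{t \uparrow\uparrow \infty} \omega(t)$
for $\omega$ in $\Omega$.
For $\omega \in \Omega$ and $t \ge 0$, define
(7.16)(i) $X_t(\omega) := \omega(t)$,
(7.16)(ii) $\mathcal{F}^\circ := \sigma\{X_s : 0 \le s < \infty\} = \sigma\{X_s : 0 \le s \le \infty\}$,
(7.16)(iii) $\mathcal{F}_t^\circ := \sigma\{X_s : s \le t\}$.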
(7.17) THEOREM (Dynkin, Kinney, Blumenthal). For $\mu \in \Pr(E_\partial)$, there exists a
unique probability measure $P^\mu$ on $(\Omega, \mathcal{F}^\circ)$ such that, for $n \in \mathbb{N}$, $0 \le t_1 \le t_2 \le \cdots \le t_n$
and $x_0, x_1, \ldots, x_n \in E_\partial$,
(7.18) $$P^\mu[X(0) \in dx_0; X(t_1) \in dx_1; \ldots; X(t_n) \in dx_n] = \mu(dx_0)\, P^{+\partial}_{t_1}(x_0, dx_1) \cdots P^{+\partial}_{t_n - t_{n-1}}(x_{n-1}, dx_n).$$
This very important theorem merely reinterprets the results obtained above.
The new $P^\mu$ is the old $P^\mu$ law of $X$!
The set-up
$$X = (\Omega, \mathcal{F}^\circ, \{X_t : 0 \le t \le \infty\}, \{P^\mu : \mu \in \Pr(E_\partial)\})$$
is called the canonical FD process associated with the FD semigroup $\{P_t\}$. Of
course, $X$ has the same simple Markov properties as the process $Y$. In particular,
for $s, t \ge 0$, $\xi \in b\mathcal{F}_t^\circ$ and $f \in C_0$,
(7.19) $$E^\mu[\xi f(X_{t+s})] = E^\mu[\xi P_s f(X_t)].$$
This formula will allow us to utilize the smoothness of the semigroup $\{P_t\}$ and
the right-continuity of $\{X_t\}$ to show that $X$ (unlike $Y$) has the strong Markov
property.
(7.20) Lifetime. The random time
$$\zeta := \inf\{t : X(t) = \partial\} = \inf\{t : X(t-) = \partial \text{ or } X(t) = \partial\}$$
is called the lifetime of $X$.
(7.21) Note that if $t < \zeta(\omega)$ then the set of values $\{X(s, \omega) : s \le t\}$ is precompact
in $E$.
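8. Strong Markov property: preliminary version. Recall that an $\{\mathcal{F}^\circ_{t+}\}$ stopping
time is a map $T : \Omega \to [0, \infty]$ such that
$$\{\omega : T(\omega) \le t\} \in \mathcal{F}^\circ_{t+}, \quad \forall t \in [0, \infty).$$
(Here $\mathcal{F}^\circ_{\infty+} := \mathcal{F}^\circ_\infty := \mathcal{F}^\circ$.) Equivalently, $T$ is a map $T : \Omega \to [0, \infty]$ such that
(8.1)(i) $$\{\omega : T(\omega) < t\} \in \mathcal{F}^\circ_t, \quad \forall t \in [0, \infty].$$
For such a $T$, we define $\mathcal{F}^\circ_{T+}$ to be the σ-algebra of sets $\Lambda$ in $\mathcal{F}^\circ$ for which
(8.1)(ii) $$\Lambda \cap \{\omega : T(\omega) < t\} \in \mathcal{F}^\circ_t, \quad \forall t \in [0, \infty].$$
For $0 \le t \le \infty$, define $\theta_t : \Omega \to \Omega$ as usual:
(8.2)(i) $$\theta_t \omega(s) := \omega(t + s), \quad \forall s,$$
where, of course, $\infty + s = s + \infty = \infty$, $\forall s$. If $T$ is a map from $\Omega$ to $[0, \infty]$, define
(8.2)(ii) $$\theta_T \omega := \theta_{T(\omega)} \omega.$$
Recall that, for a function $\eta$ on $\Omega$, we write $\theta_T \eta$ for $\eta \circ \theta_T$ and that, for example,
(8.2)(iii) $$(\xi \theta_T \eta)(\omega) := \xi(\omega)\, \eta(\theta_{T(\omega)} \omega).$$
(8.3) THEOREM (Strong Markov Theorem; Dynkin, Yuskevic, Blumenthal).
Let $T$ be an $\{\mathcal{F}^\circ_{t+}\}$ stopping time. Then $\forall \mu \in \Pr(E_\partial)$, $\forall \eta \in b\mathcal{F}^\circ$,
(8.4) $$E^\mu[\theta_T \eta \mid \mathcal{F}^\circ_{T+}] = E^{X(T)}[\eta], \quad \text{a.s.}(P^\mu).$$
Equivalently, $\forall \mu \in \Pr(E_\partial)$, $\forall \xi \in b\mathcal{F}^\circ_{T+}$, $\forall \eta \in b\mathcal{F}^\circ$,
(8.5) $$E^\mu[\xi \theta_T \eta] = E^\mu[\xi E^{X(T)} \eta].$$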
Notes
(i) We have already mentioned that (8.5) expresses the strong Markov theorem
in a form ideally suited to applications. Certain slight variants of (8.5) are
sometimes required. Thus (8.5) will obviously hold if $\xi \in m^+\mathcal{F}^\circ_{T+}$ (the set of
non-negative $\mathcal{F}^\circ_{T+}$ measurable functions from $\Omega$ to $[0, \infty]$) and $\eta \in m^+\mathcal{F}^\circ$.
(ii) The debut and section theorems make it clear that Theorem 8.3 needs to be
extended to take account of completions of σ-algebras before it is of any real
use for discontinuous processes. See Section 9 for the appropriate extension.
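Proof of Theorem 8.3. As we used to do for martingales, put
$$T^{(n)}(\omega) := \begin{cases} k 2^{-n} & \text{if } (k - 1) 2^{-n} \le T(\omega) < k 2^{-n},\ k \in \mathbb{N}, \\ \infty & \text{if } T(\omega) = \infty. \end{cases}$$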
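The dyadic approximation used here is easy to experiment with. The sketch below treats a single value $t = T(\omega)$; the function name `dyadic_upper` is hypothetical and introduced only for illustration. It returns the unique $k2^{-n}$ with $(k-1)2^{-n} \le t < k2^{-n}$, so the approximants lie strictly above $t$ and decrease to $t$ as $n$ grows.

```python
import math

def dyadic_upper(t, n):
    """Return k * 2**-n where (k - 1) * 2**-n <= t < k * 2**-n.

    An infinite time is left unchanged, matching the convention T^(n) = infinity
    on the event {T = infinity}.
    """
    if math.isinf(t):
        return math.inf
    k = math.floor(t * 2 ** n) + 1  # smallest k with t < k * 2**-n
    return k / 2 ** n

# The approximants are strictly larger than t and non-increasing in n.
values = [dyadic_upper(0.3, n) for n in range(1, 6)]
print(values)
```

Strict inequality $T^{(n)} > T$ on $\{T < \infty\}$ is what allows the right-continuity of paths to be used when $n \to \infty$.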
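Suppose that $\Lambda \in \mathcal{F}^\circ_{T+}$. Then
$$\Lambda_{n,k} := \{\omega : T^{(n)}(\omega) = k 2^{-n}\} \cap \Lambda \in \mathcal{F}^\circ_{k 2^{-n}}.$$
Thus, applying the simple Markov property (7.19) with $\xi_{n,k}$ as the indicator
function of $\Lambda_{n,k}$, we find that, for $\mu \in \Pr(E_\partial)$ and $f \in C_0$,
$$E^\mu[f \circ X(T^{(n)} + s); \Lambda] = \sum_{1 \le k \le \infty} E^\mu[f \circ X(k 2^{-n} + s); \Lambda_{n,k}]$$
$$= \sum_{1 \le k \le \infty} E^\mu[P_s f \circ X(k 2^{-n}); \Lambda_{n,k}]$$
$$= E^\mu[(P_s f) \circ X(T^{(n)}); \Lambda].$$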
Keep $\mu$ and $s$ fixed and let $n \to \infty$. By right-continuity of paths,
$$X(T^{(n)} + s) \to X(T + s), \qquad X(T^{(n)}) \to X(T).$$
Since $f \in C_0$, we have $P_s f \in C_0$ by the Feller-Dynkin property, so
$$f \circ X(T^{(n)} + s) \to f(X_{T+s}), \qquad P_s f \circ X(T^{(n)}) \to P_s f(X_T).$$
Hence, by the Dominated-Convergence Theorem,
(8.6) $$E^\mu[f(X_{T+s}); \Lambda] = E^\mu[P_s f(X_T); \Lambda],$$
and monotone-class arguments give
(8.7) $$E^\mu[\xi f(X_{T+s})] = E^\mu[\xi P_s f(X_T)], \quad \forall \xi \in b\mathcal{F}^\circ_{T+}.$$
Now consider the expression
$$V := E^\mu[\xi f(X_{T+s})\, g(X_{T+s+u})],$$
where $f, g \in C_0$ and $\xi \in b\mathcal{F}^\circ_{T+}$. We can apply (8.7) with $T + s$ playing the role of
$T$ and $\xi f(X_{T+s})$ playing the role of $\xi$ to obtain
$$V = E^\mu[\xi f(X_{T+s})\, P_u g(X_{T+s})].$$
Now we can apply (8.7) again with $T$ as $T$, and $\xi$ as $\xi$, but with $f(x)$ replaced
by $f(x)(P_u g)(x)$ to find that
$$V = E^\mu[\xi \times (\{P_s(f \times P_u g)\} \circ X_T)].$$
The brackets are meant to help clarify the structure, but you can ignore as
many as you wish!
You can now check that we have just established (8.5) in the case when
$$\eta(\omega) = f(X_s(\omega))\, g(X_{s+u}(\omega)),$$
and you can see that the case when
(8.8) $$\eta = f_1(X_{s_1}) f_2(X_{s_2}) \cdots f_n(X_{s_n}) \quad (f_1, f_2, \ldots, f_n \in C_0)$$
can be established similarly. Now let $\mathcal{H}$ be the algebra of functions $\eta$ that are
sums of products of the form (8.8) and apply the Monotone-Class Theorem II.3.2
to obtain the general case of (8.5). You should check that $\partial$ causes no trouble
in this proof. $\Box$
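9. Strong Markov property: full version; Blumenthal's 0-1 Law. The Strong
Markov Theorem (8.3) is inadequate because it only applies to $\{\mathcal{F}^\circ_{t+}\}$-stopping
times, whereas, for example, the debut of a compact set for an R-process is not
an $\{\mathcal{F}^\circ_{t+}\}$-stopping time (see Section II.75). We therefore need the extension to
be described in this section.
For each $\mu$ in $\Pr(E_\partial)$, we now define
(9.1) $(\Omega, \mathcal{F}^\mu, \{\mathcal{F}^\mu_t\})$ to be the usual $P^\mu$ augmentation of $(\Omega, \mathcal{F}^\circ, \{\mathcal{F}^\circ_t\})$.
First, then, $\mathcal{F}^\mu$ is the $P^\mu$ completion of $\mathcal{F}^\circ$. This means that $\Lambda \in \mathcal{F}^\mu$ if and only
if there exist $\Lambda_{1,\mu}, \Lambda_{2,\mu}$ in $\mathcal{F}^\circ$ with
$$\Lambda_{1,\mu} \subseteq \Lambda \subseteq \Lambda_{2,\mu}, \qquad P^\mu(\Lambda_{1,\mu}) = P^\mu(\Lambda_{2,\mu});$$
and then we set $P^\mu(\Lambda) := P^\mu(\Lambda_{1,\mu})$. Further, for $t > 0$, $\mathcal{F}^\mu_t$ is the smallest σ-algebra
on $\Omega$ extending $\mathcal{F}^\circ_{t+}$ and containing all $P^\mu$ null sets in $\mathcal{F}^\mu$.
Now put
(9.2) $$\mathcal{F} := \bigcap_\mu \mathcal{F}^\mu, \qquad \mathcal{F}_t := \bigcap_\mu \mathcal{F}^\mu_t,$$
the intersections being taken over all $\mu$ in $\Pr(E_\partial)$. Although for each $\mu$,
$(\Omega, \mathcal{F}^\mu, \{\mathcal{F}^\mu_t\}, P^\mu)$ satisfies the usual conditions (see Section II.67),
$(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, P^\mu)$ does not satisfy the usual conditions (except in trivial cases).
However, we do have $\mathcal{F}_{t+} = \mathcal{F}_t$, because $\mathcal{F}^\mu_{t+} = \mathcal{F}^\mu_t$ ($\forall \mu$).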
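(9.3) THEOREM (Debut Theorem, see II.76). For $B \in \mathcal{B}(E_\partial)$, set
$$D_B := \inf\{t \ge 0 : X_t \in B\},$$
$$H_B := \inf\{t > 0 : X_t \in B\}.$$
Then $D_B$ and $H_B$ are $\{\mathcal{F}^\mu_t\}$ stopping times for every $\mu$, so that $D_B$ and $H_B$ are
$\{\mathcal{F}_t\}$ stopping times.
(9.4) THEOREM (Strong Markov Theorem for $X$, definitive form). Let $T$ be
an $\{\mathcal{F}_t\}$ stopping time; then, for $\mu \in \Pr(E_\partial)$, $\xi \in b\mathcal{F}_T$, $\eta \in b\mathcal{F}$, we have
(9.5)(i) $$E^\mu[\eta \circ \theta_T \mid \mathcal{F}_T] = E^{X(T)} \eta, \quad \text{a.s.}(P^\mu),$$
(9.5)(ii) $$E^\mu[\xi \theta_T \eta] = E^\mu[\xi E^{X(T)} \eta].$$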
It is necessary first to give careful thought to the technical problem of what
Theorem 9.4 means, because many measurability properties are implicit in its
statement. (We are sympathetic up to a point if you regard the whole business
of completions as an unavoidable and artificial nuisance in a subject, probability
theory, which is fundamentally a branch of applied mathematics. We shall
therefore try to deal with this area in as succinct a way as possible.)
Routine applications of monotone-class arguments are skipped in the following
discussion.
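We know from Lemma II.75.3 that
(9.6) for $\mu \in \Pr(E_\partial)$, there exists an $\{\mathcal{F}^\circ_{t+}\}$ stopping time $T^{(\mu)}$ such that
$$P^\mu[T^{(\mu)} = T] = 1.$$
Further, it is easily shown that, for $\mu \in \Pr(E_\partial)$, we can find a function $\eta_\mu$ in $b\mathcal{F}^\circ$
with $P^\mu[\eta_\mu = \eta] = 1$. Then $\eta \circ \theta_T$ is, a.s.$(P^\mu)$, equal to the $\mathcal{F}^\circ$-measurable function
$\eta_\mu \circ \theta_{T^{(\mu)}}$, so that $\eta \circ \theta_T$ is $\mathcal{F}^\mu$-measurable. Hence
(9.7) $\eta \circ \theta_T$ is $\mathcal{F}$-measurable.
Thus the conditional expectation $E^\mu[\eta \circ \theta_T \mid \mathcal{F}_T]$ can be interpreted by reference
either to the 'carrier' triple $(\Omega, \mathcal{F}, P^\mu)$ or to $(\Omega, \mathcal{F}^\mu, P^\mu)$.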
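Next, we prove that
(9.8) $\forall \Lambda \in \mathcal{F}$, the map $x \mapsto P^x(\Lambda)$ is universally ($\mathcal{E}^*_\partial$) measurable on $E_\partial$.
(Recall that $\mathcal{E}^*_\partial$, the universal completion of $\mathcal{E}_\partial$, is defined as
(9.9) $$\mathcal{E}^*_\partial := \bigcap \{\mathcal{E}^\nu_\partial : \nu \in \Pr(E_\partial)\},$$
where $(E_\partial, \mathcal{E}^\nu_\partial)$ is the $\nu$-completion of $(E_\partial, \mathcal{E}_\partial)$.)
Proof of (9.8). If $\nu \in \Pr(E_\partial)$ and $\Lambda \in \mathcal{F}$ ($\subseteq \mathcal{F}^\nu$), we can find $\Lambda_{1,\nu}$ and $\Lambda_{2,\nu}$ in $\mathcal{F}^\circ$
with
$$\Lambda_{1,\nu} \subseteq \Lambda \subseteq \Lambda_{2,\nu} \quad \text{and} \quad P^\nu(\Lambda_{1,\nu}) = P^\nu(\Lambda_{2,\nu}).$$
But it is clear from the definition of $P^\nu$ on $\mathcal{F}^\circ$ that, for $k = 1$ or 2,
$$P^\nu(\Lambda_{k,\nu}) = \int P^x(\Lambda_{k,\nu})\, \nu(dx).$$
Hence
$$P^x(\Lambda_{1,\nu}) \le P^x(\Lambda) \le P^x(\Lambda_{2,\nu}), \quad \forall x,$$
and
$$\int P^x(\Lambda_{1,\nu})\, \nu(dx) = \int P^x(\Lambda_{2,\nu})\, \nu(dx).$$
Since $x \mapsto P^x(\Lambda_{k,\nu})$ is $\mathcal{E}_\partial$-measurable for $k = 1, 2$, it follows that $x \mapsto P^x(\Lambda)$ is
$\mathcal{E}^\nu_\partial$-measurable. Since $\nu$ is arbitrary, (9.8) follows. $\Box$
The above proof of (9.8) yields the intuitively obvious result:
(9.10) $$\forall \Lambda \in \mathcal{F}, \forall \nu \in \Pr(E_\partial), \quad P^\nu(\Lambda) = \int_{E_\partial} P^x(\Lambda)\, \nu(dx).$$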
We know (II.73.11) that if $S$ is an $\{\mathcal{F}^\circ_{t+}\}$ stopping time then
(9.11) $X_S$ is $\mathcal{F}^\circ_{S+}$-measurable from $\Omega$ to $(E_\partial, \mathcal{E}_\partial)$.
By some obvious further arguments based on (9.6), we can show that
(9.12) $X_T$ is $\mathcal{F}_T$-measurable from $\Omega$ to $(E_\partial, \mathcal{E}^*_\partial)$.
By composition, it follows from (9.8) and (9.12) that
(9.13) $\forall \Lambda \in \mathcal{F}$, the map $\omega \mapsto P^{X(T(\omega), \omega)}(\Lambda)$ is $\mathcal{F}_T$-measurable.
Finally, (9.13) implies that
(9.14) $\forall \eta \in b\mathcal{F}$, $E^{X(T)}[\eta] \in b\mathcal{F}_T$.
The measurability implications of Theorem 9.4 are now clear. Of course, the
proof of the theorem is now trivial from (9.6) and the 'algebraic' Strong Markov
Theorem (8.3).
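(9.15) THEOREM (Blumenthal's 0-1 Law). If $\Lambda \in \mathcal{F}_0$ then $\forall x \in E_\partial$, $P^x(\Lambda) = 0$
or 1.
Proof. Apply the Strong Markov Theorem with $T = 0$, $\xi = I_\Lambda$ and $\eta = I_\Lambda$. Since
$P^x[X(0) = x] = 1$, $\forall x$,
$$E^x[I_\Lambda] = E^x[I_\Lambda \theta_0 I_\Lambda] = E^x[I_\Lambda E^x I_\Lambda] = (E^x[I_\Lambda])^2. \qquad \Box$$
(9.16) COROLLARY. If $T$ is an $\{\mathcal{F}_t\}$ stopping time then $\forall x \in E_\partial$, $P^x[T = 0] = 0$
or 1.
Proof. $\{T = 0\} \in \mathcal{F}_0$. $\Box$
(9.17) Example. Let $x \in E$, $B \in \mathcal{E}$. Prove that
either $P^x[H_B = 0] = 1$, in which case $x$ is called regular for $B$,
or $P^x[H_B = 0] = 0$, in which case $x$ is called irregular for $B$.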
It is often difficult to decide which alternative obtains, as classical examples
like Lebesgue's thorn (see Section 7.11 of Itô and McKean [1]) demonstrate.
The connection between the present concept of 'regular' and that of a 'regular'
boundary point in the Dirichlet problem is described in Section I.22.
(9.18) Almost surely. A statement $S$ about points $\omega$ in $\Omega$ will be said to hold
almost surely (a.s.) if $\Lambda := \{\omega : S(\omega) \text{ is true}\} \in \mathcal{F}$ and $P^\mu(\Lambda) = 1$, $\forall \mu$. Thus 'a.s.' means
'a.s.$(P^\mu)$, $\forall \mu$'.
Exercises
(9.19) Let $X$ be CBM($\mathbb{R}^3$). Let $V \subseteq \mathbb{R}^3$ and let $b$ be a point of $\mathbb{R}^3$ such that $b$
is the tip of a cone that lies entirely within $V$. Prove that $b$ is regular for $V$ (for
$X$). Why is it obvious that if $L$ is a line in $\mathbb{R}^3$ then no point $b$ is regular for $L$
(for $X$)?
(9.20) Let $X$ be a canonical FD process, and let $x \in E$. Set
$$U_x := \inf\{t > 0 : X_t \ne x\}.$$
Prove that
$$P^x[U_x > s + t] = P^x[U_x > s]\, P^x[U_x > t]$$
and deduce that $P^x[U_x > t] = \exp(-q_x t)$ for some $q_x$ with $0 \le q_x \le \infty$. Explain
why if $X$ is honest and has continuous paths then $q_x = 0$ or $\infty$.
10. Some fundamental martingales; Dynkin's formula. It should not surprise
you to learn that, at various levels of sophistication, the Strong Markov Theorem
can be presented as just a corollary of the optional stopping theorem for
martingales. This does not matter too much to us now, since we already have
the strong Markov theorem.
What does matter is that it is very advantageous to regard some of the
traditional consequences of the Strong Markov Theorem as martingale results.
This will be a recurring theme in this book. For now, let us adopt a martingale
approach (guided by Meyer's book [3]) to Dynkin's formula and Blumenthal's
Quasi-left-continuity Theorem.
X continues to denote our FD process. Let us write
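(10.1) $$b\mathcal{E}^* := b\mathcal{E}^*_\partial \cap \{f : f(\partial) = 0\}, \qquad m^+\mathcal{E}^* := m^+\mathcal{E}^*_\partial \cap \{f : f(\partial) = 0\}.$$
Recall that $m^+\mathcal{E}^*_\partial$ denotes the set of $\mathcal{E}^*_\partial$-measurable functions from $E_\partial$ to $[0, \infty]$.
For $f$ in $b\mathcal{E}^*$ (or $m^+\mathcal{E}^*$), we have
(10.2) $$P_t f(x) = \int P_t(x, dy) f(y) = E^x f(X_t)$$
for $x$ in $E$ (and, by convention, for $x = \partial$ with $P_t f(\partial) = 0$). Then $P_t : b\mathcal{E}^* \to b\mathcal{E}^*$.
For an $\{\mathcal{F}_t\}$ stopping time $T$, and for $\lambda > 0$, we set
(10.3) $$P^\lambda_T f(x) := E^x[e^{-\lambda T} f(X_T)], \qquad P_T f(x) := E^x[f(X_T)];$$
here again, we allow $f \in b\mathcal{E}^*$ or $f \in m^+\mathcal{E}^*$. In particular, $P^\lambda_t = e^{-\lambda t} P_t$. We have
$P^\lambda_T : b\mathcal{E}^* \to b\mathcal{E}^*$. If $B$ is a Borel subset of $E$, we write
(10.4) $P_B$ for $P_{H_B}$, $H_B := \inf\{t > 0 : X_t \in B\}$.
You will appreciate that it is the necessity of completion in the Debut Theorem
that forces our present concern with universally measurable functions.
It follows from (10.2) that, for $g \in C_0$ and $\lambda > 0$,
(10.5) $$R_\lambda g(x) = E^x \int_0^\infty e^{-\lambda t} g(X_t)\, dt.$$
By appeal to the general form of Fubini's Theorem, we can extend (10.5) to the
case when $g \in b\mathcal{E}^*$ or $m^+\mathcal{E}^*$.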
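(10.6) Exercise (simple proof of Dynkin's formula). Deduce from the Strong
Markov Theorem that if $T$ is an $\{\mathcal{F}_t\}$ stopping time then
$$P^\lambda_T P^\lambda_t = P^\lambda_{T+t}$$
(but show that $P^\lambda_T P^\lambda_t \ne P^\lambda_t P^\lambda_T$ in general). Hence obtain Dynkin's formula: for
$g \in C_0$, $\lambda > 0$, $x \in E$,
(10.7) $$R_\lambda g(x) = E^x \int_0^T e^{-\lambda t} g(X_t)\, dt + P^\lambda_T R_\lambda g(x).$$
Of course, $P^\lambda_T R_\lambda g(x)$ means $(P^\lambda_T R_\lambda g)(x)$.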
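As a sanity check on (10.7), one can work it out by hand for a very small example. The sketch below is not from the text: it assumes a two-state continuous-time Markov chain on $\{0, 1\}$ with jump rates `q0` (from 0 to 1) and `q1` (from 1 to 0), takes $T$ to be the first jump time out of state 0, and uses the facts that $T$ is exponential with rate `q0` and that the resolvent is $(\lambda - Q)^{-1}$. All function names are hypothetical.

```python
def resolvent_two_state(lam, q0, q1, g):
    """R_lam g = (lam*I - Q)^{-1} g for Q = [[-q0, q0], [q1, -q1]], by the 2x2 inverse."""
    det = lam * (lam + q0 + q1)
    r0 = ((lam + q1) * g[0] + q0 * g[1]) / det
    r1 = (q1 * g[0] + (lam + q0) * g[1]) / det
    return (r0, r1)

def dynkin_rhs_from_state0(lam, q0, q1, g):
    """Right-hand side of (10.7) at x = 0, with T the first jump time out of 0.

    T is exponential with rate q0 and X_T = 1, so
      E^0 int_0^T e^{-lam t} g(X_t) dt = g(0) / (lam + q0),
      E^0 [e^{-lam T} R_lam g(X_T)]     = q0 / (lam + q0) * R_lam g(1).
    """
    r = resolvent_two_state(lam, q0, q1, g)
    running_cost = g[0] / (lam + q0)
    discounted_restart = q0 / (lam + q0) * r[1]
    return running_cost + discounted_restart

lam, q0, q1, g = 0.8, 1.5, 0.4, (2.0, -1.0)
print(resolvent_two_state(lam, q0, q1, g)[0], dynkin_rhs_from_state0(lam, q0, q1, g))
```

The two printed numbers agree up to rounding, which is the statement that the resolvent 'decomposes' at the stopping time $T$.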
The alternative proof we now give of Dynkin's formula (10.7) is the key to
many of the deepest results in the subject.
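For the moment, fix $g \in C_0$ and $\lambda > 0$, and put
(10.8) $$\eta := \int_0^\infty e^{-\lambda s} g(X_s)\, ds \in b\mathcal{F}^\circ.$$
Then $R_\lambda g(x) = E^x \eta$, $\forall x$. Since
$$\eta = \int_0^t e^{-\lambda s} g(X_s)\, ds + e^{-\lambda t}\, \theta_t \eta,$$
we can use the simple Markov property to find the following:
(10.9) for every $x$,
$$t \mapsto \int_0^t e^{-\lambda s} g(X_s)\, ds + e^{-\lambda t} R_\lambda g(X_t)$$
is an R-modification of the UI martingale $t \mapsto E^x[\eta \mid \mathcal{F}_t]$. By the Optional-Stopping
Theorem, if $T$ is an $\{\mathcal{F}_t\}$ stopping time (and hence an $\{\mathcal{F}^x_t\}$ stopping time),
$$E^x \int_0^T e^{-\lambda s} g(X_s)\, ds + E^x[e^{-\lambda T} R_\lambda g(X_T)] = R_\lambda g(x);$$
in other words, Dynkin's formula (10.7) holds.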
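Now pick $f \in \mathcal{D}(\mathcal{G})$ and $\lambda > 0$, and apply (10.9) to $g = (\lambda - \mathcal{G})f$. We see that,
for $\lambda > 0$, $f \in \mathcal{D}(\mathcal{G})$ and $x \in E$,
(10.10) $$C^{\lambda,f}_t := e^{-\lambda t} f(X_t) - f(X_0) + \int_0^t e^{-\lambda s} (\lambda - \mathcal{G}) f \circ X_s\, ds$$
defines a UI R-martingale $C^{\lambda,f}$ relative to $(\{\mathcal{F}_t\}, P^x)$. By the Optional-Stopping
Theorem, if $T$ is an $\{\mathcal{F}_t\}$ stopping time then
(10.11) $$E^x e^{-\lambda T} f(X_T) - f(x) = E^x \int_0^T e^{-\lambda s} (\mathcal{G} - \lambda) f \circ X_s\, ds.$$
If $E^x(T) < \infty$ for some $x$, we can let $\lambda \downarrow\downarrow 0$ to obtain (for such $x$):
(10.12) $$E^x f(X_T) - f(x) = E^x \int_0^T \mathcal{G} f(X_s)\, ds.$$
The formula (10.12) is also called Dynkin's formula. Since (10.7) and (10.11) are
the same and (10.12) is an immediate corollary, we shall mean by 'Dynkin's
formula' any or all of (10.7), (10.11) and (10.12).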
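It is easy to verify the following directly:
(10.13) for $f \in \mathcal{D}(\mathcal{G})$,
$$C^f_t := f(X_t) - f(X_0) - \int_0^t \mathcal{G} f \circ X_s\, ds$$
defines a martingale relative to $(\{\mathcal{F}_t\}, P^x)$ for all $x$. This corresponds to the
analytical fact that, for $f \in \mathcal{D}(\mathcal{G})$,
$$P_t f - f - \int_0^t P_s \mathcal{G} f\, ds = 0.$$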
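(10.14) Example. Let $X$ be CBM($\mathbb{R}$) so that $\mathcal{G} = \tfrac{1}{2} d^2/dx^2$ on its natural domain
in $C_0(\mathbb{R})$. Fix $b > 0$ and $\lambda > 0$. We can certainly find $f$ in $\mathcal{D}(\mathcal{G})$ with
$$f(x) = \exp[x (2\lambda)^{1/2}] \quad \text{for } -\infty < x \le b.$$
Apply (10.11) with $x = 0$ and $T = H_b$ to find that
$$E^0 e^{-\lambda H_b} f(b) - f(0) = 0,$$
since $(\mathcal{G} - \lambda) f = 0$ on $(-\infty, b)$. Hence
$$E^0 e^{-\lambda H_b} = \exp[-b (2\lambda)^{1/2}],$$
in agreement with our earlier findings. $\Box$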
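The Laplace transform just obtained can be cross-checked numerically by an independent route. The sketch below assumes the standard consequence of the reflection principle that, under $P^0$, $H_b$ has the same law as $b^2/Z^2$ with $Z$ standard normal; the function name and the quadrature parameters are illustrative choices, not part of the text.

```python
import math

def laplace_hitting_time(lam, b, upper=10.0, steps=20000):
    """Midpoint-rule value of E exp(-lam * b^2 / Z^2) for Z standard normal.

    By symmetry the expectation is 2 * integral over (0, infinity) of
    exp(-lam * b^2 / z^2) * phi(z) dz; the tail beyond `upper` is negligible.
    """
    dz = upper / steps
    total = 0.0
    for k in range(steps):
        z = (k + 0.5) * dz
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += math.exp(-lam * b * b / (z * z)) * density
    return 2.0 * total * dz

lam, b = 0.7, 1.3
print(laplace_hitting_time(lam, b), math.exp(-b * math.sqrt(2.0 * lam)))
```

The two printed values should agree to many decimal places, since the integrand is smooth and vanishes to all orders at both ends of the range.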
11. Quasi-left-continuity. We shall return to obviously applicable ideas very
shortly (and there are a lot of applications coming up soon). First, it is convenient
to prove Theorem 11.1, which (precisely) asserts that X is quasi-left-continuous
(qlc).
Later in this chapter, we shall see how the modification of the qlc property
needed for Ray processes clarifies the role of branch-points. However, it is only
when we begin to consider the modern 'general theory of processes' in Volume
2 that we find what 'qlc' is really about. Still, the historical order is a good one
to follow when learning.
(11.1) THEOREM (Blumenthal's qlc Theorem). Let $(T_n)$ be a strictly increasing
sequence of stopping times with limit $T$. Then
$$X(T_n) \to X(T), \quad \text{a.s. on } \{T < \infty\}.$$
Note. 'Stopping time' here means, of course, '$\{\mathcal{F}_t\}$ stopping time'.
Proof. It is enough to prove the theorem when $T \le c$ for some non-random
constant $c$. (For the general case, we can then replace $T_n$ by $T_n \wedge (c - c/n)$ and
finally let $c \uparrow\uparrow \infty$ through a countable sequence.) So assume that $T \le c$ for some $c$.
Since $X$ has R-paths, $\lim X(T_n)$ exists and equals $X(T-)$. Thus we must prove
that, a.s., $X_T = X_{T-}$.
Define (but note that this is not the fundamental definition of $\mathcal{F}_{T-}$; see
Volume 2):
(11.2) $$\mathcal{F}_{T-} := \sigma(\mathcal{F}_{T_n} : n = 1, 2, 3, \ldots).$$
Fix $x$ and, for the moment, fix $f$ in $\mathcal{D}(\mathcal{G})$. By the Martingale-Convergence
Theorem (Theorem II.69.2) we have
(11.3) $$E^x[f(X_T) \mid \mathcal{F}_{T_n}] \to E^x[f(X_T) \mid \mathcal{F}_{T-}], \quad \text{a.s.}(P^x).$$
But, by (10.13) and the Optional-Stopping Theorem (using the condition $T \le c$
for justification),
(11.4) $$E^x[f(X_T) \mid \mathcal{F}_{T_n}] = f(X_{T_n}) + E^x\Big[\int_{T_n}^T \mathcal{G} f \circ X_s\, ds \,\Big|\, \mathcal{F}_{T_n}\Big], \quad \text{a.s.}(P^x).$$
Now we can choose a subsequence (n(k)) with
■κι;
<2
-3*
Ex| I I &f°Xsds
Mk)
so that (by the Borel-Cantelli Lemma and the contraction property of conditional
expectations), a.s.(P*),
Ε'ΓΓ 9f°X,ds\rT^<2
LJr„(k) I J
' r„(k)
for all large $k$ (greater than $k_0(\omega)$). Hence, letting $n$ tend to $\infty$ through $(n(k))$
in (11.4), we obtain
(11.5) $$E^x[f(X_T) \mid \mathcal{F}_{T-}] = f(X_{T-}), \quad \text{a.s.}(P^x).$$
Since $\mathcal{D}(\mathcal{G})$ is separable and dense in $C_0$, it now follows that (for our fixed $x$)
(11.5) is true if $f \in C_0$. Hence, for $f \in C_0$,
$$E^x[\{f(X_T) - f(X_{T-})\}^2 \mid \mathcal{F}_{T-}] = f(X_{T-})^2 - 2 f(X_{T-})^2 + f(X_{T-})^2 = 0,$$
a.s.$(P^x)$. The rest is trivial. $\Box$
Not only is $X$ quasi-left-continuous, but the filtration $\{\mathcal{F}_t\}$ is quasi-left-continuous.
(11.6) THEOREM (Meyer). The filtration $\{\mathcal{F}_t\}$ is qlc: if $(T_n)$ is an increasing
sequence of stopping times with limit $T$ then
$$\mathcal{F}_T = \sigma(\mathcal{F}_{T_n} : n = 1, 2, 3, \ldots).$$
See Theorem VI.18.2 in Volume 2.
12. Characteristic operator. A point $x$ of $E$ is called absorbing if either of the
following two equivalent conditions holds:
(i) $P^x[X(t) = x, \forall t] = 1$;
(ii) $P_t(x, \{x\}) = 1$, $\forall t$.
(12.1) LEMMA (Dynkin). Let $x \in E$ and let $d$ be a metric giving the topology of
$E$. If $x$ is not absorbing then, for all sufficiently small $\eta > 0$,
$$E^x V_{\eta,x} < \infty, \quad \text{where } V_{\eta,x} := \inf\{t : d(x, X_t) > \eta\}.$$
So as not to interrupt things, we defer the proof of this lemma to (12.4).
We now define Dynkin's characteristic operator $\mathcal{C}$ of $X$. If $x$ is absorbing,
define
$$\mathcal{C} f(x) := 0, \quad \forall f \in C_0.$$
If $x$ is not absorbing, define
$$\mathcal{C} f(x) := \lim_{\eta \downarrow\downarrow 0} \frac{E^x[f \circ X(V_{\eta,x})] - f(x)}{E^x V_{\eta,x}}$$
if the limit exists. The domain $\mathcal{D}(\mathcal{C})$ is defined to be the set of those $f$ in $C_0$
for which $\mathcal{C} f(x)$ exists for every $x$ and for which $\mathcal{C} f(\cdot) \in C_0$.
(12.2) THEOREM (Dynkin's Characteristic-Operator Theorem for FD
processes). We have
$$\mathcal{C} = \mathcal{G}.$$
Proof. It is clear from the definition of $\mathcal{C}$ that $\mathcal{C}$ satisfies Dynkin's Maximum
Principle (6.8). Hence it is sufficient to show that $\mathcal{C}$ extends $\mathcal{G}$. However, this
fact is an immediate consequence of Dynkin's formula (10.12) with $T = V_{\eta,x}$,
since $\mathcal{G} f$ is continuous at $x$. $\Box$
Dynkin's splendid theorem has all sorts of important consequences. Let us
first see what it has to say for CBM(Rn).
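(12.3) Example. Let $X$ be CBM($\mathbb{R}^n$). Choose $d$ to be the Euclidean metric on
$\mathbb{R}^n$. Since $B_t^2 - t$ is a martingale if $B$ is a BM$_0$($\mathbb{R}$), it is 'obvious' that
$$E^x V_{\eta,x} = \eta^2 / n.$$
Exercise. Give rigorous proof by the optional-stopping theorem.
Further, as we have seen before, the $P^x$ distribution of the variable $X(V_{\eta,x})$ is
the uniform probability distribution $\mu_{\eta,x}$ (say) on the sphere $S_{\eta,x} := \{y : d(x, y) = \eta\}$.
Now, if $f \in \mathcal{D}(\tfrac{1}{2}\Delta)$, then the Gauss-Green Theorem shows that
$$\lim_{\eta \downarrow\downarrow 0} n \eta^{-2} \Big[\int_{S_{\eta,x}} f(y)\, \mu_{\eta,x}(dy) - f(x)\Big] = \tfrac{1}{2} \Delta f(x),$$
so that $f \in \mathcal{D}(\mathcal{C})$ and $\mathcal{C} f = \tfrac{1}{2} \Delta f$. Hence $\mathcal{G}$ ($= \mathcal{C}$) is an extension of $\tfrac{1}{2}\Delta$. We already
know that $\mathcal{G} = \tfrac{1}{2}\Delta$ if $n = 1$. See Section 7.2 of Itô and McKean [1] for the fact
that if $n \ge 2$ then $\mathcal{G}$ is the closure of $\tfrac{1}{2}\Delta$ and a proper extension of $\tfrac{1}{2}\Delta$.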
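The limit relating spherical means to $\tfrac{1}{2}\Delta f$ can be observed numerically. The following sketch works in the plane ($n = 2$), where the uniform distribution on the circle of radius $\eta$ is approximated by equally spaced angles; the function name, the test function and the evaluation point are hypothetical choices made only for the illustration.

```python
import math

def sphere_mean_quotient(f, x, y, eta, n_angles=720):
    """Return (n / eta^2) * (mean of f over the circle of radius eta about (x, y) - f(x, y)), n = 2."""
    total = 0.0
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        total += f(x + eta * math.cos(theta), y + eta * math.sin(theta))
    mean = total / n_angles
    return 2.0 / eta ** 2 * (mean - f(x, y))

# Hypothetical smooth test function with Laplacian -sin(x) + 6*y.
f = lambda x, y: math.sin(x) + y ** 3
approx = sphere_mean_quotient(f, 0.3, 0.5, 0.05)
exact = 0.5 * (-math.sin(0.3) + 6.0 * 0.5)
print(approx, exact)
```

For small $\eta$ the quotient is close to $\tfrac{1}{2}\Delta f(x)$, with an error of order $\eta^2$.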
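(12.4) Proof of Lemma 12.1. Suppose that $x$ is not absorbing. Set
$$B_\varepsilon(x) := \{y : d(x, y) < \varepsilon\}.$$
Then, for some $\varepsilon > 0$, $t > 0$ and $\alpha > 0$,
$$P^{+\partial}_t(x, E_\partial \setminus \bar{B}_\varepsilon(x)) > \alpha,$$
where $\bar{B}_\varepsilon(x)$ is the closure of $B_\varepsilon(x)$ in $E_\partial$. Let $G$ be the open set $G := E_\partial \setminus \bar{B}_\varepsilon(x)$.
Let $(h_n)$ be a sequence of continuous functions on $E_\partial$ increasing to the indicator
function of $G$. Then $P^{+\partial}_t h_n \uparrow P^{+\partial}_t(\cdot, G)$, so that
$$\{y : P^{+\partial}_t(y, G) > \alpha\} = \bigcup_n \{y : P^{+\partial}_t h_n(y) > \alpha\}$$
is open. Hence, for some positive $\eta$, which we can and do suppose to be less than $\varepsilon$,
$$P^{+\partial}_t(y, E_\partial \setminus \bar{B}_\varepsilon(x)) > \alpha, \quad \forall y \in B_\eta(x).$$
An obvious use of the simple Markov property now shows that
$$P^x[X_{kt} \in B_\eta(x), \forall k \le n] \le (1 - \alpha)^n,$$
and it is an elementary consequence that
$$E^x V_{\eta,x} < \infty.$$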
13. Feller-Dynkin diffusions. We now assume that $E = \mathbb{R}^n$, but $E$ could equally
well be an $n$-dimensional $C^\infty$ manifold. Recall that the lifetime $\zeta$ of our process
$X$ is defined as follows:
$$\zeta(\omega) := \inf\{t : X_t(\omega) = \partial\}.$$
By an FD diffusion on $\mathbb{R}^n$, we mean an FD process $X$ with the following
additional properties:
(13.1)(i) the paths $t \mapsto X_t(\omega)$ are continuous on $[0, \zeta)$;
(13.1)(ii) the domain $\mathcal{D}(\mathcal{G})$ of the generator $\mathcal{G}$ of $X$ contains $C_K^\infty := C_K^\infty(\mathbb{R}^n)$, the
space of infinitely differentiable functions of compact support.
Let $X$ be an FD diffusion. Then the restriction $\mathcal{L}$ (say) of $\mathcal{G}$ to $C_K^\infty$ satisfies
the following conditions:
(13.2)(i) $\mathcal{L}$ is a linear map from $C_K^\infty$ to $C_0$;
(13.2)(ii) $\mathcal{L}$ is local: if functions $f$ and $g$ in $C_K^\infty$ agree in some neighbourhood
of a point $x$, then $\mathcal{L} f(x) = \mathcal{L} g(x)$;
(13.2)(iii) $\mathcal{L}$ satisfies the maximum principle: if $f$ in $C_K^\infty$ attains its maximum at
$x$ and $f(x) \ge 0$, then $\mathcal{L} f(x) \le 0$.
The property (13.2)(i) is obvious, and (13.2)(iii) is already familiar to us. Since
$X$ has continuous paths, it is clear from the definition of $\mathcal{C}$ that $\mathcal{C}$ is local. The
property (13.2)(ii) now follows because $\mathcal{L} \subseteq \mathcal{G} = \mathcal{C}$.
The three properties (13.2) imply the following theorem.
(13.3) THEOREM (Dynkin). The restriction $\mathcal{L}$ of $\mathcal{G}$ to $C_K^\infty$ is a second-order
elliptic operator of the form
$$\mathcal{L} f(x) = \tfrac{1}{2} \sum_i \sum_j a_{ij}(x)\, \partial_i \partial_j f(x) + \sum_i b_i(x)\, \partial_i f(x) - c(x) f(x),$$
where $\partial_i$ denotes $\partial/\partial x_i$ and
(13.4)(i) $\forall i, j$, the functions $a_{ij}(\cdot)$, $b_i(\cdot)$ and $c(\cdot)$ are continuous;
(13.4)(ii) $\forall x$, the matrix $\{a_{ij}(x) : 1 \le i, j \le n\}$ is non-negative definite symmetric;
(13.4)(iii) $\forall x$, $c(x) \ge 0$.
Proof. Note that it follows from the local and maximum-principle properties
of $\mathcal{L}$ that $\mathcal{L}$ satisfies the Local-Maximum Principle: if $f$ in $\mathcal{D}(\mathcal{L})$ has a local
maximum at $x$ and $f(x) \ge 0$, then $\mathcal{L} f(x) \le 0$.
For $x$ in $E$, we can find $\varphi$ in $C_K^\infty$ with $\varphi = 1$ in a neighbourhood of $x$. For
such a $\varphi$, we can define $c(x) := -\mathcal{L}\varphi(x)$; this defines $c(x)$ independently of the
particular $\varphi$ chosen.
For fixed $x = (x_1, x_2, \ldots, x_n)$ in $E$, set
$$b_i(x) := \mathcal{L}\varphi_i(x),$$
where $\varphi_i \in C_K^\infty$ and $\varphi_i(y) = y_i - x_i$ near $x$, and
$$a_{ij}(x) := \mathcal{L}(\varphi_i \varphi_j)(x).$$
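Then $c$, $b_i$ and $a_{ij}$ are continuous. For $\lambda_1, \lambda_2, \ldots, \lambda_n \in \mathbb{R}$, the function $h$ with
$$h(y) = -\Big(\sum_i \lambda_i \varphi_i(y)\Big)^2$$
has a local maximum at $x$, so that
$$\mathcal{L} h(x) = -\sum_i \sum_j \lambda_i \lambda_j a_{ij}(x) \le 0.$$
Hence the symmetric matrix $\{a_{ij}(x) : 1 \le i, j \le n\}$ is non-negative definite.
Now, if $f \in C_K^\infty$, Taylor's formula gives (for $y$ near $x$)
$$f(y) = \psi(y) + o(|y - x|^2),$$
where
$$\psi(y) := f(x)\varphi(y) + \sum_i \partial_i f(x)\, \varphi_i(y) + \tfrac{1}{2} \sum_i \sum_j \partial_i \partial_j f(x)\, \varphi_i(y) \varphi_j(y).$$
(Recall that $\varphi(y) = 1$ near $x$.) Note that
$$\mathcal{L}\psi(x) = -c(x) f(x) + \sum_i b_i(x)\, \partial_i f(x) + \tfrac{1}{2} \sum_i \sum_j a_{ij}(x)\, \partial_i \partial_j f(x).$$
For $\varepsilon > 0$, a function in $C_K^\infty$ defined near $x$ by
$$y \mapsto f(y) - \psi(y) - \varepsilon |y - x|^2$$
has a local maximum at $x$. Hence
$$\mathcal{L} f(x) - \mathcal{L}\psi(x) - \varepsilon \sum_i a_{ii}(x) \le 0.$$
You can see why $\mathcal{L} f(x) = \mathcal{L}\psi(x)$, so the proof is complete. $\Box$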
Suppose given a second-order elliptic operator $\mathcal{L}$ from $C_K^\infty(\mathbb{R}^n)$ to $C_0(\mathbb{R}^n)$ of
the form described in Theorem 13.3 (equivalently, satisfying the conditions
(13.2)). Suppose that $\{P_t\}$ is an FD semigroup on $\mathbb{R}^n$ with generator $\mathcal{G}$ extending
$\mathcal{L}$ and that $X$ is the canonical FD process associated with $\{P_t\}$. We now prove
that, a.s., $X$ has continuous paths up to time $\zeta$, so that (ignoring null sets) $X$ is
an FD diffusion.
(13.5) THEOREM (Dynkin, Kinney). Let $\mathcal{L} : C_K^\infty \to C_0$ be an elliptic operator of
the type described in Theorem 13.3. Suppose that $\{P_t\}$ is an FD semigroup with
generator extending $\mathcal{L}$ and that $X$ is the associated FD process. Then, a.s., the
paths of $X$ are continuous on $[0, \zeta)$.
Proof. To avoid annoyances, we give the proof when $\{P_t\}$ is further assumed
to be honest, so $\zeta = \infty$, a.s. (Since then $P_t 1 = 1$, $\forall t$, it is easy to see that $c = 0$.
However, the condition '$c = 0$' does not imply honesty, because it does not
preclude explosion in which $X$ reaches infinity 'continuously' in a finite time.
More about explosion, and more about the case when $c \ne 0$, later.)
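Let $K$ be a closed ball in $\mathbb{R}^n$ and let $G$ be an open ball containing $K$. It is
well known that there exists $f \in C_K^\infty$ with $f = 1$ on $K$, $f = 0$ on $\mathbb{R}^n \setminus G$, $0 \le f \le 1$
everywhere. Since $c = 0$, we have $\mathcal{L} f = 0$ on $K$. For $x \in K$, we have
$$P_t(x, \mathbb{R}^n \setminus G) \le f(x) - P_t f(x) = -\int_0^t P_s \mathcal{L} f(x)\, ds.$$
Since $\|P_s \mathcal{L} f - \mathcal{L} f\| \to 0$ ($s \downarrow\downarrow 0$), it is now clear that
(13.6) $$\sup_{x \in K} t^{-1} P_t(x, \mathbb{R}^n \setminus G) \to 0 \quad (t \downarrow\downarrow 0).$$
We now wish to prove that, for each compact $K$, for $\varepsilon, u > 0$ and for $x \in K$,
(13.7) $$P^x\Big[\bigcup_{k=0}^{n-1} \{|X(ku/n) - X((k+1)u/n)| > 3\varepsilon;\ X_s \in K, \forall s \le u\}\Big] \to 0$$
as $n \uparrow \infty$. The theorem will then be obvious. (See (7.21).) The probability in (13.7)
is dominated by
$$n \sup_{y \in K} P_{u/n}(y, \mathbb{R}^n \setminus B_{3\varepsilon}(y)),$$
where $B_\varepsilon(x)$ denotes the open ball of radius $\varepsilon$ around $x$. Hence we need only
show that
$$t^{-1} \sup_{y \in K} P_t(y, \mathbb{R}^n \setminus B_{3\varepsilon}(y)) \to 0 \quad (t \downarrow\downarrow 0).$$
This is an immediate consequence of (13.6). For let $x_1, x_2, \ldots, x_r$ in $K$ be such
that $B_\varepsilon(x_1), B_\varepsilon(x_2), \ldots, B_\varepsilon(x_r)$ cover $K$. Apply (13.6) to the case where $K = \bar{B}_\varepsilon(x_k)$
and $G = B_{2\varepsilon}(x_k)$. Let $\eta > 0$ be given. Then there exists $\delta_k > 0$ such that, $\forall t < \delta_k$,
$$\sup_{y \in B_\varepsilon(x_k)} t^{-1} P_t(y, \mathbb{R}^n \setminus B_{3\varepsilon}(y)) \le \sup_{y \in B_\varepsilon(x_k)} t^{-1} P_t(y, \mathbb{R}^n \setminus B_{2\varepsilon}(x_k)) < \eta.$$
Now take $\delta := \min(\delta_1, \delta_2, \ldots, \delta_r)$, etc. $\Box$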
Wiener's Theorem is obviously a corollary of Theorem 13.5.
Two problems remain.
(i) Does there exist an FD semigroup with generator $\mathcal{G}$ extending $\mathcal{L}$?
(ii) If so, is there only one such semigroup?
See Section V.22 for the answers.
(13.8) The weak-continuity problem for {Px}. It would be wrong to leave the
present theoretical discussion of the Feller property without mentioning the
connection with weak continuity. (This connection will be clearly apparent in
our later treatment of Stroock-Varadhan theory. References for the special case
of diffusions will be given at that stage.)
If $X$ is an honest FD diffusion for which $P_t : C_b(\mathbb{R}^n) \to C_b(\mathbb{R}^n)$, then the map
$x \mapsto P^x$ is continuous from $\mathbb{R}^n$ to $\Pr(W)$, where $W$ is the space of continuous paths
in $\mathbb{R}^n$. For all FD diffusions, we can make the same statement, provided we
make a suitable slight adjustment of the concept of weak convergence.
It has been mentioned earlier that the theory of weak convergence is very
highly developed in the case when $W$ is replaced by the space $D$ of R-paths
(with values in a compact metric space $E_\partial$) with the Skorokhod $J_1$ topology.
This provides the appropriate setting for studying weak continuity of $\{P^x\}$ for
general FD semigroups. See Skorokhod's classic paper [2], and Billingsley [3],
Aldous [1] and Ethier and Kurtz [1] for interesting work.
14. Characterisation of continuous real Lévy processes. Let $X$ be a continuous
1-dimensional Lévy process, so that $X$ has stationary independent increments.
Then $X$ is a continuous Markov process with shift-invariant transition function
$$P_t(x, x + \Gamma) = P_t(0, \Gamma) \quad (t \ge 0,\ x \in \mathbb{R},\ \Gamma \in \mathcal{B}).$$
We shall use Dynkin's characteristic-operator formula to prove Lévy's theorem
that
$$X_t = \sigma B_t + \mu t$$
for some Brownian motion $B$ and constants $\sigma$ and $\mu$.
It is strictly elementary to show that $\{P_t\}$ has the FD property; and, since $X$
is continuous, $X$ is strong Markov. We do not yet know that the domain of
the generator of $X$ contains $C_K^\infty$.
Write
$$\mathcal{D} := \{f \in C_0 : f'' \in C_0\}.$$
Note that, for $f \in \mathcal{D}$, $f' \in C_0$ because
$$f(x + 1) - f(x) = f'(x) + \tfrac{1}{2} f''(x + \theta_x) \quad \text{for some } \theta_x \text{ in } (0, 1).$$
We shall prove that every element of $\mathcal{D}$ belongs to the domain $\mathcal{D}(\mathcal{C})$ of the
characteristic operator $\mathcal{C}$ of $X$ and that there exist constants $\sigma \in \mathbb{R}^+$ and $\mu \in \mathbb{R}$
such that
$$\mathcal{C} f = \tfrac{1}{2} \sigma^2 f'' + \mu f', \quad \forall f \in \mathcal{D}.$$
Since the operator $\tfrac{1}{2}\sigma^2 d^2/dx^2 + \mu\, d/dx$ with domain $\mathcal{D}$ is exactly the generator
of the FD semigroup of $\sigma B_t + \mu t$, the desired result follows.
We shall assume that, for $a < x < b$,
$$0 < P^x[H_a < H_b] = 1 - P^x[H_b < H_a] < 1,$$
where $H_y := \inf\{t : X_t = y\}$. (The remaining cases are easily shown to be trivial
in that, for them, $X_t = X_0 + \mu t$ for some $\mu \in \mathbb{R}$.)
For $h > 0$, put
$$\tau^h_0 := 0, \qquad \tau^h_{n+1} := \inf\{t > \tau^h_n : |X(t) - X(\tau^h_n)| = h\}.$$
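Then
$$\{X(\tau^h_n) : n = 0, 1, 2, \ldots\}$$
is just a simple (Bernoulli) random walk. From standard elementary results on
gambler's ruin (see Sections 1-3 of Chapter XIV of Feller [1]), we can show
that there exist constants $\gamma \in \mathbb{R}$ and $\beta \in \mathbb{R}^{++}$ such that, for $a < x < b$,
(14.1) $$P^x[H_a < H_b] = \frac{e^{\gamma b} - e^{\gamma x}}{e^{\gamma b} - e^{\gamma a}} \quad \Big(:= \frac{b - x}{b - a} \text{ if } \gamma = 0\Big),$$
(14.2) $$E^x[H_a \wedge H_b] = \frac{\beta}{\gamma} \cdot \frac{(b - a) e^{\gamma x} - (b - x) e^{\gamma a} - (x - a) e^{\gamma b}}{e^{\gamma a} - e^{\gamma b}} \quad \big(:= \tfrac{1}{2} \beta (b - x)(x - a) \text{ if } \gamma = 0\big).$$
To prove (14.1), (14.2), first obtain $P^x[H_a < H_b]$ and $E^x[H_a \wedge H_b]$ (in terms of
$E^0[H_{-h} \wedge H_h]$) when $x - a$ and $b - x$ are both multiples of $h$, and employ
obvious monotonicity properties in 'letting $h \downarrow\downarrow 0$'. You will find that $\beta$ can be
defined as
(14.3) $$\beta := \lim_{h \downarrow\downarrow 0} 2 h^{-2} E^0[H_{-h} \wedge H_h] > 0;$$
and it is indeed the existence of the limit at (14.3) rather than the much more
informative (14.2) that we really require.
Recall that $\mathcal{C} f(x)$ is defined as
(14.4) $$\mathcal{C} f(x) := \lim_{\eta \downarrow\downarrow 0} \frac{E^x f(X(\tau^\eta_1)) - f(x)}{E^x[\tau^\eta_1]},$$
provided the limit exists. But, from (14.1) and (14.2), we find that
$$\mathcal{C} f(x) = \frac{\gamma}{\beta} \lim_{\eta \downarrow\downarrow 0} \frac{(e^{\gamma\eta} - 1) f(x - \eta) + (1 - e^{-\gamma\eta}) f(x + \eta) - (e^{\gamma\eta} - e^{-\gamma\eta}) f(x)}{\eta\, (e^{\gamma\eta} + e^{-\gamma\eta} - 2)},$$
and now it follows (by Taylor series expansion) that, for $f \in \mathcal{D}$, we have $f \in \mathcal{D}(\mathcal{C})$
and
$$\mathcal{C} f = \tfrac{1}{2} \sigma^2 f'' + \mu f', \quad \text{where } \sigma^2 := 2\beta^{-1},\ \mu := -\gamma \beta^{-1}.$$
The proof of Lévy's Theorem is complete. $\Box$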
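The Taylor-expansion step can be checked numerically. The sketch below evaluates the quotient in (14.4) for a small $\eta$, using (14.1) and (14.2) with $a = x - \eta$, $b = x + \eta$, and compares it with $\tfrac{1}{2}\sigma^2 f'' + \mu f'$ for $\sigma^2 = 2\beta^{-1}$, $\mu = -\gamma\beta^{-1}$. The function name, the choice $f = \sin$ (a stand-in smooth function, used only locally) and the parameter values are assumptions made for the illustration; the case $\gamma = 0$ is not handled.

```python
import math

def exit_quotient(f, x, eta, beta, gamma):
    """(E^x f(X(tau)) - f(x)) / E^x[tau] for the exit time tau of (x - eta, x + eta).

    Uses (14.1) for the exit probabilities and (14.2) for the mean exit time,
    both specialised to a = x - eta, b = x + eta. Requires gamma != 0.
    """
    up, down = math.exp(gamma * eta), math.exp(-gamma * eta)
    p_left = (up - 1.0) / (up - down)  # probability of leaving at x - eta
    mean_exit_value = p_left * f(x - eta) + (1.0 - p_left) * f(x + eta)
    mean_exit_time = (beta / gamma) * eta * (up + down - 2.0) / (up - down)
    return (mean_exit_value - f(x)) / mean_exit_time

beta, gamma, x = 2.0, 0.7, 0.4
sigma_sq, mu = 2.0 / beta, -gamma / beta
approx = exit_quotient(math.sin, x, 1e-3, beta, gamma)
exact = 0.5 * sigma_sq * (-math.sin(x)) + mu * math.cos(x)
print(approx, exact)
```

With $\gamma > 0$ the process is more likely to leave the interval on the left, which is consistent with the negative drift $\mu = -\gamma\beta^{-1}$.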
Exercise. Explain why the 1-dimensional result just proved implies the n-
dimensional case (1.28.12) of Levy's result.
For left-invariant diffusions on Lie groups, see Section V.35 in Volume 2.
15. Consolidation. We can profitably draw together a few threads. You will
notice that the first sentence of the next section reads: 'Let $X = (X_t, \Omega, \{\mathcal{F}_t\},
P^x : x \in E_\partial)$ be an FD process with transition function $\{P_t\}$.' Let us revise what
is involved in this statement. The space $E$ is an LCCB and the space $E_\partial := E \cup \partial$
is compact metrisable. We can regard $\{P_t\}$ as a semigroup on $C_0(E)$ satisfying
(6.6)(i)-(iv) (but note particularly the significance of (6.6)(iv)), or as the
corresponding transition function derived via (6.2). We can therefore also regard
$P_t$ as a map $P_t : b\mathcal{E} \to b\mathcal{E}$. The extended transition function $\{P_t^{+\partial}\}$ is an FD
transition function on $E_\partial$. Since we have so far met only canonical FD processes,
we take $X$ to be the canonical process described in Section 7. The process $X$
has R-paths, and if either $X(s-, \omega) = \partial$ or $X(s, \omega) = \partial$ then $X(t, \omega) = \partial$ ($\forall t \ge s$).
The lifetime $\zeta$ of $X$ is defined as $\zeta := \inf\{t : X_t = \partial\}$. The σ-algebra $\mathcal{F}^\circ_t$ is as in
(7.16), and $\mathcal{F}_t$ is as in (9.2). We note particularly the Debut Theorem 9.3. The
process $X$ is strong Markov relative to the filtration $\{\mathcal{F}_t\}$ in the sense described
in Theorem 9.4, and is quasi-left-continuous in the sense of Theorem 11.1.
Moreover (Theorem 11.6), the filtration $\{\mathcal{F}_t\}$ is also qlc. The resolvent of $\{P_t\}$
(or of $X$) is defined in (3.10). Dynkin's formula (10.7), which 'decomposes the
resolvent at a stopping time $T$', is extremely important. Finally, the generator
$\mathcal{G}$ of $X$ may be defined via either (4.11) or Dynkin's characteristic-operator
formula (12.2) (which are equivalent for our FD process $X$).
3. ADDITIVE FUNCTIONALS
16. PCHAFs; λ-excessive functions; Brownian local time. Let
$X = (X_t, \Omega, \{\mathcal{F}_t\}, P^x : x \in E_\partial)$
be an FD process with transition function {Pt}. (You will see that the FD
property is not really used in our results on additive functionals, so that we
can (and shall) apply these results in more general contexts when these arise.
We have a clear idea of what an FD process is, and it provides a good enough
context to be getting on with.)
Let $c$ be a measurable function from $E$ to $[0, \infty)$. (Take $c(\partial) = 0$ by convention.)
Define
(16.1) $A_t(\omega) := \int_0^t c \circ X_s(\omega)\,ds$, or $A_t := \int_0^t c(X_s)\,ds$ for short.
Expressions of this type occur as cost functions in control theory, and represent
occupation times when $c$ is an indicator function. In practice, we often wish to
find the $P^x$ distribution of $A_t$ for each $x$. For this purpose we use the
Feynman-Kac (FK) formula, first developed for quantum-theoretic applications.
We prove the FK formula in Section 19.
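To make the integral functional (16.1) concrete, here is a minimal Python sketch that evaluates $A_t = \int_0^t c(X_s)\,ds$ along a discretised Brownian path by left-endpoint Riemann sums. The Euler discretisation, the step size, the seed and the choice of $c$ as the indicator of $[-1, 1]$ are all illustrative assumptions, not part of the text. The last lines check numerically the additivity $A_{s+t} = A_s + A_t \circ \theta_s$ that is formalised in Definition 16.3.

```python
import math
import random

def simulate_path(n_steps, dt, x0=0.0, seed=0):
    """Return a discretised Brownian path X[0..n_steps] with time step dt."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(path[-1] + math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path

def additive_functional(path, c, dt):
    """Left-endpoint Riemann sums: A[k] approximates int_0^{k dt} c(X_s) ds."""
    A = [0.0]
    for x in path[:-1]:
        A.append(A[-1] + c(x) * dt)
    return A

def indicator_unit_interval(x):
    """c = indicator of [-1, 1], so A_t is the occupation time of [-1, 1]."""
    return 1.0 if -1.0 <= x <= 1.0 else 0.0

dt = 0.001
path = simulate_path(n_steps=5000, dt=dt, seed=42)
A = additive_functional(path, indicator_unit_interval, dt)

# Additivity: A computed from the shifted path (theta_s) restarts from 0 at time s.
s = 2000
A_shift = additive_functional(path[s:], indicator_unit_interval, dt)
print(A[s + 1500], A[s] + A_shift[1500])
```

The two printed numbers agree up to floating-point rounding, because the functional only looks at the path after time $s$ once the contribution $A_s$ has been split off.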
A second major application of additive functionals is as compensators of
certain potentials. For example, if $A_t$ is defined as in (16.1) and $h(x) := E^x A_\infty$,
assumed everywhere finite, then
(16.2) $E^x(A_\infty \mid \mathcal{F}_t) = A_t + h(X_t)$
expresses the Doob decomposition of the supermartingale $h(X_t)$ (you should
check that this is a supermartingale!). More commonly, we first discount time
to ensure convergence, forming, for example,
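$\varphi(x) := E^x \int_0^\infty e^{-\lambda t} f(X_t)\,dt = R_\lambda f(x)$.
The analogue of (16.2) now is
$E^x\Bigl[\int_0^\infty e^{-\lambda u} f(X_u)\,du \Bigm| \mathcal{F}_t\Bigr] = \int_0^t e^{-\lambda u} f(X_u)\,du + e^{-\lambda t} R_\lambda f(X_t)$,
expressing the Doob decomposition of the supermartingale $e^{-\lambda t} R_\lambda f(X_t)$. To
say that $e^{-\lambda t} R_\lambda f(X_t)$ is a supermartingale amounts to saying that $g := R_\lambda f \ge e^{-\lambda t} P_t g$
for all $t \ge 0$. It is almost the case that if $\psi$ is a function such that
$\psi \ge e^{-\lambda t} P_t \psi$ for all $t \ge 0$ then $\psi = R_\lambda f$ for some $f$; the exact statement is the
Volkonskii-Sur-Meyer Theorem 16.7.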
The third major use of additive functionals is as random clocks to time-change
Markov processes; we saw a very important use of this technology in I.5.13,
and will explore the technique thoroughly in Sections 22-25, and again in
Volume 2.
(16.3) DEFINITION (PCHAF). A perfect continuous homogeneous additive
functional (PCHAF) of $X$ is an $\{\mathcal{F}_t\}$-adapted process $A$ such that, for some set
$\Omega_0$ in $\mathcal{F}$ with $P^x(\Omega_0) = 1$, $\forall x$, the following properties hold for every $\omega$ in $\Omega_0$:
(i) $t \mapsto A_t(\omega)$ is continuous, non-decreasing and $A_0(\omega) = 0$;
(ii) $\forall s$, $\forall t$, $A_{s+t}(\omega) = A_s(\omega) + A_t(\theta_s\omega)$;
(iii) $A_\cdot(\omega)$ is constant on $[\zeta(\omega), \infty)$.
Terminological note. In Dynkin [2], 'perfect' refers to the 'adapted' property of
A. In Blumenthal and Getoor [1], 'perfect' refers to the fact that the 'exceptional'
set $\Omega \setminus \Omega_0$ in (ii) can be chosen independently of $s$ and $t$. Dynkin calls this 'strict
homogeneity'. We win either way. Everyone would agree as to what a PCHAF is.
If, for example, the function $c$ in (16.1) is (non-negative and) bounded then
(16.1) defines a PCHAF $A$. (Some boundedness condition on $c$ is needed to
keep $A_t$ finite.) It is true that all PCHAFs are 'limits' of such 'integral' PCHAFs,
and this is reflected in the important Existence and Uniqueness Theorem 16.7.
(16.4) DEFINITION (uniformly $\lambda$-excessive). Let $\lambda > 0$ be fixed. An
$\mathcal{E}$-measurable function $f$ from $E$ to $\mathbb{R}$ is called uniformly $\lambda$-excessive if
(i) $f$ is bounded;
(ii) $f$ is $\lambda$-super-median (see Section 7):
$0 \le e^{-\lambda t} P_t f(x) \le f(x)$, $\forall t \ge 0$, $\forall x \in E$;
(iii) $\|e^{-\lambda t} P_t f - f\| \to 0$ as $t \downarrow\downarrow 0$.
The norm in (iii) is of course the supremum norm. We take $f(\partial) = 0$ by convention.
(16.5) Example. For any FD process, $R_\lambda h$ is uniformly $\lambda$-excessive for $h \in C_0^+$.
(16.6) Example. Let $X$ be CBM($\mathbb{R}$). Then
$f(x) := \gamma^{-1} \exp(-\gamma|x|)$, $\gamma := (2\lambda)^{1/2}$,
defines a uniformly $\lambda$-excessive function $f$. (See Exercise 16.12.)
(16.7) THEOREM (Volkonskii, Sur, Meyer). Fix $\lambda > 0$. Let $f$ be a uniformly
$\lambda$-excessive function on $E$. Then there exists a unique PCHAF $A$ of $X$ such that
$f$ is the $\lambda$-potential of $A$:
(16.8) $f(x) = E^x \int_0^\infty e^{-\lambda t}\,dA_t$.
Uniqueness means that if $B$ is another PCHAF for which (16.8) holds then
$P^x[A_t = B_t, \forall t] = 1$, $\forall x$.
Note that our conventions force both sides of (16.8) to equal 0 when $x = \partial$.
Before proving Theorem 16.7, let us look at Examples 16.5 and 16.6. In the
case of (16.5),
$A_t = \int_0^t h(X_s)\,ds$.
(16.9) Brownian local time at 0. Let $A$ be the unique PCHAF of CBM($\mathbb{R}$) with
$\lambda$-potential $f$ as in (16.6). Now $f$ was chosen so that
$f(x) = E^x[\exp(-\lambda H_0)] f(0) = P^\lambda_{H_0} f(x)$
in the notation of Section 10. By 'Dynkin's formula' (see Exercise 16.13),
$E^x \int_{H_0}^\infty e^{-\lambda t}\,dA_t = P^\lambda_{H_0} f(x) = f(x) = E^x \int_0^\infty e^{-\lambda t}\,dA_t$,
so that
$P^x[A(H_0) = 0] = 1$.
It is now obvious (see Exercise 16.14) that (a.s.) $A$ grows only when $X$ is at 0:
(16.10) $A_t = \int_0^t I_{\{0\}}(X_s)\,dA_s$.
It is further obvious from the 'uniqueness' part of Theorem 16.7 that, up to
constant multiples, $A$ is the only PCHAF satisfying (16.10).
(16.11) Remarks. In dimension $n \ge 2$, we cannot define Brownian local time at
a point (why?), but we can define the local time spent on certain sets.
Exercises. These exercises provide good practice in the use of $\theta_t$ operators.
(16.12) Prove that the function $f$ of Example 16.6 is uniformly $\lambda$-excessive for
$X$ = CBM($\mathbb{R}$). Hint. Use probabilistic (as opposed to analytical) reasoning to
show that $f$ is $\lambda$-super-median.
Solution. Since $f$ is in $C_0(\mathbb{R})$ and $X$ is FD, the only point that needs proof is
that $f$ is $\lambda$-super-median. Since, as we have already seen,
$f(x) = E^x[\exp(-\lambda H_0)] f(0)$,
it is enough to prove that
$g(x) := E^x[\exp(-\lambda H_0)]$
defines a $\lambda$-super-median function $g$. Now, for $t \ge 0$,
$H_0 \le \inf\{s > t : X_s = 0\} = t + H_0 \circ \theta_t$.
Hence, with $\eta := \exp(-\lambda H_0)$, we have
$g(x) = E^x[\eta] \ge e^{-\lambda t} E^x[\eta \circ \theta_t] = e^{-\lambda t} E^x[E^{X(t)} \eta] = e^{-\lambda t} E^x[g(X_t)] = e^{-\lambda t} P_t g(x)$,
as required. □
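(16.13) Though we are at present primarily interested in the 'Brownian local
time' case, this exercise covers the general situation. Show that if $A$ is a PCHAF
(of some FD process $X$) with finite $\lambda$-potential $f$ then, for a stopping time $T$,
$E^x \int_T^\infty e^{-\lambda t}\,dA_t = P^\lambda_T f(x)$.
This generalises Dynkin's formula (10.7), which becomes the special case when
$A$ is an integral PCHAF of the form (16.1).
Solution. We calculate
$\int_T^\infty e^{-\lambda t}\,dA_t = \int_T^\infty \lambda e^{-\lambda t}(A_t - A_T)\,dt = e^{-\lambda T} \int_0^\infty \lambda e^{-\lambda s}(A_{T+s} - A_T)\,ds$
$= e^{-\lambda T} \int_0^\infty \lambda e^{-\lambda s} A_s \circ \theta_T\,ds = e^{-\lambda T}\,\eta \circ \theta_T$,
where
$\eta := \int_0^\infty \lambda e^{-\lambda s} A_s\,ds = \int_0^\infty e^{-\lambda s}\,dA_s$.
By the Strong Markov Theorem,
$E^x \int_T^\infty e^{-\lambda t}\,dA_t = E^x[e^{-\lambda T}\,\eta \circ \theta_T] = E^x[e^{-\lambda T} E^{X(T)} \eta]$,
as required. □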
Note. On taking $T = u$, we find that
$f(x) = E^x \int_0^\infty e^{-\lambda t}\,dA_t \ge E^x \int_u^\infty e^{-\lambda t}\,dA_t = e^{-\lambda u} P_u f(x)$,
so that $f$ must be $\lambda$-super-median. Because of the Monotone-Convergence
Theorem, we can see that $f$ is even $\lambda$-excessive in that $e^{-\lambda u} P_u f(x) \uparrow f(x)$ as $u \downarrow 0$;
but it need not be true that $f$ is uniformly $\lambda$-excessive.
(16.14) Now return to the 'Brownian local time' case and prove (16.10).
Solution. Since $P^y[A(H_0) = 0] = 1$, $\forall y$, we have
$P^x[A(t + H_0 \circ \theta_t) - A(t) = 0] = E^x P^{X(t)}[A(H_0) = 0] = 1$.
Thus, for every $x$,
$P^x[A(q + H_0 \circ \theta_q) = A(q), \forall q \in \mathbb{Q}^+] = 1$.
The rest is easy. □
17. Proof of the Volkonskii-Sur-Meyer Theorem. If the $\lambda$-excessive function $f$
were of the form $f = R_\lambda g$, we would recover $g$ by taking $g = (\lambda - \mathcal{G})f$. The idea
in general is to approximate $\lambda - \mathcal{G}$ by $n(I - P^\lambda_{1/n})$, where we recall $P^\lambda_t := e^{-\lambda t} P_t$.
Thus we define
$g_n := n(f - P^\lambda_{1/n} f) \ge 0$,
and then define
$f_n := R_\lambda g_n = n \int_0^{1/n} P^\lambda_t f\,dt$.
It is easy and important that $f_n \uparrow f$ uniformly on $E$. Then
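The approximation scheme can be tried out on the simplest possible example. The following Python sketch uses a two-state Markov chain, for which $P_t$ and $R_\lambda$ have closed forms; the chain, its rates, the discount $\lambda$ and the function $g$ are hypothetical choices made only for illustration, and a finite chain is of course a much tamer setting than the one treated in the text. Starting from $f = R_\lambda g$, it computes $g_n = n(f - P^\lambda_{1/n} f)$ and $f_n = R_\lambda g_n$ for increasing $n$.

```python
import math

A_RATE, B_RATE, LAM = 2.0, 3.0, 1.5   # hypothetical jump rates 0->1, 1->0 and discount

def P(t, f):
    """Apply the transition matrix P_t of the two-state chain to the vector f."""
    tot = A_RATE + B_RATE
    pi0, pi1 = B_RATE / tot, A_RATE / tot        # stationary distribution
    mean = pi0 * f[0] + pi1 * f[1]
    decay = math.exp(-tot * t)
    return [mean + decay * (f[0] - mean), mean + decay * (f[1] - mean)]

def R(lam, g):
    """Resolvent R_lam g = (lam - Q)^{-1} g, using the explicit 2x2 inverse."""
    det = lam * (lam + A_RATE + B_RATE)
    return [((lam + B_RATE) * g[0] + A_RATE * g[1]) / det,
            (B_RATE * g[0] + (lam + A_RATE) * g[1]) / det]

def g_n(n, f):
    """g_n = n (f - e^{-lam/n} P_{1/n} f)."""
    Pf = P(1.0 / n, f)
    disc = math.exp(-LAM / n)
    return [n * (f[i] - disc * Pf[i]) for i in range(2)]

g = [1.0, 4.0]            # a non-negative function on the two states
f = R(LAM, g)             # its lam-potential
approximations = {}
for n in (1, 10, 100, 1000):
    gn = g_n(n, f)
    approximations[n] = (gn, R(LAM, gn))
    print(n, gn, approximations[n][1])
```

In this toy case $g_n$ approaches $g$ and $f_n$ increases to $f$ as $n$ grows, which is the behaviour the proof exploits.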
$f_n(x) = R_\lambda g_n(x) = E^x \int_0^\infty e^{-\lambda t} g_n(X_t)\,dt = E^x \int_0^\infty e^{-\lambda t}\,dA_n(t)$,
where
$A_n(t) := \int_0^t g_n(X_s)\,ds$.
Put
$C_n(t) := \int_0^t e^{-\lambda s}\,dA_n(s) = \int_0^t e^{-\lambda s} g_n(X_s)\,ds$,
so that
(17.1) $f_n(x) = E^x[C_n(\infty)]$.
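For $n \ge m$, and with the shorthand
$g_{m,n} := g_n - g_m \le g_n$, $f_{m,n} := f_n - f_m \ge 0$,
we make the estimate (with $\|f\|$ and $\|f - f_m\|$ denoting supremum norms)
$0 \le E^x([C_n(\infty) - C_m(\infty)]^2)$
$= 2E^x \int_{s=0}^\infty \int_{t=0}^\infty e^{-\lambda s} g_{m,n}(X_s)\, e^{-\lambda(s+t)} g_{m,n}(X_{s+t})\,ds\,dt$
$= 2E^x \int_{s=0}^\infty \int_{t=0}^\infty e^{-\lambda(2s+t)} g_{m,n}(X_s) P_t g_{m,n}(X_s)\,ds\,dt$ (because of the simple Markov property)
$= 2E^x \int_{s=0}^\infty e^{-2\lambda s} g_{m,n}(X_s) f_{m,n}(X_s)\,ds$ (since $f_{m,n} = R_\lambda g_{m,n}$)
$\le 2E^x \int_{s=0}^\infty e^{-2\lambda s} g_n(X_s) f_{m,n}(X_s)\,ds$
$\le 2\|f - f_m\|\, E^x \int_{s=0}^\infty e^{-\lambda s} g_n(X_s)\,ds = 2\|f - f_m\|\, f_n(x)$
$\le 2\|f\|\,\|f - f_m\|$.
(The 'Fubini' operations are justified because $\|g_{m,n}\| \le (m + n)\|f\| < \infty$.) But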
(17.2) $E^x[C_n(\infty) \mid \mathcal{F}_t] = C_n(t) + e^{-\lambda t} f_n(X_t)$, a.s.($P^x$),
so that, for each $x$,
(17.3) $M_{m,n}(t) := C_n(t) - C_m(t) + e^{-\lambda t} f_n(X_t) - e^{-\lambda t} f_m(X_t)$
defines a martingale relative to $(\{\mathcal{F}_t\}, P^x)$ with terminal value $C_n(\infty) - C_m(\infty)$.
For the moment, assume that the martingale $M_{m,n}$ has R-paths. Set
$M^*_{m,n} := \sup_t |M_{m,n}(t)|$.
Then, by Doob's $L^2$ inequality (II.70.2),
$E^x M^*_{m,n} \le [E^x(M^{*2}_{m,n})]^{1/2} \le (8\|f\|\,\|f - f_m\|)^{1/2}$.
If we choose a sequence $n(k)$ such that $\sum_k \|f - f_{n(k)}\|^{1/2}$ converges then
(Borel-Cantelli), almost surely (a.s.($P^x$), $\forall x$),
$\sum_k M_{n(k),n(k+1)}(t)$
converges uniformly over $t \in [0, \infty]$. But
$\sup_t |e^{-\lambda t} f_{n(k+1)}(X_t) - e^{-\lambda t} f_{n(k)}(X_t)| \le \|f_{n(k+1)} - f_{n(k)}\| \le \|f - f_{n(k)}\|$,
and, since $\sum_k \|f - f_{n(k)}\| < \infty$, we deduce that, almost surely,
$C(t) := \lim_k C_{n(k)}(t)$
exists uniformly over $t \in [0, \infty]$ and defines a continuous increasing process $C$.
It is easy to check that
$A(t) := \int_0^t e^{\lambda s}\,dC_s$
satisfies (a.s.)
$A(t) = \lim_k A_{n(k)}(t)$,
the limit existing uniformly over compact intervals. Hence $A$ is a PCHAF. That
$f(x) = E^x[C(\infty)] = E^x \int_0^\infty e^{-\lambda t}\,dA(t)$
follows from (17.1) because $C_n(\infty) \to C(\infty)$ in $\mathcal{L}^2$ and hence in $\mathcal{L}^1$.
(17.4) R-property of $M_{m,n}$. In the cases that will concern us, $f_m$ and $f_n$ are
continuous functions on $E$, and then $M_{m,n}$ inherits the R-property from $X$. If we
assume only that $f_m$ and $f_n$ are Borel (as will happen if $f$ is Borel), then we
shall see in Volume 2 that the section theorem implies that $M_{m,n}$ is an R-process.
However, the R-property of $M_{m,n}$ can be established in general: $f$ is nearly
Borel. See Theorem II.2.12 of Blumenthal and Getoor [1].
(17.5) Connection with the Meyer decomposition. We have
$e^{-\lambda t} f(X_t) = E^x[C(\infty) \mid \mathcal{F}_t] - C(t)$.
This is (see Volume 2) the Meyer decomposition of the 'regular class (D)
potential' $t \mapsto e^{-\lambda t} f(X_t)$ on $(\Omega, \{\mathcal{F}_t\}, P^x)$ as the 'potential' of the continuous
integrable increasing process C. The uniqueness part of the Meyer
decomposition theorem implies the uniqueness of A asserted in Theorem 16.7. Though it
is not difficult to prove the uniqueness of A directly (you should be able to
sense how to do it from the proof of the existence part of Theorem 16.7), it is
best thought of in terms of martingale theory, so we take the uniqueness result
for granted for now. D
18. Killing. We now construct a process $\tilde X$ that represents '$X$ killed at rate
$dA_t$'. ($A$ now denotes some fixed PCHAF of $X$.) The intuitive idea is that $\tilde X$ agrees
with $X$ up to time $\tilde\zeta \le \zeta$ and $\tilde X(t) = \partial$, $\forall t \ge \tilde\zeta$, where
(18.1) $P[\tilde\zeta > t \mid \mathcal{F}] = \prod_{s \le t} [1 - dA(s)] = e^{-A(t)}$.
Set
$M_t := M(t) := e^{-A(t)}$,
so that $M$ is a PCHMF of $X$. The 'M' in PCHMF stands of course for multiplicative
and reflects the property
(18.2) $M_{s+t}(\omega) = M_s(\omega) M_t(\theta_s\omega)$.
The type of construction required is standard and straightforward, but even
the most intuitive ideas look tedious in probability theory. Blumenthal and
Getoor [1] do their killing as quickly and humanely as possible, and we shall
follow them. Needless to say, they do a much more thorough post-mortem, if
you want all the gory details.
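For intuition about (18.1), here is a small Python sketch of one way to realise the killing time on a time grid. It is not the product-space construction carried out in the text: it uses the equivalent device of an independent Exp(1) threshold $e$ and kills at the first grid time with $A_t \ge e$, so that the conditional survival probability past a grid time is $P[e > A_t] = e^{-A_t}$. The function names, the grid and the use of `None` as the cemetery state are assumptions made for illustration.

```python
import random

def killing_index(A, threshold):
    """First grid index k with A[k] >= threshold, or None if the path survives.

    A is a non-decreasing list with A[0] = 0 (a discretised PCHAF). With an
    independent Exp(1) threshold e, the path is still alive at index k exactly
    when e > A[k], which has conditional probability exp(-A[k]).
    """
    for k, a in enumerate(A):
        if a >= threshold:
            return k
    return None

def kill(path, A, threshold, cemetery=None):
    """Return the killed path: `path` before the killing index, cemetery from then on."""
    k = killing_index(A, threshold)
    if k is None:
        return list(path)
    return list(path[:k]) + [cemetery] * (len(path) - k)

# Constant-rate example: A_t = lam * t sampled on a grid of mesh dt.
lam, dt = 2.0, 0.01
A = [lam * k * dt for k in range(1001)]
rng = random.Random(1)
e = rng.expovariate(1.0)              # independent Exp(1) threshold
print(killing_index(A, e))
```

With $A_t = \lambda t$ the killing time is $e/\lambda$ rounded up to the grid, that is, exponential of rate $\lambda$, in line with the 'killing at constant rate' case (18.14).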
The situation is rather curious because one normally works with $\sigma$-algebras
$\{\tilde{\mathcal{F}}_t\}$ for $\tilde X$ that are, so to speak, 'half-completed'. (See Blumenthal and Getoor
[1].) We shall keep things 'algebraic' by assuming that $A$ (equivalently, $M$) is
$\{\mathcal{F}^\circ_t\}$-adapted (as it will be in all cases of interest to us). We can then work
with $\{\mathcal{F}^\circ_t\}$ and Borel functions on $E$ instead of $\{\mathcal{F}_t\}$ and universally-measurable
functions on $E$. We further assume that we have 'weeded out' (rejected) a null
set so that for every $\omega$, $M_0(\omega) = 1$ and $t \mapsto M_t(\omega)$ is continuous non-increasing.
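Let $\tilde\Omega := \Omega \times [0, \infty]$ and let $\tilde\omega = (\omega, \xi)$ denote the typical point of $\tilde\Omega$. Let $\mathcal{R}$
be the Borel $\sigma$-algebra on $[0, \infty]$ and set $\tilde{\mathcal{F}}^\circ := \mathcal{F}^\circ \times \mathcal{R}$. Define $\tilde\zeta(\tilde\omega) := \zeta(\omega) \wedge \xi$,
and put
$\tilde X_t(\tilde\omega) := X_t(\omega)$ if $t < \xi$, $\tilde X_t(\tilde\omega) := \partial$ if $t \ge \xi$.
Define $\tilde\theta_t \tilde\omega := (\theta_t\omega, (\xi - t) \vee 0)$. Then
(18.3) $\tilde X_t \circ \tilde\theta_h = \tilde X_{t+h}$, $\forall t$, $\forall h$.
Let $\tilde\Omega_t := \Omega \times (t, \infty] \in \tilde{\mathcal{F}}^\circ$, and, for $\tilde\Lambda \in \tilde{\mathcal{F}}^\circ$, put $\tilde\Lambda$ in $\tilde{\mathcal{F}}^\circ_t$ if there exists $\Lambda$ in $\mathcal{F}^\circ_t$
such that
$\tilde\Lambda \cap \tilde\Omega_t = \Lambda \times (t, \infty]$.
Then
(18.4) $\{\tilde{\mathcal{F}}^\circ_t\}$ is a filtration of $(\tilde\Omega, \tilde{\mathcal{F}}^\circ)$ and $\tilde X$ is $\{\tilde{\mathcal{F}}^\circ_t\}$-adapted.
For $\omega \in \Omega$, define a probability measure $\alpha_\omega$ on $[0, \infty]$ by setting $\alpha_\omega(\{0\}) := 0$ and
$\alpha_\omega(t, \infty] := M_t(\omega)$ ($t \in [0, \infty]$).
If $\tilde\Lambda \in \tilde{\mathcal{F}}^\circ = \mathcal{F}^\circ \times \mathcal{R}$, then, for each $\omega$,
$\tilde\Lambda_\omega := \{\xi \in [0, \infty] : (\omega, \xi) \in \tilde\Lambda\} \in \mathcal{R}$,
and we can easily check by monotone-class arguments that $\omega \mapsto \alpha_\omega(\tilde\Lambda_\omega)$ is
$\mathcal{F}^\circ$-measurable. Define
(18.5) $\tilde P^x(\tilde\Lambda) := E^x[\alpha_\omega(\tilde\Lambda_\omega)]$ ($x \in E_\partial$).
Then $\tilde P^x$ is a probability measure on $(\tilde\Omega, \tilde{\mathcal{F}}^\circ)$ for each $x$ in $E_\partial$. Recall that we
use $b\mathcal{E}_0$ to denote the space of bounded $\mathcal{E}$-measurable functions $f$ on $E_\partial$ such
that $f(\partial) = 0$.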
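(18.6) THEOREM. $\tilde X = (\tilde\Omega, \tilde{\mathcal{F}}^\circ, \tilde{\mathcal{F}}^\circ_t, \tilde X_t, \tilde\theta_t, \tilde P^x)$ is a Markov process on $E$ with
transition semigroup $\{\tilde P_t\}$, where
(18.7) $\tilde P_t f(x) := E^x[M_t f(X_t)]$ ($f \in b\mathcal{E}_0$, $x \in E$).
Note. You will appreciate that the notation $(\tilde\Omega, \tilde{\mathcal{F}}^\circ, \{\tilde{\mathcal{F}}^\circ_t\}, \{\tilde X_t\}, \ldots)$ would be
just too unwieldy.
Clarification of Theorem 18.6. Since we have not previously met a sextuple like
$\tilde X$ except in the case of canonical FD processes, a word of clarification is
necessary. We are not going to spell out all the axiomatics. (See Blumenthal and
Getoor [1].) It is clear that, for example, statements (18.3) and (18.4) form part
of the definition of the statement that the sextuple $\tilde X$ forms a Markov process.
But only one property really matters:
(18.8) $\tilde E^x[f(\tilde X_{s+t}) \mid \tilde{\mathcal{F}}^\circ_s] = \tilde P_t f(\tilde X_s)$ ($f \in b\mathcal{E}_0$).
Proof of (18.8). Let $\tilde\Lambda \in \tilde{\mathcal{F}}^\circ_s$, so that $\tilde\Lambda \cap \tilde\Omega_s = \Lambda \times (s, \infty]$ for some $\Lambda$ in $\mathcal{F}^\circ_s$.
Then, since $f(\partial) = 0$, you can check that
$\tilde E^x[f(\tilde X_{s+t}); \tilde\Lambda] = E^x[f(X_{s+t}) M_{s+t}; \Lambda]$
$= E^x[(M_s I_\Lambda)\,(M_t f(X_t)) \circ \theta_s]$
$= E^x[(M_s I_\Lambda)\,\tilde P_t f(X_s)] = \tilde E^x[\tilde P_t f(\tilde X_s); \tilde\Lambda]$. □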
(18.9) Illustrative examples. The Feynman-Kac formula will allow us to calculate
the infinitesimal generator of $\{\tilde P_t\}$ in important cases.
(18.10) Example. We shall see that (modulo certain technical qualifications) if
$X$ is an FD diffusion with 'differential' generator $\mathcal{L}$ satisfying
(18.11) $\mathcal{L}f(x) = \tfrac12 \sum_i \sum_j a_{ij}(x) D_i D_j f(x) + \sum_i b_i(x) D_i f(x)$,
and
(18.12) $A(t) := \int_0^t c(X_s)\,ds$ ($c$ non-negative)
then $\tilde X$ is a 'locally' FD diffusion with differential generator
(18.13) $\tilde{\mathcal{L}}f(x) = \mathcal{L}f(x) - c(x)f(x)$.
The significance of the operation of killing is that, in connection with the big
problems mentioned in Section 13, we can restrict attention to the case $c(x) = 0$,
$\forall x$.
(18.14) Killing at constant rate. Perhaps the most frequently used application
of killing occurs when we take $c$ to be a constant function: $c(x) = \lambda$ ($\ge 0$), $\forall x$.
($X$ now denotes an arbitrary FD process.) The killing operation in this case
simply involves constructing a variable $\xi$ that, under each $P^x$, is independent
of $X$ and has the exponential distribution of rate $\lambda$. ($\xi = \infty$ if $\lambda = 0$.) We then put
$\tilde X_t := X_t$ ($t < \xi$), $\tilde X_t := \partial$ ($t \ge \xi$).
Note that our convention that $A$ be constant on $[\zeta, \infty]$ leads us to take
$A_t = \lambda(t \wedge \zeta)$ instead of $A_t = \lambda t$, but it does not matter: 'After the first death,
there is no other.' We have
$\tilde P_t = P^\lambda_t := e^{-\lambda t} P_t$, $\tilde{\mathcal{G}} = \mathcal{G} - \lambda$.
The Feynman-Kac formula (Section 19) localizes this idea.
(18.15) Exercise. Let $G$ be an open set, and let $F := E_\partial \setminus G$. Define
$X^G_t := X_t$ ($t < H_F$), $X^G_t := \partial$ ($t \ge H_F$).
Thus $X^G$ is the process $X$ killed on first leaving $G$. Formulate and prove the
result that $X^G$ is Markov, noting especially how the terminal time property
(18.16) $H_F - t = H_F \circ \theta_t$ on $\{H_F > t\}$
is used in your proof. Explain how $X^G$ corresponds to the $\tilde X$ process obtained
by killing $X$ in accordance with the right-continuous multiplicative functional
$M$, where
$M_t := 1$ ($t < H_F$), $M_t := 0$ ($t \ge H_F$).
See III.3.7 of Blumenthal and Getoor [1].
19. The Feynman-Kac formula. We shall examine three approaches to the
Feynman-Kac formula: one analytical, one via Markov-process theory and
(inevitably) one via martingale theory. Inevitably too, the martingale approach
is the one that best corresponds to our intuition.
Analytical approach. We first consider a simple situation with hypotheses that
are much too strong for certain applications. Let $X$ be an FD process with
transition semigroup $\{P_t\}$ acting on $C_0$ and with generator $\mathcal{G}$. Let $v$ be a bounded
continuous non-negative function on $E$ and set
$A_t := \int_0^t v(X_s)\,ds$, $M_t := e^{-A(t)}$.
We prove that (note that our notation is consistent with our previous notation
if $v$ is a constant function)
(19.1) $P^v_t f(x) := E^x[M_t f(X_t)]$,
defines an FD semigroup $\{P^v_t\}$ with generator $\mathcal{G}^v$ satisfying
(19.2) $\mathcal{D}(\mathcal{G}^v) = \mathcal{D}(\mathcal{G})$, $\mathcal{G}^v f(x) = \mathcal{G}f(x) - v(x)f(x)$.
In fact, since $f \mapsto vf$ is a bounded operator (which we denote by $v$), it is a
standard piece of semigroup theory that $\mathcal{G}^v$, considered as defined by (19.2),
generates an FD semigroup $\{P^v_t\}$ with
(19.3) $P^v_t f = \lim_{n \uparrow\uparrow \infty} (e^{-tv/n} P_{t/n})^n f$ (in $C_0$),
where, of course,
$(e^{-tv/n} f)(x) := e^{-tv(x)/n} f(x)$.
(19.4) Exercise. Convince yourself that (19.3) is the analytical counterpart to
(19.1). Do not worry about rigour, because we are now going to prove something
much better.
Markov-process approach. Let us reformulate (19.2) in terms of resolvents:
(19.5) $R^v_\lambda = R_\lambda - R_\lambda v R^v_\lambda$.
We can prove (19.5) directly and under wide conditions. It is important that
we can drop the continuity requirement on $v$. So let us assume that $v$ is
non-negative and $\mathcal{E}$-measurable and that $R_\lambda v(x) < \infty$, $\forall x$. As for $X$, we do not need
anything as strong as the FD property (though you can assume it for now).
The only hypothesis really needed on $X$ is that $(t, \omega) \mapsto X(t, \omega)$ be measurable
relative to the respective $\sigma$-algebras $\mathcal{B}[0, \infty) \times \mathcal{F}$ and $\mathcal{E}$.
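We use the property
$A_{s+u} - A_s = A_u \circ \theta_s$
and the simple Markov property of $X$ to calculate, for $f \in b\mathcal{E}_0$,
$R_\lambda f(x) - R^v_\lambda f(x) = E^x \int_0^\infty e^{-\lambda t - A(t)} f(X_t)(e^{A(t)} - 1)\,dt$
$= E^x \int_0^\infty dt\, e^{-\lambda t - A(t)} f(X_t) \int_0^t ds\, v(X_s) e^{A(s)}$
$= E^x \int_0^\infty ds\, e^{-\lambda s} v(X_s) \int_0^\infty du\, \exp(-\lambda u - A_u \circ \theta_s) f(X_u \circ \theta_s)$
$= E^x \int_0^\infty e^{-\lambda s} v(X_s) R^v_\lambda f(X_s)\,ds = R_\lambda v R^v_\lambda f(x)$.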
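The resolvent identity can be checked by hand in the simplest setting. The sketch below takes a two-state Markov chain with Q-matrix $Q$, for which $R_\lambda = (\lambda - Q)^{-1}$ and, the killed semigroup being $\exp\{t(Q - V)\}$ with $V = \operatorname{diag}(v)$, $R^v_\lambda = (\lambda + V - Q)^{-1}$. The particular $Q$, $v$ and $\lambda$ are hypothetical values chosen for illustration; the finite-chain setting is an assumption of the example, not of the text.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Q = [[-2.0, 2.0], [3.0, -3.0]]     # hypothetical Q-matrix
v = [0.5, 4.0]                     # hypothetical non-negative killing rates
lam = 1.5

# R = (lam - Q)^{-1} and Rv = (lam + V - Q)^{-1}
R = inv2([[lam - Q[0][0], -Q[0][1]], [-Q[1][0], lam - Q[1][1]]])
Rv = inv2([[lam + v[0] - Q[0][0], -Q[0][1]], [-Q[1][0], lam + v[1] - Q[1][1]]])
V = [[v[0], 0.0], [0.0, v[1]]]

lhs = [[R[i][j] - Rv[i][j] for j in range(2)] for i in range(2)]
rhs_19_5 = matmul(matmul(R, V), Rv)    # R_lam v R^v_lam
rhs_19_7 = matmul(matmul(Rv, V), R)    # R^v_lam v R_lam
print(lhs, rhs_19_5, rhs_19_7)
```

Both products agree with $R_\lambda - R^v_\lambda$, which in matrix terms is just the identity $R_\lambda[(\lambda + V - Q) - (\lambda - Q)]R^v_\lambda = R_\lambda V R^v_\lambda$ and its mirror image.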
Jo
(19.6) Exercise. Prove the easier, but less useful, result
(19.7) $R_\lambda f(x) - R^v_\lambda f(x) = R^v_\lambda v R_\lambda f(x)$.
Martingale approach. Let us now give the 'right' treatment based on stochastic
integral theory which is justified later. Though we now use the language of
infinitesimal generators, the present approach is just as general as the Markov-
process approach. The idea is to use the fact that, for $f \in \mathcal{D}(\mathcal{G})$,
$C^f_t := f(X_t) - \int_0^t \mathcal{G}f(X_s)\,ds$
defines a martingale. Then
$d[e^{-A(t)} f(X_t)] = e^{-A(t)}[\mathcal{G}f(X_t) - v(X_t) f(X_t)]\,dt + e^{-A(t)}\,dC^f_t$.
(If you do not know that stochastic calculus obeys different rules from Newton's,
you will not be surprised by this calculation.) Thus
(19.8) $M_t f(X_t) - f(X_0) = \int_0^t M_s \mathcal{G}^v f(X_s)\,ds + N_t$,
where
$\mathcal{G}^v f(x) := \mathcal{G}f(x) - v(x)f(x)$, $N_t := \int_0^t e^{-A(s)}\,dC^f_s$.
(Continuity of $v$ is not required in order for this to make sense.) The key fact
(proved in Volume 2) is that $N$, as the stochastic integral of a bounded continuous
adapted process relative to a martingale in $L^2$, is itself a martingale in $L^2$ (for
each $P^x$). Taking $P^x$ expectations in (19.8),
$P^v_t f(x) - f(x) = \int_0^t P^v_s \mathcal{G}^v f(x)\,ds$.
This is essentially (19.7).
Exercise. How do you obtain (19.5) by the martingale method?
Note. By the Feynman-Kac formula, we mean any or all of (19.2), (19.5) and
(19.7).
20. A Ciesielski-Taylor Theorem. The following strange result was discovered
by Ciesielski and Taylor [1] by explicit calculation of the distributions involved.
Only in the case when $n = 1$ is a simple non-computational explanation known
(Williams [7]; see Section 9; but see also Getoor and Sharpe [2] and Biane [1].)
(20.1) THEOREM. The hitting time of the sphere $\{|x| = 1\}$ by a $\mathrm{BM}^0(\mathbb{R}^n)$
process has the same distribution as the total time spent in the ball $\{|x| \le 1\}$ by a
$\mathrm{BM}^0(\mathbb{R}^{n+2})$ process.
Let us now use the Feynman-Kac formula to obtain the distribution of the
time spent in $\{|x| \le 1\}$ by CBM($\mathbb{R}^3$). The general case of CBM($\mathbb{R}^n$) ($n \ge 3$)
may be studied in exactly the same way, but we spare you (explicit use of)
Bessel functions. Use of the Feynman-Kac formula is simpler than the Kac
'method of moments' technique employed by Ciesielski and Taylor.
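So let $X$ be CBM($\mathbb{R}^3$). Let $B$ be the unit ball in $\mathbb{R}^3$ and let
$\varphi_t := \operatorname{meas}\{s \le t : X_s \in B\} = \int_0^t I_B(X_s)\,ds$.
Fix $\alpha > 0$ for the moment and put
$A_t := \alpha\varphi_t = \int_0^t v(X_s)\,ds$, $v := \alpha I_B$.
Introduce
$R^v_\lambda 1(x) := E^x \int_0^\infty e^{-\lambda t} e^{-A(t)}\,dt$.
Put
$A(\infty) := \uparrow\lim A(t) = \alpha\varphi(\infty)$.
Then
$h(x) := E^x[e^{-A(\infty)}] = \lim_{\lambda \downarrow\downarrow 0} \lambda R^v_\lambda 1(x)$.
By (19.5),
$R_\lambda 1 - R^v_\lambda 1 = R_\lambda v R^v_\lambda 1$,
so that, since $\lambda R_\lambda 1 = 1$,
$1 - \lambda R^v_\lambda 1 = \lambda R_\lambda v R^v_\lambda 1$.
On letting $\lambda \downarrow\downarrow 0$ (justification is completely trivial)
(20.2) $1 - h = R_0 v h = \alpha \int_B g(x, y) h(y)\,dy$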
where $g$ is the free-space Green function for $\tfrac12\Delta$. See Section I.22. Spitzer [1]
gives a neat treatment of (20.2), which we now follow. It is clear from the
definition of $h$ that $h$ is spherically symmetric: $h(x) = f(|x|)$ for some $f$. The
right-hand side of (20.2) is therefore the gravitational potential due to a
spherically symmetric mass distribution. Gauss found the way to deal with such
potentials. First, the potential outside a spherical shell due to a symmetric mass
distribution on the shell is the same as if the whole mass of the shell were
concentrated at its centre. Secondly, the potential inside a spherical shell due
to a mass distribution† on the shell is constant, and therefore equal to the value
at the shell's centre. We may therefore calculate, for $0 < |x| < 1$,
$1 - h(x) = \alpha(2\pi|x|)^{-1} \int_{\{|y| < |x|\}} h(y)\,dy + \alpha \int_{\{|x| < |y| < 1\}} (2\pi|y|)^{-1} h(y)\,dy$.
Thus, with $u(r) = r f(r)$, where $h(x) = f(|x|)$, we have, for $0 < r < 1$,
$r - u(r) = 2\alpha \int_0^r \rho\,u(\rho)\,d\rho + 2\alpha r \int_r^1 u(\rho)\,d\rho$,
whence $u'' = 2\alpha u$ on $(0, 1)$. We now easily obtain
(20.3) $E^0 e^{-\alpha\varphi(\infty)} = \operatorname{sech}\delta$, $\delta := (2\alpha)^{1/2}$.
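The step from the integral equation for $u$ to (20.3) can be filled in as follows. This is a sketch of the omitted calculus; the boundary conditions $u(0) = 0$ and $u'(1) = 1$ are not stated in the text and are read off here from the integral equation and its first derivative.

```latex
\begin{align*}
r - u(r) &= 2\alpha\int_0^r \rho\,u(\rho)\,d\rho + 2\alpha r\int_r^1 u(\rho)\,d\rho, \\
1 - u'(r) &= 2\alpha r\,u(r) + 2\alpha\int_r^1 u(\rho)\,d\rho - 2\alpha r\,u(r)
           = 2\alpha\int_r^1 u(\rho)\,d\rho, \\
-u''(r) &= -2\alpha\,u(r), \qquad\text{that is,}\qquad u'' = \delta^2 u .
\end{align*}
% Boundary data: u(0) = 0 (put r = 0 in the first line), u'(1) = 1 (put r = 1 in the second).
\[
u(r) = \frac{\sinh \delta r}{\delta\cosh\delta}, \qquad
f(r) = \frac{u(r)}{r} \longrightarrow \frac{1}{\cosh\delta} = \operatorname{sech}\delta
\quad (r \downarrow 0).
\]
```

Since $h(0) = f(0) = E^0 e^{-\alpha\varphi(\infty)}$, the limit on the right is exactly the value claimed in (20.3).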
Of course, the problem is really 1-dimensional, and we could have transformed
the Gauss results into statements about BES (3); but it would not have been
such fun.
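(20.4) Exercise. Let $X$ be CBM($\mathbb{R}$). Let
$T := \inf\{t : |X_t| = 1\} = H_1 \wedge H_{-1}$.
Let $\alpha > 0$ and $\delta := (2\alpha)^{1/2}$. Explain why
$E^0[\exp(-\alpha H_1)] = E^0[\exp(-\alpha H_1); H_1 < H_{-1}] + E^0[\exp(-\alpha H_{-1}); H_{-1} < H_1]\,e^{-2\delta}$,
and deduce that
$E^0 e^{-\alpha T} = \operatorname{sech}\delta$.
Comparison of this result with (20.3) clinches the CT theorem in the case $n = 1$.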
One of the classic applications of the Feynman-Kac formulae is to the arcsine
law (see Ito and McKean [1]). However, we prove the arcsine law by using
local-time theory, both in Section 24 and, by a better method, in Section VI.63
of Volume 2.
†Still assumed symmetric.
21. Time-substitution. Suppose that $A$ is a PCHAF of $X$ and that a null set
has been weeded out so that properties (16.3)(i)-(iii) hold for all $\omega$. Set
$\tau_t(\omega) := \tau(t, \omega) := \inf\{s : A_s > t\}$,
so that $\tau$ is the right-continuous function inverse to $A$. Since
$\{\tau_s < t\} = \{A_t > s\}$
and $\{\mathcal{F}_t\}$ is right-continuous, for each $s$, $\tau_s$ is an $\{\mathcal{F}_t\}$ stopping time. Set
$\tilde X_t := X_{\tau(t)}$, $\tilde{\mathcal{F}}_t := \mathcal{F}_{\tau(t)}$ and $\tilde\theta_t := \theta_{\tau(t)}$. Then $\tilde X$
is a strong Markov process on $(E, \mathcal{E}^*)$. The point is that if $\tilde T$ is an $\{\tilde{\mathcal{F}}_t\}$ stopping
time then $T := \tau(\tilde T)$ is an $\{\mathcal{F}_t\}$ stopping time and $\tilde{\mathcal{F}}(\tilde T) = \mathcal{F}(T) := \mathcal{F}_T$; further,
$\tau(\tilde T + t) = T + \tau_t \circ \theta_T$. Thus $\tilde X$ inherits the strong Markov property from $X$. As
an exercise, write out all the details of the present argument (see Section X.5
of Dynkin [2]). This result is due to Volkonskii.
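On a time grid the right-continuous inverse is easy to compute. The following Python sketch is a discretised stand-in for $\tau_t = \inf\{s : A_s > t\}$: the functional is assumed to be sampled at the times $k\,dt$, and the returned value is the first grid time at which $A$ exceeds $t$, which overshoots the true infimum by at most one mesh width. The function names and the sample data are hypothetical.

```python
import bisect

def right_continuous_inverse(A, dt, t):
    """Grid version of tau_t = inf{s : A_s > t}.

    A is a non-decreasing list with A[0] = 0, sampled at times k*dt. Returns
    the first grid time at which A exceeds t, or None when A never exceeds t
    on the sampled horizon.
    """
    k = bisect.bisect_right(A, t)      # first index k with A[k] > t
    return None if k == len(A) else k * dt

def time_change(path, A, dt, t):
    """Value of the time-changed path X~_t = X(tau_t) on the grid, or None."""
    k = bisect.bisect_right(A, t)
    return None if k == len(A) else path[k]

# A is flat on two stretches, so tau jumps over them.
A = [0.0, 0.0, 1.0, 1.0, 2.0, 3.0]
path = ['a', 'b', 'c', 'd', 'e', 'f']
print([right_continuous_inverse(A, 1.0, t) for t in (0.0, 0.5, 1.0, 2.5)])
```

The flat stretches of $A$ are exactly the times that the time-changed process never sees: in the sample data the states `'b'` and `'d'` are skipped.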
If $A$ is strictly increasing, in which case $\tau$ is continuous, then it is clear that
$X$ and $\tilde X$ have the same hitting distributions:
(21.1) for every compact $K$, $\forall x$, $\forall B \in \mathcal{B}(K)$,
$P^x[\tilde X \circ \tilde H_K \in B] = P^x[X \circ H_K \in B]$.
Indeed $\tilde X \circ \tilde H_K = X \circ H_K$. Of course, $\tilde H_K := \inf\{t : \tilde X_t \in K\}$.
The converse theorem is very deep. (See Theorem V.5.1 of Blumenthal and
Getoor [1].)
(21.2) THEOREM (Blumenthal-Getoor-McKean). If $Y$ is a 'standard' process
with the same hitting distributions as $X$ then, for some strictly increasing PCHAF
$A$ of $X$, $Y$ has the same laws as the $\tilde X$ process associated with $A$.
For the definition of 'standard process', see Blumenthal and Getoor [1]. (FD
processes are certainly standard.) Incidentally, the fact that if $X$ is standard and
$A$ is strictly increasing then $\tilde X$ is standard emphasises the need to axiomatise
processes by probabilistic axioms (as is done for standard processes and, better,
right processes) instead of axiomatising via analytical properties like the FD
property.
(21.3) The generator of $\tilde X$. The preceding sentence stands as a piece of pure
mathematics. However, when we wish to talk about the generator of $\tilde X$, we
should consider for example how the FD property behaves under
time-substitution.
(21.4) Volkonskii's formula. Suppose that $X$ is an FD process and that
$A(t) = \int_0^t v(X_s)\,ds$,
where $v$ is a positive continuous function on $E$ bounded away from 0. Then,
for some $K$, $\tau_t \le Kt$, $\forall t$. For $f \in \mathcal{D}(\mathcal{G})$,
$C^f_t := f(X_t) - \int_0^t \mathcal{G}f(X_s)\,ds$ is an R-martingale
relative to $(\Omega, \{\mathcal{F}_t\}, P^\mu)$ for every $\mu$. See (10.13). If we consider a fixed interval
$[0, a]$ for $t$ then $\tau_t \in [0, Ka]$, and we can apply the Optional Stopping Theorem
II.77.5 to $C^f(t \wedge Ka)$. In this way, we deduce that
(21.5) $\tilde C^f(t) = f(\tilde X_t) - \int_0^t v(\tilde X_s)^{-1} \mathcal{G}f(\tilde X_s)\,ds$
($t$ running over the whole half-line $[0, \infty)$) is a martingale relative to each
$(\Omega, \{\tilde{\mathcal{F}}_t\}, P^\mu)$. For $f \in \mathcal{D}(\mathcal{G})$, write
(21.6)(i) $Gf(x) := v(x)^{-1} \mathcal{G}f(x)$
provided $f \in \mathcal{D}(\mathcal{G})$; then $Gf \in C_0$. Taking $P^x$ expectations of the martingale $\tilde C^f(t)$,
we find that
$\tilde P_t f(x) - f(x) = \int_0^t \tilde P_s Gf(x)\,ds$.
If we assume that (in the cases we study) the transition semigroup $\{\tilde P_t\}$ of $\tilde X$ is
an FD semigroup with generator $\tilde{\mathcal{G}}$ (say), it is now clear that
(21.6)(ii) $\tilde{\mathcal{G}} \supseteq G$.
The results (21.6) comprise Volkonskii's formula. For Volkonskii's proof via
Dynkin's formula, see X.10.24 in Dynkin [2].
Of course, if $v$ is also bounded away from $\infty$, we can reverse the roles of $X$
and $\tilde X$ to show that $\tilde{\mathcal{G}} = G$.
(21.7) Finite Markov chains. The time change of a finite Markov chain with
Q-matrix $Q$ by the additive functional $A_t = \int_0^t v(X_s)\,ds$ is, according to (21.6)(i),
the Markov chain with Q-matrix $\tilde Q$, where
$\tilde q_{ij} := v(i)^{-1} q_{ij}$.
In terms of the jump-hold description of the chain (Section 2), the effect of the
time change is easy to specify; when the time-changed chain visits $i$, it resides
there for an exponential amount of time with mean $v(i)/q(i)$, compared with the
mean $1/q(i)$ for the original chain.
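A short Python sketch makes the jump-hold statement concrete. It rescales row $i$ of a Q-matrix by $1/v(i)$, as (21.6)(i) prescribes, and compares mean holding times and jump probabilities before and after; the three-state $Q$ and the function $v$ are hypothetical values chosen so that the arithmetic is exact.

```python
def time_changed_q(Q, v):
    """Return Q~ with entries q~(i, j) = q(i, j) / v(i)."""
    return [[q_ij / v[i] for q_ij in row] for i, row in enumerate(Q)]

def mean_holding_times(Q):
    """Mean holding time 1/q(i) in each state, where q(i) = -q(i, i)."""
    return [1.0 / -row[i] for i, row in enumerate(Q)]

def jump_probabilities(Q):
    """Jump-chain probabilities q(i, j)/q(i) for j != i (diagonal set to 0)."""
    return [[0.0 if i == j else q_ij / -row[i] for j, q_ij in enumerate(row)]
            for i, row in enumerate(Q)]

Q = [[-2.0, 1.0, 1.0],
     [0.5, -1.0, 0.5],
     [3.0, 1.0, -4.0]]      # hypothetical Q-matrix
v = [2.0, 0.5, 4.0]         # hypothetical positive rate function

Q_tilde = time_changed_q(Q, v)
print(Q_tilde)
print(mean_holding_times(Q), mean_holding_times(Q_tilde))
```

The holding means change from $1/q(i)$ to $v(i)/q(i)$, while the jump probabilities are untouched: the time change alters how long the chain waits, not where it goes next.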
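22. Reflecting Brownian motion. Let $X$ be CBM($\mathbb{R}$). Let
$A_t := \int_0^t I_{[0,\infty)}(X_s)\,ds = \operatorname{meas}\{s \le t : X_s \ge 0\}$.
Let $\tau$ be the right-continuous inverse of $A$ and let $\tilde X_t := X \circ \tau_t$. It is intuitively
obvious that $\tilde X_t$ must be reflecting Brownian motion on $[0, \infty)$. Assume for
now (see Note below) that $\tilde X$ is an FD process on $[0, \infty)$. Dynkin's
characteristic-operator formula for the generator $\tilde{\mathcal{G}}$ of $\tilde X$ shows that
$\tilde{\mathcal{G}}f(0) = \lim_{\varepsilon \downarrow\downarrow 0} \dfrac{f(\varepsilon) - f(0)}{E^0[A(H_\varepsilon)]}$.
Now, by the formula (I.13.7) for the transition-density function of Brownian
motion killed at $\varepsilon$,
$E^0[A(H_\varepsilon)] = \int_{t=0}^\infty \int_{y=0}^{\varepsilon} [p(t; 0, \varepsilon - y) - p(t; 0, \varepsilon + y)]\,dy\,dt$,
where $p$ is the transition-density function of CBM($\mathbb{R}$). But, for $\lambda > 0$ and with
$\gamma := (2\lambda)^{1/2}$, formula (3.13) gives
$\int_0^\infty e^{-\lambda t}[p(t; 0, \varepsilon - y) - p(t; 0, \varepsilon + y)]\,dt = 2\gamma^{-1} e^{-\gamma\varepsilon} \sinh \gamma y$.
On letting $\lambda \downarrow\downarrow 0$, we obtain
$\int_0^\infty [p(t; 0, \varepsilon - y) - p(t; 0, \varepsilon + y)]\,dt = 2y$,
so that
$E^0[A(H_\varepsilon)] = \varepsilon^2$.
Hence, for $f$ in $C_0[0, \infty)$, we have $f \in \mathcal{D}(\tilde{\mathcal{G}})$ if and only if the formulae
(22.1)(i) $\tilde{\mathcal{G}}f(x) = \tfrac12 f''(x)$ ($x > 0$)
(22.1)(ii) $\tilde{\mathcal{G}}f(0) = \lim_{\varepsilon \downarrow\downarrow 0} \varepsilon^{-2}[f(\varepsilon) - f(0)]$
make sense and define a function $\tilde{\mathcal{G}}f$ in $C_0[0, \infty)$. Note that $f$ in $\mathcal{D}(\tilde{\mathcal{G}})$ satisfies
$f'_+(0) := \lim_{\varepsilon \downarrow\downarrow 0} \varepsilon^{-1}[f(\varepsilon) - f(0)] = 0$,
and observe how the formula for $\tilde{\mathcal{G}}$ checks with l'Hopital's rule.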
We saw in Section 1.41.1 that reflecting Brownian motion $|X|$ is Markovian.
From the explicit formula (1.14) for the transition-density function for $|X|$, it is
clear that $|X|$ is an FD process. That $|X|$ has generator $\tilde{\mathcal{G}}$ as in (22.1) will be
immediate once we show that
$E^0[H_\varepsilon \wedge H_{-\varepsilon}] = \varepsilon^2$.
This follows from the more precise result that for $\lambda > 0$ and with $\gamma := (2\lambda)^{1/2}$ and
$T := H_\varepsilon \wedge H_{-\varepsilon}$, we have
$E^0[\exp(-\lambda T)] = \operatorname{sech}\gamma\varepsilon$.
See Exercise 20.4. The proof that $\tilde X$ and $|X|$ have the same generator $\tilde{\mathcal{G}}$, and
hence the same transition function, is complete.
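For the record, here is how the mean exit time is read off from the Laplace transform. This is a sketch of the routine expansion, with the passage to the limit justified by monotone convergence, since $\lambda^{-1}(1 - e^{-\lambda T})$ increases to $T$ as $\lambda$ decreases to 0.

```latex
\[
E^0[e^{-\lambda T}] = \operatorname{sech}\gamma\varepsilon
  = 1 - \tfrac12\gamma^2\varepsilon^2 + O(\gamma^4\varepsilon^4)
  = 1 - \lambda\varepsilon^2 + O(\lambda^2) \qquad (\lambda \downarrow 0),
\]
\[
E^0[T] = \lim_{\lambda \downarrow 0} \lambda^{-1}\bigl(1 - E^0[e^{-\lambda T}]\bigr) = \varepsilon^2 .
\]
```

This matches the value $E^0[A(H_\varepsilon)] = \varepsilon^2$ found for the time-changed process, which is why the two generators agree at 0.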
Important note on preservation of FD property. Deciding whether or not the FD
property is preserved under probabilistic operations such as time-substitution
is generally a very difficult problem. In the 1-dimensional case, special arguments
apply that allow easy settlement of the matter, so that the FD property for
simple transformations of CBM (R) is usually taken for granted in the literature.
Let us work through the details for this example. So let $X$ be CBM($\mathbb{R}$), and
define
$A_t := \operatorname{meas}\{s \le t : X_s \ge 0\}$, $\tau_t := \inf\{u : A_u > t\}$, $\tilde X_t := X(\tau_t)$.
Our intuitive idea is this: if $\tilde X$ starts at a point $x$ of $[0, \infty)$ then, with high
probability, $\tilde X$ will very soon hit $y$.† We use this idea to show that $\tilde X$ has FD
resolvent, whence, by the Hille-Yosida Theorem, $\tilde X$ has FD transition function
$\{\tilde P_t\}$. (Warning: if you try to express the same intuitive idea directly in terms
of the transition function, you will encounter some very awkward technical
problems.)
For $y \in [0, \infty)$, put $\tilde H_y := \inf\{t : \tilde X_t = y\}$. By applying the strong Markov
property of $\tilde X$ at time $\tilde H_y$ (which we know amounts to applying the strong
Markov property of $X$ at time $\tau(\tilde H_y)$) we find that (with obvious notation) for
$f \in C_0[0, \infty)$, $\lambda > 0$, and $x, y \in [0, \infty)$,
$\tilde R_\lambda f(x) = E^x \int_0^{\tilde H_y} e^{-\lambda t} f(\tilde X_t)\,dt + E^x[\exp(-\lambda \tilde H_y)]\,\tilde R_\lambda f(y)$.
(You recognise that this is just Dynkin's formula for $\tilde X$.) Hence
(22.2) $|\tilde R_\lambda f(x) - \tilde R_\lambda f(y)| \le \|f\|\, E^x \int_0^{\tilde H_y} e^{-\lambda t}\,dt + |\tilde R_\lambda f(y)|\, E^x[1 - \exp(-\lambda \tilde H_y)]$
$\le (\lambda^{-1}\|f\| + \|\tilde R_\lambda f\|)\, E^x[1 - \exp(-\lambda \tilde H_y)]$
$\le 2\lambda^{-1}\|f\|\, E^x[1 - \exp(-\lambda \tilde H_y)]$.
That $\tilde R_\lambda f$ is continuous will therefore follow once we show that
(22.3) $E^x[\exp(-\lambda \tilde H_y)] \to 1$ as $x \to y$.
But $\tilde H_y = A(H_y) \le H_y$, a.s., so (22.3) follows from the corresponding property
for $X$. It is trivial to show that, for $f \in C_0[0, \infty)$, $\tilde R_\lambda f(x) \to 0$ as $x \to \infty$. Hence
$\tilde R_\lambda : C_0[0, \infty) \to C_0[0, \infty)$.
By Lemma 6.7, you will see that it is enough now to show that, for $f \in C_0[0, \infty)$,
(22.5) $\tilde P_t f(x) \to f(x)$ ($t \downarrow\downarrow 0$).
(It will then follow that as $\lambda \uparrow\uparrow \infty$, $\lambda \tilde R_\lambda f \to f$ not only pointwise but also in the
supremum norm.) However, since $\tilde X$ is right-continuous and
$\tilde P_t f(x) = E^x[f(\tilde X_t)]$,
the result (22.5) is obvious.
†Here, $y$ is a point of $[0, \infty)$ very close to $x$.
The same argument covers the case of $\tilde X^\lambda$ and the case of elastic Brownian
motion in Section 24. The Feller-McKean example (now to be discussed) would
require a little more thought, but whether or not it is FD (in fact, it is!) is
irrelevant.
23. The Feller-McKean chain. Again let $X$ be CBM($\mathbb{R}$). We assume Trotter's
Theorem I.5.9. Let $l(t, x)$ be a jointly continuous local time for $X$ and set
$A_t := \int_{\mathbb{R}} l(t, x)\,m(dx) = \sum_q l(t, q)\,m\{q\}$,
where $m$ is a probability measure concentrated on the rationals with $m\{q\} > 0$,
$\forall q \in \mathbb{Q}$. Then $A$ is a strictly increasing PCHAF of $X$. (We suppress 'almost surely'
qualifying phrases here.) Let $\tilde X$ be the corresponding time-transformation of $X$.
Then $\tilde X$ is a continuous process that spends almost all its time in $\mathbb{Q}$ and can
be regarded as a Markov chain with 'minimal' state-space $\mathbb{Q}$. The generator of
$\tilde X$ (considered as an FD process on $\mathbb{R}$) is a highly singular second-order 'elliptic'
operator
$\dfrac12 \dfrac{d}{dm} \dfrac{d}{dx}$.
Ito and McKean [1] contains the definitive account of such operators. Breiman
[1] and Freedman [1] contain good easy introductions.
The Feller-McKean chain $\tilde X$ is historically important as the first chain with
all states instantaneous:
$q_i := -q_{ii} := \lim_{t \downarrow\downarrow 0} t^{-1}[1 - p_{ii}(t)] = \infty$, $\forall i$,
where
$p_{ij}(t) := P^i[\tilde X_t = j]$.
(Some basic chain theory is recalled later in this chapter.) Since $\tilde X$ moves
continuously and therefore does not jump, we have
$q_{ij} := \lim_{t \downarrow\downarrow 0} t^{-1} p_{ij}(t) = 0$ ($\forall i, j \in \mathbb{Q}$, $i \ne j$).
Thus the Q-matrix $Q$ of $\tilde X$ satisfies
$Q = \begin{pmatrix} -\infty & 0 & 0 & \cdots \\ 0 & -\infty & 0 & \cdots \\ 0 & 0 & -\infty & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}$.
In Theorem 55.1, we describe all possible totally instantaneous Q-matrices.
24. Elastic Brownian motion; the arcsine law. Let X continue to denote CBM (R)
and let A continue to denote time spent by X above 0. Let ξ be an exponentially
distributed variable of rate λ > 0 independent of the X process. Let
x_,Xt (t<&
χλ·=
In other words, we have the 'killing at constant rate' situation of (18.14). Set
Jo
and let τχ be the right-continuous inverse to Ax. Let Ух be the generator of
Χχ:=Χχοτχ. Then
(24.1) (i) & f(x) = $f"(x) - kf(x) (x > 0)
(24.1)(ii) ^ 7(0) = lim s-2le-y*f(s) - /(0)], y:= (2Я)1'2
εϊϊΟ
because
Е°[Л(Я£)л£|~б2 (εϋΟ),
Р°[Я£ < ξ\ = Е°[ехр( - λΗ^ = е-".
Note the elastic boundary condition
(24.1)(iii) /+(0) = y/(0)
satisfied by elements / of 2){<gx). Again note how the formula for <SX checks
with l'Hopital's rule.
Now let us explain why this example is interesting. The lifetime of Χλ is Α(ξ).
If Χ(ξ) > 0 then Xх dies at position Χ(ξ). The exciting thing is that if Χ(ξ) < 0,
which happens for example with P° probability |, then Xх dies at 0. In other
words, Xх must be obtained by killing X at rate λ while X is away from 0, but
in a way depending on the local time at 0 when X is at 0. Indeed, if lt = /t(i)
denotes the local time at 0 for X then
(24.2) ΡχΙΑ(ξ)>ί\Χ]=εχρ(-λί-±γΙ% Vx^O.
This formula, taken from Williams [6], is one way of introducing local time
from global considerations! We prove (24.2) below. Note that it may be
reformulated in a way that obviates the need for introducing ξ:
(24.3) $E^x[\exp(-\lambda \tau_t) \mid \hat X] = \exp(-\lambda t - \tfrac12 \gamma\, l_t(\hat X))$, $\forall x \ge 0$.
(24.4) Arcsine law. Lévy's arcsine law
$P^0[A(u) \le t] = \frac{2}{\pi} \arcsin \sqrt{t/u}$ $(t \le u)$
is an immediate consequence of (24.2). Take x = 0 in (24.2) and then take $P^0$
expectations to get
$\int_0^\infty \lambda e^{-\lambda u} P^0[A(u) > t]\,du = e^{-\lambda t} E^0[\exp(-\tfrac12 \gamma \hat l_t)]$.
All that remains is to identify the law of $\tfrac12 \hat l_t$. In fact, $\tfrac12 \hat l_t$ has the same law as
$S_t := \sup_{u \le t} X_u$; you may consider this to be completely obvious in view of the
construction of local time from upcrossings given in Section I.14. On the other
hand, it may take Exercise 24.5 below to convince you. However, given the
equality in law of $\tfrac12 \hat l_t$ and $S_t$, the arcsine law now follows by consulting Laplace
transform tables, or, better, by calculating a few integrals.
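As a numerical sanity check of the 'few integrals', the sketch below compares the two sides of the Laplace-transform identity obtained from (24.2). It assumes the arcsine law on the left, and on the right it assumes that the local-time term has the law of the running maximum, which by the reflection principle is the law of |N(0,t)|; under that assumption the right-hand side reduces to erfc(√(λt)). The function names and the choice of Simpson's rule are illustrative only.

```python
import math

def arcsine_side(lam, t, n=4000):
    """Integral over u >= t of lam*exp(-lam*u)*P[A(u) > t] du, using the arcsine law
    P[A(u) > t] = (2/pi)*arccos(sqrt(t/u)).  The substitution u = t + w*w turns
    arccos(sqrt(t/u)) into atan(w/sqrt(t)) and removes the square-root kink at u = t."""
    w_max = math.sqrt(50.0 / lam)          # exp(-50) tail is negligible
    h = w_max / n                          # n must be even for Simpson's rule

    def g(w):
        u = t + w * w
        tail_prob = (2.0 / math.pi) * math.atan(w / math.sqrt(t))
        return lam * math.exp(-lam * u) * tail_prob * 2.0 * w

    total = g(0.0) + g(w_max)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return total * h / 3.0

def maximum_side(lam, t):
    """exp(-lam*t) * E[exp(-gamma*|N(0,t)|)] with gamma = sqrt(2*lam).
    Since E[exp(-a|Z|)] = 2*exp(a*a/2)*Phi(-a) for standard normal Z,
    this simplifies to erfc(sqrt(lam*t))."""
    return math.erfc(math.sqrt(lam * t))

pairs = [(1.0, 0.5), (2.0, 1.5), (0.3, 2.0)]   # arbitrary (lambda, t) test values
gaps = [abs(arcsine_side(lam, t) - maximum_side(lam, t)) for lam, t in pairs]
```

Agreement of the two sides for several values of λ and t is consistent with, though of course not a proof of, the arcsine law.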
(24.5) Exercise
(i) Let $U_+(t,\varepsilon)$ be the number of upcrossings of $[0,\varepsilon]$ by X before time t, and
let $U_-(t,\varepsilon)$ be the number of downcrossings of $[-\varepsilon,0]$ by X before time t.
By the strong Markov property of X, argue that, conditional on $U(t,\varepsilon) :=
U_+(t,\varepsilon) + U_-(t,\varepsilon) = n$, $U_+(t,\varepsilon)$ has a $B(n,\tfrac12)$ distribution.
(ii) Observing that $U(t,\varepsilon)$ is the number of upcrossings of $[0,\varepsilon]$ by |X| before
time t, show, as in Lévy's Theorem I.14.7, that
$\lim_{n \to \infty} 2^{-n} U_+(t, 2^{-n}) = \tfrac12 l_t$.
(iii) From the characterisation in Section 22 of reflecting Brownian motion, we
have $\hat X(t) = X(\tau_t) \overset{d}{=} |X_t|$. Thus
$\lim 2^{-n} U_+(\tau_t, 2^{-n}) = \tfrac12 \hat l_t \overset{d}{=} \lim 2^{-n} U(t, 2^{-n}) = l_t$.
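(24.6) Elastic Brownian motion. Let X still denote Brownian motion and $l_t$ the
local time at 0 for X. Fix γ > 0. Let $\tilde X$ denote |X| killed at rate $\gamma\,dl_t$, so that
$\tilde X$ has transition semigroup $\{\tilde P_t\}$ with
$\tilde P_t f(x) = E^x[\exp(-\gamma l_t) f(|X_t|)] = E^x[\exp(-\tfrac12 \gamma \hat l_t) f(\hat X_t)]$.
Obviously $\tilde{\mathcal{G}} f = \tfrac12 f''$ away from 0. The situation at 0 is interesting:
$\tilde{\mathcal{G}} f(0) = \lim_{\varepsilon \downarrow 0} \dfrac{E^0[\exp(-\gamma l(\bar H_\varepsilon))]\, f(\varepsilon) - f(0)}{E^0[\bar H_\varepsilon \wedge \zeta]}$,
where $\bar H_\varepsilon$ and $H_\varepsilon$ are of course the hitting times of ε by |X| and X respectively.
During the course of our proof of Lévy's Downcrossing Theorem (Section
I.14), we showed that the $P^0$ distribution of $l(\bar H_\varepsilon)$ is exactly exponential with
mean ε, so that
$E^0[\exp(-\gamma l(\bar H_\varepsilon))] = (1 + \gamma\varepsilon)^{-1}$.
Further (why?),
$E^0[\bar H_\varepsilon \wedge \zeta] \sim \varepsilon^2$ $(\varepsilon \downarrow 0)$.
Hence, for $f \in \mathcal{D}(\tilde{\mathcal{G}})$,
$f'(0) = \gamma f(0)$, $\tilde{\mathcal{G}} f(0) = \tfrac12 f''(0+)$.
The formula (24.2), which involves an extra killing at constant rate λ, is
now obvious.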
4. APPROACH TO RAY PROCESSES:
THE MARTIN BOUNDARY
25. Ray processes and Markov chains. We now move on from the familiar FD
semigroups and processes to Ray semigroups and processes. Quite rightly, you
first want certain questions answered:
(25.1)(i) Are there important examples of Ray processes that are not FD?
(25.1)(ii) Does the theory of Ray processes provide new information on FD
processes?
(25.1)(iii) Do we have to move on again from Ray processes to still more general
objects?
The answers must be (i) Yes, (ii) Yes, (iii) No, or we would not have asked the
questions. You will of course realise in regard to (iii) that 'No, never' must yield
to 'Hardly ever' if the point is pressed.
For Ray processes, both the analysis and the probability theory are much
richer than for the FD situation; and, for the pure mathematicians among
you, this may be justification enough for studying the Ray theory. However,
motivation never harmed anyone (least of all, pure mathematicians), and we
propose to answer questions (i), (ii) and (iii) in some detail before we develop
the theory.
In 1966, Chung made some shrewd and prophetic comments in his book
[1] on Markov chains:
The second edition of this book appears at a time when boundary theory
(envisaged in this book as a study in depth of the behaviour of sample
functions in relation to the 'infinities') has just begun to take shape. This
vital theme, already announced in the preface to the first edition, will no
doubt be the most challenging part of the theory to come. I have chosen
not to enter into it in detail in the belief that such a development needs
more time to mature. In this regard, it may be a timely observation that
the theory of Markov processes in general-state-space, which flourished in
recent years and has built up a powerful machinery, has had to date little
impact on the denumerable [chain] case. This is because the prevailing
assumptions would allow the sample paths of chains virtually no other
discontinuities than jumps—a situation which would make a trite object
of a chain. On the other hand, the special theory of Markov chains has
yet to adapt its methodology to a broader context suitable for the general
state-space. Thus there exists at the moment a state of mutual detachment
which surely must not be allowed to continue. Future progress in the field
looks to a meaningful fusion of these two aspects of the Markovian
phenomenon.
The meaningful fusion is achieved in the theory of Ray processes. The benefits
to chain theory are enormous. In the other direction, last-exit theory provides
one of many examples where the general theory has benefited from adapting
the methodology of chain theory. The prophetic character of Chung's comments
will be seen throughout this book.
Anyone familiar with Chung's philosophy (one with which we are in full
agreement) will know the stress to be laid on sample functions in the quotation.
It is on the behaviour of sample functions rather than on that of (for example)
excessive functions that we, as probabilists, must concentrate. It is therefore
satisfying that the general theory will allow us to treat chains in a manner that
greatly clarifies the probabilistic significance of $q_i$, $q_{ij}$, $p_{ij}(t)$ etc. and suppresses
much of the analysis on which many previous treatments have relied. To be
sure, Ray's theorem makes very heavy use of analysis in its early stages. The
point is that, once it gets underway, the probability theory is more or less
self-sufficient; and, by then, it trounces analysis at its own game.
In connection with question (25.1)(i), it is necessary to understand Chung's
statement that the then-prevailing assumptions covered only 'trite' chains. This
relates to the following fact (discussed briefly and illustrated in a moment, and
explained fully later in this chapter). If a transition matrix function $\{p_{ij}(t)\}$ on
a countable set I has the FD property relative to the discrete topology of I, then
the associated chain X is totally stable and Feller minimal. Though nearly all
chains that can serve as models for real-world phenomena are totally stable
and Feller minimal, such chains are 'trite' from a pure-mathematical standpoint.
(We shall attempt to provide a strong justification for the study of 'non-trite'
chains later!)
The statement that X is totally stable means of course that every state i in I
is stable:
$q_i := \lim_{t \downarrow 0} t^{-1}[1 - p_{ii}(t)] < \infty$, $\forall i$.
You can see why this has to be. Since X is FD, X is right-continuous in the
discrete topology of I, so that if X starts in state i then X must stay at i
throughout a time interval. The well-known fact that this time interval is
exponentially distributed with rate $q_i$ is proved (along with the existence of $q_i$)
for general chains in Section 82. The Feller-minimal property refers to the fact
that the behaviour of X is completely determined by Q, in that if X explodes
then X dies at its explosion time. In the next section we clarify these points in
regard to a special example.
(25.2) The Ray-Knight topology. It is hard to believe now that Ray's tremendous
paper [1] was published as long ago as 1959. Ray's choice of axioms for what
is now called a Ray process was astonishingly perceptive. There was
unfortunately an error in Ray's attempt to show that every 'acceptable' Markov process
(and, in particular, every chain) could be made to accord with these axioms
by introducing a suitable topology on, and compactification of, the state space.
This extraordinary claim is however essentially true.
The gap in Ray's paper was corrected (for different particular situations and
in different ways) by several people including Ray himself. It was Knight [1]
who got things just right, and, especially after the appearance of the 1967 Kunita
and T. Watanabe paper [1], the Ray-Knight compactification was firmly
established. In particular it was known that all chains are Ray processes, which
gives a strong 'Yes' to question (25.1)(i).
You can see the problem involved if we consider the Feller-McKean example
(Section 23). The Feller-McKean process X is a continuous process on R that
can be regarded as a chain on Q. Now suppose that someone relabels Q as, say,
N and presents us with the Feller-McKean transition matrix function $\{p_{ij}(t)\}$.
Could we recognize from $\{p_{ij}(t)\}$ that we should 'unscramble' the situation by
imbedding N as Q in R? Yes; the Ray-Knight compactification will do the
unscrambling for us.
26. Important example: birth process. The example we now discuss is extremely
simple, but it serves well to illustrate points in the theory of Ray processes and
(later) in the classification of stopping times and in the theory of jumps of
martingales. We are sure that you know enough elementary chain theory (from
Volume 1 of Feller [1], which discusses this example) to follow this account
without difficulty. We are only concerned with intuitive understanding, and
skip some details of rigour (along with some 'a.s.' phrases).
Let I = {1,2,3,...} and let Q be the I × I matrix
$Q = \begin{pmatrix} -q_1 & q_1 & 0 & 0 & \cdots \\ 0 & -q_2 & q_2 & 0 & \cdots \\ 0 & 0 & -q_3 & q_3 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{pmatrix}$, where $0 < q_i < \infty$, $\forall i$.
Let X be a right-continuous chain with Q-matrix Q, let
$H_n := \inf\{t : X_t = n\}$
and let $\eta := \lim_n H_n \le \infty$ be the first explosion time of X. Up to time η, the paths
of X are non-decreasing. Under the $P^i$ law, the variables
$H_{i+1} - H_i,\ H_{i+2} - H_{i+1},\ \ldots$
are independent and are exponentially distributed with rates $q_i, q_{i+1}, \ldots$ respectively. Thus
(26.1) $E^i[e^{-\lambda\eta}] = \prod_{k \ge i} (1 + \lambda q_k^{-1})^{-1}$ $(\lambda > 0)$,
so that (as we see by letting $\lambda \downarrow 0$)
if $\sum q_k^{-1} = \infty$ then η = ∞, a.s.,
if $\sum q_k^{-1} < \infty$ then η < ∞, a.s.
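The dichotomy can be made concrete by evaluating partial products of (26.1) for two hypothetical rate sequences (neither is taken from the text): quadratic rates, for which the sum of reciprocals converges, and linear rates, for which it diverges. For the quadratic case the classical product formula for sinh gives a closed form to compare against. This is a sketch with the chain started at state 1.

```python
import math

def laplace_of_explosion_time(lam, rates):
    """Partial product of (26.1): product over the given rates of (1 + lam/q_k)^(-1).
    Logs are summed to avoid underflow in long products."""
    log_product = 0.0
    for q in rates:
        log_product -= math.log1p(lam / q)
    return math.exp(log_product)

N = 200_000                                   # truncation level (assumption)
lam = 1.0

# Hypothetical example 1: q_k = k^2, sum of 1/q_k is finite, so the chain explodes.
quadratic_rates = (float(k * k) for k in range(1, N + 1))
explosive_value = laplace_of_explosion_time(lam, quadratic_rates)
# Closed form from sinh(pi*x)/(pi*x) = prod_k (1 + x^2/k^2), with x = sqrt(lam).
x = math.sqrt(lam)
closed_form = math.pi * x / math.sinh(math.pi * x)

# Hypothetical example 2: q_k = k, sum of 1/q_k diverges, so eta is infinite a.s.
linear_rates = (float(k) for k in range(1, N + 1))
non_explosive_value = laplace_of_explosion_time(lam, linear_rates)
# With lam = 1 the partial product telescopes to 1/(N + 1), which tends to 0.
```

A strictly positive limit of the product means the explosion time is finite with positive probability; a limit of 0 for every λ > 0 means it is almost surely infinite.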
In the case when η = oo, a.s., there is nothing more to say: X is the unique
chain with Q-matrix Q and X is FD because PJXO ^min^j/i/) and because
(6.6)(iv)* is automatic.
We now devote our attention to the case when η < ∞, a.s. If X is FD (for
the discrete topology on I) then, by Theorem 11.1, X is quasi-left-continuous
on [0, ∞), so that
$X(\eta) = \lim_n X(H_n) = \infty = \partial$;
by the coffin condition, $X(t) = \partial = \infty$, $\forall t \ge \eta$. Thus there is only one FD chain,
namely the 'Feller-minimal' chain killed at time η.
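(26.2) Exercise. Show that the Feller-minimal chain $X^{\min}$ has resolvent
$r^{\min}_{ij}(\lambda) := \int_0^\infty e^{-\lambda t} p^{\min}_{ij}(t)\,dt$ $(\lambda > 0)$,
satisfying
$r^{\min}_{ij}(\lambda) = 0$ $(j < i)$,
$r^{\min}_{ij}(\lambda) = (\lambda + q_j)^{-1} \prod_{i \le k < j} (1 + \lambda q_k^{-1})^{-1}$ $(j > i)$,
$r^{\min}_{jj}(\lambda) = (\lambda + q_j)^{-1}$. □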
When η < ∞ (a.s.), there are other (non-FD) chains X with Q-matrix Q. We
now write E := {1,2,3,...; ∞} for the one-point compactification of I and ∂ for
a point isolated from E. Let μ be a probability measure on $I \cup \{\partial\}$ with $\mu\{\partial\} < 1$.
For each such μ, we can construct a chain X with Q-matrix Q by Doob's
immediate-return procedure: we choose $X(\eta) \in I \cup \{\partial\}$ with distribution μ; if
$X(\eta) = \partial$ then $X(t) = \partial$, $\forall t \ge \eta$; if $X(\eta) \in I$ then we run X according to the 'old'
rules until the next explosion time $\eta_2$ (say); we choose $X(\eta_2)$ with distribution
μ; etc., etc.
For this chain X, quasi-left-continuity breaks down and takes the modified
form
(26.3) $P^i[X(\lim H_n) = j \mid \bigvee \mathcal{F}(H_n)] = \mu_j := \mu(\{j\})$ $(j \in I \cup \partial)$,
where $\bigvee \mathcal{F}(H_n)$ is the smallest σ-algebra containing every $\mathcal{F}(H_n)$. We say that
∞ is a branch-point with branching measure μ and write
$P_0(\infty, \{j\}) = \mu_j$.
We note that X never visits the branch-point ∞. (It approaches ∞ but branches
at the last moment.) The only sensible interpretation of the $P^\infty$ law is as
$P^\infty = \sum_{k \in I \cup \partial} \mu_k P^k$.
Note that now we generally have $\mathcal{F}(\eta) \ne \bigvee \mathcal{F}(H_n)$ (in contrast to the 'FD' situation
in (11.6)).
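(26.4) Exercise. For $k \in I \cup \partial$, put $\chi_k(\lambda) := E^k[e^{-\lambda\eta}]$, so that $\chi_\partial(\lambda) = 0$. Show that,
for $i, j \in I$,
$r_{ij}(\lambda) = r^{\min}_{ij}(\lambda) + \chi_i(\lambda) \sum_{k \in I} \mu_k r_{kj}(\lambda)$.
(Substitute the first equation into itself.)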
(26.5) Warning. More complex ways of return from infinity are possible even
for this example.
(26.6) Note. We must mention one further point in connection with this
example. Consider the immediate-return process with $\mu_1 = 1$, so that X returns
to 1 after each explosion. Then X will have the FD property (and so will be
quasi-left-continuous) if I is topologised as the compact metric space with 1 as
the unique limit-point of the sequence 2,3,4,... Check this by using (26.4) and
the HY theorem. The Ray-Knight topology will automatically make the point
1 an accumulation-point of I, as it should be.
In short, the Ray-Knight compactification will detect Feller properties when
these are present. (More is true. It can happen that when a process X on (say)
a compact metric space is constructed by complicated probabilistic methods,
we are unable to prove directly that X has the FD property, but can prove the
equivalent result that X is a Ray process without branch-points.) But the value
of the Ray-Knight compactification is of course that it always works.
27. Excessive functions, the Martin kernel and Choquet theory. Let us do more
than answer 'Yes' to question (25.1)(ii) by telling you that Ray theory yields a
much simpler and more intuitive account of Martin (-Doob-Hunt) boundary
theory, even in the case of discrete-parameter chains! To be honest, this most
elementary case of Martin boundary theory is much the most interesting: it has
many delightful and important applications. It will be helpful to run through
some of the basic analysis for this case now. For one thing, it will help us
understand the original Martin boundary of R. S. Martin. Later in this chapter,
we shall derive both the analysis and the more interesting probability theory
for the chain case from Ray's Theorem.
The 'Ray' treatment will be independent of Choquet theory, but we now
explain how Choquet's famous theorem on integral representation of elements
of simplexes shows that a Martin representation of harmonic (more generally,
excessive) functions must hold. Meyer [2] and Phelps [1] have fine accounts
of Choquet theory. Choquet's Theorem has been useful in establishing the
existence and/or uniqueness of integral representations in many areas of
probability theory. Inevitably, its very generality prevents its being useful in
pinning down the explicit form of extremal elements.
Let I be a countable set and let Π be a substochastic I × I matrix. Define
the Green kernel Γ of Π as the I × I matrix with
$\Gamma(i,j) := \sum_{n=0}^\infty \Pi^n(i,j) \le \infty$, $\forall i, j$,
so that, formally, $\Gamma = (I - \Pi)^{-1}$. Compare (I.22.1).
The probabilistic interpretation is obvious. Let $X = (X_n : n = 0,1,2,\ldots)$ be a
Markov chain on I (with coffin state ∂ adjoined) with 1-step transition matrix
Π, so that, for $n \in \mathbb{N}$ and $i_0, i_1, i_2, \ldots, i_n \in I$,
$P^{i_0}[X_1 = i_1; \ldots; X_n = i_n] = \Pi(i_0, i_1)\Pi(i_1, i_2) \cdots \Pi(i_{n-1}, i_n)$.
(The Daniell-Kolmogorov Theorem immediately gives an appropriate X.) Then
$\Gamma(i,j) = E^i[\text{time} (\ge 0) \text{ spent by } X \text{ in } j]$.
(27A) ASSUMPTION. There exists a reference point b in I such that
$0 < \Gamma(b,j) < \infty$, $\forall j \in I$.
This assumption is made throughout the remainder of Section 27. It says:
(i) state b can feed into j (ultimately) ($\forall j \ne b$);
(ii) every state is transient.
The easily-established strong Markov property of X shows that
(27.1) $\Gamma(i,j) = P^i[D_j < \infty]\,\Gamma(j,j) \le \Gamma(j,j)$, $\forall i, j$,
where
$D_j := \inf\{n \ge 0 : X_n = j\}$.
Now define the Martin kernel κ on I × I as
(27.2) $\kappa(i,j) := \Gamma(i,j)/\Gamma(b,j)$.
It follows from (27A) and (27.1) that
(27.3) $\kappa(i,j) \le \kappa(j,j) < \infty$, $\forall i, j$.
It is another easy consequence of the strong Markov property that
(27.4) $\kappa(i,j) \le \kappa(i,i) < \infty$, $\forall i, j$.
Exercise. Prove (27.4) probabilistically. Can you give a neat algebraic proof?
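For a finite state space the Green and Martin kernels can be computed exactly, which makes the inequalities (27.1), (27.3) and (27.4) easy to inspect. The following sketch uses a hypothetical 3-state substochastic matrix (not from the text) whose row sums are all less than 1, so that every state is transient and Assumption 27A holds with b = 0. The helper `green_kernel` is an illustrative name; it inverts I − Π by Gauss-Jordan elimination in rational arithmetic.

```python
from fractions import Fraction as Fr

def green_kernel(P):
    """Return G = (I - P)^(-1) by exact Gauss-Jordan elimination on [I - P | I]."""
    n = len(P)
    A = [[Fr(int(i == j)) - P[i][j] for j in range(n)] +
         [Fr(int(i == j)) for j in range(n)] for i in range(n)]
    for c in range(n):
        pivot_row = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[pivot_row] = A[pivot_row], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]

# Hypothetical substochastic matrix on I = {0, 1, 2}; the missing mass in each
# row is the probability of jumping to the coffin state.
P = [[Fr(0),    Fr(1, 2), Fr(1, 4)],
     [Fr(1, 3), Fr(0),    Fr(1, 3)],
     [Fr(1, 4), Fr(1, 4), Fr(1, 4)]]
n = len(P)
G = green_kernel(P)

b = 0                                                   # reference point
K = [[G[i][j] / G[b][j] for j in range(n)] for i in range(n)]   # Martin kernel (27.2)

# G = I + P G is the defining series identity written as a fixed-point equation.
series_identity_ok = all(
    G[i][j] == Fr(int(i == j)) + sum(P[i][k] * G[k][j] for k in range(n))
    for i in range(n) for j in range(n))
```

The ratio G[i][j] / G[j][j] is then the probability of ever hitting j from i, as in (27.1).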
A function f from I to R is called excessive (respectively, regular) (for Π) if
(i) $0 \le f < \infty$;
(ii) $\Pi f \le f$ (respectively $\Pi f = f$).
The set of excessive functions forms a cone C in $\mathbb{R}^I$. For the topology of C, we
take that of $\mathbb{R}^I$, that is, the topology of pointwise convergence.
Because of Assumption 27A,
$\theta(j) := \sup_n \Pi^n(b,j) > 0$,
and since a function f in C satisfies
$f \ge \Pi^n f$ $(\forall n)$,
we have
(27.5) $f(b) \ge \theta(j) f(j)$, $\forall j$.
In particular, every f in C may be written as
$f = f(b) f^*$, $f^* \in S := \{f \in C : f(b) = 1\}$.
The study of the cone C thus reduces to the study of its section S.
(27.6) PROPOSITION. The set S is a compact convex metrisable subset of the
locally convex linear topological space $\mathbb{R}^I$.
This proposition, which is an immediate consequence of (27.5) and Fatou's
Lemma, states exactly that S satisfies the hypothesis of the metrisable case of
Choquet's Theorem on the existence of integral representations. Recall that
an element f of S is called extremal if the equation
$f = \tfrac12 f_1 + \tfrac12 f_2$ $(f_1, f_2 \in S)$
implies that $f = f_1 = f_2$.
For our special situation, Choquet's Existence Theorem takes the following
form. See Meyer [2] and Phelps [1].
(27.7) THEOREM. The set $S_e$ of extremal elements of S is a $G_\delta$ in S. If $f \in S$
then there exists a probability measure ν on $\mathcal{B}(S_e)$ such that
(27.8) $f(i) = \int_{S_e} \xi(i)\,\nu(d\xi)$, $\forall i$.
(Note that the map $\xi \mapsto \xi(i)$ is continuous on S.)
We wish to add the following theorem.
(27.9) THEOREM. Further, ν is uniquely determined by f.
Choquet's Uniqueness Theorem states that Theorem 27.9 is equivalent to the
following lemma.
(27.10) LEMMA. The cone C is a lattice in its intrinsic order.
Note. The intrinsic order ≪ on C is defined as follows: for $x, z \in C$, we write $x \ll z$
if $\exists u \in C$ with $x + u = z$.
How to prove Lemma 27.10 will be explained in a moment. (As an exercise—
not quite as easy as it may look!—try proving it now.)
The key technique for studying С is provided by the Riesz Decomposition
Theorem 27.14. Let μ be a (non-negative) measure on I such that
$\Gamma\mu(i) := \sum_j \Gamma(i,j)\mu(j) < \infty$, $\forall i$.
Then Γμ is called the potential (due to the charge μ). Since
(27.11) $\Pi\Gamma\mu = \Gamma\mu - \mu \le \Gamma\mu$,
the function Γμ is excessive. Note that the equation
(27.12) $\mu = \Gamma\mu - \Pi\Gamma\mu$
determines μ from Γμ, and that
(27.13) $\Pi^n\Gamma\mu = \sum_{k \ge n} \Pi^k\mu \downarrow 0$ $(n \uparrow \infty)$.
(27.14) THEOREM (Riesz Decomposition Theorem). If f is excessive then f
has a unique decomposition
(27.15) $f = v + \Gamma\mu$,
where v is regular and μ is a measure on I. Indeed,
(27.16) $v = \lim_n \Pi^n f$,
(27.17) $\mu = f - \Pi f$.
Proof. Define μ by (27.17). Then $\mu(i) \ge 0$, $\forall i$, and
$(I + \Pi + \cdots + \Pi^n)\mu = f - \Pi^{n+1} f$.
The Monotone-Convergence Theorem yields (27.15) with v as in (27.16).
Properties (27.12) and/or (27.13) make the uniqueness assertion obvious. □
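The telescoping identity at the heart of the proof can be verified exactly on a small example. The sketch below reuses a hypothetical 3-state substochastic matrix (an assumption for illustration) and takes f to be the constant function 1, which is excessive for any substochastic Π. Because all row sums here are strictly less than 1, the iterates of Π applied to f tend to 0, so in this finite example the regular part vanishes and f is a pure potential.

```python
from fractions import Fraction as Fr

def apply(P, f):
    """Matrix-vector product (P f)(i) = sum_j P(i, j) f(j)."""
    return [sum(P[i][j] * f[j] for j in range(len(f))) for i in range(len(f))]

# Hypothetical substochastic matrix on I = {0, 1, 2}.
P = [[Fr(0),    Fr(1, 2), Fr(1, 4)],
     [Fr(1, 3), Fr(0),    Fr(1, 3)],
     [Fr(1, 4), Fr(1, 4), Fr(1, 4)]]

f = [Fr(1), Fr(1), Fr(1)]                     # constant 1 is excessive
Pf = apply(P, f)
mu = [fi - pfi for fi, pfi in zip(f, Pf)]     # charge (27.17): mu = f - Pi f

# Check (I + Pi + ... + Pi^n) mu = f - Pi^(n+1) f for n = 0, ..., 24.
partial_sum = [Fr(0)] * len(f)
term = mu[:]                                  # holds Pi^n mu
power_f = Pf[:]                               # holds Pi^(n+1) f
identity_holds = True
for _ in range(25):
    partial_sum = [a + b for a, b in zip(partial_sum, term)]
    rhs = [fi - pf for fi, pf in zip(f, power_f)]
    identity_holds = identity_holds and (partial_sum == rhs)
    term = apply(P, term)
    power_f = apply(P, power_f)

# After the loop, power_f holds Pi^26 f, an approximation to the regular part v.
regular_part_bound = max(power_f)
```

On an infinite state space the limit in (27.16) need not vanish, and that is where the regular part of the decomposition carries information.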
(27.18) Exercise. Now prove Lemma 27.10 (if you did not do so earlier) by
showing that if
$f_1 = v_1 + \Gamma\mu_1$, $f_2 = v_2 + \Gamma\mu_2$
then the lattice structure of C in its intrinsic order is exhibited by the equations
$f_1 \mathbin{\wedge\!\!\wedge} f_2 = \lim_n \Pi^n(v_1 \wedge v_2) + \Gamma(\mu_1 \wedge \mu_2)$,
$f_1 \mathbin{\vee\!\!\vee} f_2 = f_1 + f_2 - f_1 \mathbin{\wedge\!\!\wedge} f_2$,
where
$(v_1 \wedge v_2)(i) := v_1(i) \wedge v_2(i)$, $(\mu_1 \wedge \mu_2)(i) := \mu_1(i) \wedge \mu_2(i)$.
In this connection, we must mention Feller's historic paper [2], the impressive
first attempt to define an appropriate 'boundary' for C by lattice methods.
Hints for exercise. First prove that if $v \le \Gamma\mu$ $(< \infty)$ then $v = 0$. Next deal
separately with the cases (i) $v_1 = v_2 = 0$ and (ii) $\mu_1 = \mu_2 = 0$.
As an immediate consequence of the Riesz decomposition, we have the
following proposition.
(27.19) PROPOSITION. For each j in I, the function $\kappa(\cdot, j)$ is a (non-regular)
extremal element of S. Every extremal element of S that is not of the form $\kappa(\cdot, j)$
for some j in I is regular.
28. The Martin compactification. (We continue to assume 27A.) Since potential
determines charge, the map
$\varphi : I \to S \subset \mathbb{R}^I$, $\varphi(j) := \kappa(\cdot, j)$
is one-one. We now identify I with φ(I) and let F be the compact closure of
I (= φ(I)) in S. The set F is called the Martin compactification of I, though, in
this context, the theory is due to Doob and Hunt.
Since the topology of F is inherited from that of $\mathbb{R}^I$,
(28.1) for each i, the map $\kappa(i, \cdot)$ extends continuously to F; then $\kappa : I \times F \to \mathbb{R}$, and,
for $\xi \in F \setminus I$, we have the alternative notations: $\kappa(i, \xi) = \xi(i)$.
(28.2) THEOREM (Doob, Hunt). Every extremal element of S is of the form $\kappa(\cdot, \xi)$
for some ξ in F. The following result therefore holds. Let $F_e$ be the set of ξ in F
for which $\kappa(\cdot, \xi)$ is extremal. Then each f in S can be written uniquely as
$f = \int_{F_e} \kappa(\cdot, \xi)\,\nu(d\xi) = v + \Gamma\mu$,
where ν is a probability measure on $\mathcal{B}(F_e)$,
$v := \int_{F_e \setminus I} \kappa(\cdot, \xi)\,\nu(d\xi)$ is regular,
and
$\nu(\{j\}) = \Gamma(b,j)\mu(j)$ $(j \in I)$.
Once we establish the first sentence of Theorem 28.2, the remainder of the
theorem follows from the Choquet results (27.7) and (27.9); and we then have
$S_e = F_e \subset F$. (We are presently assuming the Choquet results, but recall that we
later, in Section 44, give a full independent proof of Theorems 27.7 and 27.9.)
We now argue that it is enough to prove that
(28.3) every element f of S may be written as
$f = \int_F \kappa(\cdot, \xi)\,\nu(d\xi)$
for some (not necessarily unique) probability measure ν on $\mathcal{B}(F)$.
First argument. From (28.3), it follows that S is the closed convex hull of F. By
a standard result (much more elementary than Choquet theory)—see Theorem
V.8.5 of Dunford and Schwartz [1]—the extremal elements of S are contained
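in F. □
Second argument. First, we clarify notation by (temporarily) writing α for a
typical element of $S_e$ (not yet known to belong to F) and ξ for a typical element
of F.
Let β be an element of $S_e$. Its unique Choquet representing measure on $\mathcal{B}(S_e)$
is of course the unit mass at β, denoted by $\varepsilon_\beta(\cdot)$. However, by (28.3), there is at
least one probability measure $\nu_\beta$ on $\mathcal{B}(F)$ such that
$\beta(i) = \int_{\xi \in F} \kappa(i,\xi)\,\nu_\beta(d\xi) = \int_{\xi \in F} \nu_\beta(d\xi) \int_{\alpha \in S_e} \alpha(i)\,\mu_\xi(d\alpha)$, $\forall i$,
where $\mu_\xi$ on $\mathcal{B}(S_e)$ is the Choquet representing measure for $\kappa(\cdot, \xi)$. Hence
$\varepsilon_\beta(\cdot) = \int_{\xi \in F} \nu_\beta(d\xi)\,\mu_\xi(\cdot)$ on $\mathcal{B}(S_e)$,
and so, for $(\nu_\beta)$-almost-all ξ in F, $\mu_\xi = \varepsilon_\beta$. But, for any ξ in F for which $\mu_\xi = \varepsilon_\beta$,
we have
$\kappa(i, \xi) = \int_{\alpha \in S_e} \alpha(i)\,\mu_\xi(d\alpha) = \beta(i)$, $\forall i$,
so that $\beta = \xi \in F$. □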
We have now reduced the problem of proving Theorem 28.2 to that of proving
the statement (28.3).
Proof of (28.3). Fix f in S. Choose a measure β such that
$0 < \Gamma\beta(i) < \infty$, $\forall i$.
(By (27.3), it is enough to choose β so that $\beta(j) > 0$, $\forall j$, and $\sum_j \Gamma(j,j)\beta(j) < \infty$.) Let
$f_n(i) := \min(f(i), n\Gamma\beta(i))$.
Then $f_n$ is excessive, and since $f_n$ is dominated by the potential $n\Gamma\beta$, it follows
from (27.13) and the Riesz theorem that $f_n$ is a potential:
(28.4) $f_n(i) = \sum_j \Gamma(i,j)\mu_n(j) = \sum_j \kappa(i,j)\nu_n(j)$,
where $\nu_n(j) = \Gamma(b,j)\mu_n(j)$. Since $f_n(b) = f(b) = 1$ for large n, and $\kappa(b,j) = 1$, $\forall j$,
it follows that (for large n) $\nu_n$ is a probability measure on F with $\nu_n(I) = 1$. Since
F is compact metrisable, Pr(F) is compact metrisable in the weak topology.
Let ν be a subsequential limit of $(\nu_n)$ in Pr(F). Then (28.3) follows from (28.4)
and (28.1). □
The following analytical problem remains: how can we determine the 'extremal
part' $F_e$ of F?
A striking probabilistic solution is provided as one part of the Doob-Hunt
probabilistic theory of the Martin boundary for this case. (See Section 29.)
(28.5) Example. Let X be a simple random walk on $\mathbb{Z}^d$ such that
$\Pi(i,j) = (2d)^{-1}$ if $|j - i| = 1$, and $\Pi(i,j) = 0$ otherwise.
We now prove that every regular function (that is, every solution of $\Pi f = f \ge 0$)
is constant.
Case 1: d = 1 or 2. In this case, it is well known that X is recurrent. If f is
regular then (under each $P^i$) $f(X_n)$ is a non-negative martingale, so that $\lim f(X_n)$
exists. Since X visits every point of $\mathbb{Z}^d$ infinitely often, the only possible
explanation is that f is constant on $\mathbb{Z}^d$. This proof is (of course) due to Doob.
Case 2: d ≥ 3. In this case, X is transient and Assumption 27A holds with b = 0
(say). It is well known (see Spitzer [1]) that
$\Gamma(i,j) \sim \text{constant}\,|j - i|^{2-d}$ $(|j - i| \to \infty)$,
as one might expect by analogy with the Brownian-motion results. Since
$\kappa(i,j) \sim |j - i|^{2-d} / |j|^{2-d}$,
it is clear that F is the one-point compactification $I \cup \{\infty\}$ of I and that
$\kappa(i, \infty) = 1$, $\forall i$. The desired result follows. It is worth mentioning that no
particularly simple proof is known. See Spitzer [1] for a specialisation of the
Martin-boundary argument to this case and for an Itô-McKean proof.
(28.6) Example. We now consider 'space-time coin-tossing'. Think of $X_n$ as
$(H_n, n)$, where $H_n$ represents the number of heads in n tosses. We put
$I := \{(m,n) \in \mathbb{Z}^2 : 0 \le m \le n\}$,
$\Pi((m,n);(m+1,n+1)) = 1 - \Pi((m,n);(m,n+1)) = \tfrac12$.
Then, for (m, n) and (r, s) in I,
$\Gamma((m,n);(r,s)) = \binom{s-n}{r-m} 2^{-(s-n)}$ if $0 \le r - m \le s - n$, and $\Gamma((m,n);(r,s)) = 0$ otherwise.
Taking b = (0,0), we find from Stirling's formula that if $s \to \infty$ and $r/s \to t \in [0,1]$
then
$\kappa((m,n);(r,s)) \to h_t(m,n) := 2^n t^m (1-t)^{n-m}$.
It is now clear (why?) that the Martin topology can be regarded as identifying
(m,n) in I with $(1+n)^{-1}(m,n) \in \mathbb{R}^2$, with $F \setminus I = [0,1] \times \{1\}$ and
$h_t = \kappa(\cdot, \xi)$ $(\xi = (t,1) \in F \setminus I)$.
Thus f is a regular element of S if and only if there exists a probability measure
ν on $\mathcal{B}[0,1]$ such that
(28.7) $f(m,n) = \int_0^1 2^n t^m (1-t)^{n-m}\,\nu(dt)$.
This result yields an immediate solution of the Hausdorff moment problem.
(See Spitzer [1].)
The Weierstrass Approximation Theorem makes it obvious that ν is uniquely
determined by f in (28.7). Hence, for every $t \in [0,1]$, $h_t$ is extremal in S.
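The Stirling limit in Example 28.6 can be observed directly with exact binomial coefficients. The sketch below assumes that the Green kernel is the probability of passing through (r,s) from (m,n) under fair coin-tossing, namely a binomial coefficient times a power of 1/2; the particular values of t, s and the starting state are arbitrary illustrative choices. It also checks that the limit function is regular for Π.

```python
import math
from fractions import Fraction as Fr

def green(m, n, r, s):
    """Gamma((m,n);(r,s)): probability that the space-time walk started at (m,n)
    passes through (r,s), i.e. exactly r-m heads in s-n fair tosses."""
    if 0 <= r - m <= s - n:
        return Fr(math.comb(s - n, r - m), 2 ** (s - n))
    return Fr(0)

def martin(m, n, r, s):
    """Martin kernel with reference point b = (0, 0)."""
    return green(m, n, r, s) / green(0, 0, r, s)

def h(t, m, n):
    """Candidate limit h_t(m,n) = 2^n t^m (1-t)^(n-m)."""
    return 2 ** n * t ** m * (1 - t) ** (n - m)

t = Fr(3, 10)          # arbitrary limiting heads-frequency
s = 4000               # large time horizon
r = int(t * s)         # r/s equals t exactly here (r = 1200)

approx = float(martin(2, 5, r, s))
limit = float(h(t, 2, 5))

# Regularity of h_t: (1/2) h(m+1, n+1) + (1/2) h(m, n+1) = h(m, n).
h_is_regular = all(
    Fr(1, 2) * h(t, m + 1, n + 1) + Fr(1, 2) * h(t, m, n + 1) == h(t, m, n)
    for n in range(6) for m in range(n + 1))
```

The discrepancy between `approx` and `limit` shrinks roughly like 1/s, in line with the Stirling estimate.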
29. The Martin representation: Doob-Hunt explanation. We retain the notation
of Section 28. In particular, we have
$\zeta := \inf\{n : X_n = \partial\}$.
(29.1) THEOREM (Doob, Hunt). Almost surely on {ζ = ∞},
$X_{\zeta-} := \lim X_n$
exists in the topology of F, and $X_{\zeta-} \in F_e$.
(Recall that the probabilistic results of this section and the analytical results of
the last section will all soon (Section 43) be exhibited as consequences of Ray's
Theorem.) As explained in Section 8.5 of Itô and McKean [1], the point is that,
while $F_e$ is large enough to allow representation of regular functions, it is also
small enough (think!) to describe the exits of X.
Let us agree to define
$X_{\zeta-} := X_{\zeta-1}$ on {ζ < ∞}.
(We are not interested in the case when X starts at ∂.) Recall that b is our
'reference' point in terms of which the Martin kernel is defined.
(29.2) THEOREM (Doob, Hunt). Let
$1 = \int_{F_e} \kappa(\cdot, \xi)\,\nu_1(d\xi)$
be the Martin representation of the (excessive) constant function 1. Then
$P^b[X_{\zeta-} \in B] = \nu_1(B)$, $\forall B \in \mathcal{B}(F_e)$.
Example. To get $f(\cdot,\cdot) = 1$ in (28.7), you have to choose ν to be the unit mass
at $\tfrac12$. Thus Theorems 29.1 and 29.2 contain the strong law for tossing a fair coin.
Now let $h \in S$. To avoid trivial nuisances, we assume that h is strictly positive
on I. The Doob h-transform $\Pi_h$ of Π is defined as follows:
$\Pi_h(i,j) := h(i)^{-1}\,\Pi(i,j)\,h(j)$.
Then $\Pi_h$ is substochastic. We have, with obvious notation,
$\kappa_h(i,j) = \dfrac{1}{h(i)}\,\kappa(i,j)$,
and $f \in S_h$ if and only if $hf \in S$. Thus F and $F_e$ are unaffected if we change from
Π to $\Pi_h$. (You should formulate this a little more carefully.) Hence if $X^{(h)}$ is a
chain with one-step transition matrix $\Pi_h$ then
$X^{(h)}(\zeta^{(h)}-)$ exists in $F_e$ almost surely.
Here $\zeta^{(h)}$ denotes $\inf\{n : X^{(h)}(n) = \partial\}$.
(29.3) THEOREM (Doob). A strictly positive function h in S is extremal in S if
and only if, for some single point ξ of F, we have
$X^{(h)}(\zeta^{(h)}-) = \xi$, almost surely.
Then $\xi \in F_e$ and $h = \kappa(\cdot, \xi)$. We then say that $X^{(h)}$ represents X conditioned to
converge to ξ.
(29.4) Notes
(a) If $h = \kappa(\cdot, c)$, where $c \in I$, then $X^{(h)}$ has the same laws as $\{X_n : n \le \sigma_c\}$, where
$\sigma_c$ is the time of the last visit by X to state c. (Prove this as an exercise.)
This gives the correct interpretation of X conditioned by $\{X(\zeta-) = c\}$ for $c \in I$.
(b) Theorems 29.2 and 29.3 are obviously closely related. (Investigate the
connection.)
(29.5) Example. Return to Example 28.6 and take $h = h_t$ for some fixed $t \in [0,1]$.
Then $h = \kappa(\cdot, \xi)$, where $\xi = (t, 1) \in F \setminus I$. Then
$\Pi_h((m,n);(m+1,n+1)) = 1 - \Pi_h((m,n);(m,n+1)) = t$.
Thus
(29.6) the $h_t$ transform corresponds to the case of space-time coin-tossing for a
coin with probability t of heads.
By the strong law of large numbers, $X^{(h)} \to \xi$, so that $h_t$ is extremal.
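The claim that the transform by the function of Example 28.6 produces a biased coin can be checked in exact arithmetic. The sketch below assumes the standard form of the Doob h-transform, in which each transition probability is multiplied by the ratio of h at the new state to h at the old state; the function names and the value of t are illustrative choices, and t is taken strictly between 0 and 1 so that h is strictly positive.

```python
from fractions import Fraction as Fr

def h(t, m, n):
    """h_t(m, n) = 2^n t^m (1 - t)^(n - m) from Example 28.6."""
    return 2 ** n * t ** m * (1 - t) ** (n - m)

def step_prob(state, nxt):
    """Fair-coin one-step matrix on space-time states (heads so far, tosses so far)."""
    m, n = state
    return Fr(1, 2) if nxt in ((m + 1, n + 1), (m, n + 1)) else Fr(0)

def step_prob_h(t, state, nxt):
    """Assumed h-transform: Pi_h(i, j) = Pi(i, j) * h(j) / h(i) with h = h_t."""
    return step_prob(state, nxt) * h(t, *nxt) / h(t, *state)

t = Fr(3, 10)
states = [(m, n) for n in range(5) for m in range(n + 1)]

heads = [step_prob_h(t, (m, n), (m + 1, n + 1)) for m, n in states]
tails = [step_prob_h(t, (m, n), (m, n + 1)) for m, n in states]
```

Every entry of `heads` equals t and every entry of `tails` equals 1 − t, independently of the current state, which is the content of (29.6).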
(29.7) A statistical interpretation. The property (29.6) illustrates the attractive
idea that the Doob h-transforms correspond to the set of 'appropriate' alternative
hypotheses in hypothesis-testing contexts. This idea has been developed to cover
maximum-likelihood estimation, sufficiency etc. by members of the Copenhagen
school.
30. R. S. Martin's boundary. The original Martin boundary (Martin [1]) was
introduced to describe the non-negative harmonic functions on a Greenian
domain D in $\mathbb{R}^n$. Let us run quickly (and in heuristic fashion) through some of
the basic ideas. We take n = 3, which case provides the most familiar potential
theory. Every domain D in $\mathbb{R}^3$ is Greenian, so we do not need to explain what
'Greenian' means.
Let g be the free-space Green function for $\mathbb{R}^3$ for $\tfrac12\Delta$:
$g(x,y) := (2\pi|y - x|)^{-1}$.
Then the Green function $g_D$ for D is the smallest non-negative function on
D × D such that
(30.1)(i) $g - g_D$ is bounded in the neighbourhood of each 'diagonal' point (x, x)
of D × D;
(30.1)(ii) $\tfrac12\Delta_x(g - g_D) = \tfrac12\Delta_y(g - g_D) = 0$ on D;
(30.1)(iii) $g_D(x,y) = g_D(y,x)$.
Such a function $g_D$ is known to exist. See Section 7.4 of Itô and McKean [1].
Let $K := \mathbb{R}^3 \setminus D$. The physical significance of $g_D(x,y)$ is as the potential at x
due to a unit charge (we are using 'probabilistic' units; see Section I.22) placed
at y when K is earthed. The probabilistic significance of $g_D$ is that if X is a
Brownian motion on $\mathbb{R}^3$ then, for $x, y \in D$,
(30.2) $E^x[\text{time spent by } X \text{ in } dy \text{ before time } H_K] = g_D(x,y)\,dy$.
The strong Markov theorem yields the probabilistic formula
(30.3) $g_D(x,y) = g(x,y) - E^x g(X \circ H_K, y)$,
or, in better notation,
(30.3') $g_D(\cdot, y) = g(\cdot, y) - P_K g(\cdot, y)$.
(Of course there are all sorts of details of rigour to be chased up, but, for the
moment, who cares?) Classical Martin boundary theory reduces to the analytical
part of the Martin-Doob-Hunt theory of the process $\{X_t : t < H_K\}$ killed on
leaving D.
Pick a reference point b in D, introduce the Martin kernel
$\kappa(x,y) := g_D(x,y)/g_D(b,y)$,
and take the Gelfand-Stone-Čech (GSC) compactification F of D determined
by the separable class of functions
$\{\arctan \kappa(x, \cdot) : x \in D\}$.
Full details of such GSC compactifications are recalled later in this chapter. The
point is that each function $\kappa(x, \cdot)$ extends continuously to F. Martin's Theorem
is that every non-negative harmonic function f on D with f(b) = 1 has a unique
representation
$f(x) = \int_{F_e \setminus D} \kappa(x, \xi)\,\nu(d\xi)$,
where $F_e$ is what you expect, and ν is a probability measure on $\mathcal{B}(F_e \setminus D)$.
(30.4) Example. Let D be the open ball $\{x \in \mathbb{R}^3 : |x| < 1\}$. Then Kelvin's method
of images shows that, for $y \in D$,
$g_D(x,y) = g(x,y) - |x|^{-1} g(x^*, y)$ $(x \ne 0)$,
$g_D(x,y) = g(0,y) - (2\pi)^{-1}$ $(x = 0)$,
where $x^* := |x|^{-2}x$ is the point inverse to x in the sphere $\partial D = \{\xi \in \mathbb{R}^3 : |\xi| = 1\}$.
Taking b = 0, you find that as $y \to \xi \in \partial D$ (in the Euclidean topology),
$\kappa(x,y) \to \kappa(x,\xi) := \dfrac{1 - |x|^2}{|\xi - x|^3}$.
Hence (why?) $F = D \cup \partial D$ with the Euclidean topology, and every positive
harmonic function f on D with f(0) = 1 may be written as
(30.5) $f(x) = \int_{\partial D} \kappa(x, \xi)\,\nu(d\xi)$.
Invariance under SO(3) implies that every point of ∂D is in $F_e$, so that the
representation (30.5) of f is unique.
We investigate this example further in Section 31.
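A quick numerical check of Example 30.4: the limiting kernel for the unit ball should be the classical Poisson kernel, (1 − |x|²)/|ξ − x|³, which is harmonic in x and equals 1 at the reference point 0. The sketch below tests both properties with a central finite-difference Laplacian; the step size and the sample points are arbitrary choices made for illustration.

```python
import math

def kappa(x, xi):
    """Candidate Martin kernel for the unit ball in R^3 with reference point 0:
    (1 - |x|^2) / |xi - x|^3, where xi lies on the unit sphere."""
    norm_sq = sum(c * c for c in x)
    return (1.0 - norm_sq) / math.dist(x, xi) ** 3

def laplacian(func, x, step=1e-3):
    """Central second-difference approximation to the Laplacian of func at x."""
    total = 0.0
    for k in range(3):
        up = list(x)
        down = list(x)
        up[k] += step
        down[k] -= step
        total += func(up) + func(down) - 2.0 * func(x)
    return total / step ** 2

xi = (0.0, 0.0, 1.0)                      # boundary point (north pole)
sample_points = [(0.2, -0.3, 0.1), (0.0, 0.5, -0.4), (-0.6, 0.1, 0.3)]

laplacians = [laplacian(lambda p: kappa(p, xi), x) for x in sample_points]
value_at_reference = kappa((0.0, 0.0, 0.0), xi)
```

The computed Laplacians are zero up to discretisation error, and the kernel is normalised to 1 at the origin, as required of a Martin kernel with b = 0.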
31. Doob-Hunt theory for Brownian motion. It is easy to guess the basic form
of the Doob-Hunt theorems for Brownian motion or indeed for a general
Markov process. However, we need to be careful about the precise formulation
of the concepts of harmonic function and excessive function in the context of
continuous-parameter processes. In Example 30.4, 'f is harmonic' means of course
that $\tfrac12\Delta f = 0$ (on D). The correct probabilistic formulation relies on the
mean-value property.
Here are some guidelines.
Let $\{P_t\}$ be a transition function (in the 'abstract' sense of (1.1)) on a measurable
space $(E, \mathcal{E})$. An $\mathcal{E}^*$-measurable function f from E to [0, ∞] is called excessive
(for $\{P_t\}$) if
(31.1)(i) f is supermedian: $P_t f \le f$, $\forall t$,
(31.1)(ii) $\lim_{t \downarrow 0} P_t f(x) = f(x)$, $\forall x$.
Of course, if f is excessive, then $P_t f(x) \uparrow f(x)$ $(t \downarrow 0)$. Do note that, for excessive
functions, the 'smoothness' condition (31.1)(ii) is imposed in addition to the
'supermedian' condition (31.1)(i). In Ray theory the difference between supermedian
and excessive becomes critically important.
Now let X be a 'nice' process on a nice space E. Let $\{P_t\}$ be the transition
function of X. If f is an $\mathcal{E}^*$-measurable function from E to [0, ∞] satisfying
the smoothness condition (31.1)(ii), then f is excessive if and only if f is
superharmonic in the following sense:
$P_{E \setminus A} f \le f$
whenever A is an open subset of E with compact closure $\bar A$. We call an
$\mathcal{E}^*$-measurable function from E to [0, ∞] satisfying (31.1)(ii) harmonic if the
mean-value property
$P_{E \setminus A} f = f$
holds whenever A is an open subset of E with compact closure $\bar A$.
It is important to notice that, in general, a harmonic function will not satisfy
$P_t f = f$, $\forall t$. Example 31.4 will clarify this matter.
After all that, the analogues of Theorems 29.1-29.3 for the situation discussed
in Section 30 are obvious (in form!). Let $D$ be our domain in $\mathbb{R}^3$. Let $V := \mathbb{R}^3 \setminus D$
and let $F$ be the Martin compactification of $D$. Then, for Brownian motion $X$
started inside $D$, it is a.s. true that
$X(H_V-) := \lim_{t \uparrow\uparrow H_V} X(t)$ exists in $F$
and $X(H_V-) \in F_e \setminus D$. This should set you wondering about the relation between
regular points for the Dirichlet problem and extremal points of the Martin
boundary. The following exercise should make you think further about
Dirichlet-Martin connections.
(31.2) Exercise. Return to the case where $D$ is the open ball $\{|x| < 1\}$ for which
$F_e = D \cup \partial D$ with the Euclidean topology. By considering the effect of changing
the reference point $b$ from 0 to a point $c$ of $D$, show that
$P^c[X(H_{\partial D}) \in d\xi] = \kappa(c,\xi)\,\mu(d\xi)$,
where $\mu$ is normalised surface area measure on $\partial D$. If $g$ is a continuous function
on $\partial D$ then the unique continuous function $h$ on $\bar D$ with $\Delta h = 0$ on $D$ and $h = g$
on $\partial D$ is therefore given by the Poisson formula
(31.3) $h(x) = \int_{\partial D} \kappa(x,\xi)\,g(\xi)\,\mu(d\xi)$.
For fixed $\xi \in \partial D$, calculate explicitly the differential generator
$\kappa(x,\xi)^{-1}\,\tfrac12\Delta\,\kappa(x,\xi) = \tfrac12\Delta + \kappa(x,\xi)^{-1}\,\mathrm{grad}\,\kappa(x,\xi)\cdot\mathrm{grad}$
of Brownian motion conditioned to hit $\partial D$ at $\xi$, and convince yourself that the
'extra' term behaves as it should.
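As a hedged aid to the exercise, here is a sketch of why the conditioned generator has the stated form. It assumes only that $u$ is a $C^2$ test function on $D$ and that $\kappa(\cdot,\xi)$ is harmonic on $D$; the symbol $u$ is introduced here for illustration.

```latex
% Sketch: generator of the h-transform with h = \kappa(\cdot,\xi), u a C^2 test function.
\begin{align*}
\kappa^{-1}\,\tfrac12\Delta(\kappa u)
  &= \kappa^{-1}\bigl(\tfrac12\,\kappa\,\Delta u
      + \operatorname{grad}\kappa\cdot\operatorname{grad}u
      + \tfrac12\,u\,\Delta\kappa\bigr) \\
  &= \tfrac12\Delta u + \kappa^{-1}\operatorname{grad}\kappa\cdot\operatorname{grad}u,
\end{align*}
% the last step because \Delta\kappa(\cdot,\xi) = 0 on D.
```

The drift $\kappa^{-1}\,\mathrm{grad}\,\kappa$ is the 'extra' term whose behaviour near $\xi$ the exercise asks you to examine.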
(31.4) The Helms-Johnson example revisited (see EII.79.77). Take $D := \mathbb{R}^3 \setminus \{0\}$.
Let $X$ be Brownian motion in $\mathbb{R}^3$ started in $D$. Since $X$ never visits 0, the Green
function for $D$ is just the restriction to $D \times D$ of the free-space Green function
$g_D(x,y) = (2\pi|y - x|)^{-1}$ ($x, y \in D$).
The Martin compactification adjoins two points 0 and $\infty$ to $D$, producing the
expected one-point compactification of $\mathbb{R}^3$. If the reference point $b$ is chosen
with $|b| = 1$ then
$\kappa(x,0) = |x|^{-1}$, $\kappa(x,\infty) = 1$, $\forall x \in D$.
The function $f$ on $D$ with $f(x) = |x|^{-1}$ is harmonic in $D$, but $P_t f(x) \ne f(x)$,
$\forall t > 0$, $\forall x \in D$. Indeed, you can check that
(31.5) $P_t f(x) = [\Phi(|x|\,t^{-1/2}) - \Phi(-|x|\,t^{-1/2})]\,f(x)$, $\forall t > 0$, $x \in D$,
where $\Phi$ is the normal distribution function:
$\Phi(y) = (2\pi)^{-1/2} \int_{-\infty}^{y} \exp(-\tfrac12 u^2)\,du$.
Equation (31.5) makes it clear that $f$ is excessive for $\{P_t\}$.
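If you would like numerical reassurance about a formula of the type (31.5), the following minimal sketch compares a closed form with a crude Monte Carlo estimate of $E^x[\,|X_t|^{-1}]$. It assumes that $X$ is standard Brownian motion in $\mathbb{R}^3$ (generator $\tfrac12\Delta$) and that $\Phi$ is the standard normal distribution function with argument $|x|\,t^{-1/2}$; the function names, sample size and seed are illustrative choices.

```python
import math
import random


def normal_cdf(y):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def ptf_closed_form(r, t):
    """[Phi(r/sqrt(t)) - Phi(-r/sqrt(t))] / r for f(x) = 1/|x| at a point with |x| = r."""
    a = r / math.sqrt(t)
    return (normal_cdf(a) - normal_cdf(-a)) / r


def ptf_monte_carlo(r, t, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E^x[1/|X_t|] for X started at x = (r, 0, 0)."""
    rng = random.Random(seed)
    s = math.sqrt(t)
    total = 0.0
    for _ in range(n_samples):
        x = r + s * rng.gauss(0.0, 1.0)
        y = s * rng.gauss(0.0, 1.0)
        z = s * rng.gauss(0.0, 1.0)
        total += 1.0 / math.sqrt(x * x + y * y + z * z)
    return total / n_samples


if __name__ == "__main__":
    r, t = 1.0, 1.0
    print("closed form :", ptf_closed_form(r, t))
    print("Monte Carlo :", ptf_monte_carlo(r, t))
    print("f(x) = 1/|x|:", 1.0 / r)
```

The closed form is strictly smaller than $f(x)$ for every $t > 0$, which is the strict supermedian behaviour discussed in the text.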
The fact that $f$ is supermedian implies that, $\forall x \in D$, $f(X_t)$ is a supermartingale
relative to $(\{\mathcal{F}_t^\circ\}, P^x)$. The fact that $f$ is not invariant under $\{P_t\}$ implies that,
$\forall x \in D$, $f(X_t)$ is not a martingale relative to $(\{\mathcal{F}_t^\circ\}, P^x)$. The 'intermediate' fact that
$f$ is harmonic corresponds to the statement that $f(X_t)$ is a local martingale. It is
in local-martingale theory that the main illustrative importance of the present
example lies.
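Let
$T_n := \inf\{t: |X_t| = n^{-1}\} = \inf\{t: f(X_t) = n\} \le \infty$.
We can find $g_n$ in $\mathcal{D}(\tfrac12\Delta)$ with $g_n = f$ on $\{|y| \ge n^{-1}\}$. For $|x| \ge n^{-1}$, we have,
by Dynkin's formula (10.12),
$P_{t \wedge T_n} f(x) - f(x) = E^x f \circ X(t \wedge T_n) - f(x)$
$= E^x g_n \circ X(t \wedge T_n) - g_n(x)$
$= E^x \int_0^{t \wedge T_n} \tfrac12\Delta g_n(X_s)\,ds = 0$,
since $\tfrac12\Delta g_n = \tfrac12\Delta f = 0$ on $\{|y| \ge n^{-1}\}$. Hence, for each fixed $x$, we have
(31.6) $P_{t \wedge T_n} f(x) = f(x)$, $\forall t$, $\forall n > |x|^{-1}$.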
Since $T_n \uparrow \infty$, this property can be regarded as an appropriate ('local') correction
of the false statement that $P_t f(x) = f(x)$, $\forall x$, $\forall t$. Note further that, on letting
$n \uparrow\uparrow \infty$ in (31.6), we obtain
$P_t f(x) \le f(x)$, $\forall t$, $\forall x \in D$,
by Fatou's Lemma. Since $P_t f(x) \ne f(x)$, $\forall t > 0$, $\forall x \in D$, it follows that
(31.7) for each $t > 0$ and each $x \in D$, the sequence $\{f(X_{t \wedge T_n}): n \in \mathbb{N}\}$ is not
uniformly integrable relative to $P^x$.
We shall appreciate the full significance of (31.7) only when we come to study
local-martingale theory.
It is worth noting that, since $|X_t|^{-1}$ is a continuous supermartingale relative
to each $P^x$ ($x \in D$), $\lim_{t \to \infty} |X_t|^{-1}$ exists a.s.($P^x$).
Exercise. Explain why the limit must be 0, a.s.($P^x$).
Thus a 3-dimensional Brownian motion started away from 0 will never hit 0
and will drift to $\infty$. The corresponding result is true all the more in dimension
$n > 3$. Of course, we have known and used these properties since Section 1.18.
Next consider '$X$ Doob-conditioned to converge to 0' with generator $f^{-1}(\tfrac12\Delta)f$.
The radial part of this process is nothing other than a 1-dimensional Brownian
motion absorbed at 0. (Check this!)
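One way to carry out the check is sketched below, under the assumption that the conditioned generator is applied to a radial $C^2$ function $u = u(r)$, $r = |x|$, with $f(x) = r^{-1}$; the symbols $u$ and $v$ are introduced here for illustration.

```latex
% For radial v on R^3 \setminus \{0\}:  \Delta v = v'' + (2/r) v' = r^{-1} (r v)''.
\begin{align*}
f^{-1}\,\tfrac12\Delta(f u)
  &= r \cdot \tfrac12\, r^{-1}\bigl(r \cdot r^{-1} u\bigr)'' \\
  &= \tfrac12\,u''(r).
\end{align*}
```

So, on radial functions, the conditioned generator acts as $\tfrac12\,d^2/dr^2$, which is the generator of 1-dimensional Brownian motion; the first-order term of the ordinary Bessel generator has been cancelled by the conditioning.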
We can see this working in reverse. Let $\beta$ be a 1-dimensional Brownian
motion on $(0,\infty)$ with 0 as killing boundary (see Section 1.13). The Martin
compactification is of course $[0,\infty]$, and if we take $b = 1$, we obtain
$\kappa(x,0) = 1$, $\kappa(x,\infty) = x$.
Thus 'Brownian motion on $(0,\infty)$ Doob-conditioned to converge to $\infty$' has
generator
$x^{-1}\left(\tfrac12\,\frac{d^2}{dx^2}\right)x = \tfrac12\,\frac{d^2}{dx^2} + x^{-1}\frac{d}{dx}$;
in other words, it is a BES(3) process.
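The two computations behind this paragraph can be sketched as follows. The form $g(x,y) = 2\min(x,y)$ of the Green function of Brownian motion on $(0,\infty)$ killed at 0 is an assumption made here (it is not displayed in the text), as is the usual definition $\kappa(x,\eta) = g(x,\eta)/g(b,\eta)$ with $b = 1$; $u$ is a $C^2$ test function.

```latex
\[
\kappa(x,0) = \lim_{\eta \downarrow 0} \frac{\min(x,\eta)}{\min(1,\eta)} = 1,
\qquad
\kappa(x,\infty) = \lim_{\eta \uparrow \infty} \frac{\min(x,\eta)}{\min(1,\eta)} = x .
\]
\[
x^{-1}\,\tfrac12\,\frac{d^2}{dx^2}\bigl(x\,u\bigr)
  = x^{-1}\bigl(\tfrac12\,x\,u'' + u'\bigr)
  = \tfrac12\,u'' + x^{-1}u' .
\]
```

The right-hand side of the second display is the generator $\tfrac12 u'' + \frac{d-1}{2x}u'$ of a Bessel process of dimension $d = 3$.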
(31.8) Comments. Martin-Doob-Hunt-Ray-Kunita-Watanabe boundary
theory may be developed under assumptions of extreme generality. We present
proofs only for the discrete-parameter chain case. But we do so by the most
modern continuous(!)-parameter methods. We then give a list of references in
which you may chase up the general theory.
The long account of Martin-Doob-Hunt theory that we have already given
has been a gentle introduction to compactifications. It helps prepare the way
for the much more sophisticated Ray-Knight compactification. Further, it will
be reassuring after working through the Ray-Knight theory to find that in the
case of discrete-parameter chains we obtain exactly (there are many boundaries!)
the familiar results of Sections 27-29.
After all this talk of Martin boundary theory, let us remind you that it is
but one application of Ray theory. Indeed, it is not the one that chiefly concerns
us in this book!
32. Ray processes and right processes. It remains to answer question (25.1)(iii).
Blumenthal and Getoor [1] present a theory based, not on assumptions like
the FD property that rely on analytical properties of the transition function,
but on probabilistic axioms for a setup
The fundamental concept in Blumenthal and Getoor [1] is that of a standard
process. Now there are obvious advantages in using probabilistic axioms: for
example, the property of being standard is preserved under various probabilistic
operations that do not preserve the FD property.
Briefly, the situation is this. Some generalisation of standard process was
needed in order to cope with branch-points. Meyer introduced the concept of
a right process (process satisfying les hypothèses droites) as the 'natural' concept
for Markov-process theory. Ray's hypotheses remained the most general analytical
hypotheses on transition functions. Ray, Knight, Meyer, Shih, Walsh, Getoor
and Sharpe all participated in work that culminated in proving that the concepts
of Ray process and right process are essentially the same. This is explained very
clearly in the books by Getoor [1], Sharpe [1] and Dellacherie and Meyer [1].
Let us quote part of Getoor's book (perhaps you will not mind if we emphasise
the obvious fact that the set D of non-branch-points has no connection with
our domain D in the classical Martin-boundary case!):
One has the following inclusions among the various classes of processes:
(Feller) $\subset$ (Hunt) $\subset$ (special standard) $\subset$ (standard) $\subset$ (right).
... These different types of processes were introduced at various stages
during the development of the modern theory of Markov processes. In
view of the theory to be developed in the sections it seems to me that
[except for right processes] they are now mainly of historic interest. A
Ray process $Y$ on a compact metric space $E$ is not necessarily a right
process since $P_0 \ne I$ [in general]. However, if one restricts $Y$ to the set
of non-branch points $D$, then it is a Borel right process. Since $Y_t \in D$ for
all $t > 0$, this amounts to considering initial measures that are carried by
$D$. In what follows we shall see that the converse is true in the sense that
if $X$ is a right process with state-space $E$, then by changing the topology
on $E$ one can essentially regard $X$ as a Ray process restricted to its set of
non-branch points.
There is thus a perfect match between the probabilistic and analytical parts
of the theory. However, at least in special circumstances, one can achieve such
a match in other ways, as is illustrated by the important books by Fukushima
[1], Silverstein [1,2] on symmetrisable processes. But the moral is that the
subject's masters regard its theoretical foundations as being fully achieved by
the essentially equivalent theories of Ray processes and right processes. We
content ourselves with introducing you to the more concrete Ray processes; the
masters can then set you right.
5. RAY PROCESSES
33. Orientation. Our basic datum will be a resolvent $\{R_\lambda: \lambda > 0\}$ on a space $I$.
In general, there will not exist a nice (right) process with resolvent $\{R_\lambda\}$ and
taking all its values in $I$. In order to construct a nice (Ray) process with resolvent
$\{R_\lambda\}$, we must generally allow the process to take values in a suitable compactification
$F$ of $I$. (If this process starts in $I$, and if $I$ is a Borel subset of $F$, then
the set of times spent in $F \setminus I$ before the death-time of the process will be of
measure zero.) First we must construct the Ray-Knight compactification $F$ of
$I$ determined by $\{R_\lambda\}$, extend $\{R_\lambda\}$ to a 'Ray' resolvent on $F$, and construct the
'Ray' transition function $\{P_t\}$ on $F$ with resolvent $\{R_\lambda\}$. Then we shall construct
the Ray process on $F$ with transition function $\{P_t\}$.
The main case in which we are interested is that when we begin with a
'standard' transition function on a countable set $I$; and our notation for the
general case reflects that for chains.
Here (as a guide: you are not expected to understand about RK compactifications,
branch-points etc. now!) is a list of our notations for chains:
$I$: countable 'minimal' state space;
$\{P_t\}$: ('standard') transition function on $I$ with resolvent $\{R_\lambda\}$;
$F$: Ray-Knight compactification of $I$ determined by $\{R_\lambda\}$;
$F_e$: set of non-branch- (or extremal) points in $F$;
$F_{br}$: set of branch-points in $F$;
$E := \{\xi \in F: P_t(\xi, F \setminus I) = 0, \forall t > 0\}$.
In the definition of $E$, we have used the notation $\{P_t\}$ to denote the Ray extension
on $(F, \mathcal{B}(F))$ of the original transition function on $I$.
(33.1) Important note. Let $I$ (for reasons explained in Section 32, we abandon
the notation '$D$ for domain') be a bounded domain in $\mathbb{R}^3$ and let $V := \mathbb{R}^3 \setminus I$. To
obtain the Martin-Doob-Hunt results for Brownian motion on $I$, we apply
Ray theory not to $\{X_t: t < H_V; P^b\}$ but to its time-reversal
$\hat X := \{X(H_V - t): 0 < t < H_V; P^b\}$,
which is also Markovian with stationary transition probabilities and which has
Green function
$\hat u_I(y,x) = g_I(b,x)\,g_I(x,y)/g_I(b,y) = g_I(b,x)\,\kappa(x,y)$.
Note the order in which $y$ and $x$ appear, and also the appearance of the Martin
kernel. Time-reversal will be studied in Part 6 of this chapter.
Ray theory is an entrance-boundary theory, and time reversal has to be
invoked to apply it to the theory of the Martin exit boundary. Where $X$ goes to is
where $\hat X$ comes from.
If $I$ is an arbitrary (possibly unbounded) domain in $\mathbb{R}^3$, we first speed up $X$
via a time substitution of the type described in Section 21, so as to ensure that
$X$ exits $I$ (or else dies) within a finite time. Then we reverse $X$ to produce $\hat X$.
The faster the speeding up, the smoother will be the analytic properties of $\hat X$.
As we have already mentioned, you will see all the tricks during our 'Ray'
treatment of the discrete-parameter chain case.
(33.2) Our plan. We first describe the 'good' situation, that in which we have a
Ray resolvent on a compact metric space F. Then we explain how a resolvent
on a measurable space $(I, \mathcal{I})$ may, under minimal conditions, be extended to a
Ray resolvent on a Ray-Knight compactification $F$ of $I$. We then construct
the Ray semigroup and Ray process on F.
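34. Ray resolvents. Let $F$ now denote an arbitrary compact metric space. Let
$\{R_\lambda: \lambda > 0\}$ be an honest Feller resolvent on $C(F)$:
$R_\lambda: C(F) \to C(F)$, $0 \le f \le 1 \Rightarrow 0 \le \lambda R_\lambda f \le 1$,
$\lambda R_\lambda 1 = 1$, $R_\lambda - R_\mu + (\lambda - \mu)R_\lambda R_\mu = 0$.
(34.1) DEFINITION (continuous $\alpha$-supermedian function). For $\alpha \ge 0$, an element
$f$ of $C(F)$ is called a (continuous) $\alpha$-supermedian function relative to $\{R_\lambda\}$, and
we write $f \in \mathrm{CSM}^\alpha$, if
$0 \le \lambda R_{\lambda+\alpha} f \le f$ ($\forall \lambda > 0$).
For $\beta > \alpha$,
(34.2) $\lambda R_{\lambda+\beta} f = \lambda R_{\lambda+\alpha} f - \lambda(\beta - \alpha)R_{\lambda+\alpha}R_{\lambda+\beta} f$,
so that
$\mathrm{CSM}^\alpha = \bigcap_{\beta > \alpha} \mathrm{CSM}^\beta$.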
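For the record, here is a sketch of how (34.2) and the inclusion $\mathrm{CSM}^\alpha \subset \mathrm{CSM}^\beta$ come out of the resolvent equation; nothing is assumed beyond the properties of $\{R_\lambda\}$ listed at the start of this section.

```latex
% Resolvent equation with parameters \lambda+\alpha and \lambda+\beta:
\begin{align*}
R_{\lambda+\alpha} - R_{\lambda+\beta}
   + \bigl((\lambda+\alpha) - (\lambda+\beta)\bigr) R_{\lambda+\alpha}R_{\lambda+\beta} &= 0, \\
\text{so that}\qquad
\lambda R_{\lambda+\beta} f
   &= \lambda R_{\lambda+\alpha} f - \lambda(\beta-\alpha) R_{\lambda+\alpha}R_{\lambda+\beta} f .
\end{align*}
% If f \in CSM^\alpha then f \ge 0, hence R_{\lambda+\alpha}R_{\lambda+\beta} f \ge 0 and
\[
0 \le \lambda R_{\lambda+\beta} f \le \lambda R_{\lambda+\alpha} f \le f
\qquad (\beta > \alpha,\ \lambda > 0).
\]
```

The reverse inclusion $\bigcap_{\beta > \alpha} \mathrm{CSM}^\beta \subset \mathrm{CSM}^\alpha$ follows on letting $\beta \downarrow \alpha$ in $\lambda R_{\lambda+\beta} f \le f$.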
(34.3) DEFINITION (Ray resolvent). Our honest Feller resolvent on $F$ is called
a Ray resolvent if
(34.4) $\bigcup_{\alpha} \mathrm{CSM}^\alpha$ separates points of $F$.
(34.5) LEMMA. $\{R_\lambda\}$ is a Ray resolvent if and only if $\mathrm{CSM}^\alpha$ separates points
of $F$ for each fixed strictly positive $\alpha$.
Lemma 34.5 obviously follows from the following result.
(34.6) LEMMA. The vector space
$\mathcal{L} := \mathrm{CSM}^\alpha - \mathrm{CSM}^\alpha$ ($\alpha > 0$)
is independent of $\alpha > 0$.
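Proof. Let $0 < \alpha < \beta$. Since $\mathrm{CSM}^\alpha \subset \mathrm{CSM}^\beta$, it is enough to prove that if (as we
now assume) $f \in \mathrm{CSM}^\beta$ then $f$ may be written as the difference of two elements
of $\mathrm{CSM}^\alpha$. But
$f = [f + (\beta - \alpha)R_\alpha f] - (\beta - \alpha)R_\alpha f$
and, since $(\beta - \alpha)R_\alpha f \in \mathrm{CSM}^\alpha$ because of the resolvent equation, it is enough
to show that $f + (\beta - \alpha)R_\alpha f \in \mathrm{CSM}^\alpha$. Now, since $f \in \mathrm{CSM}^\beta$, it follows from (34.2)
that
$\lambda R_{\lambda+\alpha} f \le f + (\beta - \alpha)R_{\lambda+\alpha} f$.
Using this fact and the resolvent equation again, we obtain
$\lambda R_{\lambda+\alpha}[f + (\beta - \alpha)R_\alpha f] \le f + (\beta - \alpha)R_{\lambda+\alpha} f + (\beta - \alpha)(R_\alpha f - R_{\lambda+\alpha} f) = f + (\beta - \alpha)R_\alpha f$,
so that $f + (\beta - \alpha)R_\alpha f \in \mathrm{CSM}^\alpha$, as required. □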
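The decomposition in this proof can be tried out numerically in the simplest possible setting. The sketch below is only an illustration under stated assumptions: the state space is a hypothetical three-point set, the resolvent is $R_\lambda = (\lambda - Q)^{-1}$ for a made-up conservative Q-matrix, and the supermedian inequalities are tested on a finite grid of $\lambda$ values rather than for all $\lambda > 0$.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


# Hypothetical conservative Q-matrix on three states (each row sums to zero).
Q = [[-2.0, 1.0, 1.0],
     [3.0, -4.0, 1.0],
     [0.5, 0.5, -1.0]]


def resolvent(lam, f):
    """R_lam f = (lam I - Q)^{-1} f for the chain with Q-matrix Q."""
    n = len(f)
    a = [[(lam if i == j else 0.0) - Q[i][j] for j in range(n)] for i in range(n)]
    return solve(a, f)


def is_supermedian(f, alpha, lams=(0.1, 0.5, 1.0, 5.0, 50.0), tol=1e-12):
    """Check 0 <= lam R_{lam+alpha} f <= f on a finite grid of lam values."""
    for lam in lams:
        g = [lam * v for v in resolvent(lam + alpha, f)]
        if any(gi < -tol or gi > fi + tol for gi, fi in zip(g, f)):
            return False
    return True


def decompose(f, alpha, beta):
    """Write f as h - k with h = f + (beta - alpha) R_alpha f and k = (beta - alpha) R_alpha f."""
    k = [(beta - alpha) * v for v in resolvent(alpha, f)]
    h = [fi + ki for fi, ki in zip(f, k)]
    return h, k


if __name__ == "__main__":
    alpha, beta = 0.5, 3.0
    f = resolvent(beta, [1.0, 0.0, 0.0])  # R_beta of a non-negative function lies in CSM^beta
    h, k = decompose(f, alpha, beta)
    print(is_supermedian(f, beta), is_supermedian(h, alpha), is_supermedian(k, alpha))
```

On a finite state space every function is continuous, so the example says nothing about the topological side of Ray theory; it only exercises the algebra of the resolvent equation used in the proof.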
(34.7) LEMMA. If $\{R_\lambda\}$ is a Ray resolvent then $\mathcal{L}$ is a dense subspace of the
Banach space $C(F)$.
Proof. For each $\alpha$, $\mathrm{CSM}^\alpha$ is obviously closed under the operation $\wedge$. For $f$,
$g$, $h$, $k \in \mathrm{CSM}^\alpha$, we have
$(f - g) \wedge (h - k) = [(f + k) - (g + k)] \wedge [(h + g) - (g + k)]$
$= [(f + k) \wedge (h + g)] - (g + k) \in \mathcal{L}$.
Hence $\mathcal{L}$ is closed under $\wedge$, and, since $\mathcal{L}$ is a vector space, $\mathcal{L}$ is a lattice. Since
$\mathcal{L}$ contains constant functions and $\mathcal{L}$ separates points of $F$ because of the Ray
hypothesis, the lattice form of the Stone-Weierstrass Theorem gives the result. □
35. The Ray-Knight compactification. We make the following hypothesis.
(35.1) GENERAL HYPOTHESIS. Suppose given
(i) a measurable space $(I, \mathcal{I})$;
(ii) an honest resolvent $\{R_\lambda: \lambda > 0\}$ on $(I, \mathcal{I})$, so that in particular $R_\lambda: b\mathcal{I} \to b\mathcal{I}$;
(iii) a sequence $S = (f_k: k \in \mathbb{N})$ of elements of $b\mathcal{I}$ such that, for $x, y \in I$ with $x \ne y$,
there exist $\lambda > 0$ and $k \in \mathbb{N}$ such that $R_\lambda f_k(x) \ne R_\lambda f_k(y)$.
Heuristic comment. If this last property failed for some $x$ and $y$ then the process
started at $x$ would be identical to that started from $y$ on the time-parameter
set $(0,\infty)$; so we would identify $x$ and $y$.
We want to view $\{R_\lambda\}$ as a Ray resolvent on a certain compactification $F$
of $I$. For this purpose, we introduce certain Banach subalgebras of $b\mathcal{I}$. Let
$A(\cdot)$ denote 'Banach algebra generated by'. Define inductively
(35.2) $Z_1 := A(1, \bigcup_\lambda R_\lambda S)$, $Z_{n+1} := A(Z_n, \bigcup_\lambda R_\lambda Z_n)$.
It is an immediate consequence of the resolvent equation that the family
$\{R_\lambda: \lambda > 0\}$ of bounded operators on the Banach space $b\mathcal{I}$ is separable in the
uniform operator topology; and it follows easily that each $Z_n$ is separable. Put
(35.3) $Z := \mathrm{closure}\,(\bigcup_n Z_n)$.
Then it is easily verified that the separable Banach algebra $Z$ is the smallest
Banach subalgebra of $b\mathcal{I}$ that contains constant functions, contains $R_\lambda S$ for each
$\lambda > 0$, and satisfies $R_\lambda: Z \to Z$ for each $\lambda > 0$. Further, it is immediate from
Hypothesis 35.1 that $Z$ separates points of $I$.
Let $\{g_n: n \in \mathbb{N}\}$ be a countable dense subset of $Z$. Define the map $\varphi: I \to \mathbb{R}^{\mathbb{N}}$
as follows:
$\varphi(x) := (g_1(x), g_2(x), \ldots) \in \mathbb{R}^{\mathbb{N}}$.
Since $Z$ separates points of $I$, $\varphi$ is one-one. We now identify $I$ with $\varphi(I)$. Since
$\varphi(I) \subset \prod_n [-\|g_n\|, \|g_n\|]$,
the closure $F$ of $\varphi(I)$ is compact in $\mathbb{R}^{\mathbb{N}}$. We call $F$ the Ray-Knight compactification
of $I = \varphi(I)$ induced by $S$. We skip discussion of the influence of $S$: it has no
practical importance for us.
Since every $g_n$ has a unique continuous extension from $I$ to $F$, it follows that
every $g$ in $Z$ has a continuous extension to $F$. Thus $Z \subset C(F)$ in an obvious
sense. However, the closed algebra $Z$ contains all constant functions, and, by
construction of $F$, separates points of $F$. Hence, by the Stone-Weierstrass
Theorem,
(35.4) $Z = C(F)$.
(35.5) LEMMA. $\{R_\lambda: \lambda > 0\}$ is a Ray resolvent on $C(F)$.
Proof. Since $R_\lambda: Z \to Z$ for each $\lambda > 0$, $\{R_\lambda: \lambda > 0\}$ is a Feller resolvent on $F$.
Now let $\mathrm{CSM}^\alpha$ denote the set of continuous $\alpha$-supermedian functions for
$\{R_\lambda: \lambda > 0\}$ on $F$, and let $\mathcal{L} := \mathrm{CSM}^\alpha - \mathrm{CSM}^\alpha$ (independently of $\alpha$) as in
Lemma 34.6. Suppose that there exist two distinct points $\xi$ and $\eta$ of $F$ that are
not separated by $\mathcal{L}$. Now, for $h \in C(F) \cup S$, $R_\lambda h = R_\lambda(h^+) - R_\lambda(h^-) \in \mathcal{L}$. We can
therefore show inductively that $\xi$ and $\eta$ are not separated by $Z_1$, or by $Z_2$, etc.
But this leads to the ridiculous conclusion that $\xi$ and $\eta$ are not separated by
$Z = C(F)$. □
Here is an important situation in which we may use the above construction.
(35.6) SPECIAL HYPOTHESIS. Suppose that
(i) $I$ is an LCCB;
(ii) $\{R_\lambda: \lambda > 0\}$ is a Feller resolvent on $C_b(I)$;
(iii) $\lambda R_\lambda 1 = 1$;
(iv) $\lim_{\lambda \to \infty} \lambda R_\lambda f(x) = f(x)$, $\forall f \in C_b(I)$, $\forall x \in I$.
One case in which this special hypothesis holds is that in which we have a
Markov chain on a countable set $I$ with standard transition function and we
begin with the discrete topology on $I$, taking $S$ to be the set of indicator functions
of elements of $I$.
Suppose now that the special hypothesis (35.6) obtains. Then we may take
$S$ to be a countable dense subset of $C_0(I)$. Then the general hypothesis (35.1)
holds. Since every function $g_n$ will be continuous on $I$, the map $\varphi$ will here be
continuous: the topology on $I$ induced by $F$ will be at least as coarse as the
original topology. We emphasize that in this situation, it can happen that
$\lim_{\lambda \to \infty} \lambda R_\lambda f(\xi) \ne f(\xi)$
for some $\xi \in F$.
Note that if the special hypothesis (35.6) holds, and $I$ is compact, then, by
the proof of Lemma 6.7, $\{R_\lambda: \lambda > 0\}$ will be a strongly continuous resolvent on
$C(I)$, and we shall be back in the FD situation.
(35.7) The Feller-McKean chain (see Section 23). If we are given the Feller-McKean
chain viewed as a chain on $\mathbb{Q}$ with its discrete topology, and take $S$ to be
the collection of indicator functions $(I_q: q \in \mathbb{Q})$, then the Ray-Knight compactification
of $\mathbb{Q}$ will be the usual one-point compactification of $\mathbb{R}$. Hence the
Ray-Knight topology of $\mathbb{Q}$ (that induced from the topology of $F$) will be
coarser than the original topology. (The set $\{0\}$ was open in the discrete
topology.) The proof of the statements just made follows from results on one-dimensional
diffusions $X$ in Section V.50 in Volume 2, the essential point being
that, by Dynkin's formula,
$|R_\lambda f(x) - R_\lambda f(y)| = \left| E^x \int_0^{H_y} e^{-\lambda s} f(X_s)\,ds + [E^x(e^{-\lambda H_y}) - 1]\,R_\lambda f(y) \right|$
$\le 2\lambda^{-1}\|f\|\,[1 - E^x(e^{-\lambda H_y})]$.
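The estimate can be unpacked as follows. This is a sketch which assumes that the strong Markov property may be applied at the hitting time $H_y$ of $y$, and which writes $\|f\|$ for the supremum norm of the bounded function $f$.

```latex
\begin{align*}
R_\lambda f(x)
  &= E^x\!\int_0^{H_y} e^{-\lambda s} f(X_s)\,ds
     + E^x\bigl(e^{-\lambda H_y}\bigr)\,R_\lambda f(y), \\
\Bigl| E^x\!\int_0^{H_y} e^{-\lambda s} f(X_s)\,ds \Bigr|
  &\le \|f\|\,E^x\!\int_0^{H_y} e^{-\lambda s}\,ds
   = \lambda^{-1}\|f\|\,\bigl[1 - E^x\bigl(e^{-\lambda H_y}\bigr)\bigr], \\
\bigl| \bigl[E^x\bigl(e^{-\lambda H_y}\bigr) - 1\bigr]\,R_\lambda f(y) \bigr|
  &\le \lambda^{-1}\|f\|\,\bigl[1 - E^x\bigl(e^{-\lambda H_y}\bigr)\bigr].
\end{align*}
```

Subtracting $R_\lambda f(y)$ from the first line and adding the two bounds gives the factor $2\lambda^{-1}\|f\|$.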
(35.8) Example. We now look at a famous example which illustrates a case
where our Ray hypotheses do not hold. Let $I := [0,\infty)$. Our process $X$ stays at
0 for an exponential holding time of rate 1 and then drifts towards $\infty$ at rate
1. Thus
$P_t f(x) = f(x + t)$ if $x > 0$,
$P_t f(x) = e^{-t} f(0) + \int_0^t e^{-s} f(t - s)\,ds$ if $x = 0$.
This time, $R_\lambda$ does not map $C_b(I)$ to $C_b(I)$.
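To see the failure of continuity concretely, one can take Laplace transforms in the formula for $P_t f$. The following is a sketch; the shorthand $R_\lambda f(0+) := \lim_{x \downarrow 0} R_\lambda f(x)$ and the test function $f(x) = e^{-x}$ are choices made here for illustration.

```latex
\begin{align*}
R_\lambda f(x) &= \int_0^\infty e^{-\lambda t} f(x+t)\,dt \quad (x > 0),
\qquad
R_\lambda f(0+) = \int_0^\infty e^{-\lambda t} f(t)\,dt, \\
R_\lambda f(0) &= \int_0^\infty e^{-\lambda t}
      \Bigl[e^{-t} f(0) + \int_0^t e^{-s} f(t-s)\,ds\Bigr]\,dt
   = \frac{f(0) + R_\lambda f(0+)}{\lambda + 1}, \\
R_\lambda f(0) - R_\lambda f(0+) &= \frac{f(0) - \lambda R_\lambda f(0+)}{\lambda + 1}.
\end{align*}
% Example: f(x) = e^{-x} gives R_\lambda f(0+) = (\lambda+1)^{-1} and
% R_\lambda f(0) = (\lambda+2)(\lambda+1)^{-2}, so the jump at 0 is (\lambda+1)^{-2}.
```

Thus $R_\lambda f$ is discontinuous at 0 whenever $f(0) \ne \lambda R_\lambda f(0+)$.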
Let us concentrate on the situation when $X$ starts at 0. Put $T := \inf\{t: X_t \ne 0\}$.
Then $T$ is an $\{\mathcal{F}^\circ_{t+}\}$ stopping time, but, since $X(T) = 0$ and $X(T + \varepsilon) \ne 0$, $\forall \varepsilon > 0$,
the process $X$ does not start afresh at time $T$. (This example is the standard
example of a simple Markov process that is not strong Markov relative to the
$\{\mathcal{F}^\circ_{t+}\}$ filtration.)
For this example, what we need the Ray-Knight compactification to do is
to enforce the strong Markov property by tearing $[0,\infty)$ apart at 0, producing
a 'corrected' state space $F := \{0\} \cup [0+,\infty)$, where $[0+,\infty)$ is homeomorphic to
the usual half-line $[0,\infty)$ and where 0 is now a point isolated from $[0+,\infty)$.
Our process (started at 0) is then modified by setting
$X_t := 0 \in F$ ($t < T$), $X_T := 0+$, $X_t := t - T$ ($t > T$);
and it is now strong Markov, as required.
In this example, the mapping $I \to \varphi(I)$ is not continuous: the Ray-Knight
topology on $I$ is now bigger than the original topology because $\{0\}$ is open in
the RK topology.
(35.9) Lévy's diagonal Q-matrix. Lévy 'jazzed up' Example 35.8 to produce a
remarkable illustrative example for Markov chain theory. Let $I := \mathbb{Q} \cap [0,1]$. Let
$q$ be a strictly positive function on $I \setminus \{1\}$ such that
$\sum_{i \in I \setminus \{1\}} q_i^{-1} < \infty$.
Put $q_1 := 0$. Our process takes values in $[0,1]$. Its sample paths are continuous
increasing functions which spend almost all their time in $I$. (The paths are
'random Cantor functions'.) We now describe the law of the process started at
0. Let $\{T_j: j \in I\}$ be independent exponentially distributed variables, $T_j$ having
rate $q_j$, so that, in particular, $T_1 = \infty$. Define
$X_t := j$ if $\sum_{k < j} T_k \le t < \sum_{k \le j} T_k$
and interpolate $X$ by continuity (or monotonicity). Then $X$ is a simple Markov
process with resolvent satisfying
$r_{ij}(\lambda) = (\lambda + q_j)^{-1} \prod_{i \le k < j} q_k(\lambda + q_k)^{-1}$ ($i \le j$)
as for the pure-birth process in Section 26.
The discussion of Example 35.8 makes it clear that the Ray-Knight compactification
should tear the set $[0,1]$ apart at each point $i$ of $I \setminus \{1\}$, so that the
process will spend an exponentially distributed time at $i$, then jump to $i+$ and
leave $i+$ immediately. The RK topology on $I$ arises from a metric $d$, which, in
this example, is given by
$d(i,j) = E^i(H_j) = \sum_{i \le k < j} q_k^{-1}$ ($i < j$).
Whether or not we deal in compactifications, the process $X$ cannot jump
from a point $i$ of $I$ to a point $j$ of $I$. (The point $i+$ is of course not in $I$; it is a
'fictitious' state from the point of view of chain theory.) Hence the Q-matrix $Q$
of $X$ satisfies $q_{ij} = 0$ whenever $i \ne j$. Thus $Q$ is the diagonal $I \times I$ matrix
$Q = \mathrm{diag}(-q_i) = (-q_i\delta_{ij})$.
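To get a feel for the metric $d$, one can experiment with a finite truncation of the example. The sketch below is purely illustrative and rests on assumptions of our own: only the rationals with denominator at most `max_den` are listed, and the rates are the made-up values $q = 2, 4, 8, \ldots$ assigned in increasing order of the listed points (any summable choice of $q_i^{-1}$ would do).

```python
from fractions import Fraction


def rationals_in_unit_interval(max_den):
    """All rationals in [0, 1] with denominator at most max_den, in increasing order."""
    pts = {Fraction(n, d) for d in range(1, max_den + 1) for n in range(d + 1)}
    return sorted(pts)


def make_rates(points):
    """Hypothetical rates: q = 2, 4, 8, ... along the listed points other than 1; q_1 = 0."""
    rates = {}
    n = 0
    for p in points:
        if p == 1:
            rates[p] = Fraction(0)
        else:
            rates[p] = Fraction(2) ** (n + 1)
            n += 1
    return rates


def d(i, j, points, rates):
    """d(i, j) = E^i(H_j): the sum of 1/q_k over the listed k with i <= k < j (for i <= j)."""
    return sum((1 / rates[k] for k in points if i <= k < j), Fraction(0))


if __name__ == "__main__":
    pts = rationals_in_unit_interval(4)
    q = make_rates(pts)
    half = Fraction(1, 2)
    print(d(Fraction(0), half, pts, q), d(half, Fraction(1), pts, q), d(Fraction(0), Fraction(1), pts, q))
```

In the truncated model the mean passage times add along the line, $d(i,j) + d(j,k) = d(i,k)$ for $i < j < k$, which is the finite shadow of the formula $d(i,j) = \sum_{i \le k < j} q_k^{-1}$.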
Ray's Theorem: analytical part
36. From semigroup to resolvent. Here is Part 1 of Ray's amazing achievement.
(36.1) THEOREM (Ray's Theorem: Part 1). Let $\{R_\lambda\}$ be a Ray resolvent on a
compact metric space $F$. Thus $\{R_\lambda\}$ is an honest Feller resolvent, and the space $\mathcal{L}$
in Lemma 34.6 is dense in $C(F)$. Then there exists a unique honest measurable
transition function $\{P_t\}$ on $(F, \mathcal{B}(F))$ (see Section 3) such that
(i) $t \mapsto P_t f(x)$ is right-continuous on $[0,\infty)$ for $x \in F$ and $f \in C(F)$;
(ii) $R_\lambda f(x) = \int_0^\infty e^{-\lambda t} P_t f(x)\,dt$ ($f \in C(F)$, $x \in F$, $\lambda > 0$).
Points to note
(a) It is not true in general that $P_t: C(F) \to C(F)$.
(b) It is not true in general that $P_0 = I$, though, of course,
$P_0 P_t = P_t P_0 = P_t$, $\forall t \ge 0$.
(c) It is not true in general that $t \mapsto P_t f(x)$ is continuous for $f \in C(F)$ and $x \in F$.
An example illustrating these points will be given at the end of the next section.
Getoor [1] and Dellacherie and Meyer [1, Chapter XII] prove the theorem
from first principles, that is, without using the Hille-Yosida Theorem. Since we
already have the HY Theorem, we give a proof based on it, guided in part by
Meyer [2].
Proof of Theorem 36.1. Because of the importance of the theorem, we go
carefully through the proof. Throughout the proof α denotes a fixed positive
number. Though α is used in its construction, the final transition semigroup
$\{P_t\}$ will, of course, be independent of $\alpha$.
We make use of the well-known fact that if $(x_{mn})$ is a double sequence such
that both $m \mapsto x_{mn}$ and $n \mapsto x_{mn}$ are monotone non-decreasing then
$\lim_m \lim_n x_{mn} = \lim_n \lim_m x_{mn}$.
(This is of course what underlies the Monotone-Convergence Theorem.) As a
consequence, we see that
(36.2) the limit of a non-decreasing sequence of right-continuous
non-increasing functions mapping [0, oo) into [0, oo) is
right-continuous.
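Here is a sketch of how (36.2) follows from the double-sequence fact. The names $g_n$, $g$ and $t_m$ are introduced for illustration: $g_n \uparrow g$ pointwise on $[0,\infty)$, each $g_n$ is non-increasing and right-continuous, and $t_m \downarrow t$.

```latex
% Put x_{mn} := g_n(t_m).  Then x_{mn} is non-decreasing in n (because g_n increases with n)
% and non-decreasing in m (because t_m decreases and each g_n is non-increasing).
\[
\lim_m g(t_m) = \lim_m \lim_n g_n(t_m)
              = \lim_n \lim_m g_n(t_m)
              = \lim_n g_n(t)
              = g(t).
\]
```

The third equality is the right-continuity of each $g_n$ at $t$; the limit $g$ is also non-increasing, being a pointwise limit of non-increasing functions.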
Strategy. The idea of the proof is to begin by using the Hille-Yosida Theorem
to establish results on the domain $Z_0$ of strong continuity of the resolvent; to
extend these results by monotonicity to functions in $\mathrm{CSM}^\alpha$; then by linearity
to functions in $\mathcal{L}^\alpha := \mathrm{CSM}^\alpha - \mathrm{CSM}^\alpha$; then by continuity to functions in $C(F)$,
since $\mathcal{L}^\alpha$ is dense in $C(F)$; then by the Monotone-Class Theorem to functions
in $b\mathcal{B}(F)$. But we have to be careful of the order in which this strategy is applied.
Step 1: Write
(36.3) $\mathcal{R} := R_\lambda C(F)$ (independently of $\lambda > 0$), $Z_0 := \bar{\mathcal{R}}$,
the closure being in $C(F)$ or $b\mathcal{B}(F)$. By the Hille-Yosida Theorem, there exists
a strongly continuous contraction semigroup $\{Q_t: t \ge 0\}$ on $Z_0$ with resolvent
the restriction of $\{R_\lambda\}$ to $Z_0$. It is clear from (5.4) and (5.5) that each $Q_t$ is
positive on $Z_0$ in that if $f \in Z_0^+$ (that is, $f \in Z_0$ and $f \ge 0$) then $Q_t f \ge 0$. The
constant function 1 is in $Z_0$, and $Q_t 1 = 1$ for all $t$. Finally, for $f \in Z_0$, $\lambda R_{\lambda+\alpha} f \to f$
as $\lambda \to \infty$, whence
(36.4) for $f \in Z_0$ and $t \ge 0$, $Q_t(\lambda R_{\lambda+\alpha} f) \to Q_t f$ (strong topology).
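Step 2: Let $t \ge 0$ and let $f \in \mathrm{CSM}^\alpha$. The map $\lambda \mapsto \lambda R_{\lambda+\alpha} f$ is non-decreasing
because, for $0 < \lambda < \nu$,
$\nu R_{\nu+\alpha} f - \lambda R_{\lambda+\alpha} f = (\nu - \lambda)R_{\nu+\alpha}[f - \lambda R_{\lambda+\alpha} f] \ge 0$.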
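The identity just used is a one-line consequence of the resolvent equation; the following sketch spells it out, with nothing assumed beyond that equation and the positivity of the resolvent.

```latex
% Resolvent equation:  R_{\lambda+\alpha} - R_{\nu+\alpha} = (\nu-\lambda)\,R_{\nu+\alpha}R_{\lambda+\alpha}.
\begin{align*}
(\nu-\lambda)R_{\nu+\alpha}\bigl[f - \lambda R_{\lambda+\alpha} f\bigr]
  &= (\nu-\lambda)R_{\nu+\alpha} f - \lambda\bigl(R_{\lambda+\alpha} f - R_{\nu+\alpha} f\bigr) \\
  &= \nu R_{\nu+\alpha} f - \lambda R_{\lambda+\alpha} f .
\end{align*}
```

The left-hand side is non-negative because $f - \lambda R_{\lambda+\alpha} f \ge 0$ for $f \in \mathrm{CSM}^\alpha$ and $R_{\nu+\alpha}$ is a positive operator.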
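Since $\lambda R_{\lambda+\alpha} f \in \mathcal{R}$, so that $Q_t(\lambda R_{\lambda+\alpha} f)$ is defined, we may (as we must!) define
(36.5) for $f \in \mathrm{CSM}^\alpha$ and $x \in F$, $P_t f(x) := \uparrow\lim_{\lambda \uparrow\uparrow \infty} \{Q_t(\lambda R_{\lambda+\alpha} f)\}(x)$.
The formula (36.5) and the linearity and positivity of $\{Q_t\}$ make it clear that if
$f_1, f_2 \in \mathrm{CSM}^\alpha$ and $c_1, c_2 \in [0,\infty)$ then
$P_t(c_1 f_1 + c_2 f_2) = c_1 P_t f_1 + c_2 P_t f_2$,
and if $f, g \in \mathrm{CSM}^\alpha$ with $f \ge g$ then
$P_t f \ge P_t g$.
It is clear on comparing (36.5) with (36.4) that
$P_t = Q_t$ on $R_\alpha(C(F)^+)$.
The map $P_t$ extends uniquely by linearity to a positive map
$P_t: \mathcal{L}^\alpha \to b\mathcal{B}(F)$, where $\mathcal{L}^\alpha := \mathrm{CSM}^\alpha - \mathrm{CSM}^\alpha$,
and $P_t$ agrees with $Q_t$ on $\mathcal{R}$. Because $P_t$ is positive and linear and $P_t 1 = 1$, and
since $\mathcal{L}^\alpha$ is dense in $C(F)$ (by Lemma 34.7),
(36.6) $P_t$ has a unique extension to a bounded linear operator $P_t: C(F) \to b\mathcal{B}(F)$.
We know from Theorem 6.2 that there exists a unique kernel $P_t: F \times \mathcal{B}(F) \to [0,1]$
such that
$x \mapsto P_t(x,B)$ is $\mathcal{B}(F)$-measurable for each $B \in \mathcal{B}(F)$,
$B \mapsto P_t(x,B)$ is a probability measure on $(F, \mathcal{B}(F))$ for each $x \in F$,
$P_t f(x) = \int_F P_t(x,dy)\,f(y)$, $\forall f \in C(F)$.
The map $f \mapsto P_t f$ from $C(F)$ to $b\mathcal{B}(F)$ therefore extends canonically to a map
from $b\mathcal{B}(F)$ to $b\mathcal{B}(F)$.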
Step 3: We claim that the linear operators $\{P_t: t \ge 0\}$ on $b\mathcal{B}(F)$ have the
semigroup property:
(36.7) $P_s P_t = P_{s+t}$ ($s \ge 0$, $t \ge 0$).
Proof of (36.7). We know from the Hille-Yosida Theorem that $P_s P_t f = P_{s+t} f$
for $f \in Z_0$. Now let $f \in \mathrm{CSM}^\alpha$. Then $\lambda R_{\lambda+\alpha} f \in Z_0$, and
$P_s P_t(\lambda R_{\lambda+\alpha} f) = P_{s+t}(\lambda R_{\lambda+\alpha} f)$.
We have $P_t(\lambda R_{\lambda+\alpha} f) \uparrow P_t f$ by (36.5); and, since $P_s$ arises from a kernel, the
Monotone-Convergence Theorem yields
$(P_s P_t f)(x) = (P_{s+t} f)(x)$ ($f \in \mathrm{CSM}^\alpha$, $x \in F$).
The result extends according to the remainder of our strategy: by linearity to
$f \in \mathcal{L}^\alpha$, thence by continuity to $f \in C(F)$, and thence by the Monotone-Class
Theorem to $f \in b\mathcal{B}(F)$.
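Step 4: We now show that
(36.8) for $g \in C(F)^+$ and $x \in F$, $e^{-\alpha t} P_t R_\alpha g(x) \le R_\alpha g(x)$,
and the map $t \mapsto e^{-\alpha t} P_t R_\alpha g(x)$ is right-continuous and non-increasing.
Proof of (36.8). For $\lambda > 0$, let $h_\lambda := \lambda R_{\lambda+\alpha} g \in Z_0$. By the Hille-Yosida Theorem,
we have
$e^{-\alpha t} P_t(\lambda R_{\lambda+\alpha} R_\alpha g) = e^{-\alpha t} P_t R_\alpha h_\lambda = \int_t^\infty e^{-\alpha r} P_r h_\lambda\,dr$
$\le R_\alpha h_\lambda = R_\alpha g - R_{\lambda+\alpha} g \le R_\alpha g$.
But $R_\alpha g \in \mathrm{CSM}^\alpha$, so that, by (36.5),
$(e^{-\alpha t} P_t R_\alpha g)(x) \le (R_\alpha g)(x)$ ($x \in F$).
That
$e^{-\alpha(s+t)} P_{s+t} R_\alpha g(x) \le e^{-\alpha s} P_s R_\alpha g(x)$
is now clear from the semigroup property.
By the principle (36.2), the map $t \mapsto e^{-\alpha t} P_t R_\alpha g(x)$, the non-decreasing limit of
the right-continuous non-increasing maps $t \mapsto e^{-\alpha t} P_t(\lambda R_{\lambda+\alpha} R_\alpha g)(x)$, is
right-continuous.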
Step 5: Next we prove the following.
(36.9) LEMMA. For $f \in \mathrm{CSM}^\alpha$ and $x \in F$, the map $t \mapsto e^{-\alpha t} P_t f(x)$ is non-increasing
and right-continuous.
Remarks. This fact is of central importance in the probabilistic theory.
Proof of Lemma 36.9. We have
$\lambda R_{\lambda+\alpha} f = R_\alpha[\lambda(f - \lambda R_{\lambda+\alpha} f)]$,
and $\lambda(f - \lambda R_{\lambda+\alpha} f) \in C(F)^+$, whence, by (36.8),
$t \mapsto e^{-\alpha t} P_t \lambda R_{\lambda+\alpha} f(x)$ is non-increasing and right-continuous.
The desired result now follows from (36.5) and the principle (36.2).
Step 6: It is now clear, since $\mathcal{L}^\alpha$ is dense in $C(F)$, that,
(36.10) for $f \in C(F)$, the map $t \mapsto P_t f(x)$ is right-continuous.
The Monotone-Class Theorem shows that $P_t$ defines a measurable transition
function on $(F, \mathcal{B}(F))$, and all that remains is to confirm that, for $f \in C(F)$ (or for
$f \in b\mathcal{B}(F)$), for $x \in F$ and $\lambda > 0$, we have
(36.11) $\int_0^\infty e^{-\lambda t} P_t f(x)\,dt = R_\lambda f(x)$.
Proof of (36.11). If $f \in Z_0$ then (36.11) holds by the Hille-Yosida Theorem.
The argument is now completed by the machinery in our strategy. □
(36.12) Exercise. Prove that
$Z_0 = \{f \in C(F): P_0 f = f\}$.
Hint. Compare Lemma 6.7.
37. Branch-points. We continue with the notation and hypotheses of Section 36.
The set $F_e$ of non-branch-points of $F$ (for $\{P_t\}$) is defined as follows:
$F_e := \{x \in F: P_0 f(x) = f(x), \forall f \in C(F)\}$
$= \{x \in F: P_0(x,\cdot) = \varepsilon_x\}$.
The set $F_{br}$ of branch-points is defined as $F_{br} := F \setminus F_e$.
The proper explanation of the role of branch-points escapes the analysis, and
has to wait for the probability: you need to see the paths of the process.
(37.1) LEMMA. A semigroup on a compact metric space F is FD if and only if
it is a Ray semigroup without branch-points.
(Of course, a 'Ray semigroup' is a semigroup derived from a Ray resolvent as
described in Theorem 36.1.) Lemma 37.1 is an immediate consequence of the
proof of Lemma 6.7.
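Now let $(h_m)$ be a dense sequence in $\mathrm{CSM}^1$. Then
$P_0 h_m = \uparrow\lim_{\lambda \uparrow \infty} \lambda R_{\lambda+1} h_m \le h_m$, $\forall m$.
Since $\mathcal{L} = \mathrm{CSM}^1 - \mathrm{CSM}^1$ is dense in $C(F)$,
$F_e = \bigcap_m \{P_0 h_m = h_m\}$
$= \bigcap_m \bigcap_n \{P_0 h_m > h_m - n^{-1}\}$
$= \bigcap_m \bigcap_n \left( \bigcup_\lambda \{\lambda R_{\lambda+1} h_m > h_m - n^{-1}\} \right)$,
so that, since the set in large parentheses is open,
(37.2) $F_e$ is a $G_\delta$ in $F$.
In particular, $F_e \in \mathcal{B}(F)$.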
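Let $\mu$ be a probability measure on $(F, \mathcal{B}(F))$. We write $\mu P_t$ for the measure
$\mu P_t(\cdot) := \int_F \mu(dx)\,P_t(x,\cdot)$.
However, it is better to think in terms of the functional notation
$(\mu, g) := \mu(g) := \int g\,d\mu$
for measures, and define $\mu P_t$ via
$(\mu P_t, f) = (\mu, P_t f)$.
(37.3) LEMMA. For $\mu \in \mathrm{Pr}(F)$,
$\mu P_0 = \mu$ if and only if $\mu(F_{br}) = 0$.
Proof. With $(h_m)$ as above, we have
$\mu P_0 = \mu \Leftrightarrow (\mu P_0, h_m) = (\mu, h_m)$, $\forall m$,
$\Leftrightarrow \mu\{P_0 h_m > h_m - n^{-1}\} = 1$, $\forall m, n$.
(37.4) COROLLARY. $P_t(x, F_{br}) = 0$, $\forall x \in F$, $\forall t \ge 0$.
Proof. Since $P_t P_0 = P_t$, the measure $\mu := \varepsilon_x P_t$ satisfies $\mu P_0 = \mu$. □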
(37.5) Example. Here is a rather artificial example to illustrate points (a), (b)
and (c) made after the statement of Theorem 36.1. Let $F := [-1,1]$. We take a
process that while away from 0 drifts towards 0 at rate 1, and which on
approaching 0 jumps (or branches) to $+1$ or $-1$ with probability $\tfrac12$ each. For
$x \ge 0$,
$P_t f(x) = f(x - t)$ ($t < x$),
$P_t f(x) = \tfrac12 f(1 - \langle t - x \rangle) + \tfrac12 f(-1 + \langle t - x \rangle)$ ($t \ge x$),
where $\langle \cdot \rangle$ denotes 'fractional part of'. For $x < 0$,
$P_t f(x) = f(x + t)$ ($t < |x|$),
$P_t f(x) = P_t f(-x)$ ($t \ge |x|$).
You can see that $P_t$ does not map $C(F)$ to $C(F)$ and that $t \mapsto P_t f(x)$ has points
of discontinuity.
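The example is simple enough to experiment with directly. The following minimal sketch implements the operator described in words above (drift towards 0 at unit speed, then a fair branch to $\pm 1$, repeated); the function names and the test functions are illustrative choices, and the code is a sketch of the example rather than part of the theory.

```python
import math


def frac(u):
    """Fractional part of u (for u >= 0)."""
    return u - math.floor(u)


def P(t, f, x):
    """P_t f(x) for the branching example on F = [-1, 1]."""
    a = abs(x)
    if t < a:
        # Still drifting towards 0 at unit speed.
        return f(x - t) if x > 0 else f(x + t)
    # The point 0 was approached at time |x|; from then on the position is
    # +(1 - <t - |x|>) or -(1 - <t - |x|>), each with probability 1/2.
    r = 1.0 - frac(t - a)
    return 0.5 * f(r) + 0.5 * f(-r)


if __name__ == "__main__":
    def square(y):
        return y * y

    # Point (b): P_0 is not the identity at the branch-point 0.
    print("P_0 f(0) =", P(0.0, square, 0.0), " but f(0) =", square(0.0))
    # Point (c): t -> P_t f(0.3) jumps at t = 0.3 (it is right-continuous there).
    print("P_t f(0.3) just before and at t = 0.3:",
          P(0.3 - 1e-9, square, 0.3), P(0.3, square, 0.3))
    # Semigroup property P_s P_t = P_{s+t} at a sample point.
    s, t, x = 0.5, 0.9, 0.3
    print(P(s, lambda y: P(t, square, y), x), P(s + t, square, x))
```

With $f(y) = y^2$ the first line of output exhibits $P_0 f(0) = \tfrac12 f(1) + \tfrac12 f(-1) \ne f(0)$, so that 0 is a branch-point.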
Now, for $x > 0$,
$R_\lambda f(x) = \int_0^x e^{-\lambda t} f(x - t)\,dt + e^{-\lambda x} R_\lambda f(0)$,
with a similar 'Dynkin formula' for $x < 0$. It follows that $R_\lambda: C(F) \to C(F)$. The
result $Z_0 = \{f \in C(F): P_0 f = f\}$ of Ray theory leads us to believe that
$Z_0 := \mathrm{closure}\ R_\lambda C(F) = \{f \in C(F): f(0) = \tfrac12 f(1) + \tfrac12 f(-1)\}$,
and this result is easily verified directly. The fact that $\{R_\lambda\}$ is indeed a Ray
resolvent now follows immediately from the fact that $Z_0$ separates points of $F$.
38. Choquet representation of 1-excessive probability measures. We continue
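with the notation of Sections 36 and 37. For $\alpha > 0$, a probability measure $\mu$ on
$(F, \mathcal{B}(F))$ is called $\alpha$-supermedian (relative to $\{R_\lambda\}$) if
$\lambda\mu R_{\lambda+\alpha} \le \mu$, $\forall \lambda > 0$,
that is, if
$\mu(\lambda R_{\lambda+\alpha} f) \le \mu(f)$, $\forall f \in C(F)^+$, $\forall \lambda$.
For every $\mu$ in $\mathrm{Pr}(F)$ and every $\alpha \ge 0$, we have (as $\lambda \to \infty$)
$\mu(\lambda R_{\lambda+\alpha} f) \to \mu(P_0 f)$, $\forall f \in C(F)$,
so that, in the weak topology of $\mathrm{Pr}(F)$,
$\lambda\mu R_{\lambda+\alpha} \to \mu P_0$.
Hence a probability measure $\mu$ on $(F, \mathcal{B}(F))$ will be called $\alpha$-excessive (relative to
$\{R_\lambda\}$) if
(38.1)(i) $\mu$ is $\alpha$-supermedian;
(38.1)(ii) $\mu = \mu P_0$, or equivalently (Lemma 37.3) $\mu(F_{br}) = 0$.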
Fix α = 1 for convenience. The set of 1-excessive probability measures on F
is easily shown to satisfy the hypotheses of both the existence and uniqueness
parts of Choquet's representation theorem. However, we can now obtain the
explicit form of the representation by simple direct methods.
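(38.2) THEOREM (Ray's theorem: Part 2). Let $\mu$ be a 1-excessive element of
$\mathrm{Pr}(F)$. Then there exists a unique element $\nu$ of $\mathrm{Pr}(F)$ such that $\nu(F_{br}) = 0$ and
$\nu R_1 = \mu$.
Proof. For $\lambda > 0$, set
$\nu_\lambda := (\lambda + 1)(\mu - \lambda\mu R_{\lambda+1})$.
Because $\{R_\lambda\}$ is honest, $\nu_\lambda \in \mathrm{Pr}(F)$. Recall that $\mathrm{Pr}(F)$ is compact and let $\tilde\nu$ be
any limit point of $\nu_\lambda$ as $\lambda \to \infty$. The resolvent equation shows that
$\nu_\lambda R_1 = (\lambda + 1)\mu R_{\lambda+1}$,
so that on letting $\lambda \to \infty$ through a suitable sequence,
$\tilde\nu R_1 = \mu P_0 = \mu$.
Put $\nu = \tilde\nu P_0$. Since $\nu = \nu P_0$, we have $\nu(F_{br}) = 0$; and, since $P_0 R_1 = R_1$, we have
$\nu R_1 = \tilde\nu R_1 = \mu$.
Now if $\nu^* R_1 = \mu$ and $\nu^* P_0 = \nu^*$ for some $\nu^* \in \mathrm{Pr}(F)$ then (as $\lambda \to \infty$)
$\nu_\lambda = (\lambda + 1)\nu^*(R_1 - \lambda R_1 R_{\lambda+1}) = (\lambda + 1)\nu^* R_{\lambda+1} \to \nu^* P_0 = \nu^*$,
so that $\nu^* = \nu$. □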
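The two facts about $\nu_\lambda$ used in the proof can be checked as follows; this is a sketch relying only on honesty ($\lambda R_\lambda 1 = 1$), the 1-supermedian property of $\mu$ and the resolvent equation.

```latex
% Total mass and positivity of \nu_\lambda:
\[
\nu_\lambda(1) = (\lambda+1)\bigl(1 - \lambda\,\mu(R_{\lambda+1}1)\bigr)
              = (\lambda+1)\Bigl(1 - \frac{\lambda}{\lambda+1}\Bigr) = 1,
\qquad
\nu_\lambda \ge 0 \ \text{ since } \ \lambda\mu R_{\lambda+1} \le \mu .
\]
% Resolvent equation:  R_1 - R_{\lambda+1} = \lambda R_{\lambda+1} R_1 = \lambda R_1 R_{\lambda+1}.
\begin{align*}
\nu_\lambda R_1
  &= (\lambda+1)\bigl(\mu R_1 - \lambda\,\mu R_{\lambda+1}R_1\bigr) \\
  &= (\lambda+1)\bigl(\mu R_1 - \mu(R_1 - R_{\lambda+1})\bigr)
   = (\lambda+1)\,\mu R_{\lambda+1}.
\end{align*}
```

The same resolvent identity gives $(\lambda+1)\nu^*(R_1 - \lambda R_1 R_{\lambda+1}) = (\lambda+1)\nu^* R_{\lambda+1}$ in the uniqueness part.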
Ray's Theorem: Probabilistic part
39. The Ray process associated with a given entrance law. Our treatment of FD
processes was designed to make the transition to Ray processes as easy as
possible; and we shall not dwell on those 'FD' arguments which apply in the
present context. We shall highlight those places where the Ray theory is more
subtle.
For the Martin-boundary application (and for other purposes), we need to
deal with processes with time-parameter set $(0,\infty)$ open at 0. (The $\hat X$ process in
(33.1) is not defined at time 0.)
Let $\rho = \{\rho_t: t > 0\}$ be a probability entrance law for $\{P_t\}$, that is, a family of
elements of $\mathrm{Pr}(F)$ satisfying
(39.1) $\rho_{s+t} = \rho_s P_t$, $\forall s, t > 0$.
Note that, for $0 < \varepsilon < t$,
$\rho_t P_0 = \rho_\varepsilon P_{t-\varepsilon} P_0 = \rho_\varepsilon P_{t-\varepsilon} = \rho_t$,
so that
(39.2) $\rho_t(F_e) = 1$, $\forall t > 0$.
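Step 1: use of Daniell-Kolmogorov Theorem. For $\omega \in F^{(0,\infty)}$, write $Y_t(\omega) = \omega(t)$;
and put $\mathcal{F}^{(0,\infty)} := \sigma\{Y_t: t > 0\}$. The Daniell-Kolmogorov theorem implies the
existence of a unique measure $P^\rho$ on $(F^{(0,\infty)}, \mathcal{F}^{(0,\infty)})$ such that, for
$0 < t_1 < t_2 < \cdots < t_n$ and $x_1, x_2, \ldots, x_n \in F$,
(39.3) $P^\rho[Y_{t_1} \in dx_1; Y_{t_2} \in dx_2; \ldots; Y_{t_n} \in dx_n]$
$= \rho_{t_1}(dx_1)\,P_{t_2-t_1}(x_1,dx_2) \cdots P_{t_n-t_{n-1}}(x_{n-1},dx_n)$.
In particular, (39.2) shows that
(39.4) $P^\rho[Y_t \in F_e] = \rho_t(F_e) = 1$, $\forall t > 0$.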
Step 2: regularising Y to produce X. If $h \in CSM^1$ then (Lemma 36.9) $e^{-t} P_t h \le h$, $\forall t$, so that
(39.5) $\{e^{-t} h(Y_t) : 0 < t < \infty\}$
is a $P^p$ supermartingale relative to the filtration induced by the Y-process. The crucial property that $CSM^1$ separates points now allows us to repeat the argument of (7.9) to show that, a.s.($P^p$), the limit
$X_t := \lim_{q \downarrow\downarrow t} Y_q$
exists for all $t > 0$ and $t \mapsto X_t$ is Skorokhod from $(0, \infty)$ to F. The proof that
$P^p[X_t = Y_t] = 1$, $\forall t > 0$,
proceeds as did the proof of (7.12), but you find that you now need to use (39.4).
Step 3: extending X to t = 0. By applying (II.69.4) to supermartingales of the form (39.5), we see that
(39.6) $X_0 := \lim_{t \downarrow\downarrow 0} X_t$ exists in F
and the process X is an R-process in $[0, \infty)$.
Set $p_0 := P^p \circ X_0^{-1}$. By right-continuity of paths,
(39.7) $p_0 = \lim_{t \downarrow\downarrow 0} p_t$ in Pr(F).
We must now check that
(39.8) $p_0 P_t = p_t$, $\forall t$,
so that (39.3), with X replacing Y, extends to the case when $0 \le t_1 \le t_2 \le \cdots \le t_n$.
It is tempting to write $(p_\varepsilon, P_t f) = (p_{t+\varepsilon}, f)$, $f \in C(F)$, and to try to deduce (39.8) by letting $\varepsilon \downarrow\downarrow 0$. However, $P_t f$ need not be continuous.
We can prove (39.8), with important extra information, as follows. We have, for $\lambda > 0$, $\varepsilon > 0$ and $f \in C(F)$,
$(p_\varepsilon, R_\lambda f) = \int_0^\infty e^{-\lambda t}(p_{t+\varepsilon}, f)\,dt = E^p \int_0^\infty e^{-\lambda t} f(X_{t+\varepsilon})\,dt$;
and we can let $\varepsilon \downarrow\downarrow 0$, using the facts that $R_\lambda f$ is continuous on F and that X is right-continuous, to obtain
$(p_0, R_\lambda f) = E^p \int_0^\infty e^{-\lambda t} f(X_t)\,dt = \int_0^\infty e^{-\lambda t}(p_t, f)\,dt$.
Now $p_0 R_1$ is clearly 1-supermedian, and, because of (39.2), it is 1-excessive. Hence, by Theorem 38.2, $p_0 R_1 = \nu R_1$ for a unique $\nu$ carried by $F_e$. From the resolvent equation, $p_0 R_\lambda = \nu R_\lambda$ for every $\lambda$. We now know that for $f \in C(F)$, we have, for almost all t,
(39.9) $(\nu, P_t f) = (p_0, P_t f) = E^p f(X_t) = (p_t, f)$.
But $t \mapsto P_t f(x)$ is right-continuous for each x, and X is an R-process. Hence, by right-continuity, equality holds in (39.9) for all $t \ge 0$ and all $f \in C(F)$. Thus, (39.8) holds. Since $\nu P_0 = \nu$, we see that $p_0 = P^p \circ X_0^{-1} = \nu$.
Since $p_0 = \nu$, so that $p_0(F_e) = 1$, we obtain the following improvement of (39.6):
(39.10) $X_0 := \lim_{t \downarrow\downarrow 0} X_t \in F_e$, a.s.($P^p$).
It is (39.10), applied in reversed time, that gives the Doob-Hunt Convergence
Theorem for the Martin boundary.
40. Strong Markov property of Ray processes. Let $\Omega$ now denote the space of R-paths $\omega$ from $[0, \infty)$ to F. (For $\omega \in \Omega$, we define $\omega(\infty) = \partial$, where $\partial$ is either a 'new' coffin state or the 'old' coffin state already adjoined to arrange honesty; it does not matter.) Let X now denote the coordinate process, $X_t(\omega) := \omega(t)$, let $\mathcal{F}^\circ := \sigma\{X_t : t \ge 0\}$ and let $\mathcal{F}^\circ_t := \sigma\{X_s : s \le t\}$.
For any $\mu \in$ Pr(F),
(40.1) $p_t := \mu P_t$
defines an entrance law p. The results of Section 39 show that we can define a corresponding law $P^p$ on $(\Omega, \mathcal{F}^\circ)$. Our new $P^p$ is what we called $P^p \circ X^{-1}$ in Section 39. Now, with p as in (40.1), it is natural to do what we did in FD theory and write $P^\mu$ for the measure $P^p$ on $(\Omega, \mathcal{F}^\circ)$. The description of $\mu$ as 'initial law' is no longer accurate, however, since
$P^\mu \circ X_0^{-1} = \mu P_0$.
(We shall see shortly that we can think of $\mu$ as the $P^\mu$ law of X instantaneously before time 0.) As usual, we shall write $P^x$ ($x \in F$) for $P^{\varepsilon_x}$. Then
$P^x[X_0 \in \Gamma] = P_0(x, \Gamma)$ ($\Gamma \in \mathcal{B}(F)$),
and, for $0 \le t_1 \le t_2 \le \cdots \le t_n$,
$P^x[X_0 \in dx_0; X_{t_1} \in dx_1; \ldots; X_{t_n} \in dx_n] = P_0(x, dx_0) P_{t_1}(x_0, dx_1) \cdots P_{t_n - t_{n-1}}(x_{n-1}, dx_n)$.
Of course
$P^\mu(\cdot) = \int_F P^x(\cdot)\,\mu(dx)$ on $\mathcal{F}^\circ$.
The setup
$(\Omega, \mathcal{F}^\circ, \mathcal{F}^\circ_t, P^x, \theta_t, X_t)$
is the Ray process with transition function $\{P_t\}$. It has the simple Markov property.
Because $P_t$ need not map C(F) to C(F), we have to use a 'Laplace-transformed' version of the proof of Theorem 8.3 for FD processes. Here are the necessary modifications to that proof. For T an a.s. finite $\{\mathcal{F}^\circ_{t+}\}$ stopping time, $T^{(n)}$ its nth dyadic approximation, $\Lambda \in \mathcal{F}^\circ_{T+}$ and $f \in C(F)$, we have
$\int_0^\infty e^{-\lambda s} E^\mu[f \circ X(T^{(n)} + s); \Lambda]\,ds = E^\mu[(R_\lambda f) \circ X(T^{(n)}); \Lambda]$.
Letting $n \to \infty$ and using right-continuity of paths and the fact that $R_\lambda f \in C(F)$, we find that
$\int_0^\infty e^{-\lambda s} E^\mu[f \circ X(T + s); \Lambda]\,ds = E^\mu[(R_\lambda f) \circ X(T); \Lambda] = \int_0^\infty e^{-\lambda s} E^\mu[(P_s f) \circ X(T); \Lambda]\,ds$.
Hence
(40.2) $E^\mu[f \circ X(T + s); \Lambda] = E^\mu[P_s f \circ X(T); \Lambda]$
for almost all s. But (same old argument!) each side of (40.2) is right-continuous in s, so that (40.2) holds for all s.
We can now follow exactly the same course as we did in Section 9, introducing various completions etc., and establish that the 'Ray process proper'
$(\Omega, \mathcal{F}, \mathcal{F}_t, P^x, \theta_t, X_t)$
is strong Markov. Assume this done.
41. The role of branch-points. We already know that for fixed $t \ge 0$, $X_t \in F_e$ a.s. We now prove the much stronger result that
(41.1) almost surely, X never visits $F_{br}$: $P^x[X_t \in F_e, \forall t \ge 0] = 1$, $\forall x$.
Proof of (41.1). The 'right' way to prove this is by using Meyer's section theorem (Theorem II.76.2), but there is an elementary proof avoiding capacitability theory (see the exercise below). The Section Theorem shows that it is enough to prove that if T is a finite stopping time and $x \in F$ then
(41.2) $P^x[X(T) \in F_{br}] = 0$.
But $p_t := P^x \circ X_{T+t}^{-1}$ defines a probability entrance law because of the Strong Markov Theorem, and (41.2) now follows immediately from Step 3 of Section 39.
Exercise. Deduce (41.1) from (41.2) by using Lemma II.75.1 and the fact (37.2) that $F_{br}$ is $K_\sigma$ (countable union of compact sets).
The 'exploding birth process' example in Section 26 shows that X can have left limits in $F_{br}$. Thus X may be able to approach $F_{br}$, but it will branch at the moment of approach. The following theorem, which is the 'Ray' analogue of Blumenthal's Quasi-Left-Continuity Theorem (Theorem 11.1), is the true explanation of the role of branch-points.
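(41.3) THEOREM. If $(T_n : n \in \mathbb{N})$ is a strictly increasing sequence of $\{\mathcal{F}_t\}$ stopping times with $T_n \uparrow\uparrow T < \infty$ then, for $x \in F$ and $\Gamma \in \mathcal{B}(F)$,
$P^x[X_T \in \Gamma \mid \bigvee \mathcal{F}_{T(n)}] = P_0(X_{T-}, \Gamma)$,
where $\bigvee \mathcal{F}_{T(n)} := \sigma\{\mathcal{F}_{T(n)} : n \in \mathbb{N}\}$.
Proof. Let $\Lambda \in \mathcal{F}_{T(k)}$ for some k. Then, for $f \in C(F)$, we have, for $n \ge k$,
$\int_0^\infty e^{-\lambda t} E^x[f \circ X(T_n + t); \Lambda]\,dt = E^x[R_\lambda f \circ X(T_n); \Lambda]$.
Since the set of discontinuities of X is countable (why?), and hence of measure 0, we can let $n \to \infty$ to obtain
$\int_0^\infty e^{-\lambda t} E^x[f \circ X(T + t); \Lambda]\,dt = E^x[R_\lambda f \circ X(T-); \Lambda] = \int_0^\infty e^{-\lambda t} E^x[P_t f \circ X(T-); \Lambda]\,dt$.
Hence, by the old right-continuity argument yet again,
$E^x[f \circ X(T + t); \Lambda] = E^x[P_t f \circ X(T-); \Lambda]$
for all $t \ge 0$. In particular,
(41.4) $E^x[f(X_T); \Lambda] = E^x[P_0 f(X_{T-}); \Lambda]$.
Monotone-class arguments now show that (41.4) is true for all $\Lambda \in \bigvee \mathcal{F}_{T(k)}$ and all $f \in b\mathcal{B}(F)$. On taking $f = I_\Gamma$, we obtain the desired result. □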
(41.5) Note. If we set $X_{0-} := X_0$ then Theorem 41.3 and its proof apply to the case when '$T_n \uparrow\uparrow T$' is interpreted in the slightly wider sense that
(i) when $T(\omega) = 0$, $T_n(\omega) = 0$, $\forall n$;
(ii) when $T(\omega) > 0$, $T_n(\omega) \to T(\omega)$ and $T_n(\omega) \le T_{n+1}(\omega) < T(\omega)$, $\forall n$.
From now on, we always allow this wider interpretation of '$T_n \uparrow\uparrow T$' for stopping times.
We have at last finished Ray's Theorem (though Ray's amazing 1959 paper
goes further in certain directions). Recall that we have utilised ideas of Knight
and Kunita and Watanabe as well as those of Ray.
6. APPLICATIONS
Martin boundary theory in retrospect
42. From discrete to continuous time. The idea of combining time reversal with
Ray theory to produce Martin-boundary results goes back to the very important
Kunita and T. Watanabe papers [2,3]. A fine account appears in Meyer [4].
The optimal way to handle the discrete-parameter chain case has been known
in folklore for some time, and the extremely effective device of using time
transformation to make the resolvents into compact operators finds general expression
in the paper [1] by Garcia Alvarez and Meyer.
As in Section 27, let I be a countable set, let $\Pi$ be a substochastic $I \times I$ matrix with Green matrix $\Gamma := \sum_n \Pi^n$ and make the following assumption:
there exists a (reference) point b in I such that $0 < \Gamma(b, j) < \infty$, $\forall j$.
Define the Martin kernel $\kappa$ on $I \times I$:
$\kappa(i, j) := \Gamma(i, j)/\Gamma(b, j)$.
Let J (rather than X; J stands for 'jump chain') now denote a discrete-parameter chain with one-step transition matrix $\Pi$. By observing J with a (discrete) clock that 'ticks' only when J changes position (and which therefore 'ignores' times n when $J(n) = J(n - 1)$), we produce a new chain $\tilde J$ with one-step transition matrix $\tilde\Pi$. You can easily check that
$\tilde\Pi(i, j) = [1 - \delta(i, j)][1 - \Pi(i, i)]^{-1}\Pi(i, j)$
and that the change from $\Pi$ to $\tilde\Pi$ preserves excessive and regular functions and also preserves the Martin kernel. Since, further, results of Doob-Hunt type are invariant under the change from J to $\tilde J$, we may as well make the assumption
$\Pi(i, i) = 0$, $\forall i$.
Now let $q(\cdot)$ be a strictly positive function on I such that
(42.1) $\sum_j \Gamma(j, j)/q(j) < \infty$.
We shall often write q for the diagonal $I \times I$ matrix diag$\{q(i)\}$. Introduce the $I \times I$ matrix
$Q := -q + q\Pi$.
We now let X be a (right-continuous, continuous-parameter) chain on I with Q-matrix Q. It is well known that the duration of each visit by X to state i in I is exponentially distributed with rate q(i) and is independent of the behaviour of X prior to that particular visit. It is also well known that the jumps of X are made in accordance with the law of J. The lifetime $\zeta$ of X is a.s. finite. Indeed, for each i,
$E^i \zeta = \sum_j \Gamma(i, j)/q(j) \le \sum_j \Gamma(j, j)/q(j) < \infty$.
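The $P^i$ probability that X is at j at time t having made the n ($\ge 0$) jumps $i = i_0$ to $i_1$, $i_1$ to $i_2$, ..., $i_{n-1}$ to $i_n = j$ has (as a function of t) Laplace transform
$\Pi_\lambda(i_0, i_1)\Pi_\lambda(i_1, i_2) \cdots \Pi_\lambda(i_{n-1}, i_n)[\lambda + q(i_n)]^{-1}$,
where $\Pi_\lambda$ denotes the substochastic matrix:
$\Pi_\lambda(i, j) := [\lambda + q(i)]^{-1} q(i) \Pi(i, j)$.
Thus the resolvent $\{R_\lambda\}$ of X satisfies
(42.2) $R_\lambda(i, j) = \Gamma_\lambda(i, j)/(\lambda + q(j))$,
where $\Gamma_\lambda$ is the Green matrix of $\Pi_\lambda$:
$\Gamma_\lambda := \sum_n \Pi_\lambda^n$.
The formula (42.2) is intuitively obvious from the formal calculation
$R_\lambda = (\lambda - Q)^{-1} = (\lambda + q - q\Pi)^{-1} = [(\lambda + q)(I - \Pi_\lambda)]^{-1}$.
All of these formulae are due to Feller [3].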
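For a finite state space the formal calculation behind (42.2) is a genuine matrix identity, and it can be checked directly. The sketch below assumes a hypothetical 3-state substochastic matrix `Pi` with zero diagonal and hypothetical rates `q` (both invented for the illustration), and compares $(\lambda - Q)^{-1}$ with $\Gamma_\lambda(i, j)/(\lambda + q(j))$.

```python
import numpy as np

# Hypothetical substochastic jump matrix (zero diagonal) and holding rates.
Pi = np.array([[0.0, 0.5, 0.3],
               [0.4, 0.0, 0.4],
               [0.2, 0.5, 0.0]])
q = np.array([1.0, 2.0, 3.0])
Q = -np.diag(q) + np.diag(q) @ Pi   # Q = -q + q Pi


def resolvent_direct(lam):
    """R_lam computed as (lam - Q)^{-1}."""
    return np.linalg.inv(lam * np.eye(3) - Q)


def resolvent_feller(lam):
    """R_lam computed from (42.2): Gamma_lam(i, j) / (lam + q(j))."""
    Pi_lam = np.diag(q / (lam + q)) @ Pi           # Pi_lam(i, j) = q(i) Pi(i, j) / (lam + q(i))
    Gamma_lam = np.linalg.inv(np.eye(3) - Pi_lam)  # Green matrix: sum over n of Pi_lam^n
    return Gamma_lam / (lam + q)                   # broadcasting divides column j by lam + q(j)


for lam in (0.0, 0.5, 2.0):
    print(lam, np.abs(resolvent_direct(lam) - resolvent_feller(lam)).max())
```

Taking $\lambda = 0$ gives the Green matrix $G = R_0$ with $G(i, j) = \Gamma(i, j)/q(j)$.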
We write $\{P_t\}$ for the transition (matrix) function of X,
$P_t(i, j) := P^i\{X_t = j\}$,
and G for the Green matrix of X,
$G(i, j) = R_0(i, j) = \Gamma(i, j)/q(j)$.
Note that the Martin kernel for X is again $\kappa$.
(42.3) Time-reversal. We now insist that X starts at b, so we work with the $P^b$ law.
Define
$\hat Y(t) := X(\zeta - t)$ ($0 < t \le \zeta$); $\hat Y(t) := \partial$ ($t > \zeta$),
where $\partial$ is a new coffin state isolated from I. Since $\hat Y$ is left-continuous and we wish to work with right-continuous processes, we put
$\hat X(t) := \hat Y(t+)$ ($t > 0$).
We do not define $\hat X(0)$. It is easy to believe that $\{\hat X(t) : t > 0\}$ is a right-continuous ($P^b$) modification of $\{\hat Y(t) : t > 0\}$.
For $t > 0$, define
$\hat p_t(j) := P^b[\hat X(t) = j]$ ($j \in I \cup \partial$).
Then, as we shall see in Section 45, we have the following.
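(42.4) Nagasawa's formula. For $0 < t_1 < t_2 < \cdots < t_n$ and $j_1, j_2, \ldots, j_n \in I \cup \partial$,
$P^b[\hat X(t_1) = j_1, \ldots, \hat X(t_n) = j_n] = \hat p_{t_1}(j_1) \hat P_{t_2 - t_1}(j_1, j_2) \cdots \hat P_{t_n - t_{n-1}}(j_{n-1}, j_n)$,
where
(42.5) $\hat P_t(i, j) := G(b, j) P_t(j, i)/G(b, i)$ ($i, j \in I$),
$\hat P_t(i, \partial) := 1 - \hat P_t(i, I)$, $\hat P_t(\partial, \cdot)$ = unit mass at $\partial$.
You can easily check that $\{\hat P_t\}$ is an honest transition function on $I \cup \partial$. It follows from (42.4) that $\{\hat p_t : t > 0\}$ is a probability entrance law for $\{\hat P_t\}$ on $I \cup \partial$.
The resolvent $\{\hat R_\lambda\}$ of $\hat X$ is obtained by taking Laplace transforms in (42.5). Thus
(42.6) $\hat R_\lambda(i, j) = G(b, j) R_\lambda(j, i)/G(b, i)$ ($i, j \in I$),
$\hat R_\lambda(i, \partial) = \lambda^{-1} - \hat R_\lambda(i, I)$, etc.
In particular, $\hat X$ has Green function $\hat G$ on $I \times I$ satisfying
$\hat G(i, j) = \hat R_0(i, j) = G(b, j)\kappa(j, i)$.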
(42.7) Exercise. Explain (intuitively, since we do not yet have a strong Markov theorem for $\hat X$) why the fact that $\hat X$ can reach $\partial$ from a point i of I only via b corresponds precisely to the result
$\hat R_\lambda(i, \partial) = \hat R_\lambda(i, b)[\hat R_\lambda(b, b)]^{-1}\hat R_\lambda(b, \partial) \le \hat R_\lambda(b, \partial)$,
and prove this result analytically.
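The dual resolvent (42.6) and the identity in Exercise 42.7 can be sanity-checked numerically before attempting the analytic proof. This is a sketch only, for a hypothetical 3-state chain with reference point `b = 0`; the matrices `Pi` and `q` are invented, and a numerical check is of course no substitute for the exercise.

```python
import numpy as np

# Hypothetical substochastic jump matrix, holding rates and reference point.
Pi = np.array([[0.0, 0.5, 0.3],
               [0.4, 0.0, 0.4],
               [0.2, 0.5, 0.0]])
q = np.array([1.0, 2.0, 3.0])
Q = -np.diag(q) + np.diag(q) @ Pi
b = 0


def R(lam):
    """Forward resolvent R_lam = (lam - Q)^{-1}."""
    return np.linalg.inv(lam * np.eye(3) - Q)


G = R(0.0)           # Green matrix G = R_0
kappa = G / G[b]     # Martin kernel kappa(i, j) = G(i, j) / G(b, j)


def R_hat(lam):
    """Dual resolvent on I x I from (42.6): G(b, j) R_lam(j, i) / G(b, i)."""
    return G[b][None, :] * R(lam).T / G[b][:, None]


def R_hat_to_coffin(lam):
    """R_hat_lam(i, coffin) = 1/lam - R_hat_lam(i, I), for lam > 0."""
    return 1.0 / lam - R_hat(lam).sum(axis=1)


lam = 2.0
Rh = R_hat(lam)
k = R_hat_to_coffin(lam)
print("lam * row sums of R_hat:", lam * Rh.sum(axis=1))      # each at most 1
print("left side of (42.7):   ", k)
print("right side of (42.7):  ", Rh[:, b] / Rh[b, b] * k[b])
```

The same arrays give $\hat G(i, j) = G(b, j)\kappa(j, i)$ on setting $\lambda = 0$ in `R_hat`.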
43. Proof of the Doob-Hunt Convergence Theorem. The plan should now be
obvious to you.
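Take the discrete topology on $I \cup \partial$. Then $I \cup \partial$ is an LCCB and $\{\hat R_\lambda\}$ is a resolvent on $I \cup \partial$ satisfying the special 'pre-Ray' hypotheses (35.6). We can therefore build the Ray-Knight compactification $F \cup \partial$ (say) of $I \cup \partial$ determined by $\{\hat R_\lambda\}$. Since the indicator function of the set $\{\partial\}$ is continuous and of compact support on $I \cup \partial$, it follows from the definition in (35.2) that
(43.1) $i \mapsto \hat R_\lambda(i, \partial)$ extends continuously from $I \cup \partial$ to $F \cup \partial$.
But since, by (42.7),
$\hat R_\lambda(i, \partial) \le \hat R_\lambda(b, \partial) < \hat R_\lambda(\partial, \partial)$, $\forall i \in I$,
it is clear that $\partial$ is isolated in $F \cup \partial$. Hence F is a compactification of I.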
(43.2) THEOREM. The Ray-Knight compactification F of I is identical to the Martin compactification $F_M$ (say) of I based on $\kappa$.
That is, there exists a homeomorphism of F to $F_M$ that leaves points of I invariant.
Proof. Let $l_1(I)$ be the usual Banach space of absolutely convergent series $(u_i : i \in I)$ with norm
$\|u\| := \sum_i |u_i| < \infty$.
Usually, but not always, we think of $l_1(I)$ as the space of signed measures u on I with $\|u\|$ as total-variation norm.
The whole of the present proof could be based on the fact that each $\hat R_\lambda$ ($\lambda \ge 0$), acting on the right by multiplication $u \mapsto u\hat R_\lambda$, is a compact operator on $l_1(I)$. (This follows immediately from (43.3) below and the well-known Cohen-Dunford criterion.) However, we shall phrase the argument in terms of uniform integrability, which is the concept underlying the Cohen-Dunford result.
For $\lambda \ge 0$, we have
(43.3) $\hat R_\lambda(i, j) \le \hat R_0(i, j) \le \hat R_0(j, j) = \Gamma(j, j)/q(j)$,
and, since $\sum_j \Gamma(j, j)/q(j) < \infty$, the functions $\{\hat R_\lambda(i, \cdot) : \lambda \ge 0, i \in I\}$ on I are uniformly integrable with respect to counting measure on I. Hence
(43.4) if $(i_n)$ is a sequence in I and $\lambda \ge 0$ is fixed then $\lim_n \hat R_\lambda(i_n, \cdot)$ exists in $l_1(I)$ if and only if $\lim_n \hat R_\lambda(i_n, j)$ exists $\forall j$.
Indeed, this follows directly from (43.3) and the Dominated-Convergence Theorem.
It is immediate from (43.3) that each $\hat R_\lambda$ ($\lambda \ge 0$) acting on the right by multiplication on $l_1(I)$ is a bounded operator, and it is easy to see that the resolvent equation extends to give
$\hat R_0(I - \lambda\hat R_\lambda) = \hat R_\lambda$, $\hat R_0 = \hat R_\lambda(I + \lambda\hat R_0)$
in the language of bounded operators on $l_1(I)$. Since, for example,
$\hat R_0(i_n, \cdot) = \hat R_\lambda(i_n, \cdot)(I + \lambda\hat R_0)$,
we can improve (43.4) to the following form:
(43.5) $\lim_n \hat R_\lambda(i_n, \cdot)$ exists in $l_1(I)$ for some (and then all) $\lambda \ge 0$ if and only if $\lim_n \hat R_0(i_n, j) = \lim_n G(b, j)\kappa(j, i_n)$ exists $\forall j$.
The argument leading to (43.1) with j replacing $\partial$ shows that if $(i_n)$ is a sequence in I converging to $\xi_{RK}$ in the Ray-Knight topology of F, then
$\hat R_\lambda(\xi_{RK}, j) = \lim_n \hat R_\lambda(i_n, j)$ exists $\forall j \in I$.
Hence $\lim \kappa(j, i_n)$ exists $\forall j$, so that $(i_n)$ converges to a point $\xi_M$ of $F_M$ in the Martin topology and
(43.6) $\kappa(j, \xi_M) = G(b, j)^{-1} \sum_{k \in I} \hat R_\lambda(\xi_{RK}, k)(I + \lambda\hat R_0)(k, j)$, $\forall j \in I$,
for every $\lambda > 0$.
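Conversely, suppose that a sequence $(i_n)$ in I converges to a point $\xi_M$ of $F_M$ (in the topology of $F_M$). Then, from (43.5) (you can check that the extra $\partial$ term causes no trouble),
$\lim_n \sum_{j \in I \cup \partial} |\hat R_\lambda(i_n, j) - \hat R_\lambda(\xi_M, j)| = 0$,
where
(43.7) $\hat R_\lambda(\xi_M, j) := \sum_{k \in I} G(b, k)\kappa(k, \xi_M)(I - \lambda\hat R_\lambda)(k, j)$ ($j \in I$); $\hat R_\lambda(\xi_M, \partial) := \lambda^{-1} - \hat R_\lambda(\xi_M, I)$ ($j = \partial$).
Hence, for every bounded function f on I,
$\lim_n \hat R_\lambda f(i_n) = \hat R_\lambda f(\xi_M) := \sum_j \hat R_\lambda(\xi_M, j) f(j)$.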
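By the construction (Section 35) of the Knight algebra $C(F \cup \partial)$, it now follows that $\lim g(i_n)$ exists for every g in $C(F \cup \partial)$. Hence $\xi_{RK} := \lim i_n$ exists in F, and
$\hat R_\lambda(\xi_{RK}, \cdot) = \hat R_\lambda(\xi_M, \cdot)$.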
You can round off the argument (that $\xi_M = \xi_{RK}$ etc.) to your own satisfaction, but (43.6) and (43.7) really say it all. In particular, these results show that $F_e$ and $(F_M)_e$ agree. □
The Doob-Hunt Convergence Theorem 29.1 now follows immediately from (39.10) because $X(\zeta-) = \hat X(0+)$. □
44. The Choquet representation of $\Pi$-excessive functions. Suppose that $0 \le \Pi f \le f$ and $f(b) = 1$; thus, in the notation of Section 27, $f \in S$. It is an elementary fact (used in the proof of Theorem 28.2) that
$f(i) = \uparrow\lim_n \sum_j \Gamma(i, j)\beta_n(j)$
for some non-negative 'charges' $\beta_n$ on I. Thus
$f(i) = \uparrow\lim_n \sum_j G(i, j) q(j) \beta_n(j)$,
and, since $\lambda R_\lambda G = G - R_\lambda \le G$, it follows that f is supermedian for $\{R_\lambda\}$:
$\lambda R_\lambda f \le f$ on I.
Recall from (27.5) that $f(j) \le \theta(j)^{-1}$. It is now convenient to assume (and we do assume) $q(\cdot)$ chosen so that (in addition to (42.1)) we have
$\sum_j G(j, j)\theta(j)^{-1} = \sum_j \Gamma(j, j) q(j)^{-1}\theta(j)^{-1} < \infty$.
Then
$R_0 f(i) = \sum_j G(i, j) f(j) \le \sum_j G(j, j)\theta(j)^{-1}$,
so that $R_0 f$ is bounded on I (indeed, uniformly over f in S). If
$f^1 := f - R_1 f$ ($\ge 0$) on I
then $f^1$ satisfies $\lambda R_{\lambda+1} f^1 \le f^1$ and also
$f = f^1 + R_0 f^1$.
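The resolvents $\{R_\lambda\}$ and $\{\hat R_\lambda\}$ are 'in duality relative to the measure $G(b, \cdot)$ on I' in the sense that
$\langle h_1, R_\lambda h_2 \rangle_{G(b,\cdot)} = \langle \hat R_\lambda h_1, h_2 \rangle_{G(b,\cdot)}$,
where $h_1$ and $h_2$ are non-negative functions on I and
$\langle h_1, h_2 \rangle_{G(b,\cdot)} := \sum_i h_1(i) h_2(i) G(b, i)$.
It is a general principle of potential theory that a 1-excessive function for $\{R_\lambda\}$ is the density relative to $G(b, \cdot)$ of a 1-excessive measure for $\{\hat R_\lambda\}$.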
We can use (respectively, understand) the idea behind this general principle in (via) our simple situation. Put
$\mu^1(j) := f^1(j) G(b, j)$ ($j \in I$).
Then, for $j \in I$,
(44.1) $\lambda\mu^1\hat R_{\lambda+1}(j) = G(b, j)\lambda R_{\lambda+1} f^1(j) \le \mu^1(j)$,
so that $\mu^1$ is (at least) 1-supermedian for $\{\hat R_\lambda\}$ on I. If we ignore technical details, the general principle is 'trivial', but it is extremely useful, as we shall soon see.
Let us now make the cunning definition
$\mu^1(\partial) := f^1(b)$.
By several applications of the resolvent equation, you can check that
$\mu^1(\partial) - \lambda\mu^1\hat R_{\lambda+1}(\partial) = (\lambda + 1)^{-1}[f(b) - (\lambda + 1) R_{\lambda+1} f(b)] \ge 0$,
so that $\mu^1$ is 1-supermedian for $\{\hat R_\lambda\}$ on $I \cup \partial$. Note that
$\mu^1(I \cup \partial) = f^1(b) + G f^1(b) = f(b) = 1$,
so that $\mu^1$ is a probability measure on $I \cup \partial$. Now every point of $I \cup \partial$ is obviously a non-branch-point of $\{\hat R_\lambda\}$ now considered as extended to $F \cup \partial$, so $\mu^1$ can be considered as a 1-excessive probability measure on $F \cup \partial$.
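The resolvent-equation bookkeeping in this construction is easy to get wrong, so a finite numerical check may be reassuring. The sketch below is an illustration under assumptions: a hypothetical 3-state chain, reference point `b = 0`, and an excessive function `f` built as a normalised potential of an invented charge `beta`. It verifies $f = f^1 + R_0 f^1$, that $\mu^1$ has total mass 1 on $I \cup \partial$, and that the supermedian defect at $\partial$ matches the displayed formula.

```python
import numpy as np

# Hypothetical chain data; the coffin state is stored as index n.
Pi = np.array([[0.0, 0.5, 0.3],
               [0.4, 0.0, 0.4],
               [0.2, 0.5, 0.0]])
q = np.array([1.0, 2.0, 3.0])
Q = -np.diag(q) + np.diag(q) @ Pi
n, b = 3, 0


def R(lam):
    """Forward resolvent R_lam = (lam - Q)^{-1}."""
    return np.linalg.inv(lam * np.eye(n) - Q)


G = R(0.0)


def R_hat(lam):
    """Dual resolvent on I plus coffin, following (42.6); requires lam > 0."""
    out = np.zeros((n + 1, n + 1))
    out[:n, :n] = G[b][None, :] * R(lam).T / G[b][:, None]
    out[:n, n] = 1.0 / lam - out[:n, :n].sum(axis=1)
    out[n, n] = 1.0 / lam
    return out


beta = np.array([0.3, 0.1, 0.6])      # invented non-negative charge
f = G @ beta
f = f / f[b]                          # normalise so that f(b) = 1
f1 = f - R(1.0) @ f                   # f^1 := f - R_1 f
mu1 = np.append(f1 * G[b], f1[b])     # mu^1 on I, plus the 'cunning' value at the coffin


def defect(lam):
    """mu^1 - lam * mu^1 R_hat_{lam+1}; non-negative when mu^1 is 1-supermedian."""
    return mu1 - lam * (mu1 @ R_hat(lam + 1.0))


print("total mass of mu^1:", mu1.sum())
print("defect at lam = 3: ", defect(3.0))
```

Here `defect(lam)[n]` is the quantity $\mu^1(\partial) - \lambda\mu^1\hat R_{\lambda+1}(\partial)$.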
By Theorem 38.2,
$\mu^1 = \int_{F_e \cup \partial} \nu(d\xi)\,\hat R_1(\xi, \cdot)$
for some probability measure $\nu$ on $\mathcal{B}(F_e \cup \partial)$. Define
$\mu(j) := G(b, j) f(j)$ ($j \in I$).
Then, on I (not $I \cup \partial$), we have
$\mu = \mu^1[I + \hat R_0] = \int \nu(d\xi)\,\hat R_0(\xi, \cdot)$,
whence
(44.2) $f(i) = \int \kappa(i, \xi)\,\nu(d\xi)$
and $\nu(F_e) = f(b) = 1$. That the 'Choquet' representation (44.2) is unique follows easily from the 'uniqueness' part of Theorem 38.2. Our direct proof (avoiding Choquet theory) of the analytical results in Section 27 is now complete.
45. Doob's h-transforms. Now the 'Ray' approach to Martin boundary theory really pays dividends.
Let $\nu$ be a probability measure on $F_e$ and let
$h(i) := \int \kappa(i, \xi)\,\nu(d\xi)$
denote the excessive function (with $h(b) = 1$) that $\nu$ represents. We can define the $P^\nu$ law of '$\hat X$ started according to the law $\nu$' in the usual way. It is automatic from (42.7) that $\hat X$ dies at b:
$P^\nu[\hat X(\hat\zeta-) = b] = 1$, where $\hat\zeta := \inf\{t : \hat X(t) = \partial\}$.
Now write $Y^h$ for the time-reversal of $(\hat X, P^\nu)$:
$Y^h(t) := \hat X(\hat\zeta - t)$ ($0 < t \le \hat\zeta$); $Y^h(t) := \partial$ ($t > \hat\zeta$),
where $\partial$ is a 'forward' coffin state. Put $X^h(t) := Y^h(t+)$, so that, for $t = 0$, $X^h(0) = b$ (a.s.($P^\nu$)). By Nagasawa's formula, $X^h$ (starts at b and) has transition probabilities
$P^h(t; i, j) = \frac{G(b, j) h(j)}{G(b, i) h(i)}\,\hat P(t; j, i) = h(i)^{-1} P(t; i, j) h(j)$.
In particular, the Q-matrix $Q^h$ of $X^h$ satisfies
$Q^h(i, j) = h(i)^{-1} Q(i, j) h(j)$,
so that the 'jump chain' $J^h$ of successive states visited by $X^h$ has one-step transition matrix $\Pi^h$, where
$\Pi^h(i, j) = h(i)^{-1}\Pi(i, j) h(j)$.
We have proved the following theorem.
(45.1) THEOREM (Doob). The probability measure $\nu$ that represents an element h of S is the distribution of $J^h(\zeta-)$, where $J^h$ is a discrete-parameter chain with transition matrix $\Pi^h$ and started at b.
Theorems 29.2 and 29.3 follow immediately.
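In a finite transient chain the content of Theorem 45.1 can be seen by direct matrix algebra. The following sketch makes simplifying assumptions that are not in the text: the state space is a hypothetical 3-point set, so no boundary points arise and the representing measure `nu` is simply an invented probability vector on I, and $J^h(\zeta-)$ is read as the last state visited before the chain is killed.

```python
import numpy as np

# Hypothetical substochastic jump matrix and reference point.
Pi = np.array([[0.0, 0.5, 0.3],
               [0.4, 0.0, 0.4],
               [0.2, 0.5, 0.0]])
b = 0
I3 = np.eye(3)

Gamma = np.linalg.inv(I3 - Pi)       # Green matrix: sum over n of Pi^n
kappa = Gamma / Gamma[b]             # Martin kernel kappa(i, j) = Gamma(i, j) / Gamma(b, j)

nu = np.array([0.5, 0.2, 0.3])       # invented representing measure on I
h = kappa @ nu                       # h(i) = sum_j kappa(i, j) nu(j), so h(b) = 1

Pi_h = Pi * h[None, :] / h[:, None]  # Doob transform: h(i)^{-1} Pi(i, j) h(j)
Gamma_h = np.linalg.inv(I3 - Pi_h)

# Law of the last state visited by the transformed chain started at b:
# expected number of visits to j times the killing probability at j.
death_law = Gamma_h[b] * (1.0 - Pi_h.sum(axis=1))

print("row sums of Pi_h:", Pi_h.sum(axis=1))
print("law of last state:", death_law)
print("nu:               ", nu)
```

The agreement of `death_law` with `nu` rests on $(I - \Pi)h = \nu/\Gamma(b, \cdot)$, which holds because h is a mixture of the potentials $\kappa(\cdot, j)$.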
We have presented a full account of Martin-boundary theory in its simplest
setting, but by the most powerful methods. You will wish to follow up the
general theory in Meyer [4]. To get a first idea of the scope of applications,
see Blackwell and Kendall [1] on Polya's urn and population growth, Revuz
[1] for the culmination of work of Kesten, Spitzer, Brunel and Revuz on random
walks on groups, and Dynkin [3] for an extraordinary and deep result on
random deformations of ellipsoids. Norris, Rogers and Williams [1] gives a
simpler proof of Dynkin's result.
Time reversal and related topics
It is something of a mystery that a number of results are much clearer in reversed
time than in forward time. Martin-boundary theory has already illustrated this;
and we shall shortly see other examples.
Our concern is with how to use the idea of time reversal, without full
discussion of certain technical difficulties, which tend to blur the subject. Chung
and Walsh [1] is a very important paper on identifying and resolving those
difficulties. The last two volumes of Dellacherie and Meyer [1] have the latest
news. Time reversal is deeply connected with duality, a major topic from the
work of Hunt on. See Blumenthal and Getoor [1]. Joanna Mitro's papers [1,2]
greatly clarify duality.
46. Nagasawa's formula for chains. Let I be a countable set and let $\{p_{ij}(t)\}$ be a transition matrix function on I (satisfying the usual continuity condition: $p_{ij}(0+) = \delta_{ij}$). Let $\{p(t)\}$ be a probability entrance law for the usual extension $\{P_t^{+\partial}\}$ of $\{P_t\}$ to $I \cup \partial$. But let us write $P_t$ instead of the clumsy $P_t^{+\partial}$. Let $\{X_t : t > 0; \Omega, \mathcal{F}^\circ, P^p\}$ be a process on $I \cup \partial$ with
$P^p[X_s = i; X_{s+t} = j; X_{s+t+u} = k; \ldots] = p_i(s) p_{ij}(t) p_{jk}(u) \cdots$
whenever $s, t, u, \ldots > 0$ and $i, j, k, \ldots \in I \cup \partial$. If you like, you can imagine X to be the Ray process taking values in the Ray-Knight compactification of $I \cup \partial$ determined by $\{R_\lambda\}$. (The precise details of this are clarified in Section 50.) But let us not be too specific about the technicalities: assume that X is 'sufficiently smooth'. It is well known (see Section 52 for proof) that, for $k \in I$, there exists a continuous function $f_{k\partial}(\cdot)$ on $(0, \infty)$ such that
$P^k[\zeta \in dt] = f_{k\partial}(t)\,dt$, $\zeta := \inf\{t : X_t = \partial\}$.
(Strictly speaking, we should be concentrating on a particular triple $(\Omega, \mathcal{F}^\circ, P^p)$, but we assume that $P^k$ can be defined on $(\Omega, \mathcal{F}^\circ)$, as will happen when $(\Omega, \mathcal{F}^\circ)$ is the usual path-space. As far as we are concerned, this point is rather academic. When you read the Chung-Walsh paper, you will see why we have mentioned it, and you will recall comments on regular conditional probabilities made before Theorem 11.90.11.)
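Let us suppose that $P^p(0 < \zeta < \infty) = 1$. We now consider the process $\{\hat Y(t), P^p\}$, where
$\hat Y(t) := X(\zeta - t)$ ($0 < t < \zeta$); $\hat Y(t) := \partial$ ($t \ge \zeta$).
It is plausible that, for $k \in I$,
(46.1) $P^p[\hat Y(t) = k] = \int_0^\infty P^p[X(\zeta - t) = k; \zeta \in t + dv] = \int_{v=0}^\infty P^p[X(v) = k]\,(P^k[\zeta \in dt]/dt)\,dv = \xi_k f_{k\partial}(t)$,
where
(46.2) $\xi_k := \int_0^\infty p_k(v)\,dv$.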
We do not need to tell you that (46.1) is not a rigorous calculation. One way
(Chung-Walsh) in which to make it rigorous involves approximating the
integrals by Riemann sums, using right-continuity of X to push things through.
Another way (Meyer [4]) is to justify directly the result obtained when both
sides of (46.1) are multiplied by an arbitrary measurable non-negative function
of t and integrated over $(0, \infty)$. In either case, we can use left-continuity of $\hat Y$
to show that (46.1) must hold for all (not merely almost all) t.
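Without any further difficulty, we can calculate for $i, j, k \in I$ and $s, t, u > 0$,
(46.3) $P^p[\hat Y_s = i; \hat Y_{s+t} = j; \hat Y_{s+t+u} = k]$
$= \int_{v=0}^\infty P^p[X_{\zeta-s} = i; X_{\zeta-s-t} = j; X_{\zeta-s-t-u} = k; \zeta \in s + t + u + dv]$
$= \int_{v=0}^\infty P^p[X_v = k; X_{v+u} = j; X_{v+u+t} = i]\,f_{i\partial}(s)\,dv$
$= \xi_k p_{kj}(u) p_{ji}(t) f_{i\partial}(s) = [\xi_i f_{i\partial}(s)]\,\hat p_{ij}(t)\,\hat p_{jk}(u)$,
where
$\hat p_{ij}(t) := \xi_j p_{ji}(t)/\xi_i$
(with some arbitrary conventions for 0/0). The extension of (46.3) 'to n terms' is obvious. Thus
(46.4) $\{\hat Y_t : t > 0; P^p\}$ is Markovian with stationary transition probabilities $\{\hat p_{ij}(t)\}$ and with probability entrance law on $I \cup \partial$ determined by (46.1).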
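The algebraic side of (46.1) to (46.4) can be checked on a finite chain, although the probabilistic justification discussed above is not touched by such a check. The sketch assumes a hypothetical 3-state chain with killing, started at `b = 0`, so that the entrance law is the special case $p(v) = $ row b of $e^{vQ}$ and $\xi$ is row b of $(-Q)^{-1}$. The helper `expm` is a small hand-rolled matrix exponential, used only to keep the example self-contained.

```python
import numpy as np


def expm(A, squarings=10, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    B = A / 2.0 ** squarings
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E


# Hypothetical chain with killing (rows of Pi sum to less than 1).
Pi = np.array([[0.0, 0.5, 0.3],
               [0.4, 0.0, 0.4],
               [0.2, 0.5, 0.0]])
q = np.array([1.0, 2.0, 3.0])
Q = -np.diag(q) + np.diag(q) @ Pi
b = 0

xi = np.linalg.inv(-Q)[b]   # xi_k = integral of p_k(v) dv, as in (46.2)
c = -Q @ np.ones(3)         # killing rates, so that f_{k, coffin}(t) = (P_t c)(k)


def P(t):
    """Forward transition matrix P_t = exp(tQ)."""
    return expm(t * Q)


def P_hat(t):
    """Reversed transition matrix: xi_j p_ji(t) / xi_i."""
    return xi[None, :] * P(t).T / xi[:, None]


def entrance_hat(t):
    """Law of the reversed process at time t on I, from (46.1): xi_k f_{k, coffin}(t)."""
    return xi * (P(t) @ c)


print("row sums of P_hat(0.7):", P_hat(0.7).sum(axis=1))   # substochastic
print("semigroup error:", np.abs(P_hat(0.3) @ P_hat(0.4) - P_hat(0.7)).max())
print("entrance-law error:", np.abs(entrance_hat(0.3) @ P_hat(0.4) - entrance_hat(0.7)).max())
```

The total mass of `entrance_hat(t)` equals the probability that the forward chain is still alive at time t, which is what one expects of the reversed process being in I at time t.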
47. Strong Markov property under time reversal. Consider a process X that starts at 1, drifts towards 0 at constant rate 1, stays at 0 for an exponentially distributed time S of rate 1, and then dies. Thus
$X_t = 1 - t$ ($0 \le t < 1$); $X_t = 0$ ($1 \le t < 1 + S$); $X_t = \partial$ ($t \ge 1 + S$).
It is easy to see that X has FD transition function, so that X is strong Markov. The time-reversal $\hat Y$ of X satisfies
$\hat Y_t = 0$ ($0 < t \le S$); $\hat Y_t = t - S$ ($S < t \le S + 1$); $\hat Y_t = \partial$ ($t > S + 1$).
The right-continuous modification $\hat X$ of $\hat Y$ satisfies
$\hat X_t = 0$ ($0 \le t < S$); $\hat X_t = t - S$ ($S \le t < S + 1$); $\hat X_t = \partial$ ($t \ge S + 1$).
Note that 1 acts as a branch-point for $\hat X$, with $\hat P_0(1, \partial) = 1$. The process $\hat X$ is very similar to the process in Example 35.8. In particular, $\hat X$ is not strong Markov relative to its natural $\{\mathcal{F}_{t+}\}$ algebras, because $\hat X$ does not start afresh at time S. Thus time reversal can destroy the strong Markov property.
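The three paths in this example are simple enough to write down as functions of t for a fixed value of the holding time S. The sketch below is purely illustrative: `DEAD` is a stand-in for the coffin state, the function names are invented, and the value `S = 0.75` is an arbitrary choice. It shows the reversed path and its right-continuous modification differing exactly at the branching time S + 1.

```python
DEAD = None  # stand-in for the coffin state


def x(t, S):
    """Forward path: drift from 1 to 0, hold at 0 for time S, then die."""
    if 0 <= t < 1:
        return 1.0 - t
    if t < 1 + S:
        return 0.0
    return DEAD


def y_hat(t, S):
    """Time-reversal: X(zeta - t) for 0 < t <= zeta, coffin afterwards (zeta = 1 + S)."""
    zeta = 1.0 + S
    if t <= 0:
        raise ValueError("the reversed path is only defined for t > 0")
    return x(zeta - t, S) if t <= zeta else DEAD


def x_hat(t, S):
    """Right-continuous modification of y_hat, from the closed form in the text."""
    if 0 <= t < S:
        return 0.0
    if t < S + 1:
        return t - S
    return DEAD


S = 0.75
for t in (0.5, 0.75, 1.0, 1.75, 2.0):
    print(t, y_hat(t, S), x_hat(t, S))
```

At t = S + 1 the left-continuous path `y_hat` sits at 1 while `x_hat` is already dead, which is the branching from 1 to the coffin state described above.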
Chung and Walsh [1] show that the time-reversal (made right-continuous)
of a strong Markov process has a Markov property intermediate between the
simple and strong Markov properties: the so-called moderately strong Markov
property. This interesting concept is 'correct' from the point of view of the theory
of previsible processes.
There is however another way out of the 'difficulty', which is more satisfactory:
Doob's simultaneous compactification. The problem with time reversal is that the Ray-Knight compactifications associated with X and $\hat X$ may be totally different and may induce different topologies on the initially given state-space. For the example that we have just been discussing, we know from (35.8) that, in order to 'force' the strong Markov property of $\hat X$, we must tear [0,1] apart at 0, producing a state-space $\{0\} \cup [0+, 1]$. (Recall that [0+, 1] is the same as the conventional [0,1] and that 0 now denotes a point isolated from [0+, 1].) The important thing is that both X and $\hat X$ have good (right-continuous, strong Markov) modifications with state-space $\{0\} \cup [0+, 1]$, with 0+ made a branch-point for X from which X branches to 0, and that these modifications are properly related: each is the 'time-reversal made right-continuous' of the other.
Here you have a clue to Doob's idea (see Doob [2]) of constructing an entrance-exit space for a general process X by using a 'simultaneous' compactification based on both X and a time-reversal $\hat X$. We leave aside discussion of the worries you are beginning to have about whether we will now have to abandon the Ray property.
48. Equilibrium charge. We now give the promised intuitive explanation (expanded from Williams [1]) of Hunt's Theorem I.22.7 on equilibrium potential and the Chung-Getoor-Sharpe Theorem I.22.13 on equilibrium charge. Though we skip rigour for interest's sake, it is not too hard to supply it even in much more general contexts.
Take $n \ge 3$ and let g be the free-space Green function for $\Delta$ (in $\mathbb{R}^n$). Let $b \in \mathbb{R}^n$ and let B be canonical Brownian motion in $\mathbb{R}^n$ starting at b. Let v be a strictly positive $C^\infty$ function on $\mathbb{R}^n$ such that
$E^b A_\infty = 2\int g(b, y) v(y)\,dy < \infty$,
where
$A_t := \int_0^t v(B_s)\,ds$.
Put $\tau_t := \inf\{s : A_s > t\}$. Then, by Volkonskii's Theorem, $X_t := B \circ \tau_t$ defines a continuous strong Markov process with finite lifetime $A_\infty$. The Green function of X with respect to the measure $2v(y)\,dy$ is just g.
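Now let $\hat X$ be the time-reversal of X with Green function $\hat g$ relative to the measure $2v(y)\,dy$ given by the Nagasawa formula:
(48.1) $\hat g(y, z) = g(b, z) g(z, y)/g(b, y)$.
Define
$H_K := \inf\{s > 0 : X_s \in K\}$, $\hat H_K := \inf\{s > 0 : \hat X_s \in K\}$,
$L^B_K := \sup\{s > 0 : B_s \in K\}$, $L^X_K := \sup\{s > 0 : X_s \in K\}$.
For $\Gamma \in \mathcal{B}(\partial K)$, put
$\lambda(\Gamma) := P^b[B(L^B_K) \in \Gamma] = P^b[X(L^X_K) \in \Gamma] = P^b[\hat X(\hat H_K) \in \Gamma]$.
Then, using the (unproved but 'obvious') strong Markov property of $\hat X$, we find that
$2v(z)\,dz \int_{\partial K} \lambda(dy)\,\hat g(y, z) = E^b[\text{time spent by } X \text{ in } dz \text{ before } L^X_K]$
$= E^b[\text{time spent by } X \text{ in } dz]\,P^z[H_K < \infty]$
$= 2g(b, z) v(z)\,dz\,P^z[H_K < \infty]$.
On substituting the formula (48.1) for $\hat g(y, z)$, we see that, for almost all z,
(48.2) $P^z[H_K < \infty] = \int_{\partial K} g(z, y)\,e_K(dy)$,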
where
(48.3) $e_K(dy) := g(b, y)^{-1} P^b[B(L^B_K) \in dy]$.
Since $P^z[H_K < \infty]$ has the same significance for B as for X, the main results of Theorems I.22.7 and I.22.13 are more or less proved. Rounding off Theorem I.22.7 is largely a case of repeating arguments used in connection with the Dirichlet problem. Let us emphasise one thing, however. The two sides of (48.2) are discontinuous at sufficiently singular points of $\partial K$, as Lebesgue's thorn shows. To obtain (48.2) for all z, which is important because of such singularities, we use the fact that both sides of (48.2) are excessive (for B or for X).
Exercise. Explain the 'excessive' property and why it implies equality for all z in (48.2).
You should now be convinced that time reversal provides the natural
approach to many problems.
49. BM(R) and BES(3); splitting times. Many relations exist between 1-dimensional Brownian motion and the 3-dimensional Bessel process. Here is a first one. (A $BM_b(\mathbb{R})$ has starting position b, etc.)
(49.1) THEOREM. Let B be a $BM_0(\mathbb{R})$ and define $H^B_1 := \inf\{t : B_t = 1\}$. Let R be a $BES_0(3)$ and define $L^R_1 := \sup\{t : R_t = 1\}$. Then the processes
$\{1 - B(H^B_1 - t) : 0 \le t \le H^B_1\}$ and $\{R(t) : 0 \le t \le L^R_1\}$
are identical in law.
After the discussion at the end of Section 31, it should be clear to you that Martin-boundary theory makes this result extremely plausible. It is not difficult (Exercise!) to prove it directly by bare-hand computation; see Williams [4], where the result is applied to local-time theory.
For $0 \le t < \infty$, put
$A_t := \mathrm{meas}\{s \le t : B_s \ge 0\}$, $\tau_t := \inf\{s : A_s > t\}$.
Then (see Section 22) $Y_t := B \circ \tau_t$ defines a reflecting Brownian motion Y. In other words, Y is a $BES_0(1)$ process. Put $H^Y_1 := \inf\{t : Y_t = 1\}$. Then
$H^Y_1 = A(H^B_1) \sim \mathrm{meas}\{t \le L^R_1 : R(t) \le 1\} = \mathrm{meas}\{t < \infty : R(t) \le 1\}$,
'$\sim$' signifying equality in law. This is the special case 'n = 1' of the Ciesielski-Taylor Theorem that if $R_n$ is a $BES_0(n)$ and $R_{n+2}$ is a $BES_0(n + 2)$ then
(49.2) $\inf\{t : R_n(t) = 1\} \sim \mathrm{meas}\{t < \infty : R_{n+2}(t) \le 1\}$.
See (11.20) and Ciesielski and Taylor [1]. Many proofs of (49.2) are now known. None seems to provide a clear geometrical explanation, and one conjectures that no such explanation exists.
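A far weaker statement than (49.2), agreement of the two means, can be checked by hand and makes a useful sanity test. The sketch below relies on two standard facts that are assumptions as far as this section is concerned: optional stopping applied to $R_n(t)^2 - nt$ gives mean hitting time 1/n, and the expected occupation time of the unit ball by Brownian motion in $\mathbb{R}^d$ ($d \ge 3$) is the integral over the ball of the Green function of $\frac12\Delta$, namely $\Gamma(d/2 - 1)\,|x|^{2-d}/(2\pi^{d/2})$. The function names are invented for the illustration.

```python
import math


def mean_hitting_time(n):
    """E[inf{t : R_n(t) = 1}] for a BES_0(n), from the martingale R_n(t)^2 - n t."""
    return 1.0 / n


def mean_occupation_time(d, steps=1000):
    """E[meas{t : |B_t| <= 1}] for Brownian motion in R^d from 0 (d >= 3),
    by integrating the Green function of (1/2)Laplacian over the unit ball
    in polar coordinates with the midpoint rule."""
    green_const = math.gamma(d / 2 - 1) / (2 * math.pi ** (d / 2))
    sphere_area = 2 * math.pi ** (d / 2) / math.gamma(d / 2)
    width = 1.0 / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * width
        total += green_const * r ** (2 - d) * sphere_area * r ** (d - 1) * width
    return total


for n in (1, 2, 3):
    # Left side uses BES_0(n); right side uses BES_0(n + 2), the radial part of BM in R^(n+2).
    print(n, mean_hitting_time(n), mean_occupation_time(n + 2))
```

Equality of means says nothing about equality in law, of course; it only confirms that the normalisations in (49.2) are consistent.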
The following result (Williams [4,7]) was needed for certain applications to excursion theory and local-time theory.
(49.3) THEOREM. Fix b in $(0, \infty)$. On a suitable probability triple $(\Omega, \mathcal{F}, P)$, set up three independent random elements (see Fig. III.1):
a random variable $\gamma$ uniformly distributed on (0, b);
a $BM_b(\mathbb{R})$ $\{B(t) : t \ge 0\}$;
a $BES_0(3)$ $\{R(t) : t \ge 0\}$.
Define
$\rho := \inf\{t : B(t) = \gamma\}$,
$X(t) := B(t)$ ($t < \rho$); $X(t) := R(t - \rho) + \gamma$ ($t \ge \rho$).
Then $\{X(t) : t \ge 0\}$ is a $BES_b(3)$.
We regard this result as providing a path decomposition of the $BES_b(3)$ X at the time $\rho$ at which X attains its minimum value $\gamma$. Williams [4,7] proved Theorem 49.3 by bare-hand calculation, but did try (unsuccessfully) to produce a theory of splitting-times as the 'natural' times at which one might expect to have path decompositions of Markov processes. His idea was to call a random time $\rho$ an algebraic splitting-time (for a Markov process X) if, for every t, we can write
$\{\rho = t\} = F_t \cap G_t$,
where
$F_t \in \sigma\{X_s : s \le t\}$, $G_t \in \sigma\{X_u : u \ge t\}$,
the $\sigma$-algebras being uncompleted. The way in which this definition would be ruined if we allowed completion of algebras is a first indication of how much more difficult it must be to prove a splitting-time theorem than the Stopping-Time (that is, Strong Markov) Theorem.
Figure III.1
Jacobsen [1] gave a nice formulation of 'splitting-time' in terms of the
'crossover' property, and also gave a more illuminating proof of Theorem 49.3.
Pitman [1] then proved Theorem 49.3 by using a clever random-walk
approximation, and in [2] he described other applications of the splitting-time idea.
Millar has done some fine work in this area (see his survey [1] and papers
referred to there) and has (Millar [2]) the definitive proof of results on path
decomposition at times of minima for a wide class of Markov processes based
on work of Getoor and Sharpe.
New proofs of Theorem 49.3 appear regularly. See, for example, Walsh [1],
Le Gall [7], Ikeda and Watanabe [1], Revuz and Yor [1], and Section VI.55
in Volume 2. Non-standard analysis provides a promising approach to splitting
times; see Cutland and Kendall [1].
Theorem 49.3 helped motivate some of the original work on grossissements
(enlargements of filtrations). See Barlow [1,2], Jeulin [1], Jeulin and Yor [1].
Jeulin gives a martingale proof of Theorem 49.3.
A first look at Markov-chain theory
Many key ideas of modern Markov-process theory—last-exit decompositions,
excursion laws, boundary theory etc.—first appeared in clear form in Markov-
chain theory; and chain theory still seems to us the ideal vehicle for learning
process theory and assessing its achievements. Like number theory, chain theory
is at the same time 'concrete' and sufficiently rich to accommodate the most
sophisticated ideas.
50. Chains as Ray processes. Let I be a countable set, so that I is an LCCB in its discrete topology. Let $\{P_t\} = \{p_{ij}(t)\}$ be a transition matrix function on I, assumed 'standard' in that
$p_{ij}(0+) = p_{ij}(0) = \delta_{ij}$, $\forall i, j \in I$.
Without loss of generality, we can assume that $\{P_t\}$ is honest. So let us assume that
$\sum_{j \in I} p_{ij}(t) = 1$, $\forall i \in I$, $\forall t \ge 0$.
Let $\{R_\lambda\}$ be the resolvent of $\{P_t\}$ acting on $C(I) = B(I)$. Then the conditions (35.6) are obviously satisfied, so that we can construct the Ray-Knight compactification F of I based on $\{R_\lambda\}$, the Ray transition function $\{P_t\}$ on F, and the Ray process X with transition function $\{P_t\}$.
The space F may include 'irrelevant' points, and we shall see that we can (with advantage) restrict attention to the space E defined as follows:
(50.1) $E := \{x \in F : P_t(x, I) = 1, \forall t > 0\} \supseteq I$.
For $x \in F$, the map $t \mapsto P_t(x, I)$ is obviously (why?) non-decreasing in t, so that
(50.2) $E = \{x \in F : R_1(x, I) = 1\}$.
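(If, for example, we have the Poisson chain on $\mathbb{Z}^+$ with
$p_{ij}(t) = e^{-t} t^{j-i}/(j - i)!$ if $j \ge i$; $p_{ij}(t) = 0$ otherwise,
then F is the one-point compactification of $\mathbb{Z}^+$, and $P_t(\infty, \{\infty\}) = 1$ for $t \ge 0$; here, $E = \mathbb{Z}^+$.)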
The space $E$ may be described very simply. Think of the point $i$ of $I$ as
identified with the element
$r_{i\cdot}(1) = R_1(i, \{\cdot\}) \in \ell_1(I).$
Then the arguments in Section 43 show that (up to homeomorphism) $E$ is
nothing other than the closure of $I$ in $\ell_1(I)$. The map
$x \mapsto r_{x\cdot}(1) = R_1(x, \{\cdot\})$
is therefore a homeomorphism of $E$ to $\ell_1(I)$.
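(50.3) THEOREM (Neveu [5]). For $t > 0$, the map
$x \mapsto p_{x\cdot}(t) = P_t(x, \{\cdot\})$
from $E$ to $\ell_1(I)$ is continuous. Hence, for every $f$ in $B(I)$ and every $t > 0$, the map
$x \mapsto P_t f(x)$ is continuous on $E$.
Proof. We are guided by Neveu [5]. With slight (but allowable) misuse of
notation, let $x \to y$ in $E$. Then
$r_{x\cdot}(1) \to r_{y\cdot}(1)$ (in $\ell_1(I)$).
Since, for $0 < u < v$,
$\int_u^v e^{-s}\,p_{x\cdot}(s)\,ds = r_{x\cdot}(1)\,[e^{-u}P(u) - e^{-v}P(v)],$
we have, for $0 < \delta < t$,
(50.4) $\delta^{-1}\int_{t-\delta}^{t} e^{t-s}\,p_{xj}(s)\,ds \to \delta^{-1}\int_{t-\delta}^{t} e^{t-s}\,p_{yj}(s)\,ds, \quad \forall j \in I.$
Fix $j \in I$. Let $\varepsilon > 0$ be given, and let $\delta > 0$ be so small that $p_{jj}(u) \ge 1 - \varepsilon$ for
$u \le \delta$. Then, for $t - \delta < s < t$,
$p_{xj}(t) \ge p_{xj}(s)\,p_{jj}(t-s) \ge (1-\varepsilon)\,p_{xj}(s),$
whence, from (50.4),
$\liminf_{x \to y} p_{xj}(t) \ge (1-\varepsilon)\,e^{-\delta}\,\delta^{-1}\int_{t-\delta}^{t} e^{t-s}\,p_{yj}(s)\,ds.$
Now let $\varepsilon \downarrow\downarrow 0$ (and insist that $\delta \downarrow\downarrow 0$) to obtain
(50.5) $\liminf_{x \to y} p_{xj}(t) \ge p_{yj}(t).$
Fatou's Lemma combined with $\sum_j p_{xj}(t) = 1$ shows that equality must hold in
(50.5) and this implies (why?) that we can replace 'lim inf' in (50.5) by 'lim'. □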
The point of Theorem 50.3 is that $\{P_t\}$ has nice analytical properties on $E$. We
must now show that 'only $E$ matters' in the probabilistic theory.
(50.6) THEOREM (Ray, Kunita-Watanabe, Meyer). For every $x$ in $E$,
(50.7) $P^x[X_t \in E,\ \forall t > 0] = 1,$
(50.8) $P^x[X_{t-} \in E,\ \forall t > 0] = 1.$
Proof. The proof is a fine illustration of the need for Meyer's section theorems.
Let $x \in E$. Then, with $\chi_I$ denoting the characteristic (indicator) function of $I$,
we have, for any stopping time $T$,
$1 = R_1\chi_I(x) = E^x\int_0^T e^{-t}\chi_I(X_t)\,dt + E^x[e^{-T}R_1\chi_I(X_T)] \le E^x[1 - e^{-T}] + E^x[e^{-T}] = 1.$
You can see that, because of (50.2),
$P^x[T < \infty,\ X_T \in F \setminus E] = 0,$
and the result (50.7) now follows from the Section Theorem II.76.2.
To prove (50.8), we need some of the general theory of processes, which we
study further in Chapter VI. Since the process $\{X_{t-}\}$ is left-continuous, it is
'previsible'. Hence if
$P^x[X_{t-} \in F \setminus E \text{ for some } t] > 0$
then (by Meyer's 'previsible' section theorem; see Dellacherie and Meyer [1])
there exists a sequence $(T_n)$ of stopping times with $T_n \uparrow T$ (a.s.($P^x$)) such that
$P^x[T < \infty,\ X_{T-} \in F \setminus E] > 0.$
However, this contradicts the 'Quasi-Left-Continuity' Theorem 41.3, because
(as we saw above) $X_T \in E$ (a.s.($P^x$)), whereas
$P_0(X_{T-}, E) < 1$ (if $X_{T-} \in F \setminus E$).
Clarification. Since $R_1\chi_I \ge \chi_E$, we have, for $y \in F \setminus E$,
$1 > R_1(y, I) = \varepsilon_y R_1\chi_I = \varepsilon_y P_0 R_1\chi_I \ge \varepsilon_y P_0\chi_E = P_0(y, E).$ □
We may (and shall) now regard $E$ as the state-space of our process (chain?!)
$X$. Of course, $E$ need not be compact, but $E$ is Polish. We write $E_{br}$ for the set
of branch-points in $E$, and $E_e$ for the set of non-branch-points in $E$.
51. Significance of $q_i$. Let $i \in I$ and define
(51.1) $S_i := \inf\{t : X_t \ne i\}.$
(Then $S_i$ is an $\{\mathcal{F}^\circ_{t+}\}$ stopping time.) By the simple Markov property, we have,
for $s, t > 0$,
$P^i[S_i > s + t] = P^i[S_i > s]\,P^i[S_i > t],$
so that
(51.2) $P^i[S_i > t] = e^{-q_i t}$ for some $q_i \in [0, \infty]$.
Now let $\varepsilon_n \downarrow\downarrow 0$. Then it is clear that
$P^i[S_i > t] = \lim_n p_{ii}(\varepsilon_n)^{[t/\varepsilon_n] + 1},$
so that
$-q_i = \lim_n \varepsilon_n^{-1}\log p_{ii}(\varepsilon_n).$
It is now trivial that
(51.3) $q_i = \lim_{\varepsilon \downarrow\downarrow 0} \varepsilon^{-1}[1 - p_{ii}(\varepsilon)],$
the existence of the limit being part of our conclusion. Of course, we have
(51.4) $p_{ii}(t) \ge P^i[S_i > t] = e^{-q_i t}.$
We make the usual classification:
(51.5) a state $i$ of $I$ is called stable if $q_i < \infty$ and instantaneous if $q_i = \infty$.
Since paths are right-continuous, it is clear that
(51.6) a point of $I$ that is isolated in $I$ in the Ray-Knight topology is a stable state.
Of course (give an example!) a stable state may be a point of accumulation of $I$.
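The limit (51.3) and the bound (51.4) can be checked numerically on a finite chain, where every state is stable. What follows is only a minimal sketch under stated assumptions, not part of the original text: the three-state generator `Q_EXAMPLE` and the helper names `mat_mul`, `mat_exp` and `holding_rate_estimate` are hypothetical, and the transition function is taken to be $P(t) = \exp(tQ)$, which is legitimate for finite $I$.

```python
import math

# Hypothetical 3-state Q-matrix (rows sum to zero); q_0 = 2, q_1 = 3, q_2 = 1.
Q_EXAMPLE = [[-2.0, 1.0, 1.0],
             [3.0, -3.0, 0.0],
             [0.5, 0.5, -1.0]]


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=30, squarings=10):
    """Approximate exp(t*q): truncated Taylor series plus scaling and squaring."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def holding_rate_estimate(q, i, eps):
    """Difference quotient (1 - p_ii(eps)) / eps appearing in (51.3)."""
    return (1.0 - mat_exp(q, eps)[i][i]) / eps


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, holding_rate_estimate(Q_EXAMPLE, 0, eps))
    t = 0.7
    # (51.4): p_00(t) dominates the probability exp(-q_0 t) of never having left 0.
    print(mat_exp(Q_EXAMPLE, t)[0][0], math.exp(-2.0 * t))
```

For an instantaneous state the same difference quotient would diverge, which is why no finite-state example can illustrate the case $q_i = \infty$.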
52. Taboo probabilities; first-entrance decomposition. We stick to DW's
traditional use of 'b', the first letter in the alphabet to cause printers (and readers)
no trouble, for a state in $I$ on which we concentrate our attention. (Of course,
$b$ is not a branch-point of $X$.) We try to abide by Chung's terminology and
notation for chains.
So fix $b$ in $I$. Introduce Chung's taboo transition probability:
(52.1) ${}_bp_{ij}(t) := P^i[X_t = j,\ H_b > t] \quad (i, j \in I \setminus b),$
where $H_b$ is the hitting-time of $b$. Then, because $H_b$ has the terminal-time property
$H_b = t + H_b \circ \theta_t$ on $\{H_b > t\},$
it follows (Exercise; but see Section 54) that $\{{}_bp_{ij}(t)\}$ is a 'standard' transition
matrix function on $I \setminus b$ and that, for $s, t > 0$,
(52.3) $F_{ib}(t + s) - F_{ib}(t) = \sum_{j \in I \setminus b} {}_bp_{ij}(t)\,F_{jb}(s) \quad (i \in I \setminus b),$
where
(52.4) $F_{ib}(t) := P^i[H_b \le t].$
Chung and Neveu independently showed that (52.3) may be differentiated with
respect to $s$ in the following precise sense: for $i \ne b$, there exists a (finite)
continuous function $f_{ib}(\cdot)$ on $[0, \infty)$ such that
(52.5) $F_{ib}(t) = \int_0^t f_{ib}(s)\,ds;$
further, for $s > 0$, but not necessarily for $s = 0$,
(52.6) $f_{ib}(t + s) = \sum_{j \in I \setminus b} {}_bp_{ij}(t)\,f_{jb}(s).$
We skip the analytical derivation of (52.5) and (52.6) from (52.3). See Neveu
[3,4], where the idea is that we can write $f_{ib}$ explicitly as
$f_{ib}(t) = \sum_{j \in I \setminus b} t^{-1}\int_0^t {}_bp_{ij}(t - s)\,dF_{jb}(s),$
or Chung [1; Theorem II.12.4] for proof by appeal to classical (Fubini) theorems
on differentiation.
The important fact that $f_{ib}$ extends continuously from $(0, \infty)$ to $[0, \infty)$ with
$f_{ib}(0) < \infty$ needs special pleading. We reproduce the argument from Chung and
Neveu. We have
$f_{ib}(s) \ge {}_bp_{ii}(s - u)\,f_{ib}(u) \quad (0 < u < s).$
Hence, as we see by letting $u \downarrow\downarrow 0$ (through a suitable subsequence),
(52.7) $f_{ib}(s) \ge {}_bp_{ii}(s)\,\bigl[\limsup_{u \downarrow\downarrow 0} f_{ib}(u)\bigr];$
and now letting $s \downarrow\downarrow 0$,
$\liminf_{s \downarrow\downarrow 0} f_{ib}(s) \ge \limsup_{u \downarrow\downarrow 0} f_{ib}(u).$
Put
(52.8) $q_{ib} := f_{ib}(0) := f_{ib}(0+) \quad (i \in I \setminus b).$
We see from (52.7) that
(52.9) $0 \le q_{ib} < \infty \quad (i \in I \setminus b).$ □
If we let $s \downarrow\downarrow 0$ in (52.6) and apply Fatou's Lemma, we obtain
(52.10) $f_{ib}(t) \ge \sum_{j \in I \setminus b} {}_bp_{ij}(t)\,q_{jb}.$
Strict inequality may obtain here. The right-hand side of (52.10) represents the
$P^i$ probability density of first entering $b$ via a jump from $I \setminus b$ at time $t$. (Accept
this intuitively obvious fact for now.) Note how 'wrong' things go in the case
of the Feller-McKean process.
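Let us now agree to use the 'tilde' notation for Laplace transforms. So, for
$\lambda > 0$, write
(52.11) $\tilde f_{ib}(\lambda) := \int_0^\infty e^{-\lambda t} f_{ib}(t)\,dt = E^i[\exp(-\lambda H_b)];$
(52.12) $\tilde p_{ij}(\lambda) := \int_0^\infty e^{-\lambda t} p_{ij}(t)\,dt = r_{ij}(\lambda).$
Dynkin's formula gives
$\tilde p_{ib}(\lambda) = R_\lambda\chi_b(i) = E^i[\exp(-\lambda H_b)]\,R_\lambda\chi_b(b) = \tilde f_{ib}(\lambda)\,\tilde p_{bb}(\lambda),$
and, on inverting the Laplace transform, we obtain the first-entrance
decomposition:
(52.13) $p_{ib}(t) = \int_0^t f_{ib}(s)\,p_{bb}(t - s)\,ds \quad (i \in I \setminus b),$
which (by continuity) is valid for all $t > 0$.
(52.14) Exercise. Prove the more precise result
$P^i[H_b \in ds;\ X(t) = b] = f_{ib}(s)\,p_{bb}(t - s)\,ds \quad (0 < s < t).$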
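The first-entrance decomposition (52.13) can be sanity-checked numerically on a finite chain. The sketch below is illustrative only and rests on assumptions that are not in the text: the generator `Q`, the choice `B = 0`, and all helper names are hypothetical; $P(t)$ is taken as $\exp(tQ)$; and the first-entrance density is computed as $\sum_{j \ne b} {}_bp_{ij}(s)\,q_{jb}$, that is, with equality in (52.10), which holds for a finite chain because $b$ can then only be entered by a jump.

```python
import math

# Hypothetical finite chain on {0, 1, 2}; the distinguished state b is B = 0.
Q = [[-2.0, 1.0, 1.0],
     [3.0, -3.0, 0.0],
     [0.5, 0.5, -1.0]]
B = 0
OTHERS = [1, 2]
# Generator of the taboo semigroup: Q restricted to I \ b.
Q_TABOO = [[Q[i][j] for j in OTHERS] for i in OTHERS]


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=30, squarings=10):
    """Approximate exp(t*q): truncated Taylor series plus scaling and squaring."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def p(i, j, t):
    """Transition probability p_ij(t) of the full chain."""
    return mat_exp(Q, t)[i][j]


def first_entrance_density(i, s):
    """f_ib(s) = sum over j != b of (taboo p_ij(s)) * q_jb (finite-chain assumption)."""
    row = mat_exp(Q_TABOO, s)[OTHERS.index(i)]
    return sum(row[k] * Q[j][B] for k, j in enumerate(OTHERS))


def convolution(i, t, n=200):
    """Right-hand side of (52.13) by composite Simpson's rule (n must be even)."""
    h = t / n
    total = 0.0
    for k in range(n + 1):
        s = k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * first_entrance_density(i, s) * p(B, B, t - s)
    return total * h / 3


if __name__ == "__main__":
    print(p(2, B, 1.0), convolution(2, 1.0))
```

The two printed numbers should agree up to quadrature error; the value `first_entrance_density(i, 0.0)` returns $q_{ib}$, in line with (52.8).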
53. The Q-matrix; DK conditions. We have already seen that
(53.1) $q_{ii} := \lim_{\varepsilon \downarrow\downarrow 0} \varepsilon^{-1}[p_{ii}(\varepsilon) - 1] = -q_i \in [-\infty, 0].$
Since $q_{ib} := f_{ib}(0+)$ exists in $[0, \infty)$, it follows immediately from (52.13) that
(53.2) $\lim_{\varepsilon \downarrow\downarrow 0} \varepsilon^{-1} p_{ib}(\varepsilon) = q_{ib} \in [0, \infty).$
The existence of the limits in (53.1) and (53.2) was first established by Kolmogorov
and Doob. We write
(53.3) $Q = P'(0),$
with the interpretation provided by (53.1) and (53.2). We call the conditions
(DK1) $0 \le q_{ij} < \infty \quad (i \ne j);$
(DK2) $\sum_{j \ne i} q_{ij} \le q_i \le \infty$
the Doob-Kolmogorov conditions. The condition (DK2) is obtained by letting
$\varepsilon \downarrow\downarrow 0$ in the inequality
(53.4) $\sum_{j \ne i} \varepsilon^{-1} p_{ij}(\varepsilon) \le \varepsilon^{-1}[1 - p_{ii}(\varepsilon)]$
and applying Fatou's Lemma. (Under our assumption that $\{P_t\}$ is honest,
equality holds in (53.4).) The Feller-McKean example shows that (even for an
honest $\{P_t\}$) we can have the 'worst' possible situation:
$q_i = \infty \ (\forall i), \quad q_{ij} = 0 \ (i \ne j).$
Actually this is the 'best' possible situation for chains with all states instantaneous.
(See Theorem 55.1.)
The probabilistic interpretation of $q_{ij}$ ($j \ne i$) is given in Section 57.
54. Local-character condition for Q. Let $G$ be an open subset of $E$. Define
$p^G_{ij}(t) := P^i[X(t) = j,\ H_{E \setminus G} > t] \quad (i, j \in I \cap G),$
where, as usual, $H_{E \setminus G}$ denotes the hitting-time of $E \setminus G$. Then
$p^G_{ii}(t) \ge p_{ii}(t) - P^i[H_{E \setminus G} \le t] \quad (i \in I \cap G).$
Since $G$ is open and $X$ is right-continuous, $P^i[H_{E \setminus G} > 0] = 1$, so that
(54.1) $p^G_{ii}(t) \to 1 \quad (t \to 0)$
for $i \in I \cap G$. That $\{P^G(t)\} := \{p^G_{ij}(t) : i, j \in I \cap G\}$ is a transition matrix function on
$I \cap G$ is easily shown (recall Exercise 18.15). Note that (54.1) ensures that $\{P^G(t)\}$
is 'standard'. Extend $\{P^G(t)\}$ to $(I \cap G) \cup \partial$ in the usual way and observe that,
for $b \in I \cap G$,
$p^G_{b\partial}(\varepsilon) \ge P^b[X_\varepsilon \in I \setminus G] = \sum_{j \in I \setminus G} p_{bj}(\varepsilon).$
Multiply through by $\varepsilon^{-1}$, let $\varepsilon \downarrow\downarrow 0$, and apply Fatou's Lemma and (DK1) to
obtain
$Q(b, I \setminus G) := \sum_{j \in I \setminus G} q_{bj} \le q^G_{b\partial} < \infty.$
We have proved the following result (Williams [8,9]).
(54.2) LEMMA (local-character condition). Let $b \in I$, and let $G$ be an open subset
of $E$ containing $b$. Then
$Q(b, I \setminus G) < \infty.$
The result is interesting, not for its own sake, but for its 'concrete' applications;
we follow Williams [9].
(54.3) COROLLARY 1. If $a$ and $b$ are distinct points of $I$, then
(N) $\sum_{j \notin \{a, b\}} q_{aj} \wedge q_{bj} < \infty.$
Note. The 'N' is in deference to Neveu, in whose work this condition is implicit
but nowhere explicit.
Proof of Corollary 1. Since $E$ is Hausdorff, there exist disjoint open subsets $G_a$,
$G_b$ of $E$ with $a \in G_a$, $b \in G_b$. Then
$\sum_{j \notin \{a, b\}} q_{aj} \wedge q_{bj} \le Q(a, I \setminus G_a) + Q(b, I \setminus G_b) < \infty.$ □
(54.4) COROLLARY 2. Suppose that $H$ is a finite subset of $I$ such that
(54.5) $\liminf_j \sum_{h \in H} q_{hj} > 0.$
Then every state $i$ in $I \setminus H$ is stable.
Note. The meaning of 'lim inf' should be obvious: for some $\varepsilon > 0$,
$\sum_{h \in H} q_{hj} < \varepsilon$ for only finitely many $j$.
Proof of Corollary 2. It is clearly enough to suppose that $I$ is infinite and $H$ is
minimal subject to the requirement (54.5). Then every state in $H$ is instantaneous
(by (DK2)), and is therefore a point of accumulation of $I$. Let $G$ be an open
subset of $E$ that contains $H$. Then Lemma 54.2 and the condition (54.5) imply
that $E \setminus G$ contains only finitely many points of $I$. Thus $I$ is homeomorphic to
the disjoint union of $|H|$ copies of $\{1, 2, 3, \ldots; \infty\}$ and is already compact:
$I = F = E$. (Thus $X$ takes all its values in $I$.) Any point $i$ of $I \setminus H$ is isolated in
the Ray-Knight topology, and is therefore stable. □
It is important that
(54.6) under the hypotheses of (54.4), we can add that
(54.7) $\sum_{j \in I \setminus \{i\}} q_{ij} = q_i \ (< \infty), \quad \forall i \in I \setminus H.$
This follows because, since $X$ takes all its values in $I$, $X$ must exit the stable
state $i$ in $I \setminus H$ by jumping to another state of $I$. (Since $\{P_t\}$ is honest, $X$ cannot
jump to $\partial$; but the real point is that there are no 'fictitious' states in $E \setminus I$ to
which $X$ can jump from $i$.) We are sure that you already know that (54.7) is
equivalent to the statement that $X$ exits $i$ by jumping to another state of $I$. (If
not, wait for something very much better in Section 57.) □
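55. Totally instantaneous Q-matrices. Recall that the conditions
(DK1) $0 \le q_{ij} < \infty, \quad \forall i, j : i \ne j,$
(N) $\sum_{j \notin \{a, b\}} q_{aj} \wedge q_{bj} < \infty, \quad \forall a, b : a \ne b,$
hold.
Suppose now that every state of $I$ is instantaneous:
(TI) $q_i = -q_{ii} = \infty, \quad \forall i \in I.$
Then we can argue that the following 'safety' condition holds:
(S): There exists an infinite subset $J$ of $I$ with
$Q(i, J \setminus i) := \sum_{j \in J \setminus i} q_{ij} < \infty, \quad \forall i \in I.$
'J is a large set (comparatively) safe from hits'.
Proof of (S) (Williams [9]). Label $I$ as $\mathbb{N}$. Since (TI) holds, it follows from
(54.4) that there is an infinite set $J = \{j(1), j(2), \ldots\} \subseteq I$ such that $j(n) > n$ and
$\sum_{i \le n} q_{i\,j(n)} < 2^{-n}.$
Then
$Q(i, J \setminus i) < \infty, \quad \forall i.$ □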
It was known back in 1967 (see Williams [10]) that if $Q$ is a Q-matrix that
satisfies (TI) and (DK1) then the conditions (N) and (S) hold. Swayed by the
then-prevalent belief that totally instantaneous chains are impossibly complicated,
DW spent the next seven years trying to find additional necessary conditions.
He was then somewhat annoyed to discover that there are none.
(55.1) THEOREM. Let $Q$ be an $I \times I$ matrix satisfying (TI) and (DK1). Then
$Q = P'(0)$ for some transition function $\{P(t)\}$ on $I$ if and only if the conditions
(N) and (S) hold; and then $\{P(t)\}$ may be chosen to be honest.
It is obvious that the 'if' part is proved by bare-hands construction of a
suitable $\{P(t)\}$. (See Williams [9].)
Theorem 55.1 is all very well as an analytical result, but it does not probe
deeply into the probabilistic structure of chains. For a much more challenging
problem than that solved by Theorem 55.1, see Section 60.
Note that the easiest way to guarantee conditions (N) and (S) is to take $q_{ij} = 0$
($i \ne j$). Thus the Feller-McKean Q-matrix is the most 'likely' candidate for a
TI Q-matrix, not the least likely, as was once thought. This explains remarks
at the end of Section 53.
56. Last exits. We now wish to prove the following result, which is 'dual' to
(52.13): for $b, j \in I$ with $j \ne b$, there exists a continuous function $g_{bj}(\cdot)$ on $[0, \infty)$
such that
(56.1) $p_{bj}(t) = \int_0^t p_{bb}(s)\,g_{bj}(t - s)\,ds.$
One interpretation is provided by the dual of (52.14):
(56.2) $P^b[\sigma_b(t) \in ds;\ X(t) = j] = p_{bb}(s)\,g_{bj}(t - s)\,ds,$
where
(56.3) $\sigma_b(t) := \sup\{s < t : X(s) = b\}.$
The intuitively obvious (but very unfashionable!) thing to do is to derive these
results from those in Section 52 by time reversal. We used the same idea in
connection with the Chung-Getoor-Sharpe description of equilibrium charge
in terms of (spatial) last-exit distribution.
Since the 'hat' notation is standard both for Laplace transforms and for time
reversal, let us follow the notation of Doob [2], using the 'tilde' notation for
Laplace transforms (as we did in Section 52), and the 'star' notation for
time-reversed processes.
It is appropriate that we should follow Doob's paper [2], because we now
make rather trivial use of a method fully developed there.
Let $\xi$ be an exponentially distributed random variable of rate $\alpha > 0$ and
independent of $X$. Let $Y^*$ and $X^*$ be $(E \cup \partial^*)$-valued processes defined as follows:
$Y^*(t) := X(\xi - t) \ (0 < t \le \xi); \quad Y^*(t) := \partial^* \ (t > \xi); \quad X^*(t) := Y^*(t+).$
By Nagasawa's formula, the process $X^*$ under the $P^b$ law is Markovian with
transition matrix function $\{p^*(t)\}$ satisfying
(56.4) $p^*_{ij}(t) = e^{-\alpha t}\,p_{ji}(t)\,\tilde p_{bj}(\alpha)/\tilde p_{bi}(\alpha).$
Apply the first-entrance decomposition to $X^*$ to obtain
$p^*_{jb}(t) = \int_0^t f^*_{jb}(s)\,p^*_{bb}(t - s)\,ds,$
whence
$\tilde p_{bj}(\lambda + \alpha) = \tilde f^*_{jb}(\lambda)\,\tilde p_{bb}(\lambda + \alpha)\,\tilde p_{bj}(\alpha)/\tilde p_{bb}(\alpha).$
Inversion of the Laplace transform yields (56.1), with
$g_{bj}(t) := e^{\alpha t} f^*_{jb}(t)\,\tilde p_{bj}(\alpha)/\tilde p_{bb}(\alpha).$
It is formally obvious how (56.2) follows from (52.14). However, one has to
be rather careful here because of the difficulty mentioned earlier that $X^*$ need
not be strong Markov. Recall that the Ray-Knight compactification induced
by $X^*$ may be quite different from that induced by $X$, that we need Doob's
double (or simultaneous) compactification to do things properly, and so on.
Of course, our proof that (for $j \ne b$) there exists a continuous function $g_{bj}$ on
$[0, \infty)$ satisfying (56.1) is totally rigorous. We could have made the argument
independent of the concept of time reversal by making (56.4) an analytical
definition of a transition matrix function $\{p^*(t)\}$.
Important exercise. Deduce from (56.1) that
(56.5) $g_{bj}(0) = q_{bj}.$
Next prove that, for $s > 0$ and $t \ge 0$,
(56.6) $g_{bj}(s + t) = \sum_{i \ne b} g_{bi}(s)\,{}_bp_{ij}(t) \quad (j \ne b),$
and deduce that
(56.7) $g_{bj}(t) \ge \sum_{i \ne b} q_{bi}\,{}_bp_{ij}(t) \quad (j \ne b).$
(Hint for (56.6): compare (52.6).) Remembering that $\{P(t)\}$ is assumed honest,
show that (56.1) implies that
(56.8) $1 - p_{bb}(t) = \int_0^t p_{bb}(s)\,g_b(t - s)\,ds,$
where
(56.9) $g_b(t) := \sum_{j \ne b} g_{bj}(t) \quad (t > 0).$
Deduce from (56.6) that $g_b$ is non-increasing; then from (56.8) that $g_b$ is finite on
$(0, \infty)$; then from (56.6) that $g_b$ is continuous on $(0, \infty)$. Use (56.8) to show that
(56.10) $g_b(0) := g_b(0+) = q_b \le \infty.$
Note that (56.7) does not necessarily extend to $t = 0$. Let $\nu_b$ be the measure on
$(0, \infty]$ defined by
(56.11) $\nu_b(t, \infty] := g_b(t) \quad (t > 0).$
Deduce from (56.8) that, for $\lambda > 0$,
(56.12) $\tilde p_{bb}(\lambda) = \Bigl\{\lambda + \int_{(0, \infty]} (1 - e^{-\lambda t})\,\nu_b(dt)\Bigr\}^{-1}.$
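As a check on (56.12), here is a worked example for the simplest possible case. It is an illustration under assumptions of our own choosing, not part of the original argument: a two-state chain on $I = \{b, 1\}$ with hypothetical rates $q_{b1} = q > 0$ and $q_{1b} = r > 0$, for which (as for any finite chain) the last-exit density is $g_{b1}(t) = q_{b1}\,{}_bp_{11}(t)$.

```latex
% Two-state chain I = {b, 1}; q and r are hypothetical rates chosen for illustration.
\begin{align*}
  {}_bp_{11}(t) &= e^{-rt}, \qquad
  g_b(t) = g_{b1}(t) = q\,e^{-rt}, \qquad
  \nu_b(dt) = q r\,e^{-rt}\,dt \ \text{on } (0,\infty), \quad \nu_b\{\infty\} = 0, \\
  \lambda + \int_{(0,\infty]} (1 - e^{-\lambda t})\,\nu_b(dt)
    &= \lambda + q - \frac{q r}{\lambda + r}
     = \frac{\lambda(\lambda + q + r)}{\lambda + r}, \\
  p_{bb}(t) &= \frac{r}{q + r} + \frac{q}{q + r}\,e^{-(q + r)t}
  \quad\Longrightarrow\quad
  \tilde p_{bb}(\lambda)
     = \frac{r}{(q + r)\lambda} + \frac{q}{(q + r)(\lambda + q + r)}
     = \frac{\lambda + r}{\lambda(\lambda + q + r)}.
\end{align*}
```

The last expression is the reciprocal of the second line, as (56.12) requires; here $\nu_b$ is simply $q_b = q$ times the exponential law of the sojourn in state 1.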
57. Excursions from b. Neveu's papers [2-5] are perhaps the finest written on
chains. Our concern here is to describe what is in modern terminology Neveu's
description of the Itô excursion law from the point $b$ of $I$.
Let
(57.1) $L_b(t) := \mathrm{meas}\{s \le t : X_s = b\},$
and, for $\tau \ge 0$, write
(57.2) $\rho(\tau) := \inf\{t : L_b(t) > \tau\} \le \infty.$
The strong Markov property shows that $\{\rho(\tau) : \tau \ge 0;\ P^b\}$ is a subordinator in
the sense of Section II.37, but with a slight generalisation (which you can easily
make) to allow $\rho$ to take the value $\infty$. Exactly as in Section II.37, we find that,
for $\lambda > 0$,
(57.3) $E^b[e^{-\lambda\rho(\tau)}] = e^{-\tau\Psi(\lambda)}$
for some function $\Psi$. Hence
(57.4) $\Psi(\lambda)^{-1} = E^b\int_0^\infty e^{-\lambda\rho(\tau)}\,d\tau = E^b\int_0^\infty e^{-\lambda t}\,dL_b(t) = \tilde p_{bb}(\lambda),$
so that, from (56.12),
(57.5) $\Psi(\lambda) = \lambda + \int_{(0, \infty]} (1 - e^{-\lambda t})\,\nu_b(dt).$
The Levy-Itô formula (II.37.4) now shows that we can write
(57.6) $\rho(\tau) = \tau + \int_{(0, \infty]} l\,N((0, \tau] \times dl),$
where $N$ is a Poisson measure on $(0, \infty) \times (0, \infty]$ with expectation measure $d\tau\,\nu_b(dl)$.
Of course, $\rho(\tau)$ is finite if and only if no atom of $N$ lies in $(0, \tau] \times \{\infty\}$. It is
therefore clear that
$P^b[\rho(\tau) < \infty] = P^b[L_b(\infty) > \tau] = \exp(-\tau\nu_b\{\infty\}).$
The jumps made by $\rho(\cdot)$ correspond to the lengths of the excursions made
by $X$ from $b$, so our description of the Itô excursion law of $X$ at $b$ must be
consistent with (57.6).
(57.7) Excursion space. Let $U$ be the excursion space of Skorokhod maps $e$ from
$[0, \infty)$ to $E \cup \partial$ such that
(i) if $e(s) = \partial$ then $e(t) = \partial$, $\forall t \ge s$;
(ii) $e(s) \ne b$ for $s > 0$.¹
¹ An excursion of $X$ from $b$ is to be considered as killed (sent to $\partial$) at its lifetime, so it cannot return to $b$.
Define $\zeta_e := \inf\{t : e(t) = \partial\}$. Let $\mathcal{U}^\circ$ be the smallest σ-algebra on $U$ measuring
each projection $e \mapsto e(t)$.
(57.8) THEOREM (Neveu). There exists a unique σ-finite measure $n$ on $(U, \mathcal{U}^\circ)$
such that, for $0 < t_1 < t_2 < \cdots < t_m$ and $i_1, i_2, \ldots, i_m \in I \setminus b$,
$n\{e \in U : e(t_k) = i_k \ (1 \le k \le m)\} = g_{bi_1}(t_1)\,{}_bp_{i_1 i_2}(t_2 - t_1) \cdots {}_bp_{i_{m-1} i_m}(t_m - t_{m-1}).$
The following statements hold:
(57.9) $n(U) = q_b, \quad n\{e : e(0) = j\} = q_{bj} \quad (j \in I \setminus b);$
(57.10) $n\{e : \zeta_e > t\} = g_b(t), \quad n\{e : e(t) = j\} = g_{bj}(t) \quad (j \in I \setminus b);$
(57.11) $n\{e : \zeta_e \in dt\} = \nu_b(dt) = \eta_b(t)\,dt$ on $(0, \infty)$,
where $\eta_b(t)$ ($t > 0$) is defined independently of $s$ in $(0, t)$ by
(57.12) $\eta_b(t) := \sum_{j \in I \setminus b} g_{bj}(s)\,f_{jb}(t - s).$
The measure $n$ is the Itô excursion law of $X$ at $b$ in the sense explained in (57.13)
below.
The business of assigning credit is terribly tortuous. What is certain is that
Neveu's papers have received much less credit than they deserve. However, we
must be careful not to overcompensate and thereby do injustice to later work.
(See, for example, Freedman's interpretation of $q_{bj}$ described below.) We are
expressing our belief that if Itô excursion laws had been discovered when Neveu
was writing on chains, then he would have expressed his ideas in the form of
Theorem 57.8. (Recall what Gauss did with ideal-class groups!)
(57.13) The role of $n$ as Itô excursion law from b. Part 8 of Chapter VI in
Volume 2 is an extensive study of Itô excursion theory; and you will have to
look there for proofs, and for much fuller explanation, of the statements now
to be made. We describe how a process $Z$ with the $P^b$ law of $X$ may be built
out of excursions from $b$.
Construct, as in Section II.37, a Poisson random measure $\Lambda$ on
$((0, \infty) \times U,\ \mathcal{B}(0, \infty) \times \mathcal{U}^\circ)$
with intensity measure Leb $\times\, n$. Let $A$ be the set of atoms of $\Lambda$, the 'points' of
the Poisson point process. A typical point of $A$ is, of course, a pair $(\sigma, e)$ where
$\sigma \in (0, \infty)$ and $e$ is an element of $U$ with lifetime $\zeta_e \le \infty$.
For $\tau \ge 0$, define
$\gamma(\tau) := \tau + \sum\{\zeta_e : (\sigma, e) \in A,\ \sigma \le \tau\}.$
Because of (56.11), (56.12), (57.6), (57.10) and (57.11), $\gamma$ is a subordinator identical
in law to the 'inverse local time' process $(\rho, P^b)$. The jumps of $\gamma$ correspond to
excursions of $Z$ from $b$, and it remains only to interpolate within these excursions.
For $t \ge 0$, we define
$Z(t) := e(t - \gamma(\tau-))$
if for some $(\tau, e) \in A$, we have $\gamma(\tau-) \le t < \gamma(\tau)$; define $Z(t) := b$ otherwise. Then $Z$
has the $P^b$ law of $X$. In particular, if $C$ is a measurable subset of $U$ then the
local time at $b$ for $X$ before the first excursion from $b$ that lies in $C$ has the
exponential distribution with rate parameter $n(C)$.
The second equation in (57.9) shows that if we write $T_{bj}$ for the time of the first
jump made by $X$ from $b$ to $j$,
$T_{bj} := \inf\{t > 0 : X(t-) = b,\ X(t) = j\},$
then
(57.14) $P^b[L_b(T_{bj}) > \tau \mid T_{bj} < \infty] = \exp(-\tau q_{bj}).$
This is the interpretation of $q_{bj}$ discovered by Freedman [2].
Another (and very closely related) interpretation of $q_{bj}$ is provided by the
theory of Levy kernels. Let $J_{bj}(t)$ be the number of jumps made by $X$ from $b$
to $j$ during time $t$. Then the idea of $q_{bj}$ as 'jump intensity' is perfectly captured
by the statement that
(57.15) $J_{bj}(t) - q_{bj}L_b(t)$ is a martingale (relative to $(\{\mathcal{F}^\circ_t\}, P^\mu)$ for every $\mu$).
In particular, for every probability measure $\mu$ on $E$,
(57.16) $E^\mu J_{bj}(t) = \int_E \mu(dx)\int_0^t p_{xb}(s)\,q_{bj}\,ds.$
The object of Levy kernel theory is to describe (simultaneously) all the jumps
of $X$. The Q-matrix $Q$, which describes the $I$-to-$I$ jumps, is just the restriction
to $I \times \mathcal{B}(I)$ of the Levy kernel of $X$, which describes all possible $E$-to-$E$ jumps.
See Benveniste and Jacod [1] and Volumes IV and V of Dellacherie and Meyer
[1].
58. Kingman's solution of the 'Markov characterisation problem'. Kingman calls
a function $p$ on $[0, \infty)$ a Markov p-function if there exists a ('standard') transition
function on a countable set $I$ and a state $b$ in $I$ such that $p_{bb}(t) = p(t)$, $\forall t$.
(58.1) THEOREM (Kingman). A continuous function $p$ on $[0, \infty)$ is a Markov
p-function if and only if its Laplace transform may be written as
$\tilde p(\lambda) = \Bigl[\lambda + \int_{(0, \infty)} (1 - e^{-\lambda t})\,\eta(t)\,dt + \nu(\{\infty\})\Bigr]^{-1} \quad (\lambda > 0),$
where $\nu(\{\infty\}) \ge 0$ and where $\eta$ is a lower-semicontinuous function on $(0, \infty)$ such
that either
(i) $\eta(t) = 0$, $\forall t$, or
(ii) for some $a$ in $(0, \infty)$,
$\eta(t) > 0 \ (0 < t < 1), \quad \eta(t) \ge e^{-at} \ (t \ge 1).$
You can see that the 'only if' part of the theorem is largely a consequence
of Neveu's work. The 'if' part is surprising and very much more difficult, and
Kingman's proof of it is a splendid tour de force. You will find this proof and
much else of interest in Kingman's book [3].
59. Symmetrisable chains. Our paper, Rogers and Williams [2], was designed
to advertise the power of Dirichlet-form theory initiated by Beurling and Deny
and spectacularly developed for use in probability theory by Fukushima [1] and
Silverstein [1,2]. We proved the following theorem by a finite-state
approximation technique originally used by Reuter and Ledermann for birth-and-death
processes.
(59.1) THEOREM. Let $Q$ be an $I \times I$ matrix such that
(59.2) $q_{ij} \ge 0 \ (i \ne j), \quad \sum_{j \ne i} q_{ij} = -q_{ii} \le \infty, \quad \forall i,$
and such that $Q$ is m-symmetrisable in that
$m_i q_{ij} = m_j q_{ji} \quad (i, j \in I)$
for some strictly positive numbers $(m_i : i \in I)$. Then there exists a standard transition
function $\{P(t)\}$ on $I$ with Q-matrix $Q$ and m-symmetrisable in that
$m_i p_{ij}(t) = m_j p_{ji}(t) \quad (i, j \in I;\ t \ge 0)$
if and only if
(59.3) $\mathcal{D}(\mathcal{E}) := \{f \in \ell^2(m) : \mathcal{E}(f, f) < \infty\}$ is dense in $\ell^2(m)$,
where $\mathcal{E}$ is the Dirichlet form or energy norm associated with $Q$:
$\mathcal{E}(f, f) = \sum_i \sum_j m_i q_{ij}(f_j - f_i)^2.$
Though, in general, $\{P(t)\}$ is by no means unique, there is a 'canonical' $\{P(t)\}$ with
the properties described. If $m$ is a finite measure, the canonical $\{P(t)\}$ is honest.
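For a finite birth-and-death chain the symmetrising weights and the energy form can be written down directly. The following is a minimal sketch under assumptions that are not in the text: the rates `BIRTH` and `DEATH` and every function name are hypothetical, $P(t)$ is taken to be $\exp(tQ)$ (valid for finite $I$), and the energy is computed as the double sum of $m_i q_{ij}(f_j - f_i)^2$ over $i \ne j$.

```python
# Hypothetical birth-and-death chain on {0, 1, 2, 3, 4}.
BIRTH = [1.0, 2.0, 1.5, 0.5]   # BIRTH[i] is the rate of the jump i -> i+1
DEATH = [2.0, 1.0, 3.0, 1.0]   # DEATH[i] is the rate of the jump i+1 -> i


def build_q(birth, death):
    """Tridiagonal Q-matrix with the given jump rates and zero row sums."""
    n = len(birth) + 1
    q = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        q[i][i + 1] = birth[i]
        q[i + 1][i] = death[i]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def symmetrising_weights(birth, death):
    """Weights m with m_i q_{i,i+1} = m_{i+1} q_{i+1,i}, normalised so m_0 = 1."""
    m = [1.0]
    for up, down in zip(birth, death):
        m.append(m[-1] * up / down)
    return m


def energy(m, q, f):
    """Double sum of m_i q_ij (f_j - f_i)^2 over i != j."""
    n = len(q)
    return sum(m[i] * q[i][j] * (f[j] - f[i]) ** 2
               for i in range(n) for j in range(n) if j != i)


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=30, squarings=10):
    """Approximate exp(t*q): truncated Taylor series plus scaling and squaring."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def symmetry_defect(m, p):
    """Largest value of |m_i p_ij - m_j p_ji|; zero when p is m-symmetrisable."""
    n = len(p)
    return max(abs(m[i] * p[i][j] - m[j] * p[j][i])
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    q_matrix = build_q(BIRTH, DEATH)
    weights = symmetrising_weights(BIRTH, DEATH)
    print(weights)
    print(energy(weights, q_matrix, [0.0, 1.0, 2.0, 3.0, 4.0]))
    print(symmetry_defect(weights, mat_exp(q_matrix, 0.8)))
```

In the finite case the density condition (59.3) holds trivially, so the example only illustrates the symmetry relations and the form itself; the interesting content of the theorem lies in infinite $I$.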
It was too glibly stated in our paper that if every $q_{ii}$ is finite then the canonical
$\{P(t)\}$ process corresponds to the chain reflected off its Martin boundary. Ivor
McGillivray has pointed out that we should have said 'reflected off its Kuramochi
boundary', the Kuramochi boundary being a boundary analogous to the Martin
boundary (and agreeing with it in the cases most frequently encountered) but
specially tailored for symmetrisable processes.
60. An open problem. Here, to end with, is a problem we should like to solve.
Suppose that $m$ is a probability measure on $I$, and that $Q$ is an $I \times I$ matrix
satisfying (59.2) and also
(60.1) $\sum_{j \ne i} m_j q_{ji} = -m_i q_{ii} \le \infty, \quad \forall i.$
When does there exist a (positive-recurrent) chain $X$ with Q-matrix $Q$ and with
$m$ as invariant measure?
The condition (59.3) is necessary; see Rogers and Williams [2].
References for Volumes 1 and 2
Abrahams, R. and Robbin, J.
[1] Transversal Mappings and Flows, Benjamin, New York, Amsterdam, 1967.
Adler, R. J.
[1] The Geometry of Random Fields, Wiley, Chichester, 1981.
[2] An Introduction to Continuity, Extrema, and Related Topics for General Gaussian
Processes, IMS Lecture Notes-Monograph Series Vol. 12, IMS, Hayward, Calif.,
1990.
Aizenmann, M. and Simon, B.
[1] Brownian motion and the Harnack inequality for Schrodinger operators, Comm.
Pure and Appl. Math., 35, 209-273 (1982).
Albeverio, S., Blanchard, P. and Hoegh-Krohn, R.
[1] Newtonian diffusions and planets, with a remark on non-standard Dirichlet forms
and polymers, Stochastic Analysis and Applications: Lecture Notes in Mathematics
1095, Springer, Berlin, 1984, pp. 1-24.
Albeverio, S., Fenstad, I.E., Hoegh-Krohn, R. and Lindstrom, T.
[1] Nonstandard Methods in Probability and Mathematical Physics, Academic Press,
New York (1986).
Aldous, D. J.
[1] Stopping times and tightness, Ann. Prob., 6, 335-40 (1978).
Ancona, A.
[1] Negatively curved manifolds, elliptic operators and Martin boundary, Ann. Math.,
125, 495-536 (1987).
Arnold, L. and Wihstutz, V. (editors)
[1] Lyapunov Exponents (Proceedings): Lecture Notes in Mathematics 1186, Springer,
Berlin, 1986.
Azema, J.
[1] Sur les fermes aleatoires, Seminaires de Probabilites XIX: Lecture Notes in
Mathematics 1123, Springer, Berlin, 1985, pp. 297-495.
Azema, J. and Yor, M.
[1] Une solution simple au probleme de Skorokhod, Seminaire de probabilites XIII:
Lecture Notes in Mathematics 721, Springer, Berlin, 1979, pp. 90-115, 625-633.
[2] (editors) Temps locaux, Asterisque 52-53 Societe Mathematique de France (1978).
[3] Etude d'une martingale remarquable, Seminaire de Probabilites XXIII: Lecture
Notes in Mathematics 1372, Springer, Berlin, 1989, pp. 88-130.
Azencott, R.
[1] Grandes deviations et applications, Ecole d'Ete de Probabilites de Saint-Flour VIII:
Lecture Notes in Mathematics 774, Springer, Berlin, 1980.
Barlow, M. T.
[1] Study of a filtration expanded to include an honest time, Z. Wahrscheinlichkeitstheorie,
44, 307-323 (1978).
[2] Decomposition of a Markov process at an honest time (unpublished).
[3] One dimensional stochastic differential equation with no strong solution, J. London
Math. Soc, 26, 335-347 (1982).
[4] On Brownian local time, Seminaire de Probabilites XV: Lecture Notes in Mathematics
850, Springer, Berlin, 1981, pp. 189-190.
[5] Necessary and sufficient conditions for the continuity of local time of Levy processes,
Ann. Prob. 16, 1389-1427 (1988).
Barlow, M. T. and Hawkes, J.
[1] Application d'entropie metrique a la continuite des temps locaux des processus de
Levy. C.R. Acad. Sci. Paris Ser. I, 301, 237-239 (1985).
Barlow, M. T., Jacka, S. and Yor, M.
[1] Inequalities for a pair of processes stopped at a random time, Proc. London Math.
Soc., 52, 142-172 (1986).
[2] Inegalites pour un couple de processus arretes a un temps quelconque, C.R. Acad.
Sci., 299, 351-354 (1984).
Barlow, M. T. and Perkins, E.
[1] One-dimensional stochastic differential equations involving a singular increasing
process, Stochastics, 12, 229-249 (1984).
[2] Strong existence, uniqueness and non-uniqueness in an equation involving local
time, Seminaire de Probabilites XVII: Lecture Notes in Mathematics 986, Springer,
Berlin, 1983, pp. 32-66.
Barlow, M. T. and Yor, M.
[1] (Semi-) martingale inequalities and local times, Z. Wahrscheinlichkeitstheorie 55,
237-254 (1981).
[2] Semi-martingale inequalities via the Garsia-Rodemich-Rumsey lemma and
applications to local times, J. Funct. Anal., 49, 198-229 (1982).
Bass, R. and Cranston, M.
[1] The Malliavin calculus for pure jump processes and applications to local time, Ann.
Prob., 14, 490-532 (1986).
Batchelor, G. K.
[1] Kolmogoroff's theory of locally isotropic turbulence, Proc. Camb. Phil. Soc, 43,
555-559 (1947).
Baxendale, P.
[1] Asymptotic behaviour of stochastic flows of diffeomorphisms; two case studies, Prob.
Th. Rel. Fields, 73, 51-85 (1986).
[2] Moment stability and large deviations for linear stochastic differential equations,
Proc. Taniguchi Symposium on Probabilistic Methods in Mathematical Physics,
Katata and Kyoto, 1985 (ed. N. Ikeda), Kinokuniya, Tokyo, 31-54 (1986).
[3] The Lyapunov spectrum of a stochastic flow of diffeomorphisms, in Arnold and
Wihstutz [1], pp. 322-337 (1986).
[4] Brownian motions on the diffeomorphism group, I, Compos. Math., 53,19-50 (1984).
Baxendale, P. and Harris, T. E.
[1] Isotropic stochastic flows. Ann. Prob., 14, 1155-1179 (1986).
Baxendale, P. and Stroock, D. W.
[1] Large deviations and stochastic flows of diffeomorphisms, Prob. Th. Rel. Fields, 80,
169-215 (1988).
Bensoussan, A.
[1] Lectures on stochastic control, Nonlinear Filtering and Stochastic Control: Lecture
Notes in Mathematics 972, Springer, Berlin, 1982, pp. 1-62.
Benes, V. E., Shepp, L. A. and Witsenhausen, H. S.
[1] Some solvable stochastic control problems, Stochastics, 4, 39-83 (1980).
Benveniste, A. and Jacod, J.
[1] Systemes de Levy des processus de Markov, Invent. Math., 21, 183-198 (1973).
Berman, S. M.
[1] Local times and sample function properties of stationary Gaussian processes, Trans.
Amer. Math.Soc, 137, 277-300 (1969).
[2] Harmonic analysis of local times and sample functions of Gaussian processes, Trans.
Amer. Math. Soc, 143, 269-281 (1969).
[3] Gaussian processes with stationary increments: local times and sample function
properties, Ann. Math. Statist., 41, 1260-1272 (1970).
Biane, P.
[1] Comparaison entre temps d'atteinte et temps de sejour de certaines diffusions reelles,
Seminaire de Probabilites XIX, Lecture Notes in Mathematics 1123, Springer, Berlin,
1985, pp. 291-296.
Bichteler, K.
[1] Stochastic integration and $L^p$-theory of semi-martingales, Ann. Prob., 9, 49-89 (1981).
Bichteler, K. and Fonken, D.
[1] A simple version of the Malliavin calculus in dimension one, Martingale Theory in
Harmonic Analysis and Banach Spaces: Lecture Notes in Mathematics 939, Springer,
Berlin, 1982, pp. 6-12.
Bichteler, K. and Jacod, J.
[1] Calcul de Malliavin pour les diffusions avec sauts: Existence d'une densite dans le
cas unidimensionnel, Seminaire de Probabilites XVII: Lecture Notes in Mathematics
986, Springer, Berlin, 1983, pp. 132-157.
Billingsley, P.
[1] Ergodic Theory and Information, Wiley, New York, 1965.
[2] Convergence of Probability Measures, Wiley, New York, 1968.
[3] Conditional distributions and tightness, Ann. Prob., 2, 480-485 (1974).
Bingham, N. H.
[1] Fluctuation theory in continuous time, Adv. Appl. Prob., 7, 705-766 (1975).
Bingham, N. H. and Doney, R. A.
[1] On higher-dimensional analogues of the arc-sine law, J. Appl. Prob. 25, 120-131
(1988).
Bishop, R. and Crittenden, R. J.
[1] Geometry of Manifolds, Academic Press, New York, 1964.
Bismut, J.-M.
[1] Mechanique Aleatoire: Lecture Notes in Mathematics 866, Springer, Berlin, 1981.
[2] Martingales, the Malliavin calculus and hypoellipticity under general Hormander's
conditions, Z. Wahrscheinlichkeitstheorie, 56, 469-505 (1981).
[3] Calcul de variations stochastiques et processus de sauts, Z. Wahrscheinlichkeitstheorie
56, 469-505 (1983).
[4] Large deviations and the Malliavin calculus, Progress in Mathematics, Birkhauser,
Boston, 1984.
[5] The Atiyah-Singer theorems; a probabilistic approach: I, The index theorem, J.
Funct. Anal., 57,56-98 (1984); II, The Lefschetz fixed-point formulas, ibid, 329-348.
Bismut, J.-M. and Michel, D.
[1] Diffusions conditionnelles, I, II, J. Funct. Anal., 44, 174-211 (1981), 45, 274-292
(1981).
Blackwell, D. and Kendall, D. G.
[1] The Martin boundary for Polya's urn scheme and an application to stochastic
population growth, J. Appl. Prob. 1, 284-296 (1964).
Blumenthal, R. M. and Getoor, R. K.
[1] Markov Processes and Potential theory, Academic Press, New York, 1968.
[2] Local times for Markov processes. Z. Wahrscheinlichkeitstheorie verw. Geb., 3, 50-74
(1964).
Bondesson, L.
[1] Classes of infinitely divisible distributions and densities. Z. Wahrscheinlichkeitstheorie
verw Geb., 57, 39-71 (1981).
Bougerol, P. and Lacroix, J.
[1] Products of Random Matrices with Applications to Schrodinger Operators, Birkhauser,
Boston, 1985.
Bourbaki, N.
[1] Topologie generale, in Elements de Mathematique, Hermann, Paris, 1958, Chap. IX,
2nd edition.
Breiman, L.
[1] Probability, Addison-Wesley, Reading, Mass., 1968.
Brémaud, P.
[1] Point Processes and Queues: Martingale Dynamics, Springer, New York, 1981.
Bretagnolle, J.
[1] Resultats de Kesten sur les processus a accroissements independantes, Seminaire de
Probabilites V, Lecture Notes in Mathematics 191, Springer, Berlin, 1971, pp. 21-36.
Brydges, D., Frohlich, J. and Spencer, T.
[1] The random walk representation of classical spin systems and correlation inequalities.
Comm. Math. Phys., 83, 123-150 (1982).
Burdzy, K.
[1] On nonincrease of Brownian motion. Ann. Prob. 18, 978-980 (1990).
[2] Brownian paths and cones, Ann. Prob. 13, 1006-1010 (1985).
[3] Cut points on Brownian paths. Ann. Prob. 17, 1012-1036 (1989).
Burkholder, D.
[1] Distribution function inequalities for martingales, Ann. Prob., 1, 19-42 (1973).
Carlen, E. A.
[1] Conservative diffusions, Comm. Math. Phys., 94, 293-315 (1984).
[2] Potential scattering in quantum mechanics, Ann. Inst. H. Poincare, 42, 407-428
(1985).
Carverhill, A. P.
[1] Flows of stochastic dynamical systems: ergodic theory, Stochastics, 14, 273-318 (1985).
[2] A formula for the Lyapunov exponents of a stochastic flow. Application to a
perturbation theorem, Stochastics, 14, 209-226 (1985).
[3] A nonrandom Lyapunov spectrum for nonlinear stochastic dynamical systems,
Stochastics, 17, 209-226, 1986.
Carverhill, A. P., Chappell, M. J. and Elworthy, K. D.
[1] Characteristic exponents for stochastic flows, Proceedings, BiBoS I: Stochastic
Processes.
Carverhill, A. P. and Elworthy, K. D.
[1] Flows of stochastic dynamical systems: the functional analytic approach, Z.
Wahrscheinlichkeitstheorie, 65, 245-268 (1983).
Chaleyat-Maurel, M.
[1] La condition d'hypoellipticite d'Hormander, Asterisque, 84-85, 189-202 (1981).
Chaleyat-Maurel, M. and El Karoui, N.
[1] Un probleme de reflexion et ses applications au temps local et aux equations
differentielles stochastiques sur R, cas continu. In Azema and Yor [2], pp. 117-144.
Cheeger, J. and Ebin, D. G.
[1] Comparison Theorems in Riemannian Geometry, North-Holland, Amsterdam, 1975.
Chung, K. L.
[1] Markov Chains with Stationary Transition Probabilities, 2nd edition, Springer, Berlin,
1967.
[2] Probabilistic approach in potential theory to the equilibrium problem, Ann. Inst.
Fourier, Grenoble, 23, 313-322 (1973).
[3] Excursions in Brownian motion, Ark. Mat., 14, 155-177 (1976).
Chung, K. L. and Getoor, R. K.
[1] The condenser problem, Ann. Prob., 5, 82-86 (1977).
Chung, K. L. and Walsh, J. B.
[1] To reverse a Markov process, Acta Math., 123, 225-251 (1969).
[2] Meyer's theorem on previsibility, Z. Wahrscheinlichkeitstheorie, 29, 253-256 (1974).
Chung, K. L. and Williams, R. J.
[1] Introduction to Stochastic Integration, Birkhauser, Boston, 1983.
Ciesielski, Z.
[1] Holder conditions for realisations of Gaussian processes. Trans. Amer. Math. Soc.,
99, 403-413 (1961).
Ciesielski, Z. and Taylor, S. J.
[1] First passage times and sojourn times for Brownian motion in space and the
exact Hausdorff measure of the sample path, Trans. Amer. Math. Soc., 103, 434-450
(1962).
Çinlar, E., Chung, K. L. and Getoor, R. K. (editors)
[1] Seminars on Stochastic Processes 1981, 1982, 1983, 1984 (four volumes), Birkhauser,
Boston, 1982, 1983, 1984, 1985.
Çinlar, E., Chung, K. L., Getoor, R. K. and Glover, J. (editors)
[1] Seminar on Stochastic Processes 1986, Birkhauser, Boston, 1987.
Çinlar, E., Jacod, J., Protter, P. and Sharpe, M. J.
[1] Semimartingales and Markov processes, Z. Wahrscheinlichkeitstheorie, 54, 161-220
(1980).
Clark, J. M. C.
[1] The representation of functionals of Brownian motion by stochastic integrals, Ann.
Math. Stat., 41, 1282-1295 (1970); 42, 1778 (1971).
[2] An introduction to stochastic differential equations on manifolds, Geometric Methods
in Systems Theory (eds. D. Q. Mayne and R. W. Brockett), Reidel, Dordrecht, 1973.
[3] The design of robust approximations to the stochastic differential equations of
nonlinear filtering, Communications Systems and Random Process Theory (ed. J.
Skwirzynski), Sijthoff and Noordhoff, Alphen aan den Rijn, 1978.
Clarkson, B. (editor)
[1] Stochastic Problems in Dynamics, Pitman, London, 1977.
Cocozza, C. and Yor, M.
[1] Demonstration simplifiee d'un theoreme de Knight, Seminaire de Probabilites XIV:
Lecture Notes in Mathematics 784, Springer, Berlin, 1980, pp. 496-499.
Crank, J.
[1] The Mathematics of Diffusion, 2nd ed. Oxford University Press, Oxford (1975).
Cranston, M.
[1] On the means of approach of Brownian motion, Ann. Probab., 15, 1009-1013 (1987).
Cutland, N.
[1] Non-standard measure theory and its applications, Bull. London Math. Soc., 15,
529-589 (1983).
Cutland, N. and Kendall, W. S.
[1] A non-standard proof of one of David Williams' splitting-time theorems, in D. G.
Kendall [5], pp. 37-48.
Darling, R. W. R.
[1] Martingales in manifolds—definition, examples, and behaviour under maps,
Seminaire de Probabilites XVI Supplement: Lecture Notes in Mathematics 921,
Springer, Berlin, 1982, pp. 217-236.
Davies, E. B. and Simon, B.
[1] Ultracontractivity and the heat kernel for Schrodinger operators and Dirichlet
Laplacians, J. Funct. Anal. 59, 335-395 (1984).
Davis, B.
[1] Picard's theorem and Brownian motion, Trans. Amer. Math. Soc., 213, 353-362
(1975).
[2] Applications of the conformal invariance of Brownian motion, Harmonic analysis
in Euclidean Space.
Davis, M. H. A.
[1] On a multiplicative functional transformation arising in non-linear filtering theory,
Z. Wahrscheinlichkeitstheorie, 54, 125-139 (1980).
[2] Pathwise non-linear filtering, Stochastic Systems: the Mathematics of Filtering and
Identification and Applications (eds. M. Hazewinkel and J. C. Willems), Reidel,
Dordrecht, 1981.
[3] Some current issues in stochastic control theory, Stochastics.
[4] Markov Models and Optimization, Chapman & Hall, London, 1993.
Davis, M. H. A. and Varaiya, P.
[1] Dynamic programming conditions for partially observed stochastic systems, SIAM
J. Control, 11, 226-261 (1973).
Dawson, D. A.
[1] Measure-valued Markov processes, Ecole d'Ete de Probabilites de Saint-Flour XXI,
1993 (ed. P. L. Hennequin), Lecture Notes in Mathematics 1541, 1993.
Dawson, D. A. and Gartner, J.
[1] Large deviations from the McKean-Vlasov limit for weakly-interacting diffusions,
Stochastics, 20, 247-308 (1987).
Dellacherie, C.
[1] Capacites et Processus Stochastiques, Springer, Berlin, 1972.
[2] Quelques exemples familiers en probabilites d'ensembles analytiques non-Boreliens,
Seminaire de Probabilites XII: Lecture Notes in Mathematics, Springer, Berlin, 1978,
pp. 742-745.
[3] Un survol de la theorie de l'integrale stochastique, Stoch. Proc. Appl., 10, 115-144
(1980).
Dellacherie, C., Doléans(-Dade), Catherine, Letta, G. and Meyer, P. A.
[1] Diffusions a coefficients continus d'apres D. W. Stroock et S. R. S. Varadhan,
Seminaire de Probabilites IV: Lecture Notes in Mathematics 124, Springer, Berlin,
1970, pp. 241-282.
Dellacherie, C. and Meyer, P. A.
[1] Probabilites et Potentiel, Chaps. I-IV, Hermann, Paris, 1975; Chaps. V-VIII,
Hermann, Paris, 1980; Chaps. IX-XI, Hermann, Paris, 1983; Chaps. XII-XVI,
Hermann, Paris, 1987; Chaps. XVII-XXIV, Hermann, Paris, 1993.
Deuschel, J.-D. and Stroock, D. W.
[1] Large Deviations, Academic Press, Boston, 1989.
De Witt-Morette, C. and Elworthy, K. D. (editors)
[1] New stochastic methods in physics, Phys. Rep., 77, 121-382 (1981).
Doléans(-Dade), C.
[1] Existence du processus croissant naturel associe a un potentiel de la classe (D), Z.
Wahrscheinlichkeitstheorie 9, 309-314 (1968).
[2] Quelques applications de la formule de changement de variables pour les
semimartingales, Z. Wahrscheinlichkeitstheorie, 16, 181-194 (1970).
Doléans-Dade, C. and Meyer, P. A.
[1] Equations differentielles stochastiques, Seminaire de Probabilites XI: Lecture Notes
in Mathematics 581, Springer, Berlin, 1977, pp. 376-382.
Doney, R. A.
[1] On the maxima of random walks and stable processes and the arc-sine law, Bull.
London Math. Soc., 19, 177-182 (1987).
[2] A path decomposition for Levy processes, Stoch. Proc. Appl., 47, 167-181 (1993).
Doob, J. L.
[1] Stochastic Processes, Wiley, New York, 1953.
[2] State-spaces for Markov chains, Trans. Amer. Math. Soc. 149, 279-305 (1970).
[3] Classical Potential Theory and its Probabilistic Counterpart, Springer, New York,
1981.
Doss, H.
[1] Liens entre equations differentielles stochastiques et ordinaires, Ann. Inst. Henri
Poincare B, 13, 99-126 (1977).
Dubins, L. and Schwarz, G.
[1] On continuous martingales, Proc. Natl. Acad. Sci. USA, 53, 913-916 (1965).
Dunford, N. and Schwartz, J. T.
[1] Linear Operators: Part I, General Theory, Interscience, New York, 1958.
Durrett, R.
[1] Brownian Motion and Martingales in Analysis, Wadsworth, Belmont, Calif., 1984.
[2] (editor) Particle systems, random media, large deviations, Contemp. Math. 41, Amer.
Math. Soc., Providence, RI, 1985.
[3] Probability: Theory and Examples, Wadsworth & Brooks Cole, Pacific Grove, Calif.,
1991.
Dvoretsky, A., Erdos, P. and Kakutani, S.
[1] Double points of paths of Brownian motion in n-space, Acta Sci. Math. (Szeged),
12, 64-81 (1950).
[2] Multiple points of paths of Brownian motion in the plane, Bull. Res. Council Isr.
Sect. F, 3, 364-371 (1954).
[3] Points of multiplicity c of plane Brownian paths, Bull. Res. Council Isr. Sect. F, 7,
175-180 (1958).
Dvoretsky, A., Erdos, P., Kakutani, S. and Taylor, S. J.
[1] Triple points of Brownian motion in 3-space, Proc. Camb. Phil. Soc., 53, 856-862
(1957).
Dynkin, E. B.
[1] Theory of Markov Processes, Pergamon Press, Oxford, 1960.
[2] Markov Processes (two volumes), Springer, Berlin, 1965.
[3] Non-negative eigenfunctions of the Laplace-Beltrami operator and Brownian
motion in certain symmetric spaces (in Russian), Dokl. Akad. Nauk SSSR, 141,
288-291 (1961).
[4] Diffusion of tensors, Dokl. Akad. Nauk SSSR, 179, 1264-1267 (1968).
[5] Local times and quantum fields, in Çinlar, Chung and Getoor [1, 1983].
[6] Gaussian and non-Gaussian random fields associated with Markov processes, J.
Funct. Anal., 55, 344-376 (1984).
[7] Self-intersection local times, occupation fields and stochastic integrals (to appear in
Adv. App. Math.).
[8] Random fields associated with multiple points of the Brownian motion, J. Funct.
Anal., 62, 397-434 (1985).
[9] Local times and quantum fields, in Çinlar, Chung and Getoor [1, 1984].
Elliott, R. J.
[1] Stochastic Calculus and Applications, Springer, Berlin, 1982.
Elliott, R. J. and Anderson, B. D. O.
[1] Reverse time diffusions, Stochastic Processes and their Applications, 19, 327-339
(1985).
Elworthy, K. D.
[1] Stochastic Differential Equations on Manifolds, London Mathematical Society Lecture
Note Series 20, Cambridge University Press, Cambridge, 1982.
[2] (editor) From Local Time to Global Geometry, Control and Physics, Proceedings,
Warwick Symposium 1984/85, Longman, Harlow/Wiley, New York, 1986.
Elworthy, K. D. and Stroock, D. W.
[1] Large deviation theory for mean exponents of stochastic flows, Appendix to
Carverhill, Chappell and Elworthy [1].
Elworthy, K. D. and Truman, A.
[1] Classical mechanics, the diffusion (heat) equation and the Schrodinger equation on
a Riemannian manifold, J. Math. Phys., 22, 2144-2166 (1981).
[2] The diffusion equation and classical mechanics: an elementary formula, Stochastic
processes in quantum theory and statistical physics (ed. S. Albeverio et al), Lecture
Notes in Physics 173, Springer, Berlin, 1982, pp. 136-146.
Emery, M.
[1] Annoncabilite des temps previsibles: deux contre-exemples, Seminaire de Probabilites
XIV: Lecture Notes in Mathematics 784, Springer, Berlin, 1980, pp. 318-323.
[2] On the Azema martingales, Seminaire de Probabilites XXIII: Lecture Notes in
Mathematics 1372, Springer, Berlin, 1989, pp. 66-88.
Ethier, S. N. and Kurtz, T. G.
[1] Markov Processes: Characterization and Convergence, Wiley, New York, 1986.
Evans, S. N.
[1] On the Hausdorff dimension of Brownian cone points, Math. Proc. Camb. Phil. Soc.,
98, 343-353 (1985).
[2] Multiple points in the sample paths of a Levy process, Prob. Th. Rel. Fields, 76,
359-367 (1987).
Feller, W.
[1] Introduction to Probability Theory and its Applications, Vol. 1, 2nd edition, Wiley,
New York, 1957; Vol. 2, Wiley, New York, 1966.
[2] Boundaries induced by non-negative matrices, Trans. Amer. Math. Soc., 83, 19-54
(1956).
[3] On boundaries and lateral conditions for the Kolmogorov equations, Ann. Math.,
Ser. II, 65, 527-570 (1957).
[4] Generalized second-order differential operators and their lateral conditions, Illinois
J. Math., 1, 459-504 (1957).
Fleming, W. H. and Rishel, R. W.
[1] Deterministic and Stochastic Optimal Control, Springer, Berlin, 1975.
Föllmer, H.
[1] Calcul d'Ito sans probabilites, Seminaire de Probabilites XV: Lecture Notes in
Mathematics 850, Springer, Berlin, 1981, pp. 143-150.
Freedman, D.
[1] Brownian Motion and Diffusion, Holden-Day, San Francisco, 1971.
[2] Approximating Countable Markov Chains, Holden-Day, San Francisco, 1972.
Friedman, A.
[1] Stochastic Differential Equations and Applications (two volumes), Academic Press,
New York, 1975.
Fristedt, B.
[1] Sample functions of stochastic processes with stationary independent increments,
Adv. Prob., 3, 241-396 (1973).
Fujisaki, M., Kallianpur, G. and Kunita, H.
[1] Stochastic differential equations for the non-linear filtering problem, Osaka J. Math.,
9, 19-40 (1972).
Fukushima, M.
[1] Dirichlet Forms and Markov Processes, Kodansha, Tokyo, 1980.
[2] Basic properties of Brownian motion and a capacity on the Wiener space, J. Math.
Soc. Japan, 36, 161-176 (1984).
Garcia Alvarez, M. A. and Meyer, P. A.
[1] Une theorie de la dualite a un ensemble polaire pres: I, Ann. Prob., 1, 207-222 (1973).
Garsia, A.
[1] Martingale Inequalities: Seminar Notes on Recent Progress, Benjamin, Reading,
Mass., 1973.
Garsia, A., Rodemich, E. and Rumsey, H. Jr.
[1] A real variable lemma and the continuity of paths of some Gaussian processes.
Indiana Univ. Math. J., 20, 565-578 (1970).
Geman, D. and Horowitz, J.
[1] Occupation densities, Ann. Prob., 8, 1-67 (1980).
Geman, D., Horowitz, J. and Rosen, J.
[1] A local time analysis of intersections of Brownian paths in the plane, Ann. Prob.,
12, 86-107 (1984).
Getoor, R. K.
[1] Markov Processes: Ray Processes and Right Processes: Lecture Notes in Mathematics
440, Springer, Berlin, 1975.
[2] Excursions of a Markov process, Ann. Prob., 8, 244-266 (1979).
[3] Splitting times and shift functionals, Z. Wahrscheinlichkeitstheorie, 47, 69-81 (1979).
Getoor, R. K. and Sharpe, M. J.
[1] Last exit times and additive functionals, Ann. Prob., 1, 550-569 (1973).
[2] Excursions of Brownian motion and Bessel processes, Z. Wahrscheinlichkeitstheorie,
47, 83-106 (1979).
[3] Last exit decompositions and distributions, Indiana Univ. Math. J., 23, 377-404
(1973).
[4] Excursions of dual processes, Adv. Math., 45, 259-309 (1982).
[5] Conformal martingales, Invent. Math., 16, 271-308 (1972).
Gikhman, I. I. and Skorokhod, A. V.
[1] The Theory of Stochastic Processes (three volumes), Springer, Berlin, 1979.
Gray, A., Karp, L. and Pinsky, M. A.
[1] The mean exit time from a tube in a Riemannian manifold, Probability and Harmonic
Analysis (eds. J. Chao and W. Woyczynski), Dekker, 1986, pp. 113-137.
Gray, A. and Pinsky, M. A.
[1] The mean exit time from a small geodesic ball in a Riemannian manifold, Bull. Sci.
Math., 107, 345-370 (1983).
Greenwood, P. and Perkins, E.
[1] A conditional limit theorem for random walk and Brownian local time on square
root boundaries, Ann. Prob. 11, 227-261 (1982).
[2] Limit theorems for excursions from a moving boundary. Th. Prob. Appl. 29, 703-714
(1984).
Greenwood, P. and Pitman, J. W.
[1] Construction of local time and Poisson point processes from nested arrays, J. London
Math. Soc. (2), 22, 182-192 (1980).
[2] Fluctuation identities for Levy processes and splitting at the maximum, Adv. Appl.
Prob., 12, 893-902 (1980).
Grenander, U.
[1] Probabilities on Algebraic Structures, Wiley, New York, 1963.
Griffeath, D.
[1] Coupling methods for Markov processes, Advances in Mathematics Supplementary
Studies: Studies in Probability and Ergodic Theory, Vol. 2, Academic Press, New
York, 1978, pp. 1-43.
Gromov, M. and Rohlin, V. A.
[1] Russian Math. Surveys, 25, 1-57 (1970).
Grosswald, E.
[1] The Student t-distribution of any degree of freedom is infinitely divisible, Z.
Wahrscheinlichkeitstheorie verw. Geb., 36, 103-109 (1976).
Halmos, P.
[1] Measure Theory, Van Nostrand, Princeton, NJ, 1959.
Harris, T. E.
[1] Brownian motions on the homeomorphisms of the plane, Ann. Prob., 9, 232-254
(1981).
Haussmann, U.
[1] On the integral representation of Ito processes, Stochastics, 3, 17-27 (1979).
[2] A Stochastic Maximum Principle for Optimal Control of Diffusions, Longman,
Harlow, 1986.
Hawkes, J.
[1] Multiple points for symmetric Levy processes, Math. Proc. Camb. Phil. Soc., 83, 83-90
(1978).
[2] The measure of the range of a subordinator, Bull. London Math. Soc., 5, 21-28
(1973).
[3] Local times as stationary processes, From Local Time to Global Geometry, Control and
Physics, Research Notes in Math. 150, Pitman, Harlow, 1986, pp. 111-120.
Hazewinkel, M. and Willems, J. C. (editors)
[1] Stochastic Systems: The Mathematics of Filtering and Identification and Applications,
Reidel, Dordrecht, 1981.
Helgason, S.
[1] Differential Geometry and Symmetric Spaces, Academic Press, New York, 1962.
Helms, L. L.
[1] Introduction to Potential Theory, Robert E. Krieger, Huntington, NY, 1975.
Hille, E. and Phillips, R. S.
[1] Functional Analysis and Semigroups, Amer. Math. Soc., Providence, RI, 1957.
Holley, R., Stroock, D. W. and Williams, D.
[1] Applications of dual processes to diffusion theory, Proc. Amer. Math. Soc. Prob.
Symp., Urbana, 1976, pp. 23-36.
Hörmander, L.
[1] Hypoelliptic second-order differential equations, Acta Math., 117, 147-171 (1967).
Hsu, P.
[1] On excursions of reflecting Brownian motion, Trans. Amer. Math. Soc., 296, 239-264 (1986).
[2] Brownian motion and the index theorem (to appear).
Hunt, G. A.
[1] Markoff processes and potentials: I, II, III, Illinois J. Math., 1, 44-93; 316-369
(1957); 2, 151-213 (1958).
Ikeda, N. and Watanabe, S.
[1] Stochastic Differential Equations and Diffusion Processes, North Holland-Kodansha,
Amsterdam and Tokyo, 1981.
[2] Malliavin calculus of Wiener functionals and its applications, in Elworthy [2],
pp. 132-178.
Ismail, M. E. and Kelker, D. H.
[1] The Bessel polynomials and the Student t-distribution, SIAM J. Math. Anal., 7,
82-91 (1976).
Itô, K.
[1] Stochastic integral, Proc. Imp. Acad. Tokyo, 20, 519-524 (1944).
[2] On a stochastic integral equation, Proc. Imp. Acad. Tokyo, 22, 32-35 (1946).
[3] Stochastic differential equations in a differential manifold, Nagoya Math. J., 1, 35-47
(1950).
[4] The Brownian motion and tensor fields on a Riemannian manifold, Proc. Int.
Congr. Math., Stockholm, 1963, pp. 536-539.
[5] Stochastic parallel displacement, Probabilistic Methods in Differential Equations:
Lecture Notes in Mathematics 451, Springer, Berlin, 1975, pp. 1-7.
[6] Poisson point processes attached to Markov processes, Proc. 6th Berkeley Symp.
Math. Statist. Prob., Vol. 3, University of California Press, Berkeley, 1971, pp. 225-240.
[7] (editor) Proceedings of the 1982 Taniguchi Int. Symp. on Stochastic Analysis,
Kinokuniya-Wiley, 1984.
[8] Stationary random distributions. Mem. Coll. Sci. Kyoto Univ. Ser. A, 28, 209-223
(1954).
Itô, K. and McKean, H. P.
[1] Diffusion Processes and their Sample Paths, Springer, Berlin, 1965.
Jacka, S.
[1] A finite fuel stochastic control problem, Stochastics, 10, 103-113 (1983).
[2] A local time inequality for martingales, Seminaire de Probabilites XVII: Lecture
Notes in Mathematics 986, Springer, Berlin, 1983.
Jacobsen, M.
[1] Splitting times for Markov processes and a generalised Markov property for
diffusions, Z. Wahrscheinlichkeitstheorie, 30, 27-43 (1974).
[2] Statistical Analysis of Counting Processes: Lecture Notes in Statistics 12,
Springer, New York, 1982.
Jacod, J.
[1] A general theorem of representation for martingales, Proc. Amer. Math. Soc. Prob.
Symp., Urbana, 1976, 37-53.
[2] Calcul Stochastique et Problemes de Martingales: Lecture Notes in Mathematics 714,
Springer, Berlin, 1979.
Jacod, J. and Yor, M.
[1] Etude des solutions extremales et representation integrale des solutions pour certains
problemes de martingales, Z. Wahrscheinlichkeitstheorie, 38, 83-125 (1977).
Jeulin, T.
[1] Semimartingales et Grossissement d'une Filtration: Lecture Notes in Mathematics 833,
Springer, Berlin, 1980.
Jeulin, T. and Yor, M.
[1] Grossissement d'une filtration et semi-martingales: formules explicites, Seminaire de
Probabilites XII: Lecture Notes in Mathematics 649, Springer, Berlin, 1978, pp. 78-97.
[2] (editors) Grossissements de Filtrations: Exemples et Applications: Lecture Notes in
Mathematics 1118, Springer, Berlin, 1985.
Johnson, G. and Helms, L. L.
[1] Class (D) supermartingales, Bull. Amer. Math. Soc., 69, 59-62 (1963).
Kailath, T.
[1] An innovations approach to least squares estimation, Part I: Linear filtering with
additive white noise, IEEE Trans. Autom. Control, 13, 646-655 (1968).
Kallianpur, G.
[1] Stochastic Filtering Theory, Springer, Berlin, 1980.
Karatzas, I. and Shreve, S. E.
[1] Brownian Motion and Stochastic Calculus, Springer, Berlin, 1988.
Kellogg, O. D.
[1] Foundations of Potential Theory, Dover, New York, 1953.
Kendall, D. G.
[1] Pole-seeking Brownian motion and bird navigation (with discussion), J. Roy. Statist.
Soc. B, 36, 365-417 (1974).
[2] The diffusion of shape, Adv. Appl. Prob., 9, 428-430 (1979).
[3] Shape manifolds, Procrustean metrics, and complex projective spaces, Bull. London
Math. Soc., 16, 81-121 (1984).
[4] A totally unstable Markov process, Quart. J. Math. Oxford, 9, 149-160 (1958).
[5] (editor) Analytic and Geometric Stochastics (special supplement to Adv. Appl. Prob.
to honour G. E. H. Reuter), Appl. Prob. Trust, 1986.
Kendall, D. G. and Reuter, G. E. H.
[1] Some pathological Markov processes with a denumerable infinity of states and the
associated contraction semigroups of operators on l, Proc. Int. Congr. Math. 1954
(Amsterdam), 3, 377-415 (1956).
Kendall, W. S.
[1] Knotting of Brownian motion in 3-space, J. London Math. Soc. (2), 19, 378-384
(1979).
[2] Brownian motion, negative curvature, and harmonic maps, Stochastic Integrals:
Lecture Notes in Mathematics 851, Springer, Berlin, 1981, pp. 479-491.
[3] Brownian motion on a surface of negative curvature, Seminaire de Probabilites
XVIII: Lecture Notes in Mathematics 1059, Springer, Berlin, 1984, pp. 70-76.
[4] Survey article on stochastic differential geometry (to appear).
Kent, J.
[1] Some probabilistic properties of Bessel functions, Ann. Prob., 6, 760-770 (1978).
[2] The infinite divisibility of the von Mises-Fisher distribution for all values of the
parameter in all dimensions, Proc. London Math. Soc., 3, 359-384 (1977).
[3] Continuity properties for random fields. Ann. Prob. 17, 1432-1440 (1989).
Kesten, H.
[1] Hitting probabilities of single points for processes with stationary independent
increments, Mem. Amer. Math. Soc., 93 (1969).
Khasminskii, R. Z.
[1] Ergodic properties of recurrent diffusion processes and stabilization of the solution
of the Cauchy problem for parabolic equations, Th. Prob. Appl., 5, 179-196 (1960).
[2] Stochastic Stability of Differential Equations, Sijthoff and Noordhoff, Alphen aan den
Rijn, 1980.
Kifer, Y.
[1] Brownian motion and positive harmonic functions on complete manifolds of
non-positive curvature, in Elworthy [2], pp. 187-232.
Kingman, J. F. C.
[1] Subadditive ergodic theory, Ann. Prob., 1, 883-909 (1973).
[2] Completely random measures, Pacific J. Math., 21, 59-78 (1967).
[3] Regenerative Phenomena, Wiley, New York, 1972.
[4] Poisson Processes, Oxford University Press, Oxford, 1993.
Knight, F. B.
[1] Note on regularisation of Markov processes, Illinois J. Math., 9, 548-552 (1965).
[2] A reduction of continuous square-integrable martingales to Brownian motion,
Martingales: A Report on a Meeting at Oberwolfach (ed. H. Dinges): Lecture Notes
in Mathematics 190, Springer, Berlin, 1971, pp. 19-31.
[3] Random walks and the sojourn density process of Brownian motion, Trans. Amer.
Math. Soc., 107, 56-86 (1963).
Knight, F. B. and Pittenger, A. O.
[1] Excision of a strong Markov process, Z. Wahrscheinlichkeitstheorie, 23, 114-120
(1972).
Kobayashi, S. and Nomizu, K.
[1] Foundations of Differential Geometry (two volumes), Wiley-Interscience, New York,
1963, 1969.
Kolmogorov, A. N.
[1] The local structure of turbulence in an incompressible fluid at very large Reynolds
numbers, Dokl. Akad. Nauk SSSR, 30, 229-303 (1941).
[2] The distribution of energy in locally isotropic turbulence. Dokl. Akad. Nauk SSSR,
32, 19-21 (1941).
Kozin, F. and Prodromou, S.
[1] Necessary and sufficient conditions for almost sure sample stability of linear Ito
equations, SIAM J. Appl. Math., 21, 413-425 (1971).
Krylov, N. V.
[1] Controlled Diffusion Processes, Springer, New York, 1980.
Kuelbs, J.
[1] The law of the iterated logarithm for Banach space valued random variables,
Probability in Banach Spaces: Lecture Notes in Mathematics 526, Springer, Berlin,
1976, pp. 131-142.
Kunita, H.
[1] On the decomposition of the solutions of stochastic differential equations, Stochastic
Integrals: Lecture Notes in Mathematics 851, Springer, Berlin, 1981, pp. 213-255.
[2] On backward stochastic differential equations, Stochastics, 6, 293-313 (1982).
[3] Stochastic differential equations and stochastic flows of homeomorphisms.
[4] Stochastic partial differential equations connected with nonlinear filtering, in Mitter
and Moro [1].
[5] Stochastic Flows and Stochastic Differential Equations, Cambridge University Press,
Cambridge, 1990.
Kunita, H. and Watanabe, S.
[1] On square integrable martingales, Nagoya Math. J., 30, 209-245 (1967).
Kunita, H. and Watanabe, T.
[1] Some theorems concerning resolvents over locally compact spaces, Proc. 5th Berkeley
Symp. Math. Statist. Prob., Vol. 2, Part 2, University of California Press, Berkeley
1967, pp. 131-164.
[2] Markov processes and Martin boundaries, I, Illinois J. Math., 9, 485-526 (1965).
[3] On certain reversed processes and their application to potential theory and boundary
theory, J. Math. Mech., 15, 393-434 (1966).
Kusuoka, S. and Stroock, D.
[1] Applications of the Malliavin calculus, Part I, Proceedings of the 1982 Taniguchi Int.
Symp. on Stochastic Analysis (ed. K. Ito), Kinokuniya-Wiley, 1984, 271-306.
[2] Applications of the Malliavin calculus, Part II, J. Fac. Sci. Univ. Tokyo (IA), 32,
1-76 (1985).
le Gall, J.-F.
[1] Applications du temps local aux equations differentielles stochastiques
unidimensionnelles, Seminaire de Probabilites XVII: Lecture Notes in Mathematics 986,
Springer, Berlin, 1983, pp. 15-31.
[2] Sur la saucisse de Wiener et les points multiples du mouvement Brownien plan et
la methode de renormalization de Varadhan, Seminaire de Probabilites XIX: Lecture
Notes in Mathematics 1123, Springer, Berlin, 1985, pp. 314-331.
[4] Fluctuation results for the Wiener sausage, Ann. Prob., 16, 991-1018 (1988).
[5] The exact Hausdorff measure of Brownian multiple points, in Çinlar, Chung,
Getoor and Glover [1], pp. 107-137.
[6] Planar Brownian motion, cones and stable processes, C. R. Acad. Sci. Paris Ser. I,
302, 641-643 (1986).
[7] Une approche elementaire des theoremes de decomposition de Williams, Seminaire
de Probabilites, XX, Lecture Notes in Mathematics 1204, Springer, Berlin, 1986,
pp. 447-464.
le Gall, J.-F., Rosen, J. and Shieh, N. R.
[1] Multiple points of Levy processes, Ann. Prob., 17, 503-515 (1989).
le Gall, J.-F. and Yor, M.
[1] Etude asymptotique de certains mouvements browniens complexes avec drift, Prob.
Th. Rel. Fields, 71, 183-229 (1986).
[2] Etude asymptotique des enlacements du mouvement brownien autour des droites
de l'espace, Prob. Th. Rel. Fields, 74, 617-635 (1987).
le Jan, Y.
[1] Flots de diffusion dans Rd, C.R. Acad. Sci. Paris Ser. I, 294, 697-699 (1982).
[2] Equilibre et exposants de Lyapounov de certains flots Browniens, C.R. Acad. Sci.
Paris Ser. I, 298, 361-364 (1984).
[3] Exposants de Lyapounov pour les mouvements Browniens isotropes, C. R. Acad.
Sci. Paris Ser. I, 299, 947-949 (1984).
[4] On isotropic Brownian motions, Z. Wahrscheinlichkeitstheorie verw. Geb., 70, 609-620
(1985).
le Jan, Y. and Watanabe, S.
[1] Stochastic flows of diffeomorphisms, Proceedings of the 1982 Taniguchi Int. Symp.
on Stochastic Analysis, 1984, pp. 307-332.
Lenglart, E., Lepingle, D. and Pratelli, M.
[1] Presentation unifiee de certaines inegalites de la theorie des martingales, Seminaire
de Probabilites XIV: Lecture Notes in Mathematics 784, Springer, Berlin, 1980.
Levy, P.
[1] Theorie de l'Addition des Variables Aleatoires, Gauthier Villars, Paris, 1954.
[2] Processus Stochastiques et Mouvement Brownien, Gauthier Villars, Paris, 1965.
[3] Systemes markoviens et stationnaires. Cas denombrable, Ann. Ecole Norm. Sup. (3),
68, 327-381 (1951); 69, 203-212 (1952).
[4] Processus markoviens et stationnaires du cinquieme type (infinite denombrable
des etats possibles, parametre continu), C. R. Acad. Sci. Paris, 236, 1630-1632 (1953).
[5] Processus markoviens et stationnaires. Cas denombrable, Ann. Inst. H. Poincare, 16,
7-25 (1958).
Lewis, J. T.
[1] Brownian motion on a submanifold of Euclidean space, Bull. London Math. Soc.,
18, 616-620 (1986).
Liggett, T.
[1] Interacting Particle Systems, Springer, New York, 1985.
Lindvall, T.
[1] On coupling of diffusion processes, J. Appl. Prob., 20, 82-93 (1983).
Liptser, R. S. and Shiryayev, A. N.
[1] Statistics of Random Processes, I, Springer, Berlin, 1977.
London, R. R., McKean, H. P., Rogers, L. C. G. and Williams, D.
[1] A martingale approach to some Wiener-Hopf problems, I, Seminaire de Probabilites
XVI: Lecture Notes in Mathematics 920, Springer, Berlin, 1982, pp. 41-67.
Lyons, T. J.
[1] Finely holomorphic functions, J. Funct. Anal., 37, 1-18 (1980).
[2] Instability of the Liouville property for quasi-isometric Riemannian manifolds and
reversible Markov chains, J. Diff. Geom. 26, 33-66 (1987).
[3] The critical dimension at which quasi-every path is self-avoiding, in D. G. Kendall
[5], pp. 87-100.
Lyons, T. J. and McKean, H. P.
[1] Windings of the plane Brownian motion, Adv. Math., 51, 212-225 (1984).
McGill, P.
[1] Calculation of some conditional excursion formulae, Z. Wahrscheinlichkeitstheorie,
61, 255-260 (1982).
[2] Markov properties of diffusion local time: a martingale approach, Adv. Appl. Prob.,
14, 789-810 (1980).
[3] Integral representation of martingales in the Brownian excursion filtration, Seminaire
de Probabilites XX: Lecture Notes in Mathematics 1204, Springer, Berlin, 1986,
pp. 465-502.
McKean, H. P.
[1] Stochastic Integrals, Academic Press, New York, 1969.
[2] Excursions of a non-singular diffusion, Z. Wahrscheinlichkeitstheorie, 1, 230-239
(1963).
[3] Brownian local times, Adv. Math., 16, 91-111 (1975).
[4] Brownian motion with a several-dimensional time, Teor. Veroyatnost., 4(4), 357-378
(1963).
McNamara, J. M.
[1] A regularity condition on the transition probability measure of a diffusion process.
Stochastics, 15, 161-182 (1985).
Maisonneuve, B.
[1] Systemes regeneratifs, Asterisque, Soc. Mathematique de France, 15 (1974).
Maisonneuve, B. and Meyer, P. A.
[1] Ensembles aleatoires markoviens homogenes, Seminaire de Probabilites VIII: Lecture
Notes in Mathematics 381, Springer, Berlin, 1974, pp. 172-261.
Malliavin, M. P. and Malliavin, P.
[1] Factorisations et lois limites de la diffusion horizontale au dessus d'un espace
riemannien symetrique, Lecture Notes in Mathematics 404, Springer, Berlin, 1974,
pp. 166-217.
Malliavin, P.
[1] Stochastic calculus of variation and hypo-elliptic operators, Proc. Int. Symp. Stoch.
Diff. Equations, Kyoto, 1976 (ed. K. Ito), Kinokuniya-Wiley, 1978, pp. 195-263.
[2] C^k-hypoellipticity with degeneracy, Stochastic Analysis (eds. A. Friedman and M.
Pinsky), Academic Press, New York, 1978, pp. 199-214.
[3] Formule de la moyenne, calcul de perturbations et theoremes d'annulation pour
les formes harmoniques, J. Funct. Anal., 17, 274-291 (1974).
Marcus, M. B. and Rosen, J.
[1] Sample path properties of the local times of strongly symmetric Markov processes
via Gaussian processes. Ann. Prob., 20, 1603-1684 (1992).
Mandl, P.
[1] Analytic Treatment of One-Dimensional Markov Processes, Springer, Berlin, 1968.
Meleard, S.
[1] Application du calcul stochastique a l'etude de processus de Markov reguliers sur
[0,1], Stochastics, 19, 41-82 (1986).
Messulam, P. and Yor, M.
[1] On D. Williams' 'pinching method' and some applications, J. London Math. Soc.,
26, 348-364 (1982).
Metivier, M. and Pellaumail, J.
[1] Stochastic Integration, Academic Press, New York, 1979.
Meyer, P. A.
[1] Un cours sur les integrales stochastiques, Seminaire de Probabilites X: Lecture Notes
in Mathematics 511, Springer, Berlin, 1976, pp. 245-400.
[2] Probability and Potential, Blaisdell, Waltham, Mass., 1966.
[3] Processus de Markov: Lecture Notes in Mathematics 26, Springer, Berlin, 1967.
[4] Processus de Markov: la Frontiere de Martin: Lecture Notes in Mathematics 77,
Springer, Berlin, 1970.
[5] Demonstration simplifiee d'un theoreme de Knight, Seminaire de Probabilites V:
Lecture Notes in Mathematics 191, Springer, Berlin, 1971, pp. 191-195.
[6] Demonstration probabiliste de certaines inegalites de Littlewood-Paley, Seminaire
de Probabilites X: Lecture Notes in Mathematics 511, Springer, Berlin, 1976,
pp. 125-183.
[7] Flot d'une equation differentielle stochastique, Seminaire de Probabilites XV: Lecture
Notes in Mathematics 850, Springer, Berlin, 1981, pp. 103-117.
[8] Sur la demonstration de previsibilite de Chung and Walsh, Seminaire de Probabilites
IX: Lecture Notes in Mathematics 465, Springer, Berlin, 1975, pp. 530-533.
[9] Geometrie stochastique sans larmes, Seminaire de Probabilites XV: Lecture Notes
in Mathematics 850, Springer, Berlin, 1981, pp. 44-102.
[10] Geometrie stochastique sans larmes (bis), Seminaire de Probabilites XVI: Supplement,
Lecture Notes in Mathematics 921, Springer, Berlin, 1982, pp. 165-207.
[11] Elements de probabilites quantiques, Seminaire de Probabilites XX: Lecture Notes
in Mathematics 1204, Springer, Berlin, 1986, pp. 186-312.
[12] Quantum Theory for Probabilists, Lecture Notes in Mathematics 1538, Springer,
Berlin, 1993.
Mihlstein, G. N.
[1] Approximate integration of stochastic differential equations, Th. Prob. Appl., 19,
557-562 (1974).
Millar, P. W.
[1] Random times and decomposition theorems, in Probability: Proc. Symp. Pure Math.
XXXI, Amer. Math. Soc., Providence, RI, 1977, pp. 91-103.
[2] A path decomposition for Markov processes, Ann. Prob., 6, 345-348 (1978).
Millar, P. W. and Tran, L. T.
[1] Unbounded local times, Z. Wahrscheinlichkeitstheorie verw. Geb., 30, 87-92 (1974).
Mitro, J.
[1] Dual Markov processes: construction of a useful auxiliary process, Z.
Wahrscheinlichkeitstheorie, 47, 139-156 (1979).
[2] Dual Markov functions: applications of a useful auxiliary process,
Z. Wahrscheinlichkeitstheorie, 48, 97-114 (1979).
Mitter, S. K.
[1] Lectures on non-linear filtering and stochastic control, in Mitter and Moro [1],
pp. 170-207.
Mitter, S. K. and Moro, A. (editors)
[1] Non-linear Filtering and Stochastic Control: Lecture Notes in Mathematics 972,
Springer, Berlin, 1982.
Motoo, M.
[1] Application of additive functionals to the boundary problem of Markov processes
(Levy's system of U-processes), Proc. 5th Berkeley Symp. Math. Statist. Prob., Vol. 2,
Part 2, Univ. of California Press, Berkeley, 1967, pp. 75-110.
[2] Proof of the law of iterated logarithm through diffusion equation, Ann. Inst. Statist.
Math., 10, 21-28 (1959).
Motoo, M. and Watanabe, S.
[1] On a class of additive functionals of Markov processes, J. Math. Kyoto Univ., 4,
429-469 (1965).
Nakao, S.
[1] On the pathwise uniqueness of solutions of one-dimensional stochastic differential
equations, Osaka J. Math., 9, 513-518 (1972).
Nash, J. F.
[1] The imbedding problem for Riemannian manifolds, Ann. Math., 63, 20-63 (1956).
Nelson, E.
[1] Dynamical Theories of Brownian Motion, Princeton University Press, 1967.
[2] Quantum Fluctuations, Princeton University Press, 1984.
Neveu, J.
[1] Bases Mathematiques du Calcul des Probabilites, Masson, Paris, 1964.
[2] Sur les etats d'entree et les etats fictifs d'un processus de Markov, Ann. Inst. Henri
Poincare, 17, 323-337 (1962).
[3] Lattice methods and submarkovian processes, Proc. 4th Berkeley Symp. Math. Statist.
Prob., Vol. 2, University of California Press, Berkeley, 1960, pp. 347-391.
[4] Une generalisation des processus a accroissements positifs independants, Abh. Math.
Sem. Univ. Hamburg, 25, 36-61 (1961).
[5] Entrance, exit and fictitious states for Markov chains, Proc. Aarhus Colloq. Combin.
Prob., 1962, pp. 64-68.
Norris, J. R.
[1] Simplified Malliavin calculus, Seminaire de Probabilites XX: Lecture Notes in
Mathematics 1204, Springer, Berlin, 1986, pp. 101-130.
Norris, J. R., Rogers, L. C. G. and Williams, D.
[1] Brownian motion of ellipsoids, Trans. Amer. Math. Soc., 294, 757-765 (1986).
[2] Self-avoiding random walk: a Brownian motion model with local time drift, Prob.
Th. Rel. Fields, 74, 271-287 (1987).
Ocone, D.
[1] Malliavin's calculus and stochastic integral representations of functionals of diffusion
processes, Stochastics, 12, 161-185 (1984).
Orihara, A.
[1] On random ellipsoid, J. Fac. Sci. Univ. Tokyo, Sect. IA Math., 17, 73-85 (1970).
Pardoux, E.
[1] Stochastic differential equations and filtering of diffusion processes, Stochastics, 3,
127-167 (1979).
[2] Grossissement d'une filtration et retournement du temps d'une diffusion, Seminaire
de Probabilites XX: Lecture Notes in Mathematics 1204, Springer, Berlin, 1986,
pp. 48-55.
[3] Equations of non-linear filtering, and applications to stochastic control with partial
observations, in Mitter and Moro [1], pp. 208-248.
Pardoux, E. and Talay, D.
[1] Discretization and simulation of stochastic differential equations (to appear in Acta
Appl. Math.).
Parthasarathy, K. R.
[1] Probability Measures on Metric Spaces, Academic Press, New York, 1967.
Pauwels, E. and Rogers, L. C. G.
[1] Skew-product decompositions of Brownian motions, Contemp. Math. 73, 237-262
(1988).
Perkins, E.
[1] Local time and pathwise uniqueness for stochastic differential equations, Seminaire
de Probabilites XVI: Lecture Notes in Mathematics 920, Springer, Berlin, 1982,
pp. 201-208.
[2] Local time is a semimartingale, Z. Wahrscheinlichkeitstheorie, 60, 79-117 (1982).
Phelps, R. R.
[1] Lectures on Choquet's Theorem, Van Nostrand, Princeton, NJ, 1966.
Pinsky, M. A.
[1] Homogenization and stochastic parallel displacement, in Williams [13], pp. 271-284.
[2] Stochastic Riemannian geometry, Probabilistic Analysis and Related Topics, 1 (ed.
A. T. Bharucha-Reid), Academic Press, New York, 1978.
Pitman, J. W.
[1] One-dimensional Brownian motion and the three-dimensional Bessel process, J.
Appl. Prob., 1, 511-526 (1975).
[2] Path decomposition for conditional Brownian motion, Inst. Math. Statist. Univ.
Copenhagen, Preprint No. 11 (1974).
[3] Levy systems and path decompositions, in Çinlar, Chung and Getoor [1, 1981].
Pitman, J. W. and Yor, M.
[1] Bessel processes and infinitely divisible laws, Stochastic Integrals (ed. D. Williams),
Lecture Notes in Mathematics 851, Springer, Berlin, 1981, pp. 285-370.
[2] A decomposition of Bessel bridges, Z. Wahrscheinlichkeitstheorie, 59, 425-457
(1982).
[3] The asymptotic joint distribution of windings of planar Brownian motion, Bull.
Amer. Math. Soc., 10, 109-111 (1984).
[4] Asymptotic laws of planar Brownian motion, Ann. Prob., 14, 733-779 (1986).
Pittenger, A. O. and Shih, C. T.
[1] Coterminal families and the strong Markov property, Trans. Amer. Math. Soc.,
182, 1-42 (1973).
Poor, W. A.
[1] Differential Geometric Structures, McGraw-Hill, New York, 1981.
Port, S. C. and Stone, C. J.
[1] Classical potential theory and Brownian motion, Proc. 6th Berkeley Symp. Math.
Statist. Prob., Vol. 3, University of California Press, Berkeley, 1972, pp. 143-176.
[2] Logarithmic potentials and planar Brownian motion, Proc. 6th Berkeley Symp. Math.
Statist. Prob., Vol. 3, University of California Press, Berkeley, 1972, pp. 177-192.
[3] Brownian Motion and Classical Potential Theory, Academic Press, New York, 1978.
Price, G. C. and Williams, D.
[1] Rolling with 'slipping': I, Seminaire de Probabilites XVII: Lecture Notes in
Mathematics 986, Springer, Berlin, 1983, pp. 194-297.
Prohorov, Yu. V.
[1] Convergence of random processes and limit theorems in probability, Th. Prob. Appl.,
1, 157-214 (1956).
Protter, P.
[1] On the existence, uniqueness, convergence and explosions of solutions of stochastic
differential equations, Ann. Prob., 5, 243-261 (1977).
Rao, K. M.
[1] On decomposition theorems of Meyer, Math. Scand., 24, 66-78 (1969).
[2] Quasimartingales, Math. Scand., 24, 79-92 (1969).
Ray, D. B.
[1] Resolvents, transition functions and strongly Markovian processes, Ann. Math., 70,
43-72 (1959).
[2] Sojourn times of a diffusion process, Illinois J. Math., 7, 615-630 (1963).
Reuter, G. E. H.
[1] Denumerable Markov processes, II, J. London Math. Soc., 34, 81-91 (1959).
Revuz, D.
[1] The Martin boundary of a recurrent random walk has one or two points, Probability:
Proc. Symp. Pure Math. XXXI, Amer. Math. Soc., Providence, RI, 1977, pp. 125-130.
Revuz, D. and Yor, M.
[1] Continuous Martingales and Brownian Motion, Springer, Berlin, 1991.
Rogers, L. C. G.
[1] Williams' characterization of the Brownian excursion law: proof and applications,
Seminaire de Probabilites XV: Lecture Notes in Mathematics 850, Springer, Berlin,
1981, pp. 227-250.
[2] Ito excursion theory via resolvents, Z. Wahrscheinlichkeitstheorie, 63, 237-255 (1983).
[3] Smooth transition densities for one-dimensional diffusions, Bull. London Math. Soc.,
17, 157-161 (1985).
[4] Continuity of martingales in the Brownian excursion filtration, Prob. Th. Rel. Fields,
76, 291-298 (1987).
[5] Multiple points of Markov processes in a complete metric space, Seminaire de
Probabilites XXIII: Lecture Notes in Mathematics 1372, Springer, Berlin, 1989,
pp. 186-197.
[6] A new identity for real Levy processes, Ann. Inst. Henri Poincare, 20, 21-34 (1984).
Rogers, L. C. G. and Pitman, J. W.
[1] Markov functions, Ann. Prob., 9, 573-582 (1981).
Rogers, L. C. G. and Williams, D.
[1] Diffusions, Markov Processes, and Martingales, Volume 2: Ito Calculus, Wiley, Chichester,
1987.
[2] Construction and approximation of transition matrix functions, in D. G. Kendall
[5], pp. 133-160.
Rogozin, B. A.
[1] On the distribution of functionals related to boundary problems for processes with
independent increments, Th. Prob. Appl., 11, 580-591 (1966).
Rosen, J.
[1] A local time approach to self-intersections of Brownian paths in space, Comm. Math.
Phys., 88, 327-338 (1983).
Schwartz, L.
[1] Geometrie differentielle du 2ieme ordre, semimartingales et equations differentielles
stochastiques sur une variete differentielle, Seminaire de Probabilites XVI, Supplement:
Lecture Notes in Mathematics 921, Springer, Berlin, 1982, pp. 1-148.
Sharpe, M. J.
[1] General Theory of Markov Processes, Academic Press, New York, 1988.
Sheppard, P.
[1] On the Ray-Knight property of local times, J. London Math. Soc., 31, 377-384 (1985).
Shiga, T. and Watanabe, S.
[1] Bessel diffusions as a one-parameter family of diffusion processes,
Z. Wahrscheinlichkeitstheorie, 27, 37-46 (1973).
Shigekawa, I.
[1] Derivatives of Wiener functionals and absolute continuity of induced measure, J.
Math. Kyoto Univ., 20, 263-289 (1980).
Shimura, M.
[1] Excursions in a cone for two-dimensional Brownian motion, J. Math. Kyoto Univ.,
25, 433-443 (1985).
Silverstein, M. L.
[1] Symmetric Markov Processes: Lecture Notes in Mathematics 426, Springer, Berlin,
1974.
[2] Boundary Theory for Symmetric Markov Processes: Lecture Notes in Mathematics
516, Springer, Berlin, 1976.
Simon, B.
[1] Functional Integration and Quantum Physics, Academic Press, New York, 1979.
[2] Semiclassical analysis of low-lying eigenvalues, II. Tunneling, Ann. Math., 120, 89-118
(1984).
Skorokhod, A. V.
[1] Limit theorems for stochastic processes, Th. Prob. Appl., 1, 261-290 (1956).
[2] Limit theorems for Markov processes, Th. Prob. Appl., 3, 202-246 (1958).
Spitzer, F.
[1] Principles of Random Walk, Van Nostrand, Princeton, NJ, 1964.
[2] Some theorems concerning two-dimensional Brownian motion, Trans. Amer. Math.
Soc., 87, 187-197 (1958).
Strassen, V.
[1] An invariance principle for the law of the iterated logarithm,
Z. Wahrscheinlichkeitstheorie, 3, 211-226 (1964).
[2] Almost sure behaviour of sums of independent random variables and martingales,
Proc. 5th Berkeley Symp. Math. Statist. Prob., Vol. 2, Part 1, University of California
Press, Berkeley, 1966, pp. 315-343.
Stroock, D. W.
[1] The Malliavin calculus and its applications to second-order parabolic differential
operators I, II, Math. Systems Theory, 14, 25-65, 141-171 (1981).
[2] The Malliavin calculus; a functional analytical approach, J. Funct. Anal., 44, 217-257
(1981).
[3] Diffusion processes associated with Levy generators, Z. Wahrscheinlichkeitstheorie,
32, 209-244 (1975).
[4] An Introduction to the Theory of Large Deviations, Springer, Berlin, New York, 1984.
Stroock, D. W. and Varadhan, S. R. S.
[1] Multidimensional Diffusion Processes, Springer, New York, 1979.
[2] On the support of diffusion processes with applications to the strong maximum
principle, Proc. 6th Berkeley Symp. Math. Statist. Prob., Vol. 3, University of
California Press, Berkeley, 1972, pp. 333-359.
[3] Diffusion processes with boundary conditions, Comm. Pure Appl. Math., 24, 147-225
(1971).
Stroock, D. W. and Yor, M.
[1] Some remarkable martingales, Seminaire de Probabilites XV: Lecture Notes in
Mathematics 850, Springer, Berlin, 1981, pp. 590-603.
Sussmann, H. J.
[1] On the gap between deterministic and stochastic ordinary differential equations,
Ann. Prob., 6, 19-41 (1978).
Symanzik, K.
[1] Euclidean quantum field theory, Local Quantum Theory (ed. R. Jost), Academic
Press, New York, 1969.
Talagrand, M.
[1] Regularity of Gaussian processes, Acta Math., 159, 99-149 (1987).
Taylor, G. I.
[1] Statistical theory of turbulence, Proc. Roy. Soc. London A, 151, 421-478 (1935).
Taylor, H. M.
[1] A stopped Brownian motion formula, Ann. Prob., 3, 234-246 (1975).
Taylor, S. J.
[1] Sample path properties of processes with stationary independent increments,
Stochastic Analysis (eds. D. G. Kendall and E. F. Harding), Wiley, New York, 1973,
pp. 387-414.
Thorin, O.
[1] On the infinite divisibility of the lognormal distribution, Scand. Actuarial J., 121-148
(1977).
Tsirel'son, B. S.
[1] An example of the stochastic equation having no strong solution, Teoria Verojatn.
i Primenen., 20, 427-430 (1975).
Van Den Berg, M. and Lewis, J. T.
[1] Brownian motion on a hypersurface, Bull. London Math. Soc., 17, 144-150 (1985).
Varadhan, S. R. S.
[1] Large Deviations and Applications, SIAM, Philadelphia, 1984.
Varadhan, S. R. S. and Williams, R. J.
[1] Brownian motion in a wedge with oblique reflection, Comm. Pure Appl. Math., 38,
405-443 (1985).
Walsh, J. B.
[1] Excursions and local time, in Azema and Yor [2], pp. 159-192.
[2] Stochastic integration with respect to local time, in Çinlar, Chung and Getoor [1,
1983].
[3] An introduction to stochastic partial differential equations, Ecole d'Ete de
Probabilites de St Flour XIV-1984, Lecture Notes in Mathematics 1180, Springer,
Berlin, 1986.
Warner, F. W.
[1] Foundations of Differentiable Manifolds and Lie Groups, Springer, Berlin, 1983.
Watanabe, S.
[1] On discontinuous additive functionals and Levy measures of a Markov process,
Jap. J. Math., 34, 53-79 (1964).
Watson, G. N.
[1] A Treatise on the Theory of Bessel Functions, Cambridge University Press,
Cambridge, 1966.
Whitney, H.
[1] Geometric Integration Theory, Princeton University Press, Princeton, NJ, 1957.
Whittle, P.
[1] Optimization over Time (two volumes), Wiley, Chichester, 1982, 1983.
Williams, D.
[1] Brownian motions and diffusions as Markov processes, Bull. London Math. Soc.,
6, 257-303 (1974).
[2] Some basic theorems on harnesses, Stochastic Analysis (eds. D. G. Kendall and E. F.
Harding), Wiley, New York, 1973, pp. 349-366.
[3] On Levy's downcrossing theorem, Z. Wahrscheinlichkeitstheorie, 40, 157-158
(1977).
[4] Path decomposition and continuity of local time for one-dimensional diffusions,
I, Proc. London Math. Soc., Ser. 3, 28, 738-768 (1974).
[5] On the stopped Brownian motion formula of H. M. Taylor, Seminaire de
Probabilites X: Lecture Notes in Mathematics 511, Springer, Berlin, 1976, pp.
235-239.
[6] Markov properties of Brownian local time, Bull. Amer. Math. Soc., 75, 1035-1036
(1969).
[7] Decomposing the Brownian path, Bull. Amer. Math. Soc., 76, 871-873 (1970).
[8] The Q-matrix problem for Markov chains, Bull. Amer. Math. Soc., 81, 1115-1118
(1975).
[9] The Q-matrix problem, Seminaire de Probabilites X: Lecture Notes in Mathematics
511, Springer, Berlin, 1976, pp. 216-234.
[10] A note on the Q-matrices of Markov chains, Z. Wahrscheinlichkeitstheorie, 1,
116-121 (1967).
[11] Some Q-matrix problems, Probability: Proc. Symp. Pure Math. XXXI, Amer. Math.
Soc., Providence, RI, 1977, pp. 165-169.
[12] Diffusions, Markov Processes, and Martingales, Volume 1: Foundations, Wiley,
Chichester, 1979.
[13] (editor) Stochastic Integrals: Proceedings, LMS Durham Symposium, Lecture Notes
in Mathematics 851, Springer, Berlin, 1981.
[14] Conditional excursion theory, Seminaire de Probabilites XIII: Lecture Notes in
Mathematics 721, Springer, Berlin, 1979, pp. 490-494.
[15] ( = [W]) Probability with Martingales, Cambridge University Press, Cambridge,
1991.
Yaglom, A. M.
[1] Some classes of random fields in n-dimensional space, related to stationary random
processes, Th. Prob. Appl., 2, 273-319 (1957).
Yamada, T.
[1] On a comparison theorem for solutions of stochastic differential equations and its
applications, J. Math. Kyoto Univ., 13, 497-512 (1973).
Yamada, T. and Ogura, Y.
[1] On the strong comparison theorems for solutions of stochastic differential equations,
Z. Wahrscheinlichkeitstheorie, 56, 3-19 (1981).
Yamada, T. and Watanabe, S.
[1] On the uniqueness of solutions of stochastic differential equations, J. Math. Kyoto
Univ., 11, 155-167 (1971).
Yor, M.
[1] Sur certains commutateurs d'une filtration, Seminaire de Probabilites XV: Lecture
Notes in Mathematics 850, Springer, Berlin, 1981, pp. 526-528.
[2] Sur la continuite des temps locaux associes a certaines semimartingales, in Azema
and Yor [2], pp. 23-35.
[3] Rappel et preliminaires generaux, in Azema and Yor [2], pp. 17-22.
[4] Precisions sur l'existence et la continuite des temps locaux d'intersection du
mouvement Brownien dans R2, Seminaire de Probabilites XX: Lecture Notes in
Mathematics 1204, Springer, Berlin, 1986, pp. 532-542.
[5] Sur la representation comme integrales stochastiques des temps d'occupation du
mouvement Brownien dans Rd, ibid, pp. 543-552.
Yosida, K.
[1] Functional Analysis, Springer, Berlin, 1965.
[2] Brownian motion in homogeneous Riemannian space, Pacific J. Math., 2, 263-296
(1952).
Zakai, M.
[1] The Malliavin calculus, Acta Appl. Math., 3, 175-207 (1985).
Zheng, W. A. and Meyer, P.-A.
[1] Quelques resultats de 'mechanique stochastique', Seminaire de Probabilites XVIII:
Lecture Notes in Mathematics 1059, Springer, Berlin, 1984, pp. 223-244.
Zvonkin, A. K.
[1] A transformation of the phase space of a diffusion process that removes the drift,
Math. USSR Sbornik, 22, 129-149 (1974).
Index to Volumes 1 and 2
Absolute continuity: II.9.
Absorbing state: III.12.
Accessible stopping time: VI.13-14.
Adapted process: II.45.
Additive functional: III.16; construction from λ-potential, III.16, I.17.
Affine group: V.35.
Algebra: II.1.
Almost surely: II.14, III.9.
Announceable time: VI.12.
Approximation to compensators: VI.31.
Arcsine law: II.34, III.24, V.53.
Arzela-Ascoli theorem: II.85.
Atlas: V.34.
Atom: II.88.
Awaiting the almost inevitable: II.57.
Azema's martingale: II.37.
Backward equation: I.4.
Barlow's example: V.41.
Basic integrands: IV.5.
Bessel process: IV.35, V.48; time reversal, III.49.
Bi-invariant metric: V.35.
Birth process: III.26, VI.14.
Blumenthal's 0-1 Law: for Brownian motion, I.12; for FD processes, III.9.
Bochner's theorem: I.24.
Bochner's horizontal Laplacian: V.34.
Borel-Cantelli Lemmas: II.15.
Boundary points of one-dimensional diffusions: V.47, V.51.
Boundary theory: see Martin-Doob-Hunt theory.
Branch points: definition, III.37; illustrative example, III.37; probabilistic significance,
III.41.
Brownian motion: definition, I.1; on affine group, V.35; arcsine law, II.34, III.24, VI.53;
Brownian bridge, I.25, II.91, IV.40; canonical, II.90; complex, see complex Brownian
motion; and continuous martingales, I.2, IV.34; Dirichlet problem, I.22; elastic Brownian
motion, III.24; of ellipses, V.36, V.37; energy of charge, I.22; excursion law, VI.50;
exponential martingales, I.2, I.9; Feller Brownian motions, VI.57; on filtered probability
space, I.2, II.72; first-passage distribution, I.9, I.14, III.10; Gaussian description, I.3;
generator, I.4, III.6; Green function, I.22; iterated-logarithm laws, I.16; Kolmogorov's
backward and forward equations, I.4; Kolmogorov's test, I.13; Levy's characterization,
IV.33; on Lie groups, V.35; local time, I.5, I.14, III.16; on a manifold, V.30, V.31;
martingales of, I.17; martingale characterisations, I.2; martingale representation, IV.36,
IV.41; modulus of continuity, I.10; no-increase property, I.10; nowhere-differentiability,
I.12; on the orthonormal frame bundle, V.30, V.33, V.34; path decomposition, VI.55;
potential theory, I.22; quadratic variation, I.11, IV.2; Ray-Knight Theorems, VI.52;
recurrence, I.3; reflecting Brownian motion, I.14, III.22, V.6; reflection principle, I.13;
resolvent, III.3; rotational invariance, I.18; scaled Brownian excursion, IV.40; scaling,
I.3; skew-product representation, V.31; on SO(3), V.35; Skorokhod embedding, I.7,
VI.51; slow points, I.10; strong Markov property, I.12; on a surface, V.4, V.31;
time-reversed, II.38; transition density, I.4; unbounded variation, I.11, IV.2; wandering
to infinity, I.18.
Brownian sheet: I.25.
Burkholder-Davis-Gundy inequalities: IV.42.
Cadlag maps: see R-paths.
Cameron-Martin-Girsanov change of measure: IV.38-41, V.27.
Canonical decomposition of a special semimartingale: VI.40.
Canonical process: II.28, II.71, III.7.
Capacity: I.22.
Caratheodory's Extension Theorem: II.5.
Carverhill's noisy North-South flow: V.14.
Cauchy law: I.20.
Cauchy process: I.28, VI.2, VI.28.
Change of time scale: see time substitution.
Chapman-Kolmogorov equations: I.4, III.1.
Characteristic exponent: I.28.
Characteristic operator: III.12.
Charge: I.22, III.27; see also equilibrium charge.
Chart: V.34.
Choquet capacitability theory: III.76.
Choquet representation of λ-excessive functions: III.44.
Choquet representation of 1-excessive probabilities: III.38.
Choquet's theorem on integral representations: III.27.
Christoffel symbols: V.31, V.34.
Ciesielski-Taylor Theorem: III.20, III.49.
Clark's Theorem on Brownian martingale representation: IV.41.
Coffin state: III.3.
Comparison theorem: V.43.
Compensated Poisson process: II.64.
Compensator: VI.29, VI.31; see also dual previsible projection.
Completions: II.75.
Complex Brownian motion: I.19; cone point, I.21; cut point, I.21; multiple points, I.21;
Spitzer's theorem, I.20; windings of, I.20.
Compound Poisson process: I.28.
Condition N: III.54.
Condition S: III.55.
Conditional expectations and probabilities: II.41, II.44; regular, II.42, II.43.
Conditional independence: II.60.
Cone point: I.21.
Conformal martingales: IV.34.
Connection: V.32, V.34.
Continuous Levy processes, characterization of: I.28, III.14.
Continuous local martingale: pure local martingales, IV.34; quadratic-variation process,
IV.30; as time change of Brownian motion, IV.34.
Continuous mapping principle: II.84.
Continuous semimartingale: canonical decomposition, IV.30, VI.24; Ito's formula, IV.32;
local time, IV.43.
Contraction resolvent: III.4; strongly continuous (SCCR), III.4.
Contraction semigroup: III.4; strongly continuous (SCCSG), III.4.
Control problems: see stochastic control.
Controlled variance problem: V.6, V.42.
Convergence of random variables: II.19.
Coupling inequality: V.54.
Coupling of one-dimensional diffusions: V.54.
Covariance of a diffusion: V.1.
Covariant differentiation: V.32, V.34.
Cumulative risk: II.64, VI.22.
Curvature: V.38.
Cut point: I.21.
Cylinder: II.25.
d-system: II.1.
Daniell-Kolmogorov Theorem: II.30, II.31; limitations of, II.34.
Debut: of open set for R-process, II.74; of compact set of R-process, II.75; of progressive
set, VI.3.
Debut Theorem: II.76, III.9, VI.3.
De Finetti's Theorem: II.51.
Diffeomorphism: V.34.
Diffeomorphism Theorem: V.13.
Diffusion equation: I.4.
Diffusion: III.13, V.1, V.2; diffusion SDE, V.8; in one dimension, see one-dimensional
diffusions; physical, I.23.
Directed set: II.80.
Dirichlet form: I.23; for Markov chains, III.59.
Dirichlet problem: I.22.
Distribution function: II.16.
Doleans' characterization of FV processes: VI.20, VI.25-27.
Doleans exponential: IV.19.
Doleans' proof of the Meyer Decomposition Theorem: VI.30.
Dominated-Convergence Theorem: II.8.
Donsker's Invariance Principle: I.8.
Doob decomposition of a submartingale: II.54.
Doob h-transform: III.29, III.45, IV.39.
Doss-Sussmann method: V.28.
Downcrossing Theorem (Levy): I.14.
Drift of a diffusion: V.1.
Dual previsible projection: VI.1, VI.21, VI.23.
Dynkin's formula: III.10.
Dynkin's Local-Maximum Principle: III.13.
Dynkin's Isomorphism Theorem: I.27.
Dynkin's Maximum Principle: III.6.
Elastic boundary: III.24.
Elementary process: IV.6, IV.25.
Elworthy's example: V.13.
Empirical distribution: II.91.
Entrance laws: III.39.
Equilibrium charge and potential: I.22, III.48, VI.35.
Ergodic Theorem for one-dimensional diffusions: V.53.
Evanescent process: IV.13.
Excessive functions: III.27; representation, see Martin-Doob-Hunt theory; Riesz
decomposition, III.27; uniformly λ-excessive, III.16.
Excessive measures: III.38.
Excursion intervals: VI.42.
Excursion law: VI.47, VI.50; for Brownian motion, VI.50, VI.55; for Markov chain, VI.43,
VI.50.
Excursion theory: Ch.VI; censoring and reweighting of excursion laws, VI.58; characteristic
measure, VI.47; excursion filtration, VI.59; for a finite Markov chain, VI.43; lifetime,
VI.47; marked excursions, VI.49; for a Markov chain, III.57; Markovian character of
excursion law, VI.48; path decomposition for Brownian excursions, VI.55; from a
point which is not regular extremal, VI.50; Poisson point process, VI.43, VI.47; starred
excursion, VI.49; by stochastic calculus, VI.59.
Excursion space: VI.43, VI.47.
Expectation: II.17.
Exponential map: V.34.
Exponential semimartingale: IV.19, IV.22, IV.37.
Extending the generator: III.4.
FD (Feller-Dynkin) diffusions: III.13, V.22; martingale representation, V.25.
FD processes: existence, III.7; strong Markov property, III.8, III.9.
FD semigroups: III.6.
FV: see finite-variation.
Fair stopping time: VI.12.
Fatou Lemma: II.8; for non-negative supermartingales, IV.14.
Feller Brownian motions: VI.57.
Feller property: III.6.
Feller-McKean chain: III.23, III.35.
Feynman-Kac formula: III.19; for Markov chains, IV.22.
Field: see algebra.
Fick's Law: I.23.
Filtering: VI.8; Bayesian approach, VI.10; change-detection filter, V.10, V.22; Kalman-
Bucy filter, VI.9; robust filtering, VI.11.
Filtration: II.45, II.63; natural, II.45.
Finite-dimensional distributions: II.29, II.87.
Finite fuel control problem: V.7, V.15.
Finite-variation functions: II.13.
Finite-variation processes: IV.7; Doleans' characterization, VI.20.
First-approach times: II.74.
First-entrance decomposition: III.52.
First-entrance times: see debut.
First-hitting times: II.74.
Forward equation: I.4, I.23.
Freedman's interpretation of qbj: III.57.
Fubini's Theorem: II.12.
Fundamental Theorem of Algebra: I.20.
Gamma process: I.28.
Gaussian process: definition, I.3.
Gaussian random fields, isotropic: I.26.
Generator: see infinitesimal generator.
Geodesic: V.32, V.34.
Girsanov SDE: V.26.
Good λ inequality: IV.42.
Green function: I.22, III.27, III.30.
Gronwall's lemma: V.11.
Harmonic function: III.31.
Hausdorff moment problem: III.28.
Hazard function: II.64.
Heat equation: I.4.
Helms-Johnson example: II.79, III.31, IV.14, VI.33.
Hermite polynomials: I.2.
Hewitt-Savage 0-1 law: II.51.
Hille-Yosida Theorem: III.5.
Holder inequality: II.10.
Honest transition function: III.3.
Horizontal lift: V.34.
Horizontal vector field: V.34.
Hormander's Theorem: V.38.
Hunt's Theorem: VI.35.
Hyperbolic plane: V.34, V.35, V.36.
Hyperboloid sheet: V.36.
Hypotheses droites: VI.46.
Identical hitting-distributions: III.21.
Imbedding: V.34.
Independence: II.21, II.23.
Indistinguishable processes: II.36, IV.13.
Infinite divisibility: I.28.
Infinitesimal generator: III.2, III.4; Brownian motion, I.4, III.6; one-dimensional diffusion,
V.47, V.50.
Inner measure: II.6.
Inner regularity of measures: II.80.
Innovations process: VI.8.
Instantaneous state of a Markov chain: III.51.
Integral: II.7.
Integrable-variation processes: IV.7.
Integral curve: V.34.
Integration by parts: IV.2, VI.38; for continuous semimartingales, IV.32; for finite-
variation processes, IV.18.
Integrators: IV.16.
Isometric imbedding: V.34.
Isotropic Gaussian random fields: I.26.
Ito's formula: IV.3, VI.39; for continuous semimartingales, IV.32; for convex functions,
IV.45, V.47; for FV processes, IV.18.
Ito integral: see stochastic integral.
Jensen's inequality: II.18, II.41, II.52.
Joint law: II.16.
Kalman-Bucy filter: VI.9.
Khasminskii's method for stability: V.37.
Khasminskii's test for explosion: V.52.
Killing: III.18.
Kingman's Markov Characterization Theorem: III.58.
Knight's Theorem on continuous local martingales: IV.34.
Kolmogorov's backward equations: I.4.
Kolmogorov's forward equations: I.4.
Kolmogorov's lemma: I.25, II.85, IV.44.
Kolmogorov's test for Brownian motion: I.13.
Kolmogorov's 0-1 Law: II.50.
Krylov's example: V.29.
Kunita-Watanabe inequalities: IV.28.
L-process, L-path: IV, Introduction.
λ-potential operator: see resolvent.
Laplace exponent: I.28, I.37.
Laplace-Beltrami operator: V.30, V.34.
Last-exit decomposition: III.56, VI.43, VI.48.
Last-exit distribution for Brownian motion: VI.35.
Law of the Iterated Logarithm: I.16.
Law of process: II.27.
Law of a random variable: II.16.
LCCB: locally compact Hausdorff space with countable base, II.6.
Lebesgue measure: II.5.
Lebesgue's thorn: III.9.
Left-invariant vector field: V.35.
Levy Brownian motion: I.24.
Levy's characterization of Brownian motion: I.2, IV.33.
Levy-Doob 'Downward' Theorem: II.51.
Levy kernels: III.57, IV.21.
Levy-Hincin formula: I.28, VI.2.
Levy measure: I.28, II.37, VI.2; Levy system: VI.28.
Levy process: I.28, I.29, I.30, II.37, VI.2.
Levy's 'Upward' Theorem: II.50.
Lie algebra: V.35.
Lie bracket: V.34, 38.
Lie group: V.35.
Lifetime: III.7.
Likelihood ratio: II.79, IV.17; for Markov chains, IV.22.
Lipschitz square root: V.12.
Local martingale: IV.1, IV.14; on a manifold, V.30, V.33.
Local time: for Brownian motion, I.5, I.14; for continuous semimartingales, IV.43-4;
growth set, VI.45; for Levy processes, I.30; Markovian local time, IV.43; as an
occupation density, IV.45; for one-dimensional diffusions, V.49; at regular extreme
point of a Ray process, VI.45; from upcrossings of Brownian motion, I.14, II.79.
Localization: IV.9.
Locally bounded previsible process: IV.10.
Lusin space: II.31, II.82.
Lyapunov exponent: V.37.
Malliavin-Bismut integration-by-parts: V.38.
Malliavin calculus: V.38.
Manifold: V.34.
Marked excursions: VI.49.
Markov chains: III.2; birth process, IV.26; Dirichlet form, III.59; Feller-McKean chain,
III.23, III.35; Lévy's diagonal Q-matrix, III.35, IV.35; Martin boundary, III.48;
martingale problem, IV.20-22; as Ray processes, III.50; stable and instantaneous states, III.51;
see also Q-matrices, standard transition functions.
Markov p-function: III.58.
Markov inequality: II.18.
Markov processes: III.1; see also FD processes, Ray processes.
Martin compactification: III.28.
Martin kernel: III.27; for Brownian motion in the unit ball, III.30.
Martin-Doob-Hunt theory: for discrete-parameter chains, III.28, III.29, III.42; for
Brownian motion, III.30, III.31.
Martingales: definitions, II.46, II.63; for Brownian motion, I.17; convergence theorems,
II.49, II.50, II.51, II.69; in L', II.53; regularity of paths, II.65, II.66, II.67; for FD
processes, III.10.
Martingale inequalities: Burkholder-Davis-Gundy inequality, IV.42; Doob's L^p
inequality, II.52, II.70; Doob's submartingale inequality, II.52, II.54, II.70; Doob's
Upcrossing Lemma, II.48.
Martingale problem: V.19; existence of solutions, V.23; for Markov chains, IV.20; Markov
property of solution, V.21; relationship to weak solutions of SDEs, V.19-20; Stroock-
Varadhan Theorem, V.24; well-posed, V.19.
Martingale representation: for Brownian motion, IV.36, 41; for FD diffusion, V.25; for
Markov chains, IV.21.
Maximum-Modulus Theorem: I.20.
Maximum Principle: III.13.
McGill's Lemma: VI.59.
Mean curvature: V.4.
Measurable function: II.2.
Measurable space: II.1.
Measurable transition function: III.3.
Measure space: II.4.
Meyer decomposition: III.17, VI.29, VI.32, VI.46.
Meyer's Previsibility Theorem: VI.15.
Minkowski inequality: II.10.
Moderate function: IV.42.
Modification: II.36.
Monotone-Class Theorems: II.3.
Monotone Convergence Theorem: II.8.
Multiple points of Brownian motion: I.21.
Multiplicative functional: see PCHMF.
Nagasawa's formula: III.42, III.46.
Natural scale: V.46.
Net: II.80.
Normal coordinates: V.34.
Normal transition function: III.3.
Observation process: VI.8.
Occupation density formula: I.5, IV.45.
One-dimensional diffusions: I.5, V.44-54; absorbing, inaccessible, reflecting end-points,
V.47, V.51; exit, entrance boundary points, V.51; infinitesimal generator, V.47, V.50;
natural scale, V.46; regular diffusion, V.45; resolvent, V.50, VI.54; scale function, V.46;
speed measure, V.47; time substitution, V.47.
One-parameter subgroup: V.35.
Optional processes, σ-algebra: VI.4.
Optional projection: VI.7.
Optional-Sampling Theorem: II.59, II.77.
Optional Section Theorem: VI.5.
Optional-Stopping Theorem: II.57.
Optional time: VI.4; see also stopping time.
Orthonormal frame bundle: V.30, V.33, V.34.
Ornstein-Uhlenbeck process: I.23, V.5; spectral measure, I.24.
Outer measure: I.6, II.35.
π-system: II.1.
Parallel-displacement: V.32, 34.
Parallel transport: V.32.
Path decomposition: III.49, VI.55.
Path regularization: II.67, III.7.
Path-space: II.28, V.8.
Pathwise-exact SDE: see SDE.
Pathwise uniqueness: V.9, V.17; Nakao theorem, V.41; Yamada-Watanabe Theorem,
V.40.
PCHAF (perfect, continuous, homogeneous, additive functional): III.16.
PCHMF (perfect, continuous, homogeneous, multiplicative functional): III.18.
PFA theorem: VI.12, VI.16.
Picard's Theorem: I.20.
Polish space: II.82.
Poisson measures: II.37.
Potential (supermartingale): II.59.
Potential theory: I.22; see also Dirichlet problem, Martin-Doob-Hunt theory.
Poisson kernel for the half plane: I.19.
Poisson kernel for the unit ball: III.30.
Poisson measure, process: II.37, VI.2.
Polish space: II.82; characterization of, II.82.
Probability triple: II.14.
Pre-Brownian motion: II.32, II.68.
Pre-Poisson set function: II.33.
Pre-T σ-algebra: II.58, II.73, VI.17.
Previsible: II.47; path functionals, V.8; processes, σ-algebra, IV.6; Section Theorem,
VI.19; stopping time, VI.12.
Previsible projection: VI.19.
Product σ-algebras: II.11; product measures, II.12, II.22.
Progressive process, σ-algebra: II.73, VI.3.
Prohorov's Theorem: II.83.
Pseudo-Riemannian metric: V.36.
Pure local martingale: IV.34, IV.35, V.28.
Purely discontinuous martingales: IV.24.
Q-matrices: III.1; DK conditions, III.53; local-character condition, III.54; probabilistic
significance, III.57, IV.21; of symmetric chains, III.59; of totally instantaneous chains,
III.55.
Quadratic-covariation process: IV.26.
Quadratic variation: I.11.
Quadratic-variation process: IV.26, VI.36; for continuous local martingales, IV.30;
previsible angle-bracket process, VI.34.
Quantum fluctuations: V.5.
Quasi-left-continuity: for FD processes, III.11; of filtrations, VI.18; for Ray processes,
III.41, III.50.
Quasimartingales: VI.41.
Quaternions: V.35.
R-filtered space: II.67.
R-path, R-process: II.62, II.63, Introduction to Chapter IV.
R-regularisation: II.67.
R-supermartingale convergence theorem: II.69.
Radon-Nikodym Theorem: II.9.
Random field: I.24.
Random walk, Martin boundary of: III.28.
Ray-Knight compactification: III.35.
Ray-Knight Theorem on local times: VI.52.
Ray processes: III.36; application to chains, III.50.
Ray resolvent: III.34.
Ray's Theorem: III.36, III.38.
Reducing sequence: IV.11.
Reduction: IV.11, IV.29.
Reflecting Brownian motion: see Brownian motion.
Reflection principle: I.13.
Regular conditional probabilities: existence theorem, II.89; counterexample, II.43.
Regular class (D) submartingale: VI.31.
Regular diffusion: V.45.
Regular increasing process: VI.21.
Regular point: I.22.
Regular function: III.27.
Regularizable path: II.62.
Resolvent: III.2, III.3; of Brownian motion, III.3.
Resolvent equation: III.2, III.3.
Reuter's Theorem on drifting Brownian motion: IV.39.
Reversed Martingale Convergence Theorem: II.51.
Riemannian connection: V.32, V.34.
Riemannian manifold: V.31.
Riemann mapping theorem: I.19.
Riemannian metric: V.34.
Riemannian structure induced by non-singular diffusion: V.34.
Riesz decomposition of excessive functions: III.27.
Riesz decomposition of a UI supermartingale: II.59.
Riesz representation Theorem: II.80, III.6.
Rolling without slipping: V.33.
σ-additivity: II.4.
σ-algebra: II.1; countably-generated, II.88.
σ-field: see σ-algebra.
SDE: of diffusion type, V.8; exact, V.9, V.17; Itô's Theorem on existence and uniqueness
of solutions, V.11; links with martingale problem, V.19-20; with (locally) Lipschitz
coefficients, V.11-13; Markov property of solutions, V.13; pathwise uniqueness, V.9,
V.17; strong solution, V.10; Tanaka's SDE, V.16; time-reversal, V.13; Tsirel'son's SDE,
V.18; uniqueness in law, V.16; weak solution, V.16.
Scale function: V.28, V.46.
Scheffé's Lemma: II.8.
Section Theorem: II.76; Optional Section Theorem, VI.5; Previsible Section Theorem,
VI.19.
Semimartingale: IV.15; continuous, see continuous semimartingale; as integrator, IV.16;
local time, IV.43; in a manifold, IV.15.
Signal process: VI.8.
Skew product of Brownian motion: IV.35.
Skorokhod embedding: I.7, VI.51.
Skorokhod's equation: V.6.
Special semimartingale: VI.40.
Spectral measure of stationary Gaussian process: I.24.
Spitzer-Rogozin identity for Lévy processes: I.29.
Splitting time: III.49.
Stable process: I.28.
Stable subspace: IV.24.
Standard process: III.49.
'Standard' transition matrix function: III.2.
Stochastic control, optimality principle: V.15.
Stochastic development: V.33.
Stochastic differential equation: see SDE.
Stochastic differentials: IV.32, V.1.
Stochastic flows: V.13.
Stochastic integral: IV.27, VI.36-38; Riemann-sum approximation, IV.47.
Stochastic partial differential equations: VI.11.
Stochastic process: II.27.
Stone-Weierstrass Theorem: II.80.
Stopping time: II.56, II.73.
Strassen's Law: I.16; invariance principle, I.16.
Stratonovich calculus: IV.46; switch to Itô, V.30.
Strong Law of Large Numbers: II.51.
Strong Markov property: for Brownian motion, I.12; for FD processes, III.8, III.9; for
Ray processes, III.40; under time reversal, III.47.
Strong reduction: VI.37.
Structural constants for Lie groups: V.35.
Structural equations: V.34.
Subadditive Ergodic Theorem: I.22.
Submanifold: V.34; regular submanifold, V.31.
Sub-Markov semigroup: III.3.
Submartingale: II.46, II.63.
Subordinator: I.28, II.37, VI.43.
Summation convention: V.1.
Superharmonic function: III.31.
Supermartingale: convergence theorem, II.49; definition, II.46, II.63; sup of a sequence
of, II.78.
Supermedian function: III.34.
Symmetrisable Q-matrix, transition matrix function: III.59.
Taboo probabilities: III.52.
Tanaka's formula: IV.43.
Tanaka's SDE: V.16.
Tangent bundle: V.34.
Tangent vector: V.30, V.34.
Tchebychev inequality: II.18.
Terminal time: III.18.
Tightness: II.83, II.85.
Time change: see time substitution.
Time reversal: III.42, III.47, III.49.
Time-reversed Brownian motion: II.38.
Time substitution: III.21, IV.30, V.26.
Torsion: V.34.
Totally inaccessible stopping time: V.21, VI.13-14.
Tower property of conditional expectation: II.41.
Transition function: III.1; measurable, III.3.
Trotter's Theorem: I.5.
Tsirel'son's SDE: V.18.
Uniform asymptotic negligibility: I.28.
UI: see uniform integrability.
Uniform integrability: II.20, II.29, II.21, II.44.
Uniqueness in law: V.16.
Universal completion: III.9.
Upcrossings: II.62.
Upcrossing Lemma: II.48.
Usual augmentation: II.67, II.75.
Usual conditions: II.67, IV, Introduction.
Volkonskii's formula: III.21.
Volkonskii-Sur-Meyer Theorem: III.16, III.17.
Volume element: V.34.
Von Mises distribution: IV.39.
Weak convergence: II.83; Prohorov's Theorem, II.83; in W, II.85; Skorokhod's
interpretation, II.84, II.86.
Weak* topology: 11.80.
Weyl's Lemma: V.38.
Whittle's flypaper example: V.7, V.15.
Wiener-Hopf factorisation of Lévy processes: I.29.
Wiener measure: I.6.
Wiener process: see Brownian motion.
Wiener's Theorem: I.6; proofs, I.6, II.71.
Yor's addition formula: IV.19.
Yor's Theorem on semimartingale local time: IV.44.
Zvonkin's observation: V.18, V.28.