/
Author: Billingsley P.
Tags: mathematical analysis probability theory functional analysis convergence of measures
ISBN: 471-07242-7
Year: 1968
Text
Convergence of
Probability Measures
Patrick Billingsley
Departments of Statistics and Mathematics
The University of Chicago
JOHN WILEY & SONS, New York • Chichester • Brisbane • Toronto
Copyright © 1968 by John Wiley & Sons, Inc.
All rights reserved.
Reproduction or translation of any part of this work beyond
that permitted by Sections 107 or 108 of the 1976 United States
Copyright Act without the permission of the copyright owner
is unlawful. Requests for permission or further information
should be addressed to the Permissions Department, John
Wiley & Sons, Inc.
Library of Congress Catalog Card Number: 68-23922
SBN 471 07242 7
Printed in the United States of America
20 19 18 17 16 15 14 13
TO MY MOTHER
Preface
Asymptotic distribution theorems in probability and statistics have from the
beginning depended on the classical theory of weak convergence of distribu-
distribution functions in Euclidean space—convergence, that is, at continuity points
of the limit function. The past several decades have seen the creation and
extensive application of a more inclusive theory of weak convergence of
probability measures on metric spaces. There are many asymptotic results
that can be formulated within the classical theory but require for their proofs
this more general theory, which thus does not merely study itself. This book
is about weak-convergence methods in metric spaces, with applications
sufficient to show their power and utility.
The Introduction motivates the definitions and indicates how the theory
will yield solutions to problems arising outside it. Chapter 1 sets out the basic
general theorems, which are then specialized in Chapter 2 to the space of
continuous functions on the unit interval and in Chapter 3 to the space of
functions with discontinuities of the first kind. The results of the first three
chapters are used in Chapter 4 to derive a variety of limit theorems for
dependent sequences of random variables.
Although standard measure-theoretic probability and metric-space topol-
topology are assumed, no general (nonmetric) topology is used, and the few results
required from functional analysis are proved in the text or in an appendix.
Mastering the impulse to hoard the examples and applications till the last,
thereby obliging the reader to persevere to the end, I have instead spread
them evenly through the book to illustrate the theory as it emerges in stages.
Chicago, March 1968 Patrick Billingsley
vii
Acknowledgements
My thanks go to Soren Johansen, Samuel Karlin, David Kendall, Ronald
Руке, and Flemming Topsoe, who read large parts of the manuscript; the
book owes much to their detailed suggestions, and I am very grateful. I
should also like to thank Mary Woolridge for her typing, cheerful, swift,
and error-free.
The writing of this book was supported in part by the Statistics Branch,
Office of Naval Research, and in part by Research Grant No. 8026 from the
Division of Mathematical, Physical, and Engineering Sciences of the National
Science Foundation.
Vlll
Contents
INTRODUCTION 1
CHAPTER 1. WEAK CONVERGENCE IN METRIC SPACES 7
1. Measures in Metric Spaces, 7
Measures and Integrals, 7, Tightness, 9
2. Properties of Weak Convergence, 11
Portmanteau Theorem, 11, Other Criteria, 14
3. Some Special Cases, 17
Euclidean Space, 17, The Circle, 18, The
Space R™, 19, The Space C, 19, Product
Spaces, 20
4. Convergence in Distribution, 22
Random Elements, 22, Convergence in Distri-
Distribution, 23, Convergence in Probability, 24,
Product Spaces, 26
5. Weak Convergence and Mappings, 29
Continuous Mappings, 29, Main Theorem,
30, Integration to the Limit, 31, An Extension
of Theorem 5.1, 33
6. Prohorov's Theorem, 35
Relative Compactness, 35, The Direct Theorem,
37, The Converse, 40
IX
Contents
7. First Applications, 41
Smooth Functions, 41, The Central Limit
Theorem, 42, Characteristic Functions, 45, The
Cramer-Wold Device, 48, Local and Integral
Limit Theorems, 49, Weak Convergence on the
Circle and Torus, 50
CHAPTER 2. THE SPACE С 54
8. Weak Convergence and Tightness in C, 54
Weak Convergence, 54, Tightness, 54, Random
Functions, 57, Coordinate Variables, 60
9. The Existence of Wiener Measure, 61
Wiener Measure, 61, The Brownian Bridge, 64,
Separable Stochastic Processes, 65
10. Donsker's Theorem, 68
The Theorem, 68, An Application, 70, A
Necessary Condition for Tightness, 73, Another
Proof of Donsker's Theorem, 73
11. Functions of Brownian Motion Paths, 77
Maximum and Minimum, 11, The Arc Sine Law,
80, The Brownian Bridge, 83
12. Fluctuations of Partial Sums, 87
Maxima, 87, Product Moments, 88, Applica-
Applications, 89, Proof of Theorem 12.1, 91, Moments,
94, A Tightness Criterion, 95, Further In-
Inequalities, 98
13. Empirical Distribution Functions, 103
CHAPTER 3. THE SPACE D 109
14. The Geometry of D, 109
The Space D, 109, The Skorohod Topology,
111, Completeness ofD, 115, Compactness in D,
116, A Second Characterization of Compactness,
118, Finite-Dimensional Sets, 120
15. Weak Convergence and Tightness in D, 123
Finite-Dimensional Distributions, 123, Tight-
Tightness, 125, Random Elements of D, 128, A
Criterion for Convergence, 128, Criteria for
Contents xi
Existence, 130, Other Criteria, 133, Separable
Stochastic Processes, 134
16. Applications, 137
Donskefs Theorem, 137, Dominated Measures,
139, Empirical Distribution Functions, 141
17. Random Change of Time, 143
Randomly Selected Partial Sums, 143, Random
Change of Time, 144, Applications, 145,
Renewal Theory, 148
18. The Uniform Topology, 150
CHAPTER 4. DEPENDENT VARIABLES
19. Diffusion, 154
A Characterization of Brownian Motion, 154,
Other Diffusion Processes, 158
20. Mixing Processes, 166
y-Mixing, 166, Inequalities for Moments, 170,
Functional Central Limit Theorem, VIA, Inte-
Integrals in Place of Sums, 178, Nonstationarity,
179
21. Functions of Mixing Processes, 182
Preliminaries, 182, Functional Central Limit
Theorem, 184, Applications, 191, Diophantine
Approximation, 193, Nonstationarity, 194
22. Empirical Distribution Functions, 195
(p-Mixing Processes, 195, Functions of <p-
Mixing Processes, 199
23. Martingales, 205
24. Exchangeable Random Variables, 208
Sampling, 208, Exchangeable Variables, 212
APPENDIX I. METRIC SPACES
Separability, 215, Compactness, 217, Upper
Semicontinuity, 218, 77ie S/?aa? i?00, 218, 77ie
C, 220
154
215
xii Contents
APPENDIX II. MISCELLANY 222
Measurability, 222, Change of Variable, 222,
Tail Probabilities, 223, Scheffe's Theorem, 223,
Subspaces, 224, Product Spaces, 224, Measur-
Measurability of Dh, 225, Belly s Theorem, 226,
Kolmogorov's Theorem, 228, Measurability of
Some Mappings, 230, More Measurability, 232
APPENDIX III. THEORETICAL COMPLEMENTS 233
The Problem of Measure, 233, Separable
Measures, 234, The Topology of Weak Con-
Convergence, 236, Prohorov's Theorem, 239
BIBLIOGRAPHY 243
SUMMARY OF NOTATION 248
INDEX 251
CONVERGENCE OF PROBABILITY MEASURES
Introduction
The De Moivre-Laplace limit theorem states that, if
A) ВД = Pf^=^ < x\
I J J
is the distribution function of the normalized number of successes in n
Bernoulli trials, and if
B) F(x) = ~=Г e~W du
is the unit normal distribution function, then
C) Fn(x) -> F{x)
for all x (n -> oo, the probability p of success fixed).
We say of arbitrary distribution functions Fn and F on the line that Fn
converges weakly to F, which we indicate by writing Fn => F, if C) holds for
all continuity points x of F. Thus the De Moivre-Laplace theorem asserts
that A) converges weakly to B); since B) is everywhere continuous, the
proviso about continuity points is vacuous in this case. If Fn and F are
denned by
0 if x < -
D) Fn{x) = ;
if x>-
n
and
fO if x < 0
E) F{x) =
U if x > 0,
2 Introduction
then again Fn=>F, and this time the proviso does come into play: C)
fails at x = 0.
For a better understanding of this notion of weak convergence, which
underlies a large class of Limit theorems in probability theory, consider the
probability measures Pn and P generated by arbitrary distribution functions
Fn and F. These probability measures, defined on the class of Borel subsets
of the line, are uniquely determined by the requirements
Pn{- со, x] = Fn(x), P{- со, x] = F(x).
Since F is continuous at x if and only if the set {x} consisting of x alone has
P-measure 0, Fn => F means that the implication
F) Pn(-co,x]-+P(-cc,x] if P{x} = 0
holds for each x.
Let dA denote the boundary of a subset A of the line; дA consists of those
points that are limits of sequences of points in A and are also limits of
sequences of points outside A. Since the boundary of (— со, х] consists of the
single point x, F) is equivalent to
G) Pn(A)-+P(A) if P(dA) = 0,
where we have written A for (— oo, x]. The fact of the matter is that Fn=> F
holds if and only if the implication G) is true for every Borel set A—a result
proved in Chapter 1.
Let us distinguish by the term P-continuity set those Borel sets A for which
P(dA) = 0, and let us say that Pn converges -weakly to P, and write Pn => P,
if Pn(A) -> P(A) for each P-continuity set—that is, if G) holds. As just
asserted, Pn=>P if and only if the corresponding distribution functions
satisfy Fn => F.
This reformulation of the concept of weak convergence clarifies the reason
why we allow C) to fail if F has a jump at x. Without this exemption, D)
would not converge weakly to E), but this example may appear artificial.
If we turn our attention to probability measures Pn and P, however, we see
that Pn(A) -> P{A) may fail if P(dA) > 0 even in the De Moivre-Laplace
theorem. The measures Pn and P generated by A) and B) satisfy
(8) Pn(A) = p[^-=^P e A]
and
(9) P(A) = -±= | e-*"" du
^2 J
Introduction 3
for Borel sets A. Now if A consists of the countably many points
.— , и = 1,2,..., к = 0, 1,. . . , и,
then РЯ(Л) = 1 for all и and P{A) = 0, so that Pn(A) -> P(^) is impossible.
Since 9Л is the entire real line, this does not violate G).
Although the concept of weak convergence of distribution functions is tied
to the real line (or to Euclidean space, at any rate), the concept of weak
convergence of probability measures can be formulated for the general
metric space, which is the real reason for preferring the latter concept. Let S
be an arbitrary metric space, let Sf be the class of Borel subsets of S (Sf is
the G-neld generated by the open sets), and consider probability measures Pn
and P denned on Sf. Exactly as before, we define weak convergence Pn => i>
by requiring the implication G) to hold for all Borel sets A. In Chapter 1 we
investigate the general theory of this concept of convergence and see what it
reduces to in various particular metric spaces. We prove there, for example,
that Pn converges weakly to P if and only if
A0) (fdPn->(fdP if feC(S),
Js Js
where C(S) denotes the class of bounded, continuous real-valued functions
on S. (In order to conform with general mathematical usage, we take A0)
as the definition of weak convergence, so that G) becomes a necessary and
sufficient condition instead of a definition.)
Chapter 2 concerns weak convergence in the space С — C[0, 1] with the
uniform topology; С is the space of all continuous real functions on the
closed unit interval [0, 1], metrized by taking the distance between two
functions x = x{t) and у — y{t) to be
A1) p(z,y)= sup \x(t)-y{i)\.
0<t<l
An example of the sort of application made in Chapter 2 will show the
utility of a general treatment of weak convergence—one that goes beyond the
classical Euclidean case. Let ?l5 ?2, ... be a sequence of independent,
identically distributed random variables defined on some probability space
(Q., &, P). If the ?n have mean 0 and variance a2, then, according to the
Lindeberg-Levy central limit theorem, the distribution of the normalized sum
A2) ^5 (| ++!)
converges weakly, as n tends to infinity, to the normal distribution defined
by (9).
4 Introduction
We can formulate a refinement of the central limit theorem by proving
weak convergence of the distributions of certain random functions constructed
from the partial sums Sn. For each integer n and
each sample point со, construct on the unit interval
the polygonal function that is linear on each of the
subintervals [(i — \)/n, Цп], i = 1, 2, . . . , n, and
has the value 8г{соIа\1п at the point i\n (S0(co) =
0). In other words, construct the function Xn{co)
whose value at a point / of [0, 1] is
A3)
Xn(t, со) = J- SU°>) + t ~ °' Г 1)/П ~T- U")
a^Jn 1/n Tj
"i- 1 П
n ' nj
For each со, Xn(co) is an element of the space C. Let Pn be the distribution
of Xn(co) in C, defined for Borel subsets A of С—Borel sets relative to the
metric A1)—by
(the definition is possible because the mapping со -> Xn(co) turns out to be
measurable in the right way). In Chapter 2 we prove
A4) Pn ^ W,
where W is Wiener measure. We also prove the existence in С of Wiener
measure, which describes the probability distribution of the path traced out
by a particle under Brownian motion.
If A = {x:x{\) < a}, then, since the value of the function Xn(co) at / = 1
is Xn{\, со) = Sn(co)lay/n,
It turns out that W{dA) = 0, so that A4) implies
Pfco:^-Sn(co) < a) -> W{x:x(l) < a}.
It also turns out that
W{x:x(l) < a} = -p= Г e~W du,
¦JItT J — ao
so that A4) does contain the Lindeberg-Levy theorem.
The sum |x + • • • + ?n may be interpreted as the position at time n in a
random walk. The central limit theorem asserts that this position (properly
normalized) is, for n large, distributed approximately as the position at time
t = 1 of a particle in Brownian motion. The relation A4) asserts that the
Introduction 5
entire path of the random walk during the first n steps is, for n large,
distributed approximately as the path up to time / = 1 of a particle under
Brownian motion.
To see in a concrete way that A4) contains information going beyond the
central limit theorem, consider the set
A = \x: sup x{t) < <xl.
I o<t<i J
Again it turns out that W(dA) — 0, so that A4) implies
A5) lim ?[co:—L max Sk(co) < a] = lim Pn(A)
= W\x: sup x(t) < a).
I o<t<i J
If we evaluate the right-most member of A5) (which we can do in a number
of ways, for example, by computing the limit on the left for some specially
selected sequence {?n} that makes the computation easy), then we have
a limit theorem for the distribution of maxfc<n Sk under the hypotheses of
the Lindeberg-Levy theorem.
As a final example involving Xn(co), take A to be the set of x in С for
which the set {t:x(t)> 0} has Lebesgue measure at most a (we assume
0 < a < 1). As before, Pn(A) -> W(A). Since the Lebesgue measure of
{t:Xn(t, со) > 0} is essentially the fraction of the partial sums S1} S2, . . . , Sn
that exceed 0, this argument leads to an arc sine law under the hypotheses of
the Lindeberg-Levy theorem. Chapter 2 contains the details of all these
derivations.
We can in this way use the theory of weak convergence in С to obtain a
whole class of limit theorems for functions of the partial sums Slt S2,. . . , Sn.
The fact that Wiener measure W is the weak limit of the distribution in С
of the random function Xn(a>) can also be used to prove theorems about W,
and W is interesting in its own right.
Chapter 3 specializes the theory of weak convergence to another space of
functions on [0, 1]—the space D = D[0, 1] of functions having discontinu-
discontinuities of at most the first kind. This is the natural space in which to analyze the
behavior of empirical distribution functions, for example. Let |l5 ?2, . . . be
independent random variables on (Q., &, P),
each uniformly distributed on [0, 1]. Let Fn(t, со)
be the empirical distribution function of ^(co),
..., fn(o>); for 0 < / < 1, Fn(t, со) is the frac-
fraction of integers k, 1 < к < n, for which |fc(a») <
t. Now let Yn(co) be the element of D whose
value at / is
Yn{t, со) = Jn(Fn(t, со) - /).
6 Introduction
If D is metrized in the right way, it becomes possible to speak of the distribu-
distribution in D of the random function Yn(a>) and to prove that this distribution
converges weakly as n tends to infinity. Just as in the case of the random
element of С defined by A3), we can then go on to derive the limiting
distributions of
sup ^n(Fn{t, со) - t)= sup Yn(t, со)
0<t<l 0<<l
and related quantities that arise in statistics.
Chapter 4 concerns weak convergence of the distributions of random
functions derived from various dependent sequences of random variables.
Many of the conclusions in Chapters 2, 3, and 4, although not requiring
function-space concepts for their statement, could hardly have been derived
without function-space methods.
Standard measure-theoretic probability and metric-space topology are
used from the beginning. Although the point of view throughout is that of
functional analysis (a function is a point in a space), nothing of functional
analysis is assumed (beyond an initial willingness to view a function as a
point in a space); all function-analytic results needed are proved in the text
or else in Appendix I (which also gathers together for easy reference some
results in metric-space topology).
Remarks. The main papers that lead to the development of this theory were Kolmogorov
A931), Erdos and Kac A946 and 1947), Doob A949), and Donsker A951 and 1952).
Prohorov A953 and 1956) and Skorohod A956) gave the theory its present form. Le Cam
A957) and Varadarajan A958a and 1961a) have extended it to general topological spaces.
CHAPTER 1
Weak Convergence in Metric Spaces
1. MEASURES IN METRIC SPACES . . .
Let S be a metric space. We shall study probability measures on the class Sf
of Borel sets in S. Here Sf is the a-field generated by .the open sets—the
smallest a-field containing all the open sets—and a probability measure on
Sf is a nonnegative, countably additive set function P with P(S) = 1.
If such probability measures Pn and P satisfy J^ fdPn -> J^ fdP for
every bounded, continuous real function / on S, we say that Pn converges
weakly to P and write Pn => P. Our aim in this chapter is to study this
concept in detail; we begin with some properties of individual probability
measures on (S, SP).
Although we must sometimes assume separability or completeness, most
of the theorems in this chapter hold for an arbitrary metric space S. The
spaces in our applications are usually separable and complete; since they
rarely have further regularity properties, such as local compactness, we
never impose further restrictions.
Measures and Integrals
THEOREM 1.1 Every probability measure on (S,?f) is regular; that is,
if A e?f and s > 0, then there exist a closed set F and an open set G such that
F с Л с GandP(G - F)< s.
Proof. Denote the metric on S by p(x, y) and the distance from x to A by
p(x, A).1[ If A is closed, then we may take F = A and G = {x:p{x, A) < <5}
f For terminology and some theorems about metric spaces, see Appendix I.
8 Weak Convergence in Metric Spaces
for some d, since the latter sets decrease to A as д J, 0. Hence we need only
show that the class ^ of Borel sets with the asserted property is a o--field.f
Given sets An in 0, choose closed sets Fn and open sets Gn such that
FnczAn^Gn and P(Gn - Fn)< sl2n+K If G=\JnGn, and if F-
Un<n/n' with "o so chosen that РA1Л - Ж e/2, then Fc(JAcG
and P(G — F) < ?- Thus ^ is closed under the formation of countable
unions; since У is obviously closed under complementation, the proof is
complete.
Theorem 1.1 implies that P is determined by the values of P(F) for closed
sets F. Theorem 1.3 shows that P is determined by the values of $fdP%
for bounded, continuous real functions / denned on S. Denote by CE) the
class of such functions /. It is shown on p. 222 § that each / in C(S) is
measurable Sf. Everything depends on the following result, which shows
how to approximate the indicator (or characteristic function) IF of a closed
set F by elements of CE).
THEOREM 1.2 If F is closed and s positive, there is a function f in C(S)
such thatf(x) = lifxe F,f(x) = 0 if P(x, F) > s, and 0 <f(x) < 1 for all
x. The function f may be taken to be uniformly continuous.
Proof. Define a continuous function q> of a real variable by
(\ if t<0,
A.1) <p(t) - (
If
A-2) J
1 - / if 0 < / < 1,
0 if 1 < /.
a-6
f We have defined the class ?f of Borel sets as the ст-field generated by the open sets, which
is the same thing as the c-field generated by the closed sets and is the one appropriate for
the present theory. For related (mostly inappropriate) ст-fields, see Problem 6.
% When it is the entire space, we omit the region of integration.
§ Of Appendix II, a miscellany to which most measurability questions are relegated.
Measures in Metric Spaces 9
then / has the required properties—it is even uniformly continuous. The
drawing graphs this/for F — [a, b] on the line.
THEOREM 1.3 Probability measures P and Q on (S, Sf) coincide if
A.3) jfdP=(fdQ
for each/in C(S).
Proof. Suppose F is closed. Start with A.1) and define, for each positive
integer u,
A.4) <pu(t) = <p(ut)
and
A-5) /„(*) = ?u(p(*. F)).
Then {/„} is a nonincreasing sequence of elements of C(S) converging point-
wise to IF. By the bounded convergence theorem, P(F) = limu J/u dP and
Q(F) =* HmJ/u dQ, so that, if A.3) holds for all/in C(S), P(F) =* Q(F).
Since P and Q agree for all closed sets, it follows by Theorem 1.1 that P and
Q are identical.
Thus the values of \fdP for/in C(S) completely determine the values of
P{A) for A in Sf. This fact underlies the circle of ideas centering on the notion
of weak convergence; although we have defined weak convergence by requir-
requiring the convergence of the integrals of functions in C(S), in the next section
we shall characterize it in terms of the convergence of the measures of certain
sets.
Tightness
The following notion of tightness proves important both in the theory of weak
convergence and in its applications. A probability measure P on (S, Sf) is
tight if for each positive s there exists a compact (p. 217) set К such that
P{K) > 1 — e. Clearly, P is tight if and only if it has a a-compact support.f
By Theorem 1.1, P is tight if and only if P(A) is, for each A in Э*, the
supremum of P(K) over the compact subsets К of A.
In a space that is a-compact, every probability measure is tight—which
covers fc-dimensional Euclidean space. The following result, which also
covers the Euclidean case, is more useful.
t A support of a probability measure is a set A in У with P{A) = 1; a set is c-compact
if it can be represented as a countable union of compact sets. The characterization of a
tight P as having a c-compact support is inappropriate as a definition because it does not
generalize in the right way to families of probability measures (see Section 6).
10 Weak Convergence in Metric Spaces
THEOREM 1.4 If S is separable and complete, then each probability
measure on (S, ?f) is tight.
Proof. Since S is separable, there is, for each n, a sequence Anl, An2, ... of
open l/«-spheres covering S. Choose in so гЬ.аХР(Х)г^гпАпг) > 1 — ?/2n- By
the completeness hypothesis, the totally bounded set C\n>i\Ji<inAni has
compact closure К (see p. 217). Since clearly P(K) > 1 — s, the theorem
follows.
Theorem 1.4 is false without the hypothesis of completeness; whether the
hypothesis of separability can be suppressed is equivalent to the problem of
measure. These matters are discussed in Appendix III.j
Remarks. Theorem 1.4 is due to Ulam (see Oxtoby and Ulam A939)); LeCam A957)
introduced the term "tight."
PROBLEMS*
1. Say that a function / separates sets A and В if f(x) = 0 for x in A, f(x) = 1 for x
in B, and 0 < f(x) < 1 for all x. If A and В are at positive distance, they can be separated
by a uniformly continuous / [Theorem 1.2]. If A and В have disjoint closures but are
at distance 0, they can be separated by a continuous/ [/(x) = p(x, A)/(p(x, A) + p(x, B))]
but not by a uniformly continuous/. There is no continuous/separating A and В if their
closures meet; there is no /separating A and В if they meet.
2. Give examples of distinct topologies that give rise to the same class of Borel sets.
3. If S can be embedded as an open set in some complete metric space, then [Kelley
A955, p. 207)] it is topologically complete. Since a locally compact S is open in its
completion, it is topologically complete. Hence Theorem 1.4 applies if S is separable and
locally compact. Since such an S is a-compact [being a union of open sets with compact
closures and hence (p. 216) a countable such union], it also follows directly that each proba-
probability measure on it is tight; Euclidean space is an example.
4. Let S be a Hilbert space with a countably infinite orthonormal basis xx, x2, . . . .
Since S is separable and complete, Theorem 1.4 applies. However, no set with nonempty
interior is compact [a nonempty interior must, for some x and e, contain all the points
x + exn], so that S is neither locally compact nor [Baire's category theorem; Kelley A955,
p. 200)] ff-compact. If P assigns positive mass to each element of a countable, dense set,
then P has no support locally compact in the relative topology.
5. Adapt Problem 4 to the general Banach space of countably infinite dimension [there
exist points хъ x2, . . . with supn ||xn|] < со and mfm^n \\xm — xn\\ > 0; see Banach
A932, p. 83)]; C[0, 1], important in probability, is such a space, which explains why a
theory based on local compactness is of small utility in this subject. (See also Problem 5 in
Section 3.)
t Although Theorem 1.4 as given suffices for all the applications in this book, it is natural
to inquire after extensions. It is to questions of just this sort that Appendix III is devoted.
t Some problems involve concepts not required for an understanding of the text itself;
there are no problems whose solutions are used later in the text. A simple assertion is
understood to be prefaced by "show that." Square brackets contain hints or indications of
solutions.
Properties of Weak Convergence 11
6. We have defined У as the cr-field generated by the open sets, which we can indicate by
wr-iting У — cr(open sets). In the same way, define ?x = cr(closed G5 sets) (a set is a G&
if it is a countable intersection of open sets), define Sf2 = cr(C(S)) (the smallest cr-field with
respect to which each function in C(S) is measurable), and define SP$ = cr(open spheres),
•5^4 = o-(compact sets), and Уь — cr(compact Gs sets). In a metric space each closed set is
a G&. Use this fact and Theorem 1.2 to prove
Show that У = У3 if S is separable. Show that У = У5 if S is cr-compact (which will be
true if S is separable and locally compact). We may have Уг # Уъ (even if S is locally
compact): Take S uncountable and discrete. We may have Sf% Ф <S^4 (even if S is separable
and complete): Take S to be the Hilbert space in Problem 4. (The situation differs in the
general topological space, where one must consider two classes of sets: The Borel sets are
taken as the elements sometimes of У and sometimes of Sf±, and the Baire sets are taken as
the elements sometimes of S?2 an(^ sometimes of Уь—the terminology varies.)
7. In connection with tightness, this fact is interesting: Suppose P is defined on (S, У),
but suppose at the outset only that it is finitely additive. If, for each A in У, P{A) = sup
P(K) with К ranging over the compact subsets of A, then P is countably additive after all.
2. PROPERTIES OF WEAK CONVERGENCE
We have defined Pn =>P to mean that §fdPn -> \fdP for each/in the class
C{S) of bounded, continuous real functions on S. Note that, since the
integrals $fdP completely determine P (Theorem 1.3), the sequence {Pn}
cannot converge weakly to two different limits at the same time. Note also
that weak convergence depends only on the topology of S, not on the specific
metric that generates it: Two metrics generating the same topology give rise
to the same classes Sf and C(S) and hence to the same notion of weak
convergence.!
Portmanteau Theorem
The following theorem provides useful conditions equivalent to weak
convergence; any one of these conditions could serve as the definition. A set
A in Sf whose boundary ЗА satisfies P(dA) — 0 is called a P-continuity set
(note that dA is closed and hence lies in Sf).
THEOREM 2.1 Let Pn, P be probability measures on (S, Sf). These five
conditions are equivalent:
t If we topologize the space Z{S) of all probability measures on {S, У) by taking as the
general basic neighborhood of P the set of Q such that |f ftdP — f/,- dQ\ < e for / =
1, . . . , k, where e is positive and the/j lie in C(S), then weak convergence is convergence
in this topology. The topological structure of Z{S), which will be of no direct concern to
us, is discussed in Appendix III.
12 Weak Convergence in Metric Spaces
@ Pn
(ii) limn jfdPn — jfdPfor all bounded, uniformly continuous real/.
(in) lim supn Pn(F) < P(F)for all closed F.
(iv) lim MnPn(G) > P(G)for all open G.
(v) Нтя Рп{А) = P(A)for all P-continuity sets A.
A couple of examples will show the significance of these conditions. Let
P be a unit mass at the point x (P(A) is 1 or 0, according as x lies in A or
notf), and letPn be a unit mass at xn. If xn —*- x, then §fdPn =/(#„) -»¦/(#) —
$fdP for all/in C(S), so that Pn =>P. If xn does not converge to x, then,
for some positive e, we have p(xn,x) > e for infinitely many n. If f(y) *=
<р(е~1р(х, у)) with <p defined by A.1), then/e C(S),f(x) = 1, and/(zj = 0
for infinitely many n; hence Pn cannot converge weakly to P. Thus Pn => P
if and only if жя ->- ж, which provides an example we shall often use. (Many
putative weak-convergence theorems that are in fact not theorems can be
disproved by specializing this example.) Since A is a P-continuity set if and
only if x ф дА, it is easy to check the equivalence of (i) and (v) in this case.
If xn —> x but the xn all differ from x, then there is strict inequality in (iii) for
F ~ {x} and strict inequality in (iv) for the complementary set G =* Fc;
moreover, if the xn are all distinct and A = {x2, ж4, . . . }, then Pn(A) does
not converge to P{A) or to anything else.
On the line with the ordinary metric, the DeMoivre-Laplace theorem also
illustrates the conditions in the theorem. For a simpler example equally
relevant, consider the measure Pn corresponding to a mass of \jn at each of
the points i/n, i — 1,2, . . . , n. Now Pn converges weakly to Lebesgue
measure P confined to the unit interval, as follows from the fact that J fdPn
is an approximating sum to ifdP viewed as a Riemann integral. If A consists
of the rationals, then Pn(A) = 1 does not converge to P(A) ;= 0; if G is an
open set containing the rationals and having Legesgue measure near 0,
then there is strict inequality in (iv).
We prove Theorem 2.1 by establishing the implications in the following
diagram.
I I
(i) —(ii) —(iii)<->(iv)
I
(v)
Of course, (i) —> (ii) is trivial.
Proof of (ii) —> (iii). Suppose (ii) holds and that F is closed. Suppose
<3 > 0. For small enough e, G = {x:P(x, F) < e} satisfies P(G) < P(F) + d,
Each subset of S mentioned is assumed to lie in
Properties of Weak Convergence 13
since the sets of this form decrease to Fas e [ 0. If f(x) is the function
defined by A.2), then/is uniformly continuous on S,f(x) = 1 on F,f(x) =s 0
on the complement Gc of G, and 0 <f(x) < 1 for all x. Since (ii) holds, we
have limn $fdPn = $fdP, which, together with the relations
and
[fdP =* Г fdP < P(G) < P{F) + <3,
implies
lim supn Pn(F) < limn [fdPn = \fdP < P(F) + <3.
Since д was arbitrary, (iii) follows.
Proof of (iii) -* (i). Suppose that (iii) holds and that /e C(S). We shall
first show that
B.1) \imsuVn\fdPn<\fdP.
jfdPn<(j
By transforming / linearly (with a positive coefficient for the first-degree
term), we may reduce the problem to the case in which 0 </(#) < 1 for all
x. For an integer k, temporarily fixed, let Ft be the closed set Ft =
{х:Цк <f(x)}, i = 0,l it. Since 0 < f(x) < 1, we have
i) < \
к) J
fdP < ъ[р[х-Х=± <f(x) < f)
k { к к
r ?/() < i) < \fdP < ъ[р[
к к) J i=ik { к
The sum on the right is
2 7 [ВД0 ^(^)] +
2 7 [ВД-0 (^)] г + 7
i=i /с к к i
This and a similar transformation of the sum on the left yield
B.2)
If (iii) holds, then lim зирпРи(^г) < Р(^) for each г and hence (apply the
right-hand inequality in B.2) to Pn and the left-hand one to P)
| i ад> < G лр < 7+7 i p(^.).
К г=1 J к к г=1
n<± + (
Letting к —> со, we obtain B.1).
Applying B.1) to —/yields lim infn lfdPn > J/rfP, which, together
with B.1) itself, proves weak convergence.
The equivalence of (iii) and (iv) follows easily by complementation.
14 Weak Convergence in Metric Spaces
Proof of (iii) -> (v). Let A° denote the interior of A, and let A~ denote its
closure. If (iii) holds, then so does (iv), and hence, for each A,
B.3) P(A-) ^ lim supn Pn(A~) ^ lim supn Pn(A)
> lim inf. Pn(A) > Hm infn Pn(A°)
If P(dA) = 0, then the extreme terms equal P(A) and limn Pn(A) = P(A)
follows.
Proof of (v) —*• (iii). Since d{x: p(x, F) < <5} is contained! in {x: p(x, F) = <5},
these boundaries are disjoint for distinct <5, and hence at most countably
many of them can have positive P-measure. Therefore, for some sequence
of positive dk going to 0, the sets Fk =* {x:p(x, F) < dk} are P-continuity sets.
If (v) holds, then lim supnPn(F) < \imn Pn(Fk) = P(Fk) for each k; if F
is closed, then Fk j, F, so that (iii) follows. This completes the proof of
Theorem 2.1.
Other Criteria
It is sometimes convenient to prove weak convergence by showing that
Pn(A) -^P(A) for some special class of sets A.
THEOREM 2.2 Let % be a subclass of Sf such that (i) $1 is closed under the
formation of finite intersections and (ii) each open set in S is a finite or countable
union of elements of<%. IfPn(A) -> P(A)for every A in <%, then Pn =>P.
Proof. If Ax, . . . , Am lie in %, then so do their intersections; hence, by
the inclusion-exclusion formula,
1 m \
Pn U АЛ =^РЯШ ~^Рп(АгА0) +ЪшРп(АЛАк)
If G is open, then G = Ui^i f°r some sequence {Аг} of elements of W. Given
e, choose m so that P(Ui<mA) > P(G) — ?- By the relation just proved,
P(G) - е<Р(и{<тЛг) ^"Шп.РЛи^яЛ) < lim infnPn(G). Since в was
arbitrary, condition (iv) of the preceding theorem holds.
Let S(x, e) denote the (open) ?-sphere about x.
COROLLARY 1 Let % be a class of sets such that (i) ^ is closed under the
formation of finite intersections and (ii) for every x in S and every positive s
t The inclusion may be strict—in a discrete space, for example.
Properties of Weak Convergence 15
there is an A in fy with x e A0 <= A <= S(x, e). If S is separable and ifPn(A) ->
P(A)for every A in <%, then Pn =>P.
Proof. Condition (ii)t implies that, for each point x of an open set G,
x g A0 <= A <= G for some A in %f. Since S is separable, there exists (see p.
216) in ^ a finite or infinite sequence {At} such that G <= UiA° anc* ^» ^ ^>
which implies (j — UHr Thus ^ satisfies the hypotheses of Theorem 2.2.
COROLLARY 2 Suppose that, for each finite intersection A of open spheres,
we have Pn(A)-+P(A), provided A is a P-continuity set. If S is separable,
then Pn^P.
Proof. The boundaries dS(x, e), being contained in the sets {y'-p{%, y) — ?},
are disjoint (for fixed x) and hence have P-measure 0, with at most countably
many exceptions. Since
д(А П Я) с (дА) и (дВ),
it follows that the hypotheses of Corollary 1 are satisfied by the class °U of
those P-continuity sets that are finite intersections of spheres, and the result
follows.
Let us agree to call a subclass У of Sf a convergence-determining class if
convergence Pn{A) -> P(A) for all P-continuity sets A in У invariably entails
the weak convergence of Pn to P. Corollary 2 becomes: In a separable space,
the finite intersections of spheres constitute a convergence-determining class.
Let us further agree to call У a determining class if P and Q are identical
whenever they agree on У. The class of closed sets is a determining class and
so is any field that generates Sf. Although each convergence-determining
class is clearly also a determining class, the following example shows that
the converse fails. Let S be the half-open interval [0, 1) with the ordinary
metric; let У be the class of sets [a, b) with 0 < a < b < 1. Then У is а
determining class but not (as may be seen by taking Pn [P] a unit mass at
1 — 1/w [0]) a convergence-determining class. Although this one is artificial,
we shall see that the applications abound with real examples of determining
classes that are not convergence-determining classes.
We close this section with another condition for weak convergence. A
sequence {xn} of real numbers converges to a limit x if and only if each
subsequence {xn,} contains a further subsequence {xn~} that converges to x.
(It is convenient to denote a sequence of integers by {n'} rather than {nk}
and a subsequence of {«'} by {n"} rather than {nk}.) From this fact it is easy
to deduce a weak-convergence analogue.
t This condition is slightly stronger than the requirement that the interiors of the elements
of % form a base for the topology of S.
16 Weak Convergence in Metric Spaces
THEOREM 2.3 We have Pn^P if and only if each subsequence {Pn>}
contains a further subsequence {Pn»} such that Pn» =>P.
We shall deal occasionally with weak convergence of Pt to P when t goes
to infinity in a continuous manner. Of course, this is defined to mean that
B.4) lim [fdPt=[fdP
i^°o J J
for each/in C(S), For/fixed, B.4) holds if and only if
B.5) lim [fdPtn = [fdP
for each sequence {tn} going to infinity. Thus Pt =>P as t —> oo if and only if
Ptn =>P for each sequence {tn} going to infinity, and nothing really new is
involved. We can also let t approach in a continuous manner some finite
value t0.
Remarks. Theorem 2.1 dates back at least to Alexandrov A940-43). Theorem 2.2 is
due to Kolmogorov and Prohorov A954). For other accounts of the theory, see the books
of Gikhman and Skorohod A965), Hermequin and Tortrat A965), and Parthasarathy
A967).
The Banach space C(S) has for its adjoint C*(S) the space of finite signed measures on
У\ the weak* topology, or the C(S) topology of C*(S), relativized to the space Z(S) of
probability measures on У, is the topology described in the first footnote in this section
(hence the "weak" in our "weak convergence"); see Dunford and Schwartz A958, pp. 262
and 419). Varadarajan A958a and 1961a) investigates the topological structure of Z(S);
see also Appendix III.
For extensions of the theory to general topological spaces, see LeCam A957), Vara-
Varadarajan A958a and 1961a), and Kallianpur A961).
If the metric space S is not separable, the cr-field <S^0 generated by the spheres may be
smaller than S?. Dudley A966 and 1967) has a theory of weak convergence involving only
sets in S?o and functions measurable Уо.
If Pn=> P, one can ask whether Pn{A) -»¦ P(A) holds uniformly on a given-class of P-
continuity sets; see Ranga Rao A962) and Billingsley and Tops0e A967).
PROBLEMS
1. If S is countable and discrete, then Pn ¦=> P if and only if Pn{x} -> P{x) for each one-
point set {x}.
2. Let Pn and P be given by densities p and pn with respect to a measure A on (S, ??).
lfpn(x) ->p(x) except for x in a set of A-measure 0, then Pn => P [see Scheffe's theorem on
p. 224]. Show by example.that pn(%) -^-p(x) may fail on a set of positive measure even
though Pn => P.
3. Even though Pn => P, \fdPn -> \fdP may fail if /is bounded but not continuous or
if/ is continuous but not bounded (even if the integrals exist). Give examples. If S is
compact, the second possibility does not exist. What if S is not compact but P has compact
support?
4. The class of P-continuity sets (P fixed) form a field.
Some Special Cases 17
5. If * is a determining class, if Pn(A) -> Q(A) for Ae<%, and if Pn => P, it does /iof
follow that P = Q. [Define probability measures on the line by Р^л} = Pn{l + nr1} —
p{0} = P{1} = ^ and Q{0} = 1. Let Б consist of the points 0, 1, и, and 1 + /г1 (л =
1,2,...)- Define °U as the field of Borel sets A such that either (i) А О В is finite and
0 ф A or else (ii) Ac П Bis finite and 0 ^ Ac. (This example is due to O. Bjornsson.)]
6. Define what one should mean by determining classes and convergence-determining
classes of elements of C(S). Give an example of a determining class that is not a converg-
convergence-determining class. Show that a class uniformly dense in C(S) is a convergence-
determining class.
7. If/ is bounded and upper semicontinuous (p. 218), then Pn=> P implies lim supn
\fdPn ? \fdP.
8. Let {fg} be a family of real functions on S, equicontinuous at each x (for each x and
e, there exists a 8 for which p(x, y) < 8 implies, for all 6, \fg(x) — fe{y)\ < e). If {/e}is
uniformly bounded and S is separable, then Pn=> P implies that J/e dPn -*¦ J/e dP uniformly
in 6. [First show that, for each e, there exists a countable partition A? = {D^} of S into
P-continuity sets such that | fe(x) —fg(y)\ < e for all б if ж and у lie in the same D^. Approxi-
Approximate J/e dP by T*kfe(xk)P(Dek) with xfc in D?k, and similarly for \fe dPn, and apply Scheffe's
theorem (p. 224).] (This result is due to Ranga Rao A962); for extensions, see Billingsley
and Topsoe A967) and Topsoe A967a and 1967b).)
3. SOME SPECIAL CASES
Euclidean Space
Let Rk denote A:-dimensional Euclidean space, which we shall always take
with the ordinary metric p(x, y) = \x — y\ = [2*=1(а^ — 2ЛJ]^- We denote
by 8%k the class of Borel sets; the elements of 8%k we shall call k-dimensional
Borel sets, linear Borel sets in case к = 1. By the notation у < x\y < x],
we mean yi < хг\уг < xt] for all i = 1, 2, . . . , k. (Note that у < х is
stronger than у < x together with y^ x.) An interval is a set
C.1) (a,b] = {x:a<x<b}.
Finally, let us denote by e = A, . . . , 1) the vector whose coordinates are
all 1.
The general probability measure P on (Rk, fflk) has a distribution function
F, defined by
C.2) F(x) = P{y:y^x}, x E Rk.
Let us relate weak convergence Pn => P to the usual notion of convergence
for the corresponding distribution functions Fn, F.
The function F is continuous at x if and only if for each positive s there
exists a positive <5 such that x — де < у < x -\- de implies \F(x) — F(y)\ < e.
To say that F is continuous from above at x means that for each positive e
there exists a positive <5 such that x < у < x + be implies \F(x) — F(y)\ < e.
18 Weak Convergence in Metric Spaces
From the definition C.2) it follows that F is nondecreasing in each variable.
Hence F is continuous from above at x if and only if F(x) coincides with
infj>0 F(x + de) = inf5>0 P{y\y < x + de}. This infimum is just the P-
measure of the intersection f]d>0 {y.y < x + de} = {y.y < x}. Therefore F
is continuous from above at each x.
Since F is nondecreasing in each variable and is everywhere continuous
from above, it is continuous at x if and only if it is continuous from below at
that point, that is, if and only if for each positive s there exists a positive <5
such that x — de < у < x implies \F(x) — F(y)\ < e. Using the monotonicity
once more, we see that this condition is in turn equivalent to F(x) =*
sup5>0 F(x — de). The supremum being the P-measure of the union
\Js>0 {y:y < x — de} = {y.y < x}, we see that F is continuous at x if and
only if F(x) = P{y:y < x}. Since {y.y < x} — {y.y < x} is exactly the
boundary of {y.y < x}, F is continuous at x if and only if {y.y < x} is a
P-continuity set.
For distribution functions Fn and F, let us define Fn^>F to mean that
there is convergence Fn(x) -+ F(x) at continuity points x of F. By what has
just been proved, if Pn =>P, then the corresponding distribution functions
satisfy Fn => F. Now an interval (a, b] is determined by the 2k (k — 1)-
dimensional hyperplanes containing its faces; let °U be the class of intervals
for which all these hyperplanes have P-measure 0. Each vertex of an element
of ^ is a continuity point of F, and ^ is closed under the formation of finite
intersections. Since only countably many parallel hyperplanes can have
positive P-measure, it follows by Corollary 1 to Theorem 2.2 that, if
Pn(A) -> P(A) for each A in <8f, then Pn => P. Since P(a, b] is a sum 2 ± F(x)
with x ranging over the 2k vertices of (a, b] and similarly forPn(a, b], Fn => F
implies that Pn(A) -+ P(A) holds for each A in °U. Therefore Pn^-P and
i%j => F are equivalent.
Thus the notion of weak convergence reduces in Rk to the ordinary notion
of the convergence of distribution functions. In other words, the sets
{y.y < x} form a convergence-determining class. The proof above also
shows that the rectangles (a, b] form a convergence-determining class.
The Circle
The same sort of result obviously holds if S is the unit circle in the complex
plane: Pn=>P if and only if Pn(A) —*¦ P(A) for every arc A whose endpoints
have P-measure 0. A sequence {xx, x2, . . . } of points of S (complex numbers
of modulus 1) is said to be uniformly distributed if the allotment of points to
each arc is proportional to its length, in the sense that
C.3)
n-+oo П 3=1
Some Special Cases 19
where IA is the indicator, or characteristic function, of A, and P is circular
Lebesgue measure so normalized that P(S) =* 1. If Pn denotes the rcth
empirical distribution of the sequence—the measure corresponding to a mass
of l/n at each of the points xx, xz, . . . , xn—this condition reduces to
Pn(A)^>~P(A) for arcs A, so that the sequence is uniformly distributed if
and only if Pn=>P. Therefore, if every arc contains its proper quota of
points, in the sense of C.3), then so does every other Borel set whose
boundary has Lebesgue measure 0. We prove in Section 7 a famous theorem
of Weyl, according to which {xx, x2, . . . } is uniformly distributed if and only
if n~x 2"=1 (x3)u —v 0 holds for every nonzero integer u.
The Space R™
The results for Rk carry over in all essential respects to the topological
product of a countable sequence of copies of the line, that is, to the space
R™ of sequences x =a (хг, x2, . . . ) of real numbers (see p. 218). The topology
has as the basic neighborhoods of a point x the sets of the form
C-4) Nk>?(x) ^ {y:\xt-Vi\<e, i = 1, . . . , k}
with ? > 0 and к = 1, 2, .... With this topology, i?°° is a complete,
separable metric space. (We shall always mean by i?00 this topological
product of a countable set of copies of the real line.)
Let 7Tk denote the natural projection from i?00 to Rk, defined by ттк(х) =
(xx, . . . , xk). A finite-dimensional set, or cylinder, is by definition a set of the
form тг~гЯ with к > 1 and H e 8%k. Since each ттк is continuous and hence
measurable (see p. 222), the finite-dimensional sets lie in the a-field ^°° of
Borel sets in i?00. Let ^ denote the class of finite-dimensional sets. Since
each set C.4) lies in IF and since i?00 is separable, IF generates ^°°. Since
IF is a (finitely additive) field, it follows that IF is a determining class.
For fixed к and x, the sets C.4) for different values of e have disjoint
boundaries (e < <5 implies JVj^e <= Nkd). Applying Corollary 1 of Theorem 2.2
to the class °U of P-continuity sets in IF, we see therefore that JF is even a
convergence-determining class. Thus Pn=>P if and only if Pn(A) —** P(A)
holds for all finite-dimensional P-continuity sets A.
The Space С
In С = C[0, 1], the space of continuous functions on [0, 1] with the uniform
metric p(x, y) = supt \x(t) — y(t)\ (see p. 220), the situation differs markedly
from that in i?00. For points t1}. . . , tk in [0, 1], let Trtl...tk be the mapping that
carries the point x of С to the point (x(t^, . . . , x(tk)) of Rk. The finite-
dimensional sets are now defined as sets of the form тгг1 \ H with H e Mk.
20 Weak Convergence in Metric Spaces
Since TTtv..tk is continuous, these sets lie in the class *% of Borel sets in C.
On the other hand, the closed sphere {y:p{x,y) < z} is the limit of the
finite-dimensional sets {y:\x(i[n) — y(ijri)\ < e, i — 1, ...,«}; since С is
separable, each open set is a countable union of open spheres and hence of
closed spheres, so that the finite-dimensional sets generate ^'. Since they
form a field, the finite-dimensional sets are thus a determining class.
An example shows that the finite-dimensional sets do not form a conver-
convergence-determining class. Let P be a unit mass at 0 (the function that vanishes
identically), and let Pn be a unit mass at the function xn, where
C.5) xn{t) =
nt if 0 < t <> -
n
2-nt if - < t < - ,
n n
0 if - < t < 1.
n
Since xn does not converge to 0 uniformly (that is, in the topology of C), Pn
cannot converge weakly to P. (For example, if A = S@, ?), then P{dA) — 0,
while Pn(A) = 0 does not converge to P(A) = 1.) But Pn(A) -+P(A) does
hold for finite-dimensional P-continuity sets—in fact, if A = ir~^tkH and
if 2/л is smaller than the least of the nonzero tt, then Pn(A) — P(A).
The finite-dimensional sets are thus a determining class in С but not a
convergence-determining class. The difficulty, interest, and usefulness of
weak convergence in С all spring from the fact that it involves considerations
going beyond those of finite-dimensional sets.
Product Spaces
Let S — 5" x S" be the product of metric spaces S' and 5"'. If S is separable
(which requires that S' and S" be separable), then the a-fields Sf, Sf', and
Sf" of Borel sets in these spaces are related by У = У" х У" (see p. 225).
The two marginal distributions of a probability measure P on (S, У) are
defined by P'(A') =* P(A' x S"), A'e9", and P"(A") = P(S' к A"),
A" e Sf".
THEOREM 3.1 If S is separable, then a necessary and sufficient condition
for Pn^P is that Pn(A' x A")-+P(A' x A") for each P'-continuity set A'
and each P"-continuity set A", where P' and P" are the marginal distributions
ofP.
Proof. Let 3, d', and d" denote the boundary operators in S, S', and S",
Some Special Cases 21
respectively. Since
C.6) d(A' x А") <= {{д'А') x S") и (S' x (
the condition is necessary.
To prove sufficiency, we apply Corollary 1 of Theorem 2.2 to the class <%
of sets A' x A" with A' a P'-continuity set and A" a P"-continuity set. The
class °tt is closed under the formation of finite intersections and, by
hypothesis, Pn(A) -+P(A) for A in <%.
Given (xf, x") in S and e > 0, consider the sets
A& - {y:p\x\ y') < 6} x {*/":/>"(*", У*) < <5}.
For distinct d, the sets д'{у':р'(х', у') < 6} are disjoint and the sets
d"{y":p"(x",y") < 6} are disjoint; therefore ^ lies in °tt for some д with
0 < E < e. If S is metrized by
) = max {я'(*\ 2/'), />"(*", /)}.
then Лг is just the sphere with center (x', x") and radius 6. Hence <% satisfies
the hypotheses of Corollary 1 of Theorem 2.2, as required.
The sufficiency of the condition in Theorem 3.1 implies that the
measurable rectangles form a convergence-determining class (but says more:
P(d(A' x A")) =* 0 does not imply P'(d'A') - P"(d"A") = 0).
For given probability measures P' and P" on (S1, Sf") and (S"', &"'), the
product measure P' x P" is a probability measure on У х ?f" and hence, if
iS is separable, on У. The following theorem, in which P'n and P' are
probability measures on (S", S?') and P"n and P" are probability measures on
(iS", Sf"), is an immediate consequence of Theorem 3.1.
THEOREM 3.2 IfS is separable, then P'nxP"n=>P' x P" if and only if
Pfn=>P'andPun=>P".
PROBLEMS
1. Show directly that Fn => F if and only if Fn{x) -+ F{x) for all ж in a dense set and that
Fn => F if and only if lim supn Fn(x) <, F{x) and lim infn Fn{x — 0) > F{x — 0) for all
x (here F(x - 0) = ы?у<х F(y)).
2. If к > 1, the set of discontinuities of F, although having dense complement, need not
be countable. A (k — l)-dimensional hyperplane can contain at most countably many
discontinuities if it is normal to none of the axes. [To see the problem, consider first the
hyperplane {(хг, x2): xx = — x2} in i?2.]
3. If Fn => -Fand if Fis continuous at each point of a closed set A, then
supxe^ \Fn(x) - F(x)\ -> 0.
4. The Levy distance A(F, G) between two one-dimensional distribution functions is the
infimum of those positive e such that F(x — e) — e < G(x) < F(x + e) + e for all x.
22 Weak Convergence in Metric Spaces
Interpret A(F, G) geometrically in terms of the graphs of F and G. Show that Fn => F if and
only if A(Fn, F) -*¦ 0; prove that the collection of one-dimensional distribution functions is
a separable, complete metric space under X.
5. Problem 5 in Section 1 adapts the problem preceding it to the general Banach space
of countably infinite dimension. In С a simple direct analysis is possible: Work with the
functions xn defined by C.5).
6. The uniform distribution on the unit square and the uniform distribution on its
diagonal have identical marginal distributions. Relate to Theorem 3.2.
7. Extend the product-space theory at the end of the section by showing that, for a
countable product of separable spaces, the finite-dimensional sets (appropriately defined)
form a convergence-determining class (i?00 is a special case).
4. CONVERGENCE IN DISTRIBUTION
The theory of weak convergence can be paraphrased as the theory of
convergence in distribution. When stated in the terminology of the latter
theory, which involves no new ideas, many results assume a compact and
perspicuous form.
Random Elements
Let X be a mapping from a probability space (П, 3$, P) into a metric space
5. If JHs measurable (in the sense that X^S? <= ^; see p. 222), we call it a
random element. We shall say X is defined on its domain D (or (П, 3ft, P))
and in its range 5 and call it a random element of S. If 5 = R1, we call X a
random variable; if 5 — Rk, we call X a random vector; if S = C, we call
X a random function.
Random variables and random vectors are familiar objects, and the
Introduction contains (see formula A3) there) an example of a useful random
function (although its measurability was not proved). A variety of random
functions arise in a natural way in probability theory.
The distribution^ of X is the probability measure P = РХ~г on (S, SP):
D.1) P(A) - ?(Х~гА) = P{o): X(co) eA} = P{XeA}, AeST.
(We generally suppress the argument со in this way.) In case S = Rk, we
have also the associated distribution function of X, defined by
F(x) =P{y.y<x}= ?{X <x}, xe R\
Note that P is a probability measure on a space of an arbitrary nature,
whereas P is always defined on a metric space. For many questions, the
distribution P contains all relevant information about the random element
t This has nothing to do with the distributions of Schwartz.
Convergence in Distribution 23
X. If h is a measurable function on S (/r1^1 с У), then, by the change-of-
variable formula (see p. 223),
D.2) (h(X) d? = (h dP,
in the sense that the two integrals exist or fail to exist together and have the
same value if they do exist. In the usual expected-value notation, D.2)
becomes
D.3) ЦКХ)} =[hdP.
Each probability measure on each metric space is the distribution of some
random element on some probability space: Given P on (S, У), if we take
(П, 98, P) - OS, У, P), and if we take X to be the identity,
D.4) X(co) -co, соеп = S,
then X is a random element on П with values in S and has P as its distribu-
distribution. Although the class of distributions thus coincides with the class of
probability measures on metric spaces, we generally call a measure on a
metric space a distribution only when it is indeed the distribution of some
random element already under discussion.
Convergence in Distribution
We say a sequence {Xn} of random elements converges in distribution to the
random element X, and we write
D.5) Xn^X,
if the distributions Pn of the Xn converge weakly to the distribution P of X:
D.6) Pn^P.
Although this definition of course makes no sense unless the image space S
(the range) and the topology on it are the same for all the random elements
X, Хъ X2, . . . , the underlying probability spaces (the domains) may be all
distinct. We ordinarily make no mention of these underlying spaces because
their structures enter into the argument only by way of the distributions on S
they induce. Thus we write P{Xn e A} where we should write Pn{Xn e A},
and we write E{f(Xn)} where we should write $ f(Xn) d?n or perhaps
ЕпШХг)}- Since )sf(x)P(dx) = Jn f{X) d? by the change-of-variable
formula D.3) and similarly for $fdPn, we have Xn Д. X if and only if
E{/TO> - W(X)} for every /e C(S).
Theorem 2.1 asserts the equivalence of the following five statements. Call
a set A in У an Z-continuity set if P{X e dA) — 0.
24 Weak Convergence in Metric Spaces
(i)Xn^X
(ii) limn E{f(Xn)} = E{f(X)} for all bounded, uniformly continuous real f.
(iii) lim supn ?{Xn eF}< ?{XeF}for all closed F.
(iv) lim infn ?{Xn eG}> P{XeG}for all open G.
(v) limn ?{Xn eA} = ?{X e A} for all X-continuity sets A.
Each theorem about weak convergence can be similarly recast.
The following hybrid terminology is useful. If Xn are random elements of
S, if Pn are the corresponding distributions, and if P is a probability measure
on (iS, У), we say the Xn converge in distribution to P, and write
D.7) Xn^P,
in case Pn =>P. There is the obvious corresponding version of Theorem 2.1.
It is a great convenience to be able to pass from one to another of the
three equivalent concepts D.5), D.6), and D.7), and we shall do so freely.
This is largely a matter of expedient phraseology. For example, if random
variables Xn have asymptotically a normal distribution with mean ц and
variance a2, we shall express this fact by writing
D.8) Xn^
It makes no difference whether we interpret N(/u, a2) as the measure on
(R1, Mx) with density Bna2)-i е~ы-^12^ and understand D.8) in the sense
of D.7) or whether we interpret N(/n, a2) as a random variable with
и-">2/2<т2 du, Ae M1,
and understand D.8) in the sense of D.5). (There exist many such random
variables N(/n, a2) on many probability spaces; the construction D.4) gives
one of them.) We shall restrict NQz, a2) to this (dual) meaning, and we shall
write N for N@, 1):
D.10) P{iV e A} = —L f e~/2 du, AeM\
D.9) ?{N(p, a2) eA}= -=L- Г
yj2n a J
Convergence in Probability
Many of the standard concepts and results for convergence in distribution of
ordinary random variables generalize to random elements. If, for an element
a of S,
D.11)
for each positive e, we say Xn converges in probability to a and write
D.12) Xn^a.
Convergence in Distribution 25
If a is conceived as a constant-valued random element, then, as is easily
proved, Xn A- a if and only if Xn ^> a. Alternatively, Xn A- a if and only if
the distribution of Xn converges weakly to the probability measure corre-
corresponding to a mass of 1 at the point a. The random elements Xn in D.12)
may, as usual, be defined on distinct probability spaces—only the range S
need be common to them all.
If Xn and Yn have a common domain, it makes sense to speak of the
distance p(Xn, Yn)—the function with value p(Xn(co), Yn(co)) at со. If S is
separable, p(Xn, Yn) is a random variable (see p. 225). In the following
theorem, we assume that, for each n, Xn and Yn do have a common domain
and that S is separable.
THEOREM 4.1 //Xn ^> X and p(Xn, Yn)^0, then Yn ^> X.
Proof. If FE = {x:P(x, F)<e}, then
P{ YneF}< P{p(Xn, YJ >e} + ?{Xn e Fe}.
Since Fe is closed, the hypotheses imply
lim supn P{ YneF}< lim supn P{Xn e F.} < P{X e F?}.
If F is closed, then Fe \. F as e [ 0 and the result follows by Theorem 2.1 (the
version corresponding to convergence in distribution).
In the next theoremt we assume that, for each n, Yn, Xln, X2n, . . . have a
common domain and that S is separable.
THEOREM 4.2 Suppose that, for each u, Xun -^ Xu as n^ oo and that
Xu ^> X as м —»¦ oo. Suppose further that
D.13) lim lim sup ?{9{Xun, Yn) > e} = 0
«-* oo n-* oo
for each positive e. Then Yn ^> X as n -*¦ oo.
Proof. Defining FE as before by FE = {x:p{x,F)< e}, we have
P{Yn EF}< P{Xun e Fe} + P{P(Xun, Yn) > e}.
By the hypothesis Xun 2+ Xu (n -*- oo),
lim sup P{Yn eF}< P{Xu e Ft} + lim sup P{P(Xun, Yn) > e}.
n-*oo n-*co
By D.13) and the hypothesis Xu ¦?> X (u -> oo),
lim sup P{Yn eF} < P{X e FE}.
n-* oo
The result follows as before.
t The remainder of this section is not central to the theory; after a cursory reading, it can
be consulted as the need arises.
26 Weak Convergence in Metric Spaces
Suppose now that X, Xlf X2, . . . all have a common domain and that S is
separable. If
for every positive e, we say that Xn converges in probability to X and write
D.14) Xn^X.
Because of the assumptions here of separability and (more important) of a
common domain for the Xn, this concept does not generalize D.12).
THEOREM 4.3 // Xn ^> X, therf
D.15) P({XneA} + {XeA})^0
for every X-continuity set A.
Proof. For each positive e,
?{Xn e A, X$ A} < P{P(Xn, X) > e} + P{P(X, A) < e, X$A).
From this, the same inequality with Ac in place of A, and the assumption
Xn A- X, we obtain
lim sup ?{{Xn eA} + {Xe A})
< P{p(X, A)<8,X$A} + ?{P{X, Ae) < 8, XeA}.
As e —>- 0, the right-hand member of this inequality tends to ?{X e dA) = 0.
Since D.15) implies ?{Xn e A} ^ ?{XeA), it follows from Theorem 4.3
that Xn -^ X implies Xn ®+ X.
Product Spaces %
Let X' and X'n be random elements of S", and let X" and X"n be random
elements of S". In the rest of this section we assume that X' and X" have the
same domain, that X'n and X"n have the same domain for each n, and that S"
and S"' are separable, so that (X', X") and (X'n, JiQ are random elements of
S' x S"' (see p. 225). We seek conditions under which
D.16)
The random elements X' and X" are by definition independent if
?{X' e A', X" e A"} = ?{X' e A'} ?{X" e A"). If X' and X" are independent
and X'n and X"n are independent for each n, then, by Theorem 3.2, D.16) is
t Some notation: Ec is the complement of E, Ex — E2 = Ег (~л E2° is the difference of Ex
and E2, and E1-\- E2 = (Ег — E2) U {E2 — Ег) is the symmetric difference of Ex and E2.
t The next-to-last footnote still applies.
Convergence in Distribution 27
equivalent to
D.17) X'n^X', X"n®+X".
Although D.16) implies D.17) even without independence, the converse
fails—take (X', X") distributed uniformly on the unit square and (X'n, X'^)
distributed uniformly on its diagonal. We shall be concerned with cases in
which X' and X" are independent but X'n and X"n satisfy some condition
weaker than independence. If they satisfy D.16) with independent X' and
X", then it is natural to regard X'n and X"n as asymptotically independent.
By Theorem 3.1, D.16) holds if and only if
D.18) P{X'n e A', X"n e A"} -> P{X' e A', X" e A"}
for all X'-continuity sets A' and all ^''-continuity sets A".
THEOREM 4.4 If X'n ^> X' and X"n ^ a", then (X'n, JQ ^> (X', a").
Proof We must verify D.18) with X" identically equal to a". Suppose that
A' is an ^'-continuity set and that A" is an Х''-continuity set (that is, а" ф дА").
If a" e A", then P{X'^ ф A"} -> 0, and D.18) follows from X'n ®+ X' and
P{X'n e A'} - P{X: ф A"} < ?{X'n e A', X: e A"} < P{X'n e A'}.
If а" ф A", then D.18) follows from
?{X'n e A', X"n e A"} < P{X'; e A"} — 0.
In the next theorem we assume that X"n converges in probability to a
random element Y" that is not necessarily constant, which requires that Y"
and all the (X'n, JQ have the same domain (D, &, P). Let ^0 be a (finitely
additive) field contained in ?%, and denote by a(&0) the (T-field generated
by ^0.
THEOREM 4.5 Suppose that X' and X" are independent and that X" has
the same distribution as Y". If' X"n A- 7", if
D.19) ?{{X'n e А'} ПЕ)^ P{X' e A'} P(E)
for each X'-continuity set A' and each set E in the field ^0, and if each X'n
is measurable o(&0), then (X'n, X'^) ®+ (X1, X").
Note that D.19) implies X'n®+X' (take E =* D). Note also that the
domain of (Xr, X") need not be that common to Y" and the (X'n, X'^). We may
replace (X', X") in the conclusion by G', Y") if the domain of 7" supports
some random element Y' that is independent of Y" and has the proper
distribution; but it would be an unnecessary restriction to assume in general
the existence of such a Y'.
28 Weak Convergence in Metric Spaces
Proof. Fix an Z'-continuity set A' and an ^''-continuity set A". We are to
prove D.18), which, since X' and X" are independent and X" has the distribu-
distribution of Y", is the same thing as ..
D.20) ?{X'n e A', X"n ? A"} -> P{X' e A'} ?{Y" e A").
Since X"n A- Y", it follows by Theorem 4.3 that D.20) is in turn the same as
D.21) ?{X'n e A', Y" e A") -> P{X' e A'} ?{Y" e A").
Write En - {Х'пеА'} and a =* ?{X' eA'}, and let g denote the indicator
of the set {Y" e A"}. Then D.21) takes the form
D.22) Г gdP-+oi(gdP.
Since each X'n is measurable o(&0), each En lies in a(
We shall prove that D.22) holds if g is an arbitrary integrable function
(measurable 38). If g is the indicator of a set in 38 a, then D.22) holds because
it is the same thing as D.19). Clearly D.22) then also holds if g is a simple
function measurable 380. If g is integrable and measurable o{38^), then, for
each positive e, there is a simple function g?, measurable 38 й, with
E{l? ~ ge\) < e; but then
lim sup g dP — <x\g dP
n-+oo jEn J
so that D.22) follows for all such g.
Finally, suppose that g is measurable 38 and integrable but not necessarily
measurable aC$0). By the properties of conditional expected valuesf we
still have, since En e
f g d? = Г E{# 1 a(^0)} d? - а Ге{# I a(^0)} ^P = a f^ d?.
J En JEn J J
Thus D.22) does hold for all integrable g, as required.
Remarks. The ideas in the proof of Theorem 4.5 come from Renyi A958).
PROBLEMS
1. Prove for random variables that, if Xn ®+ X and Yn ?+ 0, then Xn + Yn 1^ X and
лп rn —> u-
2. Prove for random vectors Xn, Yn, X and random variables Zn that (a) if Xn Ё+ X,
\Xn - Yn\ ? Zn \Xn\, and Zn P> 0, then Уп М+ X and that (b) if Xn ®+ X, \Xn - Yn\
<, Zn | Yn\, and Zn L-y 0, then У„ ?> X [Reduce (b) to (a) via the fact that \x — y\ <,
e \y\ with e < i implies \x — y\ < 2e \x\.]
t See Doob A953) or Billingsley A965). The central results in this book do not require
conditional probabilities and expected values.
Weak Convergence and Mappings 29
3. Three fair coins are tossed independently. Let Etj be the event that coins j and у show
the-same face, let X' = X'n = X"n = Y" be the indicator of ?13, let X" be the indicator of
E12, and let 3§0 = {E12, E2Z}. The conclusion of Theorem 4.5 fails, although its hypotheses
are satisfied except for the requirement that &0 be a field.
4. Let Xlt X2,... be independent and have a common distribution P on S. Let Pn m be
the empirical measure for Xx(in), . .. , Xn(co); Pn m(A) is the fraction of k, 1 < к <| n, for
which ^(co) e A:
1 Ы
Show that, if S is separable, then Pn m => P with probability 1. [Use the strong law of large
numbers for Bernoulli trials and Corollary 1 to Theorem 2.2.1 (This result is due to Vara-
darajan A958b); see Ranga Rao A962) for extensions.)
5. Show that random variables Xn and X satisfy Xn Ё+ X if and only if
D.22) 4F(Xn)} - 4F(X)}
for every continuous distribution function F [turn A.1) around]. If У has the distribution
function Fand is independent of X and all the Xn (take them all to have the same domain),
then D.22) is the same as P{ Y < Xn} — P{ Y < X}.
6. For a probability measure P on (S, ?"), D.4) shows how to construct on a probability
space (Q, 3B, P) a random element X with distribution P. If S is separable and complete,
we can take P as Lebesgue measure on the Borel sets 88 of the unit interval Q. [Let
sfk = {Aku} be a decomposition of S into P-continuity sets of diameter less than I/A: and
let Sk = {Iku} be a decomposition of Q into subintervals with lengths P(/fcu) = ^(Л^);
arrange that •я?к+1 refines s4h and >/fc+1 refines Jk. For a> G Iku, give Xk(a>) some value in
Aku, show that {Хк(ш)} is fundamental for each a>, and use Theorem 4.1 to show that
X(a>) = UmkXk(u>) has distribution P.] (Skorohod A956) has the stronger result that, if
Pn => P, random elements Xn and X with these distributions can be constructed on the unit
interval in such a way that Xn(w) -*¦ X(w) for each со.)
7. Show that (X'n, X"n) 1*. (X', X") if X' and X" are independent, X"n Д. Y", where
Y" has the distribution of X", and D.19) holds for each E in the a-field generated by Y".
5. WEAK CONVERGENCE AND MAPPINGS
Continuous Mappings
If h is a measurable mapping of 5 into another metric space S' (with metric
p and cr-field У of Borel sets), then each probability measure P on(.S, ST)
induces on (S', S?') a unique probability measure Phr1, defined by Рк~х(А) =
P{hr1A) for AeSS". We need conditions under which Pn^>P implies
Pnhr1^>Phr1. One such condition is that h be continuous, since then
f(h(x)) is bounded and continuous on S whenever f(y) is bounded and
continuous on S', so that Pn =>P implies §f(h(x))Pn(dx) -> §f(h{x))P(dx), a
relation which, upon transformation of the integrals (see p. 223), becomes
f(y)Ph~\dy).
30 Weak Convergence in Metric Spaces
For example, the natural projection vk from i?00 to Rk is continuous, so
that Pn =>P implies P^-1 => Ртг for each k. Let us show that, conversely,
if Pn-n-1 => P-n--1 for each k, then Pn^~P. From the continuity of ттк it
follows easily that Этг^Я <= тг ЭЯ for Я с i?fc. Using special properties of
ттк we shall prove that there is inclusion in the other direction. If x e тг дН,
so that nkx е ЭЯ, then there are points <x(u> in Я and points /?(u) in Hc such
that <x(u) -> тткх and /S(t<) -> тг&а; (и -> oo). Since the points (a<">, . . . , а?и),
жлн-ъ • • • ) lie in -n~xH and converge to x, and since the points (/^u>, . . . , fi[u),
xk+1, . . . ) lie in (тг^ЯH and also converge to x, x e d^-^H). Thus дтт^Н =*
7Г-1 дН. If РпТг-^Ртт-1, then Рп(Л)^Р(Л) for sets Л = тг^Я with
^ andРСтг-1 дН) - 0. SinceРСтг дН) = 0 is equivalent to
0, РП(Л)^-Р(Л) holds for all finite-dimensional P-continuity sets, and
hence, since the finite-dimensional sets form a convergence-determining
class, Pn => P.
We call the Ртг^г1 the finite-dimensional distributions or measures corre-
corresponding to P. We have shown that probability measures on G?00, ^°°)
converge weakly if and only if all the corresponding finite-dimensional
distributions converge weakly.
The finite-dimensional distributions of a probability measure P on (C, ^)
we define as the various measures P^J^..tic, where the irt ...tjfe are the projec-
projections defined in Section 3. Since these projections are continuous, the weak
convergence of probability measures on (C, ^) implies the weak convergence
of the corresponding finite-dimensional distributions. But the converse fails
because, as was shown by counterexample, the class of finite-dimensional
sets is not convergence determining. Indeed, if P [PJ is a unit mass at 0
[the function C.5)], then Pn does not converge weakly to P, even though
Pn7r~^...tk =>Pnj*..tk holds for all sets (fls . . . , tk). On the other hand, since
the finite-dimensional sets form a determining class, a probability measure
on (C, ^) is uniquely determined by its finite-dimensional distributions.
Main Theorem
We have seen that Pn^>P implies PJr1 => Phr1 if A is a continuous mapping
of iS into S", but we can weaken the continuity assumption. Assume only
that h is measurable and let Dh be the set of discontinuities of h. Then
DheSf (even if h is not measurable; see p. 225).
THEOREM 5.1 IfPn^P and P(Dh) = 0, then PJr1 ^>Pfr1.
Proof. We shall show that, if F is a closed subset of S", then
lim sup PJT\F) < Phr\F).
Weak Convergence and Mappings 31
Since Pn => P, we have
Um supn Pn(h^F) < lim supn Pn(Qr*F)-) < P((h^F)-).
Hence it suffices to prove P((h~1F)~) — P{h~xF), which follows from the
assumption P(Dh) = 0 and the fact that (hr1?)- с ?>л и
There are two immediate corollaries. For a random element X of S, h(X)
is a random element of S' (we still assume h measurable).
COROLLARY 1 IfXn^X and ?{X e Dh) = 0, then h(Xn) -^ h(X).
COROLLARY 2 IfXn ?* a and ifh is continuous at a, then h(Xn) ?> h{a).
For example, for ordinary random variables, (Xn, Yn) Д>- (X, Y) implies
Xn + Yn Q> X + Y. Facts of this sort, used constantly in probability and
statistics, all depend ultimately on Theorem 5.1.
Particular interest attaches to Theorem 5.1 when S' is the line, in which
case h is an ordinary real, measurable function.
THEOREM 5.2 (i) IfPn => P, then Pnhrx ^> Phrx for every real, measurable
function hfor which P(Dh) = 0.
(ii) IfPJr1 =>Ph~x for all bounded, continuous real h, then Pn =>P.
(iii) IfPn => P and h is a real, bounded, measurable function with P(Dh) = 0,
then ]hdPn^lh dP.
Proof. Part (i) follows from Theorem 5.1. If PJr1 => Phr1, then, by change
of variable, ff(h(x))Pn(dx) -» J f(h{x))P{dx) for each / in С{ВУ). If h is
bounded by M, then, taking
f-M if t < -M,
f{t) = | t if -M <t <M,
[ M if M <, t,
we see that J h dPn —> J h dP. Thus part (iii) is a consequence of part (i) and
part (ii) follows by the definition of weak convergence. (We can strengthen
(ii) by requiring h in the hypothesis to be uniformly continuous.)
In applications we are generally interested in establishing weak conver-
convergence in various metric spaces in order to be able, by applying part (i) of this
theorem, to prove the weak convergence of measures induced on the line by
various real functions h.
Integration to the Limit
In part (iii) of Theorem 5.2, the restriction that h be bounded can be relaxed.
It is simplest to discuss this problem in terms of random variables Xn and X
having distributions PJr1 and Phrx.
32 Weak Convergence in Metric Spaces
THEOREM 5.3 IfXn 2+ X, then E{|X|} <, lim infn E{\Xn\}.
Proof. Take
(\x\ for |ж| < а
h(x) =*
( 0 for |ж| > a
in part (in) of Theorem 5.2; if P{|X| = a} = 0, then
f |*|rfP=lim f |XJ JP<limi
J{|X|<a} n-oo J{|Xn|<a} n-o
The result follows if we let a tend to infinity through values satisfying
= a} = 0.
The variables Xn are said to be uniformly integrable if
lim sup f \Xn\ d? = 0.
a-oo n J{\Xn\>a]
If the Xn are uniformly integrable, then
E.1) supnE{|Xn|}<oo.
On the other hand, if
supn E{\Xn\1+E] < oo
for some positive e, then the Xn are uniformly integrable because in this case
f \Xn\ d? < \
<xE
The Xn are also uniformly integrable if there is a random variable Y such
that E{| F|} < oo and
E.2) P{\Xn\ > a} < P{| Y\ ? a}, n > 1, a > 0,
because in this case (see C) on p. 223)
f \Xn\dP<,{ \Y\dP.
J{|Xn|>a} J{|F|>a}
THEOREM 5.4 Suppose Xn %- X. If the Xn are uniformly integrable, then
E.3)
if X and the Xn are nonnegative and integrable, then E.3) implies that the Xn
are uniformly integrable.
Proof. If the Xn are uniformly integrable, the integrability of X follows
from E.1) and Theorem 5.3. Take
(x if \x\ < a,
h(x) =
@ if \x\ > a,
Weak Convergence and Mappings 33
in part (iii) of Theorem 5.2. Since Xn 2* X,
E.4) ЕВД} -> E{ha(X)}
if P{\X\ - a} = 0. But
E.5) E{Xn} - E{/za(XJ} = f Xnd?
J{|Xn|>a}
and
E.6) E{X} - E{/za(X)} = f XdP.
J{|X|>a}
Since these three relations imply
lim sup \E{Xn} - B{X}\ < sup f \Xn\ d? + f \X\ d?,
n-oo n J{\Xn\>a] J{\X\>a}
E.3) does follow from uniform integrability.
On the other hand, if X and the Xn are integrable and E.3) holds, it follows
by E.4), E.5), and E.6) that
f XndP-*[ X
J{|Xn|>a} J{|X|>a}
E.7) f XndP-*[ XdP
J{|Xn|>a} J{|X|>a}
if P{|X| — a} =: 0. Since X is integrable, we can, given e, choose a so the
right side of E.7) is less than e. But then the left side is less than e for all n
exceeding some n0. Since the Xn are nonnegative and individually integrable,
uniform integrability follows.
Since Xn^X implies Xnr^Xr, Theorems 5.3 and 5.4 immediately
extend to moments higher than the first.
In none of this need X and the Xn (and the Y in E.2)) have a common
domain. If they do, and if XJco) —*¦ X(co) for almost all со, or if Xn Д. X,
then Xn -^ X. Thus Theorem 5.3 contains Fatou's lemma and Theorem 5.4
contains the standard theorems about integration to the limit (the case
involving E.2) contains Lebesgue's dominated convergence theorem).
An Extension of Theorem 5.11
Let hn and h be measurable mappings from S to 5". We may ask whether
Pn=>P implies Pnh~x =>Phrx when hn converges to h in some sense. Let
E be the set of x such that hnxn -> hx fails to hold for some sequence {xn}
approaching x. (If hn is identically equal to h, then E =* Dh.) It is not hard to
show that x e Ec if and only if for every e there exist а к and а д such that
/ > к and p(x, у) < д together imply p'(hx, h-y) < e. Assume that E lies
in У (which is shown on p. 226 to hold automatically if S' is separable).
f This result is not used in the sequel.
34 Weak Convergence in Metric Spaces
THEOREM 5.5 IfPn => P and P{E) - 0, then Pnh~x => Phr\
Proof. We shall show that P(hrlG) < lim infn Pn(h-XG) for open subsets
G of S'. From the above characterization of the points of Ec, it follows that,
if x e Ec and if hx lies in the open set G, then there exist к and д such that
h-y E G if i > к and p(x, y) < d, so that x is interior to Tk = [
Thus hrxG с ? и иЛ0- Since ^C^) — °» we nave P(h~lG) <
since Г° с J°+l, we have, for given e > 0, PQr^G) < Р(Г°) + ? for
sufficiently large fc. From Pn^>P it follows that P(J°) < lim infnPn(J°);
since T° a h~xG for large n, we have P(T°) < lim infn Pn(h~lG). Therefore
P(hrxG) < lim infn Pn(hr*-G) + ?, which, since ? was arbitrary, completes
the proof.
If hn = h, this result reduces to Theorem 5.1. If h is everywhere continuous
and hn converges to h uniformly on compact sets, then E is empty, so that
the hypothesis of the theorem is satisfied. More generally, P(E) =* 0 if
P(Dh) — 0 and if there is uniform convergence on compact sets. On the
other hand, if Dh — E — 0, then hn converges to h uniformly on compact
sets.
Remarks. In the Euclidean case, Theorem 5.1 is sometimes called the Mann-Wald
theorem (see Mann and Wald A943) and Chernoff A956)); Corollary 2 for rational
functions h is Slutsky's theorem (see Slutsky A925)). Theorem 5.5, which generalizes a
result of Prohorov A956), is due to H. Rubin; see Anderson A963). For extensions of
Theorems 5.1 and 5.5 and converses to them, see Tops0e A967a and 1967b).
PROBLEMS
1. If a Borel function/has a derivative at a (this being the only assumption on/) and
Xn P+ a, then f(Xn) = f(a) + f'(a)(Xn - a) + (Xn - a)Zn with Zn P> 0. Extend to
higher derivatives.
2. Suppose/has a derivative at 0. If XnYn ®+ Y and Yn ?>. 0, then
Xn(f(Yn)-f@))l+f\0)Y.
3. Suppose fn and / have continuous derivatives with f'n(%) -*¦/' (x) uniformly in x. If
Xn Yn ®+ Y and Yn P> 0, then Xn(fn( Yn) - /n@)) ^/'@) Y.
4. Suppose Xn ,^> X. Give examples in which the Xn are integrable but X is not, and
conversely. Show that E {Xn} -> E^} does not in general imply uniform integrability of
the Xn.
5. Show that Xn are uniformly integrable if and only if supn E{|ZJ} < oo and for each
г there exists а д such that, for all n, P(?) < S implies ]E \Xn\ d? < e.
6. Let P be Lebesgue measure on the unit interval and Pn correspond to a mass of \\n
at some point chosen from ((/ — 1)/и, i\ri), i = 1, . . . , n. Show that Pn => P and deduce
from Theorem 5.2 that a bounded function continuous almost everywhere is Riemann
integrable.
Prohorov's Theorem 35
6. PROHOROV'S THEOREM
Relative Compactness
Let П be a family of probability measures on (S, ?P). We call П relatively
compact if every sequence of elements of П contains a weakly convergent
subsequence; that is, if for every sequence {Pn} in П there exist a subsequence
{Pn,} and a probability measure Q (defined on (S, 5f), but not necessarily
an element of П) such that Pn. => g.| Even though Pn. => Q makes no sense
if Q(S) < 1, it is to be emphasized that we do require that Q(S) = 1—we
disallow any escape of mass, as discussed below.
We need to be able to decide whether or not a given family П is relatively
compact. For example, suppose we know of probability measures Pn and P
on (C, #) that the finite-dimensional distributions of Pn converge weakly to
those of P. We have seen in Section 5 that Pn need not converge weakly to
P. Suppose, however, that we also know that {Pn} is relatively compact. Then
each subsequence {Pn<} contains a further subsequence {Pn.} converging
weakly to some limit Q. The finite-dimensional distributions of Q must be
the weak limits of those of {Pn*} and hence must coincide with the finite-
dimensional distributions of P (for each (tlf . . . , tk), Pn'7rT^-tk => Q^-tk
and Pn7rj*..tk=> PTr~*..tk, so QTTj~*..tk — iV"?-..^). But then, since a probability
measure on С is completely determined by its finite-dimensional distributions
(the finite-dimensional sets form a determining class), Q and P must them-
themselves coincide. Thus each subsequence {Pn,} contains a further subsequence
converging weakly to P, and it follows by Theorem 2.3 that the entire
sequence {Pn} converges weakly to P. Note that, if {Pn} does converge
weakly to P, then it is relatively compact, so that relative compactness is not
an excessive requirement.
Suppose we know that {Pn} is relatively compact and that, for each
(tx, . . . , tk), Pnrrj^..tic converges weakly to some probability measure
pt ...ik on (Rk, Mk)—the point being that we do not assume at the outset
that the nt...tic are the finite-dimensional distributions of a probability
measure on (C, <i). It still follows that each subsequence {Pn,} contains a
further subsequence {Pn*} converging weakly to some limit. Since this limit
must have the (*tl...tk as its finite-dimensional distributions, it is unique.
Therefore {Pn} converges weakly to some P.
These ideas provide a powerful technique for proving weak convergence
in С and in other function spaces. One proves first that the finite-dimensional
f We take this as a definition; it is really a sequential compactness notion in Z(S) as denned
in the footnote on p. 11. See Appendix III for further topological information.
36 Weak Convergence in Metric Spaces
distributions converge weakly and then that the sequence in question is
relatively compact. To use this method we need an effective criterion for
relative compactness.
Consider R1 first. Let П be a family of probability measures on (R1, &1).
Given a sequence {Pn} from П, we can apply to the sequence {Fn} of corre-
corresponding distribution functions the classical Helly selection theorem (see
p. 227). There exist a subsequence {Fn>} and some function F such that
F.1) FA*)-+F(x)
holds for all its continuity points x. The function F may be taken to be right-
continuous, in which case there exists on (R1, ?%*) a finite measure /л, such
that
F.2) /л(а, b] = F(b) - F{d).
Now it may happen that ^(R1) < 1. For instance, if Pn corresponds to a
unit mass at the point n, then .Fis identically 0, no matter what subsequence
{Fn,} is selected, so that ^(R1) — 0. If Pn is the uniform distribution over
[—n, n], then F(x) = \ is the only possibility—again /j,(R}) = 0. In these
examples, mass is "escaping to infinity" in an obvious and intuitive sense.
Suppose, however, that the measure /л determined by F via F.2) satisfies
/^(R1) = 1. Then [л is a probability measure and, since F.1) holds at
continuity points of F, we have (see Section 3) Pn, => [x. Thus П will be
relatively compact if we somehow ensure that each of these measures уь we
encounter satisfies ^(R1) — 1. Now ^(R1) = 1 if for every positive e there
exist a and b such that ц{а, b] > 1 — e. Suppose that for every positive e
there exist a and b such that
F.3) Pn(a,b]>l-e, л =1,2, ....
Since F.3) persists if a is decreased and b increased, we may take a and b to
be continuity points of F, in which case F.3) and F.1) together imply
M(a, b]>l-e.
It follows that a family П of probability measures on (i?1, 3^) is relatively
compact if for each positive e there exist a and b such that P(a, b] > 1 — e
for all P in П, a condition which has the effect of preventing the escape of
mass alluded to above. On the other hand, if this condition fails, then there
exists some positive e such that, no matter what a and b are, P(a, b] < 1 — e
for some P in П. Choose in П a Pn with Pn(—n, n] < 1 — e. If some
subsequence {Pn'} were to converge weakly to a probability measure Q, we
would have, for each x,
Q(-x, x) < lim infn, PA-*, x) ^ Ига infn, Pn<-n', n'] < 1 - e,
which is impossible. Thus the condition is both necessary and sufficient.
Prohorov's Theorem 37
Since an interval (a, b] has compact closure, and since each compact set
can be enclosed in such an interval, the condition can be recast: a family П
of probability measures on (R1, &1) is relatively compact if and only if for
each positive e there exists a compact set К such that P(K) > 1 — e for all
P in П. Now the condition has meaning in an arbitrary metric space.
A family П of probability measures on the general metric space S is said
to be tight if for every positive e there exists a compact set К such that
P(K) > 1 — ? for all P in П. If П consists of a single measure alone, this
definition reduces to the one introduced in Section 1. We have just seen in
the case R1 that tightness is necessary and sufficient for relative compactness.
The following theorems, due to Prohorov, extend the sufficiency to
arbitrary metric spaces and the necessity to separable, complete spaces.
THEOREM 6.1 If U is tight, then it is relatively compact.
THEOREM 6.2 Suppose S is separable and complete. If П is relatively
compact, then it is tight.
We shall refer to these two theorems conjointly as Prohorov's theorem,
calling Theorem 6.1 the direct half and Theorem 6.2 the converse half.
The Direct Theorem
We turn first to the proof of the direct half, which is the more useful in the
applications. We shall prove the result successively for Rk, for i?00, for S
cr-compact (a countable union of compact sets), and, finally, for the general
S. Each of the last three cases is handled by reducing it to the one preceding.
The case Rk. Here the proof is practically the same as the one already given
for R1. If {Pn} is a sequence in П, Helly's theorem (p. 227) implies that the
sequence {Fn} of corresponding distribution functions contains a subsequence
{F„,} such that
F.4) Fn.(*) - F(x)
at continuity points of F, where F is a function continuous from above.
There exists on (Rk, &k) a measure ju such that ii{a, b] is Fdifferenced around
the vertices of the ^-dimensional rectangle (a, b]. Now Pn. => [i will follow
if we prove /j,(Rk) — 1.
Given e, choose a compact set К in Rk such that Pn>(K) > 1 — e for all ri,
which is possible because П is tight. Now choose a and b such that К <=¦ (a, b]
and such that all 2k vertices of (a, b] are continuity points of F (this is possible
because only countably many parallel (k — l)-dimensional hyperplanes can
have positive ^-measure). Since Pn>{a, b] is Fn. differenced around the
vertices of (a, b], F.4) implies Pn-{a, b] —*¦ р(а, b]. From Pn>(a, b] >_
38 Weak Convergence in Metric Spaces
Pn>(K) > 1 — ?, it follows that ц(а, b] > 1 — e. Since e was arbitrary,
ju(Rk) — 1. Thus П is relatively compact.
For the case i?00 we shall need the following lemma.
LEMMA 1 If II is a tight family on (S, 3f) and if h is a continuous mapping
from S to S', then {Ph'1:? eU} is a tight family on (S', Sf").
Proof Given e, choose in S a compact set К such that P(K) > 1 — e for
all P in П. If К' я hK, then K' is compact (see p. 218) and hrxK' => K, so
that Phr^K') > 1 - ? for all P in П.
The case i?°°. If П is a tight family on (i?°°, ^°°), then, by Lemma 1,
{Ртт-г:Р е П} is, for each k, a tight family on (Rk, 0tk). By the direct half of
Prohorov's theorem for Rk, just proved via Helly's theorem, we can select
from a given sequence {Pn} in П a subsequence {Pn>} such that Р^тт^1
converges weakly to a probability measure fa on (Rk, 3%k). By the diagonal
method (as on p. 219), we can choose the sequence {Pn-} so that Р^тг^1 => fa
holds for all к simultaneously.
Since the measures fa obviously satisfy the consistency conditions of
Kolmogorov's existence theorem (see p. 228), there is a probability measure
Q on (i?00, ^°°) such that Qtt'1 — fa for all k. (It was shown in Section 3
that the cr-field ^°° of Borel sets in i?00 coincides with the <r-neld generated
by the finite-dimensional sets, which is the cr-field that intervenes in
Kolmogorov's existence theorem.) But then iV^1 => Qtt^1 for each k, so
that the finite-dimensional distributions of Pn. converge to those of Q, which,
as noted at the beginning of Section 5, implies Pn> => Q. Tightness implies
relative compactness in i?°°.
For the cr-compact and the general cases, we shall need two further
lemmas. Suppose that 5*0 is a Borel subset of 5*:
F.5) Soe#>.
Now 5*0 is a metric space in its own right in the relative topology; let S?o
denote the class of Borel sets for 5*0. From F.5) it follows (see p. 224) that
F.6) У0={А:Ас: S0,Ae^};
in particular,
F.7) ^0 с у.
If P is a probability measure on E*, Sf) with P(S0) = 1, let Pr (r for restric-
restriction) be the probability measure on (So, ?f0) got by restriction from У to
SfQ (see F.7)). If P is a probability measure on E*0, Уо), let Pe (e for exten-
extension) be the probability measure on (S, Sf) with Pe(A) — P(A r\ So) for
Ae^ (see F.6)). Note that Pe(S0) =* 1.
Prohorov's Theorem 39
If P is a probability measure on (S, ?P) with P(S0) = 1, then
F.8) (РУ = P;
if P is a probability measure on E*0, S^o), then
F.9) (Pe)r - P.
If, in Lemma 1 and in Theorem 5.1, we take h to be the identity mapping
from 5*0 into 5*, we obtain the following lemma.
LEMMA 2 If U is a tight family on (So, S?o), then Ue - {Pe:P e П} is a
tight family on (S, Sf). IfPn^>Pin (So, S?o), then Pne=>Pe in (S, SP).
LEMMA 3 IfPn^>Pin (S, Sf) and Pn(S0) = P(S0) = 1, then Pnr =>Pr
in (So, ^o).
Proof The general open set in So is Go — G П So with G open in S. Since
Pnr(G0) = Pn(G) and РЧ^о) = P(G), lim infn Pn(G) > P(G) implies
lim infn Pn(G0) > P'(G0).
The o-compact case. If 5* is cr-compact, then it is separable and hence can be
embedded homeomorphically into i?00 (see p. 219). Since 5* is cr-compact, so
is its image under the homeomorphism; in particular, this image is a Borel
subset of i?°°. Thus 5* is homeomorphic to a Borel subset of i?00. By Theorem
5.1, weak convergence persists under homeomorphism, and hence so does
relative compactness of П. Since compactness of sets persists under homeo-
homeomorphism, so does tightness of П. Therefore we can replace 5* by its
homeomorphic image.
We may thus assume that 5* is a Borel subset of i?°°. If П is tight in E*, ?f),
then, by Lemma 2 applied to i?°° and its subset S, ГР is tight in (i?°°, ^°°).
Since, as already proved, Theorem 6.1 holds in this larger space, Ue is
relatively compact, so that, for each sequence {Pn} in П, the corresponding
sequence {Pne} contains a subsequence {Pen-} converging weakly in the sense
of (i?00, ^°°) to some Q. By the tightness of П itself, there is for each e a
compact subset К of S such that Penl(K) = Pn-(K) > 1 - e for all n', so that
Q{S) > Q(K) > lim supn,P^CK) > 1 - ?. Thus S supports Q as well as
all the Penl, and hence, by Lemma 3 and F.9), Pn, converges weakly in the
sense of E*, SP) to Qr. Tightness implies relative compactness if S is
cr-compact.t
The general case. Whatever 5* is, if 5*0 — {J^, where Kt is a compact set in
S such that Р(Кг) > 1 — 1// for all P in П, then 5*0 supports each element of
f A separable 5 is homeomorphic to a subset of i?00. The stronger assumption that S is
ст-compact ensures that its image lies in ^°°, which is convenient because Lemmas 2 and 3
become awkward without the assumption So G У.
П and Пг = {РГ:Р е П} is a tight family in (So, Уо). By the case just treated,
Пг is relatively compact. For each sequence {Pn} in П, therefore, the
corresponding sequence {Pnr} contains a subsequence {Prn'} converging
weakly in the sense of (So, S^o) to some Q. By Lemma 2 and F.8), Pn>
converges weakly in the sense of E*, Sf) to Qe. Thus П is relatively compact,
which proves Theorem 6.1 in full generality.
The Converse
We turn now to Theorem 6.2, the converse half of Prohorov's theorem.
Note that it reduces to Theorem 1.4 in case П consists of a single measure;
the proof is a refinement of the earlier one.
Suppose that for each positive e and д there exists a finite collection
Alt . . . , An of E-spheres such that Р(и<<»Л) > 1 - ? for all P in П. The
following argument shows that П is then tight. Given e, choose, for each
integer k, finitely many 1/fc-spheres Akl, . . . , АкПк such that P([Ji<nkAkl) >
1 — ej2k for all P in U. If К is the closure of the totally bounded set
r\k>i\Ji<nkAki> tnen P(K) > 1 — ? and, since 5* is assumed complete, jSTis
compact (see p. 217).
We prove Theorem 6.2 by showing that, if the condition stated in the
preceding paragraph fails, then П is not relatively compact. Suppose then
there exist positive e and д such that every finite collection Ax, . . . , An of
5-spheres satisfies P(\Ji<nAi) <[ 1 — e for some P in П. Since S is assumed
separable, it is the union of a sequence Alt A2, ... of open spheres of radius
д. Let Bn = \Ji<nAi and choose Pn in П so thatPn(J?n) < 1 — e. Suppose
some subsequence {Pn<} converged weakly to some limit Q. Since Bm is open,
we would have P(Bm) <, lim infn, Pn,(Bm) for each fixed m. But then, since
Bm с Bn, for large n, P(BJ < lim infn. Pn>(Bn.) < 1 - e would follow;
since Bm increases to S, this is impossible, which completes the proof.
For strengthened forms of Theorem 6.2, see Appendix III.
If Xn are random elements of S, we say {Xn} is tight when {Pn} is tight,
where Pn is the distribution of Xn. If S is i?00 or C, we identify the finite-
dimensional distributions of Xn with those of Pn. The argument at the
beginning of this section may be interpreted as asserting of random elements
Xn and X of С that, if the finite-dimensional distributions of Xn converge
weakly to those of X and {Xn} is tight, then Xn ^ X.
Remarks. In his original proof of Theorem 6.1, Prohorov A956) assumed S to be
separable and complete; the present extension and its proof are due to Varadarajan A958a
and 1961a). See also LeCam A957).
PROBLEMS
1. If П is tight, then its elements have a common сг-compact support. The converse fails
(unless, for example, П consists of a single measure).
2. Let П consist of the unit masses for points in A. Show directly from the definition that
n.is relatively compact if and only if A~ is compact in S.
3. A sequence of probability measures on the line is tight if and only if the corresponding
distribution functions satisfy lima._00 Fn{x) = 1 and lim^^oo Fn{x) = 0 uniformly in n.
A class of normal distributions is tight if and only if the means and variances are bounded.
4. Show that, if Fis a right-continuous, nondecreasing real function with 0 < F(x) < 1,
then there exist distribution functions Fn such that Fn(x) —*¦ F(x) for each continuity
point x.
5. If {\Xn\d} is uniformly integrable (see p. 32) for some д > 0, then {Xn} is tight.
6. Probability measures on a product S' x 5" are tight if and only if the two sets of
marginal distributions are tight in S' and 5".
7. Suppose S is separable and locally compact. Since S can be given a metric under which
it is complete [Problem 3 of Section 1], Theorem 6.2 applies. From the fact that each
compact set in S can be enclosed in the interior of a larger compact set, it follows further that
Pn => P if and only if J fdPn —*¦ J fdP for every (bounded) continuous / with compact
support (a result which cannot hold in C, for example, since there a continuous function
with compact support vanishes identically; see Problem 5 in Section 3).
8. In connection with Lemma 3, note that, if П is a tight family on E, if) and P(S0) = 1
for all Ре П, then it does not follow that LTr = {Pr:Pe П} is a tight family on (SQ, Уо).
[Take S = [0, 1] and So = @, 1) and consider point masses.] This can happen even if П
consists of a single measure. [See Remark 2 following Theorem 1 in Appendix III.]
7. FIRST APPLICATIONS
In this section we examine from the point of view of the preceding theory
some standard probability results, results used frequently in the chapters
that follow.
Smooth Functions
In proving Theorems 1.2 and 1.3 and in proving the implication (ii)—>¦ (iii)
in Theorem 2.1, we used arguments involving the function A.1). These
arguments still go through if, in place of A.1), we use any uniformly
continuous 93 such that y(t) =1 for t < 0, 0 < y(t) < 1 for 0 < t < 1,
and (p(t) =; 0 for t > 1. It is possible to construct such functions cp with
smoothness properties stronger than uniform continuity. Outside [0, 1],
define 9? as before; for 0 < t < 1, define
G.1)
where
Then, for each integer v, cp has, over the whole line, a bounded, continuous
derivative of order v.
Let Pn and P be probability measures on (R1,
42 Weak Convergence in Metric Spaces
THEOREM 7.1 IfJ /'dP'n—> J /' dP for every bounded, continuous function
f having bounded, continuous derivatives of each order, then Pn =>P.
Proof. Consider the distribution functions Fn and F corresponding to Pn
and P. If <pu(f) = (p(ut), with <p defined by G.1), then, for each u,
lim sup FJx) < lim <pu(y - x)Pn(dy) =; <pu(y - x)P{dy) < Fix + -I.
П—+ОЭ П—+СО J J \ U '
Using <pu(y — x + 1/u) in place of cpu{y — x), we see that lim infn FJx) >
.F(x — 1/m) for each u. Hence Fn(x) —*¦ F(x) if F is continuous at x, and
Pn=>P follows.
The Central Limit Theorem
Theorem 7.1 can be used to derive the classical central limit theorems for
triangular arrays. For each n, let
G-2) U, • • ¦ , fnt.
be independent random variables with mean 0 and finite variance a2nk.
(The probability space on which the variables G.2) are defined may vary
with n.) Let Sn = |nl + • • • + €nkn and suppose its variance sn2 =
ffni + ' ' ' + °2пкп ^s positive. Recall that TV is a random variable normally
distributed with mean 0 and variance 1. We first prove Lindeberg's theorem:
THEOREM 7.2 If
G.3) -2 |f & d? - 0
Sn fc=l J{|?nfc|>?Sn}
(n —»- oo) for each positive e, then Sjsn ^ TV.
Proof It suffices to show that
G.4) E{
for every bounded, continuous /having bounded, continuous derivatives of
each order. Fix such a function /and define
G.5) g{h) = supx |/(* + h) - /(*) -/'(*)/* - \f"{x)h*\
(g is Borel measurable). By the mean value theorems of the second and third
orders, there is a constant K, depending on/alone, such that
G.6) g(h)< К mm {h\\h\*}.
We have g(h) <, Kh2 and g(h) < K\h\s; the first inequality is good for h
large, the second for h near 0. Since / is fixed throughout the rest of the
First Applications 43
argument, so is K. From the definition G.5) it follows that
G.7) \[f(x + hj -f(x + h2)] - [f\x){hx - h2) + i/'(*)(V - hm
+ g(h2).
If the |nfc were all normally distributed, E{/(Sn/sn)} would coincide with
E{/GV)}. If we successively replace the ?nk by normal variables r\nk with
mean 0 and variances a2nlc, we get a sequence
n
J)},
The first member of the sequence is E{f(SJsn)} and the last one is E{f(N)}.
The idea now is to show that each member is so close to the next (for n
large) that even the first and last members are close.
Since G.4) involves the joint distribution of the variables G.2) but not any
properties of the probability space they are defined on, we may, by passing
to a new space (say (R2kn, ?%2kn)), introduce random variables r]nl, . . . , rjnkn
such that rjnk is normally distributed with mean 0 and variance a2nk and such
that the 2kn random variables
G.8)
are independent. If
bnl> • • • > bnfcn>
<
2
k<i<kn
< k <> K,
then, since ?,nkn + ?nfcn = Sn and since ?nl + r)nl has the distribution of snN,
E/ ^ -
,2
'Ink + Г]пЛ
*n I \ Sn
Because of the independence of the sequence G.8), the three variables t,nk,
gnk, and rjnk are independent for each value of к and therefore
пк)
— 0
From G.7) it follows that
44 Weak Convergence in Metric Spaces
The proof will therefore be complete if we show that
G.9) IeUMUo
*-i 1 \snn
and
G1O)
Given ? > 0, split the expected value in G.9) into an integral over
{|?n*:l < esn) an(i an integral over the complementary set. Use G.6) to
bound the integrand by K\^nk/sn\3 on the first set and by K\?nkjsn\2 on the
second:
E\g\ — \) <
I \snJ)
Thus
kn I It \ ч 1 kn
G 11) У E{ o- i zjifc\ I ^ j^-? _i_ _g — V
and G.9) follows from G.3).
Since G.11) also holds with rjnk in place of |nfc, to prove G.10) we need
only show that
G.12) T^Ii/, >?S}^^P
tends to 0 for each positive e. But G.12) is at most
Since
.^Ц < e2 +
Sn
-Г
G.3) implies mo.xk^knankjsn —»- 0, which in turn implies 2*»х o3njsn3 —*- 0. Thus
G.12) does tend to 0, which completes the proof.
If the ink have moments of order 2 + d, then the sum in G.3) is at most
?~5^~2~5 S^x E{||nfc|2+5}, which proves Lyapounov's theorem:
THEOREM 7.3 If, for some positive d,
2+S 2,
Finally, from Theorem 7.2 we can deduce the Lindeberg-Levy theorem:
First Applications 45
THEOREM 7.4 If ?l5 ?2, . . . are independent and identically distributed
with mean 0 and finite variance a2 > 0,
k=1
To prove this last result, take kn = n and ?m- = <^; the sum in G.3) is at
most Ii2/ff2 integrated over flU > ea\jn}.
Characteristic Functions
If P is a probability measure on (Rk, &k), its characteristic function p is
defined as
p(t) — I eitxP(dx), t eRk,
where t • x = 2^=1 tuxu denotes inner product. Let Q be a second probability
measure on (Rk, 3#k), and let ^ be its characteristic function. We shall
prove the uniqueness theorem:
THEOREM 7.5 If p{t) = q(t)for all t, then P = Q.
For each t, eitx is, as a function of x, an element of C(Rk) (or its real and
imaginary parts are). Thus in the case Rk the uniqueness theorem refines
Theorem 1.3, which asserts that P is determined by the values of I f dP for
fe C(S). We shall prove uniqueness not by deriving an explicit inversion
formula but by using the Weierstrass approximation theorem.
We know from Section 3 that the rectangles (a, b] form a determining
class; since such a rectangle is an increasing union of closed rectangles, it
suffices to check that P and Q agree for sets
A = [аъ Ъг] X • • • X [ak, bk].
By the argument for Theorem 1.3, this will follow if, for each integral u,
fdP = (fdQ
G.13) [fdP = [
holds when / has the form /(xl5 . . . , xk) =/x(xx) • • -fk(xk) with f}(s) =
<pu(p(s, [aj} bj])), where p denotes linear distance and <pu is defined by
A.4).
Fix u. Given в, choose r so large that each/,- vanishes outside the interval
[—r, r] and, at the same time, so large that, if /r denotes the
cube {x:\xj < r, m = 1, . . . , k}, then P(Irc) < e and ?>(//) < e.
Since fj(—r) =fj{r), fj{s) can, by Weierstrass's theorem,! be uniformly
t For example, see Titchmarsh A939) or, for Stone's generalization, Simmons A963).
46 Weak Convergence in Metric Spaces
approximated in [ —r, r] by a finite trigonometric sum И1а.1еШз/г of period
2r. Multiplying together these sums for the different values of j, we see
that/can be uniformly approximated in Ir by a finite trigonometric sum
G.14) g(aO = Sav*"'"-
of period Ir in each variable. Choose G.14) so that |/(x) — g(x)\ < e for
x in Ir. Since/is everywhere bounded by 1, g is bounded by 1 + ? in Ir and
hence, by periodicity, in all of Rk. Thus [/— g\ is bounded by ? in /r and by
2 + ? everywhere. Assuming 0 < в < 1, we have
I/- g\ dP < ? + B + e)P(Ir°) < 4?.
J
Similarly, /|/— g| dQ < 4e, so that
fdP -
jgdP- jgdQ
8?.
But $ g dP — § g dQ is an immediate consequence of /? = q and G.14);
since ? was arbitrary, G.13) follows.
Suppose now that Pn and P are probability measures on (Rk, 0tk~) with
characteristic functions pn and p. We shall prove the continuity theorem:
THEOREM 7.6 A necessary and sufficient condition for Pn^>P is that
/>n@ -*p(t)for each t.
The necessity follows from the fact that eitx is bounded and continuous
in x for each t. We shall prove the sufficiency via two intermediate
propositions.
If the pn{t) converge pointwise to some limit and if {Pn} is tight, then
Pn => P for some P. Let g(t) — limn pn(t); we make no assumptions whatever
about g. If {Pn} is tight, then, by Prohorov's theorem, it is relatively compact,
so that each subsequence {Pn'} contains some further subsequence {Pn-}
converging weakly to some limit whose characteristic function must then be
linvPn"(t) ~ g@- By the uniqueness theorem, there is thus only one possible
such limit, which is the P we seek (see Theorem 2.3).
Notice that this proposition fails without the assumption of tightness
[example: к — 1 and Pn uniform over (—n, и)]. Notice also that the proof
just given exactly parallels the argument at the beginning of Section 6
involving tightness and finite-dimensional distributions for probability
measures on (С, ЯГ).
Now let us show that, if the limit function g{f) =* \\mnpn(t) is continuous at
t = 0, then {Pn} is tight (and hence weakly convergent to some limit).
Clearly, {Pn} is tight if each of the к corresponding sequences of one-
dimensional marginal distributions is tight. The characteristic functions
pn(s, 0, . . . , 0) of the marginal distributions for the first coordinate
converge pointwise to a limit g(s, 0, . . . , 0) continuous at s — 0, and
First Applications 47
similarly for the other coordinates. Thus each sequence of marginal distribu-
distributions satisfies the hypotheses and it suffices to treat the case к = 1. In this
case, by Fubini's theorem,
J L J-u
G.15)
U J-u
sin ux\
> 2 f (l - -M Рп(*п) > Pn{x: \x\ > -)
(in particular, the first integral is real). Since g is continuous at the origin,
there is, for each positive e, а и such that urx J"u |1 — g{t)\ dt < e. By the
bounded convergence theorem, for n beyond some n0 we have
(•
Ju
\l -pn(t)\ dt < 2e
and hence, by G.15), Pn{x:\x\ > a} < 2e with a = 2/w. By increasing a if
necessary, we can ensure that this inequality holds for the finitely many n
preceding n0: {Pn} is indeed tight.
If g is the characteristic function of a probability measure, it is auto-
automatically continuous; the continuity theorem follows.
It is instructive to compare the following pairs of statements, of which all
are true except 6°. Here Pn and P are probability measures on (Rk, ?%k) or
on (C, *?), as appropriate; in the former case,/?n and/? are their characteristic
functions andg is a function on Rk; in the latter case, Рптт^...и and Ртт^..и
are their finite-dimensional distributions and f*J*..t. are probability measures
on (R\ 0b\
Space Rk Space С
1. The function p(t) determines P. 1°. The measures Pn^1 t, determine P.
2. If pn(t) converges to some g{t) for 2°. If iVr converges weakly to some
each t and if {Pn} is tight, then Pn fxf u for each (tx- • • t{) and if {Pn}
converges weakly to some P. is tight, then Pn converges weakly to
some P.
3. Statement 2 fails without tightness. 3°. Statement 2° fails without tightness.
4. If pn(t)converges to some^-(r) for each
t and if g is continuous at 0, then {Pn}
is tight (and hence, by 2, weakly
convergent).
5. If pn(t) ^p{t) for each / and if {Pn} 5°. If Рпщ*..и => Рщ*..и for each (rx • • • tt)
is tight, then Pn => P. and if {Pn} is tight, then Pn => P.
6. Upn(t)-»p(t) for each /, then Pn^ P. 6°. [If Pnn^..u=> Ртг'Ц for each (/x •¦•/0,
then jPn => P.]
48 Weak Convergence in Metric Spaces
Statement 1 is the uniqueness theorem for characteristic functions; 1°
restates the fact that the finite-dimensional sets in С form a determining class.
The proofs of 1 and 1° have nothing particular in common.
Statements 2 and 2° follow, respectively, from 1 and 1° by exactly parallel
arguments.
Counterexamples substantiate 3 and 3°.
Statement 4 has no analogue 4°.
Now 5 and 5° follow, respectively, from 2 and 2°, again by exactly parallel
arguments; but the assumption of tightness in 5 is superfluous because of 4
and the fact that/? must be continuous. Suppressing the tightness assumption
leads from 5 to 6 and from 5° to 6°; although 6 is true (because of 4 and the
continuity of/?), 6° is false (because there is no 4°).
Not only is 6° false as it stands (as an implication applying to all P and all
{Pn}), but there is no single P for which it is true (as an implication applying
to all {Pn})\ Define hn:C^ С by hn(x) - x + xn with xn defined by C.5)
and, given P, put Pn — Ph~x. For some sufficiently small d, the open set
A = {x:x(t) < x@) + \, t <d} satisfies P(A) > 0. If An = A - xn, then
lim supn An is empty and hence
lim infn Pn(A) < lim supn Pn(A) = lim supn P(An) = 0.
By Theorem 2.1, Pn cannot converge weakly to P—but the finite-dimensional
distributions do converge.
The argument just given also shows that there can be no 4°. In this connec-
connection, notice that 3° follows from the fact that 6° is false; since 6 is true, 3
requires a special counterexample (the limit function g must be discontinuous
atO).
If there were a 4°, most of Chapter 2 would be superfluous (and if there
were a 4° for the space D, most of Chapter 3 would be superfluous). About
6°: Since all the finite-dimensional distributions of P and all the Pn determine
P and all the Pn themselves, conditions for Pn=>P can in principle be given
in terms of finite-dimensional distributions alone.f But we must in effect
deal with all sets (tx, . . . , t{) simultaneously asn^- oo—it will not do to fix
an arbitrary (tlt . . . , tt) and consider what happens as n —*- oo. And this is
where tightness comes in.
The Cramer-Wold Device
By means of the following simple device due to Cramer and Wold, problems
involving random vectors in Rk can often be reduced to problems involving
only ordinary random variables in R1. Suppose that /c-dimensional random
f See Bartozynski A961).
First Applications 49
vectors Xn - (Xnl, ..., Xnk) andX- (Xx, . . . , Xk) satisfy
J=l 5=1
for each point t — (tu . . . , tk) of Rk. Then the characteristic functions
pn(s) =¦ E{exp (is 2*=1 tjXnj)} of these one-dimensional random variables
converge to p(s) = E{exp (is S*=1 tjXj)} for each real s. Taking s = 1, we
see that
Since t was arbitrary, Хп ^- X follows by the continuity theorem for
characteristic functions.
THEOREM 7.7 In Rk, Xn converges in distribution to X if and only if each
linear combination of the components of Xn converges in distribution to the
corresponding linear combination of the components of X.
In the terminology of Section 2, the half-spaces in Rk form a convergence-
determining class, a fact which apparently cannot be proved without ideas
from Fourier analysis.
Local and Integral Limit Theorems
If Pn and P are probability measures on Rk with densities pn and p with
respect to Lebesgue measure and if
G.16) PnW^Pi?)
for all x outside some set of Lebesgue measure 0, then, by Scheffe's theorem
(p. 224),
G.17) Pn^P.
The relation G.16), called a local limit theorem, implies the relation G.17),
called an integral limit theorem.
There is an analogous result in case P has a density but the mass for Pn
is confined to a lattice in Rk. Let d(n) =; («^(и), . . . , дк(п)) be a point of Rk
with positive coordinates, let <x(n) — (ai(«), • . • , <*k(ri)) be an arbitrary
point in Rk, and denote by Ln the lattice consisting of all those points of Rk
having the form
where щ, . . . , uk range independently over 0, ±1, ±2, . ... If ж is a point
in Ln, then the interval (x — d(ri), x] (see C.1) for the notation) is a cell of
volume vn = d^n) • • ¦ 6k(n), and Rk is the disjoint union of these countably
many cells.
50 Weak Convergence in Metric Spaces
Suppose now that Pn is a probability measure in Rk with Pn(Ln) — 1, and,
for x g Ln, \etpn(x) be the mass (possibly 0) at that point. Let P have density
p with respect to Lebesgue measure.
THEOREM 7.8 Suppose that
G.18) max {^(n), . . . , <5t(n)} - 0.
Suppose further that, if {xn} is any sequence and x any point in Rk, and if
xn lies in Ln and varies with n in such a way that
G.19) xn^x,
then
G.20) ^^ — p(x).
Then Pn => P.
Proof. Define a probability density qn on Rk by setting qn{y) — pn(x)lvn if
у lies in the cell (x — d(n), x] determined by the point x of the lattice Ln.
Since G.19) implies G.20), we have
G.21) qn(y)
for each y. Let Xn have density qn, and define Yn on the same probability
space by setting Yn = x if Xn e (x — d(n), x] with x e Ln. What we are to
prove is Yn®+P. Since \Xn - Гп| < \д(п)\, this will follow by G.18) and
Theorem 4.1 if we prove Xn ^ P. But, in view of G.21), this is a consequence
of Scheffe's theorem.
Weak Convergence on the Circle and Torus
Let S be the unit circle in the complex plane and let eu(x) = xu be the circular
functions, и = 0, ±1, ±2, .... The Fourier series of a measure P on S is
defined by p(u) — J eu(x) P(dx) for integral u. Since P is determined by the
values of I fdP for/in C(S) and since each/in C(S) can, by Weierstrass's
theorem, be uniformly approximated by linear combinations of the circular
functions, p determines P. By the same reasoning as for the line, this unique-
uniqueness theorem implies a continuity theorem: Pn => P if and only if the corre-
corresponding Fourier series satisfy pn(u) —*¦ p(u) for each integer u. Since the
circle is compact, there is this time no question that {Pn} is tight; hence
{Pn} converges weakly to some limit if \imn pn(u) exists for each u. If P is
normalized circular Lebesgue measure, then p(u) vanishes for и Ф 0 (and,
of course, is 1 for и = 0). Weyl's criterion follows immediately: (xl5 x2, . . .)
is uniformly distributed on the circle if and only if n 2"=1(а;3-)и —»- 0 for each
nonzero u. In particular the powers x} are uniformly distributed if x is not
a root of unity.
First Applications 51
The circle can be rolled out onto the interval [0, 1) by means of the
correspondence e2lTix <-> x. The Fourier series of a probability measure P on
[0, 1) is denned by p(u) = J e2iriux P{dx) for integral u; of course, the unique-
uniqueness and continuity theorems for the circle can be restated for the interval
[0, 1). Although weak convergence here must refer to the topology that
[0,1) inherits from the circle and not to the relative topology of the line (for
example, 1 — 1/rc -+ 0), we can clearly ignore this distinction if the limit
measure assigns measure 0 to the point 0. Thus, if Pn, P are probability
measures on R1 with support [0, 1) and if P{0} = 0, then Pn^>P if and
only if
L2:Tiuxpn(dx) -* L
for each integer u. A sequence (х1г x2, . . . ) of real numbers is said to be
uniformly distributed modulo 1 if the empirical distributions of the sequence
({хх}, {x2}, . . . ) of fractional parts converge weakly to the uniform distribu-
distribution on the unit interval. Weyl's criterion becomes: (x1, x2, . . . ) is uniformly
distributed modulo 1 if and only if n~x SJLi e2iriuXi —> 0 for each nonzero
integer u. This criterion is satisfied if xt = j!-, where ? is irrational.
By the ?>dimensional Weierstrass theorem, similar results hold for the
A:-dimensional torus—the product of к circles. (They even carry over to the
general compact group, with the characters in the role of the circular func-
functions.) Taking the case к = 2 and laying the torus out into a square, we see
that, if Pn and P are probability measures in R2 supported by the square
{(x, y):0 < x, у < 1}, and if the lower and left-hand edges of the square
have P-measure 0, then Pn => P if and only if
/•
2rri(ux+vy) jp __. „2iri{ux+vv)
for all pairs of integers и and v.
We say a sequence ((ж1} уг), (х2,у2),...) in the plane is uniformly
distributed modulo 1 if the empirical distributions, obtained by reducing all
coordinates modulo 1, converge weakly to the uniform distribution on the
square; this holds if and only if rr1 2™=1 e27ri{uxi+vv')-^ 0 whenever и and v
do not both vanish. This condition is satisfied if (x^, y}) = (j?,jrj), where ?,
rj, and 1 are linearly independent (there do not exist integers и and v, not
both 0, such that u? + vr\ is integral). To take another case, suppose ? and r\
satisfy exactly one constraint: ug + vr\ is integral if and only if и = 2w and
v = 3w for some integer w. Then
1 if — = - is integral,
2 3
0 otherwise.
52 Weak Convergence in Metric Spaces
By checking the Fourier coefficients, one sees that the
empirical distributions converge weakly to a distribu-
distribution uniform on the four slanting lines in the
diagram.
The Cramer-Wold result has an obvious analogue
here. For example, the planar sequence ((ж1} у-,),
(х2,У-д-, • • •) is uniformly distributed modulo 1 if
and only if for all integral и and v, not both 0,
the linear sequence (ихг + vyx, ux% + vy2, . . .) is uniformly distributed
modulo 1.
Remarks. Theorem 7.2 is here proved by Levy's version of Lindeberg's method; see
Levy A925, pp. 246-249). See Feller A966) for other approaches to the central limit
theorem and for characteristic functions. There exists for linear spaces a theory of
characteristic functions which ties in closely with weak convergence; see Prohorov A960)
and Gross A963) and the references there.
See Cramer and Wold A936) for their device. See Hardy and Wright A960) for uniform
distribution modulo 1, in particular for the last example here.
PROBLEMS
.1. Replace the integrand in G.1) by a B&)th-degree polynomial so chosen that cp has
bounded, continuous derivatives up to order к (к = 3 suffices for the proof of Lindeberg's
theorem).
2. Use the Cramer-Wold device to derive a ^-dimensional version of the Lindeberg-
Levy theorem.
3. A measurable function h{x) on the line has a mean value if the limit
i Г
M{h) = lim — h(x) dx
y_oo IT J_T
exists; it has a distribution if the probability measures (here X is Lebesgue measure)
PT(A) = — X{x: \x\ < T, h(x) EA}, A
converge weakly as T-* со. If h is almost periodic, then it is bounded and, for each t, eltx is
almost periodic (see Bohr A932)); from the first proposition used in deriving the continuity
theorem for characteristic functions, show that h has a distribution.
4. Consider on the line probability measures having moments of all orders. The measure
P is said to be determined by its moments if ]xu P(dx) = $xu Q(dx), и = 1, 2,. . . implies
P = Q. (See Feller A966, pp. 224 and 487) for an example of a P not determined by its
moments and for a criterion for P to be determined by its moments.) If supn \$xu Pn(dx)\ <
oo for each u, and if Pn => P, then the moments of Pn converge to those of P [Theorem 5.4].
If the moments of Pn converge to those of P and if P is determined by its moments, then
Pn => P. [First show {Pn} is tight (Problem 5 in Section 6) and then imitate the proof given
above for the continuity theorem for characteristic functions.] Use the Cramer-Wold
technique to extend this result to Rk [consider product moments of all orders].
First Applications 53
5. If A: varies with n in such a way that (k — np)\^jnpq -> x, then
pkqn-knpq-+-i=e-b*2
(see Feller A957, p. 169)). Use Theorem 7.8 to deduce the De Moivre-Laplace theorem.
Apply the same technique to the hypergeometric distribution (see Problem 2 on p. 17 and
Problem 10 on p. 180 of Feller A957)) and to the frequency count for a set of multinomial
trials (leave out the count for one cell, so that the limiting distribution will have a density).
6. Use Problem 8 in Section 2 to prove that, if probability measures in Rh converge
weakly, then the corresponding characteristic functions converge uniformly on bounded
sets.
7. Generalize Theorem 7.8: Replace Lebesgue measure on Rk by a general measure on a
separable S and consider decompositions {Cni} of S with Итта тах^ diam Cni — 0.
CHAPTER 2
The Space С
8. WEAK CONVERGENCE AND TIGHTNESS IN С
The Introduction explains some of the merits of proving weak convergence
in the space С = C[0, 1] of continuous functions on [0, 1], where we give
С the uniform topology by defining the distance between two points x and у
(two functions x and у of t e [0, 1]) as
P(x, y) = sup, КО - y(t)\.
Weak Convergence
Although weak convergence in С does not in general follow from weak
convergence of the finite-dimensional distributions, we saw at the beginning
of Section 6 that it does in the presence of relative compactness. Since С is
separable and complete (see p. 220) it follows by Prohorov's theorem
(Theorems 6.1 and 6.2) that relative compactness of a family of probability
measures on (C, ^) is equivalent to tightness of the family. Thus we have
the following result.
THEOREM 8.1 Let Pn, P be probability measures on (C, %). If the finite-
dimensional distributions of Pn converge weakly to those of P, and if {Pn} is
tight, then Pn => P.
If we are to use this theorem to prove weak convergence in C, we must
see just what tightness here means.
Tightness
The modulus of continuity of an element a; of С is defined by
(8.1) wx(d) = w(x, d) = sup \x(s) - x(t)\, 0 < д < 1.
|s-<|<<5
54
Weak Convergence and Tightness in С 55
Let {Pn} be a sequence of probability measures on (С, <&).
THEOREM 8.2 The sequence {Pn} is tight if and only if these two conditions
hold:
(i) For each positive rj, there exists an a such that
(8.2) Рп{х:\х(Щ>а}<г1, n>\.
(ii) For each positive s and rj, there exist a d, with 0 < д < 1, and an
integer щ such that
(8.3) Pn{x-wM>e}<n, n>n0.
Condition (i) stipulates that {Рптт^г} be tight. In connection with (8.3),
note that wx(d) is continuous in x and hence measurable.
Proof. Suppose {Pn} is tight. Given e and r\, choose a compact set К such
that Pn{K) > 1 — rj for all n. By the Arzela-Ascoli theorem (p. 221), we
have К <^ {ж:|ж@)| < a) for large enough a and К <^ {x: wx(d) < e} for
small enough d, so that (i) and (ii) follow (with «0 - 1 in (ii)). This proves
the necessity of (i) and (ii).
Since a single probability measure P on (C, ^) is tight (Theorem 1.4), it
follows by the necessity of (ii) that for each e and rj there is а д such that
P{x:wx(d) > e} < rj. If {Pn} satisfies condition (ii), therefore, we may ensure
that the inequality in (8.3) holds for the finitely many n preceding n0 by
decreasing д if necessary. Thus we may assume in proving sufficiency that
the n0 in (8.3) is always 1.
Assume that {Pn} satisfies (i) and (ii), with щ always 1 in (8.3). Given rj,
choose a so that, if
A = {x:\x@)\<a},
then Pn(A) > 1 — \rj for all n, and choose dk so that, if
then Pn(Ak) > 1 - 77/2*+! for all n. If К is the closure of А П f\°lA,
then Pn(K) > 1 — 7] for all n and, by the Arzela-Ascoli theorem, К is
compact. Hence {Pn} is tight.
As the proof shows, we may demand that (8.3) hold for all Pn. With this
change, the theorem is true if {Pn} is replaced by an arbitrary family П. In
applications, we shall often prove (8.3) with d, s, and rj replaced by (say)
\b, 3s, and 9r/, which is just as good.
Theorem 8.2 transforms the concept of tightness in С simply by substituting
for compactness its Arzela-Ascoli characterization. Our next theorem goes
only a small step beyond this, but, even so, fills our present needs.
56 The Space С
THEOREM 8.3 The sequence {Pn} is tight if these two conditions are
satisfied:
(i) For each positive rj, there exists an a such that
(8.4) Рп{хЛхШ> a] <rj, n>\.
(ii) For each positive e and rj, there exist a d, with 0 < d < 1, and an
integer щ such that
(8.5) \PnU- sup \x(s)-x(t)\>e)<r), n > n0,
О [ t<s<t+d )
for all t.
Of course, we restrict the t in (8.5) to 0 < t < 1; if t > 1 — d, we restrict
s in the supremum to t < s < 1. Note that (8.5) is formally satisfied if
д > Ifrj; but we require д < 1.
Proof Fix д and let
At = \x: sup \x(s) — x(t)\ > el.
{ t<s<t+d )
Now 5 and t each lie in intervals of the form [id, (i + 1K]. If \s — t\ < d,
then these intervals either coincide or abut; it follows that
(8-6) Pn{x:wx(d)>3e}<Pnl U Aid).
\i<d I
Since
(8-7) Pj U Ai5) <
\i<6 г J i<5
(8.5) implies Pn{x:wx(d) > 3e} < A + [l/d])drj < 2r\ (recall that 5 < l).f
Therefore condition (ii) here implies the corresponding condition in Theorem
8.2. Since condition (i) is unchanged, the result follows.
The following corollary contains the essential point of the proof.
COROLLARY IfQ=^tQ<t1<---<tr=l,and if
(8.8) и-и_г>д, 2<i<r-l,
then
(8.9) P{x:wxF)>3e}<^p[x: sup \x(s) - a;(^| > e).
i=i i ц-.х<8<и )
Note that the inequality in (8.8) need not hold for i = 1 and / = r.
t We use [a] to denote the integer part of a—the largest integer not larger than a.
Weak Convergence and Tightness in С 57
Although the inequality (8.6) entails no real loss, (8.7) does, and condition
(ii) is not necessary.f Suppose, however, that the sets Ai& {b fixed, i varying)
are independent under Pn and all have the same measure p, which will be at
least approximately true for many of the measures Pn encountered in this
chapter. Then
If this quantity is bounded above by r\, then (if r\ < ^)
(8.10) \ < (l + \-~\)p < -log(l - 17) < 2i?,
which is a restriction of the same order as (8.5). Therefore any effective
tightness criterion that goes beyond Theorem 8.3 must make some sort of
positive use of dependence among the Aid.
Random Functions
Let X be a mapping from (П, ^, P) into С For the moment we do not
assume X to be measurable {X~xc€ <= ^). For each со in П, X(co) is an
element of С—а. continuous function on [0, 1] whose value at t we denote by
X(t, со). For fixed t, let X{t) denote the real function on Q. with value
X(t, со) at со; X{t) is the composition -ntX. Similarly, let (X(t^), . . . , X(tk))
denote the mapping from D. into Rk with value (X(tx, со), . . . , X(tk, со))
Sit CO.
If A = {xe C:x(t) < a}, then Ae<€ and {co:X(t, со) < «.} - Х~гА. It
follows that if X is a random element of С—that is, if X~xc€ <= ?fi—then each
X{t) is a random variable {X{t)-x3P- <= ?g) and hence each (X{t^),. . . , X(tk))
is a random vector. Suppose, on the other hand, that each X(t) is a random
variable. If В is the closed sphere in С with center x and radius d, then
Х~гВ = {co:X(co) EB} = C\r {co:x(r) - д ^ X(r, со) < x(r) + 6}, where the
intersection extends over the rationale, so that Х~хВе&. Since С is
separable, X~xc€ с 38 follows: X is a random element of C. Thus X is a
random function (a random element of C) if and only if each X(t) is a
random variable. (We have merely proved once more that the finite-dimen-
finite-dimensional sets generate ^.)
Suppose now that {Xn} is a sequence of random functions. The sequence is
by definition tight when the sequence of corresponding distributions is tight.
According to Theorem 8.2, {Xn} is tight if and only if the sequence B^@)}
is tight (on the line) and for each positive e and r\ there exist a d, 0 < б < 1,
t See Problem 1.
58 The Space С
and an integer щ such that
(8.11) P{HXn, $)>?}<У, n> n0.
(We use w(x, d) as an alternate notation for wx(d).) This condition stipulates
that the random functions Xn do not oscillate too violently.
Theorem 8.3 can be recast in the same way: {Xn} is tight if {Xn@)} is
tight and if for each positive e and r\ there exist a d, 0 < д < 1, and an
integer n0 such that
(8-12) - P( sup \Xn(s) - Xn(t)\ > e) < V
О [t<s<t+d J
for n>n0 and 0 < t < 1 (with 5 in the supremum restricted to t < s < 1
in case 1 — д < t < 1).
As explained in the Introduction, our first interest will be in random
functions constructed in the following way. Let ?l5 |2> • • • ¦> be random
variables on some probability space (Cl, 28, P). For the present, the tjn need
not have any special properties such as stationarity and independence. We
define Sn — ?i + • • • + ?n, with So =* 0, and construct Xn from the partial
sums So, Slf . . . , Sn. For points ijn in [0, 1], we set
(8.13) Хп(-,сЛ =J-
(We confine our attention to norming factors of the form a^jn—others are
possible.) For the remaining points t of [0, 1], we define Xn(t, со) by linear
interpolation: If t e [(/ — l)/n, i/n], then
xti
l/и \ n / 1/и \n
\ n J a^Jn
Since 1 — 1 = [nt] if ? e [(/ — l)/n, i[n), we may define the function
more concisely by
(8.15) Xn(t, со) = -±- Slntl(co) + (nt - [nt]) -?- ?[n<]+1(co).
a^Jn a^Jn
Since the ?i} and hence the iSi5 are random variables, it follows by (8.15)
that Xn(t) is a random variable for each t. Therefore the Xn are random
functions.
When do these random functions form a tight sequence? Since 2^@) = 0,
certainly {Xn@)} is tight. By using the definition (8.15) we can translate
(8.12) into a restriction on the fluctuations of the partial sums S^ If t = kfn
Weak Convergence and Tightness in С 59
and t + d — j\n, with к and у integers, then (8.12) reduces to
(8.16) - Pfmax -*-= \Sh+i - Sk\ >e)<rj.
d {i<nd Oy/n )
Although (8.12) and (8.16) in general differ if t and d are not integral
multiples of l/n, the discrepancy is, as we shall show, irrelevant to our
purposes. Consider the integers у and к defined by the inequalities
n n n In
Because of the polygonal character of Xn, we have
sup \Xn(s) - Xn(t)\ < 2 max -^- \Sk+i - Sk\.
t<s<t+ii J
If n > 4/E, then j — к < пд, so that the maximum on the right does not
decrease if the restriction i <j — к is relaxed to i < nd. Therefore, if (8.16)
holds for all к and all n > n0, then (8.12) holds for all t and all n >_
max {щ, 4/E}, provided the e, rj, and д in (8.12) are replaced by 2e, 2r\, and \b,
respectively, which amounts only to a renaming of these quantities.
Thus {Xn} is tight if, for each positive e and r\, there exist a d, with
0 < д < 1, and an integer n0, such that (8.16) holds for all fc and for all
n > n0. The maximum in (8.16), which extends over / < nd, becomes easier
to work with if nd is replaced by an integer. If nd is an integer m—it need not
be—then the inequality (8.16) becomes
P max \Sh+i - Sk\ > -^ ojm) < rjd.
\i<m ^Jd )
Put X = sf-Jd (if d is small, X is large); the inequality then further reduces to
Pfmax |Sfc+i - Sfc| >
\ i < m
Since rjs2 is positive if r\ and e are, we are led to formulate the following
theorem.
THEOREM 8.4 Suppose {Xn} is defined by (8.15). The sequence {Xn} is
tight if for each positive s there exist a X, with X > 1, and an integer n0 such
that, ifn> n0, then
(8.17) pjmax \Sk+i - Sk\ > Xojn) < ~
[i<n j A
holds for all к.
60 The Space С
We require X > 1 ((8.17) is trivial if X < V e), which corresponds to the
requirement д < 1 in Theorem 8.3. In concrete cases, proving (8.17) forces
large X's on us.
Proof. Given e and rj, we shall produce а д @ < д < 1) and an n0 for which
(8.16) holds for all к if n > n0. Since (8.16) becomes more stringent as e
and ?7 decrease, we may assume 0 < s, rj < 1.
By the hypothesis, with ^e2 in place of e, there exist A (A > 1) and nx
such that
Pjmax \Sk+i - Sk\ > Xojn] < ^
[ i<n ) X
(8.18)
for n > nx and к > 1. Put 5 = e2/A2; since X > 1 > e, we have 0 < 5 < 1.
Let щ be an integer exceeding n^d.
If я > n0, then [л<5] > и1} and it follows by (8.18) that
p( max \Sk+i -Sk\ >
Since X\J [nd] < e\Jn and ^e2/P = ?)d, (8.16) holds for all к if и > и0. This
proves Theorem 8.4.
It is not hard to see that it is enough that (8.17) hold for к < nX2/s, but
we shall not need this. If {?n} is stationary, then (8.17) reduces to
(8.19) Pfmax \St\ > Xojn) < ^ .
\i<n ) X
We can absorb a into X and require
(8.20) P(max |S,| > Xjn) < ±
\i<n j X
with X > a.
We shall see later that in some circumstances the hypothesis of Theorem 8.4
is necessary as well as sufficient.
Coordinate Variables
The projection ттг, with value 7Tt(x) = x(t) at x e C, is a random variable on
(C, ^). We shall often denote this random variable by xt: For fixed t, xt is
the function on С with value x{t) at x. If there is a probability measure P on
(C,^), {?*:()< *< 1} is a stochastic process. We think of us a time
parameter, and the xt are commonly called the coordinate variables or
functions. The distribution of xt depends on the measure P, and we shall
speak of the distribution of xt under P; we shall often write P{xt e H) in
The Existence of Wiener Measure 61
place of P{x:xt e H) and J xt dP in place of J zt P(dx). Finally, when / is a
complicated expression, we shall often revert back to x(t), still intended as a
coordinate function.
Remarks. The general theory of weak convergence in С (Theorem 8.2, for example) is
due to Prohorov A956). Theorem 8.4, a convenient reformulation of Theorem 8.3,
resulted from discussions with F. Topsoe.
Stone A963) treats weak convergence in a space of continuous functions on [0, oo).
Lamperti A962b) discusses spaces of functions satisfying a Holder condition.
PROBLEMS
1. Let X{t) = f?, where ? is a random variable with P{|?| > a} ~ a~* as a -»- oo, and,
for every n, let Pn be the distribution of X. Then {Р„} is tight but does not satisfy condition
(ii) of Theorem 8.3.
9. THE EXISTENCE OF WIENER MEASURE
Wiener Measure
Wiener measure, denoted here by W, is a probability measure on (C, ^)
having the following two properties. First, for each t, the random variable
xt is normally distributed under Ж with mean 0 and variance t:
W{xt < a} = —L= Г e-/2t du.
y/27Tt J-™
(9.1)
If t = 0, this is interpreted to mean W{x0 = 0} = 1. Second, the stochastic
process {xt:0 <; t < 1} has independent increments under W: If
then the random variables
(9.3) xti — xto, xt^ — xt , . . . , xt — xtk
are independent under W. In this section, we prove there does exist such a
measure W.
If W has these two properties and if s <, t, then xt (normal with mean 0
and variance t) is the sum of the independent variables xs (normal with mean
0 and variance s) and xt — xs, so that xt — xs must be normal with mean 0
and variance t — s, as may be seen by dividing the characteristic functions.
Thus, when (9.3) holds,
(9.4) W{xti - xti_x < a,, i = 1, . . . , k}
1 /*°ч , ..
'-1' du.
— h
62 The Space С
In particular, the increments are stationary (the distribution of xt — xs
under W depends only on the difference t — s) as well as independent.
If we regard x(t) as (one coordinate of) the position at time t of a moving
particle, then x itself gives the history of the particle's motion (relative to
this specific coordinate) from time t = 0 to time t = 1. Wiener measure
gives to these paths x a distribution appropriate for the description of
Brownian motion—the motion of a pollen grain suspended in water.
In proving the existence of W, we face the problem of proving the existence
on (C, ^) of a probability measure with specified finite-dimensional distribu-
distributions. There can for an arbitrary specification be at most one such measure,
and for some specifications there is none at all (there is, for example, no P
under which the distribution of xt is a unit mass at 0 for t < \ and at 1 for
'>*)•
THEOREM 9.1 There exists on (C,^) a probability measure W such that
(9.1) holds and such that the random variables (9.3) are independent under W
whenever (9.2) holds.
Proof. Let |l5 |2, ... be independent and normally distributed (on some
(Q, &, P)) with mean 0 and variance 1. Let Xn be the random function defined
by (8.15) with a = 1:
(9.5) Xn(t, o>) = ^- Slntl(oj) + (nt - [nt]) -j= |
Let Pn of the distribution of Xn on C. This measure is well defined because>
as we showed in the preceding section, Xn is measurable {Х~х(ё с
We shall first show that the finite-dimensional distributions of the Pn
converge weakly to what we want the finite-dimensional distributions of W
to be. After that, we shall prove that the sequence {Pn} is tight. It is clear
that the limit of any weakly convergent subsequence {Pn'} will satisfy the
requirements placed on W. The idea is that the Pn approximate the putative
measure W.
The finite-dimensional distribution -Pn^1..^ is just the distribution of the
random vector (Xn(tx), . . . , Xn(tk)y Consider a single time point t. By (9.5)
and the assumed normality of the |n, Xn(t) is normal with mean 0 and
variance
[nt] (nt - [nt])* .
Г 5
n n
this variance differs from t. by at most 2/n. Thus Xn(t) ®> N@, t) (see D.9)
for this notation).
Clearly, we can treat two or more time points in the same way; the finite-
dimensional distributions therefore converge weakly to those prescribed
for W.
The Existence of Wiener Measure 63
To prove that {Pn} is tight, we apply Theorem 8.4. To prove (8.20), write
(9.6) Et = (max |S,| < IX^n < \St\\.
{ o<i J
By the stationarity and independence of the |n, we have
(9.7) Pjmax \St\ > 2Ljn) < P{\Sn\ > Xjn}
[< J
г=1
г=1
п—\
< P{|SJ > Цп} + 2 Р(?,)Р{|5П_,| ^ Xjn - i]
г=1
Since SjVj is normally distributed with mean 0 and variance 1 and hence has
a finite third absolute moment a independent of у (а — 2v2/tt), it follows
by (9.7) and Chebyshev's inequality that
P(max \St\ > 2XJn\ < + f W)
\ i<n ) A i=\ A
Since the Et are disjoint, we can conclude
(9.8) Pf max \St\ ^ 2Xjn\ < \ , n = 1, 2, ....
{ г<п ) А
Given ?, choose X so that 2a/A < e and Я > 1. Then (9.8) implies
Pfmax \St\
which is (8.20), except for the irrelevant factor 2 on the left. By Theorem 8.4,
therefore, the {Pn} are tight, which completes the proof.
Thus Wiener measure exists. In the sections following, we shall derive
some facts about W. For the present, let us only show that the elements of
С are of unbounded variation, except for a set of Wiener measure 0. This is
interesting because it exhibits a sense in which Wiener paths (elements of С
chosen according to the probability measure W) are highly irregular—they
are continuous, but only just.
An element x of С is of bounded variation if and only if there exists a
number Mx such that
г=1
64 The Space С
for all t0, tlt . . . , tk with 0 < t0 < tx < • • ¦ < tk < 1. Therefore it will
suffice to show that, if
'i - Г
/.(*) = I
— x\
1=1
then/n(z)-> oo except for x in a set of Ж-measure O. Now |JV| (see D.10))
has positive mean b and variance с (b = V2/tt and с = 1 — 2/тг). Since the
|z(/2-n) - x((i — lJ~n)| are, under W, independent with mean 2~n/2Z> and
variance 2~nc, the mean and variance of fjx) are 2nl2b and с It follows by
Chebyshev's inequality that W{x :fn(x) > a} -»¦ 1 for each a. Since
fn+i(x) >fn(x)> it follows further that fn(x) -> oo except on a set of Wiener
measure 0.
We shall also use the symbol W to denote a random element with values
in С and with Wiener measure as its distribution; that is, Ж is a measurable
mapping from some (Q, 38, P) to С with the property that
P{co: W(co) gA}= W(A), AeV,
where the W on the right is Wiener measure as already defined. That there
exists such a random element follows by the argument centering on D.4).
This dual use of W should cause no confusion.
We shall denote by Wt(co) or W(t, со) the value at t of the random function
W{oo). Thus {Wt:0 <, t < 1} is a stochastic process whose paths are
continuous for all со. This process we shall call the Wiener process or the
Brownian motion process. The relation Xn ^> W can be interpreted in
accordance with D.5) or in accordance with D.7), depending on whether W
is construed as a random function or as a measure—the meaning is the same
for the two interpretations.
The Brownian Bridge
A random element X of С is Gaussian if all its finite-dimensional distributions
are normal. The distribution in С of a Gaussian random element is com-
completely specified by the means E{X(t)} and product moments ?{X(s)X(t)}
0 <i s, t <; 1, because these determine the finite-dimensional distributions.
For W, the moments are
(9.9) ?{Wt} = 0
and
(9.10) E{WsWt} = s if s <, t.
In studying the behavior of empirical distribution functions, we shall need
a Gaussian random element W° of С whose distribution is specified by the
The Existence of Wiener Measure 65
requirements
(9.П) E{Wt°} = 0
and
(9.12) E{Ws°Wt°} = s(l - 0 if s < t.
Although we could prove the existence of such a W° by the methods of
Theorem 9.1, it is simpler to construct W° from W by setting
(9.13) W° =Wt-tWx, 0<t< 1.
Certainly W°, thus defined, is a Gaussian random element of C, and (9.11)
and (9.12) follow from (9.9) and (9.10).
The random element W° is called the Brownian bridge (or tied-down
Brownian motion). Note that W° — W° =^ 0 with probability 1. From (9.11)
and (9.12) it follows that
(9.14) E{(Wt° - W:f) = (t - 5)A - (t - 5)) if s<t
and that
(9.15)
= ~(S2 - Si)(*2 - tx) if 5X < S2 < tr < t2.
We shall also use W° to denote the distribution on С of the random
element W°. If /г:С-> С takes the function x to the function with value
x(t) — tx(l) at t, then the measures W° and Ж are related by W° = Whrx.
Separable Stochastic Processes^
Let us connect our construction of Wiener measure with the notion of a
separable stochastic process. According to Kolmogorov's existence theorem
(p. 230), there exists a stochastic process {|t:0 <, t < 1}, on some probability
space (Q, ^, P), for which the distribution of each finite collection of the |t
coincides with the corresponding finite-dimensional distribution prescribed
for W. In the standard construction, (Q, 8$) is the product of a collection of
copies of (R1, ?%х), one copy for each t in [0, 1], and the |t are the coordinate
functions.
The set Qo of со for which !f(co) is continuous in t, 0 ^ t < 1, need not
satisfy P(O0) = 1; indeed, in the standard construction Qo does not even lie
in ?%. It is possible, however, by another procedure, to find a separable
process with the prescribed finite-dimensional distributions. The process
{?t'O < t < 1} on (Q, ^, P) is separable! if & is complete relative to P and
t The rest of this section may be omitted.
% See Doob A953, p. 51).
66 The Space С
if there exists in ^ a set E of measure 0 and in [0, 1] a countable set To such
that, for each a and /? and for each open interval /, we have
(9.16)
{ш:а < ?t(co) < /5, / e / П To} - {co:a < |f(a>) ^ /5, / e / П Г} с ?,
where we have written Tin place of [0, 1]. The left-hand set in the difference
here lies in 38 because To is countable; since P(?") — 0 and 38 is complete, the
right-hand set in the difference must lie in 38 also and have the same
probability.
If {IJ is separable and has the finite-dimensional distributions appropriate
to Brownian motion, thenf the set Qo of со for which the sample path (It(co)
as a function oft) is continuous lies in 38 and satisfies P(O0) = 1- If we map
Qo into С by carrying со into its sample path, and if we carry P (restricted to
O0) over to (C, ^f) via this mapping, we arrive at W, which gives another
method of constructing Wiener measure.
The problem of proving the continuity of the sample paths of separable
processes and the problem of constructing measures on (C, ^f) are effectively
the same. If each separable process having certain specified finite-dimensional
distributions also has continuous sample paths with probability 1, then, as
the above construction shows, there exists on (C, ^) a probability measure
with these finite-dimensional distributions. Let us prove the converse.
THEOREM 9.2 If{?t:Q <, t < 1} is a separable stochastic process, and if
there exists on (C, ^) a probability measure with the same finite-dimensional
distributions as {IJ, then the sample paths of the process are continuous with
probability 1.
Proof. In proving this result it is no restriction to assume that To contains
all the rationale in the unit interval, in which case (9.16) holds if / is a closed
interval with rational endpoints. We shall use the relations
(9.17) (U. Ee) + (U. E'e) cz \J9 (Ев + Е'в)
and
(9.18) (Пе Ев) + (С\в Е'в) с Ue (Ев + Е'в),
each of which is valid whatever the range of the index 6. %
Let V denote the general system
(9.19) V: k;r0,..., rk; ax, . . . , afc,
where к is an arbitrary integer, the rt and the o^ are rational, and 0 =
r0 < • • • < rk = 1. There are countably many such systems V. For V given
f SeeDoob A953, p. 393).
% Recall that + stands for symmetric difference.
The Existence of Wiener Measure 67
by (9.19) and e positive, define
к
(9.20) QTo(V, e) = П {со:a, < |t(co) ^ a, + e, t e [гИ) г,] П Го}
and
(9.21) 0Го = П U QTo(V, s),
E V
where the intersection extends over positive, rational e. Let Q.T(V, e) and
пт be the same sets but with To replaced by T = [0, 1] in (9.20) and (9.21).
Since UT(V, e) + QTo(V, s) <= E, (9.17) and (9.18) imply
(9.22) пт + 0Го с Е.
Now 0Го(К, ?) is a countable intersection of sets in & and hence itself lies
in &. Therefore пТо e dS. It follows by (9.22) that пт lies in SS and P(Qr) =
P(Qr ). Since Or is exactly the set of со with continuous paths, we need only
prove - ¦
(9.23) ?{пт) = 1.
Let (tu t2, . . .) be an enumeration of the points in To. For V given by
(9.19) and ? positive, let HTo(V, e) denote the set of points z =* (zls z2, . . .)
in i?00 such that, for each / — 1, . . . , k, the inequality o^ < zu < a^ + ?
holds for every coordinate index м for which tu e [г^_х, rj. Then HTo(V, e) e
^°°. If the mapping 99 :Q -> i?00 is defined by q>{<o) = (|tl(o>), lt2(co), . . .),
then <p-*HTo(V, e) = Qro(F, ?). If ЯГо = ПеигЯГо(К, е), then ЯГо е <%«>
and
(9.24) ^ЯГо = QTo.
Now define xp:C^ R™ by ^0*0 = 0»(*i)> «('2). • • •)• Let p be the
probability measure on (C, ^) whose finite-dimensional distributions coincide
with those of {?J. If H is a finite-dimensional set in i?00, then
(9.25) Р(<Р-1Я) = Р(^-ХЯ);
since the finite-dimensional sets form a field generating ^f°, (9.25) holds also
for every Я in ^°°. By (9.24), therefore, ?{пт) =* Р(^-хЯГо). But ^-хЯГо =
С, from which (9.23) follows.
Thus proving continuity for separable processes and constructing measures
on С are the same activity.
Remarks. See ltd and McKean A965) for other constructions of Wiener measure and for
some history. For an interesting account of Wiener's early work, see the articles in "Norbert
Wiener, 1894-1964," Bull. Amer. Math. Soc. 72 A966), number 1, part II.
68 The Space С
PROBLEMS
1. Let ?r, with r ranging over the rationale in [0, 1], be random variables with the
finite-dimensional distributions appropriate to Brownian motion. By adapting the argu-
arguments in the proof of Theorem 9.1 but avoiding the tightness concept (and indeed all
mention of the space C), show that {?r} is, with probability 1, uniformly continuous in r.
For t irrational, define ?f = limr_t ?r, and, by the argument preceding Theorem 9.2,
construct Wiener measure on С anew.
2. Show that Wiener measure has no locally compact support.
3. For 0 <; t < oo, define Vt = A + t)W°/{1+t), where W° is the Brownian bridge. The
process {Vt:t > 0} has sample paths that are continuous for all со, the finite-dimensional
distributions are Gaussian, and the moments are E{Ft} = 0 and E{FsFt} = min (s, t). The
process thus represents a Brownian motion over the time interval [0, oo).
10. DONSKER'S THEOREM
The Theorem
Given random variables ?ls |2, ... defined on (Q, ?%, P), let ?„ =
|x + • • • + ln be the partial sums and define a random element Xn of С by
(8.15):
A0.1) Xn(t, со) = -L S[nt](co) + (#i/ - И]) -L l[nt]+1(co).
crv« <Л/и
The following theorem on the convergence in distribution of the Xn is due to
Donsker. The Introduction has examples of its use.
THEOREM 10.1 Suppose the random variables |n are independent and
identically distributed with mean 0 and finite, positive variance cr2:
(Ю.2) E{|J = 0, E{|n2} - a2.
Then the random functions Xn defined by A0.1) satisfy
(Ю.З) Xn ^ W.
Proof. We first show that the finite-dimensional distributions of the Xn
converge to those of W. Consider first a single time point s\ we must prove
(Ю.4) Xn{s) Д. Ws.
Since
A0.5)
Xn(s) -S,
0
by Chebyshev's inequality, A0.4) will follow by Theorem 4.1 if we prove
XX/
VVs-
Donskefs Theorem 69
Jut this is a direct consequence of the Lindeberg-Levy central limit theorem
Theorem 7.4) and the fact that [ns]jn —»¦ s.
Consider now two time points s and t with s < t. We are to prove
(Xn(s),Xn(t))X(Ws,Wt),
vhich will follow by Corollary 1 to Theorem 5.1 if we prove
(Xn(s), Xn(t) - Xn(s)) X (Ws, Wt - Ws).
because of A0.5) and the same relation with t in place of s, it is enough to
)rove
10.6) (—=, S[fM], -~ (S[frt] - S[ns])) Д. (Ws, Wt - Ws).
Since the components on the left are independent, by the independence of the
?n, A0.6) follows by the central limit theorem and Theorem 3.2. A set of
;hree or more, time points can be treated in the same way, and hence the
inite-dimensional distributions converge properly.
We-prove tightness via the following lemma, which is slightly more
general than presently required.
LEMMA Let |x, . . . , |m be independent random variables with mean 0
mdfinite variances of; put S^ = ?x + • • • + !г- and s? — аг2 + • • • + о*-
Then
10.7) P(max |S,| > Xsm) < 2?[\SJ > (A - y/2)sm).
\i<m ) { _ )
To prove A0.7)—note that it is trivial if Я < V2—consider the sets
E, = (max |S,| < Xsm < \sA
[ i<i )
Clearly,
;i0.8) Pfmax \St\ > Xsm) ? P{\Sm\ > (A -
[i<m |
m
+ 2 p№ n (ISJ < a -
m—1
г=1
Since \St\ > Xsm and \Sm\ < (A - \ll)sm together imply \Sm — St\ >
it follows by Chebyshev's inequality and the assumed independence of the
I; that the sum in A0.8) is at most
m—1 _ m—1
2 P№)P(lsm - s*l > V2s-} < 2
г=1 г=1
m-1
2
г=1 \i<m
And now A0.8) and A0.9) combine to give A0.7).
70 The Space С
Applying the lemma to the random variables involved in Theorem 10.1,
we have, for X > 2V2,
pfmax \S,\ > Xajn] < 2P{\Sn\ > \XoJn).
{ i<n J
By the central limit theorem,
PflSU > \^n) - P{|JV| > Щ < ^ E{\N\3}.
Therefore, if e is positive, we have
A0.10) lim sup PJmax \St\ > Xo^Jn \ < j2
for X sufficiently large. Tightness now follows by Theorem 8.4.
An Application
As pointed out in the Introduction, Donsker's theorem has this qualitative
interpretation: Xn Д- W says that, if т is small, then a particle subject to
independent displacements |x, |2, ... at successive times r, 2r, . . . will,
viewed from afar, appear to perform approximately a Brownian motion.
More important than this qualitative interpretation is the use of Donsker's
theorem to prove limit theorems for various functions of the partial sums
Sn. The Introduction indicates how to use the relation Xn Д- Wto derive the
limiting distribution of maxi<n St; let us now carry this through in detail.
Since h(x) =; supt x{t) is a continuous function on C, Xn -^> ^implies, by
Corollary 1 to Theorem 5.1, that
supf Xn(t) ^ supt Wt.
The obvious relation
sup Xn(t) = max —— S^
0<t<l i<n Oyjn
now implies
A0.11) —^- max Si ^> supt Wt.
j
Thus we would have the limiting distribution of тах^<п St (properly
normalized) if we knew the distribution of supt Wt. (We still assume the
hypotheses of Theorem 10.1, of course.) The technique we shall use to find
this latter distribution is to compute the limiting distribution of maxi<n St
in an easy special case.
Suppose that So, Slt. . . are the random variables for a symmetric random
walk starting from the origin. Suppose, that is, that the |n are independent
Donsker's Theorem 71
ind satisfy
110.12) P{?n=1}==PUn:=_1} = i.
Let us show that, if a is a nonnegative integer, then
10.13) pf max St > a) = 2P{Sn > a} + P{Sn = a}.
\0<i<n )
The case a = 0 is easy. Assume a > 0 and put Mt = тахо<3<г- Ss. Since
?{Mn >a}- P{Sn - a} = P{i/n > a, Sn < д} + ?{Мп >a,Sn> a}
and
P{Mn > a, Sn > a} = ?{Sn > a},
A0.13) will follow if we prove
A0.14) ?{Mn >a,Sn<a} = ?{Mn >a,Sn> a}.
Because of A0.12), all 2n possible paths (JSlt S2, . . . ,.5J have the same
probability 2~n. Therefore A0.14) will follow if we show that the number of
paths contributing to the left-hand event is the same as the number of paths
contributing to the right-hand event, and to show this it suffices to match the
paths in a one-to-one manner.f Given a path (S1} S2, . . . , Sn) contributing
to the left-hand event in A0.14), match it with the path obtained by reflecting
through a all the partial sums after the first one that achieves the height a.
Since the correspondence is one-to-one, A0.14) follows. This argument is an
example of the reflection principle.
Let a be an arbitrary nonnegative number, and let an = — [—a.n%]. By
A0.14), we have
A0.15) Pfmax -^ St > a) = 2P{Sn > an} + P{Sn = an}.
{ i<n Jn }
By the central limit theorem,
?{Sn > an} -* P{7V > a}
t The new mathematics.
72 The Space С
(сг2 = 1 in this case, in view of A0.12)). Since the largest term in the symmetric
binomial distribution goes to 0, the term ?{Sn = an} in A0.15) is negligible.
Thus
A0.16) Pfmax -p St > a] -> 2P{N > a}, a > 0.
[ j )
Combining A0.16) with A0.11) (still assuming A0.12)), we conclude
A0.17) P{suPi Wt < a} = -L [*e-W du, a > 0.
Of course, the left side of A0.17) vanishes if a < 0. We have derived a fact
about Brownian motion by combining Donsker's theorem with a computa-
computation involving random walk, a computation that was simple partly because it
reduced to enumeration and partly because a random walk cannot pass
above a positive integer a without passing through it.
Let us now drop the assumption A0.12). If the ?n are independent and
identically distributed and satisfy A0.2), then A0.11) holds, and from
A0.17) we can now conclude
A0.18) P(-Jp max St < a] -> -p= TV* du, a > 0.
[ff^Jn i<n J V2tt Jo
Thus we have derived the limiting distribution of max,<n St under the
hypotheses of the Lindeberg-Levy theorem.
This argument follows a general pattern. If h is continuous on С—or
continuous except at points forming a set of Wiener measure 0^—then
Xn®+ W implies
A0.19) h(Xn)^h(W).
(In the case just analyzed, h(x) = sup4 z@-) We can find the limiting
distribution of h(Xn) if we know the distribution of h(W), and we can often
find the distribution of h{W) by finding the limiting distribution of h(Xn)
in some special case and then using A0.19) in the other direction. The next
section contains further examples in this pattern.
Therefore, if the $n are independent and identically distributed with
E{?J == 0 and E{fn2} = a2, then the limiting distribution of h(Xn) does not
depend on any further properties of the fn. For this reason, Donsker's
theorem is often called the (or an) invariance principle. Here we shall call it
instead the (or a) functional central limit theorem. This term will be used for a
variety of similar theorems in what follows, just as the term central limit
theorem itself is used for a whole class of theorems.
Donsker's Theorem 1Ъ
A Necessary Condition for Tightness
Let us drop the assumption that the ?n are independent and directly assume
that Xn Д* W (with Xn still defined by A0.1)). We shall show that, in this
circumstance, for each positive e there exists а Я exceeding 1 such that
A0.20) p(max \St\ > lajn) < -
\i<n j Я2
holds for all n exceeding some n0. In case {?n} is stationary, this is just the
hypothesis in Theorem 8.4, which we verified in proving Donsker's theorem.
Let Y — supj \Wt\. Because of A0.17) and the symmetry of W under
reflection through 0, Y has a finite second moment. From A0.19) with
h(x) =; supt |a;(f)|,.it follows that, for each positive Я,
A0.21) p(max |S,| > XaJn) -+ P{Y > Я} < — Г У2 d?.
Given e, choose Я so large that the integral here is less than e; it follows
from A0.21) that A0.20) holds for all sufficiently large n.
Thus we see, after the fact, that A0.20) is "the" condition to verify in
proving tightness in Donsker's theorem. In Chapter 4 we shall use the same
condition in proving functional central limit theorems for sequences of
dependent random variables.
The preceding argument really shows this: Suppose that {fn} is stationary
and that the finite-dimensional distributions of the Xn defined by A0.1)
converge to those of a random function X, where supt \X(t)\ has a finite
second moment. Under these conditions, if {Xn} is tight, then the hypothesis
of Theorem 8.4 is satisfied. To this extent, this hypothesis is necessary;
compare the remarks centering on (8.10).
Another Proof of Donsker's Theorem^
The rest of this section is devoted to a second proof of Theorem 10.1. The
proof is interesting because it makes little use of the preceding theory. In
fact, granted the existence of Wiener measure, J we need only Theorem 2.2
and the central limit theorem.
We are to prove Pn => W, where Pn is the distribution of Xn. According to
Corollary 2 of Theorem 2.2, it is enough to prove that Pn(A) ~* W{A)
whenever A is a finite intersection of open spheres and satisfies W(dA) =; 0.
t The rest of this section may be omitted.
t Which can itself be established by a direct argument; see Problem 1 of Section 9.
74 The Space С
Now if A is the intersection of spheres with centers x1} . . . , xk and radii
elt . . . , sk, and if
y{i) = max (xt(t) - еъ)
and
z(t) = min
then A has the form
A0.22) A = {ж:у@ < x(t) < z{t), 0 < t <, 1}.
We are to prove
(Ю.23) РП(Л) -* W{A)
under the assumptions that A is denned by A0.22) with y, ze С and that
Ж(дл() = 0. We may also assume that y(t) < z(t) for all t, since л( is other-
otherwise empty. (If 2/@) ^ 0 and z@) 5^ 0, then W{dA) = 0 holds automatically,
but this need not concern us.) From here on, у and z are fixed.
In our first proof of Donsker's theorem, we showed that the finite-dimen-
finite-dimensional distributions of the Pn converge weakly to those of W. This fact,
which does not really involve the space С at all, we shall assume here. Thus
A0.24) limPn{z :«*!),. . . , x(tk))eH} = W{x:{x{tx),..., x(tk))eH}
71-HX)
holds for each ?>dimensional Borel set H satisfying
W{x:(x(tl),...,x(tk))edH} = 0.
If, for integral v,
А, -{^
then, as и increases to infinity, A2» decreases to the set of x for which
y(t) < x(t) < z(t) holds for all dyadic rationale t. But if W(dA) — 0, then
this last set differs from A by a set of W-measure 0. Therefore for each
positive rj there exists a v such that W(A) < W(AV) < W(A) + r\. Since
Pn(A) < Pn(Av), and, since limn Pn(Av) = W{Av) by A0.24), we have
lim supn Pn{A) < W(A) + rj. Since r\ was arbitrary,
A0.25) lim supn Pn(A) < W{A).
We complete the proof by showing that
A0.26) lim infn Pn(A) > W(A),
which is the harder part. Given rj, choose e so that, if
A0.27) D(e) = {x:y(t) + 3e < x{i) < z{t) - 3e, 0 ^ t < 1},
Donsker's Theorem 75
then W(D{ej) > W(A) - r\. (Clearly, the set A0.27) increases to A as e
decreases to 0.) Now define
Dv(e) = \x:yl-\ + 3e < xl-\ < z(-) - Зе, к = 0, 1, . . . , v).
{ \v/ \v/ \v/ I
For each v, we have D(e) с ^(е) and hence
A0.28) W{Dv{ej) > W(A) - y\.
Choose and fix а и so large that
A0.29) -f < n, wj-) < e,
-) < e,
where w denotes the modulus of continuity (8.1). Then n > v implies
(io.3O) m - /r
w. "
Let Ln denote the set of elements of С that are linear on each interval
[(/ — l)/n, i/n], i = 1, . . . , n, and let Gn be the set of a; in С for which
A0.31)
is violated for some i = 0, 1, . . . , n. If x e Ln, if x e Gnc (so that A0.31)
holds for all / =* 0, 1, . . . , n), and if n > v, then, by A0.30), y{t) < x(t) <
z(t) for all t ? [0, 1], so that x e A. Thus Ln n Gnc с ^. Let Gnr be the
set of a; for which the double inequality A0.31) holds for i < r but not for
i = r. Since Pn(Ln) = 1, and since the Gnr are disjoint and add to G, we
have
A0.32)
< 2 Pn(Gn,r), n>v.
r=0
For 1 < r <n, let knr denote that integer к A < к < v) such that
(ifc - l)/v < r\n < k/v and let kn0 = 0. By A0.32),
A0.33) Pn(Ac)<
r=l
x:
% -
>e\.
If x e Gnr and \x(r[n) - x(knr[v)\ < e, then, by A0.30),
x
+3e
¦•«-4
76 The Space С
so that x ф Dv(e). Thus the first sum on the right in A0.33) is at most
A0.34) Pn(Ac) < Pn(Dv(e)c) + i Рп(сПгГ nix:
for n ;> v.
We shall now show that the sum in A0.34) is small. Observe first that
A0.35)
*t-xfdPn<9\t-s\
for all 5, 7, and n. In fact, if s and 7 are of the form ijn andy/n, then the
left side of A0.35) is E{(St - S^/па2} = \t - s\; if s and t lie in the same
subinterval [(/ — l)[n, i/n], then the left side of A0.35) is n(t — sJ, which is
at most \t — s\; and the general result follows by Minkowski's inequality.
By the independence of the ?n, therefore,
A0.36) Pn[Gn,rn\x:
x[ — I —
Since rjn — knrjv < l/v, it follows by A0.29) that the last member of
A0.36) is at most r]Pn(Gnr). Using A0.34) and the fact that the P(Gnr) sum
to at most 1, we arrive at
Pn(A°) < Pn(Dv(e)c) ±V, n>v.
But НтпРп(^(е)с) = W(Dv(e)c) by A0.24), and so, by A0.28),
lim supnPn04c) < W(AC) + 2tj.
Since r\ was arbitrary, this establishes A0.26), which combines with A0.25)
to yield A0.23), proving Donsker's theorem once more.
Remarks. Erdos and Kac A946) first conceived the invariance principle itself—the
idea of computing the limiting distribution of a quantity such as maxfc<n Sk first in a special
case and then passing to the general case by connecting the result with Brownian motion
(which is how it all started). An early paper in this direction was Kolmogorov A931).
The original assertion of Donsker A951) was not that Xn ®+ W ox that Pn => W, but,
what is by Theorem 5.2 the same thing, that h(Xn) jE> h(W) for all real, continuous func-
functions h on C. The arguments leading to A0.23) are his. Kolmogorov and Prohorov A954)
first pointed out that weak convergence is relevant and that Pn => W follows from A0.23)
and Theorem 2.2. Prohorov A953 and 1956) brought the tightness concept to bear. For a
Functions of Brownian Motion Paths 77
very different approach to these problems, see Knight A962), and for a very different kind
of application, see Strassen A964). Lamperti A962a) has a result differing from Theorem
10.1 in that the limiting process is not Brownian motion.
PROBLEMS
1. Let |nl,. . . , |nfcn be independent with mean 0 and variances o^; put Sni =
!„! + ••• + ?„„ s2ni = a2nl + ¦ • ¦ + a2ni, and sn2 = s\kn. Let Xn be the random function
that is linear on each interval [s\tirA}s^, s2njsn2] and has values Xn(s2nilsn2) = SnJsn at
the points of division. Assume Lindeberg's condition G.3) holds and generalize Donsker's
theorem by using Lmdeberg's theorem, the inequality A0.7), and the corollary to Theorem
8.3 to prove Xn JL». W. (This result is due to Prohorov A956).)
11. FUNCTIONS OF BROWNIAN MOTION PATHS
In the preceding section we used Donsker's theorem and properties of random
walk to find the distribution of supt Wt. Having found this distribution, we
used Donsker's theorem once more to find the limiting distribution of
maXj<n Si in general. Here we shall apply the same technique to other
functions of Brownian motion paths and of partial sums. We also compute
some distributions associated with the Brownian bridge.f
We shall say we are in the Lindeberg-Levy case when dealing with
partial sums Sn of independent, identically distributed random variables ?n
with mean 0 and finite, positive variance cr2. In this case Xn will always
denote the random function A0.1). We shall say we are in the random walk
case if each ?n assumes the values +1 and —1 with probability \ each.
In this case cr2 = 1.
Maximum and Minimum
Let m — inft Wt and M = supt Wt, and let
A1.1) mn = min Si} Mm = max St
0<i<n 0<i<n
be the corresponding quantities for partial sums. The mapping carrying the
point a; of С to the point (inftx(t), supta;(f), x(l)) of R3 is everywhere
continuous, so that, by Theorems 10.1 and 5.1,
(И.2) ~V К Af n, Sn) Л (m, M,
in the Lindeberg-Levy case.
t Although weak-convergence results in С would have little point if it were not possible
to perform computations of the sort carried through in this section, the computations are
not themselves essential to an understanding of the general theory.
78 The Space С
We shall first find an explicit formula for
A1.3) pja, b, v) = ?{a <mn<Mn<b,Sn = v)
in the random walk case. We shall show that, if
(И.4) Pn{j)=nSn=j},
then
A1.5) pn(a, b, v) = | Pn(v + 2k(b - a)) - f pnBb - v + 2k(b - a))
k
k=—oo
for integers a, b, and и satisfying
A1.6) a<0<b, a<b, a<v<b.
If a < &, then the series in A1.5) are really finite sums. Notice that both
sides of A1.5) vanish if и and v have opposite parity.
For particular values of и, a, b, and v, let us denote the equation A1.5) by
[n, a, b, v]. We shall prove by induction on n that [n, a, b, v] is valid if A1.6)
holds. For n = 0, this follows by a straightforward examination of cases.
Assume as induction hypothesis that [n — 1, a, b, v] holds for a, b, v
satisfying A1.6). If a = 0, then A1.3) vanishes (note that i starts at 0 in the
minimum in A1.1)), and the sums on the right in A1.5) cancel because
pn(j) = pn{—j). Thus [n, a, b, v] is valid if A1.6) holds and a = 0; we may
dispose of the case b = 0 in the same way. To complete the induction step
we must verify [n,a,b,v] under the assumption that a < 0 < b and
a < v < b. But in this case a + 1 < 0 and b — 1 > 0, so that [n — 1, a — 1,
b — 1, v — 1] and [n — 1, a + 1, & + 1, v + 1] both come under the
induction hypothesis and hence are valid. And now [n, a, b, v] follows by
the probabilistically obvious recursions
Pnij) = bn-i(j- 1) + */V-i(/+ 1)
and
From A1.5) it follows by summation over v that, if
A1.7) a<0<b, a<u<v<b,
then
A1.8) P{a < mn < Mn < b, и < Sn < v}
| - a)< Sn < t; + 2fc(b - a)}
fc=— 00
- f pi2b ~v + 2k(b - a)<Sn<2b-u + 2k(b - a)}.
fc=—00
t Problem 2 outlines how A1.5) can be derived (as opposed to verified).
Functions of Brownian Motion Paths 79
Taking a = — n — 1 in this formula leads to
A1.9) ?{Mn < b, и < Sn < v)
= ?{u < Sn < v} - P{2b - v < Sn < 2b - и},
valid for — n — 1 < и < v < b, b >; O.f From A1.9) it is possible to
retrieve A0.13).
Now A1.8) holds in the random walk case and, because of A1.2), we can
find the distribution of (m, M, WJ by passing to the limit. If a, b, u, and v
are real numbers satisfying A1.7), replace them in A1.8) by the integers
[an-], — [—bn^], [ми*], and —[—vn*], respectively. Because of the central
limit theorem and the continuity of the normal distribution, a termwise
passage to the limit in A1.8) yields
A1.10) P{a < m < M < b, и < W1 < v)
= f ?{u + 2k(b - a)< N <v + 2k(b - a)}
2k(b -a)<N<2b-u + 2k(b - a)}.
fc=— 00
00
fc=—00
The interchange of limit with summation over к can be justified by the
series form of Scheffe's theorem (p. 224).
The joint distribution of M and W1 alone could be obtained by letting
a tend to — oo in A1.10), but it is simpler to return to the random walk case
and pass to the limit in A1.9), which yields
A1.11)
?{M < b, и < Wx < v) = ?{u < N < v} - P{2b - v < N < 2b - u},
valid for и < v < b, b > 0. Taking v = b and letting и -> — oo leads back
to A0.17).
From A1.10) with и — a and v = b we have
A1.12) ?{a<m<M<b)
= I (-l)*P{a + k(b- a)< N <b+k(b- a)},
fc=-00
valid for a < 0 < b. And this result with a — —b gives
A1.13) P{suPi \Wt\ < b) = f (-l)fcP{Bfc - l)b < AT < Bk +
A;=—qo
t See Problem 1 for another proof.
80 The Space С
for b > 0. By continuity, the strict inequalities in all these formulas can be
relaxed to allow equality. And the right sides can all be written out as sums
of normal integrals. It is possiblef to transform the series in A1.13) to
^)
r2Bfc+lJ/8b2
1 _ _ V V^) -ir2Bfc+lJ/8b2
77 »=i 2k + 1
Although we derived A1.10) through A1.13) by passing to the limit in the
random walk case, we have the limiting distributions for (mn, Mn, Sn),
(Mn, Sn), (mn, Mn), and тахг<п \St\ (properly normalized) in the more
general Lindeberg-Levy case because A1.2) holds there.
The Arc Sine Law
For x in C, let hx{x) be the supremum of those t in [0, 1] for which x(t) = 0;
let h2(x) be the Lebesgue measure of those t in [0, 1] for which x(t) > 0; and
let h3(x) be the Lebesgue measure of those t in [0, h^x)] for which x(t) > 0.
Then T = h-^W) is the time at which the Wiener path W last passes through
0, U = h2(W) is the total amount of time W spends above 0, and V = h3(W)
is the amount of time W spends above 0 in the interval [0, T]. We shall find
the joint distribution of (T, U, V, Wx).
In Appendix II (pp. 230 ff.) it is shown that each of the mappings h1}
h2, and h3 is measurable and is continuous except on a set of Wiener measure
0. Therefore
A1.14) (h.iXJ, h2(Xn), h3(Xn), Xn(l)) *> (T, U, V, WJ
in the Lindeberg-Levy case. In the random walk case, the vector on the left
has a simple interpretation: Tn — nh^XJ is the maximum i, 1 < i < n, for
which St = 0; Un = nh2(Xn) is the number of i, 1 < i < n, for which ^_х
and St are both nonnegative; Vn — nh3(Xn) is the number of i, 1 < i < Tn,
for which ^_! and St are both nonnegative; and, of course, XJY) — SJ^fn.
With these definitions we therefore have
A1.15) (- Tn, - Un, - Vn, ± Sn) ®+ (T, U, V,
\n n n yjn /
in the random walk case; we shall find the distribution of (T, U, V, W^) by
passing to the limit. In the general Lindeberg-Levy case, the left side of
A1.14) is a somewhat more complicated function of the partial sums S1} . . . ,
Sn. After we have found the distribution of (T, U, V, W-^), we shall show how
A1.14) leads even in the general case to limit theorems for quantities
associated in a natural way with the partial sums.
t See Feller A966, pp. 330 and 594).
Functions of Brownian Motion Paths 81
Since the random vector (T, U, V, W-^) is constrained by
1-T+V if Wx > 0,
A1.16)
it suffices to consider (T, V, Wx) and the related vector (Tn, Vn, Sn). The
distribution of the latter quantity in the random walk case we shall derive
from three facts which admit of elementary proofs we shall not carry through
here.
First, we shall use the local limit theorem for random walk: If m tends to
infinity andy varies with m in such a way that у and m have the same parity and
/ -> y, thent
Second, we shall use the fact that
A1.18) P{SX > 0, . . . , STO_X > 0, Sm = j} = ± pm(j)
m
for у positive.% If S2m = 0, then U2m = V2m assumes one of the values 0,
2, . . . , 2m; the third fact§ we shall use is that these m + 1 values all have
the same conditional probability:
A1.19) P{V2m = 2i | S2m = 0} = —Ц-, i = 0, 1, . . . , m.
m + 1
To compute the probability that Tn = 2k, Vn = 2i, and Sn = j, condition
on the event S2k = 0. Conditionally on this event, (So, . . . , S2k) and
(S2k+1, . . . , Sn) are independent, Vn depends only on the first sequence, and
Tn = 2k and Sn = j if and only if the elements of the second sequence are
nonzero and the last one is/ By A1.18) and A1.19) we conclude that
A1.20) P{Tn = 2k, Vn = 2i, Sn = j} = p2k@) -~ -^— pn_2k(j)
к + 1 n — 2k
if
A1.21) 0<,2i<2k <n, ;>0.
Both sides of A1.20) vanish if и andy have opposite parity. Fory negative, the
same formula holds with |y| in place of у on the right.
t Feller A957, p. 170) has the local theorem for the binomial distribution, and A1.17)
follows because ?(Sm + m) IS binomially distributed.
% See Feller A957, p. 70). This also follows from A1.9).
§ See Feller A957, p. 72).
82 The Space С
We shall apply Theorem 7.8 to the lattice of points Bfc/n, 2i\n, j[y/n),
where у and n have the same parity. Suppose k, i, and у tend to infinity with
n in such a way that
2k 2i
n n
where 0 < v < t < 1 and x > 0. Then A1.21) holds for large n, and it
follows by A1.20) and A1.17) that
/2.2. -L
\n n Jn
where
A1.22) g(f, *) = -1 W__ e-b2/(i-tM о
277 [f(l - OF
The same result holds for negative я by symmetry. Therefore
(П-23) /- Tn, i Vn, -j= Sn)
has (in the random walk case) the limiting distribution in R3 specified by the
density
(g(t,x) if 0<u <t < 1,
A1.24) f(t,v,x) = \6
[0 otherwise.
By A1.15), (T, V, Wx) has this density. Because of A1.16), the distribution of
(T, U, V, Wi) can be written out explicitly also.
From A1.24) it follows that the conditional distribution of V given 2" and
Wx is uniform on [0, T\\ this corresponds to A1.19). By A1.16), if T= t
and Wx = x, then U is uniformly distributed over [1 — t, 1] for x > 0 and
over [0, t] for x < 0. Using A1.24) to account for the possible values of t
and x, we find for the density of U alone
A1.25) f f g(t, x) dtdx + f fg(f, *) df dx.
x>0 x<0
1-й <t u<t
Now the integral of g(t, x) over the range x > 0 is 1/[2тг^A _ t)i], which
is the derivative of — (A — *)/0*/7Г> and hence A1.25) reduces to
Д - и)*]. Therefore
ds =- arc sin ^ц, 0 < и < 1.
A1.26) P{t7 < ц} = - f"
- s)
Functions of Brownian Motion Paths 83
This is Levy's arc sine distribution. A similar computation shows that Г also
follows the arc sine law:
A1.27) P{T<t} =-arc sin ^f, 0 < t < 1.
Let us now combine A1.14) with the facts just derived to obtain a limit
theorem for the general Lindeberg-Levy case. Let us agree to say that a
O-crossing takes place at / if the event
A1.28) E, = {S, = 0} U {5И > 0 > St} U {?_! <0<St)
occurs (which in the random walk case is to say that St = 0). Let T'n be the
maximum i, 1 < i < n, for which a O-crossing takes place at i; let U'n be
the number of i, 1 < i < n, for which St > 0; and let V'n be the number of i,
1 < * < T'n, for which St > 0. We shall prove that
(И.29) (- t;, - vn, - v;, -^- sn) *> (т, и, v,
by showing that the quantity on the left here approximates the left side of
A1.14).
Clearly, T'n\n is within 1/n of ^(Хп). If yn is the number of i, 1 < i < n,
for which Ег occurs—the number of O-crossings—then U'Jn and V'Jn are
within yjn of h2(Xn) and h3(Xn), respectively. Therefore A1.29) will follow
from A1.14) and Theorem 4.1 if we prove that yjn -?> 0, and for this it is
enough to show that
A1.30) е[Ы = - 2 P(E<) -> 0.
I n J n i=i
But
for each positive e, and hence, by the central limit theorem, P(Et) -> О.
And now A1.30) is a consequence of the theorem on arithmetic means of
convergent sequences.
From A1.29) we may conclude for example that Ujn and T'Jn have arc
sine distributions in the limit.
The Brownian Bridge
The Brownian bridge W° behaves like a Wiener path W conditioned by the
requirement Wx = 0. With an appropriate passage to the limit to take
account of the fact that {Wx = 0} is an event of probability 0, this observation
can be used to derive distributions associated with W°.
84 The Space С
Let PE be the probability measure on С defined by
PE(A)= Р{ЖеЛ|0^ Wx < ?}, Ае<?.
We shall prove that
A1.31) PE=>W°
as e tends to O.| Take the random function W as defined on some probability
space and take the random function W° to be defined on the same space by
W° = Wt — tWx (see the construction in Section 9). If we prove that
A1.32) lim sup ?{We F\0<W1<ae} <, ?{W° e F)
E->0
for each closed .Pin C, then A1.31) will follow by Theorem 2.1.
For arbitrary tXi . . . , tk, Wx is independent of (И^°, . . . , W°) because it
is uncorrelated with each component. Therefore
A1.33) P{W°eA, WxeB} = ?{W°eA}?{W1eB)
if A is a finite-dimensional set in С and В lies in 0lx. But for В fixed the set of
A in ^ that satisfy A1.33) is a monotone class and hence! coincides with <%.
Thus A1.33) holds for A e<g and В e М1. In particular,
?{W° e A | 0 < Wx <, e} = ?{W° e A}.
Since p(W, W°) = 1^1, where p is the metric on C, \Wx\ < б and
W 6 F imply W°eF6 = {x:p(x, F) < d}. Therefore, if e < <5,
P{WeF\ 0 ^ W1 < e} <, ?{W° 6 F8 \ 0 < W1 < e} = ?{W° e Fd}.
The limit superior in A1.32) is thus at most ?{W° e F8), which decreases to
P{PF°eF}asEJ,0if F is closed. This proves A1.32) and hence A1.31).§
Suppose now that h is a measurable mapping from С to Rk and that W°
has probability 0 of lying in the set Dh of discontinuities of h. It follows by
A1.31) and Theorem 5.1 that
A1.34) P{h(W°) < a} = lim P{h(W) < a | 0 < Wx <, e)
holds for each a at which the left side is continuous (as a function of a
ranging over Rk). From A1.34) we can find explicit forms for some distribu-
distributions connected with W°. Sometimes an alternative form of A1.34) is more
f We can let e tend continuously to 0 and use the definition involving B.4). Alternatively,
we can let e tend to 0 through some sequence (say the reciprocals of integers) fixed
throughout the discussion.
% See Halmos A950, p. 26).
§ This part of the argument merely repeats the proof of Theorem 4.1.
Functions of Brownian Motion Paths 85
convenient:
A1.35) P{h(W°) < a} = lim P{h(W) < a | -e <, Wx < 0}.
?->0
This is established in the same way (in place of [0, e] we could use any subset
of [—?,?] with positive measure).
Let
m = mft Wt, M = supt Wt.
Suppose that a<0<b and that 0 < e < b; by A1.10) we have, if
с = b — a,
A1.36) P{a < m < M < b, 0 < Wx < s}
= f P{2kc < N < 2kc + ?} - f P{2fcc + 2b - e < JV < 2kc + 2b}.
fc=—00 fc=—00
Since
A1.37) lim -P{x<N <x + e} = -L e~^ ,
?-+0 ? у/2тг
if we take the limit inside the sums in A1.36), then A1.34) yields
00 00
k=— oo fc=— oo
Since the series converge uniformly in s, the interchange is all right. Thus we
have the distribution of (m°, M°). Taking —a = b here gives ,
A1.39) P{supt \Wt°\ < b} = 1 + 2f (~l)ke~2kV, b > 0.
By an entirely similar analysis applied to A1.11),
A1.40) P{M° < b} = 1 - e~*b\ b > 0.
Let t/° be the Lebesgue measure of those t in [0, 1] for which W° > 0.
We shall show that U° is uniformly distributed over [0, 1] by showing that
A1.41) lim P{[/ <, a | -e ^ Wx < 0} = a, 0 < a < 1.
?-+0
Because of A1.16), the conditional probability here is
From the form of the density A1.24) we saw that the distribution of V given
T and Wx is uniform on @, T). In other words, if L = VjT, then L is
uniformly distributed on @, 1) and is independent of (T, Wx). Therefore the
86 The Space С
conditional probability in A1.41) is
P{TL<a|-? < Wx <0} = p(t<-
•'o [ s
-? < Wx < 0 ds,
and A1.41) will follow by the bounded convergence theorem if we prove the
intuitively obvious relation
P{T<e\-e<,W1^0}->0, 0 < 1.
But this follows by A1.37) and the form of the density A1.24). Therefore
A1.42) P{U° ^ a} = a, 0 < a < 1.
Remarks. For further evaluation of distributions of functions of Brownian motion paths,
see Ito and McKean A965), Karlin A966, Chapter 10), Erdos and Kac A946 and 1947),
Mark A949), Darling and Erdos A956), and Section 16 below.
PROBLEMS
1. Show by reflection in the random walk case that
>n(v) if v > b,
A1.43) {n,n }
[pnBb - v) if v < b
for b > 0. Derive A1.9) from this.
2. For nonnegative integers ct, let тт{сх, . . . , ck; v) be the probability that an n-step
random walk (и fixed) meets cx (one or more times), then meets —c2, then meets c3, . . . ,
then meets (—lOc+1cfc, and ends at v. Use A1.43) and induction on к to show that
c . ipni^i + ¦¦•+ 2c^x + (-l)fc+V) if (-1)*+^ > ck,
[Reflect through (—l)fccfc_1 the part of the path to the right of the first passage through that
point following successive passages through cl9 — c2, ¦ ¦ ¦ , (—I)k~2ck_2.] Derive A1.5) by
showing that pn{a, b, v) is
pn(v) — n(b; v) + n(b, a; v) — тг(Ь, a, b; v) + ¦ • ¦
— тг(а; v) + ir(a, b; v) — тт{а, b,a;v) + -- ¦,
3. For ж in с let h{x) be the smallest t for which x(t) = sup,, x(s). Show that h is measur-
measurable and is continuous on a set of W-measure 1. Let тп be the smallest к for which Sk =
maXj<n Si and prove *
P -^ < a| ^--arcsin Va, 0 < a < 1,
in the Lindeberg-Levy case. [See Feller A957, p. 86) for the random walk case.]
4. Derive the joint limiting distribution of the maximum and minimum of St — w^^,
0 < i < n, in the Lindeberg-Levy case. [Consider Yn(t) — Xn(t) — tXn{\) with Xn defined
by A0.1).]
Fluctuations of Partial Sums 87
12. FLUCTUATIONS OF PARTIAL SUMS
In Sections 9 and 10 we established tightness for sequences of random
functions by finding bounds for the distribution of the maximum of certain
partial sums. Here we shall derive such bounds under fairly general condi-
conditions; the results lead to practical tightness criteria which will be used
throughout the rest of the book.
Maxima
Let ?1? . . . , ?m be random variables; they need not be independent or
identically distributed. Let Sk = ^ + ' • • + ?k (So = 0), and put
A2.1) Mm = max \Sk\.
0<k<m
We shall obtain upper, bounds for P{Mm > X} by an indirect approach.
If
A2.2) M'm = max minflSJ, \SM - Sk\},
0<k<m
then
A2.3) M'm < Mm.
If f i ffc-i, ?*+!, • • • , ?™ are identically 0, then M'm = 0 and Mm = | ?fc|.
Thus there can be no inequality opposite to A2.3).
On the other hand, we do have
\Sk\ < min {\SJ + \Sk\, \SJ + \Sm - Sk\} = \SJ + min {|^|, \Sm - Sk\},
so that
m
A2-4) Mm<M'm
(There is equality here if Sm = 0.) Therefore
A2.5) ?{Mm >X}<
If we can find separate bounds for the terms on the right in A2.5), we shall
have a bound for the term on the left.
If we bound P{Mm > X) via A2.5) instead of by a direct analysis of some
kind, there may be some loss of accuracy. On the other hand, since the right
side of A2.5) is at most 2?{Mm > \X), the loss need not be great: an extra
factor of 2 will not matter for our purposes, and the bounds we derive will
decrease with increasing X slowly enough that passing from Я to Я/2 will have
no important effect.
There is a second way of deriving bounds on the distribution of Mm from
bounds on the distribution of M'm. We shall show that
A2.6) Mm
from which it will follow that
?[ max |?| > -\
[l<i<m 4J
A2.7) ?{Mm > A} < ?{м'т >-\
{ 4}
Again, bounding the terms on the right will yield a bound for the term on
the left. And the right side of A2.7) is at most 2P{Mm > iA}, so that using
A2.7) can result in no substantial loss of accuracy.
To prove A2.6), consider the set / consisting of those k, 0 <i к < т, for
which \Sk\ < \Sm - Sk\; certainly, Oe/. If Sm = 0, then Mm = M'm, so
that A2.6) holds. If Sm 9^ 0, then m ф I and hence there is a k, 0 < к < т,
for which к-lei and кфГ. |^_г| < \Sm - S^], \Sk^\ < M'm,
\Sm - Sk\ < \Sk\, and \Sm - Sk\ < M'm. For this к we have
A2.8) \Sm\ < \Sk^\ + \b\ + \Sm - Sk\ < 2M'm + \b\,
from which A2.6) follows via A2.4). (There is equality in A2.6) if m = 5
and fx = fa = ?3 = f4 = -?5.)
Product Moments
In view of A2.5) and A2.7), there is profit in bounding ?{M'm > A}. Let us
suppose that there exist nonnegative numbers щ, . . . , um such that
A2.9) E{|S,. - S,P • \Sk - S,r} < ( 2 «iY( 2 ul
0 < i < j < к <, m,
where у and a are positive. For example, this will hold with у = 2 and
a = 1 if the ?г- are independent, if E{?J = 0, and if щ is taken to be the
variance of ?,-.
Since xy < (x + yJ for nonnegative x and y, A2.9) implies
A2.10) E{|S, - S,V ¦ \Sk - S,?} <, I 2 uT, 0<i<j<k<m.
\i<l<k /
Moreover, if \S3- — 5J > A and \Sk — 5^| > A, where A is positive, then
15,- - Si\v ¦ \Sk - Si\y > A2y, so that A2.10) implies
A2.11) P{|S,. - St\ > A, \Sk - S,\ > A} < \-( 2
A Y\i<i<k
Fluctuations of Partial Sums 89
(The inequality is trivial if i = j or if; = k.) Note that A2.9) implies A2.10)
and A2.10) implies A2.11) also in the case у = 0, provided we take |x|° as 1
if x ?? 0 and as 0 if x = 0.
THEOREM 12.1 If у > 0 and a > \, and if A2.11) holds for all positive X,
then, for all positive X,
A2.12) ?{M'm > X} <, ?j* (u, + • • • + и J2*,
where Ky a is a constant depending only on у and a.
Although its specific value will have no importance for us, the constant
Ky(X may be taken as
tj j ^ \2a-|-By+D
nl/By+l) l^l/By
Theorem 12.1 is of purely theoretical value: K2>i, for example, is 55,021 to
the nearest integer.
Applications
Before proving the theorem, let us examine several of its consequences. If the
Hi are independent with E{?J = 0 and Е{?/} = <тД then A2.9) holds with
7 = 2 and a = 1, provided щ is taken to be a?. Now a-f + • ' " + cTO2 =
sm2 is the variance of Sm, and A2.12) implies
A2.14) ?{M'm >X}< ^f- ,
with К = K21. Replacing Я by Xsm and using A2.5) and A2.7), we obtain
A2.15) ?{Mm > Xsm} < ^ + P{\SJ > ^sm]
and
A2.16) ?{Mm > XsJ < ^f + Pfmax |^| > lXsm\.
According to A0.7),
A2.17) ?{Mm > XsJ ? 2P{\Sm\ > (Я - y/2)sj.
The factor 2 on the right here is of no importance and, if X is large, \X and
Я — sfl are about the same. Thus A2.15) represents no advance over A2.17)
in the independent case. But A2.16) does, because
A2.18) PJ max |&| > ^XsA <
Я2 si, i-iji
12
90 The Space С
so that A2.16) implies
A4 V A2 1 m Г
A2.19) P{Mm>hm}<,^f- + f,-r2
The sum on the right is the one involved in Lindeberg's condition G.3).
It is possible to deduce further results. For example, from A2.14) it
follows (see C) on p. 223) that
f
UMm'J>asm2}\sm/ a
Since the ^ are independent, we have (the indices in the sums are constrained
by 1 <, i, к < m)
^ f Ц d?
f max, Щ- d? < ^ f Ц-
h^S-r f , - ^2dP.
a/ < smJ{^2>0
For nonnegative U and K, we have
f (U +V)dP ?l{ l/dP + 2| V dP.
J{U+V>a}
That
A2.20)
J{ikfm2>asm2} \ Sm / La i-15m J{|fi*
for some universal constant K' now follows by A2.6) and the fact that the
sum on the right here is at most 1. The inequality persists if Мгт is replaced
by iS^j. This proves in particular the uniform integrability of the squares of
the normalized row sums for a triangular array satisfying Lindeberg's
condition.
Suppose now that the ^ are independent and identically distributed with
J. = 0 and E{^2} = a2. Then A2.15) implies
A2.21) P{Mm ^ Xojm] < ^f + P{|SJ > P
A
A2.16) implies
A2.22) P{Mm > Aojm] < ^f + Pf max |?| >
and A2.19) implies
A2.23) P{Mm > Xojm) < ^f + -?-f f ^ dP.
Fluctuations of Partial Sums 91
Since the integral here goes to 0 as X -»¦ oo, for sufficiently large X we have
P{MW > Xajm] <j2, m = 1, 2, ....
Thus we have another proof of the tightness of the random functions
involved in Donsker's theorem (see A0.10)), a proof which does not depend
on the central limit theorem.
Although the random variables are independent in the applications just
given, Theorem 12.1 does not require independence. This is a great advantage:
Most later applications will involve dependent sequences.
Proof of Theorem 12.1
The assumption in Theorem 12.1 is that
A2.24) P{|S, - St\ > X, \Sk - S,\ >X}< i./ J Л*",
0 < i < j < к < т,
for all positive X; we are to find a constant K, depending only on у and a,
for which
A2.25) ?{M'm > X) < ^ (Ul + • • • + uwJa.
Write
A2.26) д = —-— ,
2y + 1
so that 0 < д <. 1. For large К we have
because the left-hand member approaches l/2Ba-1)<5 as K-+ oo and Ba > 1,
д > 0) this limit is less than 1. We shall prove that A2.25) holds if К satisfies
A2.27) and if
A2.28) K>1.
(In point of fact, A2.27) implies A2.28). The smallest К satisfying A2.27)
is given by A2.13).)
The proof goes by induction on m. The result being trivial for m = 1,
consider the case m = 2. Since М'г = min {\SX\, \S2 — S^}, A2.24) and
A2.28) imply
; > x) < ~ (Щ + «2J« < A (Ml + щу\
92 The Space С
Assume now as induction hypothesis that the result holds for each integer
less than m. We shall prove it for m itself. Write и = ux + • • • + um; we
may assume и > 0. There exists an integer h, 1 < h <? m, such that
A2.29) «1 + + "»-.^1
и 2
where the sum on the left is 0 if h — 1.
Consider the four quantities
Ux = max
0<t<A—1
U2 = max min {\S, - Sh\, \(Sm - Sh) - (S, - SOI}
h<i<m
= max min
D2 = mm{\Sh\,\Sm- Sh\}.
Since A2.24) holds if m is replaced by /г — 1, and since h — 1 < m, we may
apply the induction hypothesis to the random variables |l5 . . . , |л-1 and
the quantities щ, . . . , uh_1 and conclude that
A2.30) ?{UX > X) < A K + • • • + „^^ < ? A ,
the last inequality following by A2.29). (If h = 1, A2.30) is trivial.)
If the indices in A2.24) are restricted to h < i <j ^ к < т, then only the
random variables |л+1, . . . , ?те are involved. Since m — h < m, the induc-
induction hypothesis applies to ?л+1, . . . , ?те and мл+1, . . . , um; hence
A2.31) P{U2 > X] < ^ (u U ^
the last inequality following by A2.29) again. (If h == m, A2.31) is trivial.)
Now A2.24) implies
A2.32) P{Dl > X\ < ±(Ul + ¦¦¦+ итГ = ^
and
A2-33) ?{D2>X\<^y.
(If h = 1, then A2.32) is trivial; if h == m, then A2.33) is trivial.)
We shall show that
A2.34) mm{\Si\,\Sm-Si\}<U1 + D1 if 0 < i < h - 1.
Fluctuations of Partial Sums 93
Let ^ be the minimum on the left. If |^| < Ult then
Suppose \Shr.1 - St\ < Ur\ if \S^r\ = Dx, then
f*t < \st\ < | Vi - si\ + I Vil < ux + z>x.
Suppose |5^i - S,| < C/i and |5m - Vil = A; then
A*, < \Sm - St\ < |^_х - St\ + \Sm - Vil < Ux + A.
This proves A2.34). The same sort of argument shows that
A2.35) min {|5,-|, \Sn - S5\} < U2 + D2 if h <j < m.
By A2.34) and A2.35),
A2.36) M'm < max {Ux + Dlt U2 + D2},
and hence
A2.37) ?{M'm ^ X] < ?{UX + Dx > X} + P{U2 + D2> A"}.
If Ao and Ax are positive and Ao + Ax = A, then, by A2.30) and A2.32),
/1ТТОЛ D( TT I П "~^ 11 ^S DfTT "~^ 11 I Df П ^^ 1 "I ^
Ao 2
Calculus shows that, if Co, Cl5 and A are positive numbers, then
A2.39) min \^+%-
with д defined by A2.26). Minimizing the right-most member of A2.38), we
arrive at
A2.40)
The same inequality holds for U2 + D2 (use A2.31) and A2.33)). By
A2.37), therefore
2a Г / K\5 I1/5
A2.41) ?{M'm > A} < — 2 j —j + 1 .
v I ™ *— J — -]2y I /i2a/
By the choice A2.27) of K, the right member here is at most Ku^/X2*. This
completes the induction step and the proof of Theorem 12.1.
94 The Space С
Moments^
Let us now replace A2.9) by the assumption that
A2.42) E{|S, - 5,1'} < {У iiX 0 < i < j < m.
It follows from this that, if X is positive, then
A2.43) P{|S, - S<\ > X) < I¦( 2 «X 0 < i < j < m.
In place of a > | we make the stronger requirement that a > 1.
THEOREM 12.2 If у > 0 and a > 1, and if A2.43) holds for all positive X,
then, for all positive X,
A2.44) ?{Mm >X}<^
where K'y a depends only on у and a.
We may define Щ>л by
A2.45) K'y>a = 2^A + KhM).
Proof. By Schwarz's inequality, P(?x n E2) < РЦЕ^Р^Е^). Therefore
A2.43) implies (recall xy < (x + yJ)
P{\St - S<\ > X, \Sk - S,\ > X) < — 2 и») T^l I и
a/2
4 I
U14
Ay\i</<fc /
Thus A2.11) holds with у and a replaced by \y and ?oc, and Theorem 12.1
implies
A2.46) ?{M'm > X) < jy (щ + ¦ ¦ • + и J*
with К = Kiy<ia. By A2.43) we have
A2.47) P{|SJ > A} < у (Ml + • • • + uj',
and A2.44) follows by A2.5). (Note that, since the right-hand members of
the last two inequalities are of the same order, A2.44) cannot be essentially
improved by using extra information about the distribution of |.SOT|.)
f The remaining results in this section are needed in Chapters 3 and 4 but not in Section 13,
the last in this chapter.
Fluctuations of Partial Sums 95
As an application of Theorem 12.2, consider again independent, identically
distributed variables ^ with mean 0 and variance a2; this time assume there
exists a finite fourth moment т4 = Е{^4}. Now
= 2
where the indices range independently from 1 to r. Since the !-г are independ-
independent and their means vanish, if the value of some index in the summand
differs from those of the other three, the term vanishes. Since о4 < т4,
E{Sr4} = rr4 + 3r(r - 1H* < 4r2r4,
and hence A2.42) holds with у = 4, a = 2, and ux ==•••== um — 2т2.
Theorem 12.2 now implies
m2
A2.48) ?{Mm > X) < AS К ^ ,
A
where A" = X^i2. Replacing X by Xayjm yields
A2.49) P{MTO > ^
From this and Theorem 8.4, the tightness of the random functions A0.1)
involved in Donsker's theorem follows easily. The argument involving
A2.23) is more powerful than this one because it requires only second
moments.
A Tightness Criterion
Theorem 12.2 leads to a condition for tightness of a sequence {Xn} of random
elements of C.
THEOREM 12.3 The sequence {Xn} is tight if it satisfies these two conditions:
(i) The sequence {Xn@)} is tight.
(ii) There exist constants у > 0 and a > 1 and a nondecreasing, continuous
function F on [0, 1] such that
A2.50) ?{\XM - Xn(h)\ > Ц < j №) - F(tlW
A
holds for all tlt t2, and n and all positive X.
The moment condition
A2.51) E{|Zn(g - XJtJM < \F(t2) -
implies A2.50).
96 The Space С
Proof. By Theorem 8.2 it suffices to produce, given e and ту, а д @ < д < 1)
for which
A2.52) P{w(Zn, 6) > 3e} < rj
for all и, and, by the corollary to Theorem 8.3, this will hold if d'1 is an
integer and
A2.53) 2 p( «up \Xn(s) - Xn(jd)\ > e) < n.
i<d~x \j5<s<(i+l)d )
Fix n, d, and у for the moment and, for a positive integer m, consider the
random variables
A2.54) ^=X
m J \ m
By A2.50), these random variables satisfy A2.43) with щ — F(jd + idnr1) —
F{jb + (i - lydm-1). By Theorem 12.2, therefore,
A2.55) p( max xjjd + - d) - Xn(jd) > e)
\0<i<m \ ТП J j
ГГ/ / ' ¦ <\ A 7"»/" 5Л "I(X
— [F((j + l)d) - F(jd)]a,
e7
where К = К^а. Since Xn lies in C, letting m —>- oo leads to
A2.56) Pj jup^\Xn(s) - Xn(jd)\ >j<fy [HO + W) ~ F№T-
Therefore
A2.57) 2 A .d<™f.+1)}X"^ - X^8^ ^ e)
< - [F(l) - F@)]fmax|F@- + Щ - FO^ll"
if 6-1 is integral. Since F is continuous and a > 1, we may make this last
quantity small by taking д the reciprocal of a large integer, which completes
the proof.
The ideas in this proof lead to a condition for the existence in С of a
random element with specified finite-dimensional distributions. For each
A;-tuple of points of [0, 1], let [Atv..tk be a probability measure on (Rk, ?%k).
Assume these measures satisfy the consistency conditions of Kolmogorov's
existence theorem (in its general form; see p. 230).
THEOREM 12.4 There exists in С a random element with finite-dimensional
distributions ftt tte, provided these distributions are consistent and provided
Fluctuations of Partial Sums 97
there exist constants у > 0 and a > 1 and a nondecreasing, continuous
function F on [0, 1] such that
A2.58) 1лЧЛг{фъ fo : |& - &| > A} < i
holds for all tx and t2 and all positive X.
If the ,Mtl...tfc satisfy
A2.59) f |& - W d/it^fa, &) < \F(t2) -
then A2.58) follows.
Proof For each n, construct a polygonal random function Xn that is linear
over each interval [(/ — lJ~n, i2~n] and for which the joint distribution of
is /ut Hn with tt = i\2n. If t1 and t2 are integral multiples of 2~n, then, by
A2.58)';
A2.60) P{\Xn(t2) - Zn(fl)l > Я} < ± \F(t2) - F(Ola.
As in the preceding proof, consider fixed n, d, andy, where д~г is assumed
integral. If the points
A2.61) jd + idirr1, i = 0, 1, . . . , m
involved in A2.54) are all integral multiples if 2~n, then A2.55) follows as
before. Suppose now that 62n is an integer. If we take m = 62n, then the
points A2.61) are indeed integral multiples of 2~n, so that A2.55) holds.
Moreover the points A2.61) are in this case exactly the integral multiples of
2~n in the interval [jd, (J + 1N], so that, because of the polygonal character
of Xn, A2.56) and A2.57) again hold.
Thus A2.57) holds if d'1 and 62n are both integers. If we take 6 = 2~v,
where v is an integer large enough that the right side of A2.57) is less than rj,
then A2.52) holds for all n > v, which, since the -^„@) all have the same
distribution, is enough for tightness.
By Prohorov's theorem, therefore, there is a random element X of С
and a subsequence {Xn.} such that Xn. ^ X. If tx, . . . , tk are dyadic
rationale, then, by the consistency assumption, the distribution of {Xn(t^),
. . . , Xn(tk)) is exactly [Atv..tle for all sufficiently large n, and it follows that
(Z(/1), . . . , X(tky) has distribution /utv..tk. For general points tlt . . . , tk of
[0, 1], there are dyadic rationale s^], . . . , s(kv) with lim^^"' = t{; since
98 The Space С
(Z(s<u))> • • , X(s(kv))) converges in distribution to (X(tJ, ..., X(tk)), and
since, by A2.58) and the continuity of F, /u^v) ...s<«) converges weakly to
1 к
Ph-w C^('i)» • • • ' Х(*кУ) nas H-tvtic as *ts distribution. Thus X is the
random function required.
The existence of Ж and W° follow anew from Theorem 12.4.|
Further Inequalities
For Chapter 3, we shall need a strengthening of Theorem 12.1. If
A2.62) M; = max min {|S, - S{\, \Sk - S,|},
0<i<3<fc<m
then
A2.63) M'm < M"m.
(If m = 3 and ?г = -?2 = ^3, then M'm = 0, and M"m = \?г\.) Therefore
bounds on the tail of the distribution of M"m are stronger than bounds on the
tail of the distribution of M'm, and the following result sharpens Theorem
12.1.
THEOREM 12.5 If у ^ 0 and a > ^, and if
A2.64) P{|S, - Sf| > Я, \Sk - St\ > X] < i-
holds for all positive X, then, for all positive X,
A2.65) ?{M"m > X) < ^f(Ul + • • • + umf*,
i<l<k
0<i<j <k<m,
where K" a is a constant depending only on у and a.
Proof. Put
A2.66) Nm = min max | max|Sf|, max \Sm — SM.
l<l<m \0<i<l l<i<m )
If / is the index that achieves the minimum here and i <j < k, then either
i,j < /, or else / <j, k, from which it follows that
A2.67) M"m < 2Nm.
Hence it will suffice to find a K, depending only on у and a, such that A2.64)
f Theorem 12.4 combines with Theorem 9.2 to give a condition for the continuity of sample
paths of separable processes. For extensions and variations of Theorems 12.3 and 12.4,
see Problems 7 and 8 in this section and Problem 4 in Section 15.
Fluctuations of Partial Sums 99
implies
A2.68) ?{Nm >Ц<~(и +•¦¦ + unT*.
(Since Nm < 2M"m, A2.68) and A2.65) have the same strength.)
For sufficiently large К, К > 1 and
A2.69) / —j +D-2*?Y<Ks,
where д = 1/B у + 1), as before. We shall prove by induction on m that
such а К works. The cases m = 1 and m = 2 are easy.
Assume as induction hypothesis that the result holds for integers smaller
than m. As in the proof of Theorem 12.1, choose h, 1 < h < m, so that
"i + V uh__x 1 Mi + • • • + uh
S S 5
и 2
те.
where н = нх + • • • + м
Write Лх for iVx:
A2.70) Ax = min max max |SJ, max
h \0
Write A2 for the value of Nm_h based on the quantities gh+1, . . . ,
A2.71) ^2 = min max I max |5г- — Sh\, max |STO — 5
h<l2<m |.Л<г<г2 lz<i<m
By the induction hypothesis applied to ?l5 . . . , ?л_15 we have
A2.72) P{^ > Я} < ^ (и, + ¦ ¦ ¦ + и^Г <Jy§a-
By the induction hypothesis applied to gh+1, . . . , |m,we have
A2.73) ?{A2 >X]<fy K+1 + • • • + итГ <^у§-«
Write
ix{i,j, k) = min (IS, - 5,|, \Sk - 5,-|}
and define
A2.74)
В = тах{/г@, /г — 1, m); /г@, /г — 1, h); [x(h — 1, /г, т); /u@,h,m)}.
By A2.64),
A2-75) P{B > Я} < 4 ^ .
100 The Space C v>,:;
We next show that
A2.76) Nm <, max {Ax, A2} + IB.
Let lx and /2 be the specific values of the indices that achieve the minima in
A2.70) and A2.71). To prove A2.76), we must produce an /, 1 < / < m,
for which
A2.77) maxISJ < max {Ax, A2} + 2B
and
A2.78) max | Sm - St-| < max {Alt A2} + IB.
Suppose first that
A2.79) |5Л_!|<В
and
A2.80) \Sm-Sh\<B.
We shall show that the choice / = h satisfies A2.77) and A2.78). Indeed,
0 < i < h implies |5г-| < Ax, and lx < / < / = h implies
P»l S P; *jft-il + Рл-il ^ ^i + -o
because of A2.79), so that A2.77) holds; and A2.78) is established in the
same way.
Suppose now that one or the other of A2.79) and A2.80) fails. For
definiteness, suppose A2.79) fails. We shall show that the choice / = lx
satisfies A2.77) and A2.78). Since A2.79) fails, it follows by the definition
A2.74) of В that \Sm - S^] < В and |?л| < В. If 0 < i < / = /l5 then
|^| < Ax. Hence A2.77) holds. If / = lx < i < h, then
Рте $i\ ^ Pa-1 — ^il + Рте "Л-ll ^ ^1 + -^5
if h < i < 12, then
\Sm - S,\ < \St - Sh\ + 1^1 + \Sm - S^l <A2 + IB;
if /2 < i < m, then \Sm - 5J < A2. Hence A2.78) holds. If A2.80) fails
instead of A2.79), then A2.77) and A2.78) hold with / = /2. This proves
A2.76).
For positive numbers Ao and Xx adding to X, A2.76) implies
A2.81) ?{Nm >X}< ?{AX > Xo} + ?{A2 > Xo} + P{B > %XX}.
Applying the inequalities A2.72), A2.73), and A2.75), we get
..2a
Fluctuations of Partial Sums 101
now A2.39) implies
\l/d
which A2.68) follows by A2.69), which completes the proof.
The hypotheses of Theorem 12.5 are satisfied if the ^ are independent,
J = 0, E{^2} = щ, у = 2, and a = 1. In this case, the ?f and the щ
also satisfy the stronger hypotheses of the following theorem and hence its
stronger conclusion. If all but one of the variances щ vanish, then
?{M"m > A} = 0 for A > 0; although the right side of A2.65) is positive, the
right side of A2.83) vanishes.
THEOREM 12.6 If у > 0 and a > \, and if
A2.82) P{|S, - St\ > A, \Sk - S,| > A} < -W 1 uX( 2 Л
Ky\i<i<i ) \}<i<k )
for all positive A, then, for all positive A,
A2.83) p{m; > a} < S« (Ml +... + и j*. min [i ъ T
А У l<A<mL «! + '¦"+ Um J
where K'^ depends only on у and a.
The moment form of A2.82), which implies it, is the inequality A2.9)
with which the whole discussion began.
Proof. Choose h to minimize the final factor in A2.83); write
и = Mi + • • • + мт, p = (i*! + • • • + ma_i)/u, ph = uhju, and ^ =
(uh+1 + • • • + um)lu. As in the preceding proof, define Ax by A2.70), A2 by
A2.71), and В by A2.74). By A2.82),
u2a u2a
P{M0, Л - 1, m) > A} < ^ pa(pA + ?)a <
and there is a similar inequality for each of the other ^a's in A2.74). Therefore
Write К = К'у'а. Since A2.82) implies A2.64), it follows by Theorem 12.5
(in the stronger form A2.68)) that
and
>Я}<
102 The Space С
Now A2.76) holds just as before, and hence
?{Nm > A} < P{A > Щ + P{A2 > |A} + P{B >
Combining this inequality with the three preceding ones, we arrive at
> X] ^
And now A2.83), for an appropriate Щ'л, follows easily by A2.67).
Remarks. The results of this section, new as such, stem from the work of Kolmogorov
(see Slutsky A937)) and Chentsov A956).
PROBLEMS
1. Do Problem 1 of Section 10 once more, using A2.19) in place of A0.7).
2. If we weaken A2.42) by assuming only that the inequality holds for the special pair of
indices i = 0, j = m, but compensate by assuming that Sx,. . . , Sm forms a martingale
and that у > 1, then [Doob A953, p. 314)] ?{Mm >Л}<(и1+---+ итУЧХ\ an in-
inequality of essentially the same strength as A2.44).
3. Assume ?15. . . , fTO independent with mean 0 and finite moments of order 2k. Use
the multinomial theorem to find a constant Ck, depending on к alone, such that
Generalize A2.48).
4. From A2.12) deduce that for positive s
2у-?} ^ *>
5. Adapt the proof of Menshov's inequality [Doob A953, p. 156)] to show that, if A2.10)
holds with a > \ and у > h, then
dog2 2mfy{Ul + ¦¦¦+ K
Now deduce that, if A2.42) holds with a > 1 and у > 1, then
E{Mm?} ^ (log2 4m)? («, + •••+ Um
(If у = 2 and a = 1, this gives Menshov's inequality again.)
6. Use Theorem 12.2 to show that, if f1; f2, . . . satisfies
\< ( I uX, o<;/
with у > 0, a > 1, and ?иг < oo, then ??г converges with probability 1. Find a result
connected in an analogous way with Theorem 12.6.
7. Theorem 12.3 is still true if we replace A2.50) by the assumption that
A2.84) P{\Xn(t) - Xn{tJ\ > X, \Xn(t2) - Xn(t)\ >$^Yy
holds for /j < / < t2 and for all n, where now a > ^.
Empirical Distribution Functions 103
8. Theorem 12.4 is false if A2.58) is weakened to the analogue of A2.84); it is also false
if the assumption a > 1 is weakened to a > 1. [Suppose /л( is a unit mass at 0 or at 1 accord-
according as / < i or / > ?.] See, however, Problem 4 in Section 15.
9. Let
' / \ k-\ к - 1 1
for </< + —
n n In
*t*0) = { (k \ к - 1 1 к
2( for +—<*<-
n 2n n
elsewhere,
and let Xn take the value xnk with probability 1/л, к = 1, . . . , п. Then
for all n, tlt and t2. Since {Xn} is not tight, we cannot take a = 1 in Theorem 12.3. Similarly,
we cannot take a = \ in Problem 7.
13. EMPIRICAL DISTRIBUTION FUNCTIONS
Let |l5 |2, . . . be random variables, on some (Q, ?$, P). We shall assume
that
A3.1) 0<|та(ш)^1,
which can always be arranged by a transformation.
The empirical (or sample) distribution function FJt, со) corresponding to
the points tjiico), . . . , }n(co) is defined, for 0 < t < 1, as Xjn times the
number of i < n for which Ij(co) < t.
If the ?n are independent and have a common distribution function F(t)
(by A3.1), only values of fin [0, 1] matter), then, for large n, Fn(t, со)
should approximate F(t)—the difference
A3-2) Fn(t, со) - F(t)
should be small. According to the Glivenko-Cantelli theorem,
A3.3) sup \Fn(t, со) - F(t)\
converges to 0 with probability l.f If F is continuous, it is possible to find
the limiting distribution of A3.3), properly normalized; this limit theorem,
due to Kolmogorov, stands to the Glivenko-Cantelli theorem as the central
limit theorem does to the law of large numbers.
We shall consider in this section only the case F(t) = t, so that the ?n,
assumed independent, have a common uniform distribution over [0, 1].
f Loeve A960, p. 20).
104 The Space С
Kolmogorov's result is that
A3.4) P{w:supt \Jn(Fn(t, со) - t)\ < a} — 1 - 2f (-l)fc+V2fc2'
We shall derive A3.4) by using the theory of weak convergence in C.
Consider the function Yn(co) with value
a > 0.
A3.5)
Yn(t, со) = y/n(Fn(t, со) - t)
at t. Although Yn{co) is a function on [0, 1] produced at random, it is not an
element of C, being obviously discontinuous. If we could establish the
convergence in distribution of Yn as a
random element of an appropriate
space of discontinuous functions with
an appropriate metric, we would be
in a position to derive A3.4) by the tech-
techniques of Sections 10 and 11, because
/
/
1 1
» 1
у
supt \Jn(Fn(tt со) - t)\ = h(Yn(co))
with h(x) = supj \x(t)\. Although in
the next chapter we shall analyze Yn
as a random element of a metric space
of discontinuous functions, here we
shall circumvent the discontinuity
problems by adopting a different defi-
definition of empirical distribution function. (We shall still be able to derive
A3.4) itself—not some perturbation of it.)
Let Gn(t, со) be, as a function of t ranging over [0, 1], the distribution
function corresponding to a uniform distribution of mass (n + I) over
each of the (n + 1) intervals [?«_!)(<»), ?(i)(<*>)], where |@) = 0, |(n+1) = 1,
and |A), . . . , |(n) are the values ?l5 . . . , |n ranged in increasing order.
(Note that Fn(t, со) corresponds to a distribution of mass rr1 to each of the
n points f!(<»), . . . , ln(w).) The functions Fn(t, со) and Gn(t, со) are close:
A3.6)
\Fn(t, со) - Gn(t, co)\ < - ,
n
0<t
Now let Zn(co) be the element of С with value
03.7) Zn(t, со) = Jn(Gn(t, со) - t)
at t. Since each Zn(t) is a random variable, Zn is a random element of С
(Z~^ с <%). By A3.6), we have
A3.8)
supt | Yn(t, со) -Zn(t,co)\< -L
Empirical Distribution Functions 105
THEOREM 13.1 If the |n are independent and uniformly distributed on
[0, -1], and ifZn is defined by A3.7), then
A3.9) Zn^W°,
where W° is the Brownian bridge.
Before proving the theorem, let us see how it implies A3.4). If h(x) =
supt |#@l> then h is continuous on С and hence A3.9) implies h(Zn) ^>-
h(W°). By A1.39),
P{supt \Zn(t)\ < a} - 1 - 22(-l)fc+V2fc ', a > 0,
fc=i
so that A3.4) follows from A3.8) and Theorem 4.1.
Using A1.40), we can in the same way derive the relation
A3.10) P(Vn supt (Fn(t, со) - t) < a} -+ 1 - e~2a\ a > 0.
If h(x) is the Lebesgue measure of the set of t for which x{t) > 0, then, by
A1.42),
A3.11) P{h(Zn) < a} ^ a, 0 < a < 1;
/z(Zn) is approximately n~x times the number of г, 1 < i < n, for which
Proof. To prove A3.9), we first show that the finite-dimensional distributions
of Zn converge to those of W°. Let Un(t, со) = nFn(t, со) be the number of
points among ^(w), . . . , !n(w) that satisfy |Дш) < Л If
0 = Г0<^< ••• <tk= 1,
then the random variables
A3.12)
are multinomially distributed with parameters n and pt = t{ — t{_x, i —
1, 2, . . . , k. It follows by the central limit theorem for multinomial trials
that the random vector with components
A3.13) YM ~ Ш-i) = \ ((UM - ип{П_,)) - nPi),
i == i,z, . . . , к,
converges in distribution to the random vector with components W°. — W°
(by (9.14) and (9.15), these normally distributed random variables have the
right variances pt(l — Pi) and covariances —PiP,)- Because of A3.8), the
same is true if we replace A3.13) by
Zn(ri_1), /=1,2,...,*.
106 The Space С
The finite-dimensional distributions of the Zn thus converge properly. If
we prove {ZJ tight, A3.9) will follow. By Theorem 8.3, it suffices to show
that, for each positive ? and r\, there exists a <5, 0 < <5 < 1, and an n0 such
that, if n > n0, then
A3.14) P( sup \Zn(s) - Zn(t)\ > e\ < dr]
\t<s<t+d j
holds for all t.
Because of A3.8), we may replace A3.14) by
A3.15) P( sup \Yn(s) - Yn(t)\ >s)< erj.
\t<s<t+d j
The event whose probability appears in A3.15) is easily shown to lie in the
c-field generated by |l5 . . . , |n; A3.15) is simply an inequality involving
|l5 . . . , |та—we have not embarked on an analysis of Yn as a random
element of a space of discontinuous functions. For notational convenience,
we shall take t = 0 in A3.15):
A3.16) (
Since the distributions of the increments of Yn(t) are stationary, this is
really no restriction anyway.
We shall bound the probability in A3.16) by using Theorem 12.1. Let us
show that
A3.17)
i) ~ Yn(s)\* • \Yn(s +pi +p2) - Yn(s +^)|2} < 6Plp2.
For 1 < i < n, let сх.г be 1 — pr or — px according as ?j lies in (s, s + Рг\
not, and let /^ be 1 — /?2 ог -~/>2 according as ^ lies in E + Pi, s + />i
or not. Then A3.17) is equivalent to
A3.18) e|B;<
Since the |j are independent, so are the random vectors (ai5 &). Since |f
is uniformly distributed, (ai5 /Зг) takes on the values A — p1} —p2),
(~Pi, 1 — Pz), and (—/Ji, — /?2) with respective probabilities /?15 /?2, and
/?3 = 1 — />! — рг. Now E{aJ = E{/3J = 0, and considerations of symmetry
lead to
1аг) I A) = "^{а!2^2} + и(я - 1)Е{а12}Е{^22}
+ 2n(n -
Empirical Distribution Functions 107
4ow A3.18) follows from
Е{а12Д12} = ^A - PlyPi* + p2px2(l - p2J + PzPiW < IPiP*,
E{a12}E{j322} = px{\ — j91)j92(l — P2) < P\p2,
ind
For fixed E, consider the variables
Л „ /i - 1
13.19)
m
i = 1,. . . , m.
[f у = 2 and a = 1, and if щ = в^ё/т, then, by A3.17), the hypotheses of
Theorem 12.1 are satisfied, with the variables A3.19) in the role of the |t-
in that theorem. If
Ml = max min
it follows that
^13.20)
with K= K21.
If
then, by A2.4),
From this and A3.20) we have
Mm = max
A3.21)
?{Mm >e}<
Now, for each со, Yn(s, со) is right-continuous in s. As m —>¦ 00, therefore,
Mm converges to sups<5 | Yn(s, co)\ for each со. Hence A3.21) implies
A3.22)
P sup |ВД| > e <
s<d
d* + P\\Yn(e)\ > * •
Because of the asymptotic normality of A3.13), Yn(d)
n —»- 00 F fixed), and hence
— <5)iVas
V<3A - <5I ?
E{iV4} =
108 The Space С
For n exceeding some nd, therefore,
hence, by A3.22),
A3.23) Pfsup |УЯE)| > e) < [
Given ? and rj, choose д so that 6 • 2*(K + l)<52/?4 < drj. For n > nd, A3.16)
follows by A3.23).
This completes the proof of Theorem 13.1. The treatment is evasive in that
we really analyze Yn, replacing Yn by Zn only in order to stay in C. In
Section 16 we treat Yn in the space natural to it and prove Yn-^> W°, and
generalizations, in that space. Although the proofs in Section 16 depend on
a considerable body of theory, they are much more transparent than the one
just given.
Remarks. Doob A949) conceived the idea of proving A3.4) by passing from the random
function A3.5) to the Brownian bridge; Donsker A952) justified the passage. The proof
here derives from Chentsov A956). Kac A949) originally proved A3.11). See Darling A957)
for an account of limit theorems connected with A3.5) and for a large bibliography. See also
the more recent papers: Bickel A968a and 1968b), Birnbaum and Руке A958), Руке A965
and 1968), and Руке and Shorack A968).
CHAPTER 3
The Space D
14. THE GEOMETRY OF D
The space С is unsuitable for the description of processes that, like the
Poisson process and unlike Brownian motion, must contain jumps. In this
chapter we study weak convergence in a space that includes certain
discontinuous functions.
The Space D
Let D = D[0, 1] be the space of functions x on [0, 1] that are right-
continuous and have left-hand limits:
(i) For 0 < t < 1, x{t+) = limsi< x{s) exists and x(t+) = x(t).
(ii) For 0 < t < 1, x(t—) = lims|t x{s) exists.
A function x is said to have a discontinuity of the first kind at t if x(t—)
and x{t+) exist but differ and x(i) lies between them. Any discontinuities of an
element of D are of the first kind; the requirement x{t) — x(t+) is a con-
convenient normalization. Of course, С is a subset of D.
For xeDandroc [0, 1], put
A4.1) wx(T0) = sup {\x(s) - X(t)\ :s,tE To}.
The modulus of continuity of x, defined by (8.1), may be expressed as
A4.2) wx(d)= sup wx[t,t + d].
0<t<l-d
A continuous function on [0, 1] is uniformly continuous. The following
lemma gives the corresponding uniformity idea for elements of D.
109
110 The Space D
LEMMA 1 For each x in D and each positive s, there exist points t0,
tx, . . . , tT such that
A4.3) 0 = t0 < tx < ¦ • ¦ < tr = 1 .
and
A4.4) wj'i-i, ti)<?, /=1,2,..., r.
Proof. Let т be the supremum of those t in [0, 1] for which [0, t) can be
decomposed into finitely many subintervals [?г_15 tt) satisfying A4.4). Since
x@) = z@+), we have т > 0; since x(r—) exists, [0, т) can itself be so
decomposed; т < 1 is impossible because x(r) — x(r+) in this case.
From this lemma it follows that there can be at most finitely many points
t at which the jump (saltus) \x{t) — x{t—)\ exceeds a given positive number;
in particular, x has at most countably many discontinuities. It follows also
that x is bounded:
A4.5) supt \x(t)\ < oo.
Finally, it follows that x can be uniformly approximated by simple functions
constant over intervals, so that x is Borel measurable.
We shall need a modulus that plays in D the role the modulus of
continuity plays in C. For 0 < <5 < 1, put
A4.6) we'(<3) = inf max wx[tt_lt tt),
{tti 0<i<r
where the infimum extends over the finite sets {7J of points satisfying
f0= to< tx < ••• < tr= 1,
A4.7)
[ti-ti_1>d, /=1,2, . ..,*
Lemma 1 is equivalent to the assertion that
A4.8) lim w'Jd) = 0
d-*0
holds for every x in D.
Even if x does not lie in D, the definition of w'x{d) makes sense. Just as
liin^,, wx(e) = 0 is necessary and sufficient for an arbitrary function x on
[0, 1] to lie in C, A4.8) is necessary and sufficient for x to lie in D.
Since [0, 1) can, for each <5 < \, be split into subintervals [^_l5 tt) with
<5 < tt — tf_x < 2<5, we have
A4.9) w'JLd) < wjBd), if <5<i
There can be no general inequality in the opposite direction because of A4.8)
and the fact that wx(d) does not go to 0 with <5 if x has discontinuities.
The Geometry of D 111
Suppose, however, that x ? C. Given e, choose points {t{} satisfying A4.7)
and
A4.10) max wx [h_x, tt) < wx(8) + e.
0<г<г
If \s — t\ < 6, then s and t lie in the same subinterval [^_i, tt) or in abutting
ones, and it follows by A4.10) and the assumed continuity of x that
Wx(<5) < 2w'x(S) + 2s. Since e was arbitrary,
A4.11) wx(d) < 2w'x(d) if xeC.
By A4.9) and A4.11), the moduli wx(d) and w'x(d) are essentially the same
for continuous functions x.
The Skorohod Topology
Two functions x and у are near one another in the uniform topology used
for С if the graph of x(t) can be carried onto the graph ofy(t) by a uniformly
small perturbation of the ordinates, with the abscissas kept fixed. In D, we
shall allow also a uniformly small deformation of the time scale. Physically,
this amounts to the admission that we cannot measure time with perfect
accuracy any more than we can position. The following topology, devised
by Skorohod, embodies this idea.
Let Л denote the class of strictly increasing, continuous mappings of
[0, 1] onto itself. If I ? Л, then Л@) = 0 and ЛA) = 1. For x and у in D,
define d{x, y) to be the infimum of those positive e for which there exists in
Л а Я such that
A4.12) supt|Af-f| <e
and
A4.13) sup, И0 - y{lf)\ < e.
By A4.5), d(x, y) is finite (take It = i). Clearly, d(x, y) > 0; and d(x, y) =
0 implies that for each t either x{t) = y(t) or x(t) = y(t—), which in turn
implies x = y. If Я lies in Л, so does k~x\ d(x, y) = d(y, x) follows from
supj \Х~Н — t\ = sup; \Xt — t\
and
sup, \x{X-H) - 2/@1 = sup, \x{t) - y(Xt)\.
If Ax and A2 lie in D, so does their composition ХгК\ the triangle inequality
follows from
sup, |AaV - t\ < sup, \Xxt — t\ + sup, |V - * I
and
sup, \x(t) - z{X^t)\ < sup, \x(t) - y{Xxt)\ + sup, \y(t) - 2(V)I-
Thus d is a metric.
112 The Space D
This metric defines the Skorohod topology. The uniform distance between
x and у may be defined as the infimum of those positive e for which
sup4 \x(t) — y(f)\ < e. The A in A4.12) and A4.13) represents the uniformly
small deformation of the time scale mentioned above.
Elements xn of D converge to a limit x in the Skorohod topology if and only
if there exist functions Xn in Л such that
limn xn(Xnt) = x(t)
uniformly in t and
lnt = t
uniformly in t. If xn goes uniformly to x, then there is convergence in the
Skorohod topology (take Xnt = t). On the other hand, there is convergence
in the Skorohod topology, whereas xn(t) —> x(t) fails in this case at t = |.
Since
Skorohod convergence does imply xn(t) —> x(t) for continuity points t of x,
and if x is (uniformly) continuous on all of [0, 1], then Skorohod con-
convergence implies uniform convergence. In particular, the Skorohod topology
relativized to С coincides with the uniform topology there.
The space D is separable; one countable, dense set consists of those x
that have a rational value at t = 1 and have, for some integer k, a constant,
rational value over each subinterval [(/ — l)/k, i/k), 0 < i < к (use Lemma
1 and the definition of d). But D is not complete under the metric d: If
A4.14)
then
A4.15) d(xn, xj =
n m
and hence {xn} is fundamental in the metric d, even though it is not con-
convergent. We shall introduce in D another metric d0—a metric which is
equivalent to d (gives the Skorohod topology) but under which D is complete.
Completeness facilitates characterizing compact sets.
The idea in defining d0 is to require that the time-deformation A that
intervenes in the definition of d be near the identity function in a sense
which at first appears more stringent than A4.12); namely, we require that
the slope {It — ls)\{t — s) of each chord be nearly 1 or, what is the same
thing and analytically more convenient, that its logarithm be nearly 0.
If Я is a nondecreasing function on [0, 1] with АО = 0 and Al = 1, put
A4.16) ||Я|| = sup
log
It- Is
t — s
The Geometry of D 113
If ||Я|| is finite, then the slopes of the chords of X are bounded away from 0
and infinity; therefore it is both continuous and strictly increasing and hence is
a member of Л. Although \\X\ may be infinite even if X does lie in Л, such
elements of Л do not enter into the following definition.
Let do(x, y) be the infimum of those positive e for which Л contains some
Я with
A4.17) ЦАЦ < е
and
A4.18) sup, HO - y{Xt)\ < e.
By A4.5), do(x, y) is finite (take Xt = i). That d0 is a metric follows from the
relations
A4.19) ||A-i|| =
and
A4.20) ЦЯЛ11 < Ш\ + 11А.Ц.
We shall show presently that d and d0 are equivalent metrics. We shall
also show that D is complete under d0; for the present, note that, if xn is
defined by A4.14), then
A4.21) do(xn, xj = minjl, log^ j, m, n > 3,
so that at least the (nonconvergent) sequence {xn} is not J0-fundamental.
If do(x, y) < e, then A4.17) and A4.18) hold for some X eA. If e < J,
then, since АО = О,
log A - 2e)< -e < log j < s < log A +2e),f
so that \Xt — t\ < 2s. Therefore
A4.22) d(x,y)<2do(x,y) if do(x,y)<l
A comparison of A4.15) and A4.21) shows that there can be no general
inequality in the direction opposite to A4.22). The following lemma shows,
however, that do(x, y) is small if d(x, y) and w'x{8) (or w'y(d)) are both small.
LEMMA 2 Ifd(x, y) < E2, where 0 < д < ?, then do(x, y) < 46 + w'x(d).
Proof. Choose points tt satisfying A4.7) and
A4.23) wj^, U) < w'xF) + 6, i = 1, 2,.. ., r.
t If M < i> then s - s2 < log A + s), whereas log A + s) <, s for all s > — 1.
114 The Space D
Now choose from Л а [л such that
A4.24) sup, \x{t) - y(jAt)\ = supt \x(p-4) - 2/@1 < &
and
A4.25) sup, \fjLt -t\< d\
We want to define in Л а Я that will be near /л but will not, as [л may,
have chords with slopes far removed from 1. Take X to agree with /л at the
points tt and to be linear in between. Since the composition yrxX fixes the tt
and is increasing, t and [л~гХ1 always lie in the same subinterval [*,-_!, tt).
By A4.23) and A4.24), therefore,
< \x(t) - хСи-^AOI + H^Xt) - у(Щ
б + б2 < 4д
It is now enough to prove ||Я|| < 46. Since X agrees with [л at the t{, it
follows by A4.25) and the inequality tt — гг_± > б that
2d(tt - U_x).
From the polygonal character of X, it now follows that
\{Xt- Xs)- (t-s)\ <2d\t-s\
holds for all s and t. Therefore
log A - 26) < log ^-^ < log A + 26);
t — s
since б < I, \\X\\ < 46 follows.
THEOREM 14.1 The metrics d and d0 are equivalent.
Proof. Let us denote an open J-sphere by Sd(x, s) and an open ^-sphere by
Sdo(x, s). It follows by A4.22) that inside an arbitrary Sd(x, s) we can find an
Sdo(x, 6). (The choice of the new radius does not depend on the center x.)
Lemma 2 implies that, if
A4.26) б < J, 46 + w'x(d) < e,
then Sd(x, E2) c: Sd (x, e). Given x and e, we can, by A4.8), find а д satisfying
A4.26). Inside an arbitrary J0-sphere we can thus find a J-sphere with the
same center. (This time the choice of the new radius does depend on the
center x, as must be true if D is to be complete under d0—if d and d0 are to
give different classes of fundamental sequences.) Therefore d and d0 are
equivalent metrics.
The Geometry of D 115
Completeness of D
THEOREM 14.2 The space D is complete in the metric d0.
Proof. It is enough to show that each ^-fundamental sequence contains a
4,-convergent subsequence. If {xk} is a ^-fundamental sequence, it contains
a subsequence {yn} = {xkn} such that
A4-27) do(yn, yn+1) < 1.
We shall prove that {yn} is ^-convergent to some limit.
By A4.27), Л contains a /j,n such that
A4.28) ^Pt\yn(t)-yn+1(^nt)\<~-n
and
A4-29> . . ^
This implies, for m > 1,
SUP« l^n+m+lPn+m ' ' ' P
= suPs \fxn+m+1s - s\ < — .
(Here /un+m • • • /un+1/un denotes iterated composition.) For fixed n the func-
functions
A4-30)
are thus uniformly fundamental as m —> oo. Therefore A4.30) converges
uniformly to a limit
A4.31) V = limTO /лп+т • • •
The function Xn must be continuous and nondecreasing and must satisfy
Яи0 = 0, АЯ1 = 1. If we prove that ||AJ| is finite, it will follow that Xn is
strictly increasing and hence is a member of Л. By A4.20),
i ftn+m ftn+lftnt f^n+
t — S
-2n-l
Letting m -*- oo in the first member of this inequality, we find that ||An|| <
1114'1; in particular, ||AJ| is finite and ln ? Л.
By A4.31), ln = Хп+Фп. Therefore, by A4.28),
¦ у Л 1 .4 У Л 1 .41 I S 4 • N.I-J-
116 The Space D
In consequence, the functions yn{X~H), which are elements of D, are
uniformly fundamental and hence converge uniformly to a limit function
x(t). It is easy to show that x must also be an element of D. Since
supfK(V0-x@|-^0 and IIAJI-j-O, we have do(yn, x) -> 0, which
completes the proof.
Compactness in D
We turn now to the problem of characterizing the compact subsets of D.
With the modulus w'x(8) defined by A4.6), we have an analogue of the
Arzela-Ascoli theorem (p. 221).
THEOREM 14.3 A set A has compact closure in the Skorohod topology if
and only if
A4.32) sup sup \x(t)\ < oo
xeA t
and
A4.33) lim sup w'x{d) = 0.
d-*0 xeA
Note that, because of A4.9), A4.33) is weaker than
A4.34) lim sup wx(d) = 0.
d-*0 xeA
Proof. To prove the sufficiency of these conditions, it is enough, since D is
^-complete, to show they imply that A is totally bounded with respect to
d0. Let us first show that A4.32) and A4.33) imply that A is totally bounded
with respect to the metric d.
Given a positive e, choose an integer к such that I/A: < e and w'x{\jk) < e
for all x in A. Take Я to be a finite e-net in the linear interval [—a, a],
where
a = sup sup \x(f)\.
xeA t
Let В be the finite set of у in D that assume on each of the intervals [(u — \)jk,
ujk) a constant value from H and satisfy y(\) ? H. We shall show that В is a
2e-net with respect to d.
Given x in A, use the inequality w'x{ljk) < e to choose points tit 0 =
fo < h < • • • < tr = 1, such that
A4.35) U - и_, > \
к
and
A4.36) wj^, u) <s, 0<i<r.
Choose integers щ such that ujk < t{ < (wf + l)/k, i = 0, 1, . . . , r. Since
The Geometry of D 117
he щ must be distinct by A4.35), there is in Л а Я that carries ujk to tt
.nd is linear in between these points. Choose in В a point у such that
14.37)
>ince an interval
nust be inside one of the intervals
:he function x{Xt) cannot, by A4.36), vary by more than eas/ varies over an
nterval [и/к, (и + 1)/^)- Since у is constant over intervals of this form,
'Д4.37) implies \y(t) — x(Xt)\ < 2e for all t. By construction, .
щ
к
h--
и,
~k
<7<e;
к
by linearity, sup^ \Xt — t\ < e. Thus d(x, y) < 2e. Therefore В is a 2e-net
in the sense of d.
Given a positive rj, choose д so that 0 < д < ? and so that 46 + w'x(d) < 77
holds for all x in A. Then choose e so that 0 < 2e < d2. If В is the set just
constructed—a finite 2e-net for A in the sense of the metric d—then, by
Lemma 2, В is an ??-net for A in the sense of the metric du. Thus A is d0-
totally bounded and hence, since D is J0-complete, the closure A~ of A is
compact.
This proves the sufficiency of A4.32) and A4.33). If A~ is compact, then
it is bounded, and A4.32) follows immediately (note that supf \x{t)\ is the
distance, in the sense either of d or of d0, from x to the function that is
identically 0). The present theorem differs from the Arzela-Ascoli theorem
in that for no single t do
A4.38)
sup
< 00
and A4.33) together imply A4.32). It is not hard, however, to prove that
A4.32) follows if A4.33) holds and if A4.38) holds for each individual value of
t (or even just for values of t including 1 and dense in [0, 1]).
It remains to prove the necessity of A4.33). By A4.8), w'x(d) goes to 0 with
д for each x. If we prove that w'x(d) is upper semicontinuous in x for each d,
then (see p. 218) the convergence will be uniform on compact sets and A4.33)
will follow.
118 The Space D
Let x, 6, and e be given. To prove upper semicontinuity, we must find an
7] such that d(x, y) < rj implies
A4.39) w&d) < w'x(d) + e.
First choose points tt satisfying A4.7) and
A4.40) wj^_b tt) < w?d) +- i = 1, 2,. . ., r.
Now choose rj so small that
ti-ti_1>d + 2rj, i=l,2,...,r,
and
A4.41) rj < Is.
If d{x, y) < rj, then, for some X in Л, we have
supt \Xt — t\ = supt \X-~4 — t\ < rj
and
A4.42) sup, \y(t) - x{Xt)\ < rj.
Let St = Х'Чг. Then
A4.43) Si - s^ >U- tt_x -2rj>d.
Moreover, if s and t both lie in [5»_19 5^), then Я5 and Xt both lie in [^_l5 tt)
and therefore, by A4.40), A4.41), and A4.42), \y(s) - y(t)\ < <(б) + в.
Thus
wjs^b s,) < w^C) + e,
which, in view of A4.43), implies A4.39). This completes the proof of
Theorem 14.3.
A Second Characterization of Compactness
Although the modulus w'(d) is natural in that it leads to a complete charac-
characterization of compact sets, a different one is sometimes more convenient to
work with, namely
A4.44) W'x{d) = sup min{H0 - x{h)\, \x(t2) - x(t)\},
where the supremum extends over tlt t, and t2 satisfying
A4.45) ti < t < t2, t2 - ^ < д.
Given д and e, decompose [0, 1) into subintervals [^_15 s^ such that
si — si-i > ^ and Wj.^-!, st) < w'x(d) + e. If A4.45) holds, then either t±
and t2 He in the same subinterval [5<_ь ^), in which case \x(t) — ж(?1)| <
w^.F) + e and \x(t2) — ж(?)| < w'x(d) + e, or else they lie in abutting intervals
The Geometry of D 119
'i-i> sd and IAj'Vh)' ш which case \x(t) — х(^)\ < wx(d) + e for
< > < ^ and |z(f2) — a;(f)| < wxF) + s for st < t < t2. Therefore
14.46) W'x(d) < Wx(d).
There can be no inequality in the other direction. If, for example,
A for 0<K n~\
14.47) xn(t) =
lO for n-^<t< 1,
rif
@ for 0 < t < 1 - n-1,
14.48) *я@ =
(l for 1-rKK 1,
tien wXn(d) = 0, whereas
, ... @ if б < и,
W-E) = (l if 6>«-
dnce {жэт} has compact closure in neither case, there can be no compactness
ondition involving (in addition to A4.32)) a restriction on w'x(d) alone. It is
•ossible, however, to formulate a compactness condition in terms of w'x(d)
.nd the behavior of x near 0 and 1.
THEOREM 14.4 A set A has compact closure in the Skorohod topology
f and only if
14.49) supsup|a;@| < oo
xnd
'lim sup W(8) = 0,
<5-»0
lim sup wx[0, 6) = 0,
d-*0 xeA
lim sup wjl — 6,1) = 0.
14.50)
proof. It suffices, in view of Theorem 14.3, to show that A4.50) is equivalent
:o
14.51) lim sup w'(<3) = 0.
d->0 xeA
That A4.51) implies A4.50) follows by A4.46) and the definition of w'(d).
We prove the reverse implication.
Given a positive e, choose a positive б such that, for all x in A,
A4.52) иДО) < e, wJO, 6) < e, W(>[1 - Л, 1)< e.
Assume that ж lies in A; we shall show that
A4.53) m/Q<5) < 6e,
which will suffice to prove the theorem.
120 The Space D
Let us first show that
A4.54) h<s <t<t2, t2 - tx < 6
implies
A4.55) min {|x(j) - x(tl)\, \x{t2) - x(t)\} < 2s.
Indeed, if \x(s) — x(tx)\ > s, then, by A4.52), \x(t) — x(s)\ < s and
\x(t2) — x(s)\ < s, so that \x(t2) — x(t)\ < 2e.
Suppose x has a jump exceeding 2e at each of two points т1 and r2. If
0 < r2 — TX < 6, then there exist points f1} s, t, t2 satisfying A4.54),
t1 < r1 = 5, and ? < r2 = ?2; if t1 is close enough to rx, and if ? is close
enough to т2, then A4.55) is violated, which we have seen is impossible.
Thus [0, 1] cannot contain two points, within д of each other, at each of
which ж jumps by more than 2e. And, by A4.52), neither [0, d) nor [1 — d, 1)
can contain a point at which x jumps by more than 2s.
Thus there exist points si} with 0 = s0 < s1 < • • • < sr = 1, such that
^ ~ si-i >: <5 and such that any point at which x jumps by more than 2e
is one of the st. If s§ — s^ > б for a pair of adjacent points, enlarge the
system {^J by including their midpoint. Continuing in this way leads to a
new, enlarged system s0, . . . , sr (with a new r) that satisfies
i<3 < st - ^_! < б, г = 1, 2, . . . , r.
Now A4.53) will follow if we show that
A4.56) щЬы, s^ < 6e
for each i. Suppose ^_х < tx < ?2 < 54. Then ?2 — t1 < 6. Let (Ti be the
supremum of those a in [tx, t2] for which supii<u<CT \x(u) — ж(^)| < 2s; let
or2 be the infimum of those a in [tx, t2] for which supa<u<<2 \x(t2) — x(u)\ <
2s. If o1 < cr2, there exist points s just to the right of аъ with |жE) — ж(^)| >
2e and there exist points t just to the left of a2 with \x(t2) — ж(/)| > 2s; since
we may ensure s < t, this contradicts the fact that A4.54) implies A4.55).
Therefore a2 < alt and it follows that \x{o1 —) — x(t^)\ < 2e and
|ж(?2) — x(a^)\ < 2e. Since t1 <. <y1 < t2, ax is interior to (^_15 st), so the
jump at a-L is at most 2s. Thus |ж(?2) — x{t^)\ < 6e. This establishes A4.56),
which proves A4.53) and the theorem.f
Finite-Dimensional Sets
Finite-dimensional sets play in D the same role they do in C. For points
tu . . . , tk in [0, 1], define the natural projection TTtv,.tk from D to Rk as
t The points 0 and 1 play a special role in the theory of the space D because each X in
Л fixes them. Judicious enlargement of Л might simplify the theory.
The Geometry of D 121
usual:
A4.57) ?rtl...tk(x) = {<h),...,x(tk)).
Now tt0 and ттх are everywhere continuous. Suppose 0 < t < 1. If points
xn converge to x in the Skorohod topology and x is continuous at t, then
(p. 112) xn(t) —>¦ x(t). Suppose, on the other hand, that x is discontinuous at
t. If Xn is the element of Л that is linear on [0, t] and on [t, 1] and satisfies
Xnt = t — \jn, and if xn{s) = x(kns), then xn converges to x in the Skorohod
topology but xn(t) does not converge to x(t). Thus: If 0 < t < 1, then vt
is continuous at x if and only if x is continuous at t.
We must prove that irti...tk *s measurable with respect to the cr-field 2
of Borel sets for the Skorohod topology. We need consider only a single time
point t (since a mapping into Rk is measurable if each component mapping is),
and we may assume t < 1 (since тгх is continuous). If xn converges to x
in the Skorohod topology, then xn(s) -> x(s) for continuity points s of x and
hence for points s outside a set of Lebesgue measure 0. Since the xn are
uniformly bounded,
A4.58) - zn(s)ds-^- x(s)ds (n -+ oo)
e Jt e Jt
for each positive s. Thus he{x) = e J*+? ж(,у) ds is continuous in the Skorohod
topology. By right-continuity, he(x) -> ттг(х) for each x as e -> 0. Thus
тгг(а;) is measurable.
Having proved тг^...^ measurable, we may, as in C, define the finite-
dimensional sets as sets of the form тгг1 , H with к > 1 and // e «^fc. Each
finite-dimensional set lies in 2 by the definition of measurability.
If To is a subset of [0, 1], let J^,o be the class of sets ttj^^H, where к is
arbitrary, the tt are arbitrary points of To, and Я e Mk. Then J^,o is a
(finitely additive) field. Of course, J7^ n is the class of finite-dimensional
sets.
THEOREM 14.5 IfT0 contains 1 and is dense in [0, 1], then 3FT generates
9.
These conditions are also necessary in order that J^y generate 3),
although we shall not need this fact. Taking To = [0, 1] makes the theorem
no easier to prove.
Proof. Since D is separable, it suffices to show that each open <io-sphere
Sdo(x, r) lies in the a-ueld generated by ^Tq. Fix the center x and radius r.
Choose in To a sequence tlf t2, . . . that is dense in [0, 1], with tx = 1. For
0 < e < r and к > 1, let Ak(e) consist of those у for which there exists in Л
а Я satisfying
IIAll < r - e
and
max \y(ti) — x(Xtf)\ < r — s.
1<г<к
It is enough to prove that
A4.59) Sdo(x, r) = U C\Ak(e),
E fc=l
where the union extends over the rational e in @, r) and to prove that each
Ak(s) lies in J^.
We prove the second fact first. For fixed e and k, let Нг be the set of
points (x(Xt^), . . . , x(Xtk)} in Rk, where X ranges over those functions in Л
satisfying ||A|| < r — e. Let H2 be the set of points (al5 . . . , afc) in Rk such
that \x{ — &| < r — e, i = 1, . . . , k, for some (/?l5 . . . , Ck) in J^. Then /f2
is open and hence lies in Mk, and Ak(e) = ttj^ tjH2. Thus Ak(s) g^Tq-
It is easy to see that the left member of A4.59) is. contained in the right
member. We complete the proof by showing that
A4.60) П Ak(e) cz Sdo(x, r).
/c=i
If у lies in the intersection on the left, choose for each к a Xk in Л with
A4.61) ||AJ <r-e
and
A4.62) max \уAг) - х(Хк1г)\ < г - e.
1<г<к
By Helly's selection theorem (p. 227), there is a subsequence {Xk>} and a
nondecreasing function X such that
A4.63) limfc. Xk,t = Xt
holds for continuity points t of A. We shall show that X еЛ, that ||A|| <
r — e, and that supt |y@ — ж(Я/)| < r — e. This will imply do(x, у) < г and
prove A4.60).
If j and ? are distinct continuity points of X, then, by A4.61),
A4.64)
log
Xt - Xs
t — s
log
Xk.t - Xk,s
t — s
< r
This relation makes a jump in X impossible (in particular, A4.63) holds for
all t) and it implies that X is strictly increasing; thus X еЛ. The inequality
A4.64) also implies ||Я|| < r - e. By A4.62), \y(tx) - x{t^\ <r - e (recall
that tx = 1). If / > 1, then, by A4.62), \y(tt) - х(Хк4г)\ < r - e for k! > i.
Since Xvt -> Af, we have either \y(t{) — x{Xt^)\ < r — e or |?/(Гг.) — x((Xtt) — )| <
r — e and, since {fj is dense, sup4 |y(f) — a;(Af)| < r — г follows. This
proves Theorem 14.5.
Since Jryo is always a field, this theorem implies that, if To contains 1
and is dense in [0, 1], then 3FT^ is a determining class. Not even ^"[0>1] is a
convergence-determining class—the counterexamples for С apply equally well
here.
If P is a probability measure on (D, Of), its finite-dimensional distributions
are the measures P^^..tk- If To contains 1 and is dense in [0, 1], then P is
completely determined by its finite-dimensional distributions for time
points in To.
We shall use in D the same conventions about coordinate variables we
used in С—the conventions set out at the end of Section 8.
Remarks. The topology studied here is the Jx topology of Skorohod A956); see also
Kolmogorov A956), Prohorov A956), and Skorohod A961).
PROBLEMS
1. Let D+ be the class of functions on [0, 1] that have only discontinuities of the first kind
in @, 1) and have right-hand limits at 0 and left-hand limits at 1. Convert D+ into a pseudo-
metric space in such a way that (i) x and у are at distance 0 if and only if they agree at their
common continuity points and (ii) D is isometric with the standard quotient space [Kelley
A955, p. 123)].
2. The Skorohod topology is finer than that given by the metric J \x(t) — y{t)\ dt.
3. Under the Skorohod topology and pointwise addition of functions, D is not a topo-
logical group.
4. The set С is nowhere dense in D.
5. Put
w%{8) — sup sup min {wx{t^, t), wx{t, f2)}.
ti-t1<d t1<t<ti
Show that
wx'(d) — sup inf max {wx{t^, t), wx(t, t2)}
ti-t1<d t1<t<ti
and
и?(<5) < <E) ? 2и?(<5).
15. WEAK CONVERGENCE AND TIGHTNESS IN D
Prove weak convergence in function space by proving weak convergence of
the finite-dimensional distributions and then proving tightness—this was the
technique so useful in C, and we want to adapt it to D. Since D is separable
and complete under the metric d0, a family of probability measures on (D, Sf)
is relatively compact if and only if it is tight, so there is no difficulty on that
point. On the other hand, the fact that the natural projections are not
continuous complicates matters somewhat.
Finite-Dimensional Distributions
For a probability measure P on (D, Sf), let TP consist of those t in [0, 1] for
which the projection щ is continuous except at points forming a set of
124 The Space D
P-measure 0. The points 0 and 1 always lie in TP. If 0 < t < 1, then t e TP
if and only if P(Jt) = 0, where
A5.1) Jt= {x:x(t) ^ x(t-)}
is the set of x that are discontinuous at t—recall (p. 121) that, for
0 < t < 1, TTt is continuous at x if and only if x is continuous at t.
An element of D has at most countably many jumps. Let us prove the
corresponding fact that P(Jt) > 0 is possible for at most countably many t.
Let Jt(e) be the set of x having at t a jump \x(t) — x(t—)| exceeding s. For
fixed positive e and д there can be at most finitely many points t for which
P(jt(ej) > d, since if this inequality held for a sequence of distinct points
t1} t2, . . . , then the set lim supn Jin{^) would have measure at least д and
hence would be nonempty, contradicting the fact that for each x the saltus
can exceed e at only finitely many places. For a fixed positive e, therefore,
P(jt(e)) can exceed 0 for at most countably many t. Since P(Jt(e)) f P(Jt) as
e I 0, the result follows.
Thus: TP contains 0 and 1 and its complement in [0, 1] is at most countable.
If tlf . . . , tk all lie in TP, then Trtl...tk is continuous except on a set of
P-measure 0.
Suppose
A5.2) Pn=>P,
where Pn and P are probability measures on (D, Sf)\ by Theorem 5.1,
A5-3) W-*=^U
holds if all the tt lie in TP. In general, A5.3) will not follow from A5.2) if
some tt lies outside TP: If P is a unit mass at the function /[0 i) and Pn a.
i / h P hld hil Р~х Р1 fil
[0 )
unit mass at /[0>^+n-i), then Pn =>P holds while Рптт~х ^-Рщ1 fails.
Let us now prove the analogue of Theorem 8.1.
THEOREM 15.1 If {PJ is tight and if Pn^..tk=>Рщ^к holds whenever
tlt . . . , tk all lie in TP, then Pn =>P.
Proof. By tightness, each subsequence {Pn>} contains a further subsequence
{Pn"} converging weakly to some limit Q. It is enough (Theorem 2.3) to
show that Q always coincides with P.
If 'i h all lie in TP n TQ, we have Pn-'lfTi..t]c=>P'n7i..tk ЪУ tne
hypothesis and Pn»TTj^,.tk => C?77^1-.^ because Pn» => Q. Thus
A5.4) Pir?.tk = Qt«7"-**> ^ . . . , ffc e Tp П To.
Since ГР and Го each contain all but countably many points of [0, 1], the
same is true of TP C\TQ\ in particular, this intersection is dense. Since
TP П TQ contains 1, it follows by Theorem 14.5 that 2 is generated by the
Weak Convergence and Tightness in D 125
finite-dimensional sets based on time points in TP П TQ. Thus P = Q
follows from A5.4), which completes the proof.
Tightness
The analysis of tightness in С began with a result (Theorem 8.2) which
substituted for compactness its Arzela-Ascoli characterization. Theorem
14.3, which characterizes compactness in D, leads in exactly the same way to
the following result. Let {Pn} be a sequence of probability measures on
THEOREM 15.2 The sequence {Pn} is tight if and only if these two conditions
hold:
(i) For each positive rj, there exists an a such that
A5.5) i\>:suPi KOI > a} < rj, n > 1.
(ii) For each positive e and rj, there exist a d, 0 < д < 1, and an integer nQ
such that
A5.6) Pn{x:<(d)>e}<ri, n > n0.
Using Theorem 14.4 in place of Theorem 14.3 gives a second set of
conditions for tightness.
THEOREM 15.3 The sequence {Pn} is tight if and only if these two
conditions hold:
(i) For each positive rj, there exists an a such that
Pn{x:snpt\x(t)\ >a}<V, n > 1.
(ii) For each positive s and rj, there exist a 6, 0 < д < 1, and an integer
n0 such that
A5.7) Pn{^<F)>e}<rj, n>n0,
such that
A5.8) Pn{x:wx[0,d)>e}<rj, n > no,
and such that
A5.9) i\>:wjl - d, 1) > e} < rj, n> n0.
In the most interesting cases, A5.8) and A5.9) are automatically satisfied,
as the next theorem shows. Let Pn and P be probability measures on {D,3f)\
by the definition A5.1), Л = {x:x(l) ^ z(l-)}.
126 The Space D
THEOREM 15.4 Suppose that
A5.10) Pn^t-tk^P^Tr-tk
holds whenever t1} . . . , tk all lie in TP. Suppose further that Р(Л) = О.
Suppose finally that, for each positive e and rj, there exist a 6, 0 < д < 1,
and an integer n0 such that
A5.11) Pn{x--<V)>e}<r], n>n0.
Proof. It will suffice, by Theorem 15.1, to show that {Pn} is tight. We
first verify condition (i) of the preceding theorem. Given a positive rj,
choose д and n0 satisfying A5.11) with e = 1 (say). Since TP is dense, it
contains points tx,. . . , tk, with 0 = /x < • • • < tk = 1, such that tt — ti_1 <
6. From A5.10) it follows that {PnTTj^..tk} is a tight sequence in Rk and
hence that
A5.12) pJx : max |<OI > a<\ < rj, n> 1,
I l<i<fc J
for an appropriately chosen a0. If \х(^г)\ < a0, i = 1,. . . , k, and if w"x{b) < 1,
then \x{f)\ < a0 + 1 for all t. If a = a0 + 1, then by A5.12) and A5.11)
(with e = 1) we have Pn{x:supt \x(t)\ > a} < 2rj for n ^ n0. For the
finitely many n preceding n0, we can ensure that this inequality holds by
increasing a, each individual Pn being tight. This establishes condition (i).
As for condition (ii) of the preceding theorem, we must find, given s and
rj, а д and n0 satisfying A5.8) and A5.9). Consider A5.8). Since each x in
D is right-continuous, we have
A5.13) P{x:\x(d) - z@)| >e}<rj
for all sufficiently small д. Choose in the dense set TP а. д small enough that
A5.13) holds and small enough that A5.11) holds for an appropriate n0.
Since 0 and д both lie in TP, we have Рптт~\ ^Ртт^ as a special case of
A5.10), and it follows by A5.13) that
A5.14) Pn{x--№)-x@)\>e}<ri
for all sufficiently large n. If w"x(d) < e and \x(d) — z@)| < e, then
\x(s) - x@)\ < 2e for all s in [0, 6], and hence wJO, 6] < As. Thus A5.11)
and A5.14) together imply Pn{x:wx[0, d) > 4e} < 2rj. This takes care of
A5.8).
. With one change, the symmetric argument works for A5.9). In place of
A5.13), we need
A5.15) P{x:\x(\) - x(l -6)\>e}<rj
Weak Convergence and Tightness in D 127
for small д. Since an element of D need not be left-continuous, A5.15) will
notin general hold; since we have assumed Р(Л) = 0, x is left-continuous at
t = 1 except for x in a set of P-measure 0, and A5.15) does hold for all
sufficiently small <5.
This completes the proof of Theorem 15.4. The condition P(Jx) = 0 is
essential: If P is a unit mass at the function /{1} and Pn is a unit mass at
/ n-i1 then all the finite-dimensional distributions converge and the
condition involving A5.11) holds, but Pn does not converge weakly to P.
Since
lim P{x:\x(l-) - x(t)\ > e} = 0, e > 0,
we have Р(Л) = О if and only if
A5.16) limP{x:\x(l)- x(t)\ > e} = 0, e > 0.
It is usually easy to check the condition Р(Л) =-0 via A5.1,6).
Our next result shows what happens if in place of w'x(8) or w'x(d) we use
the modulus wx(d) appropriate to C.
THEOREM 15.5 Suppose that, for each positive rj, there exists an a such
that
A5.17) Рп{х:\хф)\>а}<4, » > 1.
Suppose further that, for each positive s and rj, there exist a d, 0 < д < 1,
and an integer n0, such that
A5.18) Pn{x--wx(d)>e}<y, " > n0.
Then {Pn} is tight, and, if P is the weak limit of a subsequence {Pn>}, then
Proof. Since w'x(d) < wxBd) for д < | (see A4.9)), condition (ii) of Theorem
15.2 is satisfied. That condition (i) of that theorem is satisfied follows easily
from the hypotheses here and the inequality
KOI < l*@)| + 2 \x(itlk) - x((i - l)tlk)\.
Hence {PJ is tight.
If wy(%d) > 2e, then у is interior to {x:wx(d) > e}; Pn,^>P therefore
implies
A5.19) P{y:wy(\d) > 2e} < lim infn, Pn.{x:wx(d) > e}.
Given e and rj, choose д and n0 so that A5.18) holds; it follows by A5.19)
that P{y:wy(^d) > 2e} < rj. For each k, therefore, there exists a positive dk
128 The Space D
such that, if Ak = {y:wy(dk) > I/A:}, thenP(^fc) < I/A:. Put A = lim inffc Ak;
then P(A) = 0, and x ф A implies lim^ wx(d) = 0, so that P(C) = 1,
which proves the theorem.
In applications, the hypothesis A5.18) is generally verified by using the
corollary to Theorem 8.3, which carries over to D with no change. Note that,
if the n0 in A5.18) is 1 for every e and r), then Pn(C) = 1 for all n.
Random Elements of D
A random element of D we shall often call a random function. It will always
be clear from the context whether a random function under consideration
is a random element of С or of D.
If X is a mapping from (П, g§, P) into D, then, for each со, X(co) is an
element of D whose value at t we denote by X(t, со). For each t, X(t) denotes
the real function TTtX on Q.—its value at со is X(t, со). Just as in the space C,
X is a random element of D {X~X3) <==¦ ?8) if and only if, for each t, X(t)
is a random variable (use Theorem 14.5).
A sequence {Xn} of random elements of D is by definition tight when the
sequence of corresponding distributions is tight. Each of Theorems 15.1
through 15.5 can be routinely reformulated in terms of random functions.
A Criterion for Convergence
Let us now combine the results of this section with those of Section 12 to
obtain concrete criteria for convergence in distribution.
Let Xn and X be random elements of D. Write Tx for TP, where P is the
distribution of X. Thus Tx contains 0 and 1, and, if 0 < t < 1, t e Tx if
and only if P{X(t) Ф X(t-)} = 0. We shall write w"(X, d) in place of
W'x(d).
THEOREM 15.6 Suppose that
A5.20) (Xn(tl), ..., Xn(tk)) *+ (X(t,),..., X(tk))
holds whenever tlt . . . , tk all lie in Tx; that Ppf(l) Ф Щ—)} = 0; and that
A5.21) P{\Xn(t) - XM\ > I, \XM ~ *«@l > *} < Jy №) - F(tl)r
for t-l<t<t2 and n > 1, where у > 0, a > ^, and F is a nondecreasing,
continuous function on [0, 1]. Then Xn-^-X.
There is a more restrictive version of A5.21) involving moments, namely
Weak Convergence and Tightness in D 129
Proof. By Theorem 15.4, it suffices to show that for each positive e and 77
;here exists a positive д such that
A5.22) P{w"(Xn, 6)>e}<r}
holds for all n. For fixed т, д, and n, and for m a positive integer, consider
the random variables
(A ^ ОДЛ ? — Y^ I _J_ _— A1 Y I _J_ ' J5 \ * __ 1 о
"\ m / \ m J
Let
M'm = max min
where the maximum extends over 0 < i <y < к < m. By A5.21) and
Theorem 12.5,
A5.24) ?{M"n > e} < f [F(t + S) - F(r)]fc,
where К = К">сс depends only on у and a.
Put
A5.25) w"(Xn, [t, r + d]) = sup min {]Zn@ - Xn(^)l, !^n(^ - *n@!},
where the supremum extends over t±,t,t2 satisfying r < tx < t < ?2 < т + б.
Since Zn is a right-continuous function on [0, 1], letting m -> 00 in A5.24)
yields
A5.26) P{w"(Xn, [r, r + d]) > e} < * [F(t + Й) - F(r)]fc.
e '
Suppose now that д = l/Bu) is the reciprocal of an even integer, and
suppose that
A5.27) w"(Xn, [lid, Bi + 2)d]) <e, 0<i<u-l,
and
A5.28) w"(Xn, [Bi + 1N, Bi + 3K]) < e, 0 < 1 < и - 2.
If h < ? < ?2 and ^2 ~ ^1 ^ ^ > then ?x and /2 both lie in some one of the
2m — 1 intervals involved in A5.27) and A5.28), so that
min {|ЛГЯ(О - Xn(tJ\, \XM - Xn(t)\} < e.
Thus A5.27) and A5.28) together imply w"(Xn, 6) ? e. It now follows by
130 The Space D
A5.26) that
A5.29) P{w"(*n, d) > e} < ^ (Z' + I"),
e '
where each of 2' and 2" is a sum of the form
with 0 < tx < • • - < tr < 1 and tk - tk_x < 26. But then
A5.30) ?{W'{Xn, d)>e}< ±f [F(l) - F@)][w A
ey
Since 2a > 1 and F is continuous, the right member of this inequality goes
to 0 with d, which proves A5.22).
Criteria for Existence^
These ideas lead also to a condition for the existence in D of a random
element with specified finite-dimensional distributions. As in Theorem 12.4,
for each /c-tuple tx, . . . , tk of points of [0, 1], let pti.m.tk be a probability
measure on (Rk, Mk), and assume these measures satisfy the consistency
conditions of Kolmogorov's existence theorem.
THEOREM 15.7 There exists in D a random element with finite-dimensional
distributions (itl...t]c, provided these distributions are consistent; provided
i
for t1<.t<. t2, where у > 0, a > J, and F is a nondecreasing, continuous
function on [0, 1]; and provided
A5 32) lim AW»{@i» ^2) : IA - P*\ > e} = 0, 0 < t
4 J h|0
There is a more restrictive version of condition A5.31):
f - &Г \Pt - W d^tttfi, P,
t
JK
Proof. It is easy to construct for each n a random function Xn which is
constant over each interval [(/ — 1J~", f2~") and for which the joint
distribution of
t This topic may be omitted.
Weak Convergence and Tightness in D 131
is /лгошшЛк with к = 2n and tt = i2~n. We shall prove that {XJ is tight. We
first" show that, for given e and r\, there exists а д such that
A5.33) PK(In) <5) > ?} < v
holds for sufficiently large n.
If fl5 ?, and f2 are integral multiples of 2~n and tx < f < t2, then, by
A5.31),
P{|Xn@ - XM\ > A, |Xn(f2) - Xn(f)| > Я} <
Consider again the variables A5.23). If the points т + jbjm,j = 0,1,. . . , m,
are all integral multiples of 2~n, then A5.24) follows as before. If т and <5 are
integral multiples of 2~n, then the same is true of the points т +/<5/m,
provided m = <52n. But, since ^n is constant over each interval
[(i - lJ-n, i2-n), M"m for m = <52n is just the quantity defined by A5.25).
Thus A5.26) holds if т and <5 are both integral multiples of 2~n.
Suppose now that <5 = 1/2V for an integer v > 1. If n > t>, then <5 is an inte-
integral multiple of 2~n and so are the endpoints of all the intervals involved in
A5.27) and A5.28). It follows as before that A5.30) holds for n > v. Thus
there does exist a <5 such that A5.33) holds for all large n.
The Xn will be tight if their distributions Pn satisfy the hypotheses of
Theorem 15.3. Now A5.7) is satisfied because of A5.33). If 2~k < <5, then
sup \Xn(t)\ < max
Л2Ч
d).
Since the distributions of the first term on the right all coincide for n > k, it
follows by A5.33) that condition (i) of Theorem 15.3 is satisfied.
To take care of A5.8) and A5.9) we make the temporary assumption that,
for some positive <50, h < <50 implies
A5.34) fAM{(filt h)-h = ]8«} = 1, A*i.^{(]8i, h)-h = h) = 1.
With this assumption, A5.8) and A5.9) certainly hold, so that {Xn} is
tight. If X is the limit in distribution of some subsequence, then (^(^1), . . . ,
X(tk)) has distribution fth...tk for dyadic rational tt, and the general case
follows via A5.32) by approximation from above.
It remains to remove the restriction A5.34). Take <50 < \, put
0 for 0<K <50,
^7 for **?*?!-**>
^7
2d0
1 for 1 - <50 < t < 1,
132 The Space D
and define vh...tk as ^...s* witn si = Ah)- Tnen tne r*i-** satisfy the
conditions of the theorem (with a new F), as well as the special condition
A5.34), so that there is a random element Y of D with these finite-
dimensional distributions. We now need only define X by X(t) =
Y(d0 + t(\ - 2<5O)), 0 < t < 1.
As an illustration of Theorem 15.7, let us use it to prove the existence in D
of a random element describing a Markov chain in continuous time. Let
Pijit), i,j = 1, 2,... be nonnegative numbers defined for t > 0 and satisfying
^iPuiO = ! and ^чРМрл{г) = pik(s + 0- Let pit i = 1, 2, . . . , be non-
negative numbers satisfying 2^=1. Define Pj@) = p, and p^t) =
^iPiPij(t) for f > 0.
Under the regularity assumption that
A5.35) lim Pit(t) = 1
t->o
holds uniformly in i, we shall prove the existence in D of a random element
X with finite-dimensional distributions specified by
A5.36) P{X(O = iu, и = 0, 1,. . . , m} = Х
for 0 < f0 < tx < • • • < tm < 1. The consistency of the finite-dimensional
distributions implied by A5.36) is easy to establish.
We first prove the existence of a positive К such that
A5.37) 0<l -Pii{t)<Kt
for all i and all positive t. Because of the uniformity in A5.35), whatever s,
there is a <5 such that pu{s) > 1 — e for all i if s < 2<5. If t <, <5, then
<5 < mt < 26 for some positive integer m, and we have
m—2
+ 2 [Ри(ОЗг2 А/'Ы(т - / - 1H
Z=0
m—2
2
г=0
Take ? = i; it follows (since log и < и — 1) that m(l - pu(t)) < log 2 < 1.
Take ,? = d-1. Since mt > 6, it follows that A5.37) holds for t < д. But the
inequality is trivial for t > <5.
If A5.36) holds, then
A5.38)
where d1 = t — t1 and <52 = t2 — t and where the summation extends over
Weak Convergence and Tightness in D 133
. i and over those/ and к for which i Ф j and/ Ф k. By A5.37), the sum in
5.38) is at most КЧ^ < К2(д1 + <52J. Thus the finite-dimensional
stributions implied by A5.36) satisfy A5.31) with у = 0, a = 1, and
0 = Kt. Since A5.36) and A5.37) also imply ?{X(t + h) И X(t)} <, Kh,
5.32) is also satisfied. Thus some random element X of D satisfies A5.36).
ince the possible values of X{i) are at mutual distance at least 1 on the line,
is, with probability 1, a step function constant over intervals.)
ther Criteria^
le function .Fin Theorem 15.6 was assumed nondecreasing and continuous.
re can replace the assumption of continuity by that of continuity from the
ght, provided we strengthen A5.21) to
5.39) P{|Xn@ - XM\ > A, \Xn(t2) - Xn(t)\ >
<
ideed, A5.39) and Theorem 12.6 yield, in place of A5.24), the stronger
lequality
f
¦5.40) ?{M"m > e} < f [F(t + 6) - F(t)]
e '
x mm
m
F(t + <5) - F(t)
,et J(t, t + <5] denote the maximum jump of F in the interval (т, т + <5];
15.40) implies
15.41) P{M1 > ?} ^ fy lF(r
The right member of this inequality is to be interpreted as 0 if F(t) =
Х- + «).)
Exactly as before, we obtain A5.29), where now each of 2' and 2" is a
um of the form
15.42)
vith 0 < tr < • • • < tT < 1 and tk — ^_x ^ 25. To prove A5.22) it therefore
uffices to show that, for small enough <5, all sums A5.42) are less than
7o = r]e2y/2K. Write Afc for F(tk) - Fit^) and Jk for J(tk_x, tk]. Since a > J-
' This topic may be omitted.
134 The Space D
and the Afc add to at most F(\) - F@),
2 АЛА, - Jk)' <
к
But the left member of this inequality is the sum A5.42), and hence it is
enough to show that, for small enough <5, 0 < t — s < 2<5 implies
A5.43) F(t)- F(s) - J(s, t]<m = -
Since i7 is an element of D, there exist (see A4.8)) points si} with 0 =
so < si < ' ' ' < Si = 1, such that w^lX-i, st) < \r}x. Take 2<5 smaller than
the minimum of the Бг — s^. \f t — s < 2<5, then either s and f lie in a
common interval [5г-_15 ^г), in which case the left member of A5.43) is less
than ^t]x, or else they lie in adjacent intervals [^_х, s{) and [st, si+1), in which
case it is at most F(t) — F(sJ + Ял-) - F(s) < r\x.
Thus A5.39) for a right-continuous F implies A5.22). The rest of the
proof goes through as before, and we can conclude that Xn ^> X. In the same
way, we can in Theorem 15.7 take F to be continuous from the right if we
replace A5.31) by
A5.44) ^„.{(A, /5, /52): \fi - &| > A, |ft - /5| > Я}
Separable Stochastic Processes^;
According to Theorem 9.2, if the finite-dimensional distributions of a
separable stochastic process can be realized as the finite-dimensional
distributions of a probability measure on (C, ^), then the sample paths of
the process are continuous with probability 1. We shall prove a similar
result for the space D.
Let {|t:0<f<l}bea stochastic process and let P be a probability
measure on (D,@). Assume that the process and the measure have the same
finite-dimensional distributions. Assume further that the process is separable,
so that (9.16) holds for every a and /5 and every open interval /. It is no
restriction to assume 0 e To, in which case (9.16) holds also if / has the
form [0, s).
We now construct sets serving the same function as the sets (9.20) did in
the arguments in Section 9. This time let V denote the general system
A5.45) V: k;rlt...t rk; slt . . . , sk; ax, . . . , afc,
t This topic may be omitted.
Weak Convergence and Tightness in D 135
iere к is an arbitrary integer, where the ri} si} and a^ are rational, and
lerer
0 = ri < Si < r2 < S2 < " - * < rk < sk = *•
efine
к
5.46) UTo{V, e) = П {со: a, < |t(co) < a, + ?, f e (rit s<) П To}
A;
П П {co:min {аг._х, aj < |f(co) < max {а<_ь aj + e, f e (s^, rt-) П To};
the first intersection, we replace (r1} 5X) by [rl5 sx) (for f = 1). Let "Гы
;note the class of systems A5.45) that have a fixed value of к and satisfy
— jj_a < <5, / = 2, . . . , k. And now define
5.47) ПТо=ПиП U nTo(V,e),
e к 6 УеУкб
here ? and <5 are restricted to positive rationale. Finally, define Q.T(V, e)
id пт by substituting T = [0, 1] for To in A5.46) and A5.47).
It is not hard to show that, if со е ?2 T, then the sample path corresponding
> со is right-continuous at t = 0, has a left-hand limit at t = 1, and is
mtinuous in @, 1) except possibly for discontinuities of the first kind. And
the sample path has this property, then a> e ?1T, as follows by Lemma 1 of
sction 14 (applied to the sample path normalized to be right-continuous).
We shall prove that пт e dS and ?{пт) = 1. The relations (9.22) and
\Tf> e 3S follow by the same arguments as in Section 9; hence ?2T e SS and
(?2T) = P(l2To), and it suffices to prove
.5.48) P(QTe) = 1.
With (t1} t2, . . . ) an enumeration of the points in To, define q>:Q. -*¦ i?00
nd xp: D — R« by 9>(a>) = (|4l(a>), |ti(a>),... ) and xp(x) = (x^), x(t2),...).
jst as before, (9.25) holds for every H in ^?co, where now P denotes the
leasure on D (not C) having the finite-dimensional distributions of {|J.
For the system A5.45) and s > 0, let HTo(V, e) consist of those points
= (zi> z2, . . . ) in i?OT such that, first, for each i = 1, . . . , k, the inequality
a* < zu < a, + s
olds for every м for which fu e (ri5 5г) (or fu e [rl5 jj in the case i = l) and,
;cond, for each i = 2, . . . k, the inequality
min {а^_х, aj < zu < max {а^_х, aj + e
olds for every и for which tu e (^_1} гг). Now define
ЯТо=ПиП U HTo(V,e).
г к 5 Ve-Tkd
136 The Space D
Then HTo(V, e) and HTo lie in ^co, <p-lHTo(V, e) = &Tq(V, e), and
у-хНТо = пТо. By (9.25) we have P(ftT(j) = Р(уг1нт)> and> since
\p-xHT] = D, A5.48) follows.
We have proved this result:
THEOREM 15.8 If {?t:0 < t < 1} is a separable stochastic process, and if
there exists on (D, Of) a probability measure having the same finite-dimensional
distributions as {|J, then the sample paths, with probability 1, are right-
continuous at t = 0, have left-hand limits at t = 1, and are continuous on
@, 1) except possibly for discontinuities of the first kind.
An example shows that it is not possible to go on and prove that the paths
are right-continuous and hence lie in D (with probability 1): Define
if 0 < t <, со
if со < t < 1,
where со is drawn uniformly from П = @, 1). This points up the fact that in
taking the elements of D to be right-continuous we were simply adopting a
convention.
PROBLEMS
1. Show that Theorem 15.4 remains true if we only require that A5.10) hold for all
fx,. . . , tk in To, where To is dense in [0, 1] and contains 0 and 1. Show that it fails if
0 $ To or if 1 $ To.
2. The condition A5.32) in Theorem 15.7 is necessary.
3. If a random element X of D has the property that
lim sup - ?{\X(t + <5) - X(t)\ > e} = 0
°
for every positive e, then P{XE C} = 1.
4. Use Theorem 15.7 and Problem 3 to show that distributions Htl-~tk can be realized
as the finite-dimensional distributions of a random element of С if A5.31) holds (where
у > 0, a > i, and F is nondecreasing and continuous) and if
lim sup - Mt.t+sWi, &) : 102 - 0il > ?) = 0
<5-*0 0 °
for every positive e. There is the analogous result with A5.44) in place of A5.31). Use
Theorem 9.2 to derive conditions for the continuity of sample paths for a separable process.
(Compare Problem 8 in Section 12.)
5. Use Theorem 15.7 to construct a Poisson process in D. Now make a direct construc-
construction, starting with independent, exponentially distributed random variables.
Applications 137
16. APPLICATIONS
Donskefs Theorem
In some ways, the space D is more convenient than С for the formulation
of Donsker's theorem. Given random variables |x, |2, ... on (Q, ?8, P),
with partial sums Sn = |x + • ¦ • + ln, let ^n(co) be the function in D with
value
J_
Ту/П
A6.1) Xn(f, со) = — S[n4](co)
at f. Since each Xn{i) is a random variable, Xn is a random function—a
random element of D.
We want to prove, under suitable conditions, that the distribution of Xn
converges weakly to Wiener measure W. Now W is defined on (C, ^),
but it is easily extended to (D,@): Since Ce^, and since the relative
Skorohod topology in С coincides with the uniform topology there, Ae@
implies А Г\ С e^. We may therefore extend W to (D, Of) by giving it the
value W(A П C) for A in Э). Of course, С supports W. From now on we
shall interpret W as this probability measure on (D, &) or as a random
element of D with this probability measure as its distribution.
THEOREM 16.1 Suppose the random variables |n are independent and
identically distributed with mean 0 and finite, positive variance a2:
A6.2)
Then the random functions Xn defined by A6.1) satisfy
A6.3) Xn Д. Ж.
Proof. The proof in Section 10 that the finite-dimensional distributions
converge carries over with no difficulty. Thus we need only prove tightness.
There are several ways to prove tightness. One way is to verify the
hypotheses of Theorem 15.5. Now condition (ii) of Theorem 8.3, formulated
for random elements of D, requires that, for positive e and rj, there exist a
<5, 0 < <5 < 1, and an integer n0 such that, for 0 < t < 1,
-?[ sup \Xn(s)-Xn(t)\>e\<r), п>щ
О \t<s<t+6 )
(we replace t + <5 by 1 if it exceeds 1). Furthermore, this condition reduces,
by the arguments used in the proof of Theorem 8.4, to the following one:
138 The Space D
For each positive e there exist а л > 1 and an integer n0 such that
P max \St\ > Mn\ < — , n>n0.
\ i < n ) Я
It suffices therefore to verify this last condition, which can be done by
using the central limit theorem as in Section 10 (see A0.7)) or by using the
bounds in Section 12 (see A2.23)). Thus the methods of the preceding
chapter carry over without essential change.
Given the results in Section 15, we can construct a very simple proof. By
Theorem 15.6, it is enough to establish the inequality
A6.4) E{|Xn@ - Хп(к)\* • \Xn(t2) -
^ 4(f, - k)\ h<t< t2.
By A6.1), A6.2), and the independence of the |n, the left side of A6.4) is
A6.5) 4
4l
G П
n n
If h — h > 1/л, A6.4) follows from this. If t2 — t1 < \jn, then either tx
and t lie in the same subinterval [(i — \)jn, ijri) or else t and t2 do; in either
of these cases the left side of A6.4) vanishes. This establishes A6.4) in
general and proves the theorem.
Since the value of supf Xn{t) is the same for the random element of D
defined by A6.1) as for the random element of С defined by A0.1), namely
тахг<п Sjay/n, we can obviously use Theorem 16.1 to rederive A0.18).
(We must prove that the mapping h.D^-R1 defined by h{x) = supt x{t) is
continuous, but this presents no problem.) We may rederive the results in
Section 11 in the same way.
Some functions of the partial sums St have a simple expression in terms of
the Xn in Theorem 16.1 but have no simple expression in terms of the Xn in
Theorem 10.1. Such functions are better analyzed in D than in С For
example, for x e D, let h(x) be the Lebesgue measure of the set of t for which
x(t) > 0. Then (p. 232) h is measurable {hrx0P- <=- Of) and is continuous
except on a set of Wiener measure 0. If Xn is defined by A6.1), then h(Xn) is
exactly rr1 times the number of positive sums among S1, . . . , Sn_1} whereas
this is generally untrue if Xn is defined instead by A0.1). Combining Theorem
5.1, Theorem 16.1, and A1.26) leads to the arc sine law under the assump-
assumptions of the Lindeberg-Levy theorem.!
t See the problems for some further applications.
Applications 139
ominated Measures
tie random variables in Theorem 16.1 are defined on a probability space
1, &, P). We shall show that the result remains true if P is replaced by an
bitrary probability measure Po on (Q, ?%) dominated by (absolutely
mtinuous with respect to) P. For example, suppose that П = [0, 1], that
7 consists of the linear Borel subsets of [0, 1], and that ln(co) = 2con — 1,
here con is the rcth digit in the binary expansion of со. Then Theorem 16.1
)plies to {|n} if со is drawn from [0, 1] according to the uniform distribu-
on. According to the theorem we are going to prove, this is true also if со
drawn from [0, 1] according to an arbitrary distribution having a density
ith respect to Lebesgue measure.
We shall need the following preliminary result (in which o-(^0) denotes
le cr-field generated by
HEOREM 16.2 Let Ex, E2, . . . be measurable sets in a probability space
.^, &, P). Suppose there exist a constant a and a subfield ^0 of & such that
6.6) ?{En П E) -* aP(?)
»r every E in ^0. Suppose further that all the En lie in a(&0). If? dominates
0, a second probability measure on ?8, then
16.7) P0(En) - a.
roof. From the hypothests it follows just as in the proof of Theorem 4.5
mt
16.8) f g
)r every integrable g. But A6.8) and A6.7) are the same thing if g is the
Ladon-Nikodym derivative of Po with respect to P.
HEOREM 16.3 Theorem 16.1 remains valid if P is replaced by an
rbitrary probability measure Po dominated by it.
roof. Define X'n by
16.9) X'n(t,co) = ^
pn<i<nt
'here {pn} is a sequence of integers going to infinity slowly enough that
0 (X'n{t, со) = 0 if t < pjn). If
ben
1 Vn
140 The Space D
By Minkowski's inequality and the fact that pjyjn -*¦ 0,
so that, by Chebyshev's inequality,
A6.10) <5n^0,
where A6.10) is interpreted in the sense of P (that is, P governs the distribu-
distribution of the |n). By Theorem 16.1,
A6.11) Xn®>W,
where again the relation is interpreted in the sense of P. Since d(Xn, X^) < dn,
where d is the metric in D (either one), it follows by Theorem 4.1 that
A6.12) X'n ®> W
in the sense of P.
Let A be a Ж-continuity set (in D), temporarily fixed; A6.12) implies
A6.13) ?{X'neA}^W{A).
Let ?%q be the field of cylinders; ^ consists of sets of the form
with к > 1 and H e 0tb. If E e ^0, then, sincepn -> oo, ?{{X'n e A) n E) =
e ^}P(?) for large n and it follows by A6.13) that
Since the sets {X'n e A) all lie in the <y-field generated by ^0, it follows by
Theorem 16.2 with a = W{A) that
A6.14) P0{X'neA}-+W(A)
if Po is absolutely continuous with respect to P.
Since A6.14) holds for every Ж-continuity set A, A6.12) holds when
interpreted in the sense of Po. Now A6.10) asserts that P{<5n > e} —*¦ 0 for
each positive e; if Po is absolutely continuous with respect to P, then
Po{<5n > ?} —*¦ 0 as well, and A6.10) persists when interpreted in the sense of
Po. Applying Theorem 4.1 once more, we see that A6.11) holds in the sense
of Po, which completes the proof.
Theorem 16.3 concerns convergence in distribution. Notice that passing
from P to a dominated measure Po trivially preserves such properties of {Sn}
as the law of the iterated logarithm, which hold with probability 1.
Applications 141
Empirical Distribution Functions
Let |l5 |25 • • • be random variables with
and let Fn(t, со) be the proportion of the points ?i(co), . . . , ?n(co) not
exceeding ?, so that Fn( • , со) is the empirical distribution function. Let
Yn(co) be the element of D with value
A6.15) Уя(*, со) = Jn(Fn(t, со) - F(t))
at the point t e [0, 1], where F is the distribution function common to the
|n. Since each 1^@ is a random variable, Уя is a random element of D.
In Section 13 we studied the related random element A3.7) of C; here we
shall investigate Yn directly, and we shall remove the restriction that the ?n
be uniformly distributed. The stochastic process {Yn(t): 0 < t < 1} is
sometimes called the empirical process.
THEOREM 16.4 Suppose the ?„ are independent and have a common
distribution function F(t). If Yn is defined by A6.15), then
A6.16) Yn^Y,
where Y is the Gaussian random element of D specified by
(E{Y(t)} = 0
A6.17)
\E{Y(s)Y(t)} = F(s)(\ - F(t)), s<t.
Proof. Just as Wiener measure does, the measure W° extends from (C,
to (D, Of). Denote by W° a random element of D with this extended measure
as its distribution. We first show, under the assumption that the ?n are
uniformly distributed over [0, 1], that Yn Л* W°.
Let Un(t, со) be the number of points among ?х(со), . . . , |n(co) not
exceeding t. For 0 = t0 < tx < • • • < tk = 1, the random variables
ипAг) — ип^г_г), i = 1, . . . , к, are multinomially distributed with param-
parameters n and р; = tt — ^_x, and it follows as in Section 13 by the central
limit theorem for multinomial trials that the finite-dimensional distributions
of the Yn converge weakly to those of W°. By Theorem 15.6, it suffices to
prove that
E{l Yn(t) - Yn(tJ\* ¦ | Yn(t2) - 7n@l2} < 6(t - h)(t2 - t)
for tx < t < t2. But this is just the inequality A3.17) already established in
Section 13. Hence Yn -®> W°.
142 The Space D
Suppose now that the gn have an arbitrary distribution function F over
[0, 1]. Define an "inverse" to Fhy
y(s) = inf {t:s <F(t)}.
Then s < F(t) if and only if <p(s) < t, so that, if r\n is uniformly distributed
over [0, 1], then P{<p(r]n) < t) = F(t). Since the theorem involves only the
joint distribution of the |я, we may represent them as ?я = <p(r]n), with the
r]n independent and uniformly distributed.
If Gn{ • , со) is the empirical distribution function for rj^co), . . . , r]n{co)
and Zn(t, со) = -Jn(Gn(t, со) — t), then, by the case of the theorem already
established, Zn -^ W°. But the empirical distribution of ?i(a>), . . . , ?n{co) is
Fn(t, со) = GXF(t), со), so that, if Yn is defined by A6.15), Yn(t, со) =
Zn(F(t), со). Define y.D-^D by (ipx)(t) = x(F(t)). If xn converges to x
in the Skorohod topology and x e C, then the convergence is uniform, so
that ipxn converges to ipx uniformly and hence in the Skorohod topology.
Hence Zn4r and Theorem 5.1 imply Yn = y>(Zn) Д. ip{W°); since
Y = y(W°) is Gaussian and satisfies A6.17), the proof is complete.
If we knew the distribution of
A6.18) supt 7@ = supt W°(F(t)),
we could find the limiting distribution of
A6.19) yjn supt (Fn(t, со) - F(t)).
Now A6.18) has the same distribution as supt W° (namely that given by
A1.40)) if F is continuous, but not otherwise. The same remarks apply to
A6.20) supt | 7@1 = supt | W°(F(t))\.
Remarks. The random function A6.1) has independent increments and so has W; for
general results on weak convergence in this circumstance, see Kimme A957 and 1960) and
Skorohod A957). For convergence of diffusions, see Stone A963). Theorem 16.2 is due
to Renyi A958).
Concerning Theorem 16.4, see the remarks and references at the end of Section 13. See
Schmid A958) for the distributions of A6.18) and A6.20) for the general F.
PROBLEMS
1. Under the hypotheses of the Lindeberg-Levy theorem, there are limiting distributions
for
(a)
A 2 1**1.
i i sk\
n
(Ь) i
n k=l
(c) — min Sk, 0 < 0 < 1.
Vл рп<к<п
Random Change of Time 143
Construct the three relevant mappings from D to R1 and prove they are measurable and
continuous on a set of W^-measure 1. For the forms of the limiting distributions, see
Donsker A951) for (a) and (b) and Mark A949) for (c).
2. Theorem 16.1 (makes sense and) is true if л goes to infinity in a continuous manner.
[See B.4).]
3. For each n, let ?nl,.. . , ?nr! be independent random variables with P{?nk = 1} = pn
and P{?nk = 0} = 1 — pn, and define Yn by Yn(t) = ^k<nt ?nk. Assume npn -»- A and prove
that Yn -L>. Y, where Y is the appropriate Poisson process (with paths in D).
4. Show that Theorem 16.4 still holds if the measure governing the ?„ is replaced by a
probability measure absolutely continuous with respect to it.
5. Prove Theorem 16.4 by an application of Theorem 15.6 with A5.39) in place of
A5.21), thus avoiding the representation ?„ = y(rjn).
6. Let P be Lebesgue measure on the unit interval Q, and define ?я(со) = 2con — 1,
where con is the nth digit in the dyadic expansion of со. Show that the requirement in
Theorem 16.3 that Po be dominated by P is essential: The result fails if Po is a point mass.
It also fails if Po is Cantor measure (as defined in Billingsley A965, p. 36)).
7. Carry Problem 1 of Section 10 over to D. (See also Problem 1 of Section 12.) Gener-
Generalize the results in Problem 1 of this section.
17. RANDOM CHANGE OF TIME
Sometimes one requires an approximate distribution for a partial sum
Sv = |x + • • • + |v, where the index v is itself a random variable. Here we
shall prove several functional central limit theorems for such randomly
selected partial sums.
Randomly Selected Partial Sums
Suppose the partial sums Sn = ?x + • • • + gn for a nonrandom index n
obey the central limit theorem, say with norming factors oyjn, so that
A7.1) ^
as n -> со. If v is a random integer such that v is large with high probability,
there is some hope that SJo\jv will be approximately normally distributed.
To formulate a limit theorem, consider a sequence {vn} of random
integers. We seek conditions under which
A7.2) ^
as n —>> со. It is not enough simply to assume that vn goes to infinity in
probability, in the sense that
A7.3) ?{vn < a} -> 0
144 The Space D
for each a. For suppose the ?n are independent and take the values +1 and
— 1, with probability \ for each, so that A7.1) holds with a = 1. If vn is the
time of the nth zero crossing (the nth value of к for which Sk = 0), then
A7.3) holds (because vn > vn_1 + 1) but A7.2) does not (because SVn = 0).
To derive A7.2) we must assume more than A7.3). Our principal result
will be that A7.2) holds if vjn converges in probability to a positive, finite
random variable, and if the ?n are independent and identically distributed
with mean 0 and variance <r2. We shall derive this result from a functional
central limit theorem which gives much more information.
Define a random element Xn of D by
Xn(t, со) = —- SlntJ((o),
Yn(t, со) = —= Sv[jid(e)(o))
and define Yn by
(certainly Yn is a random element of D). Since ^„A) = SvJoy]n, we shall be
in a position to approximate the distribution of SVn and derive results like
A7.2) if we show that Yn itself converges in distribution.
The distribution of a random function such as Yn is most easily
investigated by using the fact that Yn results from subjecting Xn to a random
change of time scale. Let Фя(^, со) = v[nu{co)jn. Since
A7.4) Yn(t, со) = Хп(Фп«, со), со)
(we assume for the moment that vn < n, so that Фя(?, со) < 1), Yn is Xn
with the time scale subjected to a change represented by the random function
Фя. То clarify the reasoning involved in this section, we first consider such
random time changes in a general context. We assume Xn and Фя each
converge in distribution and look for conditions under which Yn, as defined
by A7.4), converges in distribution.
Random Change of Time
Let Do consist of those elements cp of D that are nondecreasing and satisfy
0 < y@ < 1 for all t. Such a cp represents a transformation of the time
interval [0, 1]. We topologize Do by relativizing the Skorohod topology of D.
Since Do e 3, as is easily shown, the ff-field ^0 of Borel sets in Do consists of
the subsets of Do that lie in Q) (see p. 224).
For x e D and cp e Do, let x ° <p denote the composition of x and cp—the
function on [0, 1] whose value at t is
A7.5) (x о p)@ = x(<p(t)).
Random Change of Time 145
low x о cp lies in D and, if гр: D X Do —> D is defined by
17.6) ^(я, у) = x о у,
hen, as is proved on p. 232, ip is measurable (ip'1^ <=¦ Я) X ^0).
Let X be a random element of 2) and let Ф be a random element of Do.
Ve assume X and Ф have the same domain, so that (X, Ф) is a random
lement of D x Do with the product topology (see p. 225). If X о ф has
alue X(cd) ° Ф(со) at со—that is, if Xо ф == у{Х, Ф)—then Х°Ф is a
andom element of D; X ° Ф results from subjecting X to the time change
epresented by Ф.
Suppose that, in addition to X and Ф, we have, for each n, random
dements Xn and Фп of D and Do, respectively, where Xn and Фп have the
ame domain (which may vary with ri). We ask for conditions under which
Xn, Фп) Д- (X, Ф) (convergence in distribution relative to the product
opology) implies 1яофД1оф.
Suppose that
17.7) (Xn, Фп) Д. (X, Ф)
md that
17.8) P{XeC} = Р{ФеС} = 1.
3y Corollary 1 to Theorem 5.1,
17.9) 1яофД1оф
vill follow if we show that ip (defined by A7.6)) is continuous at (x, ф) for
с e С and <p e С П Do. If ?я converges to a; and <pn converges to cp in the
5korohod topology, and if x and cp lie in C, then the convergence in each
;ase is uniform. But
supt \xn(<pn(t)) - x(<p(t))\
< supt \xn(t) - x(t)\ + supt \x(<pn(t)) - x(<p(t))\;
since x is uniformly continuous, xn о <pn converges to x ° cp uniformly and
hence in the Skorohod topology, which proves A7.9).
There remains, of course, the question of when (Xn, Фп) converges in
distribution, and here Theorems 4.4 and 4.5 can be used.
Applications
We return now to a consideration of sums Sn = |x + • • • + |n. For each
n, let vn be a positive-integer-valued random variable defined on the same
probability space as the ?n. Define Xn by
A7.10) - Xn{t, cd) = ±
146 The Space D
and Yn by
A7.11)
Yn(t, со) =
1
¦SlYnl*n№= XVn(o»0, 0>).
THEOREM 17.1 //
A7.12) ?A«,
where в is a positive constant and the an are constants going to infinity, then
A7.13) Xn^W
implies
A7.14) Yn Л- W.
Proof. There is no loss of generality in assuming that
A7.15) O<0<1,
since this can be arranged by passing to new constants an if necessary.
Furthermore, there is no loss of generality in assuming that the an are
integers. Define
A7.16)
Since
A7.17)
ФпС. to) =
if
td
otherwise.
supt\<!>n(t) - td\ <
0,
Ф„ converges in probability in the sense of the Skorohod topology to the
element <p(t) = 6t of Do. Because of A7.13) and the assumption яя —>- oo,
it follows from Theorem 4.4 that (Хпп, Фя) Д- (W, ф) and hence, since
A7.7) and A7.8) imply A7.9), that Xa °"фя -> W ° <p.
If
Y&t, со) = —— S[Vn(co)f](co),
then Xan о Фп and Y'n have the same value at со if vn(co)lan < 1, the probability
of which goes to 1 by A7.12) and A7.15). Therefore Y'n *> W° (p. Now
sup,
-= Y&tt со) - Yn(t, со)
Vе
and hence Yn converges in distribution to
has the same distribution as W, A7.14) follows.
cp). Since H(Jf»
Random Change of Time 147
Of course, A7.14) implies SvJff\ vn -^>- JV, as well as an arc sine law and
limit laws for the maxima and so on. Note that we have assumed of the |n
nothing beyond A7.13), which holds for example in the Lindeberg-Levy
situation. In Chapter 4 we prove A7.13) for various dependent sequences
{?„}, to each of which Theorem 17.1 then applies.
If the limit in A7.12) is not constant, we must make more specific
assumptions about the |n.
THEOREM 17.2 Suppose |l9 ?2, . . . are independent and identically
distributed with E{?J = 0 and E{?n2} = a\ If
A7.18) ^Д-0,
where в is a positive random variable and the an are constants going to infinity,
then Yn ®> W.
Proof. We assume at first that в is bounded, so that there exists a constant
К such that
0 < в ^ К
with probability 1. We may adjust the an so that they are integers and so that
K< 1.
If we define Фя by A7.16), then, as before, Фп Д- Ф, where Ф is the random
element of Do denned by
Ф@ = 6t.
Let &0 be the field of cylinders and define X'n by A6.9), where pn -> oo and
-> 0. Then, as in Section 16, we have
and
P({X'neA} n E) -> W(A)?(E)
for every Ж-continuity set A and every E in ^0. Now by Theorem 4.4,
(Фя, vjan)^>(<?>, в) in the sense of the product topology on Do x R1;
since every X'n is measurable o-(^0), it follows by Theorem 4.5 that
relative to the product topology in D x Do x R1, where в0 is independent
of W and Ф0(Г) = 60t. By A7.19),
(Xan, Фп, Ц Д. (W, Фо, во).
148 The Space D
Now the mapping that carries the point (x, <p, a) to ori(x о у) is continuous
at that point if x e C, <p e С П Z)o, and a > 0. By Corollary 1 to Theorem
5.1, therefore
(-V (*e. ° Ф„) Д- tf{W - Ф0).
W
Since в0 and Ж are independent, б„-*(Ж° Фо) has the same distribution as
W. Moreover {yJan)-i(Xan ° Фя) coincides with Yn if vjan < 1, the
probability of which goes to 1 since К < 1. Thus Yn^W if 6 is bounded.
Suppose в is not bounded. For и > 0, define 6U = в and уия = vn if
б < и and define 6U = и and уия = аям if в > и. Then, for each u,
vuJan -^- 6U as n -> oo, and, by the case already treated, if
Yun(t) = S[Vun(coU](co),
()
then Yun ^> W as n -> oo. Since Р{Гия И ^я} < Р{б > и}, Yn®+W
follows by Theorem 4.2.
This proves Theorem 17.2. The proof goes through for many dependent
sequences {?„} also; see Chapter 4.
Renewal Theory
These ideas can be used to derive a functional central limit theorem
connected with renewal theory. Let т]1г г\г,. . . be a sequence of positive
random variables and define
A7.20) vt = maxj/c:i^< t\, t>0,
with vt = 0 if 7}г > t. If 7]k is the time lapse between the occurrences of the
(k — l)st and Ath events in a series, then vt is the number of occurrences up
to time t.
We shall assume the existence of positive constants /г and a such that, if
Xn(t, со) =
then Xn Д- Ж. This will be true if, as in the usual renewal theory situation,
the r]n are independent and identically distributed with mean ^ and variance
a2. Define Zn by
THEOREM 17.3 7/Хя Д^ W, then Zn
Random Change of Time 149
''roof. We shall assume in the proof that /л > 1; this is only a matter of
cale. We first show that
17.21)
sup
0<v<u
V
—
и
1
— —
V
—
и
о
is и -*- oo. The hypothesis Xn -^- W implies that
17.22)
1
sup -
0<i<s S
[t]
2 Vi -
0
is s -*¦ oo (because replacing s~r by 5~* would give convergence in distribu-
ion). By the definition A7.20), vv > t implies S^x т]г < v. Therefore
implies
17.23)
Similarly,
implies (if e < /jr1)
A7.24)
SUp M
<«<u\M ^ M/
sup
> (лие.
—e
0<v<u\U
sup
M
Iv<-
/лие.
By A7.22), the probabilities of A7.23) and A7.24) go to 0 as и -+ oo, which
proves A7.21).
Put
if
n
otherwise.
By A7.21), Фя -?> <p, where 9?(f) = t\/i, so that, by the usual argument,
Xn°<$>n®>W° cp. Let
ад)--^ 2
i=\
Yn = Xn° Фп if vjn < 1, the probability of which goes to 1 by A7.21) and
the assumption /л > 1. Therefore Yn -^- W° (p.
By the definition A7.20),
^ < Yn(t)
From тахг<я
it follows that supt<x \rjvjl<j\/n, which in turn
150 The Space D
implies that, if
7= >
^ Д.Ж»?). Therefore $Z'n Д- W(recall that <p(t) = tl/л), from which
Zn Д- W follows because of the symmetry of W.
It is possible in Theorem 17.3 to let n tend to infinity in a continuous
manner.
Remarks. The results of this section, which are new, extend those in Billingsley A962).
For the central limit theorem for a random number of random variables, see Renyi A960)
and Blum, Hanson, and Rosenblatt A963). See Lamperti A962) for another weak-con-
weak-convergence theorem in renewal theory.
18. THE UNIFORM TOPOLOGY
Let us consider briefly the uniform topology on D—the topology given by
the uniform metric
A8.1) p(x,y) = swpt\x(t)-y{t)\.
With this metric, D is a complete metric space. It is, however, not separable:
The uncountably many points xQ defined for 0 < в < 1 by
@ if 0 < t < 6,
A8.2) xe(t) =
U if e<t<\,
are at mutual distance 1 (see p. 216).
A concept prefixed with U will refer to the uniform topology: [/-open,
[/-continuous, etc. In the same way, a prefix S will refer to the Skorohod
topology of D. If d is either of the two metrics that give the Skorohod
topology (see Section 14), and if p is the uniform metric A8.1) then
d(%, y) < p(x, y)- Therefore the uniform topology is finer than the Skorohod
topology: [/-convergence implies S-convergence. On the other hand, as
observed in Section 14, if xn S-converges to x and x e C, then xn [/-converges
to x. In particular, the uniform and Skorohod topologies coincide when
relativized to C.
Let Si denote the class of iS-Borel sets as usual, and let ^ denote the class
of [/-Borel sets. Since an S-open set is also [/-open,
A8.3) 9 с <%.
Let us suppose that Pn and P are probability measures defined on ^,
the larger of % and 3). Let us write Pn =>P[[/] to indicate weak convergence
in the sense of the uniform topology and Pn^>P[S] to indicate weak
The Uniform Topology 151
;onvergence in the sense of the Skorohod topology. NowPn => P[U] requires
18.4) [
or all bounded, [/-continuous/, whereas Pn=>P[S] requires A8.4) only
or all bounded, ^-continuous/. Hence Pn=>P[U] implies Pn=>P[S].
The converse is false because ^-convergence does not imply [/-convergence
consider point masses).
Suppose now that Pn and P are defined on W, that Pn =>P[S], and that
d(C) = 1. We shall prove that Pn=>P[U]. For A in °tt, let Av denote its
[/-closure and let As denote its S-closure. Note that Av <= As. Since
°n =>P[S] and P(C) = 1, lim supn Pn(A) < Um suPn Pn(As) < P(AS) =
D(AS П C). Since a sequence ^-converging to a limit in С also [/-converges
:o the limit, As Г\ С ^ Аи. Therefore lim suipnPn(A) < Р(Аи), which
mplies i>n =>i>[[/] by Theorem 2.1.
We have proved this: Suppose Pn and P are defined on %. (i) IfPn =>/*[[/],
fhen Pn=>P[S]. (ii) IfPn=>P[S] and P(C) = 1, then Pn=>P[U]j Because
}f (i), it is better if possible to prove Pn => P[U] than to prove Pn => P[S]—
for example, one has then the asymptotic distributions for more functions
эп D. Because of (ii) on the other hand, if P(C) = 1, then a proof of
Pn => P[S] is automatically a proof of the better result Pn => P[U].
Consider the distributions Pn of the random functions Xn involved in
Donsker's theorem as formulated for D (see A6.1)). According to Theorem
16.1, Pn=> W[S\, where W is Wiener measure. Since W(C) = 1, we can
draw the stronger conclusion Pn => W[U].
There is a gap in this argument: the measures Pn and W are defined only
on 3. The question is whether they can be extended to the larger cr-field tft.
Certainly, W can be extended—the argument used in Section 16 to extend
W from (C, <g) to (D, 9) enables us also to extend it from (C, <g) to
{D, Щ. Now Pn is defined as PX^1 (the domain of Xn is a probability space
(Cl, 0ft, P)), a definition which is possible because
A8.5) Х~г2 с <#.
If we are to define Pn on °tt instead, we must strengthen A8.5) to
A8.6) Х-г% с <#.
If A is a finite-dimensional set in D, then Х~гА е 38, and it follows that
the same is true if A is a [/-sphere. Since the uniform topology is not
separable, A8.6) does not follow without further argument.
t These statements are true because [/-convergence always implies ^-convergence and
^-convergence to a limit in С implies [/-convergence; they involve no further properties of
C, D, and the two topologies.
152 The Space D
The definition A6.1) of Xn shows that the range Хпп is separable in the
uniform topology. If A is [/-open, therefore, there exist countably many
[/-spheres At such that А П Хпп = \JAA n Xnty- But then X^A =
{JiX^Ai, and, since each Х~гА{ lies in &, so does Х~гА, which proves
A8.6).
Thus the distribution Pn is (or can be) defined on °U in the natural way by
Рп(А)=Р{со:Хп(со)еА}, Ae%,
and we can strengthen Theorem 16.1 by interpreting Xn -^> Win the uniform
topology :i>n=> W[U].
Consider next the random functions Yn defined by
Yn(co, t) = Jn(Fn(t, со) - t),
where Fn(t, со) is the empirical distribution function for ?i(co), . . . , ?n(co),
assumed independent and uniformly distributed over the unit interval.
Now Yn A- W° by Theorem 16.4. Since W° can be extended to °tt just as
W was, we can interpret Yn ¦%. W° in the uniform topology rather than the
Skorohod topology if we can extend the distribution of Yn from 3) to
^—that is, if we can strengthen
A8.7) Y-^9 <= <%
to
A8.8) У-1^ с ^.
But here the program collapses: We shall show that A8.8) is false.
To see that A8.8) fails, consider the case n = 1. Let h be the mapping
from Oto D that carries со to the distribution function with a unit jump at
^(co); h(co) = Яе (ш) with xe defined by A8.2). Clearly, A8.8) is equivalent
to
A8.9) h-Щ с <#,
and we shall show that A8.9) is false if ?i(co) is uniformly distributed.
Let Aq be the [/-sphere with center xe and radius ^. Since the xe are at
mutual distance 1 in the uniform topology, we have
A8.10) h-1! U Ав\ = {со:^(со)еН}
\вен J
for every subset H of the unit interval. If A8.9) were true, then, since
\JoeHAe is [/-open, the set A8.10) would lie in ?%, so that we would have
{со: ^(co) eH} e & for every subset H of the unit interval. Thus /л = P^
would be a measure defined for every subset of the unit interval and having
the property that [л(а, b) = b — a for every subinterval {a, b). But no such
[л exists.
The Uniform Topology 153
A small elaboration of these ideas shows that A8.8) fails for every n
unless the distribution common to the ?t- is purely atomic. In treating
empirical distribution functions as random elements of D, therefore, we
must stay with the Skorohod topology—and this is solely for reasons of
measurability.t As a rule, we may interpret Xn -^-Xvn the uniform topology
only if (i) X lies in С with probability 1 and (ii) the jumps in Xn occur at
fixed time points rather than at time points with random positions.
Remarks. Chibisov A965) has pointed out that A8.8) is false. Another way around the
problems created by the nonseparability of D in the uniform topology is to work within the
(т-field <2f0 generated by the [/-spheres. This requires a different theory of weak convergence;
see Dudley A966 and 1967).
t O, wiste a man how manye maladyes
Folwen of excesse and of glotonyes,
He wolde been the moore mesurable ...
The Pardoner's Tale
CHAPTER 4
Dependent Variables
19. DIFFUSION
This section contains a theorem characterizing Brownian motion, a theorem
characterizing more general diffusion processes, and asymptotic forms of
these two theorems. In Section 20 we use these results to prove a version of
Donsker's theorem for the variables of a stationary sequence satisfying a
uniform mixing condition; in Section 21 we extend the result to functions
of such sequences; and in Section 22 we prove functional central limit
theorems for empirical distribution functions in these two cases. In Sections
23 and 24 the results of the present section are used to derive limit theorems
for martingales and for exchangeable random variables.!
Although many of the results of this chapter could be formulated and
proved in the space C, we shall consistently use D with the Skorohod
topology.
A Characterization of Brownian Motion
The random function W lies in С with probability 1 and has independent
increments; moreover, E{Wt} = 0 and E{Wt2} = t. These properties
characterize W:
THEOREM 19.1 Let X be a random element of D that has independent
increments and satisfies ?{XeC}=\, E{X(t)} = 0, and E{X*(t)} = t.
Then X is distributed as W.
f The succeeding sections all depend upon this one. Sections 20, 21, and 22 require to be
read in order, but this sequence and Sections 23 and 24 form three independent units.
154
Diffusion 155
Proof, The independence and the moment conditions imply
Е{(Д0 - X(s)f} = \t - s\
and P{X@) = 0} = 1. We are to show that X(t) — X(s) has distribution
N@, t — s) for s < t. Because of the independence, it is enough to prove
this for s = 0, and by continuity we may assume t < 1.
Let cp{t, • ) be the characteristic function of X(t)\
<p(t, u) = E{eiuX{t)}.
Since P{XeC} = 1, cp(t,u) is continuous in t as well as in u. We shall
show that it satisfies the differential equation
A9.1) I <p(t, u) = -iu2<p(t, u), 0<t<l, ue R\
at
It will then followt that
since X@) vanishes with probability 1, this will entail normality. In proving
A9.1), we may take the partial derivative to be a derivative from the right.J
We are thus to prove
A9.2) lim i [<p{t + h,u)- <p(t, u)] = -\u2tp{t, u), 0 < * < 1.
ftlo h
By a standard estimate, §
A9.3) eiv = 1 + iv - \v2 + c(v),
with
A9.4) \c(v)\ < N3
for v real. Write
A9.5) AM = A(s, 0 = X(t) - X(s).
From the independence of these increments and the relations E{AsJ = 0
f If / satisfies /'@ = A(t)f(t) with A continuous, then its ratio with the nonvanishing
function/0@ = exp J* A(t) dr has derivative 0, so that/(O//o(O =/@)//0@) =/@).
$ To prove that a continuous/with continuous right-hand derivative/" on [0,1) necessarily
has a two-sided derivative on @, 1), it suffices to assume / real and prove that F(t) —
fit) - /@) - Jo/+(t) dr vanishes identically. If (say) Я*„) < 0, then G(t) = F(t) -
tF(to)/to satisfies G@) = G(t0) = 0 and (since F+ vanishes) G+(t) > 0, so that, over
[0, /0], G must have a positive maximum at some interior point s; but then G+(s) < 0, a
contradiction.
§ Feller A966, p. 485).
156 Dependent Variables
and E{A^ J = t — s, we have
<p(t + h,u)- cp(t, u) = E{eiuX{t)[eiuMt't+h) - 1]}
c(uA
Because of A9.4), A9.2) will follow if we prove that
A9.6)
nlo h
Using again the independence of the increments, we obtain
A9.7) E{A21>tA2MJ = (t - Wtt - 0 < (U - h)\
Fix t and t + h, with h > 0. If
<t< t2.
A9.8)
= max mm
0<i<m {
Д М + -
m
A f + -fc,
m
then, by A9.7) and Theorem 12.1,
A9.9) ?{M'm ? .
And, by A2.6),
A9.10) 1АМ+Л| < ЪМ'т + max
1<г<т
-ДА4
m
m
Since Z lies in С with probability 1, the maximum here goes to 0 with
probability 1 as m —*- oo, and it follows by A9.9) that
A9.11) Р{|Д*,*+д1 >fy<K —
with К = 44#2Д.
From A9.11) we obtain (see C) on p. 223)
poo 1^1,
Е{|АМ+Л|3} < a + ^
Ja О?
= a
(Г
(this minimizes the function on the right),
valid for a > 0. Taking a =
we obtain
which implies A9.6) and completes the proof.
The condition that X lie in С with probability 1 is essential here, since the
other hypotheses of the theorem are satisfied if
A9.12) X(t) = 7@ - t
and Y is distributed as a Poisson process with parameter 1.
Diffusion 157
We now derive a limit theorem from Theorem 19.1. Let Xn be random
elements of D. We say Xn has asymptotically independent increments if
:i9.13) 0 < Sl < h < s2 < h < • • ' < sr < tr < 1
mplies, for all linear Borel sets Hx, . . . , Hr, that the difference
;i9.i4) p{xm - хмещ, i = i,..., r} - jj nxnih) - *«(«,) ея,}
;onverges to 0 as и —»- oo. Note that we require A9.14) to hold only when the
ntervals fo, tJ are strictly separated—when ^_x < ^.
Recall that the modulus of continuity wx(d) = w(x, d), defined by (8.1),
's well defined if x lies in D. It enters into the next theorem because the
imiting distribution on D is actually supported by C.
THEOREM 19.2 Suppose that Xn has asymptotically independent incre-
increments, that {Xn2(t):n > 1} is uniformly integrable for each t, and that
E{Xn(t)}-+0 and ?{Xn2(t)}-^*t as n —>- oo. Suppose finally that, for each
oositive s and r\, there exists a positive д such that
A9.15) P{w(Zn, d) > e} < г)
for all sufficiently large n. Then Xn —*- W.
Proof. Since Е{ХД0)}-+0, {Xn@)} is tight. It follows from A9.15) and
Theorem 15.5 that {Xn} is tight and that, if X is the limit in distribution of a
subsequence, then P{Ie C) = 1. It is enough to show that any such X
must be distributed as W. But E{ATn(f)} -> 0, E{Zn2@} -»-1, and the uniform
integrability of {X2(t):n > 1} imply E{X(t)} = 0 and Z{X*(t)} = t
(Theorem 5.4). Now the increments X{t^) — X(s{), i=l, ..., r, are
independent when A9.13) holds, and from P{X e C} = 1 it follows that the
same is true if we allow tt = sihl in A9.13). Thus the result follows by
Theorem 19.1.
It is not possible to replace the w in A9.15) by v/ as defined by A4.6)
(define Xn = X by A9.12)).
The condition that {Xn2(t):n > 1} is uniformly integrable for each t
cannot be dispensed with in this theorem; indeed the condition follows via
Theorem 5.4 from E{Xn2(t)} -> t and Xn^- W. For a specific example in
which the condition fails, take Xn{t) = y/t gn, where ?n assumes the values
\Jn, —\Jn, and 0 with respective probabilities l/2«, l/2«, and 1 — 1/n;
although Xn does not converge in distribution to W, the hypotheses of the
theorem are satisfied except for the requirement of uniform integrability.
Theorem 19.2, together with the results of Section 12, leads to another
proof of Donsker's theorem. If Xn is the random function A0.1) involved in
158 Dependent Variables
that theorem, then, by A2.23) and Theorem 8.4 (carried over to D), the
condition involving A9.15) is satisfied. Further, by A2.20),
f S2Jn d? < K'P + \ f ^2JF
J {?„ /n<r2>a} |_Д 0" J {||i|>io<TV n)
for a universal constant K', which implies {Xn2@:" > 1} uniformly
integrable. The remaining conditions in Theorem 19.2 are easily checked, and
Xn ^- W follows. Notice that this proof does not presuppose the central
limit theorem; in particular, we have an independent proof of the
Lindeberg-Levy theorem.! As we shall see in Section 20, this method applies
also to certain sequences of dependent random variables.
Other Diffusion Processes%
IfXsatisfies the hypotheses of Theorem 19,1, then, for 0 < tx < • • • < tk < 1
and h positive,
A9.16) {
\t{(x( + h) x()Y || x()x()} = h.
If Xe С and X@) = 0 with probability 1, that X is Brownian motion
follows, as we shall see, from A9.16), and we can dispense with the assump-
assumption of independent increments. The same is true if, in addition to certain
regularity assumptions, we require only that, for small h, the equations in
A9.16) hold approximately, in a sense presently to be made precise.
We can in a similar way characterize diffusion processes other than
Brownian motion by replacing the right sides of the equations in A9.16) by
functions of h, tk, and X(tk). We shall assume that
A9.17) E{X(tk + h) - X(tk) || X(tJ, ..., X(tk)} * hP(tk)X(tk)
and
A9.18) E{(X(tk + h) - X(tk)Y || X(tl), . . . , X(tk)} ** ha\tk)
for small h (the meaning to be made exact), where p(t) and o2(t) are given
functions, and prove (Theorem 19.3 below) that X is a Gaussian random
function satisfying
A9.19)
t The argument is easily adapted to the Lindeberg case; see Problem 1 of Section 10 and
Problem 1 of Section 12.
J The remainder of this section is required only for Sections 23 and 24.
Diffusion 159
and
A9.20) Е{ВД X(t)} = Га\г) exp
р(т) dr + p(r) Jr
dr,
0<s <t
The choice p@ = 0 and o2(t) = 1 leads back to Brownian motion, and the
choice p(t) = —1/A — 0 and o2(t) = 1 leads to the Brownian bridge.
Of the functions o2(t) and p{t) we shall assume that a2(t) is nonnegative,
that both are continuous on [0, 1), and that, if
and
then the limit
A9.21)
g@ = exp
G(t) =
p(r) dr
dr,
lim g(t)G(t)
exists and is finite. As t —>• 1, the nondecreasing function G(t) has a limit
(finite or infinite), and hence git) has a finite limit (which vanishes if G —> oo).
Thus the right side of A9.20), which is well defined for 0 < s < t < 1, has
as t -> 1 a limit which we shall take as its value for t = 1.
If we define
X(t)=g(t)(\ + G(t))W°(G(t)l(l + G(t)))
for 0 < / < 1, where W° is the Brownian bridge, then A9.20) holds for
0 < s < t < 1. Since g{t)t g{t)G{t), and G(t)/A + G(t)) all have limits as
t —*¦ 1, we can define X(l) by passing to the limit. Thus, there does exist a
continuous, Gaussian random function satisfying A9.19) and A9.20).
Clearly X@) = 0 with probability 1. We shall regard Jasa random element
of D lying with probability 1 in C.
We shall give three conditions on X which characterize it as this random
function. Each condition will be given in two versions, first in a form
convenient for most applications and second in a distinctly weaker form that
suffices for the characterization. The first condition defines the sense in
which the approximations A9.17) and A9.18) are to hold.
Condition 1. I/O < h< • • < tk < 1, then
A9.22)
and
limy
fc + h) - X(tk)
г), •••, Xitk)}
- hpitk)Xitk)\} = 0
A9.23) lim I E{\E{(X(tk + h) - X(tk)f \\ X&),..., X(tk)} - ha\tk)\} = 0.
160 Dependent Variables
Condition la. If Q < h < • - ¦ < tk < \, then for all real щ, . . . , uk,
A9.24)
and
lim -
nloh
e(
I
ехр21из^(^з)
[X(tk + ft) - x(y - М'ь№)]} = о
A9.25) limy e(Гехр| ш,ЛГ(*,)] [(*& + /г) - X(/fc)J - ^U)]] = 0.
Condition 2. We have
A9.26) sup, E{X2@} < oo.
Condition 2a. 77ze variables X(t), 0 < t < I, are uniformly integrable.
Condition 3. There is a constant К such that
A9.27) E{(X(t) - X(tJY(X(tJ - X(t)f} < K(t2 - t,f, tx<t< t2.
Condition 3a. For t < 1,
- f (X(t + h) - X(t)f d? = 0.
A9.28) lim lim sup
Clearly, Condition 1 implies la and 2 implies 2a. To see that 3 implies 3a
(which is essentially a uniform integrability condition), observe that, with
the notation A9.5), the inequality A9.27) is just A9.7) with an extra factor
of К on the right, and, if X lies in С with probability 1, we may deduce
A9.11) just as before (with a new К—the value is irrelevant). An application
of C) on p. 223 shows that the integral in A9.28) cannot exceed IKhjoL, so
that 3a does follow from 3. (For conditions intermediate between 2 and 2a
and between 3 and 3a, replace the exponent 2 on the left in A9.26) and
A9.27) by 1 + ?.)
THEOREM 19.3 Let X be a random element of D with ?{X e C} = 1 and
P{X@) = 0} = 1. Suppose that o2(t) and p(t) are continuous on [0,1) and there
exists the finite limit A9.21). If X satisfies Conditions 1 (or la), 2 (or 2a), and
3 (or 3a), then X is the continuous, Gaussian random function specified by
A9.19) and A9.20).
Before proving the theorem, let us connect it with our two familiar
examples. If
A9.29) />@ = 0, G2@=l,
then A9.20) becomes ?{X(s) X(t)} = s, s < t, so that X is Brownian
motion. If
A9.30) ?{X(tk + h) - X(tk) I! X(tx),..., X(tk)} = 0
Diffusion 161
and.
A9.31) E{(X(tk + h)- X(tk)f || X(tx), . . . , X(tk)} = h
with probability 1, then A9.22) and A9.23) are certainly satisfied, A9.26)
holds, and A9.27) holds (even with (t — tj(t2 — 0 on the right). Hence
(if Xe С and Z@) = 0 with probability l) X must be Brownian motion.
Since the hypotheses of Theorem 19.1 imply A9.30) and A9.31), that theorem
is a corollary of the present one.
If
A9.32) ^
then A9.20) becomes E{X(s)X(t)} = s(\ - t), s < t, and X must be
distributed as the Brownian bridge W°.
Proof of the theorem. Fix points tt with 0 < tx < • • • < tk < 1, fix real
numbers u1? . . . , uk, and let t and и vary over the strip
A9.33) h<t<l, -оо<м<оо.
Put
A9.34) y,(t, u) = E{exp i(Ul X(tl) + • • • + uk X(tk) + и X(t))};
we shall derive a differential equation for ip(t, u).
For notational convenience, write Z = ux X{t^) + • • • + uk X(tk), so
that A9.34) becomes
A9.35) y)(t,u) = E{eiZeiuX{t)}.
Since Xe С with probability 1, ip is continuous (jointly in t and u) in the
strip A9.33). Since E{X@} exists, we have
A9.36) — w(t, u) = E{iX(t)eiZeiuXU)},
аи
and Condition 2a implies that this function also is continuous in the strip
A9.33).
With the notations A9.3) and A9.5), we have
h,u)- y>(t, u) = E{eiZe
By A9.35) and A9.36),
A9.37)
- [y(t + h,u)- y>(t, u)] - uP(t) — y)(t, u) + ±u2o\t)y)(t, u)
h du
Ш\^п^\()]}\ + ]-E{\c(uAt,t+h)\}.
zn h
162 Dependent Variables
Since \eiv — 1 — iv\ < |y2, we have, in addition to A9.4), the inequality
\c(v)\ < v2. Therefore
\ E{|(AM+ft)l} ^ \fJh^ + ? f AJ,^ d?,
? f
П J{\t2,t
and it follows by Condition 3a that the third term on the right in A9.37)
tends to 0 with h. And by Condition la (with (tlf . . . , tk,t) replacing
(h, ¦ ¦ ¦ , h) and ("i, . . . , uk, u) replacing (ult . . . , uk)), the other two terms
also tend to 0. Therefore
A9.38) | W(t, u) = uP(t) |- W(t, u) - *uV@v(', u);
since the right side of this equation is continuous in the strip A9.33), so is
the left.t
We now solve the differential equation A9.38). For v arbitrary, define
A9.39) Xv(s) = v ехрГ-Пэ(т) dr\ tk < s < 1,
and put
A9.40) gv(s) = W(s, Xv(s)), tk<s<l.
By the chain rule (^ and y2 denote the partial derivatives with respect to the
first and second arguments of yi),
ip2(s, Xv(
which, together with A9.38) and the definitions A9.39) and A9.40), gives
Therefore ?
A9.41) w(s, Xv(s)) = W(tk, v) ехрГ- ^ ^ог2(г)АЛг) dr\ tk < s < 1.
Given an arbitrary (t, u) in the strip A9.33), take v = и exp j"*fc р(т) dr.
Then Xv(t) = и and
Xv\r) = u2 exp[2 f p(t) dr\ hKr^ t,
so that, by A9.41) with s = t,
A9.42) y,(t, u) = W\
t The derivative with respect to / here is two-sided, even though h tends to 0 through
positive values; see the second footnote on p. 155.
% See the first footnote on p. 155.
Diffusion 163
vhere
a = exp р(т) dr
A
md
Ь2 =
Jr.
Going back to the definition of ip, we see by A9.42) that
E{exp 1(щ X(tx) + --- + Щ X(tk) + и X(t))}
= E{exp i(Ul Xih) + • • • + uk X(tk) + ua
We have proved this for 0 < t± < • • • < tk <. t < 1; by continuity, it is
true for t = 1 as well. It implies that, for arbitrary v,
E{exp i(«i X(tJ + --- + uk X(tk) + v(X(t) - a X(tk)))}
= E{exp i(Ul XVJ + --- + ut X(tk))}e~b*b\
Thus the random vector (Х(^), . . . , X(tk)) and the random variable
X(t) — a X(tk) are independent of each other, and the latter has distribution
^@, W
Taking к = 1 and t± = 0 and using the fact that P{Z@) = 0} = 1, we
see that X(t) is normally distributed with mean 0. Taking к = 1 and tx
general, we see that A9.20) holds and that Z(^) and X(t) — a X(tx) are
jointly normal (for an appropriate a) and hence that the same is true of
X(tJ and X(t). Taking к = 2, we see that X(t±), X(t2), and X(t) — a X(t2)
are jointly normal and hence that the same is true of X(tx), X(t2), and X(t).
Continuing in this way, we conclude that all the finite-dimensional distribu-
distributions are, normal, which completes the proof.
For the asymptotic form of Theorem 19.3, we shall need three conditions
which parallel the Conditions 1, 2, and 3. As before, each condition will
be given two forms. Let Xn be random elements of D.
Condition 1°. I/O < tx <, • • • < tk < 1, then
A9.43) lim lim sup - E{|E{Xn(ffc + h) - Xn{tk) \\XM,..., Xn(tk)}
ft i 0 n->oo П
- hp(tk)Xn(tk)\} = 0
and
A9.44) lim lim sup - t{\l{(Xn{tk + h) - Xn(tk)J1| Xn(tl),..., Xn(tk)}
ft 10 n->oo П
-ha\tk)\} = 0.
t This implies that {X(t)} is a Markov process.
164 Dependent Variables
Condition l°a. If^<t1<---<tk<\, then, for all real wl5 . . . , uk,
\ (Г k
— Г"/ РУ1Л > 111 X it
Л I 0 )(->• со h I 3=1
A9.45) lim lim sup-
X [Xn{tk + h) - Xn(tk) - hp(tk)Xn(tk)]\
0
and
A9.46) lim lim sup-
л I о п -> да /г
Condition 2°. Же
A9.47)
Condition 2°a.
A9.48)
X [(Х„(Ъ + /7) - ^n(OJ - ^2(^)]}
= 0.
sup lim sup E{Xn2(t)} < oo.
have
lim sup lim sup
| Xn(t)\ d? = 0.
Condition 3°. TAere /5 a constant К such that
2} < K(t2 - txf,
'i < t < t2
A9.49)
for all n.
Condition 3°a. For t < 1,
A9.50)
lim lim sup lim sup -
a->oo л!о n-»oo /l J{(Xn(t+ft)—Xn(tJ>aM
It is easy to show that Condition 1° implies l°a and that 2° implies 2°a.
Although 3° does not imply 3°a, each of them does suffice, as we shall see,
for the theorem.
THEOREM 19.4 Suppose that {Xn2(t):n > 1} is uniformly integrable for
each t, that Xn(Q) -^> 0, and that for each positive s and r) there exists а д
such that
A9.51)
») > e} < V
for all sufficiently large n. Suppose o2(t) and p{t) are continuous on [0, 1),
and there exists the finite limit A9.21). If {Xn} satisfies Conditions 1° (or
l°a), 2° (or 2°a), and 3° (or 3°a), then Xn -?> X, where X is the continuous,
Gaussian random function specified by A9.19) and A9.20).
Diffusion 165
Proof. By Theorem 15.5, each subsequence of {Xn} contains a further
subsequence converging in distribution to some random element X of D
with P{X@) = 0} = 1 and P{IeC} = 1. It is enough to show that any
such Zmust satisfy the remaining hypotheses of Theorem 19.3. By Theorem
5.4 and the assumed uniform integrability of {Xn2(t):n > 1}, Condition
l°a implies Condition la; moreover, by Theorem 5.3, 2°a implies 2a, 3°a
implies 3a, and 3° implies 3 (which, as we saw before, implies 3a). Thus X
satisfies Conditions la, 2a, and 3a, which completes the proof.
Remark. Suppose that Condition 3° holds, that
A9.52) Zn@) = 0
holds with probability 1, and that the maximum jump in Xn satisfies
A9.53) mzxt\Xn(t)-Xn(t-)\<en
with probability 1, where
A9.54) en->0.
By Theorem 12.1 and the inequality A2.6) we then have
A9.55) P ( sup \Xn(s) - Xn(t)\ > x) < К' ^
\t<s<t+6 J X
with K' =A*KK2l, provided en < \X, from which A9.51) follows by the
corollary to Theorem 8.3. Now A9.52) and A9.55) imply
if en < \X; therefore (by C) on p. 223)
A9.56) Г X*(t)d?<2 —
J{Xn2(t)>a} a
if ?n2 < T^a, which implies that {Xn(t):n > 1} is uniformly integrable.
Finally, A9.56) implies A9.48). Therefore, if A9.52), A9.53), and A9.54)
hold, the conclusion of Theorem 19.4 will follow if we verify only Condition
Г (or l°a) and Condition 3°.
Remarks. The results here are those of Rosen A967a), with the conditions altered so as to
bring Theorem 12.1 to bear. The idea of deducing the central limit theorem from a differ-
differential equation goes back to Khinchine A933); Theorem 19.3 under the special conditions
A9.30) and A9.31) is due to Levy A948) and Doob A953).
166 Dependent Variables
20. MIXING PROCESSES
cp-Mixing
Let
B0.1) . . . , U, fo, fi, • • •
be a strictly stationary sequence of random variables defined on a probability
space (Q., &, P). For a <Lb, define J?* as the a-field generated by the
random variables fo, . . . , fb; define J^^ as the a-field generated by
• • • , ?a-i> ?a> an(* define ^o°° as the a-field generated by fo, ?o+1, ....
Consider a nonnegative function 93 of positive integers. We shall say that
the sequence {?n} is cp-mixing if, for each к (— oo < A: < 00) and for each n
(n > 1), ?i e u?i ^ and J?2 б Л?гП together imply
B0.2) \?{EX n ?2) - P(?x)P(?2)| < <p{n)?{Ex).
This is a joint property of {?„} and 93. We consider only functions 93 satisfying
B0.3) lim ?>(n) = 0,
п-кп
and usually we require that cp(ri) go to 0 at some specified minimum rate.
If we say that {?n} is 93-mixing without specifying cp, we mean that B0.2)
holds for some 93 satisfying B0.3).
If P(?i) > 0, then B0.2) is equivalent to
B0.4) \?(E2\ EJ - P(E2)\ < <p(n),
whereas B0.2) holds trivially if P(EX) = 0. We shall regard the left side of
B0.4) as vanishing if Р(?х) = 0, so that the condition for 93-mixing is
B0.5) sup {|P(?21 Ed -
If 93G1) is small, B0.2) and B0.4) say that E2 is virtually independent of
E-y. In a 93-mixing process, the distant future is virtually independent of the
past and present.f This property enables us to prove various central limit
theorems and functional central limit theorems.
We shall often write 9зп in place of 93G1). If each 93n has the minimal value
consistent with B0.5), then
B0.6) 1 > (рг > y2 > • • • .
t Since B0.2) and B0.4) are not symmetric in Ег and E2, past and future cannot be inter-
interchanged; for a 95-mixing process that ceases to be 95-mixing when time is reversed, take
fn = S?Li Vn-kl^' where r]k are independent and assume the values ±1 with probability
\ each.
Mixing Processes 167
If we replace yn by
B0.7) min{l, <plt . . . , <pn},
then {?„} is still 93-mixing and B0.6) holds. Thus there is no loss of generality
in assuming B0.6). It will be convenient to define <p0 = 1, so that B0.5)
holds trivially if n = 0.
For fixed E2,
B0.8) {E1:\P(E2\E1)-P(E2)\<e}
is a monotone class| (is closed under the formation of countable increasing
unions and countable decreasing intersections); for fixed Ex, the same is true
of the class
B0.9) {?2:|P(?2|?i)-P(?2)l<?}.
It follows that, if B0.4) holds for Ex in j/ and E2 in 3&, where j/ and & are
fields generating -^^ and ^k+n, respectively, then B0.5) holds, an
observation useful in checking whether a given process is 93-mixing.
Example 1. The sequence B0.1) is said to be m-dependent if the random
vectors (?,-, . . . , ?fc) and (?k+n, . . . , f,) are independent whenever n > m.
(In this terminology, an independent process is 0-dependent.) Such a process
is 93-mixing with (pn = 0 for n > m. For a specific example, take
B0.10) ?„ = ao?n + Дх^-х + ¦ ¦ • + amin_m,
where the a{ are constants and the ?n are independent and identically
distributed.
Example 2. Let {?„} be a stationary Markov process with finite state space,
and let fn =/(^n)' where / is some real function on the state space. If
^J> is defined as before and if ^Vab is the ог-field generated by ?a, . . . , ?6,
then Лаъ с Жа\
Let /?u be the stationary probabilities for the Markov process, and let
puv be the transition probabilities, so that
PRi = Щ, . . . , Ci+l = U,} = PUoPuoUl ' * ' Put_lUl
for each finite sequence u0, щ, . . . , ut of states. Assume that the /?u are all
positive, so that
B0.11) <pn = max ^ - 1
is finite, where p{^ are the «th order transition probabilities. Let Hx be a set
t See Halmos A950, p. 26).
168 Dependent Variables
of (/ + l)-tuples of states, and let H2 be a set of (j + l)-tuples of states.
For the special element
B0.12) ?1 = {(?*-,,...,?*) e Hi}
of Жк_ „ and the special element
B0.13)
Of -^k+n' We have
P ' ' ' Р \P
where the sum extends over (u0, . . . , ut) in Нг and over (v0, . . . , v0) in H2.
From the relation ^vpuv = 1 and the definition B0.11), we obtain B0.2).
And now from the fact that B0.8) and B0.9) are monotone classes, it follows
that B0.2) holds for all Ex in Жк_ ^ and E2 in ^^+n and hence for all Ex in
uP «, and E2 in uT?n.
If the transition matrix (puv) is irreducible and aperiodic, thenj
B0.14) P{:J->PV
for all и and v. Since the state space is assumed finite, the convergence must
be uniform in и and v, so that cpn, defined by B0.11), converges to 0: {?„} is
93-mixing. In these circumstances more is known, namely that the rate of
convergence in B0.14) is exponential: There exist positive constants a and
p, p < 1, such that, if
B0.15) <pn = aP\
then {fn} is 93-mixing. This is also true of certain process fn =/(?„) for
which {?„} is a Markov process with infinite state space—it is true, for
example, if {t,n} satisfies Doeblin's condition, has one ergodic class, and is
aperiodic. %
In some applications, we are presented only with a one-sided (strictly
stationary) sequence
B0.16) flf ?„....
In such cases we shall say the sequence is 93-mixing if
B0.17) sup {I P(?21 Ej) - P(?2) I: E, e ^\ E2 e Jt?n) < <pn
for positive integers к and n. Given a one-sided sequence B0.16), we can
t Doob A953, pp. 172 ff.).
JDoob A953, pp. 190 ff.).
Mixing Processes 169
always construct a two-sided sequence B0.1) with the same finite-dimensional
distributions—the new sequence will in general be denned on a new sample
space (which we can always take to be the product of a doubly infinite
sequence of copies of the real line). We shall then call B0.1) the doubly
infinite extension of B0.16).
If B0.16) is ^-mixing, then, by stationarity, the doubly infinite extension
satisfies B0.4) for Ег e JK\_t and E2 e Jt?w Since B0.8) is a monotone
class, it follows that the doubly infinite extension is 93-mixing, with the same
cp as before. It is easy to verify the converse: If the extension B0.1) is
93-mixing, then the original sequence B0.16) is 93-mixing with the same
function cp.
If we prove, for example, that
1 n
/- Z и —*¦ yv>
/П
where the i-t are interpreted as elements of B0.1), then the result is also true
if they are interpreted as elements of B0.16), since only finite-dimensional
distributions are involved. All our theorems will be of this kind, and it is
indifferent to the results whether we work with B0.16) or with B0.1). In
proofs we shall consistently work with B0.1), which is more convenient
because then the past is an infinite sequence instead of a finite sequence of
changing length.
Example 3. Let Q. be the unit interval, let 38 consist of the linear Borel
subsets of Q, and let P be Lebesgue measure on 38. Suppose
B0.18) fn(eo) = g(a)n, a)n+1, . ... , a)n+m), n = 1, 2, . .. ,
where со has dyadic expansion 00 = • a>1oo2' ' ' and g is some function of
(m + l)-long sequences of 0's and l's. The sequence {?„} is stationary and
m-dependent. This sequence is of the form B0.16) and comes under the theory
as developed here because it is possible to construct, as outlined above, a
doubly infinite extension (on some new space).
Example 4. For another example in which the sequence {?„} is one-sided,
take Q and 38 as in Example 3, let P be Gauss's measure
dx
log 2 Ja 1 + x
and take ?n(co) to be an(oo), the nth coefficient in the continued-fraction
expansion of со. The sequence (ai(co), a2(a)), . . . } is stationary and «p-mixing
with cp of the form
B0.20) <pn = apn, a > 0, 0 < p < l.f
B0.19) P(A) = ;J- f ~- , Az
2 Ja 1
t Levy A937, Chapter 9).
170 Dependent Variables
Inequalities for Moments
. If ? and -r\ are
Suppose ? is measurable Jt^L^ and r] is measurable
the indicators of sets, then, by B0.2),
B0.21) |E{??7} — E{?}E{>7}|
is at most ?>nE{?}. We shall require bounds on B0.21) for random variables
? and г] more general than indicators. In all that follows, {?„} is assumed
stationary and ^-mixing unless the contrary is explicitly stated.
LEMMA 1 If ? is measurable Jtb_^ and r\ is measurable Jt™+n (и ^ 0),
then
B0.22) Е{|?Г} < oo, E{|t?|s}<oo, r, s > 1, - + -=l,
r s
implies
Proof For an understanding of the lemma, consider first two extreme cases.
If (pn = 0, so that ? and r\ are independent, then the inequality B0.23)
holds because each of its members vanishes. If q>n = 1, which imposes no
restriction at all, then, by the inequalities of Holder and Minkowski,
Since we can approximate to f and r\ by simple random variables, in
treating the general case we may suppose that
B0.24)
and
B0.25)
where {^
elements of
inequality,
= ZiUiIAt
7] = 2^/g,'
is a finite decomposition of the sample space Q, into
. For such random variables, we have, by Holder's
S\l/S
and hence it suffices to prove
B0.26)
v,{P(Bt
<
Mixing Processes 171
For each i, Holder's inequality gives
I vtfp, | At) -
Since
B0.26) will follow if we show that
;20.27)
At) - P(fi,)lj || |Р(Я, | A,) - P(fi,
lolds for each i. If Сг+ [Cr] is the union of those Bd for which Р(Б3-1 Аг) —
Р(Б3) is positive [nonpositive], then Cf and Cr lie in <^?™+n, and hence
,) - P(Cf)]
+ [P(CT) - P(C71 Л,)] < 29V
Thus B0.27) holds, which completes the proof.
Taking r = 1 and ^ = oo in Lemma 1 leads formally to this result: If
Е{|Ш < oo and P{|?7| > C} = 0 (| measurable «^.^ and rj measurable
. then
B0.28) lEdr?} - ^{ЦЕ^}! <
The inequality is easily proved directly: It suffices to consider simple random
variables given by B0.24) and B0.25), where |i>3-| < C. Since
< С 2 \щ\ ?(At) 2 \P(B, | At) - P(B3)\,
г i
B0.28) follows by B0.27).
We shall only need the special case of B0.28) in which both random
variables are bounded.
LEMMA 2 If | is measurable Jt*_^ and ||| < Cl5 and if r\ is measurable
in > 0) and \rj\ < C2, then
B0.29)
Let
B0.30) Sn = Si + • • ¦ + ln
and So = 0. Let us temporarily abandon the hypothesis that {?n} is ^-mixing
(retaining the hypothesis of stationarity, however).
172 Dependent Variables
LEMMA 3 Suppose that E{?0} = 0 and
B0.31) |
Then
B0.32)
П fc=0
for all n, and
B0.33) 1 E{Sn2} -> <r2,
vv/zere
B0.34) о*=Щ*}+
Proof. If pk — E{?0?fc}, then, by stationarity,
E{Sn2} = «Po + 2nf (и -
B0.32) follows immediately and B0.33) follows from
n
n—1 oo
г=1 fc=i
Suppose once more that {?n} is 9?-mixing. If ^0 has mean 0 and finite
variance, then, by Lemma 1 with r — s = 2,
B0.35)
< oo, then B0.31) holds, so that the variance of Sn is asymptotically
no2 with <r2 defined by B0.34). The quantity a2 may vanish even though ?0
has positive variance. This is true, for example, if |n = ?n — ?n_l5 where the
?n are independent and identically distributed (see B0.10)). The case a = 0
is a degenerate one, to be excluded in most of our theorems.
LEMMA 4 If |0 w bounded by С and E{|0} = 0, and if'Z(pni< oo, then
where K^ depends on <p alone.
Proof. We shall show that
B0.36) E{Sn4} < 768С
Г °° i~l2
Li=o J
If we replace cp{ by B0.7), the series in B0.36) does not increase; hence we
may assume
B0.37) 1 = cp0 > <px > cp2 > • ¦ • .
Mixing Processes 173
By stationarity,
B0.38) Е{ЗД < 4! n 2 \Щ0^г+^ш+к}\,
where the indices in the sum are constrained by
B0.39) i,j, k>0, i+j + к < n.
By Lemma 2,
B0.40)
and
B0.41)
By Lemma 2 again (and stationarity),
Two further applications of Lemma 2 yield
inserting these two inequalities into the preceding one, we obtain
B0.42)
By B0.40), B0.41), and B0.42), the summand in B0.38) does not exceed
4C4 times the minimum of the three quantities
<Pi> <Рк, <Рг<Рк + <Pi,
and hence the sum itself does not exceed 4C4 times
B0.43) 2 <Pi + I V* + I (<Pi<P* + V,) = I <Рг<Рь + 3 2 V<,
where in each sum the indices obey B0.39) as well as the restrictions indicated.
Since q>t < 1,
n oo
3=0 i,k=O
Furthermore, by B0.37),
n г п
2 ^j ч 2, 2 ^i S *m 2*\}~*
i,k<i i=0 j,k=O i=0
nn oooo, Г00 i~I^
= 2n 2 2 ^ < 2и 2 ?>u 2 9V < 2n 2 Vi •
u=0 v—u u=0 v=u j 2=0 I
From these two inequalities it follows that B0.43) is at most
B0.44) 8л|~2
174 Dependent Variables
and hence that the sum in B0.38) is at most 4C4 times this quantity.
Multiplying B0.44) by 4! n • 4C4, we arrive at B0.36).
Functional Central Limit Theorem
Define a random element Xn of D by
B0.45) Xn(t, со) = -L S[ni](co), 0 < / < 1.
THEOREM 20.1 Suppose that {|n} zs tp-mixing with 2n 9?n* < oo and that
|0 /гй5 wean 0 and finite variance. Then the series
B0.46) <г2=Е{!02} +
fc=i
converges absolutely; if a* > 0 and Xn is defined by B0.45), then
B0.47) Xn К W.
Proof. That the series B0.46) converges absolutely follows from B0.35).
Suppose o2 > 0. We shall prove B0.47) by an application of Theorem 19.2,
thus avoiding a separate argument for the central limit theorem.
We shall prove that {Snz/n} is uniformly integrable. If we write Ea{f7}
as shorthand for the integral of U over the set {17 > a}, the condition for
uniform integrability becomes
B0.48) lim sup E J- sA = 0.
a-юо n (П )
Assuming this condition for the moment, let us see how the rest of the proof
goes through. We must first show that Xn has asymptotically independent
increments, in the sense that A9.13) implies that the difference A9.14)
converges to 0 as n ->¦ 0.
Suppose щ and u3- are integers with щ < vx < w2 < v2 < • • • < ur < vr
and щ — Vt_x > b, i = 2, . . . , r, and suppose that, for i = 1, . . . , r, E{
lies in «л^*|. It follows from the definition of ^-mixing by induction on r that
B0.49)
< Mb).
Suppose that A9.13) holds and let Et be the event {Xn(tt) - Xn(st) ? H^,
where Ht e 0P-. Then Et lies in ~^[?*l.]+1 and, if <5 is the smallest difference
Si - tt_lf then [nSi] + 1 - [шг_г] > [пд], so that, by B0.49), the difference
A9.14) has modulus at most rq>([nd]). Since <5 is positive, Xn does have
asymptotically independent increments.
Mixing Processes 175
Since X2{t) < S*ntJo*[nt], B0.48) implies that {X2{t)} is uniformly
integrable for each t. Certainly, E{Xn(t)} = 0, and Lemma 3 implies that
E{Xn2(t)} -»-1. By Theorem 8.4 (adapted to D) the tightness condition
involving A9.15) will follow if we prove that, for each positive e, there exist
a X, X > a, and an integer n0 such that n > n0 implies
B0.50) (
Let us set
3=1
Since |0 has a finite second moment, there exists an increasing sequence of
integers mt such that n > тг implies P{|?ol > vnji2} < Ijni2. If we define
pn = i for тг < n < mi+1 (and jun = 1 for n < тх), then/?n goes to infinity,
but so slowly that
B0.51) lim nP{S*n > Xjn) = 0
n-»oo
for each positive X. We may at the same time choose pn in such a way that
Pn <n.
Given e, choose Я so that X > <r and so that
B0.52)
for all i, which is possible because of B0.48). If
Et = (max \St\ < ЪХ^п < \St\),
then
pfmax \St\ > 3V«) < P{|Sn| > V») + I^C^ П
With p = pn, the sum here is at most
" Si+,\ > Xjn} +ПI V(?, n {\Sn - Si+P\ > Xjn})
il
п—p
Each term in the first and third of these sums is at most P{S* > Xsjn}, and
we can estimate the second sum by using the fact that E{ e J#*_«,'•
P(max \St\ > 3XJn\ < P{\Sn\ > XJn) + (и + p)?{S* > Xjn}
{ i< )
К - si+P\ >
176 Dependent Variables
And now B0.52) and the fact that the Et are disjoint yields
P max \Si\ > Mjn\ < — + 2n?{S*v > Xjn) + ~2 + 9V
From B0.51) and the fact thatpn -> go, we conclude that
Pjmax |SJ > ЗД7"| < 7?
1г<п j Я
for large n, and this is B0.50) except for the two irrelevant factors of 3.
It now remains to prove B0.48). If | |ol is bounded by C, then, by Lemma 4,
A \ 1/1
± С 2I <? ± С _
1 <f 2W 1 e/ 1
[n ) a In2 / a
where ^ depends on cp alone. Thus B0.48) holds if |0 is bounded. If l0 is
not bounded, define, for real x and positive u,
ix if |x| < u, @ if |x| < u,
[0 if |x| > u, [x if |z| > u,
and put
Then x =fjx) + ^и(х) =/u(x) + gu(x), so that, if Snu = 2;=1/M(|3) and
^nU = SjLi f«(f,-), then 5n = 5nu + ^nu and hence
B0.54) Sn2 < 2SL + 2D2nu.
Since/u(^0) is bounded by 2м, it follows by B0.53) that
B0.55)
[n
By Lemma 1,
and it follows by B0.32) that
м
B0.56)
[n
fc=0
From B0.54) and the relation Ea{U + V} < 2t^{U} + 2E{V}, we now
obtain
И 1И
Mixing Processes 177
and B0.55) and B0.56) lead to
for an appropriate K'v depending only on cp. We can achieve B0.48) by
choosing и so that ^Eu2{|02} < \e and then choosing a so that К^/сс < ^е.
This completes the proof of Theorem 20.1.
Under the hypotheses of the theorem we of course have
B0.57) 4 ^ 2
4= S" ^
if <r2 > 0. And if a% = 0 E0.57) is equivalent to SJy/nA*0 and follows
easily by Chebyshev's inequality and Lemma 3.f
It is easy to make up a multivariate version of B0.57). If |n is the random
vector
B0.58) |n = (Й» . .., ?<;>),
we can define the notion of 9?-mixing just as before. If 2<pn? < oo and if the
!^° have mean 0 and finite variances, it follows by an application of the
Cramer-Wold technique (Theorem 7.7) to B0.57) that тг* 2?.x fл has
asymptotically a normal distribution, centered at the origin, with covariances
B0.59) ait = E{tf>^>} +1 E{f«>tf>} + | Е(Й«ЙЛ},
where the series converge absolutely. The matrix (<т^-) may be singular or
nonsingular.
These results apply to each of the four examples given above. Note that in
Example 1 (p. 167) and in Example 3 (p. 169) the series B0.46) are finite.
In Example 2, with |n =/(Q, where the transition matrix for {?n} is
irreducible and aperiodic, the asymptotic variance o2 can be written
* = 2 7u/(u)f(v),
uv
where
7uv = KPu - PuPv + Pu2(P(uv - Pv) + PvJ,(pikJ - Pu)-
If the matrix (yuv) annihilates the vector (/(и)), then o2 = 0.
f If a2 = 0, it is even possible, by adapting the proof of B0.50), to show that
178 Dependent Variables
Integrals in Place of Sums
Theorem 20.1 has a natural formulation with {?n} replaced by a process
in continuous time. Let
B0.60) {vt(co):-oo < t < со}
be a stationary stochastic process satisfying
B0.61) E{u0} = 0, E{y02} < oo.
Suppose that the process B0.60) is measurable,! so that the integrals
B0.62) f vu(co) du
are well defined and finite with probability 1. Let Yn be the random element
of D defined by
Yn(t, со) = -1= f"с» ds,
(У-sJn Jo
where a will be determined later (Yn, of course, lies in C). We should like to
prove
B0.63) Yn ^ W.
We define {vt} to be ^-mixing if
B0.64) |P(?X n E2) -
holds whenever ?x lies in the <r-field generated by {vu:u < s} and E2 lies in
the a-field generated by {vu:u > 5 + ?}, where now <р(г) is defined for all
positive t. We shall suppose that {vt} is ^-mixing with
B0.65) f°
Jo
Under this assumption and the assumption B0.61), we shall prove B0.63)
with a (assumed positive) defined by
B0.66) a% = 2 Г
Jo
dt.
Although we could imitate the arguments used before for discrete time, it
is simpler to reduce the present case to the previous one. If
=
Ji
s(co) ds,
i-i
f Doob A953, p. 62).
Mixing Processes 179
then, by two applications of Fubini's theorem, E{?0} = 0 and
B0.67) E{|02} <
j\v,\ ds7\ ? EJ j\2 ds\ = E{Uo2}
< oo.
It is not hard to check that B0.66) and B0.46) define the same a. Since
B0.64) implies that {|n} satisfies B0.2), and since B0.65) implies H<pi(n) <
oo (provided у has the minimal value consistent with B0.64)), it follows by
Theorem 20.1 that, if Xn is defined by B0.45), then
B0.68) Xn ®+ W.
Now
dn = supt\Yn(t) - Xn(t)\ ? -Jp max Г |с<| dt,
Gjn \<i<n J i—1
so that
>e}< X
? О J{|?l>?<rVn>
where Z, = JJ \vs\ ds. Since Щ2} < oo by B0.67), <5n^0, and B0.63)
follows by B0.68) and Theorem 4.1.
The relation B0.63) persists if n goes to infinity in a continuous manner.
If vt is interpreted as the velocity of a particle at time t, the integral B0.62)
is the displacement it undergoes during the time interval (s, t); B0.63) asserts
that the particle is approximately in Brownian motion.
Nonstationarity
Returning to the case of discrete time, let us generalize Theorem 16.3 by
showing that in Theorem 20.1 we may replace P, the probability measure
governing the |n, by an arbitrary probability measure Po absolutely con-
continuous with respect to it. Under Po the process need not be stationary.
THEOREM 20.2 Theorem 20.1 remains valid if P is replaced by an arbitrary
probability measure Po dominated by it.
Proof. Define Xn by B0.45) and X'n by
vn<i<nt
where pn —*¦ oo but pjy/n -> 0. As in Section 16 (see A6.10)), we have
If E lies in .лР «,, then, for each Am Si,
\?{{X'n e A} n E) - ?{X'n e A}P(E)\ < <p(pn - k)-+0.
180 Dependent Variables
Since this is true for every k, the proof now goes through precisely as the
proof of Theorem 16.3.
In Example 4 (p. 169) we may thus substitute Lebesgue measure for
Gauss's measure.
The proof of Theorem 17.2 carries over in exactly the same way:
THEOREM 20.3 If the hypotheses of Theorem 20.1 are satisfied, if vn are
positive-integer-valued random variables such that vjan -^- в, where в is a
positive random variable and an —> oo, then {if a > 0) Yn -^> W, where
Yn(t, со) = —-= S[Vn(m)i](co).
If P{|0 e #} > 0, then, by Theorem 20.2, all our limit theorems persist if
we compute the probabilities conditionally on the event {|0 ? H}. It is
possible to go further and work with probabilities conditional on an event
{lo = a)- To carry this through, we need another lemma. As usual, we
assume {|n} to be 9?-mixing.
LEMMA 5 IfE2E Jt™+n, then
B0.69) |P{?2 || Jt*_^ — P(?2)l < <Pn
with probability 1.
Proof. Suppose P{E21| ^i^} — P(E2) > <pn holds on some set Ex that
has positive measure and lies in ^^_^. Integrating over Ex produces a
contradiction to B0.2). Thus ?{Ег || Jt*_^\ - P(?2) < <?„ with probability 1.
The opposite inequality is treated in the same way.
Let us now suppose that {?n} is regular in the following sense. In the
first place, we suppose that the |n generate the <r-field ^ (this is really no
restriction). In the second place, we assume that there exist conditional
probability measures on 88 relative to ^й_о0. We assume, that is, that there
exists a function P@(M) such that, for each со in Q, P@( •) is a probability
measure on ?8, and such that, for each E in 08, P (E) is a version of the
conditional probability P{.E|| ~#^00} (and hence is measurable ^^oj.
This regularity condition holds, for example, if (Q, 88) is the product of a
doubly infinite sequence of copies of (R1, 0Р) and the |n are the coordinate
variables.!
THEOREM 20.4 Suppose {|n} is regular in the sense described. Under the
hypotheses of Theorem 20.1, there exists in ?8 a set E*, with P(E*) = 0, such
t See Loeve A960, p. 361).
Mixing- Processes 181
hat, if cdo$E*, then the conclusion of this theorem remains valid when P
у replaced by P0)o.
Voo/. Let ?„=$! + 1- Sn and define Xn by B0.45). For positive
Qtegers и and л let Zun be the random element of D defined by
у/ u<i<nt
f
dun(co) = sup, |Xn(f, (a) - Zun(t, co)\,
hen
l,et Fo be the set of со for which ?г(со) is ± oo for some /; then P(^o) = 0 and,
or each u,
20.70) -lim dun(co) = 0, соф Ео.
n-*ao
4ow Xn ^ Why Theorem 20.1, and it follows by B0.70) and Theorem 4.1
hat, for each u,
;20.71) Zun ®+ W
is n -> oo. In B0.71) P is the measure governing the ?n.
Since P(?"o) — 0, we have P^C^o) = 0 f°r wo outside some exceptional
set Ex of P-measure 0. If F is a closed set in D and Fe = {x:d(x, F) < e}
with d either of the metrics for the Skorohod topology, it follows by B0.70)
that
;20.72) lim sup P^co: Xn(co) e F} < lim sup Pmo{co: Zun(co) 6 F,}
П-*ао П-Ю0
For co0 outside Ex.
According to Lemma 5, if E e Л™, then
;20.73) IP^E) - P(?)| < ф)
for co0 outside an exceptional set (of P-measure 0) depending on E. Since
Jt™ is generated by {?„, ?u+1,. . . }, it is generated by a countable subfield.
Since PWo( •) is actually a measure, it follows that B0.73) holds for all
и > 1 and all E e ~#", provided co0 lies outside some grand exceptional set
E2. Now {co:Zun(co) e FJ lies in Л™\ hence
B0.74) Р^ш: Zun(co) e Fj < P{w: Zun(co) e Fe} + ф)
for co0 ^ F2.
182 Dependent Variables
If oo0$E* = Еги Е2, then B0.72) and B0.74) both apply, so that
lim sup Ршо{со: Хп(ш) e F} < lim sup P{co: Zun(co) e F,} + y(u)
n->oo n-»oo
for each F and each e. Since B0.71) holds in the sense of P and Ft is closed,
the limit superior on the right here is at most ?{W e Fe]. Letting e -»¦ 0 and
и —»¦ oo, we conclude that
n-*-a>
for all closed F if a>0 $ E*, which, by Theorem 2.1, proves the theorem.
If we have a one-sided sequence |lf |2, . . . , then Theorem 20.4 still holds
if we compute probabilities conditionally on {fx = a}.
21. FUNCTIONS OF MIXING PROCESSES
In this section we shall generalize the results of the preceding one by analyzing
stationary processes {r}n} for which each r\n is a function of the entire process
{?„}, which we assume to be stationary and 9?-mixing, as before.
Preliminaries
Let/be a measurable mapping from the space of doubly infinite sequences
( . . . , a_l5 a0, a1} . . .) of real numbers into the real line:
/(..., a_l5 a0, al5 . . . ) e Rl.
Define random variables
B1.1) y]n =/(..., !„_!, |я> |я+1, . . .), n = 0, ±1, ±2, . . . ,
where ?n occupies the Oth place in the argument of/. We shall derive limit
theorems for the process {r]n}. Although the rjn must be real valued, the ?n
may now take values in Rk or even in a general measurable space.
If the value of/ depends on only finitely many of the coordinates of its
argument, then {r]n} is again 9?-mixing (with a new function cp), and hence
the theorems of Section 20 apply. (Note that {rjn} is in any case stationary.)
If/effectively involves all the coordinates of its argument, {rjn} need not be
99-mixing. We shall obtain limit theorems for {rjn} under the assumption
that r\n can be closely approximated by functions depending only on finitely
many of the |n.
Let/ be a measurable mapping from R2l+X into R1. We write the general
point of R2l+1 as (а_г, . . . , a0, . . . , <хг), so that the value of/ is in the form
_г, . . . , a0, . . . , аг).
3Ut
21-2) Vm
the subscript /n is a pair, not a product). The process {r]ln} is stationary for
:ach /. We shall assume of the function/and the process {?„} that
= 0, Efoe»} < oo.
Ne shall assume further that there exist functions ft such that r\ln (denned
эу B1.2)) has a finite second moment and such that, if
21.3) v, = v(l) = E{\Vo - Vl0\2},
hen
f v,* < oo.
г=1
This is the sense in which r\n is assumed well approximable by functions of
эп1у finitely many of the |n.
If, in place of the doubly infinite sequence B0.1), we are presented with a
one-sided sequence B0.16), then we must assume in place of B1.1) that r\n.
is given by
B1-4) *г„
where now f is a real, measurable function of one-sided sequences (ax,
a2, . . .). In this case, we take/j to be a measurable mapping from Rl into
R1, we define
B1-5) ^n
and we assume that, if
B1-6) v,
then 2vj^ converges as before. Let us replace {?„} by its doubly infinite
extension (see the discussion following B0.16)). If we rewrite/(al5 a2, . . . )
as a function /(..., a_l5 a0, al5 . . . ) effectively involving only the co-
coordinates with positive indices, and similarly for/j, then B1.1) and B1.2)
have the same finite-dimensional distributions as B1.4) and B1.5). It follows
that the square root of B1.3) is summable in / and hence that to each limit
theorem for functions of B0.1) there corresponds a limit theorem for
functions of B0.16). As before, therefore, we may confine our attention to
the doubly infinite case.
Since rjl0, as defined by B1.2), is measurable ^l_v it follows by Jensen's
inequality for conditional expected values! that
t Doob A953) or Billingsley A965). In all that follows, we take E{| || &} to be measurable
& (rather than equal almost everywhere to a function that is measurable ^).
184 Dependent Variables
Therefore, by Minkowski's inequality,
j|2} < 2vf.
Since l>2v^ converges if livft does, there is no loss of generality in assuming
that
B1.7)
The following lemma implies that, if rjln is given by B1.7), then уг is
nonincreasing.
LEMMA 1 Let 3F and & be a-fields with & <= <g.If E{?2} < oo, then
Proof. The following equalities and inequalities all hold with probability 1.
If 7] = | - E{? [ &}, then $ - E{f || ^} = v - E{rj || ^}. Hence it suffices
to prove E{|?7 — E{?7 ^}|2} < E{?^2}. But
E{\r] - E{r] I ^}|!
Take expected values.
Functional Central Limit Theorem
Write
B1.9) Sn = ^
and define Xn by
B1.10) Xn(f, со) =
THEOREM 21.1 Suppose that {?„} и y-mixing with E <рп^ < oo
the r]n defined by B1.1) have mean 0 and finite variance. Suppose further that
there exist random variables of the form B1.2) such that Svfi < oo, where
vt is defined by B1.3). Then the series
B1.11) <т2 = ЕЫ + 2| E{Wfc}
converges absolutely; if a2 > 0 ant/ Zn и defined by B1.10),
B1.12) Xn^>W.
t According to Doob A953, p. 603), E{rjn || -^^±|} can be represented in the form B1.2)
because it is measurable -^n-l- ^^s ^oes not геа^У matter, however: In the proofs we use
only the fact that {r)ln} is stationary for each / and the fact that r\ln is measurable
Functions of Mixing Processes 185
^roof. We first show that B1.11) converges absolutely. For each к and /,
f i = [\k\, then, by Lemma 1 of the preceding section,
md therefore
21.13) |
because of the assumed convergence of ?<?v and ^vz > B1.11) does converge
ibsolutely.
Although it is possible to derive B1.12) from Theorem 19.2, it is more
convenient to use the results of the preceding section. We first prove
21.14) 4= sn -^ N@, a2).
vVe shall assume, as we may, that r\ln is given by B1.7). Put
[21.15) bln = r]n- r]ln,
ю that v(t) = E{<52J and E{dln} - E{r]ln} = 0. We have
1 1 !L 1 i
21.16) 4=sn = ~ 2vu + -7= 2Л.
md the idea in proving B1.14) is to show, by using Theorem 20.1, that the
irst sum on the right in B1.16) is approximately normal for large n and then
to show that the second sum is small for large /.
First of all
since i > / implies that the left side vanishes, whereas i < / implies that it
coincides with E{E2{?70 — E{?70 | -#ij | Л1_г}}, which by Jensen's theorem
does not exceed v,. In the argument leading to B1.13) we may therefore re-
replace щ by r]l0 and rjk by rjlk and, using the inequality E{?720} <; E{?702}, con-
conclude that |E{?7JO?7jfc}| does not exceed the right side of B1.13). Therefore the
186 Dependent Variables
series converges absolutely and uniformly in /; since it converges termwise to
B1.11), we have
B1.17) lim в? = o\
J-GO
It is easy to check that, for each /, {rjln} is 9?U)-mixing, where
fl if n < 21,
[<p(n - 21) if л > 21.
The convergence of ?(<p(n))^ implies that of ~Z{cp(l)(n))?, and it follows by
Theorem 20.1 that, for each fixed /,
-^Flnu^mo?)
B1.18)
as n —>• oo.
Now N@, of) ^> N@, a2) by B1.17); because of B1.16) and B1.18),
the relation B1.14) will follow by Theorem 4.2 if we show that
B1.19)
lim lim sup P
I-»oo n-»-oo
3=1
=0
for each positive e. To this end, we estimate the variance of 2^=1 dtj.
If m denotes the maximum of / and i, then, by the definition B1.15),
l0
E{<5
J0
ij = dm0. Since v(m) <, v(j), we obtain
In the argument leading to B1.13) we may therefore replace rj0 by dl0 and
Vk by <5№; since E{5^0} < E{^02} by Lemma 1, |Е{<5гоEгЛ.}| cannot exceed the
right side of B1.13). Therefore the series
converges absolutely and uniformly in /. Since each term goes to 0 as / —v oo,
B1.20) lim гг2 = 0.
By Lemma 3 of the preceding section applied to {dln} (the sequence {dln}
is not 93-mixing, but the lemma does not require this),
B1.21)
lim -E
n-*-oo П
>li
= r,
Chebyshev's inequality and B1.20) now yield B1.19), which completes the
proof of B1.14).
Functions of Mixing Processes 187
From the absolute convergence of the series B1.11), it follows via Lemma
3 of the preceding section (recall once again that this lemma requires only
stationarity—99-mixing is not needed) that
B1.22) ^
n
If a2 = 0, B1.14) is the same as SJy/n ^>0. From now on we assume
that a2 > 0. To prove B1.12), we establish the convergence of the finite-
dimensional distributions and then tightness. Let pn be positive integers
going to infinity at a rate to be specified later and define
77 —
uni —
and
B1.24) Vni = ?{Sn - Si+2Pn 1
In these definitions we adopt the conventions that Si_2v = 0 if i < 2pn and
Sn — Si+2Vn = 0 if i + 2pn > n. We shall often write/? in place ofpn.
By Minkowski's inequality and Lemma 1,
3=1
3=1
If
B1.25)
then
B1.26) E{|S, - ?{Sk || ^1+Л I2}
and it follows that
B1.27) E{\Uni - SA2} <
for all i. In the same way we obtain
B1.28) E{|Fni - (Sn - S,)|2} < 2?{SlDn] + 2fx\pn).
Since Uni and Vni are measurable -#!f^ and ^?_v, respectively,
B1.29) \P{UnieHlt VnieH2] - ?{Uni e Hx}?{Vni еЯ2}| < <pBpn)
for all linear Borel sets Hx and Я2.
Consider now the finite-dimensional distributions. From B1.14) it follows
that
B1.30) Xn{t) - Xn(s) ®+Wt- Ws.
188 Dependent Variables
We shall prove that
B1.31) (Xn(t),Xn(l) - Xn(t))^(Wt, Wx - Wt);
the argument is easily adapted to cases of dimension exceeding 2. Let pn
go to infinity slowly enough that pJn^-0. Since E{Sk2} ~ ко2 by B1.22),
it follows by the inequality B1.27) and Chebyshev's inequality that we have
the convergence relation
Similarly,
B1-33) -J- Fn,[nt] - (Xn(l) - Xn(t)) ** 0,
and B1.31) will follow if
Because of B1.29), it is enough (Theorem 3.1) to show that there is conver-
convergence in distribution in each of the two coordinates here; but this follows
from B1.30) by B1.32) and B1.33).
We turn now to the question of tightness and prove that, for each positive
?, there exists a X > о such that
B1.34) (
for all sufficiently large n. Put
3=1
By the argument leading to B0.51), there is a sequence pn going to infinity
so slowly that, if
t
B1.35) Д,(А) = ?{StVn > A^/n},
then
B1.36)
n-»oo
for each positive Я.
Consider the variables B1.23) and B1.24) for this choice of pn. With the
definition B1.35), we have
P{|S, - Uni\ > Ajn} < P{|S
so that, by B1.26),
B1.37) ?{\Uni - St\ > Xjn}
Functions of Mixing Processes 189
for all /. Similarly,
B1.38) ?{\Vni - (Sn - S<)\ > Xjn) <
n
By B1.14) and B1.22) it follows via Theorem 5.4 that the variables Sn*jn
are uniformly integrable. Therefore there exists a X > о such that
B1-39)
for all k. Fix such a X. By B1.37),
B1.40) p(max \St\ > 6Xjn) < pfmax \UJ > 5ljn)
Consider the sets
E{ = max \Uni\ < 5Xjn < \UJ
{ i<i
We have
B1.41) Pfmax \Uni\ > 5Xjn\ < P{\Sn\ > Xjn}
\ i<n j
ii
Since
|Sn - Uni\ < \Sn - St - VJ + \Vni\ + |54 - UJ,
the /th summand in B1.41) is at most
B1.42) P{\Sn - Si - VJ > lVn
+ Р(Ег n {[VJ ^ 2Xy/n}) + P{|^ - UJ
Since Ei lies in Jt^ an<l ^ni is measurable ^™_P,
Р(Ег n {|Kni| > 2lVn}) < Р№)Р{|КЯ1-| > 2lVn} + P(E{) <p{2pn)
By B1.38), ?{\VM\ >
so that
Yt
+ Р(Ег) <рBрп).
Using this estimate for the middle term in B1.42) and the estimates B1.37)
and B1.38) for the other two, we see that the ith summand in B1.41) is at
190 Dependent Variables
most
?{E%)?{\Sn_t\ > hf^i) + Р(Е,)<рBрп)
An
Therefore, by B1.39) and the disjointness of the Et,
f \ni\ > 5V«) < ~ ^
Vni\ > 5V«) < ~ + <pBpn)
) A
The last three terms on the right here go to 0 (see B1.25) and B1.36)), and
it follows by B1.40) that
i -»i =— -у I —- -i2
i<n ) A
for all sufficiently large n, which proves B1.34).
This completes the proof of Theorem 21.1. In particular, we have the
central limit theorem B1.14), valid even if a2 vanishes, and this result has a
multivariate version. Suppose
B1-43) rjn = (rj™, ..., Vnr)),
where each rj1^ is a function
(i) f^V ? ? ? \
with mean 0 and finite variance. Suppose these random variables have
approximations
U) _ /•(»)/? t t \
Vin Jl \Sn-li • • • j ьп, • • • 5 bn+lJ-
If Sj v^(i) < oo for each i, where
then, since
E*(iow«>-
( t=l г=1 j г=1
any linear combination of the components of rjn satisfies the hypotheses of
Theorem 21.1, so that the Cramer-Wold method applies. It follows that
rr% E?=1 rjk is asymptotically normal; the covariances for the limit are
B1.44) EfoiV') + t ЦЧоУ"} +
where the series converge absolutely.
Functions of Mixing Processes 191
Applications
Example 1. Let the ?n be independent and identically distributed with
mean 0 and variance 1 and define
i=—oo
oo
where we assume Tit a? < oo. The random variable B1.7) is
in this case, and the requirement Hvfi < oo becomes
< oo.
oo / \2
2 ( 2 a*)
Example 2. Let (Q, $, P) be the unit interval with Lebesgue measure, as
in Example 3 of the preceding section (p. 169), and let ?n(eo) be the Hth
digit in the dyadic expansion of со (n > 1). Then {|l5 |2> • • •} is stationary
and independent. A random variable r\n of the form B1.4) can be regarded as
a function r)n(co) = f(Tn~1co), where/is a function on the unit interval and
T is the transformation Too = 2co (mod 1). Suppose that JJ /(соJ с/со is
finite and that 2v^ converges, where
ь = \f{
Jo
and each/j depends on со only through the first / digits fi(co), . . . , ?г(со) of
its dyadic expansion. It then follows by Theorem 21.1 that, if /a = JJ f(co) dco,
B1.45) -У i/(T'a>) - Ain) ^> N@, a2)
\i=i
for an appropriate asymptotic variance a2. And there is the corresponding
functional central limit theorem. If there exist such functions fx at all, they
may be taken to be
B1.46) /г(со) = 2гГ f(s)ds, cog/Ц^,^
J(i-D/2! \ -2 2
(see B1.7)).
If/(со) = I{ot)(co) and/ is defined by B1.46), then vx < 2~l, so that
B1.45) holds (jl = 0- In this case Zj»=1 /(Тгсо) is the number of/, 1 < i < n,
for which T^to < t. It Ms a dyadic rational,/coincides with/ for some /,
and B1.45) follows by the central limit theorem for (/— Independent
variables. If/ is not a dyadic rational,/(со) involves the entire expansion
of со.
192 Dependent Variables
If/is continuous and/ is defined by B1.46), then vt < 4wf2B~l), where
wf(d) is the modulus of continuity of/. Therefore B1.45) is true if 2г wfB~l)
converges, a condition which holds, for example, if / satisfies a uniform
Holder condition of some positive order:
B1.47) |/(си) -Дсо')| < K\co - (o'\e, K,6>0.
For example, we may take/(со) = со, in which case r)n(co) = Tn(co). (It can
be verified that this last sequence {щ, г\г, . . . } is not <p-mixing for any <p
going to 0—to know Tnco is to know Tn+1co, Tn+2co, . . . exactly.)
Example 3. Consider the (Q, &S, P) and {a^co), a2(co),. . . } of Example 4
of the preceding section. If 7\ denotes the continued-fraction transformation
B1.48) 7> = - -
со
1
CO
and if a(co) = [1/co], then an(co) = aiT^co), n > l.f As in the preceding
example, B1.4) can be regarded as a function ??n(co) =/(Г1"~1со) with/
defined over the unit interval. And
1 f'/M ,
log 2 Jo 1 + со
will hold if
if
B1.49) — Г
log 2 Jo i + со
and if 2v^ converges, where
B1.50) Vj = — f Cf(°>)-A^y da)
log 2 Jo 1 + со
and where each / depends on a> only through its first / partial quotients
a-^ipS), . . . , flj(co). Since 1 < 1 + со < 2 B1.49) is equivalent to
P
f(coJdco < oo
and the requirement 2v^ < oo is unaffected if we define v? by
|/(co)-/?(co)|2rfco
instead of by B1.50).
Jo
t See Section 4 of Billingsley A965) for this and the other facts about 7\ used here.
Functions of Mixing Processes 193
Since a set {со:аг{со) = at, 1 < i < /} is an interval of length at most
2~J+1, we may, for the same reasons as in Example 2 preceding, take/(со) to
be /((И)(ш) or to be any function satisfying a Holder condition B1.47).
For another example, note that, sincef
, , Pi(oo) - 1
log со — log ?-L-
where/?г(со)/#г (со) is the /th convergent, we may also take/(со) = —log со;
for this/, [л = 7T2/A2 log 2). Moreover,
B1.51) log qn(co) = - 2 log Тгксо + 40, |0| < 1,
fc=O
and it follows that
B1.52) -1 (log gn(co) - -^-) Д> N@, a2)
V"\ 12 log 2/
for an appropriate positive a2.
The functional versions of these limit theorems are easily written out.
Diophantine Approximation
These results lead to limit theorems connected with Diophantme approxi-
approximation. A fraction p/q is a best approximation to со if it minimizes the form
B1.53) \q'-oo-p'\
over fractions p'jq' with denominators q' not exceeding q. The successive
best approximations to со are{just the convergents/?n(co)/^n(co), n = 1,2,... ,
and so the value of the form B1.53) for the «th in the series of best approxi-
approximations is
B1.54) rfn(co) = |gn(co)-co-pn(co)|.
Since §
B1.55) l-log^O) - log?n+1(co)| < log 2,
B1.52) implies
B1.56) -U- log dn(co) - -^-) Д> N@, a2).
yjn\ 12 log 2/
t Billingsley A965, p. 42).
t See Khinchine A961).
§ See D.6) on p. 42 of Billingsley A965).
194 Dependent Variables
This limit theorem has a functional form. From B1.51) and Theorem 4.1
it follows that, if Yn is the random element of D denned by
Yn(t, со) = —p(log q[n<](co) -
[nt]
7Г
12 log 2
(with the same a as in B0.52)), then Yn Д- W. Let
Zn(t,co) = —- -
[n<](co) - — .
12 log 2/
Now B1.55) implies a functional central limit theorem corresponding to
B1.56):
B1.57) Zn^>W.
From B1.57) we can derive an arc sine law for best approximations. By
B1.56), A; log dk А- тг2/A2 log 2) (it can be shown that there is convergence
with probability 1), so that the measure of discrepancy dk(oS) has normal
order e~k**lA2 log 2). Let us say that the fcth best approximation pk(a>)lqk(co)
is "superior" if
dk(co) < <г*Л<шо«»
and "inferior" otherwise. If Qn{co) is the fraction of superior ones among the
convergents
then, by B1.57) and A1.26),
2 _
P{0n < a}-^ - arc sin ^a, 0 < a < 1.
For example, if и is large and if pn(co)lqn(a>) is superior, then the odds are
approximately even that more than 85 per cent of the convergents B1.58)
are also superior.
Nonstationarity
If X'n is denned by
X'n(t, со) = -±= E{5[ni]
where pn —»¦ oo but pjy/n —»¦ 0, if Xn is defined by B1.10), and if
<5n = supt|Xn@-X;@|,
Empirical Distribution Functions 195
then
By Minkowski's inequality and stationarity,
b№ < -7= f
1=1
If 2v^ < 00, then by Lemma 1 and the choice of pn, Е{<5„2} —»¦ 0. By the
same arguments as before, therefore, Theorems 20.2 and 20.3 carry over to
processes {??„}. It follows, for example, that we may replace Gauss's measure
by Lebesgue measure in the applications to Diophantine approximations.
Theorem 20.4, however, does not carry over. To see this, consider the
process {?„} of Example 2 of this section (p. 191) and put r}n(eo) =
2?Ln |fc(co)/2l-"+1. Although 2?=1 т){(со) is asymptotically normal (when
properly normalized), this cannot be true if the probabilities are computed
conditionally on ^i(co): Since rj^w) = со it follows that, if ^i(co) =
a @ < a < 1), then the conditional distribution of 2"=1 ч]{(со) is a unit mass
at 2>=1 ъ(<х).
Remarks. The results of this and the preceding section, which are new, extend those in
Billingsley A956 and 1962). Central limit theorems under the hypotheses of Theorems
20.1 and 21.1 have been proved by Ibragimov A962); see his paper for references to the
earlier literature (see also Doeblin A940), Fortet A940), and Kac A946)).
22. EMPIRICAL DISTRIBUTION FUNCTIONS
We shall analyze the empirical distribution functions for sequences {!„} and
{r]n} of the kind discussed in the preceding two sections.
cp-Mixing Processes
Throughout the section, {?„} is stationary and <p-mixing. We shall need the
following estimate, related to Lemma 4 of Section 20.
LEMMA 1 Suppose that |?0| < 1 with probability 1, that E{f0} = 0, and
that Yik^cp^ < oo. Then
B2.1) Е{5Й4} < 288[n2E2{f02} +
196 Dependent Variables
Proof. We have
B2.2) Е{5„4} < 4! n
with the indices constrained by
B2.3) i,y, к > 0, i +7 + A: < и.
Write/7 in place of E{f02}. We first prove these three inequalities:
B2.4) V
B2.5)
B2.6)
By Lemma 1 of Section 20, the left member of B2.4) is at most
since |fn| < 1, B2.4) follows. A similar argument leads to B2.5).
By another application of Lemma 1 of Section 20, the left member of
B2.6) is at most
B2.7) |Е{
Two further applications of the same lemma give
o?<}l < 2<P<*P,
from \?n\ < 1 it follows that B2.7) is at most 4^i^/72 + 2y$p, which
proves B2.6).
Now B2.2) and the three inequalities just proved imply
3,k<i I
where the indices also obey B2.3). Since
4! 4nlp* 2
\ i,k<3
i<Pk < 2 2 к?* <
. - - j=Oi,k=O
and
2 ^<2 2^
J",fc<i i=Oj,k=O k=0
B2.1) is proved.
Suppose now that
0<fnM<l;
let Fn(t, со) be the empirical distribution function for fi(co), . . . , ?n(co), and
Empirical Distribution Functions 197
define Yn by
B2.8) Yn(t, со) = Jn(Fn(t, со) - F(t)),
where F is the distribution function for ?0, Let
B2.9) gt(a) = /[Oit](a) - *¦(/).
THEOREM 22.1 Suppose {?„} и cp-mixing with Hn2q>n% < oo, and suppose
?0 /zas a continuous distribution function F on [0, 1].
B2.10) ^п^У,
where Yn is defined by B2.8) and У is f/ze Gaussian random function specified
by
B2.11) E{7@} = 0
B2.12)
These series converge absolutely and P{Ye C} = 1.
Proof. We first show that we may confine our attention to the case in
which ?0 is uniformly distributed over [0, 1]. Now {F(?n)} is <p-mixing,
and, since F is continuous, F(?o) is uniformly distributed. If F'Jj, со) is the
empirical distribution function for F(i1(co)'), . . . , F(?n(co)), and if
then, with probability 1,
y;(F(o) = Yn(t)
for all /. If the theorem is true in the uniform case, then Y'n -^> У, where
У is a Gaussian random function with Е{У'(/)} = 0 and (write g't(a.) =
Е{У'E)У'@} =
and where P{Y"eC} = 1. Define h:D-> D by (/ur)@ = x(F(t)) and let
У = Л(У'). Since С с Л-ic, Р{УеС}=1; furthermore, У is Gaussian
and satisfies B2.11) and B2.12). Since h is continuous on C, it follows by
Corollary 1 to Theorem 5.1 that Y'n ®+ У implies Yn ^> Y. Thus we need
only treat the uniform case.
198 Dependent Variables
Assume then that f0 is uniformly distributed, and take F(t) = t in B2.9).
Since {gt(?n)} is <p-mixing and
*«(o=4=
it follows by Theorem 20.1 that Yn(t) is asymptotically normal. From the
multivariate version of this result, it follows that, for each fc-tuple tu . . . , th,
the distribution of {Yjj-d, . . . , Yn(tk)) approaches a normal distribution
centered at the origin; by B0.59), the covariances of these limit distributions
are those specified by B2.12) for Y. It follows also from Theorem 20.1 that
the series in B2.12) are absolutely convergent.
We shall show that for each positive e and r\ there exists a <5, 0 < <5 < 1,
such that
B2.13) P{w(Yn,d)>e}<V
for all sufficiently large n. It will follow by Theorem 15.5 that {Yn} is tight
and that, if Y is taken as the limit in distribution of some subsequence
{Yn.}, then Р{УеС} = 1, 7ДУ, and Yis Gaussian and satisfies B2.11)
and B2.12). This will complete the proof.
Fix e and r\. Since f0 is uniformly distributed,
By Lemma 1 applied to {&(?„) - ?,(?„)},
where K± depends on cp alone. Therefore, if
?
B2.14) ~ut- s,
n
we have (assume e < 1)
B2.15) Е{|Г„@ - 7n(s)|4} <^(t- sf.
?
Assume now that p is a number satisfying с/л < р, and consider the
random variables
Yn(s + г» - Yn(s + (i - \)p), i = 1, 2, . . . , m,
where m is a positive integer. By B2.15) and Theorem 12.2,
B2.16) Pfmax | Yn(s + ip) - Yn(s)\ > x) < ^ m2/,
[i<m J SA
where K2 = 2К^КХ depends on cp alone.
Empirical Distribution Functions 199
We next show that
[22.il)
I y.@ - ВД1 й I Yn(s + p)- Yn(s)\ + pVn, s < t < s + p.
Taking s = 0 for notational simplicity, we see that B2.17) is equivalent to
\Un(t) - nt\ < \Un(p) - np\ +np, 0<t<p,
•vhere Un(t) is the number among gu . . . , fn that satisfy |г- < t. But
Un(t) -nt< Un(p) -nt= Un(p) -np + n(p -t)< \Un(p) - np\ + np
ind nt - Un{t) <nt < \Un(p) - np\ + np.
Now B2.17) implies
;22.18) sup \Yn(t) - Yn(s)\ < 3 max \Yn(s + ip) - Yn(s)\ + pVn.
s<t<s+mj> i<m
[f
;22.19) -<p<-7T,
:hen B2.16) applies, and it follows by B2.18) that
;22.20) P( sup \Yn(t) - Yn(s)\ > 4e\ < Щ m2p2.
[s<t<s+mp ) ?
Choose <3 so that K2dje5 < rj. From B2.20) it will follow that
;22.2i) p( sup | ад - ад | > Ц < vd,
[s<t<s+d )
provided there exist a p and an integer m such that B2.19) holds and mp = д.
But this is equivalent to the existence of an integer m with (<5/?)v« < rn <.
{bje)n, which is true for all sufficiently large n.
For given ? and r], therefore, there exists a <5 such that B2.21) holds (for
ill s < 1 and with s + д replaced by 1 if it exceeds 1) when n is large enough.
But this implies (see the corollary to Theorem 8.3) that
for large n, which is B2.13) except for the factor 12.
Thus Yn ^> Y. The assumption that F is continuous serves only to simplify
the proof. It can be avoided by using Theorem 12.5 where we have used
Theorem 12.2. Of course, Y will not lie in С with probability 1 if F has
discontinuities.
Functions of cp-Mixing Processes
Consider now a function
B2.22) 4n =/(.-., ?-1,?«,?и-ь---)
200 Dependent Variables
of a <p-mixing process, as in the preceding section. We shall assume as before
that t]n has close approximations
B2.23) rjln =/i(fn_i, ...,?„,..., fn+j)
depending on only finitely many of the ?,-. Again we may work with one-
onesided sequences just as well.
Suppose that
0<rjn(co)<\,
and define Yn by B2.8), where now F denotes the distribution function of
rH and Fn(t, со) denotes the empirical distribution of r]i(co), . . . , r]n(co). In
addition to a restriction on the magnitude of cpn, we shall need, in analyzing
the asymptotic distribution of Yn, an assumption the effect of which is to
ensure that the empirical distribution Fln(t, со) of r}n(co), . . . , r}ln(co) agrees
with Fn(t, со) for points t in a set Нг which rapidly becomes dense in [0, 1]
as /—»¦ oo.
We shall suppose in the first place that
о <mn(v) < i-
We shall suppose in the second place that there exist sets Нг in [0, 1] with
these three properties:
(i) If t e Hlt then
B2.24) hoAVo) = hoAVw)
with probability 1.
(ii) If
B2.25) J^iFity.teH,},
then Jt is a pj-net in [0,1], where pt goes to 0 exponentially: There exist
positive A and p, p < 1, such that
B2.26) Pl<AP\ /=1,2,....
(iii) We have Нг с Н1+1.
From (i) it follows that, ifteH^ then Fln(t, со) = Fn(t, со) with probability
1. Define gt by B2.9), as before.
THEOREM 22.2 Suppose that {?„} is cp-mixing with T,n2(pn% < oo, that
r]0 has a continuous distribution function F on [0,1], and that there exist sets
Hl with the three properties just described. Then
Yn^Y,
where Y is the Gaussian random function specified by
B2.27)
Empirical Distribution Functions 201
md
22.28) E{Y(s)Y(t)} =
The series converge absolutely and P{ Y e C} = 1.
Before proving the theorem, we give two examples of its application.
Example 1. Let (O, &, P) be the unit interval with Lebesgue measure and
et ?„(«) be the «th digit in the dyadic expansion of со, as in Example 2 of the
preceding section (p. 191). If 7co = 2co (mod 1) and rjn(a>) = Tn~1co, then
r]n has the form B2.22) (one-sided version). The r\n are uniformly distributed:
F(t) ЕЕ t.
Define r]ln by
Льп(со) = 1-~^ if ^<^(co)<^, l<i<2'.
Then r)l0(w) is a function of the first / digits of со and hence rjln has the
form B2.23) (one-sided version). If t has the form i/21, then B2.24) holds.
Since F is the identity, we may take
which is a 2~J-netin [0, 1]. Thus {rjn} satisfies the hypotheses of the theorem.
Example 2. Let (O, 3%, P) be the unit interval with Gauss's measure
B0.19), as in Example 3 of the preceding section (p. 192). Let ?n(oo) — an(co)
be the «th partial quotient in the continued-fraction expansion of со, and
let r]n(co) = T^co, where Tx is the continued-fraction transformation
B1.48). Then r\n has the form B2.22) (one-sided version), and the distribution
function of the r)n is
F(t) = J_ f^L = logd
log 2 Jo 1 + x log
f
log 2 Jo 1 + x log 2
Define rjln(co) as the /th convergent of T^co:
Then r]ln has the form B2.23) (one-sided version). Let Ht be the set of all
/th order convergents (for all со). Now the unit interval splits into countabiy
many subintervalsf having the elements of Нг as endpoints; ' '
t The fundamental intervals of rank /—see Billingsley A965, p. 42).
= со
202 Dependent Variables
and rjn(co) = Pi(co)lqi(co) lie in the same subinterval, so that B2.24) holds
for t in Hx. Now the subintervals have length at most 2~l+1. Since the
derivative of F is everywhere less than or equal to I/log 2, the set B2.25)
is a prnet in [0, 1] with
<r 2 1
log 2 2'
Thus {rjn} satisfies the hypotheses of the theorem.
Proof of Theorem 22.2. As in the proof of Theorem 22.1, we first show that
it suffices to consider the case in which щ is uniformly distributed on [0, 1].
Let rj'n = F(rjn) and r\\n = F(r)ln). Then r]'o is uniformly distributed and,
if t lies in the set Jx defined by B2.25), then
ho.tyJlo) = ho,f]\4io)
with probability 1. If Щ = J[ = Jx, then {rj'J and {rj'ln} satisfy the hypotheses
of the theorem relative to H{ and J[. If the result is true in the uniform case,
then, by the opening arguments of the preceding proof, it is true in general.
Assume then that r]0 is uniformly distributed; take F(t) = t in B2.9);
assume that, if t e Нг, then B2.24) holds with probability 1; assume
Hx с Hl+1; and, finally, assume that Ht is a pj-net in [0,1], where px
satisfies B2.26). To include 0 and 1 in Нг involves no loss of generality. x
From these assumptions, it follows that \rj0 — ?7г0| < 2pf with probability
1. Therefore, whatever t may be,
P{% < t < Т]Ю} + Р{г}ю <t<r)o}
< ?{t - 2Pl <rjo<t} + P{t<rj0<t + 2Pl},
so that, since rj0 is uniformly distributed,
Since gt(r]l0) is a function of ?_t, . . . , ?г alone, and since B2.26) implies
Ерг4 < oo, it follows by Theorem 21.1 that
1 Л
yjn i-i
is asymptotically normal. The multivariate version of this result (see B1.43)
and B1.44)) shows, just as the multivariate version of Theorem 20.1 did in
the preceding proof, that the finite-dimensional distributions of Yn converge
to those specified for Y and that the series in B2.28) converge absolutely.
Again as in the preceding proof, it now suffices to produce, given positive
? and 7], a d, 0 < <5 < 1, such that
B2.29) P{w(rn, <5) > ?} ^ 7]
for all large n.
Empirical Distribution Functions 203
If s and t both lie in Нг, then the process
s q>(l)-mixing with
n = 0, ±1, ±2,...
<pu\n) =
l
if К 21,
{(p(n - 21) if n > 2/.
Jince ?70 is uniformly distributed, and since
.emma 1 implies
fc=0
vhere K3 depends on <p alone. Therefore
s,teHlf - < t
n
22.30)
mply (assume ? < 1)
•22.31) P{\Ut)-Yn(s)\>*}<
Choose and fix an integer m such that
;22.32)
- s
^(t - sJ
vvhich is possible because of B2.26). We shall inductively define sets
r Av) Av) Av)
Lv: t0 , t± ,. . ., tj>
for v > 0. The sets Lv will be such that
such that
and
and such that
B2.33)
where
»2г — li 5
1<г<2
max
1<г<2
204 Dependent Variables
If f«») _ о and ^0) = 1, then Lo has these properties. Suppose Lo, . . . , Lv
already constructed so as to have these properties. Define t?+l) = t\v), and
take as t%?l] any point of Hm(v+1) satisfying
\Av+l) l(Av) _i *(^'\| s* „
\hi+l ~ Wi + h+ljl <¦ Pm{v+D>
such a point exists because Hm{v+1) is a /3m(<,+i)-net in [0, 1]. From B2.32) it
follows that B2.33) holds also with v replaced by v + 1. Thus Lv+X satisfies
the requirements.
Fix a number в such that
B2.34) i < 60 = 04 < 1.
Suppose ?, и, and у satisfy
B2.35) - < -Ц < —л < -^,
n ~ 2"+1 ~ 2"-1 ~ Vn
which, by B2.33) implies
B2.36) i < a, < ^ < -p .
Suppose further that 0 < и < и and 0 < h < 2*~и. We shall show that
B2.37) p( max \Yn(t{&+i) - Yn(t^)\ >
for a iT5 depending only on <p and the fixed numbers m and в involved in
B2.32) and B2.34). For notational convenience, we carry through the proof
with h = 0.
If 0 < i < 2", then
1 Yn(t?) | < i max J Уя(#) - Yn(t\tm*) 1
(consider the base-2 representation of the integer i), and therefore
Pf
2
Now ^ = /|"-к)еЯтA_м. Since B2.30) implies B2.31), it follows by
B2.36) that
P{
2
J.—0 1—1 p
я. — " J—-i- о
^ 4 -vp v - - j i -у—к ли)
Martingales 205
where JC4 = 2K3m6l(l - бL. Since &_fc < l/2"-fc-\ B2.37) follows with
oo ,-6
(recall that 2в0 > 1 and that the K3 in B2.31) depends on cp alone).
The purely geometric inequality B2.17) applies as before and yields
sup \YJt) - Yn(f$OI < 3 max \УМЦи+Л - Уп(*$)| + Jn .
0<i<2
It now follows by B2.36) and B2.37) that
К
er"
Since f J*l = 4"~u) ? !».„ and a^_u ^ l/2*-"+1, we have, by the corollary to
Theorem 8.3,
B2.38) ?L(Yn, -±-\ > Us] < Щ Ql~\
This inequality holds under the assumption of B2.35) and 0 < и < v.
If ? and r] are given, choose an integer r such that K560rle5 < iq (recall that
60 < 1) and define <5 = l/2r+1. For all n exceeding some n0, there exists a v
satisfying B2.35). If we take и = v — r, then B2.38) holds, which implies
B2.29) (with 12e in place of e). This proves Theorem 22.2.
It is not hard to show that the theorem persists if the probability measure
P governing the ?„ and y\n is replaced by a probability measure it dominates.
Thus we can replace Gauss's measure by Lebesgue measure in Example 2.
Remarks. Theorems 22.1 and 22.2 are new; the case presented here as Example 1 is due
to Ciesielski and Kesten A962).
23. MARTINGALES
In Section 20 we proved a functional central limit theorem for stationary
processes {?„} satisfying a uniform mixing condition. This mixing condition
can be weakened to the assumption of ergodicity if we assume that the
partial sums
Sn = & + • • • + ln
form a martingale .f
f See Doob A953) for the ergodic theory and martingale theory required here.
206 Dependent Variables
THEOREM 23.1 Let {?l5 ?2, . . .} be a stationary, ergodic process for
which
B3.1) Е{?„ !&,..., ln_x} = 0
with probability 1 and for which Е{?„2} = <72 is positive andfinite. IfXn(t, со) =
S[ntl(u))la^t then Xn U W.
Proof. There is no loss of generality in working with a doubly infinite
sequence . . . , |_x, |0, ?1} . . . , stationary and ergodic. f If ^ denotes the
<7-field generated by . . . , ?к_1г ?fc, then, by B3.1) and stationarity,
B3.2)
with probability 1.
We shall verify the hypotheses of Theorem 19.4 with the functions p(t) = 0
and o2(t) = 1 appropriate to Brownian motion (see A9.29)); Xn -^> W will
follow. For Condition 1° of that theorem, it is enough to show that, for
B3.3) limlim sup -
ft 10 n-»oo П
and
B3.4) lim lim sup \
Л10 n->oo rt
- Xn(t)
п{Х + h) - Xn(t)J
= 0
- h\} = 0.
Now B3.3) is an immediate consequence of B3.2). As for B3.4), the relation
B3.2) implies
- Skf
so that, by stationarity, the expected value in B3.4) is
nhi
1=1
where А:„ = [n(t + /г)] — [«?]. Since, by Jensen's inequality, the above
expression does not exceed
a
nh i=
B3.4) follows by the mean ergodic theorem.
t Take the !„ as the coordinate variables on the product of a doubly infinite sequence of
copies of the real line, with the finite-dimensional distributions prescribed by the original
process; ergodicity is preserved because it depends only on the finite-dimensional distribu-
distributions.
Martingales 207
By B3.2), the variables ?k are orthogonal, and hence
B3.5) E{Sn2} = na\
which implies Condition 2°.
Suppose we succeed in proving that {maxfc<n Sk2/n} is uniformly integrable,
which, if we define Ea{?/} as the integral of U over {U > a}, means that
B3.6) lim sup e/- max sA = 0.
a-»oo n {П k<n j
It will certainly follow that ~{Xn2(t):n > 1} is uniformly integrable for each
t, which is one of the hypotheses of Theorem 19.4, and Condition 3°a will
follow easily by stationarity. Finally, since
p(max | Sk\ > A^Jn) < 72 Ea4~ max
( | k\ > ^J) < 72 a4
{k<n j А [П k<n j
it will follow by Theorem 8.4 that the tightness condition A9.51) is satisfied.
It suffices therefore to prove B3.6).
If ?0 has a fourth moment, then E{?*} = EE^I^IJ, with the indices
running independently from 1 to n. If the largest index is not matched by any
other, then, by B3.2), the term vanishes; hence
{^} 2 {f A8} + 6
к i<k i
If 1 ?ol < С with probability 1, then the first two sums on the right contribute
at most 3«2C4 in all, and the last sum is 6 ??=2 EfS^^2}, which cannot, by
B3.5), exceed 3«2C4. Thus
B3.7) Е{5„4} < 6n2C4
if \?o\<C with probability 1.
By a martingale inequality!
B3.8) EJmaxISJ') < (
for у > 1. Therefore B3.5) implies
B3.9) E(maxSfc2]<4«E{!02},
U<n j
and, if Hoi is bounded by C, B3.7) implies
B3.10) Efmax sA < (IL • 6n2C4.
t Doob A953, p. 317).
208 Dependent Variables
For и > 0, define
fe if
10 if
and put
and
If Sku = ?*LX rjiu and Dfcu = S*=1 <5iu, then
B3.11) - max Sfc2 < - max S2ku + - max D\u.
П к<п П к<п П k<n
Now {rjnu:n > 1} has the martingale property B3.1) and \rjnu\ < 2u; it
follows by B3.10) that
Е.Д max SJB) < - eA max S4J < -(^ ¦ 6BWL.
\n k<n j a U k<n j a\3/
And {5nu:« > 1} also has the martingale property, so that, by B3.9) and
Lemma 1 of Section 21 (see p. 184),
ЕД max D2A < 4E{602u}
[П k<n j
From the relation Ea{*7 + V} < 2E^a{t/} + 2Eia{F}, together with B3.11),
it now follows that
tJ- max ьу < к - + tu2^0-j
[n k<n j La J
for a universal constant K. Since |0 has a finite second moment, B3.6)
follows.
Remarks. The central limit theorem under the hypotheses of Theorem 23.1 was proved
independently by Billingsley A961) and Ibragimov A963); the proof of Theorem 23.1 itself
(for bounded ?„) is due to Rosen A967a).
24. EXCHANGEABLE RANDOM VARIABLES
Sampling
For each n, let
v-^4.1) xnl, xn2, . . . , xnk
Exchangeable Random Variables 209
be a sequence of real numbers (not necessarily distinct), and let ?nl(co), . . . ,
?nkn(co) be a random permutation of these numbers, each of the kn\ permuta-
permutations having probability l[knl Define a random element Xn of D by
:24.2) XJt, со) = 2 §JLa>),
with ^„(f, со) = 0 for 0 < t < \jkn. If we sample until the finite population
B4.1) is exhausted, Xn describes the course the sampling takes.
THEOREM 24.1 If
lc 1c
B4.3) J xni = 0, f x2ni = 1,
and
B4.4) max \xni\->Q,
l<i<kn
and ifXn is defined by B4.2), then
B4.5) Xn ^> W°.
Proof. It will be enough to verify the hypotheses of Theorem 19.4 with the
functions p(i) — —1/A — t) and a2(t) = 1 that characterize the Brownian
bridge W° (see A9.32)).
We shall need a preliminary computation. Let у1г . . . , yu be real numbers,
suppose 1 < m < u, and let r\x, . . . , r)m be an ordered sample of size m
made without replacement (there are I I possible samples, all equally
likely). By symmetry, the sum Ег7^1 г\г has mean mE^}, and hence
m u
B4.6)
( j U i=l
The second moment of this sum is, again by symmetry,
wE{r?!2} + m(m -
which computation reduces to
B4.7)
Applying these results to ?^x gni and using B4.3), we see that
E{Xn(t)} = О, Е{ХД0} = [fc^](fcfc" ~^]) .
The second moment here converges to t(l — t) (note that B4.3) and B4.4)
imply kn -»¦ oo).
210 Dependent Variables
Suppose now that 1 < mx < тг + m2 < kn and that we know the values
of f я1, . . . , |nMi. Then ёПг1Я1+1,..., In,mi+TO2 is conditionally distributed as a
sample of size m2 from the population B4.1) with the values |nl, . . . ?nTOi
removed from it. Applying B4.6) and B4.7) with m = m2 and и = kn — mx
and using B4.3), we obtain
B4.8)
and
B4.9) E
E 2
>ni
= jil' ¦ • ¦ '
m2
К ~ wii=1
/pmi+тг
V L»=
bjlJJ • • • 5 Ь
m2) Г _у|2"
x - 1) L *=i "j
m2(m2-l)
-l2
Fix ? < 1; taking m1 = [fcnf] and m2 = [А:„(? + /г)] — [knt] in B4.8)
gives
E{Xn(t + h)~ Xn(t) || Xn@} = -A,
with
Therefore
E{Xn(t
К ~ [knt]
- Xn(t)
Xn(t)} +
1 - t
xn(t)
An(h)
1 -
Since Е{|ЛГЯ(О1} < &{Xn\t)} is bounded and limn AJh) = A/(l - 0.
A9.43) follows. (Although we have for notational convenience conditioned
with respect to but one X(t), B4.8) easily yields the general case as well.
The same remark applies to the next argument.)
It follows from B4.9) that
^+Cn(h)Xn\t)
with
and
h)] - [knt])(kn - [kn(t + h)])
(K - [knt])(kn - [knt] -
с (h) =
~ ^^J ~ 0
« - [Kt] - 1)
Exchangeable Random Variables 211
Therefore
B4.10) -h
h) - Xn(t)f
- h\}
\вм\
h
V f2 _ 1.КпЧ
Now
h \ knj - h
1Н has mean [knt]/kn and second moment (use B4.7))
[knt](kn - [fc,t]) fc" ^4 [fc,t]([M - 1) ^f2.
kn(kn-l) А*"' &«(&«-1) ~* '
hence the variance of this sum tends to 0. Since
limn Bn(h) = h(l-t- A)/(l - гJ,
the first term on the right in B4.10) tends to 0 as n ->¦ oo. Since E{Zn2@} is
bounded and limn Cn(h) = /г2/A - tf, A9.44) follows.
We have verified Condition 1° of Theorem 19.4. Since the maximum jump
max^ \xni\ in Xn tends to 0, it follows by the remark following the proof of
Theorem 19.4 that we need only verify Xn@) = 0 (which is obvious) and
Condition 3°. Hence it remains only to show that
B4.11)
E{|*n@ - Xn(h)\2 \xn(t2) - xn(t)\2} < K{u - o2, h<t< t2,
for some ^independent of n, tu t, and t2. Now the left member of B4.11) is
B4.12)
where i and j range over [kj-^] < i, j < [knt] and к and / range over
[knt] <k, l< [knt2]. Put [knt] - [кяь] = m1 and [knt2] - [knt] = m2; by
symmetry, B4.12) reduces to
m2 -
Write тп =
to
+ m1(m1 - I)m2(m2 -
Calculations (routine if tedious) further reduce B4.12)
1 -
m1m2(m1 + m2 — 2)
2rn-l
h(mi - l)ma(ma - 1)
(K - l)(kn - 2)
3A - 2rn)
kn(kn - l)(/cn - 2)(fcn - 3)
212 Dependent Variables
Since 0 < тп < 1 by B4.3), kn > 6 implies that this last expression is at
most
2ш2
%т^тг{тх + m2) 24ш12ш22 o./mi + тг\2
T3 fc4 I k )¦
If h ~ h > l/kn, then (mx + m2)//cn < 3(t2 - h), so that B4.11) holds with
К = 306. If t2 — tx < l//cn, then B4.11) holds because one or the other of
the two factors inside the expected value vanishes. This proves Theorem 24.1.
We shall need the following consequence of Theorem 24.1. For each
TF°-continuity set A and for each positive e, there exists a positive d(e, A)
such that, if xnl, . . . , xnkn satisfy B4.3) and
B4.13) max \xni\ < d(e,A),
then
B4.14) \?{XnEA}- W°(A)\<e
(Xn being denned by B4.2)). Indeed, if for some e and A there were no such
d(e, A), we could construct a sequence of sets {xni} satisfying B4.3) and
B4.4) but violating B4.5).
Exchangeable Variables
We turn now to a more general problem. For each n, let
B4.15) ?nl, ?n2, . . . , ?nkn
be random variables that are exchangeable in the sense that each permutation
of the set B4.15) has the same joint distribution as the set itself. Define Xn,
as before, by B4.2). We shall assume the ?ni satisfy the three conditions,
B4.16) Z?»«A0, 2&^1> maxIfJ^O,
i=l г=1 1<г<*:п
as n —»- oo.
If B4.15) is a random permutation of B4.1), then it is exchangeable and
B4.3) and B4.4) together imply B4.16). The following result thus generalizes
Theorem 24.1.
THEOREM 24.2 If the variables B4.15) are exchangeable and satisfy
B4.16), ®
Proof. If an = k;1 2*^ ?я< and pn* = Z& (?ni - anJ, then, by B4.16),
B4.17) knan^0, /SnAl.
If ^ni = (?ni — an)//5n, then the variables 7;nl, r]n2,. . . , r]nkn are exchangeable
Exchangeable Random Variables 213
and
Define a random element Yn of D by
Yn(f, со) = J ?yni(co).
Since
Xn@ = 0ЯУЯ(О + [knt]an,
j^n Д. w° and 7n^ Г are equivalent by B4.17) and Theorem 4.1. We
shall prove Yn ^> W°.
Let inl, . . . , ?nkn be a random permutation of r]nl, ..., r)nkn. (The r)ni
are subjected to a random permutation that is independent of them. This is
a probative device; to support the random permutation, the probability
space on which the rjni are defined may require to be enlarged.) The t,ni have
the same joint distribution as the rjni, since the latter are exchangeable, and
hence Yn has the same distribution as the random function defined by
Zn(t, oj) =
г=1
To prove Yn Д- W° it therefore suffices to show that
B4.19) ?{ZneA}->W°(A)
for each JF°-continuity set A.
Given a JF°-continuity set A and a positive e, choose d(e, A) in such a way
that B4.13) implies B4.14) (where, in B4.14), Xn is defined via B4.2) from a
random permutation of numbers B4.1) satisfying B4.3)). If En is the event
En= max \r}ni\< d(A,e)\,
[l<i<kn j
then P(?n) —>-1 by B4.18). By the definition of conditional probability,
P{Zn e A} = f P{Zn e A || Via,..., r]nkn} d?
+ Г P{ZnEA\\rjnl,...,rinkn}dP.
Because of B4.18),
\?{ZneA\r]nX,...,rlnkn}-W\A)\<e
holds on the set En, and therefore
\?{ZneA) - W°(A)\ <2e + 2?{En°).
Since P(En) —»- 1, B4.19) follows, which completes the proof of the theorem.
214 Dependent Variables
Combining Theorem 24.2 with the computations in Section 11 (see A1.39),
A1.40), and A1.42)) gives the limiting distributions for maxj.<j.n E*=1 ?ni,
for тахй<й |??=1 ?nJ, and for the fraction of A:, 1 < к < kn, for which
4.1 Zm > o.
Remarks. For Theorem 24.1 and extensions to other sampling procedures, see Rosen
A964, 1967a, and 1967c). Theorem 24.2 is new. Chernoff and Teicher A958) proved
Xn{t) 3> N@, t(l — 0) (t fixed) under the hypotheses of this theorem; see their paper for
examples of variables satisfying these hypotheses. Related limit theorems concern proba-
probabilities computed conditionally on Xn(l) = 0, where Xn is some random function; see
Dwass and Karlin A963)—and the references there—and Trumbo A965), as well as forth-
forthcoming papers by T. Liggett and M. Wichura.
APPENDIX I
Metric Spaces
We review here a few facts about metric spaces, taking as known the first
definitions (open, closed, dense, limit point, continuity, etc.) and properties.!
We denote the metric space by S and the metric itself by p(x, y). We
denote the closure of a subset A of S by A~, its interior by A°, and its
boundary by dA (= A~ — A°). We define the distance from x to A as
P(x, A) = inf {P(x, y):yeA};
it is easy to check that p(x, A) is uniformly continuous in x. Denote by
S(x, e) the open sphere with center x and radius e: S(x, e) = {y: p(x, y) < e).
By "sphere" we shall mean "open sphere"; we shall call S(x, e) the e-sphere
about x. (Topologists use "ball" in place of "sphere.")
Two metrics p1 and p2 on S are said to be equivalent if (write St(x, e) =
{у:рг(х, y) < e}) for each x and e there is a <5 with S^x, д) с: S2(x, e) and
S2(x, <5) c: S^(ж, e), so that 5 with px is homeomorphic to S with p2.
Separability
The space S is by definition separable if it contains a countable, dense subset.
A base for S is a class of open sets such that each open subset of .S is the
union of some of the members of the class. An open cover of A is a class of
open sets whose union contains A. A set A is discrete if about each point of
f For full accounts, see Dieudonne A960), Royden A963), or Simmons A963), for
example.
215
216 Appendix I
A there is a sphere containing no other points of A—in other words, if each
point of A is isolated in the relative topology. If S itself is discrete, then
taking the distance between distinct points to be 1 defines a metric equivalent
to the original one.
These three conditions are equivalent:
(i) S is separable.
(ii) S has a countable base.
(iii) Each open cover of each subset of S has a countable subcover.
Moreover, separability implies
(iv) S contains no uncountable discrete set,
and this in turn implies
(v) S contains no uncountable set A with
A) inf {p(x, y):x, у eA, x ^ y} > 0.
Proof of (i) -»- (ii). Assuming D is a countable set dense in S, let У be the
countable class of spheres with rational radii and with centers in D. Let
G be open; to prove that У is a base, we must show that, if Gx is the union
of those elements of У that are contained in G, then G = Gx. Clearly,
С?! с G, and to prove G c G1? it suffices to find, for a given x in G, a d in D
and a rational r such that x e S{d, r) c= G. But if x e G, then S(x, e) c= G
for some positive e. Since D is dense, there is г. d in D with р(ж, J) < \s.
Take a rational r with p(x, d) < r < |e.
Proo/ o/ (ii) —>- (iii). Let {Vx, V2, . . . } be a countable base, and suppose
{Ga} is an open cover of A (a ranges over an arbitrary index set). For each
Vk for which there exists a Ga with Kfc с Ga, let Gafc be some one of these Ga
containing it. Then A c: \JkGak.
Proof of (iii) —>- (i). For each n, {S(x, n):^ e S} is an open cover of S.
If (iii) holds, there is a countable subcover {S(xnk, п~г):к = 1,2,...}.
The countable set {xny.n, к = 1, 2, . . . } is dense in S.
Proof of (Hi) —>- (iv). If A is discrete, then for each x in A there is a positive
ex such that S(x, ex) contains no other point of A. Since {S(x, sx):x e A} is
an open cover of A without any subcover, (iii) cannot hold if A is uncountable.
Since a set satisfying A) is discrete, certainly (iv) implies (v). Although we
shall not need the fact, (v) implies separability, so that (i) through (v) are
all equivalent. (For each positive e, use Zorn's lemma to find a maximal set
As of points distant at least e from one another; the union of the AE for
rational e is dense and, if (v) holds, countable.)
Metric Spaces 217
Compactness
A set A in S is by definition compact if each open cover of A contains a finite
subcover. An ?-net for A is a set of points {жд.} with the property that for
each x in A there is an xk such that p(x, xk) < e (the xk are not required to lie
in A). A set is totally bounded if, for every positive e, it has a finite ?-net.
A set A is complete if each fundamental sequence in A converges to some
point of A.
For an arbitrary set A in S, these four conditions are equivalent:
(i) A~ is compact.
(ii) Each countable open cover of A~~ has a finite subcover.
(iii) Each sequence in A has a limit point {has a subsequence converging to a
limit, which necessarily lies in A~).
(iv) A is totally bounded and A" is complete.
It is easy to show that (iii) holds if and only if each sequence in A~ has a
limit point (necessarily in A~) and that A is totally bounded if and only if
A~ is totally bounded. Therefore we may assume in the proof that A = A~
is closed. The implication (i) —>- (ii) is trivial.
Proof of (ii) —>- (iii). Given a sequence {xn} in A, define Fn to be the closure
of the set {xk:k > n}. If f\nFn = 0, then the open sets Fnc cover A, and
hence, if (ii) holds, A <=¦ F-fV • ¦ • VFnc for some n, which implies the
impossible relation Fn n A = 0. Thus ПпРп contains some x, which must
be a limit point of {xn}.
Proof of (iii) —*¦ (iv). If A is not totally bounded, there exist some positive
? and some sequence {xn} such that p(xm, xn) > e for m ф п; {xn} can have
no limit point. Hence (iii) implies total boundedness. And (iii) implies
completeness because, if {xn} is fundamental and has a limit point x, then
{xn} converges to x.
Proof of (iv) —v (i). Assume (iv) and suppose {Ga}, where a ranges over an
arbitrary index set, is an open cover of A having no finite subcover. We
shall derive a contradiction.
Since A is totally bounded, it can, for each n, be covered by finitely many
open spheres Bnl, . . . , Bnkn of radius 2~n. At least one of the Bni must have
the property that no finite subfamily of {Ga} covers А П Bni (which must
therefore be nonempty); let Cn be one such Bni. Since the Bni П Cn_x П A
cover Cn_! П A (if n > 1), we may also insist that no finite subfamily of
{Ga} cover Cn П Cn_x П A, so that, in particular, Cn П Cn_x ^ 0.
Let xn be one of the points that Cn shares with A. Since Cn П Сп_х ф 0
and Cn has radius 2~n, we have p(xn, xn_x) < 6-2~n, which implies
218 Appendix I
p(xn, xn+k) < 6-2~n. Thus {xn} is fundamental and has, by the completeness
assumption, a limit x, which must lie in A.
Now x ? Ga for some a, and S(x, e) <=¦ Ga for some positive e. But then
p(x, xn) < e/3 for some n satisfying 2~n < e/3, which implies Cn «= Ga, а
contradiction.")"
A subset of A;-dimensional Euclidean space Rk (with the usual metric) has
compact closure if and only if it is bounded.
Ifh is a continuous mapping of S into another metric space S', and if К is a
compact subset of S, then hK is a compact subset of S'.
Proof. If {G'a} is an open cover of hK, then {h^G'J is an open cover of К
and hence has a finite subcover {h~1G'a.:i = 1, ... , n). Clearly, {G'a.:i =
1, . . . , n} covers hK.
Upper Semicontinuity
A function/is upper semicontinuous at x if for every positive e there exists a
positive <5 such that p(x, y) < <5 implies f(y) < f{x) + e. It is easy to see
that / is everywhere upper semicontinuous if and only if, for each real a,
the set {x:f{x) < a} is open.
Letfn be real functions on S such that, for each x,fn(x) is nonincreasing and
B) lim/n(x) = 0.
n-*oo
If the fn are upper semicontinuous, then the convergence in B) is uniform on
each compact set.
Proof. For each positive e, the open sets Gn = {x:fjx) < e} cover S.
If К is compact, then К <= Gn for some n and uniformity follows.
The Space В.™
Let R*3 be the space of sequences x = (хг, x2, . . .) of real numbers. If
Po(a> ft) — la — 01/A + |a — 0|), then pQ is a metric on the line R1 equivalent
to the ordinary metric |a — 0|. The line is complete under p0. It follows that,
if p(x, y) = ??.! po(xk, ykJ~k, then p is a metric on R™. If
C) Nk;(x) = {y.\yi - xt\< e, i = 1, . . . , k},
then Nk>?(ж) is open in the sense of the metric p. Moreover, since p(x, y) <
е2~к1A + e) implies у ?Nke(x), which in turn implies p(x, y) < e + 2~fc,
the sets C) form a base for the topology given by p.
t Proof from Dieudonne A960).
Metric Spaces 219
We shall always take i?00 with this topology. It is the product topology,
or the topology of coordinatewise convergence: limn x(n) = x if and only if
limn xk{n) = xk for each k. The space R™ is separable; one countable, dense
set consists of those points with coordinates that are all rational and that,
with only finitely many exceptions, vanish.
Suppose {x(n)} is a fundamental sequence in i?00. Since
PofaOn), xk(n)) < 2kP(x(m), x(n)),
it follows easily that, for each k, {^A), xkB), . . . } is a fundamental sequence
on the line with the usual metric, so that the limit xk = limn xk(n) exists. If
x = (хг, x2, . . .), then x(ri) converges to x in the sense of R™. Thus R™
is complete.
A subset A of R°° has compact closure if and only if the set {xk:x e A} is,
for each k, a bounded set on the line.
Proof. It is easy to show that the stated condition is necessary for compact-
compactness. We may prove sufficiency by the classical diagonal method. Given a
sequence {x(n)} in A, we may choose a sequence of subsequences
/л\ I ""VOJt ~V22/» XV^2.Z)i
in the following way. The first row of D) is a subsequence of {x(n}}, so
chosen that xx = lim^ хг(пи) exists; there is such a subsequence because
{xx:x e A} is a bounded set of real numbers. The second row of D) is a
subsequence of the first row, so chosen that x2 = lim^ x2(n2i) exists; there is
such a subsequence because {x2:x e A} is bounded.
We continue in this way; row A; is a subsequence of row к — I, and
xk = lim.; xk(nki) exists. Let x be the point of R™ with coordinates xk. If
щ = nu, then {х(щ)} is a subsequence of {x(n)}. For each k, moreover,
x(nk), x(nk+i)> • • • all lie in the kth. row of D), so that lim^ хк(щ) = xk. Thus
limf х(пг) = x, and it follows that A~ is compact.
Our final result about Rm is a special case of Urysohn's embedding theorem.
Every separable metric space S is homeomorphic to a subset of R™.
Proof Let {dx, d2, . . . } be a sequence of points dense in S, and define a
mapping h from S into R™ by
h(x) = (P(x, dj, p(x,d2),...), xe S,
220 Appendix I
where p denotes the metric on S. If points xn of S converge to a limit x, then
limn p{xn, djc) = p(x, dk) for each k, so that, since the topology of Л00 is
that of coordinatewise convergence, h(xn) converges to h(x). Thus h is
continuous.
Suppose xn does not converge to x. Then p(xn,, x) > e for some positive
s and for each element of some subsequence {xn>}. If p(x, dk) < ?e, which
must be true for some element of the dense sequence {dk}, then p(xn,, dk) > |e
for all n'; thus р(жп, dk) cannot converge to p(x, d^) for this value of к and
hence h(xn) cannot converge to h(x).
Thus h(xn) —>- h(x) implies xn —>- ж. The same argument shows that h(x) =
h(y) implies x = у (take xn = ?/). Therefore h is a one-to-one, bicontinuous
mapping of S onto the subset hS of R°°.
The Space С
Let С = C[0, 1] be the space of continuous, real-valued functions on the
unit interval [0, 1] with the uniform metric. The distance between two
elements x = x(t) and у = y(i) of С is
p{x, y) = supt \x(t) - y{t)\;
it is easy to check that p is a metric. Convergence in the topology is uniform
pointwise convergence of (continuous) functions.
The space С is separable; one countable, dense set consists of the (polyg-
(polygonal) functions that are linear on each subinterval [(i — 1I k, i/k], i = 1,. . . ,
k, for some integer k, and assume rational values at the points ijk, i =
0,1,-..,*.
If {xn} is a fundamental sequence in C, then, for each value of t, {xn(t)}
is a fundamental sequence on the line and hence has a limit x(t). It is easy to
show that the convergence xjj) —> x(t) is uniform in t, so that x lies in С
and is the limit in С of {xn}. Thus С is complete.
We define the modulus of continuity of an element a; of С by
E) wx(d) = w(x, d) = sup \x(s) - x(t)\, 0 < д < 1.
|s-i|<5
Since
F) K(<5) - wy(d)\ < 2P(x, y),
wx(d) is, for fixed positive <5, continuous in x. Note also that, since an element
of С is uniformly continuous, we have
G) lim wx(d) = 0, x e C.
5-*0
Metric Spaces 111
(The Arzela-Ascoli Theorem.) A subset A of С has compact closure if and
only if
(8)
and
(9)
supHO)|<oo
xeA
lim sup wx(d) = 0.
5->0 xeA
Proof If A~ is compact, (8) follows easily. Since wx(\jn) is continuous in x
and nonincreasing in n, G) holds uniformly on A if A" is compact (see p. 218)
and (9) follows.
Suppose now that (8) and (9) hold. Choose к large enough that
wx(l/k) is finite. Since
к
K01<K0)l+I
-6')-ft1')
it follows that
A0)
sup sup \x(t)\ < oo.
t xeA
We shall deduce from (9) and A0) that A is totally bounded; since S is
complete, this will imply that A~ is compact. Given s, we must construct a
finite ?-net for A. Let a denote the finite quantity in A0), and let H denote
the finite set of points
-а,
v
и = 0, ±1,.. ., ±v,
where v is an integer such that ajv < e (H is an ?-net for the linear interval
[—a, a]). Now choose к large enough that wx(l[k) < s for all x in A, and
take В to consist of those elements of С that are linear on each subinterval
[(/ — l)[k, i[k], i = 1, . . . , k, and assume values in H at the points i[k,
i = 0, 1, . . . , k. The set В is finite (it contains Bv + I)*4 points); we shall
show that it is a 2e-net for A.
If x e A, then \x(ijk)\ < a. Therefore there exists a point у of В such that
x
©-'(9
< e, i = 0,1,.
k.
Since wx(l[k) < e, and since у is linear on each subinterval [(i —
i/k], it follows from A1) that p(x, y) < 2e. This proves the theorem.
We have seen that (8) and A0) are equivalent in the presence of (9). When
(9) holds, A is said to be uniformly equicontinuous. Thus the Arzela-Ascoli
theorem asserts that A~ is compact if and only if A is uniformly bounded
and uniformly equicontinuous.
APPENDIX II
Miscellany
Measurability
Let (Q, 38) and (Q', 38') be measurable spaces; 3$[3$'} is a cr-field of subsets
of Q[O']. A mapping h:Q,—*• O' from Q into O' is said to be measurable
C8, &') if the inverse image hrxM' lies in 38 for each M' in SI1. If h~x38'
denotes the class {h~xM':M' e 38'}, this condition may be succinctly stated as
fr1 381 с SI. Since {M':hrxM' e Щ is a <r-field, i/ ^ й contained in SI*
and generates it, then h~x&'0 <=¦ 38 implies hrx38' <=¦ SI.
Let (Q", ST) be a third measurable space, let/:Q' -> Q" map O' into O",
and denote by jh the composition of h and j: (jh)(со) =j(h(co)). It is easy to
show that, ifh-1^' с SI andj-1^" с SS\ then (jh)'1^" с SI.
If Q = S and O' = S' are metric spaces, h is continuous when h~xG' is
open in S for each open G' in S'. Let «?" and Sf' be the (T-fields of Borel sets
in S and S'. If h is continuous, then, since h~xG' e «9" for G' open in 5', and
since the open sets in S' generate &", h~x?f' <=¦ ?f, so that h is measurable.
If O' is A;-dimensional Euclidean space Rk, we always take 38' to be the
class 3%k of A;-dimensional Borel sets (the (T-field generated by the open
subsets of jRfc with the Euclidean metric), and we say h is measurable 38 if
hrxMk с 38\ h is measurable 38 if /rx{a ei?4:^ < a) e 38 for each
f = 1, . . . , к and each real a. In particular, if Q = 5 is a metric space, if
к = 1, and if h is upper semicontinuous (see p. 218), then h is measurable 5f.
Change of Variable
If P is a probability measure on (Q, 38), and if h~x3$' <= 38, ?hrx denotes the
probability measure on (п',38') defined by (?hrx)(M') = P(/rW) for
222
Miscellany 223
M' e 38'. If/ is a real function on Q', measurable ^", then the real function
fh on О is measurable ?8.
Under these circumstances,\ fis integrable with respect to ?h~x if and only if
fh is integrable with respect to P, in which case we have
A) f f(h(co)) P(dco) = f f(co') Ph~\dco')
for each M' in &'.
If X takes Q into a metric space S and X~x?f <= 88, so that X is a random
element of S (see Section 4), we generally write E{/(Z)} in place of
$f(X(co))P(do>). If P = PX-1 is the distribution of JTin S, A) implies
B) E{/(X)} = f f(x)P(dx).
Js
Tail Probabilities
Let X be a random variable on a probability space (O, &, P). If X is non-
negative and integrable, and if a > 0, then
C) f IdP = aP{I>a} + f°°P{X^ t}dt.
J{X>a} Ja
This will follow if we prove
D) E{X} = rP{X>t}dt,
Jo
since we may replace X by its product with the indicator of {X > a}. If X
has finite range, D) follows by summation by parts. The general result
follows from the fact that any nonnegative X can be represented as the limit
of a nondecreasing sequence of random variables Xn with finite range. (The
equation also holds in the extended sense that if one side of D) is infinite so
is the other.)
Still assuming X nonnegative and integrable, and assuming a > 0, we
have
P{X > a} < - f X dP < - ЦХ}.
a J{x>a} a
E)
Scheffe's Theorem
Let A be a measure (not necessarily finite) on a space (O, 3$), and let p(co)
3.ndpn(co) be probability densities with respect to l\p a.ndpn are nonnegative
functions on Q, measurable ^, such that Jp dX = Jpn dX = 1.
t For a proof, see Halmos A950, p. 163), for example.
224 Appendix II
(Scheffe's Theorem.)^ Ifpn(co) ->p(co) except for со in a set of X-measure
0, then
F)
sup
Ee3S
\ pdX- \ pndX =-\\p-pn
JE JE 2 J
Proof lfdn=p- pn, then J<5n dX = 0. For E in &, therefore,
\dndX
Je
=
\dndX
JE
f bndX
JEC
{\dn\dX;
J
and, if E = {<5n > 0}, there is equality here. This proves the equality in F).
If <5+ is the positive part of <5n, then <5+ -> 0 except on a set of A-measure 0.
Since 0 < 5+ < p, J|<5J dX = 2J5+ dX -> 0 follows by Lebesgue's dominated
convergence theorem.
A special case: If 2^„(г) = 2t-/>(i) = 1, the terms being nonnegative, and
if limnpn(i) = p(i) for each i, then Тц \p(i) — pn(i)\ -»¦ 0. If a(i) is bounded,
it follows that T,t a(i)pn(i) -»¦ 2,- a(i)p(i).
Subspaces
Let S be a metric space, and let Sf be its (T-field of Borel sets. A subset So
(not necessarily in ?f) is a metric space in its own right in the relative
topology. Let ?f0 be the (T-field of Borel sets in So. We shall prove
If h(x) = x for x e So, then A is a continuous mapping from So to S and
hence Ьгг& <= ^Oj So that So n A e ^0 if A e ^. Since {?„ n ^4 :^4 e 5f)
is a (T-field in So and contains all sets So П G with G open in S1, that is, all
open sets in So, G) follows.
If >S0 lies in 5^, G) becomes
Product Spaces
Let 5' and S" be metric spaces with metrics p and p" and (T-fields Sf" and
Sf" of Borel sets. The rectangles
(9) A' X A"
with A' open in 5" and A" open in 5" are a basis for the product topology in
S = S' x S". This topology may also be described as the one under which
t Scheffe A947).
Miscellany 225
(x'n, x"J -> (xr, x") if and only if x'n -+ x and < -+ x". Finally, the topology
may be specified by various metrics, for example,
A0) p((x\ x"), (</', jO) = V [/>'(*', y')f + [P"(x\ y")f
and
A1) P{{x', x"), (y1, y")) = max {p\x\ у'), р\х\ у")}.
Let 9" x 9"' be the (T-field generated by the measurable rectangles (sets
(9) with A' e 9" and A" e 9"'), and let 9 be the cr-field of Borel sets in S
for the product topology. We first show that
A2) 9' X 9" <= 9.
If ir'(x', x") = x' and tt"(x', x") = x", then tt':S^S' and ir':S-+S" are
continuous and hence measurable ((тг')^" <= ^ and (тг")-1^" с ^).
It follows that, if A' e ^' and Л" e &\ then Л' x A" = ir'-1/ П тг"-М"
lies in 9. Since 5^ contains all the measurable rectangles, A2) follows.
Now S is separable if and only if S' and S" are both separable. Let us show
that, if S is separable, then
A3) 9" X 9" = Sf.
In view of A2), it suffices to show that, if G is open in S, then G e 9" X Sf*.
But G is a union of rectangles (9) with A' open in S' and Л" open in S"
(so that A' X A" e9" x 9"'), and, if ? is separable, G is a countable such
union. This proves A3).f
Suppose that X' and X" are random elements of S' and S", respectively, and
have a common domain (Q, &). Now (X'(co), X"(cSj) defines a mapping
(X\ X") from О to S = S' x S", and clearly (JT, Х"У\9" х 9") <= 38.
If S is separable, then, by A3), (X1, X") is a random element of S.
If 5" = S", then p'(x', y') defines a continuous mapping from S' X S" to
jR1; if S' is separable, then the composition p(X', X") is measurable and hence
is a random variable, t
Measurability of Dh
Let Dh be the set of discontinuities of a mapping h from a metric space S to
another metric space 5". Denote the metrics in S and S' by p and p'. Let
Aed be the set of x in S for which there exist points у and z in 5 satisfying
f Without separability, A3) may fail: If S' = S№ is a discrete space whose cardinality
exceeds that of the continuum, then the diagonal {(ж, у): х = у) lies in SP but not in
Sf" x ^" (see Problem 2 on p. 261 of Halmos A950)).
X The counterexample in the preceding footnote shows that this may be false if S' is not
separable.
226 Appendix II
p(x, У) < <5, p(x, z) < d, and p'(hy, hz) > e. Then Aed is an open set. Since
Dh = U П AejS,
z 3
where s and <5 are restricted to positive rationale, Dh is a Borel set in S. This
is true even if h is not measurable (for example, if h is the indicator of a set
A, then Dh = dA is closed no matter what A may be).
Consider now the set E involved in Theorem 5.5. We shall assume that the
mapping h is measurable and that S' is separable and prove E e ?f. If
Be s _t.is the set of ж such that p'Qix, h$) > s for some у with p(oj, y) < <5, then
? = unn и
where e and <5 range over the positive rationals, and it suffices to find sets
CE д i that lie in Sf and satisfy
If the sequence мт is dense in S' and if He m = {x:p'(hx, um) < |e}, then
ЯЕ TO e ?f and 51 = \J mHzm. The requirements on Сг&л are satisfied by
СЕ>г,, = U (ЯЕ>ГИ П jMA J,
m
where /eE>t. m is the set of x such that р'(^у, hz) > e for some pair of points
у and z with p(x, y) < 5, p(x, z) < 5, and z e Hem (Лл<§ж is open). This
provesf ?e ^. (The proof of Theorem 5.5 goes through even if E lies
outside Sf, provided it has outer P-measure 0.)
Helly's Theorem
A distribution function is a function F(x) = F(xu . . . , xk) on Rk with these
three properties (we use the terminology and notation of Section 3):
(i) Fis everywhere continuous from above;
(ii) 0 < F(x) < 1 for all x, Fis nondecreasing in each variable, and, for
each A:-dimensional rectangle (a, b],
A4) • Z±F{a1 + eid1,...,ak+6kdk)>0,
where dt = Ъг — at, where the sum ranges over all 2fc sequences
FX, . . . , 6k) of O's and l's, and where the sign is + or — according
as the number of O's in the sequence is even or odd;
(iii) F(x) —>- 0 as any one coordinate of x goes to — oo, and F(x) -> 1 as
all coordinates of ж go to oo.
f This proof is due to F. Tops0e.
Miscellany 227
If P is a probability measure on (Rk, ?%k) and
A5)' F(x) = P{y:y<x}, xeR*,
then F is a distribution function. It is a standard fact of real variable theory
that, if F is a distribution function, then there is exactly one P satisfying
A5). If F has properties (i) and (ii) above (but perhaps not (ill)), then
there is a finite measure ц on (Rk, Mk) such that
A6) F{x) = [x{y:y<x}, xeRk;
/x will satisfy p(Rk) < 1 but need not be a probability measure.
(Helly's Selection Theorem.) If{Fn} is a sequence of distribution functions
on Rk, then there exist a subsequence {Fn,} and a function F satisfying condi-
conditions (i) and (ii) above (but perhaps not (iii)) such that
A7) linV Fn.(x) = F(x)
for all continuity points x of F.
Proof. Let Rok denote the set of rational points in jRfc—the set of points
whose coordinates are rational—and let (r(l), rB), . . . } be an enumeration
of the points of ДД F°r eacn Fn in the given sequence of distribution
functions,
A8)
is a point of R™. Since 0 < Fn(x) < 1, it follows by the compactness
criterion for R™ (see p. 219) that some subsequence of the points A8)
converges in the sense of R™ to some point (zl9 z2, . . . ) of jR00. Define Fo
on jRofc by F0(r(k)) = zk. We see in this way that there exists a function Fo
on Rok and a subsequence {Fn,} of {Fn} such that
A9) limn.Fn.(r) = Fo(r), r e Rk0 .
Since each Fn is a distribution function, it follows from A9) that Fo
satisfies condition (ii) on Rok: 0 < F0(r) < 1, Fo is nondecreasing as each
coordinate varies over the rationals, and A4) holds if a and b lie in ДД
If we define F on jRfc by
F(x) = inf {F0(r): x < r, r e Rj}, xe Rk,
then F has properties (i) and (ii) (but perhaps not (iii)).
If F is continuous at x, then, given e > 0, we can find points r' and r" of
R? such that r' < x < r" and
F(x) - e < F0(r') < F0(r") < F(x) + e.
For each n',
Fn(r') < Fn,(x) < Fn.(r").
228 Appendix II
From these relations and A9) it follows that
F(x) - e < lim infn. Fn,(x) < lim supw, Fn,(x) < F(x) + e.
Since ? was arbitrary, A7) follows. This proves Helly's theorem.
Kolmogorov" s Theorem
We shall prove the Kolmogorov (or Daniell-Kolmogorov) existence theorem.
Let TTh be the projection from R™ to Rk defined by ттк{х) = (хг, . . . , xk), as
described in Section 3. For к > 1, let ipk be the projection from Rk to jRfc-x
defined by чрк(хг, . . . , xk) = (xx, . . . , zfc_x). Since ттк and tpk are continuous,
they are measurable (ттк1&к е 0t™ and у}кг^к^ с &k).
If P is a probability measure on (R™, M™), define probability measures
txh on (Rk, Mk) by
B0) fxk = \
Since
B1) 7rfc_i = укттк, к > 1,
the measures /^fc satisfy
B2) nk_x= [xkipk\ fc>l.
The problem is to go the other way and construct, from given [xk satisfying
the consistency conditions B2), a P satisfying B0).
{Kolmogorov's Existence Theorem.) If probability measures цк on
{Rk,Mk), k>\, satisfy B2), then there exists on (Д00, M™) a unique
probability measure P satisfying B0).
Proof. For к > i> 1, define a continuous mapping y)ki from jRfc to .R* by
Wk,i(xi, ¦ ¦ ¦ , xk) = (xi, • ¦ • , O- Note that у)кк_г = ipk and that
B3) yk<i = %+i' • • Щ+iVk, Щ = Wk.i^k
From this and B2) it follows that
B4) fr = 1Аьц>^, k>i>l.
Let 3F be the collection of finite-dimensional sets, as defined in Section 3;
3F, which consists of the sets of the form
B5) A = ттГгН, Н e 0t\
is a field generating the (T-field M™. A set of the form B5) can also be cast
in the form
B6) A = тт^Н', Н' e 0tk,
if к > i; we need only take H' = y)k]H and use B3).
Miscellany 229
Suppose now that A is given both by B5) and by B6), where i < k.
Then- a = (al5 . . . , afc) lies in H' if and only if (al5 . . . , afc, 0, 0, . . . ) lies
n тг^Я' = 7гт1Я, which is true if and only if (al5. . . , aj lies in Я. Thus
tf' = ip^H, and B4) implies
B7)
Since B5) and B6) together imply B7), we may consistently define a
Function P on & by setting P(A) = /л^Н) if A is given by B5). Clearly,
P@) = 0, PiR™) = 1, and 0 < P(^) < 1. Suppose A is given by B5) and
В by
В = тг-V, J e M\
where we assume к > i. Then A also has the form B6), and if А П В = 0,
we must have Я' П J = 0, so that
P(A UB) = /лк(Н' U J) = 1лк(Н') + t*k{J) = P(A) + P(B).
Thus P is a finitely additive probability measure on ZF.
Let us prove that P is completely additive on J5" by showing that, if
B8) Л => Л, =>•••,
and Пг ^г = 0, then lim^ Р(Аг) = 0. Since ^4f lies in J5", it has the form
B9) ^=<Я„ Я<?^.
Since a set of the form B5) can be cast in the form B6) for each к exceeding
i, there is no loss of generality in assuming that
C0) пг < n2 < nz < • • • .
We shall show that, if sets B9) satisfy B8) and C0), and if there exists a
positive e such that
C1)
then Hi At Ф 0.
From C1) and B9), it follows by the definition of P that цп.{Н^ > е.
By Theorems 1.1 and 1.4, there exists in jRni a compact subset Кг of Я^ with
fx^H, - Кг) < е№+\ If В, = 1Г-)К,, then Вг eJF, Bt с Л„ and
P(At - Вг)< -1- .
From this and B8) it follows that Ct = Вг П • • • n B{ is a subset of At
satisfying P(A{ - CJ < Щ=ХР(А5 - B3) < e/2. By C1), therefore, P(Ct) >
e/2, so that Q is nonempty.
230 Appendix II
We have constructed nonempty sets Ct <= w^ with Kt compact and
C2) d => C2 =>¦¦•.
Since Q <= At, f|i At # 0 will follow if we find a point common to all the
Q. Let x(j) be an arbitrary element of C,-. If у > i, thenxQ") e Ct by C2), and
hence irnx{f)e Кг. Since ^ is compact, we have sup^ K(/)| < со and
hence sup3 \xt{j)\ < oo. For each i, therefore, (жД1), х{B),. . . } is a bounded
set of real numbers. Therefore (see p. 219) some subsequence {x(j')} of
{x(j)} converges in the sense of R°° to some limit x e R™. Since x(j') e Cy
and each Q is closed, C2) implies that x lies in f^ Ct.
We have shown that P is a completely additive probability measure on
the finitely additive field !F. Since !F generates M™, P can be extendedf to a
probability measure on R™. Clearly, P satisfies B0). Since IF is a determining
class, there can be at most one P satisfying B0), which proves Kolmogorov's
theorem.
It is not difficult to go on from here and prove a more general version of
the theorem. Let RT be the space of all real functions x = x{t) on an arbitrary
set T, which we may as well take to be infinite. (If Г consists of the integers,
then RT = jR00.) For a finite ordered set a = (sx, . . . , ska) of distinct elements
of T, define TTa:RT^Rk° by -na{x) = (x(Sl), . . . , x(ska)); let 0tT be the
(T-field generated by the sets тт~хН with H e Mka (for all a and H). Then
(jRT, &T) is a measurable space; no topology is involved (or need be
involved).
A collection of probability measures [ла on (Rka, 0№) (one measure for
each a) is consistent if ца^ = [лаух whenever a2 = (si ¦> • • • > si}) is a permuta-
permutation of/ of the elements of ax = (sx, . . . , sk) (here у < к) and ip:Rk->- Rj is
defined by y{xx, . . . , xk) = (xh, . . . , хг).
If the [xa are consistent, there exists a unique probability measure P on
(RT, @T) such that Ptt-1 = iia for all a.
To prove this, define, for a sequence т = (tx, t2, . . .) of elements of T,
a mapping ttt:Rt -»¦ jR00 by тгт(ж) = (x(tx), x(t2), . . . ). From the existence
theorem for R™, it follows that there exists a probability measure PT on
{R™, M™) such that Рттт-Х = ptY..tic for every k.
Now the class of sets тт-гН with H e M™ (for all т and Я) is exactly
0lT (and does not merely generate it), and it follows easily that Р{тг~хН) =
PT(H) consistently defines the desired P.
Measurability of Some Mappings
Let T denote the unit interval [0, 1]; let &~ denote the class of linear
Borel subsets of T. For each t, the projection ттг from С to jR1 is measurable
f Halmos A950, Chapter 3).
'€. Since the mapping
C3) ' (x, 0 - x(t)
from С X T to J?1 is continuous in the product topology, and since ^ x 3~
is the (r-field of Borel sets for this topology (see p. 225), C3) is measurable
For x e C, let h(x) be the Lebesgue measure of the set of t in T for which
x(t) > 0. We want to prove that h is measurable #, and we shall derive this
from a more general result. If we define a real function и of a real variable by
fl if a>0
C4) «(a) =
10 if a<0,
then
C5) h(x)= v(x(t))dt.
= Cv(x(t)),
Jo
Now v is (i) Borel measurable, (ii) bounded, and (iii) continuous except on a
set of Lebesgue measure 0. Using these three properties of v, we shall show
that the function h defined by C5) is measurable ^ and is continuous except
on a set of Wiener measure 0.
Since, for each x in C, v(x(t)) is a bounded, measurable function of t,
the integral in C5) is well defined. Since the mapping C3) is measurable
^ x 0Г and since v is measurable, the mapping ip: С x T—^R1 defined by
ip(x, t) = y(z(f)) is also measurable. Since ip is bounded, h(x) = JJ ip(x, t) dt
is measurable in x.| Hence h is measurable ^\
If Dv is the set of discontinuities of v, then by (iii), А(Д) = 0, where X
denotes Lebesgue measure on the line. Let E denote the set of (x, t) for
which x(t) e Dv. For each positive t,
W{x:(x, t)eE}= —L= f «Г* du = 0.
jlt JDV
It follows by Fubini's theorem applied to the measure W x A on
that
z,0e?} = 0
if x ф A, where Л. is an element of ^ with W{A) = 0. Now if xn converges to
x pointwise, then v(xn(t)) -+ v(x(t)) for each t such that x(t) ф Dv. If x ф А,
this is true for almost all t, and hence
{v(xn(t))dt-+{ v(x(t))dt
Jo Jo
by the bounded convergence theorem. This proves that h is continuous
except at points forming a set of FF-measure 0.
t See Halmos A950, Chapter 7).
232 Appendix II
The argument goes through if W is replaced by an arbitrary P with the
property that P-nj1 is absolutely continuous with respect to Lebesgue measure
for almost all t. This is true of W°, for example.
Except for the proof that C3) is measurable, this argument goes through
word for word if С is replaced by D. And the measurability of C3) can be
proved by adapting the proof in Section 14 of the measurability of irt.
One more function on С remains to be analyzed, namely the supremum
h(x) of those t in [0, 1] for which x(t) = 0. Since (x:/z(x) < a} is open,
certainly h is measurable. If h is discontinuous at x, then x(t) must keep to
one side of 0 in (A(x), l) and keep to the same side of 0 in (h(x) — s, A(x))
for some s. That h is continuous except on a set of Wiener measure 0 will
therefore follow if we show that, for each t0, the supremum and infimum of
Hoover [t0, 1] have continuous distributions. Since Wt — Wt with t ranging
over [t0, 1] is distributed as a Wiener path with a linearly transformed time
scale, — Wttj + supf>io Wt has a continuous distribution (see A0.17)). This
last random variable and Wto are independent, and hence their sum also has a
continuous distribution. The infimum is treated the same way. (This function
h extended to have domain D is not continuous except on a set of Wiener
measure 0.)
More Measurability
In Section 17 we defined the mapping ip:D x Do^~ D by ip(x, cp) = x 0 cp
(see A7.5) and A7.6)), and we are to prove it measurable {гр~х2 с 2 x 20).
Since the finite-dimensional sets in D generate 2, it is enough to prove that,
for each t, the mapping
C6) (x, cp) -> rrrt{x о cp) = x(cp(t))
is measurable 2 x 20. If cpk{t) is the smallest ratio i/k not smaller than
cp(t), then cpk(t) I cp(t) for each t. Hence the mapping
C7) (x, cp) -> x(cpk(t))
converges pointwise to C6), and it suffices to prove this latter mapping
measurable 2 x 20. Now {(x, q>):x(cpk(t)) < a} is the union of
C8) {(x, cp): <p(t) = 0} П {(x, cp): x@) < oc}
with the sets
C9) ((x, ф):1=± < cp(t) <Цп f(x, cp):x(i) < a), г = 1,. . ., к.
If H e Й?1, then {cp e D0:<p(t) e #} is the intersection with Do of the subset
-njxH of D and hence lies in 20. Therefore {(x, cp):cp(t) e H} e@ x 20.
Similarly, {(x, cp):x(t) e #} e2 x 2o. Thus the sets C8) and C9) all lie in
2 X 20, which proves the measurability of C7).
APPENDIX III
Theoretical Complements
This appendix, most of which requires general (nonmetric) topology, treats
questions that arise naturally out of the theory of weak convergence but are
irrelevant to its application.
The Problem of Measure
Does there exist on the class of all subsets of a given set U a probability
measure that has no atoms (that assigns measure 0 to each individual point)?
The answer certainly depends only on the cardinality of U. It is clear that no
such measure can exist if U is countable, and it can be shownf under the
assumption of the continuum hypothesis that no such measure can exist if
U has the power of the continuum.
If there does exist an atomless probability measure on the class of all
subsets of U, then the cardinal of U is called measurable; otherwise this
cardinal is called nonme asm able. Thus Xo is nonmeasurable, and the
continuum hypothesis implies that the power of the continuum is also non-
measurable. (Among sets on the line, it is the nonmeasurable ones that are
aberrant. Here the terminology is reversed: Among cardinals, it is the
measurable ones that are aberrant.)
Whether there exist measurable cardinals is a famous unsolved problem
of set theory—the so-called problem of measure.! If measurable cardinals
fSee Birkhoff A961, p. 187)..
% See Keisler and Tarski A964) for an account of the problem of measure and a large
bibliography.
233
234 Appendix III
exist at all, they must be so large as never to arise in a natural way in
mathematics. We shall see that certain irregular phenomena in weak con-
convergence are equally remote from ordinary mathematical activity because
they can arise only if there exist measuable cardinals.
Separable Measures
Consider now a probability measure P on the class Sf of Borel sets in a
metric space S. According to Theorem 1.4, P is tight if S is separable and
complete. Certainly, completeness can be weakened to topological complete-
completeness (the condition that there exists for S an equivalent metric under which
it is complete), and an examination of the proof shows that the assumption
of separability can be weakened to the assumption that P has separable
support. Let us define P itself to be separable if it has a separable support.!
We then have the following result.
THEOREM 1 If P is separable and if S is topologically complete, then P
is tight.
Remark 1. The hypothesis here that P is separable cannot be suppressed:
If P is tight, then it has a сг-compact support, and hence is separable. But, of
course, S itself need not be separable.
Remark 2. The hypothesis of topological completeness cannot be
suppressed either: Let S be a thick subset of [0, 1]—a set whose inner
Lebesgue measure X^S) is 0 and whose outer Lebesgue measure X* (S) is
1—with the relative topology of the line. Here ?f consists of the sets S П A
with A an ordinary linear Borel set (see G) on p. 224). If P is the restriction
of X* to У, then P is completely additive; it is a probability measure
because X*(S) = 1. If К is compact in S, it is an ordinary Borel set and hence
P(K) = 0 because X^ (S) = 0. Hence P is not tight. Since P is separable
(S itself being separable), Theorem 1 becomes false without the completeness
hypothesis.
Remark 3. If S consists of the rationale with the relative topology of the
line, then, since S is сг-compact, each P on S is tight. On the other hand, by
Baire's category theorem,J S is not topologically complete. It is an open
problem to characterize topologically those metric spaces that support tight
probability measures only.
The question now arises, do nonseparable probability measures exist?
t This is the specialization to the metric case of the more general notion of a т-smooth
measure; see Varadarajan A958a and 1961a).
% See Kelley A955, p. 200).
Theoretical Complements 235
THEOREM 2 A necessary and sufficient condition that each probability
measure on ?? be separable is that each discrete^ subset ofS have nonmeasurable
cardinal.
Proof. We first prove the necessity. Suppose S contains a discrete subset
Ao with measurable cardinal. We shall construct a nonseparable P on ?f. Let
Q be an atomless probability measure on the class of all subsets of Ao, and
define an atomless probability measure P on У by P(A) = Q(A П Ao).
If A is separable, then А П Ao is both discrete and separable and hence
(p. 216) countable, so that Q(A П ,40) = 0. Thus P can have no separable
support.
The proof of sufficiency is much deeper. Suppose that each discrete subset
of S has nonmeasurable cardinal and consider an arbitrary probability
measure P on ?f. We are to show that P is separable. Note first that it suffices
to show that each open cover IS of S contains a countable subclass
{Gx, Ga, . . . } with
A) P(\JnGn) = 1.
Indeed, for each к there is then a sequence Akl, Ak2, ... of open 1/fc-spheres
with P(\JnAkn) — 1> and DfeUn^kn is a separable support for P.
Consider then an open cover ^ of S. By the paracompactness theorem,!
there is a class Ж with these properties: (i) Ж is an open cover of S. (ii)
Each element of Ж is contained in some element of У. (iii) Ж can be
represented as a countable union
B) Ж = \}пЖп,
where, for each n,
C) inf {P(A, B):A, ВеЖп,А^В}>0,
p being the metric on 5*.
Fix n for the moment. From each element of Жп choose a single point,
forming in this way a set Sn. Because of C), Sn is discrete. For А с Sn, let
Xn(A) be the P-measure of the union of all those elements of Жп that meet A
(this union is open). Since the elements of Жп are disjoint, Xn is a finite
measure on the class of all subsets of Sn. Since Sn has nonmeasurable
cardinal by hypothesis, Xn has a countable support. (Otherwise we could
subtract away the atomic part of Xn to obtain on the class of all subsets of Sn
an atomless measure vn with 0 < vn(Sn) < oo, and vn could be normalized
to a probability measure.) Let An be a countable subset of Sn with
D) h(Sn - An) = 0,
t See p. 215.
t See Kelley A955, p. 129).
236 Appendix III
let /n be the class of elements of Жn that meet An, and define $ = \JnJ n-
Since each class $n is countable, so is Jf.
If Hn is the union of the sets in Ж п and Jn is the union of the sets in f n,
then, by D) and the definition of kn, P(Hn — Jn) = 0. Since Ж covers S, it
follows by B) that \JnHn = S and hence, if / = \JnJn, that P(J) = 1.
Now / is just the union of the sets in Jf, which we can enumerate as G[,
G'z, . . . . Each G'n lies in Ж and hence is contained in some element Gn of
^. Since P(J) = 1, it follows that the Gn satisfy A), as required.
That each P on a separable S is itself separable, an obvious fact, follows
because each discrete subset must have the nonmeasurable cardinal Xo.
Theorem 2 also implies that each P on 5* is separable if 5* itself has non-
measurable cardinal, which is true if the power of S does not exceed that
of the continuum and the continuum hypothesis is true. In any case, the
search for a nonseparable probability measure belongs properly not to
probability but to set theory.
The Topology of Weak Convergence
Consider now the space Z = Z(S) of probability measures on (S, ?P). Make
Z into a Hausdorff space by taking as the basic neighborhoods of P the sets
of the form
E)
(ftdQ-(ftdP
< e, i = 1,..., fcj,
where ? is positive and/1?. . . ,fk are elements of C(S). The resulting topology
we shall call the topology of weak convergence and denote by iV. We have
Pn => P if and only if the sequence {Pn} ^-converges to P.
We shall show that there are three other bases for iV, namely, the sets
F) {Q
with Ft closed, the sets
G) {Q
with Gi open, and the sets
(8) {Q:\QiA,) - Р{Аг)\ < e,i = \, ... ,k)
with At a P-continuity set. Certainly, each of these three classes of sets is a
basis (for some topology). Theorem 2.1 is the sequential version of the
following result.
THEOREM 3 The three bases just described all generate iV.
Proof. Since F) and G) coincide if Gt = Ff, the two corresponding bases
are identical. We shall show that the basis F) generates the same topology as
does (8) and then that it generates the same topology as E), that is,
Theoretical Complements 237
Fix P. If A is a P-continuity set, then for б in a set of the form F) we have
Q(A) < Q(A~) < P(A-) + ? = P(A) + ?, and for Q in a set of the form F)
(or G)) we have Q(A) > Q(A°) > P(A°) - s = P(A) - s. Since the sets
F) do have the properties of a basis, each set (8) contains a set F) (with the
same P). On the other hand, if F is closed, then there is some д for which
(9) Fd = {x:P(x,F)<d}
is a P-continuity set and
A0) P(FS) < P(F) + \e.
If \Q(Fd) - P(Fd)\ < |e, then Q(F) < P(F) + e. Thus each set F) contains
a set (8).
We now compare F) with E). Given a closed F, choose д so that (9)
satisfies A0), and then choose in C(S) an/with value 1 on F, value 0 outside
F8> and value everywhere contained between 0 and 1 (Theorem 1.2). If
\(fdQ - lfdP\ < is, then Q(F) < P(F) + e. Thus each F) contains a E).
It remains only to find within E) a set of the form F). We need consider
only a single / in C(S) and we may assume that 0 < f(x) < 1 for all x.
Choose к so that \\k < ? and let Ft — {х:Цк <f(x)}. By B.2) we have
J/dQ < ? + k-1 2, Q{FX) and fc-1 2, P{F,) < ifdP. For Q in F) we thus
have ifdQ <fdP + 2e. The same argument applied to 1 — / completes
the proof.
By N(P) we shall mean a ^-neighborhood of P having any of the forms
E) through (8). We are free to work with whatever base is most convenient.
THEOREM 4 The probability measures with finite support are W-dense
in Z.
Proof. Consider a neighborhood N(P) of the form F). From each nonempty
set В in the finite partition generated by Fx, . . . , Fk choose a single point and
place there a mass P(B). The resulting measure has finite support and lies in
N(P) because it agrees with P for each Ft.
Theorem 4 applies even to the approximation of a nonseparable P (if
there are any). Therefore, if a subset П of Z consists exclusively of separable
measures, one cannot conclude that each element of the ^-closure of П
is separable. If, however, the elements of П have a common separable
support and P lies in the "^-closure of П, then P is separable (because, if
A is the closure of a common separable support, then A is separable and it
follows by Theorem 3 that P(A) =1). Since a countable collection of
separable measures has a common separable support, P is separable if
Pn^> P and each Pn is separable.
It is natural to ask when iV is metrizable. For P and Q in Z, let p(P, Q)
be the infinum of those positive s for which the inequalities Q(A) < P(Ae) + e
238 Appendix III
and P(A) < Q{AE) + ? hold for all A in ?', where Ae = {x:p(x, A) < s}.
From p(P, 0 = 0 it follows that P and Q agree on closed sets and hence
(Theorem 1.1) are identical. The remaining postulates being easy to check,
p is a metric on Z—the Prohorov metric. We shall see that, if if can be
metrized at all, then p does it.
Fix P. If for each N(P) there is about P a/7-sphere contained in N(P), we
say p is at least as fine as if at P. If, in addition, for each/7-sphere about P
there is an N(P) contained in it, we say p and if are equivalent at P.
THEOREM 5 For arbitrary P, p is at least as fine as if at P. Moreover,
p and if are equivalent at P if and only if if has a countable basis at P,
which in turn holds if and only if P is separable.
Proof. Given P, F, and s, choose <5 so that <5 < s and (9) satisfies A0). If
p{P, Q) < R then Q(F) < P(Fd) + <5 < P(F) + e. Thus each N(P) of the
form F) contains a /7-sphere about P, which proves the first part of the
theorem.
If p and if are equivalent at P, then certainly if has a countable basis at
P. It remains then to prove that this last condition implies the separability of
P and that the separability of P implies the equivalence of p and if at P.
Suppose if has at P a countable basis Nn(P), n = 1, 2, .... By Theorem
4, each fl?=i Nk(P) contains a Pn with separable (even finite) support. But
then Pn => P, which, as remarked after the proof of Theorem 4, implies that
P is separable.
Finally, suppose that P is separable. We are to prove that p and if are
equivalent at P, for which, in view of the first part of the theorem, it suffices
to show that a/7-sphere with radius s and center P must contain some N(P).
Choose <5 so that 3<5 < e. Cover the separable support of P by P-continuity
spheres of diameter less than <5, pass to a countable subcover (p. 216), and
by the usual procedure construct disjoint P-continuity sets Ax, A2, . . . that
cover the support. Now choose к so that
(И)
Each of Alt . . . , Ak has diameter less than <5 and, if «s/ is the (finite) class of
unions of these sets, each element of «s/ is a P-continuity set.
By Theorem 3, there is an N(P) such that QeN{P) implies
A2) \Q(A) - P(A)\ < d, Ae^.
This inequality and A1)'imply
A3)
We shall show that p(P, Q)< s if Q e N(P).
Theoretical Complements 239
Given the general В in ?f, consider the union of those sets among Alt . . . ,
v4fc-that meet B. If A denotes this union, then it satisfies A2). The relation
5ciu (ULi Ai)c> tne relation A ^ Bs (true because diam At < <5), and
the inequalities A1), A2), and A3) combine to yield P(B) < Q(B8) + 2<5
and Q(B) < P(B8) + 3<5. Since 3C < e, p(P, Q) < e follows.
It follows by Theorems 2 and 5 that W is metrizable if and only if each
separable subset of S has nonmeasurable cardinal, in which case it is
metrizable by p.
One can ask after the separability of Z. If S is separable, then the p and Z
topologies coincide on Z, and it follows by Theorem 4 that the probability
measures with finite support are dense. It is not difficult to go on to show
that, if Ao is a countable, dense set in S and Zo consists of those P having
finite support contained in Ao and rational masses, then Zo is countable and
dense relative to p. Thus Z is metrizable and separable if S is separable.
On the other hand, if "W satisfies the second axiom of countability, it can
be metrized by p and is separable. Since S with its metric is homeomorphic
with the set of unit masses in Z, it follows that S is separable.
Prohorov's Theorem
Let us reexamine Prohorov's theorem in the light of the preceding results.
If П is a subset of Z, let Uv and П^- denote its closures with respect to
p and W. Theorem 5 implies that П,, с П^- (with strict inclusion for some
П if W is not metrizable).
A set in a topological space is compact if each open cover of it has a finite
subcover; it is sequentially compact if each sequence in it contains a sub-
subsequence converging to a limit that is also in the set. Consider these three
statements:
1°. Each sequence in П contains a ^-convergent subsequence.
2°. Uir is sequentially compact.
3°. П^- is compact.
The limit of the convergent subsequence in 1° is not required to lie in П,
but, of course, it will lie in П^-. Clearly, 2° implies 1°. From the three
statements, one can make up five other implications; whether any of them
are true is unknown.! If each separable subspace of S has nonmeasurable
cardinal, however, the W topology is metrizable, so that all three concepts
coincide, and we shall see that this is also true if П is tight. In this book
(see the beginning of Section 6), we have defined П to be relatively compact if
it satisfies 1°.
t To me.
240 Appendix III
Consider the direct half of Prohorov's theorem (Theorem 6.1).
THEOREM 6 If U is tight, then Up = IV, IV is tight, the p and
topologies coincide on IV, and IV is both compact and sequentially
compact (in either topology).
Proof. Since П is tight, there is to each s a compact К such that
Q(K) > 1 - ie for all Q in П. Suppose P elV By Theorem 3, there is
an N(P) such that Q e N(P) implies Q{K) < P(K) + \e. Choosing Q in
N(P) n П leads to P(K) > 1 - e. Thus IV is tight.
In particular, each element of IV is separable. Therefore (Theorem 5)
the p and iV topologies coincide on IV; since П^ с IV in any case,
Up = IV follows. From the tightness of IV, it follows by Theorem
6.1 that IV is sequentially compact in the sense of W. Since compactness
and sequential compactness are the same in the metric case, the proof is
complete.
The direct Prohorov theorem can also be used to investigate the complete-
completeness of Z. Suppose that S is complete and that each element of Z is separable
(and hence tight). Then p and if coincide on Z, and we shall show that Z
is /7-complete. For this it is enough to show that a ^-fundamental sequence
{Pn} necessarily contains a weakly convergent subsequence, which will
follow by the direct Prohorov theorem if we prove that {Pn} is tight. Finally,
as we saw in the proof of Theorem 6.2 (see p. 40), since S is complete {Pn}
is tight if, for each positive ? and <5, there exists a finite collection Ax, . . . , Ak
of E-spheres such that
A4) Рп(Аг U • • • U Ak) > 1 - e
for all n.
Choose tj so that 2r] < min {e, <5}, and then choose n0 so that n > n0
implies p(Pn, РПо) < r\. Since Р„о is separable, there exist finitely many
^-spheres Въ . . °. , Bk such that P°no(ULi Bt) > l ~ V- Let A, • ¦ • , A be
the 2?7-spheres with the same centers. Since (U*=i ВгУ с UiLi A an(i
p(Pn, РПо) < г) for n > n0, it follows that A4) holds for n > щ. Since each
Pn is separable, we can ensure that A4) holds for the finitely many n preceding
щ by enlarging the system {Аг}.
Thus Z is /7-complete if 5* is complete and each probability measure on it
is separable. Suppose, on the other hand, that Z is /^-complete. It is not
difficult to show that the set Zo of unit masses is /7-closed and hence also
/7-complete and then to show that S is complete.
We turn now to the converse half of Prohorov's theorem.
THEOREM 7 If U is if-compact and each measure in it is separable, and
if S is topologically complete, then П is tight.
Theoretical Complements 241
Proof. By Theorem 1, the elements of П are individually tight. To prove П
itself tight, it suffices as usual to produce, given e and <5, finitely many
<5-spheres Ax, . . . , Ak such that P(A± U • • • U Ak) > 1 — e for all P in П.
If P lies in П, then, since P is tight, there is a compact set К such that
P(K) > 1 — Je, and К can be covered by finitely many open <5-spheres
Bp i5 i = 1, . . . , kP. By Theorem 3, there is a ^-neighborhood N(P) of P
such that <2 e NG*) implies that the union GP of the i?Pji satisfies Q(GP) >
P(GP) — \e and hence Q(GP) > 1 — e. Since П is #"-compact, it can be
covered by a finite selection of these N(P). Take А1г . . . , Ak to consist of
the i?P>i corresponding to the neighborhoods N(P) selected.
In view of Theorem 1, Theorem 7 is the same as the assertion that, if П
is "^-compact and each of its elements is tight, and if S is topologically
complete, then П is tight. The hypothesis that each P in П is tight cannot be
suppressed, because a single nontight P forms a compact set. On the other
hand, it seems reasonable to conjecture that the completeness hypothesis
can be suppressed—to conjecture that П is tight if it is "^-compact and if
each of its elements is tight.f Although the problem is open, there is an
interesting result in this direction.
THEOREM 8 Suppose that Pn => P, where P and each of the Pn are tight.
Then {P, Px, P2, . . . } is tight.
Proof. Since the measures involved are individually tight, it is enough to
show that for each e there is a compact К such that
A5) Pn(K) >l-e
for all sufficiently large n.
Given e, use the tightness of P to choose a compact K' with P(K') > 1 — Je,
and put Gh= {x:p(x,K')< I/A;}. Since Pn=> P, there is an increasing
sequence {nk} such that n > nh implies
A6) Pn(Gk) > P(Gk) - ie > 1 - |e.
For nk < n < nk+1, use the tightness of Pn to choose a compact K'n satisfying
K'n^Gk and Pn(Gh - K'n)< ie. If А, = Гии«и^ then Kk is
compact, К' с Kh<^ Gh, and nk < n < nk+1 implies Pn(Gk — Kk) < Je.
From A6) it now follows that nk < n <! nfc+1 implies
A7) P
Put К = [JkKk. Suppose {xu} is a sequence in K; if there is some one Kk
containing all the xu, then {xu} contains a convergent subsequence; otherwise
t Varadarajan A958a and 1961a) has the result in this form, but his proof contains a
lacuna.
242 Appendix III
there is a subsequence {xu } and a sequence {vm} of integers such that
xUmeKVm^ GVm and vm>m, so that p(xUm, K')< l/m; {xuj must
contain a further subsequence that converges. Thus К is compact. If n > и1}
then «fc < и < nfc+1 for some A;, and A5) follows from A7) and the relation
It is essential in this theorem to assume that P is tight: Take a nontight P
on a separable S, as in the second remark following Theorem 1, and apply
Theorem 4.
Remarks. Theorem 8 and the example in Remark 2 following Theorem 1 are due to
LeCam A957); the remaining results are for the most part due to Varadarajan A958a and
1961a).
Bibliography
Articles in the Soviet journal Teoriya Veroyatnosti i ее Primeneniya are listed as they
appear in the English translation Theory of Probability and its Applications, published by
the Society for Industrial and Applied Mathematics.
\. D. Alexandroff A940-43), "Additive set functions in abstract spaces," Mat. Sb. 8,
307-348; 9, 563-628; 13, 169-238.
Г. W. Anderson A963), "Asymptotic theory for principal component analysis," Ann.
Math. Statist. 34, 122-148.
Г. W. Anderson and D. A. Darling A952), "Asymptotic theory of certain 'goodness of fit'
criteria based on stochastic processes," Ann. Math. Statist. 23, 193-212.
Stefan Banach A932), Theorie des Operations Lineaires. Warsaw: Monografje Mate-
matyczne.
Robert Bartoszynski A961), "A characterization of the weak convergence of measures,"
Ann. Math. Statist. 32, 561-576.
P. J. Bickel A968a), "Some contributions to the theory of order statistics," to appear.
A968b), "A distribution free version of the Smirnov two sample test in the />-variate
case," to appear.
Patrick Billingsley A956), "The invariance principle for dependent random variables,"
Trans. Amer. Math. Soc. 83, 250-268.
A961), "The Lindeberg-Levy theorem for martingales," Proc. Amer. Math. Soc. 12,
788-792.
A962), "Limit theorems for randomly selected partial sums," Ann. Math. Statist. 33,
85-92.
A964), "An application of Prohorov's theorem to probabilistic number theory,"
///. J. Math. 8, 697-704.
A965), Ergodic Theory and Information. New York: John Wiley and Sons.
Patrick Billingsley and Flemming Topsoe A967), "Uniformity in weak convergence,"
Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 7, 1-16.
Garret Birkhoff A961), Lattice Theory. Providence: The American Mathematical Society.
243
244 Bibliography
Z. W. Birnbaum and Ronald Руке A958), "On some distributions related to the statistic
D+," Ann. Math. Statist. 29, 179-187.
J. R. Blum, D. L. Hanson, and J. I. Rosenblatt A963), "On the central limit theorem for
the sum of a random number of independent random variables," Z. Wahrschein-
lichkeitstheorie und Verw. Gebiete 1, 389-393.
H. Bohr A932), Fastperiodische Funktionen, Erg. Math. 1, no. 5. Berlin-Gottingen-
Heidelberg: Springer-Verlag. (English translation 1947. New York: Chelsea.)
N. N. Chentsov A956), "Weak convergence of stochastic processes whose trajectories
have no discontinuities of the second kind and the 'heuristic' approach to the
Kolmogorov-Smirnov tests," Theor. Probability Appl. 1, 140-144.
H. Chernoff A956), "Large sample theory: parametric case," Ann. Math. Statist. 27,1-22.
H. Chernoff and H. Teicher A958), "A central limit theorem for sequences of exchangeable
random variables," Ann. Math. Statist. 29, 118-130.
D. M. Chibisov A965), "An investigation of the asymptotic power of the tests of fit,"
Theor. Probability Appl. 10, 421-^37.
Z. Ciesielski and H. Kesten A962), "A limit theorem for the fractional parts of the sequence
2kt," Proc. Amer. Math. Soc. 13, 596-600.
Harald Cramer A946), Mathematical Methods of Statistics. Princeton: Princeton University
Press.
H. Cramer and H. Wold A936), "Some theorems on distribution functions," J. London
Math. Soc. 11, 290-295.
D. A. Darling A957), "The Kolmogorov-Smirnov, Cramer-von Mises tests," Ann. Math.
Statist. 28, 823-838.
D. A. Darling and P. E. Erdos A956), "A limit theorem for the maximum of normalized
sums of independent random variables," Duke Math. J. 23, 143-155.
Jean Dieudonne A960), Foundations of Modem Analysis. New York: Academic Press.
W. Doeblin A940), "Remarques sur la theorie metrique des fractions continues," Com-
positio Math. 7, 353-371.
M. Donsker A951), "An invariance principle for certain probability limit theorems,"
Mem. Amer. Math. Soc. 6.
A952), "Justification and extension of Doob's hueristic approach to the Kolmogorov-
Smirnov theorems," Ann. Math. Statist. 23, 277-281.
J. L. Doob A949), "Heuristic approach to the Kolmogorov-Smirnov theorems," Ann.
Math. Statist. 20, 393-403.
A953), Stochastic Processes. New York: John Wiley and Sons.
R. M. Dudley A966), "Weak convergence of probabilities on nonseparable metric spaces
and empirical measures on Euclidean spaces," ///. J. Math. 10, 109-126.
A967), "Measures on non-separable metric spaces," III. J. Math. 11, 449-^53.
Nelson Dunford and Jacob T. Schwartz A958), Linear Operators. Part I: General Theory.
New York: Interscience Publishers.
Meyer Dwass and Samuel Karlin A963), 'Conditional limit theorems," Ann. Math.
Statist. 34, 1147-1167.
P. Erdos and M. Kac A946), "On certain limit theorems in the theory of probability,"
Bull. Amer. Math. Soc. 52, 292-302.
A947), "On the number of positive sums of independent random variables," Bull.
Amer. Math. Soc. 53, 1011-1020.
William Feller A957), An Introduction to Probability Theory and Its Applications, Volume I,
second edition. New York: John Wiley and Sons.
A966), An Introduction to Probability Theory and Its Applications, Volume II. New
York: John Wiley and Sons.
Bibliography 245
R. Fortet A940), "Sur une suite egalement repartie," Studia Math. 9, 54-69.
1.1. Gikhman and A. V. Skorohod A965), Introduction to the Theory of Random Processes.
Moscow.
Leonard Gross A963), "Harmonic analysis on Hilbert space," Mem. Amer. Math. Soc. 46.
Paul R. Halmos A950), Measure Theory. New York: D. Van Nostrand.
G. H. Hardy and E. M. Wright A960), An Introduction to the Theory of Numbers, fourth
edition. Oxford: Clarendon Press.
P. L. Hennequin and A. Tortrat A965), Theorie des Probabilities et Quelques Applications.
Paris: Masson et Cie.
I. A. Ibragimov A962), "Some limit theorems for stationary processes, " Theor. Proba-
Probability Appl. 7, 349-382.
A963), "A central limit theorem for a class of dependent random variables," Theor.
Probability Appl. 8, 83-89.
Kiyosi ltd and Henry P. McKean, Jr. A965), Diffusion Processes and Their Sample Paths.
Berlin-Heidelberg-New York: Springer-Verlag.
M. Kac A946), "On the distributions of values of sums of type S/Bfcr)," Ann. of Math.
47, 33-^9.
A949), "On deviations between theoretical and empirical distributions," Proc. Nat.
Acad. Sci. U.S.A. 35, 252-257.
Gopinath Kallianpur A961), "The topology of weak convergence of probability measures,"
J. Math. Mech. 10, 947-969.
Samuel Karlin A966), A First Course in Stochastic Processes. New York: Academic
Press.
John L. Kelley A955), General Topology. New York: D. Van Nostrand.
A. Ya. Khinchine A961), Continued Fractions, third edition. Moscow: State Publishing
House of Physical-Mathematical Literature. (English translation 1964. Chicago:
University of Chicago Press )
A933), Asymptotische Gesetze der Wahrscheinlichkeitsrechnung. Erg. Math. 2, no. 4.
Berlin-Gottingen-Heidelberg: Springer-Verlag.
H. J. Keisler and A. Tarski A964), "From accessible to inaccessible cardinals," Fund.
Math. 53, 225-308.
Ernest G. Kimme A957), "On the convergence of sequences of stochastic processes,"
Trans. Amer. Math. Soc. 84, 208-229.
A960), "Some equivalence conditions for the convergence in distribution of sequences
of stochastic processes," Trans. Amer. Math. Soc. 95, 495-515.
Frank B. Knight A962), "On the random walk and Brownian motion," Trans. Amer.
Math. Soc. 103, 218-228.
A. N. Kolmogorov A931), "Eine Verallgemeinierung des Laplace-Liapounoffschen Satzes,"
Izv. Akad. Nauk SSSR Ser. Fiz-Mat., 959-962.
A933), Grundbegriffe der Wahrscheinlichkeitsrechnung. Erg. Math. 2, no. 3. Berlin-
Gottingen-Heidelberg : Springer-Verlag.
A956), "On Skorohod convergence," Theor. Probability Appl. 1, 213-222.
A. N. Kolmogorov and Yu. V. Prohorov A954), "Zufallige Funktionen und Grenzverteil-
ungssatze," Berich tiber die Tagung Wahrscheinlichkeitsrechnung und Mathematische
Statistik, 113-126. Berlin: Deutscher Verlag der Wissenschaften.
John Lamperti A962), "An invariance principle in renewal theory," Ann. Math. Statist.
33, 685-696.
A962a), "A new class of probability limit theorems," J. Math. Mech. 11, 749—772.
A962b), "On convergence of stochastic processes," Trans. Amer. Math. Soc. 104,
430-435.
246 Bibliography
L. LeCam A957), "Convergence in distribution of stochastic processes," Univ. California
Publ. Statist. 2, no. 11, 207-236.
Paul Levy A925), Calcul des Probabilites. Paris: Gauthier-Villars.
A937), Theorie de VAddition des Variables Aleatoires. Paris: Gauthier-Villars.
A948), Processus Stochastiques et Mouvement Brownien Paris: Gauthier-Villars.
Michel Loeve A960), Probability Theory, second edition. Princeton: D. Van Nostrand.
H. B. Mann and A. Wald A943), "On stochastic limit and order relations," Ann. Math.
Statist. 14, 217-226.
A. M. Mark A949), "Some probability theorems," Bull. Amer. Math. Soc. 55, 885-900.
J. C. Oxtoby and S. Ulam A939), "On the existence of a measure invariant under a trans-
transformation," Ann. of Math. B) 40, 560-566.
K. R. Parthasarathy A967), Probability Measures on Metric Spaces. New York: Academic
Press.
Yu. V. Prohorov A953), "Probability distributions in functional spaces, " Uspehi Matem.
Nauk (N.S.) 55, 167.
A956), "Convergence of random processes and limit theorems in probability theory,"
Theor. Probability Appl. 1, 157-214.
A960), "The method of characteristic functionals," Proceedings of the Fourth Berkeley
Symposium on Mathematical Statistics and Probability, Volume 2, 403-419.
Ronald Руке A965), "Spacings," J. Roy. Stat. Soc. Ser. В 27, 395^49.
A968), "The weak convergence of the empirical process of random sample size," to
appear.
Ronald Руке and Galen Shorack A968), "Weak convergence of the two-sample empirical
process and the Chernoff-Savage theorem," to appear.
R. Ranga Rao A962), "Relations between weak and uniform convergence of measures
with applications," Ann. Math. Statist. 33, 659-680.
R. Ranga Rao and V. S. Varadarajan A958), "On a theorem in metric spaces," Ann.
Math. Statist. 29, 612-613.
A. Renyi A958), "On mixing sequences of sets," Acta Math. Acad. Sci. Hung. 9, 215-228.
A960), "On the central limit theorem for the sum of a random number of independent
random variables," Acta. Math. Acad. Sci. Hung. 11, 97-102.
Bengt Rosen A964), "Limit theorems for sampling from a finite population," Arkiv for
Matematik, 5, 383-^24.
A967a), "On the central limit theorem for sums of dependent random variables," Z.
Wahrscheinlichkeitstheorie und Verw. Gebiete 7, 48-82.
A967b), "On asymptotic normality of sums of dependent random variables," Z.
Wahrscheinlichkeitstheorie und Verw. Gebiete 7, 95-102.
A967c), "On the central limit theorem for a class of sampling procedures," Z. Wahr-
Wahrscheinlichkeitstheorie und Verw. Gebiete 7, 103-115.
H. Royden A963), Real Analysis. New York: Macmillan.
H. Scheffe A947), "A useful convergence theorem for probability distributions," Ann.
Math. Statist. 18, 434-438.
Paul Schmid A958), "On the Kolmogorov and Smirnov limit theorems for discontinuous
distribution functions," Ann. Math. Statist. 29, 1011-1027.
J. Sethuraman A964), "On the probability of large deviations of families of sample means,"
Ann. Math. Statist. 35, 1304-1316.
A965), "On the probability of large deviations of the mean for random variables in
D[0, 1]," Ann. Math. Statist. 36, 280-285.
George F. Simmons A963), Introduction to Topology and Modern Analysis. New York:
McGraw-Hill.
Bibliography 247
A. V. Skorohod A956), "Limit theorems for stochastic processes," Theor. Probability
-Appl. 1, 261-290.
A957), "Limit theorems for stochastic processes with independent increments," Theor.
Probability Appl. 2, 138-171.
A961), Studies in the Theory of Random Processes. Kiev: Kiev University Press.
(English translation 1965. Reading: Addison-Wesley.)
E. E. Slutsky A925), "tiber Stochastische Asymptoten und Grenzwerte," Metron 5, 1-90.
A937), "Qualche proposizione relativa alia teoria delle funzioni aleatorie," Giorn.
1st. Ital. Attuari 8, 183-199.
Charles Stone A963), "Weak convergence of stochastic processes defined on semi-infinite
time intervals," Proc. Amer. Math. Soc. 14, 694-696.
A963), "Limit theorems for random walks, birth and death processes, and diffusion
processes," ///. J. Math. 7, 638-660.
V. Strassen A964), "An invariance principle for the law of the iterated logarithm," Z.
Wahrscheinlichkeitstheorie und Verw. Gebiete 3, 211-226.
E. C. Titchmarsh A939), The Theory of Functions, second edition. Oxford: Oxford
University Press.
Flemming Topsoe A967a), "On the connection between P-continuity and P-uniformity
in weak convergence," Theor. Probability Appl. 12, 281-290.
A967b), "Preservation of weak convergence under mappings," Ann. Math. Statist. 38,
1661-1665.
Bruce E. Trumbo A965), "Sufficient conditions for the weak convergence of conditional
probability distributions in a metric space." Thesis, University of Chicago.
V. S. Varadarajan A958a), "Convergence in distribution of stochastic processes." Thesis,
Calcutta University.
A958b), "Weak convergence of measures on separable metric spaces," Sankyu 19,
15-22.
A958c), "On the convergence of probability distributions,aAjA:ja, 19, 23-26.
A961a), "Measures on topological spaces," Mat. Sb. 55 (97), 35-100. (English trans-
translation 1965. Providence: American Mathematical Society Translations, series 2, 48,
161-228.)
A961b), "Convergence of stochastic processes," Bull. Amer. Math. Soc. 67, 276-280.
Summary of Notation
This list includes only symbols used systematically throughout the book in some special
way. Although the generic element of S is asserted to be x, it can actually be any letter in
that general region of the lower case roman alphabet, possibly with subscripts and so on;
similar remarks apply to other generic elements.
Set Theory
Ec is the complement of E; E1 — ?2 = E1 П E2C is the difference of E1 and E2; E1 + E2 —
(Ег — E2) vj (E2 — Ej) is their symmetric difference; /^ is the indicator or characteristic
function of E.
Probability Spaces
(Cl, Щ is a measurable space; со and Eare the generic elements of Q and @; P is a probability
measure on (Q, '.Щ\ Е denotes expected value; ? is a random variable, and {?„} and {?t) are
stochastic processes.
If h maps (Q, Щ into (Q\ 38'),
h'xSS' с ®
means that h~xE' e 38 for each E' G .^' —• that is, that h is measurable — and P/z^has value
P(h~lE') at ?' (p. 222).
The General Metric Space
S is a metric space with generic point x, metric p(x, y) (and p(x, A) — see p. 215), and
(T-field^of Borel sets; A°, A~, and дЛ are the interior, closure, and boundary of the generic
subset Л (which lies in Sf unless the contrary is stated); S(x, s) is the open e-sphere (ball)
about ж; the general closed set is F(but this can also be a distribution function), the general
open set is G, and the general compact set is К (but this can also be a constant or bound).
248
Summary of Notation 249
C(S), with generic element/, is the class of bounded, continuous functions on C; if h
maps S into another metric space, Dh is the set of its discontinuities.
P and Q are probability measures on (S, У)\Рп => P denotes weak convergence; II is a
family of P's.
Special Metric Spaces
The n-field of Borel sets in a space is denoted by the script version (with any attendant
superscripts) of the upper case roman letter denoting the space itself.
Rk is Euclidean A>space; H is the generic element of 'Mk\ F is a distribution function;
Fn => F denotes weak convergence (p. 18); for the notations x < y, x < y, \x — y\, and
{a, b], see p. 17.
Rw is the topological product of a countable sequence of copies of the real line (pp. 19
and 218), with generic point x = (xlt x2, . . .); rrk{x) = (x1, . . . , xk) is the natural pro-
projection into Rh.
Cis C[0, 1] (pp. 19, 54, and 220), with metric p and generic point x — x(t); rrti_ tk(x) =
(x^j), , a;(rfc)) is the natural projection into Rh; for xt as a coordinate variable and for
the phrase "the distribution of xf under P", see p. 60.
D is D[0, 1] (p. 109), with two equivalent metrics d (p. Ill) and d0 (p. 113), and generic
point x = x(t); projections nt iftand coordinate variablesxt are as for C; special symbols:
Я and Л (p. Ill), ЩИ (p. 112)',"тР (p. 123), Jt (p. 124), and Tx (p. 128).
Random Elements
X is a random element (p. 22) with value X(co) at со; if its range is a function space,
X(t) = ^and X(t, со) = 77t(X(co)) (p. 57 for C; p. 128 for ?>); Xn ® > A'and Xn rj y P
denote convergence in distribution (pp. 23 and 24); Xn p > a and Xn p > X denote
convergence in probability (pp. 24 and 26); for (X, Y) and p(X, Y), see pp. 25 and 225.
Nand N(/u, a2) are normal distributions or variables (p. 24); H^is Wiener measure on С
(p. 61), a random function in С with that distribution (p. 64), or the corresponding
measure or random function in D (p. 137); W° is the Brownian bridge, as a measure or a
random function in С or in D (pp. 64, 65, and 141).
Moduli
The modulus of continuity for С and related moduli for D:
wx(8) = w(x, 8), pp. 54 and 220,
wx(d) = w(X, 6), p. 58,
wx(T0), p. 109,
w'x{8), p. 110,
w"x{8), p. 118,
w"(X, 6), p. 128.
Index
Alexandrov, 16
Anderson, 34
Arc sine law, 80, 138, 194
Arzela-Ascoli theorem, 55, 221
Asymptotically independent increments,
157
Bartozynski, 48
Base for a topology, 215
Bickel, 108
Billingsley, 16, 17, 150, 195, 208
Biikhoff, 233
Birnbaum, 108
Bjornsson, 17
Blum, 160
Borel set, 3, 7
Boundary, 215
Brownian bridge, 64
Brownian motion, 4, 62, 154
Brownian motion process, 64
Central limit theorem, 42, 72
Change of variable, 222
Characteristic function, of a probability
measure, 45
of a set, 8
Chentsov, 102, 108
Chernoff, 34, 214
Chibisov, 153
Ciesielski, 205
Compactness, 217
Complete set, 217
Continued fractions, 169, 192, 201
Continuity of sample paths, 66
Continuity set, of a measure, 2, 11
of a random element, 23
Convergence, in distribution, 23, 24
in probability, 24, 26
Convergence-determining class, 15
Coordinate variables, in C, 60
in A 123
Cramer-Wold theorem, 49
Darling, 86, 108
De Moivre-Laplace theorem, 1,12
Determining class, 15
Diagonal method, 219
Diffusion, 158
Diophantine approximation, 193
Discontinuity of the first kind, 109
Discrete set, 215
Distribution function, 17, 22
Distribution of a random element, 22
Doeblin, 195
Domain of a random element, 22
Donsker, 6, 76, 108, 143
Donsker's theorem, 68, 137
Doob, 6, 108, 165
Dudley, 16, 153
Dunford, 16
Dwass, 214
Empirical distribution function, 5, 103,141
e-net,.217
Equivalent metrics, 215
Erdos, 6, 76, 86
Exchangeable random variables, 212
Feller, 52, 81
Finite-dimensional distributions, in C, 30
in A 123
of random elements, 40 |
in/?00, 30
Finite-dimensional sets, in C, 19
in A 121
ini?00, 19
Fortet, 195
Functional central limit theorem, 72
251
252 Index
Function of a ф-mixing process, 182
Gaussian random function, 64
Gauss's measure, 169
Gikhman, 16
Glivenko-Cantelli theorem, 103
Gross, 52
Halmos, a.e.
Hanson, 150
Hardy, 52
Helley's selection theorem, 227
Hennequin, 16
Ibragjmov, 195, 208
Independence of random elements, 26
Independent increments, 61, 154
Indicator function of a set, 8
Interval in Rk, 17
Invariance principle, 72
Ito, 67, 86
Kac, 6, 76, 86, 108, 195
Kallianpur, 16
Karlin, 86, 214
^-dimensional Borel set, 17
Keisler, 233
Kesten, 205
Khinchine, 165
Kimme, 142
Knight, 77
Kolmogorov, 6, 16, 76, 102, 103, 123
KolmogoroVs existence theorem, the
general case, 230
for i?00, 228
Lamperti, 61, 77, 150
LeCam, 6, 16, 40, 242
Levy, 52, 165
Liggett, 214
Lindeberg-Levy case, 77
Lindeberg-Levy theorem, 3, 45
Lindeberg's theorem, 42
Lineal Borel set, 17
Local compactness, 7
Local limit theorems, 49
Lyapounov's theorem, 44
McKean, 67, 86
Mann, 34
Marginal distribution, 20
Mark, 86, 143
Martingale, 205
Measurability, 222
Measurable cardinal, 233
Modulus of continuity, 54, 220
Nonmeasurable cardinal 233
Normal distribution, 24
One-sided process, 168, 183
Open cover, 215
Ox toby, 10
Pardoner, 153
Parthasaiathy, 16
0-mixing, 166
Problem of measure, 10, 233
Products of metric spaces, 224
Prohorov, 6, 16, 34, 40, 52, 61, 76, 77, 123
Prohorov metric, 238
Prohorov's theorem, 37, 240
Projection, from C, 19
from A 120
fromi?00, 19
Руке, 108
Random element, 22
Random function, 4, 22, 57, 128
Random variable, 22
Randum vector, 22
RarHom walk case, 77
Ranga Rao, 16, 17, 29
Re'; -ction principle, 71
Regular measure, 7
Relative compactness of measures, 35, 239
Rehyi, 28, 142, 150
Rosen, 165, 208, 214
Rosenblatt, 150
Rubin, 34
Sampling, 208
Scheffe's theorem, 224
Schmid, 142
Schwartz, 16
Separability, 215
Separable measure, 234
Separable stochastic process, 65, 134
Sequential compactness, 35, 239
Shorack, 108
Index 253
(^compact, 9
Skofohod, 6, 16, 22, 123, 142
Skorohod topology, 111
Slutsky, 34, 102
Sphere, 215
Stone, 61, 142
Strassen, 77
Subspaces, 224
Support, 9
Tarski, 233
Teicher, 214
Tight family of measures, 37, 239
Tight measure, 9
Tight sequence of random elements, 40
Topological completeness, 234
Topology of weak convergence, 236
Topstfe, 16, 17, 34, 61, 226
Tortrat, 16
Totally bounded set, 217
Trumbo, 214
Two-sided process, 166, 182
Ulam, 10
Under, 60
Uniform integrability, 32
Uniform topology, on C, 54, 220
onД 150
Upper semicontinuity, 218
Varadarajan, 6, 16, 29, 40, 234, 241, 242
Wald, 34
Weak convergence, of distribution func-
functions, 1, 18
of measures, 2, 7, 16
Weyl's theorem, 19,50
Wichura, 214
Wiener, 67
Wiener measure, 4, 61
Wiener path, 63
Wiener process, 64
Wold, 52
Wright, 52