Text
                    Convergence of
Probability Measures
Patrick Billingsley
Departments of Statistics and Mathematics
The University of Chicago
JOHN WILEY & SONS, New York • Chichester • Brisbane • Toronto


Copyright © 1968 by John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of any part of this work beyond that permitted by Sections 107 or 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful. Requests for permission or further information should be addressed to the Permissions Department, John Wiley & Sons, Inc. Library of Congress Catalog Card Number: 68-23922 SBN 471 07242 7 Printed in the United States of America 20 19 18 17 16 15 14 13
TO MY MOTHER
Preface Asymptotic distribution theorems in probability and statistics have from the beginning depended on the classical theory of weak convergence of distribu- distribution functions in Euclidean space—convergence, that is, at continuity points of the limit function. The past several decades have seen the creation and extensive application of a more inclusive theory of weak convergence of probability measures on metric spaces. There are many asymptotic results that can be formulated within the classical theory but require for their proofs this more general theory, which thus does not merely study itself. This book is about weak-convergence methods in metric spaces, with applications sufficient to show their power and utility. The Introduction motivates the definitions and indicates how the theory will yield solutions to problems arising outside it. Chapter 1 sets out the basic general theorems, which are then specialized in Chapter 2 to the space of continuous functions on the unit interval and in Chapter 3 to the space of functions with discontinuities of the first kind. The results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables. Although standard measure-theoretic probability and metric-space topol- topology are assumed, no general (nonmetric) topology is used, and the few results required from functional analysis are proved in the text or in an appendix. Mastering the impulse to hoard the examples and applications till the last, thereby obliging the reader to persevere to the end, I have instead spread them evenly through the book to illustrate the theory as it emerges in stages. Chicago, March 1968 Patrick Billingsley vii
Acknowledgements My thanks go to Soren Johansen, Samuel Karlin, David Kendall, Ronald Руке, and Flemming Topsoe, who read large parts of the manuscript; the book owes much to their detailed suggestions, and I am very grateful. I should also like to thank Mary Woolridge for her typing, cheerful, swift, and error-free. The writing of this book was supported in part by the Statistics Branch, Office of Naval Research, and in part by Research Grant No. 8026 from the Division of Mathematical, Physical, and Engineering Sciences of the National Science Foundation. Vlll
Contents INTRODUCTION 1 CHAPTER 1. WEAK CONVERGENCE IN METRIC SPACES 7 1. Measures in Metric Spaces, 7 Measures and Integrals, 7, Tightness, 9 2. Properties of Weak Convergence, 11 Portmanteau Theorem, 11, Other Criteria, 14 3. Some Special Cases, 17 Euclidean Space, 17, The Circle, 18, The Space R™, 19, The Space C, 19, Product Spaces, 20 4. Convergence in Distribution, 22 Random Elements, 22, Convergence in Distri- Distribution, 23, Convergence in Probability, 24, Product Spaces, 26 5. Weak Convergence and Mappings, 29 Continuous Mappings, 29, Main Theorem, 30, Integration to the Limit, 31, An Extension of Theorem 5.1, 33 6. Prohorov's Theorem, 35 Relative Compactness, 35, The Direct Theorem, 37, The Converse, 40 IX
Contents 7. First Applications, 41 Smooth Functions, 41, The Central Limit Theorem, 42, Characteristic Functions, 45, The Cramer-Wold Device, 48, Local and Integral Limit Theorems, 49, Weak Convergence on the Circle and Torus, 50 CHAPTER 2. THE SPACE С 54 8. Weak Convergence and Tightness in C, 54 Weak Convergence, 54, Tightness, 54, Random Functions, 57, Coordinate Variables, 60 9. The Existence of Wiener Measure, 61 Wiener Measure, 61, The Brownian Bridge, 64, Separable Stochastic Processes, 65 10. Donsker's Theorem, 68 The Theorem, 68, An Application, 70, A Necessary Condition for Tightness, 73, Another Proof of Donsker's Theorem, 73 11. Functions of Brownian Motion Paths, 77 Maximum and Minimum, 11, The Arc Sine Law, 80, The Brownian Bridge, 83 12. Fluctuations of Partial Sums, 87 Maxima, 87, Product Moments, 88, Applica- Applications, 89, Proof of Theorem 12.1, 91, Moments, 94, A Tightness Criterion, 95, Further In- Inequalities, 98 13. Empirical Distribution Functions, 103 CHAPTER 3. THE SPACE D 109 14. The Geometry of D, 109 The Space D, 109, The Skorohod Topology, 111, Completeness ofD, 115, Compactness in D, 116, A Second Characterization of Compactness, 118, Finite-Dimensional Sets, 120 15. Weak Convergence and Tightness in D, 123 Finite-Dimensional Distributions, 123, Tight- Tightness, 125, Random Elements of D, 128, A Criterion for Convergence, 128, Criteria for
Contents xi Existence, 130, Other Criteria, 133, Separable Stochastic Processes, 134 16. Applications, 137 Donskefs Theorem, 137, Dominated Measures, 139, Empirical Distribution Functions, 141 17. Random Change of Time, 143 Randomly Selected Partial Sums, 143, Random Change of Time, 144, Applications, 145, Renewal Theory, 148 18. The Uniform Topology, 150 CHAPTER 4. DEPENDENT VARIABLES 19. Diffusion, 154 A Characterization of Brownian Motion, 154, Other Diffusion Processes, 158 20. Mixing Processes, 166 y-Mixing, 166, Inequalities for Moments, 170, Functional Central Limit Theorem, VIA, Inte- Integrals in Place of Sums, 178, Nonstationarity, 179 21. Functions of Mixing Processes, 182 Preliminaries, 182, Functional Central Limit Theorem, 184, Applications, 191, Diophantine Approximation, 193, Nonstationarity, 194 22. Empirical Distribution Functions, 195 (p-Mixing Processes, 195, Functions of <p- Mixing Processes, 199 23. Martingales, 205 24. Exchangeable Random Variables, 208 Sampling, 208, Exchangeable Variables, 212 APPENDIX I. METRIC SPACES Separability, 215, Compactness, 217, Upper Semicontinuity, 218, 77ie S/?aa? i?00, 218, 77ie C, 220 154 215
xii Contents APPENDIX II. MISCELLANY 222 Measurability, 222, Change of Variable, 222, Tail Probabilities, 223, Scheffe's Theorem, 223, Subspaces, 224, Product Spaces, 224, Measur- Measurability of Dh, 225, Belly s Theorem, 226, Kolmogorov's Theorem, 228, Measurability of Some Mappings, 230, More Measurability, 232 APPENDIX III. THEORETICAL COMPLEMENTS 233 The Problem of Measure, 233, Separable Measures, 234, The Topology of Weak Con- Convergence, 236, Prohorov's Theorem, 239 BIBLIOGRAPHY 243 SUMMARY OF NOTATION 248 INDEX 251
CONVERGENCE OF PROBABILITY MEASURES
Introduction The De Moivre-Laplace limit theorem states that, if A) ВД = Pf^=^ < x\ I J J is the distribution function of the normalized number of successes in n Bernoulli trials, and if B) F(x) = ~=Г e~W du is the unit normal distribution function, then C) Fn(x) -> F{x) for all x (n -> oo, the probability p of success fixed). We say of arbitrary distribution functions Fn and F on the line that Fn converges weakly to F, which we indicate by writing Fn => F, if C) holds for all continuity points x of F. Thus the De Moivre-Laplace theorem asserts that A) converges weakly to B); since B) is everywhere continuous, the proviso about continuity points is vacuous in this case. If Fn and F are denned by 0 if x < - D) Fn{x) = ; if x>- n and fO if x < 0 E) F{x) = U if x > 0,
2 Introduction then again Fn=>F, and this time the proviso does come into play: C) fails at x = 0. For a better understanding of this notion of weak convergence, which underlies a large class of Limit theorems in probability theory, consider the probability measures Pn and P generated by arbitrary distribution functions Fn and F. These probability measures, defined on the class of Borel subsets of the line, are uniquely determined by the requirements Pn{- со, x] = Fn(x), P{- со, x] = F(x). Since F is continuous at x if and only if the set {x} consisting of x alone has P-measure 0, Fn => F means that the implication F) Pn(-co,x]-+P(-cc,x] if P{x} = 0 holds for each x. Let dA denote the boundary of a subset A of the line; дA consists of those points that are limits of sequences of points in A and are also limits of sequences of points outside A. Since the boundary of (— со, х] consists of the single point x, F) is equivalent to G) Pn(A)-+P(A) if P(dA) = 0, where we have written A for (— oo, x]. The fact of the matter is that Fn=> F holds if and only if the implication G) is true for every Borel set A—a result proved in Chapter 1. Let us distinguish by the term P-continuity set those Borel sets A for which P(dA) = 0, and let us say that Pn converges -weakly to P, and write Pn => P, if Pn(A) -> P(A) for each P-continuity set—that is, if G) holds. As just asserted, Pn=>P if and only if the corresponding distribution functions satisfy Fn => F. This reformulation of the concept of weak convergence clarifies the reason why we allow C) to fail if F has a jump at x. Without this exemption, D) would not converge weakly to E), but this example may appear artificial. If we turn our attention to probability measures Pn and P, however, we see that Pn(A) -> P{A) may fail if P(dA) > 0 even in the De Moivre-Laplace theorem. The measures Pn and P generated by A) and B) satisfy (8) Pn(A) = p[^-=^P e A] and (9) P(A) = -±= | e-*"" du ^2 J
Introduction 3 for Borel sets A. Now if A consists of the countably many points .— , и = 1,2,..., к = 0, 1,. . . , и, then РЯ(Л) = 1 for all и and P{A) = 0, so that Pn(A) -> P(^) is impossible. Since 9Л is the entire real line, this does not violate G). Although the concept of weak convergence of distribution functions is tied to the real line (or to Euclidean space, at any rate), the concept of weak convergence of probability measures can be formulated for the general metric space, which is the real reason for preferring the latter concept. Let S be an arbitrary metric space, let Sf be the class of Borel subsets of S (Sf is the G-neld generated by the open sets), and consider probability measures Pn and P denned on Sf. Exactly as before, we define weak convergence Pn => i> by requiring the implication G) to hold for all Borel sets A. In Chapter 1 we investigate the general theory of this concept of convergence and see what it reduces to in various particular metric spaces. We prove there, for example, that Pn converges weakly to P if and only if A0) (fdPn->(fdP if feC(S), Js Js where C(S) denotes the class of bounded, continuous real-valued functions on S. (In order to conform with general mathematical usage, we take A0) as the definition of weak convergence, so that G) becomes a necessary and sufficient condition instead of a definition.) Chapter 2 concerns weak convergence in the space С — C[0, 1] with the uniform topology; С is the space of all continuous real functions on the closed unit interval [0, 1], metrized by taking the distance between two functions x = x{t) and у — y{t) to be A1) p(z,y)= sup \x(t)-y{i)\. 0<t<l An example of the sort of application made in Chapter 2 will show the utility of a general treatment of weak convergence—one that goes beyond the classical Euclidean case. Let ?l5 ?2, ... be a sequence of independent, identically distributed random variables defined on some probability space (Q., &, P). If the ?n have mean 0 and variance a2, then, according to the Lindeberg-Levy central limit theorem, the distribution of the normalized sum A2) ^5 (| ++!) converges weakly, as n tends to infinity, to the normal distribution defined by (9).
4 Introduction We can formulate a refinement of the central limit theorem by proving weak convergence of the distributions of certain random functions constructed from the partial sums Sn. For each integer n and each sample point со, construct on the unit interval the polygonal function that is linear on each of the subintervals [(i — \)/n, Цп], i = 1, 2, . . . , n, and has the value 8г{соIа\1п at the point i\n (S0(co) = 0). In other words, construct the function Xn{co) whose value at a point / of [0, 1] is A3) Xn(t, со) = J- SU°>) + t ~ °' Г 1)/П ~T- U") a^Jn 1/n Tj "i- 1 П n ' nj For each со, Xn(co) is an element of the space C. Let Pn be the distribution of Xn(co) in C, defined for Borel subsets A of С—Borel sets relative to the metric A1)—by (the definition is possible because the mapping со -> Xn(co) turns out to be measurable in the right way). In Chapter 2 we prove A4) Pn ^ W, where W is Wiener measure. We also prove the existence in С of Wiener measure, which describes the probability distribution of the path traced out by a particle under Brownian motion. If A = {x:x{\) < a}, then, since the value of the function Xn(co) at / = 1 is Xn{\, со) = Sn(co)lay/n, It turns out that W{dA) = 0, so that A4) implies Pfco:^-Sn(co) < a) -> W{x:x(l) < a}. It also turns out that W{x:x(l) < a} = -p= Г e~W du, ¦JItT J — ao so that A4) does contain the Lindeberg-Levy theorem. The sum |x + • • • + ?n may be interpreted as the position at time n in a random walk. The central limit theorem asserts that this position (properly normalized) is, for n large, distributed approximately as the position at time t = 1 of a particle in Brownian motion. The relation A4) asserts that the
Introduction 5 entire path of the random walk during the first n steps is, for n large, distributed approximately as the path up to time / = 1 of a particle under Brownian motion. To see in a concrete way that A4) contains information going beyond the central limit theorem, consider the set A = \x: sup x{t) < <xl. I o<t<i J Again it turns out that W(dA) — 0, so that A4) implies A5) lim ?[co:—L max Sk(co) < a] = lim Pn(A) = W\x: sup x(t) < a). I o<t<i J If we evaluate the right-most member of A5) (which we can do in a number of ways, for example, by computing the limit on the left for some specially selected sequence {?n} that makes the computation easy), then we have a limit theorem for the distribution of maxfc<n Sk under the hypotheses of the Lindeberg-Levy theorem. As a final example involving Xn(co), take A to be the set of x in С for which the set {t:x(t)> 0} has Lebesgue measure at most a (we assume 0 < a < 1). As before, Pn(A) -> W(A). Since the Lebesgue measure of {t:Xn(t, со) > 0} is essentially the fraction of the partial sums S1} S2, . . . , Sn that exceed 0, this argument leads to an arc sine law under the hypotheses of the Lindeberg-Levy theorem. Chapter 2 contains the details of all these derivations. We can in this way use the theory of weak convergence in С to obtain a whole class of limit theorems for functions of the partial sums Slt S2,. . . , Sn. The fact that Wiener measure W is the weak limit of the distribution in С of the random function Xn(a>) can also be used to prove theorems about W, and W is interesting in its own right. Chapter 3 specializes the theory of weak convergence to another space of functions on [0, 1]—the space D = D[0, 1] of functions having discontinu- discontinuities of at most the first kind. This is the natural space in which to analyze the behavior of empirical distribution functions, for example. Let |l5 ?2, . . . be independent random variables on (Q., &, P), each uniformly distributed on [0, 1]. Let Fn(t, со) be the empirical distribution function of ^(co), ..., fn(o>); for 0 < / < 1, Fn(t, со) is the frac- fraction of integers k, 1 < к < n, for which |fc(a») < t. Now let Yn(co) be the element of D whose value at / is Yn{t, со) = Jn(Fn(t, со) - /).
6 Introduction If D is metrized in the right way, it becomes possible to speak of the distribu- distribution in D of the random function Yn(a>) and to prove that this distribution converges weakly as n tends to infinity. Just as in the case of the random element of С defined by A3), we can then go on to derive the limiting distributions of sup ^n(Fn{t, со) - t)= sup Yn(t, со) 0<t<l 0<<l and related quantities that arise in statistics. Chapter 4 concerns weak convergence of the distributions of random functions derived from various dependent sequences of random variables. Many of the conclusions in Chapters 2, 3, and 4, although not requiring function-space concepts for their statement, could hardly have been derived without function-space methods. Standard measure-theoretic probability and metric-space topology are used from the beginning. Although the point of view throughout is that of functional analysis (a function is a point in a space), nothing of functional analysis is assumed (beyond an initial willingness to view a function as a point in a space); all function-analytic results needed are proved in the text or else in Appendix I (which also gathers together for easy reference some results in metric-space topology). Remarks. The main papers that lead to the development of this theory were Kolmogorov A931), Erdos and Kac A946 and 1947), Doob A949), and Donsker A951 and 1952). Prohorov A953 and 1956) and Skorohod A956) gave the theory its present form. Le Cam A957) and Varadarajan A958a and 1961a) have extended it to general topological spaces.
CHAPTER 1 Weak Convergence in Metric Spaces 1. MEASURES IN METRIC SPACES . . . Let S be a metric space. We shall study probability measures on the class Sf of Borel sets in S. Here Sf is the a-field generated by .the open sets—the smallest a-field containing all the open sets—and a probability measure on Sf is a nonnegative, countably additive set function P with P(S) = 1. If such probability measures Pn and P satisfy J^ fdPn -> J^ fdP for every bounded, continuous real function / on S, we say that Pn converges weakly to P and write Pn => P. Our aim in this chapter is to study this concept in detail; we begin with some properties of individual probability measures on (S, SP). Although we must sometimes assume separability or completeness, most of the theorems in this chapter hold for an arbitrary metric space S. The spaces in our applications are usually separable and complete; since they rarely have further regularity properties, such as local compactness, we never impose further restrictions. Measures and Integrals THEOREM 1.1 Every probability measure on (S,?f) is regular; that is, if A e?f and s > 0, then there exist a closed set F and an open set G such that F с Л с GandP(G - F)< s. Proof. Denote the metric on S by p(x, y) and the distance from x to A by p(x, A).1[ If A is closed, then we may take F = A and G = {x:p{x, A) < <5} f For terminology and some theorems about metric spaces, see Appendix I.
8 Weak Convergence in Metric Spaces for some d, since the latter sets decrease to A as д J, 0. Hence we need only show that the class ^ of Borel sets with the asserted property is a o--field.f Given sets An in 0, choose closed sets Fn and open sets Gn such that FnczAn^Gn and P(Gn - Fn)< sl2n+K If G=\JnGn, and if F- Un<n/n' with "o so chosen that РA1Л - Ж e/2, then Fc(JAcG and P(G — F) < ?- Thus ^ is closed under the formation of countable unions; since У is obviously closed under complementation, the proof is complete. Theorem 1.1 implies that P is determined by the values of P(F) for closed sets F. Theorem 1.3 shows that P is determined by the values of $fdP% for bounded, continuous real functions / denned on S. Denote by CE) the class of such functions /. It is shown on p. 222 § that each / in C(S) is measurable Sf. Everything depends on the following result, which shows how to approximate the indicator (or characteristic function) IF of a closed set F by elements of CE). THEOREM 1.2 If F is closed and s positive, there is a function f in C(S) such thatf(x) = lifxe F,f(x) = 0 if P(x, F) > s, and 0 <f(x) < 1 for all x. The function f may be taken to be uniformly continuous. Proof. Define a continuous function q> of a real variable by (\ if t<0, A.1) <p(t) - ( If A-2) J 1 - / if 0 < / < 1, 0 if 1 < /. a-6 f We have defined the class ?f of Borel sets as the ст-field generated by the open sets, which is the same thing as the c-field generated by the closed sets and is the one appropriate for the present theory. For related (mostly inappropriate) ст-fields, see Problem 6. % When it is the entire space, we omit the region of integration. § Of Appendix II, a miscellany to which most measurability questions are relegated.
Measures in Metric Spaces 9 then / has the required properties—it is even uniformly continuous. The drawing graphs this/for F — [a, b] on the line. THEOREM 1.3 Probability measures P and Q on (S, Sf) coincide if A.3) jfdP=(fdQ for each/in C(S). Proof. Suppose F is closed. Start with A.1) and define, for each positive integer u, A.4) <pu(t) = <p(ut) and A-5) /„(*) = ?u(p(*. F)). Then {/„} is a nonincreasing sequence of elements of C(S) converging point- wise to IF. By the bounded convergence theorem, P(F) = limu J/u dP and Q(F) =* HmJ/u dQ, so that, if A.3) holds for all/in C(S), P(F) =* Q(F). Since P and Q agree for all closed sets, it follows by Theorem 1.1 that P and Q are identical. Thus the values of \fdP for/in C(S) completely determine the values of P{A) for A in Sf. This fact underlies the circle of ideas centering on the notion of weak convergence; although we have defined weak convergence by requir- requiring the convergence of the integrals of functions in C(S), in the next section we shall characterize it in terms of the convergence of the measures of certain sets. Tightness The following notion of tightness proves important both in the theory of weak convergence and in its applications. A probability measure P on (S, Sf) is tight if for each positive s there exists a compact (p. 217) set К such that P{K) > 1 — e. Clearly, P is tight if and only if it has a a-compact support.f By Theorem 1.1, P is tight if and only if P(A) is, for each A in Э*, the supremum of P(K) over the compact subsets К of A. In a space that is a-compact, every probability measure is tight—which covers fc-dimensional Euclidean space. The following result, which also covers the Euclidean case, is more useful. t A support of a probability measure is a set A in У with P{A) = 1; a set is c-compact if it can be represented as a countable union of compact sets. The characterization of a tight P as having a c-compact support is inappropriate as a definition because it does not generalize in the right way to families of probability measures (see Section 6).
10 Weak Convergence in Metric Spaces THEOREM 1.4 If S is separable and complete, then each probability measure on (S, ?f) is tight. Proof. Since S is separable, there is, for each n, a sequence Anl, An2, ... of open l/«-spheres covering S. Choose in so гЬ.аХР(Х)г^гпАпг) > 1 — ?/2n- By the completeness hypothesis, the totally bounded set C\n>i\Ji<inAni has compact closure К (see p. 217). Since clearly P(K) > 1 — s, the theorem follows. Theorem 1.4 is false without the hypothesis of completeness; whether the hypothesis of separability can be suppressed is equivalent to the problem of measure. These matters are discussed in Appendix III.j Remarks. Theorem 1.4 is due to Ulam (see Oxtoby and Ulam A939)); LeCam A957) introduced the term "tight." PROBLEMS* 1. Say that a function / separates sets A and В if f(x) = 0 for x in A, f(x) = 1 for x in B, and 0 < f(x) < 1 for all x. If A and В are at positive distance, they can be separated by a uniformly continuous / [Theorem 1.2]. If A and В have disjoint closures but are at distance 0, they can be separated by a continuous/ [/(x) = p(x, A)/(p(x, A) + p(x, B))] but not by a uniformly continuous/. There is no continuous/separating A and В if their closures meet; there is no /separating A and В if they meet. 2. Give examples of distinct topologies that give rise to the same class of Borel sets. 3. If S can be embedded as an open set in some complete metric space, then [Kelley A955, p. 207)] it is topologically complete. Since a locally compact S is open in its completion, it is topologically complete. Hence Theorem 1.4 applies if S is separable and locally compact. Since such an S is a-compact [being a union of open sets with compact closures and hence (p. 216) a countable such union], it also follows directly that each proba- probability measure on it is tight; Euclidean space is an example. 4. Let S be a Hilbert space with a countably infinite orthonormal basis xx, x2, . . . . Since S is separable and complete, Theorem 1.4 applies. However, no set with nonempty interior is compact [a nonempty interior must, for some x and e, contain all the points x + exn], so that S is neither locally compact nor [Baire's category theorem; Kelley A955, p. 200)] ff-compact. If P assigns positive mass to each element of a countable, dense set, then P has no support locally compact in the relative topology. 5. Adapt Problem 4 to the general Banach space of countably infinite dimension [there exist points хъ x2, . . . with supn ||xn|] < со and mfm^n \\xm — xn\\ > 0; see Banach A932, p. 83)]; C[0, 1], important in probability, is such a space, which explains why a theory based on local compactness is of small utility in this subject. (See also Problem 5 in Section 3.) t Although Theorem 1.4 as given suffices for all the applications in this book, it is natural to inquire after extensions. It is to questions of just this sort that Appendix III is devoted. t Some problems involve concepts not required for an understanding of the text itself; there are no problems whose solutions are used later in the text. A simple assertion is understood to be prefaced by "show that." Square brackets contain hints or indications of solutions.
Properties of Weak Convergence 11 6. We have defined У as the cr-field generated by the open sets, which we can indicate by wr-iting У — cr(open sets). In the same way, define ?x = cr(closed G5 sets) (a set is a G& if it is a countable intersection of open sets), define Sf2 = cr(C(S)) (the smallest cr-field with respect to which each function in C(S) is measurable), and define SP$ = cr(open spheres), •5^4 = o-(compact sets), and Уь — cr(compact Gs sets). In a metric space each closed set is a G&. Use this fact and Theorem 1.2 to prove Show that У = У3 if S is separable. Show that У = У5 if S is cr-compact (which will be true if S is separable and locally compact). We may have Уг # Уъ (even if S is locally compact): Take S uncountable and discrete. We may have Sf% Ф <S^4 (even if S is separable and complete): Take S to be the Hilbert space in Problem 4. (The situation differs in the general topological space, where one must consider two classes of sets: The Borel sets are taken as the elements sometimes of У and sometimes of Sf±, and the Baire sets are taken as the elements sometimes of S?2 an(^ sometimes of Уь—the terminology varies.) 7. In connection with tightness, this fact is interesting: Suppose P is defined on (S, У), but suppose at the outset only that it is finitely additive. If, for each A in У, P{A) = sup P(K) with К ranging over the compact subsets of A, then P is countably additive after all. 2. PROPERTIES OF WEAK CONVERGENCE We have defined Pn =>P to mean that §fdPn -> \fdP for each/in the class C{S) of bounded, continuous real functions on S. Note that, since the integrals $fdP completely determine P (Theorem 1.3), the sequence {Pn} cannot converge weakly to two different limits at the same time. Note also that weak convergence depends only on the topology of S, not on the specific metric that generates it: Two metrics generating the same topology give rise to the same classes Sf and C(S) and hence to the same notion of weak convergence.! Portmanteau Theorem The following theorem provides useful conditions equivalent to weak convergence; any one of these conditions could serve as the definition. A set A in Sf whose boundary ЗА satisfies P(dA) — 0 is called a P-continuity set (note that dA is closed and hence lies in Sf). THEOREM 2.1 Let Pn, P be probability measures on (S, Sf). These five conditions are equivalent: t If we topologize the space Z{S) of all probability measures on {S, У) by taking as the general basic neighborhood of P the set of Q such that |f ftdP — f/,- dQ\ < e for / = 1, . . . , k, where e is positive and the/j lie in C(S), then weak convergence is convergence in this topology. The topological structure of Z{S), which will be of no direct concern to us, is discussed in Appendix III.
12 Weak Convergence in Metric Spaces @ Pn (ii) limn jfdPn — jfdPfor all bounded, uniformly continuous real/. (in) lim supn Pn(F) < P(F)for all closed F. (iv) lim MnPn(G) > P(G)for all open G. (v) Нтя Рп{А) = P(A)for all P-continuity sets A. A couple of examples will show the significance of these conditions. Let P be a unit mass at the point x (P(A) is 1 or 0, according as x lies in A or notf), and letPn be a unit mass at xn. If xn —*- x, then §fdPn =/(#„) -»¦/(#) — $fdP for all/in C(S), so that Pn =>P. If xn does not converge to x, then, for some positive e, we have p(xn,x) > e for infinitely many n. If f(y) *= <р(е~1р(х, у)) with <p defined by A.1), then/e C(S),f(x) = 1, and/(zj = 0 for infinitely many n; hence Pn cannot converge weakly to P. Thus Pn => P if and only if жя ->- ж, which provides an example we shall often use. (Many putative weak-convergence theorems that are in fact not theorems can be disproved by specializing this example.) Since A is a P-continuity set if and only if x ф дА, it is easy to check the equivalence of (i) and (v) in this case. If xn —> x but the xn all differ from x, then there is strict inequality in (iii) for F ~ {x} and strict inequality in (iv) for the complementary set G =* Fc; moreover, if the xn are all distinct and A = {x2, ж4, . . . }, then Pn(A) does not converge to P{A) or to anything else. On the line with the ordinary metric, the DeMoivre-Laplace theorem also illustrates the conditions in the theorem. For a simpler example equally relevant, consider the measure Pn corresponding to a mass of \jn at each of the points i/n, i — 1,2, . . . , n. Now Pn converges weakly to Lebesgue measure P confined to the unit interval, as follows from the fact that J fdPn is an approximating sum to ifdP viewed as a Riemann integral. If A consists of the rationals, then Pn(A) = 1 does not converge to P(A) ;= 0; if G is an open set containing the rationals and having Legesgue measure near 0, then there is strict inequality in (iv). We prove Theorem 2.1 by establishing the implications in the following diagram. I I (i) —(ii) —(iii)<->(iv) I (v) Of course, (i) —> (ii) is trivial. Proof of (ii) —> (iii). Suppose (ii) holds and that F is closed. Suppose <3 > 0. For small enough e, G = {x:P(x, F) < e} satisfies P(G) < P(F) + d, Each subset of S mentioned is assumed to lie in
Properties of Weak Convergence 13 since the sets of this form decrease to Fas e [ 0. If f(x) is the function defined by A.2), then/is uniformly continuous on S,f(x) = 1 on F,f(x) =s 0 on the complement Gc of G, and 0 <f(x) < 1 for all x. Since (ii) holds, we have limn $fdPn = $fdP, which, together with the relations and [fdP =* Г fdP < P(G) < P{F) + <3, implies lim supn Pn(F) < limn [fdPn = \fdP < P(F) + <3. Since д was arbitrary, (iii) follows. Proof of (iii) -* (i). Suppose that (iii) holds and that /e C(S). We shall first show that B.1) \imsuVn\fdPn<\fdP. jfdPn<(j By transforming / linearly (with a positive coefficient for the first-degree term), we may reduce the problem to the case in which 0 </(#) < 1 for all x. For an integer k, temporarily fixed, let Ft be the closed set Ft = {х:Цк <f(x)}, i = 0,l it. Since 0 < f(x) < 1, we have i) < \ к) J fdP < ъ[р[х-Х=± <f(x) < f) k { к к r ?/() < i) < \fdP < ъ[р[ к к) J i=ik { к The sum on the right is 2 7 [ВД0 ^(^)] + 2 7 [ВД-0 (^)] г + 7 i=i /с к к i This and a similar transformation of the sum on the left yield B.2) If (iii) holds, then lim зирпРи(^г) < Р(^) for each г and hence (apply the right-hand inequality in B.2) to Pn and the left-hand one to P) | i ад> < G лр < 7+7 i p(^.). К г=1 J к к г=1 n<± + ( Letting к —> со, we obtain B.1). Applying B.1) to —/yields lim infn lfdPn > J/rfP, which, together with B.1) itself, proves weak convergence. The equivalence of (iii) and (iv) follows easily by complementation.
14 Weak Convergence in Metric Spaces Proof of (iii) -> (v). Let A° denote the interior of A, and let A~ denote its closure. If (iii) holds, then so does (iv), and hence, for each A, B.3) P(A-) ^ lim supn Pn(A~) ^ lim supn Pn(A) > lim inf. Pn(A) > Hm infn Pn(A°) If P(dA) = 0, then the extreme terms equal P(A) and limn Pn(A) = P(A) follows. Proof of (v) —*• (iii). Since d{x: p(x, F) < <5} is contained! in {x: p(x, F) = <5}, these boundaries are disjoint for distinct <5, and hence at most countably many of them can have positive P-measure. Therefore, for some sequence of positive dk going to 0, the sets Fk =* {x:p(x, F) < dk} are P-continuity sets. If (v) holds, then lim supnPn(F) < \imn Pn(Fk) = P(Fk) for each k; if F is closed, then Fk j, F, so that (iii) follows. This completes the proof of Theorem 2.1. Other Criteria It is sometimes convenient to prove weak convergence by showing that Pn(A) -^P(A) for some special class of sets A. THEOREM 2.2 Let % be a subclass of Sf such that (i) $1 is closed under the formation of finite intersections and (ii) each open set in S is a finite or countable union of elements of<%. IfPn(A) -> P(A)for every A in <%, then Pn =>P. Proof. If Ax, . . . , Am lie in %, then so do their intersections; hence, by the inclusion-exclusion formula, 1 m \ Pn U АЛ =^РЯШ ~^Рп(АгА0) +ЪшРп(АЛАк) If G is open, then G = Ui^i f°r some sequence {Аг} of elements of W. Given e, choose m so that P(Ui<mA) > P(G) — ?- By the relation just proved, P(G) - е<Р(и{<тЛг) ^"Шп.РЛи^яЛ) < lim infnPn(G). Since в was arbitrary, condition (iv) of the preceding theorem holds. Let S(x, e) denote the (open) ?-sphere about x. COROLLARY 1 Let % be a class of sets such that (i) ^ is closed under the formation of finite intersections and (ii) for every x in S and every positive s t The inclusion may be strict—in a discrete space, for example.
Properties of Weak Convergence 15 there is an A in fy with x e A0 <= A <= S(x, e). If S is separable and ifPn(A) -> P(A)for every A in <%, then Pn =>P. Proof. Condition (ii)t implies that, for each point x of an open set G, x g A0 <= A <= G for some A in %f. Since S is separable, there exists (see p. 216) in ^ a finite or infinite sequence {At} such that G <= UiA° anc* ^» ^ ^> which implies (j — UHr Thus ^ satisfies the hypotheses of Theorem 2.2. COROLLARY 2 Suppose that, for each finite intersection A of open spheres, we have Pn(A)-+P(A), provided A is a P-continuity set. If S is separable, then Pn^P. Proof. The boundaries dS(x, e), being contained in the sets {y'-p{%, y) — ?}, are disjoint (for fixed x) and hence have P-measure 0, with at most countably many exceptions. Since д(А П Я) с (дА) и (дВ), it follows that the hypotheses of Corollary 1 are satisfied by the class °U of those P-continuity sets that are finite intersections of spheres, and the result follows. Let us agree to call a subclass У of Sf a convergence-determining class if convergence Pn{A) -> P(A) for all P-continuity sets A in У invariably entails the weak convergence of Pn to P. Corollary 2 becomes: In a separable space, the finite intersections of spheres constitute a convergence-determining class. Let us further agree to call У a determining class if P and Q are identical whenever they agree on У. The class of closed sets is a determining class and so is any field that generates Sf. Although each convergence-determining class is clearly also a determining class, the following example shows that the converse fails. Let S be the half-open interval [0, 1) with the ordinary metric; let У be the class of sets [a, b) with 0 < a < b < 1. Then У is а determining class but not (as may be seen by taking Pn [P] a unit mass at 1 — 1/w [0]) a convergence-determining class. Although this one is artificial, we shall see that the applications abound with real examples of determining classes that are not convergence-determining classes. We close this section with another condition for weak convergence. A sequence {xn} of real numbers converges to a limit x if and only if each subsequence {xn,} contains a further subsequence {xn~} that converges to x. (It is convenient to denote a sequence of integers by {n'} rather than {nk} and a subsequence of {«'} by {n"} rather than {nk}.) From this fact it is easy to deduce a weak-convergence analogue. t This condition is slightly stronger than the requirement that the interiors of the elements of % form a base for the topology of S.
16 Weak Convergence in Metric Spaces THEOREM 2.3 We have Pn^P if and only if each subsequence {Pn>} contains a further subsequence {Pn»} such that Pn» =>P. We shall deal occasionally with weak convergence of Pt to P when t goes to infinity in a continuous manner. Of course, this is defined to mean that B.4) lim [fdPt=[fdP i^°o J J for each/in C(S), For/fixed, B.4) holds if and only if B.5) lim [fdPtn = [fdP for each sequence {tn} going to infinity. Thus Pt =>P as t —> oo if and only if Ptn =>P for each sequence {tn} going to infinity, and nothing really new is involved. We can also let t approach in a continuous manner some finite value t0. Remarks. Theorem 2.1 dates back at least to Alexandrov A940-43). Theorem 2.2 is due to Kolmogorov and Prohorov A954). For other accounts of the theory, see the books of Gikhman and Skorohod A965), Hermequin and Tortrat A965), and Parthasarathy A967). The Banach space C(S) has for its adjoint C*(S) the space of finite signed measures on У\ the weak* topology, or the C(S) topology of C*(S), relativized to the space Z(S) of probability measures on У, is the topology described in the first footnote in this section (hence the "weak" in our "weak convergence"); see Dunford and Schwartz A958, pp. 262 and 419). Varadarajan A958a and 1961a) investigates the topological structure of Z(S); see also Appendix III. For extensions of the theory to general topological spaces, see LeCam A957), Vara- Varadarajan A958a and 1961a), and Kallianpur A961). If the metric space S is not separable, the cr-field <S^0 generated by the spheres may be smaller than S?. Dudley A966 and 1967) has a theory of weak convergence involving only sets in S?o and functions measurable Уо. If Pn=> P, one can ask whether Pn{A) -»¦ P(A) holds uniformly on a given-class of P- continuity sets; see Ranga Rao A962) and Billingsley and Tops0e A967). PROBLEMS 1. If S is countable and discrete, then Pn ¦=> P if and only if Pn{x} -> P{x) for each one- point set {x}. 2. Let Pn and P be given by densities p and pn with respect to a measure A on (S, ??). lfpn(x) ->p(x) except for x in a set of A-measure 0, then Pn => P [see Scheffe's theorem on p. 224]. Show by example.that pn(%) -^-p(x) may fail on a set of positive measure even though Pn => P. 3. Even though Pn => P, \fdPn -> \fdP may fail if /is bounded but not continuous or if/ is continuous but not bounded (even if the integrals exist). Give examples. If S is compact, the second possibility does not exist. What if S is not compact but P has compact support? 4. The class of P-continuity sets (P fixed) form a field.
Some Special Cases 17 5. If * is a determining class, if Pn(A) -> Q(A) for Ae<%, and if Pn => P, it does /iof follow that P = Q. [Define probability measures on the line by Р^л} = Pn{l + nr1} — p{0} = P{1} = ^ and Q{0} = 1. Let Б consist of the points 0, 1, и, and 1 + /г1 (л = 1,2,...)- Define °U as the field of Borel sets A such that either (i) А О В is finite and 0 ф A or else (ii) Ac П Bis finite and 0 ^ Ac. (This example is due to O. Bjornsson.)] 6. Define what one should mean by determining classes and convergence-determining classes of elements of C(S). Give an example of a determining class that is not a converg- convergence-determining class. Show that a class uniformly dense in C(S) is a convergence- determining class. 7. If/ is bounded and upper semicontinuous (p. 218), then Pn=> P implies lim supn \fdPn ? \fdP. 8. Let {fg} be a family of real functions on S, equicontinuous at each x (for each x and e, there exists a 8 for which p(x, y) < 8 implies, for all 6, \fg(x) — fe{y)\ < e). If {/e}is uniformly bounded and S is separable, then Pn=> P implies that J/e dPn -*¦ J/e dP uniformly in 6. [First show that, for each e, there exists a countable partition A? = {D^} of S into P-continuity sets such that | fe(x) —fg(y)\ < e for all б if ж and у lie in the same D^. Approxi- Approximate J/e dP by T*kfe(xk)P(Dek) with xfc in D?k, and similarly for \fe dPn, and apply Scheffe's theorem (p. 224).] (This result is due to Ranga Rao A962); for extensions, see Billingsley and Topsoe A967) and Topsoe A967a and 1967b).) 3. SOME SPECIAL CASES Euclidean Space Let Rk denote A:-dimensional Euclidean space, which we shall always take with the ordinary metric p(x, y) = \x — y\ = [2*=1(а^ — 2ЛJ]^- We denote by 8%k the class of Borel sets; the elements of 8%k we shall call k-dimensional Borel sets, linear Borel sets in case к = 1. By the notation у < x\y < x], we mean yi < хг\уг < xt] for all i = 1, 2, . . . , k. (Note that у < х is stronger than у < x together with y^ x.) An interval is a set C.1) (a,b] = {x:a<x<b}. Finally, let us denote by e = A, . . . , 1) the vector whose coordinates are all 1. The general probability measure P on (Rk, fflk) has a distribution function F, defined by C.2) F(x) = P{y:y^x}, x E Rk. Let us relate weak convergence Pn => P to the usual notion of convergence for the corresponding distribution functions Fn, F. The function F is continuous at x if and only if for each positive s there exists a positive <5 such that x — де < у < x -\- de implies \F(x) — F(y)\ < e. To say that F is continuous from above at x means that for each positive e there exists a positive <5 such that x < у < x + be implies \F(x) — F(y)\ < e.
18 Weak Convergence in Metric Spaces From the definition C.2) it follows that F is nondecreasing in each variable. Hence F is continuous from above at x if and only if F(x) coincides with infj>0 F(x + de) = inf5>0 P{y\y < x + de}. This infimum is just the P- measure of the intersection f]d>0 {y.y < x + de} = {y.y < x}. Therefore F is continuous from above at each x. Since F is nondecreasing in each variable and is everywhere continuous from above, it is continuous at x if and only if it is continuous from below at that point, that is, if and only if for each positive s there exists a positive <5 such that x — de < у < x implies \F(x) — F(y)\ < e. Using the monotonicity once more, we see that this condition is in turn equivalent to F(x) =* sup5>0 F(x — de). The supremum being the P-measure of the union \Js>0 {y:y < x — de} = {y.y < x}, we see that F is continuous at x if and only if F(x) = P{y:y < x}. Since {y.y < x} — {y.y < x} is exactly the boundary of {y.y < x}, F is continuous at x if and only if {y.y < x} is a P-continuity set. For distribution functions Fn and F, let us define Fn^>F to mean that there is convergence Fn(x) -+ F(x) at continuity points x of F. By what has just been proved, if Pn =>P, then the corresponding distribution functions satisfy Fn => F. Now an interval (a, b] is determined by the 2k (k — 1)- dimensional hyperplanes containing its faces; let °U be the class of intervals for which all these hyperplanes have P-measure 0. Each vertex of an element of ^ is a continuity point of F, and ^ is closed under the formation of finite intersections. Since only countably many parallel hyperplanes can have positive P-measure, it follows by Corollary 1 to Theorem 2.2 that, if Pn(A) -> P(A) for each A in <8f, then Pn => P. Since P(a, b] is a sum 2 ± F(x) with x ranging over the 2k vertices of (a, b] and similarly forPn(a, b], Fn => F implies that Pn(A) -+ P(A) holds for each A in °U. Therefore Pn^-P and i%j => F are equivalent. Thus the notion of weak convergence reduces in Rk to the ordinary notion of the convergence of distribution functions. In other words, the sets {y.y < x} form a convergence-determining class. The proof above also shows that the rectangles (a, b] form a convergence-determining class. The Circle The same sort of result obviously holds if S is the unit circle in the complex plane: Pn=>P if and only if Pn(A) —*¦ P(A) for every arc A whose endpoints have P-measure 0. A sequence {xx, x2, . . . } of points of S (complex numbers of modulus 1) is said to be uniformly distributed if the allotment of points to each arc is proportional to its length, in the sense that C.3) n-+oo П 3=1
Some Special Cases 19 where IA is the indicator, or characteristic function, of A, and P is circular Lebesgue measure so normalized that P(S) =* 1. If Pn denotes the rcth empirical distribution of the sequence—the measure corresponding to a mass of l/n at each of the points xx, xz, . . . , xn—this condition reduces to Pn(A)^>~P(A) for arcs A, so that the sequence is uniformly distributed if and only if Pn=>P. Therefore, if every arc contains its proper quota of points, in the sense of C.3), then so does every other Borel set whose boundary has Lebesgue measure 0. We prove in Section 7 a famous theorem of Weyl, according to which {xx, x2, . . . } is uniformly distributed if and only if n~x 2"=1 (x3)u —v 0 holds for every nonzero integer u. The Space R™ The results for Rk carry over in all essential respects to the topological product of a countable sequence of copies of the line, that is, to the space R™ of sequences x =a (хг, x2, . . . ) of real numbers (see p. 218). The topology has as the basic neighborhoods of a point x the sets of the form C-4) Nk>?(x) ^ {y:\xt-Vi\<e, i = 1, . . . , k} with ? > 0 and к = 1, 2, .... With this topology, i?°° is a complete, separable metric space. (We shall always mean by i?00 this topological product of a countable set of copies of the real line.) Let 7Tk denote the natural projection from i?00 to Rk, defined by ттк(х) = (xx, . . . , xk). A finite-dimensional set, or cylinder, is by definition a set of the form тг~гЯ with к > 1 and H e 8%k. Since each ттк is continuous and hence measurable (see p. 222), the finite-dimensional sets lie in the a-field ^°° of Borel sets in i?00. Let ^ denote the class of finite-dimensional sets. Since each set C.4) lies in IF and since i?00 is separable, IF generates ^°°. Since IF is a (finitely additive) field, it follows that IF is a determining class. For fixed к and x, the sets C.4) for different values of e have disjoint boundaries (e < <5 implies JVj^e <= Nkd). Applying Corollary 1 of Theorem 2.2 to the class °U of P-continuity sets in IF, we see therefore that JF is even a convergence-determining class. Thus Pn=>P if and only if Pn(A) —** P(A) holds for all finite-dimensional P-continuity sets A. The Space С In С = C[0, 1], the space of continuous functions on [0, 1] with the uniform metric p(x, y) = supt \x(t) — y(t)\ (see p. 220), the situation differs markedly from that in i?00. For points t1}. . . , tk in [0, 1], let Trtl...tk be the mapping that carries the point x of С to the point (x(t^, . . . , x(tk)) of Rk. The finite- dimensional sets are now defined as sets of the form тгг1 \ H with H e Mk.
20 Weak Convergence in Metric Spaces Since TTtv..tk is continuous, these sets lie in the class *% of Borel sets in C. On the other hand, the closed sphere {y:p{x,y) < z} is the limit of the finite-dimensional sets {y:\x(i[n) — y(ijri)\ < e, i — 1, ...,«}; since С is separable, each open set is a countable union of open spheres and hence of closed spheres, so that the finite-dimensional sets generate ^'. Since they form a field, the finite-dimensional sets are thus a determining class. An example shows that the finite-dimensional sets do not form a conver- convergence-determining class. Let P be a unit mass at 0 (the function that vanishes identically), and let Pn be a unit mass at the function xn, where C.5) xn{t) = nt if 0 < t <> - n 2-nt if - < t < - , n n 0 if - < t < 1. n Since xn does not converge to 0 uniformly (that is, in the topology of C), Pn cannot converge weakly to P. (For example, if A = S@, ?), then P{dA) — 0, while Pn(A) = 0 does not converge to P(A) = 1.) But Pn(A) -+P(A) does hold for finite-dimensional P-continuity sets—in fact, if A = ir~^tkH and if 2/л is smaller than the least of the nonzero tt, then Pn(A) — P(A). The finite-dimensional sets are thus a determining class in С but not a convergence-determining class. The difficulty, interest, and usefulness of weak convergence in С all spring from the fact that it involves considerations going beyond those of finite-dimensional sets. Product Spaces Let S — 5" x S" be the product of metric spaces S' and 5"'. If S is separable (which requires that S' and S" be separable), then the a-fields Sf, Sf', and Sf" of Borel sets in these spaces are related by У = У" х У" (see p. 225). The two marginal distributions of a probability measure P on (S, У) are defined by P'(A') =* P(A' x S"), A'e9", and P"(A") = P(S' к A"), A" e Sf". THEOREM 3.1 If S is separable, then a necessary and sufficient condition for Pn^P is that Pn(A' x A")-+P(A' x A") for each P'-continuity set A' and each P"-continuity set A", where P' and P" are the marginal distributions ofP. Proof. Let 3, d', and d" denote the boundary operators in S, S', and S",
Some Special Cases 21 respectively. Since C.6) d(A' x А") <= {{д'А') x S") и (S' x ( the condition is necessary. To prove sufficiency, we apply Corollary 1 of Theorem 2.2 to the class <% of sets A' x A" with A' a P'-continuity set and A" a P"-continuity set. The class °tt is closed under the formation of finite intersections and, by hypothesis, Pn(A) -+P(A) for A in <%. Given (xf, x") in S and e > 0, consider the sets A& - {y:p\x\ y') < 6} x {*/":/>"(*", У*) < <5}. For distinct d, the sets д'{у':р'(х', у') < 6} are disjoint and the sets d"{y":p"(x",y") < 6} are disjoint; therefore ^ lies in °tt for some д with 0 < E < e. If S is metrized by ) = max {я'(*\ 2/'), />"(*", /)}. then Лг is just the sphere with center (x', x") and radius 6. Hence <% satisfies the hypotheses of Corollary 1 of Theorem 2.2, as required. The sufficiency of the condition in Theorem 3.1 implies that the measurable rectangles form a convergence-determining class (but says more: P(d(A' x A")) =* 0 does not imply P'(d'A') - P"(d"A") = 0). For given probability measures P' and P" on (S1, Sf") and (S"', &"'), the product measure P' x P" is a probability measure on У х ?f" and hence, if iS is separable, on У. The following theorem, in which P'n and P' are probability measures on (S", S?') and P"n and P" are probability measures on (iS", Sf"), is an immediate consequence of Theorem 3.1. THEOREM 3.2 IfS is separable, then P'nxP"n=>P' x P" if and only if Pfn=>P'andPun=>P". PROBLEMS 1. Show directly that Fn => F if and only if Fn{x) -+ F{x) for all ж in a dense set and that Fn => F if and only if lim supn Fn(x) <, F{x) and lim infn Fn{x — 0) > F{x — 0) for all x (here F(x - 0) = ы?у<х F(y)). 2. If к > 1, the set of discontinuities of F, although having dense complement, need not be countable. A (k — l)-dimensional hyperplane can contain at most countably many discontinuities if it is normal to none of the axes. [To see the problem, consider first the hyperplane {(хг, x2): xx = — x2} in i?2.] 3. If Fn => -Fand if Fis continuous at each point of a closed set A, then supxe^ \Fn(x) - F(x)\ -> 0. 4. The Levy distance A(F, G) between two one-dimensional distribution functions is the infimum of those positive e such that F(x — e) — e < G(x) < F(x + e) + e for all x.
22 Weak Convergence in Metric Spaces Interpret A(F, G) geometrically in terms of the graphs of F and G. Show that Fn => F if and only if A(Fn, F) -*¦ 0; prove that the collection of one-dimensional distribution functions is a separable, complete metric space under X. 5. Problem 5 in Section 1 adapts the problem preceding it to the general Banach space of countably infinite dimension. In С a simple direct analysis is possible: Work with the functions xn defined by C.5). 6. The uniform distribution on the unit square and the uniform distribution on its diagonal have identical marginal distributions. Relate to Theorem 3.2. 7. Extend the product-space theory at the end of the section by showing that, for a countable product of separable spaces, the finite-dimensional sets (appropriately defined) form a convergence-determining class (i?00 is a special case). 4. CONVERGENCE IN DISTRIBUTION The theory of weak convergence can be paraphrased as the theory of convergence in distribution. When stated in the terminology of the latter theory, which involves no new ideas, many results assume a compact and perspicuous form. Random Elements Let X be a mapping from a probability space (П, 3$, P) into a metric space 5. If JHs measurable (in the sense that X^S? <= ^; see p. 222), we call it a random element. We shall say X is defined on its domain D (or (П, 3ft, P)) and in its range 5 and call it a random element of S. If 5 = R1, we call X a random variable; if 5 — Rk, we call X a random vector; if S = C, we call X a random function. Random variables and random vectors are familiar objects, and the Introduction contains (see formula A3) there) an example of a useful random function (although its measurability was not proved). A variety of random functions arise in a natural way in probability theory. The distribution^ of X is the probability measure P = РХ~г on (S, SP): D.1) P(A) - ?(Х~гА) = P{o): X(co) eA} = P{XeA}, AeST. (We generally suppress the argument со in this way.) In case S = Rk, we have also the associated distribution function of X, defined by F(x) =P{y.y<x}= ?{X <x}, xe R\ Note that P is a probability measure on a space of an arbitrary nature, whereas P is always defined on a metric space. For many questions, the distribution P contains all relevant information about the random element t This has nothing to do with the distributions of Schwartz.
Convergence in Distribution 23 X. If h is a measurable function on S (/r1^1 с У), then, by the change-of- variable formula (see p. 223), D.2) (h(X) d? = (h dP, in the sense that the two integrals exist or fail to exist together and have the same value if they do exist. In the usual expected-value notation, D.2) becomes D.3) ЦКХ)} =[hdP. Each probability measure on each metric space is the distribution of some random element on some probability space: Given P on (S, У), if we take (П, 98, P) - OS, У, P), and if we take X to be the identity, D.4) X(co) -co, соеп = S, then X is a random element on П with values in S and has P as its distribu- distribution. Although the class of distributions thus coincides with the class of probability measures on metric spaces, we generally call a measure on a metric space a distribution only when it is indeed the distribution of some random element already under discussion. Convergence in Distribution We say a sequence {Xn} of random elements converges in distribution to the random element X, and we write D.5) Xn^X, if the distributions Pn of the Xn converge weakly to the distribution P of X: D.6) Pn^P. Although this definition of course makes no sense unless the image space S (the range) and the topology on it are the same for all the random elements X, Хъ X2, . . . , the underlying probability spaces (the domains) may be all distinct. We ordinarily make no mention of these underlying spaces because their structures enter into the argument only by way of the distributions on S they induce. Thus we write P{Xn e A} where we should write Pn{Xn e A}, and we write E{f(Xn)} where we should write $ f(Xn) d?n or perhaps ЕпШХг)}- Since )sf(x)P(dx) = Jn f{X) d? by the change-of-variable formula D.3) and similarly for $fdPn, we have Xn Д. X if and only if E{/TO> - W(X)} for every /e C(S). Theorem 2.1 asserts the equivalence of the following five statements. Call a set A in У an Z-continuity set if P{X e dA) — 0.
24 Weak Convergence in Metric Spaces (i)Xn^X (ii) limn E{f(Xn)} = E{f(X)} for all bounded, uniformly continuous real f. (iii) lim supn ?{Xn eF}< ?{XeF}for all closed F. (iv) lim infn ?{Xn eG}> P{XeG}for all open G. (v) limn ?{Xn eA} = ?{X e A} for all X-continuity sets A. Each theorem about weak convergence can be similarly recast. The following hybrid terminology is useful. If Xn are random elements of S, if Pn are the corresponding distributions, and if P is a probability measure on (iS, У), we say the Xn converge in distribution to P, and write D.7) Xn^P, in case Pn =>P. There is the obvious corresponding version of Theorem 2.1. It is a great convenience to be able to pass from one to another of the three equivalent concepts D.5), D.6), and D.7), and we shall do so freely. This is largely a matter of expedient phraseology. For example, if random variables Xn have asymptotically a normal distribution with mean ц and variance a2, we shall express this fact by writing D.8) Xn^ It makes no difference whether we interpret N(/u, a2) as the measure on (R1, Mx) with density Bna2)-i е~ы-^12^ and understand D.8) in the sense of D.7) or whether we interpret N(/n, a2) as a random variable with и-">2/2<т2 du, Ae M1, and understand D.8) in the sense of D.5). (There exist many such random variables N(/n, a2) on many probability spaces; the construction D.4) gives one of them.) We shall restrict NQz, a2) to this (dual) meaning, and we shall write N for N@, 1): D.10) P{iV e A} = —L f e~/2 du, AeM\ D.9) ?{N(p, a2) eA}= -=L- Г yj2n a J Convergence in Probability Many of the standard concepts and results for convergence in distribution of ordinary random variables generalize to random elements. If, for an element a of S, D.11) for each positive e, we say Xn converges in probability to a and write D.12) Xn^a.
Convergence in Distribution 25 If a is conceived as a constant-valued random element, then, as is easily proved, Xn A- a if and only if Xn ^> a. Alternatively, Xn A- a if and only if the distribution of Xn converges weakly to the probability measure corre- corresponding to a mass of 1 at the point a. The random elements Xn in D.12) may, as usual, be defined on distinct probability spaces—only the range S need be common to them all. If Xn and Yn have a common domain, it makes sense to speak of the distance p(Xn, Yn)—the function with value p(Xn(co), Yn(co)) at со. If S is separable, p(Xn, Yn) is a random variable (see p. 225). In the following theorem, we assume that, for each n, Xn and Yn do have a common domain and that S is separable. THEOREM 4.1 //Xn ^> X and p(Xn, Yn)^0, then Yn ^> X. Proof. If FE = {x:P(x, F)<e}, then P{ YneF}< P{p(Xn, YJ >e} + ?{Xn e Fe}. Since Fe is closed, the hypotheses imply lim supn P{ YneF}< lim supn P{Xn e F.} < P{X e F?}. If F is closed, then Fe \. F as e [ 0 and the result follows by Theorem 2.1 (the version corresponding to convergence in distribution). In the next theoremt we assume that, for each n, Yn, Xln, X2n, . . . have a common domain and that S is separable. THEOREM 4.2 Suppose that, for each u, Xun -^ Xu as n^ oo and that Xu ^> X as м —»¦ oo. Suppose further that D.13) lim lim sup ?{9{Xun, Yn) > e} = 0 «-* oo n-* oo for each positive e. Then Yn ^> X as n -*¦ oo. Proof. Defining FE as before by FE = {x:p{x,F)< e}, we have P{Yn EF}< P{Xun e Fe} + P{P(Xun, Yn) > e}. By the hypothesis Xun 2+ Xu (n -*- oo), lim sup P{Yn eF}< P{Xu e Ft} + lim sup P{P(Xun, Yn) > e}. n-*oo n-*co By D.13) and the hypothesis Xu ¦?> X (u -> oo), lim sup P{Yn eF} < P{X e FE}. n-* oo The result follows as before. t The remainder of this section is not central to the theory; after a cursory reading, it can be consulted as the need arises.
26 Weak Convergence in Metric Spaces Suppose now that X, Xlf X2, . . . all have a common domain and that S is separable. If for every positive e, we say that Xn converges in probability to X and write D.14) Xn^X. Because of the assumptions here of separability and (more important) of a common domain for the Xn, this concept does not generalize D.12). THEOREM 4.3 // Xn ^> X, therf D.15) P({XneA} + {XeA})^0 for every X-continuity set A. Proof. For each positive e, ?{Xn e A, X$ A} < P{P(Xn, X) > e} + P{P(X, A) < e, X$A). From this, the same inequality with Ac in place of A, and the assumption Xn A- X, we obtain lim sup ?{{Xn eA} + {Xe A}) < P{p(X, A)<8,X$A} + ?{P{X, Ae) < 8, XeA}. As e —>- 0, the right-hand member of this inequality tends to ?{X e dA) = 0. Since D.15) implies ?{Xn e A} ^ ?{XeA), it follows from Theorem 4.3 that Xn -^ X implies Xn ®+ X. Product Spaces % Let X' and X'n be random elements of S", and let X" and X"n be random elements of S". In the rest of this section we assume that X' and X" have the same domain, that X'n and X"n have the same domain for each n, and that S" and S"' are separable, so that (X', X") and (X'n, JiQ are random elements of S' x S"' (see p. 225). We seek conditions under which D.16) The random elements X' and X" are by definition independent if ?{X' e A', X" e A"} = ?{X' e A'} ?{X" e A"). If X' and X" are independent and X'n and X"n are independent for each n, then, by Theorem 3.2, D.16) is t Some notation: Ec is the complement of E, Ex — E2 = Ег (~л E2° is the difference of Ex and E2, and E1-\- E2 = (Ег — E2) U {E2 — Ег) is the symmetric difference of Ex and E2. t The next-to-last footnote still applies.
Convergence in Distribution 27 equivalent to D.17) X'n^X', X"n®+X". Although D.16) implies D.17) even without independence, the converse fails—take (X', X") distributed uniformly on the unit square and (X'n, X'^) distributed uniformly on its diagonal. We shall be concerned with cases in which X' and X" are independent but X'n and X"n satisfy some condition weaker than independence. If they satisfy D.16) with independent X' and X", then it is natural to regard X'n and X"n as asymptotically independent. By Theorem 3.1, D.16) holds if and only if D.18) P{X'n e A', X"n e A"} -> P{X' e A', X" e A"} for all X'-continuity sets A' and all ^''-continuity sets A". THEOREM 4.4 If X'n ^> X' and X"n ^ a", then (X'n, JQ ^> (X', a"). Proof We must verify D.18) with X" identically equal to a". Suppose that A' is an ^'-continuity set and that A" is an Х''-continuity set (that is, а" ф дА"). If a" e A", then P{X'^ ф A"} -> 0, and D.18) follows from X'n ®+ X' and P{X'n e A'} - P{X: ф A"} < ?{X'n e A', X: e A"} < P{X'n e A'}. If а" ф A", then D.18) follows from ?{X'n e A', X"n e A"} < P{X'; e A"} — 0. In the next theorem we assume that X"n converges in probability to a random element Y" that is not necessarily constant, which requires that Y" and all the (X'n, JQ have the same domain (D, &, P). Let ^0 be a (finitely additive) field contained in ?%, and denote by a(&0) the (T-field generated by ^0. THEOREM 4.5 Suppose that X' and X" are independent and that X" has the same distribution as Y". If' X"n A- 7", if D.19) ?{{X'n e А'} ПЕ)^ P{X' e A'} P(E) for each X'-continuity set A' and each set E in the field ^0, and if each X'n is measurable o(&0), then (X'n, X'^) ®+ (X1, X"). Note that D.19) implies X'n®+X' (take E =* D). Note also that the domain of (Xr, X") need not be that common to Y" and the (X'n, X'^). We may replace (X', X") in the conclusion by G', Y") if the domain of 7" supports some random element Y' that is independent of Y" and has the proper distribution; but it would be an unnecessary restriction to assume in general the existence of such a Y'.
28 Weak Convergence in Metric Spaces Proof. Fix an Z'-continuity set A' and an ^''-continuity set A". We are to prove D.18), which, since X' and X" are independent and X" has the distribu- distribution of Y", is the same thing as .. D.20) ?{X'n e A', X"n ? A"} -> P{X' e A'} ?{Y" e A"). Since X"n A- Y", it follows by Theorem 4.3 that D.20) is in turn the same as D.21) ?{X'n e A', Y" e A") -> P{X' e A'} ?{Y" e A"). Write En - {Х'пеА'} and a =* ?{X' eA'}, and let g denote the indicator of the set {Y" e A"}. Then D.21) takes the form D.22) Г gdP-+oi(gdP. Since each X'n is measurable o(&0), each En lies in a( We shall prove that D.22) holds if g is an arbitrary integrable function (measurable 38). If g is the indicator of a set in 38 a, then D.22) holds because it is the same thing as D.19). Clearly D.22) then also holds if g is a simple function measurable 380. If g is integrable and measurable o{38^), then, for each positive e, there is a simple function g?, measurable 38 й, with E{l? ~ ge\) < e; but then lim sup g dP — <x\g dP n-+oo jEn J so that D.22) follows for all such g. Finally, suppose that g is measurable 38 and integrable but not necessarily measurable aC$0). By the properties of conditional expected valuesf we still have, since En e f g d? = Г E{# 1 a(^0)} d? - а Ге{# I a(^0)} ^P = a f^ d?. J En JEn J J Thus D.22) does hold for all integrable g, as required. Remarks. The ideas in the proof of Theorem 4.5 come from Renyi A958). PROBLEMS 1. Prove for random variables that, if Xn ®+ X and Yn ?+ 0, then Xn + Yn 1^ X and лп rn —> u- 2. Prove for random vectors Xn, Yn, X and random variables Zn that (a) if Xn Ё+ X, \Xn - Yn\ ? Zn \Xn\, and Zn P> 0, then Уп М+ X and that (b) if Xn ®+ X, \Xn - Yn\ <, Zn | Yn\, and Zn L-y 0, then У„ ?> X [Reduce (b) to (a) via the fact that \x — y\ <, e \y\ with e < i implies \x — y\ < 2e \x\.] t See Doob A953) or Billingsley A965). The central results in this book do not require conditional probabilities and expected values.
Weak Convergence and Mappings 29 3. Three fair coins are tossed independently. Let Etj be the event that coins j and у show the-same face, let X' = X'n = X"n = Y" be the indicator of ?13, let X" be the indicator of E12, and let 3§0 = {E12, E2Z}. The conclusion of Theorem 4.5 fails, although its hypotheses are satisfied except for the requirement that &0 be a field. 4. Let Xlt X2,... be independent and have a common distribution P on S. Let Pn m be the empirical measure for Xx(in), . .. , Xn(co); Pn m(A) is the fraction of k, 1 < к <| n, for which ^(co) e A: 1 Ы Show that, if S is separable, then Pn m => P with probability 1. [Use the strong law of large numbers for Bernoulli trials and Corollary 1 to Theorem 2.2.1 (This result is due to Vara- darajan A958b); see Ranga Rao A962) for extensions.) 5. Show that random variables Xn and X satisfy Xn Ё+ X if and only if D.22) 4F(Xn)} - 4F(X)} for every continuous distribution function F [turn A.1) around]. If У has the distribution function Fand is independent of X and all the Xn (take them all to have the same domain), then D.22) is the same as P{ Y < Xn} — P{ Y < X}. 6. For a probability measure P on (S, ?"), D.4) shows how to construct on a probability space (Q, 3B, P) a random element X with distribution P. If S is separable and complete, we can take P as Lebesgue measure on the Borel sets 88 of the unit interval Q. [Let sfk = {Aku} be a decomposition of S into P-continuity sets of diameter less than I/A: and let Sk = {Iku} be a decomposition of Q into subintervals with lengths P(/fcu) = ^(Л^); arrange that •я?к+1 refines s4h and >/fc+1 refines Jk. For a> G Iku, give Xk(a>) some value in Aku, show that {Хк(ш)} is fundamental for each a>, and use Theorem 4.1 to show that X(a>) = UmkXk(u>) has distribution P.] (Skorohod A956) has the stronger result that, if Pn => P, random elements Xn and X with these distributions can be constructed on the unit interval in such a way that Xn(w) -*¦ X(w) for each со.) 7. Show that (X'n, X"n) 1*. (X', X") if X' and X" are independent, X"n Д. Y", where Y" has the distribution of X", and D.19) holds for each E in the a-field generated by Y". 5. WEAK CONVERGENCE AND MAPPINGS Continuous Mappings If h is a measurable mapping of 5 into another metric space S' (with metric p and cr-field У of Borel sets), then each probability measure P on(.S, ST) induces on (S', S?') a unique probability measure Phr1, defined by Рк~х(А) = P{hr1A) for AeSS". We need conditions under which Pn^>P implies Pnhr1^>Phr1. One such condition is that h be continuous, since then f(h(x)) is bounded and continuous on S whenever f(y) is bounded and continuous on S', so that Pn =>P implies §f(h(x))Pn(dx) -> §f(h{x))P(dx), a relation which, upon transformation of the integrals (see p. 223), becomes f(y)Ph~\dy).
30 Weak Convergence in Metric Spaces For example, the natural projection vk from i?00 to Rk is continuous, so that Pn =>P implies P^-1 => Ртг for each k. Let us show that, conversely, if Pn-n-1 => P-n--1 for each k, then Pn^~P. From the continuity of ттк it follows easily that Этг^Я <= тг ЭЯ for Я с i?fc. Using special properties of ттк we shall prove that there is inclusion in the other direction. If x e тг дН, so that nkx е ЭЯ, then there are points <x(u> in Я and points /?(u) in Hc such that <x(u) -> тткх and /S(t<) -> тг&а; (и -> oo). Since the points (a<">, . . . , а?и), жлн-ъ • • • ) lie in -n~xH and converge to x, and since the points (/^u>, . . . , fi[u), xk+1, . . . ) lie in (тг^ЯH and also converge to x, x e d^-^H). Thus дтт^Н =* 7Г-1 дН. If РпТг-^Ртт-1, then Рп(Л)^Р(Л) for sets Л = тг^Я with ^ andРСтг-1 дН) - 0. SinceРСтг дН) = 0 is equivalent to 0, РП(Л)^-Р(Л) holds for all finite-dimensional P-continuity sets, and hence, since the finite-dimensional sets form a convergence-determining class, Pn => P. We call the Ртг^г1 the finite-dimensional distributions or measures corre- corresponding to P. We have shown that probability measures on G?00, ^°°) converge weakly if and only if all the corresponding finite-dimensional distributions converge weakly. The finite-dimensional distributions of a probability measure P on (C, ^) we define as the various measures P^J^..tic, where the irt ...tjfe are the projec- projections defined in Section 3. Since these projections are continuous, the weak convergence of probability measures on (C, ^) implies the weak convergence of the corresponding finite-dimensional distributions. But the converse fails because, as was shown by counterexample, the class of finite-dimensional sets is not convergence determining. Indeed, if P [PJ is a unit mass at 0 [the function C.5)], then Pn does not converge weakly to P, even though Pn7r~^...tk =>Pnj*..tk holds for all sets (fls . . . , tk). On the other hand, since the finite-dimensional sets form a determining class, a probability measure on (C, ^) is uniquely determined by its finite-dimensional distributions. Main Theorem We have seen that Pn^>P implies PJr1 => Phr1 if A is a continuous mapping of iS into S", but we can weaken the continuity assumption. Assume only that h is measurable and let Dh be the set of discontinuities of h. Then DheSf (even if h is not measurable; see p. 225). THEOREM 5.1 IfPn^P and P(Dh) = 0, then PJr1 ^>Pfr1. Proof. We shall show that, if F is a closed subset of S", then lim sup PJT\F) < Phr\F).
Weak Convergence and Mappings 31 Since Pn => P, we have Um supn Pn(h^F) < lim supn Pn(Qr*F)-) < P((h^F)-). Hence it suffices to prove P((h~1F)~) — P{h~xF), which follows from the assumption P(Dh) = 0 and the fact that (hr1?)- с ?>л и There are two immediate corollaries. For a random element X of S, h(X) is a random element of S' (we still assume h measurable). COROLLARY 1 IfXn^X and ?{X e Dh) = 0, then h(Xn) -^ h(X). COROLLARY 2 IfXn ?* a and ifh is continuous at a, then h(Xn) ?> h{a). For example, for ordinary random variables, (Xn, Yn) Д>- (X, Y) implies Xn + Yn Q> X + Y. Facts of this sort, used constantly in probability and statistics, all depend ultimately on Theorem 5.1. Particular interest attaches to Theorem 5.1 when S' is the line, in which case h is an ordinary real, measurable function. THEOREM 5.2 (i) IfPn => P, then Pnhrx ^> Phrx for every real, measurable function hfor which P(Dh) = 0. (ii) IfPJr1 =>Ph~x for all bounded, continuous real h, then Pn =>P. (iii) IfPn => P and h is a real, bounded, measurable function with P(Dh) = 0, then ]hdPn^lh dP. Proof. Part (i) follows from Theorem 5.1. If PJr1 => Phr1, then, by change of variable, ff(h(x))Pn(dx) -» J f(h{x))P{dx) for each / in С{ВУ). If h is bounded by M, then, taking f-M if t < -M, f{t) = | t if -M <t <M, [ M if M <, t, we see that J h dPn —> J h dP. Thus part (iii) is a consequence of part (i) and part (ii) follows by the definition of weak convergence. (We can strengthen (ii) by requiring h in the hypothesis to be uniformly continuous.) In applications we are generally interested in establishing weak conver- convergence in various metric spaces in order to be able, by applying part (i) of this theorem, to prove the weak convergence of measures induced on the line by various real functions h. Integration to the Limit In part (iii) of Theorem 5.2, the restriction that h be bounded can be relaxed. It is simplest to discuss this problem in terms of random variables Xn and X having distributions PJr1 and Phrx.
32 Weak Convergence in Metric Spaces THEOREM 5.3 IfXn 2+ X, then E{|X|} <, lim infn E{\Xn\}. Proof. Take (\x\ for |ж| < а h(x) =* ( 0 for |ж| > a in part (in) of Theorem 5.2; if P{|X| = a} = 0, then f |*|rfP=lim f |XJ JP<limi J{|X|<a} n-oo J{|Xn|<a} n-o The result follows if we let a tend to infinity through values satisfying = a} = 0. The variables Xn are said to be uniformly integrable if lim sup f \Xn\ d? = 0. a-oo n J{\Xn\>a] If the Xn are uniformly integrable, then E.1) supnE{|Xn|}<oo. On the other hand, if supn E{\Xn\1+E] < oo for some positive e, then the Xn are uniformly integrable because in this case f \Xn\ d? < \ <xE The Xn are also uniformly integrable if there is a random variable Y such that E{| F|} < oo and E.2) P{\Xn\ > a} < P{| Y\ ? a}, n > 1, a > 0, because in this case (see C) on p. 223) f \Xn\dP<,{ \Y\dP. J{|Xn|>a} J{|F|>a} THEOREM 5.4 Suppose Xn %- X. If the Xn are uniformly integrable, then E.3) if X and the Xn are nonnegative and integrable, then E.3) implies that the Xn are uniformly integrable. Proof. If the Xn are uniformly integrable, the integrability of X follows from E.1) and Theorem 5.3. Take (x if \x\ < a, h(x) = @ if \x\ > a,
Weak Convergence and Mappings 33 in part (iii) of Theorem 5.2. Since Xn 2* X, E.4) ЕВД} -> E{ha(X)} if P{\X\ - a} = 0. But E.5) E{Xn} - E{/za(XJ} = f Xnd? J{|Xn|>a} and E.6) E{X} - E{/za(X)} = f XdP. J{|X|>a} Since these three relations imply lim sup \E{Xn} - B{X}\ < sup f \Xn\ d? + f \X\ d?, n-oo n J{\Xn\>a] J{\X\>a} E.3) does follow from uniform integrability. On the other hand, if X and the Xn are integrable and E.3) holds, it follows by E.4), E.5), and E.6) that f XndP-*[ X J{|Xn|>a} J{|X|>a} E.7) f XndP-*[ XdP J{|Xn|>a} J{|X|>a} if P{|X| — a} =: 0. Since X is integrable, we can, given e, choose a so the right side of E.7) is less than e. But then the left side is less than e for all n exceeding some n0. Since the Xn are nonnegative and individually integrable, uniform integrability follows. Since Xn^X implies Xnr^Xr, Theorems 5.3 and 5.4 immediately extend to moments higher than the first. In none of this need X and the Xn (and the Y in E.2)) have a common domain. If they do, and if XJco) —*¦ X(co) for almost all со, or if Xn Д. X, then Xn -^ X. Thus Theorem 5.3 contains Fatou's lemma and Theorem 5.4 contains the standard theorems about integration to the limit (the case involving E.2) contains Lebesgue's dominated convergence theorem). An Extension of Theorem 5.11 Let hn and h be measurable mappings from S to 5". We may ask whether Pn=>P implies Pnh~x =>Phrx when hn converges to h in some sense. Let E be the set of x such that hnxn -> hx fails to hold for some sequence {xn} approaching x. (If hn is identically equal to h, then E =* Dh.) It is not hard to show that x e Ec if and only if for every e there exist а к and а д such that / > к and p(x, у) < д together imply p'(hx, h-y) < e. Assume that E lies in У (which is shown on p. 226 to hold automatically if S' is separable). f This result is not used in the sequel.
34 Weak Convergence in Metric Spaces THEOREM 5.5 IfPn => P and P{E) - 0, then Pnh~x => Phr\ Proof. We shall show that P(hrlG) < lim infn Pn(h-XG) for open subsets G of S'. From the above characterization of the points of Ec, it follows that, if x e Ec and if hx lies in the open set G, then there exist к and д such that h-y E G if i > к and p(x, y) < d, so that x is interior to Tk = [ Thus hrxG с ? и иЛ0- Since ^C^) — °» we nave P(h~lG) < since Г° с J°+l, we have, for given e > 0, PQr^G) < Р(Г°) + ? for sufficiently large fc. From Pn^>P it follows that P(J°) < lim infnPn(J°); since T° a h~xG for large n, we have P(T°) < lim infn Pn(h~lG). Therefore P(hrxG) < lim infn Pn(hr*-G) + ?, which, since ? was arbitrary, completes the proof. If hn = h, this result reduces to Theorem 5.1. If h is everywhere continuous and hn converges to h uniformly on compact sets, then E is empty, so that the hypothesis of the theorem is satisfied. More generally, P(E) =* 0 if P(Dh) — 0 and if there is uniform convergence on compact sets. On the other hand, if Dh — E — 0, then hn converges to h uniformly on compact sets. Remarks. In the Euclidean case, Theorem 5.1 is sometimes called the Mann-Wald theorem (see Mann and Wald A943) and Chernoff A956)); Corollary 2 for rational functions h is Slutsky's theorem (see Slutsky A925)). Theorem 5.5, which generalizes a result of Prohorov A956), is due to H. Rubin; see Anderson A963). For extensions of Theorems 5.1 and 5.5 and converses to them, see Tops0e A967a and 1967b). PROBLEMS 1. If a Borel function/has a derivative at a (this being the only assumption on/) and Xn P+ a, then f(Xn) = f(a) + f'(a)(Xn - a) + (Xn - a)Zn with Zn P> 0. Extend to higher derivatives. 2. Suppose/has a derivative at 0. If XnYn ®+ Y and Yn ?>. 0, then Xn(f(Yn)-f@))l+f\0)Y. 3. Suppose fn and / have continuous derivatives with f'n(%) -*¦/' (x) uniformly in x. If Xn Yn ®+ Y and Yn P> 0, then Xn(fn( Yn) - /n@)) ^/'@) Y. 4. Suppose Xn ,^> X. Give examples in which the Xn are integrable but X is not, and conversely. Show that E {Xn} -> E^} does not in general imply uniform integrability of the Xn. 5. Show that Xn are uniformly integrable if and only if supn E{|ZJ} < oo and for each г there exists а д such that, for all n, P(?) < S implies ]E \Xn\ d? < e. 6. Let P be Lebesgue measure on the unit interval and Pn correspond to a mass of \\n at some point chosen from ((/ — 1)/и, i\ri), i = 1, . . . , n. Show that Pn => P and deduce from Theorem 5.2 that a bounded function continuous almost everywhere is Riemann integrable.
Prohorov's Theorem 35 6. PROHOROV'S THEOREM Relative Compactness Let П be a family of probability measures on (S, ?P). We call П relatively compact if every sequence of elements of П contains a weakly convergent subsequence; that is, if for every sequence {Pn} in П there exist a subsequence {Pn,} and a probability measure Q (defined on (S, 5f), but not necessarily an element of П) such that Pn. => g.| Even though Pn. => Q makes no sense if Q(S) < 1, it is to be emphasized that we do require that Q(S) = 1—we disallow any escape of mass, as discussed below. We need to be able to decide whether or not a given family П is relatively compact. For example, suppose we know of probability measures Pn and P on (C, #) that the finite-dimensional distributions of Pn converge weakly to those of P. We have seen in Section 5 that Pn need not converge weakly to P. Suppose, however, that we also know that {Pn} is relatively compact. Then each subsequence {Pn<} contains a further subsequence {Pn.} converging weakly to some limit Q. The finite-dimensional distributions of Q must be the weak limits of those of {Pn*} and hence must coincide with the finite- dimensional distributions of P (for each (tlf . . . , tk), Pn'7rT^-tk => Q^-tk and Pn7rj*..tk=> PTr~*..tk, so QTTj~*..tk — iV"?-..^). But then, since a probability measure on С is completely determined by its finite-dimensional distributions (the finite-dimensional sets form a determining class), Q and P must them- themselves coincide. Thus each subsequence {Pn,} contains a further subsequence converging weakly to P, and it follows by Theorem 2.3 that the entire sequence {Pn} converges weakly to P. Note that, if {Pn} does converge weakly to P, then it is relatively compact, so that relative compactness is not an excessive requirement. Suppose we know that {Pn} is relatively compact and that, for each (tx, . . . , tk), Pnrrj^..tic converges weakly to some probability measure pt ...ik on (Rk, Mk)—the point being that we do not assume at the outset that the nt...tic are the finite-dimensional distributions of a probability measure on (C, <i). It still follows that each subsequence {Pn,} contains a further subsequence {Pn*} converging weakly to some limit. Since this limit must have the (*tl...tk as its finite-dimensional distributions, it is unique. Therefore {Pn} converges weakly to some P. These ideas provide a powerful technique for proving weak convergence in С and in other function spaces. One proves first that the finite-dimensional f We take this as a definition; it is really a sequential compactness notion in Z(S) as denned in the footnote on p. 11. See Appendix III for further topological information.
36 Weak Convergence in Metric Spaces distributions converge weakly and then that the sequence in question is relatively compact. To use this method we need an effective criterion for relative compactness. Consider R1 first. Let П be a family of probability measures on (R1, &1). Given a sequence {Pn} from П, we can apply to the sequence {Fn} of corre- corresponding distribution functions the classical Helly selection theorem (see p. 227). There exist a subsequence {Fn>} and some function F such that F.1) FA*)-+F(x) holds for all its continuity points x. The function F may be taken to be right- continuous, in which case there exists on (R1, ?%*) a finite measure /л, such that F.2) /л(а, b] = F(b) - F{d). Now it may happen that ^(R1) < 1. For instance, if Pn corresponds to a unit mass at the point n, then .Fis identically 0, no matter what subsequence {Fn,} is selected, so that ^(R1) — 0. If Pn is the uniform distribution over [—n, n], then F(x) = \ is the only possibility—again /j,(R}) = 0. In these examples, mass is "escaping to infinity" in an obvious and intuitive sense. Suppose, however, that the measure /л determined by F via F.2) satisfies /^(R1) = 1. Then [л is a probability measure and, since F.1) holds at continuity points of F, we have (see Section 3) Pn, => [x. Thus П will be relatively compact if we somehow ensure that each of these measures уь we encounter satisfies ^(R1) — 1. Now ^(R1) = 1 if for every positive e there exist a and b such that ц{а, b] > 1 — e. Suppose that for every positive e there exist a and b such that F.3) Pn(a,b]>l-e, л =1,2, .... Since F.3) persists if a is decreased and b increased, we may take a and b to be continuity points of F, in which case F.3) and F.1) together imply M(a, b]>l-e. It follows that a family П of probability measures on (i?1, 3^) is relatively compact if for each positive e there exist a and b such that P(a, b] > 1 — e for all P in П, a condition which has the effect of preventing the escape of mass alluded to above. On the other hand, if this condition fails, then there exists some positive e such that, no matter what a and b are, P(a, b] < 1 — e for some P in П. Choose in П a Pn with Pn(—n, n] < 1 — e. If some subsequence {Pn'} were to converge weakly to a probability measure Q, we would have, for each x, Q(-x, x) < lim infn, PA-*, x) ^ Ига infn, Pn<-n', n'] < 1 - e, which is impossible. Thus the condition is both necessary and sufficient.
Prohorov's Theorem 37 Since an interval (a, b] has compact closure, and since each compact set can be enclosed in such an interval, the condition can be recast: a family П of probability measures on (R1, &1) is relatively compact if and only if for each positive e there exists a compact set К such that P(K) > 1 — e for all P in П. Now the condition has meaning in an arbitrary metric space. A family П of probability measures on the general metric space S is said to be tight if for every positive e there exists a compact set К such that P(K) > 1 — ? for all P in П. If П consists of a single measure alone, this definition reduces to the one introduced in Section 1. We have just seen in the case R1 that tightness is necessary and sufficient for relative compactness. The following theorems, due to Prohorov, extend the sufficiency to arbitrary metric spaces and the necessity to separable, complete spaces. THEOREM 6.1 If U is tight, then it is relatively compact. THEOREM 6.2 Suppose S is separable and complete. If П is relatively compact, then it is tight. We shall refer to these two theorems conjointly as Prohorov's theorem, calling Theorem 6.1 the direct half and Theorem 6.2 the converse half. The Direct Theorem We turn first to the proof of the direct half, which is the more useful in the applications. We shall prove the result successively for Rk, for i?00, for S cr-compact (a countable union of compact sets), and, finally, for the general S. Each of the last three cases is handled by reducing it to the one preceding. The case Rk. Here the proof is practically the same as the one already given for R1. If {Pn} is a sequence in П, Helly's theorem (p. 227) implies that the sequence {Fn} of corresponding distribution functions contains a subsequence {F„,} such that F.4) Fn.(*) - F(x) at continuity points of F, where F is a function continuous from above. There exists on (Rk, &k) a measure ju such that ii{a, b] is Fdifferenced around the vertices of the ^-dimensional rectangle (a, b]. Now Pn. => [i will follow if we prove /j,(Rk) — 1. Given e, choose a compact set К in Rk such that Pn>(K) > 1 — e for all ri, which is possible because П is tight. Now choose a and b such that К <=¦ (a, b] and such that all 2k vertices of (a, b] are continuity points of F (this is possible because only countably many parallel (k — l)-dimensional hyperplanes can have positive ^-measure). Since Pn>{a, b] is Fn. differenced around the vertices of (a, b], F.4) implies Pn-{a, b] —*¦ р(а, b]. From Pn>(a, b] >_
38 Weak Convergence in Metric Spaces Pn>(K) > 1 — ?, it follows that ц(а, b] > 1 — e. Since e was arbitrary, ju(Rk) — 1. Thus П is relatively compact. For the case i?00 we shall need the following lemma. LEMMA 1 If II is a tight family on (S, 3f) and if h is a continuous mapping from S to S', then {Ph'1:? eU} is a tight family on (S', Sf"). Proof Given e, choose in S a compact set К such that P(K) > 1 — e for all P in П. If К' я hK, then K' is compact (see p. 218) and hrxK' => K, so that Phr^K') > 1 - ? for all P in П. The case i?°°. If П is a tight family on (i?°°, ^°°), then, by Lemma 1, {Ртт-г:Р е П} is, for each k, a tight family on (Rk, 0tk). By the direct half of Prohorov's theorem for Rk, just proved via Helly's theorem, we can select from a given sequence {Pn} in П a subsequence {Pn>} such that Р^тт^1 converges weakly to a probability measure fa on (Rk, 3%k). By the diagonal method (as on p. 219), we can choose the sequence {Pn-} so that Р^тг^1 => fa holds for all к simultaneously. Since the measures fa obviously satisfy the consistency conditions of Kolmogorov's existence theorem (see p. 228), there is a probability measure Q on (i?00, ^°°) such that Qtt'1 — fa for all k. (It was shown in Section 3 that the cr-field ^°° of Borel sets in i?00 coincides with the <r-neld generated by the finite-dimensional sets, which is the cr-field that intervenes in Kolmogorov's existence theorem.) But then iV^1 => Qtt^1 for each k, so that the finite-dimensional distributions of Pn. converge to those of Q, which, as noted at the beginning of Section 5, implies Pn> => Q. Tightness implies relative compactness in i?°°. For the cr-compact and the general cases, we shall need two further lemmas. Suppose that 5*0 is a Borel subset of 5*: F.5) Soe#>. Now 5*0 is a metric space in its own right in the relative topology; let S?o denote the class of Borel sets for 5*0. From F.5) it follows (see p. 224) that F.6) У0={А:Ас: S0,Ae^}; in particular, F.7) ^0 с у. If P is a probability measure on E*, Sf) with P(S0) = 1, let Pr (r for restric- restriction) be the probability measure on (So, ?f0) got by restriction from У to SfQ (see F.7)). If P is a probability measure on E*0, Уо), let Pe (e for exten- extension) be the probability measure on (S, Sf) with Pe(A) — P(A r\ So) for Ae^ (see F.6)). Note that Pe(S0) =* 1.
Prohorov's Theorem 39 If P is a probability measure on (S, ?P) with P(S0) = 1, then F.8) (РУ = P; if P is a probability measure on E*0, S^o), then F.9) (Pe)r - P. If, in Lemma 1 and in Theorem 5.1, we take h to be the identity mapping from 5*0 into 5*, we obtain the following lemma. LEMMA 2 If U is a tight family on (So, S?o), then Ue - {Pe:P e П} is a tight family on (S, Sf). IfPn^>Pin (So, S?o), then Pne=>Pe in (S, SP). LEMMA 3 IfPn^>Pin (S, Sf) and Pn(S0) = P(S0) = 1, then Pnr =>Pr in (So, ^o). Proof The general open set in So is Go — G П So with G open in S. Since Pnr(G0) = Pn(G) and РЧ^о) = P(G), lim infn Pn(G) > P(G) implies lim infn Pn(G0) > P'(G0). The o-compact case. If 5* is cr-compact, then it is separable and hence can be embedded homeomorphically into i?00 (see p. 219). Since 5* is cr-compact, so is its image under the homeomorphism; in particular, this image is a Borel subset of i?°°. Thus 5* is homeomorphic to a Borel subset of i?00. By Theorem 5.1, weak convergence persists under homeomorphism, and hence so does relative compactness of П. Since compactness of sets persists under homeo- homeomorphism, so does tightness of П. Therefore we can replace 5* by its homeomorphic image. We may thus assume that 5* is a Borel subset of i?°°. If П is tight in E*, ?f), then, by Lemma 2 applied to i?°° and its subset S, ГР is tight in (i?°°, ^°°). Since, as already proved, Theorem 6.1 holds in this larger space, Ue is relatively compact, so that, for each sequence {Pn} in П, the corresponding sequence {Pne} contains a subsequence {Pen-} converging weakly in the sense of (i?00, ^°°) to some Q. By the tightness of П itself, there is for each e a compact subset К of S such that Penl(K) = Pn-(K) > 1 - e for all n', so that Q{S) > Q(K) > lim supn,P^CK) > 1 - ?. Thus S supports Q as well as all the Penl, and hence, by Lemma 3 and F.9), Pn, converges weakly in the sense of E*, SP) to Qr. Tightness implies relative compactness if S is cr-compact.t The general case. Whatever 5* is, if 5*0 — {J^, where Kt is a compact set in S such that Р(Кг) > 1 — 1// for all P in П, then 5*0 supports each element of f A separable 5 is homeomorphic to a subset of i?00. The stronger assumption that S is ст-compact ensures that its image lies in ^°°, which is convenient because Lemmas 2 and 3 become awkward without the assumption So G У.
П and Пг = {РГ:Р е П} is a tight family in (So, Уо). By the case just treated, Пг is relatively compact. For each sequence {Pn} in П, therefore, the corresponding sequence {Pnr} contains a subsequence {Prn'} converging weakly in the sense of (So, S^o) to some Q. By Lemma 2 and F.8), Pn> converges weakly in the sense of E*, Sf) to Qe. Thus П is relatively compact, which proves Theorem 6.1 in full generality. The Converse We turn now to Theorem 6.2, the converse half of Prohorov's theorem. Note that it reduces to Theorem 1.4 in case П consists of a single measure; the proof is a refinement of the earlier one. Suppose that for each positive e and д there exists a finite collection Alt . . . , An of E-spheres such that Р(и<<»Л) > 1 - ? for all P in П. The following argument shows that П is then tight. Given e, choose, for each integer k, finitely many 1/fc-spheres Akl, . . . , АкПк such that P([Ji<nkAkl) > 1 — ej2k for all P in U. If К is the closure of the totally bounded set r\k>i\Ji<nkAki> tnen P(K) > 1 — ? and, since 5* is assumed complete, jSTis compact (see p. 217). We prove Theorem 6.2 by showing that, if the condition stated in the preceding paragraph fails, then П is not relatively compact. Suppose then there exist positive e and д such that every finite collection Ax, . . . , An of 5-spheres satisfies P(\Ji<nAi) <[ 1 — e for some P in П. Since S is assumed separable, it is the union of a sequence Alt A2, ... of open spheres of radius д. Let Bn = \Ji<nAi and choose Pn in П so thatPn(J?n) < 1 — e. Suppose some subsequence {Pn<} converged weakly to some limit Q. Since Bm is open, we would have P(Bm) <, lim infn, Pn,(Bm) for each fixed m. But then, since Bm с Bn, for large n, P(BJ < lim infn. Pn>(Bn.) < 1 - e would follow; since Bm increases to S, this is impossible, which completes the proof. For strengthened forms of Theorem 6.2, see Appendix III. If Xn are random elements of S, we say {Xn} is tight when {Pn} is tight, where Pn is the distribution of Xn. If S is i?00 or C, we identify the finite- dimensional distributions of Xn with those of Pn. The argument at the beginning of this section may be interpreted as asserting of random elements Xn and X of С that, if the finite-dimensional distributions of Xn converge weakly to those of X and {Xn} is tight, then Xn ^ X. Remarks. In his original proof of Theorem 6.1, Prohorov A956) assumed S to be separable and complete; the present extension and its proof are due to Varadarajan A958a and 1961a). See also LeCam A957). PROBLEMS 1. If П is tight, then its elements have a common сг-compact support. The converse fails (unless, for example, П consists of a single measure).
2. Let П consist of the unit masses for points in A. Show directly from the definition that n.is relatively compact if and only if A~ is compact in S. 3. A sequence of probability measures on the line is tight if and only if the corresponding distribution functions satisfy lima._00 Fn{x) = 1 and lim^^oo Fn{x) = 0 uniformly in n. A class of normal distributions is tight if and only if the means and variances are bounded. 4. Show that, if Fis a right-continuous, nondecreasing real function with 0 < F(x) < 1, then there exist distribution functions Fn such that Fn(x) —*¦ F(x) for each continuity point x. 5. If {\Xn\d} is uniformly integrable (see p. 32) for some д > 0, then {Xn} is tight. 6. Probability measures on a product S' x 5" are tight if and only if the two sets of marginal distributions are tight in S' and 5". 7. Suppose S is separable and locally compact. Since S can be given a metric under which it is complete [Problem 3 of Section 1], Theorem 6.2 applies. From the fact that each compact set in S can be enclosed in the interior of a larger compact set, it follows further that Pn => P if and only if J fdPn —*¦ J fdP for every (bounded) continuous / with compact support (a result which cannot hold in C, for example, since there a continuous function with compact support vanishes identically; see Problem 5 in Section 3). 8. In connection with Lemma 3, note that, if П is a tight family on E, if) and P(S0) = 1 for all Ре П, then it does not follow that LTr = {Pr:Pe П} is a tight family on (SQ, Уо). [Take S = [0, 1] and So = @, 1) and consider point masses.] This can happen even if П consists of a single measure. [See Remark 2 following Theorem 1 in Appendix III.] 7. FIRST APPLICATIONS In this section we examine from the point of view of the preceding theory some standard probability results, results used frequently in the chapters that follow. Smooth Functions In proving Theorems 1.2 and 1.3 and in proving the implication (ii)—>¦ (iii) in Theorem 2.1, we used arguments involving the function A.1). These arguments still go through if, in place of A.1), we use any uniformly continuous 93 such that y(t) =1 for t < 0, 0 < y(t) < 1 for 0 < t < 1, and (p(t) =; 0 for t > 1. It is possible to construct such functions cp with smoothness properties stronger than uniform continuity. Outside [0, 1], define 9? as before; for 0 < t < 1, define G.1) where Then, for each integer v, cp has, over the whole line, a bounded, continuous derivative of order v. Let Pn and P be probability measures on (R1,
42 Weak Convergence in Metric Spaces THEOREM 7.1 IfJ /'dP'n—> J /' dP for every bounded, continuous function f having bounded, continuous derivatives of each order, then Pn =>P. Proof. Consider the distribution functions Fn and F corresponding to Pn and P. If <pu(f) = (p(ut), with <p defined by G.1), then, for each u, lim sup FJx) < lim <pu(y - x)Pn(dy) =; <pu(y - x)P{dy) < Fix + -I. П—+ОЭ П—+СО J J \ U ' Using <pu(y — x + 1/u) in place of cpu{y — x), we see that lim infn FJx) > .F(x — 1/m) for each u. Hence Fn(x) —*¦ F(x) if F is continuous at x, and Pn=>P follows. The Central Limit Theorem Theorem 7.1 can be used to derive the classical central limit theorems for triangular arrays. For each n, let G-2) U, • • ¦ , fnt. be independent random variables with mean 0 and finite variance a2nk. (The probability space on which the variables G.2) are defined may vary with n.) Let Sn = |nl + • • • + €nkn and suppose its variance sn2 = ffni + ' ' ' + °2пкп ^s positive. Recall that TV is a random variable normally distributed with mean 0 and variance 1. We first prove Lindeberg's theorem: THEOREM 7.2 If G.3) -2 |f & d? - 0 Sn fc=l J{|?nfc|>?Sn} (n —»- oo) for each positive e, then Sjsn ^ TV. Proof It suffices to show that G.4) E{ for every bounded, continuous /having bounded, continuous derivatives of each order. Fix such a function /and define G.5) g{h) = supx |/(* + h) - /(*) -/'(*)/* - \f"{x)h*\ (g is Borel measurable). By the mean value theorems of the second and third orders, there is a constant K, depending on/alone, such that G.6) g(h)< К mm {h\\h\*}. We have g(h) <, Kh2 and g(h) < K\h\s; the first inequality is good for h large, the second for h near 0. Since / is fixed throughout the rest of the
First Applications 43 argument, so is K. From the definition G.5) it follows that G.7) \[f(x + hj -f(x + h2)] - [f\x){hx - h2) + i/'(*)(V - hm + g(h2). If the |nfc were all normally distributed, E{/(Sn/sn)} would coincide with E{/GV)}. If we successively replace the ?nk by normal variables r\nk with mean 0 and variances a2nlc, we get a sequence n J)}, The first member of the sequence is E{f(SJsn)} and the last one is E{f(N)}. The idea now is to show that each member is so close to the next (for n large) that even the first and last members are close. Since G.4) involves the joint distribution of the variables G.2) but not any properties of the probability space they are defined on, we may, by passing to a new space (say (R2kn, ?%2kn)), introduce random variables r]nl, . . . , rjnkn such that rjnk is normally distributed with mean 0 and variance a2nk and such that the 2kn random variables G.8) are independent. If bnl> • • • > bnfcn> < 2 k<i<kn < k <> K, then, since ?,nkn + ?nfcn = Sn and since ?nl + r)nl has the distribution of snN, E/ ^ - ,2 'Ink + Г]пЛ *n I \ Sn Because of the independence of the sequence G.8), the three variables t,nk, gnk, and rjnk are independent for each value of к and therefore пк) — 0 From G.7) it follows that
44 Weak Convergence in Metric Spaces The proof will therefore be complete if we show that G.9) IeUMUo *-i 1 \snn and G1O) Given ? > 0, split the expected value in G.9) into an integral over {|?n*:l < esn) an(i an integral over the complementary set. Use G.6) to bound the integrand by K\^nk/sn\3 on the first set and by K\?nkjsn\2 on the second: E\g\ — \) < I \snJ) Thus kn I It \ ч 1 kn G 11) У E{ o- i zjifc\ I ^ j^-? _i_ _g — V and G.9) follows from G.3). Since G.11) also holds with rjnk in place of |nfc, to prove G.10) we need only show that G.12) T^Ii/, >?S}^^P tends to 0 for each positive e. But G.12) is at most Since .^Ц < e2 + Sn -Г G.3) implies mo.xk^knankjsn —»- 0, which in turn implies 2*»х o3njsn3 —*- 0. Thus G.12) does tend to 0, which completes the proof. If the ink have moments of order 2 + d, then the sum in G.3) is at most ?~5^~2~5 S^x E{||nfc|2+5}, which proves Lyapounov's theorem: THEOREM 7.3 If, for some positive d, 2+S 2, Finally, from Theorem 7.2 we can deduce the Lindeberg-Levy theorem:
First Applications 45 THEOREM 7.4 If ?l5 ?2, . . . are independent and identically distributed with mean 0 and finite variance a2 > 0, k=1 To prove this last result, take kn = n and ?m- = <^; the sum in G.3) is at most Ii2/ff2 integrated over flU > ea\jn}. Characteristic Functions If P is a probability measure on (Rk, &k), its characteristic function p is defined as p(t) — I eitxP(dx), t eRk, where t • x = 2^=1 tuxu denotes inner product. Let Q be a second probability measure on (Rk, 3#k), and let ^ be its characteristic function. We shall prove the uniqueness theorem: THEOREM 7.5 If p{t) = q(t)for all t, then P = Q. For each t, eitx is, as a function of x, an element of C(Rk) (or its real and imaginary parts are). Thus in the case Rk the uniqueness theorem refines Theorem 1.3, which asserts that P is determined by the values of I f dP for fe C(S). We shall prove uniqueness not by deriving an explicit inversion formula but by using the Weierstrass approximation theorem. We know from Section 3 that the rectangles (a, b] form a determining class; since such a rectangle is an increasing union of closed rectangles, it suffices to check that P and Q agree for sets A = [аъ Ъг] X • • • X [ak, bk]. By the argument for Theorem 1.3, this will follow if, for each integral u, fdP = (fdQ G.13) [fdP = [ holds when / has the form /(xl5 . . . , xk) =/x(xx) • • -fk(xk) with f}(s) = <pu(p(s, [aj} bj])), where p denotes linear distance and <pu is defined by A.4). Fix u. Given в, choose r so large that each/,- vanishes outside the interval [—r, r] and, at the same time, so large that, if /r denotes the cube {x:\xj < r, m = 1, . . . , k}, then P(Irc) < e and ?>(//) < e. Since fj(—r) =fj{r), fj{s) can, by Weierstrass's theorem,! be uniformly t For example, see Titchmarsh A939) or, for Stone's generalization, Simmons A963).
46 Weak Convergence in Metric Spaces approximated in [ —r, r] by a finite trigonometric sum И1а.1еШз/г of period 2r. Multiplying together these sums for the different values of j, we see that/can be uniformly approximated in Ir by a finite trigonometric sum G.14) g(aO = Sav*"'"- of period Ir in each variable. Choose G.14) so that |/(x) — g(x)\ < e for x in Ir. Since/is everywhere bounded by 1, g is bounded by 1 + ? in Ir and hence, by periodicity, in all of Rk. Thus [/— g\ is bounded by ? in /r and by 2 + ? everywhere. Assuming 0 < в < 1, we have I/- g\ dP < ? + B + e)P(Ir°) < 4?. J Similarly, /|/— g| dQ < 4e, so that fdP - jgdP- jgdQ 8?. But $ g dP — § g dQ is an immediate consequence of /? = q and G.14); since ? was arbitrary, G.13) follows. Suppose now that Pn and P are probability measures on (Rk, 0tk~) with characteristic functions pn and p. We shall prove the continuity theorem: THEOREM 7.6 A necessary and sufficient condition for Pn^>P is that />n@ -*p(t)for each t. The necessity follows from the fact that eitx is bounded and continuous in x for each t. We shall prove the sufficiency via two intermediate propositions. If the pn{t) converge pointwise to some limit and if {Pn} is tight, then Pn => P for some P. Let g(t) — limn pn(t); we make no assumptions whatever about g. If {Pn} is tight, then, by Prohorov's theorem, it is relatively compact, so that each subsequence {Pn'} contains some further subsequence {Pn-} converging weakly to some limit whose characteristic function must then be linvPn"(t) ~ g@- By the uniqueness theorem, there is thus only one possible such limit, which is the P we seek (see Theorem 2.3). Notice that this proposition fails without the assumption of tightness [example: к — 1 and Pn uniform over (—n, и)]. Notice also that the proof just given exactly parallels the argument at the beginning of Section 6 involving tightness and finite-dimensional distributions for probability measures on (С, ЯГ). Now let us show that, if the limit function g{f) =* \\mnpn(t) is continuous at t = 0, then {Pn} is tight (and hence weakly convergent to some limit). Clearly, {Pn} is tight if each of the к corresponding sequences of one- dimensional marginal distributions is tight. The characteristic functions pn(s, 0, . . . , 0) of the marginal distributions for the first coordinate converge pointwise to a limit g(s, 0, . . . , 0) continuous at s — 0, and
First Applications 47 similarly for the other coordinates. Thus each sequence of marginal distribu- distributions satisfies the hypotheses and it suffices to treat the case к = 1. In this case, by Fubini's theorem, J L J-u G.15) U J-u sin ux\ > 2 f (l - -M Рп(*п) > Pn{x: \x\ > -) (in particular, the first integral is real). Since g is continuous at the origin, there is, for each positive e, а и such that urx J"u |1 — g{t)\ dt < e. By the bounded convergence theorem, for n beyond some n0 we have (• Ju \l -pn(t)\ dt < 2e and hence, by G.15), Pn{x:\x\ > a} < 2e with a = 2/w. By increasing a if necessary, we can ensure that this inequality holds for the finitely many n preceding n0: {Pn} is indeed tight. If g is the characteristic function of a probability measure, it is auto- automatically continuous; the continuity theorem follows. It is instructive to compare the following pairs of statements, of which all are true except 6°. Here Pn and P are probability measures on (Rk, ?%k) or on (C, *?), as appropriate; in the former case,/?n and/? are their characteristic functions andg is a function on Rk; in the latter case, Рптт^...и and Ртт^..и are their finite-dimensional distributions and f*J*..t. are probability measures on (R\ 0b\ Space Rk Space С 1. The function p(t) determines P. 1°. The measures Pn^1 t, determine P. 2. If pn(t) converges to some g{t) for 2°. If iVr converges weakly to some each t and if {Pn} is tight, then Pn fxf u for each (tx- • • t{) and if {Pn} converges weakly to some P. is tight, then Pn converges weakly to some P. 3. Statement 2 fails without tightness. 3°. Statement 2° fails without tightness. 4. If pn(t)converges to some^-(r) for each t and if g is continuous at 0, then {Pn} is tight (and hence, by 2, weakly convergent). 5. If pn(t) ^p{t) for each / and if {Pn} 5°. If Рпщ*..и => Рщ*..и for each (rx • • • tt) is tight, then Pn => P. and if {Pn} is tight, then Pn => P. 6. Upn(t)-»p(t) for each /, then Pn^ P. 6°. [If Pnn^..u=> Ртг'Ц for each (/x •¦•/0, then jPn => P.]
48 Weak Convergence in Metric Spaces Statement 1 is the uniqueness theorem for characteristic functions; 1° restates the fact that the finite-dimensional sets in С form a determining class. The proofs of 1 and 1° have nothing particular in common. Statements 2 and 2° follow, respectively, from 1 and 1° by exactly parallel arguments. Counterexamples substantiate 3 and 3°. Statement 4 has no analogue 4°. Now 5 and 5° follow, respectively, from 2 and 2°, again by exactly parallel arguments; but the assumption of tightness in 5 is superfluous because of 4 and the fact that/? must be continuous. Suppressing the tightness assumption leads from 5 to 6 and from 5° to 6°; although 6 is true (because of 4 and the continuity of/?), 6° is false (because there is no 4°). Not only is 6° false as it stands (as an implication applying to all P and all {Pn}), but there is no single P for which it is true (as an implication applying to all {Pn})\ Define hn:C^ С by hn(x) - x + xn with xn defined by C.5) and, given P, put Pn — Ph~x. For some sufficiently small d, the open set A = {x:x(t) < x@) + \, t <d} satisfies P(A) > 0. If An = A - xn, then lim supn An is empty and hence lim infn Pn(A) < lim supn Pn(A) = lim supn P(An) = 0. By Theorem 2.1, Pn cannot converge weakly to P—but the finite-dimensional distributions do converge. The argument just given also shows that there can be no 4°. In this connec- connection, notice that 3° follows from the fact that 6° is false; since 6 is true, 3 requires a special counterexample (the limit function g must be discontinuous atO). If there were a 4°, most of Chapter 2 would be superfluous (and if there were a 4° for the space D, most of Chapter 3 would be superfluous). About 6°: Since all the finite-dimensional distributions of P and all the Pn determine P and all the Pn themselves, conditions for Pn=>P can in principle be given in terms of finite-dimensional distributions alone.f But we must in effect deal with all sets (tx, . . . , t{) simultaneously asn^- oo—it will not do to fix an arbitrary (tlt . . . , tt) and consider what happens as n —*- oo. And this is where tightness comes in. The Cramer-Wold Device By means of the following simple device due to Cramer and Wold, problems involving random vectors in Rk can often be reduced to problems involving only ordinary random variables in R1. Suppose that /c-dimensional random f See Bartozynski A961).
First Applications 49 vectors Xn - (Xnl, ..., Xnk) andX- (Xx, . . . , Xk) satisfy J=l 5=1 for each point t — (tu . . . , tk) of Rk. Then the characteristic functions pn(s) =¦ E{exp (is 2*=1 tjXnj)} of these one-dimensional random variables converge to p(s) = E{exp (is S*=1 tjXj)} for each real s. Taking s = 1, we see that Since t was arbitrary, Хп ^- X follows by the continuity theorem for characteristic functions. THEOREM 7.7 In Rk, Xn converges in distribution to X if and only if each linear combination of the components of Xn converges in distribution to the corresponding linear combination of the components of X. In the terminology of Section 2, the half-spaces in Rk form a convergence- determining class, a fact which apparently cannot be proved without ideas from Fourier analysis. Local and Integral Limit Theorems If Pn and P are probability measures on Rk with densities pn and p with respect to Lebesgue measure and if G.16) PnW^Pi?) for all x outside some set of Lebesgue measure 0, then, by Scheffe's theorem (p. 224), G.17) Pn^P. The relation G.16), called a local limit theorem, implies the relation G.17), called an integral limit theorem. There is an analogous result in case P has a density but the mass for Pn is confined to a lattice in Rk. Let d(n) =; («^(и), . . . , дк(п)) be a point of Rk with positive coordinates, let <x(n) — (ai(«), • . • , <*k(ri)) be an arbitrary point in Rk, and denote by Ln the lattice consisting of all those points of Rk having the form where щ, . . . , uk range independently over 0, ±1, ±2, . ... If ж is a point in Ln, then the interval (x — d(ri), x] (see C.1) for the notation) is a cell of volume vn = d^n) • • ¦ 6k(n), and Rk is the disjoint union of these countably many cells.
50 Weak Convergence in Metric Spaces Suppose now that Pn is a probability measure in Rk with Pn(Ln) — 1, and, for x g Ln, \etpn(x) be the mass (possibly 0) at that point. Let P have density p with respect to Lebesgue measure. THEOREM 7.8 Suppose that G.18) max {^(n), . . . , <5t(n)} - 0. Suppose further that, if {xn} is any sequence and x any point in Rk, and if xn lies in Ln and varies with n in such a way that G.19) xn^x, then G.20) ^^ — p(x). Then Pn => P. Proof. Define a probability density qn on Rk by setting qn{y) — pn(x)lvn if у lies in the cell (x — d(n), x] determined by the point x of the lattice Ln. Since G.19) implies G.20), we have G.21) qn(y) for each y. Let Xn have density qn, and define Yn on the same probability space by setting Yn = x if Xn e (x — d(n), x] with x e Ln. What we are to prove is Yn®+P. Since \Xn - Гп| < \д(п)\, this will follow by G.18) and Theorem 4.1 if we prove Xn ^ P. But, in view of G.21), this is a consequence of Scheffe's theorem. Weak Convergence on the Circle and Torus Let S be the unit circle in the complex plane and let eu(x) = xu be the circular functions, и = 0, ±1, ±2, .... The Fourier series of a measure P on S is defined by p(u) — J eu(x) P(dx) for integral u. Since P is determined by the values of I fdP for/in C(S) and since each/in C(S) can, by Weierstrass's theorem, be uniformly approximated by linear combinations of the circular functions, p determines P. By the same reasoning as for the line, this unique- uniqueness theorem implies a continuity theorem: Pn => P if and only if the corre- corresponding Fourier series satisfy pn(u) —*¦ p(u) for each integer u. Since the circle is compact, there is this time no question that {Pn} is tight; hence {Pn} converges weakly to some limit if \imn pn(u) exists for each u. If P is normalized circular Lebesgue measure, then p(u) vanishes for и Ф 0 (and, of course, is 1 for и = 0). Weyl's criterion follows immediately: (xl5 x2, . . .) is uniformly distributed on the circle if and only if n 2"=1(а;3-)и —»- 0 for each nonzero u. In particular the powers x} are uniformly distributed if x is not a root of unity.
First Applications 51 The circle can be rolled out onto the interval [0, 1) by means of the correspondence e2lTix <-> x. The Fourier series of a probability measure P on [0, 1) is denned by p(u) = J e2iriux P{dx) for integral u; of course, the unique- uniqueness and continuity theorems for the circle can be restated for the interval [0, 1). Although weak convergence here must refer to the topology that [0,1) inherits from the circle and not to the relative topology of the line (for example, 1 — 1/rc -+ 0), we can clearly ignore this distinction if the limit measure assigns measure 0 to the point 0. Thus, if Pn, P are probability measures on R1 with support [0, 1) and if P{0} = 0, then Pn^>P if and only if L2:Tiuxpn(dx) -* L for each integer u. A sequence (х1г x2, . . . ) of real numbers is said to be uniformly distributed modulo 1 if the empirical distributions of the sequence ({хх}, {x2}, . . . ) of fractional parts converge weakly to the uniform distribu- distribution on the unit interval. Weyl's criterion becomes: (x1, x2, . . . ) is uniformly distributed modulo 1 if and only if n~x SJLi e2iriuXi —> 0 for each nonzero integer u. This criterion is satisfied if xt = j!-, where ? is irrational. By the ?>dimensional Weierstrass theorem, similar results hold for the A:-dimensional torus—the product of к circles. (They even carry over to the general compact group, with the characters in the role of the circular func- functions.) Taking the case к = 2 and laying the torus out into a square, we see that, if Pn and P are probability measures in R2 supported by the square {(x, y):0 < x, у < 1}, and if the lower and left-hand edges of the square have P-measure 0, then Pn => P if and only if /• 2rri(ux+vy) jp __. „2iri{ux+vv) for all pairs of integers и and v. We say a sequence ((ж1} уг), (х2,у2),...) in the plane is uniformly distributed modulo 1 if the empirical distributions, obtained by reducing all coordinates modulo 1, converge weakly to the uniform distribution on the square; this holds if and only if rr1 2™=1 e27ri{uxi+vv')-^ 0 whenever и and v do not both vanish. This condition is satisfied if (x^, y}) = (j?,jrj), where ?, rj, and 1 are linearly independent (there do not exist integers и and v, not both 0, such that u? + vr\ is integral). To take another case, suppose ? and r\ satisfy exactly one constraint: ug + vr\ is integral if and only if и = 2w and v = 3w for some integer w. Then 1 if — = - is integral, 2 3 0 otherwise.
52 Weak Convergence in Metric Spaces By checking the Fourier coefficients, one sees that the empirical distributions converge weakly to a distribu- distribution uniform on the four slanting lines in the diagram. The Cramer-Wold result has an obvious analogue here. For example, the planar sequence ((ж1} у-,), (х2,У-д-, • • •) is uniformly distributed modulo 1 if and only if for all integral и and v, not both 0, the linear sequence (ихг + vyx, ux% + vy2, . . .) is uniformly distributed modulo 1. Remarks. Theorem 7.2 is here proved by Levy's version of Lindeberg's method; see Levy A925, pp. 246-249). See Feller A966) for other approaches to the central limit theorem and for characteristic functions. There exists for linear spaces a theory of characteristic functions which ties in closely with weak convergence; see Prohorov A960) and Gross A963) and the references there. See Cramer and Wold A936) for their device. See Hardy and Wright A960) for uniform distribution modulo 1, in particular for the last example here. PROBLEMS .1. Replace the integrand in G.1) by a B&)th-degree polynomial so chosen that cp has bounded, continuous derivatives up to order к (к = 3 suffices for the proof of Lindeberg's theorem). 2. Use the Cramer-Wold device to derive a ^-dimensional version of the Lindeberg- Levy theorem. 3. A measurable function h{x) on the line has a mean value if the limit i Г M{h) = lim — h(x) dx y_oo IT J_T exists; it has a distribution if the probability measures (here X is Lebesgue measure) PT(A) = — X{x: \x\ < T, h(x) EA}, A converge weakly as T-* со. If h is almost periodic, then it is bounded and, for each t, eltx is almost periodic (see Bohr A932)); from the first proposition used in deriving the continuity theorem for characteristic functions, show that h has a distribution. 4. Consider on the line probability measures having moments of all orders. The measure P is said to be determined by its moments if ]xu P(dx) = $xu Q(dx), и = 1, 2,. . . implies P = Q. (See Feller A966, pp. 224 and 487) for an example of a P not determined by its moments and for a criterion for P to be determined by its moments.) If supn \$xu Pn(dx)\ < oo for each u, and if Pn => P, then the moments of Pn converge to those of P [Theorem 5.4]. If the moments of Pn converge to those of P and if P is determined by its moments, then Pn => P. [First show {Pn} is tight (Problem 5 in Section 6) and then imitate the proof given above for the continuity theorem for characteristic functions.] Use the Cramer-Wold technique to extend this result to Rk [consider product moments of all orders].
First Applications 53 5. If A: varies with n in such a way that (k — np)\^jnpq -> x, then pkqn-knpq-+-i=e-b*2 (see Feller A957, p. 169)). Use Theorem 7.8 to deduce the De Moivre-Laplace theorem. Apply the same technique to the hypergeometric distribution (see Problem 2 on p. 17 and Problem 10 on p. 180 of Feller A957)) and to the frequency count for a set of multinomial trials (leave out the count for one cell, so that the limiting distribution will have a density). 6. Use Problem 8 in Section 2 to prove that, if probability measures in Rh converge weakly, then the corresponding characteristic functions converge uniformly on bounded sets. 7. Generalize Theorem 7.8: Replace Lebesgue measure on Rk by a general measure on a separable S and consider decompositions {Cni} of S with Итта тах^ diam Cni — 0.
CHAPTER 2 The Space С 8. WEAK CONVERGENCE AND TIGHTNESS IN С The Introduction explains some of the merits of proving weak convergence in the space С = C[0, 1] of continuous functions on [0, 1], where we give С the uniform topology by defining the distance between two points x and у (two functions x and у of t e [0, 1]) as P(x, y) = sup, КО - y(t)\. Weak Convergence Although weak convergence in С does not in general follow from weak convergence of the finite-dimensional distributions, we saw at the beginning of Section 6 that it does in the presence of relative compactness. Since С is separable and complete (see p. 220) it follows by Prohorov's theorem (Theorems 6.1 and 6.2) that relative compactness of a family of probability measures on (C, ^) is equivalent to tightness of the family. Thus we have the following result. THEOREM 8.1 Let Pn, P be probability measures on (C, %). If the finite- dimensional distributions of Pn converge weakly to those of P, and if {Pn} is tight, then Pn => P. If we are to use this theorem to prove weak convergence in C, we must see just what tightness here means. Tightness The modulus of continuity of an element a; of С is defined by (8.1) wx(d) = w(x, d) = sup \x(s) - x(t)\, 0 < д < 1. |s-<|<<5 54
Weak Convergence and Tightness in С 55 Let {Pn} be a sequence of probability measures on (С, <&). THEOREM 8.2 The sequence {Pn} is tight if and only if these two conditions hold: (i) For each positive rj, there exists an a such that (8.2) Рп{х:\х(Щ>а}<г1, n>\. (ii) For each positive s and rj, there exist a d, with 0 < д < 1, and an integer щ such that (8.3) Pn{x-wM>e}<n, n>n0. Condition (i) stipulates that {Рптт^г} be tight. In connection with (8.3), note that wx(d) is continuous in x and hence measurable. Proof. Suppose {Pn} is tight. Given e and r\, choose a compact set К such that Pn{K) > 1 — rj for all n. By the Arzela-Ascoli theorem (p. 221), we have К <^ {ж:|ж@)| < a) for large enough a and К <^ {x: wx(d) < e} for small enough d, so that (i) and (ii) follow (with «0 - 1 in (ii)). This proves the necessity of (i) and (ii). Since a single probability measure P on (C, ^) is tight (Theorem 1.4), it follows by the necessity of (ii) that for each e and rj there is а д such that P{x:wx(d) > e} < rj. If {Pn} satisfies condition (ii), therefore, we may ensure that the inequality in (8.3) holds for the finitely many n preceding n0 by decreasing д if necessary. Thus we may assume in proving sufficiency that the n0 in (8.3) is always 1. Assume that {Pn} satisfies (i) and (ii), with щ always 1 in (8.3). Given rj, choose a so that, if A = {x:\x@)\<a}, then Pn(A) > 1 — \rj for all n, and choose dk so that, if then Pn(Ak) > 1 - 77/2*+! for all n. If К is the closure of А П f\°lA, then Pn(K) > 1 — 7] for all n and, by the Arzela-Ascoli theorem, К is compact. Hence {Pn} is tight. As the proof shows, we may demand that (8.3) hold for all Pn. With this change, the theorem is true if {Pn} is replaced by an arbitrary family П. In applications, we shall often prove (8.3) with d, s, and rj replaced by (say) \b, 3s, and 9r/, which is just as good. Theorem 8.2 transforms the concept of tightness in С simply by substituting for compactness its Arzela-Ascoli characterization. Our next theorem goes only a small step beyond this, but, even so, fills our present needs.
56 The Space С THEOREM 8.3 The sequence {Pn} is tight if these two conditions are satisfied: (i) For each positive rj, there exists an a such that (8.4) Рп{хЛхШ> a] <rj, n>\. (ii) For each positive e and rj, there exist a d, with 0 < d < 1, and an integer щ such that (8.5) \PnU- sup \x(s)-x(t)\>e)<r), n > n0, О [ t<s<t+d ) for all t. Of course, we restrict the t in (8.5) to 0 < t < 1; if t > 1 — d, we restrict s in the supremum to t < s < 1. Note that (8.5) is formally satisfied if д > Ifrj; but we require д < 1. Proof Fix д and let At = \x: sup \x(s) — x(t)\ > el. { t<s<t+d ) Now 5 and t each lie in intervals of the form [id, (i + 1K]. If \s — t\ < d, then these intervals either coincide or abut; it follows that (8-6) Pn{x:wx(d)>3e}<Pnl U Aid). \i<d I Since (8-7) Pj U Ai5) < \i<6 г J i<5 (8.5) implies Pn{x:wx(d) > 3e} < A + [l/d])drj < 2r\ (recall that 5 < l).f Therefore condition (ii) here implies the corresponding condition in Theorem 8.2. Since condition (i) is unchanged, the result follows. The following corollary contains the essential point of the proof. COROLLARY IfQ=^tQ<t1<---<tr=l,and if (8.8) и-и_г>д, 2<i<r-l, then (8.9) P{x:wxF)>3e}<^p[x: sup \x(s) - a;(^| > e). i=i i ц-.х<8<и ) Note that the inequality in (8.8) need not hold for i = 1 and / = r. t We use [a] to denote the integer part of a—the largest integer not larger than a.
Weak Convergence and Tightness in С 57 Although the inequality (8.6) entails no real loss, (8.7) does, and condition (ii) is not necessary.f Suppose, however, that the sets Ai& {b fixed, i varying) are independent under Pn and all have the same measure p, which will be at least approximately true for many of the measures Pn encountered in this chapter. Then If this quantity is bounded above by r\, then (if r\ < ^) (8.10) \ < (l + \-~\)p < -log(l - 17) < 2i?, which is a restriction of the same order as (8.5). Therefore any effective tightness criterion that goes beyond Theorem 8.3 must make some sort of positive use of dependence among the Aid. Random Functions Let X be a mapping from (П, ^, P) into С For the moment we do not assume X to be measurable {X~xc€ <= ^). For each со in П, X(co) is an element of С—а. continuous function on [0, 1] whose value at t we denote by X(t, со). For fixed t, let X{t) denote the real function on Q. with value X(t, со) at со; X{t) is the composition -ntX. Similarly, let (X(t^), . . . , X(tk)) denote the mapping from D. into Rk with value (X(tx, со), . . . , X(tk, со)) Sit CO. If A = {xe C:x(t) < a}, then Ae<€ and {co:X(t, со) < «.} - Х~гА. It follows that if X is a random element of С—that is, if X~xc€ <= ?fi—then each X{t) is a random variable {X{t)-x3P- <= ?g) and hence each (X{t^),. . . , X(tk)) is a random vector. Suppose, on the other hand, that each X(t) is a random variable. If В is the closed sphere in С with center x and radius d, then Х~гВ = {co:X(co) EB} = C\r {co:x(r) - д ^ X(r, со) < x(r) + 6}, where the intersection extends over the rationale, so that Х~хВе&. Since С is separable, X~xc€ с 38 follows: X is a random element of C. Thus X is a random function (a random element of C) if and only if each X(t) is a random variable. (We have merely proved once more that the finite-dimen- finite-dimensional sets generate ^.) Suppose now that {Xn} is a sequence of random functions. The sequence is by definition tight when the sequence of corresponding distributions is tight. According to Theorem 8.2, {Xn} is tight if and only if the sequence B^@)} is tight (on the line) and for each positive e and r\ there exist a d, 0 < б < 1, t See Problem 1.
58 The Space С and an integer щ such that (8.11) P{HXn, $)>?}<У, n> n0. (We use w(x, d) as an alternate notation for wx(d).) This condition stipulates that the random functions Xn do not oscillate too violently. Theorem 8.3 can be recast in the same way: {Xn} is tight if {Xn@)} is tight and if for each positive e and r\ there exist a d, 0 < д < 1, and an integer n0 such that (8-12) - P( sup \Xn(s) - Xn(t)\ > e) < V О [t<s<t+d J for n>n0 and 0 < t < 1 (with 5 in the supremum restricted to t < s < 1 in case 1 — д < t < 1). As explained in the Introduction, our first interest will be in random functions constructed in the following way. Let ?l5 |2> • • • ¦> be random variables on some probability space (Cl, 28, P). For the present, the tjn need not have any special properties such as stationarity and independence. We define Sn — ?i + • • • + ?n, with So =* 0, and construct Xn from the partial sums So, Slf . . . , Sn. For points ijn in [0, 1], we set (8.13) Хп(-,сЛ =J- (We confine our attention to norming factors of the form a^jn—others are possible.) For the remaining points t of [0, 1], we define Xn(t, со) by linear interpolation: If t e [(/ — l)/n, i/n], then xti l/и \ n / 1/и \n \ n J a^Jn Since 1 — 1 = [nt] if ? e [(/ — l)/n, i[n), we may define the function more concisely by (8.15) Xn(t, со) = -±- Slntl(co) + (nt - [nt]) -?- ?[n<]+1(co). a^Jn a^Jn Since the ?i} and hence the iSi5 are random variables, it follows by (8.15) that Xn(t) is a random variable for each t. Therefore the Xn are random functions. When do these random functions form a tight sequence? Since 2^@) = 0, certainly {Xn@)} is tight. By using the definition (8.15) we can translate (8.12) into a restriction on the fluctuations of the partial sums S^ If t = kfn
Weak Convergence and Tightness in С 59 and t + d — j\n, with к and у integers, then (8.12) reduces to (8.16) - Pfmax -*-= \Sh+i - Sk\ >e)<rj. d {i<nd Oy/n ) Although (8.12) and (8.16) in general differ if t and d are not integral multiples of l/n, the discrepancy is, as we shall show, irrelevant to our purposes. Consider the integers у and к defined by the inequalities n n n In Because of the polygonal character of Xn, we have sup \Xn(s) - Xn(t)\ < 2 max -^- \Sk+i - Sk\. t<s<t+ii J If n > 4/E, then j — к < пд, so that the maximum on the right does not decrease if the restriction i <j — к is relaxed to i < nd. Therefore, if (8.16) holds for all к and all n > n0, then (8.12) holds for all t and all n >_ max {щ, 4/E}, provided the e, rj, and д in (8.12) are replaced by 2e, 2r\, and \b, respectively, which amounts only to a renaming of these quantities. Thus {Xn} is tight if, for each positive e and r\, there exist a d, with 0 < д < 1, and an integer n0, such that (8.16) holds for all fc and for all n > n0. The maximum in (8.16), which extends over / < nd, becomes easier to work with if nd is replaced by an integer. If nd is an integer m—it need not be—then the inequality (8.16) becomes P max \Sh+i - Sk\ > -^ ojm) < rjd. \i<m ^Jd ) Put X = sf-Jd (if d is small, X is large); the inequality then further reduces to Pfmax |Sfc+i - Sfc| > \ i < m Since rjs2 is positive if r\ and e are, we are led to formulate the following theorem. THEOREM 8.4 Suppose {Xn} is defined by (8.15). The sequence {Xn} is tight if for each positive s there exist a X, with X > 1, and an integer n0 such that, ifn> n0, then (8.17) pjmax \Sk+i - Sk\ > Xojn) < ~ [i<n j A holds for all к.
60 The Space С We require X > 1 ((8.17) is trivial if X < V e), which corresponds to the requirement д < 1 in Theorem 8.3. In concrete cases, proving (8.17) forces large X's on us. Proof. Given e and rj, we shall produce а д @ < д < 1) and an n0 for which (8.16) holds for all к if n > n0. Since (8.16) becomes more stringent as e and ?7 decrease, we may assume 0 < s, rj < 1. By the hypothesis, with ^e2 in place of e, there exist A (A > 1) and nx such that Pjmax \Sk+i - Sk\ > Xojn] < ^ [ i<n ) X (8.18) for n > nx and к > 1. Put 5 = e2/A2; since X > 1 > e, we have 0 < 5 < 1. Let щ be an integer exceeding n^d. If я > n0, then [л<5] > и1} and it follows by (8.18) that p( max \Sk+i -Sk\ > Since X\J [nd] < e\Jn and ^e2/P = ?)d, (8.16) holds for all к if и > и0. This proves Theorem 8.4. It is not hard to see that it is enough that (8.17) hold for к < nX2/s, but we shall not need this. If {?n} is stationary, then (8.17) reduces to (8.19) Pfmax \St\ > Xojn) < ^ . \i<n ) X We can absorb a into X and require (8.20) P(max |S,| > Xjn) < ± \i<n j X with X > a. We shall see later that in some circumstances the hypothesis of Theorem 8.4 is necessary as well as sufficient. Coordinate Variables The projection ттг, with value 7Tt(x) = x(t) at x e C, is a random variable on (C, ^). We shall often denote this random variable by xt: For fixed t, xt is the function on С with value x{t) at x. If there is a probability measure P on (C,^), {?*:()< *< 1} is a stochastic process. We think of us a time parameter, and the xt are commonly called the coordinate variables or functions. The distribution of xt depends on the measure P, and we shall speak of the distribution of xt under P; we shall often write P{xt e H) in
The Existence of Wiener Measure 61 place of P{x:xt e H) and J xt dP in place of J zt P(dx). Finally, when / is a complicated expression, we shall often revert back to x(t), still intended as a coordinate function. Remarks. The general theory of weak convergence in С (Theorem 8.2, for example) is due to Prohorov A956). Theorem 8.4, a convenient reformulation of Theorem 8.3, resulted from discussions with F. Topsoe. Stone A963) treats weak convergence in a space of continuous functions on [0, oo). Lamperti A962b) discusses spaces of functions satisfying a Holder condition. PROBLEMS 1. Let X{t) = f?, where ? is a random variable with P{|?| > a} ~ a~* as a -»- oo, and, for every n, let Pn be the distribution of X. Then {Р„} is tight but does not satisfy condition (ii) of Theorem 8.3. 9. THE EXISTENCE OF WIENER MEASURE Wiener Measure Wiener measure, denoted here by W, is a probability measure on (C, ^) having the following two properties. First, for each t, the random variable xt is normally distributed under Ж with mean 0 and variance t: W{xt < a} = —L= Г e-/2t du. y/27Tt J-™ (9.1) If t = 0, this is interpreted to mean W{x0 = 0} = 1. Second, the stochastic process {xt:0 <; t < 1} has independent increments under W: If then the random variables (9.3) xti — xto, xt^ — xt , . . . , xt — xtk are independent under W. In this section, we prove there does exist such a measure W. If W has these two properties and if s <, t, then xt (normal with mean 0 and variance t) is the sum of the independent variables xs (normal with mean 0 and variance s) and xt — xs, so that xt — xs must be normal with mean 0 and variance t — s, as may be seen by dividing the characteristic functions. Thus, when (9.3) holds, (9.4) W{xti - xti_x < a,, i = 1, . . . , k} 1 /*°ч , .. '-1' du. — h
62 The Space С In particular, the increments are stationary (the distribution of xt — xs under W depends only on the difference t — s) as well as independent. If we regard x(t) as (one coordinate of) the position at time t of a moving particle, then x itself gives the history of the particle's motion (relative to this specific coordinate) from time t = 0 to time t = 1. Wiener measure gives to these paths x a distribution appropriate for the description of Brownian motion—the motion of a pollen grain suspended in water. In proving the existence of W, we face the problem of proving the existence on (C, ^) of a probability measure with specified finite-dimensional distribu- distributions. There can for an arbitrary specification be at most one such measure, and for some specifications there is none at all (there is, for example, no P under which the distribution of xt is a unit mass at 0 for t < \ and at 1 for '>*)• THEOREM 9.1 There exists on (C,^) a probability measure W such that (9.1) holds and such that the random variables (9.3) are independent under W whenever (9.2) holds. Proof. Let |l5 |2, ... be independent and normally distributed (on some (Q, &, P)) with mean 0 and variance 1. Let Xn be the random function defined by (8.15) with a = 1: (9.5) Xn(t, o>) = ^- Slntl(oj) + (nt - [nt]) -j= | Let Pn of the distribution of Xn on C. This measure is well defined because> as we showed in the preceding section, Xn is measurable {Х~х(ё с We shall first show that the finite-dimensional distributions of the Pn converge weakly to what we want the finite-dimensional distributions of W to be. After that, we shall prove that the sequence {Pn} is tight. It is clear that the limit of any weakly convergent subsequence {Pn'} will satisfy the requirements placed on W. The idea is that the Pn approximate the putative measure W. The finite-dimensional distribution -Pn^1..^ is just the distribution of the random vector (Xn(tx), . . . , Xn(tk)y Consider a single time point t. By (9.5) and the assumed normality of the |n, Xn(t) is normal with mean 0 and variance [nt] (nt - [nt])* . Г 5 n n this variance differs from t. by at most 2/n. Thus Xn(t) ®> N@, t) (see D.9) for this notation). Clearly, we can treat two or more time points in the same way; the finite- dimensional distributions therefore converge weakly to those prescribed for W.
The Existence of Wiener Measure 63 To prove that {Pn} is tight, we apply Theorem 8.4. To prove (8.20), write (9.6) Et = (max |S,| < IX^n < \St\\. { o<i J By the stationarity and independence of the |n, we have (9.7) Pjmax \St\ > 2Ljn) < P{\Sn\ > Xjn} [< J г=1 г=1 п—\ < P{|SJ > Цп} + 2 Р(?,)Р{|5П_,| ^ Xjn - i] г=1 Since SjVj is normally distributed with mean 0 and variance 1 and hence has a finite third absolute moment a independent of у (а — 2v2/tt), it follows by (9.7) and Chebyshev's inequality that P(max \St\ > 2XJn\ < + f W) \ i<n ) A i=\ A Since the Et are disjoint, we can conclude (9.8) Pf max \St\ ^ 2Xjn\ < \ , n = 1, 2, .... { г<п ) А Given ?, choose X so that 2a/A < e and Я > 1. Then (9.8) implies Pfmax \St\ which is (8.20), except for the irrelevant factor 2 on the left. By Theorem 8.4, therefore, the {Pn} are tight, which completes the proof. Thus Wiener measure exists. In the sections following, we shall derive some facts about W. For the present, let us only show that the elements of С are of unbounded variation, except for a set of Wiener measure 0. This is interesting because it exhibits a sense in which Wiener paths (elements of С chosen according to the probability measure W) are highly irregular—they are continuous, but only just. An element x of С is of bounded variation if and only if there exists a number Mx such that г=1
64 The Space С for all t0, tlt . . . , tk with 0 < t0 < tx < • • ¦ < tk < 1. Therefore it will suffice to show that, if 'i - Г /.(*) = I — x\ 1=1 then/n(z)-> oo except for x in a set of Ж-measure O. Now |JV| (see D.10)) has positive mean b and variance с (b = V2/tt and с = 1 — 2/тг). Since the |z(/2-n) - x((i — lJ~n)| are, under W, independent with mean 2~n/2Z> and variance 2~nc, the mean and variance of fjx) are 2nl2b and с It follows by Chebyshev's inequality that W{x :fn(x) > a} -»¦ 1 for each a. Since fn+i(x) >fn(x)> it follows further that fn(x) -> oo except on a set of Wiener measure 0. We shall also use the symbol W to denote a random element with values in С and with Wiener measure as its distribution; that is, Ж is a measurable mapping from some (Q, 38, P) to С with the property that P{co: W(co) gA}= W(A), AeV, where the W on the right is Wiener measure as already defined. That there exists such a random element follows by the argument centering on D.4). This dual use of W should cause no confusion. We shall denote by Wt(co) or W(t, со) the value at t of the random function W{oo). Thus {Wt:0 <, t < 1} is a stochastic process whose paths are continuous for all со. This process we shall call the Wiener process or the Brownian motion process. The relation Xn ^> W can be interpreted in accordance with D.5) or in accordance with D.7), depending on whether W is construed as a random function or as a measure—the meaning is the same for the two interpretations. The Brownian Bridge A random element X of С is Gaussian if all its finite-dimensional distributions are normal. The distribution in С of a Gaussian random element is com- completely specified by the means E{X(t)} and product moments ?{X(s)X(t)} 0 <i s, t <; 1, because these determine the finite-dimensional distributions. For W, the moments are (9.9) ?{Wt} = 0 and (9.10) E{WsWt} = s if s <, t. In studying the behavior of empirical distribution functions, we shall need a Gaussian random element W° of С whose distribution is specified by the
The Existence of Wiener Measure 65 requirements (9.П) E{Wt°} = 0 and (9.12) E{Ws°Wt°} = s(l - 0 if s < t. Although we could prove the existence of such a W° by the methods of Theorem 9.1, it is simpler to construct W° from W by setting (9.13) W° =Wt-tWx, 0<t< 1. Certainly W°, thus defined, is a Gaussian random element of C, and (9.11) and (9.12) follow from (9.9) and (9.10). The random element W° is called the Brownian bridge (or tied-down Brownian motion). Note that W° — W° =^ 0 with probability 1. From (9.11) and (9.12) it follows that (9.14) E{(Wt° - W:f) = (t - 5)A - (t - 5)) if s<t and that (9.15) = ~(S2 - Si)(*2 - tx) if 5X < S2 < tr < t2. We shall also use W° to denote the distribution on С of the random element W°. If /г:С-> С takes the function x to the function with value x(t) — tx(l) at t, then the measures W° and Ж are related by W° = Whrx. Separable Stochastic Processes^ Let us connect our construction of Wiener measure with the notion of a separable stochastic process. According to Kolmogorov's existence theorem (p. 230), there exists a stochastic process {|t:0 <, t < 1}, on some probability space (Q, ^, P), for which the distribution of each finite collection of the |t coincides with the corresponding finite-dimensional distribution prescribed for W. In the standard construction, (Q, 8$) is the product of a collection of copies of (R1, ?%х), one copy for each t in [0, 1], and the |t are the coordinate functions. The set Qo of со for which !f(co) is continuous in t, 0 ^ t < 1, need not satisfy P(O0) = 1; indeed, in the standard construction Qo does not even lie in ?%. It is possible, however, by another procedure, to find a separable process with the prescribed finite-dimensional distributions. The process {?t'O < t < 1} on (Q, ^, P) is separable! if & is complete relative to P and t The rest of this section may be omitted. % See Doob A953, p. 51).
66 The Space С if there exists in ^ a set E of measure 0 and in [0, 1] a countable set To such that, for each a and /? and for each open interval /, we have (9.16) {ш:а < ?t(co) < /5, / e / П To} - {co:a < |f(a>) ^ /5, / e / П Г} с ?, where we have written Tin place of [0, 1]. The left-hand set in the difference here lies in 38 because To is countable; since P(?") — 0 and 38 is complete, the right-hand set in the difference must lie in 38 also and have the same probability. If {IJ is separable and has the finite-dimensional distributions appropriate to Brownian motion, thenf the set Qo of со for which the sample path (It(co) as a function oft) is continuous lies in 38 and satisfies P(O0) = 1- If we map Qo into С by carrying со into its sample path, and if we carry P (restricted to O0) over to (C, ^f) via this mapping, we arrive at W, which gives another method of constructing Wiener measure. The problem of proving the continuity of the sample paths of separable processes and the problem of constructing measures on (C, ^f) are effectively the same. If each separable process having certain specified finite-dimensional distributions also has continuous sample paths with probability 1, then, as the above construction shows, there exists on (C, ^) a probability measure with these finite-dimensional distributions. Let us prove the converse. THEOREM 9.2 If{?t:Q <, t < 1} is a separable stochastic process, and if there exists on (C, ^) a probability measure with the same finite-dimensional distributions as {IJ, then the sample paths of the process are continuous with probability 1. Proof. In proving this result it is no restriction to assume that To contains all the rationale in the unit interval, in which case (9.16) holds if / is a closed interval with rational endpoints. We shall use the relations (9.17) (U. Ee) + (U. E'e) cz \J9 (Ев + Е'в) and (9.18) (Пе Ев) + (С\в Е'в) с Ue (Ев + Е'в), each of which is valid whatever the range of the index 6. % Let V denote the general system (9.19) V: k;r0,..., rk; ax, . . . , afc, where к is an arbitrary integer, the rt and the o^ are rational, and 0 = r0 < • • • < rk = 1. There are countably many such systems V. For V given f SeeDoob A953, p. 393). % Recall that + stands for symmetric difference.
The Existence of Wiener Measure 67 by (9.19) and e positive, define к (9.20) QTo(V, e) = П {со:a, < |t(co) ^ a, + e, t e [гИ) г,] П Го} and (9.21) 0Го = П U QTo(V, s), E V where the intersection extends over positive, rational e. Let Q.T(V, e) and пт be the same sets but with To replaced by T = [0, 1] in (9.20) and (9.21). Since UT(V, e) + QTo(V, s) <= E, (9.17) and (9.18) imply (9.22) пт + 0Го с Е. Now 0Го(К, ?) is a countable intersection of sets in & and hence itself lies in &. Therefore пТо e dS. It follows by (9.22) that пт lies in SS and P(Qr) = P(Qr ). Since Or is exactly the set of со with continuous paths, we need only prove - ¦ (9.23) ?{пт) = 1. Let (tu t2, . . .) be an enumeration of the points in To. For V given by (9.19) and ? positive, let HTo(V, e) denote the set of points z =* (zls z2, . . .) in i?00 such that, for each / — 1, . . . , k, the inequality o^ < zu < a^ + ? holds for every coordinate index м for which tu e [г^_х, rj. Then HTo(V, e) e ^°°. If the mapping 99 :Q -> i?00 is defined by q>{<o) = (|tl(o>), lt2(co), . . .), then <p-*HTo(V, e) = Qro(F, ?). If ЯГо = ПеигЯГо(К, е), then ЯГо е <%«> and (9.24) ^ЯГо = QTo. Now define xp:C^ R™ by ^0*0 = 0»(*i)> «('2). • • •)• Let p be the probability measure on (C, ^) whose finite-dimensional distributions coincide with those of {?J. If H is a finite-dimensional set in i?00, then (9.25) Р(<Р-1Я) = Р(^-ХЯ); since the finite-dimensional sets form a field generating ^f°, (9.25) holds also for every Я in ^°°. By (9.24), therefore, ?{пт) =* Р(^-хЯГо). But ^-хЯГо = С, from which (9.23) follows. Thus proving continuity for separable processes and constructing measures on С are the same activity. Remarks. See ltd and McKean A965) for other constructions of Wiener measure and for some history. For an interesting account of Wiener's early work, see the articles in "Norbert Wiener, 1894-1964," Bull. Amer. Math. Soc. 72 A966), number 1, part II.
68 The Space С PROBLEMS 1. Let ?r, with r ranging over the rationale in [0, 1], be random variables with the finite-dimensional distributions appropriate to Brownian motion. By adapting the argu- arguments in the proof of Theorem 9.1 but avoiding the tightness concept (and indeed all mention of the space C), show that {?r} is, with probability 1, uniformly continuous in r. For t irrational, define ?f = limr_t ?r, and, by the argument preceding Theorem 9.2, construct Wiener measure on С anew. 2. Show that Wiener measure has no locally compact support. 3. For 0 <; t < oo, define Vt = A + t)W°/{1+t), where W° is the Brownian bridge. The process {Vt:t > 0} has sample paths that are continuous for all со, the finite-dimensional distributions are Gaussian, and the moments are E{Ft} = 0 and E{FsFt} = min (s, t). The process thus represents a Brownian motion over the time interval [0, oo). 10. DONSKER'S THEOREM The Theorem Given random variables ?ls |2, ... defined on (Q, ?%, P), let ?„ = |x + • • • + ln be the partial sums and define a random element Xn of С by (8.15): A0.1) Xn(t, со) = -L S[nt](co) + (#i/ - И]) -L l[nt]+1(co). crv« <Л/и The following theorem on the convergence in distribution of the Xn is due to Donsker. The Introduction has examples of its use. THEOREM 10.1 Suppose the random variables |n are independent and identically distributed with mean 0 and finite, positive variance cr2: (Ю.2) E{|J = 0, E{|n2} - a2. Then the random functions Xn defined by A0.1) satisfy (Ю.З) Xn ^ W. Proof. We first show that the finite-dimensional distributions of the Xn converge to those of W. Consider first a single time point s\ we must prove (Ю.4) Xn{s) Д. Ws. Since A0.5) Xn(s) -S, 0 by Chebyshev's inequality, A0.4) will follow by Theorem 4.1 if we prove XX/ VVs-
Donskefs Theorem 69 Jut this is a direct consequence of the Lindeberg-Levy central limit theorem Theorem 7.4) and the fact that [ns]jn —»¦ s. Consider now two time points s and t with s < t. We are to prove (Xn(s),Xn(t))X(Ws,Wt), vhich will follow by Corollary 1 to Theorem 5.1 if we prove (Xn(s), Xn(t) - Xn(s)) X (Ws, Wt - Ws). because of A0.5) and the same relation with t in place of s, it is enough to )rove 10.6) (—=, S[fM], -~ (S[frt] - S[ns])) Д. (Ws, Wt - Ws). Since the components on the left are independent, by the independence of the ?n, A0.6) follows by the central limit theorem and Theorem 3.2. A set of ;hree or more, time points can be treated in the same way, and hence the inite-dimensional distributions converge properly. We-prove tightness via the following lemma, which is slightly more general than presently required. LEMMA Let |x, . . . , |m be independent random variables with mean 0 mdfinite variances of; put S^ = ?x + • • • + !г- and s? — аг2 + • • • + о*- Then 10.7) P(max |S,| > Xsm) < 2?[\SJ > (A - y/2)sm). \i<m ) { _ ) To prove A0.7)—note that it is trivial if Я < V2—consider the sets E, = (max |S,| < Xsm < \sA [ i<i ) Clearly, ;i0.8) Pfmax \St\ > Xsm) ? P{\Sm\ > (A - [i<m | m + 2 p№ n (ISJ < a - m—1 г=1 Since \St\ > Xsm and \Sm\ < (A - \ll)sm together imply \Sm — St\ > it follows by Chebyshev's inequality and the assumed independence of the I; that the sum in A0.8) is at most m—1 _ m—1 2 P№)P(lsm - s*l > V2s-} < 2 г=1 г=1 m-1 2 г=1 \i<m And now A0.8) and A0.9) combine to give A0.7).
70 The Space С Applying the lemma to the random variables involved in Theorem 10.1, we have, for X > 2V2, pfmax \S,\ > Xajn] < 2P{\Sn\ > \XoJn). { i<n J By the central limit theorem, PflSU > \^n) - P{|JV| > Щ < ^ E{\N\3}. Therefore, if e is positive, we have A0.10) lim sup PJmax \St\ > Xo^Jn \ < j2 for X sufficiently large. Tightness now follows by Theorem 8.4. An Application As pointed out in the Introduction, Donsker's theorem has this qualitative interpretation: Xn Д- W says that, if т is small, then a particle subject to independent displacements |x, |2, ... at successive times r, 2r, . . . will, viewed from afar, appear to perform approximately a Brownian motion. More important than this qualitative interpretation is the use of Donsker's theorem to prove limit theorems for various functions of the partial sums Sn. The Introduction indicates how to use the relation Xn Д- Wto derive the limiting distribution of maxi<n St; let us now carry this through in detail. Since h(x) =; supt x{t) is a continuous function on C, Xn -^> ^implies, by Corollary 1 to Theorem 5.1, that supf Xn(t) ^ supt Wt. The obvious relation sup Xn(t) = max —— S^ 0<t<l i<n Oyjn now implies A0.11) —^- max Si ^> supt Wt. j Thus we would have the limiting distribution of тах^<п St (properly normalized) if we knew the distribution of supt Wt. (We still assume the hypotheses of Theorem 10.1, of course.) The technique we shall use to find this latter distribution is to compute the limiting distribution of maxi<n St in an easy special case. Suppose that So, Slt. . . are the random variables for a symmetric random walk starting from the origin. Suppose, that is, that the |n are independent
Donsker's Theorem 71 ind satisfy 110.12) P{?n=1}==PUn:=_1} = i. Let us show that, if a is a nonnegative integer, then 10.13) pf max St > a) = 2P{Sn > a} + P{Sn = a}. \0<i<n ) The case a = 0 is easy. Assume a > 0 and put Mt = тахо<3<г- Ss. Since ?{Mn >a}- P{Sn - a} = P{i/n > a, Sn < д} + ?{Мп >a,Sn> a} and P{Mn > a, Sn > a} = ?{Sn > a}, A0.13) will follow if we prove A0.14) ?{Mn >a,Sn<a} = ?{Mn >a,Sn> a}. Because of A0.12), all 2n possible paths (JSlt S2, . . . ,.5J have the same probability 2~n. Therefore A0.14) will follow if we show that the number of paths contributing to the left-hand event is the same as the number of paths contributing to the right-hand event, and to show this it suffices to match the paths in a one-to-one manner.f Given a path (S1} S2, . . . , Sn) contributing to the left-hand event in A0.14), match it with the path obtained by reflecting through a all the partial sums after the first one that achieves the height a. Since the correspondence is one-to-one, A0.14) follows. This argument is an example of the reflection principle. Let a be an arbitrary nonnegative number, and let an = — [—a.n%]. By A0.14), we have A0.15) Pfmax -^ St > a) = 2P{Sn > an} + P{Sn = an}. { i<n Jn } By the central limit theorem, ?{Sn > an} -* P{7V > a} t The new mathematics.
72 The Space С (сг2 = 1 in this case, in view of A0.12)). Since the largest term in the symmetric binomial distribution goes to 0, the term ?{Sn = an} in A0.15) is negligible. Thus A0.16) Pfmax -p St > a] -> 2P{N > a}, a > 0. [ j ) Combining A0.16) with A0.11) (still assuming A0.12)), we conclude A0.17) P{suPi Wt < a} = -L [*e-W du, a > 0. Of course, the left side of A0.17) vanishes if a < 0. We have derived a fact about Brownian motion by combining Donsker's theorem with a computa- computation involving random walk, a computation that was simple partly because it reduced to enumeration and partly because a random walk cannot pass above a positive integer a without passing through it. Let us now drop the assumption A0.12). If the ?n are independent and identically distributed and satisfy A0.2), then A0.11) holds, and from A0.17) we can now conclude A0.18) P(-Jp max St < a] -> -p= TV* du, a > 0. [ff^Jn i<n J V2tt Jo Thus we have derived the limiting distribution of max,<n St under the hypotheses of the Lindeberg-Levy theorem. This argument follows a general pattern. If h is continuous on С—or continuous except at points forming a set of Wiener measure 0^—then Xn®+ W implies A0.19) h(Xn)^h(W). (In the case just analyzed, h(x) = sup4 z@-) We can find the limiting distribution of h(Xn) if we know the distribution of h(W), and we can often find the distribution of h{W) by finding the limiting distribution of h(Xn) in some special case and then using A0.19) in the other direction. The next section contains further examples in this pattern. Therefore, if the $n are independent and identically distributed with E{?J == 0 and E{fn2} = a2, then the limiting distribution of h(Xn) does not depend on any further properties of the fn. For this reason, Donsker's theorem is often called the (or an) invariance principle. Here we shall call it instead the (or a) functional central limit theorem. This term will be used for a variety of similar theorems in what follows, just as the term central limit theorem itself is used for a whole class of theorems.
Donsker's Theorem 1Ъ A Necessary Condition for Tightness Let us drop the assumption that the ?n are independent and directly assume that Xn Д* W (with Xn still defined by A0.1)). We shall show that, in this circumstance, for each positive e there exists а Я exceeding 1 such that A0.20) p(max \St\ > lajn) < - \i<n j Я2 holds for all n exceeding some n0. In case {?n} is stationary, this is just the hypothesis in Theorem 8.4, which we verified in proving Donsker's theorem. Let Y — supj \Wt\. Because of A0.17) and the symmetry of W under reflection through 0, Y has a finite second moment. From A0.19) with h(x) =; supt |a;(f)|,.it follows that, for each positive Я, A0.21) p(max |S,| > XaJn) -+ P{Y > Я} < — Г У2 d?. Given e, choose Я so large that the integral here is less than e; it follows from A0.21) that A0.20) holds for all sufficiently large n. Thus we see, after the fact, that A0.20) is "the" condition to verify in proving tightness in Donsker's theorem. In Chapter 4 we shall use the same condition in proving functional central limit theorems for sequences of dependent random variables. The preceding argument really shows this: Suppose that {fn} is stationary and that the finite-dimensional distributions of the Xn defined by A0.1) converge to those of a random function X, where supt \X(t)\ has a finite second moment. Under these conditions, if {Xn} is tight, then the hypothesis of Theorem 8.4 is satisfied. To this extent, this hypothesis is necessary; compare the remarks centering on (8.10). Another Proof of Donsker's Theorem^ The rest of this section is devoted to a second proof of Theorem 10.1. The proof is interesting because it makes little use of the preceding theory. In fact, granted the existence of Wiener measure, J we need only Theorem 2.2 and the central limit theorem. We are to prove Pn => W, where Pn is the distribution of Xn. According to Corollary 2 of Theorem 2.2, it is enough to prove that Pn(A) ~* W{A) whenever A is a finite intersection of open spheres and satisfies W(dA) =; 0. t The rest of this section may be omitted. t Which can itself be established by a direct argument; see Problem 1 of Section 9.
74 The Space С Now if A is the intersection of spheres with centers x1} . . . , xk and radii elt . . . , sk, and if y{i) = max (xt(t) - еъ) and z(t) = min then A has the form A0.22) A = {ж:у@ < x(t) < z{t), 0 < t <, 1}. We are to prove (Ю.23) РП(Л) -* W{A) under the assumptions that A is denned by A0.22) with y, ze С and that Ж(дл() = 0. We may also assume that y(t) < z(t) for all t, since л( is other- otherwise empty. (If 2/@) ^ 0 and z@) 5^ 0, then W{dA) = 0 holds automatically, but this need not concern us.) From here on, у and z are fixed. In our first proof of Donsker's theorem, we showed that the finite-dimen- finite-dimensional distributions of the Pn converge weakly to those of W. This fact, which does not really involve the space С at all, we shall assume here. Thus A0.24) limPn{z :«*!),. . . , x(tk))eH} = W{x:{x{tx),..., x(tk))eH} 71-HX) holds for each ?>dimensional Borel set H satisfying W{x:(x(tl),...,x(tk))edH} = 0. If, for integral v, А, -{^ then, as и increases to infinity, A2» decreases to the set of x for which y(t) < x(t) < z(t) holds for all dyadic rationale t. But if W(dA) — 0, then this last set differs from A by a set of W-measure 0. Therefore for each positive rj there exists a v such that W(A) < W(AV) < W(A) + r\. Since Pn(A) < Pn(Av), and, since limn Pn(Av) = W{Av) by A0.24), we have lim supn Pn{A) < W(A) + rj. Since r\ was arbitrary, A0.25) lim supn Pn(A) < W{A). We complete the proof by showing that A0.26) lim infn Pn(A) > W(A), which is the harder part. Given rj, choose e so that, if A0.27) D(e) = {x:y(t) + 3e < x{i) < z{t) - 3e, 0 ^ t < 1},
Donsker's Theorem 75 then W(D{ej) > W(A) - r\. (Clearly, the set A0.27) increases to A as e decreases to 0.) Now define Dv(e) = \x:yl-\ + 3e < xl-\ < z(-) - Зе, к = 0, 1, . . . , v). { \v/ \v/ \v/ I For each v, we have D(e) с ^(е) and hence A0.28) W{Dv{ej) > W(A) - y\. Choose and fix а и so large that A0.29) -f < n, wj-) < e, -) < e, where w denotes the modulus of continuity (8.1). Then n > v implies (io.3O) m - /r w. " Let Ln denote the set of elements of С that are linear on each interval [(/ — l)/n, i/n], i = 1, . . . , n, and let Gn be the set of a; in С for which A0.31) is violated for some i = 0, 1, . . . , n. If x e Ln, if x e Gnc (so that A0.31) holds for all / =* 0, 1, . . . , n), and if n > v, then, by A0.30), y{t) < x(t) < z(t) for all t ? [0, 1], so that x e A. Thus Ln n Gnc с ^. Let Gnr be the set of a; for which the double inequality A0.31) holds for i < r but not for i = r. Since Pn(Ln) = 1, and since the Gnr are disjoint and add to G, we have A0.32) < 2 Pn(Gn,r), n>v. r=0 For 1 < r <n, let knr denote that integer к A < к < v) such that (ifc - l)/v < r\n < k/v and let kn0 = 0. By A0.32), A0.33) Pn(Ac)< r=l x: % - >e\. If x e Gnr and \x(r[n) - x(knr[v)\ < e, then, by A0.30), x +3e ¦•«-4
76 The Space С so that x ф Dv(e). Thus the first sum on the right in A0.33) is at most A0.34) Pn(Ac) < Pn(Dv(e)c) + i Рп(сПгГ nix: for n ;> v. We shall now show that the sum in A0.34) is small. Observe first that A0.35) *t-xfdPn<9\t-s\ for all 5, 7, and n. In fact, if s and 7 are of the form ijn andy/n, then the left side of A0.35) is E{(St - S^/па2} = \t - s\; if s and t lie in the same subinterval [(/ — l)[n, i/n], then the left side of A0.35) is n(t — sJ, which is at most \t — s\; and the general result follows by Minkowski's inequality. By the independence of the ?n, therefore, A0.36) Pn[Gn,rn\x: x[ — I — Since rjn — knrjv < l/v, it follows by A0.29) that the last member of A0.36) is at most r]Pn(Gnr). Using A0.34) and the fact that the P(Gnr) sum to at most 1, we arrive at Pn(A°) < Pn(Dv(e)c) ±V, n>v. But НтпРп(^(е)с) = W(Dv(e)c) by A0.24), and so, by A0.28), lim supnPn04c) < W(AC) + 2tj. Since r\ was arbitrary, this establishes A0.26), which combines with A0.25) to yield A0.23), proving Donsker's theorem once more. Remarks. Erdos and Kac A946) first conceived the invariance principle itself—the idea of computing the limiting distribution of a quantity such as maxfc<n Sk first in a special case and then passing to the general case by connecting the result with Brownian motion (which is how it all started). An early paper in this direction was Kolmogorov A931). The original assertion of Donsker A951) was not that Xn ®+ W ox that Pn => W, but, what is by Theorem 5.2 the same thing, that h(Xn) jE> h(W) for all real, continuous func- functions h on C. The arguments leading to A0.23) are his. Kolmogorov and Prohorov A954) first pointed out that weak convergence is relevant and that Pn => W follows from A0.23) and Theorem 2.2. Prohorov A953 and 1956) brought the tightness concept to bear. For a
Functions of Brownian Motion Paths 77 very different approach to these problems, see Knight A962), and for a very different kind of application, see Strassen A964). Lamperti A962a) has a result differing from Theorem 10.1 in that the limiting process is not Brownian motion. PROBLEMS 1. Let |nl,. . . , |nfcn be independent with mean 0 and variances o^; put Sni = !„! + ••• + ?„„ s2ni = a2nl + ¦ • ¦ + a2ni, and sn2 = s\kn. Let Xn be the random function that is linear on each interval [s\tirA}s^, s2njsn2] and has values Xn(s2nilsn2) = SnJsn at the points of division. Assume Lindeberg's condition G.3) holds and generalize Donsker's theorem by using Lmdeberg's theorem, the inequality A0.7), and the corollary to Theorem 8.3 to prove Xn JL». W. (This result is due to Prohorov A956).) 11. FUNCTIONS OF BROWNIAN MOTION PATHS In the preceding section we used Donsker's theorem and properties of random walk to find the distribution of supt Wt. Having found this distribution, we used Donsker's theorem once more to find the limiting distribution of maXj<n Si in general. Here we shall apply the same technique to other functions of Brownian motion paths and of partial sums. We also compute some distributions associated with the Brownian bridge.f We shall say we are in the Lindeberg-Levy case when dealing with partial sums Sn of independent, identically distributed random variables ?n with mean 0 and finite, positive variance cr2. In this case Xn will always denote the random function A0.1). We shall say we are in the random walk case if each ?n assumes the values +1 and —1 with probability \ each. In this case cr2 = 1. Maximum and Minimum Let m — inft Wt and M = supt Wt, and let A1.1) mn = min Si} Mm = max St 0<i<n 0<i<n be the corresponding quantities for partial sums. The mapping carrying the point a; of С to the point (inftx(t), supta;(f), x(l)) of R3 is everywhere continuous, so that, by Theorems 10.1 and 5.1, (И.2) ~V К Af n, Sn) Л (m, M, in the Lindeberg-Levy case. t Although weak-convergence results in С would have little point if it were not possible to perform computations of the sort carried through in this section, the computations are not themselves essential to an understanding of the general theory.
78 The Space С We shall first find an explicit formula for A1.3) pja, b, v) = ?{a <mn<Mn<b,Sn = v) in the random walk case. We shall show that, if (И.4) Pn{j)=nSn=j}, then A1.5) pn(a, b, v) = | Pn(v + 2k(b - a)) - f pnBb - v + 2k(b - a)) k k=—oo for integers a, b, and и satisfying A1.6) a<0<b, a<b, a<v<b. If a < &, then the series in A1.5) are really finite sums. Notice that both sides of A1.5) vanish if и and v have opposite parity. For particular values of и, a, b, and v, let us denote the equation A1.5) by [n, a, b, v]. We shall prove by induction on n that [n, a, b, v] is valid if A1.6) holds. For n = 0, this follows by a straightforward examination of cases. Assume as induction hypothesis that [n — 1, a, b, v] holds for a, b, v satisfying A1.6). If a = 0, then A1.3) vanishes (note that i starts at 0 in the minimum in A1.1)), and the sums on the right in A1.5) cancel because pn(j) = pn{—j). Thus [n, a, b, v] is valid if A1.6) holds and a = 0; we may dispose of the case b = 0 in the same way. To complete the induction step we must verify [n,a,b,v] under the assumption that a < 0 < b and a < v < b. But in this case a + 1 < 0 and b — 1 > 0, so that [n — 1, a — 1, b — 1, v — 1] and [n — 1, a + 1, & + 1, v + 1] both come under the induction hypothesis and hence are valid. And now [n, a, b, v] follows by the probabilistically obvious recursions Pnij) = bn-i(j- 1) + */V-i(/+ 1) and From A1.5) it follows by summation over v that, if A1.7) a<0<b, a<u<v<b, then A1.8) P{a < mn < Mn < b, и < Sn < v} | - a)< Sn < t; + 2fc(b - a)} fc=— 00 - f pi2b ~v + 2k(b - a)<Sn<2b-u + 2k(b - a)}. fc=—00 t Problem 2 outlines how A1.5) can be derived (as opposed to verified).
Functions of Brownian Motion Paths 79 Taking a = — n — 1 in this formula leads to A1.9) ?{Mn < b, и < Sn < v) = ?{u < Sn < v} - P{2b - v < Sn < 2b - и}, valid for — n — 1 < и < v < b, b >; O.f From A1.9) it is possible to retrieve A0.13). Now A1.8) holds in the random walk case and, because of A1.2), we can find the distribution of (m, M, WJ by passing to the limit. If a, b, u, and v are real numbers satisfying A1.7), replace them in A1.8) by the integers [an-], — [—bn^], [ми*], and —[—vn*], respectively. Because of the central limit theorem and the continuity of the normal distribution, a termwise passage to the limit in A1.8) yields A1.10) P{a < m < M < b, и < W1 < v) = f ?{u + 2k(b - a)< N <v + 2k(b - a)} 2k(b -a)<N<2b-u + 2k(b - a)}. fc=— 00 00 fc=—00 The interchange of limit with summation over к can be justified by the series form of Scheffe's theorem (p. 224). The joint distribution of M and W1 alone could be obtained by letting a tend to — oo in A1.10), but it is simpler to return to the random walk case and pass to the limit in A1.9), which yields A1.11) ?{M < b, и < Wx < v) = ?{u < N < v} - P{2b - v < N < 2b - u}, valid for и < v < b, b > 0. Taking v = b and letting и -> — oo leads back to A0.17). From A1.10) with и — a and v = b we have A1.12) ?{a<m<M<b) = I (-l)*P{a + k(b- a)< N <b+k(b- a)}, fc=-00 valid for a < 0 < b. And this result with a — —b gives A1.13) P{suPi \Wt\ < b) = f (-l)fcP{Bfc - l)b < AT < Bk + A;=—qo t See Problem 1 for another proof.
80 The Space С for b > 0. By continuity, the strict inequalities in all these formulas can be relaxed to allow equality. And the right sides can all be written out as sums of normal integrals. It is possiblef to transform the series in A1.13) to ^) r2Bfc+lJ/8b2 1 _ _ V V^) -ir2Bfc+lJ/8b2 77 »=i 2k + 1 Although we derived A1.10) through A1.13) by passing to the limit in the random walk case, we have the limiting distributions for (mn, Mn, Sn), (Mn, Sn), (mn, Mn), and тахг<п \St\ (properly normalized) in the more general Lindeberg-Levy case because A1.2) holds there. The Arc Sine Law For x in C, let hx{x) be the supremum of those t in [0, 1] for which x(t) = 0; let h2(x) be the Lebesgue measure of those t in [0, 1] for which x(t) > 0; and let h3(x) be the Lebesgue measure of those t in [0, h^x)] for which x(t) > 0. Then T = h-^W) is the time at which the Wiener path W last passes through 0, U = h2(W) is the total amount of time W spends above 0, and V = h3(W) is the amount of time W spends above 0 in the interval [0, T]. We shall find the joint distribution of (T, U, V, Wx). In Appendix II (pp. 230 ff.) it is shown that each of the mappings h1} h2, and h3 is measurable and is continuous except on a set of Wiener measure 0. Therefore A1.14) (h.iXJ, h2(Xn), h3(Xn), Xn(l)) *> (T, U, V, WJ in the Lindeberg-Levy case. In the random walk case, the vector on the left has a simple interpretation: Tn — nh^XJ is the maximum i, 1 < i < n, for which St = 0; Un = nh2(Xn) is the number of i, 1 < i < n, for which ^_х and St are both nonnegative; Vn — nh3(Xn) is the number of i, 1 < i < Tn, for which ^_! and St are both nonnegative; and, of course, XJY) — SJ^fn. With these definitions we therefore have A1.15) (- Tn, - Un, - Vn, ± Sn) ®+ (T, U, V, \n n n yjn / in the random walk case; we shall find the distribution of (T, U, V, W^) by passing to the limit. In the general Lindeberg-Levy case, the left side of A1.14) is a somewhat more complicated function of the partial sums S1} . . . , Sn. After we have found the distribution of (T, U, V, W-^), we shall show how A1.14) leads even in the general case to limit theorems for quantities associated in a natural way with the partial sums. t See Feller A966, pp. 330 and 594).
Functions of Brownian Motion Paths 81 Since the random vector (T, U, V, W-^) is constrained by 1-T+V if Wx > 0, A1.16) it suffices to consider (T, V, Wx) and the related vector (Tn, Vn, Sn). The distribution of the latter quantity in the random walk case we shall derive from three facts which admit of elementary proofs we shall not carry through here. First, we shall use the local limit theorem for random walk: If m tends to infinity andy varies with m in such a way that у and m have the same parity and / -> y, thent Second, we shall use the fact that A1.18) P{SX > 0, . . . , STO_X > 0, Sm = j} = ± pm(j) m for у positive.% If S2m = 0, then U2m = V2m assumes one of the values 0, 2, . . . , 2m; the third fact§ we shall use is that these m + 1 values all have the same conditional probability: A1.19) P{V2m = 2i | S2m = 0} = —Ц-, i = 0, 1, . . . , m. m + 1 To compute the probability that Tn = 2k, Vn = 2i, and Sn = j, condition on the event S2k = 0. Conditionally on this event, (So, . . . , S2k) and (S2k+1, . . . , Sn) are independent, Vn depends only on the first sequence, and Tn = 2k and Sn = j if and only if the elements of the second sequence are nonzero and the last one is/ By A1.18) and A1.19) we conclude that A1.20) P{Tn = 2k, Vn = 2i, Sn = j} = p2k@) -~ -^— pn_2k(j) к + 1 n — 2k if A1.21) 0<,2i<2k <n, ;>0. Both sides of A1.20) vanish if и andy have opposite parity. Fory negative, the same formula holds with |y| in place of у on the right. t Feller A957, p. 170) has the local theorem for the binomial distribution, and A1.17) follows because ?(Sm + m) IS binomially distributed. % See Feller A957, p. 70). This also follows from A1.9). § See Feller A957, p. 72).
82 The Space С We shall apply Theorem 7.8 to the lattice of points Bfc/n, 2i\n, j[y/n), where у and n have the same parity. Suppose k, i, and у tend to infinity with n in such a way that 2k 2i n n where 0 < v < t < 1 and x > 0. Then A1.21) holds for large n, and it follows by A1.20) and A1.17) that /2.2. -L \n n Jn where A1.22) g(f, *) = -1 W__ e-b2/(i-tM о 277 [f(l - OF The same result holds for negative я by symmetry. Therefore (П-23) /- Tn, i Vn, -j= Sn) has (in the random walk case) the limiting distribution in R3 specified by the density (g(t,x) if 0<u <t < 1, A1.24) f(t,v,x) = \6 [0 otherwise. By A1.15), (T, V, Wx) has this density. Because of A1.16), the distribution of (T, U, V, Wi) can be written out explicitly also. From A1.24) it follows that the conditional distribution of V given 2" and Wx is uniform on [0, T\\ this corresponds to A1.19). By A1.16), if T= t and Wx = x, then U is uniformly distributed over [1 — t, 1] for x > 0 and over [0, t] for x < 0. Using A1.24) to account for the possible values of t and x, we find for the density of U alone A1.25) f f g(t, x) dtdx + f fg(f, *) df dx. x>0 x<0 1-й <t u<t Now the integral of g(t, x) over the range x > 0 is 1/[2тг^A _ t)i], which is the derivative of — (A — *)/0*/7Г> and hence A1.25) reduces to Д - и)*]. Therefore ds =- arc sin ^ц, 0 < и < 1. A1.26) P{t7 < ц} = - f" - s)
Functions of Brownian Motion Paths 83 This is Levy's arc sine distribution. A similar computation shows that Г also follows the arc sine law: A1.27) P{T<t} =-arc sin ^f, 0 < t < 1. Let us now combine A1.14) with the facts just derived to obtain a limit theorem for the general Lindeberg-Levy case. Let us agree to say that a O-crossing takes place at / if the event A1.28) E, = {S, = 0} U {5И > 0 > St} U {?_! <0<St) occurs (which in the random walk case is to say that St = 0). Let T'n be the maximum i, 1 < i < n, for which a O-crossing takes place at i; let U'n be the number of i, 1 < i < n, for which St > 0; and let V'n be the number of i, 1 < * < T'n, for which St > 0. We shall prove that (И.29) (- t;, - vn, - v;, -^- sn) *> (т, и, v, by showing that the quantity on the left here approximates the left side of A1.14). Clearly, T'n\n is within 1/n of ^(Хп). If yn is the number of i, 1 < i < n, for which Ег occurs—the number of O-crossings—then U'Jn and V'Jn are within yjn of h2(Xn) and h3(Xn), respectively. Therefore A1.29) will follow from A1.14) and Theorem 4.1 if we prove that yjn -?> 0, and for this it is enough to show that A1.30) е[Ы = - 2 P(E<) -> 0. I n J n i=i But for each positive e, and hence, by the central limit theorem, P(Et) -> О. And now A1.30) is a consequence of the theorem on arithmetic means of convergent sequences. From A1.29) we may conclude for example that Ujn and T'Jn have arc sine distributions in the limit. The Brownian Bridge The Brownian bridge W° behaves like a Wiener path W conditioned by the requirement Wx = 0. With an appropriate passage to the limit to take account of the fact that {Wx = 0} is an event of probability 0, this observation can be used to derive distributions associated with W°.
84 The Space С Let PE be the probability measure on С defined by PE(A)= Р{ЖеЛ|0^ Wx < ?}, Ае<?. We shall prove that A1.31) PE=>W° as e tends to O.| Take the random function W as defined on some probability space and take the random function W° to be defined on the same space by W° = Wt — tWx (see the construction in Section 9). If we prove that A1.32) lim sup ?{We F\0<W1<ae} <, ?{W° e F) E->0 for each closed .Pin C, then A1.31) will follow by Theorem 2.1. For arbitrary tXi . . . , tk, Wx is independent of (И^°, . . . , W°) because it is uncorrelated with each component. Therefore A1.33) P{W°eA, WxeB} = ?{W°eA}?{W1eB) if A is a finite-dimensional set in С and В lies in 0lx. But for В fixed the set of A in ^ that satisfy A1.33) is a monotone class and hence! coincides with <%. Thus A1.33) holds for A e<g and В e М1. In particular, ?{W° e A | 0 < Wx <, e} = ?{W° e A}. Since p(W, W°) = 1^1, where p is the metric on C, \Wx\ < б and W 6 F imply W°eF6 = {x:p(x, F) < d}. Therefore, if e < <5, P{WeF\ 0 ^ W1 < e} <, ?{W° 6 F8 \ 0 < W1 < e} = ?{W° e Fd}. The limit superior in A1.32) is thus at most ?{W° e F8), which decreases to P{PF°eF}asEJ,0if F is closed. This proves A1.32) and hence A1.31).§ Suppose now that h is a measurable mapping from С to Rk and that W° has probability 0 of lying in the set Dh of discontinuities of h. It follows by A1.31) and Theorem 5.1 that A1.34) P{h(W°) < a} = lim P{h(W) < a | 0 < Wx <, e) holds for each a at which the left side is continuous (as a function of a ranging over Rk). From A1.34) we can find explicit forms for some distribu- distributions connected with W°. Sometimes an alternative form of A1.34) is more f We can let e tend continuously to 0 and use the definition involving B.4). Alternatively, we can let e tend to 0 through some sequence (say the reciprocals of integers) fixed throughout the discussion. % See Halmos A950, p. 26). § This part of the argument merely repeats the proof of Theorem 4.1.
Functions of Brownian Motion Paths 85 convenient: A1.35) P{h(W°) < a} = lim P{h(W) < a | -e <, Wx < 0}. ?->0 This is established in the same way (in place of [0, e] we could use any subset of [—?,?] with positive measure). Let m = mft Wt, M = supt Wt. Suppose that a<0<b and that 0 < e < b; by A1.10) we have, if с = b — a, A1.36) P{a < m < M < b, 0 < Wx < s} = f P{2kc < N < 2kc + ?} - f P{2fcc + 2b - e < JV < 2kc + 2b}. fc=—00 fc=—00 Since A1.37) lim -P{x<N <x + e} = -L e~^ , ?-+0 ? у/2тг if we take the limit inside the sums in A1.36), then A1.34) yields 00 00 k=— oo fc=— oo Since the series converge uniformly in s, the interchange is all right. Thus we have the distribution of (m°, M°). Taking —a = b here gives , A1.39) P{supt \Wt°\ < b} = 1 + 2f (~l)ke~2kV, b > 0. By an entirely similar analysis applied to A1.11), A1.40) P{M° < b} = 1 - e~*b\ b > 0. Let t/° be the Lebesgue measure of those t in [0, 1] for which W° > 0. We shall show that U° is uniformly distributed over [0, 1] by showing that A1.41) lim P{[/ <, a | -e ^ Wx < 0} = a, 0 < a < 1. ?-+0 Because of A1.16), the conditional probability here is From the form of the density A1.24) we saw that the distribution of V given T and Wx is uniform on @, T). In other words, if L = VjT, then L is uniformly distributed on @, 1) and is independent of (T, Wx). Therefore the
86 The Space С conditional probability in A1.41) is P{TL<a|-? < Wx <0} = p(t<- •'o [ s -? < Wx < 0 ds, and A1.41) will follow by the bounded convergence theorem if we prove the intuitively obvious relation P{T<e\-e<,W1^0}->0, 0 < 1. But this follows by A1.37) and the form of the density A1.24). Therefore A1.42) P{U° ^ a} = a, 0 < a < 1. Remarks. For further evaluation of distributions of functions of Brownian motion paths, see Ito and McKean A965), Karlin A966, Chapter 10), Erdos and Kac A946 and 1947), Mark A949), Darling and Erdos A956), and Section 16 below. PROBLEMS 1. Show by reflection in the random walk case that >n(v) if v > b, A1.43) {n,n } [pnBb - v) if v < b for b > 0. Derive A1.9) from this. 2. For nonnegative integers ct, let тт{сх, . . . , ck; v) be the probability that an n-step random walk (и fixed) meets cx (one or more times), then meets —c2, then meets c3, . . . , then meets (—lOc+1cfc, and ends at v. Use A1.43) and induction on к to show that c . ipni^i + ¦¦•+ 2c^x + (-l)fc+V) if (-1)*+^ > ck, [Reflect through (—l)fccfc_1 the part of the path to the right of the first passage through that point following successive passages through cl9 — c2, ¦ ¦ ¦ , (—I)k~2ck_2.] Derive A1.5) by showing that pn{a, b, v) is pn(v) — n(b; v) + n(b, a; v) — тг(Ь, a, b; v) + ¦ • ¦ — тг(а; v) + ir(a, b; v) — тт{а, b,a;v) + -- ¦, 3. For ж in с let h{x) be the smallest t for which x(t) = sup,, x(s). Show that h is measur- measurable and is continuous on a set of W-measure 1. Let тп be the smallest к for which Sk = maXj<n Si and prove * P -^ < a| ^--arcsin Va, 0 < a < 1, in the Lindeberg-Levy case. [See Feller A957, p. 86) for the random walk case.] 4. Derive the joint limiting distribution of the maximum and minimum of St — w^^, 0 < i < n, in the Lindeberg-Levy case. [Consider Yn(t) — Xn(t) — tXn{\) with Xn defined by A0.1).]
Fluctuations of Partial Sums 87 12. FLUCTUATIONS OF PARTIAL SUMS In Sections 9 and 10 we established tightness for sequences of random functions by finding bounds for the distribution of the maximum of certain partial sums. Here we shall derive such bounds under fairly general condi- conditions; the results lead to practical tightness criteria which will be used throughout the rest of the book. Maxima Let ?1? . . . , ?m be random variables; they need not be independent or identically distributed. Let Sk = ^ + ' • • + ?k (So = 0), and put A2.1) Mm = max \Sk\. 0<k<m We shall obtain upper, bounds for P{Mm > X} by an indirect approach. If A2.2) M'm = max minflSJ, \SM - Sk\}, 0<k<m then A2.3) M'm < Mm. If f i ffc-i, ?*+!, • • • , ?™ are identically 0, then M'm = 0 and Mm = | ?fc|. Thus there can be no inequality opposite to A2.3). On the other hand, we do have \Sk\ < min {\SJ + \Sk\, \SJ + \Sm - Sk\} = \SJ + min {|^|, \Sm - Sk\}, so that m A2-4) Mm<M'm (There is equality here if Sm = 0.) Therefore A2.5) ?{Mm >X}< If we can find separate bounds for the terms on the right in A2.5), we shall have a bound for the term on the left. If we bound P{Mm > X) via A2.5) instead of by a direct analysis of some kind, there may be some loss of accuracy. On the other hand, since the right side of A2.5) is at most 2?{Mm > \X), the loss need not be great: an extra factor of 2 will not matter for our purposes, and the bounds we derive will decrease with increasing X slowly enough that passing from Я to Я/2 will have no important effect.
There is a second way of deriving bounds on the distribution of Mm from bounds on the distribution of M'm. We shall show that A2.6) Mm from which it will follow that ?[ max |?| > -\ [l<i<m 4J A2.7) ?{Mm > A} < ?{м'т >-\ { 4} Again, bounding the terms on the right will yield a bound for the term on the left. And the right side of A2.7) is at most 2P{Mm > iA}, so that using A2.7) can result in no substantial loss of accuracy. To prove A2.6), consider the set / consisting of those k, 0 <i к < т, for which \Sk\ < \Sm - Sk\; certainly, Oe/. If Sm = 0, then Mm = M'm, so that A2.6) holds. If Sm 9^ 0, then m ф I and hence there is a k, 0 < к < т, for which к-lei and кфГ. |^_г| < \Sm - S^], \Sk^\ < M'm, \Sm - Sk\ < \Sk\, and \Sm - Sk\ < M'm. For this к we have A2.8) \Sm\ < \Sk^\ + \b\ + \Sm - Sk\ < 2M'm + \b\, from which A2.6) follows via A2.4). (There is equality in A2.6) if m = 5 and fx = fa = ?3 = f4 = -?5.) Product Moments In view of A2.5) and A2.7), there is profit in bounding ?{M'm > A}. Let us suppose that there exist nonnegative numbers щ, . . . , um such that A2.9) E{|S,. - S,P • \Sk - S,r} < ( 2 «iY( 2 ul 0 < i < j < к <, m, where у and a are positive. For example, this will hold with у = 2 and a = 1 if the ?г- are independent, if E{?J = 0, and if щ is taken to be the variance of ?,-. Since xy < (x + yJ for nonnegative x and y, A2.9) implies A2.10) E{|S, - S,V ¦ \Sk - S,?} <, I 2 uT, 0<i<j<k<m. \i<l<k / Moreover, if \S3- — 5J > A and \Sk — 5^| > A, where A is positive, then 15,- - Si\v ¦ \Sk - Si\y > A2y, so that A2.10) implies A2.11) P{|S,. - St\ > A, \Sk - S,\ > A} < \-( 2 A Y\i<i<k
Fluctuations of Partial Sums 89 (The inequality is trivial if i = j or if; = k.) Note that A2.9) implies A2.10) and A2.10) implies A2.11) also in the case у = 0, provided we take |x|° as 1 if x ?? 0 and as 0 if x = 0. THEOREM 12.1 If у > 0 and a > \, and if A2.11) holds for all positive X, then, for all positive X, A2.12) ?{M'm > X} <, ?j* (u, + • • • + и J2*, where Ky a is a constant depending only on у and a. Although its specific value will have no importance for us, the constant Ky(X may be taken as tj j ^ \2a-|-By+D nl/By+l) l^l/By Theorem 12.1 is of purely theoretical value: K2>i, for example, is 55,021 to the nearest integer. Applications Before proving the theorem, let us examine several of its consequences. If the Hi are independent with E{?J = 0 and Е{?/} = <тД then A2.9) holds with 7 = 2 and a = 1, provided щ is taken to be a?. Now a-f + • ' " + cTO2 = sm2 is the variance of Sm, and A2.12) implies A2.14) ?{M'm >X}< ^f- , with К = K21. Replacing Я by Xsm and using A2.5) and A2.7), we obtain A2.15) ?{Mm > Xsm} < ^ + P{\SJ > ^sm] and A2.16) ?{Mm > XsJ < ^f + Pfmax |^| > lXsm\. According to A0.7), A2.17) ?{Mm > XsJ ? 2P{\Sm\ > (Я - y/2)sj. The factor 2 on the right here is of no importance and, if X is large, \X and Я — sfl are about the same. Thus A2.15) represents no advance over A2.17) in the independent case. But A2.16) does, because A2.18) PJ max |&| > ^XsA < Я2 si, i-iji 12
90 The Space С so that A2.16) implies A4 V A2 1 m Г A2.19) P{Mm>hm}<,^f- + f,-r2 The sum on the right is the one involved in Lindeberg's condition G.3). It is possible to deduce further results. For example, from A2.14) it follows (see C) on p. 223) that f UMm'J>asm2}\sm/ a Since the ^ are independent, we have (the indices in the sums are constrained by 1 <, i, к < m) ^ f Ц d? f max, Щ- d? < ^ f Ц- h^S-r f , - ^2dP. a/ < smJ{^2>0 For nonnegative U and K, we have f (U +V)dP ?l{ l/dP + 2| V dP. J{U+V>a} That A2.20) J{ikfm2>asm2} \ Sm / La i-15m J{|fi* for some universal constant K' now follows by A2.6) and the fact that the sum on the right here is at most 1. The inequality persists if Мгт is replaced by iS^j. This proves in particular the uniform integrability of the squares of the normalized row sums for a triangular array satisfying Lindeberg's condition. Suppose now that the ^ are independent and identically distributed with J. = 0 and E{^2} = a2. Then A2.15) implies A2.21) P{Mm ^ Xojm] < ^f + P{|SJ > P A A2.16) implies A2.22) P{Mm > Aojm] < ^f + Pf max |?| > and A2.19) implies A2.23) P{Mm > Xojm) < ^f + -?-f f ^ dP.
Fluctuations of Partial Sums 91 Since the integral here goes to 0 as X -»¦ oo, for sufficiently large X we have P{MW > Xajm] <j2, m = 1, 2, .... Thus we have another proof of the tightness of the random functions involved in Donsker's theorem (see A0.10)), a proof which does not depend on the central limit theorem. Although the random variables are independent in the applications just given, Theorem 12.1 does not require independence. This is a great advantage: Most later applications will involve dependent sequences. Proof of Theorem 12.1 The assumption in Theorem 12.1 is that A2.24) P{|S, - St\ > X, \Sk - S,\ >X}< i./ J Л*", 0 < i < j < к < т, for all positive X; we are to find a constant K, depending only on у and a, for which A2.25) ?{M'm > X) < ^ (Ul + • • • + uwJa. Write A2.26) д = —-— , 2y + 1 so that 0 < д <. 1. For large К we have because the left-hand member approaches l/2Ba-1)<5 as K-+ oo and Ba > 1, д > 0) this limit is less than 1. We shall prove that A2.25) holds if К satisfies A2.27) and if A2.28) K>1. (In point of fact, A2.27) implies A2.28). The smallest К satisfying A2.27) is given by A2.13).) The proof goes by induction on m. The result being trivial for m = 1, consider the case m = 2. Since М'г = min {\SX\, \S2 — S^}, A2.24) and A2.28) imply ; > x) < ~ (Щ + «2J« < A (Ml + щу\
92 The Space С Assume now as induction hypothesis that the result holds for each integer less than m. We shall prove it for m itself. Write и = ux + • • • + um; we may assume и > 0. There exists an integer h, 1 < h <? m, such that A2.29) «1 + + "»-.^1 и 2 where the sum on the left is 0 if h — 1. Consider the four quantities Ux = max 0<t<A—1 U2 = max min {\S, - Sh\, \(Sm - Sh) - (S, - SOI} h<i<m = max min D2 = mm{\Sh\,\Sm- Sh\}. Since A2.24) holds if m is replaced by /г — 1, and since h — 1 < m, we may apply the induction hypothesis to the random variables |l5 . . . , |л-1 and the quantities щ, . . . , uh_1 and conclude that A2.30) ?{UX > X) < A K + • • • + „^^ < ? A , the last inequality following by A2.29). (If h = 1, A2.30) is trivial.) If the indices in A2.24) are restricted to h < i <j ^ к < т, then only the random variables |л+1, . . . , ?те are involved. Since m — h < m, the induc- induction hypothesis applies to ?л+1, . . . , ?те and мл+1, . . . , um; hence A2.31) P{U2 > X] < ^ (u U ^ the last inequality following by A2.29) again. (If h == m, A2.31) is trivial.) Now A2.24) implies A2.32) P{Dl > X\ < ±(Ul + ¦¦¦+ итГ = ^ and A2-33) ?{D2>X\<^y. (If h = 1, then A2.32) is trivial; if h == m, then A2.33) is trivial.) We shall show that A2.34) mm{\Si\,\Sm-Si\}<U1 + D1 if 0 < i < h - 1.
Fluctuations of Partial Sums 93 Let ^ be the minimum on the left. If |^| < Ult then Suppose \Shr.1 - St\ < Ur\ if \S^r\ = Dx, then f*t < \st\ < | Vi - si\ + I Vil < ux + z>x. Suppose |5^i - S,| < C/i and |5m - Vil = A; then A*, < \Sm - St\ < |^_х - St\ + \Sm - Vil < Ux + A. This proves A2.34). The same sort of argument shows that A2.35) min {|5,-|, \Sn - S5\} < U2 + D2 if h <j < m. By A2.34) and A2.35), A2.36) M'm < max {Ux + Dlt U2 + D2}, and hence A2.37) ?{M'm ^ X] < ?{UX + Dx > X} + P{U2 + D2> A"}. If Ao and Ax are positive and Ao + Ax = A, then, by A2.30) and A2.32), /1ТТОЛ D( TT I П "~^ 11 ^S DfTT "~^ 11 I Df П ^^ 1 "I ^ Ao 2 Calculus shows that, if Co, Cl5 and A are positive numbers, then A2.39) min \^+%- with д defined by A2.26). Minimizing the right-most member of A2.38), we arrive at A2.40) The same inequality holds for U2 + D2 (use A2.31) and A2.33)). By A2.37), therefore 2a Г / K\5 I1/5 A2.41) ?{M'm > A} < — 2 j —j + 1 . v I ™ *— J — -]2y I /i2a/ By the choice A2.27) of K, the right member here is at most Ku^/X2*. This completes the induction step and the proof of Theorem 12.1.
94 The Space С Moments^ Let us now replace A2.9) by the assumption that A2.42) E{|S, - 5,1'} < {У iiX 0 < i < j < m. It follows from this that, if X is positive, then A2.43) P{|S, - S<\ > X) < I¦( 2 «X 0 < i < j < m. In place of a > | we make the stronger requirement that a > 1. THEOREM 12.2 If у > 0 and a > 1, and if A2.43) holds for all positive X, then, for all positive X, A2.44) ?{Mm >X}<^ where K'y a depends only on у and a. We may define Щ>л by A2.45) K'y>a = 2^A + KhM). Proof. By Schwarz's inequality, P(?x n E2) < РЦЕ^Р^Е^). Therefore A2.43) implies (recall xy < (x + yJ) P{\St - S<\ > X, \Sk - S,\ > X) < — 2 и») T^l I и a/2 4 I U14 Ay\i</<fc / Thus A2.11) holds with у and a replaced by \y and ?oc, and Theorem 12.1 implies A2.46) ?{M'm > X) < jy (щ + ¦ ¦ • + и J* with К = Kiy<ia. By A2.43) we have A2.47) P{|SJ > A} < у (Ml + • • • + uj', and A2.44) follows by A2.5). (Note that, since the right-hand members of the last two inequalities are of the same order, A2.44) cannot be essentially improved by using extra information about the distribution of |.SOT|.) f The remaining results in this section are needed in Chapters 3 and 4 but not in Section 13, the last in this chapter.
Fluctuations of Partial Sums 95 As an application of Theorem 12.2, consider again independent, identically distributed variables ^ with mean 0 and variance a2; this time assume there exists a finite fourth moment т4 = Е{^4}. Now = 2 where the indices range independently from 1 to r. Since the !-г are independ- independent and their means vanish, if the value of some index in the summand differs from those of the other three, the term vanishes. Since о4 < т4, E{Sr4} = rr4 + 3r(r - 1H* < 4r2r4, and hence A2.42) holds with у = 4, a = 2, and ux ==•••== um — 2т2. Theorem 12.2 now implies m2 A2.48) ?{Mm > X) < AS К ^ , A where A" = X^i2. Replacing X by Xayjm yields A2.49) P{MTO > ^ From this and Theorem 8.4, the tightness of the random functions A0.1) involved in Donsker's theorem follows easily. The argument involving A2.23) is more powerful than this one because it requires only second moments. A Tightness Criterion Theorem 12.2 leads to a condition for tightness of a sequence {Xn} of random elements of C. THEOREM 12.3 The sequence {Xn} is tight if it satisfies these two conditions: (i) The sequence {Xn@)} is tight. (ii) There exist constants у > 0 and a > 1 and a nondecreasing, continuous function F on [0, 1] such that A2.50) ?{\XM - Xn(h)\ > Ц < j №) - F(tlW A holds for all tlt t2, and n and all positive X. The moment condition A2.51) E{|Zn(g - XJtJM < \F(t2) - implies A2.50).
96 The Space С Proof. By Theorem 8.2 it suffices to produce, given e and ту, а д @ < д < 1) for which A2.52) P{w(Zn, 6) > 3e} < rj for all и, and, by the corollary to Theorem 8.3, this will hold if d'1 is an integer and A2.53) 2 p( «up \Xn(s) - Xn(jd)\ > e) < n. i<d~x \j5<s<(i+l)d ) Fix n, d, and у for the moment and, for a positive integer m, consider the random variables A2.54) ^=X m J \ m By A2.50), these random variables satisfy A2.43) with щ — F(jd + idnr1) — F{jb + (i - lydm-1). By Theorem 12.2, therefore, A2.55) p( max xjjd + - d) - Xn(jd) > e) \0<i<m \ ТП J j ГГ/ / ' ¦ <\ A 7"»/" 5Л "I(X — [F((j + l)d) - F(jd)]a, e7 where К = К^а. Since Xn lies in C, letting m —>- oo leads to A2.56) Pj jup^\Xn(s) - Xn(jd)\ >j<fy [HO + W) ~ F№T- Therefore A2.57) 2 A .d<™f.+1)}X"^ - X^8^ ^ e) < - [F(l) - F@)]fmax|F@- + Щ - FO^ll" if 6-1 is integral. Since F is continuous and a > 1, we may make this last quantity small by taking д the reciprocal of a large integer, which completes the proof. The ideas in this proof lead to a condition for the existence in С of a random element with specified finite-dimensional distributions. For each A;-tuple of points of [0, 1], let [Atv..tk be a probability measure on (Rk, ?%k). Assume these measures satisfy the consistency conditions of Kolmogorov's existence theorem (in its general form; see p. 230). THEOREM 12.4 There exists in С a random element with finite-dimensional distributions ftt tte, provided these distributions are consistent and provided
Fluctuations of Partial Sums 97 there exist constants у > 0 and a > 1 and a nondecreasing, continuous function F on [0, 1] such that A2.58) 1лЧЛг{фъ fo : |& - &| > A} < i holds for all tx and t2 and all positive X. If the ,Mtl...tfc satisfy A2.59) f |& - W d/it^fa, &) < \F(t2) - then A2.58) follows. Proof For each n, construct a polygonal random function Xn that is linear over each interval [(/ — lJ~n, i2~n] and for which the joint distribution of is /ut Hn with tt = i\2n. If t1 and t2 are integral multiples of 2~n, then, by A2.58)'; A2.60) P{\Xn(t2) - Zn(fl)l > Я} < ± \F(t2) - F(Ola. As in the preceding proof, consider fixed n, d, andy, where д~г is assumed integral. If the points A2.61) jd + idirr1, i = 0, 1, . . . , m involved in A2.54) are all integral multiples if 2~n, then A2.55) follows as before. Suppose now that 62n is an integer. If we take m = 62n, then the points A2.61) are indeed integral multiples of 2~n, so that A2.55) holds. Moreover the points A2.61) are in this case exactly the integral multiples of 2~n in the interval [jd, (J + 1N], so that, because of the polygonal character of Xn, A2.56) and A2.57) again hold. Thus A2.57) holds if d'1 and 62n are both integers. If we take 6 = 2~v, where v is an integer large enough that the right side of A2.57) is less than rj, then A2.52) holds for all n > v, which, since the -^„@) all have the same distribution, is enough for tightness. By Prohorov's theorem, therefore, there is a random element X of С and a subsequence {Xn.} such that Xn. ^ X. If tx, . . . , tk are dyadic rationale, then, by the consistency assumption, the distribution of {Xn(t^), . . . , Xn(tk)) is exactly [Atv..tle for all sufficiently large n, and it follows that (Z(/1), . . . , X(tky) has distribution /utv..tk. For general points tlt . . . , tk of [0, 1], there are dyadic rationale s^], . . . , s(kv) with lim^^"' = t{; since
98 The Space С (Z(s<u))> • • , X(s(kv))) converges in distribution to (X(tJ, ..., X(tk)), and since, by A2.58) and the continuity of F, /u^v) ...s<«) converges weakly to 1 к Ph-w C^('i)» • • • ' Х(*кУ) nas H-tvtic as *ts distribution. Thus X is the random function required. The existence of Ж and W° follow anew from Theorem 12.4.| Further Inequalities For Chapter 3, we shall need a strengthening of Theorem 12.1. If A2.62) M; = max min {|S, - S{\, \Sk - S,|}, 0<i<3<fc<m then A2.63) M'm < M"m. (If m = 3 and ?г = -?2 = ^3, then M'm = 0, and M"m = \?г\.) Therefore bounds on the tail of the distribution of M"m are stronger than bounds on the tail of the distribution of M'm, and the following result sharpens Theorem 12.1. THEOREM 12.5 If у ^ 0 and a > ^, and if A2.64) P{|S, - Sf| > Я, \Sk - St\ > X] < i- holds for all positive X, then, for all positive X, A2.65) ?{M"m > X) < ^f(Ul + • • • + umf*, i<l<k 0<i<j <k<m, where K" a is a constant depending only on у and a. Proof. Put A2.66) Nm = min max | max|Sf|, max \Sm — SM. l<l<m \0<i<l l<i<m ) If / is the index that achieves the minimum here and i <j < k, then either i,j < /, or else / <j, k, from which it follows that A2.67) M"m < 2Nm. Hence it will suffice to find a K, depending only on у and a, such that A2.64) f Theorem 12.4 combines with Theorem 9.2 to give a condition for the continuity of sample paths of separable processes. For extensions and variations of Theorems 12.3 and 12.4, see Problems 7 and 8 in this section and Problem 4 in Section 15.
Fluctuations of Partial Sums 99 implies A2.68) ?{Nm >Ц<~(и +•¦¦ + unT*. (Since Nm < 2M"m, A2.68) and A2.65) have the same strength.) For sufficiently large К, К > 1 and A2.69) / —j +D-2*?Y<Ks, where д = 1/B у + 1), as before. We shall prove by induction on m that such а К works. The cases m = 1 and m = 2 are easy. Assume as induction hypothesis that the result holds for integers smaller than m. As in the proof of Theorem 12.1, choose h, 1 < h < m, so that "i + V uh__x 1 Mi + • • • + uh S S 5 и 2 те. where н = нх + • • • + м Write Лх for iVx: A2.70) Ax = min max max |SJ, max h \0 Write A2 for the value of Nm_h based on the quantities gh+1, . . . , A2.71) ^2 = min max I max |5г- — Sh\, max |STO — 5 h<l2<m |.Л<г<г2 lz<i<m By the induction hypothesis applied to ?l5 . . . , ?л_15 we have A2.72) P{^ > Я} < ^ (и, + ¦ ¦ ¦ + и^Г <Jy§a- By the induction hypothesis applied to gh+1, . . . , |m,we have A2.73) ?{A2 >X]<fy K+1 + • • • + итГ <^у§-« Write ix{i,j, k) = min (IS, - 5,|, \Sk - 5,-|} and define A2.74) В = тах{/г@, /г — 1, m); /г@, /г — 1, h); [x(h — 1, /г, т); /u@,h,m)}. By A2.64), A2-75) P{B > Я} < 4 ^ .
100 The Space C v>,:; We next show that A2.76) Nm <, max {Ax, A2} + IB. Let lx and /2 be the specific values of the indices that achieve the minima in A2.70) and A2.71). To prove A2.76), we must produce an /, 1 < / < m, for which A2.77) maxISJ < max {Ax, A2} + 2B and A2.78) max | Sm - St-| < max {Alt A2} + IB. Suppose first that A2.79) |5Л_!|<В and A2.80) \Sm-Sh\<B. We shall show that the choice / = h satisfies A2.77) and A2.78). Indeed, 0 < i < h implies |5г-| < Ax, and lx < / < / = h implies P»l S P; *jft-il + Рл-il ^ ^i + -o because of A2.79), so that A2.77) holds; and A2.78) is established in the same way. Suppose now that one or the other of A2.79) and A2.80) fails. For definiteness, suppose A2.79) fails. We shall show that the choice / = lx satisfies A2.77) and A2.78). Since A2.79) fails, it follows by the definition A2.74) of В that \Sm - S^] < В and |?л| < В. If 0 < i < / = /l5 then |^| < Ax. Hence A2.77) holds. If / = lx < i < h, then Рте $i\ ^ Pa-1 — ^il + Рте "Л-ll ^ ^1 + -^5 if h < i < 12, then \Sm - S,\ < \St - Sh\ + 1^1 + \Sm - S^l <A2 + IB; if /2 < i < m, then \Sm - 5J < A2. Hence A2.78) holds. If A2.80) fails instead of A2.79), then A2.77) and A2.78) hold with / = /2. This proves A2.76). For positive numbers Ao and Xx adding to X, A2.76) implies A2.81) ?{Nm >X}< ?{AX > Xo} + ?{A2 > Xo} + P{B > %XX}. Applying the inequalities A2.72), A2.73), and A2.75), we get ..2a
Fluctuations of Partial Sums 101 now A2.39) implies \l/d which A2.68) follows by A2.69), which completes the proof. The hypotheses of Theorem 12.5 are satisfied if the ^ are independent, J = 0, E{^2} = щ, у = 2, and a = 1. In this case, the ?f and the щ also satisfy the stronger hypotheses of the following theorem and hence its stronger conclusion. If all but one of the variances щ vanish, then ?{M"m > A} = 0 for A > 0; although the right side of A2.65) is positive, the right side of A2.83) vanishes. THEOREM 12.6 If у > 0 and a > \, and if A2.82) P{|S, - St\ > A, \Sk - S,| > A} < -W 1 uX( 2 Л Ky\i<i<i ) \}<i<k ) for all positive A, then, for all positive A, A2.83) p{m; > a} < S« (Ml +... + и j*. min [i ъ T А У l<A<mL «! + '¦"+ Um J where K'^ depends only on у and a. The moment form of A2.82), which implies it, is the inequality A2.9) with which the whole discussion began. Proof. Choose h to minimize the final factor in A2.83); write и = Mi + • • • + мт, p = (i*! + • • • + ma_i)/u, ph = uhju, and ^ = (uh+1 + • • • + um)lu. As in the preceding proof, define Ax by A2.70), A2 by A2.71), and В by A2.74). By A2.82), u2a u2a P{M0, Л - 1, m) > A} < ^ pa(pA + ?)a < and there is a similar inequality for each of the other ^a's in A2.74). Therefore Write К = К'у'а. Since A2.82) implies A2.64), it follows by Theorem 12.5 (in the stronger form A2.68)) that and >Я}<
102 The Space С Now A2.76) holds just as before, and hence ?{Nm > A} < P{A > Щ + P{A2 > |A} + P{B > Combining this inequality with the three preceding ones, we arrive at > X] ^ And now A2.83), for an appropriate Щ'л, follows easily by A2.67). Remarks. The results of this section, new as such, stem from the work of Kolmogorov (see Slutsky A937)) and Chentsov A956). PROBLEMS 1. Do Problem 1 of Section 10 once more, using A2.19) in place of A0.7). 2. If we weaken A2.42) by assuming only that the inequality holds for the special pair of indices i = 0, j = m, but compensate by assuming that Sx,. . . , Sm forms a martingale and that у > 1, then [Doob A953, p. 314)] ?{Mm >Л}<(и1+---+ итУЧХ\ an in- inequality of essentially the same strength as A2.44). 3. Assume ?15. . . , fTO independent with mean 0 and finite moments of order 2k. Use the multinomial theorem to find a constant Ck, depending on к alone, such that Generalize A2.48). 4. From A2.12) deduce that for positive s 2у-?} ^ *> 5. Adapt the proof of Menshov's inequality [Doob A953, p. 156)] to show that, if A2.10) holds with a > \ and у > h, then dog2 2mfy{Ul + ¦¦¦+ K Now deduce that, if A2.42) holds with a > 1 and у > 1, then E{Mm?} ^ (log2 4m)? («, + •••+ Um (If у = 2 and a = 1, this gives Menshov's inequality again.) 6. Use Theorem 12.2 to show that, if f1; f2, . . . satisfies \< ( I uX, o<;/ with у > 0, a > 1, and ?иг < oo, then ??г converges with probability 1. Find a result connected in an analogous way with Theorem 12.6. 7. Theorem 12.3 is still true if we replace A2.50) by the assumption that A2.84) P{\Xn(t) - Xn{tJ\ > X, \Xn(t2) - Xn(t)\ >$^Yy holds for /j < / < t2 and for all n, where now a > ^.
Empirical Distribution Functions 103 8. Theorem 12.4 is false if A2.58) is weakened to the analogue of A2.84); it is also false if the assumption a > 1 is weakened to a > 1. [Suppose /л( is a unit mass at 0 or at 1 accord- according as / < i or / > ?.] See, however, Problem 4 in Section 15. 9. Let ' / \ k-\ к - 1 1 for </< + — n n In *t*0) = { (k \ к - 1 1 к 2( for +—<*<- n 2n n elsewhere, and let Xn take the value xnk with probability 1/л, к = 1, . . . , п. Then for all n, tlt and t2. Since {Xn} is not tight, we cannot take a = 1 in Theorem 12.3. Similarly, we cannot take a = \ in Problem 7. 13. EMPIRICAL DISTRIBUTION FUNCTIONS Let |l5 |2, . . . be random variables, on some (Q, ?$, P). We shall assume that A3.1) 0<|та(ш)^1, which can always be arranged by a transformation. The empirical (or sample) distribution function FJt, со) corresponding to the points tjiico), . . . , }n(co) is defined, for 0 < t < 1, as Xjn times the number of i < n for which Ij(co) < t. If the ?n are independent and have a common distribution function F(t) (by A3.1), only values of fin [0, 1] matter), then, for large n, Fn(t, со) should approximate F(t)—the difference A3-2) Fn(t, со) - F(t) should be small. According to the Glivenko-Cantelli theorem, A3.3) sup \Fn(t, со) - F(t)\ converges to 0 with probability l.f If F is continuous, it is possible to find the limiting distribution of A3.3), properly normalized; this limit theorem, due to Kolmogorov, stands to the Glivenko-Cantelli theorem as the central limit theorem does to the law of large numbers. We shall consider in this section only the case F(t) = t, so that the ?n, assumed independent, have a common uniform distribution over [0, 1]. f Loeve A960, p. 20).
104 The Space С Kolmogorov's result is that A3.4) P{w:supt \Jn(Fn(t, со) - t)\ < a} — 1 - 2f (-l)fc+V2fc2' We shall derive A3.4) by using the theory of weak convergence in C. Consider the function Yn(co) with value a > 0. A3.5) Yn(t, со) = y/n(Fn(t, со) - t) at t. Although Yn{co) is a function on [0, 1] produced at random, it is not an element of C, being obviously discontinuous. If we could establish the convergence in distribution of Yn as a random element of an appropriate space of discontinuous functions with an appropriate metric, we would be in a position to derive A3.4) by the tech- techniques of Sections 10 and 11, because / / 1 1 » 1 у supt \Jn(Fn(tt со) - t)\ = h(Yn(co)) with h(x) = supj \x(t)\. Although in the next chapter we shall analyze Yn as a random element of a metric space of discontinuous functions, here we shall circumvent the discontinuity problems by adopting a different defi- definition of empirical distribution function. (We shall still be able to derive A3.4) itself—not some perturbation of it.) Let Gn(t, со) be, as a function of t ranging over [0, 1], the distribution function corresponding to a uniform distribution of mass (n + I) over each of the (n + 1) intervals [?«_!)(<»), ?(i)(<*>)], where |@) = 0, |(n+1) = 1, and |A), . . . , |(n) are the values ?l5 . . . , |n ranged in increasing order. (Note that Fn(t, со) corresponds to a distribution of mass rr1 to each of the n points f!(<»), . . . , ln(w).) The functions Fn(t, со) and Gn(t, со) are close: A3.6) \Fn(t, со) - Gn(t, co)\ < - , n 0<t Now let Zn(co) be the element of С with value 03.7) Zn(t, со) = Jn(Gn(t, со) - t) at t. Since each Zn(t) is a random variable, Zn is a random element of С (Z~^ с <%). By A3.6), we have A3.8) supt | Yn(t, со) -Zn(t,co)\< -L
Empirical Distribution Functions 105 THEOREM 13.1 If the |n are independent and uniformly distributed on [0, -1], and ifZn is defined by A3.7), then A3.9) Zn^W°, where W° is the Brownian bridge. Before proving the theorem, let us see how it implies A3.4). If h(x) = supt |#@l> then h is continuous on С and hence A3.9) implies h(Zn) ^>- h(W°). By A1.39), P{supt \Zn(t)\ < a} - 1 - 22(-l)fc+V2fc ', a > 0, fc=i so that A3.4) follows from A3.8) and Theorem 4.1. Using A1.40), we can in the same way derive the relation A3.10) P(Vn supt (Fn(t, со) - t) < a} -+ 1 - e~2a\ a > 0. If h(x) is the Lebesgue measure of the set of t for which x{t) > 0, then, by A1.42), A3.11) P{h(Zn) < a} ^ a, 0 < a < 1; /z(Zn) is approximately n~x times the number of г, 1 < i < n, for which Proof. To prove A3.9), we first show that the finite-dimensional distributions of Zn converge to those of W°. Let Un(t, со) = nFn(t, со) be the number of points among ^(w), . . . , !n(w) that satisfy |Дш) < Л If 0 = Г0<^< ••• <tk= 1, then the random variables A3.12) are multinomially distributed with parameters n and pt = t{ — t{_x, i — 1, 2, . . . , k. It follows by the central limit theorem for multinomial trials that the random vector with components A3.13) YM ~ Ш-i) = \ ((UM - ип{П_,)) - nPi), i == i,z, . . . , к, converges in distribution to the random vector with components W°. — W° (by (9.14) and (9.15), these normally distributed random variables have the right variances pt(l — Pi) and covariances —PiP,)- Because of A3.8), the same is true if we replace A3.13) by Zn(ri_1), /=1,2,...,*.
106 The Space С The finite-dimensional distributions of the Zn thus converge properly. If we prove {ZJ tight, A3.9) will follow. By Theorem 8.3, it suffices to show that, for each positive ? and r\, there exists a <5, 0 < <5 < 1, and an n0 such that, if n > n0, then A3.14) P( sup \Zn(s) - Zn(t)\ > e\ < dr] \t<s<t+d j holds for all t. Because of A3.8), we may replace A3.14) by A3.15) P( sup \Yn(s) - Yn(t)\ >s)< erj. \t<s<t+d j The event whose probability appears in A3.15) is easily shown to lie in the c-field generated by |l5 . . . , |n; A3.15) is simply an inequality involving |l5 . . . , |та—we have not embarked on an analysis of Yn as a random element of a space of discontinuous functions. For notational convenience, we shall take t = 0 in A3.15): A3.16) ( Since the distributions of the increments of Yn(t) are stationary, this is really no restriction anyway. We shall bound the probability in A3.16) by using Theorem 12.1. Let us show that A3.17) i) ~ Yn(s)\* • \Yn(s +pi +p2) - Yn(s +^)|2} < 6Plp2. For 1 < i < n, let сх.г be 1 — pr or — px according as ?j lies in (s, s + Рг\ not, and let /^ be 1 — /?2 ог -~/>2 according as ^ lies in E + Pi, s + />i or not. Then A3.17) is equivalent to A3.18) e|B;< Since the |j are independent, so are the random vectors (ai5 &). Since |f is uniformly distributed, (ai5 /Зг) takes on the values A — p1} —p2), (~Pi, 1 — Pz), and (—/Ji, — /?2) with respective probabilities /?15 /?2, and /?3 = 1 — />! — рг. Now E{aJ = E{/3J = 0, and considerations of symmetry lead to 1аг) I A) = "^{а!2^2} + и(я - 1)Е{а12}Е{^22} + 2n(n -
Empirical Distribution Functions 107 4ow A3.18) follows from Е{а12Д12} = ^A - PlyPi* + p2px2(l - p2J + PzPiW < IPiP*, E{a12}E{j322} = px{\ — j91)j92(l — P2) < P\p2, ind For fixed E, consider the variables Л „ /i - 1 13.19) m i = 1,. . . , m. [f у = 2 and a = 1, and if щ = в^ё/т, then, by A3.17), the hypotheses of Theorem 12.1 are satisfied, with the variables A3.19) in the role of the |t- in that theorem. If Ml = max min it follows that ^13.20) with K= K21. If then, by A2.4), From this and A3.20) we have Mm = max A3.21) ?{Mm >e}< Now, for each со, Yn(s, со) is right-continuous in s. As m —>¦ 00, therefore, Mm converges to sups<5 | Yn(s, co)\ for each со. Hence A3.21) implies A3.22) P sup |ВД| > e < s<d d* + P\\Yn(e)\ > * • Because of the asymptotic normality of A3.13), Yn(d) n —»- 00 F fixed), and hence — <5)iVas V<3A - <5I ? E{iV4} =
108 The Space С For n exceeding some nd, therefore, hence, by A3.22), A3.23) Pfsup |УЯE)| > e) < [ Given ? and rj, choose д so that 6 • 2*(K + l)<52/?4 < drj. For n > nd, A3.16) follows by A3.23). This completes the proof of Theorem 13.1. The treatment is evasive in that we really analyze Yn, replacing Yn by Zn only in order to stay in C. In Section 16 we treat Yn in the space natural to it and prove Yn-^> W°, and generalizations, in that space. Although the proofs in Section 16 depend on a considerable body of theory, they are much more transparent than the one just given. Remarks. Doob A949) conceived the idea of proving A3.4) by passing from the random function A3.5) to the Brownian bridge; Donsker A952) justified the passage. The proof here derives from Chentsov A956). Kac A949) originally proved A3.11). See Darling A957) for an account of limit theorems connected with A3.5) and for a large bibliography. See also the more recent papers: Bickel A968a and 1968b), Birnbaum and Руке A958), Руке A965 and 1968), and Руке and Shorack A968).
CHAPTER 3 The Space D 14. THE GEOMETRY OF D The space С is unsuitable for the description of processes that, like the Poisson process and unlike Brownian motion, must contain jumps. In this chapter we study weak convergence in a space that includes certain discontinuous functions. The Space D Let D = D[0, 1] be the space of functions x on [0, 1] that are right- continuous and have left-hand limits: (i) For 0 < t < 1, x{t+) = limsi< x{s) exists and x(t+) = x(t). (ii) For 0 < t < 1, x(t—) = lims|t x{s) exists. A function x is said to have a discontinuity of the first kind at t if x(t—) and x{t+) exist but differ and x(i) lies between them. Any discontinuities of an element of D are of the first kind; the requirement x{t) — x(t+) is a con- convenient normalization. Of course, С is a subset of D. For xeDandroc [0, 1], put A4.1) wx(T0) = sup {\x(s) - X(t)\ :s,tE To}. The modulus of continuity of x, defined by (8.1), may be expressed as A4.2) wx(d)= sup wx[t,t + d]. 0<t<l-d A continuous function on [0, 1] is uniformly continuous. The following lemma gives the corresponding uniformity idea for elements of D. 109
110 The Space D LEMMA 1 For each x in D and each positive s, there exist points t0, tx, . . . , tT such that A4.3) 0 = t0 < tx < ¦ • ¦ < tr = 1 . and A4.4) wj'i-i, ti)<?, /=1,2,..., r. Proof. Let т be the supremum of those t in [0, 1] for which [0, t) can be decomposed into finitely many subintervals [?г_15 tt) satisfying A4.4). Since x@) = z@+), we have т > 0; since x(r—) exists, [0, т) can itself be so decomposed; т < 1 is impossible because x(r) — x(r+) in this case. From this lemma it follows that there can be at most finitely many points t at which the jump (saltus) \x{t) — x{t—)\ exceeds a given positive number; in particular, x has at most countably many discontinuities. It follows also that x is bounded: A4.5) supt \x(t)\ < oo. Finally, it follows that x can be uniformly approximated by simple functions constant over intervals, so that x is Borel measurable. We shall need a modulus that plays in D the role the modulus of continuity plays in C. For 0 < <5 < 1, put A4.6) we'(<3) = inf max wx[tt_lt tt), {tti 0<i<r where the infimum extends over the finite sets {7J of points satisfying f0= to< tx < ••• < tr= 1, A4.7) [ti-ti_1>d, /=1,2, . ..,* Lemma 1 is equivalent to the assertion that A4.8) lim w'Jd) = 0 d-*0 holds for every x in D. Even if x does not lie in D, the definition of w'x{d) makes sense. Just as liin^,, wx(e) = 0 is necessary and sufficient for an arbitrary function x on [0, 1] to lie in C, A4.8) is necessary and sufficient for x to lie in D. Since [0, 1) can, for each <5 < \, be split into subintervals [^_l5 tt) with <5 < tt — tf_x < 2<5, we have A4.9) w'JLd) < wjBd), if <5<i There can be no general inequality in the opposite direction because of A4.8) and the fact that wx(d) does not go to 0 with <5 if x has discontinuities.
The Geometry of D 111 Suppose, however, that x ? C. Given e, choose points {t{} satisfying A4.7) and A4.10) max wx [h_x, tt) < wx(8) + e. 0<г<г If \s — t\ < 6, then s and t lie in the same subinterval [^_i, tt) or in abutting ones, and it follows by A4.10) and the assumed continuity of x that Wx(<5) < 2w'x(S) + 2s. Since e was arbitrary, A4.11) wx(d) < 2w'x(d) if xeC. By A4.9) and A4.11), the moduli wx(d) and w'x(d) are essentially the same for continuous functions x. The Skorohod Topology Two functions x and у are near one another in the uniform topology used for С if the graph of x(t) can be carried onto the graph ofy(t) by a uniformly small perturbation of the ordinates, with the abscissas kept fixed. In D, we shall allow also a uniformly small deformation of the time scale. Physically, this amounts to the admission that we cannot measure time with perfect accuracy any more than we can position. The following topology, devised by Skorohod, embodies this idea. Let Л denote the class of strictly increasing, continuous mappings of [0, 1] onto itself. If I ? Л, then Л@) = 0 and ЛA) = 1. For x and у in D, define d{x, y) to be the infimum of those positive e for which there exists in Л а Я such that A4.12) supt|Af-f| <e and A4.13) sup, И0 - y{lf)\ < e. By A4.5), d(x, y) is finite (take It = i). Clearly, d(x, y) > 0; and d(x, y) = 0 implies that for each t either x{t) = y(t) or x(t) = y(t—), which in turn implies x = y. If Я lies in Л, so does k~x\ d(x, y) = d(y, x) follows from supj \Х~Н — t\ = sup; \Xt — t\ and sup, \x{X-H) - 2/@1 = sup, \x{t) - y(Xt)\. If Ax and A2 lie in D, so does their composition ХгК\ the triangle inequality follows from sup, |AaV - t\ < sup, \Xxt — t\ + sup, |V - * I and sup, \x(t) - z{X^t)\ < sup, \x(t) - y{Xxt)\ + sup, \y(t) - 2(V)I- Thus d is a metric.
112 The Space D This metric defines the Skorohod topology. The uniform distance between x and у may be defined as the infimum of those positive e for which sup4 \x(t) — y(f)\ < e. The A in A4.12) and A4.13) represents the uniformly small deformation of the time scale mentioned above. Elements xn of D converge to a limit x in the Skorohod topology if and only if there exist functions Xn in Л such that limn xn(Xnt) = x(t) uniformly in t and lnt = t uniformly in t. If xn goes uniformly to x, then there is convergence in the Skorohod topology (take Xnt = t). On the other hand, there is convergence in the Skorohod topology, whereas xn(t) —> x(t) fails in this case at t = |. Since Skorohod convergence does imply xn(t) —> x(t) for continuity points t of x, and if x is (uniformly) continuous on all of [0, 1], then Skorohod con- convergence implies uniform convergence. In particular, the Skorohod topology relativized to С coincides with the uniform topology there. The space D is separable; one countable, dense set consists of those x that have a rational value at t = 1 and have, for some integer k, a constant, rational value over each subinterval [(/ — l)/k, i/k), 0 < i < к (use Lemma 1 and the definition of d). But D is not complete under the metric d: If A4.14) then A4.15) d(xn, xj = n m and hence {xn} is fundamental in the metric d, even though it is not con- convergent. We shall introduce in D another metric d0—a metric which is equivalent to d (gives the Skorohod topology) but under which D is complete. Completeness facilitates characterizing compact sets. The idea in defining d0 is to require that the time-deformation A that intervenes in the definition of d be near the identity function in a sense which at first appears more stringent than A4.12); namely, we require that the slope {It — ls)\{t — s) of each chord be nearly 1 or, what is the same thing and analytically more convenient, that its logarithm be nearly 0. If Я is a nondecreasing function on [0, 1] with АО = 0 and Al = 1, put A4.16) ||Я|| = sup log It- Is t — s
The Geometry of D 113 If ||Я|| is finite, then the slopes of the chords of X are bounded away from 0 and infinity; therefore it is both continuous and strictly increasing and hence is a member of Л. Although \\X\ may be infinite even if X does lie in Л, such elements of Л do not enter into the following definition. Let do(x, y) be the infimum of those positive e for which Л contains some Я with A4.17) ЦАЦ < е and A4.18) sup, HO - y{Xt)\ < e. By A4.5), do(x, y) is finite (take Xt = i). That d0 is a metric follows from the relations A4.19) ||A-i|| = and A4.20) ЦЯЛ11 < Ш\ + 11А.Ц. We shall show presently that d and d0 are equivalent metrics. We shall also show that D is complete under d0; for the present, note that, if xn is defined by A4.14), then A4.21) do(xn, xj = minjl, log^ j, m, n > 3, so that at least the (nonconvergent) sequence {xn} is not J0-fundamental. If do(x, y) < e, then A4.17) and A4.18) hold for some X eA. If e < J, then, since АО = О, log A - 2e)< -e < log j < s < log A +2e),f so that \Xt — t\ < 2s. Therefore A4.22) d(x,y)<2do(x,y) if do(x,y)<l A comparison of A4.15) and A4.21) shows that there can be no general inequality in the direction opposite to A4.22). The following lemma shows, however, that do(x, y) is small if d(x, y) and w'x{8) (or w'y(d)) are both small. LEMMA 2 Ifd(x, y) < E2, where 0 < д < ?, then do(x, y) < 46 + w'x(d). Proof. Choose points tt satisfying A4.7) and A4.23) wj^, U) < w'xF) + 6, i = 1, 2,.. ., r. t If M < i> then s - s2 < log A + s), whereas log A + s) <, s for all s > — 1.
114 The Space D Now choose from Л а [л such that A4.24) sup, \x{t) - y(jAt)\ = supt \x(p-4) - 2/@1 < & and A4.25) sup, \fjLt -t\< d\ We want to define in Л а Я that will be near /л but will not, as [л may, have chords with slopes far removed from 1. Take X to agree with /л at the points tt and to be linear in between. Since the composition yrxX fixes the tt and is increasing, t and [л~гХ1 always lie in the same subinterval [*,-_!, tt). By A4.23) and A4.24), therefore, < \x(t) - хСи-^AOI + H^Xt) - у(Щ б + б2 < 4д It is now enough to prove ||Я|| < 46. Since X agrees with [л at the t{, it follows by A4.25) and the inequality tt — гг_± > б that 2d(tt - U_x). From the polygonal character of X, it now follows that \{Xt- Xs)- (t-s)\ <2d\t-s\ holds for all s and t. Therefore log A - 26) < log ^-^ < log A + 26); t — s since б < I, \\X\\ < 46 follows. THEOREM 14.1 The metrics d and d0 are equivalent. Proof. Let us denote an open J-sphere by Sd(x, s) and an open ^-sphere by Sdo(x, s). It follows by A4.22) that inside an arbitrary Sd(x, s) we can find an Sdo(x, 6). (The choice of the new radius does not depend on the center x.) Lemma 2 implies that, if A4.26) б < J, 46 + w'x(d) < e, then Sd(x, E2) c: Sd (x, e). Given x and e, we can, by A4.8), find а д satisfying A4.26). Inside an arbitrary J0-sphere we can thus find a J-sphere with the same center. (This time the choice of the new radius does depend on the center x, as must be true if D is to be complete under d0—if d and d0 are to give different classes of fundamental sequences.) Therefore d and d0 are equivalent metrics.
The Geometry of D 115 Completeness of D THEOREM 14.2 The space D is complete in the metric d0. Proof. It is enough to show that each ^-fundamental sequence contains a 4,-convergent subsequence. If {xk} is a ^-fundamental sequence, it contains a subsequence {yn} = {xkn} such that A4-27) do(yn, yn+1) < 1. We shall prove that {yn} is ^-convergent to some limit. By A4.27), Л contains a /j,n such that A4.28) ^Pt\yn(t)-yn+1(^nt)\<~-n and A4-29> . . ^ This implies, for m > 1, SUP« l^n+m+lPn+m ' ' ' P = suPs \fxn+m+1s - s\ < — . (Here /un+m • • • /un+1/un denotes iterated composition.) For fixed n the func- functions A4-30) are thus uniformly fundamental as m —> oo. Therefore A4.30) converges uniformly to a limit A4.31) V = limTO /лп+т • • • The function Xn must be continuous and nondecreasing and must satisfy Яи0 = 0, АЯ1 = 1. If we prove that ||AJ| is finite, it will follow that Xn is strictly increasing and hence is a member of Л. By A4.20), i ftn+m ftn+lftnt f^n+ t — S -2n-l Letting m -*- oo in the first member of this inequality, we find that ||An|| < 1114'1; in particular, ||AJ| is finite and ln ? Л. By A4.31), ln = Хп+Фп. Therefore, by A4.28), ¦ у Л 1 .4 У Л 1 .41 I S 4 • N.I-J-
116 The Space D In consequence, the functions yn{X~H), which are elements of D, are uniformly fundamental and hence converge uniformly to a limit function x(t). It is easy to show that x must also be an element of D. Since supfK(V0-x@|-^0 and IIAJI-j-O, we have do(yn, x) -> 0, which completes the proof. Compactness in D We turn now to the problem of characterizing the compact subsets of D. With the modulus w'x(8) defined by A4.6), we have an analogue of the Arzela-Ascoli theorem (p. 221). THEOREM 14.3 A set A has compact closure in the Skorohod topology if and only if A4.32) sup sup \x(t)\ < oo xeA t and A4.33) lim sup w'x{d) = 0. d-*0 xeA Note that, because of A4.9), A4.33) is weaker than A4.34) lim sup wx(d) = 0. d-*0 xeA Proof. To prove the sufficiency of these conditions, it is enough, since D is ^-complete, to show they imply that A is totally bounded with respect to d0. Let us first show that A4.32) and A4.33) imply that A is totally bounded with respect to the metric d. Given a positive e, choose an integer к such that I/A: < e and w'x{\jk) < e for all x in A. Take Я to be a finite e-net in the linear interval [—a, a], where a = sup sup \x(f)\. xeA t Let В be the finite set of у in D that assume on each of the intervals [(u — \)jk, ujk) a constant value from H and satisfy y(\) ? H. We shall show that В is a 2e-net with respect to d. Given x in A, use the inequality w'x{ljk) < e to choose points tit 0 = fo < h < • • • < tr = 1, such that A4.35) U - и_, > \ к and A4.36) wj^, u) <s, 0<i<r. Choose integers щ such that ujk < t{ < (wf + l)/k, i = 0, 1, . . . , r. Since
The Geometry of D 117 he щ must be distinct by A4.35), there is in Л а Я that carries ujk to tt .nd is linear in between these points. Choose in В a point у such that 14.37) >ince an interval nust be inside one of the intervals :he function x{Xt) cannot, by A4.36), vary by more than eas/ varies over an nterval [и/к, (и + 1)/^)- Since у is constant over intervals of this form, 'Д4.37) implies \y(t) — x(Xt)\ < 2e for all t. By construction, . щ к h-- и, ~k <7<e; к by linearity, sup^ \Xt — t\ < e. Thus d(x, y) < 2e. Therefore В is a 2e-net in the sense of d. Given a positive rj, choose д so that 0 < д < ? and so that 46 + w'x(d) < 77 holds for all x in A. Then choose e so that 0 < 2e < d2. If В is the set just constructed—a finite 2e-net for A in the sense of the metric d—then, by Lemma 2, В is an ??-net for A in the sense of the metric du. Thus A is d0- totally bounded and hence, since D is J0-complete, the closure A~ of A is compact. This proves the sufficiency of A4.32) and A4.33). If A~ is compact, then it is bounded, and A4.32) follows immediately (note that supf \x{t)\ is the distance, in the sense either of d or of d0, from x to the function that is identically 0). The present theorem differs from the Arzela-Ascoli theorem in that for no single t do A4.38) sup < 00 and A4.33) together imply A4.32). It is not hard, however, to prove that A4.32) follows if A4.33) holds and if A4.38) holds for each individual value of t (or even just for values of t including 1 and dense in [0, 1]). It remains to prove the necessity of A4.33). By A4.8), w'x(d) goes to 0 with д for each x. If we prove that w'x(d) is upper semicontinuous in x for each d, then (see p. 218) the convergence will be uniform on compact sets and A4.33) will follow.
118 The Space D Let x, 6, and e be given. To prove upper semicontinuity, we must find an 7] such that d(x, y) < rj implies A4.39) w&d) < w'x(d) + e. First choose points tt satisfying A4.7) and A4.40) wj^_b tt) < w?d) +- i = 1, 2,. . ., r. Now choose rj so small that ti-ti_1>d + 2rj, i=l,2,...,r, and A4.41) rj < Is. If d{x, y) < rj, then, for some X in Л, we have supt \Xt — t\ = supt \X-~4 — t\ < rj and A4.42) sup, \y(t) - x{Xt)\ < rj. Let St = Х'Чг. Then A4.43) Si - s^ >U- tt_x -2rj>d. Moreover, if s and t both lie in [5»_19 5^), then Я5 and Xt both lie in [^_l5 tt) and therefore, by A4.40), A4.41), and A4.42), \y(s) - y(t)\ < <(б) + в. Thus wjs^b s,) < w^C) + e, which, in view of A4.43), implies A4.39). This completes the proof of Theorem 14.3. A Second Characterization of Compactness Although the modulus w'(d) is natural in that it leads to a complete charac- characterization of compact sets, a different one is sometimes more convenient to work with, namely A4.44) W'x{d) = sup min{H0 - x{h)\, \x(t2) - x(t)\}, where the supremum extends over tlt t, and t2 satisfying A4.45) ti < t < t2, t2 - ^ < д. Given д and e, decompose [0, 1) into subintervals [^_15 s^ such that si — si-i > ^ and Wj.^-!, st) < w'x(d) + e. If A4.45) holds, then either t± and t2 He in the same subinterval [5<_ь ^), in which case \x(t) — ж(?1)| < w^.F) + e and \x(t2) — ж(?)| < w'x(d) + e, or else they lie in abutting intervals
The Geometry of D 119 'i-i> sd and IAj'Vh)' ш which case \x(t) — х(^)\ < wx(d) + e for < > < ^ and |z(f2) — a;(f)| < wxF) + s for st < t < t2. Therefore 14.46) W'x(d) < Wx(d). There can be no inequality in the other direction. If, for example, A for 0<K n~\ 14.47) xn(t) = lO for n-^<t< 1, rif @ for 0 < t < 1 - n-1, 14.48) *я@ = (l for 1-rKK 1, tien wXn(d) = 0, whereas , ... @ if б < и, W-E) = (l if 6>«- dnce {жэт} has compact closure in neither case, there can be no compactness ondition involving (in addition to A4.32)) a restriction on w'x(d) alone. It is •ossible, however, to formulate a compactness condition in terms of w'x(d) .nd the behavior of x near 0 and 1. THEOREM 14.4 A set A has compact closure in the Skorohod topology f and only if 14.49) supsup|a;@| < oo xnd 'lim sup W(8) = 0, <5-»0 lim sup wx[0, 6) = 0, d-*0 xeA lim sup wjl — 6,1) = 0. 14.50) proof. It suffices, in view of Theorem 14.3, to show that A4.50) is equivalent :o 14.51) lim sup w'(<3) = 0. d->0 xeA That A4.51) implies A4.50) follows by A4.46) and the definition of w'(d). We prove the reverse implication. Given a positive e, choose a positive б such that, for all x in A, A4.52) иДО) < e, wJO, 6) < e, W(>[1 - Л, 1)< e. Assume that ж lies in A; we shall show that A4.53) m/Q<5) < 6e, which will suffice to prove the theorem.
120 The Space D Let us first show that A4.54) h<s <t<t2, t2 - tx < 6 implies A4.55) min {|x(j) - x(tl)\, \x{t2) - x(t)\} < 2s. Indeed, if \x(s) — x(tx)\ > s, then, by A4.52), \x(t) — x(s)\ < s and \x(t2) — x(s)\ < s, so that \x(t2) — x(t)\ < 2e. Suppose x has a jump exceeding 2e at each of two points т1 and r2. If 0 < r2 — TX < 6, then there exist points f1} s, t, t2 satisfying A4.54), t1 < r1 = 5, and ? < r2 = ?2; if t1 is close enough to rx, and if ? is close enough to т2, then A4.55) is violated, which we have seen is impossible. Thus [0, 1] cannot contain two points, within д of each other, at each of which ж jumps by more than 2e. And, by A4.52), neither [0, d) nor [1 — d, 1) can contain a point at which x jumps by more than 2s. Thus there exist points si} with 0 = s0 < s1 < • • • < sr = 1, such that ^ ~ si-i >: <5 and such that any point at which x jumps by more than 2e is one of the st. If s§ — s^ > б for a pair of adjacent points, enlarge the system {^J by including their midpoint. Continuing in this way leads to a new, enlarged system s0, . . . , sr (with a new r) that satisfies i<3 < st - ^_! < б, г = 1, 2, . . . , r. Now A4.53) will follow if we show that A4.56) щЬы, s^ < 6e for each i. Suppose ^_х < tx < ?2 < 54. Then ?2 — t1 < 6. Let (Ti be the supremum of those a in [tx, t2] for which supii<u<CT \x(u) — ж(^)| < 2s; let or2 be the infimum of those a in [tx, t2] for which supa<u<<2 \x(t2) — x(u)\ < 2s. If o1 < cr2, there exist points s just to the right of аъ with |жE) — ж(^)| > 2e and there exist points t just to the left of a2 with \x(t2) — ж(/)| > 2s; since we may ensure s < t, this contradicts the fact that A4.54) implies A4.55). Therefore a2 < alt and it follows that \x{o1 —) — x(t^)\ < 2e and |ж(?2) — x(a^)\ < 2e. Since t1 <. <y1 < t2, ax is interior to (^_15 st), so the jump at a-L is at most 2s. Thus |ж(?2) — x{t^)\ < 6e. This establishes A4.56), which proves A4.53) and the theorem.f Finite-Dimensional Sets Finite-dimensional sets play in D the same role they do in C. For points tu . . . , tk in [0, 1], define the natural projection TTtv,.tk from D to Rk as t The points 0 and 1 play a special role in the theory of the space D because each X in Л fixes them. Judicious enlargement of Л might simplify the theory.
The Geometry of D 121 usual: A4.57) ?rtl...tk(x) = {<h),...,x(tk)). Now tt0 and ттх are everywhere continuous. Suppose 0 < t < 1. If points xn converge to x in the Skorohod topology and x is continuous at t, then (p. 112) xn(t) —>¦ x(t). Suppose, on the other hand, that x is discontinuous at t. If Xn is the element of Л that is linear on [0, t] and on [t, 1] and satisfies Xnt = t — \jn, and if xn{s) = x(kns), then xn converges to x in the Skorohod topology but xn(t) does not converge to x(t). Thus: If 0 < t < 1, then vt is continuous at x if and only if x is continuous at t. We must prove that irti...tk *s measurable with respect to the cr-field 2 of Borel sets for the Skorohod topology. We need consider only a single time point t (since a mapping into Rk is measurable if each component mapping is), and we may assume t < 1 (since тгх is continuous). If xn converges to x in the Skorohod topology, then xn(s) -> x(s) for continuity points s of x and hence for points s outside a set of Lebesgue measure 0. Since the xn are uniformly bounded, A4.58) - zn(s)ds-^- x(s)ds (n -+ oo) e Jt e Jt for each positive s. Thus he{x) = e J*+? ж(,у) ds is continuous in the Skorohod topology. By right-continuity, he(x) -> ттг(х) for each x as e -> 0. Thus тгг(а;) is measurable. Having proved тг^...^ measurable, we may, as in C, define the finite- dimensional sets as sets of the form тгг1 , H with к > 1 and // e «^fc. Each finite-dimensional set lies in 2 by the definition of measurability. If To is a subset of [0, 1], let J^,o be the class of sets ttj^^H, where к is arbitrary, the tt are arbitrary points of To, and Я e Mk. Then J^,o is a (finitely additive) field. Of course, J7^ n is the class of finite-dimensional sets. THEOREM 14.5 IfT0 contains 1 and is dense in [0, 1], then 3FT generates 9. These conditions are also necessary in order that J^y generate 3), although we shall not need this fact. Taking To = [0, 1] makes the theorem no easier to prove. Proof. Since D is separable, it suffices to show that each open <io-sphere Sdo(x, r) lies in the a-ueld generated by ^Tq. Fix the center x and radius r. Choose in To a sequence tlf t2, . . . that is dense in [0, 1], with tx = 1. For 0 < e < r and к > 1, let Ak(e) consist of those у for which there exists in Л а Я satisfying IIAll < r - e
and max \y(ti) — x(Xtf)\ < r — s. 1<г<к It is enough to prove that A4.59) Sdo(x, r) = U C\Ak(e), E fc=l where the union extends over the rational e in @, r) and to prove that each Ak(s) lies in J^. We prove the second fact first. For fixed e and k, let Нг be the set of points (x(Xt^), . . . , x(Xtk)} in Rk, where X ranges over those functions in Л satisfying ||A|| < r — e. Let H2 be the set of points (al5 . . . , afc) in Rk such that \x{ — &| < r — e, i = 1, . . . , k, for some (/?l5 . . . , Ck) in J^. Then /f2 is open and hence lies in Mk, and Ak(e) = ttj^ tjH2. Thus Ak(s) g^Tq- It is easy to see that the left member of A4.59) is. contained in the right member. We complete the proof by showing that A4.60) П Ak(e) cz Sdo(x, r). /c=i If у lies in the intersection on the left, choose for each к a Xk in Л with A4.61) ||AJ <r-e and A4.62) max \уAг) - х(Хк1г)\ < г - e. 1<г<к By Helly's selection theorem (p. 227), there is a subsequence {Xk>} and a nondecreasing function X such that A4.63) limfc. Xk,t = Xt holds for continuity points t of A. We shall show that X еЛ, that ||A|| < r — e, and that supt |y@ — ж(Я/)| < r — e. This will imply do(x, у) < г and prove A4.60). If j and ? are distinct continuity points of X, then, by A4.61), A4.64) log Xt - Xs t — s log Xk.t - Xk,s t — s < r This relation makes a jump in X impossible (in particular, A4.63) holds for all t) and it implies that X is strictly increasing; thus X еЛ. The inequality A4.64) also implies ||Я|| < r - e. By A4.62), \y(tx) - x{t^\ <r - e (recall that tx = 1). If / > 1, then, by A4.62), \y(tt) - х(Хк4г)\ < r - e for k! > i. Since Xvt -> Af, we have either \y(t{) — x{Xt^)\ < r — e or |?/(Гг.) — x((Xtt) — )| < r — e and, since {fj is dense, sup4 |y(f) — a;(Af)| < r — г follows. This proves Theorem 14.5.
Since Jryo is always a field, this theorem implies that, if To contains 1 and is dense in [0, 1], then 3FT^ is a determining class. Not even ^"[0>1] is a convergence-determining class—the counterexamples for С apply equally well here. If P is a probability measure on (D, Of), its finite-dimensional distributions are the measures P^^..tk- If To contains 1 and is dense in [0, 1], then P is completely determined by its finite-dimensional distributions for time points in To. We shall use in D the same conventions about coordinate variables we used in С—the conventions set out at the end of Section 8. Remarks. The topology studied here is the Jx topology of Skorohod A956); see also Kolmogorov A956), Prohorov A956), and Skorohod A961). PROBLEMS 1. Let D+ be the class of functions on [0, 1] that have only discontinuities of the first kind in @, 1) and have right-hand limits at 0 and left-hand limits at 1. Convert D+ into a pseudo- metric space in such a way that (i) x and у are at distance 0 if and only if they agree at their common continuity points and (ii) D is isometric with the standard quotient space [Kelley A955, p. 123)]. 2. The Skorohod topology is finer than that given by the metric J \x(t) — y{t)\ dt. 3. Under the Skorohod topology and pointwise addition of functions, D is not a topo- logical group. 4. The set С is nowhere dense in D. 5. Put w%{8) — sup sup min {wx{t^, t), wx{t, f2)}. ti-t1<d t1<t<ti Show that wx'(d) — sup inf max {wx{t^, t), wx(t, t2)} ti-t1<d t1<t<ti and и?(<5) < <E) ? 2и?(<5). 15. WEAK CONVERGENCE AND TIGHTNESS IN D Prove weak convergence in function space by proving weak convergence of the finite-dimensional distributions and then proving tightness—this was the technique so useful in C, and we want to adapt it to D. Since D is separable and complete under the metric d0, a family of probability measures on (D, Sf) is relatively compact if and only if it is tight, so there is no difficulty on that point. On the other hand, the fact that the natural projections are not continuous complicates matters somewhat. Finite-Dimensional Distributions For a probability measure P on (D, Sf), let TP consist of those t in [0, 1] for which the projection щ is continuous except at points forming a set of
124 The Space D P-measure 0. The points 0 and 1 always lie in TP. If 0 < t < 1, then t e TP if and only if P(Jt) = 0, where A5.1) Jt= {x:x(t) ^ x(t-)} is the set of x that are discontinuous at t—recall (p. 121) that, for 0 < t < 1, TTt is continuous at x if and only if x is continuous at t. An element of D has at most countably many jumps. Let us prove the corresponding fact that P(Jt) > 0 is possible for at most countably many t. Let Jt(e) be the set of x having at t a jump \x(t) — x(t—)| exceeding s. For fixed positive e and д there can be at most finitely many points t for which P(jt(ej) > d, since if this inequality held for a sequence of distinct points t1} t2, . . . , then the set lim supn Jin{^) would have measure at least д and hence would be nonempty, contradicting the fact that for each x the saltus can exceed e at only finitely many places. For a fixed positive e, therefore, P(jt(e)) can exceed 0 for at most countably many t. Since P(Jt(e)) f P(Jt) as e I 0, the result follows. Thus: TP contains 0 and 1 and its complement in [0, 1] is at most countable. If tlf . . . , tk all lie in TP, then Trtl...tk is continuous except on a set of P-measure 0. Suppose A5.2) Pn=>P, where Pn and P are probability measures on (D, Sf)\ by Theorem 5.1, A5-3) W-*=^U holds if all the tt lie in TP. In general, A5.3) will not follow from A5.2) if some tt lies outside TP: If P is a unit mass at the function /[0 i) and Pn a. i / h P hld hil Р~х Р1 fil [0 ) unit mass at /[0>^+n-i), then Pn =>P holds while Рптт~х ^-Рщ1 fails. Let us now prove the analogue of Theorem 8.1. THEOREM 15.1 If {PJ is tight and if Pn^..tk=>Рщ^к holds whenever tlt . . . , tk all lie in TP, then Pn =>P. Proof. By tightness, each subsequence {Pn>} contains a further subsequence {Pn"} converging weakly to some limit Q. It is enough (Theorem 2.3) to show that Q always coincides with P. If 'i h all lie in TP n TQ, we have Pn-'lfTi..t]c=>P'n7i..tk ЪУ tne hypothesis and Pn»TTj^,.tk => C?77^1-.^ because Pn» => Q. Thus A5.4) Pir?.tk = Qt«7"-**> ^ . . . , ffc e Tp П To. Since ГР and Го each contain all but countably many points of [0, 1], the same is true of TP C\TQ\ in particular, this intersection is dense. Since TP П TQ contains 1, it follows by Theorem 14.5 that 2 is generated by the
Weak Convergence and Tightness in D 125 finite-dimensional sets based on time points in TP П TQ. Thus P = Q follows from A5.4), which completes the proof. Tightness The analysis of tightness in С began with a result (Theorem 8.2) which substituted for compactness its Arzela-Ascoli characterization. Theorem 14.3, which characterizes compactness in D, leads in exactly the same way to the following result. Let {Pn} be a sequence of probability measures on THEOREM 15.2 The sequence {Pn} is tight if and only if these two conditions hold: (i) For each positive rj, there exists an a such that A5.5) i\>:suPi KOI > a} < rj, n > 1. (ii) For each positive e and rj, there exist a d, 0 < д < 1, and an integer nQ such that A5.6) Pn{x:<(d)>e}<ri, n > n0. Using Theorem 14.4 in place of Theorem 14.3 gives a second set of conditions for tightness. THEOREM 15.3 The sequence {Pn} is tight if and only if these two conditions hold: (i) For each positive rj, there exists an a such that Pn{x:snpt\x(t)\ >a}<V, n > 1. (ii) For each positive s and rj, there exist a 6, 0 < д < 1, and an integer n0 such that A5.7) Pn{^<F)>e}<rj, n>n0, such that A5.8) Pn{x:wx[0,d)>e}<rj, n > no, and such that A5.9) i\>:wjl - d, 1) > e} < rj, n> n0. In the most interesting cases, A5.8) and A5.9) are automatically satisfied, as the next theorem shows. Let Pn and P be probability measures on {D,3f)\ by the definition A5.1), Л = {x:x(l) ^ z(l-)}.
126 The Space D THEOREM 15.4 Suppose that A5.10) Pn^t-tk^P^Tr-tk holds whenever t1} . . . , tk all lie in TP. Suppose further that Р(Л) = О. Suppose finally that, for each positive e and rj, there exist a 6, 0 < д < 1, and an integer n0 such that A5.11) Pn{x--<V)>e}<r], n>n0. Proof. It will suffice, by Theorem 15.1, to show that {Pn} is tight. We first verify condition (i) of the preceding theorem. Given a positive rj, choose д and n0 satisfying A5.11) with e = 1 (say). Since TP is dense, it contains points tx,. . . , tk, with 0 = /x < • • • < tk = 1, such that tt — ti_1 < 6. From A5.10) it follows that {PnTTj^..tk} is a tight sequence in Rk and hence that A5.12) pJx : max |<OI > a<\ < rj, n> 1, I l<i<fc J for an appropriately chosen a0. If \х(^г)\ < a0, i = 1,. . . , k, and if w"x{b) < 1, then \x{f)\ < a0 + 1 for all t. If a = a0 + 1, then by A5.12) and A5.11) (with e = 1) we have Pn{x:supt \x(t)\ > a} < 2rj for n ^ n0. For the finitely many n preceding n0, we can ensure that this inequality holds by increasing a, each individual Pn being tight. This establishes condition (i). As for condition (ii) of the preceding theorem, we must find, given s and rj, а д and n0 satisfying A5.8) and A5.9). Consider A5.8). Since each x in D is right-continuous, we have A5.13) P{x:\x(d) - z@)| >e}<rj for all sufficiently small д. Choose in the dense set TP а. д small enough that A5.13) holds and small enough that A5.11) holds for an appropriate n0. Since 0 and д both lie in TP, we have Рптт~\ ^Ртт^ as a special case of A5.10), and it follows by A5.13) that A5.14) Pn{x--№)-x@)\>e}<ri for all sufficiently large n. If w"x(d) < e and \x(d) — z@)| < e, then \x(s) - x@)\ < 2e for all s in [0, 6], and hence wJO, 6] < As. Thus A5.11) and A5.14) together imply Pn{x:wx[0, d) > 4e} < 2rj. This takes care of A5.8). . With one change, the symmetric argument works for A5.9). In place of A5.13), we need A5.15) P{x:\x(\) - x(l -6)\>e}<rj
Weak Convergence and Tightness in D 127 for small д. Since an element of D need not be left-continuous, A5.15) will notin general hold; since we have assumed Р(Л) = 0, x is left-continuous at t = 1 except for x in a set of P-measure 0, and A5.15) does hold for all sufficiently small <5. This completes the proof of Theorem 15.4. The condition P(Jx) = 0 is essential: If P is a unit mass at the function /{1} and Pn is a unit mass at / n-i1 then all the finite-dimensional distributions converge and the condition involving A5.11) holds, but Pn does not converge weakly to P. Since lim P{x:\x(l-) - x(t)\ > e} = 0, e > 0, we have Р(Л) = О if and only if A5.16) limP{x:\x(l)- x(t)\ > e} = 0, e > 0. It is usually easy to check the condition Р(Л) =-0 via A5.1,6). Our next result shows what happens if in place of w'x(8) or w'x(d) we use the modulus wx(d) appropriate to C. THEOREM 15.5 Suppose that, for each positive rj, there exists an a such that A5.17) Рп{х:\хф)\>а}<4, » > 1. Suppose further that, for each positive s and rj, there exist a d, 0 < д < 1, and an integer n0, such that A5.18) Pn{x--wx(d)>e}<y, " > n0. Then {Pn} is tight, and, if P is the weak limit of a subsequence {Pn>}, then Proof. Since w'x(d) < wxBd) for д < | (see A4.9)), condition (ii) of Theorem 15.2 is satisfied. That condition (i) of that theorem is satisfied follows easily from the hypotheses here and the inequality KOI < l*@)| + 2 \x(itlk) - x((i - l)tlk)\. Hence {PJ is tight. If wy(%d) > 2e, then у is interior to {x:wx(d) > e}; Pn,^>P therefore implies A5.19) P{y:wy(\d) > 2e} < lim infn, Pn.{x:wx(d) > e}. Given e and rj, choose д and n0 so that A5.18) holds; it follows by A5.19) that P{y:wy(^d) > 2e} < rj. For each k, therefore, there exists a positive dk
128 The Space D such that, if Ak = {y:wy(dk) > I/A:}, thenP(^fc) < I/A:. Put A = lim inffc Ak; then P(A) = 0, and x ф A implies lim^ wx(d) = 0, so that P(C) = 1, which proves the theorem. In applications, the hypothesis A5.18) is generally verified by using the corollary to Theorem 8.3, which carries over to D with no change. Note that, if the n0 in A5.18) is 1 for every e and r), then Pn(C) = 1 for all n. Random Elements of D A random element of D we shall often call a random function. It will always be clear from the context whether a random function under consideration is a random element of С or of D. If X is a mapping from (П, g§, P) into D, then, for each со, X(co) is an element of D whose value at t we denote by X(t, со). For each t, X(t) denotes the real function TTtX on Q.—its value at со is X(t, со). Just as in the space C, X is a random element of D {X~X3) <==¦ ?8) if and only if, for each t, X(t) is a random variable (use Theorem 14.5). A sequence {Xn} of random elements of D is by definition tight when the sequence of corresponding distributions is tight. Each of Theorems 15.1 through 15.5 can be routinely reformulated in terms of random functions. A Criterion for Convergence Let us now combine the results of this section with those of Section 12 to obtain concrete criteria for convergence in distribution. Let Xn and X be random elements of D. Write Tx for TP, where P is the distribution of X. Thus Tx contains 0 and 1, and, if 0 < t < 1, t e Tx if and only if P{X(t) Ф X(t-)} = 0. We shall write w"(X, d) in place of W'x(d). THEOREM 15.6 Suppose that A5.20) (Xn(tl), ..., Xn(tk)) *+ (X(t,),..., X(tk)) holds whenever tlt . . . , tk all lie in Tx; that Ppf(l) Ф Щ—)} = 0; and that A5.21) P{\Xn(t) - XM\ > I, \XM ~ *«@l > *} < Jy №) - F(tl)r for t-l<t<t2 and n > 1, where у > 0, a > ^, and F is a nondecreasing, continuous function on [0, 1]. Then Xn-^-X. There is a more restrictive version of A5.21) involving moments, namely
Weak Convergence and Tightness in D 129 Proof. By Theorem 15.4, it suffices to show that for each positive e and 77 ;here exists a positive д such that A5.22) P{w"(Xn, 6)>e}<r} holds for all n. For fixed т, д, and n, and for m a positive integer, consider the random variables (A ^ ОДЛ ? — Y^ I _J_ _— A1 Y I _J_ ' J5 \ * __ 1 о "\ m / \ m J Let M'm = max min where the maximum extends over 0 < i <y < к < m. By A5.21) and Theorem 12.5, A5.24) ?{M"n > e} < f [F(t + S) - F(r)]fc, where К = К">сс depends only on у and a. Put A5.25) w"(Xn, [t, r + d]) = sup min {]Zn@ - Xn(^)l, !^n(^ - *n@!}, where the supremum extends over t±,t,t2 satisfying r < tx < t < ?2 < т + б. Since Zn is a right-continuous function on [0, 1], letting m -> 00 in A5.24) yields A5.26) P{w"(Xn, [r, r + d]) > e} < * [F(t + Й) - F(r)]fc. e ' Suppose now that д = l/Bu) is the reciprocal of an even integer, and suppose that A5.27) w"(Xn, [lid, Bi + 2)d]) <e, 0<i<u-l, and A5.28) w"(Xn, [Bi + 1N, Bi + 3K]) < e, 0 < 1 < и - 2. If h < ? < ?2 and ^2 ~ ^1 ^ ^ > then ?x and /2 both lie in some one of the 2m — 1 intervals involved in A5.27) and A5.28), so that min {|ЛГЯ(О - Xn(tJ\, \XM - Xn(t)\} < e. Thus A5.27) and A5.28) together imply w"(Xn, 6) ? e. It now follows by
130 The Space D A5.26) that A5.29) P{w"(*n, d) > e} < ^ (Z' + I"), e ' where each of 2' and 2" is a sum of the form with 0 < tx < • • - < tr < 1 and tk - tk_x < 26. But then A5.30) ?{W'{Xn, d)>e}< ±f [F(l) - F@)][w A ey Since 2a > 1 and F is continuous, the right member of this inequality goes to 0 with d, which proves A5.22). Criteria for Existence^ These ideas lead also to a condition for the existence in D of a random element with specified finite-dimensional distributions. As in Theorem 12.4, for each /c-tuple tx, . . . , tk of points of [0, 1], let pti.m.tk be a probability measure on (Rk, Mk), and assume these measures satisfy the consistency conditions of Kolmogorov's existence theorem. THEOREM 15.7 There exists in D a random element with finite-dimensional distributions (itl...t]c, provided these distributions are consistent; provided i for t1<.t<. t2, where у > 0, a > J, and F is a nondecreasing, continuous function on [0, 1]; and provided A5 32) lim AW»{@i» ^2) : IA - P*\ > e} = 0, 0 < t 4 J h|0 There is a more restrictive version of condition A5.31): f - &Г \Pt - W d^tttfi, P, t JK Proof. It is easy to construct for each n a random function Xn which is constant over each interval [(/ — 1J~", f2~") and for which the joint distribution of t This topic may be omitted.
Weak Convergence and Tightness in D 131 is /лгошшЛк with к = 2n and tt = i2~n. We shall prove that {XJ is tight. We first" show that, for given e and r\, there exists а д such that A5.33) PK(In) <5) > ?} < v holds for sufficiently large n. If fl5 ?, and f2 are integral multiples of 2~n and tx < f < t2, then, by A5.31), P{|Xn@ - XM\ > A, |Xn(f2) - Xn(f)| > Я} < Consider again the variables A5.23). If the points т + jbjm,j = 0,1,. . . , m, are all integral multiples of 2~n, then A5.24) follows as before. If т and <5 are integral multiples of 2~n, then the same is true of the points т +/<5/m, provided m = <52n. But, since ^n is constant over each interval [(i - lJ-n, i2-n), M"m for m = <52n is just the quantity defined by A5.25). Thus A5.26) holds if т and <5 are both integral multiples of 2~n. Suppose now that <5 = 1/2V for an integer v > 1. If n > t>, then <5 is an inte- integral multiple of 2~n and so are the endpoints of all the intervals involved in A5.27) and A5.28). It follows as before that A5.30) holds for n > v. Thus there does exist a <5 such that A5.33) holds for all large n. The Xn will be tight if their distributions Pn satisfy the hypotheses of Theorem 15.3. Now A5.7) is satisfied because of A5.33). If 2~k < <5, then sup \Xn(t)\ < max Л2Ч d). Since the distributions of the first term on the right all coincide for n > k, it follows by A5.33) that condition (i) of Theorem 15.3 is satisfied. To take care of A5.8) and A5.9) we make the temporary assumption that, for some positive <50, h < <50 implies A5.34) fAM{(filt h)-h = ]8«} = 1, A*i.^{(]8i, h)-h = h) = 1. With this assumption, A5.8) and A5.9) certainly hold, so that {Xn} is tight. If X is the limit in distribution of some subsequence, then (^(^1), . . . , X(tk)) has distribution fth...tk for dyadic rational tt, and the general case follows via A5.32) by approximation from above. It remains to remove the restriction A5.34). Take <50 < \, put 0 for 0<K <50, ^7 for **?*?!-**> ^7 2d0 1 for 1 - <50 < t < 1,
132 The Space D and define vh...tk as ^...s* witn si = Ah)- Tnen tne r*i-** satisfy the conditions of the theorem (with a new F), as well as the special condition A5.34), so that there is a random element Y of D with these finite- dimensional distributions. We now need only define X by X(t) = Y(d0 + t(\ - 2<5O)), 0 < t < 1. As an illustration of Theorem 15.7, let us use it to prove the existence in D of a random element describing a Markov chain in continuous time. Let Pijit), i,j = 1, 2,... be nonnegative numbers defined for t > 0 and satisfying ^iPuiO = ! and ^чРМрл{г) = pik(s + 0- Let pit i = 1, 2, . . . , be non- negative numbers satisfying 2^=1. Define Pj@) = p, and p^t) = ^iPiPij(t) for f > 0. Under the regularity assumption that A5.35) lim Pit(t) = 1 t->o holds uniformly in i, we shall prove the existence in D of a random element X with finite-dimensional distributions specified by A5.36) P{X(O = iu, и = 0, 1,. . . , m} = Х for 0 < f0 < tx < • • • < tm < 1. The consistency of the finite-dimensional distributions implied by A5.36) is easy to establish. We first prove the existence of a positive К such that A5.37) 0<l -Pii{t)<Kt for all i and all positive t. Because of the uniformity in A5.35), whatever s, there is a <5 such that pu{s) > 1 — e for all i if s < 2<5. If t <, <5, then <5 < mt < 26 for some positive integer m, and we have m—2 + 2 [Ри(ОЗг2 А/'Ы(т - / - 1H Z=0 m—2 2 г=0 Take ? = i; it follows (since log и < и — 1) that m(l - pu(t)) < log 2 < 1. Take ,? = d-1. Since mt > 6, it follows that A5.37) holds for t < д. But the inequality is trivial for t > <5. If A5.36) holds, then A5.38) where d1 = t — t1 and <52 = t2 — t and where the summation extends over
Weak Convergence and Tightness in D 133 . i and over those/ and к for which i Ф j and/ Ф k. By A5.37), the sum in 5.38) is at most КЧ^ < К2(д1 + <52J. Thus the finite-dimensional stributions implied by A5.36) satisfy A5.31) with у = 0, a = 1, and 0 = Kt. Since A5.36) and A5.37) also imply ?{X(t + h) И X(t)} <, Kh, 5.32) is also satisfied. Thus some random element X of D satisfies A5.36). ince the possible values of X{i) are at mutual distance at least 1 on the line, is, with probability 1, a step function constant over intervals.) ther Criteria^ le function .Fin Theorem 15.6 was assumed nondecreasing and continuous. re can replace the assumption of continuity by that of continuity from the ght, provided we strengthen A5.21) to 5.39) P{|Xn@ - XM\ > A, \Xn(t2) - Xn(t)\ > < ideed, A5.39) and Theorem 12.6 yield, in place of A5.24), the stronger lequality f ¦5.40) ?{M"m > e} < f [F(t + 6) - F(t)] e ' x mm m F(t + <5) - F(t) ,et J(t, t + <5] denote the maximum jump of F in the interval (т, т + <5]; 15.40) implies 15.41) P{M1 > ?} ^ fy lF(r The right member of this inequality is to be interpreted as 0 if F(t) = Х- + «).) Exactly as before, we obtain A5.29), where now each of 2' and 2" is a um of the form 15.42) vith 0 < tr < • • • < tT < 1 and tk — ^_x ^ 25. To prove A5.22) it therefore uffices to show that, for small enough <5, all sums A5.42) are less than 7o = r]e2y/2K. Write Afc for F(tk) - Fit^) and Jk for J(tk_x, tk]. Since a > J- ' This topic may be omitted.
134 The Space D and the Afc add to at most F(\) - F@), 2 АЛА, - Jk)' < к But the left member of this inequality is the sum A5.42), and hence it is enough to show that, for small enough <5, 0 < t — s < 2<5 implies A5.43) F(t)- F(s) - J(s, t]<m = - Since i7 is an element of D, there exist (see A4.8)) points si} with 0 = so < si < ' ' ' < Si = 1, such that w^lX-i, st) < \r}x. Take 2<5 smaller than the minimum of the Бг — s^. \f t — s < 2<5, then either s and f lie in a common interval [5г-_15 ^г), in which case the left member of A5.43) is less than ^t]x, or else they lie in adjacent intervals [^_х, s{) and [st, si+1), in which case it is at most F(t) — F(sJ + Ял-) - F(s) < r\x. Thus A5.39) for a right-continuous F implies A5.22). The rest of the proof goes through as before, and we can conclude that Xn ^> X. In the same way, we can in Theorem 15.7 take F to be continuous from the right if we replace A5.31) by A5.44) ^„.{(A, /5, /52): \fi - &| > A, |ft - /5| > Я} Separable Stochastic Processes^; According to Theorem 9.2, if the finite-dimensional distributions of a separable stochastic process can be realized as the finite-dimensional distributions of a probability measure on (C, ^), then the sample paths of the process are continuous with probability 1. We shall prove a similar result for the space D. Let {|t:0<f<l}bea stochastic process and let P be a probability measure on (D,@). Assume that the process and the measure have the same finite-dimensional distributions. Assume further that the process is separable, so that (9.16) holds for every a and /5 and every open interval /. It is no restriction to assume 0 e To, in which case (9.16) holds also if / has the form [0, s). We now construct sets serving the same function as the sets (9.20) did in the arguments in Section 9. This time let V denote the general system A5.45) V: k;rlt...t rk; slt . . . , sk; ax, . . . , afc, t This topic may be omitted.
Weak Convergence and Tightness in D 135 iere к is an arbitrary integer, where the ri} si} and a^ are rational, and lerer 0 = ri < Si < r2 < S2 < " - * < rk < sk = *• efine к 5.46) UTo{V, e) = П {со: a, < |t(co) < a, + ?, f e (rit s<) П To} A; П П {co:min {аг._х, aj < |f(co) < max {а<_ь aj + e, f e (s^, rt-) П To}; the first intersection, we replace (r1} 5X) by [rl5 sx) (for f = 1). Let "Гы ;note the class of systems A5.45) that have a fixed value of к and satisfy — jj_a < <5, / = 2, . . . , k. And now define 5.47) ПТо=ПиП U nTo(V,e), e к 6 УеУкб here ? and <5 are restricted to positive rationale. Finally, define Q.T(V, e) id пт by substituting T = [0, 1] for To in A5.46) and A5.47). It is not hard to show that, if со е ?2 T, then the sample path corresponding > со is right-continuous at t = 0, has a left-hand limit at t = 1, and is mtinuous in @, 1) except possibly for discontinuities of the first kind. And the sample path has this property, then a> e ?1T, as follows by Lemma 1 of sction 14 (applied to the sample path normalized to be right-continuous). We shall prove that пт e dS and ?{пт) = 1. The relations (9.22) and \Tf> e 3S follow by the same arguments as in Section 9; hence ?2T e SS and (?2T) = P(l2To), and it suffices to prove .5.48) P(QTe) = 1. With (t1} t2, . . . ) an enumeration of the points in To, define q>:Q. -*¦ i?00 nd xp: D — R« by 9>(a>) = (|4l(a>), |ti(a>),... ) and xp(x) = (x^), x(t2),...). jst as before, (9.25) holds for every H in ^?co, where now P denotes the leasure on D (not C) having the finite-dimensional distributions of {|J. For the system A5.45) and s > 0, let HTo(V, e) consist of those points = (zi> z2, . . . ) in i?OT such that, first, for each i = 1, . . . , k, the inequality a* < zu < a, + s olds for every м for which fu e (ri5 5г) (or fu e [rl5 jj in the case i = l) and, ;cond, for each i = 2, . . . k, the inequality min {а^_х, aj < zu < max {а^_х, aj + e olds for every и for which tu e (^_1} гг). Now define ЯТо=ПиП U HTo(V,e). г к 5 Ve-Tkd
136 The Space D Then HTo(V, e) and HTo lie in ^co, <p-lHTo(V, e) = &Tq(V, e), and у-хНТо = пТо. By (9.25) we have P(ftT(j) = Р(уг1нт)> and> since \p-xHT] = D, A5.48) follows. We have proved this result: THEOREM 15.8 If {?t:0 < t < 1} is a separable stochastic process, and if there exists on (D, Of) a probability measure having the same finite-dimensional distributions as {|J, then the sample paths, with probability 1, are right- continuous at t = 0, have left-hand limits at t = 1, and are continuous on @, 1) except possibly for discontinuities of the first kind. An example shows that it is not possible to go on and prove that the paths are right-continuous and hence lie in D (with probability 1): Define if 0 < t <, со if со < t < 1, where со is drawn uniformly from П = @, 1). This points up the fact that in taking the elements of D to be right-continuous we were simply adopting a convention. PROBLEMS 1. Show that Theorem 15.4 remains true if we only require that A5.10) hold for all fx,. . . , tk in To, where To is dense in [0, 1] and contains 0 and 1. Show that it fails if 0 $ To or if 1 $ To. 2. The condition A5.32) in Theorem 15.7 is necessary. 3. If a random element X of D has the property that lim sup - ?{\X(t + <5) - X(t)\ > e} = 0 ° for every positive e, then P{XE C} = 1. 4. Use Theorem 15.7 and Problem 3 to show that distributions Htl-~tk can be realized as the finite-dimensional distributions of a random element of С if A5.31) holds (where у > 0, a > i, and F is nondecreasing and continuous) and if lim sup - Mt.t+sWi, &) : 102 - 0il > ?) = 0 <5-*0 0 ° for every positive e. There is the analogous result with A5.44) in place of A5.31). Use Theorem 9.2 to derive conditions for the continuity of sample paths for a separable process. (Compare Problem 8 in Section 12.) 5. Use Theorem 15.7 to construct a Poisson process in D. Now make a direct construc- construction, starting with independent, exponentially distributed random variables.
Applications 137 16. APPLICATIONS Donskefs Theorem In some ways, the space D is more convenient than С for the formulation of Donsker's theorem. Given random variables |x, |2, ... on (Q, ?8, P), with partial sums Sn = |x + • ¦ • + ln, let ^n(co) be the function in D with value J_ Ту/П A6.1) Xn(f, со) = — S[n4](co) at f. Since each Xn{i) is a random variable, Xn is a random function—a random element of D. We want to prove, under suitable conditions, that the distribution of Xn converges weakly to Wiener measure W. Now W is defined on (C, ^), but it is easily extended to (D,@): Since Ce^, and since the relative Skorohod topology in С coincides with the uniform topology there, Ae@ implies А Г\ С e^. We may therefore extend W to (D, Of) by giving it the value W(A П C) for A in Э). Of course, С supports W. From now on we shall interpret W as this probability measure on (D, &) or as a random element of D with this probability measure as its distribution. THEOREM 16.1 Suppose the random variables |n are independent and identically distributed with mean 0 and finite, positive variance a2: A6.2) Then the random functions Xn defined by A6.1) satisfy A6.3) Xn Д. Ж. Proof. The proof in Section 10 that the finite-dimensional distributions converge carries over with no difficulty. Thus we need only prove tightness. There are several ways to prove tightness. One way is to verify the hypotheses of Theorem 15.5. Now condition (ii) of Theorem 8.3, formulated for random elements of D, requires that, for positive e and rj, there exist a <5, 0 < <5 < 1, and an integer n0 such that, for 0 < t < 1, -?[ sup \Xn(s)-Xn(t)\>e\<r), п>щ О \t<s<t+6 ) (we replace t + <5 by 1 if it exceeds 1). Furthermore, this condition reduces, by the arguments used in the proof of Theorem 8.4, to the following one:
138 The Space D For each positive e there exist а л > 1 and an integer n0 such that P max \St\ > Mn\ < — , n>n0. \ i < n ) Я It suffices therefore to verify this last condition, which can be done by using the central limit theorem as in Section 10 (see A0.7)) or by using the bounds in Section 12 (see A2.23)). Thus the methods of the preceding chapter carry over without essential change. Given the results in Section 15, we can construct a very simple proof. By Theorem 15.6, it is enough to establish the inequality A6.4) E{|Xn@ - Хп(к)\* • \Xn(t2) - ^ 4(f, - k)\ h<t< t2. By A6.1), A6.2), and the independence of the |n, the left side of A6.4) is A6.5) 4 4l G П n n If h — h > 1/л, A6.4) follows from this. If t2 — t1 < \jn, then either tx and t lie in the same subinterval [(i — \)jn, ijri) or else t and t2 do; in either of these cases the left side of A6.4) vanishes. This establishes A6.4) in general and proves the theorem. Since the value of supf Xn{t) is the same for the random element of D defined by A6.1) as for the random element of С defined by A0.1), namely тахг<п Sjay/n, we can obviously use Theorem 16.1 to rederive A0.18). (We must prove that the mapping h.D^-R1 defined by h{x) = supt x{t) is continuous, but this presents no problem.) We may rederive the results in Section 11 in the same way. Some functions of the partial sums St have a simple expression in terms of the Xn in Theorem 16.1 but have no simple expression in terms of the Xn in Theorem 10.1. Such functions are better analyzed in D than in С For example, for x e D, let h(x) be the Lebesgue measure of the set of t for which x(t) > 0. Then (p. 232) h is measurable {hrx0P- <=- Of) and is continuous except on a set of Wiener measure 0. If Xn is defined by A6.1), then h(Xn) is exactly rr1 times the number of positive sums among S1, . . . , Sn_1} whereas this is generally untrue if Xn is defined instead by A0.1). Combining Theorem 5.1, Theorem 16.1, and A1.26) leads to the arc sine law under the assump- assumptions of the Lindeberg-Levy theorem.! t See the problems for some further applications.
Applications 139 ominated Measures tie random variables in Theorem 16.1 are defined on a probability space 1, &, P). We shall show that the result remains true if P is replaced by an bitrary probability measure Po on (Q, ?%) dominated by (absolutely mtinuous with respect to) P. For example, suppose that П = [0, 1], that 7 consists of the linear Borel subsets of [0, 1], and that ln(co) = 2con — 1, here con is the rcth digit in the binary expansion of со. Then Theorem 16.1 )plies to {|n} if со is drawn from [0, 1] according to the uniform distribu- on. According to the theorem we are going to prove, this is true also if со drawn from [0, 1] according to an arbitrary distribution having a density ith respect to Lebesgue measure. We shall need the following preliminary result (in which o-(^0) denotes le cr-field generated by HEOREM 16.2 Let Ex, E2, . . . be measurable sets in a probability space .^, &, P). Suppose there exist a constant a and a subfield ^0 of & such that 6.6) ?{En П E) -* aP(?) »r every E in ^0. Suppose further that all the En lie in a(&0). If? dominates 0, a second probability measure on ?8, then 16.7) P0(En) - a. roof. From the hypothests it follows just as in the proof of Theorem 4.5 mt 16.8) f g )r every integrable g. But A6.8) and A6.7) are the same thing if g is the Ladon-Nikodym derivative of Po with respect to P. HEOREM 16.3 Theorem 16.1 remains valid if P is replaced by an rbitrary probability measure Po dominated by it. roof. Define X'n by 16.9) X'n(t,co) = ^ pn<i<nt 'here {pn} is a sequence of integers going to infinity slowly enough that 0 (X'n{t, со) = 0 if t < pjn). If ben 1 Vn
140 The Space D By Minkowski's inequality and the fact that pjyjn -*¦ 0, so that, by Chebyshev's inequality, A6.10) <5n^0, where A6.10) is interpreted in the sense of P (that is, P governs the distribu- distribution of the |n). By Theorem 16.1, A6.11) Xn®>W, where again the relation is interpreted in the sense of P. Since d(Xn, X^) < dn, where d is the metric in D (either one), it follows by Theorem 4.1 that A6.12) X'n ®> W in the sense of P. Let A be a Ж-continuity set (in D), temporarily fixed; A6.12) implies A6.13) ?{X'neA}^W{A). Let ?%q be the field of cylinders; ^ consists of sets of the form with к > 1 and H e 0tb. If E e ^0, then, sincepn -> oo, ?{{X'n e A) n E) = e ^}P(?) for large n and it follows by A6.13) that Since the sets {X'n e A) all lie in the <y-field generated by ^0, it follows by Theorem 16.2 with a = W{A) that A6.14) P0{X'neA}-+W(A) if Po is absolutely continuous with respect to P. Since A6.14) holds for every Ж-continuity set A, A6.12) holds when interpreted in the sense of Po. Now A6.10) asserts that P{<5n > e} —*¦ 0 for each positive e; if Po is absolutely continuous with respect to P, then Po{<5n > ?} —*¦ 0 as well, and A6.10) persists when interpreted in the sense of Po. Applying Theorem 4.1 once more, we see that A6.11) holds in the sense of Po, which completes the proof. Theorem 16.3 concerns convergence in distribution. Notice that passing from P to a dominated measure Po trivially preserves such properties of {Sn} as the law of the iterated logarithm, which hold with probability 1.
Applications 141 Empirical Distribution Functions Let |l5 |25 • • • be random variables with and let Fn(t, со) be the proportion of the points ?i(co), . . . , ?n(co) not exceeding ?, so that Fn( • , со) is the empirical distribution function. Let Yn(co) be the element of D with value A6.15) Уя(*, со) = Jn(Fn(t, со) - F(t)) at the point t e [0, 1], where F is the distribution function common to the |n. Since each 1^@ is a random variable, Уя is a random element of D. In Section 13 we studied the related random element A3.7) of C; here we shall investigate Yn directly, and we shall remove the restriction that the ?n be uniformly distributed. The stochastic process {Yn(t): 0 < t < 1} is sometimes called the empirical process. THEOREM 16.4 Suppose the ?„ are independent and have a common distribution function F(t). If Yn is defined by A6.15), then A6.16) Yn^Y, where Y is the Gaussian random element of D specified by (E{Y(t)} = 0 A6.17) \E{Y(s)Y(t)} = F(s)(\ - F(t)), s<t. Proof. Just as Wiener measure does, the measure W° extends from (C, to (D, Of). Denote by W° a random element of D with this extended measure as its distribution. We first show, under the assumption that the ?n are uniformly distributed over [0, 1], that Yn Л* W°. Let Un(t, со) be the number of points among ?х(со), . . . , |n(co) not exceeding t. For 0 = t0 < tx < • • • < tk = 1, the random variables ипAг) — ип^г_г), i = 1, . . . , к, are multinomially distributed with param- parameters n and р; = tt — ^_x, and it follows as in Section 13 by the central limit theorem for multinomial trials that the finite-dimensional distributions of the Yn converge weakly to those of W°. By Theorem 15.6, it suffices to prove that E{l Yn(t) - Yn(tJ\* ¦ | Yn(t2) - 7n@l2} < 6(t - h)(t2 - t) for tx < t < t2. But this is just the inequality A3.17) already established in Section 13. Hence Yn -®> W°.
142 The Space D Suppose now that the gn have an arbitrary distribution function F over [0, 1]. Define an "inverse" to Fhy y(s) = inf {t:s <F(t)}. Then s < F(t) if and only if <p(s) < t, so that, if r\n is uniformly distributed over [0, 1], then P{<p(r]n) < t) = F(t). Since the theorem involves only the joint distribution of the |я, we may represent them as ?я = <p(r]n), with the r]n independent and uniformly distributed. If Gn{ • , со) is the empirical distribution function for rj^co), . . . , r]n{co) and Zn(t, со) = -Jn(Gn(t, со) — t), then, by the case of the theorem already established, Zn -^ W°. But the empirical distribution of ?i(a>), . . . , ?n{co) is Fn(t, со) = GXF(t), со), so that, if Yn is defined by A6.15), Yn(t, со) = Zn(F(t), со). Define y.D-^D by (ipx)(t) = x(F(t)). If xn converges to x in the Skorohod topology and x e C, then the convergence is uniform, so that ipxn converges to ipx uniformly and hence in the Skorohod topology. Hence Zn4r and Theorem 5.1 imply Yn = y>(Zn) Д. ip{W°); since Y = y(W°) is Gaussian and satisfies A6.17), the proof is complete. If we knew the distribution of A6.18) supt 7@ = supt W°(F(t)), we could find the limiting distribution of A6.19) yjn supt (Fn(t, со) - F(t)). Now A6.18) has the same distribution as supt W° (namely that given by A1.40)) if F is continuous, but not otherwise. The same remarks apply to A6.20) supt | 7@1 = supt | W°(F(t))\. Remarks. The random function A6.1) has independent increments and so has W; for general results on weak convergence in this circumstance, see Kimme A957 and 1960) and Skorohod A957). For convergence of diffusions, see Stone A963). Theorem 16.2 is due to Renyi A958). Concerning Theorem 16.4, see the remarks and references at the end of Section 13. See Schmid A958) for the distributions of A6.18) and A6.20) for the general F. PROBLEMS 1. Under the hypotheses of the Lindeberg-Levy theorem, there are limiting distributions for (a) A 2 1**1. i i sk\ n (Ь) i n k=l (c) — min Sk, 0 < 0 < 1. Vл рп<к<п
Random Change of Time 143 Construct the three relevant mappings from D to R1 and prove they are measurable and continuous on a set of W^-measure 1. For the forms of the limiting distributions, see Donsker A951) for (a) and (b) and Mark A949) for (c). 2. Theorem 16.1 (makes sense and) is true if л goes to infinity in a continuous manner. [See B.4).] 3. For each n, let ?nl,.. . , ?nr! be independent random variables with P{?nk = 1} = pn and P{?nk = 0} = 1 — pn, and define Yn by Yn(t) = ^k<nt ?nk. Assume npn -»- A and prove that Yn -L>. Y, where Y is the appropriate Poisson process (with paths in D). 4. Show that Theorem 16.4 still holds if the measure governing the ?„ is replaced by a probability measure absolutely continuous with respect to it. 5. Prove Theorem 16.4 by an application of Theorem 15.6 with A5.39) in place of A5.21), thus avoiding the representation ?„ = y(rjn). 6. Let P be Lebesgue measure on the unit interval Q, and define ?я(со) = 2con — 1, where con is the nth digit in the dyadic expansion of со. Show that the requirement in Theorem 16.3 that Po be dominated by P is essential: The result fails if Po is a point mass. It also fails if Po is Cantor measure (as defined in Billingsley A965, p. 36)). 7. Carry Problem 1 of Section 10 over to D. (See also Problem 1 of Section 12.) Gener- Generalize the results in Problem 1 of this section. 17. RANDOM CHANGE OF TIME Sometimes one requires an approximate distribution for a partial sum Sv = |x + • • • + |v, where the index v is itself a random variable. Here we shall prove several functional central limit theorems for such randomly selected partial sums. Randomly Selected Partial Sums Suppose the partial sums Sn = ?x + • • • + gn for a nonrandom index n obey the central limit theorem, say with norming factors oyjn, so that A7.1) ^ as n -> со. If v is a random integer such that v is large with high probability, there is some hope that SJo\jv will be approximately normally distributed. To formulate a limit theorem, consider a sequence {vn} of random integers. We seek conditions under which A7.2) ^ as n —>> со. It is not enough simply to assume that vn goes to infinity in probability, in the sense that A7.3) ?{vn < a} -> 0
144 The Space D for each a. For suppose the ?n are independent and take the values +1 and — 1, with probability \ for each, so that A7.1) holds with a = 1. If vn is the time of the nth zero crossing (the nth value of к for which Sk = 0), then A7.3) holds (because vn > vn_1 + 1) but A7.2) does not (because SVn = 0). To derive A7.2) we must assume more than A7.3). Our principal result will be that A7.2) holds if vjn converges in probability to a positive, finite random variable, and if the ?n are independent and identically distributed with mean 0 and variance <r2. We shall derive this result from a functional central limit theorem which gives much more information. Define a random element Xn of D by Xn(t, со) = —- SlntJ((o), Yn(t, со) = —= Sv[jid(e)(o)) and define Yn by (certainly Yn is a random element of D). Since ^„A) = SvJoy]n, we shall be in a position to approximate the distribution of SVn and derive results like A7.2) if we show that Yn itself converges in distribution. The distribution of a random function such as Yn is most easily investigated by using the fact that Yn results from subjecting Xn to a random change of time scale. Let Фя(^, со) = v[nu{co)jn. Since A7.4) Yn(t, со) = Хп(Фп«, со), со) (we assume for the moment that vn < n, so that Фя(?, со) < 1), Yn is Xn with the time scale subjected to a change represented by the random function Фя. То clarify the reasoning involved in this section, we first consider such random time changes in a general context. We assume Xn and Фя each converge in distribution and look for conditions under which Yn, as defined by A7.4), converges in distribution. Random Change of Time Let Do consist of those elements cp of D that are nondecreasing and satisfy 0 < y@ < 1 for all t. Such a cp represents a transformation of the time interval [0, 1]. We topologize Do by relativizing the Skorohod topology of D. Since Do e 3, as is easily shown, the ff-field ^0 of Borel sets in Do consists of the subsets of Do that lie in Q) (see p. 224). For x e D and cp e Do, let x ° <p denote the composition of x and cp—the function on [0, 1] whose value at t is A7.5) (x о p)@ = x(<p(t)).
Random Change of Time 145 low x о cp lies in D and, if гр: D X Do —> D is defined by 17.6) ^(я, у) = x о у, hen, as is proved on p. 232, ip is measurable (ip'1^ <=¦ Я) X ^0). Let X be a random element of 2) and let Ф be a random element of Do. Ve assume X and Ф have the same domain, so that (X, Ф) is a random lement of D x Do with the product topology (see p. 225). If X о ф has alue X(cd) ° Ф(со) at со—that is, if Xо ф == у{Х, Ф)—then Х°Ф is a andom element of D; X ° Ф results from subjecting X to the time change epresented by Ф. Suppose that, in addition to X and Ф, we have, for each n, random dements Xn and Фп of D and Do, respectively, where Xn and Фп have the ame domain (which may vary with ri). We ask for conditions under which Xn, Фп) Д- (X, Ф) (convergence in distribution relative to the product opology) implies 1яофД1оф. Suppose that 17.7) (Xn, Фп) Д. (X, Ф) md that 17.8) P{XeC} = Р{ФеС} = 1. 3y Corollary 1 to Theorem 5.1, 17.9) 1яофД1оф vill follow if we show that ip (defined by A7.6)) is continuous at (x, ф) for с e С and <p e С П Do. If ?я converges to a; and <pn converges to cp in the 5korohod topology, and if x and cp lie in C, then the convergence in each ;ase is uniform. But supt \xn(<pn(t)) - x(<p(t))\ < supt \xn(t) - x(t)\ + supt \x(<pn(t)) - x(<p(t))\; since x is uniformly continuous, xn о <pn converges to x ° cp uniformly and hence in the Skorohod topology, which proves A7.9). There remains, of course, the question of when (Xn, Фп) converges in distribution, and here Theorems 4.4 and 4.5 can be used. Applications We return now to a consideration of sums Sn = |x + • • • + |n. For each n, let vn be a positive-integer-valued random variable defined on the same probability space as the ?n. Define Xn by A7.10) - Xn{t, cd) = ±
146 The Space D and Yn by A7.11) Yn(t, со) = 1 ¦SlYnl*n№= XVn(o»0, 0>). THEOREM 17.1 // A7.12) ?A«, where в is a positive constant and the an are constants going to infinity, then A7.13) Xn^W implies A7.14) Yn Л- W. Proof. There is no loss of generality in assuming that A7.15) O<0<1, since this can be arranged by passing to new constants an if necessary. Furthermore, there is no loss of generality in assuming that the an are integers. Define A7.16) Since A7.17) ФпС. to) = if td otherwise. supt\<!>n(t) - td\ < 0, Ф„ converges in probability in the sense of the Skorohod topology to the element <p(t) = 6t of Do. Because of A7.13) and the assumption яя —>- oo, it follows from Theorem 4.4 that (Хпп, Фя) Д- (W, ф) and hence, since A7.7) and A7.8) imply A7.9), that Xa °"фя -> W ° <p. If Y&t, со) = —— S[Vn(co)f](co), then Xan о Фп and Y'n have the same value at со if vn(co)lan < 1, the probability of which goes to 1 by A7.12) and A7.15). Therefore Y'n *> W° (p. Now sup, -= Y&tt со) - Yn(t, со) Vе and hence Yn converges in distribution to has the same distribution as W, A7.14) follows. cp). Since H(Jf»
Random Change of Time 147 Of course, A7.14) implies SvJff\ vn -^>- JV, as well as an arc sine law and limit laws for the maxima and so on. Note that we have assumed of the |n nothing beyond A7.13), which holds for example in the Lindeberg-Levy situation. In Chapter 4 we prove A7.13) for various dependent sequences {?„}, to each of which Theorem 17.1 then applies. If the limit in A7.12) is not constant, we must make more specific assumptions about the |n. THEOREM 17.2 Suppose |l9 ?2, . . . are independent and identically distributed with E{?J = 0 and E{?n2} = a\ If A7.18) ^Д-0, where в is a positive random variable and the an are constants going to infinity, then Yn ®> W. Proof. We assume at first that в is bounded, so that there exists a constant К such that 0 < в ^ К with probability 1. We may adjust the an so that they are integers and so that K< 1. If we define Фя by A7.16), then, as before, Фп Д- Ф, where Ф is the random element of Do denned by Ф@ = 6t. Let &0 be the field of cylinders and define X'n by A6.9), where pn -> oo and -> 0. Then, as in Section 16, we have and P({X'neA} n E) -> W(A)?(E) for every Ж-continuity set A and every E in ^0. Now by Theorem 4.4, (Фя, vjan)^>(<?>, в) in the sense of the product topology on Do x R1; since every X'n is measurable o-(^0), it follows by Theorem 4.5 that relative to the product topology in D x Do x R1, where в0 is independent of W and Ф0(Г) = 60t. By A7.19), (Xan, Фп, Ц Д. (W, Фо, во).
148 The Space D Now the mapping that carries the point (x, <p, a) to ori(x о у) is continuous at that point if x e C, <p e С П Z)o, and a > 0. By Corollary 1 to Theorem 5.1, therefore (-V (*e. ° Ф„) Д- tf{W - Ф0). W Since в0 and Ж are independent, б„-*(Ж° Фо) has the same distribution as W. Moreover {yJan)-i(Xan ° Фя) coincides with Yn if vjan < 1, the probability of which goes to 1 since К < 1. Thus Yn^W if 6 is bounded. Suppose в is not bounded. For и > 0, define 6U = в and уия = vn if б < и and define 6U = и and уия = аям if в > и. Then, for each u, vuJan -^- 6U as n -> oo, and, by the case already treated, if Yun(t) = S[Vun(coU](co), () then Yun ^> W as n -> oo. Since Р{Гия И ^я} < Р{б > и}, Yn®+W follows by Theorem 4.2. This proves Theorem 17.2. The proof goes through for many dependent sequences {?„} also; see Chapter 4. Renewal Theory These ideas can be used to derive a functional central limit theorem connected with renewal theory. Let т]1г г\г,. . . be a sequence of positive random variables and define A7.20) vt = maxj/c:i^< t\, t>0, with vt = 0 if 7}г > t. If 7]k is the time lapse between the occurrences of the (k — l)st and Ath events in a series, then vt is the number of occurrences up to time t. We shall assume the existence of positive constants /г and a such that, if Xn(t, со) = then Xn Д- Ж. This will be true if, as in the usual renewal theory situation, the r]n are independent and identically distributed with mean ^ and variance a2. Define Zn by THEOREM 17.3 7/Хя Д^ W, then Zn
Random Change of Time 149 ''roof. We shall assume in the proof that /л > 1; this is only a matter of cale. We first show that 17.21) sup 0<v<u V — и 1 — — V — и о is и -*- oo. The hypothesis Xn -^- W implies that 17.22) 1 sup - 0<i<s S [t] 2 Vi - 0 is s -*¦ oo (because replacing s~r by 5~* would give convergence in distribu- ion). By the definition A7.20), vv > t implies S^x т]г < v. Therefore implies 17.23) Similarly, implies (if e < /jr1) A7.24) SUp M <«<u\M ^ M/ sup > (лие. —e 0<v<u\U sup M Iv<- /лие. By A7.22), the probabilities of A7.23) and A7.24) go to 0 as и -+ oo, which proves A7.21). Put if n otherwise. By A7.21), Фя -?> <p, where 9?(f) = t\/i, so that, by the usual argument, Xn°<$>n®>W° cp. Let ад)--^ 2 i=\ Yn = Xn° Фп if vjn < 1, the probability of which goes to 1 by A7.21) and the assumption /л > 1. Therefore Yn -^- W° (p. By the definition A7.20), ^ < Yn(t) From тахг<я it follows that supt<x \rjvjl<j\/n, which in turn
150 The Space D implies that, if 7= > ^ Д.Ж»?). Therefore $Z'n Д- W(recall that <p(t) = tl/л), from which Zn Д- W follows because of the symmetry of W. It is possible in Theorem 17.3 to let n tend to infinity in a continuous manner. Remarks. The results of this section, which are new, extend those in Billingsley A962). For the central limit theorem for a random number of random variables, see Renyi A960) and Blum, Hanson, and Rosenblatt A963). See Lamperti A962) for another weak-con- weak-convergence theorem in renewal theory. 18. THE UNIFORM TOPOLOGY Let us consider briefly the uniform topology on D—the topology given by the uniform metric A8.1) p(x,y) = swpt\x(t)-y{t)\. With this metric, D is a complete metric space. It is, however, not separable: The uncountably many points xQ defined for 0 < в < 1 by @ if 0 < t < 6, A8.2) xe(t) = U if e<t<\, are at mutual distance 1 (see p. 216). A concept prefixed with U will refer to the uniform topology: [/-open, [/-continuous, etc. In the same way, a prefix S will refer to the Skorohod topology of D. If d is either of the two metrics that give the Skorohod topology (see Section 14), and if p is the uniform metric A8.1) then d(%, y) < p(x, y)- Therefore the uniform topology is finer than the Skorohod topology: [/-convergence implies S-convergence. On the other hand, as observed in Section 14, if xn S-converges to x and x e C, then xn [/-converges to x. In particular, the uniform and Skorohod topologies coincide when relativized to C. Let Si denote the class of iS-Borel sets as usual, and let ^ denote the class of [/-Borel sets. Since an S-open set is also [/-open, A8.3) 9 с <%. Let us suppose that Pn and P are probability measures defined on ^, the larger of % and 3). Let us write Pn =>P[[/] to indicate weak convergence in the sense of the uniform topology and Pn^>P[S] to indicate weak
The Uniform Topology 151 ;onvergence in the sense of the Skorohod topology. NowPn => P[U] requires 18.4) [ or all bounded, [/-continuous/, whereas Pn=>P[S] requires A8.4) only or all bounded, ^-continuous/. Hence Pn=>P[U] implies Pn=>P[S]. The converse is false because ^-convergence does not imply [/-convergence consider point masses). Suppose now that Pn and P are defined on W, that Pn =>P[S], and that d(C) = 1. We shall prove that Pn=>P[U]. For A in °tt, let Av denote its [/-closure and let As denote its S-closure. Note that Av <= As. Since °n =>P[S] and P(C) = 1, lim supn Pn(A) < Um suPn Pn(As) < P(AS) = D(AS П C). Since a sequence ^-converging to a limit in С also [/-converges :o the limit, As Г\ С ^ Аи. Therefore lim suipnPn(A) < Р(Аи), which mplies i>n =>i>[[/] by Theorem 2.1. We have proved this: Suppose Pn and P are defined on %. (i) IfPn =>/*[[/], fhen Pn=>P[S]. (ii) IfPn=>P[S] and P(C) = 1, then Pn=>P[U]j Because }f (i), it is better if possible to prove Pn => P[U] than to prove Pn => P[S]— for example, one has then the asymptotic distributions for more functions эп D. Because of (ii) on the other hand, if P(C) = 1, then a proof of Pn => P[S] is automatically a proof of the better result Pn => P[U]. Consider the distributions Pn of the random functions Xn involved in Donsker's theorem as formulated for D (see A6.1)). According to Theorem 16.1, Pn=> W[S\, where W is Wiener measure. Since W(C) = 1, we can draw the stronger conclusion Pn => W[U]. There is a gap in this argument: the measures Pn and W are defined only on 3. The question is whether they can be extended to the larger cr-field tft. Certainly, W can be extended—the argument used in Section 16 to extend W from (C, <g) to (D, 9) enables us also to extend it from (C, <g) to {D, Щ. Now Pn is defined as PX^1 (the domain of Xn is a probability space (Cl, 0ft, P)), a definition which is possible because A8.5) Х~г2 с <#. If we are to define Pn on °tt instead, we must strengthen A8.5) to A8.6) Х-г% с <#. If A is a finite-dimensional set in D, then Х~гА е 38, and it follows that the same is true if A is a [/-sphere. Since the uniform topology is not separable, A8.6) does not follow without further argument. t These statements are true because [/-convergence always implies ^-convergence and ^-convergence to a limit in С implies [/-convergence; they involve no further properties of C, D, and the two topologies.
152 The Space D The definition A6.1) of Xn shows that the range Хпп is separable in the uniform topology. If A is [/-open, therefore, there exist countably many [/-spheres At such that А П Хпп = \JAA n Xnty- But then X^A = {JiX^Ai, and, since each Х~гА{ lies in &, so does Х~гА, which proves A8.6). Thus the distribution Pn is (or can be) defined on °U in the natural way by Рп(А)=Р{со:Хп(со)еА}, Ae%, and we can strengthen Theorem 16.1 by interpreting Xn -^> Win the uniform topology :i>n=> W[U]. Consider next the random functions Yn defined by Yn(co, t) = Jn(Fn(t, со) - t), where Fn(t, со) is the empirical distribution function for ?i(co), . . . , ?n(co), assumed independent and uniformly distributed over the unit interval. Now Yn A- W° by Theorem 16.4. Since W° can be extended to °tt just as W was, we can interpret Yn ¦%. W° in the uniform topology rather than the Skorohod topology if we can extend the distribution of Yn from 3) to ^—that is, if we can strengthen A8.7) Y-^9 <= <% to A8.8) У-1^ с ^. But here the program collapses: We shall show that A8.8) is false. To see that A8.8) fails, consider the case n = 1. Let h be the mapping from Oto D that carries со to the distribution function with a unit jump at ^(co); h(co) = Яе (ш) with xe defined by A8.2). Clearly, A8.8) is equivalent to A8.9) h-Щ с <#, and we shall show that A8.9) is false if ?i(co) is uniformly distributed. Let Aq be the [/-sphere with center xe and radius ^. Since the xe are at mutual distance 1 in the uniform topology, we have A8.10) h-1! U Ав\ = {со:^(со)еН} \вен J for every subset H of the unit interval. If A8.9) were true, then, since \JoeHAe is [/-open, the set A8.10) would lie in ?%, so that we would have {со: ^(co) eH} e & for every subset H of the unit interval. Thus /л = P^ would be a measure defined for every subset of the unit interval and having the property that [л(а, b) = b — a for every subinterval {a, b). But no such [л exists.
The Uniform Topology 153 A small elaboration of these ideas shows that A8.8) fails for every n unless the distribution common to the ?t- is purely atomic. In treating empirical distribution functions as random elements of D, therefore, we must stay with the Skorohod topology—and this is solely for reasons of measurability.t As a rule, we may interpret Xn -^-Xvn the uniform topology only if (i) X lies in С with probability 1 and (ii) the jumps in Xn occur at fixed time points rather than at time points with random positions. Remarks. Chibisov A965) has pointed out that A8.8) is false. Another way around the problems created by the nonseparability of D in the uniform topology is to work within the (т-field <2f0 generated by the [/-spheres. This requires a different theory of weak convergence; see Dudley A966 and 1967). t O, wiste a man how manye maladyes Folwen of excesse and of glotonyes, He wolde been the moore mesurable ... The Pardoner's Tale
CHAPTER 4 Dependent Variables 19. DIFFUSION This section contains a theorem characterizing Brownian motion, a theorem characterizing more general diffusion processes, and asymptotic forms of these two theorems. In Section 20 we use these results to prove a version of Donsker's theorem for the variables of a stationary sequence satisfying a uniform mixing condition; in Section 21 we extend the result to functions of such sequences; and in Section 22 we prove functional central limit theorems for empirical distribution functions in these two cases. In Sections 23 and 24 the results of the present section are used to derive limit theorems for martingales and for exchangeable random variables.! Although many of the results of this chapter could be formulated and proved in the space C, we shall consistently use D with the Skorohod topology. A Characterization of Brownian Motion The random function W lies in С with probability 1 and has independent increments; moreover, E{Wt} = 0 and E{Wt2} = t. These properties characterize W: THEOREM 19.1 Let X be a random element of D that has independent increments and satisfies ?{XeC}=\, E{X(t)} = 0, and E{X*(t)} = t. Then X is distributed as W. f The succeeding sections all depend upon this one. Sections 20, 21, and 22 require to be read in order, but this sequence and Sections 23 and 24 form three independent units. 154
Diffusion 155 Proof, The independence and the moment conditions imply Е{(Д0 - X(s)f} = \t - s\ and P{X@) = 0} = 1. We are to show that X(t) — X(s) has distribution N@, t — s) for s < t. Because of the independence, it is enough to prove this for s = 0, and by continuity we may assume t < 1. Let cp{t, • ) be the characteristic function of X(t)\ <p(t, u) = E{eiuX{t)}. Since P{XeC} = 1, cp(t,u) is continuous in t as well as in u. We shall show that it satisfies the differential equation A9.1) I <p(t, u) = -iu2<p(t, u), 0<t<l, ue R\ at It will then followt that since X@) vanishes with probability 1, this will entail normality. In proving A9.1), we may take the partial derivative to be a derivative from the right.J We are thus to prove A9.2) lim i [<p{t + h,u)- <p(t, u)] = -\u2tp{t, u), 0 < * < 1. ftlo h By a standard estimate, § A9.3) eiv = 1 + iv - \v2 + c(v), with A9.4) \c(v)\ < N3 for v real. Write A9.5) AM = A(s, 0 = X(t) - X(s). From the independence of these increments and the relations E{AsJ = 0 f If / satisfies /'@ = A(t)f(t) with A continuous, then its ratio with the nonvanishing function/0@ = exp J* A(t) dr has derivative 0, so that/(O//o(O =/@)//0@) =/@). $ To prove that a continuous/with continuous right-hand derivative/" on [0,1) necessarily has a two-sided derivative on @, 1), it suffices to assume / real and prove that F(t) — fit) - /@) - Jo/+(t) dr vanishes identically. If (say) Я*„) < 0, then G(t) = F(t) - tF(to)/to satisfies G@) = G(t0) = 0 and (since F+ vanishes) G+(t) > 0, so that, over [0, /0], G must have a positive maximum at some interior point s; but then G+(s) < 0, a contradiction. § Feller A966, p. 485).
156 Dependent Variables and E{A^ J = t — s, we have <p(t + h,u)- cp(t, u) = E{eiuX{t)[eiuMt't+h) - 1]} c(uA Because of A9.4), A9.2) will follow if we prove that A9.6) nlo h Using again the independence of the increments, we obtain A9.7) E{A21>tA2MJ = (t - Wtt - 0 < (U - h)\ Fix t and t + h, with h > 0. If <t< t2. A9.8) = max mm 0<i<m { Д М + - m A f + -fc, m then, by A9.7) and Theorem 12.1, A9.9) ?{M'm ? . And, by A2.6), A9.10) 1АМ+Л| < ЪМ'т + max 1<г<т -ДА4 m m Since Z lies in С with probability 1, the maximum here goes to 0 with probability 1 as m —*- oo, and it follows by A9.9) that A9.11) Р{|Д*,*+д1 >fy<K — with К = 44#2Д. From A9.11) we obtain (see C) on p. 223) poo 1^1, Е{|АМ+Л|3} < a + ^ Ja О? = a (Г (this minimizes the function on the right), valid for a > 0. Taking a = we obtain which implies A9.6) and completes the proof. The condition that X lie in С with probability 1 is essential here, since the other hypotheses of the theorem are satisfied if A9.12) X(t) = 7@ - t and Y is distributed as a Poisson process with parameter 1.
Diffusion 157 We now derive a limit theorem from Theorem 19.1. Let Xn be random elements of D. We say Xn has asymptotically independent increments if :i9.13) 0 < Sl < h < s2 < h < • • ' < sr < tr < 1 mplies, for all linear Borel sets Hx, . . . , Hr, that the difference ;i9.i4) p{xm - хмещ, i = i,..., r} - jj nxnih) - *«(«,) ея,} ;onverges to 0 as и —»- oo. Note that we require A9.14) to hold only when the ntervals fo, tJ are strictly separated—when ^_x < ^. Recall that the modulus of continuity wx(d) = w(x, d), defined by (8.1), 's well defined if x lies in D. It enters into the next theorem because the imiting distribution on D is actually supported by C. THEOREM 19.2 Suppose that Xn has asymptotically independent incre- increments, that {Xn2(t):n > 1} is uniformly integrable for each t, and that E{Xn(t)}-+0 and ?{Xn2(t)}-^*t as n —>- oo. Suppose finally that, for each oositive s and r\, there exists a positive д such that A9.15) P{w(Zn, d) > e} < г) for all sufficiently large n. Then Xn —*- W. Proof. Since Е{ХД0)}-+0, {Xn@)} is tight. It follows from A9.15) and Theorem 15.5 that {Xn} is tight and that, if X is the limit in distribution of a subsequence, then P{Ie C) = 1. It is enough to show that any such X must be distributed as W. But E{ATn(f)} -> 0, E{Zn2@} -»-1, and the uniform integrability of {X2(t):n > 1} imply E{X(t)} = 0 and Z{X*(t)} = t (Theorem 5.4). Now the increments X{t^) — X(s{), i=l, ..., r, are independent when A9.13) holds, and from P{X e C} = 1 it follows that the same is true if we allow tt = sihl in A9.13). Thus the result follows by Theorem 19.1. It is not possible to replace the w in A9.15) by v/ as defined by A4.6) (define Xn = X by A9.12)). The condition that {Xn2(t):n > 1} is uniformly integrable for each t cannot be dispensed with in this theorem; indeed the condition follows via Theorem 5.4 from E{Xn2(t)} -> t and Xn^- W. For a specific example in which the condition fails, take Xn{t) = y/t gn, where ?n assumes the values \Jn, —\Jn, and 0 with respective probabilities l/2«, l/2«, and 1 — 1/n; although Xn does not converge in distribution to W, the hypotheses of the theorem are satisfied except for the requirement of uniform integrability. Theorem 19.2, together with the results of Section 12, leads to another proof of Donsker's theorem. If Xn is the random function A0.1) involved in
158 Dependent Variables that theorem, then, by A2.23) and Theorem 8.4 (carried over to D), the condition involving A9.15) is satisfied. Further, by A2.20), f S2Jn d? < K'P + \ f ^2JF J {?„ /n<r2>a} |_Д 0" J {||i|>io<TV n) for a universal constant K', which implies {Xn2@:" > 1} uniformly integrable. The remaining conditions in Theorem 19.2 are easily checked, and Xn ^- W follows. Notice that this proof does not presuppose the central limit theorem; in particular, we have an independent proof of the Lindeberg-Levy theorem.! As we shall see in Section 20, this method applies also to certain sequences of dependent random variables. Other Diffusion Processes% IfXsatisfies the hypotheses of Theorem 19,1, then, for 0 < tx < • • • < tk < 1 and h positive, A9.16) { \t{(x( + h) x()Y || x()x()} = h. If Xe С and X@) = 0 with probability 1, that X is Brownian motion follows, as we shall see, from A9.16), and we can dispense with the assump- assumption of independent increments. The same is true if, in addition to certain regularity assumptions, we require only that, for small h, the equations in A9.16) hold approximately, in a sense presently to be made precise. We can in a similar way characterize diffusion processes other than Brownian motion by replacing the right sides of the equations in A9.16) by functions of h, tk, and X(tk). We shall assume that A9.17) E{X(tk + h) - X(tk) || X(tJ, ..., X(tk)} * hP(tk)X(tk) and A9.18) E{(X(tk + h) - X(tk)Y || X(tl), . . . , X(tk)} ** ha\tk) for small h (the meaning to be made exact), where p(t) and o2(t) are given functions, and prove (Theorem 19.3 below) that X is a Gaussian random function satisfying A9.19) t The argument is easily adapted to the Lindeberg case; see Problem 1 of Section 10 and Problem 1 of Section 12. J The remainder of this section is required only for Sections 23 and 24.
Diffusion 159 and A9.20) Е{ВД X(t)} = Га\г) exp р(т) dr + p(r) Jr dr, 0<s <t The choice p@ = 0 and o2(t) = 1 leads back to Brownian motion, and the choice p(t) = —1/A — 0 and o2(t) = 1 leads to the Brownian bridge. Of the functions o2(t) and p{t) we shall assume that a2(t) is nonnegative, that both are continuous on [0, 1), and that, if and then the limit A9.21) g@ = exp G(t) = p(r) dr dr, lim g(t)G(t) exists and is finite. As t —>• 1, the nondecreasing function G(t) has a limit (finite or infinite), and hence git) has a finite limit (which vanishes if G —> oo). Thus the right side of A9.20), which is well defined for 0 < s < t < 1, has as t -> 1 a limit which we shall take as its value for t = 1. If we define X(t)=g(t)(\ + G(t))W°(G(t)l(l + G(t))) for 0 < / < 1, where W° is the Brownian bridge, then A9.20) holds for 0 < s < t < 1. Since g{t)t g{t)G{t), and G(t)/A + G(t)) all have limits as t —*¦ 1, we can define X(l) by passing to the limit. Thus, there does exist a continuous, Gaussian random function satisfying A9.19) and A9.20). Clearly X@) = 0 with probability 1. We shall regard Jasa random element of D lying with probability 1 in C. We shall give three conditions on X which characterize it as this random function. Each condition will be given in two versions, first in a form convenient for most applications and second in a distinctly weaker form that suffices for the characterization. The first condition defines the sense in which the approximations A9.17) and A9.18) are to hold. Condition 1. I/O < h< • • < tk < 1, then A9.22) and limy fc + h) - X(tk) г), •••, Xitk)} - hpitk)Xitk)\} = 0 A9.23) lim I E{\E{(X(tk + h) - X(tk)f \\ X&),..., X(tk)} - ha\tk)\} = 0.
160 Dependent Variables Condition la. If Q < h < • - ¦ < tk < \, then for all real щ, . . . , uk, A9.24) and lim - nloh e( I ехр21из^(^з) [X(tk + ft) - x(y - М'ь№)]} = о A9.25) limy e(Гехр| ш,ЛГ(*,)] [(*& + /г) - X(/fc)J - ^U)]] = 0. Condition 2. We have A9.26) sup, E{X2@} < oo. Condition 2a. 77ze variables X(t), 0 < t < I, are uniformly integrable. Condition 3. There is a constant К such that A9.27) E{(X(t) - X(tJY(X(tJ - X(t)f} < K(t2 - t,f, tx<t< t2. Condition 3a. For t < 1, - f (X(t + h) - X(t)f d? = 0. A9.28) lim lim sup Clearly, Condition 1 implies la and 2 implies 2a. To see that 3 implies 3a (which is essentially a uniform integrability condition), observe that, with the notation A9.5), the inequality A9.27) is just A9.7) with an extra factor of К on the right, and, if X lies in С with probability 1, we may deduce A9.11) just as before (with a new К—the value is irrelevant). An application of C) on p. 223 shows that the integral in A9.28) cannot exceed IKhjoL, so that 3a does follow from 3. (For conditions intermediate between 2 and 2a and between 3 and 3a, replace the exponent 2 on the left in A9.26) and A9.27) by 1 + ?.) THEOREM 19.3 Let X be a random element of D with ?{X e C} = 1 and P{X@) = 0} = 1. Suppose that o2(t) and p(t) are continuous on [0,1) and there exists the finite limit A9.21). If X satisfies Conditions 1 (or la), 2 (or 2a), and 3 (or 3a), then X is the continuous, Gaussian random function specified by A9.19) and A9.20). Before proving the theorem, let us connect it with our two familiar examples. If A9.29) />@ = 0, G2@=l, then A9.20) becomes ?{X(s) X(t)} = s, s < t, so that X is Brownian motion. If A9.30) ?{X(tk + h) - X(tk) I! X(tx),..., X(tk)} = 0
Diffusion 161 and. A9.31) E{(X(tk + h)- X(tk)f || X(tx), . . . , X(tk)} = h with probability 1, then A9.22) and A9.23) are certainly satisfied, A9.26) holds, and A9.27) holds (even with (t — tj(t2 — 0 on the right). Hence (if Xe С and Z@) = 0 with probability l) X must be Brownian motion. Since the hypotheses of Theorem 19.1 imply A9.30) and A9.31), that theorem is a corollary of the present one. If A9.32) ^ then A9.20) becomes E{X(s)X(t)} = s(\ - t), s < t, and X must be distributed as the Brownian bridge W°. Proof of the theorem. Fix points tt with 0 < tx < • • • < tk < 1, fix real numbers u1? . . . , uk, and let t and и vary over the strip A9.33) h<t<l, -оо<м<оо. Put A9.34) y,(t, u) = E{exp i(Ul X(tl) + • • • + uk X(tk) + и X(t))}; we shall derive a differential equation for ip(t, u). For notational convenience, write Z = ux X{t^) + • • • + uk X(tk), so that A9.34) becomes A9.35) y)(t,u) = E{eiZeiuX{t)}. Since Xe С with probability 1, ip is continuous (jointly in t and u) in the strip A9.33). Since E{X@} exists, we have A9.36) — w(t, u) = E{iX(t)eiZeiuXU)}, аи and Condition 2a implies that this function also is continuous in the strip A9.33). With the notations A9.3) and A9.5), we have h,u)- y>(t, u) = E{eiZe By A9.35) and A9.36), A9.37) - [y(t + h,u)- y>(t, u)] - uP(t) — y)(t, u) + ±u2o\t)y)(t, u) h du Ш\^п^\()]}\ + ]-E{\c(uAt,t+h)\}. zn h
162 Dependent Variables Since \eiv — 1 — iv\ < |y2, we have, in addition to A9.4), the inequality \c(v)\ < v2. Therefore \ E{|(AM+ft)l} ^ \fJh^ + ? f AJ,^ d?, ? f П J{\t2,t and it follows by Condition 3a that the third term on the right in A9.37) tends to 0 with h. And by Condition la (with (tlf . . . , tk,t) replacing (h, ¦ ¦ ¦ , h) and ("i, . . . , uk, u) replacing (ult . . . , uk)), the other two terms also tend to 0. Therefore A9.38) | W(t, u) = uP(t) |- W(t, u) - *uV@v(', u); since the right side of this equation is continuous in the strip A9.33), so is the left.t We now solve the differential equation A9.38). For v arbitrary, define A9.39) Xv(s) = v ехрГ-Пэ(т) dr\ tk < s < 1, and put A9.40) gv(s) = W(s, Xv(s)), tk<s<l. By the chain rule (^ and y2 denote the partial derivatives with respect to the first and second arguments of yi), ip2(s, Xv( which, together with A9.38) and the definitions A9.39) and A9.40), gives Therefore ? A9.41) w(s, Xv(s)) = W(tk, v) ехрГ- ^ ^ог2(г)АЛг) dr\ tk < s < 1. Given an arbitrary (t, u) in the strip A9.33), take v = и exp j"*fc р(т) dr. Then Xv(t) = и and Xv\r) = u2 exp[2 f p(t) dr\ hKr^ t, so that, by A9.41) with s = t, A9.42) y,(t, u) = W\ t The derivative with respect to / here is two-sided, even though h tends to 0 through positive values; see the second footnote on p. 155. % See the first footnote on p. 155.
Diffusion 163 vhere a = exp р(т) dr A md Ь2 = Jr. Going back to the definition of ip, we see by A9.42) that E{exp 1(щ X(tx) + --- + Щ X(tk) + и X(t))} = E{exp i(Ul Xih) + • • • + uk X(tk) + ua We have proved this for 0 < t± < • • • < tk <. t < 1; by continuity, it is true for t = 1 as well. It implies that, for arbitrary v, E{exp i(«i X(tJ + --- + uk X(tk) + v(X(t) - a X(tk)))} = E{exp i(Ul XVJ + --- + ut X(tk))}e~b*b\ Thus the random vector (Х(^), . . . , X(tk)) and the random variable X(t) — a X(tk) are independent of each other, and the latter has distribution ^@, W Taking к = 1 and t± = 0 and using the fact that P{Z@) = 0} = 1, we see that X(t) is normally distributed with mean 0. Taking к = 1 and tx general, we see that A9.20) holds and that Z(^) and X(t) — a X(tx) are jointly normal (for an appropriate a) and hence that the same is true of X(tJ and X(t). Taking к = 2, we see that X(t±), X(t2), and X(t) — a X(t2) are jointly normal and hence that the same is true of X(tx), X(t2), and X(t). Continuing in this way, we conclude that all the finite-dimensional distribu- distributions are, normal, which completes the proof. For the asymptotic form of Theorem 19.3, we shall need three conditions which parallel the Conditions 1, 2, and 3. As before, each condition will be given two forms. Let Xn be random elements of D. Condition 1°. I/O < tx <, • • • < tk < 1, then A9.43) lim lim sup - E{|E{Xn(ffc + h) - Xn{tk) \\XM,..., Xn(tk)} ft i 0 n->oo П - hp(tk)Xn(tk)\} = 0 and A9.44) lim lim sup - t{\l{(Xn{tk + h) - Xn(tk)J1| Xn(tl),..., Xn(tk)} ft 10 n->oo П -ha\tk)\} = 0. t This implies that {X(t)} is a Markov process.
164 Dependent Variables Condition l°a. If^<t1<---<tk<\, then, for all real wl5 . . . , uk, \ (Г k — Г"/ РУ1Л > 111 X it Л I 0 )(->• со h I 3=1 A9.45) lim lim sup- X [Xn{tk + h) - Xn(tk) - hp(tk)Xn(tk)]\ 0 and A9.46) lim lim sup- л I о п -> да /г Condition 2°. Же A9.47) Condition 2°a. A9.48) X [(Х„(Ъ + /7) - ^n(OJ - ^2(^)]} = 0. sup lim sup E{Xn2(t)} < oo. have lim sup lim sup | Xn(t)\ d? = 0. Condition 3°. TAere /5 a constant К such that 2} < K(t2 - txf, 'i < t < t2 A9.49) for all n. Condition 3°a. For t < 1, A9.50) lim lim sup lim sup - a->oo л!о n-»oo /l J{(Xn(t+ft)—Xn(tJ>aM It is easy to show that Condition 1° implies l°a and that 2° implies 2°a. Although 3° does not imply 3°a, each of them does suffice, as we shall see, for the theorem. THEOREM 19.4 Suppose that {Xn2(t):n > 1} is uniformly integrable for each t, that Xn(Q) -^> 0, and that for each positive s and r) there exists а д such that A9.51) ») > e} < V for all sufficiently large n. Suppose o2(t) and p{t) are continuous on [0, 1), and there exists the finite limit A9.21). If {Xn} satisfies Conditions 1° (or l°a), 2° (or 2°a), and 3° (or 3°a), then Xn -?> X, where X is the continuous, Gaussian random function specified by A9.19) and A9.20).
Diffusion 165 Proof. By Theorem 15.5, each subsequence of {Xn} contains a further subsequence converging in distribution to some random element X of D with P{X@) = 0} = 1 and P{IeC} = 1. It is enough to show that any such Zmust satisfy the remaining hypotheses of Theorem 19.3. By Theorem 5.4 and the assumed uniform integrability of {Xn2(t):n > 1}, Condition l°a implies Condition la; moreover, by Theorem 5.3, 2°a implies 2a, 3°a implies 3a, and 3° implies 3 (which, as we saw before, implies 3a). Thus X satisfies Conditions la, 2a, and 3a, which completes the proof. Remark. Suppose that Condition 3° holds, that A9.52) Zn@) = 0 holds with probability 1, and that the maximum jump in Xn satisfies A9.53) mzxt\Xn(t)-Xn(t-)\<en with probability 1, where A9.54) en->0. By Theorem 12.1 and the inequality A2.6) we then have A9.55) P ( sup \Xn(s) - Xn(t)\ > x) < К' ^ \t<s<t+6 J X with K' =A*KK2l, provided en < \X, from which A9.51) follows by the corollary to Theorem 8.3. Now A9.52) and A9.55) imply if en < \X; therefore (by C) on p. 223) A9.56) Г X*(t)d?<2 — J{Xn2(t)>a} a if ?n2 < T^a, which implies that {Xn(t):n > 1} is uniformly integrable. Finally, A9.56) implies A9.48). Therefore, if A9.52), A9.53), and A9.54) hold, the conclusion of Theorem 19.4 will follow if we verify only Condition Г (or l°a) and Condition 3°. Remarks. The results here are those of Rosen A967a), with the conditions altered so as to bring Theorem 12.1 to bear. The idea of deducing the central limit theorem from a differ- differential equation goes back to Khinchine A933); Theorem 19.3 under the special conditions A9.30) and A9.31) is due to Levy A948) and Doob A953).
166 Dependent Variables 20. MIXING PROCESSES cp-Mixing Let B0.1) . . . , U, fo, fi, • • • be a strictly stationary sequence of random variables defined on a probability space (Q., &, P). For a <Lb, define J?* as the a-field generated by the random variables fo, . . . , fb; define J^^ as the a-field generated by • • • , ?a-i> ?a> an(* define ^o°° as the a-field generated by fo, ?o+1, .... Consider a nonnegative function 93 of positive integers. We shall say that the sequence {?n} is cp-mixing if, for each к (— oo < A: < 00) and for each n (n > 1), ?i e u?i ^ and J?2 б Л?гП together imply B0.2) \?{EX n ?2) - P(?x)P(?2)| < <p{n)?{Ex). This is a joint property of {?„} and 93. We consider only functions 93 satisfying B0.3) lim ?>(n) = 0, п-кп and usually we require that cp(ri) go to 0 at some specified minimum rate. If we say that {?n} is 93-mixing without specifying cp, we mean that B0.2) holds for some 93 satisfying B0.3). If P(?i) > 0, then B0.2) is equivalent to B0.4) \?(E2\ EJ - P(E2)\ < <p(n), whereas B0.2) holds trivially if P(EX) = 0. We shall regard the left side of B0.4) as vanishing if Р(?х) = 0, so that the condition for 93-mixing is B0.5) sup {|P(?21 Ed - If 93G1) is small, B0.2) and B0.4) say that E2 is virtually independent of E-y. In a 93-mixing process, the distant future is virtually independent of the past and present.f This property enables us to prove various central limit theorems and functional central limit theorems. We shall often write 9зп in place of 93G1). If each 93n has the minimal value consistent with B0.5), then B0.6) 1 > (рг > y2 > • • • . t Since B0.2) and B0.4) are not symmetric in Ег and E2, past and future cannot be inter- interchanged; for a 95-mixing process that ceases to be 95-mixing when time is reversed, take fn = S?Li Vn-kl^' where r]k are independent and assume the values ±1 with probability \ each.
Mixing Processes 167 If we replace yn by B0.7) min{l, <plt . . . , <pn}, then {?„} is still 93-mixing and B0.6) holds. Thus there is no loss of generality in assuming B0.6). It will be convenient to define <p0 = 1, so that B0.5) holds trivially if n = 0. For fixed E2, B0.8) {E1:\P(E2\E1)-P(E2)\<e} is a monotone class| (is closed under the formation of countable increasing unions and countable decreasing intersections); for fixed Ex, the same is true of the class B0.9) {?2:|P(?2|?i)-P(?2)l<?}. It follows that, if B0.4) holds for Ex in j/ and E2 in 3&, where j/ and & are fields generating -^^ and ^k+n, respectively, then B0.5) holds, an observation useful in checking whether a given process is 93-mixing. Example 1. The sequence B0.1) is said to be m-dependent if the random vectors (?,-, . . . , ?fc) and (?k+n, . . . , f,) are independent whenever n > m. (In this terminology, an independent process is 0-dependent.) Such a process is 93-mixing with (pn = 0 for n > m. For a specific example, take B0.10) ?„ = ao?n + Дх^-х + ¦ ¦ • + amin_m, where the a{ are constants and the ?n are independent and identically distributed. Example 2. Let {?„} be a stationary Markov process with finite state space, and let fn =/(^n)' where / is some real function on the state space. If ^J> is defined as before and if ^Vab is the ог-field generated by ?a, . . . , ?6, then Лаъ с Жа\ Let /?u be the stationary probabilities for the Markov process, and let puv be the transition probabilities, so that PRi = Щ, . . . , Ci+l = U,} = PUoPuoUl ' * ' Put_lUl for each finite sequence u0, щ, . . . , ut of states. Assume that the /?u are all positive, so that B0.11) <pn = max ^ - 1 is finite, where p{^ are the «th order transition probabilities. Let Hx be a set t See Halmos A950, p. 26).
168 Dependent Variables of (/ + l)-tuples of states, and let H2 be a set of (j + l)-tuples of states. For the special element B0.12) ?1 = {(?*-,,...,?*) e Hi} of Жк_ „ and the special element B0.13) Of -^k+n' We have P ' ' ' Р \P where the sum extends over (u0, . . . , ut) in Нг and over (v0, . . . , v0) in H2. From the relation ^vpuv = 1 and the definition B0.11), we obtain B0.2). And now from the fact that B0.8) and B0.9) are monotone classes, it follows that B0.2) holds for all Ex in Жк_ ^ and E2 in ^^+n and hence for all Ex in uP «, and E2 in uT?n. If the transition matrix (puv) is irreducible and aperiodic, thenj B0.14) P{:J->PV for all и and v. Since the state space is assumed finite, the convergence must be uniform in и and v, so that cpn, defined by B0.11), converges to 0: {?„} is 93-mixing. In these circumstances more is known, namely that the rate of convergence in B0.14) is exponential: There exist positive constants a and p, p < 1, such that, if B0.15) <pn = aP\ then {fn} is 93-mixing. This is also true of certain process fn =/(?„) for which {?„} is a Markov process with infinite state space—it is true, for example, if {t,n} satisfies Doeblin's condition, has one ergodic class, and is aperiodic. % In some applications, we are presented only with a one-sided (strictly stationary) sequence B0.16) flf ?„.... In such cases we shall say the sequence is 93-mixing if B0.17) sup {I P(?21 Ej) - P(?2) I: E, e ^\ E2 e Jt?n) < <pn for positive integers к and n. Given a one-sided sequence B0.16), we can t Doob A953, pp. 172 ff.). JDoob A953, pp. 190 ff.).
Mixing Processes 169 always construct a two-sided sequence B0.1) with the same finite-dimensional distributions—the new sequence will in general be denned on a new sample space (which we can always take to be the product of a doubly infinite sequence of copies of the real line). We shall then call B0.1) the doubly infinite extension of B0.16). If B0.16) is ^-mixing, then, by stationarity, the doubly infinite extension satisfies B0.4) for Ег e JK\_t and E2 e Jt?w Since B0.8) is a monotone class, it follows that the doubly infinite extension is 93-mixing, with the same cp as before. It is easy to verify the converse: If the extension B0.1) is 93-mixing, then the original sequence B0.16) is 93-mixing with the same function cp. If we prove, for example, that 1 n /- Z и —*¦ yv> /П where the i-t are interpreted as elements of B0.1), then the result is also true if they are interpreted as elements of B0.16), since only finite-dimensional distributions are involved. All our theorems will be of this kind, and it is indifferent to the results whether we work with B0.16) or with B0.1). In proofs we shall consistently work with B0.1), which is more convenient because then the past is an infinite sequence instead of a finite sequence of changing length. Example 3. Let Q. be the unit interval, let 38 consist of the linear Borel subsets of Q, and let P be Lebesgue measure on 38. Suppose B0.18) fn(eo) = g(a)n, a)n+1, . ... , a)n+m), n = 1, 2, . .. , where со has dyadic expansion 00 = • a>1oo2' ' ' and g is some function of (m + l)-long sequences of 0's and l's. The sequence {?„} is stationary and m-dependent. This sequence is of the form B0.16) and comes under the theory as developed here because it is possible to construct, as outlined above, a doubly infinite extension (on some new space). Example 4. For another example in which the sequence {?„} is one-sided, take Q and 38 as in Example 3, let P be Gauss's measure dx log 2 Ja 1 + x and take ?n(co) to be an(oo), the nth coefficient in the continued-fraction expansion of со. The sequence (ai(co), a2(a)), . . . } is stationary and «p-mixing with cp of the form B0.20) <pn = apn, a > 0, 0 < p < l.f B0.19) P(A) = ;J- f ~- , Az 2 Ja 1 t Levy A937, Chapter 9).
170 Dependent Variables Inequalities for Moments . If ? and -r\ are Suppose ? is measurable Jt^L^ and r] is measurable the indicators of sets, then, by B0.2), B0.21) |E{??7} — E{?}E{>7}| is at most ?>nE{?}. We shall require bounds on B0.21) for random variables ? and г] more general than indicators. In all that follows, {?„} is assumed stationary and ^-mixing unless the contrary is explicitly stated. LEMMA 1 If ? is measurable Jtb_^ and r\ is measurable Jt™+n (и ^ 0), then B0.22) Е{|?Г} < oo, E{|t?|s}<oo, r, s > 1, - + -=l, r s implies Proof For an understanding of the lemma, consider first two extreme cases. If (pn = 0, so that ? and r\ are independent, then the inequality B0.23) holds because each of its members vanishes. If q>n = 1, which imposes no restriction at all, then, by the inequalities of Holder and Minkowski, Since we can approximate to f and r\ by simple random variables, in treating the general case we may suppose that B0.24) and B0.25) where {^ elements of inequality, = ZiUiIAt 7] = 2^/g,' is a finite decomposition of the sample space Q, into . For such random variables, we have, by Holder's S\l/S and hence it suffices to prove B0.26) v,{P(Bt <
Mixing Processes 171 For each i, Holder's inequality gives I vtfp, | At) - Since B0.26) will follow if we show that ;20.27) At) - P(fi,)lj || |Р(Я, | A,) - P(fi, lolds for each i. If Сг+ [Cr] is the union of those Bd for which Р(Б3-1 Аг) — Р(Б3) is positive [nonpositive], then Cf and Cr lie in <^?™+n, and hence ,) - P(Cf)] + [P(CT) - P(C71 Л,)] < 29V Thus B0.27) holds, which completes the proof. Taking r = 1 and ^ = oo in Lemma 1 leads formally to this result: If Е{|Ш < oo and P{|?7| > C} = 0 (| measurable «^.^ and rj measurable . then B0.28) lEdr?} - ^{ЦЕ^}! < The inequality is easily proved directly: It suffices to consider simple random variables given by B0.24) and B0.25), where |i>3-| < C. Since < С 2 \щ\ ?(At) 2 \P(B, | At) - P(B3)\, г i B0.28) follows by B0.27). We shall only need the special case of B0.28) in which both random variables are bounded. LEMMA 2 If | is measurable Jt*_^ and ||| < Cl5 and if r\ is measurable in > 0) and \rj\ < C2, then B0.29) Let B0.30) Sn = Si + • • ¦ + ln and So = 0. Let us temporarily abandon the hypothesis that {?n} is ^-mixing (retaining the hypothesis of stationarity, however).
172 Dependent Variables LEMMA 3 Suppose that E{?0} = 0 and B0.31) | Then B0.32) П fc=0 for all n, and B0.33) 1 E{Sn2} -> <r2, vv/zere B0.34) о*=Щ*}+ Proof. If pk — E{?0?fc}, then, by stationarity, E{Sn2} = «Po + 2nf (и - B0.32) follows immediately and B0.33) follows from n n—1 oo г=1 fc=i Suppose once more that {?n} is 9?-mixing. If ^0 has mean 0 and finite variance, then, by Lemma 1 with r — s = 2, B0.35) < oo, then B0.31) holds, so that the variance of Sn is asymptotically no2 with <r2 defined by B0.34). The quantity a2 may vanish even though ?0 has positive variance. This is true, for example, if |n = ?n — ?n_l5 where the ?n are independent and identically distributed (see B0.10)). The case a = 0 is a degenerate one, to be excluded in most of our theorems. LEMMA 4 If |0 w bounded by С and E{|0} = 0, and if'Z(pni< oo, then where K^ depends on <p alone. Proof. We shall show that B0.36) E{Sn4} < 768С Г °° i~l2 Li=o J If we replace cp{ by B0.7), the series in B0.36) does not increase; hence we may assume B0.37) 1 = cp0 > <px > cp2 > • ¦ • .
Mixing Processes 173 By stationarity, B0.38) Е{ЗД < 4! n 2 \Щ0^г+^ш+к}\, where the indices in the sum are constrained by B0.39) i,j, k>0, i+j + к < n. By Lemma 2, B0.40) and B0.41) By Lemma 2 again (and stationarity), Two further applications of Lemma 2 yield inserting these two inequalities into the preceding one, we obtain B0.42) By B0.40), B0.41), and B0.42), the summand in B0.38) does not exceed 4C4 times the minimum of the three quantities <Pi> <Рк, <Рг<Рк + <Pi, and hence the sum itself does not exceed 4C4 times B0.43) 2 <Pi + I V* + I (<Pi<P* + V,) = I <Рг<Рь + 3 2 V<, where in each sum the indices obey B0.39) as well as the restrictions indicated. Since q>t < 1, n oo 3=0 i,k=O Furthermore, by B0.37), n г п 2 ^j ч 2, 2 ^i S *m 2*\}~* i,k<i i=0 j,k=O i=0 nn oooo, Г00 i~I^ = 2n 2 2 ^ < 2и 2 ?>u 2 9V < 2n 2 Vi • u=0 v—u u=0 v=u j 2=0 I From these two inequalities it follows that B0.43) is at most B0.44) 8л|~2
174 Dependent Variables and hence that the sum in B0.38) is at most 4C4 times this quantity. Multiplying B0.44) by 4! n • 4C4, we arrive at B0.36). Functional Central Limit Theorem Define a random element Xn of D by B0.45) Xn(t, со) = -L S[ni](co), 0 < / < 1. THEOREM 20.1 Suppose that {|n} zs tp-mixing with 2n 9?n* < oo and that |0 /гй5 wean 0 and finite variance. Then the series B0.46) <г2=Е{!02} + fc=i converges absolutely; if a* > 0 and Xn is defined by B0.45), then B0.47) Xn К W. Proof. That the series B0.46) converges absolutely follows from B0.35). Suppose o2 > 0. We shall prove B0.47) by an application of Theorem 19.2, thus avoiding a separate argument for the central limit theorem. We shall prove that {Snz/n} is uniformly integrable. If we write Ea{f7} as shorthand for the integral of U over the set {17 > a}, the condition for uniform integrability becomes B0.48) lim sup E J- sA = 0. a-юо n (П ) Assuming this condition for the moment, let us see how the rest of the proof goes through. We must first show that Xn has asymptotically independent increments, in the sense that A9.13) implies that the difference A9.14) converges to 0 as n ->¦ 0. Suppose щ and u3- are integers with щ < vx < w2 < v2 < • • • < ur < vr and щ — Vt_x > b, i = 2, . . . , r, and suppose that, for i = 1, . . . , r, E{ lies in «л^*|. It follows from the definition of ^-mixing by induction on r that B0.49) < Mb). Suppose that A9.13) holds and let Et be the event {Xn(tt) - Xn(st) ? H^, where Ht e 0P-. Then Et lies in ~^[?*l.]+1 and, if <5 is the smallest difference Si - tt_lf then [nSi] + 1 - [шг_г] > [пд], so that, by B0.49), the difference A9.14) has modulus at most rq>([nd]). Since <5 is positive, Xn does have asymptotically independent increments.
Mixing Processes 175 Since X2{t) < S*ntJo*[nt], B0.48) implies that {X2{t)} is uniformly integrable for each t. Certainly, E{Xn(t)} = 0, and Lemma 3 implies that E{Xn2(t)} -»-1. By Theorem 8.4 (adapted to D) the tightness condition involving A9.15) will follow if we prove that, for each positive e, there exist a X, X > a, and an integer n0 such that n > n0 implies B0.50) ( Let us set 3=1 Since |0 has a finite second moment, there exists an increasing sequence of integers mt such that n > тг implies P{|?ol > vnji2} < Ijni2. If we define pn = i for тг < n < mi+1 (and jun = 1 for n < тх), then/?n goes to infinity, but so slowly that B0.51) lim nP{S*n > Xjn) = 0 n-»oo for each positive X. We may at the same time choose pn in such a way that Pn <n. Given e, choose Я so that X > <r and so that B0.52) for all i, which is possible because of B0.48). If Et = (max \St\ < ЪХ^п < \St\), then pfmax \St\ > 3V«) < P{|Sn| > V») + I^C^ П With p = pn, the sum here is at most " Si+,\ > Xjn} +ПI V(?, n {\Sn - Si+P\ > Xjn}) il п—p Each term in the first and third of these sums is at most P{S* > Xsjn}, and we can estimate the second sum by using the fact that E{ e J#*_«,'• P(max \St\ > 3XJn\ < P{\Sn\ > XJn) + (и + p)?{S* > Xjn} { i< ) К - si+P\ >
176 Dependent Variables And now B0.52) and the fact that the Et are disjoint yields P max \Si\ > Mjn\ < — + 2n?{S*v > Xjn) + ~2 + 9V From B0.51) and the fact thatpn -> go, we conclude that Pjmax |SJ > ЗД7"| < 7? 1г<п j Я for large n, and this is B0.50) except for the two irrelevant factors of 3. It now remains to prove B0.48). If | |ol is bounded by C, then, by Lemma 4, A \ 1/1 ± С 2I <? ± С _ 1 <f 2W 1 e/ 1 [n ) a In2 / a where ^ depends on cp alone. Thus B0.48) holds if |0 is bounded. If l0 is not bounded, define, for real x and positive u, ix if |x| < u, @ if |x| < u, [0 if |x| > u, [x if |z| > u, and put Then x =fjx) + ^и(х) =/u(x) + gu(x), so that, if Snu = 2;=1/M(|3) and ^nU = SjLi f«(f,-), then 5n = 5nu + ^nu and hence B0.54) Sn2 < 2SL + 2D2nu. Since/u(^0) is bounded by 2м, it follows by B0.53) that B0.55) [n By Lemma 1, and it follows by B0.32) that м B0.56) [n fc=0 From B0.54) and the relation Ea{U + V} < 2t^{U} + 2E{V}, we now obtain И 1И
Mixing Processes 177 and B0.55) and B0.56) lead to for an appropriate K'v depending only on cp. We can achieve B0.48) by choosing и so that ^Eu2{|02} < \e and then choosing a so that К^/сс < ^е. This completes the proof of Theorem 20.1. Under the hypotheses of the theorem we of course have B0.57) 4 ^ 2 4= S" ^ if <r2 > 0. And if a% = 0 E0.57) is equivalent to SJy/nA*0 and follows easily by Chebyshev's inequality and Lemma 3.f It is easy to make up a multivariate version of B0.57). If |n is the random vector B0.58) |n = (Й» . .., ?<;>), we can define the notion of 9?-mixing just as before. If 2<pn? < oo and if the !^° have mean 0 and finite variances, it follows by an application of the Cramer-Wold technique (Theorem 7.7) to B0.57) that тг* 2?.x fл has asymptotically a normal distribution, centered at the origin, with covariances B0.59) ait = E{tf>^>} +1 E{f«>tf>} + | Е(Й«ЙЛ}, where the series converge absolutely. The matrix (<т^-) may be singular or nonsingular. These results apply to each of the four examples given above. Note that in Example 1 (p. 167) and in Example 3 (p. 169) the series B0.46) are finite. In Example 2, with |n =/(Q, where the transition matrix for {?n} is irreducible and aperiodic, the asymptotic variance o2 can be written * = 2 7u/(u)f(v), uv where 7uv = KPu - PuPv + Pu2(P(uv - Pv) + PvJ,(pikJ - Pu)- If the matrix (yuv) annihilates the vector (/(и)), then o2 = 0. f If a2 = 0, it is even possible, by adapting the proof of B0.50), to show that
178 Dependent Variables Integrals in Place of Sums Theorem 20.1 has a natural formulation with {?n} replaced by a process in continuous time. Let B0.60) {vt(co):-oo < t < со} be a stationary stochastic process satisfying B0.61) E{u0} = 0, E{y02} < oo. Suppose that the process B0.60) is measurable,! so that the integrals B0.62) f vu(co) du are well defined and finite with probability 1. Let Yn be the random element of D defined by Yn(t, со) = -1= f"с» ds, (У-sJn Jo where a will be determined later (Yn, of course, lies in C). We should like to prove B0.63) Yn ^ W. We define {vt} to be ^-mixing if B0.64) |P(?X n E2) - holds whenever ?x lies in the <r-field generated by {vu:u < s} and E2 lies in the a-field generated by {vu:u > 5 + ?}, where now <р(г) is defined for all positive t. We shall suppose that {vt} is ^-mixing with B0.65) f° Jo Under this assumption and the assumption B0.61), we shall prove B0.63) with a (assumed positive) defined by B0.66) a% = 2 Г Jo dt. Although we could imitate the arguments used before for discrete time, it is simpler to reduce the present case to the previous one. If = Ji s(co) ds, i-i f Doob A953, p. 62).
Mixing Processes 179 then, by two applications of Fubini's theorem, E{?0} = 0 and B0.67) E{|02} < j\v,\ ds7\ ? EJ j\2 ds\ = E{Uo2} < oo. It is not hard to check that B0.66) and B0.46) define the same a. Since B0.64) implies that {|n} satisfies B0.2), and since B0.65) implies H<pi(n) < oo (provided у has the minimal value consistent with B0.64)), it follows by Theorem 20.1 that, if Xn is defined by B0.45), then B0.68) Xn ®+ W. Now dn = supt\Yn(t) - Xn(t)\ ? -Jp max Г |с<| dt, Gjn \<i<n J i—1 so that >e}< X ? О J{|?l>?<rVn> where Z, = JJ \vs\ ds. Since Щ2} < oo by B0.67), <5n^0, and B0.63) follows by B0.68) and Theorem 4.1. The relation B0.63) persists if n goes to infinity in a continuous manner. If vt is interpreted as the velocity of a particle at time t, the integral B0.62) is the displacement it undergoes during the time interval (s, t); B0.63) asserts that the particle is approximately in Brownian motion. Nonstationarity Returning to the case of discrete time, let us generalize Theorem 16.3 by showing that in Theorem 20.1 we may replace P, the probability measure governing the |n, by an arbitrary probability measure Po absolutely con- continuous with respect to it. Under Po the process need not be stationary. THEOREM 20.2 Theorem 20.1 remains valid if P is replaced by an arbitrary probability measure Po dominated by it. Proof. Define Xn by B0.45) and X'n by vn<i<nt where pn —*¦ oo but pjy/n -> 0. As in Section 16 (see A6.10)), we have If E lies in .лР «,, then, for each Am Si, \?{{X'n e A} n E) - ?{X'n e A}P(E)\ < <p(pn - k)-+0.
180 Dependent Variables Since this is true for every k, the proof now goes through precisely as the proof of Theorem 16.3. In Example 4 (p. 169) we may thus substitute Lebesgue measure for Gauss's measure. The proof of Theorem 17.2 carries over in exactly the same way: THEOREM 20.3 If the hypotheses of Theorem 20.1 are satisfied, if vn are positive-integer-valued random variables such that vjan -^- в, where в is a positive random variable and an —> oo, then {if a > 0) Yn -^> W, where Yn(t, со) = —-= S[Vn(m)i](co). If P{|0 e #} > 0, then, by Theorem 20.2, all our limit theorems persist if we compute the probabilities conditionally on the event {|0 ? H}. It is possible to go further and work with probabilities conditional on an event {lo = a)- To carry this through, we need another lemma. As usual, we assume {|n} to be 9?-mixing. LEMMA 5 IfE2E Jt™+n, then B0.69) |P{?2 || Jt*_^ — P(?2)l < <Pn with probability 1. Proof. Suppose P{E21| ^i^} — P(E2) > <pn holds on some set Ex that has positive measure and lies in ^^_^. Integrating over Ex produces a contradiction to B0.2). Thus ?{Ег || Jt*_^\ - P(?2) < <?„ with probability 1. The opposite inequality is treated in the same way. Let us now suppose that {?n} is regular in the following sense. In the first place, we suppose that the |n generate the <r-field ^ (this is really no restriction). In the second place, we assume that there exist conditional probability measures on 88 relative to ^й_о0. We assume, that is, that there exists a function P@(M) such that, for each со in Q, P@( •) is a probability measure on ?8, and such that, for each E in 08, P (E) is a version of the conditional probability P{.E|| ~#^00} (and hence is measurable ^^oj. This regularity condition holds, for example, if (Q, 88) is the product of a doubly infinite sequence of copies of (R1, 0Р) and the |n are the coordinate variables.! THEOREM 20.4 Suppose {|n} is regular in the sense described. Under the hypotheses of Theorem 20.1, there exists in ?8 a set E*, with P(E*) = 0, such t See Loeve A960, p. 361).
Mixing- Processes 181 hat, if cdo$E*, then the conclusion of this theorem remains valid when P у replaced by P0)o. Voo/. Let ?„=$! + 1- Sn and define Xn by B0.45). For positive Qtegers и and л let Zun be the random element of D defined by у/ u<i<nt f dun(co) = sup, |Xn(f, (a) - Zun(t, co)\, hen l,et Fo be the set of со for which ?г(со) is ± oo for some /; then P(^o) = 0 and, or each u, 20.70) -lim dun(co) = 0, соф Ео. n-*ao 4ow Xn ^ Why Theorem 20.1, and it follows by B0.70) and Theorem 4.1 hat, for each u, ;20.71) Zun ®+ W is n -> oo. In B0.71) P is the measure governing the ?n. Since P(?"o) — 0, we have P^C^o) = 0 f°r wo outside some exceptional set Ex of P-measure 0. If F is a closed set in D and Fe = {x:d(x, F) < e} with d either of the metrics for the Skorohod topology, it follows by B0.70) that ;20.72) lim sup P^co: Xn(co) e F} < lim sup Pmo{co: Zun(co) 6 F,} П-*ао П-Ю0 For co0 outside Ex. According to Lemma 5, if E e Л™, then ;20.73) IP^E) - P(?)| < ф) for co0 outside an exceptional set (of P-measure 0) depending on E. Since Jt™ is generated by {?„, ?u+1,. . . }, it is generated by a countable subfield. Since PWo( •) is actually a measure, it follows that B0.73) holds for all и > 1 and all E e ~#", provided co0 lies outside some grand exceptional set E2. Now {co:Zun(co) e FJ lies in Л™\ hence B0.74) Р^ш: Zun(co) e Fj < P{w: Zun(co) e Fe} + ф) for co0 ^ F2.
182 Dependent Variables If oo0$E* = Еги Е2, then B0.72) and B0.74) both apply, so that lim sup Ршо{со: Хп(ш) e F} < lim sup P{co: Zun(co) e F,} + y(u) n->oo n-»oo for each F and each e. Since B0.71) holds in the sense of P and Ft is closed, the limit superior on the right here is at most ?{W e Fe]. Letting e -»¦ 0 and и —»¦ oo, we conclude that n-*-a> for all closed F if a>0 $ E*, which, by Theorem 2.1, proves the theorem. If we have a one-sided sequence |lf |2, . . . , then Theorem 20.4 still holds if we compute probabilities conditionally on {fx = a}. 21. FUNCTIONS OF MIXING PROCESSES In this section we shall generalize the results of the preceding one by analyzing stationary processes {r}n} for which each r\n is a function of the entire process {?„}, which we assume to be stationary and 9?-mixing, as before. Preliminaries Let/be a measurable mapping from the space of doubly infinite sequences ( . . . , a_l5 a0, a1} . . .) of real numbers into the real line: /(..., a_l5 a0, al5 . . . ) e Rl. Define random variables B1.1) y]n =/(..., !„_!, |я> |я+1, . . .), n = 0, ±1, ±2, . . . , where ?n occupies the Oth place in the argument of/. We shall derive limit theorems for the process {r]n}. Although the rjn must be real valued, the ?n may now take values in Rk or even in a general measurable space. If the value of/ depends on only finitely many of the coordinates of its argument, then {r]n} is again 9?-mixing (with a new function cp), and hence the theorems of Section 20 apply. (Note that {rjn} is in any case stationary.) If/effectively involves all the coordinates of its argument, {rjn} need not be 99-mixing. We shall obtain limit theorems for {rjn} under the assumption that r\n can be closely approximated by functions depending only on finitely many of the |n. Let/ be a measurable mapping from R2l+X into R1. We write the general point of R2l+1 as (а_г, . . . , a0, . . . , <хг), so that the value of/ is in the form _г, . . . , a0, . . . , аг).
3Ut 21-2) Vm the subscript /n is a pair, not a product). The process {r]ln} is stationary for :ach /. We shall assume of the function/and the process {?„} that = 0, Efoe»} < oo. Ne shall assume further that there exist functions ft such that r\ln (denned эу B1.2)) has a finite second moment and such that, if 21.3) v, = v(l) = E{\Vo - Vl0\2}, hen f v,* < oo. г=1 This is the sense in which r\n is assumed well approximable by functions of эп1у finitely many of the |n. If, in place of the doubly infinite sequence B0.1), we are presented with a one-sided sequence B0.16), then we must assume in place of B1.1) that r\n. is given by B1-4) *г„ where now f is a real, measurable function of one-sided sequences (ax, a2, . . .). In this case, we take/j to be a measurable mapping from Rl into R1, we define B1-5) ^n and we assume that, if B1-6) v, then 2vj^ converges as before. Let us replace {?„} by its doubly infinite extension (see the discussion following B0.16)). If we rewrite/(al5 a2, . . . ) as a function /(..., a_l5 a0, al5 . . . ) effectively involving only the co- coordinates with positive indices, and similarly for/j, then B1.1) and B1.2) have the same finite-dimensional distributions as B1.4) and B1.5). It follows that the square root of B1.3) is summable in / and hence that to each limit theorem for functions of B0.1) there corresponds a limit theorem for functions of B0.16). As before, therefore, we may confine our attention to the doubly infinite case. Since rjl0, as defined by B1.2), is measurable ^l_v it follows by Jensen's inequality for conditional expected values! that t Doob A953) or Billingsley A965). In all that follows, we take E{| || &} to be measurable & (rather than equal almost everywhere to a function that is measurable ^).
184 Dependent Variables Therefore, by Minkowski's inequality, j|2} < 2vf. Since l>2v^ converges if livft does, there is no loss of generality in assuming that B1.7) The following lemma implies that, if rjln is given by B1.7), then уг is nonincreasing. LEMMA 1 Let 3F and & be a-fields with & <= <g.If E{?2} < oo, then Proof. The following equalities and inequalities all hold with probability 1. If 7] = | - E{? [ &}, then $ - E{f || ^} = v - E{rj || ^}. Hence it suffices to prove E{|?7 — E{?7 ^}|2} < E{?^2}. But E{\r] - E{r] I ^}|! Take expected values. Functional Central Limit Theorem Write B1.9) Sn = ^ and define Xn by B1.10) Xn(f, со) = THEOREM 21.1 Suppose that {?„} и y-mixing with E <рп^ < oo the r]n defined by B1.1) have mean 0 and finite variance. Suppose further that there exist random variables of the form B1.2) such that Svfi < oo, where vt is defined by B1.3). Then the series B1.11) <т2 = ЕЫ + 2| E{Wfc} converges absolutely; if a2 > 0 ant/ Zn и defined by B1.10), B1.12) Xn^>W. t According to Doob A953, p. 603), E{rjn || -^^±|} can be represented in the form B1.2) because it is measurable -^n-l- ^^s ^oes not геа^У matter, however: In the proofs we use only the fact that {r)ln} is stationary for each / and the fact that r\ln is measurable
Functions of Mixing Processes 185 ^roof. We first show that B1.11) converges absolutely. For each к and /, f i = [\k\, then, by Lemma 1 of the preceding section, md therefore 21.13) | because of the assumed convergence of ?<?v and ^vz > B1.11) does converge ibsolutely. Although it is possible to derive B1.12) from Theorem 19.2, it is more convenient to use the results of the preceding section. We first prove 21.14) 4= sn -^ N@, a2). vVe shall assume, as we may, that r\ln is given by B1.7). Put [21.15) bln = r]n- r]ln, ю that v(t) = E{<52J and E{dln} - E{r]ln} = 0. We have 1 1 !L 1 i 21.16) 4=sn = ~ 2vu + -7= 2Л. md the idea in proving B1.14) is to show, by using Theorem 20.1, that the irst sum on the right in B1.16) is approximately normal for large n and then to show that the second sum is small for large /. First of all since i > / implies that the left side vanishes, whereas i < / implies that it coincides with E{E2{?70 — E{?70 | -#ij | Л1_г}}, which by Jensen's theorem does not exceed v,. In the argument leading to B1.13) we may therefore re- replace щ by r]l0 and rjk by rjlk and, using the inequality E{?720} <; E{?702}, con- conclude that |E{?7JO?7jfc}| does not exceed the right side of B1.13). Therefore the
186 Dependent Variables series converges absolutely and uniformly in /; since it converges termwise to B1.11), we have B1.17) lim в? = o\ J-GO It is easy to check that, for each /, {rjln} is 9?U)-mixing, where fl if n < 21, [<p(n - 21) if л > 21. The convergence of ?(<p(n))^ implies that of ~Z{cp(l)(n))?, and it follows by Theorem 20.1 that, for each fixed /, -^Flnu^mo?) B1.18) as n —>• oo. Now N@, of) ^> N@, a2) by B1.17); because of B1.16) and B1.18), the relation B1.14) will follow by Theorem 4.2 if we show that B1.19) lim lim sup P I-»oo n-»-oo 3=1 =0 for each positive e. To this end, we estimate the variance of 2^=1 dtj. If m denotes the maximum of / and i, then, by the definition B1.15), l0 E{<5 J0 ij = dm0. Since v(m) <, v(j), we obtain In the argument leading to B1.13) we may therefore replace rj0 by dl0 and Vk by <5№; since E{5^0} < E{^02} by Lemma 1, |Е{<5гоEгЛ.}| cannot exceed the right side of B1.13). Therefore the series converges absolutely and uniformly in /. Since each term goes to 0 as / —v oo, B1.20) lim гг2 = 0. By Lemma 3 of the preceding section applied to {dln} (the sequence {dln} is not 93-mixing, but the lemma does not require this), B1.21) lim -E n-*-oo П >li = r, Chebyshev's inequality and B1.20) now yield B1.19), which completes the proof of B1.14).
Functions of Mixing Processes 187 From the absolute convergence of the series B1.11), it follows via Lemma 3 of the preceding section (recall once again that this lemma requires only stationarity—99-mixing is not needed) that B1.22) ^ n If a2 = 0, B1.14) is the same as SJy/n ^>0. From now on we assume that a2 > 0. To prove B1.12), we establish the convergence of the finite- dimensional distributions and then tightness. Let pn be positive integers going to infinity at a rate to be specified later and define 77 — uni — and B1.24) Vni = ?{Sn - Si+2Pn 1 In these definitions we adopt the conventions that Si_2v = 0 if i < 2pn and Sn — Si+2Vn = 0 if i + 2pn > n. We shall often write/? in place ofpn. By Minkowski's inequality and Lemma 1, 3=1 3=1 If B1.25) then B1.26) E{|S, - ?{Sk || ^1+Л I2} and it follows that B1.27) E{\Uni - SA2} < for all i. In the same way we obtain B1.28) E{|Fni - (Sn - S,)|2} < 2?{SlDn] + 2fx\pn). Since Uni and Vni are measurable -#!f^ and ^?_v, respectively, B1.29) \P{UnieHlt VnieH2] - ?{Uni e Hx}?{Vni еЯ2}| < <pBpn) for all linear Borel sets Hx and Я2. Consider now the finite-dimensional distributions. From B1.14) it follows that B1.30) Xn{t) - Xn(s) ®+Wt- Ws.
188 Dependent Variables We shall prove that B1.31) (Xn(t),Xn(l) - Xn(t))^(Wt, Wx - Wt); the argument is easily adapted to cases of dimension exceeding 2. Let pn go to infinity slowly enough that pJn^-0. Since E{Sk2} ~ ко2 by B1.22), it follows by the inequality B1.27) and Chebyshev's inequality that we have the convergence relation Similarly, B1-33) -J- Fn,[nt] - (Xn(l) - Xn(t)) ** 0,  and B1.31) will follow if Because of B1.29), it is enough (Theorem 3.1) to show that there is conver- convergence in distribution in each of the two coordinates here; but this follows from B1.30) by B1.32) and B1.33). We turn now to the question of tightness and prove that, for each positive ?, there exists a X > о such that B1.34) ( for all sufficiently large n. Put 3=1 By the argument leading to B0.51), there is a sequence pn going to infinity so slowly that, if t B1.35) Д,(А) = ?{StVn > A^/n}, then B1.36) n-»oo for each positive Я. Consider the variables B1.23) and B1.24) for this choice of pn. With the definition B1.35), we have P{|S, - Uni\ > Ajn} < P{|S so that, by B1.26), B1.37) ?{\Uni - St\ > Xjn}
Functions of Mixing Processes 189 for all /. Similarly, B1.38) ?{\Vni - (Sn - S<)\ > Xjn) < n By B1.14) and B1.22) it follows via Theorem 5.4 that the variables Sn*jn are uniformly integrable. Therefore there exists a X > о such that B1-39) for all k. Fix such a X. By B1.37), B1.40) p(max \St\ > 6Xjn) < pfmax \UJ > 5ljn) Consider the sets E{ = max \Uni\ < 5Xjn < \UJ { i<i We have B1.41) Pfmax \Uni\ > 5Xjn\ < P{\Sn\ > Xjn} \ i<n j ii Since |Sn - Uni\ < \Sn - St - VJ + \Vni\ + |54 - UJ, the /th summand in B1.41) is at most B1.42) P{\Sn - Si - VJ > lVn + Р(Ег n {[VJ ^ 2Xy/n}) + P{|^ - UJ Since Ei lies in Jt^ an<l ^ni is measurable ^™_P, Р(Ег n {|Kni| > 2lVn}) < Р№)Р{|КЯ1-| > 2lVn} + P(E{) <p{2pn) By B1.38), ?{\VM\ > so that Yt + Р(Ег) <рBрп). Using this estimate for the middle term in B1.42) and the estimates B1.37) and B1.38) for the other two, we see that the ith summand in B1.41) is at
190 Dependent Variables most ?{E%)?{\Sn_t\ > hf^i) + Р(Е,)<рBрп) An Therefore, by B1.39) and the disjointness of the Et, f \ni\ > 5V«) < ~ ^ Vni\ > 5V«) < ~ + <pBpn) ) A The last three terms on the right here go to 0 (see B1.25) and B1.36)), and it follows by B1.40) that i -»i =— -у I —- -i2 i<n ) A for all sufficiently large n, which proves B1.34). This completes the proof of Theorem 21.1. In particular, we have the central limit theorem B1.14), valid even if a2 vanishes, and this result has a multivariate version. Suppose B1-43) rjn = (rj™, ..., Vnr)), where each rj1^ is a function (i) f^V ? ? ? \ with mean 0 and finite variance. Suppose these random variables have approximations U) _ /•(»)/? t t \ Vin Jl \Sn-li • • • j ьп, • • • 5 bn+lJ- If Sj v^(i) < oo for each i, where then, since E*(iow«>- ( t=l г=1 j г=1 any linear combination of the components of rjn satisfies the hypotheses of Theorem 21.1, so that the Cramer-Wold method applies. It follows that rr% E?=1 rjk is asymptotically normal; the covariances for the limit are B1.44) EfoiV') + t ЦЧоУ"} + where the series converge absolutely.
Functions of Mixing Processes 191 Applications Example 1. Let the ?n be independent and identically distributed with mean 0 and variance 1 and define i=—oo oo where we assume Tit a? < oo. The random variable B1.7) is in this case, and the requirement Hvfi < oo becomes < oo. oo / \2 2 ( 2 a*) Example 2. Let (Q, $, P) be the unit interval with Lebesgue measure, as in Example 3 of the preceding section (p. 169), and let ?n(eo) be the Hth digit in the dyadic expansion of со (n > 1). Then {|l5 |2> • • •} is stationary and independent. A random variable r\n of the form B1.4) can be regarded as a function r)n(co) = f(Tn~1co), where/is a function on the unit interval and T is the transformation Too = 2co (mod 1). Suppose that JJ /(соJ с/со is finite and that 2v^ converges, where ь = \f{ Jo and each/j depends on со only through the first / digits fi(co), . . . , ?г(со) of its dyadic expansion. It then follows by Theorem 21.1 that, if /a = JJ f(co) dco, B1.45) -У i/(T'a>) - Ain) ^> N@, a2) \i=i for an appropriate asymptotic variance a2. And there is the corresponding functional central limit theorem. If there exist such functions fx at all, they may be taken to be B1.46) /г(со) = 2гГ f(s)ds, cog/Ц^,^ J(i-D/2! \ -2 2 (see B1.7)). If/(со) = I{ot)(co) and/ is defined by B1.46), then vx < 2~l, so that B1.45) holds (jl = 0- In this case Zj»=1 /(Тгсо) is the number of/, 1 < i < n, for which T^to < t. It Ms a dyadic rational,/coincides with/ for some /, and B1.45) follows by the central limit theorem for (/— Independent variables. If/ is not a dyadic rational,/(со) involves the entire expansion of со.
192 Dependent Variables If/is continuous and/ is defined by B1.46), then vt < 4wf2B~l), where wf(d) is the modulus of continuity of/. Therefore B1.45) is true if 2г wfB~l) converges, a condition which holds, for example, if / satisfies a uniform Holder condition of some positive order: B1.47) |/(си) -Дсо')| < K\co - (o'\e, K,6>0. For example, we may take/(со) = со, in which case r)n(co) = Tn(co). (It can be verified that this last sequence {щ, г\г, . . . } is not <p-mixing for any <p going to 0—to know Tnco is to know Tn+1co, Tn+2co, . . . exactly.) Example 3. Consider the (Q, &S, P) and {a^co), a2(co),. . . } of Example 4 of the preceding section. If 7\ denotes the continued-fraction transformation B1.48) 7> = - - со 1 CO and if a(co) = [1/co], then an(co) = aiT^co), n > l.f As in the preceding example, B1.4) can be regarded as a function ??n(co) =/(Г1"~1со) with/ defined over the unit interval. And 1 f'/M , log 2 Jo 1 + со will hold if if B1.49) — Г log 2 Jo i + со and if 2v^ converges, where B1.50) Vj = — f Cf(°>)-A^y da) log 2 Jo 1 + со and where each / depends on a> only through its first / partial quotients a-^ipS), . . . , flj(co). Since 1 < 1 + со < 2 B1.49) is equivalent to P f(coJdco < oo and the requirement 2v^ < oo is unaffected if we define v? by |/(co)-/?(co)|2rfco instead of by B1.50). Jo t See Section 4 of Billingsley A965) for this and the other facts about 7\ used here.
Functions of Mixing Processes 193 Since a set {со:аг{со) = at, 1 < i < /} is an interval of length at most 2~J+1, we may, for the same reasons as in Example 2 preceding, take/(со) to be /((И)(ш) or to be any function satisfying a Holder condition B1.47). For another example, note that, sincef , , Pi(oo) - 1 log со — log ?-L- where/?г(со)/#г (со) is the /th convergent, we may also take/(со) = —log со; for this/, [л = 7T2/A2 log 2). Moreover, B1.51) log qn(co) = - 2 log Тгксо + 40, |0| < 1, fc=O and it follows that B1.52) -1 (log gn(co) - -^-) Д> N@, a2) V"\ 12 log 2/ for an appropriate positive a2. The functional versions of these limit theorems are easily written out. Diophantine Approximation These results lead to limit theorems connected with Diophantme approxi- approximation. A fraction p/q is a best approximation to со if it minimizes the form B1.53) \q'-oo-p'\ over fractions p'jq' with denominators q' not exceeding q. The successive best approximations to со are{just the convergents/?n(co)/^n(co), n = 1,2,... , and so the value of the form B1.53) for the «th in the series of best approxi- approximations is B1.54) rfn(co) = |gn(co)-co-pn(co)|. Since § B1.55) l-log^O) - log?n+1(co)| < log 2, B1.52) implies B1.56) -U- log dn(co) - -^-) Д> N@, a2). yjn\ 12 log 2/ t Billingsley A965, p. 42). t See Khinchine A961). § See D.6) on p. 42 of Billingsley A965).
194 Dependent Variables This limit theorem has a functional form. From B1.51) and Theorem 4.1 it follows that, if Yn is the random element of D denned by Yn(t, со) = —p(log q[n<](co) - [nt] 7Г 12 log 2 (with the same a as in B0.52)), then Yn Д- W. Let Zn(t,co) = —- - [n<](co) - — . 12 log 2/ Now B1.55) implies a functional central limit theorem corresponding to B1.56): B1.57) Zn^>W. From B1.57) we can derive an arc sine law for best approximations. By B1.56), A; log dk А- тг2/A2 log 2) (it can be shown that there is convergence with probability 1), so that the measure of discrepancy dk(oS) has normal order e~k**lA2 log 2). Let us say that the fcth best approximation pk(a>)lqk(co) is "superior" if dk(co) < <г*Л<шо«» and "inferior" otherwise. If Qn{co) is the fraction of superior ones among the convergents then, by B1.57) and A1.26), 2 _ P{0n < a}-^ - arc sin ^a, 0 < a < 1. For example, if и is large and if pn(co)lqn(a>) is superior, then the odds are approximately even that more than 85 per cent of the convergents B1.58) are also superior. Nonstationarity If X'n is denned by X'n(t, со) = -±= E{5[ni] where pn —»¦ oo but pjy/n —»¦ 0, if Xn is defined by B1.10), and if <5n = supt|Xn@-X;@|,
Empirical Distribution Functions 195 then By Minkowski's inequality and stationarity, b№ < -7= f 1=1 If 2v^ < 00, then by Lemma 1 and the choice of pn, Е{<5„2} —»¦ 0. By the same arguments as before, therefore, Theorems 20.2 and 20.3 carry over to processes {??„}. It follows, for example, that we may replace Gauss's measure by Lebesgue measure in the applications to Diophantine approximations. Theorem 20.4, however, does not carry over. To see this, consider the process {?„} of Example 2 of this section (p. 191) and put r}n(eo) = 2?Ln |fc(co)/2l-"+1. Although 2?=1 т){(со) is asymptotically normal (when properly normalized), this cannot be true if the probabilities are computed conditionally on ^i(co): Since rj^w) = со it follows that, if ^i(co) = a @ < a < 1), then the conditional distribution of 2"=1 ч]{(со) is a unit mass at 2>=1 ъ(<х). Remarks. The results of this and the preceding section, which are new, extend those in Billingsley A956 and 1962). Central limit theorems under the hypotheses of Theorems 20.1 and 21.1 have been proved by Ibragimov A962); see his paper for references to the earlier literature (see also Doeblin A940), Fortet A940), and Kac A946)). 22. EMPIRICAL DISTRIBUTION FUNCTIONS We shall analyze the empirical distribution functions for sequences {!„} and {r]n} of the kind discussed in the preceding two sections. cp-Mixing Processes Throughout the section, {?„} is stationary and <p-mixing. We shall need the following estimate, related to Lemma 4 of Section 20. LEMMA 1 Suppose that |?0| < 1 with probability 1, that E{f0} = 0, and that Yik^cp^ < oo. Then B2.1) Е{5Й4} < 288[n2E2{f02} +
196 Dependent Variables Proof. We have B2.2) Е{5„4} < 4! n with the indices constrained by B2.3) i,y, к > 0, i +7 + A: < и. Write/7 in place of E{f02}. We first prove these three inequalities: B2.4) V B2.5) B2.6) By Lemma 1 of Section 20, the left member of B2.4) is at most since |fn| < 1, B2.4) follows. A similar argument leads to B2.5). By another application of Lemma 1 of Section 20, the left member of B2.6) is at most B2.7) |Е{ Two further applications of the same lemma give o?<}l < 2<P<*P, from \?n\ < 1 it follows that B2.7) is at most 4^i^/72 + 2y$p, which proves B2.6). Now B2.2) and the three inequalities just proved imply 3,k<i I where the indices also obey B2.3). Since 4! 4nlp* 2 \ i,k<3 i<Pk < 2 2 к?* < . - - j=Oi,k=O and 2 ^<2 2^ J",fc<i i=Oj,k=O k=0 B2.1) is proved. Suppose now that 0<fnM<l; let Fn(t, со) be the empirical distribution function for fi(co), . . . , ?n(co), and
Empirical Distribution Functions 197 define Yn by B2.8) Yn(t, со) = Jn(Fn(t, со) - F(t)), where F is the distribution function for ?0, Let B2.9) gt(a) = /[Oit](a) - *¦(/). THEOREM 22.1 Suppose {?„} и cp-mixing with Hn2q>n% < oo, and suppose ?0 /zas a continuous distribution function F on [0, 1]. B2.10) ^п^У, where Yn is defined by B2.8) and У is f/ze Gaussian random function specified by B2.11) E{7@} = 0 B2.12) These series converge absolutely and P{Ye C} = 1. Proof. We first show that we may confine our attention to the case in which ?0 is uniformly distributed over [0, 1]. Now {F(?n)} is <p-mixing, and, since F is continuous, F(?o) is uniformly distributed. If F'Jj, со) is the empirical distribution function for F(i1(co)'), . . . , F(?n(co)), and if then, with probability 1, y;(F(o) = Yn(t) for all /. If the theorem is true in the uniform case, then Y'n -^> У, where У is a Gaussian random function with Е{У'(/)} = 0 and (write g't(a.) = Е{У'E)У'@} = and where P{Y"eC} = 1. Define h:D-> D by (/ur)@ = x(F(t)) and let У = Л(У'). Since С с Л-ic, Р{УеС}=1; furthermore, У is Gaussian and satisfies B2.11) and B2.12). Since h is continuous on C, it follows by Corollary 1 to Theorem 5.1 that Y'n ®+ У implies Yn ^> Y. Thus we need only treat the uniform case.
198 Dependent Variables Assume then that f0 is uniformly distributed, and take F(t) = t in B2.9). Since {gt(?n)} is <p-mixing and *«(o=4= it follows by Theorem 20.1 that Yn(t) is asymptotically normal. From the multivariate version of this result, it follows that, for each fc-tuple tu . . . , th, the distribution of {Yjj-d, . . . , Yn(tk)) approaches a normal distribution centered at the origin; by B0.59), the covariances of these limit distributions are those specified by B2.12) for Y. It follows also from Theorem 20.1 that the series in B2.12) are absolutely convergent. We shall show that for each positive e and r\ there exists a <5, 0 < <5 < 1, such that B2.13) P{w(Yn,d)>e}<V for all sufficiently large n. It will follow by Theorem 15.5 that {Yn} is tight and that, if Y is taken as the limit in distribution of some subsequence {Yn.}, then Р{УеС} = 1, 7ДУ, and Yis Gaussian and satisfies B2.11) and B2.12). This will complete the proof. Fix e and r\. Since f0 is uniformly distributed, By Lemma 1 applied to {&(?„) - ?,(?„)}, where K± depends on cp alone. Therefore, if ? B2.14) ~ut- s, n we have (assume e < 1) B2.15) Е{|Г„@ - 7n(s)|4} <^(t- sf. ? Assume now that p is a number satisfying с/л < р, and consider the random variables Yn(s + г» - Yn(s + (i - \)p), i = 1, 2, . . . , m, where m is a positive integer. By B2.15) and Theorem 12.2, B2.16) Pfmax | Yn(s + ip) - Yn(s)\ > x) < ^ m2/, [i<m J SA where K2 = 2К^КХ depends on cp alone.
Empirical Distribution Functions 199 We next show that [22.il) I y.@ - ВД1 й I Yn(s + p)- Yn(s)\ + pVn, s < t < s + p. Taking s = 0 for notational simplicity, we see that B2.17) is equivalent to \Un(t) - nt\ < \Un(p) - np\ +np, 0<t<p, •vhere Un(t) is the number among gu . . . , fn that satisfy |г- < t. But Un(t) -nt< Un(p) -nt= Un(p) -np + n(p -t)< \Un(p) - np\ + np ind nt - Un{t) <nt < \Un(p) - np\ + np. Now B2.17) implies ;22.18) sup \Yn(t) - Yn(s)\ < 3 max \Yn(s + ip) - Yn(s)\ + pVn. s<t<s+mj> i<m [f ;22.19) -<p<-7T, :hen B2.16) applies, and it follows by B2.18) that ;22.20) P( sup \Yn(t) - Yn(s)\ > 4e\ < Щ m2p2. [s<t<s+mp ) ? Choose <3 so that K2dje5 < rj. From B2.20) it will follow that ;22.2i) p( sup | ад - ад | > Ц < vd, [s<t<s+d ) provided there exist a p and an integer m such that B2.19) holds and mp = д. But this is equivalent to the existence of an integer m with (<5/?)v« < rn <. {bje)n, which is true for all sufficiently large n. For given ? and r], therefore, there exists a <5 such that B2.21) holds (for ill s < 1 and with s + д replaced by 1 if it exceeds 1) when n is large enough. But this implies (see the corollary to Theorem 8.3) that for large n, which is B2.13) except for the factor 12. Thus Yn ^> Y. The assumption that F is continuous serves only to simplify the proof. It can be avoided by using Theorem 12.5 where we have used Theorem 12.2. Of course, Y will not lie in С with probability 1 if F has discontinuities. Functions of cp-Mixing Processes Consider now a function B2.22) 4n =/(.-., ?-1,?«,?и-ь---)
200 Dependent Variables of a <p-mixing process, as in the preceding section. We shall assume as before that t]n has close approximations B2.23) rjln =/i(fn_i, ...,?„,..., fn+j) depending on only finitely many of the ?,-. Again we may work with one- onesided sequences just as well. Suppose that 0<rjn(co)<\, and define Yn by B2.8), where now F denotes the distribution function of rH and Fn(t, со) denotes the empirical distribution of r]i(co), . . . , r]n(co). In addition to a restriction on the magnitude of cpn, we shall need, in analyzing the asymptotic distribution of Yn, an assumption the effect of which is to ensure that the empirical distribution Fln(t, со) of r}n(co), . . . , r}ln(co) agrees with Fn(t, со) for points t in a set Нг which rapidly becomes dense in [0, 1] as /—»¦ oo. We shall suppose in the first place that о <mn(v) < i- We shall suppose in the second place that there exist sets Нг in [0, 1] with these three properties: (i) If t e Hlt then B2.24) hoAVo) = hoAVw) with probability 1. (ii) If B2.25) J^iFity.teH,}, then Jt is a pj-net in [0,1], where pt goes to 0 exponentially: There exist positive A and p, p < 1, such that B2.26) Pl<AP\ /=1,2,.... (iii) We have Нг с Н1+1. From (i) it follows that, ifteH^ then Fln(t, со) = Fn(t, со) with probability 1. Define gt by B2.9), as before. THEOREM 22.2 Suppose that {?„} is cp-mixing with T,n2(pn% < oo, that r]0 has a continuous distribution function F on [0,1], and that there exist sets Hl with the three properties just described. Then Yn^Y, where Y is the Gaussian random function specified by B2.27)
Empirical Distribution Functions 201 md 22.28) E{Y(s)Y(t)} = The series converge absolutely and P{ Y e C} = 1. Before proving the theorem, we give two examples of its application. Example 1. Let (O, &, P) be the unit interval with Lebesgue measure and et ?„(«) be the «th digit in the dyadic expansion of со, as in Example 2 of the preceding section (p. 191). If 7co = 2co (mod 1) and rjn(a>) = Tn~1co, then r]n has the form B2.22) (one-sided version). The r\n are uniformly distributed: F(t) ЕЕ t. Define r]ln by Льп(со) = 1-~^ if ^<^(co)<^, l<i<2'. Then r)l0(w) is a function of the first / digits of со and hence rjln has the form B2.23) (one-sided version). If t has the form i/21, then B2.24) holds. Since F is the identity, we may take which is a 2~J-netin [0, 1]. Thus {rjn} satisfies the hypotheses of the theorem. Example 2. Let (O, 3%, P) be the unit interval with Gauss's measure B0.19), as in Example 3 of the preceding section (p. 192). Let ?n(oo) — an(co) be the «th partial quotient in the continued-fraction expansion of со, and let r]n(co) = T^co, where Tx is the continued-fraction transformation B1.48). Then r\n has the form B2.22) (one-sided version), and the distribution function of the r)n is F(t) = J_ f^L = logd log 2 Jo 1 + x log f log 2 Jo 1 + x log 2 Define rjln(co) as the /th convergent of T^co: Then r]ln has the form B2.23) (one-sided version). Let Ht be the set of all /th order convergents (for all со). Now the unit interval splits into countabiy many subintervalsf having the elements of Нг as endpoints; ' ' t The fundamental intervals of rank /—see Billingsley A965, p. 42). = со
202 Dependent Variables and rjn(co) = Pi(co)lqi(co) lie in the same subinterval, so that B2.24) holds for t in Hx. Now the subintervals have length at most 2~l+1. Since the derivative of F is everywhere less than or equal to I/log 2, the set B2.25) is a prnet in [0, 1] with <r 2 1 log 2 2' Thus {rjn} satisfies the hypotheses of the theorem. Proof of Theorem 22.2. As in the proof of Theorem 22.1, we first show that it suffices to consider the case in which щ is uniformly distributed on [0, 1]. Let rj'n = F(rjn) and r\\n = F(r)ln). Then r]'o is uniformly distributed and, if t lies in the set Jx defined by B2.25), then ho.tyJlo) = ho,f]\4io) with probability 1. If Щ = J[ = Jx, then {rj'J and {rj'ln} satisfy the hypotheses of the theorem relative to H{ and J[. If the result is true in the uniform case, then, by the opening arguments of the preceding proof, it is true in general. Assume then that r]0 is uniformly distributed; take F(t) = t in B2.9); assume that, if t e Нг, then B2.24) holds with probability 1; assume Hx с Hl+1; and, finally, assume that Ht is a pj-net in [0,1], where px satisfies B2.26). To include 0 and 1 in Нг involves no loss of generality. x From these assumptions, it follows that \rj0 — ?7г0| < 2pf with probability 1. Therefore, whatever t may be, P{% < t < Т]Ю} + Р{г}ю <t<r)o} < ?{t - 2Pl <rjo<t} + P{t<rj0<t + 2Pl}, so that, since rj0 is uniformly distributed, Since gt(r]l0) is a function of ?_t, . . . , ?г alone, and since B2.26) implies Ерг4 < oo, it follows by Theorem 21.1 that 1 Л yjn i-i is asymptotically normal. The multivariate version of this result (see B1.43) and B1.44)) shows, just as the multivariate version of Theorem 20.1 did in the preceding proof, that the finite-dimensional distributions of Yn converge to those specified for Y and that the series in B2.28) converge absolutely. Again as in the preceding proof, it now suffices to produce, given positive ? and 7], a d, 0 < <5 < 1, such that B2.29) P{w(rn, <5) > ?} ^ 7] for all large n.
Empirical Distribution Functions 203 If s and t both lie in Нг, then the process s q>(l)-mixing with n = 0, ±1, ±2,... <pu\n) = l if К 21, {(p(n - 21) if n > 2/. Jince ?70 is uniformly distributed, and since .emma 1 implies fc=0 vhere K3 depends on <p alone. Therefore s,teHlf - < t n 22.30) mply (assume ? < 1) •22.31) P{\Ut)-Yn(s)\>*}< Choose and fix an integer m such that ;22.32) - s ^(t - sJ vvhich is possible because of B2.26). We shall inductively define sets r Av) Av) Av) Lv: t0 , t± ,. . ., tj> for v > 0. The sets Lv will be such that such that and and such that B2.33) where »2г — li 5 1<г<2 max 1<г<2
204 Dependent Variables If f«») _ о and ^0) = 1, then Lo has these properties. Suppose Lo, . . . , Lv already constructed so as to have these properties. Define t?+l) = t\v), and take as t%?l] any point of Hm(v+1) satisfying \Av+l) l(Av) _i *(^'\| s* „ \hi+l ~ Wi + h+ljl <¦ Pm{v+D> such a point exists because Hm{v+1) is a /3m(<,+i)-net in [0, 1]. From B2.32) it follows that B2.33) holds also with v replaced by v + 1. Thus Lv+X satisfies the requirements. Fix a number в such that B2.34) i < 60 = 04 < 1. Suppose ?, и, and у satisfy B2.35) - < -Ц < —л < -^, n ~ 2"+1 ~ 2"-1 ~ Vn which, by B2.33) implies B2.36) i < a, < ^ < -p . Suppose further that 0 < и < и and 0 < h < 2*~и. We shall show that B2.37) p( max \Yn(t{&+i) - Yn(t^)\ > for a iT5 depending only on <p and the fixed numbers m and в involved in B2.32) and B2.34). For notational convenience, we carry through the proof with h = 0. If 0 < i < 2", then 1 Yn(t?) | < i max J Уя(#) - Yn(t\tm*) 1 (consider the base-2 representation of the integer i), and therefore Pf 2 Now ^ = /|"-к)еЯтA_м. Since B2.30) implies B2.31), it follows by B2.36) that P{ 2 J.—0 1—1 p я. — " J—-i- о ^ 4 -vp v - - j i -у—к ли)
Martingales 205 where JC4 = 2K3m6l(l - бL. Since &_fc < l/2"-fc-\ B2.37) follows with oo ,-6 (recall that 2в0 > 1 and that the K3 in B2.31) depends on cp alone). The purely geometric inequality B2.17) applies as before and yields sup \YJt) - Yn(f$OI < 3 max \УМЦи+Л - Уп(*$)| + Jn . 0<i<2 It now follows by B2.36) and B2.37) that К er" Since f J*l = 4"~u) ? !».„ and a^_u ^ l/2*-"+1, we have, by the corollary to Theorem 8.3, B2.38) ?L(Yn, -±-\ > Us] < Щ Ql~\ This inequality holds under the assumption of B2.35) and 0 < и < v. If ? and r] are given, choose an integer r such that K560rle5 < iq (recall that 60 < 1) and define <5 = l/2r+1. For all n exceeding some n0, there exists a v satisfying B2.35). If we take и = v — r, then B2.38) holds, which implies B2.29) (with 12e in place of e). This proves Theorem 22.2. It is not hard to show that the theorem persists if the probability measure P governing the ?„ and y\n is replaced by a probability measure it dominates. Thus we can replace Gauss's measure by Lebesgue measure in Example 2. Remarks. Theorems 22.1 and 22.2 are new; the case presented here as Example 1 is due to Ciesielski and Kesten A962). 23. MARTINGALES In Section 20 we proved a functional central limit theorem for stationary processes {?„} satisfying a uniform mixing condition. This mixing condition can be weakened to the assumption of ergodicity if we assume that the partial sums Sn = & + • • • + ln form a martingale .f f See Doob A953) for the ergodic theory and martingale theory required here.
206 Dependent Variables THEOREM 23.1 Let {?l5 ?2, . . .} be a stationary, ergodic process for which B3.1) Е{?„ !&,..., ln_x} = 0 with probability 1 and for which Е{?„2} = <72 is positive andfinite. IfXn(t, со) = S[ntl(u))la^t then Xn U W. Proof. There is no loss of generality in working with a doubly infinite sequence . . . , |_x, |0, ?1} . . . , stationary and ergodic. f If ^ denotes the <7-field generated by . . . , ?к_1г ?fc, then, by B3.1) and stationarity, B3.2) with probability 1. We shall verify the hypotheses of Theorem 19.4 with the functions p(t) = 0 and o2(t) = 1 appropriate to Brownian motion (see A9.29)); Xn -^> W will follow. For Condition 1° of that theorem, it is enough to show that, for B3.3) limlim sup - ft 10 n-»oo П and B3.4) lim lim sup \ Л10 n->oo rt - Xn(t) п{Х + h) - Xn(t)J = 0 - h\} = 0. Now B3.3) is an immediate consequence of B3.2). As for B3.4), the relation B3.2) implies - Skf so that, by stationarity, the expected value in B3.4) is nhi 1=1 where А:„ = [n(t + /г)] — [«?]. Since, by Jensen's inequality, the above expression does not exceed a nh i= B3.4) follows by the mean ergodic theorem. t Take the !„ as the coordinate variables on the product of a doubly infinite sequence of copies of the real line, with the finite-dimensional distributions prescribed by the original process; ergodicity is preserved because it depends only on the finite-dimensional distribu- distributions.
Martingales 207 By B3.2), the variables ?k are orthogonal, and hence B3.5) E{Sn2} = na\ which implies Condition 2°. Suppose we succeed in proving that {maxfc<n Sk2/n} is uniformly integrable, which, if we define Ea{?/} as the integral of U over {U > a}, means that B3.6) lim sup e/- max sA = 0. a-»oo n {П k<n j It will certainly follow that ~{Xn2(t):n > 1} is uniformly integrable for each t, which is one of the hypotheses of Theorem 19.4, and Condition 3°a will follow easily by stationarity. Finally, since p(max | Sk\ > A^Jn) < 72 Ea4~ max ( | k\ > ^J) < 72 a4 {k<n j А [П k<n j it will follow by Theorem 8.4 that the tightness condition A9.51) is satisfied. It suffices therefore to prove B3.6). If ?0 has a fourth moment, then E{?*} = EE^I^IJ, with the indices running independently from 1 to n. If the largest index is not matched by any other, then, by B3.2), the term vanishes; hence {^} 2 {f A8} + 6 к i<k i If 1 ?ol < С with probability 1, then the first two sums on the right contribute at most 3«2C4 in all, and the last sum is 6 ??=2 EfS^^2}, which cannot, by B3.5), exceed 3«2C4. Thus B3.7) Е{5„4} < 6n2C4 if \?o\<C with probability 1. By a martingale inequality! B3.8) EJmaxISJ') < ( for у > 1. Therefore B3.5) implies B3.9) E(maxSfc2]<4«E{!02}, U<n j and, if Hoi is bounded by C, B3.7) implies B3.10) Efmax sA < (IL • 6n2C4. t Doob A953, p. 317).
208 Dependent Variables For и > 0, define fe if 10 if and put and If Sku = ?*LX rjiu and Dfcu = S*=1 <5iu, then B3.11) - max Sfc2 < - max S2ku + - max D\u. П к<п П к<п П k<n Now {rjnu:n > 1} has the martingale property B3.1) and \rjnu\ < 2u; it follows by B3.10) that Е.Д max SJB) < - eA max S4J < -(^ ¦ 6BWL. \n k<n j a U k<n j a\3/ And {5nu:« > 1} also has the martingale property, so that, by B3.9) and Lemma 1 of Section 21 (see p. 184), ЕД max D2A < 4E{602u} [П k<n j From the relation Ea{*7 + V} < 2E^a{t/} + 2Eia{F}, together with B3.11), it now follows that tJ- max ьу < к - + tu2^0-j [n k<n j La J for a universal constant K. Since |0 has a finite second moment, B3.6) follows. Remarks. The central limit theorem under the hypotheses of Theorem 23.1 was proved independently by Billingsley A961) and Ibragimov A963); the proof of Theorem 23.1 itself (for bounded ?„) is due to Rosen A967a). 24. EXCHANGEABLE RANDOM VARIABLES Sampling For each n, let v-^4.1) xnl, xn2, . . . , xnk
Exchangeable Random Variables 209 be a sequence of real numbers (not necessarily distinct), and let ?nl(co), . . . , ?nkn(co) be a random permutation of these numbers, each of the kn\ permuta- permutations having probability l[knl Define a random element Xn of D by :24.2) XJt, со) = 2 §JLa>), with ^„(f, со) = 0 for 0 < t < \jkn. If we sample until the finite population B4.1) is exhausted, Xn describes the course the sampling takes. THEOREM 24.1 If lc 1c B4.3) J xni = 0, f x2ni = 1, and B4.4) max \xni\->Q, l<i<kn and ifXn is defined by B4.2), then B4.5) Xn ^> W°. Proof. It will be enough to verify the hypotheses of Theorem 19.4 with the functions p(i) — —1/A — t) and a2(t) = 1 that characterize the Brownian bridge W° (see A9.32)). We shall need a preliminary computation. Let у1г . . . , yu be real numbers, suppose 1 < m < u, and let r\x, . . . , r)m be an ordered sample of size m made without replacement (there are I I possible samples, all equally likely). By symmetry, the sum Ег7^1 г\г has mean mE^}, and hence m u B4.6) ( j U i=l The second moment of this sum is, again by symmetry, wE{r?!2} + m(m - which computation reduces to B4.7) Applying these results to ?^x gni and using B4.3), we see that E{Xn(t)} = О, Е{ХД0} = [fc^](fcfc" ~^]) . The second moment here converges to t(l — t) (note that B4.3) and B4.4) imply kn -»¦ oo).
210 Dependent Variables Suppose now that 1 < mx < тг + m2 < kn and that we know the values of f я1, . . . , |nMi. Then ёПг1Я1+1,..., In,mi+TO2 is conditionally distributed as a sample of size m2 from the population B4.1) with the values |nl, . . . ?nTOi removed from it. Applying B4.6) and B4.7) with m = m2 and и = kn — mx and using B4.3), we obtain B4.8) and B4.9) E E 2 >ni = jil' ¦ • ¦ ' m2 К ~ wii=1 /pmi+тг V L»= bjlJJ • • • 5 Ь m2) Г _у|2" x - 1) L *=i "j m2(m2-l) -l2 Fix ? < 1; taking m1 = [fcnf] and m2 = [А:„(? + /г)] — [knt] in B4.8) gives E{Xn(t + h)~ Xn(t) || Xn@} = -A, with Therefore E{Xn(t К ~ [knt] - Xn(t) Xn(t)} + 1 - t xn(t) An(h) 1 - Since Е{|ЛГЯ(О1} < &{Xn\t)} is bounded and limn AJh) = A/(l - 0. A9.43) follows. (Although we have for notational convenience conditioned with respect to but one X(t), B4.8) easily yields the general case as well. The same remark applies to the next argument.) It follows from B4.9) that ^+Cn(h)Xn\t) with and h)] - [knt])(kn - [kn(t + h)]) (K - [knt])(kn - [knt] - с (h) = ~ ^^J ~ 0 « - [Kt] - 1)
Exchangeable Random Variables 211 Therefore B4.10) -h h) - Xn(t)f - h\} \вм\ h V f2 _ 1.КпЧ Now h \ knj - h 1Н has mean [knt]/kn and second moment (use B4.7)) [knt](kn - [fc,t]) fc" ^4 [fc,t]([M - 1) ^f2. kn(kn-l) А*"' &«(&«-1) ~* ' hence the variance of this sum tends to 0. Since limn Bn(h) = h(l-t- A)/(l - гJ, the first term on the right in B4.10) tends to 0 as n ->¦ oo. Since E{Zn2@} is bounded and limn Cn(h) = /г2/A - tf, A9.44) follows. We have verified Condition 1° of Theorem 19.4. Since the maximum jump max^ \xni\ in Xn tends to 0, it follows by the remark following the proof of Theorem 19.4 that we need only verify Xn@) = 0 (which is obvious) and Condition 3°. Hence it remains only to show that B4.11) E{|*n@ - Xn(h)\2 \xn(t2) - xn(t)\2} < K{u - o2, h<t< t2, for some ^independent of n, tu t, and t2. Now the left member of B4.11) is B4.12) where i and j range over [kj-^] < i, j < [knt] and к and / range over [knt] <k, l< [knt2]. Put [knt] - [кяь] = m1 and [knt2] - [knt] = m2; by symmetry, B4.12) reduces to m2 - Write тп = to + m1(m1 - I)m2(m2 - Calculations (routine if tedious) further reduce B4.12) 1 - m1m2(m1 + m2 — 2) 2rn-l h(mi - l)ma(ma - 1) (K - l)(kn - 2) 3A - 2rn) kn(kn - l)(/cn - 2)(fcn - 3)
212 Dependent Variables Since 0 < тп < 1 by B4.3), kn > 6 implies that this last expression is at most 2ш2 %т^тг{тх + m2) 24ш12ш22 o./mi + тг\2 T3 fc4 I k )¦ If h ~ h > l/kn, then (mx + m2)//cn < 3(t2 - h), so that B4.11) holds with К = 306. If t2 — tx < l//cn, then B4.11) holds because one or the other of the two factors inside the expected value vanishes. This proves Theorem 24.1. We shall need the following consequence of Theorem 24.1. For each TF°-continuity set A and for each positive e, there exists a positive d(e, A) such that, if xnl, . . . , xnkn satisfy B4.3) and B4.13) max \xni\ < d(e,A), then B4.14) \?{XnEA}- W°(A)\<e (Xn being denned by B4.2)). Indeed, if for some e and A there were no such d(e, A), we could construct a sequence of sets {xni} satisfying B4.3) and B4.4) but violating B4.5). Exchangeable Variables We turn now to a more general problem. For each n, let B4.15) ?nl, ?n2, . . . , ?nkn be random variables that are exchangeable in the sense that each permutation of the set B4.15) has the same joint distribution as the set itself. Define Xn, as before, by B4.2). We shall assume the ?ni satisfy the three conditions, B4.16) Z?»«A0, 2&^1> maxIfJ^O, i=l г=1 1<г<*:п as n —»- oo. If B4.15) is a random permutation of B4.1), then it is exchangeable and B4.3) and B4.4) together imply B4.16). The following result thus generalizes Theorem 24.1. THEOREM 24.2 If the variables B4.15) are exchangeable and satisfy B4.16), ® Proof. If an = k;1 2*^ ?я< and pn* = Z& (?ni - anJ, then, by B4.16), B4.17) knan^0, /SnAl. If ^ni = (?ni — an)//5n, then the variables 7;nl, r]n2,. . . , r]nkn are exchangeable
Exchangeable Random Variables 213 and Define a random element Yn of D by Yn(f, со) = J ?yni(co). Since Xn@ = 0ЯУЯ(О + [knt]an, j^n Д. w° and 7n^ Г are equivalent by B4.17) and Theorem 4.1. We shall prove Yn ^> W°. Let inl, . . . , ?nkn be a random permutation of r]nl, ..., r)nkn. (The r)ni are subjected to a random permutation that is independent of them. This is a probative device; to support the random permutation, the probability space on which the rjni are defined may require to be enlarged.) The t,ni have the same joint distribution as the rjni, since the latter are exchangeable, and hence Yn has the same distribution as the random function defined by Zn(t, oj) = г=1 To prove Yn Д- W° it therefore suffices to show that B4.19) ?{ZneA}->W°(A) for each JF°-continuity set A. Given a JF°-continuity set A and a positive e, choose d(e, A) in such a way that B4.13) implies B4.14) (where, in B4.14), Xn is defined via B4.2) from a random permutation of numbers B4.1) satisfying B4.3)). If En is the event En= max \r}ni\< d(A,e)\, [l<i<kn j then P(?n) —>-1 by B4.18). By the definition of conditional probability, P{Zn e A} = f P{Zn e A || Via,..., r]nkn} d? + Г P{ZnEA\\rjnl,...,rinkn}dP. Because of B4.18), \?{ZneA\r]nX,...,rlnkn}-W\A)\<e holds on the set En, and therefore \?{ZneA) - W°(A)\ <2e + 2?{En°). Since P(En) —»- 1, B4.19) follows, which completes the proof of the theorem.
214 Dependent Variables Combining Theorem 24.2 with the computations in Section 11 (see A1.39), A1.40), and A1.42)) gives the limiting distributions for maxj.<j.n E*=1 ?ni, for тахй<й |??=1 ?nJ, and for the fraction of A:, 1 < к < kn, for which 4.1 Zm > o. Remarks. For Theorem 24.1 and extensions to other sampling procedures, see Rosen A964, 1967a, and 1967c). Theorem 24.2 is new. Chernoff and Teicher A958) proved Xn{t) 3> N@, t(l — 0) (t fixed) under the hypotheses of this theorem; see their paper for examples of variables satisfying these hypotheses. Related limit theorems concern proba- probabilities computed conditionally on Xn(l) = 0, where Xn is some random function; see Dwass and Karlin A963)—and the references there—and Trumbo A965), as well as forth- forthcoming papers by T. Liggett and M. Wichura.
APPENDIX I Metric Spaces We review here a few facts about metric spaces, taking as known the first definitions (open, closed, dense, limit point, continuity, etc.) and properties.! We denote the metric space by S and the metric itself by p(x, y). We denote the closure of a subset A of S by A~, its interior by A°, and its boundary by dA (= A~ — A°). We define the distance from x to A as P(x, A) = inf {P(x, y):yeA}; it is easy to check that p(x, A) is uniformly continuous in x. Denote by S(x, e) the open sphere with center x and radius e: S(x, e) = {y: p(x, y) < e). By "sphere" we shall mean "open sphere"; we shall call S(x, e) the e-sphere about x. (Topologists use "ball" in place of "sphere.") Two metrics p1 and p2 on S are said to be equivalent if (write St(x, e) = {у:рг(х, y) < e}) for each x and e there is a <5 with S^x, д) с: S2(x, e) and S2(x, <5) c: S^(ж, e), so that 5 with px is homeomorphic to S with p2. Separability The space S is by definition separable if it contains a countable, dense subset. A base for S is a class of open sets such that each open subset of .S is the union of some of the members of the class. An open cover of A is a class of open sets whose union contains A. A set A is discrete if about each point of f For full accounts, see Dieudonne A960), Royden A963), or Simmons A963), for example. 215
216 Appendix I A there is a sphere containing no other points of A—in other words, if each point of A is isolated in the relative topology. If S itself is discrete, then taking the distance between distinct points to be 1 defines a metric equivalent to the original one. These three conditions are equivalent: (i) S is separable. (ii) S has a countable base. (iii) Each open cover of each subset of S has a countable subcover. Moreover, separability implies (iv) S contains no uncountable discrete set, and this in turn implies (v) S contains no uncountable set A with A) inf {p(x, y):x, у eA, x ^ y} > 0. Proof of (i) -»- (ii). Assuming D is a countable set dense in S, let У be the countable class of spheres with rational radii and with centers in D. Let G be open; to prove that У is a base, we must show that, if Gx is the union of those elements of У that are contained in G, then G = Gx. Clearly, С?! с G, and to prove G c G1? it suffices to find, for a given x in G, a d in D and a rational r such that x e S{d, r) c= G. But if x e G, then S(x, e) c= G for some positive e. Since D is dense, there is г. d in D with р(ж, J) < \s. Take a rational r with p(x, d) < r < |e. Proo/ o/ (ii) —>- (iii). Let {Vx, V2, . . . } be a countable base, and suppose {Ga} is an open cover of A (a ranges over an arbitrary index set). For each Vk for which there exists a Ga with Kfc с Ga, let Gafc be some one of these Ga containing it. Then A c: \JkGak. Proof of (iii) —>- (i). For each n, {S(x, n):^ e S} is an open cover of S. If (iii) holds, there is a countable subcover {S(xnk, п~г):к = 1,2,...}. The countable set {xny.n, к = 1, 2, . . . } is dense in S. Proof of (Hi) —>- (iv). If A is discrete, then for each x in A there is a positive ex such that S(x, ex) contains no other point of A. Since {S(x, sx):x e A} is an open cover of A without any subcover, (iii) cannot hold if A is uncountable. Since a set satisfying A) is discrete, certainly (iv) implies (v). Although we shall not need the fact, (v) implies separability, so that (i) through (v) are all equivalent. (For each positive e, use Zorn's lemma to find a maximal set As of points distant at least e from one another; the union of the AE for rational e is dense and, if (v) holds, countable.)
Metric Spaces 217 Compactness A set A in S is by definition compact if each open cover of A contains a finite subcover. An ?-net for A is a set of points {жд.} with the property that for each x in A there is an xk such that p(x, xk) < e (the xk are not required to lie in A). A set is totally bounded if, for every positive e, it has a finite ?-net. A set A is complete if each fundamental sequence in A converges to some point of A. For an arbitrary set A in S, these four conditions are equivalent: (i) A~ is compact. (ii) Each countable open cover of A~~ has a finite subcover. (iii) Each sequence in A has a limit point {has a subsequence converging to a limit, which necessarily lies in A~). (iv) A is totally bounded and A" is complete. It is easy to show that (iii) holds if and only if each sequence in A~ has a limit point (necessarily in A~) and that A is totally bounded if and only if A~ is totally bounded. Therefore we may assume in the proof that A = A~ is closed. The implication (i) —>- (ii) is trivial. Proof of (ii) —>- (iii). Given a sequence {xn} in A, define Fn to be the closure of the set {xk:k > n}. If f\nFn = 0, then the open sets Fnc cover A, and hence, if (ii) holds, A <=¦ F-fV • ¦ • VFnc for some n, which implies the impossible relation Fn n A = 0. Thus ПпРп contains some x, which must be a limit point of {xn}. Proof of (iii) —*¦ (iv). If A is not totally bounded, there exist some positive ? and some sequence {xn} such that p(xm, xn) > e for m ф п; {xn} can have no limit point. Hence (iii) implies total boundedness. And (iii) implies completeness because, if {xn} is fundamental and has a limit point x, then {xn} converges to x. Proof of (iv) —v (i). Assume (iv) and suppose {Ga}, where a ranges over an arbitrary index set, is an open cover of A having no finite subcover. We shall derive a contradiction. Since A is totally bounded, it can, for each n, be covered by finitely many open spheres Bnl, . . . , Bnkn of radius 2~n. At least one of the Bni must have the property that no finite subfamily of {Ga} covers А П Bni (which must therefore be nonempty); let Cn be one such Bni. Since the Bni П Cn_x П A cover Cn_! П A (if n > 1), we may also insist that no finite subfamily of {Ga} cover Cn П Cn_x П A, so that, in particular, Cn П Cn_x ^ 0. Let xn be one of the points that Cn shares with A. Since Cn П Сп_х ф 0 and Cn has radius 2~n, we have p(xn, xn_x) < 6-2~n, which implies
218 Appendix I p(xn, xn+k) < 6-2~n. Thus {xn} is fundamental and has, by the completeness assumption, a limit x, which must lie in A. Now x ? Ga for some a, and S(x, e) <=¦ Ga for some positive e. But then p(x, xn) < e/3 for some n satisfying 2~n < e/3, which implies Cn «= Ga, а contradiction.")" A subset of A;-dimensional Euclidean space Rk (with the usual metric) has compact closure if and only if it is bounded. Ifh is a continuous mapping of S into another metric space S', and if К is a compact subset of S, then hK is a compact subset of S'. Proof. If {G'a} is an open cover of hK, then {h^G'J is an open cover of К and hence has a finite subcover {h~1G'a.:i = 1, ... , n). Clearly, {G'a.:i = 1, . . . , n} covers hK. Upper Semicontinuity A function/is upper semicontinuous at x if for every positive e there exists a positive <5 such that p(x, y) < <5 implies f(y) < f{x) + e. It is easy to see that / is everywhere upper semicontinuous if and only if, for each real a, the set {x:f{x) < a} is open. Letfn be real functions on S such that, for each x,fn(x) is nonincreasing and B) lim/n(x) = 0. n-*oo If the fn are upper semicontinuous, then the convergence in B) is uniform on each compact set. Proof. For each positive e, the open sets Gn = {x:fjx) < e} cover S. If К is compact, then К <= Gn for some n and uniformity follows. The Space В.™ Let R*3 be the space of sequences x = (хг, x2, . . .) of real numbers. If Po(a> ft) — la — 01/A + |a — 0|), then pQ is a metric on the line R1 equivalent to the ordinary metric |a — 0|. The line is complete under p0. It follows that, if p(x, y) = ??.! po(xk, ykJ~k, then p is a metric on R™. If C) Nk;(x) = {y.\yi - xt\< e, i = 1, . . . , k}, then Nk>?(ж) is open in the sense of the metric p. Moreover, since p(x, y) < е2~к1A + e) implies у ?Nke(x), which in turn implies p(x, y) < e + 2~fc, the sets C) form a base for the topology given by p. t Proof from Dieudonne A960).
Metric Spaces 219 We shall always take i?00 with this topology. It is the product topology, or the topology of coordinatewise convergence: limn x(n) = x if and only if limn xk{n) = xk for each k. The space R™ is separable; one countable, dense set consists of those points with coordinates that are all rational and that, with only finitely many exceptions, vanish. Suppose {x(n)} is a fundamental sequence in i?00. Since PofaOn), xk(n)) < 2kP(x(m), x(n)), it follows easily that, for each k, {^A), xkB), . . . } is a fundamental sequence on the line with the usual metric, so that the limit xk = limn xk(n) exists. If x = (хг, x2, . . .), then x(ri) converges to x in the sense of R™. Thus R™ is complete. A subset A of R°° has compact closure if and only if the set {xk:x e A} is, for each k, a bounded set on the line. Proof. It is easy to show that the stated condition is necessary for compact- compactness. We may prove sufficiency by the classical diagonal method. Given a sequence {x(n)} in A, we may choose a sequence of subsequences /л\ I ""VOJt ~V22/» XV^2.Z)i in the following way. The first row of D) is a subsequence of {x(n}}, so chosen that xx = lim^ хг(пи) exists; there is such a subsequence because {xx:x e A} is a bounded set of real numbers. The second row of D) is a subsequence of the first row, so chosen that x2 = lim^ x2(n2i) exists; there is such a subsequence because {x2:x e A} is bounded. We continue in this way; row A; is a subsequence of row к — I, and xk = lim.; xk(nki) exists. Let x be the point of R™ with coordinates xk. If щ = nu, then {х(щ)} is a subsequence of {x(n)}. For each k, moreover, x(nk), x(nk+i)> • • • all lie in the kth. row of D), so that lim^ хк(щ) = xk. Thus limf х(пг) = x, and it follows that A~ is compact. Our final result about Rm is a special case of Urysohn's embedding theorem. Every separable metric space S is homeomorphic to a subset of R™. Proof Let {dx, d2, . . . } be a sequence of points dense in S, and define a mapping h from S into R™ by h(x) = (P(x, dj, p(x,d2),...), xe S,
220 Appendix I where p denotes the metric on S. If points xn of S converge to a limit x, then limn p{xn, djc) = p(x, dk) for each k, so that, since the topology of Л00 is that of coordinatewise convergence, h(xn) converges to h(x). Thus h is continuous. Suppose xn does not converge to x. Then p(xn,, x) > e for some positive s and for each element of some subsequence {xn>}. If p(x, dk) < ?e, which must be true for some element of the dense sequence {dk}, then p(xn,, dk) > |e for all n'; thus р(жп, dk) cannot converge to p(x, d^) for this value of к and hence h(xn) cannot converge to h(x). Thus h(xn) —>- h(x) implies xn —>- ж. The same argument shows that h(x) = h(y) implies x = у (take xn = ?/). Therefore h is a one-to-one, bicontinuous mapping of S onto the subset hS of R°°. The Space С Let С = C[0, 1] be the space of continuous, real-valued functions on the unit interval [0, 1] with the uniform metric. The distance between two elements x = x(t) and у = y(i) of С is p{x, y) = supt \x(t) - y{t)\; it is easy to check that p is a metric. Convergence in the topology is uniform pointwise convergence of (continuous) functions. The space С is separable; one countable, dense set consists of the (polyg- (polygonal) functions that are linear on each subinterval [(i — 1I k, i/k], i = 1,. . . , k, for some integer k, and assume rational values at the points ijk, i = 0,1,-..,*. If {xn} is a fundamental sequence in C, then, for each value of t, {xn(t)} is a fundamental sequence on the line and hence has a limit x(t). It is easy to show that the convergence xjj) —> x(t) is uniform in t, so that x lies in С and is the limit in С of {xn}. Thus С is complete. We define the modulus of continuity of an element a; of С by E) wx(d) = w(x, d) = sup \x(s) - x(t)\, 0 < д < 1. |s-i|<5 Since F) K(<5) - wy(d)\ < 2P(x, y), wx(d) is, for fixed positive <5, continuous in x. Note also that, since an element of С is uniformly continuous, we have G) lim wx(d) = 0, x e C. 5-*0
Metric Spaces 111 (The Arzela-Ascoli Theorem.) A subset A of С has compact closure if and only if (8) and (9) supHO)|<oo xeA lim sup wx(d) = 0. 5->0 xeA Proof If A~ is compact, (8) follows easily. Since wx(\jn) is continuous in x and nonincreasing in n, G) holds uniformly on A if A" is compact (see p. 218) and (9) follows. Suppose now that (8) and (9) hold. Choose к large enough that wx(l/k) is finite. Since к K01<K0)l+I -6')-ft1') it follows that A0) sup sup \x(t)\ < oo. t xeA We shall deduce from (9) and A0) that A is totally bounded; since S is complete, this will imply that A~ is compact. Given s, we must construct a finite ?-net for A. Let a denote the finite quantity in A0), and let H denote the finite set of points -а, v и = 0, ±1,.. ., ±v, where v is an integer such that ajv < e (H is an ?-net for the linear interval [—a, a]). Now choose к large enough that wx(l[k) < s for all x in A, and take В to consist of those elements of С that are linear on each subinterval [(/ — l)[k, i[k], i = 1, . . . , k, and assume values in H at the points i[k, i = 0, 1, . . . , k. The set В is finite (it contains Bv + I)*4 points); we shall show that it is a 2e-net for A. If x e A, then \x(ijk)\ < a. Therefore there exists a point у of В such that x ©-'(9 < e, i = 0,1,. k. Since wx(l[k) < e, and since у is linear on each subinterval [(i — i/k], it follows from A1) that p(x, y) < 2e. This proves the theorem. We have seen that (8) and A0) are equivalent in the presence of (9). When (9) holds, A is said to be uniformly equicontinuous. Thus the Arzela-Ascoli theorem asserts that A~ is compact if and only if A is uniformly bounded and uniformly equicontinuous.
APPENDIX II Miscellany Measurability Let (Q, 38) and (Q', 38') be measurable spaces; 3$[3$'} is a cr-field of subsets of Q[O']. A mapping h:Q,—*• O' from Q into O' is said to be measurable C8, &') if the inverse image hrxM' lies in 38 for each M' in SI1. If h~x38' denotes the class {h~xM':M' e 38'}, this condition may be succinctly stated as fr1 381 с SI. Since {M':hrxM' e Щ is a <r-field, i/ ^ й contained in SI* and generates it, then h~x&'0 <=¦ 38 implies hrx38' <=¦ SI. Let (Q", ST) be a third measurable space, let/:Q' -> Q" map O' into O", and denote by jh the composition of h and j: (jh)(со) =j(h(co)). It is easy to show that, ifh-1^' с SI andj-1^" с SS\ then (jh)'1^" с SI. If Q = S and O' = S' are metric spaces, h is continuous when h~xG' is open in S for each open G' in S'. Let «?" and Sf' be the (T-fields of Borel sets in S and S'. If h is continuous, then, since h~xG' e «9" for G' open in 5', and since the open sets in S' generate &", h~x?f' <=¦ ?f, so that h is measurable. If O' is A;-dimensional Euclidean space Rk, we always take 38' to be the class 3%k of A;-dimensional Borel sets (the (T-field generated by the open subsets of jRfc with the Euclidean metric), and we say h is measurable 38 if hrxMk с 38\ h is measurable 38 if /rx{a ei?4:^ < a) e 38 for each f = 1, . . . , к and each real a. In particular, if Q = 5 is a metric space, if к = 1, and if h is upper semicontinuous (see p. 218), then h is measurable 5f. Change of Variable If P is a probability measure on (Q, 38), and if h~x3$' <= 38, ?hrx denotes the probability measure on (п',38') defined by (?hrx)(M') = P(/rW) for 222
Miscellany 223 M' e 38'. If/ is a real function on Q', measurable ^", then the real function fh on О is measurable ?8. Under these circumstances,\ fis integrable with respect to ?h~x if and only if fh is integrable with respect to P, in which case we have A) f f(h(co)) P(dco) = f f(co') Ph~\dco') for each M' in &'. If X takes Q into a metric space S and X~x?f <= 88, so that X is a random element of S (see Section 4), we generally write E{/(Z)} in place of $f(X(co))P(do>). If P = PX-1 is the distribution of JTin S, A) implies B) E{/(X)} = f f(x)P(dx). Js Tail Probabilities Let X be a random variable on a probability space (O, &, P). If X is non- negative and integrable, and if a > 0, then C) f IdP = aP{I>a} + f°°P{X^ t}dt. J{X>a} Ja This will follow if we prove D) E{X} = rP{X>t}dt, Jo since we may replace X by its product with the indicator of {X > a}. If X has finite range, D) follows by summation by parts. The general result follows from the fact that any nonnegative X can be represented as the limit of a nondecreasing sequence of random variables Xn with finite range. (The equation also holds in the extended sense that if one side of D) is infinite so is the other.) Still assuming X nonnegative and integrable, and assuming a > 0, we have P{X > a} < - f X dP < - ЦХ}. a J{x>a} a E) Scheffe's Theorem Let A be a measure (not necessarily finite) on a space (O, 3$), and let p(co) 3.ndpn(co) be probability densities with respect to l\p a.ndpn are nonnegative functions on Q, measurable ^, such that Jp dX = Jpn dX = 1. t For a proof, see Halmos A950, p. 163), for example.
224 Appendix II (Scheffe's Theorem.)^ Ifpn(co) ->p(co) except for со in a set of X-measure 0, then F) sup Ee3S \ pdX- \ pndX =-\\p-pn JE JE 2 J Proof lfdn=p- pn, then J<5n dX = 0. For E in &, therefore, \dndX Je = \dndX JE f bndX JEC {\dn\dX; J and, if E = {<5n > 0}, there is equality here. This proves the equality in F). If <5+ is the positive part of <5n, then <5+ -> 0 except on a set of A-measure 0. Since 0 < 5+ < p, J|<5J dX = 2J5+ dX -> 0 follows by Lebesgue's dominated convergence theorem. A special case: If 2^„(г) = 2t-/>(i) = 1, the terms being nonnegative, and if limnpn(i) = p(i) for each i, then Тц \p(i) — pn(i)\ -»¦ 0. If a(i) is bounded, it follows that T,t a(i)pn(i) -»¦ 2,- a(i)p(i). Subspaces Let S be a metric space, and let Sf be its (T-field of Borel sets. A subset So (not necessarily in ?f) is a metric space in its own right in the relative topology. Let ?f0 be the (T-field of Borel sets in So. We shall prove If h(x) = x for x e So, then A is a continuous mapping from So to S and hence Ьгг& <= ^Oj So that So n A e ^0 if A e ^. Since {?„ n ^4 :^4 e 5f) is a (T-field in So and contains all sets So П G with G open in S1, that is, all open sets in So, G) follows. If >S0 lies in 5^, G) becomes Product Spaces Let 5' and S" be metric spaces with metrics p and p" and (T-fields Sf" and Sf" of Borel sets. The rectangles (9) A' X A" with A' open in 5" and A" open in 5" are a basis for the product topology in S = S' x S". This topology may also be described as the one under which t Scheffe A947).
Miscellany 225 (x'n, x"J -> (xr, x") if and only if x'n -+ x and < -+ x". Finally, the topology may be specified by various metrics, for example, A0) p((x\ x"), (</', jO) = V [/>'(*', y')f + [P"(x\ y")f and A1) P{{x', x"), (y1, y")) = max {p\x\ у'), р\х\ у")}. Let 9" x 9"' be the (T-field generated by the measurable rectangles (sets (9) with A' e 9" and A" e 9"'), and let 9 be the cr-field of Borel sets in S for the product topology. We first show that A2) 9' X 9" <= 9. If ir'(x', x") = x' and tt"(x', x") = x", then tt':S^S' and ir':S-+S" are continuous and hence measurable ((тг')^" <= ^ and (тг")-1^" с ^). It follows that, if A' e ^' and Л" e &\ then Л' x A" = ir'-1/ П тг"-М" lies in 9. Since 5^ contains all the measurable rectangles, A2) follows. Now S is separable if and only if S' and S" are both separable. Let us show that, if S is separable, then A3) 9" X 9" = Sf. In view of A2), it suffices to show that, if G is open in S, then G e 9" X Sf*. But G is a union of rectangles (9) with A' open in S' and Л" open in S" (so that A' X A" e9" x 9"'), and, if ? is separable, G is a countable such union. This proves A3).f Suppose that X' and X" are random elements of S' and S", respectively, and have a common domain (Q, &). Now (X'(co), X"(cSj) defines a mapping (X\ X") from О to S = S' x S", and clearly (JT, Х"У\9" х 9") <= 38. If S is separable, then, by A3), (X1, X") is a random element of S. If 5" = S", then p'(x', y') defines a continuous mapping from S' X S" to jR1; if S' is separable, then the composition p(X', X") is measurable and hence is a random variable, t Measurability of Dh Let Dh be the set of discontinuities of a mapping h from a metric space S to another metric space 5". Denote the metrics in S and S' by p and p'. Let Aed be the set of x in S for which there exist points у and z in 5 satisfying f Without separability, A3) may fail: If S' = S№ is a discrete space whose cardinality exceeds that of the continuum, then the diagonal {(ж, у): х = у) lies in SP but not in Sf" x ^" (see Problem 2 on p. 261 of Halmos A950)). X The counterexample in the preceding footnote shows that this may be false if S' is not separable.
226 Appendix II p(x, У) < <5, p(x, z) < d, and p'(hy, hz) > e. Then Aed is an open set. Since Dh = U П AejS, z 3 where s and <5 are restricted to positive rationale, Dh is a Borel set in S. This is true even if h is not measurable (for example, if h is the indicator of a set A, then Dh = dA is closed no matter what A may be). Consider now the set E involved in Theorem 5.5. We shall assume that the mapping h is measurable and that S' is separable and prove E e ?f. If Be s _t.is the set of ж such that p'Qix, h$) > s for some у with p(oj, y) < <5, then ? = unn и where e and <5 range over the positive rationals, and it suffices to find sets CE д i that lie in Sf and satisfy If the sequence мт is dense in S' and if He m = {x:p'(hx, um) < |e}, then ЯЕ TO e ?f and 51 = \J mHzm. The requirements on Сг&л are satisfied by СЕ>г,, = U (ЯЕ>ГИ П jMA J, m where /eE>t. m is the set of x such that р'(^у, hz) > e for some pair of points у and z with p(x, y) < 5, p(x, z) < 5, and z e Hem (Лл<§ж is open). This provesf ?e ^. (The proof of Theorem 5.5 goes through even if E lies outside Sf, provided it has outer P-measure 0.) Helly's Theorem A distribution function is a function F(x) = F(xu . . . , xk) on Rk with these three properties (we use the terminology and notation of Section 3): (i) Fis everywhere continuous from above; (ii) 0 < F(x) < 1 for all x, Fis nondecreasing in each variable, and, for each A:-dimensional rectangle (a, b], A4) • Z±F{a1 + eid1,...,ak+6kdk)>0, where dt = Ъг — at, where the sum ranges over all 2fc sequences FX, . . . , 6k) of O's and l's, and where the sign is + or — according as the number of O's in the sequence is even or odd; (iii) F(x) —>- 0 as any one coordinate of x goes to — oo, and F(x) -> 1 as all coordinates of ж go to oo. f This proof is due to F. Tops0e.
Miscellany 227 If P is a probability measure on (Rk, ?%k) and A5)' F(x) = P{y:y<x}, xeR*, then F is a distribution function. It is a standard fact of real variable theory that, if F is a distribution function, then there is exactly one P satisfying A5). If F has properties (i) and (ii) above (but perhaps not (ill)), then there is a finite measure ц on (Rk, Mk) such that A6) F{x) = [x{y:y<x}, xeRk; /x will satisfy p(Rk) < 1 but need not be a probability measure. (Helly's Selection Theorem.) If{Fn} is a sequence of distribution functions on Rk, then there exist a subsequence {Fn,} and a function F satisfying condi- conditions (i) and (ii) above (but perhaps not (iii)) such that A7) linV Fn.(x) = F(x) for all continuity points x of F. Proof. Let Rok denote the set of rational points in jRfc—the set of points whose coordinates are rational—and let (r(l), rB), . . . } be an enumeration of the points of ДД F°r eacn Fn in the given sequence of distribution functions, A8) is a point of R™. Since 0 < Fn(x) < 1, it follows by the compactness criterion for R™ (see p. 219) that some subsequence of the points A8) converges in the sense of R™ to some point (zl9 z2, . . . ) of jR00. Define Fo on jRofc by F0(r(k)) = zk. We see in this way that there exists a function Fo on Rok and a subsequence {Fn,} of {Fn} such that A9) limn.Fn.(r) = Fo(r), r e Rk0 . Since each Fn is a distribution function, it follows from A9) that Fo satisfies condition (ii) on Rok: 0 < F0(r) < 1, Fo is nondecreasing as each coordinate varies over the rationals, and A4) holds if a and b lie in ДД If we define F on jRfc by F(x) = inf {F0(r): x < r, r e Rj}, xe Rk, then F has properties (i) and (ii) (but perhaps not (iii)). If F is continuous at x, then, given e > 0, we can find points r' and r" of R? such that r' < x < r" and F(x) - e < F0(r') < F0(r") < F(x) + e. For each n', Fn(r') < Fn,(x) < Fn.(r").
228 Appendix II From these relations and A9) it follows that F(x) - e < lim infn. Fn,(x) < lim supw, Fn,(x) < F(x) + e. Since ? was arbitrary, A7) follows. This proves Helly's theorem. Kolmogorov" s Theorem We shall prove the Kolmogorov (or Daniell-Kolmogorov) existence theorem. Let TTh be the projection from R™ to Rk defined by ттк{х) = (хг, . . . , xk), as described in Section 3. For к > 1, let ipk be the projection from Rk to jRfc-x defined by чрк(хг, . . . , xk) = (xx, . . . , zfc_x). Since ттк and tpk are continuous, they are measurable (ттк1&к е 0t™ and у}кг^к^ с &k). If P is a probability measure on (R™, M™), define probability measures txh on (Rk, Mk) by B0) fxk = \ Since B1) 7rfc_i = укттк, к > 1, the measures /^fc satisfy B2) nk_x= [xkipk\ fc>l. The problem is to go the other way and construct, from given [xk satisfying the consistency conditions B2), a P satisfying B0). {Kolmogorov's Existence Theorem.) If probability measures цк on {Rk,Mk), k>\, satisfy B2), then there exists on (Д00, M™) a unique probability measure P satisfying B0). Proof. For к > i> 1, define a continuous mapping y)ki from jRfc to .R* by Wk,i(xi, ¦ ¦ ¦ , xk) = (xi, • ¦ • , O- Note that у)кк_г = ipk and that B3) yk<i = %+i' • • Щ+iVk, Щ = Wk.i^k From this and B2) it follows that B4) fr = 1Аьц>^, k>i>l. Let 3F be the collection of finite-dimensional sets, as defined in Section 3; 3F, which consists of the sets of the form B5) A = ттГгН, Н e 0t\ is a field generating the (T-field M™. A set of the form B5) can also be cast in the form B6) A = тт^Н', Н' e 0tk, if к > i; we need only take H' = y)k]H and use B3).
Miscellany 229 Suppose now that A is given both by B5) and by B6), where i < k. Then- a = (al5 . . . , afc) lies in H' if and only if (al5 . . . , afc, 0, 0, . . . ) lies n тг^Я' = 7гт1Я, which is true if and only if (al5. . . , aj lies in Я. Thus tf' = ip^H, and B4) implies B7) Since B5) and B6) together imply B7), we may consistently define a Function P on & by setting P(A) = /л^Н) if A is given by B5). Clearly, P@) = 0, PiR™) = 1, and 0 < P(^) < 1. Suppose A is given by B5) and В by В = тг-V, J e M\ where we assume к > i. Then A also has the form B6), and if А П В = 0, we must have Я' П J = 0, so that P(A UB) = /лк(Н' U J) = 1лк(Н') + t*k{J) = P(A) + P(B). Thus P is a finitely additive probability measure on ZF. Let us prove that P is completely additive on J5" by showing that, if B8) Л => Л, =>•••, and Пг ^г = 0, then lim^ Р(Аг) = 0. Since ^4f lies in J5", it has the form B9) ^=<Я„ Я<?^. Since a set of the form B5) can be cast in the form B6) for each к exceeding i, there is no loss of generality in assuming that C0) пг < n2 < nz < • • • . We shall show that, if sets B9) satisfy B8) and C0), and if there exists a positive e such that C1) then Hi At Ф 0. From C1) and B9), it follows by the definition of P that цп.{Н^ > е. By Theorems 1.1 and 1.4, there exists in jRni a compact subset Кг of Я^ with fx^H, - Кг) < е№+\ If В, = 1Г-)К,, then Вг eJF, Bt с Л„ and P(At - Вг)< -1- . From this and B8) it follows that Ct = Вг П • • • n B{ is a subset of At satisfying P(A{ - CJ < Щ=ХР(А5 - B3) < e/2. By C1), therefore, P(Ct) > e/2, so that Q is nonempty.
230 Appendix II We have constructed nonempty sets Ct <= w^ with Kt compact and C2) d => C2 =>¦¦•. Since Q <= At, f|i At # 0 will follow if we find a point common to all the Q. Let x(j) be an arbitrary element of C,-. If у > i, thenxQ") e Ct by C2), and hence irnx{f)e Кг. Since ^ is compact, we have sup^ K(/)| < со and hence sup3 \xt{j)\ < oo. For each i, therefore, (жД1), х{B),. . . } is a bounded set of real numbers. Therefore (see p. 219) some subsequence {x(j')} of {x(j)} converges in the sense of R°° to some limit x e R™. Since x(j') e Cy and each Q is closed, C2) implies that x lies in f^ Ct. We have shown that P is a completely additive probability measure on the finitely additive field !F. Since !F generates M™, P can be extendedf to a probability measure on R™. Clearly, P satisfies B0). Since IF is a determining class, there can be at most one P satisfying B0), which proves Kolmogorov's theorem. It is not difficult to go on from here and prove a more general version of the theorem. Let RT be the space of all real functions x = x{t) on an arbitrary set T, which we may as well take to be infinite. (If Г consists of the integers, then RT = jR00.) For a finite ordered set a = (sx, . . . , ska) of distinct elements of T, define TTa:RT^Rk° by -na{x) = (x(Sl), . . . , x(ska)); let 0tT be the (T-field generated by the sets тт~хН with H e Mka (for all a and H). Then (jRT, &T) is a measurable space; no topology is involved (or need be involved). A collection of probability measures [ла on (Rka, 0№) (one measure for each a) is consistent if ца^ = [лаух whenever a2 = (si ¦> • • • > si}) is a permuta- permutation of/ of the elements of ax = (sx, . . . , sk) (here у < к) and ip:Rk->- Rj is defined by y{xx, . . . , xk) = (xh, . . . , хг). If the [xa are consistent, there exists a unique probability measure P on (RT, @T) such that Ptt-1 = iia for all a. To prove this, define, for a sequence т = (tx, t2, . . .) of elements of T, a mapping ttt:Rt -»¦ jR00 by тгт(ж) = (x(tx), x(t2), . . . ). From the existence theorem for R™, it follows that there exists a probability measure PT on {R™, M™) such that Рттт-Х = ptY..tic for every k. Now the class of sets тт-гН with H e M™ (for all т and Я) is exactly 0lT (and does not merely generate it), and it follows easily that Р{тг~хН) = PT(H) consistently defines the desired P. Measurability of Some Mappings Let T denote the unit interval [0, 1]; let &~ denote the class of linear Borel subsets of T. For each t, the projection ттг from С to jR1 is measurable f Halmos A950, Chapter 3).
'€. Since the mapping C3) ' (x, 0 - x(t) from С X T to J?1 is continuous in the product topology, and since ^ x 3~ is the (r-field of Borel sets for this topology (see p. 225), C3) is measurable For x e C, let h(x) be the Lebesgue measure of the set of t in T for which x(t) > 0. We want to prove that h is measurable #, and we shall derive this from a more general result. If we define a real function и of a real variable by fl if a>0 C4) «(a) = 10 if a<0, then C5) h(x)= v(x(t))dt. = Cv(x(t)), Jo Now v is (i) Borel measurable, (ii) bounded, and (iii) continuous except on a set of Lebesgue measure 0. Using these three properties of v, we shall show that the function h defined by C5) is measurable ^ and is continuous except on a set of Wiener measure 0. Since, for each x in C, v(x(t)) is a bounded, measurable function of t, the integral in C5) is well defined. Since the mapping C3) is measurable ^ x 0Г and since v is measurable, the mapping ip: С x T—^R1 defined by ip(x, t) = y(z(f)) is also measurable. Since ip is bounded, h(x) = JJ ip(x, t) dt is measurable in x.| Hence h is measurable ^\ If Dv is the set of discontinuities of v, then by (iii), А(Д) = 0, where X denotes Lebesgue measure on the line. Let E denote the set of (x, t) for which x(t) e Dv. For each positive t, W{x:(x, t)eE}= —L= f «Г* du = 0. jlt JDV It follows by Fubini's theorem applied to the measure W x A on that z,0e?} = 0 if x ф A, where Л. is an element of ^ with W{A) = 0. Now if xn converges to x pointwise, then v(xn(t)) -+ v(x(t)) for each t such that x(t) ф Dv. If x ф А, this is true for almost all t, and hence {v(xn(t))dt-+{ v(x(t))dt Jo Jo by the bounded convergence theorem. This proves that h is continuous except at points forming a set of FF-measure 0. t See Halmos A950, Chapter 7).
232 Appendix II The argument goes through if W is replaced by an arbitrary P with the property that P-nj1 is absolutely continuous with respect to Lebesgue measure for almost all t. This is true of W°, for example. Except for the proof that C3) is measurable, this argument goes through word for word if С is replaced by D. And the measurability of C3) can be proved by adapting the proof in Section 14 of the measurability of irt. One more function on С remains to be analyzed, namely the supremum h(x) of those t in [0, 1] for which x(t) = 0. Since (x:/z(x) < a} is open, certainly h is measurable. If h is discontinuous at x, then x(t) must keep to one side of 0 in (A(x), l) and keep to the same side of 0 in (h(x) — s, A(x)) for some s. That h is continuous except on a set of Wiener measure 0 will therefore follow if we show that, for each t0, the supremum and infimum of Hoover [t0, 1] have continuous distributions. Since Wt — Wt with t ranging over [t0, 1] is distributed as a Wiener path with a linearly transformed time scale, — Wttj + supf>io Wt has a continuous distribution (see A0.17)). This last random variable and Wto are independent, and hence their sum also has a continuous distribution. The infimum is treated the same way. (This function h extended to have domain D is not continuous except on a set of Wiener measure 0.) More Measurability In Section 17 we defined the mapping ip:D x Do^~ D by ip(x, cp) = x 0 cp (see A7.5) and A7.6)), and we are to prove it measurable {гр~х2 с 2 x 20). Since the finite-dimensional sets in D generate 2, it is enough to prove that, for each t, the mapping C6) (x, cp) -> rrrt{x о cp) = x(cp(t)) is measurable 2 x 20. If cpk{t) is the smallest ratio i/k not smaller than cp(t), then cpk(t) I cp(t) for each t. Hence the mapping C7) (x, cp) -> x(cpk(t)) converges pointwise to C6), and it suffices to prove this latter mapping measurable 2 x 20. Now {(x, q>):x(cpk(t)) < a} is the union of C8) {(x, cp): <p(t) = 0} П {(x, cp): x@) < oc} with the sets C9) ((x, ф):1=± < cp(t) <Цп f(x, cp):x(i) < a), г = 1,. . ., к. If H e Й?1, then {cp e D0:<p(t) e #} is the intersection with Do of the subset -njxH of D and hence lies in 20. Therefore {(x, cp):cp(t) e H} e@ x 20. Similarly, {(x, cp):x(t) e #} e2 x 2o. Thus the sets C8) and C9) all lie in 2 X 20, which proves the measurability of C7).
APPENDIX III Theoretical Complements This appendix, most of which requires general (nonmetric) topology, treats questions that arise naturally out of the theory of weak convergence but are irrelevant to its application. The Problem of Measure Does there exist on the class of all subsets of a given set U a probability measure that has no atoms (that assigns measure 0 to each individual point)? The answer certainly depends only on the cardinality of U. It is clear that no such measure can exist if U is countable, and it can be shownf under the assumption of the continuum hypothesis that no such measure can exist if U has the power of the continuum. If there does exist an atomless probability measure on the class of all subsets of U, then the cardinal of U is called measurable; otherwise this cardinal is called nonme asm able. Thus Xo is nonmeasurable, and the continuum hypothesis implies that the power of the continuum is also non- measurable. (Among sets on the line, it is the nonmeasurable ones that are aberrant. Here the terminology is reversed: Among cardinals, it is the measurable ones that are aberrant.) Whether there exist measurable cardinals is a famous unsolved problem of set theory—the so-called problem of measure.! If measurable cardinals fSee Birkhoff A961, p. 187).. % See Keisler and Tarski A964) for an account of the problem of measure and a large bibliography. 233
234 Appendix III exist at all, they must be so large as never to arise in a natural way in mathematics. We shall see that certain irregular phenomena in weak con- convergence are equally remote from ordinary mathematical activity because they can arise only if there exist measuable cardinals. Separable Measures Consider now a probability measure P on the class Sf of Borel sets in a metric space S. According to Theorem 1.4, P is tight if S is separable and complete. Certainly, completeness can be weakened to topological complete- completeness (the condition that there exists for S an equivalent metric under which it is complete), and an examination of the proof shows that the assumption of separability can be weakened to the assumption that P has separable support. Let us define P itself to be separable if it has a separable support.! We then have the following result. THEOREM 1 If P is separable and if S is topologically complete, then P is tight. Remark 1. The hypothesis here that P is separable cannot be suppressed: If P is tight, then it has a сг-compact support, and hence is separable. But, of course, S itself need not be separable. Remark 2. The hypothesis of topological completeness cannot be suppressed either: Let S be a thick subset of [0, 1]—a set whose inner Lebesgue measure X^S) is 0 and whose outer Lebesgue measure X* (S) is 1—with the relative topology of the line. Here ?f consists of the sets S П A with A an ordinary linear Borel set (see G) on p. 224). If P is the restriction of X* to У, then P is completely additive; it is a probability measure because X*(S) = 1. If К is compact in S, it is an ordinary Borel set and hence P(K) = 0 because X^ (S) = 0. Hence P is not tight. Since P is separable (S itself being separable), Theorem 1 becomes false without the completeness hypothesis. Remark 3. If S consists of the rationale with the relative topology of the line, then, since S is сг-compact, each P on S is tight. On the other hand, by Baire's category theorem,J S is not topologically complete. It is an open problem to characterize topologically those metric spaces that support tight probability measures only. The question now arises, do nonseparable probability measures exist? t This is the specialization to the metric case of the more general notion of a т-smooth measure; see Varadarajan A958a and 1961a). % See Kelley A955, p. 200).
Theoretical Complements 235 THEOREM 2 A necessary and sufficient condition that each probability measure on ?? be separable is that each discrete^ subset ofS have nonmeasurable cardinal. Proof. We first prove the necessity. Suppose S contains a discrete subset Ao with measurable cardinal. We shall construct a nonseparable P on ?f. Let Q be an atomless probability measure on the class of all subsets of Ao, and define an atomless probability measure P on У by P(A) = Q(A П Ao). If A is separable, then А П Ao is both discrete and separable and hence (p. 216) countable, so that Q(A П ,40) = 0. Thus P can have no separable support. The proof of sufficiency is much deeper. Suppose that each discrete subset of S has nonmeasurable cardinal and consider an arbitrary probability measure P on ?f. We are to show that P is separable. Note first that it suffices to show that each open cover IS of S contains a countable subclass {Gx, Ga, . . . } with A) P(\JnGn) = 1. Indeed, for each к there is then a sequence Akl, Ak2, ... of open 1/fc-spheres with P(\JnAkn) — 1> and DfeUn^kn is a separable support for P. Consider then an open cover ^ of S. By the paracompactness theorem,! there is a class Ж with these properties: (i) Ж is an open cover of S. (ii) Each element of Ж is contained in some element of У. (iii) Ж can be represented as a countable union B) Ж = \}пЖп, where, for each n, C) inf {P(A, B):A, ВеЖп,А^В}>0, p being the metric on 5*. Fix n for the moment. From each element of Жп choose a single point, forming in this way a set Sn. Because of C), Sn is discrete. For А с Sn, let Xn(A) be the P-measure of the union of all those elements of Жп that meet A (this union is open). Since the elements of Жп are disjoint, Xn is a finite measure on the class of all subsets of Sn. Since Sn has nonmeasurable cardinal by hypothesis, Xn has a countable support. (Otherwise we could subtract away the atomic part of Xn to obtain on the class of all subsets of Sn an atomless measure vn with 0 < vn(Sn) < oo, and vn could be normalized to a probability measure.) Let An be a countable subset of Sn with D) h(Sn - An) = 0, t See p. 215. t See Kelley A955, p. 129).
236 Appendix III let /n be the class of elements of Жn that meet An, and define $ = \JnJ n- Since each class $n is countable, so is Jf. If Hn is the union of the sets in Ж п and Jn is the union of the sets in f n, then, by D) and the definition of kn, P(Hn — Jn) = 0. Since Ж covers S, it follows by B) that \JnHn = S and hence, if / = \JnJn, that P(J) = 1. Now / is just the union of the sets in Jf, which we can enumerate as G[, G'z, . . . . Each G'n lies in Ж and hence is contained in some element Gn of ^. Since P(J) = 1, it follows that the Gn satisfy A), as required. That each P on a separable S is itself separable, an obvious fact, follows because each discrete subset must have the nonmeasurable cardinal Xo. Theorem 2 also implies that each P on 5* is separable if 5* itself has non- measurable cardinal, which is true if the power of S does not exceed that of the continuum and the continuum hypothesis is true. In any case, the search for a nonseparable probability measure belongs properly not to probability but to set theory. The Topology of Weak Convergence Consider now the space Z = Z(S) of probability measures on (S, ?P). Make Z into a Hausdorff space by taking as the basic neighborhoods of P the sets of the form E) (ftdQ-(ftdP < e, i = 1,..., fcj, where ? is positive and/1?. . . ,fk are elements of C(S). The resulting topology we shall call the topology of weak convergence and denote by iV. We have Pn => P if and only if the sequence {Pn} ^-converges to P. We shall show that there are three other bases for iV, namely, the sets F) {Q with Ft closed, the sets G) {Q with Gi open, and the sets (8) {Q:\QiA,) - Р{Аг)\ < e,i = \, ... ,k) with At a P-continuity set. Certainly, each of these three classes of sets is a basis (for some topology). Theorem 2.1 is the sequential version of the following result. THEOREM 3 The three bases just described all generate iV. Proof. Since F) and G) coincide if Gt = Ff, the two corresponding bases are identical. We shall show that the basis F) generates the same topology as does (8) and then that it generates the same topology as E), that is,
Theoretical Complements 237 Fix P. If A is a P-continuity set, then for б in a set of the form F) we have Q(A) < Q(A~) < P(A-) + ? = P(A) + ?, and for Q in a set of the form F) (or G)) we have Q(A) > Q(A°) > P(A°) - s = P(A) - s. Since the sets F) do have the properties of a basis, each set (8) contains a set F) (with the same P). On the other hand, if F is closed, then there is some д for which (9) Fd = {x:P(x,F)<d} is a P-continuity set and A0) P(FS) < P(F) + \e. If \Q(Fd) - P(Fd)\ < |e, then Q(F) < P(F) + e. Thus each set F) contains a set (8). We now compare F) with E). Given a closed F, choose д so that (9) satisfies A0), and then choose in C(S) an/with value 1 on F, value 0 outside F8> and value everywhere contained between 0 and 1 (Theorem 1.2). If \(fdQ - lfdP\ < is, then Q(F) < P(F) + e. Thus each F) contains a E). It remains only to find within E) a set of the form F). We need consider only a single / in C(S) and we may assume that 0 < f(x) < 1 for all x. Choose к so that \\k < ? and let Ft — {х:Цк <f(x)}. By B.2) we have J/dQ < ? + k-1 2, Q{FX) and fc-1 2, P{F,) < ifdP. For Q in F) we thus have ifdQ <fdP + 2e. The same argument applied to 1 — / completes the proof. By N(P) we shall mean a ^-neighborhood of P having any of the forms E) through (8). We are free to work with whatever base is most convenient. THEOREM 4 The probability measures with finite support are W-dense in Z. Proof. Consider a neighborhood N(P) of the form F). From each nonempty set В in the finite partition generated by Fx, . . . , Fk choose a single point and place there a mass P(B). The resulting measure has finite support and lies in N(P) because it agrees with P for each Ft. Theorem 4 applies even to the approximation of a nonseparable P (if there are any). Therefore, if a subset П of Z consists exclusively of separable measures, one cannot conclude that each element of the ^-closure of П is separable. If, however, the elements of П have a common separable support and P lies in the "^-closure of П, then P is separable (because, if A is the closure of a common separable support, then A is separable and it follows by Theorem 3 that P(A) =1). Since a countable collection of separable measures has a common separable support, P is separable if Pn^> P and each Pn is separable. It is natural to ask when iV is metrizable. For P and Q in Z, let p(P, Q) be the infinum of those positive s for which the inequalities Q(A) < P(Ae) + e
238 Appendix III and P(A) < Q{AE) + ? hold for all A in ?', where Ae = {x:p(x, A) < s}. From p(P, 0 = 0 it follows that P and Q agree on closed sets and hence (Theorem 1.1) are identical. The remaining postulates being easy to check, p is a metric on Z—the Prohorov metric. We shall see that, if if can be metrized at all, then p does it. Fix P. If for each N(P) there is about P a/7-sphere contained in N(P), we say p is at least as fine as if at P. If, in addition, for each/7-sphere about P there is an N(P) contained in it, we say p and if are equivalent at P. THEOREM 5 For arbitrary P, p is at least as fine as if at P. Moreover, p and if are equivalent at P if and only if if has a countable basis at P, which in turn holds if and only if P is separable. Proof. Given P, F, and s, choose <5 so that <5 < s and (9) satisfies A0). If p{P, Q) < R then Q(F) < P(Fd) + <5 < P(F) + e. Thus each N(P) of the form F) contains a /7-sphere about P, which proves the first part of the theorem. If p and if are equivalent at P, then certainly if has a countable basis at P. It remains then to prove that this last condition implies the separability of P and that the separability of P implies the equivalence of p and if at P. Suppose if has at P a countable basis Nn(P), n = 1, 2, .... By Theorem 4, each fl?=i Nk(P) contains a Pn with separable (even finite) support. But then Pn => P, which, as remarked after the proof of Theorem 4, implies that P is separable. Finally, suppose that P is separable. We are to prove that p and if are equivalent at P, for which, in view of the first part of the theorem, it suffices to show that a/7-sphere with radius s and center P must contain some N(P). Choose <5 so that 3<5 < e. Cover the separable support of P by P-continuity spheres of diameter less than <5, pass to a countable subcover (p. 216), and by the usual procedure construct disjoint P-continuity sets Ax, A2, . . . that cover the support. Now choose к so that (И) Each of Alt . . . , Ak has diameter less than <5 and, if «s/ is the (finite) class of unions of these sets, each element of «s/ is a P-continuity set. By Theorem 3, there is an N(P) such that QeN{P) implies A2) \Q(A) - P(A)\ < d, Ae^. This inequality and A1)'imply A3) We shall show that p(P, Q)< s if Q e N(P).
Theoretical Complements 239 Given the general В in ?f, consider the union of those sets among Alt . . . , v4fc-that meet B. If A denotes this union, then it satisfies A2). The relation 5ciu (ULi Ai)c> tne relation A ^ Bs (true because diam At < <5), and the inequalities A1), A2), and A3) combine to yield P(B) < Q(B8) + 2<5 and Q(B) < P(B8) + 3<5. Since 3C < e, p(P, Q) < e follows. It follows by Theorems 2 and 5 that W is metrizable if and only if each separable subset of S has nonmeasurable cardinal, in which case it is metrizable by p. One can ask after the separability of Z. If S is separable, then the p and Z topologies coincide on Z, and it follows by Theorem 4 that the probability measures with finite support are dense. It is not difficult to go on to show that, if Ao is a countable, dense set in S and Zo consists of those P having finite support contained in Ao and rational masses, then Zo is countable and dense relative to p. Thus Z is metrizable and separable if S is separable. On the other hand, if "W satisfies the second axiom of countability, it can be metrized by p and is separable. Since S with its metric is homeomorphic with the set of unit masses in Z, it follows that S is separable. Prohorov's Theorem Let us reexamine Prohorov's theorem in the light of the preceding results. If П is a subset of Z, let Uv and П^- denote its closures with respect to p and W. Theorem 5 implies that П,, с П^- (with strict inclusion for some П if W is not metrizable). A set in a topological space is compact if each open cover of it has a finite subcover; it is sequentially compact if each sequence in it contains a sub- subsequence converging to a limit that is also in the set. Consider these three statements: 1°. Each sequence in П contains a ^-convergent subsequence. 2°. Uir is sequentially compact. 3°. П^- is compact. The limit of the convergent subsequence in 1° is not required to lie in П, but, of course, it will lie in П^-. Clearly, 2° implies 1°. From the three statements, one can make up five other implications; whether any of them are true is unknown.! If each separable subspace of S has nonmeasurable cardinal, however, the W topology is metrizable, so that all three concepts coincide, and we shall see that this is also true if П is tight. In this book (see the beginning of Section 6), we have defined П to be relatively compact if it satisfies 1°. t To me.
240 Appendix III Consider the direct half of Prohorov's theorem (Theorem 6.1). THEOREM 6 If U is tight, then Up = IV, IV is tight, the p and topologies coincide on IV, and IV is both compact and sequentially compact (in either topology). Proof. Since П is tight, there is to each s a compact К such that Q(K) > 1 - ie for all Q in П. Suppose P elV By Theorem 3, there is an N(P) such that Q e N(P) implies Q{K) < P(K) + \e. Choosing Q in N(P) n П leads to P(K) > 1 - e. Thus IV is tight. In particular, each element of IV is separable. Therefore (Theorem 5) the p and iV topologies coincide on IV; since П^ с IV in any case, Up = IV follows. From the tightness of IV, it follows by Theorem 6.1 that IV is sequentially compact in the sense of W. Since compactness and sequential compactness are the same in the metric case, the proof is complete. The direct Prohorov theorem can also be used to investigate the complete- completeness of Z. Suppose that S is complete and that each element of Z is separable (and hence tight). Then p and if coincide on Z, and we shall show that Z is /7-complete. For this it is enough to show that a ^-fundamental sequence {Pn} necessarily contains a weakly convergent subsequence, which will follow by the direct Prohorov theorem if we prove that {Pn} is tight. Finally, as we saw in the proof of Theorem 6.2 (see p. 40), since S is complete {Pn} is tight if, for each positive ? and <5, there exists a finite collection Ax, . . . , Ak of E-spheres such that A4) Рп(Аг U • • • U Ak) > 1 - e for all n. Choose tj so that 2r] < min {e, <5}, and then choose n0 so that n > n0 implies p(Pn, РПо) < r\. Since Р„о is separable, there exist finitely many ^-spheres Въ . . °. , Bk such that P°no(ULi Bt) > l ~ V- Let A, • ¦ • , A be the 2?7-spheres with the same centers. Since (U*=i ВгУ с UiLi A an(i p(Pn, РПо) < г) for n > n0, it follows that A4) holds for n > щ. Since each Pn is separable, we can ensure that A4) holds for the finitely many n preceding щ by enlarging the system {Аг}. Thus Z is /7-complete if 5* is complete and each probability measure on it is separable. Suppose, on the other hand, that Z is /^-complete. It is not difficult to show that the set Zo of unit masses is /7-closed and hence also /7-complete and then to show that S is complete. We turn now to the converse half of Prohorov's theorem. THEOREM 7 If U is if-compact and each measure in it is separable, and if S is topologically complete, then П is tight.
Theoretical Complements 241 Proof. By Theorem 1, the elements of П are individually tight. To prove П itself tight, it suffices as usual to produce, given e and <5, finitely many <5-spheres Ax, . . . , Ak such that P(A± U • • • U Ak) > 1 — e for all P in П. If P lies in П, then, since P is tight, there is a compact set К such that P(K) > 1 — Je, and К can be covered by finitely many open <5-spheres Bp i5 i = 1, . . . , kP. By Theorem 3, there is a ^-neighborhood N(P) of P such that <2 e NG*) implies that the union GP of the i?Pji satisfies Q(GP) > P(GP) — \e and hence Q(GP) > 1 — e. Since П is #"-compact, it can be covered by a finite selection of these N(P). Take А1г . . . , Ak to consist of the i?P>i corresponding to the neighborhoods N(P) selected. In view of Theorem 1, Theorem 7 is the same as the assertion that, if П is "^-compact and each of its elements is tight, and if S is topologically complete, then П is tight. The hypothesis that each P in П is tight cannot be suppressed, because a single nontight P forms a compact set. On the other hand, it seems reasonable to conjecture that the completeness hypothesis can be suppressed—to conjecture that П is tight if it is "^-compact and if each of its elements is tight.f Although the problem is open, there is an interesting result in this direction. THEOREM 8 Suppose that Pn => P, where P and each of the Pn are tight. Then {P, Px, P2, . . . } is tight. Proof. Since the measures involved are individually tight, it is enough to show that for each e there is a compact К such that A5) Pn(K) >l-e for all sufficiently large n. Given e, use the tightness of P to choose a compact K' with P(K') > 1 — Je, and put Gh= {x:p(x,K')< I/A;}. Since Pn=> P, there is an increasing sequence {nk} such that n > nh implies A6) Pn(Gk) > P(Gk) - ie > 1 - |e. For nk < n < nk+1, use the tightness of Pn to choose a compact K'n satisfying K'n^Gk and Pn(Gh - K'n)< ie. If А, = Гии«и^ then Kk is compact, К' с Kh<^ Gh, and nk < n < nk+1 implies Pn(Gk — Kk) < Je. From A6) it now follows that nk < n <! nfc+1 implies A7) P Put К = [JkKk. Suppose {xu} is a sequence in K; if there is some one Kk containing all the xu, then {xu} contains a convergent subsequence; otherwise t Varadarajan A958a and 1961a) has the result in this form, but his proof contains a lacuna.
242 Appendix III there is a subsequence {xu } and a sequence {vm} of integers such that xUmeKVm^ GVm and vm>m, so that p(xUm, K')< l/m; {xuj must contain a further subsequence that converges. Thus К is compact. If n > и1} then «fc < и < nfc+1 for some A;, and A5) follows from A7) and the relation It is essential in this theorem to assume that P is tight: Take a nontight P on a separable S, as in the second remark following Theorem 1, and apply Theorem 4. Remarks. Theorem 8 and the example in Remark 2 following Theorem 1 are due to LeCam A957); the remaining results are for the most part due to Varadarajan A958a and 1961a).
Bibliography Articles in the Soviet journal Teoriya Veroyatnosti i ее Primeneniya are listed as they appear in the English translation Theory of Probability and its Applications, published by the Society for Industrial and Applied Mathematics. \. D. Alexandroff A940-43), "Additive set functions in abstract spaces," Mat. Sb. 8, 307-348; 9, 563-628; 13, 169-238. Г. W. Anderson A963), "Asymptotic theory for principal component analysis," Ann. Math. Statist. 34, 122-148. Г. W. Anderson and D. A. Darling A952), "Asymptotic theory of certain 'goodness of fit' criteria based on stochastic processes," Ann. Math. Statist. 23, 193-212. Stefan Banach A932), Theorie des Operations Lineaires. Warsaw: Monografje Mate- matyczne. Robert Bartoszynski A961), "A characterization of the weak convergence of measures," Ann. Math. Statist. 32, 561-576. P. J. Bickel A968a), "Some contributions to the theory of order statistics," to appear. A968b), "A distribution free version of the Smirnov two sample test in the />-variate case," to appear. Patrick Billingsley A956), "The invariance principle for dependent random variables," Trans. Amer. Math. Soc. 83, 250-268. A961), "The Lindeberg-Levy theorem for martingales," Proc. Amer. Math. Soc. 12, 788-792. A962), "Limit theorems for randomly selected partial sums," Ann. Math. Statist. 33, 85-92. A964), "An application of Prohorov's theorem to probabilistic number theory," ///. J. Math. 8, 697-704. A965), Ergodic Theory and Information. New York: John Wiley and Sons. Patrick Billingsley and Flemming Topsoe A967), "Uniformity in weak convergence," Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 7, 1-16. Garret Birkhoff A961), Lattice Theory. Providence: The American Mathematical Society. 243
244 Bibliography Z. W. Birnbaum and Ronald Руке A958), "On some distributions related to the statistic D+," Ann. Math. Statist. 29, 179-187. J. R. Blum, D. L. Hanson, and J. I. Rosenblatt A963), "On the central limit theorem for the sum of a random number of independent random variables," Z. Wahrschein- lichkeitstheorie und Verw. Gebiete 1, 389-393. H. Bohr A932), Fastperiodische Funktionen, Erg. Math. 1, no. 5. Berlin-Gottingen- Heidelberg: Springer-Verlag. (English translation 1947. New York: Chelsea.) N. N. Chentsov A956), "Weak convergence of stochastic processes whose trajectories have no discontinuities of the second kind and the 'heuristic' approach to the Kolmogorov-Smirnov tests," Theor. Probability Appl. 1, 140-144. H. Chernoff A956), "Large sample theory: parametric case," Ann. Math. Statist. 27,1-22. H. Chernoff and H. Teicher A958), "A central limit theorem for sequences of exchangeable random variables," Ann. Math. Statist. 29, 118-130. D. M. Chibisov A965), "An investigation of the asymptotic power of the tests of fit," Theor. Probability Appl. 10, 421-^37. Z. Ciesielski and H. Kesten A962), "A limit theorem for the fractional parts of the sequence 2kt," Proc. Amer. Math. Soc. 13, 596-600. Harald Cramer A946), Mathematical Methods of Statistics. Princeton: Princeton University Press. H. Cramer and H. Wold A936), "Some theorems on distribution functions," J. London Math. Soc. 11, 290-295. D. A. Darling A957), "The Kolmogorov-Smirnov, Cramer-von Mises tests," Ann. Math. Statist. 28, 823-838. D. A. Darling and P. E. Erdos A956), "A limit theorem for the maximum of normalized sums of independent random variables," Duke Math. J. 23, 143-155. Jean Dieudonne A960), Foundations of Modem Analysis. New York: Academic Press. W. Doeblin A940), "Remarques sur la theorie metrique des fractions continues," Com- positio Math. 7, 353-371. M. Donsker A951), "An invariance principle for certain probability limit theorems," Mem. Amer. Math. Soc. 6. A952), "Justification and extension of Doob's hueristic approach to the Kolmogorov- Smirnov theorems," Ann. Math. Statist. 23, 277-281. J. L. Doob A949), "Heuristic approach to the Kolmogorov-Smirnov theorems," Ann. Math. Statist. 20, 393-403. A953), Stochastic Processes. New York: John Wiley and Sons. R. M. Dudley A966), "Weak convergence of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces," ///. J. Math. 10, 109-126. A967), "Measures on non-separable metric spaces," III. J. Math. 11, 449-^53. Nelson Dunford and Jacob T. Schwartz A958), Linear Operators. Part I: General Theory. New York: Interscience Publishers. Meyer Dwass and Samuel Karlin A963), 'Conditional limit theorems," Ann. Math. Statist. 34, 1147-1167. P. Erdos and M. Kac A946), "On certain limit theorems in the theory of probability," Bull. Amer. Math. Soc. 52, 292-302. A947), "On the number of positive sums of independent random variables," Bull. Amer. Math. Soc. 53, 1011-1020. William Feller A957), An Introduction to Probability Theory and Its Applications, Volume I, second edition. New York: John Wiley and Sons. A966), An Introduction to Probability Theory and Its Applications, Volume II. New York: John Wiley and Sons.
Bibliography 245 R. Fortet A940), "Sur une suite egalement repartie," Studia Math. 9, 54-69. 1.1. Gikhman and A. V. Skorohod A965), Introduction to the Theory of Random Processes. Moscow. Leonard Gross A963), "Harmonic analysis on Hilbert space," Mem. Amer. Math. Soc. 46. Paul R. Halmos A950), Measure Theory. New York: D. Van Nostrand. G. H. Hardy and E. M. Wright A960), An Introduction to the Theory of Numbers, fourth edition. Oxford: Clarendon Press. P. L. Hennequin and A. Tortrat A965), Theorie des Probabilities et Quelques Applications. Paris: Masson et Cie. I. A. Ibragimov A962), "Some limit theorems for stationary processes, " Theor. Proba- Probability Appl. 7, 349-382. A963), "A central limit theorem for a class of dependent random variables," Theor. Probability Appl. 8, 83-89. Kiyosi ltd and Henry P. McKean, Jr. A965), Diffusion Processes and Their Sample Paths. Berlin-Heidelberg-New York: Springer-Verlag. M. Kac A946), "On the distributions of values of sums of type S/Bfcr)," Ann. of Math. 47, 33-^9. A949), "On deviations between theoretical and empirical distributions," Proc. Nat. Acad. Sci. U.S.A. 35, 252-257. Gopinath Kallianpur A961), "The topology of weak convergence of probability measures," J. Math. Mech. 10, 947-969. Samuel Karlin A966), A First Course in Stochastic Processes. New York: Academic Press. John L. Kelley A955), General Topology. New York: D. Van Nostrand. A. Ya. Khinchine A961), Continued Fractions, third edition. Moscow: State Publishing House of Physical-Mathematical Literature. (English translation 1964. Chicago: University of Chicago Press ) A933), Asymptotische Gesetze der Wahrscheinlichkeitsrechnung. Erg. Math. 2, no. 4. Berlin-Gottingen-Heidelberg: Springer-Verlag. H. J. Keisler and A. Tarski A964), "From accessible to inaccessible cardinals," Fund. Math. 53, 225-308. Ernest G. Kimme A957), "On the convergence of sequences of stochastic processes," Trans. Amer. Math. Soc. 84, 208-229. A960), "Some equivalence conditions for the convergence in distribution of sequences of stochastic processes," Trans. Amer. Math. Soc. 95, 495-515. Frank B. Knight A962), "On the random walk and Brownian motion," Trans. Amer. Math. Soc. 103, 218-228. A. N. Kolmogorov A931), "Eine Verallgemeinierung des Laplace-Liapounoffschen Satzes," Izv. Akad. Nauk SSSR Ser. Fiz-Mat., 959-962. A933), Grundbegriffe der Wahrscheinlichkeitsrechnung. Erg. Math. 2, no. 3. Berlin- Gottingen-Heidelberg : Springer-Verlag. A956), "On Skorohod convergence," Theor. Probability Appl. 1, 213-222. A. N. Kolmogorov and Yu. V. Prohorov A954), "Zufallige Funktionen und Grenzverteil- ungssatze," Berich tiber die Tagung Wahrscheinlichkeitsrechnung und Mathematische Statistik, 113-126. Berlin: Deutscher Verlag der Wissenschaften. John Lamperti A962), "An invariance principle in renewal theory," Ann. Math. Statist. 33, 685-696. A962a), "A new class of probability limit theorems," J. Math. Mech. 11, 749—772. A962b), "On convergence of stochastic processes," Trans. Amer. Math. Soc. 104, 430-435.
246 Bibliography L. LeCam A957), "Convergence in distribution of stochastic processes," Univ. California Publ. Statist. 2, no. 11, 207-236. Paul Levy A925), Calcul des Probabilites. Paris: Gauthier-Villars. A937), Theorie de VAddition des Variables Aleatoires. Paris: Gauthier-Villars. A948), Processus Stochastiques et Mouvement Brownien Paris: Gauthier-Villars. Michel Loeve A960), Probability Theory, second edition. Princeton: D. Van Nostrand. H. B. Mann and A. Wald A943), "On stochastic limit and order relations," Ann. Math. Statist. 14, 217-226. A. M. Mark A949), "Some probability theorems," Bull. Amer. Math. Soc. 55, 885-900. J. C. Oxtoby and S. Ulam A939), "On the existence of a measure invariant under a trans- transformation," Ann. of Math. B) 40, 560-566. K. R. Parthasarathy A967), Probability Measures on Metric Spaces. New York: Academic Press. Yu. V. Prohorov A953), "Probability distributions in functional spaces, " Uspehi Matem. Nauk (N.S.) 55, 167. A956), "Convergence of random processes and limit theorems in probability theory," Theor. Probability Appl. 1, 157-214. A960), "The method of characteristic functionals," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2, 403-419. Ronald Руке A965), "Spacings," J. Roy. Stat. Soc. Ser. В 27, 395^49. A968), "The weak convergence of the empirical process of random sample size," to appear. Ronald Руке and Galen Shorack A968), "Weak convergence of the two-sample empirical process and the Chernoff-Savage theorem," to appear. R. Ranga Rao A962), "Relations between weak and uniform convergence of measures with applications," Ann. Math. Statist. 33, 659-680. R. Ranga Rao and V. S. Varadarajan A958), "On a theorem in metric spaces," Ann. Math. Statist. 29, 612-613. A. Renyi A958), "On mixing sequences of sets," Acta Math. Acad. Sci. Hung. 9, 215-228. A960), "On the central limit theorem for the sum of a random number of independent random variables," Acta. Math. Acad. Sci. Hung. 11, 97-102. Bengt Rosen A964), "Limit theorems for sampling from a finite population," Arkiv for Matematik, 5, 383-^24. A967a), "On the central limit theorem for sums of dependent random variables," Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 7, 48-82. A967b), "On asymptotic normality of sums of dependent random variables," Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 7, 95-102. A967c), "On the central limit theorem for a class of sampling procedures," Z. Wahr- Wahrscheinlichkeitstheorie und Verw. Gebiete 7, 103-115. H. Royden A963), Real Analysis. New York: Macmillan. H. Scheffe A947), "A useful convergence theorem for probability distributions," Ann. Math. Statist. 18, 434-438. Paul Schmid A958), "On the Kolmogorov and Smirnov limit theorems for discontinuous distribution functions," Ann. Math. Statist. 29, 1011-1027. J. Sethuraman A964), "On the probability of large deviations of families of sample means," Ann. Math. Statist. 35, 1304-1316. A965), "On the probability of large deviations of the mean for random variables in D[0, 1]," Ann. Math. Statist. 36, 280-285. George F. Simmons A963), Introduction to Topology and Modern Analysis. New York: McGraw-Hill.
Bibliography 247 A. V. Skorohod A956), "Limit theorems for stochastic processes," Theor. Probability -Appl. 1, 261-290. A957), "Limit theorems for stochastic processes with independent increments," Theor. Probability Appl. 2, 138-171. A961), Studies in the Theory of Random Processes. Kiev: Kiev University Press. (English translation 1965. Reading: Addison-Wesley.) E. E. Slutsky A925), "tiber Stochastische Asymptoten und Grenzwerte," Metron 5, 1-90. A937), "Qualche proposizione relativa alia teoria delle funzioni aleatorie," Giorn. 1st. Ital. Attuari 8, 183-199. Charles Stone A963), "Weak convergence of stochastic processes defined on semi-infinite time intervals," Proc. Amer. Math. Soc. 14, 694-696. A963), "Limit theorems for random walks, birth and death processes, and diffusion processes," ///. J. Math. 7, 638-660. V. Strassen A964), "An invariance principle for the law of the iterated logarithm," Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 3, 211-226. E. C. Titchmarsh A939), The Theory of Functions, second edition. Oxford: Oxford University Press. Flemming Topsoe A967a), "On the connection between P-continuity and P-uniformity in weak convergence," Theor. Probability Appl. 12, 281-290. A967b), "Preservation of weak convergence under mappings," Ann. Math. Statist. 38, 1661-1665. Bruce E. Trumbo A965), "Sufficient conditions for the weak convergence of conditional probability distributions in a metric space." Thesis, University of Chicago. V. S. Varadarajan A958a), "Convergence in distribution of stochastic processes." Thesis, Calcutta University. A958b), "Weak convergence of measures on separable metric spaces," Sankyu 19, 15-22. A958c), "On the convergence of probability distributions,aAjA:ja, 19, 23-26. A961a), "Measures on topological spaces," Mat. Sb. 55 (97), 35-100. (English trans- translation 1965. Providence: American Mathematical Society Translations, series 2, 48, 161-228.) A961b), "Convergence of stochastic processes," Bull. Amer. Math. Soc. 67, 276-280.
Summary of Notation This list includes only symbols used systematically throughout the book in some special way. Although the generic element of S is asserted to be x, it can actually be any letter in that general region of the lower case roman alphabet, possibly with subscripts and so on; similar remarks apply to other generic elements. Set Theory Ec is the complement of E; E1 — ?2 = E1 П E2C is the difference of E1 and E2; E1 + E2 — (Ег — E2) vj (E2 — Ej) is their symmetric difference; /^ is the indicator or characteristic function of E. Probability Spaces (Cl, Щ is a measurable space; со and Eare the generic elements of Q and @; P is a probability measure on (Q, '.Щ\ Е denotes expected value; ? is a random variable, and {?„} and {?t) are stochastic processes. If h maps (Q, Щ into (Q\ 38'), h'xSS' с ® means that h~xE' e 38 for each E' G .^' —• that is, that h is measurable — and P/z^has value P(h~lE') at ?' (p. 222). The General Metric Space S is a metric space with generic point x, metric p(x, y) (and p(x, A) — see p. 215), and (T-field^of Borel sets; A°, A~, and дЛ are the interior, closure, and boundary of the generic subset Л (which lies in Sf unless the contrary is stated); S(x, s) is the open e-sphere (ball) about ж; the general closed set is F(but this can also be a distribution function), the general open set is G, and the general compact set is К (but this can also be a constant or bound). 248
Summary of Notation 249 C(S), with generic element/, is the class of bounded, continuous functions on C; if h maps S into another metric space, Dh is the set of its discontinuities. P and Q are probability measures on (S, У)\Рп => P denotes weak convergence; II is a family of P's. Special Metric Spaces The n-field of Borel sets in a space is denoted by the script version (with any attendant superscripts) of the upper case roman letter denoting the space itself. Rk is Euclidean A>space; H is the generic element of 'Mk\ F is a distribution function; Fn => F denotes weak convergence (p. 18); for the notations x < y, x < y, \x — y\, and {a, b], see p. 17. Rw is the topological product of a countable sequence of copies of the real line (pp. 19 and 218), with generic point x = (xlt x2, . . .); rrk{x) = (x1, . . . , xk) is the natural pro- projection into Rh. Cis C[0, 1] (pp. 19, 54, and 220), with metric p and generic point x — x(t); rrti_ tk(x) = (x^j), , a;(rfc)) is the natural projection into Rh; for xt as a coordinate variable and for the phrase "the distribution of xf under P", see p. 60. D is D[0, 1] (p. 109), with two equivalent metrics d (p. Ill) and d0 (p. 113), and generic point x = x(t); projections nt iftand coordinate variablesxt are as for C; special symbols: Я and Л (p. Ill), ЩИ (p. 112)',"тР (p. 123), Jt (p. 124), and Tx (p. 128). Random Elements X is a random element (p. 22) with value X(co) at со; if its range is a function space, X(t) = ^and X(t, со) = 77t(X(co)) (p. 57 for C; p. 128 for ?>); Xn ® > A'and Xn rj y P denote convergence in distribution (pp. 23 and 24); Xn p > a and Xn p > X denote convergence in probability (pp. 24 and 26); for (X, Y) and p(X, Y), see pp. 25 and 225. Nand N(/u, a2) are normal distributions or variables (p. 24); H^is Wiener measure on С (p. 61), a random function in С with that distribution (p. 64), or the corresponding measure or random function in D (p. 137); W° is the Brownian bridge, as a measure or a random function in С or in D (pp. 64, 65, and 141). Moduli The modulus of continuity for С and related moduli for D: wx(8) = w(x, 8), pp. 54 and 220, wx(d) = w(X, 6), p. 58, wx(T0), p. 109, w'x{8), p. 110, w"x{8), p. 118, w"(X, 6), p. 128.
Index Alexandrov, 16 Anderson, 34 Arc sine law, 80, 138, 194 Arzela-Ascoli theorem, 55, 221 Asymptotically independent increments, 157 Bartozynski, 48 Base for a topology, 215 Bickel, 108 Billingsley, 16, 17, 150, 195, 208 Biikhoff, 233 Birnbaum, 108 Bjornsson, 17 Blum, 160 Borel set, 3, 7 Boundary, 215 Brownian bridge, 64 Brownian motion, 4, 62, 154 Brownian motion process, 64 Central limit theorem, 42, 72 Change of variable, 222 Characteristic function, of a probability measure, 45 of a set, 8 Chentsov, 102, 108 Chernoff, 34, 214 Chibisov, 153 Ciesielski, 205 Compactness, 217 Complete set, 217 Continued fractions, 169, 192, 201 Continuity of sample paths, 66 Continuity set, of a measure, 2, 11 of a random element, 23 Convergence, in distribution, 23, 24 in probability, 24, 26 Convergence-determining class, 15 Coordinate variables, in C, 60 in A 123 Cramer-Wold theorem, 49 Darling, 86, 108 De Moivre-Laplace theorem, 1,12 Determining class, 15 Diagonal method, 219 Diffusion, 158 Diophantine approximation, 193 Discontinuity of the first kind, 109 Discrete set, 215 Distribution function, 17, 22 Distribution of a random element, 22 Doeblin, 195 Domain of a random element, 22 Donsker, 6, 76, 108, 143 Donsker's theorem, 68, 137 Doob, 6, 108, 165 Dudley, 16, 153 Dunford, 16 Dwass, 214 Empirical distribution function, 5, 103,141 e-net,.217 Equivalent metrics, 215 Erdos, 6, 76, 86 Exchangeable random variables, 212 Feller, 52, 81 Finite-dimensional distributions, in C, 30 in A 123 of random elements, 40 | in/?00, 30 Finite-dimensional sets, in C, 19 in A 121 ini?00, 19 Fortet, 195 Functional central limit theorem, 72 251
252 Index Function of a ф-mixing process, 182 Gaussian random function, 64 Gauss's measure, 169 Gikhman, 16 Glivenko-Cantelli theorem, 103 Gross, 52 Halmos, a.e. Hanson, 150 Hardy, 52 Helley's selection theorem, 227 Hennequin, 16 Ibragjmov, 195, 208 Independence of random elements, 26 Independent increments, 61, 154 Indicator function of a set, 8 Interval in Rk, 17 Invariance principle, 72 Ito, 67, 86 Kac, 6, 76, 86, 108, 195 Kallianpur, 16 Karlin, 86, 214 ^-dimensional Borel set, 17 Keisler, 233 Kesten, 205 Khinchine, 165 Kimme, 142 Knight, 77 Kolmogorov, 6, 16, 76, 102, 103, 123 KolmogoroVs existence theorem, the general case, 230 for i?00, 228 Lamperti, 61, 77, 150 LeCam, 6, 16, 40, 242 Levy, 52, 165 Liggett, 214 Lindeberg-Levy case, 77 Lindeberg-Levy theorem, 3, 45 Lindeberg's theorem, 42 Lineal Borel set, 17 Local compactness, 7 Local limit theorems, 49 Lyapounov's theorem, 44 McKean, 67, 86 Mann, 34 Marginal distribution, 20 Mark, 86, 143 Martingale, 205 Measurability, 222 Measurable cardinal, 233 Modulus of continuity, 54, 220 Nonmeasurable cardinal 233 Normal distribution, 24 One-sided process, 168, 183 Open cover, 215 Ox toby, 10 Pardoner, 153 Parthasaiathy, 16 0-mixing, 166 Problem of measure, 10, 233 Products of metric spaces, 224 Prohorov, 6, 16, 34, 40, 52, 61, 76, 77, 123 Prohorov metric, 238 Prohorov's theorem, 37, 240 Projection, from C, 19 from A 120 fromi?00, 19 Руке, 108 Random element, 22 Random function, 4, 22, 57, 128 Random variable, 22 Randum vector, 22 RarHom walk case, 77 Ranga Rao, 16, 17, 29 Re'; -ction principle, 71 Regular measure, 7 Relative compactness of measures, 35, 239 Rehyi, 28, 142, 150 Rosen, 165, 208, 214 Rosenblatt, 150 Rubin, 34 Sampling, 208 Scheffe's theorem, 224 Schmid, 142 Schwartz, 16 Separability, 215 Separable measure, 234 Separable stochastic process, 65, 134 Sequential compactness, 35, 239 Shorack, 108
Index 253 (^compact, 9 Skofohod, 6, 16, 22, 123, 142 Skorohod topology, 111 Slutsky, 34, 102 Sphere, 215 Stone, 61, 142 Strassen, 77 Subspaces, 224 Support, 9 Tarski, 233 Teicher, 214 Tight family of measures, 37, 239 Tight measure, 9 Tight sequence of random elements, 40 Topological completeness, 234 Topology of weak convergence, 236 Topstfe, 16, 17, 34, 61, 226 Tortrat, 16 Totally bounded set, 217 Trumbo, 214 Two-sided process, 166, 182 Ulam, 10 Under, 60 Uniform integrability, 32 Uniform topology, on C, 54, 220 onД 150 Upper semicontinuity, 218 Varadarajan, 6, 16, 29, 40, 234, 241, 242 Wald, 34 Weak convergence, of distribution func- functions, 1, 18 of measures, 2, 7, 16 Weyl's theorem, 19,50 Wichura, 214 Wiener, 67 Wiener measure, 4, 61 Wiener path, 63 Wiener process, 64 Wold, 52 Wright, 52