Author: Ibragimov I. A  

Tags: mathematics   higher mathematics  

ISBN: 90 01 41885 6

Year: 1971

Text
INDEPENDENT AND STATIONARY SEQUENCES OF RANDOM VARIABLES

I. A. IBRAGIMOV AND Yu. V. LINNIK
University of Leningrad, Leningrad

Edited by PROFESSOR J. F. C. KINGMAN
University of Oxford, Oxford, U.K.

WOLTERS-NOORDHOFF PUBLISHING GRONINGEN
THE NETHERLANDS

© 1971 WOLTERS-NOORDHOFF PUBLISHING GRONINGEN
No part of this book may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the publisher.
Library of Congress Catalog Card No. 79-119886
ISBN 90 01 41885 6
PRINTED IN THE NETHERLANDS BY NEDERLANDSE BOEKDRUK INDUSTRIE N.V. - 'S-HERTOGENBOSCH
EDITOR'S NOTE

The notation used is substantially that of the original, with a few exceptions of which the most notable is the use of E rather than M for mathematical expectation; V is used for variance rather than the original D, since the latter might be mistaken for standard deviation. The symbol • is used to signal the end of the proof of a theorem or lemma.

In some places the argument has been recast so as to read more smoothly in English, I hope without violence to the authors' intentions. Readers will be familiar with the O, o notation, but will perhaps not recognise the symbol B, which is used in some chapters to denote a generic bounded quantity.

Oxford, October 1969
J.F.C.K.
CONTENTS

Editor's note
Preface

Chapter 1. Probability distributions on the real line: infinitely divisible laws
1. Probability spaces, conditional probabilities and expectations
2. Distributions and distribution functions
3. Convergence of distributions
4. Moments and characteristic functions
5. Continuity of the correspondence between distributions and characteristic functions
6. A special theorem about characteristic functions
7. Infinitely divisible distributions

Chapter 2. Stable distributions; analytical properties and domains of attraction
1. Stable distributions
2. Canonical representation of stable laws
3. Analytic structure of the densities of stable distributions
4. Asymptotic formulae for the densities $p(x; \alpha, \beta)$
5. Unimodality of stable laws
6. Domains of attraction

Chapter 3. Refinements of the limit theorems for normal convergence
1. Introduction
2. Some auxiliary theorems
3. The deviation $R_n(x)$
4. Necessary and sufficient conditions
5. The maximum deviation of $F_n$ from $\Phi$
6. Dependence of the remainder term on $n$ and $x$

Chapter 4. Local limit theorems
1. Formulation of the problem
2. Local limit theorems for lattice distributions
3. A limit theorem for densities
4. Limit theorems in the $L_1$ metric
5. A refinement of the local limit theorems for the case of normal convergence

Chapter 5. Limit theorems in $L_p$ spaces
1. Statement of the problem
2. Domains of attraction of stable laws in the $L_p$ metric
3. Estimates of $\|F_n - \Phi\|_p$ in the case of normal convergence

Chapter 6. Limit theorems for large deviations
1. Introduction and examples
2. Statement of the problem

Chapter 7. Richter's local theorems and Bernstein's inequality
1. Statement of the theorems
2. A local limit theorem for probability densities
3. Calculation of the integral near a saddle point
4. A local limit theorem for lattice variables
5. Bernstein's inequality

Chapter 8. Cramér's integral theorem and its refinement by Petrov
1. Statement of the theorem
2. The introduction of auxiliary random variables
3. Proof of the theorem

Chapter 9. Monomial zones of local normal attraction
1. Zones of normal attraction
2. The fundamental conditions
3. Fundamental theorems
4. Approximation of the characteristic function by a finite Taylor series
5. Derivation of the basic integral
6. Completion of the proof

Chapter 10. Monomial zones of local attraction to Cramér's system of limiting tails
1. Formulation
2. On the condition (10.1.9)
3. Derivation of the fundamental integral
4. Application of the method of steepest descents
5. Completion of the proof of Theorem 10.1.1

Chapter 11. Narrow zones of normal attraction
1. Classification of narrow zones by the function $h$
2. Statement of the theorems
3. On the conditions imposed upon $h(x)$
4. The necessity of (11.2.2) for Class I
5. The sufficiency of (11.2.2) for Class I
6. Investigation of the fundamental integral
7. More investigation of the fundamental integral
8. Investigation of $K(t)$
9. More investigation of $K(t)$
10. Completion of the proof of Theorem 11.2.1
11. The corresponding integral theorem
12. Calculation of the auxiliary limit distribution
13. More about the auxiliary limit distribution
14. Completion of the proof of Theorem 11.2.2
15. The general case of narrow zones
16. The transition to Theorems 11.2.3-5
17. Choice of $\mu$
18. Completion of the proof

Chapter 12. Wide monomial zones of integral normal attraction
1. Formulation
2. An upper bound for the probability of a large deviation
3. Introduction of auxiliary variables
4. Study of the basic relation
5. Derivation of the fundamental formula
6. The fundamental integral formula
7. Study of the auxiliary integral
8. Expansion of $R$ as a Taylor series
9. Further transformations
10. Completion of the proof of sufficiency
11. Proof of the necessity
12. Completion of the proof

Chapter 13. Monomial zones of integral attraction to Cramér's system of limiting tails
1. Formulation
2. An upper bound for the probability of a large deviation
3. Investigation of the basic formula
4. Completion of the proof

Chapter 14. Integral theorems holding on the whole line
1. Formulation
2. An elementary result on the probability of very large deviations
3. Radial extensions
4. Investigation of the fundamental integral
5. Investigation of the auxiliary integrals
6. An example

Chapter 15. Approximation of distributions of sums of independent components by infinitely divisible distributions
1. Statement of the problem
2. Concentration functions
3. Auxiliary propositions
4. Proof of Theorem 15.1.1

Chapter 16. Some results from the theory of stationary processes
1. Definition and general properties
2. Stationary processes and the associated measure-preserving transformations
3. Hilbert spaces associated with a stationary process
4. Autocovariance and spectral functions of stationary processes
5. The spectral representation of stationary processes
6. The structure of $L_X$ and linear transformations of stationary processes
7. Existence theorems for the spectral density

Chapter 17. Conditions of weak dependence for stationary processes
1. Regularity
2. The strong mixing condition
3. Conditions of weak dependence for Gaussian sequences

Chapter 18. The central limit theorem for stationary processes
1. Statement of the problem
2. The variance of $X_1 + \dots + X_n$
3. The variance of the integral $\int_0^T X(t)\,dt$
4. The central limit theorem for strongly mixing sequences
5. Sufficient conditions for the central limit theorem
6. The central limit theorem for functionals of mixing sequences
7. The central limit theorem in continuous time

Chapter 19. Examples and addenda
1. The central limit theorem for homogeneous Markov chains
2. m-dependent sequences
3. The distribution of values of sums of the form $\sum f(2^k x)$
4. Application to the metric theory of continued fractions
5. Example of a sequence not satisfying the central limit theorem

Chapter 20. Some unsolved problems

Appendix 1. Slowly varying functions
Appendix 2. Theorems on Fourier transforms
Appendix 3. A theorem on convergence of conditional expectations

Notes
Some contributions of recent years, by I. A. Ibragimov and V. V. Petrov
Bibliography
Subject index
PREFACE

It is difficult to indicate in a short title the contents and methods of attack of this book, and we seek therefore to do so in this preface. The problems studied here concern sums of stationary sequences of random variables, including sequences of independent and identically distributed variables. More specifically, we are concerned with the distribution function $F_n(x)$ of the sum $X_1 + X_2 + \dots + X_n$, where $X_1, X_2, \dots$ is a stationary sequence. In the independent case, asymptotic analysis of $F_n(x)$ for large $n$ is highly developed, but in the general case much less is known.

Most of the methods expounded here can be extended, for example, to problems in which the $X_n$ are not identically distributed, but the results are cumbersome and seem less final, and we therefore restrict ourselves to the stationary case. As well as the problem of summation just outlined, we include a discussion of some closely related problems of the analytical structure of stable laws.

The book presupposes a knowledge of the monograph "Limit Distributions of Sums of Independent Random Variables" by B. V. Gnedenko and A. N. Kolmogorov, whose publication in 1949 inspired much of the research we describe.

Chapters 2-5 treat problems about sums of independent, identically distributed random variables not connected with the theory of large deviations, which occupies Chapters 6-14. In Chapter 15 the problem of approximating $F_n(x)$ by infinitely divisible distributions is studied. Chapters 16-19 are devoted to limit theorems for weakly dependent stationary sequences. In Chapter 20 some unsolved problems are formulated.
Chapter 1

PROBABILITY DISTRIBUTIONS ON THE REAL LINE: INFINITELY DIVISIBLE LAWS

This chapter is of an introductory nature, its purpose being to indicate some concepts and results from the theory of probability which are used in later chapters. Most of these are contained in Chapters 1-9 of Gnedenko [47], and will therefore be cited without proof.

The first section is somewhat isolated, and contains a series of results from the foundations of the theory of probability. A detailed account may be found in [76], or in Chapter I of [31]. Some of these will not be needed in the first part of the book, in which attention is confined to independent random variables.

§ 1. Probability spaces, conditional probabilities and expectations

A probability space is a triple $(\Omega, \mathfrak{F}, P)$, where $\Omega$ is a set of elements $\omega$, $\mathfrak{F}$ a $\sigma$-algebra of subsets of $\Omega$ (called events), and $P$ a measure on $\mathfrak{F}$ with $P(\Omega) = 1$. For $E \in \mathfrak{F}$, $P(E)$ is called the probability of the event $E$.

A random variable $X$ is a real-valued measurable function on $(\Omega, \mathfrak{F})$, and the measure $F$ defined on the Borel sets of the real line $R$ by $F(A) = P(X \in A)$ is called the distribution of $X$. Several random variables $X_1, X_2, \dots, X_n$ may be combined in a random vector $X = (X_1, X_2, \dots, X_n)$, and the measure $F(A) = P(X \in A)$ defined on the Borel sets of $R^n$ is the distribution of $X$, or the joint distribution of the variables $X_1, X_2, \dots, X_n$.

More generally, if $T$ is any set of real numbers, a family of random variables $X(t)$, $t \in T$, defined on $(\Omega, \mathfrak{F}, P)$ is called a random process. Conditions for the existence of random processes with prescribed joint distributions are given by Kolmogorov's theorem [76].

A probability space is a special case of a measurable space, and it is therefore possible to construct in it a Lebesgue integral (as, for example, in [105]). If the function $X$ is integrable with respect to $P$, that is, if

$$\int_{\Omega} |X(\omega)|\,P(d\omega) < \infty,$$

then the integral

$$\int_{\Omega} X(\omega)\,P(d\omega) = \int_{\Omega} X\,dP$$

is called the expectation of $X$, and is denoted by the symbol $E(X)$. If $X$ is a random vector with values in $R^n$ and distribution $F$, and $\phi$ is a Borel measurable function from $R^n$ to $R$, then $\phi(X)$ is a random variable, and

$$E\,\phi(X) = \int_{R^n} \phi(x)\,F(dx).$$

Let $\mathfrak{F}_1$ be a $\sigma$-algebra with $\mathfrak{F}_1 \subset \mathfrak{F}$, and let $X$ be a random variable with $E|X| < \infty$. The conditional expectation of $X$ relative to $\mathfrak{F}_1$ is the random variable, denoted by $E(X \mid \mathfrak{F}_1)$, which is measurable with respect to $\mathfrak{F}_1$ and satisfies

$$\int_A E(X \mid \mathfrak{F}_1)\,dP = \int_A X\,dP \tag{1.1.1}$$

for all $A \in \mathfrak{F}_1$. These conditions determine $E(X \mid \mathfrak{F}_1)$ uniquely, except for differences on events of zero probability.

For $A \in \mathfrak{F}$, define a random variable $\chi_A$ by

$$\chi_A(\omega) = \begin{cases} 1 & (\omega \in A), \\ 0 & (\omega \notin A). \end{cases}$$

Then $E(\chi_A \mid \mathfrak{F}_1)$ is called the conditional probability of $A$ relative to $\mathfrak{F}_1$, and is denoted by $P(A \mid \mathfrak{F}_1)$. The random variable $P(A \mid \mathfrak{F}_1)$ is measurable with respect to $\mathfrak{F}_1$, and satisfies

$$\int_B P(A \mid \mathfrak{F}_1)\,dP = P(A \cap B) \tag{1.1.2}$$

for all $B \in \mathfrak{F}_1$.

Let $\{X(t);\ t \in T\}$ be a random process. Then it is natural to consider the minimal $\sigma$-algebra $\mathfrak{M}$ with respect to which each of the variables $X(t)$ is measurable. This is the $\sigma$-algebra generated by the events of the form
$$\{(X(t_1), X(t_2), \dots, X(t_n)) \in A\}$$

for $t_1, t_2, \dots, t_n \in T$ and Borel sets $A$ in $R^n$. For any random variable $Y$, we write

$$E(Y \mid X(t),\ t \in T) = E(Y \mid \mathfrak{M}).$$

We shall state various properties of conditional expectations which will be needed later (cf. [31], Chapter I).

If $Y$ and $Z$ are random variables with $E|Y| < \infty$ and $E|Z| < \infty$, and if $Z$ is measurable with respect to $\mathfrak{F}_1$, then with probability one,

$$E(YZ \mid \mathfrak{F}_1) = Z\,E(Y \mid \mathfrak{F}_1). \tag{1.1.3}$$

If $\sigma$-algebras $\mathfrak{F}_1$, $\mathfrak{F}_2$ satisfy $\mathfrak{F}_1 \subset \mathfrak{F}_2 \subset \mathfrak{F}$, then with probability one,

$$E\{E(Y \mid \mathfrak{F}_2) \mid \mathfrak{F}_1\} = E(Y \mid \mathfrak{F}_1). \tag{1.1.4}$$

§ 2. Distributions and distribution functions

If $X$ is a random variable, its probability distribution is the measure $F(A) = P(X \in A)$ on the Borel subsets of the real line. It is well known that the measure $F$ is uniquely determined by the corresponding distribution function $F$ defined by

$$F(x) = F((-\infty, x)) = P(X < x).$$

In what follows, no distinction will be made between the measure $F$ and the function $F$, and we shall speak, for instance, of a random variable $X$ having distribution $F(x)$.

A probability distribution $F$ is called continuous if the measure $F$ is absolutely continuous with respect to Lebesgue measure, i.e. if

$$F(A) = \int_A p(x)\,dx$$

for some function $p$, which is necessarily given by $p(x) = F'(x)$ outside a set of Lebesgue measure zero. The function $p$ is then called the density of the distribution.

A probability distribution $F$ is said to be discrete if it is concentrated on some countable set $\{x_k\}$. If $p_k = P(X = x_k)$, then

$$F(A) = \sum_{x_k \in A} p_k, \qquad F(x) = \sum_{x_k < x} p_k.$$

In particular, $F$ is called a lattice distribution if $\{x_k\}$ is contained in an arithmetic progression $\{a + kh;\ k = 0, \pm 1, \pm 2, \dots\}$. Such a distribution is a natural generalisation of that of an integer-valued random variable. The maximal value of $h$ for which the distribution is concentrated on an arithmetic progression with step $h$ is called the step of the distribution. Thus for example the random variable $X$ distributed according to the Poisson law

$$P(X = k) = \lambda^k e^{-\lambda}/k! \qquad (k = 0, 1, 2, \dots)$$

for $\lambda > 0$ has step 1, although of course for any integer $n$ it is concentrated on the multiples of $1/n$.

A distribution concentrated on a single point $a$ is said to be degenerate. Its distribution function has the form $E(x - a)$, where

$$E(x) = 0 \quad (x \le 0), \qquad E(x) = 1 \quad (x > 0).$$

As well as continuous and discrete distributions there are the singular distributions, which are concentrated on uncountable sets of Lebesgue measure zero, and have $P(X = x) = 0$ for all $x$. Every distribution $F$ can be represented as

$$F = a_1 F_1 + a_2 F_2 + a_3 F_3, \tag{1.2.1}$$

where $F_1$, $F_2$ and $F_3$ are respectively continuous, singular and discrete distributions, and $a_1, a_2, a_3 \ge 0$, $a_1 + a_2 + a_3 = 1$. The distribution function $F$ has a corresponding decomposition

$$F(x) = a_1 F_1(x) + a_2 F_2(x) + a_3 F_3(x)$$

into continuous, singular and discrete components.

Every distribution function $F$ is non-decreasing, left-continuous, and has

$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1.$$

Conversely, every function satisfying these conditions is a distribution function, since we may take $\Omega = R$, $\mathfrak{F}$ the $\sigma$-algebra of Borel sets, $P$ the Lebesgue-Stieltjes measure determined by

$$P\{[a, b)\} = F(b) - F(a),$$

and $X(\omega) = \omega$.
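The two notions of this section that lend themselves most directly to computation are the left-continuous distribution function $F(x) = P(X < x)$ of a discrete law and the step of a lattice distribution. The following is a minimal sketch, not taken from the text: the function names `lattice_step` and `distribution_function` are hypothetical, the support is assumed to be a finite set of rational points (so that exact arithmetic is possible), and the step is computed as the greatest common divisor of the differences between support points.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def _gcd_fraction(x: Fraction, y: Fraction) -> Fraction:
    """Greatest common divisor of two positive rationals."""
    return Fraction(
        gcd(x.numerator * y.denominator, y.numerator * x.denominator),
        x.denominator * y.denominator,
    )


def lattice_step(points):
    """Largest h such that every point lies in {a + k*h : k an integer}.

    `points` is a finite collection of integers or Fractions carrying
    positive probability (an assumption of this sketch).
    """
    pts = sorted({Fraction(p) for p in points})
    if len(pts) < 2:
        raise ValueError("a degenerate distribution has no step")
    # The step is the gcd of the distances from the smallest support point.
    diffs = [q - pts[0] for q in pts[1:]]
    return reduce(_gcd_fraction, diffs)


def distribution_function(atoms, x):
    """Left-continuous F(x) = P(X < x) for a discrete law {x_k: p_k}."""
    return sum((p for xk, p in atoms.items() if xk < x), Fraction(0))
```

For a fair coin on $\{0, 1\}$, `distribution_function` returns 0 at $x = 0$ and 1/2 at $x = 1$, which reflects the strict inequality in $P(X < x)$; with the convention $P(X \le x)$ the values would be 1/2 and 1.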
Let $X$ and $Y$ be independent random variables with respective distribution functions $F_1$ and $F_2$. The distribution function $F$ of $X + Y$ is given by

$$F(x) = \int_{-\infty}^{\infty} F_1(x - y)\,dF_2(y) = \int_{-\infty}^{\infty} F_2(x - y)\,dF_1(y). \tag{1.2.2}$$

We say that $F$ is the convolution of $F_1$ and $F_2$, and write

$$F = F_1 * F_2.$$

The convolution of $n$ identical distributions will be denoted by

$$F^{n*} = F * F * \dots * F.$$

If one of the distributions $F_1$, $F_2$ is continuous, then so is $F$; in fact if $F_1$ admits a density $p_1$, then $F$ has density

$$p(x) = \int_{-\infty}^{\infty} p_1(x - y)\,dF_2(y).$$

§ 3. Convergence of distributions

We here consider different types of convergence of probability distributions on the real line, following Kolmogorov [82]. The ideas in that paper will also be used in later chapters.

(1) Convergence in variation. Define the distance $\rho_1(F, G)$ between two distributions $F$ and $G$ by

$$\rho_1(F, G) = \sup_A |F(A) - G(A)|, \tag{1.3.1}$$

where the supremum is taken over all Borel sets $A$. A sequence of distributions $F_n$ converges in variation to a distribution $F$ if $\rho_1(F_n, F) \to 0$. It is clear that this mode of convergence can be expressed in terms of distribution functions: $\rho_1(F, G)$ is one-half the total variation of $F(x) - G(x)$. For continuous distributions

$$\rho_1(F, G) = \tfrac{1}{2} \int_{-\infty}^{\infty} |F'(x) - G'(x)|\,dx,$$

while for discrete distributions

$$\rho_1(F, G) = \tfrac{1}{2} \sum_x |F(\{x\}) - G(\{x\})|,$$

the summand being zero except at a countable number of values of $x$.

(2) Strong convergence. Suppose that in (1.3.1) we take the supremum, not over all Borel sets $A$, but only over intervals $A$. This gives a new distance

$$\rho_2'(F, G) = \sup_A |F(A) - G(A)|.$$

Equivalently, the distance

$$\rho_2(F, G) = \sup_{-\infty < x < \infty} |F(x) - G(x)| \tag{1.3.2}$$

defines the same mode of convergence, since it is easy to see that

$$\rho_2(F, G) \le \rho_2'(F, G) \le 2\rho_2(F, G).$$

Convergence in either of these metrics is called strong convergence.

(3) Weak convergence. A sequence of distributions $F_n$ is said to converge weakly to a distribution $F$ if

$$F_n(A) \to F(A) \tag{1.3.3}$$

for every Borel set $A$ whose boundary has $F$-probability zero. This is equivalent to the requirement that the corresponding distribution functions $F_n(x)$ converge to $F(x)$ at every point of continuity $x$ of $F$. Weak convergence will be denoted by the symbol $F_n \Rightarrow F$; it is equivalent to convergence in the Lévy metric*

$$L(F, G) = \inf\{h;\ F(x - h) - h \le G(x) \le F(x + h) + h \ \text{for all } x\}.$$

* See [48], page 38. Every distribution $F$ generates a linear functional $(F, f) = \int_{-\infty}^{\infty} f(x)\,dF(x)$ on the space $C$ of continuous functions with limits at $\pm\infty$. Weak convergence of distributions is equivalent to weak convergence of the corresponding functionals, i.e. $F_n \Rightarrow F$ if and only if $(F_n, f) \to (F, f)$ for all $f \in C$.

Weak convergence has the advantage that it takes into account the error which is inherent in the measurement of a random variable. For example, for any positive number $\sigma$, denote by $F^\sigma$ the distribution of $Y = X + \xi$, where $X$ has the distribution $F$, and $\xi$, independent of $X$, has a normal
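distribution with mean zero and variance $\sigma^2$. Define a type of convergence by saying that $F_n \to F$ if, for all $\sigma > 0$,

$$\rho_1(F_n^\sigma, F^\sigma) \to 0. \tag{1.3.4}$$

Theorem 1.3.1. Convergence as defined by (1.3.4) is equivalent to weak convergence.

Proof. Denote by

$$\phi^\sigma(x) = (2\pi\sigma^2)^{-1/2} \exp(-x^2/2\sigma^2)$$

the density of the distribution of $\xi$. Then for any distribution $G$, $G^\sigma$ is a continuous distribution with density

$$g^\sigma(x) = \int_{-\infty}^{\infty} \phi^\sigma(x - y)\,dG(y).$$

Therefore

$$\rho_1(F_n^\sigma, F^\sigma) = \tfrac{1}{2} \int_{-\infty}^{\infty} dx \left| \int_{-\infty}^{\infty} \phi^\sigma(x - y)\,d\{F_n(y) - F(y)\} \right|.$$

Suppose first that $F_n \Rightarrow F$, and fix a positive number $A$. Splitting the inner integral at $\pm A$ and integrating by parts over $[-A, A]$,

$$\begin{aligned}
2\rho_1(F_n^\sigma, F^\sigma) \le{} & \int_{-\infty}^{\infty} dx \left| \int_{-\infty}^{-A} \phi^\sigma(x - y)\,d\{F_n(y) - F(y)\} \right| + \int_{-\infty}^{\infty} dx \left| \int_{A}^{\infty} \phi^\sigma(x - y)\,d\{F_n(y) - F(y)\} \right| \\
& + |F_n(A) - F(A)| \int_{-\infty}^{\infty} \phi^\sigma(x - A)\,dx + |F_n(-A) - F(-A)| \int_{-\infty}^{\infty} \phi^\sigma(x + A)\,dx \\
& + \int_{-A}^{A} |F_n(y) - F(y)|\,dy \int_{-\infty}^{\infty} \left| \frac{\partial}{\partial x} \phi^\sigma(x - y) \right| dx.
\end{aligned} \tag{1.3.5}$$

One may assume that $A$ and $-A$ are taken to be points of continuity of $F$. Then, by the dominated convergence theorem, the last three terms in (1.3.5) tend to zero as $n \to \infty$. By choosing $A$ sufficiently large, the remaining terms may be made arbitrarily small. Hence $\rho_1(F_n^\sigma, F^\sigma) \to 0$ for all $\sigma$.

Conversely, suppose that this holds. Then

$$\rho_2(F_n^\sigma, F^\sigma) = \sup_x \left| \int_{-\infty}^{\infty} \{F_n(y) - F(y)\}\,\phi^\sigma(x - y)\,dy \right| \to 0. \tag{1.3.6}$$

Suppose if possible that $F_n \not\Rightarrow F$. Then there exists $x_0$, a point of continuity of $F$, and $\delta > 0$ such that

$$|F_n(x_0) - F(x_0)| > \delta$$

for infinitely many $n$. There is no loss of generality in taking $x_0 = 0$. If then for instance $F_n(0) > F(0) + \delta$, there is an interval $[0, a]$ in which $F(x) < F(0) + \tfrac{1}{2}\delta$, and

$$F_n(x) - F(x) \ge F_n(0) - F(x) > \delta - \{F(x) - F(0)\} > \tfrac{1}{2}\delta.$$

But then

$$\begin{aligned}
\limsup_{\sigma \to 0} \int_{-\infty}^{\infty} \{F_n(y) - F(y)\}\,\phi^\sigma(\tfrac{1}{2}a - y)\,dy
&= \limsup_{\sigma \to 0} \int_{0}^{a} \{F_n(y) - F(y)\}\,\phi^\sigma(\tfrac{1}{2}a - y)\,dy \\
&\ge \tfrac{1}{2}\delta \limsup_{\sigma \to 0} \int_{0}^{a} \phi^\sigma(\tfrac{1}{2}a - y)\,dy = \tfrac{1}{2}\delta.
\end{aligned}$$

Hence there is a value of $\sigma$ for which

$$\rho_2(F_n^\sigma, F^\sigma) > \tfrac{1}{4}\delta, \tag{1.3.7}$$

and a similar conclusion holds in the other case $F_n(0) < F(0) - \delta$. Thus (1.3.7) holds for infinitely many $n$, which contradicts (1.3.6), showing that the supposition $F_n \not\Rightarrow F$ must be false. •

§ 4. Moments and characteristic functions

The moments $\alpha_\nu$ and absolute moments $\beta_\nu$ of a random variable $X$ with distribution $F$ are defined respectively by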
$$\alpha_\nu = E X^\nu = \int_{-\infty}^{\infty} x^\nu\,dF(x), \qquad \beta_\nu = E|X|^\nu = \int_{-\infty}^{\infty} |x|^\nu\,dF(x),$$

so long as these expectations exist. The $\beta_\nu$ satisfy the inequalities

$$\beta_s^{1/s} \le \beta_r^{1/r} \qquad (r > s > 0).$$

The characteristic function $f(t)$ of $X$ is defined by

$$f(t) = E e^{itX} = \int_{-\infty}^{\infty} e^{itx}\,dF(x), \tag{1.4.1}$$

and its connection with the moments is contained in the following assertion. If a random variable $X$ has finite absolute moment $\beta_k$ (where $k$ is a positive integer), then $f$ has derivatives up to order $k$, and

$$f^{(s)}(0) = i^s \alpha_s$$

for $s = 0, 1, 2, \dots, k$. As $t \to 0$,

$$f(t) = \sum_{s=0}^{k} \frac{\alpha_s}{s!} (it)^s + o(t^k).$$

It is a most important fact that addition of independent random variables corresponds to multiplication of characteristic functions. If the independent variables $X_i$ have respective characteristic functions $f_i(t)$, then the characteristic function of $X_1 + X_2 + \dots + X_n$ is

$$f(t) = f_1(t) f_2(t) \dots f_n(t).$$

From (1.4.1) the characteristic function is uniquely determined by the distribution function. The converse is also true, and is expressed by the relation

$$F(x) = \frac{1}{2\pi} \lim_{y \to -\infty} \lim_{c \to \infty} \int_{-c}^{c} \frac{e^{-ity} - e^{-itx}}{it}\,f(t)\,dt, \tag{1.4.2}$$
which holds at all points of continuity of $F$. Thus there is a one-to-one correspondence between distribution functions and characteristic functions.

If the distribution $F$ has a density $p$, then $f$ is just the Fourier transform of $p$, and by the Riemann-Lebesgue theorem,

$$\lim_{|t| \to \infty} f(t) = 0.$$

Consequently, if $F$ has a non-zero absolutely continuous component,

$$\limsup_{|t| \to \infty} |f(t)| < 1.$$

On the other hand, if $F$ is discrete, $f$ is almost periodic, and

$$\limsup_{|t| \to \infty} |f(t)| = 1.$$

Suppose that $X$ takes only the values $a + kh$ $(k = 0, \pm 1, \pm 2, \dots)$, and write $p_k = P(X = a + kh)$. Then the characteristic function of $X$ is

$$f(t) = \sum_k p_k e^{it(a + kh)} = e^{iat} \sum_k p_k e^{itkh},$$

and consequently $|f|$ is periodic with period $2\pi/h$.

Theorem 1.4.1. In order that a random variable $X$ have a lattice distribution, it is necessary and sufficient that $|f(t_0)| = 1$ for some $t_0 \ne 0$.

Proof. If $X$ has a lattice distribution with step $h$, then

$$|f(2\pi/h)| = \left| e^{2\pi i a/h} \sum_k p_k e^{2\pi i k} \right| = 1.$$

Conversely, suppose that for $t_0 \ne 0$, $|f(t_0)| = 1$. Then for some real $a$,

$$f(t_0) = \int_{-\infty}^{\infty} e^{it_0 x}\,dF(x) = e^{it_0 a},$$

and therefore

$$\int_{-\infty}^{\infty} \cos t_0(x - a)\,dF(x) = 1.$$

This is only possible if $F$ concentrates all probability on the points $x$ with

$$\cos t_0(x - a) = 1,$$

i.e. the points $a + kh$, where $h = 2\pi/t_0$. •
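The criterion of Theorem 1.4.1 is easy to observe numerically. The sketch below is an illustration only: the helper `lattice_cf` and the particular values of the origin $a$, the step $h$ and the weights $p_k$ are hypothetical choices, not taken from the text. It evaluates $f(t) = \sum_k p_k e^{it(a + kh)}$ directly and compares the modulus at $t_0 = 2\pi/h$ with the modulus at a point strictly between $0$ and $t_0$.

```python
import cmath
import math


def lattice_cf(t, a, h, probs):
    """f(t) = sum_k p_k exp(i t (a + k h)), with k = 0, 1, ..., len(probs) - 1."""
    return sum(p * cmath.exp(1j * t * (a + k * h)) for k, p in enumerate(probs))


a, h = 0.3, 0.5            # hypothetical origin and step of the lattice
probs = [0.2, 0.5, 0.3]    # hypothetical weights p_0, p_1, p_2 (sum to 1)
t0 = 2 * math.pi / h

# At t0 every term has the same phase exp(i t0 a), so the modulus is 1
# up to rounding error.
modulus_at_t0 = abs(lattice_cf(t0, a, h, probs))

# Halfway to t0 the terms alternate in sign and partly cancel.
modulus_inside = abs(lattice_cf(0.5 * t0, a, h, probs))
```

With these weights the terms at $t_0/2$ cancel exactly ($0.2 - 0.5 + 0.3 = 0$), so the modulus there is zero up to rounding; other weights would give some value strictly below 1.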
Theorem 1.4.2. If the step of the lattice distribution is $h$, then $|f(2\pi/h)| = 1$ and $|f(t)| < 1$ for $0 < |t| < 2\pi/h$.

Proof. Suppose that $0 < |t_0| < 2\pi/h$ and $|f(t_0)| = 1$. Then the distribution is concentrated on an arithmetic progression with step $2\pi/|t_0| > h$, which contradicts the definition of $h$. •

§ 5. Continuity of the correspondence between distributions and characteristic functions

The correspondence between probability distributions on the real line and their characteristic functions is not only one-to-one, but also continuous in the following sense.

Theorem 1.5.1. A sequence $(F_n)$ of distributions converges weakly to a distribution $F$ if and only if the corresponding sequence $(f_n)$ of characteristic functions converges uniformly in every bounded interval to the characteristic function $f$ of $F$.

For the proof of this theorem, see for example [47].

In the sequel we shall need various refinements of this theorem permitting us, from the proximity of their characteristic functions, to estimate proximity of distributions in the sense of different metrics. It is convenient to state these somewhat more generally for functions $G$ of bounded variation. The Fourier-Stieltjes transform

$$g(t) = \int_{-\infty}^{\infty} e^{itx}\,dG(x)$$

will be called the characteristic function of $G$.

Theorem 1.5.2. Let $A$, $T$, $\varepsilon$ be positive constants, $F$ a non-decreasing function, $G$ a function of bounded variation, and $f$ and $g$ their characteristic functions. If

(1) $F(-\infty) = G(-\infty)$, $F(\infty) = G(\infty)$,

(2) $G'(x)$ exists for all $x$ and $|G'(x)| \le A$,

(3) $\displaystyle\int_{-T}^{T} \left| \frac{f(t) - g(t)}{t} \right| dt = \varepsilon$,

then for each $k > 1$, there exists a number $c(k)$ depending only on $k$ with the property that, for all $x$,

$$|F(x) - G(x)| \le k \frac{\varepsilon}{2\pi} + c(k) \frac{A}{T}. \tag{1.5.1}$$

Moreover, $c(2) \le 24/\pi$.

The proof, due to Esseen [33], may be found in [48] (with the unnecessary restriction that $\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx < \infty$), or in [105] (which contains the estimate for $c(2)$).

Theorem 1.5.3. Let $F$ be a non-decreasing purely discontinuous function (i.e. of the form $F = aF_1 + b$ where $F_1$ is a discrete distribution function), $G$ a function of bounded variation, and $f$, $g$ their characteristic functions. Suppose that

(1) $F(-\infty) = G(-\infty)$, $F(\infty) = G(\infty)$,

(2) the discontinuities of $F$ and $G$ are confined to a set $\{\dots, x_{-1}, x_0, x_1, \dots\}$ with $x_{\nu+1} - x_\nu \ge l$ for all $\nu$,

(3) for all $x$ outside this set, $G'(x)$ exists and $|G'(x)| \le A$,

(4) $\displaystyle\int_{-T}^{T} \left| \frac{f(t) - g(t)}{t} \right| dt = \varepsilon$.

Then, for $k > 1$, there exist constants $c_1(k)$ and $c_2(k)$ such that

$$|F(x) - G(x)| \le k \frac{\varepsilon}{2\pi} + c_1(k) \frac{A}{T} \tag{1.5.2}$$

whenever $Tl \ge c_2(k)$.

For proof, see [19] (page 214).

Theorem 1.5.4. Let $T$, $\delta$, $\varepsilon$ be constants, $F$ and $G$ functions of bounded variation, $f$ and $g$ their characteristic functions. If

(1) $F(-\infty) = G(-\infty)$, $F(\infty) = G(\infty)$, $\displaystyle\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx < \infty$,

(2) r -r f{t)-9(t)
1.5. DISTRIBUTIONS AND CHARACTERISTIC FUNCTIONS 29 -T dt then 00 c A 4 |F(x)-G(x)|dx<-(VarG + VarF)+- + -) 00 V 7 A.5.3) w/iere c is an absolute constant. (It is possible to show that c^47r.) Proof. Denote by V the class of complex functions A (x) with bounded variation and by K the class of Fourier-Stieltjes transforms a(t)= f°° QitxdA{x) J - oo of functions in V. It is clear that INI = V(A) is well-defined, and that Lemma 1.5.1. Suppose that a(t) is absolutely continuous, and that both a(t) and a'(t) belong to L2( —oo, oo). Then aeV, and a'(t)\2dtV . A.5.4) J Proof. Use Plancherel's theorem (Appendix 2) to compare a and its Fourier transform a(x). Then d f00 „. eitx— 1 and
30 PROBABILITY DISTRIBUTIONS ON THE REAL LINE Chap. 1 \a{t)\2dt= \a{x)\2dx. A.5.5) GO •' — 00 If we can prove that 5eL(— oo, oo), then a will belong to V, since then and moreover Qitxa{x)dx, \a(x)\dx. A.5.6) But the functions a and a' belong to L2(— oo, oo), and so by Theorem A2.2 (Appendix 2), xa(x)eL2( — oo, oo) and \a'{t)\2dt = From A.5.5), A.5.6) and A.5.7), we have A.5.7) = T a'(t)\2dt Proof of theorem 1.5.4. Integrate by parts in the equation (-00 f(t)-g(t)= etod{F(x)-G(x)}, J — oo to obtain n(t\ "Y°° eif*{F(x)-G(x)}dx, whence |F(x)-G(x)|dx = f(t)-g(t) -it
1.5. DISTRIBUTIONS AND CHARACTERISTIC FUNCTIONS 31 Now introduce the function k defined by k(t) = O if |*|>7\ = 2(t+T)/T if -T^t<$T, = 1 if = 2{T~t)/T if ± Then |/c@l^l, \k'(t)\^2/T, A.5.8) and it is easy to check that keV and that II*IIO. A.5.9) Writing = h(t)k(t) we have TOO \F(x)~G(x)\dx^\\hk\\ + \\h(l-k)\\. A.5.10) J - oo To estimate \\hk\\ we use the lemma, together with A.5.8), to give {roo rco ) \h(t)k(t)\2dt+ \h(t)k'(t) + h'(t)k(t)\2dt\ J J \h{t)\2dt + 2 J \h'{t)\2dt + ^ \h{t)\2dt\ T 3 2)M. A.5.11) To estimate ||/i(l -k)\\ we use the fact that 1 -k(t)=O for \t\ ^jT, so that for any function ueV with the property that u(t)=l/-it for \t\>\T. Then by virtue of A.5.9),
$$\|h(1 - k)\| \le \|f - g\|\,\|u\|\,\|1 - k\| \le \{\|f\| + \|g\|\}\,\|u\|\,\{1 + \|k\|\} \le 4(\operatorname{Var} F + \operatorname{Var} G)\,\|u\|. \tag{1.5.12}$$

Taking in particular

$$u(t) = 4t/iT^2 \quad \text{for } |t| < \tfrac{1}{2}T, \qquad u(t) = 1/it \quad \text{for } |t| \ge \tfrac{1}{2}T,$$

we have

$$\|u\| \le \int_{-\infty}^{\infty} \left| \int_{0}^{\infty} \sin(tx)\,u(t)\,dt \right| dx \le c/T, \tag{1.5.13}$$

where $c$ is an absolute constant. Combining (1.5.11), (1.5.12), (1.5.13) proves the theorem. (It is shown in [165] that the smallest possible value for $\|u\|$ is $\pi/T$.) •

§ 6. A special theorem about characteristic functions

The following theorem will be needed later.

Theorem 1.6.1. Let $f(t)$ be any characteristic function, and

$$v(t) = \exp(iat - \tfrac{1}{2}\sigma^2 t^2)$$

the characteristic function of the normal distribution with mean $a$ and variance $\sigma^2 \ge 0$. Let $(t_k)$ be a sequence of points with $t_k \ne 0$, $\lim t_k = 0$. If for all $k$, $f(t_k) = v(t_k)$, then $f(t) = v(t)$ for all $t$.

Proof. Denote by $F$ the distribution function corresponding to $f$. Then there are two cases.

(1) $\sigma^2 = 0$. Then

$$f(t_k) = e^{iat_k},$$

so that

$$\int_{-\infty}^{\infty} \{1 - \cos t_k(x - a)\}\,dF(x) = 0.$$

This is possible only if $F(x) = E(x - a)$.

(2) $\sigma^2 > 0$. We shall need the theorem only for real characteristic functions (corresponding to symmetric distributions) and the proof will therefore be restricted to this case. Clearly then $a = 0$ and we may for simplicity take $\sigma = 1$. We show that $f$ has derivatives of all orders, and that

$$f^{(2r)}(0) = v^{(2r)}(0) \tag{1.6.1}$$

for all $r$. (Derivatives of odd order all vanish at 0, by symmetry.) The proof proceeds by induction.

To establish (1.6.1) when $r = 1$, note that

$$1 - f(t_k) = 2 \int_{-\infty}^{\infty} \sin^2(\tfrac{1}{2} t_k x)\,dF(x) = 1 - v(t_k) = O(t_k^2). \tag{1.6.2}$$

Consequently, the integrals

$$\int_{-A}^{A} \left( \frac{\sin \tfrac{1}{2} t_k x}{\tfrac{1}{2} t_k} \right)^2 dF(x)$$

are bounded uniformly in $A$, $k$. Letting $t_k \to 0$, it follows that

$$\int_{-A}^{A} x^2\,dF(x)$$

is bounded in $A$. Letting $A \to \infty$, we have

$$\int_{-\infty}^{\infty} x^2\,dF(x) < \infty, \tag{1.6.3}$$

from which it follows that $f$ is twice differentiable, whence of course $f'(0) = 0$. Dividing (1.6.2) by $t_k^2$, and letting $t_k \to 0$, we have

$$f''(0) = -\int_{-\infty}^{\infty} x^2\,dF(x) = -1 = v''(0).$$

Now suppose that, for all $s < r$, $f^{(2s)}(0)$ exists and $f^{(2s)}(0) = v^{(2s)}(0)$. By Rolle's theorem there is a sequence $(\tau_k)$ with $\tau_k \ne 0$, $\tau_k \to 0$ and

$$f^{(2r-2)}(\tau_k) = v^{(2r-2)}(\tau_k).$$

Then

$$2 \int_{-\infty}^{\infty} x^{2(r-1)} \sin^2(\tfrac{1}{2} \tau_k x)\,dF(x) = (-1)^{r-1} \{v^{(2r-2)}(0) - v^{(2r-2)}(\tau_k)\} = O(\tau_k^2).$$

Arguing as before,

$$\int_{-\infty}^{\infty} x^{2r}\,dF(x) < \infty$$

and $f^{(2r)}$ exists with $f^{(2r)}(0) = v^{(2r)}(0)$.
34 PROBABILITY DISTRIBUTIONS ON THE REAL LINE Chap. 1 Now v{t) is an entire function of t, and |/Br)@l ^ l/Br)@)l = |uBr)@)|. Hence f(t) is also an entire function, and its derivatives at the origin agree with those of v(t). Therefore /= v. • § 7. Infinitely divisible distributions A distribution F is said to be infinitely divisible if, for each n, there exists a distribution Fn with Thus a random variable X with an infinitely divisible distribution can be expressed, for every n, in the form X = Xln + X2n + ¦¦¦ + Xnn, where the Xjn (j= 1, 2, ..., n) are independent and identically distributed. Theorem 1.7.1. In order that the function f(t) be the characteristic func- function of an infinitely divisible distribution it is necessary and sufficient that eiut-l - ), A.7.1) where cr^O, — oo <y < oo, and M and N are non-decreasing functions with M(-oo) = iV(oo)=0 and 0 re re u2dM{u) + u2dN{u)<oo JO for all e>0. The representation A.7.1) is unique. The proof may be found in [48] (page 83) or in [47] (Chapter 9). Equation A.7.1) is called Levy's formula. Simple examples of infinitely divisible distributions are the normal and the Poisson distributions, but we shall need also a generalised form of the latter. The distribution F is called a compound Poisson distribution if it can be represented in the form
1.7. INFINITELY DIVISIBLE DISTRIBUTIONS 35 fc=0 where G is a distribution function, and p>0. The characteristic functions of F and G are related by the equation = exp||00 (J»-l)d{pG{u)}\, where the last expression is clearly a special case of A.7.1). Interest in the class of infinitely divisible laws is motivated by Khinchin's theorem A.7.2), which shows that only infinitely divisible distributions can arise as limits of distributions of sums of independent random variables. Consider, for each n, a collection of independent random vari- variables, The Xnk are said to be uniformly asymptotically negligible if lim supP(|Xnfc|^e) = 0 n—> oo k for all ?>0. Theorem 1.7.2. In order that the distribution F should be, for an appro- appropriate choice of constants An, the weak limit of the distributions of Zn = Xni + Xn2 + ...+Xnkn-An A.7.2) as n->oo, where the Xnk are uniformly asymptotically negligible, it is necessary and sufficient that F be infinitely divisible. Conditions for convergence to a particular F can be expressed in the following way. Theorem 1.7.3. In order that, for an appropriate choice of the An, the distributions of A-7.2) should converge to F, it is necessary and sufficient that
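(1)
$$\sum_{k=1}^{k_n} F_{nk}(x) \to M(x) \quad (x < 0), \qquad \sum_{k=1}^{k_n}\{F_{nk}(x) - 1\} \to N(x) \quad (x > 0),$$
at every point of continuity of $M$, $N$, and

(2)
$$\lim_{\varepsilon\to0}\limsup_{n\to\infty}\sum_{k=1}^{k_n}\Bigl[\int_{|x|<\varepsilon}x^2\,dF_{nk}(x) - \Bigl(\int_{|x|<\varepsilon}x\,dF_{nk}(x)\Bigr)^2\Bigr] = \lim_{\varepsilon\to0}\liminf_{n\to\infty}\sum_{k=1}^{k_n}\Bigl[\int_{|x|<\varepsilon}x^2\,dF_{nk}(x) - \Bigl(\int_{|x|<\varepsilon}x\,dF_{nk}(x)\Bigr)^2\Bigr] = \sigma^2,$$

where $M$, $N$ and $\sigma^2$ are as in the Lévy formula (1.7.1) for $F$, and $F_{nk}$ is the distribution of $X_{nk}$.

For the proofs, see [48]; particular cases may be found in Chapter 9 of [47].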
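The compound Poisson law of § 7 lends itself to a quick numerical check. The sketch below is illustrative only: the jump law $G$ (mass 0.3 at the point 1 and mass 0.7 at the point −2), the rate $p$ and all function names are assumptions made for the example and do not come from the text. It compares the truncated series $e^{-p}\sum_k p^k g(t)^k/k!$ with the closed form $\exp\{p(g(t)-1)\}$, and the closed form can also be used to confirm that $f$ is the $n$-th power of a compound Poisson characteristic function with rate $p/n$, which is infinite divisibility expressed through characteristic functions.

```python
import cmath
import math


def g(t):
    """Characteristic function of an assumed jump law G:
    mass 0.3 at the point 1 and mass 0.7 at the point -2."""
    return 0.3 * cmath.exp(1j * t) + 0.7 * cmath.exp(-2j * t)


def cf_series(t, p, terms=80):
    """Truncated series e^{-p} * sum_{k < terms} p^k g(t)^k / k!."""
    gt = g(t)
    total = 0j
    term = 1 + 0j                  # k = 0 term: p^0 g^0 / 0!
    for k in range(terms):
        total += term
        term *= p * gt / (k + 1)   # step from the k-th term to the (k+1)-th
    return math.exp(-p) * total


def cf_closed(t, p):
    """Closed form exp{p (g(t) - 1)} of the compound Poisson characteristic function."""
    return cmath.exp(p * (g(t) - 1))
```

For instance, `cf_closed(t, p / 5) ** 5` should agree with `cf_closed(t, p)` up to rounding error for any real `t`.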
Chapter 2

STABLE DISTRIBUTIONS; ANALYTICAL PROPERTIES AND DOMAINS OF ATTRACTION

§ 1. Stable distributions

Definition. A distribution function $F$ is called stable if, for any $a_1, a_2 > 0$ and any $b_1, b_2$, there exist constants $a > 0$ and $b$ such that

$$F(a_1x + b_1) * F(a_2x + b_2) = F(ax + b). \tag{2.1.1}$$

It clearly suffices to take $b_1 = b_2 = 0$. Then in terms of the characteristic function $f$ of $F$, (2.1.1) becomes

$$f(t/a_1)\,f(t/a_2) = f(t/a)\,e^{-ibt}. \tag{2.1.2}$$

Interest in the stable distributions is motivated by the fact that, under weak assumptions, they are the only possible limiting distributions of normed sums

$$Z_n = \frac{X_1 + X_2 + \dots + X_n}{B_n} - A_n \tag{2.1.3}$$

of stationarily dependent random variables. In this section we establish this result for independent random variables; the general case is dealt with in Theorem 18.1.1.

Theorem 2.1.1. In order that a distribution function $F$ be the weak limit of the distribution of $Z_n$ for some sequence $(X_i)$ of independent identically distributed random variables, it is necessary and sufficient that $F$ be stable. If this is so, then unless $F$ is degenerate, the constants $B_n$ in (2.1.3) must take the form $B_n = n^{1/\alpha}h(n)$, where $0 < \alpha \le 2$ and $h(n)$ is a slowly varying function in the sense of Karamata.
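The stability relation can be made concrete for the symmetric laws with characteristic function $\exp(-|t|^{\alpha})$, which is the case $\beta = 0$, $\gamma = 0$, $c = 1$ of Theorem 2.2.2 and is taken on trust here. A minimal Python sketch follows; the function names and the sample values of $a_1$, $a_2$ are assumptions chosen for illustration. For these laws the product $f(t/a_1)f(t/a_2)$ equals $f(t/a)$ with $b = 0$ and $a^{-\alpha} = a_1^{-\alpha} + a_2^{-\alpha}$.

```python
import math


def f(t, alpha):
    """Symmetric stable characteristic function exp(-|t|^alpha), 0 < alpha <= 2."""
    return math.exp(-abs(t) ** alpha)


def combined_scale(a1, a2, alpha):
    """The constant a > 0 determined by a^(-alpha) = a1^(-alpha) + a2^(-alpha)."""
    return (a1 ** (-alpha) + a2 ** (-alpha)) ** (-1.0 / alpha)


def stability_gap(t, a1, a2, alpha):
    """|f(t/a1) f(t/a2) - f(t/a)|; zero up to rounding when the law is stable (b = 0)."""
    a = combined_scale(a1, a2, alpha)
    return abs(f(t / a1, alpha) * f(t / a2, alpha) - f(t / a, alpha))
```

With $\alpha = 2$ this is the familiar rule that variances of independent normal variables add; with $\alpha = 1$ it is the corresponding rule for the scale parameters of Cauchy variables.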
38 STABLE DISTRIBUTIONS Chap. 2 Proof. Let/be the common characteristic function of the Xh and let 4> be the characteristic function corresponding to the distribution F. Since a degenerate distribution is trivially stable, we exclude this case, and prove that necessarily n = co, lim Bn+1/Bn= 1 . B.1.4) Suppose that the first condition in B.1.4) does not hold, so that there is a subsequence (Bnk) with limit B^oo. Then so that, for all t, This is possible only if \f(t) \ = 1 for all t, which implies that F is degenerate. Thus the first part of B.1.4) is proved, so that lim \f(t/Bn+i)\ = l. Thus and \f(t/Bn+1)\"+i = Substituting Bnt/Bn+i for t in the former, and then Bn+ y t/Bn for t in the latter, we deduce that, as n->oo, lim 'n+ 1 = lim 'ifc')/ = 1 . B.1.5) If Bn+ !/?„-/> l,we can find a subsequence of either (Bn+ ^/B,) or (Bn/Bn+1) converging to some B< 1. Going to the limit in B.1.5) we arrive at the equation (j){t) = (j){Bt), from which which is again impossible unless F is degenerate. Thus B.1.4) is proved. Now let 0<a1 <a2 and by, b2 be constants. Because of B.1.4) we can choose a sequence (m(n)) such that, as n->oo,
2.2. CANONICAL REPRESENTATION OF STABLE LAWS 39 Consider the sum ) " T B where -A, B.1.6) = BJa lt From the assumption of the thjeorem, the distribution functions of the two components of the left-hand side of B.1.6) converge respectively to F(a1x + b1) and F(a2x + b2), while that of the right-hand side converges to F(ax + b). Consequently F(a1x + bi)*F(a2x + b2) = so that F is stable. Conversely, let F be a stable distribution. For every n, the sum X1 + X2 + ... + Xn of independent random variables with distribution F has distri- distribution function of the form F(anx + bn), so that has distribution function F. The proof of the final assertion is deferred to §2. . In the next section we indicate the rather simple form of the characteristic functions of stable laws. The bulk of the chapter is devoted to the investi- investigation of the analytical properties of the corresponding densities, which are by no means obvious from the characteristic functions. Finally in § 6 conditions on the distribution of the Xt are given which ensure conver- convergence of the distribution of the normed sums B.1.3) to a given stable distri- distribution. § 2. Canonical representation of stable laws Theorem 2.2.1. In order that a distribution F be stable, it is necessary and sufficient that F be infinitely divisible, with Levy representation either
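$$\log f(t) = i\gamma t + \int_{-\infty}^{0}\Bigl(e^{itu} - 1 - \frac{itu}{1+u^2}\Bigr)\,dM(u) + \int_{0}^{\infty}\Bigl(e^{itu} - 1 - \frac{itu}{1+u^2}\Bigr)\,dN(u), \tag{2.2.1}$$

with

$$M(u) = c_1(-u)^{-\alpha}, \quad N(u) = -c_2u^{-\alpha}, \quad 0 < \alpha < 2, \quad c_1 \ge 0, \quad c_2 \ge 0, \quad c_1 + c_2 > 0,$$

or

$$\log f(t) = i\gamma t - \tfrac12\sigma^2t^2. \tag{2.2.2}$$

Proof. The infinite divisibility of $F$ follows from the results of the last section, together with Theorem 1.7.2. Consequently $\log f(t)$ has the Lévy representation (1.7.1). Equation (2.1.2) gives

$$\log f(t/a) = \log f(t/a_1) + \log f(t/a_2) + ibt. \tag{2.2.4*}$$

Compare this with (1.7.1). Replacing $t$ by $t/a$ in (1.7.1) and changing the variable of integration from $u$ to $au$ replaces $\sigma^2$ by $\sigma^2a^{-2}$, $M(u)$ by $M(au)$ and $N(u)$ by $N(au)$, and otherwise alters only the term linear in $t$. The uniqueness of the Lévy representation therefore implies that

$$\sigma^2(a^{-2} - a_1^{-2} - a_2^{-2}) = 0, \tag{2.2.5}$$

$$M(au) = M(a_1u) + M(a_2u), \quad (u < 0), \tag{2.2.6}$$

$$N(au) = N(a_1u) + N(a_2u), \quad (u > 0). \tag{2.2.7}$$

* Equation (2.2.3) in the original is identical to (1.7.1).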
2.2. CANONICAL REPRESENTATION OF STABLE LAWS 41 Suppose that M is not identically zero, and write m(x) = M(e~x), ( —oo<x<oo). From B.2.6) it follows that, for any Xx, X2, there exists 1 = 1A1, X2) such that, for all x, Thus more generally, for any Xx, X2, ..., Xn, there exists X such that m(x + X) = m(x + XJ + ... +m(x + Xn). B.2.8) Setting Xi =... =Xn = 0, there exists X = X(n) such that m(x + X) = nm(x). B.2.9) If p/q is any positive rational in its lowest terms, define X(p/q) = X(p)-X(q); then B.2.9) implies that -m(x) = pm{x-X(q)} = m{x + X(p)-X(q)} = H = m{x + X(p/q)}. Thus, for any rational r > 0, m{x + X{r)} = rm{x). B.2.10) Since M is non-decreasing, m is non-increasing, and so therefore is the function X defined on the positive rationals. Consequently, X has right and left limits X (s - 0) and X (s + 0) at all s > 0. From B.2.10) these are equal, and X(s) is defined as a non-increasing continuous function on s>0, satisfying m{x + X{s)} = sm{x). B.2.11) Moreover, it follows from this equation that lim X(s) = oo , lim X(s) = — oo . s->0 Since m is not identically zero, we may assume that m@)^0 (otherwise shift the origin), and write m1(x)=m(x)/m@). Let xltx2 be arbitrary, and choose s1? s2 so that
$$\lambda(s_1) = x_1, \qquad \lambda(s_2) = x_2.$$

Then

$$s_1m(0) = m(x_1), \quad s_2m(0) = m(x_2), \quad s_2m(x_1) = m(x_1 + x_2),$$

so that

$$m_1(x_1 + x_2) = m_1(x_1)\,m_1(x_2). \tag{2.2.12}$$

Since $m_1$ is non-negative, non-increasing and not identically zero, (2.2.12) shows that $m_1 > 0$, and then $m_2 = \log m_1$ is monotonic and satisfies

$$m_2(x_1 + x_2) = m_2(x_1) + m_2(x_2). \tag{2.2.13}$$

It is known (see for example [50], page 106) that the only monotonic functions satisfying this equation are of the form $m_2(x) = ax$. Since $M(-\infty) = 0$, this implies that

$$M(u) = c_1(-u)^{-\alpha}, \quad \alpha > 0.$$

As the integral

$$\int_{-1}^{0}u^2\,dM(u)$$

must converge, we have $\alpha < 2$. Thus finally

$$M(u) = c_1(-u)^{-\alpha}, \quad 0 < \alpha < 2, \quad c_1 \ge 0. \tag{2.2.14}$$

In an exactly similar way,

$$N(u) = -c_2u^{-\beta}, \quad 0 < \beta < 2, \quad c_2 \ge 0. \tag{2.2.15}$$

Taking $a_1 = a_2 = 1$ in (2.2.6) and (2.2.7), we have

$$a^{-\alpha} = a^{-\beta} = 2, \tag{2.2.16}$$

whence $\alpha = \beta$. Moreover, (2.2.5) becomes in this case

$$\sigma^2(a^{-2} - 2) = 0.$$

This is incompatible with (2.2.16) unless $\sigma^2 = 0$, so that either $\sigma^2 = 0$ or $M(u) = N(u) = 0$ for all $u$. •

The integrals on the right-hand side of (2.2.1) can be evaluated explicitly, enabling the theorem to be reformulated in the following way.
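Two steps of the preceding proof can be checked by direct computation. The Python sketch below uses hypothetical function names and sample constants. First, substituting $M(u) = c_1(-u)^{-\alpha}$ in (2.2.6) shows that the relation holds exactly when $a^{-\alpha} = a_1^{-\alpha} + a_2^{-\alpha}$. Second, since $dM(u) = c_1\alpha(-u)^{-\alpha-1}\,du$, the integral of $u^2\,dM(u)$ over $(-1, -\delta)$ has a closed form which stays bounded as $\delta \to 0$ only when $\alpha < 2$.

```python
import math


def M(u, c1, alpha):
    """Spectral function M(u) = c1 * (-u)^(-alpha), defined for u < 0."""
    return c1 * (-u) ** (-alpha)


def scale_for(a1, a2, alpha):
    """The constant a with a^(-alpha) = a1^(-alpha) + a2^(-alpha)."""
    return (a1 ** (-alpha) + a2 ** (-alpha)) ** (-1.0 / alpha)


def truncated_second_moment(delta, c1, alpha):
    """Integral of u^2 dM(u) over (-1, -delta) for 0 < delta < 1.

    With dM(u) = c1*alpha*(-u)^(-alpha-1) du this equals
    c1*alpha * int_delta^1 v^(1-alpha) dv, evaluated in closed form."""
    if alpha == 2:
        return c1 * alpha * math.log(1.0 / delta)
    return c1 * alpha * (1.0 - delta ** (2.0 - alpha)) / (2.0 - alpha)
```

For $\alpha = 1.5$ and $c_1 = 1$ the truncated integral tends to $\alpha/(2-\alpha) = 3$ as $\delta \to 0$, whereas for $\alpha = 2.5$ it grows without bound.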
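Theorem 2.2.2. In order that a distribution $F$ be stable, it is necessary and sufficient that its characteristic function be expressible in the form

$$\log f(t) = i\gamma t - c|t|^{\alpha}\Bigl(1 - i\beta\,\frac{t}{|t|}\,\omega(t, \alpha)\Bigr), \tag{2.2.17}$$

where $\alpha$, $\beta$, $\gamma$, $c$ are constants ($c \ge 0$, $0 < \alpha \le 2$, $|\beta| \le 1$) and

$$\omega(t, \alpha) = \begin{cases}\tan(\tfrac12\pi\alpha), & \alpha \ne 1,\\ 2\pi^{-1}\log|t|, & \alpha = 1.\end{cases}$$

(Note that $\alpha$, which is called the index of $F$, has the same meaning as in the previous theorem.)

Proof. We examine (2.2.1) in three cases.

(1) $0 < \alpha < 1$. In this case the integrals

$$\int_{-\infty}^{0}\frac{u}{1+u^2}\,\frac{du}{|u|^{1+\alpha}} \quad\text{and}\quad \int_{0}^{\infty}\frac{u}{1+u^2}\,\frac{du}{u^{1+\alpha}}$$

are finite and (2.2.1) becomes, for some $\gamma'$,

$$\log f(t) = i\gamma't + \alpha c_1\int_{-\infty}^{0}(e^{itu} - 1)\,\frac{du}{|u|^{1+\alpha}} + \alpha c_2\int_{0}^{\infty}(e^{itu} - 1)\,\frac{du}{u^{1+\alpha}}.$$

Therefore, in $t > 0$,

$$\log f(t) = i\gamma't + \alpha t^{\alpha}\Bigl[c_1\int_{0}^{\infty}(e^{-iu} - 1)\,\frac{du}{u^{1+\alpha}} + c_2\int_{0}^{\infty}(e^{iu} - 1)\,\frac{du}{u^{1+\alpha}}\Bigr].$$

The function $(e^{iu} - 1)u^{-1-\alpha}$ is analytic in the complex plane cut along the positive half of the real axis. Integrating it round a contour consisting of the line segment $(r, R)$ ($0 < r < R$), the circular arc (with centre 0) from $R$ to $iR$, the line segment $(iR, ir)$, and the circular arc from $ir$ to $r$, we obtain (on letting $R \to \infty$ and $r \to 0$)

$$\int_{0}^{\infty}(e^{iu} - 1)\,\frac{du}{u^{1+\alpha}} = e^{-\frac12 i\pi\alpha}L(\alpha),$$

where

$$L(\alpha) = \int_{0}^{\infty}(e^{-u} - 1)\,\frac{du}{u^{1+\alpha}}.$$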
44 STABLE DISTRIBUTIONS Chap. 2 Similarly, o u and therefore for t > 0, logf(t) = iy't + ocL{<x)tsl{{ci + c2) cos {{na) + i{cl-c2) sin = iy't-ct*(l - ifi tan (^thx)), where c = —aL(a)(c1 + c2) cos (j7ra) ^ 0 , For log/@ = log/(-0 = iyt-c\t\'(l-ip tan g so that B.2.17) holds for all *. B) l<a<2. For this case we can throw B.2.1) into the form (for t>0) dw u Integrating the function round the same contour as above, we obtain ' • v dw o u
2.2. CANONICAL REPRESENTATION OF STABLE LAWS 45 o where Proceeding as before, we deduce that B.2.17) holds, with c = —aM(a)(c1 + c2) cos (^7ca)^0 , P = (c1-c2)/(c1+c2), |?K1. C) a= 1. Using the fact that ~°° 1-cosu o u2 du = jn, we have u2) u2 f °°cos tu-l J f" / . ut \du = du + i\ sin tu - —-= Jo " Jo V l + u2)u2 ,, ... ff°° sin to J f00 du ¦pit + it\im\\ —j—du-t u(l+u2)J "sinu . r°°/sinu 1 2 u "du f00/sinu 1 — + it —-2 ^ Jo V " u(l+u2) = —\nt — it log ? + itr , say . Thus B.2.17) is satisfied with c =jn{c1 +c2),
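This theorem allows us to establish the form of the normalising constants $B_n$ asserted in § 1. We shall prove the following result.

If a sequence $X_1, X_2, \dots$ of independent, identically distributed random variables is such that the distribution of the normed sum

$$Z_n = (X_1 + X_2 + \dots + X_n - A_n)/B_n$$

converges to a stable law with index $\alpha$, then

$$B_n = n^{1/\alpha}h(n), \tag{2.2.18}$$

where $h$ is a slowly varying function in the sense of Karamata.

Using the notation of § 1, we have for all $t$,

$$\lim_{n\to\infty}\bigl|f(t/B_n)\bigr|^{n} = \exp(-c|t|^{\alpha}).$$

For any fixed integer $k$,

$$\bigl|f(t/B_{kn})\bigr|^{kn} = \exp(-c|t|^{\alpha})(1 + o(1)), \tag{2.2.19}$$

but at the same time

$$\bigl|f(t/B_{kn})\bigr|^{kn} = \Bigl|f\Bigl(\frac{B_n}{B_{kn}}\,\frac{t}{B_n}\Bigr)\Bigr|^{nk} = \exp\Bigl(-ck|t|^{\alpha}\Bigl(\frac{B_n}{B_{kn}}\Bigr)^{\alpha}\Bigr)(1 + o(1)), \tag{2.2.20}$$

the remainder term tending to zero uniformly in every finite $t$-interval. Suppose first that the sequence $(B_n/B_{kn})$ is unbounded, so that there is a subsequence $(n_j)$ with

$$B_{n_j}/B_{kn_j} \to \infty.$$

Setting $t = B_{kn_j}/B_{n_j}$ in (2.2.20) and using (2.2.19), we obtain the impossible equation $e^{-ck} = 1$. Hence $(B_n/B_{kn})$ is bounded, and then (2.2.19) and (2.2.20) yield

$$\lim_{n\to\infty}\exp\Bigl\{-c|t|^{\alpha}\Bigl(1 - k\Bigl(\frac{B_n}{B_{kn}}\Bigr)^{\alpha}\Bigr)\Bigr\} = 1,$$

which is only possible if

$$\lim_{n\to\infty}\frac{B_{kn}}{B_n} = k^{1/\alpha}.$$

This proves the assertion.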
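The limit relation $B_{kn}/B_n \to k^{1/\alpha}$ can be watched numerically for a concrete choice of normalising constants. In the Python sketch below the slowly varying factor $h(n) = \log n$ is an assumption chosen purely for illustration, as are the function names; the text makes no such choice. For this $h$ the relative distance of $B_{kn}/B_n$ from $k^{1/\alpha}$ equals $\log k/\log n$, which tends to zero, though only logarithmically.

```python
import math


def B(n, alpha):
    """Assumed normalising constants B_n = n^(1/alpha) * h(n), with h(n) = log n."""
    return n ** (1.0 / alpha) * math.log(n)


def ratio_error(k, n, alpha):
    """Relative distance of B_{kn}/B_n from its limit k^(1/alpha)."""
    limit = k ** (1.0 / alpha)
    return abs(B(k * n, alpha) / B(n, alpha) - limit) / limit
```

The slow decay of `ratio_error` as `n` grows is a reminder that a slowly varying factor can be far from constant over any practical range of $n$.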
2.3. DENSITIES OF STABLE DISTRIBUTIONS; ANALYTIC STRUCTURE 47 § 3. Analytic structure of the densities of stable distributions The results of the preceding section show that the stable distributions form a four-parameter family F(<x, ft y, c). From B.2.17) each of these admits a density p(x; a, ft y, c) given by the inversion formula p{x;aj,y,c) = 2n\ e-Uxexpliyt-c\t\* (l-ifi -^ co{t,a)\\ dt. Except for a few special cases, these densities are not expressible in terms of elementary functions, but (A) yields nevertheless a good deal of infor- information about their properties. A more convenient representation is one considered by Zolotarev: p(x;oc,fty,c) = — j e'^exp jiy?-c|t|aexp [-^(a)^! \dt for a#l, and Here K(a) = 1 —11 — a|, and the ranges of a, ft y, c are the same as in (A), but these parameters are not identical; indeed in the obvious notation PA = cot&za) 1a = 1b , cA = Unless otherwise indicated, we shall use the representation (B). A simple change of variables in (B) shows that p(x;a,P,y,c) = c'llap{(x-y)c-lla; a, 0, 0,1} B-3.1) for a#l, and that Thus we may restrict ourselves to the case y = 0, c= 1, and we shall write p(x) = p{x;a, 0) = p{x;a, ft 0, 1). It follows easily from (B) that, for all a,
48 STABLE DISTRIBUTIONS Chap. 2 p(x;a, P) = p(-x;a, - P). B.3.2) We may therefore restrict ourselves either to /?^0 or alternatively to x ^ 0, a remark which will be of use later. Another easy consequence of (B) is that p(x) has derivatives of all orders, a statement which can be greatly strengthened as follows. Theorem 2.3.1. The density p(x) of a stable law with a> 1, or with a= 1, , is an entire function of x. Ifoc< 1, the density may be written p(x) = x~14>1(x'a) , x>0, = x-1<P2((-x)-«), -x<0, where (P1 and <P2 are entire functions. Proof We distinguish three cases. A) a> 1. In this case the integral converges uniformly for all complex z, and thus defines an entire func- function of z coinciding with p on the real axis. B) a=l. Write nfyj j-i J Y| I *l I Y| I ,Z } } I where 1 r00 r / ? \ ¦) ] dt. B.3.4) For the sake of argument take /?>0. Suppose that it is permissible to rotate the contour of integration through an angle — \n. Then 1 f°° f 2 -tx-C- n and as before this implies that p1} and so p, is entire. It remains to justify the change of contour in B.3.4). To do this we integrate the function - \ (p(t) = exp (— hx — T — iC-T log t j
2.3. DENSITIES OF STABLE DISTRIBUTIONS; ANALYTIC STRUCTURE 49 (taking the branch with log 1=0) around the contour consisting of the line segment (r, R) @ < r < R), the circular arc CR (centre O) from R to — iR, the line segment (— iR, — ir), and the arc cr from — ir to r. Clearly lim and r <t>o C 2 ) ix ^ R\ exp< R |sin <f>x\— R cos (f> -\—JR(/>>d(/>- Jo 1 n ) [in f 2 2 ) R exp IRsintyx + - jR(/> p sin <j> R log R \ J<t>o I n n ) \Rn exp R {(/>0(|x| + l) — cos ^>0} + 2 1 1 + 0 exp R ||x| + 1 fi sin 0O log R as JR->oo for (/>0 sufficiently small. This justifies the change of contour. C) a < 1. It suffices to consider the case x>0. Substitute u for tx in the equation and rotate the contour of integration through an angle — jtt (the validity of this operation being proved as in B)). Then where 1- f °° <P1(z)= - exp{-t-fzcosfrza(l+P))} x x Jo x sin {taz sin(^roc(l+/?))} dt is clearly an entire function. • Remark 1. The exclusion of the case a = l, /? = 0 is necessary, since p(x; 1, 0) is the Cauchy distribution which though analytic for x real has poles at x= ±i.
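The Cauchy case of Remark 1 is one of the few in which the inversion integral can be compared with an elementary closed form. The following Python sketch is a numerical illustration under stated assumptions: the truncation point, the step count and the function names are choices made here, not taken from the text. For the characteristic function $e^{-|t|}$ it approximates $\pi^{-1}\int_0^{\infty}\cos(tx)\,e^{-t}\,dt$ by Simpson's rule and compares the result with $1/\{\pi(1+x^2)\}$, whose denominator vanishes at $x = \pm i$.

```python
import math


def cauchy_density_by_inversion(x, upper=40.0, steps=40000):
    """Simpson's rule for (1/pi) * int_0^upper cos(t x) e^{-t} dt.

    `steps` must be even. The neglected tail beyond `upper` is at most
    e^{-upper}/pi in absolute value."""
    h = upper / steps
    total = 1.0 + math.cos(upper * x) * math.exp(-upper)   # the two endpoint values
    for j in range(1, steps):
        t = j * h
        weight = 4.0 if j % 2 else 2.0                     # Simpson weights 4, 2, 4, ...
        total += weight * math.cos(t * x) * math.exp(-t)
    return total * h / (3.0 * math.pi)


def cauchy_density(x):
    """Closed form 1 / (pi (1 + x^2)); also accepts complex arguments."""
    return 1.0 / (math.pi * (1.0 + x * x))
```

Evaluating `cauchy_density` at a complex argument close to `1j` shows the blow-up near the pole, while on the real axis the function is perfectly smooth.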
50 STABLE DISTRIBUTIONS Chap. 2 Remark 2. For a<l, C=1 the arguments used in case C) above show that, for all x < 0, 1 f p{x; a, 1) = —Re/ 7I.X 1 n = 0. Similarly, for a<l, p(x; a, -l) = 0 for all x>0. Theorem 2.3.2. For a^ 1, we may write where dsm c forlpositive m. When cc = p/q is rational, this is equivalent to the differential equation s=l of order max(p —1, g—1). Proof. Write and use </>(*) =f{r)(x) to denote the relation (For integral r, this coincides with the usual notation for derivatives, see [193].) For Re /i>0, it is clear that
2.3. DENSITIES OF STABLE DISTRIBUTIONS; ANALYTIC STRUCTURE 51 Now p{x;a, P) = x~1 where x{0 = - ( explit- 71J o Introduce a new function 71 Differentiating B.3.7) m times with respect to s we obtain ds Hence w. s= 1 71 exp r"raexp {it- Setting r = m/a in B.3.9) and comparing with B.3.6), we have = - exp { ds" s=l B.3.5) B.3.6) B.3.7) B.3.8) B.3.9) and the first part of the theorem is proved. To prove the second part, write a=p/q, m=p—l,r = q—l and integrate by parts in B.3.6). Then z(f)= - - + ^e-*1"*""' P t'-'expiit-Zt'e-i^^dt, ni ni J0 B.3.10) and comparing this with B.3.9) we have the final equation asserted. • Up to now we have looked at p(x; a, C) as a function of x alone, but we now go on to study p(x; a, C) as a function of the three variables x, a, /?.
52 STABLE DISTRIBUTIONS Chap. 2 This point of view leads to a series of interesting and useful analytic rela- relationships between the stable laws with different values of a, /?, of which examples are given by the next two theorems. The first establishes a differential equation, while the second sets up a duality relation whereby to every stable law of index a > 1 corresponds a stable law of index a'1. Both theorems, with many similar relations, are to be found in the work ofZolotarev [189], [190]. Theorem 2.3.3. Write x = ex and P = 2<fxx/nK(a). Then for a#l the func- function A(t, 4>) = xp(x; a, /?) is the Dirichlet solution of Laplace's equation dx2 + in the strip \4>\^ nK (a)/2a, subject to the boundary conditions = eTp(eT;a, ±1). The proof follows by direct verification of the differential equation from the expression A{t,<I>) = -Re ( Gxp{-it- 71 Jo It follows in particular from the theorem that, for a fixed a # 1, the density p(x; a, C) is analytic in (x, C) in any region of the form {e<x< oo, — !+?</?<!— e} , 0<e<l. Theorem 2.3.4. For any a>l, -l^^l, x>0, p(x; a, C) = x~1~ap( — x~a; a; ft), where fi = {a-1){C-1)-C. Proof. As before, if X{O = - \ 71 Jo B.3.11) for a>l, ?>0, then for x>0, {x~a). B.3.12)
2.3. DENSITIES OF STABLE DISTRIBUTIONS; ANALYTIC STRUCTURE 53 We show that the contour of integration in B.3.11) may be rotated through the (negative) angle The integrand \j/(t) of B.3.11) is analytic in the complex plane cut along the negative half of the real axis. Let Fx be the line segment (p, R), F2 the circular arc from R to R e^, F3 the line segment (R ei4>, p Qi<f>), and T4 the circular arc from p el<t> to p. By Cauchy's theorem, jf +f +f +f U(t)dt = O, and it suffices therefore to show that the integrals along F2 and r4 tend to zero as JR-> oo, p->0. For the first we have the estimate \]/(t)dt exp {- R cos %tt + 9) - ?Ra cos {9a - \n B - a) 0)} d91. By breaking the interval into two parts @, ^J, (^>1; <j>) such that on @, <j)x) the inequality cos@a— \nB — a)) >5 >0 is satisfied, we have r2 as Moreover, ij/(t)dt = O(p)->0 as p->0. Rotating the contour, substituting t for obtain = —\ exp {i?t-t1/a and integrating by parts, we '01'1 dt = in n j 0 Now jn + 4>= —ftn/2a, so that taking real parts in B.3.13) we have B.3.13) so that
54 STABLE DISTRIBUTIONS Chap. 2 § 4. Asymptotic formulae for the densities p(x; a, /?)• It has already been remarked that the densities p(x; a, /?) may not in general be expressed in terms of elementary functions or the common "special functions". It is therefore of interest to represent the densities as convergent or asymptotic series in the neighbourhood of particular points, and to examine properties which are not at once obvious from the Fourier expansions (A) and (B) with their oscillating integrands. In this section we present a series of asymptotic formulae due to Linnik [99], Skorokhod [174]. Bergstrom [16] and Pollard [135]. The special case of the "extreme" stable laws p(x; a, ± 1) is due to Kolmogorov. The method of proof is that of contour integration and, later, the technique of steepest descent. Because of B.3.2) we can, and consistently will, restrict attention to positive values of x. Theorem 2.4.1. For a<l and x>0, ? () {fr(j)} nx n=1 n\ B.4.1) Proof. In proving Theorem 2.3.1 C) we established the equation p{x;a,P)= - — Re if e"f exp{-tax-aQ~ TIX J o and expanding the exponential formally we have {C) nx -0 nx n=Q nl To justify this formal expansion it suffices to prove the last series absolu- absolutely convergent, which is done by using Stirling's formula and noting that the series is majorised by f »=o »'¦
2.4. ASYMPTOTIC FORMULAE FOR THE DENSITIES p(x; a, P) 55 A similar result holds for a > 1, but the series is now divergent and gives an asymptotic expansion for p(x) as x->oo. Theorem 2.4.2. For a> 1, the asymptotic expansion as x—>oo. The sign ~ denotes the asymptotic relation that, for all N, p{x;a,P)=~ X n\ The series B.4.3) does not converge; its terms do not even tend to zero. Proof. In the equation p(x\a,P) = — Re ( e~"exp{-ta;rae-*l'ltB~a)/'}dt, nx J 0 rotate the contour of integration through an angle justifying this as in the proof of Theorem 2.3.4. Then p(x; <x,P) = — Re ei4> [ exp {-te?(*lt+*)}exp{itax-a}dt. B.4.4) nx Jo Taylor's formula implies that, for real s, N (is)" \s\N+1 where |0| ^1. Hence N ftiGt .y. net 'fi f(^' ~^ )^ v* ^ Combining B.4.4) and B.4.5) we have
56 STABLE DISTRIBUTIONS Chap. 2 N /• oo j.na —na n = 0 /I r oo ^(N+lJa -(^+l)a (N+l)! Now rotate the contour of integration in the integral I Qxp{~tei(in+<t>)}tnadt Jo through an angle d=-\n-(j). Then so that B.4.6) leads easily to p(x; a,P) = — f, (-l (N+l) Theorem 2.4.3. For a = l and x>0, 28 \ 1 Mogx;l/n x n% where = lm\ e-lf I i + iC- —log A dt Proof. First let /?>0. In the equation B.4.6) X -ix-", B.4.7) nx n%n\ If00 f ( 2B \ ( 2iB \ ~) = - Re exp < - it [ x + — log x - t 11 H log M i dt substitute t for tx and rotate the contour of integration through an angle — -'m. Then
2.4. ASYMPTOTIC FORMULAE FOR THE DENSITIES p(x; a, 0) 57 ( 2P i \ 1 f°° {it, 2B \ p\x-\ log x; 1, /? = — Im e ' exp < - A + p) tlogtrdt. \ n / nx Jo tx nx ) B.4.8) Expanding the exponential as a finite Taylor series with remainder, B.4.8) becomes Ttx.^n! V (N + l)! 7' which is equivalent to B.4.7). To show that this also holds good for expand the exponential in (IB \ 1 f00 f lit ) p x H log x; 1, /? = — Re e"te~f/*exp <— — log t\ dt \ n / nx Jo ( nx J to deduce that v • • / n = —ReS f°°^( — logtj tne-H~tlxdt + O{x-N-2). B.4.10) 7CX n = o-'O n ¦ ^^ Now rotate the contour through — \k and expand eitlx, and the result follows. • We remark that for P= — 1 this theorem is of little interest, since it asserts only that p(x; 1, — 1) decreases faster than any negative power of x. More complete information for this case is given by the following theorem. Theorem 2.4.4. As x-> + go, where an= Jo and cn((f>) is the coefficient ofy" in the power series expansion of the function
58 STABLE DISTRIBUTIONS Chap. 2 Proof. In the equation p(x; 1, —1) = -Re I exp substitute t = zeinx and write ^ = e***. Then -itx-t f 1 log t) >dt p{x) = p{x; 1, -1) = -Re f exp[-z? A - -logzHdx B.4.11) 7C Jo (. V 7C / J In the complex plane of z = u + iv with a cut along the negative real axis the integrand is analytic, and we may deform the contour of integration in B.4.11). For the choice of contour we use the method of steepest descents [24]. It is easy to see that the saddle point is at z0 = — i/e, and the contour of steepest descent is given by Im \z (l --log zH =0. Near the saddle point this is close to a circle of radius 1/e centred at the origin, so that in B.4.11) we change the contour of integration to F = F1 + r2 + r3, where F1 is the line segment (O, — i/e), F2 is the circular arc (centre O) from — i/e to 1/e, and F3 the line segment A/e, oo). Thus p{x)= The first term is equal to H (' - ! dz BA12) -Re f exp{-z^(l--logzHdz = n ri n = -Rei( n _e-i \n = 0, B.4.13) and the third has exp \-z? (l -logz ) idz n = e B.4.14) Finally, consider the integral around F2, which if z = e 1 exp {i(<j) — j can be written as
2.4. ASYMPTOTIC FORMULAE FOR THE DENSITIES p(x;a,P) 59 = in Re f \xp{-ria(<l>)}el+d<l>, B.4.15) Jo where rj = 2?/ne, a(<f)) = ei<t>(l-i(t)). Writing we can write this as e o B.4.16) The expression outside the integral gives the leading term in the asymp- asymptotic expansion. To determine the other terms, we expand the integrand in powers of 77 ~*. To do this consider the function c{y) = exp{-y-2b (<f>y)} which is analytic in y and has the Taylor expansion N c(y)= V akyk + A(y)yN+1 , k = 0 in which it is not difficult to see that ock is a polynomial in <f> of degree at most 3k. We first show that A{y)^AN((fJN+1 + l)Qie't'2, B.4.17) where AN and e are independent of <f> and y, s is independent also of N, and e<l. To do this we remark that 4shr40 2sin0\] ,^1O, + —-^ — JJ, B.4.18) where 9 = (py, O^O^n. Since tan ^9<9 for 9^9^^n, it follows that, B-4-19) for some e< 1. Moreover,
60 STABLE DISTRIBUTIONS Chap. 2 B.4.20) for | A (y) | ^ max jn , where C is a constant. Hence Texp{-j; 2b{(f)y)} oy N + = max|exp{-j; 2b((f)y)}1Z\ , where S is a sum of products of derivatives of y~2bD>y) which may be bounded by B.4.20). This easily proves B.4.17). Using B.4.17), we have Qxp{ — rjb((j)r] ^) + i(j)r] N = V cn((}))ri~2"-\-0 {n n = 0 Substitute B.4.21) into B.4.16) to give B.4.21) N n=l + 0 B.4.22) Now for n o so that n=0 . B.4.23) Collecting together B.4.13), B.4.14) and B.4.23) and substituting for x we obtain the required formula. •
2.4. ASYMPTOTIC FORMULAE FOR THE DENSITIES p(x; a, /?) 61 Theorem 2.4.5. For a<l, x>0, we have the asymptotic expansion p(x; a, fi) ~ I Jo (-1)"r{("l1!)a}*n cos For l<a<2 am* x>0, B.4.25) If a<l, then 1 r°° p(x; a, j?) = -Re e-itxexp{-tae-?nap}dt. 71 Jo Expand e~"* as a finite Taylor series with remainder, to give i N ( — ixY1 r00 p(x;a,P) = - X RKL\ 71 n = 0 n- Jo f yN+l r oo + 01 To calculate the integrals in this expression we rotate the contours of inte- integration through an angle \n$, to show that Substituting this into B.4.26) we obtain p{x;a,p)=- X t^ n\ (N
62 STABLE DISTRIBUTIONS Chap. 2 For a > 1 we carry out the same arguments, starting from a)li}dt, B.4.28) but now Stirling's formula implies that the remainder term is [ 71 Jo so that the series B.4.25) is in fact convergent. • In the extreme case a < 1, \f}\ = 1, all the coefficients in B.4.24) vanish and the theorem only asserts that p(x)->0 as x->0 faster than any power of x. More precise information is given by the following result. Theorem 2.4.6. Let a<l, x>0. Then p{x;a,-l) = 0, B.4.29) and p{x;a, 1)~ ( (^T | ), B.4.30) where is the coefficient ofy" in the power series expansion of the function exP I - -2 b G and Proof. Equation B.4.29) has already been proved (page 50). To prove B.4.30) use again the equation p(x;a,l)=—Re ( exp{-it-tax-ae~ina}dt. B.4.31) nx .) 0
2.4. ASYMPTOTIC FORMULAE FOR THE DENSITIES p(.x; a, 0) 63 Setting ? = x~a we have to examine the behaviour, for large ?, of the func- function (°° -;z-za?e-*™}dz. B.4.32) The integrand is analytic in the complex plane of z = u + iv cut along the negative real axis, and we may therefore deform the contour in B.4.32), using the method of steepest descents. The saddle point is at a solution z0 of ~{-iz-zaZe~ina}=0, dz i.e. say, where r>0. The contour of steepest descent is determined by the equation Im{-iz~za?e~iina} = Im{-izo-zao?e~?na} , which in the neighbourhood of the saddle point is close to a circle of radius r centred on the origin. Since on this circle the integrand has a very simple form, we deform the contour in B.4.32) into F = FX+F2 + F3, where Fl = @, ir), F2 is the circular arc from ir to r, and F3 = (r, oo). The integral along F1 is Re i [' exp (v - va?)dv = 0 , B.4.33) and that along F3 is equal to = o(x~n exp {- A - a)(a/x)a/A ~a)}) B.4.34) for n > 0. Thus to obtain the required asymptotic expansion we have to consider the integral XoiO along F2, which is given by f Xo(O = ReH exp{re1(^ »*>- B.4.35)
64 STABLE DISTRIBUTIONS Chap. 2 where a($)= -e~i<t> + a-1e~ia<t>. Clearly a((p) has the convergent power series expansion where In B.4.35) substitute </>{r(l — a)}"* for </> to give /4-TrMl-a)}* ZoE) = (l-a)-*r*exp{-r(a-1-l)}Re x Jo B.4.36) We now expand ' as a finite Taylor series in r~* with remainder. For the estimation of the remainder term we need the inequality that, for 0^A— a)"*y(j)^\n , |exp{-};-2H(l-«)"i^]}Kexp(-i#2), B.4.37) where 0<rj < 1 and v\ does not depend on y. This is proved by noting that the absolute value of the left-hand side of B.4.37) is equal to where y(9)=l-(l-a)-1d-2(sin29-a-1sm2ad), and6 =j(l—ot)~*(f)y^in. It is easily checked that sin20 — a sin2a0 is increasing in 0<9^^n, and is thus strictly positive there. Hence y is continuous and y(9)<l on 0^9^^n, so that there exists rj<l with y(9)^rj<l. Thus B.4.37) is proved. It is not difficult to see that, for all y, <A\<f>\k+2t B.4.38) where A depends only on k and a, and it follows that
2.4. ASYMPTOTIC FORMULAE FOR THE DENSITIES p(.x; a, 0) 65 exp {-y-2b[(l-a)"*0j;]} = 0(|</>|3fce^2). B.4.39) Using this we may expand the function by Taylor's theorem in the form fi{y)=l+ ? y"rfB@) + O@4<N+1)yw + 1e*^2), B.4.40) n=l where dn{4>) is a polynomial in <f) of degree at most 3n. Write 3; = r i and substitute B.4.40) into B.4.36), to obtain n = 0 0(yN+1 ^ B.4.41) Since p we have, substituting for r, x f1 + (U S (a^-"/2A-a)an + O(r|N+1)/2A~a))). B.4.42) Combining this with the integrals along Fi and T3, and substituting for ?, we obtain the required result. • Theorem 2.4.7. For l<a<2, C= — 1 and :*:->• 00, p(x;a, -I)~[27r(a-l)]-*a-1/2(a-1)x-1+a/2(a-1)x x exp { -(a- l)a~a/(a- "x"'14'} x x(i + (-) f «>/^ \ \7T/ „= 1 where an is obtained by replacing a by a'1 in B.4.30).
Proof. It is unnecessary to work through the details because of the duality Theorem 2.3.4. We merely substitute (2.4.30) into the equation

$$p(x; \alpha, -1) = x^{-1-\alpha}\,p(x^{-\alpha}; \alpha^{-1}, 1). \quad •$$

Remark. All the asymptotic formulae of this section may be differentiated any number of times with respect to $x$, to give asymptotic expansions for the derivatives $p^{(k)}(x; \alpha, \beta)$.

§ 5. Unimodality of stable laws

Definition. A distribution function $F(x)$ is said to be unimodal if there exists at least one $a$ such that $F(x)$ is convex in $x < a$ and concave in $x > a$.

It follows from the theory of convex functions [130] that $F$ is necessarily a continuous distribution (except possibly at $a$), and that $F'(x)$ is non-decreasing in $x < a$ and non-increasing in $x > a$. If $F$ is any distribution for which $F''$ exists, it is unimodal if and only if $F''(x) \ge 0$ for $x < a$ and $F''(x) \le 0$ for $x > a$. The point $a$ is called the mode of the distribution.

Theorem 2.5.1. If a sequence of unimodal distributions $F_n$ converges weakly to a distribution $F$, then $F$ is unimodal.

Proof. Let $a_n$ be a mode of $F_n$, and write

$$a = \limsup_{n\to\infty} a_n.$$

Suppose first that $|a| < \infty$, and choose a subsequence $a_{n_k}$ converging to $a$. Let $x_1$, $x_2$ be points of continuity of $F$, with $x_1 < a$, $x_2 < a$. For sufficiently large $k$, $x_1 < a_{n_k}$, $x_2 < a_{n_k}$, and since $F_{n_k}$ is unimodal,

$$F_{n_k}\bigl(\tfrac12(x_1 + x_2)\bigr) \le \tfrac12\bigl\{F_{n_k}(x_1) + F_{n_k}(x_2)\bigr\}.$$

Letting $k \to \infty$,

$$F\bigl(\tfrac12(x_1 + x_2)\bigr) \le \tfrac12\bigl\{F(x_1) + F(x_2)\bigr\}. \tag{2.5.1}$$
Since the points of continuity are everywhere dense, (2.5.1) holds for all $x_1, x_2 < a$. Similarly, for $x_1, x_2 > a$,

$$F\bigl(\tfrac12(x_1 + x_2)\bigr) \ge \tfrac12\bigl\{F(x_1) + F(x_2)\bigr\}.$$

This shows that $F$ is unimodal so long as $|a| < \infty$, and it remains only to show that no other case is possible. Suppose for example that $a = +\infty$. Then (2.5.1) holds for all $x_1$, $x_2$, so that $F$ is everywhere convex. Since $F$ is bounded, $F$ must be constant, which is impossible. •

Theorem 2.5.2. If $F_1$, $F_2$ are symmetric unimodal distributions, then so is $F = F_1 * F_2$.

Proof. It is obvious that $F$ is symmetric. To prove unimodality, it suffices to consider twice differentiable functions $F_1$, $F_2$, since any unimodal distribution may be approximated by a sequence of such. Then

$$F''(x) = \int_{-\infty}^{\infty}F_2''(x - t)F_1'(t)\,dt = \int_{-\infty}^{\infty}F_1'(x - t)F_2''(t)\,dt = \int_{0}^{\infty}\{F_1'(x - t) - F_1'(x + t)\}F_2''(t)\,dt. \tag{2.5.2}$$

Because $F_1$ and $F_2$ are symmetric and unimodal,

$$F_2''(t) \le 0 \quad (t > 0), \qquad F_1'(x - t) - F_1'(x + t) \le 0 \quad (x < 0,\ t > 0),$$

with the second inequality reversed for $x > 0$, whence it follows from (2.5.2) that $F''(x) \ge 0$ in $x < 0$ and that $F''(x) \le 0$ in $x > 0$. •

The basic result of this section is the following theorem.

Theorem 2.5.3. Every stable distribution is unimodal.

The plan of the proof is first to prove unimodality in the symmetric case $\beta = 0$, and in the extreme case $\beta = 1$. We then deduce information about the harmonic function $A(\tau, \phi)$ introduced in Theorem 2.3.3. These suffice to prove the theorem.
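Theorem 2.5.2 can be illustrated, though of course not proved, by a discrete analogue in which distributions on the line are replaced by finite sequences of integer weights. The Python sketch below is such an illustration; the helper names and the sample weights are assumptions made for the example. It convolves two symmetric unimodal weight sequences and checks that the result is again symmetric and unimodal. Integer weights keep the arithmetic exact.

```python
def convolve(p, q):
    """Discrete convolution of two finite weight sequences."""
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r


def is_symmetric(w):
    """True if the sequence reads the same in both directions."""
    return w == w[::-1]


def is_unimodal(w):
    """True if w is non-decreasing up to some index and non-increasing after it."""
    i = 0
    while i + 1 < len(w) and w[i] <= w[i + 1]:   # climb to a peak
        i += 1
    while i + 1 < len(w) and w[i] >= w[i + 1]:   # descend from the peak
        i += 1
    return i == len(w) - 1


uniform = [1, 1, 1, 1, 1]         # symmetric, flat
triangle = [1, 2, 3, 2, 1]        # symmetric, single peak
peaked = [1, 2, 4, 8, 4, 2, 1]    # symmetric, sharper peak
```

Here `convolve(uniform, triangle)` gives `[1, 3, 6, 8, 9, 8, 6, 3, 1]`, which is symmetric with a single peak, in line with the theorem.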
68 STABLE DISTRIBUTIONS Chap. 2 A) Some auxiliary results Lemma 2.5.1. For ot< 1, x>0, p'x(x; a, 1) = —- p(x; a, 1) = B.5.3) j o where "{<{>) = (^ and cos <f> sin(la)<^ (l)i0 For x<0, a<l, Jpi(x;a,l) = 0. B.5.4) Proof. The function of the complex variable z = pel<t> is analytic in the complex plane cut along the negative real axis, and r °° p'x{x;oc, l) = Re y(z)dz. B.5.5) Jo Deform the contour into r = Fi+r2, where /~\ is the line segment @, fa1/1"ax"a/1"a), and T2 is the curve Supposing for the moment that this deformation is admissible, and noting that Re we have, after a little calculation,
2.5. UNIMODALITY OF STABLE LAWS 69 p'x(x; a, 1) = Re v(z)dz = Jr = Re f v{z)dz = Jr2 where a(<fi) and b(<fi) are given by B.5.3). To show that this deformation is admissible, write Fn for that part of the contour F lying within the circle \z\ = n, and by Cn the smaller arc of that circle joining the point at which the circle meets F with the point z = n. Then since it suffices to prove that lim ( v{z)dz = 0. B.5.6) n-»oo JCn Choose (fH such that 0<(f)o<iz(l— a)/2a; then [ v{z)dz o ho x exp {- nxal(a~ ^sin<f>[1 + na~ *cosec ^ cos (^tt + <?)]} d^^O as n^-oo, proving B.5.6). • Lemma 2.5.2. The functions a((f)) and b{4>) of Lemma 2.5.1 have the prop- properties : A) the function a((p) is strictly increasing in [0,7i], B) the function b (cp) has exactly one change of sign in [0, tz\ .
70 STABLE DISTRIBUTIONS Chap. 2 Proof. Since sin cfx <\> in <\> > 0 we have — (a cot tx(j) - cot </>) = i cosec2 acj) (sin 2a0 - 2a0) < 0 , so that the function \jj (a) = a cot a0 — cot 0 is decreasing, and i//(l)=0, whence the inequality a cot a(/> > cot cf) holds for 0<a< 1, O<0<7r. Thus — log aD>) = A -a)"x {a2 cota0 + A -aJ cot(l -a)^-cot $} >0, and A) is proved. To prove B) it is sufficient to show that , . 2a cos 0 sin(l—aH _ 1 /a sinB — oc)(f) \ A — a)sina(/) 1—a\ sin a^ / has just one change of sign. This is true since otherwise it has at least three, and differentiating \J/((f)) = a sin B — olL> — sin occf) we obtain a contradic- contradiction. • Lemma 2.5.3. Suppose that l<a<2. Then for x>0, p'x(x;a,l)<0, B.5.7) and when x<0, where (a-1) sin (a- /_s ysi y^/ sin txf) is strictly increasing in [0, n/a], and / sin 4> \2/(a~ n /2a cos ^ sin (a-1H \ \sinacf)) \ (a — 1) sin a^ has exactly one change of sign in this interval.
2.5. UNIMODALITY OF STABLE LAWS 71 Proof. A) x>0. As before we start from the equation p'x{x;a, l) = Re ( v{z)dz , B.5.8) Jo where is analytic in the complex plane cut along the negative real axis. We seek to deform the contour of integration into the contour F+ defined by - . j 71 \ 71 71 71 p = — cos (p/sin a I —\- a>\ , ^ <p ^ — ¦ \2 ¦ y a 2 2 Let zn = nQl<t>n be the point at which this contour meets the circle \z\ =n, and let Cn be the smaller arc of this circle joining zn and n. Then as before the deformation is valid so long as B.5.9) Cn Now it is easy to check that lim cf)n= . n-»oo CC Z Thus for all sufficiently large n we have y(z)dz cn <t>n exp{ —xa/(a~1)(n sin cf) — na cos a((/>+^rr))}d(/>^ o n2 exp {rf cos (^rra)} = o A), proving B.5.9). Thus p'x{x;oc, l) = Re I u(z)dz = Jr+ n f u i\ { sin 6 \ 1/a~1 sin(a— \N ) exp { -Xat(*-y) r^-r) ^ ^- \ X nla [ \ —Sin a(P/ —Sin a0 J / sin^ 1) / 2a cos x —: . 1 j V :. 1,rr.t Y — smacpj V — (a — 1) sin acp in which the integrand is easily checked to be negative, proving B.5.7).
72 STABLE DISTRIBUTIONS Chap. 2 B) x < 0. Write x = - y, so that P'x{x;ol, l) = Re f u(z)dz, B.5.10) f Jo where as always v{z) = iT1 y2**-" iz exp {-//(a- is analytic in the cut plane. We deform the contour to F~ =Tf + r2~, where Ff is the line segment @, -ia1/A"a)) and F2~ the curve p" = — cos 0/sin a((/)+^7r), — - ^ 4> < -. The justification for this change is exactly as in the proof of Lemma 2.5.1, and the proof is completed in the same way. • B) The unimodality of symmetric stable distributions The case a = 2 being trivial, we confine ourselves to values a < 2. We write the characteristic function/(t) =f(t; a, 0) in the Levy form log/@ = iyt + \° (ete-1 - -^) dM(u) where M{u) = ci{ — u) a, N(u)= —c2u a. Clearly M is convex and N concave. By symmetry, y = 0 and c1=c2 = c. Define, for each n, a function Mn(u) by Mn(u) = M(u) {U^~l)' Then Mn is convex, and lim Mn{u) = M{u).
Similarly $N_n(u) = -M_n(-u)$ defines a concave function with

$$\lim_{n\to\infty} N_n(u) = N(u).$$

Let $f_n$ be the characteristic function of the infinitely divisible distribution corresponding to $M_n$ and $N_n$ in the Lévy formula. Since

$$\lim_{n\to\infty} f_n(t) = f(t),$$

it suffices to prove that the corresponding distributions $F_n$ are unimodal. Now

$$f_n(t) = e^{-\delta_n} \sum_{k=0}^{\infty} \frac{1}{k!} \left\{ \int_{-\infty}^{\infty} e^{itu} L_n(u)\,du \right\}^{k}, \tag{2.5.11}$$

where

$$\delta_n = \int_{-\infty}^{0} dM_n + \int_{0}^{\infty} dN_n$$

and

$$L_n(u) = M_n'(u) \quad (u<0), \qquad L_n(u) = N_n'(u) \quad (u>0).$$

Now $L_n(u)$ is positive, symmetric, and has a single maximum at the origin. Thus

$$L_n(u) = l_n \Psi_n'(u),$$

where $l_n$ is a constant and $\Psi_n$ a symmetric unimodal distribution function. By Theorem 2.5.2, $\Psi_n^{*k}$ is also symmetric and unimodal, and so therefore is

$$F_n = e^{-\delta_n} \sum_{k=0}^{\infty} \frac{l_n^{k}}{k!}\, \Psi_n^{*k}. \tag{2.5.12}$$

From Theorem 2.5.1 it follows that $F$ is also unimodal.
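The unimodality of the symmetric stable laws can be checked numerically. The following is only a sketch under stated assumptions: the characteristic function is normalised to $\exp(-|t|^{\alpha})$, the density is recovered from the inversion formula $p(x) = \pi^{-1}\int_0^\infty \cos(tx)\,e^{-t^{\alpha}}\,dt$ by a trapezoidal rule, and the function names, the truncation point `t_max` and the grid are illustrative choices that do not come from the text.

```python
import math

def symmetric_stable_density(x, alpha, t_max=40.0, steps=20000):
    """Approximate p(x) = (1/pi) * int_0^inf cos(t*x) * exp(-t**alpha) dt
    by the composite trapezoidal rule on [0, t_max] (assumed truncation)."""
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end weights
        total += weight * math.cos(t * x) * math.exp(-t ** alpha)
    return total * h / math.pi

def is_nonincreasing(values, tol=1e-9):
    """True if the sequence never increases by more than tol."""
    return all(b <= a + tol for a, b in zip(values, values[1:]))

# Grid x = 0, 0.25, ..., 5 on the positive half-line; by symmetry this
# is enough to see a single mode at the origin.
grid = [0.25 * k for k in range(21)]
dens_15 = [symmetric_stable_density(x, 1.5) for x in grid]
dens_10 = [symmetric_stable_density(x, 1.0) for x in grid]  # Cauchy case

print(is_nonincreasing(dens_15), is_nonincreasing(dens_10))
```

For $\alpha = 1$ the computed values can be compared with the Cauchy density $1/\{\pi(1+x^2)\}$, which gives a simple check on the quadrature.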
74 STABLE DISTRIBUTIONS Chap. 2 C) Stable distributions with |/?| = 1. Consider first the case a < 1. For x>0, Lemmas 2.5.1 and 2.5.2 show that p'x{x;ol, l) = 7r-1 where a is the unique root of &((/>) =0. Let x0 be any zero of p'x(x; a, 1). Then n p'xX{x0; ot, 1) < — —1 exp{ — Xq a(<fi)}b[(fi)d(f) = 0 , n A -a) Jo B.5.13) from which it follows that p'x(x; a, 1) vanishes exactly once on @, oo). Since p(x; a, l) = 0inx<0, this shows that p(x; a, 1) is unimodal. For a> 1, the same argument goes through using Lemma 2.5.3. Thus we have proved that the stable distributions with a#l and j8= ±1 are all unimodal. In fact, we have proved more (and will need the stronger result later): A) ifa<l the function p^.(x; a, 1) is zero in (— oo, 0] and has just one zero, and that simple, in @, oo), and B) for a> 1 the function is non-zero in [0, oo] and has one simple zero in (—oo,0). D) Completion of the proof of Theorem 2.5.3 A) a<l. It suffices to take O</J<1. Then f(t;a,P) = f{at;a,O)f{bt;a,l), where a = sin sin 1 sin
2.5. UNIMODALITY OF STABLE LAWS 75 so that by B.5.4), r oo p'x{x;cc,P) = a2b p'x{a{x-t); a, 0) p(bt; a, l)dt. Jo Because p(x; a, 0) is unimodal this implies that p'x(x; a, /?)>0 in x^O. Denote by xo = xo(P) the smallest zero of p'x(x, a, /?), so that xo@)=0 andxo(j8)>0fbr 0>O. We prove unimodality by showing that D = {(x, P);x>xo{P), 0^1, p'x{x; a, /})>()} is empty. Let D be the closure of D. From the asymptotic expansion of p'x(x; a, /?) for large x it follows that D is bounded, and contains at most a finite number of points of ft = 0 and none of of /? = 1. It is not difficult to verify (cf. Theorem 2.3.3) that the function A(T,fi) = x2p'x(x;a,P), B.5.14) where x = e~\ P = 2/j,/tz, is harmonic in — oo<t<oo, 0<fi<^n. The mapping (x, /?)->(t, fi) maps the compact set D into a compact set D in this strip which meets the boundary at most a finite number of points. On the boundary of D, A{z, /i)=0, which implies that A(t, h) = 0 in D, since A is harmonic. In particular, if D is non-empty, a point of D is map- mapped into a point with A (t, /x) = 0, which contradicts the definition of D. This completes the proof for a < 1. B) a>l. From the asymptotic expansion for x->0 it follows that, if jg^O, p'x{0; a, /?)#0. Thus on the boundary of the domain 0<x<oo, \P\ < 1, the function p'x{x; a, /?) vanishes only twice, at @, 0) and (x0, 1), where x0 is the unique zero of p'x{x; a, — 1). The remainder of the argu- argument goes through as in A), using the fact that B.5.14) defines a function analytic in —oo<t<oo, — jn<fi<jn. n C). a= 1. Use the representation {ro / itu \ c ol iyt + ] (eitu-1 - j-^jj (T^pr d" +
to show that

$$f(t; 1, c_1, c_2) = \lim_{n\to\infty} f(t; 1 - n^{-1}, c_1, c_2),$$

and use Theorem 2.5.1. •

§ 6. Domains of attraction

Let $X_1, X_2, \ldots$ be a sequence of independent random variables with the same distribution function $F(x)$, and set

$$Z_n = \frac{X_1 + X_2 + \ldots + X_n - A_n}{B_n}. \tag{2.6.1}$$

If, for a suitable choice of the constants $A_n$, $B_n$, the distribution of $Z_n$ converges weakly to a non-degenerate distribution $G(x)$, we say that $F(x)$ is attracted to $G(x)$. The set of all distributions attracted to $G(x)$ is called the domain of attraction of $G(x)$, and Theorem 2.1.1 shows that only stable laws have non-empty domains of attraction.

Theorem 2.6.1. In order that a distribution $F(x)$ belong to the domain of attraction of a stable law with exponent $\alpha$ $(0<\alpha<2)$, it is necessary and sufficient that, as $|x|\to\infty$,

$$F(x) = \frac{c_1 + o(1)}{|x|^{\alpha}}\, h(|x|) \quad (x<0), \qquad 1 - F(x) = \frac{c_2 + o(1)}{x^{\alpha}}\, h(x) \quad (x>0), \tag{2.6.2}$$

where the function $h(x)$ is slowly varying in the sense of Karamata (see Appendix 1) and $c_1$ and $c_2$ are constants with $c_1, c_2 \geq 0$, $c_1 + c_2 > 0$, related to the stable law by (2.2.1).

Proof. It follows from Theorems 1.7.3 and 2.2.1 that $F$ will belong to the domain of attraction of a stable law with index $\alpha$ $(0<\alpha<2)$ if and only if, for some choice of the constants $B_n$,

$$\lim_{n\to\infty} nF(B_n x) = c_1 |x|^{-\alpha} \quad (x<0), \tag{2.6.3}$$

$$\lim_{n\to\infty} n\{1 - F(B_n x)\} = c_2 x^{-\alpha} \quad (x>0), \tag{2.6.4}$$
2.6. DOMAINS OF ATTRACTION 77 lim lim sup n{( x2dF{Bnx) -( \ xdF{Bnx)) J = 0 . B.6.5) We first prove that these three conditions follow from B.6.3). For x>0 we write and define Bn to be the smallest value of x for which B.6.6) Then lim Bn = oo , n->oo and so «^-x-. B.6.7) X(Bn) It therefore follows from B.6.2) that and this in turn implies that B.6.3) and B.6.4) are satisfied. In order to verify B.6.5), we note that, since the expression there is non- negative, it suffices to show that lim lim sup n [ x2dF {Bn x) = 0 . B.6.8) 2-»0 n-»oo J |x| <e Integrating by parts, we have x2dF(Bnx) = 2B;2 \xX(x)dx-e2X(Bne). B.6.9) \x\<e JO We may clearly assume that sBn> 1; let s be the integer with The results of Appendix 1 show that
78 STABLE DISTRIBUTIONS Chap. 2 hm sup bk 1 so that constants A{ can be found such that B s f2k+1 h(x) o fc=o hk * s r 2k~ hBk) J 2k = i(c1+c2) + A4X(?Bn)Bn2?2. B.6.10) Combining B.6.9) and B.6.10) we have n [ x2dF(Bnx) ^C- \x\<? We have already shown that lim and by a property of slowly varying functions, lim 2 'n • oo Thus, from B.6.11), lim lim sup n [ x2 dF{Bnx) ^ lim A + 2AA) s2~a=0. Conversely, we have to show that, if B.6.3), B.6.4) and B.6.5) are assumed, then B.6.2) follows. It suffices to prove that
(2) for every $k>0$,

$$\lim_{x\to\infty} \frac{\chi(x)}{\chi(kx)} = k^{\alpha}.$$

Fix $x>0$ and, for large $y>0$, take $n$ so that

$$B_n x \leq y < B_{n+1} x.$$

Then

$$\frac{(n+1)F(-B_{n+1}x)}{n\{1-F(B_n x)\}} \cdot \frac{n}{n+1} \leq \frac{F(-y)}{1-F(y)} \leq \frac{nF(-B_n x)}{(n+1)\{1-F(B_{n+1}x)\}} \cdot \frac{n+1}{n}$$

and

$$\frac{(n+1)\chi(B_{n+1}x)}{n\chi(B_n kx)} \cdot \frac{n}{n+1} \leq \frac{\chi(y)}{\chi(ky)} \leq \frac{n\chi(B_n x)}{(n+1)\chi(B_{n+1}kx)} \cdot \frac{n+1}{n}.$$

As $y\to\infty$, $n\to\infty$, and (2.6.3) and (2.6.4) therefore imply that

$$\lim_{y\to\infty} \frac{F(-y)}{1-F(y)} = \frac{c_1}{c_2}$$

and

$$\lim_{y\to\infty} \frac{\chi(y)}{\chi(ky)} = k^{\alpha}.$$

Theorem 2.6.2. The distribution function $F(x)$ belongs to the domain of attraction of a normal law if it satisfies one of the following conditions:

(1) $F$ has finite variance,

(2) for $x>0$, $\chi(x) = 1 - F(x) + F(-x) = x^{-2}h(x)$, where $h(x)$ is slowly varying.

Proof. From the results of § 1.7, $F$ is attracted to a normal law if and only if, for some choice of the constants $B_n$,

$$\lim_{n\to\infty} n\chi(B_n x) = 0 \tag{2.6.12}$$

for all $x>0$, and
80 STABLE DISTRIBUTIONS Chap. 2 lim nB~2( I x2dF{x) -I f xdF(x)Y) = 1 B.6.13) n-°o \J|x|<?Bn \J|x|<?Bn / / for some ? > 0. Suppose first that 00 .2, x2dF(x)< oo , 00 and set a= p xdF(x), g2= [ {x-aJdF{x). — oo It is then easy to check that B.6.12) and B.6.13) are satisfied with Bn = Suppose on the other hand that 00 2. x2dF(x)=oo. 00 We first show that ([ xdF{x))=o([ x2dF{x)) B.6.14) \}\X\<2 ) \)\X\<Z J as z-* oo. To do this, choose a positive even function \[/ (x), increasing and unbounded in x>0, such that /= ( \j/{xJdF{x)< oo . J— oo Then \ 2 xdF(x)| ^ \ \j/{xJdF(x) \ {x/\Js{x)JdF{x) = showing that B.6.13) is equivalent to the condition lim nB~2 f " x2dF{x)= 1 . B.6.15) n-"oo J If the function H is defined by H(z)=[ x2dF(x), J -2
2.6. DOMAINS OF ATTRACTION 81 then we prove that B.6.12) and B.6.13) imply that, for all /c>0, r H(kz) i It is sufficient to prove this for k> 1, and since H(z)=- \Z x2dX(x), Jo it suffices to show that, as z->oo, - J* x2dX(x) = o {- j* x2dX(z^j . B.6.16) For any z, define n so that Bn^z^Bn+1. Then B.6.13) shows that, for large z, o Jo-- From B.6.12) and the fact that Bn+l/Bn-+l, we have for large z, kz rkBn+i \ x Bn = o(n-1B2). Thus H(z) is a slowly varying function. Conversely, suppose that H(z) is slowly varying. Then lim z and we define Bn as the largest value of z for which Then J3n->oo and fBn lim nB; 2 H (Bn) = nB; 2 T" x2 dX (x) = 1 . B.6.17) n-oo J0
82 STABLE DISTRIBUTIONS Chap. 2 Since H is slowly varying, ^oo H(Bn) so that lim nB~2 \ Moreover, nx{xBn) = n 0 r 00 dx JxBn 00 s = 0 y-2R-2 dx)=l H(xBn) ¦ J 2sxBn I 2'2s- s=0 Vdx(x)< fBs+1xBn) H(xBn) HBs+1 B.6.18) By Karamata's theorem (Appendix 1) H(z) has a representation jj2^j B.6.19) where A is a constant and e(u)->-0 as «-> 00. Thus for sufficiently large n, H{2s+1xBn) „ (f2I+lrf» e(u) _, ) , -^ r^ < 2 exp { -^- dw J ^ 2is, ^W) PUxBn l+ii j^ and as n^-oo, = 1+0A). Hence from B.6.17) and B.6.18) it follows that, for all x, lim nX{xBn) = 0. n—* oo Thus B.6.12) and B.6.13) are together equivalent to the statement that H is slowly varying. We now prove that if h(x) = x2x{x) is slowly varying, then so is H(z).
2.6. DOMAINS OF ATTRACTION 83 Integrating by parts, we find that H(z) = - \ x2dX{x)= -h{z) + f^dx. B.6.20) Jo J o x From Karamata's theorem we can choose z1=z1(z)<z so that lim lim 2->00 Then Jo and so sup z X sup — ou < h(x) h(z) z h(x ¦ 1 x = 1, ^dx; dx >ih(z) log (z/z,), H(z)=\Z Jo For any k>0, as z^-oo, ^ 2 log kh(z) = so that Collecting these results together, the theorem is proved. • It is clear that, when the variance is finite, H(z) will be slowly varying, and thus the theorem may be expressed in the following way. The distribution function F(x) belongs to the domain of attraction of a normal law if and only if 2 H(z) = c2dF(x) B.6.21) J -2 is a slowly varying function.
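A concrete case may help to fix the meaning of the criterion (2.6.21). The sketch below assumes, purely for illustration, the symmetric density $|x|^{-3}$ on $|x|\geq 1$, which has infinite variance; for it $H(z) = 2\log z$ when $z>1$. The function names and the bisection bounds are hypothetical choices, not taken from the text. The code evaluates the ratio $H(kz)/H(z)$ for growing $z$ and solves $nB_n^{-2}H(B_n) = 1$ for the normalising constant.

```python
import math

def H(z):
    """Truncated second moment int_{-z}^{z} x^2 dF(x) for the assumed
    density |x|**-3 on |x| >= 1; equals 2*log(z) for z > 1."""
    return 2.0 * math.log(z) if z > 1.0 else 0.0

def ratio(k, z):
    """H(kz)/H(z); slow variation means this tends to 1 as z grows."""
    return H(k * z) / H(z)

def normalising_constant(n, lo=2.0, hi=1e12, iterations=200):
    """Root of n*H(z)/z**2 = 1 in (lo, hi), by bisection.
    On z >= 2 the left-hand side is decreasing, so for n >= 3 this is
    the largest root."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if n * H(mid) / mid ** 2 > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

ratios = [ratio(2.0, 10.0 ** p) for p in (3, 6, 12)]
B = normalising_constant(10 ** 6)
print(ratios, B)
```

Here the ratio equals $1 + \log k/\log z$, so the convergence to 1 is only logarithmic, and $B_n$ exceeds $n^{1/2}$ by a slowly growing factor.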
84 STABLE DISTRIBUTIONS Chap. 2 It may be shown in a similar way that the conditions of Theorem 2.6.1 are satisfied if and only if the function lim -^ = il. B.6.23) z-»oo \ ( ,,\«Jr/,.\ 2 H{z) = \x\adF{x) B.6.22) is slowly varying, and xadF{x) (-xfdF(x) These conditions imply that h(x) ~ -?--x(x) = o{H(x)} . B.6.24) c1-\-c2 Conversely, the methods used in the proof of Theorem 2.6.2 can be used to show that B.6.24) implies B.6.2), B.6.22) and B.6.23). This permits a unification of Theorems 2.6.1 and 2.6.2. Theorem 2.6.3. In order that a distribution function JF(x) belong to the domain of attraction of a stable law with index a, it is necessary and sufficient that 2-a Theorem 2.6.4. IfF(x) belongs to the domain of attraction of a stable law with index a, then for any d (O^d<oc), \x\ddF(x)< oo. Proof The result is obvious if the variance is finite. If it is infinite and a< 2, then Theorem 2.6.1, together with the results of Appendix 1, shows that for any e>0. Taking e sufficiently small, we have
2.6. DOMAINS OF ATTRACTION 85 \x\ddF(x) = If rf = 2, use the formula B.6.21). • Theorem 2.6.5. In order that the distribution with characteristic function f(t) belong to the domain of attraction of the stable law whose characteristic function has logarithm where cc, /?, c, co(t, a) are as in Theorem 2.2.1, it is necessary and sufficient that, in the neighbourhood of the origin, log/@ = iyt-c\t\'R(t) (l " tf j*j «(t «)) , where y is a constant, and h(t) is slowly varying as t-*0. Proof. To prove necessity, first note that, in the neighbourhood of the origin, where that branch of the function log is taken with log 1 =0. If, for G1(x)=l-F(x), G2(x) = F(-x), then 1-/@= = r (eta- VdG^x) + {°° (e~ta- l)dG2(x). B.6.25) Jo Jo The asymptotic behaviour of Gx and G2 for large x is given by B.6.2); from this we deduce the behaviour of their Fourier transforms, and thus that of 1 —f{t), as ?->0. Further calculations depend on the value of a; we distin- distinguish four cases. A) 0<a< 1. If suffices to examine the first integral on the right-hand side of B.6.25).
86 STABLE DISTRIBUTIONS Chap. 2 Integrating by parts, we have (eitx-l)dGl(x)=it o Jo where, by Theorem 2.6.1 ,h1(x) is slowly varying as x-> go. (We are assum- assuming, without loss of generality, that cx ^0.) The analysis of these integrals requires the following lemma. Lemma 2.6.1. If h(x) is a positive slowly varying function (as x->co), and x~ah(x) is monotone decreasing, then as ?-*¦(), v I v X j X a v ' j v I va X J X B.6.26) Proof of the lemma. Consider for example the integral involving sin x, and split it into four parts: + + d + A2t By the second mean value theorem, 00 • h(x/t). sin x - g - dx = lim ¦ h(x/t). sin x —^—- dx since as t->0 for fixed A, h(A/t) ^ { B.6.27)
2.6. DOMAINS OF ATTRACTION 87 From B.6.19), for all xe(<5, A2t), h(x/t) x where s can, by suitable choice of A2, be made arbitrarily small. Therefore sin x 6 Hx/t) . —— sin xdx dx^Sh(l/t). B.6.28) If x<A2t, the function h(x/t) is bounded, so that *2th{x/t) . , sin xdx B.6.29) It is easy to see that bounds analogous to B.6.27)-B.6.29) hold also for the integrals of sin x/xa. Finally, uniformly in S < x < A x, so that B.6.30) We now take Al=Al(t)-^oo and S = S(t)-^O (t-^0) sufficiently slowly for these bounds to remain true. Combining them, we have 00 . /j(x/t) sin x o *a An exactly similar argument deals with the integral involving cos x. Remark. By Euler's formula [12], f00 fsin xl dx _ _,. , (cos^ J 0 | COS X j Xa and consequently " (sin x| fe(x/t) . _ , fcosl I > ()^ o [cosxj xa v [sin i unu\ v ' ' B.6.31) Returning now to the proof of the theorem (case 1), we have from B.6.25) and B.6.31),
88 STABLE DISTRIBUTIONS Chap. 2 l-/@ = r(l-a)cos(i7ra)(c1+c2)/2(|rr1)|?|ax x\ lPW\ \ l ui where /? = (c1—c2)/(c1 + c2). B) l<a<2. In this case f00 |x|dF(x)<oo, J - oo and there is no loss of generality in supposing that xdF(x) = 0 . — 00 Then (l-eitx + itx)dF(x) = = it r (e^-^G^dx-it ((e-I'xt-l)G2(x)dx. Jo Jo By the same method as before, one proves the following lemma. Lemma 2.6.2. 7/1 <a<2 and the conditions of Lemma 2.6.1 are satisfied, then as t->0, oo glx_ 1 r oo e'X_ 1 h(A)dMiA) -3T- 0 x = e"i't(a-1)r(l-a)/2(l/f). B.6.32) Applying this result, we have 1 -f(t) =-chA/0111" (l + f/J ~ tan ftTra)) ,
2.6. DOMAINS OF ATTRACTION 89 where c = (c1 + c2) cos(jna)r(l—a), P=(c1-c2)/(c1+c2). C) a = 1. In this case 1 —f(t) differs by a term iy't from B.6.33) Integrating by parts and arguing as in the proof of Lemma 2.6.1, we have for t>0, itx ;t(l+x2)-2ix2A/i(x) A + x2J / x r °° eIX it\ —h(x/t)dx+iy1t+O(t2) = J t x ith(l/t)\ —dx(l + o(l)) + iyit J t x iy1t, B.6.34) Since f °° sin x , , f00 cosx dx = |yr, dx= -log ? + 0A). Jo * Jr x Carrying out similar computations for the second integral, and examining in the same way the case t<0, we come to the result that where c = %k(c1 + c2), P = (cl-c2)/(cl
90 STABLE DISTRIBUTIONS Chap. 2 D) a = 2. If F(x) has finite variance a2, and if xdF(x) = a , J— 00 then = iat-±<r2t2(l + o(l)). B.6.35) Suppose on the other hand that the variance is infinite, and write as be- before X(x)=l-F(x H(x)=\Xu2dX(u). Jo It was shown in the proof of Theorem 2.6.2 that h(x) = o(H(x)). By the methods which have already been repeatedly used, it is easy to show that f00 (e<*-l-itx)dX(x) Jt-1 B.6.36) and therefore c\t\~l \ogf(t)= (eitx-l-itx+$t2x2)dG1(x)+ Jo o M-1 CM-1 -±t2 x2dX(x) + o(t2H(\t\-1)). B.6.37) Jo Moreover, = O(t2h(\t\~1)) + it ( Jo
2.6. DOMAINS OF ATTRACTION 91 so that finally This concludes the proof of the necessity; indeed we have proved some- somewhat more, that when 0<a<2, then as ?-*¦(), B.6.38) and when a = 2, |/(t)| -exp^tfdr1)}. B.6.39) We now turn to the proof of the sufficiency. We write A(t) = \t\*h(t), and note that lim l(t) = 0 . The normalising constants Bn are chosen as follows, ; X{t) = c/n}, (this definition being meaningful for large n, since X(t) is continuous in a neighbourhood of t = 0). Then lim f(t/Bnf = = exp | - c \t |- (l + ifi ~ co (t, a)^j |, B.6.40) and the theorem is proved. • It was shown in § 2.2 that the normalising constants Bn determining attrac- attraction to a stable law of index a were necessarily of the form Bn = nllah(n), where h(n) is slowly varying. The classical theorems of probability (de Moivre-Laplace-Levy) show that, for convergence to the normal law, the most interesting case is that in which Bn = arv for a constant.
On the other hand, any stable law $G$ of exponent $\alpha$ belongs to its own domain of attraction, with $B_n = an^{1/\alpha}$. This suggests the following definition.

A distribution belongs to the normal domain of attraction of a stable law $G$ with exponent $\alpha$ if it is in the domain of attraction of $G$ and if the normalising constants are given by $B_n = an^{1/\alpha}$, where $a$ is a constant.

Normal domains of attraction are characterised by the following theorems.

Theorem 2.6.6. In order that the distribution $F(x)$ belong to the normal domain of attraction of the normal distribution it is necessary and sufficient that it have finite variance $\sigma^2$, and then $B_n = \sigma n^{1/2}$.

Proof. The sufficiency follows from Lévy's theorem. To prove the necessity take $B_n = an^{1/2}$ and assume without loss of generality that

$$\int_{-\infty}^{\infty} x\,dF(x) = 0.$$

It then follows from Theorems 2.6.2, 2.6.5 and equation (2.6.39) that

$$\lim_{n\to\infty} a^{-2} H(an^{1/2}) = 1.$$

This is only possible if

$$H(\infty) = -\int_{0}^{\infty} x^2\,d\chi(x) = \int_{-\infty}^{\infty} x^2\,dF(x) = \sigma^2 < \infty,$$

and $a = \sigma$. •

Theorem 2.6.7. In order that the distribution $F(x)$ belong to the normal domain of attraction of the stable law $G(x)$ with exponent $\alpha$ $(0<\alpha<2)$ and given constants $c_1$, $c_2$, with $B_n = an^{1/\alpha}$, it is necessary and sufficient that
2.6. DOMAINS OF ATTRACTION 93 F(x) = (c1a« + a1(x))\xr\ (x<0), F(x)=l-(c2aa + a2(x))x-\ (x>0), } where a?(x)->0 as |x|->oo. Proof. The sufficiency is immediate. To prove the necessity, note that from B.6.35), for small t, lim x(ani/a\t\- >~aUfn = |t|a, t-0 which is only possible if B.6.41) holds.
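The difference between the domain of attraction and the normal domain of attraction can be seen in the scaled tails $n\{1 - F(B_n x)\}$ with $B_n = an^{1/\alpha}$. The sketch below is an illustration under assumptions that are not in the text: $\alpha = 1.5$, $a = 1$, and two hypothetical distributions on $[1,\infty)$, a pure Pareto tail $x^{-\alpha}$ and a tail $x^{-\alpha}(1+\log x)$ carrying a slowly varying factor. For the first the scaled tail equals $x^{-\alpha}$; for the second it grows like $\log n$, so the normalisation $an^{1/\alpha}$ does not work for it.

```python
import math

ALPHA = 1.5  # assumed exponent, chosen only for illustration

def tail_pure(x):
    """1 - F(x) = x**-ALPHA for x >= 1 (Pareto tail, constant h)."""
    return x ** -ALPHA if x >= 1.0 else 1.0

def tail_log(x):
    """1 - F(x) = x**-ALPHA * (1 + log x) for x >= 1; the factor
    1 + log x is slowly varying but not asymptotically constant."""
    return x ** -ALPHA * (1.0 + math.log(x)) if x >= 1.0 else 1.0

def scaled_tail(tail, n, x, a=1.0):
    """n * (1 - F(B_n * x)) with B_n = a * n**(1/ALPHA)."""
    return n * tail(a * n ** (1.0 / ALPHA) * x)

x = 2.0
pure = [scaled_tail(tail_pure, n, x) for n in (10 ** 4, 10 ** 6, 10 ** 8)]
logt = [scaled_tail(tail_log, n, x) for n in (10 ** 4, 10 ** 6, 10 ** 8)]
print(pure)  # stays at x**-ALPHA
print(logt)  # keeps growing with n
```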
Chapter 3

REFINEMENTS OF THE LIMIT THEOREMS FOR NORMAL CONVERGENCE

§ 1. Introduction

In this chapter we consider a sequence $X_1, X_2, \ldots$ of independent, identically distributed random variables belonging to the normal domain of attraction of the normal law. As shown in § 2.6 (Theorem 2.6.6), the $X_j$ necessarily have a finite variance $\sigma^2$. We shall assume that $E(X_j) = 0$; then necessarily the distribution $F_n$ of

$$Z_n = (X_1 + X_2 + \ldots + X_n)/\sigma n^{1/2} \tag{3.1.1}$$

converges to the normal distribution $\Phi$ with zero mean and unit variance. Indeed, with

$$\Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-u^2/2}\,du,$$

we have

$$R_n(x) = F_n(x) - \Phi(x) \to 0$$

as $n\to\infty$, uniformly in $x$.

In § 3 we give an asymptotic formula for $R_n(x)$ in terms of $n^{-1/2}$. In the later sections the behaviour of $\sup_x |R_n(x)|$ for large $n$ is the object of study.

The symbols $f$, $f_n$ and $v$ will denote the characteristic functions corresponding to the distributions $F$ (the common distribution of the $X_j$), $F_n$ and $\Phi$ respectively (so that $v(t) = e^{-t^2/2}$). We shall also write

$$\sigma^2 = E(X_j^2), \qquad \alpha_r = E(X_j^r), \qquad \beta_r = E|X_j|^r.$$

§ 2. Some auxiliary theorems

This section is devoted to some important properties of the characteristic functions $f_n(t)$.
3.2. SOME AUXILIARY THEOREMS 95 Theorem 3.2.1. If /?3 is finite, then A) for\t\^Tn = <73 l 6 C.2.1) B) where gn(t) = l+n~iP1(it), and lim S(n) = 0, C) for\t\^Tn3, where lim dl(n) = 0 C.2.2) C.2.3) . A) Using the expression exp(^)dF(x), where |0| ^ 1, it is easy to show that, in \t\ ^ Tn, |/(t/crn})| >f|. Therefore in this domain, where by Lyapunov's inequality, d3 . „, Since fn(t) = exp {n log (t/en*)}, it follows that B4/25J
96 LIMIT THEOREMS FOR NORMAL CONVERGENCE REFINEMENTS Chap. 3 \Mt)-v(t)\ - J/2 exp - 1 where we have used the obvious inequality Finally, if \t\^Tn, then +2 11 4. |3 o +2 2 6a3n> 4 ' and C.2.1) is proved. B) Since f{tlan^)-\=-~n we have o \t\: n- In 6a5 n* Vn Using C.2.4) again, we have, for 11\ ^ Tn3 , where But «3(ftK 6cr3 n^ (itK C.2.4) C.2.5) C.2.6) whence C.2.2) follows. C) The proof of C.2.3) is exactly similar. • By more intricate arguments (for which see, for example, [48] page 219), the second part of the theorem may be strengthened as follows.
3.3. THE DEVIATION Rn(x) 97 Theorem 3.2.2. J/& is finite for some s ^ 3, then for \ 1| ^ Tns = n* <r3/8s&1/s, s-2 . . . . i +\t\^-"}e-*\ C.2.7) where c(s) depends only on s, S(n) depends only on n and lim S(n) = 0, and where the coefficients ckj are polynomials in the variables (ocjcf), r = 3, 4, ..., k-j+3. § 3. The deviation Rn(x) Under the assumptions made in this chapter, In this section we investigate the asymptotic behaviour of Rn(x). The character of the argument and its results depends on whether or not the Xj have a lattice distribution. We first consider the non-lattice case. Theorem 3.3.1. If the independent random variables Xj have the same non- lattice distribution with finite third moment, then Fn(x)-*(x) = n~* ^|^ A -x2)e-t*2 + o(n-t), C.3.1) uniformly in x. The proof follows on combining the expansion of/n presented in § 2 with the following lemma. Lemma 3.3.1. IfF is a non-lattice distribution, then for each co>0, there exists a sequence X(n) with X(n)-^co as n->oo, such that I = \fn(t)\r1dt = o(e-inV2). C.3.2)
98 LIMIT THEOREMS FOR NORMAL CONVERGENCE REFINEMENTS Chap. 3 Proof. If F satisfies Cramer's condition (C) lim sup |/@l < 1 , r-*oo (cf. § 1.4), then there is a positive constant c (depending on co) such that | f(t) | < e ~c for all 111 ^ co. Choosing X (n) = n, we have On the other hand, suppose that lim sup |/@l = 1 • Since the distribution is non-lattice, \f(t)\ < 1 for all t (see § 1.4), and hence a(t)= {1— sup defines a continuous, non-decreasing function, with lim a(t)= oo . t—* oo For any X(n), m . fA(n)l /. IV. 7= If a(n)^n*, set X(n) = n, so that I ^ A -n"*)" log(n/co) = o(Q-*n If on the other hand a(n) >ni, then Now a(t) takes every value larger than a(co), and hence for sufficiently large n we can define X(n) by the equation a\_X(n)] = ni. Then X(n)< n and I ^ A -«-*)" log (n/co) = o{e Proof of Theorem. We have to show that, uniformly in x, where
3.3. THE DEVIATION Rn(x) 99 G(x)=*(x)-n a3 A-: According to Theorem 1.5.2, y Mt)-g(t) dt, C.3.3) where ^l = sup|G'(x)|, g(t) is the Fourier-Stieltjes transform of G, and we take T=l(n)ni, where X(ri) is determined by the lemma. It therefore suffices to prove that fn(t)-g(t) For sufficiently large n, C.3.4) does not exceed Ii ) • C-3.4) i = <T3n*/24/?3, and then the integral in f3, where \fn(t)-g(t) ~Tn3 fn(t) 9n(t) dt, dt, dt. Now so that, by Theorem 3.2.1, " By Lemma 3.3.1, \f"(t)\r1dt=o(n-±), and clearly C.3.5) C.3.6) C.3.7)
100 LIMIT THEOREMS FOR NORMAL CONVERGENCE REFINEMENTS Chap. 3 Combining C.3.5), C.3.6) and C.3.7) to obtain C.3.4), we complete the proof. • It is however easy to see that C.3.1) cannot apply to random variables with a lattice distribution. Suppose for instance that Xj takes the values +1 with probability j each, so that Xl-\-X2 + ... + Xn has a binomial distribution. Then Fn(x) has a discontinuity at x = 0, whose magnitude is by the de Moivre-Laplace theorem (cf. Theorem 4.2.1) asymptotically B7rn)~i. Thus Fn(x) cannot be approximated by any continuous function to an accuracy better than n~*. To obtain an analogue of C.3.1) we introduce the discontinuous function S(x)=[x]-x+|, where [x] denotes the integer part of x. For the binomial example, it is possible to compute the expression *x2 + o(n-*). C.3.8) This is a special case of the following general result. Theorem 3.3.2. Let X1,X2,... be independent, identically distributed lattice random variables, taking values in the arithmetic progression {a + kh; k = 0, + 1, ...}, (h being maximal). Then, uniformly in x, =>-+*->. <3-3.9» Proof Denote the right-hand side of C.3.9), without the error term, by G(x), and its Fourier-Stieltjes transform by g(t). If n is sufficiently large, Theorem 1.5.3 applies to give t n where ,4 = sup |G'(x)| < oo. We have therefore to show that
3.3. THE DEVIATION R,(x) 101 t = o(n~*). C.3.10) We first compute the characteristic function dn(t) of Expanding S(x) in a Fourier series, we have S(x) = Yj — sm B7rvx), V= 1 """ so that h ^ 1 . x -sin {Tv<m2x — v=l V7r with t = 2n/h. Therefore dn(t)= rx Yj ~ exp(itx — jx2)sin(TV<mix — Tvan)dx = oo p — izvan where the symbol X' indicates that v = 0 is excluded from the range of summation. We now go on to evaluate the integral C.3.10), expressing it in the form h + h + h' where /,- is the integral over the interval Aj, and Ai = (-n, -^ ^2 = (~^xo A3 = fa<mi, n). A) Consider first the integral I2, and suppose that Tn3<^xarv (if not, the calculation is similar, but simpler). Since h is maximal, there exists a constant c1>0 such that, in the interval Tn3^|t| ^jtott* , \fn(t)\ = \f(t/ani)\n = o(e-^). C.3.11) In this interval, for some c2>0. Finally, using C.3.2) we have
102 LIMIT THEOREMS FOR NORMAL CONVERGENCE REFINEMENTS Chap. 3 /,= Tn3 -Tn, Tn3 -Tn3 Mt)-g(t) dn(t) o(n~*) = t 1 °° 1 i Z-i U.I B) ' — Tn, C.3.12) and simple calculations give L = fc=l f*(t)-dH(<m*t) where r=[<r xt 1ni — |] and (k-i)t dt = dt kxcn2 Because of C.3.1) and C.3.11), Tn Tn C.3.13) C.3.14) = O(n-i). C.3.15)
3.3. THE DEVIATION Rn(x) 103 Moreover, Therefore Jk = O(l/kn), and C3.13) gives A similar argument shows that so that C.3.10) is proved. • If the random variables Xj have finite movements of order k>3, we can extend the asymptotic expansions C.3.1) and C.3.9) down to terms of order n~i{k~2). Theorem 3.3.3. If the independent random variables Xj have /3k finite for some /c>3, and satisfy Cramer's condition r->oo l, (C) then, uniformly in x, X n~iJ Pj(-<P) + o(n-*k-2)). C.3.17) Here JJ+2S (ir2s4j) s=l = *(x)Qj(x), where Qj(x) is a polynomial in x, and c{sj) is a polynomial in the moments av. There is a connection between Pj( — $) and the function Pj(it) of § 2: eitxdPj(-<P(x)). 00 We omit the proof, which is similar to that of Theorem 3.3.1 and uses C.2.7) and Theorem 1.5.2, (see [48], page 235).
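The leading $n^{-1/2}$ term of these expansions, that is the term given in (3.3.1), can be checked numerically in a case where $F_n$ is known in closed form. The sketch below assumes, for illustration only, $X_j = E_j - 1$ with $E_j$ standard exponential, so that $\sigma = 1$, $\alpha_3 = 2$ and $X_1+\ldots+X_n+n$ has an Erlang distribution; the function names are hypothetical.

```python
import math

def erlang_cdf(n, x):
    """P(E_1 + ... + E_n <= x) for independent Exp(1) variables,
    computed as 1 - sum_{k<n} exp(-x) x^k / k! in log space."""
    if x <= 0.0:
        return 0.0
    log_x = math.log(x)
    upper = sum(math.exp(-x + k * log_x - math.lgamma(k + 1))
                for k in range(n))
    return 1.0 - upper

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def F_n(n, x):
    """Distribution function of Z_n = (X_1+...+X_n)/sqrt(n), X_j = E_j - 1."""
    return erlang_cdf(n, n + x * math.sqrt(n))

def first_correction(n, x, alpha3=2.0, sigma=1.0):
    """The n^{-1/2} term of (3.3.1):
    alpha3 / (6 sigma^3 sqrt(2 pi n)) * (1 - x^2) * exp(-x^2 / 2)."""
    return (alpha3 / (6.0 * sigma ** 3 * math.sqrt(2.0 * math.pi * n))
            * (1.0 - x * x) * math.exp(-0.5 * x * x))

n = 100
for x in (-1.5, 0.0, 0.5):
    plain_error = abs(F_n(n, x) - Phi(x))
    corrected_error = abs(F_n(n, x) - Phi(x) - first_correction(n, x))
    print(x, plain_error, corrected_error)
```

At $x = \pm 1$ the factor $1 - x^2$ vanishes, so the correction gives no improvement there; the points used above avoid that case.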
104 LIMIT THEOREMS FOR NORMAL CONVERGENCE REFINEMENTS Chap. 3 For lattice distributions the analogue of C.3.17) is considerably more complicated, and involves the functions Sj defined by the Fourier series S2j(x) =2 f B7rv)-2jcosB7ivx), v= 1 S2J+i{x)=2 v= 1 Furthermore, write v= 1 and d= 1 if v=l or 2 (mod 4), = -1 if v = 0or3 (mod 4). Theorem 3.3.4. Let Xj be independent, with the same lattice distribution taking values in the progression {a + kh; k = 0, + 1, ...}, h being maximal. U Pk is finite for some k, then uniformly in x, k~2 ( h y a, I—r x (<m*x _/na Vna The proof of this theorem also will be omitted. § 4. Necessary and sufficient conditions It follows from Theorems 3.3.1 and 3.3.2 that, whenever /?3<oo, |Fn(x)-4>(x)| = 0(n~*) - C.4.0) This raises the question of giving necessary and sufficient conditions for C.4.0) to hold. In this direction we have the following result. Theorem 3.4.1. In order that \Fn(x)-<P(x)\ = O(n-id), @«5<l), C.4.1)
3.4. NECESSARY AND SUFFICIENT CONDITIONS 105 it is necessary and sufficient that ) = O{z-d). C.4.2) Proof. Throughout this chapter the random variables X} have E(X,)=0 and E(Xf) = c1 < oo. Hence C.4.3) where lim y(t) = O . t-0 Near t = 0, the equation has only a finite number of solutions (except in the trivial case when the Xj have a normal distribution, which we exclude; see § 1.6), and therefore, for some positive e, y(f)#O, @<|t|<e). C.4.4) We first prove the theorem for symmetric random variables, for which the characteristic function f(t) is real. Lemma 3.4.1. For symmetric random variables Xj, C.4.1) holds if and only if t2\y{t)\dt = O{x3+d). C.4.5) o Proof. To prove the necessity of C.4.5), integrate by parts in the equation etxd(Fn(x)-0(x)), to obtain —e~it2 f00 f00 = e^ {Fn(x) - 0(x)} dx . -^ — 00 In other words, the functions
106 LIMIT THEOREMS FOR NORMAL CONVERGENCE REFINEMENTS Chap. 3 and r(x) = Fn(x)-<P(x), which belong to L2(— oo, oo), are Fourier transforms of one another. It is easy to compute that and are also Fourier transforms of one another, and Parseval's theorem therefore implies that f ip{t)I{F)dt = i ?{x)J(xjdx. (*) J — oo J — oo Thus C.4.1) implies that and therefore logn e ' {1— exp[ — jt2y(t/(jni)~]}dt = 0(n id). -logn Because of C.4.4), the integrand is of constant sign for sufficiently large n, so that o As n->co, the integrand is equal to uniformly in 0<f<l, so that = O{n-iC+S)). C.4.7) o Now choose n so that
3.4. NECESSARY AND SUFFICIENT CONDITIONS 107 Then \ r " /2 t2\y(t)\dt = o and C.4.5) is proved. To prove the converse, note that Theorem 1.5.2 implies that |F,,(x)-<P(x)| ^ i ^\fn(t)-e-*2\\t\-idt+ ^fj.. C-4.8) We choose T = don*, where 5 > 0 is sufficiently small that max \y(t)\^. C.4.9) By C.2.4), n*) | exp {- and so for \t\^T, using C.4.9), Therefore t/an Vz ] u2y{u)du = t-0 Jt Jo t-0 \\ u2y{u)du \d{r 1e"i' } t (. J 0 J Combining this with C.4.8) proves the lemma.
108 LIMIT THEOREMS FOR NORMAL CONVERGENCE REFINEMENTS Chap. 3 To complete the proof for the special case of symmetric summands, we have to express the condition C.4.5) in terms of F(x). Near t = 0 , f(t)-l=-Wt2(l+o(l)), so that = {f(t)-l}+O(t*). C.4.10) It has been remarked that y has constant sign in @, e); suppose for in- instance that y{t)<0 in this interval. Then from C.4.10), a2 ( ( {cos tu-1)dF{u)dt + O{x5) = Jo J - oo sin ux u2x2 1 dF{u) + 0(x5). 1 + oq \ UX O C.4.11) Consequently, C.4.5) is equivalent to the condition p.4.12) Lemma 3.4.2. The conditions C.4.2) and C.4.12) are equivalent. Proof. It is easily checked that, for all u, sin u . u2 u 6 ^ ' so that C.4.12) implies that MX 6 whence '
3.4. NECESSARY AND SUFFICIENT CONDITIONS 109 u2dF{u) = O{xd), J\u\>x - 1 which implies C.4.2) Conversely, suppose that C.4.2) holds, and write R{z) = [ u2dF{u). \u\>z From the inequalities .2 sin u u - 1 u and the condition R(z) =O(z d), it follows that 00 fs'mux u2x2 1 fs -oo V ux 6 x^ 5! ^ 3x2K(x~1)+x4 f uR(u)du = O{x2+d). • Jh<*-i The theorem is therefore proved for the special case in which the Xj have a symmetric distribution, and we now proceed to the general case. To prove that C.4.2) is necessary, consider the independent, symmetric random variables where Xnl, Xn2 are independent with distribution F. Clearly the charac- characteristic function of Yn is \f{t)\2, and the distribution function of isGn(x) = Fn(x)*{l-Fn(-x)}.
110 LIMIT THEOREMS FOR NORMAL CONVERGENCE REFINEMENTS Chap. 3 Thus, if C.4.1) is fulfilled for Fn, then <: \{Fn(x)-<P(x)}*{l-Fn(-x)}\ Then by Lemma 3.4.1, t2\Rey{t)\dt = C and as before Re log/(t)dt = o 2x2 C.4.13) Hence, exactly as in the symmetric case, the necessity of C.4.2) is proved. To prove the sufficiency we need the following lemma whose proof, being the same as that of lemma 3.4.1, will be omitted. Lemma 3.4.3. // = O(x3+d), C.4.14) \ o then Because of this lemma it suffices to prove that C.4.14) follows from C.4.2). We have already shown that C.4.2) implies that [* t2\Rey{t)\dt=O{x3+3) ; C.4.15) Jo we now show that this remains true if Re y(t) is replaced by Im y(t). Writing as before R(z)=[ x2dF(x), )\x\>z
MAXIMUM DEVIATION OF Fn FROM <P and using C.4.2), we have |Im!<72f2y(f)| = |Imlog/(f)| = = |Im f(t)\+O(t*) = (sin tu — tu)dF(u) \u\3dF{u) + o R{u)du+O{tA) o so that Jo Combining this with C.4.15), Remark. The reader will note that, in the symmetric case, the theorem is also true for 5 = 1. In general this is not true, and one must add to C.4.2) the additional condition which is, of course, automatically satisfied for symmetric F. § 5. The maximum deviation of Fn from <JP. Suppose that the random variables Xi have finite third moment. From what has already been proved, we know that
but so far we have no information on the influence of $F$ on the constant implied by the $O$ symbol. It is clear from Theorem 3.3.1 that this constant must depend on $F$; the crucial parameter turns out to be the ratio $\beta_3/\sigma^3$.

Theorem 3.5.1. If $\beta_3 = E|X_j|^3 < \infty$, then

$$\max_x |F_n(x)-\Phi(x)| \le \frac{C\beta_3}{\sigma^3 n^{1/2}}, \tag{3.5.1}$$

where $C$ is an absolute constant.

The proof is similar to that of Theorem 3.3.1: one applies Theorem 1.5.2 with $T = T_n = \sigma^3 n^{1/2}/5\beta_3$ and then uses Theorem 3.2.1 (1).

Clearly the smallest constant $C$ which one can put in (3.5.1) is

$$C_1 = \sup_{n,F}\left\{\frac{\sigma^3 n^{1/2}}{\beta_3}\max_x |F_n(x)-\Phi(x)|\right\},$$

where the supremum is taken over all $n$ and all distributions $F$ with zero mean and finite third moment. The exact value of the 'proper' constant $C_1$ is as yet unknown, but we shall calculate the 'asymptotically proper' constant

$$C_2 = \limsup_{n\to\infty}\,\sup_F\,\max_x\left\{\frac{\sigma^3 n^{1/2}}{\beta_3}|F_n(x)-\Phi(x)|\right\}.$$

Theorem 3.5.2. $C_2 = \dfrac{\sqrt{10}+3}{6\sqrt{2\pi}}$.

Proof. The results of § 3 show that
3.5. MAXIMUM DEVIATION OF Fn FROM <P 113 llm max{^ |FH(x)<P(x)|} ^ n-oo x I P3 J A>3 OB7C) for non-lattice distributions F, and that lim max I^|F.(x)-*(x)|} -(&)"* (^ + n-oo for lattice distributions of step h. Thus the problem reduces to that of finding a sharp upper bound for ha2 | oc31 among lattice distributions of step h. Lemma 3.5.1. For lattice distributions with step h, \ha2 < inf E\X— c|3 . C.5.2) c Proof. There is no loss of generality in taking h=l, and in supposing that the point c0 at which /?3(c) = E\X — c|3 attains its minimum is c0 = 0. Then since /?'3(c0) = 0 we have ) = (°°x2dF(x). C.5.3) Jo • Moreover, since EX2 ^a2, it suffices to prove that I" x2(|x|-|)dF(x)^0. C.5.4) J — 00 Now this inequality is trivially satisfied unless there is a discontinuity x0 of F in the interval (— |, |); since h = 1 there can be at most one such dis- discontinuity. Suppose without loss of generality that 0<x0<j. Then, be- because the jumps of Fare at the points xo + k(k = 0, ±1, ...) and because of C.5.3), we have x2dF(x), J — oo J—c r x3dF(x) > x0 I" x r Jo Jo
114 LIMIT THEOREMS FOR NORMAL CONVERGENCE REFINEMENTS Chap. 3 and adding these two equations, x2dF{x). • —oo Lemma 3.5.2. For lattice distributions F the lower bound Ao of those A for which C.5.6) is Ao= {A0)*+ 3} /6. C.5.7) Proof. There is no loss of generality in confining attention to those distri- distributions for which h = l, 0^=0, a3>0. Then the inequality A^A0 is equivalent to the assertion that, for all such distributions, )>0. C.5.8) Let us first remark that C.5.8) implies that A >?, since if A <? the function Va\x) = -^ \x\ ~'6X~1 is negative in x> — a= — j{^+ A)~1 , and it is easy to find a distribution with a!=0, a3>0 and h=l, concentrated on ( — a, oo], and C.5.8) is contradicted. If A>^, the function <\>A has two zeros, at Ax= — \{^+A)~l and at A2 = \{A — %)~1; it is negative on(A1, A2) and positive outside this inter- interval. If A2 — A1>1 we can find a distribution with h=l, oc1=O and a3>0 concentrated on the interval {Ax, A2), and C.5.8) is again contradicted. It follows therefore that so that We have therefore proved that and it remains to prove the reverse inequality.
3.5. MAXIMUM DEVIATION OF Fn FROM <P 115 We introduce the function |x-r|3dF(x) + i(°° (x-rKdF(x) , where A = {A0)^ + 3} /6, and denote by t0 the point at which \j/ attains its minimum. We may assume that a3(T0)= (otherwise replace F(x) by l—F( — x)). We distinguish two cases: A) a3(T0) = 0. By Lemma 3.5.1, r oo Aj83 — i |a3| ^ ^(t0) = A \x-xo\3dF(x)^2AG2>^G2 , J - oo which implies C.5.8) B) a3(T0)>0. We have ^'(ro) = CA + i) p (x-roJdF(x)-CA-i) @°(x-ToJdF(x)=0 J - oo J to C.5.9) Taking for simplicity t0 = 0, we have °° x3dF(x)>0, - oo and ^(to) = a(°0 |x|3dF(x)-i[°° xidF(x)^A/J3-±\a3\. C.5.10) J — oo J — oo In view of C.5.9) and C.5.10) it suffices to prove that follows from ° x2dF(x) = (A-i)(°°x2dF(x). C.5.11) -oo JO Since 4>A ^ 0 outside (A j, A 2) and since A2 — A t = 1, it suffices to study the
116 LIMIT THEOREMS FOR NORMAL CONVERGENCE REFINEMENTS Chap. 3 case in which F has a discontinuity (it cannot have more than one) in Suppose for instance that F has a discontinuity at —xoe(A1,O~\. Then since <j>A decreases on (— co, 0) and increases on @, co), we have x2cj)A(x)dF(x) > \ $)xo-$ f° x2dF(x) = J - oo using C.5.11). Similarly °°2 x2dF(x) o Adding these two inequalities, oo r oo — oo J — oo A similar argument works if xog [0, A2), showing that, for this particular value of A, C.5.8) holds. • In view of the earlier remarks this lemma proves the theorem. Remark. It is easy to verify that equality holds in C.5.6) when, and only when, F is the distribution of ±hY, where Y takes the values Ax = 2-?A0)* and A2 = ?A0)*-l with respective probabilities ?A0)*-1 and 2—?A0)*. Consequently, the upper bound C2 is attained.
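The extremal property stated in the Remark can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it takes (3.5.6) to be the inequality $\tfrac12 h\sigma^2 + \tfrac16|\alpha_3| \le A\beta_3$, takes $h = 1$, and reads the two values in the Remark as $A_1 = \tfrac12\sqrt{10}-2$ and $A_2 = \tfrac12\sqrt{10}-1$, with respective probabilities $\tfrac12\sqrt{10}-1$ and $2-\tfrac12\sqrt{10}$. The function name `esseen_ratio` is hypothetical.

```python
import math

def esseen_ratio(values, probs, h=1.0):
    """Return (h*sigma^2/2 + |alpha_3|/6) / beta_3 for a discrete law.

    Moments are taken about the mean, so the law need not be centred.
    """
    mean = sum(v * p for v, p in zip(values, probs))
    centred = [v - mean for v in values]
    sigma2 = sum(c * c * p for c, p in zip(centred, probs))
    alpha3 = sum(c ** 3 * p for c, p in zip(centred, probs))
    beta3 = sum(abs(c) ** 3 * p for c, p in zip(centred, probs))
    return (0.5 * h * sigma2 + abs(alpha3) / 6.0) / beta3

r = math.sqrt(10.0)
values = [r / 2 - 2, r / 2 - 1]      # assumed reading of A_1, A_2 (step h = 1)
probs = [r / 2 - 1, 2 - r / 2]       # respective probabilities
A0 = (r + 3) / 6                     # constant of Lemma 3.5.2
C2 = A0 / math.sqrt(2 * math.pi)     # asymptotically proper constant

print("ratio for the two-point law:", esseen_ratio(values, probs))
print("A0:", A0, " C2:", C2)
print("ratio for the symmetric Bernoulli law:", esseen_ratio([0, 1], [0.5, 0.5]))
```

Under these assumptions the two-point law attains $A_0$, while other lattice laws, such as the symmetric Bernoulli law, give a smaller ratio.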
3.6. DEPENDENCE OF THE REMAINDER TERM ON n AND x 117 § 6. Dependence of the remainder term on n and x. The estimates so far obtained for Rn(x)=Fn(x) — <P(x) are independent of x, and are therefore of little value for large x. For instance, since Zn has variance 1, it is a trivial consequence of Chebyshev's inequality that \Fn(x)-0(x)\^2/x2. C.6.1) A precise analysis of this remainder term for large values of x will be the subject of Chapters 6-14, but we here prove a refinement of C.6.1) which makes weaker assumptions (about existence of moments) than do the more detailed results to be proved later. Theorem 3.6.1. Let Xl,X2, ... be independent {but not necessarily identi- identically distributed) random variables, with zero means and finite variances a2 = E(X2). Let Fn be the distribution of the random variable Zn = (X1 + X2 + ... + Xn)/sn, where and write A = A(n) = sup \Fn{x)-<P(x)\. If A (n)^ for all n>N, then for n>N, \Fn(x)-<P(x)\ ^ min [j, C ^f#} , C-6.2) where C is an absolute constant. Proof. Let a ^ 1 be a real number, whose choice we defer, except to re- require that each Fn be continuous at a and — a. Then T x2dFn{x)= T x2d{Fn{x)-<P{x)} + " x2d<P{x) = J—a J —a . —a = a2{Fn(a)-^)}-a2{Fn(-a)-4>(-a)} + -2V x{Fn{x)-<P{x)}dx+\ x2d0{x), J —a J —a so that
118 LIMIT THEOREMS FOR NORMAL CONVERGENCE REFINEMENTS Chap. 3 x2dFn{x)> -4a2A + T x2d<P(x). —a ' Since this implies that x2 dFn (x) ^ 4a2 A + \ x2 d& (x) It is clear that , (x2{l-FJx)} if x^a, x2dFn(x) >\ { "{ n ' ' C.6.4) fx2{l-^(x)} if >U(x) if so that, for \x\^a , [ x2dFn{x)+ [ x2d<P{x). )\x\2a Thus for all \x\^a, (l+x2)|Fn(x)-^(x)|^Da2 + l)zl+2( x2d^(x). C.6.6) Now, writing c = B/tt)* , = c a oo
3.6. DEPENDENCE OF THE REMAINDER TERM ON n AND x 119 Substituting this into C.6.6) and taking we obtain where C is an absolute constant. • A similar theorem is the following, whose proof may be found in [112]. Theorem 3.6.2. Under the conditions of Theorem 3.5.1, where C is an absolute constant.
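The uniform bound (3.5.1) of Theorem 3.5.1, on which Theorem 3.6.2 builds, can be illustrated by computing the scaled deviation $\sigma^3 n^{1/2}\beta_3^{-1}\max_x|F_n(x)-\Phi(x)|$ exactly for Bernoulli summands. The sketch below is an assumption-laden illustration rather than part of the argument: the choice of Bernoulli summands and all function names are ours. Since $F_n$ is a step function and $\Phi$ is increasing, the maximum is attained at an atom, either at the value $F_n(z)$ or at the left limit $F_n(z-)$.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binomial_pmf(n, k, p):
    """P(S_n = k) for a sum of n Bernoulli(p) variables, computed via logarithms."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def scaled_deviation(n, p):
    """sigma^3 n^{1/2} / beta_3 * max_x |F_n(x) - Phi(x)| for Bernoulli(p) summands."""
    q = 1.0 - p
    sigma = math.sqrt(p * q)
    beta3 = p * q * (p * p + q * q)        # E|X - p|^3
    sup_dev = 0.0
    cdf = 0.0
    for k in range(n + 1):
        z = (k - n * p) / (sigma * math.sqrt(n))
        phi = normal_cdf(z)
        sup_dev = max(sup_dev, abs(cdf - phi))   # left limit F_n(z-)
        cdf += binomial_pmf(n, k, p)
        sup_dev = max(sup_dev, abs(cdf - phi))   # value F_n(z)
    return sigma ** 3 * math.sqrt(n) / beta3 * sup_dev

for p in (0.5, 0.3):
    for n in (50, 200, 800):
        print(p, n, scaled_deviation(n, p))
```

The printed values stay bounded as $n$ grows, as (3.5.1) requires.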
Chapter 4

LOCAL LIMIT THEOREMS

§ 1. Formulation of the problem

Suppose that the independent, identically distributed random variables $X_1, X_2, \ldots$ have a lattice distribution with interval $h$, so that the sum $Z_n = X_1 + X_2 + \ldots + X_n$ takes values in the arithmetic progression $\{na + kh;\ k = 0, \pm 1, \ldots\}$. The distribution of $Z_n$ is completely determined by the numbers

$$P_n(k) = P\{Z_n = na + kh\}.$$

A local limit theorem is an asymptotic expression for $P_n(k)$ as $n\to\infty$.

If the distribution of the $X_j$ belongs to the domain of attraction of a stable law with density $g(x)$, the natural way to obtain an asymptotic expression is to associate with the stable law a discrete distribution on the lattice $\{kh_n\}$, where $h_n = h/B_n$ and the $B_n$ are the usual normalising constants, assigning to $kh_n$ the probability

$$\tilde P_n(k) = \int_{(k-\frac12)h_n}^{(k+\frac12)h_n} g(x)\,dx \sim h_n\, g(kh_n).$$

The theorems of § 2 give conditions which ensure that $P_n(k) \sim \tilde P_n(k)$.

Another sort of local limit theorem arises when the distribution of the $X_j$, belonging to the domain of attraction of a stable law with density $g(x)$, has a density $p(x)$. The problem then is to give asymptotic expressions for the density $p_n(x)$ of the normalised sum

$$Z_n = (X_1 + X_2 + \ldots + X_n - A_n)/B_n,$$

and in particular to give conditions under which $p_n(x)$ converges (in some sense) to $g(x)$. These problems are examined in § 3.
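The simplest instance of the lattice problem is the de Moivre–Laplace case. The Python sketch below compares $B_n P_n(k)$ with $g$ evaluated at the normalised lattice point, under assumptions that are ours and not the text's: Bernoulli($p$) summands (so $a = 0$, $h = 1$), the normal density as $g$, and normalising constants $A_n = np$, $B_n = \{np(1-p)\}^{1/2}$. The function names are illustrative.

```python
import math

def normal_density(x):
    """Density g of the standard normal law (stable with exponent 2)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def binomial_pmf(n, k, p):
    """P_n(k) = P(Z_n = k) for a sum of n Bernoulli(p) variables."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def local_limit_error(n, p):
    """sup_k |B_n P_n(k) - g((k - A_n)/B_n)| with A_n = np, B_n = sqrt(np(1-p))."""
    a_n = n * p
    b_n = math.sqrt(n * p * (1.0 - p))
    return max(abs(b_n * binomial_pmf(n, k, p) - normal_density((k - a_n) / b_n))
               for k in range(n + 1))

for n in (100, 1000, 10000):
    print(n, local_limit_error(n, 0.3))
```

The printed suprema decrease as $n$ grows, which is the behaviour a local limit theorem describes.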
The first local limit theorem to emerge was that of de Moivre and Laplace. In the last fifteen years local limit problems have been studied by many authors, notably Gnedenko, whose work on the subject was motivated by the work of Khinchin [74] on the analytic foundations of statistical mechanics.

§ 2. Local limit theorems for lattice distributions

Let the independent random variables

$$X_1, X_2, \ldots, X_n, \ldots \tag{4.2.1}$$

have the same distribution, concentrated on the arithmetic progression $\{a + kh\}$, and write

$$Z_n = X_1 + X_2 + \ldots + X_n, \qquad P(Z_n = an + kh) = P_n(k).$$

Theorem 4.2.1. In order that, for some choice of constants $A_n$, $B_n$,

$$\lim_{n\to\infty}\,\sup_k\left|\frac{B_n}{h}P_n(k) - g\!\left(\frac{an+kh-A_n}{B_n}\right)\right| = 0, \tag{4.2.2}$$

where $g(x)$ is the density of some stable distribution $G$ with exponent $\alpha$ $(0<\alpha\le 2)$, it is necessary and sufficient that

(1) the common distribution function $F$ of the $X_j$ should belong to the domain of attraction of $G$, and

(2) the interval $h$ be maximal.

Proof. The transformation $X_j' = (X_j - a)/h$ permits us to confine attention, as we shall, to the case $a = 0$, $h = 1$.

(i) Necessity. If $h = 1$ is not maximal, there is some integer $b > 1$ such that $P_n(k) = 0$ unless $b$ divides $k$. Since this clearly contradicts (4.2.2), the necessity of (2) is proved. Moreover, (4.2.2) implies that
122 LOCAL LIMIT THEOREMS Chap. 4 as n->oo, so that A) is also necessary. (ii) Sufficiency. Choose An, Bn so that Fn-+G. The characteristic func- function of Zn is given by where/is the characteristic function of the X}, and therefore *Bn where z = znk=(k-An)/Bn. If v is the characteristic function of the stable distribution G, then D.2.4) From D.2.3) and D.2.4), for any k, 'nPn{k)-gf^~) </1+/2 + /3+/4, D.2.5) where -A -A h= h= f \f(t/Bn)\"dt, JeBn^\t\$nBn and A and e are constants, to be determined. We turn now to the estimation of the integrals /,-.
4.2. LOCAL LIMIT THEOREMS FOR LATTICE DISTRIBUTIONS 123 A) Condition A) implies that, uniformly in \_ — A, A], the integrand in 1± converges to zero as n->oo. Hence lim /i = 0 . D.2.6) B) We remark that, for any 5<ct, there is a positive number c(d) (not depending on n) such that in some neighbourhood of t = 0 (also indepen- independent of n), . D.2.7) To prove this, use the results of § 2.6 to show that / satisfies where c>0 and h is a slowly varying function with \imnB;«h(Bn)=l. By Karamata's theorem (Appendix 1) there exists a function e(u)->0 («->co) such that, as n->co, Bn u J D.2.8) If therefore n is sufficiently large, for some c(S) >0. Consequently, for sufficiently large n, s > 0 can be chosen so that exp{-c(ia)|r|ia}d^O as
124 LOCAL LIMIT THEOREMS Chap. 4 C) Because h= 1 is the maximal interval, the results of § 1.4 show that there is a positive constant c such that, for 1/@1 ^e'c. Since Bn = o(e"c) (Theorem 2.1.1), we have h= as n->oo. D.2.9) D.2.10) D) Finally, since v(t) is integrable on (— oo, oo), we have lim 74 = 0. D.2.11) A-* oo Thus we have proved that each I} can be made arbitrarily small, and D.2.2) follows. Theorem 4.2.2. Let the conditions of Theorem 4.2.1 be fulfilled. Then, with the same choice of normalising constants An, Bn, lim n-> oo k Pn(k) -wg = 0. D.2.12) Proof. Denote by Gn (x) the distribution on the lattice {(an + kh — An)/Bn} obtained by grouping the distribution G in the manner described in § 1. Denote by Fn(x) the distribution function of Then D.2.12) asserts that the variation distance px(Fn, Gn) tends to zero as n->co. To prove this, restrict attention as before to the case h=l, a = 0. By Theorem 4.2.1, A (n) = A = sup BnPn{k)-g( ¦k-A, 0 as n^co, and consequently D.2.13)
4-3. A LIMIT THEOREM FOR DENSITIES 125 From the analytic properties of the density g(x) established in Chapter 2, there exists a constant C such that when \x\ ^ \y\. Consequently I B^g{{k-An)/Bn- f /2 fl,(x)d l\^BnA-1A J-A-V2 I |fc-^n|<BnJ-% •/(fc-^n-i)/Bn V Bn = 0(b;1), and since g(x)dx= l+O(Aia), we have Y, B~1g((k — An)/Bn)=l+O(Aia + B~1). D.2.14) Similarly, ? B~1gf((fc-yln)/Bn) = O(zlia + B~1). D.2.15) Finally, since the probabilities Pn(k) sum to unity, it follows from D.2.13) and D.2.14) that o(l). D.2.16) \k-An\>BnA-lA Combining D.2.13), D.2.15) and D.2.16) proves D.2.12). • § 3. A limit theorem for densities In this section we shall assume that the common distribution of the Xj has the property that, for some value N of n, the random variable D.3.1) has a density pn(x). This clearly implies the existence of pn{x) for all n^ N.
Theorem 4.3.1. In order that for some choice of the constants $A_n$, $B_n$,

$$\lim_{n\to\infty}\,\sup_x |p_n(x)-g(x)| = 0, \tag{4.3.2}$$

where $g(x)$ is the density of some stable distribution with exponent $\alpha$ $(0<\alpha\le 2)$, it is necessary and sufficient that the following conditions be fulfilled:

(1) the common distribution function $F(x)$ of the $X_j$ should belong to the domain of attraction of the stable law, and

(2) there exists $N$ with $\sup_x p_N(x) < \infty$.

Proof. Condition (2) is clearly necessary, and implies that the densities $p_n(x)$ $(n \ge N)$ are uniformly bounded. To see that (1) is necessary, note that (4.3.2) implies that, for $x>y$,

$$\left|\{F_n(x)-F_n(y)\}-\{G(x)-G(y)\}\right| \le \int_y^x |p_n(z)-g(z)|\,dz \le \sup_z |p_n(z)-g(z)|\,(x-y) \to 0$$

as $n\to\infty$, from which it follows easily that

$$\lim_{n\to\infty} F_n(x) = G(x).$$

Assume therefore that (1) and (2) are satisfied, and choose appropriate constants $A_n$, $B_n$. Because of (2) the density $p_N(x)$, and thus also its Fourier transform, is square integrable, and consequently

$$\int_{-\infty}^{\infty}|f(t)|^{2N}\,dt < \infty.$$

It follows that $f_n(t) = e^{-itA_n/B_n}f(t/B_n)^n$ is integrable for all $n \ge 2N$, whence

$$p_n(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx-itA_n/B_n} f(t/B_n)^n\,dt.$$
4.3. A LIMIT THEOREM FOR DENSITIES 127 Denoting by v (t) the characteristic function of G, we have Rn= sup\pn(x)-g(x)\ = = sup 271 !_ - Q-itx{fn(t)-v(t)}dt T \fn(t)-v(t)\dt< D.3.3) where A ~A I2= \v(t)\dt, L = \fn(t)\dt, and A and g are positive numbers to be determined. Condition A) implies that/n->-t; uniformly in every bounded interval, lim /x = 0 . D.3.4) n-*oo Since v is integrable, lim T2 = 0. A-*ao The estimate D.2.7) shows that, for sufficiently small g, there exists c0 > 0 such that < as A-+CO. Since FN has a density, D.3.5) D.3.6) and since fn is integrable for
128 LOCAL LIMIT THEOREMS Chap. 4 /4=f \fn(t)\dt = '\t\2eB f \f{t)\2Ndt-+0 \2cBn ..... D.3./) as 7j->co. Thus each of the integrals /,- can be made arbitrarily small, and so therefore can Rn. • Remark 1. It is not difficult to give examples of densities p(x) for Xj, for which each pn(x) is unbounded. Such is the case for example [48] when P(x)=\ ({2|x|loglog|x|}-1, Ix^e, a density belonging to the domain of attraction of the normal law. Remark 2. Condition B) of the theorem will be satisfied if for some N, the density pN(x) belongs to Lp for some p>l. Indeed, if l^p<2 and \pN(x)\pdx<co , oo then Titchmarsh's inequality (Appendix 2) shows that showing that /„ is integrable, and pn therefore bounded, whenever n>Np/(p-l). § 4. Limit theorem in the Lt metric In the last section the discrepancy between p,,(x) and its limit g(x) was measured by the uniform metric sup\pn{x)-g{x)\. X However, pn(x) is determined only up to a set of measure zero, and it is therefore more natural to use the Lx metric
4.4. LIMIT THEOREMS IN THE L, METRIC 129 roo \\Pn-9\\l = \Pn(x)-g(x)\dx, J -oo or more generally the Lp metric \\pn-g\\P={ ) IP.W-flr for 1 It turns out to be unnatural to restrict attention to absolutely continuous distributions F, and we shall accordingly describe the derivative F'(x) as the density p(x), without presupposing that F(x)= \ F'(y)dy. J ~ oo Each distribution function F may be represented in the form F{x) = aR(x) + bS{x), D.4.1) where a,b^0, a + b=l, R{x) is an absolutely continuous distribution function R(x) = j p(z)dz, and S(x) is a singular distribution function (corresponding to a distribu- distribution concentrated on a set of zero Lebesgue measure) with S'(x) = 0 for almost all x. Then the density of x is Now let Xx, X2, ¦¦¦ be a sequence of independent random variables with distribution F, and denote as before by Fn(x) the distribution function of the normalised sum Zn=(X1 + X2 + ...+Xn-An)/Bn. Then Fn has a similar decomposition Sn(x) D-4-2) into absolutely continuous and singular components, and pn(x) = F^(x) = anR'n(x) will denote the corresponding density.
Theorem 4.4.1. Let $g(x)$ be the density of a stable law $G$. In order that, as $n\to\infty$,

$$\|p_n-g\|_1 = \int_{-\infty}^{\infty}|p_n(x)-g(x)|\,dx \to 0, \tag{4.4.3}$$

it is necessary and sufficient that

(1) $F$ belongs to the domain of attraction of $G$, and

(2) for some $N$, $a_N > 0$.

Proof. From (4.4.3) it follows that

$$a_n = \int_{-\infty}^{\infty} a_n R_n'(x)\,dx = \int_{-\infty}^{\infty} p_n(x)\,dx \to \int_{-\infty}^{\infty} g(x)\,dx = 1 \tag{4.4.4}$$

as $n\to\infty$, whence (2) is certainly necessary. Moreover, (4.4.3) and (4.4.4) imply that $F_n(x)\to G(x)$ for each $x$, so that (1) is also necessary.

Conversely, suppose that (1) and (2) are satisfied. To prove (4.4.3) we require a number of lemmas.

Lemma 4.4.1. For any $a>0$, $b\ge 0$, $a+b=1$, $\beta>0$,

$$\sum_{m-na<-n^{1/2}\log n}\binom{n}{m}a^m b^{n-m} = o(n^{-\beta}). \tag{4.4.5}$$

Proof. Let $\xi_1, \xi_2, \ldots$ be independent and identically distributed random variables taking only the values 0 and 1, with respective probabilities $b$ and $a$. Bernstein's inequality (cf. § 7.5) shows that

$$\sum_{m-na<-n^{1/2}\log n}\binom{n}{m}a^m b^{n-m} = P\left\{\xi_1+\xi_2+\ldots+\xi_n-na < -n^{1/2}\log n\right\} = o(n^{-\beta}). \tag{4.4.6}$$

•
4.4. LIMIT THEOREMS IN THE Ll METRIC 131 Lemma 4.4.2. IfN is the integer referred to in the statement of Theorem 4.4.1, then FN can be written in the form FN(x) = aH1{x)+bH2{x), where a>0, b^O, a + b=l, Hx and H2 are distribution functions, and H1 is absolutely continuous with bounded derivative: ess sup H[ (x) < oo . Proof Choose positive numbers k and K so that {x; k<pN(x)<K} has positive measure, and define u(x) to be equal to pN{x) on this set and zero elsewhere. Determine a and H, by a= (°° u{x)dx>0, H1(x) = a~1[X u{x)dx. • D.4.7) J J We now proceed to the proof of the theorem. Any integer n ^ N can be written n = mN + r, where m and r are integers and 0 ^ r < N. By Lemma 4.4.2, + AnBn-ANBN\ BN J z\ BN *F*r(Bnx + AnBn) = j=o\jJ V BN z + z 1 G j — ma < — mV4 log m j — ma > — mVi logm J \-' / *H1*(m~j)*F*r = , say.
132 LOCAL LIMIT THEOREMS Chap. 4 By Lemma 4.4.1, I ( D.4.8) j — ma< — mVilogm \J By virtue of Lemma 4.4.2, the distribution is absolutely continuous, with bounded density pmJ(x) (cf. § 1.2). If hi{t), h2(t) and/(r) are the characteristic functions of Hu H2 and F, then by Parseval's theorem and Lemma 4.4.2, H \h1(t)\2dt= [°° \H[{x)\2dx^K [°° H'1{x)dx<oo. D.4.9) J — oo J— oo J — oo Therefore for all j^2 the function h{ h^~jf is absolutely integrable, and rn(x) = rln(x)-g(x) = I ("!)°'VH->pmJ(x)-g(x) = j ~ ma < — m V4 log m \J / ( -oo Vj-ma^mVilogm D.4.10) where v(t) is the characteristic function of G. We shall prove that lim sup |rn(x)|=0. D.4.11) n-» oo x To prove this, write D.4.10) as
4.4. LIMIT THEOREMS IN THE Lx METRIC 133 where h = \2A I2= L = and A and 3 are to be determined. A) If/n(t) denotes the characteristic function of Zn, then condition A) implies that \im fn(t) = v(t) uniformly on ( — A, A). Thus Lemma 4.4.1 implies that -A -A {ilsn(t)-fn(t)}e-itxdt ^ 2A sup \fn(t)-v(t)\ + 2A = o(l). j-ma< — mVi logm V J/ D.4.12) B) Since, for some s>0, \v(t)\ ^e ?|f|°", I2 can be made arbitrarily small by choosing A sufficiently large, (cf. Theorem 2.2.2). C) Since F belongs to the domain of attraction of a stable law with ex- exponent a, there exist D.2.7) <5l5 s1>0 such that, for ItK^B,,, D.4.13) Choose <5<<52; then by Lemma 4.4.1, fn(t)Q-itxdt- AH\t\<dBn dt + 25Bn j — ma < — m lA log m \ J D.4.14)
134 LOCAL LIMIT THEOREMS Chap. 4 In § 2.2 it was shown that Bn = nllah(n), where h is a slowly varying func- function, and consequently lim Bne~(logMJ = 0. Therefore 73 can be made arbitrarily small by making A sufficiently large. D) For all sufficiently large m, ma — mi log m > \ma , so that \t\>bBn <my j—ma> — mV4 logm \J D.4.15) Because /ii (t) is the characteristic function of an absolutely continuous dis- distribution, there exist c>0 such that l/z^r)! < e~c for all \t\> 5BN, so that D.4.15) implies that f \h1(t)\2dt = o(l). D.4.16) J-oo Combining D.4.13), D.4.14), D.4.16) we obtain D.4.11). Since r oo . r oo Z'lB(x)dx<l= ^(x)dx, J — oo J—oo D.4.11) shows that, for any A, so that sup \rn(x)\ \ J\x\>A
Remark. If the conditions of the theorem are satisfied, then the total variation of $F_n-G$ is at most

$$\int_{-\infty}^{\infty}|p_n(x)-g(x)|\,dx + (1-a_n),$$

which tends to zero as $n\to\infty$, so that $F_n\to G$ in variation.

§ 5. A refinement of the local limit theorems for the case of normal convergence

In this section we assume that the common distribution of the random variables $X_j$ has zero mean and finite variance $\sigma^2$. We write

$$\phi(x) = (2\pi)^{-1/2}e^{-x^2/2}$$

for the density of the standard normal law. The theorems of this section are rather similar to those of Chapter 3.

We first assume that, for all $n \ge n_0$, the normalised sum

$$Z_n = (X_1+X_2+\ldots+X_n)/\sigma n^{1/2} \tag{4.5.1}$$

has an absolutely continuous distribution with density $p_n(x)$.

Theorem 4.5.1. In order that

$$\sup_x |p_n(x)-\phi(x)| = O(n^{-\delta/2}) \tag{4.5.2}$$

$(0<\delta<1)$, it is necessary and sufficient that

(1) $\displaystyle\int_{|x|\ge z} x^2\,dF(x) = O(z^{-\delta})$ as $z\to\infty$, where $F$ is the distribution function of the $X_j$, and

(2) there exists $N$ such that $\sup_x p_N(x) < \infty$.

Proof. The necessity of (2) was proved in Theorem 4.3.1. In § 3 it was also shown that, for $n>N$,
136 LOCAL LIMIT THEOREMS Chap. 4 -»@}dt. D.5.3) Now v(t) = e~it2 has Fourier transform <?(*), so that, by Parseval's theorem, {fn{t)-v(t)}e-*2dt= f {pn(x)-0(x)}e-^2 = J— oo = O(n"**). D.5.4) This equation is the same as that denoted by (*) in § 3.4, which was there proved to imply A). Thus the conditions A) and B) are necessary for D.5.2) to hold. To prove their sufficiency, represent the right-hand side of D.5.3) as the sum of the three integrals i r tan Vi ~- Q~itx{fn(t)-v(t)}dt , Z7t J-fW/2 \t\>Eon1A \\>ecrnV2 As in § 3.4, it is proved that and as in § 3, that for some c>0. Finally Assuming that the X-} have finite moments of order /c^3, asymptotic expansions for pn{x), similar to C.3.16), can be established. Theorem 4.5.2. Suppose that the Xj have a finite moment of order k, and that, for some N,
4.5. A REFINEMENT OF THE LOCAL LIMIT THEOREMS 137 sup pN(x) < X Then, uniformly Pn{*) =</>{* where Pj(-(t>) = - oo . in x trPj as (- tt->oo, n-*>P, D.5.5) Pj( — $) is defined as in Theorem 3.3.3. In other words, the theorem shows that, whenever the left-hand side of C.3.16) has a bounded derivative, that expansion may be formally differentiated. We indicate the proof only for the case k = 3. If n>2N, then iP1{-4>)=^T T Q-itx{fn(t)-g(t)}dt, where {cf § 3.3). Therefore, if Tn3 = p3 then \fn(t)\dt+ \ \g(t)\dt. n3 \\>Tn3 J\t\>Tn3 D.5.6) As we have seen, the last two integrals are o(n~*). By Theorem 3.2.1 B), Similar results obtain for lattice distributions, like the following analogue of Theorem 4.2.1.
138 LOCAL LIMIT THEOREMS Chap. 4 Theorem 4.5.3. In order that sup an- where 0 < 5 < 1, it is necessary and sufficient that the interval h be maximal and that Theorem 4.5.4. // the independent variables Xj have the same lattice distribution with step h, and have E(Xj) = 0 and E\Xj\k<ccfor some then (znk) Here Pj(— 4>) is obtained from Pj( — $) by substituting 4> for <P. The proofs of these two theorems involve no new ideas beyond those of § 2, and will be omitted.
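The density theorems of this chapter (Theorems 4.3.1 and 4.5.2) can be illustrated on a case in which $p_n$ is known in closed form. The sketch below rests on assumptions of ours: the summands are exponential with mean 1 (so $\sigma = 1$ and the sum has a gamma density), and the supremum over $x$ is approximated on a finite grid. The function names are hypothetical.

```python
import math

def normal_density(x):
    """Density phi of the standard normal law."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gamma_sum_density(n, x):
    """Density at x of Z_n = (X_1 + ... + X_n - n)/sqrt(n) for X_j ~ Exp(1)."""
    s = n + x * math.sqrt(n)             # corresponding value of the raw sum
    if s <= 0.0:
        return 0.0
    # Gamma(n, 1) density s^{n-1} e^{-s} / Gamma(n), evaluated via logarithms.
    log_pdf = (n - 1) * math.log(s) - s - math.lgamma(n)
    return math.sqrt(n) * math.exp(log_pdf)

def sup_density_error(n, grid_step=0.01, x_max=8.0):
    """Grid approximation to sup_x |p_n(x) - phi(x)| on [-x_max, x_max]."""
    steps = int(round(2 * x_max / grid_step))
    return max(abs(gamma_sum_density(n, -x_max + i * grid_step)
                   - normal_density(-x_max + i * grid_step))
               for i in range(steps + 1))

for n in (25, 100, 400):
    err = sup_density_error(n)
    print(n, err, math.sqrt(n) * err)
```

The product $n^{1/2}\sup_x|p_n(x)-\phi(x)|$ stays roughly constant, in line with a leading correction of order $n^{-1/2}$.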
Chapter 5

LIMIT THEOREMS IN $L_p$ SPACES

§ 1. Statement of the problem

Consider the sequence $X_1, X_2, \ldots$ of independent random variables with the same distribution $F$. If $F$ belongs to the domain of attraction of a stable law $G_\alpha$ with exponent $\alpha$, then the distribution functions $F_n$ of the normalised sums

$$Z_n = (X_1+X_2+\ldots+X_n-A_n)/B_n$$

satisfy

$$\lim_{n\to\infty} F_n(x) = G_\alpha(x)$$

for all $x$. In fact we can make the stronger assertion

$$\lim_{n\to\infty}\,\sup_x |F_n(x)-G_\alpha(x)| = 0, \tag{5.1.1}$$

because of the following simple lemma.

Lemma 5.1.1. If a sequence of distribution functions $G_n(x)$ converges to a continuous distribution function $G(x)$, then

$$\lim_{n\to\infty}\,\sup_x |G_n(x)-G(x)| = 0.$$

Proof. For any positive number $\varepsilon$, we can choose $A$ so large that the tails $G(-A)$ and $1-G(A)$ are suitably small compared with $\varepsilon$, and points $a_j$ with $-A = a_0 < a_1 < \ldots < a_s = A$ such that

$$G(a_j)-G(a_{j-1}) < \varepsilon/6 \qquad (j = 1, 2, \ldots, s).$$
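There exists $N$ such that, for all $n>N$ and each $j$,

$$|G(a_j)-G_n(a_j)| < \varepsilon/6.$$

If $|x| < A$, there exists $j$ with $a_j \le x < a_{j+1}$, and since $G$ and $G_n$ are monotonic,

$$|G_n(x)-G(x)| \le |G_n(x)-G_n(a_j)| + |G_n(a_j)-G(a_j)| + |G(a_j)-G(x)|$$

$$\le \{G_n(a_{j+1})-G_n(a_j)\} + |G_n(a_j)-G(a_j)| + \{G(a_{j+1})-G(a_j)\} < \tfrac{3}{6}\varepsilon + \tfrac{1}{6}\varepsilon + \tfrac{1}{6}\varepsilon < \varepsilon,$$

where the first bracket is estimated by comparing $G_n$ with $G$ at $a_j$ and $a_{j+1}$. Similarly, if $x > A$,

$$|G_n(x)-G(x)| < \frac{\varepsilon}{6} + \frac{\varepsilon}{3} + \frac{\varepsilon}{6} < \varepsilon,$$

and the same argument deals with the case $x < -A$. Thus, for all $x$ and all $n>N$,

$$|G_n(x)-G(x)| < \varepsilon. \quad •$$

If we denote by $L_\infty$ the Banach space of bounded measurable functions $f$ on $(-\infty, \infty)$, with norm

$$\|f\| = \operatorname{ess\,sup}|f(x)|,$$

then (5.1.1) asserts that, when $F$ is in the domain of attraction of $G_\alpha$, then

$$\|F_n-G_\alpha\| \to 0.$$

This chapter is devoted to a study of the analogous problem in the space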
$L_p$ of functions $f$ for which the norm

$$\|f\|_p = \left\{\int_{-\infty}^{\infty}|f(x)|^p\,dx\right\}^{1/p}$$

is finite. In § 2 it is shown that the domain of attraction of a stable law is not reduced by replacing weak convergence by convergence in $L_p$, and the remaining sections deal with the case of normal convergence.

§ 2. Domains of attraction of stable laws in the $L_p$ metric

Let $X_1, X_2, \ldots$ be a sequence of independent random variables with the same distribution $F$. If it is possible to select normalising constants $A_n$, $B_n$ in such a way that the distributions $F_n$ of

$$Z_n = (X_1+X_2+\ldots+X_n-A_n)/B_n \tag{5.2.1}$$

converge weakly to a distribution function $G$, then it was shown in § 2.6 that $G$ is necessarily stable. No change is necessary if weak convergence is replaced by convergence in $L_p$.

Theorem 5.2.1. If the distribution $G$ is a limit in $L_p$ $(p>0)$ of distributions of normalised sums of the form (5.2.1), then it is necessarily stable. Conversely, if the distributions $F_n$ of (5.2.1) converge weakly to a stable distribution $G$ with exponent $\alpha$, then

$$\|F_n-G\|_p \to 0$$

as $n\to\infty$, for all $p>\alpha^{-1}$.

Proof. Because of Lemma 5.1.1, we can restrict attention to $p<\infty$. Suppose therefore that $0<p<\infty$ and that $\|F_n-G\|_p\to 0$. The stability of $G$ follows from Theorem 2.1.1 and the following lemma.

Lemma 5.2.1. If $\|F_n-G\|_p\to 0$, then $F_n$ converges weakly to $G$.

Proof. If not, there is a point $x_0$ and a sequence $\{n_j\}$ such that either

$$F_{n_j}(x_0)-G(x_0) > \delta > 0 \quad\text{or}\quad F_{n_j}(x_0)-G(x_0) < -\delta < 0$$
142 LOCAL LIMIT THEOREMS Chap.5 for all j. Suppose for instance that the former is true, and choose g>0 so that G(xo + s) — G(xo)^j3. Then for xo^y^xo + e , so that ¦> x0 The contradiction proves the lemma. • We now proceed to prove the second half of the theorem. Before doing so, we remark that the condition p>a~1 is essential, since it is possible (using Theorems 2.6.1 and 2.6.2) to give examples of distributions F in the domain of attraction of G (of exponent a) such that, for all n and all p ^ a, \Fn(x)-G(x)\*dx=oo. oo Suppose therefore that F is in the domain of attraction of a stable law G of exponent a, and that pxx. By Theorem 2.6.4, Fn and G have finite absolute moments of order 3 for any 3 <cc, and therefore, as x->oo, Fn{-x) =o(x~d), G(-x) =o(x~d), Taking 3 so that p>d~1>oc~1, we have |F,,(x)-G(x)|'=0(|xr0, so that (Fn — G) belongs to Lp. Lemma 5.2.2. IfF belongs to the domain of attraction of a stable law G of exponent a, then for all 3 < a the moments Kn(S) = are uniformly bounded in n. Proof We distinguish two cases. A) 0<a<l. We first show that the characteristic function fn(t) of Fn satisfies the inequality
5.2. DOMAINS OF ATTRACTION OF STABLE LAWS 143 \l-fn(t)\^c(S)\t\6 E-2.2) for <5<a, where cE) does not depend on n. To prove this, note that by Theorem 2.6.4, = ejp{iyt-c(t)\t\'h(\t\-1)} , where h is a slowly varying function with n->oo and c(t) = co(l + i sgn tco(t, a)). Then and by Karamata's theorem (Appendix 1) there exists a function g(w)->-0 («->()) such that Moreover, it is clear that ny-An sup < co, and thus E.2.2) is proved. Write Wn(x) = Fn(x)-E(x), where E(x) is the distribution function of the degenerate distribution concentrated at x = 0. From E.2.2) the function belongs to Lp so long as 1 <p<(l —a), and ||^J|P is bounded by a con- constant independent of n. By Titchmarsh's theorem (Appendix 2), ij/n{t) is the Fourier transform of some function ^n(x) in Lq (q=p/{p—l)) if l<p<min B,A-a)). But
144 LOCAL LIMIT THEOREMS Chap.5 fn(t)-l = whence it follows easily that *Pn{x)= Yn{x). Integrating by parts, we find that KnE)= T \x\*dF n(x)= J \x\ddVn(x) = x\d-1\Vn(x)\dx = = 2 + 3 T Hd{x)?n{x)dx, J - oo E.2.3) where Setting O (|x|<l) = Btc)-* f" elementary calculations yield If a<i let <5 <<5'<a, p = (l -<5')"x <2. Then «Pn of Appendix 2 shows that and Theorem A2.2 = r J — If, on the other hand, E.2.4) as before. E.2.4) a^l, write p = (<5')~1, <5<<5'<a, and verify B) l<a^2. Now E(Xj) is finite and we may clearly assume that E(Xj) =0. As before we have, uniformly in n, \l-fn(t)\<cE)\t\* E.2.5) for <5<a, and as in E.2.3),
5.2. DOMAINS OF ATTRACTION OF STABLE LAWS 145 where Hd_1(x) = Hd_1 (x) sgn x. Because of E.2.5) \j/n (t) is differentiate, with \]/'n e Lp for all 1 < p < B — a)~1. From Theorem A2.3, it follows that \j/'n is the Fourier transform of xWn(x). Arguing as before, we have Hd_l(x)xWn(x)dx = Thus we have proved that, for all values of a, Kn{d) ^4 + K1{5) = K{d) < oo . . It is now easy to complete the proof of the theorem. Since F belongs to the domain of attraction of G, Lemma 5.1.1 shows that X ~1>a Hence if p>S i >a \ Lemma 5.2.2 gives P \Fn(x)-G(x)\pdx + 2p\\ \WH(x)\p** + + P {1-G(x)}pdx+ [ {G(x)}pdx; J T J - oo j >  |x|MG(x) po — i I L-^-oo J Taking T = A(n)~1/<5, we have By analogy with the terminology of § 2.6 we could define the Lp domain of attraction of G as the aggregate of distributions F with ||Fn-G||p->-0. The theorem shows that, so long as p ><x~ \ the Lp domain of attraction coincides with the domain of attraction as originally defined.
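For the normal law ($\alpha = 2$) the theorem gives $\|F_n-\Phi\|_p\to 0$ for every $p > \tfrac12$. The Python sketch below approximates this norm for $p = 1$ and $p = 2$; it is an illustration only, and the choice of Bernoulli summands, the truncation of the integral to a finite interval, the Riemann-sum quadrature and the function names are all assumptions of ours.

```python
import bisect
import math

def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binomial_pmf(n, k, p):
    """P(S_n = k) for a sum of n Bernoulli(p) variables, computed via logarithms."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def lp_distance(n, p_exponent, prob=0.5, x_max=10.0, grid=40001):
    """Riemann-sum approximation to ||F_n - Phi||_p for Bernoulli(prob) summands."""
    sigma = math.sqrt(prob * (1.0 - prob))
    # Atoms of the normalised sum and the value of F_n at each atom.
    atoms = [(k - n * prob) / (sigma * math.sqrt(n)) for k in range(n + 1)]
    cdf, total = [], 0.0
    for k in range(n + 1):
        total += binomial_pmf(n, k, prob)
        cdf.append(total)
    step = 2.0 * x_max / (grid - 1)
    integral = 0.0
    for i in range(grid):
        x = -x_max + i * step
        idx = bisect.bisect_right(atoms, x)      # number of atoms <= x
        f_n = cdf[idx - 1] if idx > 0 else 0.0
        integral += abs(f_n - normal_cdf(x)) ** p_exponent * step
    return integral ** (1.0 / p_exponent)

for n in (100, 400):
    for p_exponent in (1.0, 2.0):
        d = lp_distance(n, p_exponent)
        print(n, p_exponent, d, math.sqrt(n) * d)
```

In this example both norms decrease at the rate $n^{-1/2}$, the rate studied in § 3 for summands with finite third moment.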
§ 3. Estimates for $\|F_n-\Phi\|_p$ in the case of normal convergence

In this section we assume that the $X_j$ have finite third moment, and write

$$E(X_j)=0,\quad E(X_j^2)=\sigma^2,\quad E(X_j^3)=\alpha_3,\quad E|X_j|^3=\beta_3.$$

As before, the distribution function and characteristic function of $X_j$, and the distribution function and characteristic function of the normalised $Z_n$, are denoted respectively by $F(x)$, $f(t)$, $F_n(x)$, $f_n(t)$; $\Phi(x)$ is the distribution function of the standard normal distribution.

In this section we estimate the rate at which $c_n = \|F_n-\Phi\|_p$ converges to zero. Theorem 5.3.1 can be considered as a generalisation of Theorem 3.5.1 in which $L_p$ appears in place of $L_\infty$. Theorems 5.3.2 and 5.3.3 are analogues of Theorems 3.3.1 and 3.3.2.

Theorem 5.3.1. If the $X_j$ have finite third moment, then for all $p \ge 1$,

$$\rho_p(F_n,\Phi) = \|F_n-\Phi\|_p \le c^{(p-1)/p}\,c_1^{1/p}\,\rho_3\, n^{-1/2}, \tag{5.3.1}$$

where $c$, $c_1$ are absolute constants and $\rho_3 = \beta_3/\sigma^3$.

Proof. Theorem 3.5.1 asserts that, under the conditions stated,

$$\sup_x |F_n(x)-\Phi(x)| \le c\rho_3 n^{-1/2}, \tag{5.3.2}$$

which is (5.3.1) with $p=\infty$. Suppose we can also prove that (5.3.1) is true for $p=1$, i.e. that

$$\|F_n-\Phi\|_1 = \int_{-\infty}^{\infty}|F_n(x)-\Phi(x)|\,dx \le c_1\rho_3 n^{-1/2}. \tag{5.3.3}$$

Then it will follow that

$$\|F_n-\Phi\|_p^p \le \left\{\sup_x|F_n(x)-\Phi(x)|\right\}^{p-1}\int_{-\infty}^{\infty}|F_n(x)-\Phi(x)|\,dx \le c^{p-1}c_1\,\rho_3^{p}\,n^{-p/2},$$

proving (5.3.1). Hence it remains only to prove (5.3.3).

From Theorem 3.2.1 we have, for $|t| \le T_n = n^{1/2}/5\rho_3$, the inequalities (5.3.4)
5.3. ESTIMATES OF \\F,-4>\\,, 147 and Now turn to Theorem 1.5.4, and set A(x) = Fn(x), B(x) = <P(x), L By virtue of E.3.4), exp (- n-1 E.3.5) where cx x is a constant, and by E.3.4) and E.3.5), e so that 71 12 713 n2 In the next two theorems we describe the asymptotic behaviour of c{np)=\\Fn-0\\p as «-> oo; this depends on whether or not the Xj have a lattice distribution. Theorem 5.3.2. IfF is not a lattice distribution then for \\Fn-0\\p= Ap\a3\/a3ni + o(n--), as n—>oo, where oo, E.3.6) = 1/6B*)* , =A (p<00) Proof We have seen in § 3.3 that under the conditions stated, E.3.7) uniformly inxasn^co, where
$G_n(x) = \Phi(x) + (2\pi n)^{-1/2} Q_1(x) e^{-x^2/2}$.
This proves (5.3.6) in case $p = \infty$, since
$\sup_x |(2\pi n)^{-1/2} Q_1(x) e^{-x^2/2}| = (2\pi n)^{-1/2} |\alpha_3| / 6\sigma^3$.
It is natural therefore to proceed by estimating $\|F_n - G_n\|_p$. The next lemma deals with the case $p = 1$.

Lemma 5.3.1. Under the conditions of the theorem, as $n \to \infty$,
$\|F_n - G_n\|_1 = o(n^{-1/2})$.

Proof. Use Theorem 1.5.4 with $A(x) = F_n(x)$, $B(x) = G_n(x)$, $T = \lambda(n) n^{1/2}$, where $\lambda(n) \to \infty$ $(n \to \infty)$ is chosen in accordance with Lemma 3.3.1, to get
…
where $\delta = …$, $\varepsilon = …$, and $g_n(t) = \int_{-\infty}^{\infty} e^{itx}\,dG_n(x)$. Arguing as in the derivation of (3.3.4), we have
… (5.3.8)
Moreover,
$\varepsilon \le … = \varepsilon_1 + \varepsilon_2$, say.
The estimate
$\varepsilon_1 = o(n^{-1})$ (5.3.9)
is proved in just the same way as (5.3.8). To estimate $\varepsilon_2$ we split the integral $\tfrac12 \varepsilon_2$ into three terms
$I_1 = \int_{-T_n}^{T_n}$, $I_2 = \int_{T_n}^{\lambda(n) n^{1/2}}$, $I_3 = \int_{-\lambda(n) n^{1/2}}^{-T_n}$,
where $T_n = n^{1/2}/5\rho_3$. It then follows from Theorem 3.2.1 (3) that
$I_1 = o(n^{-1})$. (5.3.10)
It is clear that $I_2 = …$. Now … and by Lemma 3.3.1 …. Hence $I_2 = o(n^{-1})$, and similarly $I_3 = o(n^{-1})$, so that
$\varepsilon \le \varepsilon_1 + \varepsilon_2 = o(n^{-1})$. (5.3.11)
Combining these various inequalities, we have …

Now suppose that $1 < p < \infty$. From Lemma 5.3.1 and equation (5.3.7) we infer
$\|F_n - G_n\|_p^p = \int_{-\infty}^{\infty} |F_n(x) - G_n(x)|^p\,dx \le \sup_x |F_n(x) - G_n(x)|^{p-1}\, \|F_n - G_n\|_1 = o(n^{-p/2})$.
Thus we have proved:

Lemma 5.3.2. If $p \ge 1$ then, as $n \to \infty$,
$\|F_n - G_n\|_p^p = o(n^{-p/2})$. (5.3.12)

We are now in a position to complete the proof of the theorem. If
$R_n(x) = (2\pi n)^{-1/2} Q_1(x) e^{-x^2/2} = G_n(x) - \Phi(x)$,
then by Minkowski's inequality,
$\|F_n - \Phi\|_p \le \|F_n - G_n\|_p + \|R_n\|_p$,
$\|F_n - \Phi\|_p \ge \|R_n\|_p - \|F_n - G_n\|_p$.
Thus Lemma 5.3.2 implies that
$\|F_n - \Phi\|_p = \|R_n\|_p + o(n^{-1/2})$,
and since
$\|R_n\|_p = A_p |\alpha_3| / \sigma^3 n^{1/2}$,
the theorem is proved. •

Remark. For integer values of $p$, $A_p$ can be calculated explicitly; for example, $A_2 = 1/(4 \cdot 6^{1/2} \pi^{1/4})$.

We now turn to the case of lattice distributions. Denote by $L_h$ the class of lattice distributions with maximal step $h$. For $F$ in $L_h$ let the values on which $F$ is concentrated be $a + kh$ $(k = 0, \pm 1, \pm 2, \dots)$, write $\tau = 2\pi/h$ and define $S_1(x)$ and $d_n(t)$ as in § 3.3.

Theorem 5.3.3. If $X_1, X_2, \dots$ are independent random variables with the same distribution $F$ belonging to $L_h$ and having finite third moment, then, for $1 \le p \le \infty$ and $n \to \infty$,
…
where $M_p =$
…

Proof. By Theorem 3.3.2, as $n \to \infty$,
…
uniformly in $x$, so that
$\sup_x |F_n(x) - \Phi(x)| = …$
It is not difficult to compute that, as $n \to \infty$,
…
so that the theorem is proved in the special case $p = \infty$. To discuss the general case, we need the following analogue of Lemma 5.3.2.

Lemma 5.3.3. Under the conditions of the theorem, for all $p \ge 1$, as $n \to \infty$,
…
where …

Proof. As before, it suffices to establish the case $p = 1$. Again we use Theorem 1.5.4, setting $A(x) = F_n(x)$, $B(x) = H_n(x)$, $T = n^{…}$, to obtain
…
where $\delta = …$,
$\varepsilon = …$
and
$h_n(t) = \int_{-\infty}^{\infty} e^{itx}\,dH_n(x) = \int_{-\infty}^{\infty} e^{itx}\,dG_n(x) + … = g_n(t) + d_n(t)$.
Exactly as in the proof of Theorem 3.3.2, we have …
To estimate $\varepsilon_1$ we split it into two parts, $\varepsilon_{11}$ and $\varepsilon_{12}$, and note that, again as in the proof of Theorem 3.3.2, $\varepsilon_{12} = o(n^{-1})$. To deal with $\varepsilon_{11}$, note that
… (5.3.14)
The arguments used to estimate $I_2$ in the proof of Theorem 3.3.2 show that the first integral in (5.3.14) is $o(n^{-1})$. From
…
it is easy to check that
$d_n(0) = 0$, $\sup \sup … < \infty$.
Thus, uniformly in $n$, and in some neighbourhood of $t = 0$ not depending on $n$,
$d_n(t) = O(t^2)$. (5.3.15)
Moreover, for $|t| \le …$,
… (5.3.16)
Therefore
…
so that
$\varepsilon_{11} = …$ (5.3.17)
The proof of … is similar, it only being necessary to use (2) rather than (3) of Theorem 3.2.1, and to note that $d_n'(t) = O(t)$ in some neighbourhood not depending on $n$, and that … if $|t| \le …$. Thus … and the lemma is proved. The rest of the proof is exactly analogous to that of Theorem 5.3.1. •

For some values of $p$, $M_p$ can be explicitly calculated; for instance, …. If $\alpha_3 = 0$, then for all $p$, $M_p = …$
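The constants $A_p$ of the non-lattice theorem lend themselves to a quick numerical check. The following is only a sketch: it assumes the standard first Edgeworth term $Q_1(x) = \alpha_3 (1 - x^2)/6\sigma^3$ (the definition of $Q_1$ belongs to § 3.3 and is not reproduced in this section), so that $A_p = \|(1 - x^2) e^{-x^2/2}\|_p / 6(2\pi)^{1/2}$. The function names `edgeworth_profile` and `a_p` are illustrative and do not come from the text.

```python
import math

def edgeworth_profile(x):
    # Assumed shape of Q_1(x) e^{-x^2/2}, up to the factor alpha_3 / (6 sigma^3).
    return (1.0 - x * x) * math.exp(-0.5 * x * x)

def a_p(p, half_width=12.0, steps=200_000):
    """Approximate A_p = ||(1 - x^2) e^{-x^2/2}||_p / (6 sqrt(2 pi))."""
    scale = 6.0 * math.sqrt(2.0 * math.pi)
    if math.isinf(p):
        # Supremum norm: scan a grid that contains x = 0, where the maximum sits.
        grid = (-half_width + 2.0 * half_width * i / steps for i in range(steps + 1))
        return max(abs(edgeworth_profile(x)) for x in grid) / scale
    # Composite trapezoidal rule for the L_p norm on [-half_width, half_width].
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * abs(edgeworth_profile(x)) ** p
    return (total * h) ** (1.0 / p) / scale

# Closed forms quoted in the text.
A2_closed_form = 1.0 / (4.0 * math.sqrt(6.0) * math.pi ** 0.25)
A_inf_closed_form = 1.0 / (6.0 * math.sqrt(2.0 * math.pi))
```

Under the stated assumption, `a_p(2)` and `a_p(math.inf)` can be compared with `A2_closed_form` and `A_inf_closed_form`; other values of `p` give the remaining constants numerically.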
Chapter 6

LIMIT THEOREMS FOR LARGE DEVIATIONS

§ 1. Introduction and examples

In this and succeeding chapters we shall examine the simplest problems in the theory of large deviations. Let $X_1, X_2, \dots$ be independent, identically distributed random variables, with
$E(X_j) = 0$, $V(X_j) = \sigma^2$, (6.1.1)
and let $Z_n$ denote the normalised sum
$Z_n = (X_1 + X_2 + \dots + X_n)/\sigma n^{1/2}$.
Then, for any $x_0$,
$P(Z_n < x) - (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-t^2/2}\,dt \to 0$ (6.1.2)
as $n \to \infty$, uniformly in $|x| \le x_0$. If the $X_j$ have a probability density $p(x)$, then the results of § 4.3 show that, under weak conditions, the density $p_n(x)$ of $Z_n$ satisfies
$p_n(x) - (2\pi)^{-1/2} e^{-x^2/2} \to 0$ (6.1.3)
as $n \to \infty$, uniformly in $|x| \le x_0$.

In many problems encountered in such different branches of science as mathematical statistics [18], [24], information theory [185], the statistical physics of polymers [181] and even the analytic arithmetic of the hypercomplex numbers [103], more precise information about the distribution of $Z_n$ is required than is contained in the classical theorems. In particular, such problems require the estimation of
$P(Z_n > x)$ (6.1.4)
when both $n$ and $x$ are large. Such problems constitute the theory of large deviations.

Since the probability (6.1.4) will in general be small, the usual methods of establishing limit theorems (via characteristic functions and partial differential equations) are too crude for the derivation of sufficiently general results, and most of the theorems about large deviations are proved under very stringent conditions.

Before formulating the problem in general, we consider some simple but characteristic special results. Consider a Bernoulli scheme of $n$ independent trials, with a probability $p > 0$ of success. Write $Y_j = 1$ if the $j$th trial results in a success, and $Y_j = 0$ otherwise. If
$b(m, n, p) = P\big(\sum_{j=1}^{n} Y_j = m\big)$,
then of course
$b(m, n, p) = \binom{n}{m} p^m q^{n-m}$, (6.1.5)
where $q = 1 - p$. If $X_j = Y_j - p$, then $E(X_j) = 0$, $V(X_j) = pq$, and $Z_n$ takes only the values
$x_m = (m - np)/(npq)^{1/2}$ $(m = 0, 1, 2, \dots, n)$,
with respective probabilities $b(m, n, p)$. If we apply Stirling's formula to (6.1.5), we obtain the following local limit theorem: if $x_m = o(n^{1/2})$ as $n \to \infty$, then
$b(m, n, p) \sim (2\pi npq)^{-1/2} \exp\{-\tfrac12 x_m^2 - \phi(x_m)\}$, (6.1.6)
where
$\phi(x) = …$ (6.1.7)
We remark that the asymptotic formula (6.1.6) can be very useful, and is often much easier to compute than the exact expression (6.1.5).

Suppose that the random variables $X_j$ introduced at the beginning of the section satisfy Cramér's condition that, for some $a > 0$,
(C) $E(\exp a|X_j|) < \infty$. (6.1.8)
Then the following theorem will be proved later.
Theorem 6.1.1. If $x \ge 0$ and $x = o(n^{1/2})$ as $n \to \infty$, then
… (6.1.9)
and
… (6.1.10)
Here $\lambda(z)$ is a power series constructed by means of the cumulants of the $X_j$, and converging in a neighbourhood of $z = 0$, which conversely determines the distribution of $X_j$, and …

This theorem displays an important characteristic property. In it $x$ is only restricted to the range $[0, o(n^{1/2})]$, but suppose we restrict it to the narrower interval $[0, n^{\alpha}]$, where $\alpha < \tfrac12$. Then it is unnecessary to include in (6.1.9) the whole power series
$\lambda(z) = \lambda_0 + \lambda_1 z + \dots$, (6.1.11)
since the truncated form
$\lambda^{[s]}(z) = \lambda_0 + \lambda_1 z + \dots + \lambda_s z^s$ (6.1.12)
gives the same asymptotic formula, where $s$ is the integer satisfying
… (6.1.13)
Now it will be seen that the coefficients $\lambda_k$ $(k \le s)$ are determined by the cumulants of $X_j$ up to order $(s + 3)$. Thus if we have two sequences $\{X_j\}$ and $\{X_j'\}$, both satisfying Cramér's condition, whose moments agree up to order $(s + 3)$, and $Z_n$ and $Z_n'$ are the corresponding normalised sums, then for $|x| \le n^{\alpha}$,
$P(Z_n' > x)/P(Z_n > x) \to 1$, … (6.1.14)
as $n \to \infty$. Thus the asymptotic behaviour of the tails of the distribution of $Z_n$, in the range $|x| \le n^{\alpha}$ $(\alpha < \tfrac12)$, is determined for distributions satisfying Cramér's condition by a finite number of parameters, the first $(s + 3)$ moments of $X_j$. This situation is analogous to the classical case, in which a whole class of distributions is attracted to the same stable law. It is however in sharp contrast to the case $\alpha = \tfrac12$, in which the whole function $\lambda(z)$ enters, since two different distributions have different functions $\lambda(z)$. Theorems of the former type we will describe as having a "collective" character.

In the range $x = o(n^{1/2})$ the asymptotic expressions (6.1.9) and (6.1.10) are less valuable, since they are not collective. They only have a computational value if it is easier to compute $\lambda(z)$ than to calculate the convolutions directly. At the same time, these expressions can have a role in the approximate estimation of the probabilities of large deviations (cf. [18], [185]).

Sometimes it is necessary to give bounds for such probabilities in wider ranges $x = O(n^{1/2})$, in which the case of the Bernoulli scheme shows that we can have $P(Z_n > x) = 0$. For such cases Bernstein's inequality (§ 7.5) gives an upper bound of wide applicability.

Let us remark that the study of the very large deviations $x = O(n^{1/2})$ gives rise to an expression involving the entropy of a certain system of events (Sanov [166]). We illustrate this by a simple example of the multinomial distribution. Suppose we require to test two alternative hypotheses $H_0$, $H_1$ by means of a series of $n$ independent trials with possible outcomes $A_1, A_2, \dots, A_r$. According to $H_0$ the respective probabilities of these outcomes are $p_1, p_2, \dots, p_r$; according to $H_1$ they are all equal to $1/r$. The likelihood ratio test accepts $H_0$ if
$L(H_0)/L(H_1) = r^n p_1^{m_1} \cdots p_r^{m_r} > \xi$, (6.1.15)
where $m_i$ is the number of trials resulting in the outcome $A_i$,
$L(H_0) = \dfrac{n!}{m_1!\, m_2! \cdots m_r!}\, p_1^{m_1} p_2^{m_2} \cdots p_r^{m_r}$
is the likelihood of $H_0$, and $L(H_1)$ is similarly defined. Now (6.1.15) can be thrown into the form
$m_1 \log p_1 + m_2 \log p_2 + \dots + m_r \log p_r > \log \xi - n \log r$, (6.1.16)
and the expectation of the left-hand side, under $H_0$, is, up to sign, $n$ times the entropy of the scheme $A_1, A_2, \dots, A_r$ under this hypothesis. Now suppose that $H_1$ is true, so that $H_0$ is false, and the observations $m_1, m_2, \dots, m_r$ represent
large deviations from $np_1, np_2, \dots, np_r$. Were this not so, we could apply the well-known Laplace approximation
$L(H_0) \sim (2\pi n)^{-(r-1)/2} (p_1 p_2 \cdots p_r)^{-1/2} \exp\big\{-\tfrac12 \sum_{i=1}^{r} q_i x_i^2\big\}$, (6.1.17)
where
$q_i = 1 - p_i$, $x_i = (m_i - np_i)/(np_i q_i)^{1/2}$.
Using this approximation, the likelihood ratio criterion would give a quadratic rather than a linear form (6.1.16). The reason for the discrepancy is that (6.1.17) does not hold when $x_i$ is of order $n^{1/2}$; the correct asymptotic expression includes terms of entropy type.

§ 2. Statement of the problem

For the variables $X_j$ introduced at the beginning of the chapter, we examine the behaviour of the tail probabilities
$P(Z_n > x)$, $P(Z_n < -x)$ (6.2.1)
as $n \to \infty$ for $x$ in the range $[0, \psi(n)]$, $\psi(n)$ being a function tending monotonically to infinity. We shall seek theorems which imply that, for all $x \in [0, \psi(n)]$, as $n \to \infty$,
$P(Z_n > x)/\Phi(x, a_1, a_2, \dots, a_k, n) \to 1$, (6.2.2)
$P(Z_n < -x)/\Phi(-x, b_1, b_2, \dots, b_l, n) \to 1$, (6.2.3)
where the parameters $a_1, \dots, a_k, b_1, \dots, b_l$ are linear functionals of the distribution $F$ of the variables $X_j$. Such a limit theorem will have a collective character, since it will show that all distributions for which these linear functionals have given values have the same limiting behaviour. To put it another way, we can speak of the "domain of attraction" of the "limiting tails" $\Phi$.

The problem of discovering the possible forms of the limiting tails is closely analogous to the classical problem of characterising the possible limit laws for centralised and normalised sums of independent variables, i.e. the stable laws. And of course there is a corresponding problem of local limit theorems.

In the following chapters, several systems of limiting tails are considered. When the $X_j$ have finite variance, the appropriate system is due to Cramér,
the $a_j$, $b_j$ are moments of $X_j$, and collective theorems hold for $\psi(n) = n^{\alpha}$ $(\alpha < \tfrac12)$. If not all the moments of $X_j$ exist, limit theorems may still be valid, the $a_j$, $b_j$ being "pseudo-moments" defined by analytic continuation. In these theorems $\psi(n)$ can be arbitrarily large.

There is one property of limit theorems for large deviations which should be remarked; the local theorems are usually easier to prove than the corresponding integral theorems. This is because, although the former are stronger, they are naturally stated under stronger conditions, and these considerably ease the proofs. Because of this, we begin with local limit theorems.
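The remark in § 1 that the Laplace approximation fails when $x_i$ is of order $n^{1/2}$, and that the correct expression contains terms of entropy type, can be illustrated in the binomial case $r = 2$. The sketch below compares the exact probability $b(m, n, p)$ with the normal local approximation and with the Stirling approximation whose exponent is $n$ times a relative entropy. The second formula is the standard consequence of Stirling's formula and is not quoted from the text; the function names and the values of `n`, `p`, `m` are illustrative.

```python
import math

def log_binom_pmf(m, n, p):
    """Exact log b(m, n, p), computed through log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
            + m * math.log(p) + (n - m) * math.log(1.0 - p))

def log_normal_approx(m, n, p):
    """Log of the local normal approximation (2 pi n p q)^{-1/2} exp(-x_m^2 / 2)."""
    q = 1.0 - p
    x = (m - n * p) / math.sqrt(n * p * q)
    return -0.5 * math.log(2.0 * math.pi * n * p * q) - 0.5 * x * x

def log_entropy_approx(m, n, p):
    """Log of the Stirling approximation with a relative-entropy exponent."""
    theta = m / n
    kl = (theta * math.log(theta / p)
          + (1.0 - theta) * math.log((1.0 - theta) / (1.0 - p)))
    return -0.5 * math.log(2.0 * math.pi * n * theta * (1.0 - theta)) - n * kl

# A deviation of order n^{1/2}: here x_m is roughly 6.9.
n, p, m = 1000, 0.3, 400
exact = log_binom_pmf(m, n, p)
ratio_normal = math.exp(log_normal_approx(m, n, p) - exact)
ratio_entropy = math.exp(log_entropy_approx(m, n, p) - exact)
```

For these values the entropy-type formula agrees with the exact probability to well within one per cent, while the normal approximation is off by a factor of about three.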
Chapter 7

RICHTER'S LOCAL THEOREMS AND BERNSTEIN'S INEQUALITY

§ 1. Statement of the theorems

The theorems of this chapter do not have a collective character, and are related to Theorem 6.1.1. We shall consider a sequence of independent, identically distributed random variables $X_j$ with
$E(X_j) = 0$, $V(X_j) = \sigma^2 > 0$ (7.1.1)
satisfying Cramér's condition
(C) $E\{\exp(a|X_j|)\} < \infty$, (7.1.2)
where $a$ is a positive constant. We shall call such variables those of class (C), and distinguish the subclass (C, d) of variables with a bounded continuous probability density $g(x)$, and the subclass (C, e) of lattice variables, i.e. those taking only the values $b + kh$ $(k = 0, \pm 1, \dots)$, $h$ being maximal.

Assuming, as before, that
$Z_n = (X_1 + X_2 + \dots + X_n)/\sigma n^{1/2}$,
we remark that, for (C, d) variables, $Z_n$ has a probability density $p_n(x)$, while for (C, e) variables, $Z_n$ takes only the values
$x_{nk} = (kh + bn)/\sigma n^{1/2}$.
The local theorems of Richter [147], [148] treat the asymptotic behaviour of $p_n(x)$ and $P(Z_n = x_{nk})$ respectively. We shall consider only the simplest formulation of these theorems, in order to make the proofs reasonably simple (cf. §§ 4.2, 4.3).

Theorem 7.1.1. If the variables $X_j$ belong to (C, d) then, for $x \ge 1$, $x = o(n^{1/2})$ as $n \to \infty$, we have
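$\dfrac{p_n(x)}{p_0(x)} = \exp\Big\{\dfrac{x^3}{n^{1/2}}\, \lambda\Big(\dfrac{x}{n^{1/2}}\Big)\Big\} \Big[1 + O\Big(\dfrac{x}{n^{1/2}}\Big)\Big]$, (7.1.3)
$\dfrac{p_n(-x)}{p_0(-x)} = \exp\Big\{-\dfrac{x^3}{n^{1/2}}\, \lambda\Big(-\dfrac{x}{n^{1/2}}\Big)\Big\} \Big[1 + O\Big(\dfrac{x}{n^{1/2}}\Big)\Big]$. (7.1.4)
Here
$p_0(x) = (2\pi)^{-1/2} e^{-x^2/2}$
and
$\lambda(z) = \lambda_0 + \lambda_1 z + \dots$
is Cramér's power series, convergent for $|z| < \varepsilon(a)$, where $\varepsilon(a)$ depends only on $a$ (cf. (6.1.11)). The construction of this power series will be detailed later.

Theorem 7.1.2. If the variables $X_j$ belong to (C, e), and $x = x_{nk} = (kh + bn)/\sigma n^{1/2}$, then for $x \ge 1$, $x = o(n^{1/2})$ as $n \to \infty$, we have
$\dfrac{\sigma n^{1/2}}{h}\, P(Z_n = x_{nk}) = p_0(x) \exp\Big\{\dfrac{x^3}{n^{1/2}}\, \lambda\Big(\dfrac{x}{n^{1/2}}\Big)\Big\} \Big[1 + O\Big(\dfrac{x}{n^{1/2}}\Big)\Big]$. (7.1.5)
For $x \le -1$, $x = o(n^{1/2})$, we have
$\dfrac{\sigma n^{1/2}}{h}\, P(Z_n = x_{nk}) = p_0(x) \exp\Big\{-\dfrac{|x|^3}{n^{1/2}}\, \lambda\Big(-\dfrac{|x|}{n^{1/2}}\Big)\Big\} \Big[1 + O\Big(\dfrac{|x|}{n^{1/2}}\Big)\Big]$. (7.1.6)
The symbols $p_0(x)$, $\lambda(z)$ have the same meanings as before.

Theorems 7.1.1 and 7.1.2 will be proved by the method of steepest descents.

§ 2. A local limit theorem for probability densities

Let the $X_j$ belong to (C, d) and denote their characteristic function by
$\phi(t) = M(it) = E(e^{itX_j})$.
We remark that $\phi(t) \in L_2(-\infty, \infty)$, i.e. that
$\int_{-\infty}^{\infty} |\phi(t)|^2\,dt < \infty$. (7.2.1)
Indeed, $|\phi(t)|^2$ is the characteristic function of $X_1 - X_2$, which has a bounded continuous density. Then (7.2.1) follows from the following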
lemma from the theory of Fourier transforms (for the proof of which see [11], page 20).

Lemma 7.2.1. If a bounded continuous function $g(x) \in L_1(-\infty, \infty)$ has a non-negative Fourier transform $h(t)$, then $h(t) \in L_1(-\infty, \infty)$.

The relation (7.2.1) permits us to express $p_n(x)$ $(n \ge 2)$ by the inversion formula
$p_n(x) = \dfrac{\sigma n^{1/2}}{2\pi i} \int_{-i\infty}^{i\infty} M(z)^n \exp(-\sigma n^{1/2} x z)\,dz$, (7.2.2)
the integral being taken along the imaginary axis. Since $g$ is bounded and continuous, $M(z) \to 0$ as $z \to \pm i\infty$. Moreover, $|M(z)| < 1$ for $z \ne 0$ on this axis. Hence for any $\varepsilon > 0$, $n > 2$,
$\int_{|t| > \varepsilon} |M(it)|^n\,dt < (1 - \eta(\varepsilon))^{n-2} \int_{-\infty}^{\infty} |M(it)|^2\,dt = B (1 - \eta(\varepsilon))^{n-2}$. (7.2.3)
Here $B$ is bounded and $\eta(\varepsilon) > 0$. The right-hand side of (7.2.3) can be written as $B \exp(-n \eta_1(\varepsilon))$, where $\eta_1(\varepsilon) > 0$. Substituting into (7.2.2) and using the fact that, on the imaginary axis, $|\exp(-\sigma n^{1/2} x z)| = 1$, we have
… (7.2.4)
Because of condition (C) in (7.1.2), $M(z)$ has an analytic continuation to the strip $|\mathrm{Re}\, z| < a$, which has a power series expansion about $z = 0$ convergent in $|z| \le \tfrac12 a = a_1$. The integrand in (7.2.4) has the form
$M(z)^n \exp(-\sigma n^{1/2} z x)$. (7.2.5)
We shall suppose that $\varepsilon$ is chosen so small that $\varepsilon < a_1$ and that $|M(z)| > \tfrac12$ in $|z| < \varepsilon$ (this being possible since $M$ is continuous and $M(0) = 1$). In $|z| \le \varepsilon$ define $K(z)$ as the branch of $\log M(z)$ with $K(0) = 0$. Then (7.2.5) may be written as
$\exp\{n(K(z) - \sigma \tau z)\}$, (7.2.6)
where $\tau = x/n^{1/2}$; we assume that $x \ge 1$.

Because $|M(z)| \ge \tfrac12$ in $|z| \le \varepsilon$, $K(z)$ is a regular function of $z$ in this circle, and has a Taylor expansion
$K(z) = \sum_{k=2}^{\infty} \gamma_k z^k / k!$, (7.2.7)
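where
$\gamma_2 = \sigma^2$, $\gamma_3 = \mu_3$, $\gamma_4 = \mu_4 - 3\sigma^4$,
etc. are the cumulants of $X_j$ and $\mu_i$ are the moments of $X_j$.

Turning now to (7.2.6), we assume that $x = o(n^{1/2})$, so that $\tau \to 0$ as $n \to \infty$. The saddle point equation (see for instance [24]) is
$K'(z) - \sigma \tau = 0$ (7.2.8)
or
$\sigma \tau = \sigma^2 z + \gamma_3 \dfrac{z^2}{2} + \gamma_4 \dfrac{z^3}{6} + \dots$, (7.2.9)
or
$\tau = \sigma z + \dfrac{\gamma_3 z^2}{2\sigma} + \dfrac{\gamma_4 z^3}{6\sigma} + \dots$ (7.2.10)
If $\tau$ is sufficiently small, and this will be true for large $n$, (7.2.10) may be inverted as a power series in $\tau$, converging for sufficiently small $\tau$. This gives the position of the saddle point as
$z_0 = \dfrac{\tau}{\sigma} - \dfrac{\gamma_3 \tau^2}{2\sigma^4} + \dots$ (7.2.11)
(by the rules for manipulating power series). For sufficiently small $\tau$, $z_0$ will lie inside the circle $|z| \le \tfrac12 \varepsilon = \varepsilon_1$, and from (7.2.11) will lie on the positive half of the real axis.

We consider the rectangular contour
$L_1 + L_2 + L_3 + L_4$, (7.2.12)
where
$L_1 = (i\varepsilon_1, -i\varepsilon_1)$, $L_2 = (-i\varepsilon_1, z_0 - i\varepsilon_1)$, $L_3 = (z_0 - i\varepsilon_1, z_0 + i\varepsilon_1)$, $L_4 = (z_0 + i\varepsilon_1, i\varepsilon_1)$.
By Cauchy's theorem the function (7.2.6) has zero integral around this contour, so that, replacing $\varepsilon$ by $\varepsilon_1$ in (7.2.4), the integral over the segment $(-i\varepsilon_1, i\varepsilon_1)$ may be replaced by
$\int_{L_2} + \int_{L_3} + \int_{L_4}$. (7.2.13)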
Because
$M(it) = \exp K(it) = \exp(-\tfrac12 \sigma^2 t^2 + O(t^3))$,
we have, for $\varepsilon_1$ sufficiently small,
… (7.2.14)
Because $M$ is continuous, when $\tau$ is sufficiently small,
… (7.2.15)
on $L_2$ and $L_4$. Moreover,
$|\exp(-\sigma n^{1/2} x z)| = \exp(-\sigma n^{1/2} x\, \mathrm{Re}\, z) \le 1$,
and therefore
$\int_{L_2},\ \int_{L_4} = B \exp(-n \eta_2(\varepsilon_1))$ (7.2.16)
for $\eta_2(\varepsilon_1) > 0$. We therefore have, from (7.2.13),
$p_n(x) = \dfrac{\sigma n^{1/2}}{2\pi i} \int_{L_3} \exp\{n(K(z) - \sigma \tau z)\}\,dz + O\{\exp(-n \eta(\varepsilon_1))\}$, (7.2.17)
where $\eta(\varepsilon_1) = \min[\eta_1(\varepsilon_1), \eta_2(\varepsilon_1)]$. If $z = z_0 + it$ and $\tau$ is small, then
$K(z) - \sigma \tau z = K(z_0) - \sigma \tau z_0 + \sum_{j=2}^{\infty} \dfrac{K^{(j)}(z_0)}{j!} (it)^j$. (7.2.18)
Moreover,
$K(z_0) - \sigma \tau z_0 = K(z_0) - z_0 K'(z_0) = …$
Using (7.2.11), we have
$K(z_0) - \sigma \tau z_0 = -\tfrac12 \tau^2 + \tau^3 \lambda(\tau)$, (7.2.19)
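where
$\lambda(\tau) = \lambda_0 + \lambda_1 \tau + \dots$ (7.2.20)
is completely determined by $K(z)$, converges for sufficiently small $\tau$, and is called Cramér's series.

The series in (7.2.18) is the Taylor expansion of $K(z)$ about $z_0$, and its radius of convergence is at least $\tfrac12 \varepsilon = \varepsilon_1$. From (7.2.11),
…
for sufficiently large $n$. For $|t| < \varepsilon_1$ we have
$n \sum_{j=2}^{\infty} \dfrac{K^{(j)}(z_0)}{j!} (it)^j = -\tfrac12 n K''(z_0) t^2 + n\, O(t^3)$. (7.2.21)
Consider $t$ in the range
$n^{-1/2} (\log n)^2 < |t| \le \varepsilon_1$. (7.2.22)
Because of (7.2.21) we have in this range,
$\mathrm{Re}\Big\{ n \sum_{j=2}^{\infty} \dfrac{K^{(j)}(z_0)}{j!} (it)^j \Big\} < -\tfrac14 n K''(z_0) t^2$, (7.2.23)
if $\varepsilon_1$ is sufficiently small. Further
… (7.2.24)
where $c_1$ is a positive constant. Inserting (7.2.18) and (7.2.24) into (7.2.17), we obtain
$p_n(x) = \dfrac{\sigma n^{1/2}}{2\pi} \exp\{n(K(z_0) - \sigma \tau z_0)\} \int_{|t| \le n^{-1/2} (\log n)^2} \exp\Big\{ n \sum_{j=2}^{\infty} \dfrac{K^{(j)}(z_0)}{j!} (it)^j \Big\}\,dt + O\big( n^{1/2} \exp\{n(K(z_0) - \sigma \tau z_0)\} \exp(-c_1 \log^4 n) \big) + …$ (7.2.25)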
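§ 3. Calculation of the integral near a saddle point

From now on $B$ will denote a bounded quantity, not necessarily the same from one occurrence to another. If $|t| \le n^{-1/2} (\log n)^2$,
$n \sum_{j=2}^{\infty} \dfrac{K^{(j)}(z_0)}{j!} (it)^j = -\tfrac12 n K''(z_0) t^2 + \tfrac16 n K'''(z_0) (it)^3 + B n^{-1} (\log n)^8$,
and
$\tfrac16 n K'''(z_0) (it)^3 = B n^{-1/2} (\log n)^6$,
so that
$\int_{|t| \le n^{-1/2} (\log n)^2} \exp\Big\{ n \sum_{j=2}^{\infty} \dfrac{K^{(j)}(z_0)}{j!} (it)^j \Big\}\,dt$
$= \int_{|t| \le n^{-1/2} (\log n)^2} \exp(-\tfrac12 n K''(z_0) t^2) \big(1 + \tfrac16 n K'''(z_0) (it)^3 + B n^{-1} \log^8 n\big)\,dt$
$= \int_{-\infty}^{\infty} \exp(-\tfrac12 n K''(z_0) t^2)\,dt\ (1 + B n^{-1} \log^8 n)$. (7.3.1)
The integral is equal to
$(2\pi / n K''(z_0))^{1/2}$, (7.3.2)
so that (7.3.1) is equal to
$(2\pi / n K''(z_0))^{1/2} (1 + B n^{-1} \log^8 n)$. (7.3.3)
Thus the first term on the right-hand side of (7.2.25) is equal to
$\sigma (2\pi K''(z_0))^{-1/2} \exp\{n(K(z_0) - \sigma \tau z_0)\} (1 + B n^{-1} \log^8 n)$, (7.3.4)
or, because of (7.2.19),
$\sigma (2\pi K''(z_0))^{-1/2} \exp\{-\tfrac12 n \tau^2 + n \tau^3 \lambda(\tau)\} (1 + B n^{-1} \log^8 n)$. (7.3.5)
Furthermore, (7.2.7) gives
$K''(z_0) = \sigma^2 + B z_0 = \sigma^2 + B \tau$. (7.3.6)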
Substituting into (7.3.5) and noting that
$(1 + B\tau)(1 + B n^{-1} \log^8 n) = 1 + O(x/n^{1/2})$,
(7.3.5) becomes
$(2\pi)^{-1/2} \exp\{-\tfrac12 n \tau^2 + n \tau^3 \lambda(\tau)\} (1 + O(x/n^{1/2}))$. (7.3.7)
We now remark that, for $\tau = o(1)$,
… $n^{1/2} \exp(-c_1 \log^4 n) = O(x/n^{1/2})$,
so that (7.2.25) and (7.3.7) combine to give (7.1.3). To obtain (7.1.4) replace $X_j$ by $-X_j$; Theorem 7.1.1 is proved. •

We shall make a few remarks about Cramér's series (7.2.20). It is easy to verify that the first $k$ coefficients of this series determine the first $(k + 3)$ moments of $X_j$ (assuming that $E X_j = 0$ and that $\sigma^2 = V X_j$ is known). In fact if these coefficients are known, we have the first $(k + 3)$ terms of the expansion of
$K(z_0) - \sigma \tau z_0 = K(z_0) - z_0 K'(z_0)$
in powers of $\tau$. Hence from (7.2.9) we can determine the cumulants $\gamma_m$ $(m \le k + 3)$ and hence the moments $\mu_m$ $(m \le k + 3)$. The argument reverses; if $\mu_3, \dots, \mu_{k+3}$ are known, then $\lambda_0, \lambda_1, \dots, \lambda_{k-1}$ are determined.

§ 4. A local limit theorem for lattice variables

We now proceed to the proof of Theorem 7.1.2. We introduce
$M(z) = \sum_{k=-\infty}^{\infty} p_k \exp[z(kh + b)]$, $p_k = P(X_j = b + kh)$, (7.4.1)
defined in $|\mathrm{Re}\, z| < a$ because of (7.1.2), and periodic with period $2\pi i/h$. Write
$S_n = \sum_{j=1}^{n} X_j$, $P_n(k) = P(S_n = kh + bn)$,
(these being the only values taken by $S_n$). Then, if $|\mathrm{Re}\, z| < a$,
$M(z)^n = \sum_{k=-\infty}^{\infty} P_n(k) \exp[z(kh + bn)]$. (7.4.2)
For any $c$ in $-\tfrac12 a \le c \le \tfrac12 a$, multiply (7.4.2) by $\exp[-z(k_0 h + bn)]$ and integrate along the segment from $c - i\pi/h$ to $c + i\pi/h$ to obtain (after writing $k$ for $k_0$),
$P_n(k) = \dfrac{h}{2\pi i} \int_{c - i\pi/h}^{c + i\pi/h} M(z)^n \exp[-z(kh + bn)]\,dz$. (7.4.3)
Writing $x = x_{nk} = (kh + bn)/\sigma n^{1/2}$, we have
$P_n(k) = \dfrac{h}{2\pi i} \int_{c - i\pi/h}^{c + i\pi/h} M(z)^n \exp(-z \sigma n^{1/2} x)\,dz$. (7.4.4)
We now remark that, for $|t| \le \pi/h$, $t \ne 0$, the strict inequality
$|M(c + it)| < M(c)$ (7.4.5)
obtains. The weak inequality is obvious, and if there is equality we must have
$e^{khit} = 1$ (7.4.6)
whenever $p_k \ne 0$, which contradicts the maximality of $h$.

Assuming that $x \ge 1$, $x = o(n^{1/2})$ and keeping the notation of the previous sections, we find that the saddle point is at $z_0$, determined by (7.2.10). For sufficiently small $\varepsilon_1$, we take $c = z_0$ and study the integral
$\dfrac{h}{2\pi i} \int_{z_0 - i\varepsilon_1}^{z_0 + i\varepsilon_1} M(z)^n \exp(-z \sigma \tau n)\,dz$. (7.4.7)
This differs from (7.2.17) only by a factor $\sigma n^{1/2}/h$, and consequently, according to (7.3.7), is equal to
$h \sigma^{-1} (2\pi n)^{-1/2} \exp\{-\tfrac12 n \tau^2 + n \tau^3 \lambda(\tau)\} (1 + O(x/n^{1/2}))$. (7.4.8)
Further, according to (7.2.19),
$M(z_0)^n \exp(-z_0 \sigma \tau n) = \exp\{-\tfrac12 n \tau^2 + n \tau^3 \lambda(\tau)\}$. (7.4.9)
Because of (7.4.5) and the continuity of $M$ we have
$|M(z_0 + it)| < M(z_0)(1 - \eta(\varepsilon_1))$ (7.4.10)
for $\varepsilon_1 \le |t| \le \pi/h$, where $\eta(\varepsilon_1)$ is a positive constant not depending on $z_0$.
Hence
$\Big| \int_{\varepsilon_1 \le |t| \le \pi/h} M(z_0 + it)^n \exp(-z \sigma \tau n)\,dz \Big| \le \int |M(z_0 + it)|^n\, |\exp(-z \sigma \tau n)|\,|dz| = B\, M(z_0)^n \exp(-z_0 \sigma \tau n) (1 - \eta(\varepsilon_1))^n$. (7.4.11)
Since $x \ge 1$, $x/n > (1 - \eta(\varepsilon_1))^n$ for sufficiently large $n$, and this, together with (7.4.9) and (7.4.8), gives (7.1.5); (7.1.6) follows on replacing $X_j$ by $-X_j$.

§ 5. Bernstein's inequality

We have remarked before that useful results of the theory of large deviations are not always asymptotic expansions, but are sometimes inequalities. These are particularly useful if they admit effective computation, and if the constants in them are best possible, or nearly so. Important among these is Bernstein's inequality ([7], pages 161-165).

We assume that the independent random variables $X_1, X_2, \dots$ satisfy
$E(X_i) = a_i$, $V(X_i) = b_i$, (7.5.1)
and write
$Z_i = X_i - a_i$, $S_n = Z_1 + Z_2 + \dots + Z_n$, $B_n = b_1 + b_2 + \dots + b_n$.

Theorem 7.5.1. Suppose that, for some $H > 0$ and all $k \ge 2$,
$|E(Z_i^k)| \le \tfrac12 b_i H^{k-2} k!$. (7.5.2)
Then, for $0 < t \le B_n^{1/2}/2H$,
$P(S_n > 2t B_n^{1/2}) < e^{-t^2}$, $P(S_n < -2t B_n^{1/2}) < e^{-t^2}$, $P(|S_n| > 2t B_n^{1/2}) < 2e^{-t^2}$. (7.5.3)

Proof. It is sufficient to prove the first of the inequalities (7.5.3); the other two follow from it in an obvious way. From (7.5.2) it is clear that
$E \exp(y Z_i) < \infty$ (7.5.4)
if $|y| < H^{-1}$. Take $0 < y \le (2H)^{-1}$. Then
$I(y) = E \exp[y(Z_1 + Z_2 + \dots + Z_n)] = E \exp(y S_n)$.
Consider the inequality
$y S_n > t^2 + \log I(y)$ (7.5.5)
or
$\exp(y S_n)/I(y) > e^{t^2}$. (7.5.6)
Since the left-hand side of (7.5.6) has expectation 1, Chebyshev's inequality shows that this inequality has probability at most $e^{-t^2}$, i.e.
$P\{y S_n > t^2 + \log I(y)\} < e^{-t^2}$. (7.5.7)
But, since $y H \le \tfrac12$,
$E(e^{y Z_i}) = 1 + \sum_{k=2}^{\infty} \dfrac{y^k}{k!} E(Z_i^k) \le 1 + \tfrac12 b_i y^2 \sum_{k=2}^{\infty} (yH)^{k-2} \le 1 + b_i y^2 < \exp(y^2 b_i)$, (7.5.8)
so that
$I(y) = \prod_{i=1}^{n} E(e^{y Z_i}) < \exp(y^2 B_n)$. (7.5.9)
Thus
$P(y S_n > t^2 + y^2 B_n) \le P(y S_n > t^2 + \log I(y)) < e^{-t^2}$. (7.5.10)
Now take $y = t B_n^{-1/2}$, so that $y H \le \tfrac12$ by the condition assumed of $t$. Then from (7.5.10) we deduce that
$P(S_n > 2t B_n^{1/2}) < e^{-t^2}$, (7.5.11)
and (7.5.3) is proved. •
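Bernstein's inequality admits a direct numerical check. The sketch below takes, as an illustrative assumption, $Z_i = \pm 1$ with probability $\tfrac12$ each, so that $b_i = 1$, $B_n = n$, and the moment condition holds with $H = 1$, since $|E(Z_i^k)| \le 1 \le \tfrac12 k!$ for $k \ge 2$. It compares the exact upper tail with the bound $e^{-t^2}$ over the admissible range $0 < t \le B_n^{1/2}/2H$. The function names are hypothetical.

```python
import math

def rademacher_upper_tail(n, s):
    """Exact P(S_n > s) when S_n is a sum of n independent +1/-1 signs."""
    # S_n = 2K - n with K ~ Binomial(n, 1/2), so S_n > s iff K > (n + s) / 2.
    k_min = math.floor((n + s) / 2.0) + 1
    favourable = sum(math.comb(n, k) for k in range(max(k_min, 0), n + 1))
    return favourable / 2.0 ** n

def bernstein_bound(t):
    """The bound e^{-t^2} for P(S_n > 2 t B_n^{1/2})."""
    return math.exp(-t * t)

n = 100
H = 1.0                 # assumed constant in the moment condition (see lead-in)
B_n = float(n)          # B_n = b_1 + ... + b_n with b_i = 1
t_max = math.sqrt(B_n) / (2.0 * H)

checks = []
for t in (0.5, 1.0, 2.0, 3.0, t_max):
    tail = rademacher_upper_tail(n, 2.0 * t * math.sqrt(B_n))
    checks.append((t, tail, bernstein_bound(t)))
```

Each entry of `checks` holds a value of $t$, the exact tail probability and the bound; in this example the bound is far from tight, which is the price paid for its generality.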
Chapter 8

CRAMÉR'S INTEGRAL THEOREM AND ITS REFINEMENT BY PETROV

§ 1. Statement of the theorem

The first general result in the theory of large deviations was the integral theorem of Cramér [19], published in 1938, which has considerable computational and analytical usefulness. It was refined and generalised by Petrov [133] in 1954. In this chapter we discuss the work of Petrov, keeping for simplicity to the case of identically distributed variables $X_j$.

It should be remarked that the most natural method for proving integral theorems under Cramér's condition is the method of steepest descents, whose use for local theorems was described in the last chapter. In the case in which the distributions of the $X_j$ are different such an approach encounters, however, considerable difficulty.

Let the $X_j$ satisfy (7.1.1) and Cramér's condition (7.1.2), and let $\lambda(z)$ denote Cramér's series (7.2.20). Write
$Z_n = (X_1 + X_2 + \dots + X_n)/\sigma n^{1/2}$, $F_n(x) = P(Z_n < x)$, $V(y) = P(X_j < y)$.
Then Petrov's refinement of Cramér's theorem, in the identically distributed case, has the following form.

Theorem 8.1.1. For $x > 1$, $x = o(n^{1/2})$, we have
$\dfrac{1 - F_n(x)}{1 - \Phi(x)} = \exp\Big\{\dfrac{x^3}{n^{1/2}}\, \lambda\Big(\dfrac{x}{n^{1/2}}\Big)\Big\} \Big[1 + O\Big(\dfrac{x}{n^{1/2}}\Big)\Big]$, (8.1.1)
$\dfrac{F_n(-x)}{\Phi(-x)} = \exp\Big\{-\dfrac{x^3}{n^{1/2}}\, \lambda\Big(-\dfrac{x}{n^{1/2}}\Big)\Big\} \Big[1 + O\Big(\dfrac{x}{n^{1/2}}\Big)\Big]$. (8.1.2)
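The upper-tail statement of the theorem can be tried out on a law for which everything is explicit. The sketch below assumes $X_j = E_j - 1$ with $E_j$ standard exponential (an illustrative choice, not taken from the text). For this law $\sigma = 1$ and $K(z) = -z - \log(1 - z)$, and the saddle point equation gives $K(z_0) - \tau z_0 = \log(1 + \tau) - \tau$, so that $n \tau^3 \lambda(\tau) = n\{\log(1 + \tau) - \tau + \tfrac12 \tau^2\}$ in closed form. The exact tail of the sum is an Erlang tail. All function names are hypothetical.

```python
import math

def erlang_upper_tail(n, y):
    """P(E_1 + ... + E_n > y) for independent standard exponentials E_i."""
    # This equals the probability that a Poisson(y) variable is at most n - 1.
    log_y = math.log(y)
    return math.fsum(
        math.exp(-y + k * log_y - math.lgamma(k + 1)) for k in range(n)
    )

def normal_upper_tail(x):
    """1 - Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def cramer_exponent(n, x):
    """n * tau^3 * lambda(tau) for X_j = E_j - 1, with tau = x / sqrt(n)."""
    tau = x / math.sqrt(n)
    return n * (math.log1p(tau) - tau + 0.5 * tau * tau)

n, x = 400, 3.0
observed = erlang_upper_tail(n, n + x * math.sqrt(n)) / normal_upper_tail(x)
predicted = math.exp(cramer_exponent(n, x))
```

Here `observed` is the exact ratio $(1 - F_n(x))/(1 - \Phi(x))$ and `predicted` is the exponential factor of the theorem; the two should agree up to the relative error $O(x/n^{1/2})$, while both differ markedly from 1, the value suggested by the central limit theorem alone.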
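§ 2. The introduction of auxiliary random variables

Since
$E(\exp a|X_j|) < \infty$,
we may write, for $|h| < a$,
$R = R(h) = \int_{-\infty}^{\infty} e^{hy}\,dV(y)$. (8.2.1)
Let $\bar{X}_j$ be independent random variables with the distribution function
$\bar{V}(x) = R^{-1} \int_{-\infty}^{x} e^{hy}\,dV(y)$, (8.2.2)
and write
$m = E(\bar{X}_j)$, $\bar{\sigma}^2 = V(\bar{X}_j)$. (8.2.3)
Then
$m = E(\bar{X}_j) = R^{-1} \int_{-\infty}^{\infty} x e^{hx}\,dV(x) = \dfrac{R'(h)}{R(h)} = \dfrac{d}{dh} \log R$, (8.2.4)
and
$\bar{\sigma}^2 = \dfrac{d^2}{dh^2} \log R$. (8.2.5)
Write
$W_n(x) = P(X_1 + \dots + X_n < x)$, $\bar{W}_n(x) = P(\bar{X}_1 + \dots + \bar{X}_n < x)$,
$\bar{F}_n(x) = P(\bar{X}_1 + \dots + \bar{X}_n - mn < \bar{\sigma} n^{1/2} x)$. (8.2.6)
We prove by induction on $n$ the fundamental relation
$W_n(x) = R^n \int_{-\infty}^{x} e^{-hy}\,d\bar{W}_n(y)$. (8.2.7)
When $n = 1$ this follows trivially from (8.2.2). Suppose it is true for a particular value of $n$. Then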
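$W_{n+1}(x) = \int_{-\infty}^{\infty} V(x - z)\,dW_n(z) = R^{n+1} \int_{-\infty}^{\infty} R^{-1} V(x - z) e^{-hz}\,d\bar{W}_n(z) = R^{n+1} \int_{-\infty}^{\infty} d\bar{W}_n(z)\, e^{-hz} \int_{-\infty}^{x - z} e^{-h\xi}\,d\bar{V}(\xi)$. (8.2.8)
Making the substitution $\xi = \eta - z$, (8.2.8) becomes
$W_{n+1}(x) = R^{n+1} \int_{-\infty}^{x} e^{-h\eta}\,d\bar{W}_{n+1}(\eta)$,
whence the induction succeeds, and (8.2.7) is proved.

From (8.2.7) and (8.2.6) we have
$F_n(x) = W_n(x \sigma n^{1/2})$, $\bar{F}_n(x) = \bar{W}_n(mn + x \bar{\sigma} n^{1/2})$, (8.2.9)
so that
$F_n(x) = R^n \int_{-\infty}^{x \sigma n^{1/2}} e^{-h\eta}\,d\bar{W}_n(\eta)$. (8.2.10)
Setting $\eta = mn + y \bar{\sigma} n^{1/2}$, we have
$F_n(x) = R^n e^{-hmn} \int_{-\infty}^{(\sigma x - m n^{1/2})/\bar{\sigma}} \exp(-h y \bar{\sigma} n^{1/2})\,d\bar{F}_n(y)$. (8.2.11)
Letting $x \to \infty$,
$1 = R^n e^{-hmn} \int_{-\infty}^{\infty} \exp(-h y \bar{\sigma} n^{1/2})\,d\bar{F}_n(y)$,
so that
$1 - F_n(x) = R^n e^{-hmn} \int_{(\sigma x - m n^{1/2})/\bar{\sigma}}^{\infty} \exp(-h y \bar{\sigma} n^{1/2})\,d\bar{F}_n(y)$. (8.2.12)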
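The change of measure behind (8.2.7) and (8.2.12) can be verified exactly for a discrete law. The sketch below assumes $X_j = \pm 1$ with probability $\tfrac12$ each (an illustrative choice), for which $R(h) = \cosh h$ and the auxiliary variables $\bar{X}_j$ take the value $+1$ with probability $e^{h}/2R$. It computes $P(S_n > s)$ directly and through the tilted law, before any centring or normalisation. The function name and the parameter values are hypothetical.

```python
import math

def tilted_tail_identity(n, h, s):
    """Return (direct, via_tilting) for X_j = +1/-1 with probability 1/2.

    direct      = P(S_n > s)
    via_tilting = R^n * E_bar[ exp(-h * S_bar_n) ; S_bar_n > s ]
    where R = cosh(h) and the bar refers to the tilted law.
    """
    R = math.cosh(h)
    p_bar = math.exp(h) / (2.0 * R)       # tilted probability of a +1 step
    direct = 0.0
    via_tilting = 0.0
    for k in range(n + 1):                # k counts the +1 steps
        value = 2 * k - n                 # value of the sum
        if value > s:
            direct += math.comb(n, k) * 0.5 ** n
            weight_bar = math.comb(n, k) * p_bar ** k * (1.0 - p_bar) ** (n - k)
            via_tilting += R ** n * math.exp(-h * value) * weight_bar
    return direct, via_tilting

direct, via_tilting = tilted_tail_identity(n=50, h=0.4, s=20)
```

The two returned numbers agree up to rounding for any admissible `h`; the proof in § 3 chooses `h` so that the tilted mean sits at the level of interest, which makes the tilted probability easy to approximate by the central limit theorem.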
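§ 3. Proof of the theorem

Since
$\log R = \log \int_{-\infty}^{\infty} e^{hy}\,dV(y) = \sum_{\nu=2}^{\infty} \dfrac{\gamma_\nu}{\nu!} h^\nu$, (8.3.1)
where $\gamma_\nu$ are the cumulants of the $X_j$ (7.2.7), we have
$m = \dfrac{d}{dh} \log R = \sum_{\nu=2}^{\infty} \dfrac{\gamma_\nu}{(\nu - 1)!} h^{\nu - 1}$, (8.3.2)
$\bar{\sigma}^2 = \dfrac{d^2}{dh^2} \log R = \sum_{\nu=2}^{\infty} \dfrac{\gamma_\nu}{(\nu - 2)!} h^{\nu - 2} > 0$. (8.3.3)
In the notation of § 7.2,
$R(h) = M(h)$, $\log R(h) = K(h)$, $m = K'(h)$,
so that the factor multiplying the integral (8.2.12) is
$\exp\{n(K(h) - h K'(h))\}$. (8.3.4)
We now choose $h$ to be the solution of the saddle point equation
$K'(h) - \sigma \tau = 0$, (8.3.5)
where $\tau = x/n^{1/2} = o(1)$. According to (7.2.18) and (7.2.19) there holds, for sufficiently large $n$, the equation
$K(h) - h K'(h) = K(h) - h \sigma \tau = -\tfrac12 \tau^2 + \tau^3 \lambda(\tau)$, (8.3.6)
where $\lambda(\tau)$ is Cramér's series (7.2.20). Moreover, $m = K'(h)$, so that by (8.3.5), $\sigma x - m n^{1/2} = 0$. Substituting (8.3.6) into (8.2.12), we therefore have
$1 - F_n(x) = \exp\{-\tfrac12 n \tau^2 + n \tau^3 \lambda(\tau)\} \int_{0}^{\infty} \exp(-h \bar{\sigma} n^{1/2} y)\,d\bar{F}_n(y)$. (8.3.7)
We therefore have only to examine the integral in (8.3.7):
$\int_{0}^{\infty} \exp(-h \bar{\sigma} n^{1/2} y)\,d\bar{F}_n(y)$. (8.3.8)
Now $\bar{F}_n$ is the distribution of the normalised sum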
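$(\bar{X}_1 + \bar{X}_2 + \dots + \bar{X}_n - mn)/\bar{\sigma} n^{1/2}$,
so that we can use the theorems of § 3.5. From (8.3.3),
$\bar{\sigma} = \sigma + O(h)$, (8.3.9)
and
$\bar{F}_n(y) = \Phi(y) + Q_n(y)$, $Q_n(y) = B n^{-1/2}$, (8.3.10)
so that, integrating by parts,
$\int_{0}^{\infty} \exp(-h \bar{\sigma} n^{1/2} y)\,d\bar{F}_n(y) = (2\pi)^{-1/2} \int_{0}^{\infty} \exp(-h \bar{\sigma} n^{1/2} y - \tfrac12 y^2)\,dy - Q_n(0) + h \bar{\sigma} n^{1/2} \int_{0}^{\infty} \exp(-h \bar{\sigma} n^{1/2} y)\, Q_n(y)\,dy$. (8.3.11)
The last term in (8.3.11) is
$B n^{-1/2}\, h \bar{\sigma} n^{1/2} \int_{0}^{\infty} \exp(-h \bar{\sigma} n^{1/2} y)\,dy = B n^{-1/2}$, (8.3.12)
and $Q_n(0) = B n^{-1/2}$, so that it remains only to estimate the integral
$(2\pi)^{-1/2} \int_{0}^{\infty} \exp(-h \bar{\sigma} n^{1/2} y - \tfrac12 y^2)\,dy$. (8.3.13)
Now (8.3.2) and (8.3.5) imply that
$m n^{1/2} = \sigma^2 h n^{1/2} + O(h^2 n^{1/2})$,
so that
$h \bar{\sigma} n^{1/2} = m n^{1/2} \sigma^{-1} + O(h^2 n^{1/2})$. (8.3.14)
Substituting (8.3.14) into (8.3.13), we have the expression
$(2\pi)^{-1/2} \int_{0}^{\infty} \exp(-m n^{1/2} \sigma^{-1} y) \exp(B h^2 n^{1/2} y) \exp(-\tfrac12 y^2)\,dy = …$
$= (2\pi)^{-1/2} \int_{0}^{\infty} \exp(-m n^{1/2} \sigma^{-1} y - \tfrac12 y^2)\,dy\ (1 + O(h))$ (8.3.15)
$= \exp(m^2 n / 2\sigma^2)\, (1 - \Phi(m n^{1/2}/\sigma))\, (1 + O(h))$. (8.3.16)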
Since $m = K'(h) = \sigma \tau$ by (8.3.5), we have $m n^{1/2}/\sigma = x$ and $m^2 n / 2\sigma^2 = \tfrac12 n \tau^2$. Using this and (8.3.13), and substituting (8.3.16) into (8.3.7), we find that
$1 - F_n(x) = (1 - \Phi(x)) \exp\{n \tau^3 \lambda(\tau)\} \big(1 + O(h) + O(x/n^{1/2})\big)$. (8.3.17)
Since $h = O(\tau) = O(x/n^{1/2})$, the theorem follows. •
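The two steps that carry the proof, solving the saddle point equation (8.3.5) and reading Cramér's series off the identity (8.3.6), are easy to carry out numerically. The sketch below solves $K'(h) = \sigma \tau$ by Newton's method; Newton's method and the function names are choices made here for illustration, not part of the text. As a test case it assumes $X = E - 1$ with $E$ standard exponential, for which $\sigma = 1$, $\gamma_3 = 2$, $\gamma_4 = 6$, and the solution is $h = \tau/(1 + \tau)$ in closed form.

```python
import math

def cramer_lambda(K, K1, K2, sigma, tau, iterations=50):
    """Solve K'(h) = sigma * tau by Newton's method; return (h, lambda(tau)).

    lambda(tau) is obtained from K(h) - h K'(h) = -tau^2 / 2 + tau^3 * lambda(tau).
    K, K1, K2 are the cumulant generating function and its first two derivatives.
    """
    h = tau / sigma                        # first-order guess, since K'(h) ~ sigma^2 h
    for _ in range(iterations):
        h -= (K1(h) - sigma * tau) / K2(h)
    exponent = K(h) - h * sigma * tau
    return h, (exponent + 0.5 * tau * tau) / tau ** 3

# Illustrative law: X = E - 1 with E standard exponential.
def K(z):
    return -z - math.log1p(-z)

def K1(z):
    return z / (1.0 - z)

def K2(z):
    return 1.0 / (1.0 - z) ** 2

tau = 0.1
h, lam = cramer_lambda(K, K1, K2, sigma=1.0, tau=tau)
lam_closed = (math.log1p(tau) - tau + 0.5 * tau * tau) / tau ** 3
# First two coefficients from the cumulants: lambda_0 = gamma_3 / 6 sigma^3 and
# lambda_1 = (gamma_4 sigma^2 - 3 gamma_3^2) / 24 sigma^6.
lam_two_terms = 1.0 / 3.0 - tau / 4.0
```

The value `lam` should match `lam_closed` to rounding error and `lam_two_terms` up to terms of order $\tau^2$, which illustrates how the leading coefficients of the series are fixed by the low-order cumulants.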
Chapter 9

MONOMIAL ZONES OF LOCAL NORMAL ATTRACTION

§ 1. Zones of normal attraction

In this chapter it will be assumed that the independent random variables $X_j$ satisfy
$E(X_j) = 0$, $V(X_j) = \sigma^2 > 0$,
and that
$S_n = X_1 + X_2 + \dots + X_n$, $Z_n = S_n/\sigma n^{1/2}$.
We shall also suppose that the $X_j$ belong to the class (d) of variables having a bounded continuous probability density $g(x)$. The method discussed in this chapter may be used under less stringent conditions on $g(x)$, and also for lattice variables, but we restrict attention to (d) for simplicity of presentation.

Let $\psi(n)$ be any function increasing to infinity. The segments $[0, \psi(n)]$ will be called a zone of (integral) normal attraction if, uniformly in $x \in [0, \psi(n)]$ as $n \to \infty$,
$P(Z_n > x) \Big/ (2\pi)^{-1/2} \int_{x}^{\infty} e^{-u^2/2}\,du \to 1$. (9.1.1)
If it is desired to emphasise the uniformity, the phrase "zone of uniform normal attraction" may be used. A similar definition holds for zones of normal attraction of the form $[-\psi(n), 0]$.

When $Z_n$ has a probability density $p_n(x)$, we can similarly define a zone of local normal attraction as a sequence of segments $[0, \psi(n)]$, in which
$p_n(x) \big/ (2\pi)^{-1/2} e^{-x^2/2} \to 1$ (9.1.2)
uniformly in $x$.

It will be seen later that a special role is played by the zones delimited by
$\psi(n) = o(n^{1/6})$; (9.1.3)
such zones are said to be narrow. Zones of the form $[0, n^{\alpha}]$ (or $[-n^{\alpha}, 0]$) are called monomial.

In what follows, $\delta_1, \delta_2, \dots$; $\varepsilon_1, \varepsilon_2, \dots$; $\eta_1, \eta_2, \dots$; $\zeta_0, \zeta_1, \zeta_2, \dots$ are small and positive, each one depending on its predecessors, $c_0, c_1, \dots$; $C_0, C_1, \dots$; $K_0, K_1, \dots$ are positive constants similarly chosen, $B$ is bounded and varies from one expression to the next, and $\rho(n)$, $\rho_1(n)$, $\rho_2(n)$ are positive functions converging to $\infty$ as $n \to \infty$.

In this chapter we study monomial zones of local normal attraction, both narrow and wide.

§ 2. The fundamental conditions

Theorem 9.2.1. Let $0 < \alpha < \tfrac12$. Then the condition
$E \exp(|X_j|^{4\alpha/(2\alpha + 1)}) < \infty$ (9.2.1)
is necessary for $[0, n^{\alpha} \rho(n)]$, $[-n^{\alpha} \rho(n), 0]$ to be zones of (integral) normal attraction.

Proof. Write $\beta = 4\alpha/(2\alpha + 1)$. Suppose that (9.2.1) does not hold. Then there exists a sequence $x_m \to \infty$ such that
$P(X_1 > x_m) > \exp(-2 x_m^{\beta})$ (9.2.2)
for all $m$, or
$P(X_1 < -x_m) > \exp(-2 x_m^{\beta})$ (9.2.3)
for all $m$. Suppose that (9.2.2) holds. For sufficiently large $m$, choose $n$ so that
…
Since $[0, n^{\alpha} \rho(n)]$ is a zone of normal attraction,
$P(Z_n > \tfrac12 n^{\alpha} \rho(n)) < \exp(-… n^{2\alpha} \rho(n)^2)$. (9.2.4)
But the event $\{Z_n > \tfrac12 n^{\alpha} \rho(n)\}$ will certainly occur if the independent events
$\{X_1 > \sigma n^{\alpha + 1/2} \rho(n) + …\}$
and
$\{|(X_2 + X_3 + \dots + X_n)/\sigma n^{1/2}| < 1\}$
both occur. Hence, by the central limit theorem and (9.2.2),
$P(Z_n > \tfrac12 n^{\alpha} \rho(n)) > c_0 P(X_1 > x_m) > c_0 \exp(-c_1 n^{2\alpha} \rho(n)^{\beta})$. (9.2.5)
Since $\alpha < \tfrac12$, $\beta < 1$ and (9.2.5) contradicts (9.2.4). The case of (9.2.3) is treated similarly. •

Theorem 9.2.2. For random variables of class (d) the condition (9.2.1) is necessary in order that $[0, n^{\alpha} \rho(n)]$ and $[-n^{\alpha} \rho(n), 0]$ should be zones of local normal attraction.

We remark that this result is not an immediate consequence of the last theorem since uniform convergence of densities does not at once imply anything about $P(Z_n > x)$.

Proof. Suppose that (9.2.1) is not fulfilled. We show that there is either a sequence $x_m \to \infty$ such that
$\int_{x_m}^{2x_m} g(x)\,dx > \exp(-4 x_m^{\beta})$, (9.2.6)
or one such that
$\int_{-2x_m}^{-x_m} g(x)\,dx > \exp(-4 x_m^{\beta})$. (9.2.7)
Indeed, if there is no such sequence, then
$\int_{x}^{2x} g(u)\,du = B \exp(-4 x^{\beta})$ (9.2.8)
for $x > 0$, and a similar condition for $x < 0$. Hence
$\int_{x}^{2x} \exp(u^{\beta})\, g(u)\,du = B \exp(-x^{\beta})$
in $x > 0$. Taking $x = 1, 2, 4, \dots$ and adding, we get
$\int_{1}^{\infty} \exp(x^{\beta})\, g(x)\,dx < \infty$,
and combining this with the corresponding argument for $x < 0$ we get (9.2.1). Thus if (9.2.1) does not hold, either (9.2.6) or (9.2.7) does; suppose the former, and write
…
Since $[0, n^{\alpha} \rho(n)]$ is a zone of local normal attraction, we have
$P\{… n^{\alpha} \rho(n) \le Z_n \le … n^{\alpha} \rho(n)\} < \exp\{-n^{2\alpha} \rho(n)^2 c_3\}$. (9.2.9)
The event on the left-hand side certainly occurs if $x_m \le X_1 \le 2x_m$ and $|(X_2 + \dots + X_n)/\sigma n^{1/2}| < 1$. The argument then proceeds as for the last theorem. •

§ 3. Fundamental theorems

Theorem 9.3.1. If $[0, n^{\alpha}]$ and $[-n^{\alpha}, 0]$ are zones of local normal attraction for all $\alpha < \tfrac12$, then the $X_j$ have a normal distribution.

In Chapters 10 and 12 analogous theorems will be proved for integral normal attraction. Thus only bands with fixed $\alpha < \tfrac12$ are interesting. The theorem is a corollary of the following more complex result.

Theorem 9.3.2. Let $0 < \alpha < \tfrac12$, and let $\rho(n)$ be any increasing function tending to infinity and slowly varying as $n \to \infty$. If $\alpha < \tfrac16$, then the condition (9.2.1):
$E\{\exp |X_j|^{4\alpha/(2\alpha + 1)}\} < \infty$,
which is necessary for $[0, n^{\alpha} \rho(n)]$ and $[-n^{\alpha} \rho(n), 0]$ to be zones of local normal attraction, is also sufficient for $[0, n^{\alpha}/\rho(n)]$ and $[-n^{\alpha}/\rho(n), 0]$ to be zones of local normal attraction.

If on the other hand $\tfrac16 \le \alpha < \tfrac12$, we consider $\alpha$ relative to the series of "critical numbers"
$\tfrac16,\ \tfrac14,\ \tfrac{3}{10},\ \dots,\ \tfrac12 \dfrac{s + 1}{s + 3},\ \dots \to \tfrac12$. (9.3.1)
Let $s$ be the unique integer with
$\tfrac12 \dfrac{s + 1}{s + 3} \le \alpha < \tfrac12 \dfrac{s + 2}{s + 4}$.
Then for $[-n^{\alpha} \rho(n), 0]$ and $[0, n^{\alpha} \rho(n)]$ to be zones of local normal attraction it is necessary that (9.2.1) holds and that the moments of $X_j$, up to order $(s + 3)$, should coincide with those of a normal distribution. Conversely, these two conditions suffice for $[-n^{\alpha}/\rho(n), 0]$ and $[0, n^{\alpha}/\rho(n)]$ to be zones of local normal attraction.
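The bookkeeping in the statement just given, the exponent $\beta = 4\alpha/(2\alpha + 1)$ of condition (9.2.1) and the integer $s$ attached to $\alpha$ through the critical numbers, can be sketched with exact rational arithmetic. The function names are illustrative only, and the convention of returning `None` for $s$ when $\alpha < \tfrac16$ is a choice made for this sketch.

```python
from fractions import Fraction

def critical_number(s):
    """The s-th critical number (1/2)(s + 1)/(s + 3): 1/6, 1/4, 3/10, ..."""
    return Fraction(s + 1, 2 * (s + 3))

def zone_parameters(alpha):
    """Return (beta, s) for 0 < alpha < 1/2; s is None when alpha < 1/6."""
    alpha = Fraction(alpha)
    if not 0 < alpha < Fraction(1, 2):
        raise ValueError("alpha must lie strictly between 0 and 1/2")
    beta = 4 * alpha / (2 * alpha + 1)        # exponent in condition (9.2.1)
    if alpha < Fraction(1, 6):
        return beta, None
    s = 0
    # Largest s whose critical number does not exceed alpha.
    while critical_number(s + 1) <= alpha:
        s += 1
    return beta, s
```

For instance, `zone_parameters(Fraction(1, 4))` gives $\beta = \tfrac23$ and $s = 1$, so that, according to the theorem, the moments up to order $s + 3 = 4$ must coincide with those of a normal distribution.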
9.3. THE FUNDAMENTAL THEOREMS 181 Proof. In view of Theorem 9.2.2, it is sufficient to consider variables satisfying (9.2.1), but we shall use only the weaker assumption that E{exp(A\X/)}<co, (9.3.2) where A < 1 is a constant, and /?=4a/Ba +1). From (9.3.2) all the moments Hk = E Xf exist, and there is no loss of generality in taking g2 = 1. Suppose that a<j is fixed and, if cl^\, take the integer s to satisfy ii±I^a<i?±|. (9.3.3) 2 s+3 s + 4 v ; If a>j(s+l)/(s + 3), we consider the moments jU3,jU4, ...,/ns+3 and the cumulants k3=jU3, ka=ha — 3, ks=h5 — 10^3, ..., ks+3. If on the other hand a = j(s+l)/(s + 3) we consider only ,u3, ...,ns+2, k3, ...,ks+2. For the moment, however, we remain with the former case of strict inequality. Assume that the first non-zero cumulant is Ka, so that a, Kr = 0 (r<a = so + 3), /ca#0. (9.3.4) Suppose that so<s. (We must return later to the case in which or in which there is equality in (9.3.3).) Since the Xj belong to class (d), they have a bounded continuous prob- probability density g (x), and their characteristic function is 4>(t)= P eitxg(x)dx. •> — en Then \<p(t)\2 is the characteristic function of (Xt — X2), which has a pro- probability density, whence \4>{t)\2 has a non-negative Fourier transform. From the lemma quoted in § 7.2, \<j)(tJELx(—<x>, oo), so that \4>{t)\2dt< oo. (9.3.5) oo The normalised sum Zn = n~iSn(cr=l) has a probability density Pn(x) = ^T <fi(t)n exp(-in±tx)dt. (9.3.6) Moreover, 14>{t) \ < 1 for f#Oand(/)(f)->Oas|f|->oo. Thus, for any 0 < e0 < U (9.3.5) implies that 1 , (9.3.7)
182 MONOMIAL ZONES OF LOCAL NORMAL ATTRACTION Chap. 9 where \R1\=Be~ElX. (9.3.8) Because of (9.3.2). cf)(t) is infinitely differentiable for all t. For positive integers T, p and \t\ ^ T, tp~16{p~1)@) tp t<f>'@) + ...+ JlV[) + -pRpif), (9.3.9) where 2 sup \4^(t)\. (9.3.10) Further, (p'@)=0, 0"(O) = -1, so that, for suitable e0 and \t\ ^e0, (9.3.9) implies that l-if2. (9.3.11) If we write and \ ~ T^ < " (9-3.12) (9-3.13) then (9.3.11) shows that, for n" ^ |f|<e0, r < A -i"'2)" = 5 exp(-c4n2ai). (9.3.14) § 4. Approximation of the characteristic function by a finite Taylor series The function 0 (t) is infinitely differentiable, but not in general analytic, and to estimate the remainder in (9.3.9) we need bounds for (j)(q) for large q. Now f oo \<t>(q)(t)) < |x|«0(x)dx • (9.4.1) If fc = (l+2a)/4a, (9.4.2) then (9.3.2) implies that )dx<oo. (9.4.3)
9.4. APPROXIMATING THE CHARACTERISTIC FUNCTION i83 Thus, for xM, Too exp(Ax1/k) g(u)du = B, J x and a similar condition holds for x^ — 1. It follows easily that \(f){q)(t)\ = Bqr{kq). (9.4.4) In \t\ ^n", write K{t) = log (f>(t), K@)=0 . Then, from (9.3.7) and (9.3.14), Pn{x)=T~\ Qxp(nK(t)~initx)dt + BQxp(-c4rn2'Xl), (9.4.5) and from (9.3.11), K(t) = B, ' (|t|<e0). (9.4.6) Write D) = ^(to) + t(/)/(to) + ...H p-^ . (9-4.7) Then, since X(9)(f0) depends only on <p(p)(t0) for p^q, and since <p{p)(t0) = 0(P)(O), we have o)rUo. (9-4.8) For sufficiently small p, log <J>(t +10) is an analytic function of the complex variable t in the disc bounded by Cp = {z; \z\=p), so that From (9.4.4), ^^ = Bexp(Bp+(k-l)p log p). (9.4.10) Choose p = exp(-K0-(/c-l)logg), (9.4.11) i<C0 being sufficiently large. Then ^_i^p = ? ? exp(Bp-X0p + (/c-l)p(logp-logg)) P! p=i (9.4.12)
184 MONOMIAL ZONES OF LOCAL NORMAL ATTRACTION Chap. 9 For sufficiently large Ko, the absolute value of this expression is less than |, and (9.3.9) gives, for \t\ ^j?0, (t0) = B exp(Bq + kq log q). (9.4.13) Moreover, for \t\ ^n~fil, where , "^ = Bexpm^ + ^-lJlogm-^logn). (9.4.15) We now take (9.4.16) where Kx is a large positive constant to be chosen later. Using the values of k and fix, we have ? exp (Bm + (k~l)m log m~fj.x m log n) = = J5 exp(J5m + (fc~l) m log m — (| — a^ m log n) = {|~1 ^ —j ^ Bm + m — Ba log n-log Kx)~(^~txx) log n > = L 4a J J J5 — (a — a^logn —logjK^i. (9.4.17) But a^a, 1 -2a>0, so that (9.4.17), and thus (9.4.15), are bounded by Bexp(~exn2ai), (9.4.18) if Kx is chosen sufficiently large (sx -sx(Kx)). § 5. Derivation of the basic integral Write ^r = iC(r)@) = irK:r, so that \}/r=Q for 3^r<so + 3. Then ilsr-+Bexp(~exn2a>). (9.5.1)
9.5. DERIVATION OF THE BASIC INTEGRAL 185 Now Re nK(t) ^0 for \t\ ^n"^1, and writing m t.r (9.5.2) we have, from (9.4.5), (9.4.14) and (9.4.18), + Bexp(-e2n2ai). (9.5.3) Now consider the entire function where, from (9.5.2), Z,o+3 = #so+3- (9-5.5) From (9.4.13) with to = 0, we have for | log r-^ log n"). (9.5.6) For r^Cx, (9.5.6) is, for \t\ ^n~^1, equal to fin"^1 , - (9.5.7) and for r>Cl, if log r^sx log n, to Bn~rdl . (9.5.8) If log r>31 log n, then (9.4.17) and (9.4.18) show that (9.5.6) is equal to Be~dir (ndl^r^m). (9.5.9) Thus, in \t\ ^n'*1, we have r = so + 3 ) = 5) (9.5.10) r = so + 3 using (9.3.13) and (9.3.12). We may express xr as a Cauchy integral around \t\=n~^\ and then (9.5.10) shows that
186 MONOMIAL ZONES OF LOCAL NORMAL ATTRACTION Chap. 9 r^ . (9.5.11) We also remark that, because of (9.5.5), xSo+3 is of exact order n. Now let IfK^iW", (9.5.12) so that If m is chosen in accordance with (9.4.16), then Brft = B exp (n2^ !°JpA = B exp(-e2n2«<), (9.5.13) where e2 = \og(l/r]l)/K1. (9.5.14) Now turn to (9.3.14) and to (9.4.5) with n~^ replaced by ^n"; (9.5.3) gives ni rim'* Pn{x) = r- exp(-jnt2) exp(nKS0+3{t)) exp(- in±tx)dt + + 5exp(-c4^n2ai), (9.5.15) or, taking into account (9.5.13) and (9.5.14), \ ( n pn(x) = — \ exp(->2) A+ J ^ (9.5.16) where i) + 5 exp(-?3n2ai), (9.5.17) and . (9.5.18) Making the substitution ? = tri*, we find that (9.5.19) For
9.6. COMPLETION OF THE PROOF i87 = Br exp(-i»7fn1-2'")r(ir)n-(*-'">r, (9.5.20) and = 5 exp(flr +jr log r-r^-^) log n) = = B exp(Br + r(a1 log n-y log K1-a1 log «)) = = 5 exp (Br-±r log ^) = = 5e~C5r. (9.5.21) Therefore, summing (9.5.20) over r^m, we get an expression equal to ij2"*) ; (9.5.22) a similar argument obtains for the integral over ( —oo, —t]1n*~fil). Thus (9.5.16) may be written ^)e-&^ + R2, (9.5.23) / where i?2 satisfies the same equation (9.5.17) as Rx. § 6. Completion of the proof In view of (9.5.23), we need to study the integrals f 00 V )e-"*2, (9.6.1) - 00 where H(ro)(x) = bHr(ax) for suitable constants a, b, and the Hr(r^ m) are the Hermite polynomials [164] Bx)q~2s We suppose that 0<x<Cnai = Cni~ftl , (9.6.3) and estimate (9.6.1) when C2<r^m. From (9.6.2), for x^0,
188 MONOMIAL ZONES OF LOCAL NORMAL ATTRACTION Chap. 9 f . tf <°>(x) = B«q\ max f -, (9.6.4) so that, writing s = pq @ -p log q-(l-2p)\og q + \og q-p log p + -(l-2p)log(l-2p)] = = 5 exp [tf E+1 -2p){?-/0 log n + log Q + P log g] . Multiplying this by * ^), (9.6.5) and by ^ ), (9.6.6) we obtain the expression B exp 0E+ A -2p)(|-/f1) log n-(i-^) log = 5 exp 0E +log C(l -2p)-p log KJ] . (9.6.7) If p<$, |log C(l — 2p)| may be made arbitrarily large by taking ? small; if p>\ and .K^ sufficiently large, p log iCj may be made arbitrarily large. Thus, taking q = r, the sum of the terms in (9.5.23) for C2 < r < m is of order Be~ix2e~C3. (9.6.8) We now turn to the terms in (9.5.23) with 50 + 4^r^C2, (9.6.9) whose sum is of order Be-i*2 Y *?- = Be-ix2{xn~i+flif0+4. (9.6.10) Moreover, the term with r = s0 + 3 is Xsp + 3 tt@) (Y\p-ix2 -Hso+3) _ {<? 4- ^\ I "* ¦•A--/-* / v* ¦ ~\-// [y.o.LL)
9.6. COMPLETION OF THE PROOF 189 as n->oo, where a0 is a positive constant. Thus (9.5.23) becomes R2 . (9.6.12) Therefore, if 0<x<n*-'»/p1{n) = n"/p1{n), (9.6.13) we have and Hence, from (9.6.12), the segments [0, nai/p1(n)'] form a zone of local normal attraction. Thus, if a = al5 50 + 3 = 5 + 3 and i{/s+3 = 0, we can infer from (9.6.13) that [0,ri*/p1 («)] and, by a similar argument, [-na/p1(n),0'] are zones of local normal attraction. If on the other hand a > ax and i/^o+3 #0, then [0, rf~] cannot be a zone of local normal attraction, since if it were so, because c^ <a, [0, nai] would be such a zone. Then, if (>0 is sufficiently small, and x = C«*~/fl =Cwai , (9.6.14) (9.6.13) gives (9.6.15) where |yi|<y. Since 1/^+3 #0, this contradicts the assumption of local normal attraction. The case of equality in (9.3.3) and the case oc<i require only straight- straightforward modifications of the arguments. In order to complete the proof it suffices only to remark that, if k3 = k4 = ... = ks+3=0, then ^3, ^4, ..., pis+3 are equal to the moments of the normal distribution with mean k1 and variance k2. Theorem 9.3.1 follows on recalling that a distribution all of whose moments are the same as those of a given normal distribution must be equal to that normal distribution. •
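The closing remark, that vanishing cumulants $\kappa_3 = \dots = \kappa_{s+3} = 0$ force the moments to coincide with normal ones, can be checked mechanically. The sketch below is an illustration only; it uses the standard moment-cumulant recursion (not stated in the text), and the helper names are hypothetical. For a standardised variable it reproduces the relations $\kappa_3 = \mu_3$, $\kappa_4 = \mu_4 - 3$, $\kappa_5 = \mu_5 - 10\mu_3$ used in § 3.

```python
from math import comb


def cumulants_from_moments(m):
    """Given raw moments m[k] = E X^k for k = 0..N (with m[0] = 1),
    return the list kappa[0..N] of cumulants (kappa[0] is set to 0).

    Uses the standard recursion
        kappa_n = m_n - sum_{k=1}^{n-1} C(n-1, k-1) * kappa_k * m_{n-k}.
    """
    n_max = len(m) - 1
    kappa = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        kappa[n] = m[n] - sum(
            comb(n - 1, k - 1) * kappa[k] * m[n - k] for k in range(1, n)
        )
    return kappa


def normal_moments(n_max):
    """Moments of N(0, 1): zero for odd order, (k-1)!! for even order k."""
    out = [1]
    for k in range(1, n_max + 1):
        out.append(0 if k % 2 else (k - 1) * out[k - 2])
    return out


if __name__ == "__main__":
    # Standard normal: every cumulant of order >= 3 vanishes.
    print(cumulants_from_moments(normal_moments(10)))
    # Centred exponential X = E - 1 with E ~ Exp(1): moments 1, 0, 1, 2, 9, 44.
    print(cumulants_from_moments([1, 0, 1, 2, 9, 44]))
```

The second example is a non-normal law with mean 0 and variance 1 whose third cumulant is already non-zero, so in the notation of § 3 it has $s_0 = 0$.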
Chapter 10

MONOMIAL ZONES OF LOCAL ATTRACTION TO CRAMÉR'S SYSTEM OF LIMITING TAILS

§ 1. Formulation

The theorems to be discussed in this chapter are important generalisations of those of the last chapter. There only elementary theorems (Taylor's theorem and elementary results in complex analysis) were used; here we shall make use of the method of steepest descents.

Theorem 7.1.1 shows that, for variables of class (C, d) (i.e. (d) variables satisfying the very stringent condition of Cramér), the limiting relations (7.1.3) and (7.1.4) are satisfied in the ranges $[0, \psi(n)]$ and $[-\psi(n), 0]$ so long as $\psi(n) = o(n^{1/2})$. These relations involve the Cramér series $\lambda(z)$ defined at (7.2.20). Now let

$$\pi(z) = \pi_0 + \pi_1 z + \pi_2 z^2 + \dots \qquad (10.1.1)$$

be any given power series with real coefficients, with non-zero radius of convergence. Let $X_j$ be variables of class (d), and let $S_n$, $Z_n$, $\sigma$ and $p_n(x)$ be as in the last chapter. We shall be interested in the possibility of limiting relations of the form

$$\frac{p_n(x)}{(2\pi)^{-1/2}\exp\left(-\tfrac12 x^2 + \dfrac{x^3}{n^{1/2}}\,\pi\!\left(\dfrac{x}{n^{1/2}}\right)\right)} \to 1 \qquad (10.1.2)$$

and

$$\frac{p_n(-x)}{(2\pi)^{-1/2}\exp\left(-\tfrac12 x^2 - \dfrac{x^3}{n^{1/2}}\,\pi\!\left(-\dfrac{x}{n^{1/2}}\right)\right)} \to 1, \qquad (10.1.3)$$

in $0 \le x \le n^{\alpha}$. We shall see that, as before, it is sufficient to consider $\alpha < \tfrac12$. We remark that, if
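$\alpha < \tfrac16$, then $x^3 n^{-1/2}\,\pi(x n^{-1/2}) \to 0$ uniformly in $0 \le x \le n^{\alpha}$, so that relations like (10.1.2) and (10.1.3) imply local normal convergence. In the last chapter the zones of local normal convergence were characterised, so that we may take this case as having been dealt with. Suppose therefore that

$$\tfrac16 \le \alpha < \tfrac12. \qquad (10.1.4)$$

For $0 < x \le n^{\alpha}$,

$$\frac{x^3}{n^{1/2}}\,\pi\!\left(\frac{x}{n^{1/2}}\right) = \sum_{\nu=0}^{\infty} \pi_\nu \frac{x^{\nu+3}}{n^{(\nu+1)/2}}. \qquad (10.1.5)$$

Let $s$ be the unique non-negative integer with

$$\tfrac12\,\frac{s+1}{s+3} \le \alpha < \tfrac12\,\frac{s+2}{s+4}. \qquad (10.1.6)$$

It is easily seen that

$$\sum_{\nu=s+1}^{\infty} \pi_\nu \frac{x^{\nu+3}}{n^{(\nu+1)/2}} = B n^{-\varepsilon}, \qquad (10.1.7)$$

where $\varepsilon = \tfrac12(s+2) - \alpha(s+4) > 0$, and thus $\pi(z)$ may be replaced in (10.1.2) and (10.1.3) by the truncated series

$$\pi^{[s]}(z) = \sum_{\nu=0}^{s} \pi_\nu z^{\nu}, \qquad (10.1.8)$$

$s$ being determined by (10.1.6).

Theorem 10.1.1. Let $\tfrac16 \le \alpha < \tfrac12$, and define $s$ by (10.1.6). Let $\rho(n) \to \infty$ as $n$ tends to infinity, and suppose that

$$E\{\exp |X_j|^{4\alpha/(2\alpha+1)}\} < \infty. \qquad (10.1.9)$$

Then uniformly in $0 \le x \le n^{\alpha}/\rho(n)$ as $n \to \infty$,

$$\frac{p_n(x)}{(2\pi)^{-1/2}\exp\left(-\tfrac12 x^2 + \dfrac{x^3}{n^{1/2}}\,\lambda^{[s]}\!\left(\dfrac{x}{n^{1/2}}\right)\right)} \to 1, \qquad (10.1.10)$$

and

$$\frac{p_n(-x)}{(2\pi)^{-1/2}\exp\left(-\tfrac12 x^2 - \dfrac{x^3}{n^{1/2}}\,\lambda^{[s]}\!\left(-\dfrac{x}{n^{1/2}}\right)\right)} \to 1, \qquad (10.1.11)$$

where $\lambda(z)$ is Cramér's series.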
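The truncation step (10.1.6) to (10.1.8) rests on comparing powers of $n$: at $x = n^{\alpha}$ the term of index $\nu$ in the series is of order $n^{\alpha(\nu+3) - (\nu+1)/2}$. The sketch below, with hypothetical function names and exact rational arithmetic, checks that this exponent is non-negative for $\nu = s$ and equals $-\varepsilon < 0$ for $\nu = s+1$, which is why exactly the first $s+1$ coefficients matter.

```python
from fractions import Fraction


def truncation_index(alpha):
    """Non-negative integer s with (1/2)(s+1)/(s+3) <= alpha < (1/2)(s+2)/(s+4)."""
    alpha = Fraction(alpha)
    if not Fraction(1, 6) <= alpha < Fraction(1, 2):
        raise ValueError("alpha must satisfy 1/6 <= alpha < 1/2")
    s = 0
    while Fraction(s + 2, 2 * (s + 4)) <= alpha:
        s += 1
    return s


def term_exponent(alpha, nu):
    """Power of n in x**(nu+3) / n**((nu+1)/2) evaluated at x = n**alpha."""
    return Fraction(alpha) * (nu + 3) - Fraction(nu + 1, 2)


def epsilon(alpha):
    """epsilon = (s+2)/2 - alpha*(s+4), with s determined by alpha."""
    s = truncation_index(alpha)
    return Fraction(s + 2, 2) - Fraction(alpha) * (s + 4)


if __name__ == "__main__":
    for a in (Fraction(1, 6), Fraction(1, 4), Fraction(1, 3), Fraction(9, 20)):
        s = truncation_index(a)
        print(f"alpha={a}: s={s}, "
              f"exponent of term s = {term_exponent(a, s)}, "
              f"exponent of term s+1 = {term_exponent(a, s + 1)}, "
              f"epsilon = {epsilon(a)}")
```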
192 CRAMER'S SYSTEM OF LIMITING TAILS Chap. 10 That A0.1.9) is to some extent necessary is shown by the following result. Theorem 10.1.2. If, for all |x|^;7ap(;7) and all n^n0, we have pn(x)^exp(-a0x2), A0.1.12) where a0 is a positive constant, then A0.1.9) is satisfied (and then by the previous theorem A0.1.10) and A0.1.11)/o//ow). These theorems are, of course, of collective type in the sense of Chapter 7; the moments of Xj (up to order 5 + 3) play the role of the linear functionals ap by We see that, in the monomial zones, the only possible limiting tails are those determined by the segments of Cramer's series k(z). Since s->-oo as a-*j, the only possible series n (z) is the Cramer series, whose coefficients are fixed polynomials of the underlying moments. Thus not every sequence of numbers Xj can be the sequence of coefficients of a Cramer series. § 2. On the condition A0.1.9) We first show that A0.1.9) follows from A0.1.12); the proof follows that of Theorem 9.2.2 almost verbatim. If A0.1.9) does not hold, there exists a sequence xm-»oo such that either (9.2.6) or (9.2.7) holds. Taking xm = ani+ap{n) + 6 (|0| ^ 1) we see from A0.1.12) that, for sufficiently large m, P(irfp(n) ^Zn^ hap(n)) < exp [-|a0n2*p("J] • A0.2.1) The event whose probability is so bounded certainly occurs if both of the independent events \X2 + ... + Xn\/<m*<l, xm<X1<2xm, occur, and this has probability greater than )L A0.2.2) if (9.2.6) holds. Thus (9.2.6) (and likewise (9.2.7)) contradicts A0.2.1), since 4a/Ba +1) < 1. This proves Theorem 10.1.2. • § 3. Derivation of the fundamental integral We now proceed to the proof of Theorem 10.1.1, assuming as we may that <7 = 1 and replacing A0.1.9) by the weaker condition
10.3. DERIVATION OF THE FUNDAMENTAL INTEGRAL 193 ))}<co A0.3.1) for fixed A < 1. We shall begin along the lines of §§ 9.3,9.4. Note the basic equation (9.3.6), and set Following the arguments of the last chapter we arrive at an analogue of (9.4.4): yA r"~" pn(x) = — exp(nK(t)-rijtitx)dt + Bexp(-c2nlct), A0.3.2) where K(t) has its usual meaning. According to (9.4.12), if ?0 is sufficiently small, and |?ol<i?o> (t0) = B exp (Bq + kq log q), A0.3.3) where /c = (l + 2a)/4a. A0.3.4) Moreover, (9.4.14) and (9.4.17) show that, if m = [n2aIKl'\, A0.3.5) and Ki is chosen sufficiently large, then where, for \t\^n tm+1Rm(t) (m+1)! Thus (as in (9.4.16) for JB+ 1 (m+l)! if Kx is chosen sufficiently large. From A0.3.6), and Re{nK{t))^0 for \t\ ^n~M. Write = B exp [m(B + (k-1) log m-M log n)~\. A0.3.7) = 5exp(-?1n2a), A0.3.8) m jr nK{t)= -\nt2 + n ? $, - + B exp(-?ln2a), A0.3.9) r=3 r •
194 CRAMER'S SYSTEM OF LIMITING TAILS Chap. 10 r= 3 ' • then from A0.3.9) we have (cf. (9.5.2)) ni A0.3.10) § 4. Application of the method of steepest descents The integrand of A0.3.10) is an entire function, since Km (t) is a polynomial. Set t = x/n*, l^x^n", T<na~*^n"", A0.4.1) and z = it; then the integrand of A0.3.10) becomes exp [n&2 + Km{~iz)-tz)] . A0.4.2) Here Km (iz) = K(®] (z) is a polynomial in z with real coefficients. The saddle point equation is z + ~K«\z) = t, A0.4.3) which for sufficiently large n will have a unique positive solution z = zo = x-^x2 +?Cyi-y4)T3 + ... , A0.4.4) in which the first (m — 2) coefficients will coincide with those of the Cramer series G.2.20). Write p1(n) = p(n)i: and suppose that Pi{n), A0.4.5) so that TVPifo)- A0.4.6) The contour of integration z = ft, |?|^n~'/ may now be given a parallel translation to the contour z = z0 + it, \t\ ^ rT11, so long as we can estimate the integral over the horizontal segments z = ?±in~fl, 0^ have (c/ (9.5.5)) on these horizontal segments,
10.4. THE METHOD OF STEEPEST DESCENTS: ITS APPLICATION 195 \z\rK(r){0) n f / l-2a, \l p^ = B exp \r (B + —— log r-/j. log n I . A0.4.7) This may be estimated as in § 9.5; when r^Ci it is Bn~r\ A0.4.8) when Ci <r^e2 log n A0.4.7) it is Bn~S4r, A0.4.9) and when e2 log n ^ r ^ m it is 5e~?5r. A0.4.10) Thus on the horizontal segments we have n \K{^(z)\ =Bn1~3fl = Bn3^^ . A0.4.11) Moreover, since z = ? + in ~fl, ^-i-n1-2^ -\n2\ A0.4.12) since ^n~7Pi(«) and t^n~'l/p1(n)A0.4.6). Moreover, 2a>3a — j, and comparing A0.4.11) with A0.4.12) we see that A0.4.2) is, on the horizontal segments, equal to 5exp(-in2a). A0.4.13) Thus A0.3.10) transforms into pn{x) = — A0.4.14) From A0.4.3) K^(z0), A0.4.15) and, on z=zo + it, 0)-Tz0 + f; I -^ K^(zo)(ity . A0.4.16) j=2 J• az Using the estimates A0.4.8), A0.4.9) and A0.4.10) in A0.4.7), we have
196 CRAMER'S SYSTEM OF LIMITING TAILS Chap. 10 id^^O)(Zo)=2+jBT' A0A17) (cf. (9.3.6)). We separate the contour of integration z=z0 + it, |r|<n~^into two parts, \t\ ^ /T* (log nf , «-* (log nf ^ \t\ ^ n'*1, as in § 7.3. According to A0.4.8), A0.4.9) and A0.4.10) we have, for m 1 / d V = B\t\3( V3O$Cl C1<r<?2logn ?2logn<r ) = BW3- A0.4.18) Thus exp = flexp[-(lognJ]. A0.4.19) Inserting this into A0.4.14), we have 'o)-tzo)]x x exp n(i22 + KL0)B0)-T20) exp[-(log nK] + + 5exp(-?5n2a). A0.4.20) From A0.4.18), for \t\ ^n*(log nJ, m 1 jj ^ °*9 . A0.4.21) j—3 J ' Inserting this into A0.4.20) and using the computations of § 3.3, we have
10.5. COMPLETION OF THE PROOF OF THEOREM 10.1.1 197 + 5exp(-e5n2a) = = Btc) - * exp [n &2 + K™ (z0) - xz0)] A + Bn " °-48) + + 5exp(-?5n2a). A0.4.22) § 5. Completion of the proof of Theorem 10.1.1 We now consider the expression whose exponential appears in A0.4.22). Using A0.4.8), A0.4.9) and A0.4.10), we have A0.1.1) where KlC] denotes the sum of the first C terms of K. According to A0.2.18), 2), A0.5.2) where Cx becomes arbitrarily large for large C. If Kx<n7Pi(n) and s satisfies A0.1.6), then nxzklCi]{x) = nxzkls]{x) + Bn-E. A0.5.3) Substituting A0.5.1), A0.5.2) and A0.5.3) into A0.4.19) completes the proof of the theorem for l^x^rf/p(n). The case —rix/p(ri)^x^ — l follows on replacing Xj by — Xj, and the case \x\ < 1 is a consequence of the classical theorem. •
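The saddle-point computation of § 4 can be tried on a concrete law. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes $X = E - 1$ with $E$ standard exponential (mean 0, variance 1), writes $C(z) = \log E\,e^{zX}$ for the full cumulant generating function (so the saddle-point equation reads $C'(z_0) = \tau$), and compares $\{C(z_0) - \tau z_0 + \tfrac12\tau^2\}/\tau^3$ with the first two terms $\kappa_3/6 + (\kappa_4 - 3\kappa_3^2)\tau/24$ of the Cramér series in its standard unit-variance form. That explicit form is assumed here, since (7.2.20) is not reproduced in this chapter; all function names are hypothetical.

```python
import math


def cgf(z):
    """C(z) = log E exp(z (E - 1)) for E ~ Exp(1); finite for z < 1."""
    return -z - math.log1p(-z)


def cgf_prime(z):
    """C'(z) = z / (1 - z), increasing on [0, 1)."""
    return z / (1.0 - z)


def saddle_point(tau, iterations=200):
    """Solve C'(z0) = tau for 0 <= z0 < 1 by bisection (tau >= 0)."""
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cgf_prime(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cramer_value(tau):
    """(C(z0) - tau*z0 + tau**2 / 2) / tau**3 at the saddle point z0."""
    z0 = saddle_point(tau)
    return (cgf(z0) - tau * z0 + 0.5 * tau * tau) / tau ** 3


def two_term_series(tau, kappa3=2.0, kappa4=6.0):
    """kappa3/6 + (kappa4 - 3*kappa3**2)/24 * tau (assumed standard form).
    Defaults are the cumulants of the centred exponential law."""
    return kappa3 / 6.0 + (kappa4 - 3.0 * kappa3 ** 2) / 24.0 * tau


if __name__ == "__main__":
    for tau in (0.2, 0.1, 0.05, 0.01):
        print(tau, saddle_point(tau), cramer_value(tau), two_term_series(tau))
```

For this law the saddle point is available in closed form, $z_0 = \tau/(1+\tau)$, which gives an independent check on the bisection; the gap between the last two printed columns shrinks like $\tau^2$.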
Chapter 11

NARROW ZONES OF NORMAL ATTRACTION

§ 1. Classification of narrow zones by the function h

We retain the notation of the last two chapters, and record here some new terminology. The narrow zones $[0, \Lambda(n)]$ and $[-\Lambda(n), 0]$, where $\Lambda(n)$ is continuous and increasing and $\Lambda(n) = o(n^{1/6})$, will be described in § 2 by means of a function $h(x)$, non-decreasing and continuous in $x \ge 2$. It will turn out to be natural to distinguish three classes of possible function $h$.

Class I: This consists of functions $h$ satisfying (for some $c_0 > 0$)

$$(\log x)^{2+c_0} \le h(x) \le x^{1/2}. \qquad (11.1.1)$$

If we write

$$h(x) = \exp\{H(\log x)\},$$

then $H(z)$ is required to be monotonic and differentiable, with

$$H'(z) \le 1, \qquad H'(z) \to 0 \quad (z \to \infty), \qquad (11.1.2)$$

$$H'(z)\exp H(z) > c_1 z^{1+c_0}. \qquad (11.1.3)$$

Class II: This consists of the non-decreasing continuous functions with

$$\rho_0(x)\log x \le h(x) \le (\log x)^2, \qquad h(x) = M(x)\log x = N(\log x)\log x, \qquad (11.1.4)$$

$$N'(z) \to 0 \quad (z \to \infty).$$

(As before, $\rho$ with affixes denotes a function tending to infinity at infinity.)
Class III: The functions $h$ with

$$3\log x \le h(x) \le M\log x,$$

where $M \ge 3$ is a constant.

§ 2. Statement of the theorems

We shall investigate the narrow zones of local and integral normal convergence for variables of the class (d) defined in Chapter 9. In terms of the function $h(x)$ we define $\Lambda(n)$ by the equation

$$h(n^{1/2}\Lambda(n)) = \{\Lambda(n)\}^2. \qquad (11.2.1)$$

We shall show that, in a weak sense, the condition for $[0, \Lambda(n)]$ to be a zone of local normal attraction is related to the condition

$$E\{\exp h(|X_j|)\} < \infty. \qquad (11.2.2)$$

Theorem 11.2.1. If $h(x)$ belongs to Class I, then (11.2.2) is necessary for $[0, \Lambda(n)\rho(n)]$, $[-\Lambda(n)\rho(n), 0]$ to be zones of local normal attraction and sufficient for $[0, \Lambda(n)/\rho(n)]$ and $[-\Lambda(n)/\rho(n), 0]$ to be zones of local normal attraction.

Theorem 11.2.2. The statement of Theorem 11.2.1 remains valid if the word "local" is replaced by "integral" throughout.

If $h$ belongs to Class II we define

$$\Lambda(n) = \{h(n)\}^{1/2} = \{M(n)\log n\}^{1/2}, \qquad (11.2.3)$$

since, under the conditions defining this class, this differs only by a slowly varying function from that determined by (11.2.1).

Theorem 11.2.3. If $h(x)$ belongs to Class II, the statement of Theorem 11.2.1 continues to hold.

Theorem 11.2.4. If $h(x)$ belongs to Class II, the statement of Theorem 11.2.2 continues to hold.
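The implicit definition (11.2.1) of the zone width can be explored numerically. The sketch below is illustrative only: `zone_width` is a hypothetical helper that solves $h(n^{1/2}\Lambda) = \Lambda^2$ by bisection, assuming that $h(n^{1/2}\Lambda) - \Lambda^2$ changes sign exactly once on the search interval. It is tried on the monomial case $h(x) = x^{4\alpha/(2\alpha+1)}$, for which a direct calculation gives $\Lambda(n) = n^{\alpha}$.

```python
import math


def zone_width(h, n, lo=1e-6, hi=2.0, iterations=200):
    """Solve h(sqrt(n) * L) = L**2 for L by bisection.

    Assumes f(L) = h(sqrt(n) * L) - L**2 is positive at `lo` and has a
    single sign change to the right of it (true for the power functions
    used below; not checked for a general h).
    """
    root_n = math.sqrt(n)

    def f(lam):
        return h(root_n * lam) - lam * lam

    if f(lo) <= 0:
        raise ValueError("f(lo) must be positive; choose a smaller lo")
    while f(hi) > 0:          # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def monomial_h(alpha):
    """h(x) = x**(4*alpha / (2*alpha + 1)), the monomial case."""
    beta = 4.0 * alpha / (2.0 * alpha + 1.0)
    return lambda x: x ** beta


if __name__ == "__main__":
    n = 10 ** 6
    for alpha in (0.05, 0.1, 0.15):
        print(alpha, zone_width(monomial_h(alpha), n), n ** alpha)
```

The two printed values agree in each row, in line with the exponent $4\alpha/(2\alpha+1)$ of condition (9.2.1) being the one attached to the monomial zone $[0, n^{\alpha}]$.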
200 NARROW ZONES OF NORMAL ATTRACTION Chap. 11 For h in Class III, we take A(n)= {log n}*, A1.2.4) thus delimiting "very narrow" zones. Theorem 11.2.5. For h(x) in Class III and A(n) = {log np the statements of Theorems 11.2.1 and 11.2.2 continue to hold. § 3. On the conditions imposed upon h(x) We first comment on the conditions which the different classes, in parti- particular Class I, impose on h(x). The inequality h(x)^(\og xJ+Co ensures that the zone is not too narrow; it may be noted that, in the particular case h(x) = (log xK, A1.2.2) simply says that the Xj have finite third mo- moment. On the other hand, h (x) < xi implies that the zones are narrow in the sense of Chapter 9. It should also be remarked that A (n) = na cor- corresponds in A1.2.1) to h{x) = x4a/Ba+1), and 4a/Ba+l)<|for a<?. It is natural to assume h to be monotonic and differentiate, so that H is also. If we also assume that h' is monotonic, this leads to A1.1.2) and, in view of the left-hand inequality of A1.1.1), to A1.1.3). § 4. The necessity of A1.2.2) for Class I Suppose that [0, A(n)p(n)~] and [-A(n)p(n), 0] are zones of normal attraction. Then for n > n0 we have P(Zn > iA(n)p(n)) < exp(-M(nJp(nJ), A1.4.1) since A(«)->¦ oo as n-»oo. Suppose that A1.2.2) is not satisfied. Then we can find a sequence xm-^oo such that either P(X1>xm)>Qxp(-2h(xm)), A1.4.2) or P(X, <-m)> exp{-2h(-xm)). A1.4.3) Suppose for instance that A1.4.2) holds. For sufficiently large m choose n so that xm=an*A(n)p{n) + 0, \0\ ^ 1 .
11.5. THE SUFFICIENCY OF A1.2.2) FOR CLASS I 201 The event described in A1.4.1) occurs if both the independent events occur, and thus by A1.4.2) and the central limit theorem, P(Zn>±A(n)p(n)) > CoP&i >xm) > = c0 exp [ - 2h (crn1 A (n) p (n) + 6) >c0 exp[-2h{2oniA{n)p{n))'] . A1.4.4) For sufficiently large ?, rj, h{?ri) = exp [if(log ? + log >j)] ^exp \H(log ^)] + o(log n) = = h^)rj0A), A1.4.5) using A1.1.2). Setting ? = nM(n), rj = 2ap(n) we find that 2h{2aniA(n)p(n)} = 2A{nf p{nHA), A1.4.6) so that A1.4.1) and A1.4.4) are contradictory. Similar arguments hold for A1.4.3), and we have therefore proved the necessity of A1.2.2) in Theorem 11.2.2. The corresponding local result for Theorem 11.2.1 is proved by a similar argument as in § 9.2. § 5. The sufficiency of A1.2.2) for Class I We now proceed to the proof that A1.2.2) is sufficient for [0, A(n)/p(n)~] and [ — A(n)lp{n), 0] to be zones of local normal attraction. Indeed we shall prove more generally the sufficiency of for any positive constant a; for this slightly stronger result there is no loss of generality in taking a=\. For a given p(n), the function Ap(n) is defined implicitly by h{Ap(n)nyP(n)} = {Ap(n)}2. A1.5.1) It is then sufficient to prove that [0, Ap(n)~\ and \_-Ap(n), 0] are zones of
202 NARROW ZONES OF NORMAL ATTRACTION Chap. 11 local normal attraction, since [0, A(n)/p(n)~\ and [-A(n)/p(n), 0] are narrower. Indeed, if we write Ap(n) = A(n)y(n), the arguments of § 4 applied to A1.5.1) and A1.2.2) show that \p(n)J ""' ' '*"' Kp Write log Ap(n) = Xp{n), A1.5.2) so that, from A1.5.1), H{log Ap(n) + \og {n-/p{n))} = 2Xp(n). A1.5.3) Setting lp(n) = log (n*/p(n)) = = ± log n-Pl(n), A1.5.4) we have H{Xp(n) + lp(n)} = 2Xp(n), Xp(n)+lp(n) = H-iBXp(n)). A1.5.5) We now choose a small positive number n so that n1-2" = eBAp(nJ = cB exp {2Xp(n)}, A1.5.6) B denoting as before a bounded quantity, not in general the same from time to time. Then A1.5.7) A - 2fx) log n = so that n From ullU. n log A1. log 1 2 n = 5 + = B + 2Xp ilogn-. = B+lp(n)-Xp = lP(n 5.5), we n = H~ XP(n) logn ) + Xp(n)-\ have 1{2Xp(n)} 1 B logn in), Xp(») = («) + Pi(«) = -p2(/i)-2Xp(/i). + p2(n)-2Xp(n), ¦ A1.5.8)
11.6. INVESTIGATION OF THE FUNDAMENTAL INTEGRAL 203 § 6. Investigation of the fundamental integral In the notation of Chapter 9, we have the equation (cf. (9.3.7), (9.4.5)) Pn(x) = y (/>WexP(~itxn^dt + B exp( — cle2Xp(n)). A1.6.1) In order to use the method described in Chapter 9, we need to estimate 4>(q)(t) in |t| ^n~M. We have . A1.6.2) Moreover, lim x/i'(x)=oo, A1.6.3) x-> oo since x/i'(x).= exp [if (log x)] if'(log x). The exponent in the integrand of A1.6.2) has derivative q/x — h'(x), and we therefore consider the "saddle point equation" xh'(x) = q. A1.6.4) Because of A1.6.3) this has a unique positive solution x = Qo(q) for q>q0. Lemma 11.6.1. A1.6.5) Proof. Choose Xj such that, for x>xa, h{x) ¦? 2q log x , so that h{x)-q log x^$h{x). A1.6.6) Because of A1.1.1), this is true if (logxJ+Co$s2g logx, so that a possible choice of Xj is
204 NARROW ZONES OF NORMAL ATTRACTION Chap. 11 For x: h (x) - q log x ^ \h (x) ^ 1 (log xJ +««, so that x«e~'1(*)dx = B. A1.6.7) Moreover, the integrand has its maximum at Qo, so that x«e-fc(x)(bc = 5 exp {B4 + 4 log Q0-h{Q0)} e = Bexp{Bq + q\ogQ0-h{Q0)}, A1.6.8) which proves the lemma. • Substituting this result in A1.6.2), we have 6{q)(t) BaLplq(B + \ogQ0qh(Q0))\ogq]. (J Writing, as in Chapter 9, ^, (n.6.10, we have $ + t0)\t=o. A1-6.11) § 7. More investigation of the fundamental integral The function L(q)= log Q0(q)-q-ih(Q0(q))-log q A1.7.1) can be shown to be non-decreasing, since the equation Qo(q)h'(Qo(q)) = q (H.7.2) implies that l}. A1.7.3)
11.7. MORE INVESTIGATION OF THE FUNDAMENTAL INTEGRAL 205 It therefore suffices to prove that i.e. that < h(y) ^ y' This is equivalent to which we have assumed to be true. Thus L(q) is non-decreasing in q. From A1.6.9) we have (?~Y»tq=Bexp(B + qL(q))\t\q. A1.7.4) We consider this expression on the contour |t| = e v, where . v = v(q) = Cl+L(q), A1.7.5) and Cl is a suitable large constant. On this contour, A1.7.4) gives — tq = B exp(B-Ciq-qLfa) so that and = * exp(-iCi tJ(t>U)(to) i \ Hence, for |t0K" M> 2ni J|t| = e-v = Bq\ dt r+1 = B exp{Bq + q \og Qo{q)-h[Q0(q)]} . A1.7.6) A1.7.7) A1.7.8) A1.7.9)
206 NARROW ZONES OF NORMAL ATTRACTION Chap. 11 Thus, for sup K{m)(t) ml = B exp {m[B + log Qo(m)-m h(Qo(m)) + Now choose m to satisfy Then exp H {log Q0{m)} = h{Q0(m)} = -logm-n logn]} . A1.7.10) A1.7.11) A1.7.12) Substituting this in A1.17.10) and taking account of A1.5.8), we through A1.7.10) into the form B exp [m(B-p2 {n) + 2Xp{n)- log m-m'1 exp 2Xp(«))] = = B exp [m (B - p2 (n) - exp 2XP (n) - log m) + 2Xp (n) - log m] . A1.7.13) We now show that 2Xp(n)-logm^oo A1.7.14) as n-»-oo. First note that, by A1.7.2), h'(Q0) HQo) exp[H(log Q0(m))] H'(log Q0(m)) = h(Q0) so that exp [H(log QoH)] #'(log QoH) = m . Thus, from A1.7.11), expBXp(n))H'(logBoH) = m, expBXp(n))H'(Xp(n) + lp(n)) = m. Thus, by A1.1.2), 2Xp (n) - log m = - log H' (Xp (n) + /p („)) _ oo , as n-»-oo, as asserted. If we use A1.7.14) in A1.7.13), the latter becomes ?exp[-c2expBX»)]. A1.7.15) A1.7.16) A1.7.17)
11-8. INVESTIGATION OF K(t) 207 We remark that, because of A1.5.6), §8. Investigation of K(t) From A1.7.10), sup K{m)(t) tm = B exp[-c2 expBX»)] , A1.8.1) so that, for |t|^n~M, m j- ik-y + B exp [-c2 expBXp(«))], A1.8.2) r = 3 ' • where = B exp(Br + r log QoW-fcCQoW)) • A1.8.3) Since 2X» > H(log n*/p(»)) = log h(n*/p(n)), we have expBXp(n)) > fc(n*/p(«)) > [log(«Vp(»))]2+Co, so that n exp( — c2 exp2Xp(n)) = B exp( —c3 exp 2Xp(n)). Hence nK(t) = nK2(t) + Bexp(-c4rn1-2'1), A1.8.4) where K2(t)=-$t2+ t r= 3 Thus -c4n1-2"), A1.8.6) where
208 NARROW ZONES OF NORMAL ATTRACTION Chap. 11 l~r A1-8.7) r=3 r ¦ We now consider the entire function expK3(t). A1.8.8) We write fii=fi — con, where ©n = ii, if exp BXp(n)) ^ m log n , , otherwise. expBX») 100 m log w If |r| O~Ml and r^m, then A1.8.9) Since H(z) = o(z), Xp(n) = o(log n), and it follows from A1.5.7) that Af = i + o(l), ^^0.49 + 0A). A1.8.10) Therefore, for 3 il/ f n o(l n = o(l). Now consider the values of r in A1.8.11) We prove that Bo(r) = ?exp(r1/A+w). A1.8.12) From A1.1.3) we have so that, from A1.6.4), proving A1.8.12). Thus A1.8.9) gives, in this range,
11.9. MORE INVESTIGATION OF K(t) 209 = B exp [r(? + r1/A+w-0.48 log n)~\ = = B exp [(-0.47 log n)f[ = Bn'0Alr. A1.8.13) Summing these expressions over C3<r^(log nI + Kl gives Bn~2 . A1.8.14) Finally, consider the range Because L(q) is monotonic, A1.8.9) does not exceed Bexp[r{B + \ogQ0(m)-m-1h(Q0(m))-\ogm-nl\ogn}-\. A1.8.15) Comparing this with A1.7.10) and A1.7.13), this becomes Bexp[r{B-Bm)-1expBX»)-p2(n) + conlogn}]. A1.8.16) From the definition of con, this is + B exp CO jT we have Xr = —-: exP For A1.8.17) Summing over r in the given range, this gives Bn~2 . A1.8.18) Collecting together these various estimates we have, in \t\ ^n~Ml, nK3(t) = B, exp[nK3(t)~\=B. A1.8.19) § 9. More investigation of K(t) Writing
210 NARROW ZONES OF NORMAL ATTRACTION Chap. 11 Xrtr/r ! = Bnr(-") = B Qxp(-rcon log n). A1.9.3) Each of the two cases of the definition of con leads to ' ! r = m+l V HUW1 / )). A1.9.4) m y f Thus y f exp{nK3{t)} = l+ X ^- + Bexp(-c4exp2Xp{n)). A1.9.5) and therefore pH(x) ="- Bexp{-c5 exp[2X»]} A1.9.6) + Bexp{-<;6 exp[2^(«)]} . A1.9.7) For r^m we study the integral A1.9.8) for which the methods of § 9.5 give the estimate B exp [-in1'2"] exp [r{B + ± log r-^-^) log «}] = «}]. A1.9.9) Since expBXp(n)) = e^1^, Xp(n) = ({-pi) log n + B , we have % log m-ft-^i) log n = i log m-ft-^) log n + (^! -fi) log n = Thus the sum of A1.9.9) over 3 ^ r ^ m is A1.9.10)
11.10. COMPLETION OF THE PROOF OF THEOREM 11.2.1 211 A similar analysis can be made of so that A1.9.7) becomes If00 f Pn(x) = j- exp(-ia + Bexp{-c5expBX»)}. A1.9.11) r=3 r. rc § 10. Completion of the proof of Theorem 11.2.1. We now investigate, for 3^r^m, the integral )e-**2 A1.10.1) — oo (cf § 10.6). The sum of the terms in A1.9.11) with 3 <r<C3 is C3 |y I '*-r' vr -±r {-I 1 1A ^\ —- x n , ^ii.iu.zj which for is c3 Be~^2 X n^nii-^n-i'pj^-^Be-^p^nj-s, A1.10.4) r = 3 Now let C3<r^m, and take r = q, s = pq@^p^j). Following §9.6 (cf. A1.5.7)) we have, for C = B sup exp {q[B-(l-2p) log Pl(n) + p log q + P ogn + (/i1-Ai)logn]}. A1.10.5) Further, expBXp(n))/m-»-oo, so that eBn1~2M/rn-»-oo, whence log q < log m ^ B + A - 2jz) log n . Thus we have p log q- p(l-2n)logn = B-v(n),
212 NARROW ZONES OF NORMAL ATTRACTION Chap. 11 where v(n)^0. Therefore (see § 8), as w-»-oo, (jji1-n)\ogn= -con log n^ - oo , so that A1.10.5) is of order Bexp(-C4q). A1.10.6) Summing these over p2(n)^q^.m we obtain B/p3(n). A1.10.7) Suppose that p2(n) is chosen so that log q ^ i(l -2fi) log n = ft-//) log n. A1.10.8) (remarking that A — 2/z) log «-»-oo). Then A1.10.5) has the estimate sup {B exp[-q(l -2p) log pl(n)~\] + B exp [-gpft-//) log n] = p = B exp[-qp3(n)~\. A1.10.9) Summing over 3^q^p2(n), we have the estimate B/p4{n). A1.10.10) This proves that [0, n*~'t/pl(n)~\ and \_ — ni~/t/pl(n)~\ are zones of local normal attraction, and completes the proof of Theorem 11.2.1. • § 11. The corresponding integral theorem Consider first the monomial zones [0, rf~\ and [ — na, 0], where a< ^ is a constant. It is not difficult to go from this to the general narrow zone. We introduce auxiliary normal variables Yn with zero mean and variance n2a, which therefore have characteristic functions i«2*f2), A1.11.1) and set Z'n = (Sn+Yn)n-±. Let ; A1.11.2)
//.//. THE CORRESPONDING INTEGRAL THEOREM 213 we show that, if this is a zone of normal attraction for Z'n, then it is a zone of normal attraction for Zn. We have YncN@, rf), Ynn-±eN@, n3''), A1.11.3) and A1.11.4) Because of A1.11.3), [ e~*du A1.11.5) as n-+co, if x satisfies A1.11.2). Thus f e^^dw A1.11.6) J X assuming A1.11.2). Moreover, and 2a-i<i-4<0. Suppose that, under A1.11.2), 1 -±Y) Then, from A1.11.6) and A1.11.7), A1.11.8) A1.11.9) Writing y = n2lx~*, we have fX+7 e-iu2du = By exp{-±x2) exp(xy)(l + o(l)). A1.11.10) J X Since a<^, xy<n3a'i= n~E*, n~a<
214 NARROW ZONES OF NORMAL ATTRACTION Chap. 11 and so T ' e->du = Bx-1 exp{-±x2)>rEs. A1.11.11) In view of this, A1.11.9) gives re-"dw. A1.11.12) If in A1.11.7) we replace n2a~* by — nla~* and reverse the inequality, we get C e~±du. A1.11.13) Similar arguments apply to [ — rf/p(n), — 1] and [— 1, 1], and thus we have proved that A1.11.8) in \x\^na/p(n) implies that this is a zone of normal convergence for Zn. § 12. Calculation of the auxiliary limit distribution Take 1 ^xx ^na/p(n) and x2 = nK, where K is a positive constant to be chosen later, and write l-Fn(x) = P(Xl + ...+Xn+Yn>xn±). A1.12.1) The event on the right-hand side implies that at least one of the events X^^n'*, ^>ix2»~* occurs, and by (9.2.1), A1.11.3) and A1.11.9), for sufficiently large K, its probability is bounded by o(l) f°°e-*dw. A1.12.2) We have Fn(x2)-FH(Xl) = - <WiMt) dt. 271 J- t oo A1.12.3) For|t|>e0, tn(t) = exp(-Wat2) = B exp(-Waz2o), A1.12.4) so that with this error we can restrict the integral A1.12.3) to the interval [~eo> eo]-
11.13. MORE ABOUT THE AUXILIARY LIMIT DISTRIBUTION 215 Moreover, exp (— an* itx2) — exp (— an* i A1.12.5) Thus, Fn(x2)-Fn(x1) = n x {exp (- a* itx2) - exp (- an* itx^} dt + B exp (- c6 n2a) A1.12.6) Arguing as in §§ 9.5, 9.6 and keeping the same notation, we take m = n2a/p0(n), A1.12.6) where po(n) will be chosen later. Then x t'1 {exp( — ari^itx^— exp ( — Setting ?,=tr&, this becomes exp( —c7n2a). A1.12.7) r=3 A1.12.8) § 13. More about the auxiliary limit distribution If we set
2N NARROW ZONES OF NORMAL ATTRACTION Chap. 11 then 27TJ-oo V r=3 x v {exp(-/vx2(l + p))-exp(-h'x1(l+p))}di + Bexp(-c8n2a). A1.13.1) We now take the summation sign outside the integral; the first term in the resulting finite sum is equal to JC2U+P) JCl(l+p) Now rf B n so that this first term is A1.13.2) We therefore proceed to the estimation of the other terms, which can be expressed in terms of Hermite polynomials. Consider first the values of r in the range 3^r^C1. A1.13.3) As in §§ 5.6, we must first estimate H<0)(x)e-**2dx. A1.13.4) X! For 3 < r < Cx, this is of order Bx'f^"**2. A1.13.5) Since ir = Br\nr>l (cf. A1.9.2)) we have, under A1.11.2) and A1.3.3), ^"^(Xi «"-*)'. A1.13.6)
1114- COMPLETION OF THE PROOF OF THEOREM 11.2.2 217 But ix.n^^K l/p(n), A1.13.7) so that A1.13.6) is P(n) A1.13.8) xi Now consider the range Qo-^m. A1.13.9) We have so that A1.13.11) For l t = O so that, with Q = r - 2s, du = xeBr\ ? ? xe. s«ir s\(r — 2s)! t = 0 A1.13.12) § 14. Completion of the proof of Theorem 11.2.2 Writing as in § 6, r = q, s = pq@^p^), Q = r{l-2p), the sum in A1.13.2) is, for l^x^na/p(n) and fixed t, B exp[Bq + q(\ -2p)-2t-l)(a log n-log p(n)) + -pq log q-pq log p-(l-2p) q log ^-^A -2p) log (l-2p) + + q logq + 2tlogq + 2tlog(l-2p)~\ . A1.14.1)
218 NARROW ZONES OF NORMAL ATTRACTION Chap. 11 This must be multiplied by -^_ = Bexp(-aqlogn). A1.14.2) The term corresponding to t is thus B exp [t(log q-2a log n + 2 log p{n))] = = B exp [t(log m-2a log n + 2 log p(n))] . A1.14.3) In this logm = 2a log n-log po(n) + 0 , |0| < 1 . If we take po{n) so that log Po(») > 2 log p(n), then A1.4.3) is estimated as Bipd")}-'* (H-14.4) and when summed over t, taking account of the argument of § 9, gives B{p2(n)}-«. A1.14.5) Summing over q^Cu we obtain B{p2(n)}~1. Thus we have to add to A1.13.2) an error term of order o(l) ("V^dM. A1.14.6) A similar argument for [ — nap(n), —1] completes the proof of Theorem 11.2.2 in the special case of monomial zones. We now proceed to the general case. § 15. The general case of narrow zones We now follow the argument of §§ 11-14 to prove Theorem 11.2.2. Let Xj be random variables with E{X}) = 0, V(X}) = 1, and suppose that A1.2.2) is satisfied, where h(x) is a function of Class I. We begin by following § 3; determine ft from A1.3.6) and set
11.15. THE GENERAL CASE OF NARROW ZONES 219 • a = i-iu. A1.15.1) We shall prove that [0, rix/p5(n)~\ is a zone of normal attraction; similar arguments will apply to [ — na/ ps(n), 0]. We shall follow the notations and arguments of §§ 11-14, noting only significant differences. It is important to note that (cf A1.5.3)) n*-" =n« = eB exp(Xp{n))> (fc(n*/p(n))}* > (log nf +^ . A1.15.2) As before, we introduce Yn and i//n(t) and deduce A1.12.3). We remark that so that nK+1 exp(-c6n2a) = exp(-c7n2a), (cf A1.12.5)). In view of this the formulae A1.12.7), A1.12.8), A1.13.1)- A1.13.5) all hold, the bound (9.5.11) for Xr being used. We do, however, have to use somewhat more precise bounds. We have s\(q — 2s)\ JXl A1.15.3) For l^:xl ^na/p6(n), Q = q — 2s = q(l — 2p), we easily find (for example by the method of steepest descents) the estimate (Q^l), C e~iu2uQdu = BQx^-' e~ix2 exp [iQ log % + 1 ] A1.15.4) If xl^m^Q, xl^mi^Qi, this shows that A1.5.4) is Bflxf-ie-**2. A1.15.5) Arguing as in § 10, we deduce that [m1, na/p6(n)~\ is a zone of normal attraction. Now let Xj <m*. Then A1.5.4) is estimated as BQxQ-lQ-WQWlo9Q A1.15.6) We now set s = pq (O^p^j) and use A1.15.6) and the estimates Xq/ql = B exp^tf log n), n~*q = exp{-^q log n),
220 NARROW ZONES OF NORMAL ATTRACTION Chap. 11 to give an error term of Bexp[q(B+l-2p)^ log m-p log g-(l-2p)log q + + log q-p log p-(l-2p) log(l-2p)—? log n + ni log n)] = = B exp [^{B + (i-p) log m + p log g-p log p + = B exp [^{B + i log m-^-Af) log n-con log n}] . A1.15.7) Moreover, (as in § 10), con log n -*¦ oo , so that A1.5.7) is Bexp[-4p7(n)]. . A1.15.8) Summing over 3 ^ q ^ m , we get */p8(n), A1.15.9) which shows that [1, m*] is a zone of normal attraction. This completes the proof of Theorem 11.2.2. • § 16. The transition to Theorems 11.2.3-5 The remaining theorems refer to the "very narrow" zones. Functions of Class III satisfy 3 log* ^/i(x)^M log* , A1.16.1) and A1.2.2) implies the existence of third moments, but not that of moments of all orders. In this case it is possible [4] to establish by classi- classical methods that [0, (log n)*/p(n)] and [ — (log n)* jp{n), 0] are zones of local normal attraction for variables in (d), and of integral normal attraction in general, and that [0, (log n)ip(n)~\ and [ — (log n)*p(n), 0] will not be so unless all the moments exist. These assertions, which com- comprise Theorem 11.2.5, can also be proved by the arguments described below. We shall, however, confine ourselves to functions h(x) of Class II, i.e. those with
11.16. THE TRANSITION TO THEOREMS 11.2.3-5 221 po(x) log x ^ h(x) < (log xJ . A1.16.2) We take h(x) = M(x) \ogx = N (log x) log x , A1.16.3) where 0 (z->oo), A1.16.4) and A{n) ={M(n) log n}*. A1.16.5) If [0, A(n)p{n)~\ and [ — A(n)p(n), 0] are zones of local normal attraction (it being understood that XjG(d)) or zones of normal attraction, then the argument of § 4 shows that A1.2.2) must be satisfied. Conversely, suppose that A1.2.2) is fulfilled, and that Xje(d);v/e prove that [0,A(n)/p{n)~\ is a zone of local normal attraction. Let fi be a positive number to be fixed later; then p(x) = ?-[ (p(t)n Qxpi-itxn^dt + B Qxpi-csn1'2"). A1.16.6) Following § 4, we have the estimate 4>(q){t) = B P° xqe~h{x)dx = = B C exp(q \ogx-h(x))dx . A1.16.7) Let Q(q) be the solution of the equation h{x) = {q + 4)\ogx, A1.16.8) M(x) = Then dx r °° r °° dx exp(^logx-/z(x))dx = B — = #. A1.16.9) J<2(«) ^<2(«) x and oo 0 \ogx-h(x))dx = BQ{q) exp [4 log Q(q)~\ = log fife)]. A1.16.10)
222 NARROW ZONES OF NORMAL ATTRACTION Chap. The resulting estimate for 4>(q)(t), through crude, is sufficient for our purposes. From A1.16.8), = M~1(q + 4), A1.16.11) so that A1.16.10) gives <^>@ = B exp[D + l) log M- Following § 6, we find that Kl9)@) = B exp [(q + 1) logM-1(g+4)] , A1.16.12) A1.16.13) and sup tm ml = Bexp[logM 1(m+4)-/jmlogn]. A1.16.14) § 17. Choice of ft We choose pL so that n1 ~2fl = A(nJ = M(n) log n, so that B ** 2 log n ' log n ' where X(n) = log A(n) = O(log log n). Let t= 10 ~6, and choose m by the condition Thus log AT (log n) = because of A1.16.4). A1.17.1) A1.17.2) A1.17.3) A1.17.4)
"-17- CHOICE OF n 223 From A1.16.14) we obtain tm sup K(m){t) ml = B exp [{-(/J-ir) log n + B}m] = = B exp [ — c8M(n) log n\ = = Bexp[-c8A(nJ~\, A1.17.5) using A1.17.2) and A1.17.4). Thus Pn(x) = 5- Qxp{-^nt2)Qxp(nK3{t))exp(-initx)dt + + Bexp(-c8A{nJ), A1.17.6) where K3{t)= I il*rf/r\. A1.17.7) r = 3 To study the entire function Qxp(nK3(t)) ,we set ^ = 0.99// A1.17.8) and take |t|^n~Ml. For r^C3 we have i//r = B, and 3^ = o(l). A1.17.9) For C3<r^m, A1.16.13) gives il/ f = Bexp[(r+l)logM~1(r + 4)-/^1rlogn-r log r] = —^r//1 logn]= Bn1^1 . A1.17.10) From A1.17.9) and A1.17.10) we conclude that, for |t| < n' , nK3(t) = B, A1.17.11) and by Cauchy's integral (cf. A1.9.2)), Xr = Br\n2ltl. A1.17.12) For |t|</r"\ A1.17.10) gives 00 tr Y IlL = b exp (m log n/200) = r=m r! = Bexp(-c9M(n)logn)=Bexp(-c9^l(nJ). A1.17.13)
224 NARROW ZONES OF NORMAL ATTRACTION Chap. 11 § 18. Completion of the proof From the formula A1.17.13) we obtain f n(x) = ^~ f exp(-±nr2) A + f ^tr) exp(-itxn>)dt + + Bexp(-c10A{nJ). A1.18.1) The substitution ? = tn* gives + Bexp{-c10A(nJ . A1.18.2) A1.18.3) Now BTftrJn-tt-"* = B exp [r(B + i log r-ft-^) log «)] , A1.18.4) and log r-^-/^) log n^ log m-^J log n< -? log n , A1.18.5) because of A1.17.4). Therefore A1.18.3) is Bn~ir, and the sum over r> C3 is o(l). Further, so that A1.12.2) gives oo V r= 3 ' • '* / A1.18.6) Take 0<x<n*-Vp7(») = ^(n)/P7(n), A1.18.7) and separate off the first term
11-18. COMPLETION OF THE PROOF 225 from A1.18.6); the sum of the remaining terms with 3 ^r< C3 will then be o{l)e~ix2. A1.18.8) For C^Kr^m, we follow § 10, and examine the expression exp{r[B-(l-2p)logp7(n) + + p log r-p{\-2p) log n-^i-^) log n]}. A1.18.9) Here log r^log m = B log log n (see (i 1.17.4)), so that A1.18.9) is Brf^'^ . A1.18.10) Summing over Ci<r^m gives an error Bn~2 , A1.18.11) so that A1.18.6) gives pn(x) = Bn)-±e-^2(l + o(l)). A1.18.12) This proves Theorem 11.2.3. • The corresponding integral Theorem 11.2.4 is proved exactly as in § 15, the rough estimates derived in §§ 16-18 being sufficient for the purpose. It is important to note that, since A(nJ = M(n) log n ^ po(n) log n , we have -c12yl(nJ), A1.18.13) and we can argue as in § 15. Theorem 11.2.5 is derived by classical methods, the asymptotic expansions of Chapter 3.
Chapter 12

WIDE MONOMIAL ZONES OF INTEGRAL NORMAL ATTRACTION

§ 1. Formulation

In this chapter, as before, we study independent, identically distributed random variables $X_1, X_2, \ldots$ with

$$E(X_j) = 0, \qquad V(X_j) = 1.$$

We shall study the zone $[0, n^{\alpha}]$ where $\alpha > \tfrac16$; we recall that this is said to be a zone of normal attraction if, uniformly in $0 < x < n^{\alpha}$ as $n \to \infty$,

$$P(Z_n > x) \Big/ (2\pi)^{-1/2} \int_x^{\infty} e^{-u^2/2}\,du \to 1. \qquad (12.1.1)$$

An analogous definition holds for $[-n^{\alpha}, 0]$.

As before, the symbols $\rho(n), \rho_1(n), \ldots, \rho_k(n)$ will denote functions monotonically increasing to infinity, each one usually defined in terms of its predecessors. In this chapter we prove the following theorems.

Theorem 12.1.1. If $[-n^{\alpha}, 0]$ and $[0, n^{\alpha}]$ are zones of normal attraction for all $\alpha < \tfrac12$, then the variables $X_j$ are normally distributed.

It follows that we need only consider values of $\alpha < \tfrac12$. This theorem is a corollary of the following more precise result.

Theorem 12.1.2. If $\tfrac16 \le \alpha < \tfrac12$, consider the series of critical numbers

$$\tfrac16,\ \tfrac14,\ \tfrac{3}{10},\ \tfrac13,\ \ldots,\ \tfrac12\,\frac{s+1}{s+3},\ \ldots \to \tfrac12. \qquad (12.1.2)$$

Let $s$ be the unique integer with

$$\frac12\,\frac{s+1}{s+3} \le \alpha < \frac12\,\frac{s+2}{s+4}.$$
In order that $[0, n^{\alpha}\rho(n)]$ and $[-n^{\alpha}\rho(n), 0]$ be zones of normal attraction, it is necessary that

$$E\{\exp(A|X_j|^{4\alpha/(2\alpha+1)})\} < \infty \quad (0 < A < 1), \qquad (12.1.3)$$

and that the moments of $X_j$, up to order $(s+3)$, should coincide with those of a normal distribution. These conditions are moreover sufficient for $[0, n^{\alpha}/\rho(n)]$ and $[-n^{\alpha}/\rho(n), 0]$ to be zones of normal attraction.

The reason why it is necessary to include $A$ in (12.1.3) is that we have used a change of scale already to set $\sigma = 1$. We remark that the necessity of (12.1.3) has already been proved in § 9.2.

Theorem 12.1.2 is completely analogous to the corresponding local Theorem 9.3.2, but the method used in Chapter 9 is not sufficiently powerful to prove the present theorem except under more restrictive conditions. More precisely, it requires that $|\phi(t)| \ne 1$ for $t \ne 0$ and that $|\phi(t)| < c < 1$ for $|t| > C$. This will be true, for example, if $F(x) = P(X_j < x)$ contains an absolutely continuous component, in which case Theorem 12.1.2 can be proved by the methods of Chapter 9.

§ 2. An upper bound for the probability of a large deviation

We now proceed to the proof of the sufficiency part of Theorem 12.1.2, assuming (12.1.3) and

$$\psi_3 = \psi_4 = \ldots = \psi_{s+3} = 0, \qquad (12.2.1)$$

where $\psi_r$ is the $r$th cumulant of $X_j$.

Lemma 12.2.1. For

$$x > n^{\alpha}/\rho_1(n), \qquad (12.2.2)$$

we have

$$P(S_n > x n^{1/2}) < C_1 \exp[-c_1 n^{2\alpha}/\rho_1(n)^2], \qquad (12.2.3)$$

and an analogous inequality for $x \le -n^{\alpha}/\rho_1(n)$.

This is a weaker inequality than would be implied by the integral limit theorem that we are trying to prove, and the methods of Chapter 11 are
228 INTEGRAL NORMAL ATTRACTION: WIDE MONOMIAL ZONES Chap. 12 appropriate. We shall indicate the necessary changes in §§ 11.11-13 which are necessary to arrive at A2.2.3). It is clearly sufficient to take x = Xi = na/2p1(n) ; as in § 11.11 we introduce an auxiliary normal variable YnEN@, rf), with characteristic function and write Z'n = n-HSn+Yn). Then V(Z'n)=l + n2a~1 , A2.2.4) and the other cumulants of Z'n coincide with those of Zn = n~iSn. Suppose that we can prove that, for rf rf x P(Z'n>x)<C2 exp[-c2n27Pi(nJ] ; A2.2.5) we show that this implies A2.2.3). From A1.11.4)—A1.11.6) we conclude that A2.2.6) From A2.2.5), P(Z'n>x\n-i\Yn\n2*-i) = BQxp[-c2n2"/Pl(nJ]. A2.2.7) Taking x = 3/178/0! (n) and following A1.11.7), we find that P(Zn>x)^P{Z'n>x + n2a-i\n-±\Yn\^n2a-*). A2.2.8) Since oc<j, 2a — j<a, so that for sufficiently large n, 2a~i< l.Olx, and A2.2.5)-A2.2.7) combine to prove A2.2.3). It therefore suffices to establish A2.2.5), and we do this by following closely the argument of §§ 11.12, 13. The only difference is that now a 5^, so
12.3. INTRODUCTION OF AUXILIARY VARIABLES 229 that in the derivation of A1.13.2) we cannot use the inequality 3a-1 < 0. However, since a<^, 3a-1 <a, and we can replace A1.13.2) by the esti- estimate -co ,-co B e-±u2du = B exp[-c2n2a/Pl(nJ] . A2.2.9) The later arguments of §§ 11.13, 14 go through unchanged, and we arrive at A2.2.5), which proves A2.2.3). Replacing X-} by —X} we get the cor- corresponding inequality for negative x. • We need a slight strengthening of Lemma 12.2.1. Lemma 12.2.2. Let n^ be an integer inl^n^n, and let x satisfy A2.2.2). Then P(Sni>xn*)<C3 exp[-C3«27Pi(*J] , A2.2.10) with a similar result for negative x. Proof Write Sn = Sni + Tni, where Then P(SH>±xn*)>P{Sni>xn*)P{\Tni\<$xn*)>cAP{Sni>xn*), whence A2.2.10) follows from A2.2.3). • § 3. Introduction of auxiliary variables We now write X[ = Xi + Ai, (i<n), where the A{ are independent, small normal variables, 4-eiV@,n-10), A2.3.1) and S'n = Xl + X'2 + ...+X'n. Then V(S'n)=V(Sn) + n-'\ and the other cumulants of S'n coincide with those of Sn. It is easy to see
that if (12.1.1) is true with $S_n$ replaced by $S'_n$ then it is true without this substitution, so that we can work with the sequence $S'_n$. This is convenient, since $X'_i$ has a continuous probability density $p(x)$ with

$$0 < p(x) \le n^{10} \quad (i \le n). \qquad (12.3.2)$$

(It is to be noted that $p(x)$ depends on $n$, but not on $i \le n$.)

We shall use a modification of Cramér's method ([156] and Chapter 8). For functions $\rho_2(n), \rho_3(n), \ldots$ to be specified later, take $h$ in

$$n^{-1/2} \le h \le n^{\alpha - 1/2}/\rho_2(n), \qquad (12.3.3)$$

and define

$$e_1(z) = e^{z} \quad (|z| \le n^{2\alpha}/\rho_3(n)), \qquad e_1(z) = 0 \quad (|z| > n^{2\alpha}/\rho_3(n)). \qquad (12.3.4)$$

Let $\bar X_i$ be independent random variables with probability density

$$\frac{d}{dx} P(\bar X_i < x) = R^{-1} e_1(hx)\, p(x), \qquad (12.3.5)$$

where

$$R = \int_{-\infty}^{\infty} e_1(hy)\, p(y)\,dy. \qquad (12.3.6)$$

For $n_1 \le n$, write $f_{n_1}(x)$ for the probability density of $S'_{n_1} = X'_1 + \ldots + X'_{n_1}$, and $\bar f_{n_1}(x)$ for the probability density of $\bar S_{n_1} = \bar X_1 + \ldots + \bar X_{n_1}$. For all $\xi$ we then have

$$\bar f_1(\xi) = R^{-1} e_1(h\xi)\, f_1(\xi). \qquad (12.3.7)$$

We shall seek estimates of the form

$$\bar f_{n_1}(\xi) = R^{-n_1} e_1(h\xi)\, f_{n_1}(\xi) + \theta\rho_{n_1}, \qquad (12.3.8)$$

where $|\theta| < 1$ (and $\theta$, like $B$, may vary from place to place), and $\rho_{n_1}$ is some error bound, attempting to establish these by induction on $n_1$.
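The conjugate density of (12.3.5) and (12.3.6) is an exponentially tilted version of $p$, namely $R^{-1}e^{hx}p(x)$ on the range where the truncated exponential $e_1$ is not cut off. As a rough illustration only, the sketch below takes $p$ to be the standard normal density and ignores the truncation; both are simplifying assumptions made here and are not the setting of the text. It computes $R$ and the mean and variance of the tilted density by the trapezoidal rule. The names `tilt_summary` and `std_normal` are hypothetical.

```python
import math

def tilt_summary(p, h, lo=-12.0, hi=12.0, steps=24000):
    """Return (R, mean, variance) for the tilted density exp(h*x) * p(x) / R.

    p : callable density on the real line, assumed negligible outside [lo, hi]
    h : tilting parameter
    The integrals are approximated by the composite trapezoidal rule.
    """
    dx = (hi - lo) / steps
    m0 = m1 = m2 = 0.0
    for k in range(steps + 1):
        x = lo + k * dx
        w = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        f = w * math.exp(h * x) * p(x) * dx
        m0 += f
        m1 += x * f
        m2 += x * x * f
    mean = m1 / m0
    return m0, mean, m2 / m0 - mean * mean

def std_normal(x):
    """Standard normal density (the illustrative choice of p)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

if __name__ == "__main__":
    for h in (0.1, 0.5, 1.0):
        R, mean, var = tilt_summary(std_normal, h)
        print(f"h={h}: log R={math.log(R):.6f}  mean={mean:.6f}  var={var:.6f}")
```

For the normal density the tilted law is again normal with mean $h$ and unit variance, and $\log R = \tfrac12 h^2$; this is the pattern of the principal terms that appear later in (12.8.15) and (12.8.16).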
l2-4- STUDY OF THE BASIC RELATION 231 § 4. Study of the basic relation We have f°° fni + dt) = R-1 fni(Z-z)ei(hz)p(z)dz, A2.4.1) J — oo and supposing that A2.3.8) holds, this gives OO z))e1(hz)fnM-z)p(z)d — oo 00 — oo 9pnie1(hz)p(z)dz. A2.4.2) In view of A2.3.6), this may be rewritten as e1(HZ-z))e1(hz)fni(Z-z)p(z)dz + epni. A2.4.3) We now compare the first term: TOO R~">~' el(h(Z-z))e1(hz)fni(Z~z)p(z)dz A2.4.4) J — oo with the integral + 1(?). A2.4.5) We separate the domain of integration into two sets K = {z;\z\^n° + yP4(n), |?-z|^n« + */p4(n)} A2.4.6) and its complement 21. We first estimate R*-1 I \e1(h(Z~z))ei(hz)~e1(h?)\fni(Z-z)p(z)dz, A2.4.7) assuming that p4(n)<{p3(n)}\ A2.4.8) From A2.3.4), ei(hu)<exp{n2*/p3{n))
232 INTEGRAL NORMAL ATTRACTION: WIDE MONOMIAL ZONES Chap. 12 for all u, so that 2alpM)- A2-4.9) Moreover, as in A2.3.2), 0</ni(?-z)<rc10, A2.4.10) so that A2.4.7) is bounded by ( ( i I )\l;-z\>n«+V2/p4(n) + [ p{z)dz\. A2.4.11) )\z\>n«+V2/p4(n) J It is easy to see that Lemma 12.2.2 (with c3 replaced by jc3) applies to the sums S'ni, so that the sum of the integrals in brackets does not exceed 2C3 exp(-c4n2Vp4(nJ) < 2C3 exp(-c4n2Vp3(»)f), A2.4.12) using A2.4.8). Thus A2.4.11) does not exceed 2C3R--1 exp[-c5n27p3(/i)*] A2.4.13) for sufficiently large n, with c5=jc4. We now assume that in A2.3.3) p2(n) has been defined by P2(»)={p3(«)}10. A2A14) Then, for ze 21, \hz\<n2«/p3(ny°, MZ-z)\<n2°/p3(nyo, A2.4.15) so that A2.3.4) shows that e1(h(Z-z))e1(hz)-e1(hQ = 0. A2.4.16) Thus, A2.4.4) is R-^-'exp[~c5n2Vp3(nn • A2.4.17) § 5. Derivation of the fundamental formula We have shown in § 4 that A2.3.8) implies that r0p,,1. A2.5.1)
12.5. DERIVATION OF THE FUNDAMENTAL FORMULA 233 We have therefore proved by induction on n the formula ePnt A2.5.2) where n27p3(n)*]. A2.5.3) Thus, for values of ? such that e^hty^Q, m = R"e1(-K)W + 9R"e1(-h<;)pn. A2.5.4) Since for all u, A2.5.4) and 12.5.3) yield, when e fS) = Rne,{~H)m) + eP'n, A2.5.5) where p; = 2(l + R+...+R")exp[C6n27p3(n)*]. A2.5.6) If ei{hZ)*O, then \h?\^n2a/p3{n), so that by A2.3.3) and A2.4.14), 3{n)9^n2 A257) Now consider the function m)di , A2.5.8) — 00 and write From A2.5.5) and A2.5.7), ^ + en'p'^ A2.5.9) 1 2p'n, A2.5.10) since the integrand vanishes on 41. We therefore estimate
234 INTEGRAL NORMAL ATTRACTION: WIDE MONOMIAL ZONES Chap. 12 '21 If ei(h?)=Q, then by A2.3.3) and A2.4.14), l?l 2* A Pi(n) = « Pi \n) and thus by Lemma 12.2.2, A2.5.12) f /n(?)d?< C3 exp[-c3n2-/Pi(nJ] , A2.5.13) for arbitrary Pl. We take p1(n)=p3(n), and then A2.5.13) can be seen, for sufficiently large n, to be smaller than p'n. Thus we arrive at the equation Wn(u) = f /„(?)<!?= R- [ ei(-^)/n(^)d^ + 20n2P;- A2-5.14) § 6. The fundamental integral formula We consider the distribution function The random variables Xt (i < n) have distribution function V(x) = P(Xi<x) = R-1 T e1(hy)p(y)dy. A2.6.1) •/ — oo It is clear that for sufficiently large n, and i < n, a2=V{Xt)>0, m = E(Xi)^oo. A2.6.2) Writing Fn{u)= Wn(mn + aun±), A2.5.14) gives C un lA Fn(u) = R" ed-WdW^Q + Oprt, A2.6.3) ¦' — oo where
12.7. STUDY OF THE AUXILIARY INTEGRAL 235 = 2n2p'n. A2.6.4) In A2.6.3) we set w = co and u = x^l and subtract one expression from the other, to get xn Vz A2.6.5) If x^l, h?^0, then ei(-h?) = e K for h?<n2*/p3(n), and e1{-h?) = 0 for larger values. Thus l-Fn(x) = R" f e-^dFFn(?) + 0pn3, A2.6.6) JxnVi where Pn3 = 2pn2 + R" j°° e~«d^@ , A2.6.7) and From this f e-*«d^B(^) ^ exp(-n2Vp3(«)), A2.6.8) J xo so that we can take Pn3 = 2pn2 + R" exp [- n27p3 (n)] . A2.6.9) n2* § 7. Study of the auxiliary integral In A2.6.6) set ^ = mn^ + avn^, to obtain _ e-w/2"dFn(i;) + 0pn3. A2.7.1) We now turn to the quantity R= e^htfpWdy, A2.7.2) and approximate it by means of a truncated power series. Set p5(n)=p3(n)^, yn = ri*+i/p5{n), so that Lemma 12.2.2 and A2.3.4) give
236 INTEGRAL NORMAL ATTRACTION: WIDE MONOMIAL ZONES Chap. 12 R = I" ei(hy)p(y)dy + ePn4, A2.7.3) where pn4 = C4exp[-c7n27p3(n)*] . A2.7.4) Because of A2.3.4) and A2.3.3), e1(hy) = ehy for |j>|^.yn, so that R = j" e^p^dy + flp^. A2.7.5) We need an upper bound for R; R < c5 . A2.7.6) To prove this, set P{y)= C p{z)dz {y^O), Jy and use A2.1.3), A2.3.1 and the arguments of § 9.4 (following (9.4.3)) to prove that >). A2.7.7) Use this and integrate A2.7.5) by parts, using the fact that, for Ky^y,,, hy-c8y4*«2*+l) < -ic8/«/<2«+1>, A2.7.8) to obtain A2.7.6). More generally, if 6(y) is continuous on [ — yn, yn], and exp(hyd(y))p(y)dy<C5. A2.7.9) § 8. Expansion of R as a Taylor series For a fixed positive integer K, expand A2.7.5) as a Taylor series of (K + 2) terms: K hp r hK+ K hp r hK+i r R = I 1 yPP(y)dy + J^rW] exP(hye(y))p(y)dy + k=0P- J\y\<yn [K+i-I J\y\<yn + dpn4. A2.8.1)
12.8. EXPANSION OF R AS A TAYLOR SERIES 237 We take K = [10/(±-a)] + l A2.8.2) and note that, for p < K, ypp(y)dy <C6Qxp[~c9n2a/p5{nJ]<C6n-10 A2.8.3) \y\>yn for sufficiently large n. Thus A2.8.1) becomes, using A2.8.2), A2.8.3) and A2.7.5), K hp R = l+ X olp— + 0C7n10, A2.8.4) p=2 P- where the ap are the moments of the Xt (i^n). Moreover, yehyp(y)dy, A2.8.5) oo and the argument used to derive A2.7.5) and A2.7.6) gives m = R~i f yJyp{y)dy + OPnS, A2.8.6) JM^yn where pn5 = C4exp[-C9n2Vp3(»W • A2.8.7) A2.8.8) Hence, arguing as for A2.8.4), K h"'1 We also need the variance As in A2.8.8) we have K hp-2 E(X2) = R~1 I ap-~~ + dC,n-\ A2.8.9) so that a2 may be obtained from A2.8.8) and A2.8.9). This is most easily done by remarking that, as far as their principal terms are concerned, m and a2 may be obtained from R'JR^ and (R'[Rx -R\2)/R2 (cf. Chapter 8), where
238 INTEGRAL NORMAL ATTRACTION: WIDE MONOMIAL ZONES Chap. 12 ehyp{y)dy. Thus logR= ? yp^ + #C7n-9, A2.8.10) p-2 P • m= V yp— + dC7n-8 , A2.8.11) P=2 PP! pf2^7 n-7, A2.8.12) where the yp are the cumulants of X{. Under the conditions A2.8.1) and A2.3.1) we have y2=l + n-20, yj = 0 (/=3,4, ..., s + 3), A2.8.13) where s is the greatest integer with ! S+l 2 s + 3 Thus and from A2.8.10), A2.8.11) and A2.3.3) we have log R = ±y2h2 + eC8n-(s+4)l(s+3), A2.8.15) m = y2h + eC8n-1/p2{n)s+3 . A2.8.16) Thus A2.8.13) implies that n-1p2{n)-s-3 . A2.8.17) § 9. Further transformations Turning to A2.7.1) we now choose h so that x = mni, A2.9.1) where ^P2(nJ0. A2-9.2)
12.9. FURTHER TRANSFORMATIONS 239 From A2.8.17) this implies that n-1p2{n)~s-3 , A2.9.3) which in view of A2.9.2) is consistent with A2.3.3). Moreover, R" e ' m" = exp [n (log R-hm)], and from A2.8.15) and A2.8.16), n(\ogR-hm)= -$nh2 + 9h= -^c2 + 0n~*x A2.9.4) for sufficiently large n. From A2.7.1), f e-h7lnViVdFH(v) + 0p3{n). A2.9.5) Jo For the calculation of the first term on the right-hand side we can follow the argument of Cramer ([19]) and Chapter 8) almost verbatim. We have dn±<v}, A2.9.6) where Moreover, we can prove as in A2.8.8) and A2.8.9) that E\Xi-fn\3 < C9 (Kn). A2.9.7) We remark that in A2.9.6) the normalisation is by rv rather than (n + n~19)^, but this can be avoided by taking the factor (l + n~20)^ to the other side, when the central limit theorem (Chapter 8) gives Fn(v) = *(v) + Qn(v), \QH(v)\ < Clon~* log n , A2.9.8) where <P is the standard normal distribution. Hence exp{-haniy)dFn(y) = o /-co = B71)"^ exp(-haniy-jy2)dy + Jo TOO + Qn @) + han* exp (- h&vfiy) Qn (y) dy = Jo . A2.9.9)
240 INTEGRAL NORMAL ATTRACTION: WIDE MONOMIAL ZONES Chap. 12 Moreover (see for example [22]), -o JhanVi = Bnn)-±(ha)-1(l + 0CiO/hani). A2.9.10) We now use A2.3.3), A2.9.3) and A2.8.12) to throw A2.9.10) into the form B7r)-"x-1(l+0C11/i), A2.9.11) which, substituted into A2.9.5), gives A2.9.12) § 10. Completion of the proof of sufficiency The expression just derived is valid for all x satisfying A2.9.2), but we shall use it only in the range n^x<«7p7(n), A2.10.1) as for x<n* the integral limit theorem has already been proved in §§ 7.11, 14. We can choose p7(n) to be equal to p2(nJ0 A2.4.14) and Pi (n) = Pi («I0> but p3 (n) can increase arbitrarily slowly, so that the gener- generality of A2.10.1) is not restricted. Thus if A2.10.1) is satisfied, we have l-FH{x) = {2n)-* e-^duil + OC^n-^ + ep^, A2.10.2) Jx where /? = minG, j—oc). It remains to estimate pn3, using A2.6.9) and A2.8.4). Consider first pn2, which by A2.6.4), A2.8.10) and A2.5.6) satisfies pn2 < 4n3 exp(n log R) < C10nh2 < Cnx2< C11n
12.11. PROOF OF THE NECESSITY 241 so that pn2 < C16 exp[-c12n27p3(«)*] • A2.10.3) Similarly the last term in A2.6.9) has the bound C16exp[-c12n27p3(»)L A2.10.4) so that pn3 < 2C16 exp[-c12n2Vp3(n)] . A2.10.5) Substituting this into A2.10.2) we have the required integral theorem. § 11. Proof of the necessity We now complete the proof of Theorem 12.1.2 by showing that, if [0, nap(n)~\ and [ — rfp(n), 0] are zones of normal attraction, then A2.2.1) must hold. Suppose to the contrary that, for same s0 + 3- A2.11.1) Writing _i l _! we introduce a further independent random variable YneN@,na>), A2.11.2) with characteristic function A2.11.3) The sum (X1 + X2 +... + Xn+Yn) has distribution function Fn(x) and probability density J — co Using the notation of Chapter 9, and following the calculations which there led to (9.5.2), we find that
242 INTEGRAL NORMAL ATTRACTION: WIDE MONOMIAL ZONES Chap. 12 + Bexp(-?ln2a'). A2.11.4) Following the computations of §§ 9.6, 6 that, for l<x<na'C, A2.11.5) where ? is a sufficiently small constant, So + J A2.11.6) where ^0. A2.11.7) Take x1 = K»ai5 x2=C«ai, A2.11.8) and integrate over x, remembering that we have x n(n-ix)So + 3(l+yx)dx + Bexp(-fe1n2ai), A2.11.9) where lv,Ki. (i2.ii.io) The ratio of the second and third terms to the first term on the right-hand side of A2.11.9) is >c13CS0+3 = Ci, A2.11.11) if, as we shall assume, ( is sufficiently small compared with e1#
12.12. COMPLETION OF THE PROOF 243 § 12. Completion of the proof Now suppose that [0, nap(n)] and [-rfp(n), 0] are zones of normal attraction, and consider the distribution function Fn(x) of n~k(Sn+ Yn). Write gfjx) for the probability density of Then roo Fn(x)= Fn(x-z)9i(z)dz, A2.12.1) J - oo and we shall consider the expression Fn(x2)-FH(Xl). A2.12.2) Suppose that e2>0; then Fn(x-z)g1(z) = BQxp{-{n1~2E2). A2.12.3) If x^x^x2, then for \z\<rfl~E\ Fn(x-z) = Bn)~i e~T" du[l+f/(x-z)], A2.12.4) J -oo where to(x-z)|<e3, (x^x<x2), A2.12.5) and e3 may be chosen arbitrarily small for large n. From A2.12.3) and A2.12.4) we conclude that [x J — oo A2.12.6) where |0|< 1, and e4 can be made arbitrarily small for large n. In particular, A2.12.7) If ?4 is sufficiently small compared with the number Ci in A2.11.11), then A2.12.7) contradicts A2.11.9). Thus A2.11.1) is impossible, and the proof of the theorem is complete. •
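The bookkeeping in Theorem 12.1.2 (which critical interval contains $\alpha$, how many moments must agree with the normal ones, and which exponent enters (12.1.3)) is easy to get wrong by one. The sketch below tabulates these quantities with exact rational arithmetic; the function names are illustrative and not taken from the text.

```python
from fractions import Fraction

def critical_number(s):
    """s-th critical number (s+1)/(2(s+3)) of (12.1.2): 1/6, 1/4, 3/10, 1/3, ..."""
    return Fraction(s + 1, 2 * (s + 3))

def critical_index(alpha):
    """Unique integer s >= 0 with critical_number(s) <= alpha < critical_number(s+1)."""
    if not Fraction(1, 6) <= alpha < Fraction(1, 2):
        raise ValueError("alpha must satisfy 1/6 <= alpha < 1/2")
    s = 0
    # The critical numbers increase to 1/2, so the loop stops for alpha < 1/2.
    while critical_number(s + 1) <= alpha:
        s += 1
    return s

def moment_exponent(alpha):
    """Exponent 4*alpha/(2*alpha+1) in the moment condition (12.1.3)."""
    return 4 * alpha / (2 * alpha + 1)

if __name__ == "__main__":
    for alpha in (Fraction(1, 6), Fraction(1, 4), Fraction(3, 10), Fraction(2, 5)):
        s = critical_index(alpha)
        print(f"alpha={alpha}: s={s}, moments matched up to order {s + 3}, "
              f"exponent={moment_exponent(alpha)}")
```

As $\alpha$ approaches $\tfrac12$ the index $s$ grows without bound, which is the content of Theorem 12.1.1: normal attraction in every zone with $\alpha < \tfrac12$ forces all moments to be those of a normal law.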
Chapter 13

MONOMIAL ZONES OF INTEGRAL ATTRACTION TO CRAMÉR'S SYSTEM OF LIMITING TAILS

§ 1. Formulation

This chapter is devoted to Petrov's theorems on integral convergence, whose local analogues have been proved in Chapter 10. We keep the notation of that chapter, but do not restrict the variables $X_j$ to belong to the class (d). The basic results are the following analogues of Theorem 10.1.1.

Theorem 13.1.1. Let $\rho(n) \to \infty$ be an arbitrary increasing function, and suppose that, for some $\alpha < \tfrac12$,

$$E\{\exp(|X_j|^{4\alpha/(2\alpha+1)})\} < \infty. \qquad (13.1.1)$$

Then, uniformly in $0 \le x \le n^{\alpha}/\rho(n)$,

$$\frac{P(Z_n > x)}{1 - \Phi(x)} \exp\left\{-\frac{x^3}{n^{1/2}}\,\lambda^{[s]}\!\left(\frac{x}{n^{1/2}}\right)\right\} \to 1, \qquad (13.1.2)$$

$$\frac{P(Z_n < -x)}{\Phi(-x)} \exp\left\{\frac{x^3}{n^{1/2}}\,\lambda^{[s]}\!\left(-\frac{x}{n^{1/2}}\right)\right\} \to 1. \qquad (13.1.3)$$

Here $\lambda^{[s]}(z)$ is the truncated Cramér series, and $s$ is the integer defined by (10.1.6).

Theorem 13.1.2. If, for all $x$ in $0 < x < n^{\alpha}\rho(n)$, all $n \ge n_0$, and positive constants $n_0$, $a_0$ we have

$$P(Z_n > x) \le e^{-a_0 x^2}, \qquad P(Z_n < -x) < e^{-a_0 x^2}, \qquad (13.1.4)$$

then (13.1.1) necessarily holds, and the conclusion of Theorem 13.1.1 applies.
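The effect of the Cramér series in Theorem 13.1.1 can be seen numerically in a case where the tail of $Z_n$ is available in closed form. The sketch below uses centred exponential variables $X_j = E_j - 1$ with $E_j$ standard exponential; this choice of distribution is an assumption made purely for illustration. Their third cumulant is 2, so the leading coefficient of the Cramér series is $\lambda_0 = 2/6 = 1/3$, and the ratio of $P(Z_n > x)$ to the normal tail should be close to $\exp(x^3/3n^{1/2})$ when $x/n^{1/2}$ is small. Only this leading term is compared; the function names are hypothetical.

```python
import math

def gamma_sum_tail(n, x):
    """P(Z_n > x) for Z_n = (E_1 + ... + E_n - n) / sqrt(n), E_j ~ Exp(1).

    Uses the Erlang identity P(Gamma(n) > t) = sum_{k<n} exp(-t) t^k / k!,
    evaluating each term in log space so that nothing overflows.
    """
    t = n + x * math.sqrt(n)
    log_t = math.log(t)
    return math.fsum(math.exp(k * log_t - t - math.lgamma(k + 1)) for k in range(n))

def normal_tail(x):
    """1 - Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def cramer_leading_factor(n, x):
    """exp(lambda_0 * x^3 / sqrt(n)) with lambda_0 = 1/3 for centred Exp(1) variables."""
    return math.exp(x ** 3 / (3.0 * math.sqrt(n)))

if __name__ == "__main__":
    n = 10000
    for x in (1.0, 2.0, 3.0):
        ratio = gamma_sum_tail(n, x) / normal_tail(x)
        print(f"x={x}: ratio={ratio:.4f}  leading Cramér factor={cramer_leading_factor(n, x):.4f}")
```

The ratio drifts visibly away from 1 as $x$ grows, which is why the normal tail alone does not suffice outside the narrow zones when the third cumulant is non-zero.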
13.2. THE PROBABILITY OF A LARGE DEVIATION 245 The deduction of A3.1.1) from A3.1.4) is exactly like that given in § 8.2. Thus no truncated power series n[s](z) other than that of Cramer can possibly appear in formulae of the type A3.1.2), A3.1.3). Further, in the collective Theorem 13.1.1 the role of the linear functionals ah is played by the moments of Xj. § 2. An upper bound for the probability of a large derivation For the sequel, we shall need inequalities like A2.2.3) and A2.2.10) for the probability of large deviations. We cannot however use the inequalities already proved, since these depended on the vanishing of the first (s + 3) cumulants, which is not here assumed. Suppose then that Xx, X2, ... are independent and identically distributed, with and suppose that A3.1.1) is satisfied. We prove that, for any monotonic function p(n)-+oo there exist positive constants cx and c2 such that P(Sn>n°+i/p(n))< Cl exP[-C2n27p(nJ] A3.2.1) for all sufficiently large n, where To prove this assertion, we consider the normalised sum Zn = SJanK A3.2.2) and a modified random variable Zn = Zn+YJan-, A3.2.3) where Yn is a random variable, independent of the Xj, and having a normal distribution with mean 0 and variance n2a. Thus E(Zn) = 0, V(Zn)=l+a-2n2*-1. A3.2.4) The distribution functions of Zn and Zn will be denoted by Fn(x) and Fn{x) respectively, and their characteristic functions by fn{t) and/„(*:), so that /H(t) =/,(,) exp {-^-j. A3.2.5)
246 CRAMER'S SYSTEM OF LIMITING TAILS Chap. 13 The random variable Zn has a continuous distribution, with density which is everywhere continuous. Integrating from x1 to x2, (where x1 <x2 will be chosen later), 00 p-lfX2 a~itXl I rco p-itx2_e —it -m where v(u) is the common characteristic function of the variables Xj. If g is a fixed positive number, then i re p — iuan^hxi _ — IU , A3.2.6) where as usual sr denotes a positive constant. By virtue of B.6.35) there exists si>0 such that in |m|< ?i. Therefore (for any monotonic p x (n) -*¦ oo) we have, in the range the inequality Taking ? = ?! in A3.2.6), we therefore have Fn(x2)-Fn(Xl) = [ + J3exp[-?2n27p1(nJ]. A3.2.7) As in Chapters 9 and 10, we introduce the function K{t) = log v{t),
13.2. THE PROBABILITY OF A LARGE DEVIATION 247 and obtain from A3.2.7), X lU xexp \-±n2*u2 + n ? -f 1/ L r=2 ' • A3.2.8) where iJ/r = K{r){0) and m = [n2a/p1(nJ]. If <rzr/r\, A3.2.9) r=3 where this becomes 1 i* ina~ Vi/pj(n) p — z<rn1/4x2 p = ^ x f " \1/ zr\ x exp (nKN (z)) exp I n ^ —L— Jdz V N+l r' / where JV is a large positive constant to be determined later. We now apply the method of steepest descent, setting na <Tp(n) nT so that r-+0 as n-*oo. Form the equation A3.2.10) A3.2.11) ±{RN(z)-ztG}=0. A3.2.12) For sufficiently small t, this equation has a unique real root z0 with the same sign as t and tending to zero as r-+0. In A3.2.10) we deform the con- contour of integration to the three sides of a rectangle passing through z0, to give -83n2*/p1(nJ], A3.2.13)
248 CRAMER'S SYSTEM OF LIMITING TAILS Chap. 13 where 2ni En(z,t)dz, zo-in*- Vilpi(n) and En{z,t) = exp[n{KN{z)-zt<j)'] exp in ^ Wr Along the line z = zo + iv we have z ' \ vTi r! r= 2 and substituting w = v{nK'^(zo)}i we get where X A3.2.14) f x exp n dw, A3.2.15) and Q is the interval Now require that px (n) and N should satisfy lim p(n)/pl = oo , Iimp1(n)=oo, A3.2.16)
13.2. THE PROBABILITY OF A LARGE DEVIATION 249 Then on Q, for sufficiently large n (the positive constant C may be different in different formulae), so that, for r^m = [n2a/pl(nJ] and weQ we have \j/r ( iw Cexp < r l-2a 4a log r-(?-a) log n-log Hence C for sufficiently large n. It is not difficult to see that, under A3.2.17), a similar bound obtains for [logn] r = N+l so that for weQ and n sufficiently large, we have with Using this fact, we easily find that | For the root z0 of A3.2.12) we have Cri^ for all sufficiently large n. where 6a3 2^ is a series converging for small t, in which, by taking N sufficiently large, we can make arbitrarily many of its terms agree with those of the Cramer
250 CRAMER'S SYSTEM OF LIMITING TAILS Chap. 13 series A(r). Hence from A3.2.14), and thus for sufficiently large n, taking A3.2.11) into account, , A3.2.18) -n2a/4G2p(nJ]. A3.2.19) The integrals I2 and 73 are estimated by the methods used for the similar integrals in § 10.4, to give the bounds |/,l<Cexp[-e5n27Pl(nJ], (s = 2,3). Combining A3.2.13), A3.2.18) and A3.2.20), we obtain for all finite x2>xi and all sufficiently large n. We now set x2 = np, where p>Ba+l)/4a + 3, and Sn=Sn+Yn. x Then Sn > np implies that one of the events A3.2.20) A3.2.21) occurs, and by A3.1.1) P(Xj ^ nP~2) < C and so that Adding A3.2.21) and A3.2.22), we therefore have It remains only to replace Fn by Fn in A3.2.23). We have A3.2.22) A3.2.23)
/-*•-*• INVESTIGATION OF THE BASIC FORMULA 251 so that A3.2.23) gives . l-Fn(Xl)^Cexp[-s6n2«/p(nJ]. Since x1 = rf/ap{n) this is the inequality A3.2.1) which we set out to prove. Replacing Xj by — Xp we also have the inequality P(Sn< -rf+*/p{n))<Cl exP[C2n27p(nJ] . Moreover, the argument used in § 12.2 shows that, for any n P(\Sni\ > rf+-lp{n) < c3 exp[-c4n27p(nJ] . § 3. Investigation of the basic formula Having established the basic inequalities, we can now proceed as in §§ 12.3-7. As in § 12.3, we write X/ = X; + zl,-, where At are independent with ^eiV@,n-10). ' A3.3.1) Keeping the notation of Chapter 12 and using A3.2.1) we proceed to the formula (cf. A2.7.1)) , A3.3.2) J (x-fhnV3.)lal/i where r oo R= ei{hy)p{y)dy, A3.3.3) and pn3 is defined in § 12.6. Since the case a<? has already been investigated, we may suppose that x>n^~s for any g>0. Thus we may take <-±-2s=n-i-2s. A3.3.4) From A2.8.10), A2.8.18) and A2.8.12) we have Z yP} 7 () 7n-g, A3.3.5) p=2 V ¦ |2yP7^7+^C7n-8 = m^(^) + ^7n-8, A3.3.6)
252 CRAMER'S SYSTEM OF LIMITING TAILS Chap. 13 n-7 , A3.3.7) where the superscript \K\ denotes a truncated power series in h, and the yp are the cumulants of X-; in particular Thus d „., m[K]= —¦ nk](h) + dCin-g . A3.3.8) dn v ' Following § 8.3, we take x = fhni, where Kx<n7p7(n) A3.3.9) (c/ A2.9.2), the notation of Chapter 12 being retained). From A3.3.2) we have Too v) + 6pn3. A3.3.10) In view of A3.3.5)—A3.3.8) the factor multiplying the integral in A3.3.10) may be written exp _d_ dh A3.3.11) Now write x — n ix = m and choose h as the solution of the saddle point equation fh-x = ~ tiK](h)-x + dC7n-6 = 0. A3.3.12) From A3.3.4), If x' = x-dC7n-6, we easily find (cf. (8.3.6)), ^ -3 , A3.3.13) where XliK] is a truncated Cramer series. By virtue of the definition of x', tiK](h)-h~ HK](h)= -±x2 + x3A[iK](x) + Bn-3 . A3.3.14)
13.4. COMPLETION OF THE PROOF 253 Substituting this shows that A3.3.11) is equal to 2] . A3.3.15) § 4. Completion of the proof Inserting A3.3.15) into A3.3.10), we have + 6pn3. A3.4.1) For the computation of the integral we follow the argument of § 12.9 to give [h). A3.4.2) Since z = n~ix, the substitution of A3.4.2) into A3.4.1) giveg x dpn3, A3.4.3) or, because of A3.3.9) and A2.10.5), 1). A3.4.4) But x>ni and /i = J5n~ix = J3na~i/p7(n), so that for the values of x described in A3.3.9), A3.4.4) gives Since X[ = Xl + Al, we find without difficulty that A3.4.5) holds for the original variables also. In view of the definition of K, we can replace \^K\ by (s], to obtain A3.1.2). Finally A3.1.3) is derived by replacing X} by -Xj. . The theorems of this chapter of course contain those of Chapter 12 on normal attraction as a special case.
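The closing remark can be made concrete. With $\sigma = 1$ and $\gamma_r$ the cumulants of $X_j$, the first two coefficients of the Cramér series take their standard values; writing the truncated series as the partial sum through $z^s$ is an assumption made here, since the definition (10.1.6) lies outside this section.

```latex
\lambda(z) = \sum_{k \ge 0} \lambda_k z^k, \qquad
\lambda_0 = \frac{\gamma_3}{6}, \qquad
\lambda_1 = \frac{\gamma_4 - 3\gamma_3^2}{24}, \qquad
\lambda^{[s]}(z) = \sum_{k=0}^{s} \lambda_k z^k .
% Each \lambda_k is a polynomial in \gamma_3, \dots, \gamma_{k+3} without constant term.
% Hence, if \gamma_3 = \gamma_4 = \dots = \gamma_{s+3} = 0 as in (12.2.1), then
\lambda_0 = \lambda_1 = \dots = \lambda_s = 0
\quad\Longrightarrow\quad
\exp\left\{ \frac{x^3}{n^{1/2}}\, \lambda^{[s]}\!\left( \frac{x}{n^{1/2}} \right) \right\} = 1 .
```

In that case the Cramér factor disappears and the limiting tail is the normal one, which is the statement of normal attraction (12.1.1).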
Chapter 14

INTEGRAL THEOREMS HOLDING ON THE WHOLE LINE

§ 1. Formulation

In the preceding chapters we have studied theorems of a collective type concerning large deviations in zones of the form $[0, \psi(n)]$ and $[-\psi(n), 0]$, where $\psi(n) = o(n^{1/2})$. The role of the linear functionals $a_j$, $b_j$ was played by moments of the random variables $X_j$. In the case $\psi(n) = n^{\alpha}\rho(n)$, the condition

$$E\{\exp(A|X_j|^{4\alpha/(2\alpha+1)})\} < \infty \tag{14.1.1}$$

appears as a condition for normal attraction; this implies that all the moments of $X_j$ exist and that the probability of a large deviation in $X_j$ itself falls off very sharply.

In this chapter we study theorems in which $x$ is not restricted to any zone, but is allowed to range over the whole real line. Thus let $X_1, X_2, \ldots$ be independent and identically distributed with

$$E(X_j) = 0, \qquad V(X_j) = \sigma^2 > 0, \tag{14.1.2}$$

and write $Z_n = (X_1 + X_2 + \ldots + X_n)/(\sigma n^{1/2})$ for the normalised sum. We shall seek classes of such variables for which collective limit theorems hold which assert that, uniformly in $x > 1$ as $n \to \infty$,

$$P(Z_n > x)/\Phi(x, a_1, \ldots, a_k, n) \to 1 \tag{14.1.3}$$

and

$$P(Z_n < -x)/\Phi(-x, b_1, \ldots, b_k, n) \to 1. \tag{14.1.4}$$

Here the limiting tails $\Phi$ depend on linear functionals $a_j$, $b_j$ of $F(x) = P(X_1 < x)$. We remark that the restriction $x > 1$ is harmless, since in $|x| < 1$ the classical theorems hold.

For simplicity we shall restrict attention to the case in which $F$ is symmetric, having a bounded continuous density $g(x)$ such that, for $x \geq 1$,
14.2. PROBABILITY OF VERY LARGE DEVIATIONS: ELEMENTARY RESULT 255 6a a J- oo 6a a , g(u)du= ? -r+0(x-6*-), A4.1.5) x r = a -* and thus 6a a g(u)du = E -T + 0(x~6a-s). A4.1.6) — oo r = a Here a ^ 3 (since the variance exists), the Ar are constants, with Aa > 0, and ?>0. The class of such probability densities we call (A). Such variables have only a finite number of moments, and the role of the linear functionals dj, bj is layed by pseudomoments defined in § 5 below. Theorem 14.1.1. For x^l we have, uniformly in x as n-+oo, ,n*)j^l A4.1.7) where r(x, n*) is a rational function in both arguments. For x^ni+a'1+s, n>no{s), ; A4.1.8) r(x, n^) is determined by a finite number of linear functionals of the distri- distribution of XY, called pseudomoments. This theorem has a collective character since the asymptotic form is determined by a finite number of pseudomoments. For x ^ — 1, of course, another analogous relation holds, and for |x| < 1 the classical theorems hold. § 2. An elementary result on the probability of very large deviations We shall be concerned with the deduction of the asymptotic forms of and for very large x. We begin with the first of these expressions, setting
256 INTEGRAL THEOREMS HOLDING ON THE WHOLE LINE Chap. 14 Y = axnJl and supposing that y > n. Thus we consider the probability of the event Xl+X2+... + Xn>y. A4.2.1) This can only occur if at least one of the events Xt>yln (/=1,2, ...,n) A4.2.2) occurs. These events overlap, but their intersections have small probability if y is large. More precisely, we shall find values of j; for which the probab- probability that two or more of the events A4.2.2) occur is of order Bwy-% A4.2.3) where rjn-+O as n-+oo. For each k^2 the probability that exactly k of the events A4.2.2) occur is ka nk{2Aafn k\yka k Jka + k k\yka ' The sum over k^2 is bounded by A4.2.3) if A4-2.4) nka + k . n since BAa)k/kl=Be~k. This is equivalent to y>11-l/(n-l)ankHk-l)+l/a (k^2), A4.2.6) and is certainly satisfied if y>f1n1n2+"-1. A4.2.7) In particular, we may take y>yn= n^-'Mog n , nn = (log n)~l . A4.2.8) Let H1 be the event {Xi>y/n}. Then in view of the discussion, P{SH>y) = nP{Hl)P{Sn>y\Hl) + Br,Hny-'. A4.2.9) We now investigate the expression , A4.2.10)
14.2. PROBABILITY OF VERY LARGE DEVIATIONS: ELEMENTARY RESULT 257 which if L = yjnk log n may be written A4.2.11) = P[Sn>y, P\Sn>y, an- L > L hX A4.2.12) For the first of these two expressions we have the inequalities < L)P[Sn>y an' < L)P[Sn>y A +o{l))P{X1 >y + Lan>\ H,), A4.2.13) 2 + ...+JCn an' < L ^A +o(l)) PiX^y- La^l HJ , A4.2.14) by virtue of the central limit theorem. Further, under A4.2.7), :1>y±L<mi\H1) = P{X1>y±Lani)/P{X1>y/n) = A4.2.15) because of A4.1.5), A4.2.8) and A4.2.11). We now examine the second term in A4.2.12). The event > L an2 A4.2.16) is independent of H1, and implies that for some i, X(> yna/log n. Arguing as before, and using A4.2.8) and A4.2.7), we have > L ) < nP( X, an- \ogn A4.2.18)
We now use (14.2.14), (14.2.15) and (14.2.18) to rewrite (14.2.9) in the form

$$P(S_n > y) = nP(X_1 > y)(1 + o(1)). \tag{14.2.19}$$

Thus, for $y \geq y_n$,

$$P(S_n > y) = A_a n y^{-a}(1 + o(1)), \tag{14.2.20}$$

where $o(1)$ is uniform in $y$ as $n \to \infty$.

This simple result has an immediate probabilistic significance: it asserts that if $S_n$ takes a very large value, this is most likely to be because exactly one of the summands is very large; the probability of $S_n$ being large as a result of an accumulation of moderately large summands is comparatively small.

Since the underlying distribution is symmetric, we also have

$$P(S_n < -y) = nP(X_1 < -y)(1 + o(1)) = A_a n y^{-a}(1 + o(1)), \tag{14.2.22}$$

where $o(1)$ is uniform in $y \geq y_n$.

§ 3. Radial extensions

We now set

$$x_n = n^{\frac{1}{2} + a^{-1} + \varepsilon}, \tag{14.3.1}$$

where $\varepsilon < 10^{-4}$ is a small positive constant. Because of the previous results we have, for $x \geq x_n$,

$$P(Z_n > x) \sim nP(X_1 > x\sigma n^{1/2}). \tag{14.3.2}$$

Since the range $x \leq 1$ is dealt with by the central limit theorem, it is sufficient now to examine the range

$$1 \leq x \leq x_n. \tag{14.3.3}$$

We shall do this with the help of the analytic method of Chapter 9. Consider the characteristic function

$$\phi(t) = \int_{-\infty}^{\infty} e^{itx} g(x)\,dx, \tag{14.3.4}$$
14.3. RADIAL EXTENSIONS 259 which by A4.1.5) and A4.1.6) is differentiable for all t at least [a—1) times. We now introduce the concept of a radial extension of cf)(t). A function y(t) will be called a radial extension in r^O if is it defined in some neigh- neighbourhood [ —10, to~\ of t = 0 and coincides with </>(r) on [0, t0]. A radial extension in r^O is similarly defined. For example, the characteristic function </>(t)=e~l'l(|t| + l) corresponding to the probability density g(x) = 2/n(l + x2J has radial extensions y(t) = e~t(t+l) in t^O and y(t) = e'( — t+l) in r^O. Both are entire, neither is even. We now prove that, under the conditions here assumed, </>(r) has a radial extension y+ (t) in r^O which is everywhere differentiable at least Da+ 2) times, and a similar radial extension y_(t) in t^O. From A4.1.5) and A4.1.6) it is immediately clear that it is sufficient to prove that, for any r^3, the expression — d?+ — d? A4.3.5) C •' — oo C has radial extensions which are differentiable any number of times. It is clearly sufficient instead to consider 3 o)-tc since •3 i is an entire function. For ^3 we can expand ?~r as a power series in Thus oo K / c \k JC). A4.3.7) The question of the differentiable of the radial extensions of A4.3.5) there- therefore reduces to that of the radial extensions of A4.3.8)
260 INTEGRAL THEOREMS HOLDING ON THE WHOLE LINE Chap. 14 for r^k^K, for if the continuous function p(?) = O(?~K~1), its Fourier transform is differentiable at least (K— 1) times. In the integral A4.3.8) the integrand is rational, non-zero on the real axis, has poles ±i and is of order O(?~k) at infinity. Moving the contour of integration upwards for r^O and downwards for r^O we obtain radial extensions in the form of entire functions (as in the example). Hence A4.3.5) has radial extensions which are infinitely differentiable, and </> (t) has radial extensions which are differentiable at least Da + 2) times. § 4. Investigation of the fundamental integral Because </>(r) is real, the probability density pn{x) of Zn is given by = —Re (j)(t)nQ-°nlAitx&t. A4.4.1) n Jo Moreover, since there is a bounded continuous probability density g(x), we have (cf. (9.3.7)) pn(x) = — Re ^(t)"e-ff"%ftxdt + Be-81". A4.4.2) 7T J We note that for t ^ 0, </> (t) = y (t), and that y (t) is differentiable in the neigh- neighbourhood at least b>6a — 3 times. From A4.4.2) we have ( ° pn{x) = — Re ( y(tfe-anVtitxdt + Be~ein, A4.4.3) 7C J o where y(t) is differentiable b times in [0, s0]. In In view of this we find that, for n~* log n y{t) ^l-in-^lognJ, y(r)"=J5exp(-g2(lognJ), and from A4.4.3) that
14.4. INVESTIGATION OF THE FUNDAMENTAL INTEGRAL 261 nk .-n-l logn pn{x) = — Re y{tfe-an1Aitxdt+B exp[-g2(log nJ] . n Jo A4.4.4) If for r^n~* log n we write K(t) = \ogy(t), then ni rn-*\ogn Pn (*) = — Re exp [nK (t) - an* itx] dt + 7C Jo + B exp [- g2 (log nJ]. A4.4.5) Note that y (t) is not necessarily even. Since it is b times differentiable, 4 t" + Btb A4.4.6) in |t|<e0. If |t|<n"*logn, 6*M*+e, A4.4.7) and since we have, writing yo(r)= 1-i-r2 + that K{t) = log 7o(t) + Bsn-*>+?, A4.4.9) since in our interval Y<yo(t)<|. For |t| ^n-i log n, 6-1 ^ log yo{t) = -it2 X 0, — + Bt», A4.4.10) q = 3 y • where ^=0. A4.4.11) Moreover, Bntb= Bsn-ib+1+\ A4.4.12) and substituting into A4.4.5) we find that
262 INTEGRAL THEOREMS HOLDING ON THE WHOLE LINE Chap. 14 n pn{x) = — Re {¦H exp n -^ + 1 0,-7 - We write 6-1 j-q and examine the entire function exp[nK3{t)~\ . For \t\<n~i log n, and if we work to accuracy we can ignore [nK3(t)~\b, and write where Substituting in A4.4.5) we obtain • n ~ V2 log n 2 pn{x) = —Re 711 Jo Substituting ? = tn* , 1 r log n pn(x) = — Re 7T Jo and since logn o for r^Cl7 = j3 exp(-i(log nf) A4.4.13) A4.4.14) A4.4.15) A4.4.16) A4.4.17) A4.4.18) A4.4.19) +1 + s. A4.4.20) A4.4.21) A4.4.22)
14.5. INVESTIGATION OF THE AUXILIARY INTEGRALS 263 • Pn (x) — -Kc I c I It i\.^,[cyi , n)) e dx -I- BF n n Jo A4.4.24) _ _ , 2 1 f°° = B7r) *e ix H—Re e~ii2KJ^n~i,n)e~^xd^ + n Jo + J5?n-^+1+?. A4.4.25) We therefore have to investigate TOO Re I e"« f e-'{xd^. A4.4.26) Jo § 5. Investigation of the auxiliary integrals In this section we investigate more thoroughly the expression E{x, r) = Re f°° e-x2?e-& . Jo If r is even, then is expressed in terms of e ix2H^0)(x), where H{r0) is the rth Hermite poly- polynomial. We also remark that the assumption that g (x) be even is not essen- essential, though it simplifies the calculations. If r is odd, then E(x, r) does not fall off so sharply as x-+ oo (for a discussion of this function see [150]). For even r we have, for r bounded, E(x, r) = BxrQ~ix2, A4.5.1) while for r odd, E(x, r) = (- lp+ 1}r! x~r~ l + Bx-r~2 . A4.5.2) Let us now turn to the integral A4.4.25). The terms involving even powers of ?n~^ are bounded by B(xn~*)re~ix2 and for x^logn are therefore negligible compared with e"**2 and for x >log n smaller than the remainder term in A4.2.25). Then A4.2.25) will have the form where rx is a rational function of x.
264 INTEGRAL THEOREMS HOLDING ON THE WHOLE LINE Chap. 14 Now let A4.3.1). Then, from the last equation, lb3 A4.5.3) where r2 (x, n*) is a rational function, since rx (x, n*) can be represented, up to accuracy Bsn~Jjrb+i+\ in the form of a power series in x ~ 1, beginning with x~k (k^-2), which may be integrated term by term to give r2(x, n*). For [ pn{x)dx~nP{Xl>yon*)~nAJoay"n*a. A4.5.4) Jy In particular, taking y = xn, nAJaaxania = Bsn~ib + 3+s, A4.5.5) and so, combining A4.5.3) and A4.5.4), TOO pn{x)dx = P{Zn>x) = Jx ~ib + 3+s, A4.5.6) for x^xn. This formula is also true moreover for x^x^/r, and conse- consequently, for such values of x, r2(x, n*) - nAjaaxania . A4.5.7) It is not difficult to see that A4.5.7) is also true for x > n , so that for x > 1, P(Zn>x)^Bn)~i \ e"*" du + r(x, n±), A4.5.8) .' X where r is a rational function. We notice that the coefficients of the rational function are expressed in terms of a finite number of the derivatives at zero of the radial extension y(t) of (f)(t). These derivatives are called the pseudomoments of Xj. If </>(r) is differentiable h times at 0 (i.e. if h^ a— 1) then the first {h—1) pseudomoments differ from the corresponding moments only by powers
14.5. INVESTIGATION OF THE AUXILIARY INTEGRALS 265 of i. The pseudomoments play the role of the linear functionals at, bi described in Chapter 2. We remark that similar conclusions may be drawn when the densities have asymptotic expansions as x->oo; P(Xl>X)= ^ g(u)du = j a X and similarly for x-> — oo, where G is of bounded variation (but not neces- necessarily monotonic). § 6. An example Suppose that so that a = 1 and, for t ^ 0, (/)(t) = e-t(t+l). A4.6.2) Then log <?(*)= -t + log(l + 0, so that, in 0<t<l, K(t) = -t + t-|t2+it3-it4 + ... A4.6.3) = -±t2 + K3(t), where Thus K4@ will be a truncation of Now •¦QO Re \ 1 o
266 INTEGRAL THEOREMS HOLDING ON THE WHOLE LINE Chap. 14 so that () B)-*-**22 B P{Zn>x) ~ Btt)-* [ e-*dn+ 3 • A4.6.4) 3 3' A4.6.5) which agrees with A4.6.4). Notice that in this case even the third moment fails to exist, and the pseudomoment is needed.
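The example of this section rests on the fact that the density $g(x) = 2/\pi(1+x^2)^2$ has characteristic function $\phi(t) = e^{-|t|}(1+|t|)$, whose radial extension in $t \geq 0$ is $\gamma(t) = e^{-t}(1+t)$. The following is a minimal numerical sketch of that fact, not part of the argument: it integrates $\cos(tx)g(x)$ by Simpson's rule on a truncated range and compares the result with the closed form. The function names, the cutoff and the step count are our own illustrative choices.

```python
import math


def g(x):
    """Density g(x) = 2 / (pi * (1 + x^2)^2) from the example."""
    return 2.0 / (math.pi * (1.0 + x * x) ** 2)


def phi_closed(t):
    """Closed form of the characteristic function, e^{-|t|} (1 + |t|)."""
    return math.exp(-abs(t)) * (1.0 + abs(t))


def gamma_plus(t):
    """Radial extension in t >= 0: an entire function that is not even."""
    return math.exp(-t) * (1.0 + t)


def phi_numeric(t, cutoff=200.0, steps=40000):
    """Simpson's rule for 2 * int_0^cutoff cos(t x) g(x) dx (g is even).

    The neglected tail is at most 2 * P(X > cutoff), of order cutoff^{-3}.
    `steps` must be even.
    """
    h = cutoff / steps
    total = g(0.0) + math.cos(t * cutoff) * g(cutoff)
    for i in range(1, steps):
        x = i * h
        weight = 4.0 if i % 2 else 2.0
        total += weight * math.cos(t * x) * g(x)
    return 2.0 * total * h / 3.0


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, phi_numeric(t), phi_closed(t))
    # On the negative axis the radial extension leaves the characteristic function:
    print(gamma_plus(-1.0), phi_closed(-1.0))
```

The last line of output illustrates why the extension is called radial: $\gamma$ agrees with $\phi$ only on $t \geq 0$, and continues analytically to values that $\phi$ does not take for $t < 0$.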
Chapter 15

APPROXIMATION OF DISTRIBUTIONS OF SUMS OF INDEPENDENT COMPONENTS BY INFINITELY DIVISIBLE DISTRIBUTIONS

§ 1. Statement of the problem

We here consider the general problem of the limiting behaviour of the distribution function $F_n(x)$ of the sum

$$S_n = X_1 + X_2 + \ldots + X_n \tag{15.1.1}$$

of independent random variables with the same distribution $F$, when no further assumptions are made about $F$. It follows from § 2.6 that it is not in general possible to choose normalising constants $A_n$, $B_n$ such that the distribution of $(S_n - A_n)/B_n$ converges to any non-degenerate distribution. Even more is true, for there are distributions for which no subsequence $(S_{n_k} - A_{n_k})/B_{n_k}$ converges in distribution. One such example is the (infinitely divisible) distribution with characteristic function [48]

$$f(t) = \exp\Big\{\int (\cos tx - 1)\,d\big(\ldots \log|x| \ldots\big) + \int (\cos tx - 1)\,d\big(\ldots \log x \ldots\big)\Big\}.$$

Although the sequence $F_n(x)$ in general diverges, we can ask the question: does there exist a sequence $D_n(x)$ of infinitely divisible distributions such that, in some sense, $F_n$ and $D_n$ are close for large $n$? The answer is affirmative, and is given by the following theorem.

Theorem 15.1.1. There exists an absolute constant $C$ such that, for any distribution $F$ and any $n$, there exists an infinitely divisible distribution $D_n$ with

$$\sup_x |D_n(x) - F_n(x)| \leq Cn^{-1/3}. \tag{15.1.2}$$
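To see why a rate of order $n^{-1/3}$ is natural, it may help to look at one concrete family. The sketch below is an illustration only and plays no part in the proof: it takes $F$ to be the two-point law with $P(X=1)=p$, $P(X=0)=1-p$ (our choice, for convenience), so that $F_n$ is binomial, and computes the uniform distance from $F_n$ to two infinitely divisible candidates, the Poisson law with mean $np$ and the normal law with mean $np$ and variance $np(1-p)$. All function names are hypothetical.

```python
import math


def binom_pmf(n, p, k):
    """P(S_n = k) for S_n binomial(n, p), computed through logarithms."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def poisson_pmf(lam, k):
    """P(N = k) for N Poisson with mean lam."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def normal_cdf(x, mean, var):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def uniform_distances(n, p):
    """Return (sup|F_n - Poisson(np)|, sup|F_n - N(np, np(1-p))|), 0 < p < 1.

    Both suprema are attained at the integers 0..n: the first compares two
    step functions, the second a step function with a continuous one, so the
    left and right limits at each jump are both examined.
    """
    mean, var = n * p, n * p * (1.0 - p)
    b_cdf = q_cdf = 0.0
    d_pois = d_norm = 0.0
    for k in range(n + 1):
        b_left = b_cdf                      # P(S_n < k)
        b_cdf += binom_pmf(n, p, k)         # P(S_n <= k)
        q_cdf += poisson_pmf(mean, k)
        d_pois = max(d_pois, abs(b_cdf - q_cdf))
        g_k = normal_cdf(k, mean, var)
        d_norm = max(d_norm, abs(b_cdf - g_k), abs(b_left - g_k))
    return d_pois, d_norm


if __name__ == "__main__":
    n = 1000
    for p in (0.5, n ** (-1.0 / 3.0), 0.001):
        d_pois, d_norm = uniform_distances(n, p)
        print(f"p={p:.4f}  Poisson: {d_pois:.4f}  normal: {d_norm:.4f}")
```

For this family the Poisson candidate is good when $p$ is small and the normal candidate when $np(1-p)$ is large; the two regimes meet near $p = n^{-1/3}$, which is consistent with the exponent in (15.1.2).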
This chapter is devoted to the proof of this theorem, which is completed in § 4; §§ 2 and 3 contain the auxiliary propositions needed for the proof.

If $F$ and $G$ are distribution functions, we write

$$|F - G| = \sup_x |F(x) - G(x)| \tag{15.1.3}$$

for the distance used to define strong convergence in § 1.3. Then Theorem 15.1.1 is just the assertion that, for all $F$,

$$\inf_D |F_n - D| \leq Cn^{-1/3}, \tag{15.1.4}$$

or, since $C$ does not depend on $F$,

$$\sup_F \inf_D |F_n - D| \leq Cn^{-1/3}. \tag{15.1.5}$$

The left-hand side of (15.1.5) may be regarded as the greatest distance (in the sense of (15.1.3)) of the set of $n$-fold convolutions $F_n$ from the set of infinitely divisible distributions.

Throughout this chapter, we shall write

$$E(x) = \begin{cases} 0 & (x \leq 0), \\ 1 & (x > 0); \end{cases}$$

$C_1, C_2, \ldots$ will denote absolute constants.

§ 2. Concentration functions

The concentration function of a random variable $X$ is the function

$$Q_X(l) = Q(l) = \sup_x P(x \leq X \leq x + l).$$

As a function of $l > 0$, this is non-decreasing and right-continuous. If $X$, $Y$ are independent random variables, the concentration function of $X + Y$ is not greater than that of either of them. In fact, for all $u$,
$$P(x \leq X + Y \leq x + l \mid Y = u) = P(x - u \leq X \leq x - u + l) \leq Q_X(l),$$

and taking expectations,

$$Q_{X+Y}(l) \leq Q_X(l).$$

We shall however need a more precise estimate for the decrease in the concentration function of a sum of independent random variables. Write

$$S_n = X_1 + X_2 + \ldots + X_n,$$

where the $X_i$ are independent,

$$Q_i(l) = Q_{X_i}(l), \qquad Q(l) = Q_{S_n}(l), \qquad s = \sum_{i=1}^{n} \{1 - Q_i(l)\}.$$

Theorem 15.2.1. There exists an absolute constant $C_1$ such that, for all $L \geq l$,

$$Q(L) \leq C_1 L / (l s^{1/2}). \tag{15.2.1}$$

The proof of this theorem requires a number of auxiliary results.

Lemma 15.2.1. Let $\mathfrak{A}$ be a set of $n$ elements, and $K$ a class of subsets of $\mathfrak{A}$ such that no member of $K$ is contained in any other member. Then the number $\nu$ of members of $K$ does not exceed $\binom{n}{[\frac{1}{2}n]}$.

Proof. Among the classes $K$ satisfying the conditions of the lemma, we can choose one, $K_0$, with the greatest possible number of members. Assume for the sake of argument that $n$ is even ($n = 2m$); the argument for odd $n$ is similar. We show that all the subsets in $K_0$ have the same size $m$.

Suppose if possible that $K_0$ contains $r \geq 1$ sets of size $k \geq m + 1 > \frac{1}{2}(n+1)$, denoted by $A_1, A_2, \ldots, A_r$, and none of size $> k$. Each $A_i$ has $k$ subsets $A_{i1}, A_{i2}, \ldots, A_{ik}$ of size $(k-1)$, but the collection $\{A_{ij};\ i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, k\}$ may contain a given set more than once; enumerate the distinct members of the collection as $B_1, B_2, \ldots, B_s$. Each $B_a$ can be a subset of at most $(n - k + 1)$ of the $A_i$, and so can appear at most $(n - k + 1)$ times in the collection $\{A_{ij}\}$. Thus

$$s(n - k + 1) \geq rk,$$

and since $k > \frac{1}{2}(n+1)$, this implies that $s > r$.
Thus the class $K'$ obtained from $K_0$ by replacing $A_1, \ldots, A_r$ by $B_1, \ldots, B_s$ is larger than $K_0$ and satisfies the conditions of the lemma, and this contradicts the assumption that $K_0$ is maximal. The contradiction shows that the members of $K_0$ each have size $\leq m$. An exactly similar argument shows that the members of $K_0$ all have size $\geq m$. Thus the number of members of $K_0$ cannot exceed the number of subsets of $\mathfrak{A}$ of size $m$, which is $\binom{n}{m}$. •

Lemma 15.2.2. If the random variables $X_i$ in Theorem 15.2.1 have distributions given by

$$P(X_i = a_i) = P(X_i = -a_i) = \tfrac{1}{2},$$

where $a_i \geq l$, then

$$Q(2l - 0) \leq \binom{n}{[\frac{1}{2}n]} 2^{-n}. \tag{15.2.2}$$

Proof. The probability of $\{x < S_n < x + 2l\}$ is equal to $2^{-n}$ times the number of sums of the form

$$\sum_{k=1}^{n} \varepsilon_k a_k \qquad (\varepsilon_k = \pm 1)$$

falling in the interval $(x, x + 2l)$. For any such sum, consider the subset of $\{1, 2, \ldots, n\}$ consisting of those $k$ for which $\varepsilon_k = 1$. Then this collection of subsets satisfies the conditions of Lemma 15.2.1, for if the subset corresponding to $\sum \varepsilon_k a_k$ is properly contained in that corresponding to $\sum \varepsilon_k' a_k$, we clearly have

$$\sum \varepsilon_k' a_k - \sum \varepsilon_k a_k \geq 2l,$$

which gives a contradiction, since both sums lie in an open interval of length $2l$. Thus Lemma 15.2.1 shows that there can be at most $\binom{n}{[\frac{1}{2}n]}$ sums of the form $\sum \varepsilon_k a_k$ lying in any interval $(x, x + 2l)$. •

Lemma 15.2.3. Under the conditions of the previous lemma,

$$Q(L) \leq C_2 L / (l n^{1/2}). \tag{15.2.3}$$

Proof. By Lemma 15.2.2,
15.2. CONCENTRATION FUNCTIONS 271 and by Stirling's formula, Hence X IL/l] Corollary 15.2.1. If Proof. The concentration functions of Sn and Sn~Zj5? coincide. Proof of Theorem 15.2.1. First suppose that the distribution functions F,(x) of the X,- are continuous and strictly increasing, so that the inverse functions F[~1 (?) are well-defined. The variable ?? = F?(X?) has so that where <^l5 <^2, ••-, <^n are independent and uniformly distributed on @, 1). Write 1 -Qi(l) = 4e1, x[ = Ff1 (e-), x" = Fr1 A -e?), and note that so thatx-' — xj>/. We consider the random subset {/l5 ..., im} of {1, 2, ..., m} consisting of those / for which ^,-<?,- or ^> 1 — ?,-, and write, for such i, Zt if &,, -^ if ^>e...
272 DISTRIBUTIONS OF SUMS OF INDEPENDENT COMPONENTS Chap. 15 Probabilities conditional on fixed values of iu ..., im, Zn, ..., Zim will be denoted by P. It is clear that, under P, the variables Xit, ..., Xim are independent, with P(Xk = ak + xk) = P(Xk = ak-xk) = ±, where As remarked above, Q(L)^Qi(L), A5.2.4) where Qi{L) is the concentration function, under P, of m This can be estimated using Corollary 15.2.1; for L^max xk^^l, *. A5.2.5) From this and A5.2.4) we have Q{L)^EQ{L)^EQ1{L)^P{m^s)+4C2L/lsi. A5.2.6) It remains to evaluate P (m^^s), which we do by noting that m can be re- regarded as the number of successes in n trials, when the probability of success at the kth trial is Thus E(m) = t 2*i = K i= 1 n V (m) = i_! 2e,- A — 2e,-) ^ js , i= 1 so that Chebyshev's inequality gives Combining this with A5.2.4) we have Q(L) ^ 8s +4C2L//s} ^ CiLlls* . A5.2.8)
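The estimate above ultimately rests on the counting bound of Lemma 15.2.2: when every $a_k \geq l$, no open interval of length $2l$ contains more than $\binom{n}{[\frac{1}{2}n]}$ of the $2^n$ signed sums $\sum \varepsilon_k a_k$. For small $n$ this can be checked by brute force. The sketch below is such a check, under the assumption of integer weights (chosen only to avoid rounding questions); the function names are our own.

```python
import itertools
import math
import random


def max_sums_in_open_interval(a, l):
    """Largest number of sign patterns (eps_k = +-1) whose sums
    sum(eps_k * a_k) fit inside one open interval of length 2l.

    A set of points fits in such an interval exactly when
    max - min < 2l, so a sliding window over the sorted sums suffices.
    """
    sums = sorted(
        sum(e * x for e, x in zip(eps, a))
        for eps in itertools.product((-1, 1), repeat=len(a))
    )
    best, lo = 0, 0
    for hi, s in enumerate(sums):
        while s - sums[lo] >= 2 * l:
            lo += 1
        best = max(best, hi - lo + 1)
    return best


def sperner_bound(n):
    """The binomial coefficient C(n, [n/2]) of Lemma 15.2.1."""
    return math.comb(n, n // 2)


if __name__ == "__main__":
    rng = random.Random(1)
    l = 10
    for n in (4, 8, 12):
        a = [rng.randint(l, 3 * l) for _ in range(n)]   # every a_k >= l
        print(n, max_sums_in_open_interval(a, l), sperner_bound(n))
    # Equal weights a_k = l attain the bound:
    print(max_sums_in_open_interval([l] * 10, l), sperner_bound(10))
```

Dividing the count by $2^n$ gives the bound (15.2.2) on the concentration function; equal weights show that the binomial coefficient cannot be improved.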
15.3. AUXILIARY PROPOSITIONS 273 The theorem is therefore proved for variables X{ whose distribution func- functions are continuous and strictly increasing. The general case can easily be deduced as follows. Replace X{ by X- = X{ + nh where n{ are independent of each other and of the Xh having the normal distribution N@, a). The distribution functions of the X[ are continuous and strictly increasing (§ 1.2), and so <2'(L)$SC1L/Zs'*, where A5.2.9) As <r->0, Q'x^Qi and Q'^Q at points of continuity of Qt and Q which form dense sets. Since C\ is absolute, it follows that Corollary 15.2.2. If the distributions of the Xj satisfy xk ~~ Xk then Q(L)^CAL/lnx> . Proof. Under the condition stated, Q,(/)^|, so that and Q(L) A5.2.10) § 3. Auxiliary propositions Lemma 15.3.1. Ifa>0,ai>0, then 3-1 a1 A5.3.1)
274 DISTRIBUTIONS OF SUMS OF INDEPENDENT COMPONENTS Chap. 15 Proof Without loss of generality we can suppose that |(o"i/<r)—l|<i and that a>a1,x>0. Then < <T + 1 1 Lemma 15.3.2. //ttoe distribution function F(x) satisfies /or sup A5.3.2) Proof. If ? is a random variable with distribution F, then by Chebyshev's inequality, so that the sum in A5.3.2) is bounded by Lemma 15.3.3. Let Xx, X2, ..., Xn be independent and identically distri- distributed with
15.3. AUXILIARY PROPOSITIONS 275 and let H be the distribution function ari^, then sup r = - oo rh < x < (r + ...+Xn.If\Xk\^l,h*? A5.3.3) Proof If Hn (x) denotes the distribution function of then by Theorem 3.6.2, Since IXJ^/, so that and thus +X2) sup = Z sup r = - oo rhlan V2 < y < [(r + ^T Z on* rJt n C ori* ^n 1+r2 Lemma 15.3.4. For any integer n and 00 I k=0 1, A5.3.4) A5.3.5) Proof. This lemma is a strengthening of Poisson's well-known approxima- approximation to the binomial distribution [47]. Let ?l5 ?2, ..., ?„ be independent
276 DISTRIBUTIONS OF SUMS OF INDEPENDENT COMPONENTS Chap. 15 variables taking only the two values 1 (with probability p) and 0 (with probability q= 1 — p), and write so that Let r\x be a variable with a Poission distribution with parameter X, write nx(k) = p(r1?=k) = ~e-\ and nnp = ri, nnp{k) = n{k). Then the left-hand side of A5.3.5) is equal to f \pn(k)-n(k)\, fc = O the variation distance between the distributions of r\ and (. We recall that = np(l-p), A5.3.6) and that so that np, E{n2) = {npf + np . A5.3.7) There is no loss of generality in taking p^. We write xk = k — np, and denote by ?' and Z" summation respectively over |xk| <^ and \xk\^r&. Then . (k) - n (k) | = z' ip. (k) - n (k) |+z" ip. (k) - n (k) \, and we examine separately the two sums Z' and Z". (I) Z". From A5.3.6) and A5.3.7) and Chebyshev's inequality, X"\Pn{k)-n{k)\ ^ Z"Pw(/c) + Z"/7(/c)^ A5.3.8)
15.3. AUXILIARY PROPOSITIONS 277 (II) S\ There is no loss of generality in assuming n ^4. Then where d(k) = Pn(k)/n(k). It is easy to see that d(k) = d1(k)d2(k), where d2{k)= {l-p)"-kenp . We first show that d(k) = d(k, n, p) is bounded. We have s=l k-l co s= 1 r= 1 r==1 rn k-l where Sr = ? s s=l Setting fr (x) = xr, we have i,r+l r+1 - s. = kr+1 Y l s = 0 = k r+l k-l r(s+l)lk s = 0 J s/k rkr so that In particular, „ k2 k A5.3.9) Substituting into A5.3.9) and remembering that |/c|^|n, we have
278 DISTRIBUTIONS OF SUMS OF INDEPENDENT COMPONENTS Chap. 15 1V ' ^ r(r+l),f n Moreover, log^2(/c)=n[(l-p)log(l-p) + p] A5.3.10) 7 It follows from A5.3.10) and A5.3.11) that no, m= By Taylor's theorem, and therefore 3/c — < 3. A5.3.12) Using the inequality |ex-1| ^ |x| e|x|, together with A5.3.6) and A5.3.12), we obtain § 4. Proof of theorem 15.1 In this section we conclude the proof of Theorem 15.1.1. The necessary arguments are rather complicated, and we separate the proof into several parts. (I) Preliminary construction Until part (IV) we shall assume that the distribution function F(x) of the variables Xj is continuous and strictly increasing. As in § 2, X{ = F~1 (?,), where (?,-) is a sequence of independent random variables, uniformly distributed over @, 1).
15-4. PROOF OF THEOREM 15.1.1 279 We write 3 11 , otherwise , n A* = Z A*j, i y = 0), B(x) = Pil xdA(x), A5.4.1) Clearly ) ). A5.4.2) This construction expands F as a combination of two distributions. One of them, A(x), is concentrated on the interval [x~, x+], where x~ =F~1 (jp), x+ =F~1 A — jp), of length X. Consequently ^4(x) can be examined using the results of § 3, notably Lemmas 15.3.2 and 15.3.3. The distribution B on the other hand is concentrated on the half-lines ( — oo, x~]and[x+, oo), each with probability j. For the powers BT=B*m (in this section powers of distributions are always to be understood in the sense of convolution) .we can use Corollary 15.2.2, which leads to the inequality (with X — x+-x"), QB«(X)^C4k-i, A5.4.3) where QG denotes the concentration function of the distribution G. There is no loss of generality in supposing that a = 0, since otherwise we can replace X} by X- = Xj — a; if the distribution function of Z X] can be approximated by the infinitely divisible distribution function D'(x), then that ofLXj is approximated to the same accuracy by D (x) = D' (x — na). We shall expand Fn as a sum Fn=F"={pB + (l-p)} ? () j= i V1 / and examine separately the two cases X ^ on* and X < an* .
280 DISTRIBUTIONS OF SUMS OF INDEPENDENT COMPONENTS Chap. 15 (II) The case / ^ oiv. The compound Poisson distribution fc=0 where B° = E, is infinitely divisible (§ 1.7). If k=o then Lemma 15.3.4 shows that >-"P ^C9p = C9n-±. A5.4.4) The variance of A"~k is equal to (n — k)a2 ^ no1 ^ A2, so that taking h = /l in Lemma 15.3.2 and using A5.4.3) we have \Bk*A"-k(x)-Bk(x)\ ^ r \An'k{x-z)-E(x-z)\dBk(z)^ J — 00 sup \A-k(y)-E(y)\ rX dBk(x-y) A5.4.5) Hence and splitting the sum into two parts corresponding to the ranges k^^r and /c<^n*, we have \F-F1\^C,C6n--+ Now z A5.4.6) A5.4.7)
15.4. PROOF OF THEOREM 15.1.1 281 and so that Chebyshev's inequality gives Hence A5.4.6) and A5.4.7) combine to give \F-Fil ^ C4C6n"i + 4n-*^C10n"* , and so, from A5.4.4), \F-D,\ ^ IF-FJ + I^-FJ ^ Clin~* , which is the assertion of the theorem. (Ill) The case X<on*. As the approximating infinitely divisible distribution, we use A5.4.8) . p. — np r>k. (f) Write and From Lemma 15.3.4 we find, as above, that np T = C9n~\ A5.4.9) To estimate \F — F2\ we proceed as in the derivation of A5.4.5), but using Lemma 15.3.3 instead of Lemma 15.3.2, and setting h = ani. Then *) Z sup \A"~k(y)-4>in_k)a2(y)\ r y
282 DISTRIBUTIONS OF SUMS OF INDEPENDENT COMPONENTS Chap. 15 c arv (i5Aio) Without loss of generality we can take n ^ 8; then the right-hand side of A5.4.10) does not exceed C4C7/r*(l -fn-*)"^ This is the analogue of A5.4.5) in (II), and as in (II) we deduce that |F-F2|<2*C4C7n-*+ ("W n~* . A5.4.11) It therefore remains to estimate the deviation \F2 — F3\. By Lemma 15.3.1, n — k 1 n(l-p) Denoting by Z' a sum over those values of k with n — k n(l—p and by Z" a sum over the remaining values of k, A5.4.2) gives A5.4.12) A5.4.13) Using Chebyshev's inequality as in the derivation of A5.4.7) and A5.4.8), A5.4.14) = np{\-p)lni(\-pf since we have assumed that n ^ 8.
Consequently

$$|F_2 - F_3| \leq C_5 n^{-1/3} + C_{13} n^{-1/3} \leq C_{14} n^{-1/3}, \tag{15.4.15}$$

and combining (15.4.9), (15.4.11) and (15.4.15) we obtain

$$|F - D_2| \leq |F - F_2| + |F_2 - F_3| + |D_2 - F_3| \leq C_{15} n^{-1/3}. \tag{15.4.16}$$

Theorem 15.1.1 is therefore proved for the case of continuous, strictly increasing distribution functions. We now deduce the general case by the method of § 2.

(IV) The general case. Define $X_i' = X_i + \eta_i$, where the $\eta_i$ are independent normal variables with mean 0 and variance $\delta$. The distribution function of $X_i'$ is

$$F'(x) = \int_{-\infty}^{\infty} F(x - z)\,d\Phi_\delta(z),$$

which is continuous and strictly increasing. As $\delta \to 0$, $F_n'$ converges weakly to $F_n$. By what has already been proved, there is an infinitely divisible distribution $D_n'(x; \delta)$ such that

$$\sup_x |F_n'(x) - D_n'(x; \delta)| \leq Cn^{-1/3}, \tag{15.4.17}$$

where $C = \max(C_{11}, C_{15})$. By Helly's theorem, we can extract from the distributions $D_n'(x; \delta)$ a convergent subsequence $D_n'(x; \delta_j)$ with $\delta_j \to 0$, whose limit $D_n(x)$, being the limit of a sequence of infinitely divisible distributions, is itself infinitely divisible. Setting $\delta = \delta_j$ in (15.4.17) and passing to the limit, we have

$$|F_n(x) - D_n(x)| \leq Cn^{-1/3} \tag{15.4.18}$$

at all points of continuity of $F_n - D_n$. But this function is left-continuous and its points of continuity are everywhere dense, so that (15.4.18) holds for all $x$. •
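Both cases of the proof pass from a binomial mixture to a compound Poisson law by means of Lemma 15.3.4, which bounds $\sum_k |P_n(k) - \Pi(k)|$ by a constant multiple of $p$. The following sketch computes that sum directly, so that the linear dependence on $p$ can be observed; it is a numerical companion to the lemma under our own choice of test values, not a substitute for its proof, and the function name is hypothetical.

```python
import math


def binomial_poisson_l1(n, p):
    """Sum over k >= 0 of |P_n(k) - Pi(k)|, where P_n is binomial(n, p)
    and Pi is Poisson with mean np (0 < p < 1).

    For k > n the binomial term vanishes, so the remaining contribution
    is the Poisson mass above n, added at the end.
    """
    lam = n * p
    total = 0.0
    poisson_mass = 0.0
    for k in range(n + 1):
        b = math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1)
                     + k * math.log(p) + (n - k) * math.log1p(-p))
        q = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += abs(b - q)
        poisson_mass += q
    return total + max(0.0, 1.0 - poisson_mass)


if __name__ == "__main__":
    n = 1000
    for p in (0.2, 0.1, 0.05, 0.01):
        d = binomial_poisson_l1(n, p)
        print(f"p={p:<5}  sum={d:.5f}  sum/p={d / p:.3f}")
```

With $p = n^{-1/3}$, as in the construction of § 4, the sum is therefore of order $n^{-1/3}$, which is how (15.4.4) and (15.4.9) arise.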
Chapter 16

SOME RESULTS FROM THE THEORY OF STATIONARY PROCESSES

In this chapter an account is given of those results from the theory of stationary processes which will be required in the sequel. This chapter has much in common with Chapter 1, but here the proofs will, as a rule, be given in full, although the discussion will be rather condensed. For a more complete and detailed account we refer to chapters X and XI of [31], as well as [163].

§ 1. Definition and general properties

A random process $X_t$ ($t \in T$) is called stationary (in the strict sense) if the distribution of the random vector

$$(X_{t_1+h}, X_{t_2+h}, \ldots, X_{t_s+h})$$

does not depend on $h$, so long as the values $t_i + h$ belong to $T$ (a subset of the real line). The random process is called stationary (in the wide sense) if $E(X_t^2) < \infty$ for all $t$, and if $E(X_s)$ and $E(X_s X_{s+t})$ do not depend on $s$. Without loss of generality we can (and will) take $E(X_s) = 0$ (otherwise we can replace $X_t$ by $X_t - E(X_t)$). If no confusion can be caused, the qualifying parentheses (in the strict (or wide) sense) will be omitted.

The parameter set $T$ will be taken to be either the whole line or the set of integers (positive or negative), except when it is specifically stated that only non-negative values of $t$ are considered. We distinguish the two cases as those of continuous time and discrete time; stationary processes with a discrete time parameter are often called stationary sequences. It is notationally convenient to write continuous time processes as $X(t)$ or $X(s)$, and discrete time processes as $X_n$ or $X_j$; when both are considered together we use the notation $X_t$ or $X_s$.
In the case of continuous time we assume that the process is stochastically continuous in the sense that, for all $\varepsilon > 0$,

$$\lim_{s \to 0} P\{|X(t+s) - X(t)| > \varepsilon\} = 0. \tag{16.1.1}$$

When dealing with wide-sense stationary processes, however, we shall assume the stronger condition

$$\lim_{s \to 0} E|X(t+s) - X(t)|^2 = 0. \tag{16.1.2}$$

These conditions are very weak; they are fulfilled in all cases of interest.

A random process $X_t$ is a function $X_t(\omega)$ of two variables $t \in T$ and $\omega \in \Omega$, where $(\Omega, \mathfrak{F}, P)$ is the underlying probability space (see § 1.1). We shall assume that, as a function defined on $T \times \Omega$, $X_t(\omega)$ is measurable with respect to the product $\sigma$-algebra $\mathfrak{F}_T \times \mathfrak{F}$, where $\mathfrak{F}_T$ is the $\sigma$-algebra of Lebesgue sets in $T$. Every stochastically continuous process may be modified to satisfy this condition without altering the finite-dimensional distributions [31].

Example 1. The sequence of independent, identically distributed random variables

$$\ldots, X_{-1}, X_0, X_1, X_2, \ldots$$

forms a stationary process in the strict sense.

Example 2. Let

$$\ldots, \xi_{-1}, \xi_0, \xi_1, \xi_2, \ldots$$

be a sequence of independent, identically distributed random variables with $E(\xi_j) = 0$, $E(\xi_j^2) = \sigma^2 < \infty$. If the sequence $a_j$ is such that $\sum_{j=-\infty}^{\infty} a_j^2 < \infty$, then the equations

$$X_j = \sum_{k=-\infty}^{\infty} a_k \xi_{k+j} = \sum_{k=-\infty}^{\infty} a_{k-j} \xi_k$$

determine a stationary sequence. If it is assumed that the $\xi_j$ are not independent, but merely orthogonal in the sense that $E(\xi_i \xi_j) = 0$ for $i \neq j$, then the sequence $X_j$ is stationary only in the wide sense.

The verification of these assertions is left to the reader.
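For a finitely supported sequence $a_0, \ldots, a_{m-1}$ the wide-sense part of Example 2 reduces to the identity $E(X_j X_{j+t}) = \sigma^2 \sum_k a_k a_{k+t}$, which does not involve $j$. The sketch below evaluates this formula and compares it with a sample covariance from one simulated path. It assumes Gaussian $\xi_j$ with $\sigma^2 = 1$ and a short coefficient list purely for illustration; the function names are ours.

```python
import random


def ma_autocov(a, sigma2, lag):
    """sigma^2 * sum_k a_k a_{k+lag} for finitely many coefficients a_0..a_{m-1}."""
    lag = abs(lag)
    return sigma2 * sum(a[k] * a[k + lag] for k in range(len(a) - lag))


def simulate_ma(a, n, rng):
    """X_j = sum_k a_k xi_{k+j}, j = 0..n-1, with i.i.d. N(0, 1) innovations."""
    m = len(a)
    xi = [rng.gauss(0.0, 1.0) for _ in range(n + m - 1)]
    return [sum(a[k] * xi[j + k] for k in range(m)) for j in range(n)]


def sample_autocov(x, lag):
    """Average of X_j X_{j+lag} along the path (the mean is known to be 0)."""
    n = len(x) - lag
    return sum(x[j] * x[j + lag] for j in range(n)) / n


if __name__ == "__main__":
    a = [1.0, 0.5, 0.25]
    path = simulate_ma(a, 50000, random.Random(0))
    for lag in range(4):
        print(lag, ma_autocov(a, 1.0, lag), round(sample_autocov(path, lag), 3))
```

The covariance vanishes for lags of at least $m$, so a finite moving average is an $m$-dependent stationary sequence; the general case of Example 2 follows by letting the number of non-zero coefficients grow.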
286 THEORY OF STATIONARY PROCESSES: SOME RESULTS Chap. 16 § 2. Stationary processes and the associated measure-preserving transformations With each random process Xt (— oo < t< go) we can associate cr-algebras aR?(*) = a«S (- oo < a < fc < oo), where %% is the a-algebra generated by the events of the form A = {(Xtl,Xt2,...,Xts)eA} A6.2.1) for a ^ t1 < t2 < ¦ ¦ ¦ < ts ^ b and s-dimensional Borel sets A. This o--algebra can be regarded as the closure of the set of events A6.2.1) with respect to the metric B) = P{(A-B)u(B-A)}. A6.2.2) A special role will be played by the o--algebras arc.. A*1 < °o), k— 00 5 clearly, for s>0, "•— oo — *vvoo ' Every stationary process defines a family of mappings T% of Wx into itself given by the following rule. For events A of the form A6.2.1), TA = T{(Xtl, ..., XJeA} = {(Xtl+t, ..., Xts+t)eA\ . The set 21 of events of the form A6.2.1) is dense in 99?^ in the metric A6.2.2). We can therefore extend V from 21 to ^Six as its unique continuous ex- extension. The family of mappings V has the following properties: A) Tx is well-defined up to events of zero probability. B) P(T'A) = P(A), AeWi^. C) Up to events of probability zero, r(f]Ak\= QT(Ak),
16.2. STATIONARY PROCESSES 287 V(A) =T(A), ViT-'A) = T~t(TtA) = A. D) Ttl+t2 = Ttl Tt2 i.e. T1+t2(A)=Ttl{Tt2{A)). Transformations satisfying conditions (l)-C) are called measure-preserv- measure-preserving transformations of WlK. Property D) indicates that the transforma- transformations V form a group (a semigroup if only non-negative values of the time parameter are considered). Thus to any process Xt, stationary in the strict sense, there corresponds a group (V) of measure-preserving transformations on the er-algebra SJd^. In the discrete time case, (T{) is the cyclic group of powers of T= T1. Conversely, every measure-preserving group of transformations, con- continuous in the sense of A6.1.1), on a er-algebra 9ft <= ft, generates a family of stationary processes Xt, and every stationary process is so generated. To prove this, associate with Tl a transformation T[ on the class of random variables measurable with respect to 9ft, defined by the conditions : A1). T[ is well-defined up to differences on sets of zero probability. BJ If x(A) is the indicator function of the event AeW, then CJ The transformations T[ are linear; for random variables measurable with respect to 9Ji, and constants a, /?, DJ The transformations T[ are continuous; if ?„-><!; with probability 1, then 77(<y-*77(?) with probability 1. Since each random variable ?, measurable with respect to 9ft, is the limit as n-»oo with probability 1, of the random variables " k (k s ^ k+l\ there exists one and only one transformation T[ (up to events of probabili- probability 0), satisfying the conditions A1)-D1). If now ? is any random variable measurable with respect to 9ft, the ran- random process defined by ?t=T[? will be stationary in the strict sense. Moreover, if V is the group,of transformations defined by the stationary * This assumes the process defined for both positive and negative values oft.
process $X_t$, then it is easy to see that
$$T_1^tX_s=X_{s+t},$$
and in particular, $X_t=T_1^tX_0$.

We note two properties of $T_1^t$ which follow from (3) and ($2_1$)-($4_1$):
$$T_1^t(\xi\eta)=T_1^t(\xi)\,T_1^t(\eta)\tag{16.2.3}$$
and
$$E(T_1^t\xi)=E(\xi).\tag{16.2.4}$$

It is always possible to work, not with the transformation $T^t$ of $\mathfrak{M}_{-\infty}^{\infty}$ into itself, but with a one-to-one point transformation of $\Omega$ itself. In fact, by Kolmogorov's extension theorem for measures on product spaces, we can always take $\Omega$ to be the set of finite real-valued functions $\omega_t$ defined on $T$, with $X_t(\omega)=\omega_t$. We define transformations $T^\tau$ of $\Omega$ into itself by the equation
$$(T^\tau\omega)_t=\omega_{t+\tau}.$$
The stationarity implies that $T^\tau$ preserves the measure of all sets in $\mathfrak{A}$, and thus of all in $\mathfrak{M}_{-\infty}^{\infty}$. The transformations $T_1^t$ are given by
$$T_1^t\xi(\omega)=\xi(T^t\omega).$$

An event $A\in\mathfrak{M}_{-\infty}^{\infty}$ is called invariant if, for all $t$, $T^tA=A$ (mod 0), i.e. $\rho(T^tA,A)=0$. Clearly all events with probability 0 or 1 are invariant. If the group of transformations $T^t$ has no other invariant events it is said to be metrically transitive. From a probabilistic point of view, the absence of metric transitivity implies a dependence between the distant past ($\mathfrak{M}_{-\infty}$) and the future ($\mathfrak{M}_{t}^{\infty}$).

§ 3. Hilbert spaces associated with a stationary process

On a given probability space $(\Omega,\mathfrak{F},P)$, consider the collection of all complex random variables $\xi$ with $E|\xi|^2<\infty$. It is easy to check (cf. [2]) that
$$(\xi,\eta)=E(\xi\bar\eta)$$
has all the properties of a scalar product, endowed with which the collection is a Hilbert space $L^2(\Omega)$. Let $X_t$ be a stationary process (in the strict sense) defined on this probability space. Then the $\sigma$-algebra $\mathfrak{M}_a^b$ determines the subspace $H_a^b(X)$ consisting of those $\xi\in L^2(\Omega)$ which are measurable with respect to $\mathfrak{M}_a^b$. With the obvious notation, for $s>0$,
$$H_{-\infty}^{t}\subset H_{-\infty}^{t+s}\subset H_{-\infty}^{\infty}.$$

The restrictions $U^t$ of $T_1^t$ to $L^2(\Omega)$ form a group of unitary operators (if only non-negative values of $t$ are considered, a semigroup of isometric operators). In the discrete time case, this is the cyclic group of powers of the restriction $U$ of $T_1$. In fact, because of (16.2.3) and (16.2.4),
$$(U^t\xi,U^t\eta)=E(U^t\xi\,\overline{U^t\eta})=E\,U^t(\xi\bar\eta)=E(\xi\bar\eta)=(\xi,\eta).$$

We remark that the space $H_{-\infty}^{\infty}$ is separable. In the discrete time case, the set of indicators of events of the form (16.2.1), with $\mathcal{A}$ an $s$-dimensional rectangle with rational vertices, forms a countable set which is contained in no proper closed subspace of $H_{-\infty}^{\infty}$. In the continuous time case, we take the indicators of (16.2.1) with $\mathcal{A}$ as before and the $t_j$ rational.

If the time parameter is continuous, the group $(U^t)$ is continuous in the sense that, for all $\xi$, $\eta$, the scalar function $(U^t\xi,\eta)$ is continuous in $t$. To prove this, it is sufficient to show that, for all $\xi\in H_{-\infty}^{\infty}$,
$$\lim_{s\to0}\|U^{t+s}\xi-U^t\xi\|^2=\lim_{s\to0}E|U^{t+s}\xi-U^t\xi|^2=0,\tag{16.3.1}$$
and it is only necessary to prove this for a set of elements $\xi$ dense in $H_{-\infty}^{\infty}$. Because of (16.1.1) the indicators $\xi$ of events of the form (16.2.1) satisfy (16.3.1).

If the process $X_t$ is stationary only in the wide sense, we can define as before the spaces $H_s^t(X)$, but these are not very helpful because we cannot define the measure-preserving transformation $U^t$. Instead we define the smaller subspace $L_s^t(X)$, the closed subspace generated by the variables $X_u$ ($s\le u\le t$). It is clear that, for $t>0$,
$$L_{-\infty}^{s}(X)\subset L_{-\infty}^{s+t}(X).$$
We define the group $(U_1^t)$ of unitary operators on $L_{-\infty}^{\infty}$ by
$$U_1^tX_s=X_{s+t}.$$
If the process is stationary in both the wide and strict senses, then $U_1^t$ is the restriction of $U^t$ to $L_{-\infty}^{\infty}$; we shall use the symbol $U^t$ for both operators.

A family of projection operators $E_\lambda$ ($a\le\lambda\le b$) is called a partition of the operator $I$ if

(1) $E_a=0$, $E_b=I$,

(2) $E_{\lambda-0}=E_\lambda$,

(3) $E_\lambda E_\mu=E_{\min(\lambda,\mu)}$.

We state two theorems on the representation of groups of unitary operators as Fourier-Stieltjes transforms of partitions of the identity (the proofs of which can be found in [2]).

Theorem 16.3.1 (von Neumann-Wintner). To each unitary operator $U$ there corresponds a partition of the identity $E_\lambda$ ($-\pi\le\lambda\le\pi$), such that
$$U^n=\int_{-\pi}^{\pi}e^{i\lambda n}\,dE_\lambda.\tag{16.3.2}$$
The integral is to be understood as the strong limit, as $\max(\lambda_{j+1}-\lambda_j)\to0$, of the sums
$$\sum_je^{i\lambda_jn}(E_{\lambda_{j+1}}-E_{\lambda_j}).$$

Theorem 16.3.2 (Stone). Let $U^t$ be a group of unitary operators in the Hilbert space $H$ such that, for all $\xi,\eta\in H$, $(U^t\xi,\eta)$ is a continuous function of $t$. Then there exists a partition of the identity $E_\lambda$ ($-\infty<\lambda<\infty$) such that
$$U^t=\int_{-\infty}^{\infty}e^{it\lambda}\,dE_\lambda.\tag{16.3.3}$$

If we apply these theorems to the operators $U^t$ on $L_{-\infty}^{\infty}(X)$, we obtain the representation
$$U^t=\int e^{it\lambda}\,dE_\lambda,\tag{16.3.4}$$
where the integration is over $[-\pi,\pi]$ in the discrete time case and over
$(-\infty,\infty)$ in the continuous time case. We shall see in § 5 that this equation has an important probabilistic meaning.

§ 4. Autocovariance and spectral functions of stationary processes

Let $X_t$ be a stationary process in the wide sense. By definition, $E(X_t\bar X_s)$ depends only on $(t-s)$:
$$E(X_t\bar X_s)=R_{t-s}.$$
The function $R_{t-s}$ is called the autocovariance function of the process. It has two properties, apart from the obvious fact that $R_t=\bar R_{-t}$:

(a) $R_t$ is continuous,

(b) $R_t$ is positive-definite [47].

In fact, by (16.1.2),
$$|R_t-R_s|=|EX_t\bar X_0-EX_s\bar X_0|\le\{E|X_0|^2\,E|X_t-X_s|^2\}^{1/2}\to0$$
as $s\to t$, and if $z_1,\dots,z_n$ are arbitrary complex numbers, and $t_1,\dots,t_n$ points of the parameter set $T$,
$$\sum_{j,k=1}^{n}R_{t_j-t_k}z_j\bar z_k=E\Big|\sum_{k=1}^{n}z_kX_{t_k}\Big|^2\ge0.$$

These properties (a), (b) imply that $R_t/R_0$ is the characteristic function of some probability distribution. In the continuous time case, the Bochner-Khinchin theorem shows that
$$R_t=\int_{-\infty}^{\infty}e^{it\lambda}\,dF(\lambda),$$
while in the discrete time case Herglotz's theorem asserts that
$$R_t=\int_{-\pi}^{\pi}e^{it\lambda}\,dF(\lambda),$$
where in either case $F$ is bounded and non-decreasing.

The function $F(\lambda)$ is called the spectral function of the process $X_t$. If $F(\lambda)$ is absolutely continuous, its derivative $f(\lambda)=F'(\lambda)$ is called the spectral density of the process. The relation between the autocovariance and spectral functions is the same as that between the characteristic and distribution functions; in particular they determine one another uniquely.
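The correspondence between a spectral density and the autocovariance function in the discrete time case can be checked numerically. The following is only a sketch under stated assumptions: the density is a hypothetical example (the Poisson kernel with parameter `r`, chosen because its Fourier coefficients are known to be $r^{|t|}$), and the function and variable names are illustrative, not taken from the text.

```python
import numpy as np

def autocovariance_from_density(f, lags, n_grid=4096):
    """Approximate R_t = int_{-pi}^{pi} exp(i t lam) f(lam) d lam on a uniform grid."""
    lam = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid
    weights = f(lam) * (2.0 * np.pi / n_grid)
    return np.array([np.sum(np.exp(1j * t * lam) * weights) for t in lags])

r = 0.6  # hypothetical parameter of the example density

def density(lam):
    # Poisson kernel divided by 2*pi; its autocovariance is r**|t|.
    return (1.0 - r**2) / (2.0 * np.pi * np.abs(1.0 - r * np.exp(1j * lam)) ** 2)

lags = np.arange(-5, 6)
R = autocovariance_from_density(density, lags)

# Positive-definiteness: the Hermitian Toeplitz matrix (R_{j-k}) has no negative eigenvalues.
n = 6
toeplitz = np.array([[R[(j - k) + 5] for k in range(n)] for j in range(n)])
eigs = np.linalg.eigvalsh(toeplitz)
```

Here `R` agrees with `r ** abs(t)` up to rounding, and the eigenvalues in `eigs` are positive, in line with properties (a) and (b).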
It is a consequence of Kolmogorov's extension theorem that every positive-definite continuous function $R_t$ defines a Gaussian process $X_t$ stationary in both senses (§ 17.3). Consequently, every continuous positive-definite function is the autocovariance function of some stationary process, and every bounded non-decreasing function is the spectral function of some stationary process.

§ 5. The spectral representation of stationary processes

In order to ascertain the probabilistic significance of the theorems of § 3 about the spectral expansion of the family $U^t$, consider the random process
$$Z(\lambda)=E_\lambda X_0.$$
From (16.3.4),
$$X_t=U^tX_0=\int e^{it\lambda}\,dE_\lambda X_0=\int e^{it\lambda}\,dZ(\lambda).\tag{16.5.1}$$
To understand (16.5.1) we must find an intrinsic description of the process $Z$, and to this end we construct stochastic integrals of which (16.5.1) is a particular case. For a detailed account of such integrals see Chapter IX of [31].

A random process $Y(\lambda)$ is said to have orthogonal increments if, for any values $\lambda_1<\lambda_2\le\lambda_3<\lambda_4$,
$$E\{(Y(\lambda_2)-Y(\lambda_1))\overline{(Y(\lambda_4)-Y(\lambda_3))}\}=0.$$
To each such process there corresponds a non-decreasing function $F_Y(\lambda)=F(\lambda)$ such that
$$E|Y(\lambda_2)-Y(\lambda_1)|^2=F(\lambda_2)-F(\lambda_1),\qquad(\lambda_2>\lambda_1).$$
It is convenient to write this relation in the symbolic form
$$E|dY(\lambda)|^2=dF(\lambda).\tag{16.5.2}$$
In fact, one can set for example
$$F(\lambda)=\begin{cases}E|Y(\lambda)-Y(\lambda_0)|^2,&(\lambda\ge\lambda_0),\\-E|Y(\lambda)-Y(\lambda_0)|^2,&(\lambda<\lambda_0),\end{cases}$$
where $\lambda_0$ is an arbitrary value of $\lambda$. It is easy to see that (16.5.2) defines $F$ uniquely up to an additive constant.

Let $Y(\lambda)$ be a process with orthogonal increments, defined on some interval $[a,b]$, and $F(\lambda)$ the function associated with it in (16.5.2). Denote by $L^2(dF)$ the complex Hilbert space of functions $f$ with
$$\|f\|_F^2=\int_a^b|f(\lambda)|^2\,dF(\lambda)<\infty.$$
We define the stochastic integral
$$\int_Af(\lambda)\,dY(\lambda)$$
for all $f\in L^2(dF)$ and all Borel sets $A\subset[a,b]$. In fact it is sufficient to define
$$I(f)=\int_a^bf(\lambda)\,dY(\lambda),$$
since we can then define
$$\int_Af(\lambda)\,dY(\lambda)=I(f\chi_A),$$
where $\chi_A$ is the indicator function of $A$.

We first define $I(f)$ when $f$ is a step function. If $a<a_1<a_2<\dots<a_n<b$ and
$$f(\lambda)=\begin{cases}0,&(\lambda<a_1),\\c_j,&(a_{j-1}\le\lambda<a_j),\\0,&(\lambda\ge a_n),\end{cases}\tag{16.5.3}$$
then we define
$$I(f)=\sum_{j=2}^{n}c_j[Y(a_j-0)-Y(a_{j-1}-0)],$$
where $Y(\lambda\pm0)$ denotes the limit, in the metric of $L^2(\Omega)$, of $Y(\lambda+t)$ as $t\to\pm0$. (Clearly $Y(\lambda+0)=Y(\lambda)=Y(\lambda-0)$ at points of continuity of $F(\lambda)$.)

The integral $I(f)$ defined in this way on the step functions has immediately the following properties.

(1) For any complex numbers $\alpha$, $\beta$ and any step functions $f$, $g$,
$$I(\alpha f+\beta g)=\alpha I(f)+\beta I(g).\tag{16.5.4}$$
(2) $$I_{\alpha Y_1+\beta Y_2}(f)=\alpha I_{Y_1}(f)+\beta I_{Y_2}(f).\tag{16.5.5}$$

(3) $$E\{I(f)\overline{I(g)}\}=\int_a^bf(\lambda)\overline{g(\lambda)}\,dF(\lambda).\tag{16.5.6}$$

(4) If $E\,Y(\lambda)=0$ for all $\lambda$, then
$$E\{I(f)\}=0.\tag{16.5.7}$$

It follows in particular that, for step functions $f$, $g$,
$$E|I(f)-I(g)|^2=\|I(f)-I(g)\|^2=\int_a^b|f(\lambda)-g(\lambda)|^2\,dF(\lambda)=\|f-g\|_F^2,\tag{16.5.8}$$
so that $I$ is an isometric mapping of the step functions (which are dense in $L^2(dF)$) into $L^2(\Omega)$. Thus $I$ has a unique isometric extension to $L^2(dF)$, obtained as follows. Every $f\in L^2(dF)$ is the limit of a sequence $(f_n)$ of step functions. Because $I$ is an isometry and $L^2(\Omega)$ is complete, the elements $I(f_n)$ will converge to some element of $L^2(\Omega)$. This element is independent of the choice of the sequence $(f_n)$, for if $(g_n)$ also converges to $f$ then
$$\|\lim I(f_n)-\lim I(g_n)\|=\lim\|I(f_n-g_n)\|=\lim\|f_n-g_n\|_F=0.$$
Thus $I(f)$ is defined for all $f\in L^2(dF)$, and $I$ is an isometry of $L^2(dF)$ into $L^2(\Omega)$.

From the definition it follows that the stochastic integral $I(f)$ satisfies the equations (16.5.4)-(16.5.8). Moreover, $I(f_n)$ converges (in mean square) to $I(f)$ if and only if $f_n$ converges to $f$ in $L^2(dF)$, i.e.
$$\lim_n\int_a^bf_n(\lambda)\,dY(\lambda)=\int_a^b\big(\lim_nf_n(\lambda)\big)\,dY(\lambda).\tag{16.5.9}$$
This is an easy consequence of (16.5.8).

Turning now to the representation (16.5.1) of the stationary process $X_t$, we notice the following properties.

(a) The process $Z(\lambda)$ has orthogonal increments. In fact, if $\lambda_1<\lambda_2\le\lambda_3<\lambda_4$, then $(E_{\lambda_2}-E_{\lambda_1})$ and $(E_{\lambda_4}-E_{\lambda_3})$ are projections onto orthogonal subspaces of $L_{-\infty}^{\infty}$. Therefore
$$((E_{\lambda_2}-E_{\lambda_1})X_0,(E_{\lambda_4}-E_{\lambda_3})X_0)=E\{(Z(\lambda_2)-Z(\lambda_1))\overline{(Z(\lambda_4)-Z(\lambda_3))}\}=0.$$
(b) The function $F_Z(\lambda)$ in the equation
$$E|dZ(\lambda)|^2=dF_Z(\lambda)$$
is just the spectral function of $X_t$. In fact, it has been shown that the spectral function $F$ is uniquely determined by the equation
$$R_t=E(X_t\bar X_0)=\int e^{it\lambda}\,dF(\lambda).$$
But, from (16.5.1),
$$R_t=(U^tX_0,X_0)=\int e^{it\lambda}\,d\|E_\lambda X_0\|^2,$$
whence
$$dF(\lambda)=d\|E_\lambda X_0\|^2=E|dZ(\lambda)|^2.$$

Thus we have proved the following theorem.

Theorem 16.5.1. Every process $X_t$, stationary in the wide sense, with a spectral function $F(\lambda)$, can be represented by the stochastic integral
$$X_t=\int_{-\infty}^{\infty}e^{it\lambda}\,dZ(\lambda),\tag{16.5.10}$$
if time is continuous, or by
$$X_t=\int_{-\pi}^{\pi}e^{it\lambda}\,dZ(\lambda),\tag{16.5.11}$$
if time is discrete, where $Z(\lambda)$ is a process with orthogonal increments, and
$$E|dZ(\lambda)|^2=dF(\lambda).$$

Conversely, it is easy to verify that (16.5.10) and (16.5.11) always define stationary processes (in the wide sense).
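The converse statement can be illustrated in the simplest case of a spectral function with finitely many jumps, where the stochastic integral reduces to a finite sum $X_t=\sum_ke^{it\lambda_k}Z_k$ with orthogonal $Z_k$. The sketch below is an assumption-laden toy model: the sample space is finite with equally likely points, the orthogonal variables are taken as scaled columns of a discrete Fourier matrix, and the frequencies, masses and function names are hypothetical.

```python
import numpy as np

n_omega = 8                              # finite sample space, equally likely points
freqs = np.array([-2.0, -0.5, 1.0])      # hypothetical jump points lambda_k of F
sigma2 = np.array([0.5, 1.5, 1.0])       # hypothetical jump sizes E|Z_k|^2

# Orthogonal, mean-zero random variables: distinct non-constant DFT columns.
omega = np.arange(n_omega)
cols = np.array([1, 2, 3])
Z = np.sqrt(sigma2) * np.exp(2j * np.pi * np.outer(omega, cols) / n_omega)

def expect(values):
    """Expectation under the uniform measure on the sample space."""
    return values.mean(axis=0)

def X(t):
    """X_t(omega) = sum_k exp(i t lambda_k) Z_k(omega), one value per sample point."""
    return Z @ np.exp(1j * t * freqs)

def cov(t, s):
    """E X_{t+s} conj(X_s), computed exactly on the finite sample space."""
    return expect(X(t + s) * np.conj(X(s)))

def R(t):
    """Autocovariance predicted by the spectral function: sum_k sigma_k^2 exp(i t lambda_k)."""
    return np.sum(sigma2 * np.exp(1j * t * freqs))
```

In this model `cov(t, s)` does not depend on `s` and equals `R(t)`, which is wide-sense stationarity with the prescribed spectral masses.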
§ 6. The structure of $L_{-\infty}^{\infty}$ and linear transformations of stationary processes

Let $X_t$ be a stationary process with spectral function $F(\lambda)$, having the spectral expansion
$$X_t=\int e^{it\lambda}\,dZ(\lambda).$$
Let $Y$ be an arbitrary element of $L_{-\infty}^{\infty}(X)$, the smallest closed linear subspace containing $X_t$ for all $t$. We show that there corresponds to $Y$ a function $\varphi(\lambda)\in L^2(dF)$ such that
$$Y=\int\varphi(\lambda)\,dZ(\lambda).\tag{16.6.1}$$
In fact, $Y$ is either a finite sum of the form
$$Y=\sum_j\varphi_jX_{t_j}=\int\varphi(\lambda)\,dZ(\lambda),$$
where
$$\varphi(\lambda)=\sum_j\varphi_je^{it_j\lambda},$$
or a mean square limit of such sums. In the latter case
$$Y=\lim_n\sum_j\varphi_j^{(n)}X_{t_j^{(n)}}=\lim_n\int\varphi_n(\lambda)\,dZ(\lambda),$$
where
$$\varphi_n(\lambda)=\sum_j\varphi_j^{(n)}e^{it_j^{(n)}\lambda}.$$
By (16.5.9) the existence of $\lim I(\varphi_n)$ entails that of $\varphi=\lim\varphi_n$, and
$$Y=\int\varphi(\lambda)\,dZ(\lambda).$$
Conversely, a similar argument shows that every random variable of the form (16.6.1) belongs to $L_{-\infty}^{\infty}$, so that we have proved the following result.
Theorem 16.6.1. The space $L_{-\infty}^{\infty}$ consists exactly of the random variables of the form (16.6.1).

In fact, we can say more than this. Because of (16.5.8) the equation (16.6.1) defines an isometric isomorphism between the Hilbert spaces $L_{-\infty}^{\infty}$ and $L^2(dF)$, in which $X_t$ corresponds to $e^{it\lambda}$, and the operator $U^t$ in $L_{-\infty}^{\infty}$ to multiplication by $e^{it\lambda}$ in $L^2(dF)$.

Every random variable $Y\in L_{-\infty}^{\infty}$ generates a stationary process $Y_t=U^tY$. From the above argument it follows that, if $Y$ is represented in the form (16.6.1), then
$$Y_t=\int e^{it\lambda}\varphi(\lambda)\,dZ_X(\lambda)=\int e^{it\lambda}\,dZ_Y(\lambda),\tag{16.6.2}$$
$$Z_Y(\lambda)=\int_{-\infty}^{\lambda}\varphi(\mu)\,dZ_X(\mu).$$
The spectral function of the process $Y_t$ is
$$F_Y(\lambda)=E|Z_Y(\lambda)|^2=E\Big|\int_{-\infty}^{\lambda}\varphi(\mu)\,dZ_X(\mu)\Big|^2=\int_{-\infty}^{\lambda}|\varphi(\mu)|^2\,dF_X(\mu).\tag{16.6.3}$$

One can say of the process $Y_t$ that it is the result of a linear transformation of the process $X_t$. The function $\varphi(\lambda)$ is called the kernel of the transformation. By Theorem 16.6.1 the linear transformation sending $X_t$ into $Y_t$ necessarily has the form
$$Y_t=U^tY,\qquad Y=\int\varphi(\lambda)\,dZ(\lambda)$$
for $\varphi\in L^2(dF)$. But $Y\in L_{-\infty}^{\infty}$ is a limit of sums of the form $\sum_j\varphi_jX_{t_j}$, so that every linear transformation of $X_t$ is either given by a finite sum of the form
$$Y_t=\sum_j\varphi_jX_{t+t_j}$$
or is a limit in mean square of such sums.

Example: differentiation. Let $X(t)$ be a continuous time process. If
$$\int_{-\infty}^{\infty}\lambda^2\,dF(\lambda)<\infty,$$
the function $i\lambda$ is in $L^2(dF)$, and is the kernel of a linear transformation sending $X(t)$ into
$$Y(t)=\int i\lambda\,e^{it\lambda}\,dZ(\lambda).$$
Since in $L^2(dF)$,
$$i\lambda=\lim_{h\to0}h^{-1}\{e^{ih\lambda}-1\},$$
and since
$$\frac{X(t+h)-X(t)}{h}=\int e^{it\lambda}\,\frac{e^{ih\lambda}-1}{h}\,dZ(\lambda),$$
we have, in $L^2(\Omega)$,
$$Y(t)=\lim_{h\to0}\frac{X(t+h)-X(t)}{h}.$$
Consequently, $Y(t)$ is the derivative in mean square of $X(t)$, and exists if and only if
$$\int_{-\infty}^{\infty}\lambda^2\,dF(\lambda)<\infty.$$

§ 7. Existence theorems for the spectral density

Theorem 16.7.1. Let $X_j$ be a sequence stationary in the wide sense with spectral function $F(\lambda)$. Then $F(\lambda)$ is absolutely continuous if and only if $X_j$ can be represented by the sum
$$X_j=\sum_{k=-\infty}^{\infty}c_k\xi_{k+j},\tag{16.7.1}$$
where $\sum|c_k|^2<\infty$, and the random variables $\xi_j$ are orthogonal, with $E|\xi_j|^2=1$.

Theorem 16.7.2. The spectral function $F(\lambda)$ of a continuous time process $X(t)$, stationary in the wide sense, is absolutely continuous if and only if $X(t)$ has a representation
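$$X(t)=\int_{-\infty}^{\infty}c(\tau)\,d\zeta(\tau+t),\tag{16.7.2}$$
where $c\in L^2(-\infty,\infty)$ and $\zeta(t)$ is a process with orthogonal increments, with $E|d\zeta(\tau)|^2=d\tau$.

Proof of Theorem 16.7.1. Suppose that $X_j$ is of the form (16.7.1). Then the sequence $(\xi_k)$ is stationary, and thus has a representation
$$\xi_k=\int_{-\pi}^{\pi}e^{ik\lambda}\,dZ_\xi(\lambda),$$
where
$$E|dZ_\xi(\lambda)|^2=\frac{d\lambda}{2\pi}.$$
Then
$$X_j=\sum_{k=-\infty}^{\infty}c_k\xi_{k+j}=\sum_{k=-\infty}^{\infty}c_k\int_{-\pi}^{\pi}e^{i(k+j)\lambda}\,dZ_\xi(\lambda)=\int_{-\pi}^{\pi}e^{ij\lambda}c(\lambda)\,dZ_\xi(\lambda),$$
where
$$c(\lambda)=\sum_{k=-\infty}^{\infty}c_ke^{ik\lambda}.$$
Thus, by (16.6.3),
$$F_X(\lambda)=\frac{1}{2\pi}\int_{-\pi}^{\lambda}|c(\mu)|^2\,d\mu,$$
which is absolutely continuous, with derivative
$$f(\lambda)=\frac{1}{2\pi}|c(\lambda)|^2.$$

Conversely, suppose that $F(\lambda)=F_X(\lambda)$ is absolutely continuous, with $F'(\lambda)=f(\lambda)$. Choose a measurable function $c(\lambda)$ such that $f(\lambda)=|c(\lambda)|^2$; then $c(\lambda)\in L^2(-\pi,\pi)$, and has a Fourier series
$$c(\lambda)\sim\sum_kc_ke^{ik\lambda}\tag{16.7.3}$$
with
$$\sum_k|c_k|^2<\infty.$$
Now construct a process $Z_1(\lambda)$ with orthogonal increments and orthogonal to $Z_X(\lambda)$, with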
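$$E|dZ_1(\lambda)|^2=d\lambda,$$
and set
$$Z_\xi(\lambda)=\int_{-\pi}^{\lambda}\frac{1}{c(\mu)}\,dZ_X(\mu)+\int_{-\pi}^{\lambda}\chi(\mu)\,dZ_1(\mu),$$
where $\chi(\mu)=1$ if $c(\mu)=0$ and $\chi(\mu)=0$ otherwise, and $1/c(\lambda)$ is taken as 0 if $c(\lambda)=0$. It is easy to check that $Z_\xi(\lambda)$ is a process with orthogonal increments and that $E|dZ_\xi(\lambda)|^2=d\lambda$. Thus the sequence
$$\xi_k=(2\pi)^{-1/2}\int_{-\pi}^{\pi}e^{ik\lambda}\,dZ_\xi(\lambda)$$
consists of orthogonal variables with $E|\xi_k|^2=1$. Hence
$$X_j=\int_{-\pi}^{\pi}e^{ij\lambda}\,dZ_X(\lambda)=\int_{-\pi}^{\pi}e^{ij\lambda}c(\lambda)\,dZ_\xi(\lambda)=(2\pi)^{1/2}\sum_{k=-\infty}^{\infty}c_k\xi_{k+j}.$$

The proof of Theorem 16.7.2 is exactly similar, using Fourier integrals in place of Fourier series. The process $\zeta(t)$ is defined by an analogous stochastic integral with respect to $dZ_X(\lambda)/c(\lambda)$, where $|c(\lambda)|^2=f_X(\lambda)$, so long as $c(\lambda)\ne0$. If $c(\lambda)=0$, we need as before to introduce the auxiliary process $Z_1(\lambda)$.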
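The direct half of Theorem 16.7.1 can be checked numerically for a finite moving average. This is a sketch under stated assumptions: the coefficients `c` are hypothetical, the $\xi_j$ are taken to be orthonormal so that $R_t=\sum_kc_k\bar c_{k+t}$, and the density is taken as $|c(\lambda)|^2/2\pi$ with $c(\lambda)=\sum_kc_ke^{ik\lambda}$ (the normalisation forced by $E|\xi_j|^2=1$).

```python
import numpy as np

c = {-1: 0.5, 0: 1.0, 2: -0.25}   # hypothetical coefficients c_k, all others zero

def autocov_from_coeffs(t):
    # R_t = E X_t conj(X_0) = sum_k c_k conj(c_{k+t}) for orthonormal xi_j.
    return sum(ck * np.conj(c.get(k + t, 0.0)) for k, ck in c.items())

def density(lam):
    # f(lam) = |sum_k c_k exp(i k lam)|^2 / (2 pi)
    transfer = sum(ck * np.exp(1j * k * lam) for k, ck in c.items())
    return np.abs(transfer) ** 2 / (2.0 * np.pi)

def autocov_from_density(t, n_grid=64):
    # Uniform-grid quadrature of int exp(i t lam) f(lam) d lam over [-pi, pi);
    # it is exact here because the integrand is a low-degree trigonometric polynomial.
    lam = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid
    return np.sum(np.exp(1j * t * lam) * density(lam)) * (2.0 * np.pi / n_grid)
```

For every integer lag the two functions return the same value up to rounding, which is the statement that the moving average has spectral density $|c(\lambda)|^2/2\pi$.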
Chapter 17 CONDITIONS OF WEAK DEPENDENCE FOR STATIONARY PROCESSES

The past history of the process $X_t$ is described by the $\sigma$-algebras $\mathfrak{M}_{-\infty}^{s}$, the future by the $\sigma$-algebras $\mathfrak{M}_{t+s}^{\infty}$. It may be that these $\sigma$-algebras are independent, in the sense that, for all $A\in\mathfrak{M}_{-\infty}^{s}$, $B\in\mathfrak{M}_{t+s}^{\infty}$,
$$P(AB)-P(A)P(B)=0.$$
In the general case, the magnitude of the left-hand side measures the dependence between past and future, and it may be useful to assume this to be small, in some sense. In this chapter we examine some of the possible ways of limiting the dependence.

§ 1. Regularity

Definition 17.1.1. A stationary process $X_t$ is said to be regular if the $\sigma$-algebra
$$\mathfrak{M}_{-\infty}=\bigcap_t\mathfrak{M}_{-\infty}^{t}$$
is trivial in the sense that it contains only events of probability zero or one.

The famous zero-one law for independent random variables (see, for example, [59], [31]) implies that, for instance, a sequence of independent, identically distributed random variables is regular.

In the Hilbert space terminology of the last chapter, regularity simply means that the subspace
$$H_{-\infty}=\bigcap_tH_{-\infty}^{t}$$
(which consists of the random variables measurable with respect to $\mathfrak{M}_{-\infty}$) contains only the constant functions.
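Theorem 17.1.1. In order that a stationary process $X_t$ be regular, it is necessary and sufficient that, for all $B\in\mathfrak{M}_{-\infty}^{\infty}$,
$$\lim_{t\to-\infty}\sup_{A\in\mathfrak{M}_{-\infty}^{t}}|P(AB)-P(A)P(B)|=0.\tag{17.1.1}$$

Proof. To prove the necessity of the condition, write $\chi_A$ for the indicator function of an event $A$;
$$\chi_A(\omega)=\begin{cases}0&(\omega\notin A),\\1&(\omega\in A),\end{cases}$$
and set $\xi=\chi_A-P(A)$, $\eta=\chi_B-P(B)$, so that
$$P(AB)-P(A)P(B)=E(\xi\eta).$$
Since $\xi$ is measurable with respect to $\mathfrak{M}_{-\infty}^{t}$, equation (1.1.3) gives
$$|E(\xi\eta)|=|E\{\xi E(\eta\mid\mathfrak{M}_{-\infty}^{t})\}|\le E|E(\eta\mid\mathfrak{M}_{-\infty}^{t})|\to E|E(\eta\mid\mathfrak{M}_{-\infty})|=0$$
as $t\to-\infty$, in virtue of the theorem of Appendix 3.

To prove the sufficiency, suppose to the contrary that (17.1.1) is satisfied, but that $X_t$ is not regular. Then there is an event $A\in\mathfrak{M}_{-\infty}$ with $0<P(A)<1$, and then, taking $B=A$,
$$\sup|P(AB)-P(A)P(B)|\ge|P(A)-P(A)^2|>0,$$
which contradicts (17.1.1). •

Corollary 17.1.1. A regular process $X_t$ is metrically transitive.

Proof. Let $A$ be an invariant event. For any $\varepsilon>0$ we can find a finite $t$ and an event $A_\varepsilon\in\mathfrak{M}_{-t}^{t}$ such that
$$\rho(A,A_\varepsilon)<\varepsilon.$$
From (17.1.1),
$$\lim_{s\to\infty}|P(T^{-t-s}A_\varepsilon\cap A)-P(A_\varepsilon)P(A)|=0.$$
But
$$P(T^{-t-s}A_\varepsilon\cap A)=P(A_\varepsilon\cap T^{t+s}A)=P(A_\varepsilon A),$$
so that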
$$P(A_\varepsilon A)=P(A)P(A_\varepsilon).$$
Letting $\varepsilon\to0$, we have
$$P(A)=P(A)^2,$$
so that $P(A)=0$ or 1. •

From the proof of Theorem 17.1.1 it is clear that one can state it in the apparently stronger form: For the regularity of the process $X_t$ it is necessary and sufficient that
$$\lim_{t\to-\infty}\sup_\xi|E(\xi\eta)-E(\xi)E(\eta)|=0,\tag{17.1.2}$$
for all $\eta\in H_{-\infty}^{\infty}$, where the supremum is taken over all $\xi\in H_{-\infty}^{t}$ with $E|\xi|^2\le1$.

The condition of regularity can be described geometrically in the following way. Denote by $P_t$ the projection operator onto the subspace $H_{-\infty}^{t}$. Then it is easy to see that $X_t$ is regular if and only if, for all $\eta\in H_{-\infty}^{\infty}$,
$$\lim_{t\to-\infty}\|P_t\eta\|=0.$$

Theorem 17.1.2. If the stationary process $X_t$ is regular, and if $Y\in H_{-\infty}^{\infty}(X_t)$, then the stationary process $Y_t=U^tY$ has an absolutely continuous spectral function.

Proof. We call a stationary process $X_t$ (in the wide sense) linearly regular if
$$\bigcap_tL_{-\infty}^{t}(X)=\{0\}.$$
Since $L_{-\infty}^{t}\subset H_{-\infty}^{t}$, every regular process with $E|X_t|^2<\infty$ is linearly regular. Moreover, if $X_t$ is regular, and $Y\in H_{-\infty}^{s}(X)$ ($s<\infty$), then the process $Y_t=U^tY$ is linearly regular. In particular, the collection of variables $Y\in H_{-\infty}^{\infty}(X)$ for which $Y_t=U^tY$ is linearly regular is dense in $H_{-\infty}^{\infty}$.

From this, we prove that all linearly regular processes (and hence a fortiori all regular processes) have spectral densities. For simplicity, we consider only the discrete time case.

Lemma 17.1.1 (Wold decomposition). A sequence $X_j$ stationary in the wide sense is linearly regular if and only if it is representable in the form
$$X_j=\sum_{k=-\infty}^{0}a_k\xi_{k+j},\tag{17.1.3}$$
where $\sum|a_k|^2<\infty$, and $\xi_j=U^j\xi_0\in L_{-\infty}^{j}(X)$ are orthogonal random variables.
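Proof. If the sequence $X_j$ is represented in the form (17.1.3), then
$$L_{-\infty}^{j}(X)\subset L_{-\infty}^{j}(\xi),\qquad\bigcap_jL_{-\infty}^{j}(\xi)=\{0\},$$
so that $X_j$ is linearly regular.

Conversely, let the sequence $X_j$ be linearly regular. We denote by $D_j$ the orthogonal complement of $L_{-\infty}^{j-1}(X)$ in $L_{-\infty}^{j}(X)$, so that
$$L_{-\infty}^{j}(X)=L_{-\infty}^{j-1}(X)\oplus D_j.$$
The dimension of $D_j$ clearly does not exceed 1. If it is zero, then
$$L_{-\infty}^{j-1}(X)=L_{-\infty}^{j}(X)=\dots,$$
which plainly contradicts the linear regularity. Hence $D_j$ has dimension equal to 1. We now show that
$$L_{-\infty}^{j}(X)=\sum_{k=-\infty}^{j}\oplus D_k.\tag{17.1.5}$$
In fact, for all $s<j$,
$$L_{-\infty}^{j}(X)=L_{-\infty}^{s}(X)\oplus\sum_{k=s+1}^{j}\oplus D_k,\tag{17.1.4}$$
and (17.1.5) follows because the projection of any $Y\in L_{-\infty}^{j}(X)$ onto $L_{-\infty}^{s}(X)$ tends to zero as $s\to-\infty$.

Because of (17.1.5), $X_j$ has a representation of the form
$$X_j=\sum_{k=-\infty}^{j}a_{kj}\xi_{kj},$$
where $\sum_k|a_{kj}|^2<\infty$, $\xi_{kj}\in D_k$, and the $\xi_{kj}$ are orthonormal. Because $D_k$ is one-dimensional, $\xi_{kj}$ does not depend on $j$ (except by a factor of unit modulus, which may be absorbed into $a_{kj}$). Since $X_j=U^jX_0$,
$$X_j=\sum_{k=-\infty}^{0}a_k\xi_{k+j}.$$
•

Combining this lemma with the results of § 16.7 we see that the linearly regular process $X_j$ has a spectral density
$$f(\lambda)=\frac{1}{2\pi}\Big|\sum_{k=-\infty}^{0}a_ke^{ik\lambda}\Big|^2.\tag{17.1.6}$$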
Remark. The results of § 16.7 show that, conversely, a stationary sequence with a spectral density of the form (17.1.6) allows of the expansion (17.1.3), and is thus linearly regular. But (17.1.6) means that $f(\lambda)=|\varphi(e^{i\lambda})|^2$, where $\varphi(e^{i\lambda})$ is the value at $z=e^{i\lambda}$ of a function $\varphi(z)$ analytic inside the unit disc $|z|<1$ and satisfying
$$\int_{-\pi}^{\pi}|\varphi(e^{i\lambda})|^2\,d\lambda<\infty.$$
The theory of boundary values of such functions ([136], chapter II) shows that such a representation for $f(\lambda)$ is possible if and only if
$$\int_{-\pi}^{\pi}\log f(\lambda)\,d\lambda>-\infty.\tag{17.1.7}$$
Thus (17.1.7) is a necessary and sufficient condition for linear regularity.

Returning to the proof of the theorem, let $Y\in H_{-\infty}^{\infty}(X)$. We show that the spectral function of $Y_j=U^jY$ is absolutely continuous. In fact, because of the regularity, if $\varepsilon$ is any positive number, there exists $N<\infty$ such that, if
$$Y=Y^{(N)}+Z^{(N)}$$
with $Y^{(N)}\in H_{-\infty}^{N}(X)$, $Z^{(N)}\perp H_{-\infty}^{N}(X)$, then $E|Z^{(N)}|^2<\varepsilon$.

The spectral function $F_Y(\lambda)$ of $Y_j$ is the sum of the spectral function $F_{Y^{(N)}}(\lambda)$ of $Y_j^{(N)}$ and $F_{Z^{(N)}}(\lambda)$ of $Z_j^{(N)}$. The process $Y_j^{(N)}$ is linearly regular and $F_{Y^{(N)}}(\lambda)$ is absolutely continuous. Thus the total variation of the singular component of $F_Y(\lambda)$ does not exceed that of $F_{Z^{(N)}}(\lambda)$, which, since
$$\operatorname{Var}F_{Z^{(N)}}=E|Z^{(N)}|^2<\varepsilon,$$
is arbitrarily small. Thus $F_Y(\lambda)$ is absolutely continuous and the theorem is proved. •

§ 2. The strong mixing condition

If we strengthen (17.1.1) by requiring it to hold uniformly in $B$ as well as $A$, we arrive at the following definition.

Definition 17.2.1. A stationary process $X_t$ is said to be strongly mixing (or completely regular) if
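$$\alpha(\tau)=\sup_{A\in\mathfrak{M}_{-\infty}^{0},\,B\in\mathfrak{M}_{\tau}^{\infty}}|P(AB)-P(A)P(B)|\to0\tag{17.2.1}$$
as $\tau\to\infty$ through positive values.

The non-increasing function $\alpha(\tau)$ will be called the mixing coefficient. It is of course clear that a strongly mixing process is necessarily regular. A sequence of independent random variables is strongly mixing; other examples will appear in §§ 17.3, 19.1, 19.2, 19.4.

Theorem 17.2.1. Let the stationary process $X_t$ satisfy the strong mixing condition. If $\xi$ is measurable with respect to $\mathfrak{M}_{-\infty}^{t}$, and $\eta$ with respect to $\mathfrak{M}_{t+\tau}^{\infty}$ ($\tau>0$), and if $|\xi|\le C_1$, $|\eta|\le C_2$, then
$$|E(\xi\eta)-E(\xi)E(\eta)|\le4C_1C_2\alpha(\tau).\tag{17.2.2}$$

Proof. We may clearly assume that $t=0$. Using the properties of conditional expectations stated in § 1.1, we have
$$|E(\xi\eta)-E(\xi)E(\eta)|=|E\{\xi[E(\eta\mid\mathfrak{M}_{-\infty}^{0})-E(\eta)]\}|\le C_1E|E(\eta\mid\mathfrak{M}_{-\infty}^{0})-E(\eta)|=C_1E\{\xi_1[E(\eta\mid\mathfrak{M}_{-\infty}^{0})-E(\eta)]\},$$
where
$$\xi_1=\operatorname{sign}\{E(\eta\mid\mathfrak{M}_{-\infty}^{0})-E(\eta)\}.$$
Clearly $\xi_1$ is measurable with respect to $\mathfrak{M}_{-\infty}^{0}$ and therefore
$$|E(\xi\eta)-E(\xi)E(\eta)|\le C_1|E(\xi_1\eta)-E(\xi_1)E(\eta)|.$$
Similarly, we may compare $\eta$ with
$$\eta_1=\operatorname{sign}\{E(\xi_1\mid\mathfrak{M}_{\tau}^{\infty})-E(\xi_1)\}$$
to give
$$|E(\xi\eta)-E(\xi)E(\eta)|\le C_1C_2|E(\xi_1\eta_1)-E(\xi_1)E(\eta_1)|.$$
Introducing the events
$$A=\{\xi_1=1\}\in\mathfrak{M}_{-\infty}^{0},\qquad B=\{\eta_1=1\}\in\mathfrak{M}_{\tau}^{\infty},$$
the strong mixing condition (17.2.1) gives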
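$$|E(\xi_1\eta_1)-E(\xi_1)E(\eta_1)|=|P(AB)+P(\bar A\bar B)-P(\bar AB)-P(A\bar B)-P(A)P(B)-P(\bar A)P(\bar B)+P(\bar A)P(B)+P(A)P(\bar B)|\le4\alpha(\tau),$$
whence (17.2.2) follows. •

If the variables $\xi$, $\eta$ are complex, then separating the real and imaginary parts, we again arrive at (17.2.2), with 4 replaced by 16.

Theorem 17.2.2. Let the random variables $\xi$, $\eta$ be measurable with respect to $\mathfrak{M}_{-\infty}^{t}$ and $\mathfrak{M}_{t+\tau}^{\infty}$ respectively, and suppose that, for some $\delta>0$,
$$E|\xi|^{2+\delta}<c_1<\infty,\qquad E|\eta|^{2+\delta}<c_2<\infty.\tag{17.2.3}$$
Then
$$|E(\xi\eta)-E(\xi)E(\eta)|\le[4+3(c_1^{\beta}c_2^{1-\beta}+c_1^{1-\beta}c_2^{\beta})]\,\alpha(\tau)^{1-2\beta},\tag{17.2.4}$$
where
$$\beta=\frac{1}{2+\delta}.$$

Proof. As before, we take $t=0$. Introduce the random variables $\xi_N$, $\bar\xi_N$ defined by
$$\xi_N=\begin{cases}\xi&(|\xi|\le N),\\0&(|\xi|>N),\end{cases}\qquad\bar\xi_N=\xi-\xi_N,$$
and $\eta_N$, $\bar\eta_N$ similarly defined. Then
$$|E(\xi\eta)-E(\xi_N\eta_N)|\le E|\xi\bar\eta_N|+E|\bar\xi_N\eta_N|,\tag{17.2.5}$$
with the analogous inequality for $E(\xi)E(\eta)-E(\xi_N)E(\eta_N)$, and by Theorem 17.2.1,
$$|E(\xi_N\eta_N)-E(\xi_N)E(\eta_N)|\le4N^2\alpha(\tau).\tag{17.2.6}$$
Because of (17.2.3),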
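$$E|\bar\xi_N|^{(2+\delta)/(1+\delta)}\le c_1N^{-\delta(2+\delta)/(1+\delta)},\qquad E|\bar\eta_N|^{(2+\delta)/(1+\delta)}\le c_2N^{-\delta(2+\delta)/(1+\delta)},$$
so that, by Hölder's inequality,
$$E|\xi\bar\eta_N|\le c_1^{\beta}c_2^{1-\beta}N^{-\delta},\qquad E|\bar\xi_N\eta_N|\le c_1^{1-\beta}c_2^{\beta}N^{-\delta}.\tag{17.2.7}$$
Combining (17.2.5)-(17.2.7), we have
$$|E(\xi\eta)-E(\xi)E(\eta)|\le4N^2\alpha(\tau)+3(c_1^{\beta}c_2^{1-\beta}+c_1^{1-\beta}c_2^{\beta})N^{-\delta},$$
whence (17.2.4) follows on setting $N=\alpha(\tau)^{-\beta}$. •

The left-hand side of (17.2.1) can be small either because $P(B\mid A)$ is near $P(B)$, or because $P(A)$ is small. This suggests that we should consider a stronger mixing condition which requires the difference to be small compared with $P(A)$.

Definition 17.2.2. The stationary process $X_t$ is said to satisfy the uniform mixing condition if
$$\varphi(\tau)=\sup_{A\in\mathfrak{M}_{-\infty}^{0},\,B\in\mathfrak{M}_{\tau}^{\infty}}\frac{|P(AB)-P(A)P(B)|}{P(A)}\to0\tag{17.2.8}$$
as $\tau\to\infty$.

It is clear that $\varphi(\tau)$ is non-increasing, and that a uniformly mixing process is strongly mixing (the converse is false; see § 3).

The essential supremum of a random variable $\xi(\omega)$ is the unique number $C\le\infty$ with the property that $P(\xi>C)=0$ but $P(\xi>C')>0$ for all $C'<C$. We remark that, if
$$\varphi_1(\tau)=\sup_{B\in\mathfrak{M}_{\tau}^{\infty}}\operatorname{ess\,sup}|P(B\mid\mathfrak{M}_{-\infty}^{0})-P(B)|,\tag{17.2.9}$$
then
$$\varphi(\tau)=\varphi_1(\tau).\tag{17.2.10}$$
In fact,
$$|P(AB)-P(A)P(B)|=\Big|\int_A[P(B\mid\mathfrak{M}_{-\infty}^{0})-P(B)]\,dP(\omega)\Big|\le\varphi_1(\tau)P(A),$$
showing that $\varphi(\tau)\le\varphi_1(\tau)$. Conversely, for any $\varepsilon>0$, choose $A_\varepsilon\in\mathfrak{M}_{-\infty}^{0}$ and $B_\varepsilon\in\mathfrak{M}_{\tau}^{\infty}$ so that $P(A_\varepsilon)>0$ and for all $\omega\in A_\varepsilon$,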
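$$|P(B_\varepsilon\mid\mathfrak{M}_{-\infty}^{0})-P(B_\varepsilon)|>\varphi_1(\tau)-\varepsilon.$$
Without loss of generality we may suppose that, for all $\omega\in A_\varepsilon$,
$$P(B_\varepsilon\mid\mathfrak{M}_{-\infty}^{0})-P(B_\varepsilon)>\varphi_1(\tau)-\varepsilon.$$
Integrating this inequality over $A_\varepsilon$, we obtain
$$P(A_\varepsilon B_\varepsilon)-P(A_\varepsilon)P(B_\varepsilon)>[\varphi_1(\tau)-\varepsilon]P(A_\varepsilon),$$
showing that $\varphi(\tau)\ge\varphi_1(\tau)-\varepsilon$. Since $\varepsilon>0$ is arbitrary, this proves (17.2.10).

Theorem 17.2.3. Let the stationary process $X_t$ satisfy the uniform mixing condition, and let $\xi$, $\eta$ be measurable with respect to $\mathfrak{M}_{-\infty}^{t}$ and $\mathfrak{M}_{t+\tau}^{\infty}$ respectively. If
$$E|\xi|^p<\infty,\qquad E|\eta|^q<\infty,$$
where $p,q>1$, $p^{-1}+q^{-1}=1$, then
$$|E(\xi\eta)-E(\xi)E(\eta)|\le2\varphi(\tau)^{1/p}\{E|\xi|^p\}^{1/p}\{E|\eta|^q\}^{1/q}.\tag{17.2.11}$$

Proof. Suppose first that $\xi$ and $\eta$ are represented by finite sums
$$\xi=\sum_j\lambda_j\chi(A_j),\qquad\eta=\sum_i\mu_i\chi(B_i),\tag{17.2.12}$$
where the $A_j$ are disjoint events in $\mathfrak{M}_{-\infty}^{t}$ and the $B_i$ are disjoint events in $\mathfrak{M}_{t+\tau}^{\infty}$. Then, using Hölder's inequality,
$$|E(\xi\eta)-E(\xi)E(\eta)|=\Big|\sum_{i,j}\lambda_j\mu_iP(A_jB_i)-\sum_{i,j}\lambda_j\mu_iP(A_j)P(B_i)\Big|$$
$$=\Big|\sum_j\lambda_jP(A_j)^{1/p}\sum_i[P(B_i\mid A_j)-P(B_i)]\mu_iP(A_j)^{1/q}\Big|$$
$$\le\Big\{\sum_j|\lambda_j|^pP(A_j)\Big\}^{1/p}\Big\{\sum_jP(A_j)\Big|\sum_i\mu_i[P(B_i\mid A_j)-P(B_i)]\Big|^q\Big\}^{1/q}$$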
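$$\le\{E|\xi|^p\}^{1/p}\Big\{\sum_jP(A_j)\Big[\sum_i|\mu_i|^q\big(P(B_i\mid A_j)+P(B_i)\big)\Big]\Big[\sum_i|P(B_i\mid A_j)-P(B_i)|\Big]^{q/p}\Big\}^{1/q}$$
$$\le2^{1/q}\{E|\xi|^p\}^{1/p}\{E|\eta|^q\}^{1/q}\max_j\Big\{\sum_i|P(B_i\mid A_j)-P(B_i)|\Big\}^{1/p}.\tag{17.2.13}$$
Denoting the summation over positive terms by $\sum^+$, and over negative terms by $\sum^-$, we have
$$\sum_i|P(B_i\mid A_j)-P(B_i)|=\sum\nolimits^{+}\{P(B_i\mid A_j)-P(B_i)\}-\sum\nolimits^{-}\{P(B_i\mid A_j)-P(B_i)\}$$
$$=\Big[P\Big(\bigcup\nolimits^{+}B_i\,\Big|\,A_j\Big)-P\Big(\bigcup\nolimits^{+}B_i\Big)\Big]+\Big[P\Big(\bigcup\nolimits^{-}B_i\Big)-P\Big(\bigcup\nolimits^{-}B_i\,\Big|\,A_j\Big)\Big]\le2\varphi(\tau).\tag{17.2.14}$$
Substituting (17.2.14) into (17.2.13) proves the theorem for variables of the form (17.2.12).

For the general case, it suffices to remark that
$$E|\xi-\xi_N|^p\to0,\qquad E|\eta-\eta_N|^q\to0$$
as $N\to\infty$, where $\xi_N$, $\eta_N$ are random variables of the form (17.2.12), $\xi_N$ being defined by
$$\xi_N=\begin{cases}k/N&(k/N<\xi\le(k+1)/N,\ -N^2\le k<N^2),\\0&(|\xi|>N),\end{cases}$$
and $\eta_N$ similarly. •

All the concepts and theorems of §§ 1, 2 apply equally to processes defined only for $t\ge0$, so long as $\mathfrak{M}_{-\infty}^{t}$ is replaced throughout by $\mathfrak{M}_{0}^{t}$.

§ 3. Conditions of weak dependence for Gaussian sequences

The random vector $X=(X_1,X_2,\dots,X_n)$ is said to be Gaussian if its characteristic function is of the form
$$\varphi(\theta_1,\theta_2,\dots,\theta_n)=\exp\Big\{i\sum_ja_j\theta_j-\tfrac12\sum_{k,j}R_{kj}\theta_k\theta_j\Big\},\tag{17.3.1}$$
where $a_j$ are arbitrary real numbers, and the matrix $R=(R_{kj})$ is positive-semi-definite.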
It is easy to see that
$$a_j=E(X_j),\qquad R_{kj}=E\{(X_k-a_k)(X_j-a_j)\}.$$
If the matrix $R$ is non-singular, then $X$ has the probability density (where $|R|$ is the determinant of $R$),
$$p(x_1,x_2,\dots,x_n)=(2\pi)^{-n/2}|R|^{-1/2}\exp\Big\{-\tfrac12\sum_{k,j}r_{kj}(x_k-a_k)(x_j-a_j)\Big\},$$
where the matrix $(r_{kj})$ is the inverse of $R$. Conversely, any positive-semi-definite matrix $R$ defines the distribution of a random vector with characteristic function (17.3.1).

The following properties are immediate consequences of the definition.

(1) The variables $X_1,X_2,\dots,X_n$ are independent if and only if $R_{kj}=0$ for $k\ne j$, i.e. if and only if they are uncorrelated.

(2) If $X_j=(X_{1j},X_{2j},\dots,X_{nj})$, and if the vector $(X_1,X_2,\dots,X_m)$ is Gaussian, then $\sum_{j=1}^{m}b_jX_j$ is Gaussian for all real $b_j$.

(3) If the sequence of Gaussian vectors $X_j$ converges to a random vector $X$, then $X$ is Gaussian.

The random process $X_t$ ($t\in T$) is said to be Gaussian if for any $t_1,\dots,t_n\in T$, the vector $(X_{t_1},X_{t_2},\dots,X_{t_n})$ is Gaussian. It follows from what has been said that the finite-dimensional distributions of a Gaussian process are determined by the two functions
$$a_t=E(X_t),\qquad R_{ts}=E\{(X_t-a_t)(X_s-a_s)\}.$$
Conversely, Kolmogorov's theorem implies that there exists a Gaussian process determined by these two functions, provided only that the function $R_{ts}$ is such that, for any $t_j$, the matrix $(R_{t_kt_j})$ is symmetric and positive-semi-definite.

In particular, if $(X_j)$ is a stationary Gaussian sequence, then any condition of weak dependence of the $\sigma$-algebras $\mathfrak{M}_{-\infty}^{j}$, $\mathfrak{M}_{j+k}^{\infty}$ can, in principle, be expressed in terms of the autocovariance function of the sequence, or of the spectral function. Such expression may be far from simple, and raises difficult and interesting analytical problems. These lie away from the theme of this book, and we shall discuss them only in order to construct examples of processes satisfying the conditions of §§ 1, 2.

Theorem 17.3.1. A Gaussian sequence $X_j$ is regular if and only if it is linearly regular.
Proof. It suffices to show that every linearly regular Gaussian sequence is regular. By Lemma 17.1.1 such a sequence has a representation of the form (17.1.3). In this representation the variables $\xi_j$ are limits in mean square of linear combinations of the $X_j$, so that the vector $(\xi_1,\dots,\xi_n)$ is Gaussian. Since the $\xi_j$ are orthogonal, they are independent. From (17.1.3),
$$\mathfrak{M}_{-\infty}^{j}(X)\subset\mathfrak{M}_{-\infty}^{j}(\xi),$$
and since $\xi_j$ is regular by the zero-one law, $X_j$ must be. •

Corollary 17.3.1. The Gaussian sequence $X_j$ is regular if and only if it can be represented in the form (17.1.3), where the $\xi_j$ are independent and normally distributed.

Corollary 17.3.2. The condition (17.1.7) is necessary and sufficient for the regularity of a Gaussian sequence with spectral density $f(\lambda)$.

Theorem 17.3.2. A Gaussian sequence $X_j$ satisfies the uniform mixing condition if and only if the $\sigma$-algebras $\mathfrak{M}_{-\infty}^{0}$, $\mathfrak{M}_{n}^{\infty}$ are independent for all sufficiently large $n$.

Proof. The sufficiency of the condition is obvious. To prove its necessity suppose that it is not satisfied, so that the autocovariance function $R_j$ is non-zero for infinitely many values of $j$. Without loss of generality, suppose that $E(X_j)=0$, $E(X_j^2)=1$. Let $j$ satisfy $R_j\ne0$, and write $R_j=\rho$. Define events
$$A=\{X_0\ge2/\rho\},\qquad B=\{X_j\in[0,1]\}.$$
If $X_j$ is uniformly mixing, then
$$|P(AB)-P(A)P(B)|\le\varphi(j)P(A),$$
where $\varphi(j)\to0$ as $j\to\infty$. But
$$P(AB)-P(A)P(B)=\frac{1}{2\pi\sqrt{1-\rho^2}}\int_{2/\rho}^{\infty}\int_0^1\exp\Big\{-\frac{x^2-2\rho xy+y^2}{2(1-\rho^2)}\Big\}\,dy\,dx-\frac{1}{2\pi}\int_{2/\rho}^{\infty}e^{-x^2/2}\,dx\int_0^1e^{-y^2/2}\,dy.$$
It follows easily that
$$P(AB)-P(A)P(B)>\{\exp[3/2(1-\rho^2)]-1-|c_j|\}P(A),$$
where $c_j\to0$ as $j\to\infty$, which contradicts the uniform mixing condition. •

The investigation of Gaussian sequences satisfying the strong mixing condition is more complex. It rests upon a theorem of Kolmogorov and Rozanov, whose proof [47] is rather long and will be omitted, but which will be stated, since it is of central importance in the study of Gaussian processes satisfying the strong mixing condition. We set
$$\rho(n)=\sup|E(\xi\bar\eta)|,$$
where the supremum is taken over all $\xi\in L_{-\infty}^{0}(X)$, $\eta\in L_{n}^{\infty}(X)$ with $E|\xi|^2=E|\eta|^2=1$. The Kolmogorov-Rozanov theorem is then as follows.

For stationary Gaussian sequences,
$$\alpha(n)\le\rho(n)\le2\pi\alpha(n).$$

With the aid of this theorem we can construct an extensive class of Gaussian sequences satisfying the strong mixing condition.

Theorem 17.3.3. If the spectral density $f(\lambda)$ of a Gaussian sequence is continuous, and $f(\lambda)\ge m>0$, then it satisfies the strong mixing condition.

Proof. Clearly
$$\rho(n)=\sup|E(Y\bar Z)|=\sup\Big|E\Big(\sum_{j\le0}a_jX_j\Big)\overline{\Big(\sum_{k\ge n}b_kX_k\Big)}\Big|,$$
where the supremum is taken over all finite sums $Y=\sum_{j\le0}a_jX_j$, $Z=$
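$\sum_{j\ge n}b_jX_j$ with $E|Y|^2=E|Z|^2=1$. Hence, from the results of § 16.6,
$$\rho(n)=\sup_{P,Q}\Big|\int_{-\pi}^{\pi}e^{in\lambda}P(\lambda)Q(\lambda)f(\lambda)\,d\lambda\Big|,$$
where the supremum is taken over all trigonometric polynomials $P$ and $Q$ of the form
$$P(\lambda)=\sum_{k\ge0}a_ke^{ik\lambda},\qquad Q(\lambda)=\sum_{k\ge0}b_ke^{ik\lambda},\tag{17.3.2}$$
with
$$\int_{-\pi}^{\pi}|P(\lambda)|^2f(\lambda)\,d\lambda=\int_{-\pi}^{\pi}|Q(\lambda)|^2f(\lambda)\,d\lambda=1.$$

By the Weierstrass approximation theorem, there exists, for any $\varepsilon>0$, a trigonometric polynomial $T_\varepsilon(\lambda)$ such that, for all $\lambda$,
$$|f(\lambda)-T_\varepsilon(\lambda)|<\varepsilon.$$
If the order of $T_\varepsilon(\lambda)$ is $N$, then for $k>N$,
$$\int_{-\pi}^{\pi}e^{ik\lambda}T_\varepsilon(\lambda)\,d\lambda=0.$$
Hence, for $n>N$,
$$\rho(n)=\sup_{P,Q}\Big|\int_{-\pi}^{\pi}e^{in\lambda}P(\lambda)Q(\lambda)f(\lambda)\,d\lambda\Big|=\sup_{P,Q}\Big|\int_{-\pi}^{\pi}e^{in\lambda}P(\lambda)Q(\lambda)[f(\lambda)-T_\varepsilon(\lambda)]\,d\lambda\Big|$$
$$\le\varepsilon\sup_{P,Q}\int_{-\pi}^{\pi}|P(\lambda)Q(\lambda)|\,d\lambda\le\varepsilon m^{-1}\sup_{P,Q}\int_{-\pi}^{\pi}|P(\lambda)Q(\lambda)|f(\lambda)\,d\lambda\le\varepsilon m^{-1}.$$
•

Remark. The continuity of $f(\lambda)$ has of course to be interpreted on the unit circle; we must have $f(\pi)=f(-\pi)$.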
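The quantity $\rho(n)$ can be explored numerically by restricting the past and future to finite blocks, which gives the largest canonical correlation between two groups of variables. The sketch below makes several assumptions not found in the text: the blocks have hypothetical lengths `past_len` and `future_len`, the autocovariance is the example $R_k=r^{|k|}$ (a first-order autoregressive sequence), and the function name is illustrative. For a general sequence the finite-block value is only a lower bound for $\rho(n)$; for this particular example the linear Markov property makes it equal to $r^n$.

```python
import numpy as np

def max_correlation(autocov, n, past_len=6, future_len=6):
    """Largest correlation between linear combinations of X_j (-past_len < j <= 0)
    and linear combinations of X_j (n <= j < n + future_len)."""
    past = np.arange(-past_len + 1, 1)
    future = np.arange(n, n + future_len)
    cov_pp = autocov(past[:, None] - past[None, :])
    cov_ff = autocov(future[:, None] - future[None, :])
    cov_pf = autocov(past[:, None] - future[None, :])
    # Whiten both blocks with Cholesky factors; the singular values of the
    # whitened cross-covariance are the canonical correlations.
    lp = np.linalg.cholesky(cov_pp)
    lf = np.linalg.cholesky(cov_ff)
    whitened = np.linalg.solve(lp, cov_pf) @ np.linalg.inv(lf).T
    return np.linalg.svd(whitened, compute_uv=False)[0]

r = 0.5                                   # hypothetical autoregressive parameter
ar1 = lambda lag: r ** np.abs(lag)        # example autocovariance R_k = r**|k|
rho = [max_correlation(ar1, n) for n in range(1, 6)]
```

The values in `rho` decay geometrically like `r ** n`, so by the Kolmogorov-Rozanov inequality the mixing coefficient of this Gaussian example decays at the same rate.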
Chapter 18 THE CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES

§ 1. Statement of the problem

This chapter contains the main objective of the second part of the book, the investigation of the limiting behaviour of the distributions of sums or integrals of the form
$$B_T^{-1}\sum_{t=a}^{T+a}X_t-A_T,\qquad B_T^{-1}\int_a^{a+T}X_t\,dt-A_T,\tag{18.1.1}$$
as $T\to\infty$, where $X_t$ is a stationary process.

If no assumptions except stationarity are made, it is not generally possible to prove anything stronger than an ergodic theorem. Thus for instance we may take $X_t=X$ for all $t$, $A_T=0$, $B_T=T$, and obtain any distribution as the limiting distribution of (18.1.1). However, in this example, there is strong dependence between $X_{t_1}$ and $X_{t_2}$ even for very large values of $|t_1-t_2|$. This shows that, to obtain theorems of interest, we must impose conditions of weak dependence between the past ($\mathfrak{M}_{-\infty}^{0}$) and future ($\mathfrak{M}_{t}^{\infty}$) of the process. We shall therefore study processes satisfying the strong or uniform mixing conditions, and functionals of such processes.

There is one other sort of trivial behaviour which must be excluded, which arises when the sums $\sum_1^TX_t$ do not grow as $T$ increases. Suppose for example that $(\xi_j)$ is a sequence of independent, identically distributed random variables; then
$$X_t=\xi_{t+1}-\xi_t$$
defines a stationary process which is, in any reasonable sense, weakly dependent. But
$$\sum_{t=1}^{T}X_t=\xi_{T+1}-\xi_1,$$
so that (18.1.1) converges in distribution in a trivial way, taking $B_T=1$. To exclude such behaviour, we always require that
$$\lim B_T=\infty.$$
With these restrictions, it is possible to find all the possible limit distributions of (18.1.1).

Theorem 18.1.1. Let $F_n(x)$ be the distribution function of
$$B_n^{-1}\sum_{j=1}^{n}X_j-A_n,\tag{18.1.2}$$
where $X_j$ is a strongly mixing stationary sequence with mixing coefficient $\alpha(n)$, and
$$\lim_{n\to\infty}B_n=\infty.\tag{18.1.3}$$
If $F_n(x)$ converges weakly to a non-degenerate distribution function $F(x)$, then $F(x)$ is necessarily stable. If the latter distribution has exponent $\alpha$, then
$$B_n=n^{1/\alpha}h(n),$$
where $h(n)$ is slowly varying as $n\to\infty$.

Before proving the theorem, we make a general remark about the methods of proof of this and other limit theorems for dependent variables used in this book. They are all based on a very fruitful idea introduced into probability theory by Bernstein [8]. We represent the sum
$$S_n=X_1+X_2+\dots+X_n$$
in the form
$$S_n=\sum_{j=0}^{k}\xi_j+\sum_{j=0}^{k}\eta_j,$$
where
$$\xi_j=\sum_{s=jp+jq+1}^{(j+1)p+jq}X_s,\qquad\eta_j=\sum_{s=(j+1)p+jq+1}^{(j+1)p+(j+1)q}X_s.$$
Any two random variables $\xi_i$, $\xi_j$ ($i\ne j$) are separated by at least one variable $\eta_j$ containing $q$ terms. If $q$ is sufficiently large, the mixing condition
18.1. STATEMENT OF THE PROBLEM 317 will ensure that the ^ are almost independent, and the study of ? ^ may be related to the well understood case of sums of independent random variables. If, however, q is small compared with p, the sum ? r/j will be small compared with Sn. Thus Bernstein's method permits us to reduce the dependent case to the independent case. Proof of theorem 18.1.1. From A8.1.3) we conclude, as in § 2.1, that \imBn+1/Bn=l. A8.1.4) n-> oo Therefore, for any positive numbers al5 a2, there exists a sequence m(n)^> oo such that lim BJBn = aJa2. n-> oo We can also choose a sequence r(n) increasing so slowly that in probability as n->oo. Consider the sum ( n+r+m n+r+m \ 1 X Xj-Am-b2) j=n+r+l / (n+r+m \ n+r (^i^) Z ^-Cj-^BJ-1 I X, A8.1.5) j=i / j=«+i By virtue of the strong mixing condition A7.2.2), the distribution function of the left-hand side of A8.1.5) differs from Fn{a1x + b1)*Fm(al -^ by at most o(l) as r^-oo. Because of the choice of r, the right-hand side has the limiting distribution F(ax + b), where a>0 and b are constants. Consequently, F(alx + bl)*F(a2x + b2) = ^(a and F(x) is stable.
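Bernstein's blocking, described before the proof, can be made concrete with a short sketch. The code below is only an illustration under stated assumptions: the block boundaries follow the indexing of (18.4.7), and the function names `bernstein_blocks` and `block_sums` are hypothetical helpers, not notation from the text. It splits the indices $1, \dots, n$ into alternating large blocks of length $p$ and small blocks of length $q$, with one final remainder block.

```python
def bernstein_blocks(n, p, q):
    """Split indices 1..n into alternating big blocks (length p) and
    small blocks (length q); assumed indexing as in (18.4.7)."""
    if p <= 0 or q <= 0 or p + q > n:
        raise ValueError("need p > 0, q > 0 and p + q <= n")
    k = n // (p + q)                      # number of complete (big, small) pairs
    big, small = [], []
    for i in range(k):
        start = i * (p + q)
        big.append(list(range(start + 1, start + p + 1)))
        small.append(list(range(start + p + 1, start + p + q + 1)))
    # remainder block eta_k: whatever is left after k complete pairs (may be empty)
    small.append(list(range(k * (p + q) + 1, n + 1)))
    return big, small


def block_sums(x, p, q):
    """Return the block sums (xi_0..xi_{k-1}) and (eta_0..eta_k) of the list x."""
    big, small = bernstein_blocks(len(x), p, q)
    xi = [sum(x[j - 1] for j in blk) for blk in big]
    eta = [sum(x[j - 1] for j in blk) for blk in small]
    return xi, eta
```

With $n = 23$, $p = 5$, $q = 2$ this gives three large blocks, each separated from the next by exactly $q = 2$ indices, and the large and small block sums together add up to $S_n$.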
318 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 To prove the second part of the theorem it suffices to show that, for all positive integers k, lim BJBn n —* oo We denote by fn@) the characteristic function of A8.1.2), so that as in § 2.2, lim |/H@)| = e-c"l\ (c>0). A8.1.6) n —* oo Let r(n) be an unbounded increasing sequence, which will be chosen later, and write The variables ^ have the same distribution, so that Let /"(«)-> oo so slowly that the limiting distribution of the sum coincides with that of the sum nk A8.1.7) Since r-> oo the random variables ^ are weakly dependent; precisely, by Theorems 17.2.1, j = 1 B nk ? exp 'nk j=l .Bnk j=l B. ^s 'nk as n—><X). Thus
as $n\to\infty$, whence it follows from (18.1.6) that

$$\lim_{n\to\infty} (B_n/B_{nk})^{\alpha}\,k = 1. \tag{18.1.8}$$

Before formulating the analogous result for stationary processes with a continuous time parameter, we must explain what is meant by an integral of the form

$$\int_a^b X(t)\,dt.$$

For simplicity we assume that $E|X(t)| < \infty$. Then

$$\int_a^b E|X(t)|\,dt = \int_a^b dt \int_\Omega |X(t,\omega)|\,P(d\omega) = (b-a)\,E|X(0)| < \infty,$$

and since $X(t,\omega)$ is measurable in $(t,\omega)$, Fubini's theorem shows that

$$\int_a^b X(t,\omega)\,dt$$

exists for almost all $\omega \in \Omega$, and that

$$E\int_a^b X(t)\,dt = \int_a^b EX(t)\,dt.$$

Theorem 18.1.2. Let $F_T(x)$ be the distribution function of

$$B_T^{-1}\int_0^T X(t)\,dt - A_T, \tag{18.1.9}$$

where $X(t)$ is a stationary process satisfying the strong mixing condition, and

$$\lim_{T\to\infty} B_T = \infty.$$

If $F_T(x)$ converges weakly to a non-degenerate distribution $F(x)$ as $T\to\infty$, then $F(x)$ is necessarily stable. If the exponent of this stable law is $\alpha$, then

$$B_T = T^{1/\alpha} h(T), \tag{18.1.10}$$

where $h(T)$ is slowly varying at infinity.
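Proof. The stability of the limiting distribution is proved exactly as in the discrete case. To prove (18.1.10) it suffices to prove that, for all $\tau>0$,

$$\lim_{T\to\infty} B_{T\tau}/B_T = \tau^{1/\alpha}. \tag{18.1.11}$$

First, let $\tau = k$ be an integer. Then the same arguments as were applied to (18.1.8) in the discrete case yield

$$\lim_{T\to\infty} (B_T/B_{Tk})^{\alpha}\,k = 1.$$

Thus (18.1.11) is proved for integral $\tau$. If $\tau = p/q$ is rational, then with $T' = T/q$, we have

$$\lim_{T\to\infty} \frac{B_{Tp/q}}{B_T} = \lim_{T'\to\infty}\left(\frac{B_{T'p}}{B_{T'}}\cdot\frac{B_{T'}}{B_{T'q}}\right) = \frac{p^{1/\alpha}}{q^{1/\alpha}}.$$

Thus (18.1.11) is proved for rational $\tau$.

Now let $\tau$ be any positive number, and choose $\tau'$ so that $(\tau+\tau')$ is rational. From what has already been proved,

$$\lim_{T\to\infty} B_{T(\tau+\tau')}/B_T = (\tau+\tau')^{1/\alpha}. \tag{18.1.12}$$

Writing

$$\int_0^{T(\tau+\tau')+r} X(t)\,dt = \int_0^{T\tau} X(t)\,dt + \int_{T\tau}^{T\tau+r} X(t)\,dt + \int_{T\tau+r}^{T(\tau+\tau')+r} X(t)\,dt,$$

and using the arguments leading to (18.1.8), we see that

$$\lim_{T\to\infty}\left[\left(\frac{B_{T\tau}}{B_{T(\tau+\tau')}}\right)^{\alpha} + \left(\frac{B_{T\tau'}}{B_{T(\tau+\tau')}}\right)^{\alpha}\right] = 1. \tag{18.1.13}$$

Hence, from (18.1.12),

$$\lim_{T\to\infty}\left[\left(\frac{B_{T\tau}}{B_T}\right)^{\alpha} + \left(\frac{B_{T\tau'}}{B_T}\right)^{\alpha}\right] = \tau+\tau'.$$

Therefore

$$\limsup_{T\to\infty}\left(\frac{B_{T\tau}}{B_T}\right)^{\alpha} \le \tau+\tau',$$

and since $\tau'$ can be chosen to be arbitrarily small,

$$\limsup_{T\to\infty}\left(\frac{B_{T\tau}}{B_T}\right)^{\alpha} \le \tau. \tag{18.1.14}$$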
This is true for all $\tau > 0$, and we may therefore replace $\tau$ by $\tau'$ and substitute in (18.1.13) to give

$$\liminf_{T\to\infty}\left(\frac{B_{T\tau}}{B_T}\right)^{\alpha} \ge \tau. \tag{18.1.15}$$

The inequalities (18.1.14) and (18.1.15) complete the proof. •

Conditions are still not known for the convergence of the normed sums of a stationary process to a given stable law with exponent $\alpha < 2$. For the remainder of the chapter we shall therefore consider only the convergence to a normal distribution of

$$B_T^{-1}\sum_{t=1}^{T} X_t \quad\text{or}\quad B_T^{-1}\int_0^T X_t\,dt,$$

where we assume that $E(X_t) = 0$, $E(X_t^2) < \infty$, and $B_T$ is taken as

$$\Big\{V\Big(\sum_{t=1}^{T} X_t\Big)\Big\}^{1/2} \quad\text{or}\quad \Big\{V\Big(\int_0^T X_t\,dt\Big)\Big\}^{1/2}$$

as the case may be. It is, of course, important then to ensure that $B_T\to\infty$ ($T\to\infty$), and this condition is investigated in §§ 2, 3. In § 4 necessary and sufficient conditions for normal convergence, when $X_t$ is strongly mixing, are established, and in §§ 5, 6 simpler sufficient conditions are deduced.

§ 2. The variance of $X_1 + \dots + X_n$

Consider the stationary sequence $\dots, X_{-1}, X_0, X_1, X_2, \dots$ with autocovariance function $R(n)$ and spectral function $F(\lambda)$ (and, as remarked, $EX_j = 0$). If there is a spectral density, it will be denoted by $f(\lambda)$. We write

$$S_n = X_1 + X_2 + \dots + X_n.$$
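Since $EX_j = 0$, the variance of $S_n$ is the double sum of the autocovariances $R(k-l)$ over $1 \le k, l \le n$, and collecting the terms with equal lag $j = k - l$ gives a single sum with weights $n - |j|$. The following sketch checks this bookkeeping numerically. The autocovariance $R(j) = 2^{-|j|}$ is an assumed example chosen for illustration, and the function names are hypothetical.

```python
def variance_double_sum(R, n):
    """V(S_n) as the double sum of R(k - l) over 1 <= k, l <= n."""
    return sum(R(k - l) for k in range(1, n + 1) for l in range(1, n + 1))


def variance_by_lag(R, n):
    """V(S_n) after collecting equal lags: sum over |j| < n of (n - |j|) R(j)."""
    return sum((n - abs(j)) * R(j) for j in range(-(n - 1), n))


def R_example(j):
    """Assumed autocovariance for illustration only: R(j) = 0.5 ** |j|."""
    return 0.5 ** abs(j)
```

For this example both functions return $3$ when $n = 2$, namely $2R(0) + 2R(1)$, and they agree for every $n$.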
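Theorem 18.2.1. The variance of $S_n$ is given in terms of $R(n)$ and $F(\lambda)$ by the equations

$$V(S_n) = \sum_{|j|<n} (n-|j|)\,R(j), \tag{18.2.1}$$

$$V(S_n) = \int_{-\pi}^{\pi} \frac{\sin^2(\tfrac12 n\lambda)}{\sin^2(\tfrac12\lambda)}\,dF(\lambda). \tag{18.2.2}$$

If the spectral density exists and is continuous at $\lambda = 0$, then as $n\to\infty$,

$$V(S_n) = 2\pi f(0)\,n + o(n). \tag{18.2.3}$$

Proof. We have

$$V(S_n) = \sum_{k,l=1}^{n} E(X_kX_l) = \sum_{k,l=1}^{n} R(k-l) = \sum_{|j|<n} (n-|j|)\,R(j),$$

and (18.2.1) is proved. Since

$$R(j) = \int_{-\pi}^{\pi} e^{ij\lambda}\,dF(\lambda), \qquad V(S_n) = \int_{-\pi}^{\pi} \sum_{|j|<n} (n-|j|)\,e^{ij\lambda}\,dF(\lambda),$$

and since

$$\sum_{|j|<n} (n-|j|)\,e^{ij\lambda} = n + 2\operatorname{Re}\sum_{j=1}^{n-1}(n-j)\,e^{ij\lambda} = \frac{\sin^2(\tfrac12 n\lambda)}{\sin^2(\tfrac12\lambda)}, \tag{18.2.4}$$

(18.2.2) is proved.

Now let $f(\lambda)$ exist and be continuous at $\lambda = 0$. Integrating (18.2.4) we have

$$\int_{-\pi}^{\pi} \frac{\sin^2(\tfrac12 n\lambda)}{\sin^2(\tfrac12\lambda)}\,d\lambda = 2\pi n, \tag{18.2.5}$$

and hence

$$\big|V(S_n) - 2\pi f(0)\,n\big| = \left|\int_{-\pi}^{\pi} \frac{\sin^2(\tfrac12 n\lambda)}{\sin^2(\tfrac12\lambda)}\,\{f(\lambda)-f(0)\}\,d\lambda\right|$$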
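$$\le \max_{|\lambda|\le n^{-1/4}} |f(\lambda)-f(0)| \int_{-\pi}^{\pi} \frac{\sin^2(\tfrac12 n\lambda)}{\sin^2(\tfrac12\lambda)}\,d\lambda + \int_{n^{-1/4}<|\lambda|\le\pi} \frac{\sin^2(\tfrac12 n\lambda)}{\sin^2(\tfrac12\lambda)}\,|f(\lambda)-f(0)|\,d\lambda$$

$$\le 2\pi n \max_{|\lambda|\le n^{-1/4}} |f(\lambda)-f(0)| + O(n^{1/2}) = o(n),$$

and the theorem is proved. •

Theorem 18.2.2. If

$$\lim_{n\to\infty} R(n) = 0,$$

then either

$$\lim_{n\to\infty} V(S_n) = \infty$$

or

$$\limsup_{n\to\infty} V(S_n) < \infty;$$

the latter possibility holds if and only if

$$X_n = Y_{n+1} - Y_n, \tag{18.2.6}$$

where $Y_n = U^nY$, $Y \in L_\infty(X)$.

Proof. We shall regard the sums $S_n$ as elements of the Hilbert space $L_\infty(X)$. Then the assertion that

$$\liminf_{n\to\infty} V(S_n) < \infty$$

implies that there is a sequence $(S_{n_j})$ with $\|S_{n_j}\|^2 = V(S_{n_j}) < C < \infty$. Since a closed sphere in Hilbert space is weakly compact [2], there is a subsequence $\{m_j\}$ of $\{n_j\}$ such that $S_{m_j}$ converges weakly to some element $-Y$ of $L_\infty(X)$; that is, for all $\xi \in L_\infty(X)$,

$$\lim_{j\to\infty} E(S_{m_j}\xi) = -(Y,\xi) = -E(Y\xi).$$

But then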
324 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 and so, for all feL = (Xu ^) -lim (Xmjt 0 = E(X1 ?) - lim fi Every ^eL^X) is a limit in mean square of sums of the form N jj j=-N and since R(ri)-*0 we have \imE(Xm.Z) = 0. Thus, if liminf V(Sn)< oo , then for all f e ?,„ (). - y, a = Taking, in particular, <^ = L/y— y — X1} we have with probability 1, X1=UY-Y=Y1-Y0, Xn = U"+1Y-U"Y=Yn+1~Yn. In such a case, for all n, c — y _ y so that Hence the theorem is proved. • Remark. The theorem shows that, if R(n)-^-0, then V(Sn) necessarily converges to a, possibly infinite, limit. Corollary 18.2.1. Under the conditions of Theorem 18.2.2, lim V{Sn) = if cosec2(^)dFB). A8.2.7) J
Proof. If $\lim V(S_n) < \infty$, (18.2.6) shows that the spectral functions of $X_n$ and $Y_n$ are linked by the relation

$$dF(\lambda) = |e^{i\lambda}-1|^2\,dF_Y(\lambda), \tag{18.2.8}$$

whence

$$\lim_{n\to\infty} V(S_n) = 2E(Y_0^2) = 2\int_{-\pi}^{\pi} dF_Y(\lambda) = \tfrac12\int_{-\pi}^{\pi} \operatorname{cosec}^2(\tfrac12\lambda)\,dF(\lambda).$$

Conversely, suppose that

$$\int_{-\pi}^{\pi} \operatorname{cosec}^2(\tfrac12\lambda)\,dF(\lambda) < \infty.$$

Then, by Theorem 18.2.1,

$$\limsup_{n\to\infty} V(S_n) = \limsup_{n\to\infty} \int_{-\pi}^{\pi} \frac{\sin^2(\tfrac12 n\lambda)}{\sin^2(\tfrac12\lambda)}\,dF(\lambda) \le \int_{-\pi}^{\pi} \operatorname{cosec}^2(\tfrac12\lambda)\,dF(\lambda) < \infty,$$

and the argument just given gives (18.2.7).

Let $(X_n)$ be a regular stationary sequence. We have proved that, for any $Y \in H_\infty(X)$, the stationary sequence $Y_n = U^nY$ has a spectral density, and thus by the Riemann-Lebesgue lemma,

$$\lim_{n\to\infty} R_Y(n) = \lim_{n\to\infty} \int_{-\pi}^{\pi} e^{in\lambda} f_Y(\lambda)\,d\lambda = 0.$$

Thus the conditions of Theorem 18.2.2 are always satisfied for such sequences. Assuming the uniform mixing condition, the theorem can be considerably strengthened.

Theorem 18.2.3. If a stationary sequence $X_n$ is uniformly mixing, and if

$$\lim_{n\to\infty} V(S_n) = \infty,$$

then

$$V(S_n) = n\,h(n), \tag{18.2.9}$$

where $h(n)$ is a slowly varying function of the integral variable $n$. Moreover, $h(n)$ has an extension to the whole real line which is slowly varying.

The theorem therefore asserts that $V(S_n)$ is either bounded or almost linear.

Proof. This is quite long, and will therefore be divided into several parts.
326 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 (I) Writing \j/(n)=V (Sn), we have first to prove that, for every integer k, lim , . . = k . We write s=l s=l (fc-l)r v-i s=l where r=[log t^(n)]. Since by Theorem 18.2.1, 1 n oi*-i2 _K sin ^/.j we have r = 0(log n). Clearly k k snk= and = t Etf + 2 E ?^y J=l Since Xn is stationary, ^=F(Sn) = .A(n). A8.2.12) Using Theorem 17.2.3 with p = q = 2,we have for i^j, \E?iZj\ < 2*(|i-;|)*(?#)*(?#)* < 2^(r)*^(n), A8.2.13) where (J)(t) is the uniform mixing coefficient. Finally, by A8.2.10), A8.2.14) and similarly
18.2. THE VARIANCE OF Xl + ... + Xn 327 j\ <^M = 0{\og ij/{n)J . A8.2.15) Since r increases with n, 4>(r)=o(l) as n-»oo. The relations A8.2.11)— A8.2.15) therefore show that so that if/(n) is of the form A8.2.9), where h(n) is slowly varying. (II) We now list the properties of h(n) which admit its extension to a slowly varying function of a continuous variable. Lemma 18.2.1. For fixed k, lim h(n + k)/h{n)=l . A8.2.16) n-»oo Proof. Since ^(n)-»oo as n-»oo, the stationarity gives ( n+fc \2 / « n+fc so that h(n + k) n \j/(n + k) n h(n) n + k \j/(n) n + k Lemma 18.2.2. For all ?>0 , lim nE h (n) = oo , n-»oo lim n-?h{n) = 0. A8.2.17) Since lim and using A8.2.16), we have log h(n) = X log {h{[2-'ri])/h{[2-'-l ri])} = o(\og n)
328 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 Lemma 18.2.3. Ifn is sufficiently large, then sup ?^4. A8.2.18) n\n) Proof Fix a number m so large that 4> (m) ^ tV We examine the case r > §n, the other case r ^ f n being similar. From the equation r + m n n + m r + m ZXj=ZXj+ I *j+ Z Xj, j=1 j=l j=n+l j=n+m+l we find that where Since we have, for large n, where #i>jf> d2>0. Consequently, for large n, h(n) 3 h(n) Lemma 18.2.4. For all sufficiently small c and all sufficiently large n, W)<c~'- (I8-219) (Of course, A8.2.19) only holds if en is an integer.) Proof. From what has been proved about h(n), U(rn\ [~logc/log2] {log/i(cn)-log/i[2-tlogc/log2]n]}<ilogc-1 .
18.2. THE VARIANCE OF Xl + ... + Xn 17^ We remark that A8.2.19) holds for all c<c0, where c0 does not depend on n. (Ill) Using Theorem 18.2.1, we now extend the functions i//(n), h(n) to the interval @, oo) by the equations h{x)= x ¦ We have to prove that, for all real a > 0, lim^ = a. A8.2.20) As x-»oo, tA(x) = «A([x])(l+0(l)), so that when a = k is an integer, using Lemma 18.2.1. Ua=p/q, where p, q are integers, then A8.2.21) gives lim >T , 7 = lim so that A8.2.20) is proved for rational values of a. For any positive a, define so that \j/^d) = xj/2{a) = a for rational a. It thus suffices to prove that and \\i2 are continuous. But I'* sin2(^?xA) , r sin(?xA)sin(ax/)
330 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 so that it suffices to establish the continuity of ij/1 and \\i2 at zero. Using A8.2.19) we have, if a is sufficiently small, as x-»oo, Consequently the functions t/^ and \\i2 are continuous and the theorem is proved. • The reader will notice that in the proof of theorem 18.2.3 we did not use the full force of the uniform mixing condition, but only the inequality n + m + p I xj i2 / m E ? X, J i= 1 j=n + p Thus the conclusion of the theorem remains true if one only assumes that A) V(Sn) = if/{n)-+co (n->oo), B) For any e>0, there exist numbers p, JV such that, for n,m>N, / n \ / n + m + p \ ?(XX,) X X,-] ^ sij/{n)il/(m). \i= 1 / \ j=n + p / Using Karamata's theorem, we can throw the conclusion of the theorem into the following form. Corollary 18.2.2. Under the conditions of Theorem 18.2.3, V{Sn)=Cn{l+o{l))expH" 8-^du\, A8.2.22) where C>0 and ?(w)-»0 (w-»oo). § 3. The variance of the integral Jj X(t)dt In this section we extend the results of § 2 to the continuous time case, setting S(T)=[ X{t)dt. J o
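As an illustration of the quantity $V\{S(T)\}$ studied in this section, consider a worked example. The covariance function $R(t) = e^{-|t|}$ used here is an assumption chosen for the example and is not taken from the text. Since $EX(t) = 0$, the variance of $S(T)$ is the double integral of $R(t-s)$ over the square $[0,T]^2$, which can be evaluated directly.

```latex
% Assumed example: R(t) = e^{-|t|}, so that f(\lambda) = 1/(\pi(1+\lambda^2)).
\begin{align*}
V\{S(T)\}
  &= \int_0^T\!\!\int_0^T e^{-|t-s|}\,dt\,ds
   = 2\int_0^T (T-t)\,e^{-t}\,dt \\
  &= 2\Big[T\big(1-e^{-T}\big) - \big(1 - e^{-T} - Te^{-T}\big)\Big]
   = 2\big(T - 1 + e^{-T}\big).
\end{align*}
% Hence V{S(T)} = 2T + O(1), and 2T = 2\pi T f(0) because f(0) = 1/\pi.
```

In this example the variance grows linearly, $V\{S(T)\} = 2T + O(1)$, in agreement with the asymptotic formula (18.3.3), since $2\pi T f(0) = 2T$.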
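Theorem 18.3.1. The variance of $S(T)$ is expressed in terms of the autocovariance function $R(t)$ and the spectral function $F(\lambda)$ by the equations

$$V\{S(T)\} = \int_{-T}^{T} \{T-|t|\}\,R(t)\,dt, \tag{18.3.1}$$

$$V\{S(T)\} = 4\int_{-\infty}^{\infty} \frac{\sin^2(\tfrac12 T\lambda)}{\lambda^2}\,dF(\lambda). \tag{18.3.2}$$

If there is a spectral density $f(\lambda)$ continuous at $\lambda = 0$, then

$$V\{S(T)\} = 2\pi T f(0) + o(T). \tag{18.3.3}$$

Proof. We have

$$V\{S(T)\} = \int_0^T\!\!\int_0^T E\{X(t)X(s)\}\,dt\,ds = \int_0^T\!\!\int_0^T R(t-s)\,dt\,ds = \int_{-T}^{T} (T-|t|)\,R(t)\,dt,$$

and since

$$R(t) = \int_{-\infty}^{\infty} e^{it\lambda}\,dF(\lambda),$$

$$V\{S(T)\} = \int_{-T}^{T} (T-|t|) \int_{-\infty}^{\infty} e^{it\lambda}\,dF(\lambda)\,dt = \int_{-\infty}^{\infty} dF(\lambda) \int_{-T}^{T} (T-|t|)\,e^{it\lambda}\,dt = 4\int_{-\infty}^{\infty} \frac{\sin^2(\tfrac12 T\lambda)}{\lambda^2}\,dF(\lambda).$$

The proof of (18.3.3) is just the same as that of (18.2.3).

Theorem 18.3.2. If $R(t)\to 0$ as $t\to\infty$, then either

$$\lim_{T\to\infty} V\{S(T)\} = \infty$$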
332 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 or limsup V{S{T)}< oo , r-»ao the latter possibility occurring if and only ifX(t) is the mean square deri- derivative X{t) = ^^ = lim h-^Yit + Q-Yit)} , A8.3.4) of a process Y{t)=Ut Y , where YeLx(X). Proof. Suppose that V{S(T)} does not converge to oo. Then as in the proof of Theorem 18.2.2 there exists a sequence Tn-»oo such that S(Tn) converges weakly in Lx (X) to an element Y of Lx (X), so that for all 0 = (Y,0- A8.3-5) n-»oo We show that Y (t) = Ul Y is differentiable in mean square, and that dt Fix a number t, and note that, from A8.3.5), for any ? e L^ (X), Z) = (U*Y,{;). A8.3.6) Since R(t)->0, for all ^l9 ^2eL00(X). Hence, using A8.3.6), = (t I X{t)dt, n - lim t / n->oo V -I Tn Since ^ is arbitrary,
18.4. STRONGLY MIXING SEQUENCES 333 UXY-Y 7(t)-7@) ,r _ = __U U = T-i X{t)dt, A8.3.7) T Jo T and since R(t) is continuous, lim ? T-»0 - 1 f E{X{t)-X{0)}{X{s)-X{0)}dtds = t-»0 J OJ 0 ( [ t-»0 J oJ 0 This shows that the mean square derivative of X(t) exists at t = 0, and equals X@), proving A8.3.4). Conversely, if X(t)= Y'(t), then as T-»oo, V{S{T)} = E{Y(T)-Y{0)} = = 2?G2)-2?{7@O(T)}->2?G2). • The techniques of the last section lead in a similar way to the following results. Corollary 18.3.1. IfR(t)-+0 as r-»oo, then /-co lim V{S(T)} = 2\ X~2dF(X). T-* oo J — oo Theorem 18.3.3. 7/f/ie stationary process X(t) satisfies the uniform mixing condition, and if lim V{S{T)} = oo , V{S(T)} = Th{T), where h(T) is slowly varying at infinity. § 4. The central limit theorem for strongly mixing sequences Let (Xj) be a stationary sequence with E(Xj) =0, ?(X2) < oo, and set n + m
334 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 We shall say that the sequence satisfies the central limit theorem if lim P [^ < z I = Btc)-± I" e-"dw = <Z>(z). n-»oo I ® n J J — oo It is of course sufficient to prove this for the sequence Sn = S*. Theorem 18.4.1. Let the sequence Xj satisfy the strong mixing condition with mixing coefficient cc(n). In order that the sequence satisfy the central limit theorem it is necessary that A) a2 = nh(n), where h(x) is a slowly varying function of the continuous variable x>0, B) for any pair of sequences p = p(n), q = q{n) such that (a) p-»oo, <?->oo, q = o(p), p = o(n) as n-»oo, (b) \imn1-pq1 + pp-2 = Q forall ?>0, A8.4.1) n-* oo (c) lim np~icc(q) = O , n—* oo and any s > 0, the distribution function satisfy lim -\ [ z2dFp{z) = 0. A8.4.2) n-»oo P^n J |z|>?<rn Conversely, if (I) holds and if(l 8.4.2) is satisfied for some choice of the func- functions p, q satisfying the given conditions, then the central limit theorem is satisfied. Proof. We first establish the necessity of A). From Theorem 18.1.1 it follows that h (n) is slowly varying in its integral argument. Let the distri- distribution function converge to &(z) as n-»oo. Then for fixed JV, z2dFn(z)-> f z2d<P(z), so that
18.4. STRONGLY MIXING SEQUENCES 335 and If z*dFn{z) = J\z\>N [ f lim lim N-»oo n-»oo J \z\ >! n- 1 z2d 1 -jz<^z2dFn I' z2d<Z>(z), J|z|>N FB(z) = O. V Y z2d<P(z) = A8.4.3) j = 0 j = n+1+ p then From the remark at the end of § 18.2, we have only to show that, for each ?>0, there exists p = p(e) such that |?(&j)| ^ eE(?2) • A8.4.4) Using the arguments of Theorem 17.2.2 it is easy to show that for any A^. Choosing Ni=aa(p)~i, we have J f z)V . A8.4.5) J The strong mixing condition shows that, by suitable choice of p, we can make \E(^r\)\ smaller than ea2 for sufficiently large n. Thus we have proved the necessity of the condition A), which will henceforth be assumed. The remaining parts of the proof are more complicated, but proceed in outline as follows. We represent the sum Sn in the form i m = s'n+s:t A8.4.6) i=0 i = 0 where ip + iq + 1
336 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 Xj, @<i<*-1), A8.4.7) kp + kq+ 1 where p and q satisfy B), and k=[n/(p + q)"\. This corresponds to the decomposition of the normalised sum Zn. Under the conditions imposed on p and q, we show that SJ,' is negligible, and that the ^ are nearly independent. We first have to verify that the conditions imposed on p and q can indeed be satisfied; otherwise the theorem would be somewhat trivial. To do this we set k(n) = max {a[/r]*, (log n)} , jr»«(»*r 1L X{n) _ 'LA(n). 9 = !>*] • Then, all the conditions A8.4.1) are satisfied; (a) p-»oo, q-*co, p = o(n), q = o(p) as n-»oo, (b) n1-^1+^-2 = O(n-A + 3/?)/4) = o(l),i (c) wp'^^^a^] ,. /.-> 0, as w->oo. To show that Z^,' is negligible, we need the following lemma. Lemma 18.4.1. If the distribution function Fn(x) of the random variable ?„ converges weakly as n-»oo to the distribution function F(x), and ifr\n con- converges to zero in probability, i.e. lim P(K|>?)-»0 n-»oo for all ? > 0, then the distribution function of ?„ = ?„ + r\n converges weakly to F(x). Proof. Let f(t) be the characteristic function of F(x), so that lim E(citin)=f(t).
STRONGLY MIXING SEQUENCES 337 Thus lim sup \EQit(in+"n)-f{t)\ lim |? e"{" -f(t) \ + lim sup E \eitn" -11 n-»oo ^ lim sup I |e"x-11dP(r]n<x) + 2 lim P(rjn\ >e) for any positive s. To continue the proof along the lines suggested, we show that lim ?|Z;'|2 = 0, n-»oo which, since shows that Z|,'-»0 in probability. We have E\Z':\2=a~2 E E(rjirjj) + 2<j;2 ? E^ + a;2 E(r,2k)^ S Jt 1 « i, j =S Jt — 1 where q' = n — (p + q) [n/(p + q) ] ^ p + q is the number of terms in nk. From the properties of h(n) (Lemma 18.2.4) and the requirements imposed on k, p, q, f(q\' h(nq/n)l ni+'ql~' as n->oo, by A8.4.1). Similarly, k{qh(q)q'h(q')}i = \ kqh{q)\^kqh{q)\ nh(n) 1 nh{n)) \ nh{n)\ " < {Hq/np}*{kit/rip}*-+0, A8.4.10) and ^4^-0. A8.4.11)
338 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 Combining A8.4.8)-A8.4.11), we see that lim n —* oo as required. From Lemma 18.4.1 it follows that the limiting distribution of Zn is the same as that of Z'n, to the investigation of which we now turn. We denote the characteristic function of c^ by (j)n(t), and prove that \E(eitZ")-(j)n(t)k\-+O A8.4.12) as n-»oo. The variable exp it is measurable with respect to 9ft(*001)p+(*~2k, and ~it exp — is measurable with respect to SQl(*_ i)ir+<ifc-d«+ i- By Remark 17.2.1, k-l n r i* k-2 it [it 1 ?exp - X ^ -?exp L^ J and similarly, for / ^ k — 2, 7 I ^ ^exp —ft_J 16a(q), E exp Hence — j] ^- - ? exp — J] ^. L<7nJ = O J L^j^O J ? exp — which tends to zero by A8.4.1), and proves A8.4.12). Now consider a collection of independent random variables ft,. (n=l,2, ... ;;=l,2,...,/c = /c(n)), where ft,- has the same distribution as o~1 ?0. Then A8.4.12) asserts that the limiting distribution of Z'n is the same as that of (which has characteristic function (f)n{tf). The results of § 1.7 show that
this limiting distribution is $\Phi(x)$ if and only if, for every $\varepsilon > 0$,

$$0 = \lim_{n\to\infty} \sum_{j=1}^{k} \int_{|z|>\varepsilon} z^2\,dP(\zeta_{nj} < z) = \lim_{n\to\infty} k\int_{|z|>\varepsilon} z^2\,dP(\sigma_n^{-1}\xi_0 < z).$$

But

$$k\int_{|z|>\varepsilon} z^2\,dP(\sigma_n^{-1}\xi_0 < z) = k\sigma_n^{-2}\int_{|z|>\varepsilon\sigma_n} z^2\,dP(\xi_0 < z),$$

and the theorem is proved.

We remark that the only part of the proof in which (18.4.1(b)) was used was in the proof that $E|Z_n''|^2 \to 0$.

The theorem simplifies if we assume that $V(S_n)$ is asymptotically linear, as it will be, for instance, if the spectral density $f(\lambda)$ exists and is continuous at $\lambda = 0$, with $f(0) \ne 0$.

Theorem 18.4.2. If $X_j$ is strongly mixing, and $V(S_n) = \sigma^2 n(1+o(1))$ as $n\to\infty$ ($\sigma > 0$), then $X_j$ satisfies the central limit theorem if and only if

$$\lim_{N\to\infty} \limsup_{n\to\infty} \int_{|z|>N} z^2\,dF_n(z) = 0, \tag{18.4.13}$$

where $F_n(z)$ is the distribution function of the normalised sum

$$Z_n = \sigma_n^{-1}\sum_{j=1}^{n} X_j.$$

Proof. If $F_n$ converges weakly to $\Phi$, then for fixed $N$,

$$\int_{|x|\le N} x^2\,dF_n(x) \to \int_{|x|\le N} x^2\,d\Phi(x)$$

as $n\to\infty$. Since the variance of $F_n$ is 1, this implies that

$$\int_{|x|>N} x^2\,dF_n(x) \to \int_{|x|>N} x^2\,d\Phi(x),$$

so that (18.4.13) is a necessary condition.
340 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 Conversely, if A8.4.13) is satisfied, and al = a2n(\+o{\)), we have \z\>can °n J \z\>Lan\ap = (l + o(l))[ z2dFp(z)-»0 A8.4.14) J |z|>?fc(l+0(l)) as rc-»oo, since /c-»oo, p-»oo. • § 5. Sufficient conditions for the central limit theorem In this section we investigate some conditions on the moments of X}, and on the mixing coefficients a (n), (f) (n), which guarantee that Xj satisfies the central limit theorem. Theorem 18.5.1. Let the uniformly mixing sequence Xj satisfy E |Xy|2 + 5 < oo for some E>0. // as n->oo, then Xj satisfies the central limit theorem. Proof. We show that all the conditions of Theorem 18.4.1 are satisfied. By Theorem 18.2.3, so that condition A) is fulfilled. To verify condition B) we need the follow- following lemmas. Lemma 18.5.1. Under the conditions of the theorem, if 5 < 1, there exists a constant a such that n 2 + 5 Proof We denote constants by cl5 c2, ..., and write n 2n + k Sn= ? XJt Sn= X Xj, an=E\Sn\2+*. j=l j=n+k+l
18.5. SUFFICIENT CONDITIONS 341 We show that, for any s1 >0, we can find cx and k such that E\Sn + Sn\2+s ^ B + 81)an + c1G2+d • A8.5.1) In fact, \Sn\1+5. A8.5.2) Because of the stationarity, Sn and Sn have the same distribution, and E\Sf+*=E\Sn\2+d=an. By Theorem 17.2.3 (with p=B + <5)/(l+<5)), E\Sf+*\Sn\^2(l>(ky+^2+*an + E\Sn\1+*E\Sn\. A8.5.3) Using the theorem again, but with p = 2 + d, E[Sn\\Sn\l+d^2(t>(k)^2+^an + E\Sn\E\Sf+d A8.5.4) By Lyapunov's inequality (§ 1.4), E\Sn\^aHt ?|SJ1+^an1+5. A8.5.5) Inserting the inequalities A8.5.3)-A8.5.5) into A8.5.2), we have To prove A8.5.1) it suffices to take k so large that $<j)(kyi{2+d)^ &x. We now show that, for any s2 > 0 there is a constant c2 for which A8.5.6) In fact, using Minkowski's inequality and A8.5.1), we have for large n, a2n= E 2n n + k 2n + k sn+ X xj+sn- j=2n+l n + k 2 + 6 2n + k ) 2 + 5 ~ [E\Xj\2+dyB+3)> j=2n +
342 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 where since <7n->co, 8 "LP + ^K + c^H as n->co. If we choose JV so large that, for n^JV, then A8.5.6) holds for n ^ JV, with c2 = 2c ^ in place of c2. But we can choose c2 so that A8.5.6) holds also for n<JV, and so A8.5.6) is proved. Because of A8.5.6), for any integer r, where ¦hBT- V- 1 ~|1+*' J We show that, for sufficiently small s, yr is bounded, yr<c3. The function h (n) is slowly varying so that, for any ?3 > 0, there exists JV such that, for ^ l+e3. For any integer / in 2^/^r— 1, h{2r~l) (h{2r-2) h{2r-s) \//iB'-s-1) h{2r~l) Here we choose s so that 2r~s~1<JV^2r~s, so that r~l) If ?3 and ? are chosen so small that we obtain Thus, for this choice of e,
18.5. SUFFICIENT CONDITIONS 343 A8-5-7) Now let 2r<n<2r+1, and write n in binary form n=vo2r + v12r~1 + ... + vr (vo = l> Vj=O or 1). We write Sn in the form where the number of terms in thejth parenthesis is v}2r~K Using Min- kowski's inequality and A8.5.7), and remembering that (AT,-) is stationary, we have 4- -\-X j=0 1\ Li a2r-J \j=0 2+d 2+d j= But j=0 "n j = 0 and since (Lemma 18.2.3), h{2') h(n) SUp SUp < GO , we have only to prove that ? is bounded, which is true since the/th term is bounded by csp{ for some Pi<l. Thus the lemma is proved. • It is now not difficult to complete the proof of the theorem. We have to prove that lim n n-»oo f^n . z z2dFp(z) = 0. By Lemma 18.5.1,
344 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 0 , ?>; \h(n)J as n->oo, by A8.4.1 (a) and Theorem 18.2.3. • If the mixing coefficient (j)(n) is required to decrease reasonably fast, we can remove the moment condition imposed on the Xj. Theorem 18.5.2. Let the stationary sequence Xj satisfy the uniform mixing condition, and let the mixing coefficient (f)(n) satisfy Then the sum oo (J2 = E(X2O) + 2ZE(X0Xj) A8.5.9) converges, and ifcr^O, as n-»oo, (Z e-*dw. A8.5.10) Proof By Theorem 17.2.3, whence the convergence of A8.5.9) follows. As in A8.2.1), so that, if <7#0, crn->oo. We deduce the validity of A8.5.10) from Theorem 18.5.1. Since the sequence Xj is uniformly mixing, so is the sequence/N(X/), where (x,(\x\^N), JN\X) — } A /, . @, (|x|
18.5. SUFFICIENT CONDITIONS 345 with a mixing coefficient < (f) (n). Clearly so that we can apply Theorem 18.5.1. As N-+CO, E{fN(Xj)}-+0, E{fN(Xo)fN(Xj)}-+E{XoXj} . Thus since a # 0 it follows that, for large N, = E{fN(X0)-EfN(X0)}2 + 2 X E{fN(X0)-EfN(X0)} {fN(Xj)- EfN(Xj)} > For such N as n->oo, Thus all the conditions of Theorem 18.5.1 are satisfied and consequently lim pI^N)-1 t lfN(Xj)-EfN(Xj)-]<z\ = n—>oo We have to consider the normalised sum 7=1 where y [/Nft)-?/N(x; " _ 7= 7" _ an* fN(x) = x-fN{x).
346 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 We first estimate an + 2 X (n-j)E[fN(X0)-EfN(X0)-][fN(Xj)-EfN(Xj)-] By Theorem 18.2.3, fory>0, \E Un (*o) ~ EfN (Xo)-] [fN(Xj) - EfN(Xj)-] | ^ ^ 20 (if E [fN (Xo) - EfN (X9)Y ^ rN(j> (jf where 0@) = 1 and rN = 2E[fN(X0)']2. Thus, since rN-*0 (N->oo), { ° I ;=i J as N-+co. For a given e>0, choose N so that then the characteristic function /„ (t) of Zn satisfies ?e"z"-exp - <T2{N) tf a2  1 - a2(N) and the theorem is proved. • We now turn to sequences which are strongly mixing without necessarily being uniformly mixing. Naturally, stronger conditions are necessary to ensure normal convergence. Theorem 18.5.3. Let the stationary sequence Xj satisfy the strong mixing condition with mixing coefficient cc(n), and let E\Xj\2 + 3< oo/or some 5>0.If
$$\sum_{n=1}^{\infty} \alpha(n)^{\delta/(2+\delta)} < \infty, \tag{18.5.11}$$

then

$$\sigma^2 = E(X_0^2) + 2\sum_{j=1}^{\infty} E(X_0X_j) < \infty, \tag{18.5.12}$$

and if $\sigma \ne 0$, then

$$\lim_{n\to\infty} P\Big\{\sigma^{-1}n^{-1/2}\sum_{j=1}^{n} X_j < z\Big\} = \Phi(z). \tag{18.5.13}$$

Before proving this theorem, it is convenient to deal with the case of bounded variables, to which (as in the proof of Theorem 18.5.2) the general theorem may be reduced.

Theorem 18.5.4. Let the stationary sequence $X_j$ be strongly mixing, with

$$\sum_{n=1}^{\infty} \alpha(n) < \infty,$$

and let $X_j$ be bounded; $P(|X_j| < c_0) = 1$. Then

$$\sigma^2 = E(X_0^2) + 2\sum_{j=1}^{\infty} E(X_0X_j) < \infty \tag{18.5.14}$$

and, if $\sigma \ne 0$,

$$\lim_{n\to\infty} P\Big\{\sigma^{-1}n^{-1/2}\sum_{j=1}^{n} X_j < z\Big\} = \Phi(z). \tag{18.5.15}$$

Proof. The convergence of the series (18.5.14) follows from the inequality

$$|E(X_0X_j)| \le 4c_0^2\,\alpha(j)$$

(see (17.2.2)). From this, as in the proof of Theorem 18.5.2, it follows that

$$V\Big(\sum_{j=1}^{n} X_j\Big) = \sigma^2 n(1+o(1)),$$

and that consequently

$$\lim_{n\to\infty} P\Big(\frac{S_n}{\sigma n^{1/2}} < z\Big) = \lim_{n\to\infty} P\Big(\frac{S_n}{\sigma_n} < z\Big),$$

so long as either limit exists.
348 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 It is difficult to prove the theorem from the results of § 4, because the system of functions p, q is too crude to make effective use of all the condi- conditions assumed. It is easier to show how the arguments of § 4 can be refined for our particular special case. We first estimate the moment ?(?"= i XtL. Lemma 18.5.2. Under the conditions of Theorem 18.5.4, =o(n3). A8.5.16) Proof. We have t xX = nE(Xt)+ I E(X?X}) + ? E(X?Xj) 7=1 / i*j ij E(X?XjXk)+ ? EiXtXjXtXj). A8.5.17) i*jkl The number of terms in the second and third sums is 0(n2), so that it suffices to estimate the fourth and fifth. By Theorem 17.2.1 E(X2XjXk)=o(jjJEX?XjXk\^ = = ^(.<Zfc4oc(/c-;)Vo(n2M and = c(.<Z<i \exJjXm\~ = o( Z c$min(a(/-i),a(/-*))) = = o(n2 But n V jcc(j) ^ n 7=1 7 L a(/) + n Z =S n % 7 > n Vi We have thus proved that where y (n) -> 0 and y (n) = sup y (/).
18.5. SUFFICIENT CONDITIONS 349 We return to the proof of Theorem 18.5.4, defining ?h r}t by the equations A8.4.3), with n • "?¦ *" n p + q where the minimum is taken over integers p. We show that p and q satisfy A8.4.1 (a), (c)). (a) Clearly, as n-»oo, p-*oo. By Lyapunov's inequality, *( .? Xif > { E( i Xjjf = ^n2 (l+o(l)), proving that lim ny(n)>0 . n—>oo Thus for large n, p<n^(log nJ, so that p = o(n) and q-+oo. Since p-*oo, q = o(n). (b) Since a(n) is monotone and Za(n)<oo, () < I so that The condition A8.4.1 (b)) is not in general satisfied. However, as remarked at the end of the proof of Theorem 18.4.1, this condition was only used to prove that 1 X *J=0, A8.5.18) ? = 0 / and if we can find some other way of proving this, the rest of the argument will go through unchanged. If then we can prove A8.5.18), it will suffice to show that ^ \ t \ ¦ A8.5.19) z2dp\ t Xj<z\=0, for every e > 0.
350 CENTRAL LIMIT THEOREM FOR STATIONARY PROCESSES Chap. 18 We have k \2 '."' I n.) - i=o n (k-j)Er,or,j and by Theorem 17.2.1, h\^clq2a(p\i-j\), n i=0 Moreover, so that = O(p + q) = O(p) Since a(n) is monotone, fc-l fc-l jp-l 7=1 7=1 s = U-l)P so that it follows from A8.5.20) that 7 = 0 - i 7=1 (k-j)Erjorjj n 7= ,2 oo 7 = 0 as n->oo. Similarly, 7 = 0 Combining A8.5.21)-A8.5.23) we obtain A8.5.18). Finally, by Lemma 18.5.2, as n-»oo, oo / p z4dP _3 ?". ?2G41 and Theorem 18.5.4 is proved. A8.5.20) A8.5.21) A8.5.22) A8.5.23)
18.5. SUFFICIENT CONDITIONS 351 Proof of Theorem 18.5.3. The convergence of the series A8.5.12) follows quickly from Theorem 17.2.2 and the convergence of ?oc(/)<5/B+<5). As in the proof of Theorem 18.5.2, we introduce the functions fN,fN, and con- consider the stationary sequence fN{Xj). Since Za(/) converges, this sequence satisfies all the conditions of Theorem 18.5.4, and thus satisfies the central limit theorem. Now set 7=1 where / J= 7' — 7=1 z: = n a2(N) = E{fN(X0)-EfN(X0)} + 2 | E{fN(X0)-EfN(X0)}{fN(Xj)-EfN(Xj)} . 7=1 Using A7.2.4), we have ?|Z;f = a'2 {E[fN(X0)-EfN(X0J + + 2 ? (n-j)E[fN(X0)-EfN(X0)-][fN(XJ)-EfN(Xj)-]} 7=1 where C is a constant, and A ^ n an integer. This implies that lim ?|Z;f = 0 n—> oo uniformly in n. The proof is then completed in the same way as that of Theorem 18.5.2. •
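The limiting variance $\sigma^2 = E(X_0^2) + 2\sum_{j\ge1} E(X_0X_j)$ that appears in the theorems of this section can be illustrated on the simplest weakly dependent example. The sketch below is an assumption-laden illustration, not part of the text: it takes a finite moving average $X_t = \sum_i c_i\varepsilon_{t+i}$ of independent standard normal $\varepsilon_t$ (an $m$-dependent sequence, whose mixing coefficients vanish beyond lag $m$), computes $\sigma^2$ and $V(S_n)$ from the autocovariances, and simulates the normalised sums. All function names are hypothetical.

```python
import math
import random


def ma_autocovariance(coeffs, j):
    """R(j) for X_t = sum_i coeffs[i] * eps_{t+i}, eps i.i.d. with variance 1."""
    j = abs(j)
    return sum(coeffs[i] * coeffs[i + j] for i in range(len(coeffs) - j))


def long_run_variance(coeffs):
    """sigma^2 = R(0) + 2 * sum_{j >= 1} R(j); the sum is finite here."""
    m = len(coeffs) - 1
    return ma_autocovariance(coeffs, 0) + 2 * sum(
        ma_autocovariance(coeffs, j) for j in range(1, m + 1))


def variance_of_sum(coeffs, n):
    """V(S_n) = sum over |j| < n of (n - |j|) R(j), as in (18.2.1)."""
    return sum((n - abs(j)) * ma_autocovariance(coeffs, j)
               for j in range(-(n - 1), n))


def simulate_normalised_sums(coeffs, n, reps, seed=0):
    """Draw reps copies of S_n / (sigma * sqrt(n)); assumes sigma != 0."""
    rng = random.Random(seed)
    sigma = math.sqrt(long_run_variance(coeffs))
    m = len(coeffs) - 1
    out = []
    for _ in range(reps):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n + m)]
        s = sum(coeffs[i] * eps[t + i] for t in range(n) for i in range(m + 1))
        out.append(s / (sigma * math.sqrt(n)))
    return out
```

For `coeffs = [1, 1]` one has $R(0) = 2$, $R(1) = 1$, so $\sigma^2 = 4$ and $V(S_n) = 4n - 2 = \sigma^2 n(1+o(1))$; the simulated normalised sums should then have mean close to 0 and variance close to 1.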
352 CENTRAL LIMIT THEOREM OF STATIONARY PROCESSES Chap. 18 § 6. The central limit theorem for functional of mixing sequences If Xj is a stationary sequence, then every YeH00(X) defines a stationary sequence Y, = UJ Y. If Xj is strongly mixing, and Y e Hk_k (X) for some finite k, then Yf is strongly mixing. For general YeH^X), this is not neces- necessarily true (see below). If however, it is possible to approximate Y suffi- sufficiently closely by variables in Hk_ k (X), then Y, might be expected to exhibit the central limit behaviour typical of strongly mixing sequences. That this is so under suitable conditions is shown by the theorems of this section. In this section we do not impose moment conditions on the Xj, but the Yj, since they belong to H^ (X), automatically have finite variance. We assume always that ?(Y/) = 0. Theorem 18.6.1. Let Xj be a stationary sequence satisfying the uniform mixing condition, with mixing coefficient (j)(n), and consider the stationary sequence Yj = UjY, where YeH^pQ. Suppose that A) X \E{\Y-E(Y\mk_k)\2}\-<oo, B) 7=1 Then <72 = E(Y20) + 2 t E(Y0Yj) A8.6.1) 7=1 converges, and ifa^O, then lim P (Yi + - + Y* <z)= Bjc)- n-oo V <™* . / Proof The stationary sequence {^f}, where is uniformly mixing, since with the obvious notation, and so, for AeW^^) Q \P(AB)-P(A)P(B)\^4>a(n)P(A),
18.6. CENTRAL LIMIT THEOREM FOR FUNCTIONALS 353 where rl, (n<2s), (f>s(n) < \ A8.6.2) UB) (n>2s). ' By the results of Appendix 3, = ?(Y2)<oo, so that (^f) satisfies the conditions of Theorem 18.5.2, and if 7= then lim P(gl +-^s" < z =Btt)-M e-^du. A8.6.3) n -»op Writing ^s) = Y, — ?{f\ we estimate the autocovariance functions of {Y,} and {rif}. If A8.6.2).gives,withi = [i/], * + ^r@*]. A8.6.4) Replacing Y by ^(os) = ^s ? HM (X), and noting that xp is thus replaced by since (§ 1.1), we have the analogue of A8.6.4),
354 ¦ CENTRAL LIMIT THEOREM OF STATIONARY PROCESSES Chap. 18 . A8.6.5) The convergence of the series A8.6.1) now follows from conditions A) and B), and A8.6.4). Writing where 1 " 1 " 7' — V F(s) 7" — V on j=i an j=1 we estimate ?|Z;'|2 by A8.6.5): Choosing first N, and then s, sufficiently large, we can make E\Z'^\2 arbitrarily small for all n. Moreover, for s large, |1 — (<rs/cr)| can be made arbitrarily small. Consequently, choosing N and s sufficiently large, and then letting n-»oo, we can make the difference arbitrarily small. • If Y has moments of higher order than 2, we can weaken some of the condi- conditions of the previous theorem. Theorem 18.6.2. Let the stationary sequence X-} be strongly mixing, with mixing coefficient cc(n), and consider the stationary sequence Yj=UjY, where YeH^X). If A) ?|7|2+<5<oo for some<5>0, oo B) I k= 1 oo C) I
18.6. CENTRAL LIMIT THEOREM FOR FUNCTIONALS 355 then oo a2 = E(Y20) + 2 ? E(Y0Yj) A8.6.6) 7=1 converges, and ifa^O, then lim pGl + "i+7" <z)= Btt)-* fZ e-*du . A8.6.7) n-»oo ^ ^^2 / J - oo Proo/ Define ?f and ^f as before. By A), (\Yj\2+d\WSj+j)} = A8.6.8) and by Minkowski's inequality, E\rjf\2+i= E\Yj-?f\2+'^22+'E\Y\2 + ' . A8.6.9) From A8.6.8) and Theorem 17.2.2, |??<?>?f| = O[a(/"-2sM/B + 5)] , A8.6.10) and, using Holder's inequality, \E?f #1 < {?|^)|B + WA+5)}A+5)/B+5){?|^)|2 + 5}1/B + 5), A8.6.11) and By a similar argument to that used in the proof of the last theorem, A8.6.10}-A8.6.12) give where i = [37]. The proof is then completed as before. • Remark 18.6.1. Condition B) of Theorem 18.6.2 is weaker than condi- condition B) of Theorem 18.6.1, since for <5>0,
356 CENTRAL LIMIT THEOREM OF STATIONARY PROCESSES Chap. 18 Theorem 18.6.3. The conclusions of Theorem 18.6.2 remain valid if the conditions (l)-C) we replaced by A) P{\Y\<C) = l, C<oo, B) X E\Y-E{Y\W!Lk)\<ao, k= 1 C) ? a(/c)<co. The proof is a straightforward modification of that of Theorem 18.6.2, using Theorem 18.5.4 and the inequalities CE\n f\ Remark 18.6.2. It is easy to verify that all the theorems proved in this section remain true for a rather wider class of sequences Yj which include some cases of interest. Let $Ftj| (a ^ b) be a family of cr-algebras (not neces- necessarily generated by a sequence (Xj)) such that $R* cz $Rj)', if [a, ?>] cz [a', i>']. We say that this family is strongly mixing if sup sup \P(AB)-P{A)P{B)\ = oi{n)-*O 1^ k as n-»oo. The uniform mixing condition is similarly defined. Suppose there is a measure-preserving transformation T on <$fl-00 for which \; this generates a unitary operator U on the Hilbert space of random variables with finite second moment measurable with respect to 501* 00. Then the theorems apply to the stationary sequence Yj= UjY, where YeH^Jl^^). Returning now to the case in which Yj is generated by the mixing sequence Xj, we investigate more closely the important special case in which Yj is obtained by a linear transformation of Xj. For this to be meaningful we assume without further comment that E(Xj)<oo. Then YeLo0(X), and if Xj has the spectral decomposition e^ — K e^Z(dl), then the results of § 16.7 show that
18.6. CENTRAL LIMIT THEOREM FOR FUNCTIONALS 357 where the kernel CY(X) satisfies \CY{X)\2dF{X)<oo if F(X) is the spectral function of Xy The next two theorems show that, under wide conditions, the linear transformation preserves the central limit property. Theorem 18.6.4. Let Xj be stationary in the wide sense, and let ^ = ^(.1 XjJ->ao as n-*co. Let the sequence Yj=UjY, YeL00(X) be generated by a contin- continuous kernel CY(X). If lim Pi a;1 Y X:<z) = $(z) = Btt)"* e~±" du , *-oo V j=l / J -oo limP/V1 t Yj<z)=*(z/CY(O)). n -»oo \ J=l / Proo/ Suppose first that 7eLN_N(X), iV<oo, so that N N Y = Y c X Y = Y c k=-N k=-N and the kernel is k=-N Then n n N N / n a;1 Y Y^aZ1 Y Y c^+^a-1 Y ckl Y X: + 6n ), j=\ j=lk=-N k=-N \j=l where N r - 1 N + n ~i ^ I N I \Xj\+ I \Xj\\. k=-N Lj=-N j = n+l J
358 CENTRAL LIMIT THEOREM OF STATIONARY PROCESSES Chap. 18 Since on-* oo, I0J-+O in probability, and by Lemma 18.4.1, lim Pie;1 ? Yj<z\= lim pja ? ckfdXj<z\ = Proceeding to the general case, we use Weierstrass's theorem to select a sequence of trigonometric polynomials CN (X) = Z^= _ N ckN eikX converging uniformly to CY(X). Considering CN(X) as the kernel of a linear transfor- transformation of Xj, we construct the stationary sequences YjN= and write YjN • 7=1 From the special case already proved, we have for fixed N, lim P{Z^<z) = <P{z/CN{0)). A8.6.13) n-» oo Write j 7=1 where n 7=1 By virtue of A8.6.13) it suffices to prove that as N-*oo, uniformly in n. Now Yy'N has spectral decomposition and spectral function
18.6. CENTRAL LIMIT THEOREM FOR FUNCTIONALS 359 By Theorem 18.2.1, \j = n where d{N) = max \CY{X)-CY{0)\2 -> 0 as N-+00. Thus and the theorem is proved. • Theorem 18.6.5. Let (Xj) be a sequence of independent, identically distri- distributed random variables, with E(Xo) = 0, E(Xl) < oo, and let k= — oo where oo Z d < °° • as n-*oo, then p/71 + ... + 7n < z\^^2nYi[ e"VdM \ <*n J J -oo Proo/ Clearly oo = Z k— — oo We prove that { ^(T.-Ml+K1)}1. A8.6.14)
360 CENTRAL LIMIT THEOREM OF STATIONARY PROCESSES Chap. 18 In fact, cJ.H = W-1+cJ_ll_1) + 2(cj_1-cj_ll.1)cj-ll-lfll + cJ2.1.ll, A8.6.15) and summing over j = k — /, k — l+i, ..., k gives 00 00 .2 z L ck * -f- l.n 2- + —-—- + —~2— , A8.6.16) in which we can choose / so that c2_,_lt,,/G2 is arbitrarily small, thus yielding A8.6.14). - Writing akn = ckn, we have oo oo 0„ I Ii T ... T i.) = / Ou „ \u , / flt _ == 1 . k= — oo fc = — oo Let (Ej) be a sequence of positive numbers such that e7--*•(). Consider, for each n, the sequence of independent random variables {?nj;j=l,2, ..., 2JV + 2}, where 2-, \k\>N and AT is chosen so that Then 2N + 2 z2dP^nj<z)^\ z2dP(X0<z) + sn = o 7=1 The results of § 1.7 therefore show that Example. We have remarked that there are functionals of strong mixing sequences which do not generate strong mixing sequences, and we here exhibit an example of this phenomenon. Let (e7-) be a sequence of inde- independent random variables, with F(eJ=l) = F(ej=0)=^, and write
18.6. CENTRAL LIMIT THEOREM FOR FUNCTIONALS 361 %= i wir1* 0=1,2,...). k = 0 It is not difficult to see that E\Xl-E{Xl\el1e2i...tek)\2^2-k. If it were true that (Xj) were strongly mixing, then so would the sequence {f{Xj)) be, for any function/. It would then follow from Theorem 18.1.1 that, if Bn is such that 1 if(Xj)<z\->Bn)-*\S j=l J J - then Bl = nh (n) for some slowly varying function h. We choose for / the function k=l where rk is the fe th Rademacher function k rk (x) = sgn sin Bk nx), rk{Xl)= -l+2sk. The random variables rk(Xi) are independent, since if il9 ..., fk are each =is', s=l,2, ..., k) = P(ss=js; s=l,2, s=l Moreover, Erk(X1) = 0. If Yj=f{Xj), then so that (t J "t1 (n-j)EY0Yj>nHl+o(l)). The stationary sequence Gy) satisfies all the conditions of Theorem 18.6.5, so that
362 CENTRAL LIMIT THEOREM OF STATIONARY PROCESSES Chap. 18 However, o2n >ir(l + o(l)) cannot be represented in the form nh(n). Consequently, Y, cannot be strongly mixing, and neither therefore, can the sequence Xy § 7. The central limit theorem in continuous time The extension of the results of this chapter to the case of stationary pro- processes X (t) with a continuous time parameter t gives rise to no serious difficulties. We shall therefore give only two theorems which are analogues of results proved in §§ 5,6, leaving the extension of the other results of §§ 4-6 to the reader. Theorem 18.7.1. Let the stationary process X(t) be uniformly mixing, and suppose that E\X(t)\2 + 3 <oo for some 5>0. If (fT V a\ = E[\ X{t)dt) -*oo as T-+O0, then 1 I X{t)dt<z) = Proof This could be carried out by the methods of § 5, but it is simplest to derive it as a corollary of Theorem 18.5.1. We introduce the stationary sequence X(t)dt, J 7-1 which is clearly uniformly mixing, and satisfies as n-»oo. All the conditions of Theorem 18.5.1 are therefore fulfilled by ?j, which therefore satisfies the central limit theorem. We have 2 7=1 < T \2 X(t)dt) [T] A8.7.1)
18.7. CENTRAL LIMIT THEOREM IN CONTINUOUS TIME 363 and as T-+00, [T] ,T \2 X{t)dt+\ X{t)dt) = 0 J [T] / ( X{t)dt{ X{s)ds ) + e(( X{t)dt 0 Jrri J \J rn Consequently, the right-hand side of A8.7.1) tends to zero as T-+00, showing that the limiting distribution of ST = a-l\ X(t)dt J o must be the same as that of •m _ [T] 0 j=l since Theorem 18.7.2. Let the stationary process X(t) be uniformly mixing with mixing coefficient </>(t), and consider the stationary process Y(t) = U'Y, where Y eH^X)- If A) Jo B) r J o then r°° E{Y@)Y(t)}dt 0 converges, and if a # 0, then as T-*co.
364 CENTRAL LIMIT THEOREM OF STATIONARY PROCESSES Chap. 18 Proof. Consider the stationary sequence "' Y(t)dt, so that ?j= Uj0;, where = C Y(t)dt, J o and therefore T .)] [Y(t')-E(Y(t')Wk-k)-]}dtdt' ) J 0 2 o Since E{Y-E{Y\W_t)}2 is a non-increasing function of t, condition A) shows that k= 1 From Theorem 18.6.1 and Remark 18.6.2 it follows that ?,- satisfies the central limit theorem. Hence, as in the proof of the last theorem, we deduce that X(t) satisfies the central limit theorem. •
Chapter 19

EXAMPLES AND ADDENDA

The separate sections of this chapter are not related to one another except in so far as they illustrate or extend the results of Chapter 18.

§ 1. The central limit theorem for homogeneous Markov chains

Consider a homogeneous Markov chain with a finite number of states (labelled $1, 2, \ldots, k$) and transition matrix $P = (p_{ij})$ (see, for instance, Chapter III of [47]). If $X_n$ is the state of the system at time $n$, we have the sequence of random variables
$$X_1, X_2, \ldots, X_n, \ldots. \tag{19.1.1}$$
We denote by $p_{ij}^{(n)}$ the probability of moving from state $i$ to state $j$ in $n$ steps. If for some $s > 0$, $p_{ij}^{(s)} > 0$ for all $i, j$, then Markov's theorem [47] states that the limits
$$p_j = \lim_{n\to\infty} p_{ij}^{(n)}$$
exist for all $i$ and $j$ and do not depend on $i$, and that, for some constants $C$ and $\rho$ with $0 < \rho < 1$,
$$\max_{i,j} |p_{ij}^{(n)} - p_j| \le C\rho^n . \tag{19.1.2}$$
The numbers $p_1, p_2, \ldots, p_k$ form a stationary probability distribution in the sense that, if $P(X_1 = j) = p_j$ for all $j$, then the variables $X_n$ form a stationary sequence. It then follows from (19.1.2) that $X_n$ is uniformly mixing, since, if
$$A = \{X_1 = i_1, X_2 = i_2, \ldots, X_r = i_r\}, \qquad B = \{X_{n+r} = i_{n+r}, \ldots, X_s = i_s\},$$
366 EXAMPLES AND ADDENDA Chap. 19 i so that \P(AB)-P(A)P(B)\ < P(A)\p\± + r-pin + r\ ^ P(A)Cp". Let/(?) be any real function defined on the states of the chain. Application of Theorem 18.5.2 shows that the central limit theorem applies to the sequence f{Xj) whenever o2 = E{f(X1)-Ef(X1)}2 + + 2 ? E{f{Xi+1)-Ef(Xi+1)){f(Xx)-Ef{Xx)}*0. If 7r = G1^ n2, ..., nk) is any other initial distribution, we denote the cor- corresponding probability and expectation by Pn and En. Theorem 19.1.1. Assume that A9.1.2) holds, and that cr^O. Then,for any initial distribution n, t {f(Xj)-Ef(Xj)}<zl = Bk) n-oo Proof. The theorem is already proved for the case n=(p1,p2, ¦¦¦,Pk}- Thus, denoting the normalised sum as usual by Zn, and setting r = [log n], |e-i'2- En eitZ"\ < \e~it2-EeitZn\ + \En eitZ"-EeitZ"\ ^ [f(Xj)-Ef(xd - ) j=\ U(Xj)-Ef(Xj)-]\-l it j=r+i niPijr+i Pjr+l)Pjr+ lJr+2'--Pjn- ljn
$$\le 2C\rho^{r} + 2|t|\,\frac{\log n}{\sigma\sqrt{n}}\,\max_i |f(i)| + o(1) \to 0 \quad (n\to\infty). \tag{19.1.3}$$

The results cited above extend to Markov chains on an arbitrary state space, for the theory of which the reader is referred to Chapter V, § 5 of [31]. The transition matrix $(p_{ij})$ is replaced by the transition function $p(\xi, A)$, defined for all points $\xi$ of the state space $X$ and all elements $A$ of the $\sigma$-algebra $\mathfrak{F}_X$ of subsets of $X$. We choose an initial distribution $\pi(A)$, a probability measure on $(X, \mathfrak{F}_X)$. Using the Kolmogorov extension theorem, we can find a sequence of random variables
$$X_1, X_2, \ldots, X_n, \ldots \tag{19.1.4}$$
with values in $X$, such that
$$P(X_1 \in A_1, X_2 \in A_2, \ldots, X_n \in A_n) = \int_{A_1}\pi(d\xi_1)\int_{A_2} p(\xi_1, d\xi_2)\cdots\int_{A_n} p(\xi_{n-1}, d\xi_n). \tag{19.1.5}$$
The $n$-step transition probabilities $p^{(n)}(\xi, A)$ are given by
$$p^{(1)}(\xi, A) = p(\xi, A), \qquad p^{(n)}(\xi, A) = \int_X p^{(n-1)}(\eta, A)\,p(\xi, d\eta),$$
and
$$P(X_n \in A) = \begin{cases} \pi(A), & (n = 1), \\ \int_X p^{(n-1)}(\xi, A)\,\pi(d\xi), & (n \ge 2). \end{cases}$$

Under reasonably weak conditions, there exists a stationary measure $p(A)$, such that
$$\sup_{\xi, A} |p^{(n)}(\xi, A) - p(A)| \le C\rho^n , \tag{19.1.7}$$
where $C$, $\rho$ are constants, $0 < \rho < 1$. This is true, for instance (see [23]), if

(1) there is a finite measure $m$ on $\mathfrak{F}_X$ with $m(X) > 0$, an integer $\nu$ and a positive number $\varepsilon$ such that $p^{(\nu)}(\xi, A) \le 1 - \varepsilon$ whenever $m(A) \le \varepsilon$ (Doeblin's condition), and

(2) there is only one ergodic set.
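The geometric bounds (19.1.2) and (19.1.7) can be seen explicitly in the simplest case. The following is a minimal sketch, assuming a two-state chain with transition probabilities `a` and `b` chosen purely for illustration (they do not come from the text); for such a chain the maximal deviation from the stationary distribution is known in closed form, $\max(a,b)/(a+b)\cdot|1-a-b|^n$, so that one may take $\rho = |1-a-b|$.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def n_step(P, n):
    """n-step transition matrix P^n, computed by repeated multiplication."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def max_deviation(P, stationary, n):
    """max over i, j of |p_ij^(n) - p_j|, the left-hand side of (19.1.2)."""
    Pn = n_step(P, n)
    size = len(P)
    return max(abs(Pn[i][j] - stationary[j])
               for i in range(size) for j in range(size))


# Illustrative two-state chain (an assumption, not an example from the text).
a, b = 0.3, 0.1
P = [[1 - a, a], [b, 1 - b]]
stationary = [b / (a + b), a / (a + b)]

deviations = [max_deviation(P, stationary, n) for n in range(1, 11)]
closed_form = [max(a, b) / (a + b) * abs(1 - a - b) ** n for n in range(1, 11)]
```

Here `deviations` and `closed_form` agree up to rounding, which exhibits the exponential decay that the uniform mixing coefficient inherits.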
368 EXAMPLES AND ADDENDA Chap. 19 If the measure p is taken as the initial distribution, then the sequence A9.1.4) is stationary. Moreover, A9.1.7) implies the uniform mixing condition, since eA,, ..., XkeAk, Xn+k, ..., XseAs) + -P(X1eA1,...,XkeAk)P(Xn+keAn+k,...,XseAs)\^ .eA,, ...,XkeAk)sup f \p(n)(Zn, p{?n+k,d!;n+k+1) ^2Cp"P(X1eA1,...,XkeAk). Thus the mixing coefficient satisfies The reader will be able to show, conversely, that if A9.1.4) is uniformly mixing, then A9.1.7) holds. Consequently the uniform mixing coefficient of a Markov chain either does not tend to zero, or decreases exponentially fast. Theorem 19.1.2. Let A9.1.7) hold, and letf(?) be a real-valued measurable function on X. If and if a2 = E{f(X1)-Ef(X1)}2 + 2 then for any initial distribution n(A), lim n-*ao Ja-'rT* ? [/(A})-?/(A})] < z } = B*)"* f' I j = 1 J J- Proof If the initial distribution n(A) is the stationary distribution p{A), the stationary sequence f{Xj) is uniformly mixing, and the theorem is a
19.2. m-DEPENDENT SEQUENCES 369 special case of Theorem 18.5.2. For an arbitrary initial distribution we proceed exactly as in the proof of Theorem 19.1.1, estimating the differ- difference En — E. • In a similar way we may use Theorem 18.6.1 to prove the following result. Theorem 19.6.1. Let A9.1.7) hold, and letf=f(?u ?2, ...)bea real-valued function on the infinite product XxXx ..., measurable with respect to the product a-algebra %x x %x x ... . Write IfE(tf)<oo,andif o2 = E(fl-EflJ + 2 then for any initial distribution n, as n-> oo, U ? (fj-EfJ)<z}=Bn) J -±\Z § 2. m-dependent sequences A sequence of random variables ..., X_ !, Ao, A1} X2, •¦• iJ to be m-dependent if the random vectors {Xa_p, ATa_p+1, ..., Xa), (X6, Xb+1, ...,Xb+q) are independent whenever b — a>m, or equivalently ifyjia_00 and t$flbx> are independent when b — a>m. The latter form of the definition has an obvious extension to the case of a process X(t) with continuous time parameter. A simple method of constructing m-dependent sequences (see [25] for examples occurring in statistics) is as follows. Let • ••¦> Q-1-> Co> Ci> ••• be independent variables, and/(xl5 ..., xm) a function of m real variables. Then ZJ+l,...,Zj+m-l) A9-2.1)
defines an $m$-dependent sequence. The converse is false; there are $m$-dependent sequences not expressible in the form (19.2.1).

Since Gaussian variables are independent if they are orthogonal, a stationary Gaussian process $X_t$ is $m$-dependent if and only if its autocovariance function $R_t$ is identically zero in $|t| > m$.

An $m$-dependent sequence is trivially uniformly mixing with $\phi(n) = 0$ for $n > m$. Hence the following result is a special case of Theorem 18.5.2.

Theorem 19.2.1. Let $X_j$ be a stationary $m$-dependent sequence with $EX_0^2 < \infty$. Then
$$\sigma^2 = EX_0^2 + 2\sum_{j=1}^{\infty} EX_0X_j$$
converges, and if $\sigma \neq 0$,
$$\lim_{n\to\infty} P\Big(\sigma^{-1} n^{-1/2} \sum_{j=1}^{n} X_j < z\Big) = \Phi(z).$$

§ 3. The distribution of values of sums of the form $\sum f(2^k t)$

Let $f(t)$ be a periodic function of the real argument $t$, with period 1, and consider the distribution of the values of the sum
$$S_n(t) = \sum_{k=1}^{n} f(2^k t). \tag{19.3.1}$$
Such sums are of considerable importance in the metric theory of numbers, and as such have been studied by a number of authors. An important special case is the function
$$f(t) = \{t\},$$
the fractional part of $t$.

The reason for discussing the problem here is that it is a special case of those discussed in § 18.6. Indeed, for $0 \le t < 1$,
$$S_n(t) = \sum_{k=1}^{n} f(\{2^k t\}) = \sum_{k=1}^{n} f(T^k t),$$
where $T$ is the mapping of $[0, 1)$ into itself defined by
$$Tt = \{2t\}.$$
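The identity $f(2^k t) = f(T^k t)$ for functions of period 1 can be checked with exact rational arithmetic (binary floating point would lose one digit of $t$ at every application of $T$). The sketch below assumes that the sum runs over $k = 1, \ldots, n$ and uses the centred function $f(t) = \{t\} - \tfrac12$ as an illustrative choice; the function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def frac(t):
    """Fractional part {t} of a rational number t."""
    return t - floor(t)


def T(t):
    """The doubling map Tt = {2t} on [0, 1)."""
    return frac(2 * t)


def f(t):
    """Illustrative periodic function with zero mean: f(t) = {t} - 1/2."""
    return frac(t) - Fraction(1, 2)


def S_direct(t, n):
    """S_n(t) computed directly as the sum of f(2^k t), k = 1..n."""
    return sum(f(2 ** k * t) for k in range(1, n + 1))


def S_iterated(t, n):
    """S_n(t) computed by iterating T, i.e. as the sum of f(T^k t), k = 1..n."""
    total, u = Fraction(0), t
    for _ in range(n):
        u = T(u)
        total += f(u)
    return total
```

For any rational `t` in $[0, 1)$ the two functions return the same value, which is the stationarity mechanism exploited in this section.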
The transformation $T$ preserves Lebesgue measure $\lambda$; for $A \subset [0, 1)$,
$$\lambda(T^{-1}A) = \lambda(A). \tag{19.3.2}$$
(We leave this for the reader to verify.)

We now study the probability space formed by the segment $[0, 1)$, with the Lebesgue measurable sets, and probability measure $\lambda$. Then equation (19.3.2) means that the sequence of random variables $f_k = f(2^k t)$ is stationary. We shall see that much more can in fact be said.

Any $t \in [0, 1)$ has an expansion of the form
$$t = \sum_{k=1}^{\infty} \varepsilon_k(t)\,2^{-k},$$
where $\varepsilon_k(t) = 0$ or 1. If we neglect the rationals (which have measure zero), the correspondence between $t$ and the sequence $\{\varepsilon_k\}$ is one-to-one, so that the function $f(t)$ may be written
$$f(t) = f(\varepsilon_1, \varepsilon_2, \ldots).$$
It is clear that
$$\varepsilon_k(Tt) = \varepsilon_{k+1}(t),$$
so that $\varepsilon_1, \varepsilon_2, \ldots$ is a stationary sequence. Moreover, the $\varepsilon_k$ are independent, since
$$P(\varepsilon_1 = i_1, \varepsilon_2 = i_2, \ldots, \varepsilon_s = i_s) = \lambda\{t;\ \varepsilon_1(t) = i_1, \ldots, \varepsilon_s(t) = i_s\} = 2^{-s} = \prod_{k=1}^{s} \lambda\{t;\ \varepsilon_k(t) = i_k\} = \prod_{k=1}^{s} P(\varepsilon_k = i_k),$$
where $i_k = 0$ or 1. Consequently, the random variable $f = f(t) = f(\varepsilon_1, \varepsilon_2, \ldots)$ is measurable with respect to the $\sigma$-algebra generated by the $\varepsilon_j$, and
$$f(T^{k-1}t) = f(\varepsilon_k, \varepsilon_{k+1}, \ldots)$$
is obtained from the independent random variables $\varepsilon_j$ in the way discussed in § 18.6. Theorem 18.6.1 therefore gives sufficient conditions for the asymptotic normality of $S_n(t)$, i.e. for the limit
$$\lim_{n\to\infty} \lambda\Big\{t;\ \frac{S_n(t) - ES_n(t)}{\sigma\sqrt{n}} < z\Big\} = \Phi(z)$$
to hold.
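The two facts used above, that $T$ shifts the binary digits and that each digit pattern of length $s$ corresponds to exactly one dyadic interval of length $2^{-s}$, can be checked mechanically. This is a minimal sketch with hypothetical helper names; it works with exact fractions and tests one point from each dyadic interval.

```python
from fractions import Fraction
from itertools import product


def digits(t, s):
    """First s binary digits eps_1(t), ..., eps_s(t) of t in [0, 1)."""
    out = []
    for _ in range(s):
        t *= 2
        bit = int(t >= 1)
        out.append(bit)
        t -= bit
    return out


def T(t):
    """The doubling map Tt = {2t} on [0, 1)."""
    u = 2 * t
    return u - int(u >= 1)


def patterns_of_midpoints(s):
    """Digit patterns of the midpoints of the 2^s dyadic intervals of length 2^-s."""
    return [tuple(digits(Fraction(2 * j + 1, 2 ** (s + 1)), s))
            for j in range(2 ** s)]
```

Since the $2^s$ midpoints produce $2^s$ distinct patterns, every event $\{\varepsilon_1 = i_1, \ldots, \varepsilon_s = i_s\}$ is a single dyadic interval and has measure $2^{-s}$, which is the product of the one-digit probabilities.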
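The conditions of Theorem 18.6.1 may be stated in a different, and more natural, form in the present application. We must first, of course, require that $f$ have a finite variance, i.e.
$$\int_0^1 f(t)^2\,dt < \infty .$$
Moreover, we need to compute
$$[f]_k = E(f \mid \varepsilon_1, \ldots, \varepsilon_k).$$
This is measurable with respect to $\{\varepsilon_1, \ldots, \varepsilon_k\}$, and must therefore be constant on each of the intervals
$$\Delta_{jk} = [(j-1)2^{-k},\ j2^{-k}), \qquad (j = 1, 2, \ldots, 2^k).$$
The definition of conditional expectation (§ 1.1) gives
$$\int_{\Delta_{jk}} [f]_k(t)\,dt = \int_{\Delta_{jk}} f(t)\,dt,$$
so that
$$[f]_k(t) = 2^k \int_{(j-1)2^{-k}}^{j2^{-k}} f(u)\,du$$
for $t \in \Delta_{jk}$. The special case of Theorem 18.6.1 can now be stated as follows.

Theorem 19.3.1. Let $f(t)$ be a function in $L_2(0, 1)$ and with period 1, and let $\int_0^1 f(t)\,dt = 0$. If
$$\sum_{k=1}^{\infty} \Big(\int_0^1 |f(t) - [f]_k(t)|^2\,dt\Big)^{1/2} < \infty, \tag{19.3.3}$$
then
$$\sigma^2 = \int_0^1 f(t)^2\,dt + 2\sum_{k=1}^{\infty} \int_0^1 f(t)f(2^k t)\,dt$$
converges, and if $\sigma \neq 0$,
$$\lim_{n\to\infty} \lambda\{t;\ \sigma^{-1} n^{-1/2} S_n(t) < z\} = \Phi(z).$$

Remark 19.3.1. The condition (19.3.3) will be satisfied if either

(1) $f(t)$ is a function of bounded variation, or

(2) V / I I r/A fU i Ul2J*_ /W1 -2 —? _2 J p>0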
>9-3- SUMS OF THE FORM I/B*x) 373 Proof. A) If Var (/) < oo, then denoting the L2 norm by || • ||, we have \ Ajk [/(t)-/(«)] d« (Var/)* (X 2k f dt f |/(t)-/(«)|d«)* = \ j J Ajk JAjk / (Var/J-*k, so that B) As above, But f dt [ [f(u)-f(t)Ydt = JAJk JAjk dt A jk •> Ajk + 2 I dtf [/(tt)-/(tt + •> Aik JAik t-(j-lJ'k)-]x x[/(«+t-(/-lJ"k)-/(t)]dtt dt f [f(u + t-(j-lJ'k)-f(t)Ydu^ Ajk J Ajk dtf [/(«)-/(«¦ 0 J Ajk -k dt [f(v +
Substituting condition (2) into this bound, we find that
$$\sum_{k=1}^{\infty} \|f - [f]_k\| < \infty .$$

§ 4. Application to the metric theory of continued fractions

Each real number $t$ in the interval $(0, 1)$ has a unique continued fraction expansion of the form
$$t = \cfrac{1}{a_1(t) + \cfrac{1}{a_2(t) + \cdots}}, \tag{19.4.1}$$
where the $a_n(t)$ are natural numbers, and every sequence $\{a_n(t)\}$ corresponds to some real number $t$. We write (19.4.1) symbolically in the form
$$t = [a_1(t), a_2(t), \ldots]. \tag{19.4.2}$$
The metric theory of continued fractions is concerned with the problem of computing the measures of sets of values of $t$ defined by conditions on the sequence $a_n(t)$. We shall see that, if a suitable measure is placed on the interval $(0, 1)$, then the sequence $a_n(t)$ can be made stationary, and that it then satisfies the uniform mixing condition. Many of the results of the theory then follow from the known properties of mixing sequences.

Define then a probability space with $(0, 1)$ as the set of elementary events, endowed with the $\sigma$-algebra of Lebesgue measurable sets. The appropriate probability measure turns out to be that defined by the equation
$$\mu(A) = \frac{1}{\log 2}\int_A \frac{dt}{1+t}. \tag{19.4.3}$$
It is evident that the $a_n(t)$ are random variables, but the proof of the properties of the sequence $(a_n)$ is quite complicated, and we need some simple facts about continued fractions (which may be found in textbooks of number theory, or in [68]).

If $t > 0$, $[t] = a_0$, we write
$$t = [a_0; a_1, a_2, \ldots],$$
where $[a_1, a_2, \ldots]$ is the expansion of $\{t\} = t - a_0$. If
$$[a_0; a_1, a_2, \ldots, a_k] = a_0 + \cfrac{1}{a_1 + \cdots + \cfrac{1}{a_k}}$$
is the terminating continued fraction corresponding to that of $t$, we write
$$[a_0; a_1, \ldots, a_k] = p_k/q_k .$$
Then it is easy to prove by induction that, for all $k \ge 2$,
$$p_k = a_k p_{k-1} + p_{k-2}, \qquad q_k = a_k q_{k-1} + q_{k-2}, \tag{19.4.4}$$
and from this it follows that, for all $k \ge 2$,
$$q_k p_{k-1} - p_k q_{k-1} = (-1)^k . \tag{19.4.5}$$
If the continued fraction does not terminate, then
$$q_k \ge 2^{(k-1)/2}, \tag{19.4.6}$$
and either
$$\frac{p_k}{q_k} < t < \frac{p_k + p_{k-1}}{q_k + q_{k-1}} \tag{19.4.7}$$
or
$$\frac{p_k + p_{k-1}}{q_k + q_{k-1}} < t < \frac{p_k}{q_k}.$$

Let $t = [a_0; a_1, a_2, \ldots]$, and define $r_n$ (not necessarily integral) recursively, by
$$t = a_0 + 1/r_1, \qquad r_n = a_n + 1/r_{n+1}, \quad (n = 1, 2, \ldots).$$
Then $t = [a_0; a_1, \ldots, a_n, r_{n+1}]$ and $r_n = [a_n; a_{n+1}, a_{n+2}, \ldots]$. The number $r_n$ is called the remainder of order $n$ of the continued fraction expansion of $t$. From (19.4.4) it follows that, for all $k$,
$$[a_0; a_1, \ldots, a_{k-1}, r_k] = \frac{p_{k-1}r_k + p_{k-2}}{q_{k-1}r_k + q_{k-2}},$$
and thus for all $n$,
$$t = \frac{p_{n-1}r_n + p_{n-2}}{q_{n-1}r_n + q_{n-2}}. \tag{19.4.8}$$
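The recurrences (19.4.4), the determinant identity (19.4.5) and the representation (19.4.8) are easy to verify on a worked example. The sketch below uses exact fractions and the rational number $415/93 = [4; 2, 6, 7]$, chosen only for illustration; the function names are hypothetical, and the conventional starting values $p_{-1} = 1$, $q_{-1} = 0$, $p_0 = a_0$, $q_0 = 1$ are assumed.

```python
from fractions import Fraction


def convergents(a):
    """Convergents (p_k, q_k) of [a0; a1, ..., ak] via the recurrences (19.4.4)."""
    p_prev, q_prev = 1, 0          # conventional p_{-1}, q_{-1}
    p, q = a[0], 1                 # p_0, q_0
    out = [(p, q)]
    for ak in a[1:]:
        p, p_prev = ak * p + p_prev, p
        q, q_prev = ak * q + q_prev, q
        out.append((p, q))
    return out


def expand(t, n):
    """Return ([a0, ..., a_{n-1}], r_n) for a positive rational t.

    Raises ZeroDivisionError if the expansion of t terminates before step n.
    """
    quotients = []
    r = Fraction(t)
    for _ in range(n):
        a = r.numerator // r.denominator   # integer part of the remainder
        quotients.append(a)
        r = 1 / (r - a)                    # next remainder r_{j+1}
    return quotients, r


t = Fraction(415, 93)
conv = convergents([4, 2, 6, 7])

# Check (19.4.8) with n = 2: t = (p_1 r_2 + p_0) / (q_1 r_2 + q_0).
quotients, r2 = expand(t, 2)
(p0, q0), (p1, q1) = conv[0], conv[1]
t_from_remainder = (p1 * r2 + p0) / (q1 * r2 + q0)
```

The last convergent reproduces `t`, successive convergents satisfy $q_k p_{k-1} - p_k q_{k-1} = (-1)^k$, and `t_from_remainder` equals `t`.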
376 EXAMPLES AND ADDENDA Chap. 19 We shall denote by A^fc\\fs the set of t e @, 1) for which aki(t) = iu ...,aks{t) = is. Lemma 19.4.1. The set A]J;2w"n is the interval with end-points Pn Pn + Pn-l where pn and qn are defined by [i'i, i-2j •••' '«] = • Proof. The set A}^2ynin consists of the numbers of the form where l<rn + 1 <oo. Thus t Pnrn+\+Pn-l which varies on the interval with end-points Qn Qn + Qn-1 as rn+1 varies on the interval [1, oo). • The fundamental result is then the following theorem. Theorem 19.4.1. With respect to the measure \x, the sequence an(t) is stationary, and satisfies the uniform mixing condition with where K and A are absolute constants. We note that an cannot be stationary in the wide sense, since 1W log 2 J o 1 +
19.4. APPLICATION TO THE METRIC THEORY 377 Proof. A) To prove that an(t) is stationary we have only to show that We first show that By Lemma 19.4.1, X^;;;"n is an interval with end-points pjqn and {pn + pn-i)l {qn + Qn-i), where pjqn=[i1, i2, • ••, /„]. For the sake of argument suppose that 1 Pn Pn ~^~ Pn - 1 Then A 1 2 ... (n + 1) __ ) ' yl •< r < /I ( so that oo i 2...(n+l)\_ V ___ log Continuing in this way, and more generally, M<+s)::rs)) = M^::t)- A9.4.9) Finally, the general case is obtained from A9.4.9) and the equation and the stationarity is proved. B) To prove the uniform mixing condition, it is sufficient to prove that ^:::w:)> A9-4.11)
378 EXAMPLES AND ADDENDA Chap. 19 since any sets A, B measurable with respect to (al5 ..., ak) and (ak+n, ..., ak+n+s) respectively may be written as disjoint unions A = U Ai, B = (J Bm, I m of sets A,, Bm of the type occurring in A9.4.11), and A9.4.11) will imply that To prove A9.4.11) we need a theorem of Kyz'min, whose proof may be found in [68]. Theorem (Kyz'min). Let (/„(*); n=l,2, ...) be a sequence of functions on [0, 1] satisfying If for O^x^l, 0</o(x)<m, /„(*) = J where X is an absolute positive constant, and K depends only on M, m. We set Mn(x) = fi{t\a1 (t) = iu..., ak(t) = ak{t) = ik, zk+n{t)<x) , where zs(t) denotes the continued fraction zs(t)=[as+1{t),as+2{t),...]. In order that al(t) = il, ..., ak(t)= ik, zk+n(t)<x, it is necessary and suffi- sufficient that, for some integer r,
19.4. APPLICATION TO THE METRIC THEORY 379 a1(t) = i1,...,ak(t) = ik,— < w.iCX-. r + x r since Therefore, ~ 1. A9.4.12) It is easy to check that this equation may be differentiated term-by-term, to give A9.4.13) for n^l. We now introduce the functions which satisfy If the conditions of Kyz'min's theorem are satisfied, we can therefore con- conclude that 0K~XnV^ \°\1 A9A14) Now integrate this equation over A )k+n.'.\)k+n+s- Because of the stationarity, the integral of the right-hand side is equal to ...k + n + s\ i Q is., I Ak + n .. .k + n+s\ _ — XnVz To calculate the integral of the left-hand side, note that by Lemma 19.4.1, Ajk+n/sjk+n+s is the interval (a, C) on which the first s coefficients of the continued fraction of t arejk+n, ...,jk+n+s. The difference Mn(C) — Mn(cc) is therefore the /z-measure of
380 EXAMPLES AND ADDENDA Chap. 19 Thus jk+ 1 ...jk + n ...k ...k + n ...k + n +s\ Hence the integrated version of A9.4.14) is equivalent to A9.4.11). It remains to prove the admissibility of Kyz'min's theorem, so that we have to show that 0<*o(x)<Cl, |%o(x)|<c2, where cl5 c2 are constants. For teAj^y^ , where — LZ1' ••' lk\ Thus is the interval with end-points Pk + Therefore The second equation gives 1 2 log 2 Pk d«, dt = Differentiating A9.4.16), and using A9.4.5), <194-16) ,194.17)
Hence from (19.4.17) and the inequality $q_{k-1} < q_k$, we obtain the required bound $|\chi_0'(x)| < c_2$.

The measure-preserving transformation $T$ associated with the stationary sequence $\{a_j(t)\}$ has the form
$$Tt = \{1/t\},$$
or equivalently,
$$T[a_1, a_2, a_3, \ldots] = [a_2, a_3, \ldots].$$
Since the sequence is mixing, $T$ is metrically transitive, so that the ergodic theorem [47] has the following form.

Theorem 19.4.2. Let $f(t)$ be an absolutely integrable function on $(0, 1)$. Then for all points $t$ of $(0, 1)$ except possibly for the elements of a set of measure zero,
$$\lim_{n\to\infty} n^{-1}\sum_{j=0}^{n-1} f(T^j t) = (\log 2)^{-1}\int_0^1 \frac{f(t)}{1+t}\,dt .$$

Corollary 19.4.1. Let $f(r)$ be a function of the integral variable $r$, and let $f(r) = O(r^{1-\delta})$, $\delta > 0$. Then for almost all $t \in (0, 1)$,
$$\lim_{n\to\infty} n^{-1}\sum_{k=1}^{n} f(a_k) = \sum_{r=1}^{\infty} f(r)\,\log\Big(1 + \frac{1}{r(r+2)}\Big)\Big/\log 2 .$$

To prove this, it suffices to note that $f(a_1(t))$ takes the value $f(r)$ on the interval $1/(r+1) < t \le 1/r$, whose $\mu$-measure is $\log\big(1 + 1/r(r+2)\big)/\log 2$.

In particular, taking $f(r) = \log r$, we have, almost everywhere,
$$\lim_{n\to\infty} n^{-1}\sum_{k=1}^{n} \log a_k = \sum_{r=1}^{\infty} \log r\,\log\Big(1 + \frac{1}{r(r+2)}\Big)\Big/\log 2,$$
or equivalently,
$$\lim_{n\to\infty} (a_1 a_2 \cdots a_n)^{1/n} = \prod_{r=1}^{\infty}\Big(1 + \frac{1}{r(r+2)}\Big)^{\log r/\log 2}.$$
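The infinite product in the last formula is usually called Khinchin's constant, approximately 2.685. The following sketch evaluates the distribution of $a_1$ under $\mu$ and a partial product numerically; the truncation level `N` and the function names are illustrative assumptions, and since the tail of the series decays only like $\log N / N$, the partial product approaches the limit slowly from below.

```python
import math


def gauss_prob(r):
    """mu{a_1 = r} = log(1 + 1/(r(r+2))) / log 2 under the measure (19.4.3)."""
    return math.log1p(1.0 / (r * (r + 2))) / math.log(2)


def total_mass(N):
    """Sum of mu{a_1 = r} for r = 1..N; telescopes to log2(2(N+1)/(N+2))."""
    return sum(gauss_prob(r) for r in range(1, N + 1))


def khinchin_partial(N):
    """Partial product over r = 1..N of (1 + 1/(r(r+2)))^(log r / log 2)."""
    log_k = sum(math.log(r) * gauss_prob(r) for r in range(1, N + 1))
    return math.exp(log_k)


approx = khinchin_partial(200_000)
```

The telescoping identity for `total_mass` confirms that the probabilities $\mu\{a_1 = r\}$ sum to 1, as they must for the corollary to define an average.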
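Many similar results can be proved in the same way.

To formulate a central limit theorem for the sequence $f(T^j t)$, we have to be able to compute expressions of the form
$$[f]_k(t) = E(f \mid a_1, a_2, \ldots, a_k).$$
This is constant on each set $A^{1\,2\ldots k}_{i_1 i_2 \ldots i_k}$, and
$$\int_{A^{1\,2\ldots k}_{i_1 i_2 \ldots i_k}} [f]_k(t)\,\mu(dt) = \int_{A^{1\,2\ldots k}_{i_1 i_2 \ldots i_k}} f(t)\,\mu(dt),$$
so that, for $t \in A^{1\,2\ldots k}_{i_1 i_2 \ldots i_k}$,
$$[f]_k(t) = \frac{1}{\mu\big(A^{1\,2\ldots k}_{i_1 i_2 \ldots i_k}\big)}\int_{A^{1\,2\ldots k}_{i_1 i_2 \ldots i_k}} f(u)\,\mu(du).$$

Theorem 19.4.3. Let the function $f(t)$ satisfy

(1) $\int_0^1 f(t)^2\,dt < \infty$,

(2) $\sum_{k=1}^{\infty}\Big(\int_0^1 |f(t) - [f]_k(t)|^2\,dt\Big)^{1/2} < \infty$, (19.4.18)

and write
$$a = \frac{1}{\log 2}\int_0^1 \frac{f(t)}{1+t}\,dt .$$
Then
$$\sigma^2 = \int_0^1 \{f(t) - a\}^2\,\mu(dt) + 2\sum_{k=1}^{\infty}\int_0^1 \{f(t) - a\}\{f(T^k t) - a\}\,\mu(dt)$$
converges, and if $\sigma \neq 0$,
$$\lim_{n\to\infty} \lambda\Big\{t;\ \sigma^{-1} n^{-1/2}\sum_{k=0}^{n-1}\{f(T^k t) - a\} < z\Big\} = \Phi(z),$$
where $\lambda$ denotes Lebesgue measure on $(0, 1)$.

Proof. It follows at once from Theorems 18.6.1 and 19.4.1 that
$$\lim_{n\to\infty} \mu\Big\{t;\ \sigma^{-1} n^{-1/2}\sum_{k=0}^{n-1}\{f(T^k t) - a\} < z\Big\} = \Phi(z). \tag{19.4.19}$$
Hence we have only to show that $\mu$ can be replaced by Lebesgue measure $\lambda$.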
19.4. APPLICATION TO THE METRIC THEORY 383 Lemma 19.4.2. //B c @, 1) belongs to the a-algebra 9K*+1 generated by the functions aj(t) (/>n), then \H{B)-A{B)\ ^ K^-^XiB), A9.4.20) where Kx is a constant. Proof. If in the second part of the proof of Theorem 19.4.1 we set and carry through the same arguments, then instead of A9.4.11) we arrive at the inequality Summing over 1 ^i< oo, we find that Since any set BeW^+1 can be approximated by unions of sets of the form appearing in the last inequality, the lemma is proved. • As well as the probability space considered throughout this section, we construct another by replacing \x by X, distinguishing the corresponding expectation operators as E^ and Ex, so that Uf(t) is a bounded function measurable with respect to 9CR^+1, then Lemma 19.4.2 implies that \E,{f)-Ex{f)\ ^ sup l/tolKe-** . A9.4.21) t In fact, it is sufficient to prove this inequality for/of the form where the B, are disjoint sets in 9Pln°°+19 and for such/, A9.4.20) gives \E,(f)-EAf)\ * sup t
384 EXAMPLES AND ADDENDA Chap. 19 Now write " {f(Tkt)-a}, so that, by A9.4.19), lim n~* oo Using A9.4.20), as n->oo, with r= [log n], ™* fc = 0 n- 1 (^-^)exp ^x Hence the theorem is proved. • Remark 19.4.1. Condition A9.4.18) is satisfied if either of the conditions of Remark 19.3.1 is satisfied. The proof is almost the same as before, using the fact that, for t,ue Ajt;; ;fk, by A9.4.5) and A9.4.6), \t-u\ Pk Pk + Pk- § 5. Example of a sequence not satisfying the central limit theorem Let •••) s-1) so' Ci) Qi-> •• • be a sequence of independent normal variables with mean 0 and variance 1,
19.5. EXAMPLE OF A SEQUENCE 335 Yj= I \k\-aZk+j, A9.5.1) k= — 00 where \ < a < f, and Then Xj is a stationary sequence, which we shall prove not to satisfy the central limit theorem. More precisely, if Zn = SJV{Snf, SH=t Xj, then the distribution of Zn does not converge to the normal distribution. In view of the results of § 17.3, Yp and hence Xj, is regular. Thus we have an example of a regular stationary process not conforming to the central limit theorem. Moreover, X} is generated by independent random variables in the sense of § 18.6. The reader will be able to verify without difficulty that which shows that the condition of Theorem 18.6.1 cannot be significantly weakened. We now go on to investigate the limiting distribution of Zn. In order not to overload the argument, the justification for some of the analytical operations is left to the reader. From the results of § 16.7, it follows that the sequence Yj has spectral density k= 1 By partial summation, fc=1 k=1 s=l (e'"<k+1>*-l)[Vfl-(fc+l)-fl] . A9.5.2) Since /c"a(/c + l)"a = O^) it follows that/(A) is bounded outside any neighbourhood of 1 = 0. Indeed, a more precise analysis (using for instance the Euler-Maclaurin formula [36]) shows that, as l->0,
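$$\sum_{k=1}^{\infty} k^{-a} e^{ik\lambda} \sim |\lambda|^{a-1}\,\Gamma(1-a)\big(\sin(\tfrac12\pi a) + i\,\operatorname{sgn}\lambda\,\cos(\tfrac12\pi a)\big),$$
$$f(\lambda) \sim |\lambda|^{2(a-1)}\,\Gamma(1-a)^2 . \tag{19.5.3}$$
The investigation of the distribution of $Z_n$ is based on the following lemma.

Lemma 19.5.1. If $(Y_1, Y_2, \ldots, Y_n)$ is a non-degenerate Gaussian random vector (with zero mean), then the characteristic function of $\sum_{j=1}^{n} Y_j^2$ is
$$\phi(t) = \prod_{j=1}^{n} (1 - 2it\mu_j)^{-1/2},$$
where the $\mu_j$ are the eigenvalues of the matrix $R = (EY_jY_k)$.

Proof. We denote by $y$ the vector $(y_1, y_2, \ldots, y_n)$, by $A$ the matrix $(a_{ij})$, with inverse $A^{-1}$ and determinant $|A|$, and associated quadratic form
$$(Ay, y) = \sum_{i,j} a_{ij} y_i y_j .$$
The identity matrix is $I$. If $A$ is any complex matrix such that $\operatorname{Re} A = (\operatorname{Re} a_{ij})$ is positive-definite, then (see for example [22])
$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \exp\{-\tfrac12 (Ay, y)\}\,dy = (2\pi)^{n/2}\,|A|^{-1/2}.$$
Therefore
$$\phi(t) = E\Big\{\exp\Big(it\sum_{j=1}^{n} Y_j^2\Big)\Big\} = (2\pi)^{-n/2}|R|^{-1/2}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \exp\{it(y, y) - \tfrac12 (R^{-1}y, y)\}\,dy$$
$$= |R|^{-1/2}\,|R^{-1} - 2itI|^{-1/2} = |I - 2itR|^{-1/2} = \prod_{j=1}^{n} (1 - 2it\mu_j)^{-1/2}. \ \bullet$$

Remark 19.5.1. The function $(1 - 2it)^{-1/2}$ is the characteristic function of $\eta^2$, where $\eta \in N(0, 1)$. Lemma 19.5.1 therefore asserts that $\sum Y_j^2$ has the same distribution as $\sum \mu_j \eta_j^2$, where the $\eta_j$ are independent $N(0, 1)$ variables.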
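Lemma 19.5.1 can be sanity-checked in the smallest non-trivial case. The sketch below assumes a $2 \times 2$ covariance matrix with unit variances and correlation `rho` (an illustrative choice, not from the text), whose eigenvalues are $1 \pm \rho$. It reads the bars in the lemma as a determinant, so that each factor is the complex number $(1 - 2it\mu_j)^{-1/2}$ rather than a modulus, and compares the eigenvalue product with $|I - 2itR|^{-1/2}$ computed directly.

```python
def cf_product(t, eigenvalues):
    """Product over j of (1 - 2 i t mu_j)^(-1/2), principal branch."""
    value = 1.0 + 0.0j
    for mu in eigenvalues:
        value *= (1 - 2j * t * mu) ** -0.5
    return value


def cf_determinant_2x2(t, rho):
    """|I - 2 i t R|^(-1/2) for R = [[1, rho], [rho, 1]], determinant expanded by hand."""
    det = (1 - 2j * t) ** 2 - (2j * t * rho) ** 2
    return det ** -0.5


def mean_from_cf(eigenvalues, h=1e-5):
    """E(sum of Y_j^2) recovered as phi'(0) / i by a central difference."""
    derivative = (cf_product(h, eigenvalues) - cf_product(-h, eigenvalues)) / (2 * h)
    return (derivative / 1j).real


rho = 0.4
eigs = [1 + rho, 1 - rho]
```

For two factors the principal branches agree, so no sign ambiguity arises in this check; for larger $n$ the square root of the determinant would have to be followed continuously from $t = 0$. The mean recovered from the characteristic function equals the trace of $R$, as it should.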
19.5. EXAMPLE OF A SEQUENCE 387 Turning now to the stationary sequence X-}, Lemma 19.5.1 shows that the characteristic function of Sn = Z"= x Xt is E(eitS") = exp(-it f n{A f\ |l-2it/if I"*, j= where the /if are the eigenvalues of (#;_,; i,y'= 1,2, ..., n) and R, is the autocovariance function of Yy Since R is positive-definite, nf)>0. We suppose that they are indexed in descending order: We also remark that Lemma 19.5.2. There exist constants cx anrf c2 such that c1>0,c2<co and ixf = max /if ^ Cln2A"a), A9.5.4) (Tn<c2n2A-a). A9.5.5) Proof. The autocovariance function R7- has the expression n By A9.5.3) there exists e>0 such that, in \X\ <e , Hence = sup ? ykyjRk-j> \ ? Rk-j = Sy2=l k,j=l n k,j=l n) _n (^) ^ 2(W) di In and A9.5.4) is proved. a)
388 EXAMPLES AND ADDENDA Chap. 19 To prove A9.5.5) note that, if (Uu U2, U3, t/4) is a Gaussian vector, its characteristic function is tUj) =exp{-iX htjE(UkUj)} , j k,j and equating coefficients of tit2t3t4., we have + E(U1U4)E(U2U3). Hence ( ) , A9.5.6) and so that k= 1 Because of Lemmas 19.5.1 and 19.5.2, the characteristic function i]/n(t) of Zn = SJ<Jn is \-2itzi- , A9.5.7) where 0 < b = lim inf fc^ ^ lim sup bf = B < oo , n-*oo n-»oo and ^n(r) is a characteristic function. If the distribution of Zn converges to JV(O, 1) as ft-* oo, then we may go to infinity in A9.5.7) along a subse- subsequence (fij) on which both b("j) and (fin. (t) converge, to arrive at the equation 2 A9.5.8) where 0<bo< oo and (p(t) is a characteristic function. It is however, a famous result of Cramer (see, for instance, [23] or [105]), that if
19.5. EXAMPLE OF A SEQUENCE 389 where ^x and <f>2 are characteristic functions, then both </>! and <jJ must be normal. This contradicts A9.5.8), and shows that Zn cannot converge in distribution to N@, 1). In fact, more detailed analysis shows that 00 lim E{eitZ") = J] |1 -2itb n-»oo j= 1 where bj=]hn
Chapter 20

SOME UNSOLVED PROBLEMS

In this chapter we list various unsolved problems and possible lines of further research, classified according to the chapters from which they arise. They come from various sources, to which it would be difficult to give exact credit.

Chapters 3-5

(1) The problem of extending the results of § 3.4 to the case of convergence to a stable law of exponent $\alpha \neq 2$ (see [191], [21]). Thus we would wish to find necessary conditions and sufficient conditions, not too far apart, for the distribution function $F_n(x)$ of the normalised sum of independent, identically distributed random variables in the domain of attraction of the stable law $G_\alpha(x)$ to satisfy
$$|F_n(x) - G_\alpha(x)| = O(n^{-\gamma}), \qquad \gamma > 0.$$
The method of § 3.4 does not work, since Theorem 1.6.1 is not applicable.

(2) What is the multi-dimensional analogue of Theorem 3.4.1?

(3) In the notation of § 3.5, write
$$C_n = \sup_F \sup_x \frac{\sigma^3\sqrt{n}}{\beta_3}\,|F_n(x) - \Phi(x)| .$$
Theorem 3.5.2 says that
$$C_n \to \frac{\sqrt{10} + 3}{6\sqrt{2\pi}}$$
as $n \to \infty$. It would of course be interesting to know $C_n$ explicitly, but failing that, to find estimates of $C_n$, perhaps the second term of an asymptotic expansion.

(4) Along the same lines as the last problem, set
$$C^*(x) = \limsup_{n\to\infty} \sup_F \frac{\sigma^3\sqrt{n}}{\beta_3}\,|F_n(x) - \Phi(x)| .$$
It has been conjectured by Kolmogorov [82] that, for symmetric distributions $F$,
$$C^*(x) = (2\pi)^{-1/2} e^{-x^2/2}. \qquad (20.1.1)$$

This has been proved under restrictive conditions by Linnik [98].

(5) Find an analogue of Theorems 4.5.1 and 4.5.3 for the case of convergence to a stable law with exponent $\alpha \neq 2$. It seems possible that this problem is simpler than (1).

(6) How is Prohorov's theorem (4.4.1) changed if convergence in $L_1(-\infty, \infty)$ is replaced by convergence in $L_p(-\infty, \infty)$ $(1 < p < \infty)$?

(7) Again in the spirit of (1), extend the results of § 5.3 to the case of convergence to stable laws.

Chapters 6-14

(1) In all the theorems of these chapters, the zones considered were of width $A(n)p(n)$ or $A(n)/p(n)$, where $p(n)$ is a function increasing arbitrarily slowly to infinity. Can the function $p(n)$ be replaced by a constant, say 1? (Many results in this direction have been obtained by Nagaev.)

(2) The problem of uniform normal convergence has been studied for narrow zones of general form, and for wide monomial zones. The problem remains of studying wide zones which are not monomial.

(3) The derivation of asymptotic expansions in different zones of normal convergence.

(4) The analogues of (1)-(3) for convergence to Cramér's system of limiting tails.

(5) The discovery of systems of limiting tails other than that of Cramér, and of their domains of attraction.

(6) The establishment of sharp bounds for large deviations, and in particular the computation of the best constant.

(7) The discovery of wider classes of random variables, for which integral theorems are valid on the whole line.

(8) As a particular case of (7), study the random variables with continuous probability densities $g(x)$ satisfying [...] for $x \ge 1$ and a similar condition for $x < -1$. (Some results have been obtained by B. Pergel of Budapest.)

(9) The study of random variables for which
[...] for $x \ge 1$, and a similar condition for $x < -1$.

(10) Kolmogorov's problem. Let $F_1, F_2$ be two distributions, $F_1^{(n)}, F_2^{(n)}$ their $n$-fold convolutions, and suppose that $F_1(x) = F_2(x)$ for $|x| \ge a$. Under what conditions and in what zones is it true that, as $n \to \infty$,

$$\frac{1 - F_1^{(n)}(x)}{1 - F_2^{(n)}(x)} \to 1\,? \qquad (20.1.4)$$

If

$$\int_{-\infty}^{\infty} \exp|x|^{4\alpha/(2\alpha+1)}\,dF_1(x) < \infty, \qquad (20.1.5)$$

then for $\alpha > 1/6$, (20.1.4) will not hold in general in $0 \le x < n^{\alpha}$, since the limiting tails of Cramér's system depend on the moments, and the moments of $F_1$ and $F_2$ need not coincide. If $\alpha < 1/6$, then (20.1.4) depends on the equality of the first two moments, and does not even need the condition that the tails of $F_1$ and $F_2$ be identical. Thus Kolmogorov's problem is solved under (20.1.5). In the absence of this condition the situation is obscure. For variables of class (A) (§ 14.1) it is easy to check that (20.1.4) is implied by the equality of the first two moments of $F_1$ and $F_2$ in $[0, c(\log n)^{1/2}]$, and also in $[n^{1/2+\alpha^{-1}+\varepsilon}, \infty]$ if $F_1(\pm x) \sim F_2(\pm x)$. Between these two zones it will not hold unless the pseudomoments of $F_1, F_2$ coincide, which is a condition on the distributions as a whole, and not merely on their tails.

(11) Extend the results of Chapters 9-14 to the case in which the variables do not have the same distribution. (Much has been done in this respect by Petrov [128], [129].)

(12) Deduce analogous results for Markov chains. (Some results have been established by V. A. Statylyavichyus.)

(13) Extend the results of Chapters 8-14 to random vectors.

(14) Investigate the large deviations of infinite-dimensional objects, such as the whole history of Markov chains.

Chapter 15

(1) Can Theorem 15.1.1 be proved in a neat analytical way using the apparatus of characteristic functions? (Kolmogorov [85])

(2) If in Theorem 15.1.1 the uniform distance $|F_n - D_n|$ is replaced by the variational distance $\rho(F_n, D_n)$ (§ 15.1), it is not known whether, as $n \to \infty$,

$$\sup_F \inf_{D_n} \rho(F_n, D_n) \to 0.$$
Chapters 16-19

(1) A vast field of research is presented by the problem of characterising stationary processes satisfying one or other of the conditions of weak dependence. (See for example [87], [55], [56], [184].) What conditions, for example, are laid upon the moments of a stationary non-Gaussian process by the strong mixing condition?

(2) In Theorem 18.2.3, can the uniform mixing condition be replaced by the strong mixing condition?

(3) Conjecture: If a sequence $X_j$ stationary in the strict sense is uniformly mixing and satisfies

$$E(X_j^2) < \infty, \qquad \lim_{n\to\infty} V\Bigl(\sum_{j=1}^{n} X_j\Bigr) = \infty,$$

then it satisfies the central limit theorem (cf. Theorems 18.5.1, 18.5.2).

(4) Can Theorem 18.5.3 be refined in the following way? Let a stationary sequence $X_j$ be strongly mixing with mixing coefficient $\alpha(n)$, and let $E|X_j|^{2+\delta} < \infty$ for some $\delta > 0$. Can one give a number $a = a(\delta)$ such that the central limit theorem holds whenever $\alpha(n) = o(n^{-a})$, and for which an example of a sequence not satisfying the central limit theorem can be found for any $\alpha(n)$ with $\limsup \alpha(n) n^{a} > 0$, it being always assumed of course that $V(\sum X_j) \to \infty$?

(5) How far from the best possible condition is (1) of Theorem 18.6.1? For instance, suppose that $X_j$ are independent variables taking two values with constant probabilities $p$, $q$, and that [...]. What is the precise order of magnitude of the quantity

$$\gamma(n) = E\{Y_0 - E(Y_0 \mid X_{-n}, \ldots, X_n)\}^2$$

which ensures that $Y_j$ satisfies the central limit theorem? In the context of § 19.3, how can this condition be expressed in terms of $f \in L_2(0, 1)$?

(6) How are the conditions for the central limit theorem to hold for Markov chains related to the conditions of weak dependence? For example, let $f(X_j)$ be the stationary sequence derived from a homogeneous Markov chain $X_j$ (§ 19.1). For this sequence to satisfy the central limit theorem, is it enough that it be regular? If not, is it enough that it be strongly mixing?
Appendix 1
SLOWLY VARYING FUNCTIONS

A positive function $h(x)$, defined for $x \ge 0$, is said to be slowly varying if, for all $t > 0$,

$$\lim_{x\to\infty} \frac{h(tx)}{h(x)} = 1. \qquad (A1.1)$$

A positive function $q(x)$ $(x \ge 0)$ is said to be regularly varying with exponent $\alpha$ if, for all $t > 0$,

$$\lim_{x\to\infty} \frac{q(tx)}{q(x)} = t^{\alpha}. \qquad (A1.2)$$

Clearly (A1.2) is equivalent to the assertion that $q(x) = x^{\alpha} h(x)$, where $h(x)$ is a slowly varying function.

Theorem A1.1. A slowly varying function $h(x)$ which is integrable on any finite interval may be represented in the form

$$h(x) = c(x) \exp\Bigl\{\int_a^x \frac{\varepsilon(t)}{t}\,dt\Bigr\}, \qquad (A1.3)$$

where

$$\lim_{x\to\infty} c(x) = c \neq 0, \qquad \lim_{x\to\infty} \varepsilon(x) = 0,$$

and $a > 0$.

This theorem is due to Karamata [62]. For its proof we require two lemmas.
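Lemma A1.1.

$$\lim_{x\to\infty} \int_0^x h(t)\,dt = \infty.$$

Proof. By (A1.1), as $x \to \infty$,

$$\log h(x) = \sum_{k=0}^{[\log x]+1} \{\log h(x2^{-k}) - \log h(x2^{-k-1})\} + O(1) = o(\log x),$$

so that, for large $x$, $h(x) \ge x^{-1/2}$. •

Lemma A1.2.

$$\lim_{x\to\infty} \int_0^1 \frac{h(tx)}{h(x)}\,dt = \int_0^1 \lim_{x\to\infty} \frac{h(tx)}{h(x)}\,dt = 1. \qquad (A1.4)$$

Proof. By Fatou's lemma,

$$\liminf_{x\to\infty} \int_0^1 \frac{h(tx)}{h(x)}\,dt \ge 1, \qquad (A1.5)$$

so that the function

$$a(x) = \frac{x h(x)}{\int_0^x h(t)\,dt} \qquad (A1.6)$$

is bounded. By l'Hôpital's rule, it is easy to show from (A1.1) that, for all $r > 0$,

$$\lim_{x\to\infty} \frac{a(rx)}{a(x)} = \lim_{x\to\infty} \frac{rx\,h(rx)}{x\,h(x)} \cdot \frac{\int_0^x h(t)\,dt}{\int_0^{rx} h(t)\,dt} = 1, \qquad (A1.7)$$

so that $a(x)$ is also slowly varying. It is also bounded, so that

$$\lim_{x\to\infty} \{a(rx) - a(x)\} = 0.$$

If $H(x) = \int_0^x h(t)\,dt$, then (A1.6) may be written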
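$$\frac{H'(x)}{H(x)} = \frac{a(x)}{x},$$

so that

$$H(x) = \int_0^x h(t)\,dt = c \exp \int_a^x \frac{a(t)}{t}\,dt, \qquad (A1.8)$$

where $c$, $a$ are constants. Thus

$$x h(x) = c\,a(x) \exp \int_a^x \frac{a(t)}{t}\,dt,$$

and by (A1.7),

$$\exp \int_x^{rx} \frac{a(t)}{t}\,dt = r\,\frac{h(rx)}{h(x)}\,\frac{a(x)}{a(rx)} \to r$$

as $x \to \infty$, or equivalently,

$$\lim_{x\to\infty} \int_1^r \frac{a(xt)}{t}\,dt = \log r. \qquad (A1.9)$$

Since $\log r = \int_1^r dt/t$, (A1.9) shows that, as $x \to \infty$,

$$\{a(x) - 1\} \log r - \int_1^r \frac{a(x) - a(xt)}{t}\,dt \to 0. \qquad (A1.10)$$

The integrand is bounded, and tends to zero as $x \to \infty$ for fixed $t$, so that

$$\lim_{x\to\infty} \int_1^r \frac{a(x) - a(xt)}{t}\,dt = 0.$$

Thus (A1.10) shows that

$$\lim_{x\to\infty} a(x) = 1. \;\bullet$$

Proof of theorem. Set $\varepsilon(x) = a(x) - 1 \to 0$ as $x \to \infty$, and $c(x) = c\,a(x)/a$. Then

$$x h(x) = c\,a(x) \exp \int_a^x \frac{a(t)}{t}\,dt = x\,c(x) \exp \int_a^x \frac{\varepsilon(t)}{t}\,dt. \;\bullet$$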
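As an illustration of the representation (A1.3), the following minimal Python sketch checks it numerically for $h(x) = \log x$. The choices $a = e$, $c(x) \equiv 1$ and $\varepsilon(t) = 1/\log t$ are assumptions made for this example only, as are the function names and the step count. With these choices $\int_e^x \varepsilon(t)\,dt/t = \log\log x$, so the right-hand side of (A1.3) should reproduce $\log x$ up to quadrature error.

```python
import math


def epsilon(t):
    # Assumed choice for this example: eps(t) = 1 / log t, which tends to 0.
    return 1.0 / math.log(t)


def karamata_rhs(x, a=math.e, c=1.0, steps=20000):
    """Right-hand side of (A1.3) with the constant factor c(x) = c.

    The substitution t = exp(u) turns int_a^x eps(t)/t dt into
    int_{log a}^{log x} eps(exp(u)) du, which is evaluated here
    by the midpoint rule with `steps` subintervals.
    """
    lo, hi = math.log(a), math.log(x)
    du = (hi - lo) / steps
    integral = sum(epsilon(math.exp(lo + (k + 0.5) * du)) for k in range(steps)) * du
    return c * math.exp(integral)


if __name__ == "__main__":
    # Compare log x with the Karamata-type representation at a few points.
    for x in (1e2, 1e4, 1e8):
        print(x, math.log(x), karamata_rhs(x))
```

The two printed columns should agree to several digits; the example is only a sanity check of one explicit case, not a substitute for the proof.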
The theorem has a number of simple consequences, of which the following are useful.

(1) [...]

(2) For all $\varepsilon > 0$,

$$\lim_{x\to\infty} x^{\varepsilon} h(x) = \infty, \qquad \lim_{x\to\infty} x^{-\varepsilon} h(x) = 0.$$

(3) $\lim \sup$ [...] $= 1$.

A positive function $h(n)$ of a positive integral argument $n$ is called slowly varying if, for any positive integer $k$,

$$\lim_{n\to\infty} \frac{h(kn)}{h(n)} = 1.$$

Generally speaking, a slowly varying function of an integer argument does not have the properties, such as (1), (2) and (3), which distinguish the slowly varying functions of a continuous variable. In order to be able to invoke such properties, it is necessary that $h(n)$ be not merely slowly varying, but also have a slowly varying extension $h(x)$ defined for all $x \ge 0$. An example of a slowly varying function $h(n)$ which does not have such an extension is

$$h(n) = (\text{number of prime divisors of } n) + (\log n)^{1/2}.$$
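The definitions (A1.1) and (A1.2), and consequence (2), can be illustrated numerically. The sketch below is only an illustration under assumed choices that are not taken from the text: the slowly varying function $h(x) = \log(1 + x)$, the exponent $\alpha = 1/2$, the factor $t = 10$ and $\varepsilon = 0.1$. The convergence in (A1.1) is only logarithmic for this $h$, which is why very large arguments are used.

```python
import math


def h(x):
    # Assumed example of a slowly varying function.
    return math.log1p(x)


def q(x, alpha=0.5):
    # Regularly varying with exponent alpha, built as x**alpha * h(x).
    return x ** alpha * h(x)


def ratio(func, t, x):
    # The quotient func(t*x) / func(x) appearing in (A1.1) and (A1.2).
    return func(t * x) / func(x)


if __name__ == "__main__":
    t, eps = 10.0, 0.1
    for k in (2, 10, 50, 200):
        x = 10.0 ** k
        # Columns: k, h(tx)/h(x), q(tx)/q(x), x^eps h(x), x^(-eps) h(x).
        print(k, ratio(h, t, x), ratio(q, t, x), x ** eps * h(x), x ** (-eps) * h(x))
```

As $k$ grows, the second column drifts towards 1 and the third towards $10^{1/2}$, while the last two columns grow and decay respectively, in line with consequence (2).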
Appendix 2
THEOREMS ON FOURIER TRANSFORMS

For $p > 0$, we denote by $L_p$ the collection of functions $f(x)$ for which

$$\|f\|_p = \Bigl(\int_{-\infty}^{\infty} |f(x)|^p\,dx\Bigr)^{1/p}$$

is finite, and write $q = p/(p - 1)$. The following two theorems, due to Titchmarsh, are extensions of those of Plancherel and Parseval for the case $p = 2$. Their proofs may be found in Chapter 4 of [180].

Theorem A2.1. If $f \in L_p$, then

$$\int_{-a}^{a} f(t) e^{ixt}\,dt$$

converges in $L_q$ mean as $a \to \infty$ to a function $F(x)$, called the Fourier transform of $f$, which satisfies the inequality [...] (A2.1). For almost all $x$, we have the dual relations [...].

Theorem A2.2. If $1 < p \le 2$, $f(x), G(x) \in L_p$, and $F(x)$ and $g(x)$ are their Fourier transforms, then

$$\int_{-\infty}^{\infty} F(x) G(x)\,dx = \int_{-\infty}^{\infty} f(x) g(x)\,dx. \qquad (A2.2)$$
Theorem A2.3. Let $F(x)$ be the Fourier transform of $f(x) \in L_p$ $(1 < p \le 2)$. If also $f(x) \in L_q$ and $f'(x) \in L_p$, then $xF(x) \in L_q$ and $-ixF(x)$ is the Fourier transform of $f'(x)$.

Proof. Since

$$\int_{-\infty}^{\infty} |f(t) f'(t)|\,dt \le \|f\|_q\,\|f'\|_p < \infty,$$

the limit as $x \to \infty$ of

$$f(x)^2 - f(0)^2 = 2 \int_0^x f(t) f'(t)\,dt$$

exists. Since $|f(x)|^2$ is integrable, the limit of $f(x)^2$ cannot be non-zero, so that

$$\lim_{x\to\infty} f(x) = 0,$$

and similarly as $x \to -\infty$. But

$$\int_{-a}^{a} f'(u) e^{ixu}\,du = \{f(a) e^{ixa} - f(-a) e^{-ixa}\} - ix \int_{-a}^{a} f(u) e^{ixu}\,du,$$

and as $a \to \infty$ the left-hand side converges in $L_q$ to the Fourier transform of $f'(x)$, and the right-hand side to $-ixF(x)$. •

This theorem is a slight generalisation of Theorem 68 of [180].
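Theorem A2.3 can be checked numerically on a concrete function. The sketch below takes $f(t) = e^{-t^2/2}$ (an assumed example) and the unnormalised transform $\int f(t) e^{ixt}\,dt$ that appears in the proof, truncated to $[-a, a]$ and evaluated by the midpoint rule. The helper names, the cut-off $a = 12$ and the step count are illustrative assumptions. For this $f$ the transform is $(2\pi)^{1/2} e^{-x^2/2}$, and the transform of $f'$ should equal $-ix$ times it.

```python
import cmath
import math


def fourier(func, x, a=12.0, steps=4000):
    """Truncated transform int_{-a}^{a} func(t) e^{ixt} dt via the midpoint rule."""
    dt = 2.0 * a / steps
    total = 0j
    for k in range(steps):
        t = -a + (k + 0.5) * dt
        total += func(t) * cmath.exp(1j * x * t)
    return total * dt


def f(t):
    # Assumed test function: a Gaussian, which lies in every L_p.
    return math.exp(-t * t / 2.0)


def f_prime(t):
    # Derivative of the Gaussian above.
    return -t * math.exp(-t * t / 2.0)


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        F = fourier(f, x)
        # The two printed complex numbers should coincide up to rounding.
        print(x, fourier(f_prime, x), -1j * x * F)
```

The boundary term $f(a)e^{ixa} - f(-a)e^{-ixa}$ from the proof is of order $e^{-72}$ at $a = 12$, which is why the truncated integrals already agree to rounding error.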
Appendix 3
A THEOREM ON CONVERGENCE OF CONDITIONAL EXPECTATIONS

Theorem A3.1. Let $X$ be a random variable with $E|X|^p < \infty$ $(p > 1)$, and let $\mathfrak{M}_n$ be a $\sigma$-algebra of events for each integer $n$, with $\mathfrak{M}_n \subset \mathfrak{M}_{n+1}$ $(n = \ldots, -1, 0, 1, 2, \ldots)$. Let

$$\mathfrak{M}_{-\infty} = \bigcap_n \mathfrak{M}_n,$$

and let $\mathfrak{M}_{\infty}$ be the smallest $\sigma$-algebra containing $\bigcup_n \mathfrak{M}_n$. Then

$$\lim_{n\to-\infty} E|E(X \mid \mathfrak{M}_n) - E(X \mid \mathfrak{M}_{-\infty})|^p = 0, \qquad \lim_{n\to\infty} E|E(X \mid \mathfrak{M}_n) - E(X \mid \mathfrak{M}_{\infty})|^p = 0. \qquad (A3.1)$$

In particular, if $X$ is measurable with respect to $\mathfrak{M}_{\infty}$, then

$$\lim_{n\to\infty} E|E(X \mid \mathfrak{M}_n) - X|^p = 0. \qquad (A3.2)$$

The proof may be found in § 7.1 of [31]. We note that the left-hand side of (A3.1) is finite for all $n$, since by Jensen's inequality [31], for any $\sigma$-algebra $\mathfrak{M}$,

$$E|E(X \mid \mathfrak{M})|^p \le E\{E(|X|^p \mid \mathfrak{M})\} = E|X|^p.$$

In case $p = 2$, the point of view of § 16.3 gives (A3.1) a simple geometric meaning. If $H_n$ is the subspace of $L_2(\Omega)$ consisting of the random variables measurable with respect to $\mathfrak{M}_n$, and $P_n$ is the projection operator onto $H_n$, then it is easy to verify that
NOTES

Chapter 1

§§ 1-3: The development of probability theory on the basis of the concepts of measure theory, including the definition of conditional expectation and conditional probability, comes from Kolmogorov [76]. Theorem 1.3.1 is cited without proof in [82].

§ 4: Characteristic functions were first used to prove limit theorems in probability theory by Lyapunov [106]. Their basic properties were studied by Lévy [93].

§ 5: Theorem 1.5.1 comes from Lévy [93]. The remaining theorems of this section are due to Esseen [33], [34].

§ 6: Theorem 1.6.1 is well known. For much more general results, see the book by Linnik [102], from which our proof is taken.

§ 7: Infinitely divisible distributions were first studied by de Finetti [38]. Formula (1.7.1) was discovered by Lévy [94], but in the case of finite variance had been earlier obtained by Kolmogorov [78]. Theorem 1.7.2 comes from Khinchin [70], Theorem 1.7.3 from Gnedenko [39].

Chapter 2

§§ 1-2: The results of these sections are due to Lévy [93] and to Khinchin and Lévy [75].

§ 3: Theorem 2.3.1 for $\alpha > 1$ was proved by Lapin, for $\alpha < 1$ by Linnik [99] and Skorokhod [174]. Theorem 2.3.2 is a combination of the results of Linnik [99], Zolotarev [188] and Medgyessy [109].

§ 4: Theorems 2.4.2 and 2.4.5 were proved by Bergström [6], Theorem 2.4.1 by Bergström [6] and Pollard [135], Theorem 2.4.3 by Skorokhod [174]. The asymptotic expansions of Theorems 2.4.4 and 2.4.6 are new, the leading terms having been obtained by Skorokhod [174] and Linnik [99] respectively.
§ 5: Unimodality was defined by Khinchin [72]; the unimodality of stable distributions was proved by Ibragimov and Chernin [58], see also Lévy [97].

§ 6: The domain of attraction of the normal law was studied by Khinchin [69] and Lévy [95]. The domains of attraction of stable laws with exponent $\alpha \neq 2$ were investigated by Gnedenko [40] and Doeblin [30]. Theorems 2.6.1 and 2.6.2 are reformulations of the results of these authors. Theorem 2.6.3 is due to Sakovich [165], Theorem 2.6.4 was proved for $\alpha = 2$ by Khinchin [69] and for $\alpha \neq 2$ by Gnedenko. Theorem 2.6.5 is new.

Chapter 3

§ 3: Esseen [33]. For more restrictive conditions see Cramér [23]. The case of convergence to stable laws with $\alpha \neq 2$ was investigated by Cramér [21] and Zolotarev [191].

§ 4: A new result.

§ 5: The first estimate of $F_n - \Phi$, in the spirit of Theorem 3.5.1, was obtained by Lyapunov [101], the final result by Esseen [33]. Theorem 3.5.2 was proved by Esseen [35], the supplementary results by Rogozin [152].

§ 6: Esseen [33].

Chapter 4

§ 1: For earlier studies of local limit theorems, see von Mises [114] and Bavli [4], [5].

§ 2: Gnedenko [42], [43], [45]. Local limit theorems for sums of variables with different distributions have been obtained by Prokhorov [139], Rozanov [159] and Petrov [123].

§ 3: Gnedenko [44], [45]. Extensions to non-identical summands by Smith [179] and Petrov [122].

§ 4: Prokhorov [138]. In the work of Sirazhdinov and Mamatov [172] bounds were obtained for $\|p_n - \phi\|$ for the case of normal convergence.

§ 5: Theorems 4.5.1 and 4.5.3 are new, Theorems 4.5.2 and 4.5.4 due to Esseen [33]. For the non-identical case see Petrov [125].

Chapter 5

§ 1: The first limit theorems in the $L_p$ metric (in the case of normal convergence) are due to Agnew [1], who described them as global variants of the central limit theorem.

§ 2: A new result.

§ 3: Esseen [34].
Chapter 6

§§ 1, 2: Formulae analogous to (6.1.6) and (6.1.7) were obtained by Smirnov [175] in 1933. More general local theorems were obtained in 1957 by Richter [147], [148] by using the method of steepest descents and working under a condition introduced by Cramér [19], here cited in a refined form due to Petrov [133]. Zolotarev [192] obtained the limit theorems for large deviations for variables in the domain of attraction of a non-normal stable law. For the role of large deviations in information theory, see [28].

Chapter 7

§ 5: Bernstein [7] and Richter [149] introduced refinements of the inequalities (7.5.3). (In Richter's work there is an easily corrected error: in formula (1) the expression [...] should be [...].)

Chapter 8

Petrov [133].

Chapter 9

Linnik [100], [101], [104].

Chapter 10

Petrov [130]. Other local theorems for large deviations have been obtained by Richter [146] and Nagaev [120].

Chapters 11, 12

These chapters largely describe the results of [101].

Chapter 13

Petrov [131].

Chapter 14

Linnik [104].

Chapter 15

§§ 1-4: The basic theorem of this chapter is due to Kolmogorov [85], whose paper also describes the history of this problem. Concentration
functions were introduced by Lévy. Theorem 15.2.1 is an amplification due to Rogozin [153] of a result of Kolmogorov [84]. Lemma 15.3.5 comes from Prokhorov [140]. Meshalkin [111] has shown that

$$\sup_F \inf_D |F_n - D| \ge C n^{-2/3} (\log n)^{-4}.$$

In [85] there is an analogue of Theorem 15.1.1 for non-identical summands.

Chapter 16

§ 1: Processes stationary in the wide sense were first studied by Khinchin [65].

§ 2: An exposition of the theory of measure-preserving transformations may be found in [154]. Recent achievements in this field have been the result of the use of methods derived from probability theory, and especially from the theory of stationary processes (see [156], [170] and [167]).

§ 3: The geometrical interpretation comes from Kolmogorov [79], [80].

§ 4: Khinchin [65].

§ 5: Theorem 16.5.1 was proved by Cramér [20], although equation (16.5.1), without $Z(\lambda)$, was known to Kolmogorov [79]. For proofs of Theorem 16.5.1 not using the spectral theory of unitary operators, see [31], [187].

§§ 6, 7: Kolmogorov [80].

Chapter 17

§ 1: Regular processes were studied by Vinokourov [183]; Lemma 17.1.1 is due to Wold [185]. The idea of linear regularity, and condition (17.1.7), come from Kolmogorov [80]. Theorem 17.1.2 is a very special case of theorems on the spectrum of $K$-systems and $K$-flows [86], [171].

§ 2: The strong mixing condition was introduced by Rosenblatt [158], the uniform mixing condition by Ibragimov [152]. The results of this section come from [184] and [57].

§ 3: Further information on the spectral densities of strongly mixing processes may be found in [87], [55], [56], [163].

Chapter 18

§ 1: An alternative technique to that of Bernstein for proving limit theorems is Markov's method of moments, which has been applied to stationary processes by Leonov and Shiryaev [88], [89], [91], [92]. Theorems 18.1.1 and 18.1.2 are due to Ibragimov [184].
§§ 2, 3: Theorems 18.2.2 and 18.3.2 are from Leonov [90]. Theorems 18.2.3 and 18.3.3 are new. Equation (18.2.7) comes from Robinson [151].

§§ 4-7: Mainly the results of Ibragimov [52], [51]. Related investigations not confined to stationary processes may be found in the work of Volkonskii and Rozanov [184] and in that of Rozanov [160], [161], [162]. Theorem 18.4.2 comes from [184]. The first limit theorems for strongly mixing processes were proved by Rosenblatt [158]. A variant under a condition weaker than strong mixing has been proved by Sinai [169]. Estimates for the rate of convergence may be found in [177]. The method for deducing the analogous results in continuous time is due to Kolmogorov [77]. Results similar to those of § 5 were obtained by Ciucu [14], [15], see also [16].

Chapter 19

§ 1: The central limit theorem for finite Markov chains was proved by Markov himself [108]. Theorem 19.1.2 comes from Nagaev [117], whose method of proof differs from ours. In [118] and [119] the conditions of that theorem are further relaxed. The most complete results on inhomogeneous Markov chains were found by Dobrushin [27] and Statulevicius [176]. In [110] Meshalkin has enumerated all possible limit distributions for sums of random variables defined on a finite homogeneous Markov chain.

§ 2: $m$-dependent random variables were first studied by Hoeffding and Robbins [51]. Theorem 19.2.1 is due to Diananda [25], [26].

§ 3: The results of this section, which are due to Ibragimov [53], are amplifications of theorems of Kac [60]. Leonov [89] has investigated the distribution of values of sums of the form $\sum f(A^k t)$, where $f$ is defined on an $n$-dimensional cube, and the integral matrix $A$ has no eigenvalues which are roots of unity.

§ 4: These profound results in the metric theory of continued fractions were obtained by Khinchin [66], [67]. The first part of Theorem 19.4.1 is by Ryll-Nardzewski [164], the second by Ibragimov [53].
Theorem 19.4.2 is due to Ryll-Nardzewski [164], but weaker variants were known to Khinchin. The central limit theorem for continued fractions was first proved by Doeblin [29], Theorem 19.4.3 by Ibragimov [153]. In [54] a central limit theorem was proved for the denominators $q_n(t)$. The metric theory of more general number systems has been studied by Rényi [144] and by Rokhlin [155].

§ 5: Rosenblatt [158].
SOME CONTRIBUTIONS OF RECENT YEARS

I. A. Ibragimov, V. V. Petrov

The present chapter is a review of contributions published in the years between the appearance of the original (Russian language) version of this book and the present translation (1965-1970). Its authors have not attempted to review all such contributions pertaining to the book's subject matter; consideration is essentially given to those works which to a certain extent extend or develop the results of preceding chapters, such as those solving the problems of Chapter 20. Thus proofs are either wholly omitted, or only touched on. The referencing of this chapter is self-contained; all references given are to the complementary reference list at the conclusion of the chapter.

On chapter 3

In recent years a great deal of work has been devoted to estimating the remainder term of the central limit theorem. M. Katz [52] has obtained the following generalization of the Berry-Esseen estimate (Theorem 3.5.1). Let $X_1, \ldots, X_n$ be independently and identically distributed random variables with zero mean and positive variance $\sigma^2$. Let $E(X_1^2 g(X_1)) < \infty$ for a non-negative even function $g(x)$ with the properties that $g(x)$ and $x/g(x)$ are non-decreasing in the region $x \ge 0$ and $\lim_{x\to\infty} g(x) = +\infty$. Put

$$F_n(x) = P\Bigl((\sigma n^{1/2})^{-1} \sum_{j=1}^{n} X_j < x\Bigr), \qquad \Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-t^2/2}\,dt.$$

Then

$$\sup_x |F_n(x) - \Phi(x)| \le \frac{C\,E(X_1^2 g(X_1))}{\sigma^2 g(\sigma n^{1/2})},$$

where $C$ is an absolute constant. Substituting $g(x) = |x|$, we recover the Berry-Esseen estimate

$$\sup_x |F_n(x) - \Phi(x)| \le \frac{C\,E|X_1|^3}{\sigma^3 n^{1/2}}.$$
V. M. Zolotarev [13] has noted that, in this last estimate, one may put $C = 0.82$. V. V. Petrov [33], L. V. Osipov [25], L. V. Osipov and V. V. Petrov [26], W. Feller [47], and others, have studied the generalization of the Berry-Esseen estimate to non-identically distributed independent random variables.

Considerable progress has been made on non-uniform estimates of the remainder term in the central limit theorem, which are essential refinements of the uniform estimates. We first note the following result of S. F. Kolodiazhniy [18], of interest beyond the central limit theorem alone, and relevant to a theorem of Esseen (Theorem 3.6.1). Let $F(x)$ be an arbitrary distribution function, with finite absolute moment of order $p > 0$. Put $\Delta = \sup_x |F(x) - \Phi(x)|$. If $0 < \Delta \le e^{-1/2}$, then there exists a constant $c(p)$, depending on $p$, such that

$$|F(x) - \Phi(x)| \le \frac{c(p)\,\Delta\,(\log(1/\Delta))^{p/2} + \lambda_p}{1 + |x|^p}$$

for all $x$. Here

$$\lambda_p = \Bigl|\int_{-\infty}^{\infty} |x|^p\,dF(x) - \int_{-\infty}^{\infty} |x|^p\,d\Phi(x)\Bigr|.$$

As indicated in [18], this estimate is optimal in a certain sense.

The following important refinement of the Berry-Esseen estimate is due to S. V. Nagaev [24]. Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables such that $E(X_1) = 0$, $E(X_1^2) = \sigma^2 > 0$, $E|X_1|^3 = \beta_3 < \infty$. Then

$$|F_n(x) - \Phi(x)| \le \frac{C \beta_3}{\sigma^3 n^{1/2} (1 + |x|)^3}$$

for all $x$. Here $C$ is an absolute constant. A generalization of this result to non-identically distributed random variables was obtained by A. Bikialis [1].

Non-uniform and uniform estimates of the remainder in the central limit theorem without assumptions concerning the existence of moments of the random variables under investigation may be found in the paper of L. V. Osipov and V. V. Petrov [26].

We now pass on to an account of some recent results concerning asymptotic expansions in the central limit theorem. Let $X_1, X_2, \ldots$ be a sequence
408 I. A. IBRAGIMOV, V. V. PETROV of independent random variables with the same distribution function V(x); suppose E(Xl) = 0, E{X\) = a2 >0, and let v(t)=E(eilXi), U E\Xj_\K< oo for some integer k >3, then for all x and n K~2 Pv( — $ F.(x)-*(x)- v= 1 n \y\KdV(y) |-K- 1 Here 8 = azA2 E\X113) 1 and c(k) is a positive constant, depending only on k. The function PV( — <P) is the same as in Theorem 3.3.3*). This result is due to L. V. Osipov [28]. We note that under the assumption that lim sup^^^ \v(t)\ < 1 (condition (C) of Cramer) we have sup|f| >a|y@l < 1 for any 8 >0, so that the factor (sup|,| >a|y(t)| + l/2«)" decreases faster than n~p for any p >0. The follow- following are then corollaries of Osipov's theorem. If Cramer's condition (C) holds and E\Xx\r< oo for some r ^3, then there exists a positive function e(u) such that limu_>ooe(u) = 0 and Fn(x)-<P(x)- v=l ,v/2 Also, if Cramer's condition (C) holds and ?|Z1|K< oo for some integer k ^ 3, then K — 2 n I rT\ 1*1 v= 1 -v/2 uniformly with respect to x. * n the paper of V.V. Petrov [31] there are explicit formulae for the functions PV(-<P).
SOME CONTRIBUTIONS OF RECENT YEARS 409 In the case k = 3, A. Bikialis [1] has shown that the preceding relation still holds if Cramer's condition (C)is replaced by the weaker requirement that the distribution of Xx be non-lattice. Let us now consider a sequence of independent random variables Xy, X2, ... having the same lattice distribution, on the possible values a + mh (m = 0, +1, + 2, ...) where h is the (maximum) lattice distance for the distribution. Let E(Xj)=0, E(Xf) = a2 >0, and E\Xx\r< oo for some r ^ 3. Then, as shown by L. V. Osipov [27], there exists a positive function e(u), such that lim,,-^ e(u)=0 and M-2 / h \v Fw(x)-77wr(x)- J] <5V — x v=l WHV xS, fxari* an '\~h~~~h an e(n*(\ + \x\)) Here + 1, -1, if if oo [r]y2n(-< v= 1 ^ v = 4m+l, v = 4m+ 3, cos 2nlx *) 5 4m + 2 4m, \2k sin 2tt /x Of some interest are upper and lower estimates of the remainder term in asymptotic expansions, having the same order. Let Xu X2, ... be a sequence of independent and identically distributed random variables with E(Xl)=0, E(X\) = a2 >0 and ?|X1|K< oo for some integer k ^3. We put V(x) = P{X, < x), x\ fdK(.x) (v = -X < (v=l,2,
410 I. A. IBRAGIMOV, V. V. PETROV _jAnx+\LntK+1\+LnfK+2, if k even, l^n,K-l+l^n.icl+^n,K+l , if K Odd . The following result is contained in L. V. Osipov's [29] paper. If ?|Z1|h'+1 = co and the distribution function V(x) satisfies Cramer's condition (C), then sup X k-2 *n(x)-*(x) - I PA-*) v=l 11 v/2 1,K for odd k, and sup v/2 for even k*). Here and the function FK _ 1 (— 0) is defined by the formal equation The paper [29] also contains an analogous result for the case when Xu X2, ... have identical lattice distributions. Up to this point we have been concerned with estimates of remainder terms in asymptotic expansions for distributions of sums of independent random variables. We now pass onto a consideration of necessary con- conditions for the representation of these distributions by similar asymptotic expansions. Let Xy, X2, ... be a sequence of independent random variables with the same distribution function V(x), zero mean and finite positive variance a2. Let ^!=0, /j,2 = g2, fi3, ^4, ... be a specified numerical sequence in which the numbers /i3, jiA, ... may be arbitrarily chosen. Let QK(x), k= 1, 2, ... be polynomials with coefficients expressed in terms of/i3, ..., fiK + 2, in the same manner as the coefficients of the classical polynomials QK(x) = Bn)iex2/2PK( — <P) are expressed in terms of the cumulants 73^ •••¦> Jk + 2 (see e-g- [31])-A sequence of numbers Pi,-fi2, ... is construct- * an^d>n denotes that 0< lim inf ajbn^ lim sup ajbn< oo.
SOME CONTRIBUTIONS OF RECENT YEARS 4n ed as follows: CK is defined in terms of nlf ..., fj,K in the same manner as moments are expressed in terms of cumulants, i.e. Cl=/.il, ^2 = ^2 + ^1, ^3 + 3^^2 + ^15 •••• We then have the following theorem, due to I. A. Ibragimov [15]. For the relation nKl2j k=1,2, ... to hold uniformly with respect to x, it is necessary (and for distribution V(x) satisfying Cramer's condition (C), sufficient) that the following conditions be satisfied: 1). the absolute moments up to order k + 1 of the distribution V(x) are finite, and ) = Pm (m=l, ... 2). f \x\K+1dV(x) = o(z-1) (z->oo); J 3) lim f xK+2dV(x) = z-*tx> J —z In recent years a number of papers have appeared in which the conver- convergence rate in the central limit theorem is investigated by means of series composed of weighted remainders [48], [49], [51]. Let Xl5 X2, ¦¦¦ be a sequence of independent and identically distributed random variables with E(X1)=0, 0<a2 = E(Xl)< 00, and Fn(x) the distribution function of the normed sum (en*)-1 j Heyde [49] has shown that the series n— 1 x converges if and only if: ?|A'1|2 + a<oo,@<^<l); E{Xj log A+ |A\|)} < 00 , {3 = 0). If Xlt ..., Xn are independent random variables each with the normal
412 I. A. IBRAGIMOV, V. V. PETROV distribution function <P(x), then supx\Fn(x) — <P(x)\ = 0, although the right hand side of the Berry-Esseen inequality differs from zero. Thus it is of interest to consider estimates of the remainder in the central limit theorem which do in fact become zero for normally distributed random variables. Let X1, X2, ¦.., Xn be independent random variables with the same distri- distribution function V(x), with E(X1) = 0, E(Xj) = l. We introduce the pseu- domoments 'oo v,= \x\ld(V(x)-$(x)) — oo (As far as is known, pseudomoments in connection with probabilistic limit theorems were first utilized by Bergstrom [43].) Extending the in- investigations of V. M. Zolotarev [10], V. Paulauskas [30] has shown that sup Pin-* Y,Xj<x)-<P(x) Cn * max (v3, v|), where C is an absolute constant. On chapter 4 We consider a sequence of independent and identically distributed ran- random variables Xx, X2, •••, with positive variance a2 and finite moment * of some integral order k^3. Put V. V. Petrov [32] has obtained the following refinements of Theorems 4.5.2 and 4.5.4- which are due respectively to B. V. Gnedenko and Esseen - without auxiliary conditions. If, for some n — n0 the random variable Zn has an absolutely continuous distribution with bounded density pn(x), then there exists a function ?(n) independent of x such that lim,,^^ ?(n) = 0 and pn{x)-(f)(x) - k-2 Mv/2 v= 1 " "i for all x. If the random variable Xy may only take values of the form a + Nh
SOME CONTRIBUTIONS OF RECENT YEARS 413 (N=0, + 1, ±2, ...) where h is the maximal lattice distance, and a some fixed number, then there exists a function 5 (n) independent of N such that lim 8(n)=0 and = 1 ,v/2 for all N, where In these theorems <?(*) = B7r)^e-^2, Pv(-# = ~ Pv(-*). Local limit theorems for sums of independent non-identically distributed random variables have been investigated by: V. V. Petrov [35]; V. A. Statuliavicius [40]; A. A. Mitalauskas and V. A. Statuliavicius [19]; N. G. Gamkrelidze [4], [5] ; D. A. Moskvin, L. P. Postnikova, and A. A. Yudin [21]; and V. L. Pipiras and V. A. Statuliavicius [37]. On chapter 5 Let Xy, X2, ... be a sequence of independently and identically distributed random variables with zero expectation and finite positive variance a2. As before, put j= Further, let / r« \ i/p \\Fn-<P\\P=[\ \Fn(x)-#(x)\>dx) , p>l. I. A. Ibragimov [14] has shown that for the relationship \\Fn-np=0(n-dl2) to hold for any p ^ 1 and 8 @< 5 < 1), it is necessary and sufficient that as
414 I. A. IBRAGIMOV, V. V. PETROV (In the case 8 = 1, it is necessary to supplement this condition by z-+oo.) Here V(x) is the distribution function of Xy. Heyde [51] has shown that the series n  n~ 1 converges if and only if ? {X\ log(l + |-X\ |)} < oo. Discarding the require- requirement of finite variance of the random variable Xl5 Heyde has shown that if the distribution function of Xl5 V(x), belongs to the domain of attraction of the normal law, and further the condition dx< oo ? n-'T p(B;liXj<x)-<P(x) ~ 1 J—co \ j=l / is satisfied, where Bn is a sequence of constants such that then it follows that ?(Xi) < oo, i.e. the distribution function V(x) be- belongs to the domain of normal attraction of the normal law. Estimates of the rate of convergence of Fn(x) to <P(x) in the metric of the space Lp may be obtained from non-uniform estimates of the difference Fn(x) — <P(x), which take into account the dependence of this difference on n and x. For example, from the results of L. V. Osipov [28] and V. V. Petrov [32] on asymptotic expansions in limit theorems, cited above, we arrive at the following conclusions. Let Xx, X2, ... be a sequence of in- independently and identically distributed random variables with zero mean and finite moment ?|X1|K for some integer k^3. If the distribution of the random variable Xx satisfies Cramer's condition (C), then .5=1 PV(-<P) ,v/2 for arbitrary p^l. If the random variable (an*) 12,nj=1Xj, where a2 =
SOME CONTRIBUTIONS OF RECENT YEARS 415 E(X\), has for some n = n0 an absolutely continuous distribution with bounded density pn{x), then Ilfl,-*IL = v= 1 iv/2 p for any p^l. Here <f>(x) = B7r)-±e -±e-*2/2 V. M. Zolotarev [11], [12] has investigated the topic of asymptotically correct constants in relation to refinements of limit theorems in Lp spaces. On chapters 6-14 A considerable amount of work in the literature of recent years has been devoted to limit theorems for probabilities of large deviations of sums of independent random variables, and to their application. We restrict our- ourselves to mentioning several results which are pertinent to the contents of Chapters 6-14. Let F (x) be the distribution function of a random variable with zero expectation, positive variance a2 and finite moments of all orders. Let yK be the cumulant of order k of the distribution F(x). V. A. Statuliavicius [59] has obtained relations of Cramer type for {1 — F(xa)}/{1 — <P(x)} and {F( — xa)/<P( — x)} in the interval l where A = a inf and H and 8 are certain positive constants. His results imply Theorem 8.4.1 (a refinement of Cramer's theorem), if for F(x) we take the distribu- distribution function of the normed sums of independent random variables each with the same distribution function V(x), satisfying the condition ehxdV< oo, \h\< A, for some A >0 (Cramer's condition (A)). •' — oo The paper [59] also contains information on the estimation of constants in remainder terms of Cramer-type relations.
As before, we shall say that a distribution satisfies Cramér's condition (C), if its characteristic function $v(t)$ satisfies $\limsup_{|t| \to \infty} |v(t)| < 1$. Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables, satisfying both Cramér's conditions (A) and (C). Let $E(X_1) = 0$, $E(X_1^2) = \sigma^2 > 0$, $F_n(x) = P\{(\sigma n^{1/2})^{-1} \sum_{j=1}^{n} X_j < x\}$. L. Saulis [39] has shown that there exists a positive constant $\varepsilon$ such that in the region $1 \le x \le \varepsilon n^{1/2}$ for integral $s \geq 2$ the relation […] holds. Here $\lambda(t)$ is Cramér's series (the same as in Theorem 8.4.1), and the $L_\nu(x)$ are functions for which explicit formulae are given in [39]. In particular, for $s = 2$ we have […]. If Cramér's condition (A) is satisfied, and the random variable $X_1$ has a non-lattice distribution, then L. Saulis [39] has shown that in the region $1 \le x \le \varepsilon n^{1/2}$ we have […].

V. A. Statuliavicius [59] and V. V. Petrov [36] have obtained generalizations of Cramér's limit theorem to non-identically distributed independent random variables. Local limit theorems for large deviations of sums of independent non-identically distributed summands, satisfying Cramér's condition (A), have been obtained by P. Survila [41], [42].

Extending the investigations of Yu. V. Linnik and V. V. Petrov in relation to large deviations of sums of independently and identically distributed random variables when Cramér's condition (A) may not hold (an account is given in Chapters 9-13), V. Wolff [2], [3] has obtained very general results for sequences of independent non-identically distributed random variables. In the course of these he has obtained estimates in the corresponding asymptotic expansions which appear to be new even in the particular case of identically distributed variables. We mention a corollary of Wolff's theorems. Let $X_1, X_2, \ldots$ be a sequence of independently and identically distributed random variables, with $E(X_j) = 0$, $E \exp\{|X_1|^{4\alpha/(2\alpha+1)}\} < \infty$ for some positive $\alpha < \tfrac{1}{2}$. Then […] as $n \to \infty$ in the region $0 \le x \le n^{\alpha}/\rho(n)$, where $\rho(n)$ is an arbitrary function satisfying $\lim_{n \to \infty} \rho(n) = +\infty$. Here $s$ is a non-negative integer, defined by the inequalities […], and $\lambda^{[s]}(t)$ is the truncation of Cramér's series $\lambda(t)$, consisting of the first $(s+1)$ members.

S. V. Nagaev [24] has related, to a substantial extent, the region in which the condition $E \exp\{|X_1|^{\beta}\} < \infty$ $(0 < \beta < 1)$ is sufficient for the relations $1 - F_n(x) \sim 1 - \Phi(x)$ and $F_n(-x) \sim \Phi(-x)$, or for known relations pertaining to truncations of Cramér's series, to the region in which this condition is necessary for similar relations to hold.

Recently L. V. Osipov has obtained necessary and sufficient conditions for […] as $n \to \infty$ uniformly with respect to $x$ in the domain $0 \le x \le n^{\alpha}$, where […].

We mention yet another result of A. V. Nagaev [24]: the condition $E|X_1|^m < \infty$ is sufficient for […] in the region $0 \le x \le \{(\tfrac{1}{2}m - 1)\log n\}^{1/2}$, and necessary for these same relations in the region $0 < x \le \{(m+1)\log n\}^{1/2}$.
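Relations of the type $1 - F_n(x) \sim 1 - \Phi(x)$ can be examined numerically in simple cases. The following Python sketch is an illustration only: it takes the summands to be independent signs $\pm 1$ with probability $\tfrac{1}{2}$ each (an assumption of this sketch, chosen because the tail of the sum is then an exact binomial sum) and compares the exact tail of the normed sum with the normal tail. The function names are hypothetical.

```python
import math

def normal_tail(x):
    """1 - Phi(x) for the standard normal distribution function Phi."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def rademacher_tail(n, x):
    """Exact P(S_n / sqrt(n) >= x) for S_n a sum of n independent +1/-1 signs.

    S_n = 2*B - n with B binomial(n, 1/2), so the event is B >= (n + x*sqrt(n)) / 2.
    """
    threshold = (n + x * math.sqrt(n)) / 2.0
    k_min = max(math.ceil(threshold - 1e-12), 0)   # small slack against rounding
    favourable = sum(math.comb(n, k) for k in range(k_min, n + 1))
    return favourable / 2 ** n                     # exact integers until this division

def tail_ratio(n, x):
    """Ratio {1 - F_n(x)} / {1 - Phi(x)}; values near 1 mean the normal tail is a good guide."""
    return rademacher_tail(n, x) / normal_tail(x)

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 3.0):
        print(x, round(tail_ratio(400, x), 3))
```

Because the signs form a lattice distribution, the ratio fluctuates with the position of $x$ relative to the lattice. The sketch is meant only to show how the quantity in question is computed, not to verify the zones stated above.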
For some special classes of distributions, in relation to the asymptotic behaviour of the probabilities of large deviations of sums of independent random variables, A. V. Nagaev [22], [23] has obtained results without restriction on the order of growth of $x$. In [23] the assumption made is that the distribution of the random variables is absolutely continuous with density $p(x) \sim \exp\{-|x|^{1-\varepsilon}\}$ as $|x| \to \infty$, where $0 < \varepsilon < 1$.

V. V. Petrov [34] and Rubin and Sethuraman [57] have obtained limit theorems for the probabilities of large deviations when Cramér's condition (A) is replaced by less restrictive conditions of one-sided character. For example, in [34] it is assumed that the moment generating function $E(e^{hX_1})$ is finite in some non-degenerate interval, one of the ends of which is $h = 0$.

Heyde [50] has investigated the asymptotic behaviour of probabilities of large deviations of sums of independently and identically distributed random variables belonging to the domain of attraction of a non-normal stable law.

On chapter 15

To date it is not clear to what extent the estimate (15.1.2) is conclusive. A series of interesting estimates for the concentration function of a sum of independent random variables has been obtained, using purely analytic means, by Esseen [45], [46]. In particular, [45] contains an analytic proof of Theorem 15.2.1. To demonstrate the approach of Esseen, we confine ourselves to proving the inequality (15.2.10) for identically distributed random variables, on which the proof of Theorem 15.2.1 depends.

Esseen's method depends on the following fundamental lemma [46]. Let $X$ be a random variable with concentration function $Q_X(L)$ and characteristic function $f(t)$. Then there exist absolute constants $C_1$ and $C_2$ such that

$$[\ldots]\int_{-b/2}^{b/2} |f(t)|^2\, dt \le Q_X(L) \le C_2\, a^{-1} \int_{-a}^{a} |f(t)|\, dt, \qquad (1)$$

where $b$ is an arbitrary positive number, and $a$ an arbitrary number satisfying the inequalities $0 < aL < \pi$.
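The two quantities compared in the right-hand inequality of Esseen's lemma can be computed directly for a discrete law. The Python sketch below is an illustration under stated assumptions: it uses the usual definition $Q_X(L) = \sup_x P(x \le X \le x + L)$ of the concentration function, takes $X$ binomial with parameters $(n, \tfrac{1}{2})$, and evaluates $a^{-1}\int_{-a}^{a} |f(t)|\,dt$ by the midpoint rule. The constant $\pi^2/8$ in the final check is what a triangular-kernel argument gives when $aL \le \pi$; it is an assumption of this sketch and is not the constant $C_2$ of [46]. All function names are hypothetical.

```python
import math

def concentration(pmf, L):
    """Q_X(L) = sup_x P(x <= X <= x + L) for a finitely supported law {value: prob}.

    For a discrete law the supremum is attained with x placed at an atom.
    """
    best = 0.0
    for x in pmf:
        mass = sum(p for v, p in pmf.items() if x <= v <= x + L)
        best = max(best, mass)
    return best

def abs_cf(pmf, t):
    """Modulus |f(t)| of the characteristic function of the law."""
    re = sum(p * math.cos(t * v) for v, p in pmf.items())
    im = sum(p * math.sin(t * v) for v, p in pmf.items())
    return math.hypot(re, im)

def esseen_integral(pmf, a, steps=4000):
    """a^{-1} * integral of |f(t)| over [-a, a], by the midpoint rule."""
    h = 2.0 * a / steps
    total = sum(abs_cf(pmf, -a + (i + 0.5) * h) for i in range(steps))
    return total * h / a

def binomial_pmf(n):
    """Law of the number of successes in n fair coin tosses."""
    return {k: math.comb(n, k) / 2 ** n for k in range(n + 1)}

if __name__ == "__main__":
    pmf = binomial_pmf(10)
    L, a = 1, 3.0                      # a * L = 3 < pi, as the lemma requires
    print(concentration(pmf, L), esseen_integral(pmf, a))
```

The integral is small exactly when $|f(t)|$ decays quickly near the origin, which is how the analytic method converts information about the characteristic function into a bound on the concentration function.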
We prove only the right hand inequality in (1). Let […] and

$$H(x) = \int e^{itx} h(t)\, dt;$$

then $h(t) = 1 - |t|$ if $|t| \le 1$, $h(t) = 0$ if $|t| > 1$. We have the following relation, denoting by $F(x)$ the distribution function of $X$: […], from which (the right hand side of) (1) follows easily.

Now let $X_1, \ldots, X_n$ be independently and identically distributed random variables, satisfying the conditions of Corollary 15.2.2. Let $F(x)$ and $f(t)$ be, respectively, the distribution function and the characteristic function of $X_1$. In the notation of Corollary 15.2.2, in view of (1), […] $\int_{-\pi/L}^{\pi/L} |f(t)|^n\, dt$, so that […]. Denote by $G(x)$ the distribution function of $X_1 - X_2$. Then

$$|f(t)|^2 = \int_{-\infty}^{\infty} \cos tx\, dG(x)$$

and in view of the conditions of Corollary 15.2.2, $p =$ […]. Further, on account of the inequality between the geometric and arithmetic means

$$\exp\{-\tfrac{1}{2} n (1 - |f(t)|^2)\} \le \exp\Bigl\{-n \int_{|x|>l} \sin^2 \tfrac{1}{2} tx\, dG(x)\Bigr\} \le \frac{1}{p} \int_{|x|>l} \exp\{-np \sin^2 \tfrac{1}{2} tx\}\, dG(x).$$

Thus […], which coincides with (15.2.10).

Interesting inequalities for the concentration functions of sums of independent variables are given by H. Kesten [53]. We mention also the earlier work of Le Cam [54].

On chapter 17

The recent monograph of I. A. Ibragimov and Yu. A. Rozanov [17] contains many results pertaining to conditions of regularity of stationary Gaussian processes.

On chapters 18-19

M. I. Gordin [6] has obtained a substantial strengthening of Theorem 18.5.3; among his results is the following.

Theorem. Let the stationary sequence $\{X_j\}$ satisfy the strong mixing condition with coefficient $\alpha(n)$. Suppose that for some $\delta \geq 0$, $E|X_j|^{2+\delta} < \infty$. If […] and as $n \to \infty$ $V(S_n) \asymp n$, then […].

Gordin uses a new method of proof which differs from the methods of
S. N. Bernstein used in Chapters XVIII-XIX. Namely, let the stationary sequence $\{X_j\}$ satisfy the condition

$$E\{X_j \mid X_{j-1}, \ldots\} = 0.$$

Such sequences will be called martingale-differences. Earlier, P. Billingsley and I. A. Ibragimov had shown, independently, that a stationary ergodic sequence of martingale-differences with finite variances is subject to the central limit theorem. Gordin's method is one of first approximating the stationary process under investigation by a sequence of martingale-differences, and then using the cited result of Billingsley-Ibragimov.

We remark that, whereas Gordin's theorem is substantially stronger than Theorem 18.5.3 for small $\delta$, it loses comparative strength as $\delta \to \infty$, and coincides for $\delta = \infty$ with Theorem 18.5.4. This is not surprising, since this last-mentioned theorem is practically unimprovable. This was demonstrated by Yu. A. Davydov [9], who constructed examples of stationary sequences $\{X_j\}$ such that […], $2 < \kappa < 3$, $V(S_n) = V\bigl(\sum_{1}^{n} X_j\bigr) \to \infty$, but the normed sum $\{V(S_n)\}^{-1/2} \sum_{1}^{n} X_j$ has in the limit a stable distribution with index $\kappa - 1$.

To construct this example, and a series of others, Davydov first investigated when Markov processes satisfy the strong mixing condition. For simplicity we confine ourselves to Markov chains with a denumerable number of states (see §1, Chapter 19). Suppose the sequence of random variables $\{Y_n\}$ forms such a Markov chain, with transition matrix $\|p_{ij}\|$. Suppose further that the states form a single aperiodic positive recurrent class. It is well known (e.g. [44]) that in this case

$$\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j$$

exists, where $\{\pi_j\}$ is the stationary distribution of the chain. If the initial distribution is taken as $\{\pi_j\}$, $\{Y_j\}$ is a stationary process.

Theorem. The stationary process defined above satisfies the strong mixing condition, and

$$\alpha(n) \le [\ldots]. \qquad (2)$$
Even though the proof of this assertion is not very complex, we omit it. On the basis of the inequality (2), it is natural to use constructions for the required examples similar to those used for analogous purposes in the theory of Markov chains [44]. Specifically, let us consider a Markov chain whose state space consists of all integers, the transition probabilities being defined by

$$p_{i,i+1} = p_{-i,-i-1} = a_i, \qquad p_{i0} = p_{-i,0} = 1 - a_i, \qquad i \geq 0.$$

Here $a_0 = \tfrac{1}{2}$, and for $i \geq 1$, $0 < a_i < 1$. Denote by $f_{ii}^{(n)}$ the probability of first return to state $i$ at the $n$-th step. Then for $n > 2$ […], $b_0 = b_1 = 1$, $b_n = a_1 \cdots a_{n-1}$, $n \geq 2$. Consequently if $\sum b_n < \infty$, the states of the chain form a single positive recurrent class, with stationary distribution […].

Now let us select numbers $a_n$ such that […] and define a stationary process $\{X_j\}$ by $X_j = f(Y_j)$, where the function $f$ is defined on the integers as follows: […]. With the aid of methods used in [44] it is possible to show that for the process $\{X_j\}$, the sums $\sum_{1}^{n} X_j$ normed in an appropriate sense are asymptotically distributed according to a stable law with index $\kappa - 1$. Finally, to see that $\alpha(n) = O(n^{2-\kappa})$, one needs to use the inequality (2) in conjunction with the following result: if a Markov chain consists of a single positive recurrent class, then for $s > 1$ […] if and only if […]. We omit the proof of this assertion.

The following examples may also be constructed, in an analogous manner:

(a) A stationary process $\{X_j\}$ such that $E|X_j|^r < \infty$, and the sums $S_n = \sum_{1}^{n} X_j$ are attracted to the stable law with index $2r/(r+1)$, but $\alpha(n) \asymp (n \log\log n)^{-1}$ (here $r$ is any number exceeding 2).

(b) A stationary process $\{X_j\}$ such that $|X_j| < 2$ and the sums $S_n = \sum_{1}^{n} X_j$ belong to the domain of partial attraction of all stable laws with indices from the interval $(1+\varepsilon, 2)$, $\varepsilon > 0$, and $c/n \le \alpha(n) \le c/n^{\varepsilon}$.

The results formulated above provide a partial solution to the problem of Section 4.6 of Chapters XVIII-XIX (see Chapter XX).

In the theorems of Chapter XVIII, it is assumed that […] $\to \infty$. The checking of whether this assumption holds is sometimes rather difficult. In the paper of M. I. Gordin [7] conditions are developed which imply $\lim E\bigl(\sum_{1}^{n} X_j\bigr)^2 = \infty$. For example, one such condition is […], which is, in particular, satisfied for stationary processes generated by the coefficients of decompositions into continued fractions (see §4 of Chapter 19).

Diverse limit theorems for processes with mixing may be found in the papers of M. H. Reznik [38], R. J. Serfling [58], and W. Philipp [55], [56]. The last-mentioned paper contains estimates of the rate of convergence to the limiting normal distribution in the spirit of Chapter III; it would seem, however, that, as yet, these are far from precise. Better estimates, but under more restrictive conditions (such as those applicable to the situation of §3 in Chapter XIX), have been obtained by I. A. Ibragimov [16]. D. Moskvin [20] has obtained theorems on large deviations for processes of the kind considered in §3 of Chapter XIX.
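The first-return probabilities of the integer-valued Markov chain used in the construction above can be checked by direct computation. The following Python sketch is an illustration under stated assumptions: the states $i$ and $-i$ are merged by symmetry, so that from $0$ the folded chain moves to level $1$ with probability 1, and the jump probabilities are taken as $a_i = (i/(i+1))^c$, a hypothetical choice that gives $b_n = n^{-c}$. Under these assumptions a path count gives $f_{00}^{(n)} = b_{n-1} - b_n$ for $n \geq 2$, and the code compares this with a step-by-step evolution of the chain.

```python
def b(a, n):
    """b_0 = b_1 = 1 and b_n = a_1 * ... * a_{n-1} for n >= 2."""
    prod = 1.0
    for i in range(1, n):
        prod *= a[i]
    return prod

def first_return_probs(a, n_max):
    """First-return probabilities to 0 at steps n = 2..n_max for the folded chain.

    a[i] (i >= 1) is the probability of moving from level i to level i + 1;
    with probability 1 - a[i] the chain jumps back to 0. a[0] is unused here.
    """
    level, mass = 1, 1.0       # after one step the chain is at level 1
    f = {}
    for n in range(2, n_max + 1):
        f[n] = mass * (1.0 - a[level])   # return to 0 exactly at step n
        mass *= a[level]                 # otherwise climb one level
        level += 1
    return f

if __name__ == "__main__":
    c = 1.5                                   # sum of b_n converges for c > 1
    a = [None] + [(i / (i + 1)) ** c for i in range(1, 60)]
    f = first_return_probs(a, 10)
    for n in range(2, 11):
        print(n, f[n], b(a, n - 1) - b(a, n))
```

With this choice the mean return time $\sum n f_{00}^{(n)}$ is finite exactly when $\sum b_n < \infty$, which matches the positive recurrence criterion quoted in the text.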
We remark that in §4 of Chapter XIX an example was given of the application of probabilistic considerations in the investigation of metric problems of number theory. Earlier examples of a similar sort may be found in the work of Gordin [7] and M. Waterman [60], in which other references are given. In particular, M. I. Gordin has shown that stationary processes generated by a whole series of number-theoretic endomorphisms (Riesz algorithms, […]-decompositions, Jacobi algorithms) satisfy the condition of uniformly strong mixing with exponentially decreasing mixing coefficient.

Bibliography for chapter 21

[1] Bikialis, A., Estimates of the remainder term in the central limit theorem (Russian), Litovskiy matem. sbornik, 1966, 6, No. 3, 321-346.
[2] Wolff, V., Some limit theorems for large deviations (Russian), Doklady Akademii nauk SSSR, 1968, 178, No. 1, 21-23.
[3] —, Some limit theorems for large deviations of sums of independent random variables (Russian), Doklady Akademii nauk SSSR, 1970, 191, No. 6, 1209-1211.
[4] Gamkrelidze, N. G., On the rate of convergence in the local theorem for lattice distributions (Russian), Teoriya veroiatn. i ee primen., 1966, 11, vyp. 1, 129-140.
[5] —, On the relation between the local and integral theorems for lattice distributions (Russian), Teoriya veroiatn. i ee primen., 1968, 13, vyp. 1, 175-179.
[6] Gordin, M. I., On the central limit theorem for stationary processes (Russian), Doklady AN S.S.S.R., 188, 4 (1969).
[7] —, On random processes generated by number-theoretical endomorphisms (Russian), Doklady AN S.S.S.R., 182, 5 (1968).
[8] —, Yu. A. Davydov, I. A. Ibragimov, V. I. Solev, Stationary processes: limit theorems, regularity conditions (Russian), Sovietsko-iaponskiy simposiyum po teorii veroiatnostey (Soviet-Japanese symposium on probability theory), Novosibirsk, 1969.
[9] Davydov, Yu. A., On the strong mixing property for Markov chains with a denumerable number of states (Russian), Doklady AN S.S.S.R., 187, No. 2 (1969).
[10] Zolotarev, V. M., On the closeness of the distributions of two sums of independent random variables (Russian), Teoriya veroiatn. i ee primen., 1965, 10, vyp. 3, 519-526.
[11] —, On an extremal problem in limit theorems for sums of independent random variables (Russian), Litovskiy matem. sbornik, 1964, 4, No. 3, 343-352.
[13] —, Some inequalities of probability theory and their application to the refinement of A. M. Liapunov's theorem (Russian), Doklady Akademii nauk S.S.S.R., 1967, 177, No. 3, 501-504.
[14] Ibragimov, I. A., On the accuracy of approximation of the distribution function of a sum of independent random variables by the normal distribution (Russian), Teoriya veroiatn. i ee primen., 1966, 11, vyp. 4, 632-655.
[15] —, On the Chebyshev-Cramér asymptotic expansions (Russian), Teoriya veroiatn. i ee primen., 1967, 12, vyp. 3, 506-519.
[16] —, The central limit theorem for sums of functions of independent variables (Russian), Teoriya veroiatn. i ee primen., 12, 4, 1967.
[17] —, Yu. A. Rozanov, Stationary Gaussian Processes (Russian), M., 1970.
[18] Kolodiazhniy, S. F., Generalization of a theorem of Esseen (Russian), Vestnik Leningradsk. univ., 1968, No. 13, 28-33.
[19] Mitalauskas, A. A., V. A. Statuliavicius, Local limit theorems and asymptotic expansions for sums of independent lattice random variables (Russian), Litovskiy matem. sbornik, 1966, 6, No. 4, 569-583.
[20] Moskvin, D. A., On the asymptotics of the probabilities of large deviations of the sums $\sum f(x2^n)$ (Russian), Teoriya veroiatn. i ee primen., 15, 2 (1970).
[21] —, L. P. Postnikova, A. A. Yudin, On an arithmetic method of obtaining local limit theorems for lattice random variables (Russian), Teoriya veroiatn. i ee primen., 1970, 15, vyp. 1, 86-96.
[22] Nagaev, A. V., Large deviations for a class of distributions (Russian), in the sbornik: Limit theorems of probability theory, izd-vo Akademii nauk Uzbekskoy S.S.R., Tashkent, 1963, 56-68.
[23] —, Integral limit theorems allowing for large deviations, when Cramér's condition does not hold, I, II (Russian), Teoriya veroiatn. i ee primen., 1969, 14, vyp. 1, 51-63; vyp. 2, 203-216.
[24] —, Some limit theorems for large deviations (Russian), Teoriya veroiatn. i ee primen., 1965, 10, vyp. 2, 231-254.
[25] Osipov, L. V., A refinement of Lindeberg's theorem (Russian), Teoriya veroiatn. i ee primen., 1966, 11, vyp. 2, 339-342.
[26] —, V. V. Petrov, On the estimation of the remainder term in the central limit theorem (Russian), Teoriya veroiatn. i ee primen., 1967, 12, vyp. 2, 322-329.
[27] —, On asymptotic expansions of the distribution function of the sum of independent lattice random variables (Russian), Teoriya veroiatn. i ee primen., 1969, 14, vyp. 3, 468-475.
[28] —, Asymptotic expansions in the central limit theorem (Russian), Vestnik Leningradsk. univ., 1967, No. 19, 45-62.
[29] —, On the accuracy of approximation of the distribution of the sum of independent random variables by the normal distribution (Russian), Doklady Akademii nauk S.S.S.R., 1968, 178, No. 5, 1013-1016.
[30] Paulauskas, V., On a strengthening of Liapunov's theorem (Russian), Litovskiy matem. sbornik, 1969, 9, No. 2, 323-328.
[31] Petrov, V. V., On some polynomials occurring in probability theory (Russian), Vestnik Leningradsk. univ., 1962, No. 19, 150-153.
[32] —, On local limit theorems for sums of independent random variables (Russian), Teoriya veroiatn. i ee primen., 1964, 9, vyp. 2, 343-352.
[33] —, An estimate of the deviation of the distribution of the sum of independent random variables from the normal law (Russian), Doklady Akademii nauk S.S.S.R., 1965, 160, No. 5, 1013-1015.
[34] —, On the probabilities of large deviations of sums of independent random variables (Russian), Teoriya veroiatn. i ee primen., 1965, 10, vyp. 2, 310-322.
[35] —, Limit theorems for K-sequences of independent random variables (Russian), Litovskiy matem. sbornik, 1965, 5, No. 3, 443-455.
[36] —, Asymptotic behaviour of probabilities of large deviations (Russian), Teoriya veroiatn. i ee primen., 1968, 13, vyp. 2, 432-444.
[37] Pipiras, V. L., V. A. Statuliavicius, Asymptotic expansions for sums of independent random variables (Russian), Litovskiy matem. sbornik, 1968, 8, No. 1, 137-151.
[38] Reznik, M. H., The law of the iterated logarithm for some classes of stationary processes (Russian), Teoriya veroiatn. i ee primen., 13, 4 (1968).
[39] Saulis, L., An asymptotic expansion for probabilities of large deviations (Russian), Litovskiy matem. sbornik, 1969, 9, No. 3, 605-625.
[40] Statuliavicius, V. A., Limit theorems for densities and asymptotic expansions for distributions of sums of independent random variables (Russian), Teoriya veroiatn. i ee primen., 1965, 10, vyp. 4, 645-659.
[41] Survila, P., On large deviations for densities (Russian), Litovskiy matem. sbornik, 1966, 6, No. 4, 591-600.
[42] —, On large deviations in the local theorem for lattice random variables (Russian), Litovskiy matem. sbornik, 1968, 8, No. 2, 317-330.
[43] Bergström, H., On distribution functions with a limiting stable distribution function, Arkiv Mat., 1953, 2, No. 5, 463-474.
[44] Chung, K. L., Markov Chains with Stationary Transition Probabilities, Springer, 1960.
[45] Esseen, C. G., On the Kolmogorov-Rogozin inequality for the concentration function, Z. Wahrscheinlichkeitstheorie verw. Geb., 5, 210-216 (1966).
[46] —, On the concentration function of a sum of independent random variables, Z. Wahrscheinlichkeitstheorie verw. Geb., 9, 290-308 (1968).
[47] Feller, W., On the Berry-Esseen theorem, Z. Wahrscheinlichkeitstheorie verw. Geb., 1968, 10, No. 3, 261-268.
[48] Friedman, N., M. Katz, L. H. Koopmans, Convergence rates for the central limit theorem, Proc. Nat. Acad. Sci. U.S.A., 1966, 56, No. 4, 1062-1065.
[49] Heyde, C. C., On the influence of moments on the rate of convergence to the normal distribution, Z. Wahrscheinlichkeitstheorie verw. Geb., 1967, 8, No. 1, 12-18.
[50] —, On large deviation probabilities in the case of attraction to a non-normal stable law, Sankhya, 1968, A30, No. 3, 253-258.
[51] —, Some properties of metrics in a study on convergence to normality, Z. Wahrscheinlichkeitstheorie verw. Geb., 1969, 11, No. 3, 181-192.
[52] Katz, M. L., Note on the Berry-Esseen theorem, Ann. Math. Statist., 1963, 34, 1107-1108.
[53] Kesten, H., A sharper form of the Doblin-Levy-Kolmogorov-Rogozin inequality for concentration function, Math. Scand., 25, 1 (1969) 133-143.
[54] Le Cam, L., On the distribution of sums of independent random variables, Bernoulli-Bayes-Laplace Anniv. vol., Springer, 1965.
[55] Philipp, W., The central limit problem for mixing sequences of random variables, Z. Wahrsch. verw. Geb., 12, 155-171 (1969).
[56] —, The remainder in the central limit theorem for mixing stochastic processes, Ann. Math. Stat., 40, 2 (1969).
[57] Rubin, H., J. Sethuraman, Probabilities of moderate deviations, Sankhya, A27, 2-4 (1965).
[58] Serfling, R. J., Contributions to central limit theory for dependent variables, Ann. Math. Stat., 39, 1158-1195 (1968).
[59] Statuliavicius, V. A., On large deviations, Z. Wahrsch. verw. Geb., 6, 2 (1966).
[60] Waterman, M., Some ergodic properties of multidimensional F-expansions, Michigan State Univ., RM-227, M S-W, May 1969.
V., A local theorem for the densities of sums of inde- independent random variables. Teor. Veroyatnost. i Primenen, 1 A956) 349-357. [123] —, A local theorem for lattice distributions. Dokl. Akad. Nayk USSR 115A957L9-52.
[124] —, An asymptotic expansion for the derivatives of distribution functions of a sum of independent terms. Vestnik Leningrad Univ., 19 (1960) 9-18.
[125] —, A refinement of the local limit theorem for non-identical lattice distributions. Teor. Veroyatnost. i Primenen, 7 (1962) 344-346.
[126] —, An asymptotic expansion for the derivatives of distribution functions of a sum of independent random variables. Trydi VI All-Union conference on the theory of probability and mathematical statistics (Vil'nyns, 1962) 71-73.
[127] —, On local theorems for large deviations. Dokl. Akad. Nayk USSR, 134 (1960) 525-528.
[128] —, On integral theorems for large deviations. Dokl. Akad. Nayk USSR, 138 (1961) 779-780.
[129] —, On large deviations of sums of random variables. Vestnik Leningrad Univ., 1 (1961) 25-37.
[130] —, Limit theorems for large deviations when Cramer's condition is violated, I. Vestnik Leningrad Univ., 19 (1963) 49-68.
[131] —, Limit theorems for large deviations when Cramer's condition is violated, II. Vestnik Leningrad Univ., 1 (1964).
[132] —, An extension of Cramer's limit theorem to non-identically distributed independent variables. Vestnik Leningrad Univ., 8 (1953) 13-25.
[133] —, A generalisation of Cramer's limit theorem. Uspekhi Matem. Nayk, 9 (1954) 195-202.
[134] —, On the probability of large deviations of sums of independent identically distributed random variables. Dokl. Akad. Nayk USSR, 154 (1964).
[135] Pollard, H., The representation of $e^{-x^{\lambda}}$ as a Laplace integral. Bull. Amer. Math. Soc., 52 (1946) 908-910.
[136] Privalov, I. I., Boundary properties of analytic functions. Gostekhizdat, 1950.
[137] Prokhorov, Yu. V., Some refinements of a theorem of Lyapunov. Izv. Akad. Nayk USSR Math., 16 (1952) 281-292.
[138] —, A local theorem for densities. Dokl. Akad. Nayk. USSR, 83 (1952) 797-780.
[139] —, On the local limit theorem for lattice distributions. Dokl. Akad. Nayk USSR, 98 (1954) 535-538.
[140] —, The asymptotic behaviour of the binomial distribution. Uspekhi Matem. Nayk, 8 (1953) 135-142.
[141] —, On sums of identically distributed random variables. Dokl. Akad. Nayk USSR, 105 (1955) 645-647.
[142] —, A uniform limit theorem of A. N. Kolmogorov. Teor. Veroyatnost. i Primenen, 5 (1960) 103-113.
[143] —, On a local limit theorem. Limit theorems in probability theory (Tashkent, 1963) 75-80.
[144] Renyi, A., Representations for real numbers and their ergodic properties. Acta Math. Acad. Sci. Hung., 8 (1957) 477-493.
[145] Richter, W., Zur Frage der Notwendigkeit der Cramerschen Bedingung. Math. Nach., 20 (1959) 231-238.
[146] —, Wahrscheinlichkeiten grosser Abweichungen in Nicht-Cramerschen Fall. Wiss. Z. Techn. Hochsch. Dresden, 9 (1959/60) 881-896.
[147] —, Local limit theorems for large deviations. Teor. Veroyatnost. i Primenen, 2 (1957) 214-229.
[148] —, Multi-dimensional local limit theorems for large deviations. Teor. Veroyatnost. i Primenen, 3 (1958) 107-114.
[149] —, Refinement of an inequality of S. N. Bernstein. Vestnik Leningrad Univ., 1 (1959) 24-29.
[150] Rizhik, I. M., Tables of integrals and functions. GTTI, 1946.
[151] Robinson, E. A., Sums of stationary random variables. Proc. Amer. Math. Soc., 11 (1960) 77-79.
[152] Rogozin, B. A., A remark on the paper 'A moment inequality with an application to the central limit theorem' by C. G. Esseen. Teor. Veroyatnost. i Primenen, 5 (1960) 125-127.
[153] —, An estimate of concentration functions. Teor. Veroyatnost. i Primenen, 6 (1961) 103-105.
[154] Rokhlin, V. A., Selected problems from the metric theory of dynamical systems. Uspekhi Matem. Nayk, 4 (1949) 57-128.
[155] —, Exact endomorphisms of a Lebesgue space. Izv. Akad. Nayk. USSR Math., 25 (1961) 499-530.
[156] —, New progress in the theory of transformations with invariant measure. Uspekhi Matem. Nayk, 15 (1960) 3-26.
[157] Rosenblatt, M., Independence and dependence. Proc. 4th Berkeley symposium, 1961, 431-443.
[158] —, A central limit theorem and a strong mixing condition. Proc. Nat. Acad. Sci. USA, 42 (1956) 43-47.
[159] Rozanov, Yu. A., On a local limit theorem for lattice distributions. Teor. Veroyatnost. i Primenen, 2 (1957) 275-281.
[160] —, On the central limit theorem for random functions, Teor. Veroyatnost. i Primenen, 5 (1960) 243-246.
[161] —, On the application of the central limit theorem. Proc. 4th Berkeley Symposium, 1960.
[162] —, On the central limit theorem for weakly dependent variables. Trydi VI All-Union conference on probability theory and mathematical statistics (Vil'nyns, 1962) 85-95.
[163] —, Stationary random processes. Holden-Day, 1967.
[164] Ryll-Nardzewski, C., On the ergodic theorems, II; Ergodic theory of continued fractions. Studia Math., 12 (1951) 74-79.
[165] Sakovich, G. N., A unique form of conditions of attraction to stable laws. Teor. Veroyatnost. i Primenen, 1 (1956) 357-361.
[166] Sanov, I. N., On the probability of large deviations of random variables. Mat. Sb., 42 (1957) 11-14.
[167] Sinai, Ya. G., Dynamical systems and stationary Markov processes. Teor. Veroyatnost. i Primenen, 5 (1960) 335-338.
[168] —, The central limit theorem for geometric flows on manifolds with constant positive curvature. Dokl. Akad. Nayk USSR, 133 (1960) 1303-1306.
[169] —, On limit theorems for stationary processes. Teor. Veroyatnost. i Primenen, 7 (1962) 213-219.
[170] —, Probability notions in ergodic theory. Proc. Intern. Congr. Math. Uppsala, 1963.
[171] —, Dynamical systems with a denumerable Lebesgue spectrum of 1. Izv. Akad. Nayk USSR, 25 (1961) 899-924.
[172] Sirazhdinov, S. Kh. and Mamatov, M., On convergence in mean for densities. Teor. Veroyatnost. i Primenen, 7 (1962) 433-437.
[173] Skorokhod, A. V., A theorem about stable distributions. Uspekhi Matem. Nayk, 9 (1954) 189-190.
[174] —, An asymptotic formula for stable distribution laws. Dokl. Akad. Nayk USSR, 98 (1954) 731-734.
[175] Smirnov, N. V., On the probabilities of large errors. Mat. Sb., 40 (1933) 443-454.
[176] Statulevicius, V. A., Limit theorems and their refinements for inhomogeneous Markov chains. Litovsk. Mat. Sb., 1 (1961) 221-314.
[177] —, On refinements of limit theorems for weakly dependent variables. Trydi VI All-Union conference on probability theory and mathematical statistics (Vil'nyns, 1962) 113-119.
[178] Survila, P., Extremal properties of limit theorems. Teor. Veroyatnost. i Primenen, 8 (1963) 25-126.
[179] Smith, W. L., A frequency-function form of the central limit theorem. Proc. Camb. Phil. Soc., 49 (1953) 462-472.
[180] Titchmarsh, E. C., Introduction to the theory of Fourier integrals, Oxford, 1948.
[181] Treloar, L., Physics of rubber elasticity, Oxford, 1949.
[182] Vilkayskas, L., Zones of normal convergence in the multi-dimensional case. Litovsk. Mat. Sb., 1 (1961) 25-39.
[183] Vinokurov, V. G., Conditions for the regularity of stochastic processes. Dokl. Akad. Nauk USSR, 113 (1957) 959-961.
[184] Volkonskii, V. A. and Rozanov, Yu. A., Some limit theorems for random functions. Teor. Veroyatnost. i Primenen, 4 (1959) 186-207.
[185] Wold, H., A study in the analysis of stationary time series. Uppsala, 1938.
[186] Wolfowitz, J., Information theory for mathematicians. Ann. Math. Statist., 29 (1958) 351-356.
[187] Yaglom, A. M., Introduction to the theory of stationary functions. Uspekhi Matem. Nayk, 7 (1952) 3-168.
[188] Zolotarev, V. M., An expression for the density of a stable law with exponent α greater than one by means of a density with exponent 1/α. Dokl. Akad. Nayk USSR, 98 (1954) 735-738.
[189] —, On the analytic properties of stable distribution laws. Vestnik Leningrad Univ., 1 (1956) 49-52.
[190] —, Mellin-Stieltjes transforms in probability theory. Teor. Veroyatnost. i Primenen, 2 (1957) 444-469.
[191] —, An analogue of the Cramer expansion in the case of attraction to a stable law. Trydi VI All-Union conference on probability theory and mathematical statistics (Vil'nyns, 1962).
[192] —, On a new point of view on a limit theorem for large deviations. Trydi VI All-Union conference on probability theory and mathematical statistics (Vil'nyns, 1962) 43-47.
[193] Zygmund, A., Trigonometric series. Cambridge, 1959.
SUBJECT INDEX

Agnew, 402
Autocovariance function, 291
Bavli, 402
Bergstrom, 54, 401
Bernstein, 316, 403, 404
— inequality, 130, 169
Binomial distribution, 100, 275
Cauchy distribution, 49
Central limit theorem, 315, 333, 340, 362, 384, 393
Chebyshev's inequality, 276
Chernim, 402
Cincer, 405
Collective, 157, 192
Compound Poisson distribution, 34
Concentration function, 268
Conditional expectation, 18, 400
— probability, 18
Continued fraction, 374
Continuous, 18
Convergence of distributions, 21
— in variation, 21
Convolution, 21
Cramer, 158, 171, 230, 244, 388, 402, 403
— condition, 98, 103, 155, 160
— series, 167, 174, 190, 244
Degenerate, 20
Density, 19
Diananda, 405
Discrete, 19
Distribution function, 19
— probability, 19
Dobryshin, 405
Doeblin, 402, 405
Domain of attraction, 76, 79, 84, 120, 126, 141, 158
Domain of attraction, normal, 92
Entropy, 157
Ergodic theorem, 315
Esseen, 28, 401, 402
Expectation, 18
Finetti, 401
Fourier transform, 398
Functionals, 352
Gaussian, 310
— process, 292
Gnedenko, 401, 402
Hilbert space, 288
Ibragimov, 402, 404, 405
Index of stable law, 43
Infinitely divisible distribution, 34, 267
Inversion formula, 25
Kac, 405
Karamata, 37, 76, 123, 394
Khinchin, 401, 402, 404, 405
— theorem, 35
Kolmogorov, 54, 390, 392, 401, 403, 404, 405
— theorem, 17
Kyz'min, 378
Laplace, 91, 100
Large deviations, 154, 227, 245, 255
Lattice distribution, 20, 26, 100, 120
Lebesgue decomposition, 20
Lebesgue-Stieltjes measure, 20
Leonov, 404, 405
Levy, 91, 401, 402, 404
— formula, 34
— representation, 39
Limiting tails, 158, 190, 244, 254, 391
Linnik, 54, 391, 401, 403
Local limit theorem, 120, 161
Lyapunov, 401, 402
m-dependent sequences, 369
Mamatov, 402
Markov, 404, 405
— chain, 365, 393
Medygessy, 401
Meshalkin, 404, 405
Method of steepest descents, 171, 194
Metric, 128
Metrically transitive, 302
von Mises, 402
Mixing, 305
de Moivre, 91, 100
Moments characteristic functions, 24
Monomial zones, 177, 226
Multinomial distribution, 157
Nagaev, 391, 403, 405
Narrow zones, 198
von Neumann, 290
Parseval, 398
Pergel, 391
Petrov, 171, 244, 392, 402, 403
Plancherel, 398
Poisson law, 20
Pollard, 54
Probability, 17
— space, 17
Prohorov, 391
Radial extension, 258
Random process, 17
— variable, 17
— vector, 17
Regular, 301
Renyi, 405
Richter, 160, 403
Robins, 405
Rogozin, 402, 404
Rokhlin, 405
Rosenblatt, 404, 405
Rozanov, 402, 405
Ryll-Nardzewski, 405
Saddle point, 166
Sakovich, 402
Sanov, 157
Shiryaev, 404
Sinai, 405
Singular distributions, 20
Sirazhdinov, 402
Skorokhod, 54, 401
Slowly varying, 76
— function, 37, 325, 394
Smirnov, 403
Smith, 402
Spectral density, 298
— function, 291
Stable distribution, 37
— law, 120, 126, 319, 390
Stationary process, 284, 315
Statylyabichyus, 405
Step, 20
Stone, 290
Strong convergence, 22
— mixing, 305, 313, 316, 333, 354
Titchmarsh, 398
Uniform mixing, 308, 312, 325, 340, 352, 362
Uniformly asymptotically negligible, 35
Unimodal, 66, 72
Vinokyrov, 404
Weak convergence, 22
Wintner, 290
Zero-one law, 301
Zolotarev, 52, 401, 402, 403
Zones of normal attraction, 177