/
Text
r
r
I
t
\
....I
.
, ..J , ...J
) )
Modern Techniques
nd Their ApplIcations
Second Edition
'JDJiJJJJ Jj j r JJiJJJJJ
.
. . . . , . .
. . . . .
. . . .. .
. . .
. . . .
Pure and Applied Mathematics:
A Wilev-Inlerscience Series 01 Texts, Monographs, and Tracts
Real Analysis
Modern Techniques and
Their Applications
Second Edition
Gerald B. Folland
U P COLLEGE OF SCIENCE
DILIW\N CENTRAL LIBRARY
11\11' 111'11' 1'1'1' 1\11111'1 I' 1\1111\1 \I 111'1'1'1 '1111 \I 'I 1'111\1\
A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York / Chichester / Weinheim / Brisbane / Singapore / Toronto
This book is printed on acid-free paper. @
Copyright @ 1999 by John Wiley & Sons, Inc. All rights reserved.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as
permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to
the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978)
750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department,
John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212)
850-6008. E-Mail: PERMREQ@WILEY.COM.
Library of Congress Cataloging-in-Publication Data:
Folland, Gerald B.
Real analysis: modern techniques and their applications / Gerald B.
Folland. - 2nd ed.
p. cm. - (Pure and applied mathematics)
"A Wiley-Interscience publication."
Includes bibliographical references and index.
ISBN 0-471-31716-0 (cloth: alk. paper)
1. Mathematical analysis. 2. Functions of real variables.
I. Title. II. Series: Pure and applied mathematics (John Wiley &
Sons : Unnumbered)
QA300.F67 1999
515--dc21 98-37260
Printed in the United States of America
10 9 8 7 6 5 4 3 2
To my mother
Helen B. Folland
and to the memory of my father
Harold F. Folland
Preface
The name "real analysis" is something of an anachronism. Originally applied to the
theory of functions of a real variable, it has come to encompass several subjects of
a more general and abstract nature that underlie much of modem analysis. These
general theories and their applications are the subject of this book, which is intended
primarily as a text for a graduate-level analysis course. Chapters 1 through 7 are
devoted to the core material from measure and integration theory, point set topology,
and functional analysis that is a part of most graduate curricula in mathematics,
together with a few related but less standard items with which I think all analysts
should be acquainted. The last four chapters contain a variety of topics that are meant
to introduce some of the other branches of analysis and to illustrate the uses of the
preceding material. I believe these topics are all interesting and important, but their
selection in preference to others is largely a matter of personal predilection.
The things one needs to know in order to read this book are as follows:
1. First and foremost, the classical theory of functions of a real variable: limits and
continuity, differentiation and (Riemann) integration, infinite series, uniform
convergence, and the notion of a metric space.
2. The arithmetic of complex numbers and the basic properties of the complex
exponential function e x + iy == eX ( cos y + i sin y). (More advanced results
from complex function theory are used only in the proof of the Riesz- Thorin
theorem and in a few exercises and remarks.)
3. Some elementary set theory.
vii
viii PREFACE
4. A bit of linear algebra - actually, not much beyond the definitions of vector
spaces, linear mappings, and determinants.
All of the necessary material in (1) and (2) can be found in W. Rudin's classic Princi-
ples of Mathematical Analysis (3rd ed., McGraw-Hill, 1976) or its descendants such
as R. S. Strichatrz's The Way of Analysis (Jones and Bartlett, 1995) or S. G. Krantz's
Real Analysis and Foundations (CRC Press, 1991). A summary of the relevant facts
about sets and metric spaces is provided here in Chapter O. The reader should be-
gin this book by examining 30.1 and 30.5 to become familiar with my notation and
terminology; the rest of Chapter 0 can then be referred to as needed.
Each chapter concludes with a section entitled "Notes and References." These
sections contain miscellaneous remarks, acknowledgments of sources, indications
of results not discussed in the text, references for further reading, and historical
notes. The latter are quite sketchy, although references to more detailed sources are
provided; they are intended mainly to give an idea of how the subject grew out of its
classical origins. I found it entertaining and instructive to read some of the original
papers, and I hope to encourage others to do the same.
A sizable portion of this book is devoted to exercises. They are mostly in the
form of assertions to be proved, and they range from trivial to difficult; hints and
intermediate steps are provided for the more complicated ones. Every reader should
peruse them, although only the most ambitious will try to work them all out. They
serve several purposes: amplification of results and completion of proofs in the
text, discussion of examples and counterexamples, applications of theorems, and
development of further ideas. Instructors will probably wish to do some of the
exercises in class; to maximize flexibility and minimize verbosity, I have followed
the principle of "When in doubt, leave it as an exercise," especially with regard
to examples. Exercises occur at the end of each section, but they are numbered
consecutively within each chapter. In referring to them, "Exercise n" means the nth
exercise in the present chapter unless another section is explicitly mentioned.
The topics in the book are arranged so as to allow some flexibility of presentation.
For example, Chapters 4 and 5 do not depend on Chapters 1-3 except for a few
examples and exercises. On the other hand, if one wishes to proceed quickly to LP
theory, one can skip from 33.3 to 335.1-2 and thence to Chapter 6. Chapters 10
and 11 are independent of Chapters 8 and 9 except that the ideas in 38.6 are used in
Chapter 10.
The new features of this edition are as follows:
. The material on the n-dimensional Lebesgue integral (332.6-7) has been rear-
ranged and expanded.
. Tychonoff's theorem (34.6) is proved by an elegant argument recently discov-
ered by Paul Chernoff.
. The chapter on Fourier analysis has been split into two chapters (8 and 9).
The material on Fourier series and integrals (338.3-5) has been rearranged and
now contains the Dirichlet-Jordan theorem on convergence of Fourier series.
PREFACE ix
The material on distributions (329.1-2) has been extensively rewritten and
expanded.
. A section on self-similarity and Hausdorff dimension (311.3) has been added,
replacing the outdated calculation of the Hausdorff dimension of Cantor sets
in the old 310.2.
. Innumerable small changes have been made in the hope of improving the
exposition.
The writer of a text on such a well-developed subject as real analysis must neces-
sarily be indebted to his predecessors. I kept a large supply of books on hand while
writing this one; they are too numerous to list here, but most of them can be found
in the bibliography. I am also happy to acknowledge the influence of two of my
teachers: the late Lynn Loomis, from whose lectures I first learned this subject, and
Elias Stein, who has done much to shape my point of view. Finally, I am grateful to
a number of people - especially Steven Krantz, Kenneth Ross, and William Faris
- whose comments and corrigenda concerning the first edition have helped me to
prepare the new one.
GERALD B. FOLLAND
Seattle, Washington
Contents
Preface . .
VII
0 Prologue 1
0.1 The Language of Set Theory 1
0.2 Orderings 4
0.3 Cardinality 6
0.4 More about Well Ordered Sets 9
0.5 The Extended Real Number System 10
0.6 Metric Spaces 13
0.7 Notes and References 16
1 Measures 19
1.1 Introduction 19
1.2 a-algebras 21
1.3 Measures 24
1.4 Outer Measures 28
1.5 Borel Measures on the Real Line 33
1.6 Notes and References 40
xi
xii CONTENTS
2 Integration 43
2.1 Measurable Functions 43
2.2 Integration of Nonnegative Functions 49
2.3 Integration of Complex Functions 52
2.4 Modes of Convergence 60
2.5 Product Measures 64
2.6 The n-dimensional Lebesgue Integral 70
2.7 Integration in Polar Coordinates 77
2.8 Notes and References 81
3 Signed Measures and Differentiation 85
3.1 Signed Measures 85
3.2 The Lebesgue-Radon-Nikodym Theorem 88
3.3 Complex Measures 93
3.4 Differentiation on Euclidean Space 95
3.5 Functions of Bounded Variation 100
3.6 Notes and References 109
4 Point Set Topology 113
4.1 Topological Spaces 113
4.2 Continuous Maps 119
4.3 Nets 125
4.4 Compact Spaces 128
4.5 Locally Compact Hausdorff Spaces 131
4.6 Two Compactness Theorems 136
4.7 The Stone- Weierstrass Theorem 138
4.8 Embeddings in Cubes 143
4.9 Notes and References 146
5 Elements of Functional Analysis 151
5.1 Normed Vector Spaces 151
5.2 Linear Functionals 157
5.3 The Baire Category Theorem and its Consequences 161
5.4 Topological Vector Spaces 165
5.5 Hilbert Spaces 171
5.6 Notes and References 179
CONTENTS xi;;
6 LP Spaces 181
6.1 Basic Theory of LP Spaces 181
6.2 The Dual of LP 188
6.3 Some Useful Inequalities 193
6.4 Distribution Functions and Weak LP 197
6.5 Interpolation of LP Spaces 200
6.6 Notes and References 208
7 Radon Measures 211
7.1 Positive Linear Functionals on Cc(X) 211
7.2 Regularity and Approximation Theorems 216
7.3 The Dual ofCo(X) 221
7.4 Products of Radon Measures 226
7.5 Notes and References 231
8 Elements of Fourier Analysis 235
8.1 Preliminaries 235
8.2 Convolutions 239
8.3 The Fourier Transform 247
8.4 Summation of Fourier Integrals and Series 257
8.5 Pointwise Convergence of Fourier Series 263
8.6 Fourier Analysis of Measures 270
8.7 Applications to Partial Differential Equations 273
8.8 Notes and References 278
9 Elements of Distribution Theory 281
9.1 Distributions 281
9.2 Compactly Supported, Tempered, and Periodic
Distributions 291
9.3 Sobolev Spaces 301
9.4 Notes and References 310
10 Topics in Probability Theory 313
10.1 Basic Concepts 313
10.2 The Law of Large Numbers 320
10.3 The Central Limit Theorem 325
10.4 Construction of Sample Spaces 328
10.5 The Wiener Process 330
10.6 Notes and References 336
xiv CONTENTS
11 More Measures and Integrals
11.1 Topological Groups and Haar Measure
11.2 Hausdorff Measure
11.3 Self-similarity and Hausdorff Dimension
11.4 Integration on Manifolds
11.5 Notes and References
339
339
348
355
361
363
Bibliography
365
Index of Notation
377
Index
379
Real Analysis
Prologue
The purpose of this introductory chapter is to establish the notation and terminology
that will be used throughout the book and to present a few diverse results from set
theory and analysis that will be needed later. The style here is deliberately terse,
since this chapter is intended as a reference rather than a systematic exposition.
0.1 THE LANGUAGE OF SET THEORY
It is assumed that the reader is familiar with the basic concepts of set theory; the
following discussion is meant mainly to fix our terminology.
Number Systems. Our notation for the fundamental number systems is as
follows:
N == the set of positive integers (not including zero)
Z == the set of integers
<Q == the set of rational numbers
lR = the set of real numbers
<C == the set of complex numbers
Logic. We shall avoid the use of special symbols from mathematical logic,
preferring to remain reasonably close to standard English. We shall, however, use
the abbreviation iff for "if and only if."
One point of elementary logic that is often insufficiently appreciated by students
is the following: If A and B are mathematical assertions and - A, - B are their
1
2 PROLOGUE
negations, the statement "A implies B" is logically equivalent to the contrapositive
statement" - B implies - A." Thus one may prove that A implies B by assuming - B
and deducing - A, and we shall frequently do so. This is not the same as reductio ad
absurdum, which consists of assuming both A and - B and deriving a contradiction.
Sets. The words "family" and "collection" will be used synonymously with
"set," usually to avoid phrases like "set of sets." The empty set is denoted by 0, and
the family of all subsets of a set X is denoted by P(X):
P(X) == {E: E eX}.
Here and elsewhere, the inclusion sign c is interpreted in the weak sense; that is, the
assertion "E C X" includes the possibility that E == X.
If £ is a family of sets, we can form the union and intersection of its members:
U E == {x : x E E for some E E £},
EEc.
n E == {x : x E E for all E E £}.
EEc.
Usually it is more convenient to consider indexed families of sets:
£ == {Eo : Q E A} == {Eo} oEA'
in which case the union and intersection are denoted by
U Eo,
oEA
n Eo.
oEA
If Eo n E{3 == 0 whenever Q =1= {3, the sets Eo are called disjoint. The terms "disjoint
collection of sets" and "collection of disjoint sets" are used interchangeably, as are
"disjoint union of sets" and "union of disjoint sets."
When considering families of sets indexed by N, our usual notation will be
{En} 1 or {En}f,
and likewise for unions and intersections. In this situation, the notions of limit
superior and limit inferior are sometimes useful:
ex:> ex:>
ex:> ex:>
lirninf En == U n En.
k=ln=k
lirnsupE n == n U En,
k=ln=k
The reader may verify that
lirn sup En == {x : x E En for infinitely many n },
lirn inf En == {x : x E En for all but finitely many n }.
THE LANGUAGE OF SET THEORY 3
If E and F are sets, we denote their difference by E \ F:
E \ F == {x : x E E and x F},
and their symmetric difference by EF:
E F == (E \ F) U (F \ E).
When it is clearly understood that all sets in question are subsets of a fixed set X, we
define the complement EC of a set E (in X):
E C == X \ E.
In this situation we have deMorgan's laws:
(U Ea r = n E,
aEA aEA
(n Ear = U E.
aEA aEA
If X and Yare sets, their Cartesian product X x Y is the set of all ordered pairs
(x, y) such that x E X and y E Y. A relation from X to Y is a subset of X x Y.
(If Y == X, we speak of a relation on X.) If R is a relation from X to Y, we shall
sometimes write xRy to mean that (x, y) E R. The most important types of relations
are the following:
. Equivalence relations. An equivalence relation on X is a relation R on X
such that
xRx for all x E X,
xRy iff yRx,
xRz whenever xRy and yRz for some y.
The equivalence class of an element x is {y EX: xRy}. X is the disjoint
union of these equivalence classes.
. Orderings. See 30.2.
. Mappings. A mapping f : X Y is a relation R from X to Y with the
property that for every x E X there is a unique y E Y such that xRy, in which
case we write y == f(x). Mappings are sometimes called maps or functions;
we shall generally reserve the latter name for the case when Y is <C or some
subset thereof.
If f : X -t Y and 9 : Y Z are mappings, we denote by go f their composition:
9 0 f : X Z,
gof(x) ==g(f(x)).
If D c X and E c Y, we define the image of D and the inverse image of E
under a mapping f : X Y by
f(D) == {f(x) : XED},
f-l(E) == {x: f(x) E E}.
4 PROLOGUE
It is easily verified that the map f-1 : P(Y) -t P( X) defined by the second formula
commutes with union, intersections, and complements:
f-1 (U Ea) = U f-l(Ea),
aEA aEA
f-l ( n Ea) = n f-l(Ea),
aEA aEA
f- 1 (E C ) == (f- 1 (E))C.
(The direct image mapping f : P(X) -t P(Y) commutes with unions, but in general
not with intersections or complements.)
If f : X -t Y is a mapping, X is called the domain of f and f(X) is called the
range of f. f is said to be injective if f(X1) = f(X2) only when Xl == X2, surjective
if f(X) == Y, and bijective if it is both injective and surjective. If f is bijective, it
has an inverse f-1 : Y -t X such that f-1 0 f and f 0 f-1 are the identity mappings
on X and Y, respectively. If A c X, we denote by flA the restriction of f to A:
(fIA) : A -t Y,
(fIA)(x) == f(x) for X E A.
A sequence in a set X is a mapping from N into X. (We also use the term finite
sequence to mean a map from {I, . . . , n} into X where n EN.) If f : N -t X is a
sequence and 9 : N -t N satisfies g(n) < g(m) whenever n < m, the composition
fog is called a subsequence of f. It is common, and often convenient, to be careless
about distinguishing between sequences and their ranges, which are subsets of X
indexed by N. Thus, if f(n) == x n , we speak of the sequence {x n }l; whether we
mean a mapping from N to X or a subset of X will be clear from the context.
Earlier we defined the Cartesian product of two sets. Similarly one can define the
Cartesian product of n sets in terms of ordered n-tuples. However, this definition
becomes awkward for infinite families of sets, so the following approach is used
instead. If {Xa}aEA is an indexed family of sets, their Cartesian product TIaEA Xa
is the set of all maps f : A -t UaEA Xa such that f(a) E Xa for every a E A. (It
should be noted, and then promptly forgotten, that when A == {I, 2}, the previous
definition of XIX X 2 is set-theoretically different from the present definition of
TI X j . Indeed, the latter concept depends on mappings, which are defined in terms
of the former one.) If X == TIaEA Xa and a E A, we define the ath projection or
coordinate map 1[a : X -t Xa by 1[a(f) == f(a). We also frequently write x and
Xa instead of f and f ( a) and call Xa the ath coordinate of x.
If the sets Xa are all equal to some fixed set Y, we denote TIaEA Xa by yA:
Y A == the set of all mappings from A to Y.
If A == {I, . . . , n}, Y A is denoted by yn and may be identified with the set of ordered
n-tuples of elements of Y.
0.2 ORDERINGS
A partial ordering on a nonempty set X is a relation R on X with the following
properties:
ORDERINGS 5
. if xRy and yRz, then xRz;
. if xRy and yRx, then x == y;
. xRx for all x.
If R also satisfies
. if x, y E X, then either xRy or yRx,
then R is called a linear (or total) ordering. For example, if E is any set, then P( E)
is partially ordered by inclusion, and IR is linearly ordered by its usual ordering.
Taking this last example as a model, we shall usually denote partial orderings by
< , and we write x < y to mean that x < y but x =1= y. We observe that a partial
ordering on X naturally induces a partial ordering on every nonempty subset of X.
Two partially ordered sets X and Yare said to be order isomorphic if there is a
bijection f : X -t Y such that Xl < X2 iff f(XI) < j(X2).
If X is partially ordered by < , a maximal (resp. minimal) element of X is an
element x E X such that the only y E X satisfying x < y (resp. x > y) is x itself.
Maximal and minimal elements mayor may not exist, and they need not be unique
unless the ordering is linear. If E c X, an upper (resp. lower) bound for E is an
element x E X such that y < x (resp. x < y) for all y E E. An upper bound for E
need not be an element of E, and unless E is linearly ordered, a maximal element of
E need not be an upper bound for E. (The reader should think up some examples.)
If X is linearly ordered by < and every nonempty subset of X has a (necessarily
unique) minimal element, X is said to be well ordered by < , and (in defiance of the
laws of grammar) < is called a well ordering on X. For example, N is well ordered
by its natural ordering.
We now state a fundamental principle of set theory and derive some consequences
of it.
0.1 The Hausdorff Maximal Principle. Every partially ordered set has a maximal
linearly ordered subset.
In more detail, this means that if X is partially ordered by < , there is a set E c X
that is linearly ordered by < , such that no subset of X that properly includes E is
linearly ordered by < . Another version of this principle is the following:
0.2 Zorn's Lemma. If X is a partially ordered set and every linearly ordered subset
of X has an upper bound, then X has a maximal element.
Clearly the Hausdorff maximal principle implies Zorn's lemma: An upper bound
for a maximal linearly ordered subset of X is a maximal element of X. It is also not
difficult to see that Zorn's lemma implies the Hausdorff maximal principle. (Apply
Zorn's lemma to the collection of linearly ordered subsets of X, which is partially
ordered by inclusion.)
0.3 The Well Ordering Principle. Every non empty set X can be well ordered.
6 PROLOGUE
Proof. Let W be the collection of well orderings of subsets of X, and define a
partial ordering on W as follows. If < 1 and < 2 are well orderings on the subsets
E 1 and E 2 , then < 1 precedes < 2 in the partial ordering if (i) < 2 extends < 1, i.e.,
E 1 C E 2 and < 1 and < 2 agree on E 1 , and (ii) if x E E 2 \ E 1 then y < 2 X for all
y EEl. The reader may verify that the hypotheses of Zorn's lemma are satisfied, so
that W has a maximal element. This must be a well ordering on X itself, for if < is
a well ordering on a proper subset E of X and Xo E X \ E, then < can be extended
to a well ordering on E U {xo} by declaring that x < Xo for all x E E. .
0.4 The Axiom of Choice. If {Xa}aEA is a nonempty collection of non empty sets,
then TIaEA Xa is nonempty.
Proof. Let X == UaEA Xa. Pick a well ordering on X and, for a E A, let f(a)
be the minimal element of Xa. Then f E TIaEA Xa. .
0.5 Corollary. If {X a} aEA is a disjoint collection of nonempty sets, there is a set
Y c UaEA Xa such that Y n Xa contains precisely one elementfor each a E A.
Proof. Take Y = f(A) where f E TIaEA Xa.
.
We have deduced the axiom of choice from the Hausdorff maximal principle; in
fact, it can be shown that the two are logically equivalent.
0.3 CARDINALITY
If X and Yare nonempty sets, we define the expressions
card(X) < card(Y),
card(X) == card(Y),
card(X) > card(Y)
to mean that there exists f X -t Y which is injective, bijective, or surjective,
respectively. We also define
card(X) < card(Y),
card(X) > card(Y)
to mean that there is an injection but no bijection, or a surjection but no bijection,
from X to Y. Observe that we attach no meaning to the expression "card(X)" when
it stands alone; there are various ways of doing so, but they are irrelevant for our
purposes (except when X is finite - see below). These relationships can be extended
to the empty set by declaring that
card(0) < card(X) and card(X) > card(0) for all X =1= 0.
For the remainder of this section we assume implicitly that all sets in question are
nonempty in order to avoid special arguments for 0. Our first task is to prove that
the relationships defined above enjoy the properties that the notation suggests.
CARDINALITY 7
0.6 Proposition. card(X) < card(Y) @'card(Y) > card(X).
Proof. If f : X Y is injective, pick Xo E X and define 9 : Y X by
g(y) == f-l(y) ify E f(X), g(y) == Xo otherwise. Then 9 is surjective. Conversely,
if 9 : Y -t X is surjective, the sets g-l ({ x}) (x E X) are nonempty and disjoint, so
any f E llxEx g-1 ({ x}) is an injection from X to Y. .
0.7 Proposition. For any sets X and Y, either card(X) < card(Y) or card(Y) <
card(X).
Proof. Consider the set J of all injections from subsets of X to Y. The members
of J can be regarded as subsets of X x Y, so J is partially ordered by inclusion. It is
easily verified that Zorn's lemma applies, so J has a maximal element f, with (say)
domain A and range B. If Xo E X \ A and Yo E Y \ B, then f can be extended
to an injection from A U {xo} to Y U {Yo} by setting f(xo) == Yo, contradicting
maximality. Hence either A == X, in which case card(X) < card(Y), or B = Y, in
which case f-l is an injection from Y to X and card(Y) < card(X). .
0.8 The Schroder-Bernstein Theorem. If card(X) < card(Y) and card(Y) <
card(X) then card(X) == card(Y).
Proof. Let f : X Y and 9 : Y -t X be injections. Consider a point x EX:
If x E g(Y), we form g-l(x) E Y; if g-l(x) E f(X), we form f-l(g-l(x)); and
so forth. Either this process can be continued indefinitely, or it terminates with an
element of X \g(Y) (perhaps x itself), or it terminates with an element ofY \ f(X).
In these three cases we say that x is in X 00, X x, or X y; thus X is the disjoint union
of Xoo, Xx, and X y. In the same way, Y is the disjoint union of three sets Y 00, Y x,
and Yy. Clearly f maps Xoo onto Y oo and Xx onto Y x , whereas 9 maps Y y onto
Xy. Therefore, if we define h : X Y by h(x) == f(x) if X E Xoo U Xx and
h(x) == g-l(x) if x E Xy, then h is bijective. .
0.9 Proposition. For any set X, card( X) < card(P( X)).
Proof. On the one hand, the map f (x) == {x} is an injection from X to P( X).
On the other, if 9 : X -t P(X), let Y = {x EX: x g(x)}. Then Y g(X), for
if Y = g(xo) for some Xo E X, any attempt to answer the question "Is Xo E Y?"
quickly leads to an absurdity. Hence g cannot be surjective. .
A set X is called countable (or denumerable) if card(X) < card(N). In
particular, all finite sets are countable, and for these it is convenient to interpret
"card(X)" as the number of elements in X:
card(X) == n iff card(X) == card( {I, . . . , n}).
If X is countable but not finite, we say that X is countably infinite.
8 PROLOGUE
0.10 Proposition.
a. If X and Yare countable, so is X x Y.
b. If A is countable and Xa is countable for every a E A, then UaEA Xa is
countable.
c. If X is countably infinite, then card( X) == card(N).
Proof. To prove (a) it suffices to prove that N 2 is countable. But we can define
a bijection from N to N2 by listing, for n successively equal to 2, 3, 4, . . ., those
elements (j, k) E N2 such that j + k = n in order of increasing j, thus:
(1,1), (1,2), (2,1), (1,3), (2,2), (3,1), (1,4), (2,3), (3,2), (4,1),...
As for (b), for each a E A there is a surjective fa : N --+ Xa, and then the map
f : N x A --+ UaEA Xa defined by f(n, a) = fa(n) is surjective; the result therefore
follows from (a). Finally, for (c) it suffices to assume that X is an infinite subset
of N. Let f (1) be the smallest element of X, and define f ( n) inductively to be the
smallest element of E \ {f(l),..., f(n -I)}. Then f is easily seen to be a bijection
mNX. .
0.11 Corollary. Z and <Q are countable.
Proof. Z is the union of the countable sets N, {-n : n EN}, and {O}, and one
can define a surjection f : Z2 --+ <Q by f(m, n) == m/n if n =1= 0 and f(m, 0) == O. .
A set X is said to have the cardinality of the continuum if card(X) == card(IR).
We shall use the letter c as an abbreviation for card(IR):
card(X) == c iff card(X) == card(IR).
0.12 Proposition. card(P(N)) == c.
Proof. If A c N, define f(A) E IR to be LnEA 2- n if N \ A is infinite and
1 + LnEA 2- n if N \ A is finite. (In the two cases, f(A) is the number whose base-2
decimal expansion is 0.ala2 . . . or 1.ala2 . . ., where an = 1 if n E A and an == 0
otherwise.) Then f : P(N) --+ IR is injective. On the other hand, define 9 : P(Z) IR
by g(A) == log(LnEA 2- n ) if A is bounded below and g(A) == 0 otherwise. Then
9 is surjective since every positive real number has a base-2 decimal expansion.
Since card(P(Z)) == card(P(N)), the result follows from the Schroder-Bernstein
theorem. .
0.13 Corollary. Ifcard(X) > c, then X is uncountable.
Proof. Apply Proposition 0.9.
.
The converse of this corollary is the so-called continuum hypothesis, whose va-
lidity is one of the famous undecidable problems of set theory; see 30.7.
MORE ABOUT WELL ORDERED SETS 9
0.14 Proposition.
a. Ifcard(X) < c and card(Y) < c, then card(X x Y) < c.
b. Ifcard(A) < c and card(Xa) < cfor all a E A, then card(UaEA Xa) < c.
Proof. For (a) it suffices to take X == Y == P(N). Define q;, 'ljJ : N N by
q;(n) = 2n and 'ljJ(n) == 2n - 1. It is then easy to check that the map f : P(N)2
P(N) defined by f(A, B) == q;(A) U 'ljJ(B) is bijective. (b) follows from (a) as in the
proof of Proposition 0.10. .
0.4 MORE ABOUT WELL ORDERED SETS
The material in this section is optional; it is used only in a few exercises and in some
notes at the ends of chapters.
Let X be a well ordered set. If A c X is nonempty, A has a minimal element,
which is its maximal lower bound or infimum; we shall denote it by inf A. If A is
bounded above, it also has a minimal upper bound or supremum, denoted by sup A.
If x EX, we define the initial segment of x to be
Ix == {y EX: y < x}.
The elements of Ix are called predecessors of x.
The principle of mathematical induction is equivalent to the fact that N is well
ordered. It can be extended to arbitrary well ordered sets as follows:
0.15 The Principle of Transfinite Induction. Let X be a well ordered set. If A is
a subset of X such that x E A whenever Ix C A, then A == X.
Proof. If X =1= A, let x == inf(X \ A). Then Ix C A but x A.
.
0.16 Proposition. If X is well ordered and A C X, then UXEA Ix is either an initial
segment or X itself.
Proof. Let J == UXEA Ix. If J =1= X, let b == inf(X \ J). If there existed y E J
with y > b, we would have y E Ix for some x E A and hence b E Ix, contrary to
construction. Hence J C I b , and it is obvious that Ib c J. .
0.17 Proposition. If X and Yare well ordered, then either X is order isomorphic
to Y, or X is order isomorphic to an initial segment in Y, or Y is order isomorphic
to an initial segment in X.
Proof. Consider the set l' of order isomorphisms whose domains are initial
segments in X or X itself and whose ranges are initial segments in Y or Y itself.
l' is nonempty since the unique f : {inf X} {inf Y} belongs to 1', and :r is
partially ordered by inclusion (its members being regarded as subsets of X x Y).
10 PROLOGUE
An application of Zorn's lemma shows that l' has a maximal element f, with (say)
domain A and range B. If A == Ix and B == Iy, then A U {x} and B U {y} are
again initial segments of X and Y, and f could be extended by setting f (x) == y,
contradicting maximality. Hence either A = X or B = Y (or both), and the result
follows. .
0.18 Proposition. There is an uncountable well ordered set 0 such that Ix is count-
able for each x E O. If 0' is another set with the same properties, then 0 and 0' are
order isomorphic.
Proof. Uncountable well ordered sets exist by the well ordering principle; let X
be one. Either X has the desired property or there is a minimal element Xo such that
Ixo is uncountable, in which case we can take 0 == Ixo. If 0' is another such set, 0'
cannot be order isomorphic to an initial segment of 0 or vice versa, because 0 and
0' are uncountable while their initial segments are countable, so 0 and 0' are order
isomorphic by Proposition 0.17. .
The set n in Proposition 0.18, which is essentially unique qua well ordered set, is
called the set of countable ordinals. It has the following remarkable property:
0.19 Proposition. Every countable subset of 0 has an upper bound.
Proof. If A c 0 is countable, UXEA Ix is countable and hence is not all of O.
By Proposition 0.16, there exists y E 0 such that UXEA Ix = Iy, and y is thus an
upper bound for A. .
The set N of positive integers may be identified with a subset of 0 as follows. Set
f(l) == inf 0, and proceeding inductively, set f( n) == inf(O \ {f(l), . . . , f( n -I)}).
The reader may verify that f is an order isomorphism from N to Iw, where W is the
minimal element of 0 such that Iw is infinite.
It is sometimes convenient to add an extra element WI to 0 to form a set 0* ==
[2 U {WI} and to extend the ordering on 0 to 0* by declaring that x < WI for all
x E O. WI is called the first uncountable ordinal. (The usual notation for WI is 0,
since WI is generally taken to be the set of countable ordinals itself.)
0.5 THE EXTENDED REAL NUMBER SYSTEM
It is frequently useful to adjoin two extra points 00 (== +(0) and -00 to 1R to form the
extended real number system 1R == JR U { - 00, 00 }, and to extend the usual ordering
on 1R by declaring that -00 < x < 00 for all x E JR. The completeness ofJR can then
be stated as follows: Every subset A of JR has a least upper bound, or supremum,
and a greatest lower bound, or infimum, which are denoted by sup A and inf A. If
A == {aI, . . . an}, we also write
max(al'...' an) = sup A,
min(al, . . . , an) == inf A.
THE EXTENDED REAL NUMBER SYSTEM 11
From completeness it follows that every sequence {x n } in lR has a limit superior
and a limit inferior:
lirn sup X n == inf ( SUP Xn ) ,
k2:1 n>k
lirn inf X n == sup ( inf Xn ) .
k>l n2:k
The sequence {xn} converges (in lR) iff these two numbers are equal (and finite), in
which case its limit is their common value. One can also define lirn sup and lirn inf
for functions f : 1R lR , for instance:
lirn sup f(x) == inf ( SUP f(x ) ) .
xa 8>0 0<lx-al<8
The arithmetical operations on lR can be partially extended to 1R :
x :1: 00 == :1:00 (x E lR), 00 + 00 == 00, -00 - 00 == -00,
x . (:1:00) == :1:00 (x > 0), x. (:1:00) == 1=00 (x < 0).
We make no attempt to define 00 - 00, but we abide by the convention that, unless
otherwise stated,
o . (:1:00) == O.
(The expression 0 . 00 turns up now and then in measure theory, and for various
reasons its proper interpretation is almost always 0.)
We employ the following notation for intervals in lR : if -00 < a < b < 00,
( a, b) == {x : a < x < b},
(a, b] == {x : a < x < b},
[a, b] == {x : a < x < b},
[a, b) == {x : a < x < b}.
We shall occasionally encounter uncountable sums of nonnegative numbers. If X
is an arbitrary set and f : X [0,00], we define L:xEX f(x) to be the supremum
of its finite partial sums:
L f(x) = SUP{ L f(x) : F c X, F finite}.
xEX xEF
(Later we shall recognize this as the integral of f with respect to counting measure
on X.)
0.20 Proposition. Given f : X [0,00], let A == {x : f(x) > O}. If A is
uncountable, then L:xEX f(x) == 00. If A is countably infinite, then L:xEX f(x) ==
L: f(g(n)) where g : N A is any bijection and the sum on the right is an
ordinary infinite series.
Proof. We have A == U An where An == {x : f(x) > l/n}. If A is
uncountable, then some An must be uncountable, and L:xEF f(x) > card(F) /n for
F a finite subset of An; it follows that L:xEX f (x) == 00. If A is countably infinite,
12 PROLOGUE
g : N A is a bijection, and BN == g( {I, . . . , N}), then every finite subset F of A
is contained in some B N. Hence
N
L f(x) < L f(g(n)) < L f(x).
xEF 1 xEX
Taking the supremum over N, we find
00
L f(x) < L f(g(n)) < L f(x),
xEF 1 xEX
and then taking the supremum over F, we obtain the desired result. .
Some terminology concerning (extended) real-valued functions: A relation be-
tween numbers that is applied to functions is understood to hold pointwise. Thus
f < g means that f(x) < g(x) for every x, and max(f, g) is the function whose
value at x is max(f(x),g(x)). If X c lR and f : X lR , f is called increasing
if f(x) < f(y) whenever x < y and strictly increasing if f(x) < f(y) whenever
x < y; similarly for decreasing. A function that is either increasing or decreasing is
called monotone.
If f : lR lR is an increasing function, then f has right- and left-hand limits at
each point:
f(a+) = lirn f(x) == inf f(x),
xa x>a
f(a-) == lim f(x) == supf(x).
x/a x<a
Moreover, the limiting values f(oo) == sUPaEJR f(x) and f( -(0) == infaEJR f(x)
exist (possibly equal to :1:(0). f is called right continuous if f(a) == f(a+) for all
a E lR and left continuous if f(a) == f(a-) for all a E lR.
For points x in lR or C, Ixl denotes the ordinary absolute value or modulus of x,
la + ibl = y' a 2 + b 2 . For points x in lR n or C n , Ixl denotes the Euclidean norm:
n 1/2
Ix I = [L IXjI2] ·
1
We recall that a set U c lR is open if, for every x E U, U includes an interval
centered at x.
0.21 Proposition. Every open set in lR is a countable disjoint union of open intervals.
Proof. If U is open, for each x E U consider the collection J x of all open
intervals I such that x E leU. It is easy to check that the union of any family
of open intervals containing a point in common is again an open interval, and hence
J x = UIEJ x I is an open interval; it is the largest element of J x . If x, y E U then
either J x = J y or J x n J y == 0, for otherwise J x U J y would be a larger open interval
than J x in J x. Thus if (J = {J x : x E U}, the (distinct) members of (J are disjoint,
and U == UJE8 J. For each J E (J, pick a rational number f(J) E J. The map
f : (J Q thus defined is injective, for if J =1= J' then J n J' = 0; therefore (J is
countable. .
METRIC SPACES 13
0.6 METRIC SPACES
A metric on a set X is a function P : X x X [0, (0) such that
. p( x, y) == 0 iff x == Y;
. p(x, Y) == p(y, x) for all x, y E X;
. p(x, z) < p(x, y) + p(y, z) for all x, y, z E X.
(Intuitively, p(x, y) is to be interpreted as the distance from x to y.) A set equipped
with a metric is called a metric space. Some examples:
i. The Euclidean distance p(x, y) == Ix - yl is a metric on lR n .
ii. PI (f, g) == fo1 If(x) - g(x)1 dx and Poo(f, g) == SUPO<x<l If(x) - g(x)1 are
metrics on the space of continuous functions on [0, 1]. - -
111. If P is a metric on X and A c X, then pi (A x A) is a metric on A.
IV. If (Xl, PI) and (X 2, P2) are metric spaces, the product metric p on XIX X 2
is given by
p((X1, X2), (Y1, Y2)) == max (PI (Xl, Y1), P2(X2, Y2)).
Other metrics are sometimes used on Xl x X 2 , for instance,
PI (Xl, Y1) + P2(X2, Y2) or [PI (Xl, Y1)2 + P2(X2, Y2)2] 1/2.
These, however, are equivalent to the product metric in the sense that we shall
define at the end of this section.
Let (X, p) be a metric space. If x E X and r > 0, the (open) ball of radius r
about x is
B(r,x) == {y EX: p(x,y) < r}.
A set E c X is open if for every x E E there exists r > 0 such that B( r, x) C E,
and closed if its complement is open. For example, every ball B (r, x) is open, for
if Y E B(r, x) and p(x, y) == s then B(r - s, y) C B(r, x). Also, X and 0 are
both open and closed. Clearly the union of any family of open sets is open, and
hence the intersection of any family of closed sets is closed. Also, the intersection
(resp. union) of any finite family of open (resp. closed) sets is open (resp. closed).
Indeed, if U 1 , . . . Un are open and x E n U j , for each j there exists r j > 0 such that
B(rj, x) C U j , and then B(r, x) c n U j where r == min(r1,..., rn), so n U j is
open.
If E eX, the union of all open sets U C E is the largest open set contained in E;
it is called the interior of E and is denoted by EO. Likewise, the intersection of all
closed sets F :) E is the smallest closed set containing E; it is called the closure of
E and is denoted by E . E is said to be dense in X if E == X, and nowhere dense if
14 PROLOGUE
E has empty interior. X is called separable if it has a countable dense subset. (For
example, Qn is a countable dense subset of lR n .) A sequence {x n } in X converges
to x E X (symbolically: X n x or lim X n == x) if limnoo p( x n , x) == O.
0.22 Proposition. If X is a metric space, E c X, and x E X, the following are
equivalent:
a. x E E .
b. B(r,x)nE=/:=0forallr>0.
c. There is a sequence {xn} in E that converges to x.
Proof. If B(r, x) n E == 0, then B(r, x)C is a closed set containing E but not
x, so x E . Conversely, if x E , since ( E )C is open there exists r > 0 such
that B(r,x) C ( E )C c EC. Thus (a) is equivalent to (b). If (b) holds, for each
n E N there exists X n E B(n- 1 , x) n E, so that X n x. On the other hand, if
B(r, x) n E == 0, then p(y, x) > r for all y E E, so no sequence of E can converge
to x. Thus (b) is equivalent to (c). .
If (Xl, PI) and (X 2, P2) are metric spaces, a map f : X 1 X 2 is called contin-
uous at x E X if for every E > 0 there exists 8 > 0 such that p2(f(y), f(x)) < E
whenverp1(x,y) < 8-inotherwords,suchthatf-1(B(E,f(x))):) B(8,x). The
map f is called continuous if it is continuous at each x E Xl and uniformly contin-
uous if, in addition, the 8 in the definition of continuity can be chosen independent
of x.
0.23 Proposition. f : Xl X 2 is continuous iff f-1(U) is open in Xl for every
open U C X 2 .
Proof. If the latter condition holds, then for every x E Xl and E > 0, the set
f-1(B(E, f(x))) is open and contains x, so it contains some ball about x; this means
that f is continuous at x. Conversely, suppose that f is continuous and U is open
in X 2. For each y E U there exists Ey > 0 such that B ( Ey, y) C U, and for each
x E f-1( {y}) there exists 8x > 0 such that B(8x, x) C f-1(B(Ey, y)) C f-1(U).
Thus f-1 (U) == UXE!-l (U) B( 8x, x) is open. .
A sequence {x n } in a metric space (X,p) is called Cauchy if p(xn,x m ) 0
as n, m 00. A subset E of X is called complete if every Cauchy sequence in
E converges and its limit is in E. For example, lR n (with the Euclidean metric) is
complete, whereas Qn is not.
0.24 Proposition. A closed subset of a complete metric space is complete, and a
complete subset of an arbitrary metric space is closed.
Proof. If X is complete, E C X is closed, and {x n } is a Cauchy sequence in E,
{x n } has a limit in X. By Proposition 0.22, x E E == E. If E c X is complete and
x E E , by Proposition (0.22) there is a sequence {x n } in E converging to x. {x n }
is Cauchy, so its limit lies in E; thus E == E . .
METRIC SPACES 15
In a metric space (X, p) we can define the distance from a point to a set and the
distance between two sets. Namely, if x E X and E, F c X,
p(x,E) == inf{p(x,y) : y E E},
p(E, F) == inf{p(x, y) : x E E, y E F} == inf{p(x, F) : x E E}.
Observe that, by Proposition 0.22, p(x, E) == 0 iff x E E . We also define the
diameter of E c X to be
diam E == sup{p(x, y) : x, y E E}.
E is called bounded if diam E < 00.
If E c X and {Va}aEA is a family of sets such that E c UaEA Va, {Va}aEA
is called a cover of E, and E is said to be covered by the Va's. E is called totally
bounded if, for every E > 0, E can be covered by finitely many balls of radius E.
Every totally bounded set is bounded, for ifx,y E U B(E,Zj), say x E B(E,Zl)
and y E B( E, Z2), then
p(x, y) < p(x, Zl) + p(Zl, Z2) + P(Z2, y) < 2E + max{p(zj, Zk) : 1 < j, k < n}.
(The converse is false in general.) If E is totally bounded, so is E , for it is easily
. n - n
seen that If E C Ul B( E, Zj), then E c U1 B(2E, Zj).
0.25 Theorem. If E is a subset of the metric space (X, p), the following are equiv-
alent:
a. E is complete and totally bounded.
b. (The Bolzano- Weierstrass Property) Every sequence in E has a subsequence
that converges to a point of E.
c. (The Heine-Borel Property) If {Va}aEA is a cover of E by open sets, there
is afinite set F c A such that {Va}aEF covers E.
Proof. We shall show that (a) and (b) are equivalent, that (a) and (b) together
imply (c), and finally that (c) implies (b).
(a) implies (b): Suppose that (a) holds and {x n } is a sequence in E. E can be
covered by finitely many balls of radius 2- 1 , and at least one of them must contain X n
for infinitely many n: say, X n E B1 for n E N 1 . E n B 1 can be covered by finitely
many balls of radius 2- 2 , and at least one of them must contain X n for infinitely many
n E N 1 : say, X n E B 2 for n E N 2 . Continuing inductively, we obtain a sequence
of balls Bj of radius 2- j and a decreasing sequence of subsets N j of N such that
X n E Bj for n E N j . Pick nl E N 1 , n2 E N 2 ,... such that nl < n2 < ....
Then {x nj } is a Cauchy sequence, for p(x nj , x nk ) < 2 1 - j if k > j, and since E is
complete, it has a limit in E.
(b) implies (a): We show that if either condition in (a) fails, then so does (b). If
E is not complete, there is a Cauchy sequence {x n } in E with no limit in E. No
subsequence of {x n } can converge in E, for otherwise the whole squence would
converge to the same limit. On the other hand, if E is not totally bounded, let E > 0
16 PROLOGUE
be such that E cannot be covered by finitely many balls of radius Eo Choose X n E E
inductively as follows. Begin with any Xl E E, and having chosen Xl, . . . , X n ,
pick Xn+l E E \ U B(E,Xj). Then p(xn,xm) > E for all n,m, so {xn} has no
convergent subsequence.
(a) and (b) imply (c): It suffices to show that if (b) holds and {Va}aEA is a cover
of E by open sets, there exists E > 0 such that every ball of radius E that intersects
E is contained in some Va, for E can be covered by finitely many such balls by (a).
Suppose to the contrary that for each n E N there is a ball Bn of radius 2- n such
that Bn n E =1= 0 and Bn is contained in no Va. Pick X n E Bn n E; by passing to a
subsequence we may assume that {xn} converges to some X E E. We have X E Va
for some Q, and since Va is open, there exists E > 0 such that B ( E, x) C Ya. But if
n is large enough so that p( X n , x) < E/3 and 2- n < E/3, then Bn C B( E, x) eVa,
contradicting the assumption on Bn.
(c) implies (b): If {xn} is a sequence in E with no convergent subsequence, for
each x E E there is a ball Bx centered at x that contains X n for only finitely many n
(otherwise some subsequence would converge to x). Then {Bx} xEE is a cover of E
by open sets with no finite subcover. .
A set E that possesses the properties (a)-(c) of Theorem 0.25 is called compact.
Every compact set is closed (by Proposition 0.24) and bounded; the converse is false
in general but true in JRn .
0.26 Proposition. Every closed and bounded subset ofJRn is compact.
Proof. Since closed subsets of JRn are complete, it suffices to show that bounded
subsets of n are totally bounded. Since every bounded set is contained in some
cube
Q == [-R, R]n = {x E JRn : max(lxII,..., IXnl) < R},
it is enough to show that Q is totally bounded. Given E > 0, pick an integer
k > Ry'ii/E, and express Q as the union of k n congruent subcubes by dividing the
interval [- R, R] into k equal pieces. The side length of these subcubes is 2R/ k and
hence their diameter is y'ii(2R/ k) < 2E, so they are contained in the balls of radius
E about their centers. .
Two metrics PI and P2 on a set X are called equivalent if
o PI < P2 < 0' PI for some 0, 0' > o.
It is easily verified that equivalent metrics define the same open, closed, and compact
sets, the same convergent and Cauchy sequences, and the same continuous and uni-
formly continuous mappings. Consequently, most results concerning metric spaces
depend not on the particular metric chosen but only on its equivalence class.
0.7 NOTES AND REFERENCES
33 0 . 1 -0.4: The best exposition of set theory for beginners is Halmos [63], and
Smullyan and Fitting [135] is a good text on a more advanced level. Kelley [83]
NOTES AND REFERENCES 17
also contains a concise account of of basic axiomatic set theory. All of these books
present a deduction of the Hausdorff maximal principle from the axiom of choice, as
does Hewitt and Stromberg [76].
The axiom of choice (or one of the propositions equivalent to it) is generally taken
as one of the basic postulates in the axiomatic formulations of set theory. Some
mathematicians of the intuitionist or constructivist persuasion reject it on the grounds
that one has not proved the existence of a mathematical object until one has shown
how to construct it in some reasonably explicit fashion, whereas the whole point of
the axiom of choice is to provide existence theorems when constructive methods fail
(or are too cumbersome for comfort). People who are seriously bothered by such
objections belong to a minority that does not include the present writer; in this book
the axiom of choice is used sparingly but freely.
The continuum hypothesis is the assertion that if card(X) < c, then X is
countable. (Since it follows easily from the construction of 0, the set of countable
ordinals, that card(O) < card(X) for any uncountable X, an equivalent assertion
is that card(O) == c.) It is known, thanks to Godel and Cohen, that the continuum
hypothesis and its negation are both consistent with the standard axioms of set theory
including the axiom of choice, assuming that those axioms are themselves consistent.
(An exposition of the consistency and independence theorems for the axiom of choice
and the continuum hypothesis can be found in Smullyan and Fitting [135].) Some
mathematicians are willing to accept the continuum hypothesis as true, seemingly as
a matter of convenience, but Godel [56] and Cohen [26, p. 151] have both expressed
suspicions that it should be false, and as of this writing no one has found any really
compelling evidence on one side or the other. My own feeling, subject to revision
in the event of a major breakthrough in set theory, is that if the answer to one's
question turns out to depend on the continuum hypothesis, one should give up and
ask a different question.
30.6: A more detailed discussion of metric spaces can be found in Loomis and
Sternberg [95] and DePree and Swartz [32].
1
Measures
In this chapter we set forth the basic concepts of measure theory, develop a general
procedure for constructing nontrivial examples of measures, and apply this procedure
to construct measures on the real line.
1.1 INTRODUCTION
One of the most venerable problems in geometry is to determine the area or volume
of a region in the plane or in 3-space. The techniques of integral calculus provide a
satisfactory solution to this problem for regions that are bounded by "nice" curves or
surfaces but are inadequate to handle more complicated sets, even in dimension one.
Ideally, for n E N we would like to have a function J-l that assigns to each E c JRn
a number J-l(E) E [0,00], the n-dimensional measure of E, such that J-l(E) is given
by the usual integral formulas when the latter apply. Such a function J-l should surely
possess the following properties:
1. If E 1 , E 2 , . . . is a finite or infinite sequence of disjoint sets, then
J-l( El U E 2 U . . .) == J-l( E 1 ) + J-l( E 2 ) + . . . .
11. If E is congruent to F (that is, if E can be transformed into F by translations,
rotations, and reflections), then J-l(E) = J-l(F).
111. J-l( Q) = 1, where Q is the unit cube
Q == {x E JRn : 0 < Xj < 1 for j == 1, . . . , n }.
19
20 MEASURES
Unfortunately, these' conditions are mutually inconsistent. Let us see why this is
true for n == 1. (The argument can easily be adapted to higher dimensions.) To begin
with, we define an equivalence relation on [0, 1) by declaring that x rv y iff x - y
is rational. Let N be a subset of [0, 1) that contains precisely one member of each
equivalence class. (To find such an N, one must invoke the axiom of choice.) Next,
let R == Q n [0,1), and for each r E R let
N r == {x + r : x E N n [0, 1 - r)} U {x + r - 1 : x E N n [1 - r, I)}.
That is, to obtain N r , shift N to the right by r units and then shift the part that sticks
out beyond [0, 1) one unit to the left. Then N r C [0,1), and every x E [0,1) belongs
to precisely one N r . Indeed, if y is the element of N that belongs to the equivalence
class of x, then x E N r where r == x - y if x > y or r = x - y + 1 if x < y; on
the other hand, if x E N r n N s , then x - r (or x - r + 1) and x - s (or x - s + 1)
would be distinct elements of N belonging to the same equivalence class, which is
impossible.
Suppose now that J-l : P(ffi.) [0,00] satisfies (i), (ii), and (iii). By (i) and (ii),
J-l(N) == J-l(N n [0, 1 - r)) + J-l(N n [1 - r, 1)) == J-l(N r )
for any r E R. Also, since R is countable and [0, 1) is the disjoint union of the Nr's,
J-l([0, 1)) == L J-l(N r )
rER
by (i) again. But JL([O, 1)) == 1 by (iii), and since JL(N r ) == J-l(N), the sum on the
right is either 0 (if J-l( N) == 0) or 00 (if JL( N) > 0). Hence no such JL can exist.
Faced with this discouraging situation, one might consider weakening (i) so that
additivity is required to hold only for finite sequences. This is not a very good idea,
as we shall see: The additivity for countable sequences is what makes all the limit
and continuity results of the theory work smoothly. Moreover, in dimensions n > 3,
even this weak form of (i) is inconsistent with (ii) and (iii). Indeed, in 1924 Banach
and Tarski proved the following amazing result:
Let U and V be arbitrary bounded open sets in ffi.n, n > 3. There exist kEN
and subsets E I , . . . , E k , F I , . . . , Fk of ffi.n such that
- the Ej's are disjoint and their union is U;
- the Fj's are disjoint and their union is V;
- Ej is congruent to Fj for j == 1, . . . , k.
Thus one can cut up a ball the size of a pea into a finite number of pieces and
rearrange them to form a ball the size of the earth! Needless to say, the sets Ej and Fj
are very bizarre. They cannot be visualized accurately, and their construction depends
on the axiom of choice. But their existence clearly precludes the construction of any
JL : P(ffi. n ) [0,00] that assigns positive, finite values to bounded open sets and
satisfies (i) for finite sequences as well as (ii).
a-ALGEBRAS 21
The moral of these examples is that n contains subsets which are so strangely put
together that it is impossible to define a geometrically reasonable notion of measure
for them, and the remedy for the situation is to discard the requirement that /-l should
be defined on all subsets of n. Rather, we shall content ourselves with constructing
/-l on a class of subsets of n that includes all the sets one is likely to meet in practice
unless one is deliberately searching for pathological examples. This construction
will be carried out for n == 1 in 31.5 and for n > 1 in 32.6.
It is worthwhile, and not much extra work, to develop the theory in much greater
generality. The conditions (ii) and (iii) are directly related to Euclidean geometry,
but set functions satisfying (i), called measures, arise also in a great many other
situations. For example, in a physics problem involving mass distributions, /-l( E)
could represent the total mass in the region E. For another example, in probability
theory one considers a set X that represents the possible outcomes of an experiment,
and for E C X, /-l(E) is the probability that the outcome lies in E. We therefore
begin by studying the theory of measures on abstract sets.
1.2 a-ALGEBRAS
In this section we discuss the families of sets that serve as the domains of measures.
Let X be a nonempty set. An algebra of sets on X is a nonempty collection A
of subsets of X that is closed under finite unions and complements; in other words,
if E 1 , . . . , En E A, then U Ej E A; and if E E A, then EC E A. A iT-algebra is
an algebra that is closed under countable unions. (Some authors use the terms field
and u-field instead of algebra and a-algebra.)
We observe that since n j Ej == (U j Ej)C, algebras (resp. a-algebras) are also
closed under finite (resp. countable) intersections. Moreover, if A is an algebra, then
o E A and X E A, for if E E A we have 0 == E n EC and X == E u EC.
It is worth noting that an algebra A is a a-algebra provided that it is closed under
countable disjoint unions. Indeed, suppose {E j } 1 c A. Set
k-l k-l
Fk = Ek \ [U E j ] = Ek n [U E j ] c.
1 1
Then the Pk's belong to A and are disjoint, and U Ej == U P k . This device of
replacing a sequence of sets by a disjoint sequence is worth remembering; it will be
used a number of times below.
Some examples: If X is any set, P(X) and {0, X} are a-algebras. If X is
uncountable, then
A == {E eX: E is countable or EC is countable}
is a a-algebra, called the a-algebra of countable or co-countable sets. (The point
here is that if {E j } 1 c A, then U Ej is countable if all Ej are countable and is
co-countable otherwise.)
22 MEASURES
It is trivial to verify that the intersection of any family of a-algebras on X is again
a a-algebra. It follows that if G is any susbset of P(X), there is a unique smallest
a-algebra M( G) containing G, namely, the intersection of all a-algebras containing
£. (There is always at least one such, namely, P(X).) M( G) is called the a-algebra
generated by G. The following observation is often useful:
1.1 Lemma. If G C M() then M( G) C M().
Proof. M() is a a-algebra containing G; it therefore contains M( G). .
If X is any metric space, or more generally any topological space (see Chapter
4), the a-algebra generated by the family of open sets in X (or, equivalel1,tly, by the
famil y of closed sets in X) is called the BorellT-algebra on X and is denoted by
'B x. Its members are called Borel sets. 'B x thus includes open sets, closed sets,
countable intersections of open sets, countable unions of closed sets, and so forth.
There is a standard terminology for the levels in this hierarchy. A countable
intersection of open sets is called a G 6 set; a countable union of closed sets is called
an F 0' set; a countable union of G 8 sets is called a G 60' set; a countable intersection of
Fa sets is called an F 0'6 set; and so forth. ({) and a stand for the German Durchschnitt
and Summe, that is, intersection and union.)
The Borel a-algebra on JR will playa fundamental role in what follows. For future
reference we note that it can be generated in a number of different ways:
1.2 Proposition. 'B is generated by each of the following:
a. the open intervals: G1 = {(a, b) : a < b},
b. the closed intervals: G2 = {[a, b] : a < b},
c. the half-open intervals: G3 = {(a, b] : a < b} or G4 = { [a, b) : a < b},
d. the open rays: G5 = {(a, (0) : a E JR} or G6 = {( -00, a) : a E JR},
e. the closed rays: G7 = {[a, (0) : a E JR} OrG8 = {(-(X), a] : a E JR}.
Proof. The elements of Gj for j =1= 3,4 are open or closed, and the elements of G3
and G4 are G 8 sets - for example, (a, b] = n (a, b + n -1 ). All of these are Borel
sets, so by Lemma 1.1, M( G j) C 23 for all j. On the other hand, every open set in JR
is a countable union of open intervals, so by Lemma 1.1 again, 23 C M( G 1). That
'BJR C M( Gj) for j > 2 can now be established by showing that all open intervals lie
in M( Gj) and applying Lemma 1.1. For example, ( a, b) = U [a + n- 1 , b - n- 1 ] E
M(G2). Verification of the other cases is left to the reader (Exercise 2). .
Let {XO;}O;EA be an indexed collection of nonempty sets, X = TIO;EA XO;, and
?To; : X XO; the coordinate maps. If MO; is a a-algebra on XO; for each lX, the
product a-algebra on X is the a-algebra generated by
{?T1 (Eo;) : Eo; E MO;, lX E A}.
We denote this a-algebra by Q9O;EA Mo;. (If A == {I, . . . , n} we also write Q9 M j
or JY( 1 G0 . . . @ JY(n .) The significance of this definition will become clearer in 32.1;
a-ALGEBRAS 23
for the moment we give an alternative, and perhaps more intuitive, characterization
of product a-algebras in the case of countably many factors.
1.3 Proposition. If A is countable, then Q9O;EA MO; is the a-algebra generated by
{IlO;EA Eo; : Eo; E MO;}.
Proof. If Eo; E MO;, then 7Tl(EO;) = Il,aEA E,a where E,a = X for {3 i= lX;
on the other hand, IlO;EA Eo; = nO;EA 7T;;1 (EO;). The result therefore follows from
Lemma 1.1. .
1.4 Proposition. Suppose that MO; is generated by Go;, lX E A. Then Q9O;EA JY(O; is
generated by 1 = {7T;;l(EO;) : Eo; EGo;, lX E A}. If A is countable and XO; E Go;
for all lX, Q9O;EA MO; is generated by 2 == {IlO;EA Eo; : Eo; E GO;}.
Proof. Obviously M(l) C Q9O;EA Mo;. On the other hand, for each lX, the
collection {E c XO; : 7T;;l(E) E M(I)} is easily seen to be a a-algebra on XO;
that contains Go; and hence Mo;. In other words, 7T;; I (E) E JY( ( I) for all E E JY(0;,
lX E A, and hence Q9O;EA MO; C M(I). The second assertion follows from the first
as in the proof of Proposition 1.3. .
1.5 Proposition. Let Xl'...' X n be metric spaces and let X = Il X j , equipped
with the product metric. Then Q9; r.Bx j C r.Bx. If the Xj's are separable, then
Q9 r.B Xj = 13x.
Proof. By Proposition 1.4, Q9 r.B x j is generated by the sets 7T j I (U j ), 1 < j <
n, where U j is open in X j . Since these sets are open in X, Lemma 1.1 implies that
Q9 13xj C 13x. Suppose now that C j is a countable dense set in X j , and let Gj be
the collection of balls in X j with rational radius and center in C j . Then every open
set in X j is a union of members of Gj - in fact, a countable union since Gj itself is
countable. Moreover, the set of points in X whose jth coordinate is in C j for all j
is a countable dense subset of X, and the balls of radius r in X are merely products
of balls of radius r in the Xj's. It follows that r.Bx j is generated by Gj and r.Bx is
generated by {Il Ej : Ej E G j}. Therefore r.B x = Q9 r.B x j by Proposition 1.4. .
1.6 Corollary. r.Bn = Q9 r.B.
We conclude this setion with a technical result that will be needed later. We
define an elementary family to be a collection G of subsets of X such that
. 0 E G,
. if E, F E G then E n F E G,
. if E E G then EC is a finite disjoint union of members of G.
1.7 Proposition. If G is an elementary family, the collection A of finite disjoint
unions of members of G is an algebra.
24 MEASURES
Proof. If A, BEG and BC == U{ C j (C j E G, disjoint), then A \ B ==
U (A n C j ) and A u B == (A \ B) u B, where these unions are disjoint, so
A \ B E A and Au B E A. It now follows by induction that if Al . . . , An E G, then
U Aj E A; indeed, by inductive hypothesis we may assume that AI, . . . , An-l are
disjoint, and then U Aj == An U U-1 (Aj \ An), which is a disjoint union. To see
J .
that A is closed under complements, suppose AI, . . . An E G and A == U j :\ Bln
with Bln, . . . , B:!nm disjoint members of G. Then
n n J m
( U A ) c == n (U Bj ) == U{ Bjln.. .nBjn : 1 < J . < J 1 < m < n }
m m 1 n - m - m, - - ,
m=1 m=1 j=1
which is in A.
.
Exercises
1. A family of sets 9( C P( X) is called a ring if it is closed under finite unions
and differences (Le., if E 1 , . . . , En E 9(, then U Ej E 9(, and if E, F E 9(, then
E \ F E 9(). A ring that is closed under countable unions is called au-ring.
a. Rings (resp. a-rings) are closed under finite (resp. countable) intersections.
b. If 9( is a ring (resp. a-ring), then 9( is an algebra (resp. a-algebra) iff X E 9(.
c. If 9( is a a-ring, then {E eX: E E 9( or EC E 9(} is a a-algebra.
d. If 9( is a a-ring, then {E eX: E n F E 9( for all F E 9(} is a a-algebra.
2. Complete the proof of Proposition 1.2.
3. Let M be an infinite a-algebra.
a. M contains an infinite sequence of disjoint sets.
b. card(M) > c.
4. An algebra A is a a-algebra iff A is closed under countable increasing unions
(i.e., if {E j }l c A and E 1 C E 2 C ..., then U Ej E A).
5. If M is the a-algebra generated by G, then M is the union of the a-algebras
generated by as ranges over all countable subsets of G. (Hint: Show that the
latter object is a a-algebra.)
1.3 MEASURES
Let X be a set equipped with a a-algebra M. A measure on M (or on (X, M), or
simply on X if ]v( is understood) is a function f.-l : M [0,00] such that
1. IL( 0) == 0,
11. if {Ej } 1 is a sequence of disjoint sets in M, then f.-l(U E j ) == L J-l( Ej ).
Property (ii) is called countable additivity. It implies finite additivity:
MEASURES 25
ii'. if E 1 , . . . En are disjoint sets in M, then JL(U Ej) == L JL(Ej),
because one can take Ej == 0 for j > n. A function JL that satisfies (i) and (ii') but
not necessarily (ii) is called a finitely additive measure.
If X is a set and M c P(X) is a a-algebra, (X, M) is called a measurable space
and the sets in M are called measurable sets. If JL is a measure on (X, M), then
(X, M, /-l) is called a measure space.
Let (X, M, /-l) be a measure space. Here is some standard terminology concerning
the "size" of /-l. If /-l( X) < 00 (which implies that /-l( E) < 00 for all E E M since
/-l(X) == /-l(E) + /-l(EC)), JL is called finite. If X = U Ej where Ej E M and
/-l( Ej) < 00 for all j, /-l is called u-finite. More generally, if E = U Ej where
Ej E ]v( and /-l( Ej) < 00 for all j, the set E is said to be u-finite for /-l. (It would
be correct but more cumbersome to say that E is of a-finite measure.) If for each
E E ]V( with /-l(E) == 00 there exists F E M with FeE and 0 < /-l(F) < 00, /-l is
called semifinite.
Every a-finite measure is semifinite (Exercise 13), but not conversely. Most mea-
sures that arise in parctice are a-finite, which is fortunate since non-a-finite measures
tend to exhibit pathological behavior. The properties of non-a-finite measures will
be explored from time to time in the exercises.
Let us examine a few examples of measures. These examples are of a rather trivial
nature, although the first one is of practical importance. The construction of more
interesting examples is a task to which we shall turn in the next two sections.
. Let X be any non empty set, M = P(X), and f any function from X to [0,00].
Then f determines a measure /-l on M by the formula /-l(E) == LXEE f(x).
(For the definition of such possibly uncountable sums, see 30.5.) The reader
may verify that /-l is semifinite iff f(x) < 00 for every x E X, and /-l is a-finite
iff /-l is semifinite and {x : f (x) > O} is countable. Two special cases are of
particular significance: If f (x) == 1 for all x, /-l is called counting measure;
and if, for some Xo E X, f is defined by f(xo) == 1 and f(x) == 0 for x =1= xo,
/-l is called the point mass or Dirac measure at Xo. (The same names are also
applied to the restrictions of these measures to smaller a-algebras on X.)
. Let X be an uncountable set, and let M be the a -algebra of countable or co-
countable sets. The function /-l on M defined by /-l( E) == 0 if E is countable
and /-l( E) == 1 if E is co-countable is easily seen to be a measure.
. Let X be an infinite set and M == P(X). Define /-l(E) = 0 if E is finite,
/-l( E) == 00 if E is infinite. Then /-l is a finitely additive measure but not a
measure.
The basic properties of measures are summarized in the following theorem.
1.8 Theorem. Let (X, M, /-l) be a measure space.
a. (Monotonicity) if E, F E Jv( and E c F, then /-l(E) < /-l(F).
b. (Subadditivity) if {E j }l c M, then /-l(U Ej) < L /-l(Ej).
26 MEASURES
c. (Continuity from below) If {E j }l c M and E I C E 2 C ..., then
JL(U Ej) == lirnj-+oo JL(Ej).
d. (Continuity from above) If{E j }l eM, E I E 2 ..., andJL(E I ) < 00,
then JL(n Ej) == limj-+oo JL(Ej).
Proof. (a) If E c F, then JL(F) == JL(E) + JL(F \ E) > JL(E).
(b) Let F I == E I and Fk == Ek \ (U-I Ej) for k > 1. Then the Fk's are disjoint
and U7 Fj == U7 Ej for all n. Therefore, by (a),
00 00 00 00
JL(UE j ) = JL(UF j ) = LJL(F j ) < LJL(Ej).
1 1 1 1
(c) Setting Eo == 0, we have
00 00 n
JL (U Ej ) == " JL(Ej \ E j - I ) == lirn "JL(E j \ E j - I ) == lirn JL(En).
n-+oo n-+oo
111
(d) Let Fj == E I \ E j; then FI C F 2 C ..., JL ( E I ) == JL ( Fj) + JL ( E j ), and
U Fj == E I \ (n Ej). By (c), then,
00 00
JL(EI) = JL(n E j ) + j JL(Fj) = JL(n E j ) + j [JL(E 1 ) - JL(Ej)].
1 1
Since JL( E I ) < 00, we may subtract it from both sides to yield the desired result. .
We remark that the condition JL( E I ) < 00 in part (d) could be replaced by
fl(En) < 00 for some n > 1, as the first n - 1 Ej's can be discarded from the
sequence without affecting the intersection. However, some finiteness assumption
is necessary, as it can happen that JL( Ej) == 00 for all j but JL(n Ej) < 00. (For
example, let JL be counting measure on (N, P(N)) and let Ej == {n : n > j}; then
n Ej == 0.)
If (X, M, JL) is a measure space, a set E E M such that JL(E) == 0 is called a null
set. By subadditivity, any countable union of null sets is a null set, a fact which we
shall use frequently. If a statement about points x E X is true except for x in some
null set, we say that it is true almost everywhere (abbreviated a.e.), or for almost
every x. (If more precision is needed, we shall speak of a J..L-null set, or J..L-almost
everywhere).
If J-l(E) == 0 and FeE, then JL(F) == 0 by monotonicity provided that F E M,
but in general it need not be true that F E M. A measure whose domain includes
all subsets of null sets is called complete. Completeness can sometimes obviate
annoying technical points, and it can always be achieved by enlarging the domain of
JL, as follows.
1.9 Theorem. Suppose that (X, M, JL) is a measure space. Let N == {N EM:
JL(N) == O} and M == {E U F: E E M and FeN for some N EN}. The n M is
a (J" -algebra, and there is a unique extension JL of JL to a complete measure on M.
MEASURES 27
Proof. Since M and N are closed under countable unions, so is M . If EuF E M
where E E M and FeN E N, we can assume that En N == 0 (otherwise, replace
F and N by F \ E and N \ E). Then E u F == (E u N) n (NC u F), so
(E U F)C == (E u N)C u (N \ F). But (E U N)C E Jv( and N \ FeN, so that
(E U F)C E M . Thus M is a a-algebra.
If E u F E M as above, we set /-l ( E u F) == /-l( E). This is well defined,
since if EI u FI == E 2 U F 2 where Fj C N j E N, then EI C E 2 U N 2 and so
/-l(E I ) < /-l(E 2 ) + /-l(N 2 ) = /-l(E 2 ), and likewise /-l(E 2 ) < /-l(EI). It issily
verified that /-l is a complete measure on M, and that /-l is the only measure on M that
extends /-l; details are left to the reader (Exercise 6). .
The measure /-l in Theorem 1.9 is called the completion of /-l, and M is called the
completion of M with respect to /-l.
Exercises
6. Complete the proof of Theorem 1.9.
7. If /-l1, . . . , /-In are measures on (X, M) and aI, . . . , an E [0,00), then L: aj /-lj
is a measure on (X, M).
8. If (X, M, /-l) is a measure space and {E j }l c M, then /-l(lim inf Ej) <
lim inf /-l(Ej). Also, /-l(lim sup Ej) > lim sup /-l(Ej) provided that /-l(U Ej) <
00.
9. If(X, M, /-l) is a measure space and E, F E M, then/-l(E)+/-l(F) == /-l(E u F)+
/-l(E n F).
10. Given a measure space (X, M, /-l) and E E M, define /-lE(A) = /-l(A n E) for
A E M. Then /-lEis a measure.
11. A finitely additive measure /-l is a measure iff it is continuous from below as in
Theorem 1.8c. If /-l( X) < 00, /-l is a measure iff it is continuous from above as in
Theorem 1.8d.
12. Let (X, M, /-l) be a finite measure space.
a. If E, F E M and /-l(Ef::.F) = 0, then /-l(E) = /-l(F).
b. Say that E rv F if /-l(Ef::.F) = 0; then rv is an equivalence relation on M.
c. For E, F E M, define p(E, F) == /-l(Ef::.F). Then p(E, G) < p(E, F) +
p(F, G), and hence p defines a metric on the space M/ rv of equivalence classes.
13. Every a-finite measure is semifinite.
14. If /-l is a semifinite measure and /-l(E) == 00, for any C > 0 there exists FeE
with C < /-l(F) < 00.
15. Given a measure /-l on (X, M), define /-lo on M by /-lo(E) == sup{/-l(F) : F C
E and /-l(F) < oo}.
a. /-lo is a semifinite measure. It is called the semifinite part of /-l.
b. If /-l is semifinite, then /-l == /-lo. (Use Exercise 14.)
28 MEASURES
c. There is a measure v on M (in general, not unique) which assumes only the
values 0 and 00 such that fJ == fJo + v.
16. Let (X, M, fJ) be a measure space. A set E c X is c all ed locally measurable
if E n A E M for all A E M such that fJ(A) < 00. Let M be the collection of all
---- ----
locally measurable sets. Clearly M c M; if M = M, then fJ is called saturated.
a. If fJ is a-finite, then fJ is saturated.
b. M is a a-algebra.
c. Define jj on M by fJ (E) = fJ(E) if E E M and fJ (E) = 00 otherwise. Then
fJ is a saturated measure on M, called the saturation of fJ.
d. If fJ is complete, so is fJ .
e. Suppose that fJ is semifinite. For E E M, define fJ (E) = sup{fJ(A) : A E
M and AcE}. Then fJ is a saturated measure on M that extends fJ.
f. Let Xl, X 2 be disjoint uncountable sets, X = X I U X 2 , and M the a-algebra
of countable or co-countable sets in X. Let fJo be counting measure on P(X I ),
and define fJ on M by fJ(E) = fJo(E n Xl). Then fJ is a measure on M,
M = P( X), and in the notation of parts (c) and (e),- jj i- Jl .
1.4 OUTER MEASURES
In this section we develop the tools we shall use to construct measures. To motivate
the ideas, it may be useful to recall the procedure used in calculus to define the area
of a bounded region E in the plane 2 . One draws a grid of rectangles in the plane
and approximates the area of E from below by the sum of the areas of the rectangles
in the grid that are subsets of E, and from above by the sum of the areas of the
rectangles in the grid that intersect E. The limits of these approximations as the grid
is taken finer and finer give the "inner area" and "outer area" of E, and if they are
equal, their common value is the "area" of E. (We shall discuss these matters in more
detail in 32.6.) The key idea here is that of outer area, since if R is a large rectangle
containing E, the inner area of E is just the area of R minus the outer area of R \ E.
The abstract generalization of the notion of outer area is as follows. An outer
measure on a nonempty set X is a function j1* : P(X) [0,00] that satisfies
. fJ*(0) =0,
. fJ*(A) < j1*(B) if A c B,
· j1*(U Aj) < L j1*(A j ).
The most common way to obtain outer measures is to start with a family G of
"elementary sets" on which a notion of measure is defined (such as rectangles in the
plane) and then to approximate arbitrary sets "from the outside" by countable unions
of members of G. The precise construction is as follows.
OUTER MEASURES 29
1.10 Proposition. Let (, C P(X) and p : (, ---+ [0,00] be such that 0 E (" X E ("
and p(0) == O. For any A C X, define
00 00
j1*(A) = inf{Lj1(E j ) : Ej E £ and A c UE j }.
1 1
Then j..L * is an outer measure.
Proof. For any A C X there exists {E j }l C (, such that A C U Ej (take
Ej == X for all j) so the definition of j..L* makes sense. Obviously j..L* (0) == 0 (take
Ej == 0 for all j), and j..L* (A) < j..L* (B) for A C B because the set over which
the infimum is taken in the definition of j-l* (A) includes the corresponding set in the
definition of j-l* (B). To prove the countable subadditivity, suppose {A j }l c P(X)
and E > O. For each j there exists {EJ}l C (, such that Aj C U 1 Ej and
L=l p(EJ) < j..L*(A j ) + E2- j . But then if A == U Aj, we have A C U.rk=l Ej
and Lj,kP(Ej) < Ljj-l*(A j ) + E, whence j-l*(A) < Ljj..L*(A j ) + E. Since E is
arbitrary, we are done. .
The fundamental step that leads from outer measures to measures is as follows. If
j-l* is an outer measure on X, a set A C X is called J-L*-measurable if
j-l*(E) == j-l*(E n A) + j-l*(E n A C ) for all E C X.
Of course, the inequality j..L* (E) < j-l* (E n A) + j..L* (E n AC) holds for any A and E,
so to prove that A is j..L* -measurable, it suffices to prove the reverse inequality. The
latter is trivial if j..L*(E) == 00, so we see that A is j..L*-measurable iff
j..L*(E) > j..L*(E n A) + j..L*(E n A C ) for all E C X such that j..L*(E) < 00.
Some motivation for the notion of j..L* -measurability can be obtained by referring
to the discussion at the beginning of this section. If E is a "well-behaved" set such
that E :) A, the equation j..L*(E) == j-L*(E n A) + j..L*(E n AC) says that the outer
measure of A, j..L*(A), is equal to the "inner measure" of A, j..L*(E) - j..L*(E n AC).
The leap from "well-behaved" sets containing A to arbitrary subsets of X a large
one, but it is justified by the following theorem.
1.11 Caratheodory's Theorem. If j-l* is an outer measure on X, the collection ]v(
of J-L* -measurable sets is a (j-algebra, and the restriction of j..L* to ]v( is a complete
measure.
Proof. First, we observe that]v( is closed under complements since the definition
of j..L*-measurability of A is symmetric in A and AC. Next, if A, B E ]v( and E C X,
j..L* (E) == j-l* (E n A) + j-l* (E n A C)
== j..L*(E n A n B) + j-l*(E n An B C ) + j-l*(E n A C n B) + j..L*(E n A C n B C ).
But (A U B) == (A n B) U (A n BC) U (AC n B), so by subadditivity,
j..L*(E n A n B) + j..L*(E n A n B C ) + j-l*(E n A C n B) > j..L*(E n (A U B)),
30 MEASURES
and hence
{1*(E) > {1*(E n (A u B)) + {1*(E n (A u B)C).
It follows that A U B E M, so M is an algebra. Moreover, if A, B E M and
An B == 0,
{1*(A u B) == {1*((A u B) n A) + {1*((A u B) n A C ) == {1*(A) + {1*(B),
so j1 * is finitely additive on M.
To show that M is a a-algebra, it will suffice to show that M is closed under
countable disjoint unions. If {A j }l is a sequence of disjoint sets in M, let Bn ==
U Aj and B == U Aj. Then for any E c X,
j1*(E n Bn) == {1*(E n Bn n An) + j-L*(E n Bn n A)
== j1*(E n An) + {1*(E n B n - I ),
so a simple induction shows that {1* (E n Bn) == 2:: {1* (E n Aj). Therefore,
n
J-L* (E) == J-L* (E n Bn) + J-L* (E n B) > L {1* (E n Aj) + {1* (E n B C ),
1
and letting n 00 we obtain
00 00
J-l*(E) > LJ-l*(EnAj)+J-l*(EnB C ) > J-l*(U(EnA j )) +J-l*(EnBC)
1 1
== {1*(E n B) + J-L*(E n B C ) > J-L*(E).
All the inequalities in this last calculation are thus equalities. It follows that B E M
and - taking E == B - that J-L* (B) == 2:: {1* (Aj), so J-L* is countably additive on
M. Finally, if {1* (A) == 0, for any E c X we have
{1* (E) < {1* (E n A) + {1* (E n A C ) == {1* (E n A C ) < {1* (E),
so that A E M. Therefore {1 * I M is a complete measure.
.
Our first applications of Caratheodory's theorem will be in the context of extending
measures from algebras to a-algebras. More precisely, if A c P(X) is an algebra, a
function J-Lo : A [0,00] will be called a premeasure if
. J-Lo(0) == 0,
. if {A j }l is a sequence of disjoint sets in A such that U Aj E A, then
J-Lo(U Aj) == 2:: {1o(Aj).
In particular, a premeasure is finitely additive since one can take Aj == 0 for j large.
The notions of finite and a-finite premeasures are defined just as for measures. If {10
OUTER MEASURES 31
is a premeasure on A c P(X), it induces an outer measure on X in accordance with
Proposition 1.10, namely,
(1.12)
00 00
fL*(E) = inf{LfLo(A j ) : Aj E A, E c UA j }.
1 1
1.13 Proposition. If f.-la is a premeasure on A and f.-l * is defined by (1.12), then
a. f.-l * I A == f.-la;
b. every set in A is f.-l * measurable.
Proof. (a) Suppose E E A. If E c U Aj with Aj E A, let Bn == E n
(An \ U-I Aj). Then the Bn's are disjoint members of A whose union is E, so
f.-la(E) == L f.-la(Bj) < L f.-la(A j ). It follows that f.-la(E) < f.-l*(E), and the
reverse inequality is obvious since E c U Aj where Al == E and Aj == 0 for
j > 1.
(b) If A E A, E c X, and E > 0, there is a sequence {B j }l C A with
E c U Bj and L f.-la(B j ) < f.-l*(E) + E. Since f.-la is additive on A,
00
00
f.-l*(E) + E > L f.-la(B j n A) + L f.-la(B j n A C ) > f.-l*(E n A) + f.-l*(E n A C ).
1 1
Since E is arbitrary, A is f.-l * -measurable.
.
1.14 Theorem. Let A C P(X) be an algebra, f.-la a premeasure on A, and M the
a-algebra generated by A. There exists a measure f.-l on M whose restriction to A is
f.-la - namely, f.-l == f.-l* 1M where f.-l* is given by (1.12). Ifv is another measure on M
that extends f.-la, then v( E) < J-l( E) for all E E M, with equality when f.-l( E) < 00.
If f.-la is a-finite, then f.-l is the unique extension of f.-la to a measure on M.
Proof. The first assertion follows from Caratheodory's theorem and Proposition
1.13 since the a-algebra of f.-l* -measurable sets includes A and hence M. As for
the second assertion, if E E M and E C U Aj where Aj E A, then v(E) <
L v(Aj) == L f.-la(A j ), whence v(E) < f.-l(E). Also, if we setA == U Aj, we
have
n n
v(A) = } v(UA j ) = }fL(UAj) = fL(A).
1 1
If f.-l(E) < 00, we can choose the Aj's so that f.-l(A) < f.-l(E) + E, hence f.-l(A \ E) < E,
and
f.-l(E) < f.-l(A) == v(A) == v(E) + v(A \ E) < v(E) + f.-l(A \ E) < v(E) + E.
Since E is arbitrary, f.-l(E) == v(E). Finally, suppose X = U Aj with f.-la(Aj) < 00,
where we can assume that the Aj's are disjoint. Then for any E E M,
00
00
f.-l(E) == Lf.-l(EnA j ) == Lv(EnA j ) == v(E),
1 1
so v == f.-l.
.
32 MEASURES
The proof of this theorem yields more than the statement. Indeed, j-lo may be
extended to a measure on the algebra Jy(* of all j..L* -measurable sets. The relation
between Jy( and Jy(* is explored in Exercise 22 (along with Exercise 20b, which
ensures that the outer measures induced by j-lo and j..L are the same).
Exercises
17. If j-l * is an outer measure on X and {A j } 1 is a sequence of disjoint j..L *-
measurable sets, then j..L * (E n (U Aj)) == L j..L * (E n Aj) for any E eX.
18. Let A c P( X) be an algebra, Aa the collection of countable unions of sets
in A, and Aa8 the collection of countable intersections of sets in Aa. Let j..Lo be a
pre measure on A and j..L * the induced outer measure.
a. For any E c X and E > 0 there exists A E Aa with E c A and j..L*(A) <
j-l*(E) +E.
b. If j..L* (E) < 00, then E is j..L* -measurable iff there exists B E Aa8 with E c B
and j-l* (B \ E) == o.
c. If /-Lo is a-finite, the restriction j-l*(E) < 00 in (b) is superfluous.
19. Let j..L* be an outer measure on X induced from a finite premeasure po. If
E c X, define the inner measure of E to be j..L*(E) == j-lo(X) - j..L*(EC). Then E
is j-l* -measurable iff j..L* (E) == j..L* (E). (Use Exercise 18.)
20. Let j..L* be an outer measure on X, Jy(* the a-algebra of j..L* -measurable sets,
j-l == j..L* 1Jy(*, and j..L+ the outer measure induced by j-l as in (1.12) (with j..L and Jy(*
replacing j-lo and A).
a. If E c X, we have j-l*(E) < j..L+(E), with equality iff there exists A E Jy(*
with A :) E and j-l*(A) == j-l*(E).
b. If j..L* is induced from a premeasure, then j..L* == j..L+. (Use Exercise 18a.)
c. If X == {O, I}, there exists an outer measure j-l* on X such that j-l* =1= j-l+.
21. Let j..L* be an outer measure induced from a premeasure and j..L the restriction of
/-L* to the j..L* -measurable sets. Then j..L is saturated. (Use Exercise 18.)
22. Let (X, Jy(, j..L) be a measure space, j-l * the outer measure induced by j..L according
to (1.12), Jy(* the a-algebra of j-l*-measurable sets, and j-l == j..L*IJy(*.
a. If j..L is a-finite, then j-l is the completion of j..L. (Use Exercise 18.)
b. In general, j-l is the saturation of the completion of j..L. (See Exercises 16 and
21.)
23. Let A be the collection of finite unions of sets of the form (a, b] n Q where
-00 < a < b < 00.
a. A is an algebra on Q. (Use Proposition 1.7.)
b. The a-algebra generated by A is P(Q).
c. Define /-Lo on A by j-lo(0) == 0 and j..Lo(A) == 00 for A =1= 0. Then j..Lo is a
premeasure on A, and there is more than one measure on P(Q) whose restriction
to A is /-Lo.
BOREL MEASURES ON THE REAL LINE 33
24. Let j..L be a finite measure on (X, M), and let j-l* be the outer measure induced by
J-L. Suppose that E C X satisfiesj-l*(E) = J-L*(X) (butnotthatE EM).
a. If A, B E M and An E = B n E, then j..L(A) == j..L(B).
b. Let ME = {A n E : A E M}, and define the function l/ on ME defined by
l/(A n E) == j..L(A) (which makes sense by (a)). Then ME is a a-algebra on E
and l/ is a measure on ME.
1.5 BOREL MEASURES ON THE REAL LINE
We are now in a position to construct a definitive theory for measuring subsets of JR
based on the idea that the measure of an interval is its length. We begin with a more
general (but only slightly more complicated) construction that yields a large family
of measures on JR whose domain is the Borel a-algebra 13 JR ; such measures are called
Borel measures on JR.
To motivate the ideas, suppose that J-L is a finite Borel measure on JR, and let
F(x) == j..L(( -00, x]). (F is sometimes called the distribution function of j..L.)
Then F is increasing by Theorem I.8a and right continuous by Theorem I.8d since
( -00, x] == n ( -00, x n ] whenever X n x. (Recall the discussion of increasing
functions in 30.5.) Moreover, if b > a, (-00, b] = (-00, a] U (a, b], so j..L((a, b]) ==
F (b) - F ( a). Our procedure will be to turn this process around and construct a
measure J-L starting from an increasing, right-continuous function F. The special case
F( x) == x will yield the usual "length" measure.
The building blocks for our theory will be the left-open, right-closed intervals in
JR - that is, sets of the form (a, b] or (a, (0) or 0, where -00 < a < b < 00. In
this section we shall refer to such sets as h-intervals (h for "half-open"). Clearly the
intersection of two h-intervals is an h-interval, and the complement of an h-interval
is an h-interval or the disjoint union of two h-intervals. By Proposition 1.7, the
collection A of finite disjoint unions of h-intervals is an algebra, and by Proposition
1.2, the a-algebra generated by A is 13 JR .
1.15 Proposition. Let F : JR JR be increasing and right continuous. If (aj , b j ]
(j == 1, . . . , n) are disjoint h-intervals, let
n n
J-to(U(aj,b j ]) = 2)F(b j ) - F(aj)] ,
I I
and let j..Lo (0) == O. Then j..Lo is a premeasure on the algebra A.
Proof. First we must check that j..Lo is well defined, since elements of A can be
represented in more than one way as disjoint unions of h-intervals. If {( aj, b j ]} 1
are disjoint and U (aj, b j ] == (a, b], then, after perhaps relabeling the index j, we
must have a == al < b l == a2 < b 2 == ... < b n == b, so L[F(bj) - F(aj)] ==
F(b) - F(a). More generally, if {I i }l and {J j }l are finite sequences of disjoint
34 MEASURES
h-intervals such that U Ii == U J j , this reasoning shows that
L /-La ( Ii) == L /-La (Ii n J j ) == L /-La ( J j ) .
'l i ,j j
Thus /-La is well defined, and it is finitely additive by construction.
It remains to show that if {I j }l is a sequence of disjoint h-intervals with U Ij E
A then JLa(U Ij) = I: JLa(I j ). Since U Ij is a finite union of h-intervals, the
sequence {I j } 1 can be partitioned into finitely many subsequences such that the
union of the intervals in each subsequence is a single h-interval. By considering each
subsequence separately and using the finite additivity of JLa, we may assume that
U Ij is an h-interval I == (a, b]. In this case, we have
n n n n
/-La (I) = /-La (U 1 j ) + /-La (I \ U 1 j ) > /-La (U 1 j ) = L /-La (1j).
1 1 1 1
Letting n 00, we obtain JLa(I) > I: JL(Ij). To prove the reverse inequality,
let us suppose first that a and b are finite, and let us fix E > O. Since F is right
continuous, there exists 8 > 0 such that F (a + 8) - F ( a) <;: E, and if Ij = (aj, b j ],
for each j there exists 8j > 0 such that F(b j + 8j) - F(b j ) < E2- j . The open
intervals (aj, b j + 8j) cover the compact set [a + 8, b], so there is a finite subcover.
By discarding any (aj, b j + 8j) that is contained in a larger one and relabeling the
index j, we may assume that
. the intervals (aI, b 1 + 8 1 ), . . . , (aN, b N + 8 N) cover [a + 8, b],
. b j + 8j E (aj+l, b j + 1 + 8 j + 1 ) for j == 1,..., N - 1.
But then
JLa(I) < F(b) - F(a + 8) + E
< F(bN + 8 N ) - F(al) + E
N-l
== F(bN + 8N) - F(aN) + L [F(aj+l) - F(aj)] + E
1
N-l
< F(bN+8N)-F(aN)+ L[F(b)'+8 j )-F(aj)] +E
1
N
< L[F(b j ) + E2- j - F(aj)] + E
1
00
< LJL(I j ) + 2E.
1
Since E is arbitrary, we are done when a and b are finite. If a == -00, for any
M < 00 the intervals (aj b j + 8j) cover [-M, b], so the same reasoning gives
F(b) - F(-M) < I: /-La(Ij) + 2E, whereas if b == 00, for any M < 00 we
likewise obtain F (M) - F ( a) < I: /-La (Ij) + 2E. The desired result then follows
by letting E 0 and M 00. .
BOREL MEASURES ON THE REAL LINE 35
1.16 Theorem. If F : IR -7 IR is any increasing, right continuous function, there is
a unique Borel measure J-lF on IR such that J-lF((a, b]) == F(b) - F(a) for all a, b. If
G is another such function, we have J-L F == J-Lc iff F - G is constant. Conversely, if
J-L is a Borel measure on IR that is finite on all bounded Borel sets and we define
{ J-l((0, x])
F(x) = 0
-J-L(( -x, 0])
if x > 0,
if x = 0,
if x < 0,
then F is increasing and right continuous, and J-L = J-Lp.
Proof. Each F induces a premeasure on A by Proposition 1.15. It is clear that F
and G induce the same premeasure iff F - G is constant, and that these premeasures
are a-finite (since IR = UOO (j, j + 1 ]). The first two assertions therefore follow from
Theorem 1.14. As for the last one, the monotonicity of J-l implies the monotonicity
of F, and the continuity of J-L from above and below implies the right continuity of F
for x > 0 and x < O. It is evident that J-L == J-LF on A, and hence J-L == J-LF on 13 IR by
the uniqueness in Theorem 1.14. .
Several remarks are in order. First, this theory could equally well be developed
by using intervals of the form [ a, b) and left continuous functions F. Second, if
J-l is a finite Borel measure on IR, then J-l = J-lF where F(x) = J-l( (-00, x]) is the
cumulative distribution function of J-l; this differs from the F specified in Theorem
1.16 by the constant J-l( ( -00, 0]). Third, the theory of 31.4 gives, for each increasing
and right continuous F, not only the Borel measure J-LF but a complete measure J-l F
whose dOll1ain includes 13 IR . In fact, J-l F is just the completion of J-lF (Exercise 22a
or Theorem 1.19 below), and one can show that its domain is always strictly larger
than 13 IR . We shall usually denote this complete measure also by J-lF; it is called the
Lebesgue-Stieltjes measure associated to F.
Lebesgue-Stieltjes measures enjoy some useful regularity properties that we now
investigate. In this discussion we fix a complete Lebesgue-Stieltjes measure J-l on IR
associated to the increasing, right continuous function F, and we denote by MJ.L the
domain of J-l. Thus, for any E E MJ.L'
00 00
J-L(E) = inf {I)F(b j ) - F(aj)] : E c U(aj, b j ]}
1 1
00 00
= inf {L: J-L( (aj, b j ]) : E c U(aj, b j ]}.
1 1
We first observe that in the second formula for J-l(E) we can replace h-intervals by
open h-intervals:
1.17 Lemma. For any E E MJ.L'
00 00
J-L(E) = inf{L:J-L((aj,b j )): E C U(aj,b j )}.
1 1
36 MEASURES
Proof. Let us call the quantity on the right v(E). Suppose E c U (aj, b j ).
Each (aj, b j ) is a countable disjoint union of h-intervals Ij (k == 1, 2, . . .); specifi-
cally, Ij == (cj, cj+l] where {Cj} is any sequence such that C] == aj and cj increases
to b j as k 00. Thus E c U.rk=l Ij, so
00
00
LJ-L((aj,b j )) == L J-L(Ij) > J1(E),
1 j,k=l
and hence v( E) > J-L( E). On the other hand, given E > 0 there exists {( aj, b j ]} 1
with E c U(aj, b j ] and L J-L((aj, bj]) < J-L(E) + E, and for each j there exists
8j > 0 such that F(b j + 8j) - F(b j ) < E2- j . Then E c U(aj, b j + 8j) and
00
00
LJ1((aj, b j + 8 j )) < LJ-L((aj,b j ]) + E < J1(E) + 2E,
1 1
so that v(E) < J1(E).
.
1.18 Theorem. If E E MJL' then
J-L(E) == inf {J1(U) : U :) E and U is open}
== sup{J-L(K) : K c E and K is compact}.
Proof. By Lemma 1.17, for any E > 0 there exist intervals (aj,b j ) such that
E c U(aj, b j ) and J1(E) < L J-L((aj, bj)) + E. If U == U(aj, b j ) then U is
open, U :) E, and J-L(U) < J-L(E) + E. On the other hand, J1(U) > J1(E) whenever
U :) E, so the first equality is valid. For the second one, suppose first that E is
bounded. If E is closed, then E is compact and the equality is obvious. Otherwise,
given E > 0 we can choose an open U :) E \ E such that J1(U) < J-L( E \ E) + E. Let
K == E \ U. Then K is compact, K c E, and
J1(K) == J-L(E) - J-L(E n U) == J-L(E) - (f-L(U) - J-L(U \ E)]
> J-L(E) - J-L(U) + J-L( E \ E) > J-L(E) - E.
If E is unbounded, let Ej == E n (j, j + 1]. By the preceding argument, for
any E > 0 there exist compact Kj C Ej with J-L(Kj) > J1(Ej) - E2- j . Let
Hn == Un Kj. Then Hn is compact, Hn C E, and J-L(Hn) > J-L(Un Ej) - E.
Since J1(E) == limnoo J1(U:'n Ej), the result follows. .
1.19 Theorem. If E c IR, the following are equivalent.
a. E E MJL.
b. E == V \ N 1 where V is a C8 set and J-L(N 1 ) == o.
c. E == H U N 2 where H is an Fa set and J-L(N 2 ) == o.
BOREL MEASURES ON THE REAL LINE 37
Proof. Obviously (b) and (c) each imply (a) since J-l is complete on MIL. Suppose
E E MIL and J1(E) < 00. By Theorem 1.18, for j E N we can choose an open
U j :) E and a compact Kj C E such that
J-L(U j ) - 2- j < J1(E) < J-l(Kj) + 2- j .
Let V == n U j and H = U Kj. Then H C E c V and J-l(V) == J1(H)
J-l( E) < 00, so J-l(V \ E) == J1( E \ H) == O. The result is thus proved when
J-l(E) < 00; the extension to the general case is left to the reader (Exercise 25). .
The significance of Theorem 1.19 is that all Borel sets (or, more generally, all sets
in MIL) are of a reasonably simple form modulo sets of measure zero. This contrasts
markedly with the machinations necessary to construct the Borel sets from the open
sets when null sets are not excepted; see Proposition 1.23 below. Another version
of the idea that general measurable sets can be approximated by "simple" sets is
contained in the following proposition, whose proof is left to the reader (Exercise
26):
1.20 Proposition. If E E MIL and J1( E) < 00, then for every E > 0 there is a set A
that is ajinite union of open intervals such that J1(ED.A) < E.
We now examine the most important measure on JR, namely, Lebesgue measure:
This is the complete measure J-lF associated to the function F( x) == x, for which the
measure of an interval is simply its length. We shall denote it by m. The domain of
m is called the class of Lebesgue measurable sets, and we shall denote it by /:.;. We
shall also refer to the restriction of m to 13jR as Lebesgue measure.
Among the most significant properties of Lebesgue measure are its invariance
under translations and simple behavior under dilations. If E c JR and s, r E JR, we
define
E + s = {x + s : x E E},
rE={rx:xEE}.
1.21 Theorem. If E E /:.;, then E + s E /:.; and r E E /:.; for all s, r E JR. Moreover,
m(E + s) = m(E) and m(rE) == Irlm(E).
Proof. Since the collection of open intervals is invariant under translations and
dilations, the same is true of 13jR. For E E 13jR, let ms(E) == m(E + s) and
mT(E) == m(rE). Then ms and m T clearly agree with m and Irlm on finite unions
of intervals, hence on 13jR by Theorem 1.14. In particular, if E E 13jR and m( E) = 0,
then m(E + s) = m(r E) == 0, from which it follows that the class of sets of
Lebesgue measure zero is preserved by translations and dilations. It follows that /:.;
(the members of which are a union of a Borel set and a Lebesgue null set) is preserved
by translation and dilations and that m(E + s) == m(E) and m(rE) == Irlm(E) for
all E E /:.;. .
The relation between the measure-theoretic and topological properties of subsets
of R is delicate and contains some surprises. Consider the following facts. Every
singleton set in JR has Lebesgue measure zero, and hence so does every countable
38 MEASURES
set. In particular, m(Q) == O. Let {rj}l be an enumeration of the rational numbers
in [0, 1], and given E > 0, let Ij be the interval centered at rj of length E2- j. Then
the set U == (0,1) n U Ij is open and dense in [0,1], but m(U) < I: E2- j == E;
its complement K == [0,1] \ U is closed and nowhere dense, but m(K) > 1 - E.
Thus a set that is open and dense, and hence topologically "large," can be measure-
theoretically small, and a set that is nowhere dense, and hence topologically "small,"
can be measure-theoretically large. (A nonempty open set cannot have Lebesgue
measure zero, however.)
The Lebesgue null sets include not only all countable sets but many sets having
the cardinality of the continuum. We now present the standard example, the Cantor
set, which is also of interest for other reasons.
Each x E [0,1] has a base-3 decimal expansion x == I: a j 3- j where aj == 0, 1,
or 2. This expansion is unique unless x is of the form p3 -k for some integers p, k, in
which case x has two expansions: one with aj == 0 for j > k and one with aj == 2 for
j > k. Assuming p is not divisible by 3, one of these expansions will have ak == 1
and the other will have ak == 0 or 2. If we agree always to use the latter expansion,
we see that
1 . ff 1 2
al == 1 3" < x < 3"'
-I- 1 d 1 . ff 1 2 7 8
al I an a2 == 1 "9 < x < "9 or "9 < x < "9'
and so forth. It will also be useful to observe that if x == I: a j 3- j and y == I: b j 3- j ,
then x < y iff there exists an n such that an = b n and aj == b j for j < n.
The Cantor set C is the set of all x E [0, 1] that have a base-3 expansion
x == I: a j 3- j with aj i= 1 for all j. Thus C is obtained from [0,1] by removing the
open middle third (1, ), then removing the open middle thirds (, ) and (, ) of
the two remaining intervals, and so forth. The basic properties of C are summarized
as follows:
1.22 Proposition. Let C be the Cantor set.
a. C is compact, nowhere dense, and totally disconnected (i.e., the only connected
subsets of C are single points). Moreover, C has no isolated points.
b. rn( C) == O.
c. card (C) == c.
Proof. We leave the proof of (a) to the reader (Exercise 27). As for (b), C is
obtained from [0, 1] by removing one interval of length 1, two intervals of length ,
and so forth. Thus
00 2 j 1 1
m(C) = 1 - 3j+1 = 1 - "3 · 1 _ (2/3) = o.
Lastly, suppose x E C, so that x == I: a j 3- j where aj == 0 or 2 for all j.
Let f(x) == I: b j 2- j where b j == aj/2. The series defining f(x) is the base-2
expansion of a number in [0,1], and any number in [0,1] can be obtained in this way.
Hence f maps C onto [0,1], and (c) follows. .
BOREL MEASURES ON THE REAL LINE 39
Let us examine the map f in the preceding proof more closely. One readily sees
that if x, y E C and x < y, then f(x) < f(y) unless x and yare the two endpoints
of one of the intervals removed from [0,1] to obtain C. In this case f(x) = p2- k for
some integers p, k, and f (x) and f (y) are the two base-2 expansions of this number.
We can therefore extend f to a map from [0,1] to itself by declaring it to be constant
on each interval missing from C. This extended f is still increasing, and since its
range is all of [0,1] it cannot have any jump discontinuities; hence it is continuous.
f is called the Cantor function or Cantor-Lebesgue function.
The construction of the Cantor set by starting with [0,1] and successively removing
open middle thirds of intervals has an obvious generalization. If I is a bounded
interval and a E (0,1), let us call the open interval with the same midpoint as I and
length equal to a times the length of I the "open middle ath" of I. If {aj}r is
any sequence of numbers in (0, 1), then, we can define a decreasing sequence {K j }
of closed sets as follows: Ko == [0,1], and Kj is obtained by removing the open
middle ajth from each of the intervals that make up K j - 1 . The resulting limiting
set K = n Kj is called a generalized Cantor set. Generalized Cantor sets all
share with the ordinary Cantor set the properties (a) and (c) in Proposition 1.22. As
for their Lebesgue measure, clearly m(Kj) == (1 - aj)m(Kj-l), so m(K) is the
infinite product Il (1- aj) = lim n -+ oo Il (1- aj). If the aj are all equal to a fixed
a E (0,1) (for example, a == for the ordinary Cantor set), we have m(K) == O.
However, if aj --7 0 sufficiently rapidly as j --7 00, m(K) will be positive, and for
any (3 E (0,1) one can choose aj so that m(K) will equal (3; see Exercise 32. This
gives another way of constructing nowhere dense sets of positive measure.
Not every Lebesgue measurable set is a Borel set. One can display examples of
sets in ,c \ 'B R by using the Cantor function; see Exercise 9 in Chapter 2. Alternatively,
one can observe that since every subset of the Cantor set is Lebesgue measurable, we
have card('c) == card(P()) > c, whereas card('BJR) == c. The latter fact follows
from Proposition 1.23 below.
Exercises
25. Complete the proof of Theorem 1.19.
26. Prove Proposition 1.20. (Use Theorem 1.18.)
27. Prove Proposition 1.22a. (Show that if x, y E C and x < y, there exists z tt C
such that x < z < y.)
28. Let F be increasing and right continuous, and let J..L F be the associated measure.
Then J..LF({a}) == F(a) - F(a-), J..LF([a,b)) == F(b-) - F(a-), J..LF([a,b])
F ( b) - F ( a - ), and J..L F ( ( a, b)) == F (b-) - F ( a) .
29. Let E be a Lebesgue measurable set.
a. If E c N where N is the nonmeasurable set described in 31.1, then m(E) =
o.
b. If m( E) > 0, then E contains a nonmeasurable set. (It suffices to assume
E C [0,1]. In the notation of 31.1, E = UrER E n N r .)
40 MEASURES
30. If E E /:.; and m(E) > 0, for any a < 1 there is an open interval I such that
m(E n I) > am(I).
31. If E E /:.; and m(E) > 0, the set E - E == {x - y : x, y E E} contains an
interval centered at o. (If I is as in Exercise 30 with a > , then E - E contains
(-m(I), m(I)).)
32. Suppose {aj}l C (0,1).
a. I1(1 - aj) > 0 iff E aj < 00. (Compare E 10g(1 - aj) to E aj.)
b. Given {3 E (0,1), exhibit a sequence {aj} such that I1 (1 - aj) == {3.
33. There exists a Borel set A c [0,1] such that 0 < m(A n I) < m(I) for every
subinterval I of [0, 1]. (Hint: Every subinterval of [0, 1] contains Cantor-ty'pe sets of
positive measure.)
1.6 NOTES AND REFERENCES
The history of Ineasure theory is intimately connected with the history of integration
theory, comments on which will be made in 32.7.
31.1: The Banach- Tarski paradox appeared first in [11], but the following variant
goes back to Hausdorff [68]:
The unit sphere in 3, {x E 3 : Ixl = I}, is the disjoint union of four sets
E 1 , . . . , E4 such that (a) E 1 is countable and (b) the sets E 2 , E 3 , E 4 , and
E3 U E4 are all images of each other under rotations.
An elementary exposition of the Banach- Tarski paradox and Hausdorff's result can
be found in Stromberg [146].
31.2: Our characterization of the a-algebra M( £) generated by a family £ c
P( X) is nonconstructive, and one might ask how to obtain M( £) explicitly from £.
The answer is rather complicated. One can begin as follows: Let £ 1 == £ U {EC :
E E £}, and for j > 1 define £ j to be the collection of all sets that are countable
unions of sets in £ j -1 or complements of such. Let £w == U £ j: is £w = M( £)?
In general, no. £w is closed under complements, but if Ej E £j \ £j-1 for each j,
there is no reason for U Ej to be in £w. So one must start all over again. More
precisely, one must define £a for every countable ordinal a by transfinite induction:
If a has an immediate predecesor {3, £a is the collection of sets that are countable
unions of sets in £/3 or complements of such; otherwise, £a = U/3<a £/3. Then:
1.23 Proposition. M(E) = UaEf1 £a, where n is the set of countable ordinals.
Proof. Transfinite induction shows that £a C M(E) for all a E n, and hence
UaEf1 £a C M(E). The reverse inclusion follows from the fact that any sequence in
n has a supremum in n (Proposition 0.19): If Ej E £aj for j E Nand /3 = sup{ aj},
then Ej E £a for all j and hence U Ej E £/3 where {3 is the successor of a. .
NOTES AND REFERENCES 41
Combining this with Proposition 0.14, we see that if card(N) < card( £) < c,
then card(M( £)) = c. (Cf. Exercise 3.)
31.3: Some authors prefer to take the domains of measures to be a-rings rather
than a-algebras (see Exercise 1). The reason is that in dealing with "very large"
spaces one can avoid certain pathologies by not attempting to measure "very large"
sets. However, this point of view also has technical disadvantages, and it is no longer
much in favor.
81.4: Caratheodory's theorem appears in his treatise [22]. Theorem 1.14 has
been attributed in the literature to Hahn, Caratheodory, and E. Hopf, but it is orig-
inally due to Frechet [54]. The proof via Caratheodory's theorem was discovered
independently by Hahn [60] and Kolmogorov [85].
See Konig [86] for a deeper study of the problem of constructing measures from
more primitive data.
31.5: Lebesgue originally defined the outer measure m * (E) of a set E c in
terms of countable coverings by intervals, as we have done. He then defined a bounded
set E to be measurable if m * (E) + m * ( ( a, b) \ E) == b - a, where (a, b) is an interval
containing E, and an unbounded set to be measurable if its intersection with any
bounded interval is measurable. Caratheodory's characterization of measurability,
which is technically eaiser to work with, came later. For the equivalence of the two
definitions, see Exercise 19.
One should convince oneself that the remarkably fussy proof of Proposition 1.15
is necessary by contemplating the complicated ways in which an h-interval can be
decomposed into a disjoint union of h-subintervals. In any such decomposition the
collection of right endpoints of the subintervals, when ordered from right to left, is
a well ordered set, but it can be order isomorphic to any initial segment of the set of
countable ordinals.
Lebesgue measure can be extended to a translation-invariant measure on a-
algebras that properly include ,c; see Kakutani and Oxtoby [81]. Of course, such
a -algebras can never contain the nonmeasurable set discussed in 31. However,
Lebesgue measure can be extended to a translation-invariant finitely additive mea-
sure on P'(), and its 2-dimensional analogue (see 32.6) can be extended to a finitely
additive measure on P(2) that is invariant under translations and rotations; see
Banach [8]. The Banach-Tarski paradox prevents this result from being extended to
higher dimensions.
In connection with the existence of non measurable sets, Solovay [138] has proved
a remarkable theorem which says in effect that it is impossible to prove the existence
of Lebesgue nonmeasurable sets without using the axiom of choice. (The precise
statement of the theorem involves to technical points of axiomatic set theory, which
we shall not discuss here.) From the point of view of the working analyst, the effect of
Solovay's theorem is to reaffirm the adequacy of the Lebesgue theory for all practical
purposes.
See Rudin [124] for a terse solution of Exercise 33.
Integration
In the classical theory of integration on , J: f(x) dx is defined as a limit of Rie-
mann sums, which are integrals of functions that approximate f and are constant on
subintervals of [a, b]. Similarly, on any measure space there is an obvious notion
of integral for functions that are, in a suitable sense, locally constant, and it can be
extended to an integral for more general functions. In this chapter, we develop the
theory of integration on abstract measure spaces, paying particular attention to the
Lebesgue integral on IR and its generalization to n .
2.1 MEASURABLE FUNCTIONS
We begin our study of integration theory with a discussion of measurable mappings,
which are the morphisms in the category of measurable spaces.
We recall that any mapping f : X -4 Y between two sets induces a mapping
f-l : P(Y) -4 P(X), defined by f-l(E) == {x EX: f(x) E E}, which
preserves unions, intersections, and complements. Thus, if N is a a-algebra on Y,
{f-l(E) : E E N} is a a-algebra on X. If (X,M) and (Y,N) are measurable
spaces, a mapping f : X -4 Y is called (M, N)-measurable, or just measurable
when M and N are understood, if f-l(E) E M for all E E N.
It is obvious that the composition of measurable mappings is measurable; that is,
if f : X -4 Y is (M, N)-measurable and 9 : Y -4 Z is (N,O)-measurable, then
9 0 f is (M, O)-measurable.
2.1 Proposition. IfN is generated by t., then f : X -4 Y is (M, N)-measurable iff
f-l(E) E Mforall E E t..
43
44 INTEGRATION
Proof. The "only if" implication is trivial. For the converse, observe that {E c
Y : f-l (E) E M} is a a-algebra that contains £; it therefore contains N. .
2.2 Corollary. If X and Yare metric (or topological) spaces, every continuous
f : X ---+ Y is ('B x, 'By )-measurable.
Proof. f is continuous iff f-l (U) is open in X for every open U C Y. .
If (X, M) is a measurable space, a real- or complex - valued function f on X will be
called M-measurable, or just measurable, if it is (M, 'BR) or (M, 'Be) measurable.
'B R or 'Be is always understood as the a-algebra on the range space unless otherwise
specified. In particular, f : ---+ <C is Lebesgue (resp. Borel) measurable if it is
(1.:, 'Be) (resp. ('BR, 'Be)) measurable; likewise for f : ---+ .
Warning: If f, 9 : ---+ are Lebesgue measurable, it does not follow that fog
is Lebesgue measurable, even if 9 is assumed continuous. (If E E 'B R we have
f-l(E) E ,c, but unless f-l(E) E 'BR there is no guarantee that g-l(f-l(E)) will
be in 'c. See Exercise 9.) However, if f is Borel measurable, then fog is Lebesgue
or Borel measurable whenever 9 is.
2.3 Proposition. If (X, M) is a measurable space and f : X ---+ , the following
are equivalent:
a. f is M-measurable.
b. f-l((a, (0)) E Mforall a E .
c. f-l((a, (0)) E Mfor all a E .
d. f-l((-oo,a))EMforallaE.
e. f-l((-oo,aJ) E Mforall a E.
Proof. This follows from Propositions 1.2 and 2.1. .
Sometimes we wish to consider measurability on subsets of X. If (X, M) is a
measurable space, f is a function on X, and E E M, we say that f is measurable on
E if f-l (B) nEE M for all Borel sets B. (Equivalently, fiE is ME-measurable,
where ME == {F n E : F EM}.)
Given a set X, if {(Y a , N a )} aEA is a family of measurable spaces, and f :
X ---+ Y a is a map for each a E A, there is a unique smallest a-algebra on X with
respect to which the fa's are all measurable, namely, the a-algebra generated by the
sets f;;l(Ea) with Ea E N a and ex E A. It is called the a-algebra generated by
{fa}aEA. In particular, if X == IlaEA Y a , we see that the product a-algebra on X,
as defined in 31.2, is the a -algebra generated by the coordinate maps 1r a : X ---+ Ya.
2.4 Proposition. Let (X,M) and (Ya,N a ) (a E A) be measurable spaces, Y =
IlaEA Y a , N == Q9aEA N a , and 1ra : Y ---+ Y a the coordinate maps. Then f : X ---+
Y is (M, N)-measurable iff fa = 1ra 0 f is (M, Na)-measurablefor all a.
Proof. If f is measurable, so is each fa since the composition of measurable
maps is measurable. Conversely, if each fa is measurable, then for all Ea E N a ,
f-l(1rl(Ea)) = f;;I(Ea) E M, whence f is measurable by Proposition 2.1. .
MEASURABLE FUNCTIONS 45
2.5 Corollary. A function f X <C is M-measurable iff Re f and 1m fare
M-measurable.
Proof. This follows since 'Be == 'B R 2 == 'BR @ 'B R by Proposition 1.5. .
It is sometimes convenient to consider functions with values in the extended real
number system ffi. == [00, 00]. We define Borel sets in by 'B R = {E c ffi. :
E n ffi. E 'BR}. (This coincides with the usual definition of the Borel a-algebra
if we make ffi. into a metric space with metric p(x,y) == IA(x) - A(y)l, where
A(x) == arctan x.) It is easily verified as in Proposition 2.3 that 'B R is generated by
the rays (a, 00] or [-00, a) (a E ffi.), and we define f : X -4 to be M-measurable
if it is (M, 'B R )-measurable. See Exercise 1.
We now establish that measurability is preserved under the familiar algebraic and
limiting operations.
2.6 Proposition. If f, 9 : X -4 <C are M-measurable, then so are I + 9 and I g.
Proof. Define F : X -4 <C x <C, cjJ : <C x <C -4 <C, and 'ljJ : <C x <C -4 <C by
F(x) == (/(x), g(x)), cjJ(z, w) == z + w, 'ljJ(z, w) == zw. Since 'Bexe == 'Be 0 'Be by
Proposition 1.5, F is (M, 'Bexe)-measurable by Proposition 2.4, whereas cjJ and 'ljJ
are (<Cexe, 'Be)-measurable by Corollary 2.2. Thus f + 9 = cjJ 0 F and fg = 'ljJ 0 F
are M-measurable. .
Proposition 2.6 remains valid for ffi.-valued functions provided one takes a little
care with the indeterminate expressions 00 - 00 and 0 . 00. (Recall, however, that by
convention we always define 0 . 00 to be 0.) See Exercise 2.
2.7 Proposition. If {/j } is a sequence of - valued measurable functions on (X, M),
then the functions
gl (x) == sP Ij (x),
J
g2(X) == if fj(x),
J
g3(X) = lisup Ij(x),
J --+ 00
g4(X) == liinf fj(x)
J --+ 00
are all measurable. If f(x) = limj--+oo f(x) exists for every x E X, then I is
measurable.
Proof. We have
00
9 11 ( ( a, 00]) == U f j- 1 ( ( a, 00]),
1
00
g2 1 ( [ - 00, a)) = U f j - 1 ( [ - 00, a) ) ,
1
SO gl and g2 are measurable by Proposition 2.3. More generally, if h k (x)
sUPj>kfj(x) then h k is measurable for each k, so 93 == infkh k is measurable,
and likewise for g4. Finally, if I exists then I = g3 == g4, so I is measurable. .
46 INTEGRATION
2.8 Corollary. If f, 9 X -4 are measurable, then so are max(f, g) and
min(f, g).
2.9 Corollary. If {fj} is a sequence of complex-valued measurable functions and
f(x) == limjoo fj (x) exists for all x, then f is measurable.
Proof. Apply Corollary 2.5.
.
For future reference we present two useful decompositions of functions. First, if
f : X -4 , we define the positive and negative parts of f to be
f+(x) == max(f(x), 0),
f- (x) == max( - f(x), 0).
Then f == f+ - f-. If f is measurable, so are f+ and f- , by Corollary 2.8. Second,
if f : X -4 C, we have its polar decomposition:
f = (sgnf)lfl,
where
sgn z = { /izi
if Z =1= 0,
if Z = o.
Again, if f is measurable, so are I f I and sgn f. Indeed, Z I Z I is continuous on
C, and Z sgn Z is continuous except at the origin. If U C C is open, sgn -1 (U)
is either open or of the form V U {o} where V is open, so sgn is Borel measurable.
Therefore I f I == I . I 0 f and sgn f == sgn 0 f are measurable.
We now discuss the functions that are the building blocks for the theory of inte-
gration. Suppose that (X, M) is a measurable space. If E eX, the characteristic
function XE of E (sometimes called the indicator function of E and denoted by
IE) is defined by
( ) { I if x E E,
XE x = 0 if x tt E.
It is easily checked that XE is measurable iff E E M. A simple function on X is a
finite linear combination, with complex coefficients, of characteristic functions of sets
in M. (We do not allow simple functions to assume the values ::i:oo.) Equivalently,
f : X -4 C is simple iff f is measurable and the range of f is a finite subset of C.
Indeed, we have
n
f == L ZjXE j , where Ej == f-1 ({ Zj}) and range(f) == {Zl, . . . , Zn}.
1
We call this the standard representation of f. It exhibits f as a linear combination,
with distinct coefficients, of characteristic functions of disjoint sets whose union is
X. Note: One of the coefficients Zj may well be 0, but the term ZjXE j is still to be
envisioned as part of the standard representation, as the set Ej may have a role to
play when f interacts with other functions.
It is clear that if f and 9 are simple functions, then so are f + 9 and f g. We
now show that arbitrary measurable functions can be approximated in a nice way by
simple functions.
MEASURABLE FUNCTIONS 47
2.10 Theorem. Let (X, M) be a measurable space.
a. If I : X -4 [0, 00] is measurable, there is a sequence { cPn} of simple functions
such that 0 < 4Jl < cP2 < . . . < I, 4Jn -4 I pointwise, and 4Jn -4 I uniformly
on any set on which I is bounded.
b. If I : X ---4 <C is measurable, there is a sequence { cPn} of simple functions such
that 0 < l4Jll < 1 cP21 < . .. < I I I, cPn -4 I pointwise, and cPn -4 I uniformly
on any set on which I is bounded.
Proof. (a) For n == 0, 1, 2, . . . and 0 < k < 2 2n - 1, let
E == 1-1 ((k2- n , (k + 1)2- n ]) and Fn = 1-1 ((2 n , 00]),
and define
2 2n _l
4Jn == L k2-nXE + 2 n XF n .
k=O
(This formula is messy in print but easily understood graphically; see Figure 2.1.) It
is easily checked that cPn < cPn+l for all n, and 0 < I - cPn < 2- n on the set where
I < 2 n . The result therefore follows.
(b) If I = 9 + ih, we can apply part (a) to the positive and negative parts of 9
and h, obtaining sequences 7/J;;, 7/J:;;, (;t, (;; of nonnegative simple functions that
increase to g+, g-, h+, h-. Let 4Jn == 7/J;; - 7/J:;; + i((;t - (;;); it is then a simple
exercise to verify that cPn has the desired properties. .
If j.-l is a measure on (X, M), we may wish to except j.-l-null sets from consideration
in studying measurable functions. In this respect, life is a bit simpler if j.-l is complete.
2.11 Proposition. Thefollowing implications are valid iff the measure j.-l is complete:
a. If I is measurable and I == 9 j.-l-a.e., then 9 is measurable.
b. If In is measurable for n E N and In ---4 I j.-l-a.e., then I is measurable.
The proof is left to the reader (Exercise 10).
On the other hand, the following result shows that one is unlikely to commit any
serious blunders by forgetting to worry about completeness of the measure.
I
I
I
I
. . . . . . . . . . . . . . . .
I
I.....
1.... .... ....
I
I
Fig. 2.1 The functions cPo (left) and cPl (right) in the proof of Theorem 2.1 Ga.
48 INTEGRATION
2.12 Proposition. Let (X, JY(, J.-l) be a measure space and let (X, JY( , J.-l ) be its com-
pletion. If f is an M -measurable function on X, there is an M-measurable function
9 such that f == 9 J.-l -almost everywhere.
Proof. This is obvious from the definition of J.-l if f = XE where E E M ,
and hence if f is an M-measurable simple function. For the general case, choose
a sequence {<Pn} of M -measurable simple functions that converge pointwise to f
according to Theorem 2.10, and for each n let 'l/Jn be an M-measurable simple
function with 'l/Jn == <Pn except on a set En E M with J.-l (En) == o. Choose N E M
such that J.-l(N) == 0 and N =:) U En' and set 9 = lirn XX\N'l/Jn. Then 9 is
M-measurable by Corollary 2.9, and 9 == f on NC. .
Exercises
In Exercises 1-7, (X, M) is a measurable space.
1. Let f : X -7 IR and Y == f-l (IR). Then f is measurable iff f-l ({ -oo}) E M,
f-l( {oo}) E M, and f is measurable on Y.
2. Suppose f, 9 : X -7 are measurable.
a. f 9 is measurable (where 0 . (::i:oo) = 0).
b. Fix a E IR and define h(x) == a if f(x) = -g(x) = ::i:oo and h(x)
f (x) + 9 (x) otherwise. Then h is measurable.
3. If {f n} is a sequence of measurable functions on X, then {x : lirn f n (x) exists}
is a measurable set.
4. If f : X IR and f-l((r, 00]) E M for each r E Q, then f is measurable.
5. If X == A u B where A, B E M, a function f on X is measurable iff f is
measurable on A and on B.
6. The supremum of an uncountable family of measurable IR -valued functions on
X can fail to be measurable (unless the a-algebra M is very special).
7. Suppose that for each ex E we are given a set Ea E M such that Ea C EfJ
whenever ex < (3, UaER Ea == X, and naER Ea == 0. Then there is a measurable
function f : X IR such that f(x) < ex on Ea and f(x) > ex on E for every ex.
(Use Exercise 4.)
8. If f : is monotone, then f is Borel measurable.
9. Let f : [0, 1] [0,1] be the Cantor function (31.5), and let g(x) = f(x) + x.
a. 9 is a bijection from [0, 1] to [0, 2], and h == g-l is continuous from [0,2] to
[0,1] .
b. If C is the Cantor set, m(g( C)) = 1.
c. By Exercise 29 of Chapter 1, g( C) contains a Lebesgue non measurable set
A. Let B == g-l (A). Then B is Lebesgue measurable but not Borel.
INTEGRATION OF NONNEGATIVE FUNCTIONS 49
d. There exist a Lebesgue measurable function F and a continuous function G
on IR such that FoG is not Lebesgue measurable.
10. Prove Proposition 2.11.
11. Suppose that f is a function on IR x IRk such that f(x, .) is Borel measurable
for each x E IR and f (., y) is continuous for each y E IRk. For n E N, define f n as
follows. For i E IE let ai == i / n, and for ai < x < ai+ 1 let
f ( ) - f(ai+l, y)(x - ai) - f(ai, y)(x - ai+l)
n x, y - .
ai+l - ai
Then fn is Borel measurable on IR x IRk and fn -+ f pointwise; hence f is Borel
measurable on IR x IRk. Conclude by induction that every function on IRn that is
continuous in each variable separately is Borel measurable.
2.2 INTEGRATION OF NONNEGATIVE FUNCTIONS
In this section we fix a measure space (X, M, J.-l), and we define
L + = the space of all measurable functions from X to [0, 00].
If 1> is a simple function in L + with standard representation 1> = L: aj XE j , we
define the integral of 1> with respect to J-L by
n
J cPdp,= Lajp,(E j )
1
(with the convention, as always, that 0 . 00 == 0). We note that J 1> dJ.-l may equal 00.
When there is no danger of confusion, we shall also write J 1> for J 1> dJ.-l. Also, it is
sometimes convenient to display the argument of 1> explicitly, especially when 1>( x) is
given by a formula in terms of x or when there are other variables involved; in this case
we shall use the notation J 1>( x) dJ-L( x). (Some authors prefer to write f 1>( x) J-L( dx)
instead.) Finally, if A E M, then 1>XA is also simple (viz., 1>XA == L: ajXAnEj)' and
we define J A 1> dJ.-l (or J A 1> or J A 1>( x) dJ-L( x)) to be J 1>XA dJ-L. The same notational
conventions will also apply to the inegrals of more general functions to be defined
below. To summarize:
i cPdp,= i cP= i cP(x)dp,(x) = J cPXAdp"
J= fx.
2.13 Proposition. Let 1> and 'ljJ be simple functions in L +.
a. If c > 0, J c1> = c J 1>.
b. J(1)+)==f1>+J.
c. If 1> < 'ljJ, then J 1> < J .
d. The map A 1---* J A dJ-L is a measure on M.
50 INTEGRATION
Proof. (a) is trivial. For (b), let L: ajXEj and L: bkXF k be the standard
representations of 1> and 'l/;. Then Ej == Ul (Ej n F k ) and Fk == U(Ej n F k )
since U Ej == U Fk == X, and these unions are disjoint. Hence the finite additivity
of J-l implies that
J cjJ + J 1/; = 2)aj + bk)J-l(Ej n Fk),
J,k
and the same reasoning show that the sum on the right equals I ( 1> + 'l/;). Moreover,
if 1> < 'l/J, then aj < b k whenever Ej n Fk =1= 0, so
J cjJ = L ajJ-l(Ej n F k ) < L bkJ-l(E j n F k ) = J 1/;,
j,k j,k
which proves (c). Finally, if {Ak} is a disjoint sequence in Jv( and A == U A k ,
r cjJ = I>jJ-l(A n E j ) = I>jJ-l(Ak n Ej) = L r cjJ,
} A. . k k } Ak
J J,
which establishes (d).
We now extend the integral to all functions I E L + by defining
.
J f dJ-l = sup {J cjJdJ-l : 0 < cjJ < f, cjJ SimPle} .
By Proposition 2.13c, the two definitions of I I agree when I is simple, as
the family of simple functions over which the supremum is taken includes I itself.
Moreover, it is obvious from the definition that
J f < J g whenever f < g, and J cf = c J f for all c E [0,00).
The next step is to establish one of the fundamental convergence theorems.
2.14 The Monotone Convergence Theorem. If {In} is a sequence in L+ such that
Ij < Ij+1 for all j, and I == lirn n --+ oo In (== SUPn In), then I I == lirn n --+ oo I In.
Proof. {I In} is an increasing sequence of numbers, so its limit exists (possibly
equal to 00). Moreover, I fn < I I for all n, so lirnI In < I I. To establish the
reverse inequality, fix a E (0,1), let 1> be a simple function with 0 < 1> < I, and let
En == {X : In (X) > a1>( X ) }. Then {En} is an increasing sequence of measurable
sets whose union is X, and we have J In > IEn In > a IEn 1>. By Proposition
2.13d and Theorem 1.8c, lirn JE n 1> == J 1>, and hence lirn J In > a J 1>. Since this
is true for all a < 1, it remains true for a == 1, and taking the supremum over all
simple 1> < I, we obtain lirn J In > I I. .
INTEGRATION OF NONNEGATIVE FUNCTIONS 51
The monotone convergence theorem is an essential tool in many situations, but
its immediate significance for us is as follows. The definition of J I involves the
supremum over a huge (usually uncountable) family of simple functions, so it may
be difficult to evaluate J I directly from the definition. The monotone convergence
theorem, however, assures us that to compute J I it is enough to compute lim J 1>n
where {1>n} is any sequence of simple functions that increase to I, and Theorem
2.10 guarantees that such sequences exist. As a first application, we establish the
additivity of the integral.
2.15 Theorem. If {In} is afinite or infinite sequence in L+ and I == Ln In, then
J f == Ln J In.
Proof. First consider two functions 11 and 12. By Theorem 2.10 we can find
sequences {<pj} and {'ljJj} of nonnegative simple functions that increase to 11 and 12.
Then {1>j + 'ljJj} increases to 11 + 12, so by the monotone convergence theorem and
Theorem 2.13b,
jU1 + h) = Hm j(<jJj + V;j) = Hm j <jJj + lim j V;j = j h + j h
Hence, by induction, J L fn == L J In for any finite N. Letting N -+ 00
and applying the monotone convergence theorem again, we obtain J L In
L J fn. .
2.16 Proposition. If I E L+, then J I == 0 iff I == 0 a.e.
Proof. This is obvious if I is simple: if f == L aj XEj with aj > 0, then
J I == 0 iff for each j either aj == 0 or J-l( Ej) == O. In general, if I == 0 a.e. and 1>
is simple with 0 < 1> < I, then 1> == 0 a.e., hence J f == sUP4>f J 1> == O. On the
other hand, {x : f (x) > O} == U En where En == {X : I (X) > n -1 }, SO if it is
false that I == 0 a.e., we must have J-l( En) > 0 for some n. But then I > n -1 X En'
so J f > n- 1 J-l(En) > O. .
2.17 Corollary. If {In} C L+, I E L+, and fn(x) increases to f(x) for a.e. x,
then J f == lim J In.
Proof. If In(x) increases to I(x) for x E E where J-l(EC) == 0, then 1- IXE == 0
a.e. and In - InXE == 0 a.e., so by the monotone convergence theorem, J I ==
J IXE == limJ InXE == limJ In. .
The hypothesis that the sequence {f n} be increasing, at least a.e., is essential
for the monotone convergence theorem. For example, if X is IR and J-l is Lebesgue
measure, we have X(n,n+l) -+ 0 and nX(O,I/n) -+ 0 pointwise, but J X(n,n+l) ==
J nX(O,I/n) == 1 for all n. As one sees by sketching the graphs, the trouble in these
examples is that the area under the graph "escapes to infinity" as n -+ 00, so the
area in the limit is less than one would expect. This is typical of the cases when the
52 INTEGRATION
integral of the limit is not the limit of the integrals, but in this situation there is still
an inequality that remains valid. We deduce it from the following general result.
2.18 Fatou's Lemma. If {In} is any sequence in L+, then
j (liminf In) < liminf j In.
Proof. For each k > 1 we have infnk In < Ij for j > k, hence I infnk In <
J Ij for j > k, hence J infnk In < infjk J Ij. Now let k -+ 00 and apply the
monotone convergence theorem:
j (lim inf In) == lim j ( inf In) < lim inf j In.
k-+()() nk
.
2.19 Corollary. If {In} C L+, I E L+, and In -+ I a.e., then I I < liminf I In.
Proof. If In -+ I everywhere, the result is immediate from Fatou's lemma,
and this can be achieved by modifying In and I on a null set without affecting the
integrals, by Proposition 2.16. .
2.20 Proposition. If I E L + and I I < 00, then {x : I (x) == oo} is a null set and
{x: f(x) > O} isa-finite.
The proof is left to the reader (Exercise 12).
Exercises
12. Prove Proposition 2.20. (See Proposition 0.20, where a special case is proved.)
13. Suppose {In} C L +, In -+ I pointwise, and I I == lim I In < 00. Then
J E I == lim I E In for all E E M. However, this need not be true if I I == lim I In ==
00.
14. If I E L +, let "\(E) == IE I dJ-l for E E M. Then ,,\ is a measure on M, and for
any gEL +, J 9 d,,\ == I I 9 dJ-l. (First suppose that 9 is simple.)
15. If {In} C L +, In decreases pointwise to I, and I 11 < 00, then I I == lirn I In.
16. If I E L + and I I < 00, for every f > 0 there exists E E M such that
J-l(E) < 00 and IE I > (I I) - f.
17. Assume Fatou's lemma and deduce the monotone convergence theorem from it.
2.3 INTEGRATION OF COMPLEX FUNCTIONS
We continue to work on a fixed measure space (X, M, J-l). The integral defined in the
previous section can be extended to real-valued measurable functions I in an obvious
INTEGRATION OF COMPLEX FUNCTIONS 53
way; namely, if f+ and f- are the positive and negative parts of f and at least one
of J f+ and J f - is finite, we define
J f = J f+ - J f-.
We shall be mainly concerned with the case where J f+ and J f- are both finite; we
then say that f is integrable. Since If I = f+ + f-, it is clear that f is integrable iff
J If I < 00.
2.21 Proposition. The set of integrable real-valued functions on X is a real vector
space, and the integral is a linear functional on it.
Proof. The first assertion follows from the fact that laf + bgl < lallfl + Ibllgl,
and it is easy to check that J af == a J f for any a E JR. To show additivity, suppose
that f and 9 are integrable and let h == f + g. Then h+ - h- == f+ - f- + g+ - g-,
so h+ + f- + g- == h- + f+ + g+. By Theorem 2.15,
J h+ + J f- + J g- = J h- + J f+ + J g+,
and regrouping then yields the desired result:
J h = J h+ - J h- = J f+ - J f- + J g+ - J g- = J f + J g.
.
Next, if f is a complex-valued measurable function, we say that f is integrable if
J If I < 00. More generally, if E E M, f is integrable on E if JE If I < 00. Since
If I < IRe fl + 11m fl < 21fl, f is integrable iff Re f and 1m f are both integrable,
and in this case we define
J f = J Ref + i J Imf.
It follows easily that the space of complex-valued integrable functions is a complex
vector space and that the integral is a complex-linear functional on it. We denote this
space -provisionally - by L 1 (J-l) (or L 1 (X, J-l), or L 1 (X), or simply L1, depending
on the context). The superscript 1 is standard notation, but it will not assume any
significance for us until Chapter 6.
2.22 Proposition. If f E L1, then I J fl < J If I.
Proof. This is trivial if J f == 0 and almost trivial if f is real, since
J f = J f+ - J f- < J f+ + J f- = J Ifl.
If f is complex-valued and J f =1= 0, let a == sgn(J f). Then I J fl == a J f == J af.
In particular, J af is real, so
J f = Re J af = J Re(af) < J I Re(af) I < J lafl = J Ifl.
.
54 INTEGRATION
2.23 Proposition.
a. If fELl, then {x : f(x) i= O} is a-jinite.
b. If f,g ELI, then JE f == JEgforall E E Miff J If - gl == 0 iff f == 9 a.e.
Proof. (a) and the second equivalence in (b) follow from Propositions 2.20 and
2.16. If J If - 9 I == 0, then by Proposition 2.22, for any E E M,
k f - kg < J XElf - g\ < J If - g\ = 0,
so that JE f == JE g. On the other hand, if u == Re(f - g), v == Im(f - g), and it
is false that f == 9 a.e., then at least one of u+, u - , v+, and v- must be nonzero on
a set of positive measure. If, say, E == {x : u + (x) > O} has positive measure, then
Re(JE f - JE g) == JE u+ > 0 since u- == 0 on E; likewise in the other cases. .
This proposition shows that for the purposes of integration it makes no difference
if we alter functions on null sets. Indeed, one can integrate functions f that are only
defined on a measurable set E whose complement is null simply by defining f to be
zero (or anything else) on EC. In this fashion we can treat lR-valued functions that
are finite a.e. as real-valued functions for the purposes of integration.
With this in mind, we shall find it more convenient to redefine L1 (J-l) to be the
set of equivalence classes of a.e.-defined integrable functions on X, where f and 9
are considered equivalent iff f == 9 a.e. This new L 1 (J-l) is still a complex vector
space (under pointwise a.e. addition and scalar multiplication). Although we shall
henceforth view L 1 (J-l) as a space of equivalence classes, we shall still employ the
notation "f EL I (J-l)" to mean that I is an a.e.-defined integrable function. This
minor abuse of notation is commonly accepted and rarely causes any confusion.
The new definition of L 1 (J-l) has two further advantages. First, if J-l is the comple-
tion of J-l, Proposition 2.12 yields a natural one-to-one correspondence between L 1 ( J-l )
and L 1 (J-l), so we can (and shall) identify these spaces. Second, L 1 is a metric space
with distance function p(f, g) == J If - 9 I. (The triangle inequality is easily verified,
and obviously p(f, g) == p(g, f); but to obtain the condition that p(f, g) == 0 only
when f == g, one must identify functions that are equal a.e., according to Proposition
2.23b.) We shall refer to convergence with respect to this metric as convergence in
L1; thus fn f in L 1 iff J Ifn - fl o.
We now present the last of the three basic convergence theorems (the other two
being the monotone convergence theorem and Fatou's lemma) and derive some useful
consequences from it. In the context of integration on JR with Lebesgue measure as
in the discussion preceding Fatou's lemma, the idea behind this theorem is that if
I n f a.e. and the graph of II n I is confined to a region of the plane with finite area
so that the area beneath it cannot escape to infinity, then J I n J f.
2.24 The Dominated Convergence Theorem. Let {fn} be a sequence in L1 such
that (a) fn f a.e., and (b) there exists a nonnegative 9 E L1 such that Ilnl < 9
a.e. for all n. Then fELl and J f == lim n -+ oo J In.
INTEGRATION OF COMPLEX FUNCTIONS 55
Proof. f is measurable (perhaps after redefinition on a null set) by Propositions
2.11 and 2.12, and since If I < 9 a.e., we have f ELI. By taking real and imaginary
parts it suffices to assume that fn and f are real-valued, in which case we have
9 + fn > 0 a.e. and 9 - fn > 0 a.e. Thus by Fatou's lemma,
1 9 + 1 I < lim inf 1 (9 + In) = 1 9 + lim inf 1 In'
1 9 - 1 I < liminf 1(9 - In) = 1 9 -limsup 1 In"
Therefore, liminf J fn > J f > limsup J fn, and the result follows.
.
2.25 Theorem. Suppose that {fj} is a sequence in L 1 such that L: J Ifj I < 00.
Then L: fj converges a.e. to afunction in Ll, and J L: fj == L: J fj.
Proof. By Theorem 2.15, JL: Ifjl == L: J Ifjl < 00, so the function 9 ==
L: I fj I is in L 1 . In particular, by Proposition 2.20, L: I fj (x) I is finite for a.e.
x, and for each such x the series L: fj(x) converges. Moreover, I L: fjl < 9 for
all n, so we can apply the dominated convergence theorem to the sequence of partial
sums to obtain J L: fj == L: J fj. .
2.26 Theorem. If f E L 1 (J-L) and f > 0, there is an integrable simple function
cP == L: ajXEj such that J If - cPl dJ-l < f. (That is, the integrable simple functions
are dense in L 1 in the L 1 metric.) If J-l is a Lebesgue-Stieltjes measure on IR, the sets
Ej in the definition of cP can be taken to be finite unions of open intervals; moreover,
the re is a continuous function 9 that vanishes outside a bounded interval such that
J If - 9 I dJ-L < f.
Proof. Let {cPn} be as in Theorem 2.10b; then J IcPn - fl < f for n sufficiently
large by the dominated convergence theorem, since I cPn - f I < 21 f I. If cPn ==
L: aj XE] where the Ej are disjoint and the aj are nonzero, we observe that J-l( Ej) =
lajl-l JE- IcPnl < lajl-l J If I < 00. Moreover, if E and F are measurable sets,
]
we have J-l(E6F) == J IXE - XFI. Thus if J-L is a Lebesgue-Stieltjes measure on JR,
by Proposition 1.20 we can approximate XEj arbitrarily closely in the L 1 metric by
finite sums of functions Xl k where the Ik's are open intervals. Finally, if Ik == (a, b)
we can approximate X1k in the Ll metric by continuous functions that vanish outside
(a, b). (For example, given f > 0, take 9 to be the continuous function that equals
o on (-00, a] and [b, 00), equals 1 on [a + f, b - f], and is linear on [a, a + f] and
[b - f, b].) Putting these facts together, we obtain the desired assertions. .
The next theorem gives a criterion, less restrictive than those found in most
advanced calculus books, for the validity of interchanging a limit or a derivative with
an integral.
56 INTEGRATION
2.27 Theorem. Suppose that f : X x [a, b] C (-00 < a < b < (0) and that
fC, t) : X -4 CC is integrable for each t E [a, b]. Let F(t) == Ix f(x, t) dJ-l(x).
a. Suppose that there exists 9 E L 1 (J-l) such that If(x, t) I < g(x) for all x, t.
If limt-+to f(x, t) == f(x, to) for every x, then limt-+to F(t) == F(to); in
particular, if f (x, .) is continuous for each x, then F is continuous.
b. Suppose that a f / at exists and there is agE L 1 (J-l) such that I (a f / at) (x, t) I <
g(x) for all x, t. Then F is differentiable and F'(x) == I(af jat)(x, t) dJ-l(x).
Proof. For (a), apply the dominated convergence theorem to fn(x) == f(x, tn)
where {t n } is any sequence in [a, b] converging to to. For (b), observe that
a a f (x, to) = limhn(x) where hn(x) = f(x, tn) - f(x, to) ,
t t n - to
{ t n } again being any sequence converging to to. It follows that a f j 8t is measurable,
and by the mean value theorem,
Ihn(x)1 < sup 8 a f (x, t) < g(x),
tE[a,b] t
so the dominated convergence theorem can be invoked again to give
F'(to) = lim F(tn) - F(to) = Hm J hn(x) dJ-l(x) == J 8 a f (x, t) dp,(x).
t n - to t
.
The device of using sequences converging to to in the preceding proof is technically
necessary because the dominated convergence theorem deals only with sequences of
functions. However, in such situations we shall usually just say "let t --t to" with the
understanding that sequential convergence is underlying the argument.
It is important to note that in Theorem 2.27 the interval [a, b] on which the
estimates on f or a f j at hold might be a proper subinterval of an open interval I
(perhaps JR itself) on which f (x, .) is defined. If the hypotheses of (a) or (b) hold
for all [a, b] c I, perhaps with the dominating function 9 depending on a and b, one
obtains the continuity or differentiability of the integrated function F on all of I, as
these properties are local in nature.
In the special case where the measure J-l is Lebesgue measure on JR, the integral
we have developed is called the Lebesgue integral. At this point it is appropriate
to study the relation between the Lebesgue and Riemann integrals on JR. We shall
use Darboux's characterization of the Riemann integral in terms of upper and lower
sums, which we now recall.
Let [a, b] be a compact interval. By a partition of [a, b] we shall mean a finite
sequence p == {tj}o such that a == to < tl < . .. < t n == b. Let f be an arbitrary
bounded real-valued function on [a, b]. For each partition P we define
n
Spf == L Mj(t j - tj-l),
1
n
spf == L mj(t j - tj-l),
1
INTEGRATION OF COMPLEX FUNCTIONS 57
where M j and mj are the supremum and infimum of f on [tj-1, t j ]. Then we define
-b
Ia(f) == inf Bpf,
p
I (f) == sup Sp f,
P
where the infimum and supremum are taken over all partitions P. If I : (f) == I (f),
their common value is the Riemann integral J: f(x) dx, and f is called Riemann
integrable.
2.28 Theorem. Let f be a bounded real-valued function on [a, b].
a. If f is Riemann integrable, then f is Lebesgue measurable (and hence inte-
grable on [a, b) since it is bounded}, and I: j(x) dx = I[a,b] j dm.
b. f is Riemann integrable iff {x E [a, b] : f is discontinuous at x} has Lebesgue
measure zero.
Proof. Suppose that f is Riemann integrable. For each partition P let
n
G p == LMjX(tj-l,tj]'
1
n
gp == L mjX(tj_l,tj]
1
(with the same notation as above), so that Bpf == J G p dm and spf == J gp dm.
There is a sequnce {P k } of partitions whose mesh (i.e., maxj (tj - t j -1)) tends to
zero, each of which includes the preceding one (so that 9 Pk increases with k while
Gpk decreases), such that Bpkf and sPkf converge to J: f(x) dx. Let G == lim G pk
and 9 == lim 9 Pk. Then 9 < f < G, and by the dominated convergence theorem,
J G dm == J 9 dm == J: f(x) dx. Hence J(G - g) dm == 0, so G == 9 a.e. by
Proposition 2.16, and thus G == f a.e. Since G is measurable (being the limit of a
sequence of simple functions) and m is complete, f is measurable and J[a,b] f dm ==
J G dm = J: f(x) dx. This proves (a), and the proof of (b) is outlined in Exercise
23. .
The (proper) Riemann integral is thus subsumed in the Lebesgue integral. Some
improper Riemann integrals (the absolutely convergent ones) can be interpreted
directly as Lebesgue integrals, but others still require a limiting procedure. For
example, if f is Riemann integrable on [0, b] for all b > 0 and Lebesgue integrable
on [0,(0), then fro,CXJ) j dm = limb->CXJ J j(x) dx (by the dominated convergence
theorem), but the limit on the right can exist even when f is not integrable. (Example:
f == 2: n- 1 (-1)n X (n,n+1].) Henceforth we shall generally use the notation
J: f (x) dx for Lebesgue integrals.
A few remarks comparing the construction of the Lebesgue and Riemann integrals
may be helpful. Let f be a bounded measurable function on [a, b], and for simplicity
let us assume that f > O. To compute the Riemann integral of f, one partitions
the interval [a, b] into subintervals and approximates f from above and below by
functions that are constant on each subinterval. To compute the Lebesgue integral of
58 INTEGRATION
f, one picks a sequence of simple functions that increase to f. In particular, if one
picks the sequence constructed in the proof of Theorem 2.10a (see Figure 2.1), one
is in effect partitioning the range of f into subintervals Ij and approximating f by a
constant on each of the sets f-l (Ij ). This procedure requires a more sophisticated
theory of measure to begin with since the sets f-l (Ij ) can be complicated, even when
f is continuous; but it is better adapted to the particular f under consideration and
therefore more flexible - and more susceptible to generalization. (In the Lebesgue
theory, the assumption that f is measurable removes the necessity of considering
both upper and lower approximations; however, the latter point of view can also be
made to work in the abstract setting. See Exercise 24.)
The Lebesgue theory offers two real advantages over the Riemann theory. First,
much more powerful convergence theorems, such as the monotone and dominated
convergence theorems, are available. These not only yield results previously unob-
tainable but also reduce the labor in proving classical theorems. Second, a wider
class of functions can be integrated. For example, if R is the set of rational numbers
in [0,1], XR is not Riemann integrable, being everywhere discontinuous on [0,1],
but it is Lebesgue integrable, and J XR dm == O. (Actually, this is in some sense a
trivial example since XR agrees a.e. with the constant function O. For a more inter-
esting example, see Exercise 25.) Of course, virtually all functions that one meets in
classical analysis are (locally) Riemann integrable, so this added generality is rarely
used in computing specific integrals. However, it has the crucial effect that various
metric spaces of functions whose metrics are defined in terms of integrals are com-
plete when Lebesgue integrable functions are used but not when one considers only
Riemann integrable functions. We shall investigate this situation more thoroughly
later, especially in Chapter 6. (We have already proved the completeness of Ll (J-l),
disguised as Theorem 2.25. To remove the disguise, see Theorem 5.1.)
We conclude this section by introducing the most ubiquitous of the higher tran-
scendental functions, the gamma function r, which will playa role in a number of
places later on. If z E C and Re z > 0, define fz : (0,00) C by fz (t) = t z - 1 e- t .
(Here t z - 1 == exp[(z -1) logt].) Since It Z - 1 1 == t Rez - 1 , we have Ifz(t)1 < t Rez - 1 ,
and also Ifz (t) I < C z e- t / 2 for t > 1. (The precise value of C z can easily be found
by maximizing tRe z-1 e- t / 2 , but it is of no importance here.) Since fa 1 t a dt < 00
for a > -1 and ftO e- t / 2 dt < 00, we see that fz E L 1 ((0, (0)) for Re z > 0, and
we define
r(z) = 1 00 e-1e- t dt
(Rez > 0).
Since
j N N
E ee- t dt = -ee-tl + z 1 e-1e- t dt
by integration by parts, by letting E 0 and N 00 we see that for Re z > 0, r
satisfies the functional equation
r(z + 1) == zr(z).
This equation can then be used to extend r to (almost) the entire complex plane.
Namely, for -1 < Re z < 0 we can define r (z) to be r (z + 1) / z, and by induction,
INTEGRATION OF COMPLEX FUNCTIONS 59
having defined r(z) for Rez > -n, we define r(z) for Rez > -n - 1 to be
r (z + 1) / z. The result is a function defined on all of C except for singularities at the
non positive integers where the algorithm just described involves division by zero.
We have r(l) == Jo oo e- t dt == -e-tlo == 1, so an n-fold application of the
functional equation shows that r ( n + 1) == n!. (Another proof of this fact is outlined
in Exercise 29.) Many of the applications of the gamma function involve the fact that
it provides an extension of the factorial function to nonintegers.
Exercises
18. Fatou's lemma remains valid if the hypothesis that f n E L + is replaced by the
hypothesis that fn is measurable and fn > -g where gEL + n L 1 . What is the
analogue of Fatou's lemma for nonpositive functions?
19. Suppose {fn} C L 1 (J--l) and fn f uniformly.
a. If J--l(X) < 00, then f E Ll (J--l) and J fn J f.
b. If J--l(X) == 00, the conclusions of (a) can fail. (Find examples on ffi. with
Lebesgue measure.)
20. (A generalized Dominated Convergence Theorem) If fn, gn, f, 9 E L 1 , fn f
and gn 9 a.e., Ifni < gn, and J gn J g, then J fn J f. (Rework the proof
of the dominated convergence theorem.)
21. Suppose fn, f E L 1 and fn f a.e. Then J Ifn - fl 0 iff J Ifni J If I.
(Use Exercise 20.)
22. Let J--l be counting measure on N. Interpret Fatou's lemma and the monotone and
dominated convergence theorems as statements about infinite series.
23. Given a bounded function f : [a, b] ffi., let
H(x) == lirn sup f(y),
bO Iy-xlb
h(x) = lirn inf f(y).
bO Iy-xlb
Prove Theorem 2.28b by establishing the following lemmas:
a. H(x) = h(x) iff f is continuous at x.
b. In the notation of the proof of Theorem 2.28a, H == G a.e. and h == 9
r -b
a.e. Hence Hand h are Lebesgue measurable, and J [a,b] H dm == I a (f) and
J[a,bJ h dm == I (f).
24. Let (X, M, J--l) be a measure space with J--l(X) < 00, and let (X, M , J--l ) be its
completion. Suppose f : X ffi. is bounded. Then f is M -measurable (and
hence in L 1 ( J--l )) iff there exist sequences {cPn} and {'l/;n} of M-measurable simple
functions such that cPn < f < 'l/;n and J ('l/;n - cPn) dJ--l < n -1 . In this case,
lim J <Pn dJ--l == lim J 'l/;n dJ--l == J f rlj1.
25. Let f(x) == x- 1 / 2 if 0 < x < 1, f(x) == 0 otherwise. Let {Tn}! be an
enumeration of the rationals, and set g(x) == L 2- n f(x - Tn).
a. 9 E Ll (m), and in particular 9 < 00 a.e.
60 INTEGRATION
b. 9 is discontinuous at every point and unbounded on every interval, and it
remains so after any modification on a Lebesgue null set.
c. g2 < 00 a.e., but g2 is not integrable on any interval.
26. If f E £1 (m) and F( x) == foo f (t) dt, then F is continuous on IR.
27. Let fn(x) == ae- nax - be- nbx where 0 < a < b.
a. L fo oo Ifn(x)1 dx == 00.
b. L fo oo fn(x) dx == O.
c. L fn E L 1 ([0, 00), m), and fo oo L fn(x) dx == log(bla).
28. Compute the following limits and justify the calculations:
a. limnoo fo oo (1 + (xln))-n sin(xln) dx.
b. limnoo fOl (1 + nx 2 )(1 + x 2 )-n dx.
c. limnoo fo oo n sin(x/n) [x(1 + x2)] -1 dx.
d. limnoo fa oo n(1 + n 2 x 2 )-1 dx. (The answer depends on whether a > 0,
a == 0, or a < O. How does this accord with the various convergence theorems?)
29. Show that fo oo xne- X dx == n! by differentiating the equation fo oo e- tx dx ==
lit. Similarly, show that foo x 2n e- x2 dx == (2n)!J1r 1 4nn ! by differentiating the
equation foo e- tx2 dx == J1r7i (see Proposition 2.53).
30. Show that limkoo fok x n (1 - k- 1 x)k dx == n!.
31. Derive the following formulas by expanding part of the integrand into an infinite
series and justifying the term-by-term integration. Exercise 29 may be useful. (Note:
In (d) and (e), term-by-term integration works, and the resulting series converges,
only for a > 1, but the formulas as stated are actually valid for all a > 0.)
a. For a > 0, foo e- x2 cas ax dx == J1re- a2 /4.
b. For a > -1, fOI x a (1 - x)-llogxdx == L(a + k)-2.
c. For a > 1, fo oo xa-l(e x - 1)-1 dx == f(a)((a), where ((a) == L n- a .
d. For a > 1, fo oo e- ax x- 1 sinxdx == arctan(a- 1 ).
e. For a > 1, Io oo e-axJo(x) dx = (s2 + 1)-1/2, where
Jo(x) == L( _1)n x 2n /4 n (n!)2 is the Bessel function of order zero.
2.4 MODES OF CONVERGENCE
If {f n} is a sequence of complex -valued functions on a set X, the statement" f n f
as n 00" can be taken in many different senses, for example, pointwise or uniform
convergence. If X is a measure space, one can also speak of a.e. convergence or
convergence in £1. Of course, uniform convergence implies pointwise convergence,
which in turn implies a.e. convergence (and not conversely, in general), but these
MODES OF CONVERGENCE 61
modes of convergence do not imply L 1 convergence or vice versa. It will be useful
to keep in mind the following examples on IR (with Lebesgue measure):
1. fn == n- 1 X(O,n).
11. f n == X(n,n+l).
111. fn = nX[o,l/n].
IV. 11 == X[O,I], f2 == X[O,1/2], f3 == X[I/2,1]' f4 = X[O,I/4]' fs == X[I/4,1/2]'
f6 == X[I/2,3/4], f7 == X[3/4,1], and in general, fn == X(j/2 k , (j+l)/2k] where
n == 2 k + j with 0 < j < 2 k .
In (i), (ii), and (iii), in 0 uniformly, pointwise, and a.e., repectively, but
fn 0 E Ll (in fact I Ifni == I fn == 1 for all n). In (iv), fn 0 in L 1 since
I Ifni == 2- k for 2 k < n < 2k+l, but fn(x) does not converge for any x E [0,1]
since there are infinitely many n for which fn(x) == 0 and infinitely many for which
fn(x) == 1.
On the other hand, if fn f a.e. and Ifni < 9 E Ll for all n, then fn fin L 1 .
(This is clear from the dominated convergence theorem since Ifn - fl < 2g.) Also,
we shall see below that if fn f in L 1 then some subsequence converges to f a.e.
Another mode of convergence that is frequently useful is convergence in measure.
We say that a sequence {fn} of measurable complex-valued functions on (X, M, J-.L)
is Cauchy in measure if for every E > 0,
JL({x: Ifn(x) - fm(x)1 > E}) Oasm,n 00,
and that {fn} converges in measure to f if for every E > 0,
J-.L ( {x : If n ( x) - f ( x ) I > E }) 0 as n 00.
For example, the sequences (i), (iii), and (iv) above converge to zero in measure, but
(ii) is not Cauchy in measure.
2.29 Proposition. If fn f in L 1 , then fn f in measure.
Proof. LetEn,E == {x : Ifn(x)- f(x)1 > E}. Then I Ifn - fl > IE Ifn - fl >
n,E
EJ-.L(En,t)' so J-.L(En,t) < E- 1 I Ifn - II o. .
The converse of Proposition 2.29 is false, as examples (i) and (iii) show.
2.30 Theorem. Suppose that {f n} is Cauchy in measure. Then there is a measurable
function f such that fn f in measure, and there is a subsequence {fnj} that
converges to f a. e. Moreover, if also I n 9 in measure, then 9 == f a. e.
Proof. We can choose a subsequence {gj} == {fnj} of {fn} such that if Ej ==
{x : Igj(x) - gj+l(x)1 > 2- j }, then J-.L(Ej) < 2- j . If Fk == U;:kEj, then
JL(F k ) < I::r 2- j == 21-k, and if x rt F k , for i > j > k we have
(2.31 )
i-I i-I
Igj(x) - gi(x)1 < L Igz+l(x) - gz(x)1 < L2-Z < 2 1 - j .
l=j l=j
62 INTEGRATION
Thus {gj} is pointwise Cauchy on Fk. Let F == n Fk == lim sup Ej. Then
p,(F) == 0, and if we set f(x) == limgj(x) for x rt F and f(x) == 0 for x E F,
then f is measurable (see Exercises 3 and 5) and gj f a.e. Also, (2.31) shows
that Igj(x) - l(x)J < 2 1 - j for x rt Fk and j > k. Since p,(F k ) 0 as k 00, it
follows that gj f in measure. But then In f in measure, because
{x : Iln(x)- f(x)1 > E} C {x : Iln(x)-gj(x)1 > E }u{ x : Igj(x)- l(x)1 > E},
and the sets on the right both have small measure when nand j are large. Likewise,
if f n 9 in measure,
{x: If(x)-g(x)1 > E} C {x: If(x)-fn(x)1 > E}U{X: Ifn(x)-g(x)1 > E}
for all n, hence J..L( {x : If(x) - g(x)1 > E}) == 0 for all E. Letting E tend to zero
through some sequence of values, we conclude that 1 == 9 a.e. .
2.32 Corollary. If fn f in L 1 , there is a subsequence {fnj} such that fnj 1
a.e.
Proof. Combine Proposition 2.29 and Theorem 2.30.
.
If f n f a.e., it does not follow that f n I in measure, as example (ii) shows.
However, this conclusion does hold on a finite measure space, where something
considerably stronger is true.
2.33 EgorofT's Theorem. Suppose that J..L(X) < 00, and fl, 12, . . . and f are mea-
surable complex-valuedfunctions on X such that In f a.e. Then for every E > 0
there exists E C X such that J--l(E) < E and In I uniformly on EC.
Proof. Without loss of generality we may assume that fn I everywhere on
X. For k, n E N let
00
En(k) == U {x : Ifm(x) - f(x)1 > k-l}.
m=n
Then, for fixed k, En(k) decreases as n increases, and n=1 En(k) == 0, so since
J--l(X) < 00 we conclude that J--l(En(k)) 0 as n 00. Given E > 0 and kEN,
choose nk so large that J--l(Enk(k)) < E2- k and let E == U%:1 Enk(k). Then
J--l(E) < E, and we have Iln(x) - l(x)1 < k- 1 for n > nk and x tt E. Thus In I
uniformly on EC. .
The type of convergence involved in the conclusion of Egoroff's theorem is some-
times called almost uniform convergence. It is not hard to see that almost uniform
convergence implies a.e. convergence and convergence in measure (Exercise 39).
MODES OF CONVERGENCE 63
Exercises
32. Suppose J--l(X) < 00. If f and 9 are complex-valued measurable functions on
X, define
J If-gl
p(f,g) = 1 + If _ gl dp,.
Then p is a metric on the space of measurable functions if we identify functions that
are equal a.e., and f n f with respect to this metric iff f n f in measure.
33. If fn > 0 and fn f in measure, then J f < liminf J fn.
34. Suppose Ifni < 9 E L 1 and fn f in measure.
a. J f == limJ fn.
b. fn f in L 1 .
35. f n f in measure iff for every E > 0 there exists N E N such that J--l( {x :
Ifn(x)-f(x)1 > E}) <Eforn > N.
36. If J--l(En) < 00 for n E Nand XE n f in Ll, then I is (a.e. equal to) the
characteristic function of a measurable set.
37. Suppose that fn and f are measurable complex-valued functions and cP : C C.
a. If cP is continuous and f n f a.e., then cP 0 f n cP 0 I a.e.
b. If c/J is uniformly continuous and f n f uniformly, almost uniformly, or
in measure, then cP 0 f n cP 0 I uniformly, almost uniformly, or in measure,
respectively.
c. There are counterexamples when the continuity assumptions on cP are not
satisfied.
38. Suppose fn f in measure and gn 9 in measure.
a. fn + gn f + 9 in measure.
b. I ngn I 9 in measure if J--l( X) < 00, but not necessarily if J--l( X) == 00.
39. If fn f almost uniformly, then In f a.e. and in measure.
40. In Egoroff's theorem, the hypothesis "J--l( X) < 00" can be replaced by "I In I < 9
for all n, where 9 E Ll (p,)."
41. If J--l is a-finite and f n f a.e., there exist measurable E 1 , E 2 , . .. c X such
that J--l( (U Ej )C) == 0 and I n I uniformly on each Ej.
42. Let fL be counting measure on N. Then f n ---+ f in measure iff In ---+ I uniformly.
43. Suppose that J--l(X) < 00 and f: X x [0, 1] C is a function such that 1(., Y)
is measurable for each y E [0,1] and I(x,.) is continuous for each x E X.
a. If 0 < E,8 < 1 then E€,8 == {x : If(x, y) - I(x, 0)1 < E for all y < 8} is
measurable.
b. For any E > 0 there is a set E c X such that J--l(E) < E and f(., y) f(., 0)
uniformly on EC as y o.
64 INTEGRATION
44. (Lusin's Theorem) If f : [a, b] C is Lebesgue measurable and E > 0, there is
a compact set E c [a, b] such that j..t(EC) < E and fIE is continuous. (Use Egoroff's
theorem and Theorem 2.26.)
2.5 PRODUCT MEASURES
Let (X, M, J--l) and (Y, N, v) be measure spaces. We have already discussed the
product a -algebra M Q9 N on X x Y; we now construct a measure on M Q9 N that is,
in an obvious sense, the product of J--l and v.
To begin with, we define a (measurable) rectangle to be a set of the form A x B
where A E M and BEN. Clearly
(A x B) n (E x F) = (A n E) x (B n F),
(A x B)C = (X x B C ) U (A C x B).
Therefore, by Proposition 1.7, the collection A of finite disjoint unions of rectangles
is an algebra, and of course the a-algebra it generates is M Q9 N.
Suppose A x B is a rectangle that is a (finite or countable) disjoint union of
rectangles Aj x Bj. Then for x E X and y E Y,
XA(X)XB(Y) == XAxB(X,y) == LXAjXBj(X,y) == LXA j (X)XBj (y).
If we integrate with respect to x and use Theorem 2.15, we obtain
JL(A)XB(Y) = ! XA(X)XB(Y) dJL(x) = L ! XAj (X)xB j (y) dJL(x)
== LJ--l(Aj)XBj(Y).
In the same way, integration in Y then yields
j..t(A)v(B) == L J--l(Aj )v(Bj).
It follows that if E E A is the disjoint union of rectangles Al x B I , . . . , An X Bn,
and we set
n
7r(E) = L j..t(Aj )v(Ej)
1
(with the usual convention that 0 . 00 == 0), then 7r is well defined on A (since any
two representations of E as a finite disjoint union of rectangles have a common
refinement), and 7r is a premeasure on A. According to Theorem 1.14, therefore, 7r
generates an outer measure on X x Y whose restriction to M x N is a measure that
extends 7r. We call this measure the product of J.L and v and denote it by J--l xv.
Moreover, ifJ--land v area-finite-say, X = U Aj and Y == U Bk withJ--l(Aj) <
00 and V(Bk) < 00 - then X x Y = Uj,k Aj X Bk, and fL x v(Aj x Bk) < 00, so
J-L x v is also a-finite. In this case, by Theorem 1.14, fL x v is the unique measure on
Jy( Q9 N such that j..t x v(A x B) = J--l(A)v(B) for all rectangles A x B.
PRODUCT MEASURES 65
The same construction works for any finite number of factors. That is, suppose
( X j , M j , fL j) are measure spaces for j == 1,..., n. If we define a rectangle to be
a set of the form Al x ... x An with Aj E M j , then the collection A of finite
disjoint unions of rectangles is an algebra, and the same procedure as above produces
a measure J-.LI x . . . X J-.Ln on M I 0 . . . 0 M n such that
n
J-.LI X ... x J-.Ln(A1 x ... x An) == fIJ-.Lj(A j ).
I
Moreover, if the J-.Lj's are a-finite so that the extension from A to Q9 M j is uniquely
determined, the obvious associativity properties hold. For example, if we identify
Xl x X 2 X X 3 with (Xl x X 2 ) X X 3 , we have M I 0M2 0 M 3 == (M I 0M 2 ) 0M3
(the former being generated by sets of the form Al x A 2 X A3 with Aj E M j , and
the latter by sets of the form B x A3 with B E M I Q9 M 2 and A3 E M 3 ), and
J-.LI x J-.L2 X J.-L3 == (J.-LI X J.-L2) X J.-L3 (since they agree on sets of the form Al x A 2 X A 3 ,
and hence in general by uniqueness). Details are left to the reader (Exercise 45). All
of our results below have obvious extensions to products with n factors, but we shall
stick to the case n == 2 for simplicity.
We return to the case of two measure spaces (X, M, J-.L) and (Y, N, v). If E c
X x Y, for x E X and y E Y we define the x-section Ex and the y-section EY of
Eby
Ex == {y E Y : (x, y) E E},
EY == {x EX: (x, y) E E}.
Also, if f is a function on X x Y we define the x-section fx and the y-section fY
of f by
fx(Y) == fY(x) == f(x, y).
Thus, for example, (XE) x == XEx and (XE) Y == XEY.
2.34 Proposition.
a. If E E M x N, then Ex E Nforall x E X and EY E MforaZZ y E Y.
b. If f is JY( @ N-measurable, then fx is N-measurable for all x E X and fY is
M-measurable for all y E Y.
Proof. Let 9{ be the collection of all subsets E of X x Y such that Ex E N for all x
and EY E M for all y. Then 9{ obviously contains all rectangles (e.g., (A x B)x == B
if x E A, == 0 otherwise). Since (U Ej)x == U(Ej)x and (EC)x == (Ex)C, and
likewise for y-sections, 9{ is a a-algebra. Therefore 9{ M0N, which proves (a). (b)
follows from (a) because (fx)-I(B) == (f-1(B))x and (fY)-I(B) == (f-I(B))Y..
Before proceeding further we need a technical lemma. We define a monotone
class on a space X to be a subset e ofP(X) that is closed under countable increasing
unions and countable decreasing intersections (that is, if Ej E e and E 1 C E 2 C . . .,
then U EJ E e, and likewise for intersections). Clearly every a-algebra is a monotone
class. Also, the intersection of any family of monotone classes is a monotone class,
66 INTEGRATION
so for any £ c P(X) there is a unique smallest monotone class containing £, called
the monotone class generated by G.
2.35 The Monotone Class Lemma. If A is an algebra of subsets of X, then the
monotone class e generated by A coincides with the a-algebra M generated by A.
Proof. Since M is a monotone class, we have e c M; and if we can show that
e is a a-algebra, we will have M c e. To this end, for E E e let us define
e(E) == {F E e : E \ F, F \ E, and En F are in e}.
Clearly 0 and E are in e(E), and E E e(F) iff F E e(E). Also, it is easy to check
that e(E) is a monotone class. If E E A, then F E e(E) for all F E A because A
is an algebra; that is, A c e(E), and hence e c e(E). Therefore, if FEe, then
F E e(E) for all E E A. But this means that E E e(F) for all E E A, so that
A c e(F) and hence e c e(F). Conclusion: If E, FEe, then E \ F and E n F
are in e. Since X E Ace, e is therefore an algebra. But then if {E j } 1 c e, we
have U Ej E e for all n, and since e is closed under countable increasing unions it
follows that U Ej E e. In short, e is a a-algebra, and we are done. .
We now come to the main results of this section, which relate integrals on X x Y
to integrals on X and Y.
2.36 Theorem. Suppose (X, M, J-l) and (Y, N, v) are a-finite measure spaces. If
E E M Q9 N, then the functions x v(Ex) and Y J-l(EY) are measurable on X
and Y, respectively, and
fL x v(E) = J v(Ex) dfL(x) = J fL(EY) dv(y).
Proof. First suppose that J.-L and v are finite, and let e be the set of all E E
M 0 N for which the conclusions of the theorem are true. If E == A x B, then
v(Ex) == XA(x)v(B) and J-l(EY) == j-L(A)XB(Y), so clearly E E e. By additivity
it follows that finite disjoint unions of rectangles are in e, so by Lemma 2.35 it
will suffice to show that e is a monotone class. If {En} is an increasing sequence
in e and E == U En' then the functions fn(Y) == J-l((En)Y) are measurable and
increase pointwise to f (y) == JL( EY). Hence f is measurable, and by the monotone
convergence theorem,
J fL(EY) dv(y) = lim J fL((En)Y) dv(y) = limfL x v(En) = fL x v(E).
Likewise J-l x v(E) == J v(Ex) dJ-l(x), so E E e. Similarly, if {En} is a decreas-
ing sequence in e and n En, the function Y J-l((El)Y) is in Ll(v) because
JL((El)Y) < J-l(X) < 00 and v(Y) < 00, so the dominated convergence theorem
can be applied to show that E E e. Thus e is a monotone class, and the proof is
complete for the case of finite measure spaces.
PRODUCT MEASURES 67
Finally, if J-.L and v are a-finite, we can write X x Y as the union of an increasing
sequence {X j x Yj} of rectangles of finite measure. If E E M Q9 N, the preceding
argument applies to E n (X j x Yj) for each j to give
J-txv(En(XjxYj)) = I XXj(x)v(ExnYj)dJ-t(x) = I XYj(Y)J-t(EYnXj)dv(y),
and a final application of the monotone convergence theorem then yields the desired
result. .
2.37 The Fubini- Tonelli Theorem. Suppose that (X, M, J-.L) and (Y, N, v) are a-
finite measure spaces.
a. (Tonelli) if I E L+(X X Y), then the functions g(x) = J Ix dv and h(y) ==
J IY dJ.-L are in L + (X) and L + (Y), respectively, and
(2.38)
I f d(J-t x v) = I [I f(x, y) dV(Y)] dJ-t(x)
= I [I f(x, y) dJ-t(X)] dv(y).
b. (Fubini) If I E L1(J.-L X v), then Ix E L1(v) for a.e. x E X, IY E L1(J.-L) for
a.e. y E Y, the a.e.-definedfunctions g(x) == J Ix dv and h(x) == J IY dv are
in L 1 (J.-L) and L1 (v), respectively, and (2.38) holds.
Proof. Tonelli's theorem reduces to Theorem 2.36 in case I is a characteristic
function, and it therefore holds for nonnegative simple functions by linearity. If
I E L+(X X Y), let {In} be a sequence of simple functions that increase pointwise
to I as in Theorem 2.10. The monotone convergence theorem implies, first, that the
corresponding gn and h n increase to g and h (so that g and h are measurable), and,
second that
J 9 dJ-t = lim I 9n dJ-t = lim I fn d(J-t x v) = I f d(J-t x v),
I h dv = lim I h n dv = lim I f n d(J-t xv) = I f d(J-t xv),
which is (2.38). This establishes Tonelli's theorem and also shows that if I E
L + (X x Y) and J I d(J-.L xv) < 00, then g < 00 a.e. and h < 00 a.e., that is,
Ix E L1(v) for a.e. x and IY E L1(J-.L) for a.e. y. If I E L1(J-.L X v), then, the
conclusion of Fubini's theorem follows by applying these results to the positive and
negative parts of the real and imaginary parts of I. .
A few remarks are in order:
. We shall usually omit the brackets in the iterated integrals in (2.38), thus:
I [I f(x,Y)dJ-t(X)] dv(y) = II f(x,y)dJ-t(x)dv(y) = II fdJ-tdv.
68 INTEGRATION
. The hypothesis of a-finiteness is necessary; see Exercise 46.
. The hypothesis I E L + (X x Y) or I EL I (J--l xv) is necessary, in two
respects. First, it is possible for I x and IY to be measurable for all x, y and
for the iterated integrals II f dJ--l dv and II f dv dJ--l to exist even if I is not
M Q9 N-measurable. However, the iterated integrals need not then be equal; see
Exercise 47. Second, if I is not nonnegative, it is possible for Ix and IY to be
integrable for all x, y and for the iterated integrals II f dJ--l dv and II I dv dJ--l
to exist even if I If I d(J--l x v) == 00. But again, the iterated integrals need not
be equal; see Exercise 48.
. The Fubini and Tonelli theorems are frequently used in tandem. Typically one
wishes to reverse the order of integration in a double integral II f dJ--l dv. First
one verifies that I I I I d(J--l x v) < 00 by using Tonelli's theorem to evaluate this
integral as an iterated integral; then one applies Fubini's theorem to conclude
that II f dJ.-L dv == II f dv dJ.-L. For examples, see the exercises in 3 2 . 6 .
Even if J.-L and v are complete, J--l x v is almost never complete. Indeed, suppose
that there is a nonempty A E M with J.-L( A) == 0 and that N =I P(Y). (This is the
case with J--l == v == Lebesgue measure on IR, for example.) If E E P(Y) \ N, then
A x E tfi M Q9 N by Proposition 2.34, but A x E c A x Y, and J--l x v(A x Y) == o.
If one wishes to work with complete measures, of course, one can consider the
completion of J--l xv. In this setting the relationship between the measurability
of a function on X x Y and the measurability of its x-sections and y-sections is
not so simple. However, the Fubini- Tonelli theorem is still valid when suitably
reformulated:
2.39 The Fubini-Tonelli Theorem for Complete Measures. Let (X, M, JL) and
(Y, N, v) be complete, a-finite measure spaces, and let (X x Y, £, A) be the com-
pletion of (X x Y, M Q9 N, J--l x v). If f is £-measurable and either (a) I > 0
or (b) fELl (A), then Ix is N-measurable for a.e. x and fY is M-measurable for
a.e. y, and in case (b) fx and fY are also integrable for a.e. x and y. Moreover,
x I Ix dv and y I IY dJ--l are measurable, and in case (b) also integrable, and
I fd>'= II f(x,y) dJ-L(x) dv(y) = II f(x,y) dv(y) dJ-L(x).
This theorem is a fairly easy corollary of Theorem 2.37; the proof is outlined in
Exercise 49.
Exercises
45. If (X j , M j ) is a measurable space for j == 1,2,3, then Q9 M j == (Jv(1 Q9 Jv(2) Q9
M3. Moreover, if J--lj is a a-finite measure on (X j , M j ), then J--l1 x J-L2 X J-L3 ==
(ILl X JL2) X J--l3.
46. Let X == Y == [0,1], Jy( == N == '13[0,1]' JL = Lebesgue measure, and v = counting
measure. If D == {(x, x) : x E [0, 1]} is the diagonal in X x Y, then If XD dJ--ldv,
PRODUCT MEASURES 69
JJ XD dv df.1, and J XD d(f.1 xv) are all unequal. (To compute J XD d(fL xv) ==
f.1 x v(D), go back to the definition of /-l x v.)
47. Let X == Y be an uncountable linearly ordered set such that for each x EX,
{y EX: y < x} is countable. (Example: the set of countable ordinals.) Let
JY( == N be the a-algebra of countable or co-countable sets, and let /-l == v be defined
on ]V( by /-l(A) == 0 if A is countable and f.1(A) == 1 if A is co-countable. Let
E == {(x,y) E X x X : y < x}. Then Ex and EY are measurable for all x,y,
and JJ XE d/-l dv and JJ XE dv d/-l exist but are not equal. (If one believes in the
continuum hypothesis, one can take X == [0,1] [with a nonstandard ordering] and
thus obtain a set E C [0,1]2 such that Ex is countable and EY is co-countable [in
particular, Borel] for all x, y, but E is not Lebesgue measurable.)
48. Let X == Y == N, JY( == N == P(N), /-l == v == counting measure. Define
f(m, n) == 1 if m == n, f(m, n) == -1 if m == n + 1, and f(m, n) == 0 otherwise.
Then J I f I d(/-l xv) == 00, and JJ f d/-l dv and JJ f dv d/-l exist and are unequal.
49. Prove Theorem 2.39 by using Theorem 2.37 and Proposition 2.12 together with
the following lemmas.
a. If E E M x Nand /-l x v(E) == 0, then v(Ex) == /-l(EY) == 0 for a.e. x and y.
b. If f is -C-measurable and f == 0 A-a.e., then fx and fY are integrable for a.e.
x and y, and J f x dv == J fY d/-l == 0 for a.e. x and y. (Here the completeness of
f.1 and v is needed.)
50. Suppose (X, M, /-l) is a a-finite measure space and f E L+(X). Let
G f == {(x,y) E X x [0,00]: y < f(x)}.
Then G f is M x 13 R -measurable and /-l x m( G f) == J f d/-l; the same is also true
if the inequality y < f (x) in the definition of G f is replaced by y < f (x). (To
show measurability of G f, note that the map (x, y) r--+ f (x) - y is the composition
of (x, y) r--+ (f(x), y) and (z, y) r--+ Z - y.) This is the definitive statement of the
familiar theorem from calculus, "the integral of a function is the area under its graph."
51. Let (X,M,/-l) and (Y,N,v) be arbitrary measure spaces (not necessarily a-
finite) .
a. If f : X C is M-measurable, 9 : Y C is N-measurable, and h(x, y) ==
f(x)g(y), then his M 0 N-measurable.
b. If f E Ll(/-l) and 9 E Ll(v), then h E Ll(/-l X v) and J hd(/-l x v) ==
[J f d/-lHJ 9 dv].
52. The Fubini-Tonelli theorem is valid when (X, M, /-l) is an arbitrary measure
space and Y is a countable set, N == P(Y), and v is counting measure on Y. (Cf.
Theorems 2.15 and 2.25.)
70 INTEGRATION
2.6 THE n-DIMENSIONAL LEBESGUE INTEGRAL
Lebesgue measure m n on JRn is the completion of the n-fold product of Lebesgue
measure on JR with itself, that is, the completion of m x . . . x m on 13 R Q9 . . . Q913 R =
13 R n, or equivalently the completion of m x . .. x m on 1: Q9 . . . Q9 1:. The domain
n of m n is the class of Lebesgue measurable sets in ]Rn; sometimes we shall also
consider m n as a measure on the smaller domain 13Rn. When there is no danger of
confusion, we shall usually omit the superscript n and write m for m n , and as in the
case n == 1, we shall usually write J f(x) dx for J f dm.
We begin by establishing the extensions of some of the results in 31.5 to the
n-dimensional case. In what follows, if E = rr Ej is a rectangle in JRn, we shall
refer to the sets Ej C JR as the sides of E.
2.40 Theorem. Suppose E E 1: n.
a. m(E) == inf{m(U) : UE, U open} == sup{m(K) : KCE, K compact}.
b. E == Al U N 1 == A 2 \ N 2 where Al is an Fa set, A 2 is a G 8 set, and
m(N I ) = m(N 2 ) = O.
c. If m(E) < 00, for any E > 0 there is a finite collection {Rj}f of disjoint
rectangles whose sides are intervals such that m( E uf Rj) < E.
Proof By the definition of product measures, if E E 1: nand E > 0 there is
a countable family {T j } of rectangles such that E C u Tj and L m(Tj) <
m(E) + E. For each j, by applying Theorem 1.18 to the sides of Rj we can find
a rectangle U j Fj whose sides are open sets such that m(U j ) < m(Tj) + E2- j .
If U == u U j , then U is open and m(U) < L m(U j ) < m(E) + 2E. This
proves the first equation in part (a); the second one, and part (b), then follow as in the
proofs of Theorems 1.18 and 1.19. Next, if m( E) < 00, then m(U j ) < 00 for all j.
Since the sides of U j are countable unions of open intervals, by taking suitable finite
subunions we obtain rectangles '0 C U j whose sides are finite unions of intervals
such that m('0) > m(U j ) - E2- j . If N is sufficiently large, then, we have
N N 00
m(E\U'0) < m(UU j \ '0) +m( U U j ) < 2E
1 1 N+I
and
N 00
m(U'0\E) < m(UUj\E) <E,
1 1
so that m(E U '0) < 3E. Since U '0 can be expressed as a finite disjoint union
of rectangles whose sides are intervals, we have proved (c). .
2.41 Theorem. If f EL I ( m) and E > 0, there is a simple function cP = L a j X Rj'
where each Rj is a product of intervals, such that J If - cPl < E, and there is a
continuous function 9 that vanishes outside a bounded set such that J If - gl < E.
THE n-DIMENSIONAL LEBESGUE INTEGRAL 71
Proof. As in the proof of Theorem 2.26, approximate f by simple functions,
then use Theorem 2.40c to approximate the latter by functions cjJ of the desired
form. Finally, approximate such cjJ's by continuous functions by applying an obvious
generalization of the argument in the proof of Theorem 2.26. .
2.42 Theorem. Lebesgue measure is translation-invariant. More precisely, for a E
]Rn define Ta : ]Rn ]Rn by Ta(X) == X + a.
a. If E E [.;n, then Ta(E) E [.;n and m(Ta(E)) == m(E).
b. If f : ]Rn C is Lebesgue measurable, then so is f 0 Ta. Moreover, if either
f > 0 or f E Ll(m), then J(f oTa)dm = f fdm.
Proof. Since Ta and its inverse T -a are continuous, they preserve the class of
Borel sets. The formula m(Ta(E)) == m(E) follows easily from the one-dimensional
result (Theorem 1.21) if E is a rectangle, and it then follows for general Borel sets
since m is determined by its action on rectangles (the uniqueness in Theorem 1.14).
In particular, the collection of Borel sets E such that m( E) == 0 is invariant under
Ta. Assertion (a) now follows immediately.
If f is Lebesgue measurable and B is a Borel set in C, we have f-l (B) == E uN
where E is Borel and m(N) == O. But T;;1 (E) is Borel and m( T;;1 (N)) == 0, so
(f 0 Ta)-1 (B) E [.;n and f is Lebesgue measurable. The equality J(f 0 Ta) dl-L ==
J f dl-L reduces to the equality m( T -a (E)) == m( E) when f == XE. It is then true for
simple functions by linearity, and hence for nonnegative measurable functions by the
definition of the integral. Taking positive and negative parts of real and imaginary
parts then yields the result for f E Ll ( m). .
Let us now compare Lebesgue measure on JRn to the more naive theory of n-
dimensional measure usually found in advanced calculus books. In this discussion,
a cube in ]Rn is a Cartesian product of n closed intervals whose side lengths are all
eq ual.
For k E Z, let Qk be the collection of cubes whose side length is 2- k and whose
vertices are in the lattice (2- k z)n. (That is, rr[aj, b j ] E Qk iff 2 k aj and 2 k b j are
integers and b j - aj = 2- k for all j.) Note that any two cubes in Qk have disjoint
interiors, and that the cubes in Qk+l are obtained from the cubes in Qk by bisecting
the sides.
If E c JRn, we define the inner and outer approximations to E by the grid of
cubes Qk to be
A (E,k) = U{Q E Qk: Q c E},
A (E,k) == U{Q E Qk: QnE i= 0}.
(See Figure 2.2.) The measure of A (E, k) (in either the naive geometric sense or the
Lebesgue sense) is just 2- nk times the number of cubes in Qk that lie in A (E, k),
and we denote it by m( A (E, k)); likewise for m( A (E, k)). Also, the sets A (E, k)
increase with k while the sets A ( E, k) decrease, because each cube in Qk is a union
of cubes in Qk+ 1. Hence the limits
/1; (E) == lim m( A (E, k)),
k-+oo
/1; (E) = lim m( A (E,k))
k ---+ 00
72 INTEGRATION
I . . . I . . , I
...........-......................... ..
I I I I I . .
... ... ... '. ... ... I. ... ... .'... ... ...
. . .
. . . .
Fig. 2.2 Approximations to the inner and outer content of a set.
exist. They are called the inner and outer content of E, and if they are equal, their
common value /1;(E) is the Jordan content of E.
Two comments: First, Jordan content is usually defined using general rectangles
whose sides are intervals rather than our dyadic cubes, but the result is the same.
Second, although all the definitons above make sense for arbitrary E c jRn, the
theory of Jordan content is meaningful only if E is bounded, for otherwise /1; (E)
always equals 00.
Let
00
00
A (E) == U A (E,k),
1
A (E) = n A (E,k).
1
Then A (E) c E c A (E), A (E) and A (E) are Borel sets, and /1; (E) == m( A (E))
and /1; (E) == m( A (E)). Thus the Jordan contentofEexists iffm( A (E)\ A (E)) = 0,
which implies that E is Lebesgue measurable and m(E) == (E).
To clarify further the relationship between Lebesgue measure and the approxi-
mation process leading to Jordan content, we establish the following lemma. (The
second part of the lemma will be used later.)
2.43 Lemma. If U c JRn is open, then U == A (U). Moreover, U is a countable
union of cubes with disjoint interiors.
Proof. If x E U, let b == inf{ly - xl : Y tj. U}, which is positive since U is open.
If Q is a cube in Qk that contains x, then every y E Q is at a distance at most 2- k Vii
from x (the worst case being when IXj - Yj I = 2- k for all j), so we will have Q C U
provided k is large enough so that 2- k Vii < b. But then x E A (U, k) c A (U).
This shows that A (U) == U, and the second assertion follows by writing A (U) ==
A (U,O) u U[ A (U,k) \ A (U,k -1)]. A (U,O) is a (countable) union of cubes in
Qo, and for k > 1, the closure of A (U, k) \ A (U, k - 1) is a (countable) union of
cubes in Qk. These cubes all have disjoint interiors, and the result follows. .
THE n-DIMENSIONAL LEBESGUE INTEGRAL 73
Lemma 2.43 immediately implies that the Lebesgue measure of any open set is
equal to its inner content. On the other hand, suppose that F c JRn is compact.
We can find a large cube, say Qo == {x : max IXjl < 2 M }, whose interior int(Qo)
contains F. If Q E Qk and Q c Qo then either Q n F i= 0 or Q C (Qo \ F), so
m( A (F,k)) + m( A (Qo \ F,k)) == m(Qo). Letting k ---t 00, we see that /1; (F) +
r;, ( Qo \ F) == m(Qo). But Qo \ F is the union of the open set int( Qo) \ F and the
boundary of Qo, which has content zero, so that /1; ( Qo \ F) == /1; (int ( Qo) \ F) ==
m( Qo \ F). It follows that the Lebesgue measure of any compact set is equal to its
outer content.
Combining these results with Theorem 2.40a, we can see exactly how Lebesgue
measure compares to Jordan content. The Jordan content of E is defined by approx-
imating E from the inside and the outside by finite unions of cubes. The Lebesgue
measure of E, on the other hand, is given by a two-step approximation process:
First one approximates E from the outside by open sets and from the inside by
compact sets, and then approximates the open sets from the inside and the compact
sets from the outside by finite unions of cubes. The Lebesgue measurable sets are
precisely those for which these outer-inner and inner-outer approximations give the
same answer in the limit. (Cf. Exercise 19 in gl.4.)
We now investigate the behavior of the Lebesgue integral under linear transfor-
mations. We identify a linear map T : jRn ---t jRn with the matrix (T ij ) == (ei . Tej)
where {ej} is the standard basis for }Rn. We denote the determinant of this matrix
by det T and recall that det(T 0 S) == (det T) (det S). Furthermore, we employ the
standard notation G L( n, JR) (the "general linear" group) for the group of invertible
linear transformations of }Rn. We shall need the fact from elementary linear algebra
that every T E G L( n, jR) can be written as the product of finitely many transfor-
mations of three "elementary" types. The first type multiplies one coordinate by a
nonzero constant e and leaves the others fixed; the second type adds a multiple of one
coordinate to some other coordinate and leaves all but the latter fixed; the third type
interchanges two coordinates and leaves the others fixed. In symbols:
T I ( X I, . . . , x j , . . . , X n) == (X 1 , . . . , ex j, . . . , X n ) (e i= 0),
T 2 ( Xl, . . . , x j, . . . , X n) == (X I , . . . , x j + ex k, . . . , X n ) (k i= j),
T3(XI'... ,Xj,... ,Xk,... ,X n ) == (Xl,... ,Xk,... ,Xj,... ,X n ).
That every invertible transformation is a product of transformations of these three
types is simply the fact that every nonsingular matrix can be row-reduced to the
identity matrix.
2.44 Theorem. Suppose T E GL(n, JR).
a. If f is a Lebesgue measurable function on JRn, so is f 0 T. If f > 0 or
f E LI(m), then
(2.45) J f(x)dx = IdetTI J foT(x)dx.
b. If E E [.;n, then T(E) E [.;n and m(T(E)) == I det Tlm(E).
74 INTEGRATION
Proof. First suppose that f is Borel measurable. Then f 0 T is Borel measurable
since T is continuous. If (2.45) is true for the transformations T and S, it is also true
for T 0 S, since
J f (x) dx = I det T I J f 0 T (x) dx = I det T II det 8 I J (f 0 T) 0 8 (x) dx
= I det(T 0 8)1 J f 0 (T 0 8)(x) dx.
Hence is suffices to prove (2.45) when T is of the types T 1 , T 2 , T3 described above.
But this is a simple consequence of the Fubini -Tonelli theorem. For T3 we interchange
the order of integration in the variables Xj and Xk, and for T 1 and T 2 we integrate
first with respect to x j and use the one-dimensional formulas
J f(t) dt = lei J f(ct) dt,
J f(t + a) dt = J f(t) dt,
which follow from Theorem 1.21. Since it is easily verified that det T 1 = c, det T 2 =
1, and det T3 == -1, (2.45) is proved. Moreover, if E is a Borel set, so is T( E) (since
T- 1 is continuous), and by taking f = XT(E), we obtain m(T(E)) == I det Tlm(E).
In particular, the class of Borel null sets is invariant under T and T- 1 , and hence so
is .e n. The result for Lebesgue measurable functions and sets now follows as in the
proof of Theorem 2.42. .
2.46 Corollary. Lebesgue measure is invariant under rotations.
Proof. Rotations are linear maps satisfying TT* = I where T* is the transpose
of T. Since det T == det T*, this condition implies that I det TI = 1. .
Next we shall generalize Theorem 2.44 to differentiable maps. This result will
not be used elsewhere in this book and may be omitted on a first reading. We shall
prove a generalization of it, by somewhat different methods, in g 11.2.
Let G = (gl,..., gn) be a map from an open set 0 C JRn into JRn whose
components gj are of class C 1 , i.e., have continuous first-order partial derivatives.
We denote by DxG the linear map defined by the matrix ((8g i j8xj)(x)) of partial
derivatives at x. (Observe that if G is linear, then DxG = G for all x.) G is called
a C1 diffeomorphism if G is injective and DxG is invertible for all x E O. In this
case, the inverse function theorem guarantees that G-1 : G(O) ---t 0 is also a C1
diffeomorphism and that Dx(G- 1 ) = [DC-l(x)G]-l for all x E G(O).
2.47 Theorem. Suppose that 0 is an open set in ]Rn and G : 0 ---t JRn is a C1
diffeomorphism.
a. If f is a Lebesgue measurable function on G(O), then foG is Lebesgue
measurable on 0. If f > 0 or f E L 1 (G(0), m), then
( f(x) dx = ( f 0 G(x)1 det DxGI dx.
ic(n) in
b. If E c nand E E .en, then G(E) E .en and m(G(E)) == IE I det DxGI dx.
THE n-DIMENSIONAL LEBESGUE INTEGRAL 75
Proof. It suffices to consider Borel measurable functions and sets. Since G and
G-l are both continuous, there are no measurability problems in this case, and the
general case follows as in the proof of Theorem 2.42.
A bit of notation: For x E IRn and T == (T ij ) E G L( n, JR), we set
!lxll == mx IXj I,
lJn
n
IITII == mx I: ITijl.
1 <<n
- - j=l
We then have IITxl1 < IITllllxll, and {x : IIx - all < h} is the cube of side length
2h centered at a.
Let Q be a cube in 0, say Q == {x : IIx - all < h}. By the mean value theorem,
gj(x) - gj(a) == j(Xj - aj)(8gj8xj)(Y) for some y on the line segmentjoning
x and a, so that for x E Q, IIG(x) - G(a)11 < h(SUPYEQ IIDyGII). In other words,
G( Q) is contained in a cube of side length SUPYEQ IIDyGil times that of Q, so that
by Theorem 2.44, m(G(Q)) < (suPYEQ IIDyGII)nm(Q). 1fT E GL(n, IR), we can
apply this formula with G replaced by T- 1 0 G together with Theorem 2.44 to obtain
(2.48)
m(G(Q)) == IdetTlm(T-l(G(Q)))
< I det TI (sup IIT- 1 DyGII) n m(Q).
yEQ
Since DyG is continuous in y, for any E > 0 we can choose b > 0 so that
II (D z G)-l DyGIIn < 1 + E if Y, z E Q and Ily - zll < b. Let us now subdivide
Q into subcubes Q1, . . . Q N whose interiors are disjoint, whose side lengths are at
most 8, and whose centers are Xl,.. . XN. Applying (2.48) with Q replaced by Qj
and with T == Dx. G, we obtain
J
N
m(G(Q)) < I:m(G(Qj))
1
N n
< I:ldetDxjGI( sup II(D xj G)-lD y GII) m(Qj)
1 yEQJ
N
< (l+E)I:ldetD xj Glm(Qj).
1
This last sum is the integral of I det DXJ GIXQj (x), which tends uniformly on
Q to I det DxGI as b 0 since DxG is continuous. Thus, letting b ---t 0 and E 0,
we find that
m(G(Q)) < h I detDxGIdx.
We claim that this estimate holds with Q replaced by any Borel set in O. Indeed, if
U c 0 is open, by Lemma 2.43 we can write U == U Qj where the Qj'S are cubes
76 INTEGRATION
with disjoint interiors. Since the boundaries of the cubes have Lebesgue measure
zero, we have
(X) (X)
m(G(U)) < I:m(G(Qj)) < I: 1 I detDxGIdx = 1 1 detDxGIdx.
1 1 Qj U
Moreover, if E c n is any Borel set of finite measure, by Theorem 2.40 there is a
decreasing sequence of open sets U j C 0 of finite measure such that E c n U j
and m(n U j \ E) == O. Hence by the dominated convergence theorem,
(X)
m(G(E)) < m(G(nUj)) =lirnm(G(Uj))
1
< lirn 11 det DxGI dx = 1 1 det DxGI dx.
U j E
Finally, since m is a-finite, it follows from this that m(G(E)) < IE I det DxGI dx
for any Borel set E c O.
If f = L ajXA j is a nonnegative simple function on G(O), we therefore have
1 f(x) dx == I: ajm(Aj) < I: aj 1 I det DxGI dx
C(O) C-l (A j )
= In f 0 G(x)1 det DxGI dx.
Theorem 2.10 and the monotone convergence theorem then imply that
r f(x)dx < r foG(x)ldetDxGldx
iC(D) io
for any nonnegative measurable f. But the same reasoning applies with G replaced
by G-1 and f replaced by foG, so that
10 f 0 G(x)1 det DxGI dx
< r foGoG- 1 (x)ldetD a -l(xPlldetD x G- 1 Idx= r f(x)dx.
ic(o)) ic(o)
This establishes (a) for f > 0, and the case fELl follows immediately. Since (b)
is just the special case of (a) where f = XC(E), the proof is complete. .
Exercises
53. Fill in the details of the proof of Theorem 2.41.
54. How much of Theorem 2.44 remains valid if T is not invertible?
INTEGRATION IN POLAR COORDINATES 77
55. Let E == [0,1] x [0,1]. Investigate the existence and equality of JE f dm 2 ,
J0 1 Jo 1 f(x, y) dx dy, and Jo 1 Jo 1 f(x, y) dy dx for the following f.
a. f ( x, y) == (x 2 - y2) (x 2 + y2) - 2 .
b. f(x, y) == (1 - xy)-a (a > 0).
c. f(x, y) == (x - )-3 if 0 < y < Ix - I, f(x, y) == 0 otherwise.
56. If f is Lebesgue integrable on (0, a) and g(x) == J: t- I f(t) dt, then 9 is
integrable on (0, a) and Jo a g(x) dx == Jo a f(x) dx.
57. Show that Jo oo e- sx x- 1 sin x dx == arctan( 8- 1 ) for 8 > 0 by integrating
e- SXY sin x with respect to x and y. (It may be useful to recall that tan( - B) ==
(tan B) -1. Cf. Exercise 31d.)
58. Show that J e- sx x- 1 sin 2 x dx == i 10g(1 + 48- 2 ) for 8 > 0 by integrating
e- SX sin 2xy with respect to x and y.
59. Let f(x) == x-I sin x.
a. Show that Jo oo If(x) I dx == 00.
b. Show that limb---too J f( x) dx == 7r by integrating e- XY sin x with respect
to x and y. (In view of part (a), some care is needed in passing to the limit as
b ---t 00.)
60. r(x)r(y)jr(x + y) == Jo 1 t X - 1 (1 - t)Y-1 dt for x, y > O. (Recall that r was
defined in 32.3. Write r(x)f(y) as a double integral and use the argument of the
exponential as a new variable of integration.)
61. If f is continuous on [0,(0), for a > 0 and x > 0 let
1 (X
I",!(x) = f(a) J o (x - t)"'-l !(t) dt.
10: f is called the ath fractional integral of f.
a. 10:+{3f == 10: (1{3f) for all a, f3 > o. (Use Exercise 60.)
b. If n E N, Inf is an nth-order anti derivative of f.
2.7 INTEGRATION IN POLAR COORDINATES
The most important nonlinear coordinate systems in JR2 and JR3 are polar coor-
dinates (x == r cos B, y == r sin B) and spherical coordinates (x == r sin cP cos B,
y == r sin cP sin B, z == r cos cP). Theorem 2.47, applied to these coordinates, yields the
familiar formulas (loosely stated) dx dy == r dr dB and dx dy dz == r 2 sin cP dr dB dcjJ.
Similar coordinate systems exist in higher dimensions, but they become increasingly
complicated as the dimension increases. (See Exercise 65.) For most purposes,
however, it is sufficient to know that Lebesgue measure is effectively the product of
the measure rn-1 dr on (0,00) and a certain "surface measure" on the unit sphere
(dB for n == 2, sin q; dB dq; for n == 3).
78 INTEGRATION
Our construction of this surface measure is motivated by a familiar fact from plane
geometry. Namely, if So is a sector of a disc of radius r with central angle () (i.e.,
the region in the disc contained between the two sides of the angle), the area m( So )
is proportional to (); in fact, m(So) == !r 2 (). This equation can be solved for () and
hence used to define the angular measure () in terms of the area m( So). The same
idea works in higher dimensions: We shall define the surface measure of a subset of
the unit sphere in terms of the Lebesgue measure of the corresponding sector of the
unit ball.
We shall denote the unit sphere {x E JRn : Ixl = 1} by sn-1. If x E ]Rn \ {O},
the polar coordinates of x are
r == Ixl E (0, (0),
I X S n-1
x==R E .
The map <I>(x) == (r, x') is a continuous bijection from JRn \ {O} to (0,00) X sn-1
whose (continuous) inverse is <I>-1 (r, x') == rx ' . We denote by m* the Borel
measure on (0,00) X sn-1 induced by <I> from Lebesgue measure on jRn, that is,
m*(E) == m(<I>-l(E)). Moreover, we define the measure p = pn on (0, (0) by
p(E) == IE rn-1 dr.
2.49 Theorem. There is a unique Borel measure a == an-Ion sn-1 such that
m* == p x a. If f is Borel measurable on JRn and f > 0 or fELl (m), then
(2.50)
( f(x) dx = roo ( f(rx')r n - l da(x') dr.
JIRn Jo J Sn-l
Proof. Equation (2.50), when f is a characteristic function of a set, is merely a
restatement of the equation m* == p x a, and it follows for general f by the usual
linearity and approximation arguments. Hence we need only to construct a.
If E is a Borel set in sn-1, for a > 0 let
Ea == <I>-1 ((0, a] x E) == {rx' : 0 < r < a, x' E E}.
If (2.50) is to hold when f == XE 1 , we must have
m(El) = {l ( rn-l da(x') dr = a(E) (l rn-l dr = a(E) .
Jo J E Jo n
We therefore define a(E) to be n . m(E 1 ). Since the map E E 1 takes Borel sets
to Borel sets and commutes with unions, intersections, and complements, it is clear
that a is a Borel measure on sn-1. Also, since Ea is the image of E 1 under the
map x ax, it follows from Theorem 2.44 that m(Ea) = a n m(E 1 ), and hence, if
o < a < b,
b n - an I b
m* (( a, b] x E) = m(Eb \ Ea) == a(E) == a(E) r n - 1 dr
n a
== p x a (( a, b] x E).
INTEGRATION IN POLAR COORDINATES 79
Fix E E 13 Sn-l and let A E be the collection of finite disjoint unions of sets of the
form (a, b] x E. By Proposition 1.7, A E is an algebra on (0,00) x E that generates
the a-algebra ME == {A x E : A E 13(0,00)}. By the preceding calculation we
have m* == p x a on AE, and hence by the uniqueness assertion of Theorem
1.14, m* == p x a on ME. But U{ME : E E 13 s n-l} is precisely the set of Borel
rectangles in (0, (0) X sn-1, so another application of the uniqueness theorem shows
that m* == p x a on all Borel sets. .
Of course, (2.50) can be extended to Lebesgue measurable functions by consider-
ing the completion of the measure a. Details are left to the reader.
2.51 Corollary. If f is a measurable function on JRn, nonnegative or integrable,
such that f(x) == g(lxl)forsomefunction g on (0, (0), then
J f(x) dx = a(sn-l) 1= g(r)r n - 1 dr.
2.52 Corollary. Let c and G denote positive constants, and let B == {x E JRn
Ixl < c}. Suppose that f is a measurable function on JRn.
a. If If(x)1 < Glxl- a on B for some a < n, then f E L 1 (B). However, if
If(x)1 > Glxl- n on B, then f tt L 1 (B).
b. If If(x)1 < Glxl- a on BC for some a > n, then f E Ll(BC). However, if
If(x)1 > Glxl- n on BC, then f tt L 1 (BC).
Proof. Apply Corollary 2.51 to Ixl-axB and Ixl-axBc. .
We shall compute a(sn-1) shortly. Of course, we know that a(SI) == 21T; this
is just the definition of 21T as the ratio of the circumference of a circle to its radius.
Armed with this fact, we can compute a very important integral.
2.53 Proposition. If a > 0,
r ( 1T ) n/2
IR.n exp( -alxI 2 ) dx = a .
Proof. Denote the integral on the left by In. For n == 2, by Corollary 2.51 we
have
1 2 = 27r 1= re- ar2 dr = - ( : ) e-ar21 = : .
Since exp( -alxI 2 ) == TI exp( -ax]), Tonelli's theorem implies that In == (I 1 )n.
In particular, II == (12)1/2, so In == (I2)n/2 == (1T/a)n/2. .
Once we know this result, the device used in its proof can be turned around to
compute a(sn-l) for all n in terms of the gamma function introduced in 32.3.
21Tn /2
2.54 Proposition. a(sn-l) = r(n/2)'
80 INTEGRATION
Proof. By Corollary 2.51, Proposition 2.53, and the substitution s == r 2 ,
Jrn/2 = r e-1x12 dx = a(sn-l) ('X> r n - 1 e- r2 dr
JRn Jo
== a(sn-l) (ex) (n/2)-1 -8 d == a(sn-l) f ( n )
2 Jo s e s 2 2.
.
Jrn/2
2.55 Corollary. IIBn = {x E]Rn: Ixl < I}, thenm(B n ) = e ) '
f 2n + 1
Proof. m(Bn) == n- 1 a(Sn-l) by definition of a, and !nf(!n) == f(!n + 1)
by the functional equation for the gamma function. .
We observed in 32.3 that f(n) == (n - 1)!' Now we can also evaluate the gamma
function at the half-integers:
2.56 Proposition. f (n + !) == (n - !) (n - ) . . . ( ! ) J1r.
Proof. We have f( n + !) == (n - !) (n - ) . . . (! )f(!) by the functional
equation, and by Proposition 2.53 and the substitution s == r 2 ,
r(!) = roc; s-1/2e- s ds = 2 roc; e- r2 dr = J OC; e-r2 dr = Vii.
Jo Jo -ex)
.
An amusing consequence of Proposition 2.56 and the formula f (n) == (n - 1)!
is that the surface measure of the unit sphere and the Lebesgue measure of the unit
ball in JRn are always rational multiples of integer powers of Jr, and the power of Jr
increases by 1 when n increases by 2.
Exercises
62. The measure a on sn-l is invariant under rotations.
63. The technique used to prove Proposition 2.54 can also be used to integrate any
polynomial over sn-l. In fact, suppose f(x) == Il x;j (Qj E N U {O}) is a
monomial. Then J f da == 0 if any Qj is odd, and if all OJ'S are even,
Jfd - 2f({3l)... f({3n) h {3 . _ Qj + 1
a - f ({3l + . . . + {3n)' were J - 2 .
64. For which real values of a and b is Ixl a Ilog Ixllb integrable over {x E JRn
Ixl < }? Over {x E JRn : Ixl > 2}?
65. Define G : JRn JRn by G(r, cPl'...' cPn-2, (}) == (Xl'...' x n ) where
Xl == r cos <PI, x2 == r sin <PI cos <P2,
xn-l == r sin <PI . . . sin <Pn-2 cos (),
X3 == r sin <PI sin <P2 cos <P3, . . . ,
X n == r sin <PI . . . sin <Pn-2 sin ().
NOTES AND REFERENCES 81
a. G maps JRn onto JRn, and IG(r, cjJl,..., cjJn-2, B)I = Irl.
b d t D G n-l. n-2 . n-3 .
. e (r,<Pl,...,<Pn-2,B) ==r SIn lSIn 2...SInn-2.
c. Let 0 = (0, (0) x (0,7r)n-2 X (0, 27r). Then GIO is a diffeomorphism and
m(JRn \ G(O)) = O.
d. Let F(cjJl,...,cjJn-2,B) = G(1,cjJl,...,cjJn-2,B) and 0' == (0,7r)n-2 X
(0, 27r). Then (FIO')-l defines a coordinate system on sn-l except on a a-null
set, and the measure a is given in these coordinates by
da(cjJl,... cjJn-2, B) = sin n - 2 cjJl sin n - 3 cjJ2... sincjJn-2 dcjJl ... dcjJn-2 dB.
2.8 NOTES AND REFERENCES
The history of modem measure and integration theory can fairly be said to have
begun with the publication of Lebesgue's thesis [91] in 1902, although of course
Lebesgue was building on earlier works of other mathematicians, and some of his
results were obtained independently by Vitali and W. H. Young. The theory of the
Lebesgue integral was extensively developed by a number of mathematicians in the
ensuing decade, during which time most of the results in this chapter were first
derived. In particular, Lebesgue himself proved the dominated convergence theorem
and deduced the monotone convergence theorem from it in the case when the limit
function f is integrable; when J f = 00 the latter theorem is due to B. Levi.
Lebesgue [92] studied more general measures on JRn (which he called "additive
set functions") in connection with the problem of generalizing the notion of indefinite
integrals to functions of several variables. Radon [111] then developed the theory of
integration with respect to what we now call regular Borel measures on JRn, which
in particular yields the Lebesgue-Stieltjes integrals when n == 1. Finally, in 1915
Frechet [53] pointed out that many of Radon's ideas would work in the general setting
of sets equipped with a-algebras. Thus was abstract measure and integration theory
born. It continued to develop until, by about 1950, it had assumed more or less the
form in which we know it today. The first systematic modem treatise on the subject
is Halmos [62].
For accounts of the prehistory and early history of the Lebesgue integral, see
Hawkins [70]. References concerning the later development of the subject can be
found in Saks [128] and Hahn and Rosenthal [61].
We have adopted the point of view of beginning with measures and deriving
integrals from them. However, it is also possible to go the other way, a procedure
first developed by Daniell [29]. Roughly speaking, one starts with an "elementary
integral": a linear functional 1 defined on a suitable space of functions that satisfies
some mild continuity conditions and is positive in the sense that l(f) > 0 whenever
f > 0 (for example, the Riemann integral on the space of continuous functions on
[a, b]). The Daniell theory provides an extension of 1 to a functional 1 defined on a
larger class of functions. Under appropriate hypotheses, the collection ]v( of sets E
such that XE is in the domain of 1 is then a a-algebra, the function J1( E) == 1 (XE) is a
82 INTEGRATION
measure on M, and I is integration with respect to J1. See Royden [121] for a concise
account of the Daniell theory and Pfeffer [108] for a comprehensive treatment, as
well as Konig [86] for a somewhat different approach.
The Lebesgue theory is not the last word regarding integration on JR. Motivated
partly by the problem of establishing the fundamental theorem of calculus in the
greatest possible generality (about which we shall say more in 33.6), a number of
theories of integration have been developed that include not only the Lebesgue integral
but also certain "conditionally convergent" integrals. That is, they assign a meaning to
J f (x) dx for certain measurable functions f : JR JR such that J f+ = J f- = 00,
but for which the cancellation of positive and negative values in some way yields a
reasonable definition of J f(x)dx. (A standard example is f(x) = x- 1 sinx; see
Exercise 59.) The first procedures for defining such integrals, due to Denjoy and
Perron, were quite complicated. However, in the late 1950s, Henstock and Kurzweil
independentl y discovered a modification of the classical Riemann integral that yields
the same results.
The Henstock-Kurzweil integral on a bounded interval [a, b] is defined as follows.
A tagged partition of [a, b] is a finite sequence {Xj}b" such that a == Xo < ... <
x N = b (Le., a partition in the sense of 32.3) together with another finite sequence
{tj}f such that tj E [Xj-l,Xj]. A gauge on [a, b] is an (arbitrary!) function
8 : [a, b] (0,00). If P is a tagged partition and 8 is a gauge, P is called 6-fine if
Xj - Xj-l < 8(tj) for all j. The compactness of [a, b] easily implies that for any
gauge 8 there is a 8-fine tagged partition of [a, b].
Now suppose f is a real-valued function on [a, b]. If P is a tagged partition of
[a, b], the corresponding Riemann sum for f is pf = I: f(tj)(xj - Xj-l). The
function f is called Henstock-Kurzweil integrable on [a, b] if there exists c E JR
with the following property: For any E > 0 there is a gauge 8 E such that if P is any
8 E -fine tagged partition of [a, b], then Ip f - cl < E. In this case the number c is
unique, and it is called the Henstock-Kurzweil integral of f. The ordinary Riemann
integral of f, in contrast, can be defined in exactly the same way except that one
allows only constant gauges.
It turns out that the Henstock - Kurzweil integral coincides with the integrals of
Denjoy and Perron. In particular, it coincides with the Lebesgue integral for nonneg-
ative functions, but its domain includes many functions that have both positive and
negative values and are not in Ll ([a, b]). The definition of the Henstock-Kurzweil
integral is easily extended to unbounded intervals. It also admits an n-dimensional
version: One simply defines an n-interval to be a product of none-dimensional
intervals and a tagged partition of an n-interval I to be a finite collection {I j } of
n-intervals with disjoint interiors whose union is I together with a choice of tj E Ij
for each j; the definition of the integral then proceeds as above.
A good case can be made that the Henstock - Kurzweil integral ought to be the theory
of integration on JRn that is generally taught to students, not just because of its added
generality but (more cogently) because its definition is relatively simple and requires
no measure theory to get started. On the other hand, it does not generalize as readily
to spaces other than JRn, and although it can be developed in a rather abstract setting,
it loses much of its appealing simplicity there. Moreover, although conditionally
NOTES AND REFERENCES 83
convergent integrals that cannot be obtained by a simple limiting procedure from
absolutely convergent ones do turn up now and then in certain problems, their utility
is not sufficently broad to make a compelling case for their study by nonspecialists.
In any case, in this book we shall content ourselves with the Lebesgue integral
and the general theory of measure and integration of which it is a part. Readers who
wish to learn more about the Henstock- Kurzweil integral can find a brief introduction
in Bartle [13] and detailed treatments in McLeod [99] and Pfeffer [109]. See also
Gordon [57] for a comprehensive account of the Denjoy, Perron, and Henstock-
Kurzweil integrals on [a, b], and Henstock [72] for a development of the theory in a
more abstract setting.
32.1: A Borel isomorphism between two measurable spaces (X, M) and (Y, N)
is a bijection f : X Y such that f-l is a bijection from N to M. Unlike the
related notion of homeomorphism for topological spaces (see Chapter 4) and notions
of isomorphism in various other categories, the notion of Borel isomorphism is of
limited utility, because it is too easy for two spaces to be Borel isomorphic. That
this is so is clearly indicated by the single major theorem in the subject, due to
Kuratowski:
Suppose that (X, M) is Borel isomorphic to a Borel subset E of a complete
separable metric space Y (equipped with the a-algebra {F E 13 y : FeE}).
Then either X is countable and M == P(X), or X is Borel isomorphic to
(JR, 13 IR ).
A proof of this theorem, as well as much additional information about Borel sets, can
be found in Srivastava [139].
There is a hierarchy of Borel measurable functions on a metric space that corre-
sponds roughly to the hierarchy of Borel sets (open and closed, Fa and G 8 , etc.).
Namely, let Bo be the space of all continuous functions, and for each countable
ordinal Q define Ba recursively as follows. If Q has an immediate predecessor
{3, Ba is the set of all limits of pointwise convergent sequences in B{3; otherwise,
Ba == U{3<a B{3. Functions in Ba are said to be of Baire class Q. For example, if f
is everywhere differentiable on JR, f' is of Baire class 1.
Exercise 11 is a result from Lebesgue's first published paper. See Rudin [123] for
a discussion of it.
32.3: The blurring of the distinction between individual measurable functions
and equivalence classes of functions defined by almost-everywhere equality is often
convenient and rarely disastrous. The most common situations where some care is
needed involve the interplay of measurable and continuous functions (on JRn, say),
for a function that is equal a.e. to a continuous function will not be continuous in
general. See Zaanen [165] for a careful discussion of this point.
32.4: An interesting discussion of Egoroff's theorem, including some necessary
and sufficient conditions for almost uniform convergence, can be found in Bartle
[12]. For a simple proof of Lusin's theorem (Exercise 44) that does not depend on
Egoroff's theorem, see Feldman [43]. We shall prove a more general form of this
theorem in 37.2.
84 INTEGRATION
32.5: The original theorems of Fubini and Tonelli pertained to Lebesgue measure
in the plane. The theory of abstract product measures was developed independently
by several people in the 1930s; the construction of J1 x v presented here is that of
Hahn [60]. It is also possible to define a product measure on the product of an infinite
famil y { (X a, Ma, J1a) } a E A of measure spaces provided that J1a (X a) = 1 for all but
finitely many a; see Saeki [127], Halmos [62, 338], or Hewitt and Stromberg [76,
322]. We shall present a version of this result in 37.4 (Theorem 7.28).
Using the axiom of choice but not the continuum hypothesis, Sierpinski [134] has
proved the existence of a Lebesgue non measurable subset of JR.2 whose intersection
with any straight line contains at most two points. This should be compared with
Exercise 47 (which is also due to Sierpinski).
The following generalization of the notion of product measures is useful in a
number of situations: One is given a measurable space (X, M), a a-finite measure
space (Y, N, v), and a family {J1y : y E Y} of finite measures on X such that the
function y J1y(E) is measurable on Y for each E E M. One can then define a
measureAonXxYsuchthatJ fdA == JJ f(x,y)dJ1y(x)dv(y)forf E L+(XxY).
See Johnson [79].
32.6: Our proof of Theorem 2.47 follows J. Schwartz [131]. This theorem can
also be proved under slightly weaker hypotheses on the transformation G; see Rudin
[125, Theorem 7.26].
Signed Measures and
Diffe rentiation
The principal theme of this chapter is the concept of differentiating a measure v with
respect to another measure J1 on the same (]' -algebra. We do this first on the abstract
level, then obtain a more refined result when J1 is Lebesgue measure on JRn. When
the latter is specialized to the case n == 1, it joins with classical real-variable theory
to produce a version of the fundamental theorem of calculus for Lebesgue integrals.
In developing this program it is useful to generalize the notion of measure so as to
allow measures to assume negative or even complex values. There are three reasons
for this. First, in applications such "signed measures" can represent things such
as electric charge that can be either positive or negative. Second, the differentiation
theory proceeds more naturally in the more general seting. Finally, complex measures
have a functional-analytic significance that will be explained in Chapter 7.
3.1 SIGNED MEASURES
Let (X, M) be a measurable space. A signed measure on (X, M) is a function
v : M [-00,00] such that
. v(0) == 0;
. v assumes at most one of the values :f::oo;
. if {E j } is a sequence of disjoint sets in M, then v(U Ej) == E v(Ej),
where the latter sum converges absolutely if v(U Ej) is finite.
Thus every measure is a signed measure; for emphasis we shall sometimes refer to
measures as positive measures.
85
86 SIGNED MEASURES AND DIFFERENTIATION
Two examples of signed measures come readily to mind. First, if J11, J12 are
measures on JV( and at least one of them is finite, then v == J11 - J12 is a signed
measure. Second, if J1 is a measure on ]v( and f : X [-00,00] is a measurable
function such that at least one of J f+ dJ1 and J f- dJ1 is finite (in which case we
shall call f an extended J.L-integrable function), then the set function v defined by
v( E) == J E f dJ1 is a signed measure. In fact, we shall see shortly that these are
really the only examples: Every signed measure can be represented in either of these
two forms.
3.1 Proposition. Let v be a signed measure on (X, M). If {E j } is an increasing
sequence in M, then v(U Ej) = limj--+oo v(Ej). If {E j } is a decreasing sequence
in M and V(EI) is finite, then v(n Ej) = limj--+oo v(Ej).
The proof is essentially the same as for positive measures (Theorem 1.8) and is
left to the reader (Exercise 1).
If v is a signed measure on (X, M), a set E E M is called positive (resp. negative,
null) for v if v(F) > 0 (resp. v(F) < 0, v(F) == 0) for all F E M such that FeE.
(Thus, in the example v(E) == JE f dJ1 described above, E is positive, negative, or
null precisely when f > 0, f < 0, or f = 0 J1-a.e. on E.)
3.2 Lemma. Any measurable subset of a positive set is positive, and the union of
any countable family of positive sets is positive.
Proof. The first assertion is obvious from the definition of positivity. If PI, P 2 , . . .
are positive sets, let Qn = Pn \ U-l Pj. Then Qn C Pn, so Qn is positive. Hence
if E c U Pj, then v(E) == L v(E n Qj) > 0, as desired. .
3.3 The Hahn Decomposition Theorem. Ifv is a signed measure on (X, M), there
exist a positive set P and a negative set N for v such that PuN = X and PnN == 0.
If P', N' is another such pair, then P 6.P' (= N 6.N') is nuUfor v.
Proof. Without loss of generality, we assume that v does not assume the value
-00. (Otherwise, consider -v.) Let m be the supremum of v(E) as E ranges over
all positive sets; thus there is a sequence {P j } of positive sets such that v (Pj) m.
Let P = U Pj. By Lemma 3.2 and Proposition 3.1, P is positive and v(P) == m;
in particular, m < 00. We claim that N == X \ P is negative. To this end, we assume
that N is not negative and derive a contradiction.
First, notice that N cannot contain any nonnull positive sets. Indeed, if E c N is
positive and v(E) > 0, then E U P is positive and v(E UP) = v(E) + v(P) > m,
which is impossible.
Second, if A c Nand v(A) > 0, there exists B c A with v(B) > v(A). Indeed,
since A cannot be positive, there exists C c A with v( C) < 0; thus if B == A \ C
we have v(B) == v(A) - v(C) > v(A).
If N is not negative, then, we can specify a sequence of subsets {A j } of Nand
a sequence {nj} of positive integers as follows: nl is the smallest integer for which
there exists a set BeN with v(B) > nIl, and Al is such a set. Proceeding
SIGNED MEASURES 87
inductively, nj is the smallest integer for which there exists a set B c Aj -1 with
v(B) > V(Aj-l) + njl, and Aj is such a set.
Let A == n Aj. Then 00 > v(A) == limv(Aj) > E nj\ so nj 00
as j 00. But once again, there exists B c A with v(B) > v(A) + n- 1 for
some integer n. For j sufficiently large we have n < nj, and B C A j - 1 , which
contradicts the construction of nj and Aj. Thus the assumption that N is not negative
is untenable.
Finally, if pI, N ' is another pair of sets as in the statement of the theorem, we
have P \ P' c P and P \ P' C N ' , so that P \ P' is both positive and negative,
hence null; likewise for P' \ P. .
The decomposition X == PuN if X as the disjoint union of a positive set and a
negative set is called a Hahn decomposition for v. It is usually not unique (v-null
sets can be transferred from P to N or from N to P), but it leads to a canonical
representation of v as the difference of two positive measures.
To state this result we need a new concept: We say that two signed measures J1 and
v on (X, M) are mutually singular, or that v is singular with respect to J1, or vice
versa, if there exist E, F E M such that E n F == 0, E u F == X, E is null for J1,
and F is null for v. Informally speaking, mutual singularity means that J1 and v "live
on disjoint sets." We express this relationship symbolically with the perpendicularity
sIgn:
J1 1- v.
3.4 The Jordan Decomposition Theorem. If v is a signed measure, there exist
unique positive measures v+ and v- such that v == v+ - v- and v+ 1- v-.
Proof. Let X == PuN be a Hahn decomposition for v, and define v+(E) ==
v(E n P) and v- (E) == -v(E n N). Then clearly v == v+ - v- and v+ 1- v-. If
also v == J1+ - J1- and J1+ 1- J1-, let E, F E M be such that EnF == 0, EuF == X,
and J1+ (F) == J1- (E) == O. Then X == E u F is another Hahn decomposition for v,
so P 6.E is v-null. Therefore, for any A E M, J1+ (A) == J1+ (A n E) == v(A n E) ==
v(A n P) == v+ (A), and likewise v- == J1-. .
The measures v+ and v- are called the positive and negative variations of v,
and v == v+ - v- is called the Jordan decomposition of v, by analogy with the
representation of a function of bounded variation on JR. as the difference of two
increasing functions (see 33.5). Furthermore, we define the total variation of v to
be the measure I v I defined by
Ivl == v+ + v-.
It is easily verified that E E M is v-null iff Ivl(E) == 0, and v 1- J1 iff Ivl 1- J.-L iff
v+ 1- J1 and v- 1- J1 (Exercise 2.)
We observe that if vomits the value 00 then v+(X) == v(P) < 00, so that v+ is a
finite measure and v is bounded above by v+ (X); similarly if vomits the value -00.
In particular, if the range of v is contained in JR., then v is bounded. We observe also
88 SIGNED MEASURES AND DIFFERENTIATION
that v is of the form v(E) == JE f dJ1, where J1 == Ivl and f == XP - XN, X = PuN
being a Hahn decomposition for v.
Integration with respect to a signed measure v is defined in the obvious way: We
set
LI(V) = LI(v+) n LI(v-),
J f dv = J f dv+ - J f dv- (f E £l(v)).
One more piece of terminology: a signed measure v is called finite (resp. u-finite)
if Ivl is finite (resp. a-finite).
Exercises
1. Prove Proposition 3.1.
2. If v is a signed measure, E is v-null iff I v I (E) = O. Also, if v and J1 are signed
measures, v .1. J1 iff Ivl .1. J1 iff v+ .1. J1 and v- .1. J1.
3. Let v be a signed measure on (X, M).
a. L I ( v) = L I ( I v I) .
b. If f E LI(v), I J f dvl < J If I dlvl.
c. If E E M, Ivl(E) == sup{IJEfdvl: If I < 1}.
4. If v is a signed measure and A, J1 are positive measures such that v == A - J1,
then A > v+ and J1 > v-.
5. If VI, v2 are signed measures that both omit the value +00 or -00, then I VI +v21 <
IVII + IV21. (Use Exercise 4.)
6. Suppose v(E) = J f dJ1 where J1 is a positive measure and f is an extended
J1-integrable function. Describe the Hahn decompositions of v and the positive,
negative, and total variations of v in terms of f and J1.
7. Suppose that v is a signed measure on (X, M) and E E M.
a. v+(E) = sup{v(F) : E E M, FeE} and v-(E) == - inf{v(F) : F E
M, FeE}.
b. Ivl (E) == sup{I:7I v (E j ) I : n E N, El'...' En are disjoint, and U7 Ej =
E}.
3.2 THE LEBESGUE-RADON-NIKODYM THEOREM
Suppose that v is a signed measure and J1 is a positive measure on (X, M). We say
that v is absolutely continuous with respect to J1 and write
v«J1
if v(E) = 0 for every E E M for which J1(E) = O. It is easily verified that v « J1 iff
Ivl « J1 iff v+ « J1 and v- « J1 (Exercise 8). Absolute continuity is in a sense the
THE LEBESGUE-RADON-NIKODYM THEOREM 89
antithesis of mutual singularity. More precisely, if v .1 J-l and v « J-l, then v == 0, for
if E and F are disjoint sets such that E u F == X and J-l(E) == Ivl (F) == 0, then the
fact that v « J-l implies that Ivl (E) == 0, whence Ivl == 0 and v == O. One can extend
the notion of absolute continuity to the case where J-l is a signed measure (namely,
v « J-l iff v « IJ-ll), but we shall have no need of this more general definition.
The term "absolute continuity" is derived from real-variable theory; see 33.5. For
finite signed measures it is equivalent to another condition that is obviously a form
of continuity.
3.5 Theorem. Let v be a finite signed measure and J-l a positive measure on (X, M).
Then v « J-l ifffor every E > 0 there exists 8 > 0 such that Iv(E)1 < E whenever
J-l( E) < 8.
Proof. Since v « J-l iff Ivl « J-l and Iv(E)1 < Ivl(E), it suffices to assume
that v == Ivl is positive. Clearly the E-8 condition implies that v « J-l. On the other
hand, if the E-8 condition is not satisfied, there exists E > 0 such that for all n E N
we can find En E M with J-l(En) < 2- n and v(En) > E. Let Fk == U En and
F == n Fk. Then J-l(Fk) < L: 2- n == 21-k, so J-l(F) == 0; but V(Fk) > E for all
k and hence, since v is finite, v(F) == lim V(Fk) > E. Thus it is false that v «J-l. .
If J-l is a measure and f is an extended J-l-integrable function, the signed measure
v defined by v( E) == IE f dJ-l is clearly absolutely continuous with respect to J-l; it is
finite iff f E L1(J-l). For any complex-valued f E L1(J-l), the preceding theorem can
be applied to Re f and 1m f, and we obtain the following useful result:
3.6 Corollary. If fELl (J-l), for every E > 0 there exists 8 > 0 such that
lIE f dJ-l1 < E whenever J-l( E) < 8.
We shall use the following notation to express the relationship v(E) == IE f dJ-l:
dv = f dJ-l.
Sometimes, by a slight abuse of language, we shall refer to "the signed measure
f dJ-l."
We now come to the main theorem of this section, which gives a complete picture
of the structure of a signed measure relative to a given positive measure. First, a
technical lemma.
3.7 Lemma. Suppose that v and J-l are finite measures on (X, M). Either v .1 J-l, or
there exist E > 0 and E E M such that J-l( E) > 0 and v > EJ-l on E (that is, E is a
positive set for v - EJ-l).
Proof. Let X == Pn U N n be a Hahn decomposition for v - n- 1 J-l, and let
P == U Pn and N == n N n == pc. Then N is a negative set for v - n- 1 J-l for all
n, i.e., 0 < v(N) < n- 1 J-l(N) for all n, so v(N) == o. If J-l(P) == 0, then v .1 fl. If
J-l(P) > 0, then J-l(Pn) > 0 for some n, and Pn is a positive set for v - n- 1 J-l. .
90 SIGNED MEASURES AND DIFFERENTIATION
3.8 The Lebesgue-Radon-Nikodym Theorem. Let v be a a-finite signed measure
and J-l a a-finite positive measure on (X, M). There exist unique a-finite signed
measures A, p on (X, M) such that
A .1 J-l, P « J-l, and v == A + p.
Moreover, there is an extended J-l-integrable function f : X -t IR such that dp == f dJ-l,
and any two such functions are equal J-l-a. e.
Proof. Case I: Suppose that v and J-l are finite positive measures. Let
3" = {f : X ---+ [0,00] : L f dp, < v(E) for all E E J\t( } .
:J is nonempty since 0 E :J. Also, if f, 9 E :J, then h == max(f, g) E :J, for if
A == {x : f(x) > g(x)}, for any E E M we have
r hdp,= r fdp,+ r gdp, < v(EnA)+v(E\A)=v(E).
lE lEnA lE\A
Let a == sup{f f dJ-l : f E :J}, noting that a < v(X) < 00, and choose a sequence
{fn} C :J such that f fn dJ-l a. Let gn == max(fl,... , fn) and f == sUPn fn.
Then gn E :J, gn increases pointwise to f, and f gn dJ-l > f f n dJ-l. It follows that
lirn f gn dJ-l == a and hence, by the monotone convergence theorem, that f E :J
and f f dJ-l == a. (In particular, f < 00 a.e., so we may take f to be real-valued
evecywhere. )
We claim that the measure dA == dv - f dJ-l (which is positive since f E 9J is
singular with respect to J-l. If not, by Lemma 3.7 there exist E E M and E > 0
such that J-l( E) > 0 and A > EJ-l on E. But then EXE dJ-l < dA == dv - f dJ-l, that
is, (f + eXE) dJ-l < dv, so f + EXE E :J and f(f + EXE) dJ-l == a + EJ-l(E) > a,
contradicting the definition of a.
Thus the existence of A, f, and dp == f dJ-l is proved. As for uniqueness, if also
dv == dA' + f'dJ-l, we have dA - dA' == (f' - f) dJ-l. But A - A' .1 J-l (see Exercise
9), while (f' - f) dJ-l « dJ-l; hence dA - dA' == (f' - f) dJ-l == 0, so that A == A' and
(by Proposition 2.23) f == f' J-l-a.e. Thus we are done in the case when J-l and v are
finite measures.
Case II: Suppose that J-l and v are a-finite measures. Then X is a countable
disjoint union of J-l-finite sets and a countable disjoint union of v-finite sets; by taking
intersections of these we obtain a disjoint sequence {A j } c M such that J-l(Aj) and
v(Aj) are finite forallj and X == U Aj. Define J-lj(E) == J-l(EnAj) and vj(E) ==
v(E n Aj). By the reasoning above, for each j we have dVj == dAj + fj dJ-lj where
Aj .1 J-lj. Since J-lj(Aj) == vj(Aj) == 0, we have Aj(Aj) == vj(Aj) - fA f dJ-lj == 0,
:J
and we may assume that fj == 0 on Aj. Let A == L: Aj and f == L: fj. Then
dv == dA + f dJ-l, A .1 J-l (see Exercise 9), and dA and f dJ-l are a-finite, as desired.
Uniqueness follows as before.
The General Case: If v is a signed measure, we apply the preceding argument to
v+ and v- and subtract the results. .
THE LEBESGUE-RADON-NIKODYM THEOREM 91
The decomposition v == A + P where A .1 J-l and p « J-l is called the Lebesgue
decomposition of v with respect to J-l. In the case where v « J-l, Theorem 3.8 says
that dv == f dJ-l for some f. This result is usually known as the Radon-Nikodym
theorem, and f is called the Radon-Nikodym derivative of v with respect to J-l. We
denote it by dv / dJ-l:
dv
dv = dp, dp,.
(Strictly speaking, dv / dJ-l should be construed as the as the class of functions equal
to f J-l-a.e.) The formulas suggested by the differential notation dJ-l/ dv are generally
correct. For example, it is obvious that d(V1 + V2) / dJ-l == ( dV 1 / dJ-l) + ( dV 2/ dJ-l), and
we have the chain rule:
3.9 Proposition. Suppose that v is a a-finite signed measure and J-l, A are a-finite
measures on (X, M) such that v « J-l and J-l « A.
a. If 9 E L 1 (v), then g( dv / dJ-l) E L1 (J-l) and
J 9 dv = J 9 : dp,.
b. We have v « A, and
dv
dA
dv dJ-l
--
dJ-l dA
A -a. e.
Proof. By considering v+ and v- separately, we may assume that v > O. The
equation f 9 dv == f g( dv / dJ-l) dJ-l is true when 9 == XE by definition of dv / dJ-l. It
is therefore true for simple functions by linearity, then for nonnegative measurable
functions by the monotone convergence theorem, and finally for functions in L 1 (v)
by linearity again. Replacing v, J-l by J-l, A and setting 9 == XE(dv /dJ-l), we obtain
v(E) = r dv dp, = r dv dp, d)"
} E dJ-l } E dJ-l dA
for all E E M, whence (dv / dA) == (dv / dJ-l) (dJ-l / dA) A-a.e. by Proposition 2.23. .
3.10 Corollary. If J-l « A and A « J-l, then (d)"'/ dJ-l) (dJ-l / dA) == 1 a.e. (with respect
to either A or J-l).
Nonexample: Let J-l be Lebesgue measure and v the point mass at 0 on (, JR).
Clearly v .1 J-l. The nonexistent Radon-Nikodym derivative dv / dJ-l is popularly
known as the Dirac 8-finction.
We conclude this section with a simple but important observation:
3.11 Proposition. If J-l1, . . . , J-ln are measures on (X, M), there is a measure J-l such
that J-lj « J-l for all j - namely, J-l == I: J-lj.
The proof is trivial.
92 SIGNED MEASURES AND DIFFERENTIATION
Exercises
8. v« J-l iff Iv I « J-l iff v+ « J-l and v- « J-l.
9. Suppose {Vj} is a sequence of positive measures. If Vj .1 J-l for all j, then
L: Vj .1 J-l; and if Vj « J-l for all j, then L: Vj « J-l.
10. Theorem 3.5 may fail when v is not finite. (Consider dv(x) == dx/x and
dJ-l(x) == dx on (0,1), or v = counting measure and J-l(E) == L:nEE 2- n on N.)
11. Let J-l be a positive measure. A collection of functions {/a}aEA C £1(J-l)
is called uniformly integrable if for every E > 0 there exists 8 > 0 such that
I IE la dpll < E for all ex E A whenever J-l(E) < 8.
a. Any finite subset of £1 (J-l) is uniformly integrable.
b. If {In} is a sequence in £1 (J-l) that converges in the L 1 metric to I E L 1 (J-l),
then {In} is uniformly integrable.
12. For j == 1,2, let J-lj,Vj be a-finite measures on (Xj,M j ) such that Vj «J-lj.
Then VI x V2 « J-ll x J-l2 and
d(Vl x V2) dVl dV2
d( ) (Xl, X2) = - d (x1)- d (X2).
J-ll x J-l2 J-l1 J-l2
13. Let X == [0,1], Jv( == '13[0,1]' m = Lebesgue measure, and J-l = counting measure
onM.
a. m« J-l but dm =I- I dJ-l for any I.
b. J-l has no Lebesgue decomposition with respect to m.
14. If v is an arbitrary signed measure and J-l is a a-finite measure on (X, M) such
that v « J-l, there exists an extended J-l-integrable function I : X [-00,00] such
that dv == I dJ-l. Hints:
a. It suffices to assume that J-l is finite and v is positive.
b. With these assumptions, there exists E E M that is a-finite for v such that
J-l( E) > J-l( F) for all sets F that are a-finite for v.
c. The Radon-Nikodym theorem applies on E. If F n E == 0, then either
v( F) == J-l( F) == 0 or J-l( F) > 0 and I v( F) I == 00.
15. A measure J-l on (X, M) is called decomposable if there is a family C M with
the following properties: (i) J-l( F) < 00 for all F E ; (ii) the members of are
disjoint and their union is X; (iii) if J-l(E) < 00 then J-l(E) == L:FE J-l(E n F); (iv)
if E c X and E n F E M for all F E then E E M.
a. Every a-finite measure is decomposable.
b. If J-l is decomposable and v is any signed measure on (X, M) such that v « J-l,
there exists a measurable f : X [- 00, 00] such that v (E) == IE I dJ-l for any
E that is a-finite for J-l, and III < 00 on any F E that is a-finite for v. (Use
Exercise 14 if v is not a-finite.)
16. Suppose that J-l, v are measures on (X, M) with v « J-l, and let A == J-l + v. If
I == dv / dA, then 0 < I < 1 J-l-a.e. and dv / dJ-l == 1/(1 - I).
COMPLEX MEASURES 93
17. Let (X, M, 1-") be a a-finite measure space, N a sub-a-algebra of M, and v ==
I-"IN. If f E £1 (1-"), there exists 9 E £1 (v) (thus 9 is N-measurable) such that
J E f dl-" == J E 9 dv for all E E N; if g' is another such function then 9 = g' v-a.e.
(In probability theory, 9 is called the conditional expectation of f on N.)
3.3 COMPLEX MEASURES
A complex measure on a measurable space (X, M) is a map v : M -t <C such that
. v(0) == 0;
. if {E j } is a sequence of disjoint sets in M, then v(U Ej) == L v(Ej),
where the series converges absolutely.
In particular, infinite values are not allowed, so a positive measure is a complex
measure only if it is finite. Example: If I-" is a positive measure and f E L 1 (1-"), then
f dl-" is a complex measure.
If v is a complex measure, we shall write V r and Vi for the real and imaginary
parts of v. Thus V r and Vi are signed measures that do not assume the values :i:oo;
hence they are finite, and so the range of v is a bounded subset of <C.
The notions we have developed for signed measures generalize easily to complex
measures. For example, we define L 1 (v) to be L 1 (v r ) n L 1 (Vi), and for f E L 1 (v),
we set J f dv = J f dV r + i J f dVi. If v and I-" are complex measures, we say that
v .-L I-" if Va .-L I-"b for a, b == r, i, and if A is a positive measure, we say that v « A if
V r « A and Vi « A. The theorems of 33.2 also generalize; one has merely to apply
them to apply them to the real and imaginary parts separately. In particular:
3.12 The Lebesgue-Radon-Nikodym Theorem. Ifv is a complex measure and I-"
is a a-finite positive measure on (X, M), there exist a complex measure A and an
f E Ll (I-") such that A .1 I-" and dv == dA + f dl-". If also A' .1 I-" and dv == dA' + f' dl-",
then A == A' and f == f' I-"-a. e.
As before, if v « 1-", we denote the f in Theorem 3.12 by dv / dl-".
The total variation of a complex measure v is the positive measure I v I determined
by the property that if dv == f dl-" where I-" is a positive measure, then dlvl == If I dl-".
To see that this is well defined, we observe first that every v is of the form f dl-" for
some finite measure I-" and some f EL I (1-"); indeed, we can take I-" == I V r I + I Vi I and
use Theorem 3.12 to obtain f. Second, if dv == fl dl-"1 == f2 d1-"2, let p == 1-"1 + 1-"2.
Then by Proposition 3.9,
dl-"l dl-"2
11 dp dp = dv = h dp dp,
so that fl (dl-"l / dp) == f2 (d1-"2 / dp) p-a.e. Since dl-"j / dp is nonnegative, we therefore
have
111 I df-ll = 11 df-ll
dp dp
h df-l2 = I hi df-l2
dp dp
p-a.e.,
94 SIGNED MEASURES AND DIFFERENTIATION
and thus
dJ-l1 dJ-l2
I!II dl-£l = I!II dp dp = 1121 dp dp = 1121 d1-£2.
Hence the definition of Ivl is independent of the choice of J-l and f. This definition
agrees with the previous definition of v when v is a signed measure, for in that
case dv == (XP - XN )dlvl where X == PuN is a Hahn decomposition, and
IX? - XNI == 1.
3.13 Proposition. Let v be a complex measure on (X, M).
a. Iv(E)1 < Ivl(E)for all E E M.
b. v« lvi, and dv/dlvl has absolute value Il v l-a.e.
c. L 1 (v) == L1 (lvI), and if fELl (v), then I J f dvl < J If I dlvl.
Proof. Suppose dv == f dJ-l as in the definition of Ivl. Then
Iv(E)1 = I L f dl-£I < L If I dl-£ = Ivl(E).
This proves (a) and shows that v « Ivl. If g == dv / dlvl, then, we have f dJ-l == dv ==
9 dlvl == glfl dJ-l, so glfl == f J-l-a.e. and hence Ivl-a.e. But clearly If I > 0 Ivl-a.e.,
whence Igi == Il v l-a.e. Part (c) is left to the reader (Exercise 18). .
3.14 Proposition. If VI, v2 are complex measures on (X, M), then I V 1 + v21 <
I vII + I V2/.
Proof. By Proposition 3.11 we can write Vj == fj dJ-l, with the same J-l, for
j == 1,2. But then dlv1 + v21 == If 1 + f21 dJ-l < If11 dJ-l + If21 dJ-l == dlV11 + dlv21. .
Exercises
18. Prove Proposition 3.13c.
19. If v, J-l are complex measures and A is a positive measure, then v .1 J-l iff I vi .1 1J-l1,
and v « A iff Ivl « A.
20. If v is a complex measure on (X, M) and v(X) == Ivl (X), then v == Ivl.
21. Let v be a complex measure on (X, JYC). If E E M, define
n n
1-£1(E) = sup{L:: Iv(Ej)1 : n E N, El,"', En disjoint, E = U Ej },
1 1
CX) CX)
1-£2(E) = sup{L:: Iv(Ej)1 : El, E 2 ,... disjoint, E = U Ej },
1 1
1-£3(E) = sU P { IL f dl-£I : If I < 1 }.
DIFFERENTIATION ON EUCLIDEAN SPACE 95
Then J-l l == J-l2 == J-l3 == Ivl. (First show that J-ll < J-l2 < J-l3. To see that J-l3 == lvi,
let f == dv/dlvl and apply Proposition 3.13. To see that J-l3 < J-lI, approximate f by
simple functions.)
3.4 DIFFERENTIATION ON EUCLIDEAN SPACE
The Radon-Nikodym theorem provides an abstract notion of the "derivative" of a
signed or complex measure v with respect to a measure J-l. In this section we analyze
more deeply the special case where (X, M) == CIln, 23}Rn) and J-l == m is Lebesgue
measure. Here one can define a pointwise derivative of v with respect to m in the
following way. Let B(r, x) be the open ball of radius r about x in ffi,n; then one can
consider the limit
F(x) = lirn v(B(r,x))
rO m(B(r, x))
when it exists. (One can also replace the balls B(r, x) by other sets which, in a
suitable sense, shrink to x in a regular way; we shall examine this point later.) If
v « m, so that dv == f dm, then v(B(r, x))/m(B(r, x)) is simply the average value
of f on B (r, x), so one would hope that F == f m-a.e. This turns out to be the case
provided that v(B(r, x)) is finite for all r, x. From the point of view of the function
f, this may be regarded as a generalization of the fundamental theorem of calculus:
The derivative of the indefinite integral of f (namely, v) is f.
For the remainder of this section, terms such as "integrable" and "almost every-
where" refer to Lebesgue measure unless otherwise specified. We begin our analysis
with a technical lemma that is of interest in its own right.
3.15 Lemma. Let e be a collection of open balls in ffi,n, and let U == UBEe B. If
c < m(U), there exist disjoint B I , . . . , Bk E e such that L: m(Bj) > 3- n c.
Proof. If c < m(U), by Theorem 2.40 there is a compact K c U with m(K) >
c, and finitely many of the balls in e - say, AI, . . . , Am - cover K. Let BI be the
largest of the Aj's (that is, choose BI to have maximal radius), let B 2 be the largest of
the Aj's that are disjoint from B I , B3 the largest of the Aj's that are disjoint from BI
and B 2 , and so on until the list of Aj's is exhausted. According to this construction,
if Ai is not one of the Bj's, there is a j such that Ai n Bj =1= 0, and if j is the smallest
integer with this property, the radius of Ai is at most that of Bj. Hence Ai C Bj,
where Bj is the ball concentric with Bj whose radius is three times that of Bj. But
then K C U Bj, so
k k
C < m(K) < L m(Bj) == 3 n L m(Bj).
I I
.
A measurable function f : ffi,n <C is called locally integrable (with respect to
Lebesgue measure) if f K If(x) I dx < 00 for every bounded measurable set KeIRn.
96 SIGNED MEASURES AND DIFFERENTIATION
We denote the space of locally integrable functions by L[oc. If f E L[oc' x E JR.n,
and r > 0, we define Arf(x) to be the average value of f on B(r, x):
Arf(x) = (B )) r f(y) dy.
m r, x } B(r,x)
3.16 Lemma. If f E L[oc' Ar f (x) is jointly continuous in r and x (r > 0, x E ffi.n ).
Proof. From the results in 32.7 we know that m(B( r, x)) == ern where e ==
m(B(l,O)), and m(S(r, x)) == 0 where S(r, x) == {y : Iy - xl == r}. Moreover,
as r ---t ro and x ---t Xo, XB(r,x) ---t XB(ro,xo) pointwise on JR.n \ S(ro, xo). Hence
XB(r,x) ---t XB(ro,xo) a.e., and IXB(r,x) 1 < XB(ro+1, xo) ifr < ro+ and Ix-'xo 1 < .
By the dominated convergence theorem, it follows that fB(r,x) f(y) dy is continuous
in r and x, and hence so is Arf(x) == e- 1 r- n fB(r,x) f(y) dYe .
Next, if f E L[oc, we define its Hardy-Littlewood maximal function H f by
Hf(x)=supArlfl(x)=sup (B )) r If(y)ldy.
r>O r>O m r, x } B(r,x)
H f is measurable, for (H f)-l (( a, 00)) == Ur>O (Ar Ifl)-l (( a, 00)) is open for any
a E JR., by Lemma 3.16.
3.17 The Maximal Theorem. There is a constant C > 0 such that for all fELl
and all ex > 0,
m({x: Hf(x) > a}) < J If(x)1 dx.
Proof. LetE Q == {x: Hf(x) > ex}. For each x E E Q we can choose r x > 0
such that Ar x Ifl(x) > ex. The balls B(r x , x) cover E Q , so by Lemma 3.15, if
e < m(E Q ) there exist Xl, . . . ,Xk E E Q such that the balls Bj == B(r xj , Xj) are
disjoint and E7 m(Bj) > 3- n e. But then
k 3 n k 1 3n 1
c < 3 n L m(Bj) < - L If(y)1 dy < - If(y)1 dYe
ex 1 B Q jRn
1 J
Letting e ---t m(E Q ), we obtain the desired result.
.
With this tool in hand, we now present three successively sharper versions of the
fundamental differentiation theorem. In the proofs we shall use the notion of limit
superior for real-valued functions of a real variable,
limsup<p(r) == lim sup <p(r) == inf sup <p(r),
r--+R E--+O O<lr-RI<E E>O O<lr-RI<E
and the easily verified fact that
lim <p(r) == e iff lim sup I<p(r) - el == O.
r--+R r--+R
DIFFERENTIATION ON EUCLIDEAN SPACE 97
3.18 Theorem. Iff E Lfoc' thenlim r -4oArf(x) == f(x)fora.e. x E ffi,n.
Proof. It suffices to show that for N E N, Arf(x) -t f(x) for a.e. x with
Ixl < N. But for Ixl < Nand r < 1 the values Arf(x) depend only on the values
f(y) for Iyl < N + 1, so by replacing f with fXB(N+1,O) we may assume that
fELl.
Given E > 0, by Theorem 2.41 we can find a continuous integrable function 9
such that f Ig(y) - f (y) I dy < E. Continuity of 9 implies that for every x E ffi,n and
8 > 0 there exists r > 0 such that Ig(y) - g(x)1 < 8 whenever Iy - xl < r, and
hence
IArg(x) - g(x)1 = (B )) r [g(y) - g(x)] dy < 8.
m r, x } B(r,x)
Therefore Arg(x) -t g(x) as r -t 0 for every x, so
limsup IArf(x) - f(x)1
r-+O
= limsuplAr(f - g)(x) + (Arg - g)(x) + (g - f)(x)1
r-40
< H(f - g)(x) + 0 + If - gl(x).
Hence, if
Ea = {x : limsup IArf(x) - f(x)1 > ex},
r-40
Fa == {x: If - gl(x) > ex},
we have
Ea C Fa/2 U {x : H(f - g)(x) > ex/2}.
But (ex/2)m(Fa/2) < fF a / 2 If(x) - g(x)1 dx < E, so by the maximal theorem,
m(Ea,} < 2E + 2CE .
ex ex
Since E is arbitrary, m(Ea) == 0 for all ex > O. But lim r -4o Arf(x) == f(x) for all
x U E 1 / n , so we are done. .
This result can be rephrased as follows: If f E Lfoc'
(3.19)
lim (B )) r [f(y) - f(x)] dy = 0 for a.e. x.
r-40 m r, x } B(r,x)
Actually, something stronger is true: (3.19) remains valid if one replaces the integrand
by its absolute value. That is, let us define the Lebesgue set L f of f to be
Lf == { X : lim (B )) r If(y) - f(x)1 dy = O } .
r-40 m r, x } B(r,x)
98 SIGNED MEASURES AND DIFFERENTIATION
3.20 Theorem. If f E Ltoc, then m((Lf )C) == o.
Proof. For each c E C we can apply Theorem 3.18 to gc(x) == If(x) - cl to
conclude that, except on a Lebesgue null set Ec, we have
. 1 1
hm (B( )) If(y) - cl dy = If(x) - cI-
r-40 m r, x B(r,x)
Let D be a countable dense subset of C, and let E == UCED Ec. Then m(E) == 0,
and if x E, for any E > 0 we can choose c E D with If(x) - cl < E, so that
If(y) - f(x)1 < If(y) - cl + E, and it follows that
limsup (B )) r If(y) - f(x)1 dy < If(x) - cl + f < 2f.
r-40 m r, x J B(r,x)
Since E is arbitrary, the desired result follows.
.
Finally, we consider families of sets more general than balls. A family {Er }r>O
of Borel subsets of n is said to shrink nicely to x E n if
. Er C B(r, x) for each r;
. there is a constant a > 0, independent of r, such that m(Er) > am(B(r, x)).
The sets Er need not contain x itself. For example, if U is any Borel subset of
B(l, 0) such that m(U) > 0, and Er == {x + ry : y E U}, then {Er} shrinks nicely
to x. Here, then, is the final version of the differentiation theorem.
3.21 The Lebesgue Differentiation Theorem. Suppose f E Ltoc. For every x in
the Lebesgue set of f - in particular, for almost every x - we have
lim () r If(y) - f(x)1 dy = 0 and Hm () r f(y) dy = f(x)
r-+O m r J Er r-40 m r J Er
for every family {Er }r>O that shrinks nicely to x.
Proof. For some a > 0 we have
() r If(y) - f(x)1 dy < () r If(y) - f(x)1 dy
m r JE r m r JB(r,x)
< (;( )) r If(y) - f(x)1 dy.
am r, x J B(r,x)
The first equality therefore follows from Theorem 3.20, and one sees immediately
that it implies the second one by writing the latter in the form (3.19). .
DIFFERENTIATION ON EUCLIDEAN SPACE 99
We now return to the study of measures. A Borel measure v on ]Rn will be called
regular if
. v(K) < 00 for every compact K;
. v(E) == inf{v(U) : U open, E c U} for every E E 'ERn.
(Condition (ii) is actually implied by condition (i). For n == 1 this follows from
Theorems 1.16 and 1.18, and we shall prove it for arbitrary n in 37.2. For the time
being, we assume (ii) explicitly.) We observe that by (i), every regular measure is
a-finite. A signed or complex Borel measure v will be called regular if Ivl is regular.
For example, if f E L + (]Rn), the measure f dm is regular iff f E Ltoc. Indeed,
the condition f E Ltoc is clearly equivalent to (i). If this holds, (ii) may be verified
directly as follows. Suppose that E is a bounded Borel set. Given 8 > 0, by
Theorem 2.40 there is a bounded open U ::) E such that m(U) < m(E) + 8 and
hence m(U \ E) < 8. But then, given E > 0, by Corollary 3.6 there is an open
U ::) E such that IU\E f dm < E and hence Iu f dm < IE f dm + E. The case
of unbounded E follows easily by writing E == U Ej where Ej is bounded and
finding an open U j ::) Ej such that IUo \ Eo f dm < E2- j .
J J
3.22 Theorem. Let v be a regular signed or complex Borel measure on ]Rn, and let
dv == dA+ f dm be its Lebesgue-Radon-Nikodym representation. Thenform-almost
every x E ]Rn,
. v(Er)
hm (E) = f(x)
r---.Q m r
for every family {Er }r>O that shrinks nicely to x.
Proof. It is easily verified that dlvl == diAl + If I dp, so the regularity of v implies
the regularity of both A and f dm (Exercise 26). In particular, f E Ltoc, so in view of
Theorem 3.21, it suffices to show that if A is regular and A 1- m, then for m-almost
every x, A(Er) /m(Er) 0 as r 0 when Er shrinks nicely to x. It also suffices
to take Er == B(r, x) and to assume that A is positive, since for some a > 0 we have
A(Er) IAI (Er) IAI (B(r, x)) IAI (B(r, x))
< < < .
m(Er) - m(Er) - m(Er) - am(B(r,x))
Assuming A > 0, then, let A be a Borel set such that A(A) == m(AC) == 0, and let
{ . A(B(r,x)) 1 }
H = x E A: hp m(B(r,x)) > k .
We shall show that m( F k ) == 0 for all k, and this will complete the proof.
The argument is similar to the proof of the maximal theorem. By regularity of
A, given E > 0 there is an open U E ::) A such that A(U E ) < E. Each x E Fk is
the center of a ball Bx C U E such that A(Bx) > k- 1 m(Bx). By Lemma 3.15, if
100 SIGNED MEASURES AND DIFFERENTIATION
== UXEF k Bx and c < m() there exist Xl, . . . , Xj such that B x !, . . . , B XJ are
disjoint and
J J
C < 3 n Lm(Bxj) < 3 n k LA(Bxj) < 3nkA() < 3 n kA(U E ) < 3 n kE.
I I
We conclude that m() < 3 n kE, and since Fk C and E is arbitrary, m(Fk) == 0..
Exercises
22. If f E LI(}Rn), f =1= 0, there exist C,R > Osuch that Hf(x) > Cxl-n for
Ixl > R. Hence m( {x : H f(x) > a}) > C' / a when a is small, so the estimate in
the maximal theorem is essentially sharp.
23. A useful variant of the Hardy-Littlewood maximal function is
H* f(x) = sup { mtB) L If(y)1 dy : B is a ball and x E B } .
ShowthatHf < H*f < 2 n Hf.
24. If f E Lfoc and f is continuous at x, then x is in the Lebesgue set of f.
25. If E is a Borel set in }Rn, the density DE (x) of E at x is defined as
D ( ) I . m(EnB(r,x))
E x == 1m ,
rO m(B(r, x))
whenever the limit exists.
a. Show that DE(X) == 1 for a.e. X E E and DE(X) == 0 for a.e. X E EC.
b. Find examples of E and x such that DE(X) is a given number a E (0,1), or
such that DE (x) does not exist.
26. If A and J1 are positive, mutually singular Borel measures on }Rn and A + J-l is
regular, then so are A and J1.
3.5 FUNCTIONS OF BOUNDED VARIATION
The theorems of the preceding section apply in particular on the real line, where,
because of the correspondence between regular Borel measures and increasing func-
tions that we established in 31.5, they yield results about differentiation and inte-
gration of functions. As in 31.5, we adopt the notation that if F is an increasing,
right continuous function on }R, J1 F is the Borel measure determined by the rela-
tion J1F((a, b]) == F(b) - F(a). Also, throughout this section the term "almost
everywhere" will always refer to Lebesgue measure.
Our first result uses the Lebesgue differentiation theorem to prove the a.e. differ-
entiability of increasing functions.
FUNCTIONS OF BOUNDED VARIATION 101
3.23 Theorem. Let F : be increasing, and let G(x) == F(x+).
a. The set of points at which F is discontinuous is countable.
b. F and G are differentiable a.e., and F' == G' a.e.
Proof. Since F is increasing, the intervals (F(x-), F(x+)) (x E ) are disjoint,
and for I x I < N they lie in the interval (F ( - N), F (N) ). Hence
L [F(x+) - F(x-)] < F(N) - F(-N) < 00,
Ixl<N
which implies that {x E (- N, N) : F(x+) i= F(x-)} is countable. As this is true
for all N, (a) is proved.
Next, we observe that G is increasing and right continuous, and G == F except
perhaps where F is discontinuous. Moreover,
G(x + h) _ G(x) == { /-lC((X, x + h]) f h > 0,
-/-lc( (x + h, x]) If h < 0,
and the families {(x-r, x]} and {(x, x+r]} shrink nicely to x as r == Ihl O. Thus,
an application of Theorem 3.22 to the measure /-lc (which is regular by Theorem
1.18) shows that G' (x) exists for a.e. x. To complete the proof, it remains to show
that if H == G - F, then H' exists and equals zero a.e.
Let {Xj} be an enulneration of the points at which H i= O. Then H(xj) > 0,
and as above we have L{j:lxjl<N} H(xj) < 00 for any N. Let 8j be the point
mass at x} and /-l == L j H(xj)8 j . Then /-l finite on compact sets by the preceding
sentence, and hence /-l is regular by Theorems 1.16 and 1.18; also, /-l 1- m since
m(E) == /-l(EC) == 0 where E == {Xj }1. But then
H(x + h) - H(x) H(x + h) + H(x) /-l((x - 21 h l, x + 2I h l))
< <4
h - Ihl - 41hl '
which tends to zero as h 0 for a.e. x, by Theorem 3.22. Thus H' == 0 a.e., and we
are done. .
As positive measures on are related to increasing functions, complex measures
on are related to so-called functions of bounded variation. The definition of the
latter concept is a bit technical, so some motivation may be appropriate. Intuitively,
if F( t) represents the position of a particle moving along the real line at time t, the
"total variation" of F over the interval [a, b] is the total distance traveled from time
a to time b, as shown on an odometer. If F has a continuous derivative, this is just
the integral of the "speed," J: IF' (t) I dt. To define the total variation without any
smoothness hypotheses on F requires a different approach; namely, one partitions
[a, b] into subintervals [t j -1, t j ] and approximates F on each subinterval by the linear
function whose graph joins (tj -1, F (t j -1)) to (t j , F (tj )), and then passes to a limit.
In making this precise, we begin with a slightly different point of view, taking
a == - 00 and considering the total variation as a function of b. To wit, if F : 1R <C
102 SIGNED MEASURES AND DIFFERENTIATION
and x E IR, we define
n
TF(X) = sup{L: /F(xj) - F(Xj-l)/ : n E N, -00 < Xo < ... < X n = x}.
1
T p is called the total variation function of F. We observe that the sums in the
definition of T p are made bigger if the additional subdivision points x j are added.
Hence, if a < b, the definition of Tp(b) is unaffected if we assume that a is always
one of the subdivision points. It follows that
Tp(b) - Tp(a)
n
(3.24) = sup{L: IF(xj) - F(xj-dl : n E N, a = Xo < .,. < X n = b}.
1
Thus T p is an increasing function with values in [0, 00]. IfT p ( 00) == limxoo T p (x)
is finite, we say that F is of bounded variation on IR, and we denote the space of all
such F by BV.
More generally, the supremum on the right of (3.24) is called the total variation of
F on [a, b]. It depends only on the values of F on [a, b], so we may define BV([a, b])
to be the set of all functions on [a, b] whose total variation on [a, b] is finite. If
F E BV, the restriction of F to [a, b] is in BV([a, b]) for all a, b; indeed, its total
variation on [a, b] is nothing but Tp(b) - Tp(a). Conversely, if F E BV([a, b]) and
we set F(x) == F(a) for x < a and F(x) == F(b) for x > b, then F E BV. By this
device the results that we shall prove for BV can also be applied to BV([a, b]).
3.25 Examples.
a. If F : IR IR is bounded and increasing, then F E BV (in fact, T p (x) ==
F(x) - F( -00)).
b. If F, G E BV and a, b E <C, then aF + bG E BV.
c. If F is differentiable on IR and F ' is bounded, then F E BV ([ a, b]) for
-00 < a < b < 00 (by the mean value theorem).
d. If F(x) == sinx, then F E BV([a, b]) for -00 < a < b < 00, but F rt BV.
e. If F(x) == xsin(x- 1 ) for x i= 0 and F(O) == 0, then F rt BV([a, b]) for
a < 0 < b or a < 0 < b.
The verification of these examples is left to the reader (Exercise 27).
3.26 Lemma. If F E BV is real-valued, then T p + F and Tp - F are increasing.
Proof. If x < y and E > 0, choose Xo < . . . < X n == x such that
n
L: IF(xj) - F(Xj-l)1 > Tp(x) - E.
1
FUNCTIONS OF BOUNDED VARIATION 103
Then L IF(xj) - F(Xj-l)1 + IF(y) - F(x)1 is an approximating sum for TF(y),
and F(y) == [F(y) - F(x)] + F(x), so
n
TF(y) :!: F(y) > I: IF(xj) + F(Xj-l)1
1
+IF(y) - F(x)1 :!: [F(y) - F(x)] :!: F(x)
> TF(X) - E :i: F(x).
Since E is arbitrary, TF(y) :i: F(y) > TF(X) :i: F(x), as desired.
.
3.27 Theorem.
a. F E BV iff ReF E BV andlmF E BV.
b. If F : IR IR, then F E BV iff F is the difference of two bounded increasing
functions; for F E BV these functions may be taken to be (TF + F) and
(TF - F).
c. If F E BV, then F(x+) == limyx F(y) and F(x-) == lim y / x F(y) exist
for all x E IR, as do F(:i:oo) == limy:i::oo F(y).
d. If F E BV, the set ofpoints at which F is discontinuous is countable.
e. If F E BV and G(x) == F(x+), then F' and G' exist and are equal a.e.
Proof. (a) is obvious. For (b), the "if" implication is easy (see Examples 3.25a,b).
To prove "only if," observe that by Lemma 3.26, the equation F == (TF + F) -
(T F - F) expresses F as the difference of two increasing functions. Also, the
inequalities
TF(y) :!: F(y) > TF(X) :!: F(x)
(y > x)
imply that
IF(y) - F(x)1 < TF(y) - TF(X) < TF( (0) - TF( -(0) < 00,
so that F, and hence T F :!: F, is bounded. Finally, (c), (d), and (e) follow from (a),
(b), and Theorem 3.23. .
The representation F == (TF + F) - (TF - F) of a real-valued F E BV is
called the Jordan decomposition of F, and (TF + F) and (TF - F) are called
the positive and negative variations of F. Since x+ == max(x, 0) == (Ixl + x) and
x- == max( -x, 0) == (Ixl- x) for x E IR, we have
(TF :!: F)(x)
n
= sup{L:[F(Xj) - F(Xj-l)] : Xo <... < X n = X}:!: F(-oo),
1
Theorem 3.27(a,b) leads to the connection between BV and the space of complex
Borel measures on IR. To make this precise, we introduce the space N BV (N for
"normalized") defined by
N BV == {F E BV : F is right continuous and F( -(0) == O}.
104 SIGNED MEASURES AND DIFFERENTIATION
We observe that if F E BV, then the function G defined by G(x) == F(x+)- F( -00)
is in N BV and G' == F ' a.e. (That G E BV follows easily from Theorem 3.27(a,b):
if F is real and F == FI - F 2 where F I , F 2 are increasing, then G (x) == FI (x+) -
[F 2 (x+) + F( -(0)], which is again the difference of two increasing functions.)
3.28 Lemma. If F E BV, then T p ( -00) == o. If F is also right continuous, then
so is T p .
Proof. If E > 0 and x E , choose Xo < . . . < X n == X so that
n
L IF(xj) - F(Xj-l)1 > Tp(x) - E.
1
From (3.24) we see that Tp(x) - Tp(xo) > Tp(x) - E, and hence Tp(y) < E for
y < Xo. Thus Tp( -00) == O.
Now suppose that F is right continuous. Given x E and E > 0, let a ==
Tp(x+) -Tp(x), and choose 8 > 0 so that IF(x+h) -F(x)1 < E and Tp(x + h)-
Tp(x+) < E whenever 0 < h < 8. For any such h, by (3.24) there exist Xo < . . . <
X n == x + h such that
n
L IF(xj) - F(Xj-l)1 > [Tp(x + h) - Tp(x)] > a,
1
and hence
n
L IF(Xj) - F(Xj-I)1 > a - IF(XI) - F(xo)1 > a - E.
2
Likewise, there exist x == to < . . . < t m == Xl such that L IF(tj) - F(tj-I)1 >
a, and hence
a + E > Tp(x + h) - Tp(x)
m n
> L IF(t j ) - F(tj-I)1 + L IF(xj) - F(Xj-I)1
I 2
3
> 2"a - E.
Thus a < 4E, and since E is arbitrary, a == O.
.
3.29 Theorem. If {L is a complex Borel measure on and F(x) == {L(( -00, x]),
then FEN BV. Conversely, if FEN BV, there is a unique complex Borel measure
{Lp such that F(x) == {Lp(( -00, x]); moreover, l{Lpl == {LTp.
Proof. If {L is a complex measure, we have {L == {Lt - {Ll +i({Lt - {L"2) where the
{LT are finite measures. If F/=(x) == {LT(( -00, x]), then F/= is increasing and right
continuous, F/=( -00) == 0, and Fl=(oo) == {Lt() < 00. By Theorem 3.27(a,b),
FUNCTIONS OF BOUNDED VARIATION 105
the function F == Ft - F 1 - + i(F;j - F 2 -) is in NBV. Conversely, by Theorem
3.27 and Lemma 3.28, any FEN BV can be written in this form with the F/::-
increasing and in N BV. Each FjI gives rise to a measure 11; according to Theorem
1.16, so F(x) == I1F(( -00, x]) where I1F == I1t -111 + i(l1t -112). The proof that
111 F 1 == I1T F is outlined in Exercise 28. .
The next obvious question is: Which functions in N BV correspond to measures
11 such that 11 1- m or 11 « m? One answer is the following:
3.30 Proposition. If FEN BV, then F ' E L 1 (m). Moreover, I1F 1- m iff F ' == 0
a.e., and I1F « m iff F(x) == Joo F'(t) dt.
Proof. We have merely to observe that F'(X) == limro I1F(E r )/m(Er) where
Er == (x, x + r] or (x - r, x] and apply Theorem 3.22. (The measure I1F is
automatically regular by Theorem 1.18.) .
The condition I1F « m can also be expressed directly in terms of F, as follows.
A function F : JR C is called absolutely continuous if for every E > 0 there exists
8 > 0 such that for any finite set of disjoint intervals (a1, b I ), . . . , (aN, b N ),
(3.31)
N
L(b j - aj) < 8 ===}
1
N
L IF(b j ) - F(aj)1 < E.
1
More generally, F is said to be absolutely continuous on [a, b] if this condition is
satisfied whenever the intervals (aj, b j ) all lie in [a, b]. Clearly, if F is absolutely
continuous, then F is uniformly continuous (take N == 1 in (3.31)). On the other
hand, if F is everywhere differentiable and F ' is bounded, then F is absolutely
continuous, for IF(b j ) - F(aj)1 < (max IF/I)(b j - aj) by the mean value theorem.
3.32 Proposition. If FEN BV, then F is absolutely continuous iff 11 F « m.
Proof. If I1F « m, the absolute continuity of F follows by applying Theorem
3.5 to the sets E == U (aj, b j ]. To prove the converse, suppose that E is a Borel set
such that m( E) == O. If E and 8 are as in the definition of absolute continuity of F,
by Theorem 1.18 we can find open sets U I ::) U 2 ::) ... ::) E such that m(U 1 ) < 8
(and thus J-l(U j ) < 8 for all j) and I1F(U j ) I1F(E). Each U j is a disjoint union of
open intervals (aj, bj), and
N N
LII1F((a;, b;)) I < L IF(b;) - F(a;)1 < E
k=1 k=1
for all N. Letting N 00, we obtain II1F(U j )1 < E and hence II1F(E)1 < E. Since
E is arbitrary, I1F(E) == 0, which shows that I1F « m. .
3.33 Corollary. If f E LI(m), then the function F(x) == Joo f(t) dt is in N BV
and is absolutely continuous, and f == F ' a.e. Conversely, ifF E N BV is absolutely
continuous, then F ' E L1 (m) and F( x) == Joo F ' (t) dt.
Proof. This follows immediately from Propositions 3.30 and 3.32. .
106 SIGNED MEASURES AND DIFFERENTIATION
If we consider functions on bounded intervals, this result can be refined a bit.
3.34 Lemma. If F is absolutely continuous on [a, b], then F E BV([a, b]).
Proof. Let 8 be as in the definition of absolute continuity, corresponding to E == 1,
and let N be the greatest integer less than 8- 1 (b - a) + 1. If a == XQ < . . . < X n ==
b, by inserting more subdivision points if necessary, we can collect the intervals
(Xj-1, Xj) into at most N groups of consecutive intervals such that the sum of the
lengths in each group is less than 8. The sum I: IF(xj) - F(xj-1)1 over each group
is at most 1, and hence the total variation of F on [a, b] is at most N. .
3.35 The Fundamental Theorem of Calculus for Lebesgue Integrals. If -00 <
a < b < 00 and F : [a, b] <C, the following are equivalent:
a. F is absolutely continuous on [a, b].
b. F(x) - F(a) == J: f(t) dtforsome f E L1([a, b], m).
c. F is differentiable a.e. on [a,b], F ' E L1([a,b],m), and F(x) - F(a)
J: F ' (t)dt.
Proof. To prove that (a) implies (c), we may assume by subtracting a constant
from F that F(a) == O. If we set F(x) == 0 for x < a and F(x) == F(b) for x > b,
then FEN BV by Lemma 3.34, so (c) follows from Corollary 3.33. That (c) implies
(b) is trivial. Finally, (b) implies (a) by setting f(t) == 0 for t tJ- [a, b] and applying
Corollary 3.33. .
The following decomposition of Borel measures on }Rn is sometimes important.
A complex Borel measure {L on }Rn is called discrete if there is a countable set
{Xj} C }Rn and complex numbers Cj such that I: ICj I < 00 and {L == I: cj8xj' where
8x is the point mass at x. On the other hand, {L is called continuous if {L( {x}) == 0 for
all x E JRn. Any complex measure {L can be written uniquely as {L == {Ld + {Lc where
{Ld is discrete and {Lc is continuous. Indeed, let E == {x : {L( {x}) i= O}. For any
countable subset F of E the series I:xEF {L( {x}) converges absolutely (to {L(F)), so
{x E E : I{L( {x}) I > k- 1 } is finite for all k, and it follows that E itself is countable.
Hence {Ld(A) == {L(A n E) is discrete and {Lc(A) == {L(A \ E) is continuous.
Obviously, if {L is discrete, then {L 1- m; and if {L « m, then {L is continuous.
Thus, by Theorem 3.22, any (regular) complex Borel measure on ]Rn can be written
uniquely as
{L == {Ld + {Lac + {Lsc
where {Ld is discrete, {Lac is absolutely continuous with respect to m, and {Lsc is a
"singular continuous" measure, that is, {Lsc is continuous but {Lsc 1- m.
The existence of nonzero singular continuous measures in }Rn is evident enough
when n > 1; the surface measure on the unit sphere discussed in 32.7 is one example.
Their existence when n == 1 is not quite so obvious; they correspond via Theorem
3.29 to nonconstant functions FEN BV such that F is continuous but F ' == 0
a.e. One such function is the Cantor function constructed in 3 1.5 (extended to ]R by
setting F(x) == 0 for x < 0 and F(x) == 1 for x > 1). More surprisingly, there exist
strictly increasing continuous functions F such that F ' == 0 a.e.; see Exercise 40.
FUNCTIONS OF BOUNDED VARIATION 107
If FEN BV, it is customary to denote the integral of a function 9 with respect
to the measure {LF by J gdF or J g(x) dF(x); such integrals are called Lebesgue-
Stieltjes integrals. We conclude by presenting an integration-by-parts formula for
Lebesgue-Stieltjes integrals; other variants of this result can be found in Exercises
34 and 35.
3.36 Theorem. If F and G are in N BV and at least one of them is continuous, then
for -00 < a < b < 00,
r F dG + r G dF = F(b)G(b) - F(a)G(a).
(a, (a,
Proof. F and G are linear combinations of increasing functions in N BV by
Theorem 3.27(a,b), so a simple calculation shows that it suffices to assume F and
G increasing. Suppose for the sake of definiteness that G is continuous, and let
o == {(x, y) : a < x < Y < b}. We use Fubini's theorem to compute {Lp x {Lc(O)
in two ways:
JlF X Jlc(O) = r r dF(x) dG(x) = r [F(y) - F(a)] dG(y)
(a,b] (a,y] (a,
= r FdG-F(a)[G(b)-G(a)],
(a,b]
and since G(x) == G(x-),
JlF X Jlc(O) = r r dG(y) dF(x) = r [G(b) - G(x)] dF(x)
(a,b] [x,b] (a,b]
= G(b)[F(b) - F(a)] - r G dF.
(a,b]
Subtracting these two equations, we obtain the desired result.
.
Exercises
27. Verify the assertions in Examples 3.25.
28. If F E NBV, let G(x) == l{Lpl((-oo,xJ). Prove that I{LFI == {LTp by showing
that G == T F via the following steps.
a. From the definition of T p , T p < G.
b. l{Lp(E)1 < {LTp(E) when E is an interval, and hence when E is a Borel set.
c. l{Lpl < {LTp, and hence G < T p . (Use Exercise 21.)
29. If FEN BV is real-valued, then {Lt == {Lp and {Lp == {LN where P and N are
the positive and negative variations of F. (Use Exercise 28.)
30. Construct an increasing function on 1R. whose set of discontinuities is Q.
108 SIGNED MEASURES AND DIFFERENTIATION
31. Let F(x) == x 2 sin(x- 1 ) and G(x) == x 2 sin(x- 2 ) for x =1= 0, and F(O)
G(O) == O.
a. F and G are differentiable everywhere (including x == 0).
b. F E BV([-l, 1]), but G rt BV([-l, 1]).
32. If F I , F 2 , . . . , FEN BV and Fj F pointwise, then T p < lim inf T pj .
33. If F is increasing on IR, then F(b) - F(a) > f: F'(t) dt.
34. Suppose F, G E N BV and -00 < a < b < 00.
a. By adapting the proof of Theorem 3.36, show that
r F(x) + F(x-) dG(x) + r G(x) + G(x-) dF(x)
J[a,b] 2 J[a,b] 2
== F(b)G(b) - F(a-)G(a-).
b. If there are no points in [a, b] where F and G are both discontinuous, then
r F dG + r G dF = F(b)G(b) - F(a- )G(a-).
J[a,b] J[a,b]
35. If F and G are absolutely continuous on [a, b], then so is FC, and
l b (FG' + GF')(x) dx = F(b)G(b) - F(a)G(a).
36. Let G be a continuous increasing function on [a, b] and let C(a) == c, C(b) == d.
a. If E c [c, d] is a Borel set, then m( E) == {LG ( C- 1 (E) ). (First consider the
case where E is an interval.)
b. If J is a Borel measurable and integrable function on [c, d], then fed J (y) dy ==
f: J(C(x)) dG(x). In particular, fed J(y) dy == f: J(G(x))G'(x) dx if C is
absolutely continuous.
c. The validity of (b) may fail if G is merely right continuous rather than
continuous.
37. SupposeF: IR C. ThereisaconstantMsuchthatIF(x)-F(y)1 < Mlx-yl
for all x, y E IR (that is, F is Lipschitz continuous) iff F is absolutely continuous
and IF' I < M a.e.
38. If f : [a, b] IR, consider the graph of J as a subset of C, namely, {t + i J (t) :
t E [a, b]}. The length L of this graph is by definition the supremum of the lengths
of all inscribed polygons. (An "inscribed polygon" is the union of the line segments
joining tj-I + if(t j - I ) to tj + iJ(t j ), 1 < j < n, where a == to < ... < t n == b.)
a. Let F(t) == t + if(t); then L is the total variation of F on [a, b].
b. If f is absolutely continuous, L == f:[l + J'(t)2]1/2 dt.
39. If {F j } is a sequence of nonnegative increasing functions on [a, b] such that
F(x) == L Fj(x) < 00 for all x E [a, b], then F'(x) == L Fj(x) for a.e.
x E [a, b]. (It suffices to assume Fj E N BV. Consider the measures {L Pj .)
NOTES AND REFERENCES 109
40. Let F denote the Cantor function on [0, 1] (see 31.5), and set F( x) == 0 for x < 0
and F(x) == 1 for x > 1. Let {[an, b n ]} be an enumeration of the closed subintervals
of [0,1] with rational endpoints, and let Fn(x) == F((x - an)/(b n - an)). Then
G == 2: 2- n Fn is continuous and strictly increasing on [0,1], and G' == 0 a.e.
(Use Exercise 39.)
41. Let A C [0,1] be a Borel set such that 0 < m(A n I) < m(I) for every
subinterval I of [0, 1] (Exercise 33, Chapter 1).
a. Let F(x) == m([O,x] n A). Then F is absolutely continuous and strictly
increasing on [0, 1], but F' == 0 on a set of positive measure.
b. Let G(x) == m([O, x] n A) - m([O, x] \ A). Then G is absolutely continuous
on [0, 1], but G is not monotone on any subinterval of [0, 1].
42. A function F : ( a, b) -t IR (-00 < a < b < 00) is called convex if
F(AS + (1 - A)t) < AF(s) + (1 - A)F(t)
for all s, t E ( a, b) and A E (0,1). (Geometrically, this says that the graph of F
over the interval from s to t lies underneath the line segment joining (s, F( s)) to
(t, F(t)).)
a. F is convex iff for all s, t, s', t' E (a, b) such that s < s' < t' and s < t < t',
F ( t) - F ( s ) F ( t') - F ( s')
< .
t - s - t' - s'
b. F is convex iff F is absolutely continuous on every compact subinterval of
( a, b) and F' is increasing (on the set where it is defined).
c. If F is convex and to E (a, b), there exists (3 E IR such that F(t) - F(to) >
[3(t - to) for all t E (a, b).
d. (Jensen's Inequality) If (X, M, JL) is a measure space with JL(X) == 1,
g : X -t (a, b) is in Ll(JL), and F is convex on (a, b), then
F (J 9 d) < J Fog d.
(Let to == J gdJL and t == g(x) in (c), and integrate.)
3.6 NOTES AND REFERENCES
3.2: The Lebesgue-Radon-Nikodym theorem was proved by Lebesgue [92] in
the case where JL is Lebesgue measure on n. Under the hypothesis v « JL, it
was generalized by Radon [111] to arbitrary regular Borel measures on IRn and by
Nikodym [107] to measures on abstract spaces. The Lebesgue decomposition in the
abstract setting appears in Saks [128]. The proof of the Lebesgue-Radon-Nikodym
theorem in the text is similar to, but more efficient than, the one in Halmos [62]; I
learned it from L. Loomis.
110 SIGNED MEASURES AND DIFFERENTIATION
3.3: The characterization
n n
Iv/(E) = sup{I: /v(Ej)J : n E N, El,...' En disjoint, E = U Ej }
1 1
of the total variation of a complex measure v (see Exercise 21) is usually taken as
the definition of Ivl. Our definition seems more generally useful, and it is certainly
easier to compute with.
3 .4: Theorems 3.21 and 3.22 are due to Lebesgue [92], but the line of argument
we have presented is essentially that of Wiener [161], and the maximal function H f,
in dimension one, was first studied in Hardy and Littlewood [65]. Our proof of
Theorem 3.18 is illustrative of a general technique that has been much exploited in
recent years, namely, controlling the limiting behavior of a family of operators by
means of estimates on an appropriate maximal function.
Lemma 3.15, a simplified version of Wiener's covering lemma, is taken from
Rudin [125]. There is also an older and more delicate covering theorem, due to
Vitali, which is used for similar purposes:
If E c n and Q is a family of cubes such that each x E E is contained in
members of Q of arbitrarily small diameter, then there is a (finite or infinite)
disjoint sequence {Qj} c Q such that m(E \ U j Qj) == O.
Proofs can be found in many books, for example, Cohn [27, 36.2], Falconer [39,
31.3], and Hewitt and Stromberg [76, 317].
33.5: The main results of this section are due to Lebesgue and Vitali; see Hawkins
[70] for detailed references. Exercise 36 gives one form of the change-of-variable
formula for Lebesgue integrals; others can be found in Serrin and Varberg [133].
Exercise 39 is a theorem of Fubini, and the example in Exercise 40 is due to Brown
[21] .
The Stieltjes integral J: 9 dF was originally defined, under the hypothesis that F is
an increasing function on [a, b], as a limit of Riemann sums L: g(tj )[F(t j ) - F(tj-1)].
The theory of such "Riemann-Stieltjes" integrals is much like that of the ordinary
Riemann integral, but some care is needed to handle cases where 9 and F are both
allowed to be discontinuous. See ter Horst [148], which contains the analogue of
Theorem 2.28 for Stieltjes integrals.
The example of the Cantor function shows that a continuous, a.e.-differentiable
function need not be the integral of its derivative. It is a highly nontrivial theorem
that if F is continuous on [a, b], F ' (x) exists for every x E [a, b] \ A where A is
countable, and F ' E L 1 , then F is absolutely continuous and hence can be recovered
from F ' by integration. A proof can be found in Cohn [27, 6.3]; see also Rudin
[125, Theorem 7.26] for the somewhat easier case when A == 0.
However, this is not the end of the story, for there exist everywhere differentiable
functions F such that F ' L 1 . Perhaps the simplest example is F (x) == x 2 sin (x - 2 )
(see Exercise 31). Here the only trouble is at x == 0, so for a < 0 < b one could
consider J: F ' (t) dt as an improper integral, i.e., the limit of Lebesgue integrals over
NOTES AND REFERENCES 111
[a, b] \ [-€, €] as € O. However, it is not hard to construct examples in which
the singularities of F ' are so complicated that F' is not Lebesgue integrable on any
interval. In this situation the Lebesgue integral is simply insufficient. However, the
Henstock-Kurzweil integral (or the Denjoy or Perron integral) that was discussed in
2.8 is powerful enough to integrate such F', and by using this integral one obtains
the general fundamental theorem of calculus: If F is everywhere differentiable on
[a, b], then F(b) - F(a) == f: F'(t) dt.
Point Set Topology
The concepts of limit, convergence, and continuity are central to all of analysis, and
it is useful to have a general framework for studying them that includes the classical
manifestations as special cases. One such framework, which has the advantage of
not requiring many ideas beyond those occurring in analysis on Euclidean space, is
that of metric spaces. However, metric spaces are not sufficiently general to describe
even some very classical modes of convergence, for example, pointwise convergence
of functions on IR. A more flexible theory can be built by taking the open sets, rather
than a metric, as the primitive data, and it is this theory that we shall explore in the
present chapter.
4.1 TOPOLOGICAL SPACES
Let X be a nonempty set. A topology on X is a family 'J of subsets of X that
contains 0 and X and is closed under arbitrary unions and finite intersections (i.e., if
{Ua}aEA C 'J then UaEA U a E 'J, and if U 1 ,..., Un E 'J then n U j E 'J). The
pair (X, 'J) is called a topological space. If'J is understood, we shall simply refer
to the topological space X. Let us examine a few examples:
. If X is any nonempty set, P(X) and {0, X} are topologies on X. They
are called the discrete topology and the trivial (or indiscrete) topology,
respecti vel y.
. If X is an infinite set, {U eX: U = 0 or UC is finite} is a topology on X,
called the cofinite topology.
113
114 POINT SET TOPOLOGY
. If X is a metric space, the collection of all open sets with respect to the metric
is a topology on X.
. If (X, 'I) is a topological space and Y c X, then 'I y == {U n Y : U E 'I} is
a topology on Y, called the relative topology induced by 'I.
We now present the basic terminology concerning topological spaces. Most of
these concepts are already familiar in the context of metric spaces. Until further
notice, (X, 'I) will be a fixed topological space.
The members of 'I are called open sets, and their complements are called closed
sets. If Y eX, the open (closed) subsets of Y in the relative topology are called
relatively open (closed). We observe that, by deMorgan's laws, the family of closed
sets is closed under arbitrary intersections and finite unions.
If A eX, the union of all open sets contained in A is called the interior of A,
and the intersection of all closed sets containing A is called the closure of A. We
denote the interior and closure of A by A ° and A , respectively. Clearly A ° is the
largest open set contained in A and A is the smallest closed set containing A, and we
have (AO)C == Ac and ( A )C == (AC)o. The difference A \ AO == A n Ac is called the
boundary of A and is denoted by GA. If A == X, A is called dense in X. On the
other hand, if ( A ) ° == 0, A is called nowhere dense.
If x E X (or E c X), a neighborhood of x (or E) is a set A c X such that
x E A ° (or E c A 0). Thus, a set A is open iff it is a neighborhood of itself. (Some
authors require neighborhoods to be open sets; we do not.) A point x is called an
accumulation point of A if A n (U \ {x}) =1= 0 for every neighborhood U of x.
(Other terms sometimes used for the same concept are "cluster point" and "limit
point." We shall use "cluster point" to mean something a bit different below.)
4.1 Proposition. If A c X, let acc(A) be the set of accumulation points of A. Then
A == Au acc(A), and A is closed iffacc(A) C A.
Proof. If x fj. A , then A C is a neighborhood of x that does not intersect A, so
x fj. acc(A); thus AUacc(A) c A . Ifx fj. AUacc(A), there is an open U containing
x such that UnA == 0, so that A c uc and x fj. A . Thus A c Au acc(A). Finally,
A is closed iff A == A , and this happens iff acc( A) c A. .
If 'II and 'J 2 are topologies on X such that 'II c 'J 2 , we say that 'II is weaker
(or coarser) than 'J 2 , or that 'J 2 is stronger (or finer) than 'II. Clearly the trivial
topology is the weakest topology on X, while the discrete topology is the strongest. If
E c P(X), there is a unique weakest topology 'J( E) on X that contains E, namely the
intersection of all topologies on X containing E. It is called the topology generated
by E, and E is sometimes called a subbase for 'I ( E).
If 'I is a topology on X, a neighborhood base for 'I at x E X is a family N C 'I
such that
. x E V for all V E N;
. if U E 'J and x E U, there exists V E N such that x E V and V c U.
TOPOLOGICAL SPACES 115
A base for 'J is a family 13 c 'J that contains a neighborhood base for 'J at each
x EX. For example, if X is a metric space, the collection of open balls centered at
x is a neighborhood base for the metric topology at x, and the collection of all open
balls in X is a base.
4.2 Proposition. If'J is a topology on X and £ c 'J, then £ is a base for 'J iff every
nonempty U E 'J is a union of members of £.
Proof. If £ is a base, U E 'J, and x E U, there exists V x E £ with x E V x C U,
so U == UXEU V x . Conversely, if every nonempty U E 'J is a union of members of
£, then {V E £ : x E V} is clearly a neighborhood base at x, so £ is a base. .
4.3 Proposition. If £ c P(X), in order for £ to be a base for a topology on X it is
necessary and sufficient that the following two conditions be satisfied:
a. each x E X is contained in some V E £;
b. ifU, V E £ and x E Un V, there exists W E £ with x EWe (U n V).
Proof. The necessity is clear, since if U, V are open, then so is U n V. To prove
the sufficiency, let
'J == {U eX: for every x E U there exists V E £ with x EVe U}.
Then X E 'J by condition (a) and 0 E 'J trivially, and 'J is obviously closed under
unions. If U 1 , U 2 E 'J and x E U I n U 2 , there exist VI, V 2 E £ with x E VI C U I
and x E V 2 C U 2 , and by condition (b) there exists W E £ with x EWe (VI n V 2 ).
Thus U I n U 2 E 'J, so by induction 'J is closed under finite intersections. Therefore
'J is a topology, and £ is clearly a base for 'J. .
4.4 Proposition. If£ C P(X), the topology 'J(£) generated by £ consists of0, X,
and aU unions offinite intersections of members of £.
Proof. The family of finite intersections of sets in £, together with X, satisfies
the conditions of Proposition 4.3, so by Propostiion 4.2 the family of all unions of
such sets, together with 0, is a topology. It is obviously contained in 'J ( £), hence
equal to 'J (£). .
Note how the simplicity of this proposition contrasts with the corresponding result
for a-algebras (Proposition 1.23). What makes life easier here is that only finite
intersections are involved.
The concept of topological space is general enough to include a great profusion
of interesting examples, but - by the same token - too general to yield many
interesting theorems. To build a reasonable theory one must usually restrict the class
of spaces under consideration. The remainder of this section is devoted to a discussion
of two types of restrictions that are commonly made, the so-called countability and
separation axioms.
116 POINT SET TOPOLOGY
A topological space (X, 'J) satisfies the first axiom of countability, or is first
countable, if there is a countable neighborhood base for 'J at every point of X. (It is
useful to observe that if X is first countable, for every x E X there is a neighborhood
base {U j }l at x such that U j U j + 1 for all 1. Indeed, if {Vj}l is any countable
neighborhood base at x, we can take U j == ni Vi.) The space (X, 'J) satisfies the
second axiom of countability, or is second countable, if 'J has a countable base.
Also, (X, 'J) is separable if X has a countable dense subset. Every metric space is
first countable (the balls of rational radius about x are a neighborhood base at x), and
a metric space is second countable iff it is separable (Exercise 5). The latter fact can
be part! y generalized:
4.5 Proposition. Every second countable space is separable.
Proof. If X is second countable, let G be a countable base for the topology,
and for each U E G pick a point Xu E U. Then the complement of the closure of
{xu : U E G} is an open set that does not include any U E G; hence it is empty and
{ xu: U E G} is dense. I
A sequence {Xj} in a topological space X converges to x E X (in symbols:
x j x) if for every neighborhood U of x there exists J E N such that x j E U for all
j > J. First countable spaces have the pleasant property that such things as closure
and continuity can be characterized in terms of sequential convergence - which is
not the case in more general spaces, as we shall see. For example:
4.6 Proposition. If X is first countable and A c X, then x E A iff there is a
sequence {x j } in A that converges to x.
Proof. Let {U j } be a countable neighborhood base at x with U j U j + 1 for all
j. If x E A , then U j n A =1= 0 for all j. Pick Xj E U j n A; since Uk C U j for k > j
and every neighborhood of x contains some U j , it is clear that Xj x. On the other
hand, if x tt A and {Xj} is any sequence in A, then ( A )C is a neighborhood of x
containing no Xj, so Xj f+ x. .
Lastly, we discuss the separation axioms. These are properties of a topological
space, labeled To, . . . , T 4 , that guarantee the existence of open sets that separate
points or closed sets from each other. If X has the property Tj, we say that X is a Tj
space or that the topology on X is Tj.
To: If x =1= y, there is an open set containing x but not y or an open set containing
y but not x.
T 1 : If x =1= y, there is an open set containing y but not x.
T2: If x =1= y, there are disjoint open sets U, V with x E U and y E V.
T3: X is a Tl space, and for any closed set A C X and any x E AC there are
disjoint open sets U, V with x E U and a C V.
TOPOLOGICAL SPACES 117
T4: X is a TI space, and for any disjoint closed sets A, B in X there are disjoint
open sets U, V with A c U and B c V.
T 2 , T 3 , and T4 also have other names: A T 2 space is a Hausdorff space, a T3
space is a regular space, and a T4 space is a normal space. (Some authors do not
require regular and normal spaces to be T I .) There is an additional useful separation
condition, intermediate between T3 and T 4 , that we shall discuss in 4.2.
The following characterization of TI spaces is useful. It shows in particular that
every normal space is regular and that every regular space is Hausdorff.
4.7 Proposition. X is a TI space iff { x} is closed for every x EX.
Proof. If X is TI and x E X, for each y =1= x there is an open U y containing y
but not x; thus {x}C == Uy#x U y is open and {x} is closed. Conversely, if {x} is
closed, then {x} C is an open set containing every y =1= x. .
The vast majority of topological space that arise in practice (and, in particular, in
this book) are Hausdorff, or become Hausdorff after simple modifications. (This last
phrase refers to spaces such as the space of integrable functions on a measure space,
which becomes a Hausdorff space with the L 1 metric when we identify two functions
that are equal a.e.) However, two classes of (usually) non-Hausdorff topologies
are of sufficient importance to warrant special mention: the quotient topology on a
space of equivalence classes, discussed in Exercises 28 and 29 (34.2), and the Zariski
topology on an algebraic variety. Without attempting to give the definition of an
algebraic variety, we shall describe the Zariski topology on a vector space.
Let k be a field, and let k(X 1 ,... ,X n ) be the ring of polynomials in n variables
over x. Each P E k(X I ,..., X n ) determines a polynomial map p : k n k by
substituting elements of k for the formal indeterminates Xl, . . . , Xn. The corre-
spondence P p is one-to-one precisely when k is infinite. The collection of all
sets p-l ({O}) in k n , as p ranges over all polynomial maps, is closed under finite
unions, since p-I({O}) U q-l({O}) == (pq)-l({O}), and it contains k n itself (take
p == 0). Hence, by Propositions 4.2 and 4.3, the collection of all sets of the form
naEA p;;l ({O}) (Pa being a polynomial map for each ex) is the collection of closed
sets for a topology on k n , called the Zariski topology. The Zariski topology is TI
by Proposition 4.7, for if a == (al,... , an) E k n then {a} == n Pjl ({O}) where
Pj (X 1, . . . , X n) == X j - aj. If k is finite the Zariski topology is discrete, but if k is
infinite the Zariski topology is not Hausdorff; in fact, any two nonempty open sets
have nonempty intersection. This is just a restatement of the fact that k( Xl, . . . , X n)
is an integral domain, that is, if P and Q are nonzero polynomials, then PQ is
nonzero. (For n == 1, the Zariski topology is the cofinite topology.)
Other examples illustrating the separation and countability axioms will be found
in the exercises.
Exercises
1. If card(X) > 2, there is a topology on X that is To but not TI.
118 POINT SET TOPOLOGY
2. If X is an infinite set, the cofinite topology on X is Tl but not T 2 , and is first
countable iff X is countable.
3. Every metric space is normal. (If A, B are closed sets in the metric space (X, p),
consider the sets of points x where p(x, A) < p(x, B) or p(x, A) > p(x, B).)
4. Let X == IR, and let 'J be the family of all subsets of JR( of the form U U (V n Q)
where U, V are open in the usual sense. Then 'J is a topology that is Hausdorff but
not regular. (In view of Exercise 3, this shows that a topology stronger than a normal
topology need not be normal or even regular.)
5. Every separable metric space is second countable.
6. Let G == {( a, b] : - 00 < a < b < oo}.
a. G is a base for a topology 'J on IR in which the members of G are both open
and closed.
b. 'J is first countable but not second countable. (If x E , every neighborhood
base at x contains a set whose supremum is x.)
c. Q is dense in IR with respect to 'J. (Thus the converse of Proposition 4.5 is
false.)
7. If X is a topological space, a point x E X is called a cluster point of the
sequence {Xj} if for every neighborhood U of x, Xj E U for infinitely many j. If X
is first countable, x is a cluster point of {Xj} iff some subsequence of {Xj} converges
tox.
8. If X is an infinite set with the cofinite topology and {Xj} is a sequence of distinct
points in X, then Xj x for every x E X.
9. If X is a linearly ordered set, the topology 'J generated by the sets {x : x < a}
and {x : x > a} (a E X) is called the order topology.
a. If a, b E X and a < b, there exist U, V E 'J with a E U, b E V, and x < y
for all x E U and y E V. The order topology is the weakest topology with this
property.
b. If Y eX, the order topology on Y is never stronger than, but may be weaker
than, the relative topology on Y induced by the order topology on X.
c. The order topology on IR is the usual topology.
10. A topological space X is called disconnected if there exist nonempty open sets
U, V such that U n V == 0 and U U V == X; otherwise X is connected. When we
speak of connected or disconnected subsets of X, we refer to the relative topology
on them.
a. X is connected iff 0 and X are the only subsets of X that are both open and
closed.
b. If {Ea} aEA is a collection of connected subsets of X such that naEA Ea #
0, then UaEA Ea is connected.
c. If A c X is connected, then A is connected.
CONTINUOUS MAPS 119
d. Every point x E X is contained in a unique maximal connected subset of X,
and this subset is closed. (It is called the connected component of x.)
11. If E 1 , . . . , En are subsets of a topological space, the closure of U Ej is U Ej .
12. Let X be a set. A Kuratowski closure operator on X is a map A A * from
P(X) to itself satisfying (i) 0* == 0, (ii) A c A* for all A, (iii) (A*)* == A* for all
A, and (iv) (A u B)* == A* u B* for all A, B.
a. If X is a topological space, the map A A is a Kuratowski closure operator.
(D se Exercise 11.)
b. Conversely, given a Kuratowski closure operator, let == {A eX: A == A *}
and'J == {U eX: UC E }. Then'J is a topology, and for any set A c X, A*
is its closure with respect to 'J .
13. If X is a topological space, U is open in X, and A is dense in X, then U == UnA.
4.2 CONTINUOUS MAPS
Topological spaces are the natural setting for the concept of continuity, which can
be described in either global or local terms as follows. Let X and Y be topological
spaces and f a map from X to Y. Then f is called continuous if f-l (V) is open
in X for every open V C Y. (Since f-l (A C) == [f- 1 (A)] c, an equivalent condition
is that f-l (A) is closed in X for every closed A c Y.) If x E X, f is called
continuous at x if for every neighborhood V of I (x) there is a neighborhood U of
x such that I(U) c V, or equivalently, if 1-1 (V) is a neighborhood of x for every
neighborhood V of I (x). Clearly, if I : X ---+ Y and 9 : Y ---+ Z are continuous (or
I is continuous at x and 9 is continuous at I(x)), then 9 0 I is continuous (at x). We
shall denote the set of continuous maps from X to Y by C(X, Y).
4.8 Proposition. The map f : X ---+ Y is continuous iff I is continuous at every
x E X.
Proof. If I is continuous and V is a neighborhood of f(x), 1- 1 (VO) is an open
set containing x, so I is continuous at x. Conversely, suppose that I is continuous at
each x EX. If V c Y is open, V is a neighborhood of each of its points, so 1-1 (V)
is a neighborhood of each of its points. Thus 1-1 (V) is open, so I is continuous. .
4.9 Proposition. If the topology on Y is generated by a family of sets £, then
f : X -+ Y is continuous iff f-l (V) is open in X for every V E G.
Proof. This is clear from Proposition 4.4 and the fact that the set mapping f-l
commutes with unions and intersections. .
If I : X ---+ Y is bijective and I and 1-1 are both continuous, I is called a
homeomorphism, and X and Yare said to be homeomorphic. In this case the
set mapping 1-1 is a bijection from the open sets in Y to the open sets in X, so
120 POINT SET TOPOLOGY
X and Y may be considered identical as far as their topological properties go. If
I : X -+ Y is injective but not surjective, and I : X ---+ I (X) is a homeomorphism
when I(X) c Y is given the relative topology, I is called an embedding.
If X is any set and {I a : X -+ Ya}aEA is a family of maps from X into some
topological spaces Y a , there is a unique weakest topology 'J on X that makes all the
la continuous; it is called the weak topology generated by {la}aEA. Namely, 'J is
the topology generated by sets of the form I;; 1 (U a) where a E A and U a is open in
Ya.
The most important example of this construction is the Cartesian product of
topological spaces. If {X a} aE A is any family of topological spaces, the product
topology on X == TIaEA Xa is the weak topology generated by the coordinate
maps 1fa : X ---+ Xa. When we consider a Cartesian product of topological spaces,
we always endow it with the product topology unless we specify otherwise. By
Proposition 4.4, a base for the product topology is given by the sets of the form
n7rjl(Uaj) where n E Nand U aj is open in X aj for 1 < j < n. These sets
can also be written as TIaEA U a where U a == Xa if a =I- al,.. . , an. Notice, in
particular, that if A is infinite, a product of nonempty open sets TIaEA U a is open in
TIaEA Xa iff U a == Xa for all but finitely many a.
4.10 Proposition. If Xa is Hausdorff for each a E A, then X
Hausdorff.
TIaEA Xa is
Proof. If x and yare distinct points of X, we must have 7r a (x) i- 7r a (Y) for
some a. Let U and V be disjoint neighborhoods of 1fa(x) and 1fa(Y) in Xa. Then
n;; 1 (U) and 7r;; 1 (V) are disjoint neighborhoods of x and y in X. .
4.11 Proposition. IfX a (a E A)andY are topological spaces and X == TIaEA Xa,
then I : Y ---+ X is continuous iff 1f a 0 f is continuous for each a.
Proof. If 7r a 0 I is continuous for each a, then 1-1 (7r;; 1 (U a)) is open in Y for
each open U a in Xa. By Proposition 4.9, I is continuous. The converse is obvious..
If the spaces Xa are all equal to some fixed space X, the product TIaEA Xa is just
the set X A of mappings from A to X, and the product topology is just the topology
of pointwise convergence. More precisely:
4.12 Proposition. If X is a topological space, A is a nonempty set, and {In} is a
sequence in X A, then In ---+ I in the product topology iff In ---+ I pointwise.
Proof. The sets
k
N(U 1 ,..., Uk) == n1f/(Uj) == {g E X A : g(aj) E U j for 1 < j < k},
1
where kEN and U j is a neighborhood of f (aj ) in X for each j, form a neighborhood
base for the product topology at I. If In ---+ I pointwise, then In (aj) E U j for
CONTINUOUS MAPS 121
n > N j and hence In E N(U 1 ,..., Uk) for n > max(N 1 ,..., Nk); therefore
In ---+ I in the product topology. Conversely, if In ---+ I in the product topology,
ex E A, and U is a neighborhood of I(ex), then In E N(U) = 7r;;;l(U) for large n;
hence In ( ex) E U for large n, and so In ( ex) ---+ I ( ex). .
We shall be particularly interested in real- and complex-valued functions on topo-
logical spaces. If X is any set, we denote by B(X, ffi.) (resp. B(X, CC)) the space
of all bounded real- (resp. complex-) valued functions on X. If X is a topological
space, we also have the spaces C(X, ffi.) and C(X, CC) of countinuous functions on
X, and we define
BC(X, F) == B(X, F) n C(X, F)
(F == ffi. or CC).
In speaking of complex-valued functions we shall usually omit the CC and simply
write B(X), C(X), and BC(X). Since addition and multiplication are continuous
from CC x CC to CC, C(X) and BC(X) are complex vector spaces.
If I E B(X), we define the uniform norm of f to be
1I/IIu = sup{lf(x)1 : x EX}.
The function p(/,g) = III - gllu is easily seen to be a metric on B(X), and
convergence with respect to this metric is simply uniform convergence on X. B(X)
is obviously complete in the uniform metric: If {In} is uniformly Cauchy, then
{In (x)} is Cauchy for each x, and if we set I (x) = limn In (x), it is easily verified
that IIln - Illu ---+ o.
4.13 Proposition. If X is a topological space, BC(X) is a closed subspace of
B(X) in the uniform metric; in particular, BC(X) is complete.
Proof. Suppose {fn} C BC(X) and IIfn - fllu ---+ O. Given E > 0, choose N
so large that IIln - Iliu < E/3 for n > N. Given n > N and x E X, since In is
continuous at x there is a neighborhood U of x such that Iln(Y) - In(x)1 < E/3 for
Y E U. But then
I/(Y) - l(x)1 < I/(Y) - In(Y)1 + I/n(Y) - In(x)1 + Iln(x) - l(x)1 < E,
so I is continuous at x. By Proposition 4.8, I is continuous.
.
For a given topological space X it may happen that C(X) consists only of constant
functions. This is obviously the case, for example, if X has the trivial topology, but
it can happen even when X is regular. Normal spaces, however, always have plenty
of continuous functions, as the following fundamental theorems show.
4.14 Lemma. Suppose that A and B are disjoint closed subsets of the normal space
X, and let == {k2- n : n > 1 and 0 < k < 2 n } be the set of dyadic rational
numbers in (0, 1). There is a family {U r : r E } of open sets in X such that
A C U r C Be for all r E and U r C Usforr < s.
122 POINT SET TOPOLOGY
Proof. By normality, there exist disjoint open sets V, W such that A c V,
B c W. Let U 1 / 2 == V. Then since WC is closed,
A C U 1 / 2 C U 1 / 2 C W C c B C .
We now select U r for r == k2- n by induction on n. Suppose that we have chosen U r
for r == k2- n when 0 < k < 2 n and n < N - 1. To find U r for r == (2j + 1)2- N
(0 < j < 2N-l), observe that U j 2 1-N and (U(j+l)21-N)C are disjoint closed sets
(where we set U a == A and Uf == B), so as above we can choose an open U r with
A C U j 2 1-N C U r C U r C U(j+l)21-N C B C .
These Ur's clearly have the desired properties.
.
4.15 Urysohn's Lemma. Let X be a normal space. If A and B are disjoint closed
sets in X, there exists f E C(X, [0,1]) such that f == 0 on A and f == 1 on B.
Proof. Let U r be as in Lemma 4.14 for r E , and set U 1 == X. For x E X,
define f(x) == inf{r : x E U r }. Since A C U r C BC for 0 < r < 1, we clearly have
f(x) == 0 for x E A and f(x) == 1 for x E B, and 0 < f(x) < 1 for all x E X. It
remains to show that f is continuous. To this end, observe that f (x) < ex iff x E U r
for some r < ex iff x E Ur<a U r , so f-l(( -00, ex)) == Ur<a U r is open. Also
f(x) > ex iff x U r for some r > ex iff x U s for some s > ex (since U s C U r for
s < r) if[ x E Us>a( U s)C, so f-l((a, 00)) == Us>a( U s)C is open. Since the open
half-lines generate the topology on ffi., f is continuous by Proposition 4.9. .
The proof of Urysohn's lemma may seem somewhat opaque at first, but there is a
simple geometric intuition behind it. If one pictures X as the plane ffi.2 and the sets
U r as regions bounded by curves, the curves aUr form a "topographic map" of the
function f.
4.16 The Tietze Extension Theorem. Let X be a normal space. If A is a closed
subset of X and f E C(A, [a, b]), there exists F E C(X, [a, b]) such that FIA == f.
Proof. Replacing f by (f - a)/(b - a), we may assume that [a, b] == [0,1].
We claim that there is a sequence {gn} of continuous functions on X such that
o < gn < 2 n - 1 /3 n on X and 0 < f - L gj < (2/3)n on A. To begin with, let
B == f-l ([0,1/3]) and C == f-l ([2/3,1 ]). These are closed subsets of A, and since
A itself is closed, they are closed in X. By Urysohn's lemma there is a continuous
gl : X --+ [0,1/3] with gl == 0 on Band gl == 1/3 on C; it follows that 0 <
f - gl < 2/3 on A. Having found gl, . . . , gn-l, by the same reasoning we can find
gn : X [0, 2n-1 /3 n ] such that gn == 0 on the set where f - L-1 gj < 2 n - 1 /3 n
and gn == 2 n - 1 /3 n on the set where f - L-1 gj > (2/3)n. Let F == L gn.
Since Ilgn Ilu < 2 n - 1 /3 n , the partial sums of this series converge uniformly, so F is
continuous by Proposition 4.13. Moreover, on A we have 0 < f - F < (2/3)n for
all n, whence F == f on A. .
CONTINUOUS MAPS 123
4.17 Corollary. If X is normal, A c X is closed, and f E C(A), there exists
F E C(X) such that FIA == f.
Proof. By considering real and imaginary parts separately, it suffices to assume
that f is real-valued. Let g == f /(1 + If I). Then g E C(A, (-1,1)), so there exists
G E C(X, [-1,1]) with GIA == g. Let B == G- 1 ( {-1, 1}). By Urysohn's lemma
there exists h E C(X, [0,1]) with h == 1 on A, h == 0 on B. Then hG == G on A
and IhGI < 1 everywhere, so F == hG/(1 -lhG!) does the job. .
A topological space X is called completely regular if X is T 1 and for each closed
A c X and each x A there exists f E C(X, [0,1]) such that f(x) == 1 and f == 0
on A. Completely regular spaces are also called Tychonoff spaces or T3 1 spaces.
2
The latter terminology is justified, for every completely regular space is T3 (if A, x,
f are as above, then f -1 ( ( , (0)) and f -1 ( ( - 00, )) are disjoint neighborhoods of
x and A), and Urysohn's lemma shows that every T4 space is completely regular.
Exercises
14. If X and Ya re topol ogical spaces, f : X Y is continuous iff f( A ) c f{A)
for all A c X iff f-1(B) c f-1( B ) for all BeY.
15. If X is a topological space, A c X is closed, and 9 E C (A) satisfies g == 0 on
a A, then the extension of g to X defined by g( x) == 0 for x E A C is continuous.
16. Let X be a topological space, Y a Hausdorff space, and f, g continuous maps
from X to Y.
a. {x: f(x) == g(x)} is closed.
b. If f == g on a dense subset of X, then f == g on all of X.
17. If X is a set, a collection of real-valued functions on X, and 'J the weak
topology generated by , then 'J is Hausdorff iff for every x, y E X with x -1= y there
exists f E with f(x) =1= f(y).
18. If X and Yare topological spaces and Yo E Y, then X is homeomorphic to
X x {Yo} where the latter has the relative topology as a subset of X x Y.
19. If {X Q } is a family of topological spaces, X == TIQ X Q (with the product
topology) is uniquely determined up to homeomorphism by the following property:
There exist continuous maps 7r Q : X X Q such that if Y is any topological space
and f Q E C(Y, X Q ) for each ex, there is a unique F E C(Y, X) such that fQ == 1f Q of.
(Thus X is the category-theoretic product of the XQ's in the category of topological
spaces. )
20. If A is a countable set and X Q is a first (resp. second) countable space for each
ex E A, then TIQEA X Q is first (resp. second) countable.
21. If X is an infinite set with the cofinite topology, then every f E C(X) is constant.
22. Let X be a topological space, (Y, p) a complete metric space, and {fn} a
sequence in yX such that sUPxEX p(fn(x), fm(x)) 0 as m, n 00. There is
124 POINT SET TOPOLOGY
a unique 1 E Y x such that sUPxEX p(f n (x), f (x)) ---+ 0 as n ---+ 00. If each In is
continuous, so is f.
23. Give an elementary proof of the Tietze extension theorem for the case X == ffi..
24. A Hausdorff space X is normal iff X satisfies the conclusion ofUrysohn's lemma
iff X satisfies the conclusion of the Tietze extension theorem.
25. If (X, 'J) is completely regular, then 'J is the weak topology generated by C(X).
26. Let X and Y be topological spaces.
a. If X is connected (see Exercise 10) and f E C(X, Y), then I(X) is con-
nected.
b. X is called arcwise connected if for all Xo, Xl E X there exists f E
C([O, 1], X) with 1(0) == Xo and f(1) == Xl. Every arcwise connected space is
connected.
c. Let X == {( 8, t) E ffi.2 : t == sin( 8- 1 )} U {(O, O)}, with the relative topology
induced from ffi. 2 . Then X is connected but not arcwise connected.
27. If Xa is connected for each ex E A (see Exercise 10), then X == TIaEA Xa is
connected. (Fix X E X and let Y be the connected component of X in X. Show that
Y includes {y EX: 1f a (y) == 7r a (x) for all but finitely many ex} and that the latter
set is dense in X . Use Exercises 10 and 18.)
28. Let X be a topological space eq uipped with an equivalence relation, X the set
of equivalence classes, 1f : X ---+ X the map taking each X E X to its equivalence
class, and 'J == {U c X : 1f-l (U) is open in X}.
a. 'J is a topology on X. (It is called the quotient topology.)
b. If Y is a topological space, f : X ---+ Y is continuous iff 1 0 7r is continuous.
c. X is Tl iff every equivalence class is closed.
29. If X is a topological space and G is a group of homeomorphisms from X to
itself, G induces an equivalence relation on X, namely, x rv y iff y == g(x) for some
g E G. Let X == ffi.2; describe the quotient space X and the quotient topology on
it (as in Exercise 28) for each of the following groups of invertible linear maps. In
particular, show that in (a) the quotient space is homeomorphic to [0, (0); in (b) it is
Tl but not Hausdorff; in (c) it is To but not T l , and in (d) it is not To. (In factin (d)
X i s unc ou nt able, but there are only six open sets and there are points p E X such
that {p} == X.)
a. { ( : n/): B E }
b.{( ):aE}
c. {( ): a > 0, b E }
d. { ( ): a, b E Q \ {O} }
NETS 125
4.3 NETS
As we have hinted above, sequential convergence does not play the same central
role in general topological spaces as it does in metric spaces. The reasons for
this may be illustrated by the following example. Consider the space CC lR of all
complex-valued functions on IR, with the product topology (i.e., the topology of
pointwise convergence), and its subspace C(IR). On the one hand, by Corollary 2.9,
if {In} C C(IR) and In I pointwise, then I is Borel measurable, so the set of
I imi ts of convergent sequences in C (IR) is a proper subset of CC lR . Nonetheless, C (IR)
is dense in CC R . Indeed, if I E CC lR , the sets
{g E CC lR : I g ( x j) - I ( x j ) I < E for j == 1, . . . , n }
(n EN, Xl,..., X n E IR, E > 0)
form a neighborhood base at I, and each of these sets clearly contains continuous
functions.
There is, however, a generalization of the notion of sequence that works well in
arbitrary topological spaces; the key idea is to use index sets more general than N.
The precise definitions are as follows.
A directed set is a set A equipped with a binary relation ;S such that
. a;S a for all Q E A;
· if a ;S {3 and {3 ;S 1 then a ;S 1;
. for any a, {3 E A there exists 1 E A such that a ;S 1 and {3 ;S 1.
If a ;S {3, we shall also write {3 a. A net in a set X is a mapping a Xa from a
directed set A into X. We shall usually denote such a mapping by (Xa)aEA, or just
by (xa) if A is understood, and we say that (xa) is indexed by A.
Here are some examples of directed sets:
1. The set of positive integers N, with j ;S k iff j < k.
11. The set IR \ {a} (a E IR), with x ;S Y iff Ix - al > Iy - al.
111. The set of all partitions {Xj}o of the interval [a, b] (i.e., a == Xo < ... < X n ==
b), with {Xj}o ;S {Yk}O iffmax(xj - Xj-l) > maX(Yk - Yk-l).
IV. The set N of all neighborhoods of a point x in a topological space X, with
U ;S V iff U V. (We say that N is directed by reverse inclusion.)
v. The Cartesian product A x B of two directed sets, with (a, (3) ;S (a', (3') iff
a ;S a' and {3 ;S {3'. (This is always the way we make A x B into a directed
set. )
Examples (i)-(iii) occur in elementary analysis: A net indexed by N is just a
sequence, and the nets indexed by the sets in (ii) and (iii) occur in defining limits of
126 POINT SET TOPOLOGY
real variables and Riemann integrals. Example (iv) is of fundamental importance in
topology, and we shall see several uses of the construction in (v).
Let X be a topological space and E a subset of X. A net (Xa)aEA is eventually
in E if there exists ao E A such that Xa E E for a 2: ao, and (xa) is frequently in
E if for every a E A there exists {3 2: a such that x{3 E E. A point x E X is a limit
of (xa) (or (xa) converges to x, or Xa ---+ x) if for every neighborhood U of x, (xa)
is eventually in U, and x is a cluster point of (xa) if for every neighborhod U of x,
(xa) is frequently in U.
The next three propositions show that nets are a good substitute for sequences.
4.18 Proposition. If X is a topological space, E c X, and x E X, then x is an
accumulation point of E iff there is a net in E \ {x} that converges to x, and x E E
iff there is a net in E that converges to x.
Proof. If x is an accumulation point of E, let N be the set of neighborhoods
of x, directed by reverse inclusion. For each U E N, pick Xu E (U \ {x}) n E.
Then Xu ---+ x. Conversely, if Xa E E \ {x} and Xa ---+ x, then every punctured
neighborhood of x contains some Xa, so x is an accumulation point of E. Likewise,
if Xa ---+ X where Xa E E, then x E E, and the converse follows from Proposition
4.1. .
4.19 Proposition. If X and Yare topological spaces and f : X ---+ Y, then f is
continuous at x E X ifffor every net (xa) converging to x, (f(xa)) converges to
f(x).
Proof. If f is continuous at x and V is a neighborhood of f(x), then f-l (V) is a
neighborhood of x. Hence, if Xa ---+ X then (xa) is eventually in f-l (V), so (f(xa))
is eventually in V, and thus f(xa) ---+ f(x). On the other hand, if f is not continuous
at x, there is a neighborhood V of f (x) such that f-l ( V) is not a neighborhood of x,
that is, x (f-l (V))O, or equivalently, x E f-l (Vc). By Proposition 4.18, there is
a net (xa) in f-l (VC) that converges to x. But then f(xa) V, so f(xa) -F f(x)..
A subnet of a net (Xa)aEA is a net (Y{3)f3EB together with a map !3 af3 from
B to A such that:
· for every ao E A there exists !3o E B such that af3 2: ao whenever !3 2: !3o;
· Y{3 == xa{3.
Clearly if (xa) converges to a point x, then so does any subnet (x a {3).
Warning: The name "subnet" is used because subnets perform much the same
functions as subsequences, but it should not be taken too literally, as the mapping
{3 af3 need not be injective. In particular, the index set B may well have larger
cardinality than the index set A, and a subnet of a sequence need not be a subsequence.
4.20 Proposition. If (Xa)aEA is a net in a topological space X, then x E X is a
cluster point of (xa) iff (xa) has a subnet that converges to x.
NETS 127
Proof. If (Y{3) == (x a {3) is a subnet converging to x and U is a neighborhood
of x, choose f31 E B such that Y{3 E U for f3 2: f31. Also, given a E A, choose
f32 E B such that a{3 2: a for f3 2: f32. Then there exists f3 E B with f3 2: f31 and
f3 2: f32, and we have a{3 2: a and x a {3 == Y{3 E U. Thus (xa) is frequently in U,
so x is a cluster point of (xa). Conversely, if x is a cluster point of (xa), let N be
the set of neighborhoods of x and make N x A into a directed set by declaring that
(U, a) ;S (U ' , a ' ) iff U U ' and a ;S a'e For each (U,) E N x A we can choose
a(V,,) E A such that a(V,,) 2: and xa(U,-y) E U. Then if (U ' , /) 2: (U,) we
have a(V' ,,') 2: ' 2: and xa(U' ,-y') E U ' C U, whence it follows that (xa(U,-y)) is
a subnet of (xa) that converges to x. .
Exercises
30. If A is a directed set, a subset B of A is called cofinal in A if for each a E A
there exists {3 E B such that {3 2: a.
a. If B is cofinal in A and (Xa)aEA is a net, the inclusion map B A makes
(x{3) {3EB a subnet of (Xa)aEA.
b. If (Xa)aEA is a net in a topological space, then (xa) converges to x iff for
every cofinal B c A there is a cofinal C c B such that (X,),EC converges to x.
31. Let (Xn)nEN be a sequence.
a. If k nk is a map from N to itself, then (X nk ) kEN is a subnet of (xn) iff
nk 00 as k 00, and it is a subsequence (as defined in gO. 1 ) iff nk is strictly
increasing in k.
b. There is a natural one-to-one correspondence between the subsequences of
(xn) and the subnets of (xn) defined by cofinal sets as in Exercise 30.
32. A topological space X is Hausdorff iff every net in X converges to at most
one point. (If X is not Hausdorff, let x and y be distinct points with no disjoint
neighborhoods, and consider the directed set N x x Ny where N x , Ny are the families
of neighborhoods of x, y.)
33. Let (Xa)aEA be a net in a topological space, and for each a E A let Ea == {x{3 :
/3 2: a}. Then x is a cluster point of (xa) iff x E naEA E a.
34. If X has the weak topology generated by a family 1" of functions, then (xa)
converges to x E X iff (f (xa)) converges to f (x) for all f E 1". (In particular, if
X == DiE! Xi, then Xa x iff 7ri(Xa) 7ri(X) for all i E I.)
35. Let X be a set and A the collection of all finite subsets of X, directed by inclusion.
Let f : X ffi. be an arbitrary function, and for A E A, let ZA == LXEA f(x).
Then the net (ZA) converges in ffi. iff {x: f(x) =1= O} is a countable set {Xn}nEN and
L If(xn)1 < 00, in which case ZA L f(xn). (Cf. Proposition 0.20.)
36. Let X be the set of Lebesgue measurable complex-valued functions on [0, 1].
There is no topology 'J on X such that a sequence (fn) converges to f with respect
to 'J iff f n f a.e. (Use Corollary 2.32 and Exercises 30b and 31 b.)
128 POINT SET TOPOLOGY
4.4 COMPACT SPACES
In 30.6 we gave three equivalent characterizations of compactness for metric spaces:
the Heine-Borel property, the Bolzano- Weierstrass property, and completeness plus
total boundedness. Only the first two of these make sense for general topological
spaces, and it is the first one that turns out to be the most useful. Accordingly, we
define a topological space X to be compact if whenever {Ua}aEA is an open cover
of X - that is, a collection of open sets such that X == UaEA U a - there is a
finite subset B of A such that X = UaEB U a . To be brief (although somewhat
sylleptic, since the adjectives "open" and "finite" refer to different things), we say:
X is compact if every open cover of X has a finite subcover.
A subset Y of a topological space X is called compact if it is compact in the
relative topology; thus Y c X is compact iff whenever {Ua}aEA is a collection of
open subsets of X with Y c UaEA U a , there is a finite B c A with Y c UaEB U a .
Furthermore, Y is called precompact if its closure is compact.
DeMorgan's laws lead to the following characterization of compactness in terms of
closed sets. A family {Fa}aEA of subsets of X is said to have the finite intersection
property if naEB Fa -1= 0 for all finite B c A.
4.21 Proposition. A topological space X is compact ifffor every family {Fa }aEA
of closed sets with the finite intersection property, naEA Fa =1= 0.
Proof. Let U a = (Fa)c. Then U a is open, naEA Fa -1= 0 iff UaEA U a =1= X,
and {Fa} has the finite intersection property iff no finite subfamily of {U a } covers
X. The result follows. .
We now list several basic facts about compact spaces.
4.22 Proposition. A closed subset of a compact space is compact.
Proof. If X is compact, F c X is closed, and {Ua}aEA is a family of open sets
in X with F c UaEA U a , then {Ua}aEA U {FC} is an open cover of X. It has a
finite subcover, so by discarding FC from the latter if necessary, we obtain a finite
subcollection of {Ua}aEA that covers F. .
4.23 Proposition. If F is a compact subset of a Hausdorff space X and x tt. F,
there are disjoint open sets U, V such that x E U and F c V.
Proof. Foreachy E F, choose disjoint open U y andV y with x E Uyandy E V Y .
{VY}YEF is an open cover of F, so it has a finite subcover {VYj}. Then U = n U yj
and V == U V Yj have the desired properties. .
4.24 Proposition. Every compact subset of a Hausdorff space is closed.
Proof. According to Proposition 4.23, if F is compact then FC is a neighborhood
of each of its points, hence is open. .
COMPACT SPACES 129
We remark that in a non-Hausdorff space, compact sets need not be closed (for
example, every subset of a space with the trivial topology is compact), and the
intersection of compact sets need not be compact; see Exercise 37. Of course, in
a Hausdorff space the intersection of any family of compact sets is compact by
Propositions 4.22 and 4.24. Moreover, in an arbitrary topological space a finite union
of compact sets is always compact. (If K 1, . . . K n are compact and {U a} is an open
cover of U Kj, choose a finite subcover of each Kj and combine them.)
4.25 Proposition. Every compact Hausdorff space is normal.
Proof. Suppose that X is compact Hausdorff and E, F are disjoint closed subsets
of X. By Proposition 4.23, for each x E E there exist disjoint open sets U x , V x with
x E U x , F c V x . By Proposition 4.22, E is compact, and {Ux}xEE is an open cover
of E, so there is a finite subcover {U Xj }1. Let U == U U Xj and V == n V Xj . Then
U and V are disjoint open sets with E c U and F c V. .
4.26 Proposition. If X is compact and f : X Y is continuous, then f(X) is
compact.
Proof. Let {Va} be an open cover of f(X) in Y. Then {f-l(V a )} is an open
cover of X, so it has a finite subcover {f-l (V aj )}, and {V aj } is then a finite subcover
of j(X). .
4.27 Corollary. If X is compact, then C(X) == BC(X).
4.28 Proposition. If X is compact and Y is Hausdorff, then any continuous bijection
f : X Y is a homeomorphism.
Proof. If E c X is closed, then E is compact, hence f(E) is compact, hence
f (E) is closed, by Propositions 4.22, 4.26, and 4.24. This means that f-l is
continuous, so f is a homeomorphism. .
We now show that a version of the Bolzano- Weierstrass property holds for compact
topological spaces. As one might suspect, it is merely necessary to replace sequences
by nets.
4.29 Theorem. If X is a topological space, the following are equivalent:
a. X is compact.
b. Every net in X has a cluster point.
c. Every net in X has a convergent subnet.
Proof. The equivalence of (b) and (c) follows from Proposition 4.20. If X is
compact and (xa) is a net in X, let Ea == {xj3 : {3 ex}. Since for any ex, {3 E A
there exists '"Y E A with '"Y ex and '"Y {3, the family {Ea }aEA has the finite
intersection property, so by Proposition 4.21, naEA E a ::/= 0. If x E naEA E a
and U is a neighborhood of x, then U intersects each Ea, which means that (xa)
130 POINT SET TOPOLOGY
is frequently in U, so x is a cluster point of (XCi). On the other hand, if X is not
compact, let {U,e},eEB be an open cover of X with no finite subcover. Let A be the
col1ection of finite subsets of B, directed by inclusion, and for each A E A let x A
be a point in (U,eEA U,e)c. Then (XA)AEA is a net with no cluster point. Indeed, if
x E X, choose {3 E B with x E U,e. If A E A and A {{3} then XA f/:. U,e, so x is
not a cluster point of (x A). .
We conclude by mentioning two other useful concepts related to compactness. A
topological space X is called countably compact if every countable open cover of
X has a finite subcover, and sequentially compact if every sequence in X has a
convergent subsequence. Of course, every compact space is countably compact, and
for metric spaces compactness and sequential compactness are equivalent. However,
in general there is no relation between compactness and sequential compactness. See
Exercises 39-43 for further results and examples.
Exercises
37. Let 0 ' denote a point that is is not an element of ( -1, 1), and let X == (-1, 1) U
{O/}. Let 'I be the topology on X generated by the sets (-1, a), (a, 1), [( -1, b) \
{O}]U{O/}, and [(c, 1) \ {O}]U{O/} where -1 < a < 1,0< b < 1, and -1 < c < O.
(One should picture X as (-1,1) with the point 0 split in two.)
a. Define f, 9 : (-1,1) --7 X by f(x) == x for all x, g(x) == x for x =I- 0, and
g(O) == 0 ' . Then f and 9 are homeomorphisms onto their ranges.
b. X is Tl but not Hausdorff, although each point of X has a neighborhood that
is homeomorphic to (-1,1) (and hence is Hausdorft).
c. The sets [- , ] and ([ - , ] \ {O}) U {O/} are compact but not closed in X,
and their intersection is not compact.
38. Suppose that (X, 'I) is a compact Hausdorff space and'l' is another topology
on X. If'l' is strictly stronger than 'I, then (X, 'I') is Hausdorff but not compact. If
'I' is strictly weaker than 'I, then (X, 'I') is compact but not Hausdorff.
39. Every sequentially compact space is countably compact.
40. If X is countably compact, then every sequence in X has a cluster point. If X
is also first countable, then X is sequentially compact.
41. A Tl space X is countably compact iff every infinite subset of X has an accu-
mulation point.
42. The set of countable ordinals (30.4) with the order topology (Exercise 9) is
sequentially compact and first countable but not compact. (To prove sequential
compactness, use Proposition 0.19.)
43. For x E [0,1), let L a n (x)2-n (an(x) == 0 or 1) be the base-2 decimal
expansion of x. (If x is a dyadic rational, choose the expansion such that an (x) == 0
for n large.) Then the sequence (an) in {O, 1 }[O,l) has no pointwise convergent
subsequence. (Hence {O, 1 }[O,l), with the product topology arising from the discrete
LOCALLY COMPACT HAUSDORFF SPACES 131
topology on {O, I}, is not sequentially compact. It is, however, compact, as we shall
show in 34.6.)
44. If X is countably compact and f : X Y is continuous, then f(X) is countably
compact.
45. If X is normal, then X is countably compact iff C(X) == BC(X). (Use
Exercises 40 and 44. If (x n ) is a sequence in X with no cluster point, then {x n : n E
N} is closed, and Corollary 4.17 applies.)
4.5 LOCALLY COMPACT HAUSDORFF SPACES
A topological space is called locally compact if every point has a compact neighbor-
hood. We shall be mainly concerned with locally compact Hausdorff spaces, which
we call LCH spaces for short.
4.30 Proposition. If X is an LCH space, U C X is open, and x E U, there is a
compact neighborhood N of x such that N c U.
Proof. We may assume U is compact; otherwise, replace U by U n po where
p is a compact neighborhood of x. By Proposition 4.23, there are disjoint relatively
open sets V, W in U with x E V and au c W. Then V is open in X since V C U,
and V is a closed and hence compact subset of U \ W. Thus we may take N == V . .
4.31 Proposition. If X is an LCH space and K cUe X where K is compact and
U is open, there exists a precompact open V such that K eVe V c U.
Proof. By Proposition 4.30, for each x E K we can choose a compact neigh-
borhood N x of x with N x c U. Then {N}xEK is an open cover of K, so there
is a finite subcover {Nj}l. Let V == U Nj; then K C V and V == U N Xj is
compact and contained in U. .
4.32 Urysohn's Lemma, Locally Compact Version. If X is an LCH space and
K cUe X where K is compact and U is open, there exists f E C(X, [0,1]) such
that f == 1 on K and f == 0 outside a compact subset ofU.
Proof. Let V be as in Proposition 4.31. Then V is normal by Proposition 4.25,
so by Urysohn's lemma 4.15 there exists f E C( V , [0,1]) such that f == 1 on K
-c
and f == 0 on avo We extend f to X by setting f == 0 on V . Suppose that
E C [0,1] is closed. If 0 ft. E we have f-l(E) == (fI V )-I(E), and if 0 E E we
have f-l(E) == (fI V )-I(E) U V C == (fI V )-I(E) U VC since (fI V )-I(E) avo
In either case, f-l (E) is closed, so f is continuous. .
4.33 Corollary. Every LCH space is completely regular.
132 POINT SET TOPOLOGY
4.34 Tietze Extension Theorem, Locally Compact Version. Suppose that X is an
LCH space and K c X is compact. If I E C(K), there exists F E C(X) such that
FIK == f. Moreover, F may be taken to vanish outside a compact set.
The proof is similar to that of Theorem 4.32; details are left to the reader (Exercise
46).
The preceding results show that LCH spaces have a rich supply of continuous
functions that vanish outside compact sets. Let us introduce some terminology: If X
is a topological space and I E C(X), the support of I, denoted by supp(I), is the
smallest closed set outside of which I vanishes, that is, the closure of {x : I (x) =I- O}.
If supp(I) is compact, we say that I is compactly supported, and we define
Cc(X) == {I E C(X) : supp(f) is compact}.
Moreover, if f E C (X), we say that f vanishes at infinity if for every E > 0 the set
{x : I I ( x ) I > E} is compact, and we define
Co (X) == {f E C(X) : f vanishes at infinity}.
Clearly Cc(X) C Co (X). Moreover, Co (X) c BC(X), because for I E Co (X)
the image of the set {x : I I (x) I > E} is compact, and I I I < E on its complement.
4.35 Proposition. If X is an LCH space, Co(X) is the closure of Cc(X) in the
uniform metric.
Proof. If {fn} is a sequence in Cc(X) that converges uniformly to f E C(X),
for each E > 0 there exists n E N such that Ilfn - fllu < E. Then II(x)1 < E
if x f/:. supp(fn), so I E Co(X). Conversely, if I E Co(X), for n E N let
Kn == {x : If(x)1 > n- 1 }. Then Kn is compact, so by Theorem 4.32 there exists
gn E Cc(X) with 0 < gn < 1 and gn == 1 on Kn. Let fn == gni. Then fn E Cc(X)
and IIIn - Illu < n- 1 , so fn I uniformly. .
If X is a noncompact LCH space, it is possible to make X into a compact space
by adding a single point "at infinity" in such a way that the functions in Co (X) are
precisely those continuous functions I such that I (x) 0 as x approaches the point
at infinity. More precisely, let 00 denote a point that is not an element of X, let
X* == X U {oo}, and let 'J be the collection of all subsets of X* such that either (i)
U is an open subset of X, or (ii) 00 E U and UC is a compact subset of X.
4.36 Proposition. If X, X*, and'J are as above, then (X*, 'J) is a compact Haus-
dorff space, and the inclusion map i : X X* is an embedding. Moreover, if
I E C(X), then I extends continuously to X* iff I == 9 + c where 9 E Co(X) and
c is a constant, in which case the continuous extension is given by I ( (0) == c.
The proof is straightforward and is left to the reader (Exercise 47). The space X*
is called the one-point compactification or Alexandroff compactification of X.
If X is a topological space, the space ex of all complex-valued functions on X
can be topologized in various ways. One way, of course, is the product topology,
LOCALLY COMPACT HAUSDORFF SPACES 133
that is, the topology of pointwise convergence. Another is the topology of uniform
convergence, which is generated by the sets
{g E eX : sup Ig(x) - f(x)1 < n- 1 }
xEX
(n E N, f E CX).
The proof of Proposition 4.13 shows that C (X) is a closed subspace of C X in the
topology of uniform convergence. Intermediate between these two topologies is the
topology of uniform convergence on compact sets, which is generated by the sets
{g E eX : sup Ig(x) - f(x)1 < n- 1 }
xEK
(n E N, f E CX, K c X compact).
We shall now examine this topology in the case where X is an LCH space.
4.37 Lemma. If X is an LCH space and E c X, then E is closed iff E n K is
closed for every compact K c x.
Proof. If E is closed, then E n K is closed by Propositions 4.22 and 4.24. If E
is not closed, pick x E E \ E and let K be a compact neighborhood of x. Then x is
an accumulation point of En K but is not in En K, so by Proposition 4.1 E n K
is not closed. .
4.38 Proposition. If X is an LCH space, C(X) is a closed subspace ofCx in the
topology of uniform convergence on compact sets.
Proof. If f is in the closure of C(X), then f is a uniform limit of continuous
functions on each compact K c X, so flK is continuous. If E c C is closed,
f-l (E) n K == (f I K) -1 (E) is thus closed for each compact K, so by Lemma 4.37
f-l (E) is closed, whence f is continuous. .
A topological space X is called u-compact if it is a countable union of compact
sets. To appreciate the significance of the next two propositions, see Exercise 54.
4.39 Proposition. If X is a a-compact LCH space, there is a sequence {Un} of
precompact open sets such that U n C U n + 1 for all n and X == U Un.
Proof. Suppose X == U Kn where each Kn is compact. Every compact subset
of X has a precompact open neighborhood by Proposition 4.31. Thus we may take
U 1 to be a precompact open neighborhood of K 1 , and then, proceeding inductively,
take Un to be a precompact open neighborhood of U n-l U Kn. .
4.40 Proposition. If X is a a-compact LCH space and {Un} is as in Proposition
4.39, thenfor each f E C X the sets
{g E eX : s up Ig(x) - f(x)1 < m- 1 }
xEU n
(m,n E N)
134 POINT SET TOPOLOGY
form a neighborhood base for f in the topology of uniform convergence on compact
sets. Hence this topology is firs t countable, and fj --7 f uniformly on compact sets
iff f J ---t f uniformly on each U n.
Proof. These assertions follow easily from the observation that if K c X is
compact, then {Un}! is an open cover of K and hence K c U n for some n. Details
are left to the reader (Exercise 48). .
We close this section with a construction that is useful in a number of situations.
If X is a topological space and E eX, a partition of unity on E is a collection
{hex} exE A of functions in C (X, [0, 1]) such that
. each x E X has a neighborhood on which only finitely many hex's are nonzero;
. LcxEA hex (x) == 1 for x E E.
A partition of unity {hex} is subordinate to an open cover 11 of E if for each ex there
exists U E 11 with supp(hex) C U.
4.41 Proposition. Let X be an LCH space, K a compact subset of X, and {U j }l an
open cover of K. There is a partition of unity on K subordinate to {U j }l consisting
of compactly supported functions.
Proof. By Proposition 4.30, each x E K has a compact neighborhood N x
such that N x C U j for some j. Since {N} is an open cover of K, there exist
Xl, . . . , X m such that K C U N Xk . Let Fj be the union of those NXk's that
are subsets of U j . Then Fj is a compact subset of U j , so by Urysohn's lemma
there exist gl,...,gn E Cc(X, [0,1]) with gj == 1 on Fj and supp(gj) C U j .
Since the Fj's cover K we have L gk > 1 on K, so by Urysohn again there
exists f E Cc(X, [0,1]) with f == 1 on K and supp(f) C {x : L gk(X) > O}.
Let gn+l == 1 - f, so that L+l gk > 0 everywhere, and for j == 1,..., n let
h j == gj / L+l gk. Then supp(h j ) == supp(gj) C U j and L h j == 1 on K. .
A generalization of this result may be found in Exercise 57.
Exercises
46. Prove Theorem 4.34.
47. Prove Proposition 4.36. Also, show that if X is Hausdorff but not locally
compact, Proposition 4.36 remains valid except that X * is not Hausdorff.
48. Complete the proof of Proposition 4.40.
49. Let X be a compact Hausdorff space and E eX.
a. If E is open, then E is locally compact in the relative topology.
b. If E is dense in X and locally compact in the relative topology, then E is
open . (Use Exercise 13.)
c. E is locally compact in the relative topology iff E is relatively open in E .
LOCALLY COMPACT HAUSDORFF SPACES 135
50. Let U be an open subset of a compact Hausdorff space X and U* its one-point
compactification (see Exercise 49a). If cjJ : X U* is defined by cjJ( x) = x if x E U
and cjJ( x) == 00 if x E UC, then cjJ is continuous.
51. If X and Yare topological spaces, c/J E C(X, Y) is called proper if cjJ-l (K) is
compact in X for every compact KeY. Suppose that X and Yare LCH spaces and
X* and Y* are their one-point compactifications. If cjJ E C(X, Y), then cjJ is proper
iff cjJ extends continuously to a map from X* to Y* by setting cjJ( 00 x) == ooy.
52. The one-point compactification of JRn is homeomorphic to the n-sphere {x E
JRn+l : Ixl == 1}.
53. Lemma 4.37 remains true if the assumption that X is locally compact is replaced
by the assumption that X is first countable.
54. Let Q have the relative topology induced from JR.
a. Q is not locally compact.
b. Q is a-compact (it is a countable union of singleton sets), but uniform con-
vergence on singletons (i.e., pointwise convergence) does not imply uniform
convergence on compact subsets of Q.
55. Every open set in a second countable LCH space is a-compact.
56. Define <I> : [0, 00] [0,1] by <I>(t) == t/(t + 1) for t E [0, (0) and <I>(oo) = 1.
a. <I> is strictly increasing and <I> (t + s) < <I> ( t) + <I> ( s ) .
b. If (Y, p) is a metric space, then <I> 0 p is a bounded metric on Y that defines
the same topology as p.
c. If X is a topological space, the function p(f, g) == <I>(suPXEX If(x) - g(x)1)
is a metric on (CX whose associated topology is the topology of uniform conver-
gence.
d. If X is a a-compact LCH space and {Un}! is as in Proposition 4.39, the
function
p(f,g) = I= 2- n <I> ( s up If(x) - g(X)I )
1 xEU n
is a metric on (CX whose associated topology is the topology of uniform conver-
gence on compact sets.
57. An open cover 11 of a topological space X is called locally finite if each x E X
has a neighborhood that intersects only finitely many members of 11. If 11, V are
open covers of X, V is a refinement of 11 if for each V E V there exists U E 11
with V c U. X is called paracompact if every open cover of X has a locally finite
refinement.
a. If X is a a-compact LCH space, then X is paracompact. In fact, every open
cover 11 has locally finite refinements {Va}, {W a } such that V a is compact
and W aC Va for all a. (Let {Un}! be as in Proposition 4.39. For each n,
{E n (U n + 2 \ U n-I) : E C 11} is an open cover of U n+l \ Un. Choose a finite
136 POINT SET TOPOLOGY
subcover to obtain {Va} and mimic the beginning of the proof of Proposition
4.41 to obtain {W a }.)
b. If X is a a-compact LCH space, for any open cover 11 of X there is a partition
of unity on X subordinate to 11 and consisting of compactly supported functions.
4.6 TWO COMPACTNESS THEOREMS
The geometric objects on which one does analysis (Euclidean spaces, manifolds,
etc.) tend to be compact or locally compact. However, in infinite-dimensional spaces
such as spaces of functions, compactness is a rather rare phenomenon and is to be
greatly prized when it is available. Almost all compactness results in such situations
are obtained via two basic theorems, Tychonoff's theorem and the ArzeHl-Ascoli
theorem, which we present in this section.
Tychonoff's theorem has to do with compactness of Cartesian products. To prepare
for it, we introduce some notation. Recall that an element x of X == I1aEA Xa is,
strictly speaking, a mapping from A into UaEA Xa; namely, x(a) E Xa is the ath
coordinate of x, which we generally denote by 7r a (x). If B c A, there is a natural
map 1fB : X I1aEB Xa; namely, 1fB(X) is the restriction of the map x to B. (In
particular, 1f{a} is essentially identical to 1fa, and we shall not distinguish between
them.) If p E I1aEB Xa and q E I1aEC Xa, we shall say that q is an extension of p
if q extends p as a mapping, that is, if Bee and p( a) == q( a) for a E B.
4.42 Tychonoff's Theorem. If {Xa}aEA is any family of compact topological
spaces, then X == I1aEA Xa (with the product topology) is compact.
Proof. By Theorem 4.29, it is enough to show that any net (Xi)iEI in X has a
cluster point. We shall do this by examining cluster points of the nets (1f B (Xi)) in
the subproducts of X. To wit, let
p = U {p E II Xc< : p is a cluster point of (7rB(Xi)) }.
BcA aEB
Pis nonempty, because each Xa is compact and so (1fB(Xi)) has cluster points when
B == {a}. Moreover, P is partially ordered by extension; that is, p < q if q is an
extension of p as defined above.
Suppose that {Pl : l E L} is a linearly ordered subset ofP, where Pl E I1aE B z Xa.
Let B* == UlEL Bl, and let p* be the unique element of I1aEB* Xa that extends
every Pl. We claim that p* E P. Indeed, from the definition of the product topology,
any neighborhood of p* contains a set of the form I1aEB* U a where each U a is open
in Xa and U a == Xa for all but finitely many a, say aI, . . . , an. Each of these aj's
belongs to some B l , so by linearity of the ordering they all belong to a single Bl. But
then I1aEB z U a is a neighborhood of Pl, so ( 1fB z (Xi)) is frequently in I1aEB z U a ;
hence (1fB* (Xi)) is frequently in I1aEB* U a , so p* is a cluster point of (1fB* (Xi)).
Therefore p* is an upper bound for {Pl} in P.
TWO COMPACTNESS THEOREMS 137
By Zorn's lemma, then, P has a maximal element p E I1 aE B Xa. We claim
that B == A. If not, pick '"Y E A \ B . By Proposition 4.20 there is a subnet
{7r B (Xi(j)))jEJ of \ 7r B (Xi)) that converges to p, and since X"'( is compact, there is
a subnet (7r"'((Xi(j(k))))kEK of (7r"'((Xi(j))) that converges to some P"'( E X"'(. Let q
be the unique element of I1aE B U{"'(} Xa that extends both p and P"'(; then the net
( 7r B u{ "'(} (Xi(j (k)))) kE K converges to q and hence q is a cluster point of (7r B U{ "'(} (Xi)),
contradicting the maximality of p. Therefore p is a cluster point of (Xi), and we are
done. .
We now turn to the ArzeUl-Ascoli theorem, which has to do with compactness in
spaces of continuous mappings. There are several variants of this result; the theorems
below are two of the most useful ones. See also Exercise 61.
If X is a topological space and l' c C(X), l' is called equicontinuous at X E X
if for every E > 0 there is a neighborhood U of X such that If (y) - f (x) I < E for all
y E U and all f E 1', and l' is called equicontinuous if it is equicontinuous at each
x EX. Also, l' is said to be pointwise bounded if {f (x) : f E 1'} is a bounded
subset of C for each x EX.
4.43 Arzela-Ascoli Theorem I. Let X be a compact Hausdorff space. If l' is an
equicontinuous, pointwise bounded subset ofC(X), then is totally bounded in the
uniform metric, and the closure of1' in C(X) is compact.
Proof. Suppose E > O. Since l' is equicontinuous, for each x E X there is an
open neighborhood U x of x such that If(y) - f(x)1 < iE for all y E U x and all
f E 1'. Since X is compact, we can choose Xl, . . . , X n E X such that U U Xj == X.
Then by pointwise boundedness, {f(xj) : f E 1', 1 < j < n} is a bounded subset
of C, so there is a finite set {Zl, . . . , zm} C <C that is iE-dense in it - that is,
each f(xj) is at a distance less than iE from some Zk. Let A == {Xl,..., xn} and
B == {Zl'...' zm}; then the set B A of functions from A to B is finite. For each
c/J E B A , let
1'</> = {f E 1': If(xj) - 4>(xj)1 < iEforl < j < n}.
Then clearly U</>EBA l' </> == 1', and we claim that each l' </> has diameter at most E, so
we obtain a finite E-dense subset of l' by picking one f from each nonempty l' </>. To
prove the claim, suppose f, 9 E l' </>. Since If - c/JI < i E and Ig - c/JI < i E on A, we
have If - gl < E on A. If x E X, we have x E U Xj for some j, and then
If(x) - g(x)1 < If(x) - f(xj)1 + If(xj) - g(xj)1 + Ig(xj) - g(x)1 < E.
This shows that l' is totally bounded. Since the closure of a totally bounded set is
totally bounded and C (X) is complete, the theorem is proved. .
4.44 Arzela-Ascoli Theorem II. Let X be a a-compact LCH space. If {fn} is an
equicontinuous, pointwise bounded sequence in C(X), there exist f E C(X) and a
subsequence of {fn} that converges to f uniformly on compact sets.
138 POINT SET TOPOLOGY
Proof. By Proposition 4.39 there is a sequence {Uk} of precompact open sets
- 00
such that Uk C U k + 1 and X == U1 Uk. By Theorem 4.43 there is a subsequence
{fnj } 1 of {fn} that is uniformly Cauchy on U 1; we denote it by {f] }1. Pro-
ceeding inductively, for kEN we obtain a subsequence {f;}l of {f j k - 1 }1 that
is uniformly Cauchy on U k. Let gk == f:; then {gk} is a subsequence of {fn} which
is (except for the first k - 1 terms) a subsequence of {ff} and hence is uniformly
Cauchy on each U k. Let f == lirn gk. Then f E C(X) and gk f uniformly on
compact sets by Propositions 4.38 and 4.40. .
Exercises
58. If {Xa}a E A is a family of topological spaces of which infinitely many are
noncompact, then every closed compact subset of I1aEA Xa is nowhere dense.
59. The product of finitely many locally compact spaces is locally compact.
60. The product of countably many sequentially compact spaces is sequentially
compact. (Use the "diagonal trick" as in the proof of Theorem 4.44.)
61. Theorem 4.43 remains valid for maps from a compact Hausdorff space X into
a complete metric space Y provided the hypothesis of pointwise boundedness is
replaced by pointwise total boundedness. (Make this statement precise and then
prove it.)
62. Rephrase Theorem 4.44 in a form similar to Theorem 4.43 by using the metric
in Exercise 56d.
63. Let K E C([O, 1] x [0,1]). For f E C([O, 1]), let Tf(x) == fo1 K(x, y)f(y) dYe
Then T f E C([O, 1]), and {T f : Ilfllu < 1} is precompact in C([O, 1]).
64. Let (X, p) be a metric space. A function f E C(X) is called Holder continuous
of exponent Q (ex > 0) if the quantity
N (f) - If(x) - f(y)1
a - sup
x#y p(x, y)a
is finite. If X is compact, {f E C(X) : Ilfllu < 1 and Na(f) < I} is compact in
C(X).
65. Let U be an open subset of C, and let {fn} be a sequence of holomorphic
functions on U. If {fn} is uniformly bounded on compact subsets of U, there is a
subsequence that converges uniformly to a holomorphic function on compact subsets
of U. (Use the Cauchy integral formula to obtain equicontinuity.)
4.7 THE STONE-WEIERSTRASS THEOREM
In this section we prove a far-reaching generalization of the well-known theorem of
Weierstrass to the effect that any continuous function on a compact interval [a, b] is
THE STONE-WEIERSTRASS THEOREM 139
the uniform limit of polynomials on [a, b]. Throughout this section, X will denote a
compact Hausdorff space, and we equip the space C(X) with the uniform metric.
A subset A of C(X, JR) or C(X) is said to separate points if for every x, y E X
with x =I- y there exists f E A such that f (x) =I- f (y). A is called an algebra if it is
a real (resp. complex) vector subspace of C(X, JR) (resp. C(X)) such that f 9 E A
whenever f,g EA. If A c C(X, JR), A is called a lattice ifrnax(f, g) andrnin(f,g)
are in A whenever f, 9 E A. Since the algebra and lattice operations are continuous,
one easily sees that if A is an algebra or a lattice, so is its closure A in the uniform
metric.
4.45 The Stone-Weierstrass Theorem. Let X be a compact Hausdorff space. If A
is a closed subalgebra of C(X,) that separates points, then either A == C(X, JR)
or A == {f E C(X, JR) : f(xo) == O} for some Xo E X. Thefirst alternative holds iff
A contains the constant functions.
The proof will require several lemmas. The first one, in effect, proves the theorem
when X consists of two points, and the second one is a special case of the classical
Weierstrass theorem for X == [-1, 1]. After these two we return to the general case.
4.46 Lemma. Consider JR2 as an algebra under coordinatewise addition and mul-
tiplication. Then the only subalgebras ofJR2 are JR2, {(O, O)}, and the linear spans
of (1 , 0), (0, 1), and (1, 1).
Proof. The subspaces of JR2 listed above are evidently subalgebras. If A c JR2
is a nonzero algebra and (0,0) =I- ( a, b) E A, then (a 2 , b 2 ) E A. If a =I- 0, b =I- 0,
and a =I- b, then ( a, b) and (a 2 , b 2 ) are linearly independent, so A == JR2. The other
possibilities - a -:I 0 == b, a == 0 -:I b, and a == b -:I 0 for all nonzero (a, b) E A -
give the other three subalgebras. .
4.47 Lemma. For any E > 0 there is a polynomial P on JR such that P(O) == 0 and
Ilxl- p(x)1 < Eforx E [-1,1].
Proof. Consider the Maclaurin series for (1 - t)1/2:
1 / 2 ( 1 ) ( 1 ) ( 2n - 3 ) tn
(1 - t) = 1 + -2 2' . · 2 n!
00
== 1- Lcnt n
1
(C n > 0).
By the ratio test, this series converges for It I < 1; a proof that its sum is actually
(1 - t) 1/2 is outlined in Exercise 66. Moreover, by the monotone convergence
theorem (applied to counting measure on N),
00 00
""' C n == lirn cnt n == 1 - lirn(l - t)1/2 == 1.
tl tl
1 1
140 POINT SET TOPOLOGY
It follows from the finiteness of I: C n that the series 1 - I: cnt n converges
absolutely and uniformly on [-1,1], and its sum is (1 - t)1/2 there. Therefore,
given E > 0, by taking a suitable partial sum of this series we obtain a polynomial
Q such that 1(1 - t)1/2 - Q(t)1 < !E for t E [-1,1]. Setting t == 1 - x 2 and
R(x) == Q(l - x 2 ), we obtain a polynomial R such that Ilxl - R(x)1 < !E for
x E [-1,1]. In particular, IR(O)I < !E, so if we set P(x) == R(x) - R(O), P is a
polynomial such that P(O) = 0 and Ilxl - p(x)1 < E for x E [-1,1]. .
4.48 Lemma. If A is a closed subalgebra of C(X, JR), then If I E A whenever
f E A, and A is a lattice.
Proof. If f E A and f i- 0, let h == f /llfllu. Then h maps X into [-1,1], so if
E > 0 and P is as in Lemma 4.47, we have Illhl - P 0 hll u < E. Since P(O) == 0,
P has no constant term, so P 0 h E A since A is an algebra. Since A is closed and
E is arbitrary, we have Ihl E A and hence If I == Ilfllulhl E A. This proves the first
assertion, and the second one follows because
max(f, g) == !(f + 9 + If - gl),
min(f, g) == !(f + 9 - If - gl).
.
4.49 Lemma. Suppose A is a closed lattice in C(X, JR) and f E C(X, JR). If for
every x, y E X there exists gxy E A such that gxy(x) == f(x) and gxy(Y) = f(y),
then f E A.
Proof. Given E > 0, for each x,y E X let U xy = {z EX: f(z) < gxy(Z) + E}
and V xy == {z EX: f(z) > gxy(z) - E}. These sets are open and contain x and
y. Fix y; then {U xy : x E X} covers X, so there is a finite subcover {U XjY }l. Let
gy == max(gxlY'...' gxny); then f < gy +E on X and f > gy - E on V y = n V XjY ,
which is open and contains y. Thus {VY}YEX is another open cover of X, so there is
a finite subcover {V Yj }l. Letg == min(gYl'... ,gYm); then Ilf - gllu < E. Since A
is a lattice, 9 E A, and since A is closed and E is arbitrary, f E A. .
Proof of Theorem 4.45. Given x =I- y EX, let Axy = {(f(x), f(y)) : f E A}.
Then Axy is a sub algebra ofJR2 as in Lemma 4.46 because f (f(x), f(y)) is an
algebra homomorphism. IfA xy == JR2 for all x, y, then Lemmas 4.48 and 4.49 imply
that A == C(X, JR). Otherwise, there exist x, y for which Axy is a proper subalgebra
of JR2. It cannot be {(O, O)} or the linear span of (1,1) because A separates points,
so by Lemma 4.46 Axy is the linear span of (1,0) or (0,1). In either case there
exists Xo E X such that f (xo) == 0 for all f E A. There is only one such Xo since
A separates points, so if neither x nor y is xo, we have Axy == JR2. Lemmas 4.48
and 4.49 now imply that A == {f E C(X, JR) : f(xo) == O}. Finally, if A contains
constant functions, there is no Xo such that f (xo) == 0 for all f E A, so A must equal
C(X, JR). .
THE STONE-WEIERSTRASS THEOREM 141
We have stated the Stone-Weierstrass theorem in the form that is most natural for
the proof. However, in applications one is typically dealing with a subalgebra 23 of
C(X, JR) that is not closed, and one applies the theorem to A == 23 . The resulting
restatement of the theorem is as follows:
4.50 Corollary. Suppose 23 is a subalgebra of C(X, JR) that separates points. If
there exists Xo E X such that f(xo) == 0 for all f E 23, then 23 is dense in
{f E C(X, JR) : f(xo) == O}. Otherwise, 13 is dense in C(X, JR).
The classical Weierstrass approximation theorem is the special case of this corol-
lary where X is a compact subset of JRn and 23 is the algebra of polynomials on JRn
(restricted to X); here 23 contains the constant functions, so the conclusion is that it
is dense in C(X, JR).
The Stone-Weierstrass theorem, as it stands, is false for complex-valued functions.
For example, the algebra of polynomials in one complex variable is not dense in C (K)
for most compact subsets K of C. (In particular, if KO =I- 0, any uniform limit of
polynomials on K must be holomorphic on KO.) Here we shall give a simple proof
that the function f (z) == z cannot be approximated uniformly by polynomials on the
unit circle {e it : t E [0, 27r]}. If P(z) == ajz j , then
1 2 n 1 2
f ( eit)p( e it ) dt == L aj e i (j+l)t dt == o.
000
Thus, abbreviating f( e it ) and P( e it ) by f and P, since If I == 1 on the unit circle we
have
r2
21f = J 0 f f dt <
r2 r2
Jo (J-P) f dt + Jo f Pdt
r2 r2
J o (J - P) f dt < Jo If - PI dt < 21fllf - Pllu o
Therefore, IIf - Pllu > 1 for any polynomial P.
There is, however, a complex version of the Stone-Weierstrass theorem.
4.51 The Complex Stone-Weierstrass Theorem. Let X be a compact Hausdorff
space. If A is a closed complex subalgebra of C(X) that separates points and is
closed under complex conjugation, then either A == C(X) or A == {f E C(X) :
f(xo) == O} for some Xo E X.
Proof. Since Re f := (f + f ) /2 and 1m f == (f - f ) /2i, the set AIR. of real and
imaginary parts of functions in A is a subalgebra of C(X, IR) to which the Stone-
Weierstrass theorem applies. Since A == {f + ig : f, 9 E AIR.}, the desired result
follows. I
There is also a version of the Stone-Weierstrass theorem for noncompact LCH
spaces. We state this result for real functions; the corresponding analogue of Theorem
4.51 for complex functions is an immediate consequence.
142 POINT SET TOPOLOGY
4.52 Theorem. Let X be a noncompact LCH space. If A is a closed subalgebra of
Co(X, JR) (== Co(X) n C(X, JR)) that separates points, then either A == Co(X, JR)
or A == {f E Co (X, JR) : f(xo) == O} for some Xo E X.
The proof is outlined in Exercise 67.
Exercises
66. Let 1 - L: C n t n be the Maclaurin series for (1 - t) 1/2.
a. The series converges absolutely and uniformly on compact subsets of ( -1, 1),
as does the termwise differentiated series - L: nC n tn-I. Thus, if f (t) =
1 - L: cnt n , then f'(t) == - L: nc n t n - 1 .
b. By explicit calculation, f(t) == -2(1 - t)f'(t), from which it follows that
(1 - t)-1/2 f(t) is constant. Since f(O) == 1, f(t) == (1 - t)1/2.
67. Prove Theorem 4.52. (If there exists Xo E X such that f (xo) == 0 for all f E A,
let Y be the one-point compactification of X \ { xo}; otherwise let Y be the one-point
compactification of X. Apply Proposition 4.36 and the Stone-Weierstrass theorem
on Y.)
68. Let X and Y be compact Hausdorff spaces. The algebra generated by functions
of the form f(x, y) == g(x)h(y), where 9 E C(X) and h E C(Y), is dense in
C(X x Y).
69. Let A be a nonempty set, and let X == [0, l]A. The algebra generated by the
coordinate maps 7r a : X -t [0, 1] (Q E A) and the constant function 1 is dense in
C(X).
70. Let X be a compact Hausdorff space. An ideal in C(X, JR) is a subalgebra :J of
C(X, JR) such that if f E :J and 9 E C(X, JR) then f 9 E J.
a. If J is an ideal in C(X, JR), let h(J) == {x EX: f(x) = 0 for all f E :J}.
Then h(:J) is a closed subset of X, called the hull of:J .
b. If E c X, let k(E) == {f E C(X, JR) : f(x) == 0 for all x E E}. Then k(E)
is a closed ideal in C(X, JR), called the kernel of E.
c. If E c X, then h(k(E)) == E .
d. If:J is an ideal in C(X, JR), then k( h(J)) == J . «Hint: k( h(:J)) may be
identified with a subalgebra of Co(U, IR) where U == X \ h(:J).)
e. The closed subsets of X are in one-to-one correspondence with the closed
ideals of C(X, JR).
71. (This is a variation on the theme of Exercise 70; it does not use the Stone-
Weierstrass theorem.) Let X be a compact Hausdorff space, and let M be the set of
all nonzero algebra homomorphisms from C(X, JR) to JR. Each x E X defines an
element x of M by x(f) == f(x).
a. If <P E M, then {f E C(X, JR) : cjJ(f) == O} is a maximal proper ideal in
C(X, JR).
EMBEDDINGS IN CUBES 143
b. If J is a proper ideal in C(X, JR), there exists Xo E X such that f(xo) == 0
for all f E J. (Suppose not; construct an f E J with f > 0 everywhere and
conclude that 1 E J. This requires no deep theorems.)
c. The map x x is a bijection from X to M.
d. If M is equipped with the topology of pointwise convergence, then the map
x x is a homeomorphism from X to M. (Since M is defined purely alge-
braically, it follows that the topological structure of X is completely determined
by the algebraic structure of C(X, JR).)
4.8 EMBEDDINGS IN CUBES
We now present a technique for embedding topological spaces in products of intervals
and discuss some of its applications. (These results will not be used elsewhere in this
book.) Throughout this section we shall denote the unit interval [0, 1] by I, and if A
is any nonempty set, we shall call the product space I A a cube.
If X is a topological space and c C(X, I), we say that separates points
and closed sets if for ev ery closed E c X and every x E EC there exists f E
such that f (x ) f (E). If l' separates points and closed sets, there is another
family 9 c C(X, I) with the following slightly stronger property: For every closed
set E c X and every x E EC there e xists 9 E 9 with g( x) == 1 and 9 == 0 on
E. (Indeed, if f E satisfi es f( x) f(E), take 9 == cP 0 f where cP E C(1, I),
cP(f(x)) == 1, and cP == 0 on f(E).) It follows that a Tl space X admits a family
that separates points and closed sets iff X is completely regular.
Each nonempty l' c C(X, I) canonically induces a map e : X I9=" by the
formula 7r f (e( x)) == f( x), where 7r f : 1 I is the coordinate map. We call e the
map from X into the cube I9=" associated to. (Evidently this construction can be
generalized to target spaces other than I; see Exercise 20.)
4.53 Proposition. Let X be a topological space, C C(X, I), and e : X I9=" be
the map associated to. Then
a. e is continuous.
b. If separates points, then e is injective.
c. If X is Tl and separates points and closed sets, then e is an embedding.
Proof. (a) follows from Proposition 4.11, and (b) is obvious. Next, observe that
if separates points and closed sets and X is T 1 , then e is injective by (b) and
Proposition 4.7. To prove the continu ity of t he inverse, suppose that U is open in X.
If x E U, choose f E l' wi th f(x) f(Uc) and let
V = IT/! [j(UC) r = {p E IT : IT f(P) tf- j(Uc)}.
Then V is open in I9=" and e(x) E V n e(X) C e(U). Thus e(U) is a neighborhood
of e(x) in e(X) at every x E U, so e(U) is open in e(X). It follows that e- 1 is
continuous. .
144 POINT SET TOPOLOGY
4.54 Corollary. Every compact Hausdorff space is homeomorphic to a closed subset
of a cube.
Proof. By Proposition 4.25 and Urysohn's lemma, we can take == C(X, I). .
4.55 Corollary. A topological space is completely regular iffit is homeomorphic to
a subset of a compact Hausdorff space.
Proof. Proposition 4.53, with == C(X, I), gives the "only if" implication; the
converse is left to the reader (Exercise 72). .
A compactification of a topological space X is a pair (Y, cP) where Y is a compact
Hausdorff space and cP is a homeomorphism from X onto a dense subset of Y.
(Frequently one identifies X with its image cjJ(X) c Y and then speaks simply of "the
compactification Y of X .") For example, ([-1, 1], tanh) is a compactification of JR,
and the one-point compactification (X* , i) of an LCH space X is a compactification
in the present sense, where i : X X* is the inclusion map.
Suppose X is completely regular. According to Proposition 4.53, if c C(X, I)
separates points and closed sets, e : X 1 is the associated embedding, and Y is
the closure of e( X) in 1, then (Y, e) is a compactification of X. It has the property
that if we identify X with its image e(X), every f E has a continuous extension to
Y, which is unique since X is dense in Y. Indeed, the identification of X with e( X)
turns f into the coordinate map 7r fle(X), which extends to 7r flY. Moreover, if f and
9 are bounded continuous functions on X that extend continuously to Y, obviously
so are f + 9 and f g, and if {fn} is a uniformly convergent sequence of functions on
X that extend continuously to Y, their extensions converge uniformly on Y since X
is dense in Y, so f == lirn f n also extends continuously. We have proved:
4.56 Proposition. Suppose that c C(X, I) separates points and closed sets. Let
(Y, e) be the compactification of X associated to , and let A be the smallest closed
subalgebra of BC(X) that contains. Then every f E A has a continuous extension
to Y.
This result has a converse: see Exercise 73.
If X is a completely regular space, the compactification of X associated to
g:- == C (X, I) is called the Stone-tech compactification of X and is denoted by
({3X, e), or simply by (3X if we identify X with e(X). Every f E BC(X) extends
continuously to (3X; in fact, a much more general result holds:
4.57 Theorem. If X is a completely regular space, Y is a compa.EJ Hausdorff space,
and cP E C(X, Y), then cP has a unique continuous extension cP to (3X - that is,
-- ---
there is a unique cP E C({3X, Y) such that cjJ 0 e == cPo If (Y, cP) is a compactification
of X, then cP is surjective; if also every f E BC(X) extends continuously to Y (i.e.,
f == 9 0 cP for some 9 E C(Y)), then cjJ is a homeomorphism.
Proof. Let == C(X, I) and 9 == C(Y, I), and let ({3Y, i) be the Stone-Cech
compactification of Y. (That is, i : Y 1 9 is the embedding associated to 9,
EMBEDDINGS IN CUBES 145
and j3Y = i(Y); j3Y is homeomorphic to Y since Y is compact.) Given cjJ E
C (X, Y), define <I> : 1 1 9 by 7r 9 ( <I> (p)) == 7r gocP (p). The map <I> is continuous by
Proposition 4.11, and
7r g (<I>(e(x))) = 7r gocP (e(x)) == g(cP(x)) == 7r g (i(cP(x))),
that is, <1> 0 e = i 0 cjJ. It follows that <1>( e(X)) == i( cjJ(X)) c j3Y and hence that
<1>(j3X) C j3Y = j3Y. The situation is summarized in the following commutative
diagram:
x
q?1
Y
j3X
. 4>1.BX 1
(3Y
1
4>1
1 9
Let == i-I 0 (<I>I{3X). Then 0 e == i-I 0 <I> 0 e == cjJ, and uniqueness of cjJ
is clear since e(X) is dense in {3X; thus the first assertion is proved. If (Y, cjJ) is a
compactification of X, then cjJ( X) is dense in Y; but then cjJ({3X) is dense in Y and
also compact, so that cjJ({3X) == Y. Finally, if every f E BC (X) is of the form
9 0 cjJ for some 9 E C(Y), then <I> is injective; hence cjJ is bijective and therefore, by
Proposition 4.28, a homeomorphism. .
This theorem shows that (3X is the "largest" compactification of a completely
regular space X, in the sense that every other compactification is a continuous image
of it. At the other end of the scale, if X is locally compact, then = Cc(X)nC(X, I)
separates points and closed sets by Urysohn's lemma. A glance at the construction
of the compactification (Y, e) associated to this shows that Y consists of e( X)
together with the single point of 1 all of whose coordinates are zero. It is then easy
to verify that Y is homeomorphic to the one-point compactification of X constructed
in 34.5.
As a final application of the embedding e : X 1, we give a partial answer to
the question: When is a topological space metrizable, that is, when is its topology
defined by a metric? A necessary condition for X to be metrizable is that X be
normal (Exercise 3). On the other hand:
4.58 The Urysohn Metrization Theorem. Every second countable normal space
is metrizable.
Since every subset of a metrizable space is metrizable (with the same metric),
this theorem is an immediate consequence of Proposition 4.53 and the following two
facts, whose proofs are outlined in Exercises 76 and 77:
. If X is normal and second countable, there is a countable family c C(X, I)
that separates points and closed sets.
. If is countable, 1 is metrizable.
146 POINT SET TOPOLOGY
Exercises
72. Every subset of a completely regular space is completely regular in the relative
topology.
73. If X is a completely regular space, a subalgebra A of BC(X) is called com-
pletely regular if (i) it is closed and contains the constant functions, and (ii)
A n C (X, I) separates points and closed sets.
a. If (Y, e) is a Hausdorff compactification of X, A y == {f 0 e : f E C(Y)} is
a completely regular subalgebra of BC(X).
b. If (Y, e) and (Y', e') are Hausdorff compactifications of X such that Ay ==
Ay" there is a homeomorphism 4J : Y Y' such that 4J 0 e == e'. (Adapt the
proof of Theorem 4.57, which deals with the case Y == (3X.)
c. If (Y, e) is the compactification of X associated to C C (X, I), then A y is
the smallest closed subalgebra of BC(X) that contains . (Use Exercise 69.)
d. The Hausdorff compactifications of X are in one-to-one correspondence with
the completely regular subalgebras of BC(X).
74. Consider N (with the discrete topology) as a subset of its Stone-Cech compacti-
fication {3N.
a. If A and B are disjoint subsets of N, their closures in {3N are disjoint. (Hint:
XA E C(N, I).)
b. No sequence in N converges in {3N unless it is eventually constant (so (3N is
emphatically not sequentially compact).
75. Suppose X is a completely regular space. The set M of nonzero algebra
homomorphisms from BC(X, JR) to JR, equipped with the topology of pointwise
convergence, is homeomorphic to {3 X. (See Exercise 71. This realization of (3 X is
the natural one from the point of view of Banach algebra theory.)
76. If X is normal and second countable, there is a countable family c C(X, I)
that separates points and closed sets. (Let 23 be a countable base for the topology.
Consider the set of pairs (U, V) E 23 x 13 such that U C V, and use Urysohn's
lemma.)
77. Let {(X n , Pn)}r be a countable family of metric spaces whose metrics take
values in [0, 1]. (The latter restriction can always be satisfied; see Exercise 56b.)
Let X == rr Xn. If x, Y E X, say x == (Xl, X2,...) and Y == (YI, Y2,.. .), define
p(x, y) == L: 2- n Pn(xn, Yn). Then P is a metric that defines the product topology
on X.
4.9 NOTES AND REFERENCES
The germ of the concept of topological space is clearly present in Riemann's lecture
[113] on the foundations of geometry, delivered in 1854, but another half century
passed before the mathematical world was ready to consider abstract spaces in a
NOTES AND REFERENCES 147
systematic way. The first attempt to construct an abstract framework for the study of
limits and continuity was made in 1906 by Frechet [52], who introduced metric spaces
as well as a more general class of quasi-topological spaces whose properties were
defined in terms of sequential convergence. A few years later Hausdorff [68] devised
axioms for neighborhoods of points that amount to the definition of a Hausdorff
space, and he deduced from them many of the basic results of general topology. The
usefulness of his point of view was quickly recognized, and it became the foundation
for the further development of the subject.
There are several good books to which the reader may refer for a more com-
prehensive treatment of point set topology, including Bourbaki [20], Dugundji [34],
Engelking [38], Kelley [83], and Nagata [102]. Engelking [38] contains extensive
references and historical notes.
84.2: Urysohn's lemma and the Tietze extension theorem were both first proved
in Urysohn [152]. Special cases of the latter had previously been obtained by several
authors, including Tietze (see [152] for references). Examples of completely regular
spaces that are not normal and regular spaces that are not completely regular, which
are all rather complicated, were first constructed by Tychonoff [151]. Particularly
noteworthy is the existence of a regular space that admits no nonconstant continuous
functions, a result due to Hewitt [73]. Examples may also be found in the books cited
above.
84.3: The theory of nets is sometimes called the Moore-Smith theory of conver-
gence, after its originators [101]. Another general theory of convergence, invented
by H. Cartan and publicized by Bourbaki, is based on the notion of filters. A filter in
a set X is a family C P( X) with the following properties:
. If F E and E F, then E E .
. If E E and F E , then E n F E .
. 0 .
If X is a topological space, a filter in X converges to x E X if every neighborhood
of x belongs to. Filters and nets are related as follows. If (Xa)aEA is a net in X,
its derived filter is the collection of all E c X such that (xa) is eventually in E.
On the other hand, if is a filter, then is a directed set under reverse inclusion, and
a net (XF) PEg:- indexed by is said to be associated to if XF E F for all F E .
It is then easy to verify that a net (xa) converges to x iff its derived filter converges
to x, and a filter converges to x iff all of its associated nets converge to x. See
Bourbaki [20] or Dugundji [34] for more information.
84.4: The usage of the term "compact" is not completely standardized. In many
older works the terms "compact" and "bicompact" were used to mean countably
compact and compact, respectively, and some authors use "compact" and "quasi-
compact" to mean compact Hausdorff and compact, respectively. Synonyms for
"precompact" that are frequently found in the literature are "conditionally compact"
and "relatively compact"; the latter one is infelicitous because it suggests compactness
in the relative topology, which is quite different.
148 POINT SET TOPOLOGY
34.6: Tychonoff [151] proved that [0, l]A is compact for any set A; together with
Corollary 4.54, which is in the same paper, this easily implies that any product of
compact Hausdorff spaces is compact. The Tychonoff theorem in full generality is
due to Cech [23]. The proof we have presented, which is simpler and more elegant
than the older ones, is due to Chernoff [24].
The axiom of choice, usually in the form of Zorn 's lemma, is an essential ingredient
in all the proofs of Tychonoff's theorem. It is an intriguing fact, discovered by Kelley
[82], that Tychonoff's theorem in turn implies the axiom of choice. Here is the proof:
Suppose that {Xa}aEA is a nonempty collection of non empty sets. Pick a point w
that is not an element of any Xa, set X == Xa U {w}, and define a topology on X
by declaring the open sets to be 0, Xa, {w}, and X. Evidently X is compact, so
Tychonoff's theorem implies that X* == ITaEA X is compact. Let Fa == 7r1 (X a ).
The sets Fa are closed, and by the axiom of choice for finite collections of sets -
which is provable from the other standard axioms of set theory - they have the
finite intersection property. Indeed, given a finite set B c A, pick x {3 E X (3 for
/3 E B; then n{3EB F{3 contains the point x E X such that 1r{3(x) == x{3 for (3 E B
and 1fa(x) == w for Q rf:. B. By Proposition 4.21, naEA Fa, which is precisely
ITaEA Xa, is nonempty.
By elaboration of this argument, one can deduce the axiom of choice from the
special case of Tychonoff's theorem that X A is compact for any A if X is compact;
see Ward [156].
The original results of Arzela and Ascoli had to do with functions on JR.; see Arzela
[6]. Other versions of the Arzela-Ascoli theorem, pertaining to the compactness of
subsets of C(X, Y) under various hypotheses on X and Y, can be found in the books
cited above and in Royden [121].
84.7. The Stone-Weierstrass theorem first appeared in the middle of a lengthy and
difficult paper of Stone [144]. Later Stone [145] wrote a much-simplified exposition
of the theorem and some of its applications, which still makes good reading.
34.8. The history of this material begins with Urysohn [153], where the metrization
theorem is proved, essentially by the method we have outlined. The technique of
embedding spaces in cubes is implicit in this paper, but it was first developed explicitly
in Tychonoff [151]. The Stone-Cech compactification, in turn, is implicit in the latter
paper, but it was first described explicitly and investigated by Stone [144] and Cech
[23].
It is not hard to show that every second countable regular space is normal (see
Kelley [82, Lemma 4.1]; consequently, the hypothesis of normality in the Urysohn
metrization theorem can be replaced by regularity. Necessary and sufficient condi-
tions are known for an arbitrary topological space to be metrizable, but they are not as
readily verifiable as the conditions in Urysohn's theorem. See the books cited above.
Occasionally the term "compactification" is used to mean a continuous injection
c/J : X Y from a topological space X onto a dense subset of a compact space
Y without the requirement that it be an embedding. Such "compactifications" arise
from subalgebras of C (X) that separate points but are not completely regular in
the sense of Exercise 73. An example is provided by the algebra of "uniformly
NOTES AND REFERENCES 149
almost periodic" functions on JR, which is the algebra generated by the functions
f).. (x) == ei)"x, A E JR; the associated "compactification" of JR is known as the Bohr
compactification. See Folland [47, 34.7].
5
Elements of Functional
Analysis
"Functional analysis" is the traditional name for the study of infinite-dimensional
vector spaces over JR or C and the linear maps between them. What distinguishes this
from mere linear algebra is the importance of topological considerations. On finite-
dimensional vector spaces there is only one reasonable topology, and linear maps are
automatically continuous, but in infinite dimensions things are not so simple. (As
we have already observed, if {f n} is a sequence of functions on JR, there are many
things one can mean by the statement "fn ---7 f.") As our aim in this chapter is only
to give a brief introduction to the subject, we shall restrict attention - except in 35.4
- to topologies defined by norms on vector spaces.
5.1 NORMED VECTOR SPACES
Let K denote either JR or CC, and let X be a vector space over K. We denote the zero
element of X simply by 0, relying on context to distinguish it from the scalar 0 E K.
By a subspace we shall always mean a vector subspace. If x E X, we denote by K x
the one-dimensional subspace spanned by x. Also, if ]v( and N are subspaces of X,
M + N denotes the subspace {x + y : x E M, yEN}.
A seminorm on X is a function x IIxll from X to [0,00) such that
. II x + y II < II x II + II y II for all x, y E X (the triangle inequality),
. IIAxl1 == IAlllxll for all x E X and A E K.
151
152 ELEMENTS OF FUNCTIONAL ANALYSIS
The second property clearly implies that II 0 /I == o. A seminorm such that II x II == 0
only when x == 0 is called a norm, and a vector space equipped with a norm is called
a normed vector space (or normed linear space).
If X is a normed vector space, the function p( X, y) == II x - y II is a metric on X,
SInce
II x - zll < II x - yll + Ily - zll,
II x - yll == II( -l)(x - y)1I == Ily - xii.
The topology it defines is called the norm topology on X. Two norms II . 111 and
II . 112 on X are called equivalent if there exist C 1 , C 2 > 0 such that
C 1 11xl11 < IIxl12 < C 2 11xl11
(x E X).
Equivalent norms define equivalent metrics and hence the same topology and the
same Cauchy sequences.
A normed vector space that is complete with respect to the norm metric is called
a Banach space. (Every normed vector space can be embedded in a Banach space
as a dense subspace. One way to do this is to mimic the construction of JR from Q
via Cauchy sequences; we shall present a simpler way in 35.2.) The following is a
useful criterion for completeness of a normed vector space. If {xn} is a sequence in
X, the series L: X n is said to converge to x if L: X n x as N -4 00, and it is
called absolutely convergent if L: Ilxn II < 00.
5.1 Theorem. A normed vector space X is complete iff every absolutely convergent
series in X converges.
Proof. If X is complete and L: IIxn /I < 00, let 8N == L:f Xn. Then for
N > M we have
N
118N - 8 M II < L IIxnll -4 0 as M, N -4 00,
M+1
so the sequence {SN} is Cauchy and hence convergent. Conversely, suppose that
every absolutely convergent series converges, and let {xn} be a Cauchy sequence.
We can choose n1 < n2 < ... such that Ilxn - X m II < 2- j for m, n > nj. Let
Yl == X n1 and Yj == x nj - x nj _ 1 for j > 1. Then L: Yj == X nk ' and
00
00
L IIYj II < IIY111 + L 2- j == IIY111 + 1 < 00,
1 1
so lim x nk == L: Yj exists. But since {xn} is Cauchy, it is easily verified that {xn}
converges to the same limit as {x nk }. .
We have already seen some examples of Banach spaces. First, if X is a topolog-
ical space, B(X) and BC(X) are Banach spaces with the uniform norm IIfllu ==
sUPXEX If(x)l. Second, if (X,M,J-l) is a measure space, £1(J-l) is a Banach space
NORMED VECTOR SPACES 153
with the L 1 norm IIfl11 == J If I dJ-l. (Observe that II . III is only a seminorm if we
think of £1 (J-l) as consisting of individual functions, but it becomes a norm if we
identify functions that are equal a.e.) That £1 (J-l) is complete follows from Theorems
2.25 and 5.1. Indeed, if L: IIfnl11 < 00, Theorem 2.25 shows that f == L: fn
exists a.e., and
N 00
JjJ - Lfnl dJL < L J Ifni dJL -4 0 as N -4 00.
1 N+l
More examples will be found in Exercises 8-11 and in subsequent sections.
If X and 1j are normed vector spaces, X x 1j becomes a normed vector space when
equipped with the product norm
II ( x, y) II == max ( II x II, II y II).
(Here, of course, /lxll refers to the norm on X while lIyll refers to the norm on 1j.)
Sometimes other norms equivalent to this one, such as II (x, y) II == Ilxll + Ilyll or
II (x, y)11 == (11 x 11 2 + lIyI12)1/2, are used instead.
A related construction is that of quotient spaces. If M is a vector subspace of
the vector space X, it defines an equivalence relation on X as follows: x rv y iff
x - y E M. The equivalence class of x E X is denoted by x + M, and the set of
equivalence classes, or quotient space, is denoted by X 1M. X 1M is a vector space
with vector operations (x+M) + (y+M) == (x+y) +M and A(x+M) == (Ax) +M.
If X is a normed vector space and M is closed, X/M inherits a norm from X called
the quotient norm, namely
II x + Mil == inf II x + yll.
yEM
See Exercise 12 for a more detailed discussion.
A linear map T : X 1J between two normed vector spaces is called bounded if
there exists G > 0 such that
IITxl1 < Gllxll for all x E X.
This is different from the notion of boundedness for functions on a set, according to
which T would be bounded if IITxl1 < C for all x. Clearly no nonzero linear map
can satisfy the latter condition, since T( AX) == ATx for all scalars A. The present
definition means that T is bounded on bounded subsets of X.
5.2 Proposition. If X and are normed vector spaces and T : X is a linear
map, the following are equivalent:
a. T is continuous.
b. T is continuous at O.
c. T is bounded.
Proof. That (a) implies (b) is trivial. If T is continuous at 0 E X, there is a
neighborhood U of 0 such that T(U) c {y E : lIyll < I}, and U must contain
154 ELEMENTS OF FUNCTIONAL ANALYSIS
a ball B == {x EX: Ilxll < 8} about 0; thus IITxll < 1 when Ilxll < 8. Since
T commutes with scalar multiplication, it follows that IITxll < a8- 1 whenever
Ilxll < a, that is, IITxl1 < 8- l llxll. This shows that (b) implies (c). Finally,
if IITxl1 < Cllxll for all x, then IITxI - TX211 == IIT(XI - X2) II < E whenever
Ilxl - x211 < C-1E, so that T is continuous. .
If X and are normed vector spaces, we denote the space of all bounded linear
maps from X to by L(X, ). It is easily verified that L(X, ) is a vector space and
that the function T IITII defined by
(5.3)
IITII = sup{ IITxl1 : /lxll = 1}
= sup { IITxl1 . x -I- O }
Ilxll .
== inf{C: IITxll < Cllxll for all x}
is a norm on L(X, ), called the operator norm (Exercise 2). We always assume
L(X, ) to be equipped with this norm unless we specify otherwise.
5.4 Proposition. If is complete, so is L(X, ).
Proof. Let {Tn} be a Cauchy sequence in L(X, ). If x E X, then {Tnx} is
Cauchy in because IITnx - Tmxll < IITn - Tmllllxll. Define T : X by
Tx == lirn Tnx. We leave it to the reader (Exercise 3) to verify that T E L(X, ) (in
fact, IITII == lirn II Tn II) and that IITn - TII o. .
Another useful property of the operator norm is the following. If T E L(X,)
and S E L(, Z), then
IISTxll < IISlIllTxll < IISIIIITllllxll,
so that ST E L(X, Z) and IISTII < IISIIIITII. In particular, L(X, X) is an algebra.
If X is complete, L(X, X) is in fact a Banach algebra: a Banach space that is also
an algebra, such that the norm of a product is at most the product of the norms.
(Another example of a Banach algebra is BC(X), where X is a topological space,
with pointwise multiplication and the uniform norm.)
If T E L(X, ), T is said to be invertible, or an isomorphism, if T is bijective
and T- 1 is bounded (in other words, IITxl1 > Cllxll for some C > 0). T is called an
isometry if IITxll == Ilxll for all x E X. An isometry is injective but not necessarily
surjective; it is, however, an isomorphism onto its range.
Exercises
1. If X is a normed vector space over K (== 1R or CC), then addition and scalar
multiplication are continuous from X x X and K x X to X. Moreover, the norm is
continuous from X to [0,00); in fact, Illxll - Ilylll < Ilx - yll.
2. L(X,) is a vector space and the function II . II defined by (5.3) is a norm on it.
In particular, the three expressions on the right of (5.3) are always equal.
NORMED VECTOR SPACES 155
3. Complete the proof of Proposition 5.4.
4. If X,1j are normed vector spaces, the map (T, x) Tx is continuous from
L(X,1j) x X to 1j. (That is, if Tn T and X n x then Tnxn Tx.)
5. If X is a normed vector space, the closure of any subspace of X is a subspace.
6. Suppose that X is a finite-dimensional vector space. Let e1, . . . , en be a basis
for X, and define II I: ajej III == I: laj I.
3. II . 111 is a norm on X.
b. The map (aI, . . . , an) I: ajej is continuous from Kn with the usual
Euclidean topology to X with the topology defined by II . 111.
c. {x EX: II x lll == I} is compact in the topology defined by II . 111.
d. All norms on X are equivalent. (Compare any norm to 1/ . 111.)
7. Let X be a Banach space.
a. If T E L(X, X) and III - TII < 1 where I is the identity operator, then T is
invertible; in fact, the series I:c; (I - T)n converges in L(X, X) to T- 1 .
b. If T E L(X, X) is invertible and liS - TII < liT-III-I, then S is invertible.
Thus the set of invertible operators is open in L(X, X).
8. Let (X, M) be a measurable space, and let M(X) be the space of complex
measures on (X, M). Then IIJ-lil == 1J-lI(X) is a norm on M(X) that makes M(X)
into a Banach space. (Use Theorem 5.1.)
9. Let Ck([O, 1]) be the space of functions on [0,1] possessing continuous deriva-
tives up to order k on [0, 1], including one-sided derivatives at the endpoints.
a. If f E C([O, 1]), then f E Ck([O, 1]) iff f is k times continuously differen-
tiable on (0,1) and lirnxo f(j) (x) and lirn x /1 f(j) (x) exist for j < k. (The
mean val ue theorem is useful.)
b. IIfll == I: IIf(j)lIu is a norm on Ck([O, 1]) that makes Ck([O, 1]) into a
Banach space. (Use induction on k. The essential point is that if {f n} C
C 1 ([0, 1]), fn --t f uniformly, and f g uniformly, then f E C 1 ([0, 1]) and
f' , g. The easy way to prove this is to show that f(x) - f(O) == f; g(t) dt.)
10. Let Lk([O, 1]) be the space of all f E Ck-l([O, 1]) such that f(k-l) is absolutely
continuous on [0,1] (and hence f(k) exists a.e. and is in L 1 ([0, 1])). Then I/fll ==
I: fo1 I f(j) (x) I dx is a norm on Lk ([0, 1]) that makes Lk ([0, 1]) into a Banach space.
(See Exercise 9 and its hint.)
11. If 0 < a < 1, let A Q ([0,1]) be the space of Holder continuous functions of
exponent a on [0,1]. That is, f E AQ([O, 1]) iff IIfilAa < 00, where
IlfllAa == If(O)1 + sup
x,YE[O,l], x#y
If(x) - f(y)1
Ix -ylQ
3. " . IIAa is a norm that makes AQ([O, 1]) into a Banach space.
156 ELEMENTS OF FUNCTIONAL ANALYSIS
b. Let AQ([O, 1]) be the set of all f E AQ([O, 1]) such that
!!(I X ) - (Y)I -> 0 as x -> y, for all y E [0,1].
x-y Q
If Q < 1, AQ([O, 1]) is an infinite-dimensional closed subspace of AQ([O, 1]). If
a == 1, A Q ([0, 1]) contains only constant functions.
12. Let X be a normed vector space and M a proper closed subspace of X.
a. Ilx + Mil == inf{llx + yll : y E M} is a norm on X/Me
b. For any E > 0 there exists x E X such that IIxll == 1 and Ilx + Mil > 1 - E.
c. The projection map 7r(x) == x + M from X to X/M has norm 1.
d. If X is complete, so is X/Me (Use Theorem 5.1.)
e. The topology defined by the quotient norm is the quotient topology as defined
in Exercise 28 in 34.2.
13. If II . II is a seminorm on the vector space X, let M == {x EX: II x 1\ == O}. Then
M is a subspace, and the map x + M IIxll is a norm on X/Me
14. If X is a normed vector space and M is a nonclosed subspace, then Ilx + Mil, as
defined in Exercise 12, is a seminorm on X /M. If one divides by its nullspace as in
Exercise 13, the resulting quotient space is isometrically isomorphic to X/ M e (Cf.
Exercise 5.)
15. Suppose that X and 1j are normed vector spaces and T E L(X, ). Let N(T) ==
{x EX: Tx == O}.
3. N(T) is a closed subspace of X.
b. There is a unique 8 E L(XjN(T), ) such that T == 8 0 7r where 7r : X --t
X/M is the projection (see Exercise 12). Moreover, /1811 == IITII.
16. The purpose of this exercise is to develop a theory of integration for functions
with values in a separable Banach space. Let (X, M, J.L) be a measure space, 1j a
separable Banach space, and L}J the space of all (M, 13}J)-measurable maps from
X to 1j, and F}J the set of maps f : X 1j of the form f(x) == L: XEj (x)Yj
where n E N, Yj E 1j, Ej E M, and J.L(Ej) < 00. If f E L}J, since Y IlylI
is continuous (Exercise 1), x IIf(x)1I is (M,13}R)-measurable, and we define
IIflll == J IIf(x) II dJ.L(x). Finally, let L == {f E L}J : Ilflh < oo}.
3. L}J is a vector space, F}J and L are subspaces of it, F}J C L, and II . 111 is a
seminorm on L that becomes a norm if we identify two functions that are equal
a.e.
b. Let {Yn}l be a countable dense set in . Given E > 0, let B == {y E 1j :
lIy - Ynll < EIIYnll}. Then U B :) \ {O}.
c. If f E L, there is a sequence {h n } C F}J with h n --t f a.e. and Ilh n - flh -4
O. (With notation as in (b), let Anj == B/ j \ U:\ B;i j and Enj == f-1 (Anj ),
and consider gj = L:=l YnXE nj .)
d. There is a unique linear map J : L such that J YXE == J.L(E)y for
Y E 1j and E E M (J.L(E) < 00), and II J fll < II fill.
LINEAR FUNCTIONALS 157
e. The dominated convergence theorem: If {fn} is a sequence in L such that
f n f a.e., and there exists g E L 1 such that II f n (x) II < g( x) for all nand a.e.
x, then J fn J f.
f. If Z is a separable Banach space, T E L(1), Z), and f E L, then To f E Li
and J T 0 f == T(J f).
5.2 LINEAR FUNCTIONALS
Let X be a vector space over K, where K == 1R or C. A linear map from X to K is
called a linear functional on X. If X is a normed vector space, the space L(X, K)
of bounded linear functionals on X is called the dual space of X and is denoted by
X*. According to Proposition 5.4, X* is a Banach space with the operator norm.
If X is a vector space over C, it is also a vector space over JR, and we can consider
both real and complex linear functionals on X, that is, maps f : X JR that are
linear over JR and maps f : X C that are linear over C. The relationship between
the two is as follows:
5.5 Proposition. Let X be a vector space over ce. If f is a complex linear functional
on X and u == Re f, then u is a real linear functional, and f (x) == u (x) - i u ( ix ) for
all x E X. Conversely, ifu is a real linear functional on X and f : X C is defined
by f(x) == u(x) - iu(ix), then f is complex linear. In this case, if X is normed, we
have lIull == IIfli.
Proof. If f is complex linear and u == Re f, u is clearly real linear and 1m f(x) ==
- Re(if(x)] == -u(ix), so f(x) == u(x) - iu(ix). On the other hand, if u is real
linear and f(x) == u(x) - iu(ix), then f is clearly linear over JR, and f(ix) ==
u(ix) - iu( -x) == u(ix) + iu(x) == if(x), so f is also linear over C. Finally, if X is
normed, since lu(x)1 == IRe f(x )1 < If(x)1 we have lIuli < Ilfll. On the other hand,
if f(x) =1= 0, let a == sgn f(x). Then If(x)1 == af(x) == f(ax) == u(ax) (since
f(ax) is real), so If(x)1 < lIullllaxll == Ilullllxll, whence Ilfll < lIuli. .
It is not obvious that there are any nonzero bounded linear functionals on an
arbitrary normed vector space. The fact that such functionals exist in great abundance
is one of the fundamental theorems of functional analysis. We shall now present this
result in a more general form that has other important applications.
If X is a real vector space, a sub linear functional on X is a map p : X JR such
that
p(x + y) < p(x) + p(y) andp(Ax) == Ap(X) for all x, y E X and A > o.
For example, every seminorm is a sublinear functional.
5.6 The Hahn-Banach Theorem. Let X be a real vector space, p a sublinear func-
tional on X, M a subspace of X, and f a linear functional on M such that f (x) < p( x)
for all x E M. Then there exists a linear functional F on X such that F(x) < p(x)
for all x E X and FIM == f.
158 ELEMENTS OF FUNCTIONAL ANALYSIS
Proof. We begin by showing that if x E X \ M, f can be extended to a linear
functional 9 on M + JRx satisfying g(y) < p(y) there. If Yl, Y2 E M, we have
f(Yl) + f(Y2) == f(Yl + Y2) < p(Yl + Y2) < P(YI - x) + p(x + Y2),
or
!(Yl) - p(Yl - x) < p(x + Y2) - f(Y2).
Hence
sup { f (y) - p (y - x) : Y E ]v(} < inf {p( x + y) - f (y) : Y EM}.
Let a be any number satisfying
sup{f(Y) - p(y - x) : Y E M} < a < inf{p(x + y) - f(y) : Y E M}
and define 9 : M + JRx --t JR by g(y + AX) == f(y) + Aa. Then 9 is clearly linear,
and glM == f, so that g(y) < p(y) for Y E M. Moreover, if A > 0 and Y E M,
g(Y+AX) == A[f(Y/A)+aJ < A[f(Y/A)+p(X+(Y/A))-f(Y/A)J ==p(Y+AX),
whereas if A == - J-l < 0,
g(y + AX) == J-l[f(Y/ J-l) - a] < J-l[f(Y/ J-l) - f(y/ J-l) + p((y/ J-l) - x)] == p(y + AX).
Thus g(z) < p(z) for all z E M + JRx.
Evidently the same reasoning can be applied to any linear extension F of f
satisfying F < p on its domain, and it shows that the domain of a maximal linear
extension satisfying F < p must be the whole space X. But the family 9=" of all
linear extensions F of f satisfying F < p is partially ordered by inclusion (maps
from subspaces of X to JR being regarded as subsets of X x JR). Since the union of
any increasing family of subspaces of X is again a subspace, one easily sees that the
union of a linearly ordered subfamily of lies in. The proof is therefore completed
by invoking Zorn's lemma. .
If p is a seminorm and f : X --t 1R is linear, the inequality f < p is equivalent to
the inequality If I < p, because If(x)1 == -:1:.f(x) == f(-:1:.x) and p( -x) == p(x). In
this situation the Hahn-Banach theorem also applies to complex linear functionals:
5.7 The Complex Hahn-Banach Theorem. Let X be a complex vector space, p a
seminorm on X, M a subspace of X, and f a complex linear functional on M such
that If (x) I < p( x) for x E M. Then there exists a complex linear functional F on X
such that IF(x)1 < p(x) for all x E X and FIM == f.
Proof. Let u == Re f. By Theorem 5.6 there is a real linear extension U of u to X
such that IU(x)1 < p(x) for all x E X. Let F(x) == U(x) - iU(ix) as in Proposition
5.5. Th en F is a complex linear extension of f, and as in the proof of Proposition 5.5,
if a == sgn F(x) we have IF(x) I == aF(x) == F( ax) == U( ax) < p( ax) == p(x). .
LINEAR FUNCTIONALS 159
From now on until 35.5, all of our results apply equally to real or complex vector
spaces, but for the sake of definiteness we shall assume that the scalar field is C.
The principal applications of the Hahn-Banach theorem to normed vector spaces are
summarized in the following theorem.
5.8 Theorem. Let X be a normed vector space.
a. IfM is a closed subspace of X and x E X \ M, there exists f E X* such that
f(x) # 0 and flM == O. In fact, if 8 == infyEM IIx - yll, f can be taken to
.satisfY IIfll == 1 and f(x) == 8.
b. Ifx # 0 E X, there exists f E X* such that Ilfll == 1 and f(x) == IIxll.
c. The bounded linear functionals on X separate points.
d. Ifx E X, define x : X* CC by x(f) == f(x). Then the map x x is a linear
isometry from X into X** (the dual ofX*).
Proof. To prove (a), define f on M + Cx by f(y + AX) == A8 (y E M, A E C).
Then f(x) == 8, flM == 0, and for A # 0, If(y + Ax)1 == IAI8 < IAIIIA- l y + xII ==
Ily + Axil. Thus the Hahn-Banach theorem can be applied, with p(x) == IIxli and M
replaced by M + Cx. (b) is the special case of (a) with M == {O}, and (c) follows
immediately: if x # y, there exists f E X* with f(x - y) # 0, i.e., f(x) # f(y).
As for (d), obviously x is a linear functional on X* and the map x x is linear.
Moroever, Ix(f)1 == If(x)1 < IIfllllxll, so IIxll < IIxll. On the other hand, (b)
implies that !lxll > IIxll. .
With notation as in Theorem 5.8d, let X == {x : x E X}. Since X** is always
-...
complete, the closure X of X in X** is a Banach space, and the map x x embeds
:::::- -::::::-
X into X as a dense subspace. X is called the completion of X. In particular, if X is
-...
itself a Banach space then X == X.
-...
If X is finite-dimensional, then of course X == X**, since these spaces have the
same dimension. For infinite-dimensional Banach spaces it mayor may not happen
-...
that X == X**; if it does, X is called reflexive. The examples of Banach spaces we
have examined so far are not reflexive except in trivial cases where they turn out to be
finite-dimensional. We shall prove some cases of this assertion and present examples
of reflexive Banach spaces in later sections.
Usually we shall identify x with x and thus regard X** as a superspace of X;
reflexivity then means that X** == X.
Exercises
17. A linear functional f on a normed vector space X is bounded iff f-l ( {O}) is
closed. (Use Exercise 12b.)
18. Let X be a normed vector space.
a. If M is a closed subspace and x E X \ M then M + Cx is closed. (D se
Theorem 5.8a.)
b. Every finite-dimensional subspace of X is closed.
160 ELEMENTS OF FUNCTIONAL ANALYSIS
19. Let X be an infinite-dimensional normed vector space.
a. There is a sequence {Xj} in X such that IIXj II == 1 for all j and Ilxj - Xk II >
for j -=F k. (Construct Xj inductively, using Exercises 12b and 18.)
b. X is not locally compact.
20. If M is a finite-dimensional subspace of a normed vector space X, there is a
closed subspace N such that M n N == {O} and M + N == X.
21. If X and are normed vector spaces, define a : X* x * (X x )* by
a(f, g)(x, y) == f(x) + g(y). Then a is an isomorphism which is isometric if we
use the norm II (x, y) II == max( II x II, II y II) on X x , the corresponding operator norm
on (X x )*, and the norm II (f, g) II == II fll + IIgli on X* x *.
22. Suppose that X and are normed vector spaces and T E L(X, ).
a. Define Tt : * X* by Tt I == loT. Then Tt E L(*, X*) and
IITt II == IITII. Tt is called the adjoint or transpose of T.
b. Applying the construction in (a) twice, one obtains Ttt E L(X**, **). If
......... .........
X and are identified with their natural images X and in X** and **, then
Ttt IX == T.
c. Tt is injective iff the range of T is dense in .
d. If the range of Tt is dense in X*, then T is injective; the converse is true if X
is reflexive.
23. Suppose that X is a Banach space. IfM is a closed subspace of X and N is a closed
subspace of X*, let MO == {f E X* : 11M == O} and N-L- == {x EX: f(x) == 0 for
all fEN}. (Thus, if we identify X with its image in X**, N-L- == W n X.)
a. MO and N-L- are closed subspaces of X* and X, respectively.
b. (MO)-L- == M and (N-L-)O :) N. If X is reflexive, (N-L-)O == N.
c. Let Jr : X --7 X jM be the natural projection, and define a : (X jM) * --7 X*
by a(f) == f 0 Jr. Then a is an isometric isomorphism from (XjM)* onto MO,
where XjM has the quotient norm.
d. Define (3 : X* --7 M* b L {3(/) == 11M; then {3 induces a map {3 : X* jMO --7
JY(* as in Exercise 15, and {3 is an isometric isomorphism.
24. Suppose that X is a Banach space.
a. Let X, (X* fbe the natural images of X, X* in X**, X*** , and let XO == {F E
X*** : FIX == O}. Then (x*fn XO == {O} and (X*f + Xa == X***.
b. X is reflexive iff X* is reflexive.
25. If X is a Banach space and X* is separable, then X is separable. (Let {In}r
be a countable dense subset of X*. For each n choose X n E X with II X n /I == 1 and
Ifn(xn)1 > llfnll. Then the linear combinations of {xn}r are dense in X.) Note:
Separability of X does not imply separability of X* .
26. Let X be a real vector space and let P be a subset of X such that (i) if x, YEP,
then x + YEP, (ii) if x E P and A > 0, then AX E P, (iii) if x E P and -x E P,
THE BAIRE CATEGORY THEOREM AND ITS CONSEQUENCES 161
then x == o. (Example: If X is a space of real-valued functions, P can be the set of
nonnegative functions in X.)
a. The relation < defined by x < y iff y - x E P is a partial ordering on X.
b. (The Krein Extension Theorem) Suppose that ]V( is a subspace of X such
that for each x E X there exists y E ]V( with x < y. If j is a linear functional
on ]V( such that j (x) > 0 for x E ]V( n P, there is a linear functional F on X
such that F(x) > 0 for x E P and FI]V( == j. (Consider p(x) == inf{j(y) : y E
]V( and x < y}.)
5.3 THE SAIRE CATEGORY THEOREM AND ITS CONSEQUENCES
In this section we present an important theorem about complete metric spaces and
use it to obtain some fundamental results concerning linear maps between Banach
spaces.
5.9 The Baire Category Theorem. Let X be a complete metric space.
a. If {Un}! is a sequence of open dense subsets of X, then n Un is dense in
X.
b. X is not a countable union of nowhere dense sets.
Proof. For part (a), we must show that if W is a nonempty open set in X,
then W intersects n Un. Since U I n W is open and nonempty, it contains a ball
B(ro, xo), and we can assume that 0 < ro < 1. For n > 0, we choose X n E X and
r n E (0,00) inductively as follows: Having chosen Xj and rj for j < n, we observe
that Un n B(rn-I, Xn-l) is o pen and nonempty, so we can choose xn,r n so that
o < r n < 2- n and B(r n , x n ) C Un n B(rn-l' Xn-l). Then if n, m > N, we see
that X n , X m E B (r N , x N ), and since r n 0, the sequence {xn} is Cauchy. As X
is complete, x == lim X n exists. Since X n E B (r N , x N) for n > N we have
x E B(rN, XN) C UN n B(rl, Xl) C UN n W
for all N, and the proof is complete.
As for (b), if {En} is a sequence of nowhere dense sets in X, then {( E n) C} is a
sequence of open dense sets. Since n( E n)C =1= 0, we have U En C U E n =1= X. .
We remark that since the conclusions of the Baire category theorem are purely
topological, it suffices for X to be homeomorphic to a complete metric space. For
example, the theorem applies to X == (0,1), which is not complete with the usual
metric but is homeomorphic to JR.
The name of this theorem comes from Baire's terminology for sets: If X is a
topological space, a set E C X is of the first category, according to Baire, if E is a
countable union of nowhere dense sets; otherwise E is of the second category. Thus
Baire's theorem asserts that every complete metric space is of the second category
in itself. A more modem and more descriptive synonym for "of the first category" is
meager. The complement of a meager set is called residual.
162 ELEMENTS OF FUNCTIONAL ANALYSIS
The Baire category theorem is often used to prove existence results: One shows that
objects having a certain property exist by showing that the set of objects not having
the property (within a suitable complete metric space) is meager. For example, one
can prove the existence of nowhere differentiable continuous functions in this way;
see Exercise 42.
We turn to the applications of the Baire category theorem in the theory of linear
maps. Some terminology: If X and Yare topological spaces, a map f : X Y is
called open if f(U) is open in Y whenever U is open in X. If X and Yare metric
spaces, this amounts to requiring that if B is a ball centered at x EX, then f (B)
contains a ball centered at f (x). Specializing still further, if X and Yare normed
linear spaces and f is linear, then f commutes with translations and di\ations; it
follows that f is open iff f (B) contains a ball centered at 0 in Y when B is the ball
of radius 1 about 0 in X.
5.10 The Open Mapping Theorem. Let X and 1:J be Banach spaces. If T E
L(X, ) is surjective, then T is open.
Proof. Let Br denote the (open) ball of radius r about 0 in X. By the preceding
remarks, it will suffice to show that T(B I ) contains a ball about 0 in 'J. Since
X == U Bn and T is surjective, we have == U T(B n ). But is complete and
the map y ny is a homeomorphism of that maps T(B I ) to T(B n ), so Baire's
theorem implies that T(B I ) cannot be nowhere den se. Th at is, there exist Yo E 'J and
r > 0 such that the ball B(4r, yo) is contained in T(B I ). Pick Y I == TXI E T(B I )
such that IIYl - Yo II < 2r; then B(2r, YI) C B(4r, Yo) C T(B I ) , so if lIyll < 2r,
y == TXI + (y - YI) E T(XI + B I ) C T(B 2 ).
Dividing both si des by 2, we conclude that th ere exists r > 0 such that if IIyll < r
then Y E T(B I ). If we could replace T(B I ) by T(B I ), perhaps shrinking r at the
same time, the proof would be complete; we now proceed to accomplish this.
Since T commutes with dilations, it follows that if lIyll < r2- n , then y E
T(B 2 -n). Suppose Ilyll < r/2; we can find Xl E BI/2 such that lIy - TXIII < r/4,
and proceeding inductively, we can find X n E B 2 -n such that Ily - L TXj II <
r2- n - l . Since X is complete, by Theorem 5.1 the series L X n converges, say to
x. But then IIxli < L 2- n == 1 and y == Tx. In other words, T(B I ) contains all y
with Ilyll < r /2, so we are done. .
5.11 Corollary. If X and are Banach spaces and T E L(X, 'J) is bijective, then
T is an isomorphism; that is, T- I E L(, X).
Proof. If T is bijective, continuity of T-l is equivalent to the openness of T. .
For the next results we need some more terminology. If X and 'J are normed
vector spaces and T is a linear map from X to , we define the graph of T to be
r (T) == {( X, y) E X x : y == T x } ,
THE BAIRE CATEGORY THEOREM AND ITS CONSEQUENCES 163
which is a subspace of X x 1:J. (From a strict set-theoretic point of view, of course,
T and f(T) are identical; the distinction is a psychological one.) We say that T is
closed if f(T) is a closed subspace of X x . Clearly, if T is continuous, then T is
closed, and if X and are complete the converse is also true:
5.12 The Closed Graph Theorem. If X and are Banach spaces and T : X
is a closed linear map, then T is bounded.
Proof. Let 7rl and 7r2 be the projections off(T) onto X and 1:J, that is, 7rl (x, Tx) ==
x and 7r2(X, Tx) == Tx. Obviously 7rl E L(f(T), X) and 7r2 E L(f(T), ). Since
X and are complete, so is X x , and hence so is f(T) since T is closed. The map
7rl is a bijection from f(T) to X, so by Corollary 5.11, 7r 1 1 is bounded. But then
T == 7r2 0 7r 1 1 is bounded. .
Continuity of a linear map T : X 1:J means that if X n x then TX n Tx,
whereas closedness means that if X n x and TX n y then y == Tx. Thus the
significance of the closed graph theorem is that in verifying that TX n Tx when
X n x, we may assume that TX n converges to something, and we need only to
show that the limit is the right thing. This frequently saves a lot of trouble.
The completeness of X and was used in a crucial way in proving the open
mapping theorem and hence also in proving the closed graph theorem. In fact, the
conclusions of both of these theorems may fail if either X or is incomplete; see
Exercises 29-31.
Our final result in this section is a theorem of almost magical power that allows
one to deduce uniform estimates from pointwise estimates in certain situations.
5.13 The Uniform Boundedness Principle. Suppose that X and}J are nonned vec-
tor spaces and A is a subset of L(X, ).
a. IfsuPTEA IITx!! < ooforallxinsomenonmeagersubsetofX, thensuPTEA liT II <
00.
b. If X isa Banach space andsuPTEA IITxll < ooforallx E X, thensuPTEA IITII <
00.
Proof. Let
En == {x EX: sup IITxl1 < n} == n {x EX: IITxl1 < n}.
TEA TEA
Then the En's are clos ed, so un der the hypothesis of (a) s ome En must contain
a nontrivial closed ball B(r, xo). But then E 2n :) B(r,O), for if Ilx!! < r, then
x - Xo E En and hence
IITxl1 < IIT(x - xo)11 + IITxol1 < 2n.
In other words, IITxl1 < 2n whenever TEA and /lxll < r, so sUPTEA liT!! < 2n/r.
This proves (a), and (b) follows by the Baire category theorem. .
164 ELEMENTS OF FUNCTIONAL ANALYSIS
Exercises
27. There exist meager subsets of ]R whose complements have Lebesgue measure
zero.
28. The Baire category theorem remains true if X is assumed to be an LCH space
rather than a complete metric space. (The proof is similar; the substitute for com-
pleteness is Proposition 4.21.)
29. Let == L 1 (/1) where /1 is counting measure on N, and let X == {f E :
L: nlf(n)1 < oo}, equipped with the L1 norm.
a. X is a proper dense subspace of ; hence X is not complete.
b. Define T : X by T f (n) == n f (n ). Then T is closed but not bounded.
c. Let S == T-1. Then S : X is bounded and surjective but not open.
30. Let == 0([0,1]) and X == 0 1 ([0,1]), both equipped with the uniform norm.
a. X is not complete.
b. The map (dj dx) : X is closed (see Exercise 9) but not bounded.
31. Let X, be Banach spaces and let S : X be an unbounded linear map (for
the existence of which, see 35.6). Let r(S) be the graph of S, a subspace of X x .
a. r( S) is not complete.
b. Define T : X r(S) by Tx == (x, Sx). Then T is closed but not bounded.
c. T- 1 : r( S) --+ X is bounded and surjective but not open.
32. Let" . "1 and " . "2 be norms on the vector space X such that " . "1 < " . "2. If
X is complete with respect to both norms, then the norms are equivalent.
33. There is no slowest rate of decay of the terms of an absolutely convergent series;
that is, there is no sequence {an} of positive numbers such that L: an IC n I < 00
iff { c n } is bounded. (The set of bounded sequences is the space B (N) of bounded
functions on N, and the set of absolutely summable sequences is L1 (/1) where /1 is
counting measure on N. If such an {an} exists, consider T : B(N) L 1 (/1) defined
by T f(n) == anf(n). The set of f such that f(n) == 0 for all but finitely many n is
dense in L 1 (/1) but not in B(N).)
34. With reference to Exercises 9 and 10, show that the inclusion map of L1([0, 1])
into Ok-1 ([0,1]) is continuous (a) by using the closed graph theorem, and (b) by
direct calculation. (This is to illustrate the use of the closed graph theorem as a
labor-saving device.)
35. Let X and be Banach spaces, T E L(X, }J), N(T) == {x : Tx == O}, and
M == range(T). Then XjN(T) is isomorphic to ]V( iff]v( is closed. (See Exercise
15.)
36. Let X be a separable Banach space and let /1 be counting measure on N. Suppose
that { x n } 1 is a countable dense subset of the unit ball of X, and define T : L 1 (/1)
X by T f == L: f(n)x n .
a. T is bounded.
TOPOLOGICAL VECTOR SPACES 165
b. T is surjective.
c. X is isomorphic to a quotient space of L 1 (11). (Use Exercise 35.)
37. Let X and be Banach spaces. If T : X ---t is a linear map such that f 0 T E X*
for every f E *, then T is bounded.
38. Let X and be Banach spaces, and let {Tn} be a sequence in L(X, ) such that
lim Tnx exists for every x E X. Let Tx == lim Tnx; then T E L(X, ).
39. Let X, , Z be Banach spaces and let B : X x ---t Z be a separately continuous
bilinear map; that is, B(x,.) E L(, Z) for each x E X and B(., y) E L(X, Z) for
each y E . Then B is jointly continuous, that is, continuous from X x 'J to Z.
(Reduce the problem to proving that IIB(x, y)11 < Cllxlillyll for some C > 0.)
40. (The Principle of Condensation of Singularities) Let X and be Banach
spaces and {Tjk : j, kEN} c L(X, 'J). Suppose that for each k there exists x E X
such that sup{ IITjkXl1 : j E N} == 00. Then there is an x (indeed, a residual set of
x's) such that sup{IITjkXII : j E N} == 00 for all k.
41. Let X be a vector space of countably infinite dimension (that is, every element
is afinite linear combination of members of a countably infinite linearly independent
set). There is no norm on X with respect to which X is complete. (Given a norm on
X, apply Exercise 18b and the Baire category theorem.)
42. Let En be the set of all! E C ([0, 1]) for which there exists Xo E [0, 1] (depending
on f) such that I!(x) - f(xo)1 < nix - xol for all x E [0,1].
a. En is nowhere dense in C([O, 1]). (Any real f E C([O, 1]) can be uniformly
approximated by a piecewise linear function 9 whose linear pieces, finite in
number, have slope :i:2n. If IIh - gllu is sufficiently small, then h En.)
b. The set of nowhere differentiable functions is residual in C([O, 1]).
5.4 TOPOLOGICAL VECTOR SPACES
It is frequently useful to consider topologies on vector spaces other than those defined
by norms, the only crucial requirement being that the topology should be well behaved
with respect to the vector operations. Precisely, a topological vector space is a vector
space X over the field K (== ]R or C) which is endowed with a topology such that the
maps (x, y) ---t X + y and (A, x) ---t AX are continuous from X x X and K x X to X.
A topological vector space is called locally convex if there is a base for the topology
consisting of convex sets (that is, sets A such that if x, yEA then tx + (1 - t)y E A
for 0 < t < 1). Most topological vector spaces that arise in practice are locally
convex and Hausdorff.
The most common way of defining locally convex topologies on vector spaces is
in terms of seminorms. Namely, if we are given a family of seminorms on X, the
"balls" that they define can be used to generate a topology in the same way that the
166 ELEMENTS OF FUNCTIONAL ANALYSIS
balls defined by a norm generate the topology on a normed vector space. The precise
result is as follows:
5.14 Theorem. Let {PG}GEA be a family of seminorms on the vector space X. If
x E X, a E A, and E > 0, let
U XGE == {y EX: PG(Y - x) < E},
and let 'J be the topology generated by the sets U XGE .
a. For each x E X, the finite intersections of the sets U XGE (a E A, E > O)form
a neighborhood base at x.
b. If(xi)iEI is a net in X, then Xi x iffpG(xi - x) Of or all a E A.
c. (X, 'J) is a locally convex topological vector space.
Proof. (a) If x E n7 UXjGjEj,let8j == Ej-PG(X-Xj). By the triangle inequality,
we have x E n7 UXGjbj C n7 UXjGjEj. Thus the assertion follows from Proposition
4.4.
(b) In view of (a), it suffices to observe that PG (Xi - X) --+ 0 iff (Xi) is eventually
in U XGE for every E > O.
(c) The continuity of the vector operations follows easily from Proposition 4.19
and part (b). Indeed, if Xi x and Yi Y, then
PG ((Xi + Yi) - (x + y)) < PG(Xi - x) + PG(Yi - y) 0,
so Xi + Yi X + y. If also Ai A, then eventually IAil < C == IAI + 1, so
PG(AiXi- AX ) < PG(Ai(Xi- X ))+PG((Ai-A)X) < CPG(xi-x)+IAi-AlpG(x),
and it follows that AiXi AX. Moreover, the sets U XG € are convex, for if Y, Z E U XGE ,
then
PG(X-[ty+(1-t)z]) < PG(tx-tY)+PG((1-t)x+(1-t)z) < tE+(1-t)E == E.
The local convexity of the topology therefore follows from (a).
In this context there is an analogue of Proposition 5.2:
.
5.15 Proposition. Suppose X and are vector spaces with topologies defined,
respectively, by the families {PG }GEA and {q,B} ,BEB of seminorms, and T : X
is a linear map. Then T is continuous iff for each (3 E B there exist 0'.1, . . . , ak E A
and C > 0 such that q,B(Tx) < C 2:7 PGj (x).
Proof. If the latter condition holds and (Xi) is a net converging to x E X, by
Theorem 5.14b we have PG(Xi - x) 0 for all a, hence q,B(TXi - Tx) 0
for all {3, hence TXi Tx. By Proposition 4.19, T is continuous. Conversely,
if T is continuous, for every (3 E B there is a neighborhood U of 0 in X such that
q,B(Tx) < 1 for x E U. By Theorem 5.14a we may assume that U == n7 UXGjEj. Let
TOPOLOGICAL VECTOR SPACES 167
E :::; min( El, . . . , Ek); then qj3(Tx) < 1 whenever POj (x) < E for all j. Now, given
x E X, there are two possibilities. Ifpo j (x) > 0 for some j, let y == Exl L POj (x).
Then POj (y) < E for all j, so
k k
qj3(Tx) == L E- 1 po j (x)qj3(Ty) < E- 1 LPo j (x).
1 1
On the other hand, if POj (x) == 0 for all j, then POj (rx) == 0 for all j and all
r > 0, hence rqj3(Tx) == qj3(T(rx)) < 1 for all r > 0, hence qj3(Tx) == O. Thus
qj3(Tx) < E- 1 L POj (x) in this case too, and we are done. .
The proof of the following proposition is left to the reader (Exercise 43).
5.16 Proposition. Let X be a vector space equipped with the topology defined by a
family {Po} oEA of seminorms.
a. X is Hausdorff iff for each x =1= 0 there exists a E A such that Po (x) =1= o.
b. If X is Hausdorff and A is countable, then X is metrizable with a translation-
invariant metric (i.e., p(x,y) == p(x+z, y+z)forallx,y,z EX).
If X has the topology defined by the seminorms {PO}OEA, by Proposition 5.15 a
linear functional f on X is continuous iff If (x) I < c L POj (x) for some C > 0 and
aI, . . . , ak E A. Since a finite sum of seminorms is again a seminorm, the Hahn-
Banach theorem guarantees the existence of lots of continuous linear functionals on
X - enough to separate points, if X is Hausdorff. The set of all such functionals is
denoted, as before, by X*. There are various ways of making X* into a topological
vector space, but we shall not consider this question systematically. The simplest
way is to impose the weakest topology that makes all the evaluation maps f f(x)
(x E X) continuous, an idea that we shall discuss further below.
In a topological vector space X the notion of Cauchy sequence or Cauchy net makes
sense. Namely, a net (Xi)iEI in X is called Cauchy if the net (Xi - Xj)(i,j)EIXI
converges to zero. (Here I x I is directed in the usual way: (i, j) ;S (i', j') iff
i ;S i' and j ;S j'.) Naturally, X is called complete if every Cauchy net converges.
Completeness is of most interest when X is first countable, in which case it is
equivalent to the condition that every Cauchy sequence converges (Exercise 44).
More particularly, if X is Hausdorff and its topology is defined by a countable family
of seminorms, then this topology is first countable by Theorem 5.14a; indeed, it
is given by a translation-invariant metric p by Proposition 5.16b, and a sequence
is Cauchy according to the definition just given iff it is Cauchy with respect to
p. A complete Hausdorff topological vector space whose topology is defined by a
countable family of seminorms is called a Frechet space.
Let us now consider some interesting examples of topological vector spaces whose
topologies are defined by families of seminorms rather than by single norms. We
have already met a couple of them in previous chapters:
. Let X be an LCH space. On eX, the topology of uniform convergence
on compact sets is defined by the seminorms PK(!) == SUPxEK If(x)1 as
168 ELEMENTS OF FUNCTIONAL ANALYSIS
K ranges over compact subsets of X. If X is a-compact and {Un} are
as in Propositions 4.39 and 4.40, this topology is defined by the seminorms
Pn(f) == sUPxE u n If(x)l. In this case, eX is easily seen to be complete, so it
is a Frechet space; by Proposition 4.38, so is C(X).
. The space LfocORn), defined in 33.4, is a Frechet space with the topology
defined by the seminorms Pk (f) == xl5:k If (x) I dx. (Completeness follows
easily from the completeness of L 1 .) An obvious generalization of this con-
struction yields a locally convex topological vector space Lfoc (X, J-l) where X
is any LCH space and J-l is a Borel measure on X that is finite on compact sets.
Another class of topological vector spaces arises naturally in connection with
the theory of differential equations. One often wishes to study the operator d/ dx, or
more complicated operators constructed from it, acting on various spaces of functions.
Unfortunately, it is virtually impossible to define norms on most infinite-dimensional
functions spaces so that d / dx becomes a bounded operator. Here is one precise result
along these lines: There is no norm on the space Coo ([0, 1]) of infinitely differentiable
functions on [0, 1] with respect to which d/ dx is bounded. Indeed, if f).. (x) == e)..x,
then (d/ dx )f).. == Af).., so IId/ dxll > IAI for all A no matter what norm is used on
Coo ([0, 1]).
In view of this difficulty, three courses of action are available. First, one can
consider differentiation as an unbounded operator from :x to where is a suitable
Banach space and X is a dense subspace of , as in Exercise 30. Second, one can
consider differentiation as a bounded linear map from one Banach space X to a
different one, such as.X == Ck([O, 1]) and == C k - 1 ([0, 1]) in Exercise 9. Finally,
one can consider differentiation as a continuous operator on a locally convex space X
whose topology is not given by a norm. All of these points of view have their uses, but
it is the last one that concerns us here. It is easy to construct families of seminorms
on spaces of smooth functions such that differentiation becomes continuous almost
by definition. For example, the seminorms Pk(f) == sUPO<x<1 If(k)(x)1 (k ==
0,1,2, . . .) make COO([O, 1]) into a Frechet space (the completeness is proved as
in Exercise 9), and d/dx is continuous on this space by Proposition 5.15 since
Pk (f') == Pk+ 1 (f). Other examples are considered in Exercise 45 and in Chapter 9.
One of the most useful procedures for constructing topologies on vector spaces is
by requiring the continuity of certain linear maps. Namely, suppose that X is a vector
space, is a normed linear space, and {Ta}aEA is a collection of linear maps from
:x to . Then the weak topology 'J generated by {Ta} makes :x into a locally convex
topological vector space. Indeed, 'I is just the topology 'I' defined by the seminorms
Pa(x) == IITaxl! according to Theorem 5.14. ('I is generated by sets of the form
{x : IITax - Yo II < E} with Yo E , whereas 'I' is generated by sets of the form
{x : IITax - Taxo II < E} with Xo E X. If the Ta's are surjective, these are obviously
the same; the general case is left as Exercise 46.) The topology on COO([O, 1]) in
the preceding paragraph is an example of this construction, with == C([O, 1]) and
Tkf == f(k). We now present some more.
First, let :x be a normed vector space. The weak topology generated by X* is
known simply as the weak topology on X, and convergence with respect to this
TOPOLOGICAL VECTOR SPACES 169
topology is known as weak convergence. Thus, if (xa) is a net in X, Xa -7 x
weakly iff f(x a ) -7 f(x) for all f E X*. When X is infinite-dimensional, the weak
topology is always weaker than the norm topology; see Exercise 49.
Next, let X be a normed vector space, X* its dual space. The weak topology
on X* as defined above is the topology generated by X**; of more interest is the
topology generated by X (considered as a subspace of X**), which is called the
weak* topology (read "weak star topology") on X*. X* is a space of functions on X,
and the weak* topology is simply the topology of pointwise convergence: fa -7 f
iff fa(x) -7 f(x) for all x E X. The weak* topology is even weaker than the weak
topology on X*; the two coincide precisely when X is reflexive.
Finally, let X and 11 be Banach spaces. The topology on L(X, 11) generated by
the evaluation maps T Tx (x E X) is called the strong operator topology on
L(X, 11), and the topology generated by the linear functionals T f(Tx) (x E X,
f E 11*) is called the weak operator topology on L(X, 11). Again, these topologies
are best understood in terms of convergence: Ta -7 T strongly iff Tax -7 Tx in the
norm topology of 11 for each x E X, whereas Ta -7 T weakly iff Tax -7 Tx in the
weak topology of 1) for each x E X. Thus the strong operator topology is stronger
than the weak operator topology but weaker than the norm topology on L(X, 1}).
The following result concerning strong convergence is almost trivial but extremely
useful:
5.17 Proposition. Suppose {Tn}r c L(X, 11), sUP n II Tn II < 00, andT E L(X, 11).
if IITn x - Tx/l -+ 0 for all x in a dense subset D of X, then Tn -+ T strongly.
Proof. Let C == sup{ IITII, IITll, IIT211, . . .}. Given x E X and E > 0, choose
x' E D such that Ilx - x' II < E/3C. If n is large enough so that II Tn x ' - Tx'lI < E/3,
we have
IITn x - Txll < IITn x - Tnx'il + IITn x ' - Tx'il + IITx' - Txll
< 2Cllx - x'il + E < E,
so that Tnx -+ Tx.
.
Our final result in this section is a compactness theorem that is one of the main
reasons for the usefulness of the weak* topology on a dual space. The idea of the
proof is similar to the techniques discussed in 34.8.
5.18 Alaoglu's Theorem. If X is a normed vector space, the closed unit ball B* ==
{f E X* : Ilfll < 1} in X* is compact in the weak* topology.
Proof For each x E X let Dx == {z E C: Izl < IIxll}, and let D == TIxExDx.
Then D is compact by Tychonoff's theorem. The elements of D are precisely those
complex-valued functions cjJ on X such that IcjJ(x) I < Ilxll for all x E X, and B*
consists of those elements of D that are linear. Moreover, the relative topologies that
B* inherits from the product topology on D and the weak* topology on X* both
coincide with the topology of pointwise convergence, so it suffices to see that B* is
170 ELEMENTS OF FUNCTIONAL ANALYSIS
closed in D. But this is easy: If (/a) is a net in B* that converges to 1 E D, for any
x, y E X and a, b E <C we have
1 (ax + by) == lirn 1 a ( ax + by) == lirn [ ala ( x) + b 1 a (y )] == a 1 ( x) + b 1 (y ) ,
so that 1 E B * .
.
Warning: Alaoglu's theorem does not imply that X* is locally compact in the
weak* topology; see Exercise 49b.
Exercises
43. Prove Proposition 5.16. (For part (b), proceed as in Exercise 56d in 34.5.)
44. If X is a first countable topological vector space and every Cauchy sequence in
X converges, then every Cauchy net in X converges.
45. The space Coo (IR) of all infinitely differentiable functions on IR has a Frechet
space topology with respect to which in ---+ 1 iff Ik) ---+ I(k) uniformly on compact
sets for all k > o.
46. If X is a vector space, 11 a normed linear space, 'I the weak topology on X
generated by a family of linear maps {Ta : X ---+ 1I}, and'l' the topology defined by
the seminorms {x 1---+ IITaxll}, then 'I == 'I'.
47. Suppose that X and 11 are Banach spaces.
a. If {Tn}l C L(X, 1J) and Tn ---+ T weakly (or strongly), then sUPn IITn II <
00.
b. Every weakly convergent sequence in X, and every weak*-convergent se-
quence in X*, is bounded (with respect to the norm).
48. Suppose that X is a Banach space.
a. The norm-closed unit ball B == {x EX: IIxll < I} is also weakly closed.
(Use Theorem 5.8d.)
b. If E c X is bounded (with respect to the norm), so is its weak closure.
c. If F c X* is bounded (with respect to the norm), so is its weak* closure.
d. Every weak*-Cauchy sequence in X* converges. (Use Exercise 38.)
49. Suppose that X is an infinite-dimensional Banach space.
a. Every nonempty weakly open set in X, and every nonempty weak*-open set
in X*, is unbounded (with respect to the norm).
b. Every bounded subset of X is nowhere dense in the weak topology, and every
bounded subset of X* is nowhere dense in the weak* topology. (Use Exercise
48b,c. )
c. X is meager in itself with respect to the weak topology, and X* is meager in
itself with respect to the weak* topology.
d. The weak* topology on X* is not defined by any translation-invariant metric.
(Use Exercise 48d.)
HILBERT SPACES 171
50. If X is a separable normed linear space, the weak* topology on the closed unit
ball in X* is second countable and hence metrizable. (But cf. Exercise 49d.)
51. A vector subspace of a normed vector space X is norm-closed iff it is weakly
closed. (However, a norm-closed subspace of X* need not be weak*-closed unless
X is reflexive; see Exercise 52d.)
52. Let X be a Banach space and let 11, . . . , f n be linearly independent elements of
X* .
a. Define T : X -? C n by Tx == (11 (x), ..., In (X)). IfN == {x : Tx == O} and
JY( is the linear span of 11, . . . , f n, then JY( == NO in the notation of Exercise 23
and hence Jv{* is isomorphic to (X IN) * .
b. If F E X**, for any E > 0 there exists x E X such that F(fj) == fj (x)
for j == 1,..., nand Ilxll < (1 + E) II FII. (FIM can be identified with an
element of (X IN) ** and hence with an element of X IN since the latter is finite-
dimensional. )
c. If X is considered as a subspace of X**, the relative topology on X induced
by the weak* topology on X** is the weak topology on X.
d. In the weak* topology on X**, X is dense in X** and the closed unit ball in
X is dense in the closed unit ball in X**.
e. X is reflexive iff its closed unit ball is weakly compact.
53. Suppose that X is a Banach space and {Tn}, {Sn} are sequences in L(X, X)
such that Tn -? T strongly and Sn -? S strongly.
a. If {x n } C X and Ilx n - xii -? 0, then IITnxn - Txll -? O. (Use Exercise
47 a. )
b. TnSn -? TS strongly.
5.5 HILBERT SPACES
The most important Banach spaces, and the ones on which the most refined analysis
can be done, are the Hilbert spaces, which are a direct generalization of finite-
dimensional Euclidean spaces. Before defining them, we need to introduce a few
concepts.
Let 9-( be a complex vector space. An inner product (or scalar product) on 9-( is
a map (x, y) 1---+ (x, y) from X x X ---+ C such that:
1. (ax + by , z) == a(x, z) + b(y, z) for all x, y, z E 9-( and a, b E C.
11. (y, x) == (x, y) for all x, y E 9-(.
111. (x, x) E (0,00) for all nonzero x EX.
We observe that (i) and (ii) imply that
(x, ay + bz) == a (x, y) + b(x, z) for all x, y, z E 9-( and a, b E C.
172 ELEMENTS OF FUNCTIONAL ANALYSIS
(One can also define inner products on real vector spaces: (x, y) is then real, a and b
are assumed real in (i), and (ii) becomes (y, x) == (x, y).)
A complex vector space equipped with an inner product is caned a pre-Hilbert
space. If 9-C is a pre-Hilbert space, for x E 9-C we define
Ilxll == J (x, x).
5.19 The Schwarz Inequality. I (x, y) I < Ilxllllyll for all x, y E 9-C, with equality
iff x and yare linearly dependent.
Proof. If (x, y) == 0, the result is obvious. If (x, y) =1= 0 (and in particular y =1= 0),
let a == sgn(x, y) and z == ay, so that (x, z) == (z, x) == I (x, y) I and Ilzll == Ilyll.
Then for t E IR we have
o < (x - tz, x - tz) == IIxl1 2 - 2tl(x, y)1 + t 2 11y112.
The expression on the right is a quadratic function of t whose absolute minimum
occurs at t == Ilyll-21 (x, y) I. Setting t equal to this value, we obtain
o < IIx - tzll 2 == IIxll 2 - IIyll-21(x,y)12
with equality iff x - tz == x - aty == 0, from which the desired result is immediate. .
5.20 Proposition. The function x 1---+ Ilxll is a norm on 9{.
Proof. That IIxll == 0 iff x == 0 and that IIAxll == IAlllxll are obvious from the
definition. As for the triangle inequality, we have
Ilx + yll2 == (x + y, x + y) == IIxl1 2 + 2 Re(x, y) + lIy112,
so by the Schwarz inequality,
IIx + yll2 < IIxl/2 + 211xllilyll + IIyl12 == (lIxll + IIyll)2,
as desired. .
A pre-Hilbert space that is complete with respect to the norm IIxll == J (x, x)
is called a Hilbert space. (One can also consider real Hilbert spaces with real
inner products. However, Hilbert spaces are usually assumed to be complex unless
otherwise specified.)
Example: Let (X, JvC, J.l) be a measure space, and let L2(J.l) be the set of all
measurable functions f : X ---+ <C such that J If 1 2 dJ.l < ex) (where, as usual, we
identify two functions that are equal a.e.). From the inequality ab < (a 2 + b 2 ),
valid for an a, b > 0, we see that if f, g E £2(J.l) then If g l < (lfI2 + IgI2), so that
fg E £1 (J-L). It follows easily that the formula
(1, g) = J fg dp
HILBERT SPACES 173
defines an inner product on £2 (J-L). In fact, £2 (J-L) is a Hilbert space for any measure
/-1. We shall prove completeness in Theorem 6.6; for the present we shall take this
result for granted.
An important special case of this construction is obtained by taking J-L to be
counting measure on (A, P(A)), where A is any nonempty set; in this situation
£2 (J-L) is usually denoted by Z2 (A). Thus, Z2 (A) is the set of functions f : A -? C
such that the sum L:aEA If (ex) 1 2 (as defined in 30.5) is finite. The completeness of
Z2 (A) is rather easy to prove directly (Exercise 54).
For the remainder of this section, 9{ will denote a Hilbert space.
5.21 Proposition. If X n -? x and Yn -? y, then (xn, Yn) -? (x, y).
Proof. By the Schwarz inequality,
I (xn, Yn) - (x, y) I == I (xn - x, Yn) + (x, Yn - y) I
< Ilxn - Xii llYn II + IIxIIIIYn - ylI,
which tends to zero since II Yn II -? II Y II.
.
5.22 The Parallelogram Law. For all x, Y E Je,
IIX + YI12 + II X - y//2 == 2(llxII2 + IIYII2).
("The sum of the squares of the diagonals of a parallelogram is the sum of the squares
of the four sides." )
Proof. Add the two formulas Ilx ::f: YII2 == IIxII2 ::f: 2 Re(x, y) + IIYII2. .
If x, Y E X, we say that x is orthogonal to Y and write x 1- Y if (x, y) == O. If
E c Je, we define
E.L == {x E Je : (x, y) == 0 for all Y E E}.
It is immediate from Proposition 5.21 and the linearity of the inner product in its first
argument that E.L is a closed subspace of Je.
5.23 The Pythagorean Theorem. If Xl, . . . , x n E Je and Xj 1- Xk for j =1= k,
n 2 n
IILXjl1 = L IIXjl/2,
I I
Proof. II L: Xj 11 2 == (L: Xj, L: Xj) == L:j,k (Xj, Xk). The terms with k =1= j are
all zero, leaving only L: (Xj, Xj) == L: IIXj 11 2 . .
5.24 Theorem. IfM is a closed subspace ofJe, then Je == M EB M.L; that is, each
x E Jecanbeexpresseduniquelyasx == y+zwherey E Mandz E M...L. Moreo1'el;
y and z are the unique elements ofM and M.L whose distance to x is minimal.
174 ELEMENTS OF FUNCTIONAL ANALYSIS
Proof. Given x E 9-(, let 8 == inf{lIx - yll : Y EM}, and let {Yn} be a sequence
in M such that IIx - Yn II 8. By the paralellogram law,
2(IIYn - xl1 2 + IIYm - x1/2) == llYn - Yml1 2 + llYn + Ym - 2x11 2 ,
so since (Yn + Ym) E M,
llYn - Yml1 2 == 211Yn - xl1 2 + 211Ym - xl1 2 - 411(Yn + Ym) - xl1 2
< 211Yn - xl1 2 + 211Ym - xl1 2 - 48 2 .
As m, n ex) this last quantity tends to zero, so {Yn} is a Cauchy sequence. Let
Y == lim Yn and z == x - y. Then Y E M since M is closed, and Ilx - yll == 8.
We claim that z E M 1-. Indeed, if u E M, after multiplying u by a nonzero scalar
we may assume that (z, u) is real. Then the function
f(t) == Ilz + tul1 2 == IIzl1 2 + 2t(z, u) + t 2 11ul1 2
is real for t E JR., and is has a minimum (namely, 8 2 ) at t == 0 because z + tu ==
x - (y - tu) and Y - tu E M. Thus 2(z,u) == f'(O) == 0, so z E M1-. Moreover,
if z' is another element of M 1-, by the Pythagorean theorem (since x - z == Y E M)
we have
Ilx - z'I12 == Ilx - zl12 + Ilz - z'I12 > Ilx - z112,
with equality iff z == z'. The same reasoning shows that Y is the unique element of
]V( closest to x.
Finally, if x == y' + z' with y' E M and z' E M 1- , then Y - y' == z' - z E M n M 1- ,
so Y - y' and z' - z are orthogonal to themselves and hence are zero. .
If Y E 9-(, the Schwarz inequality shows that the formula f y (x) == (x, y) defines
a bounded linear functional on 9-( such that Ilfyll == Ilyll. Thus, the map Y fy is
a conjugate-linear isometry of 9-( into 9-(*. It is a fundamental fact that this map is
surjective:
5.25 Theorem. If f E 9-(*, there is a unique Y E 9-( such that f(x) == (x, y) for all
x E X.
Proof. Uniqueness is easy: If (x, y) == (x, y') for all x, by taking x == Y - y'
we conclude that Ily - y'I1 2 == 0 and hence Y == y'. If f is the zero functional, then
obviously Y == o. Otherwise, let M == {x E 9-( : f(x) == O}. Then Jy( is a proper
closedsubspaceofX, soM1- =1= {O} by Theorem 5.24. Pickz E M1- with Ilzll == 1.
If u == f(x)z - f(z)x then u E M, so
o == (u, z) == f(x)lIzI12 - f(z)(x, z) == f(x) - (x, f(z) z).
Hence f(x) == (x, y) where Y == f(z)z.
.
HILBERT SPACES 175
Thus, Hilbert spaces are reflexive in a very strong sense: Not only is naturally
isomorphic to 9{**, it is naturally isomorphic (via a conjugate-linear map) to *.
A subset {rua }aEA of is called orthonormal if liru a II == 1 for all a and
rua 1- ru{3 whenever a =I (3. If {xn}r is a linearly independent sequence in 9{, there
is a standard inductive procedure, called the Gram-Schmidt process, for converting
{ x n } into an orthonormal sequence { un} such that the linear span of { x n } f coincides
with the linear span of {un}f for all N. Namely, the first step is to set UI == Xl /llxIII.
Having defined UI, . . . , UN-I, we set VN == XN - Lf- l (XN, un)u n . Then VN
is nonzero because X N is not in the linear span of Xl, . . . , X N -1 and hence of
ru 1, . . . , ruN -1, and (v N , rum) == (X N , U m ) - (X N , rum) == 0 for all m < N. We can
therefore take UN == VN /l/vNII.
5.26 Bessel's Inequality. If { ru a } aEA is an orthonormal set in , then for any
.L' E 9{,
I: 1 (x, U a ) 1 2 < Il x 11 2 .
aEA
In particular, {a : (x, U a ) =I O} is countable.
Proof. It suffices to show that LaEF I (x, U a ) 1 2 < II x ll 2 for any finite F c A.
But
o < Ilx - I: (x, U a ) U a 11 2
aEF
= IIxl1 2 - 2 Re \ x, I: (x, Ua)U a ) + " I: (x, Ua)U a 11 2
aEF aEF
== II x l1 2 - 2 I: I(x, u a )1 2 + I: I(x, u a )1 2
aEF aEF
== IIxl/2 - I: I (x, U a ) 1 2 ,
aEF
where the Pythagorean theorem was used in the third line.
.
5.27 Theorem. If {Ua}aEA is an orthonormal set in, thefollowing are equivalent:
a. (Completeness) If (x, u a ) == 0 for all a, then x == O.
b. (Parseval's Identity) IIxll 2 == LaEA 1 (x, U a ) 1 2 for all x E .
c. For each x E , x == LaEA (x, rua)u a , where the sum on the right has only
countably many nonzero terms and converges in the norm topology no matter
how these terms are ordered.
Proof. (a) implies (c): If x E , let aI, a2, . . . be any enumeration of the a's
for which (x, ru a ) =I O. By Bessel's inequality the series L 1 (x, ru aj ) 1 2 converges, so
by the Pythagorean theorem,
m 2 m
III: (x, U aj )U aj II = I: I (x, uaJ 1 2 ---* 0 as m, n ---* 00.
n n
176 ELEMENTS OF FUNCTIONAL ANALYSIS
The series L (x, U a j ) U a j therefore converges since 9-( is complete. If y = x -
L(x,UaJ)u aJ , then clearly (y,u a ) == 0 for all ex, so by (a), y == o.
(c) implies (b): With notation as above, as in the proof of Bessel's inequality we
have
n n 2
IIxl1 2 - L !(x, u a JI2 = II x - L (x, U aj )U aj II --t 0 as n --t 00.
1 1
Finally, that (b) implies (a) is obvious.
.
An orthonormal set having the properties (a-c) in Theorem 5.27 is called an
orthonormal basis for 9-(. For example, let 9-( == Z2 (A). For each ex E A, define
eo: E Z2 (A) by eo: ((3) == 1 if (3 == ex, eo: ((3) == 0 otherwise. The set {e a } aEA is
clearly orthonormal, and for any 1 E Z2 (A) we have (I, e a ) = 1 (ex), from which it
follows that { eo:} is an orthonormal basis.
5.28 Proposition. Every Hilbert space has an orthonormal basis.
Proof. A routine application of Zorn's lemma shows that the collection of or-
thonormal sets, ordered by inclusion, has a maximal element; and maximality is
equivalent to property (a) in Theorem 5.27. .
5.29 Proposition. A Hilbert space 9-( is separable iff it has a countable orthonormal
basis, in which case every orthonormal basis for 9-( is countable.
Proof. If {x n } is a countable dense set in 9-(, by discarding recursively any X n
that is in the linear span of Xl, . . . , X n -l we obtain a linearly independent sequence
{Yn} whose linear span is dense in 9-(. Application of the Gram-Schmidt process to
{Yn} yields an orthonormal sequence { un} whose linear span is dense in 9-( and which
is therefore a basis. Conversely, if { Un} is a countable orthonormal basis, the finite
linear combinations of the Un's with coefficients in a countable dense subset of <C
form a countable dense set in 9-(. Moreover, if {va }aEA is another orthonormal basis,
for each n the set An == {ex E A : (Un, Va) i= o} is countable. By completeness of
{Un}, A == U An, so A is countable. .
Most Hilbert spaces that arise in practice are separable. We discuss some examples
in Exercises 60-62.
If 9{ 1 and 9--(2 are Hilbert spaces with inner products (., .) 1 and (-, .) 2, a unitary
map from 9-(1 to 9-(2 is an invertible linear map U : 9-(1 ---+ 9-(2 that preserves inner
products:
(U X, U y) 2 == (X, y) 1 for all X, Y E 9-(1.
By taking Y = x, we see that every unitary map is an isometry: IIUxI12 = /lxlh.
Conversely, every surjective isometry is unitary (Exercise 55). Unitary maps are the
true "isomorphisms" in the category of Hilbert spaces; they preserve not only the
linear structure and the topology but also the norm and the inner product. From the
point of view of this abstract structure, every Hilbert space looks like an Z2 space:
HILBERT SPACES 177
5.30 Proposition. Let {ea}aEA be an orthonormal basis for X. Then the corre-
spondence x x defined by x(a) == (x, u a ) is a unitary map from Je to Z2(A).
Proof. The map x x is clearly linear, and it is an isometry from Je to Z2 (A)
by the Parseval identity /lx/l2 == I: Ix(a)12. If f E Z2(A) then I: If(a)12 < 00,
so the Pythagorean theorem shows that the partial sums of the series I: f ( a) U a (of
which only countably many terms are nonzero) are Cauchy; hence x == I: f(a)u a
exists in Je and x == f. By Exercise 55b, x x is unitary. .
Exercises
54. For any nonempty set A, Z2 (A) is complete.
55. Let Je be a Hilbert space.
a. (The polarization identity) For any x, y E Je,
(x, y) == i (11x + yl12 + Ilx - yl12 + illx + iyl12 - illx - iyI12).
(Completeness is not needed here.)
b. If Je' is another Hilbert space, a linear map from Je to Je' is unitary iff it is
isometric and surjective.
56. If E is a subset of a Hilbert space Je, (E.L).L is the smallest closed subspace of
Je containing E.
57. Suppose that 9{ is a Hilbert space and T E L(Je,9-£).
a. There is a unique T* E L(Je, Je), called the adjoint ofT, such that (Tx, y) ==
(x, T*y) for all x, y E Je. (Cf. Exercise 22. We have T* == y-1Tty where Y
is the conjugate-linear isomorphism from 9-£ to Je* in Theorem 5.25, (Y y) (x) ==
(x, y).)
b. IIT*/I == IITII, IIT*T/I == IITI12, (as + bT)* == a s* + bT*, (ST)* == T*S*,
and T** == T.
c. Let andNdenoterangeandnullspace; then(T).L == N(T*) andN(T).L ==
(T*).
d. T is unitary iff T is invertible and T- 1 == T*.
58. Let M be a closed subspace of the Hilbert space Je, and for x E Je let Px be the
element of M such that x - Px E M.L as in Theorem 5.24.
a. P E L(Je, Je), and in the notation of Exercise 57 we have P* == P, p2 == P,
(P) == M, and N( P) == M.L. P is called the orthogonal projection onto M.
b. Conversely, suppose that P E L(Je, Je) satisfies p2 == P* == P. Then (P)
is closed and P is the orthogonal projection onto (P).
c. If { u a } is an orthonormal basis for M, then P x == I: (x, u a ) U a .
59. Every closed convex set K in a Hilbert space has a unique element of minimal
norm. (If 0 E K, the result is trivial; otherwise, adapt the proof of Theorem 5.24.)
60. Let (X, M, J-L) be a measure space. If E E M, we identify L 2 (E, J-L) with the
subspace of L 2 (X, J-L) consisting of functions that vanish outside E. If {En} is
178 ELEMENTS OF FUNCTIONAL ANALYSIS
a disjoint sequence in M with X == U En, then {L2(En,J.l)} is a sequence of
mutually orthogonal subspaces of L 2 (X, J.l), and every I E L 2 (X, J.l) can be written
uniquely as f == L: fn (the series converging in norm) where In E L2(En, J.l). If
L2(En, J.l) is separable for every n, so is L 2 (X, J.l).
61. Let (X, M, JL) and (Y, N, v) be a-finite measure spaces such that L2(JL) and
L2(v) are separable. If {1m} and {gn} are orthonormal bases for L2(J.l) and L2(v)
and hmn(x, y) == Im(x)gn(Y), then {h mn } is an orthonormal basis for L2(J.l x v).
62. In this exercise the measure defining the L 2 spaces is Lebesgue measure.
a. 0([0,1]) is dense in L 2 ([0, 1]). (Adapt the proof of Theorem 2.26.)
b. The set of polynomials is dense in L 2 ([0, 1]).
c. L 2 ([0, 1]) is separable.
d. L 2 (IR) is separable. (Use Exercise 60.)
e. L 2 (IRn) is separable. (Use Exercise 61.)
63. Let J( be an infinite-dimensional Hilbert space.
a. Every orthonormal sequence in J( converges weakly to O.
b. The unit sphere S == {x : Ilxll == 1} is weakly dense in the unit ball
B == {x : Ilxll < 1}. (In fact, every x E B is the weak limit of a sequence in S.)
64. Let J( be a separable infinite-dimensional Hilbert space with orthonormal basis
{ un} 1.
a. For kEN, define Lk E L(J(, J() by Lk(L: anu n ) == L: anUn-k. Then
Lk ---+ 0 in the strong operator topology but not in the norm topology.
b. For kEN, define Rk E L(J(, J() by Rk(L: anu n ) == L: anUn+k. Then
Rk 0 in the weak operator topology but not in the strong operator topology.
c. RkLk ---+ 0 in the strong operator topology, but LkRk == I for all k. (Use
Exercise 53b.)
65. l2(A) is unitarily isomorphic to Z2(B) iff card(A) == card(B).
66. Let M be a closed subspace of L 2 ([0, 1], m) that is contained in 0([0, 1]).
a. There exists 0 > 0 such that 1I/IIu < 011/11£2 for all I E M. (Use the closed
graph theorem.)
b. For each x E [0,1] there exists gx E M such that I(x) == (I, gx) for all
I E M, and Ilgxll£2 < c.
c. The dimension of M is at most C 2 . (Hint: If {fj } is an orthonormal sequence
in M, L: Ifj(x)12 < C 2 for all x E [0,1].)
67. (The MeaD Ergodic Theorem). Let U be a unitary operator on the Hilbert
space J(, M == {x : U x == x}, P the orthogonal projection onto M (Exercise 58),
and Sn == n -1 L:-1 U j . Then Sn P in the strong operator topology. (If x E M,
then Snx == x; if x == y - Uy for some y, then Snx ---+ O. By Exercise 57d,
M = {x : U*x == x}. Apply Exercise 57c with T = I - U.)
NOTES AND REFERENCES 179
5.6 NOTES AND REFERENCES
Functional analysis is a vast subject of which we have barely scratched the surface
here. For the reader who wishes to learn more, Reed and Simon [112] and Rudin
[126] are good places to start; one should also familiarize oneself with the treatises
of Dunford and Schwartz [35] and Yosida [163].
Functional analysis has roots in a number of classical problems, particularly in
the theory of differential and integral equations. The study of particular infinite-
dimensional function spaces began in earnest around 1907 with work of F. Riesz,
Frechet, Schmidt, Helly, and others, and the notion of an abstract normed vector space
appeared in papers by several authors about 1920. The research of the succeeding
decade culminated in Banach's classic book [9], which marked the emergence of
functional analysis as an established discipline. Detailed historical accounts can be
found in Dieudonne [33] and in the notes in Dunford and Schwartz [35].
35.1: The integral for vector-valued functions developed in Exercise 16 is called
the Bochner integral. The hypothesis that is separable can be dropped, but the
functions in L must then be required to have separable range (after modification on
a null set). A more detailed account can be found in Cohn [27] or Yosida [163].
Another approach to vector-valued integrals is as follows. Suppose that (X, M, J-l)
is a measure space and is a topological vector space on which the continuous linear
functionals separate points. A function f : X is called weakly integrable if (i)
q; 0 fELl (J-l) for all q; E *, and (ii) there exists y E (necessarily unique) such
that J cjJ 0 f dJ-l == cjJ(y) for all cjJ E *. In this case we set J f dJ-l == y. If is a
separable Banach space, this notion of integral coincides with the Bochner integral.
See Yosida [163] and Rudin [126].
35.3: The open mapping and closed graph theorems are due to Banach [9]. See
Grabiner [58] for an interesting comment on the relation between the proofs of the
open mapping theorem and the Tietze extension theorem.
The uniform boundedness principle, as we have stated it, is due to Banach and
Steinhaus [10]; however, the second part of the theorem - that if X is a Banach
space and sUPTEA IITxll < 00 for all x E X, then sUPTEA IITII < 00 - had been
proved previously by what Dieudonne [33] calls the "method of the gliding hump."
This rather pretty (and elementary) argument has been largely neglected in recent
years, but a modem exposition of it can be found in Hennefeld [71].
It is simple to construct examples of unbounded linear maps T : X from one
normed vector space to another when X is incomplete (see Exercises 29 and 30), but
virtually impossible to do so when X is complete without using the axiom of choice.
The standard method is as follows: Start with an unbounded T : Xo where
Xo is incomplete, and let X be the completion of Xo. Pick a basis {Ua}aEA for Xo
(meaning that every x E Xo is afinite linear combination of the ua's), and extend it
to a basis {Ua}aEB (B ::) A) for X. (This is where the axiom of choice comes in.)
Let M be the linear span of {Ua}aEB\A, so that each x E X can be written uniquely
as x == Xo + Xl where Xl E Xo and Xl E M. Then T can be extended to X by setting
T(xo + Xl) == Txo.
180 ELEMENTS OF FUNCTIONAL ANALYSIS
35.4: Treves [150] contains a readable account of the general theory of topolog-
ical vector spaces, with many concrete examples.
Alaoglu's theorem, which was first announced in Alaoglu [3] and proved in detail
in Alaoglu [4], supersedes a number of earlier results dealing with special cases. It
was discovered independently by Bourbaki [19].
35.5: The space envisaged by Hilbert himself was Z2 (N); the notion of an abstract
Hilbert space was introduced by von Neumann [154] in his work on the mathematics
of quantum mechanics. Theorem 5.25 is originally due to F. Riesz [115] in the setting
of £2 spaces. It is one of several representation theorems for linear functionals on
various spaces that bear his name, the others being Theorems 6.15, 7.2, and 7.17. To
avoid confusion, we reserve the name "Riesz representation theorem" for the latter
two, which are closely related.
In the literature of quantum physics, scalar products are customarily denoted by
(xly) and are taken to be linear in the second variable and conjugate-linear in the
first.
LP Spaces
LP spaces are a class of Banach spaces of functions whose norms are defined in terms
of integrals and which generalize the L 1 spaces discussed in Chapter 2. They furnish
interesting examples of the general theory of Chapter 5 and play a central role in
modem analysis.
6.1 BASIC THEORY OF L P SPACES
In this chapter we shall be working on a fixed measure space (X, M, p). If f is a
measurable function on X and 0 < p < 00, we define
II flip = [J Ifl P d P ] lip
(allowing the possibility that IIfll p == (0), and we define
LP(X,M,p) == {f: X <C: fismeasurableand IIfll p < oo}.
We abbreviate LP(X, M, p) by LP(p), LP(X), or simply LP when this will cause no
confusion. As we have done with L1 , we consider two functions to define the same
element of LP when they are equal almost everywhere.
If A is any nonempty set, we define lP (A) to be LP (p) where p is counting measure
on (A, P(A)), and we denote lP(N) simply by lP.
LP is a vector space, for if f, 9 E LP, then
If + glP < [2 max(lfl, Igl)]P < 2 P (lf1 P + IgIP),
181
182 L P SPACES
so that f + 9 E LP. Our notation suggests that II . lip is a norm on LP. Indeed, it is
obvious that Ilfll p == 0 iff f == 0 a.e. and Ilefllp == lei II flip, so the only question is
the triangle inequality. It turns out that the latter is valid precisely when p > 1, so
our attention will be focused almost exclusively on this case.
Before proceeding further, however, let us see why the triangle inequality fails for
p < 1. Suppose a > 0, b > 0, and 0 < p < 1. For t > 0 we have t p - 1 > (a + t )p-1 ,
and by integrating from 0 to b we obtain a P + bP > (a + b)P. Thus, if E and
F are disjoint sets of positive finite measure in X and we set a == p(E)l/p and
b == p( F) lip, we see that
IIXE + XFll p == (a P + bP)l/p > a + b == IlxEllp + IlxFllp.
The cornerstone of the theory of LP spaces is Holder's inequality, which we now
derive.
6.1 Lemma. If a > 0, b > 0, and 0 < A < 1, then
a A b 1 - A < Aa + (1 - A)b,
with equality iff a == b.
Proof. The result is obvious if b == 0; otherwise, dividing both sides by band
setting t = a/b, we are reduced to showing that t A < At + (1 - A) with equality iff
t == 1. But by elementary calculus, t A - At is strictly increasing for t < 1 and strictly
decreasing for t > 1, so its maximum value, namely 1 - A, occurs at t == 1. I
6.2 Holder's Inequality. Suppose 1 < p < 00 and p-1 + q-1 == 1 (that is, q ==
p/(p - 1)). If f and 9 are measurable functions on X, then
(6.3)
Ilfgll1 < Ilfllpllgll q .
In particular, if f E LP and 9 E Lq, then f 9 E L 1 , and in this case equality holds in
(6.3) iff alfl P == ,6lgl q a.e. for some constants a,,6 with a,6 -I o.
Proof. The result is trivial if Ilfll p == 0 or Ilgllq == 0 (since then f == 0 or 9 == 0
a.e.), or if II flip == 00 or Ilgllq == 00. Moreover, we observe that if (6.3) holds for a
particular f and g, then it also holds for all scalar multiples of f and g, for if f and 9
are replaced by af and bg, both sides of (6.3) change by a factor of labl. It therefore
suffices to prove that (6.3) holds when Ilfll p == Ilgllq == 1 with equality iff I/IP == Iglq
a.e. To this end, we apply Lemma 6.1 with a == If(x)IP, b == Ig(x)lq, and A == p-1
to obtain
(6.4) If(x)g(x)1 < p- 1 If(x)I P + q-1Ig(x)lq.
Integration of both sides yields
IIIgl11 < p-1 J IIIP + q-1 J Iglq = p-1 + q-1 = 1 = 1IIIIpllgllqo
Equality holds here iff it holds a.e. in (6.4), and by Lemma 6.1 this happens precisely
when Ifl P == Iglq a.e. I
BASIC THEORY OF L P SPACES 183
The condition p-1 + q-1 == 1 occurring in Holder's inequality turns up frequently
in LP theory. If 1 < p < 00, the number q == p / (p - 1) such that p-1 + q-1 == 1 is
called the conjugate exponent to p.
6.5 Minkowski's Inequality. If1 < p < 00 and f, 9 E LP, then
IIf + gllp < Ilfll p + Ilgllp.
Proof. The result is obvious if p == 1 or if f + 9 == 0 a.e. Otherwise, we observe
that
If + glP < (If I + Igl) If + g1P-1
and apply Holder's inequality, noting that (p - l)q == p when q is the conjugate
exponent to p:
/ If + glP < Ilfllplllf + g1P-111q + Ilgllplllf + g1P-111q
( ) l/q
= (1lfll p + Ilgllp) / If + glP .
Therefore,
[/ ] l-(l/q)
IIf + gllp == If + glP < IIfll p + Ilgllp.
I
This result shows that, for p > 1, LP is a normed vector space. More is true:
6.6 Theorem. For 1 < p < 00, LP is a Banach space.
Proof. We use Theorem 5.1. Suppose {fk} C LP and L IIfkllp == B < 00.
Let G n == L Ifkl and G == L Ifkl. Then IIGnll p < L IIfkllp < B for all n,
so by the monotone convergence theorem, J GP == lirn J G < BP. Hence G E LP,
and in particular G(x) < 00 a.e., which implies that the series L fk converges
a.e. Denoting its sum by F, we have IFI < G and hence F E LP; moreover,
IF - L fkl P < (2G)P E L 1 , so by the dominated convergence theorem,
IIF- tfkll P = / IF- tlklP --40.
1 P 1
Thus the series L fk converges in the LP norm.
I
6.7 Proposition. For 1 < p < 00, the set of simple functions f == L aj XE j , where
p(Ej) < 00 for all j, is dense in LP.
Proof. Clearly such functions are in LP. If f E LP, choose a sequence {f n}
of simple functions such that f n f a.e. and If n I < I f I, according to Theorem
2.10. Then fn E LP and Ifn - flP < 2 P lfl P E L 1 , so by the dominated convergence
theorem, "fn - flip O. Moreover, if fn == L ajXE j where the Ej are disjoint and
the aj are nonzero, we must have p(Ej) < 00 since L lajIPp(Ej) == J If niP < 00..
184 L P SPACES
To complete the picture of LP spaces, we introduce a space corresponding to the
limiting value p == 00. If I is a measurable function on X, we define
II f 1100 = inf { a > 0 : J.t ( { x : If ( x ) I > a}) = 0 },
with the convention that inf 0 == 00. We observe that the infimum is actually attained,
for
00
{x: II(x)1 > a} == U{x: II(x)1 > a+n- 1 },
1
and if the sets on the right are null, so is the one on the left. II I II 00 is called the
essential supremum of III and is sometimes written
1111100 == ess SUPxEX II(x) I.
We now define
L OO == LOO(X, M, p) == {I : X -4 <C : I is measurable and 1111100 < 00 },
with the usual convention that two functions that are equal a.e. define the same
element of L 00. Thus I E L 00 iff there is a bounded measurable function 9 such that
I == 9 a.e.; we can take 9 == IXE where E == {x : II(x)1 < 1l11I00}.
Two remarks: First, for fixed X and M, LOO(X, M, p) depends on p only insofar
as p determines which sets have measure zero; if p and v are mutually absolutely
continuous, then L oo.(p) == L 00 (v). Second, if J-l is not semifinite, for some purposes
it is appropriate to adopt a slightly different definition of Loo. This point will be
explored in Exercises 23-25.
The results we have proved for 1 < p < 00 extend easily to the case p == 00, as
follows:
6.8 Theorem.
a. If I and 9 are measurable functions on X, then IIlglll < 1lllllllglloo. If
I E L 1 and 9 E Loo, Illgll1 == 1IIII111glloo Wlg(x)1 = Ilglloo a.e. on the set
where I(x) -I O.
b. II. 1100 is a norm on Loo.
c. IIln - 11100 0 Wthere exists E E M such that p(EC) == 0 and In I
uniformly on E.
d. Loo is a Banach space.
e. The simple functions are dense in L 00 .
The proof is left to the reader (Exercise 2).
In view of Theorem 6.8a and the formal equality 1-1 + 00- 1 == 1, it is natural to
regard 1 and 00 as conjugate exponents of each other, and we do so henceforth.
Theorem 6.8c shows that II . 1100 is closely related to, but usually not identical
with, the uniform norm 11.llu. However, if we are dealing with Lebesgue measure, or
more generally any Borel measure that assigns positive values to all open sets, then
BASIC THEORY OF L P SPACES 185
Ilflloo == Ilfllu whenever f is continuous, since {x : If(x)1 > a} is open. In this
situation we may use the notations Ilflloo and Ilfllu interchangeably, and we may
regard the space of bounded continuous functions as a (closed!) subspace of Loo.
In general we have LP ct Lq for all p =1= q; to see what is at issue, it is instructive
to consider the following simple examples on (0,00) with Lebesgue measure. Let
fa(x) == x-a, where a > O. Elementary calculus shows that faX(O,I) E LP iff
p < a-I, and faX(I,oo) E LP iff p > a-I. Thus we see two reasons why a function
f may fail to be in LP: either Ifl P blows up too rapidly near some point, or it
fails to decay sufficiently rapidly at infinity. In the first situation the behavior of
Ifl P becomes worse as p increases, while in the second it becomes better. In other
words, if p < q, functions in LP can be locally more singular than functions in Lq,
whereas functions in Lq can be globally more spread out than functions in LP. These
somewhat imprecisely expressed ideas are actually a rather accurate guide to the
general situation, concerning which we now give four precise results. The last two
show that inclusions LP C Lq can be obtained under conditions on the measure space
that disallow one of the types of bad behavior described above; for a more general
result, see Exercise 5.
6.9 Proposition. /f0 < p < q < r < 00, then Lq C LP + LT; that is, each f E Lq
is the sum of a function in LP and a function in LT.
Proof. If f E Lq, let E == {x : If(x)1 > I} and set g == fXE and h == fXEc.
Then IglP == IflPXE < IflqxE, so g E LP, and Ihl T == IflTxEc < IflqXEc, so
hE LT. (Forr == 00, obviously IIhlloo < 1.) I
6.10 Proposition. /f 0 < p < q < r < 00, then LP n LT c Lq and Ilfll q <
IIfll; Ilfll-A, where A E (0,1) is defined by
-1 -1
q -r
-1 ,-1 ( 1 ' ) -1 h . ,
q == /\p + - /\ r , t at IS, /\ == p-l _ r- 1 .
Proof. Ifr == 00, we have Ifl q < IlfllPlflP and A == plq, so
IIfll q < II fll/q II f II (p/q) == II f II; II fll'\.
If r < 00, we use Holder's inequality, taking the pair of conjugate exponents to be
pi Aq and r 1(1 - A)q:
J Ifl q = J Ifl,Xqlfl(l-,X)q < II Ifl,Xqllp/,Xqll Ifl(l-,X)qllr/(l_,X)q
[J ] Aq/p [J ] (I-A)q/T
== Ifl P IflT == Ilfll;qllfllI-A)q.
Taking qth roots, we are done.
I
186 L P SPACES
6.11 Proposition. If A is any set and 0 < p < q < 00, then lP(A) C lq(A) and
IIfll q < IIfll p .
Proof. Obviously IIIII == sUPa If(a)IP < La If(a)IP, so that Ilflioo < Ilfll p .
The case q < 00 then follows from Proposition 6.10: if A == p / q,
Ilfll q < Ilfll;lIfllA < Ilfll p .
I
6.12 Proposition. If p(X) < 00 and 0 < p < q < 00, then LP(p) ::) Lq(p) and
IIfll p < Ilfll q p(X)(l/ p )-(l/ q ).
Proof. If q == 00, this is obvious:
Ilfll = J Ifl P < Ilfll J 1 = Ilfll/l(X).
If q < 00, we use Holder's inequality with the conjugate exponents q/ p and q / (q - p):
Ilfll = J IfI P . 1 < IllfIPllq/plllllq/(q-p) = Ilfll/l(X)(q-p)/q.
I
We conclude this section with a few remarks about the significance of the LP
spaces. The three most obviously important ones are L 1 , L 2 , and Loo. With L 1 we
are already familiar; L 2 is special because it is a Hilbert space; and the topology
on Loo is closely related to the topology of uniform convergence. Unfortunately,
L 1 and Loo are pathological in many respects, and it is more fruitful to deal with
the intermediate LP spaces. One manifestation of this is the duality theory in 36.2;
another is the fact that many operators of interest in Fourier analysis and differential
equations are bounded on LP for 1 < p < 00 but not on L1 or Loo. (Some examples
are mentioned in 39.4.)
Exercises
1. When does equality hold in Minkowski's inequality? (The answer is different
for p == 1 and for 1 < p < 00. What about p == oo?)
2. Prove Theorem 6.8.
3. If 1 < p < r < 00, LP n LT is a Banach space with norm Ilfll == Ilfll p + IIfIIT'
and if p < q < r, the inclusion map LP n LT ---t Lq is continuous.
4. If 1 < p < r < 00, LP + LT is a Banach space with norm 11111 == inf{llglip +
Ilhllr : f = 9 + h}, and if p < q < r, the inclusion map Lq ---t LP + LT is continuous.
5. Suppose 0 < p < q < 00. Then LP ct Lq iff X contains sets of arbitrarily small
positive measure, and Lq ct LP iff X contains sets of arbitrarily large finite measure.
BASIC THEORY OF L P SPACES 187
(For the "if" implication: In the first case there is a disjoint sequence {En} with
o < J-L(En) < 2- n , and in the second case there is a disjoint sequence {En} with
1 < J-L(En) < 00. Consider f == E anXE n for suitable constants an.) What about
the case q == oo?
6. Suppose 0 < Po < PI < 00. Find examples of functions I on (0, (0) (with
Lebesgue measure), such that I E LP iff (a) Po < P < PI, (b) Po < P < PI, (c)
P = Po. (Consider functions of the form I(x) = x-a I log xl b.)
7. If I E LP n Loo for some P < 00, so that I E Lq for all q > P, then
1111100 == limqoo 1I111q.
8. Suppose J-L(X) == 1 and I E LP for some P > 0, so that I E Lq for 0 < q < p.
a. log 1IIIIq > Jlog III. (Use Exercise 42d in 33.5, with F(t) == e t .)
b. (J Illq - 1)/q > log 11111q, and (J Illq - 1)/q Jlog III as q o.
c. limqo II f II q == exp(J log I I I).
9. Suppose 1 < P < 00. If II In - flip 0, then fn I in measure, and hence
some subsequence converges to I a.e. On the other hand, if I n I in measure and
Ilnl < 9 E LP for all n, then Ilfn - Illp O.
10. Suppose 1 < P < 00. If In, I E LP and In I a.e., then II In - Illp 0 iff
II In lip Hilip. (Use Exercise 20 in 3 2 . 3 .)
11. If f is a measurable function on X, define the essential range R f of I to be the
set of all z E <C such that {x : I I (x) - z I < E} has positive measure for all E > O.
a. R f is closed.
b. If I E Loo, then Rf is compact and 1111100 == max{lzl : z E R f }.
12. If P -I 2, the LP norm does not arise from an inner product on LP, except in
trivial cases when dim(LP) < 1. (Show that the parallelogram law fails.)
13. LP(In, m) is separable for 1 < P < 00. However, Loo(In, m) is not separable.
(There is an uncountable set C Loo such that III - glloo > 1 for all I, 9 E with
I -I g.)
14. If 9 E Loo, the operatorT defined by T I == I 9 is bounded on LP for 1 < P < 00.
Its operator norm is at most Ilglloo, with equality if J-L is semifinite.
15. (The Vitali Convergence Theorem) Suppose 1 < P < 00 and {In}! C LP.
In order for {In} to be Cauchy in the LP norm it is necessary and sufficient for the
following three conditions to hold: (i) {In} is Cauchy in measure; (ii) the sequence
{llnI P } is uniformly integrable (see Exercise 11 in 33.2); and (iii) for every E > 0
there exists E C X such that J-L(E) < 00 and JEc Ilnl P < E for all n. (To prove
the sufficiency: Given E > 0, let E be as in (iii), and let Amn == {x E E :
Ifm(x) - fn(x)1 > E}. Then the integrals of Iin - Iml P over E \ Amn, Amn, and
EC are small when m and n are large - for three different reasons.)
16. If 0 < P < 1, the formula p(l, g) == J If - 9 I P defines a metric on LP that makes
LP into a complete topological vector space. (The proof of Theorem 6.6 still works
188 L P SPACES
for p < 1 if Ilfll p is replaced by J IfI P , as it uses only the triangle inequality and not
the homogeneity of the norm.)
6.2 THE DUAL OF L P
Suppose that p and q are conjugate exponents. Holder's inequality shows that each
9 E Lq defines a bounded linear functional cjJg on LP by
</>9(1) = f fg,
and the operator norm of cjJg is at most Ilgllq. (If p == 2 and we are thinking of L 2 as
a Hilbert space, it is more appropriate to define cjJg (f) == J f g . The same convention
can be used for p -I 2 without changing the results below in an essential way.) In
fact, the map 9 -4 cjJg is almost always an isometry from Lq into (LP)*.
6.13 Proposition. Suppose that p and q are conjugate exponents and 1 < q < 00.
If 9 E Lq, then
Ilgllq = 11</>911 = sup {If fgl : Ilfll p = I} ·
If J-L is semifinite, this result holds also for q == 00.
Proof. Holder's inequality says that IlcjJg II < Ilgllq, and equality is trivial if 9 == 0
(a.e.). If 9 -I 0 and q < 00, let
Iglq-1 sgng
f = Ilgll-l ·
Then
J Igl(q-1)P J Iglq
Ilfll = Ilgllq-l)p = J Iglq = 1,
so
> f == J Iglq ==
11</>911 - fg Ilgll-l Ilgllq.
(If q == 1, then f == sgng, Ilflloo = 1, and J fg == IIgll1.) If q = 00, for E > 0 let
A == {x : Ig(x)1 > Ilglioo - E}. Then J-L(A) > 0, so if J-L is semifinite there exists
B c A with 0 < J-L(B) < 00. Let f == J-L(B)-lXB sgng ; then IIfl11 = 1, so
11</>911 > f fg = p() llgl > Ilglloo - E.
Since E is arbitrary, IlcjJglI == IIglloo.
I
THE DUAL OF L P 189
Conversely, if f ---+ J f 9 is a bounded linear functional on LP, then 9 E Lq in
almost all cases. In fact, we have the following stronger result.
6.14 Theorem. Let p and q be conjugate exponents. Suppose that 9 is a measurable
function on X such that f 9 E L 1 for all f in the space of simple functions that
vanish outside a set of finite measure, and the quantity
Mq(g) = sup {II fgl : fEE and II flip = 1 }
is finite. Also, suppose either that 8g == {x : g( x) f= O} is a-finite or that JL is
semifinite. Then 9 E Lq and Mq(g) == IIgllq.
Proof First, we remark that if f is a bounded measurable function that vanishes
outside a set E of finite measure and II flip == 1, then I J fgl < Mq(g). Indeed,
by Theorem 2.10 there is a sequence {f n} of simple functions such that Ifni < I f I
(in particular, fn vanishes outside E) and fn ---t f a.e. Since Ifni < IlfllooxE and
XEg E L 1 , by the dominated convergence theorem we have I J fgl == lim I J fngl <
Mq (g).
Now suppose that q < 00. We may assume that 8g is a-finite, as this condition
automatically holds when J-L is semifinite; see Exercise 17. Let { En} be an increasing
sequence of sets of finite measure such that 8 9 == U En. Let { ifyn} be a sequence
of simple functions such that CPn ---t 9 pointwise and I CPn I < I 9 I, and let gn == cpn XE n .
Then gn ---t 9 pointwise, /gn/ < Igl, and gn vanishes outside En. Let
Ignl Q - 1s gng
fn = IIgnll-l ·
Then as in the proof of Proposition 6.13 we have Ilfn lip == 1, and by Fatou's lemma,
Ilgllq < liminf Ilgnllq = liminf 1 Ifngnl
< liminf 1 Ifngl = liminf 1 fng < Mq(g).
(For the last estimate we used the remark at the beginning of the proof.) On the other
hand, Holder's inequality gives Mq(g) < Ilgllq, so the proof is complete for the case
q < 00.
Now suppose q == 00. Given E > 0, let A == {x : Ig(x)1 > Moo(g) + E}. If
f-L(A) were positive, we could choose B c A with 0 < JL(B) < 00 (either because
f-L is semi finite or because A c 8g). Setting f == JL(B)-lXBsgng, we would then
have IIfl11 == 1, and J fg == JL(B)-l JB Igl > Moo (g) + E. But this is impossible by
the remark at the beginning of the proof. Hence II 9 II 00 < Moo (g), and the reverse
inequality is obvious. .
The last and deepest part of the description of (LP) * is the fact that the map
9 ---t ifyg is, in almost all cases, a surjection.
190 L P SPACES
6.15 Theorem. Let p and q be conjugate exponents. If 1 < p < 00, for each
cjJ E (LP)* there exists 9 E Lq such that cjJ(f) = J fg for all f E LP, and hence Lq
is isometrically isomorphic to (LP) *. The same conclusion holds for p == 1 provided
fL is a-finite.
Proof. First let us suppose that fL is finite, so that all simple functions are in
LP. If cjJ E (LP)* and E is a measurable set, let v(E) = cjJ(XE). For any disjoint
sequence {E j }, if E == U Ej we have XE == L XEj where the series converges
in the LP norm:
n 00 00 1/
IIXE-LXEjll =IILXEjll =fL(UE j ) P ->Oasn->00.
1 P n+l P n+l
(It is at this point that we need the assumption that p < 00.) Hence, since cP is linear
and continuous,
00
00
v(E) == LCP(XE j ) == Lv(E j ),
1 1
so that v is a complex measure. Also, if fL( E) == 0, then XE == 0 as an element
of LP, so v(E) == 0; that is, v « fl. By the Radon-Nikodym theorem there exists
9 E L1 (fL) such that cjJ(XE) == v( E) = J E 9 dfL for all E and hence cjJ(f) == J f 9 dfL
for all simple functions f. Moreover, I J fgl < IlcjJllllfll p , so 9 E Lq by Theorem
6.14. Once we know this, it follows from Proposition 6.7 that cp(f) == J f 9 for all
f E LP.
Now suppose that fL is a-finite. Let {En} be an increasing sequence of sets such
that 0 < fL(En) < 00 and X == U En' and let us agree to identify LP(En) and
Lq(E n ) with the subspaces of LP(X) and Lq(X) consisting of functions that vanish
outside En. The preceding argument shows that for each n there exists gn E Lq(E n )
such that cjJ(f) == J fgn for all f E LP(En), and Ilgnllq = IICPILP(En)11 < Ilcpll.
The function gn is unique modulo alterations on nullsets, so gn == gm a.e. on En for
n < m, and we can define 9 a.e. on X by setting 9 == gn on En. By the monotone
convergence theorem, Ilgll q == lim Ilgnllq < IlcjJll, so 9 E Lq. Moreover, if f E LP,
then by the dominated convergence theorem, fXE n f in the LP norm and hence
cjJ(f) == lim4>(fXE n ) == limJEn fg == J fg.
Finally, suppose that fL is arbitrary and p > 1, so that q < 00. As above, for each
a-finite set E c X there is an a.e.-unique gE E Lq(E) such that cjJ(f) == J fgE for
all f E LP(E) and IlgEllq < Ilcpll. If F is a-finite and F :> E, then gp == gE a.e.
on E, so Ilgpllq > IlgEllq. Let M be the supremum of IlgEllq as E ranges over all
a-finite sets, noting that M < IlcjJll. Choose a sequence {En} so that ligEn Ilq M,
and set F == U En. Then F is a-finite and I/gpl/q > IIgEnllq for all n, whence
IIgF IIq == M. Now, if A is a a-finite set containing F, we have
J Igp/q + J IgA\plq = J IgAlq < Mq = J Igplq,
and thus gA\P == 0 and gA = gp a.e. (Here we use the fact that q < 00.) But if
f E LP, then A = F U {x : f(x) =I O} is a-finite, so cjJ(f) = J fgA == J fgp. Thus
we may take 9 == gp, and the proof is complete. .
THE DUAL OF L P 191
6.16 Corollary. If 1 < p < 00, LP is reflexive.
We conclude with some remarks on the exceptional cases p == 1 and p == 00. For
any measure JL, the correspondence 9 cjJg maps L 00 into (L 1 ) *, but in general it is
neither injective nor surjective. Injectivity fails when J-L is not semifinite. Indeed, if
E c X is a set of infinite measure that contains no subsets of positive finite measure,
and f E L 1 , then {x : f(x) i= O} is a-finite and hence intersects E in a null set.
It follows that 1;XE == 0 although XE i= 0 in L 00. This problem, however, can be
remedied by redefining Loo; see Exercises 23-24. The failure of surjectivity is more
subtle and is best illustrated by an example; see also Exercise 25.
Let X be an uncountable set, JL == counting measure on (X, P(X)), M == the a-
algebra of countable or co-countable sets, and JLo = the restriction of JL to M. Every
f E L 1 (JL) vanishes outside a countable set, and it follows that L 1 (JL) == Ll (JLo).
On the other hand, L 00 (JL) consists of all bounded functions on X, whereas L 00 (JLa)
consists of those bounded functions that are constant except on a countable set. With
this in mind, it is easy to see that the dual of L 1 (JLa) is Loo (JL) and not the smaller
space LOO(J-La).
As for the case p == 00: the map 9 --t cjJg is always an isometric injection of L 1
into (LOO) * by Proposition 6. f3, but it is almost never a surjection. We shall say more
about this in 36.6; for the present, we give a specific example. (Another example can
be found in Exercise 19.)
Let X == [0,1], JL == Lebesgue measure. The map f f(O) is a bounded linear
functional on C(X), which we regard as a subspace of Loo. By the Hahn-Banach
theorem there exists 1; E (LOO)* such that cjJ(f) == f(O) for all f E C(X). To see
that 1; cannot be given by integration against an L 1 function, consider the functions
fn E C(X) defined by fn(x) == max(l-nx, 0). Then cjJ(fn) == fn(O) == 1 for all n,
but fn(x) --t 0 for all x > 0, so by the dominated convergence theorem, J fng --t 0
for all 9 ELI.
Exercises
17. With notation as in Theorem 6.14, if JL is semifinite, q < 00, and Mq (g) < 00,
then {x : Ig(x)1 > €} has finite measure for all € > 0 and hence 8g is a-finite.
18. The self-duality of L2 follows from Hilbert space theory (Theorem 5.25), and this
fact can be used to prove the Lebesgue-Radon-Nikodym theorem by the following
argument due to von Neumann. Suppose that JL, v are positive finite measures on
(X, M) (the a-finite case follows easily as in 33.2), and let A = J-L + v.
a. The map f J f dv is a bounded linear functional on L 2 (A), so J f dv ==
J fgdA for some 9 E L2(A). Equivalently, J f(l - g)dv == J fgdjJo for
f E L2(A).
b. 0 < 9 < 1 A-a.e., so we may assume 0 < 9 < 1 everywhere.
c. Let A == {x : g(x) < I}, B == {x : g(x) == I}, and set va (E) == v(A n E),
V S (E) == v(B n E). Then V s 1. JL and Va « JL; in fact, dV a == g(l- g) -1 XA djJo.
192 L P SPACES
19. Define cjJn E (ZOO)* by cPn(f) == n- 1 L: f(j). Then the sequence {cjJn} has
a weak* cluster point cjJ, and cP is an element of (ZOO) * that does not arise from an
element of ZI .
20. Suppose sUP n Illnllp < 00 and In f a.e.
a. If 1 < p < 00, then I n I weakly in LP. (Given 9 E Lq, where q is
conjugate to p, and € > 0, there exist (i) 8 > 0 such that JE Iglq < € whenever
JL(E) < 8, (ii) A c X such that JL(A) < 00 and JX\A Iglq < €, and (iii) B c A
such that JL(A \ B) < 8 and In I uniformly on B.)
b. The result of (a) is false in general for p == 1. (Find counterexamples in
L 1 (IR, m) and ZI.) It is, however, true for p == 00 if JL is a-finite and weak
convergence is replaced by weak* convergence.
21. If 1 < p < 00, In I weakly in ZP(A) if[ sUPn Illnllp < 00 and In I
pointwise.
22. Let X = [0,1], with Lebesgue measure.
a. Let In (x) == cos 27rnx. Then In 0 weakly in L 2 (see Exercise 63 in 35.5),
but In f+ 0 a.e. or in measure.
b. Let In(x) == nX(O,I/n). Then In 0 a.e. and in measure, but In f+ 0
weakly in LP for any p.
23. Let (X, M, JL) be a measure space. A set E E M is called locally null if
JL(E n F) == 0 for every F E M such that JL(F) < 00. If I : X C is a measurable
function, define
11111* == inf{ a : {x : II(x)1 > a} is locally bull},
and let [.;00 == [.;00 (X, M, JL) be the space of all measurable I such that 11111* < 00.
We consider I, 9 E [.;00 to be identical if {x : I(x) -# g(x)} is locally null.
a. If E is locally null, then JL( E) is either 0 or 00. If JL is semi finite, then every
locally null set is null.
b. II. II * is a norm on [.; 00 that makes [.; 00 into a Banach space. If JL is semifinite,
then [.; 00 == L 00 .
24. If 9 E [.;OCJ (see Exercise 23), then Ilgll* == sup{1 J Igi : 111111 == I}, so the
map 9 cjJg is an isometry from [.;OCJ into (L 1 )*. Conversely, if MOCJ(g) < 00 as in
Theorem 6.14, then 9 E [.;OCJ and MOCJ(g) == Ilgll*.
25. Suppose JL is decomposable (see Exercise 15 in 33.2). Then every cjJ E (L 1 )* is
of the form cjJ(l) == Jig for some 9 E [.; OCJ, and hence (L 1 ) * I"-.J [.; OCJ (see Exercises
23 and 24). (If is a decomposition of JL and I E Ll, there exists {E j } c such
that I == L: I XE j where the series converges in L 1 .)
SOME USEFUL INEQUALITIES 193
6.3 SOME USEFUL INEQUALITIES
Estimates and inequalities lie at the heart of the applications of LP spaces in analysis.
The most basic of these are the Holder and Minkowski inequalities. In this section
we present a few additional important results in this area. The first one is almost a
triviality, but it is sufficiently useful to warrant special mention.
6.17 Chebyshev's Inequality. If f E LP (0 < p < (0), then for any a > 0,
J1({x: If(x)1 > a}) < [ "Ip r.
Proof. Let Eo; == {x : If(x)1 > a}. Then
Ilfll = / IflP > r Ifl P > a P r 1 = a P J1(Ea).
} Eo } Eo
.
The next result is a rather general theorem about boundedness of integral operators
on LP spaces.
6.18 Theorem. Let (X, M, JL) and (Y, N, v) be a-finite measure spaces, and let K
be an (M 0 N)-measurable function on X x Y. Suppose that there exists C > 0
such that J IK(x, y) I dJL(x) < C for a.e. y E Y and J IK(x, y) I dv(y) < C for a.e.
x E X, and that 1 < p < 00. If f E LP(v), the integral
Tf(x) = / K(x, y)f(y) dv(y)
converges absolutely for a.e. x E X, the function T f thus defined is in LP(JL), and
liT flip < Cllfll p .
Proof. Suppose that 1 < p < 00. Let q be the conjugate exponent to p. By
applying Holder's inequality to the product
IK(x, y)f(y)1 == IK(x, y)1 1 /q (IK(x, y)II/Plf(y)l)
we have
[ ] l/q [ ] l/p
/ IK(x, y)f(y)1 dv(y) < / IK(x, y)1 dv(y) / IK(x, y)llf(y)IP dv(y)
< Cl/q [/ IK(x, y)llf(y)IP dV(y)] l/p
for a.e. x EX. Hence, by Tonelli's theorem,
/ [/ IK(x, y)f(y)1 dV(y)r dJ1(x) < Cp/q // \K(x, y)llf(y)I P dv(y) dJ1(x)
< C(p/q)+l / If(y)IP dv(y).
194 L P SPACES
Since the last integral is finite, Fubini' s theorem implies that K (x, . ) f EL I (v) for
a.e. x, so that T f is well defined a.e., and
J ITf(x)IP dfL(X) < C(pJq)+1l1fll.
Taking pth roots, we are done.
For p = 1 the proof is similar but easier and requires only the hypothesis
J IK(x, y) I dJL(x) < C; for p == 00 the proof is trivial and requires only the hy-
pothesis J IK(x, y)1 dv(y) < C. Details are left to the reader (Exercise 26). .
Minkowski's inequality states that the LP norm of a sum is at most the sum of
the LP norms. There is a generalization of this result in which sums are replaced by
integrals:
6.19 Minkowski's Inequality for Integrals. Suppose that (X, M, JL) and (Y, N, v)
are a-finite measure spaces, and let f be an (MQ!)N)-measurablefunction on X x Y.
a. If f > 0 and 1 < p < 00, then
[J (J f(x,Y)dV(Y))P dfL(X)fJP < I [I f(X,y)PdfL(x)f JP dv(y).
b. If1 < p < 00, f(., y) E LP(JL)fora.e. y, and the function y IIf(., y)llp is
in Ll(v), then f(x,.) E Ll(v)fora.e. x, the function x J f(x, y) dv(y) is
in LP(JL), and
J f(.,y)dv(y) < Illf("y)lIp dv (y).
P
Proof. If p == 1, (a) is merely Tonelli's theorem. If 1 < p < 00, let q be the
conjugate exponent to p and suppose 9 E Lq(JL). Then by Tonelli's theorem and
Holder's inequality,
I [I f(x, y) dV(y)] Ig(x)1 dfL(X) = II f(x, y)lg(x)1 dfL(X) dv(y)
< IIgllq J [J f(x, y)P dfL(X)] IJp dv(y).
Assertion (a) therefore follows from Theorem 6.14. When p < 00, (b) follows from
(a) (with f replaced by IfD and Fubini's theorem; when p == 00, it is a simple
consequence of the monotonicity of the integral. .
Our final result is a theorem concerning integral operators on (0, (0) with Lebesgue
measure.
SOME USEFUL INEQUALITIES 195
6.20 Theorem. Let K be a Lebesgue measurable function on (0, (0) x (0, (0) such
that K(AX, AY) == A -1 I«(x, y) for all A > 0 and 10 00 IK(x, 1) Ix-lip dx == C < 00
for some p E [1,00], and let q be the conjugate exponent to p. For f E LP and
9 E Lq, let
T f(y) = 1 00 K(x, y)f(x) dx,
Sg(x) = 1 00 K(x, y)g(y) dy.
Then Tf and 8g are defined a.e., and IITfll p < Cllfll p and 118gll q < Cllgllq.
Proof. Setting z == x/y, we have
1 00 IK(x, y)f(x)1 dx = 1 00 IK(yz, y)f(yz)ly dz = 1 00 IK(z, l)fz(y)1 dz
where fz(y) == f(yz); moreover,
[ rOO ] lip [ rOO ] lip
llfzll p = Jo If(yz)IP dy = Jo If(x)IPz-l dx = z-l/ P llfll p .
Therefore, by Minkowski's inequality for integrals, T f exists a.e. and
liT flip < 1 00 IK(z, l)lllfzllpdz = IIfll p 1 00 IK(z, l)lz- 1 /p dz = ellfll p '
Finally, setting u == y-l, we have
1 00 IK(l, y)ly-l/ q dy = 1 00 IK(y-l, l)ly-l-(l/ Q ) dy
= 1 00 IK(u, l)lu- 1 / p du = e,
so the same reasoning shows that 8g is defined a.e. and that 118gll q < Cllgliq. .
6.21 Corollary. Let
Tf(y) = y-l 1 Y f(x) dx,
Sg(x) = 1 00 y-lg(y) dy.
Thenfor 1 < p < 00 and 1 < q < 00,
liT flip < p l llfll p ,
118gll q < qllgllq.
Proof. Let K(x, y) == y-1 XE (x, y) where E == {(x, y) : x < y}. Then
10 00 IK(x, 1)lx- 1/p dx = 10 1 x-lip dx == p/(p - 1) == q, where q is the conjugate
exponent to p, so Theorem 6.20 yields the result. .
196 L P SPACES
Corollary 6.21 is a special case of Hardy's inequalities; the general result is in
Exercise 29.
Exercises
26. Complete the proof of Theorem 6.18 for the cases p = 1 and p = 00.
27. (Hilbert's Inequality) The operator Tf(x) = JoOO(x + y)-l f(y) dy satisfies
liT flip < Cpllfll p for 1 < p < 00, where C p == Jo oo x- 1 / p (x + 1)-1 dx. (For those
who know about contour integrals: Show that C p == 7r csc( 7r /p ).)
28. Let I a be the ath fractional integral operator as in Exercise 61 of 32.6, and let
Jaf(x) == x-a laf(x).
a. J a is bounded on LP(O, (0) for 1 < p < 00; more precisely,
f(l _ p-1 )
!lJofll p < r(a + 1 _ p-l) II flip'
b. There exists fELl (0,00) such that J 1 f tt L 1 (0,00).
29. Suppose that 1 < p < 00, r > 0, and h is a nonnegative measurable function on
(0,00). Then:
rOO [ {X ] P P P roo
J o x- r - l Jo h(y)dy dx < C ) Jo xp-r-lh(x)Pdx,
roo [1 00 ] p p p roo
Jo x r - l x h(y)dy dx < C ) J o xp+r-lh(x)Pdx.
(Apply Theorem 6.20 with K(x,y) = xf3- 1 y -f3 X (0,00)(Y-x), f(x) == x'h(x), and
g(x) == x D h(x) for suitable,8, 1', 8.)
30. Suppose that K is a nonnegative measurable function on (0,00) such that
Jo oo K(x)x s - 1 dx == cjJ(s) < 00 for 0 < s < 1.
a. If 1 < p < 00, p-1 + q-1 == 1, and f, 9 are nonnegative measurable functions
on (0,00), then (with J = 10(0)
// K(xy)f(x)g(y) dxdy < cjJ(p-l) [/ x p - 2 f(x)P dX] lip [/ g(x)q x] llq.
b. The operator Tf(x) == 10 00 K(xy)f(y) dy is bounded on L 2 ((0, (0)) with
norm < 4>( ). (Interesting special case: If K(x) == e- x , then T is the Laplace
transform and 4>(s) == f(s).)
31. (A Generalized Holder Inequality) Suppose that 1 < Pj < 00 and L:: Pj1 ==
r- 1 < 1. If fj E LP] for j = 1,..., n, then TI fj E Lr and II TI fjllr <
TI II fj IIpj. (First do the case n == 2.)
DISTRIBUTION FUNCTIONS AND WEAK L P 197
32. Suppose that (X, M, J.L) and (Y, N, v) are a-finite measure spaces and K E
L2(J.L X v). If f E L2(v), the integral T f(x) == J K(x, y)f(y) dv(y) converges
absolutely for a.e. x EX; moreover, T f E L2(J.L) and liT fl12 < IIK11211f112.
33. Given 1 < p < 00, let T f(x) == x-lip Jo X f(t) dt. If p-1 + q-1 == 1, then Tis
a bounded linear map from Lq ((0, (0)) to Co ((0, (0)).
34. If f is absolutely continuous on [E, 1] for 0 < E < 1 and Jo 1 xlf'(x)IP dx < 00,
then limxo f(x) exists (and is finite) if p > 2, If(x)I/llog xl 1/2 0 as x 0 if
p == 2, and If(x) l/x 1 -(2 Ip ) 0 as x 0 if p < 2.
6.4 DISTRIBUTION FUNCTIONS AND WEAK LP
If f is a measurable function on (X, M, J.L), we define its distribution function
Af: (0,00) [0,00] by
Af(ex) == J.L({x: If(x)1 > ex}).
(This is closely related, but not identical, to the "distribution functions" discussed in
31.5 and 310.1.) We compile the basic properties of A f in a proposition:
6.22 Proposition.
a. A f is decreasing and right continuous.
b. Iflfl < Igl, then Af < Ag.
c. If Ifn I increases to If I, then A in increases to A f.
d. If f == 9 + h, then Af(ex) < Ag(ex) + Ah(ex).
Proof. Let E ( ex, f) == {x : If (x) I > ex}. The function A f is decreasing since
E( ex, f) :) E((3, f) if ex < (3, and it is right continuous since E( ex, f) is the increasing
union of {E(ex + n- 1 , f)}l. If If I < Igl, then E(ex, f) C E(ex, g), so Af < Ag.
If Ifni increases to I f I, then E ( ex, f) is the increasing union of {E ( ex, f n) }, so A f n
increases to A f. Finally, if f == 9 + h, then E( ex, f) C E( ex, g) u E( ex, h), which
implies that Af(ex) < Ag( ex) + Ah( ex). .
Suppose that A f (ex) < 00 for all ex > O. In view of Proposition 6.22a, A f
defines a negative Borel measure v on (0, (0) such that v((a, b]) == Af(b) - Af(a)
whenever 0 < a < b. (Our construction of Borel measures on IR in 31.5 works
equally well on (0,00).) We can therefore consider the Lebesgue-Stieltjes integrals
J 4J dA f = J cjJ dv of functions 4J on (0,00). The following result shows that the
integrals of functions of If I on X can be reduced to such Lebesgue-Stieltjes integrals.
6.23 Proposition. If A f (ex) < 00 for all ex > 0 and cp is a nonnegative Borel
measurable function on (0,00), then
Ix cjJ 0 If I dfL = -1 00 cjJ(a)dAf(a).
198 L P SPACES
Proof. If v is the negative measure determined by A f, we have
v((a,b]) == Af(b) - Af(a) == -J-L({x: a < II(x)1 < b}) == -J-L(III-l((a,b])).
It follows that v(E) == -J-L(III-l(E)) for all Borel sets E c (0,00), by the
uniqueness of extensions (Theorem 1.14). But this means that Jx cjJ 0 III dJ-L ==
- Jo oo cjJ( a) dA f ( a) when cjJ is the characteristic function of a Borel set, and hence
when cjJ is simple. The general case then follows by virtue of Theorem 2.10 and the
monotone convergence theorem. .
The case of this result in which we are most interested is 4J( a) == a P , which gives
J I/IP dfL = -1 00 a P dAf(a).
A more usefuJ form of this equation is obtained by integrating the right side by parts
(Theorem 3.36) to obtain JlflPdJ-L == pJooo aP-1Af(a)da. The validity of this
calculation is not clear unless we know that a P A f (Q) 0 as a 0 and a 00;
nonetheless, the conclusion is correct.
6.24 Proposition. [fO < p < 00, then
J I/IP dfL = P 1 00 a P - 1 Af(a) da.
Proof. If A f ( a) == 00 for some a > 0, then both integrals are infinite. If not,
and I is simple, then A f is bounded as a 0 and vanishes for a sufficiently large, so
the integration by parts described above works. (It is also easy to verify the formula
directly in this case.) For the general case, let {gn} be a sequence of simple functions
that increases to I II; then the desired result is true for gn, and it follows for I by
Proposition 6.22c and the monotone convergence theorem. .
A variant of the LP spaces that turns up rather often is the following. If I is a
measurable function on X and 0 < p < 00, we define
( ) lip
[I]p == supaPAf(a) ,
Q>O
and we define weak LP to be the set of all I such that [I]p < 00. [.]p is not a norm;
it is easily checked that [el]p == lei [I]p, but the triangle inequality fails. However,
weak LP is a topological vector space; see Exercise 35.
The relationship between LP and weak LP is as follows. On the one hand,
LP C weak LP, and [I]p < 11111 p .
(This is just a restatement of Chebyshev's inequality.) On the other hand, if we
replace A f ( a) by ([I]p / a)P in the integral p Jo oo a P - l A f ( a) da, which equals II I",
we obtain a constant times Jo oo a-I da, which is divergent at both 0 and 00 - but
DISTRIBUTION FUNCTIONS AND WEAK LP 199
just barely. One needs only slightly stronger estimates on A f near 0 and 00 to obtain
f E LP. (See also Exercise 36.) The standard example of a function that is in weak
LP but not in LP is f(x) == x-lip on (0, (0) (with Lebesgue measure).
Frequently it is convenient to express a function as the sum of a "small" part and
a "big" part. The following is a way of doing this that gives a simple formula for the
distribution functions.
6.25 Proposition. If f is a measurable function and A > 0, let E (A) == {x :
If(x)1 > A}, and set
h A = fXX\E(A) + A(sgn f)XE(A)'
Then
gA == f - h A == (sgnf)(lfl- A)XE(A).
A gA (a) == Af(a + A),
{ A f( a ) if a < A,
AhA (a) == 0
if a > A.
The proof is left to the reader (Exercise 37).
Exercises
35. For any measurable f and 9 we have [cf]p == Icl[f]p and [f + g]p < 2([f] +
[g])l/p;-hence weak LP is a vector space. Moreover, the "balls" {g : [g - f]p < r}
(r > 0, f E weak LP) generate a topology on weak LP that makes weak LP into a
topological vector space.
36. If f E weak LP and p,( {x : f(x) =I- O}) < 00, then f E Lq for all q < p. On
the other hand, if f E (weak LP) n Loo, then f E Lq for all q > p.
37. Prove Proposition 6.25.
38. f E LP iff I:oo 2 kp Af(2 k ) < 00.
39. If f E LP, then lima-+o a P A f (a) == lim a -+ oo a P A f (a) == O. (First suppose f is
simple. )
40. If f is a measurable function on X, its decreasing rearrangement is the function
f* : (0,00) [0,00] defined by
f*(t) == inf{ a : Af(a) < t} (where inf 0 = (0).
a. f* is decreasing. If f*(t) < 00 then Af(f*(t)) < t, and if Af(a) < 00 then
f*(Af(a)) < a.
b. A f == A f* , where A f* is defined with respect to Lebesgue measure on (0, 00 ).
c. If Af(a) < 00 for all a > 0 and lim a -+ oo Af(a) = 0 (so that f*(t) < 00
for all t > 0), and cp is a nonnegative measurable function on (0, 00 ), then
Ix cp 0 If I dp, == 10 00 cp 0 f*(t) dt. In particular, IIfll p = IIf*llp for 0 < p < 00.
d. If 0 <p < 00, [f]p == SUPt>otl/Pf*(t).
e. The name "rearrangement" for f* comes from the case where f is a nonneg-
ative function on (0,00). To see why it is appropriate, pick a step function on
(0,00) assuming four or five different values and draw the graphs of f and f*.
200 L P SPACES
6.5 INTERPOLATION OF LP SPACES
If 1 < P < q < r < 00, then (LP n Lr) c Lq c (LP + Lr), and it is natural to
ask whether a linear operator T on LP + Lr that is bounded on both LP and Lr is
also bounded on Lq. The answer is affirmative, and this result can be generalized in
various ways. The two fundamental theorems on this question are the Riesz- Thorin
and Marcinkiewicz interpolation theorems, which we present in this section. We
begin with the Riesz-Thorin theorem, whose proof is based on the following result
from complex function theory.
6.26 The Three Lines Lemma. Let c/J be a bounded continuous function on the strip
o < Re z < 1 that is holomorphic on the interior of the strip. If I cjJ( z) I < Mo for
Rez = 0 and IcjJ(z) I < MlforRez == 1, then IcjJ(z) I < MJ-tMiforRez == t,
o < t < 1.
Proof For E > 0 let c/J€(z) = c/J(z)M-1 M 1 z exp(Ez(Z -1)). Then c/J€ satisfies
the hypotheses of the lemma with Mo and M I replaced by 1, and also IcjJ€(z) I 0
as I Imzl 00. Thus 1c/J€(z)1 < 1 on the boundary of the rectangle 0 < Rez < 1,
- A < 1m z < A provided that A is large, and the maximum modulus principle
therefore implies that Ic/J€(z) I < 1 on the strip 0 < Rez < 1. Letting E 0, we
obtain the desired result:
1c/J(z)IM6-IMlt == Hm Ic/J€(z) I < 1 for Rez = t.
€-+-o
.
6.27 The Riesz-Thorin Interpolation Theorem. Suppose that (X, M, J-l) and
(Y, N, v) are measure spaces and Po, PI, qo, qi E [1, 00]. If qo = qi == 00, suppose
also that v is semifinite. For 0 < t < 1, define Pt and qt by
1 I-t t
+-,
Pt Po PI
1 1 - t t
+ -.
qt qo qi
1fT is a linear map from LPo(/1) +LPl (J-l) into Lqo(v) +Lql (v) such that IITfllqo <
Mo IIfll po for f E LPO (/1) and liT fll ql < Millfllpl for f E LPI (/1), then liT fllqt <
MJ-t MfllfliPt for f E LPt (/1), 0 < t < 1.
Proof. To begin with, we observe that the case Po = PI follows from Proposition
6.10: If P == Po == PI, then
liT fllqt < liT fll;t liT flll < MJ-t Mi IIfli p .
Thus we may assume that Po =1= PI, and in particular that Pt < 00 for 0 < t < 1.
Let :Ex (resp. :E y ) be the space of all simple functions on X (resp. Y) that vanish
outside sets of finite measure. Then x C LP (p) for all P and :Ex is dense in LP (p)
for P < 00, by Proposition 6.7; similarly for:Ey. The main part of the proof consists
INTERPOLATION OF LP SPACES 201
of showing that liT fllqt < MJ-t Mi IIfilPt for all f E x. However, by Theorem
6.14,
liT fll q , = sup {If (T f)g dvl : g E By and Ilgllq = 1 } ,
where q is the conjugate exponent to qt. (Note that T f E LqO nLql, so {y : T f(y) =I-
O} must be a-finite unless qo == ql == 00; hence the hypotheses of Theorem 6.14 are
satisfied.) Moreover, we may assume that f =I- 0 and rescale f so that IIfllPt == 1.
We therefore wish to establish the following claim:
. If f E x and IlfllPt == 1, then I f(T f)g dvl < MJ-t Mi for all 9 E y such
that II 9 II q == 1.
Let f == E CjXEj and 9 == E dkXFk where the Ej's and the Fk's are disjoint
in X and Y and the Cj'S and dk's are nonzero. Write Cj and dk in polar form:
Cj == ICj le iOj , dk == Idk lei'l/Jk. Also, let
a(z) == (1 - z)pOI + zPl l ,
(3(z) == (1 - z)qol + zql l ;
thus a(t) == Pt: l and (3(t) == q;l for 0 < t < 1. Fix t E (0,1); we have assumed
that Pt < 00 and hence a(t) > 0, so we may define
m
fz == L ICj la(z)/a(t) e iOj XE j .
I
If (3(t) < i, we define
n
gz == L Idkl(I-,B(z))/(I-,B(t))ei'l/JkXFk'
I
while if (3(t) == 1 we define gz == 9 for all z. (We henceforth assume that (3(t) < 1
and leave the easy modification for (3(t) == 1 to the reader.) Finally, we set
cfy(z) = f (T fz)gz dv.
Thus,
cjJ(z) == L Ajk ICj la(z)/a(t) Id k I (l-,B(z))/(l-,B(t))
j,k
where
Ajk = e i (Oj+'1/1k) f (TXEj )XFk dv,
so that cjJ is an entire holomorphic function of z that is bounded in the strip 0 <
Rez < 1. Since f(Tf)gdv == cjJ(t), by the three lines lemma it will suffice to sho\v
that IcjJ(z) I < Mo for Re z == 0 and IcjJ(z) I < M I for Re z == 1. However, since
a( is) == Pol + is(pl 1 - POl),
1 - (3(is) == (1 - qol) - is(q11 - qol)
202 L P SPACES
for s E JR, we have
'fisl == IfIRe[a(is)/a(t)] == IfI Pt / po ,
Igisl == IgI Re [(I-,a(is))/(I-,a(t))] == Iglq/qb.
Therefore, by Holder's inequality,
IcjJ(is) I < IITfisllqollgisllqb < Moilfisllpoligisllqb == Mollfllptllgllq == Mo.
A similar calculation shows that 1</y(1 + is) I < M I , so the claim is proved.
We have now shown that IITfllqt < MJ-t MfllfllPt for f E I: x , so in view of
Proposition 6.7, TI I:x has a unique extension to LPt (p) satisfying the same estimate
there. It remains to show that this extension is T itself, that is, that T satisfies this
estimate for all f E LPt (p). Given such an f, choose a sequence {fn} in I:x such
that Ifni < If I and fn --7 f pointwise. Also, let E == {x : If(x)1 > I}, 9 == fXE,
9n == fnXE, h == f - g, and h n == fn - gn. Then if Po < PI (which we may assume,
by relabeling the p's), we have 9 E LPO (p), h E LPI (p), and by the dominated
convergence theorem, IIfn - fliPt --7 0, IIgn - gllpo --7 0, and IIh n - hll p1 --7 o.
Hence IITgn - Tgll qo --7 0 and IITh n - Thllq1 --7 0, so by passing to a suitable
subsequence we may assume that Tg n --7 Tg a.e. and Th n --7 Th a.e. (Exercise 9).
But then T fn --7 T f a.e., so by Fatou's lemma,
liT fllqt < Urn inf liT fn Ilqt < Urn inf Mt- t Mi IIfn IIpt == MJ-t Mi IIfllPt 1
and we are done.
.
The conclusion of the Riesz- Thorin theorem can be restated in a slightly stronger
form. Let M ( t) be the operator norm of T as a map from LPt (p) to L qt (v). We have
shown that M(t) < MJ-t Mf. It is possible for strict inequality to hold; however,
if 0 < s < t < u < 1 and t == (1 - T)S + TU, the theorem may be applied again
to show that M(t) < M(s)I-T M(U)T. In short, the conclusion is that log M(t) is a
convex function of t.
We now turn to the Marcinkiewicz theorem, for which we need some more
terminology. Let T be a map from some vector space 1) of measurable functions on
(X, M, p) to the space of all measurable functions on (Y, N, v).
. T is called sublinear if IT(f + g) I < ITfl + ITgl and IT(cf)1 == clTfl for
all f, 9 E 1) and c > o.
. A sublinear map T is strong type (p, q) (1 < p, q < (0) if LP(J-l) C 1), T
maps LP(J-l) into Lq(v), and there exists C > 0 such that IITfllq < Cllfll p for
all f E LP(p).
. A sublinear map T is weak type (p, q) (1 < p < 00, 1 < q < (0) if
LP(J-l) C 1), T maps LP(p) into weak Lq(v), and there exists C > 0 such that
[T fJq < Cllfll p for all f E LP(J-l). Also, we shall say that T is weak type
(p, (0) iff T is strong type (p, 00 ).
INTERPOLATION OF L P SPACES 203
6.28 The Marcinkiewicz Interpolation Theorem. Suppose that (X, JV(, J1) and
(Y, N, 1/) are measure spaces; Po, PI, qo, qi are elements of [1, 00] such that Po < qo,
PI < qI, and qo =1= qI; and
1
P
1 - t t 1 1 - t t
+ - and - == + -, where 0 < t < 1.
Po PI q qo qi
If T is a sublinear map from LPG (J1) + LPI (J1) to the space of measurable functions
on Y that is weak types (Po, qo) and (PI, qI), then T is strong type (p, q). More
precisely, if [Tf]qj < Cjllfll pj for j = 0,1, then IITfll q < Bpllfll p where Bp
depends only on Pj, qj, C j in addition to P; and for j == 0,1, Bplp - Pj I (resp. B p )
remains bounded as P -+ Pj ifpj < 00 (resp. Pj == (0).
Proof. The case Po == PI is easy and is left to the reader (Exercise 42). Without
loss of generality we may therefore assume that Po < PI, and for the time being
we also assume that qo < 00 and qi < 00 (whence also Po < PI < (0). Given
f E LP(J1) and A > 0, let gA and h A be as in Proposition 6.25. Then by Propositions
6.24 and 6.25,
J IgAIPO dJ.L = Po 1 00 (3Po-1 A gA ((3) d(3 = Po 1 00 (3Po-1 Aj((3 + A) d(3
(6.29) = Po i oo ((3 - A )PO -1 A f ((3) d(3 < po i oo (3PO -1 A f ((3) d(3,
J Ih A IPl dJ.L = P1 1 00 (3Pl-1 AhA ((3) d(3 = P1 1 A (3Pl-1 A j((3) d(3.
Likewise,
(6.30) J ITfl q dv = q 1 00 a q - 1 ATf(a) da = 2 q q 1 00 a q - 1 ATj(2a) da.
Since T is sublinear, by Proposition 6.22d we have
(6.31 )
ATf(2a) < ATgA (a) + ATkA (a).
This is true for all a > 0 and A > 0, so we may take A to depend on a. We now
make a specific choice of A. Namely, it follows from the equations defining P and q
that
po(qO - q) p-I(q-I - qol) p-I(q-I - ql l ) PI(ql - q)
(6.32) qo(Po - p) = q-1(p-1 _ POl) = q-1(p-1 - Pl 1 ) = q1(P1 - p);
204 L P SPACES
we denote the common value of these quantities by a, and we take A :=: aO". Then
by (6.29), (6.30), (6.31), and the weak type estimates on T,
(6.33)
liT fll < 2 q q 1 00 a q - 1 [(Co IlgA Ilpola) qo + (C1I1hA Ilpt I a) qt ] da
roo [1 00 ] qo / Po
< 2 q qC'!/pZo/ p o Jo a q - q o- 1 aU (3po-1 Af({3) d{3 da
roo [ {aCT ] ql/Pl
+2 q qC'Ppit/Pt Jo a q - qt - 1 Jo (3Pt- 1 Af({3) d{3 da
1 OO [ 00 ] qj/Pj
= 'L2qqCfjpJj/Pj 1 1 <pj(a,{3)d{3 da,
J=O 0 0
where, denoting by Xo and Xl the characteristic functions of {(a, (3) : {3 > aO"} and
{ (a, (3) : {3 < aO"},
4Jj (a, (3) == Xj (a, (3)a(q-qj -l)pj /qj {3Pj -1 A f ({3).
Since qo/po > 1 and q1/P1 > 1, we may apply Minkowski's inequality for integrals
to obtain
(6.34)
1 00 [1 00 <pj(a,{3)d{3r j / pj da
< [1 00 [1 00 <pj(a, (3)qj/pjdarj/qj d{3r j / pj .
Let T == 1/ a. If q1 > qo, then q - qo and a are positive and the inequality {3 > aO"
is equivalent to a < {3T, so
1 00 [1 00 <po(a, (3)qo/ p Oda ro/ Q O d{3
= 1 00 [l{3T aQ-QO -1 da ] pO / Qo {3Po -1 A f ({3) d{3
= (q - qo)-po!qo 1 00 [JPo-1+ P O (Q-Qo)/ Qo a Af({3) d{3
= (q - qo)-pO/QO 1 00 (3p-1 A f({3) d{3
= 1 q - qo 1- po / qo P -111 f II '
where we have used (6.32) to simplify the exponent of {3. On the other hand, if
q1 < qo, then q - qo and a are negative and the inequality {3 > aO" is equivalent to
INTERPOLATION OF L P SPACES 205
a > {3T, so as above,
roo [ roo ] po/qO [00 [ roo ] Po/qO
J o Jo <Po ( a, (3)qo/ po da d{3 = J o J j3T a q - qO - 1 da (3Po-1 )..-j{{3) d{3
= (qO - q) -po/qo 1 00 (3p-1).. f({3) d{3
== I q - qo I-PO / qo P -111 f II .
A similar calculation shows that
1 00 [1 00 <P1(a,{3)ql/Pldarl/ql d{3 = Iq - qd-Pl/qlp-11Ifll.
Combining these results with (6.33) and (6.34), we see that
SUp{ liT fll q : IIf lip = 1} < Bp = 2 q 1/ q [t, CJj (Pj Ip )qj /Pj Iq - qj 1- 1 f/ q .
But since IT(cf)1 == clTfl for c > 0, this implies that IITfll q < Bpllfll p for all
f E LP (/1), and we are done. (The verification of the asserted properties of Bp is left
as an easy exercise.)
It remains to show how to modify this argument to deal with the exceptional cases
qo == 00 or qi == 00. We distinguish three cases.
Case I: PI == ql == 00 (so Po < qo < (0). Instead of taking A == aO" in the
decomposition of f, we take A = alGI. Then IIThAlloo < ClllhAlloo < a, so
ATkA (a) == 0, and we obtain (6.33) with CPl == 0 and aO" replaced by alGI in the
definition of CPo. The same argument as above then gives
[ ] 1/ q
IITfllq < 2 qC8°ci-qO(polp)qo/POlq - qOI-l IIfll p .
Case II: Po < PI < 00, qo < ql == 00. Again the idea is to choose A so that
AThA (a) == 0, and the proper choice is A == (ald)O" where d == C l [Plllfll/pJ l/P!
and a == PI I (PI - p) (the limiting value of the a defined by (6.32) as ql -+ (0).
Indeed, since PI > P, we have
IIThAII < CflllhAII: = Cflp11A a Pl - 1 )..f(a)da
< Cflp1APl-P lA a P - 1 )..f(a)da = Cfl [ rlllfll = a Pl .
As in Case I, then, we find that CPl == 0 in (6.33) and the integral involving cPo is
majorized by a constant Bp when IIfll p == 1, which yields the desired result.
Case III: Po < PI < 00, ql < qo == 00. The argument is essentially the same as
in Case II, except that we take A == (a/ d)O" with d chosen so that ATgA (a) == o. .
206 L P SPACES
The lengthy formulas in this proof may seem daunting, but the ideas are reasonably
simple. To elucidate them, we recommend the exercise of writing out the proof for two
special (but important) cases: (i) Po == qo == 1, PI == q1 == 2, and (ii) Po = qo = 1,
PI = q1 == 00.
Let us compare our two interpolation theorems. The Marcinkiewicz theorem
requires some restrictions on Pj and qj that are not present in the Riesz- Thorin
theorem; these restrictions, however, are satisfied in all the interesting applications.
Apart from this, the hypotheses of the Marcinkiewicz theorem are weaker: T is
allowed to be sublinear rather than linear, and it needs only to satisfy weak-type
estimates at the endpoints. The conclusion in both cases is that T is bounded from
LP (p) to L q (v), but the Riesz- Thorin theorem produces a much sharper estimate for
the operator norm of T. Thus neither theorem includes the other.
We conclude with two applications of the Marcinkiewicz theorem. The first one
concerns the Hardy-Littlewood maximal operator H discussed in 33.4,
Hf(x)=sup (Bt )) r If(y)ldy
r>O m r, x } B(r,x)
(f E Lfoc (JR n )).
H is obviously sublinear and satisfies IIH flloo < IIflloo for all f E Loo. Moreover,
Theorem 3.17 says precisely that H is weak type (1,1). We conclude:
6.35 Corollary. There is a constant G > 0 such that if1 < P < 00 and f E LP (JRn),
then
IIHfil p < C p I "fll p .
Our second application is a theorem on integral operators related to Theorem 6.18.
6.36 Theorem. Suppose (X, M, p) and (Y, N, v) are a-finite measure spaces, and
1 < q < 00. Let K be a measurable function on X x Y such that, for some G > 0,
we have [K(x,.)]q < Gfora.e. x E X and [K(.,Y)]q < Gfora.e. Y E Y. If
1 < P < 00 and f E LP(v), the integral
Tf(x) = J K(x, y)f(y) dv(y)
converges absolutely for a.e. x E X, and the operator T thus defined is weak
type (1, q) and strong type (p, r) for all p, r such that 1 < p < r < 00 and
p-1 + q-1 = r- 1 + 1. More precisely, there exist constants Bp independent of K
such that
[Tf]q < B 1 Gllfl/!,
liT fllr < BpGllfll p (p > 1, r- 1 == p- 1 +q-1_1 > 0).
Proof. Let P' , q' be the conjugate exponents to p, q; then
-1 -1 + -1 1 -1 ( ' ) -1 -1 ( ' ) -1
r ==p q - ==p - q ==q - p ,
INTERPOLATION OF LP SPACES 207
so p < q' and q < p'. Suppose 0 =I- f E LP (1 < p < q'); by multiplying f and K
by constants, we may assume that II flip == c == 1. Given a positive number A whose
val ue will be fixed later, define
E == {(x,y): IK(x,y)1 > A}, K 1 == (sgnK)(IKI- A)XE, K 2 == K - K 1 ,
and let T 1 , T 2 be the operators corresponding to K 1 , K 2 . Then by Propositions 6.24
and 6.25, since q > 1 we have
j roo J OO A l-q
JK1(x,y)ldv(y) = Jo AK(x,)(a+A)da < A a-qda= q_l '
and likewise
j Al-q
IKl (x, y)1 dfL(X) < q _ 1 .
Hence, by Theorem 6.18, the integral defining T 1 f (x) converges for a.e. x and
A 1 -q A l-q
IIT 1 fli p < I llfll p == 1 .
q- q-
(6.37)
Similarly, since q < p',
A
jIK 2 (x,y) lp l dv(y) =p'l aP'-1AK(x,)(a)da
l A 'AP'-q
< I p'-l-q d -P
_p a a- I .
o P - q
Therefore, by Holder's inequality, the integral defining T2 f (x) converges for every
x, and
(6.38) II T 2/lloo < [ p1:q riP' II/lip = [r/PI Aqlr.
We have thus established that T f == T 1 f + T 2 f is well defined a.e.
Next, given a > 0, we wish to estimate AT f (a). But by Proposition 6.22d,
ATf(a) < ATlf(a) + AT2f(a),
and by (6.38), if we choose
A = [ rlq [rlqpl
we will have IIT 2 fli00 < a, so that AT 2 f( a) == o. With this choice of A, then, by
(6.37) and Chebyshev's inequality we obtain
[ 211T fll ] p [ 2A l-q ] p
ATf(a) < ATtf(!a) < p < (q _ l)a
== 2 P -(1-q)pr/q [ CJ. ] (l-q)pr/qp' a- p +(l-q)pr/ q == C [ lIfllp ] r ,
(q - l)p r p a
208 L P SPACES
because II flip = 1 and
(1 - q)pr _ p = p ( -r _ 1 ) == _p . :.. = -r.
q q' p
A simple homogeneity argument now yields the estimate ATf(a) < Cp(lIfllpja)r
with no restriction on Ilfll p , so we have shown that T is weak type (p,r), and in
particular (for p == 1) weak type (1, q).
Finally, given p E (1, q'), choose p E (p, q') and define rby r- 1 == p- 1 - (q')-I.
Then T is weak types (1, q) and ( p, r), so it follows from the Marcinkiewicz theorem
that T is strong type (p, r). .
Exercises
41. Suppose 1 < p < 00 and p-1 + q-l = 1. If T is a bounded operator on LP
such that f(Tf)g == f f(Tg) for all f,g E LP n Lq, then T extends uniquely to a
bounded operator on Lr for all r in [p, q] (if p < q) or [q, p] (if q < p).
42. Prove the Marcinkiewicz theorem in the case Po = Pl. (Setting p == Po == P1,
we have ATf(a) < (Collfllpja)qO and ATf(a) < (C 1 I1fll p ja)ql. Use whichever
estimate is better, depending on a, to majorize q 10 00 a q - 1 ATf(a) da.)
43. Let H be the Hardy-Littlewood maximal operator on IR. Compute H X(O,I)
explicitly. Show that it is in LP for all p > 1 and in weak L 1 but not in L 1 , and that
its LP norm tends to 00 like (p -1)-1 asp --7 1, although IIX(O,l) lip == 1 for allp.
44. Let 1 a be the fractional integration operator of Exercise 61 in 32.6. If 0 < a < 1,
1 < p < a-I, and r- 1 = p-1 - a, then 10: is weak type (1, (1 - a)-l) and strong
type (p, r) with respect to Lebesgue measure on (0,00).
45. If 0 < a < n, define an operator To: on functions on IR n by
Taf(x) = J Ix - yl-a f(y) dy.
Then To: is weak type (1, (n - a)-I) and strong type (p, r) with respect to Lebesgue
measure on IRn, where 1 < p < na- 1 and r- 1 == p-l - an- 1 . (The case n == 3,
a == 1 is of particular interest in physics: If f represents the density of a mass or
charge distribution, - ( 47r) -1 T 1 f represents the induced gravitational or electrostatic
potential.)
6.6 NOTES AND REFERENCES
The importance of the space L2([a, b]) was recognized soon after the invention of the
Lebesgue integral because of its connection with Fourier series and other orthogonal
expansions; and one of the early triumphs of the Lebesgue theory was the discovery
in 1907 by Fischer [44] and F. Riesz [114] that L2([a,b]) is isomorphic to [2, or
NOTES AND REFERENCES 209
what amounts to the same thing, that L2([a, b]) is complete. The spaces LP([a, b])
for 1 < p < 00 were first investigated by F. Riesz [117], who proved all of the major
results in 9-36.1-2 for them as well as the weak sequential compactness of the closed
unit ball in LP. The fact that (L 1 )* == Loo was first proved by Steinhaus [143].
In some respects it is unfortunate that LP spaces were not named Ll/p spaces, for
- as one sees in the conjugacy relation p-1 + q-1 = 1 and in the results of 3 6 . 5 -
relationships among different LP spaces usually involve linear equations in p-1 .
A discussion of some of the deeper aspects of LP spaces and their applications in
other areas of analysis can be found in Lieb and Loss [93].
36.1: Holder's inequality, in the case p = 2, is commonly associated with the
names of Cauchy (who proved it for finite sums) and Buniakovsky and Schwarz (who
proved it, independently, for integrals). For general p it was discovered independently
by Holder and Rogers. Minkowski's original inequality was for finite sums. (See
Hardy, Littlewood, and P6lya [66] for references.) A neat proof of Holder's inequality
using complex function theory can be found in Rubel [122].
The relations among the spaces LP + Lq, defined in Exercise 4, are studied in
Alvarez [5]. See Romero [120] for a discussion of Exercise 5, including some other
conditions for the inclusion LP C Lq to hold, and Miamee [100] for a discussion of
the more general relation LP(J-L) C Lq(v).
36.2: A quite different approach to the LP duality theory for 1 < p < 00
can be found in Hewitt and Stromberg [76, 315]. J. Schwartz [130] has found a
characterization of (L 1 )* that is valid on arbitrary measure spaces.
The proof of Theorem 6.15 breaks down for p = 00 because the set function
v(E) == <!J(XE) need not be countably additive. It is, however, a bounded, finitely
additive complex measure on (X, M) that is absolutely continuous with respect to J-L in
the sense that v( E) == 0 whenever J-L( E) = O. Conversely, given a bounded, finitely
additive complex measure v on (X, M), one can define the integral of a bounded
measurable function with respect to v. (One defines f f dv in the obvious way when
f is simple and then shows that Iff dvl < Cllfllu, so that the integral extends to
all uniform lilnits of simple functions.) In this way one obtains a representation of
(L 00) * as a space of finitely additive complex measures. See Hewitt and Stromberg
[76, 320], and for a more general treatment of finitely additive integrals, Dunford and
Schwartz [35, Chapter 3]. (The example of a <!J E (LOO)* \ L1 that we presented at the
end of 36.2 shows how horrible finitely additive measures can be: If v(E) = <!J(XE),
then v « m, but v behaves like the point mass at zero when integrated against any
continuous function.)
36.3: Theorem 6.18 generalizes results of Schur [129] (for the case p == 2) and
w. H. Young [164] (for the case K(x, y) == k(x - y); see 38.2). Theorem 6.20 is
also essentially due to Schur [129].
The reader whose appetite for inequalities is not satisfied by this section can find
a feast in Hardy, Littlewood, and P6lya [66].
36.4: The weak LP spaces first appeared implicitly in weak-type estimates,
instances of which go back to the 1920s; see also the notes for 36.5 below. Decreasing
210 L P SPACES
rearrangements (Exercise 40) were introduced by Hardy and Littlewood [65], who
give an entertaining motivation of their principal theorem on rearrangements in terms
of cricket averages.
86.5: The Riesz- Thorin theorem was first proved by M. Riesz (F. Riesz's younger
brother) [118] under the assumption that Pj < qj for j = 0, 1; the proof in the general
case and the idea of using the three lines lemma are due to Thorin [149]. E. M. Stein
has proved a very powerful generalization of the Riesz- Thorin theorem. It deals
with a family {Tz : 0 < Re z < I} of operators that (roughly speaking) depend
holomorphicallyon z and satisfy some mild growth conditions as 11m zl 00, and
it asserts that if Tz is bounded from LPj to Lqj for Re z == j (j == 0, 1), then Tz is
bounded from LPt to Lqt for Re z == t (0 < t < 1), where Pt, qt are defined as in the
Riesz- Thorin theorem. The precise statement and proof can be found in Bennett and
Sharpley [15, 34.3], Stein and Weiss [142, 3V.4], or Zygmund [167, 3XII.l]. For a
further extension of these ideas, see Coifman et al. [28].
The Marcinkiewicz interpolation theorem was announced by Marcinkiewicz [97]
for the case Pj == qj (j = 0,1); after his untimely death in World War II, the work
was completed by Zygmund [166]. The theorem can be proved under still weaker
hypotheses on T; an extra twist to the argument we have given yields the same result
under the sole assumption that IT(f + g) I < C(IT fl + ITgl) for some constant C.
See Zygmund [166], [167, 3XII.4]. The spaces LP and weak LP form part of a two-
parameter family {L(p, q) : 1 < P, q < oo} of function spaces, the so-called Lorentz
spaces, such that LP == L(p,p) and weak LP = L(p, (0), and the Marcinkiewicz
theorem can be extended to a result about interpolation of operators on the L(p, q)
spaces. See Bennett and Sharpley [15, 34.4], or Stein and Weiss [142, 35.3]
There are many other examples of "continuous families" of Banach spaces for
which interpolation theorems can be proved - for example, the spaces Aa discussed
in Exercise 11 in 35.1 and the Sabolev spaces discussed in 39.3. There are also two
general techniques for constructing "intermediate spaces" between pairs of Banach
spaces, known as the "complex method" and the "real method," which may be
regarded as abstract forms of the Riesz-Thorin and Marcinkiewicz theorems. An
account of these theories and their applications can be found in Bergh and Lofstrom
[16]; see also Bennett and Sharpley [15] the real method and its applications.
Corollary 6.35 is due to Hardy and Littlewood [65]. Theorem 6.36 appears first
in Folland and Stein [51], but the essential idea of the preof was discovered by Stein
several years earlier (see, e.g., Stein [140, 35.1]), and the special case discussed in
Exercise 44 goes back to Hardy and Littlewood [64].
7
Radon Measures
The subject of this chapter is measure and integration theory on locally compact
Hausdorff (LCH) spaces. We have seen in 32.6 that Lebesgue measure on jRn
interacts nicely with the topology on jRn - measurable sets can be approximated by
open or compact sets, and integrable functions can be approximated by continuous
functions - and it is of interest to study measures having similar properties on
more general spaces. Moreover, it turns out that certain linear functionals on spaces
of continuous functions are given by integration against such measures. This fact
constitutes an important link between measure theory and functional analysis, and it
also provides a powerful tool for constructing measures.
Throughout this chapter, X will denote an LCH space. We continue to employ
the terminology developed in Chapter 1 in the context of metric spaces: 13 x will
denote the Borel a-algebra on X, that is, the a-algebra generated by the open sets;
measures on 13 x will be called Borel measures; countable unions (intersections) of
closed (open) sets will be called Fa (G 8) sets, and so forth.
7.1 POSITIVE LINEAR FUNCTIONALS ON Cc(X)
We recall that Cc(X) is the space of continuous functions on X with compact support.
A linear functional [ on Cc(X) will be called positive if [(I) > 0 whenever 1 > o.
In this definition there is no mention of continuity, but it is worth noting that positivity
itself implies a rather strong continuity property.
211
212 RADON MEASURES
7.1 Proposition. If I is a positive linear functional on Cc(X), for each compact
K c X there is a constant C K such that II(f)1 < CKllflluforall f E Cc(X) such
that supp(f) c K.
Proof. It suffices to consider real-valued f. Given a compact K, choose </J E
Cc(X, [0, 1]) such that </J = 1 on K (Urysohn's lemma). Then if supp(f) C K,
we have If I < IIfllu</J, that is, IIfllu</J - f > 0 and Ilfllu</J + f > O. Thus
IlflluI(q;) - I(f) > 0 and IlflluI(</J) + I(f) > 0, so that II(f)1 < I(</J)llfllu. .
If J1 is a Borel measure on X such that J1(K) < 00 for every compact K c X,
then clearly Cc(X) c £1(J1), so the map f f-t f f dJ1 is a positive linear functional
on C c (X). The principal result of this section is that every positive linear functional
on Cc(X) arises in this fashion; moreover, one can impose some additional regularity
conditions on J1, subject to which J1 is unique. These conditions are as follows.
Let j.j be a Borel measure on X and E a Borel subset of X. The measure j.j is
called outer regular on E if
J1(E) = inf {J1(U) : U => E, U open}
and inner regular on E if
j.j(E) == sup{j.j(K) : K c E, K compact}.
If j.j is outer and inner regular on all Borel sets, j.j is called regular. It turns out that
regularity is a bit too much to ask for when X is not a-compact, so we adopt the
following definition. A Radon measure on X is a Borel measure that is finite on all
compact sets, outer regular on all Borel sets, and inner regular on all open sets. We
shall show in 37.2 that Radon measures are also inner regular on all of their a-finite
sets.
One further bit of notation: If U is open in X and f E Cc(X), we shall write
f-<U
to mean that 0 < f < 1 and supp(f) c U. (This is slightly stronger than the
condition 0 < f < Xu, which implies only that supp(f) C U .)
7.2 The Riesz Representation Theorem. If I is a positive linear functional on
Cc(X), there is a unique Radon measure j.j on X such that I(f) = f f dJ1 for
all f E Cc(X). Moreover, j.j satisfies
(7.3) j.j(U) == sup{ I(f) : f E Cc(X), f -< U} for all open U c X
and
(7.4) j.j(K) == inf {I(f) : f E Cc(X), f > XK } for all compact K c X.
Proof. Let us begin by establishing uniqueness. If j.j is a Radon measure such
thatI(f) == f fdj.jforallf E Cc(X),andU c X is open, thenclearlyI(f) < j.j(U)
POSITIVE LINEAR FUNCTIONALS ON CC (X) 213
whenever 1 -< U. On the other hand, if K e U is compact, by Urysohn's lemma there
is an 1 E Cc(X) such that 1 -< U and 1 == 1 on K, whence JL(K) < J f dJL == I(f).
Since J1 is inner regular on U, it follows that (7.3) is satisfied. Thus J1 is determined
by I on open sets, and hence on all Borel sets because of outer regularity.
This argument proves the uniqueness of J.L and also suggests how to go about
proving existence. Namely, we begin by defining
J1(U) == sup{ I(f) : 1 E Cc(X), f -< U}
for U open, and we then define J1 * (E) for an arbitrary E e X by
J.L* (E) == inf {J.L(U) : U => E, U open}.
Clearly JL(U) < J.L(V) if U e V, and hence J1* (U) == J.L(U) if U is open.
The outline of the proof is now as follows. We shall establish that
i. J.L * is an outer measure.
ii. Every open set is J.L * - measurable.
At this point it follows from Caratheodory's theorem that every Borel set is J.L*-
measurable and that J.L == J.L * I 'B x is a Borel measure. (The notation is consistent
because J.L*(U) = J.L(U) for U open.) The measure J.L is outer regular and satisfies
(7.3) by definition. We next show that
iii. J.L satisfies (7.4).
This clearly implies that J.L is finite on compact sets, and inner regularity on open sets
also follows easily. Indeed, if U is open and a < J.L(U), choose f E Cc(X) such
that 1 -< U and I(f) > a, and let K = supp(f). If 9 E Cc(X) and 9 > XK, then
9 - 1 > 0 and hence I (g) > I (f) > a. But then JL( K) > a by (7.4), so J.L is inner
regular on U. Finally, we prove that
iv. I(f) == f f dJL for all f E Cc(X).
With this, the proof of the theorem will be complete.
Proof of (i): It suffices to show that if {U j } is a sequence of open sets and
U = U U j , then J.L(U) < E J.L(U j ). Indeed, from this it follows that for any
E eX,
ex:> ex:>
p*(E) = inf{I>(Uj) : U j open, E C U U j },
1 1
and the expression on the right defines an outer measure by Proposition 1.10. If U =
U U j , 1 E Cc(X), and 1 -< U, let K = supp(f). Since K is compact, we have
K e U U j for some finite n, so by Proposition 4.41 there exist g1, . . . , gn E Cc(X)
with gj -< U j and E gj = 1 on K. But then 1 == E fgj and fgj -< U j , so
n n ex:>
1(/) == LI(fgj) < LJ.L(U j ) < LJ.L(U j ).
111
214 RADON MEASURES
Since this is true for any f -< U, we conclude that J-L(U) < L: J-L(U j ) as desired.
Proof of (ii): We must show that if U is open and E is any subset of X such that
J-L*(E) < 00, then J-L*(E) > J-L*(E n U) + J-L*(E \ U). First suppose that E is open.
Then E n U is open, so given E > 0 we can find f E Cc(X) such that f -< E n U
and I(f) > J-L(E n U) - E. Also, E \ (supp(f)) is open, so we can find g E Cc(X)
such that g -< E \ supp(f) and 1 (g) > J-L( E \ supp(f)) - E. But then f + g -< E, so
J-L(E) > I(f) + I(g) > J-L(E n U) + J-L(E \ supp(f)) - 2E
> J-L*(E n U) + J-L*(E \ U) - 2E.
Letting E 0, we obtain the desired inequality. For the general case, if J-L* (E) < 00
we can find an open V ::J E such that J-L(V) < J-L*(E) + E, and hence
J-L* (E) + E > J-L(V) > J-L* (V n U) + J-L* (V \ U)
> J-L* (E n U) + J-L* (E \ U).
Letting E 0, we are done.
Proof of (iii): If K is compact, f E Cc(X), and f > XK, let Uf. == {x : f(x) >
1 - E}. Then Uf. is open, and if g -< Uf., we have (1 - E)-1 f - 9 > 0 and so
I(g) < (1 - E)-I/(f). Thus J-L(K) < J-L(Uf.) < (1 - E)-I/(f), and letting E 0
we see that J-L(K) < I(f). On the other hand, for any open U ::J K, by Urysohn's
lemma there exists f E Cc(X) such that f > XK and f -< U, whence I(f) < J-L(U).
Since J-L is outer regular on K, (7.4) follows.
Proof of (iv): If suffices to show that I(f) = f f dJ-L if f E Cc(X, [0, 1]), as
Cc(X) is the linear span of the latter set. Given N E N, for 1 < j < N let Kj =
{x : f(x) > jN- 1 } and let Ko == supp(f). Also, define fl'..., fN E Cc(X)
by fj(x) = 0 if x K j - 1 , fj(x) == f(x) - (j - l)N-l if x E Kj-l \ Kj, and
fj(x) = N-l if x E Kj. In other words,
. { { j-1 } I }
fj = mIll max f - N ' 0 , N .
Then N- 1 XK j < fj < N- 1 XK j _ 1 , hence
1 J 1
N J-L(K j ) < fj dJ-L < N J-L(K j - 1 ).
Also, if U is an open set containing Kj-l we have Nfj -< U and so I(fj) <
N- 1 J-L(U). Hence, by (7.4) and outer regularity,
1 1
N J-L(K j ) < I(fj) < N J-L(K j - 1 ).
POSITIVE LINEAR FUNCTIONALS ON CC(X) 215
Moreover, f == L: fj, so that
1 N J 1 N-l
N LJL(K j ) < fdJL < N L JL(K j ),
1 0
1 N 1 N-l
N LJL(K j ) < I(J) < N L JL(Kj).
1 0
It follows that
II(J) - J f dJLI < JL(Ko) 7v JL (K N ) < JL(su(J)) .
Since J-L( supp(f)) < 00 and N is arbitrary, we conclude that I (f) == ! f dJ-L. .
The proof of this theorem yields something stronger than the statement: We obtain
not just a Borel measure J-L but an extension J-L of J-L to the a-algebra of J-L*-measurable
sets. However, it follows from outer regularity that for any E eX,
J-L * (E) = inf {J-L ( B) : B E 'B x, B => E},
so J-L * is the outer measure induced by J-L in the sense of 31.4. According to Exercise
22 in 31.4, therefore, J-L is the completion of J-L if J-L is a-finite and is the saturation of
the completion of J-L in general.
On the other hand, some authors prefer to restrict attention to a smaller a-algebra
than 13 x, namely, the a-algebra 'B generated by Cc(X) (that is, the smallest a-
algebra with respect to which every f E Cc(X) is measurable). The elements of 13
are called Baire sets. For more about Baire sets, see Exercises 4-6.
Exercises
1. Let X be an LCH space, Y a closed subset of X (which is an LCH space in
the relative topology), and J-L a Radon measure on Y. Then I(f) == !(fIY) dJ-L is a
positive linear functional on Cc(X), and the induced Radon measure v on X is given
by v(E) = J-L(E n Y).
2. Let J-L be a Radon measure on X.
a. Let N be the union of all open U c X such that J-L(U) == O. Then N is open
and J-L( N) == o. The complement of N is called the support of J-L.
b. x E supp(J-L) iff J f dJ-L > 0 for every f E Cc(X, [0, 1]) such that f(x) > o.
3. Let X be the one-point compactification of a set with the discrete topology. If J1
is a Radon measure on X, then supp(J-L) (see Exercise 2) is countable.
4. Let X be an LCH space.
a. If f E Cc(X, [0,(0)), then f-l ([a, (0)) is a compact C8 set for all a > O.
b. If K c X is a compact C 8 set, there exists f E Cc(X, [0, 1]) such that
K == f-l({1}).
216 RADON MEASURES
c. The a-algebra of Baire sets is the a-algebra generated by the compact
G 8 sets.
5. Let X be a second countable LCH space.
a. Every compact subset of X is a G 8 set.
b. 23 x == 23 .
6. Let X be an uncountable set with the discrete topology, or the one-point com-
pactification of such a set. Then 23 x =1= 23.
7.2 REGULARITY AND APPROXIMATION THEOREMS
In this section we explore the properties of Radon measures in more detail.
7.5 Proposition. Every Radon measure is inner regular on all of its a-finite sets.
Proof. Suppose that J-L is Radon and E is a-finite. If J-L( E) < 00, for any E > 0
we can choose an open U :) E such that J-L(U) < J-L(E) + E and a compact FeU
such that J-L(F) > J-L(U) - E. Since J-L(U \ E) < E, we can also choose an open
V :=) U \ E such that J-L(V) < E. Let K = F \ V. Then K is compact, K c E, and
J-L(K) = J-L(F) - J-L(F n V) > J-L(E) - E - J-L(V) > J-L(E) - 2E.
Thus J-L is inner regular on E. On the other hand, if J-L(E) = 00, E is an increasing
union of sets Ej with J-L( Ej) < 00 and J-L( Ej) 00. Thus for any N E N there exists
j such that J-L( Ej) > N and hence, by the preceding argument, a compact K c Ej
with JL(K) > N. Hence J-L is inner regular on E. .
7.6 Corollary. Every a-finite Radon measure is regular. If X is a-compact, every
Radon measure on X is regular.
For an example of a nonregular Radon measure, see Exercise 12.
7.7 Proposition. Suppose that J-L is a a-finite Radon measure on X and E is a Borel
set in X.
a. For every E > 0 there exist an open U and a closed F with FeE c U and
J-L(U \ F) < E.
b. There exist an Fa setAandaG 8 setB such that A C E c BandJ-L(B\A) == o.
Proof. Write E = U Ej where the Ej's are disjoint and have finite measure.
For each j, choose an open U j => Ej with J-L(U j ) < J-L(Ej) + E2- j - 1 and let
U = U U j . Then U is open, U => E, and J-L(U \ E) < L J-L(U j \ Ej) < E/2.
ApplyingthesamereasoningtoEC, we obtain an open V => ECwithJ-L(V\EC) < E/2.
Let F = Vc. Then F is closed, FeE, and
Jl(U \ F) = J-L(U \ E) + J-L(E \ F) = J-L(U \ E) + J-L(V \ E C ) < E.
This proves (a), and (b) follows easily; details are left to the reader.
.
REGULARITY AND APPROXIMATION THEOREMS 217
7.8 Theorem. Let X be an LCH space in which every open set is a-compact (which
is the case, for example, if X is second countable). Then every Borel measure on X
that is finite on compact sets is regular and hence Radon.
Proof. If J-L is a Borel measure that is finite on compact sets, then Cc(X) c L1 (J-L),
so the map I (f) = f f dJ-L is a positive linear functional on C c (X). Let v be
the associated Radon measure according to Theorem 7.2. If U c X is open, let
U = U Kj where each Kj is compact. Choose 11 E Cc(X) so that I -< U and
f == 1 on K 1. Proceeding inductively, for n > 1 choose In E C c (X) so that In -< U
and In == 1 on U Kj and on U-l supp(lj). Then fn increases pointwise to Xu
as n 00, so
J-L(U) = Hrn J In dJ-L = lirn J In dv = v(U)
by the monotone convergence theorem. Next, if E is any Borel set and E > 0, by
Proposition 7.7 there exist an open V => E and a closed FeE with v(V \ F) < E.
But V \ F is open, so J-L(V \ F) == v(V \ F) < Eo In particular, J-L(V) < J1( E) + E,
so J-L is outer regular. Also, J-L(F) > p(E) - E, and F is a-compact (since X is), so
there exist compact Kj C F with J-L(Kj) J-L(F), whence J1 is inner regular. Thus
f-l is regular (and equal to v, by the uniqueness part of Theorem 7.2.) .
Examples of non- Radon measures are considered in Exercises 13-15. In particular,
Exercise 15 exhibits an example of a finite, non-Radon Borel measure on a compact
Hausdorff space.
We now turn to some approximation theorems for measurable functions.
7.9 Proposition. If p is a Radon measure on X, Cc(X) is dense in LP(p) for
1 < p < 00.
Proof. Since the LP simple functions are dense in LP (Proposition 6.7), it suffices
to show that for any Borel set E with J-L( E) < 00, XE can be approximated in
the LP norm by elements of Cc(X). Given E > 0, by Proposition 7.5 we can
choose a compact K C E and an open U => E such that J-L(U \ K) < E, and by
Urysohn's lemma we can choose I E Cc(X) such that XK < I < Xu. Then
IIXE - Illp < J-L(U \ K)llp < E11p, so we are done. .
7.10 Lusin's Theorem. Suppose that J-L is a Radon measure on X and I : X C
is a measurable function that vanishes outside a set of finite measure. Then for any
E > 0 there exists <P E Cc(X) such that cP == I except on a set of measure < E. If f
is bounded, cP can be taken to satisfy IlcPllu < 1I/11u.
Proof. Let E == {x : f(x) i= O}, and suppose to begin with that I is bounded.
Then fELl (J1), so by Proposition 7.9 there is a sequence {gn} in Cc(X) that
converges to I in L 1 , and hence by Corollary 2.32 a subsequence (still denoted by
{gn} ) that converges to I a.e. By Egoroff's theorem there is a set AcE such that
J-L(E \ A) < E/3 and gn I uniformly on A, and there exist a compact B c A and
218 RADON MEASURES
an open U => E such that JL(A \ B) < E/3 and J.L(U \ E) < E/3. Since gn f
uniformly on B, liB is continuous, so by Theorem 4.34 there exists h E Cc(X)
such that h == f on Band supp(h) c U. But then {x : f(x) i= h(x)} is contained
in U \ B, which has measure < Eo
To complete the proof for I bounded, define (3 : C C by (3(z) == z if Izl < 1I/IIu
and (3(z) == Ilfllu sgn z if Izl > Ilfllu, and set <t> == (30 h. Then <t> E Cc(X) since (3
is continuous and (3(0) = O. Moreover, 11<t>llu < Ilfllu, and <t> == f on the set where
h == I, so we are done.
If I is unbounded, let An = {x : 0 < If(x)1 < n}. Then An increases to E as
n 00, so J-L(E \ An) < E/2 for sufficiently large n. By the preceding argument
there exists <t> E Cc(X) such that <t> == fXA n except on a set of measure < E/2, and
hence <t> == f except on a set of measure < E. .
Our final group of results in this section concerns semicontinuous functions. If X
is a topological space, a function f : X (- 00, 00] is called lower semicontinuous
(LSC) if {x: f(x) > a} is open for all a E JR, and f: X [-00,(0) is called
upper semicontinuous (USC) if {x : f (x) < a} is open for all a E JR.
7.11 Proposition. Let X be a topological space.
a. IfU is open in X, then Xu is LSC.
b. If f is LSC and c E [0,(0), then cf is LSC.
c. If9 is a family ofLSCfunctions and f(x) = sup{g(x) : 9 E 9}, then f is
LSC.
d. If 11 and 12 are LSC, so is 11 + f2.
e. If X is an LCH space and f is LSC and nonnegative, then
I(x) = sup{g(x) : 9 E Cc(X), 0 < 9 < I}.
Proof (a) and (b) are obvious, and (c) follows from the observation that
f-l ((a, 00]) == U g-1 ((a, 00]).
gE9
As for (d), if 11 (XO) + 12(XO) > a, choose E > 0 so that 11 (xo) > a - I2(xo) + E.
Then
{x: (11 +/2)(X) > a} => {x: 11 (x) > a-/2(xO)+E}n{x: 12(X) > f2(XO)-E},
which is a neighborhood of XQ. Thus fl + f2 is LSC. Finally, if X is LCH, f(x) > 0,
and 0 < a < f(x), then U == {y : I(y) > a} is an open set containing x, so by
Urysohn's lemma there exists 9 E Cc(X) such that g(x) = a and 0 < 9 < axu < f.
This establishes (e) when f(x) > 0, and (e) is trivial when f(x) == O. .
There is, of course, a corresponding set of results for USC functions, whose
formulation is left to the reader. The following result is a monotone convergence
theorem for nets of LSC functions.
REGULARITY AND APPROXIMATION THEOREMS 219
7.12 Proposition. Let 9 be afamily of nonnegative LSCfunctions on an LCH space
X that is directed by < (that is, for every gl, g2 E 9 there exists 9 E 9 such that
gl < 9 and 92 < g). Let f == sup{g : 9 E 9}. If J-l is any Radon measure on X, then
J f dJ-l == sup{J 9 dJ-l : 9 E 9}.
Proof. By Proposition 7.11c, f is LSC and hence Borel measurable, and clearly
J f dJ-l > sup{J 9 dJ-l}. To prove the reverse inequality, consider the sequence cj;n of
simple functions increasing to f that was constructed in Theorem 2.10:
2 2n
1
<Pn = 2 n L Xu nj , where U nj = {x : f(x) > j2- n }.
j=l
By the monotone convergence theorem, given a < J f dJ-l we can fix n large enough
so that 2- n E j J-l(U nj ) == J CPn dJ-l > a. Since U nj is open, there exist compact
Kj c U nj (1 < j < 2 2n ) such that 2- n E j J-l(Kj) > a. Let'ljJ == 2- n E j XKj. For
each x E U j Kj we have f(x) > CPn(x) > 'ljJ(x), so we can pick gx E 9 such that
9x(X) > 'ljJ(x). But -XK j is LSC, so gx - 'ljJ is LSC by Proposition 7.11d, and hence
the set V x == {y : 'ljJ(y) < 9x (y)} is open. Thus {V x : x E U j K j } is an open cover
of U j Kj, so there is a finite subcover V X1 , . . . , V Xm . Pick 9 E 9 such that gXk < 9
for k == 1,..., m; then < g, so J 9 dJ-l > a. Since a was any number less than
J f dJ-l, we are done. .
7.13 Corollary. If J-l is Radon and f is nonnegative and LSC, then
J f dJ.L = sup {J gdJ.L : 9 E Cc(X), 0 < g < f} .
Proof. Combine Propositions 7.11e and 7.12.
.
7.14 Proposition. If J-l is a Radon measure and f is a nonnegative Borel measurable
function, then
J f dJ.L = inf {J 9 dJ.L : 9 > f and 9 is Lse} .
If {x : f(x) > O} is a-finite, then
J f dJ.L = sup {J gdJ.L : 0 < g < f and 9 is use} .
Proof. Let {CPn} be a sequence of nonnegative simple functions that increase
pointwise to f. Then f == cj;1 + E (cj;n - cj;n-1), and each term in this series is a
nonnegative simple function, so we can write f == E ajXEj where aj O. Given
E > 0, for each j choose an open U j :J Ej such that J-l(U j ) < J-l( Ej) +Ej (2 J aj). Then
9 == E aj XU j is LSC by Proposition 7.11, 9 > f, and J 9 d/-l < J f dJ-l + E. This
establishes the first assertion. For the second, if a < J f dJ-l, let N be large enough
so that E ajJ-l(Ej) > a. Since the Ej's are a-finite, by Proposition 7.5 there are
compact sets Kj C Ej such that E aj J-l( Kj) > a. Thus if 9 == E aj XK j , then 9
is USC, 9 < f, and J 9 dJ-l > a. .
220 RADON MEASURES
Exercises
7. If J.L is a a-finite Radon measure on X and A E x, the Borel measure J.LA
defined by J.LA(E) == J.L(E n A) is a Radon measure. (See also Exercise 13.)
8. Suppose that J.L is a Radon measure on X. If cjJ E £1 (J.L) and cjJ > 0, then
v(E) == JE cjJ dJ.L is a Radon measure. (Use Corollary 3.6.)
9. Suppose that J.L is a Radon measure on X and cjJ E C(X, (0, (0)). Let v(E) ==
J E cjJ dJ.L, and let v' be the Radon measure associated to the functional f r-+ J f cjJ d/-l
on Cc(X).
a. If U is open, v(U) = v'(U). (Apply Corollary 7.13 to cjJXu.)
b. v is outer regular on all Borel sets. (Hint: The open sets V k = {x : 2 k <
cjJ(x) < 2k+2}, k E Z, cover X.)
c. v == v', and hence v is a Radon measure. (See also Exercise 13.)
10. If J.L is a Radon measure and f E £1 (J.L) is real-valued, for every E > 0 there exist
an LSC function 9 and a USC function h such that h < f < 9 and J (g - h) dJ.L < E.
11. Suppose that /-l is a Radon measure on X such that /-l( {x}) == 0 for all x EX,
and A E x satisfies 0 < J.L(A) < 00. Then for any Q such that 0 < Q < J.L(A)
there is a Borel set B c A such that J.L( B) == Q.
12. Let X == IR X IR d , where IRd denotes IR with the discrete topology. If f is a
function on X, let fY(x) = f(x, y); and if E c X, let EY == {x : (x, y) E E}.
a. f E Cc(X) iff fY E Cc(IR) for all y and fY == 0 for all but finitely many y.
b. Define a positive linear functional on Cc(X) by 1(f) == 2: YE 1R J f(x, y) dx,
and let J.L be the associated Radon measure on X. Then J.L( E) == 00 for any E
such that EY =1= 0 for uncountably many y.
c. Let E == {O} X IRd. Then J.L(E) == 00 but J.L(K) == 0 for all compact K c E.
13. In the setting of Exercise 12, let A == (IR \ {O}) X IRd and cjJ(x, y) == Ixl. Then
the measures J-lA (E) = J.L( A n E) and v( E) == J E cjJ dJ.L are not Radon. (Thus, the
hypotheses that J.L be a-finite in Exercise 7, that cjJ E L 1 (/-l) in Exercise 8, and that
cjJ > 0 in Exercise 9, cannot be dropped.)
14. Let J.L be a Radon measure on X, and let J.Lo be the semifinite part of J.L (see
Exercise 15 in 31.3).
a. J.Lo is inner regular on all Borel sets.
b. J.Lo is outer regular on all Borel sets E such that J.L( E) < 00.
c. J f dJ.L == J f dJ.Lo for all f E Cc(X).
d. If J.L is the measure of Exercise 12 and m is Lebesgue measure on IR, then
J.Lo(E) == 2: YE 1R m(EY) for any Borel set E.
15. Let f2 be the set of countable ordinals, WI the first uncountable ordinal, and
f2* = f2 U {WI}. Let f2* be endowed with the order topology (see Exercise 9 in 34.1).
a. f2* is a compact Hausdorff space. (Hint: f2* contains no infinite strictly
decreasing sequences.)
THE DUAL OFCo(X) 221
b. [2 is an open set in [2* that is not a-compact.
c. A subset E of 0 is uncountable iff for each x E [2 there exists y E E such
that x < y.
d. If {En} is a sequence of uncountable closed sets in [2*, then n En is
uncountable. (If {x j } is an increasing sequence in n such that each En contains
infinitely many Xj'S, then limj-+oo Xj exists and is in n En.)
e. If E c 'Bo*, then either E U {WI} or EC U {WI} contains an uncountable
closed set. (Hint: The set of all E satisfying the latter condition is a a-algebra.)
f. Define J-l on 'Bo* by J-l( E) == 1 if E U {WI} contains an uncountable closed
set, J-l( E) = 0 otherwise. Then J-l is a measure, J-l( {WI}) == 0, but J-l( U) == 1 for
every open U containing WI.
g. If f E C(O*), there exists x E n such that f(y) == f(WI) for y > x. (If
En = {x : If(x) - f(WI)1 < n- I }, then E is countable.)
h. With J-l as in (f), the Radon meausre on 0* associated to the functional
f r-+ J f dJ-l is the point mass at WI.
7.3 THE DUAL OF Co (X)
We recall that for any LCH space X, Co (X) is the uniform closure of Cc(X)
(Proposition 4.35), and hence if J-l is a Radon measure on X, the functional I(f) ==
J f dJ-l extends continuously to Co (X) iff it is bounded with respect to the uniform
norm. In view of the equality
J-l(X) = sup {J f dJ-l : f E Cc(X), 0 < f < 1 }
(a special case of (7.3)) together with the fact that I J f dJ-l1 < J If I dJ-l, this happens
precisely when J-l(X) < 00, in which case J-l(X) is the operator norm of I.
We have therefore identified the positive bounded linear functionals on Co (X):
they are given by integration against finite Radon measures. Our object in this section
is to extehd this result to give a complete description of Co(X)*. The key fact is that
real linear functionals on Co (X, IR) have a "Jordan decomposition."
7.15 Lemma. If I E Co (X, IR)*, there exist positivefunctionals I-I. E Co (X, R)*
such that I == 1+ - 1-.
Proof. If f E Co (X, [0,(0)), we define
I+(f) == sup{I(g) : 9 E Co(X,IR), 0 < g < f}.
Since II (g) 1 < IIIllllgllu < IIIllllfllu for 0 < 9 < f, and 1(0) == 0, we have
o < 1+ (f) < IIIlIllfllu. We claim that 1+ is the restriction to Co (X, [0,(0)) of
a linear functional; the proof is much the same as the proof of the linearity of the
integral in 32.3.
222 RADON MEASURES
Obviously I+(cf) == cI+(f) if c > O. Also, whenever 0 < gl < fl and
o < g2 < f2wehaveO < gl+g2 < fl+f2,sothatI+(fl+f2) > I(gl)+I(g2),and
it follows thatI+(fl + f2) > I+(fl)+I+(f2). On the other hand, if 0 < 9 < fl + f2,
let gl == min(g, fl) and g2 == 9 - gl. Then 0 < gl < fl and 0 < g2 < f2, so
I(g) == I(gl)+I(g2) < I+(fl)+I+(f2); thereforeI+(fl + f2) < I+(fl)+[+(f2).
In short, 1+ (fl + f2) == 1+ (fl) + [+ (f2).
Now, if f E Co(X, IR), then its positive and negative parts f+ and f- are in
Co(X, [0,00)), and we define I+(f) == I+(f+) - I+(f-). If also f == 9 - h where
g, h > 0, then 9 + f- == h + f+, whence I+(g) + I+(f-) == I+(h) + I+(f+).
Thus 1+ (f) == 1+ (g) - 1+ (h), and it follows easily as in the proof of Proposition
2.21 that 1+ is linear on Co (X, IR). Moreover,
II+(f)1 < max(I+(f+), I+(f-)) < 11111 max(llf+llu, Ilf-Ilu) == IIIllllfllu,
so that 111+ II < 11111.
Finally, let 1- == 1+ - I. Then 1- E Co (X, IR)*, and it is immediate from the
definition of 1+ that 1+ and 1- are positive. .
Any I E Co(X)* is uniquely determined by its restriction J to Co(X, IR), and
we have J == J 1 + iJ 2 where J 1 , J 2 are real linear functionals. We therefore
conclude from Lemma 7.15 and the discussion preceding it that for any I E Co(X)*
there are finite Radon measures J-ll, . . . , J-l4 such that I (f) == J f dJ-l where J-l ==
J-ll - J-l2 + i(J-l3 - J-l4).
At this point we need some more definitions. A signed Radon measure is
a signed Borel measure whose positive and negative variations are Radon, and a
complex Radon measure is a complex Borel measure whose real and imaginary
parts are signed Radon measures. (It is worth noting that on a second countable LCH
space, every complex Borel measure is Radon. This follows from Theorem 7.8 since
complex measures are bounded.) We denote that space of complex Radon measures
on X by M(X), and for J-l E M(X) we define
1I1ll! == 1J-lI(X),
where, of course, 1J-l1 is the total variation of J-l.
7.16 Proposition. If J-l is a complex Borel measure, then J-l is Radon ifflJ-l1 is Radon.
Moreover, M(X) is a vector space and J-l r-7 11J-l11 is a norm on it.
Proof. We observe that a finite positive Borel measure 1/ is Radon iff for every
Borel set E and every f > 0 there exist a compact K and an open U such that
K c E c U and 1/(U \ K) < f, by Propositions 7.5 and 7.7. The first assertion
follows easily from this. Indeed, if J-l == J-ll - J-l2 + i(J-l3 - J-l4) and 1J-l1 (U \ K) < f,
then J-lj(U \ K) < f for all j; conversely, if J-lj(U j \ Kj) < f/4 for all j, then
J-l(U \ K) < f where K = ui Kj and U = ni U j . The same argument shows that
M(X) is closed under addition and scalar multiplication. Finally, that II . II is a norm
on M(X) follows from Proposition 3.14. .
THE DUAL OFCo(X) 223
7.17 The Riesz Representation Theorem. Let X be an LCH space, andfor /l E
M(X) and I E Co(X) let Ip,(/) == J I dJ-l. Then the map J-l 1p, is an isometric
isomorphismfrom M(X) to Co(X)*.
Proal. We have already shown that every I E Co(X)* is of the form 1p,. On the
other hand, if J-l E M(X),by Proposition 3.13c we have
J I dp, < Jill dlp,1 < 1IIIIullp,ll,
so 1p, E Co(X)* and 1111-£11 < IIJ-lli. Moreover, if h == dJ-l/dl/ll, then Ihl == 1 by
Proposition 3.13b, so by Lusin's theorem, for any E > 0 there exists I E Cc(X) such
that 1I/IIu == 1 and I == h except on a set E with 1J-lI(E) < E/2. Then
11p,11 = J Ihl 2 dlp,1 = J h dp, < J I dp, + J (f - h ) dp,
< J I dp, + 21p,I(E) < J I dp, + E < IIII'll + E.
It follows that 11J-l11 < 1111-£11, so the proof is complete.
.
7.18 Corollary. If X is a compact Hausdorff space, then C(X)* is isometrically
isomorphic to M(X).
Let J-l be a fixed positive Radon measure on X. If I E £1 (J-l), the complex measure
dVI == I dJ-L is easily seen to be Radon (Exercise 8), and II vI II == Jill dJ-l == 11/111.
Thus I v I is an isometric embedding of £ 1 (/l) into M (X) whose range consists
precisely of those v E M (X) such that v « J-l. (The last statement follows from
the Radon-Nikodym theorem, which applies even if J-l is not a-finite; see 37.5.) The
most important example of this situation is J-l == m == Lebesgue measure on JRn, and
we shall identify £l(m) with a subspace of M(JRn).
The weak* topology on M(X) == Co(X)*, in which J-la J-l iff J I dJ-la
J I dJ-l for all I E Co (X), is of considerable importance in applications; we shall call
it the vague topology on M(X). (The term "vague" is common in probability theory
and has the advantage of forming an adverb more gracefully than "weak* .") The
vague topology is sometimes called the weak topology, but this terminology conflicts
with ours, since Co(X) is rarely reflexive (see Exercise 20). Weak convergence
arguments for £P (J-l) generally fail for p == 1 because £1 (J-l) is not the dual of
£00 (J-l), but good substitute results can often be obtained by regarding £1 (J-l) as a
subspace of M(X) as in the preceding paragraph and using the vague topology there.
We conclude by presenting a useful criterion for vague convergence in M (JR).
7.19 Proposition. Suppose J-l, J-l1, J-l2, . . . E M(JR), and let Fn (x) == J-ln (( -00, x])
and F ( x) == /l ( ( - 00, x]).
a. IfsuPn IIJ-lnll < 00 and Fn(x) F(x) for every x at which F is continuous.
then J-Ln J-l vaguely.
224 RADON MEASURES
b. If /In J-L vaguely, then sUPn II/ln II < 00. If, in addition, the /In 's are positive,
then Fn(x) -t F(x) at every x at which F is continuous.
Proof. (a) Since F is continuous except at countably many points (Theorems 3.27
and 3.29), Fn F a.e. with respect to Lebesgue measure. Also, IIFn IIu < IIJ-Ln II, so
the F n's are uniformly bounded. If ! is continuously differentiable and has compact
support, then, integration by parts (Theorem 3.36) and the dominated convergence
theorem yield
J f df-Ln = J f'(x)Fn(x) dx --> J f'(x)F(x) dx = J f df-L.
But by Theorem 4.52, the set of all such !'s is dense in Co (JR), so J in dJ-L J ! d/-l
for all ! E Co(JR) by Proposition 5.17. Thus J-Ln J-L vaguely.
(b) If J-Ln J-L vaguely, then sUPn IIJ-Lnll < 00 by the uniform boundedness
principle. Suppose that J-Ln > 0, and hence J-L > 0, and that F is continuous at x == a.
If! E Cc(JR) is the function that is 1 on [- N, a], 0 on (-00, - N - E] and [a+ E, (0),
and linear in between, we have
Fn(a) - Fn( -N) = f-Ln(( -N, aD < J f df-Ln --> J f df-L
< F(a + E) - F( -N - E).
As N 00, Fn( -N) and F( -N - E) tend to zero, so
limsup Fn(a) < F(a + E).
noo
Similarly, by considering the function that is 1 on [-N + E, a - E], 0 on (-00, N]
and [a, (0), and linear in between, we see that
liminf Fn(a) > F(a - E).
noo
Since E is arbitrary and F is continuous at a, we have Fn (a) F( a) as desired. .
Exercises
16. Suppose that I E Co(X, IR)* and 1+,1- are the functionals constructed in the
proof of Lemma 7.15. If J-L is the signed Radon measure associated to I, then the
positive and negative variations of J-L are the Radon measures associated to 1+ and
1- .
17. If J-L is a positive Radon measure on X with J-L(X) == 00, there exists! E Co(X)
such that J ! dJ-L == 00. Consequently, every positive linear functional on Co (X) is
bounded.
18. If J-L is a a-finite Radon measure on X and 1/ E M(X), let 1/ == VI + 1/2 be the
Lebesgue decomposition of 1/ with respect to J-L. Then 1/1 and 1/2 are Radon. (Use
Exercise 8.)
THE DUAL OFCo(X) 225
19. Let X be a completely regular space and A a completely regular subalgebra of
BC(X) (see Exercise 73 in 34.8). Find a description of A * as a space of measures.
20. Some examples of nonreflexivity of Co (X):
a. If J-l E M (X), let <!>(J-l) == LXEX J-l( {x}). This sum is well defined, and
<I> E M (X) *. If there exists a nonzero J-l E M (X) such that J-l( {x}) == 0 for all
x EX, then <I> is not in the image of Co (X) in M(X)* r-...i Co(X)**.
b. At the other extreme, let X == N with the discrete topology; then Co (X) * r-...i ZI
and ([1)* r-...i [00. (Note: Co(N) is usually denoted by co.)
21. Let {fa}aEA be a subset of Co (X) and {Ca}aEA a family of complex numbers.
If for each finite set B c A there exists J-lB E M(X) such that IIJ-lB II < 1 and
J fa dJ-lB == C a for a E B, then there exists J-l E M(X) such that 1IJ-l11 < 1 and
J fa dJ-l == C a for all a E A.
22. A sequence {fn} in Co (X) converges weakly to f E Co (X) iff sup Ilfnllu < 00
and fn f pointwise.
23. The hypothesis of positivity in Proposition 7.19b is necessary. (Take J-ln to be
the difference of the point masses at n- 1 and -n- 1 .)
24. Find examples of sequences {J-ln} in M () such that
a. J-ln 0 vaguely, but /I J-ln" -F o.
b. J-ln 0 vaguely, but J f dJ-ln -F J f dJ-l for some bounded measurable f
wi th compact support.
c. J-ln > 0 and J-ln 0 vaguely, but there exists x E such that F n ( x) -F F ( x )
(notation as in Proposition 7.19).
25. Let J-l be a Radon measure on X such that every nonempty open set has positive
measure (e.g., Lebesgue measure). For each x E X there is a net {fa} in £1 (J-l) that
converges vaguely in M(X) to the point mass at x. If X is first countable, the net
can be taken to be a sequence. (Consider functions of the form J-l(U)-IXU.)
26. If {J-ln} C M(X), J-ln J-l vaguely, and IIJ-lnll 11J-l1I, then J f dJ-ln J f dJ-l
for every f E BC(X). (If J-l == 0 the result is trivial. Otherwise, there exists
9 E Cc(X) with IIgliu < 1 such that J 9 dJ-l > II J-l II - E, and J gf dJ-ln J gf dJ-l
for f E BC(X).) Moreover, the hypothesis IIJ-ln II 11J-l11 cannot be omitted.
27. Let Ck([O, 1]) be as in Exercise 9 in 35.1. If I E C k ([0,1])*, there exist
J-l E M([O, 1]) and constants Co,..., Ck-l, all unique, such that
k-l
1(1) = J f(k) df-L + I:>jfU) (0).
o
(The functionals f f(j) (0) could be replaced by any set of k functionals that
separate points in the space of polynomials of degree < k.)
226 RADON MEASURES
7.4 PRODUCTS OF RADON MEASURES
In this section we study Radon measures on product spaces. X and Y will denote
LCH spaces, and 7rx and 7ry will denote the projections of X x Y onto X and Y,
respectively.
7.20 Theorem.
a. 23x 023 y C 23xxY.
b. If X and Yare second countable, then 23 x 0 'By == 'B x x y.
c. If X and Yare second countable and J.L and v are Radon measures on X and
Y, then J.L x v is a Radon measure on X x Y.
Prouf. Parts (a) and (b) are direct generalizations of Proposition 1.5, and the
proof is essentially the same. The main tool is Proposition 1.4. It implies, first, that
'Bx 0 'By is generated by the sets U x V where U is open in X and V is open
in Y. Since these sets are open in X x Y, we have 'B x 0 'By C 'B x x y. If the
topologies on X and Y have countable bases £ and , then every open set in X,
Y, or X x Y is a countable union of sets in £, , or {U x V : U E £, V E }.
It follows that 'B x, 'By, and 'B x x yare generated by these families and hence that
'B x 0 'B y == 'B x x y. As for (c), J.L x v is a Borel measure by (b), so by Theorem
7.8 we need only show that (J.L x v)(K) is finite for every compact K C X x Y.
But this is easy: 7rx(K) and 7ry(K) are compact, and K C 7rl(K) x 7r2(K), so
(J.L x v)(K) == J.L(7rx(K))v(7ry(K)) < 00. I
When X or Y is not second countable it can happen that 'B x 0 'By =1= 'B x x y; see
Exercises 28 and 29. In this case the product of Radon measures is certainly not a
Radon measure. However, there is a natural way of manufacturing a Radon measure
from it. To see this, we need a couple of facts about continuous functions. If 9 and
h are functions on X and Y, we define 9 0 h on X x Y by
9 0 h(x, y) == g(x)h(y).
7.21 Proposition. Let P be the vector space spanned by the functions 9 0 h with
9 E Cc(X), h E Cc(Y). Then P is dense in Cc(X 0 Y) in the uniform norm. More
precisely, given f E Cc(X x Y), E > 0, and precompact open sets U C X and
V C Y containing 7rx(supp(f)) and 7ry(supp(f)), there exists F E P such that
IIF - fllu < E and supp(F) C U x V.
Proof. U x V is a compact Hausdorff space. It follows easily from the Stone-
Weierstrass theorem that the linear span of {g 0 h : 9 E C ( U ), h E C ( V )} is
dense in C ( U x V ). In particular, there is an element G of this linear span such that
sUP u x v IG - fl < E. Also, by Urysohn's lemma there exist cjJ E Cc(U, [0,1]) and
fljJ E Cc(V, [0, 1]) such that cjJ == 1 on 7rx(supp(f)) and'ljJ == 1 on 7ry(supp(f)).
Thus if we define F == (q; 0 'ljJ) G on U x V and F == 0 elsewhere, we have F E P,
supp(F) c U x V, and IIF - fllu < E. I
PRODUCTS OF RADON MEASURES 227
7.22 Proposition. Every f E Cc(X x Y) is 'B x 0 'By-measurable. Moreover, if J-l
and v are Radon measures on X andY, thenCc(X x Y) C £1(J-l X v), and
J f d(jL xv) = J J f djL dv = J J f dv djL
(f E Cc(X x Y)).
Proof. If 9 E Cc(X) and h E Cc(Y), we have 9 0 h = (g 0 7rx )(h 0 7rY).
Since 7r x and 7ry are measurable from 'B x 0 'By to 'B x and 'By (by definition of
'B x 0 'By) and 9 and h are continuous, go 7rx and h 0 7ry are 'B x 0 'By-measurable.
Since products, sums, and pointwise limits of measurable functions are measurable,
the first assertion follows from Proposition 7.21. Also, every f E C c (X x Y) is
bounded and supported in a set of finite (J-l x v)-measure, hence is in £1 (J-l xv).
Fubini's theorem holds for such f even if J-l and v are not a-finite because one can
replace J-l and v by the finite measures J-l1 7r x(supp(f)) and vl7ry(supp(f)). I
It is now clear how to obtain a Radon measure on X x Y from Radon measures J-l
and v on X and Y. Namely, by Proposition 7.22 the formula 1(f) == J f d(J-l x v)
defines a positive linear functional on Cc(X x Y), so it determines a Radon measure
on X x Y by the Riesz representation theorem. We call this measure the Radon
product of J-l and v and denote it by J-l x v. The obvious question is: Does J-l x v agree
with J-l x v on 'B x 0 'By? In general, the answer is no. Indeed, a counterexample
may be obtained by taking X = JR, Y == JR d (JR with the discrete topology), J-l ==
Lebesgue measure, and v == counting measure. It is not hard to see that in this case
'B x x y == 'B x 0 'By, but Exercises 12 and 14 show that J-l x v is not semifinite and
that J-l x v is the semifinite part of J-l xv. However, some results are still available,
and in the a-finite case everything works out beautifully. In what follows, we employ
the notation of x-sections and y-sections introduced in 32.5.
7.23 Lemma.
a. If E E 'Bxxy, then Ex E 'By for all x E X and EY E 'Bx for all y E Y.
b. If f : X x Y C is 'B x x y-measurable, then fx is 'By-measurable for all
x E X and fY is 'B x -measurable for all y E Y.
Proof. The collection of all E c X x Y such that Ex E 'By and EY E 'B x for
all x, y is easily seen to be a a-algebra. It contains all open sets - if E is open,
so are Ex and EY, being inverse images of E under the maps y' 1-+ (x, y') and
x' 1-+ (x', y) - and hence it contains 'B x x y. This proves (a), and (b) follows since
(fx)-l(A) == (f-1(A))x and (fY)-l(A) == (f-1(A))Y. I
7.24 Lemma. If f E Cc(X x Y) and J-l and v are Radon measures on X and Y,
then the functions x 1-+ J f x dv and y 1-+ J fY dJ-l are continuous.
Proof. We write out the proof only for fx. It suffices to show that for any Xo E X
and E > 0 there is a neighborhood U of Xo such that Ilfx - fxo Ilu < E for x E U,
since then
J(Jx-fxo)dV < w(ny(supp(J))).
228 RADON MEASURES
However, for each y E 7T"y(supp(f)) there exist neighborhoods U y , V y of Xo and y
such that if (x, z) E U y x V y , then If(xo, y) - f(x, z)1 < E. We may choose a
finite subcover V y1 , . . . , V Yn of 7T"y(supp(f)) and then take U == n U yj ; details are
left to the reader. I
7.25 Proposition. Let J.L and v be Radon measures on X and Y. If U is open in
X x Y, then thefunctions x t-+ v(U x ) and y t-+ J.L(UY) are Borel measurable on X
and Y, and
f-L x v(U) = J v(Ux) df-L(x) = J f-L(UY) dv(y).
Proof. Let 9=" == {f E Cc(X x Y) : 0 < f < xu}. By Proposition 7.11 we have
Xu == sup{f : f E 9="} and hence XU x == sup{fx : f E 9="} and XUy == sup{fY :
f E 9="}. Thus by Proposition 7.12,
f-L x v(U) = sUP{J Id(f-L x v): I E},
v(U x ) = sup {J Ix dv : I E } , f-L(UY) = sup {J I Y df-L : I E } .
From Lemma 7.24 and Proposition 7.11 it follows that x t-+ v(U x ) and y r-+ J.L(UY)
are LSC and hence Borel measurable. Another application of Proposition 7.12,
together with Proposition 7.22, yields
f-L x v(U) = sup {JJ Ix dv df-L(x) : I E }
= J [sup {J Ix dv : I E }] df-L(x) = J v(Ux) df-L(x) ,
andlikewiseJ.L x v(U) == J J.L(UY)dv(y).
.
7.26 Theorem. Suppose that J.L and v are a-finite Radon measures on X and Y. If
E E 23 xxY , then the functions x t-+ v(Ex) and y t-+ J.L(EY) (which make sense by
Lemma 7.23) are Borel measurable on X and Y, and
f-L x v(E) = J v(Ex) df-L(x) = J f-L(EY) dv(y).
Moreover, the restriction of J.L x v to 'B x 0 'By is J.L x v.
Proof. For the moment, let us fix open sets U C X and V c Y with J.L(U) and
v(V) finite, and let W == U x V. Let JV( be the collection of all sets E E 'B x x Y
such that E n W satisfies the conclusions of the theorem. We then have
i. JV( contains all open sets, by Proposition 7.25.
PRODUCTS OF RADON MEASURES 229
11. If E, F E M and FeE, then E \ F E M; in particular, if F E M, then
FC == X \ F E M. Indeed, we have
J-L x v(E n W) == J-L x v(F n W) + J-L x v((E \ F) n W),
and likewise for v((EnW)x) and J-L((EnW)Y). Since the conclusions are true
for En Wand F n Wand all the sets involved have finite measure (this is why
we introduced W), we can subtract to obtain the conclusions for (E \ F) n W.
111. M is closed under finite disjoint unions. (This is simply the additivity of the
measures.)
IV. ]v( is closed under countable increasing unions, and hence (by (ii)) under count-
able decreasing intersections. (This follows from the monotone convergence
theorem.)
Now, let G == {A \ B : A, B open in X x Y}, and let A be the collection of finite
disjoint unions of sets in G. Since
(AI \ B 1 ) n (A 2 \ B 2 ) == (AI n A 2 ) \ (B 1 U B 2 ),
(A \ B)C == [(X x Y) \ A] U [(A n B) \ 0],
G is an elementary family, so by Proposition 1.7, A is an algebra. By Lemma 2.35,
the monotone class generated by A coincides with the a -algebra generated by A,
which is clearly 13 x x y. But by (i)-(iv) (since A \ B == A \ (A n B)), M contains
this monotone class, so M == 13 x x y .
N ext, since J-L and v are a-finite and outer regular, we have X == U Un and
Y == U V n where Un and V n are open and have finite measure, and we may assume
that the sequences { Un} and {V n } are increasing. If E E 13 x x y, the preceding
argument shows that En ( Un X V n ) satisfies the conclusions of the theorem for all n,
and the monotone convergence theorem then implies that E satifies the conclusions
too.
Finally, if E E 13 x x 13 y , by Tonelli's theorem we have
M x v(E) = J v(Ex) dM(X) = M x v(E),
and the proof is complete.
.
7.27 The Fubini-Tonelli Theorem for Radon Products. Let J-L and v be a-finite
Radon measures on X and Y, and let f be a Borel measurable function on X x Y.
Then f x and fY are Borel measurable for every x and y. If f > 0, then x 1---* J f x dv
and y 1---* J fY dJ-L are Borel measurable on X and Y. If f E £1 (J-L xv), then
fx E £1(v) for a.e. x, fY E £1(J-L) for a.e. y, and x 1---* J fx dv and y 1---* J fY dJ-L
are in £1(J-L) and £1(v). In both cases, we have
J f d(M xv) = J J f dM dv = J J f dv dM.
230 RADON MEASURES
Proof. The measurability of fx and fY was established in Lemma 7.23. The rest
of the proof is identical to the proof of the ordinary Fubini- Tonelli theorem, except
that Theorem 7.26 is used in place of Theorem 2.36. .
The extension of the notion of Radon products to any finite number of factors is
straightforward. More interestingly, the theory can be extended to infinitely many
factors provided that the spaces in question are compact and the measures on them
are normalized to have total mass 1.
To be precise, suppose that {XO:}O:EA is a family of compact Hausdorff spaces
and, for each a, /-Lo: is a Radon measure on Xo: such that /-Lo: (Xo:) == 1. Let
X == Ilo:EA Xo:, a compact Hausdorff space by Tychonoff's theorem. We would
like to define a Radon measure /-L on X such that if Eo: is a Borel set in Xo: for each
a and Eo: == Xo: for all but finitely many a, then /-L(Ilo:EA Eo:) == Ilo:EA /-Lo:(Eo:).
(The product on the right is well defined since all but finitely many factors are equal
to 1.) A bit of notation will be helpful: Given al, . . . , an E A, let 7r(O:l ,,,,,O:n) be the
natural projection from X onto Il Xo: j ,
7r(O:l,...,O:n)(x) == (XO:l'... ,xo: n ).
Thus 7r(o:ll ,...,O:n) (EO:l X ... x EO:n) == Ilo:EA Eo: where Eo: == Xo: for a =1=
a 1, . . . , an.
7.28 Theorem. Suppose that, for each a E A, /10: is a Radon measure on the
compact Hausdorff space Xo: such that /-Lo:(Xo:) == 1. Then there is a unique Radon
measure /-L on X == Ilo:EA Xo: such that for any al, . . . , an E A and any Borel set
E in Il Xo: j ,
/-L(7rll,...,Qn)(E)) == (/-LO:l X ... x /-LO:n)(E).
Proof. Let Gp(X) be the set of all f E G(X) that depend on only finitely many
coordinates, that is, all f of the form f == g 0 7r(O:l ,...,O:n) for some al, . . . , an E A
and g E G(Il Xo: j ). If f is such a function, we define
1(1) = J 9 d(f-tal X . . . x f-taJ.
Adding on some extra coordinates to the set al , . . . , an has no effect on this formula
since /-Lo: (Xo:) == 1 for all a. Thus l(f) is a well-defined positive linear functional
on GF(X), and 11(f) I < Ilfllu with equality when f is constant.
Now, GF(X) is clearly an algebra that separates points, contains constant func-
tions, and is closed under complex conjugation, so by the Stone-Weierstrass theorem
it is dense in G(X). Hence the functional 1 extends uniquely to a positive linear
functional of norm 1 on G (X), and the Riesz representation theorem therefore yields
a unique Radon measure /-L on X such that l(f) == J f d/-L for all f E Gp(X).
Given al,..., an E A, let /-L(O:l,...,O:n) == /-L 0 7r(o: l l,...,O:n)" Then /-L(O:l,...,O:n) is a
Borel measure on Il X 0: j that satisfies
J 9 df-t(al ,...,a n ) = J 9 0 71"( al ,...,a n ) df-t
NOTES AND REFERENCES 231
when 9 is the characteristic function of a Borel set, and hence (by the usual linearity
and approximation arguments) when 9 is any bounded Borel function. In particular,
from the definition of JL, for all 9 E C(TI Xa J ) we have
J 9 dJ-L(al ,...,a n ) = J 9 d(J-Lal X . . . x J-LaJ.
If we can show that JL(al ,...,a n ) is Radon, the uniqueness of the Riesz representation
will imply that JL(al ,...,a n ) = JLal X . . . X JLa n , which will complete the proof.
Let E be a Borel set in TI X aj , and write 7r == 7r(al,...,a n ) for short. Since
JL is regular, for any E > 0 there is a compact K C 7r- 1 (E) such that JL( K) >
JL(7r- I (E)) - E. Then K' == 7r(K) is a compact subset of E, and JL(al,...,an)(K') ==
JL(7r- I (K')) > JL(K) since K C 7r- I (K'), so JL(al,...,a n )(K') > JL(7r- I (E)) - E ==
JL(al ,...,a n ) (E) - E. Thus JL(al ,...,a n ) is inner regular, and the same argument applied
to EC shows that it is outer regular. Thus JL(al ,...,a n ) is Radon, and we are done. .
Exercises
28. If X is the set of ordinals less than or equal to the first uncountable ordinal WI,
with the order topology, then 23 x x x =1= 23 x Q?) 23 x. In fact, {(x, y) : x < Y < WI} is
open but not in 23 x Q?) 23 x. (Reexamine Exercise 47 in 32.5 in the light of Exercise
15 in 37.2.)
29. If X is a set of cardinality > c with the discrete topology, then 23 x x x =1=
23 x Q?) 23 x. In fact, D == {(x, Y) : x = y} is closed but not in 23 x Q?) 23 x. (D se
Exercise 5 in 31.2 and Proposition 1.23. If D E 23 x Q?) 23 x, then D E Jv( where
]v( is a sub-a-algebra of 23 x Q?) 23 x generated by a countable family of rectangles,
hence 'D E N Q?) N where N is a countably generated sub-a-algebra of 23 x. Then
{x} == Dx E N for all x, but card(N) < c.) The same reasoning applies if X is
replaced by its one-point compactification.
30. Let JL and v be Radon measures on X and Y, not necessarily a-finite. If I is a
nonnegative LSC function on X x Y, then x J Ix dv and y J IY dJL are Borel
measurable and J I d(JL xv) == JJ I dJL dv == JJ I dv dJL.
31. Some results concerning Baire sets on product spaces:
a. 23xY C 23 Q?) 23. (Hint: Proposition 7.22 remains true if 23 is replaced
by 23°.)
b. If X and Yare either compact or second countable, then 23 x Y = 23 Q?) 23.
c. If X is an uncountable set with the discrete topology, then 23 x x i=
23 Q?) 23.
7.5 NOTES AND REFERENCES
37.1 : The Riesz representation theorem is actually the work of many hands. F. Riesz
[116] first proved it for the case X == [a, b] C ffi(; he formulated the result in terms of
232 RADON MEASURES
Riemann-Stieltjes integrals and used no measure theory. It was extended to compact
subsets ofIR(n by Radon [111], to compact metric spaces by Banach (see Saks [128]),
and to compact Hausdorff spaces by Kakutani [80]. For the noncompact case, the
first general results were obtained by Markov [98], who characterized positive linear
functionals on BC(X) for a normal space X in terms of certain finitely additive set
functions. A theorem essentially equivalent to Theorem 7.2 was apparently known
to Bourbaki about 1940 (see Weil [158, 36]), but his treatment of integration was
not published until 1952, by which time several others had obtained similar results.
For more detailed references, see Dunford and Schwartz [35, 3IV.I6] and Hewitt and
Ross [75, 311]. See also Konig [86] for a generalization of the Riesz representation
theorem to spaces that are not locally compact.
Our use of the term "Radon measure," which derives from Radon's seminal paper
[Ill], is common but not entirely standard. Some authors refer to such measures as
"regular Borel measures"; others use the term "Radon measure" to mean a positive
linear functional on Cc(X), and still others define Radon measures to be inner regular
rather than outer regular on all Borel sets. It should also be noted that some older
texts define the Borel a-algebra to be the a-algebra generated by the compact sets,
which is in general smaller than our 23 x .
If J-L is a Radon measure, let J-L denote the complete, saturated extension of J-L
discussed at the end of 37.1. It is a significant fact that J-L is always decomposable
in the sense of Exercise 15 in 33.2; see Hewitt and Stromberg [76, Theorem 19.30].
Consequently, the extension of the Radon-Nikodym theorem in that exercise and the
fact that £1 ( J-L ) rv 00 ( J-L ) (Exercise 25 in 36.2) are available. In this connection one
should note that LP ( J-L ) is essentially identical to LP (JL) for p < 00, by Propositions
2.12 and 2.20.
37.2 See Cohn [27, Proposition 7.2.3] for a proof of Theorem 7.8 that does not
use the Riesz representation theorem.
Propositions 7.12 and 7.14 suggest an alternative way of constructing the Radon
measure J-L associated to a positive linear functional 1 on Cc(X) in the spirit of the
Daniell integral (see 32.8). Namely, one first extends 1 to nonnegative LSC functions
9 by setting
I(g) == sup{/(f): f E Cc(X), 0 < f < g}
and then extends 1 to arbitrary nonnegative functions h by setting
1 ( h) == inf { 1 (g) : 9 LS C, 9 > h}.
It is then not difficult to verify that if E c X, I(XE) == J-L*(E) where J-L* is the outer
measure in the proof of Theorem 7.2. For details, see Hewitt and Ross [75] or Hewitt
and Stromberg [76].
Kupka and Prikry [88] contains a readable discussion of some of the more advanced
topics in the theory of measures on LCH spaces.
37.3: Theorem 7.17 is frequently stated only for the case where X is compact;
however, the more general formulation follows easily from the compact case by
considering the one-point compactification of X. An interesting proof of the Baire
measure version of this result, quite different from ours, can be found in Hartig [67].
NOTES AND REFERENCES 233
37.4: The Fubini-Tonelli theorem for Radon products, as presented here, is
essentially due to deLeeuw [31]; see also Cohn [27, 37.6]. Another variant of this
theorem, which includes some further results for the non-a-finite case, can be found
in Hewitt and Ross [75, 313].
Theorem 7.28 is essentially due to Nelson [103]. There is also a purely measure-
theoretic version of this result: If {(Xo:, Mo:, J.LaJ }O:EA is a family of measure
spaces with /-Lo: (X 0:) == 1 for all a, one can define a product measure TIo: J.L0: on
(TIo: Xo:, Q90: Mo:). See Saeki [127], Halmos [62, 338], or Hewitt and Stromberg
[76, 322]. The hypotheses of Theorem 7.28 are more restrictive, but the conclusion
is stronger; in particular, the domain of the measure J.L in Theorem 7.28 is the Borel
a-algebra on TIo: Xo:, which is much larger than Q90: 23 Xa when the index set A is
uncountable.
Elements of Fourier
Analysis
It is easy to say that Fourier analysis, or harmonic analysis, originated in the work
of Euler, Fourier, and others on trigonometric series; it is much harder to describe
succinctly what the subject comprises today, for it is a meeting ground for ideas from
many parts of analysis and has applications in such diverse areas as partial differential
equations and algebraic number theory. Two of the central ingredients of harmonic
analysis, however, are convolution operators and the Fourier transform, which we
study in this chapter.
8.1 PRELIMINARIES
We begin by making some notational conventions. Throughout this chapter we
shall be working on n, and n will always refer to the dimension. In any measure-
theoretic considerations we always have Lebesgue measure in mind unless we specify
otherwise. Thus, if E is a measurable set in n, we shall denote LP (E, m) by LP (E).
If U is open in n and kEN, we denote by Ck(U) the space of all functions on
U whose partial derivatives of order < k all exist and are continuous, and we set
COC>(U) = n Ck(U). Furthermore, for any E c n we denote by C(E) the
space of all COC> functions on n whose support is compact and contained in E.
If E == n or U == n, we shall usually omit it in naming function spaces: thus,
LP == LP(n), C k == ck(n), C = c(n). Ifx,y E n, we set
n
x. Y == LXjYj,
1
IXI = vrx:x.
235
236 ELEMENTS OF FOURIER ANALYSIS
It will be convenient to have a compact notation for partial derivatives. We shall
wri te
a
a j == _ a '
x.
J
and for higher-order derivatives we use multi-index notation. A multi-index is an
ordered n-tuple of nonnegative integers. If a == (al'...' an) is a multi-index, we
set
n
lal == L aj,
I
n
II/I == IT II/ . I
u:.. u:.J.'
I
aa == ( ) al ... ( ) an ,
aX I aX n
and if x == (Xl, . . . , X n ) E n,
n
IT a"
X a == X j J .
I
(The notation lal == I:aj is inconsistent with the notation Ixi == (I:x;)1/2, but the
meaning will always be clear from the context.) Thus, for example, Taylor's formula
for functions f E Ck reads
f(x) = L (8 a 1)(xo) (x - o)a + Rk(X),
a.
lal:::;k
lirn IRk(X)1 == 0
x-+xo Ix - Xo I k '
and the product rule for derivatives becomes
8 a (fg) = L (3!' (8 f3 1)(8"1 g)
{3+,=a .1.
(Exercise 1).
We shall often avail ourselves of the sloppy but handy device of using the same
notation for a function and its value at a point. Thus, "x a " may be used to denote the
function whose value at any point x is x a .
Two spaces of Coo functions on n will be of particular importance for us. The
first is the space C of Coo functions with compact support. The existence of
nonzero functions in C is not quite obvious; the standard construction is based on
the fact that the function rJ(t) == e-l/tX(O,oo)(t) is Coo even at the origin (Exercise
3). If we set
(8.1)
'Ij; ( x) = 1] (1 _ I X 1 2 ) = { XP [ ( 1 X 1 2 - 1) -1 ]
if I x I < 1,
if Ixi > 1,
it follows that 'l/J E Coo, and supp( 'l/J) is the closed unit ball. In the next section we
shall use this single function to manufacture elements of C in great profusion; see
Propositions 8.17 and 8.18.
The other space of Coo functions we shaH need is the Schwartz space 3 consisting
of those Coo functions which, together with all their derivatives, vanish at infinity
PRELIMINARIES 237
faster than any power of Ixl. More precisely, for any nonnegative integer N and any
multi-index a we define
II f II (N,a) == sup (1 + I X I) N I aa f ( X ) I;
xEIRn
then
3 == {f E C(X) : Ilfll(N,a) < 00 for all N, a}.
Examples of functions in S are easy to find: for instance, fa(x) == xae-lxl2 where
a is any multi-index. Also, clearly C C S.
It is an important observation that if f E 3, then aa f E LP for all a and all
p E [1,00]. Indeed, laa f(x)1 < C N (l + Ixl)-N for all N, and (1 + Ixl)-N E LP
for N > nip by Corollary 2.52.
8.2 Proposition. 3 is a Frechet space with the topology defined by the norms
II . II (N,a).
Proof. The only nontrivial point is completeness. If {fk} is a Cauchy sequence in
S, then II fj - fk II (N,a) 0 for all N, a. In particular, for each a the sequence {aa fk}
converges uniformly to a function ga. Denoting by ej the vector (0, . . . , 1, . . . ,0)
with the 1 in the jth position, we have
h(x + tej) - h(x) = it OJ!k(X + sej) ds.
Letting k 00, we obtain
go (x + tej) - go (x) = it gej (x + sej) ds.
The fundamental theorem of calculus implies that gej == a j go, and an induction on
lal then yields ga == aa go for all a. It is then easy to check that Ilfk - goll(N,a) 0
for all a. .
Another useful characterization of S is the following.
8.3 Proposition. If f E C(X), then f E 3 iff x!3 aa f is bounded for all multi-indices
a, {3 iff aa (x!3 f) is bounded for all multi-indices a, {3.
Proof. Obviously Ix!31 < (1 + Ixl)N for 1{31 < N. On the other hand, L IXj IN
is strictly positive on the unit sphere Ixl == 1, so it has a positive minimum 8 there. It
follows that L IXj IN > 81xl N for all x since both sides are homogeneous of degree
N, and hence
n
(1 + Ixl)N < 2 N (1 + IxI N ) < 2 N [1 + 8- 1 L Ixfl] < 2 N 8- 1 L Ix(3l.
1 1!3I:S N
This establishes the first equivalence. The second one follows from the fact that each
aa (x!3 f) is a linear combination of terms of the form x' a 6 f and vice versa, by the
product rule (Exercise 1). .
238 ELEMENTS OF FOURIER ANALYSIS
We next investigate the continuity of translations on various function spaces. The
following notation for translations will be used throughout this chapter and the next
one: If f is a function on JRn and y E JRn,
Tyf(x) == f(x - y).
We observe that IITyfll p == Ilfll p for 1 < p < 00 and that IITyfilu == Ilfllu. A
function f is called uniformly continuous if IITyf - fllu 0 as y O. (The
reader should pause to check that this is equivalent to the usual E-8 definition of
uniform continuity.)
8.4 Lemma. If f E Cc(ffi(n), then f is uniformly continuous.
Proof. Given E > 0, for each x E supp(f) there exists 8x > 0 such that
If(x-y)- f(x)1 < EiflYI < 8x. Sincesupp(f) is compact, there exist XI, ... ,XN
such that the balls of radius 8xj about Xj cover supp(f). If 8 = mini 8 xj }, then,
one easily sees that IITyf - fllu < E whenever Iyl < 8. .
8.5 Proposition. If 1 < p < 00, translation is continuous in the LP norm; that is, if
f E LP and z E ffi(n, then limyo IITy+zf - Tzfll p == O.
Proof. Since Ty+z == TyTz, by replacing f by Tzf it suffices to assume that z = O.
First, if 9 E C c , for Iyl < 1 the functions Tyg are all supported in a common compact
set K, so by Lemma 8.4,
J ITyg - glP < IITyg - gll m(K) -; 0 as y -; o.
Now suppose f E LP. If E > 0, by Proposition 7.9 there exists g E C c with
Ilg - flip < E/3, so
I\Tyf - flip < IITy(f - g) lip + IITyg - gllp + Ilg - flip < E + IITyg - gllp,
and /lTyg - g/lp < E/3 if y is sufficiently small.
.
Proposition 8.5 is false for p == 00, as one should expect since the L 00 norm is
closely related to the uniform norm; see Exercise 4.
Some of our results will concern multiply periodic functions in ffi(n, and for
simplicity we shall take the fundamental period in each variable to be 1. That is, we
define a function f on }Rn to be periodic if f(x + k) = f(x) for all x E ffi(n and
k E zn. Every periodic function is thus completely determined by its values on the
unit cube
Q= [_,!)n.
Periodic functions may be regarded as functions on the space jRn jzn r-v (ffi(jz)n of
co sets of zn, which we call the n-dimensional torus and denote by 1'n. (When
n == 1 we write l' rather than 1'1.) Tn is a compact Hausdorff space; it may be
CONVOLUTIONS 239
identified with the set of aU Z == (Zl'...' Zn) E (Cn such that IZjl == 1 for all j, via
the map
( ) ( 27riXl 27rix n )
X I, . . . , x n e , . . . , e .
On the other hand, for measure-theoretic purposes we identify Tn with the unit cube
Q, and when we speak of Lebesgue measure on Tn we mean the measure induced on
Tn by Lebesgue measure on Q. In particular, m(Tn) == 1. Functions on Tn may be
considered as periodic functions on JRn or as functions on Q; the point of view will
be clear from the context when it matters.
Exercises
1. Prove the product rule for partial derivatives as stated in the text. Deduce that
80:(x{3 f) == x{380:f + L C,8 x88 ' f, x{380: f == oO:(x{3 f) + L c88'(x8 f)
for some constants C,8 and C8 with C,8 == c8 == 0 unless IJiI < lal and 1£51 < 11'1.
2. Observe that the binomial theorem can be written as follows:
( ) k ""'" k! 0:
Xl + X2 == ,x
Q.
IQI=k
Prove the following generalizations:
a. The multinomial theorem: If x E JRn,
(x == (Xl,X2), a == (al,a2)).
( ) k ""'" k! 0:
Xl + . . . + X n == ,X .
a.
lal=k
b. The n-dimensional binomial theorem: If x, y E JRn,
( ) Q _ ""'" a! {3,
x + y - {3T'x y .
(3+,=Q .Ji.
3. Let T}(t) == e- l / t for t > 0, T}(t) == 0 for t < o.
a. For kEN and t > 0, T}(k)(t) == P k (l/t)e- l / t where P k is a polynomial of
degree 2k.
b. T}(k) (0) exists and equals zero for all kEN.
4. If f E LOC> and IITyf - flloc> --7 0 as y --7 0, then! agrees a.e. with a uniformly
continuous function. (Let Ar! be as in Theorem 3.18. Then Arf is uniformly
continuous for r > 0 and uniformly Cauchy as r --7 0.)
8.2 CONVOLUTIONS
Let ! and 9 be measurable functions on JRn. The convolution of f and 9 is the
function! * 9 defined by
f * g(x) = J f(x - y)g(y) dy
240 ELEMENTS OF FOURIER ANALYSIS
for all x such that the integral exists. Various conditions can be imposed on f and
9 to guarantee that f * 9 is defined at least almost everywhere. For example, if f is
bounded and compactly supported, 9 can be any locally integrable function; see also
Propositions 8.7-8.9 below.
In what follows, we shall need the fact that if f is a measurable function on JRn,
then the function K (x, y) == f (x - y) is measurable on JRn X JRn. We have K == f 0 S
where s(x, y) == x - y; since s is continuous, K is Borel measurable if f is Borel
measurable. This can always be assumed without affecting the definition of f * g, by
Proposition 2.12. However, the Lebesgue measurability of K also follows from the
Lebesgue measurability of f; see Exercise 5.
The elementary properties of convolutions are summarized in the following propo-
sition.
8.6 Proposition. Assuming that all integrals in question exist, we have
a. f * 9 == 9 * f,
b. (f * g) * h == f * (g * h).
c. For z E JRn, Tz(f * g) == (Tzf) * 9 == f * (Tzg).
d. If A is the closure of {x + y : x E supp(f), Y E supp(g)}, then supp(f * g) c
A.
Proof. (a) is proved by the substitution z == x - y:
J * g(x) = 1 J(x - y)g(y) dy = 1 J(z)g(x - z) dz = 9 * J(x).
(b) follows from (a) and Fubini's theorem:
l U *g)*h(x) = 11 J(y)g(x-z-y)h(z)dydz
= 11 J(y)g(x - y - z)h(z) dz dy = J * (g * h) (x).
As for (c),
TzU * g)(x) = 1 J(x - z - y)g(y) dy = 1 Tz!(X - y)g(y) dy = (Tz!) * g(x),
and by (a),
Tz(f * g) == Tz(g * f) == (Tzg) * f == f * (Tzg).
For (d), we observe that if x tj. A, then for any y E supp(g) we have x- y tj. supp(f);
hence f(x - y)g(y) == 0 for all y, so f * g(x) = O. .
The following two propositions contain the basic facts about convolutions of LP
functions.
8.7 Young's Inequality. If fELl and 9 E LP (1 < p < (0), then f * g(x) exists
for almost every x, f * 9 E LP, and IIf * gllp < IIfll1l1gllp.
CONVOLUTIONS 241
Proof. This is a special case of Theorem 6.18, with K(x, y) == f(x - y).
Alternatively, one can use Minkowski's inequality for integrals:
Ilf * gllp = II/ f(y)g(. - y) dyll p < / If(y)IIITygllp dy = Ilflhllgllp'
.
8.8 Proposition. If p and q are conjugate exponents, f E LP, and 9 E Lq, then
f * 9 (x) exists for every x, f * 9 is bounded and unifonnly continuous, and II f * g" u <
Ilfllpllgllq. Ifl < p < 00 (so that 1 < q < 00 also), then f * 9 E Co (IRn).
Proof. The existence of f * 9 and the estimate for IIf * gllu follow immediately
from Holder's inequality. In view of Propositions 8.5 and 8.6, so does the uniform
continuity of f * g: If 1 < p < 00,
IITy(f * g) - f * gllu == II(Tyf - f) * glloo < IITyf - fllpllgll q -+ 0 as y -+ o.
(If p == 00, interchange the roles of f and g.) Finally, if 1 < p, q < 00, choose
sequences {fn} and {gn} of functions with compact support such that IIfn - flip -+ 0
and Ilgn - gl\q -+ O. By Proposition 8.6d and what we have just proved, fn * gn E C c .
But
Ilfn * gn - f * gllu < Ilfn - fllpllgnllq + IIfllplign - gllq -+ 0,
so f * 9 E Co by Proposition 4.35.
.
The preceding results are all we shall use, but for the sake of completeness we
state also the following generalization.
8.9 Proposition. Suppose 1 < p, q, r < 00 and p-1 + q-1 == r- 1 + 1.
a. (Young's Inequality, General Form) If f E LP and 9 E Lq, then f * 9 E Lr
and Ilf * gllr < IIfllpIIgllq.
b. Suppose also that p > 1, q > 1, and r < 00. Iff E LP and 9 E weak Lq, then
f * 9 E Lr and IIf * gllr < C pq IIfllp[g]q where C pq is independent of f and g.
c. Suppose that p == 1 and r == q > 1. If fELl and 9 E weak Lq, then
f * 9 E weak Lq and [f * g]q < C q II fill, where C q is independent of f and g.
Proof. To prove (a), let q be fixed. The special cases p == 1, r == q and
p == q/(q - 1), r == 00 are Propositions 8.7 and 8.8. The general case then follows
from the Riesz-Thorin interpolation theorem. (See also Exercise 6 for a direct proof.)
(b) and (c) are special cases of Theorem 6.36. .
One of the most important properties of convolution is that, roughly speaking,
f * 9 is at least as smooth as either f or g, because formally we have
(Y>(f * g)(x) = 8 a / f(x - y)g(y) dy = / 8 a f(x - y)g(y) dy = (8 a 1) * g(x),
242 ELEMENTS OF FOURIER ANALYSIS
and similarly 8 a (f * g) == f * (8 a g). To make this precise, one needs only to impose
conditions on f and 9 so that differentiation under the integral sign is legitimate. One
such result is the following; see also Exercises 7 and 8.
8.10 Proposition. If fELl, 9 E C k , and 8 a g is bounded for lal < k, then
f * 9 E C k and 8 a (f * g) == f * (8a g )for lal < k.
Proof. This is clear from Theorem 2.27.
.
8.11 Proposition. If f, 9 E S, then f * 9 E S.
Proof. First, f * 9 E Coo by Proposition 8.10. Since
(8.12) 1 + Ix I < 1 + Ix - yl + Iyl < (1 + Ix - yl)(1 + Iyl),
we have
(1 + jx1)N18 Q (f * g)(x)j < J (1 + Ix - yl)NI8 Q f(x - y)l(l + lyI)Nlg(y)1 dy
< IlfllcN,Q)!lgIICN+nH,Q) J (1 + !yl)-n-l dy,
which is finite by Corollary 2.52.
.
Convolutions of functions on the torus Tn are defined just as for functions on JRn.
(If one regards functions on Tn as periodic functions on jRn, of course, the integration
is to be extended over the unit cube rather than JRn.) All of the preceding results
remain valid, with the same proofs.
The following theorem underlies many of the important applications of convolu-
tions on JRn. We introduce a bit of notation that will be used frequently hereafter: If
cP is any function on JRn and t > 0, we set
(8.13)
CPt ( x) == t - n cP ( t -1 X ) .
We observe that if cP E L 1 , then J CPt is independent of t, by Theorem 2.44:
J 1Yt = J 1>(t-lx)C n dx = J 1>(y) dy = J 1>.
Moreover, the "mass" of CPt becomes concentrated at the origin as t ---r O. (Draw a
picture if this isn't clear.)
8.14 Theorem. Suppose cP E L 1 and J cp(x) dx == a.
a. If f E LP (1 < p < (0), then f * CPt ---r af in the LP nonn as t ---r O.
b. If f is bounded and unifonnly continuous, then f * CPt ---r af unifonnly as
t ---r O.
c. If f E Loo and f is continuous on an open set U, then f * CPt ---r af unifonnly
on compact subsets of U as t ---r o.
CONVOLUTIONS 243
Proof. Setting y == tz, we have
f * 4>t(x) - af(x) = j[f(X - y) - f(x)]4>t(Y) dy
= jU(x - tz) - f(x)]4>(z) dz
= jhzf(x) - f(x)]4>(z) dz.
Apply Minkowski's inequality for integrals:
Ilf * 4>t - afll p < J Ilrtzf - fll p l4>(z)1 dz.
Now, IITtzf - flip is bounded by 211fll p and tends to 0 as t -+ 0 for each z, by
Proposition 8.5. Assertion (a) therefore follows from the dominated convergence
theorem.
The proof of (b) is exactly the same, with II. lip replaced by 11.llu. The estimate for
IIf * cjJt - afllu is obvious, and IITtzf - fllu -+ 0 as t -+ 0 by the uniform continuity
of f.
As for (c), given E > 0 let us choose a compact E c IRn such that fEc IcjJl < E.
Also, let K be a compact subset of U. If t is sufficiently small, then, we will have
x - tz E U for all x E K and z E E, so from the compactness of K it follows as in
Lemma 8.4 that
sup If(x - tz) - f(x)/ < E
xEK, zEE
for small t. But then
sup If * cjJt(x) - af(x)1 < sup [ r + ( ] If(x - tz) - f(x)llcjJ(z)1 dz
xEK xEK JE lEc
< E j 14>1 + 211fllooE,
from which (c) follows. .
If we impose slightly stronger conditions on cP, we can also show that f * cPt -+ af
almost everywhere for f E LP. The device in the following proof of breaking up an
integral into pieces corresponding to the dyadic intervals [2 k , 2k+l] and estimating
each piece separately is a standard trick of the trade in Fourier analysis.
8.15 Theorem. Suppose IcjJ(x) I < C(l+lxl)-n-EforsomeC,E > o (which implies
that cjJ E L 1 by Corollary 2.52), and f cjJ(x) dx == a. If f E LP (1 < p < (0), then
f * cjJt(x) -+ af(x) as t -+ 0 for every x in the Lebesgue set of f - in particular:
for almost every x, andfor every x at which f is continuous.
Proof. If x is in the Lebesgue set of f, for any 8 > 0 there exists TJ > 0 such that
(8.16) 1 If(x - y) - f(x)1 dy < 8r n forr < TJ.
Iyl<r
244 ELEMENTS OF FOURIER ANALYSIS
Let us set
h = r If(x - y) - f(x)ll<pt(y)1 dy,
J 1yl <TJ
1 2 = r If(x - y) - f(x)ll<pt(y)1 dy.
J 1yl ?TJ
We claim that II is bounded by A8 where A is independent of t, whereas 1 2 ---r 0 as
t ---r O. Since
If * 4>t ( x) - a f ( x ) I < II + 1 2 ,
we will have
limsup If * 4>t(x) - af(x)1 < A8,
t-+O
and since 8 is arbitrary, this will complete the proof.
To estimate II, let K be the integer such that 2 K < TJ j t < 2 K + 1 if TJ j t > 1,
and K == 0 if TJjt < 1. We view the ball Iyl < TJ as the union of the annuli
2-kTJ < Iyl < 21-kTJ (1 < k < K) and the balllyl < 2- K TJ. On the kth annulus we
use the estimate
I<pt (y) I < CC n I I- n -€ < CC n [ 2-t k 1] ] -n-€ ,
and on the ball I y I < 2 - K TJ we use the estimate I cPt (y) I < Ct- n . Thus
h < t CC n [ 2- kTJ ] -n-€ r If(x - y) - f(x)\ dy
1 t J2- k TJ'5:lyl<2 1 - k TJ
+cc n r If(x - y) - f(X)1 dy.
J 1yl <2- K TJ
Therefore, by (8.16) and the fact that 2 K < TJjt < 2K+l,
1 1 < C8)21-k1])ncn [ 2-t k 1] ] -n-€ + C8t- n (2- K 1]t
= 2 n C8 [ ] -€ 2k€ + C8 [ 2-;1] ] n
[ TJ ] -E 2(K+l)E - 2 E [ 2-K TJ ] n
== 2 n C8 - 2 + C8
t E - 1 t
< 2 n C[2 E (2 E _1)-1 + 1]8.
As for 1 2 , if p' is the conjugate exponent to p and X is the characteristic function of
{y : Iyl > TJ}, by Holder's inequality we have
h < r (If(x-y)I+lf(x)l)l<pt(y)ldy
J 1yl ?TJ
< II f II p II X 4>t II p' + If ( x ) III X 4>t III ,
CONVOLUTIONS 245
so it suffices to show that for 1 < q < 00, and in particular for q == 1 and q == p',
Ilx<Pt Il q 0 as t O. If q == 00, this is obvious:
II x4>t 1100 < Ct- n [1 + (1] /t) ] -n-E == CtE (t + 1]) -n-E < C1] -n-Et E .
If q < 00, by Corollary 2.51 we have
Ilx4>tll = r C nQ I4>(C 1 y)IQ dy = tn(l-q) 1 14>(z)lq dz
J 1yl ?.rl I z l2::17/ t
1 00 [ 'n ] n-(n+E)q
< C 1 t n (1- q ) rn-1-(n+E)q dr == C 2 t n (1- q ) :!.. == C 3 t Eq .
17/ t t
In either case, Ilx4>t Ilq is dominated by t E , so we are done. .
In most of the applications of the preceding two theorems one has a == 1, although
the case a = 0 is also useful. If a == 1, {4>t}t>o is called an approximate identity, as
it furnishes an approximation to the identity operator on LP by convolution operators.
This construction is useful for approximating LP functions by functions having
specified regularity properties. For example, we have the following two important
results:
8.17 Proposition. C (and hence also S) is dense in LP (1 < p < (0) and in Co.
Proof. Given f E LP and E > 0, there exists 9 E C c with Ilf - gllp < E/2, by
Proposition 7.9. Let 4> be a function in C such that J 4> == 1 - for example, take
4> == (J 1/;) -I1/; where 1/; is as in (8.1). Then 9 * 4>t E C by Propositions 8.6d and
8.10, and IIg * cPt - gllp < E/2 for sufficiently small t by Theorem 8.14. The same
argument applies if LP is replaced by CO, II . lip by II . Ilu, and Proposition 7.9 by
Proposition 4.35. .
8.18 The Coo Urysohn Lemma. If K c JRn is compact and U is an open set
containing K, there exists f E C such that 0 < f < 1, f == 1 on K, and
supp(f) c U.
Proof. Let 8 == p(K, UC) (the distance from K to UC, which is positive since
K is compact), and let V == {x : p( x, K) < 8/3}. Choose a nonnegative 4> E C
such that J <P == 1 and cP(x) = 0 for Ixl > 8/3 (for example, (J 1/;)-11/;8/3 with 1/;
as in (8.1)), and set f == Xv * 4>. Then f E C by Propositions 8.6d and 8.10, and
it is easily checked that 0 < f < 1, f == 1 on K, and supp(f) c {x : p(x, K) <
28/3} c U. .
Exercises
5. If 8 : n X JRn JRn is defined by 8(X, y) == X - y, then 8- 1 (E) is Lebesgue
measurable whenever E is Lebesgue measurable. (For n == 1, draw a picture of
8- 1 (E) C JR2. It should be clear that after rotation through an angle 7r /4, 8- 1 (E)
246 ELEMENTS OF FOURIER ANALYSIS
becomes F x IR where F == {x : J2 x E E}, and Theorem 2.44 can be applied. The
same idea works in higher dimensions.)
6. Prove Theorem 8.9a by using Exercise 31 in 36.3 to show that
If * g(xW < Ilfll;-PlIgll-q J If(y)IPlg(x - yW dy.
7. If f is locally integrable on JRn and 9 E C k has compact support, then f * 9 E C k .
8. Suppose that f E LP(JR). If there exists h E LP(JR) such that
lim Ily-1 (T -yf - f) - hll == 0,
yo P
we call h the (strong) LP derivative of f. If I E LP(JRn), LP partial derivatives of f
are defined similarly. Suppose that p and q are conjugate exponents, f E LP, 9 E Lq,
and the LP derivative 8j f exists. Then 8j (I * g) exists (in the ordinary sense) and
equals (8j I) * g.
9. If f E LP(JR), the LP derivative of f (call it h; see Exercise 8) exists iff f is
absolutely continuous on every bounded interval (perhaps after modification on a null
set) and its pointwise derivative I' is in LP, in which case h == I' a.e. (For "only if,"
use Exercise 8: If 9 E C c with J 9 == 1, then f * gt --7 f and (f * gt)' -+ h as t -+ O.
For "if," write
f (x + y) - f ( x) _ l' (x) = r [J' (x + t) - l' (x )] dt
Y Y Jo
and use Minkowski's inequality for integrals.)
10. Let cP satisfy the hypotheses of Theorem 8.15. If f E LP (1 < p < ex)), define
the q,-maximal function of f to be M<jJf(x) == SUPt>o If * cPt(x)l. (Observe that
the Hardy-Littlewood maximal function H f is M<jJlfl where cP is the characteristic
function of the unit ball divided by the volume of the ball.) Show that there is
a constant C, independent of f, such that M<jJf < C . H f. (Break up the integral
J f(x-Y)cPt(y)dyasthesumoftheintegralsoverlyl < tandover2 k t < Iyl < 2k+1t
(k == 0,1,2,. . .), and estimate cPt on each region.) It follows from Theorem 3.17 that
M<jJ is weak type (1,1), and the proof of Theorem 3.18 can then be adapted to give an
alternate demonstration that f * cPt -+ (J cP) f a.e.
11. Young's inequality shows that L 1 is a Banach algebra, the product being convo-
lution.
a. If J is an ideal in the algebra L 1 , so is its closure in L 1 .
b. If fELl, the smallest closed ideal in L 1 containing f is the smallest closed
subspace of L 1 containing all translates of f. (If 9 E C c , f * g(x) can be
approximated by sums L f(x - Yj)g(Yj)Yj. On the other hand, if {cPt} is an
approximate identity, f * Ty (cPt) -+ Tyf as t -+ 0.)
THE FOURIER TRANSFORM 247
8.3 THE FOURIER TRANSFORM
One of the fundamental principles of harmonic analysis is the exploitation of sym-
metry. To be more specific, if one is doing anal ysis on a space on which a group acts,
it is a good idea to study functions (or other analytic objects) that transform in simple
ways under the group action, and then try to decompose arbitrary functions as sums
or integrals of these basic functions.
The spaces we are studying are JRn and Tn, which are Abelian groups under addi-
tion and act on themselves by translation. The building blocks of harmonic analysis
on these spaces are the functions that transform under translation by multiplication
by a factor of absolute value one, that is, functions f such that for each x there is a
number cjJ(x) with IcjJ(x) I == 1 such that f(y + x) == cP(x)f(y). If f and cP have this
property, then f(x) == cjJ(x)f(O), so f is completely determined by cp once f(O) is
gIven; moreover,
cP(x)cp(y)f(O) == cP(x)f(y) == f(x + y) == cp(x + y)f(O),
so that (unless f == 0) cjJ(x + y) == cjJ(x)cjJ(y). In short, to find all f's that transform
as described above, it suffices to find all cp's of absolute value one that satisfy the
functional equation cp(x + y) == cp(x)cp(y). Upon imposing the natural requirement
that cP should be measurable, we have a complete solution to this problem.
8.19 Theorem. If cP is a measurable function on JRn (resp. Tn) such that cP( x + y) ==
cP(x)cP(Y) and IcPl == 1, there exists E n (resp. E Tn) such that cP(x) == e27ri.x.
Proof. We first prove this assertion on IPL Let a E JR be such that IDa cjJ( t) dt =I=- 0;
such an a surely exists, for otherwise the Lebesgue differentiation theorem would
imply that cP == 0 a.e. Setting A == (IDa cP(t) dt)-1, then, we have
cj>(X) = A la cj>(x)cj>(t) dt = A la cj>(x + t) dt = A l x + a cj>(t) dt.
Thus cjJ, being the indefinite integral of a locally integrable function, is continuous;
and then, being the integral of a continuous function, it is 0 1 . Moreover,
cP'(x) == A[cP(x + a) - cP(x)] == BcP(x), where B == A[cP(a) - 1].
It follows that (d/dx)(e-BxcP(x)) == 0, so that e-BxcjJ(x) is constant. Since cP(O) ==
1, we have cp(x) == e Bx , and since IcPl == 1, B is purely imaginary, so B == 27ri
for some E JR. This completes the proof for ; as for T, the cjJ we have been
considering will be periodic (with period 1) iff e27ri == 1 iff E Z.
The n-dimensional case follows easily, for if el, . . . , en is the standard basis for
JRn, the functions 1/;j (t) == cP( tej) satisfy 1/;j (t + s) == 1/;j (t )1/;j (s) on JR, so that
'ljJj (t) == e21fiJ t, and hence
n n
cj>(x) = cj>(2: xjej) = II 'lj;j (Xj) = e27ri.x.
1 1
.
248 ELEMENTS OF FOURIER ANALYSIS
The idea now is to decompose more or less arbitrary functions on Tn or JRn in
terms of the exponentials e21ri -x. In the case of Tn this works out very simply for
L2 functions:
8.20 Theorem. Let E"" (x) == e 21ri ",,-x. Then {E"" : "" E zn} is an orthonormal basis
of L2 (Tn).
Proof. Verification of orthonormality is an easy exercise in calculus; by Fubini's
theorem it boils down to the fact that Jo l e21rikt dt equals 1 if k = 0 and equals 0
otherwise. Next, since E""E).. == E",,+).., the set of finite linear combinations of the
E",,'s is an algebra. It clearly separates points on Tn; also, Eo == 1 and E "" == E_"".
Since Tn is compact, the Stone-Weierstrass theorem implies that this algebra is dense
in C (Tn) in the uniform norm and hence in the L2 norm, and C (Tn) is itself dense
in L 2 (Tn) by Proposition 7.9. It follows that {E",,} is a basis. .
To restate this result: If 1 E L2 (Tn), we define its Fourier transform J, a
function on zn, by
j(K) = (1, E K ) = r f(x)e-27riK'X dx.
JTn
and we call the series
L !("")E",,
""EZ n
the Fourier series of I. The term "Fourier transform" is also used to mean the map
1 1. Theorem 8.20 then says that the Fourier transform maps L2(Tn) onto [2 (zn),
that 111112 == 111112 (Parseval's identity), and that the Fourier series of f converges to
1 in the £2 norm. We shall consider the question of pointwise convergence in the
next two sections.
Actually, the definition of 1( "") makes sense if f is merely in Ll (Tn), and 11( "") 1 <
"f Ill, so the Fourier transform extends to a norm-decreasing map from Ll (Tn) to
Zoo (zn). (The Fourier series of an L 1 function may be quite badly behaved, but there
are still methods for recovering f from 1 when 1 EL I , as we shall see in the next
section.) Interpolating between L 1 and £2, we have the following result.
8.21 The Hausdorff-Young Inequality. Suppose that 1 < p < 2 and q is the
conjugate exponent to p. If f E LP(Tn), then f E zq(zn) and Ilillq < 11111p.
Proof. Since 1111100 < IIflll and 111112 == 111112 for 1 E Ll or 1 E L 2 , the
assertion follows from the Riesz- Thorin interpolation theorem. .
The situation on JRn is more delicate. The formal analogue of Theorem 8.20
should be
f(x) = r j(Oe 27ri l;'x d, where j() = r f(x)e- 27ri l;'x dx.
Jn Jn
THE FOURIER TRANSFORM 249
These relations turn out to be valid when suitably interpreted, but some care is needed.
In the first place, the integral defining f() is likely to diverge if f E £2. However, it
certainly converges if f E £1. We therefore begin by defining the Fourier transform
of f E £l(Jn) by
g:- f() = [() = r f(x)e-27ri.x dx.
JIRn
(We use the notation for the Fourier transform only in certain situations where it
is needed for clarity.) Clearly IIfllu < IlfllI, and lis continuous by Theorem 2.27;
thus
: £1(IRn) BC(IR n ).
We summarize the elementary properties of in a theorem.
8.22 Theorem. Suppose f, 9 E £1 (IRn).
a. (Tyf)() == e-21'ii.y [() and Try(!) == Ii where h(x) == e 2 1'iiTJ'x f(x).
b. 1fT is an invertible linear transformation ofIRn and S == (T*) -1 is its inverse
transpose, then (f 0 Tf' == I det TI- 1 ! 0 S. In particular, if T is a rotation,
then (f 0 T)-- == ! 0 T; and ifTx == t- 1 x (t > 0), then (f 0 T)--() == t n !(t),
so that (ft)() == !( t) in the notation of (8.13).
-- -..,......
c. (f * g) == f g.
d. Ifx a f E £1 for lal < k, then! E C k and 8 a ! == [( -27rix)a f]-:
e. If f E Ck, 8 a f E £1 for lal < k, and 8 a f E Co for lal < k - 1, then
(8 a f)() == (27ri)a [( ).
f (The Riemann-Lebesgue Lemma) (Ll(IRn)) c Co (IRn).
Proof. a. We have
(7yfn) = J f(x - y)e-2-rri'x dx = J f(x)e-27ri.(X+Y) dx = e- 27rib [(),
and similarly for the other formula.
b. By Theorem 2.44,
(f 0 Tn) = J f(Tx)e-27ri.x dx = I det TI- 1 J f(x)e-2-rri.T-lx dx
= I det T\-l J f(x)e-27riS.x dx = I det TI- 1 [(S).
250 ELEMENTS OF FOURIER ANALYSIS
c. By Fubini's theorem,
(J * gr(l) = J J f(x - y)g(y)e- 27ri t;.x dydx
= J J f(x - y)e- 27ri t;.Cx-y) g(y)e- 27rib dx dy
= i() J g(y)e- 27ri t;.y dy
== i()g().
d. By Theorem 2.27 and induction on lal,
80: f() = 8f J f(x)e- 27ri t;.x dx = J f(x)( _27rix)O:e- 27ri t;.x dx.
e. First assume n == lal == 1. Since I E Co, we can integrate by parts:
J f' (x)e- 27ri t;.x dx = f(x)e- 27ri t;.x eoo - J f(x)( _27ri)e-27rit;.x dx
-
== 27riI().
-
The argument for n > 1, I a I == 1 is the same - to compute (OJ I) , integrate by parts
in the jth variable - and the general case follows by induction on lal.
f. By (e), if I E C1 n C c , then 1li() is bounded and hence i E Co. But the set
of all such I's is dense in £1 by Proposition 8.17, and in ---+ i uniformly whenever
In ---+ I in L 1 . Since Co is closed in the uniform norm, the result follows. .
Parts (d) and (e) of Theorem 8.22 point to a fundamental property of the Fourier
-
transform: Smoothness properties of I are reflected in the rate of decay of I at
infinity, and vice versa. Parts (a), (c), (e), and (f) of this theorem are valid also on
Tn, as is (b) provided that T leaves the lattice zn invariant (Exercise 12).
8.23 Corollary. maps the Schwartz class S continuously into itself.
Proof. If I E S, then x a o f3 I E L 1 n Co for all a, (3, so by Theorem 8.22d,e, i
is COCJ and
(x a a f3 I)- == (_l)la l (27ri)If3I-lalaa(f3i).
Thus aa (f3i) is bounded for all a, (3, whence i E S by Proposition 8.3. Moreover,
since J(l + Ixl)-n-1 dx < 00,
II (x a a f3 I)-lIu < Ilx a a f3 II/! < CII (1 + Ixl)n+1x a af3 Illu.
It then follows that Ilill(N,f3) < C N ,f3 1,11f3IIIIII(N+n+1,,) by the proof of Propo-
sition 8.3, so the Fourier transform is continuous on S. .
THE FOURIER TRANSFORM 251
At this point we need to compute an important specific Fourier transform.
2 2
8.24 Proposition. If f(x) == e-7ralxl where a > 0, then f() == a-n/2e-7rI1 fa.
Proof. First consider the case n == 1. Since the derivative of e- 7rax2 is
2
-27rae- 7rax , by Theorem 8.22d,e we have
([)'(f,) = (-27fixe- 7rax2 nf,) = iU'nf,) = i(27fif,)[(f,) = - 27f f,[(f,).
a a a
2 2
It follows that (d/d)(e7r /a f()) == 0, so that e7r /a f() is constant. To evaluate
the constant, set == 0 and use Proposition 2.53:
[(0) = I e- 7rax2 dx = a- 1 / 2 .
The n-dimensional case follows by Fubini's theorem, since Ixl 2 == L: X]:
n
[(f,) = IT I exp( -7fax; - 27fif,jxj) dXj
1
n
= IT [a -1/2 exp ( -7f f,J / a )] = a -n /2 exp( -7f If, 1 2 / a) .
1
.
We are now ready to invert the Fourier transform. If fELl, we define
fV (x) = [( -x) = I f(f,)e 27ri f;.x dt"
and we claim that if f E £1 and [ E £1 then ([) v == f. A simple appeal to Fubini's
theorem fails because the integrand in
([t(x) = II f(y) e-27ribe27rif;.x dydf,
is not in £1 (IRn x IRn). The trick is to introduce a convergence factor and then pass
to the limit, using Fubini's theorem via the following lemma:
8.25 Lemma. If f, 9 E £1 then J [g == J f'9.
Proof. Both integrals are equal to JJ f(x)g()e-27ri.x dx d.
.
8.26 The Fourier Inversion Theorem. Iff E £1 and [ E £1, then f agrees almost
everywhere with a continuous function fa, and ([)V == (fV)== fa.
252 ELEMENTS OF FOURIER ANALYSIS
Proof. Given t > 0 and x E IRn, set
cP() == exp(2ni. x - nt2112).
By Theorem 8.22a and Proposition 8.24,
(y) == t- n exp( -nix - yl2jt 2 ) == gt(x - y),
where g( x) == e- 1T / xI2 and the subscript t has the meaning in (8.13). By Lemma
8.25, then,
J e-1rt21€12 e 21ri €.x f(l;,) d = J f1J = J f;f; = f * 9t(X).
Since J e-1TlxI2 dx == 1, by Theorem 8.14 we have I * gt -t I in the L1 norm as
t -t O. On the other hand, since [ E L 1 the dominated convergence theorem yields
lim J e _1Tt2112 e27ri.x [() d == J e21Ti.x [() d == ([) v (x).
tO
It follows that I == ([) v a.e., and similarly (IV) == I a.e. Since ([) v and (I V )-- are
continuo;, being Fourier transforms of L 1 functions, the proof is complete. .
8.27 Corollary. If I E L 1 and [ == 0, then I == 0 a. e.
8.28 Corollary. is an isomorphism of S onto itself.
Proof. By Corollary 8.23, maps S continuously into itself, and hence so does
I IV, since IV (x) == [( - x ). By the Fourier inversion theorem, these maps are
inverse to each other. .
At last we are in a position to derive the analogue of Theorem 8.20 on IRn.
8.29 The Plancherel Theorem. If I E L 1 n L 2 , then [ E L2; and 1(L1 n L2)
extends uniquely to a unitary isomorphism on L 2 .
Proof Let X == {I E L 1 : [ E L1}. Since [ E L 1 implies I E LOO, we have
X C L 2 by Proposition 6.10, and X is dense in L 2 because SeX and S is dense in
L 2 by Proposition 8.17. Given I, 9 E X, let h == g . By the inversion theorem,
h() = J e- 21ri €,x g(x) dx = J e 21ri €.xg(x) dx = 9() .
Hence, by Lemma 8.25,
J f 9 = J fh= J fh= ]g .
THE FOURIER TRANSFORM 253
Thus I X preserves the £2 inner product; in particular, by taking 9 == f, we obtain
.........
IIfl12 == IIf112. Since (X) == X by the inversion theorem, IX extends by continuity
to a unitary isomorphism on £2 .
It remains only to show that this extension agrees with on all of £1 n £2. But
if f E £1 n £2 and g(x) == e-7rlxI2 as in the proof of the inversion theorem, we
have f * gt E £1 by Young's inequality and (f * gt)......... E £1 because (f * gt).........() ==
2 2......... .........
e- 7rt II f() and f is bounded. Hence f * gt E X; moreover, by Theorem 8.14,
f * gt f in both the £1 and £2 norms. Therefore (f * gt)......... fboth uniformly
and in the £2 norm, and we are done. .
We have thus extended the domain of the Fourier transform from £1 to £1 +
L 2 . Just as on Tn, the Riesz- Thorin theorem yields the following result for the
intermediate £P spaces:
8.30 The Hausdorff-Young Inequality. Suppose that 1 < p < 2 and q is the
conjugate exponent to p. If f E LP(IRn), then f E £q(IRn) and IIll1q < IIfll p .
If f E £1 and 1 E £1, the inversion formula
f(x) = J i()e21ri.x d
exhibits f as a superposition of the basic functions e 27ri t;,.x; it is often called the
Fourier integral representation of f. This formula remains valid in spirit for all
f E £2, although the integral (as well as the integral defining f) may not converge
pointwise. The interpretation of the inversion formula will be studied further in the
next section.
We conclude this section with a beautiful theorem that involves an interplay of
Fourier series and Fourier integrals. To motivate it, consider the following problem:
Given a function f E £1 (IRn), how can one manufacture a periodic function (that is,
a function on Tn) from it? Two possible answers suggest themselves. One way is to
"average" f over all periods, producing the series L:kEZ n f(x - k). This series, if it
converges, will surely define a periodic function. The other way is to restrict f to the
lattice zn and use it to form a Fourier series L:EZn 1( K )e27ri.x. The content of the
following theorem is that these methods both work and both give the same answer.
8.31 Theorem. If fELl (IRn), the series L:kEZ n Tkf converges pointwise a.e. and
in £l(Tn) to a function Pf such that IIPfl1i < IlfllI. Moreover, for K E zn,
......... .........
(P f) (Ii) (Fourier transform on Tn) equals f (Ii) (Fourier transform on IRn ).
Proof. Let Q == [-!, !)n. Then IRn is the disjoint union of the cubes Q + k ==
{x + k : x E Q}, k E zn, so
r L If(x - k)1 dx = L 1 If(x)1 dx = llf(X)! dx.
} Q kEZ n kEZ n Q+k }Rn
254 ELEMENTS OF FOURIER ANALYSIS
Now apply Theorem 2.25. First, it shows that the series L: Tkf converges a.e. and
in L 1 (1r n ) to a function Pf E L 1 (1r n ) such that IIPfl1i < IlfllI, since Tn is
measure-theoretically identical to Q. Second, it yields
(pfn",) = 1 L f(x - k)e- 21ri t<.x dx = L 1 f(x)e- 21ri t<,(x+k) dx
Q kEZn kEZ n Q+k
= L f f(x)e- 21ri t<.x dx = f f(x)e- 21ri t<.x dx = 1(",).
kE71 N } Q+k J'R n
.
If we impose conditions on f to guarantee that the series in question converge
absolutely, we obtain a more refined result.
8.32 The Poisson Summation Formula. Suppose f E C (Jn) satisfies \ f (x) I <
C(l + Ixl)-n-E and \j()\ < C(1 + \\)-n-E for some C, E > O. Then
L f(x + k) == L 1(K)e21ri.x,
kEZ n EZn
where both series converge absolutely and uniformly on Tn. In particular, taking
x == 0,
L f(k) == L l(K).
kEZ n EZn
Proof. The absolute and uniform convergence of the series follows from the fact
that L:kEZ n (1 + \kl)-n-E < 00, which can be seen by comparing the latter series to
the convergent integral J(1 + Ix\)-n-E dx. Thus the function Pf == L:k Tkf is in
C(Tn) and hence in L 2 (Tn), so Theorem 8.35 implies that the series L: 1( K )e21ri'x
converges in L 2 (Tn) to P f. Since it also converges uniformly, its sum equals P f
pointwise. (The replacement of k by - k in the formula for P f is immaterial since
the sum is over all k E zn.) .
Exercises
12. Work out the analogue of Theorem 8.22 for the Fourier transform on Tn.
13. Let f(x) == - x on the interval [0,1), and extend f to be periodic on IR.
a. j(O) == 0, and j( K) == (27riK) -1 if K =1= o.
b. L: k- 2 == 7r 2 /6. (Use the Parseval identity.)
14. (Wirtinger's Inequality) If f E C1([a, b]) and f(a) == f(b) == 0, then
lb'f(X)12dx < c:a r l b1 f'(X) ,2 dx.
THE FOURIER TRANSFORM 255
(By a change of variable it suffices to assume a == 0, b == . Extend f to [- , ] by
setting f( -x) == - f(x), and then extend f to be periodic on IR. Check that f, thus
extended, is in C 1 (1r) and apply the Parseval identity.)
15. Let sinc x == (sin 7rx) /7rX (sinc 0 == 1).
a. If a > 0, X[-a,a] (x) == X,a] (x) == 2a sinc 2ax.
b. Let 9-C a == {f E L 2 : f() == 0 (a.e.) for II > a}. Then 9-C is a Hilbert
space and { sinc(2ax - k) : k E Z} is an orthonormal basis for 9-C.
c. (The Sampling Theorem) If f E 9-C a , then f E Co (after modification on a
null set), and f(x) == oo f(k/2a) sinc(2ax - k), where the series converges
both uniformly and in £2. (In the terminology of signal analysis, a signal of
bandwidth 2a is completely determined by sampling its values at a sequence of
points {k /2a} whose spacing is the reciprocal of the bandwidth.)
16. Let fk == X[ -1,1] * X[ -k,k].
a. Compute fk(X) explicitly and show that Ilfllu == 2.
b. f(x) == (7rx)-2 s in27rkxsin27rx, and Ilfl\1 ---+ 00 as k 00. (Use
Exercise 15a, and substitute y == 27rkx in the integral defining II f 1\1.)
c. (£1) is a proper subset of Co. (Consider gk == f and use the open mapping
theorem. )
17. Given a > 0, let f(x) == e-21Txxa-1 for x > 0 and f(x) == 0 for x < o.
a. f E £1, and f E L 2 if a > .
b. f() == f(a)[(27r)(1 + i)]-a. (Here we are using the branch of za in the
right half plane that is positive when z is positive. Cauchy's theorem may be
used to justify the complex substitution y == (1 + i)x in the integral defining
.........
f.)
c. If a, b > then
1 00 . -a . -b 22-a- b 7rf(a + b - 1)
-00 (1 - zx) (1 + zx) dx = f(a)f(b) .
18. Suppose f E £2 (IR).
a. The £2 derivative f' (in the sense of Exercises 8 and 9) exists iff f E L 2 , in
which case i'() == 27riJ().
b. If the £2 derivative f' exists, then
[1 1!(x)1 2 dX] < 4 1 Ix!(x)1 2 dx 1 1f'(x)12 dx.
(If the integrals on the right are finite, one can integrate by parts to obtain
Jlfl 2 = -2ReJx f f'.)
c. (Heisenberg's Inequality) For any b, {3 E IR,
1 (x - b)21!(x)/2 dx 1 (f, - (3)2\J(f,)\2 df, > .
256 ELEMENTS OF FOURIER ANALYSIS
(The inequality is trivial if either integral on the right is infinite; if not, reduce to
the case b == (3 == 0 by considering g(x) == e- 27Ci {3x f(x + b).) This inequality,
.-
a form of the quantum uncertainty principle, says that f and f cannot both be
sharply localized about single points band {3.
19. (A variation on the theme of Exercise 18) If f E £2 (IRn) and the set S ==
{x : f(x) =1= O} has finite measure, then for any measurable E c IRn, JE IJl2 <
/I f II m( S)m( E).
20. Iff E £l(IRn+m), definePf(x) == J f(x,y)dy. (Here x E IRn andy E ]Rm.)
Then P f E £1 (IRn), liP flh < Ilfl\!, and (P f)'-() == f(, 0).
21. State and prove a result that encompasses both Theorem 8.31 and Exercise 20,
in the setting of Fourier transforms on closed subgroups and quotient groups of]Rn.
22. Since commutes with rotations, the Fourier transform of a radial function is
radial; that is, if F E £l(IRn) and F(x) == f(lxl), then F() == g(II), where f and
9 are related as follows.
a. Let J () == Is e ixE dO" ( x) where 0" is surface measure on the unit sphere
S in IRn (Theorem 2.49). Then J is radial - say, J() == j(II) - and
g(p) == Jo oo j(27rrp)f(r)r n - 1 dr.
b. J satisfies L: a J + J == O.
c. j satisfies pj"(p) + (n - 1)j'(p) + pj(p) == O. (This equation is a variant
of Bessel's equation. The function j is completely determined by the fact that it
is a solution of this equation, is smooth at p == 0, and satisfies j (0) == 0"( S) ==
27r n / 2 /f(n/2). In fact, j(p) == (27r)n/2 p C 2-n)/2J(n_2)/2(p) where J a is the
Bessel function of the first kind of order a.)
d. If n == 3, j(p) == 47rp-1 sinp. (Set f(p) == pj(p) and use (c) to show that
f" + f == O. Alternatively, use spherical coordinates to compute the integral
defining J (0, 0, p) directly.)
23. In this exercise we develop the theory of Hermite functions.
a. Define operators T, T* on S(IR) by T f(x) == 2-1/2(xf(x) - f'(x)J and
T* f(x) == 2-1/2(xf(x) + f'(x)J. Then J(Tf) g == J f( T*g ) and T*T k -
TkT* == kTk-1.
b. Let ho(x) == 7r-1/4e-x2/2, and for k > 1 let h k == (k!)-1 /2Tk hO. (h k is
the kth normalized Hermite function.) We have Thk == y' k + 1 hk+1 and
T* h k == /k h k - 1 , and hence TT* h k == kh k .
c. Let S == 2TT* + I. Then S f(x) == x 2 f(x) - f" (x) and Sh k == (2k + 1 )h k .
(S is called the Hermite operator.)
d. {hk}o is an orthonormal set in £2(IR). (Check directly that Ilh o l1 2 == 1, then
observe that for k > 0, J hk h m == k- 1 J(TT* hk) h m and use (a) and (b).)
e. We have
Tkf(x) = (_1)kTk/2ex2/2 ( d )k [e- x2 / 2 f(x)]
SUMMATION OF FOURIER INTEGRALS AND SERIES 257
(use induction on k), and in particular,
_ (-I)k x 2 /2 ( ) k _x2
hk(x) - [7f 1 / 2 2kk!]1/2 e dx e .
f. Let Hk(X) == e x2 / 2 h k (x). Then Hk is a polynomial of degree k, called the
kth normalized Hermite polynomial. The linear span of Ho, . . . , Hm is the
set of all polynomials of degree < m. (The kth Hermite polynomial as usually
defined is [7I"1/22kk!]1/2 Hk.)
g. {hk}O is an orthonormal basis for £2 (IR). (Suppose I -.L h k for all k, and let
g(x) == I(x)e- X2 /2. Show that 9 == 0 by expanding e-27ri.x in its Maclaurin
series and using (f).)
h. Define A : £2 --+ £2 by AI(x) == (271")1/4I(xV27r), and define 1 ==
A -1 AI for I E £2. Then A is unitary and 1() == (271")-1/2 J I(x)e-iX dx.
Moreover, Tj == -iT( I) for I E S, and h o == ho; hence h k == (-i)k h k .
Therefore, if cPk == Ah k , {cPk}O is an orthonormal basis for L 2 consisting of
eigenfunctions for ; namely, 'J;k == (-i)kcPk.
8.4 SUMMATION OF FOURIER INTEGRALS AND SERIES
The Fourier inversion theorem shows how to express a function I on IRn in terms
of i provided that I and i are in £1. The same result holds for periodic functions.
Namely, if I E L 1 (1r n ) and i E [1(zn), then the Fourier series I: i(K)e27ri.X
converges absolutely and uniformly to a function g. Since [1 C [2, it follows from
Theorem 8.20 that I E £2 and that the series converges to I in the £2 norm. Hence
I == g a.e., and I == g everywhere if I is assumed continuous at the outset.
.........
Two questions therefore arise. What conditions on I will guarantee that I is
......... .........
integrable? And how can I be recovered from I if I is not integrable?
As for the first question, since i is bounded for I E £1, the issue is the decay
.........
of I at infinity, and this is related to the smoothness properties of I. For example,
by Theorem 8.22e, if I E cn+l(IRn) and 8 a I E £1 n Co for lal < n + 1, then
1j()1 < C(1 + I\)-n-l and hence i E £l(IRn) by Corollary 2.52. The same
result holds for periodic functions, for the same reason: If I E cn+l (Tn), then
If(K)1 < C(l + IK\)-n-l and hence i E [1(zn).
To obtain sharper results when n > 1 requires a generalized notion of partial
derivatives, so we shall postpone this task until 39.3. (See Theorem 9.17.) However,
for n == 1 we can easily obtain a better theorem that covers the useful case of functions
that are continuous and piecewise C 1 . We state it for periodic functions and leave
the nonperiodic case to the reader (Exercise 24).
8.33 Theorem. Suppose that I is periodic and absolutely continuous on JR, and that
I' E £P(1f) for some p > 1. Then J E [1 (Z).
258 ELEMENTS OF FOURIER ANALYSIS
Proof. Since p > 1, we have C p == L: K-P < 00; and since LP(1r) C L2(1r)
for p > 2, we may assume that p < 2. Integration by parts (Theorem 3.36) shows that
.......... ..........
(f') (K) == 27riKf(K). Hence, by the inequalities of Holder and Hausdorff-Young, if
q is the conjugate exponent to p,
.......... [ ] 1/P [ .......... ] 1/ q
2: If(K)1 < 2:( 27r IK\)-P 2:(27r\Kf(K)\)q
K#O K#O K#O
= (2C p )l jp II (J'fli < (2C p )l jp 11f'11 .
27r q 27r P
Adding 11(0) I to both sides, we see that 11[111 < 00.
.
..........
We now turn to the problem of recovering f from f under minimal hypotheses
on f, and we consider first the case of n. The proof of the Fourier inversion
theorem contains the essential idea: Replace the divergent integral J [() e21Ti.x d
by J [()ip(t)e21Ti.x d where ip is a continuous function that vanishes rapidly
enough at infinity to make the integral converge. If we choose <P to satisfy <p(0) == 1,
then ip( t) 1 as t 0, and with any luck the corresponding integral will converge
to f in some sense. One ip that works is the function ip() == e-1T112 used in the proof
of the inversion theorem, but we shall see below that there are others of independent
interest. We therefore formulate a fairly general theorem, for which we need the
following lemma that complements Theorem 8.22c.
8.34 Lemma. If f, 9 E L 2 (JRn), then (fg)V = f * g.
Proof. f 9 E Ll by Plancherel's th eorem an d Holder's inequality, so (f g) v
makes sense. Given x E IRn, let h(y) == g(x - y). It is easily verified that h() =
g() e-21Ti'x, so since is unitary on L 2 ,
f * g(x) = J f h = J jh = J 1()g()e27ri.X d = C!gt (x).
.
8.35 Theorem. Suppose that ip E L 1 n Co, ip(O) == 1, and q; == ip v ELI. Given
f E L 1 + L 2 ,fort > 0 set
t(x) = J 1()<I>(t)e27ri.x d.
a. If f E LP (1 < p < (0), then ft E LP and 11f t - flip 0 as t o.
b. Iff is bounded and uniformly continuous, then so is ft, and ft f uniformly
as t o.
c. Supposealsothatlq;(x)\ < C(1+\xl)-n-EforsomeC,E > o. Thenft(x)
f (x) for every x in the Lebesgue set of f.
SUMMATION OF FOURIER INTEGRALS AND SERIES 259
Proof. We have f == fl + f2 where fl E £1 and f2 E £2. Since h E £00,
j; E £2, and <I> E (£1 n Co) c (£1 n £2), the integral defining ft converges
absolutelyforeveryx. Moreover,ifcPt(x) == t- n cP(t- 1 x), wehave<I>(t) == (cPt)()
by the inversion theorem and Theorem 8.22b, and J cP(x) dx == <I>(O) == 1. Since
cP, <I> E £1 we have fl * cP E £1 and h <I> E £1, so by Theorem 8.22c and the inversion
formuJa,
J h ()<I>(t)e27ri'x d = h * cPt (x ).
Also, cP E £2 by the Plancherel theorem, so by Lemma 8.34,
J j;()<I>(t)e27ri'x d = 12 * cPt().
In short, ft == f * cPt, so the assertions follow from Theorems 8.14 and 8.15. .
By combining this theorem with the Poisson summation formula, we obtain a
corresponding result for periodic functions.
8.36 Theorem. Suppose that <I> E CORn) satisfies I<I>()I < C(1 + II)-n-€,
I <I> v (x)1 < C(l + Ixl)-n-€, and <I> (0) == 1. Given f E L 1 Crr: n ), for t > 0 set
ft (x) == L l( K- )<I>( tK- )e27ri.x
EZn
(which converges absolutely since L I <I> (tK-) I < (0).
a. Iff E £P(1f n ) (1 < p < (0), then 11f t - flip 0 as t 0, and iff E C(1f n ),
then ft f uniformly as t o.
b. ft (x) f (x) for every x in the Lebesgue set of f.
Proof. Let cP == <I>v and cPt (x) == t- n cP(t- 1 x). Then (cPt)() == <I>(t), and cPt
satisfies the hypotheses of the Poisson summation formula, so
L cPt (x - k) == L <I>(tK-)e27ri.x.
kEZ n kEZ n
Let us denote the common value of these sums by 'l/;t (x). Then
(f * 'l/;t)() == j(K-):(h(K-) == j(K-)<I>(tK-) == (ft)(K-),
so ft == f * 'l/;t. Hence, by Young's inequality and Theorem 8.31 we have
11f t lip < IIfll p ll'l/;tlll < IIfll p llcPtlil == IlfllpllcPlh,
so the operators f ft are uniformly bounded on £P, 1 < p < 00.
Now, since <I> is continuous and <I>(O) == 1, we clearly have ft f uniformly
(and hence in £P(1f n )) if f is a trigonometric polynomial- that is, if j(K-) == 0 for
all but finitely many K-. But the trigonometric polynomials are dense in C(1f n ) in the
260 ELEMENTS OF FOURIER ANALYSIS
uniform norm by the Stone-Weierstrass theorem, and hence also dense in LP cr n ) in
the LP norm for p < 00. Assertion (a) therefore follows from Proposition 5.17.
To prove (b), suppose that x is in the Lebesgue set of f; by translating f we may
assume that x == 0, which simplifies the notation. With Q == [- , )n, we have
t(O) = J * 'l/Jt (0) = h J(x )'l/Jt( -x) dx
= 1 J(x)4Jt( -x) dx + L 1 J(x)4Jt( -x + k) dx.
Q k#O Q
Since
IcPt(x)1 < Ct- n (l + t- 1 Ixl)-n-t < Cttlxl-n-\
for x E Q and k =I 0 we have IcPt( -x + k)1 < C2n+tttlkl-n-t, and hence
Ll IJ (x)4Jt(-x+k)ldx < [C2 n +€IIJlhLl k l- n -€]t€,
k#O Q k#O
which vanishes as t ---t O. On the other hand, if we define g == fXQ E L1 (}Rn), then
o is in the Lebesgue set of g (because 0 is in the interior of Q, and the condition that 0
be in the Lebesgue set of g depends only on the behavior of g near 0), so by Theorem
8.15,
lim ( J(x)4Jt( -x) dx = lim g * 4Jt(O) = g(O) = J(O).
toJQ tO
.
Let us examine some speci fic examples of functions <I> that can be used in Theorems
8.35 and 8.36. The first is the one already used in the proof of the inversion theorem,
<I>() == e-nl12,
q;( x) == <I> v (x) == e -nlxl 2 .
This q; is called the Gauss kernel or Weierstrass kernel. It is important for a number
of reasons, including its connection with the heat equation that we shall explain in
38.7. When n == 1, its periodized version
'l/Jt(x) = L e-1I"Ix-kI2 jt 2 = L e-1I"t 2 1<2 e 2 11"il<'x,
kEZ EZ
in terms of which the jt in Theorem 8.36 is given by jt == f * 'l/Jt, is essentially one
of the Jacobi theta functions, which are connected with elliptic functions and have
applications in number theory.
The second example is <p() == e-2nll, whose inverse Fourier transform q; is
called the Poisson kernel on }Rn. When n == 1, we have
(8.37)
cP( x) == f o e211"(1+ix) d + roc e21l"( -1+ix) d
-ex) Jo
1 [ 1 1 ] 1
== 27f 1 + ix + 1 - ix 7f (1 + x 2 ) .
SUMMATION OF FOURIER INTEGRALS AND SERIES 261
The formula for cjJ in higher dimensions is worked out in Exercise 26; it turns out
that cjJ(x) is a constant multiple of (1 + IxI 2 )-C n + 1 )/2. Like the Gauss kernel, the
Poisson kernel has an interpretation in terms of partial differential equations that we
shall explain in 38.7.
If we take n == 1 and <I>() == e-27r11 in Theorem 8.36, make the substitution
r == e- 27rt , and write Ar f in place of ft, we obtain
Arf(x) == L rlK:I!()e-27riK:x
K:EZ
(8.38)
ex::>
== [(0) + L r k [[( k )e27rikx + [( -k )e-27rikx].
k=l
This formula is a special case of one of the classical methods for summing a (possibly)
divergent series. Namely, ifL ak is a series of complex numbers, for 0 < r < 1 its
rth Abel mean is the series L r k ak. If the latter series converges for r < 1 to the
sum S (r) and the limit S == lim r /,1 S (r) exists, the series L ak is said to be Abel
summable to S. If L ak converges to the sum S, then it is also Abel summable to
S (Exercise 27), but the Abel sum may exist even when the series diverges.
In (8.38), Ar f (x) is the rth Abel mean of the Fourier series of f, in which the kth
and (-k )th terms are grouped together to make a series indexed by the nonnegative
integers. It has the following complex-variable interpretation: If we set z == re 27rix ,
then
ex::> ex::>
Arf(x) == L f(k)zk + L!( -k) z k.
o 1
The two series on the right define, respectively, a holomorphic and an antiholomorphic
function on the unit disc Izi < 1. In particular, Arf(x) is a harmonic function on the
unit disc, and the fact that Arf f as r 1 means that f is the boundary value of
this function on the unit circle. See also Exercise 28.
Our final example is the function <I> ( ) == max(1 - II, 0) with n == 1. Its inverse
Fourier transform is
cjJ(x) == f o (1 + )e21ri'x d + {l (1 - )e27ri'x d
-1 Jo
== e 27rix + e- 27rix _ 2 == ( sin 7r x ) 2 .
(27rix)2 7rX
If we use this <I> in Theorem 8.36, take t == (m + 1)-1 (m == 0,1,2, . . .), and write
O"mf(x) for f1/(m+1)(x), we obtain
(8.39)
crmf(x) = m + 1 - 11\;1 [(1\;)e27riKX
m+l
K:=-m
m
= [(0) + " m + 1 - k [j(k)e27rikx + [( _k)e-27rikx].
m+l
k=l
262 ELEMENTS OF FOURIER ANALYSIS
This is an instance of another classical method for summing divergent series. Namely,
if 2: ak is a series of complex numbers, its mth Cesaro mean is the average of its
first m + 1 partial sums, (m + 1) -1 2: Sn, where Sn == 2: ak. If the sequence
of Cesaro means converges as m 00 to a limit S, the series is said to be Cesaro
summable to S. It is easily verified that if 2: ak converges to S, then it is Cesaro
summable to S (but perhaps not conversely), and that umf(x) is the mth Cesaro
mean of the Fourier series of f with the kth and (-k )th terms grouped together. See
Exercise 29, and also Exercise 33 in the next section.
Exercises
24. State and prove an analogue of Theorem 8.33 for functions on IR. (In addition
to the hypotheses that f be locally absolutely continuous and that ff E LP for some
p > 1, you will need some further conditions f and/or ff at infinity to make the
argument work. Make them as mild as possible.)
25. For 0 < a < 1, let Aa (1f) be the space of Holder continuous functions on 1f of
exponent a as in Exercise 11 in 3 5 . 1 . Suppose 1 < p < 00 and p-1 + q-1 == 1.
a. If f satisfies the hypotheses of Theorem 8.33, then f E A 1 / q (1f), but f need
not lie in Aa(1f) for any a > l/q. (Hint: f(b) - f(a) == J: f'(t)dt.)
b. If a < 1, Aa (1f) contains functions that are not of bounded variation and
hence are not absolutely continuous. (But cf. Exercise 37 in 33.5.)
26. The aim of this exercise is to show that the inverse Fourier transform of e-27r11
on IRn is
ljJ(x) = r:/)) (1 + Ix12) -(n+1)/2.
a. If {3 > 0, e-{3 == 7r- 1 Joo(1 + t 2 )-1e- i {3t dt. (Use (8.37).)
b. If {3 > 0, e-{3 == Jo oo (7rs)-1/2 e -s e -{32 /4s ds. (Use (a), Proposition 8.24, and
the formula (1 + t 2 ) -1 == Jo oo e-(1+t 2 )s ds.)
c. Let f3 == 27r11 where E IRn; then the formula in (b) expresses e-27r11 as a
superposition of dilated Gauss kernels. Use Proposition 8.24 again to derive the
asserted formula for cPo
27. Suppose that the numerical series 2: ak is convergent.
a. Let S == 2: ak. Then 2: rkak == 2:-1 S!n(r j - r j + 1 ) + Srn for
o < r < 1 ("summation by parts").
n k .
b. l2:m r akl < SUPjm ISI.
c. The series 2: rkak is uniformly convergent for 0 < r < 1, and hence its
sum S(r) is continuous there. In particular, 2: ak == lim r /1 S(r).
28. Suppose that f E £1 (1f), and let Arf be given by (8.38).
a. Arf == f * Pr where Pr(x) == 2:00 r l K:l e 27riK:x is the Poisson kernel for 1f.
b. Pr(x) == (1 - r 2 )/(1 + r 2 - 2r cas 27rx).
29. Given {ak}O C C, let Sn == 2: ak and U m == (m + 1)-1 2: Sn.
POINTWISE CONVERGENCE OF FOURIER SERIES 263
a. U m == (m + 1)-1 E(m + 1 - k)ak.
b. If limnoo Sn == E ak exists, then so does limmoo U m , and the two
limits are equal.
c. The series E ( -1)k diverges but is Abel and Cesaro summable to !.
30. If ! E Ll (IRn), ! is continuous at 0, and 1 > 0, then 1 ELI. (Use Theorem
8.35c and Fatou's lemma.)
31. Suppose a > O. Use (8.37) to show that
00 1 7r 1 + e- 27ra
2::= k 2 + a 2 = a 1 - e- 2 11"a .
-00
Then subtract a- 2 from both sides and let a ---t 0 to show that E k- 2 == 7r 2 /6.
32. A Coo function ! on IR is real-analytic if for every x E IR, ! is the sum of its
Taylor series based at x in some neighborhood of x. If ! is periodic and we regard!
as a function on 8 == {z E CC : Izi == I}, this condition is equivalent to the condition
that f be the restriction to S of a holomorphic function on some neighborhood of
8. Show that! E C oo (1f) is real-analytic iff 1j(K:) I < Ce-tlK:1 for some C, E > O.
(See the discussion of the Abel means Ar! in the text, and note that z == z-1 when
Izi == 1.)
8.5 POINTWISE CONVERGENCE OF FOURIER SERIES
The techniques and results of the previous two sections, involving such things as
LP norms and summability methods, are relatively modern; they were preceded
historically by the study of pointwise convergence of one-dimensional Fourier series.
Although the latter is one of the oldest parts of Fourier analysis, it is also one of
the most difficult - unfortunately for the mathematicians who developed it, but
fortunately for us who are the beneficiaries of the ideas and techniques they invented
in doing so. A thorough study of this issue is beyond the scope of this book, but we
would be remiss not to present a few of the classic results.
To set the stage, suppose! E L 1 (1f). We denote by 8m! the mth symmetric
partial sum of the Fourier series of !:
m
Sm!(x) == 'Ll(k)e27rikx.
-m
From the definition of j( k), we have
m 1
Smf(x) = 2::= 1 f(y)e 2 11"ik(x-y) dy = f * Dm(x),
-m 0
264 ELEMENTS OF FOURIER ANALYSIS
where Dm is the mth Dirichlet kernel:
m
Dm(x) == L e27rikx.
-m
The terms in this sum form a geometric progression, so
2m 27r(2m+ l)x _ 1
D ( x ) == e -27rimx '""""" e27rikx == e -27rimx e. .
m e27r2X _ 1
o
Multiplying top and bottom by e- 7rix yields the standard closed formula for Dm:
(8.40)
e(2m+l)7rix _ e-(2m+l)7rix sin(2m + l)7rx
Dm(x) == . . == .
e 7r2X - e- 7r2X sin 7rX
The difficulty with the partial sums 8m!, as opposed to (for example) the Abel or
Cesaro means, can be summed up in a nutshell as follows. 8 m f can be regarded as a
special case of the construction in Theorem 8.36; in fact, with the notation used there,
8 m f == fIlm if we take <I> == X[-I,I]. But X[-I,I] does not satisfy the hypotheses of
Theorem 8.36, because its inverse Fourier transform (7rX) -1 sin 21fx (Exercise 15a)
is not in £1 (IR). On the level of periodic functions, this is reflected in the fact that
although Dm E £1(1f) for all m, IIDmlll --+ 00 as m --+ 00 (Exercise 34).
Among the consequences of this is that the Fourier series of a continuous function
f need not converge pointwise, much less uniformly, to f; see Exercise 35. (This
does not contradict the fact that trigonometric polynomials are dense in C (1f)! It
just means that if one wants to approximate a function fEe (1f) uniformly by
trigonometric polynomials, one should not count on the partial sums 8 m f to do the
job; the Cesaro means defined by (8.39) work much better in general.) To obtain
positive results for pointwise convergence, one must look in other directions.
The first really general theorem about pointwise convergence of Fourier series
was obtained in 1829 by Dirichlet, who showed that 8m f (x) --+ [f (x+ ) + f (X- )]
for every x provided that f is piecewise continuous and piecewise monotone. Later
refinements of the argument showed that what is really needed is for f to be of
bounded variation. We now prove this theorem, for which we need two lemmas. The
first one is a slight generalization of one of the more arcane theorems of elementary
calculus, the "second mean value theorem for integrals."
8.41 Lemma. Let cjJ and 'ljJ be rea/-valued functions on [a, b]. Suppose that cjJ is
monotone and right continuous on [a, b] and'ljJ is continuous on [a, b]. Then there
exists rJ E [a, b] such that
1 b <p ( x ) 'lj; ( x) dx = <p ( a) 1'" 'lj; ( x ) dx + <p (b) 1 b 'lj; ( x) dx.
Proof. Adding a constant c to cjJ changes both sides of the equation by the
amount c f: 'l/;( x) dx, so we may assume that cjJ( a) == O. We may also assume that cjJ
POINTWISE CONVERGENCE OF FOURIER SERIES 265
is increasing; otherwise replace cp by -cPo Let w(x) == f: 'ljJ(t) dt (so that w' == -'ljJ)
and apply Theorem 3.36:
{b cjJ ( X ) 'lj; ( x) dx = - cjJ ( x ) I]i ( x ) I + { I]i ( x) dcjJ ( x ) .
J a J(a,b]
The endpoint evaluations vanish since cp(a) == w(b) == O. Since cjJ is increasing and
f(a,b] dcp == cP(b) - cP(a) == cjJ(b), ifm and M are the minimum and maximum values
of W on [a, b] we have mq;(b) < f(a,b] W dq; < M q;(b). By the intermediate value
theorem, then, there exists T) E [a, b] such that f( a,b] W dcp == w ( T)) cP( b), which is the
desired result. .
8.42 Lemma. There is a constant C < 00 such that for every m > 0 and every
[a, b] c [-, ],
b
11 Dm(x) dxl < c.
o r 1 / 2 1
Moreover, f-1/2 Dm(x) dx == Jo Dm(x)dx == 2forallm.
Proof. By (8.40),
l b l b sin(2m + l)7rx l b . [ 1 1 ]
Dm(x) dx == dx + s1n(2m + l)7rx . - - dx.
a a 7rX a SIn 7rX 7rX
Since (sin 7rx)-1 - (7rX)-l is bounded on [-,] and I sin(2m + l)7rXI < 1, the
second integral on the right is bounded in absolute value by a constant. With the
substitution y == (2m + 1 )7rX, the first one becomes
{(2m+l)7rb sin y dy = Si[(2m + l)7fb] - Si[(2m + l)7fa]
J(2m+1)7ra 7rY 7r
where Si(x) == fox y-1 sin y dYe But Si(x) is continuous and approaches the finite
limits ::i:: 7r as x ---t ::i::oo (see Exercise 59b in 32.6), so Si( x) is bounded. This proves
the first assertion. As for the second one,
/ 1/2 m / 1/2
Dm(x) dx == L e27rikx dx == 1
-1/2 -m -1/2
(only the term with k == 0 is nonzero), so since Dm is even,
o 1
/ Dm(x) dx = (2 Dm(x) dx = .
-1/2 J 0
.
266 ELEMENTS OF FOURIER ANALYSIS
8.43 Theorem. If I E BV(1f) - that is, if I is periodic on IR and of bounded
variation on [- , ] - then
lim Smf(x) == [f(x+) + f(x-)] for every x.
m ---t ex:>
In particular, limm---tex:> Sm! (x) == ! (x) at every x at which I is continuous.
Proof. We begin by making some reductions. In examining the convergence of
8m! (x), we may assume that x == 0 (by replacing I with the translated function
T -xl), that I is real-valued (by considering the real and imaginary parts separately),
and that I is right continuous (since replacing I(t) by I(t+) affects neither Sml
nor [1(0+) + 1(0- )]). In this case, by Theorem 3.27b, on the interval [- , ) we
can write I as the difference of two right continuous increasing functions 9 and h. If
these functions are extended to IR by periodicity, they are again of bounded variation,
and it is enough to show that Smg(O) --+ [9(0+) + g(O-)] and likewise for h.
In short, it suffices to consider the case where x == 0 and I is increasing and
right continuous on [-, ). Since Dm is even, we have Sml(O) == 1* Dm(O) ==
Ji2 f(x)Dm(x) dx, so by Lemma 8.42,
Sml(O) - [1(0+) + 1(0-)]
= r 1 / 2 [J(X)_f(0+)]D m (X)dx+ j O [f(x)-f(O-)]Dm(x)dx.
Jo -1/2
We shall show that the first integral on the right tends to zero as m --+ 00; a similar
argument shows that the second integral also tends to zero, thereby completing the
proof.
Given E > 0, choose 8 > 0 small enough so that 1(8) - f(O+) < EIG where C
is as in Lemma 8.42. Then by Lemma 8.41, for some T) E [0,8],
(b b
I Jo [f(x) - f(O+)] Dm(x) dxl = [f(8) - f(O+)] 11 Dm(x) dxl,
which is less than E. On the other hand, by (8.40),
(1/2
Jo [f(x) - f(O+)] Dm(x) dx = g+( -m) - g_(m),
where 9:1::. is the periodic function given on the interval [- , ) by
[I(x) - I(O+)]e:f:7rix
g:f:(x) == 2. . X[b,1/2)(X).
Z SIn 7rX
But g:f: E £1 (1f), so g:f: (=Fm) --+ 0 as m --+ 00 by the Riemann-Lebesgue lemma
(the periodic analogue of Theorem 8.22f). Therefore,
r 1 / 2
lIJo [f(x) - f(O+)] Dm(x) dxl < E
for every E > 0, and we are done.
.
POINTWISE CONVERGENCE OF FOURIER SERIES 267
One of the less attractive features of Fourier series is that bad behavior of a function
at one point affects the behavior of its Fourier series at all points. For example, if
f has even one jump discontinuity, then 1 cannot be in Zl (Z) and so the series
L 1( k )e27rikx cannot converge absolutely at any point. However, to a limited extent
the convergence of the series at a point x depends only on the behavior of f near x,
as explained in the following localization theorem.
8.44 Theorem. If f and 9 are in £1 (1f) and f == 9 on an open interval I, then
Sm! - Smg --.., 0 uniformly on compact subsets of I.
Proof. It is enough to assume that 9 == 0 (consider! - g), and by translating!
we may assume that I is centered at 0, say I == ( -c, c) where c < . Fix 8 < c; we
shall show that if ! == 0 on I then Sm! --.., 0 uniformly on [-8, 8].
The first step is to show that Sm! --.., 0 pointwise on [-8,8], and the argument is
similar to the preceding proof. Namely, by (8.40) we have
/ 1/2
Smf(x) == f(x - y)Dm(Y) dy == gx,+( -m) - gx,_(m),
-1/2
where
!(X - y)e-:17ri y
gx,-:1(y) == 2.. .
'l SIn 'Try
Since !(x - y) = 0 on a neighborhood of the zeros of sin 'Try, the functions gx,-:1 are
in £1(1f), so ,-:1(=fm) ---t 0 by the Riemann-Lebesgue lemma.
The next step is to show that if Xl, X2 E [-8,8], then Sm!(X1) - Sm!(X2)
vanishes as Xl - X2 ---t 0, uniformly in m. By (8.40) again,
/ 1/2 sin(2m + l)'TrY
Sm!(Xl) - Smf(X2) == . [f(X1 - y) - !(X2 - y)] dy.
-1/2 SIn 'Try
But f(X1 - y) - !(X2 - y) == 0 for Iyl < c - 8, and for c - 8 < Iyl < we have
sin(2m + l)'TrY < 1 == A
SIn 'Try - sin 'Tr(c - 8) ,
where A is independent of m. Hence
I S m!(X1) - Sm!(x2)1 < A / 1/2 If(Xl - y) - f(X2 - y)1 dy = AllTxJ - T x2 f1l1,
-1/2
which vanishes as Xl - X2 ---t 0 by (the periodic analogue of) Proposition 8.5.
Now, given E > 0, we can choose T) small enough so that if Xl, X2 E [-8, 8] and
IXI - x21 < 1], then I S mf(X1) - Smf(x2)1 < E/2. Choose Xl,..., Xk E [-8,6] so
that the intervals Ix - Xj I < T) cover [-8,8]. Since Smf(xj) --.., 0 for each j, we can
choose M large enough so that ISmf(xj)1 < E/2 for m > M and 1 < j < k. If
Ixi < 8, then, we have Ix - Xj I < rJ for some j, so
ISm!(x)1 < ISm!(x) - Sm!(xj)1 + ISmf(xj)1 < E
for m > M, and we are done.
.
268 ELEMENTS OF FOURIER ANALYSIS
8.45 Corollary. Suppose that! E L 1 (1f) and I is an open interval of length < 1.
a. If! agrees on I with a function g such that g E ZI (Z), then Sm! --+ !
uniformly on compact subsets of I.
b. If I is absolutely continuous on I and I' E LP(I) for some p > 1, then
Sm! -+ ! uniformly on compact subsets of I.
Proof. If! == g on I, then Sm! - ! == Sm! - g == (Sm! - Smg) + (Smg - g)
on I, and ifg E ZI (Z), then Smg --+ g uniformly on IR; (a) follows. As for (b), given
[ao, b o ] C I, pick a < ao and b > b o so that [a, b] C I, and let g be the continuous
periodic function that equals I on [a, b] and is linear on [b, a + 1] (which is unique
since g(b) == f(b) and g(a + 1) == g(a) == !(a)). Under the hypotheses of (b), g is
absolutely continuous on IR and g' E LP(1f), so g E ZI (Z) by Theorem 8.33. Thus
Sm! --+ ! uniformly on [ao, b o ] by (a). .
Finally, we discuss the behavior of Smf near a jump discontinuity of !. Let us
first consider a simple example: Let
(8.46)
q;(x) == -x - [x]
([x] == greatest integer < x).
Then cjJ is periodic and is CCXJ except for jump discontinuities at the integers, where
q;(j+) - cjJ(j-) == 1. It is easy to check that (O) == 0 and (k) == (27rik)-1 for
k =I 0 (Exercise 13a), so that
SmcjJ(x) == L
O<lkl::;m
e27rikx == sin 27r kx
27rik 7rk
1
From Corollary 8.45 it follows that SmcjJ --+ cjJ uniformly on any compact set not
containing an integer, and it is obvious that Sm cjJ( x) == 0 when x is an integer. But
near the integers a peculiar thing happens: Sm cjJ contains a sequence of spikes that
overshoot and undershoot cjJ, as shown in Figure 8.1, and as m --+ 00 the spikes tend
to zero in width but not in height. In fact, when m is large the value of Smq; at its
first maximum to the right of 0 is about 0.5895, about 18% greater than q; (0+) == .
This is known as the Gibbs phenomenon; the precise statement and proof are given
in Exercise 37.
Now suppose that! is any periodic function on IR having a jump discontinuity at
x == a (that is, ! (a+) and! (a- ) exist and are unequal). Then the function
g(x) == f(x) - [I(a+) - !(a-)JcjJ(x - a)
is continuous at every point where I is, and also at x == a provided that we (re )define
g( a) to be ! [I (a+) + I (a- )], as the jumps in ! and cjJ cancel out. If g satisfies one
of the hypotheses of Corollary 8.45 on an interval I containing a, the Fourier series
of g will converge uniformly near a, and hence the Fourier series of ! will exhibit
the same Gibbs phenomenon as that of cjJ.
Finally, suppose that! is periodic and continuous except at finitely many points
aI, . . . , ak E 1f, where! has jump discontinuities. We can then subtract off all the
POINTWISE CONVERGENCE OF FOURIER SERIES 269
Fig. 8. 1 The Gibbs phenomenon: the graph y == Lo (7r k) -1 sin 27r kx, - < x < .
jumps to form a continuous function g:
g(x) == I(x) - L[/(aj+) - l(aj-)]1>(x - aj)
If I satisfies some mild smoothness conditions - for example, if I is absolutely
continuous on any interval not containing any aj and I' E LP for some p > 1 - then
9 will be in [1 (Z). Conclusion: Smf f uniformly on any interval not containing
any aj, Sm (aj) [I (aj + ) + I (aj - )], and Sml exhibits the Gibbs phenomenon
near every aj.
Exercises
33. Let a mf be the Cesaro means of the Fourier series of I given by (8.39).
a. ami == f * Fm where Fm == (m + 1)-1 L Dk and Dk is the kth Dirichlet
kernel. (See Exercise 29a.) F m is called the mth Fejer kernel.
b. Fm(x) == sin 2 (m + l)7rxj(m + 1) sin 2 7rX. (Use (8.40) and the fact that
sin(2k + l)7rx == 1m e(2k+l)nix.)
34. If Dm is the mth Dirichlet kernel, IIDm Ih 00 as m 00. (Make the
substitution y == (2m + 1 )7rX and use Exercise 59a in 32.6.)
35. The purpose of this exercise is to show that the Fourier series of "most" contin-
uous functions on T do not converge pointwise.
a. Define 1>m(/) == Sml(O). Then 1> E C(T)* and 111>11 == IIDmlh.
b. The set of all I E C(1r) such that the sequence {Smf(O)} converges is
meager in C(T). (Use Exercise 34 and the uniform boundedness principle.)
c. There exist f E C(1r) (in fact, a residual set of such f's) such that {Smf(x)}
diverges for every x in a dense subset of T. (The result of (b) holds if the point
o is replaced by any other point in T. Apply Exercise 40 in 35.3.)
270 ELEMENTS OF FOURIER ANALYSIS
36. The Fourier transform is not surjective from £1(1r) to C o (2). (Use Exercise 34,
and cf. Exercise 16c.)
37. Let cp be given by (8.46) and let m == SmCP - cp.
3. (d/dx)m(x) == Dm(x) for x 2.
b. The first maximum of m to the right of 0 occurs at x == (2m + 1)-1, and
lim m ( 1 ) == {7r sin t dt _ rv 0.0895.
m--+(XJ 2m + 1 7r Jo t 2
(Use (8.40) and the fact that m(x) == J; 'm(t) dt - .)
c. More generally, the jth critical point of m to the right of 0 occurs at
x == j / (2m + 1) (j == 1, . . . , 2m), and
1 . ;\ ( j ) 1 l j7r sin t d 1
1m Ll - - - t - -
m---700 m 2m + 1 - 7r 0 t 2.
These numbers are positive for j odd and negative for j even. (See Exercise 59b
in 32.6.)
8.6 FOURIER ANALYSIS OF MEASURES
We recall that M(IRn) is the space of complex Borel measures on IRn (which are
automatically Radon measures by Theorem 7.8), and we embed £1 (IRn) into M(IRn)
by identifying f E £1 with the measure df.-l == f dm. We shall need to define products
of complex measures on Cartesian product spaces, which can easily be done in terms
of products of positive measures by using Radon-Nikodym derivatives. Namely, if
f.-l, v E M (IRn), we define f.-l x v E M (IRn x IRn) by
df.-l dv
d(p, x v)(x, y) = dlp,1 (x) dlvl (y) d(Ip,1 x Ivl)(x, y).
If f.-l, v E M (IRn ), we define their convolution f.-l * v E M (IRn) by f.-l * v (E) ==
f.-l x v(a- 1 (E)) where a: IRn x IRn IRn is addition, a(x,y) == x + y. In other
words,
(8.47)
p, x v(E) = 11 XE(X + y) dp,(x) dv(y).
8.48 Proposition.
a. Convolution of measures is commutative and associative.
b. For any bounded Borel measurable function h,
1 h d(p, * v) = 11 h(x + y) dp,(x) dv(y).
FOURIER ANALYSIS OF MEASURES 271
c. IIIL * vII < 1I1L1l1lvll.
d. If dlL == f dm and dv == 9 dm, then d(1L * v) == (f * g) dm; that is, on L1 the
new and old definitions of convolution coincide.
Proof. Commutativity is obvious from Fubini's theorem, as is associativity, for
A * IL * v is unambiguously defined by the formula
>.. * IH v(E) = 111 XE(X + y + z) d>"(x) dp,(y) dv(z).
Assertion (b) follows from (8.47) by the usual linearity and approximation arguments.
In particular, taking h == dill * vii d(1L * v), since Ihl == 1 we obtain
lip, * vii = 1 hd(p, * v) < 11 Ihl dip, I dlvl = 11p,llllvll,
which proves (c). Finally, if dlL == f dm and dv == 9 dm, for any bounded measurable
h we have
1 hd(p, * v) = 11 h(x + y)f(x)g(y) dxdy
= 11 h(x)f(x - y)g(y) dx dy = 1 h(x)(f * g) (x) dx,
whence d(1L * v) == (f * g) dm.
.
We can also define convolutions of measures with functions in LP(IRn, m), which
we implicitly assume to be Borel measurable. (By Proposition 2.12, this is no
restriction.)
8.49 Proposition. If f E LP(IRn) (1 < p < (0) and IL E M(IRn), then the integral
f * IL(X) == J f(x - y) dJ-L(Y) existsfora.e. x, f * IL E LP, and Ilf * ILllp < IIfllplllLll.
(Here "LP" and "a. e." refer to Lebesgue measure.)
Proof. If f and IL are nonnegative, then f * IL( x) exists (possibly being equal to
(0) for every x, and by Minkowski's inequality for integrals,
[If * p,[[p < 1 Ilf(' - y)llp dp,(y) = Ilfllpllp,ll.
In particular, f * IL( x) < 00 for a.e. x. In the general case this argument applies to
If I and III I, and the result follows easily. .
In the case p == 1, the definition of f * IL in Proposition 8.49 coincides with the
definition given earlier in which f is identified with f dm, for
hf*p,(X)dX= 11 XE(x)f(x-y)dp,(y)dx= 11 XE(x+y)f(x)dxdp,(y)
272 ELEMENTS OF FOURIER ANALYSIS
for any Borel set E. Thus L 1 (IRn) is not merely a subalgebra of M (IRn) with respect
to convolution but an ideal.
We extend the Fourier transform from L 1 (IRn) to M (IRn) in the obvious way: If
f.-l E M (IRn), ji is the function defined by
f1() = J e-21fi.x df-l(x).
(The Fourier transform on measures is sometimes called the Fourier-Stieltjes trans-
form.) Since e-27ri.x is uniformly continuous in x, it is clear that ji is a bounded
continuous function and that lijiliu < 11f.-l11. Moreover, by taking h(x) == e-27ri.x in
'"
Proposition 8.48b, one sees immediately that (f.-l * l/) == jiv.
We conclude by giving a useful criterion for vague convergence of measures in
terms of Fourier transforms.
8.50 Proposition. Suppose that f.-l1, f.-l2, . . ., and f.-l are in M(IRn). lfllf.-lk II < C < 00
for all k and jik ji pointwise, then f.-lk f.-l vaguely.
Proof. If I E S,then IV E S (Corollary 8.23), so by the Fourier inversion
theorem,
J f df-lk = J J fV (y)e-21fiY.x dy df-lk (x) = J r (y)f1k(Y) dy.
Since IV E L 1 and lljik Ilu < C, the dominated convergence theorem implies that
J I df.-lk J I df.-l. But S is dense in Co (IRn) (Proposition 8.17), so by Proposition
5.17, J I df.-lk J I df.-l for all I E Co (IRn), that is, f.-lk f.-l vaguely. .
This result has a partial converse: If f.-lk f.-l vaguely and lIf.-lkll 11f.-l1l, then
f1k ---4 f1 pointwise. This follows from Exercise 26 in 37.3.
Exercises
38. Work out the analogues of the results in this section for measures on the torus
Tn.
39. If f.-l is a positive Borel measure on 1r with f.-l(1r) == 1, then Iji(k)1 < 1 for all
k i= 0 unless f.-l is a linear combination, with positive coefficients, of the point masses
at 0, 1 ' . . . , ml for some mEN, in which case ji(jm) == 1 for all j E Z.
40. L 1 (IRn) is vaguely dense in M (IRn ). (If f.-l E M (IRn ), consider CPt * f.-l where
{ 4>t }t>o is an approximate identity.)
41. Let be the set of finite linear combinations of the point masses Dx, x E IRn.
Then is vaguely dense in M(IRn). (If I is in the dense subset Cc(IRn) of L 1 (IRn)
and 9 E Co (IRn), approximate J Ig by Riemann sums. Then use Exercise 40.)
42. A function cP on IRn that satisfies E7:k=l Zj ZkCP(Xj -Xk) > 0 for all Zl, . . . , Zm E
CC and all Xl, . . . , X m E IRn, for any mEN, is called positive definite. If f.-l E M (IRn)
is positive, then ji is positive definite.
APPLICATIONS TO PARTIAL DIFFERENTIAL EQUATIONS 273
8.7 APPLICATIONS TO PARTIAL DIFFERENTIAL EQUATIONS
In this section we present a few of the many applications of Fourier analysis to the
theory of partial differential equations; others will be found in Chapter 9. We shall
use the term differential operator to mean a linear partial differential operator with
smooth coefficients, that is, an operator L of the form
Lf(x) == I: aa(x)aa f(x),
lalm
aa E C OCJ .
If the aa' s are constants, we call L a constant-coefficient operator. In this case,
if for all sufficiently well-behaved functions f (for example, f E S) we have
(Lfnf,) = I: aa(27rif,)aj(f,).
lalm
It is therefore convenient to write L in a slightly different form: We set b a
(27ri)la 1 aa and introduce the operators
D a == (27ri)- la1 aa,
so that
L == I: baD a ,
lalm
(Lff = I: baf,a 1.
lalm
Thus, if P is any polynomial in n complex variables, say P() == Llalm baa,
we can form the constant-coefficient operator P(D) == Llalm baDa, and we then
have [P(D)fJ == P f. The polynomial P is called the symbol of the operator P(D).
Clearly, one potential application of the Fourier transform is in finding solutions of
the differential equation P(D)u == f. Indeed, application of the Fourier transform to
both sides yields u == p- 1 f, whence u == (P- 1 i) v. Moreover, if p- 1 is the Fourier
transform of a function cjJ, we can express u directly in terms of f as u == f * cjJ. For
these calculations to make sense, however, the functions f and p- 1 i (or p-l) must
be ones to which the Fourier transform can be applied, which is a serious limitation
within the theory we have developed so far. The full power of this method becomes
available only when the the domain of the Fourier transform is substantially extended.
We shall do this in 39.2; for the time being, we invite the reader to work out a fairly
simple example in Exercise 43. (It must also be pointed out that even when this
method works, u == (p- 1 l) v is far from being the only solution of P(D)u == f;
there are others that grow too fast at infinity to be within the scope even of the
extended Fourier transform.)
Let us turn to some more concrete problems. The most important of all partial
differential operators is the Laplacian
n 8 2 n
== I: 8 2 == -47r 2 I: DJ == P(D) where P() == _47r2112.
x.
1 J 1
274 ELEMENTS OF FOURIER ANALYSIS
The reason for this is that is essentially the only (scalar) differential operator
that is invariant under translations and rotations. (If one considers operators on
vector-valued functions, there are others, such as the familiar grad, curl, and div of
3-dimensional vector analysis.) More precisely, we have:
8.51 Theorem. A differential operator L satisfies L(I 0 T) == (LI) 0 T for all
translations and rotations T iff there is a polynomial P in one variable such that
L == P().
Proof. Clearly L is translation-invariant iff L has constant coefficients, in which
case L == Q(D) for some polynomial Q in n variables. Moreover, since (LI) == Qj
and the Fourier transform commutes with rotations, L commutes with rotations iff Q
is rotation-invariant. Let Q == 2: Qj where Qj is homogeneous of degree j; then
it is easy to see that Q is rotation-invariant iff each Qj is rotation-invariant. (Use
induction on j and the fact that Qj() == limr--+o r- j 2:7 Qi(r).) But this means
thatQj() depends only on II, soQj() == cjllj by homogeneity. Moreover, 1lj is a
polynomial precisely whenj is even, so Cj == 0 for j odd. Setting b k == (-47r 2 )-kc2k'
then, we have Q() == 2: bk( _47r2112)k, that is, L == 2: bkk. .
One of the basic boundary value problems for the Laplacian is the Dirichlet
problem: Given an open set n c JRn and a function f on its boundary an, find
a function u on n such that u == 0 on n and ulan == I. (This statement of the
problem is deliberately a bit imprecise.) We shall solve the Dirichlet problem when
n is a half-space.
For this purpose it will be convenient to replace n by n + 1 and to denote the
coordinates on JRn+ I by Xl, . . . , X n , t. We continue to use the symbol to denote
the Laplacian on JRn, and we set
a
at == at '
so the Laplacian on JRn+1 is + a;. We take the half-space n to be JRn x (0,00).
Thus, given a function 1 on JRn, satisfying conditions to be made more precise
below, we wish to find a function u on JRn X [0,00) such that ( + a;)u == 0 and
u(x,O) == f(x).
The idea is to apply the Fourier transform on JRn, thus converting the partial
differential equation ( + a;)u == 0 into the simple ordinary differential equation
(-47r2112 + B;)u == O. The general solution of this equation is
(8.52)
u(, t) == CI ()e-27rtl1 + c2()e27rtll,
"""-
and we require that u( , 0) == 1 (). We therefore obtain a solution to our problem
by taking Cl () == j(), C2() == 0 (more about the reasons for this choice below);
this gives u(, t) == j()e-27rtll, or u(x, t) == (f * Pt)(x) where Pt == (e-27rtlI)V
is the Poisson kernel introduced in 38.4. As we calculated in Exercise 26,
Pt x = r((n+ 1)) t
() 7r(n+ 1)/2 (t 2 + I x 1 2 ) -(n+!)/2 .
APPLICATIONS TO PARTIAL DIFFERENTIAL EQUATIONS 275
So far this is all formal, since we have not specified conditions on f to ensure that
these manipulations are justified. We now give a precise result.
8.53 Theorem. Suppose f E LP(IRn) (1 < p < (0). Then the function u(x, t) ==
(f * Pt)(x) satisfies (+ a;)u == 0 on IRn x (0, (0), andlimt--+o u(x, t) == f(x)for
a.e. x and for every x at which f is continuous. Moreover, limt--+o Ilu(., t) - flip == 0
provided p < 00.
Proof. Pt and all of its derivatives are in Lq (IRn) for 1 < q < 00, since a rough
calculation shows that la Pt(x) I < C a Ixl-n-l-Ial and lal Pt (x) I < C j I x l- n - 1 for
large x. Also, ( + a;)Pt(x) == 0, as can be verified by direct calculation or (more
easil y) by taking the Fourier transform. Hence f * Pt is well defined and
( + 8;)(f * Pt) == f * ( + a;)P t == o.
Since Pt (x) == t- n PI (t- 1 X) and J PI (x) dx == PI (0) == 1, the remaining assertions
follow from Theorems 8.14 and 8.15. .
The function u( x, t) == (f * P t ) (x) is not the only one satisfying the conclusions
of Theorem 8.53; for example, v(x, t) == u(x, t) + ct also works, for any c E <C. For
f E L 1 , we could also obtain a large family of solutions by taking C2 in (8.52) to be an
.........
arbitrary function in C and Cl == f - C2. (But there is no nice convolution formula
for the resulting function u, because e27rtl1 is not the Fourier transform of a function
or even a distribution.) The solution u(x, t) == (f * Pt) (x) is distinguished, however,
by its regularity at infinity; for example, it can be shown that if f E BC (IRn), then u
is the unique solution in BC(IRn x [0,(0)).
The same idea can be used to solve the heat equation
(at-)u==O
on IRn x (0, (0) subject to the initial condition u(x,O) == f(x). (Physical inter-
pretation: u(x, t) represents the temperature at position x and time t in a homo-
geneous isotropic medium, given that the temperature at time 0 is f(x).) Indeed,
Fourier transformation leads to the ordinary differential equation (at +47r2112)u == 0
with initial condition u(, 0) == l(). The unique solution of the latter problem is
u(, t) == 1()e-47r2tl12. In view of Proposition 8.24, this yields
u(x, t) == f * Gt(x),
Gt(x) == (47rt)-n/2 e - 1 x I2 /4t.
Here we have Gt(x) == t-n/2Gl(t-l/2x), so after the change of variable s == Vi,
Theorems 8.14 and 8.15 apply again, and we obtain an exact analogue of Theorem
8.53 for the initial value problem (at - )u == 0, u(x, 0) == f(x). Actually, in the
present case the hypotheses on f can be relaxed considerably because G t E S; see
Exercise 44.
Another fundamental equation of mathematical physics is the wave equation
(a; - )u == o.
276 ELEMENTS OF FOURIER ANALYSIS
(Physical interpretation: u(x, t) is the amplitude at position x and time t of a wave
traveling in a homogeneous isotropic medium, with units chosen so that the speed of
propagation is 1.) Here it is appropriate to specify both u(x, 0) and 8 t u(x, 0):
(8.54)
(a; - )u == 0,
u(x,O) == f(x),
at u ( x, 0) == 9 ( x ) .
After applying the Fourier transform, we obtain
(B; + 47r2112) u( , t) == 0,
u(, 0) == i(),
atU(, 0) == g(),
the solution to which is
(8.55)
/'. sin 27rtl1
u((, t) = (CDS 271" t IWf(() + 271"1(1 g(().
Since
a [ sin 27rtll ]
CDS 271" t l(1 = ot 271"1(1 '
it follows that
[ sin 27rtll ] v
u(x, t) = f * OtWt(x) + g * Wt(x), where W t = 271"1(1 .
But here there is a problem: (27r I I) -1 sin 27rt I I is the Fourier transform of a function
only when n < 2 and the Fourier transform of a measure only when n < 3; for these
cases the resulting solution of the wave equation is worked out in Exercises 45-47.
To carry out this analysis in higher dimensions requires the theory of distributions,
which we shall examine in Chapter 9. (We shall not, however, derive the explicit
formula for W t , which becomes increasingly complicated as n increases.)
Exercises
43. Let cjJ(x) == e- 1xl / 2 on IR. Use the Fourier transform to derive the solution
u == f * cjJ of the differential equation u - u" == f, and then check directly that it
works. What hypotheses are needed on f?
44. Let Gt(x) == (47rt)-n/2 e - 1 x I2 /4t, and suppose that f E Ltoc(IRn) satisfies
If(x)1 < CEeElxl2 for every E > O. Then u(x, t) == f * Gt(x) is well defined for all
x E JRn and t > 0; (at - )u == 0 on JRn X (0, (0); and limt--+o u(x, t) == f(x) for
a.e. x and for every x at which f is continuous. (To show u(x, t) f(x) a.e. on a
bounded open set V, write f == cP f + (1 - cP) f where cP E C c and cjJ == 1 on V, and
show that [(1 - cP )f] * G t 0 on V.)
45. Let n == 1. Use (8.55) and Exercise 15a to derive d' Alembert's solution to the
initial value problem (8.54):
1 1 (x+t
u(x, t) = 2 [f(x + t) + f(x - t)] + 2 Jx-t g(s) ds.
APPLICATIONS TO PARTIAL DIFFERENTIAL EQUATIONS 277
Under what conditions on f and 9 does this formula actually give a solution?
46. Let n == 3, and let at denote surface measure on the sphere Ix I == t. Then
sin 27rtl1 _ ( ) -1-- ( C )
27r1(1 - 47rt (Jt c" .
(See Exercise 22d.) What is the resulting solution of the initial value problem (8.54),
expressed in terms of convolutions? What conditions on f and 9 ensure its validity?
47. Let n == 2. If E IR 2 , let [ == (, 0) E IR 3 . Rewrite the result of Exercise 46,
--
sin 27rtl1 _ 1 1 -27ri . x d ( )
__ - - e at x ,
27r11 47rt Ixl=t
in terms of an integral over the disc Dt == {y : Iyl < t} in IR 2 by projecting the
upper and lower hemispheres of the sphere Ixl == t in IR 3 onto the equatorial plane.
Conclude that (27r1\)-1 sin 27rtl1 is the Fourier transform of
W t (x) == (27r) -1 (t 2 - Ix1 2 ) -1/2X D t (x),
and write out the resulting solution of the initial value problem (8.54).
48. Solve the following initial value problems in terms of Fourier series, where f, g,
and u(., t) are periodic functions on IR:
a. (8; + 8;)u == 0, u(x, 0) == f(x). (Cf. the discussion of Abel means in 38.4.)
b. (8 t - 8;)u == 0, u(x, 0) == f(x).
c. (a; - 8;)u == 0, u(x, 0) == f(x), 8tu(x, 0) == g(x).
49. In this exercise we discuss heat flow on an interval.
3. Solve (at - 8;)u == 0 on (a, b) x (0, (0) with boundary conditions u(x, 0) ==
f(x) for x E (a, b), u(a, t) == u(b, t) == 0 for t > 0, in terms of Fourier series.
(This describes heat flow on (a, b) when the endpoints are held at a constant
temperature. It suffices to assume a == 0, b == ; extend f to IR by requiring f to
be odd and periodic, and use Exercise 48b.)
b. Solve the same problem with the condition u( a, t) == u( b, t) == 0 replaced
by 8xu(a, t) == 8xu(b, t) == O. (This describes heat flow on (a, b) when the
endpoints are insulated. This time, extend f to be even and periodic.)
50. Solve (8; - a;)u == 0 on ( a, b) x (0,00) with boundary conditions u( x, 0) ==
f(x) and atu(x, 0) == g(x) for x E (a, b), u(a, t) == u(b, t) == 0 for t > 0, in terms
of Fourier series by the method of Exercise 49a. (This problem describes the motion
of a vibrating string that is fixed at the endpoints. It can also be solved by extending
f to be odd and periodic and using Exercise 45. That form of the solution tells you
what you see when you look at a vibrating string; this one tells you what you hear
when you listen to it.)
278 ELEMENTS OF FOURIER ANALYSIS
8.8 NOTES AND REFERENCES
The scope of Fourier analysis is much wider than we have been able to indicate in
this chapter. Dym and McKean [36] gives a more comprehensive treatment with
many interesting applications. Also recommended are Komer's delightful book [87],
which discusses various aspects of classical Fourier analysis and their role in science,
and the excellent collection of expository articles edited by Ash [7], which gives a
broader view of the mathematical ramifications of the subject. On the more advanced
level, the reader should consult Zygmund [167] for the classical theory and Stein
[140], [141] and Stein and Weiss [142] for some of the more recent developments.
38.1: The formulas given in most calculus books for the remainder term Rk(X) ==
f(x) - Pk(X) in Taylor's formula (where P k is the Taylor polynomial of degree k)
require f to possess derivatives of order k + 1, but this is not really necessary. The
version of Taylor's theorem stated in the text is derived in Folland [45].
38.3: Trigonometric series and integrals have a very long history, but modem
Fourier analysis only became possible after the invention of the Lebesgue integral.
When that tool became available, the £2 theory was quickly established: the Riesz-
Fischer theorem [44], [114] for Fourier series (essentially Theorem 8.20), and the
Plancherel theorem [110] for Fourier integrals. Since then the subject has developed
in many directions.
There is no universal agreement on where to put the factors of 27r in the definition
of the Fourier transform. Other common conventions are
9"d() = J e-i.x f(x) dx,
9"2f() = (27r)-n/2 J e-i.x dx,
whose inverse transforms are
9"llg(x) = (27r)-n J ei'xg() d,
9"21g() = (27r)-n/2 J ei'xg() df
l has the disadvantage of not being unitary (111/112 == (27r)n/211/112), whereas
2 is unitary but does not convert convolution into multiplication (2 (I * g) ==
(27r)n/2(2f)(2g)). To make both £2 norms and convolutions come out right,
one can either put the 27r's in the exponent, as we have done, or omit them from
the exponent but replace Lebesgue measure dx by (27r)-n/2 dx in defining both the
Fourier transform (as in 2) and convolutions.
The Hausdorff-Young inequality IIfll q < IIfli p (1 < p < 2, p-l + q-l == 1)
is sharp on Tn, since equality holds when f is a constant function; but on IRn the
optimal result, a deep theorem of Beckner [14], is that Ilfll q < pn/2Pq-n/2qllfllp.
One of the fundamental qualitative features of the Fourier transform is the fact
that, roughly speaking, a nonzero function and its Fourier transform cannot both be
sharply localized, that is, they cannot both be negligibly small outside of small sets.
This general principle has a number of different precise formulations, two of which
are derived in Exercises 18 and 19; see Folland and Sitaram [50] for a comprehensive
discussion.
NOTES AND REFERENCES 279
A nice complex-variable proof of the fact that the Fourier transform is injective
on L1 can be found in Newman [106].
338.4-5: The theory of convergence of one-dimensional Fourier series really
began (as mentioned in the text) with Dirichlet's theorem in 1829. The first construc-
tion of a continuous function whose Fourier series does not converge pointwise was
obtained by du Bois Reymond in 1876, and the fact that the Fourier series of a con-
tinuous function I is uniformly Cesaro summable to I was proved by Fejer in 1904.
In 1926 Kolmogorov produced an I E L 1 (T) such that {Sm/(x)} diverges at every
x; on the other hand, in 1927 M. Riesz proved that for 1 < p < 00, II Sml - I lip ---t 0
for every I E LP (T). The culmination of this subject is the theorem of Carleson
(1966, for p = 2) and Hunt (1967, for general p) that if I E LP(T) where p > 1,
then 8 m f ---t f almost everywhere.
For more information, see Zygmund [167], the articles by Zygmund and Hunt
in Ash [7], and Fefferman [42]. Also see Hewitt and Hewitt [74] for an interesting
historical discussion of the Gibbs phenomenon.
Convergence of Fourier series in n variables is an even trickier subject. In the first
place, one must decide what one means by a partial sum of a series indexed by zn.
It is a straightforward consequence of the Riesz and Carleson- Hunt theorems that if
f E LP (Tn) with p > 1, the "cubical partial sums"
S/(x) = L 1()e27ri.x (IIII = max(lll,..., Inl))
1IIIm
converge to f a.e. and (if p < (0) in the LP norm. On the other hand, C. Fefferman
proved the rather shocking result that for the "spherical partial sums"
Srf(x) = L f()e27ri.x
1I<r
n
(IKI2 = LK;),
1
the convergence limroo IISrl - flip == 0 holds for all I E LP only when p == 2,
if n > 1. Of course, one can consider modifications of the spherical partial sums in
the hope of obtaining positive results; the most intensively studied of these are the
Bochner- Riesz means
a I(x) = L (1 _lr-112)Ql()e27riK>x
IKI<r
obtained by taking <I>() == [max(1 - 112), 0] Q in Theorem 8.36. (When n == 1,
a+11 is essentially equivalent to the Cesaro mean ami.) These <I>'s satisfy the
hypotheses of Theorem 8.36 when Q > (n - 1), and some positive results are also
known for smaller values of Q.
Davis and Chang [30] is a good source for all of this material; see also Stein and
Weiss [142] and Ash [7].
38.7: The solution of the initial value problem for the wave equation in arbitrary
dimensions can be found in Folland [48]; see also Folland [49]. Further applications
of Fourier analysis to differential equations can be found in Folland [46], [48], Komer
[87], and Taylor [147].
Elements of Distribution
Theory
At least as far back as Heaviside in the 1890s, engineers and physicists have found
it convenient to consider mathematical objects which, roughly speaking, resemble
functions but are more singular than functions. Despite their evident efficacy, such
objects were at first received with disdain and perplexity by the pure mathematicians,
and one of the most important conceptual advances in modem analysis is the devel-
opment of methods for dealing with them in a rigorous and systematic way. The
method that has proved to be most generally useful is Laurent Schwartz's theory of
distributions, based on the idea of linear functionals on test functions. For some
purposes, however, it is preferable to use a theory more closely tied to L 2 on which
the power of Hilbert space methods and the Plancherel theorem can be brought to
bear, namely, the (L2) Sobolev spaces. In this chapter we present the fundamentals
of these theories and some of their applications.
9.1 DISTRIBUTIONS
In order to find a fruitful generalization of the notion of function on ffi.n, it is necessary
to get away from the classical definition of function as a map that assigns to each
point of ffi.n a numerical value. We have already done this to some extent in the
theory of LP spaces: If f E LP, the pointwise values f (x) are of little significance
for the behavior of f as an element of LP, as f can be modified on any set of measure
zero without affecting the latter. What is more to the point is the family of integrals
J f cP as cP ranges over the dual space Lq. Indeed, we know that f is completely
determined by its action as a linear functional on Lq; on the other hand, if we take
281
282 ELEMENTS OF DISTRIBUTION THEORY
<P == <Pr == m(Br )-lXB r where Br is the ball of radius r about x, by the Lebesgue
differentiation theorem we can recover the pointwise value f (x), for almost every
x, as limro J f<Pr. Thus, we lose nothing by thinking of f as a linear map from
Lq (}Rn) to C rather than as a map from }Rn to C.
Let us modify this idea by allowing f to be merely locally integrable on }Rn but
requiring rjJ to lie in C. Again the map <P J f <P is a well-defined linear functional
on C, and again the pointwise values of f can be recovered a.e. from it, by an easy
extension of Theorem 8.15. But there are many linear functionals on C that are
not of the form <P J f <P, and these - subject to a mild continuity condition to be
specified below - will be our "generalized functions."
Recall that for E c }Rn we have defined C (E) to be the set of all CCXJ functions
whose support is compact and contained in E. If U c }Rn is open, C (U) is the
union of the spaces C (K) as K ranges over all compact subsets of U. Each of the
latter is a Frechet space with the topology defined by the norms
<P II(Y<pllu
(a E {O, 1,2,.. .}n),
in which a sequence {<pj} converges to <P iff f)a <Pj --+ f)a <P uniformly for all a. (The
completeness of C(K) is easily proved by the argument in Exercise 9 in 35.1.)
With this in mind, we make the following definitions, in which U is an open subset
of}Rn:
1. A sequence {<pj} in C (U) converges in C to <P if {<pj} c C (K) for
some compact set K c U and <Pj --+ <P in the topology of C(K), that is,
aa<pj --+ aa<p uniformly for all a.
11. If X is a locally convex topological vector space and T : C (U) --+ X is
a linear map, T is continuous if TIC(K) is continuous for each compact
K c U, that is, if T<pj --+ T<p whenever <Pj --+ <p in C(K) and K c U is
compact.
111. A linear map T : C(U) --+ C(U') is continuous if for each compact
K c U there is a compact K' c U' such that T(C(K)) c C(K'), and T
is continuous from C(K) to C(K').
IV. A distribution on U is a continuous linear functional on C (U). The space
of all distributions on U is denoted by 1)' (U), and we set 1)' == 1)' (}Rn).
We impose the weak* topology on 1)' (U), that is, the topology of pointwise
convergence on C(U).
Two remarks: First, the standard notation 1)' for the space of distributions comes
from Schwartz's notation 1) for C, which is also quite common. Second, there is
a locally convex topology on C with respect to which sequential convergence in
C is given by (i) and continuity of linear maps T : C --+ X and T : C --+ C
is given by (ii) and (iii). However, its definition is rather complicated and of little
importance for the elementary theory of distributions, so we shall omit it.
Here are some examples of distributions; more will be presented below.
DISTRIBUTIONS 283
· Every f E Lfoc (U) - that is, every function f on U such that J K I I I < 00 for
every compact K c U - defines a distribution on U, namely, the functional
cP J f cP, and two functions define the same distribution precisely when they
are equal a.e.
· Every Radon measure J-L on U defines a distribution by cjJ J cjJ dJ-L.
. If Xo E U and ex is a multi-index, the map cjJ {)QcjJ(xo) is a distribution
that does not arise from a function; it arises from a measure J-L precisely when
ex == 0, in which case J-L is the point mass at Xo.
If f E Lfoc (U), we denote the distribution cjJ J I cjJ also by I, thereby identifying
Lfoc(U) with a subs pace of 1)'(U). In order to avoid notational confusion between
I (x) and I ( cjJ) == J I cP, we adopt a different notation for the pairing between C (U)
and 1)' (U). Namely, if F E 1)' (U) and cjJ E C (U), the value of F at cjJ will be
denoted by (F, cjJ). Observe that the pairing (.,.) between 1)' (U) and C (U) is
linear in each variable; this conflicts with our earlier notation for inner products but
will cause no serious confusion. If J-L is a measure, we shall also identify J-L with the
distribution cjJ J cjJ dJ-L
Sometimes it is convenient to pretend that a distribution F is a function even
when it really is not, and to write J F(x)cjJ(x) dx instead of (F, cjJ). This is the case
especially when the explicit presence of the variable x is notationally helpful.
At this point we set forth two pieces of notation that will be used consistently
throughout this chapter. First, we shall use a tilde to denote the reflection of a
function in the origin:
-
cjJ(x) == cjJ(-x).
Second, we denote the point mass at the origin, which plays a central role in distri-
bution theory, by 8:
(8, cjJ) == cjJ(O).
As an illustration of the role of 8 and the notion of convergence in 1)', we record
the following important corollary of Theorem 8.14:
9.1 Proposition. Suppose that I E L 1 (IRn) and J I == a, and for t > 0 let It (x) ==
t- n l(t- 1 x). Then It a8 in 1)' as t o.
Proof. If cjJ E C, by Theorem 8.14 we have
(ft, </J) = J it</J = it * </J (O) --. a;f;(O) = a</J(O) = a(8, </J).
.
Although it does not make sense to say that two distributions F and G in 1)' (U)
agree at a single point, it does make sense to say that they agree on an open set
V C U; namely, F == G on V iff (F, cjJ) == (G, cjJ) for all cjJ E C(V). (Clearly, if F
and G are continuous functions, this condition is equivalent to the pointwise equality
of F and G on V; if F and G are merely locally integrable, it means that F == G a.e.
284 ELEMENTS OF DISTRIBUTION THEORY
on V.) Since a function in C(VI U V 2 ) need not be supported in either VI or V 2 , it
is not immediately obvious that if F == G on VI and on V 2 then F = G on VI U V 2 .
However, it is true:
9.2 Proposition. Let {Va} be a collection of open subsets ofU and let V == Ua Va.
If F, G E 1)'(U) and F == G on each Va, then F == G on V.
Proof. If cjJ E C (V), there exist QI, . . . Qm such that supp cjJ C U V aj . Pick
'ljJI, . . . , 'ljJm E C such that supp( 'ljJj) C Vaj and E 'ljJj == 1 on supp( cjJ). (That
this can be done is the Coo analogue of Proposition 4.41, proved in the same way
as that result by using the Coo Urysohn lemma.) Then (F,cjJ) == E(F,'ljJjcjJ) ==
E(G, 'ljJjcjJ) == (G, cjJ). .
According to Proposition 9.2, if F E 1)' (U), there is a maximal open subset of
U on which F == 0, namely the union of all the open subsets on which F == O. Its
complement in U is called the support of F.
There is a general procedure for extending various linear operations from functions
to distributions. Suppose that U and V are open sets in ffi.n, and T is a linear map
from some subspace X of Lfoc(U) into Lfoc(V). Suppose that there is another linear
map T' : C(V) C(U) such that
j(Tf)</J = J f(T'</J)
(f E X, cjJ E C(V)).
Suppose also that T' is continuous in the sense defined above. Then T can be
extended to a map from 1)' (U) to 1)' (V), still denoted by T, by
(T F, cjJ) == (F, T' cjJ)
(F E 1)'(U), cjJ E C(V)).
The intervention of the continuous map T' guarantees that the original T, as well as
its extension to distributions, is continuous with respect to the weak* topology on
distributions: If Fa F E 1)'(U), then TFa TF in 1)'(V).
Here are the most important instances of this procedure. In each of them, U is
an open set in ffi.n, and the continuity of T' is an easy exercise that we leave to the
reader.
1. (Differentiation) Let Tf == {)af, defined on c1al(u). IfcjJ E C(U), inte-
gration by parts gives f ({)a f)cjJ == (-1) lal f f( {)acjJ); there are no boundary
terms since cjJ has compact support. Hence T' == (-l)laITIC(U), and we
can define the derivative {)a F E 1)' (U) of any F E 1)' (U) by
({)a F, cjJ) == (-1) I a I (F, {)a cjJ) .
Notice, in particular, that by this procedure we can define derivatives of arbitrary
locally integrable functions even when they are not differentiable in the classical
sense; this is one of the main reasons for the power of distribution theory. We
shall discuss this matter in more detail below.
DISTRIBUTIONS 285
11. (Multiplication by Smooth Functions) Given 'l/J E CCXJ (U), define T f == 'l/J f.
Then T' == T I C (U), so we can define the product 'l/J F E 1)' (U) for F E
1)' (U) by
('l/J F, cjJ) == (F, 'l/JcjJ).
Moreover, if 'l/J E C (U), this formula makes sense for any cjJ E C (}R n) and
defines 'l/JF as a distribution on }Rn.
III. (Translation) Given y E }Rn, let V == U + y == {x + y : x E U} and
let T == Ty. (Recall that we have defined Tyf(x) == f(x - y).) Since
J f(x - y)cjJ(x) dx == J f(x)cjJ(x + y) dx, we have T' == T_yIC(U + y).
For F E 1)'(U), then, we define the translated distribution TyF E 1)'(U + y)
by
(TyF, cjJ) == (F, T -ycjJ).
For example, the point mass at y is Ty8.
IV. (Composition with Linear Maps) Given an invertible linear transformation S
of}Rn, let V == S-l (U) and let T f == f 0 S. Then T' cjJ == I det SI- 1 cjJ 0 S-l
by Theorem 2.44, so for F E 1)' (U) we define F 0 S E 1)' (S-l (U)) by
(F 0 S, cjJ) == I det SI- 1 (F, cjJ 0 S-l).
In particular, for Sx == -x we have f 0 S == f, S-l == S, and I det SI == 1, so
we define the reflection of a distribution in the origin by
- -
( F, cjJ) == (F, cjJ) .
v. (Convolution, First Method) Given 'l/J E C, let
V == {x : x - y E U for y E supp( ) }.
(V is open but may be empty.) If f E Lfoc(U), the integral
f*'ljJ(x) = J f(x-y)'ljJ(y)dy= J f(y)'ljJ(x-y)dy= J f(Tx 'ljJ )
is well defined for all x E V. The same definition works for F E 1)' (U): the
convolution F * 'l/J is the function defined on V by
F * 'l/J ( x) == (F, T x 'l/J) .
- -
Since Tx'l/J -+ Txo'l/J in C as x -+ xo, F * 'l/J is a continuous function (actually
CCXJ, as we shall soon see) on V. As an example, for any 'l/J E C we have
- -
8 * 'l/J(x) == (8, Tx'l/J) == Tx'l/J(O) == 'l/J(x),
so 8 is the multiplicative identity for convolution.
286 ELEMENTS OF DISTRIBUTION THEORY
VI. (Convolution, Second Method) Let 'ljJ, ;f, and V be as in (v). If f E Lfoc(U)
and cP E C(V), we have
J (J * 'ljJ)cP = J J f(y)'ljJ(x - Y)cP(y) dy dx = J f( cP * 'ljJ ).
That is, if T f == f * 'ljJ, then T maps Lfoc (U) into Lfoc (V) and T' cP == cP * ;f.
For F E 1)' (U), we can therefore define F * 'ljJ as a distribution on V by
(F * 'ljJ, cP) == (F, cP * 'ljJ).
Again, we have b * 'ljJ == 'ljJ, for
({j * 'ljJ, cP) = ({j, cP * 'ljJ) = cP * ;j ( 0) = J cP ( x ) 'ljJ ( x ) dx = ('ljJ, cP).
The definitions of convolution in (v) and (vi) are actually equivalent, as we shall
now show.
9.3 Proposition. Suppose that U is open in }Rn and'ljJ E C. Let V == {x : x - y E
U for y E supp('ljJ)}. For F E 1)'(U) and x E V let F * 'ljJ(x) == (F, Tx'ljJ). Then
a. F * 'ljJ E Coo (V).
b. {)G (F * 'ljJ) == ({)G F) * 'ljJ == F * ({)G'ljJ).
c. For any cP E C(V), J(F * 'ljJ)cP = (F, cP * 'ljJ ).
Proof. Let el, . . . , en be the standard basis for}Rn. If x E V, there exists to > 0
such that x + tej E U for It I < to, and it is easily verified that
1 .-..." """" --------
t- (Tx+tej 'ljJ - Tx'ljJ) -? Tx{)j'ljJ in C (U) as t -? O.
It follows that {)j(F * 'ljJ)(x) exists and equals F * {)j'ljJ(x), so by induction, F * 'ljJ E
Coo(V) and {)G(F * 'ljJ) == F * {)G'ljJ. Moreover, since {)G 'ljJ == (-l)IGI and
{)G Tx == Tx{)G, we have
( {)G F) * 'ljJ ( x) == (a G F, T x 'ljJ ) == (-1) I G I (F, a G T x 'ljJ ) = (F, T x ) == F * ( {)G 'ljJ ) ( x ) .
Next, if cP E C(V), we have
cP * ;j(x) = J cP(y)'ljJ(y - x) dy = J cP(Y)Ty 'ljJ (X) dy.
The integrand here is continuous and supported in a compact subset of U, so
the integral can be approximated by Riemann sums. That is, for each (large)
mEN we can approximate supp( cP) by a union of cubes of side length 2- m
(and volume 2- nm ) centered at points y"{\ . . . , Y;:Cm) E supp( cP); then the corre-
sponding Riemann sums 8 m == 2- nm Lj cP(yj)Tyj'ljJ are supported in a common
DISTRIBUTIONS 287
compact subset of U and converge uniformly to cP * 'l/J as m --+ 00. Likewise,
--- ------
f)G sm == 2- nm Lj cP(yj )TYj f)G'l/J converges uniformly to cP * f)G'l/J == f)G (cP * 'l/J),
so sm --+ cP * 'l/J in Cgo(U). Hence,
(F,cP * 1/J ) == lim (F,sm) == lim 2-nm"\:"cP(yj)(F,Ty 'l/J )
moo moo J
J
= J cj;(y) (F, Ty 'lj; ) dy = J cj;(y)F * 'lj;(y) dy.
.
Next we show that although distributions may be highly singular objects, they can
all be approximated in the (weak*) topology of distributions by smooth functions,
even by compactly supported ones.
9.4 Lemma. Suppose that cP E cgo, 'l/J E C, and J'l/J == 1, and let 'l/Jt(x)
t-n'l/J (t- I x).
a. Given any neighborhood U of supp( cP), we have supp( cP * 'l/Jt) C U for t
sufficiently small.
b. cP * 'l/Jt --+ cP in C as t --+ O.
Proof. If supp( 'ljJ) C {x : Ixl < R} then supp( cP * 'l/Jt) is contained in the set of
points whose distance from supp( cP) is at most tR; this is included in a fixed compact
set if t < 1 and is included in U if t is small. Moreover, f)G ( cP * 'l/Jt) == (f)G cP) * 'l/Jt --+
f)G cP uniformly as t --+ 0, by Theorem 8.14. The result follows. .
9.5 Proposition. For any open U C ffi.n, C(U) is dense in 1)'(U) in the topology
of 1)' (U).
Proof. Suppose F E 1)' (U). We shall first approximate F by distributions
supported in compact subsets of U, then approximate the latter by functions in
C(U).
Let {Vj} be an increasing sequence of precompact open subsets of U whose union
is U, as in Proposition 4.39. For each j, by the Coo Urysohn lemma we can pick
(j E C(U) such that (j == 1 on V j . Given cP E C(U), for j sufficiently large we
have supp(cP) C Vj and hence (F, cP) == (F, (jcP) == ((jF, cP). Therefore (jF --+ F
as J --+ 00.
Now, as we noted in defining products of smooth functions and distributions,
since supp( (j) is compact, (j F can be regarded as a distribution on ffi. n . Let 'l/J, 'l/Jt
be as in Lemma 9.4, and ;j; (x) == 'l/J ( - x). Then J 'l/J == 1 also, so given cP E C,
we have cP * 'l/Jt --+ cP in C by Lemma 9.4. But then by P roposition 9.3, we
have ((j F) * 'l/Jt E Coo and (( (j F) * 'l/Jt, cP) == ((j F, cP * 'l/Jt) --+ ((j F, cP), so
( (j F) * 'l/Jt --+ (j F in 1)'. In short, every neighborhood of F in 1)' (U) contains the
Coo functions ((j F) * 'l/Jt for j large and t small.
288 ELEMENTS OF DISTRIBUTION THEORY
Finally, we observe that supp( (j) C Vk for some k. If supp( 4» n V k == 0, then
for sufficiently small t we have supp( 4>* 'ljJt ) n V k == 0 (Lemma 9.4 again) and hence
(( (j F) * 'ljJt, 4» == (F, (j (4) * 'ljJt )) == O. In other words, supp( ((j F) * 'ljJt) C V k C U,
so we are done. .
We conclude this section with some further remarks and examples concerning
differentiation of distributions. To restate the basic facts: Every FE'])' (U) possesses
derivatives oQ F E 1)' (U) of all orders; moreover, oQ is a continuous linear map
of 1)' (U) into itself. Let us examine a couple of one-dimensional examples to see
what sort of things arise by taking distribution derivatives of functions that are not
classically differentiable.
First, differentiating functions withjump discontinuities leads to "delta-functions,"
that is, distributions given by measures that are point masses. The simplest example
is the Heaviside step function H == X(O,oo)' for which we have
(H', cjJ) = -(H, cjJ') = -1 00 cjJ' (x) dx = cjJ(O) = (8, cjJ),
so H' == 8. See Exercises 5 and 7 for generalizations.
Second, distribution derivatives can be used to extract "finite parts" from divergent
integrals. For example, let f(x) == X- 1 X(O,oo)(x). f is locally integrable on ffi. \ {O}
and so defines a distribution there, but J f4> diverges whenever 4>(0) =1= O. Nonethe-
less, there is a distribution on ffi. that agrees with f on ffi. \ {O}, namely, the distribution
derivative of the locally integrable function L(x) == (log x )X(O,oo) (x). One way of
seeing what is going on here is to consider the functions L€(x) == (logx)X(€,oo) (x).
By the dominated convergence theorem we have J L4> == limEO J L E 4> for any
1> E C, that is, L E L in 1)'; it follows that L L' in 1)'. But
(L, 4» == - (L€, 4>') == _ 1 00 cjJ' (x) log x dx = 1 00 cjJ( x) dx + cjJ( E) log Eo
E E X
As E 0, this last sum converges even though the two terms individually do not.
Formally, passage to the limit gives (L',4» == J f4> + (log 0)4>(0); that is, L' is
obtained from f by subtracting an infinite multiple of 8. (This process is akin to the
"renormalizations" used by physicists to remove the divergences from quantum field
theory. )
Another way to analyze this situation is to consider smooth approximations to L,
such as LE(X) == L(X)'ljJ(EX) where'ljJ is a smooth function such that 'ljJ(x) == 0 for
x < 1 and 'l/;( x) == 1 for x > 2. The reader is invited to sketch the graphs of LE and
(LE)'; the latter will look like the graph of f together with a large negative spike near
the origin, which turns into "-00 . 8" as E --t O. See also Exercises 10 and 12.
Finally, we remark that one of the bugbears of advanced calculus, that equality
of mixed partials need not hold for functions whose derivatives are not continuous,
disappears in the setting of distributions: OjOk == 8k8j on C; therefore 8j8k ==
OkOj on 1)'! In the standard counterexample, f(x, y) == xy(x\2 - y2)(x 2 + y2)-1
(with f(O,O) == 0), 8xoyf and oy8 x f are locally integrable functions that agree
everywhere except at the origin; hence they are identical as distributions.
DISTRIBUTIONS 289
Exercises
1. Suppose that fl, f2,..., and f are in Lfoc(U). The conditions in (a) and (b)
below imply that fn --t f in 1)'(U), but the condition in (c) does not.
a. fn E LP(U) (1 < p < (0) and fn --t f in the LP norm or weakly in LP.
b. For all n, Ifn I < 9 for some 9 E Lfoc(U), and fn --t f a.e.
c. f n f pointwise.
2. The product rule for derivatives is valid for products of smooth functions and
distributions.
3. On, if'ljJ E Coo then 'ljJ8(k) == I:( -l)j ()'ljJ(j)(0)8(k-j), where the super-
scripts denote derivatives.
4. Suppose that U and V are open in n and <I> : V U is a Coo diffeomorphism.
Explain how to define F 0 <I> E 1)' (U) for any F E 1)' (V).
5. Suppose that f is continuously differentiable on IR except at Xl, . . . , X m , where
f has jump discontinuities, and that its pointwise derivative df j dx (defined except
at the Xj's) is in Lfoc(). Then the distribution derivative f' of f is given by
f' == (df jdx) + I:[f(xj+) - f(xj- )]Txj8.
6. If f is absolutely continuous on compact subsets of an interval U c , the dis-
tribution derivative f' E '])' (U) coincides with the pointwise (a.e.-defined) derivative
of f.
7. Suppose f E Ltoc (). Then the distribution derivative f' is a complex measure
on iff f agrees a.e. with a function FEN BV, in which case (f', cjJ) == J cjJ dF.
8. Suppose f E LP (n). If the strong LP derivatives 8j f exist in the sense of
Exercise 8 in 38.2, they coincide with the partial derivatives of f in the sense of
distributions.
9. A distribution F on n is called homogeneous of degree ,.\ if F 0 Sr == r A F for
all r > 0, where Sr(x) == rx.
a. 8 is homogeneous of degree -no
b. If F is homogeneous of degree A, then 8 a F is homogeneous of degree A -I ex I.
c. The distribution (dj dx) [X(O,oo) (x) log x] discussed in the text is not homo-
geneous, although it agrees on \ {O} with a function that is homogeneous of
degree -1.
10. Let f be a continuous function on n \ {O} that is homogeneous of degree -n
(i.e., f(rx) == r- n f(x)) and has mean zero on the unit sphere (Le., J f dO' == 0
where 0' is surface measure on the sphere). Then f is not locally integrable near the
origin (unless f == 0), but the formula
(PV(f), cjJ) == lim 1 f(x)cjJ(x) dx
fO Ixl>f
(cp E C)
290 ELEMENTS OF DISTRIBUTION THEORY
defines a distribution PV(f) - "PV" stands for "principal value" - that agrees
with f on jRn \ {O} and is homogeneous of degree -n in the sense of Exercise 9.
(Hint: For any a > 0, the indicated limit equals
r f(x)[ 4>(x) - 4>(0)] dx + r f(x )4>(x) dx,
Jlxla J1xl>a
r
and these integrals converge absolutely.)
11. Let F be a distribution on jRn such that supp(F) == {O}.
a. There exist N E N, C > 0 such that for all cjJ E C,
I(F, cjJ)/ < C L sup /aacjJ(x)l.
lalN Ixll
b. Fix E C with (x) == 1 for Ixl < 1 and (x) == 0 for Ixl > 2. If
cjJ E C, let cjJk(X) == cjJ(x)[l - (kx)]. If aacjJ(O) == 0 for lal < N, then
aacjJk aacjJ uniformly as k 00 for lal < N. (Hint: By Taylor's theorem,
laacjJ(x) I < ClxIN+1-lal for lal < N.)
c. If cjJ E C and aacjJ(O) == 0 for la! < N, then (F, cjJ) == o.
d. There exist constants C a (Ia! < N) such that F = I:lalN caaa/j.
12. Suppose A > n; then the function x Ixl-A on JRn is not locally integrable
near the origin. Here are some ways to make it into a distribution:
a. If cjJ E C, let P; be the Taylor polynomial of cjJ about x == 0 of degree k.
Given k > A - n - 1 and a > 0, define
(F!;:,4» = r [4>(x) - P$(x)] Jxl- A dx + r 4>(x)lxl-A dx.
Jlxla J1xl>a
Then F: is a distribution on jRn that agrees with Ixl- A on jRn \ {O}.
b. If A Z and we take k to be the greatest integer < A - n, we can let a 00
in (a) to obtain another distribution F that agrees with Ixl-A on JRn \ {O}:
(F,4» = J [4>(x) - P$(x)] Ixl- A dx.
c. Let n == 1 and let k be the greatest integer < A. Let
x == { [( k - A) . . . (1 - A)] -1 ( sgn x ) k I x I k - A if A > k,
f() (_l)k-l[(k - l)!J-l(sgnx)k log Ixl if >. = k.
Then f E Lfoc(jR), and the distribution derivative f(k) agrees with !xl- A on
jR \ {O}.
d. According to Exercise 11, the difference between any two of the distributions
constructed in (a)-(c) is a linear combination of /j and its derivatives. Which
one?
COMPACTLY SUPPORTED, TEMPERED, AND PERIODIC DISTRIBUTIONS 291
13. If F E 1)' and a j F == 0 for j == 1,..., n, then F is a constant function.
(Consider I * 'l/Jt where'l/Jt is an approximate identity in C.)
14. For n > 3, define F, FE E Lfoc(1Rn) by
Ixl 2 - n
F(x) = w n (2 - n) '
E (lxl 2 + E 2 )(2-n)/2
F (x) = w n (2 _ n) ,
where W n == 27r n / 2 /f( n/2) is the volume of the unit sphere, and let be the
Laplacian.
a. FE(X) == E- n g(E- 1 X) where g(x) == nw;I(lxI 2 + 1)-(n+2)/2.
b. J 9 == 1. (D se polar coordinates and set s == r 2 / (r 2 + 1).)
c. F == 8. (FE Fin 1)'; use Proposition 9.1.)
d. If cp E C, the function I == F * cp satisfies I == cpo
e. The results of (c) and (d) hold also for n == 1 but can be proved more simply
there. Forn == 2, theyholdprovidedF, FE are defined by F(x) == (27r)-1Iog Ix I
and FE == (47r)-1Iog(lxI 2 + E 2 ).
15. Define G on jRn X jR by G(x, t) == (47rt)-n/2e-lxI2/4tX(O,00) (t).
3. (at - )G == 8, where is the Laplacian on jRn. (Let GE (x, t)
G(x, t)X(E,OO) (t); then GE Gin 1)'. Compute ((at - )Gt, cjJ) for cjJ E C,
recalling the discussion of the heat equation in 38.7.)
b. If cp E C (jRn X ), the function f == G * cp satisfies (at - ) I == cpo
9.2 COMPACTLY SUPPORTED, TEMPERED, AND PERIODIC
DISTRIBUTIONS
If U is an open set in jRn, the space of all distributions on U whose support is a
compact subset of U is denoted by £' (U); as usual, we set £' == £' (jRn). £' (U) turns
out to be a dual space in its own right, as we shall now show.
The space COO(U) of Coo functions on U is a Frechet space with the Coo topology
- that is, the topology of uniform convergence of functions, together with all their
derivatives, on compact subsets of U. This topology can be defined by a countable
family of semi norms as follows. Let {V m} 1 be an increasing sequence of precompact
open subsets of U whose union is U, as in Proposition 4.39; then for each mEN
and each multi-index ex we have the seminorm
(9.6)
IIfl/[m,a] == s up laa l(x)l.
xEV m
Clearly aa Ij aD: I uniformly on compact sets for all ex iff II Ij - III [m,a] 0 for
all m, ex; a different choice of sets V m would yield an equivalent family of seminorms.
9.7 Proposition. C(U) is dense in COO(U).
292 ELEMENTS OF DISTRIBUTION THEORY
Proof. Let {V m }l be as in (9.6). For each m, by the Coo Urysohn lemma we can
pick 1/Jm E C(U) with 1/Jm == 1 on V m. If 4> E COO(U), clearly II'ljJm4>-4>11 [mo,a] ==
o provided m > mo; thus 1/Jm4> 4> in the Coo topology. .
9.8 Theorem. £'(U) is the dual space ofCOO(U). More precisely: If F E £'(U),
then F extends uniquely to a continuous linear functional on COO(U); and ifG is a
continuous linear functional on COO(U), then GIC(U) E £'(U).
Proof. If F E £'(U), choose 'ljJ E C(U) with 'ljJ == 1 on supp(F), and define
the linear functional G on Coo (U) by (G, 4» == (F, 1/J4». Since F is continuous
on C (supp( 1/J)), and the topology of the latter is defined by the norms 4> r-;
/laa4>llu, by Proposition 5.15 there exist N E Nand C > 0 such that I(G, 4»1 <
CLlalN lIaa('ljJ4»llu for 4> E COO(U). By the product rule, if we choose m large
enough so that su pp ( 'ljJ) C V m, this implies that
I (G, 4» I < C' L sup laa4>(x) I < C' L 114>11 [m,aJ,
lalN xEsupp('l/J) lalN
so that G is continuous on Coo (U). That G is the unique continuous extension of F
follows from Proposition 9.7.
On the other hand, if G is a continuous linear functional on Coo (U), by Propo-
sition 5.15 there exist C, m, N such that I(G, 4»1 < CLlalN 114>11 [m,a] for all 4> E
COO(U). Since 114>11 [m,a] < Ilaa4>/lu, this implies that G is continuous on C(K) for
each compact K c U, so GIC(U) E 1)'(U). Moreover, if [supp(4))] n V m == 0,
then (G,4» == 0; hence supp(G) C V m and GIC(U) E £'(U). .
The operations of differentiation, multiplication by Coo functions, translation,
and composition by linear maps discussed in 39.1 all preserve the class £'. As for
convolution, there is more to be said.
First, if F E £' and 4> E C then F * 4> E C, as Proposition 8.6d remains
valid in this setting. Second, if F E £' and 'lj; E Coo, F * 'ljJ can be defined as a Coo
function or as a distribution just as before:
F * 'ljJ ( x) == (F, T x 1/J) ,
--
(F * 'ljJ, 4» == (F,4> * 'ljJ) (4) E C)
(see Exercise 16). Finally, a further dualization allows us to define convolutions of
arbitrary distributions with compactly supported distributions. To wit, if FE'])' and
G E £', we can define F * G E 1)' and G * FE'])' as follows:
-- --
(F * G, 4» == (F, G * 4», (G * F, 4» == (G, F * 4»
(4) E C),
--
and likewise for F. The proof that F * G and G * F are indeed distributions (i.e., that
they are continuous on C) and that F * G == G * F requires a closer examination of
the continuity of the maps involved. We shall not pursue this matter here; however,
see Exercises 20 and 21.
A notable omission from our list of operations that can be extended from functions
to distributions is the Fourier transform. The trouble is that does not map C
COMPACTLY SUPPORTED, TEMPERED, AND PERIODIC DISTRIBUTIONS 293
.........
into itself; in fact, if cP E C, then cP cannot vanish on any nonempty open set
.........
unless cP == O. To see this, suppose cP == 0 on a neighborhood of o. Replacing cP
by e-27rio'x cP, we may assume that o == O. Since cP has compact support, we can
expand e-27ri.x in its Maclaurin series and integrate term by term to obtain
......... 00 1 1 1 1
cj;() = L k! (-27ri . x)kcj;(x) dx = L a! '" (-27rix)"'cj;(x) dx
k=O a
(see Exercise 2a in 38.1). But J( -21rixYcP(x) dx == aa$(o) for all ex by Theorem
.........
8.22d. These derivatives all vanish by assumption, so cP == 0 and hence cP == O.
However, we do have available a slightly larger space of smooth functions that is
mapped into itself by , namely, the Schwartz class S. We recall that S is a Frechet
space with the topology defined by the norms
IIcPll(N,a) == sup (1 + IxI)NlaacP(x)l.
xEJRn
9.9 Proposition. Suppose 'ljJ E C and'ljJ(O) == 1, and let 'ljJE (x) == 'ljJ( EX). Then
for any cP E S, 'ljJE cP cP in S as E O. In particular, C is dense in S.
Proof Given N E N, for any 'T] > 0 we can choose a compact set K such that
(1 + Ixl)N IcP(x) I < 'T] for x K. Since 'ljJ( EX) 1 uniformly for x E K as E 0,
it follows easily that lI'ljJEcP - cPl/ (N,O) 0 for every N. For the norms involving
derivatives, we observe that by the product rule,
(1 + Ixl)N aa('ljJE cP - cP) == (1 + Ixl)N ('ljJEaa cP - aa cP ) + EE(X),
where EE is a sum of terms involving derivative of 'ljJE. Since
la/3'ljJE(X) I == E I /3lla/3'ljJ(EX)1 < C/3E1/31,
we have 1/ EE 1/ U < C E 0 as E o. The preceding argument then shows that
lI'ljJEcP - cPl/(N,a) o. .
A tempered distribution is a continuous linear functional on S. The space of
tempered distributions is denoted by S'; it comes equipped with the weak* topology,
that is, the topology of pointwise convergence on S. If F E S', then FIC is
clearly a distribution, since convergence in C implies convergence in S, and FIC
determines F uniquely by Proposition 9.9. Thus we may, and shall, identify S' with
the set of distributions that extend continuously from C to S. We say that a locally
integrable function is tempered if it is tempered as a distribution.
The condition that a distribution be tempered means, roughly speaking, that it
does not grow too fast at infinity. Here are a few examples:
. Every compactly supported distribution is tempered.
. If f E LfocORn) and J(l + Ixl)Nlf(x)1 dx < 00 for some N, then f is
tempered, for I J fcPl < CllcPlI(o,N).
294 ELEMENTS OF DISTRIBUTION THEORY
. The function f (x) == e ax on lR is tempered iff a is purely imaginary. Indeed,
suppose a = b + ic with b, c real. If b == 0, then f is bounded and hence
tempered by (ii). If b # 0, choose a function 'ljJ E C such that J 'ljJ == 1, and
let c/Jj (x) == e-ax'ljJ( x - j). It is easily verified that c/Jj 0 in S as j +00
(if b > 0) or j --t -00 (if b < 0), but J fc/Jj == J'ljJ == 1 for all j.
. On the other hand, the function f (x) == eX cos eX on lR is tempered, because it
is the derivative of the bounded function sin eX. Indeed, if c/J E S, integration
by parts yields
J f</Y = - J </y'(x)sineXdx < ClI</Ylb,1)'
Intuitively, f (x) is not too large "on average" when x is large, because of its
rapid oscillations.
We turn to the consideration of the basic linear operations on tempered distri-
butions. The operations of differentiation, translation, and composition with linear
transformations work just the same way for tempered distributions as for plain dis-
tributions; these operations all map Sand S' into themselves. The same is not true
of multiplication by arbitrary smooth functions, however. The proper requirement
on 'ljJ E CCX> in order for the map F 'ljJF to preserve Sand S' is that 'ljJ and all its
derivatives should have at most polynomial growth at infinity:
laa'ljJ(x) I < C a (1 + IxI)N(a) for all a.
Such CCX> functions are called slowly increasing. For example, every polynomial
is slowly increasing; so are the functions (1 + IxI 2 )S (8 E lR), which will play an
important role in the next section.
As for convolutions, for a F E S' and 'ljJ E S we can define the convolution
F * 'ljJ by F * 'ljJ (x) == (F, T X 'ljJ), as before, and we have an analogue of Proposition
9.3:
9.10 Proposition. If F E S' and 'ljJ E S, then F * 'ljJ is a slowly increasing CCX>
function, and for any c/J E S we have J(F * 'ljJ)c/J == (F, c/J * ;[;).
Proof. That F * 'ljJ E CCX> is established as in Proposition 9.3. By Proposition
5.15, the continuity of F implies that there exist m, N, C such that
I(F, c/J)I < C L 1Ic/JII(m,a)
lalN
(c/JES),
COMPACTLY SUPPORTED, TEMPERED, AND PERIODIC DISTRIBUTIONS 295
and hence by (8.12),
IF * ?jJ(x) I < C L sup(1 + lyl)mlaa?jJ(x - y)1
lal::;N y
< C(1 + Ixl)m L sup(1 + Ix - yl)mlaa?jJ(x - y)1
lal::;N y
< C(1 + Ixl)m L 1I?jJII(m,a).
lalN
The same reasoning applies with ?jJ replaced by a f3 ?jJ, so F * ?jJ is slowly increasing.
Next, by Proposition 9.3 we know that the equation J(F * ?jJ)cP == (F, cP * ;j;) holds
when cP,?jJ E C. By Proposition 9.9, if cP,?jJ E S we an find seq uences {cPj} and
{?jJj} in C that converge to cP and ?jJ in S. Then cPj *?jJj cP * ?jJ in S by (the proof
--- ---
of) Proposition 8.11, so (F, cPj * ?jJj) (F, cP * ?jJ). On the other hand, the preceding
estimates show that IF * ?jJj(x)1 < C(1 + Ixl)m with C and m independent of j,
and likewise IcPj(x)1 < C(1 + Ixl)-m-n-1, so J(F * ?jJj)cPj J(F * ?jJ)cP by the
dominated convergence theorem. .
Finally, we come to the principal raison d' etre of tempered distributions, the
Fourier transform. We recall (Corollary 8.23) that the Fourier transform maps S
continuously into itself, and that for f, 9 E L 1 (in particular, for f, 9 E S) we have
1 J(y)g(y) dy = 11 J(x)g(y)e-2rrix.y dxdy = 1 J(x)g(x) dx.
We can therefore extend the Fourier transform to a continuous linear map from S' to
itself by defining
(F, cP) == (F, $)
(F E S', cP E S).
This definition clearly agrees with the one in Chapter 8 when FELl + L 2 .
The basic properties of the Fourier transform in Theorem 8.22 continue to hold in
this setting. To wit,
(TyF)- == e-27ri.y F,
T: F - [ e27riTJox F ] -
TJ - ,
aa F == [( -27rix)a F]-, (aa F)-== (27ri)a F,
(f 0 T)- == I det TI- 1 1 0 (T*)-l (T E GL(n, lR)),
(F * ?jJ)- == F (?jJ E S).
(The first four of these formulas involve products of slowly increasing CCX> functions,
specified by their values at a general point x or , and tempered distributions.) The
easy verifications of these facts are left to the reader (Exercise 17).
Moreover, we can define the inverse transform in the same way:
(F V , cP) == (F, cP V).
296 ELEMENTS OF DISTRIBUTION THEORY
The Fourier inversion theorem formula cjJ == () v == (cp v r'then extends to S':
( ( F) v , cp) == (F, cp V) == (F, (cp v f') == (F, cp) ,
so that (F) v F, and likewise (FVf' == F. Thus the Fourier transform is an
isomorphism on S'.
If F E £', there is an alternative way to define F. Indeed, (F, cp) makes sense for
any cjJ E Coo, and if we take cjJ(x) == e-27ri.x, we obtain a function of that has a
strong claim to be called F(). In fact, the two definitions are equivalent:
9.11 Prop-osition. If F E £', then F is a slowly increasing Coo function, and it is
given by F() == (F, E_) where E(x) == e27ri.x.
Proof. Let g() == (F, E_). Consideration of difference quotients of g, as in
the proof of Proposition 9.3, shows that 9 is a Coo function with derivatives given
by oag() == (F, of E_) == (-27ri)la l (F, x a E_). Moreover, by Theorem 9.8 and
Proposition 5.15, there exist m, N, C such that
loag()1 < C L sup lof3[x a E_(x)]1 < C'(1 + m)la l (1 + II)N,
I,BIN Ixlm
so 9 is slowly increasing.
-..
It remains to show that 9 == F, and by Proposition 9.9 it suffices to show that
J gcp == (F,) for cp E C. In this case gcp E C, so J gcjJ can be approximated
by Riemann sums as in the proof of Proposition 9.3, say g( j )cp( j) tlj. The
corresponding sums cp(j )e-27rij"x tlj and their derivatives in x converge uni-
-..
formly, for x in any compact set, to cp( x) and its derivatives. Therefore, since F is a
continuous functional on Coo,
J gcjJ = lim L (P, E_f.j )cjJ(j) t::.j = lim( P, L cjJ(Xj )E_f.j t::.j ) = (P, if;;.
.
It is time for some examples. First and foremost, the Fourier transform of the
point mass at 0 is the constant function 1: (8, E_) == E_(O) == 1. More generally,
for point masses at other points and their derivatives, we have
(oaTy8)-"() == (-1)la l (8, T_yOa E_) == (-1)lalo(e-27ri,(x+Y))lx=o
= (27ri)ae-27ri.Y.
In particular:
9.12 Proposition. The Fourier transforms of the linear combinations of 8 and its
derivatives are precisely the polynomials.
COMPACTLY SUPPORTED, TEMPERED, AND PERIODIC DISTRIBUTIONS 297
The Fourier inversion theorem then yields the formulas for the Fourier transforms
of polynomials and imaginary exponentials:
(9.13) (xaf'== [(_x)Q]V == (-27ri)-la 1 o a<5,
.-. v
Ey == (E_y) == Ty<5.
As an illustration of the heuristics associated to these results, consider the formula
J e27r:it; o x dI;, = 8(x).
Although this is nonsensical as a pointwise equality, it is valid when viewed from the
right angle. One the one hand, it expresses the fact that the Fourier transform of the
constant function 1 is <5. More interestingly, it is a concise statement of the Fourier
inversion theorem. Indeed, if we replace x by x - y, integrate both sides against
cP E S, and reverse the order of integration on the left, we obtain
J J <jJ(y)e 27ri t;o(x- y ) dy dx = J 8(x - y)<jJ(y) dy.
The integral on the left is () v (x), and the integral on the right equals cP( x ) !
It is an important fact that every distibution is, at least locally, a linear combination
of derivatives of continuous functions. The Fourier transform yields an easy proof of
this:
9.14 Proposition.
a. If F E £', there exist N E N, constants C a (Ial < N), and f E Co(JRn) such
that F == ElalN caoaf.
b. If F E 1)' (U) and V is a precompact open set with V c U, there exist N, C a , f
as above such that F == ElalN cao a f on v.
Proof. By Proposition 9.11, if F E £' then F is slowly increasing, so the function
g() == (1 + 112)-M F() will be in £1 if the integer M is chosen sufficiently large.
Let f == g; then f E Co and F == (1 + 112)M 1, so F == (1 - (47r 2 )-1 E oJ)M f.
This proves (a); for (b), choose E C(U) such that == 1 on V, and apply (a) to
F. .
We conclude this section with a sketch of the theory of periodic distributions;
some of the details are fleshed out in Exercises 22-24.
The space CCX> (Tn) of smooth periodic functions is a Frechet space with the
topology defined by the semi norms cP IloacPllu, and a distribution on Tn is a
continuous linear functional on this space; the space of distributions on Tn is denoted
by 1)' C1r n ). If F E TI' C1r n ), its Fourier transform is the function F on 7l... n defined by
!( "') == (F, E-K,) where EK, (x) == e 27ri K, O x. Since F satisfies an estimate of the form
I (F, cP) I < C ElalN 11 8a cPllu, there exist C, N such that
IF(I'\:) I < C(l + II'\:I)N,
(9.15)
298 ELEMENTS OF DISTRIBUTION THEORY
and the Fourier transform is an isomorphism from 1)' ern) to the space of all functions
on tl n satisfying such an estimate. Moreover, if F E 1)' ern), the Fourier series
E F("" )E converges in 1)' (Tn) to F.
Instead of defining periodic distributions as distributions on Tn (linear functionals
on CCX> (Tn)), one can define them as distributions on lR n (linear functionals on
C (lR n )) that are invariant under the translations T, "" E 7l n . Accordingly, let
1)'(lR n )per == {F E 1)'(lR n ): TF == Ffor"" Ell}.
The periodization map PcjJ == EEZn TcjJ used in Theorem 8.31 is easily seen to map
C (IRn) continuously into CCX> (Tn), so it induces a map P' : 1)' (Tn) 1)' (lR n )
given by (P'F,cjJ) == (F,PcjJ). SincePoT == Pfor"" E 7l n , wehaveToP' == P',
that is, the range of P' lies in 1)' (lR n )per. In fact, P' : 1)' (Tn) 1)' (lR n )per is a
bijection. (The proof is nontrivial; see Exercise 24.) Moreover, if fELl (Tn), then
f and P' f coincide as periodic functions on lR, for if cjJ E C (JRn),
(PI f, cf;) = (f, Pcf;) = r f(x) L cf;(x - h:) dx
J[o,l)n
= L r f(x)cf;(x) dx = r f(x)cf;(x) dx = (f, cf;).
J[o,l)n+ Jn
Thus the two descriptions of periodic distributions are equivalent.
If F E 1)'(Tn), the Fourier series E F(",,)E converges in 1)'(1r n ) to F; on the
other hand, it follows easily from (9.15) that it also converges in S' (lR n ), and its sum
there is P' f. Thus 1)' (lR n ) per C S' (lR n ), and by (9.13) we have
(PI Ff= L F(h:)E" = L F(h:)r,,8,
giving the relation between the lR n _ and Tn-Fourier transforms for periodic distribu-
'"
tions. In particular, if F == 8 1r n, the point mass at the origin in Tn, then F( "") == 1
'"
for all ""; hence P' F and (P' F) are both equal to E T8 - a restatement of the
Poisson summation formula.
Exercises
16. Su!:.pose F E £' and'l/J E CCX>. Show that for any cjJ E C, J (F, Tx 'l/J )cjJ(X) dx ==
(F, cjJ * 'l/J). (The result can be reduced to Proposition 9.3; given F and cjJ, the indicated
expressions depend only on the values of'l/J in a compact set.)
17. Suppose that F E S'. Show that
a. (TyF)"'== e-21rioy F, T'lF == [e21riTJox F]-:
b. {)Ci F == [( -27rix)Ci F ({)Ci F)'" == (27ri)Ci F.
c. (F 0 T)"'== I det TI- 1 F 0 (T*)-l for T E GL(n, IR).
d. (F * 'l/J)'" == ;j;F for 'l/J E S.
COMPACTLY SUPPORTED, TEMPERED, AND PERIODIC DISTRIBUTIONS 299
18. If n == I + m, let us write x E JRn as (y, z) with y E JRl and z E JRm. Let
l' denote the Fourier transform on JRn and 1'1, 1'2 the partial Fourier transforms
in the first and second sets of variables - i.e., 1'lf(T/, z) == J f(y, z)e-27ri17°Y dy
and likewise for 1'2. Then 1'1 and 1'2 are isomorphisms on S(JRn) and S' (JRn), and
l' == 1'11'2 == 1'21'1.
19. On JR, let Fo = PV (1/ x) as defined in Exercise 10. Also, for c > 0 let
FE (x) == x(x 2 + c 2 )-I, G(x) == (x ::l: ic)-I, and SE(X) == e-27rElxl sgn x.
a. limE-+o FE == Fo in the weak* topology of S'. (Theorem 8.14, with a == 0,
may be useful.)
b. limE-+o G E == Fo =F 1ri8. (Hint: (x::l: ic)-1 == (x =F ic)(x 2 + c 2 )-I.)
c. SE == (7ri)-1 FE and hence sgn = (7ri)-1 Fo.
--
d. From (c) it follows that Fo == -7ri sgn. Prove this directly by showing
that Fo == limE-+o, N-+oo HE,N, where HE,N(X) == x-I if c < Ixl < Nand
HE,N (x) == 0 otherwise, and using Exercise 59b in 32.6.
e. Compute X(O,oo) (i) by writing X(O,oo) == sgn + and using (c), (ii) by
writing x(O,oo) (x) = lime-EXx(O,oo)(x) and using (b).
20. Suppose that F E S' and G E £'.
a. FC is well-defined element of S'.
b. If1/; E S, then G * 1/; E S.
-- -- --
c. Let F * G (or G * F) be the tempered distribution such that (F * G) == FG.
- -
Then (F * G, 1/;) == (F, G * 1/;) == (G, F * 1/;) for1/; E S.
21. Suppose that F, G, H E S'.
a. If at most one of F, G, H has noncompact support, then (F * G) * H =
F * (G * H), where the convolutions are defined as in Exercise 20.
b. On JR, let F be the constant function 1, G == d8/dx, and H = X(O,oo). Then
(F * G) * Hand F * (G * H) are well defined in S' but are unequal.
22. Let E",,(x) == e 27ri "".x. If g : zn <C satisfies Ig(K)1 < C(l + IKI)N for some
C, N > 0, then the series I:KEZ n g( K )E"" converges in 1)' (1r n ) to a distribution
F that satisfies F = g. It also converges in S' (JRn) to a tempered distribution G
(== P' F) such that T"" G == G for all K.
23. Suppose that F, G E 1)' (1r n ).
a. There is a unique F * G E 1)'(1r n ) such that (F * G)-- == FC. (Use Exercise
22.)
b. IfG E COO(1r n ), thenF*G E e OO (1r n ) andF*G(x) == (F,TxG) asonJRn.
24. Let P be the periodization map, PcP = I:""Ezn T""cP.
a. P is a continuous linear map from cgo(IRn) to e OO (1r n ). (Note that for
cP E e and x in a compact set, only finitely many terms of the series I: T""cP(X)
are nonzero.)
b. Choose "'( E ego with J "'( == 1, and let w == "'( * X[o,l)n. Then W E Cgo and
Pw == 1.
300 ELEMENTS OF DISTRIBUTION THEORY
c. If 'l/J E Coo (Tn), then 'l/J == P (w'l/J) (where 'l/J is regarded as a function on Tn
on the left and as a function on JRn on the right). Consequently, P : C (JRn)
Coo (Tn) is surjective and the dual map P' : 1)' (Tn) 1)' (JRn )per is injective.
d. Given G E 1)'(JRn)per, define F E 1)'(Tn) by (F, 'ljJ) == (G, w'ljJ) (with the
same understanding as in part (c)). Then P' F == G, so P' maps 1)' (Tn) onto
1)' (JRn )per.
25. Suppose that P is a polynomial in n variables such that only zero of P() in JRn
is == 0, and let P(D) be as in 38.7.
a. Every tempered distribution F that satifies P(D)F == 0 is a polynomial. (Use
Proposition 9.12 and Exercise 11.)
b. Every bounded function f that satisfies P( D) f == 0 is a constant. (This
result, for the special cases where P(D) is the Laplacian or the Cauchy-Riemann
operator ax + ia y on JR2, is known as Liouville's theorem.)
26. On JRn x JR, let G(x, t) == (47rt)-n/2e-lxI2/4tX(O,00) (t).
a. G is the tempered function G(, r) == (27rir + 47r2112)-1. (Use Proposition
8.24 and Exercise 18.)
b. Deduce that (at - )G == 8. (Cf. Exercise 15.)
27. Suppose that 0 < Re a < n.
a. For any cP E S,
r((n - a)/2) J Ixla-nJ(x) dx = f(a/2) J 1I-acP() d.
7r(n-a)/2 7r a / 2
(Hint: By Proposition 8.24 and Lemma 8.25, if t > 0 we have
J e-11" t lxI 2 J( x) dx = C n/2 J e _11"1';1 2 It <p() d.
Multiply both sides by t- 1 +(n-a)/2 dt and integrate from 0 to 00.)
b. Let Ra(x) == r((n - a)/2)[r(a/2)2 a 7r n / 2 ]-1Ixla-n. Then Ra is a tem-
pered function and Ra is the tempered function Ra() == (27r11)-a.
c. If n > 2, then R2 == -8. (Cf. Exercise 14. See the next exercise for the
case n == 2.)
28. Suppose n == 2. For 0 < Rea < 2, let C a == r((2 - a)/2)[r(a/2)2 a 7r]-1
and Qa(x) == ca(Ila-2 - 1). (Note that Qa differs by a constant from the Ra in
Exercise 27.)
a. lim a -+2 Qa(x) == _(27r)-1Iog lxi, pointwise and in S'.
b. By (a), lim a -+2 Qa exists in S', and by Exercise 27b, Qa() == (27r1I)-a -
C a 8 . Noting that (27r I I ) - 2 is not integrable near the origin and that lima -+ 2 C a ==
--
00, find an explicit formula for lim a -+2 Qa. (Exercise 12 may help.)
29. For 1 < p < 00, let e p be the set of all F E S' for which there exists C > 0
such that II F * cPlip < C II cPlip for all cP E S, so that the map cP r---7 F * cP extends to a
bounded operator on LP.
SOBOLEV SPACES 301
a. e 1 == M (JR n ). (If FEel, consider F * CPt where {<pt} is an approximate
identity, and apply Alaoglu's theorem.)
b. e 2 == {F E S': FE LOO}. (UsethePlancherel theorem.)
c. If p and q are conjugate exponents, then e p == e q . (Hint: (F * <p,) ==
(F * , <p).)
d. If 1 < p < 2 and q is the conjugate exponent to p, then e p C e r for all
r E (p, q). (Use the Riesz-Thorin theorem.)
e. e 1 C e p C e 2 forallp E (1,00).
9.3 SOBOLEV SPACES
One of the most satisfactory ways of measuring smoothness properties of functions
and distributions is in terms of L2 norms. There are two reasons for this: L2 has
the advantage of being a Hilbert space, and the Fourier transform, which converts
differentiation into multiplication by the coordinate functions, is an isometry on L 2 .
As a first step, suppose kEN and let Hk be the space of all functions! E L 2 (JRn)
whose distribution derivatives aa ! are L2 functions for lal < k. One can make Hk
into a Hilbert space by imposing the inner product
(I,g) I--> L /(8C> 1)( 8C>g ).
lalk
However, it is more convenient to use an equivalent inner product defined in terms of
the Fourier transform. Theorem 8.22e and the Plancherel theorem imply that! E H k
if[ a f E L2 for lal < k. A simple modification of the argument in the proof of
Proposition 8.3 shows that there exist C 1 , C 2 > 0 such that
0 1 (1 + 112)k < L 1C>12 < O 2 (1 + 112)k,
lalk
from which it follows that! E Hk iff (1 + 112)k/2 f E L 2 and that the norms
1 I--> (L 118C> III) 1/2 and 1 I--> II (1 + 112)k/2 1]12
lalk
are equivalent. The latter norm, however, makes sense for any k E JR, and we can
use it to extend the definition of Hk to all real k.
We proceed to the formal definitions. For any s E IR the function (1 + 112)s/2
is Coo and slowly increasing (Exercise 30), so the map As defined by
As! == [(1 + 112)S/2 jJ v
is a continuous linear operator on S' - actually an isomorphism, since A;l == A-s-
If s E IR, we define the Sobolev space Hs to be
Hs == {! E S' : As! E L 2 },
302 ELEMENTS OF DISTRIBUTION THEORY
and we define an inner product and norm on Hs by
(I,g)(s) = /(Asf)( Asg ) = / f()(l + 112 tg() d,
[/ ..- ] 1/2
II fll (s) == IIAsfl12 == If() 1 2 (1 + 112) s d .
(The equality of the two formulas for (f,g)(s) and for IIfll(s) follows from the
Plancherel theorem. ) Note that the inner products (-,.) (s) are conjugate linear in
the second variable, but we are continuing to use the notation (., .) for the bilinear
pairing between S' and S. This will cause no confusion, since we shall not be using
the inner products (., .)(s) explicitly.
The following properties of Sobolev spaces are simple consequences of the defi-
nitions and the preceding discussion:
1. The Fourier transform is a unitary isomorphism from Hs to L 2 (IRn , f-Ls) where
df-Ls() == (1 + 112)s d. In particular, Hs is a Hilbert space.
11. S is a dense subspace of Hs for all s E JR. (This follows easily from (i) and
Proposition 8.17.)
111. 1ft < s, Hs is a dense subspace of Ht in the topology of Ht, and 11.11 (t) < 11.11 (s).
IV. At is a unitary isomorphism from Hs to Hs-t for all s, t E IR.
v. Ho == L2 and 11.11(0) = II . 112 (by Plancherel).
VI. a o. is a bounded linear map from Hs to H s -/ o.1 for all s, a (because Io. I <
(1 + 112)lo.l/2).
By (iii) and (v), for s > 0 the distributions in Hs are L2 functions. For s < 0
the elements of Hs are generally not functions. For example, the point mass 8 is
in Hs iff s < -n, for 8 is the constant function 1, and J(l + 112)S d < 00
iff s < - n. Another example: The distribution W t whose Fourier transform is
(27r11)-1 sin 27rtll, which arose in the discussion of the wave equation in 38.7, is
in Hs iff s < 1 - n; it i in L1 n L 2 when n == 1 and in L1 \ L 2 for n == 2, but is
not a function for n > 3.
9.16 Proposition. If s E JR, the duality between S' and S induces a unitary isomor-
phismfrom H-s to (H s )*. More precisely, if f E H-s, the functional c/; r---7 (f, c/;)
on S extends to a continuous linear functional on Hs with operator norm equal to
Ilfll(-s), and every element of(Hs)* arises in thisfashion.
Proof. If f E H -s and c/; E S,
(I, cj;) = (Iv, f> = / fV ()()
SOBOLEV SPACES 303
since fV () == f( -) is a tempered function. Thus by the Schwarz inequality,
1(f,<f»1 < [/ IfV(W(1 + 112)-S dr/2 [/ 1J;()12(1 + 112r dr/2
== Ilfll(-s)II</;II(s),
so the functional 4Y (f, 4Y) extends continuously to Hs, with norm at most Ilfll( -s).
In fact, its norm equals II fll (-s), since if 9 E S' is the distribution whose Fourier
transform is g() == (1 + 112)-s j(), we have 9 E Hs and
(f,g) = / 1j()12(1 + 112r d = IIfll-s) = IlfIIC-s)llgIIC s )'
Finally, if G E (H s )*, then G 0 -1 is a bounded linear functional on £2(/1s) where
d/1s() == (1 + 112)s d, so there exists 9 E L 2 (/1s) such that
G(cjJ) = / J;()g()(1 + 112r d.
But then G(</;) == (f, </;) where fV () == (1 + 112)sg(), and f E H-s since
IIfll-s) = / 1j()12(1 + 112)-S d = / Ig(W(1 + Ins df
.
For s > 0, the elements of Hs are L 2 functions that are "L 2 -differentiable up
to order s," and it is natural to ask what is the relationship between this notion of
smoothness and ordinary differentiability. Of course, if one thinks of elements of H s
as distributions or elements of L2, there is no distinction among functions that agree
almost everywhere; from this perspective, when one says that a function in H s is of
class C k , one means that it agrees a.e. with a C k function. With this understanding,
the question just posed has a simple and elegant answer. We introduce the notation
C == {f E Ck(JR n ) : an f E Co for lal < k}.
C is a Banach space with the C k norm f r---7 Llnlk Ila n fllu.
9.17 The Sobolev Embedding Theorem. Suppose s > k + n.
a. If f E Hs, then (an ff' E L1 and II(a n ff'lh < Cllfll(s)jor lal < k, where C
depends only on k - s.
b. Hs c C, and the inclusion map is continuous.
Proof. By the Schwarz inequality,
(27!r ICtI / I (aCt fn)1 d = / ICt j()1 d < / (1 + 112)k/21j()1 d
[/ ......... ] 1/2 [/ ] 1/2
< (1 + 112)s If() 12d (1 + 112) k-s d .
304 ELEMENTS OF DISTRIBUTION THEORY
The first factor on the right is II f II (s), and the second one is finite by Corollary 2.52
since 2( k - s) < -no This proves (a), and (b) follows by the Fourier inversion
theorem and the Riemann-Lebesgue lemma. .
9.18 Corollary. If f E Hs for all s, then f E Coo.
An example may help to elucidate this theorem. Let f).. (x) = </J( x) I x I).., where
A E JR and </J E C with </J == 1 on a neighborhood of o. Then the (classical)
derivative oa f).. is Coo except at 0 and is homogeneous of degree A - lal near 0, so
that loa f)..1 < Ca,)..lxl)..-lal, and in particular oa f).. E L1 provided A - lal > -no
In this case oa f).., as an L1 function, is also the distribution derivative of f)... (To
see this, replace f).. by the Coo function </J(x)(lxI 2 + £2),,/2 and consider the limit as
E 0.) Moreover, oa f).. E L2 iff A - lal > -n, so f E Hk (k == 0,1,2, . . .) iff
A > k - n, whereas f).. E C iff A > k. See also Exercises 33-35 for some related
results.
Next, we show that multiplication by suitably smooth functions preserves the Hs
spaces. We need a lemma:
9.19 Lemma. For all ,TJ E JRn and s E ,
(1 + 112) s (1 + ITJ/ 2 ) -s < 21s1 (1 + I _ TJ1 2 ) Isi.
Proof. Since II < I - TJI + ITJI, we have 112 < 2(1 - TJI 2 + ITJI 2 ) and hence
1 + 112 < 2 (1 + I - TJ1 2 ) (1 + ITJ/ 2 ).
If s > 0, we have merely to raise both sides to the sth power. If s < 0, we interchange
and TJ and replace s by -s, obtaining
(1 + ITJI 2 ) -s < 2- s (1 + 112) -s (1 + I _ TJ12) -s,
which is again the desired result.
.
9.20 Theorem. Suppose that </J E Co (JRn) and that J is a function that satisfies
J (1 + 112r/21if;()1 d = C < 00
for some a > O. Then the map M<f>(f) == </Jf is a bounded operator on Hs for
Isl < a.
Proof. Since As is a unitary map from Hs to Ho == £2, it is equivalent to show
that AsM4>A-s is a bounded operator on £2. But
(AsM<pA-sfn) = (1 + 112r/2 [if; * (A-sJ)l () = J K(, 7])1(7]) d7],
SOBOLEVSPACES 305
where
K(, TJ) == (1 + 112)S/2 (1 + ITJI 2 ) -s/2:$( - TJ).
By Lemma 9.19,
IK(,TJ)I < 2/ s // 2 (1 + I - TJI2)lsl/21:$( - TJ)I,
so if Isl < a, then J IK(, TJ)I d and J IK(, TJ)I dTJ are bounded by 2 a / 2 C. That
AsM4>A-s is bounded on L 2 therefore follows from the Plancherel theorem and
Theorem 6.18. .
9.21 Corollary. If cP E S, then M<I> is a bounded operator on Hs for all s E IR.
Our next result is a compactness theorem that is of great importance in the appli-
cations of Sobolev spaces.
9.22 Rellich's Theorem. Suppose that {fk} is a sequence of distributions in Hs
that are all supported in afixed compact set K and satisfy sUPk Ilfk II(s) < 00. Then
there is a subsequence {fk j } that converges in Ht for all t < s.
-
Proof. First we observe that by Proposition 9.11, fk is a slowly increasing Coo
function. Pick cP E C such that cP == 1 on a neighborhood of K. Then fk == cPfk, so
- --
fk == cP * fk where the convolution is defined pointwise by an absolutely convergent
integral. By Lemma 9.19 and the Schwarz inequality,
(1 + 112)S/2Ih()1
< 2181/2 J I;j;( - 7]) 1(1 + I - 7]/2) 1 8 1/ 2 Ih (7]) 1(1 + 17]1 2 r/ 2 d7]
< 2 Isl / 2 1IcPll(lsl)llfkll(s) < constant.
Likewise, since OJ(;j * h) == (Oj) * h, we see that (1 + 112)s/2Iojh()1 is
-
bounded by a constant independent of, j, and k. In particular, the fk'S and their first
derivatives are uniformly bounded on compact sets, so by the mean value theorem
and the ArzeUl-Ascoli theorem there is a subsequence {h j } that converges uniformly
on compact sets.
We claim that {fk j } is Cauchy in Ht for all t < s. Indeed, for any R > 0 we can
write the integral
2 J 2t- - 2
Ilfk i - fk j II (t) == (1 + II ) Ifk i - fk j I () d
as the sum of the integrals over the regions II < R and II > R. For II < R we use
the estimate
(1 + 112) t < (1 + R2) max(t,O),
306 ELEMENTS OF DISTRIBUTION THEORY
and for II > R we use the estimate
(1 + 112)t < (1 + R2) t-s (1 + 112) s,
which yield
1 2 n ( 2 ) max ( t, 0 ) 1 .- .- 1 2
Ifk z - fk j II(t) < CR 1 + R sup fk i - fk j ()
IIR
+(1 + R2)t-sllfki - fk j II(s).
Given € > 0, the second term will be less than € provided R is chosen sufficiently
large, since t - s < 0; once such an R is fixed, the first term will less than € provided
i and j are sufficiently large. The proof is therefore complete. .
Although the definition of Sobolev spaces in terms of the Fourier transform entails
their elements being defined on all of JRn, these spaces can also be used in the study
of local smoothness properties of functions. The key definition is as follows: If U is
an open set in }Rn, the localized Sobolev space H;oC (U) is the set of all distributions
f E 1)'(U) such that for every precompact open set V with V c U there exists
9 E Hs such that 9 == f on V.
9.23 Proposition. A distribution f E f)'(U) is in H;OC(U) iff c/Jf E Hs for every
c/J E C(U).
Proof. If f E H;OC(U) and c/J E C(U), then f agrees with some 9 E Hs
on a neighborhood ofsupp(c/J); hence c/Jf == c/Jg E Hs by Corollary 9.21. For the
conv ers e, given a precompact openJl with V c U, we can choose c/J E C(U)
with c/J == 1 on a neighborhood of V by the Coo Urysohn lemma; then c/Jf E Hs
and c/J f == f on V. (We have implicitly used Proposition 4.31 to obtain compact
neighborhoods of supp c/J and V in U.) .
We conclude this section with one of the classic applications of Sobolev spaces, a
regularity theorem for certain partial differential operators.
If L == :E aj (d/ dx)j is an ordinary differential operator with Coo coefficients
such that am never vanishes, it is not hard to show that smooth data give smooth
solutions. More precisely, if Lu == f and f is C k on an open interval I, then u is
Ck+m on I. No such result holds for partial differential operators in general. For
example, for any f E Ltoc (JR) the function u( x, t) == f (x - t) satisfies the wave
equation (8; - 8;)u = 0, but u has only as much smoothness as f. However, there
is a large class of differential operators for which a strong regularity theorem holds.
We restrict attention to the constant -coefficient case, although the results are valid in
greater generality.
Let P(D) == :E1al::;m caDa (notation as in 38.7) be a constant-coefficient opera-
tor. We assume that m is the true order of P(D), i.e., that C a i= 0 for some a with
lal == m. The principal symbol Pm is the sum of the top-order terms in its symbol:
Pm() == L: caa.
laj=m
SOBOLEVSPACES 307
P(D) is called elliptic if Pm () i= 0 for all nonzero E JRn. Thus, ellipticity means
that, in a formal sense, P(D) is genuinely mth order in all directions. (For example,
the Laplacian is elliptic on JRn, whereas the heat and wave operators at - and
af - are not elliptic on }Rn+1.)
9.24 Lemma. Suppose that P(D) is oforderm. Then P(D) is elliptic iff there exist
C, R > 0 such that IP()I > CIlm when II > R.
Proof. If P( D) is elliptic, let C 1 be the minimum value of the principal symbol
Pm on the unit sphere II == 1. Then C 1 > 0, and since Pm is homogeneous of
degree m, we have IPm()1 > C11lm for all. On the other hand, P - Pm is of
order m - 1, so there exists C2 such that IP() - Pm()1 < C21lm-1. Therefore,
IP()I > IPm()I-IP() - Pm()1 > C11lm for II > 2C2C11.
Conversely, if P(D) is not elliptic, say Pm(O) == 0, then IP()I < CIlm-1 for
every scalar multi pie of o. .
9.25 Lemma. If P(D) is elliptic of order m, u E Hs, and P(D)u E Hs, then
u E Hs+m.
Proof. The hypotheses say that (1 + 112)s/2iL E L 2 and (1 + 112)s/2 PiL E L2.
By Lemma 9.24, for some R > 1 we have
(1 + 112)m/2 < 2mllm < c-12mIP()1 for II > R,
and (1 + 112)m/2 < (1 + R2)m/2 for II < R. It follows that
(1 + 112)(s+m)/2IiLl < C'(l + 112)S/2(IPul + liLl) E L 2 ,
that is, u E H s + m .
.
9.26 The Elliptic Regularity Theorem. Suppose that L is a constant-coefficient
elliptic differential operator of order m, 0 is an open set in }Rn, and u E 1)' (0).
If Lu E H;OC(O) for some s E IR, then u E H;+m(O); and if Lu E COO(O), then
u E COO(O).
Proof. The second assertion follows from the first in view of Corollary 9.18, so
by Proposition 9.23 we must show that if Lu E H;OC(O) and cjJ E C(O), then
cjJu E H s + m . Let V be a precompact open set such that supp(cjJ) eVe V c 0,
and choose 'ljJ E C(O) such that 'ljJ == 1 on V . Then 'ljJu E £', so it follows
from Proposition 9.11 that 'ljJu E Ha for some a E JR. By decreasing a we may
assume that s + m - a is a positive integer k. Set'ljJo == 'ljJ and 'ljJk == cjJ, and choose
recursively 'ljJ1, . . . , 'ljJk-1 E C such that 'ljJj == 1 on a neighborhood of supp( cjJ)
and supp( 'ljJj) is contained in the set where 'ljJj-1 == 1. We shall prove by induction
308 ELEMENTS OF DISTRIBUTION THEORY
that 'ljJjU E Ha+j. When j == k, we obtain cjJu == 'ljJku E Ha+k == Hm, which will
complete the proof.
The crucial observation is that for any ( E C the operator [L, (] defined by
[L, (]f == L((f) - (Lf
is a differential operator of order m - 1 whose coefficients are linear combinations of
derivatives of (; in particular, these coefficients are Coo functions that vanish on any
open set where ( is constant. (This follows from the product rule for derivatives.)
Thus, if f E Ht, we have aa f E Ht-(m-l) for lal < m - 1 and hence [L, (]f E
Ht-(m-l) by Theorem 9.20.
For j == 0 we have 'ljJou E Ha by assumption. Suppose we have established that
'ljJ j U E H a+ j, where 0 < j < k. Then by the preceding remarks,
L ( 'ljJ j + I u) == 'ljJ j + I Lu + [L, 'ljJ j + I ] U == 'ljJ j + I Lu + [L, 'ljJ j + 1 ] 'ljJ j U
E Hs + Ha+j-(m-l) == H a + j + l - m .
Since 'l/Jj+l u == 'l/Jj+I'ljJjU E Ha+j, Lemma 9.25 (with P(D) == L) implies that
'l/Jj+l U E H a + j + 1 , and we are done. .
Two classical special cases of this theorem are particularly noteworthy. First,
every distribution solution of Laplace's equation u == 0 is a Coo function. (This
fact is known as Weyl's lemma.) Second, if L == a l + ia 2 on JR?2, the equation
Lu == 0 is the Cauchy-Riemann equation, whose solutions are the holomorphic
(or analytic) functions of z == Xl + iX2. We thus recover the fact that holomorphic
functions are Coo.
Exercises
30. Let fs() == (1 + 112)s/2. Then loa fs()1 < C a (l + 1l)s-lal.
31. If kEN, H k is the space of all f E L 2 that possess strong L2 derivatives aa f,
as defined in Exercise 8 in 38.2, for lal < k; and these strong derivatives coincide
with the distribution derivatives.
32. Suppose r < s < t. For any E > 0 there exists C > 0 such that Ilfll(s) <
Ell fll (t) + Cllfll (r) for all f E Ht.
33. (Converse of the Sobolev Theorem) If Hs C C, then s > k + n. (Use the
closed graph theorem to show that the inclusion map Hs C is continuous and
hence that aa8 E (H s )* for lal < k.)
34. (A Sharper Sobolev Theorem) For 0 < a < 1, let
Aa(JRn) = {f E BC(lR n ) : sup If(x) - f(y)1 < oo}.
x#y Ix - yla
a. If s == n + a where 0 < a < 1, then IITx8 - T y 8 11(_s) < Calx - yla. (We
have (Txbf( () = e-211"i.x. Write the integral defining IITx8 - Ty81Il_s) as the
SOBOLEVSPACES 309
sum of the integrals over the regions II < R and II > R, where R == Ix - yl-l,
-.
and use the mean value theorem to estimate (Tx8 - T y 8) on the first region.)
b. If S == n + a where 0 < a < 1, then Hs C Aa(IRn).
c. If S == n + k + a where kEN and 0 < a < 1, then
Hs C {f E C : oa f E Aa(JRn) for lal < k}.
35. The Sobolev theorem says that if S > n, it makes sense to evaluate functions in
Hs at a point. For 0 < s < n, functions in Hs are only defined a.e., but if S > k
with k < n, it makes sense to restrict functions in Hs to subspaces of codimension
k. More precisely, let us write JRn = JRn-k x JRk, X == (y, z), == (rJ, (), and define
R: S(JRn) ---+ S(JRn-k) by Rf(y) = f(y,O).
a. (Rf)(rJ) == J !('T}, () de. (See Exercise 20 in 3 8 . 3 .)
b. If s > k,
IC R fnl7)1 2 < Cs(l + 11712) (k/2)-s J Ifcl7, ()12(1 + 1171 2 + 1(1 2 r de.
c. R extends to a bounded map from HsCJRn) to HS_(k/2)(JRn-k) provided
s > k.
36. Suppose that 0 i= cjJ E C and { aj} is a sequence in JRn with 1 aj I ---+ 00, and let
cjJj (x) == cjJ(x - aj). Then {cjJj} is bounded in Hs for every s but has no convergent
subsequence in Ht for any t.
37. The heat operator Ot - is not elliptic, but a weakened version of Theorem
9.26 holds for it. Here we are working on JRn+l with coordinates (x, t) and dual
coordinates (, T), and 0t - == P(D) where P(, T) == 27riT + 47r2112.
a. There exist C, R > 0 such that III(, T)l l / 2 < CIP(, T)I for I(, T)I > R.
(Consider the regions ITI < 112 and ITI > 112 separately.)
b. If f E Hs and (Ot - )f E Hs, then f E HS+l and oXif E H S +(1/2) for
1 < i < n.
c. If ( E C (JRn+l), we have
[Ot - , (]f == (Ot( - ()f - 2 L(OXi()(OXif).
d. If n is open in JRn+l, U E 1)'(0), and (Ot - )u E H;OC(O), then U E
H;+l (0). (Let'ljJj be as in the proof of Theorem 9.26. Show inductively that
if'ljJou E Ha, then 'ljJjU E H a +(j/2) and OXi('ljJjU) E H a +(j-l)/2 provided
(j + j < s.)
38. Suppose So < Sl and to < t l , and for 0 < A < 1 let
SA == (1 - A)so + ASl,
t A == (1 - A)to + Atl.
If T is a bounded linear map from Hso to Hto whose restriction to HSI is bounded
from HSl to H tl , then the restriction of T to H s >. is bounded from H s >. to H t >. for
310 ELEMENTS OF DISTRIBUTION THEORY
o < A < 1. (T is bounded from Hs to Ht iff AsT A-t is bounded on L2. Observe
that Az is well defined for all z E C and Az is unitary on every Hs if Re z == O. Let
s(z) == (1- z)so + ZSl, t(z) == (1- z)to + zt 1 , and for 0 < Re z < 1 and cjJ, 'ljJ E S
let F(z) == f[At(z)TA-s(z)cjJ]'ljJ. Apply the three lines lemma as in the proof of the
Riesz- Thorin theorem.)
39. Let f2 be an open set in IRn, and let G : f2 ---+ IRn be a Coo diffeomorphism.
For any cjJ E C(G(f2)), the map Tf = (cjJf) 0 G is bounded on Hs for all s;
consequently, foG E H;OC(f2) whenever f E H;OC(G(f2)). Proceed as follows:
a. If s == 0,1,2, . . ., use the chain rule and the fact that f E Hs iff {}a f E L2
for lal < s.
b. Use Exercise 38 to obtain the result for all s > o.
c. For s < 0, use Proposition 9.16 and the fact that the transpose of T is another
operator of the same type, namely, T' f == ('ljJ f) 0 H where H == G- 1 and
'ljJ = (J cjJ) 0 G with J(x) == I det DxGI.
40. State and prove analogues of the results in this section for the periodic Sobolev
spaces
Hs ('Jf n ) = {f E 2)' (Tn) : 2: (1 + 1A;1 2 r I!(A; )1 2 < 00 }.
9.4 NOTES AND REFERENCES
The mathematical foundations for the theory of distributions were largely laid in
the 1930s. On the one hand, several researchers in partial differential equations
arrived at the notion of "weak derivatives" of functions; to wit, if f, 9 E Lfoc(U),
9 is the derivative {}a f in the weak sense if f gcjJ == (-I)la l f f{}acjJ for all cjJ in
some suitable space of test functions. On the other, various attempts were made to
extend the domain of the Fourier transform beyond L 1 + L2. The idea of defining
"generalized functions" as linear functionals on certain function spaces goes back to
Sobolev [136], but it was Laurent Schwartz who systematically developed the theory
of distributions and who introduced the spaces Sand S' as natural domains for the
Fourier transform. (See Dieudonne [33] for a more detailed historical account.)
Rudin [126] contains a good concise introduction to the theory of distributions that
includes some functional-analytic points we have elided, such as the definition of the
topology on C (U) and the properties of convolution on 1)' x £'. More extensive
treatments can be found in Gelfand and Shilov [55], Schwartz [132], and Treves [150].
Hormander [77] contains an excellent full-scale treatment of distribution theory with
a view toward its applications to differential equations.
39.2: See Folland [49] for a case study of the analytic techniques used in manip-
ulating distributions and their Fourier transforms and applying them to differential
equations.
39.3: The spaces originally considered by Sobolev [137] are the spaces Hk of
functions f E LP whose distribution derivatives {}a f are in LP for lal < k. When
NOTES AND REFERENCES 311
1 < p < 00, it turns out that HI == {! : As! E LP} (although this is far from
obvious when p i= 2), and this characterization of Hk can be used to define Hf for
aU s E JR. Sobolev's embedding theorem, in this setting, is that if s < nip then
Hf c Lq where q-l == p-l - n- 1 s, and if s > k + nip then Hf C Ck. See Stein
[140, 88V.2-3]. Further results on Sobolev space and their applications can be found
in Adams [1] and Lieb and Loss [93].
In ReJIich' s theorem the hypothesis that K is compact can be replaced by the
hypothesis that m( K) < 00 when s > 0; see Lair [89].
A differential operator L == Elalm aaDa with Coo coefficients is called elliptic
on 0 C IRn if E1al=m aa(x)a i= 0 for all x E 0 and all nonzero E IRn. The
elliptic regularity theorem remains true as stated for such operators; see Folland [48].
The LP version of this theorem is also valid for 1 < p < 00, but not for p == 1 or
p == 00. It is not true, except in dimension 1, that if Lu E Ck(D) then u E Ck+m(O),
but if {}a Lu is not just continuous but Holder continuous of exponent A (0 < A < 1)
for I a I == k, then u E C k + m (0) and {}{3 u satisfies the same Holder condition for
1131 == k + m. See Taylor [147, Chapter XI].
1
Topics in Probability
Theory
Probability theory, originally conceived to analyze games of chance, has developed
into a broadly useful discipline with deep connections to other branches of mathemat-
ics and many applications to other subjects such as physics, statistics, and economics.
The mathematical study of probability began some two and a half centuries ago, but
until the advent of modem analytic tools the theory was limited to combinatorial the-
orems invol ving discrete sample spaces and a few other results of somewhat doubtful
rigor. It is now recognized that the fundamental datum in probability theory is a
measure space (X, M, p,) such that p,(X) == 1; such a measure p, is called a proba-
bility measure. X is to be considered as the set of all possible outcomes of some
process, such as an experiment or a gambling game, and the measure of a set E E M
is interpreted as the probability that the outcome lies in E. Although measure spaces
are the natural setting for the study of probability, it is hardly accurate to say that
probability theory is a branch of measure theory, for its central ideas and many of its
techniques are distinctively its own.
This brief chapter is intended not as a systematic introduction to probability theory
but rather as an advertisement for the subject; it also serves to illustrate further some
results of previous chapters.
10.1 BASIC CONCEPTS
Probability theory has its own vocabulary, which is partly a legacy of its development
before the connection with measure theory was made explicit and partly a result of the
313
314 TOPICS IN PROBABILITY THEORY
fact that the probabilistic point of view is different. We therefore begin by presenting
a brief dictionary of probabilists' dialect.
Analysts' Term
Measure space (X, M, p,) (p,(X) == 1)
(a- )algebra
Measurable set
Measurable real-valued function f
Integral of f, I f dp,
LP [as adjective]
Convergence in measure
Almost every( where), a.e.
Borel probability measure on IR
Fourier transform of a measure
Characteristic function
Probabilists' Term
Sample space (f2, 23, P)
(a- )field
Event
Random variable X
Expectation or mean of X, E(X)
Having finite pth moment
Convergence in probability
Almost sure(ly), a.s.
Distribution
Characteristic function of a distribution
Indicator function
Probabilists have an aversion to displaying the arguments of random variables.
For example, {w : X (w) > a} and P( {w : X (w) > a}) are commonly written as
{X > a} and P(X > a).
Henceforth we shall, for the most part, adopt probabilistic language in this chapter,
although we shall use the term "LP random variable" in preference to the more
cumbersome "random variable with finite pth moment." One more standard piece of
terminology, which has no equivalent in classical analysis, is the following: If X is a
random variable, its variance a 2 (X) and standard deviation a(X) are defined by
a(X) == vi a 2 (X).
a 2 (X) == inf E[(X - a)2],
aER
If X L2, then a 2 (X) == 00. If X E L 2 , then E[(X-a)2] == E(X 2 )-2aE(X)+a 2
is a quadratic function of a whose minimum occurs when a == E(X); hence
a 2 (X) == E[(X - E(X))2] == E(X 2 ) - E(X)2
(X E L 2 ).
a(X) is a measure of how widely X deviates from its mean E(X).
At this point we must discuss a general measure-theoretic construction. Let
(f2, 23, P) be a probability space (or, for that matter, an arbitrary measure space), let
(f2',23') be another measurable space, and let cjJ : f2 ---+ f2' be a (23, 23')-measurable
map. Then the measure P induces an image measure P<jJ on f2' by
P<jJ(E) == P(cjJ-l(E)).
That this is indeed a measure follows from the fact that cjJ-l commutes with unions
and intersections.
10.1 Proposition. With notation as above, if f : f2' ---+ IR is a measurable function,
then In l f dP<jJ == In (1 0 cjJ) dP whenever either side is defined.
Proof. When f == XE with E E 23', this is just the definition of P<jJ, since
XE 0 cjJ == Xc/>-l (E). The general result follows by taking linear combinations and
limits. .
BASIC CONCEPTS 315
If X is a random variable on 0, then Px is a probability measure on JR, called the
distribution of X, and the function
F(t) = Px ((-00, t]) == P(X < t)
(which determines Px by Theorem 1.16) is called the distribution function of X.
If {Xa}aEA is a family of random variables such that P XQ = P X (3 for all a, (3 E A,
the X a's are said to be identically distributed.
More generally, for any finite sequence Xl, . . . , X n of random variables, we can
consider (Xl,..., X n ) as a map from 0 to IRn, and the measure PeX1,...,X n ) on
n is called the joint distribution of X I, . . . , Xn. It is a general principle that all
properties of random variables that are relevant to probability theory can be expressed
in terms of their joint distributions. For example, by Proposition 10.1,
E(X) = ! tdPx(t), 0-2(X) = ! (t - E(X))2 dPx(t),
E(X+Y)= !(t+S)dP(X,y)(t,s).
In fact, given a Borel probability measure A on JR, one can simply speak of the
mean A and variance a 2 of A,
A = ! tdA(t),
0- 2 = ! (t - A )2 dA(t),
which are the mean and variance of any random variable with distribution A.
One of the most important concepts in probability theory, and the one that most
clearly sets it apart from general measure theory, is that of (stochastic) independence.
To motivate this idea, consider a probability space (0, 1', P) and an event E such that
P(E) > O. Then the set function PE(F) == p(EnF)j P(E) is a probability measure
on 0 called the conditional probability on E; PE (F) represents the probability of the
event F given that E occurs. If P E (F) == P (F), that is, if the probability of F is the
same whether or not we restrict to E, then F is said to be independent of E. Thus,
F is independent of E iff P(E n F) == P(E)P(F); moreover, the latter condition is
clearly symmetric in E and F and makes sense even if P(E) = o.
With this in mind, we define a collection {Ea} aEA of events in 0 to be indepen-
dent if
n
P(E a1 n . . . n Ea n ) == II P(Eaj) for all n E N and all distinct al . . . ,an E A.
I
(For the events Ea to be independent it does not suffice for them to be pairwise
independent; see Exercise 1.)
A collection {Xa}aEA of random variables on 0 is called independent if the
events {Xa E Ba} = X;;I(Ba) are independent for all Borel sets Ba C IR. This
condition can be neatly rephrased as follows. Observe that if aI, . . . ,an E A and
316 TOPICS IN PROBABILITY THEORY
we write X j == X aj , we have
P(Xl1(B1)n...nx1(Bn)) ==P((X 1 ,...,X n )-1(B 1 X ... X Bn))
== p( Xl," ., X n ) ( B 1 X . . . x B n ) ,
whereas
n n n
II p(X;l (B j )) = II P Xj (B j ) = (II PXj) (B 1 X .. . x Bn).
111
These quantities are equal for all Borel sets Bj C ffi. iff
n
p(X1,...,X n ) == II PXj.
1
That is, {X a } aEA is an independent set of random variables iff the joint distribution
of any finite set of X a's is the product of their individual distributions.
The following proposition expresses the fact that functions of independent random
variables are independent.
10.2 Proposition. Let {X nj : 1 < j < J (n), 1 < n < N} be independent random
variables, and let f n : JR J (n) ---+ 1R be Borel measurable for 1 < n < N. Then the
random variables Y n == fn(X n1 ,..., XnJ(n)), 1 < n < N, are independent.
Proof. Let X n == (X n1 ,..., XnJ(n)). If B 1 , . . . , BN are Borel subsets of JR,
we have Yn- 1 (Bn) == X;; 1 (f;;l (Bn)) and hence
N
(Y 1 , . . . , Y N ) -1 (B 1 X . . . x B N) == n y n - 1 (Bn)
1
== (Xl,... ,X N )-1(f 1 1 (B 1 ) x ... X f"N 1 (BN)).
Therefore, by the independence of the Xnj's and Fubini's theorem,
p( Y I ,.. . , Y n) ( B 1 X . . . x Bn) == p(X 1 ,. .. ,x N ) (f 11 ( B 1) X . . . x f"N 1 ( B N ) )
N J(n)
= (II II PXnj)Ul1(Bd x ... X fjVl(BN))
n=lj=l
N
== II PXn (f;;l(Bn))
n=l
N
== II PYn(Bn).
n=l
.
BASIC CONCEPTS 317
We now present some fundamental properties of independent random variables.
For the first one we need the notion of convolutions of measures on IR developed in
38.6. An easy induction on (8.47) shows that if AI, . . . ,An E M(IR), then Al *. . . * An
is given by
(10.3) Al * . . . * An (E) = J... J XE(t1 + . . . + tn)dAI (td.. . dAn(tn).
10.4 Proposition. If {X j } '1 are independent random variables, then
P X1 +...+X n = PX 1 * . . . * PXn.
Proof. Let A(tl,. .. ,t n ) == L: t j . Then Xl +. . . + X n == A(X1'. . . ,X n ), so
N
Px1+,,+x n = (p(X1,,,.,X n )) A = (II PXj) ,
1 A
and by (10.3), the last expression equals PI * . . . * Pn.
.
10.5 Proposition. Suppose that {X j }'1 are independent random variables. If X j E
£1 for all j, then rr X j E £1, and E(rr X j ) == rr E( X j ).
Proof. We have rr IX j I == f(X 1 ,..., X n ) where f(tl,. . . ,t n ) == rr It j I.
Hence
n n
E(II IXjl) = J f dP(x1,,,.,X n ) = J f d(II PXj)
1 1
n n
= II J Itjl dPxj(tj) = II E(IXjl).
1 1
This proves the first assertion, and once this is known, the same argument (with the
absolute values removed) proves the second one. .
10.6 Corollary. If {X j }'1 are independent and in £2, then (j2(X 1 + ... + X n ) ==
L: (j2 (X j ).
Proof. Let Yj == X j - E(X j ). Then {Yj}'1 are independent and have mean
zero, so
E(YjY k ) == E(Yj)E(Y k ) == 0
(j i= k).
Therefore,
(j2(X 1 +... + X n ) = E((Y 1 +... + Yn)2) == LE(YjY k )
j,k
== LE(Yj2) == L(j2(X j ).
J j
.
318 TOPICS IfJ PROBABILITY THEORY
These results show that independence is a very stringent property. For one thing,
it is usually not the case that the product of two L1 functions is in L1. For another,
suppose that X and Yare independent and E(X) == 0. Then for any Borel measurable
function f on ffi. such that f 0 Y EL I we have
E(X. (f 0 Y)) == E(X)E(f 0 Y) == 0.
In other words, X is orthogonal (in the L 2 sense) to every function of Y. This
indicates that, for example, if one tries to construct a sequence of independent
random variables on [0, 1] with Lebesgue measure by using the familiar functions of
calculus, one will probably not succeed. (Perhaps the simplest example is X n (x) ==
the nth digit in the decimal expansion of x; see Exercise 23.) Rather, the natural
setting for independence is product spaces.
Indeed, suppose
n == n 1 x . . . x nn, 23 == 23 1 Q9 . . . Q9 23 n , P == PI X . . . X Pn.
Then any random variables Xl, . . . , X n on n such that X j depends only on the jth
coordinate are independent, for if X j == fj 07r j where 7r j : n n j is the coordinate
map,
n
(Xl,... ,Xn)-1(B1 X ... x Bn) == IIf j - 1 (B j )
1
and hence
n
p(X 1 ,...,X n ) (B 1 X . . . x Bn) = P (II f j - 1 (B j ))
1
n
n
== II Pj(fj-1(Bj)) == II PXj(Bj).
1 1
The same idea can be made to work for infinitely many factors; see 310.4.
As an application of these ideas, we present Bernstein's constructive proof of the
Weierstrass approximation theorem.
10.7 Theorem. Given f E C([O, 1]), let
n ,
Bn(x) = L f(k/n) k!(n k)! xk(1 - xt- k ,
k=O
Then Bn f uniformly on [0,1] as n 00.
Proof. Given x E [0,1], let A == x8 1 + (1 - x)80 where 8t is the point mass at t.
Let n == ffi.n, P == A x . . . x A, and X j == the jth coordinate function on ffi. n . Then
Xl, . . . ,X n are independent and have the common distribution A. It is easy to check
(Exercise 7) that
n ,
'""'" n. k ( ) n-k
A*oo.*A=Lo'k!(n_k)!x I-x I5 k ,
BASIC CONCEPTS 319
and hence, in view of Proposition 10.4,
Bn(x) = E [f( Xl + .. + X n ) ] .
Now, L: n![k!(n - k)!]-lxk(1 - x)n-k == 1 by the binomial theorem, so
n ,
(10.8) If(x) - Bn(x)1 < lf(X) - f(kfn)1 k!(nn k)! xk(l - x)n-k.
Given £ > 0, by the uniform continuity of J on [0, 1] there exists 8 > 0, independent
of x and y, such that IJ(x) - J(y)1 < E whenever Ix - yl < 8. The sum of the terms
in (10.8) such that Ix - (k/n)1 < b is at most £, while the sum of the remaining terms
is at most
211fllu P (I XI + .. +X n - xl> 8).
But
a 2 (Xj) == E(XJ) - E(X j )2 == x - x 2 < 1,
so by Corollary 10.6,
[( X1+...+Xn ) 2 ] _ 2 ( Xl+...+Xn ) n _ 1
E -x -a <---
n n - n 2 n '
and Chebyshev's inequality therefore gives
If(x) - Bn(X)1 < E + 2u ,
which is less than 2£ provided n is sufficiently large.
.
Exercises
1. Let !l consist of four points, each with probability . Find three events that are
pairwise independent but not independent. Generalize.
2. Let {X j } be a sequence of independent identically distributed positive random
variables such that E(X j ) = a < 00 and E(1/ X j ) == b < 00, and let Sn == L: X j .
a. E(Xj/S n ) == 1/n if j < n, and E(Xj/S n ) == aE(1/S n ) if j > n.
b. E(Sm/ Sn) == m/n if m < nand E(Sm/ Sn) == 1 + (m - n)aE(1/ Sn) if
m > n.
3. Suppose that {EO'.} aEA is a collection of events in O.
a. If E a1 , . . . , Ea n are independent, so are E a1 , . . . , Ea n _l , En .
b. If { EO'.} is an independent set, so is { Fa}, where each Fa is either EO'. or E.
c. {EO'.} is an independent set of events iff {XEa} is an independent set of
random variables.
320 TOPICS IN PROBABILITY THEORY
4. Let X, Y, Z be positive independent random variables with a common distribu-
tion A, and let F(t) == A((O, t]). The probability that the polynomial Xt 2 + Yt + Z
has real roots is fo oo fo oo F(t 2 j4s) dA(t) dA(S).
5. If X is a random variable with distribution dPx (t) == f (t) dt where f (t) ==
f( -t), then the distribution of X 2 is dP X 2 (t) == t- 1 / 2 f(t 1 / 2 )X(0,00) (t) dt.
6. For a, u > 0, let d'Ya,u(t) == [r(a)]-luata-le-utX(o,oo)(t) dt, the gamma
distribution with parameters a and u.
a. The mean and variance of 'Ya,u are aju and aju 2 , respectively.
b. 'Ya,u * 'Yb,u == 'Ya+b,u. (Use Exercise 60 in 3 2 . 6 .)
c. If Xl'...' X n are independent and all have the distribution dPx (t)
(27r)-1/2e- t2 / 2 dt, then Xf + . . . + X has the distribution 'Yn/2,1/2. (Use
(b) and Exercise 5. 'Yn/2,1/2 is called the chi-square distribution with n de-
grees of freedom.)
7. Let b t denote the point mass at t E JR. Given 0 < p < 1, let {3p == pb 1 + (1- p )b o ,
and let f3;n be the nth convolution power of f3p. Then
n ,
*n '"'"" n. k ( ) n-k b
(3p = L k!(n _ k)! P 1 - P k,
k=O
and the mean and variance of f3;n are np and np( 1 - p). f3;n is called the binomial
distribution on {O, . . . , n} with parameter p.
8. Let b t denote the point mass at t E JR. Given a > 0, let Aa = e- a (a k j k!)b k ,
the Poisson distribution with parameter a.
a. The mean and variance of Aa are both equal to a.
b. Aa * Ab == Aa+b.
c. The binomial distribution {3/n (Exercise 7) converges vaguely to Aa as n
00. (Use Proposition 7.19.)
9. Suppose that {Xn}r is a sequence of random variables. If X n --+ X in
probability, then PXn --+ Px vaguely. (Use Proposition 7.19.)
10. (The Moment Convergence Theorem) Let Xl, X 2 , . . . , X be random variables
such that PX n --+ Px vaguely and sUPn E(IXnIT) < 00, where r > O. Then
E(IXnIS) --+ E(IXIS) for all S E (0, r), and if also S E N, then E(X) --+ E(XS).
(By Chebyshev's inequality, if E > 0, there exists a > 0 such that P(IXnl > a) < E
for all n. Consider J cjJ(t)ltI S dPxn (t) and J[1-cjJ(t)]l t I S dPxn (t) where cjJ E Cc(JR)
and cjJ( t) == 1 for It I < a.)
10.2 THE LAW OF LARGE NUMBERS
If one plays a gambling game many times, one's average winnings or losses per
game should be roughly the the expected winnings or losses in each individual game;
THE LAW OF LARGE NUMBERS 321
more generally, if one plays a sequence of possibly different games, one's average
winnings or losses should be roughly the average of the expected winnings or losses
in the individual games. In symbols: If {X j }l is a sequence of independent random
variables and E(X j ) == J-Lj, then the average n- 1 I: X j should be close to the
constant n -1 I: /-Lj when n is large.
The law of large number is a precise formulation of this idea. It comes in several
versions, depending on the hypotheses one wishes to make. The first version, with
the weakest hypotheses and conclusions, has a very simple proof.
10.9 The Weak Law of Large Numbers. Let {X j } 1 be a sequence of indepen-
dent £2 random variables with means {J-Lj} and variances {a;}. Ifn- 2 I: a; --+ 0
as n 00, then n -1 I: (X j - J-Lj) 0 in probability as n --+ 00.
Proof. n- 1 I:(Xj - J-Lj) has mean 0 and variance n- 2 I: a; (the latter by
Corollary 10.6). Hence by Chebyshev's inequality, for any E > 0 we have
p(ln- 1 ) X j - JLj)1 > E) < (nE)-2 > -J 0 as n 00.
.
Under slightly stronger hypotheses, we can obtain the sharper conclusion that
n -1 I: (X j - J-Lj) 0 almost surely. To establish this, we need the following two
lemmas, which are of interest in their own right.
10.10 The Borel-Cantelli Lemma. Let {An} 1 be a sequence of events.
a. IfI: P(An) < 00, then P(lim sup An) == o.
b. /fthe An's are independent and I: P(An) == 00, then P(limsup An) == 1.
Proof. We recall that limsupA n == n 1 U k An, so that
CX) CX)
P(limsup An) < p( U An) < L P(An),
n=k n=k
and the latter sum tends to zero as k 00 if I: P(An) converges. On the other
hand, suppose that I: P( An) diverges and the An's are independent. We must show
that
CX) CX)
P((limsupAnn = p(U n A) = 0,
k=ln=k
and for this it is enough to show that p(n=k A) = 0 for all k. But the A's are
independent (Exercise 3), so since 1 - t < e- t ,
KKK K
p( n A) = IT[l - P(An)] < IT e-P{An) = ex p ( - L P(An)).
n=k k k k
The last expression tends to zero as K --+ 00, which yields the desired result. .
322 TOPICS IN PROBABILITY THEORY
10.11 Kolmogorov's Inequality. Let Xl, . . . , X n be independent random variables
with mean 0 and variances ar, . . . , a;, and let Sk = Xl + . . . + X k . For any E > 0,
n
P ( max ISkl > E ) < E- 2 '""" a.
I <k<n
- - I
Proof. Let Ak be the set where ISj I < E for j < k and ISk I > Eo Then the Ak'S
are disjoint and their union is the set where max ISk I > E, so
n n
P(maxlSkl > E) == LP(A k ) < E-2LE(XAkS),
I I
because S > E 2 on Ak. On the other hand,
n
E(S;) > L E(XAkS;)
I
n
= LE(XAk[SZ +2S k (Sn - Sk) + (Sn - SkfJ)
I
n n
> L E(XAkS) + 2 L E(XAkSk(Sn - Sk)).
1 I
It will suffice to show that E(XAkSk(Sn - Sk)) == 0 for all k, for then we have
n
P(max ISkl > E) < E- 2 E(S;) == E- 2 L a
1
by Corollary 10.6, since the Xk's have mean zero. But XAk is a measurable function
of Sl, . . . , Sk and hence of Xl, . . . , Xk, whereas Sn - Sk is a measurable function
of X k + 1 , . . . , Xn. Moreover, E(Sk) == I: E(X j ) = 0 for all k. Therefore, by
Propositions 10.2 and 10.5,
E(XAkSk(Sn - Sk)) == E(XAkSk)E(Sn - Sk) == E(XAkSk) .0= O.
.
10.12 Kolmogorov's Strong Law of Large Numbers. If {Xn}r is a sequence of
independent £2 random variables with means {J.ln} and variances {a} such that
I: n-2a < 00, then n- I I:(Xj - J.lj) ---7 0 almost surely as n ---7 00.
Proof. Let Sn == I:(Xj - J.lj). Given E > 0, for kEN let Ak be the set
where n -11 Sn I > E for some n such that 2k-1 < n < 2 k . Then on Ak we have
ISnl > E2 k - 1 for some n < 2 k , so by Kolmogorov's inequality,
2 k
P(Ak) < (E2k-1)-2 La.
1
THE LAW OF LARGE NUMBERS 323
Therefore,
00 00 2 k 00 00 2
LP(Ak) < E; LLT 2k o-; = L( L T2k)0-; < E L < 00,
1 k=1 n=1 n=1 klog2 n n=1
so P(lirn sup Ak) == 0 by the Borel-Cantelli lemma. But lirn sup Ak is precisely the
set where n -11 Sn I > E for infinitely many n, so
P(lirnsupn-1ISnl < E) = 1.
Letting E -* 0 through a countable sequence of values, we conclude that n -1 Sn -* 0
almost surely. .
The hypotheses of this theorem are a bit stronger than those of the weak law
(Exercise 11). They are certainly satisfied when the Xn's are identically distributed
L 2 random variables, since then a is independent of n. However, in the identically
distributed case the assumption that X n E L 2 can be weakened.
10.13 Khinchine's Strong Law of Large Numbers. If {X n } 1 is a sequence
of independent identically distributed Ll random variables with mean J..l, then
n- 1 :E X j -* J..l almost surely as n -* 00.
Proof. Replacing X n by X n - J..l, we may assume that J..l = O. Let A be the
common distribution of the Xj's; we are thus assuming that
J It I d)..(t) < 00,
J t d)..(t) = O.
Let ¥j = X j on the set where \X j \ < j and ¥j == 0 elsewhere. Then
00
00
00
LP(Yj =I X j ) = LP(IXjl > j) = LA({t: It I > j})
111
00 00
== LLA({t: k < It I < k + I}).
j=1 k=j
Since :E ;' 1 :E j == :E 1 :E=I' interchanging the order of summation yields
f P(X j # Yj) = f k)"( {t : k < It I < k + 1}) < J It I d)..(t) < 00.
1 k=1
By the Borel-Cantelli lemma, then, with probability one we have X j == Yj for j
sufficiently large, and it therefore suffices to show that n -1 :E Yj -* 0 almost
surely.
We have
0- 2 (Y n ) < E(Y;) = r t 2 d)..(t),
Jltln
324 TOPICS IN PROBABILITY THEORY
and hence
> -2(T2(Yn) < ;: n-2 t-l<ltlj t 2 d>.(t)
00 n
< L Ljn- 2 .Itl d>.(t).
n=l j=l Jj-l<ltlJ
Reversing the order of summation again and using the fact that I: j n- 2 < 2j-1
(by comparison to J j OO x- 2 dx), we obtain
n- 2 (T2(Yn) < 2 t, i-l<ltlj It I d>.(t) = 21: It I d>.(t) < 00.
By Theorem 10.12, therefore, if J.lj == E(Yj) we have n- 1 I:(Yj - J.lj) ---7 0 almost
surely. However, by the dominated convergence theorem,
J-lj = r t d>.(t) J oo t d>.(t) = 0,
Jltlj -00
and it follows easily (Exercise 12)thatn- 1 I: J.lj ---7 o also. Hencen- 1 I: Yj ---7 0
a.s., and the proof is complete. .
Thus far we have not shown how to construct sequences of random variables that
satisfy the hypotheses of the theorems in this section. We shall do so in 310.4.
Exercises
11. If I: n-2a < 00, then lirnnoo n- 2 I: aJ == O. (EstimateI:jEn aJ and
I:j>En aJ separately.)
12. If { an} C <C and Hrn an == a, then lirn n -1 I: aj == a.
13. The weak law of large numbers remains valid if the hypothesis of independence
is replaced by the (much weaker) hypothesis that E[(X j - J.lj)(Xk - J.lk)] == 0 for
j f= k.
14. If {X n } is a sequence of independent random variables such that E(X n ) == 0 and
I: a 2 (Xn) < 00, then I: X n converges almost surely. (Apply Kolmogorov's
inequality to show that the partial sums are Cauchy a.s.) Corollary: If the plus and
minus signs in I: :f:n -1 are determined by successive tosses of a fair coin, the
resulting series converges almost surely.
15. If {X n } is a sequence of independent identically distributed random variables
that are not in £1, then lirn sUPnoo n- 1 1 I: X j I == 00 almost surely. (Let
An == {IXnl > n}. Using an idea from the proof of Theorem 10.13, show that
I: P(An) == 00, and apply the Borel-Cantelli lemma.)
16. (Shannon's Theorem) Let {Xi} be a sequence of independent random variables
on the sample space f2 having the common distribution A == I: pj8j where 0 <
THE CENTRAL LIMIT THEOREM 325
Pj < 1, E Pj == 1, and 8j is the point mass at j. Define random variables Y 1 , Y 2 , . . .
on [2 by
Yn(W) == P( {w' : Xi (W') == Xi(w) for 1 < i < n}).
a. Y n == TI P Xi. (The notation is peculiar but correct: Xi (.) E {1, . . . , r} a.s.,
so PX i is well-defined a.s.)
b. limnCX) n -1 log Y n == E Pj log Pj almost surely. (In information the-
ory, the Xi'S are considered as the output of a source of digital signals, and
- E Pj log Pj is called the entropy of the signal.)
17. A collection or "population" of N objects (such as mice, grains of sand, etc.)
may be considered as a smaple space in which each object has probability N-1.
Let X be a random variable on this space (a numerical characteristic of the objects
such as mass, diameter, etc.) with mean J..l and variance a 2 . In statistics one is
interested in determining J..l and a 2 by taking a sequence of random samples from
the population and measuring X for each sample, thus obtaining a sequence {X j } of
numbers that are values of independent random variables with the same distribution
as X. The nth sample mean is M n == n- 1 E X j and the nth sample variance
is S == (n - 1)-1 E(Xj - M j )2. Show that E(M n ) == J..l, E(S) == a 2 , and
M n -t J..l and S -t a 2 almost surely as n -t 00. Can you see why one uses
(n - 1) -1 instead of n -1 in the definition of S?
10.3 THE CENTRAL LIMIT THEOREM
Suppose J..l E IR and a > O. By Proposition 2.53 and some elementary calculus, the
2
measure v on JR defined by
dv u2 (t) = 1 e(t-/l-)2/2u dt
J-L a
is a probability measure that satisfies
J tdv;\t) = p"
J (t - P, )2dv;2 (t) = (J2.
It is called the normal or Gaussian distribution with mean J..l and variance a 2 . The
special case V6 is called the standard normal distribution.
It is a matter of empirical observation that normal and approximately normal dis-
tributions are extremely common in applied probability and statistics. The theoretical
explanation for this phenomenon is the central limit theorem, the idea of which is
as follows. Suppose that {X j } is a sequence of independent identically distributed
random variables with mean 0 and variance a 2 . Then n -1 E X j has mean 0 and
variance n -1 a 2 , so there is a high probability that it is close to 0 when n is large; this
is the content of the weak law of large numbers. On the other hand, n- 1 / 2 E X j
has mean 0 and variance a 2 for all n, so one might ask if its distribution approaches
some nontrivial limit as n ---7 00. The remarkable answer is that no matter what the
326 TOPICS IN PROBABILITY THEORY
distribution of the Xj's is, this limit exists and equals the normal distribution with
mean 0 and variance a 2 .
The central limit theorem is really a theorem in Fourier analysis. We shall state it
as such and then translate it into probablity theory.
10.14 Theorem. Let A be a Borel probability measure on JR. such that
J t 2 d)..(t) = 1,
J t d)..(t) = o.
(The finiteness of the first integral implies the existence of the second.) For n E N let
A*n == A * . . . * A (nfactors) and define the measure An by An (E) == A*n( Vii E),
where Vii E == { Vii t : tEE}. Then An -t l/6 vaguely as n -t 00.
Proof. The hypotheses on the measure A imply that its Fourier transform X() ==
f e-21ri.x dA( x) is of class 0 2 and satisfies X(O) == 1, X, (0) = 0, and X" (0) = -47r 2 .
(Differentiate the integral twice as in Theorem 8 .22d.) Thus by Taylor's theorem,
X() == 1 - 27r22 + o(2),
where o(a) denotes a quantity that satisfies a- 1 0(a) -t 0 as a -t O. Moreover,
-- --
(A *n) == (A) n, so by the obvious change of variable,
Xn() = [X(n-l/2)r = [1- 27re + 0( : ) r.
Thus, since 10g(1 + z) = Z + o(z),
[ 27r22 2 ] 2
log X n () = n log 1 - n + 0 ( -;;) = - 27r 2 e + n . 0 ( -;;:- ) ,
........ 2 2
which tends to -27r22 as n -t 00. In other words, An() -t e- 21r as n -t 00 for
all , so the conclusion follows from Propositions 8.24 and 8.50. .
10.15 The Central Limit Theorem. Let {X j } be a sequence of independent iden-
tically distributed £2 random variables with mean J..l and variance a 2 . As n -t 00,
the distribution of (a Vii) -1 L (X j - J..l) converges vaguely to the standard normal
distribution l/6, and for all a E JR.,
. 1 n 1 j a 2
hm P ( Vii )Xn - J-l) < a ) = fie e- t /2 dt.
nCX) anI V 27r -CX)
Proof. Replacing X j by a-I (X j - J..l), we may assume that J..l == 0 and a = 1. If
A is the common distribution of the Xj's, then A satisfies the hypotheses of Theorem
10.14, and in the notation used there, An is the distribution of n- 1 / 2 L Xj. The
first assertion thus follows immediately, and the second one is equivalent to it by
Proposition 7.19. .
THE CENTRAL LIMIT THEOREM 327
As the reader may readily verify, the same argument yields the following more
general result. Under the hypotheses of the central limit theorem, if {Kn} is any
sequence of finite subsets of N such that k n = card (K n) -+ 00, then the distribution
of (aVE;;)-l jEKn (X j - J..l) converges vaguely to lid.
Exercises
18. A fair coin is tossed 10,000 times; let X be the number of times it comes up
heads. Use the central limit theorem and a table of values (printed or electronic) of
erf(x) == 27r- 1 / 2 Jo x e- t2 dttoestimate
a. the probability that 4950 < X < 5050;
b. the number k such that IX - 50001 < k with probability 0.98.
19. If {X j } satisfies the hypotheses of the central limit theorem, the sequence
Y n == (ayln)-l (Xj - J..l) does not converge in probability. (Use the remarks
following the central limit theorem to show that {Y 2 k } is not Cauchy in probability.)
20. If {X j } is a sequence of independent identically distributed random variables
with mean 0 and variance 1, the distributions of
n / n 1/2 n / n
LX j (LXI) and vnLX j LX]
1 1 1 1
both converge vaguely to the standard normal distribution.
21. Let {X n } be a sequence of independent random variables, each having the
Poisson distribution with mean 1 (Exercise 8). Let
n
( s -n ) - ( n-s )
Y n = In =illax ylnn ,o.
Sn = LXj,
1
n n _ k n k nn+(1/2)e- n
a. E(Y n ) = e- n L vn - k ' == , . (For the first equation, use
n. n.
k=O
Exercise 8b. As for the second, the sum telescopes.)
b. pYn converges vaguely to 80 + X(O,CX))lI6. (Use Proposition 7.19.)
c. J t 2 dPYn (t) == E(Y;) < 1 for all n.
d. E(Y n ) == JoCX)tdPyn(t) -+ ftdll6(t) == (27r)-1/2. (Use Exercise 10.)
Combining this with (a), one obtains Stirling's formula:
1 . n! 1
1m = .
nCX) nn e -n y!27rn
22. In this exercise we consider random variables with values in the circle 1r, regarded
as {z E <C : I z I == 1}. The distribution of such a random variable is a measure on 1r.
a. If Xl'...' X n are independent, then PX 1 X 2 ,..X n == PX 1 * ... * PXn.
b. If {X j } is a sequence of independent random variables with a common
distribution A, the distribution of I1 X j converges vaguely to the uniform
328 TOPICS IN PROBABILITY THEORY
distribution on 1r (= arc length over 27r) unless A is supported on a finite subgroup
Zm == {e27rijlm : 0 < j < m} of 1r, in which case it converges to the uniform
distribution m- 1 I:zEz rn 8z on Zm. (Use Exercise 8.39.)
10.4 CONSTRUCTION OF SAMPLE SPACES
The preceding two sections have dealt with sequences of random variables whose
joint distributions have certain properties. We now address the question of finding
examples of such sequences, and more generally of constructing families {X a} aEA
of random variables indexed by an arbitrary set A whose finite subfamilies have
prescribed joint distributions.
If the index set A is finite, this is easy: given any Borel probability measure P on
JRn, P is by definition the joint distribution of the coordinate functions Xl, . . . , X n
on the space (JRn, 23]Rn , P). If A is infinite, however, the problem is more delicate.
Suppose to begin with that {X a } aEA is a family of random variables on some sample
space (0, 23, P), and for each ordered n-tuple (aI, . . . , an) of distinct elements of A
(n E N) let Peal ,...,a n ) be the joint distribution of X al , . . . , Xa n . Then the measures
P(al,...,a n ) satisfy the following consistency conditions:
(10.16)
If a is a permutation of {I, . . . , n}, then
dP(aU(l),...aU(n))(Xa(l)'...' Xa(n)) = dP(al,...,an)(Xl,..., x n ).
(10.17)
If k < nand E E 23jRk, then
( E ) == ( E x JR n-k )
(al,...,ak) (al,...,a n ) .
Conversely, given any family of measures P(al,...,a n ) satisfying (10.16) and
(10.17), we shall show that there exist a sample space (0,23, P) and random variables
{X a } on 0 such that P(al,...,a n ) is the joint distribution of Xal'...' Xa n . To do
this, it is convenient to make one minor technical modification: We replace JR by
its one-point compactification JR* == JR U {oo}. Any Borel measure on JRn can be
regarded as a Borel measure on (JR * ) n that assigns measure zero to (JR *) n \ JR n, and
vice versa. In other words, we allow our random variables to assume the value 00,
although they will do so with probability zero. The point of this is that the space
(JR*)A is compact for any A, by Tychonoff's thoerem.
With this modification, the construction of the sample space (0, 23, P) in the case
where the random variables Xa are independent, so that P should be the product of
the P a's, is contained in Theorem 7.28. The general case is achieved by a simple
adaptation of the argument given there, which we review in detail for the convenience
of the reader.
10.18 Theorem. Let A be an arbitrary nonempty set, and suppose that for each
ordered n-tuple of distinct elements of A (n E N) we are given a Borel probability
measure P(al,...,a n ) onJRn, orequivaZentZyon (JR*)n, satisfying (10.16) and (10.17).
Then there is a unique Radon probability measure P on the compact Hausdorff space
CONSTRUCTION OF SAMPLE SPACES 329
o == (JR*)A such that P(al,...,a n ) is the joint distribution of X al , . . . , X an , where
Xa : 0 --+ JR* is the ath coordinate function.
Proof. Let GF(O) be the set of all f E G(O) that depend only on finitely many
coordinates. If f E GF(O), say f(x) == F(x al ,... x an )' let
1 (f) == J F d ) .
(al,...,a n
l(f) is well defined because of (10.16) and (10.17): If we permute the variables
or add some extra ones, the result is the same. Clearly 1 (f) > 0 if f > 0, and
11(f)1 < Ilfllu with equality when f is constant.
Now, GF(O) is clearly an algebra that separates points, contains constant func-
tions, and is closed under complex conjugation, so by the Stone-Weierstrass theorem
it is dense in G(O). Hence, the functional 1 extends uniquely to a positive linear
functional on G (0) with norm 1, and the Riesz representation theorem therefore
yields a unique Radon measure P on 0 such that 1(f) == J f dP for all f E G(O).
Let Xa be the ath coordinate function on 0 and let Peal ,...,a n ) be the joint distribu-
tion of Xal'...' X an on (JR*)n. If F E G((JR*)n) and f == F 0 (X al ,..., X an ) as
above, then
J F dPCa1,...,a n ) = J f dP = 1(1) = J F dP(al,...,a n )'
But P[al,...,a n ) and P(al,...,a n ) are both Radon measures by Theorem 7.8, so they
are equal by the uniqueness of the Riesz representation. I
The only property of IR * used in this proof is that it is a compact Hausdorff space in
which every open set is a-compact, so the theorem admits an obvious generalization.
In particular, if for each a there is a compact set Ka C JR such that P(al,...,a n ) is
supported in Kal x . . . x Ka n for all al, . . . , an, we could take 0 == IlaEA Ka and
thus avoid introducing the point at infinity.
Of special interest is the independent case, in which Peal ,...,a n ) == Pal X. . . X Pan.
We state it as a corollary:
10.19 Corollary. Suppose {Pa }aEA is afamily ofprobability measures on JR. Then
there exist a sample space (0,23, P) and independent random variables {Xa}aEA
on 0 such that Pa is the distribution of Xa for every a E A. Specifically, we can
take 0 to be (JR*)A and Xa to be the ath coordinate function; if Pa is supported in
the compact set Ka C JRfor each a, we can take 0 to be IlaEA Ka.
Exercises
23. Given b E N \ {1}, let B == {O, 1, . . . , b - 1}, and let Po be the probability
measure on B (or JR) that assigns measure b- 1 to each point in B. Let P be the
measure on 0 == B N given by Corollary 10.19, where A == Nand Pn == Po for all
n E N, and let {X n }l be the coordinate functions on O.
330 TOPICS IN PROBABILITY THEORY
a. If AI, . . . , An C B,
n n
p(n X j-l(A j )) = b- n II card(Aj),
1 1
and P( {w }) = 0 for all w E O.
b. Let 0' == {w EO: Xn(w) =1= 0 for infinitely many n}. Then 0 \ 0' is
countable and P(O') = 1.
c. Define F : 0 [0,1] by F(w) = I: Xn(w)b- n (so F(w) is the number
such that {Xn(w)} is the sequence of digits in its base b decimal expansion).
Then FIO' is a bijection from 0' to (0, 1] that maps 230' bijectively onto 23(0,1].
(O' is generated by sets of the form X;l(A) no'. The image under F of
such a set is a finite union of intervals of the form (jb- n , kb- n ], and these sets
generate (0,1].)
d. The image measure of P under F is Lebesgue measure.
e. (Borel's Normal Number Theorem) A number x E (0,1] is called normal
in base b if the digits 0, 1, . . . , b - 1 occur with equal frequency in its base b
decimal expansion, that is, if
. card { m E {I, . . . , n} : X m ( w) == j} 1
hm n = - b for j = 0,1, . . . , b - 1.
noo
Almost every x E (0,1] (with respect to Lebesgue measure) is normal in base b
for every b.
10.5 THE WIENER PROCESS
It is observed that small particles suspended in a fluid such as water or air undergo
an irregular motion, known as Brownian motion, due to the collisions of the particles
with the molecules of the fluid. A physical derivation of the statistical properties of
Brownian motion was developed independently by Einstein and Smoluchowski, but
the rigorous mathematical model for Brownian motion - in the limiting case where
the motion is assumed to result from an infinite number of collisions with molecules
of infinitesimal size - is due to Wiener. This model, called the Wiener process or
Brownian motion process, has turned out to be of central importance in probability
theory and its applications to physics and mathematical analysis.
One can consider Brownian motion in any number of space dimensions. We shall
describe the theory in dimension one and indicate how to generalize it.
The position of a particle undergoing Brownian motion on the line at time t > 0
is considered to be a random variable Xt (on a sample space to be specified later)
satisfying the following conditions. First, as a matter of normalization, we assume
that the particle starts at the origin at time t = 0:
(10.20)
Xo = 0 (almost surely).
THE WIENER PROCESS 331
Second, since any given collision affects the particle by only an infinitesimal amount,
it has no long-term effect, so the motion of the particle after time t should depend on
its position Xt at that time but not on its previous history. Thus we assume:
(10.21)
If 0 < to < t I < . . . < tn,
the random variables X tj - X tj _ l (1 < j < n) are independent.
Third, since the physical processes underlying Brownian motion are homogeneous
in time, we shall postulate that the distribution of Xt - Xs depends only on t - 8. If
we divide the interval [8, t] into n equal subintervals [to, tl],..., [tn-I, tn] (to == 8,
t n == t) and write Xt - Xs == L(Xtj - Xtj-l)' it then follows from (10.21) that
Xt - Xs is a sum of n independent identically distributed random variables. Since n
can be taken arbitrarily large, the central limit theorem suggests that the distribution
of Xt - Xs should be normal, and this conclusion is also supported by experimental
evidence. Moreover, by Corollary 10.6, a 2 (Xt - Xs) == na2(Xtl - Xto), and it
follows that a 2 (Xt - Xs) == ra 2 (Xt' - X s ') whenever t - 8 == r(t' - s') and r is
rational; this strongly indicates that a 2 (Xt - Xs) should be proportional to t - 8.
Finally, since the particle is as likely to move to the left as to the right, the mean of
Xt - Xs should be o. Putting this all together, we are led to the third assumption:
There is a constant C > 0 such that for 0 < s < t, Xt - Xs has
(10.22) C(t-s) .
the normal distribution v o with mean 0 and varIance C (t - 8).
The constant C, which expresses the rate of diffusion, is of course related to the
physical parameters of the system. For simplicity, we shall henceforth take C == 1.
A family {Xt}t>o of random variables satisfying (10.20)-(10.22), with C == 1,
is called an abstract Wiener process. The generalization to n dimensions can
now be described easily: an n-dimensional abstract Wiener process is a family
of JRn-valued random variables {Xt}t>o, where Xt == (Xl, . . . , X;"), such that (i)
{Xl }tO is a one-dimensional abstract Wiener process for each j, and (ii) if Yj is any
function of the variables {Xl }t>o for j == 1, . . . , n, then Y I , . . . , Y n are independent.
In other words, an n-dimensional abstract Wiener process is just a Cartesian product
of n one-dimensional abstract Wiener processes. In particular, Xt - Xs has the
n-dimensional normal distribution [vg-s]n for t > 8:
( ",n 2 )
t-s n _ -n/2 L.,t! X j
d[v o ] (Xl'... ,X n ) - [2n(t - s)] exp 2(t _ s)
dXI . . . dXn,
which has the appropriate sort of spherical symmetry.
We return to the one-dimensional case. The conditions (10.20)-(10.22) completely
determine the joint distributions of the Xt's as follows. If tl < ... < tn, then
X tl , X t2 - X tl , . . . , X tn - X tn _ 1 are independent (since Xo == 0 a.s.), so their joint
distribution is the product measure
vg l X v5 2 -tl X . . . X vn -tn-l .
332 TOPICS IN PROBABILITY THEORY
But
(X tl ,... ,X tn ) == T(X tl , X t2 - Xtl' ..., X tn - X tn _ l )
where
T(YI,...,Yn) == (Yl, Yl +Y2, ..., Yl +...+Yn).
Since det T == 1, Theorem 2.44 implies that the joint distribution Petl,...,t n ) of
X tl , . . . , X tn is given by
(10.23)
d n ( ) d tn -tn-l ( ) d t2 -tl ( ) d tl ( )
F(tl,...,t n ) Xl,..., X n == VA X n - Xn-l ... VA X2 - Xl VA Xl
[rr n ] -1/2 ( (Xj - Xj_l)2 )
= 1 27r(tj - tj-d exp 2(tj _ tj-l) dXl . . . dXn,
where to == Xo == O. We thus know Ph,...,t n when tl < ... < tn, and we obtain it
in the general case by permuting the variables according to (10.16). Also, it follows
easily from (10.23) that (10.17) is satisfied. Therefore, by Theorem 10.18, abstract
Wiener processes exist.
This situation leaves something to be desired, however. Physically, one expects
the position of a particle to be a continuous function of time, so one would like the
sample space for the Wiener process to be C([O, (0), IR) (or some subset thereof)
and the random variable Xt to be evaluation at t. Actually, Theorem 10.18 yields
something along these lines: The sample space it provides is the space (*)[O,oo) of
all functions from [0,(0) into the compactified line, and Xt(w) is indeed w(t). We
can therefore achieve our goal by showing that the measure P of Theorem 10.18 is
concen trated on C ( [0, 00 ) , JR), considered as a subset of (JR *) [0,00 ). The resulting
realization of the abstract Wiener process on C([O, (0), JR) is what is usually called
the Wiener process.
Henceforth we shall use the notation
f2 == (JR*) [0,(0) ,
f2c == 0([0, (0), IR),
and P will denote the Radon measure on f2 whose finite-dimensional projections are
given by (10.23).
To begin with, we need to make a few comments about the role of the point
at infinity. The function f( t, 8) == It - 81 maps JR2 to [0, +(0) (we write +00 to
distinguish it from the point at infinity in *), and we extend f to a map from
(IR*)2 to [0, +00] by declaring that It - 001 == 100 - tl = +00 for t E JR and
100 - 00 I == o. When thus extended, f is of course discontinuous at 00, but it is lower
semicontinuous, as the reader may verify (Exercise 24). Thus for a, t, 8 E [0,(0) the
sets{wEf2: Iw(t)-W(8)1 >a}areopenandthesets{wEf2: Iw(t)-w(8)1 < a}
are closed in f2.
Next, we need to make some estimates in terms of the quantity
1 [ 2 ] 1/2 1 00 2
p(E,8) == sup dV5(x) == sup - e- x /2t dx.
t8 Ixl>€ t8 rrt €
THE WIENER PROCESS 333
These estimates are contained in the following four lemmas, after which we come to
the main theorem.
10.24 Lemma. For each E > 0, limo-+o 8- 1 p( E, 8) == O.
Proof. We have
j oo j oo 2t
e- x2 / 2t dx < e-€x/2t dt == _e-€2 /2t,
€ € E
whence p( E, 8) < (2/ E) (28/ 1T ) 1/2 e _€2 /28. The exponential term tends to zero faster
than any power of 8, so the result follows. .
10.25 Lemma. Suppose E > 0, 8 > 0, 0 < t1 < . . . < tk, tk - t1 < 8, and
A== {w: Iw(tj) -w(t1)1 > 2Eforsomej E {2,...,k}}.
Then P(A) < 2p( E, 8).
Proof. For j == 1, . . . , k, let
Bj == {w: IW(tk) -w(tj)1 > E},
Dj == {w : Iw(tj) - w(t1)1 > 2E and IW(ti) - w(t 1 )1 < 2E fori < j}.
If w E A, then w E Dj for some j > 2; if w is not also in Bj, then w must be in B 1 ,
for IW(tk) - w(t 1 )1 > E whenever Iw(tj) - w(t1)1 > 2E and IW(tk) - w(tj)1 < E. In
other words,
k
A C B 1 u U(B j n Dj).
2
But P(Bj) < p( E, 8) by (10.22), and Bj and Dj are independent events by (10.21);
also, the D j'S are clearly disjoint. Therefore,
k k
P(A) < P(Br) + LP(Bj)P(D j ) < p(E,8)[1 + LP(D j )] < 2p(E,8).
2 2
.
10.26 Lemma. With the notation of Lemma 10.25, let
E == {w : IW(ti) - w(tj)1 > 4Eforsome i,j E {1,..., k}}.
Then P(E) < 2p(E, 8).
Proof. If IW(ti) - w(tj)1 > 4E, we have either IW(ti) - w(t1)1 > 2E or Iw(t j ) -
w(t1)1 > 2E. Thus E c A, so the result follows from Lemma 10.25. .
334 TOPICS IN PROBABILITY THEORY
10.27 Lemma. Suppose E > 0, 0 < a < b, and b - a < 8. Let
v == {w: Iw(t) -w(s)1 > 4Eforsomet,s E [a,b]}.
Then P(V) < 2p( E, 8).
Proof. If S is a finite subset of [a, b], let
V(S) == {w : Iw(t) - w(s)1 > 4E for some t, s E S}.
Then the sets V (S) are open in f2 by the remarks preceding Lemma 10.24, and their
union is V. Also, by Lemma 10.26, P(V(S)) < 2p(E, 8). Since the family {V(S) :
S is a finite subset of [a, b]} is closed under finite unions, if K is any compact subset
of V we have K c V(S) for some S and hence P(K) < 2p(E,8). But then
P(V) < 2p( E, 8) by the inner regularity of P. .
10.28 Theorem. Let f2 == (JR.*)[O,oo) and f2c 0([0, (0), JR.), and let P be the
Radon measure on f2 whose finite-dimensional projections are given by (10.23),
according to Theorem 10.18. Then f2c is a Borel subset off2 and P(f2c) == 1.
Proof. A real-valued function w on [0,(0) is continuous iff it is uniformly
continuous on [0, n] for each n, and it is uniformly continuous on [0, n] iff for every
j E N there exists kEN such that Iw(t) - w(s)1 < j-1 (note the use of < rather
than <) for all s E [0, n] and all t E [0, n] n (s - k- 1 , s + k- 1 ). Moreover, even if
we only assume that w is IR* -valued, this last condition implies that it is real-valued
unless it is identically 00. Therefore, if Woo denotes the function whose value is
identically 00, we have
f2c u {woo}
(10.29) = n nun {w ED: Iw(t) -w(s)1 < r 1 }.
n=l j=l k=l s,tE[O,n], It-sl<l/k
00 00 00
By the remarks preceding Lemma 10.24, {W : Iw(t) - W(S) I < j-1} is closed for all
s, t, and j. Hence f2c U {woo} is an Fa8 set, and f2c is therefore a Borel set.
Moreover, if for E, 8 > 0 and n E N we set
U(n, E, 8) == {w E f2 : Iw(t) - w(s)1 > 8E for some t, s E [0, n] with It - sl < 8},
by (10.29) we have
D \ Dc = [QljQ [l U(n, (8j)-1, k-l)] U {woo}.
Clearly P( {woo}) == 0, so in order to show that P(f2c) == 1, or equivalently that
P(f2 \ f2c) == 0, it will suffice to show that 1imk-+00 P(U(n, E, k- 1 )) == 0 for all E
and n.
THE WIENER PROCESS 335
The interval [0, n] is the union of the subintervals [0, k- 1 ], [k- 1 , 2k- 1 ], ...,
[n - k- 1 , n]. If W E U(n, E, k- 1 ), then Iw(t) - w(8)1 > 8E for some t, 8 lying in the
same subinterval or in adjacent subintervals, and hence, in the notation of Lemma
10.27, W E V where 8 == k- 1 and [a, b] is one of the subintervals. (In the case of
adjacent subintervals, use their common endpoint as an intermediate point.) As there
are nk subintervals, Lemma 10.27 implies that P(U(n, E, k- 1 )) < 2nkp(c, k- 1 ).
Lemma 10.24 then shows that P(U(n,E,k- 1 )) 0 as k 00, which completes
the proof. .
Exercises
24. The function f : (JR.*)2 [0, +00] defined by f(t, 8) == It - 81 for t, 8 E JR.,
f ( 00, t) == f (t, (0) == +00 for t E JR., and f ( 00, 00) == 0 is lower semicontinuous.
25. Let f2 == (JR. *) [0,(0), {X t }t>o the coordinate functions on f2, and for any A c
[0,00), M A == the a-algebra generated by {Xt}tEA. (Thus M[o,oo) is the product
a-algebra on f2 corresponding to the Borel a-algebras on the factors.)
a. Suppose VEMA and w, w' E f2. If w E V and w'(t) == w(t) for all tEA,
then w' E V.
b. If V E M[o,oo), then VEMA for some countable set A. (Use Exercise 5 in
3 1 . 2 . )
c. The set f2c == C([O, 00), JR.) is not in M[o,oo).
26. Let f2c and P be as in Theorem 10.30. If w E f2c, it can be shown that w is
almost surely not of bounded variation, so if f is a Borel measurable function on
[0,00), the integral Jo oo f(t) dw(t) apparently makes no sense. However:
a. If f == E Cj X[aj ,b j ) is a step function, define
1 00 n
If(W) == f(t) dw(t) == L Cj [w(b j ) - w(aj)].
o 1
Then If is an L 2 random variable on f2c with mean 0 and variance Jo oo If(x) 1 2 dx.
(Hint: The intervals [aj, b j ) may be assumed disjoint.)
b. The map f If extends to an isometry from L 2 ([0, 00), m) to L2(f2c, P).
c. If f E BV ([0,00)) is right continuous and supp(f) is compact, there is a
sequence {fn} of step functions such that fn fin L 2 and dfn df vaguely,
where df denotes the Lebesgue-Stietjes measure defined by f. (By Exercise 41
in 38.6, there is a sequence {JLn} of linear combinations of point masses such
that dJLn ---+ df vaguely. Consider f n (x) == JLn ((0, x]) + f (0).)
d. Iff E BV([O,oo))isrightcontinuousandsupp(f) is compact, then If(w ) ==
- Jo oo w(t) df(t) almost surely. (Check this directly when f is a step function
and apply (b) and (c).)
336 TOPICS IN PROBABILITY THEORY
10.6 NOTES AND REFERENCES
The development of probability theory as a rigorous mathematical discipline began
in the early part of the 20th century, when the tools of measure theory and Lebesgue-
Stieltjes integrals became available. In 1933 Kolmogorov [85] put the subject on a
solid foundation by explicitly identifying sample spaces and random variables with
measure spaces and measurable functions. Since then it has grown extensively.
More detailed accounts of probability theory on a level comparable to that of this
book can be found in Billingsley [17], Chung [25], and Lamperti [90].
310.3: An account of the long history of the central limit theorem can be found in
Adams [2]. More general versions of this result exist in which the random variables
are not assumed to be identically distributed; see the references given above. The
form of Taylor's theorem used in the proof is explained in Folland [45].
The proof of Stirling's formula outlined in Exercise 21 is due to Wong [162]; see
Blyth and Pathak [18] for some other probabilistic proofs of Stirling's formula.
There is one more major result about the asymptotic behavior of sums of indepen-
dent identically distributed random variables that should be mentioned along with
the law of large numbers and the central limit theorem:
The Law of the Iterated Logarithm: Suppose that {X n }l is a sequence
of independent identically distributed £3 random variables with mean J1 and
variance a 2 , and let Sn == I: X j . Then
lim sup Sn - nJ.l = 1 almost surely.
n-+oo a y' 2n log log n
The proof may be found in Chung [25], which gives a more general result; see also
Lamperti [90] for the case in which the Xn's are assumed uniformly bounded.
310.4: Theorem 10.18 is a variant due to Nelson [103] of the fundamental
existence theorem of Kolmogorov [85]. In Kolmogorov' s original construction, the
sample space is IRA (which could be replaced by (IR*)A) and the a-algebra on which
the measure P is constructed is @aEA 23jR. Theorem 10.18 is a decided improvement
on Kolmogorov's theorem, both in the simplicity of its proof and in the fact that the
Borel a-algebra on (IR*)A properly includes @aEA 23jR when A is uncountable. (The
significance of the latter fact is evident from Exercise 25.)
310.5: Wiener constructed his measure P on 0([0,00), JR) in [159] and [160];
his approach is quite different from ours. Our proof of Theorem 10.30 follows Nelson
[104]. See also Nelson [105] for some related material, including the derivation of
the postulates (10.20)-( 10.22) from physical principles.
A discussion of the many interesting properties of the Wiener process is beyond the
scope of this book. We shall mention only one, as a complement to Theorem 10.30:
The sample paths of the Wiener process are almost surely nowhere differentiable; in
fact, with probability one, at each point they are Holder continuous of every exponent
a < but not of exponent . This fact may be startling at first, but it seems almost
NOTES AND REFERENCES 337
inevitable when one reflects that Iw(t) - w(s)I/lt - sll/2 has the standard normal
distribution for all t, s.
Knight [84] is a good source for further information about the Wiener process.
More Measures and
Integrals
In this chapter we discuss some additional examples of measures and integrals that
are of importance in analysis and geometry: invariant measures on locally compact
groups, geometric measures of lower-dimensional sets in JRn, and integration of
densities and differential forms on manifolds. Although we have grouped these
topics together in one chapter, they are substantially independent of one another.
11.1 TOPOLOGICAL GROUPS AND HAAR MEASURE
A topological group is a group G endowed with a topology such that the group
operations (x, y) 1-7 xy and x 1-7 x- 1 are continuous from G x G and G to G.
Examples include topological vector spaces (the group operation being addition),
groups of invertible n x n real matrices (with the relative topology induced from
jRn x n), and all groups equipped with the discrete topology. If G is a topological
group, we denote the identity element of G bye, and for A, BeG and x E G we
define
xA = {xy : YEA},
A- 1 = {X-1 : x E A},
Ax = {yx : YEA},
AB = {yz : YEA, Z E B}.
We say that A eGis symmetric if A = A -1.
Here are some of the basic properties of topological groups:
11.1 Proposition. Let G be a topological group.
a. The topology of G is translation invariant: If U is open and x E G, then U x
and xU are open.
339
340 MORE MEASURES AND INTEGRALS
b. For every neighborhood U of e there is a symmetric neighborhood V of e with
V C u.
c. For every neighborhood U of e there is a neighborhood V of e with VV c U.
d. If H is a subgroup ofG, so is H .
e. Every open subgroup of G is also closed.
/. If K 1 , K 2 are compact subsets ofG, so is K 1 K 2 .
Proof. (a) is equivalent to the continuity in each variable of the map (x, y) xy,
and (b) and (c) are equivalent to the continuity of x X-I and (x, y) xy at the
identity. (Details are left to the reader.) For (d), if x, y E H , there exist nets (Xa)aEA,
(Yf3)(3EB in H that converge to x and y. Then X;-1 x-I and x a Yf3 xy (with
the usual product ordering on A x B), so X-I and xy belong to H . For (e), if H is
an open subgroup, the cosets xH are open for all x, so that G \ H = UxH xH is
open and hence H is closed. Finally, (f) is true because K 1 K 2 is the image of the
compact set K 1 x K 2 under the continuous map (x, y) xy. .
If f is a continuous function on the topological group G and y E G, we define the
left and right translates of f through y by
Lyf(x) = f(y- 1 x),
Ryf(x) = f(xy).
(The point of using y-l on the left and y on the right is to make Lyz = LyLz and
Ryz == RyRz.) f is called left (resp. right) uniformly continuous if for every E > 0
there is a neighborhood V of e such that IILyf - fllu < E (resp. IIRyf - fllu < E)
for y E V. (Some authors reverse the roles of Ly and Ry in this definition.)
11.2 Proposition. If f E C c ( G), then f is left and right uniformly continuous.
Proof. We shall consider right uniform continuity; the proof on the left is the
same. Let K = supp(f) and suppose € > o. For each x E K there is a neighborhood
U x of e such that If(xy) - f(x)1 < !E fory E U x , and by Proposition 11.1(b,c) there
is a symmetric neighborhood V x of e with V x V x c U X . Then {XVX}XEK covers K,
so there exist Xl, . . . , X n E K such that K c U Xj V Xj . Let V = n V Xj ; we claim
that If(xy) - f(x)1 < E if y E V. On the one hand, if x E K, then for some j we
have Xj 1 x E V Xj and hence xy = Xj( xj l X )Y E XjU Xj ; therefore,
If(xy) - f(x)1 < If(xy) - f(xj)1 + If(xj) - f(x)1 < E.
On the other hand, if x K, then f(x) = 0, and either f(xy) = 0 (if xy K) or
xj 1 xy E V Xj for some j (if xy E K); in the latter case xj 1 x = xj 1 xyy-l E U Xj ,
so that If(xj)1 < !E and hence If(xy)1 < E. .
One usually assumes that the topology of a topological group is Hausdorff. The
following proposition shows that this is not much of a restriction.
TOPOLOGICAL GROUPS AND HAAR MEASURE 341
11.3 Proposition. Let G be a topological group.
a. If G is TI' then G is Hausdorff.
b. If G is not TI, let H be the closure of {e}. Then H is a normal subgroup, and
if G / H is given the quotient topology (i.e., a set in G / H is open iffits inverse
image in G is open), G / H is a Hausdorff topological group.
Proof. (a) If G is TI and x =1= y E G, by Proposition 11.I(b,c) there is a
symmetric neighborhood V of e such that xy-I VV. Then Vx and Vy are
disjoint neighborhoods of x and y, for it z == VI X == V2Y for some VI, V2 E V, then
xy-I = vII ZZ- I V2 E V-I V == VV.
(b) H is a subgroup by Proposition 11.ld; it is clearly the smallest closed subgroup
of G. It follows that H is normal, for if H' were a conjugate of H with H' =1= H,
H' n H would be a smaller closed subgroup. It is routine to verify that the group
operations on G / H are continuous in the quotient topology, so that G / H is a
topological group. If e is the identity element in G / H, then {e} is closed since
its inverse image in G is H. But then every singleton set in G / H is closed by
Proposition 11.1a, so G / H is TI and hence Hausdorff. I
In the context of Proposition 11.3b, it is easy to see that every Borel measurable
function on G is constant on the cosets of H and hence is effectively a function on
G / H. Thus for most purposes one may as well work with the Hausdorff group G / H.
We shall be interested in the case where G is locally compact, and we henceforth
use the term locally compact group to mean a topological group whose topology is
locally compact and Hausdorff.
Suppose that G is a locally compact group. A Borel measure J-l on G is called
left-invariant (resp. right-invariant) if J-l(xE) == J-l(E) (resp. J-l(Ex) = J-l(E)) for
all x E G and E E 'Be. Similarly, a linear functional I on C c ( G) is called left- or
right-invariant if I(L x f) == l(f) or I(Rxf) == l(f) for all f. A left (resp. right)
Haar measure on G is a nonzero left-invariant (resp. right-invariant) Radon measure
J.L on G. For example, Lebesgue measure is a (left and right) Haar measure on JRn, and
counting measure is a (left and right) Haar measure on any group with the discrete
topology. (Other examples will be found in the exercises.) The following proposition
summarizes some elementary properties of Haar measures; in it, and in the sequel,
we employ the notation
ct == {f E Cc(G) : f > 0 and Ilfllu > a}.
11.4 Proposition. Let G be a locally compact group.
a. A Radon measure J-l on G is a left Haar measure iff the measure J-l defined by
J-l (E) = J-l(E-I) is a right Haar measure.
b. A nonzero Radon measure J-l on G is a left Haar measure iff J f dJ-l = J Ly f dJ-l
for all f E ct and y E G.
c. If J-l is a left Haar measure on G, then J-l(U) > 0 for every non empty open
U c G, and J f dJ-l > 0 for all f E ct.
d. If J-l is a left Haar measure on G, then J-l( G) < 00 iff G is compact.
342 MORE MEASURES AND INTEGRALS
Proof. (a) is obvious. The "only if" implication of (b) follows by approximating
f by simple functions, and the converse is true by (7.3). As for (c), since /-l =1= 0, by the
regularity of /-l there is a compact KeG with /-l( K) > o. If U is open and nonempty,
K can be covered by finitely many left translates of U, and it follows that /-l(U) > O.
If fEe:, let U = {x : f(x) > llfllu}. Then J f d/-l > llfllu/-l(U) > O.
Finally, we prove (d). If G is compact, then /-l( G) < 00 since /-l is Radon. If
G is not compact and V is a compact neighborhood of e, then G cannot be covered
by finitely many translates of V, so by induction we can find a sequence {xn}
such that X n U-l Xj V for all n. By Proposition 11.I(b,c) there is a symmetric
neighborhood U of e such that UU c V. If m > nand X n U n X m U is nonempty,
then X m E xnUU C X n V, a contradiction. Hence {x n U}l is a disjoint sequence,
and /-l(xnU) = /-l(U) > 0 by (c), whence /-l(G) > /-l(U xmU) = 00. I
Our aim now is to prove the existence and uniqueness of Haar measures. In view
of Proposition 11.4a, one can pass from left Haar measures to right Haar measures at
will, so for the sake of definiteness we shall concentrate on left Haar measures. We
begin with some motivation for the existence proof.
If E E 13c and V is open and nonempty, let (E : V) denote the smallest number
of left translates of V that cover E, that is,
(E : V) = inf{ #(A) : E C U xV},
xEA
where #(A) == card(A) if A is finite and #(A) = 00 otherwise. Thus (E : V) is
a rough measure of the relative sizes of E and V. If we fix a precompact open set
Eo, the ratio (E : V) / (Eo : V) gives a rough estimate of the size of E when the size
of Eo is normalized to be 1. This estimate becomes the more accurate the smaller
V is, and it is obviously left-invariant as a function of E. We might therefore hope
to obtain a Haar measure as a limit of the "quasi-measures" (E : V) / (Eo : V) as V
shrinks to { e } .
This idea can be made to work as it stands, but it is simpler to carry out if we
think of integrals of functions instead of measures of sets. If f, fjJ E C:' then
{x : fjJ(x) > llfjJllu} is open and nonempty, so finitely many left translates of it
cover supp(f), and it follows that
211fllu
f < 114>llu L xj 4> for some X!,..., X n E G.
It therefore makes sense to define the "Haar covering number" of f with respect to
4J:
n n
(J : 4» = inf{I: Cj : f < I: c j L xj 4> for some n E N and Xl,... , X n E G}.
1 1
Clearly (f : 4J) > 0; in fact, (f : 4J) > Ilfllu/II4Jllu.
TOPOLOGICAL GROUPS AND HAAR MEASURE 343
11.5 Lemma. Suppose that I, g, 4J E ct.
a. (I: 4J) = (Lx I : 4J) for any x E G.
b. (c/: 4J) == c(1 : 4J) for any C > o.
c. (I + 9 : 4J) < (I : 4J) + (g : 4J).
d. (I: 4J) < (I : g) (g : 4J).
Proof. We have I < Ec j L xj 4J iff LxI < Ec j L xxj 4J; this proves (a), and
(b) is equally obvious. If f < E c j L xj 4J and 9 < E+l cjLxj4J, then I + 9 <
E cjLxj4J, so (c) follows by minimizing E Cj and E+l Cj. Similarly, if
I < E cjLxjg andg < E d k L yk 4J, then I < Ej,k CjdkLxjYk4J. Since Ej,k cjd k ==
(E Cj )(E d k ), (d) follows. I
At this point we make a normalization by choosing 10 E ct once and for all and
defining
1</>(1) = (:: ) for f, <P E C:.
By Lemma 11.5(a-c), for each fixed 4J the functional 14> is left-invariant and bears
some resemblance to a positive linear functional except that it is only subadditive.
Moreover, by Lemma 11.5d it satisfies
( 11.6)
(/0 : 1)-1 < 14>(/) < (I : 10).
We now show that, in a certain sense, 14> is approximately additive when supp( 4J) is
small.
11.7 Lemma. If 11, 12 E C: and E > 0, there is a neighborhood V of e such that
14>(/1) + 1</>(/2) < 14>(/1 + 12) + E whenever supp(4J) C V.
Proof Fix g E C: such that g == 1 on SUpp(/1 + 12), and let b be a positive
number to be specified later. Set h == 11 + 12 + bg and hi == li/h (i = 1,2), where
it is understood that hi = 0 outside SUPP(/i). Then hi E C:, so by Proposition 11.2
there is a neighborhood V of e such that Ihi(x) -hi(y)1 < b ifi == 1,2 and y-1x E V.
If 4J E ct, supp( 4J) C V, and h < E cjLxj 4J, then Ihi(x) - hi(xj) I < b whenever
xj1x E supp(4J), so
Ii (X) == h(x)hi(x) < L Cj4J( xj 1 x )h i (x) < L Cj4J( xj 1 x )[h i (Xj) + 8].
j j
But then (Ii: 4J) < E cj[hi(xj) + b], and since h 1 + h 2 < 1,
(/1 : 4J) + (/2 : 4J) < L cj[1 + 28].
j
Now, E Cj can be made arbitrarily close to (h : 4J), so by Lemma 11.5(b,c),
14>(f1) + 14>(/2) < (1 + 2b)14>(h) < (1 + 28) [14>(/1 + 12) + b14>(g)].
In view of (11.6), therefore, it suffices to choose 8 small enough so that
2b(/1 + 12 : 10) + 8(1 + 2b)(g : 10) < E.
I
344 MORE MEASURES AND INTEGRALS
11.8 Theorem. Every locally compact group G possesses a left Haar measure.
Proof. For each f E C: let X f be the interval [(fo : f)-I, (f : fo)], and let
X == TIfEC: Xf. Then X is a compact Hausdorff space by Tychonoff's theorem,
and by (11.6), every Ie/> is an element of X. For each compact neighborhood V of
e, let K(V) be the closure in X of {Ie/> : supp(4)) c V}. Clearly n K(Vj) :J
K(n Vj), so by Proposition 4.21 there is an element I in the intersection of all
the K(V)'s. Every neighborhood of I in X intersects {Ie/> : supp(4)) C V} for
all V; in other words, for any neighborhood V of e and any fl, . . . , f n E C: and
E > 0 there exists 4> E C: with supp( 4» c V such that II (fj) - Ie/> (fj ) I < E
for j == 1,..., n. Therefore, in view of Lemmas 11.5 and 11.7, I is left-invariant
and satisfies I(af + bg) == aI(f) + bI(g) for all f, 9 E C: and a, b > O. It
follows easily, as in the proof of Lemma 7.15, that if we extend I to C c by setting
I(f) == I(f+) - I(f-), then I is a left-invariant positive linear functional on C c ( G).
Moreover, I(f) > 0 for all f E C: by (11.6). The proof is therefore completed by
invoking the Riesz representation theorem. .
11.9 Theorem. If jj and v are left Haar measures on G, there exists e > 0 such that
f.-l == ev.
Proof. We first present a simple proof that works when f.-l is both left- and right-
invariant - in particular, when G is Abelian. Pick h E C: such that h E C: and
h(x) == h(x- 1 ) (e.g., h(x) == g(x) + g(x- 1 ) where 9 is any element of ct). Then
for any f E C c ( G),
1 hdv 1 f d/-L = 11 h(y)f(x) d/-L(x) dv(y)
= 11 h(y)f(xy) d/-L(x) dv(y) = 11 h(y)f(xy) dv(y) d/-L(x)
= 11 h(x- 1 y)f(y) dv(y) d/-L(x) = 11 h(y- 1 x)f(y) dv(y) d/-L(x)
= 1 1 h(y- 1 x)f(y) d/-L(x) dv(y) = 11 h(x)f(y) d/-L(x) dv(y)
= 1 h d/-L 1 f dv,
so that f.-l == ev where e == (J h df.-l) / (J h dv). (J h dv =1= 0 by Proposition 11.4c, and
Fubini's theorem is applicable since the functions in question are supported in sets
which are compact and hence of finite measure. The same remarks apply below.)
Now, another proof for the general case. The assertion that f.-l = ev is equivalent
to the assertion that the ratio r f == (J f df.-l) / (J f dv) is independent of f E ct.
Suppose, then, that f, 9 E ct; we shall show that rf == rg.
Fix a symmetric compact neighborhood Va of e and set
A == [supp(f)]V o u Va [supp(f)],
B = [supp(g)]V o U Va [supp(g)].
TOPOLOGICAL GROUPS AND HAAR MEASURE 345
Then A and B are compact by Proposition 11. If, and for y E Va the functions
x f(xy) - f(yx) and x g(xy) - g(yx) are supported in A and B, respectively.
Next, given E > 0, by Proposition 11.2 there is a symmetric compact neighborhood
V C Va of e such that sup x If(xy) - f(yx)1 < E and sUPx Ig(xy) - g(yx)1 < E for
y E V. Pick h E ct with supp(h) c V and h(x) == h(x- 1 ). Then
I hdv I fdp, = II h(y)f(x) dp,(x) dv(y)
= II h(y)f(yx) dp,(x) dv(y),
and since h(x) == h(x- 1 ),
I h dp, I f dv = I I h(x)f(y) dp,(x) dv(y)
= I I h(y- 1 x)f(y) dp,(x) dv(y) = II h(x- 1 y)f(y) dv(y) dp,(x)
= I I h(y)f(xy) dv(y) dp,(x) = II h(y)f(xy) dp,(x) dv(y).
Thus,
I hdv I fdp,- I hdp, I fdv = II h(Y)[f(xY)-f(yx)]dp,(x)dv(y)
< Ep,(A) I h dv.
By the same reasoning,
I hdv I gdp,- I hdp, I gdv < Ep,(B) I hdv.
Dividing these inequalities by (J h dv)(J f dv) and (J h dv)(J 9 dv), respectively,
and adding them, we obtain
J f dJl;
J fdv
J 9 dJl; < E ( Jl;(A) + Jl;(B) ) .
J 9 dv - J f dv J 9 dv
Since E is arbitrary, we are done.
.
We conclude this section by investigating the relationship between left and right
Haar measures. If Jl; is a left Haar measure on G and x E G, the measure Jl;x (E) ==
Jl;( Ex) is again a left Haar measure, because of the commutativity of left and right
translations (i.e., the associative law). Hence, by Theorem 11.9 there is a positive
number (x) such that Jl;x = (x)Jl;. The function : G (0,00) thus defined
346 MORE MEASURES AND INTEGRALS
is independent of the choice of J-l by Theorem 11.9 again; it is called the modular
function of G.
11.10 Proposition. is a continuous homomorphism from G to the multiplicative
group of positive real numbers. Moreover, if J-l is a left Haar measure on G, for any
f E £1 (J-l) and y in G we have
(11.11)
J (Ryf) dJ-L = b.(y-l) J f dJ-L.
Proof. For any x, y E G and E E 'Be,
(xY)J-l(E) == JL(Exy) == (Y)J-l(Ex) = (y)(x)J-l(E),
so is a homomorphism from G to (0, (0). Also, since XE(XY) == XEy-l (x),
J XE(XY) dJ-L(x) = J-L(Ey-l) = b.(y-l )J-L(E) = b.(y-l) J XE dJ-L.
This proves (11.11) when f = XE, and the general case follo\vs by the usual linearity
and approximation arguments. Finally, it is an easy consequence of Proposition 11.2
that the map x J Rxf dJ-l is continuous for any f E C c ( G) (Exercise 2), so the
continuity of follows from (11.11). .
Evidently, the left Haarmeasures on G are also right Haar measures precisely when
is identically 1, in which case G is called unimodular. Of course, every Abelian
group is unimodular; remarkably enough, groups that are highly noncommutative
are also unimodular. To be precise, let [G, G] denote the smallest closed subgroup
of G containing all elements of the form [x, y] == xyx- 1 y-l. [G, G] is called the
commutator subgroup of G; it is normal because z[x, Y]Z-l = [zxz- 1 , zyz-l],
and it is trivial precisely when G is Abelian.
11.12 Proposition. If G / [G, G] is finite, then G is unimodular.
Proof. Every continuous homomorphism (such as ) from G into an Abelian
group must annihilate [x, y] for all x, y and must therefore factor through G / [G, G].
If the latter group is finite, (G) is a finite subgroup of (0, (0); but (0,00) has no
finite subgroups except {1}. I
11.13 Proposition. If G is compact, then G is unimodular.
Proof. For any x E G, obviously G == Gx. Hence if J-l is a left Haar measure,
we have J-l(G) == J-l(Gx) == (x)J-l(G), and since 0 < J-l(G) < 00 we conclude that
(x) == 1. I
We observed above that if J-l is a left Haar measure, J-L (E) = J-L(E-1) is a right
Haar measure. We now show how to compute it in terms of J-l and .
11.14 Proposition. d J-l (x) == (x)-1 dJ-l(x).
TOPOLOGICAL GROUPS AND HAAR MEASURE 347
Proof. By (11.11), if I E C c ( G),
J f(x)!l(x)-l dJ-L(x) = !l(y) J f(xy)!l(xy)-l dJ-L(x)
= J Ryf(x)!l(X)-l dJ-L(x).
Thus the functional I J I -1 dJ-l is right-invariant, so its associated Radon
measure is a right Haar measure. However, this Radon measure is simply -1 dJ-l by
Exercise 9 in 37.2; hence, by Theorem 11.9, -1 dJ-l = c d J-l for some c > O. If c =1= 1,
we can pick a symmetric neighborhood U of e in G such that I (x) -1 - 11 < ! Ie - 11
on U. But J-l (U) == J-l(U), so
Ic - IIJ-L(U) = IC J-L (U) - J-L(U) I = Ii (!l(x)-l - 1) dJ-L(x) I < !Ic - 11J-L(U),
a contradiction. Hence c = 1 and dji == -1 dJ-l.
I
11.15 Corollary. Left and right Haar measures are mutually absolutely continuous.
Exercises
1. If G is a topological group and E c G, then E == n{ EV : V is a neighborhood
of e } .
2. If J-l is a Radon measure on the locally compact group G and I E C c ( G), the
functions x J LxI dJ-l and x J Rx! dJ-l are continuous.
3. Let G be a locally compact group that is homeomorphic to an open subset U of
JRn in such a way that, if we identify G with U, left translation is an affine map - that
is, xy == Ax (y) + b x where Ax is a linear transformation of JRn and b x E JRn. Then
I det Ax 1-1 dx is a left Haar measure on G, where dx denotes Lebesgue measure on
JRn. (Similarly for right translations and right Haar measures.)
4. The following are special cases of Exercise 3.
a. If G is the multiplicative group of nonzero complex numbers z = x + iy,
(x 2 + y2)-I dx dy is a Haar measure.
b. If G is the group of invertible n x n real matrices, I det A I-n dA is a left
and right Haar measure, where dA == Lebesgue measure on JRnxn. (To see
that the determinant of the map X AX is I det Aln, observe that if X is
the matrix with columns Xl, . . . , X n , then AX is the matrix with columns
AX I , . . . , Axn.)
c. If G is the group of 3 x 3 matrices of the form
( )
(x, y, z E JR),
348 MORE MEASURES AND INTEGRALS
then dx dy dz is a left and right Haar measure.
d. If G is the group of 2 x 2 matrices of the form
( i)
(x > 0, Y E JR),
then x- 2 dx dy is a left Haar measure and x-I dx dy is a right Haar measure.
S. Let G be as in Exercise 4d. Construct a Borel set in G with finite left Haar
measure but infinite right Haar measure, and a left uniformly continuous function on
G that is not right uniformly continuous.
6. Let {Ga}oEA be a family of topological groups and G == IlaEA Ga.
a. With the product topology and coordinatewise multiplication, G is a topolog-
ical group.
b. If each G a is compact and J-la is the Haar measure on Go such that J-la ( G a) ==
1, then the Radon product of the J-la's, as constructed in Theorem 7.28, is a Haar
measure on G.
7. In Exercise 6, for each a let G a be the multiplicative group { -1, I} with the
discrete topology. Let J-l be a Haar measure on G.
a. If 7f a : G -+ {-I, 1 } is the ath coordinate function, then J 7f a 7f (3 dJ-l == 0 for
a =1= (3.
b. If A is uncountable, £2 (J-l) is not separable even though J-l( G) < 00.
8. Let Q have the relative topology induced from JR. Then Q is a topological
group that is not locally compact, and there is no nonzero translation-invariant Borel
measure on Q that is finite on compact sets.
9. Let G be a locally compact group with left Haar measure J-l.
a. G has a subgroup H that is open, closed, and a-compact. (Let H be the
subgroup generated by a precompact open neighborhood of e.)
b. The restriction of J-l to subsets of H is a left Haar measure on H.
c. J-l is decomposable in the sense of Exercise 15 in 33.2.
d. If the topology of G is not discrete, then J-l( { x }) = 0 for all x E G. In this
case, J-l is regular iff J-l is semifinite iff J-l is a-finite iff G is a-compact. (See
Exercises 12 and 14 in 37.2.)
11.2 HAUSDORFF MEASURE
In geometric problems it is important to have a method for measuring the size of
lower-dimensional sets in JRn, such as curves and surfaces in }R3. Differential-
geometric techniques provide such a method that applies to smooth submanifolds of
JRn ; see 311.4. However, there is also a measure-theoretic approach to the problem
that applies to more general sets. Indeed, the basic ideas can be carried out just as
easily in arbitrary metric spaces, so we begin by working in this generality.
HAUSDORFF MEASURE 349
Let (X, p) be a metric space. (See 30.6 for the relevant terminology.) An outer
measure j1* on X is called a metric outer measure if
j1*(A U B) = j1*(A) U j1*(B) whenever p(A, B) > o.
11.16 Proposition. If j1* is a metric outer measure on X, then every Borel susbset
of X is j1 * -measurable.
Proof. Since the closed sets generate the Borel a-algebra, it suffices to show that
every closed set F c X is j1*-measurable. Thus, given A c X with j1*(A) < 00,
we wish to show that
j1* (A) > j1* (A n F) + j1* (A \ F).
Let Bn = {x E A \ F : p( x, F) > n -1 }. Then Bn is an increasing sequence of sets
whose union is A \ F (since F is closed), and p(Bn, F) > n- 1 . Therefore,
j1*(A) > j1*((AnF)UB n ) =j1*(AnF)+j1*(B n ),
so it will be enough to show that j1*(A \ F) = lim j1*(B n ). Let C n = Bn+1 \ Bn.
If x E C n + 1 and p(x, y) < [n(n + 1)]-1, then
1 1 1
p(y, F) < p(x, y) + p(x, F) < ( 1) + 1 = -,
n n+ n+ n
so that p(C n + 1 , Bn) > [n(n + 1)]-1. A simple induction therefore shows that
j1*(B 2k + 1 ) > j1*(C 2 k U B 2k - 1 ) = j1*(C 2k ) + j1*(B 2 k-1)
k
> j1* (C 2k ) + j1* (C 2k - 2 U B2k-3) . . . > L: j1* (C 2j ),
1
and similarly j1*(B 2 k) > L j1*(C 2j - 1 ). Since j1*(B n ) < j1*(A) < 00, it follows
that the series L j1 * ( C 2j ) and L j1 * ( C 2j -1) are convergent. But by subadditivity
we have
00
j1*(A \ F) < j1*(B n ) + L: j1*(C j ).
n+1
As n -t 00, the last sum vanishes and we obtain
j1* (A \ F) < lim inf j1* (Bn) < lim sup j1* (Bn) < j1* (A \ F),
as desired.
.
We are now ready to define Hausdorff measure. Suppose that (X, p) is a metric
space, p > 0, and 8 > O. For A c X, let
00 00
Hp,I5(A) = inf{L:(diamBj)p: A C UB j and diamBj < 8},
1 1
350 MORE MEASURES AND INTEGRALS
with the convention that inf 0 == 00. As 8 decreases the infimum is being taken over
a smaller family of coverings of A, so Hp,8(A) increases. The limit
Hp(A) = l Hp,8(A)
is called the p-dimensional Hausdorff (outer) measure of A.
Several comments on this definition are in order:
. The sets Bj in the definition of H p ,8 are arbitrary subsets of X. However,
one obtains the same result if one requires the Bj's to be closed (because
diam Bj = diam B j ), or if one requires the Bj's to be open (because one can
replace Bj by the open set U j = {x : p(x, Bj) < E2- j - 1 }, whose diameter is
at most (diam Bj) + E2- j ). Similarly, if X = }R, one can restrict the Bj's to
be closed or open intervals.
. The intuition behind the definition of Hp is that if p is an integer and A is a
"p-dimensional" subset of}Rn such as a relatively open set in a p-dimensional
linear subspace of}Rn, the amount of A that is contained in a region of diameter
r should be roughly proportional to r P .
. The restriction to coverings by sets of small diameter is necessary to provide an
accurate measure of irregularly shaped sets; otherwise one could simply cover
a set by itself, with the result that its measure would be at most the pth power of
its diameter. Consder, for example, the curve Am = {(x, sin nx) : I x I < I} in
}R2. Clearly diam Am < 2 3 / 2 for all m, but the length of An tends to 00 along
with m. One needs to take 8 « m- 1 before H 1 ,8(A) becomes an accurate
estimate of the length of Am.
We now derive the basic properties of Hp.
11.17 Proposition. Hp is a metric outer measure.
Proof. H p ,8 is an outer measure by Proposition 1.10, and it follows that Hp is an
outer measure. If p( A, B) > 0 and { C j } is a covering of AUB such that (diam C j ) <
8 < p(A, B) forallj, then noCj can intersect both A and B. Splitting L:(diam Cj)P
into two parts according to whether C j n B = 0 or C j n A = 0 shows that
L:(diamCj)P > H p ,8(A)+H p ,8(B),andhenceH p ,8(AUB) > Hp,8(A)+Hp,8(B).
As this inequality is valid whenever 8 < p(A, B), the desired result follows by letting
8 -t o. .
In view of Propositions 11.16 and 11.17, the restriction of H P to the Borel sets is
a measure, which we still denote by Hp and call p-dimensional Hausdorff measure.
11.18 Proposition. Hp is invariant under isometries of X. Moreover, if Y is any
set and f,g: Y -t X satisfy p(f(y),f(z)) < Cp(g(y),g(z))forally,z E Y, then
Hp(f(A)) < CPHp(g(A)) for all A c Y.
Proof. The first assertion is evident from the definition of Hp. As for the
second, given E,8 > 0, cover g(A) by sets Bj such that diam Bj < C- 1 8 and
HAUSDORFF MEASURE 351
2: (diam Bj)P < Hp(g(A)) + E. Then the sets Bj == f(g-I(Bj)) cover f(A), and
diam Bj < C( diam Bj) < 8, so that
Hp,o(f(A)) < L(diamBj)P < CPHp(g(A)) + CPE.
The proof is completed by letting 8 0 and E o.
.
11.19 Proposition. If Hp(A) < 00, then Hq(A) == Of or all q > p. If Hp(A) > 0,
then Hq(A) == 00 for all q < p.
Proof. It suffices to prove the first statement, as the second one is its contra-
positive. If Hp(A) < 00, for any 8 > 0 there exists {B j }l with A c UBj,
diamBj < 8, and 2: (diam Bj)P < Hp(A) + 1. Butifq > p,
L(diamBj)q < 8 q - P L(diamB j )P < 8 q - P (H p (A) + 1),
so Hq,o(A) < 8 q - P (H p (A) + 1). Letting 8 0, we see that Hq(A) == O. .
According to Proposition 11.19, for any A C X the numbers
inf{p > 0 : Hp(A) == O} and sup{p > 0 : Hp(A) == oo}
are equal. Their common value is called the Hausdorff dimension of A.
From now on we restrict attention to the case X == IRn. Our object is to show
that for p == I, . . . , n, Hp gives the geometrically correct notion of measure (up to a
normalization constant) for p-dimensional submanifolds of IRn. We begin with the
case p == n.
11.20 Proposition. There is a constant'rn > 0 such that'rnHn is Lebesgue measure
on IRn.
Proof. Hn is a translation-invariant Borel measure on IRn. If Q c IRn is a cube,
it is easily verified that 0 < Hn(Q) < 00 (Exercise 10). It follows that Hn =1= 0
and that Hn is finite on compact sets, whence Hn is a Radon measure by Theorem
7.8. The desired result is therefore a consequence of Theorem 11.9. (The simple
argument given there for the Abelian case can be read without going through the rest
of 311.1: Simply read x + y for xy and -x for x-I.) .
The normalization constant 'rn turns out to be the volume of a ball of diameter
1, which by Corollary 2.55 is Jrn/2 j2 n r(!n + 1). We shall not give the proof
here, as the value of'rn is irrelevant for our purposes. (The hard part is proving
the intuitively obvious fact that among all sets of diameter 1, the ball has the largest
volume.) Many authors build 'rn into the definition of Hausdorff measure; that is,
they define p-dimensional Hausdorff measure, for p E [0,(0), to be 'rpHp where
'rp == JrP/2 j2 p r(!p + 1).
We now consider lower-dimensional sets in IRn. If 1 < k < n, a k-dimensional
C 1 submanifold of IRn is a set M C IRn with the following property: For each
352 MORE MEASURES AND INTEGRALS
x E M there exist a neighborhood U of x in ffi.n, an open set V c ffi. k, and an
injective map f : V -t U of class 0 1 such that f(V) = M n U and the differential
Dxf - i.e., the linear map from ffi.k to ffi.n whose matrix is [(8fij8xj)(x)] - is
injective for each x E V. Such an f is called a parametrization of M n U. Every
submanifold M can be covered by countably many U's for which M n U has a
parametrization, so for our purposes it will suffice to assume that M = M n U has a
global parametrization.
(There are other common definitions of "submanifold": M is a k-dimensional 0 1
submanifold if it is locally the set of zeros of a 0 1 map 9 : ffi.n -t ffi.n-k such that Dxg
is surjective at each x E M, or if it is locally the image of a ball in a k-dimensional
linear subspace of ffi.n under a 0 1 diffeomorphism of ffi. n . The equivalence of these
definitions with the one given above is a standard exercise in the use of the implicit
function theorem.)
We begin our study of submanifolds with the linear case. If T is a linear map from
ffi.k to ffi.n and T* : ffi.n -t ffi.k is its transpose, then T*T is a positive semi definite
linear operator on ffi. n. Its determinant is therefore nonnegative, and we may define
J(T) = V det(T*T).
11.21 Proposition. If k < n, A C ffi.k, and T : ffi.k -t ffi.n is linear, then
Hk{T(A)) = J(T)Hk(A).
Proof. If k = n, then det(T*T) = (det T)2, so J(T) = I det TI and the
assertion reduces to Theorem 2.44 because of Proposition 11.20. If k < n, let R
be a rotation of ffi.n that maps the range of T to the subspace ffi.k x {O} = {y E
ffi.n : Yj = 0 for j > k}, and let 8 == RT. Then 8* 8 = T* R* RT = T*T, so that
J(8) = J(T), and H k (8(A)) = Hk(T(A)) since Hk is rotation-invariant. But if
we identify ffi.k x {O} with ffi.k, 8 becomes a map from ffi.k to itself, and the definition
of 8* 8 is unchanged when this identification is made. We are therefore back in the
equidimensional case, which was disposed of above. .
It is now easy to guess what the corresponding formula must be for a general
smooth injection f : IRk -t ffi.n, since locally every smooth map is approximately
linear. Our next lemma makes this idea precise.
11.22 Lemma. Suppose M is a k-dimensional 0 1 submanifold offfi.n parametrized
by f : V -t jRn. For any a > 1 there is a sequence {B j } of disjoint Borel subsets of
V such that V = U Bj, and a sequence {T j } of linear maps from ffi.k to ffi.n, such
that
(11.23) a-1lTjzl < I(Dxf)zl < alTjzlforx E Bj, z E ffi.k
and
(11.24) a- 1 lTjx - TjYI < If(x) - f(Y)1 < alTjx - Tjylfor x, Y E Bj.
Proof. Let us fix E > 0 and (3 > 1 such that
a-I + E < (3-1 < 1 < (3 < a - E,
HAUSDORFF MEASURE 353
and let 'J be a countable dense subset of the set of linear maps from IRk to IRn (e.g.,
the set of matrices with rational entries). For T E 'J and mEN let E (T, m) be the
set of all x E V such that
,6-1ITzl < I (Dxf)zl < ,6ITzl for z E IRk,
a- 1 lTx - Tyl < If(x) - f(y)1 < alTx - Tyl for all y E V with Iy - xl < m- 1 .
The definition of E(T, m) is unaffected if y and z are restricted to lie in countable
dense subsets of V and IRk; hence E(T, m) is defined by countably many inequalities
involving continuous functions, so it is a Borel set. It will therefore suffice to show
that the sets E(T, m) cover V. Indeed, each E(T, m) is a countable union of sets
Ei(T, m) of diameter less than m- 1 , so by disjointifying the countable collection
{E i (T, m) : T E T, i, mEN} we obtain the desired sets Bj and the associated
maps Tj.
Suppose, then, that x E V, and let 8 0 = inf{I(Dxf)zl : Izi = 1}. Since Dxf is
injective, 8 0 is positive. Choose 8 > 0 so that 8 < (,6 - 1)8 0 and 8 < (1 - ,6-1 )8 0 ,
and then pick T E 'J such that liT - Dxfll < 8. Then
ITzl < I(Dxf)zl + ITz - (Dxf)zl < I(Dxf)zl + 81 z 1 < ,61(D x f)zl,
and similarly ITzl > ,6-11 (Dxf)zl. This establishes the first inequality and also
shows that T is injective, so 1] = inf{ITzl : Izl = 1} is positive. Since f is
differentiable at x, there exists mEN such that
If(Y) - f(x) - (Dxf)(y - x)1 < E1]IY - xl < EIT(y - x)1 for Ix - yl < m- 1 .
But then
If(y) - f(x)1 < If(y) - f(x) - (Dxf)(y - x)1 + I(Dxf)(Y - x)1
< EITy - Txl + ,6ITy - Txl < alTy - Txl,
and similarly If(y) - f(x)1 > a- 1 lTy - Txl. In short, x E E(T, m), so we are
done. .
11.25 Theorem. Let M be a k-dimensional 0 1 submanifold ofIRn parametrized by
f : V IRn. If A is a Borel subset of V, then f(A) is a Borel subset ofIRn, and
(11.26)
Hk(f(A)) = L J(Dxf) dHk(X),
Moreover, if cP is a Borel measurable function on M that is either nonnegative or in
£1(M, H k ), then
(11.27)
1M <jJ(y) dHk(Y) = [<jJ(f(X))J(Dxf) dHk(X),
Proof. Since V is an open subset of IRk, it is a-compact. It follows that if A is
closed in V, then A is a-compact, and hence f(A) is a-compact since f is continuous.
354 MORE MEASURES AND INTEGRALS
The collection of all A c V such that f (A) is Borel is therefore a a-algebra that
contains all closed sets, hence all Borel sets. We shall prove (11.26); this establishes
(11.27) when cP == Xf(A)' and the general result follows from the usual linearity and
approximation arguments.
Given a > 1, let {B j }, {T j } be as in Lemma 11.22, and let Aj = An Bj. It
follows from (11.23) and Proposition 11.18 that
a-kHk(Tj(E)) < Hk((Dxf)(E)) < akHk(Tj(E))
(x E Aj, E C }Rk),
and hence by Proposition 11.21,
a- k J(T j ) < J(Dxf) < a k J(T j )
(x E Aj).
But it also follows from (11.24) and Proposition 11.18 that
a- k Hk(Tj(Aj)) < Hk(f(Aj)) < a k Hk(Tj(Aj))
and from Proposition 11.21 that Hk{Tj(Aj)) = J(Tj)Hk(A j ). Therefore,
a- 2k Hk(f(Aj)) < a- k J(Tj)Hk(Aj) < L. J(Dxf) dHk(X)
J
< akJ(Tj)Hk(A j ) < a2kHk(f(Aj)).
Summing over j, we obtain
a- 2k Hk(f(A)) < L J(Dxf) dHk(X) < a 2k Hk(f(A)),
so the proof is completed by letting a -t 1.
.
If both sides of the identities (11.26) and (11.27) are multiplied by the normalizing
constant'rk in Proposition 11.20, the integrals on the right become ordinary Lebesgue
integrals, and we obtain the formula for measures and integrals on M given by Rie-
mannian geometry (see 311.4). Moreover, if k = n we have J(Dxf) = I detDxfl,
so the result reduces to Theorem 2.47. See also Exercises 11 and 12.
There remains the question of whether p-dimensional Hausdorff measure is of any
interest when p t/:. N. An affirmative answer will be provided in the next section.
Exercises
10. Show directly from the definition of Hn that if Q is a cube in }Rn, then 0 <
Hn (Q) < 00. (Hint: There is a constant C such that if E c }Rn, the Lebesgue
measure of E is at most C( diam E)n.)
11. If f : (a, b) -t }Rn is a parametrization of a smooth curve (i.e., a I-dimensional
C1 submanifold of }Rn), the Hausdorff I-dimensional measure of the curve is
J:lf'(t)ldt.
SELF-SIMILARITY AND HAUSDORFF DIMENSION 355
12. If cjJ : }Rk -t }R is a C 1 function, the graph of cjJ is a k-dimensional C1 submanifold
of}Rk+l parametrized by f(x) = (x,cjJ(x)). If A c }Rk, the k-dimensional volume
of the portion of the graph lying abo ve A is
l V I + 1'V4>(x)1 2 dx.
(First do the linear case, cjJ(x) = a . x. Show that if T : }Rk -t }Rk+1 is given by
Tx = (x, a . x), then T*T = I + 8 where 8x = (a . x)a, and hence det(T*T) =
1 + lal 2 . Hint: The determinant of a matrix is the product of its eigenvalues; what
are the eigenvalues of 8?)
13. In any metric space, zero-dimensional Hausdorff measure is counting measure.
14. If Am is a subset of a metric space X of Hausdorff dimension Pm for mEN,
then U Am has Hausdorff dimension sUP m Pm.
15. If A c }Rn has Hausdorff dimension p, then A x A C }R2n has Hausdorff
dimension 2p.
11.3 SELF-SIMILARITY AND HAUSDORFF DIMENSION
In this section we produce some geometrically interesting examples of sets of frac-
tional Hausdorff dimension. The sets we consider are "self-similar," which means
roughly that each small part of the set looks like a shrunken copy of the whole set. We
begin by establishing the terminology necessary to discuss such sets. Our definitions
will be more restrictive than is really necessary, since we aim only to give the flavor
of the theory and to display some examples.
For r > 0, a similitude with scaling factor r is a map 8 : }Rn -t }Rn of the
form 8 (x) == rO (x) + b, where 0 is an orthogonal transformation (a rotation or
the composition of a rotation and a reflection) and b E }Rn. Suppose that S =
(8 1 , . . . ,8m) is a finite family of similitudes with a common scaling factor r < 1. If
E c }Rn, we define
m
SO(E) == E,
S(E) = U 8 j (E),
1
sk(E) = S(Sk-1(E)) for k > 1.
E is called invariant under S if S(E) = E. In this case, Sk(E) == E for all k,
which means that for every k > 1, E is the union of m k copies of itself that have
been scaled down by a factor of r k . If, in addition, these copies are disjoint or have
negligibly small overlap, E can be said to be "self-similar."
Before proceeding with the theory, let us examine some of the standard examples
of self-similar sets. They are all obtained by starting with a simple geometric figure,
applying a family of similitudes repeatedly to it, and passing to the limit.
. Given {3 E (0, !), let C{3 be the Cantor set obtained from [0, 1] by successively
removing open middle (1 - 2(3)ths of intervals, as discussed at the end of 31.5.
356 MORE MEASURES AND INTEGRALS
That is, C{3 = n Sk(E) where
E = [0,1], S == (8 1 ,8 2 ), 8 1 (x) == (3x, 82(X) = {3x + 1 - (3.
See Figure 11.1a.
. The Sierpinski gasket r is the subset of}R2 obtained from a solid triangle
by dividing it into four equal subtriangles by bisecting the sides, deleting the
middle subtriangle, and then iterating. Thus, if we take the initial triangle
to be the closed triangular region with vertices (0,0), (1,0), and (!, 1), then
r == n Sk(), where S == (8 1 ,8 2 ,8 3 ) with
8j(x) == !x + b j ,
See Figure 11.1 b.
. The snowflake curve is the subset of}R2 obtained from a line segment by
replacing its middle third by the other two legs of the equilateral triangle based
on that middle third (there are two such triangles; make a definite choice),
and then iterating. That is, let L be the broken line joining (0,0) to (, 0) to
( !, V3) to (, 0) to (1, 0), and let
b i = (0, 0), b 2 = (!, 0), b 3 == (, !).
S == (8 1 ,...,8 4 ), 8j(x) = Oj(x) + b j ,
b l == (0, 0), b 2 == (, 0), b 3 = (!, V3), b 4 = (., 0) ,
0 1 == 0 4 = I, O 2 = R 7r / 3 , 0 3 = R_ 7r / 3 ,
where Ro denotes the rotation through the angle (). Then = limkoo Sk (L );
more precisely, = U Sk(L) \ U Sk(M), where M is the union of the
open middle thirds of the line segments that constitute L. See Figure 11.lc.
(The actual "snowflake" is made by joining three rotated and reflected copies
of in the same way in which one can join three copies of the initial figure L
to make a six-pointed star.)
The Cantor sets, the Sierpinski gasket, and the snowflake curve are all clearly
invariant under the families of similitudes used to generate them. The condition of
negligi ble overlap of the rescaled copies is also satisfied, for in all cases 8 i (E) n 8 j (E)
is either empty or a single point.
We now return to the general theory. Suppose that S = (8 1 , . . . ,8m) is a family of
similitudes with scaling factor r < 1. We introduce some notation for the actions of
the iterations ofS on points, sets, and measures: For x E }Rn, E c }Rn, J1 E M(}Rn),
and ii, ... ,ik E {1,... ,m}, we set
Xi1...ik = 8i 1 0 . . .0 8ik (x), E i1 ... ik = 8i 1 0 . . .0 8ik (E),
J1i 1 ...ik(E) = J1((8i 1 0...0 8 ik )-I(E)).
It is an important property of compact sets that are invariant under a family of
similitudes that they carry measures with a corresponding invariance property.
SELF-SIMILARITY AND HAUSDORFF DIMENSION 357
(b)
(a)
(c)
Fig. 11.1 The first three approximations to (a) the Cantor set C 3 / 8 , (b) the Sierpinski gasket,
and (c) the snowflake curve.
358 MORE MEASURES AND INTEGRALS
11.28 Theorem. Suppose that S == (8 1 ,..., 8m) is a family of similitudes with
scaling factor r < 1 and that X is a nonmempty compact set that is invariant under
S. Then there is a Borel measure J1 on}Rn such that J1(}Rn) = 1, supp(J1) = X, and
for all kEN,
( 11.29)
1
jt = m k L
il,...,ik=l
m
J1i 1 ...ik.
Proof. We shall construct J1 as a measure on X, extending it to }Rn by setting
J1(XC) = o. Pick x E X, let Dx be the point mass at x, and for kEN define
J1k E M(X) by
m
jtk = l k " [Dx]il...i k ,
m
il,...,ik=l
that is, for f E C( X),
f 1 m
! djtk = m k . !(Xil...ik)'
l ,...,k=l
Thus each J1k is a probability measure on X. We claim that the sequence {J1k}
converges vaguely as k 00. Indeed, given f E C(X) and E > 0, there exists
K > 0 such that If(x) - f(y)1 < E whenever x, y E X and Ix - yl < r K (diam X).
Suppose l > k > K. Since Xil...il E X il ." ik and diam XiI ...ik == r k (diam X), we
have
If(Xi1...ik) - f(Xi1...ikik+l...il)1 < E.
Summing over ik+l' . . . , il gives
1!(Xi1'''ik) - m;-k L !(Xil...ikik+l...il) I < 10,
. .
k+l,"',l
and then summing over iI, . . . , ik yields I J f dJ1k - J f dJ1ll < E. Thus the sequence
{J f dJ1k} converges for every f, and the limit defines a positive linear functional on
C(X).
Let J1 be the associated Radon measure, according to the Riesz representation
theorem. Clearly J1(X) == J 1 dJ1 = lim J 1 dJ1k = 1. Also, we have Xi1"'Xk E
Xi"'ik' diamX i1 "' ik = rk(diamX), and X == UX i1 ... ik , so the points Xi1...ik
(k E N) are dense in X; it follows that supp(J1) = X. Also, from the definition of
J1k we have
m
k+l _ 1 '"
J1 - m k
il,...,ik=l
[J1l] i 1 . ..ik
As I 00, J1k+l and J1l both tend vaguely to J1, and composition with similitudes
preserves vague convergence, so (11.29) follows. .
SELF-SIMILARITY AND HAUSDORFF DIMENSION 359
The existence of an invariant measure on the invariant set X requires no special
hypotheses on the similitudes Sj, but in order to be able to compute the Hausdorff
dimension of X, we need to impose an extra condition which (as we shall see
in Theorem 11.33b) guarantees that the sets Sj (X) have negligibly small overlap.
Namely, we require S to possess a separating set: a nonempty bounded open set U
such that
(11.30)
S(U) c U,
Si(U) n Sj(U) == 0 if i =1= j.
The existence of a separating set is more delicate than it might seem at first; the
first condition in (11.30) will fail if U is too small, and the second one will fail if U
is too big. However, all the examples considered above admit separating sets. As
the reader may easily verify, for the Cantor sets C{3 one can take U == (0,1), for the
Sierpinski gasket one can take U to be the interior of the initial triangular region ,
and for the snowflake curve one can take U to be the interior of the triangular region
with vertices (0,0), (1,0), and (, iJ3), i.e., the interior of the convex hull of the
initial figure L.
11.31 Proposition. Suppose that S is a family of similitudes with scaling factor
r < 1 that admits a separating set U. Then there is a unique nonempty compact set
that is invariant under S, namely, n Sk( U ).
Proof. Since S(U) c U and the Sj'S are continuous, we have
U :) S( U ) :) S2 ( U ) :) . . . .
It follows that X == n Sk ( U ) is a compact invariant set for S, and it is nonempty
by Proposition 4.21. If Y is another such set, let d(Y, X) == maxyEY p(y, X) be
the maximum distance from points in Y to X. Since Sj decreases distances by
a factor of r, we have d(Sj(Y), Sj(X)) == rd(Y, X). But Y = U Sj(Y), so
d(Y,X) < maxjd(Sj(Y),X) < rd(Y, X). It follows that d(Y,X) = 0, which
means that Y c X since X and Yare compact. By the same reasoning, X c Y, so
X is the unique compact invariant set. .
11.32 Lemma. Let c, C, 8 be positive numbers. Suppose that {Ua}aEA is a collec-
tion of disjoint open sets in JRn such that each U a contains a ball of radius c8 and
is contained in a ball of radius C8. Then no ball of radius 8 intersects more than
(1 + 2c)nc- n of the sets U a.
Proof. If B is a ball of radius 8 and Bn U a =1= 0, then U a is contained in the ball
concentric with B with radius (1 + 2C)8. Hence, if N of the U a's intersect B, there
are N disjoint balls of radius c8 contained in a ball of radius (1 +2C)8. Adding up their
Lebesgue measures, we see that N(c8)n < [(1 + 2C)8]n, so N < (1 + 2c)n c -n. .
360 MORE MEASURES AND INTEGRALS
11.33 Theorem. Suppose that S = (8 1 ,..., Sm) is a family of similitudes with
scaling factor r < 1 that admits a separating set U, let X be the unique nonempty
compact set that is invariant under S, and let p = 10gl/r m. Then
a. 0 < Hp(X) < 00; in particular, X has Hausdorff dimension p.
b. Hp(Si(X) n 8j(X)) == Of or i =1= j.
Proof. For any kEN we have X = Sk(X) = UX i1 ... ik and diarnX i1 .'. ik =
rk(diarnX), so if 8k = rk(diarnX),
H p ,8k(X) < L(diarnXi1...ik)P = mkrkp(diarnX)P = (diarnX)P.
Since 8k 0 as k 00, it follows that Hp(X) < (diarnX)P < 00.
Next we show that Hp(X) > O. Choose positive numbers c and C so that the
separating set U contains a ball of radius cr- 1 and is contained in a ball of radius C,
and let N = (1 + 2c)n c -n. We shall prove that Hp(X) > N-1 by showing that if
{Ej } 1 is any covering of X by sets of diameter < 1, then E (diarn Ej)P > N-1.
Since any set E with diameter 8 is contained in a (closed) ball of radius 8, it is enough
to show that if X c U Bj where Bj is a ball of radius 8j < 1, then E 8f > N- 1 .
Let J1- be the measure on X given by Theorem 11.28. We claim that if B is any
ball of radius 8 < 1, then J1-(B) < N8 P . The desired conclusion is an immediate
consequence:
1 = J1-(X) < LJ1-(B j ) < NL8f.
To prove the claim, let k be the integer such that r k < 8 < r k - 1 . By (11.29),
m
J1-(B) = m -k L J1-i 1 ...ik (B).
il,...,ik=l
Since X c U by Proposition 11.31, we have sUPP(J1- i l...ik) = Xii."ik C U ii ... ik , so
J1- i l".ik (B) = 0 unless B intersects U il".ik. On the other hand, iteration of (11.30)
shows that the sets U i1 .. .ik are all disjoint, and each of them contains a ball of radius
cr k - 1 > c8 and is contained in a ball of radius Crk < 08. By Lemma 11.32, B can
intersect at most N of the U i1 '.. ik 's. Therefore,
J1-(B) < Nm- k == Nr kp < N8 P ,
as claimed.
Finally, since Sj decreases distances by a factor of r, we have H p (8 j (X))
rPHp(X) = m- 1 H p (X) and hence Hp(X) = E;.n Hp(Sj(X)). But since X =
U;.n Sj(X), this can happen only if Hp(Si(X) n 8j(X)) = 0 for i =1= j. .
11.34 Corollary. The Cantor sets 0/3, the Sierpinski gasket, and the snowflake curve
have Hausdorff dimension 10gl/ /3 2, log2 3 , and log3 4, respectively.
INTEGRATION ON MANIFOLDS 361
Exercises
16. Modify the construction of the snowflake curve by using isosceles triangles rather
than equilateral ones. Tha t is, giv en i < (3 < , let L be the broken line connecting
(0,0) to ({3,0) to (1, y' 4{3 - 1) to (1 - (3, 0) to (1,0). Proceeding inductively,
let Lk be the figure obtained by replacing each line segment in Lk-l by a copy of
L, scaled down by a factor of {3k, and let /3 be the limiting set. (Thus 1/3 is
the ordinary snowflake curve.) Find the family of similitudes under which /3 is
invariant, show that it possesses a separating set, and find the Hausdorff dimension
of /3.
17. Investigate analogues of the Sierpinski gasket constructed from squares, or
higher-dimensional cubes, rather than triangles. (There are various possibilities
here. )
18. Givenn E Nandp E (0,n),constructBorelsetsE l ,E 2 ,E 3 C JRnofHausdorff
dimension p, with the following properties.
a. 0 < Hp(E l ) < 00. (Exercise 15 or Exercise 17 could be used here.)
b. H p (E 2 ) == 00. (Use (a).)
c. Hp(E3) == O. (Use Exercise 14.)
19. The measure J1- in Theorem 11.28 is unique.
11.4 INTEGRATION ON MANIFOLDS
This section is a brief essay on integration on manifolds for the benefit of those
who are familiar with the language of differential geometry and wish to see how
the geometric notions of integration fit into the measure-theoretic framework. The
discussion and the notation will both be quite informal.
Let M be a Coo manifold of dimension m. (We work in the Coo category for con-
venience; C l would actually suffice.) Given a coordinate system x == (Xl, . . . , X m )
on an open set U c M, one can consider Lebesgue measure dx == dXl . . . dXm on
U. This has no intrinsic geometric significance, for if Y == (Yl,'.., Ym) is another
coordinate system on U, by Theorem 2.47 we have dy == I det(8yj8x) I dx where
(8y j 8x) denotes the matrix (8Yij 8xj). However, dy and dx are mutually absolutely
continuous, and the Radon-Nikodym derivative I det( 8y j 8x) I is a Coo function. It
therefore makes sense to define a smooth measure on M to be a Borel measure J1-
which, in any local coordinates x, has the form dJ1- == cjJx dx where cjJx is a nonnega-
tive Coo function. The representations of J1- in different coordinate systems are then
related by
(11.35)
cjJX == I det( 8y j 8x) IcjJY.
(This conveniently sloppy notation is in the same spirit as the formula duj dt
(dujdx)(dxjdt) for the chain rule. More precisely, if y == F(x), then cjJx
I det( 8yj 8x) I (cjJY 0 F).)
362 MORE MEASURES AND INTEGRALS
Equation (11.35) may be interpreted in the language of geometry as follows. The
functions I det ( 8y / 8x ) I are the transition functions for a line bundle on M; a section
of this bundle, called a density on M, is an object that is represented in each local
coordinate system x by a function 4Jx, such that the functions for different coordinate
systems are related by (11.35). In short, smooth measures can be identified with
nonnegative densities on M. More generally, any density 4J defines, at least locally,
a smooth signed or complex measure J1- on M, so the integral JK 4J = f.-L(K) is well
defined for any compact K c M, as is J 14J == J 1 df.-L for any 1 E Cc(M).
Suppose now that M is equipped with a Riemannian metric. In any coordinate
system x, the metric is represented by a positive definite matrix-valued function
gX = (gfj). The matrix gY for another coordinate system is related to gX by
x '" Y 8Yi 8Yj
gij = gkl- 8 _ 8 '
xk Xl
k,l
( 8y ) * ( 8y )
or gX = 8x gY 8x '
so that
det gX = [det ( ) r det gY .
It follows that vi det 9 is a positive density on M canonically associated to the metric
g; it is called the Riemannian volume density on M. In particular, if M is a
submanifold of]Rn (n > m), it inherits a Riemannian structure from the ambient
Euclidean structure. If M is parametrized by 1 : V ]Rn as in 311.2, the metric 9
is given in the coordinates induced by 1 by
g'!'. = '" () ik () ik
7,) 8x' 8x. '
k 7, )
x ( 81 ) * ( 81 )
or 9 = ax ax .
Theorem 11.25 therefore asserts that integration of the Riemannian volume density
gives m-dimensional Hausdorff measure on M, up to the factor "'1m.
These ideas yield an easy construction of a left Haar measure on any Lie group,
that is, any topological group that is a Coo manifold and whose group operations are
COO. Namely, choose an inner product on the tangent space at the identity element
and transport it to every other point by left translation. The result is a left-invariant
Riemannian metric, and the associated volume density defines a left Haar measure.
The most popular things to integrate on manifolds are differential forms. For
our purposes it will suffice to describe a differential m-form on an m-dimensional
manifold M as a section of the line bundle whose transition functions are det( 8y / 8x).
That is, a differential m-form w is given in local coordinates x by a function w X ,
and the function w Y for a different coordinate system is related to W X by W X =
det( 8y / 8x )w Y . (The usual notation is w == W X dXl 1\ . . . 1\ dx m .) Differential m-
forms thus look just like densities if one restricts oneself to coordinate systems whose
Jacobian matrices have positive determinant. If it is possible to do this consistently
on all of M, M is called orientable. In this case, assuming that M is connected, the
coordinate systems on subsets of M fall into two classes such that within each class
one always has det( 8y / 8x) > 0; a choice of one of these classes is an orientation of
NOTES AND REFERENCES 363
M. (In]R3, for example, one speaks of left-handed or right-handed coordinates.) If
M is equipped with an orientation, therefore, differential m-forms may be identified
with densities and as such may be integrated over compact subsets of M.
The notion of density can be generalized. If 0 < B < 1, a O-density on M is a
section of the line bundle whose transition functions are I det( ay / ax) 1°. (Thus, a
I-density is a density, and a O-density is just a smooth function.) Suppose B > 0 and
P == B- 1 . If cP is a B-density, IcPlP is well defined as a nonnegative density and so can
be integrated over M. The set of B-densities cP such that IlcPlip = (J IcPlP)llp < 00
is a normed linear space whose completion LP (M) is called the intrinsic LP space
of M. The duality results of Chapter 6 work in this setting: If cPj is a Bj-density for
j == 1, 2, where B 1 + B 2 == 1, then cP1 cP2 is a density and I J cP1 cP21 < II <P111Pl II <P21Ip2'
where Pj = Oj1, and LPl (M) (LP2 (M))*.
11.5 NOTES AND REFERENCES
3 11.1: The existence and uniqueness of Haar measure were first proved by Haar [59]
and von Neumann [155], respectively, for groups whose topology is second countable;
the general case is due to Weil [158]. Our proof of existence and uniqueness follows
Weil [158] and Loomis [94]. There is another proof, due to H. Cartan, which yields
existence and uniqueness simultaneously and avoids the use of the axiom of choice
(which we invoked via Tychonoff's theorem). This proof, as well as further references
and historical remarks, can be found in Hewitt and Ross [75, 315].
Haar measure is the foundation for harmonic analysis on locally compact groups.
The articles by Graham, Weiss, and Sally in Ash [7] provide a good introduction to
this field; a more extensive treatment can be found in Folland [47].
311.2: Proposition 11.16 is due to Caratheodory [22], and the theory of Hausdorff
measure was developed in Hausdorff [69]. The computation of the constant "'In can
be found in Billingsley [17] or Falconer [39,31.4]. There are other ways of defining
lower-dimensional measures on ]Rn, all of which agree on smooth submanifolds but
sometimes differ on more irregular sets; see Federer [41].
The concept of Hausdorff measure can be generalized. If -X is any strictly increas-
ing continuous function on [0,(0) such that -X(O) = 0, for a subset A of a metric
space X one can define
00 00
H)..,c5(A) = inf{I: A(diamBj) : A c UB j , diamBj < 8}
1 1
and H)..(A) == lim6o H)..,6(A). Thus Ho = H).. where -X(t) == t p . Rogers [119]
contains a systematic treatment of these generalized Hausdorff measures.
311.3: The computation of the Hausdorff dimension of the Cantor sets C /3 goes
back to Hausdorff [69]. The arguments presented here are due to Hutchinson [78];
they can easily be extended to families of similitudes with different scaling factors,
364 MORE MEASURES AND INTEGRALS
that is, S == (8 1 ,..., 8m) where 8j has scaling factor rj < 1. Such a family always
has a unique nonempty compact invariant set X, and if it possesses a separating set,
the Hausdorff dimension of X is the number p such that E r} = 1. See Hutchinson
[78] or Falconer [39, 38.3].
Self-similar sets are among the simplest examples of "fractals." Falconer [39] is
a good reference for the geometric measure theory of fractals; see also Edgar [37],
Falconer [40], and Mandelbrot [96] for other aspects of the theory of fractals.
Continuous curves of Hausdorff dimension > 1 can be constructed from non-
differentiable functions. For example, if f : [a, b] ]R is Holder continuous of
exponent Q, 0 < Q < 1, the graph of f in]R2 can have Hausdorff dimension as large
as 2 - Q, and the range of a sample path of the n-dimensional Wiener process (n > 2)
almost surely has Hausdorff dimension 2. See Falconer [39, 338.2,7].
311.4: The theory of integration of differential forms can be found in a number of
books such as Warner [157] and Loomis and Sternberg [95]; the latter book also has
a discussion of densities.
Bibliography
1. R. A. Adams, Sobolev Spaces, Academic Press, New York (1975).
2. W. J. Adams, The Life and Times of the Central Limit Theorem, Kaedmon, New
York (1974).
3. Weak convergence of linear functionals, Bull. Amer. Math. Soc. 44 (1938), 196.
4. Weak topologies ofnormed linear spaces, Annals of Math. 41 (1940),252-267.
5. S. A. Alvarez, LP arithmetic, Amer. Math. Monthly 99 (1992), 656-662.
6. C. Arzela, Sulle funzioni di linee, Mem. Accad. Sci. 1st. Bologna Cl. Sci. Fis.
Mat. (5) 5 (1895), 55-74.
7. J. M. Ash (ed.), Studies in Harmonic Analysis, Mathematical Association of
America, Washington, D.C. (1976).
8. S. Banach, Sur Ie probleme de la mesure, Fund. Math. 4 (1923), 7-33; also in
Banach's Oeuvres, Vol. I, Polish Scientific Publishers, Warsaw (1967), 66-89.
9. S. Banach, Theorie des Operations Lineaires, Monografje Matematyczne, War-
saw, 1932; also in Banach's Oeuvres, Vol. II, Polish Scientific Publishers, Warsaw
(1967), 13-302.
10. S. Banach and H. Steinhaus, Sur Ie principe de la condensation de singularites,
Fund. Math. 9 (1927), 50-61; also in Banach's Oeuvres, Vol. II, Polish Scientific
Publishers, Warsaw (1967), 365-374.
365
366 BIBLIOGRAPHY
11. S. Banach and A. Tarski, Sur la decomposition des ensembles de points en parties
respectivement congruentes, Fund. Math. 6 (1924), 244-277; also in Banach's
Oeuvres, Vol. I, Polish Scientific Publishers, Warsaw (1967), 118-148.
12. R. G. Bartle, An extension of Egorov's theorem, Amer. Math. Monthly 87 (1980),
628-633.
13. R. G. Bartle, Return to the Riemann integral, Amer. Math. Monthly 103 (1996),
625-632.
14. W. Beckner, Inequalities in Fourier analysis, Annals of Math. 102 (1975), 159-
182.
15. C. Bennett and R. Sharpley, Interpolation of Operators, Academic Press, Boston
(1988).
16. J. Bergh and J. Lofstrom, Interpolation Spaces, Springer-Verlag, Berlin (1976).
17. P. Billingsley, Probability and Measure (2nd ed.), Wiley, New York (1986).
18. C. B. Blyth and P. K. Pathak, A note on easy proofs of Stirling's theorem, Amer.
Math. Monthly 93 (1986), 376-379.
19. N. Bourbaki, Sur les espaces de Banach, C. R. A cad. Sci. Paris 206 (1938),
1701-1704.
20. N. Bourbaki, General Topology (2 vols.), Hermann, Paris, and Addison-Wesley,
Reading, Mass. (1966).
21. A. Brown, An elementary example of a continuous singular function, Amer.
Math. Monthly 76 (1969), 295-297.
22. C. Caratheodory, Vorlesungen iiber Reelle Funktionen, Teubner, Leipzig (1918);
2nd ed. (1927), reprinted by Chelsea, New York (1948).
23. E. Cech, On bicompact spaces, Annals of Math. 38 (1937), 823-844.
24. P. R. Chernoff, A simple proof of Tychonoff's theorem via nets, Amer. Math.
Monthly 99 (1992), 932-934.
25. K. L. Chung, A Course in Probability Theory (2nd ed.), Academic Press, New
York (1974).
26. P. J. Cohen, Set Theory and the Continuum Hypothesis, Benjamin, New York
( 1966).
27. D. L. Cohn, Measure Theory, Birkhauser, Boston (1980).
28. R. Coifman, M. Cwikel, R. Rochberg, Y. Sagher, and G. Weiss, Complex inter-
polation for families of Banach spaces, Harmonic Analysis in Euclidean Spaces
BIBLIOGRAPHY 367
(Proc. Symp. Pure Math., Vol. 35), part 2, American Mathematical Society,
Providence, R.I. (1979), 269-282.
29. P. J. Daniell, A general form of integral, Annals of Math. 19 (1918),279-294.
30. K. M. Davis and Y. C. Chang, Lectures on Bochner-Riesz Means, Cambridge
University Press, Cambridge, U.K. (1987).
31. K. deLeeuw, The Fubini theorem and convolution formula for regular measures,
Math. Scand. 11 (1962), 117-122.
32. J. DePree and C. Swartz, Introduction to Real Analysis, Wiley, New York (1988).
33. J. Dieudonne, History of Functional Analysis, North-Holland, Amsterdam (1981).
34. J. Dugundji, Topology, Allyn and Bacon, Boston (1966); reprinted by
Wm. C. Brown, Dubuque, Iowa (1989).
35. N. Dunford and J. T. Schwartz, Linear Operators (3 vols.), Wiley-Interscience,
New York (1958, 1963, and 1971).
36. H. Dym and H. P. McKean, Fourier Series and Integrals, Academic Press, New
York (1972).
37. G. A. Edgar, Measure, Topology, and Fractal Geometry, Springer-Verlag, New
York (1990).
38. R. Engelking, General Topology (rev. ed.), Heldermann Verlag, Berlin (1989).
39. K. J. Falconer, The Geometry of Fractal Sets, Cambridge University Press, Cam-
bridge, U.K. (1985).
40. K. J. Falconer, Fractal Geometry, Wiley, New York (1990).
41. H. Federer, Geometric Measure Theory, Springer-Verlag, Berlin (1969).
42. C. Fefferman, Pointwise convergence of Fourier series, Annals of Math. 98
(1973),551-571.
43. M. B. Feldman, A proof of Lusin's theorem, Amer. Math. Monthly 88 (1981),
191-192.
44. E. Fischer, Sur Ie convergence en moyenne, C. R. Acad. Sci. Paris 144 (1907),
1022-1024.
45. G. B. Folland, Remainder estimates in Taylor's theorem, Amer. Math. Monthly
97 (1990), 233-235.
46. G. B. Folland, Fourier Analysis and Its Applications, Wadsworth & Brooks/Cole,
Pacific Grove, Cal. (1992).
368 BIBLIOGRAPHY
47. G. B. Folland, A Course in Abstract Harmonic Analysis, CRC Press, Boca Raton,
Fla. (1995).
48. G. B. Folland, Introduction to Partial Differential Equations (2nd ed.), Princeton
University Press, Princeton, N.J. (1995).
49. G. B. Folland, Fundamental solutions for the wave operator, Expos. Math. 15
(1997), 25-52.
50. G. B. Folland and A. Sitaram, The uncertainty principle: a mathematical survey,
J. Fourier Anal. Appl. 3 (1997), 207-238.
51. G. B. Folland and E. M. Stein, Estimates for the 8 b complex and analysis on the
Heisenberg group, Commun. Pure Appl. Math. 27 (1974), 429-522.
52. M. Frechet, Sur quelques points du calcul fonctionnel, Rend. Circ. Mat. Palermo
22 (1906), 1-74.
53. M. Frechet, Sur l'integrale d'une fonctionnelle etendue a un ensemble abstrait,
Bull. Soc. Math. France 43 (1915), 248-265.
54. M. Frechet, Des familIes et fonctions additivies d' ensembles abstraits, Fund.
Math. 5 (1924), 206-251.
55. I. M. Gelfand and G. E. Shilov, Generalized Functions, Academic Press, New
York (1964).
56. K. Godel, What is Cantor's continuum problem?, Amer. Math. Monthly 54 (1947),
515-525; revised and expanded version in P. Benacerraf and H. Putnam (eds.),
Philosophy of Mathematics, Prentice-Hall, Englewood Cliffs, N.J. (1964), 258-
273.
57. R. A. Gordon, The Integrals of Lebesgue, Denjoy, Perron, and Henstock, Amer-
ican Mathematical Society, Providence, R.1. (1994).
58. S. Grabiner, The Tietze extension theorem and the open mapping theorem, Amer.
Math. Monthly 93 (1986), 190-191.
59. A. Haar, Der Massbegriff in der Theorie der kontinuerlichen Gruppen, Annals
of Math. 34 (1933), 147-169; also in Haar's Gesammelte Arbeiten, Akademiai
Kiad6, Budapest (1959), 600-622.
60. H. Hahn, Uber die Multiplikation total-additiver Mengefunktionen, AnnaU
Scuola Norm. Sup. Pisa 2 (1933), 429-452.
61. H. Hahn and A. Rosenthal, Set Functions, University of New Mexico Press,
Albuquerque, N .M. (1948).
62. P. R. Halmos, Measure Theory, Van Nostrand, Princeton, N.J. (1950); reprinted
by Springer-Verlag, New York (1974).
BIBLIOGRAPHY 369
63. P. R. Halmos, Naive Set Theory, Van Nostrand, Princeton, N.J. (1960); reprinted
by Springer-Verlag, New York (1974).
64. G. H. Hardy and J. E. Littlewood, Some properties of fractional integrals I,
Math. Zeit. 27 (1928), 565-606; also in Hardy's Collected Papers, Vol. III,
Oxford University Press, Oxford (1969), 564-607.
65. G. H. Hardy and J. E. Littlewood, A maximal theorem with function-theoretic
applications, Acta Math. 54 (1930), 81-116; also in Hardy's Collected Papers,
Vol. II, Oxford University Press, Oxford (1967), 509-544.
66. G. H. Hardy, J. E. Littlewood, and G. P6lya, Inequalities (2nd ed.), Cambridge
University Press, Cambridge, U.K. (1952).
67. D. G. Hartig, The Riesz representation theorem revisited, Amer. Math. Monthly
90 (1983), 277-280.
68. F. Hausdorff, Grundziige der Mengenlehre, Verlag von Veit, Leipzig (1914);
reprinted by Chelsea, New York (1949).
69. F. Hausdorff, Dimension und ausseres Mass, Math. Annalen 79 (1919),157-179.
70. T. Hawkins, Lebesgue's Theory of Integration, University of Wisconsin Press,
Madison, Wisc. (1970).
71. J. Hennefeld, A nontopological proof of the uniform boundedness theorem, Amer.
Math. Monthly 87 (1980), 217.
72. R. Henstock, The General Theory of Integration, Oxford University Press, Ox-
ford, U.K. (1991).
73. E. Hewitt, On two problems of Urysohn, Annals of Math 47 (1946), 503-509.
74. E. Hewitt and R. E. Hewitt, The Gibbs-Wilbraham phenomenon: an episode in
Fourier analysis, Arch. Rist. Exact Sci. 21 (1979), 129-160.
75. E. Hewitt and K. A. Ross, Abstract Harmonic Analysis, Vol. I, Springer-Verlag,
Berlin (1963).
76. E. Hewitt and K. Stromberg, Real and Abstract Analysis, Springer-Verlag, Berlin
(1965).
77. L. Hormander, The Analysis of Linear Partial Differential Operators, Vol. I,
Springer-Verlag, Berlin (1983).
78. J. E. Hutchinson, Fractals and self-similarity, Indiana U. Math. J. 30 (1981),
713-747.
79. G. W. Johnson, An unsymmetric Fubini theorem, Amer. Math. Monthly 91 (1984),
131-133.
370 BIBLIOGRAPHY
80. S. Kakutani, Concrete representation of abstract (M)-spaces, Annals of Math. 42
(1941),994-1024.
81. S. Kakutani and J. C. Ox toby, Construction of a non-separable invariant extension
of the Lebesgue measure space, Annals of Math. 52 (1950), 580-590.
82. J. L. Kelley, The Tychonoff product theorem implies the axiom of choice, Fund.
Math. 37 (1950), 75-76.
83. J. L. Kelley, General Topology, Van Nostrand, Princeton, N.J. (1955); reprinted
by Springer-Verlag, New York (1975).
84. F. B. Knight, Essentials of Brownian Motion and Diffusion, American Mathe-
matical Society, Providence, R.I. (1981).
85. A. N. Kolmogorov, Grundbegriffe der WahrscheinUchkeitsrechnung, Springer-
Verlag, Berlin (1933); translated as Foundations of the Theory of Probability,
Chelsea, New York (1950).
86. H. Konig, Measure and Integration, Springer-Verlag, Berlin (1997).
87. T. W. Komer, Fourier Analysis, Cambridge University Press, Cambridge, U.K.
(1988).
88. J. Kupka and K. Prikry, The measurability of uncountable unions, Amer. Math.
Monthly 91 (1984), 85-97.
89. A. V. Lair, A Rellich compactness theorem for sets of finite volume, Amer. Math.
Monthly 83 (1976), 350-351.
90. J. Lamperti, Probability (2nd ed.), Wiley, New York (1996).
91. H. Lebesgue, Integrale, longueur, aire, AnnaU Mat. Pura Appl. (3) 7 (1902), 231-
359; also in Lebesgue's Oeuvres Scientifiques, Vol. I, L'Enseignement Mathe-
matique, Geneva (1972), 201-331.
92. H. Lebesgue, Sur I' integration des fonctions discontinues, Ann. Sci. Ecole Norm.
Sup. 27 (1910), 361-450; also in Lebesgue's Oeuvres Scientifiques, vol. II,
L'Enseignement Mathematique, Geneva (1972), 185-274.
93. E. H. Lieb and M. Loss, Analysis, American Mathematical Society, Providence,
R.I. (1997).
94. L. H. Loomis, An Introduction to Abstract Harmonic Analysis, Van Nostrand,
Princeton, N.J. (1953).
95. L. H. Loomis and S. Sternberg, Advanced Calculus, Addison-Wesley, Reading,
Mass. (1968); reprinted by Jones and Bartlett, Boston (1990).
96. B. Mandelbrot, The Fractal Geometry of Nature, Freeman, San Francisco (1983).
BIBLIOGRAPHY 371
97. J. Marcinkiewicz, Sur l'interpolation d'operateurs, C. R. A cad. Sci. Paris 208
(1939), 1272-1273.
98. A. Markov, On mean values and exterior densities [Russian], Mat. Sbornik 4(46)
(1938), 165-191.
99. R. M. McLeod, The Generalized Riemann Integral, Mathematical Association
of America, Washington, D.C. (1980).
100. A. G. Miamee, The inclusion LP(J-L) C Lq(v), Amer. Math. Monthly 98 (1991),
342-345.
101. E. H. Moore and H. L. Smith, a general theory of limits, Amer. J. Math. 44
(1922), 102-121.
102. J. Nagata, Modem General Topology (2nd rev. ed.), North-Holland, Amsterdam
(1985).
103. E. Nelson, Regular probability measures on function spaces, Annals of Math. 69
(1959), 630-643.
104. E. Nelson, Feynman integrals and the Schrodinger equation, J. Math. Phys. 5
(1964), 332-343.
105. E. Nelson, Dynamical Theories of Brownian Motion, Princeton University Press,
Princeton, N.J. (1967).
106. D. J. Newman, Fourier uniqueness via complex variables, Amer. Math. Monthly
81 (1974), 379-380.
107. O. Nikodym, Sur une generalisation des integrales de M. J. Radon, Fund. Math.
15 (1930),131-179.
108. W. F. Pfeffer, Integrals and Measures, Marcel Dekker, New York (1977).
109. W. F. Pfeffer, The Riemann Approach to Integration, Cambridge University
Press, London (1993).
110. M. Plancherel, Contribution a l'etude de la representation d'une fonction arbi-
traire par des integrales definies, Rend. Circ. Mat. Palermo 30 (1910), 289-335.
111. J. Radon, Theorie und Anwendungen der absolut additiv Mengenfunktionen,
S.-B. Math.-Natur. Kl. Kais. Akad. Wiss. Wien 122.IIa (1913), 1295-1438; also
in Radon's Collected Works, vol. I, Birkhtiuser, Basel (1987), 45-188.
112. M. Reed and B. Simon, Methods of Modem Mathematical Physics I: Functional
Analysis (2nd ed.), Academic Press, New York (1980).
113. B. Riemann, Uber die Hypothesen, welche der Geometrie zu Grunde liegen, in
Riemann's Gesammelte Mathematische Werke, Teubner, Leipzig (1876), 254-
269.
372 BIBLIOGRAPHY
114. F. Riesz, Sur les systemes orthogonaux de fonctions, C. R. A cad. Sci. Paris 144
(1907), 615-619; also in Riesz's Oeuvres Completes, Vol. I, Akaemiai Kiado,
Budapest (1960),378-381.
115. F. Riesz, Sur une espece de geometrie analytique des systemes de fonctions
sommables, C. R. Acad. Aci. Paris 144 (1907), 1409-1411; also in Riesz's
Oeuvres Completes, Vol. I, Akaemiai Kiado, Budapest (1960),386-388.
116. F. Riesz, Sur les operations fonctionnelles lineaires, C. R. Acad Sci. Paris 149
(1909), 974-977; also in Riesz's Oeuvres Completes, Vol. I, Akaemiai Kiado,
Budapest (1960), 400-402.
117. F. Riesz, Untersuchungen tiber Systeme integrierbarer Funktionen, Math. An-
nalen 69 (1910),449-497; also in Riesz's Oeuvres Completes, Vol. I, Akaemiai
Kiado, Budapest (1960),441-489.
118. M. Riesz, Sur les maxima des formes bilineaires et sur les fonctionnelles
lineaires, Acta Math. 49 (1926), 465-497; also in Riesz's Collected Papers,
Springer-Verlag, Berlin (1988), 377-409.
119. C. A. Rogers, Hausdorff Measures, Cambridge University Press, Cambridge,
U.K. (1970).
120. J. L. Romero, When is £P(J-l) contained in £q(J-l)?, Amer. Math. Monthly 90
(1983), 203-206.
121. H. L. Royden, Real Analysis (3rd ed.), Macmillan, New York (1988).
122. L. A. Rubel, A complex-variables proof of Holder's inequality, Proc. Amer.
Math. Soc. 15 (1964), 999.
123. W. Rudin, Lebesgue's first theorem, in L. Nachbin (ed.), Mathematical Analysis
andApplications (Advances in Math. Supplementary Studies, vol. 7B), Academic
Press, New York (1981), 741-747.
124. W. Rudin, Well-distributed measurable sets, Amer. Math. Monthly 90 (1983),
41-42.
125. W. Rudin, Real and Complex Analysis (3rd ed.), McGraw-Hill, New York (1987).
126. W. Rudin, Functional Analysis (2nd ed.), McGraw-Hill, New York (1991).
127. S. Saeki, A proof of the existence of infinite product probability measures, Amer.
Math. Monthly 103 (1992), 682-683.
128. S. Saks, Theory of the Integral (2nd ed.), Monografje Matematyczne, Warsaw
(1937); reprinted by Hafner, New York (1938).
129. I. Schur, Bemerkungen zur Theorie der beschrankten Bilinearformen mit un-
endlich vielen Veranderlichen, J. Reine Angew. Math. 140 (1911), 1-28; also
BIBLIOGRAPHY 373
in Schur's Gesammelte Abhandlungen, Vol. I, Springer-Verlag, Berlin (1973),
464-491.
130. J. Schwartz, A note on the space L;, Proc. Amer. Math. Soc. 2 (1951), 270-275.
131. J. Schwartz, The formula for change of variables in a multiple integral, Amer.
Math. Monthly 61 (1954), 81-85.
132. L. Schwartz, Theorie des Distributions (2nd ed.), Hermann, Paris (1966).
133. J. Serrin and D. E. Varberg, A general chain rule for derivatives and the change
of variables formula for the Lebesgue integral, Amer. Math. Monthly 76 (1969),
514-520.
134. W. Sierpinski, Sur un probleme concernant les ensembles mesurables superfi-
ciellement, Fund. Math. 1 (1920), 112-115; also in Sierpinski's Oeuvres Choisis,
vol. II, Polish Scientific Publishers, Warsaw (1975), 328-330.
135. R. M. Smullyan and M. Fitting, Set Theory and the Continuum Problem, Oxford
University Press, Oxford, U.K. (1996).
136. S. L. Sobolev, A new method for solving the Cauchy problem for normal linear
hyperbolic equations [Russian], Mat. Sbornik 1(43) (1936), 39-72.
137. S. L. Sobolev, On a theorem of functional analysis [Russian], Mat. Sbomik 4(46)
(1938),471-496.
138. R. M. Solovay, A model of set theory in which every set of reals is Lebesgue
measurable, Annals of Math. 92 (1970), 1-56.
139. S. M. Srivastava, A Course in Borel Sets, Springer-Verlag, New York (1998).
140. E. M. Stein, Singular Integrals and Differentiability Properties of Functions,
Princeton University Press, Princeton, N.J. (1970).
141. E. M. Stein, Harmonic Analysis, Princeton University Press, Princeton, N.J.
(1993).
142. E. M. Stein and G. Weiss, Introduction to Fourier Analysis on Euclidean Spaces,
Princeton University Press, Princeton, N.J. (1971).
143. H. Steinhaus, Additive und stetige Funktionaloperationen, Math. Zeit. 5 (1919),
186-221; also pp. 252-288 in Steinhaus's Selected Papers, Polish Scientific
Publishers, Warsaw (1985).
144. M. H. Stone, Applications of the theory of Boolean rings to general topology,
Trans. Amer. Math. Soc. 41 (1937), 375-481.
145. M. H. Stone, The generalized Weierstrass approximation theorem, Math. Mag.
21 (1948), 167-184 and 237-254; reprinted in R. C. Buck (ed.), Studies in Modem
374 BIBLIOGRAPHY
Analysis, Mathematical Association of America, Washington, D.C. (1962), 30-
87.
146. K. Stromberg, The Banach- Tarski paradox, Amer. Math. Monthly 86 (1979),
151-161.
147. M. E. Taylor, Pseudodifferential Operators, Princeton University Press, Prince-
ton, N.J. (1981).
148. H. J. ter Horst, Riemann-Stieltjes and Lebesgue-Stieltjes integrability, Amer.
Math. Monthly 91 (1984),551-559.
149. G. O. Thorin, An extension of a convexity theorem due to M. Riesz, Kungl.
Fysiografiska Saellskapet i Lund Forhaendlinger 8 (1939), No. 14.
150. F. Treves, Topological Vector Spaces, Distributions, and Kernels, Academic
Press, New York (1967).
151. A. Tychonoff, Uber die topologische Erweiterung von Raiimen, Math. Annalen
102 (1929), 544-561.
152. P. Urysohn, Ober die Machtigkeit der zusammenhangenden Mengen, Math.
Annalen 94 (1925), 262-295.
153. P. Urysohn, Zum Metrisationsproblem, Math. Annalen 94 (1925),309-315.
154. J. von Neumann, Mathematische Begriindung der Quantenmechanik, Gottinger
Nachr. (1927), 1-57; also in von Neumann's Collected Works, Vol. I, Pergamon
Press, New York (1961), 151-207.
155. J. von Neumann, The uniqueness of Haar's measure, Mat. Sbomik 1(43) (1936),
721-734; also in von Neumann's Collected Wr)rks, Vol. IV, Pergamon Press, New
York (1962), 91-104.
156. L. E. Ward, A weak Tychonoff theorem and the axiom of choice, Proc. Amer.
Math. Soc. 13 (1962), 757-758.
157. F. W. Warner, Foundations of Differentiable Manifolds and Lie Groups, Scott
Foresman, Glenview, Ill. (1971); reprinted by Springer-Verlag, New York (1983).
158. A. Weil, L']ntegration dans les Groupes Topologiques et ses Applications, Her-
mann, Paris (1940).
159. N. Wiener, Differential-space, J. Math. and Phys. 2 (1923),131-174; also in
Wiener's Collected Works, Vol. I, MIT Press, Cambridge, Mass. (1976),455-598.
160. N. Wiener, The average value of a functional, Proc. London Math. Soc. 22 (1924),
454-467; also in Wiener's Collected Works, Vol. I, MIT Press, Cambridge, Mass.
(1976),499-512.
BIBLIOGRAPHY 375
161. N. Wiener, The ergodic theorem, Duke Math. J. 5 (1939),1-18; also in Wiener's
Collected Works, Vol. I, MIT Press, Cambridge, Mass. (1976),672-689.
162. C. S. Wong, A note on the central limit theorem, Amer. Math. Monthly 84 (1977),
472.
163. K. Yosida, Functional AnaZysis (6th ed.), Springer-Verlag, New York (1980).
164. W. H. Young, On the multiplication of successions of Fourier constants, Proc.
RoyaZ Soc. (A) 87 (1912), 331-339.
165. A. C. Zaanen, Continuity of measurable functions, Amer. Math. Monthly 93
(1986), 128-130.
166. A. Zygmund, On a theorem of Marcinkiewicz concerning interpolation of oper-
ators, J. Math. Pures Appl. (9) 35 (1956), 223-248.
167. A. Zygmund, Trigonometric Series (2 vols., reprinted in 1 voL), Cambridge
University Press, Cambridge, U.K. (1968).
Index of Notation
For the basic notation used throughout the book for sets, mappings, numbers, and
metric and topological spaces, see Chapter O. Notation used only in the section in
which it is introduced is, for the most part, not listed here.
Analysis on Euclidean space: x. y (dot product), 235. aa, x a , aI, lal
(multi-index notation), 236. Tn (n-torus), 238.
Functions and operations on functions: fI (positive and negative
parts), 46. sgn, 46. XE (characteristic function), 46. fx, fY (sections), 65. r,
58. supp(f) (support), 132, 284. Aj (distribution function), 197. Tyf (translation),
-.
238. f * 9 (convolution), 2 39 , 285. cPt (dilation), 242. f,:f f (Fourier transform),
248,249,295. (F, cP), 283. f (reflection), 283.
Integrals: The basic notation is developed in 32.2. I f(x) dx (Lebesgue inte-
gral), 57,70. II f dJ-l dv (iterated integral), 67. I 9 dF (Stieltjes integral), 107.
Measures: J-lF (Lebesgue-Stieltjes measure), 35. m, m n (Lebesgue measure),
37, 70. J-l x v (product), 64. a (surface measure on sphere), 78. vI (positive and
negative variations), 87. Ivl (total variation), 87, 93. J-l-L v (mutual singularity), 87.
J-l « v (absolute continuity), 88. f dJ-l, 89. dJ-l/ dv (Radon-Nikodym derivative), 91.
supp(J-l) (support), 215. J-l x v (Radon product), 227. J-l * v (convolution), 270.
Norms and seminorms: Ilfllu (uniform norm), 121. IITII (operator norm),
154. IIfll p (LP norm), 181. IIflloo (LOO norm), 184. [f]p (weak LP quasi-norm), 198.
1IJ-l1I (measure norm), 222. IIcPIICN,a) (Schwartz space norm), 237. IIfllCs) (Sobolev
norm), 302.
377
378 INDEX OF NOTATION
Probability theory: E(X) (expectation), 314. a 2 (X) (variance), 314. P<jJ
(image measure, distribution), 314. lJ2 (normal distribution), 325.
Sets: Fa, Fa8, G 8 , G 8a , 22. Ex, EY (sections), 65.
a-algebras: M(£) (a-algebra generated by G), 22. x (Borel sets), 22.
Q9cxEA Mcx, M 0 N (products), 22. 'c,,Cn (Lebesgue measurable sets), 37, 70.
(Baire sets), 215.
Spaces of functions, measures, etc.: L+,49. L1,54, 181. Lfoc,96.
BV, 102. N BV, 103. C(X, Y), 119. B(X, JR), 121. BC(X, JR), 121. B(X), 121.
C(X), 121. BC(X), 121. Cc(X), 132. Co(X), 132. L(X, ), 154. X*, 157. L2,
172,181. [2, 173, 181. LP, 181, 184. [P, 181. LOC>, 184. weak LP, 198. M(X), 222.
C k ,235. COC>,235. C,235. S,237.1)',282. £',291. C8',293. H s ,301. H;oc,
306.
A
Abel mean, 261
Abel summation, 261
Absolute continuity
of a function, 105
of a measure, 88
Absolute convergence, 152
Accumulation point, 114
Adjoint, 160, 177
A.e., 26
Alaoglu's theorem, 169
Alexandroff compactification, 132
Algebra
Banach, 154
of functions, 139
of sets, 21
Almost every(where), 26
Almost sure(ly), 314
Almost uniform convergence, 62
Approximate identity, 245
Arcwise connected set, 124
Arzela-Ascoli theorem, 137
A.s., 314
Axiom
of choice, 6
of countability, 116
of separation, 116
Index
B
Baire category theorem, 161
Baire classes, 83
Baire set, 215
Ball, 13
Banach algebra, 154
Banach space, 152
Banach- Tarski paradox, 20
Base for a topology, 115
Bessel's inequality, 175
Bijective mapping, 4
Binomial distribution, 320
Bochner integral, 179
Bochner-Riesz means, 279
Bolzano- Weierstrass property, 15
Borel isomorphism, 83
Borel measurable function, 44
Borel measure 33
Borel set, 22
Borel u-algebra, 22
Borel's normal number theorem, 330
Borel-Cantelli lemma, 321
Boundary, 114
Bounded linear map, 153
Bounded set, 15
Bounded variation, 102
379
380 INDEX
c
Cantor function, 39
Cantor set, 38
Cantor-Lebesgue function, 39
Caratheodory's theorem, 29
Cartesian product, 3-4
Cauchy net, 167
Cauchy sequence, 14
in measure, 61
Cauchy-Riemann equation, 308
Central limit theorem, 326
Cesaro mean, 262
Cesaro summation, 262
Characteristic function, 46
of a probability distribution, 314
Chebyshev's inequality, 193
Chi-square distribution, 320
Closed graph theorem, 163
Closed linear map, 163
Closed set, 13, 114
Closure, 13, 114
Closure operator, 119
Cluster point, 118, 126
Coarser topology, 114
Cofinal net, 127
Cofinite topology, 113
Commutator subgroup, 346
Compact set, 16, 128
Compact space, 128
countably, 130
locally, 131
sequentially, 130
Compactification, 144
one-point, 132
Stone-Cech, 144
Compactly supported function, 132
Complement, 3
Complete measure, 26
Complete metric space, 14
Complete orthonormal set, 175
Complete topological vector space, 167
Completely regular algebra, 146
Completely regular space, 123
Completion
of a measure, 27
of a normed vector space, 159
of a a-algebra, 27
Complex measure, 93
Composition, 3
Condensation of singularities, 165
Conditional expectation, 93
Conjugate exponent, 183
Connected component, 119
Connected set, 118
Constant-coefficient operator, 273
Content, 72
Continuity, 14, 119
absolute, 88, 105
Holder, 138
Lipschitz, 108
of linear maps on C ' 282
of measures, 26
uniform, 14, 238, 340
Continuous measure, 106
Continuum, 8
Continuum hypothesis, 17
Convergence
absolute, 152
almost uniform, 62
in C, 282
in L1, 54
in measure, 61
in probability, 314
of a filter, 147
of a net, 126
ofasequenc 14, 116
of a series, 152
Convex function, 109
Convolution
of distributions, 285, 292
of functions, 239
of measures, 270
Coordinate, 4
Coordinate map, 4
Countable additivity, 24
Countable ordinals, 10
Countable set, 7
Countably compact space, 130
Counting measure, 25
Cover, 15
Cube, 71, 143
D
Daniell integral, 81
Decomposable measure, 92
Decreasing function, 12
Decreasing rearrangement, 199
DeMorgan's laws, 3
Dense set, 13, 114
Density, 362
of a set, 100
Riemannian volume, 362
Denumerable set, 7
Derivative
LP, 246
of a distribution, 284
Radon-Nikodym, 91
Diameter, 15
Diffeomorphism, 74
Difference of sets, 3
Differential operator, 273
Dirac measure, 25
Directed set, 125
Dirichlet kernel, 264
Dirichlet problem, 274
Disconnected set, 118
Discrete, 113
Discrete measure, 106
Discrete topology, 113
Disjoint sets, 2
Distribution function, 33,197,315
Distribution, 282
homogeneous, 289
joint, 315
of a random variable, 315
periodic, 297
tempered, 293
Domain, 4
Dominated convergence theorem, 54
Dual space, 157
E
Egoroff's theorem, 62
Elementary family, 23
Elliptic differential operator, 307, 311
Elliptic regularity theorem, 307
Embedding, 120
Entropy, 325
Equicontinuity, 137
Equivalence class, 3
Equivalence relation, 3
Equivalent metric, 16
Equivalent norm, 152
Essential range, 187
Essential supremum, 184
Event, 314
Eventually, 126
Expectation, 314
conditional, 93
Extended integrable function, 86
Extended real number system, 10
F
Fa and Fa6 sets, 22
Fatou's lemma, 52
Fejer kernel, 269
Field of sets, 21
Filter, 147
Finer topology, 114
Finite intersection property, 128
Finite measure, 25
Finite signed measure, 88
Finitely additive measure, 25
First category, 161
First countable space, 116
First uncountable ordinal, 10
Fourier integral, 253
INDEX 381
Fourier inversion theorem, 251
Fourier series, 248
Fourier transform, 248-249
of a measure, 272
of a tempered distribution, 295
Fourier-Stieltjes transform, 272
Fractional integral, 77
Frequently, 126
Frechetspace,167
Fubini's theorem, 67-68, 229
Fubini-Tonelli theorem, 67-68, 229
Function, 3
Fundamental theorem of calculus, 106
G
G 6 and G 6a sets, 22
Gamma distribution, 320
Gamma function, 58
Gauge, 82
Gauss kernel, 260
Gaussian distribution, 325
Generalized Cantor set, "39
Gibbs phenomenon, 268
Gram-Schmidt process, 175
Graph of a linear map, 162
H
H-interval, 33
Haar measure, 341
Hahn decomposition, 87
Hahn decomposition theorem, 86
Hahn-Banach theorem, 157-158
Hardy's inequalities, 196
Hardy-Littlewood maximal function, 96
Hausdorff dimension, 351
Hausdorff maximal principle, 5
Hausdorff measure, 350
Hausdorff space, 117
Hausdorff- Young inequality, 248
Heat equation, 275
Heine-Borel property, 15
Heisenberg's inequality, 255
Henstock-Kurzweil integral, 82
Hermite function, 256
Hermite operator, 256
Hermite polynomial, 257
Hilbert space, 172
Hilbert's inequality, 196
Homeomorphism, 119
Homogeneous distribution, 289
Hull of an ideal, 142
Holder continuity, 138
Holder's inequality, 182, 196
382 INDEX
I
Ideal in an algebra, 142
Identically distributed random variables, 315
Iff, 1
Image, 3
Image measure, 314
Increasing function, 12
Independent events, 315
Independent random variables, 315
Indicator function, 46
Indiscrete topology, 113
Inequality
Bessel's, 175
Chebyshev ' s, 193
Hardy's, 196
Hausdorff- Young, 248
Heisenberg's, 255
Hilbert's, 196
Holder's, 182, 196
Jensen's, 109
Kolmogorov's, 322
Minkowski's, 183, 194
Schwarz, 172
triangle, 151
Wirtinger's, 254
Young's, 240-241
Infimum, 9-10
Initial segment, 9
Injective mapping, 4
Inner measure, 32
Inner product, 171
Inner regular measure, 212
Integrable function, 53
weakly, 179
extended, 86
locally, 95
Integral
Bochner, 179
Daniell, 81
fractional, 77
Henstock-Kurzweil,82
Lebesgue, 56
Lebesgue-Stieltjes, 107
of a complex function, 53
of a nonnegative function, 50
of a real function, 53
of a simple function, 49
Riemann, 57
Interior, 13, 114
Invariant set, 355
Inverse, 4
Inverse image, 3
Invertible linear map, 154
Isometry, 154
Isomorphism
Borel, 83
linear, 154
order, 5
unitary, 176
J
Jensen's inequality, 109
Joint distribution, 315
Jordan content, 72
Jordan decomposition
of a function, 103
of a signed measure, 87
Jordan decomposition theorem, 87
K
Kernel of a closed set, 142
Kolmogorov's inequality, 322
Krein extension theorem, 161
L
LP derivative, 246
LP norm, 181, 184
LP space, 181, 184
intrinsic, 363
weak, 198
Laplacian, 273
Lattice of functions, 139
Law of large numbers
strong, 322-323
weak, 321
Law of the iterated logarithm, 336
LCH space, 131
Lebesgue decomposition, 91
Lebesgue differentiation theorem, 98
Lebesgue integral, 56
Lebesgue measurable function, 44
Lebesgue measurable set, 37, 70
Lebesgue measure, 37, 70
Lebesgue set, 97
Lebesgue-Radon-Nikodym theorem, 90, 93
Lebesgue-Stieltjes integral, 107
Lebesgue-Stieltjes measure, 35
Left continuous function, 12
Left-invariant measure, 341
Lemma
Borel-Cantelli, 321
Fatou's, 52
monotone class, 66
Riemann- Lebesgue, 249
three lines, 200
Urysohn's, 122, 131,245
Weyl's,308
Zorn's, 5
Limit inferior, 2, 11
Limit of a net, 126
Limit superior, 2, 11
Linear functional, 157
positive, 211
Linear ordering, 5
Liouville's theorem, 300
Lipschitz continuity, 108
Localized Sobolev space, 306
Locally compact group, 341
Locally compact space, 131
Locally convex space, 165
Locally finite cover, 135
Locally integrable function, 95
Locally measurable set, 28
Locally null set, 192
Lower bound, 5
Lower semicontinuous function, 218
LSC function, 218
Lusin's theorem, 64, 217
M
Map, 3
Mapping,_ 3
Marcinkiewicz interpolation theorem, 202
Maximal element, 5
Maximal function, 96, 246
Maximal theorem, 96
Meager set, 161
Mean ergodic theorem, 178
Mean, 314-315
Measurable function, 44
Measurable mapping, 43
Measurable set, 25
Lebesgue, 37, 70
locally, 28
with respect to an outer measure, 29
Measurable space, 25
Measure, 24
Borel, 33
complete, 26
complex, 93
continuous, 106
counting, 25
decomposable, 92
dicrete, 106
Dirac, 25
finitely additive, 25
Hausdorff, 350
inner, 32
inner regular, 212
Lebesgue, 37, 70
Lebesgue-Stieltjes, 35
outer, 28
outer regular, 212
positive, 85
Radon, 212
regular, 99, 212
INDEX 383
semifinite, 25
a-finite, 25
signed, 85
singular, 87
smooth, 361
Measure space, 25
Metric, 13
Metric outer measure, 349
Metric space, 13
Minimal element, 5
Minkowski's inequality, 183
for integrals, 194
Modular function, 346
Moment convergence theorem, 320
Monotone class, 65
Monotone class lemma, 66
Monotone convergence theorem, 50
Monotone function, 12
Monotonicity of measures, 25
Multi-index, 236
Mutually singular measures, 87
N
Negative part of a function, 46
Negative set, 86
Negative variation
of a function, 103
of a signed measure, 87
Neighborhood, 114
Neighborhood base, 114
Net, 125
Norm, 152
LP, 181, 184
operator, 154
product, 153
quotient, 153
uniform, 121
Norm topology, 152
Normal distribution, 325
Normal number, 330
Normal space, 117
Normed linear space, 152
Normed vector space, 152
Nowhere dense set, 13, 114
Null set, 26, 86
locally, 192
o
One-point compactification, 132
Open map, 162
Open mapping theorem, 162
Open set, 12-13, 114
Operator norm, 154
Order isomorphism, 5
Order topology, 118
Ordinal, 10
384 INDEX
Orientable manifold, 362
Orientation, 362
Orthogonal projection, 177
Orthogonal set, 173
Orthonormal basis, 176
Orthonormal set, 175
Outer measure, 28
metric, 349
Outer regular measure, 212
p
Paracompact space, 135
Parallelogram law, 173
Parametrization, 352
Parseval's identity, 175
Partial ordering, 4
Partition
of an interval, 56
of unity, 134
tagged, 82
Periodic distribution, 297
Periodic function, 238
Plancherel theorem, 252
Point mass, 25
Pointwise bounded family, 137
Poisson distribution, 320
Poisson kernel, 260, 262
Poisson summation formula, 254
Polar coordinates, 78
Polar decomposition, 46
Positive definite function, 272
Positive linear functional, 211
Positive measure, 85
Positive part of a function, 46
Positive set, 86
Positive variation
of a function, 103
of a signed measure, 87
Pre-Hilbert space, 172
Precompact set, 128
Predecesor, 9
Premeasure, 30
Principal symbol, 306
Probability measure, 313
Product measure, 64
Product metric, 13
Product norm, 153
Product 0" -algebra, 22
Product topology, 120
Projection, 4
orthogonal, 177
Proper map, 135
Pythagorean theorem, 173
Q
Quotient norm, 153
Quotient space, 153
Quotient topology, 124
R
Radon measure, 212
complex, 222
Radon product, 227
Radon-Nikodym derivative, 91
Radon-Nikodym theorem, 91
Random variable, 314
Range, 4
Real-analytic function, 263
Rectangle, 64
Refinement of a cover, 135
Reflexive space, 159
Regular measure, 99, 212
Regular space, 117
Relation, 3
Relative topology, 114
Rellich's theorem, 305
Residual set, 161
Reverse inclusion, 125
Riemann integrable function, 57
Riemann integral, 57
Riemann-Lebesgue lemma, 249
Riemannian volume density, 362
Riesz representation theorem, 212, 223
Riesz- Thorin interpolation theorem, 200
Right continuous function, 12
Right-invariant measure, 341
Ring of sets, 24
s
Sample mean, 325
Sample space, 314
Sample variance, 325
Sampling theorem, 255
Saturated measure, 28
Saturation of a measure, 28
Scalar product, 171
Schroder-Bernstein theorem, 7
Schwartz space, 236
Schwarz inequality, 172
Second category, 161
Second countable space, 116
Section of a set or function, 65
Semifinite measure, 25
Semifinite part, 27
Semi norm, 151
Separable space, 14, 116
Separating set, 359
Separation
of points, 139
of points and closed sets, 143
Sequence, 4
Sequentially compact space, 130
Shannon's theorem, 324
Shrink nicely, 98
Sides of a rectangle, 70
Sierpinski gasket, 356
a-algebra, 21
Borel, 22
generated by a family of functions, 44
generated by a family of sets, 22
of countable or co-countable sets, 21
product, 22
a -compact space, 133
a-field, 21
a-finite measure, 25
a-finite set, 25
a-finite signed measure, 88
a-ring, 24
Signed measure, 85
Similitude, 355
Simple function, 46
Singular measure, 87
Slowly increasing function, 294
Smooth measure, 361
Snowflake curve, 356
Sobolev embedding theorem, 303, 308
Sobolev space, 301
localized, 306
Standard deviation, 314
Standard normal distribution, 325
Standard representation of a simple function, 46
Stirling's formula, 327
Stone-Cech compactification, 144
Stone- Weierstrass theorem, 139, 141
Strong law of large numbers, 322-323
Strong operator topology, 169
Strong type, 202
Stronger topology, 114
Subadditivity, 25
Subbase for a topology, 114
Sublinear functional, 157
Sublinear map, 202
Submanifold, 351
Subnet, 126
Subordination, 134
Subsequence, 4
Subspace of a vector space, 151
Support
of a distribution, 284
of a function, 132
of a measure, 215
Supremum, 9-10
Surjective mapping, 4
Symbol, 273
principal, 306
Symmetric difference, 3
Symmetric neighborhood, 339
INDEX 385
T
To, . . . , T4 space, 116, 123
Tagged partition, 82
Tempered distribution, 293
Tempered function, 293
Theorem
Alaoglu's, 169
Arzela-Ascoli, 137
Baire category, 161
Caratheodory's, 29
central limit, 326
closed graph, 163
dominated convergence, 54
Egoroff's,62
Fourier inversion, 251
Fubini- Tonelli, 67-68, 229
Hahn decomposition, 86
Hahn-Banach, 157-158
Jordan decomposition, 87
Krein extension, 161
Lebesgue differentiation, 98
Lebesgue-Radon-Nikodym, 90, 93
Liouville's, 300
Lusin's, 64, 217
Marcinkiewicz interpolation, 202
maximal, 96
moment convergence, 320
monotone convergence, 50
open mapping, 162
Plancherel, 252
Pythagorean, 173
Radon-Nikodym,91
Rellich's, 305
Riesz representation, 212, 223
Riesz- Thorin interpolation, 200
sampling, 255
Schroder-Bernstein, 7
Shannon's, 324
Sobolev embedding, 303, 308
Stone-Weierstrass, 139, 141
Tietze extension, 122, 131
Tychonoff's, 136
Urysohn metrization, 145
Vitali convergence, 187
Vitali covering, 110
Weierstrass approximation, 141, 318
Three lines lemma, 200
Tietze extension theorem, 122, 131
Tonelli's theorem, 67-68, 229
Topological group, 339
Topological space, 113
Topological vector space, 165
Topology, 113
cofinite, 113
generated by a family of sets, 114
386 INDEX
indiscrete, 113
norm, 152
of uniform convergence, 133
of uniform convergence on compact sets, 133
product, 120
quotient, 124
relative, 114
strong operator, 169
trivial, 113
vague, 223
weak operator, 169
weak, 120, 168
weak*, 169
Zariski, 117
Torus, 238
Total ordering, 5
Total variation
of a complex measure, 93
of a function, 102
of a signed measure, 87
Totally bounded set, 15
Transfinite induction, 9
Transpose, 160
Triangle inequality, 151
Trivial topology, 113
Tychonoff space, 123
Tychonoff's theorem, 136
u
Uniform boundedness principle, 163
Uniform continuity, 238, 340
Uniform integrability, 92
Uniform norm, 121
Unimodular group, 346
Unitary map, 176
Upper bound, 5
Upper semi continuous function, 218
U rysohn metrization theorem, 145
Urysohn's lemma, 122, 131, 245
USC function, 218
v
Vague topology, 223
Vanish at infinity, 132
Variance, 314-315
Vitali convergence theorem, 187
Vitali covering theorem, 110
w
Wave equation, 275
Weak convergence, 169
Weak LP, 198
Weak law of large numbers, 321
Weak operator topology, 169
Weak topology, 120, 168
Weak type, 202
Weak* topology, 169
Weaker topology, 114
Weierstrass approximation theorem, 141, 318
Weierstrass kernel, 260
Well ordering principle, 5
Well ordering, 5
Weyl's lemma, 308
Wiener process, 332
abstract, 331
Wirtinger's inequality, 254
y
Young's inequality, 240-241
Z
Zariski topology, 117
Zorn's lemma, 5
PURE AND APPLIED MATHEMATICS
A Wiley-Interscience Series of Texts, Monographs, and Tracts
Founded by RICHARD COURANT
Editor Emeritus: PETER HILTON and HARRY HOCHST ADT
Editors: MYRON B. ALLEN III, DAVID A. COX, PETER LAX,
JOHN TOLAND
ADAMEK, HERRLICH, and STRECKER--Abstract and Concrete Catetories
ADAMOWICZ and ZBIERSKI-Logic of Mathematics
AKIVIS and GOLDBERG-Conformal Differential Geometry and Its Generalizations
ALLEN and ISAACSON-Numerical Analysis for Applied Science
* AR TIN-Geometric Algebra
AZIZOV and IOKHVIDOV-Linear Operators in Spaces with an Indefinite Metric
BERMAN, NEUMANN, and STERN-Nonnegative Matrices in Dynamic Systems
BOY ARINTSEV-Methods of Solving Singular Systems of Ordinary Differential
Equations
BURK.-Lebesgue Measure and Integration: An Introduction
*CARTER-Finite Groups of Lie Type
CASTILLO, COBO, mBETE and PRUNEDA-Orthogonal Sets and Polar Methods in
Linear Algebra: Applications to Matrix Calculations, Systems of Equations,
Inequalities, and Linear Programming
CHA TELIN-Eigenvalues of Matrices
CLARK-Mathematical Bioeconomics: The Optimal Management of Renewable
Resources, Second Edition
COX-Primes of the Form x 2 + ny2: Fermat, Class Field Theory, and Complex
Multiplication
*CURTIS and REINER-Representation Theory of Finite Groups and Associative Algebras
*CURTIS and REINER-Methods of Representation Theory: With Applications to Finite
Groups and Orders, Volume I
CURTIS and REINER-Methods of Representation Theory: With Applications to Finite
Groups and Orders, Volume II
*DUNFORD and SCHWARTZ-Linear Operators
Part I-General Theory
Part 2-Spectral Theory, Self Adjoint Operators in
Hilbert Space
Part 3-Spectral Operators
FOLLAND-Real Analysis: Modem Techniques and Their Applications
FROLICHER and KRIEGL-Linear Spaces and Differentiation Theory
GARDINER- Teichmiiller Theory and Quadratic Differentials
GREENE and KRANTZ-Function Theory of One Complex Variable
*GRIFFITHS and HARRIS-Principles of Algebraic Geometry
GRILLET -Algebra
GROVE-Groups and Characters
GUSTAFSSON, KREISS and OLIGER-Time Dependent Problems and Difference
Methods
HANNA and ROWLAND-Fourier Series, Transforms, and Boundary Value Problems,
Second Edition
*HENRICI-Applied and Computational Complex Analysis
Volume 1, Power Series-Integration-Conformal Mapping-Location
of Zeros
Volume 2, Special Functions-Integral Transforms-Asymptotics-
Continued Fractions
Volume 3, Discrete Fourier Analysis, Cauchy Integrals, Construction
of Conformal Maps, Univalent Functions
*HIL TON and WU-A Course in Modem Algebra
*HOCHST ADT -Integral Equations
lOST -Two-Dimensional Geometric Variational Procedures
*KOBA Y ASHI and NOMIZU-Foundations of Differential Geometry, Volume I
*KOBA Y ASHI and NOMIZU-Foundations of Differential Geometry, Volume II
LAX-Linear Algegra
LOGAN-An Introduction to Nonlinear Partial Differential Equations
McCONNELL and ROBSON-Noncommutative Noetherian Rings
NA YFEH-Perturbation Methods
NA YFEH and MOOK-Nonlinear Oscillations
PANDEY-The Hilbert Transform of Schwartz Distributions and Applications
PETKOV-Geometry of Reflecting Rays and Inverse Spectral Problems
*PRENTER-Splines and Variational Methods
RAO-Measure Theory and Integration
RASSIAS and SIMSA-Finite Sums Decompositions in Mathematical Analysis
RENEL T -Elliptic Systems and Quasiconformal Mappings
RIVLIN-Chebyshev Polynomials: From Approximation Theory to Algebra and Number
Theory, Second Edition
ROCKAFELLAR-Network Flows and Monotropic Optimization
ROITMAN-Introduction to Modem Set Theory
*RUDIN-Fourier Analysis on Groups
SENDOV-The Averaged Moduli of Smoothness: Applications in Numerical Methods
and Approximations
SENDOV and POPOV-The Averaged Moduli of Smoothness
*SIEGEL-Topics in Complex Function Theory
Volume I-Elliptic Functions and Uniformization Theory
Volume 2-Automorphic Functions and Abelian Integrals
Volume 3-Abelian Functions and Modular Functions of Several Variables
SMITH and ROMANOWSKA-Post-Modem Algebra
STAKGOLD-Green's Functions and Boundary Value Problems, Second Editon
*STOKER-Differential Geometry
*STOKER...,.:-Nonlinear Vibrations in Mechanical and Electrical Systems
*STOKER-Water Waves: The Mathematical Theory with Applications
WESSELING-An Introduction to Multigrid Methods
WHITMAN-Linear and Nonlinear Waves
tZAUDERER-Partial Differential Equations of Applied Mathematics, Second Edition
*Now available in a lower priced paperback edition in the Wiley Classics Library.
tNow available in paperback.
An in-dePth look at real analvsis and its
applications no I expanded and revised
This new edition of the Wldel used analTsis book continues to cover real
analysis in greater detail and at a more advanced level than most books on the
subject. Encompassing several subjects that underlie much of modern analysis,
the book focuses on measure and integration theory, point set topology, and
the basics of functional analysis. It illustrates the use of the general theories
and introduces readers to other branches of analysis such s Fourier analysis.
distribution theory, and probability theory.
This edition is bolstered in content as well as in scope-extending its usefulness
to students outside of pure analysis as well as those interested in dynamical
systems. The numerous exercises, extensive bibliography, and review chapter
on sets and metric spaces make Real Analyss: Modern Techniques and
Their Applications, Second Edition invaluable for students in graduate-level
analysis courses. New features include:
. evised material on the n-dimensional Lebesgue integral
.! improved proof of Tychonoff's theorem
. xpanded material on Fourier analysis
. ! newly written chapter devoted to distributions and differential equations
. pdated material on Hausdorff dimension and fractal dimension
GERALD B FOLLAND is Professor of MathematIcs at the University of
Washington in Seattle He has written extensively on mathematical analysis,
including Fourier analysis, harmonic analysis, and differential equations
Cover Design: Adrienne Weiss
LEINTERSCIENCE
John Wiley & Sons, Inc.
r' -l{i L Oil Q
Scientific, Technical, and Medical Divisi. ". ":'':W : !..;
!} if 1
605 Third Avenue, New York, N.Y. 10158 \.iU J.{_
i! ?:"i4 i 0_ i
New York. Chichester. Weinheim .,.-.;t: U .::.
P5'!522z0u
Brisbane · Singapore · Toronto
0-471-31716-0
ISBN 0-471-31716-0
90000
THC 5-04
1999
9 780471 317166
...-