Probability, Random Variables and Stochastic Processes - Papoulis A.

Author: Papoulis A.
Tags: mathematics probability theory stochastic processes computing systems
ISBN: 0-07-048477-5
Year: 1991
Probability, Random Variables, and Stochastic Processes
Third Edition
Athanasios Papoulis
McGraw-Hill Series in Electrical Engineering
Consulting Editor
Stephen W. Director, Carnegie-Mellon University
Circuits and Systems
Communications and Signal Processing
Control Theory
Electronics and Electronic Circuits
Power and Energy
Electromagnetics
Computer Engineering
Introductory
Radar and Antennas
VLSI
Previous Consulting Editors
Ronald N. Bracewell, Colin Cherry, James F. Gibbons, Willis W. Harman, Hubert Heffner,
Edward W. Herold, John G. Linvill, Simon Ramo, Ronald A. Rohrer, Anthony E.
Siegman, Charles Susskind, Frederick E. Terman, John G. Truxal, Ernst Weber, and John
R. Whinnery
Communications and Signal Processing
Consulting Editor
Stephen W. Director, Carnegie-Mellon University
Antoniou: Digital Filters: Analysis and Design
Candy: Signal Processing: The Model-Based Approach
Candy: Signal Processing: The Modern Approach
Carlson: Communications Systems: An Introduction to Signals and Noise in Electrical
Communication
Cherin: An Introduction to Optical Fibers
Collin: Antennas and Radiowave Propagation
Collin: Foundations for Microwave Engineering
Cooper and McGillem: Modern Communications and Spread Spectrum
Davenport: Probability and Random Processes: An Introduction for Applied Scientists and
Engineers
Drake: Fundamentals of Applied Probability Theory
Huelsman and Allen: Introduction to the Theory and Design of Active Filters
Jong: Methods of Discrete Signal and System Analysis
Keiser: Local Area Networks
Keiser: Optical Fiber Communications
Kraus: Antennas
Kuc: Introduction to Digital Signal Processing
Papoulis: Probability, Random Variables, and Stochastic Processes
Papoulis: Signal Analysis
Papoulis: The Fourier Integral and Its Applications
Peebles: Probability, Random Variables, and Random Signal Principles
Proakis: Digital Communications
Schwartz: Information Transmission, Modulation, and Noise
Schwartz and Shaw: Signal Processing
Smith: Modern Communication Circuits
Taub and Schilling: Principles of Communication Systems
PROBABILITY,
RANDOM VARIABLES,
AND STOCHASTIC
PROCESSES
Third Edition
Athanasios Papoulis
Polytechnic Institute of New York
McGraw-Hill, Inc.
New York St. Louis San Francisco Auckland Bogota
Caracas Hamburg Lisbon London Madrid Mexico Milan Montreal
New Delhi Paris San Juan Sao Paulo Singapore Sydney Tokyo Toronto
This book was set in Times Roman by Science Typographers, Inc.
The editors were Roger L. Howell and John M. Morriss;
the production supervisor was Richard A. Ausburn.
The cover was designed by Joseph Gillians.
Project supervision was done by Science Typographers, Inc.
R. R. Donnelley & Sons Company was printer and binder.
PROBABILITY, RANDOM VARIABLES, AND STOCHASTIC PROCESSES
Copyright © 1991, 1984, 1965 by McGraw-Hill, Inc. All rights reserved.
Printed in the United States of America. Except as permitted under the
United States Copyright Act of 1976, no part of this publication may be
reproduced or distributed in any form or by any means, or stored in a data
base or retrieval system, without the prior written permission of the
publisher.
1234567890 DOC DOC 90987654321
ISBN 0-07-048477-5
Library of Congress Cataloging-in-Publication Data
Papoulis, Athanasios, (date).
Probability, random variables, and stochastic processes/
Athanasios Papoulis.—3rd ed.
p.	cm.—(McGraw-Hill series in electrical engineering.
Communications and signal processing)
Includes bibliographical references and index.
ISBN 0-07-048477-5
1. Probabilities. 2. Random variables. 3. Stochastic processes.
II. Series.
1991
90-23127
CONTENTS
Preface to the Third Edition	xi
Preface to the Second Edition	xiii
Preface to the First Edition	xv
Part I Probability and Random Variables
1	The Meaning of Probability	3
1-1	Introduction	3
1-2	The Definitions	5
1-3	Probability and Induction	12
1-4	Causality versus Randomness	13
Concluding Remarks	14
2	The Axioms of Probability	15
2-1	Set Theory	15
2-2	Probability Space	20
2-3	Conditional Probability	27
Problems	36
3	Repeated Trials	38
3-1	Combined Experiments	38
3-2	Bernoulli Trials	43
3-3	Asymptotic Theorems	47
3-4	Poisson Theorem and Random Points	55
Problems	60
4	The Concept of a Random Variable	63
4-1	Introduction	63
4-2	Distribution and Density Functions	66
4-3	Special Cases	73
4-4	Conditional Distributions and Total Probability	79
	Problems	84
5	Functions of One Random Variable	86
5-1	The Random Variable g(x)	86
5-2	The Distribution of g(x)	87
5-3	Mean and Variance	102
5-4	Moments	109
5-5	Characteristic Functions	115
	Problems	120
6	Two Random Variables	124
6-1	Bivariate Distributions	124
6-2	One Function of Two Random Variables	135
6-3	Two Functions of Two Random Variables	142
	Problems	148
7	Moments and Conditional Statistics	151
7-1	Joint Moments	151
7-2	Joint Characteristic Functions	157
7-3	Conditional Distributions	162
7-4	Conditional Expected Values	169
7-5	Mean Square Estimation	173
	Problems	179
8	Sequences of Random Variables	182
8-1	General Concepts	182
8-2	Conditional Densities, Characteristic Functions, and Normality	192
8-3	Mean Square Estimation	201
8-4	Stochastic Convergence and Limit Theorems	208
8-5	Random Numbers: Meaning and Generation	221
	Problems	237
9	Statistics	241
9-1	Introduction	241
9-2	Parameter Estimation	244
9-3	Hypothesis Testing	265
	Problems	279
Part II Stochastic Processes
10	General Concepts	285
10-1	Definitions	285
10-2	Systems with Stochastic Inputs	303
10-3	The Power Spectrum	319
10-4	Digital Processes	332
	Appendix 10A Continuity, Differentiation, Integration	336
	Appendix 10B Shift Operators and Stationary Processes	339
	Problems	340
11	Basic Applications	345
11-1	Random Walk, Brownian Motion, and Thermal Noise	345
11-2	Poisson Points and Shot Noise	354
11-3	Modulation	362
11-4	Cyclostationary Processes	373
11-5	Bandlimited Processes and Sampling Theory	376
11-6	Deterministic Signals in Noise	384
11-7	Bispectra and System Identification	389
	Appendix 11A The Poisson Sum Formula	395
	Appendix 11B Schwarz's Inequality	395
	Problems	396
12	Spectral Representation	401
12-1	Factorization and Innovations	401
12-2	Finite-Order Systems and State Variables	404
12-3	Fourier Series and Karhunen-Loeve Expansions	412
12-4	Spectral Representation of Random Processes	416
	Problems	425
13	Spectral Estimation	427
13-1	Ergodicity	427
13-2	Spectral Estimation	443
13-3	Extrapolation and System Identification	455
	Appendix 13A Minimum-Phase Functions	474
	Appendix 13B All-Pass Functions	475
	Problems	477
14	Mean Square Estimation	480
14-1	Introduction	480
14-2	Prediction	487
14-3	Filtering and Prediction	508
14-4	Kalman Filters	515
	Problems	529
15	Entropy	533
15-1	Introduction	533
15-2	Basic Concepts	542
15-3	Random Variables and Stochastic Processes	558
15-4	The Maximum Entropy Method	569
15-5	Coding	579
15-6	Channel Capacity	591
	Problems	600
16	Selected Topics	603
16-1	The Level-Crossing Problem	603
16-2	Queueing Theory	612
16-3	Shot Noise	629
16-4	Markoff Processes	635
Problems	654
Bibliography	658
Index	661
PREFACE TO THE
THIRD EDITION
In this edition, about a third of the text is either new or substantially revised.
The new topics include the following:
A chapter on statistics. With this addition, the first nine chapters of the
book could form the basis for a senior-graduate course in probability and
statistics.
A chapter on spectral estimation. This chapter starts with an expanded
treatment of ergodicity and it covers the fundamentals of parametric and
nonparametric estimation in the context of system identification.
A section on the meaning and generation of random numbers. This
material is essential for the understanding of computer simulation of random
phenomena and the use of statistics in the solution of deterministic problems
(Monte Carlo techniques).
Other topics include bispectra, state variables and vector processes, factoriza-
tion, and spectral representation.
I wrote the first edition of this book long ago. My objective was to develop
the subject of probability and stochastic processes as a deductive discipline and
to illustrate the theory with basic applications of general interest. I tried to
stress clarity and economy, avoiding sophisticated mathematics or, at the other
extreme, detailed discussion of practical applications. It appears that this
approach met with some success. For over a quarter of a century, the book has
been used as a basic text and standard reference not only in this country but
throughout the world. I am deeply grateful.
McGraw-Hill and I would like to thank the following reviewers for their
many helpful comments and suggestions: John Adams, Lehigh University; David
Anderson, University of Michigan; V. Krishnan, University of Lowell; Robert J.
Mulholland, University of Oklahoma; Stephen Sebo, Ohio State University;
and Samir S. Soliman, Southern Methodist University.
Athanasios Papoulis
PREFACE TO THE
SECOND EDITION
This is an extensively revised edition reflecting the developments of the last two
decades. Several new topics are added, important areas are strengthened, and
sections of limited interest are eliminated. Most additions, however, deal with
applications; the first ten chapters are essentially unchanged.
In the selection of the new material I have attempted to concentrate on
subjects that not only are of current interest, but also contribute to a better
understanding of the basic properties of stochastic processes. The new material
includes the following:
Discrete-time processes with applications in system theory
Innovations, factorization, spectral representation
Queueing theory, level crossings, spectra of FM signals, sampling theory
Mean square estimation, orthonormal expansions, Levinson’s algorithm, Wold’s
decomposition, Wiener, lattice, and Kalman filters
Spectral estimation, windows, extrapolation, Burg’s method, detection of line
spectra
This book concludes with a self-contained chapter on entropy developed ax-
iomatically from first principles. It is presented in the context of earlier
chapters, and it includes the method of maximum entropy in parameter estima-
tion and elements of coding theory.
As in the first edition, I made a special effort to stress the conceptual
difference between mental constructs and physical reality. This difference is
summarized in the following paragraph, taken from the first edition:
Scientific theories deal with concepts, not with reality. All theoretical results
are derived from certain axioms by deductive logic. In physical sciences the
theories are so formulated as to correspond in some useful sense to the real
world, whatever that may mean. However, this correspondence is approximate,
and the physical justification of all theoretical conclusions is based on
some form of inductive reasoning.
Responding to comments by a number of readers over the years, I would like to
emphasize that this passage in no way questions the existence of natural laws
(patterns). It is merely a reminder of the fundamental difference between
concepts and reality.
During the preparation of the manuscript I had the benefit of lengthy
discussions with a number of colleagues and friends. I thank in particular Hans
Schreiber of Grumman, William Shanahan of Norden Systems, and my col-
leagues Frank Cassara and Basil Maglaris for their valuable suggestions. I wish
also to express my appreciation to Mrs. Nina Adamo for her expert typing of the
manuscript.
Athanasios Papoulis
PREFACE TO THE
FIRST EDITION
Several years ago I reached the conclusion that the theory of probability should
no longer be treated as adjunct to statistics or noise or any other terminal topic,
but should be included in the basic training of all engineers and physicists as a
separate course. I made then a number of observations concerning the teaching
of such a course, and it occurs to me that the following excerpts from my early
notes might give you some insight into the factors that guided me in the
planning of this book:
“Most students, brought up with a deterministic outlook of physics, find
the subject unreliable, vague, difficult. The difficulties persist because of inade-
quate definition of the first principles, resulting in a constant confusion between
assumptions and logical conclusions. Conceptual ambiguities can be removed
only if the theory is developed axiomatically. They say that this approach would
require measure theory, would reduce the subject to a branch of mathematics,
would force the student to doubt his intuition leaving him without convincing
alternatives, but I don’t think so. I believe that most concepts needed in the
applications can be explained with simple mathematics, that probability, like any
other theory, should be viewed as a conceptual structure and its conclusions
should rely not on intuition but on logic. The various concepts must, of course,
be related to the physical world, but such motivating sections should be
separated from the deductive part of the theory. Intuition will thus be strength-
ened, but not at the expense of logical rigor.
“There is an obvious lack of continuity between the elements of probabil-
ity as presented in introductory courses, and the sophisticated concepts needed
in today’s applications. How can the average student, equipped only with the
probability of cards and dice, understand prediction theory or harmonic analy-
sis? The applied books give at most a brief discussion of background material;
their objective is not the use of the applications to strengthen the student’s
understanding of basic concepts, but rather a detailed discussion of special
topics.
“Random variables, transformations, expected values, conditional densities,
characteristic functions cannot be mastered with mere exposure. These
concepts must be clearly defined and must be developed, one at a time, with
sufficient elaboration. Special topics should be used to illustrate the theory, but
they must be so presented as to minimize peripheral, descriptive material and to
concentrate on probabilistic content. Only then the student can learn a variety
of applications with economy and perspective.”
I realized that to teach a convincing course, a course that is not a mere
presentation of results but a connected theory, I would have to reexamine not
only the development of special topics, but also the proofs of many results and
the method of introducing the first principles.
“The theory must be mathematical (deductive) in form but without the
generality or rigor of mathematics. The philosophical meaning of probability
must somehow be discussed. This is necessary to remove the mystery associated
with probability and to convince the student of the need for an axiomatic
approach and a clear distinction between assumptions and logical conclusions.
The axiomatic foundation should not be a mere appendix but should be
recognized throughout the theory.
“Random variables must be defined as functions with domain an abstract
set of experimental outcomes and not as points on the real line. Only then
infinitely dimensional spaces are avoided and the extension to stochastic processes
is simplified.
“The inadequacy of averages as definitions and the value of an underlying
space is most obvious in the treatment of stochastic processes. Time averages
must be introduced as stochastic integrals, and their relationship to the statistical
parameters of the process must be established only in the form of ergodicity.
“The emphasis on second-order moments and spectra, utilizing the stu-
dent’s familiarity with systems and transform techniques, is justified by the
current needs.
“Mean-square estimation (prediction and filtering), a topic of considerable
importance, needs a basic reexamination. It is best understood if it is divorced
from the details of integral equations or the calculus of variations, and is
presented as an application of the orthogonality principle (linear regression),
simply explained in terms of random variables.
“To preserve conceptual order, one must sacrifice continuity of special
topics, introducing them as illustrations of the general theory.”
These ideas formed the framework of a course that I taught at the
Polytechnic Institute of Brooklyn. Encouraged by the students’ reaction, I
decided to make it into a book. I should point out that I did not view my task as
an impersonal presentation of a complete theory, but rather as an effort to
explain the essence of this theory to a particular group of students. The book is
written neither for the handbook-oriented students nor for the sophisticated few
who can learn the subject from advanced mathematical texts. It is written for
the majority of engineers and physicists who have sufficient maturity to appreci-
ate and follow a logical presentation, but, because of their limited mathematical
background, would find a book such as Doob’s too difficult for a beginning text.
Although I have included many useful results, some of them new, my hope
is that the book will be judged not for completeness but for organization and
clarity. In this context I would like to anticipate a criticism and explain my
approach. Some readers will find the proofs of many important theorems
lacking in rigor. I emphasize that it was not out of negligence, but after
considerable thought, that I decided to give, in several instances, only plausibility
arguments. I realize too well that “a proof is a proof or it is not.” However, a
rigorous proof must be preceded by a clarification of the new idea and by a
plausible explanation of its validity. I felt that, for the purpose of this book, the
emphasis should be placed on explanation, facility, and economy. I hope that
this approach will give you not only a working knowledge, but also an incentive
for a deeper study of this fascinating subject.
Although I have tried to develop a personal point of view in practically
every topic, I recognize that I owe much to other authors. In particular, the
books “Stochastic Processes” by J. L. Doob and “Théorie des Fonctions
Aléatoires” by A. Blanc-Lapierre and R. Fortet influenced greatly my planning
of the chapters on stochastic processes.
Finally, it is my pleasant duty to express my sincere gratitude to Mischa
Schwartz for his encouragement and valuable comments, to Ray Pickholtz for
his many ideas and constructive suggestions, and to all my colleagues and
students who guided my efforts and shared my enthusiasm in this challenging
project.
Athanasios Papoulis
PART I
PROBABILITY AND RANDOM VARIABLES
CHAPTER 1
THE MEANING OF PROBABILITY
1-1 INTRODUCTION
The theory of probability deals with averages of mass phenomena occurring
sequentially or simultaneously: electron emission, telephone calls, radar detec-
tion, quality control, system failure, games of chance, statistical mechanics,
turbulence, noise, birth and death rates, and queueing theory, among many
others.
It has been observed that in these and other fields certain averages
approach a constant value as the number of observations increases and this
value remains the same if the averages are evaluated over any subsequence
specified before the experiment is performed. In the coin experiment, for
example, the percentage of heads approaches 0.5 or some other constant, and
the same average is obtained if we consider every fourth, say, toss (no betting
system can beat the roulette).
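The stability of such averages can be illustrated numerically. The following is a minimal sketch and not part of the original text: it simulates tosses of a fair coin with Python's pseudorandom generator and compares the fraction of heads over all tosses with the fraction over every fourth toss, a subsequence fixed before the experiment. The seed, the number of tosses, and the helper name relative_frequency are arbitrary assumptions made for illustration.

```python
import random

def relative_frequency(outcomes):
    """Fraction of True entries (heads) in a sequence of tosses."""
    return sum(outcomes) / len(outcomes)

# Assumption: a fair coin modeled by a seeded pseudorandom generator.
rng = random.Random(1991)
n = 100_000
tosses = [rng.random() < 0.5 for _ in range(n)]  # True means heads

freq_all = relative_frequency(tosses)
# Subsequence specified before the experiment: every fourth toss.
freq_every_fourth = relative_frequency(tosses[::4])

print(f"all tosses:        {freq_all:.4f}")
print(f"every fourth toss: {freq_every_fourth:.4f}")
```

Both printed ratios should lie close to 0.5. A simulation of this kind only illustrates the observation; it does not prove it.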
The purpose of the theory is to describe and predict such averages in
terms of probabilities of events. The probability of an event $\mathcal{A}$ is a number
$P(\mathcal{A})$ assigned to this event. This number could be interpreted as follows:
If the experiment is performed $n$ times and the event $\mathcal{A}$ occurs $n_{\mathcal{A}}$ times, then,
with a high degree of certainty, the relative frequency $n_{\mathcal{A}}/n$ of the occurrence of $\mathcal{A}$
is close to $P(\mathcal{A})$:
$P(\mathcal{A}) \approx n_{\mathcal{A}}/n$	(1-1)
provided that $n$ is sufficiently large.
This interpretation is imprecise: The terms “with a high degree of certainty,”
“close,” and “sufficiently large” have no clear meaning. However, this lack of
precision cannot be avoided. If we attempt to define in probabilistic terms the
“high degree of certainty” we shall only postpone the inevitable conclusion that
probability, like any physical theory, is related to physical phenomena only in
inexact terms. Nevertheless, the theory is an exact discipline developed logically
from clearly defined axioms, and when it is applied to real problems, it works.
OBSERVATION, DEDUCTION, PREDICTION. In the applications of probability to
real problems, the following steps must be clearly distinguished:
Step 1 (physical) We determine by an inexact process the probabilities
$P(\mathcal{A}_i)$ of certain events $\mathcal{A}_i$.
This process could be based on the relationship (1-1) between probability
and observation: The probabilistic data $P(\mathcal{A}_i)$ equal the observed ratios $n_{\mathcal{A}_i}/n$.
It could also be based on “reasoning” making use of certain symmetries: If, out
of a total of $N$ outcomes, there are $N_{\mathcal{A}}$ outcomes favorable to the event $\mathcal{A}$,
then $P(\mathcal{A}) = N_{\mathcal{A}}/N$.
For example, if a loaded die is rolled 1000 times and five shows 203 times,
then the probability of five equals about 0.2. If the die is fair, then, because of its
symmetry, the probability of five equals 1/6.
Step 2 (conceptual) We assume that probabilities satisfy certain axioms,
and by deductive reasoning we determine from the probabilities $P(\mathcal{A}_i)$ of
certain events $\mathcal{A}_i$ the probabilities $P(\mathcal{B}_j)$ of other events $\mathcal{B}_j$.
For example, in the game with a fair die we deduce that the probability of
the event even equals 3/6. Our reasoning is of the following form:
If $P(1) = \cdots = P(6) = \frac{1}{6}$ then $P(\text{even}) = \frac{3}{6}$
Step 3 (physical) We make a physical prediction based on the numbers
$P(\mathcal{B}_j)$ so obtained.
This step could rely on (1-1) applied in reverse: If we perform the
experiment $n$ times and an event $\mathcal{B}$ occurs $n_{\mathcal{B}}$ times, then $n_{\mathcal{B}} \approx nP(\mathcal{B})$.
If, for example, we roll a fair die 1000 times, our prediction is that even
will show about 500 times.
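The three steps can be traced in a few lines of code. The following is a minimal sketch using only the figures quoted above (203 fives in 1000 rolls of a loaded die, and a fair die rolled 1000 times); the variable names are illustrative assumptions, and exact fractions are used so that the deductive step involves no rounding.

```python
from fractions import Fraction

# Step 1 (physical): estimate a probability from observed counts,
# using the loaded-die figures quoted in the text.
n_trials, n_five = 1000, 203
p_five_loaded = n_five / n_trials  # observed ratio, about 0.2

# Step 1 for a fair die: symmetry assigns 1/6 to each face.
p_face = {face: Fraction(1, 6) for face in range(1, 7)}

# Step 2 (conceptual): deduce P(even) by adding the probabilities
# of the mutually exclusive faces 2, 4, 6.
p_even = sum(p_face[face] for face in (2, 4, 6))

# Step 3 (physical): predict the approximate count of "even" in n rolls.
n_rolls = 1000
predicted_even = n_rolls * p_even

print(round(p_five_loaded, 1), p_even, predicted_even)
```

Only the middle computation is a deduction. The first and last lines of arithmetic stand in for physical statements whose accuracy the code cannot establish.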
We could not emphasize too strongly the need for separating the above
three steps in the solution of a problem. We must make a clear distinction
between the data that are determined empirically and the results that are
deduced logically.
Steps 1 and 3 are based on inductive reasoning. Suppose, for example, that
we wish to determine the probability of heads of a given coin. Should we toss
the coin 100 or 1000 times? If we toss it 1000 times and the average number of
heads equals 0.48 what kind of prediction can we make on the basis of this
observation? Can we deduce that at the next 1000 tosses the number of heads
will be about 480? Such questions can be answered only inductively.
In this book, we consider mainly step 2, that is, from certain probabilities
we derive deductively other probabilities. One might argue that such derivations
are mere tautologies because the results are contained in the assumptions. This
is true in the same sense that the intricate equations of motion of a satellite are
included in Newton’s laws.
To conclude, we repeat that the probability $P(\mathcal{A})$ of an event $\mathcal{A}$ will be
interpreted as a number assigned to this event as mass is assigned to a body or
resistance to a resistor. In the development of the theory, we will not be
concerned about the “physical meaning” of this number. This is what is done in
circuit analysis, in electromagnetic theory, in classical mechanics, or in any other
scientific discipline. These theories are, of course, of no value to physics unless
they help us solve real problems. We must assign specific, if only approximate,
resistances to real resistors and probabilities to real events (step 1); we must
also give physical meaning to all conclusions that are derived from the theory
(step 3). But this link between concepts and observation must be separated from
the purely logical structure of each theory (step 2).
As an illustration, we discuss in the next example the interpretation of the
meaning of resistance in circuit theory.
Example 1-1. A resistor is commonly viewed as a two-terminal device whose
voltage is proportional to the current:
$R = \frac{v(t)}{i(t)}$	(1-2)
This, however, is only a convenient abstraction. A real resistor is a complex device
with distributed inductance and capacitance having no clearly specified terminals.
A relationship of the form (1-2) can, therefore, be claimed only within certain
errors, in certain frequency ranges, and with a variety of other qualifications.
Nevertheless, in the development of circuit theory we ignore all these uncertainties.
We assume that the resistance R is a precise number satisfying (1-2) and we
develop a theory based on (1-2) and on Kirchhoff's laws. It would not be wise, we
all agree, if at each stage of the development of the theory we were concerned with
the true meaning of R.
1-2 THE DEFINITIONS
In this section, we discuss various definitions of probability and their roles in
our investigation.
Axiomatic Definition
We shall use the following concepts from set theory (for details see Chap. 2):
The certain event $\mathcal{S}$ is the event that occurs in every trial. The union $\mathcal{A} + \mathcal{B}$
of two events $\mathcal{A}$ and $\mathcal{B}$ is the event that occurs when $\mathcal{A}$ or $\mathcal{B}$ or both occur.
The intersection $\mathcal{A}\mathcal{B}$ of the events $\mathcal{A}$ and $\mathcal{B}$ is the event that occurs when
both events $\mathcal{A}$ and $\mathcal{B}$ occur. The events $\mathcal{A}$ and $\mathcal{B}$ are mutually exclusive if the
occurrence of one of them excludes the occurrence of the other.
We shall illustrate with the die experiment: The certain event is the event
that occurs whenever any one of the six faces shows. The union of the events
even and less than 3 is the event 1 or 2 or 4 or 6 and their intersection is the
event 2. The events even and odd are mutually exclusive.
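The die illustration maps directly onto finite sets. The sketch below is an illustration only, under the assumption that each event is represented as a Python frozenset of faces; Python writes the union as | and the intersection as &.

```python
# Outcomes of the die experiment; the certain event contains all of them.
certain = frozenset(range(1, 7))

even = frozenset({2, 4, 6})
odd = certain - even
less_than_3 = frozenset({1, 2})

union = even | less_than_3         # occurs when either event (or both) occurs
intersection = even & less_than_3  # occurs when both events occur

def mutually_exclusive(a, b):
    """Two events are mutually exclusive if they share no outcome."""
    return not (a & b)

print(sorted(union), sorted(intersection), mutually_exclusive(even, odd))
# prints: [1, 2, 4, 6] [2] True
```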
The axiomatic approach to probability is based on the following three
postulates and on nothing else: The probability $P(\mathcal{A})$ of an event $\mathcal{A}$ is a
nonnegative number assigned to this event:
$P(\mathcal{A}) \geq 0$	(1-3)
The probability of the certain event equals 1:
$P(\mathcal{S}) = 1$	(1-4)
If the events $\mathcal{A}$ and $\mathcal{B}$ are mutually exclusive, then
$P(\mathcal{A} + \mathcal{B}) = P(\mathcal{A}) + P(\mathcal{B})$	(1-5)
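For a finite experiment the three postulates can be checked mechanically. The following is a minimal sketch, assuming a fair die with probability 1/6 on each face and defining the probability of an event as the sum over its outcomes; the names p_face, prob, and events are hypothetical helpers introduced for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# Assumption: a fair die, with probability 1/6 assigned to each face.
p_face = {face: Fraction(1, 6) for face in range(1, 7)}
certain = frozenset(p_face)

def prob(event):
    """P(event) as the sum of the probabilities of its outcomes."""
    return sum((p_face[x] for x in event), Fraction(0))

# All 2**6 = 64 events (subsets of the six outcomes).
events = [frozenset(c)
          for r in range(7)
          for c in combinations(sorted(certain), r)]

# Postulate (1-3): no event has negative probability.
axiom_1 = all(prob(a) >= 0 for a in events)
# Postulate (1-4): the certain event has probability 1.
axiom_2 = prob(certain) == 1
# Postulate (1-5): additivity for every pair of mutually exclusive events.
axiom_3 = all(prob(a | b) == prob(a) + prob(b)
              for a in events for b in events if not (a & b))

print(axiom_1, axiom_2, axiom_3)
# prints: True True True
```

The check succeeds for any assignment of nonnegative face probabilities that sum to 1, so the same sketch applies to a loaded die once p_face is changed accordingly.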
This approach to probability is relatively recent (A. Kolmogoroff,† 1933). However,
in our view, it is the best way to introduce probability even in elementary
courses. It emphasizes the deductive character of the theory, it avoids concep-
tual ambiguities, it provides a solid preparation for sophisticated applications,
and it offers at least a beginning for a deeper study of this important subject.
The axiomatic development of probability might appear overly mathemati-
cal. However, as we hope to show, this is not so. The elements of the theory can
be adequately explained with basic calculus.
Relative Frequency Definition
The relative frequency approach is based on the following definition: The
probability $P(\mathcal{A})$ of an event $\mathcal{A}$ is the limit
$P(\mathcal{A}) = \lim_{n \to \infty} \frac{n_{\mathcal{A}}}{n}$	(1-6)
where $n_{\mathcal{A}}$ is the number of occurrences of $\mathcal{A}$ and $n$ is the number of trials.
This definition appears reasonable. Since probabilities are used to de-
scribe relative frequencies, it is natural to define them as limits of such
frequencies. The problems associated with a priori definitions are eliminated,
one might think, and the theory is founded on observation.
However, although the relative frequency concept is fundamental in the
applications of probability (steps 1 and 3), its use as the basis of a deductive
theory (step 2) must be challenged. Indeed, in a physical experiment, the
numbers $n_{\mathcal{A}}$ and $n$ might be large but they are only finite; their ratio cannot,
therefore, be equated, even approximately, to a limit. If (1-6) is used to define
$P(\mathcal{A})$, the limit must be accepted as a hypothesis, not as a number that can be
determined experimentally.
†A. Kolmogoroff: Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergeb. Math. und ihrer Grenzg., vol. 2, 1933.
Early in the century, Von Mises† used (1-6) as the foundation for a new
theory. At that time, the prevailing point of view was still the classical and his
work offered a welcome alternative to the a priori concept of probability,
challenging its metaphysical implications and demonstrating that it leads to
useful conclusions mainly because it makes implicit use of relative frequencies
based on our collective experience. The use of (1-6) as the basis for deductive
theory has not, however, enjoyed wide acceptance even though (1-6) relates
$P(\mathcal{A})$ to observed frequencies. It has generally been recognized that the
axiomatic approach (Kolmogoroff) is superior.
We shall venture a comparison between the two approaches using as
illustration the definition of the resistance R of an ideal resistor. We can define
$R$ as a limit
$R = \lim_{n \to \infty} \frac{e(t)}{i_n(t)}$
where $e(t)$ is a voltage source and $i_n(t)$ are the currents of a sequence of real
resistors that tend in some sense to an ideal two-terminal element. This
definition might show the relationship between real resistors and ideal elements
but the resulting theory is complicated. An axiomatic definition of R based on
Kirchhoff's laws is, of course, preferable.
Classical Definition
For several centuries, the theory of probability was based on the classical
definition. This concept is used today to determine probabilistic data and as a
working hypothesis. In the following, we explain its significance.
According to the classical definition, the probability $P(\mathcal{A})$ of an event $\mathcal{A}$
is determined a priori without actual experimentation: It is given by the ratio
$P(\mathcal{A}) = \frac{N_{\mathcal{A}}}{N}$	(1-7)
where $N$ is the number of possible outcomes and $N_{\mathcal{A}}$ is the number of
outcomes that are favorable to the event $\mathcal{A}$.
In the die experiment, the possible outcomes are six and the outcomes
favorable to the event even are three; hence P(even) = 3/6.
It is important to note, however, that the significance of the numbers $N$
and $N_{\mathcal{A}}$ is not always clear. We shall demonstrate the underlying ambiguities
with the following example.
†Richard Von Mises: Probability, Statistics and Truth, English edition, H. Geiringer, ed., G. Allen
and Unwin Ltd., London, 1957.
Example 1-2. We roll two dice and we want to find the probability $p$ that the sum
of the numbers that show equals 7.
To solve this problem using (1-7), we must determine the numbers $N$ and $N_{\mathcal{A}}$.
(a) We could consider as possible outcomes the 11 sums 2, 3, ..., 12. Of these,
only one, namely the sum 7, is favorable; hence $p = 1/11$. This result is of course
wrong. (b) We could count as possible outcomes all pairs of numbers not
distinguishing between the first and the second die. We have now 21 outcomes of
which the pairs (3,4), (5,2), and (6,1) are favorable. In this case, $N_{\mathcal{A}} = 3$ and
$N = 21$; hence $p = 3/21$. This result is also wrong. (c) We now reason that the
above solutions are wrong because the outcomes in (a) and (b) are not equally
likely. To solve the problem “correctly,” we must count all pairs of numbers
distinguishing between the first and the second die. The total number of outcomes
is now 36 and the favorable outcomes are the six pairs (3,4), (4,3), (5,2), (2,5),
(6,1), and (1,6); hence $p = 6/36$.
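The three counts in this example can be reproduced by enumeration. The following is a minimal sketch, with a hypothetical helper classical that applies the ratio (1-7) to whatever list of outcomes it is given. The code only counts; it cannot decide which list consists of equally likely outcomes.

```python
from fractions import Fraction
from itertools import combinations_with_replacement, product

faces = range(1, 7)

def classical(outcomes, favorable):
    """Ratio (1-7): number of favorable outcomes over number of outcomes."""
    outcomes = list(outcomes)
    n_favorable = sum(1 for o in outcomes if favorable(o))
    return n_favorable, len(outcomes), Fraction(n_favorable, len(outcomes))

# (a) Outcomes are the 11 possible sums 2, ..., 12.
case_a = classical(range(2, 13), lambda s: s == 7)

# (b) Outcomes are unordered pairs (the dice are not distinguished).
case_b = classical(combinations_with_replacement(faces, 2),
                   lambda pair: sum(pair) == 7)

# (c) Outcomes are ordered pairs (first die, second die).
case_c = classical(product(faces, repeat=2), lambda pair: sum(pair) == 7)

for label, (n_fav, n_all, ratio) in zip("abc", (case_a, case_b, case_c)):
    print(f"({label}) {n_fav}/{n_all} = {ratio}")
```

The same formula yields 1/11, 3/21, and 6/36 depending on how the outcomes are listed, which is the ambiguity the example is meant to expose.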
The above example shows the need for refining definition (1-7). The
improved version reads as follows:
The probability of an event equals the ratio of its favorable outcomes to the total
number of outcomes provided that all outcomes are equally likely.
As we shall presently see, this refinement does not eliminate the problems
associated with the classical definition.
Notes 1. The classical definition was introduced as a consequence of the principle of
insufficient reason†: “In the absence of any prior knowledge, we must assume that the
events $\mathcal{A}_i$ have equal probabilities.” This conclusion is based on the subjective interpretation
of probability as a measure of our state of knowledge about the events $\mathcal{A}_i$. Indeed,
if it were not true that the events $\mathcal{A}_i$ have the same probability, then changing their
indices we would obtain different probabilities without a change in the state of our
knowledge.
2. As we explain in Chap. 15, the principle of insufficient reason is equivalent to
the principle of maximum entropy.
CRITIQUE. The classical definition can be questioned on several grounds.
A.	The term equally likely used in the improved version of (1-7) means, actually,
equally probable. Thus, in the definition, use is made of the concept to be
defined. As we have seen in Example 1-2, this often leads to difficulties in
determining $N$ and $N_{\mathcal{A}}$.
B.	The definition can be applied only to a limited class of problems. In the die
experiment, for example, it is applicable only if the six faces have the same
probability. If the die is loaded and the probability of four equals 0.2, say,
the number 0.2 cannot be derived from (1-7).
†H. Bernoulli, Ars Conjectandi, 1713.
C. It appears from (1-7) that the classical definition is a consequence of logical
imperatives divorced from experience. This, however, is not so. We accept
certain alternatives as equally likely because of our collective experience.
The probabilities of the outcomes of a fair die equal 1/6 not only because
the die is symmetrical but also because it was observed in the long history of
rolling dice that the ratio $n_{\mathcal{A}}/n$ in (1-1) is close to 1/6. The next illustration
is, perhaps, more convincing:
We wish to determine the probability p that a newborn baby is a boy. It is
generally assumed that p = 1/2; however, this is not the result of pure
reasoning. In the first place, it is only approximately true that p = 1/2.
Furthermore, without access to long records we would not know that the
boy-girl alternatives are equally likely regardless of the sex history of the
baby’s family, the season or place of its birth, or other conceivable factors. It
is only after long accumulation of records that such factors become irrele-
vant and the two alternatives are accepted as equally likely.
D. If the number of possible outcomes is infinite, then to apply the classical
definition we must use length, area, or some other measure of infinity for
determining the ratio $N_{\mathcal{A}}/N$ in (1-7). We illustrate the resulting difficulties
with the following example known as the Bertrand paradox.
Example 1-3. We are given a circle $C$ of radius $r$ and we wish to determine the
probability p that the length $l$ of a “randomly selected” chord AB is greater than
the length $r\sqrt{3}$ of the side of the inscribed equilateral triangle.
We shall show that this problem can be given at least three reasonable
solutions.
I. If the center M of the chord AB lies inside the circle $C_1$ of radius $r/2$ shown
in Fig. 1-1a, then $l > r\sqrt{3}$. It is reasonable, therefore, to consider as favorable
outcomes all points inside the circle $C_1$ and as possible outcomes all points
inside the circle $C$. Using as measure of their numbers the corresponding
areas $\pi r^2/4$ and $\pi r^2$, we conclude that
$$p = \frac{\pi r^2/4}{\pi r^2} = \frac{1}{4}$$
FIGURE 1-1
II. We now assume that the end A of the chord AB is fixed. This reduces the
number of possibilities but it has no effect on the value of p because the
number of favorable locations of B is reduced proportionately. If B is on the
120° arc DBE of Fig. 1-1b, then $l > r\sqrt{3}$. The favorable outcomes are now
the points on this arc and the total outcomes all points on the circumference
of the circle $C$. Using as their measurements the corresponding lengths $2\pi r/3$
and $2\pi r$, we obtain
$$p = \frac{2\pi r/3}{2\pi r} = \frac{1}{3}$$
III. We assume finally that the direction of AB is perpendicular to the line FK of
Fig. 1-1c. As in II this restriction has no effect on the value of p. If the center
M of AB is between G and H, then $l > r\sqrt{3}$. Favorable outcomes are now
the points on GH and possible outcomes all points on FK. Using as their
measures the respective lengths $r$ and $2r$, we obtain
$$p = \frac{r}{2r} = \frac{1}{2}$$
We have thus found not one but three different solutions for the same
problem! One might remark that these solutions correspond to three different
experiments. This is true but not obvious and, in any case, it demonstrates the
ambiguities associated with the classical definition, and the need for a clear
specification of the outcomes of an experiment and the meaning of the terms
“possible” and “favorable.”
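The three solutions of Example 1-3 correspond to three different ways of generating a "random" chord, and this can be checked numerically. The sketch below is an illustration added here, not part of the original text; the function name `bertrand`, the sample size, and the seed are assumptions made for the example.

```python
import math
import random

def bertrand(n_trials=200_000, r=1.0, seed=0):
    """Estimate P(l > r*sqrt(3)) under the three chord-selection rules."""
    rng = random.Random(seed)
    threshold = r * math.sqrt(3)  # side of the inscribed equilateral triangle

    def chord(dist):
        # Length of a chord whose midpoint is at distance `dist` from the center.
        return 2 * math.sqrt(max(0.0, r * r - dist * dist))

    hits = [0, 0, 0]
    for _ in range(n_trials):
        # I. Midpoint M uniformly distributed over the area of the circle C.
        hits[0] += chord(r * math.sqrt(rng.random())) > threshold
        # II. End A fixed, end B uniformly distributed on the circumference.
        phi = rng.uniform(0.0, 2 * math.pi)  # central angle between A and B
        hits[1] += 2 * r * abs(math.sin(phi / 2)) > threshold
        # III. Direction fixed, midpoint M uniformly distributed on the diameter FK.
        hits[2] += chord(rng.uniform(-r, r)) > threshold
    return [h / n_trials for h in hits]

p1, p2, p3 = bertrand()
print(p1, p2, p3)  # close to 1/4, 1/3, and 1/2 respectively
```

Each rule is a different experiment with its own equally likely outcomes, which is why the three estimates disagree.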
VALIDITY. We shall now discuss the value of the classical definition in the
determination of probabilistic data and as a working hypothesis.
A.	In many applications, the assumption that there are N equally likely
alternatives is well established through long experience. Equation (1-7) is then
accepted as self-evident. For example, “If a ball is selected at random from a box
containing m black and n white balls, the probability that it is white equals
$n/(m + n)$,” or, “If a call occurs at random in the time interval $(0, T)$, the
probability that it occurs in the interval $(t_1, t_2)$ equals $(t_2 - t_1)/T$.”
Such conclusions are, of course, valid and useful; however, their validity
rests on the meaning of the word random. The conclusion of the last example
that “the unknown probability equals $(t_2 - t_1)/T$” is not a consequence of the
“randomness” of the call. The two statements are merely equivalent and they
follow not from a priori reasoning but from past records of telephone calls.
B.	In a number of applications it is impossible to determine the probabilities of
various events by repeating the underlying experiment a sufficient number of
times. In such cases, we have no choice but to assume that certain alternatives
are equally likely and to determine the desired probabilities from (1-7). This
means that we use the classical definition as a working hypothesis. The hypothe-
sis is accepted if its observable consequences agree with experience, otherwise it
is rejected. We illustrate with an important example from statistical mechanics.
Example 1-4. Given n particles and m > n boxes, we place at random each
particle in one of the boxes. We wish to find the probability p that in n preselected
boxes, one and only one particle will be found.
Since we are interested only in the underlying assumptions, we shall only
state the results (the proof is assigned as Prob. 3-15). We also verify the solution
for n = 2 and m = 6. For this special case, the problem can be stated in terms of a
pair of dice: The m = 6 faces correspond to the m boxes and the n = 2 dice to the
n particles. We assume that the preselected faces (boxes) are 3 and 4.
The solution to this problem depends on the choice of possible and favorable
outcomes. We shall consider the following three celebrated cases:
Maxwell-Boltzmann statistics. If we accept as outcomes all possible ways of placing
n particles in m boxes distinguishing the identity of each particle, then
$$p = \frac{n!}{m^n}$$
For n = 2 and m = 6 the above yields p = 2/36. This is the probability for
getting 3,4 in the game of two dice.
Bose-Einstein statistics. If we assume that the particles are not distinguishable, that
is, if all their permutations count as one, then
$$p = \frac{(m-1)!\,n!}{(n+m-1)!}$$
For n = 2 and m = 6 this yields p = 1/21. Indeed, if we do not distinguish
between the two dice, then $N = 21$ and $N_{\mathcal{A}} = 1$ because the outcomes 3,4 and 4,3
are counted as one.
Fermi-Dirac statistics. If we do not distinguish between the particles and also we
assume that in each box we are allowed to place at most one particle, then
$$p = \frac{n!\,(m-n)!}{m!}$$
For n = 2 and m = 6 we obtain p = 1/15. This is the probability for 3,4 if we do
not distinguish between the dice and also we ignore the outcomes in which the two
numbers that show are equal.
One might argue, as indeed it was in the early years of statistical mechanics,
that only the first of these solutions is logical. The fact is that in the absence of
direct or indirect experimental evidence this argument cannot be supported. The
three models proposed are actually only hypotheses and the physicist accepts the
one whose consequences agree with experience.
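The three formulas of Example 1-4 can be checked against a direct count for the dice case n = 2, m = 6 with preselected faces 3 and 4. This is a minimal sketch added for illustration; the function names are not from the original text.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement, product
from math import factorial

def maxwell_boltzmann(n, m):
    return Fraction(factorial(n), m ** n)

def bose_einstein(n, m):
    return Fraction(factorial(m - 1) * factorial(n), factorial(n + m - 1))

def fermi_dirac(n, m):
    return Fraction(factorial(n) * factorial(m - n), factorial(m))

faces = range(1, 7)
target = (3, 4)  # the preselected boxes

def fraction_favorable(outcomes):
    # An outcome is favorable if the two numbers shown are 3 and 4 in any order.
    outcomes = list(outcomes)
    favorable = sum(1 for o in outcomes if tuple(sorted(o)) == target)
    return Fraction(favorable, len(outcomes))

mb = fraction_favorable(product(faces, repeat=2))                 # 36 ordered pairs
be = fraction_favorable(combinations_with_replacement(faces, 2))  # 21 unordered pairs
fd = fraction_favorable(combinations(faces, 2))                   # 15 pairs, no doubles

print(mb, be, fd)  # 1/18 1/21 1/15
```

The enumeration agrees with the closed forms; which of the three counting rules describes a physical system is, as the text stresses, an experimental question.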
C.	Suppose that we know the probability $P(\mathcal{A})$ of an event $\mathcal{A}$ in experiment 1
and the probability $P(\mathcal{B})$ of an event $\mathcal{B}$ in experiment 2. In general, from this
information we cannot determine the probability $P(\mathcal{A}\mathcal{B})$ that both events $\mathcal{A}$
and $\mathcal{B}$ will occur. However, if we know that the two experiments are independent,
then
$$P(\mathcal{A}\mathcal{B}) = P(\mathcal{A})P(\mathcal{B}) \tag{1-8}$$
In many cases, this independence can be established a priori by reasoning that
the outcomes of experiment 1 have no effect on the outcomes of experiment 2.
For example, if in the coin experiment the probability of heads equals 1/2 and
in the die experiment the probability of even equals 1/2, then we conclude
“logically” that if both experiments are performed, the probability that we get
heads on the coin and even on the die equals 1/2 × 1/2. Thus, as in (1-7), we
accept the validity of (1-8) as a logical necessity without recourse to (1-1) or to
any other direct evidence.
D.	The classical definition can be used as the basis of a deductive theory if we
accept (1-7) as an assumption. In this theory, no other assumptions are used and
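postulates (1-3) to (1-5) become theorems. Indeed, the first two postulates are
obvious and the third follows from (1-7) because, if the events $\mathcal{A}$ and $\mathcal{B}$ are
mutually exclusive, then $N_{\mathcal{A}+\mathcal{B}} = N_{\mathcal{A}} + N_{\mathcal{B}}$; hence
$$P(\mathcal{A} + \mathcal{B}) = \frac{N_{\mathcal{A}+\mathcal{B}}}{N} = \frac{N_{\mathcal{A}}}{N} + \frac{N_{\mathcal{B}}}{N} = P(\mathcal{A}) + P(\mathcal{B})$$
As we show in (2-25), however, this is only a very special case of the axiomatic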
approach to probability.
1-3 PROBABILITY AND INDUCTION
In the applications of the theory of probability we are faced with the following
question: Suppose that we know somehow from past observations the probability
$P(\mathcal{A})$ of an event $\mathcal{A}$ in a given experiment. What conclusion can we draw
about the occurrence of this event in a single future performance of this
experiment? (See also Sec. 9-1.)
We shall answer this question in two ways depending on the size of $P(\mathcal{A})$:
We shall give one kind of an answer if $P(\mathcal{A})$ is a number distinctly different
from 0 or 1, for example 0.6, and a different kind of an answer if $P(\mathcal{A})$ is close
to 0 or 1, for example 0.999. Although the boundary between these two cases is
not sharply defined, the corresponding answers are fundamentally different.
Case 1 Suppose that $P(\mathcal{A}) = 0.6$. In this case, the number 0.6 gives us
only a “certain degree of confidence that the event $\mathcal{A}$ will occur.” The known
probability is thus used as a “measure of our belief” about the occurrence of $\mathcal{A}$
in a single trial. This interpretation of $P(\mathcal{A})$ is subjective in the sense that it
cannot be verified experimentally. In a single trial, the event $\mathcal{A}$ will either
occur or will not occur. If it does not, this will not be a reason for questioning
the validity of the assumption that $P(\mathcal{A}) = 0.6$.
Case 2 Suppose, however, that $P(\mathcal{A}) = 0.999$. We can now state with
practical certainty that at the next trial the event $\mathcal{A}$ will occur. This conclusion
is objective in the sense that it can be verified experimentally. At the next trial
the event $\mathcal{A}$ must occur. If it does not, we must seriously doubt, if not outright
reject, the assumption that $P(\mathcal{A}) = 0.999$.
The boundary between these two cases, arbitrary though it is (0.9 or
0.99999?), establishes in a sense the line separating “soft” from “hard” scientific
conclusions. The theory of probability gives us the analytic tools (step 2) for
transforming the “subjective” statements of case 1 to the “objective” statements
of case 2. In the following, we explain briefly the underlying reasoning.
As we show in Chap. 3, the information that $P(\mathcal{A}) = 0.6$ leads to the
conclusion that if the experiment is performed 1000 times, then “almost
certainly” the number of times the event $\mathcal{A}$ will occur is between 550 and 650.
This is shown by considering the repetition of the original experiment 1000
times as a single outcome of a new experiment. In this experiment the probability
of the event
$$\mathcal{A}_1 = \{\text{the number of times } \mathcal{A} \text{ occurs is between 550 and 650}\}$$
equals 0.999 (see Prob. 3-6). We must, therefore, conclude that (case 2) the
event $\mathcal{A}_1$ will occur with practical certainty.
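The figure 0.999 can be checked numerically. The sketch below is an illustration added here; it assumes the 1000 repetitions are independent trials with the same probability 0.6 (a binomial model), and reads "between 550 and 650" as inclusive. Both are assumptions of this example.

```python
from fractions import Fraction
from math import comb

def prob_count_between(n, lo, hi, p_num=3, p_den=5):
    """Exact P(lo <= k <= hi) for k occurrences in n independent trials,
    each with probability p = p_num / p_den (binomial model, assumed here)."""
    q_num = p_den - p_num
    favorable = sum(comb(n, k) * p_num ** k * q_num ** (n - k)
                    for k in range(lo, hi + 1))
    return Fraction(favorable, p_den ** n)

p_a1 = float(prob_count_between(1000, 550, 650))
print(round(p_a1, 3))
```

Integer arithmetic is used so that the very large binomial coefficients do not cause floating-point overflow or rounding loss.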
We have thus succeeded, using the theory of probability, in transforming the
“subjective” conclusion about $\mathcal{A}$ based on the given information that $P(\mathcal{A}) =
0.6$, to the “objective” conclusion about $\mathcal{A}_1$ based on the derived conclusion
that $P(\mathcal{A}_1) = 0.999$. We should emphasize, however, that both conclusions rely
on inductive reasoning. Their difference, although significant, is only quantita-
tive. As in case 1, the “objective” conclusion of case 2 is not a certainty but only
an inference. This, however, should not surprise us; after all, no prediction
about future events based on past experience can be accepted as logical
certainty.
Our inability to make categorical statements about future events is not
limited to probability but applies to all sciences. Consider, for example, the
development of classical mechanics. It was observed that bodies fall according
to certain patterns, and on this evidence Newton formulated the laws of
mechanics and used them to predict future events. His predictions, however,
are not logical certainties but only plausible inferences. To “prove” that the
future will evolve in the predicted manner we must invoke metaphysical causes.
1-4 CAUSALITY VERSUS RANDOMNESS
We conclude with a brief comment on the apparent controversy between
causality and randomness. There is no conflict between causality and random-
ness or between determinism and probability if we agree, as we must, that
scientific theories are not discoveries of the laws of nature but rather inventions
of the human mind. Their consequences are presented in deterministic form if
we examine the results of a single trial; they are presented as probabilistic
statements if we are interested in averages of many trials. In both cases, all
statements are qualified. In the first case, the uncertainties are of the form
“with certain errors and in certain ranges of the relevant parameters”; in the
FIGURE 1-2
second, “with a high degree of certainty if the number of trials is large enough.”
In the next example, we illustrate these two approaches.
Example 1-5. A rocket leaves the ground with an initial velocity $v$ forming an
angle $\theta$ with the horizontal axis (Fig. 1-2). We shall determine the distance
$d = OB$ from the origin to the reentry point B.
From Newton’s law it follows that
$$d = \frac{v^2}{g} \sin 2\theta \tag{1-9}$$
The above seems to be an unqualified consequence of a causal law; however,
this is not so. The result is approximate and it can be given a probabilistic
interpretation.
Indeed, (1-9) is not the solution of a real problem but of an idealized model
in which we have neglected air friction, air pressure, variation of g, and other
uncertainties in the values of $v$ and $\theta$. We must, therefore, accept (1-9) only with
qualifications. It holds within an error $\varepsilon$ provided that the neglected factors are
smaller than $\delta$.
Suppose now that the reentry area consists of numbered holes and we want
to find the reentry hole. Because of the uncertainties in $v$ and $\theta$, we are in no
position to give a deterministic answer to our problem. We can, however, ask a
different question: If many rockets, nominally with the same velocity, are launched,
what percentage will enter the nth hole? This question no longer has a causal
answer; it can only be given a random interpretation.
Thus the same physical problem can be subjected either to a deterministic or
to a probabilistic analysis. One might argue that the problem is inherently
deterministic because the rocket has a precise velocity even if we do not know it. If
we did, we would know exactly the reentry hole. Probabilistic interpretations are,
therefore, necessary because of our ignorance.
Such arguments can be answered with the statement that the physicists are
not concerned with what is true but only with what they can observe.
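The two readings of Example 1-5 can be put side by side in a short simulation. This is a sketch under stated assumptions, not the book's method: the Gaussian scatter in $v$ and $\theta$, the nominal values, the hole width, and the names `rocket_range` and `hole_frequencies` are all hypothetical choices made for illustration.

```python
import math
import random

G = 9.81  # m/s^2, treated as constant (one of the idealizations behind (1-9))

def rocket_range(v, theta):
    """Deterministic answer: d = (v**2 / g) * sin(2 * theta), as in (1-9)."""
    return v * v / G * math.sin(2 * theta)

def hole_frequencies(n_rockets=10_000, v0=100.0, theta0=math.radians(45),
                     sigma_v=1.0, sigma_theta=math.radians(0.5),
                     hole_width=10.0, seed=0):
    """Probabilistic answer: fraction of rockets entering each numbered hole.
    The Gaussian perturbations of v and theta are an assumption of this sketch."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_rockets):
        d = rocket_range(rng.gauss(v0, sigma_v), rng.gauss(theta0, sigma_theta))
        hole = int(d // hole_width)  # holes are numbered by distance from the origin
        counts[hole] = counts.get(hole, 0) + 1
    return {hole: c / n_rockets for hole, c in sorted(counts.items())}

nominal = rocket_range(100.0, math.radians(45))
freqs = hole_frequencies()
print(round(nominal, 1), max(freqs, key=freqs.get))
```

The first number answers the single-trial question within the stated idealizations; the dictionary `freqs` answers the question about percentages over many launches.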
CONCLUDING REMARKS
In this book, we present a deductive theory (step 2) based on the axiomatic
definition of probability. Occasionally, we use the classical definition but only to
determine probabilistic data (step 1).
To show the link between theory and applications (step 3), we give also a
relative frequency interpretation of the important results. This part of the book,
written in small print under the title Frequency interpretation, does not obey the
rules of deductive reasoning on which the theory is based.
CHAPTER
2
THE AXIOMS
OF PROBABILITY
2-1 SET THEORY
A set is a collection of objects called elements. For example, “car, apple,
pencil” is a set whose elements are a car, an apple, and a pencil. The set
“heads, tails” has two elements. The set “1, 2, 3, 5” has four elements.
A subset $\mathcal{B}$ of a set $\mathcal{A}$ is another set whose elements are also elements of
$\mathcal{A}$. All sets under consideration will be subsets of a set $\mathcal{S}$, which we shall call
space.
The elements of a set will be identified mostly by the Greek letter $\zeta$. Thus
$$\mathcal{A} = \{\zeta_1, \ldots, \zeta_n\} \tag{2-1}$$
will mean that the set $\mathcal{A}$ consists of the elements $\zeta_1, \ldots, \zeta_n$. We shall also
identify sets by the properties of their elements. Thus
$$\mathcal{A} = \{\text{all positive integers}\} \tag{2-2}$$
will mean the set whose elements are the numbers 1, 2, 3, ....
The notation
$$\zeta_i \in \mathcal{A} \qquad \zeta_i \notin \mathcal{A}$$
will mean that $\zeta_i$ is or is not an element of $\mathcal{A}$.
The empty or null set is by definition the set that contains no elements.
This set will be denoted by $\{\emptyset\}$.
If a set consists of n elements, then the total number of its subsets
equals $2^n$.
FIGURE 2-2
Note In probability theory, we assign probabilities to the subsets (events) of $\mathcal{S}$ and we
define various functions (random variables) whose domain consists of the elements of
$\mathcal{S}$. We must be careful, therefore, to distinguish between the element $\zeta$ and the set $\{\zeta\}$
consisting of the single element $\zeta$.
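Example 2-1. We shall denote by $f_i$ the faces of a die. These faces are the
elements of the set $\mathcal{S} = \{f_1, \ldots, f_6\}$. In this case, n = 6; hence $\mathcal{S}$ has $2^6 = 64$
subsets:
$$\{\emptyset\}, \{f_1\}, \ldots, \{f_1, f_2\}, \ldots, \{f_1, f_2, f_3\}, \ldots, \mathcal{S}$$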
In general, the elements of a set are arbitrary objects. For example, the 64
subsets of the set $\mathcal{S}$ in the above example can be considered as the elements of
another set. In Example 2-2, the elements of $\mathcal{S}$ are pairs of objects. In
Example 2-3, $\mathcal{S}$ is the set of points in the square of Fig. 2-1.
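The count of $2^n$ subsets, and the remark that subsets can themselves be elements of another set, can be illustrated as follows. This is a minimal sketch added here; the helper name `subsets` and the string labels for the faces are illustrative choices.

```python
from itertools import chain, combinations

def subsets(elements):
    """All subsets of a finite set, each returned as a frozenset."""
    items = list(elements)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

die = {"f1", "f2", "f3", "f4", "f5", "f6"}
die_subsets = subsets(die)
print(len(die_subsets))  # 64

# The 64 subsets can themselves serve as the elements of another set.
set_of_subsets = set(die_subsets)
print(frozenset() in set_of_subsets, frozenset(die) in set_of_subsets)  # True True
```

Frozen sets are used because ordinary Python sets are not hashable and so cannot be elements of another set.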
Example 2-2. Suppose that a coin is tossed twice. The resulting outcomes are the
four objects hh, ht, th, tt forming the set
$$\mathcal{S} = \{hh, ht, th, tt\}$$
where hh is an abbreviation for the element “heads-heads.” The set $\mathcal{S}$ has
$2^4 = 16$ subsets. For example,
$$\mathcal{A} = \{\text{heads at the first toss}\} = \{hh, ht\}$$
$$\mathcal{B} = \{\text{only one head showed}\} = \{ht, th\}$$
$$\mathcal{C} = \{\text{heads shows at least once}\} = \{hh, ht, th\}$$
In the first equality, the sets $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ are represented by their properties as
in (2-2); in the second, in terms of their elements as in (2-1).
Example 2-3. In this example, $\mathcal{S}$ is the set of all points in the square of Fig. 2-1.
Its elements are all ordered pairs of numbers $(x, y)$ where
$$0 \le x \le T \qquad 0 \le y \le T$$
FIGURE 2-3
FIGURE 2-4
The shaded area is a subset $\mathcal{A}$ of $\mathcal{S}$ consisting of all points $(x, y)$ such that
$-b \le x - y \le a$. The notation
$$\mathcal{A} = \{-b \le x - y \le a\}$$
describes $\mathcal{A}$ in terms of the properties of $x$ and $y$ as in (2-2).
Set Operations
In the following, we shall represent a set $\mathcal{S}$ and its subsets by plane figures as
in Fig. 2-2 (Venn diagrams).
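The notation $\mathcal{B} \subset \mathcal{A}$ or $\mathcal{A} \supset \mathcal{B}$ will mean that $\mathcal{B}$ is a subset of $\mathcal{A}$, that
is, that every element of $\mathcal{B}$ is an element of $\mathcal{A}$. Thus, for any $\mathcal{A}$,
$$\{\emptyset\} \subset \mathcal{A} \subset \mathcal{A} \subset \mathcal{S}$$
Transitivity If $\mathcal{C} \subset \mathcal{B}$ and $\mathcal{B} \subset \mathcal{A}$ then $\mathcal{C} \subset \mathcal{A}$
Equality $\mathcal{A} = \mathcal{B}$ iff† $\mathcal{A} \subset \mathcal{B}$ and $\mathcal{B} \subset \mathcal{A}$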
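Unions and intersections The sum or union of two sets $\mathcal{A}$ and $\mathcal{B}$ is a set
whose elements are all elements of $\mathcal{A}$ or of $\mathcal{B}$ or of both (Fig. 2-3). This set
will be written in the form
$$\mathcal{A} + \mathcal{B} \quad \text{or} \quad \mathcal{A} \cup \mathcal{B}$$
The above operation is commutative and associative:
$$\mathcal{A} + \mathcal{B} = \mathcal{B} + \mathcal{A} \qquad (\mathcal{A} + \mathcal{B}) + \mathcal{C} = \mathcal{A} + (\mathcal{B} + \mathcal{C})$$
We note that, if $\mathcal{B} \subset \mathcal{A}$, then $\mathcal{A} + \mathcal{B} = \mathcal{A}$. From this it follows that
$$\mathcal{A} + \mathcal{A} = \mathcal{A} \qquad \mathcal{A} + \{\emptyset\} = \mathcal{A} \qquad \mathcal{S} + \mathcal{A} = \mathcal{S}$$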
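The product or intersection of two sets $\mathcal{A}$ and $\mathcal{B}$ is a set consisting of all
elements that are common to the sets $\mathcal{A}$ and $\mathcal{B}$ (Fig. 2-3). This set is written in
the form
$$\mathcal{A}\mathcal{B} \quad \text{or} \quad \mathcal{A} \cap \mathcal{B}$$
The above operation is commutative, associative, and distributive (Fig. 2-4):
$$\mathcal{A}\mathcal{B} = \mathcal{B}\mathcal{A} \qquad (\mathcal{A}\mathcal{B})\mathcal{C} = \mathcal{A}(\mathcal{B}\mathcal{C}) \qquad \mathcal{A}(\mathcal{B} + \mathcal{C}) = \mathcal{A}\mathcal{B} + \mathcal{A}\mathcal{C}$$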
†Iff is an abbreviation for if and only if.
FIGURE 2-5
FIGURE 2-6
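We note that if $\mathcal{A} \subset \mathcal{B}$, then $\mathcal{A}\mathcal{B} = \mathcal{A}$. Hence
$$\mathcal{A}\mathcal{A} = \mathcal{A} \qquad \{\emptyset\}\mathcal{A} = \{\emptyset\} \qquad \mathcal{A}\mathcal{S} = \mathcal{A}$$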
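Note If two sets $\mathcal{A}$ and $\mathcal{B}$ are described by the properties of their elements as in (2-2),
then their intersection $\mathcal{A}\mathcal{B}$ will be specified by including these properties in braces. For
example, if
$$\mathcal{S} = \{1, 2, 3, 4, 5, 6\} \qquad \mathcal{A} = \{\text{even}\} \qquad \mathcal{B} = \{\text{less than 5}\}$$
then†
$$\mathcal{A}\mathcal{B} = \{\text{even, less than 5}\} = \{2, 4\} \tag{2-3}$$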
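Mutually exclusive sets Two sets $\mathcal{A}$ and $\mathcal{B}$ are called mutually exclusive
or disjoint if they have no common elements, that is, if
$$\mathcal{A}\mathcal{B} = \{\emptyset\}$$
Several sets $\mathcal{A}_1, \mathcal{A}_2, \ldots$ are called mutually exclusive if
$$\mathcal{A}_i\mathcal{A}_j = \{\emptyset\} \quad \text{for every } i \text{ and } j \ne i$$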
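Partitions A partition $\mathfrak{U}$ of a set $\mathcal{S}$ is a collection of mutually exclusive
subsets $\mathcal{A}_i$ of $\mathcal{S}$ whose union equals $\mathcal{S}$ (Fig. 2-5).
$$\mathcal{A}_1 + \cdots + \mathcal{A}_n = \mathcal{S} \qquad \mathcal{A}_i\mathcal{A}_j = \{\emptyset\} \quad i \ne j \tag{2-4}$$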
All partitions will be denoted by boldface German script (Fraktur) letters. Thus
†We should stress the difference in the meaning of commas in (2-1) and (2-3). In (2-1) the braces
include all elements $\zeta_i$ and
$$\{\zeta_1, \ldots, \zeta_n\} = \{\zeta_1\} \cup \cdots \cup \{\zeta_n\}$$
is the union of the sets $\{\zeta_i\}$. In (2-3) the braces include the properties of the sets {even} and {less
than 5}, and
$$\{\text{even, less than 5}\} = \{\text{even}\} \cap \{\text{less than 5}\}$$
is the intersection of the sets {even} and {less than 5}.
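Complements The complement $\overline{\mathcal{A}}$ of a set $\mathcal{A}$ is the set consisting of all
elements of $\mathcal{S}$ that are not in $\mathcal{A}$ (Fig. 2-6). From the definition it follows that
$$\mathcal{A} + \overline{\mathcal{A}} = \mathcal{S} \qquad \mathcal{A}\overline{\mathcal{A}} = \{\emptyset\} \qquad \overline{\overline{\mathcal{A}}} = \mathcal{A} \qquad \overline{\mathcal{S}} = \{\emptyset\} \qquad \overline{\{\emptyset\}} = \mathcal{S}$$
If $\mathcal{B} \subset \mathcal{A}$ then $\overline{\mathcal{B}} \supset \overline{\mathcal{A}}$; if $\mathcal{A} = \mathcal{B}$ then $\overline{\mathcal{A}} = \overline{\mathcal{B}}$.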
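The operations defined so far map directly onto finite sets in code. The sketch below is an added illustration, using the die space with the sets {even} and {less than 5}; the helper `complement` is a name chosen here.

```python
S = frozenset(range(1, 7))                  # the space: faces of a die
A = frozenset(x for x in S if x % 2 == 0)   # {even}
B = frozenset(x for x in S if x < 5)        # {less than 5}

def complement(X, space=S):
    """Elements of the space that are not in X."""
    return space - X

union = A | B          # the sum A + B
intersection = A & B   # the product AB

print(sorted(union), sorted(intersection))  # [1, 2, 3, 4, 6] [2, 4]
print(A | complement(A) == S, A & complement(A) == frozenset())  # True True
```

Here `|` plays the role of the sum and `&` the role of the product in the text's notation.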
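De Morgan's law Clearly (see Fig. 2-7)
$$\overline{\mathcal{A} + \mathcal{B}} = \overline{\mathcal{A}}\,\overline{\mathcal{B}} \qquad \overline{\mathcal{A}\mathcal{B}} = \overline{\mathcal{A}} + \overline{\mathcal{B}} \tag{2-5}$$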
Repeated application of (2-5) leads to the following:
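If in a set identity we replace all sets by their complements, all unions by
intersections, and all intersections by unions, the identity is preserved.
We shall demonstrate the above using as example the identity
$$\mathcal{A}(\mathcal{B} + \mathcal{C}) = \mathcal{A}\mathcal{B} + \mathcal{A}\mathcal{C} \tag{2-6}$$
From (2-5) it follows that
$$\overline{\mathcal{A}(\mathcal{B} + \mathcal{C})} = \overline{\mathcal{A}} + \overline{\mathcal{B} + \mathcal{C}} = \overline{\mathcal{A}} + \overline{\mathcal{B}}\,\overline{\mathcal{C}}$$
Similarly,
$$\overline{\mathcal{A}\mathcal{B} + \mathcal{A}\mathcal{C}} = (\overline{\mathcal{A}\mathcal{B}})(\overline{\mathcal{A}\mathcal{C}}) = (\overline{\mathcal{A}} + \overline{\mathcal{B}})(\overline{\mathcal{A}} + \overline{\mathcal{C}})$$
and since the two sides of (2-6) are equal, their complements are also equal.
Hence
$$\overline{\mathcal{A}} + \overline{\mathcal{B}}\,\overline{\mathcal{C}} = (\overline{\mathcal{A}} + \overline{\mathcal{B}})(\overline{\mathcal{A}} + \overline{\mathcal{C}}) \tag{2-7}$$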
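Duality principle As we know, $\overline{\mathcal{S}} = \{\emptyset\}$ and $\overline{\{\emptyset\}} = \mathcal{S}$. Furthermore, if in
an identity like (2-7) all overbars are removed, the identity is preserved. This
leads to the following version of De Morgan’s law:
If in a set identity we replace all unions by intersections, all intersections
by unions, and the sets $\mathcal{S}$ and $\{\emptyset\}$ by the sets $\{\emptyset\}$ and $\mathcal{S}$, the identity is
preserved.
Applying the above to the identities
$$\mathcal{A}(\mathcal{B} + \mathcal{C}) = \mathcal{A}\mathcal{B} + \mathcal{A}\mathcal{C} \qquad \mathcal{S} + \mathcal{A} = \mathcal{S}$$
we obtain the identities
$$\mathcal{A} + \mathcal{B}\mathcal{C} = (\mathcal{A} + \mathcal{B})(\mathcal{A} + \mathcal{C}) \qquad \{\emptyset\}\mathcal{A} = \{\emptyset\}$$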
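De Morgan's law and the dual distributive identity can be verified exhaustively on a small space. This is only a finite check on an assumed four-element space, not a proof; the names `subsets` and `comp` are illustrative.

```python
from itertools import chain, combinations

S = frozenset("abcd")  # a small space chosen for the check

def subsets(space):
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def comp(X):
    """Complement with respect to S."""
    return S - X

all_sets = subsets(S)

# De Morgan's law (2-5) for every pair of subsets.
de_morgan = all(comp(A | B) == comp(A) & comp(B) and
                comp(A & B) == comp(A) | comp(B)
                for A in all_sets for B in all_sets)

# Dual of the distributive law: A + BC = (A + B)(A + C).
dual_distributive = all((A | (B & C)) == (A | B) & (A | C)
                        for A in all_sets for B in all_sets for C in all_sets)

print(de_morgan, dual_distributive)  # True True
```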
2-2 PROBABILITY SPACE
In probability theory, the following set terminology is used: The space $\mathcal{S}$ is
called the certain event, its elements experimental outcomes, and its subsets
events. The empty set $\{\emptyset\}$ is the impossible event, and the event $\{\zeta_i\}$ consisting of
a single element $\zeta_i$ is an elementary event. All events will be identified by script
letters.
In the applications of probability theory to physical problems, the identi-
fication of experimental outcomes is not always unique. We shall illustrate this
ambiguity with the die experiment as might be interpreted by players X, Y,
and Z.
X says that the outcomes of this experiment are the six faces of the die
forming the space $\mathcal{S} = \{f_1, \ldots, f_6\}$. This space has $2^6 = 64$ subsets and the
event {even} consists of the three outcomes $f_2$, $f_4$, and $f_6$.
Y wants to bet on even or odd only. He argues, therefore, that the
experiment has only the two outcomes even and odd forming the space
$\mathcal{S} = \{\text{even}, \text{odd}\}$. This space has only $2^2 = 4$ subsets and the event {even}
consists of a single outcome.
Z bets that one will show and the die will rest on the left side of the table.
He maintains, therefore, that the experiment has infinitely many outcomes
specified by the coordinates of its center and by the six faces. The event {even}
consists not of one or of three outcomes but of infinitely many.
In the following, when we talk about an experiment, we shall assume that
its outcomes are clearly identified. In the die experiment, for example, $\mathcal{S}$ will
be the set consisting of the six faces $f_1, \ldots, f_6$.
In the relative frequency interpretation of various results, we shall use the
following terminology.
Trial A single performance of an experiment will be called a trial. At
each trial we observe a single outcome $\zeta_i$. We say that an event $\mathcal{A}$ occurs
during this trial if it contains the element $\zeta_i$. The certain event occurs at every
trial and the impossible event never occurs. The event $\mathcal{A} + \mathcal{B}$ occurs when $\mathcal{A}$
or $\mathcal{B}$ or both occur. The event $\mathcal{A}\mathcal{B}$ occurs when both events $\mathcal{A}$ and $\mathcal{B}$ occur.
If the events $\mathcal{A}$ and $\mathcal{B}$ are mutually exclusive and $\mathcal{A}$ occurs, then $\mathcal{B}$ does not
occur. If $\mathcal{A} \subset \mathcal{B}$ and $\mathcal{A}$ occurs, then $\mathcal{B}$ occurs. At each trial, either $\mathcal{A}$ or $\overline{\mathcal{A}}$
occurs.
If, for example, in the die experiment we observe the outcome $f_5$, then the
event $\{f_5\}$, the event {odd}, and 30 other events occur.
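The count of 32 events that occur when $f_5$ is observed can be confirmed by listing every event that contains $f_5$. A minimal sketch, with face labels chosen for illustration:

```python
from itertools import combinations

faces = ["f1", "f2", "f3", "f4", "f5", "f6"]

# All 64 events (subsets) of the die space.
events = [frozenset(c) for k in range(len(faces) + 1)
          for c in combinations(faces, k)]

# An event occurs at a trial if it contains the observed outcome.
occurring = [e for e in events if "f5" in e]
print(len(occurring))  # 32
```

The 32 events are $\{f_5\}$, {odd}, and the 30 others mentioned in the text.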
The Axioms
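We assign to each event $\mathcal{A}$ a number $P(\mathcal{A})$ which we call the probability of the
event $\mathcal{A}$. This number is so chosen as to satisfy the following three conditions:
$$\text{I} \qquad P(\mathcal{A}) \ge 0 \tag{2-8}$$
$$\text{II} \qquad P(\mathcal{S}) = 1 \tag{2-9}$$
$$\text{III} \qquad \text{if } \mathcal{A}\mathcal{B} = \{\emptyset\} \text{ then } P(\mathcal{A} + \mathcal{B}) = P(\mathcal{A}) + P(\mathcal{B}) \tag{2-10}$$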
These conditions are the axioms of the theory of probability. In the
development of the theory, all conclusions are based directly or indirectly on the
axioms and only on the axioms. The following are simple consequences.
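Properties. The probability of the impossible event is 0:
$$P\{\emptyset\} = 0 \tag{2-11}$$
Indeed, $\mathcal{A}\{\emptyset\} = \{\emptyset\}$ and $\mathcal{A} + \{\emptyset\} = \mathcal{A}$; therefore [see (2-10)]
$$P(\mathcal{A}) = P(\mathcal{A} + \{\emptyset\}) = P(\mathcal{A}) + P\{\emptyset\}$$
For any $\mathcal{A}$,
$$P(\mathcal{A}) = 1 - P(\overline{\mathcal{A}}) \le 1 \tag{2-12}$$
because $\mathcal{A} + \overline{\mathcal{A}} = \mathcal{S}$ and $\mathcal{A}\overline{\mathcal{A}} = \{\emptyset\}$; hence
$$1 = P(\mathcal{S}) = P(\mathcal{A} + \overline{\mathcal{A}}) = P(\mathcal{A}) + P(\overline{\mathcal{A}})$$
For any $\mathcal{A}$ and $\mathcal{B}$,
$$P(\mathcal{A} + \mathcal{B}) = P(\mathcal{A}) + P(\mathcal{B}) - P(\mathcal{A}\mathcal{B}) \le P(\mathcal{A}) + P(\mathcal{B}) \tag{2-13}$$
To prove the above, we write the events $\mathcal{A} + \mathcal{B}$ and $\mathcal{B}$ as unions of two
mutually exclusive events:
$$\mathcal{A} + \mathcal{B} = \mathcal{A} + \overline{\mathcal{A}}\mathcal{B} \qquad \mathcal{B} = \mathcal{A}\mathcal{B} + \overline{\mathcal{A}}\mathcal{B}$$
Therefore [see (2-10)]
$$P(\mathcal{A} + \mathcal{B}) = P(\mathcal{A}) + P(\overline{\mathcal{A}}\mathcal{B}) \qquad P(\mathcal{B}) = P(\mathcal{A}\mathcal{B}) + P(\overline{\mathcal{A}}\mathcal{B})$$
Eliminating $P(\overline{\mathcal{A}}\mathcal{B})$, we obtain (2-13).
Finally, if $\mathcal{B} \subset \mathcal{A}$, then
$$P(\mathcal{A}) = P(\mathcal{B}) + P(\mathcal{A}\overline{\mathcal{B}}) \ge P(\mathcal{B}) \tag{2-14}$$
because $\mathcal{A} = \mathcal{B} + \mathcal{A}\overline{\mathcal{B}}$ and $\mathcal{B}(\mathcal{A}\overline{\mathcal{B}}) = \{\emptyset\}$.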
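Properties (2-11) to (2-14) can be spot-checked on a concrete probability assignment. The sketch below assumes a fair die, so that each event's probability is its size divided by 6; that model and the particular events are assumptions of this example, and the check illustrates the properties rather than proving them.

```python
from fractions import Fraction

S = frozenset(range(1, 7))

def P(event):
    """Probability of an event (a subset of S) for a fair die (assumed model)."""
    return Fraction(len(event), len(S))

A = frozenset({2, 4, 6})      # even
B = frozenset({1, 2, 3, 4})   # less than 5
C = frozenset({2, 4})         # a subset of A

prop_2_11 = P(frozenset()) == 0
prop_2_12 = P(A) == 1 - P(S - A) and P(A) <= 1
prop_2_13 = P(A | B) == P(A) + P(B) - P(A & B) and P(A | B) <= P(A) + P(B)
prop_2_14 = P(A) == P(C) + P(A - C) and P(A) >= P(C)

print(prop_2_11, prop_2_12, prop_2_13, prop_2_14)  # True True True True
```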
Frequency interpretation The axioms of probability are so chosen that the resulting
theory gives a satisfactory representation of the physical world. Probabilities as used in
real problems must, therefore, be compatible with the axioms. Using the frequency
interpretation
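$$P(\mathcal{A}) = \frac{n_{\mathcal{A}}}{n}$$
of probability, we shall show that they do.
I.	Clearly, $P(\mathcal{A}) \ge 0$ because $n_{\mathcal{A}} \ge 0$ and $n > 0$.
II.	$P(\mathcal{S}) = 1$ because $\mathcal{S}$ occurs at every trial; hence $n_{\mathcal{S}} = n$.
III.	If $\mathcal{A}\mathcal{B} = \{\emptyset\}$, then $n_{\mathcal{A}+\mathcal{B}} = n_{\mathcal{A}} + n_{\mathcal{B}}$ because if $\mathcal{A} + \mathcal{B}$ occurs then $\mathcal{A}$ or $\mathcal{B}$
occurs but not both. Hence
$$P(\mathcal{A} + \mathcal{B}) = \frac{n_{\mathcal{A}+\mathcal{B}}}{n} = \frac{n_{\mathcal{A}}}{n} + \frac{n_{\mathcal{B}}}{n} = P(\mathcal{A}) + P(\mathcal{B})$$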
FIGURE 2-8
Equality of events. Two events $\mathcal{A}$ and $\mathcal{B}$ are called equal if they consist of the
same elements. They are called equal with probability 1 if the set
$$(\mathcal{A} + \mathcal{B})(\overline{\mathcal{A}\mathcal{B}}) = \mathcal{A}\overline{\mathcal{B}} + \overline{\mathcal{A}}\mathcal{B}$$
consisting of all outcomes that are in $\mathcal{A}$ or in $\mathcal{B}$ but not in $\mathcal{A}\mathcal{B}$ (shaded area
in Fig. 2-8) has zero probability.
From the definition it follows that (see Prob. 2-4) the events $\mathcal{A}$ and $\mathcal{B}$ are
equal with probability 1 iff
$$P(\mathcal{A}) = P(\mathcal{B}) = P(\mathcal{A}\mathcal{B}) \tag{2-15}$$
If $P(\mathcal{A}) = P(\mathcal{B})$ then we say that $\mathcal{A}$ and $\mathcal{B}$ are equal in probability. In
this case, no conclusion can be drawn about the probability of $\mathcal{A}\mathcal{B}$. In fact, the
events $\mathcal{A}$ and $\mathcal{B}$ might be mutually exclusive.
From (2-15) it follows that, if an event $\mathcal{N}$ equals the impossible event with
probability 1 then $P(\mathcal{N}) = 0$. This does not, of course, mean that $\mathcal{N} = \{\emptyset\}$.
The Class $\mathfrak{F}$ of Events
Events are subsets of $\mathcal{S}$ to which we have assigned probabilities. As we shall
presently explain, we shall not consider as events all subsets of $\mathcal{S}$ but only a
class $\mathfrak{F}$ of subsets.
One reason for this might be the nature of the application. In the die
experiment, for example, we might want to bet only on even or odd. In this case,
it suffices to consider as events only the four sets $\{\emptyset\}$, {even}, {odd}, and $\mathcal{S}$.
The main reason, however, for not including all subsets of $\mathcal{S}$ in the class
$\mathfrak{F}$ of events is of a mathematical nature: In certain cases involving sets with
infinitely many outcomes, it is impossible to assign probabilities to all subsets
satisfying all the axioms including the generalized form (2-21) of axiom III.
The class $\mathfrak{F}$ of events will not be an arbitrary collection of subsets of $\mathcal{S}$.
We shall assume that, if $\mathcal{A}$ and $\mathcal{B}$ are events, then $\mathcal{A} + \mathcal{B}$ and $\mathcal{A}\mathcal{B}$ are also
events. We do so because we will want to know not only the probabilities of
various events, but also the probabilities of their unions and intersections. This
leads to the concept of a field.
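FIELDS. A field $\mathfrak{F}$ is a nonempty class of sets such that:
$$\text{If } \mathcal{A} \in \mathfrak{F} \text{ then } \overline{\mathcal{A}} \in \mathfrak{F} \tag{2-16}$$
$$\text{If } \mathcal{A} \in \mathfrak{F} \text{ and } \mathcal{B} \in \mathfrak{F} \text{ then } \mathcal{A} + \mathcal{B} \in \mathfrak{F} \tag{2-17}$$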
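These two properties give a minimum set of conditions for $\mathfrak{F}$ to be a field.
All other properties follow:
$$\text{If } \mathcal{A} \in \mathfrak{F} \text{ and } \mathcal{B} \in \mathfrak{F} \text{ then } \mathcal{A}\mathcal{B} \in \mathfrak{F} \tag{2-18}$$
Indeed, from (2-16) it follows that $\overline{\mathcal{A}} \in \mathfrak{F}$ and $\overline{\mathcal{B}} \in \mathfrak{F}$. Applying (2-17) and
(2-16) to the sets $\overline{\mathcal{A}}$ and $\overline{\mathcal{B}}$, we conclude that
$$\overline{\mathcal{A}} + \overline{\mathcal{B}} \in \mathfrak{F} \qquad \overline{\overline{\mathcal{A}} + \overline{\mathcal{B}}} = \mathcal{A}\mathcal{B} \in \mathfrak{F}$$
A field contains the certain event and the impossible event:
$$\mathcal{S} \in \mathfrak{F} \qquad \{\emptyset\} \in \mathfrak{F} \tag{2-19}$$
Indeed, since $\mathfrak{F}$ is not empty, it contains at least one element $\mathcal{A}$; therefore [see
(2-16)] it also contains $\overline{\mathcal{A}}$. Hence
$$\mathcal{A} + \overline{\mathcal{A}} = \mathcal{S} \in \mathfrak{F} \qquad \mathcal{A}\overline{\mathcal{A}} = \{\emptyset\} \in \mathfrak{F}$$
From the above it follows that all sets that can be written as unions or
intersections of finitely many sets in $\mathfrak{F}$ are also in $\mathfrak{F}$. This is not, however,
necessarily the case for infinitely many sets.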
Borel fields. Suppose that $\mathcal{A}_1, \ldots, \mathcal{A}_n, \ldots$ is an infinite sequence of sets in $\mathfrak{F}$.
If the union and intersection of these sets also belong to $\mathfrak{F}$, then $\mathfrak{F}$ is called a
Borel field.
The class of all subsets of a set $\mathcal{S}$ is a Borel field. Suppose that $\mathfrak{C}$ is a
class of subsets of $\mathcal{S}$ that is not a field. Attaching to it other subsets of $\mathcal{S}$, all
subsets if necessary, we can form a field with $\mathfrak{C}$ as its subset. It can be shown
that there exists a smallest Borel field containing all the elements of $\mathfrak{C}$.
Example 2-4. Suppose that $\mathcal{S}$ consists of the four elements a, b, c, d and $\mathfrak{C}$
consists of the sets {a} and {b}. Attaching to $\mathfrak{C}$ the complements of {a} and {b} and
their unions and intersections, we conclude that the smallest field containing {a}
and {b} consists of the sets
$$\{\emptyset\} \quad \{a\} \quad \{b\} \quad \{a, b\} \quad \{c, d\} \quad \{b, c, d\} \quad \{a, c, d\} \quad \mathcal{S}$$
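For a finite space, the smallest field containing a given class of sets can be built by repeatedly attaching complements and unions until nothing new appears. The following is a minimal sketch of that closure for Example 2-4; the function name `smallest_field` is an assumption of this example, and intersections are obtained implicitly through (2-18).

```python
def smallest_field(space, generators):
    """Close a class of subsets of a finite space under complement and union."""
    space = frozenset(space)
    field = {frozenset(g) for g in generators}
    while True:
        new = {space - a for a in field} | {a | b for a in field for b in field}
        if new <= field:      # nothing new: the class is closed, hence a field
            return field
        field |= new

field = smallest_field("abcd", ["a", "b"])
for event in sorted(field, key=lambda e: (len(e), sorted(e))):
    print(sorted(event))
print(len(field))  # 8
```

The eight sets are the unions of the three "atoms" {a}, {b}, and {c, d}, which is why the count is $2^3$.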
Events. In probability theory, events are certain subsets of $\mathcal{S}$ forming a Borel
field. This permits us to assign probabilities not only to finite unions and
intersections of events, but also to their limits.
For the determination of probabilities of sets that can be expressed as
limits, the following extension of axiom III is necessary.
Repeated application of (2-10) leads to the conclusion that, if the events
$\mathcal{A}_1, \ldots, \mathcal{A}_n$ are mutually exclusive, then
$$P(\mathcal{A}_1 + \cdots + \mathcal{A}_n) = P(\mathcal{A}_1) + \cdots + P(\mathcal{A}_n) \tag{2-20}$$
The extension of the above to infinitely many sets does not follow from (2-10). It
is an additional condition known as the axiom of infinite additivity:
IIIa. If the events $\mathcal{A}_1, \mathcal{A}_2, \ldots$ are mutually exclusive, then
$$P(\mathcal{A}_1 + \mathcal{A}_2 + \cdots) = P(\mathcal{A}_1) + P(\mathcal{A}_2) + \cdots \tag{2-21}$$
We shall assume that all probabilities satisfy axioms I, II, III, and IIIa.
Axiomatic Definition of an Experiment
In the theory of probability, an experiment is specified in terms of the following
concepts:
1.	The set $\mathcal{S}$ of all experimental outcomes.
2.	The Borel field of all events of $\mathcal{S}$.
3.	The probabilities of these events.
The letter $\mathcal{S}$ will be used to identify not only the certain event, but also
the entire experiment.
We discuss next the determination of probabilities in experiments with
finitely many and infinitely many elements.
Countable spaces. If the space $\mathcal{S}$ consists of $N$ outcomes and $N$ is a finite
number, then the probabilities of all events can be expressed in terms of the
probabilities
$$P\{\zeta_i\} = p_i$$
of the elementary events $\{\zeta_i\}$. From the axioms it follows, of course, that the
numbers $p_i$ must be nonnegative and their sum must equal 1:
$$p_i \ge 0 \qquad p_1 + \cdots + p_N = 1 \tag{2-22}$$
Suppose that $\mathcal{A}$ is an event consisting of the $r$ elements $\zeta_{k_i}$. In this case, $\mathcal{A}$
can be written as the union of the elementary events $\{\zeta_{k_i}\}$. Hence [see (2-20)]
$$P(\mathcal{A}) = P\{\zeta_{k_1}\} + \cdots + P\{\zeta_{k_r}\} = p_{k_1} + \cdots + p_{k_r} \tag{2-23}$$
The above is true even if $\mathcal{S}$ consists of an infinite but countable number
of elements $\zeta_1, \zeta_2, \ldots$ [see (2-21)].
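As a small sketch of (2-22) and (2-23), the snippet below stores elementary probabilities in a dictionary and sums them over an event. The four outcomes and their probabilities are made-up values chosen for illustration, not data from the text.

```python
from fractions import Fraction

# Hypothetical elementary probabilities p_i for a four-outcome space (an assumption).
p = {
    "z1": Fraction(1, 2),
    "z2": Fraction(1, 4),
    "z3": Fraction(1, 8),
    "z4": Fraction(1, 8),
}

# Conditions (2-22): nonnegative numbers that sum to 1.
assert all(v >= 0 for v in p.values()) and sum(p.values()) == 1

def prob(event):
    """P(A) as the sum of the elementary probabilities of its outcomes, as in (2-23)."""
    return sum(p[z] for z in event)

print(prob({"z1", "z3"}))
```

Exact fractions are used so that the check of (2-22) is not disturbed by rounding.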
Classical definition If $\mathcal{S}$ consists of $N$ outcomes and the probabilities $p_i$
of the elementary events are all equal, then
$$p_i = \frac{1}{N} \tag{2-24}$$
In this case, the probability of an event $\mathcal{A}$ consisting of $r$ elements equals $r/N$:
$$P(\mathcal{A}) = \frac{r}{N} \tag{2-25}$$
This very special but important case is equivalent to the classical definition
(1-7), with one important difference, however: In the classical definition, (2-25)
is deduced as a logical necessity; in the axiomatic development of probability,
(2-24), on which (2-25) is based, is a mere assumption.
Example 2-5. (a) In the coin experiment, the space $\mathcal{S}$ consists of the outcomes $h$
and $t$:
$$\mathcal{S} = \{h, t\}$$
and its events are the four sets $\{\emptyset\}, \{h\}, \{t\}, \mathcal{S}$. If $P\{h\} = p$ and $P\{t\} = q$, then
$p + q = 1$.
(b) We consider now the experiment of the toss of a coin three times. The
possible outcomes of this experiment are:
$$hhh, \ hht, \ hth, \ htt, \ thh, \ tht, \ tth, \ ttt$$
We shall assume that all elementary events have the same probability as in (2-24)
(fair coin). In this case, the probability of each elementary event equals 1/8. Thus
the probability $P\{hhh\}$ that we get three heads equals 1/8. The event
$$\{\text{heads at the first two tosses}\} = \{hhh, hht\}$$
consists of the two outcomes $hhh$ and $hht$; hence its probability equals 2/8.
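Part (b) of the example can be checked by brute-force enumeration. This is only an illustrative sketch under the fair-coin assumption (2-24); the variable names are not from the text.

```python
from fractions import Fraction
from itertools import product

# All sequences of three tosses: hhh, hht, ..., ttt.
outcomes = ["".join(w) for w in product("ht", repeat=3)]
p_elem = Fraction(1, len(outcomes))      # fair coin: each outcome has probability 1/8

# The event {heads at the first two tosses}.
first_two_heads = [w for w in outcomes if w.startswith("hh")]
p_event = len(first_two_heads) * p_elem  # r/N as in (2-25)

print(first_two_heads, p_event)
```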
The real line. If $\mathcal{S}$ consists of a noncountable infinity of elements, then its
probabilities cannot be determined in terms of the probabilities of the elementary
events. This is the case if $\mathcal{S}$ is the set of points in an $n$-dimensional space.
In fact, most applications can be presented in terms of events in such a space.
We shall discuss the determination of probabilities using as illustration the real
line.
Suppose that $\mathcal{S}$ is the set of all real numbers. Its subsets can be
considered as sets of points on the real line. It can be shown that it is impossible
to assign probabilities to all subsets of $\mathcal{S}$ so as to satisfy the axioms. To
construct a probability space on the real line, we shall consider as events all
intervals $x_1 \le x \le x_2$ and their countable unions and intersections. These
events form a field that can be specified as follows:
It is the smallest Borel field that includes all half-lines $x \le x_i$, where $x_i$ is
any number.
This field contains all open and closed intervals, all points, and, in fact,
every set of points on the real line that is of interest in the applications. One
might wonder whether this field does not include all subsets of $\mathcal{S}$. Actually, it is
possible to show that there exist sets of points on the real line that are not
countable unions and intersections of intervals. Such sets, however, are of no
interest in most applications. To complete the specification of $\mathcal{S}$, it suffices to
assign probabilities to the events $\{x \le x_i\}$. All other probabilities can then be
determined from the axioms.
Suppose that $\alpha(x)$ is a function such that (Fig. 2-9a)
$$\int_{-\infty}^{\infty} \alpha(x)\,dx = 1 \qquad \alpha(x) \ge 0 \tag{2-26}$$
We define the probability of the event $\{x \le x_i\}$ by the integral
$$P\{x \le x_i\} = \int_{-\infty}^{x_i} \alpha(x)\,dx \tag{2-27}$$
This specifies the probabilities of all events of $\mathcal{S}$. We maintain, for example,
that the probability of the event $\{x_1 < x \le x_2\}$ consisting of all points in the
interval $(x_1, x_2)$ is given by
$$P\{x_1 < x \le x_2\} = \int_{x_1}^{x_2} \alpha(x)\,dx \tag{2-28}$$
Indeed, the events $\{x \le x_1\}$ and $\{x_1 < x \le x_2\}$ are mutually exclusive and their
union equals $\{x \le x_2\}$. Hence [see (2-10)]
$$P\{x \le x_1\} + P\{x_1 < x \le x_2\} = P\{x \le x_2\}$$
and (2-28) follows from (2-27).
We note that, if the function $\alpha(x)$ is bounded, then the integral in (2-28)
tends to 0 as $x_1 \to x_2$. This leads to the conclusion that the probability of the
event $\{x_2\}$ consisting of the single outcome $x_2$ is 0 for every $x_2$. In this case, the
probability of all elementary events of $\mathcal{S}$ equals 0, although the probability of
their union equals 1. This is not in conflict with (2-21) because the total
number of elements of $\mathcal{S}$ is not countable.
Example 2-6. A radioactive substance is selected at $t = 0$ and the time $t$ of
emission of a particle is observed. This process defines an experiment whose
outcomes are all points on the positive $t$ axis. This experiment can be considered
as a special case of the real line experiment if we assume that $\mathcal{S}$ is the entire $t$
axis and all events on the negative axis have zero probability.
Suppose then that the function $\alpha(t)$ in (2-26) is given by (Fig. 2-9b)
$$\alpha(t) = c e^{-ct} U(t) \qquad U(t) = \begin{cases} 1 & t \ge 0 \\ 0 & t < 0 \end{cases}$$
Inserting into (2-28), we conclude that the probability that a particle will be
emitted in the time interval $(0, t_0)$ equals
$$c \int_0^{t_0} e^{-ct}\,dt = 1 - e^{-ct_0}$$
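The closed form above can be compared with a direct numerical integration of the density. The sketch below uses a simple midpoint rule; the values of `c` and `t0` are arbitrary illustrative choices, not values from the text.

```python
import math

def alpha(t, c):
    """Density of Example 2-6: c*exp(-c*t) for t >= 0, zero otherwise."""
    return c * math.exp(-c * t) if t >= 0 else 0.0

def integrate(f, a, b, steps=10_000):
    """Composite midpoint rule; adequate for a smooth integrand."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

c, t0 = 0.5, 3.0   # illustrative values (assumptions)
numeric = integrate(lambda t: alpha(t, c), 0.0, t0)
closed_form = 1.0 - math.exp(-c * t0)
print(numeric, closed_form)
```

The two printed numbers should agree to many decimal places.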
Example 2-7. A telephone call occurs at random in the interval $(0, T)$. This means
that the probability that it will occur in the interval $0 \le t \le t_0$ equals $t_0/T$. Thus
the outcomes of this experiment are all points in the interval $(0, T)$ and the
probability of the event {the call will occur in the interval $(t_1, t_2)$} equals
$$P\{t_1 \le t \le t_2\} = \frac{t_2 - t_1}{T}$$
This is again a special case of (2-28) with $\alpha(t) = 1/T$ for $0 \le t \le T$ and 0
otherwise (Fig. 2-9c).
Probability masses. The probability $P(\mathcal{A})$ of an event $\mathcal{A}$ can be interpreted as
the mass of the corresponding figure in its Venn diagram representation.
Various identities have similar interpretations. Consider, for example, the
identity $P(\mathcal{A} + \mathcal{B}) = P(\mathcal{A}) + P(\mathcal{B}) - P(\mathcal{A}\mathcal{B})$. The left side equals the mass
of the event $\mathcal{A} + \mathcal{B}$. In the sum $P(\mathcal{A}) + P(\mathcal{B})$, the mass of $\mathcal{A}\mathcal{B}$ is counted
twice (Fig. 2-3). To equate this sum with $P(\mathcal{A} + \mathcal{B})$, we must, therefore,
subtract $P(\mathcal{A}\mathcal{B})$.
2-3 CONDITIONAL PROBABILITY
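The conditional probability of an event $\mathcal{A}$ assuming $\mathcal{M}$, denoted by $P(\mathcal{A}|\mathcal{M})$, is
by definition the ratio
$$P(\mathcal{A}|\mathcal{M}) = \frac{P(\mathcal{A}\mathcal{M})}{P(\mathcal{M})} \tag{2-29}$$
where we assume that $P(\mathcal{M})$ is not 0.
The following properties follow readily from the definition:
$$\text{If } \mathcal{M} \subset \mathcal{A} \text{ then } P(\mathcal{A}|\mathcal{M}) = 1 \tag{2-30}$$
because then $\mathcal{A}\mathcal{M} = \mathcal{M}$. Similarly,
$$\text{if } \mathcal{A} \subset \mathcal{M} \text{ then } P(\mathcal{A}|\mathcal{M}) = \frac{P(\mathcal{A})}{P(\mathcal{M})} \ge P(\mathcal{A}) \tag{2-31}$$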
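Frequency interpretation Denoting by $n_{\mathcal{A}}$, $n_{\mathcal{M}}$, and $n_{\mathcal{A}\mathcal{M}}$ the number of occurrences
of the events $\mathcal{A}$, $\mathcal{M}$, and $\mathcal{A}\mathcal{M}$ respectively, we conclude from (1-1) that
$$P(\mathcal{A}) \simeq \frac{n_{\mathcal{A}}}{n} \qquad P(\mathcal{M}) \simeq \frac{n_{\mathcal{M}}}{n} \qquad P(\mathcal{A}\mathcal{M}) \simeq \frac{n_{\mathcal{A}\mathcal{M}}}{n}$$
Hence
$$P(\mathcal{A}|\mathcal{M}) = \frac{P(\mathcal{A}\mathcal{M})}{P(\mathcal{M})} \simeq \frac{n_{\mathcal{A}\mathcal{M}}/n}{n_{\mathcal{M}}/n} = \frac{n_{\mathcal{A}\mathcal{M}}}{n_{\mathcal{M}}} \tag{2-32}$$
This result can be phrased as follows: If we discard all trials in which the event $\mathcal{M}$ did
not occur and we retain only the subsequence of trials in which $\mathcal{M}$ occurred, then
$P(\mathcal{A}|\mathcal{M})$ equals the relative frequency of occurrence $n_{\mathcal{A}\mathcal{M}}/n_{\mathcal{M}}$ of the event $\mathcal{A}$ in that
subsequence.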
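Fundamental remark. We shall show that, for a specific $\mathcal{M}$, the conditional
probabilities are indeed probabilities; that is, they satisfy the axioms.
The first axiom is obviously satisfied because $P(\mathcal{A}\mathcal{M}) \ge 0$ and $P(\mathcal{M}) > 0$:
$$P(\mathcal{A}|\mathcal{M}) \ge 0 \tag{2-33}$$
The second follows from (2-30) because $\mathcal{M} \subset \mathcal{S}$:
$$P(\mathcal{S}|\mathcal{M}) = 1 \tag{2-34}$$
To prove the third, we observe that if the events $\mathcal{A}$ and $\mathcal{B}$ are mutually
exclusive, then (Fig. 2-10) the events $\mathcal{A}\mathcal{M}$ and $\mathcal{B}\mathcal{M}$ are also mutually exclusive.
Hence
$$P(\mathcal{A} + \mathcal{B}|\mathcal{M}) = \frac{P[(\mathcal{A} + \mathcal{B})\mathcal{M}]}{P(\mathcal{M})} = \frac{P(\mathcal{A}\mathcal{M}) + P(\mathcal{B}\mathcal{M})}{P(\mathcal{M})}$$
This yields the third axiom:
$$P(\mathcal{A} + \mathcal{B}|\mathcal{M}) = P(\mathcal{A}|\mathcal{M}) + P(\mathcal{B}|\mathcal{M}) \tag{2-35}$$
From the above it follows that all results involving probabilities hold also
for conditional probabilities. The significance of this conclusion will be appreciated
later.
FIGURE 2-10: if $\mathcal{A}\mathcal{B} = \{\emptyset\}$, then $(\mathcal{A}\mathcal{M})(\mathcal{B}\mathcal{M}) = \{\emptyset\}$
FIGURE 2-11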
Example 2-8. In the fair-die experiment, we shall determine the conditional
probability of the event $\{f_2\}$ assuming that the event even occurred. With
$$\mathcal{A} = \{f_2\} \qquad \mathcal{M} = \{\text{even}\} = \{f_2, f_4, f_6\}$$
we have $P(\mathcal{A}) = 1/6$ and $P(\mathcal{M}) = 3/6$. And since $\mathcal{A}\mathcal{M} = \mathcal{A}$, (2-29) yields
$$P\{f_2|\text{even}\} = \frac{P\{f_2\}}{P\{\text{even}\}} = \frac{1}{3}$$
This equals the relative frequency of the occurrence of the event {two} in the
subsequence whose outcomes are even numbers.
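The frequency statement can be illustrated with a short simulation. This is a sketch only: the trial count and the fixed seed are arbitrary choices, and the simulated ratio merely approximates the exact value obtained from (2-29).

```python
import random
from fractions import Fraction

faces = range(1, 7)
A = {2}                  # the event {f2}
M = {2, 4, 6}            # the event {even}

def P(event):
    """Fair die: each face has probability 1/6."""
    return Fraction(len(set(event) & set(faces)), 6)

exact = P(A & M) / P(M)  # definition (2-29)

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = [rng.randint(1, 6) for _ in range(100_000)]
n_M = sum(1 for f in trials if f in M)               # trials in which M occurred
n_AM = sum(1 for f in trials if f in A and f in M)   # trials in which A and M occurred
print(exact, n_AM / n_M)
```

Only the trials in which the outcome is even are retained in the denominator, which mirrors the subsequence described in (2-32).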
Example 2-9. We denote by $t$ the age of a person when he dies. The probability
that $t \le t_0$ is given by
$$P\{t \le t_0\} = \int_0^{t_0} \alpha(t)\,dt$$
where $\alpha(t)$ is a function determined from mortality records. We shall assume that
$$\alpha(t) = 3 \times 10^{-9}\, t^2 (100 - t)^2 \qquad 0 \le t \le 100 \text{ years}$$
and 0 otherwise (Fig. 2-11).
From (2-28) it follows that the probability that a person will die between the
ages of 60 and 70 equals
$$P\{60 \le t \le 70\} = \int_{60}^{70} \alpha(t)\,dt = 0.154$$
This equals the number of people who die between the ages of 60 and 70 divided
by the total population.
With
$$\mathcal{A} = \{60 \le t \le 70\} \qquad \mathcal{M} = \{t \ge 60\} \qquad \mathcal{A}\mathcal{M} = \mathcal{A}$$
it follows from (2-29) that the probability that a person will die between the ages of
60 and 70 assuming that he was alive at 60 equals
$$P\{60 \le t \le 70 \,|\, t \ge 60\} = \frac{\int_{60}^{70} \alpha(t)\,dt}{\int_{60}^{100} \alpha(t)\,dt} = 0.486$$
This equals the number of people who die between the ages 60 and 70 divided by
the number of people that are alive at age 60.
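The two numbers of Example 2-9 can be reproduced from the antiderivative of the assumed density. The sketch below takes $\alpha(t) = 3 \times 10^{-9} t^2 (100 - t)^2$ on $[0, 100]$ and expands the polynomial by hand; the function name `F` is introduced here for illustration.

```python
def F(t):
    """Antiderivative of 3e-9 * t^2 * (100 - t)^2, valid for 0 <= t <= 100.

    t^2 (100 - t)^2 = 1e4 t^2 - 200 t^3 + t^4, integrated term by term.
    """
    return 3e-9 * (1e4 * t**3 / 3 - 50 * t**4 + t**5 / 5)

p_60_70 = F(70) - F(60)        # P{60 <= t <= 70}
p_alive_60 = F(100) - F(60)    # P{t >= 60}
print(round(p_60_70, 3), round(p_60_70 / p_alive_60, 3))
```

As a consistency check, `F(100) - F(0)` should equal 1, which confirms that the assumed density satisfies (2-26).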
Example 2-10. A box contains three white balls $w_1, w_2, w_3$ and two red balls $r_1, r_2$.
We remove at random two balls in succession. What is the probability that the first
removed ball is white and the second is red?
We shall give two solutions to this problem. In the first, we apply (2-25); in
the second, we use conditional probabilities.
First solution. The space of our experiment consists of all ordered pairs that we
can form with the five balls:
$$w_1w_2 \quad w_1w_3 \quad w_1r_1 \quad w_1r_2 \quad \cdots \quad r_2w_1 \quad r_2w_2 \quad r_2w_3 \quad r_2r_1$$
The number of such pairs equals $5 \times 4 = 20$. The event {white first, red second}
consists of the six outcomes
$$w_1r_1 \quad w_1r_2 \quad w_2r_1 \quad w_2r_2 \quad w_3r_1 \quad w_3r_2$$
Hence [see (2-25)] its probability equals 6/20.
Second solution. Since the box contains three white and two red balls, the probability
of the event $\mathcal{W}_1$ = {white first} equals 3/5. If a white ball is removed, there remain
two white and two red balls; hence the conditional probability $P(\mathcal{R}_2|\mathcal{W}_1)$ of the
event $\mathcal{R}_2$ = {red second} assuming {white first} equals 2/4. From this and (2-29) it
follows that
$$P(\mathcal{W}_1\mathcal{R}_2) = P(\mathcal{R}_2|\mathcal{W}_1)P(\mathcal{W}_1) = \frac{2}{4} \times \frac{3}{5} = \frac{6}{20}$$
where $\mathcal{W}_1\mathcal{R}_2$ is the event {white first, red second}.
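Both solutions can be checked side by side. The sketch below enumerates the ordered pairs for the first solution and multiplies the two fractions for the second; the ball labels follow the example, and everything else is illustrative.

```python
from fractions import Fraction
from itertools import permutations

balls = ["w1", "w2", "w3", "r1", "r2"]
pairs = list(permutations(balls, 2))     # all ordered pairs: 5 * 4 = 20
favorable = [(a, b) for a, b in pairs if a[0] == "w" and b[0] == "r"]

p_count = Fraction(len(favorable), len(pairs))   # first solution, via (2-25)
p_cond = Fraction(2, 4) * Fraction(3, 5)         # second solution, P(R2|W1) P(W1)
print(len(pairs), len(favorable), p_count, p_cond)
```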
Total Probability and Bayes’ Theorem
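If $U = [\mathcal{A}_1, \ldots, \mathcal{A}_n]$ is a partition of $\mathcal{S}$ and $\mathcal{B}$ is an arbitrary event (Fig. 2-5),
then
$$P(\mathcal{B}) = P(\mathcal{B}|\mathcal{A}_1)P(\mathcal{A}_1) + \cdots + P(\mathcal{B}|\mathcal{A}_n)P(\mathcal{A}_n) \tag{2-36}$$
Proof. Clearly,
$$\mathcal{B} = \mathcal{B}\mathcal{S} = \mathcal{B}(\mathcal{A}_1 + \cdots + \mathcal{A}_n) = \mathcal{B}\mathcal{A}_1 + \cdots + \mathcal{B}\mathcal{A}_n$$
But the events $\mathcal{B}\mathcal{A}_i$ and $\mathcal{B}\mathcal{A}_j$ are mutually exclusive because the events $\mathcal{A}_i$
and $\mathcal{A}_j$ are mutually exclusive [see (2-4)]. Hence
$$P(\mathcal{B}) = P(\mathcal{B}\mathcal{A}_1) + \cdots + P(\mathcal{B}\mathcal{A}_n)$$
and (2-36) follows because [see (2-29)]
$$P(\mathcal{B}\mathcal{A}_i) = P(\mathcal{B}|\mathcal{A}_i)P(\mathcal{A}_i) \tag{2-37}$$
This result is known as the total probability theorem.
Since $P(\mathcal{B}\mathcal{A}_i) = P(\mathcal{A}_i|\mathcal{B})P(\mathcal{B})$ we conclude with (2-37) that
$$P(\mathcal{A}_i|\mathcal{B}) = P(\mathcal{B}|\mathcal{A}_i)\frac{P(\mathcal{A}_i)}{P(\mathcal{B})} \tag{2-38}$$
Inserting (2-36) into (2-38), we obtain Bayes' theorem†:
$$P(\mathcal{A}_i|\mathcal{B}) = \frac{P(\mathcal{B}|\mathcal{A}_i)P(\mathcal{A}_i)}{P(\mathcal{B}|\mathcal{A}_1)P(\mathcal{A}_1) + \cdots + P(\mathcal{B}|\mathcal{A}_n)P(\mathcal{A}_n)} \tag{2-39}$$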
†The main idea of this theorem is due to Thomas Bayes (1763). However, its final form (2-39) was
given by Laplace several years later.
Note The terms a priori and a posteriori are often used for the probabilities $P(\mathcal{A}_i)$ and $P(\mathcal{A}_i|\mathcal{B})$.
Example 2-11. We have four boxes. Box 1 contains 2000 components of which
5 percent are defective. Box 2 contains 500 components of which 40 percent are
defective. Boxes 3 and 4 contain 1000 each with 10 percent defective. We select at
random one of the boxes and we remove at random a single component.
(a)	What is the probability that the selected component is defective?
The space of this experiment consists of 4000 good (g) components and 500
defective (d) components arranged as follows:
Box 1:	1900g, 100d	Box 2: 300g, 200d
Box 3:	900g, 100d	Box 4: 900g, 100d
We denote by $\mathcal{B}_i$ the event consisting of all components in the $i$th box and
by $\mathcal{D}$ the event consisting of all defective components. Clearly,
$$P(\mathcal{B}_1) = P(\mathcal{B}_2) = P(\mathcal{B}_3) = P(\mathcal{B}_4) = \frac{1}{4} \tag{2-40}$$
because the boxes are selected at random. The probability that a component taken
from a specific box is defective equals the ratio of the defective to the total number
of components in that box. This means
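that
$$P(\mathcal{D}|\mathcal{B}_1) = \frac{100}{2000} = 0.05 \qquad P(\mathcal{D}|\mathcal{B}_2) = \frac{200}{500} = 0.4$$
$$P(\mathcal{D}|\mathcal{B}_3) = \frac{100}{1000} = 0.1 \qquad P(\mathcal{D}|\mathcal{B}_4) = \frac{100}{1000} = 0.1 \tag{2-41}$$
And since the events $\mathcal{B}_1, \mathcal{B}_2, \mathcal{B}_3, \mathcal{B}_4$ form a partition of $\mathcal{S}$, we conclude from
(2-36) that
$$P(\mathcal{D}) = 0.05 \times \tfrac{1}{4} + 0.4 \times \tfrac{1}{4} + 0.1 \times \tfrac{1}{4} + 0.1 \times \tfrac{1}{4} = 0.1625$$
This is the probability that the selected component is defective.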
(b)	We examine the selected component and we find it defective. On the
basis of this evidence, we want to determine the probability that it came from
box 2.
We now want the conditional probability $P(\mathcal{B}_2|\mathcal{D})$. Since
$$P(\mathcal{D}) = 0.1625 \qquad P(\mathcal{D}|\mathcal{B}_2) = 0.4 \qquad P(\mathcal{B}_2) = 0.25$$
(2-38) yields
$$P(\mathcal{B}_2|\mathcal{D}) = 0.4 \times \frac{0.25}{0.1625} = 0.615$$
Thus the a priori probability of selecting box 2 equals 0.25 and the
a posteriori probability assuming that the selected component is defective equals
0.615. These probabilities have the following frequency interpretation: If the
experiment is performed $n$ times, then box 2 is selected $0.25n$ times. If we consider
only the $n_{\mathcal{D}}$ experiments in which the removed part is defective, then the number
of times the part is taken from box 2 equals $0.615\,n_{\mathcal{D}}$.
We conclude with a comment on the distinction between assumptions and
deductions: Equations (2-40) and (2-41) are not derived; they are merely reasonable
assumptions. Based on these assumptions and on the axioms, we deduce that
$P(\mathcal{D}) = 0.1625$ and $P(\mathcal{B}_2|\mathcal{D}) = 0.615$.
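The computation of Example 2-11 is a direct application of (2-36) and (2-39), and can be sketched as follows. The box contents are those of the example; the dictionary layout and variable names are illustrative choices.

```python
# Box contents from Example 2-11: (total components, defective components).
boxes = {1: (2000, 100), 2: (500, 200), 3: (1000, 100), 4: (1000, 100)}

prior = {i: 1 / len(boxes) for i in boxes}                    # assumption (2-40)
p_def_given_box = {i: d / n for i, (n, d) in boxes.items()}   # assumption (2-41)

# Total probability theorem (2-36).
p_def = sum(p_def_given_box[i] * prior[i] for i in boxes)

# Bayes' theorem (2-39): a posteriori probability of each box given a defective part.
posterior = {i: p_def_given_box[i] * prior[i] / p_def for i in boxes}
print(p_def, posterior[2])
```

The a posteriori probabilities of the four boxes should sum to 1, since the boxes form a partition.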
Independence
Two events $\mathcal{A}$ and $\mathcal{B}$ are called independent if
$$P(\mathcal{A}\mathcal{B}) = P(\mathcal{A})P(\mathcal{B}) \tag{2-42}$$
The concept of independence is fundamental. In fact, it is this concept
that justifies the mathematical development of probability, not merely as a topic
in measure theory, but as a separate discipline. The significance of independence
will be appreciated later in the context of repeated trials. We discuss here
only various simple properties.
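Frequency interpretation Denoting by $n_{\mathcal{A}}$, $n_{\mathcal{B}}$, and $n_{\mathcal{A}\mathcal{B}}$ the number of occurrences of
the events $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{A}\mathcal{B}$ respectively, we have
$$P(\mathcal{A}) \simeq \frac{n_{\mathcal{A}}}{n} \qquad P(\mathcal{B}) \simeq \frac{n_{\mathcal{B}}}{n} \qquad P(\mathcal{A}\mathcal{B}) \simeq \frac{n_{\mathcal{A}\mathcal{B}}}{n}$$
If the events $\mathcal{A}$ and $\mathcal{B}$ are independent, then
$$\frac{n_{\mathcal{A}}}{n} \simeq P(\mathcal{A}) = \frac{P(\mathcal{A}\mathcal{B})}{P(\mathcal{B})} \simeq \frac{n_{\mathcal{A}\mathcal{B}}/n}{n_{\mathcal{B}}/n} = \frac{n_{\mathcal{A}\mathcal{B}}}{n_{\mathcal{B}}}$$
Thus, if $\mathcal{A}$ and $\mathcal{B}$ are independent, then the relative frequency $n_{\mathcal{A}}/n$ of the occurrence
of $\mathcal{A}$ in the original sequence of $n$ trials equals the relative frequency $n_{\mathcal{A}\mathcal{B}}/n_{\mathcal{B}}$ of the
occurrence of $\mathcal{A}$ in the subsequence in which $\mathcal{B}$ occurs.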
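We show next that if the events $\mathcal{A}$ and $\mathcal{B}$ are independent, then the
events $\overline{\mathcal{A}}$ and $\mathcal{B}$ and the events $\overline{\mathcal{A}}$ and $\overline{\mathcal{B}}$ are also independent.
As we know, the events $\mathcal{A}\mathcal{B}$ and $\overline{\mathcal{A}}\mathcal{B}$ are mutually exclusive and
$$\mathcal{B} = \mathcal{A}\mathcal{B} + \overline{\mathcal{A}}\mathcal{B} \qquad P(\overline{\mathcal{A}}) = 1 - P(\mathcal{A})$$
From this and (2-42) it follows that
$$P(\overline{\mathcal{A}}\mathcal{B}) = P(\mathcal{B}) - P(\mathcal{A}\mathcal{B}) = [1 - P(\mathcal{A})]P(\mathcal{B}) = P(\overline{\mathcal{A}})P(\mathcal{B})$$
This establishes the independence of $\overline{\mathcal{A}}$ and $\mathcal{B}$. Repeating the argument, we
conclude that $\overline{\mathcal{A}}$ and $\overline{\mathcal{B}}$ are also independent.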
In the next two examples, we illustrate the concept of independence. In
Example 2-12a, we start with a known experiment and we show that two of its
events are independent. In Examples 2-12b and 2-13 we use the concept of
independence to complete the specification of each experiment. This idea is
developed further in the next chapter.
Example 2-12. If we toss a coin twice, we generate the four outcomes $hh$, $ht$, $th$,
and $tt$.
(a) To construct an experiment with these outcomes, it suffices to assign
probabilities to its elementary events. With $a$ and $b$ two positive numbers such
that $a + b = 1$, we assume that
$$P\{hh\} = a^2 \qquad P\{ht\} = P\{th\} = ab \qquad P\{tt\} = b^2$$
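These probabilities are consistent with the axioms because
$$a^2 + ab + ab + b^2 = (a + b)^2 = 1$$
In the experiment so constructed, the events
$$\mathcal{H}_1 = \{\text{heads at first toss}\} = \{hh, ht\}$$
$$\mathcal{H}_2 = \{\text{heads at second toss}\} = \{hh, th\}$$
consist of two elements each, and their probabilities are [see (2-23)]
$$P(\mathcal{H}_1) = P\{hh\} + P\{ht\} = a^2 + ab = a$$
$$P(\mathcal{H}_2) = P\{hh\} + P\{th\} = a^2 + ab = a$$
The intersection $\mathcal{H}_1\mathcal{H}_2$ of these two events consists of the single outcome $\{hh\}$.
Hence
$$P(\mathcal{H}_1\mathcal{H}_2) = P\{hh\} = a^2 = P(\mathcal{H}_1)P(\mathcal{H}_2)$$
This shows that the events $\mathcal{H}_1$ and $\mathcal{H}_2$ are independent.
(b) The above experiment can be specified in terms of the probabilities
$P(\mathcal{H}_1) = P(\mathcal{H}_2) = a$ of the events $\mathcal{H}_1$ and $\mathcal{H}_2$, and the information that these
events are independent.
Indeed, as we have shown, the events $\overline{\mathcal{H}}_1$ and $\mathcal{H}_2$ and the events $\overline{\mathcal{H}}_1$ and
$\overline{\mathcal{H}}_2$ are also independent. Furthermore,
$$\mathcal{H}_1\mathcal{H}_2 = \{hh\} \qquad \mathcal{H}_1\overline{\mathcal{H}}_2 = \{ht\} \qquad \overline{\mathcal{H}}_1\mathcal{H}_2 = \{th\} \qquad \overline{\mathcal{H}}_1\overline{\mathcal{H}}_2 = \{tt\}$$
and $P(\overline{\mathcal{H}}_1) = 1 - P(\mathcal{H}_1) = 1 - a$, $P(\overline{\mathcal{H}}_2) = 1 - P(\mathcal{H}_2) = 1 - a$. Hence
$$P\{hh\} = a^2 \qquad P\{ht\} = a(1 - a) \qquad P\{th\} = (1 - a)a \qquad P\{tt\} = (1 - a)^2$$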
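Part (a) of the example can be verified numerically for any particular $a$. In this sketch the value $a = 1/3$ is an arbitrary illustrative choice, and exact fractions are used so that the product rule (2-42) can be tested with equality.

```python
from fractions import Fraction

a = Fraction(1, 3)          # illustrative value; any 0 < a < 1 would do
b = 1 - a
P = {"hh": a * a, "ht": a * b, "th": a * b, "tt": b * b}

def prob(event):
    """Sum of elementary probabilities, as in (2-23)."""
    return sum(P[w] for w in event)

H1 = {"hh", "ht"}           # heads at first toss
H2 = {"hh", "th"}           # heads at second toss
print(prob(H1), prob(H2), prob(H1 & H2))
```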
Example 2-13. Trains $X$ and $Y$ arrive at a station at random between 8 a.m. and
8:20 a.m. Train $X$ stops for four minutes and train $Y$ stops for five minutes.
Assuming that the trains arrive independently of each other, we shall determine
various probabilities related to the times $x$ and $y$ of their respective arrivals. To
do so, we must first specify the underlying experiment.
The outcomes of this experiment are all points $(x, y)$ in the square of Fig.
2-12. The event
$$\mathcal{A} = \{X \text{ arrives in the interval } (t_1, t_2)\} = \{t_1 \le x \le t_2\}$$
is a vertical strip as in Fig. 2-12a and its probability equals $(t_2 - t_1)/20$. This is
our interpretation of the information that the train arrives at random. Similarly,
the event
$$\mathcal{B} = \{Y \text{ arrives in the interval } (t_3, t_4)\} = \{t_3 \le y \le t_4\}$$
is a horizontal strip and its probability equals $(t_4 - t_3)/20$.
FIGURE 2-12
Proceeding similarly, we can determine the probabilities of any horizontal or
vertical sets of points. To complete the specification of the experiment, we must
determine also the probabilities of their intersections. Interpreting the
independence of the arrival times as independence of the events $\mathcal{A}$ and $\mathcal{B}$, we
obtain
$$P(\mathcal{A}\mathcal{B}) = P(\mathcal{A})P(\mathcal{B}) = \frac{(t_2 - t_1)(t_4 - t_3)}{20 \times 20}$$
The event $\mathcal{A}\mathcal{B}$ is the rectangle shown in the figure. Since the coordinates of
this rectangle are arbitrary, we conclude that the probability of any rectangle
equals its area divided by 400. In the plane, all events are unions and intersections
of rectangles forming a Borel field. This shows that the probability that the point
$(x, y)$ will be in an arbitrary region $R$ of the plane equals the area of $R$ divided
by 400. This completes the specification of the experiment.
(a)	We shall determine the probability that train $X$ arrives before train $Y$.
This is the probability of the event
$$\mathcal{C} = \{x \le y\}$$
shown in Fig. 2-12b. This event is a triangle with area 200. Hence
$$P(\mathcal{C}) = \frac{200}{400}$$
(b)	We shall determine the probability that the trains meet at the station.
For the trains to meet, $x$ must be less than $y + 5$ and $y$ must be less than $x + 4$.
This is the event
$$\mathcal{D} = \{-4 \le x - y \le 5\}$$
of Fig. 2-12c. As we see from the figure, the region $\mathcal{D}$ consists of two trapezoids
with a common base, and its area equals 159.5. Hence
$$P(\mathcal{D}) = \frac{159.5}{400}$$
(c)	Assuming that the trains met, we shall determine the probability that
train $X$ arrived before train $Y$. We wish to find the conditional probability
$P(\mathcal{C}|\mathcal{D})$. The event $\mathcal{C}\mathcal{D}$ is a trapezoid as shown and its area equals 72. Hence
$$P(\mathcal{C}|\mathcal{D}) = \frac{P(\mathcal{C}\mathcal{D})}{P(\mathcal{D})} = \frac{72}{159.5}$$
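The three areas of Example 2-13 can be approximated by counting grid points in the 20-by-20 square of arrival times. This is a rough numerical sketch, not the geometric argument of the text; the grid resolution `N` is an arbitrary choice and the results are approximate.

```python
N = 400                       # grid cells per side (an arbitrary resolution)
h = 20 / N                    # arrival times range over 0..20 minutes
n_before = n_meet = n_both = 0
for i in range(N):
    x = (i + 0.5) * h         # arrival time of train X (cell midpoint)
    for j in range(N):
        y = (j + 0.5) * h     # arrival time of train Y (cell midpoint)
        before = x <= y               # event C: X arrives before Y
        meet = -4 <= x - y <= 5       # event D: the trains meet
        n_before += before
        n_meet += meet
        n_both += before and meet

total = N * N
print(n_before / total, n_meet / total, n_both / n_meet)
```

The printed values should be close to 200/400, 159.5/400, and 72/159.5 respectively.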
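INDEPENDENCE OF THREE EVENTS. The events $\mathcal{A}_1$, $\mathcal{A}_2$, and $\mathcal{A}_3$ are called (mutually)
independent if they are independent in pairs:
$$P(\mathcal{A}_i\mathcal{A}_j) = P(\mathcal{A}_i)P(\mathcal{A}_j) \qquad i \ne j \tag{2-43}$$
and
$$P(\mathcal{A}_1\mathcal{A}_2\mathcal{A}_3) = P(\mathcal{A}_1)P(\mathcal{A}_2)P(\mathcal{A}_3) \tag{2-44}$$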
FIGURE 2-13
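We should emphasize that three events might be independent in pairs but
not independent. The next example is an illustration.
Example 2-14. Suppose that the events $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ of Fig. 2-13 have the same
probability
$$P(\mathcal{A}) = P(\mathcal{B}) = P(\mathcal{C}) = \frac{1}{5}$$
and the intersections $\mathcal{A}\mathcal{B}$, $\mathcal{A}\mathcal{C}$, $\mathcal{B}\mathcal{C}$, and $\mathcal{A}\mathcal{B}\mathcal{C}$ also have the same probability
$$p = P(\mathcal{A}\mathcal{B}) = P(\mathcal{A}\mathcal{C}) = P(\mathcal{B}\mathcal{C}) = P(\mathcal{A}\mathcal{B}\mathcal{C})$$
(a) If $p = 1/25$, then these events are independent in pairs but they are not
independent because
$$P(\mathcal{A}\mathcal{B}\mathcal{C}) \ne P(\mathcal{A})P(\mathcal{B})P(\mathcal{C})$$
(b)	If $p = 1/125$, then $P(\mathcal{A}\mathcal{B}\mathcal{C}) = P(\mathcal{A})P(\mathcal{B})P(\mathcal{C})$ but the events are not
independent because
$$P(\mathcal{A}\mathcal{B}) \ne P(\mathcal{A})P(\mathcal{B})$$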
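From the independence of the events $\mathcal{A}_1$, $\mathcal{A}_2$, and $\mathcal{A}_3$ it follows that:
1.	Any one of them is independent of the intersection of the other two.
Indeed, from (2-43) and (2-44) it follows that
$$P(\mathcal{A}_1\mathcal{A}_2\mathcal{A}_3) = P(\mathcal{A}_1)P(\mathcal{A}_2)P(\mathcal{A}_3) = P(\mathcal{A}_1)P(\mathcal{A}_2\mathcal{A}_3) \tag{2-45}$$
Hence the events $\mathcal{A}_1$ and $\mathcal{A}_2\mathcal{A}_3$ are independent.
2.	If we replace one or more of these events with their complements, the
resulting events are also independent.
Indeed, since
$$\mathcal{A}_1\mathcal{A}_2 = \mathcal{A}_1\mathcal{A}_2\mathcal{A}_3 + \mathcal{A}_1\mathcal{A}_2\overline{\mathcal{A}}_3 \qquad P(\overline{\mathcal{A}}_3) = 1 - P(\mathcal{A}_3)$$
we conclude with (2-45) that
$$P(\mathcal{A}_1\mathcal{A}_2\overline{\mathcal{A}}_3) = P(\mathcal{A}_1\mathcal{A}_2) - P(\mathcal{A}_1\mathcal{A}_2)P(\mathcal{A}_3) = P(\mathcal{A}_1)P(\mathcal{A}_2)P(\overline{\mathcal{A}}_3)$$
Hence the events $\mathcal{A}_1$, $\mathcal{A}_2$, and $\overline{\mathcal{A}}_3$ are independent because they satisfy
(2-44) and, as we have shown earlier in the section, they are also independent
in pairs.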
3.	Any one of them is independent of the union of the other two.
To show that the events $\mathcal{A}_1$ and $\mathcal{A}_2 + \mathcal{A}_3$ are independent, it suffices to
show that the events $\mathcal{A}_1$ and $\overline{\mathcal{A}_2 + \mathcal{A}_3} = \overline{\mathcal{A}}_2\overline{\mathcal{A}}_3$ are independent. This
follows from 1 and 2.
Generalization. The independence of $n$ events can be defined inductively:
Suppose that we have defined independence of $k$ events for every $k < n$. We
then say that the events $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent if any $k < n$ of them are
independent and
$$P(\mathcal{A}_1 \cdots \mathcal{A}_n) = P(\mathcal{A}_1) \cdots P(\mathcal{A}_n) \tag{2-46}$$
This completes the definition for any $n$ because we have defined independence
for $n = 2$.
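Unrolling the inductive definition, $n$ events are independent when the product rule holds for every sub-collection of two or more of them. The sketch below tests this on a finite space; the helper `independent` and the two-toss illustration (with the event "both tosses agree") are assumptions introduced here and are not taken from the text.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def independent(events, P):
    """Check the product rule for every sub-collection of two or more events."""
    for k in range(2, len(events) + 1):
        for group in combinations(events, k):
            joint = set.intersection(*group)
            if P(joint) != prod(P(e) for e in group):
                return False
    return True

# Illustration (not from the text): two tosses of a fair coin.
space = {"hh", "ht", "th", "tt"}
P = lambda e: Fraction(len(e), len(space))
A = {"hh", "ht"}      # heads at first toss
B = {"hh", "th"}      # heads at second toss
C = {"hh", "tt"}      # both tosses agree

pairwise = all(independent([u, v], P) for u, v in combinations([A, B, C], 2))
print(pairwise, independent([A, B, C], P))
```

In this illustration the three events are independent in pairs but not independent, the same phenomenon as in Example 2-14(a).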
PROBLEMS
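2-1. Show that (a) $\overline{\overline{\mathcal{A}} + \overline{\mathcal{B}}} + \overline{\overline{\mathcal{A}} + \mathcal{B}} = \mathcal{A}$; (b) $(\mathcal{A} + \mathcal{B})(\overline{\mathcal{A}\mathcal{B}}) = \mathcal{A}\overline{\mathcal{B}} + \mathcal{B}\overline{\mathcal{A}}$.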
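2-2. If $\mathcal{A} = \{2 \le x \le 5\}$ and $\mathcal{B} = \{3 \le x \le 6\}$, find $\mathcal{A} + \mathcal{B}$, $\mathcal{A}\mathcal{B}$, and $(\mathcal{A} + \mathcal{B})(\overline{\mathcal{A}\mathcal{B}})$.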
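2-3. Show that if $\mathcal{A}\mathcal{B} = \{\emptyset\}$, then $P(\mathcal{A}) \le P(\overline{\mathcal{B}})$.
2-4. Show that (a) if $P(\mathcal{A}) = P(\mathcal{B}) = P(\mathcal{A}\mathcal{B})$, then $P(\mathcal{A}\overline{\mathcal{B}} + \mathcal{B}\overline{\mathcal{A}}) = 0$; (b) if
$P(\mathcal{A}) = P(\mathcal{B}) = 1$, then $P(\mathcal{A}\mathcal{B}) = 1$.
2-5. Prove and generalize the following identity
$$P(\mathcal{A} + \mathcal{B} + \mathcal{C}) = P(\mathcal{A}) + P(\mathcal{B}) + P(\mathcal{C}) - P(\mathcal{A}\mathcal{B}) - P(\mathcal{A}\mathcal{C}) - P(\mathcal{B}\mathcal{C}) + P(\mathcal{A}\mathcal{B}\mathcal{C})$$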
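2-6. Show that if $\mathcal{S}$ consists of a countable number of elements $\zeta_i$ and each subset $\{\zeta_i\}$
is an event, then all subsets of $\mathcal{S}$ are events.
2-7. If $\mathcal{S} = \{1, 2, 3, 4\}$, find the smallest field that contains the sets $\{1\}$ and $\{2, 3\}$.
2-8. If $\mathcal{A} \subset \mathcal{B}$, $P(\mathcal{A}) = 1/4$, and $P(\mathcal{B}) = 1/3$, find $P(\mathcal{A}|\mathcal{B})$ and $P(\mathcal{B}|\mathcal{A})$.
2-9. Show that $P(\mathcal{A}\mathcal{B}|\mathcal{C}) = P(\mathcal{A}|\mathcal{B}\mathcal{C})P(\mathcal{B}|\mathcal{C})$ and $P(\mathcal{A}\mathcal{B}\mathcal{C}) = P(\mathcal{A}|\mathcal{B}\mathcal{C})P(\mathcal{B}|\mathcal{C})P(\mathcal{C})$.
2-10. (Chain rule) Show that
$$P(\mathcal{A}_n \cdots \mathcal{A}_1) = P(\mathcal{A}_n|\mathcal{A}_{n-1} \cdots \mathcal{A}_1) \cdots P(\mathcal{A}_2|\mathcal{A}_1)P(\mathcal{A}_1)$$
2-11. We select at random $m$ objects from a set $\mathcal{S}$ of $n$ objects and we denote by $\mathcal{A}_m$
the set of the selected objects. Show that the probability $p$ that a particular
element $\zeta_0$ of $\mathcal{S}$ is in $\mathcal{A}_m$ equals $m/n$.
Hint: $p$ equals the probability that a randomly selected element of $\mathcal{S}$ is in $\mathcal{A}_m$.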
2-12. A call occurs at time $t$ where $t$ is a random point in the interval $(0, 10)$. (a) Find
$P\{6 \le t \le 8\}$. (b) Find $P\{6 \le t \le 8 \,|\, t > 5\}$.
2-13. The space $\mathcal{S}$ is the set of all positive numbers $t$. Show that if $P\{t_0 \le t \le t_0 + t_1 \,|\,
t \ge t_0\} = P\{t \le t_1\}$ for every $t_0$ and $t_1$, then $P\{t \le t_1\} = 1 - e^{-ct_1}$ where $c$ is a
constant.
2-14. The events $\mathcal{A}$ and $\mathcal{B}$ are mutually exclusive. Can they be independent?
2-15. Show that if the events $\mathcal{A}_1, \ldots, \mathcal{A}_n$ are independent and $\mathcal{B}_i$ equals $\mathcal{A}_i$ or $\overline{\mathcal{A}}_i$ or
$\mathcal{S}$, then the events $\mathcal{B}_1, \ldots, \mathcal{B}_n$ are also independent.
2-16. Show that $2^n - (n + 1)$ equations are needed to establish the independence of $n$
events.
2-17. Box 1 contains 1 white and 999 red balls. Box 2 contains 1 red and 999 white balls.
A ball is picked from a randomly selected box. If the ball is red what is the
probability that it came from box 1?
2-18. Box 1 contains 1000 bulbs of which 10 percent are defective. Box 2 contains 2000
bulbs of which 5 percent are defective. Two bulbs are picked from a randomly
selected box. (a) Find the probability that both bulbs are defective. (b) Assuming
that both are defective, find the probability that they came from box 1.
2-19. A train and a bus arrive at the station at random between 9 A.M. and 10 A.M. The
train stops for 10 minutes and the bus for $x$ minutes. Find $x$ so that the probability
that the bus and the train will meet equals 0.5.
2-20. Show that a set $\mathcal{S}$ with $n$ elements has
$$\frac{n(n - 1) \cdots (n - k + 1)}{1 \cdot 2 \cdots k} = \frac{n!}{k!(n - k)!}$$
$k$-element subsets.
2-21. We have two coins; the first is fair and the second two-headed. We pick one of the
coins at random, we toss it twice and heads shows both times. Find the probability
that the coin picked is fair.
CHAPTER
3
REPEATED
TRIALS
3-1 COMBINED EXPERIMENTS
We are given two experiments: The first experiment is the rolling of a fair die
$$\mathcal{S}_1 = \{f_1, \ldots, f_6\} \qquad P_1\{f_i\} = \frac{1}{6}$$
The second experiment is the tossing of a fair coin
$$\mathcal{S}_2 = \{h, t\} \qquad P_2\{h\} = P_2\{t\} = \frac{1}{2}$$
We perform both experiments and we want to find the probability that we get
"two" on the die and "heads" on the coin.
If we make the reasonable assumption that the outcomes of the first
experiment are independent of the outcomes of the second, we conclude that
the unknown probability equals $1/6 \times 1/2$.
The above conclusion is reasonable; however, the notion of independence
used in its derivation does not agree with the definition given in (2-42). In that
definition, the events $\mathcal{A}$ and $\mathcal{B}$ were subsets of the same space. In order to fit
the above conclusion into our theory, we must, therefore, construct a space
having as subsets the events "two" and "heads." This is done as follows:
The two experiments are viewed as a single experiment whose outcomes
are pairs $\zeta_1\zeta_2$ where $\zeta_1$ is one of the six faces of the die and $\zeta_2$ is heads or
tails.† The resulting space consists of the 12 elements
$$f_1h, \ldots, f_6h, \ f_1t, \ldots, f_6t$$
In this space, {two} is not an elementary event but a subset consisting of
two elements
$$\{\text{two}\} = \{f_2h, f_2t\}$$
Similarly, {heads} is an event with six elements
$$\{\text{heads}\} = \{f_1h, \ldots, f_6h\}$$
To complete the experiment, we must assign probabilities to all subsets of
$\mathcal{S}$. Clearly, the event {two} occurs if the die shows "two" no matter what shows
on the coin. Hence
$$P\{\text{two}\} = P_1\{f_2\} = \frac{1}{6}$$
Similarly,
$$P\{\text{heads}\} = P_2\{h\} = \frac{1}{2}$$
The intersection of the events {two} and {heads} is the elementary event
$\{f_2h\}$. Assuming that the events {two} and {heads} are independent in the sense
of (2-42), we conclude that $P\{f_2h\} = 1/6 \times 1/2$ in agreement with our earlier
conclusion.
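CARTESIAN PRODUCTS. Given two sets $\mathcal{S}_1$ and $\mathcal{S}_2$ with elements $\zeta_1$ and $\zeta_2$
respectively, we form all ordered pairs $\zeta_1\zeta_2$ where $\zeta_1$ is any element of $\mathcal{S}_1$ and
$\zeta_2$ is any element of $\mathcal{S}_2$. The cartesian product of the sets $\mathcal{S}_1$ and $\mathcal{S}_2$ is a set
$\mathcal{S}$ whose elements are all such pairs. This set is written in the form
$$\mathcal{S} = \mathcal{S}_1 \times \mathcal{S}_2$$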
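Example 3-1. The cartesian product of the sets
$$\mathcal{S}_1 = \{\text{car}, \text{apple}, \text{bird}\} \qquad \mathcal{S}_2 = \{h, t\}$$
has six elements
$$\mathcal{S}_1 \times \mathcal{S}_2 = \{\text{car-}h, \text{car-}t, \text{apple-}h, \text{apple-}t, \text{bird-}h, \text{bird-}t\}$$
Example 3-2. If $\mathcal{S}_1 = \{h, t\}$, $\mathcal{S}_2 = \{h, t\}$, then
$$\mathcal{S}_1 \times \mathcal{S}_2 = \{hh, ht, th, tt\}$$
In this example, the sets $\mathcal{S}_1$ and $\mathcal{S}_2$ are identical. We note also that the
element $ht$ is different from the element $th$.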
†In the earlier discussion, the symbol $\zeta_i$ represented a single element of a set $\mathcal{S}$. In the following,
$\zeta_i$ will also represent an arbitrary element of a set $\mathcal{S}_i$. It will be understood from the context
whether $\zeta_i$ is one particular element or any element of $\mathcal{S}_i$.
FIGURE 3-1
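If $\mathcal{A}$ is a subset of $\mathcal{S}_1$ and $\mathcal{B}$ is a subset of $\mathcal{S}_2$, then the set
$$\mathcal{C} = \mathcal{A} \times \mathcal{B}$$
consisting of all pairs $\zeta_1\zeta_2$ where $\zeta_1 \in \mathcal{A}$ and $\zeta_2 \in \mathcal{B}$ is a subset of $\mathcal{S}$.
Forming similarly the sets $\mathcal{A} \times \mathcal{S}_2$ and $\mathcal{S}_1 \times \mathcal{B}$, we conclude that their
intersection is the set $\mathcal{A} \times \mathcal{B}$:
$$\mathcal{A} \times \mathcal{B} = (\mathcal{A} \times \mathcal{S}_2) \cap (\mathcal{S}_1 \times \mathcal{B}) \tag{3-1}$$
Note Suppose that $\mathcal{S}_1$ is the $x$ axis, $\mathcal{S}_2$ is the $y$ axis, and $\mathcal{A}$ and $\mathcal{B}$ are two intervals:
$$\mathcal{A} = \{x_1 \le x \le x_2\} \qquad \mathcal{B} = \{y_1 \le y \le y_2\}$$
In this case, $\mathcal{A} \times \mathcal{B}$ is a rectangle, $\mathcal{A} \times \mathcal{S}_2$ is a vertical strip, and $\mathcal{S}_1 \times \mathcal{B}$ is a
horizontal strip (Fig. 3-1).
We can thus interpret the cartesian product $\mathcal{A} \times \mathcal{B}$ of two arbitrary sets as a
generalized rectangle.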
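Identity (3-1) is easy to check on a small case. The sketch below reuses the sets of Example 3-1; the particular subsets `A` and `B` and the helper `cart` are illustrative choices.

```python
from itertools import product

S1 = {"car", "apple", "bird"}
S2 = {"h", "t"}
A = {"car", "bird"}     # an arbitrary subset of S1 (illustrative)
B = {"h"}               # an arbitrary subset of S2 (illustrative)

def cart(X, Y):
    """Cartesian product as a set of ordered pairs."""
    return set(product(X, Y))

lhs = cart(A, B)
rhs = cart(A, S2) & cart(S1, B)   # right side of (3-1)
print(len(cart(S1, S2)), lhs == rhs)
```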
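Cartesian product of two experiments. The cartesian product of two experiments
$\mathcal{S}_1$ and $\mathcal{S}_2$ is a new experiment $\mathcal{S} = \mathcal{S}_1 \times \mathcal{S}_2$ whose events are all
cartesian products of the form
$$\mathcal{A} \times \mathcal{B} \tag{3-2}$$
where $\mathcal{A}$ is an event of $\mathcal{S}_1$ and $\mathcal{B}$ is an event of $\mathcal{S}_2$, and their unions and
intersections.
In this experiment, the probabilities of the events $\mathcal{A} \times \mathcal{S}_2$ and $\mathcal{S}_1 \times \mathcal{B}$
are such that
$$P(\mathcal{A} \times \mathcal{S}_2) = P_1(\mathcal{A}) \qquad P(\mathcal{S}_1 \times \mathcal{B}) = P_2(\mathcal{B}) \tag{3-3}$$
where $P_1(\mathcal{A})$ is the probability of the event $\mathcal{A}$ in the experiment $\mathcal{S}_1$ and
$P_2(\mathcal{B})$ is the probability of the event $\mathcal{B}$ in the experiment $\mathcal{S}_2$. The above is
motivated by the interpretation of $\mathcal{S}$ as a combined experiment. Indeed, the
event $\mathcal{A} \times \mathcal{S}_2$ of the experiment $\mathcal{S}$ occurs if the event $\mathcal{A}$ of the experiment
$\mathcal{S}_1$ occurs no matter what the outcome of $\mathcal{S}_2$ is. Similarly, the event $\mathcal{S}_1 \times \mathcal{B}$
of the experiment $\mathcal{S}$ occurs if the event $\mathcal{B}$ of the experiment $\mathcal{S}_2$ occurs no
matter what the outcome of $\mathcal{S}_1$ is. This justifies the two equations in (3-3).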
These equations determine only the probabilities of the events $\mathcal{A} \times \mathcal{S}_2$
and $\mathcal{S}_1 \times \mathcal{B}$. The probabilities of events of the form $\mathcal{A} \times \mathcal{B}$ and of their
unions and intersections cannot in general be expressed in terms of $P_1$ and $P_2$.
To determine them, we need additional information about the experiments $\mathcal{S}_1$
and $\mathcal{S}_2$.
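Independent experiments. In many applications, the events $\mathcal{A} \times \mathcal{S}_2$ and
$\mathcal{S}_1 \times \mathcal{B}$ of the combined experiment are independent for any $\mathcal{A}$ and $\mathcal{B}$.
Since the intersection of these events equals $\mathcal{A} \times \mathcal{B}$ [see (3-1)], we conclude
from (2-42) and (3-3) that
$$P(\mathcal{A} \times \mathcal{B}) = P(\mathcal{A} \times \mathcal{S}_2)P(\mathcal{S}_1 \times \mathcal{B}) = P_1(\mathcal{A})P_2(\mathcal{B}) \tag{3-4}$$
This completes the specification of the experiment $\mathcal{S}$ because all its
events are unions and intersections of events of the form $\mathcal{A} \times \mathcal{B}$.
We note in particular that the elementary event $\{\zeta_1\zeta_2\}$ can be written as a
cartesian product $\{\zeta_1\} \times \{\zeta_2\}$ of the elementary events $\{\zeta_1\}$ and $\{\zeta_2\}$ of $\mathcal{S}_1$ and
$\mathcal{S}_2$. Hence
$$P\{\zeta_1\zeta_2\} = P_1\{\zeta_1\}P_2\{\zeta_2\} \tag{3-5}$$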
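The die-and-coin experiment from the start of the section gives a concrete instance of (3-3) to (3-5). The following sketch builds the 12-element product space with product probabilities; the dictionary names `P1`, `P2`, and `P` are illustrative.

```python
from fractions import Fraction
from itertools import product

P1 = {f"f{i}": Fraction(1, 6) for i in range(1, 7)}   # fair die
P2 = {"h": Fraction(1, 2), "t": Fraction(1, 2)}       # fair coin

# Elementary probabilities of the combined experiment, as in (3-5).
P = {(z1, z2): P1[z1] * P2[z2] for z1, z2 in product(P1, P2)}

def prob(event):
    """Sum of elementary probabilities over the outcomes of an event."""
    return sum(P[w] for w in event)

two = {w for w in P if w[0] == "f2"}      # {two} = {f2} x S2
heads = {w for w in P if w[1] == "h"}     # {heads} = S1 x {h}
print(len(P), prob(two), prob(heads), prob(two & heads))
```

The marginal probabilities agree with (3-3), and the probability of the intersection agrees with (3-4).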
Example 3-3. A box $B_1$ contains 10 white and 5 red balls and a box $B_2$ contains 20
white and 20 red balls. A ball is drawn from each box. What is the probability that
the ball from $B_1$ will be white and the ball from $B_2$ red?
The above operation can be considered as a combined experiment.
Experiment $\mathcal{S}_1$ is the drawing from $B_1$ and experiment $\mathcal{S}_2$ is the drawing from
$B_2$. The space $\mathcal{S}_1$ has 15 elements: 10 white and 5 red balls. The event
$$\mathcal{W}_1 = \{\text{all white balls in } B_1\}$$
has 10 elements and its probability equals 10/15. The space $\mathcal{S}_2$ has 40 elements:
20 white and 20 red balls. The event
$$\mathcal{R}_2 = \{\text{all red balls in } B_2\}$$
has 20 elements and its probability equals 20/40. The space $\mathcal{S}_1 \times \mathcal{S}_2$ has
$40 \times 15$ elements: all possible pairs that can be drawn.
We want the probability of the event
$$\mathcal{W}_1 \times \mathcal{R}_2 = \{\text{white from } B_1 \text{ and red from } B_2\}$$
Assuming independence of the two experiments, we conclude from (3-4) that
$$P(\mathcal{W}_1 \times \mathcal{R}_2) = P_1(\mathcal{W}_1)P_2(\mathcal{R}_2) = \frac{10}{15} \times \frac{20}{40}$$
Example 3-4. Consider the coin experiment where the probability of "heads"
equals $p$ and the probability of "tails" equals $q = 1 - p$. If we toss the coin twice,
we obtain the space
$$\mathcal{S} = \mathcal{S}_1 \times \mathcal{S}_2 \qquad \mathcal{S}_1 = \mathcal{S}_2 = \{h, t\}$$
Thus $\mathcal{S}$ consists of the four outcomes $hh$, $ht$, $th$, and $tt$. Assuming that the
experiments $\mathcal{S}_1$ and $\mathcal{S}_2$ are independent, we obtain
$$P\{hh\} = P_1\{h\}P_2\{h\} = p^2$$
Similarly,
$$P\{ht\} = pq \qquad P\{th\} = qp \qquad P\{tt\} = q^2$$
We shall use the above to find the probability of the event
$$\mathcal{H}_1 = \{\text{heads at the first toss}\} = \{hh, ht\}$$
Since $\mathcal{H}_1$ consists of the two outcomes $hh$ and $ht$, (2-23) yields
$$P(\mathcal{H}_1) = P\{hh\} + P\{ht\} = p^2 + pq = p$$
This follows also from (3-4) because $\mathcal{H}_1 = \{h\} \times \mathcal{S}_2$.
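Generalization. Given $n$ experiments $\mathcal{S}_1, \ldots, \mathcal{S}_n$, we define as their cartesian
product
$$\mathcal{S} = \mathcal{S}_1 \times \cdots \times \mathcal{S}_n \tag{3-6}$$
the experiment whose elements are all ordered $n$-tuples $\zeta_1 \cdots \zeta_n$ where $\zeta_i$ is
an element of the set $\mathcal{S}_i$. Events in this space are all sets of the form
$$\mathcal{A}_1 \times \cdots \times \mathcal{A}_n$$
where $\mathcal{A}_i \subset \mathcal{S}_i$, and their unions and intersections. If the experiments are
independent and $P_i(\mathcal{A}_i)$ is the probability of the event $\mathcal{A}_i$ in the experiment
$\mathcal{S}_i$, then
$$P(\mathcal{A}_1 \times \cdots \times \mathcal{A}_n) = P_1(\mathcal{A}_1) \cdots P_n(\mathcal{A}_n) \tag{3-7}$$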
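Example 3-5. If we toss the coin of Example 3-4 $n$ times, we obtain the space
$\mathcal{S} = \mathcal{S}_1 \times \cdots \times \mathcal{S}_n$ consisting of the $2^n$ elements $\zeta_1 \cdots \zeta_n$ where $\zeta_i = h$ or $t$.
Clearly,
$$P\{\zeta_1 \cdots \zeta_n\} = P_1\{\zeta_1\} \cdots P_n\{\zeta_n\} \qquad P_i\{\zeta_i\} = \begin{cases} p & \zeta_i = h \\ q & \zeta_i = t \end{cases} \tag{3-8}$$
If, in particular, $p = q = 1/2$, then
$$P\{\zeta_1 \cdots \zeta_n\} = \frac{1}{2^n} \tag{3-9}$$
From (3-8) it follows that, if the elementary event $\{\zeta_1 \cdots \zeta_n\}$ consists of $k$
heads and $n - k$ tails (in a specific order), then
$$P\{\zeta_1 \cdots \zeta_n\} = p^k q^{n-k} \tag{3-10}$$
We note that the event $\mathcal{H}_1$ = {heads at the first toss} consists of $2^{n-1}$
outcomes $\zeta_1 \cdots \zeta_n$ where $\zeta_1 = h$ and $\zeta_i = t$ or $h$ for $i > 1$. The event $\mathcal{H}_1$ can be
written as a cartesian product
$$\mathcal{H}_1 = \{h\} \times \mathcal{S}_2 \times \cdots \times \mathcal{S}_n$$
Hence [see (3-7)]
$$P(\mathcal{H}_1) = P_1\{h\}P_2(\mathcal{S}_2) \cdots P_n(\mathcal{S}_n) = p$$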
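because $P_i(\mathcal{S}_i) = 1$. We can similarly show that if
$$\mathcal{H}_i = \{\text{heads at the } i\text{th toss}\} \qquad \mathcal{T}_i = \{\text{tails at the } i\text{th toss}\}$$
then
$$P(\mathcal{H}_i) = p \qquad P(\mathcal{T}_i) = q$$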
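The claims of Example 3-5 can be confirmed by enumerating all $2^n$ sequences for a small $n$. In this sketch the values of `p` and `n` are arbitrary illustrative choices and the helper `p_seq` is introduced only for this check.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 3)      # illustrative P{h} (an assumption)
q = 1 - p
n = 4                   # illustrative number of tosses

def p_seq(w):
    """Probability of one specific sequence, as in (3-10)."""
    k = w.count("h")
    return p**k * q**(len(w) - k)

outcomes = ["".join(w) for w in product("ht", repeat=n)]
total = sum(p_seq(w) for w in outcomes)

# P(H_i): heads at the i-th toss, computed for each position i.
p_heads_at = [sum(p_seq(w) for w in outcomes if w[i] == "h") for i in range(n)]
print(len(outcomes), total, p_heads_at)
```

Each entry of `p_heads_at` should equal `p`, whichever toss is examined.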
Dual meaning of repeated trials. In the theory of probability, the notion of
repeated trials has two fundamentally different meanings. The lirst is the
approximate relationship (l-l) between the probability P(.:/) of an event in
an experiment У and the relative frequency of the occurrence of The
second is the creation of the experiment ./' x • • • x . / .
For example, the repeated tossings of a coin can be given the following
two interpretations:
First interpretation (physical) Our experiment is the single toss of a fair
coin. Its space has two elements and the probability of each elementary event
equals 1/2. A trial is the toss of the coin once.
If we toss the coin $n$ times and heads shows $n_h$ times, then almost
certainly $n_h/n \approx 1/2$ provided that $n$ is sufficiently large. Thus the first
interpretation of repeated trials is the above imprecise statement relating
probabilities with observed frequencies.
Second interpretation (conceptual) Our experiment is now the toss of the
coin $n$ times where $n$ is any number, large or small. Its space has $2^n$ elements
and the probability of each elementary event equals $1/2^n$. A trial is the toss of
the coin $n$ times. All statements concerning the number of heads are precise
and in the form of probabilities.
We can, of course, give a relative frequency interpretation to these
statements. However, to do so, we must repeat the $n$ tosses of the coin a large
number of times.
3-2 BERNOULLI TRIALS
It is well known from combinatorial analysis that, if a set has $n$ elements, then
the total number of its subsets consisting of $k$ elements each equals
$$\binom{n}{k} = \frac{n(n-1) \cdots (n-k+1)}{1 \cdot 2 \cdots k} = \frac{n!}{k!(n-k)!}$$    (3-11)
For example, if $n = 4$ and $k = 2$, then
$$\binom{4}{2} = \frac{4 \cdot 3}{1 \cdot 2} = 6$$
Indeed, the two-element subsets of the four-element set $abcd$ are
ab ac ad bc bd cd
The above result will be used to find the probability that an event occurs $k$
times in $n$ independent trials of an experiment $\mathcal{S}$. This problem is essentially
the same as the problem of obtaining $k$ heads in $n$ tossings of a coin. We start,
therefore, with the coin experiment.
Example 3-6. A coin with $P\{h\} = p$ is tossed $n$ times. We maintain that the
probability $p_n(k)$ that heads shows $k$ times is given by
$$p_n(k) = \binom{n}{k} p^k q^{n-k} \qquad q = 1 - p$$    (3-12)
Proof. The experiment under consideration is the $n$-tossing of a coin. A single
outcome is a particular sequence of heads and tails. The event {$k$ heads in any
order} consists of all sequences containing $k$ heads and $n - k$ tails. The $k$ heads
of each such sequence form a $k$-element subset of a set of $n$ heads. As we noted,
there are $\binom{n}{k}$ such subsets. Hence the event {$k$ heads in any order} consists of $\binom{n}{k}$
elementary events containing $k$ heads and $n - k$ tails in a specific order. Since the
probability of each of these elementary events equals $p^k q^{n-k}$, we conclude that
$$P\{k \text{ heads in any order}\} = \binom{n}{k} p^k q^{n-k}$$
Special Case. If $n = 3$ and $k = 2$, then there are three ways of getting two heads,
namely, $hht$, $hth$, and $thh$. Hence $p_3(2) = 3p^2 q$ in agreement with (3-12).
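Success or Failure of an Event $\mathcal{A}$ in $n$
Independent Trials
We consider now our main problem. We are given an experiment $\mathcal{S}$ and an
event $\mathcal{A}$ with
$$P(\mathcal{A}) = p \qquad P(\bar{\mathcal{A}}) = q \qquad p + q = 1$$
We repeat the experiment $n$ times and the resulting product space we denote by
$\mathcal{S}^n$. Thus
$$\mathcal{S}^n = \mathcal{S} \times \cdots \times \mathcal{S}$$
We shall determine the probability $p_n(k)$ that the event $\mathcal{A}$ occurs exactly $k$
times.
FUNDAMENTAL THEOREM
$$p_n(k) = P\{\mathcal{A} \text{ occurs } k \text{ times in any order}\} = \binom{n}{k} p^k q^{n-k}$$    (3-13)
Proof. The event {$\mathcal{A}$ occurs $k$ times in a specific order} is a cartesian product
$\mathcal{B}_1 \times \cdots \times \mathcal{B}_n$ where $k$ of the events $\mathcal{B}_i$ equal $\mathcal{A}$ and the remaining $n - k$
equal $\bar{\mathcal{A}}$. As we know from (3-7), the probability of this event equals
$$P(\mathcal{B}_1) \cdots P(\mathcal{B}_n) = p^k q^{n-k}$$
because
$$P(\mathcal{B}_i) = \begin{cases} p & \text{if } \mathcal{B}_i = \mathcal{A} \\ q & \text{if } \mathcal{B}_i = \bar{\mathcal{A}} \end{cases}$$
In other words,
$$P\{\mathcal{A} \text{ occurs } k \text{ times in a specific order}\} = p^k q^{n-k}$$    (3-14)
The event {$\mathcal{A}$ occurs $k$ times in any order} is the union of the $\binom{n}{k}$ events
{$\mathcal{A}$ occurs $k$ times in a specific order} and since these events are mutually
exclusive, we conclude from (2-20) that $p_n(k)$ is given by (3-13).
FIGURE 3-2
In Fig. 3-2, we plot $p_n(k)$ for $n = 9$. The meaning of the dashed curves will be
explained later.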
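As a small numerical illustration of (3-13), the following Python sketch evaluates $p_n(k)$ directly. The function name `binomial_pmf` and the value p = 0.3 are our own choices for illustration and do not come from the text.

```python
from math import comb

def binomial_pmf(n: int, k: int, p: float) -> float:
    """Probability (3-13) that an event of probability p occurs exactly k times in n trials."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)

p = 0.3  # illustrative value, not taken from the text

# Special case of Example 3-6: two heads in three tosses should equal 3 p^2 q.
print(binomial_pmf(3, 2, p), 3 * p**2 * (1 - p))

# The probabilities p_n(k), k = 0, ..., n, of all possible counts should sum to 1.
print(sum(binomial_pmf(9, k, p) for k in range(10)))
```

The two numbers on the first printed line should agree up to rounding, and the second line should be 1 up to rounding.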
Example 3-7. A fair die is rolled five times. We shall find the probability $p_5(2)$ that
"six" will show twice.
In the single roll of a die, $\mathcal{A} = \{\text{six}\}$ is an event with probability 1/6. Setting
$$P(\mathcal{A}) = \frac{1}{6} \qquad P(\bar{\mathcal{A}}) = \frac{5}{6} \qquad n = 5 \qquad k = 2$$
in (3-13), we obtain
$$p_5(2) = \frac{5!}{2!\,3!} \left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^3$$
Example 3-8. A pair of fair dice is rolled four times. We shall find the probability
$p_4(0)$ that "seven" will not show at all.
The space of the single roll of the two dice consists of the 36 elements $f_i f_j$.
The event $\mathcal{A} = \{\text{seven}\}$ consists of the six elements
$$f_1 f_6 \quad f_6 f_1 \quad f_2 f_5 \quad f_5 f_2 \quad f_3 f_4 \quad f_4 f_3$$
Therefore $P(\mathcal{A}) = 6/36$ and $P(\bar{\mathcal{A}}) = 5/6$. With $n = 4$ and $k = 0$, (3-13) yields
$$p_4(0) = \left(\frac{5}{6}\right)^4$$
FIGURE 3-3 (the interval $(0, T)$ with $k$ points in the subinterval $(t_1, t_2)$)
Example 3-9. We place at random $n$ points in the interval $(0, T)$. What is the
probability that $k$ of these points are in the interval $(t_1, t_2)$ (Fig. 3-3)?
This example can be considered as a problem in repeated trials. The
experiment is the placing of a single point in the interval $(0, T)$. In this
experiment, $\mathcal{A} = \{\text{the point is in the interval } (t_1, t_2)\}$ is an event with probability
$$P(\mathcal{A}) = p = \frac{t_2 - t_1}{T}$$
In the space $\mathcal{S}^n$, the event {$\mathcal{A}$ occurs $k$ times} means that $k$ of the $n$ points are
in the interval $(t_1, t_2)$. Hence [see (3-13)]
$$P\{k \text{ points are in the interval } (t_1, t_2)\} = \binom{n}{k} p^k q^{n-k}$$    (3-15)
Example 3-10. A system containing $n$ components is put into operation at $t = 0$.
The probability that a particular component will fail in the interval $(0, t)$ equals
$$p = \int_0^t \alpha(\tau)\,d\tau \quad \text{where} \quad \alpha(t) \ge 0 \quad \int_0^\infty \alpha(t)\,dt = 1$$    (3-16)
What is the probability that $k$ of these components will fail prior to time $t$?
This example can also be considered as a problem in repeated trials.
Reasoning as above, we conclude that the unknown probability is given by (3-15).
Most likely number of successes. We shall now examine the behavior of $p_n(k)$
as a function of $k$ for a fixed $n$. We maintain that as $k$ increases, $p_n(k)$
increases reaching a maximum for
$$k = k_{\max} = [(n+1)p]$$    (3-17)
where the brackets mean the largest integer that does not exceed $(n+1)p$. If
$(n+1)p$ is an integer, then $p_n(k)$ is maximum for two consecutive values of $k$:
$$k = k_1 = (n+1)p \quad \text{and} \quad k = k_2 = k_1 - 1 = np - q$$
Proof. We form the ratio
$$\frac{p_n(k-1)}{p_n(k)} = \frac{kq}{(n-k+1)p}$$
If this ratio is less than 1, that is, if $k < (n+1)p$, then $p_n(k-1)$ is less than
$p_n(k)$. This shows that as $k$ increases, $p_n(k)$ increases reaching its maximum for
$k = [(n+1)p]$. For $k > (n+1)p$, the above ratio is greater than 1; hence
$p_n(k)$ decreases.
If $k_1 = (n+1)p$ is an integer, then
$$\frac{p_n(k_1 - 1)}{p_n(k_1)} = \frac{k_1 q}{(n - k_1 + 1)p} = \frac{(n+1)pq}{[n - (n+1)p + 1]p} = 1$$
This shows that $p_n(k)$ is maximum for $k = k_1$ and $k = k_1 - 1$.
Example 3-11. (a) If $n = 10$ and $p = 1/3$, then $(n+1)p = 11/3$; hence $k_{\max} = [11/3] = 3$.
(b) If $n = 11$ and $p = 1/2$, then $(n+1)p = 6$; hence $k_1 = 6$, $k_2 = 5$.
We shall, finally, find the probability
$$P\{k_1 \le k \le k_2\}$$
that the number $k$ of occurrences of $\mathcal{A}$ is between $k_1$ and $k_2$. Clearly, the
events {$\mathcal{A}$ occurs $k$ times}, where $k$ takes all values from $k_1$ to $k_2$, are
mutually exclusive and their union is the event $\{k_1 \le k \le k_2\}$. Hence [see (3-13)]
$$P\{k_1 \le k \le k_2\} = \sum_{k=k_1}^{k_2} p_n(k) = \sum_{k=k_1}^{k_2} \binom{n}{k} p^k q^{n-k}$$    (3-18)
Example 3-12. An order of $10^4$ parts is received. The probability that a part is
defective equals 0.1. What is the probability that the total number of defective
parts does not exceed 1100?
The experiment is the selection of a single part. The probability of the
event $\mathcal{A} = \{\text{the part is defective}\}$ equals 0.1. We want the probability that in $10^4$
trials, $\mathcal{A}$ will occur at most 1100 times. With
$$p = 0.1 \qquad n = 10^4 \qquad k_1 = 0 \qquad k_2 = 1100$$
(3-18) yields
$$P\{0 \le k \le 1100\} = \sum_{k=0}^{1100} \binom{10^4}{k} (0.1)^k (0.9)^{10^4 - k}$$    (3-19)
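The rule (3-17) for the most likely number of successes and the sum (3-18) can both be evaluated by machine. The sketch below is one possible way to do so in Python, using rational arithmetic so that the test "is $(n+1)p$ an integer" is exact. The function names `most_likely_k` and `interval_probability` are hypothetical helpers introduced here, not notation from the text.

```python
from fractions import Fraction
from math import comb, floor

def most_likely_k(n, p):
    """Value(s) of k that maximize p_n(k), following (3-17). p should be a Fraction."""
    x = (n + 1) * Fraction(p)
    if x.denominator == 1:              # (n+1)p is an integer: two maxima, k1 - 1 and k1
        return [int(x) - 1, int(x)]
    return [floor(x)]

def interval_probability(n, p, k1, k2):
    """Exact sum (3-18), P{k1 <= k <= k2}, for rational p = a/b.

    Each term is comb(n, k) * a^k * (b - a)^(n - k) / b^n; the integer
    numerators are summed first and divided by b^n once at the end.
    """
    p = Fraction(p)
    a, b = p.numerator, p.denominator
    total = sum(comb(n, k) * a**k * (b - a)**(n - k) for k in range(k1, k2 + 1))
    return total / b**n

print(most_likely_k(10, Fraction(1, 3)))    # Example 3-11(a)
print(most_likely_k(11, Fraction(1, 2)))    # Example 3-11(b)
print(interval_probability(10**4, Fraction(1, 10), 0, 1100))   # the sum (3-19)
```

Working with integers avoids the underflow that floating-point powers such as $(0.9)^{10^4 - k}$ would otherwise risk; the price is arithmetic on numbers with thousands of digits, which is still fast for sums of this size.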
3-3 ASYMPTOTIC THEOREMS
In the preceding section, we showed that if the probability $P(\mathcal{A})$ of an event $\mathcal{A}$
of a certain experiment equals $p$ and the experiment is repeated $n$ times, then
the probability that $\mathcal{A}$ occurs $k$ times in any order is given by (3-13) and the
probability that $k$ is between $k_1$ and $k_2$ by (3-18). In this section, we develop
simple approximate formulas for evaluating these probabilities.
Gaussian functions. In the following and throughout the book we use extensively the normal or gaussian function
$$g(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$    (3-20)
and its integral (see Fig. 3-4 and Table 3-1)
$$G(x) = \int_{-\infty}^{x} g(y)\,dy = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy$$    (3-21)
As is well known
$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \sqrt{\frac{\pi}{\alpha}}$$    (3-22)
From this it follows that
$$G(\infty) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\,dx = 1$$    (3-23)
Since $g(-x) = g(x)$, we conclude that
$$G(-x) = 1 - G(x)$$    (3-24)
With a change of variables, (3-21) yields
$$\frac{1}{\sigma\sqrt{2\pi}} \int_{x_1}^{x_2} e^{-(x-\eta)^2/2\sigma^2}\,dx = G\left(\frac{x_2 - \eta}{\sigma}\right) - G\left(\frac{x_1 - \eta}{\sigma}\right)$$    (3-25)
for any $\eta$ and any $\sigma > 0$.
TABLE 3-1
$$\operatorname{erf} x = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-y^2/2}\,dy = G(x) - \frac{1}{2}$$
x	erf x	x	erf x	x	erf x	x	erf x
0.05	0.01994	0.80	0.28814	1.55	0.43943	2.30	0.48928
0.10	0.03983	0.85	0.30234	1.60	0.44520	2.35	0.49061
0.15	0.05962	0.90	0.31594	1.65	0.45053	2.40	0.49180
0.20	0.07926	0.95	0.32894	1.70	0.45543	2.45	0.49286
0.25	0.09871	1.00	0.34134	1.75	0.45994	2.50	0.49379
0.30	0.11791	1.05	0.35314	1.80	0.46407	2.55	0.49461
0.35	0.13683	1.10	0.36433	1.85	0.46784	2.60	0.49534
0.40	0.15542	1.15	0.37493	1.90	0.47128	2.65	0.49597
0.45	0.17364	1.20	0.38493	1.95	0.47441	2.70	0.49653
0.50	0.19146	1.25	0.39435	2.00	0.47726	2.75	0.49702
0.55	0.20884	1.30	0.40320	2.05	0.47982	2.80	0.49744
0.60	0.22575	1.35	0.41149	2.10	0.48214	2.85	0.49781
0.65	0.24215	1.40	0.41924	2.15	0.48422	2.90	0.49813
0.70	0.25804	1.45	0.42647	2.20	0.48610	2.95	0.49841
0.75	0.27337	1.50	0.43319	2.25	0.48778	3.00	0.49865
For large $x$, $G(x)$ is given approximately by (see Prob. 3-9)
$$G(x) \approx 1 - \frac{1}{x} g(x)$$    (3-26)
We note, finally, that $G(x)$ is often expressed in terms of the error function
$$\operatorname{erf} x = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-y^2/2}\,dy = G(x) - \frac{1}{2}$$
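For readers who prefer to compute $G(x)$ rather than read it from Table 3-1, the following Python sketch may help. One caution: the function called erf in this text equals $G(x) - 1/2$, which differs from the error function provided by most numerical libraries (including Python's `math.erf`), defined as $(2/\sqrt{\pi}) \int_0^x e^{-t^2}\,dt$. The names `g`, `G`, and `table_erf` below are our own.

```python
from math import erf, exp, pi, sqrt

def g(x: float) -> float:
    """Gaussian function (3-20)."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def G(x: float) -> float:
    """Integral (3-21), expressed through the standard-library error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2)))

def table_erf(x: float) -> float:
    """The quantity tabulated in Table 3-1, namely G(x) - 1/2 (this is not math.erf)."""
    return G(x) - 0.5

print(table_erf(1.0))               # compare with the x = 1.00 entry of Table 3-1
print(G(-1.5) + G(1.5))             # symmetry (3-24): the sum should be 1
print(G(4.0), 1 - g(4.0) / 4.0)     # large-x approximation (3-26)
```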
DeMoivre-Laplace Theorem
It can be shown that, if $npq \gg 1$, then
$$\binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi npq}} e^{-(k-np)^2/2npq}$$    (3-27)
for $k$ in a $\sqrt{npq}$ neighborhood of $np$. This important approximation, known as
the DeMoivre-Laplace theorem, can be stated as an equality in the limit: The
ratio of the two sides tends to 1 as $n \to \infty$. The proof is based on Stirling's
formula
$$n! \approx n^n e^{-n} \sqrt{2\pi n} \qquad n \to \infty$$    (3-28)
The details, however, will be omitted.†
†The proof can be found in Feller, 1957 (see references at the end of the book).
Thus the evaluation of the probability of $k$ successes in $n$ trials, given
exactly by (3-13), is reduced to the evaluation of the normal curve
$$\frac{1}{\sqrt{2\pi npq}} e^{-(x-np)^2/2npq}$$    (3-29)
for $x = k$.
Example 3-13. A fair coin is tossed 1000 times. Find the probability $p_a$ that heads
will show 500 times and the probability $p_b$ that heads will show 510 times.
In this example
$$p = q = 0.5 \qquad n = 1000 \qquad \sqrt{npq} = 5\sqrt{10}$$
(a) If $k = 500$ then $k - np = 0$ and (3-27) yields
$$p_a \approx \frac{1}{\sqrt{2\pi npq}} = \frac{1}{10\sqrt{5\pi}} = 0.0252$$
(b) If $k = 510$ then $k - np = 10$ and (3-27) yields
$$p_b \approx \frac{e^{-0.2}}{10\sqrt{5\pi}} = 0.0207$$
As the next example indicates, the approximation (3-27) is satisfactory
even for moderate values of n.
Example 3-14. We shall determine $p_n(k)$ for $p = 0.5$, $n = 10$, and $k = 5$.
(a) Exactly from (3-13)
$$p_n(k) = \binom{n}{k} p^k q^{n-k} = \frac{10!}{5!\,5!} \cdot \frac{1}{2^{10}} = 0.246$$
(b) Approximately from (3-27)
$$p_n(k) \approx \frac{1}{\sqrt{2\pi npq}} e^{-(k-np)^2/2npq} = \frac{1}{\sqrt{5\pi}} = 0.252$$
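The comparison made in Examples 3-13 and 3-14 can be repeated for other values of $n$ and $k$ with a few lines of Python. This is only an illustrative sketch; the function names `exact` and `demoivre_laplace` are ours.

```python
from math import comb, exp, pi, sqrt

def exact(n: int, k: int, p: float) -> float:
    """Exact probability (3-13)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def demoivre_laplace(n: int, k: int, p: float) -> float:
    """Right side of the approximation (3-27)."""
    npq = n * p * (1 - p)
    return exp(-(k - n * p)**2 / (2 * npq)) / sqrt(2 * pi * npq)

# (n, k) pairs of Example 3-13 (a), (b) and Example 3-14, all with p = 0.5
for n, k in [(1000, 500), (1000, 510), (10, 5)]:
    print(n, k, exact(n, k, 0.5), demoivre_laplace(n, k, 0.5))
```

Each printed row lists the exact value followed by the approximation, so the quality of (3-27) can be judged side by side.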
APPROXIMATE EVALUATION OF $P\{k_1 \le k \le k_2\}$. Using the approximation (3-27),
we shall show that
$$\sum_{k=k_1}^{k_2} \binom{n}{k} p^k q^{n-k} \approx G\left(\frac{k_2 - np}{\sqrt{npq}}\right) - G\left(\frac{k_1 - np}{\sqrt{npq}}\right)$$    (3-30)
Thus, to find the probability that in $n$ trials the number of occurrences of an
event $\mathcal{A}$ is between $k_1$ and $k_2$, it suffices to evaluate the tabulated normal
function $G(x)$. The approximation is satisfactory if $npq \gg 1$ and the differences
$k_1 - np$ and $k_2 - np$ are of the order of $\sqrt{npq}$.
FIGURE 3-5
Proof. Inserting (3-27) into (3-18), we obtain
$$\sum_{k=k_1}^{k_2} \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sigma\sqrt{2\pi}} \sum_{k=k_1}^{k_2} e^{-(k-np)^2/2\sigma^2}$$    (3-31)
The normal curve is nearly constant in any interval of length 1 because
$\sigma^2 = npq \gg 1$ by assumption; hence its area in such an interval equals approximately its ordinate (Fig. 3-5). From this it follows that the right side of (3-31)
can be approximated by the integral of the normal curve in the interval $(k_1, k_2)$.
This yields
$$\frac{1}{\sigma\sqrt{2\pi}} \sum_{k=k_1}^{k_2} e^{-(k-np)^2/2\sigma^2} \approx \frac{1}{\sigma\sqrt{2\pi}} \int_{k_1}^{k_2} e^{-(x-np)^2/2\sigma^2}\,dx$$    (3-32)
and (3-30) results [see (3-25)].
Error correction. The sum on the left of (3-31) consists of $k_2 - k_1 + 1$ terms.
The integral in (3-32) is an approximation of the shaded area of Fig. 3-6a,
consisting of $k_2 - k_1$ rectangles. If $k_2 - k_1 \gg 1$ the resulting error can be
neglected. For moderate values of $k_2 - k_1$, however, the error is no longer
negligible. To reduce it, we replace in (3-30) the limits $k_1$ and $k_2$ by $k_1 - 1/2$
and $k_2 + 1/2$ respectively (see Fig. 3-6b). This yields the improved approximation
$$\sum_{k=k_1}^{k_2} \binom{n}{k} p^k q^{n-k} \approx G\left(\frac{k_2 + 0.5 - np}{\sqrt{npq}}\right) - G\left(\frac{k_1 - 0.5 - np}{\sqrt{npq}}\right)$$    (3-33)
FIGURE 3-6
Example 3-15. A fair coin is tossed 10 000 times. What is the probability that the
number of heads is between 4900 and 5100?
In this problem
$$n = 10\,000 \qquad p = q = 0.5 \qquad k_1 = 4900 \qquad k_2 = 5100$$
Since $(k_2 - np)/\sqrt{npq} = 100/50$ and $(k_1 - np)/\sqrt{npq} = -100/50$, we conclude
from (3-30) that the unknown probability equals
$$G(2) - G(-2) = 2G(2) - 1 = 0.9545$$
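The normal approximation of an interval probability, with and without the half-unit widening of the limits described under "Error correction", can be sketched in Python as follows. The function `normal_interval` and its `corrected` flag are hypothetical names chosen for this illustration.

```python
from math import erf, sqrt

def G(x: float) -> float:
    """Gaussian integral (3-21)."""
    return 0.5 * (1.0 + erf(x / sqrt(2)))

def normal_interval(n, p, k1, k2, corrected=False):
    """Approximate P{k1 <= k <= k2} by (3-30).

    With corrected=True the limits are widened to k1 - 1/2 and k2 + 1/2,
    as suggested in the error-correction paragraph.
    """
    mean = n * p
    sigma = sqrt(n * p * (1 - p))
    half = 0.5 if corrected else 0.0
    return G((k2 + half - mean) / sigma) - G((k1 - half - mean) / sigma)

# Example 3-15: 10 000 tosses of a fair coin, heads between 4900 and 5100
print(normal_interval(10_000, 0.5, 4900, 5100))
print(normal_interval(10_000, 0.5, 4900, 5100, corrected=True))
```

For an interval as wide as this one the two results differ only slightly; the correction matters more when $k_2 - k_1$ is moderate.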
Example 3-16. Over a period of 12 hours 180 calls are made at random. What is
the probability that in a four-hour interval the number of calls is between 50 and
70?
The above can be considered as a problem in repeated trials with $p = 4/12$
the probability that a particular call will occur in the four-hour interval. The
probability that $k$ calls will occur in this interval equals [see (3-27)]
$$\binom{180}{k} \left(\frac{1}{3}\right)^k \left(\frac{2}{3}\right)^{180-k} \approx \frac{1}{4\sqrt{5\pi}} e^{-(k-60)^2/80}$$
and the probability that the number of calls is between 50 and 70 equals [see
(3-30)]
$$\sum_{k=50}^{70} \binom{180}{k} \left(\frac{1}{3}\right)^k \left(\frac{2}{3}\right)^{180-k} \approx G(\sqrt{2.5}) - G(-\sqrt{2.5}) = 0.886$$
Note It seems that we cannot use the approximation (3-30) if $k_1 = 0$ because the sum
contains values of $k$ that are not in the $\sqrt{npq}$ vicinity of $np$. However, the corresponding
terms are small compared to the terms with $k$ near $np$; hence the errors of their
estimates are also small. Since
$$G(-np/\sqrt{npq}) = G(-\sqrt{np/q}) \approx 0 \quad \text{for} \quad np/q \gg 1$$
we conclude that if not only $n \gg 1$ but also $np \gg 1$, then
$$\sum_{k=0}^{k_2} \binom{n}{k} p^k q^{n-k} \approx G\left(\frac{k_2 - np}{\sqrt{npq}}\right)$$    (3-34)
In the sum (3-19) of Example 3-12,
$$np = 1000 \qquad npq = 900 \qquad \frac{k_2 - np}{\sqrt{npq}} = \frac{100}{30} = \frac{10}{3}$$
Using (3-34), we obtain
$$\sum_{k=0}^{1100} \binom{10^4}{k} (0.1)^k (0.9)^{10^4 - k} \approx G\left(\frac{10}{3}\right) = 0.99957$$
We note that the sum of the terms of the above sum from 900 to 1100 equals
$2G(10/3) - 1 = 0.99914$.
The Law of Large Numbers
According to the relative frequency interpretation of probability, if an event $\mathcal{A}$
with $P(\mathcal{A}) = p$ occurs $k$ times in $n$ trials, then $k \approx np$. In the following, we
rephrase this heuristic statement as a limit theorem.
We start with the observation that $k \approx np$ does not mean that $k$ will be
close to $np$. In fact [see (3-27)]
$$P\{k = np\} \approx \frac{1}{\sqrt{2\pi npq}} \to 0 \quad \text{as} \quad n \to \infty$$    (3-35)
As we show in the next theorem, the approximation $k \approx np$ means that the
ratio $k/n$ is close to $p$ in the sense that, for any $\varepsilon > 0$, the probability that
$|k/n - p| \le \varepsilon$ tends to 1 as $n \to \infty$.
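THEOREM. For any $\varepsilon > 0$,
$$P\left\{\left|\frac{k}{n} - p\right| \le \varepsilon\right\} \to 1 \quad \text{as} \quad n \to \infty$$    (3-36)
Proof. The inequality $|k/n - p| \le \varepsilon$ means that
$$n(p - \varepsilon) \le k \le n(p + \varepsilon)$$
With $k_1 = n(p - \varepsilon)$ and $k_2 = n(p + \varepsilon)$ we have
$$P\left\{\left|\frac{k}{n} - p\right| \le \varepsilon\right\} = P\{k_1 \le k \le k_2\} = \sum_{k=k_1}^{k_2} \binom{n}{k} p^k q^{n-k}$$
Inserting into (3-30), we obtain
$$P\{k_1 \le k \le k_2\} \approx G\left(\frac{k_2 - np}{\sqrt{npq}}\right) - G\left(\frac{k_1 - np}{\sqrt{npq}}\right) = 2G\left(\varepsilon\sqrt{\frac{n}{pq}}\right) - 1$$    (3-37)
But $\varepsilon\sqrt{n/pq} \to \infty$ as $n \to \infty$ for any $\varepsilon$. Hence
$$P\left\{\left|\frac{k}{n} - p\right| \le \varepsilon\right\} \approx 2G\left(\varepsilon\sqrt{\frac{n}{pq}}\right) - 1 \to 1 \quad \text{as} \quad n \to \infty$$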
Example 3-17. Suppose that $p = q = 0.5$ and $\varepsilon = 0.05$. In this case
$$n(p - \varepsilon) = 0.45n \qquad n(p + \varepsilon) = 0.55n \qquad \varepsilon\sqrt{n/pq} = 0.1\sqrt{n}$$
In the table below we show the probability $2G(0.1\sqrt{n}) - 1$ that $k$ is between $0.45n$
and $0.55n$ for various values of $n$.
n	100	400	900
0.1√n	1	2	3
2G(0.1√n) - 1	0.682	0.954	0.997
Example 3-18. We now assume that $p = 0.6$ and we wish to find $n$ such that the
probability that $k$ is between $0.59n$ and $0.61n$ is at least 0.98.
In this case, $p = 0.6$, $q = 0.4$, and $\varepsilon = 0.01$. Hence
$$P\{0.59n \le k \le 0.61n\} \approx 2G(0.01\sqrt{n/0.24}) - 1$$
Thus $n$ must be such that
$$2G(0.01\sqrt{n/0.24}) - 1 \ge 0.98$$
From Table 3-1 we see that $G(x) > 0.99$ if $x > 2.35$. Hence $0.01\sqrt{n/0.24} > 2.35$
yielding $n > 13\,254$.
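The calculations of Examples 3-17 and 3-18 follow one pattern and can be sketched in Python as below. The helper names `prob_within` and `trials_needed` are our own. When no table value is supplied, `trials_needed` uses the inverse normal distribution from Python's `statistics` module in place of the coarser table lookup $x = 2.35$, so it may return a somewhat smaller $n$ than the text.

```python
from math import ceil, erf, sqrt
from statistics import NormalDist

def G(x: float) -> float:
    """Gaussian integral (3-21)."""
    return 0.5 * (1.0 + erf(x / sqrt(2)))

def prob_within(n: int, p: float, eps: float) -> float:
    """Approximation (3-37) of P{|k/n - p| <= eps}."""
    return 2 * G(eps * sqrt(n / (p * (1 - p)))) - 1

def trials_needed(p: float, eps: float, level: float, x: float = None) -> int:
    """Smallest n for which (3-37) is at least `level`.

    x is the point where G(x) = (1 + level) / 2; if omitted it is computed
    with the inverse normal distribution rather than read from Table 3-1.
    """
    if x is None:
        x = NormalDist().inv_cdf((1 + level) / 2)
    return ceil(p * (1 - p) * (x / eps) ** 2)

# Example 3-17: the table of 2G(0.1 sqrt(n)) - 1
for n in (100, 400, 900):
    print(n, prob_within(n, 0.5, 0.05))

# Example 3-18: with the table value x = 2.35, then with the computed quantile
print(trials_needed(0.6, 0.01, 0.98, x=2.35))
print(trials_needed(0.6, 0.01, 0.98))
```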
GENERALIZATION OF BERNOULLI TRIALS. The experiment of repeated trials
can be phrased in the following form: The events $\mathcal{A}_1 = \mathcal{A}$ and $\mathcal{A}_2 = \bar{\mathcal{A}}$ of the
space $\mathcal{S}$ form a partition and their respective probabilities equal $p_1 = p$ and
$p_2 = 1 - p$. In the space $\mathcal{S}^n$, the probability of the event {$\mathcal{A}_1$ occurs $k_1 = k$
times and $\mathcal{A}_2$ occurs $k_2 = n - k$ times in any order} equals $p_n(k)$ as in (3-13).
We shall now generalize.
Suppose that
$$\mathcal{U} = [\mathcal{A}_1, \ldots, \mathcal{A}_r]$$
is a partition of $\mathcal{S}$ consisting of the $r$ events $\mathcal{A}_i$ with
$$P(\mathcal{A}_i) = p_i \qquad p_1 + \cdots + p_r = 1$$
We repeat the experiment $n$ times and we denote by $p_n(k_1, \ldots, k_r)$ the
probability of the event {$\mathcal{A}_1$ occurs $k_1$ times, ..., $\mathcal{A}_r$ occurs $k_r$ times in any
order} where
$$k_1 + \cdots + k_r = n$$
We maintain that
$$p_n(k_1, \ldots, k_r) = \frac{n!}{k_1! \cdots k_r!} p_1^{k_1} \cdots p_r^{k_r}$$    (3-38)
Proof. Repeated application of (3-11) leads to the conclusion that the number
of events of the form {$\mathcal{A}_1$ occurs $k_1$ times, ..., $\mathcal{A}_r$ occurs $k_r$ times in a specific
order} equals
$$\frac{n!}{k_1! \cdots k_r!}$$
Since the trials are independent, the probability of each such event equals
$$p_1^{k_1} \cdots p_r^{k_r}$$
and (3-38) results.
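Example 3-19. A fair die is rolled 10 times. We shall determine the probability
that $f_1$ shows three times, and "even" shows six times.
In this case
$$\mathcal{A}_1 = \{f_1\} \qquad \mathcal{A}_2 = \{f_2, f_4, f_6\} \qquad \mathcal{A}_3 = \{f_3, f_5\}$$
Clearly,
$$p_1 = \frac{1}{6} \qquad p_2 = \frac{3}{6} \qquad p_3 = \frac{2}{6} \qquad k_1 = 3 \qquad k_2 = 6 \qquad k_3 = 1$$
and (3-38) yields
$$p_{10}(3, 6, 1) = \frac{10!}{3!\,6!\,1!} \left(\frac{1}{6}\right)^3 \left(\frac{1}{2}\right)^6 \frac{1}{3} = \frac{840}{41\,472} \approx 0.020$$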
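Formula (3-38) is easy to evaluate mechanically. The Python sketch below is one way to do it; the function name `multinomial_pmf` and its list-based interface are assumptions made for this illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability (3-38) that event i occurs counts[i] times in sum(counts) trials."""
    n = sum(counts)
    coefficient = factorial(n)
    for k in counts:
        coefficient //= factorial(k)    # each successive quotient is an integer
    return coefficient * prod(p**k for p, k in zip(probs, counts))

# Example 3-19: f1 three times, "even" six times, and {f3, f5} once in 10 rolls
print(multinomial_pmf([3, 6, 1], [1/6, 1/2, 1/3]))

# With r = 2 the formula reduces to the binomial probability (3-13)
print(multinomial_pmf([2, 1], [0.3, 0.7]), 3 * 0.3**2 * 0.7)
```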
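DeMoivre-Laplace theorem. We can show as in (3-27) that, if $k_i$ is in the $\sqrt{n}$
vicinity of $np_i$ and $n$ is sufficiently large, then
$$\frac{n!}{k_1! \cdots k_r!} p_1^{k_1} \cdots p_r^{k_r} \approx \frac{\exp\left\{-\dfrac{1}{2}\left[\dfrac{(k_1 - np_1)^2}{np_1} + \cdots + \dfrac{(k_r - np_r)^2}{np_r}\right]\right\}}{\sqrt{(2\pi n)^{r-1} p_1 \cdots p_r}}$$    (3-39)
Equation (3-27) is a special case.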
3-4 POISSON THEOREM AND RANDOM
POINTS
We have shown in (3-13) that the probability that an event $\mathcal{A}$ occurs $k$ times in
$n$ trials equals
$$\frac{n!}{k!(n-k)!} p^k q^{n-k} = \frac{n(n-1) \cdots (n-k+1)}{1 \cdot 2 \cdots k} p^k q^{n-k}$$    (3-40)
In the following, we obtain an approximate expression for this probability under
the assumption that $p \ll 1$. If $n$ is so large that $np \approx npq \gg 1$, then we can use
the DeMoivre-Laplace theorem (3-27). If, however, $np$ is of the order of one, (3-27)
is no longer valid. In this case, the following approximation can be used: For $k$
of the order of $np$,
$$\frac{n!}{k!(n-k)!} p^k q^{n-k} \approx e^{-np} \frac{(np)^k}{k!}$$    (3-41)
Indeed, if $k$ is of the order of $np$, then $k \ll n$ and $kp \ll 1$. Hence
$$n(n-1) \cdots (n-k+1) \approx n \cdot n \cdots n = n^k$$
$$q = 1 - p \approx e^{-p} \qquad q^{n-k} \approx e^{-(n-k)p} \approx e^{-np}$$
Inserting into (3-40), we obtain (3-41).
The above approximation can be stated as a limit theorem (see Feller,
1957):
POISSON THEOREM. If
$$n \to \infty \qquad p \to 0 \qquad np \to a$$
then
$$\frac{n!}{k!(n-k)!} p^k q^{n-k} \xrightarrow[n \to \infty]{} e^{-a} \frac{a^k}{k!}$$    (3-42)
Example 3-20. A system contains 1000 components. Each component fails
independently of the others and the probability of its failure in one month equals
$10^{-3}$. We shall find the probability that the system will function (i.e., no component
will fail) at the end of one month.
This can be considered as a problem in repeated trials with $p = 10^{-3}$,
$n = 10^3$, and $k = 0$. Hence [see (3-15)]
$$P\{k = 0\} = q^n = 0.999^{1000}$$
Since $np = 1$, the approximation (3-41) yields
$$P\{k = 0\} \approx e^{-np} = e^{-1} = 0.368$$
Applying (3-41) to the sum in (3-18), we obtain the following approximation for the probability that the number $k$ of occurrences of $\mathcal{A}$ is between $k_1$
and $k_2$:
$$P\{k_1 \le k \le k_2\} \approx e^{-np} \sum_{k=k_1}^{k_2} \frac{(np)^k}{k!}$$    (3-43)
Example 3-21. An order of 3000 parts is received. The probability that a part is
defective equals $10^{-3}$. We wish to find the probability $P\{k > 5\}$ that there will be
more than five defective parts.
Clearly,
$$P\{k > 5\} = 1 - P\{k \le 5\}$$
With $np = 3$, (3-43) yields
$$P\{k \le 5\} \approx e^{-3} \sum_{k=0}^{5} \frac{3^k}{k!} = 0.916$$
Hence
$$P\{k > 5\} \approx 0.084$$
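To see how close the Poisson approximation is in Examples 3-20 and 3-21, one can compare it with the exact binomial values. The Python sketch below does this; the function names are ours and the code is meant only as an illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(n: int, k: int, p: float) -> float:
    """Exact probability (3-13)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k: int, a: float) -> float:
    """Right side of (3-41) and (3-42) with a = np."""
    return exp(-a) * a**k / factorial(k)

# Example 3-20: n = 1000, p = 0.001, no failures
print(binomial_pmf(1000, 0, 1e-3), poisson_pmf(0, 1.0))

# Example 3-21: n = 3000, p = 0.001, more than five defective parts
exact_tail = 1 - sum(binomial_pmf(3000, k, 1e-3) for k in range(6))
approx_tail = 1 - sum(poisson_pmf(k, 3.0) for k in range(6))
print(exact_tail, approx_tail)
```

In both cases $np$ is of the order of one while $p \ll 1$, which is the regime in which the text recommends (3-41).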
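Generalization of Poisson theorem. Suppose that $\mathcal{A}_1, \ldots, \mathcal{A}_{m+1}$ are the $m + 1$
events of a partition with $P\{\mathcal{A}_i\} = p_i$. Reasoning as in (3-42), we can show that
if $np_i \to a_i$ for $i \le m$, then
$$\frac{n!}{k_1! \cdots k_{m+1}!} p_1^{k_1} \cdots p_{m+1}^{k_{m+1}} \xrightarrow[n \to \infty]{} \frac{e^{-a_1} a_1^{k_1}}{k_1!} \cdots \frac{e^{-a_m} a_m^{k_m}}{k_m!}$$    (3-44)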
Random Poisson Points
An important application of Poisson's theorem is the approximate evaluation of
(3-15) as $T$ and $n$ tend to $\infty$. We repeat the problem: We place at random $n$
points in the interval $(-T/2, T/2)$ and we denote by $P\{k \text{ in } t_a\}$ the probability
that $k$ of these points will lie in an interval $(t_1, t_2)$ of length $t_2 - t_1 = t_a$. As we
have shown in (3-15)
$$P\{k \text{ in } t_a\} = \binom{n}{k} p^k q^{n-k} \quad \text{where} \quad p = \frac{t_a}{T}$$    (3-45)
We now assume that $n \gg 1$ and $t_a \ll T$. Applying (3-41), we conclude
that
$$P\{k \text{ in } t_a\} \approx e^{-nt_a/T} \frac{(nt_a/T)^k}{k!}$$    (3-46)
for $k$ of the order of $nt_a/T$.
Suppose, next, that $n$ and $T$ increase indefinitely but the ratio
$$\lambda = n/T$$
remains constant. The result is an infinite set of points covering the entire $t$ axis
from $-\infty$ to $+\infty$. As we see from (3-46), the probability that $k$ of these points
are in an interval of length $t_a$ is given by
$$P\{k \text{ in } t_a\} = e^{-\lambda t_a} \frac{(\lambda t_a)^k}{k!}$$    (3-47)
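The passage from finitely many random points to (3-47) can be illustrated by simulation. The following Python sketch places $n$ points uniformly in $(-T/2, T/2)$ many times and compares the observed relative frequencies of the count in a short interval with the Poisson probabilities. The parameter values ($n = 100$, $T = 100$, $t_a = 2$, 5000 repetitions, seed 0) are arbitrary choices of ours, and the agreement is only approximate because $n$ is finite and the frequencies are random.

```python
import random
from math import exp, factorial

def simulate_counts(n, T, t1, t2, trials, seed=0):
    """Place n points uniformly in (-T/2, T/2), `trials` times, and return the
    relative frequency of each observed number of points inside (t1, t2)."""
    rng = random.Random(seed)
    freq = {}
    for _ in range(trials):
        k = sum(1 for _ in range(n) if t1 < rng.uniform(-T / 2, T / 2) < t2)
        freq[k] = freq.get(k, 0) + 1
    return {k: c / trials for k, c in sorted(freq.items())}

def poisson_pmf(k, a):
    """Right side of (3-47) with a = lambda * t_a."""
    return exp(-a) * a**k / factorial(k)

n, T, t1, t2 = 100, 100.0, 0.0, 2.0     # lambda = n/T = 1 and t_a = 2
freq = simulate_counts(n, T, t1, t2, trials=5000)
for k in range(5):
    print(k, freq.get(k, 0.0), poisson_pmf(k, (n / T) * (t2 - t1)))
```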
POINTS IN NONOVERLAPPING INTERVALS. Returning for a moment to the
original interval $(-T/2, T/2)$ containing $n$ points, we consider two nonoverlapping subintervals $t_a$ and $t_b$ (Fig. 3-7).
FIGURE 3-7
We wish to determine the probability
$$P\{k_a \text{ in } t_a, k_b \text{ in } t_b\}$$
that $k_a$ of the $n$ points are in the interval $t_a$ and $k_b$ in the interval $t_b$. We maintain
that
$$P\{k_a \text{ in } t_a, k_b \text{ in } t_b\} = \frac{n!}{k_a!\,k_b!\,k_3!} \left(\frac{t_a}{T}\right)^{k_a} \left(\frac{t_b}{T}\right)^{k_b} \left(1 - \frac{t_a}{T} - \frac{t_b}{T}\right)^{k_3}$$    (3-48)
where $k_3 = n - k_a - k_b$.
Proof. The above can be considered as a generalized Bernoulli trial. The
original experiment $\mathcal{S}$ is the random selection of a single point in the interval
$(-T/2, T/2)$. In this experiment, the events $\mathcal{A}_1 = \{\text{the point is in } t_a\}$, $\mathcal{A}_2 = \{\text{the point is in } t_b\}$, and $\mathcal{A}_3 = \{\text{the point is outside the intervals } t_a \text{ and } t_b\}$ form a
partition and
$$P(\mathcal{A}_1) = \frac{t_a}{T} \qquad P(\mathcal{A}_2) = \frac{t_b}{T} \qquad P(\mathcal{A}_3) = 1 - \frac{t_a}{T} - \frac{t_b}{T}$$
If the experiment $\mathcal{S}$ is performed $n$ times, then the event {$k_a$ in $t_a$ and $k_b$ in
$t_b$} will equal the event {$\mathcal{A}_1$ occurs $k_1 = k_a$ times, $\mathcal{A}_2$ occurs $k_2 = k_b$ times,
and $\mathcal{A}_3$ occurs $k_3 = n - k_1 - k_2$ times}. Hence (3-48) follows from (3-38) with
$r = 3$.
We note that the events {$k_a$ in $t_a$} and {$k_b$ in $t_b$} are not independent
because the probability (3-48) of their intersection {$k_a$ in $t_a$, $k_b$ in $t_b$} does not
equal $P\{k_a \text{ in } t_a\} P\{k_b \text{ in } t_b\}$.
Suppose now that
$$\frac{n}{T} = \lambda \qquad n \to \infty \qquad T \to \infty$$
Since $nt_a/T = \lambda t_a$ and $nt_b/T = \lambda t_b$, we conclude from (3-48) and Prob. 3-16
that
$$P\{k_a \text{ in } t_a, k_b \text{ in } t_b\} = e^{-\lambda t_a} \frac{(\lambda t_a)^{k_a}}{k_a!} \, e^{-\lambda t_b} \frac{(\lambda t_b)^{k_b}}{k_b!}$$    (3-49)
From (3-47) and (3-49) it follows that
$$P\{k_a \text{ in } t_a, k_b \text{ in } t_b\} = P\{k_a \text{ in } t_a\} P\{k_b \text{ in } t_b\}$$    (3-50)
This shows that the events {$k_a$ in $t_a$} and {$k_b$ in $t_b$} are independent.
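The convergence of the finite-$n$ probability (3-48) to the product form (3-49) can be observed numerically. The Python sketch below evaluates (3-48) through logarithms (to avoid huge factorials) for increasing $n$ with $\lambda = n/T$ held fixed. The specific values $\lambda = 1$, $t_a = 2$, $t_b = 3$, $k_a = 1$, $k_b = 2$ are illustrative assumptions, as are the function names.

```python
from math import exp, lgamma, log

def log_factorial(k: int) -> float:
    """log(k!) computed via the log-gamma function."""
    return lgamma(k + 1)

def two_interval_prob(n, T, ta, tb, ka, kb):
    """Finite-n probability (3-48), evaluated through logarithms."""
    k3 = n - ka - kb
    log_p = (log_factorial(n) - log_factorial(ka) - log_factorial(kb) - log_factorial(k3)
             + ka * log(ta / T) + kb * log(tb / T) + k3 * log(1 - ta / T - tb / T))
    return exp(log_p)

def poisson_pmf(k, a):
    """Right side of (3-47) with a = lambda * t."""
    return exp(-a + k * log(a) - log_factorial(k))

lam, ta, tb, ka, kb = 1.0, 2.0, 3.0, 1, 2
limit = poisson_pmf(ka, lam * ta) * poisson_pmf(kb, lam * tb)   # product in (3-49)
for n in (10, 100, 10_000, 1_000_000):
    print(n, two_interval_prob(n, n / lam, ta, tb, ka, kb), limit)
```

As $n$ grows, the first printed probability should approach the fixed product in the last column.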
We have thus created an experiment whose outcomes are infinite sets of
points on the $t$ axis. These outcomes will be called random Poisson points. The
experiment was formed by a limiting process; however, it is completely specified
in terms of the following two properties.
1. The probability $P\{k_a \text{ in } t_a\}$ that the number of points in an interval $(t_1, t_2)$
equals $k_a$ is given by (3-47).
2. If two intervals $(t_1, t_2)$ and $(t_3, t_4)$ are nonoverlapping, then the events {$k_a$ in
$(t_1, t_2)$} and {$k_b$ in $(t_3, t_4)$} are independent.
The experiment of random Poisson points is fundamental in the theory
and the applications of probability. As illustrations we mention electron emission, telephone calls, cars crossing a bridge, and shot noise, among many others.
Example 3-22. Consider two consecutive intervals $(t_1, t_2)$ and $(t_2, t_3)$ with respective
lengths $t_a$ and $t_b$. Clearly, $(t_1, t_3)$ is an interval with length $t_c = t_a + t_b$. We denote
by $k_a$, $k_b$, and $k_c = k_a + k_b$ the number of points in these intervals. We assume
that the number of points $k_c$ in the interval $(t_1, t_3)$ is specified. We wish to find the
probability that $k_a$ of these points are in the interval $(t_1, t_2)$. In other words, we
wish to find the conditional probability
$$P\{k_a \text{ in } t_a \mid k_c \text{ in } t_c\}$$
With $k_b = k_c - k_a$, we observe that
$$\{k_a \text{ in } t_a, k_c \text{ in } t_c\} = \{k_a \text{ in } t_a, k_b \text{ in } t_b\}$$
Hence
$$P\{k_a \text{ in } t_a \mid k_c \text{ in } t_c\} = \frac{P\{k_a \text{ in } t_a, k_b \text{ in } t_b\}}{P\{k_c \text{ in } t_c\}}$$
From (3-47) and (3-49) it follows that the above fraction equals
$$\frac{e^{-\lambda t_a} [(\lambda t_a)^{k_a}/k_a!] \, e^{-\lambda t_b} [(\lambda t_b)^{k_b}/k_b!]}{e^{-\lambda t_c} [(\lambda t_c)^{k_c}/k_c!]}$$
Since $t_c = t_a + t_b$ and $k_c = k_a + k_b$, the above yields
$$P\{k_a \text{ in } t_a \mid k_c \text{ in } t_c\} = \frac{k_c!}{k_a!\,k_b!} \left(\frac{t_a}{t_c}\right)^{k_a} \left(\frac{t_b}{t_c}\right)^{k_b}$$    (3-51)
This result has the following useful interpretation: Suppose that we place at
random $k_c$ points in the interval $(t_1, t_3)$. As we see from (3-15), the probability that
$k_a$ of these points are in the interval $(t_1, t_2)$ equals the right side of (3-51).
Density of Poisson points. The experiment of Poisson points is specified in
terms of the parameter $\lambda$. We show next that this parameter can be interpreted
as the density of the points. Indeed, if the interval $\Delta t = t_2 - t_1$ is sufficiently
small, then
$$\lambda \Delta t\, e^{-\lambda \Delta t} \approx \lambda \Delta t$$
From this and (3-47) it follows that
$$P\{\text{one point in } (t, t + \Delta t)\} \approx \lambda \Delta t$$    (3-52)
Hence
$$\lambda = \lim_{\Delta t \to 0} \frac{P\{\text{one point in } (t, t + \Delta t)\}}{\Delta t}$$    (3-53)
Nonuniform density. Using a nonlinear transformation of the $t$ axis, we
shall define an experiment whose outcomes are Poisson points specified by a
minor modification of property 1.
Suppose that $\lambda(t)$ is a function such that $\lambda(t) \ge 0$ but otherwise arbitrary.
We define the experiment of the nonuniform Poisson points as follows:
1. The probability that the number of points in the interval $(t_1, t_2)$ equals $k$ is
given by
$$P\{k \text{ in } (t_1, t_2)\} = \exp\left\{-\int_{t_1}^{t_2} \lambda(t)\,dt\right\} \frac{\left[\int_{t_1}^{t_2} \lambda(t)\,dt\right]^k}{k!}$$    (3-54)
2. The same as in the uniform case.
The significance of $\lambda(t)$ as density remains the same. Indeed, with $t_2 - t_1 = \Delta t$ and $k = 1$, (3-54) yields
$$P\{\text{one point in } (t, t + \Delta t)\} \approx \lambda(t) \Delta t$$    (3-55)
as in (3-52).
PROBLEMS
3-1. A pair of fair dice is rolled 10 times. Find the probability that "seven" will show at
least once.
Answer: $1 - (5/6)^{10}$.
3-2. A coin with $P\{h\} = p = 1 - q$ is tossed $n$ times. Show that the probability that the
number of heads is even equals $0.5[1 + (q - p)^n]$.
3-3. (Hypergeometric series) A shipment contains $K$ good and $N - K$ defective components. We pick at random $n \le K$ components and test them. Show that the
probability $p$ that $k$ of the tested components are good equals
$$p = \binom{K}{k} \binom{N-K}{n-k} \bigg/ \binom{N}{n}$$
3-4. A fair coin is tossed 900 times. Find the probability that the number of heads is
between 420 and 465.
Answer: $G(2) + G(1) - 1 = 0.819$.
3-5. A fair coin is tossed $n$ times. Find $n$ such that the probability that the number of
heads is between $0.49n$ and $0.52n$ is at least 0.9.
Answer: $G(0.04\sqrt{n}) + G(0.02\sqrt{n}) \ge 1.9$; hence $n > 4556$.
3-6. If $P(\mathcal{A}) = 0.6$ and $k$ is the number of successes of $\mathcal{A}$ in $n$ trials (a) show that
$P\{550 \le k \le 650\} = 0.999$, for $n = 1000$. (b) Find $n$ such that $P\{0.59n \le k \le 0.61n\} = 0.95$.
3-7. A system has 100 components. The probability that a specific component will fail in
the interval $(a, b)$ equals $e^{-a/T} - e^{-b/T}$. Find the probability that in the interval
$(0, T/4)$, no more than 100 components will fail.
3-8. A coin is tossed an infinite number of times. Show that the probability that $k$
heads are observed at the $n$th toss but not earlier equals
$$\binom{n-1}{k-1} p^k q^{n-k}$$
3-9. Show that
$$\frac{1}{x}\left(1 - \frac{1}{x^2}\right) g(x) < 1 - G(x) < \frac{1}{x} g(x) \qquad x > 0$$
Hint: Prove the following inequalities and integrate from $x$ to $\infty$:
$$-\frac{d}{dx}\left(\frac{1}{x} e^{-x^2/2}\right) > e^{-x^2/2} \qquad -\frac{d}{dx}\left[\left(\frac{1}{x} - \frac{1}{x^3}\right) e^{-x^2/2}\right] < e^{-x^2/2}$$
3-10. Suppose that in $n$ trials, the probability that an event $\mathcal{A}$ occurs at least once
equals $P_1$. Show that, if $P(\mathcal{A}) = p$ and $pn \ll 1$, then $P_1 \approx np$.
3-11. The probability that a driver will have an accident in 1 month equals 0.02. Find the
probability that in 100 months he will have three accidents.
Answer: About $4e^{-2}/3$.
3-12. A fair die is rolled five times. Find the probability that one shows twice, three
shows twice, and six shows once.
3-13. Show that (3-27) is a special case of (3-39) obtained with $r = 2$, $k_1 = k$, $k_2 = n - k$,
$p_1 = p$, $p_2 = 1 - p$.
3-14. Players $X$ and $Y$ roll dice alternately starting with $X$. The player that rolls eleven
wins. Show that the probability $p$ that $X$ wins equals 18/35.
Outline: Show that
$$P(\mathcal{A}) = P(\mathcal{A}|\mathcal{M}) P(\mathcal{M}) + P(\mathcal{A}|\bar{\mathcal{M}}) P(\bar{\mathcal{M}})$$
Set $\mathcal{A} = \{X \text{ wins}\}$, $\mathcal{M} = \{\text{eleven shows at first try}\}$. Note that $P(\mathcal{A}) = p$,
$P(\mathcal{A}|\mathcal{M}) = 1$, $P(\mathcal{M}) = 2/36$, $P(\mathcal{A}|\bar{\mathcal{M}}) = 1 - p$.
3-15. We place at random $n$ particles in $m > n$ boxes. Find the probability $p$ that the
particles will be found in $n$ preselected boxes (one in each box). Consider the
following cases: (a) M-B (Maxwell-Boltzmann): the particles are distinct; all
alternatives are possible. (b) B-E (Bose-Einstein): the particles cannot be distinguished; all alternatives are possible. (c) F-D (Fermi-Dirac): the particles cannot
be distinguished; at most one particle is allowed in a box.
Answer:
	M-B	B-E	F-D
p =	$\dfrac{n!}{m^n}$	$\dfrac{n!(m-1)!}{(m+n-1)!}$	$\dfrac{n!(m-n)!}{m!}$
Outline: (a) The number $N$ of all alternatives equals $m^n$. The number $N_{\mathcal{A}}$ of
favorable alternatives equals the $n!$ permutations of the particles in the preselected
boxes. (b) Place the $m - 1$ walls separating the boxes in line ending with the $n$
particles. This corresponds to one alternative where all particles are in the last box.
All other possibilities are obtained by a permutation of the $n + m - 1$ objects
consisting of the $m - 1$ walls and the $n$ particles. All the $(m - 1)!$ permutations of
the walls and the $n!$ permutations of the particles count as one alternative. Hence
$N = (m + n - 1)!/(m - 1)!\,n!$ and $N_{\mathcal{A}} = 1$. (c) Since the particles are not distinguishable, $N$ equals the number of ways of selecting $n$ out of $m$ objects: $N = \binom{m}{n}$
and $N_{\mathcal{A}} = 1$.
3-16. Reasoning as in (3-41), show that, if
$$k_1 + k_2 + k_3 = n \qquad p_1 + p_2 + p_3 = 1 \qquad k_1 p_1 \ll 1 \qquad k_2 p_2 \ll 1$$
then
$$\frac{n!}{k_1!\,k_2!\,k_3!} p_1^{k_1} p_2^{k_2} p_3^{k_3} \approx e^{-n(p_1 + p_2)} \frac{(np_1)^{k_1}}{k_1!} \frac{(np_2)^{k_2}}{k_2!}$$
Use the above to justify (3-49).
3-17. We place at random 200 points in the interval (0, 100). Find the probability that in
the interval (0, 2) there will be one and only one point (a) exactly and (b) using the
Poisson approximation.
CHAPTER 4
THE CONCEPT OF A RANDOM VARIABLE
4-1 INTRODUCTION
A random variable (abbreviation: RV) is a number $\mathbf{x}(\zeta)$ assigned to every
outcome $\zeta$ of an experiment. This number could be the gain in a game of
chance, the voltage of a random source, the cost of a random component, or any
other numerical quantity that is of interest in the performance of the experiment.
Example 4-1. (a) In the die experiment, we assign to the six outcomes $f_i$ the
numbers $\mathbf{x}(f_i) = 10i$. Thus
$$\mathbf{x}(f_1) = 10, \ldots, \mathbf{x}(f_6) = 60$$
(b) In the same experiment, we assign the number 1 to every even outcome
and the number 0 to every odd outcome. Thus
$$\mathbf{x}(f_1) = \mathbf{x}(f_3) = \mathbf{x}(f_5) = 0 \qquad \mathbf{x}(f_2) = \mathbf{x}(f_4) = \mathbf{x}(f_6) = 1$$
THE MEANING OF A FUNCTION. An RV is a function whose domain is the set $\mathcal{S}$
of experimental outcomes. To clarify further this important concept, we
review briefly the notion of a function. As we know, a function $x(t)$ is a rule of
correspondence between values of $t$ and $x$. The values of the independent
variable $t$ form a set $\mathcal{S}_t$ on the $t$ axis called the domain of the function and the
values of the dependent variable $x$ form a set $\mathcal{S}_x$ on the $x$ axis called the range
of the function. The rule of correspondence between $t$ and $x$ could be a curve,
a table, or a formula, for example, $x(t) = t^2$.
The notation $x(t)$ used to represent a function is ambiguous: It might
mean either the particular number $x(t)$ corresponding to a specific $t$, or the
function $x(t)$, namely, the rule of correspondence between any $t$ in $\mathcal{S}_t$ and the
corresponding $x$ in $\mathcal{S}_x$. To distinguish between these two interpretations, we
shall denote the latter by $x$, leaving its dependence on $t$ understood.
The definition of a function can be phrased as follows: We are given two
sets of numbers $\mathcal{S}_t$ and $\mathcal{S}_x$. To every $t \in \mathcal{S}_t$ we assign a number $x(t)$
belonging to the set $\mathcal{S}_x$. This leads to the following generalization: We are given
two sets of objects $\mathcal{S}_\alpha$ and $\mathcal{S}_\beta$ consisting of the elements $\alpha$ and $\beta$ respectively. We say that $\beta$ is a function of $\alpha$ if to every element of the set $\mathcal{S}_\alpha$ we
make correspond an element $\beta$ of the set $\mathcal{S}_\beta$. The set $\mathcal{S}_\alpha$ is the domain of the
function and the set $\mathcal{S}_\beta$ its range.
Suppose, for example, that $\mathcal{S}_\alpha$ is the set of children in a community and
$\mathcal{S}_\beta$ the set of their fathers. The pairing of a child with his or her father is a
function.
We note that to a given $\alpha$ there corresponds a single $\beta(\alpha)$. However, more
than one element from $\mathcal{S}_\alpha$ might be paired with the same $\beta$ (a child has only
one father but a father might have more than one child). In Example 4-1b, the
domain of the function consists of the six faces of the die. Its range, however,
has only two elements, namely, the numbers 0 and 1.
The Random Variable
We are given an experiment specified by the space $\mathcal{S}$, the field of subsets of $\mathcal{S}$
called events, and the probability assigned to these events. To every outcome $\zeta$
of this experiment, we assign a number $\mathbf{x}(\zeta)$. We have thus created a function $\mathbf{x}$
with domain the set $\mathcal{S}$ and range a set of numbers. This function is called a
random variable if it satisfies certain mild conditions to be soon given.
All random variables will be written in boldface letters. The symbol $\mathbf{x}(\zeta)$ will
indicate the number assigned to the specific outcome $\zeta$ and the symbol $\mathbf{x}$ will
indicate the rule of correspondence between any element of $\mathcal{S}$ and the number
assigned to it. In Example 4-1a, $\mathbf{x}$ is the table pairing the six faces of the die with
the six numbers 10, ..., 60. The domain of this function is the set $\mathcal{S} = \{f_1, \ldots, f_6\}$ and its range is the set of the above six numbers. The expression
$\mathbf{x}(f_2)$ is the number 20.
Events generated by random variables. In the study of RVs, questions of the following form arise: What is the probability that the RV $\mathbf{x}$ is less than a given number $x$, or what is the probability that $\mathbf{x}$ is between the numbers $x_1$ and $x_2$? If, for example, the RV is the height of a person, we might want the probability that it will not exceed certain bounds. As we know, probabilities are assigned only to events; therefore, in order to answer such questions, we should be able to express the various conditions imposed on $\mathbf{x}$ as events.
We start with the meaning of the notation
$\{\mathbf{x} \le x\}$
This notation represents a subset of $\mathcal{S}$ consisting of all outcomes $\zeta$ such that $\mathbf{x}(\zeta) \le x$. We elaborate on its meaning: Suppose that the RV $\mathbf{x}$ is specified by a table. At the left column we list all elements $\zeta_i$ of $\mathcal{S}$ and at the right the corresponding values (numbers) $\mathbf{x}(\zeta_i)$ of $\mathbf{x}$. Given an arbitrary number $x$, we find all numbers $\mathbf{x}(\zeta_i)$ that do not exceed $x$. The corresponding elements $\zeta_i$ on the left column form the set $\{\mathbf{x} \le x\}$. Thus $\{\mathbf{x} \le x\}$ is not a set of numbers but a set of experimental outcomes.
The meaning of
$\{x_1 \le \mathbf{x} \le x_2\}$
is similar. It represents a subset of $\mathcal{S}$ consisting of all outcomes $\zeta$ such that $x_1 \le \mathbf{x}(\zeta) \le x_2$ where $x_1$ and $x_2$ are two given numbers.
The notation
$\{\mathbf{x} = x\}$
is a subset of $\mathcal{S}$ consisting of all outcomes $\zeta$ such that $\mathbf{x}(\zeta) = x$.
Finally, if $R$ is a set of numbers on the $x$ axis, then
$\{\mathbf{x} \in R\}$
represents the subset of $\mathcal{S}$ consisting of all outcomes $\zeta$ such that $\mathbf{x}(\zeta) \in R$.
Example 4-2. We shall illustrate the above with the RV $\mathbf{x}(f_i) = 10i$ of the die experiment (Fig. 4-1).
The set $\{\mathbf{x} \le 35\}$ consists of the elements $f_1$, $f_2$, $f_3$ because $\mathbf{x}(f_i) \le 35$ only if $i = 1$, 2, or 3.
The set $\{\mathbf{x} \le 5\}$ is empty because there is no outcome such that $\mathbf{x}(f_i) \le 5$.
The set $\{20 \le \mathbf{x} \le 35\}$ consists of the elements $f_2$ and $f_3$ because $20 \le \mathbf{x}(f_i) \le 35$ only if $i = 2$ or 3.
The set $\{\mathbf{x} = 40\}$ consists of the element $f_4$ because $\mathbf{x}(f_i) = 40$ only if $i = 4$.
Finally, $\{\mathbf{x} = 35\}$ is the empty set because there is no experimental outcome such that $\mathbf{x}(f_i) = 35$.
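The sets of Example 4-2 can be checked mechanically. The following Python sketch is only an illustration: the outcome labels `f1`, ..., `f6` and the helper `event` are names introduced here, not notation from the text. It stores the RV as a table and returns, for a given condition on the number $\mathbf{x}(f_i)$, the set of outcomes that satisfy it.

```python
# Die experiment of Example 4-2: outcomes f1..f6 and the RV x(f_i) = 10 i.
outcomes = ["f1", "f2", "f3", "f4", "f5", "f6"]
x = {f: 10 * (i + 1) for i, f in enumerate(outcomes)}  # the RV as a table

def event(condition):
    """Return the set of outcomes whose assigned number satisfies `condition`."""
    return {f for f in outcomes if condition(x[f])}

print(event(lambda v: v <= 35))        # the set {x <= 35}
print(event(lambda v: v <= 5))         # the set {x <= 5}
print(event(lambda v: 20 <= v <= 35))  # the set {20 <= x <= 35}
print(event(lambda v: v == 40))        # the set {x = 40}
print(event(lambda v: v == 35))        # the set {x = 35}
```

Each result is a set of outcomes, not a set of numbers, which is the point stressed in the text.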
Note In the applications, we are interested in the probability that an RV $\mathbf{x}$ takes values in a certain region $R$ of the $x$ axis. This requires that the set $\{\mathbf{x} \in R\}$ be an event. As we noted in Sec. 2-2, that is not always possible. However, if $\{\mathbf{x} \le x\}$ is an event for every $x$ and $R$ is a countable union and intersection of intervals, then $\{\mathbf{x} \in R\}$ is also an event. In the definition of RVs we shall assume, therefore, that the set $\{\mathbf{x} \le x\}$ is an event. This mild restriction is mainly of mathematical interest.
FIGURE 4-1 (the $x$ axis with the values 10, 20, ..., 60, marking the regions $x \le 35$, $20 \le x \le 35$, and $x > 50$)
We conclude with a formal definition of an RV.
DEFINITION. An RV $\mathbf{x}$ is a process of assigning a number $\mathbf{x}(\zeta)$ to every outcome $\zeta$. The resulting function must satisfy the following two conditions but is otherwise arbitrary:
I. The set $\{\mathbf{x} \le x\}$ is an event for every $x$.
II. The probabilities of the events $\{\mathbf{x} = \infty\}$ and $\{\mathbf{x} = -\infty\}$ equal 0:
$P\{\mathbf{x} = \infty\} = 0 \qquad P\{\mathbf{x} = -\infty\} = 0$
The second condition states that, although we allow $\mathbf{x}$ to be $+\infty$ or $-\infty$ for some outcomes, we demand that these outcomes form a set with zero probability.
A complex RV $\mathbf{z}$ is a sum
$\mathbf{z} = \mathbf{x} + j\mathbf{y}$
where $\mathbf{x}$ and $\mathbf{y}$ are real RVs. Unless otherwise stated, it will be assumed that all RVs are real.
4-2 DISTRIBUTION AND DENSITY
FUNCTIONS
The elements of the set $\mathcal{S}$ that are contained in the event $\{\mathbf{x} \le x\}$ change as the number $x$ takes various values. The probability $P\{\mathbf{x} \le x\}$ of the event $\{\mathbf{x} \le x\}$ is, therefore, a number that depends on $x$. This number is denoted by $F_x(x)$ and is called the (cumulative) distribution function of the RV $\mathbf{x}$.
DEFINITION. The distribution function of the RV $\mathbf{x}$ is the function
$F_x(x) = P\{\mathbf{x} \le x\}$   (4-1)
defined for every $x$ from $-\infty$ to $\infty$.
The distribution functions of the RVs $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ are denoted by $F_x(x)$, $F_y(y)$, and $F_z(z)$ respectively. In this notation, the variables $x$, $y$, and $z$ can be identified by any letter. We could, for example, use the notation $F_x(w)$, $F_y(w)$, and $F_z(w)$ to represent the above functions. Specifically,
$F_x(w) = P\{\mathbf{x} \le w\}$
is the distribution function of the RV $\mathbf{x}$. However, if there is no fear of ambiguity, we shall identify the RVs under consideration by the independent variable in (4-1) omitting the subscripts. Thus the distribution functions of the RVs $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ will be denoted by $F(x)$, $F(y)$, and $F(z)$ respectively.
Example 4-3. In the coin-tossing experiment, the probability of heads equals $p$ and the probability of tails equals $q$. We define the RV $\mathbf{x}$ such that
$\mathbf{x}(h) = 1 \qquad \mathbf{x}(t) = 0$
We shall find its distribution function $F(x)$ for every $x$ from $-\infty$ to $\infty$.
If $x \ge 1$, then $\mathbf{x}(h) = 1 \le x$ and $\mathbf{x}(t) = 0 \le x$. Hence (Fig. 4-2)
$F(x) = P\{\mathbf{x} \le x\} = P\{h, t\} = 1 \qquad x \ge 1$
If $0 \le x < 1$, then $\mathbf{x}(h) = 1 > x$ and $\mathbf{x}(t) = 0 \le x$. Hence
$F(x) = P\{\mathbf{x} \le x\} = P\{t\} = q \qquad 0 \le x < 1$
If $x < 0$, then $\mathbf{x}(h) = 1 > x$ and $\mathbf{x}(t) = 0 > x$. Hence
$F(x) = P\{\mathbf{x} \le x\} = P\{\emptyset\} = 0 \qquad x < 0$
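Example 4-4. In the die experiment of Example 4-2, the RV $\mathbf{x}$ is such that $\mathbf{x}(f_i) = 10i$. If the die is fair, then the distribution function of $\mathbf{x}$ is a staircase function as in Fig. 4-3.
We note, in particular, that
$F(100) = P\{\mathbf{x} \le 100\} = P(\mathcal{S}) = 1$
$F(35) = P\{\mathbf{x} \le 35\} = P\{f_1, f_2, f_3\} = \frac{3}{6}$
$F(30.01) = P\{\mathbf{x} \le 30.01\} = P\{f_1, f_2, f_3\} = \frac{3}{6}$
$F(30) = P\{\mathbf{x} \le 30\} = P\{f_1, f_2, f_3\} = \frac{3}{6}$
$F(29.99) = P\{\mathbf{x} \le 29.99\} = P\{f_1, f_2\} = \frac{2}{6}$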
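The staircase of Example 4-4 can be evaluated exactly. The sketch below is illustrative only and assumes the fair die of the example; the names `values`, `prob`, and `F` are introduced here. It uses exact fractions so that the jump at $x = 30$ is visible without rounding.

```python
from fractions import Fraction

values = [10, 20, 30, 40, 50, 60]   # the numbers x(f_i) = 10 i
prob = Fraction(1, 6)               # fair die: each face has probability 1/6

def F(x):
    """Distribution function F(x) = P{x <= x} of the die RV."""
    return sum((prob for v in values if v <= x), Fraction(0))

for point in (100, 35, 30.01, 30, 29.99):
    print(point, F(point))
```

Note that `F(30)` already includes the step at 30, which reflects the `<=` in definition (4-1).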
Example 4-5. A telephone call occurs at random in the interval (0, 1). In this experiment, the outcomes are time distances $t$ between 0 and 1 and the probability that $t$ is between $t_1$ and $t_2$ is given by
$P\{t_1 \le t \le t_2\} = t_2 - t_1$
We define the RV $\mathbf{x}$ such that
$\mathbf{x}(t) = t \qquad 0 \le t \le 1$
FIGURE 4-3
Thus the variable $t$ has a double meaning: It is the outcome of the experiment and the corresponding value $\mathbf{x}(t)$ of the RV $\mathbf{x}$. We shall show that the distribution function $F(x)$ of $\mathbf{x}$ is a ramp as in Fig. 4-4.
If $x > 1$, then $\mathbf{x}(t) \le x$ for every outcome. Hence
$F(x) = P\{\mathbf{x} \le x\} = P\{0 \le t \le 1\} = P(\mathcal{S}) = 1 \qquad x > 1$
If $0 \le x \le 1$, then $\mathbf{x}(t) \le x$ for every $t$ in the interval $(0, x)$. Hence
$F(x) = P\{\mathbf{x} \le x\} = P\{0 \le t \le x\} = x \qquad 0 \le x \le 1$
If $x < 0$, then $\{\mathbf{x} \le x\}$ is the impossible event because $\mathbf{x}(t) \ge 0$ for every $t$. Hence
$F(x) = P\{\mathbf{x} \le x\} = P\{\emptyset\} = 0 \qquad x < 0$
Example 4-6. Suppose that an RV $\mathbf{x}$ is such that $\mathbf{x}(\zeta) = a$ for every $\zeta$ in $\mathcal{S}$. We shall find its distribution function.
If $x \ge a$, then $\mathbf{x}(\zeta) = a \le x$ for every $\zeta$. Hence
$F(x) = P\{\mathbf{x} \le x\} = P(\mathcal{S}) = 1 \qquad x \ge a$
If $x < a$, then $\{\mathbf{x} \le x\}$ is the impossible event because $\mathbf{x}(\zeta) = a$. Hence
$F(x) = P\{\mathbf{x} \le x\} = P\{\emptyset\} = 0 \qquad x < a$
Thus a constant can be interpreted as an RV with distribution function a delayed step $U(x - a)$ as in Fig. 4-5.
Note A complex RV $\mathbf{z} = \mathbf{x} + j\mathbf{y}$ has no distribution function because the inequality $\mathbf{x} + j\mathbf{y} \le x + jy$ has no meaning. The statistical properties of $\mathbf{z}$ are specified in terms of the joint distribution of the RVs $\mathbf{x}$ and $\mathbf{y}$ (see Chap. 6).
Percentiles. The $u$ percentile of an RV $\mathbf{x}$ is the smallest number $x_u$ such that
$u = P\{\mathbf{x} \le x_u\} = F(x_u)$   (4-2)
Thus $x_u$ is the inverse of the function $u = F(x)$. Its domain is the interval $0 \le u \le 1$, and its range is the $x$ axis. To find the graph of the function $x_u$, we interchange the axes of the $F(x)$ curve as in Fig. 4-6. The median of $\mathbf{x}$ is the smallest number $m$ such that $F(m) = 0.5$. Thus $m$ is the 0.5 percentile of $\mathbf{x}$.
Frequency interpretation of $F(x)$ and $x_u$. We perform the experiment $n$ times and we observe $n$ values $x_1, \ldots, x_n$ of the RV $\mathbf{x}$. We place these numbers on the $x$ axis and we form a staircase function $F_n(x)$ as in Fig. 4-6a. The steps are located at the points $x_i$ and their height equals $1/n$. They start at the smallest value $x_{\min}$ of $x_i$, and $F_n(x) = 0$ for $x < x_{\min}$. The function $F_n(x)$ so constructed is called the empirical distribution of the RV $\mathbf{x}$.
For a specific $x$, the number of steps of $F_n(x)$ equals the number $n_x$ of $x_i$'s that are smaller than $x$; thus $F_n(x) = n_x/n$. And since $n_x/n \simeq P\{\mathbf{x} \le x\}$ for large $n$, we conclude that
$F_n(x) = \frac{n_x}{n} \to P\{\mathbf{x} \le x\} = F(x)$ as $n \to \infty$   (4-3)
The empirical interpretation of the $u$ percentile $x_u$ is the Quetelet curve defined as follows: We form $n$ line segments of length $x_i$ and place them vertically in order of increasing length, distance $1/n$ apart. We then form a staircase function with corners at the endpoints of these segments as in Fig. 4-6b. The curve so obtained is the empirical interpretation of $x_u$ and it equals the empirical distribution $F_n(x)$ if its axes are interchanged.
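The empirical distribution and the empirical percentile can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the book's construction verbatim: the function names are introduced here, and samples equal to $x$ are counted as well (that is, $F_n(x)$ is the fraction of observations not exceeding $x$), so that the staircase agrees with the `<=` of definition (4-1).

```python
import bisect
import math

def empirical_cdf(samples):
    """Return the staircase F_n, where F_n(x) = (number of samples <= x) / n."""
    ordered = sorted(samples)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n

def empirical_percentile(samples, u):
    """Smallest observed value x_u with F_n(x_u) >= u, for 0 < u <= 1."""
    ordered = sorted(samples)
    k = max(1, math.ceil(u * len(ordered)))  # rank of the percentile
    return ordered[k - 1]

observations = [3, 1, 2, 2]          # hypothetical observed values x_1, ..., x_n
F_n = empirical_cdf(observations)
print(F_n(0.5), F_n(2), F_n(3))
print(empirical_percentile(observations, 0.5))
```

Reading the sorted samples against their ranks, as `empirical_percentile` does, is the same staircase with its axes interchanged.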
Properties of Distribution Functions
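In the following, the expressions $F(x^+)$ and $F(x^-)$ will mean the limits
$F(x^+) = \lim F(x + \varepsilon) \qquad F(x^-) = \lim F(x - \varepsilon) \qquad 0 < \varepsilon \to 0$
The distribution function has the following properties:
1. $F(+\infty) = 1 \qquad F(-\infty) = 0$
Proof.
$F(+\infty) = P\{\mathbf{x} \le +\infty\} = P(\mathcal{S}) = 1 \qquad F(-\infty) = P\{\mathbf{x} = -\infty\} = 0$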
2. It is a nondecreasing function of $x$:
if $x_1 < x_2$ then $F(x_1) \le F(x_2)$   (4-4)
Proof. The event $\{\mathbf{x} \le x_1\}$ is a subset of the event $\{\mathbf{x} \le x_2\}$ because, if $\mathbf{x}(\zeta) \le x_1$ for some $\zeta$, then $\mathbf{x}(\zeta) \le x_2$. Hence [see (2-14)] $P\{\mathbf{x} \le x_1\} \le P\{\mathbf{x} \le x_2\}$ and (4-4) results.
From (4-4) it follows that $F(x)$ increases from 0 to 1 as $x$ increases from $-\infty$ to $\infty$.
3. If $F(x_0) = 0$ then $F(x) = 0$ for every $x \le x_0$   (4-5)
Proof. It follows from (4-4) because $F(-\infty) = 0$. The above leads to the following conclusion: Suppose that $\mathbf{x}(\zeta) > 0$ for every $\zeta$. In this case, $F(0) = P\{\mathbf{x} \le 0\} = 0$ because $\{\mathbf{x} \le 0\}$ is the impossible event. Hence $F(x) = 0$ for every $x \le 0$.
4. $P\{\mathbf{x} > x\} = 1 - F(x)$   (4-6)
Proof. The events $\{\mathbf{x} \le x\}$ and $\{\mathbf{x} > x\}$ are mutually exclusive and
$\{\mathbf{x} \le x\} + \{\mathbf{x} > x\} = \mathcal{S}$
Hence $P\{\mathbf{x} \le x\} + P\{\mathbf{x} > x\} = P(\mathcal{S}) = 1$ and (4-6) results.
5. The function $F(x)$ is continuous from the right:
$F(x^+) = F(x)$   (4-7)
Proof. It suffices to show that $P\{\mathbf{x} \le x + \varepsilon\} \to F(x)$ as $\varepsilon \to 0$ because $P\{\mathbf{x} \le x + \varepsilon\} = F(x + \varepsilon)$ and $F(x + \varepsilon) \to F(x^+)$ by definition. To prove the above, we must show that the sets $\{\mathbf{x} \le x + \varepsilon\}$ tend to the set $\{\mathbf{x} \le x\}$ as $\varepsilon \to 0$ and use the axiom IIIa of infinite additivity. We omit, however, the details of the proof because we have not introduced limits of sets.
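6. $P\{x_1 < \mathbf{x} \le x_2\} = F(x_2) - F(x_1)$   (4-8)
Proof. The events $\{\mathbf{x} \le x_1\}$ and $\{x_1 < \mathbf{x} \le x_2\}$ are mutually exclusive because $\mathbf{x}(\zeta)$ cannot be both not greater than $x_1$ and between $x_1$ and $x_2$. Furthermore,
$\{\mathbf{x} \le x_2\} = \{\mathbf{x} \le x_1\} + \{x_1 < \mathbf{x} \le x_2\}$
Hence
$P\{\mathbf{x} \le x_2\} = P\{\mathbf{x} \le x_1\} + P\{x_1 < \mathbf{x} \le x_2\}$
and (4-8) results.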
7. $P\{\mathbf{x} = x\} = F(x) - F(x^-)$   (4-9)
Proof. Setting $x_1 = x - \varepsilon$ and $x_2 = x$ in (4-8), we obtain
$P\{x - \varepsilon < \mathbf{x} \le x\} = F(x) - F(x - \varepsilon)$
and with $\varepsilon \to 0$, (4-9) results.
8. $P\{x_1 \le \mathbf{x} \le x_2\} = F(x_2) - F(x_1^-)$   (4-10)
Proof. It follows from (4-8) and (4-9) because
$\{x_1 \le \mathbf{x} \le x_2\} = \{x_1 < \mathbf{x} \le x_2\} + \{\mathbf{x} = x_1\}$
and the last two events are mutually exclusive.
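Properties (4-6) and (4-8) to (4-10) can be verified on a discrete example. The sketch below reuses the fair-die RV $\mathbf{x}(f_i) = 10i$ of Example 4-4; the helper names (`F_left` for $F(x^-)$ and so on) are introduced here for illustration only.

```python
from fractions import Fraction

values = [10, 20, 30, 40, 50, 60]   # fair-die RV x(f_i) = 10 i
p = Fraction(1, 6)

def F(x):
    """F(x) = P{x <= x}."""
    return sum((p for v in values if v <= x), Fraction(0))

def F_left(x):
    """Left limit F(x^-) = P{x < x}."""
    return sum((p for v in values if v < x), Fraction(0))

def prob_equal(x):            # (4-9): P{x = x} = F(x) - F(x^-)
    return F(x) - F_left(x)

def prob_half_open(x1, x2):   # (4-8): P{x1 < x <= x2} = F(x2) - F(x1)
    return F(x2) - F(x1)

def prob_closed(x1, x2):      # (4-10): P{x1 <= x <= x2} = F(x2) - F(x1^-)
    return F(x2) - F_left(x1)

print(prob_equal(30), prob_equal(35))
print(prob_half_open(20, 35), prob_closed(20, 35))
print(1 - F(50))              # (4-6): P{x > 50}
```

The two interval probabilities differ by exactly `prob_equal(20)`, the jump of $F$ at the left endpoint.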
Statistics We shall say that the statistics of an RV $\mathbf{x}$ are known if we can determine the probability $P\{\mathbf{x} \in R\}$ that $\mathbf{x}$ is in a set $R$ of the $x$ axis consisting of countable unions or intersections of intervals. From (4-1) and the axioms it follows that the statistics of $\mathbf{x}$ are determined in terms of its distribution function.
Continuous, discrete, and mixed types. We shall say that an RV $\mathbf{x}$ is of continuous type if its distribution function $F(x)$ is continuous. In this case, $F(x^-) = F(x)$; hence
$P\{\mathbf{x} = x\} = 0$   (4-11)
for every $x$.
We shall say that $\mathbf{x}$ is of discrete type if $F(x)$ is a staircase function as in Fig. 4-7. Denoting by $x_i$ the discontinuity points of $F(x)$, we have
$F(x_i) - F(x_i^-) = P\{\mathbf{x} = x_i\} = p_i$   (4-12)
In this case, the statistics of $\mathbf{x}$ are determined in terms of $x_i$ and $p_i$. If the points $x_i$ are equidistant, that is, if $x_i = a + bi$, then the RV $\mathbf{x}$ is of lattice type.
We shall say that $\mathbf{x}$ is of mixed type if $F(x)$ is discontinuous but not a staircase.
Note that if the set $\mathcal{S}$ has finitely many elements, then any RV defined on $\mathcal{S}$ is of discrete type. However, an RV $\mathbf{x}$ might be of discrete type even if $\mathcal{S}$ has infinitely many elements.
FIGURE 4-7
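Example 4-7. If $\mathcal{A}$ is an arbitrary event of $\mathcal{S}$ and $\mathbf{x}_{\mathcal{A}}$ is an RV such that
$\mathbf{x}_{\mathcal{A}}(\zeta) = \begin{cases} 1 & \zeta \in \mathcal{A} \\ 0 & \zeta \notin \mathcal{A} \end{cases}$   (4-13)
then $\mathbf{x}_{\mathcal{A}}$ is called the zero-one RV associated with the event $\mathcal{A}$. Thus
$\{\mathbf{x}_{\mathcal{A}} = 1\} = \mathcal{A} \qquad \{\mathbf{x}_{\mathcal{A}} = 0\} = \bar{\mathcal{A}}$
Hence $\mathbf{x}_{\mathcal{A}}$ is of discrete type taking only the two values 0 and 1 with
$P\{\mathbf{x}_{\mathcal{A}} = 1\} = P(\mathcal{A}) \qquad P\{\mathbf{x}_{\mathcal{A}} = 0\} = 1 - P(\mathcal{A})$
The space $\mathcal{S}$, however, might have infinitely many elements.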
The Density Function
The derivative
$f(x) = \frac{dF(x)}{dx}$   (4-14)
of $F(x)$ is called the density function (known also as the frequency function) of the RV $\mathbf{x}$.
If the RV $\mathbf{x}$ is of discrete type taking the values $x_i$ with probabilities $p_i$, then
$f(x) = \sum_i p_i \delta(x - x_i) \qquad p_i = P\{\mathbf{x} = x_i\}$   (4-15)
where $\delta(x)$ is the impulse function (Fig. 4-7). The term $p_i \delta(x - x_i)$ is shown as a vertical arrow at $x = x_i$ with length equal to $p_i$.
In Example 4-2, the RV $\mathbf{x}$ is of discrete type taking the six values $x_1 = 10, \ldots, x_6 = 60$ with $p_i = 1/6$. Hence
$f(x) = \frac{1}{6}[\delta(x - 10) + \delta(x - 20) + \cdots + \delta(x - 60)]$
PROPERTIES. From the monotonicity of $F(x)$ it follows that
$f(x) \ge 0$   (4-16)
Integrating (4-14) from $-\infty$ to $x$ and using the fact that $F(-\infty) = 0$, we obtain
$F(x) = \int_{-\infty}^{x} f(\xi)\, d\xi$   (4-17)
Since $F(\infty) = 1$, the above yields
$\int_{-\infty}^{\infty} f(x)\, dx = 1$   (4-18)
From (4-17) it follows that
$F(x_2) - F(x_1) = \int_{x_1}^{x_2} f(x)\, dx$   (4-19)
Hence [see (4-8)]
$P\{x_1 < \mathbf{x} \le x_2\} = \int_{x_1}^{x_2} f(x)\, dx$   (4-20)
If the RV $\mathbf{x}$ is of continuous type, then the set on the left might be replaced by the set $\{x_1 \le \mathbf{x} \le x_2\}$. However, if $F(x)$ is discontinuous at $x_1$ or $x_2$, then the integration must include the corresponding impulses of $f(x)$.
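With $x_1 = x$ and $x_2 = x + \Delta x$ it follows from (4-20) that, if $\mathbf{x}$ is of continuous type, then
$P\{x \le \mathbf{x} \le x + \Delta x\} \simeq f(x)\,\Delta x$   (4-21)
provided that $\Delta x$ is sufficiently small. This shows that $f(x)$ can be defined directly as a limit
$f(x) = \lim_{\Delta x \to 0} \frac{P\{x \le \mathbf{x} \le x + \Delta x\}}{\Delta x}$   (4-22)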
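The approximation (4-21) is easy to check numerically. The sketch below is an illustration that assumes, purely as a convenient test case, the exponential distribution $F(x) = (1 - e^{-x})U(x)$ with density $f(x) = e^{-x}U(x)$; the function names are introduced here.

```python
import math

def F(x):
    """Assumed test distribution: F(x) = 1 - exp(-x) for x > 0, else 0."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def f(x):
    """Its density, the derivative of F."""
    return math.exp(-x) if x > 0 else 0.0

def interval_probability(x, dx):
    """Exact P{x <= x <= x + dx} = F(x + dx) - F(x) for a continuous-type RV."""
    return F(x + dx) - F(x)

x, dx = 1.0, 1e-3
exact = interval_probability(x, dx)
approx = f(x) * dx                 # right side of (4-21)
print(exact, approx, exact / approx)
```

The ratio approaches 1 as `dx` shrinks, which is the content of the limit (4-22).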
Note As we can see from (4-21), the probability that $\mathbf{x}$ is in a small interval of specified length $\Delta x$ is proportional to $f(x)$ and it is maximum if that interval contains the point $x_m$ where $f(x)$ is maximum. This point is called the mode or the most likely value of $\mathbf{x}$. An RV is called unimodal if it has a single mode.
Frequency interpretation We denote by $\Delta n_x$ the number of trials such that
$x \le \mathbf{x}(\zeta) \le x + \Delta x$
From (1-1) and (4-21) it follows that
$f(x)\,\Delta x \simeq \frac{\Delta n_x}{n}$   (4-23)
4-3 SPECIAL CASES
In the preceding sections, we defined RVs starting from known experiments. In this section and throughout the book, we shall often consider RVs having specific distribution or density functions without any reference to a particular probability space.
Existence theorem. To do so, we must show that given a function $f(x)$ or its integral
$F(x) = \int_{-\infty}^{x} f(\xi)\, d\xi$
we can construct an experiment and an RV $\mathbf{x}$ with distribution $F(x)$ or density $f(x)$. As we know, these functions must have the following properties:
The function $f(x)$ must be nonnegative and its area must be 1. The function $F(x)$ must be continuous from the right and, as $x$ increases from $-\infty$ to $\infty$, it must increase monotonically from 0 to 1.
Proof. We consider as our space $\mathcal{S}$ the set of all real numbers, and as its events all intervals on the real line and their unions and intersections. We define the probability of the event $\{x \le x_1\}$ by
$P\{x \le x_1\} = F(x_1)$   (4-24)
where $F(x)$ is the given function. This specifies the experiment completely (see Sec. 2-2).
The outcomes of our experiment are the real numbers. To define an RV $\mathbf{x}$ on this experiment, we must know its value $\mathbf{x}(x)$ for every $x$. We define $\mathbf{x}$ such that
$\mathbf{x}(x) = x$   (4-25)
Thus $x$ is the outcome of the experiment and the corresponding value of the RV $\mathbf{x}$ (see also Example 4-5).
We maintain that the distribution function of $\mathbf{x}$ equals the given $F(x)$. Indeed, the event $\{\mathbf{x} \le x_1\}$ consists of all outcomes $x$ such that $\mathbf{x}(x) \le x_1$. Hence
$P\{\mathbf{x} \le x_1\} = P\{x \le x_1\} = F(x_1)$   (4-26)
and since this is true for every $x_1$, the theorem is proved.
In the following, we discuss briefly a number of common densities.
Normal. An RV $\mathbf{x}$ is called normal or gaussian if its density is the normal curve $g(x)$ [see (3-20)], shifted and scaled
$f(x) = \frac{1}{\sigma}\, g\!\left(\frac{x - \eta}{\sigma}\right) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \eta)^2 / 2\sigma^2}$   (4-27)
This is a bell-shaped curve, symmetrical about the line $x = \eta$ (Fig. 4-8) and its area equals 1 as it should [see (3-22)]. The corresponding distribution function is given by
$F(x) = G\!\left(\frac{x - \eta}{\sigma}\right)$   (4-28)
where $G(x)$ is the tabulated integral of $g(x)$ [see (3-21)].
We shall use the notation $N(\eta; \sigma)$ to indicate that an RV $\mathbf{x}$ is normal as in (4-27). The significance of the constants $\eta$ and $\sigma$ will be given in Sec. 5-4 ($\eta$: mean, $\sigma$: standard deviation).
FIGURE 4-8
FIGURE 4-9 (Uniform)
Example 4-8. An RV $\mathbf{x}$ is $N(1000; 50)$. We shall find the probability that $\mathbf{x}$ is between 900 and 1050. Clearly,
$P\{900 \le \mathbf{x} \le 1050\} = F(1050) - F(900) = G(1) - G(-2)$
Since
$G(-x) = 1 - G(x)$   (4-29)
we conclude from Table 3-1 that
$P\{900 \le \mathbf{x} \le 1050\} = G(1) + G(2) - 1 = 0.819$
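The table lookup of Example 4-8 can be reproduced numerically. The sketch below is illustrative: it evaluates $G(x)$ through the error function of Python's standard `math` module rather than Table 3-1, and the function names `G` and `normal_cdf` are introduced here.

```python
import math

def G(x):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_cdf(x, eta, sigma):
    """Distribution (4-28) of an N(eta; sigma) RV."""
    return G((x - eta) / sigma)

p = normal_cdf(1050, 1000, 50) - normal_cdf(900, 1000, 50)
print(round(p, 3))                       # compare with the 0.819 of the example
print(abs(G(-2.0) - (1.0 - G(2.0))))     # symmetry (4-29), up to rounding
```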
Uniform. An RV $\mathbf{x}$ is called uniform between $x_1$ and $x_2$ if its density is constant in the interval $(x_1, x_2)$ and 0 elsewhere
$f(x) = \begin{cases} \dfrac{1}{x_2 - x_1} & x_1 \le x \le x_2 \\ 0 & \text{otherwise} \end{cases}$   (4-30)
The corresponding distribution function is a ramp as in Fig. 4-9.
Example 4-9. A resistor $\mathbf{r}$ is an RV uniform between 900 and 1100 Ω. We shall find the probability that $\mathbf{r}$ is between 950 and 1050 Ω.
Since $f(r) = 1/200$ in the interval (900, 1100), (4-20) yields
$P\{950 \le \mathbf{r} \le 1050\} = \frac{1}{200} \int_{950}^{1050} dr = 0.5$
Binomial. We say that an RV $\mathbf{x}$ has a binomial distribution of order $n$ if it takes the values $0, 1, \ldots, n$ with
$P\{\mathbf{x} = k\} = \binom{n}{k} p^k q^{n-k} \qquad p + q = 1$   (4-31)
Thus $\mathbf{x}$ is of lattice type and its density is a sum of impulses (Fig. 4-10a)
$f(x) = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k}\, \delta(x - k)$   (4-32)
The corresponding distribution is a staircase function and in the interval $(0, n)$ it is given by
$F(x) = \sum_{k=0}^{m} \binom{n}{k} p^k q^{n-k} \qquad m \le x < m + 1$   (4-33)
FIGURE 4-10
We note that, if $n$ is large, then [see (3-34)] $F(x)$ is close to an $N(np; \sqrt{npq})$ distribution. In other words,
$F(x) \simeq G\!\left(\frac{x - np}{\sqrt{npq}}\right)$   (4-34)
Example 4-10 Bernoulli trials. In the experiment of the $n$ tosses of a coin, an outcome is a sequence $\zeta_1 \cdots \zeta_n$ of $k$ heads and $n - k$ tails where $k = 0, \ldots, n$. We define the RV $\mathbf{x}$ such that
$\mathbf{x}(\zeta_1 \cdots \zeta_n) = k$
Thus $\mathbf{x}$ equals the number of heads. As we know [see (3-13)], the probability that $\mathbf{x} = k$ equals the right side of (4-31). Hence $\mathbf{x}$ has a binomial distribution.
Suppose that the coin is fair and it is tossed $n = 100$ times. We shall find the probability that $\mathbf{x}$ is between 40 and 60. In this case
$p = q = 0.5 \qquad np = 50 \qquad \sqrt{npq} = 5$
and (4-34) yields
$P\{40 \le \mathbf{x} \le 60\} \simeq G\!\left(\frac{60 - 50}{5}\right) - G\!\left(\frac{40 - 50}{5}\right) = G(2) - G(-2) = 0.9545$
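To see how close the normal approximation (4-34) is in Example 4-10, the exact binomial sum (4-31) can be evaluated directly. The sketch below is illustrative; `G` is computed from `math.erf` and the variable names are introduced here.

```python
import math

def G(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 100, 0.5
q = 1.0 - p
sd = math.sqrt(n * p * q)   # sqrt(npq) = 5

# Exact probability from (4-31), summed over k = 40, ..., 60.
exact = sum(math.comb(n, k) * p**k * q**(n - k) for k in range(40, 61))

# Normal approximation (4-34), as used in the example.
approx = G((60 - n * p) / sd) - G((40 - n * p) / sd)

print(exact, approx)
```

The exact sum comes out somewhat larger than the approximation, because the staircase $F(x)$ includes the full impulses at both endpoints $k = 40$ and $k = 60$.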
Poisson. An RV $\mathbf{x}$ is Poisson distributed with parameter $a$ if it takes the values $0, 1, \ldots, n, \ldots$ with
$P\{\mathbf{x} = k\} = e^{-a} \frac{a^k}{k!} \qquad k = 0, 1, \ldots$   (4-35)
Thus $\mathbf{x}$ is of lattice type with density
$f(x) = e^{-a} \sum_{k=0}^{\infty} \frac{a^k}{k!}\, \delta(x - k)$   (4-36)
The corresponding distribution is a staircase function as in Fig. 4-10b.
With $p_k = P\{\mathbf{x} = k\}$, it follows from (4-35) that
$\frac{p_{k-1}}{p_k} = \frac{e^{-a} a^{k-1}/(k-1)!}{e^{-a} a^k / k!} = \frac{k}{a}$
FIGURE 4-11
If the above ratio is less than 1, that is, if $k < a$, then $p_{k-1} < p_k$. This shows that, as $k$ increases, $p_k$ increases reaching its maximum for $k = [a]$. Hence
if $a < 1$, then $p_k$ is maximum for $k = 0$;
if $a > 1$ but it is not an integer, then $p_k$ increases as $k$ increases, reaching its maximum for $k = [a]$;
if $a$ is an integer, then $p_k$ is maximum for $k = a - 1$ and $k = a$.
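The three cases above can be checked with a short computation. The sketch below is illustrative (the function names and the cutoff `k_max` are choices made here); it builds the probabilities with the recursion $p_k = p_{k-1}\, a/k$, which is the ratio just derived, and lists the values of $k$ where $p_k$ is largest.

```python
import math

def poisson_probs(a, k_max=100):
    """Return [p_0, ..., p_k_max] using the recursion p_k = p_{k-1} * a / k."""
    probs = [math.exp(-a)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * a / k)
    return probs

def modes(a, k_max=100):
    """Values of k at which p_k attains its maximum (ties up to rounding)."""
    probs = poisson_probs(a, k_max)
    peak = max(probs)
    return [k for k, pk in enumerate(probs) if math.isclose(pk, peak, rel_tol=1e-9)]

print(modes(0.5))   # a < 1
print(modes(3.7))   # a > 1, not an integer
print(modes(3.0))   # a an integer
```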
Example 4-11 Poisson points. In the Poisson points experiment, an outcome $\zeta$ is a set of points $\mathbf{t}_i$ on the $t$ axis.
(a) Given a constant $t_0$, we define the RV $\mathbf{n}$ such that its value $\mathbf{n}(\zeta)$ equals the number of points $\mathbf{t}_i$ in the interval $(0, t_0)$. Clearly, $\mathbf{n} = k$ means that the number of points in the interval $(0, t_0)$ equals $k$. Hence [see (3-47)]
$P\{\mathbf{n} = k\} = e^{-\lambda t_0} \frac{(\lambda t_0)^k}{k!}$   (4-37)
Thus the number of Poisson points in an interval of length $t_0$ is a Poisson distributed RV with parameter $a = \lambda t_0$ where $\lambda$ is the density of the points.
(b) We denote by $\mathbf{t}_1$ the first random point to the right of the fixed point $t_0$ and we define the RV $\mathbf{x}$ as the distance from $t_0$ to $\mathbf{t}_1$ (Fig. 4-11a). From the definition it follows that $\mathbf{x}(\zeta) \ge 0$ for any $\zeta$. Hence the distribution function of $\mathbf{x}$ is 0 for $x < 0$. We maintain that for $x > 0$ it is given by
$F(x) = 1 - e^{-\lambda x}$
Proof. As we know, $F(x)$ equals the probability that $\mathbf{x} \le x$ where $x$ is a specific number. But $\mathbf{x} \le x$ means that there is at least one point between $t_0$ and $t_0 + x$. Hence $1 - F(x)$ equals the probability $p_0$ that there are no points in the interval $(t_0, t_0 + x)$. And since the length of this interval equals $x$, (4-37) yields
$p_0 = e^{-\lambda x} = 1 - F(x)$
The corresponding density
$f(x) = \lambda e^{-\lambda x} U(x)$
is called exponential (Fig. 4-11b).
TABLE 4-1
Gamma. An RV $\mathbf{x}$ has a gamma distribution if
$f(x) = \gamma x^{b-1} e^{-cx} U(x) \qquad \gamma = \frac{c^b}{\Gamma(b)}$   (4-38)
In the above, $b$ and $c$ are positive numbers and
$\Gamma(b + 1) = \int_0^{\infty} y^b e^{-y}\, dy \qquad b > -1$   (4-39)
is the gamma function. This function is also called the generalized factorial because $\Gamma(b + 1) = b\Gamma(b)$. If $b = n$ is an integer, $\Gamma(n + 1) = n\Gamma(n) = \cdots = n!$ because $\Gamma(1) = 1$. Furthermore,
$\Gamma\!\left(\tfrac{1}{2}\right) = \int_0^{\infty} y^{-1/2} e^{-y}\, dy = 2 \int_0^{\infty} e^{-z^2}\, dz = \sqrt{\pi}$
The following densities are special cases of (4-38).
Erlang. If $b = n$ is an integer, the Erlang density
$f(x) = \frac{c^n}{(n - 1)!}\, x^{n-1} e^{-cx} U(x)$
results. With $n = 1$, we obtain the exponential density shown in Fig. 4-11.
Chi-square. For $b = n/2$ and $c = 1/2$, (4-38) yields
$f(x) = \frac{1}{2^{n/2}\, \Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2} U(x)$
This density is denoted by $\chi^2(n)$ and is called chi-square with $n$ degrees of freedom. It is used extensively in statistics.
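The normalization in (4-38) and the gamma-function identities can be checked numerically. The sketch below is illustrative: it uses `math.gamma` from the Python standard library and a plain midpoint rule (the integration limit `upper` and the step count are choices made here, not from the text) to confirm that the Erlang and chi-square special cases have unit area.

```python
import math

def gamma_density(x, b, c):
    """Density (4-38): c^b / Gamma(b) * x^(b-1) * exp(-c x) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return c**b / math.gamma(b) * x**(b - 1) * math.exp(-c * x)

def area(b, c, upper=200.0, steps=200_000):
    """Midpoint-rule estimate of the integral of the density over (0, upper)."""
    h = upper / steps
    return h * sum(gamma_density((i + 0.5) * h, b, c) for i in range(steps))

print(math.gamma(0.5), math.sqrt(math.pi))   # Gamma(1/2) = sqrt(pi)
print(math.gamma(5), math.factorial(4))      # Gamma(n + 1) = n!
print(area(2, 1.0))                          # Erlang, n = 2, c = 1
print(area(4 / 2, 0.5))                      # chi-square, n = 4 degrees of freedom
```

The midpoint rule is adequate here because the test cases use $b \ge 1$; for $b < 1$ the density is unbounded near $x = 0$ and a finer treatment would be needed.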
In Table 4-1, we show a number of common densities. In the formulas of
the various curves, a numerical factor is omitted. The omitted factor is deter-
mined from (4-18).
4-4 CONDITIONAL DISTRIBUTIONS
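We recall that the probability of an event $\mathcal{A}$ assuming $\mathcal{M}$ is given by
$P(\mathcal{A}|\mathcal{M}) = \frac{P(\mathcal{A}\mathcal{M})}{P(\mathcal{M})}$ where $P(\mathcal{M}) \ne 0$
The conditional distribution $F(x|\mathcal{M})$ of an RV $\mathbf{x}$, assuming $\mathcal{M}$, is defined as the conditional probability of the event $\{\mathbf{x} \le x\}$:
$F(x|\mathcal{M}) = P\{\mathbf{x} \le x|\mathcal{M}\} = \frac{P\{\mathbf{x} \le x, \mathcal{M}\}}{P(\mathcal{M})}$   (4-41)
In the above, $\{\mathbf{x} \le x, \mathcal{M}\}$ is the intersection of the events $\{\mathbf{x} \le x\}$ and $\mathcal{M}$, that is, the event consisting of all outcomes $\zeta$ such that $\mathbf{x}(\zeta) \le x$ and $\zeta \in \mathcal{M}$.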
Thus the definition of $F(x|\mathcal{M})$ is the same as the definition (4-1) of $F(x)$, provided that all probabilities are replaced by conditional probabilities. From this it follows (see Fundamental remark, Sec. 2-3) that $F(x|\mathcal{M})$ has the same properties as $F(x)$. In particular [see (4-3) and (4-8)]
$F(\infty|\mathcal{M}) = 1 \qquad F(-\infty|\mathcal{M}) = 0$   (4-42)
$P\{x_1 < \mathbf{x} \le x_2|\mathcal{M}\} = F(x_2|\mathcal{M}) - F(x_1|\mathcal{M}) = \frac{P\{x_1 < \mathbf{x} \le x_2, \mathcal{M}\}}{P(\mathcal{M})}$   (4-43)
The conditional density $f(x|\mathcal{M})$ is the derivative of $F(x|\mathcal{M})$:
$f(x|\mathcal{M}) = \frac{dF(x|\mathcal{M})}{dx} = \lim_{\Delta x \to 0} \frac{P\{x \le \mathbf{x} \le x + \Delta x|\mathcal{M}\}}{\Delta x}$   (4-44)
This function is nonnegative and its area equals 1.
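Example 4-12. We shall determine the conditional $F(x|\mathcal{M})$ of the RV $\mathbf{x}(f_i) = 10i$ of the fair-die experiment (Example 4-4), where $\mathcal{M} = \{f_2, f_4, f_6\}$ is the event "even."
If $x \ge 60$, then $\{\mathbf{x} \le x\}$ is the certain event and $\{\mathbf{x} \le x, \mathcal{M}\} = \mathcal{M}$. Hence (Fig. 4-12)
$F(x|\mathcal{M}) = \frac{P(\mathcal{M})}{P(\mathcal{M})} = 1 \qquad x \ge 60$
If $40 \le x < 60$, then $\{\mathbf{x} \le x, \mathcal{M}\} = \{f_2, f_4\}$. Hence
$F(x|\mathcal{M}) = \frac{P\{f_2, f_4\}}{P(\mathcal{M})} = \frac{2/6}{3/6} \qquad 40 \le x < 60$
If $20 \le x < 40$, then $\{\mathbf{x} \le x, \mathcal{M}\} = \{f_2\}$. Hence
$F(x|\mathcal{M}) = \frac{P\{f_2\}}{P(\mathcal{M})} = \frac{1/6}{3/6} \qquad 20 \le x < 40$
If $x < 20$, then $\{\mathbf{x} \le x, \mathcal{M}\} = \{\emptyset\}$. Hence
$F(x|\mathcal{M}) = 0 \qquad x < 20$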
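Definition (4-41) translates directly into a computation on the outcomes. The sketch below is illustrative: it labels the faces by the integers 1 to 6 and introduces the helper names `prob` and `F_given`, which are not from the text. It reproduces the four cases of Example 4-12 with exact fractions.

```python
from fractions import Fraction

x = {i: 10 * i for i in range(1, 7)}     # RV x(f_i) = 10 i, faces labeled 1..6
P = {i: Fraction(1, 6) for i in x}       # fair die
M = {2, 4, 6}                            # the event "even"

def prob(event):
    """Probability of a set of faces."""
    return sum((P[i] for i in event), Fraction(0))

def F_given(x_value, condition):
    """Conditional distribution (4-41): P{x <= x, M} / P(M)."""
    joint = {i for i in condition if x[i] <= x_value}
    return prob(joint) / prob(condition)

for point in (60, 45, 25, 10):
    print(point, F_given(point, M))
```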
To find $F(x|\mathcal{M})$, we must, in general, know the underlying experiment. However, if $\mathcal{M}$ is an event that can be expressed in terms of the RV $\mathbf{x}$, then, for the determination of $F(x|\mathcal{M})$, knowledge of $F(x)$ is sufficient. The following two cases are important illustrations.
FIGURE 4-13
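I. We wish to find the conditional distribution of an RV $\mathbf{x}$ assuming that $\mathbf{x} \le a$ where $a$ is a number such that $F(a) \ne 0$. This is a special case of (4-41) with
$\mathcal{M} = \{\mathbf{x} \le a\}$
Thus our problem is to find the function
$F(x|\mathbf{x} \le a) = P\{\mathbf{x} \le x|\mathbf{x} \le a\} = \frac{P\{\mathbf{x} \le x, \mathbf{x} \le a\}}{P\{\mathbf{x} \le a\}}$
If $x \ge a$, then $\{\mathbf{x} \le x, \mathbf{x} \le a\} = \{\mathbf{x} \le a\}$. Hence (Fig. 4-13)
$F(x|\mathbf{x} \le a) = \frac{P\{\mathbf{x} \le a\}}{P\{\mathbf{x} \le a\}} = 1 \qquad x \ge a$
If $x < a$, then $\{\mathbf{x} \le x, \mathbf{x} \le a\} = \{\mathbf{x} \le x\}$. Hence
$F(x|\mathbf{x} \le a) = \frac{P\{\mathbf{x} \le x\}}{P\{\mathbf{x} \le a\}} = \frac{F(x)}{F(a)} \qquad x < a$
Differentiating $F(x|\mathbf{x} \le a)$ with respect to $x$, we obtain the corresponding density: Since $F'(x) = f(x)$, the above yields
$f(x|\mathbf{x} \le a) = \frac{f(x)}{F(a)} = \frac{f(x)}{\int_{-\infty}^{a} f(x)\, dx}$ for $x < a$   (4-45)
and it is 0 for $x > a$.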
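II. Suppose now that $\mathcal{M} = \{b < \mathbf{x} \le a\}$. In this case, (4-41) yields
$F(x|b < \mathbf{x} \le a) = \frac{P\{\mathbf{x} \le x, b < \mathbf{x} \le a\}}{P\{b < \mathbf{x} \le a\}}$
If $x \ge a$, then $\{\mathbf{x} \le x, b < \mathbf{x} \le a\} = \{b < \mathbf{x} \le a\}$. Hence
$F(x|b < \mathbf{x} \le a) = \frac{F(a) - F(b)}{F(a) - F(b)} = 1 \qquad x \ge a$
If $b \le x < a$, then $\{\mathbf{x} \le x, b < \mathbf{x} \le a\} = \{b < \mathbf{x} \le x\}$. Hence
$F(x|b < \mathbf{x} \le a) = \frac{F(x) - F(b)}{F(a) - F(b)} \qquad b \le x < a$
Finally, if $x < b$, then $\{\mathbf{x} \le x, b < \mathbf{x} \le a\} = \{\emptyset\}$. Hence
$F(x|b < \mathbf{x} \le a) = 0 \qquad x < b$
The corresponding density is given by
$f(x|b < \mathbf{x} \le a) = \frac{f(x)}{F(a) - F(b)}$ for $b \le x < a$   (4-46)
and it is 0 otherwise (Fig. 4-14).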
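Example 4-13. We shall determine the conditional density $f(x|\,|\mathbf{x} - \eta| \le k\sigma)$ of an $N(\eta; \sigma)$ RV. Since
$P\{|\mathbf{x} - \eta| \le k\sigma\} = P\{\eta - k\sigma \le \mathbf{x} \le \eta + k\sigma\} = G(k) - G(-k) = 2G(k) - 1$
we conclude from (4-46) that
$f(x|\,|\mathbf{x} - \eta| \le k\sigma) = \frac{1}{2G(k) - 1}\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \eta)^2/2\sigma^2}$
for $x$ between $\eta - k\sigma$ and $\eta + k\sigma$ and 0 otherwise. This density is called truncated normal.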
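As a check on Example 4-13, the truncated normal density should have unit area over $(\eta - k\sigma, \eta + k\sigma)$. The sketch below is illustrative: the parameter values are arbitrary choices made here, `G` is evaluated with `math.erf`, and the area is estimated with a simple midpoint rule.

```python
import math

def G(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncated_normal_pdf(x, eta, sigma, k):
    """Density f(x | |x - eta| <= k sigma) obtained from (4-46)."""
    if abs(x - eta) > k * sigma:
        return 0.0
    normal = math.exp(-(x - eta) ** 2 / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
    return normal / (2.0 * G(k) - 1.0)

def area(eta, sigma, k, steps=20_000):
    """Midpoint-rule integral of the density over (eta - k sigma, eta + k sigma)."""
    lo = eta - k * sigma
    h = 2.0 * k * sigma / steps
    return h * sum(truncated_normal_pdf(lo + (i + 0.5) * h, eta, sigma, k) for i in range(steps))

print(area(eta=10.0, sigma=1.0, k=2.0))
```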
Frequency interpretation In a sequence of $n$ trials, we reject all outcomes $\zeta$ such that $\mathbf{x}(\zeta) \le b$ or $\mathbf{x}(\zeta) > a$. In the subsequence of the remaining trials, $F(x|b < \mathbf{x} \le a)$ has the same frequency interpretation as $F(x)$ [see (4-3)].
Total Probability and Bayes’ Theorem
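We shall now extend the results of Sec. 2-3 to random variables.
1. Setting $\mathcal{B} = \{\mathbf{x} \le x\}$ in (2-36), we obtain
$P\{\mathbf{x} \le x\} = P\{\mathbf{x} \le x|\mathcal{A}_1\}P(\mathcal{A}_1) + \cdots + P\{\mathbf{x} \le x|\mathcal{A}_n\}P(\mathcal{A}_n)$
Hence [see (4-41) and (4-44)]
$F(x) = F(x|\mathcal{A}_1)P(\mathcal{A}_1) + \cdots + F(x|\mathcal{A}_n)P(\mathcal{A}_n)$   (4-47)
$f(x) = f(x|\mathcal{A}_1)P(\mathcal{A}_1) + \cdots + f(x|\mathcal{A}_n)P(\mathcal{A}_n)$   (4-48)
In the above, the events $\mathcal{A}_1, \ldots, \mathcal{A}_n$ form a partition of $\mathcal{S}$.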
FIGURE 4-15
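Example 4-14. Suppose that the RV $\mathbf{x}$ is such that $f(x|\mathcal{M})$ is $N(\eta_1; \sigma_1)$ and $f(x|\bar{\mathcal{M}})$ is $N(\eta_2; \sigma_2)$ as in Fig. 4-15. Clearly, the events $\mathcal{M}$ and $\bar{\mathcal{M}}$ form a partition of $\mathcal{S}$. Setting $\mathcal{A}_1 = \mathcal{M}$ and $\mathcal{A}_2 = \bar{\mathcal{M}}$ in (4-48), we conclude that
$f(x) = p f(x|\mathcal{M}) + (1 - p) f(x|\bar{\mathcal{M}}) = \frac{p}{\sigma_1}\, g\!\left(\frac{x - \eta_1}{\sigma_1}\right) + \frac{1 - p}{\sigma_2}\, g\!\left(\frac{x - \eta_2}{\sigma_2}\right)$
where $p = P(\mathcal{M})$ and $g(x)$ is the normal curve of (4-27).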
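2. From the identity
$P(\mathcal{A}|\mathcal{B}) = \frac{P(\mathcal{B}|\mathcal{A})P(\mathcal{A})}{P(\mathcal{B})}$   (4-49)
[see (2-38)] it follows that
$P(\mathcal{A}|\mathbf{x} \le x) = \frac{P\{\mathbf{x} \le x|\mathcal{A}\}}{P\{\mathbf{x} \le x\}}\, P(\mathcal{A}) = \frac{F(x|\mathcal{A})}{F(x)}\, P(\mathcal{A})$   (4-50)
3. Setting $\mathcal{B} = \{x_1 < \mathbf{x} \le x_2\}$ in (4-49), we conclude with (4-43) that
$P\{\mathcal{A}|x_1 < \mathbf{x} \le x_2\} = \frac{P\{x_1 < \mathbf{x} \le x_2|\mathcal{A}\}}{P\{x_1 < \mathbf{x} \le x_2\}}\, P(\mathcal{A}) = \frac{F(x_2|\mathcal{A}) - F(x_1|\mathcal{A})}{F(x_2) - F(x_1)}\, P(\mathcal{A})$   (4-51)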
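4. The conditional probability $P(\mathcal{A}|\mathbf{x} = x)$ of the event $\mathcal{A}$ assuming $\mathbf{x} = x$ cannot be defined as in (2-29) because, in general, $P\{\mathbf{x} = x\} = 0$. We shall define it as a limit
$P\{\mathcal{A}|\mathbf{x} = x\} = \lim_{\Delta x \to 0} P\{\mathcal{A}|x < \mathbf{x} \le x + \Delta x\}$   (4-52)
With $x_1 = x$, $x_2 = x + \Delta x$, we conclude from the above and (4-51) that
$P\{\mathcal{A}|\mathbf{x} = x\} = \frac{f(x|\mathcal{A})}{f(x)}\, P(\mathcal{A})$   (4-53)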
Total probability theorem. As we know [see (4-42)]
$F(\infty|\mathcal{A}) = \int_{-\infty}^{\infty} f(x|\mathcal{A})\, dx = 1$
Multiplying (4-53) by $f(x)$ and integrating, we obtain
$\int_{-\infty}^{\infty} P(\mathcal{A}|\mathbf{x} = x) f(x)\, dx = P(\mathcal{A})$   (4-54)
This is the continuous version of the total probability theorem (2-36).
Bayes' theorem. From (4-53) and (4-54) it follows that
$f(x|\mathcal{A}) = \frac{P(\mathcal{A}|\mathbf{x} = x)}{P(\mathcal{A})}\, f(x) = \frac{P(\mathcal{A}|\mathbf{x} = x) f(x)}{\int_{-\infty}^{\infty} P(\mathcal{A}|\mathbf{x} = x) f(x)\, dx}$   (4-55)
This is the continuous version of Bayes' theorem (2-39).
Example 4-15. Suppose that the probability of heads in a coin-tossing experiment is not a number, but an RV $\mathbf{p}$ with density $f(p)$ defined in some space $\mathcal{S}_c$. The experiment of the toss of a randomly selected coin is a cartesian product $\mathcal{S}_c \times \mathcal{S}$. In this experiment, the event $\mathcal{H} = \{\text{head}\}$ consists of all pairs of the form $\zeta_c h$ where $\zeta_c$ is any element of $\mathcal{S}_c$ and $h$ is the element heads of the space $\mathcal{S} = \{h, t\}$. We shall show that
$P(\mathcal{H}) = \int_0^1 p f(p)\, dp$   (4-56)
Proof. The conditional probability of $\mathcal{H}$ assuming $\mathbf{p} = p$ is the probability of heads if the coin with $\mathbf{p} = p$ is tossed. In other words,
$P\{\mathcal{H}|\mathbf{p} = p\} = p$   (4-57)
Inserting into (4-54), we obtain (4-56) because $f(p) = 0$ outside the interval (0, 1).
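Equations (4-54) to (4-57) can be illustrated numerically. The sketch below assumes, as a hypothetical test case that is not part of the example, a prior density uniform in (0, 1); the helper `integrate` is a plain midpoint rule introduced here. It evaluates $P(\mathcal{H})$ from (4-56) and the posterior density $f(p|\mathcal{H})$ from Bayes' theorem (4-55).

```python
def integrate(fn, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of fn over (a, b)."""
    h = (b - a) / steps
    return h * sum(fn(a + (i + 0.5) * h) for i in range(steps))

def prior(p):
    """Assumed prior density f(p): uniform in the interval (0, 1)."""
    return 1.0 if 0.0 <= p <= 1.0 else 0.0

# Total probability (4-56): P(H) = integral of p f(p) dp, using P{H | p = p} = p.
p_heads = integrate(lambda p: p * prior(p), 0.0, 1.0)

def posterior(p):
    """Bayes' theorem (4-55): f(p | H) = P{H | p = p} f(p) / P(H)."""
    return p * prior(p) / p_heads

print(p_heads)
print(posterior(0.5), integrate(posterior, 0.0, 1.0))
```

Under the uniform prior the posterior is the ramp $2p$ on (0, 1): observing heads shifts the density toward larger values of $p$.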
PROBLEMS
4-1. Suppose that $x_u$ is the $u$ percentile of the RV $\mathbf{x}$, that is, $F(x_u) = u$. Show that if $f(-x) = f(x)$, then $x_{1-u} = -x_u$.
4-2. Show that if $f(x)$ is symmetrical about the point $x = \eta$ and $P\{\eta - a < \mathbf{x} < \eta + a\} = 1 - \alpha$, then $a = \eta - x_{\alpha/2} = x_{1-\alpha/2} - \eta$.
4-3. (a) Using Table 3-1 and linear interpolation, find the $z_u$ percentile of the $N(0; 1)$ RV $\mathbf{z}$ for $u = 0.9$, 0.925, 0.95, 0.975, and 0.99. (b) The RV $\mathbf{x}$ is $N(\eta; \sigma)$. Express its $x_u$ percentiles in terms of $z_u$.
4-4. The RV $\mathbf{x}$ is $N(\eta; \sigma)$ and $P\{\eta - k\sigma < \mathbf{x} < \eta + k\sigma\} = p_k$. (a) Find $p_k$ for $k = 1$, 2, and 3. (b) Find $k$ for $p_k = 0.9$, 0.99, and 0.999. (c) If $P\{\eta - z_u\sigma < \mathbf{x} < \eta + z_u\sigma\} = \gamma$, express $z_u$ in terms of $\gamma$.
4-5. Find $x_u$ for $u = 0.1, 0.2, \ldots, 0.9$ (a) if $\mathbf{x}$ is uniform in the interval (0, 1); (b) if $f(x) = 2e^{-2x}U(x)$.
4-6. We measure the resistance $\mathbf{R}$ of each resistor in a production line and we accept only the units the resistance of which is between 96 and 104 ohms. Find the percentage of the accepted units (a) if $\mathbf{R}$ is uniform between 95 and 105 ohms; (b) if $\mathbf{R}$ is normal with $\eta = 100$ and $\sigma = 2$ ohms.
4-7. Show that if the RV $\mathbf{x}$ has an Erlang density with $n = 2$, then $F_x(x) = (1 - e^{-cx} - cxe^{-cx})U(x)$.
4-8. The RV $\mathbf{x}$ is $N(10; 1)$. Find $f(x|(\mathbf{x} - 10)^2 < 4)$.
4-9. Find $f(x)$ if $F(x) = (1 - e^{-\alpha x})U(x - c)$.
4-10. If $\mathbf{x}$ is $N(0; 2)$ find (a) $P\{1 \le \mathbf{x} \le 2\}$ and (b) $P\{1 \le \mathbf{x} \le 2|\mathbf{x} \ge 1\}$.
4-11. The space $\mathcal{S}$ consists of all points $t_i$ in the interval (0, 1) and $P\{0 \le t_i \le y\} = y$ for every $y \le 1$. The function $G(x)$ is increasing from $G(-\infty) = 0$ to $G(\infty) = 1$; hence it has an inverse $G^{(-1)}(y) = H(y)$. The RV $\mathbf{x}$ is such that $\mathbf{x}(t_i) = H(t_i)$. Show that $F_x(x) = G(x)$.
4-12. If $\mathbf{x}$ is $N(1000; 20)$ find (a) $P\{\mathbf{x} < 1024\}$, (b) $P\{\mathbf{x} < 1024|\mathbf{x} > 961\}$, and (c) $P\{31 < \sqrt{\mathbf{x}} \le 32\}$.
4-13. A fair coin is tossed three times and the RV $\mathbf{x}$ equals the total number of heads. Find and sketch $F_x(x)$ and $f_x(x)$.
4-14. A fair coin is tossed 900 times and the RV $\mathbf{x}$ equals the total number of heads. (a) Find $f_x(x)$: 1. exactly; 2. approximately using (4-34). (b) Find $P\{435 \le \mathbf{x} \le 460\}$.
4-15. Show that, if $a \le \mathbf{x}(\zeta) \le b$ for every $\zeta \in \mathcal{S}$, then $F(x) = 1$ for $x > b$ and $F(x) = 0$ for $x < a$.
4-16. Show that if $\mathbf{x}(\zeta) \le \mathbf{y}(\zeta)$ for every $\zeta \in \mathcal{S}$, then $F_x(w) \ge F_y(w)$ for every $w$.
4-17. Show that if $\beta(t) = f(t|\mathbf{x} > t)$ is the conditional failure rate of the RV $\mathbf{x}$ and $\beta(t) = kt$, then $f(x)$ is a Rayleigh density (see also Sec. 7-3).
4-18. Show that $P(\mathcal{A}) = P(\mathcal{A}|\mathbf{x} \le x)F(x) + P(\mathcal{A}|\mathbf{x} > x)[1 - F(x)]$.
4-19. Show that
$F_x(x|\mathcal{A}) = \frac{P(\mathcal{A}|\mathbf{x} \le x) F_x(x)}{P(\mathcal{A})}$
4-20. Show that if $P(\mathcal{A}|\mathbf{x} = x) = P(\mathcal{B}|\mathbf{x} = x)$ for every $x \le x_0$, then $P(\mathcal{A}|\mathbf{x} \le x_0) = P(\mathcal{B}|\mathbf{x} \le x_0)$.
Hint: Replace in (4-54) $P(\mathcal{A})$ and $f(x)$ by $P(\mathcal{A}|\mathbf{x} \le x_0)$ and $f(x|\mathbf{x} \le x_0)$.
4-21. The probability of heads of a random coin is an RV $\mathbf{p}$ uniform in the interval (0, 1). (a) Find $P\{0.3 \le \mathbf{p} \le 0.7\}$. (b) The coin is tossed 10 times and heads shows 6 times. Find the a posteriori probability that $\mathbf{p}$ is between 0.3 and 0.7.
4-22. The probability of heads of a random coin is an RV $\mathbf{p}$ uniform in the interval (0.4, 0.6). (a) Find the probability that at the next tossing of the coin heads will show. (b) The coin is tossed 100 times and heads shows 60 times. Find the probability that at the next tossing heads will show.
CHAPTER
5
FUNCTIONS
OF ONE
RANDOM
VARIABLE
5-1 THE RANDOM VARIABLE g(x)
Suppose that $\mathbf{x}$ is an RV and $g(x)$ is a function of the real variable $x$. The expression
$\mathbf{y} = g(\mathbf{x})$
is a new RV defined as follows: For a given $\zeta$, $\mathbf{x}(\zeta)$ is a number and $g[\mathbf{x}(\zeta)]$ is another number specified in terms of $\mathbf{x}(\zeta)$ and $g(x)$. This number is the value $\mathbf{y}(\zeta) = g[\mathbf{x}(\zeta)]$ assigned to the RV $\mathbf{y}$. Thus a function of an RV $\mathbf{x}$ is a composite function $\mathbf{y} = g(\mathbf{x}) = g[\mathbf{x}(\zeta)]$ with domain the set $\mathcal{S}$ of experimental outcomes.
The distribution function $F_y(y)$ of the RV so formed is the probability of the event $\{\mathbf{y} \le y\}$ consisting of all outcomes $\zeta$ such that $\mathbf{y}(\zeta) = g[\mathbf{x}(\zeta)] \le y$. Thus
$F_y(y) = P\{\mathbf{y} \le y\} = P\{g(\mathbf{x}) \le y\}$   (5-1)
For a specific $y$, the values of $x$ such that $g(x) \le y$ form a set on the $x$ axis denoted by $R_y$. Clearly, $g[\mathbf{x}(\zeta)] \le y$ if $\mathbf{x}(\zeta)$ is a number in the set $R_y$. Hence
$F_y(y) = P\{\mathbf{x} \in R_y\}$   (5-2)
The above leads to the conclusion that for $g(\mathbf{x})$ to be an RV, the function $g(x)$ must have the following properties:
1. Its domain must include the range of the RV $\mathbf{x}$.
2. It must be a Baire function, that is, for every $y$, the set $R_y$ such that $g(x) \le y$ must consist of the union and intersection of a countable number of intervals. Only then $\{\mathbf{y} \le y\}$ is an event.
3. The events $\{g(\mathbf{x}) = \pm\infty\}$ must have zero probability.
5-2 THE DISTRIBUTION OF g(x)
We shall express the distribution function $F_y(y)$ of the RV $\mathbf{y} = g(\mathbf{x})$ in terms of the distribution function $F_x(x)$ of the RV $\mathbf{x}$ and the function $g(x)$. For this purpose, we must determine the set $R_y$ of the $x$ axis such that $g(x) \le y$, and the probability that $\mathbf{x}$ is in this set. The method will be illustrated with several examples. Unless otherwise stated, it will be assumed that $F_x(x)$ is continuous.
1. We start with the function $g(x)$ in Fig. 5-1. As we see from the figure,
$g(x)$ is between $a$ and $b$ for any $x$. This leads to the conclusion that if $y \ge b$,
then $g(x) \le y$ for every $x$, hence $P\{\mathbf{y} \le y\} = 1$; if $y < a$, then there is no $x$
such that $g(x) \le y$, hence $P\{\mathbf{y} \le y\} = 0$. Thus
$$F_y(y) = \begin{cases} 1 & y \ge b \\ 0 & y < a \end{cases}$$
With $x_1$ and $y_1 = g(x_1)$ as shown, we observe that $g(x) \le y_1$ for $x \le x_1$.
Hence
$$F_y(y_1) = P\{\mathbf{x} \le x_1\} = F_x(x_1)$$
We finally note that
$$g(x) \le y_2 \quad \text{if } x \le x_2' \text{ or if } x_2'' \le x \le x_2'''$$
Hence
$$F_y(y_2) = P\{\mathbf{x} \le x_2'\} + P\{x_2'' \le \mathbf{x} \le x_2'''\} = F_x(x_2') + F_x(x_2''') - F_x(x_2'')$$
because the events $\{\mathbf{x} \le x_2'\}$ and $\{x_2'' \le \mathbf{x} \le x_2'''\}$ are mutually exclusive.
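Example 5-1
$$\mathbf{y} = a\mathbf{x} + b$$
To find $F_y(y)$, we must find the values of $x$ such that $ax + b \le y$.
(a) If $a > 0$, then $ax + b \le y$ for $x \le (y - b)/a$ (Fig. 5-2a). Hence
$$F_y(y) = P\left\{\mathbf{x} \le \frac{y - b}{a}\right\} = F_x\left(\frac{y - b}{a}\right) \qquad a > 0$$
(b) If $a < 0$, then $ax + b \le y$ for $x \ge (y - b)/a$ (Fig. 5-2b). Hence
$$F_y(y) = P\left\{\mathbf{x} \ge \frac{y - b}{a}\right\} = 1 - F_x\left(\frac{y - b}{a}\right) \qquad a < 0$$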
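Example 5-2
$$\mathbf{y} = \mathbf{x}^2$$
If $y \ge 0$, then $x^2 \le y$ for $-\sqrt{y} \le x \le \sqrt{y}$ (Fig. 5-3a). Hence
$$F_y(y) = P\{-\sqrt{y} \le \mathbf{x} \le \sqrt{y}\} = F_x(\sqrt{y}) - F_x(-\sqrt{y}) \qquad y > 0$$
If $y < 0$, then there are no values of $x$ such that $x^2 \le y$. Hence
$$F_y(y) = P\{\emptyset\} = 0 \qquad y < 0$$
Special case If $\mathbf{x}$ is uniform in the interval $(-1, 1)$, then
$$F_x(x) = \frac{1}{2} + \frac{x}{2} \qquad |x| < 1$$
(Fig. 5-3b). Hence
$$F_y(y) = \sqrt{y} \quad \text{for } 0 \le y \le 1 \qquad \text{and} \qquad F_y(y) = \begin{cases} 1 & y > 1 \\ 0 & y < 0 \end{cases}$$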
2. Suppose now that the function $g(x)$ is constant in an interval $(x_0, x_1)$:
$$g(x) = y_1 \qquad x_0 < x \le x_1$$
In this case
$$P\{\mathbf{y} = y_1\} = P\{x_0 < \mathbf{x} \le x_1\} = F_x(x_1) - F_x(x_0) \tag{5-3}$$
Hence $F_y(y)$ is discontinuous at $y = y_1$ and its discontinuity equals $F_x(x_1) - F_x(x_0)$.
Example 5-3. Consider the function (Fig. 5-4)
$$g(x) = 0 \quad \text{for } -c \le x \le c \qquad \text{and} \qquad g(x) = \begin{cases} x - c & x > c \\ x + c & x < -c \end{cases}$$
In this case, $F_y(y)$ is discontinuous for $y = 0$ and its discontinuity equals $F_x(c) - F_x(-c)$.
Furthermore,
If $y \ge 0$ then $P\{\mathbf{y} \le y\} = P\{\mathbf{x} \le y + c\} = F_x(y + c)$
If $y < 0$ then $P\{\mathbf{y} \le y\} = P\{\mathbf{x} \le y - c\} = F_x(y - c)$
FIGURE 5-4
Example 5-4 Limiter. The curve $g(x)$ of Fig. 5-5 is constant for $x \le -b$ and
$x \ge b$ and in the interval $(-b, b)$ it is a straight line. With $\mathbf{y} = g(\mathbf{x})$, it follows that
$F_y(y)$ is discontinuous for $y = g(-b) = -b$ and $y = g(b) = b$ respectively.
Furthermore,
If $y \ge b$ then $g(x) \le y$ for every $x$; hence $F_y(y) = 1$
If $-b \le y < b$ then $g(x) \le y$ for $x \le y$; hence $F_y(y) = F_x(y)$
If $y < -b$ then $g(x) \le y$ for no $x$; hence $F_y(y) = 0$
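3. We assume next that $g(x)$ is a staircase function
$$g(x) = g(x_i) = y_i \qquad x_{i-1} < x \le x_i$$
In this case, the RV $\mathbf{y} = g(\mathbf{x})$ is of discrete type taking the values $y_i$ with
$$P\{\mathbf{y} = y_i\} = P\{x_{i-1} < \mathbf{x} \le x_i\} = F_x(x_i) - F_x(x_{i-1})$$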
Example 5-5 Hard limiter. If
$$g(x) = \begin{cases} 1 & x > 0 \\ -1 & x \le 0 \end{cases}$$
then $\mathbf{y}$ takes the values $\pm 1$ with
$$P\{\mathbf{y} = -1\} = P\{\mathbf{x} \le 0\} = F_x(0)$$
$$P\{\mathbf{y} = 1\} = P\{\mathbf{x} > 0\} = 1 - F_x(0)$$
Hence $F_y(y)$ is a staircase function as in Fig. 5-6.
FIGURE 5-7
Example 5-6 Quantization. If
$$g(x) = ns \qquad (n - 1)s < x \le ns$$
then $\mathbf{y}$ takes the values $y_n = ns$ with
$$P\{\mathbf{y} = ns\} = P\{(n - 1)s < \mathbf{x} \le ns\} = F_x(ns) - F_x(ns - s)$$
4. We assume, finally, that the function $g(x)$ is discontinuous at $x = x_0$
and such that
$$g(x) < g(x_0^-) \quad \text{for } x < x_0 \qquad g(x) > g(x_0^+) \quad \text{for } x > x_0$$
In this case, if $y$ is between $g(x_0^-)$ and $g(x_0^+)$, then $g(x) \le y$ for $x \le x_0$. Hence
$$F_y(y) = P\{\mathbf{x} \le x_0\} = F_x(x_0) \qquad g(x_0^-) \le y \le g(x_0^+)$$
Example 5-7. Suppose that
$$g(x) = \begin{cases} x + c & x \ge 0 \\ x - c & x < 0 \end{cases}$$
(Fig. 5-7). Thus $g(x)$ is discontinuous for $x = 0$ with $g(0^-) = -c$
and $g(0^+) = c$. Hence $F_y(y) = F_x(0)$ for $|y| \le c$. Furthermore,
If $y \ge c$ then $g(x) \le y$ for $x \le y - c$; hence $F_y(y) = F_x(y - c)$
If $-c \le y \le c$ then $g(x) \le y$ for $x \le 0$; hence $F_y(y) = F_x(0)$
If $y \le -c$ then $g(x) \le y$ for $x \le y + c$; hence $F_y(y) = F_x(y + c)$
Example 5-8. The function $g(x)$ in Fig. 5-8 equals 0 in the interval $(-c, c)$
and it is discontinuous for $x = \pm c$ with $g(c^+) = c$, $g(c^-) = 0$, $g(-c^-) = -c$,
$g(-c^+) = 0$. Hence $F_y(y)$ is discontinuous for $y = 0$ and it is constant for
$0 \le y \le c$ and $-c \le y \le 0$. Thus
If $y \ge c$ then $g(x) \le y$ for $x \le y$; hence $F_y(y) = F_x(y)$
If $0 \le y < c$ then $g(x) \le y$ for $x < c$; hence $F_y(y) = F_x(c)$
If $-c \le y < 0$ then $g(x) \le y$ for $x \le -c$; hence $F_y(y) = F_x(-c)$
If $y < -c$ then $g(x) \le y$ for $x \le y$; hence $F_y(y) = F_x(y)$
FIGURE 5-8
5. We now assume that the RV $\mathbf{x}$ is of discrete type taking the values $x_k$
with probability $p_k$. In this case, the RV $\mathbf{y} = g(\mathbf{x})$ is also of discrete type taking
the values $y_k = g(x_k)$.
If $y_k = g(x)$ for only one $x = x_k$, then
$$P\{\mathbf{y} = y_k\} = P\{\mathbf{x} = x_k\} = p_k$$
If, however, $y_k = g(x)$ for $x = x_k$ and $x = x_l$, then
$$P\{\mathbf{y} = y_k\} = P\{\mathbf{x} = x_k\} + P\{\mathbf{x} = x_l\} = p_k + p_l$$
Example 5-9
$$\mathbf{y} = \mathbf{x}^2$$
(a) If $\mathbf{x}$ takes the values $1, 2, \ldots, 6$ with probability 1/6, then $\mathbf{y}$ takes the
values $1^2, 2^2, \ldots, 6^2$ with probability 1/6.
(b) If, however, $\mathbf{x}$ takes the values $-2, -1, 0, 1, 2, 3$ with probability 1/6,
then $\mathbf{y}$ takes the values 0, 1, 4, 9 with probabilities 1/6, 2/6, 2/6, 1/6 respectively.
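The pooling rule for discrete RVs can be sketched in a few lines of Python. This is only an illustration: the helper name `pmf_of_function` and the dictionary representation of a probability mass function are assumptions made here, not notation from the text. Exact fractions are used so that the probabilities of Example 5-9 come out without rounding.

```python
from collections import defaultdict
from fractions import Fraction


def pmf_of_function(pmf_x, g):
    """Return the PMF of y = g(x) for a discrete RV x given as {value: probability}."""
    pmf_y = defaultdict(Fraction)  # missing keys start at probability 0
    for x_k, p_k in pmf_x.items():
        # All values x_k that map to the same g(x_k) pool their probabilities.
        pmf_y[g(x_k)] += p_k
    return dict(pmf_y)


sixth = Fraction(1, 6)
pmf_a = {x: sixth for x in range(1, 7)}             # case (a) of Example 5-9
pmf_b = {x: sixth for x in (-2, -1, 0, 1, 2, 3)}    # case (b) of Example 5-9

y_a = pmf_of_function(pmf_a, lambda x: x * x)
y_b = pmf_of_function(pmf_b, lambda x: x * x)
```

In case (b) the values $x = \pm 1$ and $x = \pm 2$ collapse onto $y = 1$ and $y = 4$, which is why those two outcomes carry probability 2/6.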
Determination of $f_y(y)$
We wish to determine the density of $\mathbf{y} = g(\mathbf{x})$ in terms of the density of $\mathbf{x}$.
Suppose, first, that the set $R$ of the $y$ axis is not in the range of the function
$g(x)$, that is, that $g(x)$ is not a point of $R$ for any $x$. In this case, the probability
that $g(\mathbf{x})$ is in $R$ equals 0. Hence $f_y(y) = 0$ for $y \in R$. It suffices, therefore, to
consider the values of $y$ such that for some $x$, $g(x) = y$.
FIGURE 5-9
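FUNDAMENTAL THEOREM. To find $f_y(y)$ for a specific $y$, we solve the equation
$y = g(x)$. Denoting its real roots by $x_n$,
$$y = g(x_1) = \cdots = g(x_n) = \cdots \tag{5-4}$$
we shall show that
$$f_y(y) = \frac{f_x(x_1)}{|g'(x_1)|} + \cdots + \frac{f_x(x_n)}{|g'(x_n)|} + \cdots \tag{5-5}$$
where $g'(x)$ is the derivative of $g(x)$.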
Proof. To avoid generalities, we assume that the equation $y = g(x)$ has three
roots as in Fig. 5-9. As we know
$$f_y(y)\,dy = P\{y < \mathbf{y} < y + dy\}$$
It suffices, therefore, to find the set of values $x$ such that $y < g(x) < y + dy$
and the probability that $\mathbf{x}$ is in this set. As we see from the figure, this set
consists of the following three intervals
$$x_1 < x < x_1 + dx_1 \qquad x_2 + dx_2 < x < x_2 \qquad x_3 < x < x_3 + dx_3$$
where $dx_1 > 0$, $dx_3 > 0$ but $dx_2 < 0$. From the above it follows that
$$P\{y < \mathbf{y} < y + dy\} = P\{x_1 < \mathbf{x} < x_1 + dx_1\} + P\{x_2 + dx_2 < \mathbf{x} < x_2\} + P\{x_3 < \mathbf{x} < x_3 + dx_3\}$$
The right side equals the shaded area in Fig. 5-9. Since
$$P\{x_1 < \mathbf{x} < x_1 + dx_1\} = f_x(x_1)\,dx_1 \qquad dx_1 = dy/g'(x_1)$$
$$P\{x_2 + dx_2 < \mathbf{x} < x_2\} = f_x(x_2)\,|dx_2| \qquad dx_2 = dy/g'(x_2)$$
$$P\{x_3 < \mathbf{x} < x_3 + dx_3\} = f_x(x_3)\,dx_3 \qquad dx_3 = dy/g'(x_3)$$
we conclude that
$$f_y(y)\,dy = \frac{f_x(x_1)}{g'(x_1)}\,dy + \frac{f_x(x_2)}{|g'(x_2)|}\,dy + \frac{f_x(x_3)}{g'(x_3)}\,dy$$
and (5-5) results.
We note, finally, that if $g(x) = y_1 = \text{constant}$ for every $x$ in the interval
$(x_0, x_1)$, then [see (5-3)] $F_y(y)$ is discontinuous for $y = y_1$. Hence $f_y(y)$ contains
an impulse $\delta(y - y_1)$ of area $F_x(x_1) - F_x(x_0)$.
Conditional density The conditional density $f_y(y\,|\,\mathcal{M})$ of the RV $\mathbf{y} = g(\mathbf{x})$
assuming an event $\mathcal{M}$ is given by (5-5) if on the right side we replace the terms $f_x(x_i)$ by
$f_x(x_i\,|\,\mathcal{M})$ (see, for example, Prob. 5-17).
Illustrations
We give next several applications of (5-2) and (5-5).
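1. $\mathbf{y} = a\mathbf{x} + b \qquad g'(x) = a$
The equation $y = ax + b$ has a single solution $x = (y - b)/a$ for every $y$.
Hence
$$f_y(y) = \frac{1}{|a|}\, f_x\left(\frac{y - b}{a}\right) \tag{5-6}$$
Special case If $\mathbf{x}$ is uniform in the interval $(x_1, x_2)$, then $\mathbf{y}$ is uniform in
the interval $(ax_1 + b, ax_2 + b)$.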
Example 5-10. Suppose that the voltage $\mathbf{v}$ is an RV given by
$$\mathbf{v} = i(\mathbf{r} + r_0)$$
where $i = 0.01$ A and $r_0 = 1000\ \Omega$. If the resistance $\mathbf{r}$ is an RV uniform between
900 and 1100 $\Omega$, then $\mathbf{v}$ is uniform between 19 and 21 V.
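2. $\mathbf{y} = \dfrac{1}{\mathbf{x}} \qquad g'(x) = -\dfrac{1}{x^2}$
The equation $y = 1/x$ has a single solution $x = 1/y$. Hence
$$f_y(y) = \frac{1}{y^2}\, f_x\left(\frac{1}{y}\right) \tag{5-7}$$
Special case If $\mathbf{x}$ has a Cauchy density with parameter $\alpha$,
$$f_x(x) = \frac{\alpha/\pi}{x^2 + \alpha^2} \qquad \text{then} \qquad f_y(y) = \frac{1/\alpha\pi}{y^2 + 1/\alpha^2}$$
is also a Cauchy density with parameter $1/\alpha$.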
Example 5-11. Suppose that the resistance $\mathbf{r}$ is uniform between 900 and 1100 $\Omega$
as in Fig. 5-10. We shall determine the density of the corresponding conductance
$$\mathbf{g} = 1/\mathbf{r}$$
Since $f_r(r) = 1/200$ S for $r$ between 900 and 1100, it follows from (5-7) that
$$f_g(g) = \frac{1}{200 g^2} \qquad \frac{1}{1100} < g < \frac{1}{900}$$
and 0 elsewhere.
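3. $\mathbf{y} = a\mathbf{x}^2 \qquad a > 0 \qquad g'(x) = 2ax$
If $y < 0$, then the equation $y = ax^2$ has no real solutions; hence $f_y(y) = 0$. If
$y > 0$, then it has two solutions
$$x_1 = \sqrt{\frac{y}{a}} \qquad x_2 = -\sqrt{\frac{y}{a}}$$
and (5-5) yields
$$f_y(y) = \frac{1}{2a\sqrt{y/a}} \left[ f_x\left(\sqrt{\frac{y}{a}}\right) + f_x\left(-\sqrt{\frac{y}{a}}\right) \right] \qquad y > 0 \tag{5-8}$$
We note that $F_y(y) = 0$ for $y < 0$ and
$$F_y(y) = P\left\{-\sqrt{\frac{y}{a}} \le \mathbf{x} \le \sqrt{\frac{y}{a}}\right\} = F_x\left(\sqrt{\frac{y}{a}}\right) - F_x\left(-\sqrt{\frac{y}{a}}\right) \qquad y > 0$$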
Example 5-12. The voltage across a resistor is an RV $\mathbf{e}$ uniform between 5 and
10 V. We shall determine the density of the power
$$\mathbf{w} = \frac{\mathbf{e}^2}{r} \qquad r = 1000\ \Omega$$
dissipated in $r$.
Since $f_e(e) = 1/5$ for $e$ between 5 and 10 and 0 elsewhere, we conclude
from (5-8) with $a = 1/r$ that
$$f_w(w) = \sqrt{\frac{10}{w}} \qquad \frac{1}{40} < w < \frac{1}{10}$$
and 0 elsewhere.
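As a numerical cross-check of Example 5-12, the following sketch evaluates (5-8) directly and compares it with the closed form $\sqrt{10/w}$. The function names and the midpoint-rule integrator are illustrative choices made here, not part of the text.

```python
import math


def density_of_ax2(f_x, a, y):
    """Density of y = a*x**2 (a > 0) obtained from (5-8)."""
    if y <= 0:
        return 0.0
    root = math.sqrt(y / a)
    return (f_x(root) + f_x(-root)) / (2 * a * root)


def f_e(e):
    """Density of the voltage e: uniform between 5 and 10 V."""
    return 0.2 if 5 <= e <= 10 else 0.0


r = 1000.0  # resistance in ohms


def f_w(w):
    """Density of the power w = e**2 / r, i.e. (5-8) with a = 1/r."""
    return density_of_ax2(f_e, 1 / r, w)


def midpoint_integral(f, lo, hi, steps=20000):
    """Plain midpoint rule; accurate enough for this smooth integrand."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


w_mid = 0.05
closed_form = math.sqrt(10 / w_mid)           # the result stated in Example 5-12
area = midpoint_integral(f_w, 1 / 40, 1 / 10)  # should be close to 1
```

The area under $f_w$ over $(1/40, 1/10)$ comes out as 1, which confirms that the reconstructed density is properly normalized.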
Special case Suppose that
$$f_x(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \qquad \mathbf{y} = \mathbf{x}^2$$
With $a = 1$, it follows from (5-8) and the evenness of $f_x(x)$ that (Fig. 5-11)
$$f_y(y) = \frac{1}{\sqrt{y}}\, f_x(\sqrt{y})\, U(y) = \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}\, U(y)$$
We have thus shown that if $\mathbf{x}$ is an $N(0, 1)$ RV, the RV $\mathbf{y} = \mathbf{x}^2$ has a
chi-square distribution with one degree of freedom [see (4-40)].
4. $\mathbf{y} = \sqrt{\mathbf{x}} \qquad g'(x) = \dfrac{1}{2\sqrt{x}}$
The equation $y = \sqrt{x}$ has a single solution $x = y^2$ for $y > 0$ and no solution for
$y < 0$. Hence
$$f_y(y) = 2y\, f_x(y^2)\, U(y) \tag{5-9}$$
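The chi density Suppose that $\mathbf{x}$ has a chi-square density as in (4-40),
$$f_x(x) = \frac{1}{2^{n/2}\, \Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2}\, U(x)$$
and $\mathbf{y} = \sqrt{\mathbf{x}}$. In this case, (5-9) yields
$$f_y(y) = \frac{2}{2^{n/2}\, \Gamma(n/2)}\, y^{n-1} e^{-y^2/2}\, U(y) \tag{5-10}$$
This function is called the chi density with $n$ degrees of freedom. The following
cases are of special interest.
Maxwell For $n = 3$, (5-10) yields the Maxwell density
$f_y(y) = \sqrt{2/\pi}\; y^2 e^{-y^2/2}\, U(y)$.
Rayleigh For $n = 2$, we obtain the Rayleigh density $f_y(y) = y\, e^{-y^2/2}\, U(y)$.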
5. $\mathbf{y} = \mathbf{x}\,U(\mathbf{x}) \qquad g'(x) = U(x)$
Clearly, $f_y(y) = 0$ and $F_y(y) = 0$ for $y < 0$ (Fig. 5-12). If $y > 0$, then the
equation $y = xU(x)$ has a single solution $x_1 = y$. Hence
$$f_y(y) = f_x(y) \qquad F_y(y) = F_x(y) \qquad y > 0$$
Thus $F_y(y)$ is discontinuous at $y = 0$ with discontinuity $F_y(0^+) - F_y(0^-) = F_x(0)$.
Hence
$$f_y(y) = f_x(y)\,U(y) + F_x(0)\,\delta(y)$$
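6. $\mathbf{y} = e^{\mathbf{x}} \qquad g'(x) = e^x$
If $y > 0$, then the equation $y = e^x$ has the single solution $x = \ln y$. Hence
$$f_y(y) = \frac{1}{y}\, f_x(\ln y) \qquad y > 0$$
If $y < 0$, then $f_y(y) = 0$.
Special case If $\mathbf{x}$ is $N(\eta; \sigma)$, then
$$f_y(y) = \frac{1}{\sigma y \sqrt{2\pi}}\, e^{-(\ln y - \eta)^2 / 2\sigma^2} \tag{5-11}$$
This density is called lognormal (see Table 4-1).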
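7. $\mathbf{y} = a \sin(\mathbf{x} + \theta) \qquad a > 0$
If $|y| > a$, then the equation $y = a \sin(x + \theta)$ has no solutions; hence $f_y(y) = 0$.
If $|y| < a$, then it has infinitely many solutions (Fig. 5-13a)
$$x_n = \arcsin\frac{y}{a} - \theta \qquad n = \ldots, -1, 0, 1, \ldots$$
Since $|g'(x_n)| = |a\cos(x_n + \theta)| = \sqrt{a^2 - y^2}$, (5-5) yields
$$f_y(y) = \frac{1}{\sqrt{a^2 - y^2}} \sum_{n=-\infty}^{\infty} f_x(x_n) \qquad |y| < a \tag{5-12}$$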
Special case Suppose that $\mathbf{x}$ is uniform in the interval $(-\pi, \pi)$. In this
case, the equation $y = a \sin(x + \theta)$ has exactly two solutions in the interval
$(-\pi, \pi)$ for any $\theta$ (Fig. 5-14). The function $f_x(x)$ equals $1/2\pi$ for these two
values and it equals 0 for any $x_n$ outside the interval $(-\pi, \pi)$. Retaining the
two nonzero terms in (5-12), we obtain
$$f_y(y) = \frac{2}{2\pi\sqrt{a^2 - y^2}} = \frac{1}{\pi\sqrt{a^2 - y^2}} \qquad |y| < a \tag{5-13}$$
To find $F_y(y)$, we observe that $\mathbf{y} \le y$ if $\mathbf{x}$ is either between $-\pi$ and $x_0$ or
between $x_1$ and $\pi$ (Fig. 5-13a). Since the total length of the two intervals equals
$\pi + 2x_0 + 2\theta$, we conclude, dividing by $2\pi$, that
$$F_y(y) = \frac{1}{2} + \frac{1}{\pi} \arcsin\frac{y}{a} \qquad |y| < a \tag{5-14}$$
We note that although $f_y(\pm a) = \infty$, the probability that $\mathbf{y} = \pm a$ is 0.
Smooth phase If the density $f_x(x)$ of $\mathbf{x}$ is sufficiently smooth so that it can
be approximated by a constant in any interval of length $2\pi$ (see Fig. 5-13b),
then
$$\pi \sum_{n=-\infty}^{\infty} f_x(x_n) \simeq \int_{-\infty}^{\infty} f_x(x)\,dx = 1$$
because in each interval of length $2\pi$ the above sum has two terms. Inserting
into (5-12), we conclude that the density of $\mathbf{y}$ is given approximately by (5-13).
FIGURE 5-14
FIGURE 5-15
Example 5-13. A particle leaves the origin under the influence of the force of
gravity and its initial velocity $v$ forms an angle $\boldsymbol{\varphi}$ with the horizontal axis. The path
of the particle reaches the ground at a distance
$$\mathbf{d} = \frac{v^2}{g} \sin 2\boldsymbol{\varphi}$$
from the origin (Fig. 5-15). Assuming that $\boldsymbol{\varphi}$ is an RV uniform between 0 and $\pi/2$,
we shall determine: (a) the density of $\mathbf{d}$ and (b) the probability that $\mathbf{d} \le d_0$.
Solution. (a) Clearly,
$$\mathbf{d} = a \sin \mathbf{x} \qquad a = v^2/g$$
where the RV $\mathbf{x} = 2\boldsymbol{\varphi}$ is uniform between 0 and $\pi$. If $0 < d < a$, then the
equation $d = a \sin x$ has exactly two solutions in the interval $(0, \pi)$. Reasoning as
in (5-13), we obtain
$$f_d(d) = \frac{2}{\pi\sqrt{a^2 - d^2}} \qquad 0 < d < a$$
and 0 otherwise.
(b) The probability that $\mathbf{d} \le d_0$ equals the shaded area in Fig. 5-15:
$$P\{\mathbf{d} \le d_0\} = F_d(d_0) = \frac{2}{\pi} \arcsin\frac{d_0}{a}$$
8. $\mathbf{y} = \tan \mathbf{x}$
The equation $y = \tan x$ has infinitely many solutions for any $y$ (Fig. 5-16a)
$$x_n = \arctan y \qquad n = \ldots, -1, 0, 1, \ldots$$
Since $g'(x) = 1/\cos^2 x = 1 + y^2$, (5-5) yields
$$f_y(y) = \frac{1}{1 + y^2} \sum_{n=-\infty}^{\infty} f_x(x_n) \tag{5-15}$$
Special case If $\mathbf{x}$ is uniform in the interval $(-\pi/2, \pi/2)$, then the term
$f_x(x_1)$ in (5-15) equals $1/\pi$ and all others are 0 (Fig. 5-16b). Hence $\mathbf{y}$ has a
Cauchy density
$$f_y(y) = \frac{1/\pi}{1 + y^2} \tag{5-16}$$
As we see from the figure, $\mathbf{y} \le y$ if $\mathbf{x}$ is between $-\pi/2$ and $x_1$. Since the length
of this interval equals $x_1 + \pi/2$, we conclude, dividing by $\pi$, that
$$F_y(y) = \frac{1}{\pi}\left(x_1 + \frac{\pi}{2}\right) = \frac{1}{2} + \frac{1}{\pi} \arctan y \tag{5-17}$$
FIGURE 5-16
Example 5-14. A particle leaves the origin in a free motion as in Fig. 5-17 crossing
the vertical line $x = d$ at
$$\mathbf{y} = d \tan \boldsymbol{\varphi}$$
Assuming that the angle $\boldsymbol{\varphi}$ is uniform in the interval $(-\theta, \theta)$, we conclude as in
(5-16) that
$$f_y(y) = \frac{d/2\theta}{d^2 + y^2} \qquad |y| < d \tan\theta$$
and 0 otherwise.
FIGURE 5-17
THE INVERSE PROBLEM. In the preceding discussion, we were given an RV $\mathbf{x}$
with known distribution $F_x(x)$ and a function $g(x)$ and we determined the
distribution $F_y(y)$ of the RV $\mathbf{y} = g(\mathbf{x})$. We consider now the inverse problem:
We are given the distribution of $\mathbf{x}$ and we wish to find a function $g(x)$ such that
the distribution of the RV $\mathbf{y} = g(\mathbf{x})$ equals a specified function $F_y(y)$. This topic
is developed further in Sec. 8-5. We start with two special cases.
From $F_x(x)$ to a uniform distribution. Given an RV $\mathbf{x}$ with distribution $F_x(x)$,
we wish to find a function $g(x)$ such that the RV $\mathbf{u} = g(\mathbf{x})$ is uniformly
distributed in the interval (0, 1). We maintain that $g(x) = F_x(x)$, that is,
$$\text{if } \mathbf{u} = F_x(\mathbf{x}) \text{ then } F_u(u) = u \text{ for } 0 \le u \le 1 \tag{5-18}$$
Proof. Suppose that $x$ is an arbitrary number and $u = F_x(x)$. From the
monotonicity of $F_x(x)$ it follows that $\mathbf{u} \le u$ iff $\mathbf{x} \le x$. Hence
$$F_u(u) = P\{\mathbf{u} \le u\} = P\{\mathbf{x} \le x\} = F_x(x) = u$$
and (5-18) results.
The RV $\mathbf{u}$ can be considered as the output of a nonlinear memoryless
system (Fig. 5-18) with input $\mathbf{x}$ and transfer characteristic $F_x(x)$. Therefore, if we
use $\mathbf{u}$ as the input to another system whose transfer characteristic is the inverse
$F_x^{(-1)}(u)$ of the function $u = F_x(x)$, the resulting output will equal $\mathbf{x}$:
$$\text{If } \mathbf{x} = F_x^{(-1)}(\mathbf{u}) \text{ then } P\{\mathbf{x} \le x\} = F_x(x)$$
From uniform to $F_y(y)$. Given an RV $\mathbf{u}$ with uniform distribution in the interval
(0, 1), we wish to find a function $g(u)$ such that the distribution of the RV
$\mathbf{y} = g(\mathbf{u})$ is a specified function $F_y(y)$. We maintain that $g(u)$ is the inverse of
the function $u = F_y(y)$:
$$\text{If } \mathbf{y} = F_y^{(-1)}(\mathbf{u}) \text{ then } P\{\mathbf{y} \le y\} = F_y(y) \tag{5-19}$$
FIGURE 5-18
Proof. The RV $\mathbf{u}$ in (5-19) is uniform and the function $F_x(x)$ is arbitrary.
Replacing $F_x(x)$ by $F_y(y)$, we obtain (5-19) (see also Fig. 5-18).
From $F_x(x)$ to $F_y(y)$. We consider, finally, the general case: Given $F_x(x)$ and
$F_y(y)$, find $g(x)$ such that the distribution of $\mathbf{y} = g(\mathbf{x})$ equals $F_y(y)$. To solve
this problem, we form the RV $\mathbf{u} = F_x(\mathbf{x})$ as in (5-18) and the RV $\mathbf{y} = F_y^{(-1)}(\mathbf{u})$ as
in (5-19). Combining the two, we conclude:
$$\text{If } \mathbf{y} = F_y^{(-1)}(F_x(\mathbf{x})) \text{ then } P\{\mathbf{y} \le y\} = F_y(y) \tag{5-20}$$
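Relation (5-19) is the basis of what is commonly called inverse-transform sampling. The sketch below applies it to an exponential target distribution $F_y(y) = 1 - e^{-cy}$, chosen here only because its inverse has a simple closed form; the function names, the rate `c`, the sample size, and the seed are all assumptions for illustration.

```python
import math
import random


def exp_cdf(y, c):
    """Target distribution F_y(y) = 1 - exp(-c*y) for y > 0."""
    return 1.0 - math.exp(-c * y) if y > 0 else 0.0


def exp_inverse_cdf(u, c):
    """Inverse of u = F_y(y): y = -ln(1 - u) / c for 0 <= u < 1."""
    return -math.log(1.0 - u) / c


def sample_exponential(n, c, seed=0):
    """Draw n samples of y = F_y^(-1)(u) with u uniform in (0, 1), as in (5-19)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [exp_inverse_cdf(rng.random(), c) for _ in range(n)]


c = 2.0
samples = sample_exponential(100_000, c)
sample_mean = sum(samples) / len(samples)                        # should be near 1/c
frac_below_half = sum(s <= 0.5 for s in samples) / len(samples)  # near F_y(0.5)
```

The same two-step construction gives (5-20): passing any RV through its own distribution function yields a uniform RV, which can then be fed to `exp_inverse_cdf` or to the inverse of any other target distribution.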
5-3 MEAN AND VARIANCE
The expected value or mean of an RV $\mathbf{x}$ is by definition the integral
$$E\{\mathbf{x}\} = \int_{-\infty}^{\infty} x f(x)\,dx \tag{5-21}$$
This number will also be denoted by $\eta_x$ or $\eta$.
Example 5-15. If $\mathbf{x}$ is uniform in the interval $(x_1, x_2)$, then $f(x) = 1/(x_2 - x_1)$ in
this interval. Hence
$$E\{\mathbf{x}\} = \frac{1}{x_2 - x_1} \int_{x_1}^{x_2} x\,dx = \frac{x_1 + x_2}{2}$$
We note that, if the vertical line $x = a$ is an axis of symmetry of $f(x)$, then
$E\{\mathbf{x}\} = a$; in particular, if $f(-x) = f(x)$, then $E\{\mathbf{x}\} = 0$. In the above example,
$f(x)$ is symmetrical about the line $x = (x_1 + x_2)/2$.
Discrete type For discrete type RVs the integral in (5-21) can be written
as a sum. Indeed, suppose that $\mathbf{x}$ takes the values $x_i$ with probability $p_i$. In this
case [see (4-15)]
$$f(x) = \sum_i p_i\, \delta(x - x_i)$$
Inserting into (5-21) and using the identity
$$\int_{-\infty}^{\infty} x\, \delta(x - x_i)\,dx = x_i$$
we obtain
$$E\{\mathbf{x}\} = \sum_i p_i x_i \qquad p_i = P\{\mathbf{x} = x_i\} \tag{5-22}$$
Example 5-16. If $\mathbf{x}$ takes the values $1, 2, \ldots, 6$ with probability 1/6, then
$$E\{\mathbf{x}\} = \tfrac{1}{6}(1 + 2 + \cdots + 6) = 3.5$$
FIGURE 5-19
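Conditional mean The conditional mean of an RV $\mathbf{x}$ assuming $\mathcal{M}$ is given
by the integral in (5-21) if $f(x)$ is replaced by the conditional density $f(x\,|\,\mathcal{M})$:
$$E\{\mathbf{x}\,|\,\mathcal{M}\} = \int_{-\infty}^{\infty} x f(x\,|\,\mathcal{M})\,dx \tag{5-23}$$
For discrete type RVs the above yields
$$E\{\mathbf{x}\,|\,\mathcal{M}\} = \sum_i x_i\, P\{\mathbf{x} = x_i\,|\,\mathcal{M}\} \tag{5-24}$$
Example 5-17. With $\mathcal{M} = \{\mathbf{x} \ge a\}$, it follows from (5-23) that
$$E\{\mathbf{x}\,|\,\mathbf{x} \ge a\} = \int_{-\infty}^{\infty} x f(x\,|\,\mathbf{x} \ge a)\,dx = \frac{\int_a^{\infty} x f(x)\,dx}{\int_a^{\infty} f(x)\,dx}$$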
Lebesgue integral. The mean of an RV can be interpreted as a Lebesgue
integral. This interpretation is important in mathematics but it will not be used
in our development. We make, therefore, only a passing reference:
We divide the $x$ axis into intervals $(x_k, x_{k+1})$ of length $\Delta x$ as in Fig.
5-19a. If $\Delta x$ is small, then the Riemann integral in (5-21) can be approximated
by a sum
$$\int_{-\infty}^{\infty} x f(x)\,dx \simeq \sum_{k=-\infty}^{\infty} x_k f(x_k)\,\Delta x \tag{5-25}$$
And since $f(x_k)\,\Delta x \simeq P\{x_k < \mathbf{x} < x_k + \Delta x\}$, we conclude that
$$E\{\mathbf{x}\} \simeq \sum_{k=-\infty}^{\infty} x_k\, P\{x_k < \mathbf{x} < x_k + \Delta x\}$$
In the above, the sets $\{x_k < \mathbf{x} < x_k + \Delta x\}$ are differential events specified in
terms of the RV $\mathbf{x}$, and their union is the space $\mathcal{S}$ (Fig. 5-19b). Hence, to find
$E\{\mathbf{x}\}$, we multiply the probability of each differential event by the corresponding
value of $\mathbf{x}$ and sum over all $k$. The resulting limit as $\Delta x \to 0$ is written in the
form
$$E\{\mathbf{x}\} = \int_{\mathcal{S}} \mathbf{x}\,dP$$
and is called the Lebesgue integral of $\mathbf{x}$.
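Frequency interpretation We maintain that the arithmetic average $\bar{x}$ of the observed
values $x_i$ of $\mathbf{x}$ tends to the integral in (5-21) as $n \to \infty$:
$$\bar{x} = \frac{x_1 + \cdots + x_n}{n} \to E\{\mathbf{x}\} \tag{5-26}$$
Proof. We denote by $\Delta n_k$ the number of $x_i$'s that are between $z_k$ and $z_k + \Delta x = z_{k+1}$.
From this it follows that
$$x_1 + \cdots + x_n \simeq \sum_k z_k\, \Delta n_k$$
And since $f(z_k)\,\Delta x \simeq \Delta n_k / n$ [see (4-23)] we conclude that
$$\bar{x} \simeq \frac{1}{n} \sum_k z_k\, \Delta n_k = \sum_k z_k f(z_k)\,\Delta x \simeq \int_{-\infty}^{\infty} x f(x)\,dx$$
and (5-26) results.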
We shall use the above to express the mean of $\mathbf{x}$ in terms of its distribution.
From the construction of Fig. 5-20a it follows readily that $\bar{x}$ equals the
area under the empirical percentile curve of $\mathbf{x}$. Thus
$$\bar{x} = (BCD) - (OAB)$$
where $(BCD)$ and $(OAB)$ are the shaded areas above and below the $u$ axis
respectively. These areas equal the corresponding areas of Fig. 5-20b; hence
$$\bar{x} = \int_0^{\infty} [1 - F_n(x)]\,dx - \int_{-\infty}^{0} F_n(x)\,dx$$
where $F_n(x)$ is the empirical distribution of $\mathbf{x}$. With $n \to \infty$ this yields
$$E\{\mathbf{x}\} = \int_0^{\infty} R(x)\,dx - \int_{-\infty}^{0} F(x)\,dx \qquad R(x) = 1 - F(x) \tag{5-27}$$
FIGURE 5-20
FIGURE 5-21
Mean of $g(\mathbf{x})$. Given an RV $\mathbf{x}$ and a function $g(x)$, we form the RV $\mathbf{y} = g(\mathbf{x})$.
As we see from (5-21), the mean of this RV is given by
$$E\{\mathbf{y}\} = \int_{-\infty}^{\infty} y f_y(y)\,dy \tag{5-28}$$
It appears, therefore, that to determine the mean of $\mathbf{y}$, we must find its density
$f_y(y)$. This, however, is not necessary. As the next basic theorem shows, $E\{\mathbf{y}\}$
can be expressed directly in terms of the function $g(x)$ and the density $f_x(x)$
of $\mathbf{x}$.
THEOREM
$$E\{g(\mathbf{x})\} = \int_{-\infty}^{\infty} g(x) f_x(x)\,dx \tag{5-29}$$
Proof. We shall sketch a proof using the curve $g(x)$ of Fig. 5-21. With
$y = g(x_1) = g(x_2) = g(x_3)$ as in the figure, we see that
$$f_y(y)\,dy = f_x(x_1)\,dx_1 + f_x(x_2)\,dx_2 + f_x(x_3)\,dx_3$$
Multiplying by $y$, we obtain
$$y f_y(y)\,dy = g(x_1) f_x(x_1)\,dx_1 + g(x_2) f_x(x_2)\,dx_2 + g(x_3) f_x(x_3)\,dx_3$$
Thus to each differential in (5-28) there correspond one or more differentials
in (5-29). As $dy$ covers the $y$ axis, the corresponding $dx$'s are nonoverlapping
and they cover the entire $x$ axis. Hence the integrals in (5-28) and (5-29)
are equal.
If $\mathbf{x}$ is of discrete type as in (5-22), then (5-29) yields
$$E\{g(\mathbf{x})\} = \sum_i g(x_i)\, P\{\mathbf{x} = x_i\} \tag{5-30}$$
Example 5-18. With $x_0$ an arbitrary number and $g(x)$ as in Fig. 5-22, (5-29) yields
$$E\{g(\mathbf{x})\} = \int_{-\infty}^{x_0} f_x(x)\,dx = F_x(x_0)$$
This shows that the distribution function of an RV can be expressed as an expected
value.
FIGURE 5-22
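Example 5-19. In this example, we show that the probability of any event $\mathcal{A}$ can
be expressed as an expected value. For this purpose we form the zero-one RV $\mathbf{x}_{\mathcal{A}}$
associated with the event $\mathcal{A}$:
$$\mathbf{x}_{\mathcal{A}}(\zeta) = \begin{cases} 1 & \zeta \in \mathcal{A} \\ 0 & \zeta \notin \mathcal{A} \end{cases}$$
Since this RV takes the values 1 and 0 with respective probabilities $P(\mathcal{A})$ and
$P(\bar{\mathcal{A}})$, (5-22) yields
$$E\{\mathbf{x}_{\mathcal{A}}\} = 1 \times P(\mathcal{A}) + 0 \times P(\bar{\mathcal{A}}) = P(\mathcal{A})$$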
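Linearity From (5-29) it follows that
$$E\{a_1 g_1(\mathbf{x}) + \cdots + a_n g_n(\mathbf{x})\} = a_1 E\{g_1(\mathbf{x})\} + \cdots + a_n E\{g_n(\mathbf{x})\} \tag{5-31}$$
In particular, $E\{a\mathbf{x} + b\} = aE\{\mathbf{x}\} + b$.
Complex RVs If $\mathbf{z} = \mathbf{x} + j\mathbf{y}$ is a complex RV, then its expected value is by
definition
$$E\{\mathbf{z}\} = E\{\mathbf{x}\} + jE\{\mathbf{y}\}$$
From this and (5-29) it follows that if
$$g(\mathbf{x}) = g_1(\mathbf{x}) + jg_2(\mathbf{x})$$
is a complex function of the real RV $\mathbf{x}$ then
$$E\{g(\mathbf{x})\} = \int_{-\infty}^{\infty} g_1(x) f(x)\,dx + j\int_{-\infty}^{\infty} g_2(x) f(x)\,dx = \int_{-\infty}^{\infty} g(x) f(x)\,dx \tag{5-32}$$
In other words, (5-29) holds even if $g(x)$ is complex.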
Variance
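The variance of an RV $\mathbf{x}$ is by definition the integral
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \eta)^2 f(x)\,dx \tag{5-33}$$
where $\eta = E\{\mathbf{x}\}$. The positive constant $\sigma$, denoted also by $\sigma_x$, is called the
standard deviation of $\mathbf{x}$.
From the definition it follows that $\sigma^2$ is the mean of the RV $(\mathbf{x} - \eta)^2$.
Thus
$$\sigma^2 = E\{(\mathbf{x} - \eta)^2\} = E\{\mathbf{x}^2 - 2\mathbf{x}\eta + \eta^2\} = E\{\mathbf{x}^2\} - 2\eta E\{\mathbf{x}\} + \eta^2$$
Hence
$$\sigma^2 = E\{\mathbf{x}^2\} - E^2\{\mathbf{x}\} \tag{5-34}$$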
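Example 5-20. If $\mathbf{x}$ is uniform in the interval $(-c, c)$, then $\eta = 0$ and
$$\sigma^2 = E\{\mathbf{x}^2\} = \frac{1}{2c} \int_{-c}^{c} x^2\,dx = \frac{c^2}{3}$$
Example 5-21. We have written the density of a normal RV in the form
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \eta)^2 / 2\sigma^2}$$
where up to now $\eta$ and $\sigma^2$ were two arbitrary constants. We show next that $\eta$ is
indeed the mean of $\mathbf{x}$ and $\sigma^2$ its variance.
Proof. Clearly, $f(x)$ is symmetrical about the line $x = \eta$; hence $E\{\mathbf{x}\} = \eta$.
Furthermore,
$$\int_{-\infty}^{\infty} e^{-(x - \eta)^2 / 2\sigma^2}\,dx = \sigma\sqrt{2\pi}$$
because the area of $f(x)$ equals 1. Differentiating with respect to $\sigma$, we obtain
$$\int_{-\infty}^{\infty} \frac{(x - \eta)^2}{\sigma^3}\, e^{-(x - \eta)^2 / 2\sigma^2}\,dx = \sqrt{2\pi}$$
Multiplying both sides by $\sigma^2/\sqrt{2\pi}$, we conclude that $E\{(\mathbf{x} - \eta)^2\} = \sigma^2$ and the
proof is complete.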
Discrete type. If the RV $\mathbf{x}$ is of discrete type, then
$$\sigma^2 = \sum_i p_i (x_i - \eta)^2 \qquad p_i = P\{\mathbf{x} = x_i\} \tag{5-35}$$
Example 5-22. The RV $\mathbf{x}$ takes the values 1 and 0 with probabilities $p$ and
$q = 1 - p$ respectively. In this case
$$E\{\mathbf{x}\} = 1 \times p + 0 \times q = p$$
$$E\{\mathbf{x}^2\} = 1^2 \times p + 0^2 \times q = p$$
Hence
$$\sigma^2 = E\{\mathbf{x}^2\} - E^2\{\mathbf{x}\} = p - p^2 = pq$$
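Example 5-23. A Poisson distributed RV with parameter $a$ takes the values
$0, 1, \ldots$ with probabilities
$$P\{\mathbf{x} = k\} = e^{-a}\, \frac{a^k}{k!}$$
We shall show that its mean and variance equal $a$:
$$E\{\mathbf{x}\} = a \qquad E\{\mathbf{x}^2\} = a^2 + a \qquad \sigma^2 = a \tag{5-36}$$
Proof. We differentiate twice the Taylor expansion of $e^a$:
$$e^a = \sum_{k=0}^{\infty} \frac{a^k}{k!}$$
$$e^a = \sum_{k=1}^{\infty} k\, \frac{a^{k-1}}{k!} = \frac{1}{a} \sum_{k=1}^{\infty} k\, \frac{a^k}{k!}$$
$$e^a = \sum_{k=2}^{\infty} k(k - 1)\, \frac{a^{k-2}}{k!} = \frac{1}{a^2} \sum_{k=1}^{\infty} k^2\, \frac{a^k}{k!} - \frac{1}{a^2} \sum_{k=1}^{\infty} k\, \frac{a^k}{k!}$$
Hence
$$E\{\mathbf{x}\} = e^{-a} \sum_{k=1}^{\infty} k\, \frac{a^k}{k!} = a \qquad E\{\mathbf{x}^2\} = e^{-a} \sum_{k=1}^{\infty} k^2\, \frac{a^k}{k!} = a^2 + a$$
and (5-36) results.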
Poisson points. As we have shown in (3-47), the number $\mathbf{n}$ of Poisson points in an
interval of length $t_0$ is a Poisson distributed RV with parameter $a = \lambda t_0$. From
this it follows that
$$E\{\mathbf{n}\} = \lambda t_0 \qquad \sigma_n^2 = \lambda t_0 \tag{5-37}$$
This shows that the density $\lambda$ of Poisson points equals the expected number of
points per unit time.
Notes 1. The variance $\sigma^2$ of an RV $\mathbf{x}$ is a measure of the concentration of $\mathbf{x}$ near its
mean $\eta$. Its relative frequency interpretation (empirical estimate) is the average of
$(x_i - \eta)^2$:
$$\sigma^2 \simeq \frac{1}{n} \sum_i (x_i - \eta)^2$$
where $x_i$ are the observed values of $\mathbf{x}$. This average can be used as the estimate of $\sigma^2$
only if $\eta$ is known. If it is unknown, we replace it by its estimate $\bar{x}$ and we change $n$ to
$n - 1$. This yields the estimate
$$\sigma^2 \simeq \frac{1}{n - 1} \sum_i (x_i - \bar{x})^2 \qquad \bar{x} = \frac{1}{n} \sum_i x_i$$
known as the sample variance of $\mathbf{x}$ [see (8-64)]. The reason for changing $n$ to $n - 1$ is
explained later.
2. A simpler measure of the concentration of $\mathbf{x}$ near $\eta$ is the first absolute central
moment $M = E\{|\mathbf{x} - \eta|\}$. Its empirical estimate is the average of $|x_i - \eta|$:
$$M \simeq \frac{1}{n} \sum_i |x_i - \eta|$$
If $\eta$ is unknown, it is replaced by $\bar{x}$. This estimate avoids the computation of squares.
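The three empirical estimates in these notes can be written out as follows. This is a minimal sketch: the function names and the small data set are made up for illustration, and the standard-library `statistics.variance` is used only as an independent check of the $n - 1$ form.

```python
import statistics


def variance_known_mean(xs, eta):
    """Average of (x_i - eta)**2; usable only when the mean eta is known."""
    return sum((x - eta) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Sample variance: eta replaced by the sample mean and n replaced by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def mean_abs_deviation(xs):
    """Estimate of the first absolute central moment, with eta replaced by the sample mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum(abs(x - x_bar) for x in xs) / n


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations
s2 = sample_variance(data)
s2_reference = statistics.variance(data)  # library value, also divides by n - 1
```

For this data set the sample mean is 5, so the known-mean average with $\eta = 5$ divides the same sum of squares by 8 rather than 7 and is correspondingly smaller.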
5-4 MOMENTS
The following quantities are of interest in the study of RVs:
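Moments
$$m_n = E\{\mathbf{x}^n\} = \int_{-\infty}^{\infty} x^n f(x)\,dx \tag{5-38}$$
Central moments
$$\mu_n = E\{(\mathbf{x} - \eta)^n\} = \int_{-\infty}^{\infty} (x - \eta)^n f(x)\,dx \tag{5-39}$$
Absolute moments
$$E\{|\mathbf{x}|^n\} \qquad E\{|\mathbf{x} - \eta|^n\} \tag{5-40}$$
Generalized moments
$$E\{(\mathbf{x} - a)^n\} \qquad E\{|\mathbf{x} - a|^n\} \tag{5-41}$$
We note that
$$\mu_n = E\{(\mathbf{x} - \eta)^n\} = E\left\{\sum_{k=0}^{n} \binom{n}{k} \mathbf{x}^k (-\eta)^{n-k}\right\}$$
Hence
$$\mu_n = \sum_{k=0}^{n} \binom{n}{k} m_k (-\eta)^{n-k} \tag{5-42}$$
Similarly,
$$m_n = E\{[(\mathbf{x} - \eta) + \eta]^n\} = E\left\{\sum_{k=0}^{n} \binom{n}{k} (\mathbf{x} - \eta)^k \eta^{n-k}\right\}$$
Hence
$$m_n = \sum_{k=0}^{n} \binom{n}{k} \mu_k \eta^{n-k} \tag{5-43}$$
In particular,
$$\mu_0 = m_0 = 1 \qquad m_1 = \eta \qquad \mu_1 = 0 \qquad \mu_2 = \sigma^2$$
and
$$\mu_3 = m_3 - 3\eta m_2 + 2\eta^3 \qquad m_3 = \mu_3 + 3\eta\sigma^2 + \eta^3$$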
Notes 1. If the function $f(x)$ is interpreted as mass density on the $x$ axis, then $E\{\mathbf{x}\}$
equals its center of gravity, $E\{\mathbf{x}^2\}$ equals the moment of inertia with respect to the origin,
and $\sigma^2$ equals the central moment of inertia. The standard deviation $\sigma$ is the radius of
gyration.
2. The constants $\eta$ and $\sigma$ give only a limited characterization of $f(x)$. Knowledge
of other moments provides additional information that can be used, for example, to
distinguish between two densities with the same $\eta$ and $\sigma$. In fact, if $m_n$ is known for
every $n$, then, under certain conditions, $f(x)$ is determined uniquely [see also (5-69)].
The underlying theory is known in mathematics as the moment problem.
3. The moments of an RV are not arbitrary numbers but must satisfy various
inequalities. For example [see (5-34)]
$$\sigma^2 = m_2 - m_1^2 \ge 0$$
Similarly, since the quadratic
$$E\{(\mathbf{x}^n - a)^2\} = m_{2n} - 2am_n + a^2$$
is nonnegative for any $a$, its discriminant cannot be positive. Hence
$$m_n^2 \le m_{2n}$$
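The binomial relations (5-42) and (5-43) between moments and central moments translate directly into code. The sketch below uses the zero-one RV of Example 5-22, for which $m_k = p$ for every $k \ge 1$; the function names and the choice $p = 1/4$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def central_from_raw(m):
    """Central moments mu_0..mu_N from raw moments m[0..N] (m[0] = 1), per (5-42)."""
    eta = m[1]
    return [
        sum(comb(n, k) * m[k] * (-eta) ** (n - k) for k in range(n + 1))
        for n in range(len(m))
    ]


def raw_from_central(mu, eta):
    """Raw moments m_0..m_N from central moments mu[0..N] and the mean eta, per (5-43)."""
    return [
        sum(comb(n, k) * mu[k] * eta ** (n - k) for k in range(n + 1))
        for n in range(len(mu))
    ]


p = Fraction(1, 4)
q = 1 - p
m = [Fraction(1), p, p, p]     # zero-one RV: E{x^k} = p for k >= 1
mu = central_from_raw(m)        # mu[2] is the variance pq of Example 5-22
m_back = raw_from_central(mu, p)
```

Applying the two conversions in sequence returns the original moments, and the inequality $m_1^2 \le m_2$ of Note 3 reduces here to $p^2 \le p$.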
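Normal random variables. We shall show that if
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$$
then
$$E\{\mathbf{x}^n\} = \begin{cases} 0 & n = 2k + 1 \\ 1 \cdot 3 \cdots (n - 1)\,\sigma^n & n = 2k \end{cases} \tag{5-44}$$
$$E\{|\mathbf{x}|^n\} = \begin{cases} 1 \cdot 3 \cdots (n - 1)\,\sigma^n & n = 2k \\ 2^k k!\, \sigma^{2k+1} \sqrt{2/\pi} & n = 2k + 1 \end{cases} \tag{5-45}$$
The odd moments of $\mathbf{x}$ are 0 because $f(-x) = f(x)$. To prove the lower
part of (5-44), we differentiate $k$ times the identity
$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \sqrt{\frac{\pi}{\alpha}}$$
This yields
$$\int_{-\infty}^{\infty} x^{2k} e^{-\alpha x^2}\,dx = \frac{1 \cdot 3 \cdots (2k - 1)}{2^k} \sqrt{\frac{\pi}{\alpha^{2k+1}}}$$
and with $\alpha = 1/2\sigma^2$, (5-44) results.
Since $f(-x) = f(x)$, we have
$$E\{|\mathbf{x}|^{2k+1}\} = 2\int_0^{\infty} x^{2k+1} f(x)\,dx = \frac{2}{\sigma\sqrt{2\pi}} \int_0^{\infty} x^{2k+1} e^{-x^2/2\sigma^2}\,dx$$
With $y = x^2/2\sigma^2$, the above yields
$$\sqrt{\frac{2}{\pi}}\, \frac{(2\sigma^2)^{k+1}}{2\sigma} \int_0^{\infty} y^k e^{-y}\,dy$$
and (5-45) results because the last integral equals $k!$.
We note in particular that
$$E\{\mathbf{x}^4\} = 3\sigma^4 = 3E^2\{\mathbf{x}^2\} \tag{5-46}$$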
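Example 5-24. If $\mathbf{x}$ has a Rayleigh density
$$f(x) = \frac{x}{\alpha^2}\, e^{-x^2/2\alpha^2}\, U(x)$$
then
$$E\{\mathbf{x}^n\} = \frac{1}{\alpha^2} \int_0^{\infty} x^{n+1} e^{-x^2/2\alpha^2}\,dx = \frac{1}{2\alpha^2} \int_{-\infty}^{\infty} |x|^{n+1} e^{-x^2/2\alpha^2}\,dx$$
From this and (5-45) it follows that
$$E\{\mathbf{x}^n\} = \begin{cases} 1 \cdot 3 \cdots n\, \alpha^n \sqrt{\pi/2} & n = 2k + 1 \\ 2^k k!\, \alpha^{2k} & n = 2k \end{cases} \tag{5-47}$$
In particular,
$$E\{\mathbf{x}\} = \alpha\sqrt{\pi/2} \qquad \sigma^2 = (2 - \pi/2)\,\alpha^2$$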
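Example 5-25. If $\mathbf{x}$ has a Maxwell density
$$f(x) = \frac{\sqrt{2}}{\alpha^3\sqrt{\pi}}\, x^2 e^{-x^2/2\alpha^2}\, U(x)$$
then
$$E\{\mathbf{x}^n\} = \frac{1}{\alpha^3\sqrt{2\pi}} \int_{-\infty}^{\infty} |x|^{n+2} e^{-x^2/2\alpha^2}\,dx$$
and (5-45) yields
$$E\{\mathbf{x}^n\} = \begin{cases} 1 \cdot 3 \cdots (n + 1)\, \alpha^n & n = 2k \\ 2^k k!\, \alpha^{2k-1} \sqrt{2/\pi} & n = 2k - 1 \end{cases} \tag{5-48}$$
In particular,
$$E\{\mathbf{x}\} = 2\alpha\sqrt{2/\pi} \qquad E\{\mathbf{x}^2\} = 3\alpha^2$$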
Poisson random variables. The moments of a Poisson distributed RV are
functions of the parameter $a$:
$$m_n(a) = E\{\mathbf{x}^n\} = e^{-a} \sum_{k=0}^{\infty} k^n\, \frac{a^k}{k!} \tag{5-49}$$
$$\mu_n(a) = E\{(\mathbf{x} - a)^n\} = e^{-a} \sum_{k=0}^{\infty} (k - a)^n\, \frac{a^k}{k!} \tag{5-50}$$
We shall show that they satisfy the recursion equations
$$m_{n+1}(a) = a[m_n(a) + m_n'(a)] \tag{5-51}$$
$$\mu_{n+1}(a) = a[n\mu_{n-1}(a) + \mu_n'(a)] \tag{5-52}$$
Proof. Differentiating (5-49) with respect to $a$, we obtain
$$m_n'(a) = -e^{-a} \sum_{k=0}^{\infty} k^n\, \frac{a^k}{k!} + e^{-a} \sum_{k=0}^{\infty} k^{n+1}\, \frac{a^{k-1}}{k!} = -m_n(a) + \frac{1}{a}\, m_{n+1}(a)$$
and (5-51) results. Similarly, from (5-50) it follows that
$$\mu_n'(a) = -e^{-a} \sum_{k=0}^{\infty} (k - a)^n\, \frac{a^k}{k!} - ne^{-a} \sum_{k=0}^{\infty} (k - a)^{n-1}\, \frac{a^k}{k!} + e^{-a} \sum_{k=0}^{\infty} (k - a)^n k\, \frac{a^{k-1}}{k!}$$
Setting $k = (k - a) + a$ in the last sum, we obtain $\mu_n' = -\mu_n - n\mu_{n-1} + (1/a)(\mu_{n+1} + a\mu_n)$ and (5-52) results.
The preceding equations lead to the recursive determination of the
moments $m_n$ and $\mu_n$. Starting with the known moments $m_1 = a$, $\mu_1 = 0$, and
$\mu_2 = a$ [see (5-36)], we obtain $m_2 = a(a + 1)$ and
$$m_3 = a(a^2 + a + 2a + 1) = a^3 + 3a^2 + a \qquad \mu_3 = a(\mu_2' + 2\mu_1) = a$$
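The recursions (5-51) and (5-52) can be run mechanically if each moment is stored as a polynomial in $a$. The sketch below represents a polynomial by its coefficient list `[c0, c1, c2, ...]` (constant term first); this representation and all helper names are assumptions made for the illustration.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists (constant term first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def poly_derivative(p):
    """Derivative with respect to a; the zero polynomial is returned as [0]."""
    return [i * c for i, c in enumerate(p)][1:] or [0]


def times_a(p):
    """Multiply a polynomial by a (shift every coefficient up by one degree)."""
    return [0] + p


def poisson_raw_moments(n_max):
    """Return [m_0, ..., m_n_max] using (5-51): m_{n+1} = a (m_n + m_n')."""
    moments = [[1]]  # m_0 = 1
    for _ in range(n_max):
        m_n = moments[-1]
        moments.append(times_a(poly_add(m_n, poly_derivative(m_n))))
    return moments


def poisson_central_moments(n_max):
    """Return [mu_0, ..., mu_n_max] using (5-52): mu_{n+1} = a (n mu_{n-1} + mu_n')."""
    mus = [[1], [0]]  # mu_0 = 1, mu_1 = 0
    for n in range(1, n_max):
        term = poly_add([n * c for c in mus[n - 1]], poly_derivative(mus[n]))
        mus.append(times_a(term))
    return mus


raw = poisson_raw_moments(3)          # raw[3] holds the coefficients of m_3
central = poisson_central_moments(4)  # central[3], central[4] hold mu_3, mu_4
```

With this representation `raw[3]` is `[0, 1, 3, 1]`, that is $a + 3a^2 + a^3$, in agreement with the value of $m_3$ derived above, and `central[3]` is `[0, 1]`, that is $\mu_3 = a$.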
ESTIMATE OF THE MEAN OF $g(\mathbf{x})$. The mean of the RV $\mathbf{y} = g(\mathbf{x})$ is given by
$$E\{g(\mathbf{x})\} = \int_{-\infty}^{\infty} g(x) f(x)\,dx \tag{5-53}$$
Hence, for its determination, knowledge of $f(x)$ is required. However, if $\mathbf{x}$ is
concentrated near its mean, then $E\{g(\mathbf{x})\}$ can be expressed in terms of the
moments $\mu_n$ of $\mathbf{x}$.
Suppose, first, that $f(x)$ is negligible outside an interval $(\eta - \varepsilon, \eta + \varepsilon)$
and in this interval, $g(x) \simeq g(\eta)$. In this case, (5-53) yields
$$E\{g(\mathbf{x})\} \simeq g(\eta) \int_{\eta - \varepsilon}^{\eta + \varepsilon} f(x)\,dx \simeq g(\eta)$$
This estimate can be improved if $g(x)$ is approximated by a polynomial
$$g(x) \simeq g(\eta) + g'(\eta)(x - \eta) + \cdots + g^{(n)}(\eta)\, \frac{(x - \eta)^n}{n!}$$
Inserting into (5-53), we obtain
$$E\{g(\mathbf{x})\} \simeq g(\eta) + g''(\eta)\, \frac{\sigma^2}{2} + \cdots + g^{(n)}(\eta)\, \frac{\mu_n}{n!} \tag{5-54}$$
In particular, if $g(x)$ is approximated by a parabola, then
$$\eta_y = E\{g(\mathbf{x})\} \simeq g(\eta) + g''(\eta)\, \frac{\sigma^2}{2} \tag{5-55}$$
And if it is approximated by a straight line, then $\eta_y \simeq g(\eta)$. This shows that the
slope of $g(x)$ has no effect on $\eta_y$; however, as we show next, it affects the
variance $\sigma_y^2$ of $\mathbf{y}$.
Variance. We maintain that the first-order estimate of $\sigma_y^2$ is given by
$$\sigma_y^2 \simeq |g'(\eta)|^2 \sigma^2 \tag{5-56}$$
Proof. We apply (5-55) to the function $g^2(x)$. Since its second derivative equals
$2(g')^2 + 2gg''$, we conclude that
$$\sigma_y^2 + \eta_y^2 = E\{g^2(\mathbf{x})\} \simeq g^2 + [(g')^2 + gg'']\sigma^2$$
Inserting the approximation (5-55) for $\eta_y$ into the above and neglecting the $\sigma^4$
term, we obtain (5-56).
Example 5-26. A voltage $E = 120$ V is connected across a resistor whose resistance
is an RV $\mathbf{r}$ uniform between 900 and 1100 $\Omega$. Using (5-55) and (5-56), we shall
estimate the mean and variance of the resulting current
$$\mathbf{i} = \frac{E}{\mathbf{r}}$$
Clearly, $E\{\mathbf{r}\} = \eta = 10^3$, $\sigma^2 = 100^2/3$. With $g(r) = E/r$, we have
$$g(\eta) = 0.12 \qquad g'(\eta) = -12 \times 10^{-5} \qquad g''(\eta) = 24 \times 10^{-8}$$
Hence
$$E\{\mathbf{i}\} \simeq 0.12 + 0.0004 \text{ A} \qquad \sigma_i^2 \simeq 48 \times 10^{-6} \text{ A}^2$$
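The estimates of Example 5-26 can be compared with the exact mean and variance, which are available in closed form here because $\mathbf{r}$ is uniform: $E\{E/\mathbf{r}\}$ and $E\{E^2/\mathbf{r}^2\}$ reduce to elementary integrals of $1/r$ and $1/r^2$. The variable names below are illustrative, and the exact expressions are a check added for this sketch rather than part of the example.

```python
import math

E = 120.0                      # applied voltage in volts
r_lo, r_hi = 900.0, 1100.0     # resistance range in ohms
eta = (r_lo + r_hi) / 2        # mean of r
var_r = (r_hi - r_lo) ** 2 / 12  # variance of a uniform RV, equal to 100**2 / 3

# Derivatives of g(r) = E / r evaluated at the mean.
g0 = E / eta
g1 = -E / eta ** 2
g2 = 2 * E / eta ** 3

mean_est = g0 + g2 * var_r / 2   # second-order estimate (5-55)
var_est = g1 ** 2 * var_r        # first-order estimate (5-56)

# Exact values from direct integration against the uniform density 1/(r_hi - r_lo).
width = r_hi - r_lo
mean_exact = E * math.log(r_hi / r_lo) / width
second_moment_exact = E ** 2 * (1 / r_lo - 1 / r_hi) / width
var_exact = second_moment_exact - mean_exact ** 2
```

The estimates reproduce the figures quoted in the example, and both stay close to the exact values because the spread of $\mathbf{r}$ is small relative to its mean.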
Tchebycheff Inequality
A measure of the concentration of an RV near its mean $\eta$ is its variance $\sigma^2$. In
fact, as the following theorem shows, the probability that $\mathbf{x}$ is outside an
arbitrary interval $(\eta - \varepsilon, \eta + \varepsilon)$ is negligible if the ratio $\sigma/\varepsilon$ is sufficiently
small. This result, known as the Tchebycheff inequality, is fundamental.
THEOREM. For any $\varepsilon > 0$,
$$P\{|\mathbf{x} - \eta| \ge \varepsilon\} \le \frac{\sigma^2}{\varepsilon^2} \tag{5-57}$$
Proof. The proof is based on the fact that
$$P\{|\mathbf{x} - \eta| \ge \varepsilon\} = \int_{-\infty}^{\eta - \varepsilon} f(x)\,dx + \int_{\eta + \varepsilon}^{\infty} f(x)\,dx = \int_{|x - \eta| \ge \varepsilon} f(x)\,dx$$
Indeed
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \eta)^2 f(x)\,dx \ge \int_{|x - \eta| \ge \varepsilon} (x - \eta)^2 f(x)\,dx \ge \varepsilon^2 \int_{|x - \eta| \ge \varepsilon} f(x)\,dx$$
and (5-57) results because the last integral equals $P\{|\mathbf{x} - \eta| \ge \varepsilon\}$.
Notes 1. From (5-57) it follows that, if $\sigma = 0$, then the probability that $\mathbf{x}$ is outside the
interval $(\eta - \varepsilon, \eta + \varepsilon)$ equals 0 for any $\varepsilon$; hence $\mathbf{x} = \eta$ with probability 1. Similarly, if
$$E\{\mathbf{x}^2\} = \eta^2 + \sigma^2 = 0 \qquad \text{then} \qquad \eta = 0 \qquad \sigma = 0$$
hence $\mathbf{x} = 0$ with probability 1.
2. For specific densities, the bound in (5-57) is too high. Suppose, for example, that
$\mathbf{x}$ is normal. In this case, $P\{|\mathbf{x} - \eta| \ge 3\sigma\} = 2 - 2G(3) = 0.0027$. Inequality (5-57),
however, yields $P\{|\mathbf{x} - \eta| \ge 3\sigma\} \le 1/9$.
The significance of Tchebycheff's inequality is the fact that it holds for any $f(x)$
and can, therefore, be used even if $f(x)$ is not known.
3. The bound in (5-57) can be reduced if various assumptions are made about $f(x)$
[see Chernoff bound (Prob. 5-30)].
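The gap between the Tchebycheff bound and the actual tail of a normal RV, mentioned in Note 2, can be tabulated with the standard-library complementary error function. The sketch writes $\varepsilon = k\sigma$, so the bound (5-57) becomes $1/k^2$; the function names are illustrative.

```python
import math


def tchebycheff_bound(k):
    """Bound (5-57) on P{|x - eta| >= k*sigma}: sigma**2 / (k*sigma)**2 = 1 / k**2."""
    return 1.0 / k ** 2


def normal_two_sided_tail(k):
    """Exact P{|x - eta| >= k*sigma} for a normal RV, i.e. 2 - 2G(k) = erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2))


# Pairs (k, exact normal tail, Tchebycheff bound) for a few multiples of sigma.
comparison = [(k, normal_two_sided_tail(k), tchebycheff_bound(k)) for k in (1, 2, 3, 4)]
```

For $k = 3$ the exact tail is about 0.0027 while the bound is $1/9$; for $k = 1$ the bound equals 1 and carries no information.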
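MARKOFF INEQUALITY. If $f(x) = 0$ for $x < 0$, then, for any $\alpha > 0$,
$$P\{\mathbf{x} \ge \alpha\} \le \frac{\eta}{\alpha} \tag{5-58}$$
Proof.
$$E\{\mathbf{x}\} = \int_0^{\infty} x f(x)\,dx \ge \int_{\alpha}^{\infty} x f(x)\,dx \ge \alpha \int_{\alpha}^{\infty} f(x)\,dx$$
and (5-58) results because the last integral equals $P\{\mathbf{x} \ge \alpha\}$.
COROLLARY. Suppose that $\mathbf{x}$ is an arbitrary RV and $a$ and $n$ are two arbitrary
numbers. Clearly, the RV $|\mathbf{x} - a|^n$ takes only positive values. Applying (5-58),
with $\alpha = \varepsilon^n$, we conclude that
$$P\{|\mathbf{x} - a|^n \ge \varepsilon^n\} \le \frac{E\{|\mathbf{x} - a|^n\}}{\varepsilon^n}$$
Hence
$$P\{|\mathbf{x} - a| \ge \varepsilon\} \le \frac{E\{|\mathbf{x} - a|^n\}}{\varepsilon^n} \tag{5-59}$$
This result is known as the inequality of Bienaymé. Tchebycheff's inequality is a
special case obtained with $a = \eta$ and $n = 2$.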
5-5 CHARACTERISTIC FUNCTIONS
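The characteristic function of an RV is by definition the integral
$$\Phi(\omega) = \int_{-\infty}^{\infty} f(x) e^{j\omega x}\,dx \tag{5-60}$$
This function is maximum at the origin because $f(x) \ge 0$:
$$|\Phi(\omega)| \le \Phi(0) = 1 \tag{5-61}$$
If $j\omega$ is changed to $s$, the resulting integral
$$\mathbf{\Phi}(s) = \int_{-\infty}^{\infty} f(x) e^{sx}\,dx \qquad \mathbf{\Phi}(j\omega) = \Phi(\omega) \tag{5-62}$$
is the moment (generating) function of $\mathbf{x}$.
The function
$$\Psi(\omega) = \ln \Phi(\omega) = \mathbf{\Psi}(j\omega) \tag{5-63}$$
is the second characteristic function of $\mathbf{x}$.
Clearly [see (5-32)]
$$\Phi(\omega) = E\{e^{j\omega\mathbf{x}}\} \qquad \mathbf{\Phi}(s) = E\{e^{s\mathbf{x}}\}$$
This leads to the following:
$$\text{If } \mathbf{y} = a\mathbf{x} + b \text{ then } \Phi_y(\omega) = e^{jb\omega}\, \Phi_x(a\omega) \tag{5-64}$$
because
$$E\{e^{j\omega\mathbf{y}}\} = E\{e^{j\omega(a\mathbf{x} + b)}\} = e^{jb\omega} E\{e^{ja\omega\mathbf{x}}\}$$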
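Example 5-27. We shall show that the characteristic function of an $N(\eta, \sigma)$ RV $\mathbf{x}$
equals
$$\Phi_x(\omega) = \exp\{j\eta\omega - \tfrac{1}{2}\sigma^2\omega^2\} \tag{5-65}$$
Proof. The RV $\mathbf{z} = (\mathbf{x} - \eta)/\sigma$ is $N(0, 1)$ and its moment function equals
$$\mathbf{\Phi}_z(s) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{sz} e^{-z^2/2}\,dz$$
With
$$sz - \frac{z^2}{2} = -\frac{1}{2}(z - s)^2 + \frac{s^2}{2}$$
we conclude that
$$\mathbf{\Phi}_z(s) = e^{s^2/2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(z - s)^2/2}\,dz = e^{s^2/2}$$
And since $\mathbf{x} = \sigma\mathbf{z} + \eta$, (5-65) follows from (5-64).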
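Formula (5-65) can be checked numerically by evaluating the defining integral (5-60) with a simple quadrature. The sketch below uses a midpoint rule over $\eta \pm 10\sigma$, outside of which the normal density is negligible; the integration range, step count, and function names are choices made here for illustration.

```python
import cmath
import math


def normal_density(x, eta, sigma):
    """Density of an N(eta, sigma) RV."""
    return math.exp(-((x - eta) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def char_function_numeric(omega, eta, sigma, steps=20000):
    """Midpoint-rule approximation of (5-60) over eta +/- 10 sigma."""
    lo = eta - 10 * sigma
    h = 20 * sigma / steps
    total = 0j
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += normal_density(x, eta, sigma) * cmath.exp(1j * omega * x)
    return total * h


def char_function_closed(omega, eta, sigma):
    """Closed form (5-65): exp(j*eta*omega - sigma**2 * omega**2 / 2)."""
    return cmath.exp(1j * eta * omega - 0.5 * sigma ** 2 * omega ** 2)


eta, sigma, omega = 1.0, 2.0, 0.7   # arbitrary test values
numeric = char_function_numeric(omega, eta, sigma)
closed = char_function_closed(omega, eta, sigma)
```

At $\omega = 0$ both expressions reduce to the area under the density, in line with (5-61).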
Inversion formula As we see from (5-60), $\Phi(\omega)$ is the Fourier transform
of $f(x)$. Hence the properties of characteristic functions are essentially the
same as the properties of Fourier transforms. We note, in particular, that $f(x)$
can be expressed in terms of $\Phi(\omega)$:
$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \Phi(\omega) e^{-j\omega x}\,d\omega \tag{5-66}$$
Moment theorem. Differentiating (5-62) $n$ times, we obtain
$$\mathbf{\Phi}^{(n)}(s) = E\{\mathbf{x}^n e^{s\mathbf{x}}\}$$
Hence
$$\mathbf{\Phi}^{(n)}(0) = E\{\mathbf{x}^n\} = m_n \tag{5-67}$$
Thus the derivatives of $\mathbf{\Phi}(s)$ at the origin equal the moments of $\mathbf{x}$. This
justifies the name "moment function" given to $\mathbf{\Phi}(s)$.
In particular,
$$\mathbf{\Phi}'(0) = m_1 = \eta \qquad \mathbf{\Phi}''(0) = m_2 = \eta^2 + \sigma^2 \tag{5-68}$$
Note Expanding $\mathbf{\Phi}(s)$ into a series near the origin and using (5-67), we obtain
$$\mathbf{\Phi}(s) = \sum_{n=0}^{\infty} \frac{m_n}{n!}\, s^n \tag{5-69}$$
This is valid only if all moments are finite and the series converges absolutely near
$s = 0$. Since $f(x)$ can be determined in terms of $\mathbf{\Phi}(s)$, (5-69) shows that, under the stated
conditions, the density of an RV is uniquely determined if all its moments are known.
Example 5-28. We shall determine the moment function and the moments of an RV x with gamma distribution:
$$f(x) = \gamma x^{b-1} e^{-cx} U(x) \qquad \gamma = \frac{c^{b}}{\Gamma(b)}$$
From (4-39) it follows that
$$\boldsymbol{\Phi}(s) = \gamma\int_0^{\infty} x^{b-1} e^{-(c-s)x}\,dx = \frac{\gamma\,\Gamma(b)}{(c-s)^{b}} = \frac{c^{b}}{(c-s)^{b}}$$  (5-70)
Differentiating with respect to s and setting s = 0, we obtain
$$\boldsymbol{\Phi}^{(n)}(0) = \frac{b(b+1)\cdots(b+n-1)}{c^n} = E\{\mathbf{x}^n\}$$
With n = 1 and n = 2, this yields
$$E\{\mathbf{x}\} = \frac{b}{c} \qquad E\{\mathbf{x}^2\} = \frac{b(b+1)}{c^2} \qquad \sigma^2 = \frac{b}{c^2}$$
The exponential density is a special case obtained with b = 1:
$$f(x) = c e^{-cx} U(x) \qquad \boldsymbol{\Phi}(s) = \frac{c}{c-s} \qquad E\{\mathbf{x}\} = \frac{1}{c} \qquad \sigma^2 = \frac{1}{c^2}$$
Chi square. Setting b = m/2 and c = 1/2 in (5-70), we obtain the moment function of the chi-square density χ²(m):
$$\boldsymbol{\Phi}(s) = \frac{1}{\sqrt{(1-2s)^m}} \qquad E\{\mathbf{x}\} = m \qquad \sigma^2 = 2m$$  (5-71)
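The moment theorem (5-67) can be illustrated numerically for the gamma case by differentiating the moment function (5-70) at the origin with finite differences. This is only a sketch: the parameters b = 2.5 and c = 4, the step h, and the helper names are assumptions.

```python
def gamma_moment_function(s, b, c):
    """Moment function c^b / (c - s)^b of the gamma density, valid for s < c; see (5-70)."""
    return (c / (c - s)) ** b

def derivative_at_zero(fn, order, h=1e-3):
    """Central finite-difference estimate of the first or second derivative at s = 0."""
    if order == 1:
        return (fn(h) - fn(-h)) / (2 * h)
    if order == 2:
        return (fn(h) - 2 * fn(0.0) + fn(-h)) / h ** 2
    raise ValueError("only orders 1 and 2 are sketched here")

b, c = 2.5, 4.0  # assumed illustrative parameters

def phi(s):
    return gamma_moment_function(s, b, c)

m1 = derivative_at_zero(phi, 1)   # should be close to b/c
m2 = derivative_at_zero(phi, 2)   # should be close to b(b+1)/c^2
print(m1, b / c)
print(m2, b * (b + 1) / c ** 2)
print(m2 - m1 ** 2, b / c ** 2)   # variance
```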
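Cumulants. The cumulants $\lambda_n$ of RV x are by definition the derivatives
$$\frac{d^n\boldsymbol{\Psi}(0)}{ds^n} = \lambda_n$$  (5-72)
of its second moment function $\boldsymbol{\Psi}(s)$. Clearly [see (5-63)] $\boldsymbol{\Psi}(0) = \lambda_0 = 0$; hence
$$\boldsymbol{\Psi}(s) = \lambda_1 s + \frac{1}{2}\lambda_2 s^2 + \cdots + \frac{1}{n!}\lambda_n s^n + \cdots$$
We maintain that
$$\lambda_1 = \eta \qquad \lambda_2 = \sigma^2$$  (5-73)
Proof. Since $\boldsymbol{\Phi} = e^{\boldsymbol{\Psi}}$, we conclude that
$$\boldsymbol{\Phi}' = \boldsymbol{\Psi}' e^{\boldsymbol{\Psi}} \qquad \boldsymbol{\Phi}'' = [\boldsymbol{\Psi}'' + (\boldsymbol{\Psi}')^2] e^{\boldsymbol{\Psi}}$$
With s = 0, this yields
$$\boldsymbol{\Phi}'(0) = \boldsymbol{\Psi}'(0) = m_1 \qquad \boldsymbol{\Phi}''(0) = \boldsymbol{\Psi}''(0) + [\boldsymbol{\Psi}'(0)]^2 = m_2$$
and (5-73) results.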
Discrete Type
Suppose that x is a discrete type RV taking the values $x_i$ with probability $p_i$. In this case, (5-60) yields
$$\Phi(\omega) = \sum_i p_i e^{j\omega x_i}$$  (5-74)
Thus Φ(ω) is a sum of exponentials. The moment function of x can be defined as in (5-62). However, if x takes only integer values, then a definition in terms of z transforms is preferable.
LATTICE TYPE. If n is a lattice type RV taking integer values, then its moment function is by definition the sum
$$\Gamma(z) = E\{z^{\mathbf{n}}\} = \sum_{n=-\infty}^{\infty} p_n z^n$$  (5-75)
Thus Γ(1/z) is the z transform of the sequence $p_n = P\{\mathbf{n} = n\}$. With Φ(ω) as in (5-74), the above yields
$$\Phi(\omega) = \Gamma(e^{j\omega}) = \sum_{n=-\infty}^{\infty} p_n e^{jn\omega}$$
Thus Φ(ω) is the discrete Fourier transform (DFT) of $p_n$ and
$$\boldsymbol{\Psi}(s) = \ln\Gamma(e^s)$$  (5-76)
Moment theorem. Differentiating (5-75) k times, we obtain
$$\Gamma^{(k)}(z) = E\{\mathbf{n}(\mathbf{n}-1)\cdots(\mathbf{n}-k+1)z^{\mathbf{n}-k}\}$$
With z = 1, this yields
$$\Gamma^{(k)}(1) = E\{\mathbf{n}(\mathbf{n}-1)\cdots(\mathbf{n}-k+1)\}$$  (5-77)
We note, in particular, that Γ(1) = 1 and
$$\Gamma'(1) = E\{\mathbf{n}\} \qquad \Gamma''(1) = E\{\mathbf{n}^2\} - E\{\mathbf{n}\}$$  (5-78)
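Example 5-29. (a) If n takes the values 0 and 1 with P{n = 1} = p and P{n = 0} = q, then
$$\Gamma(z) = pz + q$$
$$\Gamma'(1) = E\{\mathbf{n}\} = p \qquad \Gamma''(1) = E\{\mathbf{n}^2\} - E\{\mathbf{n}\} = 0$$
(b) If n has the binomial distribution
$$p_n = P\{\mathbf{n} = n\} = \binom{m}{n} p^n q^{m-n} \qquad 0 \le n \le m$$
then
$$\Gamma(z) = \sum_{n=0}^{m}\binom{m}{n} p^n q^{m-n} z^n = (pz + q)^m$$
$$\Gamma'(1) = mp \qquad \Gamma''(1) = m(m-1)p^2$$
Hence
$$E\{\mathbf{n}\} = mp \qquad \sigma^2 = mpq$$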
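The factorial moments in (5-77) can be computed directly from the probabilities $p_n$, which gives a quick check of the binomial results above. The values m = 12 and p = 0.3 and the helper names below are assumptions used only for illustration.

```python
from math import comb

def binomial_pmf(m, p):
    """Probabilities p_n = C(m, n) p^n q^(m-n) for n = 0, ..., m."""
    q = 1.0 - p
    return [comb(m, n) * p ** n * q ** (m - n) for n in range(m + 1)]

def factorial_moment(pmf, k):
    """E{n(n-1)...(n-k+1)}, which equals the k-th derivative of Gamma(z) at z = 1; see (5-77)."""
    total = 0.0
    for n, pn in enumerate(pmf):
        term = pn
        for i in range(k):
            term *= (n - i)
        total += term
    return total

m, p = 12, 0.3  # assumed illustrative parameters
pmf = binomial_pmf(m, p)
g1 = factorial_moment(pmf, 1)      # Gamma'(1) = E{n}
g2 = factorial_moment(pmf, 2)      # Gamma''(1) = E{n^2} - E{n}
variance = g2 + g1 - g1 ** 2       # E{n^2} - E{n}^2, using (5-78)
print(g1, g2, variance)
```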
Example 5-30. If n is Poisson distributed with parameter a,
$$P\{\mathbf{n} = n\} = e^{-a}\frac{a^n}{n!} \qquad n = 0, 1, \ldots$$
then
$$\Gamma(z) = e^{-a}\sum_{n=0}^{\infty}\frac{a^n z^n}{n!} = e^{a(z-1)}$$  (5-79)
In this case [see (5-76)]
$$\boldsymbol{\Psi}(s) = a(e^s - 1) \qquad \boldsymbol{\Psi}'(0) = a \qquad \boldsymbol{\Psi}''(0) = a$$
and (5-73) yields E{n} = a, σ² = a in agreement with (5-36).
Determination of the density of g(x). We show next that characteristic functions can be used to determine the density $f_y(y)$ of the RV y = g(x) in terms of the density $f_x(x)$ of x.
From (5-32) it follows that the characteristic function
$$\Phi_y(\omega) = \int_{-\infty}^{\infty} e^{j\omega y} f_y(y)\,dy$$
of the RV y = g(x) equals
$$\Phi_y(\omega) = E\{e^{j\omega g(\mathbf{x})}\} = \int_{-\infty}^{\infty} e^{j\omega g(x)} f_x(x)\,dx$$  (5-80)
If, therefore, the above integral can be written in the form
$$\int_{-\infty}^{\infty} e^{j\omega y} h(y)\,dy$$
it will follow that (uniqueness theorem)
$$f_y(y) = h(y)$$
This method leads to simple results if the transformation y = g(x) is one-to-one.
Example 5-31. Suppose that x is N(0; σ) and y = ax². Inserting into (5-80) and using the evenness of the integrand, we obtain
$$\Phi_y(\omega) = \int_{-\infty}^{\infty} e^{j\omega a x^2} f(x)\,dx = \frac{2}{\sigma\sqrt{2\pi}}\int_0^{\infty} e^{j\omega a x^2} e^{-x^2/2\sigma^2}\,dx$$
As x increases from 0 to ∞, the transformation y = ax² is one-to-one. Since
$$dy = 2ax\,dx = 2\sqrt{ay}\,dx$$
the above yields
$$\Phi_y(\omega) = \frac{2}{\sigma\sqrt{2\pi}}\int_0^{\infty} e^{j\omega y} e^{-y/2a\sigma^2}\frac{dy}{2\sqrt{ay}}$$
Hence
$$f_y(y) = \frac{e^{-y/2a\sigma^2}}{\sigma\sqrt{2\pi a y}}\,U(y)$$
in agreement with (5-8).
Example 5-32. We assume finally that x is uniform in the interval (−π/2, π/2) and y = sin x. In this case
$$\Phi_y(\omega) = \int_{-\infty}^{\infty} e^{j\omega\sin x} f(x)\,dx = \frac{1}{\pi}\int_{-\pi/2}^{\pi/2} e^{j\omega\sin x}\,dx$$
As x increases from −π/2 to π/2, the function y = sin x increases from −1 to 1 and
$$dy = \cos x\,dx = \sqrt{1 - y^2}\,dx$$
Hence
$$\Phi_y(\omega) = \frac{1}{\pi}\int_{-1}^{1} e^{j\omega y}\frac{dy}{\sqrt{1 - y^2}}$$
This leads to the conclusion that
$$f_y(y) = \frac{1}{\pi\sqrt{1 - y^2}} \qquad \text{for } |y| < 1$$
and 0 otherwise, in agreement with (5-13).
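The density found in Example 5-32 can be cross-checked without characteristic functions: the probability that y = sin x falls in an interval must equal the probability that x falls in the corresponding interval of arcsines. The sketch below compares the two; the interval (−0.5, 0.5) and the step count are assumptions.

```python
import math

def arcsine_density(y):
    """Density 1 / (pi * sqrt(1 - y^2)) of y = sin x for x uniform in (-pi/2, pi/2)."""
    return 1.0 / (math.pi * math.sqrt(1.0 - y * y))

def probability_from_density(y1, y2, steps=2000):
    """Midpoint-rule integral of the density over (y1, y2), with -1 < y1 < y2 < 1."""
    dy = (y2 - y1) / steps
    return sum(arcsine_density(y1 + (k + 0.5) * dy) for k in range(steps)) * dy

def probability_from_x(y1, y2):
    """P{y1 < sin x <= y2} = P{arcsin y1 < x <= arcsin y2} for the uniform x."""
    return (math.asin(y2) - math.asin(y1)) / math.pi

print(probability_from_density(-0.5, 0.5))
print(probability_from_x(-0.5, 0.5))  # equals 1/3
```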
PROBLEMS
5-1. The RV x is N(5, 2) and y = 2x + 4. Find $\eta_y$, $\sigma_y$, and $f_y(y)$.
5-2. Find $F_y(y)$ and $f_y(y)$ if y = −4x + 3 and $f_x(x) = 2e^{-2x}U(x)$.
5-3. If the RV x is N(0, c) and g(x) is the function in Fig. 5-4, find and sketch the distribution and the density of the RV y = g(x).
5-4. The RV x is uniform in the interval (−2c, 2c). Find and sketch $f_y(y)$ and $F_y(y)$ if y = g(x) and g(x) is the function in Fig. 5-3.
5-5. The RV x is N(0, b) and g(x) is the function in Fig. 5-5. Find and sketch $f_y(y)$ and $F_y(y)$.
5-6. The RV x is uniform in the interval (0, 1). Find the density of the RV y = −ln x.
5-7. We place at random 200 points in the interval (0, 100). The distance from 0 to the first random point is an RV z. Find $F_z(z)$ (a) exactly and (b) using the Poisson approximation.
5-8. If $y = \sqrt{x}$ and $f_x(x) = ce^{-cx}U(x)$, find $f_y(y)$.
5-9. Express the density $f_y(y)$ of the RV y = g(x) in terms of $f_x(x)$ if (a) g(x) = |x|; (b) $g(x) = e^{-x}U(x)$.
5-10. Find $F_y(y)$ and $f_y(y)$ if $F_x(x) = (1 - e^{-2x})U(x)$ and (a) y = (x − 1)U(x − 1).
5-11. Show that, if the RV x has a Cauchy density with α = 1 and y = arctan x, then y is uniform in the interval (−π/2, π/2).
5-12. The RV x is uniform in the interval (−2π, 2π). Find $f_y(y)$ if (a) $y = x^3$, (b) $y = x^4$, and (c) y = 2 sin(3x + 40°).
5-13. The RV x is uniform in the interval (−1, 1). Find g(x) such that if y = g(x) then $f_y(y) = 2e^{-2y}U(y)$.
5-14. Given that RV x is of continuous type, we form the RV y = g(x). (a) Find $f_y(y)$ if $g(x) = 2F_x(x) + 4$. (b) Find g(x) such that y is uniform in the interval (8, 10).
5-15. A fair coin is tossed 10 times and x equals the number of heads. (a) Find $F_x(x)$. (b) Find $F_y(y)$ if y = (x − 3)².
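5-16. If t is an RV of continuous type and y = a sin ωt, show that
$$f_y(y) \xrightarrow[\omega\to\infty]{} \begin{cases} \dfrac{1}{\pi\sqrt{a^2 - y^2}} & |y| < a \\ 0 & |y| > a \end{cases}$$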
5-17. Show that if y = x², then
$$f_y(y \mid \mathbf{x} \ge 0) = \frac{U(y)}{1 - F_x(0)}\,\frac{f_x(\sqrt{y})}{2\sqrt{y}}$$
5-18. (a) Show that if y = ax + b, then $\sigma_y = |a|\sigma_x$. (b) Find $\eta_y$ and $\sigma_y$ if $y = (x - \eta_x)/\sigma_x$.
5-19. Show that if x has a Rayleigh density with parameter α and y = b + cx², then $\sigma_y^2 = 4c^2\alpha^4$.
5-20. (a) Show that if m is the median of x, then
$$E\{|\mathbf{x} - a|\} = E\{|\mathbf{x} - m|\} + 2\int_a^m (x - a) f(x)\,dx$$
for any a. (b) Find c such that E{|x − c|} is minimum.
5-21. Show that if the RV x is N(η; σ), then
$$E\{|\mathbf{x}|\} = \sigma\sqrt{\frac{2}{\pi}}\,e^{-\eta^2/2\sigma^2} + 2\eta\,G\!\left(\frac{\eta}{\sigma}\right) - \eta$$
5-22. If x is N(0, 2) and y = 3x², find $\eta_y$, $\sigma_y$, and $f_y(y)$.
5-23. Show that if $\mathfrak{A} = [\mathcal{A}_1, \ldots, \mathcal{A}_n]$ is a partition of $\mathcal{S}$, then
$$E\{\mathbf{x}\} = E\{\mathbf{x} \mid \mathcal{A}_1\}P(\mathcal{A}_1) + \cdots + E\{\mathbf{x} \mid \mathcal{A}_n\}P(\mathcal{A}_n)$$
5-24. Show that if x ≥ 0 and E{x} = η, then $P\{\mathbf{x} \ge \sqrt{\eta}\} \le \sqrt{\eta}$.
5-25. Using (5-55), find E{x³} if $\eta_x = 10$ and $\sigma_x = 2$.
5-26. If x is uniform in the interval (10, 12) and y = x³, (a) find $f_y(y)$; (b) find E{y}: 1, exactly; 2, using (5-55).
5-27. The RV x is N(100, 3). Find approximately the mean of the RV y = 1/x using (5-55).
5-28. We are given an even convex function g(x) and an RV x whose density f(x) is symmetrical as in Fig. P5-28 with a single maximum at x = η. Show that the mean E{g(x − a)} of the RV g(x − a) is minimum if a = η.
FIGURE P5-28
5-29. Show that if x and y are two RVs with densities $f_x(x)$ and $f_y(y)$ respectively, then
$$E\{\log f_x(\mathbf{x})\} \ge E\{\log f_y(\mathbf{x})\}$$
5-30. (Chernoff bound) (a) Show that for any α > 0 and for any real s,
$$P\{e^{s\mathbf{x}} \ge \alpha\} \le \frac{\boldsymbol{\Phi}(s)}{\alpha} \qquad \text{where } \boldsymbol{\Phi}(s) = E\{e^{s\mathbf{x}}\}$$  (i)
Hint: Apply (5-58) to the RV $y = e^{sx}$. (b) For any A,
$$P\{\mathbf{x} \ge A\} \le e^{-sA}\boldsymbol{\Phi}(s) \qquad s > 0$$
$$P\{\mathbf{x} \le A\} \le e^{-sA}\boldsymbol{\Phi}(s) \qquad s < 0$$
Hint: Set $\alpha = e^{sA}$ in (i).
5-31. Show that (a) if f(x) is a Cauchy density, then $\Phi(\omega) = e^{-\alpha|\omega|}$; (b) if f(x) is a Laplace density, then $\Phi(\omega) = \alpha^2/(\alpha^2 + \omega^2)$.
5-32. Show that if E{x} = η, then
$$E\{e^{s\mathbf{x}}\} = e^{s\eta}\sum_{n=0}^{\infty}\mu_n\frac{s^n}{n!}$$
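5-33. Show that if $\Phi_x(\omega_1) = 1$ for some $\omega_1 \ne 0$, then the RV x is of lattice type taking the values $x_n = 2\pi n/\omega_1$.
Hint:
$$0 = 1 - \Phi_x(\omega_1) = \int_{-\infty}^{\infty}(1 - e^{j\omega_1 x}) f_x(x)\,dx$$
5-34. The RV x has zero mean, central moments $\mu_n$ and cumulants $\lambda_n$. Show that $\lambda_3 = \mu_3$, $\lambda_4 = \mu_4 - 3\mu_2^2$; if y is N(0; $\sigma_y$) and $\sigma_y = \sigma_x$, then $E\{\mathbf{x}^4\} = E\{\mathbf{y}^4\} + \lambda_4$.
5-35. An RV x has a geometric distribution if
$$P\{\mathbf{x} = k\} = pq^k \qquad k = 0, 1, \ldots \qquad p + q = 1$$
Find Γ(z) and show that $\eta_x = q/p$, $\sigma_x^2 = q/p^2$.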
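5-36. An RV x has a Pascal (or negative binomial) distribution if
$$P\{\mathbf{x} = k\} = \binom{-n}{k} p^n (-q)^k = \binom{n + k - 1}{k} p^n q^k \qquad k = 0, 1, \ldots$$
Find Γ(z) and show that $\eta_x = nq/p$, $\sigma_x^2 = nq/p^2$.
5-37. The RV x takes the values 0, 1, ... with $P\{\mathbf{x} = k\} = p_k$. Show that if
y = (x − 1)U(x − 1) then $\Gamma_y(z) = p_0 + z^{-1}[\Gamma_x(z) - p_0]$
$$\eta_y = \eta_x - 1 + p_0 \qquad E\{\mathbf{y}^2\} = E\{\mathbf{x}^2\} - 2\eta_x + 1 - p_0$$
5-38. Show that, if $\Phi(\omega) = E\{e^{j\omega\mathbf{x}}\}$, then for any $a_i$,
$$\sum_{i=1}^{n}\sum_{j=1}^{n}\Phi(\omega_i - \omega_j)\,a_i a_j^* \ge 0$$
Hint:
$$E\left\{\left|\sum_{i=1}^{n} a_i e^{j\omega_i\mathbf{x}}\right|^2\right\} \ge 0$$
5-39. The RV x is N(0; σ). (a) Using characteristic functions, show that if g(x) is a function such that $g(x)e^{-x^2/2\sigma^2} \to 0$ as |x| → ∞, then
$$\frac{dE\{g(\mathbf{x})\}}{dv} = \frac{1}{2}E\left\{\frac{d^2 g(\mathbf{x})}{dx^2}\right\} \qquad v = \sigma^2$$  (i)
(b) The moments $\mu_n$ of x are functions of v. Using (i), show that
$$\mu_n(v) = \frac{n(n-1)}{2}\int_0^v \mu_{n-2}(\beta)\,d\beta$$
5-40. Show that, if n is an integer-valued RV with moment function Γ(z) as in (5-75), then
$$P\{\mathbf{n} = k\} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\Gamma(e^{j\omega})e^{-jk\omega}\,d\omega$$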
CHAPTER 6
TWO RANDOM VARIABLES
6-1 BIVARIATE DISTRIBUTIONS
We are given two RVs x and y, defined as in Sec. 4-1, and we wish to determine their joint statistics, that is, the probability that the point (x, y) is in a specified region† D in the xy plane. The distribution functions $F_x(x)$ and $F_y(y)$ of the given RVs determine their separate (marginal) statistics but not their joint statistics. In particular, the probability of the event
$$\{\mathbf{x} \le x\} \cap \{\mathbf{y} \le y\} = \{\mathbf{x} \le x, \mathbf{y} \le y\}$$
cannot be expressed in terms of $F_x(x)$ and $F_y(y)$. In the following, we show that the joint statistics of the RVs x and y are completely determined if the probability of this event is known for every x and y.
Joint Distribution and Density
The joint (bivariate) distribution $F_{xy}(x, y)$ or, simply, F(x, y) of two RVs x and y is the probability of the event
$$\{\mathbf{x} \le x, \mathbf{y} \le y\} = \{(\mathbf{x}, \mathbf{y}) \in D_1\}$$
where x and y are two arbitrary real numbers and $D_1$ is the quadrant shown in Fig. 6-1a:
$$F(x, y) = P\{\mathbf{x} \le x, \mathbf{y} \le y\}$$  (6-1)
†The region D is arbitrary subject only to the mild condition that it can be expressed as a countable union or intersection of rectangles.
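PROPERTIES. 1. The function F(x, y) is such that
$$F(-\infty, y) = 0 \qquad F(x, -\infty) = 0 \qquad F(\infty, \infty) = 1$$
Proof. As we know, P{x = −∞} = P{y = −∞} = 0. And since
$$\{\mathbf{x} = -\infty, \mathbf{y} \le y\} \subset \{\mathbf{x} = -\infty\} \qquad \{\mathbf{x} \le x, \mathbf{y} = -\infty\} \subset \{\mathbf{y} = -\infty\}$$
the first two equations follow. The last is a consequence of the identities
$$\{\mathbf{x} \le \infty, \mathbf{y} \le \infty\} = \mathcal{S} \qquad P(\mathcal{S}) = 1$$
2. The event $\{x_1 < \mathbf{x} \le x_2, \mathbf{y} \le y\}$ consists of all points (x, y) in the vertical half-strip $D_2$ and the event $\{\mathbf{x} \le x, y_1 < \mathbf{y} \le y_2\}$ consists of all points (x, y) in the horizontal half-strip $D_3$ of Fig. 6-1b. We maintain that
$$P\{x_1 < \mathbf{x} \le x_2, \mathbf{y} \le y\} = F(x_2, y) - F(x_1, y)$$  (6-2)
$$P\{\mathbf{x} \le x, y_1 < \mathbf{y} \le y_2\} = F(x, y_2) - F(x, y_1)$$  (6-3)
Proof. Clearly,
$$\{\mathbf{x} \le x_2, \mathbf{y} \le y\} = \{\mathbf{x} \le x_1, \mathbf{y} \le y\} + \{x_1 < \mathbf{x} \le x_2, \mathbf{y} \le y\}$$
The last two events are mutually exclusive; hence [see (2-10)]
$$P\{\mathbf{x} \le x_2, \mathbf{y} \le y\} = P\{\mathbf{x} \le x_1, \mathbf{y} \le y\} + P\{x_1 < \mathbf{x} \le x_2, \mathbf{y} \le y\}$$
and (6-2) results. The proof of (6-3) is similar.
3. $$P\{x_1 < \mathbf{x} \le x_2, y_1 < \mathbf{y} \le y_2\} = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1)$$  (6-4)
This is the probability that (x, y) is in the rectangle $D_4$ of Fig. 6-1c.
Proof. It follows from (6-2) and (6-3) because
$$\{x_1 < \mathbf{x} \le x_2, \mathbf{y} \le y_2\} = \{x_1 < \mathbf{x} \le x_2, \mathbf{y} \le y_1\} + \{x_1 < \mathbf{x} \le x_2, y_1 < \mathbf{y} \le y_2\}$$
and the last two events are mutually exclusive.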
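Property 3 translates directly into a four-term evaluation of the joint distribution. The sketch below applies (6-4) to an assumed product-form F(x, y) built from two exponential distributions (rates 1 and 2, chosen only for illustration); any valid joint distribution function could be passed in instead.

```python
import math

def joint_cdf(x, y, a=1.0, b=2.0):
    """Assumed example F(x, y): product of two exponential distributions with rates a and b."""
    fx = 1.0 - math.exp(-a * x) if x > 0 else 0.0
    fy = 1.0 - math.exp(-b * y) if y > 0 else 0.0
    return fx * fy

def rectangle_probability(F, x1, x2, y1, y2):
    """P{x1 < x <= x2, y1 < y <= y2} from a joint distribution F, as in (6-4)."""
    return F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)

p = rectangle_probability(joint_cdf, 0.5, 1.5, 0.2, 0.8)
# For this product form the rectangle mass factors into two one-dimensional increments.
expected = (math.exp(-0.5) - math.exp(-1.5)) * (math.exp(-0.4) - math.exp(-1.6))
print(p, expected)
```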
JOINT DENSITY. The joint density of x and y is by definition the function
$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\,\partial y}$$  (6-5)
From this and property 1 it follows that
$$F(x, y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f(\alpha, \beta)\,d\alpha\,d\beta$$  (6-6)
Joint statistics. We shall now show that the probability that the point (x, y) is in a region D of the xy plane equals the integral of f(x, y) in D. In other words,
$$P\{(\mathbf{x}, \mathbf{y}) \in D\} = \iint_D f(x, y)\,dx\,dy$$  (6-7)
where {(x, y) ∈ D} is the event consisting of all outcomes ζ such that the point [x(ζ), y(ζ)] is in D.
Proof. As we know, the ratio
$$\frac{F(x + \Delta x, y + \Delta y) - F(x, y + \Delta y) - F(x + \Delta x, y) + F(x, y)}{\Delta x\,\Delta y}$$
tends to $\partial^2 F(x, y)/\partial x\,\partial y$ as Δx → 0 and Δy → 0. Hence [see (6-4) and (6-5)]
$$P\{x < \mathbf{x} \le x + \Delta x, y < \mathbf{y} \le y + \Delta y\} \simeq f(x, y)\,\Delta x\,\Delta y$$  (6-8)
We have thus shown that the probability that (x, y) is in a differential rectangle equals f(x, y) times the area Δx Δy of the rectangle. This proves (6-7) because the region D can be written as the limit of the union of such rectangles.
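Marginal statistics. In the study of several RVs, the statistics of each are called marginal. Thus $F_x(x)$ is the marginal distribution and $f_x(x)$ the marginal density of x. In the following, we express the marginal statistics of x and y in terms of their joint statistics F(x, y) and f(x, y).
We maintain that
$$F_x(x) = F(x, \infty) \qquad F_y(y) = F(\infty, y)$$  (6-9)
$$f_x(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \qquad f_y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$  (6-10)
Proof. Clearly, $\{\mathbf{x} \le \infty\} = \{\mathbf{y} \le \infty\} = \mathcal{S}$; hence
$$\{\mathbf{x} \le x\} = \{\mathbf{x} \le x, \mathbf{y} \le \infty\} \qquad \{\mathbf{y} \le y\} = \{\mathbf{x} \le \infty, \mathbf{y} \le y\}$$
The probabilities of the two sides above yield (6-9).
Differentiating (6-6), we obtain
$$\frac{\partial F(x, y)}{\partial x} = \int_{-\infty}^{y} f(x, \beta)\,d\beta \qquad \frac{\partial F(x, y)}{\partial y} = \int_{-\infty}^{x} f(\alpha, y)\,d\alpha$$  (6-11)
Setting y = ∞ in the first and x = ∞ in the second equation, we obtain (6-10) because [see (6-9)]
$$f_x(x) = \frac{\partial F(x, \infty)}{\partial x} \qquad f_y(y) = \frac{\partial F(\infty, y)}{\partial y}$$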
Existence theorem. From properties 1 and 3 it follows that
$$F(-\infty, y) = 0 \qquad F(x, -\infty) = 0 \qquad F(\infty, \infty) = 1$$  (6-12)
and
$$F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1) \ge 0$$  (6-13)
for every $x_1 < x_2$, $y_1 < y_2$. Hence [see (6-6) and (6-8)]
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1 \qquad f(x, y) \ge 0$$  (6-14)
Conversely, given F(x, y) or f(x, y) as above, we can find two RVs x and y, defined in some space $\mathcal{S}$, with distribution F(x, y) or density f(x, y). This can be done by extending the existence theorem of Sec. 4-3 to joint statistics.
Joint normality. We shall say that the RVs x and y are jointly normal if their joint density is given by
$$f(x, y) = A\exp\left\{-\frac{1}{2(1 - r^2)}\left[\frac{(x - \eta_1)^2}{\sigma_1^2} - 2r\frac{(x - \eta_1)(y - \eta_2)}{\sigma_1\sigma_2} + \frac{(y - \eta_2)^2}{\sigma_2^2}\right]\right\}$$  (6-15)
This function is positive and its integral equals 1 if
$$A = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}} \qquad |r| < 1$$  (6-16)
Thus f(x, y) is an exponential and its exponent is a negative quadratic because |r| < 1. The function f(x, y) will be denoted by
$$N(\eta_1, \eta_2; \sigma_1, \sigma_2; r)$$
As we shall presently see, $\eta_1$ and $\eta_2$ are the expected values of x and y, and $\sigma_1^2$ and $\sigma_2^2$ their variances. The significance of r will be given later (correlation coefficient).
We maintain that the marginal densities of x and y are given by
$$f_x(x) = \frac{1}{\sigma_1\sqrt{2\pi}}e^{-(x - \eta_1)^2/2\sigma_1^2} \qquad f_y(y) = \frac{1}{\sigma_2\sqrt{2\pi}}e^{-(y - \eta_2)^2/2\sigma_2^2}$$  (6-17)
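Proof. To prove the above, we must show that if (6-15) is inserted into (6-10), the result is (6-17). The bracket in (6-15) can be written in the form
$$\left(\frac{x - \eta_1}{\sigma_1} - r\frac{y - \eta_2}{\sigma_2}\right)^2 + (1 - r^2)\frac{(y - \eta_2)^2}{\sigma_2^2}$$
Hence
$$\int_{-\infty}^{\infty} f(x, y)\,dx = A\exp\left\{-\frac{(y - \eta_2)^2}{2\sigma_2^2}\right\}\int_{-\infty}^{\infty}\exp\left\{-\frac{1}{2(1 - r^2)}\left(\frac{x - \eta_1}{\sigma_1} - r\frac{y - \eta_2}{\sigma_2}\right)^2\right\}dx$$
The last integral is a constant B (independent of x and y). Therefore [see (6-10)]
$$f_y(y) = AB\,e^{-(y - \eta_2)^2/2\sigma_2^2}$$
And since $f_y(y)$ is a density, its area must equal 1. This yields $AB = 1/\sigma_2\sqrt{2\pi}$ and the second equation in (6-17) results. The proof of the first is similar.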
Notes 1. From (6-17) it follows that if two RVs are jointly normal, they are also
marginally normal. However, as the next example shows, the converse is not true.
2. Joint normality can be defined as follows: Two RVs x and у are jointly normal if
the sum ax + by is normal for every a and b [see (8-56)].
Example 6-1. We shall construct two RVs $x_1$ and $y_1$ that are marginally but not jointly normal. We start with two jointly normal RVs x and y with density f(x, y) as in (6-15). Adding and subtracting small masses in the region D of Fig. 6-2 consisting of four circles as shown, we obtain a new function $f_1(x, y)$ such that $f_1(x, y) = f(x, y) \pm \varepsilon$ in D and $f_1(x, y) = f(x, y)$ everywhere else. The function $f_1(x, y)$ so formed is a density; hence it defines two new RVs $x_1$ and $y_1$. These RVs are not jointly normal because $f_1(x, y)$ is not of the form (6-15). We maintain, however, that they are marginally normal. Indeed, the densities of $x_1$ and $y_1$ are determined by the masses in the vertical strip $x < \mathbf{x}_1 \le x + dx$ and the horizontal strip $y < \mathbf{y}_1 \le y + dy$. As we see from the figure, the masses in these strips have not changed. This shows that $x_1$ and $y_1$ are normal because x and y are normal.
FIGURE 6-2
Discrete type. Suppose that the RVs x and y are of discrete type taking the values $x_i$ and $y_k$ with respective probabilities
$$P\{\mathbf{x} = x_i\} = p_i \qquad P\{\mathbf{y} = y_k\} = q_k$$  (6-18)
Their joint statistics are determined in terms of the joint probabilities
$$P\{\mathbf{x} = x_i, \mathbf{y} = y_k\} = p_{ik}$$  (6-19)
Clearly,
$$\sum_{i,k} p_{ik} = 1$$  (6-20)
because, as i and k take all possible values, the events $\{\mathbf{x} = x_i, \mathbf{y} = y_k\}$ are mutually exclusive and their union equals the certain event.
We maintain that the marginal probabilities $p_i$ and $q_k$ can be expressed in terms of the joint probabilities $p_{ik}$:
$$p_i = \sum_k p_{ik} \qquad q_k = \sum_i p_{ik}$$  (6-21)
This is the discrete version of (6-10).
Proof. The events $\{\mathbf{y} = y_k\}$ form a partition of $\mathcal{S}$. Hence as k ranges over all possible values, the events $\{\mathbf{x} = x_i, \mathbf{y} = y_k\}$ are mutually exclusive and their union equals $\{\mathbf{x} = x_i\}$. This yields the first equation in (6-21) [see (2-36)]. The proof of the second is similar.
Probability Masses
The probability that the point (x, y) is in a region D of the plane can be interpreted as the probability mass in this region. Thus the mass in the entire plane equals 1. The mass in the half-plane x ≤ x to the left of the line $L_x$ of Fig. 6-3 equals $F_x(x)$. The mass in the half-plane y ≤ y below the line $L_y$ equals $F_y(y)$. The mass in the cross-hatched quadrant {x ≤ x, y ≤ y} equals F(x, y).
FIGURE 6-3
Finally, the mass in the clear quadrant {x > x, y > y} equals
$$P\{\mathbf{x} > x, \mathbf{y} > y\} = 1 - F_x(x) - F_y(y) + F(x, y)$$  (6-22)
The probability mass in a region D equals the integral [see (6-7)]
$$\iint_D f(x, y)\,dx\,dy$$
If, therefore, f(x, y) is a bounded function, it can be interpreted as surface mass density.
Example 6-2. Suppose that
$$f(x, y) = \frac{1}{2\pi\sigma^2}e^{-(x^2 + y^2)/2\sigma^2}$$  (6-23)
We shall find the mass m in the circle x² + y² ≤ a². Inserting (6-23) into (6-7) and using the transformation
$$x = r\cos\theta \qquad y = r\sin\theta$$
we obtain
$$m = \frac{1}{2\pi\sigma^2}\int_0^a\int_{-\pi}^{\pi} e^{-r^2/2\sigma^2}\,r\,dr\,d\theta = 1 - e^{-a^2/2\sigma^2}$$  (6-24)
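The closed form (6-24) can be compared with a direct numerical evaluation of the polar integral. In the sketch below the radius a = 1.5, σ = 1, and the step count are assumed values; the angular integral contributes the factor 2π because the integrand does not depend on θ.

```python
import math

def mass_in_circle(a, sigma, steps=2000):
    """Midpoint-rule value of (1/(2 pi sigma^2)) * double integral of exp(-r^2/(2 sigma^2)) r dr dtheta."""
    dr = a / steps
    radial = 0.0
    for k in range(steps):
        r = (k + 0.5) * dr
        radial += math.exp(-r * r / (2 * sigma ** 2)) * r
    radial *= dr
    return radial * 2 * math.pi / (2 * math.pi * sigma ** 2)

def mass_closed_form(a, sigma):
    """1 - exp(-a^2 / (2 sigma^2)), as in (6-24)."""
    return 1.0 - math.exp(-a ** 2 / (2 * sigma ** 2))

print(mass_in_circle(1.5, 1.0), mass_closed_form(1.5, 1.0))
```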
POINT MASSES. If the RVs x and y are of discrete type taking the values $x_i$ and $y_k$, then the probability masses are 0 everywhere except at the points $(x_i, y_k)$. We have, thus, only point masses and the mass at each point equals $p_{ik}$ [see (6-19)]. The probability $p_i = P\{\mathbf{x} = x_i\}$ equals the sum of all masses $p_{ik}$ on the line $x = x_i$ in agreement with (6-21).
If i = 1, ..., M and k = 1, ..., N, then the number of possible point masses on the plane equals MN. However, as the next example shows, some of these masses might be 0.
Example 6-3. (a) In the fair-die experiment, x equals the number of dots shown and y equals twice this number:
$$\mathbf{x}(f_i) = i \qquad \mathbf{y}(f_i) = 2i \qquad i = 1, \ldots, 6$$
In other words, $x_i = i$, $y_k = 2k$ and
$$p_{ik} = P\{\mathbf{x} = i, \mathbf{y} = 2k\} = \begin{cases} \dfrac{1}{6} & i = k \\ 0 & i \ne k \end{cases}$$
Thus there are masses only on the six points (i, 2i) and the mass of each point equals 1/6 (Fig. 6-4a).
(b) We toss the die twice obtaining the 36 outcomes $f_i f_k$ and we define x and y such that x equals the first number that shows and y the second
$$\mathbf{x}(f_i f_k) = i \qquad \mathbf{y}(f_i f_k) = k \qquad i, k = 1, \ldots, 6$$
Thus $x_i = i$, $y_k = k$, and $p_{ik} = 1/36$. We have, therefore, 36 point masses (Fig. 6-4b) and the mass of each point equals 1/36. On the line x = i there are six points with total mass 1/6.
FIGURE 6-4
(c) Again the die is tossed twice but now
$$\mathbf{x}(f_i f_k) = |i - k| \qquad \mathbf{y}(f_i f_k) = i + k$$
In this case, x takes the values 0, 1, ..., 5 and y the values 2, 3, ..., 12. The number of possible points equals 6 × 11 = 66; however, only 21 have positive masses (Fig. 6-4c). Specifically, if x = 0, then y = 2, or 4, ..., or 12 because if x = 0, then i = k and y = 2i. There are, therefore, six mass points on this line and the mass of each point equals 1/36. If x = 1, then y = 3, or 5, ..., or 11. There are, therefore, five mass points on the line x = 1 and the mass of each point equals 2/36. For example, if x = 1 and y = 7, then i = 3, k = 4, or i = 4, k = 3; hence P{x = 1, y = 7} = 2/36.
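The counts in part (c) can be confirmed by enumerating the 36 equally likely outcomes and accumulating the mass at each point (|i − k|, i + k). This is a small sketch using exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Accumulate the mass 1/36 of each outcome (i, k) at the point (|i - k|, i + k).
masses = Counter()
for i in range(1, 7):
    for k in range(1, 7):
        masses[(abs(i - k), i + k)] += Fraction(1, 36)

points_with_mass = len(masses)

# Marginal probabilities of x: sum of the point masses on each line x = const, as in (6-21).
marginal_x = Counter()
for (x, y), mass in masses.items():
    marginal_x[x] += mass

print(points_with_mass)      # number of points with positive mass
print(masses[(1, 7)])        # mass at x = 1, y = 7
print(dict(marginal_x))
```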
LINE MASSES. The following cases lead to line masses:
1. If x is of discrete type taking the values $x_i$ and y is of continuous type, then all probability masses are on the vertical lines $x = x_i$ (Fig. 6-5a). In particular, the mass between the points $y_1$ and $y_2$ on the line $x = x_i$ equals the probability of the event
$$\{\mathbf{x} = x_i, y_1 \le \mathbf{y} \le y_2\}$$
FIGURE 6-5
2. If y = g(x), then all the masses are on the curve y = g(x). In this case, F(x, y) can be expressed in terms of $F_x(x)$. For example, with x and y as in Fig. 6-5b, F(x, y) equals the masses on the curve y = g(x) to the left of the point A and between B and C (heavy line). The masses to the left of A equal $F_x(x_1)$. The masses between B and C equal $F_x(x_3) - F_x(x_2)$. Hence
$$F(x, y) = F_x(x_1) + F_x(x_3) - F_x(x_2) \qquad y = g(x_1) = g(x_2) = g(x_3)$$
3. If x = g(z) and y = h(z), then all probability masses are on the curve x = g(z), y = h(z) specified parametrically. For example, if g(z) = cos z, h(z) = sin z, then the curve is a circle (Fig. 6-5c). In this case, the joint statistics of x and y can be expressed in terms of $F_z(z)$.
Independence
Two RVs x and y are called (statistically) independent if the events {x ∈ A} and {y ∈ B} are independent [see (2-40)], that is, if
$$P\{\mathbf{x} \in A, \mathbf{y} \in B\} = P\{\mathbf{x} \in A\}P\{\mathbf{y} \in B\}$$  (6-25)
where A and B are two arbitrary sets on the x and y axes respectively.
Applying the above to the events {x ≤ x} and {y ≤ y}, we conclude that, if the RVs x and y are independent, then
$$F(x, y) = F_x(x)F_y(y)$$  (6-26)
Hence
$$f(x, y) = f_x(x)f_y(y)$$  (6-27)
It can be shown that, if (6-26) or (6-27) is true, then (6-25) is also true; that is, the RVs x and y are independent [see (6-7)].
If the RVs x and y are of discrete type as in (6-19) and independent, then
$$p_{ik} = p_i q_k$$  (6-28)
This follows if we apply (6-25) to the events $\{\mathbf{x} = x_i\}$ and $\{\mathbf{y} = y_k\}$.
Example 6-4 Buffon's needle. A fine needle of length 2a is dropped at random on a board covered with parallel lines distance 2b apart where b > a as in Fig. 6-6a. We shall show that the probability p that the needle intersects one of the lines equals 2a/πb.
In terms of RVs the above experiment can be phrased as follows: We denote by x the distance from the center of the needle to the nearest line and by θ the angle between the needle and the direction perpendicular to the lines. We assume that the RVs x and θ are independent, x is uniform in the interval (0, b), and θ is uniform in the interval (0, π/2). From this it follows that
$$f(x, \theta) = f_x(x)f_\theta(\theta) = \frac{1}{b}\,\frac{2}{\pi} \qquad 0 \le x \le b \qquad 0 \le \theta \le \frac{\pi}{2}$$
and 0 elsewhere. Hence the probability that the point (x, θ) is in a region D included in the rectangle R of Fig. 6-6b equals the area of D times 2/πb.
FIGURE 6-6
The needle intersects the lines if x < a cos θ. Hence p equals the shaded area of Fig. 6-6b times 2/πb:
$$p = P\{\mathbf{x} < a\cos\boldsymbol{\theta}\} = \frac{2}{\pi b}\int_0^{\pi/2} a\cos\theta\,d\theta = \frac{2a}{\pi b}$$
The above can be used to determine experimentally the number π using the relative frequency interpretation of p: If the needle is dropped n times and it intersects the lines $n_i$ times, then
$$\frac{n_i}{n} \simeq p = \frac{2a}{\pi b} \qquad \text{hence} \qquad \pi \simeq \frac{2an}{bn_i}$$
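The relative-frequency estimate of π described above can be simulated. The sketch below is a Monte Carlo illustration only: the values a = 1, b = 2, the number of drops, and the seed are assumptions, and the simulation itself uses math.pi to draw the angle, so it demonstrates the estimator rather than an independent way of computing π.

```python
import math
import random

def buffon_estimate(a, b, n, seed=0):
    """Drop n needles of length 2a on lines 2b apart; return (n_i / n, 2an / (b n_i))."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(0.0, b)                 # distance from needle center to nearest line
        theta = rng.uniform(0.0, math.pi / 2)   # angle from the perpendicular direction
        if x < a * math.cos(theta):
            hits += 1
    p_hat = hits / n
    pi_hat = 2 * a * n / (b * hits)
    return p_hat, pi_hat

p_hat, pi_hat = buffon_estimate(a=1.0, b=2.0, n=200_000)
print(p_hat, 2 * 1.0 / (math.pi * 2.0))
print(pi_hat)
```

The estimate converges slowly: the standard deviation of $n_i/n$ decreases only as $1/\sqrt{n}$.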
THEOREM. If the RVs x and y are independent, then the RVs
$$\mathbf{z} = g(\mathbf{x}) \qquad \mathbf{w} = h(\mathbf{y})$$
are also independent.
Proof. We denote by $A_z$ the set of points on the x axis such that g(x) ≤ z and by $B_w$ the set of points on the y axis such that h(y) ≤ w. Clearly,
$$\{\mathbf{z} \le z\} = \{\mathbf{x} \in A_z\} \qquad \{\mathbf{w} \le w\} = \{\mathbf{y} \in B_w\}$$  (6-29)
Therefore the events {z ≤ z} and {w ≤ w} are independent because the events $\{\mathbf{x} \in A_z\}$ and $\{\mathbf{y} \in B_w\}$ are independent.
INDEPENDENT EXPERIMENTS. As in the case of events (Sec. 3-1), the concept of independence is important in the study of RVs defined on product spaces. Suppose that the RV x is defined on a space $\mathcal{S}_1$ consisting of the outcomes $\zeta_1$ and the RV y is defined on a space $\mathcal{S}_2$ consisting of the outcomes $\zeta_2$. In the combined experiment $\mathcal{S}_1 \times \mathcal{S}_2$ the RVs x and y are such that
$$\mathbf{x}(\zeta_1\zeta_2) = \mathbf{x}(\zeta_1) \qquad \mathbf{y}(\zeta_1\zeta_2) = \mathbf{y}(\zeta_2)$$  (6-30)
In other words, x depends on the outcomes of $\mathcal{S}_1$ only, and y depends on the outcomes of $\mathcal{S}_2$ only.
THEOREM. If the experiments $\mathcal{S}_1$ and $\mathcal{S}_2$ are independent, then the RVs x and y are independent.
Proof. We denote by $\mathcal{A}_x$ the set {x ≤ x} in $\mathcal{S}_1$ and by $\mathcal{B}_y$ the set {y ≤ y} in $\mathcal{S}_2$. In the space $\mathcal{S}_1 \times \mathcal{S}_2$,
$$\{\mathbf{x} \le x\} = \mathcal{A}_x \times \mathcal{S}_2 \qquad \{\mathbf{y} \le y\} = \mathcal{S}_1 \times \mathcal{B}_y$$
From the independence of the two experiments, it follows that [see (3-4)] the events $\mathcal{A}_x \times \mathcal{S}_2$ and $\mathcal{S}_1 \times \mathcal{B}_y$ are independent. Hence the events {x ≤ x} and {y ≤ y} are also independent.
Example 6-5. A die with $P\{f_i\} = p_i$ is tossed twice and the RVs x and y are such that
$$\mathbf{x}(f_i f_k) = i \qquad \mathbf{y}(f_i f_k) = k$$
Thus x equals the first number that shows and y equals the second; hence the RVs x and y are independent. This leads to the conclusion that
$$p_{ik} = P\{\mathbf{x} = i, \mathbf{y} = k\} = p_i p_k$$
Circular Symmetry
We say that the joint density of two RVs x and y is circularly symmetrical if it depends only on the distance from the origin, that is, if
$$f(x, y) = g(r) \qquad r = \sqrt{x^2 + y^2}$$  (6-31)
THEOREM. If the RVs x and y are circularly symmetrical and independent, then they are normal with zero mean and equal variance.
Proof. From (6-31) and (6-27) it follows that
$$g(\sqrt{x^2 + y^2}) = f_x(x)f_y(y)$$  (6-32)
Since
$$\frac{\partial g(r)}{\partial x} = \frac{dg(r)}{dr}\frac{\partial r}{\partial x} \qquad \text{and} \qquad \frac{\partial r}{\partial x} = \frac{x}{r}$$
we conclude, differentiating (6-32) with respect to x, that
$$\frac{x}{r}g'(r) = f_x'(x)f_y(y)$$
Dividing both sides by $xg(r) = xf_x(x)f_y(y)$, we obtain
$$\frac{1}{r}\frac{g'(r)}{g(r)} = \frac{1}{x}\frac{f_x'(x)}{f_x(x)}$$  (6-33)
The right side above is independent of y and the left side is a function of $r = \sqrt{x^2 + y^2}$. This shows that both sides are independent of x and y. Hence
$$\frac{1}{r}\frac{g'(r)}{g(r)} = \alpha = \text{constant}$$
From this it follows that
$$\frac{d\ln g(r)}{dr} = \alpha r \qquad g(r) = Ae^{\alpha r^2/2}$$
and (6-31) yields
$$f(x, y) = g(\sqrt{x^2 + y^2}) = Ae^{\alpha(x^2 + y^2)/2}$$  (6-34)
Thus the RVs x and y are normal with zero mean and variance σ² = −1/α.
6-2 ONE FUNCTION OF TWO RANDOM VARIABLES
Given two RVs x and y and a function g(x, y), we form the RV
$$\mathbf{z} = g(\mathbf{x}, \mathbf{y})$$
We shall express the statistics of z in terms of the function g(x, y) and the joint statistics of x and y.
With z a given number, we denote by $D_z$ the region of the xy plane such that g(x, y) ≤ z. This region might not be simply connected (Fig. 6-7). Clearly,
$$\{\mathbf{z} \le z\} = \{g(\mathbf{x}, \mathbf{y}) \le z\} = \{(\mathbf{x}, \mathbf{y}) \in D_z\}$$
Hence [see (6-7)]
$$F_z(z) = P\{\mathbf{z} \le z\} = P\{(\mathbf{x}, \mathbf{y}) \in D_z\} = \iint_{D_z} f(x, y)\,dx\,dy$$  (6-35)
Thus, to determine $F_z(z)$, it suffices to find the region $D_z$ for every z and to evaluate the above integral.
The density of z can be determined similarly. With $\Delta D_z$ the region of the xy plane such that z < g(x, y) ≤ z + dz, we have
$$\{z < \mathbf{z} \le z + dz\} = \{(\mathbf{x}, \mathbf{y}) \in \Delta D_z\}$$
Hence
$$f_z(z)\,dz = P\{z < \mathbf{z} \le z + dz\} = \iint_{\Delta D_z} f(x, y)\,dx\,dy$$  (6-36)
FIGURE 6-7
FIGURE 6-8
Illustrations
In the following, we use (6-35) and (6-36) to find the statistics of various functions of x and y.
1. z = x + y
The region $D_z$ of the xy plane such that x + y ≤ z is the shaded part of Fig. 6-8 to the left of the line x + y = z. Integrating over suitable strips, we obtain
$$F_z(z) = \int_{-\infty}^{\infty}\int_{-\infty}^{z - y} f(x, y)\,dx\,dy$$  (6-37)
We can find $f_z(z)$ either by differentiating $F_z(z)$ or directly from (6-36). The region $\Delta D_z$ such that z < x + y ≤ z + dz is a diagonal strip bounded by the lines x + y = z and x + y = z + dz. The coordinates of a point of this region are z − y, y and the area of a differential equals dy dz. Hence
$$f_z(z)\,dz = \int_{-\infty}^{\infty} f(z - y, y)\,dy\,dz$$  (6-38)
INDEPENDENCE AND CONVOLUTION. If the RVs x and y are independent, then
$$f(x, y) = f_x(x)f_y(y)$$
Inserting into (6-38), we obtain
$$f_z(z) = \int_{-\infty}^{\infty} f_x(z - y)f_y(y)\,dy$$  (6-39)
The above integral is the convolution of the functions $f_x(x)$ and $f_y(y)$. We thus reach the following fundamental conclusion:
If two RVs are independent, then the density of their sum equals the convolution of their densities.
We note that, if $f_x(x) = 0$ for x < 0 and $f_y(y) = 0$ for y < 0, then $f_z(z) = 0$ for z < 0 and
$$f_z(z) = \int_0^z f_x(z - y)f_y(y)\,dy \qquad z > 0$$  (6-40)
Example 6-6. It follows from (6-39) that the convolution of two rectangles is a trapezoid. Hence, if the RVs x and y are uniform in the intervals (a, b) and (c, d) respectively, then the density of their sum z = x + y is a trapezoid as in Fig. 6-9a. If, in particular, b − a = d − c, then $f_z(z)$ is a triangle as in Fig. 6-9b.
FIGURE 6-9
Suppose, for example, that resistors $r_1$ and $r_2$ are two independent RVs uniform between 900 and 1100 Ω. From the above it follows that, if they are connected in series, the density of the resulting resistor $r = r_1 + r_2$ is a triangle between 1800 and 2200 Ω. In particular, the probability that r is between 1900 and 2100 Ω equals 0.75.
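The value 0.75 can be reproduced from the convolution (6-39). For two uniform densities the convolution integral reduces to the length of the overlap between the intervals (c, d) and (z − b, z − a), divided by (b − a)(d − c); the sketch below evaluates that closed form and integrates it numerically. The function names and step count are illustrative assumptions.

```python
def uniform_sum_density(z, a, b, c, d):
    """Density of x + y for independent x uniform in (a, b) and y uniform in (c, d).

    The convolution integrand is nonzero only for y in (c, d) with z - y in (a, b).
    """
    overlap = min(d, z - a) - max(c, z - b)
    return max(0.0, overlap) / ((b - a) * (d - c))

def probability_between(z1, z2, a, b, c, d, steps=2000):
    """Midpoint-rule integral of the density of the sum over (z1, z2)."""
    dz = (z2 - z1) / steps
    return sum(uniform_sum_density(z1 + (k + 0.5) * dz, a, b, c, d) for k in range(steps)) * dz

p = probability_between(1900, 2100, 900, 1100, 900, 1100)
total = probability_between(1800, 2200, 900, 1100, 900, 1100)
print(p, total)
```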
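Example 6-7. If the RVs x and y are independent and
$$f_x(x) = \alpha e^{-\alpha x}U(x) \qquad f_y(y) = \beta e^{-\beta y}U(y)$$
(Fig. 6-10) then for z > 0,
$$f_z(z) = \alpha\beta\int_0^z e^{-\alpha(z - y)}e^{-\beta y}\,dy = \begin{cases} \dfrac{\alpha\beta}{\beta - \alpha}\left(e^{-\alpha z} - e^{-\beta z}\right) & \beta \ne \alpha \\ \alpha^2 z e^{-\alpha z} & \beta = \alpha \end{cases}$$  (6-41)
FIGURE 6-10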
FIGURE 6-11
2. z = x/y
The region $D_z$ of the xy plane such that x/y ≤ z is the shaded part of Fig. 6-11. Integrating over suitable strips, we obtain
$$F_z(z) = \int_0^{\infty}\int_{-\infty}^{yz} f(x, y)\,dx\,dy + \int_{-\infty}^{0}\int_{yz}^{\infty} f(x, y)\,dx\,dy$$  (6-42)
The region $\Delta D_z$ such that z < x/y ≤ z + dz is a triangular sector bounded by the lines x = yz and x = y(z + dz). The coordinates of a point in this region are zy, y and the area of a differential equals |y| dy dz. Inserting into (6-36) and canceling dz, we obtain
$$f_z(z) = \int_{-\infty}^{\infty} |y|\,f(zy, y)\,dy$$  (6-43)
Normal densities. We maintain that, if the RVs x and y are jointly normal with zero mean
$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\exp\left\{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_1^2} - 2r\frac{xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right\}$$  (6-44)
then their ratio z = x/y has a Cauchy density centered at $r\sigma_1/\sigma_2$:
$$f_z(z) = \frac{\sigma_1\sigma_2\sqrt{1 - r^2}/\pi}{\sigma_2^2(z - r\sigma_1/\sigma_2)^2 + \sigma_1^2(1 - r^2)}$$  (6-45)
Proof. Inserting (6-44) into (6-43) and using the fact that f(−x, −y) = f(x, y), we obtain
$$f_z(z) = \frac{2}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_0^{\infty} y\exp\left\{-\frac{y^2}{2(1 - r^2)}\left[\frac{z^2}{\sigma_1^2} - 2r\frac{z}{\sigma_1\sigma_2} + \frac{1}{\sigma_2^2}\right]\right\}dy$$
and (6-45) results because the above integral equals (1 − r²) divided by the quantity in brackets.
Integrating (6-45) from −∞ to z, we obtain the corresponding distribution function
$$F_z(z) = \frac{1}{2} + \frac{1}{\pi}\arctan\frac{\sigma_2 z - r\sigma_1}{\sigma_1\sqrt{1 - r^2}}$$  (6-46)
Quadrant masses. Using (6-46), we shall show that the probability masses $m_1$, $m_2$, $m_3$, $m_4$ in the four quadrants of the xy plane are given by
$$m_1 = m_3 = \frac{1}{4} + \frac{\alpha}{2\pi} \qquad m_2 = m_4 = \frac{1}{4} - \frac{\alpha}{2\pi}$$  (6-47)
where (Fig. 6-12)
$$\alpha = \arcsin r = \arctan\frac{r}{\sqrt{1 - r^2}} \qquad -\frac{\pi}{2} < \alpha < \frac{\pi}{2}$$
Proof. The second and fourth quadrants form the region of the plane such that x/y < 0. The probability that the point (x, y) is in this region equals, therefore, the probability that the RV z = x/y is negative. Hence
$$m_2 + m_4 = P\{\mathbf{z} \le 0\} = F_z(0) = \frac{1}{2} - \frac{1}{\pi}\arctan\frac{r}{\sqrt{1 - r^2}}$$
and (6-47) results because
$$m_1 = m_3 \qquad m_2 = m_4 \qquad m_1 + m_2 + m_3 + m_4 = 1$$
This useful result could have been obtained by integrating f(x, y) in each quadrant; the above method is, however, simpler.
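The first-quadrant mass in (6-47) can be checked by simulation. The sketch below generates zero-mean jointly normal pairs with correlation coefficient r from two independent N(0, 1) samples; this construction, the parameter values, and the seed are assumptions made for the illustration.

```python
import math
import random

def first_quadrant_mass_estimate(r, sigma1, sigma2, n, seed=0):
    """Monte Carlo estimate of P{x > 0, y > 0} for zero-mean jointly normal x, y."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        x = sigma1 * u
        y = sigma2 * (r * u + math.sqrt(1.0 - r * r) * v)  # correlation with x equals r
        if x > 0 and y > 0:
            count += 1
    return count / n

def first_quadrant_mass_formula(r):
    """m_1 = 1/4 + alpha / (2 pi) with alpha = arcsin r, as in (6-47)."""
    return 0.25 + math.asin(r) / (2 * math.pi)

estimate = first_quadrant_mass_estimate(r=0.6, sigma1=1.0, sigma2=3.0, n=200_000)
print(estimate, first_quadrant_mass_formula(0.6))
```

The result does not depend on $\sigma_1$ and $\sigma_2$, which only rescale the axes.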
3. $z = \sqrt{x^2 + y^2}$
The region $D_z$ is the circle x² + y² ≤ z² and $F_z(z)$ equals the probability masses in this circle. If f(x, y) = g(r) is circularly symmetrical, then
$$F_z(z) = 2\pi\int_0^z rg(r)\,dr \qquad z > 0$$
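Normal densities. (a) If
$$f(x, y) = \frac{1}{2\pi\sigma^2}e^{-(x^2 + y^2)/2\sigma^2}$$  (6-48)
then
$$F_z(z) = \frac{1}{\sigma^2}\int_0^z re^{-r^2/2\sigma^2}\,dr = 1 - e^{-z^2/2\sigma^2} \qquad z > 0$$  (6-49)
Hence
$$f_z(z) = \frac{z}{\sigma^2}e^{-z^2/2\sigma^2}U(z)$$  (6-50)
Thus, if the RVs x and y are normal, independent with zero mean and equal variance, then the RV $z = \sqrt{x^2 + y^2}$ has a Rayleigh density.
(b) Suppose now that
$$f(x, y) = \frac{1}{2\pi\sigma^2}e^{-[(x - \eta_x)^2 + (y - \eta_y)^2]/2\sigma^2}$$  (6-51)
The region $\Delta D_z$ of the plane such that $z < \sqrt{x^2 + y^2} \le z + dz$ is a circular ring with inner radius z and thickness dz. With
$$x = z\cos\theta \qquad y = z\sin\theta \qquad dx\,dy = z\,dz\,d\theta$$
it follows that
$$f_z(z)\,dz = \iint_{\Delta D_z} f(x, y)\,dx\,dy = \frac{1}{2\pi\sigma^2}\int_0^{2\pi} e^{-[(z\cos\theta - \eta_x)^2 + (z\sin\theta - \eta_y)^2]/2\sigma^2}\,z\,dz\,d\theta$$
Hence, with $\eta^2 = \eta_x^2 + \eta_y^2$,
$$f_z(z) = \frac{z}{2\pi\sigma^2}e^{-(z^2 + \eta^2)/2\sigma^2}\int_0^{2\pi} e^{z(\eta_x\cos\theta + \eta_y\sin\theta)/\sigma^2}\,d\theta$$
This yields
$$f_z(z) = \frac{z}{\sigma^2}I_0\!\left(\frac{z\eta}{\sigma^2}\right)e^{-(z^2 + \eta^2)/2\sigma^2} \qquad z > 0$$  (6-52)
where
$$I_0(x) = \frac{1}{2\pi}\int_0^{2\pi} e^{x\cos\theta}\,d\theta$$  (6-53)
is the modified Bessel function.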
xcos + ysin u>t « rcosfwf + 0)
Since r e |/x: + y2. if follows from the above that, if the RVs x and у arc normal
as in (6-48). then the density of r is Rayleigh as in (6-50).
FIGURE 6-13
4. $z = \max(x, y) \qquad w = \min(x, y)$
(a) The region $D_z$ of the xy plane such that $\max(x, y) \le z$ is the set of
points such that $x \le z$ and $y \le z$ (shaded in Fig. 6-13a). Hence
$F_z(z) = F_{xy}(z, z)$    (6-54)
If the RVs x and y are independent, then
$F_z(z) = F_x(z)F_y(z) \qquad f_z(z) = f_x(z)F_y(z) + f_y(z)F_x(z)$    (6-55)
(b) The region $D_w$ of the xy plane such that $\min(x, y) \le w$ is the set of
points such that $x \le w$ or $y \le w$ (shaded in Fig. 6-13b). Hence
$F_w(w) = F_x(w) + F_y(w) - F_{xy}(w, w)$    (6-56)
If the RVs x and y are independent, then it is simpler to express the result in
terms of the reliability function
$R_x(x) = P\{\mathbf{x} > x\} = 1 - F_x(x)$    (6-57)
Defining $R_y(y)$ and $R_w(w)$ similarly, we conclude from (6-56) that
$R_w(w) = R_x(w)R_y(w) \qquad f_w(w) = f_x(w)R_y(w) + f_y(w)R_x(w)$    (6-58)
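The relations (6-55) and (6-58) for independent RVs translate directly into code. The following sketch uses exponential distributions purely as an illustrative assumption (they are not part of the text above); with them, the minimum is again exponential with the sum of the two rates, which gives an easy check.

```python
import math

def cdf_max(Fx, Fy, z):
    """(6-55): distribution of max(x, y) for independent x and y."""
    return Fx(z) * Fy(z)

def cdf_min(Fx, Fy, w):
    """(6-58): R_w = R_x * R_y, hence F_w = 1 - (1 - F_x)(1 - F_y)."""
    return 1 - (1 - Fx(w)) * (1 - Fy(w))

def exp_cdf(rate):
    """Distribution function of an exponential RV with the given rate."""
    return lambda t: (1 - math.exp(-rate * t)) if t > 0 else 0.0
```

For independent RVs, (6-56) reduces to $F_w = F_x + F_y - F_xF_y$, so the two functions above always satisfy `cdf_min + cdf_max == Fx + Fy` up to rounding.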
Discrete type. If the RVs x and y are of discrete type taking the values $x_i$ and
$y_k$, then the RV z = g(x, y) is also of discrete type taking the values $z_r =
g(x_i, y_k)$. The probability that $z = z_r$ equals the sum of the point masses on the
curve $g(x, y) = z_r$.
Example 6-9. A fair die is tossed twice and the RVs x and y are such that
$x(f_i f_k) = i \qquad y(f_i f_k) = k$
The xy plane has 36 equal point masses as in Fig. 6-14. The RV z = x + y takes
the values $z_r = x_i + y_k$ with probabilities $p_r = m/36$ where m is the number of
points on the line $x + y = z_r$. As we see from the figure

z_r :    2     3     4     5     6     7     8     9    10    11    12
p_r :  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

For example, there are four mass points on the line x + y = 5; hence the
corresponding probability equals 4/36.
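The table of Example 6-9 can be reproduced by enumerating the 36 point masses. This is a small sketch; the function name and the use of exact fractions are choices made here for illustration.

```python
from collections import Counter
from fractions import Fraction

def dice_sum_pmf(faces=6):
    """Probabilities of z = x + y for two tosses of a fair die."""
    counts = Counter(i + k
                     for i in range(1, faces + 1)
                     for k in range(1, faces + 1))
    total = faces * faces
    return {z: Fraction(m, total) for z, m in sorted(counts.items())}
```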
6-3 TWO FUNCTIONS
OF TWO RANDOM VARIABLES
Given two RVs x and y and two functions g(x, y) and h(x, y), we form the RVs
$z = g(x, y) \qquad w = h(x, y)$    (6-59)
We shall express the joint statistics of z and w in terms of the functions g(x, y)
and h(x, y) and the joint statistics of x and y.
With z and w two given numbers, we denote by $D_{zw}$ the region of the xy
plane such that $g(x, y) \le z$ and $h(x, y) \le w$. Clearly,
$\{\mathbf{z} \le z, \mathbf{w} \le w\} = \{(\mathbf{x}, \mathbf{y}) \in D_{zw}\}$
Hence [see (6-7)]
$F_{zw}(z, w) = P\{(\mathbf{x}, \mathbf{y}) \in D_{zw}\} = \iint_{D_{zw}} f_{xy}(x, y)\,dx\,dy$    (6-60)
Suppose, for example, that
$z = \sqrt{x^2 + y^2} \qquad w = y/x$    (6-61)
In this case, the set $D_{zw}$ such that
$\sqrt{x^2 + y^2} \le z \qquad y/x \le w$
is the shaded region of Fig. 6-15a, and $F_{zw}(z, w)$ equals the mass in this region.
Example 6-10. If
$f_{xy}(x, y) = \frac{1}{2\pi\sigma^2}\,e^{-(x^2 + y^2)/2\sigma^2} \qquad z = \sqrt{x^2 + y^2} \qquad w = y/x$
then [see (6-49)] the mass in the circle $x^2 + y^2 \le z^2$ equals $1 - e^{-z^2/2\sigma^2}$. Since
$f_{xy}(x, y)$ has circular symmetry we conclude that for z > 0:
$F_{zw}(z, w) = \frac{2\theta}{2\pi}\left(1 - e^{-z^2/2\sigma^2}\right) \qquad \theta = \frac{\pi}{2} + \arctan w$
and $F_{zw}(z, w) = 0$ for z < 0. This is a product of a function of z times a function
of w. Hence the RVs z and w are independent with
$F_z(z) = \left(1 - e^{-z^2/2\sigma^2}\right)U(z) \qquad F_w(w) = \frac{1}{2} + \frac{1}{\pi}\arctan w$
In other words, z has a Rayleigh density and w has a Cauchy density [see (5-17)] as
in Fig. 6-15b.
FIGURE 6-15
Joint Density
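We shall determine the joint density of the RVs
$z = g(x, y) \qquad w = h(x, y)$
in terms of the joint density of x and y.
Fundamental theorem. To find $f_{zw}(z, w)$, we solve the system
$g(x, y) = z \qquad h(x, y) = w$    (6-62)
Denoting by $(x_n, y_n)$ its real roots
$g(x_n, y_n) = z \qquad h(x_n, y_n) = w$
we maintain that
$f_{zw}(z, w) = \frac{f_{xy}(x_1, y_1)}{|J(x_1, y_1)|} + \cdots + \frac{f_{xy}(x_n, y_n)}{|J(x_n, y_n)|} + \cdots$    (6-63)
where
$J(x, y) = \begin{vmatrix} \partial z/\partial x & \partial z/\partial y \\ \partial w/\partial x & \partial w/\partial y \end{vmatrix} = \begin{vmatrix} \partial x/\partial z & \partial x/\partial w \\ \partial y/\partial z & \partial y/\partial w \end{vmatrix}^{-1}$    (6-64)
is the Jacobian of the transformation (6-62).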
FIGURE 6-16
Proof. We denote by $\Delta D_{zw}$ the region in the xy plane such that
$z < g(x, y) \le z + dz \qquad w < h(x, y) \le w + dw$
This region consists of differential parallelograms, one for each $(x_n, y_n)$ as in
Fig. 6-16. The area of each parallelogram equals $dz\,dw/|J(x_n, y_n)|$ and its mass
equals
$f_{xy}(x_n, y_n)\,dz\,dw/|J(x_n, y_n)|$
Since $f_{zw}(z, w)\,dz\,dw$ equals the mass in $\Delta D_{zw}$, we conclude, summing the
masses in all parallelograms, that
$f_{zw}(z, w)\,dz\,dw = \frac{f_{xy}(x_1, y_1)\,dz\,dw}{|J(x_1, y_1)|} + \cdots + \frac{f_{xy}(x_n, y_n)\,dz\,dw}{|J(x_n, y_n)|} + \cdots$
and (6-63) results.
If the system (6-62) has no solutions in some region of the zw plane, then
$f_{zw}(z, w) = 0$ in that region.
We shall illustrate the above theorem with two special cases.
LINEAR TRANSFORMATION
$z = ax + by \qquad w = cx + dy$    (6-65)
If $ad - bc \ne 0$, then the system ax + by = z, cx + dy = w has one and only
one solution
$x = Az + Bw \qquad y = Cz + Dw$
Since $J(x, y) = ad - bc$, (6-63) yields
$f_{zw}(z, w) = \frac{1}{|ad - bc|}\,f_{xy}(Az + Bw, Cz + Dw)$    (6-66)
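Equation (6-66) can be turned into a small helper that builds $f_{zw}$ from $f_{xy}$. In this sketch the coefficients A, B, C, D are obtained by inverting the 2 x 2 system explicitly; the function names and the test density are illustrative assumptions.

```python
import math

def transformed_density(f_xy, a, b, c, d):
    """Return f_zw for z = a*x + b*y, w = c*x + d*y, following (6-66)."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc must be nonzero")
    # Inverse map: x = A*z + B*w, y = C*z + D*w
    A, B = d / det, -b / det
    C, D = -c / det, a / det

    def f_zw(z, w):
        return f_xy(A * z + B * w, C * z + D * w) / abs(det)

    return f_zw

def std_normal_pair(x, y):
    """Joint density of two independent N(0, 1) RVs."""
    return math.exp(-(x * x + y * y) / 2) / (2 * math.pi)
```

For a rotation, ad - bc = 1 and a circularly symmetric density is returned unchanged, in agreement with the rotation and circular-symmetry discussion that follows.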
Joint normality. From (6-66) it follows that if the RVs x and y are jointly
normal and
$z = ax + by \qquad w = cx + dy$
then z and w are also jointly normal.
Proof. Joint normality means that $f_{xy}(x, y)$ is an exponential whose exponent is
a quadratic in x and y. If, in this quadratic, we replace x by Az + Bw and y by
Cz + Dw as in (6-66), then an exponential results whose exponent is a quadratic
in z and w. This shows that the RVs z and w are jointly, and therefore also
marginally, normal.
From the above it follows that, if x and y are jointly normal and z = x + y,
then z is normal. We should emphasize, however, that if x and y are marginally
but not jointly normal, then z is not, in general, normal. We give next a
counterexample.
Example 6-11. We shall construct two marginally normal RVs $x_1$ and $y_1$ such that
their sum $z_1 = x_1 + y_1$ is not normal: We start with two jointly normal RVs x and y
and add and subtract masses on the four circles of Fig. 6-17. The resulting mass
distribution specifies the joint density of the RVs $x_1$ and $y_1$. As we have shown in
Example 6-1, these RVs are marginally normal. However, their sum $z_1$ is not
normal.
Rotation. A special case of (6-65) is the transformation
$z = x\cos\varphi + y\sin\varphi \qquad w = -x\sin\varphi + y\cos\varphi$    (6-67)
In this case $a = d = \cos\varphi$, $b = -c = \sin\varphi$, and $ad - bc = 1$. Hence
$x = z\cos\varphi - w\sin\varphi \qquad y = z\sin\varphi + w\cos\varphi$
and (6-66) yields
$f_{zw}(z, w) = f_{xy}(z\cos\varphi - w\sin\varphi,\; z\sin\varphi + w\cos\varphi)$    (6-68)
Thus, if two RVs are rotated by an angle $\varphi$, their probability masses are rotated
in the opposite direction by the same angle.
Circular symmetry If $f_{xy}(x, y)$ is circularly symmetrical as in (6-31), then
$f_{xy}(x, y) = f_{xy}(x\cos\varphi - y\sin\varphi,\; x\sin\varphi + y\cos\varphi)$    (6-69)
because
$(x\cos\varphi - y\sin\varphi)^2 + (x\sin\varphi + y\cos\varphi)^2 = x^2 + y^2$
Hence [see (6-68)]
$f_{zw}(z, w) = f_{xy}(z, w) = g\!\left(\sqrt{z^2 + w^2}\right)$    (6-70)
FIGURE 6-17
Conversely, if the RVs x, y and z, w have the same statistics for every $\varphi$,
then their joint density is circularly symmetrical. From (6-34) it follows that if x
and y are also independent, then they are normal.
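POLAR COORDINATES. Consider the RVs
$r = \sqrt{x^2 + y^2} \qquad \varphi = \arctan\frac{y}{x}$    (6-71)
where we assume that $r \ge 0$ and $-\pi < \varphi \le \pi$. With this assumption, the
system $\sqrt{x^2 + y^2} = r$, $\arctan y/x = \varphi$ has a single solution
$x = r\cos\varphi \qquad y = r\sin\varphi \qquad \text{for } r > 0$
Since [see (6-64)]
$J(x, y) = \begin{vmatrix} \cos\varphi & -r\sin\varphi \\ \sin\varphi & r\cos\varphi \end{vmatrix}^{-1} = \frac{1}{r}$
we conclude from (6-63) that
$f_{r\varphi}(r, \varphi) = r\,f_{xy}(r\cos\varphi, r\sin\varphi) \qquad r > 0$    (6-72)
and 0 for r < 0.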
Example 6-12. We shall show that if
$x\cos\omega t + y\sin\omega t = r\cos(\omega t - \varphi) \qquad |\varphi| < \pi$
and the RVs x and y are $N(0, \sigma)$ and independent, then the RVs r and $\varphi$ are
independent, $\varphi$ is uniform in the interval $(-\pi, \pi)$, and r has a Rayleigh distribution.
Proof. Since $x = r\cos\varphi$, $y = r\sin\varphi$, and
$f_{xy}(x, y) = \frac{1}{2\pi\sigma^2}\,e^{-(x^2 + y^2)/2\sigma^2}$
(6-72) yields
$f_{r\varphi}(r, \varphi) = \frac{r}{2\pi\sigma^2}\,e^{-r^2/2\sigma^2} \qquad r > 0 \quad |\varphi| < \pi$
and 0 otherwise. This is a product of a function of r times a function of $\varphi$. Hence
the RVs r and $\varphi$ are independent with
$f_r(r) = \frac{r}{\sigma^2}\,e^{-r^2/2\sigma^2} \qquad f_\varphi(\varphi) = \frac{1}{2\pi}$
for r > 0, $-\pi < \varphi \le \pi$ and 0 otherwise. The proportionality factors are so chosen
as to make the area of each term equal to 1.
From the above it follows that, if the RVs r and $\varphi$ are independent, r has a
Rayleigh distribution, and $\varphi$ is uniform in the interval $(-\pi, \pi)$, then the RVs
$x = r\cos\varphi \qquad y = r\sin\varphi$
are $N(0, \sigma)$ and independent.
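The converse statement above suggests a way to generate normal pairs: draw a Rayleigh radius and an independent uniform angle. The sketch below assumes the Rayleigh sample is obtained by inverting (6-49); the function names, sample size, and seed are illustrative.

```python
import math
import random

def normal_pair_from_polar(sigma, rng):
    """Draw (x, y) from a Rayleigh radius and a uniform angle."""
    u = 1.0 - rng.random()                       # in (0, 1], avoids log(0)
    r = sigma * math.sqrt(-2.0 * math.log(u))    # inverse of (6-49)
    phi = rng.uniform(-math.pi, math.pi)
    return r * math.cos(phi), r * math.sin(phi)

def sample_moments(sigma=2.0, n=100_000, seed=1):
    """Return sample estimates of E{x}, E{x^2} and E{xy}."""
    rng = random.Random(seed)
    sx = sxx = sxy = 0.0
    for _ in range(n):
        x, y = normal_pair_from_polar(sigma, rng)
        sx += x
        sxx += x * x
        sxy += x * y
    return sx / n, sxx / n, sxy / n
```

With sigma = 2 the estimates should be close to 0, 4, and 0 respectively; a moment check like this is consistent with, but does not by itself prove, normality and independence.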
Auxiliary variables. The density of one function z = g(x, y) of two RVs can be
determined from (6-63), where w is a conveniently chosen
auxiliary variable, for example w = x or w = y. The density of z is then found by
integrating the function $f_{zw}(z, w)$ so obtained.
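Example 6-13. We shall find the density of the RV
z = ax + by
using as auxiliary variable the function w = y.
The system z = ax + by, w = y has a single solution: x = (z - bw)/a,
y = w. Since
$J(x, y) = \begin{vmatrix} a & b \\ 0 & 1 \end{vmatrix} = a$
it follows from (6-63) that
$f_{zw}(z, w) = \frac{1}{|a|}\,f_{xy}\!\left(\frac{z - bw}{a}, w\right)$
Hence
$f_z(z) = \frac{1}{|a|}\int_{-\infty}^{\infty} f_{xy}\!\left(\frac{z - by}{a}, y\right) dy$    (6-73)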
Example 6-14. With
z = xy    w = x
the system xy = z, x = w has a single solution: x = w, y = z/w. In this case,
J = -w and (6-63) yields
$f_{zw}(z, w) = \frac{1}{|w|}\,f_{xy}\!\left(w, \frac{z}{w}\right)$
Hence the density of the RV z = xy is given by
$f_z(z) = \int_{-\infty}^{\infty} \frac{1}{|w|}\,f_{xy}\!\left(w, \frac{z}{w}\right) dw$    (6-74)
FIGURE 6-18
Special case. We now assume that the RVs x and y are independent and each is
uniform in the interval (0, 1). In this case,
$f_{xy}(w, z/w) = f_x(w)\,f_y(z/w) = 1$
in the triangle $0 \le z \le w \le 1$ (shaded in Fig. 6-18) and 0 elsewhere.
Inserting into (6-74), we obtain
$f_z(z) = \int_z^1 \frac{1}{w}\,dw = \begin{cases} -\ln z & 0 < z < 1 \\ 0 & \text{elsewhere} \end{cases}$    (6-75)
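The special case (6-75) can be confirmed by integrating (6-74) numerically. This is a rough sketch with a midpoint rule; the helper names and step count are assumptions, and the integration range is restricted to (0, 1) because the uniform joint density vanishes elsewhere.

```python
import math

def product_density(f_xy, z, lo, hi, steps=20_000):
    """Midpoint-rule value of (6-74) restricted to lo < w < hi."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        w = lo + (i + 0.5) * h   # midpoints never hit w = 0
        total += f_xy(w, z / w) / abs(w)
    return total * h

def uniform_square(x, y):
    """Joint density of two independent RVs uniform in (0, 1)."""
    return 1.0 if 0 < x < 1 and 0 < y < 1 else 0.0
```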
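Example 6-15. An RV z has a Student-t distribution t(n) with n degrees of
freedom if
$f_z(z) = \frac{\gamma_1}{\sqrt{(1 + z^2/n)^{n+1}}} \qquad \gamma_1 = \frac{\Gamma[(n + 1)/2]}{\sqrt{\pi n}\,\Gamma(n/2)}$    (6-76)
We shall show that if x and y are two independent RVs, x is N(0, 1), and y is
$\chi^2(n)$:
$f_x(x) \sim e^{-x^2/2} \qquad f_y(y) \sim y^{n/2 - 1} e^{-y/2}\,U(y)$
then the RV
$z = \frac{x}{\sqrt{y/n}}$
has a t(n) distribution.
Proof. We introduce the RV w = y and use (6-63) with
$x = z\sqrt{w/n} \qquad y = w \qquad J = \sqrt{n/w}$
This yields
$f_{zw}(z, w) \sim \sqrt{w}\,e^{-z^2 w/2n}\,w^{n/2 - 1} e^{-w/2}\,U(w)$
Integrating with respect to w, we obtain
$f_z(z) \sim \int_0^\infty w^{(n - 1)/2}\,e^{-(w/2)(1 + z^2/n)}\,dw \sim \frac{1}{\sqrt{(1 + z^2/n)^{n+1}}}$
and (6-76) results because $\int_0^\infty w^{a - 1} e^{-bw}\,dw = \Gamma(a)/b^a$. The constant $\gamma_1$ is
determined from (4-18).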
PROBLEMS
6-1. If x and y are the zero-one RVs associated with the events $\mathcal{A}$ and $\mathcal{B}$ respectively,
(a) find the probability masses in the xy plane and (b) show that the RVs x and y
are independent iff the events $\mathcal{A}$ and $\mathcal{B}$ are independent.
6-2. The RVs x and y are independent and z = x + y. Find $f_y(y)$ if
$f_x(x) = c\,e^{-cx}\,U(x) \qquad f_z(z) = c^2 z\,e^{-cz}\,U(z)$
6-3. The RVs x and y are independent and y is uniform in the interval (0, 1). Show that,
if z = x + y, then
$f_z(z) = F_x(z) - F_x(z - 1)$
6-4. (a) The function g(x) is monotone increasing and y = g(x). Show that
(b) Find $F_{xy}(x, y)$ if g(x) is monotone decreasing.
6-5. Express $F_{zw}(z, w)$ in terms of $F_{xy}(x, y)$ if z = max(x, y), w = min(x, y).
6-6. The RVs x and y are N(0, 2) and independent. Find $f_z(z)$ and $F_z(z)$ if (a)
z = 2x + 3y, and (b) z = x/y.
6-7. The RVs x and y are independent with
Show that the RV z = xy is $N(0, \alpha)$.
6-8. The RVs x and y are independent with Rayleigh densities
$f_x(x) = \frac{x}{\alpha^2}\,e^{-x^2/2\alpha^2}\,U(x) \qquad f_y(y) = \frac{y}{\beta^2}\,e^{-y^2/2\beta^2}\,U(y)$
(a) Show that if z = x/y, then
(b) Using (i), show that for any k > 0,
к2
6-9. The RVs x and y are independent with exponential densities
$f_x(x) = \alpha e^{-\alpha x}\,U(x) \qquad f_y(y) = \beta e^{-\beta y}\,U(y)$
Find the densities of the following RVs:
1. 2x + y    2. x - y    3.    4. max(x, y)    5. min(x, y)
6-10. The RVs x and y are independent and each is uniform in the interval (0, a). Find
the density of the RV z = |x - y|.
6-11. Show that (a) the convolution of two normal densities is a normal density, and (b)
the convolution of two Cauchy densities is a Cauchy density.
6-12. The RVs x and $\theta$ are independent and $\theta$ is uniform in the interval $(-\pi, \pi)$. Show
that if $z = x\cos(\omega t + \theta)$, then
6-13. The RVs x and y are independent, x is $N(0, \sigma)$, and y is uniform in the interval
$(0, \pi)$. Show that if $z = x + a\cos y$, then
$f_z(z) = \frac{1}{\pi\sigma\sqrt{2\pi}}\int_0^{\pi} e^{-(z - a\cos y)^2/2\sigma^2}\,dy$
6-14. The RVs x and y are of discrete type, independent, with $P\{x = n\} = a_n$, $P\{y = n\}
= b_n$, n = 0, 1, ... . Show that, if z = x + y, then
$P\{z = n\} = \sum_{k=0}^{n} a_k b_{n-k}$
6-15. The RV x is of discrete type taking the values $x_n$ with $P\{x = x_n\} = p_n$ and the RV
y is of continuous type and independent of x. Show that if z = x + y and w = xy,
then
$f_z(z) = \sum_n f_y(z - x_n)\,p_n \qquad f_w(w) = \sum_n \frac{1}{|x_n|}\,f_y\!\left(\frac{w}{x_n}\right) p_n$
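6-16. The RVs x and y are normal, independent, with the same variance. Show that, if
$z = \sqrt{x^2 + y^2}$, then $f_z(z)$ is given by (6-52) where $\eta = \sqrt{\eta_x^2 + \eta_y^2}$.
6-17. The RVs $x_1$ and $x_2$ are jointly normal with zero mean. Show that their density can
be written in the form
$f(x_1, x_2) = \frac{1}{2\pi\sqrt{\Delta}}\exp\left\{-\frac{1}{2}XC^{-1}X^t\right\} \qquad C = \begin{bmatrix} \mu_{11} & \mu_{12} \\ \mu_{21} & \mu_{22} \end{bmatrix}$
where $X: [x_1, x_2]$, $\mu_{ij} = E\{x_i x_j\}$, and $\Delta = \mu_{11}\mu_{22} - \mu_{12}^2$.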
6-18. Show that if the RVs x and y are normal and independent, then
$P\{xy < 0\} = G\!\left(\frac{\eta_x}{\sigma_x}\right) + G\!\left(\frac{\eta_y}{\sigma_y}\right) - 2G\!\left(\frac{\eta_x}{\sigma_x}\right)G\!\left(\frac{\eta_y}{\sigma_y}\right)$
6-19. The RVs x and y are independent with respective densities $\chi^2(m)$ and $\chi^2(n)$.
Show that if
$z = \frac{x/m}{y/n} \qquad \text{then} \qquad f_z(z) = \gamma\,\frac{z^{m/2 - 1}}{\sqrt{(1 + mz/n)^{m+n}}}\,U(z)$
This distribution is denoted by F(m, n) and is called the Snedecor F distribution. It
is used in hypothesis testing (see Prob. 9-34).
CHAPTER
7
MOMENTS AND
CONDITIONAL
STATISTICS
7-1 JOINT MOMENTS
Given two RVs x and y and a function g(x, y), we form the RV z = g(x, y). The
expected value of this RV is given by
$E\{z\} = \int_{-\infty}^{\infty} z\,f_z(z)\,dz$    (7-1)
However, as the next theorem shows, E{z} can be expressed directly in terms of
the function g(x, y) and the joint density f(x, y) of x and y.
THEOREM
$E\{g(x, y)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$    (7-2)
Proof. The proof is similar to the proof of (5-29). We denote by $\Delta D_z$ the region
of the xy plane such that $z < g(x, y) < z + dz$. Thus to each differential in
(7-1) there corresponds a region $\Delta D_z$ in the xy plane. As dz covers the z axis,
the regions $\Delta D_z$ are not overlapping and they cover the entire xy plane. Hence
the integrals in (7-1) and (7-2) are equal.
We note that the expected value of g(x) can be determined either from
(7-2) or from (5-29) as a single integral
$E\{g(x)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)\,f(x, y)\,dx\,dy = \int_{-\infty}^{\infty} g(x)\,f_x(x)\,dx$
This is consistent with the relationship (6-10) between marginal and joint
densities.
If the RVs x and y are of discrete type taking the values $x_i$ and $y_k$ with
probability $p_{ik}$ as in (6-19), then
$E\{g(x, y)\} = \sum_{i,k} g(x_i, y_k)\,p_{ik}$    (7-3)
Linearity From (7-2) it follows that
$E\left\{\sum_{k=1}^{n} a_k g_k(x, y)\right\} = \sum_{k=1}^{n} a_k E\{g_k(x, y)\}$    (7-4)
This fundamental result will be used extensively.
We note in particular that
$E\{x + y\} = E\{x\} + E\{y\}$    (7-5)
Thus the expected value of the sum of two RVs equals the sum of their
expected values. We should stress, however, that in general
$E\{xy\} \ne E\{x\}E\{y\}$
Frequency interpretation As in (5-26)
$E\{x + y\} \simeq \frac{x(\zeta_1) + y(\zeta_1) + \cdots + x(\zeta_n) + y(\zeta_n)}{n}$
$= \frac{x(\zeta_1) + \cdots + x(\zeta_n)}{n} + \frac{y(\zeta_1) + \cdots + y(\zeta_n)}{n} \simeq E\{x\} + E\{y\}$
However, in general,
$E\{xy\} \simeq \frac{x(\zeta_1)y(\zeta_1) + \cdots + x(\zeta_n)y(\zeta_n)}{n}$
$\ne \frac{x(\zeta_1) + \cdots + x(\zeta_n)}{n} \times \frac{y(\zeta_1) + \cdots + y(\zeta_n)}{n} \simeq E\{x\}E\{y\}$
Covariance. The covariance C or $C_{xy}$ of two RVs x and y is by definition the
number
$C = E\{(x - \eta_x)(y - \eta_y)\}$    (7-6)
where $E\{x\} = \eta_x$ and $E\{y\} = \eta_y$. Expanding the product in (7-6) and using (7-4)
we obtain
$C = E\{xy\} - E\{x\}E\{y\}$    (7-7)
Correlation coefficient The correlation coefficient r or $r_{xy}$ of the RVs x
and y is by definition the ratio
$r = \frac{C}{\sigma_x\sigma_y}$    (7-8)
We maintain that
$|r| \le 1 \qquad |C| \le \sigma_x\sigma_y$    (7-9)
Proof. Clearly,
$E\{[a(x - \eta_x) + (y - \eta_y)]^2\} = a^2\sigma_x^2 + 2aC + \sigma_y^2$    (7-10)
The above is a nonnegative quadratic for any a; hence its discriminant cannot be
positive. In other words,
$C^2 - \sigma_x^2\sigma_y^2 \le 0$    (7-11)
and (7-9) results.
We note that the RVs x, y and $x - \eta_x$, $y - \eta_y$ have the same covariance
and correlation coefficient.
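For RVs of discrete type, (7-3), (7-7) and (7-8) can be evaluated directly from the point masses. The following sketch uses a made-up joint pmf (an assumption for illustration only) stored as a dictionary from $(x_i, y_k)$ to $p_{ik}$.

```python
import math

def expect(g, pmf):
    """(7-3): E{g(x, y)} as a sum over the point masses p_ik."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

def covariance_and_r(pmf):
    """Return the covariance (7-7) and correlation coefficient (7-8)."""
    eta_x = expect(lambda x, y: x, pmf)
    eta_y = expect(lambda x, y: y, pmf)
    cov = expect(lambda x, y: x * y, pmf) - eta_x * eta_y
    var_x = expect(lambda x, y: (x - eta_x) ** 2, pmf)
    var_y = expect(lambda x, y: (y - eta_y) ** 2, pmf)
    return cov, cov / math.sqrt(var_x * var_y)

# Hypothetical joint pmf of two zero-one RVs.
example_pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
```

For this pmf the covariance is 0.4 - 0.25 = 0.15 and both variances equal 0.25, so r = 0.6, which respects the bound (7-9).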
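Example 7-1. We shall show that the correlation coefficient of two jointly normal
RVs is the parameter r in (6-15). It suffices to assume that $\eta_x = \eta_y = 0$ and to
show that $E\{xy\} = r\sigma_1\sigma_2$.
Since
$\frac{x^2}{\sigma_1^2} - 2r\frac{xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} = \left(\frac{x}{\sigma_1} - r\frac{y}{\sigma_2}\right)^2 + (1 - r^2)\frac{y^2}{\sigma_2^2}$
we conclude with (6-44) that
$E\{xy\} = \frac{1}{\sigma_2\sqrt{2\pi}}\int_{-\infty}^{\infty} y\,e^{-y^2/2\sigma_2^2}\int_{-\infty}^{\infty}\frac{x}{\sigma_1\sqrt{2\pi(1 - r^2)}}\exp\left\{-\frac{(x - ry\sigma_1/\sigma_2)^2}{2\sigma_1^2(1 - r^2)}\right\}dx\,dy$
The inner integral is a normal density with mean $ry\sigma_1/\sigma_2$ multiplied by x; hence it
equals $ry\sigma_1/\sigma_2$. This yields
$E\{xy\} = \frac{r\sigma_1/\sigma_2}{\sigma_2\sqrt{2\pi}}\int_{-\infty}^{\infty} y^2 e^{-y^2/2\sigma_2^2}\,dy = r\sigma_1\sigma_2$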
Uncorrelatedness Two RVs are called uncorrelated if their covariance is
0. This can be phrased in the following equivalent forms
$C = 0 \qquad r = 0 \qquad E\{xy\} = E\{x\}E\{y\}$
Orthogonality Two RVs are called orthogonal if
$E\{xy\} = 0$
We shall use the notation
$x \perp y$
to indicate that the RVs x and y are orthogonal.
Note (a) If x and y are uncorrelated, then $x - \eta_x \perp y - \eta_y$. (b) If x and y are
uncorrelated and $\eta_x = 0$ or $\eta_y = 0$, then $x \perp y$.
Vector space of random variables. We shall find it convenient to interpret RVs
as vectors in an abstract space. In this space, the second moment
$E\{xy\}$
of the RVs x and y is by definition their inner product and $E\{x^2\}$ and $E\{y^2\}$ are
the squares of their lengths. The ratio
$\frac{E\{xy\}}{\sqrt{E\{x^2\}E\{y^2\}}}$
is the cosine of their angle.
We maintain that
$E^2\{xy\} \le E\{x^2\}E\{y^2\}$    (7-12)
This is the cosine inequality and its proof is similar to the proof of (7-11): The
quadratic
$E\{(ax - y)^2\} = a^2E\{x^2\} - 2aE\{xy\} + E\{y^2\}$
is nonnegative for every a; hence its discriminant cannot be positive and (7-12) results. If
(7-12) is an equality, then the quadratic is 0 for some $a = a_0$; hence $y = a_0x$.
This agrees with the geometric interpretation of RVs because, if (7-12) is an
equality, then the vectors x and y are on the same line.
The following illustration is an example of the correspondence between
vectors and RVs: Consider two RVs x and y such that $E\{x^2\} = E\{y^2\}$. Geometrically,
this means that the vectors x and y have the same length. If, therefore, we
construct a parallelogram with sides x and y, it will be a rhombus with diagonals
x + y and x - y (Fig. 7-1). These diagonals are perpendicular because
$E\{(x + y)(x - y)\} = E\{x^2 - y^2\} = 0$
FIGURE 7-1
THEOREM. If two RVs are independent, that is, if
$f(x, y) = f_x(x)f_y(y)$    (7-13)
then they are uncorrelated.
Proof. It suffices to show that
$E\{xy\} = E\{x\}E\{y\}$    (7-14)
From (7-2) and (7-13) it follows that
$E\{xy\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_x(x)f_y(y)\,dx\,dy = \int_{-\infty}^{\infty} x f_x(x)\,dx\int_{-\infty}^{\infty} y f_y(y)\,dy$
and (7-14) results.
If the RVs x and y are independent, then the RVs g(x) and h(y) are also
independent [see (6-29)]. Hence
$E\{g(x)h(y)\} = E\{g(x)\}E\{h(y)\}$    (7-15)
This is not, in general, true if x and y are merely uncorrelated.
We note, finally, that if two RVs are uncorrelated they are not necessarily
independent. However, for normal RVs uncorrelatedness is equivalent to
independence. Indeed, if the RVs x and y are jointly normal and r = 0, then [see
(6-15)] $f(x, y) = f_x(x)f_y(y)$.
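A standard illustration that uncorrelated RVs need not be independent is sketched below. The construction (x uniform on {-1, 0, 1} and y = x squared) is an assumption chosen for this example and is not taken from the text.

```python
from fractions import Fraction

# x takes the values -1, 0, 1 with equal probability and y = x**2.
pmf = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

def expect(g):
    """E{g(x, y)} over the three point masses."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

# Covariance from (7-7): E{xy} - E{x}E{y}
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P{x = 0, y = 1} = P{x = 0} P{y = 1}.
p_joint = pmf.get((0, 1), Fraction(0))
p_product = Fraction(1, 3) * Fraction(2, 3)

# (7-15) with g(x) = x**2 and h(y) = y fails for this pair.
lhs = expect(lambda x, y: x * x * y)
rhs = expect(lambda x, y: x * x) * expect(lambda x, y: y)
```

Here the covariance is exactly 0, yet the joint mass at (0, 1) is 0 while the product of the marginals is 2/9, so the RVs are dependent.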
Variance of the sum of two RVs If z = x + y, then $\eta_z = \eta_x + \eta_y$; hence
$\sigma_z^2 = E\{(z - \eta_z)^2\} = E\{[(x - \eta_x) + (y - \eta_y)]^2\}$
From this and (7-10) it follows that
$\sigma_z^2 = \sigma_x^2 + 2r\sigma_x\sigma_y + \sigma_y^2$    (7-16)
The above leads to the conclusion that if r = 0 then
$\sigma_z^2 = \sigma_x^2 + \sigma_y^2$    (7-17)
Thus, if two RVs are uncorrelated, then the variance of their sum equals the
sum of their variances.
It follows from (7-14) that this is also true if x and y are independent.
Moments
The mean
$m_{kr} = E\{x^k y^r\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^k y^r f(x, y)\,dx\,dy$    (7-18)
of the product $x^k y^r$ is by definition a joint moment of the RVs x and y of order
k + r = n.
Thus $m_{10} = \eta_x$, $m_{01} = \eta_y$ are the first-order moments and
$m_{20} = E\{x^2\} \qquad m_{11} = E\{xy\} \qquad m_{02} = E\{y^2\}$
are the second-order moments.
The joint central moments of x and y are the moments of $x - \eta_x$ and
$y - \eta_y$:
$\mu_{kr} = E\{(x - \eta_x)^k (y - \eta_y)^r\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \eta_x)^k (y - \eta_y)^r f(x, y)\,dx\,dy$    (7-19)
Clearly, $\mu_{10} = \mu_{01} = 0$ and
$\mu_{11} = C \qquad \mu_{20} = \sigma_x^2 \qquad \mu_{02} = \sigma_y^2$
Absolute and generalized moments are defined similarly [see (5-40) and
(5-41)].
For the determination of the joint statistics of x and y knowledge of their
joint density is required. However, in many applications, only the first- and
second-order moments are used. These moments are determined in terms of the
five parameters
$\eta_x \qquad \eta_y \qquad \sigma_x \qquad \sigma_y \qquad r_{xy}$
If x and y are jointly normal, then [see (6-15)] the above parameters
determine uniquely f(x, y).
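Example 7-2. The RVs x and y are jointly normal with
$\eta_x = 10 \qquad \eta_y = 0 \qquad \sigma_x = 2 \qquad \sigma_y = 1 \qquad r_{xy} = 0.5$
We shall find the joint density of the RVs
z = x + y    w = x - y
Clearly,
$\eta_z = \eta_x + \eta_y = 10 \qquad \eta_w = \eta_x - \eta_y = 10$
$\sigma_z^2 = \sigma_x^2 + \sigma_y^2 + 2r_{xy}\sigma_x\sigma_y = 7 \qquad \sigma_w^2 = \sigma_x^2 + \sigma_y^2 - 2r_{xy}\sigma_x\sigma_y = 3$
$E\{zw\} = E\{x^2 - y^2\} = (100 + 4) - 1 = 103$
$r_{zw} = \frac{E\{zw\} - E\{z\}E\{w\}}{\sigma_z\sigma_w} = \frac{3}{\sqrt{7 \times 3}}$
As we know [see (6-66)], the RVs z and w are jointly normal because they are
linearly dependent on x and y. Hence their joint density is
$N(10, 10; \sqrt{7}, \sqrt{3}; \sqrt{3/7})$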
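The numbers in Example 7-2 can be reproduced with a small routine for the first- and second-order moments of z = ax + by, w = cx + dy. This is a sketch under the linearity rule (7-4); the function name and argument order are assumptions.

```python
import math

def linear_pair_moments(eta_x, eta_y, sig_x, sig_y, r, a, b, c, d):
    """Means, variances and correlation of z = a*x + b*y, w = c*x + d*y."""
    cov_xy = r * sig_x * sig_y
    eta_z = a * eta_x + b * eta_y
    eta_w = c * eta_x + d * eta_y
    var_z = a * a * sig_x ** 2 + 2 * a * b * cov_xy + b * b * sig_y ** 2
    var_w = c * c * sig_x ** 2 + 2 * c * d * cov_xy + d * d * sig_y ** 2
    cov_zw = a * c * sig_x ** 2 + (a * d + b * c) * cov_xy + b * d * sig_y ** 2
    r_zw = cov_zw / math.sqrt(var_z * var_w)
    return eta_z, eta_w, var_z, var_w, r_zw
```

Calling it with the parameters of the example (10, 0, 2, 1, 0.5) and a = b = c = 1, d = -1 gives variances 7 and 3 and a correlation coefficient equal to the square root of 3/7.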
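Estimate of the mean of g(x, y). If the function g(x, y) is sufficiently smooth
near the point $(\eta_x, \eta_y)$, then the mean $\eta_g$ and variance $\sigma_g^2$ of g(x, y) can be
estimated in terms of the mean, variance, and covariance of x and y:
$\eta_g \simeq g + \frac{1}{2}\left(\frac{\partial^2 g}{\partial x^2}\sigma_x^2 + 2\frac{\partial^2 g}{\partial x\,\partial y}r\sigma_x\sigma_y + \frac{\partial^2 g}{\partial y^2}\sigma_y^2\right)$    (7-20)
$\sigma_g^2 \simeq \left(\frac{\partial g}{\partial x}\right)^2\sigma_x^2 + 2\frac{\partial g}{\partial x}\frac{\partial g}{\partial y}r\sigma_x\sigma_y + \left(\frac{\partial g}{\partial y}\right)^2\sigma_y^2$    (7-21)
where the function g(x, y) and its derivatives are evaluated at $x = \eta_x$ and
$y = \eta_y$.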
Proof. We expand g(x, y) into a series about the point $(\eta_x, \eta_y)$:
$g(x, y) = g(\eta_x, \eta_y) + (x - \eta_x)\frac{\partial g}{\partial x} + (y - \eta_y)\frac{\partial g}{\partial y} + \cdots$    (7-22)
Inserting the above into (7-2), we obtain the moment expansion of $E\{g(x, y)\}$ in
terms of the derivatives of g(x, y) at $(\eta_x, \eta_y)$ and the joint moments $\mu_{kr}$ of x
and y. Using only the first five terms in (7-22), we obtain (7-20). Equation (7-21)
follows if we apply (7-20) to the function $[g(x, y) - \eta_g]^2$ and neglect moments
of order higher than 2.
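The estimates (7-20) and (7-21) can be sketched with central finite differences standing in for the partial derivatives. The step size h and the function name are assumptions; for g(x, y) = xy the mean estimate is exact, because $E\{xy\} = \eta_x\eta_y + r\sigma_x\sigma_y$.

```python
def estimate_mean_var(g, eta_x, eta_y, sig_x, sig_y, r, h=1e-3):
    """Approximate mean (7-20) and variance (7-21) of g(x, y)."""
    x0, y0 = eta_x, eta_y
    g0 = g(x0, y0)
    # First derivatives by central differences
    gx = (g(x0 + h, y0) - g(x0 - h, y0)) / (2 * h)
    gy = (g(x0, y0 + h) - g(x0, y0 - h)) / (2 * h)
    # Second derivatives by central differences
    gxx = (g(x0 + h, y0) - 2 * g0 + g(x0 - h, y0)) / (h * h)
    gyy = (g(x0, y0 + h) - 2 * g0 + g(x0, y0 - h)) / (h * h)
    gxy = (g(x0 + h, y0 + h) - g(x0 + h, y0 - h)
           - g(x0 - h, y0 + h) + g(x0 - h, y0 - h)) / (4 * h * h)
    c = r * sig_x * sig_y
    mean = g0 + 0.5 * (gxx * sig_x ** 2 + 2 * gxy * c + gyy * sig_y ** 2)
    var = gx ** 2 * sig_x ** 2 + 2 * gx * gy * c + gy ** 2 * sig_y ** 2
    return mean, var
```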
7-2 JOINT CHARACTERISTIC FUNCTIONS
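The joint characteristic function of the RVs x and y is by definition the integral
$\Phi(\omega_1, \omega_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,e^{j(\omega_1 x + \omega_2 y)}\,dx\,dy$    (7-23)
From the above and the two-dimensional inversion formula for Fourier transforms,
it follows that
$f(x, y) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Phi(\omega_1, \omega_2)\,e^{-j(\omega_1 x + \omega_2 y)}\,d\omega_1\,d\omega_2$    (7-24)
Clearly,
$\Phi(\omega_1, \omega_2) = E\{e^{j(\omega_1 x + \omega_2 y)}\}$    (7-25)
The logarithm
$\Psi(\omega_1, \omega_2) = \ln\Phi(\omega_1, \omega_2)$    (7-26)
of $\Phi(\omega_1, \omega_2)$ is the joint second characteristic function of x and y.
The marginal characteristic functions
$\Phi_x(\omega) = E\{e^{j\omega x}\} \qquad \Phi_y(\omega) = E\{e^{j\omega y}\}$    (7-27)
of x and y can be expressed in terms of their joint characteristic function
$\Phi(\omega_1, \omega_2)$. From (7-25) and (7-27) it follows that
$\Phi_x(\omega) = \Phi(\omega, 0) \qquad \Phi_y(\omega) = \Phi(0, \omega)$    (7-28)
We note that, if z = ax + by, then
$\Phi_z(\omega) = E\{e^{j(ax + by)\omega}\} = \Phi(a\omega, b\omega)$    (7-29)
Hence $\Phi_z(1) = \Phi(a, b)$.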
Cramér-Wold theorem The above shows that if $\Phi_z(\omega)$ is known for every
a and b, then $\Phi(\omega_1, \omega_2)$ is uniquely determined. In other words, if the density
of ax + by is known for every a and b, then the joint density f(x, y) of x and y
is uniquely determined.
Independence and convolution. If the RVs x and y are independent, then [see
(7-15)]
$E\{e^{j(\omega_1 x + \omega_2 y)}\} = E\{e^{j\omega_1 x}\}E\{e^{j\omega_2 y}\}$
From this it follows that
$\Phi(\omega_1, \omega_2) = \Phi_x(\omega_1)\Phi_y(\omega_2)$    (7-30)
Conversely, if (7-30) is true, then the RVs x and y are independent.
Indeed, inserting (7-30) into the inversion formula (7-24) and using (5-66), we
conclude that $f(x, y) = f_x(x)f_y(y)$.
Convolution theorem If the RVs x and y are independent and z = x + y,
then
$E\{e^{j\omega z}\} = E\{e^{j\omega(x + y)}\} = E\{e^{j\omega x}\}E\{e^{j\omega y}\}$
Hence
$\Phi_z(\omega) = \Phi_x(\omega)\Phi_y(\omega) \qquad \Psi_z(\omega) = \Psi_x(\omega) + \Psi_y(\omega)$    (7-31)
As we know [see (6-39)], the density of z equals the convolution of $f_x(x)$
and $f_y(y)$. From this and (7-31) it follows that the characteristic function of the
convolution of two densities equals the product of their characteristic functions.
Example 7-3. We shall show that if the RVs x and y are independent and Poisson
distributed with parameters a and b respectively, then their sum z = x + y is also
Poisson distributed with parameter a + b.
Proof. As we know (see Example 5-30)
$\Psi_x(\omega) = a(e^{j\omega} - 1) \qquad \Psi_y(\omega) = b(e^{j\omega} - 1)$
Hence
$\Psi_z(\omega) = \Psi_x(\omega) + \Psi_y(\omega) = (a + b)(e^{j\omega} - 1)$
It can be shown that the converse is also true: If the RVs x and y are
independent and their sum is Poisson distributed, then x and y are also Poisson
distributed. The proof of this difficult theorem will not be given.
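The direct statement of Example 7-3 can also be checked without characteristic functions, by convolving the two Poisson point masses. The sketch below uses illustrative parameter values and function names.

```python
import math

def poisson_pmf(lam, n):
    """P{x = n} for a Poisson RV with parameter lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

def poisson_sum_pmf(a, b, n):
    """P{x + y = n} by discrete convolution of the two Poisson pmfs."""
    return sum(poisson_pmf(a, k) * poisson_pmf(b, n - k) for k in range(n + 1))
```

The agreement with a Poisson pmf of parameter a + b is a consequence of the binomial theorem applied to $(a + b)^n$.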
Example 7-4. It was shown in Sec. 6-3 that if the RVs x and y are jointly normal,
then the sum ax + by is also normal. In the following we reestablish a special case
of the above using (7-30): If x and y are independent and normal, then their sum
z = x + y is also normal.
Proof. In this case [see (5-65)]
$\Psi_x(\omega) = j\eta_x\omega - \frac{1}{2}\sigma_x^2\omega^2 \qquad \Psi_y(\omega) = j\eta_y\omega - \frac{1}{2}\sigma_y^2\omega^2$
Hence
$\Psi_z(\omega) = j(\eta_x + \eta_y)\omega - \frac{1}{2}(\sigma_x^2 + \sigma_y^2)\omega^2$
It can be shown that the converse is also true (Cramér theorem): If the RVs x and
y are independent and their sum is normal, then they are also normal. The proof of
this difficult theorem will not be given.†
†E. Lukacs: Characteristic Functions, Hafner Publishing Co., New York, 1960.
Normal RVs. We shall show that the joint characteristic function of two jointly
normal RVs is given by
$\Phi(\omega_1, \omega_2) = \exp\left\{j(\eta_1\omega_1 + \eta_2\omega_2) - \frac{1}{2}(\sigma_1^2\omega_1^2 + 2r\sigma_1\sigma_2\omega_1\omega_2 + \sigma_2^2\omega_2^2)\right\}$    (7-32)
Proof. This can be derived by inserting f(x, y) into (7-23). The following
simpler proof is based on the fact that the RV $z = \omega_1 x + \omega_2 y$ is normal and
$\Phi_z(\omega) = e^{j\eta_z\omega - \frac{1}{2}\sigma_z^2\omega^2}$    (7-33)
Since
$\eta_z = \omega_1\eta_1 + \omega_2\eta_2 \qquad \sigma_z^2 = \omega_1^2\sigma_1^2 + 2r\omega_1\omega_2\sigma_1\sigma_2 + \omega_2^2\sigma_2^2$
and $\Phi_z(\omega) = \Phi(\omega_1\omega, \omega_2\omega)$, (7-32) follows from (7-33) with $\omega = 1$.
The above proof is based on the fact that the RV $z = \omega_1 x + \omega_2 y$ is normal
for any $\omega_1$ and $\omega_2$; this leads to the following conclusion: If it is known that the
sum ax + by is normal for every a and b, then the RVs x and y are jointly normal.
We should stress, however, that this is not true if ax + by is normal for only a
finite set of values of a and b. A counterexample can be formed by a simple
extension of the construction in Fig. 7-2.
Example 7-5. We shall construct two RVs $x_1$ and $y_1$ with the following properties:
$x_1$, $y_1$, and $x_1 + y_1$ are normal but $x_1$ and $y_1$ are not jointly normal.
Suppose that x and y are two jointly normal RVs with mass density f(x, y).
Adding and subtracting small masses in the region D of Fig. 7-2 consisting of eight
circles as shown, we obtain a new function $f_1(x, y)$ such that $f_1(x, y) = f(x, y) \pm \varepsilon$
in D and $f_1(x, y) = f(x, y)$ everywhere else. The function $f_1(x, y)$ is a density;
hence it defines two new RVs $x_1$ and $y_1$. These RVs are obviously not jointly
normal. However, they are marginally normal because x and y are marginally
normal and the masses in any vertical or horizontal strip have not changed.
Furthermore, the RV $z_1 = x_1 + y_1$ is also normal because z = x + y is normal and
the masses in any diagonal strip of the form $z \le x + y \le z + dz$ have not
changed.
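Moment theorem. The moment generating function of x and y is given by
$\Phi(s_1, s_2) = E\{e^{s_1 x + s_2 y}\}$
Expanding the exponential and using the linearity of expected values, we obtain
the series
$\Phi(s_1, s_2) = \sum_{n=0}^{\infty}\frac{1}{n!}\sum_{k=0}^{n}\binom{n}{k}E\{x^k y^{n-k}\}\,s_1^k s_2^{n-k}$
$= 1 + m_{10}s_1 + m_{01}s_2 + \frac{1}{2}(m_{20}s_1^2 + 2m_{11}s_1s_2 + m_{02}s_2^2) + \cdots$    (7-34)
From this it follows that
$\frac{\partial^k\partial^r}{\partial s_1^k\,\partial s_2^r}\Phi(0, 0) = m_{kr}$    (7-35)
The derivatives of the function $\Psi(s_1, s_2) = \ln\Phi(s_1, s_2)$ are by definition
the joint cumulants $\lambda_{kr}$ of x and y. It can be shown that
$\lambda_{10} = m_{10} \qquad \lambda_{01} = m_{01} \qquad \lambda_{20} = \mu_{20} \qquad \lambda_{02} = \mu_{02} \qquad \lambda_{11} = \mu_{11}$
Hence
$\Psi(s_1, s_2) = \eta_1 s_1 + \eta_2 s_2 + \frac{1}{2}(\sigma_1^2 s_1^2 + 2r\sigma_1\sigma_2 s_1 s_2 + \sigma_2^2 s_2^2) + \cdots$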
Example 7-6a. Using (7-34), we shall show that if the RVs x and y are jointly
normal with zero mean, then
$E\{x^2y^2\} = E\{x^2\}E\{y^2\} + 2E^2\{xy\}$    (7-36)
Proof. As we see from (7-32) with $j\omega_1 = s_1$ and $j\omega_2 = s_2$,
$\Phi(s_1, s_2) = e^{A} \qquad A = \frac{1}{2}(\sigma_1^2 s_1^2 + 2Cs_1s_2 + \sigma_2^2 s_2^2)$
where $C = E\{xy\} = r\sigma_1\sigma_2$. To prove (7-36), we shall equate the coefficient
$\frac{1}{4!}\binom{4}{2}E\{x^2y^2\}$
of $s_1^2 s_2^2$ in (7-34) with the corresponding coefficient of the expansion of $e^{A}$. In this
expansion, the factors $s_1^2 s_2^2$ appear only in the terms
$\frac{A^2}{2} = \frac{1}{8}(\sigma_1^2 s_1^2 + 2Cs_1s_2 + \sigma_2^2 s_2^2)^2$
Hence
$\frac{1}{4!}\binom{4}{2}E\{x^2y^2\} = \frac{1}{8}(2\sigma_1^2\sigma_2^2 + 4C^2)$
and (7-36) results.
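Equation (7-36) lends itself to a simulation check. The sketch below builds a zero-mean jointly normal pair from two independent standard normal draws; the sample size, seed, and function names are assumptions, and the comparison is only statistical.

```python
import math
import random

def fourth_moment_estimate(sig1, sig2, r, n=200_000, seed=2):
    """Monte Carlo estimate of E{x^2 y^2} for zero-mean jointly normal x, y."""
    rng = random.Random(seed)
    s = math.sqrt(1 - r * r)
    total = 0.0
    for _ in range(n):
        u = rng.gauss(0, 1)
        v = rng.gauss(0, 1)
        x = sig1 * u
        y = sig2 * (r * u + s * v)   # corr(x, y) = r
        total += x * x * y * y
    return total / n

def fourth_moment_formula(sig1, sig2, r):
    """Right side of (7-36): E{x^2}E{y^2} + 2 E^2{xy}."""
    c = r * sig1 * sig2
    return sig1 ** 2 * sig2 ** 2 + 2 * c * c
```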
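Price's theorem.† Given two jointly normal RVs x and y, we form the mean
$I = E\{g(x, y)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$    (7-37a)
of some function g(x, y) of (x, y). The above integral is a function $I(\mu)$ of the
covariance $\mu$ of the RVs x and y and of four parameters specifying the joint
density f(x, y) of x and y. We shall show that if $g(x, y)f(x, y) \to 0$ as
$(x, y) \to \infty$, then
$\frac{\partial^n I(\mu)}{\partial\mu^n} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{\partial^{2n} g(x, y)}{\partial x^n\,\partial y^n}\,f(x, y)\,dx\,dy = E\left\{\frac{\partial^{2n} g(x, y)}{\partial x^n\,\partial y^n}\right\}$    (7-37b)
Proof. Inserting (7-24) into (7-37a) and differentiating with respect to $\mu$, we
obtain
$\frac{\partial^n I(\mu)}{\partial\mu^n} = \frac{(-1)^n}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\omega_1^n\omega_2^n\,\Phi(\omega_1, \omega_2)\,e^{-j(\omega_1 x + \omega_2 y)}\,d\omega_1\,d\omega_2\,dx\,dy$
From this and the derivative theorem, it follows that
$\frac{\partial^n I(\mu)}{\partial\mu^n} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,\frac{\partial^{2n} f(x, y)}{\partial x^n\,\partial y^n}\,dx\,dy$
Integrating by parts and using the condition at $\infty$, we obtain (7-37b) (see also
Prob. 5-31).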
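Example 7-6b. Using Price's theorem, we shall rederive (7-36). Setting $g(x, y) =
x^2y^2$ into (7-37b), we conclude with n = 1 that
$\frac{\partial I(\mu)}{\partial\mu} = E\left\{\frac{\partial^2 g(x, y)}{\partial x\,\partial y}\right\} = 4E\{xy\} = 4\mu \qquad I(\mu) = \frac{4\mu^2}{2} + I(0)$
If $\mu = 0$, the RVs x and y are independent; hence $I(0) = E\{x^2y^2\} = E\{x^2\}E\{y^2\}$
and (7-36) results.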
†R. Price, "A Useful Theorem for Nonlinear Devices Having Gaussian Inputs," IRE, PGIT, Vol.
IT-4, 1958. See also A. Papoulis, "On an Extension of Price's Theorem," IEEE Transactions on
Information Theory, Vol. IT-11, 1965.
7-3 CONDITIONAL DISTRIBUTIONS
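As we have noted, conditional distributions can be expressed as conditional
probabilities:
$F_z(z \mid \mathcal{M}) = P\{\mathbf{z} \le z \mid \mathcal{M}\} = \frac{P\{\mathbf{z} \le z, \mathcal{M}\}}{P(\mathcal{M})}$
$F_{zw}(z, w \mid \mathcal{M}) = P\{\mathbf{z} \le z, \mathbf{w} \le w \mid \mathcal{M}\} = \frac{P\{\mathbf{z} \le z, \mathbf{w} \le w, \mathcal{M}\}}{P(\mathcal{M})}$    (7-38)
The corresponding densities are obtained by appropriate differentiations. In this
section, we evaluate these functions for various special cases.
Example 7-7. We shall first determine the conditional distribution $F_y(y \mid \mathbf{x} \le x)$
and density $f_y(y \mid \mathbf{x} \le x)$.
With $\mathcal{M} = \{\mathbf{x} \le x\}$, (7-38) yields
$F_y(y \mid \mathbf{x} \le x) = \frac{P\{\mathbf{x} \le x, \mathbf{y} \le y\}}{P\{\mathbf{x} \le x\}} = \frac{F(x, y)}{F_x(x)}$
$f_y(y \mid \mathbf{x} \le x) = \frac{\partial F(x, y)/\partial y}{F_x(x)}$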
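Example 7-8. We shall next determine the conditional distribution $F(x, y \mid \mathcal{M})$ for
$\mathcal{M} = \{x_1 < \mathbf{x} \le x_2\}$. In this case, $F(x, y \mid \mathcal{M})$ is given by
$F(x, y \mid x_1 < \mathbf{x} \le x_2) = \frac{P\{\mathbf{x} \le x, \mathbf{y} \le y, x_1 < \mathbf{x} \le x_2\}}{P\{x_1 < \mathbf{x} \le x_2\}} = \begin{cases} \dfrac{F(x_2, y) - F(x_1, y)}{F_x(x_2) - F_x(x_1)} & x > x_2 \\ \dfrac{F(x, y) - F(x_1, y)}{F_x(x_2) - F_x(x_1)} & x_1 < x \le x_2 \end{cases}$
and it equals 0 for $x \le x_1$. Since $f = \partial^2 F/\partial x\,\partial y$, the above yields
$f(x, y \mid x_1 < \mathbf{x} \le x_2) = \frac{f(x, y)}{F_x(x_2) - F_x(x_1)} \qquad x_1 < x \le x_2$    (7-39)
and 0 otherwise.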
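The determination of the conditional density of y assuming $\mathbf{x} = x$ is of
particular interest. This density cannot be derived directly from (7-38) because,
in general, the event $\{\mathbf{x} = x\}$ has zero probability. It can, however, be defined as
a limit. Suppose first that
$\mathcal{M} = \{x_1 < \mathbf{x} \le x_2\}$
In this case, (7-38) yields
$F_y(y \mid x_1 < \mathbf{x} \le x_2) = \frac{P\{x_1 < \mathbf{x} \le x_2, \mathbf{y} \le y\}}{P\{x_1 < \mathbf{x} \le x_2\}} = \frac{F(x_2, y) - F(x_1, y)}{F_x(x_2) - F_x(x_1)}$
Differentiating with respect to y, we obtain
$f_y(y \mid x_1 < \mathbf{x} \le x_2) = \frac{\int_{x_1}^{x_2} f(x, y)\,dx}{F_x(x_2) - F_x(x_1)}$    (7-40)
because [see (6-6)]
$\frac{\partial F(x, y)}{\partial y} = \int_{-\infty}^{x} f(\alpha, y)\,d\alpha$
To determine $f_y(y \mid \mathbf{x} = x)$, we set $x_1 = x$ and $x_2 = x + \Delta x$ in (7-40). This
yields
$f_y(y \mid x < \mathbf{x} \le x + \Delta x) = \frac{\int_x^{x + \Delta x} f(\alpha, y)\,d\alpha}{F_x(x + \Delta x) - F_x(x)} \simeq \frac{f(x, y)\,\Delta x}{f_x(x)\,\Delta x}$
Hence
$f_y(y \mid \mathbf{x} = x) = \lim_{\Delta x \to 0} f_y(y \mid x < \mathbf{x} \le x + \Delta x) = \frac{f(x, y)}{f_x(x)}$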
If there is no fear of ambiguity, the function $f_y(y \mid \mathbf{x} = x)$ will be written in
the form f(y|x). Defining f(x|y) similarly, we obtain
$f(y|x) = \frac{f(x, y)}{f(x)} \qquad f(x|y) = \frac{f(x, y)}{f(y)}$    (7-41)
If the RVs x and y are independent, then
$f(x, y) = f(x)f(y) \qquad f(y|x) = f(y) \qquad f(x|y) = f(x)$
Notes 1. For a specific x, the function f(x, y) is a profile of f(x, y); that is, it equals the
intersection of the surface f(x, y) by the plane x = constant. The conditional density
f(y|x) is the equation of this curve normalized by the factor 1/f(x) so as to make its
area 1. The function f(x|y) has a similar interpretation: It is the normalized equation of
the intersection of the surface f(x, y) by the plane y = constant.
2. As we know, the product $f_y(y)\,dy$ equals the probability of the event $\{y < \mathbf{y} \le y
+ dy\}$. Extending this to conditional probabilities, we obtain
$f_y(y \mid x_1 < \mathbf{x} \le x_2)\,dy = \frac{P\{x_1 < \mathbf{x} \le x_2,\; y < \mathbf{y} \le y + dy\}}{P\{x_1 < \mathbf{x} \le x_2\}}$
This equals the mass in the rectangle of Fig. 7-3a divided by the mass in the vertical strip
$x_1 < x \le x_2$. Similarly, the product f(y|x) dy equals the ratio of the mass in the
differential rectangle dx dy of Fig. 7-3b over the mass in the vertical strip (x, x + dx).
FIGURE 7-3 (a) (b)
3. The joint statistics of x and y are determined in terms of their joint density
f(x, y). Since
$f(x, y) = f(y|x)f(x)$
we conclude that they are also determined in terms of the marginal density f(x) and the
conditional density f(y|x).
Example 7-9. We shall show that, if the RVs x and y are jointly normal with zero
mean as in (6-44), then
$f(y|x) = \frac{1}{\sigma_2\sqrt{2\pi(1 - r^2)}}\exp\left\{-\frac{(y - r\sigma_2 x/\sigma_1)^2}{2\sigma_2^2(1 - r^2)}\right\}$    (7-42)
Proof. The exponent in (6-44) equals
$-\frac{(y - r\sigma_2 x/\sigma_1)^2}{2\sigma_2^2(1 - r^2)} - \frac{x^2}{2\sigma_1^2}$
Division by f(x) removes the term $-x^2/2\sigma_1^2$ and (7-42) results.
The same reasoning leads to the conclusion that if x and y are jointly normal
with $E\{x\} = \eta_1$ and $E\{y\} = \eta_2$, then f(y|x) is given by (7-42) if y and x are
replaced by $y - \eta_2$ and $x - \eta_1$ respectively. In other words, for a given x, f(y|x)
is a normal density with mean $\eta_2 + r\sigma_2(x - \eta_1)/\sigma_1$ and variance $\sigma_2^2(1 - r^2)$.
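The identity f(y|x) = f(x, y)/f(x) behind (7-42) can be verified pointwise for the zero-mean case. This is a minimal sketch: the joint density is written in the standard bivariate normal form that (6-44) is assumed to have, and the function names are illustrative.

```python
import math

def joint_normal_zero_mean(x, y, s1, s2, r):
    """Zero-mean bivariate normal density with parameters s1, s2, r."""
    q = (x * x / s1 ** 2 - 2 * r * x * y / (s1 * s2) + y * y / s2 ** 2) / (1 - r * r)
    return math.exp(-q / 2) / (2 * math.pi * s1 * s2 * math.sqrt(1 - r * r))

def normal_density(t, mean, sigma):
    """One-dimensional normal density."""
    return math.exp(-(t - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def conditional_density(y, x, s1, s2, r):
    """(7-42): normal with mean r*s2*x/s1 and variance s2^2 (1 - r^2)."""
    return normal_density(y, r * s2 * x / s1, s2 * math.sqrt(1 - r * r))
```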
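Bayes’ theorem and total probability. From (7-41) it follows that
$$f(x \mid y) = \frac{f(y \mid x)\,f(x)}{f(y)} \tag{7-43}$$
This is the density version of (2-38).
The denominator f(y) can be expressed in terms of f(y|x) and f(x).
Since
$$f(y) = \int_{-\infty}^{\infty} f(x, y)\,dx \qquad \text{and} \qquad f(x, y) = f(y \mid x)\,f(x)$$
we conclude that (total probability)
$$f(y) = \int_{-\infty}^{\infty} f(y \mid x)\,f(x)\,dx \tag{7-44}$$
Inserting into (7-43), we obtain Bayes’ theorem for densities
$$f(x \mid y) = \frac{f(y \mid x)\,f(x)}{\int_{-\infty}^{\infty} f(y \mid x)\,f(x)\,dx} \tag{7-45}$$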
Note As (7-44) shows, to remove the condition $\mathbf{x} = x$ from the conditional density
f(y|x), we multiply by the density f(x) of x and integrate the product.
Discrete type. Suppose that the RVs x and y are of discrete type
$$P\{\mathbf{x} = x_i\} = p_i \qquad P\{\mathbf{y} = y_k\} = q_k$$
$$P\{\mathbf{x} = x_i, \mathbf{y} = y_k\} = p_{ik} \qquad i = 1, \ldots, M \qquad k = 1, \ldots, N$$
where [see (6-21)]
$$p_i = \sum_k p_{ik} \qquad q_k = \sum_i p_{ik}$$
From the above and (2-29) it follows that
$$P\{\mathbf{y} = y_k \mid \mathbf{x} = x_i\} = \frac{P\{\mathbf{x} = x_i, \mathbf{y} = y_k\}}{P\{\mathbf{x} = x_i\}} = \frac{p_{ik}}{p_i}$$
Markoff matrix We denote by $\pi_{ik}$ the above conditional probabilities
$$P\{\mathbf{y} = y_k \mid \mathbf{x} = x_i\} = \pi_{ik}$$
and by $\Pi$ the M × N matrix whose elements are $\pi_{ik}$. Clearly,
$$\pi_{ik} = \frac{p_{ik}}{p_i} \tag{7-46}$$
Hence
$$\pi_{ik} \ge 0 \qquad \sum_k \pi_{ik} = 1 \tag{7-47}$$
Thus the elements of the matrix $\Pi$ are nonnegative and the sum on each row equals
1. Such a matrix is called Markoff. The conditional probabilities
$$P\{\mathbf{x} = x_i \mid \mathbf{y} = y_k\} = \pi^{ki} = \frac{p_{ik}}{q_k}$$
are the elements of an N × M Markoff matrix.
If the RVs x and y are independent, then
$$p_{ik} = p_i q_k \qquad \pi_{ik} = q_k \qquad \pi^{ki} = p_i$$
We note that
$$\pi^{ki} = \frac{\pi_{ik}\,p_i}{q_k} \qquad q_k = \sum_i \pi_{ik}\,p_i \tag{7-48}$$
These equations are the discrete versions of Eqs. (7-43) and (7-44).
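As a small illustration of the discrete case, the sketch below builds both Markoff matrices from a joint probability table and checks the row sums, the total-probability relation, and the discrete Bayes relation. The table entries are hypothetical values chosen for this example; exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction as Fr

# Hypothetical joint probabilities p_ik = P{x = x_i, y = y_k} with M = 2, N = 3.
p = [[Fr(1, 10), Fr(2, 10), Fr(1, 10)],
     [Fr(3, 10), Fr(1, 10), Fr(2, 10)]]

M, N = len(p), len(p[0])
p_i = [sum(p[i]) for i in range(M)]                        # marginal of x
q_k = [sum(p[i][k] for i in range(M)) for k in range(N)]   # marginal of y

# pi[i][k] = P{y = y_k | x = x_i}: an M x N Markoff matrix, as in (7-46).
pi = [[p[i][k] / p_i[i] for k in range(N)] for i in range(M)]

# pi_rev[k][i] = P{x = x_i | y = y_k}: an N x M Markoff matrix.
pi_rev = [[p[i][k] / q_k[k] for i in range(M)] for k in range(N)]

# Each row of a Markoff matrix sums to 1, as in (7-47).
row_sums = [sum(row) for row in pi]

# Total probability: q_k = sum_i pi_ik p_i.
q_from_pi = [sum(pi[i][k] * p_i[i] for i in range(M)) for k in range(N)]

# Discrete Bayes relation: P{x_i | y_k} = pi_ik p_i / q_k.
bayes = [[pi[i][k] * p_i[i] / q_k[k] for i in range(M)] for k in range(N)]

print(row_sums)
print(q_from_pi == q_k, bayes == pi_rev)
```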
System Reliability
We shall use the term system to identify a physical device used to perform a
certain function. The device might be a simple element, a light bulb, for
example, or a more complicated structure. We shall call the time interval from
the moment the system is put into operation until it fails the time to failure. This
interval is, in general, random. It specifies, therefore, an RV $\mathbf{x} \ge 0$. The
distribution $F(t) = P\{\mathbf{x} \le t\}$ of this RV is the probability that the system fails
prior to time t where we assume that t = 0 is the moment the system is put into
operation. The difference
$$R(t) = 1 - F(t) = P\{\mathbf{x} > t\}$$
is the system reliability. It equals the probability that the system functions at
time t.
The mean time to failure of a system is the mean of x. Since F(x) = 0 for
x < 0, we conclude from (5-27) that
$$E\{\mathbf{x}\} = \int_0^{\infty} x f(x)\,dx = \int_0^{\infty} R(t)\,dt \tag{7-49}$$
The probability that a system functioning at time t fails prior to time x > t
equals
$$F(x \mid \mathbf{x} > t) = \frac{P\{\mathbf{x} \le x, \mathbf{x} > t\}}{P\{\mathbf{x} > t\}} = \frac{F(x) - F(t)}{1 - F(t)} \tag{7-50}$$
Differentiating with respect to x, we obtain
$$f(x \mid \mathbf{x} > t) = \frac{f(x)}{1 - F(t)} \qquad x > t \tag{7-51}$$
The product $f(x \mid \mathbf{x} > t)\,dx$ equals the probability that the system fails in the
interval (x, x + dx), assuming that it functions at time t.
Example 7-10. If $f(x) = ce^{-cx}$, then $F(t) = 1 - e^{-ct}$ and (7-51) yields
$$f(x \mid \mathbf{x} > t) = \frac{ce^{-cx}}{e^{-ct}} = f(x - t)$$
This shows that the probability that a system functioning at time t fails in the
interval (x, x + dx) depends only on the difference x - t (Fig. 7-4). We show later
that this is true only if f(x) is an exponential density.
FIGURE 7-4
Conditional failure rate. The conditional density $f(x \mid \mathbf{x} > t)$ is a function of x
and t. Its value at x = t is a function only of t. This function is denoted by β(t)
and is called the conditional failure rate or the hazard rate of the system. From
(7-51) and the definition it follows that
$$\beta(t) = f(t \mid \mathbf{x} > t) = \frac{f(t)}{1 - F(t)} \tag{7-52}$$
The product β(t) dt is the probability that a system functioning at time t fails in
the interval (t, t + dt). In Sec. 8-1 (Example 8-3) we interpret the function β(t)
as the expected failure rate.
Example 7-11. (a) If $f(x) = ce^{-cx}$, then $F(t) = 1 - e^{-ct}$ and
$$\beta(t) = \frac{ce^{-ct}}{e^{-ct}} = c$$
(b) If $f(x) = c^2 x e^{-cx}$, then $F(x) = 1 - cxe^{-cx} - e^{-cx}$ and
$$\beta(t) = \frac{c^2 t e^{-ct}}{cte^{-ct} + e^{-ct}} = \frac{c^2 t}{1 + ct}$$
From (7-52) it follows that
$$\beta(t) = \frac{F'(t)}{1 - F(t)} = -\frac{R'(t)}{R(t)}$$
We shall use this relationship to express the distribution of x in terms of the
function β(t). Integrating from 0 to x and using the fact that ln R(0) = 0, we
obtain
$$-\int_0^x \beta(t)\,dt = \ln R(x)$$
Hence
$$R(x) = 1 - F(x) = \exp\left\{-\int_0^x \beta(t)\,dt\right\} \tag{7-53}$$
And since f(x) = F'(x), this yields
$$f(x) = \beta(x)\exp\left\{-\int_0^x \beta(t)\,dt\right\} \tag{7-54}$$
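Relations (7-53) and (7-54) lend themselves to a direct numerical check. The sketch below is an illustration only: the trapezoidal integrator, the function names, and the value c = 2 are assumptions. It recovers the reliability and the density from the failure rate of Example 7-11(b) and compares them with the closed forms $(1 + cx)e^{-cx}$ and $c^2xe^{-cx}$.

```python
import math

def integrate(fn, a, b, n=2000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (fn(a) + fn(b))
    for j in range(1, n):
        total += fn(a + j * h)
    return total * h

def reliability(beta, x):
    """R(x) = exp(-integral of beta from 0 to x), as in (7-53)."""
    return math.exp(-integrate(beta, 0.0, x))

def density(beta, x):
    """f(x) = beta(x) R(x), as in (7-54)."""
    return beta(x) * reliability(beta, x)

C = 2.0  # assumed parameter value

def beta_b(t):
    """Conditional failure rate of Example 7-11(b)."""
    return C * C * t / (1.0 + C * t)

x = 1.5
print(reliability(beta_b, x), (1.0 + C * x) * math.exp(-C * x))
print(density(beta_b, x), C * C * x * math.exp(-C * x))
```

Each printed pair should agree up to the discretization error of the integrator.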
Example 7-12. A system is called memoryless if the probability that it fails in an
interval (t, x), assuming that it functions at time t, depends only on the length of
this interval. In other words, if the system works a week, a month, or a year after it
was put into operation, it is as good as new. This is equivalent to the assumption
that $f(x \mid \mathbf{x} > t) = f(x - t)$ as in Fig. 7-4. From this and (7-52) it follows that with
x = t:
$$\beta(t) = f(t \mid \mathbf{x} > t) = f(t - t) = f(0) = c$$
and (7-54) yields $f(x) = ce^{-cx}$. Thus a system is memoryless iff x has an exponential
density.
Example 7-13. A special form of β(t) of particular interest in reliability theory is
the function
$$\beta(t) = ct^{b-1}$$
This is a satisfactory approximation of a variety of failure rates, at least near the
origin. The corresponding f(x) is obtained from (7-54):
$$f(x) = cx^{b-1}\exp\left\{-\frac{cx^b}{b}\right\} \tag{7-55}$$
This function is called the Weibull density.
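A brief numerical sketch of the Weibull case follows. It is not from the original text, and the values of c and b and the helper names are assumptions. It verifies that the density (7-55) has unit area and that the ratio f(t)/R(t) returns the failure rate $ct^{b-1}$.

```python
import math

def weibull_pdf(x, c, b):
    """Weibull density (7-55): f(x) = c x^(b-1) exp(-c x^b / b) for x > 0."""
    return c * x ** (b - 1) * math.exp(-c * x ** b / b)

def weibull_reliability(x, c, b):
    """R(x) = exp(-c x^b / b), obtained from (7-53) with beta(t) = c t^(b-1)."""
    return math.exp(-c * x ** b / b)

def hazard(x, c, b):
    """Conditional failure rate f(x) / R(x), as in (7-52)."""
    return weibull_pdf(x, c, b) / weibull_reliability(x, c, b)

def area(c, b, upper=10.0, steps=20000):
    """Midpoint-rule approximation of the area under the density on (0, upper)."""
    h = upper / steps
    return h * sum(weibull_pdf((j + 0.5) * h, c, b) for j in range(steps))

C, B = 1.5, 2.0  # assumed parameter values
for t in (0.5, 1.0, 2.0):
    print(t, hazard(t, C, B), C * t ** (B - 1))
print(area(C, B))
```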
We conclude with the observation that the function β(t) equals the value
of the conditional density $f(x \mid \mathbf{x} > t)$ for x = t; however, β(t) is not a density
because its area is not one. In fact its area is infinite. This follows from (7-53)
because $R(\infty) = 1 - F(\infty) = 0$.
Interconnection of systems. We are given two systems $S_1$ and $S_2$ with times to
failure x and y respectively, and we connect them in parallel or in series or in
standby as in Fig. 7-5, forming a new system S. We shall express the properties
of S in terms of the joint distribution of the RVs x and y.
FIGURE 7-5
Parallel. We say that the two systems are connected in parallel if S fails when
both systems fail. Denoting by z the time to failure of S, we conclude that z = t
when the larger of the numbers x and y equals t. Hence [see (6-54)]
$$\mathbf{z} = \max(\mathbf{x}, \mathbf{y}) \qquad F_z(z) = F_{xy}(z, z)$$
If the RVs x and y are independent, $F_z(z) = F_x(z)F_y(z)$.
Series. We say that the two systems are connected in series if S fails when at
least one of the two systems fails. Denoting by w the time to failure of S, we
conclude that w = t when the smaller of the numbers x and y equals t. Hence
[see (6-56)]
$$\mathbf{w} = \min(\mathbf{x}, \mathbf{y}) \qquad F_w(w) = F_x(w) + F_y(w) - F_{xy}(w, w)$$
If the RVs x and y are independent,
$$R_w(w) = R_x(w)R_y(w) \qquad \beta_w(t) = \beta_x(t) + \beta_y(t)$$
where $\beta_x(t)$, $\beta_y(t)$, and $\beta_w(t)$ are the conditional failure rates of systems $S_1$, $S_2$,
and S respectively.
Standby. We put system $S_1$ into operation, keeping $S_2$ in reserve. When $S_1$
fails, we put $S_2$ into operation. The system S so formed fails when $S_2$ fails. If $t_1$
and $t_2$ are the times of operation of $S_1$ and $S_2$, $t_1 + t_2$ is the time of operation
of S. Denoting by s the time to failure of system S, we conclude that
$$\mathbf{s} = \mathbf{x} + \mathbf{y}$$
The distribution of s equals the probability that the point (x, y) is in the
shaded region of Fig. 7-5. If the RVs x and y are independent, the density of s
equals
$$f_s(s) = \int_0^s f_x(t)\,f_y(s - t)\,dt$$
as in (6-40).
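The three interconnections can be illustrated with a small simulation. The sketch below is an assumption-laden example, not the book's method: it takes x and y independent and exponential with hypothetical rates a = 1 and b = 2, draws samples with a fixed seed, and compares the sample mean times to failure with the values that follow from the formulas above, namely 1/a + 1/b - 1/(a + b) for the parallel connection, 1/(a + b) for the series connection (the failure rates add), and 1/a + 1/b for standby.

```python
import random

def simulate_mean_lifetimes(rate_x, rate_y, n, seed=0):
    """Sample means of max(x, y), min(x, y) and x + y for independent
    exponential times to failure x and y."""
    rng = random.Random(seed)
    totals = {"parallel": 0.0, "series": 0.0, "standby": 0.0}
    for _ in range(n):
        x = rng.expovariate(rate_x)
        y = rng.expovariate(rate_y)
        totals["parallel"] += max(x, y)   # S fails when both systems have failed
        totals["series"] += min(x, y)     # S fails when the first system fails
        totals["standby"] += x + y        # S2 starts when S1 fails
    return {name: total / n for name, total in totals.items()}

A, B = 1.0, 2.0  # assumed failure rates of S1 and S2
means = simulate_mean_lifetimes(A, B, 100_000)
expected = {
    "parallel": 1 / A + 1 / B - 1 / (A + B),
    "series": 1 / (A + B),
    "standby": 1 / A + 1 / B,
}
for name in means:
    print(name, means[name], expected[name])
```

The sample means fluctuate around the expected values; the agreement improves as the number of trials grows.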
7-4 CONDITIONAL EXPECTED VALUES
Applying theorem (5-29) to conditional densities, we obtain the conditional
mean of g(y):
$$E\{g(\mathbf{y}) \mid \mathcal{M}\} = \int_{-\infty}^{\infty} g(y)\,f(y \mid \mathcal{M})\,dy \tag{7-56}$$
This can be used to define the conditional moments of y.
FIGURE 7-6
Using a limit argument as in (7-41), we can also define the conditional
mean $E\{g(\mathbf{y}) \mid x\}$. In particular,
$$\eta_{y|x} = E\{\mathbf{y} \mid x\} = \int_{-\infty}^{\infty} y\,f(y \mid x)\,dy \tag{7-57}$$
is the conditional mean of y assuming $\mathbf{x} = x$, and
$$\sigma_{y|x}^2 = E\{(\mathbf{y} - \eta_{y|x})^2 \mid x\} = \int_{-\infty}^{\infty} (y - \eta_{y|x})^2 f(y \mid x)\,dy \tag{7-58}$$
is its conditional variance.
For a given x, the integral in (7-57) is the center of gravity of the masses in
the vertical strip (x, x + dx). The locus of these points, as x varies from $-\infty$ to
$\infty$, is the function
$$\varphi(x) = \int_{-\infty}^{\infty} y\,f(y \mid x)\,dy \tag{7-59}$$
known as the regression line (Fig. 7-6).
Note If the RVs x and y are functionally related, that is, if $\mathbf{y} = g(\mathbf{x})$, then the probability
masses on the xy plane are on the line y = g(x) (see Fig. 6-5b); hence $E\{\mathbf{y} \mid x\} = g(x)$.
Galton’s law. The term regression has its origin in the following observation
attributed to the geneticist Sir Francis Galton (1822-1911): “Population
extremes regress toward their mean.” This observation applied to parents and
their adult children means that children of tall (or short) parents are on the
average shorter (or taller) than their parents. In statistical terms this can be
phrased in terms of conditional expected values:
Suppose that the RVs x and y model the height of parents and their
children respectively. These RVs have the same mean and variance, and they
are positively correlated:
$$\eta_x = \eta_y = \eta \qquad \sigma_x = \sigma_y = \sigma \qquad r > 0$$
According to Galton’s law, the conditional mean $E\{\mathbf{y} \mid x\}$ of the height of
children whose parents’ height is x is smaller (or larger) than x if x > η (or
x < η):
$$E\{\mathbf{y} \mid x\} = \varphi(x) \begin{cases} < x & \text{if } x > \eta \\ > x & \text{if } x < \eta \end{cases}$$
This shows that the regression line φ(x) is below the line y = x for x > η and
above this line if x < η as in Fig. 7-7. If the RVs x and y are jointly normal,
then [see (7-60) below] the regression line is the straight line φ(x) = rx. For
arbitrary RVs, the function φ(x) does not obey Galton’s law. The term
regression is used, however, to identify any conditional mean.
Example 7-14. If the RVs x and y are normal as in Example 7-9, then the function
$$E\{\mathbf{y} \mid x\} = \eta_2 + r\sigma_2\,\frac{x - \eta_1}{\sigma_1} \tag{7-60}$$
is a straight line with slope $r\sigma_2/\sigma_1$ passing through the point $(\eta_1, \eta_2)$. Since for
normal RVs the conditional mean $E\{\mathbf{y} \mid x\}$ coincides with the maximum of f(y|x),
we conclude that the locus of the maxima of all profiles of f(x, y) is the straight
line (7-60).
From theorem (7-2) it follows that
$$E\{g(\mathbf{x}, \mathbf{y}) \mid \mathcal{M}\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f(x, y \mid \mathcal{M})\,dx\,dy \tag{7-61}$$
This expression can be used to determine $E\{g(\mathbf{x}, \mathbf{y}) \mid x\}$; however, the conditional
density f(x, y|x) consists of line masses on the line x = constant. To avoid
dealing with line masses, we shall define $E\{g(\mathbf{x}, \mathbf{y}) \mid x\}$ as a limit:
As we have shown in Example 7-8, the conditional density $f(x, y \mid x < \mathbf{x} \le x + \Delta x)$
is 0 outside the strip (x, x + Δx) and in this strip it is given by (7-39)
where $x_1 = x$ and $x_2 = x + \Delta x$. It follows, therefore, from (7-61) with
$\mathcal{M} = \{x < \mathbf{x} \le x + \Delta x\}$ that
$$E\{g(\mathbf{x}, \mathbf{y}) \mid x < \mathbf{x} \le x + \Delta x\} = \int_{-\infty}^{\infty}\int_x^{x + \Delta x} g(\alpha, y)\,\frac{f(\alpha, y)}{F_x(x + \Delta x) - F_x(x)}\,d\alpha\,dy$$
As Δx → 0, the inner integral tends to g(x, y) f(x, y)/f(x). Defining
$E\{g(\mathbf{x}, \mathbf{y}) \mid x\}$ as the limit of the above, we obtain
$$E\{g(\mathbf{x}, \mathbf{y}) \mid x\} = \int_{-\infty}^{\infty} g(x, y)\,f(y \mid x)\,dy \tag{7-62}$$
We also note that
$$E\{g(x, \mathbf{y}) \mid x\} = \int_{-\infty}^{\infty} g(x, y)\,f(y \mid x)\,dy \tag{7-63}$$
because $g(x, \mathbf{y})$ is a function of the RV y, with x a parameter; hence its
conditional expected value is given by (7-56). Thus
$$E\{g(\mathbf{x}, \mathbf{y}) \mid x\} = E\{g(x, \mathbf{y}) \mid x\} \tag{7-64}$$
One might be tempted from the above to conclude that (7-64) follows
directly from (7-56); however, this is not so. The functions $g(\mathbf{x}, \mathbf{y})$ and $g(x, \mathbf{y})$
have the same expected value, assuming $\mathbf{x} = x$, but they are not equal. The first
is a function $g(\mathbf{x}, \mathbf{y})$ of the RVs x and y, and for a specific ζ it takes the value
$g[\mathbf{x}(\zeta), \mathbf{y}(\zeta)]$. The second is a function $g(x, \mathbf{y})$ of the real variable x and the RV
y, and for a specific ζ it takes the value $g[x, \mathbf{y}(\zeta)]$ where x is an arbitrary
number.
Conditional Expected Values as RVs
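The conditional mean of y, assuming $\mathbf{x} = x$, is a function $\varphi(x) = E\{\mathbf{y} \mid x\}$ of x
given by (7-59). Using this function, we can construct the RV $\varphi(\mathbf{x}) = E\{\mathbf{y} \mid \mathbf{x}\}$ as in
Sec. 5-1. As we see from (5-29), the mean of this RV equals
$$E\{\varphi(\mathbf{x})\} = \int_{-\infty}^{\infty} \varphi(x) f(x)\,dx = \int_{-\infty}^{\infty} f(x)\int_{-\infty}^{\infty} y\,f(y \mid x)\,dy\,dx$$
Since f(x, y) = f(x) f(y|x), the above yields
$$E\{E\{\mathbf{y} \mid \mathbf{x}\}\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\,f(x, y)\,dx\,dy = E\{\mathbf{y}\} \tag{7-65}$$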
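This basic result can be generalized: The conditional mean $E\{g(\mathbf{x}, \mathbf{y}) \mid x\}$ of
$g(\mathbf{x}, \mathbf{y})$, assuming $\mathbf{x} = x$, is a function of the real variable x. It defines, therefore,
the function $E\{g(\mathbf{x}, \mathbf{y}) \mid \mathbf{x}\}$ of the RV x. As we see from (7-2) and (7-61), the mean
of $E\{g(\mathbf{x}, \mathbf{y}) \mid \mathbf{x}\}$ equals
$$\int_{-\infty}^{\infty} f(x)\int_{-\infty}^{\infty} g(x, y)\,f(y \mid x)\,dy\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$$
But the last integral equals $E\{g(\mathbf{x}, \mathbf{y})\}$; hence
$$E\{E\{g(\mathbf{x}, \mathbf{y}) \mid \mathbf{x}\}\} = E\{g(\mathbf{x}, \mathbf{y})\} \tag{7-66}$$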
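Relation (7-66) has an exact discrete analog that is easy to verify by direct summation. In the sketch below, the joint probability table and the function g are hypothetical choices made for illustration; sums over the table play the role of the integrals in the text.

```python
from fractions import Fraction as Fr

# Hypothetical joint probabilities P{x = xi, y = yk}; the values are assumptions.
JOINT = {(0, 1): Fr(1, 8), (0, 2): Fr(3, 8), (1, 1): Fr(1, 4), (1, 3): Fr(1, 4)}

def g(x, y):
    """An arbitrary function of the two RVs, chosen only for illustration."""
    return x * y + y * y

def marginal_x(joint):
    """P{x = xi} obtained by summing the joint probabilities over y."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0) + p
    return px

def conditional_mean(joint, fn, x):
    """E{g(x, y) | x}: average of g over the column x, normalized by P{x = x}."""
    mass = marginal_x(joint)[x]
    return sum(fn(xi, y) * p for (xi, y), p in joint.items() if xi == x) / mass

def mean_of_conditional_mean(joint, fn):
    """E{E{g(x, y) | x}}: the conditional means weighted by P{x = xi}."""
    return sum(conditional_mean(joint, fn, x) * px
               for x, px in marginal_x(joint).items())

def mean(joint, fn):
    """E{g(x, y)} computed directly from the joint probabilities."""
    return sum(fn(x, y) * p for (x, y), p in joint.items())

print(mean_of_conditional_mean(JOINT, g), mean(JOINT, g))
```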
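We note, finally, that
$$E\{g_1(\mathbf{x})g_2(\mathbf{y}) \mid x\} = E\{g_1(x)g_2(\mathbf{y}) \mid x\} = g_1(x)E\{g_2(\mathbf{y}) \mid x\} \tag{7-67}$$
$$E\{g_1(\mathbf{x})g_2(\mathbf{y})\} = E\{E\{g_1(\mathbf{x})g_2(\mathbf{y}) \mid \mathbf{x}\}\} = E\{g_1(\mathbf{x})E\{g_2(\mathbf{y}) \mid \mathbf{x}\}\}$$
Example 7-15. Suppose that the RVs x and y are $N(0, 0; \sigma_1, \sigma_2; r)$. As we know
$$E\{\mathbf{x}^2\} = \sigma_1^2 \qquad E\{\mathbf{x}^4\} = 3\sigma_1^4$$
Furthermore, f(y|x) is a normal density with mean $r\sigma_2 x/\sigma_1$ and variance
$\sigma_2^2(1 - r^2)$. Hence
$$E\{\mathbf{y}^2 \mid x\} = \eta_{y|x}^2 + \sigma_{y|x}^2 = \left(\frac{r\sigma_2 x}{\sigma_1}\right)^2 + \sigma_2^2(1 - r^2)$$
Using (7-67), we shall show that
$$E\{\mathbf{x}\mathbf{y}\} = r\sigma_1\sigma_2 \qquad E\{\mathbf{x}^2\mathbf{y}^2\} = E\{\mathbf{x}^2\}E\{\mathbf{y}^2\} + 2E^2\{\mathbf{x}\mathbf{y}\}$$
Proof
$$E\{\mathbf{x}\mathbf{y}\} = E\{\mathbf{x}E\{\mathbf{y} \mid \mathbf{x}\}\} = \frac{r\sigma_2}{\sigma_1}E\{\mathbf{x}^2\} = r\sigma_1\sigma_2$$
$$E\{\mathbf{x}^2\mathbf{y}^2\} = E\{\mathbf{x}^2E\{\mathbf{y}^2 \mid \mathbf{x}\}\} = E\left\{\mathbf{x}^2\left[\frac{r^2\sigma_2^2\mathbf{x}^2}{\sigma_1^2} + \sigma_2^2(1 - r^2)\right]\right\} = 3r^2\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_2^2(1 - r^2) = \sigma_1^2\sigma_2^2 + 2r^2\sigma_1^2\sigma_2^2$$
and the proof is complete [see also (7-36)].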
7-5 MEAN SQUARE ESTIMATION
The estimation problem is fundamental in the applications of probability and it
will be discussed in detail later (Chap. 14). In this section, we introduce the
main ideas using as illustration the estimation of an RV y in terms of another
RV x. Throughout this analysis, the optimality criterion will be the minimization
of the mean square value (abbreviation: MS) of the estimation error.
We start with a brief explanation of the underlying concepts in the context
of repeated trials, considering first the problem of estimating the RV у by a
constant.
Frequency interpretation As we know, the distribution function F(y) of the RV y
determines completely its statistics. This does not, of course, mean that if we know F(y)
we can predict the value y(ζ) of y at some future trial. Suppose, however, that we wish to
estimate the unknown y(ζ) by some number c. As we shall presently see, knowledge of
F(y) can guide us in the selection of c.
If y is estimated by a constant c, then, at a particular trial, the error y(ζ) - c
results and our problem is to select c so as to minimize this error in some sense. A
reasonable criterion for selecting c might be the condition that, in a long series of trials,
the average error is close to 0:
$$\frac{\mathbf{y}(\zeta_1) - c + \cdots + \mathbf{y}(\zeta_n) - c}{n} \approx 0$$
As we see from (5-26), this would lead to the conclusion that c should equal the mean of
y (Fig. 7-8a).
Another criterion for selecting c might be the minimization of the average of
$|\mathbf{y}(\zeta) - c|$. In this case, the optimum c is the median of y (see page 68).
FIGURE 7-8
In our analysis, we consider only MS estimates. This means that c should be such
as to minimize the average of $|\mathbf{y}(\zeta) - c|^2$. This criterion is in general useful but it is
selected mainly because it leads to simple results. As we shall soon see, the best c is
again the mean of y.
Suppose now that at each trial we observe the value x(ζ) of the RV x. On the basis
of this observation it might be best to use as the estimate of y not the same number c at
each trial, but a number that depends on the observed x(ζ). In other words, we might use
as the estimate of y a function c(x) of the RV x. The resulting problem is the optimum
determination of this function.
It might be argued that, if at a certain trial we observe x(ζ), then we can determine
the outcome ζ of this trial, and hence also the corresponding value y(ζ) of y. This,
however, is not so. The same number x(ζ) = x is observed for every ζ in the set $\{\mathbf{x} = x\}$
(Fig. 7-8b). If, therefore, this set has many elements and the values of y are different for
the various elements of this set, then the observed x(ζ) does not determine uniquely y(ζ).
However, we know now that ζ is an element of the subset $\{\mathbf{x} = x\}$. This information
reduces the uncertainty about the value of y. In the subset $\{\mathbf{x} = x\}$, the RV x equals x
and the problem of determining c(x) is reduced to the problem of determining the
constant c(x). As we noted, if the optimality criterion is the minimization of the MS
error, then c(x) must be the average of y in this set. In other words, c(x) must equal the
conditional mean of y assuming that $\mathbf{x} = x$.
We shall illustrate with an example. Suppose that the space S is the set of all
children in a community and the RV y is the height of each child. A particular outcome ζ
is a specific child and y(ζ) is the height of this child. From the preceding discussion it
follows that if we wish to estimate y by a number, this number must equal the mean of y.
We now assume that each selected child is weighed. On the basis of this observation, the
estimate of the height of the child can be improved. The weight is an RV x; hence the
optimum estimate of y is now the conditional mean $E\{\mathbf{y} \mid x\}$ of y assuming $\mathbf{x} = x$ where x
is the observed weight.
In the context of probability theory, the MS estimation of the RV y by a
constant c can be phrased as follows: Find c such that the second moment
(MS error)
$$e = E\{(\mathbf{y} - c)^2\} = \int_{-\infty}^{\infty} (y - c)^2 f(y)\,dy \tag{7-68}$$
of the difference (error) y - c is minimum. Clearly, e depends on c and it is
minimum if
$$\frac{de}{dc} = -\int_{-\infty}^{\infty} 2(y - c) f(y)\,dy = 0$$
that is, if
$$c = \int_{-\infty}^{\infty} y f(y)\,dy$$
Thus
$$c = E\{\mathbf{y}\} = \int_{-\infty}^{\infty} y f(y)\,dy \tag{7-69}$$
This result is well known from mechanics: The moment of inertia with respect
to a point c is minimum if c is the center of gravity of the masses.
NONLINEAR MS ESTIMATION. We wish to estimate y not by a constant but by a
function c(x) of the RV x. Our problem now is to find the function c(x) such
that the MS error
$$e = E\{[\mathbf{y} - c(\mathbf{x})]^2\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [y - c(x)]^2 f(x, y)\,dx\,dy \tag{7-70}$$
is minimum.
We maintain that
$$c(x) = E\{\mathbf{y} \mid x\} = \int_{-\infty}^{\infty} y\,f(y \mid x)\,dy \tag{7-71}$$
Proof. Since f(x, y) = f(y|x) f(x), (7-70) yields
$$e = \int_{-\infty}^{\infty} f(x)\int_{-\infty}^{\infty} [y - c(x)]^2 f(y \mid x)\,dy\,dx$$
The integrands above are positive. Hence e is minimum if the inner integral is
minimum for every x. This integral is of the form (7-68) if c is changed to c(x),
and f(y) is changed to f(y|x). Hence it is minimum if c(x) equals the integral
in (7-69), provided that f(y) is changed to f(y|x). The result is (7-71).
Thus the optimum c(x) is the regression line φ(x) of Fig. 7-6.
As we noted in the beginning of the section, if $\mathbf{y} = g(\mathbf{x})$, then $E\{\mathbf{y} \mid \mathbf{x}\} = g(\mathbf{x})$;
hence c(x) = g(x) and the resulting MS error is 0. This is not surprising
because, if x is observed and y = g(x), then y is determined uniquely.
If the RVs x and y are independent, then $E\{\mathbf{y} \mid x\} = E\{\mathbf{y}\}$ = constant. In
this case, knowledge of x has no effect on the estimate of y.
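The optimality of the conditional mean can be seen on a small discrete example. The sketch below uses a hypothetical joint probability table (the values are assumptions) and compares the MS error of three estimates of y: the constant E{y}, the conditional mean E{y|x}, and another arbitrary function of x.

```python
from fractions import Fraction as Fr

# Hypothetical joint probabilities P{x = xi, y = yk}; the values are assumptions.
JOINT = {(0, 0): Fr(1, 4), (0, 2): Fr(1, 4), (1, 1): Fr(1, 8), (1, 5): Fr(3, 8)}

def conditional_mean(joint, x):
    """E{y | x} for a discrete joint distribution."""
    mass = sum(p for (xi, _), p in joint.items() if xi == x)
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / mass

def ms_error(joint, estimator):
    """E{[y - c(x)]^2} for an estimator c(x)."""
    return sum((y - estimator(x)) ** 2 * p for (x, y), p in joint.items())

mean_y = sum(y * p for (_, y), p in JOINT.items())

err_constant = ms_error(JOINT, lambda x: mean_y)                         # c = E{y}
err_conditional = ms_error(JOINT, lambda x: conditional_mean(JOINT, x))  # c(x) = E{y|x}
err_other = ms_error(JOINT, lambda x: 1 + 2 * x)                         # arbitrary c(x)

print(err_constant, err_conditional, err_other)
```

In this table the conditional mean gives the smallest of the three errors, as (7-71) requires; the constant estimate does worse because it ignores the observation.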
Linear MS Estimation
The solution of the nonlinear MS estimation problem is based on knowledge of
the function φ(x). An easier problem, using only second-order moments, is the
linear MS estimation of y in terms of x. The resulting estimate is not as good as
the nonlinear estimate; however, it is used in many applications because of the
simplicity of the solution.
The linear estimation problem is the estimation of the RV y in terms of a
linear function Ax + B of x. The problem now is to find the constants A and B
so as to minimize the MS error
$$e = E\{[\mathbf{y} - (A\mathbf{x} + B)]^2\} \tag{7-72}$$
We maintain that $e = e_m$ is minimum if
$$A = \frac{\mu_{11}}{\mu_{20}} = \frac{r\sigma_y}{\sigma_x} \qquad B = \eta_y - A\eta_x \tag{7-73}$$
and
$$e_m = \mu_{02} - \frac{\mu_{11}^2}{\mu_{20}} = \sigma_y^2(1 - r^2) \tag{7-74}$$
Proof. For a given A, e is the MS error of the estimation of y - Ax by the
constant B. Hence e is minimum if $B = E\{\mathbf{y} - A\mathbf{x}\}$ as in (7-69). With B so
determined, (7-72) yields
$$e = E\{[(\mathbf{y} - \eta_y) - A(\mathbf{x} - \eta_x)]^2\} = \sigma_y^2 - 2Ar\sigma_x\sigma_y + A^2\sigma_x^2$$
This is minimum if $A = r\sigma_y/\sigma_x$ and (7-73) results. Inserting into the above
quadratic, we obtain (7-74).
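The constants in (7-73) and the error in (7-74) depend only on second-order moments, so they can be computed mechanically. The sketch below does this for a hypothetical example (x equally likely on 0, 1, 2, 3 and y = x², an assumption made for illustration) and confirms that the MS error evaluated directly from (7-72) matches (7-74) and grows when A or B is perturbed.

```python
from fractions import Fraction as Fr

# Hypothetical data: x equally likely on {0, 1, 2, 3} and y = x^2.
JOINT = {(x, x * x): Fr(1, 4) for x in range(4)}

def expect(joint, fn):
    """Expected value of fn(x, y) under the joint probabilities."""
    return sum(fn(x, y) * p for (x, y), p in joint.items())

eta_x = expect(JOINT, lambda x, y: x)
eta_y = expect(JOINT, lambda x, y: y)
mu20 = expect(JOINT, lambda x, y: (x - eta_x) ** 2)            # variance of x
mu11 = expect(JOINT, lambda x, y: (x - eta_x) * (y - eta_y))   # covariance
mu02 = expect(JOINT, lambda x, y: (y - eta_y) ** 2)            # variance of y

A = mu11 / mu20                  # slope from (7-73)
B = eta_y - A * eta_x            # intercept from (7-73)
e_m = mu02 - mu11 ** 2 / mu20    # minimum MS error from (7-74)

def ms_error(a, b):
    """E{[y - (a x + b)]^2} evaluated directly from (7-72)."""
    return expect(JOINT, lambda x, y: (y - (a * x + b)) ** 2)

print(A, B, e_m, ms_error(A, B))
print(ms_error(A + Fr(1, 10), B), ms_error(A, B + 1))
```

Here y is a function of x, so the nonlinear estimate would have zero error, while the best straight line leaves a nonzero residual.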
Terminology. In the above, the sum Ax + B is the nonhomogeneous linear
estimate of y in terms of x. If y is estimated by a straight line ax passing through
the origin, the estimate is called homogeneous.
The RV x is the data of the estimation, the RV $\varepsilon = \mathbf{y} - (A\mathbf{x} + B)$ is the
error of the estimation, and the number $e = E\{\varepsilon^2\}$ is the MS error.
Fundamental note. In general, the nonlinear estimate $\varphi(x) = E\{\mathbf{y} \mid x\}$ of y in
terms of x is not a straight line and the resulting MS error $E\{[\mathbf{y} - \varphi(\mathbf{x})]^2\}$ is
smaller than the MS error $e_m$ of the linear estimate Ax + B. However, if the
RVs x and y are jointly normal, then [see (7-60)]
$$\varphi(x) = \eta_y + r\sigma_y\,\frac{x - \eta_x}{\sigma_x}$$
is a straight line as in (7-73). In other words:
For normal RVs, nonlinear and linear MS estimates are identical.
The Orthogonality Principle
From (7-73) it follows that
$$E\{[\mathbf{y} - (A\mathbf{x} + B)]\mathbf{x}\} = 0 \tag{7-75}$$
This result can be derived directly from (7-72). Indeed, the MS error e is a
function of A and B and it is minimum if ∂e/∂A = 0 and ∂e/∂B = 0. The first
equation yields
$$\frac{\partial e}{\partial A} = E\{2[\mathbf{y} - (A\mathbf{x} + B)](-\mathbf{x})\} = 0$$
and (7-75) results. The interchange between expected value and differentiation
is equivalent to the interchange of integration and differentiation.
Equation (7-75) states that the optimum linear MS estimate Ax + B of y
is such that the estimation error y - (Ax + B) is orthogonal to the data x. This
is known as the orthogonality principle. It is fundamental in MS estimation and
will be used extensively. In the following, we reestablish it for the homogeneous
case.
HOMOGENEOUS LINEAR MS ESTIMATION. We wish to find a constant a such
that, if y is estimated by ax, the resulting MS error
$$e = E\{(\mathbf{y} - a\mathbf{x})^2\} \tag{7-76}$$
is minimum. We maintain that a must be such that
$$E\{(\mathbf{y} - a\mathbf{x})\mathbf{x}\} = 0 \tag{7-77}$$
Proof. Clearly, e is minimum if e'(a) = 0; this yields (7-77). We shall give a
second proof: We assume that a satisfies (7-77) and we shall show that e is
minimum. With $\bar{a}$ an arbitrary constant,
$$E\{(\mathbf{y} - \bar{a}\mathbf{x})^2\} = E\{[(\mathbf{y} - a\mathbf{x}) + (a - \bar{a})\mathbf{x}]^2\}$$
$$= E\{(\mathbf{y} - a\mathbf{x})^2\} + (a - \bar{a})^2E\{\mathbf{x}^2\} + 2(a - \bar{a})E\{(\mathbf{y} - a\mathbf{x})\mathbf{x}\}$$
In the above, the last term is 0 by assumption and the second term is nonnegative.
From this it follows that
$$E\{(\mathbf{y} - \bar{a}\mathbf{x})^2\} \ge E\{(\mathbf{y} - a\mathbf{x})^2\}$$
for any $\bar{a}$; hence e is minimum.
The linear MS estimate of y in terms of x will be denoted by $\hat{E}\{\mathbf{y} \mid \mathbf{x}\}$.
Solving (7-77), we conclude that
$$\hat{E}\{\mathbf{y} \mid \mathbf{x}\} = a\mathbf{x} \qquad a = \frac{E\{\mathbf{x}\mathbf{y}\}}{E\{\mathbf{x}^2\}} \tag{7-78}$$
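MS error Since
$$e = E\{(\mathbf{y} - a\mathbf{x})\mathbf{y}\} - E\{(\mathbf{y} - a\mathbf{x})a\mathbf{x}\} = E\{\mathbf{y}^2\} - E\{(a\mathbf{x})^2\} - 2aE\{(\mathbf{y} - a\mathbf{x})\mathbf{x}\}$$
we conclude with (7-77) that
$$e = E\{(\mathbf{y} - a\mathbf{x})\mathbf{y}\} = E\{\mathbf{y}^2\} - E\{(a\mathbf{x})^2\} \tag{7-79}$$
We note finally that (7-77) is consistent with the orthogonality principle:
The error y - ax is orthogonal to the data x.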
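The orthogonality condition (7-77) and the error formula (7-79) can be checked on a few numbers. In the sketch below, expected values are taken as averages over four equally likely pairs (x, y); the pairs are hypothetical and serve only as an illustration.

```python
from fractions import Fraction as Fr

# Hypothetical equally likely pairs (x, y), assumed for illustration.
PAIRS = [(1, 2), (2, 3), (3, 7), (4, 8)]

def mean(values):
    """Average of equally likely values, kept as an exact fraction."""
    values = list(values)
    return Fr(sum(values), len(values))

Exy = mean(x * y for x, y in PAIRS)
Exx = mean(x * x for x, y in PAIRS)
Eyy = mean(y * y for x, y in PAIRS)

a = Exy / Exx   # homogeneous linear MS estimate, as in (7-78)

def ms_error(coef):
    """E{(y - coef x)^2} for the estimate coef * x."""
    return mean((y - coef * x) ** 2 for x, y in PAIRS)

error_times_data = mean((y - a * x) * x for x, y in PAIRS)   # should vanish, (7-77)
e_direct = ms_error(a)
e_formula = Eyy - a * a * Exx                                # right side of (7-79)

print(a, error_times_data, e_direct, e_formula)
print(ms_error(a + Fr(1, 10)), ms_error(a - Fr(1, 10)))
```

Any other coefficient gives a larger MS error, in agreement with the second proof above.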
Geometric interpretation of the orthogonality principle. In the vector
representation of RVs (see Fig. 7-9), the difference y - ax is the vector from the point
ax on the x line to the point y, and the length of that vector equals $\sqrt{e}$. Clearly,
this length is minimum if y - ax is perpendicular to x in agreement with (7-77).
The right side of (7-79) follows from the Pythagorean theorem and the middle
term states that the square of the length of y - ax equals the inner product of y
with the error y - ax.
Risk and loss functions. We conclude with a brief comment on other optimality
criteria limiting the discussion to the estimation of an RV y by a constant c. We
select a function L(x) and we choose c so as to minimize the mean
$$R = E\{L(\mathbf{y} - c)\} = \int_{-\infty}^{\infty} L(y - c) f(y)\,dy$$
of the RV L(y - c). The function L(x) is called the loss function and the
constant R is called the average risk. The choice of L(x) depends on the
applications. If $L(x) = x^2$, then $R = E\{(\mathbf{y} - c)^2\}$ is the MS error and as we
have shown, it is minimum if $c = E\{\mathbf{y}\}$.
If L(x) = |x|, then $R = E\{|\mathbf{y} - c|\}$. We maintain that in this case, c
equals the median $y_{0.5}$ of y (see also Prob. 5-20).
Proof. The average risk equals
$$R = \int_{-\infty}^{\infty} |y - c| f(y)\,dy = \int_{-\infty}^{c} (c - y) f(y)\,dy + \int_{c}^{\infty} (y - c) f(y)\,dy$$
Differentiating with respect to c, we obtain
$$\frac{dR}{dc} = \int_{-\infty}^{c} f(y)\,dy - \int_{c}^{\infty} f(y)\,dy = 2F(c) - 1$$
Thus R is minimum if F(c) = 1/2, that is, if $c = y_{0.5}$.
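The two loss functions can be compared on sample data. In the sketch below, the average risk is replaced by a sample average; the sample values and the search grid are assumptions chosen so that the minimizing constants are easy to see.

```python
import statistics

def average_risk(samples, c, loss):
    """Sample average of L(y - c), a stand-in for the average risk R."""
    return sum(loss(y - c) for y in samples) / len(samples)

# Hypothetical observations of y, assumed for illustration.
SAMPLES = [1.0, 2.0, 2.5, 7.0, 20.0]

median = statistics.median(SAMPLES)
mean = statistics.mean(SAMPLES)

# Search a grid of candidate constants c for the minimum of each risk.
grid = [k / 10.0 for k in range(0, 251)]
best_abs = min(grid, key=lambda c: average_risk(SAMPLES, c, abs))
best_sq = min(grid, key=lambda c: average_risk(SAMPLES, c, lambda u: u * u))

print(best_abs, median)   # minimizer of the mean absolute error and the sample median
print(best_sq, mean)      # minimizer of the MS error and the sample mean
```

The outlying value 20 pulls the MS-optimal constant toward it, while the constant that minimizes the mean absolute error stays at the middle observation.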
FIGURE 7-9
We note finally that in certain applications, y is estimated by its mode, that
is, the value $y_{\max}$ of y for which f(y) is maximum. This is based on the
following: The probability that y is in an interval (c, c + dy) of specified length
dy equals $P\{c < \mathbf{y} < c + dy\} \approx f(c)\,dy$. This is maximum if $c = y_{\max}$.
PROBLEMS
7-1. The RVs x and y are N(0; σ) and independent. Show that if z = |x - y|, then
$E\{\mathbf{z}\} = 2\sigma/\sqrt{\pi}$, $E\{\mathbf{z}^2\} = 2\sigma^2$.
7-2. Show that if x and y are two independent RVs with $f_x(x) = e^{-x}U(x)$, $f_y(y) = e^{-y}U(y)$,
and z = (x - y)U(x - y), then E{z} = 1/2.
7-3. Show that for any x, y real or complex
(a) $|E\{\mathbf{x}\mathbf{y}\}|^2 \le E\{|\mathbf{x}|^2\}E\{|\mathbf{y}|^2\}$;
(b) (triangle inequality) $\sqrt{E\{|\mathbf{x} + \mathbf{y}|^2\}} \le \sqrt{E\{|\mathbf{x}|^2\}} + \sqrt{E\{|\mathbf{y}|^2\}}$.
7-4. Show that, if $r_{xy} = 1$, then y = ax + b.
7-5. Show that, if $E\{\mathbf{x}^2\} = E\{\mathbf{y}^2\} = E\{\mathbf{x}\mathbf{y}\}$, then x = y.
7-6. Show that, if the RV x is of discrete type taking the values $x_n$ with $P\{\mathbf{x} = x_n\} = p_n$
and z = g(x, y), then
$$E\{\mathbf{z}\} = \sum_n E\{g(x_n, \mathbf{y}) \mid x_n\}\,p_n \qquad f_z(z) = \sum_n f_z(z \mid x_n)\,p_n$$
7-7. The RV n is Poisson with parameter λ and the RV x is independent of n. Show
that, if z = nx and
$$f_x(x) = \frac{\alpha}{\pi(\alpha^2 + x^2)} \qquad \text{then} \qquad \Phi_z(\omega) = \exp\{\lambda e^{-\alpha|\omega|} - \lambda\}$$
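7-8. Show that, if the RVs x and y are N(0, 0; σ, σ; r), then
$$\text{(a)} \quad E\{f_y(\mathbf{y} \mid x)\} = \frac{1}{\sigma\sqrt{2\pi(2 - r^2)}}\exp\left\{-\frac{r^2x^2}{2\sigma^2(2 - r^2)}\right\}$$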
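7-9. Show that if the RVs x, y are $N(0, 0; \sigma_1, \sigma_2; r)$ then
$$E\{|\mathbf{x}\mathbf{y}|\} = \frac{2}{\pi}\int_0^C \arcsin\frac{\mu}{\sigma_1\sigma_2}\,d\mu + \frac{2\sigma_1\sigma_2}{\pi} = \frac{2\sigma_1\sigma_2}{\pi}(\cos\alpha + \alpha\sin\alpha)$$
where $r = \sin\alpha$ and $C = r\sigma_1\sigma_2$.
Hint: Use (7-37) with g(x, y) = |xy|.
7-10. The RVs x and y are uniform in the interval (-1, 1) and independent. Find the
conditional density $f_r(r \mid \mathcal{M})$ of the RV $\mathbf{r} = \sqrt{\mathbf{x}^2 + \mathbf{y}^2}$ where $\mathcal{M} = \{\mathbf{r} \le 1\}$.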
7-11. We have a pile of m coins. The probability of heads of the ith coin equals $p_i$. We
select at random one of the coins, we toss it n times and heads shows k times.
Show that the probability that we selected the rth coin equals
$$\frac{p_r^k(1 - p_r)^{n-k}}{p_1^k(1 - p_1)^{n-k} + \cdots + p_m^k(1 - p_m)^{n-k}}$$
7-12. The RV x has a Student-t distribution t(n). Show that $E\{\mathbf{x}^2\} = n/(n - 2)$.
7-13. Show that if $\beta_x(t) = f_x(t \mid \mathbf{x} > t)$, $\beta_y(t) = f_y(t \mid \mathbf{y} > t)$ and $\beta_x(t) = k\beta_y(t)$, then
$1 - F_x(x) = [1 - F_y(x)]^k$.
7-14. Show that, for any x, y, and ε > 0,
$$P\{|\mathbf{x} - \mathbf{y}| > \varepsilon\} \le \frac{1}{\varepsilon^2}E\{|\mathbf{x} - \mathbf{y}|^2\}$$
7-15. Show that the RVs x and y are independent iff for any a and b:
$$E\{U(a - \mathbf{x})U(b - \mathbf{y})\} = E\{U(a - \mathbf{x})\}E\{U(b - \mathbf{y})\}$$
7-16. Show that
$$E\{\mathbf{y} \mid \mathbf{x} \le 0\} = \frac{1}{F_x(0)}\int_{-\infty}^{0} E\{\mathbf{y} \mid x\}\,f_x(x)\,dx$$
7-17. Show that, if the RVs x and y are independent and z = x + y, then
$f_z(z \mid x) = f_y(z - x)$.
7-18. The RVs x, y are N(3, 4; 1, 2; 0.5). Find f(y|x) and f(x|y).
7-19. Show that, for any x and y, the RVs $\mathbf{z} = F_x(\mathbf{x})$ and $\mathbf{w} = F_y(\mathbf{y} \mid \mathbf{x})$ are independent
and each is uniform in the interval (0, 1).
7-20. The RVs x and y are N(0, 0; 3, 5; 0.8). Find g(x) such that $E\{[\mathbf{y} - g(\mathbf{x})]^2\}$ is
minimum.
7-21. In the approximation of y by φ(x), the “mean cost” $E\{g[\mathbf{y} - \varphi(\mathbf{x})]\}$ results, where
g(x) is a given function. Show that, if g(x) is an even convex function as in Fig.
P5-28, then the “mean cost” is minimum if $\varphi(x) = E\{\mathbf{y} \mid x\}$.
7-22. Show that if $\varphi(x) = E\{\mathbf{y} \mid x\}$ is the nonlinear MS estimate of y in terms of x, then
$$E\{[\mathbf{y} - \varphi(\mathbf{x})]^2\} = E\{\mathbf{y}^2\} - E\{\varphi^2(\mathbf{x})\}$$
7-23. If $\eta_x = \eta_y = 0$, $\sigma_x = \sigma_y = 4$, and $\hat{\mathbf{y}} = 0.2\mathbf{x}$ (linear MS estimate), find $E\{(\mathbf{y} - \hat{\mathbf{y}})^2\}$.
7-24. Show that if the constants A, B, and a are such that $E\{[\mathbf{y} - (A\mathbf{x} + B)]^2\}$ and
$E\{[(\mathbf{y} - \eta_y) - a(\mathbf{x} - \eta_x)]^2\}$ are minimum, then a = A.
7-25. Given $\eta_x = 4$, $\eta_y = 0$, $\sigma_x = 1$, $\sigma_y = 2$, $r_{xy} = 0.5$, find the parameters A, B, and a
that minimize $E\{[\mathbf{y} - (A\mathbf{x} + B)]^2\}$ and $E\{(\mathbf{y} - a\mathbf{x})^2\}$.
7-26. The RVs x, y are independent, integer-valued with $P\{\mathbf{x} = k\} = p_k$, $P\{\mathbf{y} = k\} = q_k$.
Show that (a) if z = x + y, then (discrete-time convolution)
$$P\{\mathbf{z} = n\} = \sum_{k=-\infty}^{\infty} p_k\,q_{n-k}$$
(b) if the RVs x, y are Poisson distributed with parameters a and b respectively
and w = x - y, then
$$P\{\mathbf{w} = n\} = e^{-(a+b)}\sum_{k=k_0}^{\infty}\frac{a^{n+k}b^k}{(n + k)!\,k!} \qquad k_0 = \begin{cases} 0 & n \ge 0 \\ |n| & n < 0 \end{cases}$$
7-27. If $\mathbf{y} = \mathbf{x}^3$, find the nonlinear and linear MS estimate of y in terms of x and the
resulting MS errors.
7-28. The RV x has a Rayleigh density [see (6-50)]. Find its conditional failure rate β(t).
7-29. Find the reliability R(t) of a system if β(t) = ct/(1 + ct).
7-30. The RV x is uniform in the interval (0, T). Find and sketch β(t).
7-31. Find and sketch R(t) if β(t) = 4U(t) + 2U(t - T). Find the mean time to failure
of the system.
7-32. The RVs x and y are jointly normal with zero mean, and $\sigma_x = 2$, $\sigma_y = 4$,
$r_{xy} = 0.5$. (a) Find the regression line $E\{\mathbf{y} \mid x\} = \varphi(x)$. (b) Show that the RVs x and
y - φ(x) are independent.
7-33. (a) Show that $E\{(\mathbf{y} - c)^2\} = \sigma_y^2 + (c - \eta_y)^2$ for any c. (b) Using this, show that
$E\{(\mathbf{y} - c)^2\}$ is minimum if $c = \eta_y$ as in (7-69). (c) Reasoning similarly, show that
$E\{[\mathbf{y} - g(\mathbf{x})]^2\}$ is minimum if $g(x) = E\{\mathbf{y} \mid x\}$ as in (7-71).
CHAPTER 8
SEQUENCES OF RANDOM VARIABLES
8-1 GENERAL CONCEPTS
A random vector is a vector
$$\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_n] \tag{8-1}$$
whose components $\mathbf{x}_i$ are RVs.
The probability that X is in a region D of the n-dimensional space equals
the probability masses in D:
$$P\{\mathbf{X} \in D\} = \int_D f(X)\,dX \qquad X = [x_1, \ldots, x_n] \tag{8-2}$$
In the above
$$f(X) = f(x_1, \ldots, x_n) = \frac{\partial^n F(x_1, \ldots, x_n)}{\partial x_1 \cdots \partial x_n} \tag{8-3}$$
is the joint (or, multivariate) density of the RVs $\mathbf{x}_i$ and
$$F(X) = F(x_1, \ldots, x_n) = P\{\mathbf{x}_1 \le x_1, \ldots, \mathbf{x}_n \le x_n\} \tag{8-4}$$
is their joint distribution.
If we substitute in $F(x_1, \ldots, x_n)$ certain variables by $\infty$, we obtain the joint
distribution of the remaining variables. If we integrate $f(x_1, \ldots, x_n)$ with
respect to certain variables, we obtain the joint density of the remaining
variables. For example
$$F(x_1, x_3) = F(x_1, \infty, x_3, \infty)$$
$$f(x_1, x_3) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2, x_3, x_4)\,dx_2\,dx_4 \tag{8-5}$$
Note In the above, we identify various functions in terms of their independent variables.
Thus $f(x_1, x_3)$ is the joint density of the RVs $\mathbf{x}_1$ and $\mathbf{x}_3$ and it is in general different from
the joint density $f(x_2, x_4)$ of the RVs $\mathbf{x}_2$ and $\mathbf{x}_4$. Similarly, the density $f_i(x_i)$ of the RV
$\mathbf{x}_i$ will often be denoted by $f(x_i)$.
TRANSFORMATIONS. Given k functions
$$g_1(X), \ldots, g_k(X) \qquad X = [x_1, \ldots, x_n]$$
we form the RVs
$$\mathbf{y}_1 = g_1(\mathbf{X}), \ldots, \mathbf{y}_k = g_k(\mathbf{X}) \tag{8-6}$$
The statistics of these RVs can be determined in terms of the statistics of X as
in Sec. 6-3. If k < n, then we could determine first the joint density of the n
RVs $\mathbf{y}_1, \ldots, \mathbf{y}_k, \mathbf{x}_{k+1}, \ldots, \mathbf{x}_n$ and then use the generalization of (8-5) to
eliminate the x's. If k > n, then the RVs $\mathbf{y}_{n+1}, \ldots, \mathbf{y}_k$ can be expressed in terms of
$\mathbf{y}_1, \ldots, \mathbf{y}_n$. In this case, the masses in the k space are singular and can be
determined in terms of the joint density of $\mathbf{y}_1, \ldots, \mathbf{y}_n$. It suffices, therefore, to
assume that k = n.
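To find the density $f_y(y_1, \ldots, y_n)$ of the random vector $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_n]$ for
a specific set of numbers $y_1, \ldots, y_n$, we solve the system
$$g_1(X) = y_1, \ldots, g_n(X) = y_n \tag{8-7}$$
If this system has no solutions, then $f_y(y_1, \ldots, y_n) = 0$. If it has a single solution
$X = [x_1, \ldots, x_n]$, then
$$f_y(y_1, \ldots, y_n) = \frac{f_x(x_1, \ldots, x_n)}{|J(x_1, \ldots, x_n)|} \tag{8-8}$$
where
$$J(x_1, \ldots, x_n) = \begin{vmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial g_n}{\partial x_1} & \cdots & \dfrac{\partial g_n}{\partial x_n} \end{vmatrix} \tag{8-9}$$
is the jacobian of the transformation (8-7). If it has several solutions, then we
add the corresponding terms as in (6-63).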
Independence
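The RVs $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are called (mutually) independent if the events
$\{\mathbf{x}_1 \le x_1\}, \ldots, \{\mathbf{x}_n \le x_n\}$ are independent. From this it follows that
$$F(x_1, \ldots, x_n) = F(x_1) \cdots F(x_n)$$
$$f(x_1, \ldots, x_n) = f(x_1) \cdots f(x_n) \tag{8-10}$$
Example 8-1. Given n independent RVs $\mathbf{x}_i$ with respective densities $f_i(x_i)$, we
form the RVs
$$\mathbf{y}_k = \mathbf{x}_1 + \cdots + \mathbf{x}_k \qquad k = 1, \ldots, n$$
We shall determine the joint density of $\mathbf{y}_k$. The system
$$x_1 = y_1, \quad x_1 + x_2 = y_2, \quad \ldots, \quad x_1 + \cdots + x_n = y_n$$
has a unique solution
$$x_1 = y_1 \qquad x_k = y_k - y_{k-1} \quad k = 2, \ldots, n$$
and its jacobian equals 1. Hence [see (8-8) and (8-10)]
$$f_y(y_1, \ldots, y_n) = f_1(y_1)\,f_2(y_2 - y_1) \cdots f_n(y_n - y_{n-1}) \tag{8-11}$$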
From (8-10) it follows that any subset of the set $\mathbf{x}_i$ is a set of independent
RVs. Suppose, for example, that
$$f(x_1, x_2, x_3) = f(x_1)\,f(x_2)\,f(x_3)$$
Integrating with respect to $x_3$, we obtain $f(x_1, x_2) = f(x_1)\,f(x_2)$. This shows
that the RVs $\mathbf{x}_1$ and $\mathbf{x}_2$ are independent. Note, however, that if the RVs $\mathbf{x}_i$ are
independent in pairs, they are not necessarily independent. For example, it is
possible that
$$f(x_1, x_2) = f(x_1)\,f(x_2) \qquad f(x_1, x_3) = f(x_1)\,f(x_3) \qquad f(x_2, x_3) = f(x_2)\,f(x_3)$$
but $f(x_1, x_2, x_3) \ne f(x_1)\,f(x_2)\,f(x_3)$ (see Prob. 8-2).
Reasoning as in (6-29), we can show that if the RVs $\mathbf{x}_i$ are independent,
then the RVs
$$\mathbf{y}_1 = g_1(\mathbf{x}_1), \ldots, \mathbf{y}_n = g_n(\mathbf{x}_n)$$
are also independent.
INDEPENDENT EXPERIMENTS AND REPEATED TRIALS. Suppose that
$$\mathcal{S}^n = \mathcal{S}_1 \times \cdots \times \mathcal{S}_n$$
is a combined experiment and the RVs $\mathbf{x}_i$ depend only on the outcomes $\zeta_i$
of $\mathcal{S}_i$:
$$\mathbf{x}_i(\zeta_1 \cdots \zeta_i \cdots \zeta_n) = \mathbf{x}_i(\zeta_i)$$
If the experiments $\mathcal{S}_i$ are independent, then the RVs $\mathbf{x}_i$ are independent [see
also (6-30)]. The following special case is of particular interest.
Suppose that x is an RV defined on an experiment $\mathcal{S}$ and the experiment
is performed n times generating the experiment $\mathcal{S}^n = \mathcal{S} \times \cdots \times \mathcal{S}$. In this
experiment, we define the RVs $\mathbf{x}_i$ such that
$$\mathbf{x}_i(\zeta_1 \cdots \zeta_i \cdots \zeta_n) = \mathbf{x}(\zeta_i) \qquad i = 1, \ldots, n \tag{8-12}$$
From this it follows that the distribution $F_i(x_i)$ of $\mathbf{x}_i$ equals the distribution
$F_x(x)$ of the RV x. Thus, if an experiment is performed n times, the RVs $\mathbf{x}_i$
defined as in (8-12) are independent and they have the same distribution $F_x(x)$.
These RVs are called i.i.d. (independent, identically distributed).
Example 8-2 Order statistics. The order statistics of the RVs x, arc n RVs yk
defined as follows: For a specific outcome f, the RVs xz take the values x,(f).
Ordering these numbers, we obtain the sequence
xr,«)	•• 5хГд«)5 ••• <Jxr<t(<)
and we define the RV yk such that
yi(O=xZi(f)< ••• 5У*(ЛЯЧ(П^ ^y„«) =xrJf) (8-13)
We note that for a specific i, the values x/f) of x, occupy different locations in the
above ordering as £ changes.
We maintain that the density $f_k(y)$ of the $k$th statistic $\mathbf{y}_k$ is given by
$$f_k(y) = \frac{n!}{(k-1)!\,(n-k)!}\, F_x^{k-1}(y)\,[1 - F_x(y)]^{n-k} f_x(y) \tag{8-14}$$
where $F_x(x)$ is the distribution of the i.i.d. RVs $\mathbf{x}_i$ and $f_x(x)$ is their density.
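Proof. As we know
$$f_k(y)\,dy = P\{y < \mathbf{y}_k \le y + dy\}$$
The event $\mathcal{B} = \{y < \mathbf{y}_k \le y + dy\}$ occurs iff exactly $k - 1$ of the RVs $\mathbf{x}_i$ are less
than $y$ and one is in the interval $(y, y + dy)$ (Fig. 8-1). In the original
experiment $\mathcal{S}$, the events
$$\mathcal{A}_1 = \{\mathbf{x} \le y\} \qquad \mathcal{A}_2 = \{y < \mathbf{x} \le y + dy\} \qquad \mathcal{A}_3 = \{\mathbf{x} > y + dy\}$$
form a partition and
$$P(\mathcal{A}_1) = F_x(y) \qquad P(\mathcal{A}_2) = f_x(y)\,dy \qquad P(\mathcal{A}_3) = 1 - F_x(y)$$
In the experiment $\mathcal{S}^n$, the event $\mathcal{B}$ occurs iff $\mathcal{A}_1$ occurs $k - 1$ times, $\mathcal{A}_2$ occurs
once, and $\mathcal{A}_3$ occurs $n - k$ times. With $k_1 = k - 1$, $k_2 = 1$, $k_3 = n - k$, it
follows from (3-38) that
$$P(\mathcal{B}) = \frac{n!}{(k-1)!\,1!\,(n-k)!}\, P^{k-1}(\mathcal{A}_1)\,P(\mathcal{A}_2)\,P^{n-k}(\mathcal{A}_3)$$
and (8-14) results.
FIGURE 8-1 (points $y_1$, $y$, $y_k$, $y + dy$, $y_n$ marked on the axis)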
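Note that
$$f_1(y) = n[1 - F_x(y)]^{n-1} f_x(y) \qquad f_n(y) = n F_x^{n-1}(y) f_x(y)$$
These are the densities of the minimum $\mathbf{y}_1$ and the maximum $\mathbf{y}_n$ of the RVs $\mathbf{x}_i$.
Special Case. If the RVs $\mathbf{x}_i$ are exponential with parameter $a$:
$$f_x(x) = a e^{-ax} U(x) \qquad F_x(x) = (1 - e^{-ax})U(x)$$
then
$$f_1(y) = n\,(e^{-ay})^{n-1}\, a e^{-ay}\, U(y) = na\, e^{-nay}\, U(y)$$
that is, their minimum $\mathbf{y}_1$ is also exponential with parameter $na$.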
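As a numerical check of (8-14) and of the special case above, the following sketch evaluates the order-statistic density for exponential RVs. The values of `a`, `n`, and `y` are arbitrary illustrative choices, and `integrate` is a plain midpoint rule written for this example, not a library routine.

```python
import math

def order_stat_density(y, k, n, F, f):
    """Density (8-14) of the k-th order statistic of n i.i.d. RVs."""
    coeff = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    return coeff * F(y) ** (k - 1) * (1.0 - F(y)) ** (n - k) * f(y)

a, n = 0.5, 4  # assumed exponential parameter and number of RVs

def F(x):
    """Exponential distribution function (1 - e^{-ax}) U(x)."""
    return 1.0 - math.exp(-a * x) if x >= 0 else 0.0

def f(x):
    """Exponential density a e^{-ax} U(x)."""
    return a * math.exp(-a * x) if x >= 0 else 0.0

# Minimum (k = 1): (8-14) should reduce to n a e^{-n a y}.
y = 1.3
f_min = order_stat_density(y, 1, n, F, f)
f_min_closed = n * a * math.exp(-n * a * y)

def integrate(g, lo, hi, steps=20000):
    """Midpoint rule; adequate for this smooth, rapidly decaying integrand."""
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) for i in range(steps)) * h

# Any order-statistic density must integrate to 1; checked here for k = 2.
total = integrate(lambda t: order_stat_density(t, 2, n, F, f), 0.0, 60.0)
print(f_min, f_min_closed, total)
```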
Example 8-3. A system consists of $m$ components and the time to failure of the
$i$th component is an RV $\mathbf{x}_i$ with distribution $F_i(x)$. Thus
$$1 - F_i(t) = P\{\mathbf{x}_i > t\}$$
is the probability that the $i$th component is good at time $t$. We denote by $\mathbf{n}(t)$ the
number of components that are good at time $t$. Clearly,
$$\mathbf{n}(t) = \mathbf{n}_1 + \cdots + \mathbf{n}_m$$
where
$$\mathbf{n}_i = \begin{cases} 1 & \mathbf{x}_i > t \\ 0 & \mathbf{x}_i < t \end{cases} \qquad E\{\mathbf{n}_i\} = 1 - F_i(t)$$
Hence the mean $E\{\mathbf{n}(t)\} = \eta(t)$ of $\mathbf{n}(t)$ is given by
$$\eta(t) = 1 - F_1(t) + \cdots + 1 - F_m(t)$$
We shall assume that the RVs $\mathbf{x}_i$ have the same distribution $F(t)$. In this case,
$$\eta(t) = m[1 - F(t)]$$
Failure rate. The difference $\eta(t) - \eta(t + dt)$ is the expected number of
failures in the interval $(t, t + dt)$. The derivative $-\eta'(t) = m f(t)$ of $-\eta(t)$ is the
rate of failure. The ratio
$$\beta(t) = -\frac{\eta'(t)}{\eta(t)} = \frac{f(t)}{1 - F(t)} \tag{8-15}$$
is called the relative expected failure rate. As we see from (7-52), the function $\beta(t)$
can also be interpreted as the conditional failure rate of each component in the
system. Assuming that the system is put into operation at $t = 0$, we have $\mathbf{n}(0) = m$;
hence $\eta(0) = E\{\mathbf{n}(0)\} = m$. Solving (8-15) for $\eta(t)$, we obtain
$$\eta(t) = m \exp\left\{-\int_0^t \beta(\tau)\,d\tau\right\}$$
Example 8-4 Measurement errors. We measure an object of length $\eta$ with $n$
instruments of varying accuracies. The results of the measurements are $n$ RVs
$$\mathbf{x}_i = \eta + \boldsymbol{\nu}_i \qquad E\{\boldsymbol{\nu}_i\} = 0 \qquad E\{\boldsymbol{\nu}_i^2\} = \sigma_i^2$$
where $\boldsymbol{\nu}_i$ are the measurement errors which we assume independent with zero
mean. We shall determine the unbiased, minimum variance, linear estimation of
$\eta$. This means the following: We wish to find $n$ constants $a_i$ such that the sum
$$\hat{\boldsymbol{\eta}} = a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n$$
is an RV with mean $E\{\hat{\boldsymbol{\eta}}\} = a_1 E\{\mathbf{x}_1\} + \cdots + a_n E\{\mathbf{x}_n\} = \eta$ and its variance
$$V = a_1^2\sigma_1^2 + \cdots + a_n^2\sigma_n^2$$
is minimum. Thus our problem is to minimize the above sum subject to the
constraint
$$a_1 + \cdots + a_n = 1 \tag{8-16}$$
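To solve this problem, we note that
$$V = a_1^2\sigma_1^2 + \cdots + a_n^2\sigma_n^2 - \lambda(a_1 + \cdots + a_n - 1)$$
for any $\lambda$ (Lagrange multiplier). Hence $V$ is minimum if
$$\frac{\partial V}{\partial a_i} = 2a_i\sigma_i^2 - \lambda = 0 \qquad a_i = \frac{\lambda}{2\sigma_i^2}$$
Inserting into (8-16) and solving for $\lambda$, we obtain
$$\frac{\lambda}{2} = V = \frac{1}{1/\sigma_1^2 + \cdots + 1/\sigma_n^2}$$
Hence
$$\hat{\boldsymbol{\eta}} = \frac{\mathbf{x}_1/\sigma_1^2 + \cdots + \mathbf{x}_n/\sigma_n^2}{1/\sigma_1^2 + \cdots + 1/\sigma_n^2} \tag{8-17}$$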
Illustration. The voltage $E$ of a generator is measured three times. We list below
the results $x_i$ of the measurements, the standard deviations $\sigma_i$ of the measurement
errors, and the estimate $\hat{E}$ of $E$ obtained from (8-17):
$$x_i = 98.6,\ 98.8,\ 98.9 \qquad \sigma_i = 0.20,\ 0.25,\ 0.28$$
$$\hat{E} = \frac{x_1/0.04 + x_2/0.0625 + x_3/0.0784}{1/0.04 + 1/0.0625 + 1/0.0784} \approx 98.73$$
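The estimate (8-17) is an inverse-variance weighted average, and it can be reproduced with a few lines of code. The sketch below uses the three measurements of the illustration; the function name `min_variance_estimate` is hypothetical.

```python
def min_variance_estimate(measurements, sigmas):
    """Unbiased minimum-variance linear estimate (8-17).

    Returns the estimate, its variance V = 1 / sum(1/sigma_i^2),
    and the weights a_i, which satisfy the constraint (8-16).
    """
    inverse_variances = [1.0 / s ** 2 for s in sigmas]
    total = sum(inverse_variances)
    weights = [w / total for w in inverse_variances]
    estimate = sum(a * x for a, x in zip(weights, measurements))
    return estimate, 1.0 / total, weights

x = [98.6, 98.8, 98.9]
sigma = [0.20, 0.25, 0.28]
estimate, variance, weights = min_variance_estimate(x, sigma)
print(round(estimate, 2), variance, weights)
```

The variance of the combined estimate is smaller than that of the most accurate single instrument, which is the point of weighting by $1/\sigma_i^2$.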
Group independence. We say that the group $G_x$ of the RVs $\mathbf{x}_1,\dots,\mathbf{x}_n$ is
independent of the group $G_y$ of the RVs $\mathbf{y}_1,\dots,\mathbf{y}_k$ if
$$f(x_1,\dots,x_n,y_1,\dots,y_k) = f(x_1,\dots,x_n)\,f(y_1,\dots,y_k) \tag{8-18}$$
By suitable integration as in (8-5) we conclude from (8-18) that any subgroup of
$G_x$ is independent of any subgroup of $G_y$. In particular, the RVs $\mathbf{x}_i$ and $\mathbf{y}_j$ are
independent for any $i$ and $j$.
Suppose that $\mathcal{S}$ is a combined experiment $\mathcal{S}_1 \times \mathcal{S}_2$, the RVs $\mathbf{x}_i$ depend
only on the outcomes of $\mathcal{S}_1$, and the RVs $\mathbf{y}_j$ depend only on the outcomes of
$\mathcal{S}_2$. If the experiments $\mathcal{S}_1$ and $\mathcal{S}_2$ are independent, then the groups $G_x$ and
$G_y$ are independent.
We note finally that if the RVs $\mathbf{z}_m$ depend only on the RVs $\mathbf{x}_i$ of $G_x$ and
the RVs $\mathbf{w}_r$ depend only on the RVs $\mathbf{y}_j$ of $G_y$, then the groups $G_z$ and $G_w$ are
independent.
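Complex random variables. The statistics of the RVs
$$\mathbf{z}_1 = \mathbf{x}_1 + j\mathbf{y}_1, \;\dots,\; \mathbf{z}_n = \mathbf{x}_n + j\mathbf{y}_n$$
are determined in terms of the joint density $f(x_1,\dots,x_n,y_1,\dots,y_n)$ of the $2n$
RVs $\mathbf{x}_i$ and $\mathbf{y}_i$. We say that the complex RVs $\mathbf{z}_i$ are independent if
$$f(x_1,\dots,x_n,y_1,\dots,y_n) = f(x_1,y_1)\cdots f(x_n,y_n) \tag{8-19}$$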
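Mean and Covariance
Extending (7-2) to $n$ RVs, we conclude that the mean of $g(\mathbf{x}_1,\dots,\mathbf{x}_n)$ equals
$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(x_1,\dots,x_n)\,f(x_1,\dots,x_n)\,dx_1\cdots dx_n \tag{8-20}$$
If the RVs $\mathbf{z}_i = \mathbf{x}_i + j\mathbf{y}_i$ are complex, then the mean of $g(\mathbf{z}_1,\dots,\mathbf{z}_n)$
equals
$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(z_1,\dots,z_n)\,f(x_1,\dots,x_n,y_1,\dots,y_n)\,dx_1\cdots dy_n$$
From the above it follows that (linearity)
$$E\{a_1 g_1(\mathbf{X}) + \cdots + a_m g_m(\mathbf{X})\} = a_1 E\{g_1(\mathbf{X})\} + \cdots + a_m E\{g_m(\mathbf{X})\}$$
for any random vector $\mathbf{X}$ real or complex.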
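CORRELATION AND COVARIANCE MATRICES. The covariance $C_{ij}$ of two real
RVs $\mathbf{x}_i$ and $\mathbf{x}_j$ is defined as in (7-6). For complex RVs
$$C_{ij} = E\{(\mathbf{x}_i - \eta_i)(\mathbf{x}_j^* - \eta_j^*)\} = E\{\mathbf{x}_i\mathbf{x}_j^*\} - E\{\mathbf{x}_i\}E\{\mathbf{x}_j^*\}$$
by definition. The variance of $\mathbf{x}_i$ is given by
$$\sigma_i^2 = C_{ii} = E\{|\mathbf{x}_i - \eta_i|^2\} = E\{|\mathbf{x}_i|^2\} - |E\{\mathbf{x}_i\}|^2$$
The RVs $\mathbf{x}_i$ are called (mutually) uncorrelated if $C_{ij} = 0$ for every $i \ne j$. In
this case, if
$$\mathbf{x} = \mathbf{x}_1 + \cdots + \mathbf{x}_n \quad\text{then}\quad \sigma_x^2 = \sigma_1^2 + \cdots + \sigma_n^2 \tag{8-21}$$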
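Example 8-5. The RVs
$$\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i \qquad \bar{\mathbf{v}} = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})^2$$
are by definition the sample mean and the sample variance respectively of $\mathbf{x}_i$. We
shall show that, if the RVs $\mathbf{x}_i$ are uncorrelated with the same mean $E\{\mathbf{x}_i\} = \eta$ and
variance $\sigma_i^2 = \sigma^2$, then
$$E\{\bar{\mathbf{x}}\} = \eta \qquad \sigma_{\bar{x}}^2 = \sigma^2/n \tag{8-22}$$
and
$$E\{\bar{\mathbf{v}}\} = \sigma^2 \tag{8-23}$$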
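Proof. The first equation in (8-22) follows from the linearity of expected values and
the second from (8-21):
$$E\{\bar{\mathbf{x}}\} = \frac{1}{n}\sum_{i=1}^{n}E\{\mathbf{x}_i\} = \eta \qquad \sigma_{\bar{x}}^2 = \frac{1}{n^2}\sum_{i=1}^{n}\sigma_i^2 = \frac{\sigma^2}{n}$$
To prove (8-23), we observe that
$$E\{(\mathbf{x}_i - \eta)(\bar{\mathbf{x}} - \eta)\} = \frac{1}{n}E\{(\mathbf{x}_i - \eta)[(\mathbf{x}_1 - \eta) + \cdots + (\mathbf{x}_n - \eta)]\} = \frac{1}{n}E\{(\mathbf{x}_i - \eta)(\mathbf{x}_i - \eta)\} = \frac{\sigma^2}{n}$$
because the RVs $\mathbf{x}_i$ and $\mathbf{x}_j$ are uncorrelated by assumption. Hence
$$E\{(\mathbf{x}_i - \bar{\mathbf{x}})^2\} = E\{[(\mathbf{x}_i - \eta) - (\bar{\mathbf{x}} - \eta)]^2\} = \sigma^2 + \frac{\sigma^2}{n} - 2\frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2$$
This yields
$$E\{\bar{\mathbf{v}}\} = \frac{1}{n-1}\sum_{i=1}^{n}E\{(\mathbf{x}_i - \bar{\mathbf{x}})^2\} = \frac{n}{n-1}\,\frac{n-1}{n}\,\sigma^2$$
and (8-23) results.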
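The identities (8-22) and (8-23) can be verified exactly for a small discrete case by enumerating every outcome. The sketch below assumes $n = 3$ i.i.d. RVs (hence uncorrelated) with a hypothetical three-point distribution; the values and probabilities are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distribution shared by each x_i.
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 3

eta = sum(p * v for v, p in zip(values, probs))
sigma2 = sum(p * (v - eta) ** 2 for v, p in zip(values, probs))

def sample_statistics_moments():
    """Exact E{xbar}, Var{xbar}, and E{vbar} by full enumeration."""
    e_mean = Fraction(0)
    e_mean_sq = Fraction(0)
    e_var = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        p = Fraction(1)
        for i in idx:
            p *= probs[i]          # independence: joint probability is a product
        xs = [Fraction(values[i]) for i in idx]
        xbar = sum(xs) / n
        vbar = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        e_mean += p * xbar
        e_mean_sq += p * xbar ** 2
        e_var += p * vbar
    return e_mean, e_mean_sq - e_mean ** 2, e_var

e_mean, var_mean, e_var = sample_statistics_moments()
print(e_mean == eta, var_mean == sigma2 / n, e_var == sigma2)
```

Dividing by $n - 1$ rather than $n$ is what makes the last equality hold exactly.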
Note that if the RVs $\mathbf{x}_i$ are i.i.d. with $E\{|\mathbf{x}_i - \eta|^4\} = \mu_4$, then (see Prob.
8-21)
$$\sigma_{\bar{v}}^2 = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\,\sigma^4\right)$$
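If the RVs $\mathbf{x}_1,\dots,\mathbf{x}_n$ are independent, they are also uncorrelated. This
follows as in (7-14) for real RVs. For complex RVs the proof is similar: If the
RVs $\mathbf{z}_1 = \mathbf{x}_1 + j\mathbf{y}_1$ and $\mathbf{z}_2 = \mathbf{x}_2 + j\mathbf{y}_2$ are independent, then $f(x_1,x_2,y_1,y_2) =
f(x_1,y_1)f(x_2,y_2)$. Hence
$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} z_1 z_2^*\, f(x_1,x_2,y_1,y_2)\,dx_1\,dy_1\,dx_2\,dy_2
= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} z_1 f(x_1,y_1)\,dx_1\,dy_1 \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} z_2^* f(x_2,y_2)\,dx_2\,dy_2$$
This yields $E\{\mathbf{z}_1\mathbf{z}_2^*\} = E\{\mathbf{z}_1\}E\{\mathbf{z}_2^*\}$; therefore, $\mathbf{z}_1$ and $\mathbf{z}_2$ are uncorrelated.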
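Note, finally, that if the RVs $\mathbf{x}_i$ are independent, then
$$E\{g_1(\mathbf{x}_1)\cdots g_n(\mathbf{x}_n)\} = E\{g_1(\mathbf{x}_1)\}\cdots E\{g_n(\mathbf{x}_n)\} \tag{8-24}$$
Similarly, if the groups $\mathbf{x}_1,\dots,\mathbf{x}_n$ and $\mathbf{y}_1,\dots,\mathbf{y}_k$ are independent, then
$$E\{g(\mathbf{x}_1,\dots,\mathbf{x}_n)\,h(\mathbf{y}_1,\dots,\mathbf{y}_k)\} = E\{g(\mathbf{x}_1,\dots,\mathbf{x}_n)\}\,E\{h(\mathbf{y}_1,\dots,\mathbf{y}_k)\}$$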
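The correlation matrix. We introduce the matrices
$$R_n = \begin{bmatrix} R_{11} & \cdots & R_{1n} \\ \vdots & & \vdots \\ R_{n1} & \cdots & R_{nn} \end{bmatrix} \qquad C_n = \begin{bmatrix} C_{11} & \cdots & C_{1n} \\ \vdots & & \vdots \\ C_{n1} & \cdots & C_{nn} \end{bmatrix}$$
where
$$R_{ij} = E\{\mathbf{x}_i\mathbf{x}_j^*\} = R_{ji}^* \qquad C_{ij} = R_{ij} - \eta_i\eta_j^* = C_{ji}^*$$
The first is the correlation matrix of the random vector $\mathbf{X} = [\mathbf{x}_1,\dots,\mathbf{x}_n]$
and the second its covariance matrix. Clearly,
$$R_n = E\{\mathbf{X}^t\mathbf{X}^*\}$$
where $\mathbf{X}^t$ is the transpose of $\mathbf{X}$ (column vector). We shall discuss the properties
of the matrix $R_n$ and its determinant $\Delta_n$. The properties of $C_n$ are similar
because $C_n$ is the correlation matrix of the "centered" RVs $\mathbf{x}_i - \eta_i$.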
THEOREM. The matrix $R_n$ is nonnegative definite. This means that
$$Q = \sum_{i,j} a_i a_j^* R_{ij} = A R_n A^+ \ge 0 \tag{8-25}$$
where $A^+$ is the conjugate transpose of the vector $A = [a_1,\dots,a_n]$.
Proof. It follows readily from the linearity of expected values
$$E\{|a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n|^2\} = \sum_{i,j} a_i a_j^* E\{\mathbf{x}_i\mathbf{x}_j^*\} \tag{8-26}$$
If (8-25) is strictly positive, that is, if $Q > 0$ for any $A \ne 0$, then $R_n$ is
called positive definite.† The difference between $Q \ge 0$ and $Q > 0$ is related to
the notion of linear dependence.
DEFINITION. The RVs $\mathbf{x}_i$ are called linearly independent if
$$E\{|a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n|^2\} > 0 \tag{8-27}$$
for any $A \ne 0$. In this case [see (8-26)], their correlation matrix $R_n$ is positive
definite.
†We shall use the abbreviation p.d. to indicate that $R_n$ satisfies (8-25). The distinction between
$Q \ge 0$ and $Q > 0$ will be understood from the context.
The RVs $\mathbf{x}_i$ are called linearly dependent if
$$a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n = 0 \tag{8-28}$$
for some $A \ne 0$. In this case, the corresponding $Q$ equals 0 and the matrix $R_n$
is singular [see also (8-29)].
From the definition it follows that, if the RVs $\mathbf{x}_i$ are linearly independent,
then any subset is also linearly independent.
The correlation determinant. The determinant $\Delta_n$ is real because $R_{ij} = R_{ji}^*$.
We shall show that it is also nonnegative
$$\Delta_n \ge 0 \tag{8-29}$$
with equality iff the RVs $\mathbf{x}_i$ are linearly dependent. The familiar inequality
$\Delta_2 = R_{11}R_{22} - R_{12}^2 \ge 0$ is a special case [see (7-12)].
Suppose, first, that the RVs $\mathbf{x}_i$ are linearly independent. We maintain that,
in this case, the determinant $\Delta_n$ and all its principal minors are positive
$$\Delta_k > 0 \qquad k \le n \tag{8-30}$$
Proof. The above is true for $n = 1$ because $\Delta_1 = R_{11} > 0$. Since the RVs of any
subset of the set $\{\mathbf{x}_i\}$ are linearly independent, we can assume that (8-30) is true
for $k \le n - 1$ and we shall show that $\Delta_n > 0$. For this purpose, we form the
system
$$\begin{aligned} R_{11}a_1 + \cdots + R_{1n}a_n &= 1 \\ R_{21}a_1 + \cdots + R_{2n}a_n &= 0 \\ \cdots\cdots\cdots\cdots\cdots\cdots & \\ R_{n1}a_1 + \cdots + R_{nn}a_n &= 0 \end{aligned} \tag{8-31}$$
Solving for $a_1$, we obtain $a_1 = \Delta_{n-1}/\Delta_n$ where $\Delta_{n-1}$ is the correlation
determinant of the RVs $\mathbf{x}_2,\dots,\mathbf{x}_n$. Thus $a_1$ is a real number. Multiplying the $j$th
equation by $a_j^*$ and adding, we obtain
$$Q = \sum_{i,j} a_i a_j^* R_{ij} = a_1 = \frac{\Delta_{n-1}}{\Delta_n} \tag{8-32}$$
In the above, $Q > 0$ because the RVs $\mathbf{x}_i$ are linearly independent and the left
side of (8-27) equals $Q$. Furthermore, $\Delta_{n-1} > 0$ by the induction hypothesis;
hence $\Delta_n > 0$.
We shall now show that, if the RVs $\mathbf{x}_i$ are linearly dependent, then
$$\Delta_n = 0 \tag{8-33}$$
Proof. In this case, there exists a vector $A \ne 0$ such that $a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n = 0$.
Multiplying by $\mathbf{x}_i^*$ and taking expected values, we obtain
$$a_1 R_{1i} + \cdots + a_n R_{ni} = 0 \qquad i = 1,\dots,n$$
This is a homogeneous system satisfied by the nonzero vector $A$; hence $\Delta_n = 0$.
Note, finally, that [see (15-161)]
$$\Delta_n \le R_{11}R_{22}\cdots R_{nn} \tag{8-34}$$
with equality iff the RVs $\mathbf{x}_i$ are (mutually) orthogonal, that is, if the matrix $R_n$ is
diagonal.
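The properties (8-29), (8-33), and (8-34) of the correlation determinant can be illustrated on a finite sample space. In the sketch below the four equiprobable outcomes and the values of $\mathbf{x}_1$, $\mathbf{x}_2$ are assumptions made for the example, and $\mathbf{x}_3 = 2\mathbf{x}_1 - \mathbf{x}_2$ is deliberately chosen to be linearly dependent on the other two. The helper `det` is a plain cofactor expansion, adequate only for small matrices.

```python
from fractions import Fraction

def correlation_matrix(rvs, probs):
    """R_ij = E{x_i x_j} for real RVs given as lists of values per outcome."""
    n = len(rvs)
    return [[sum(p * a * b for p, a, b in zip(probs, rvs[i], rvs[j]))
             for j in range(n)] for i in range(n)]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, entry in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

# Assumed experiment: four equiprobable outcomes.
probs = [Fraction(1, 4)] * 4
x1 = [1, -1, 2, 0]
x2 = [0, 1, 1, 3]
x3 = [2 * a - b for a, b in zip(x1, x2)]  # linearly dependent on x1, x2

R2 = correlation_matrix([x1, x2], probs)
R3 = correlation_matrix([x1, x2, x3], probs)
d2 = det(R2)   # positive: x1 and x2 are linearly independent
d3 = det(R3)   # zero: the three RVs are linearly dependent
print(d2, d3, R2[0][0] * R2[1][1])
```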
8-2 CONDITIONAL DENSITIES,
CHARACTERISTIC FUNCTIONS,
AND NORMALITY
Conditional densities can be defined as in Sec. 7-2. We shall discuss various
extensions of the equation $f(y \mid x) = f(x, y)/f(x)$. Reasoning as in (7-41),
we conclude that the conditional density of the RVs $\mathbf{x}_n,\dots,\mathbf{x}_{k+1}$ assuming
$\mathbf{x}_k,\dots,\mathbf{x}_1$ is given by
$$f(x_n,\dots,x_{k+1} \mid x_k,\dots,x_1) = \frac{f(x_1,\dots,x_k,\dots,x_n)}{f(x_1,\dots,x_k)} \tag{8-35}$$
The corresponding distribution function is obtained by integration:
$$F(x_n,\dots,x_{k+1} \mid x_k,\dots,x_1) = \int_{-\infty}^{x_n}\cdots\int_{-\infty}^{x_{k+1}} f(\alpha_n,\dots,\alpha_{k+1} \mid x_k,\dots,x_1)\,d\alpha_{k+1}\cdots d\alpha_n \tag{8-36}$$
For example,
$$f(x_1 \mid x_2,x_3) = \frac{f(x_1,x_2,x_3)}{f(x_2,x_3)} = \frac{\partial F(x_1 \mid x_2,x_3)}{\partial x_1}$$
Chain rule. From (8-35) it follows that
$$f(x_1,\dots,x_n) = f(x_n \mid x_{n-1},\dots,x_1)\cdots f(x_2 \mid x_1)\,f(x_1) \tag{8-37}$$
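Example 8-6. We have shown that [see (5-18)] if $\mathbf{x}$ is an RV with distribution $F(x)$,
then the RV $\mathbf{y} = F(\mathbf{x})$ is uniform in the interval (0, 1). The following is a
generalization.
Given $n$ arbitrary RVs $\mathbf{x}_i$, we form the RVs
$$\mathbf{y}_1 = F(\mathbf{x}_1) \qquad \mathbf{y}_2 = F(\mathbf{x}_2 \mid \mathbf{x}_1), \;\dots,\; \mathbf{y}_n = F(\mathbf{x}_n \mid \mathbf{x}_{n-1},\dots,\mathbf{x}_1) \tag{8-38}$$
We shall show that these RVs are independent and each is uniform in the interval
(0, 1).
Proof. The RVs $\mathbf{y}_i$ are functions of the RVs $\mathbf{x}_i$ obtained with the transformation
(8-38). For $0 \le y_i \le 1$, the system
$$y_1 = F(x_1) \qquad y_2 = F(x_2 \mid x_1), \;\dots,\; y_n = F(x_n \mid x_{n-1},\dots,x_1)$$
has a unique solution $x_1,\dots,x_n$ and its jacobian equals
$$J = \begin{vmatrix} \dfrac{\partial y_1}{\partial x_1} & 0 & \cdots & 0 \\ \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & 0 \\ \vdots & & & \vdots \\ \dfrac{\partial y_n}{\partial x_1} & \cdots & \cdots & \dfrac{\partial y_n}{\partial x_n} \end{vmatrix}$$
The above determinant is triangular; hence it equals the product of its diagonal
elements
$$\frac{\partial y_k}{\partial x_k} = f(x_k \mid x_{k-1},\dots,x_1)$$
Inserting into (8-8) and using (8-37), we obtain
$$f(y_1,\dots,y_n) = \frac{f(x_1,\dots,x_n)}{f(x_1)\,f(x_2 \mid x_1)\cdots f(x_n \mid x_{n-1},\dots,x_1)} = 1$$
in the $n$-dimensional cube $0 \le y_i \le 1$, and 0 otherwise.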
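From (8-5) and (8-35) it follows that
$$f(x_1 \mid x_3) = \int_{-\infty}^{\infty} f(x_1, x_2 \mid x_3)\,dx_2$$
$$f(x_1 \mid x_4) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x_1 \mid x_2,x_3,x_4)\,f(x_2,x_3 \mid x_4)\,dx_2\,dx_3$$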
Generalizing, we obtain the following rule for removing variables on the left or
on the right of the conditional line: To remove any number of variables on the
left of the conditional line, we integrate with respect to them. To remove any
number of variables to the right of the line, we multiply by their conditional
density with respect to the remaining variables on the right, and we integrate
the product. The following special case is used extensively (Chapman-
Kolmogoroff):
$$f(x_1 \mid x_3) = \int_{-\infty}^{\infty} f(x_1 \mid x_2,x_3)\,f(x_2 \mid x_3)\,dx_2 \tag{8-39}$$
Discrete type The above rule holds also for discrete type RVs provided
that all densities are replaced by probabilities and all integrals by sums. We
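mention as an example the discrete form of (8-39): If the RVs $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ take the
values $a_i, b_k, c_r$ respectively, then
$$P\{\mathbf{x}_1 = a_i \mid \mathbf{x}_3 = c_r\} = \sum_k P\{\mathbf{x}_1 = a_i \mid b_k, c_r\}\,P\{\mathbf{x}_2 = b_k \mid c_r\} \tag{8-40}$$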
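The discrete Chapman-Kolmogoroff relation (8-40) can be confirmed directly on a small joint probability table. The table below is a hypothetical example on $\{0,1\}^3$ with all entries positive, so every conditional probability is defined; the helper names are not from the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf P{x1 = a, x2 = b, x3 = c}, given as integer weights.
weights = {(0, 0, 0): 1, (0, 0, 1): 2, (0, 1, 0): 3, (0, 1, 1): 1,
           (1, 0, 0): 2, (1, 0, 1): 4, (1, 1, 0): 1, (1, 1, 1): 2}
total_weight = sum(weights.values())
pmf = {k: Fraction(w, total_weight) for k, w in weights.items()}

def p(a=None, b=None, c=None):
    """Probability that the specified coordinates take the given values."""
    return sum(q for (i, j, k), q in pmf.items()
               if (a is None or i == a)
               and (b is None or j == b)
               and (c is None or k == c))

def lhs(a, c):
    """P{x1 = a | x3 = c} computed directly."""
    return p(a=a, c=c) / p(c=c)

def rhs(a, c):
    """Right side of (8-40): sum over the values b of x2."""
    return sum((p(a=a, b=b, c=c) / p(b=b, c=c)) * (p(b=b, c=c) / p(c=c))
               for b in (0, 1))

checks = [lhs(a, c) == rhs(a, c) for a, c in product((0, 1), repeat=2)]
print(checks)
```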
CONDITIONAL EXPECTED VALUES. The conditional mean of the RV
$g(\mathbf{x}_1,\dots,\mathbf{x}_n)$ assuming $\mathcal{M}$ is given by the integral in (8-20) provided that the
density $f(x_1,\dots,x_n)$ is replaced by the conditional density $f(x_1,\dots,x_n \mid \mathcal{M})$.
Note, in particular, that [see also (7-57)]
$$E\{\mathbf{x}_1 \mid x_2,\dots,x_n\} = \int_{-\infty}^{\infty} x_1 f(x_1 \mid x_2,\dots,x_n)\,dx_1 \tag{8-41}$$
The above is a function of $x_2,\dots,x_n$; it defines, therefore, the RV
$E\{\mathbf{x}_1 \mid \mathbf{x}_2,\dots,\mathbf{x}_n\}$. Multiplying (8-41) by $f(x_2,\dots,x_n)$ and integrating, we
conclude that
$$E\{E\{\mathbf{x}_1 \mid \mathbf{x}_2,\dots,\mathbf{x}_n\}\} = E\{\mathbf{x}_1\} \tag{8-42}$$
Reasoning similarly, we obtain
$$E\{\mathbf{x}_1 \mid x_2,x_3\} = E\{E\{\mathbf{x}_1 \mid x_2,x_3,\mathbf{x}_4\}\} = \int_{-\infty}^{\infty} E\{\mathbf{x}_1 \mid x_2,x_3,x_4\}\,f(x_4 \mid x_2,x_3)\,dx_4 \tag{8-43}$$
This leads to the following generalization: To remove any number of variables
on the right of the conditional expected value line, we multiply by their
conditional density with respect to the remaining variables on the right and we
integrate the product. For example,
$$E\{\mathbf{x}_1 \mid x_3\} = \int_{-\infty}^{\infty} E\{\mathbf{x}_1 \mid x_2,x_3\}\,f(x_2 \mid x_3)\,dx_2 \tag{8-44}$$
and for the discrete case [see (8-40)]
$$E\{\mathbf{x}_1 \mid c_r\} = \sum_k E\{\mathbf{x}_1 \mid b_k, c_r\}\,P\{\mathbf{x}_2 = b_k \mid c_r\} \tag{8-45}$$
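Example 8-7. Given a discrete type RV $\mathbf{n}$ taking the values 1, 2, ... and a
sequence of RVs $\mathbf{x}_k$ independent of $\mathbf{n}$, we form the sum
$$\mathbf{s} = \sum_{k=1}^{\mathbf{n}} \mathbf{x}_k \tag{8-46}$$
This sum is an RV specified as follows: For a specific $\zeta$, $\mathbf{n}(\zeta)$ is an integer and $\mathbf{s}(\zeta)$
equals the sum of the numbers $\mathbf{x}_k(\zeta)$ for $k$ from 1 to $\mathbf{n}(\zeta)$. We maintain that if the
RVs $\mathbf{x}_k$ have the same mean, then
$$E\{\mathbf{s}\} = \eta E\{\mathbf{n}\} \quad\text{where}\quad E\{\mathbf{x}_k\} = \eta \tag{8-47}$$
Clearly, $E\{\mathbf{x}_k \mid \mathbf{n} = n\} = E\{\mathbf{x}_k\}$ because $\mathbf{x}_k$ is independent of $\mathbf{n}$. Hence
$$E\{\mathbf{s} \mid \mathbf{n} = n\} = E\left\{\sum_{k=1}^{n}\mathbf{x}_k \,\Big|\, \mathbf{n} = n\right\} = \sum_{k=1}^{n} E\{\mathbf{x}_k\} = \eta n$$
From this and (7-65) it follows that
$$E\{\mathbf{s}\} = E\{E\{\mathbf{s} \mid \mathbf{n}\}\} = E\{\eta\mathbf{n}\}$$
and (8-47) results.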
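We show next that if the RVs $\mathbf{x}_k$ are uncorrelated with the same variance
$\sigma^2$, then
$$E\{\mathbf{s}^2\} = \eta^2 E\{\mathbf{n}^2\} + \sigma^2 E\{\mathbf{n}\} \tag{8-48}$$
Reasoning as above, we have
$$E\{\mathbf{s}^2 \mid \mathbf{n} = n\} = \sum_{i=1}^{n}\sum_{k=1}^{n} E\{\mathbf{x}_i\mathbf{x}_k\} \tag{8-49}$$
where
$$E\{\mathbf{x}_i\mathbf{x}_k\} = \begin{cases} \sigma^2 + \eta^2 & i = k \\ \eta^2 & i \ne k \end{cases}$$
The double sum in (8-49) contains $n$ terms with $i = k$ and $n^2 - n$ terms with
$i \ne k$; hence it equals
$$(\sigma^2 + \eta^2)n + \eta^2(n^2 - n) = \eta^2 n^2 + \sigma^2 n$$
This yields (8-48) because
$$E\{\mathbf{s}^2\} = E\{E\{\mathbf{s}^2 \mid \mathbf{n}\}\} = E\{\eta^2\mathbf{n}^2 + \sigma^2\mathbf{n}\}$$
Special Case. The number $\mathbf{n}$ of particles emitted from a substance in $t$ seconds is a
Poisson RV with parameter $\lambda t$. The energy $\mathbf{x}_k$ of the $k$th particle has a Maxwell
distribution with mean $3kT/2$ and variance $3k^2T^2/2$ (see Prob. 8-5). The sum $\mathbf{s}$ in
(8-46) is the total emitted energy in $t$ seconds. As we know $E\{\mathbf{n}\} = \lambda t$, $E\{\mathbf{n}^2\} =
\lambda^2 t^2 + \lambda t$ [see (5-37)]. Inserting into (8-47) and (8-48), we obtain
$$E\{\mathbf{s}\} = \frac{3kT\lambda t}{2} \qquad \sigma_s^2 = \frac{15k^2T^2\lambda t}{4}$$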
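The moment formulas (8-47) and (8-48) for a sum with a random number of terms can be checked exactly by enumeration. In the sketch below, the distributions of $\mathbf{n}$ and of the i.i.d. terms $\mathbf{x}_k$ are hypothetical and chosen small enough to enumerate; they are not the Poisson and Maxwell distributions of the special case.

```python
from fractions import Fraction
from itertools import product

# Hypothetical pmf of the number of terms n and of each term x_k.
n_pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
x_pmf = {0: Fraction(1, 4), 2: Fraction(3, 4)}

eta = sum(v * p for v, p in x_pmf.items())
sigma2 = sum((v - eta) ** 2 * p for v, p in x_pmf.items())
e_n = sum(n * p for n, p in n_pmf.items())
e_n2 = sum(n * n * p for n, p in n_pmf.items())

def moments_of_sum():
    """Exact E{s} and E{s^2}, with the x_k independent of n and of each other."""
    e_s = Fraction(0)
    e_s2 = Fraction(0)
    for n, pn in n_pmf.items():
        for xs in product(x_pmf, repeat=n):
            p = pn
            for x in xs:
                p *= x_pmf[x]
            s = sum(xs)
            e_s += p * s
            e_s2 += p * s * s
    return e_s, e_s2

e_s, e_s2 = moments_of_sum()
print(e_s == eta * e_n, e_s2 == eta ** 2 * e_n2 + sigma2 * e_n)
```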
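Characteristic Functions and Normality
The characteristic function of a random vector is by definition the function
$$\Phi(\Omega) = E\{e^{j\Omega\mathbf{X}^t}\} = E\{e^{j(\omega_1\mathbf{x}_1 + \cdots + \omega_n\mathbf{x}_n)}\} = \mathbf{\Phi}(j\Omega) \tag{8-50}$$
where
$$\mathbf{X} = [\mathbf{x}_1,\dots,\mathbf{x}_n] \qquad \Omega = [\omega_1,\dots,\omega_n]$$
As an application, we shall show that if the RVs $\mathbf{x}_i$ are independent with
respective densities $f_i(x_i)$, then the density $f_z(z)$ of their sum $\mathbf{z} = \mathbf{x}_1 + \cdots + \mathbf{x}_n$
equals the convolution of their densities
$$f_z(z) = f_1(z) * \cdots * f_n(z) \tag{8-51}$$
Proof. Since the RVs $\mathbf{x}_i$ are independent and $e^{j\omega_i\mathbf{x}_i}$ depends only on $\mathbf{x}_i$, we
conclude from (8-24) that
$$E\{e^{j(\omega_1\mathbf{x}_1 + \cdots + \omega_n\mathbf{x}_n)}\} = E\{e^{j\omega_1\mathbf{x}_1}\}\cdots E\{e^{j\omega_n\mathbf{x}_n}\}$$
Hence
$$\Phi_z(\omega) = E\{e^{j\omega(\mathbf{x}_1 + \cdots + \mathbf{x}_n)}\} = \Phi_1(\omega)\cdots\Phi_n(\omega) \tag{8-52}$$
where $\Phi_i(\omega)$ is the characteristic function of $\mathbf{x}_i$. Applying the convolution
theorem for Fourier transforms, we obtain (8-51).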
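Example 8-8. (a) (Bernoulli trials) Using (8-52) we shall rederive the fundamental
equation (3-13). We define the RVs $\mathbf{x}_i$ as follows: $\mathbf{x}_i = 1$ if heads shows at the $i$th
trial and $\mathbf{x}_i = 0$ otherwise. Thus
$$P\{\mathbf{x}_i = 1\} = P\{h\} = p \qquad P\{\mathbf{x}_i = 0\} = P\{t\} = q \qquad \Phi_i(\omega) = pe^{j\omega} + q \tag{8-53}$$
The RV $\mathbf{z} = \mathbf{x}_1 + \cdots + \mathbf{x}_n$ takes the values $0, 1, \dots, n$ and $\{\mathbf{z} = k\}$ is the event
{$k$ heads in $n$ tossings}. Furthermore,
$$\Phi_z(\omega) = E\{e^{j\omega\mathbf{z}}\} = \sum_{k=0}^{n} P\{\mathbf{z} = k\}\,e^{jk\omega} \tag{8-54}$$
The RVs $\mathbf{x}_i$ are independent because $\mathbf{x}_i$ depends only on the outcomes of the $i$th
trial and the trials are independent. Hence [see (8-52) and (8-53)]
$$\Phi_z(\omega) = (pe^{j\omega} + q)^n = \sum_{k=0}^{n}\binom{n}{k} p^k e^{jk\omega} q^{n-k}$$
Comparing with (8-54), we conclude that
$$P\{\mathbf{z} = k\} = P\{k \text{ heads}\} = \binom{n}{k} p^k q^{n-k} \tag{8-55}$$
(b) (Poisson theorem) We shall show that if $p \ll 1$, then
$$P\{\mathbf{z} = k\} \simeq e^{-np}\,\frac{(np)^k}{k!}$$
as in (3-41). In fact, we shall establish a more general result. Suppose that the RVs
$\mathbf{x}_i$ are independent and each takes the value 1 and 0 with respective probabilities
$p_i$ and $q_i = 1 - p_i$. If $p_i \ll 1$, then
$$e^{p_i(e^{j\omega} - 1)} \simeq 1 + p_i(e^{j\omega} - 1) = p_i e^{j\omega} + q_i = \Phi_i(\omega)$$
With $\mathbf{z} = \mathbf{x}_1 + \cdots + \mathbf{x}_n$, it follows from (8-52) that
$$\Phi_z(\omega) \simeq e^{p_1(e^{j\omega} - 1)}\cdots e^{p_n(e^{j\omega} - 1)} = e^{a(e^{j\omega} - 1)}$$
where $a = p_1 + \cdots + p_n$. This leads to the conclusion that [see (5-79)] the RV $\mathbf{z}$ is
approximately Poisson distributed with parameter $a$. It can be shown that the
result is exact in the limit if
$$p_i \to 0 \quad\text{and}\quad p_1 + \cdots + p_n \to a \quad\text{as}\quad n \to \infty$$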
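Both parts of the example can be reproduced numerically with the discrete form of (8-51): the probabilities of the sum of independent 0/1 RVs are the convolution of the individual two-point distributions. The sketch below assumes illustrative values for `n`, `p`, and the list of small probabilities `small`; the helper names are hypothetical.

```python
import math

def convolve(p, q):
    """Discrete convolution of two probability lists indexed by value 0, 1, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def sum_pmf(success_probs):
    """pmf of z = x_1 + ... + x_n for independent 0/1 RVs with P{x_i = 1} = p_i."""
    pmf = [1.0]
    for p in success_probs:
        pmf = convolve(pmf, [1.0 - p, p])
    return pmf

def binomial_pmf(n, p):
    """Closed form (8-55)."""
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def poisson_pmf(a, kmax):
    """Poisson probabilities e^{-a} a^k / k! for k = 0, ..., kmax."""
    return [math.exp(-a) * a ** k / math.factorial(k) for k in range(kmax + 1)]

# Part (a): repeated convolution reproduces the binomial law.
n, p = 10, 0.3
by_convolution = sum_pmf([p] * n)
closed_form = binomial_pmf(n, p)

# Part (b): unequal small p_i; z is approximately Poisson with a = sum of p_i.
small = [0.01, 0.02, 0.015, 0.005] * 25
pmf_small = sum_pmf(small)
a = sum(small)
max_gap = max(abs(u - v) for u, v in zip(pmf_small, poisson_pmf(a, len(small))))
print(max_gap)
```

The gap in part (b) shrinks as the individual $p_i$ decrease with their sum held fixed, in line with the limit statement above.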
NORMAL VECTORS. Joint normality of $n$ RVs $\mathbf{x}_i$ can be defined as in (6-15):
Their joint density is an exponential whose exponent is a negative quadratic. We
give next an equivalent definition that expresses the normality of $n$ RVs in
terms of the normality of a single RV.
DEFINITION. The RVs $\mathbf{x}_i$ are jointly normal iff the sum
$$a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n = A\mathbf{X}^t \tag{8-56}$$
is a normal RV for any $A$.
We shall show that this definition leads to the following conclusions: If the
RVs $\mathbf{x}_i$ have zero mean and covariance matrix $C$, then their joint characteristic
function equals
$$\Phi(\Omega) = \exp\{-\tfrac{1}{2}\Omega C\Omega^t\} \tag{8-57}$$
Furthermore, their joint density equals
$$f(X) = \frac{1}{\sqrt{(2\pi)^n\Delta}}\exp\{-\tfrac{1}{2}XC^{-1}X^t\} \tag{8-58}$$
where $\Delta$ is the determinant of $C$.
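Proof. From the definition of joint normality it follows that the RV
$$\mathbf{w} = \omega_1\mathbf{x}_1 + \cdots + \omega_n\mathbf{x}_n = \Omega\mathbf{X}^t \tag{8-59}$$
is normal. Since $E\{\mathbf{x}_i\} = 0$ by assumption, the above yields [see (8-26)]
$$E\{\mathbf{w}\} = 0 \qquad E\{\mathbf{w}^2\} = \sum_{i,j}\omega_i\omega_j C_{ij} = \sigma_w^2$$
Setting $\eta = 0$ and $\omega = 1$ in (5-65), we obtain
$$E\{e^{j\mathbf{w}}\} = \exp\left\{-\frac{\sigma_w^2}{2}\right\}$$
This yields
$$E\{e^{j\Omega\mathbf{X}^t}\} = \exp\left\{-\frac{1}{2}\sum_{i,j}\omega_i\omega_j C_{ij}\right\} \tag{8-60}$$
as in (8-57). The proof of (8-58) follows from (8-57) and the Fourier inversion
theorem.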
Note, finally, that if the RVs $\mathbf{x}_i$ are jointly normal and uncorrelated, they
are independent. Indeed, in this case, their covariance matrix is diagonal and its
diagonal elements equal $\sigma_i^2$. Hence $C^{-1}$ is also diagonal with diagonal elements
$1/\sigma_i^2$. Inserting into (8-58), we obtain
$$f(x_1,\dots,x_n) = \frac{1}{\sigma_1\cdots\sigma_n\sqrt{(2\pi)^n}}\exp\left\{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \cdots + \frac{x_n^2}{\sigma_n^2}\right)\right\}$$
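Example 8-9. Using characteristic functions, we shall show that if the RVs $\mathbf{x}_i$ are
jointly normal with zero mean, and $E\{\mathbf{x}_i\mathbf{x}_j\} = C_{ij}$, then
$$E\{\mathbf{x}_1\mathbf{x}_2\mathbf{x}_3\mathbf{x}_4\} = C_{12}C_{34} + C_{13}C_{24} + C_{14}C_{23} \tag{8-61}$$
Proof. We expand the exponentials on the left and right side of (8-60) and we show
explicitly only the terms containing the factor $\omega_1\omega_2\omega_3\omega_4$:
$$E\{e^{j(\omega_1\mathbf{x}_1 + \cdots + \omega_4\mathbf{x}_4)}\} = \cdots + \frac{1}{4!}E\{(\omega_1\mathbf{x}_1 + \cdots + \omega_4\mathbf{x}_4)^4\} + \cdots = \cdots + \frac{24}{4!}E\{\mathbf{x}_1\mathbf{x}_2\mathbf{x}_3\mathbf{x}_4\}\,\omega_1\omega_2\omega_3\omega_4 + \cdots$$
$$\exp\left\{-\frac{1}{2}\sum_{i,j}\omega_i\omega_j C_{ij}\right\} = \cdots + \frac{1}{2}\left(\frac{1}{2}\sum_{i,j}\omega_i\omega_j C_{ij}\right)^2 + \cdots = \cdots + \frac{8}{8}(C_{12}C_{34} + C_{13}C_{24} + C_{14}C_{23})\,\omega_1\omega_2\omega_3\omega_4 + \cdots$$
Equating coefficients, we obtain (8-61).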
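Complex normal vectors. A complex normal random vector is a vector $\mathbf{Z} =
\mathbf{X} + j\mathbf{Y} = [\mathbf{z}_1,\dots,\mathbf{z}_n]$ the components of which are $n$ jointly normal RVs $\mathbf{z}_i =
\mathbf{x}_i + j\mathbf{y}_i$. We shall assume that $E\{\mathbf{z}_i\} = 0$. The statistical properties of the vector
$\mathbf{Z}$ are specified in terms of the joint density
$$f_Z(Z) = f(x_1,\dots,x_n,y_1,\dots,y_n)$$
of the $2n$ RVs $\mathbf{x}_i$ and $\mathbf{y}_j$. This function is an exponential as in (8-58) determined
in terms of the $2n$ by $2n$ matrix
$$D = \begin{bmatrix} C_{XX} & C_{XY} \\ C_{YX} & C_{YY} \end{bmatrix}$$
consisting of the $2n^2 + n$ real parameters $E\{\mathbf{x}_i\mathbf{x}_j\}$, $E\{\mathbf{y}_i\mathbf{y}_j\}$, and $E\{\mathbf{x}_i\mathbf{y}_j\}$. The
corresponding characteristic function
$$\Phi_Z(\Omega) = E\{\exp(j(u_1\mathbf{x}_1 + \cdots + u_n\mathbf{x}_n + v_1\mathbf{y}_1 + \cdots + v_n\mathbf{y}_n))\}$$
is an exponential as in (8-60):
$$\Phi_Z(\Omega) = \exp\{-\tfrac{1}{2}Q\} \qquad Q = [U \;\; V]\begin{bmatrix} C_{XX} & C_{XY} \\ C_{YX} & C_{YY} \end{bmatrix}\begin{bmatrix} U^t \\ V^t \end{bmatrix}$$
where $U = [u_1,\dots,u_n]$, $V = [v_1,\dots,v_n]$, and $\Omega = U + jV$.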
The covariance matrix of the complex vector $\mathbf{Z}$ is an $n$ by $n$ hermitian
matrix
$$C_{ZZ} = E\{\mathbf{Z}^t\mathbf{Z}^*\} = C_{XX} + C_{YY} - j(C_{XY} - C_{YX})$$
with elements $E\{\mathbf{z}_i\mathbf{z}_j^*\}$. Thus, $C_{ZZ}$ is specified in terms of $n^2$ real parameters.
From this it follows that, unlike the real case, the density $f_Z(Z)$ of $\mathbf{Z}$ cannot in
general be determined in terms of $C_{ZZ}$ because $f_Z(Z)$ is a normal density
consisting of $2n^2 + n$ parameters. Suppose, for example, that $n = 1$. In this
case, $\mathbf{Z} = \mathbf{z} = \mathbf{x} + j\mathbf{y}$ is a scalar and $C_{ZZ} = E\{|\mathbf{z}|^2\}$. Thus, $C_{ZZ}$ is specified in
terms of the single parameter $\sigma_z^2 = E\{\mathbf{x}^2 + \mathbf{y}^2\}$. However, $f_z(z) = f(x, y)$ is a
bivariate normal density consisting of the three parameters $\sigma_x$, $\sigma_y$, and $E\{\mathbf{x}\mathbf{y}\}$. In
the following, we present a special class of normal vectors that are statistically
determined in terms of their covariance matrix. This class is important in
modulation theory (see Sec. 11-3).
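Goodman's Theorem.† If the vectors $\mathbf{X}$ and $\mathbf{Y}$ are such that
$$C_{XX} = C_{YY} \qquad C_{XY} = -C_{YX}$$
and $\mathbf{Z} = \mathbf{X} + j\mathbf{Y}$, then
$$C_{ZZ} = 2(C_{XX} - jC_{XY})$$
$$f_Z(Z) = \frac{1}{\pi^n|C_{ZZ}|}\exp\{-ZC_{ZZ}^{-1}Z^+\} \tag{8-62a}$$
$$\Phi_Z(\Omega) = \exp\{-\tfrac{1}{4}\Omega C_{ZZ}\Omega^+\} \tag{8-62b}$$
Proof. It suffices to prove (8-62b); the proof of (8-62a) follows from (8-62b) and
the Fourier inversion formula. Under the stated assumptions,
$$Q = [U \;\; V]\begin{bmatrix} C_{XX} & -C_{XY} \\ C_{XY} & C_{XX} \end{bmatrix}\begin{bmatrix} U^t \\ V^t \end{bmatrix} = UC_{XX}U^t + VC_{XY}U^t - UC_{XY}V^t + VC_{XX}V^t$$
Furthermore $C_{XX}^t = C_{XX}$ and $C_{XY}^t = -C_{XY}$. This leads to the conclusion that
$$VC_{XX}U^t = UC_{XX}V^t \qquad UC_{XY}U^t = VC_{XY}V^t = 0$$
Hence
$$\tfrac{1}{2}\Omega C_{ZZ}\Omega^+ = (U + jV)(C_{XX} - jC_{XY})(U^t - jV^t) = Q$$
and (8-62b) results.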
Normal quadratic forms. Given $n$ independent $N(0, 1)$ RVs $\mathbf{z}_i$, we form the
sum of their squares
$$\mathbf{x} = \mathbf{z}_1^2 + \cdots + \mathbf{z}_n^2$$
Using characteristic functions, we shall show that the RV $\mathbf{x}$ so formed has a
chi-square distribution with $n$ degrees of freedom:
$$f_x(x) = \gamma x^{n/2 - 1}e^{-x/2}U(x)$$
†N. R. Goodman, "Statistical Analysis Based on Certain Multivariate Complex Distribution,"
Annals of Math. Statistics, 1963, pp. 152-177.
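Proof. The RVs $\mathbf{z}_i^2$ have a $\chi^2(1)$ distribution (see page 96); hence their
characteristic functions are obtained from (5-71) with $m = 1$. This yields
$$\Phi_i(s) = E\{e^{s\mathbf{z}_i^2}\} = \frac{1}{\sqrt{1 - 2s}}$$
From (8-52) and the independence of the RVs $\mathbf{z}_i^2$, it follows therefore that
$$\Phi_x(s) = \Phi_1(s)\cdots\Phi_n(s) = \frac{1}{\sqrt{(1 - 2s)^n}}$$
Hence [see (5-71)] the RV $\mathbf{x}$ is $\chi^2(n)$.
Note that
$$\frac{1}{\sqrt{(1 - 2s)^m}} \times \frac{1}{\sqrt{(1 - 2s)^n}} = \frac{1}{\sqrt{(1 - 2s)^{m+n}}}$$
This leads to the conclusion that if the RVs $\mathbf{x}$ and $\mathbf{y}$ are independent, $\mathbf{x}$ is $\chi^2(m)$
and $\mathbf{y}$ is $\chi^2(n)$, then the RV
$$\mathbf{z} = \mathbf{x} + \mathbf{y} \text{ is } \chi^2(m + n) \tag{8-63}$$
Conversely, if $\mathbf{z}$ is $\chi^2(m + n)$, $\mathbf{x}$ and $\mathbf{y}$ are independent, and $\mathbf{x}$ is $\chi^2(m)$,
then $\mathbf{y}$ is $\chi^2(n)$. The following is an important application.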
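Sample variance. Given $n$ i.i.d. $N(\eta, \sigma)$ RVs $\mathbf{x}_i$, we form their sample variance
$$\mathbf{s}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})^2 \qquad \bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i \tag{8-64}$$
as in Example 8-5. We shall show that the RV
$$\frac{(n-1)\mathbf{s}^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{\mathbf{x}_i - \bar{\mathbf{x}}}{\sigma}\right)^2 \text{ is } \chi^2(n - 1) \tag{8-65}$$
Proof. We sum the identity
$$(\mathbf{x}_i - \eta)^2 = (\mathbf{x}_i - \bar{\mathbf{x}} + \bar{\mathbf{x}} - \eta)^2 = (\mathbf{x}_i - \bar{\mathbf{x}})^2 + (\bar{\mathbf{x}} - \eta)^2 + 2(\mathbf{x}_i - \bar{\mathbf{x}})(\bar{\mathbf{x}} - \eta)$$
from 1 to $n$. Since $\sum(\mathbf{x}_i - \bar{\mathbf{x}}) = 0$, this yields
$$\sum_{i=1}^{n}\left(\frac{\mathbf{x}_i - \eta}{\sigma}\right)^2 = \sum_{i=1}^{n}\left(\frac{\mathbf{x}_i - \bar{\mathbf{x}}}{\sigma}\right)^2 + n\left(\frac{\bar{\mathbf{x}} - \eta}{\sigma}\right)^2 \tag{8-66}$$
It can be shown that the RVs $\bar{\mathbf{x}}$ and $\mathbf{s}^2$ are independent (see Prob. 8-17). From
this it follows that the two terms on the right of (8-66) are independent.
Furthermore, the term
$$n\left(\frac{\bar{\mathbf{x}} - \eta}{\sigma}\right)^2 = \left(\frac{\bar{\mathbf{x}} - \eta}{\sigma/\sqrt{n}}\right)^2$$
is $\chi^2(1)$ because the RV $\bar{\mathbf{x}}$ is $N(\eta, \sigma/\sqrt{n})$. Finally, the term on the left side is
$\chi^2(n)$ and the proof is complete.
From (8-65) and (5-71) it follows that the mean of the RV $(n - 1)\mathbf{s}^2/\sigma^2$
equals $n - 1$ and its variance equals $2(n - 1)$. This leads to the conclusion that
$$E\{\mathbf{s}^2\} = (n - 1)\frac{\sigma^2}{n - 1} = \sigma^2 \qquad \operatorname{Var}\mathbf{s}^2 = 2(n - 1)\frac{\sigma^4}{(n - 1)^2} = \frac{2\sigma^4}{n - 1} \tag{8-67}$$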
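Example 8-10. We shall verify the above for $n = 2$. In this case,
$$\bar{\mathbf{x}} = \frac{\mathbf{x}_1 + \mathbf{x}_2}{2} \qquad \mathbf{s}^2 = (\mathbf{x}_1 - \bar{\mathbf{x}})^2 + (\mathbf{x}_2 - \bar{\mathbf{x}})^2 = \frac{1}{2}(\mathbf{x}_1 - \mathbf{x}_2)^2$$
The RVs $\mathbf{x}_1 + \mathbf{x}_2$ and $\mathbf{x}_1 - \mathbf{x}_2$ are independent because they are jointly normal
and $E\{\mathbf{x}_1 - \mathbf{x}_2\} = 0$, $E\{(\mathbf{x}_1 - \mathbf{x}_2)(\mathbf{x}_1 + \mathbf{x}_2)\} = 0$. From this it follows that the RVs
$\bar{\mathbf{x}}$ and $\mathbf{s}^2$ are independent. But the RV $(\mathbf{x}_1 - \mathbf{x}_2)/\sigma\sqrt{2} = \mathbf{s}/\sigma$ is $N(0, 1)$; hence its
square $\mathbf{s}^2/\sigma^2$ is $\chi^2(1)$ in agreement with (8-65).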
8-3 MEAN SQUARE ESTIMATION
In Sec. 7-5, we considered the problem of estimating an RV $\mathbf{s}$ by a linear and a
nonlinear function of another RV $\mathbf{x}$. Generalizing, we consider now the problem
of estimating $\mathbf{s}$ in terms of $n$ RVs $\mathbf{x}_1,\dots,\mathbf{x}_n$ (data). This topic is developed
further in Chap. 14 in the context of infinitely many data and stochastic
processes.
LINEAR ESTIMATION. The linear MS estimate of $\mathbf{s}$ in terms of the RVs $\mathbf{x}_i$ is the
sum
$$\hat{\mathbf{s}} = a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n \tag{8-68}$$
where $a_1,\dots,a_n$ are $n$ constants such that the MS value
$$P = E\{(\mathbf{s} - \hat{\mathbf{s}})^2\} = E\{[\mathbf{s} - (a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n)]^2\} \tag{8-69}$$
of the estimation error $\mathbf{s} - \hat{\mathbf{s}}$ is minimum.
Orthogonality principle. $P$ is minimum if the error $\mathbf{s} - \hat{\mathbf{s}}$ is orthogonal to the
data $\mathbf{x}_i$:
$$E\{[\mathbf{s} - (a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n)]\mathbf{x}_i\} = 0 \qquad i = 1,\dots,n \tag{8-70}$$
Proof. $P$ is a function of the constants $a_i$ and it is minimum if
$$\frac{\partial P}{\partial a_i} = E\{-2[\mathbf{s} - (a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n)]\mathbf{x}_i\} = 0$$
and (8-70) results. This important result is known also as the projection theorem.
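Setting $i = 1,\dots,n$ in (8-70), we obtain the system
$$\begin{aligned} R_{11}a_1 + R_{21}a_2 + \cdots + R_{n1}a_n &= R_{01} \\ R_{12}a_1 + R_{22}a_2 + \cdots + R_{n2}a_n &= R_{02} \\ \cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\ R_{1n}a_1 + R_{2n}a_2 + \cdots + R_{nn}a_n &= R_{0n} \end{aligned} \tag{8-71}$$
where $R_{ij} = E\{\mathbf{x}_i\mathbf{x}_j\}$ and $R_{0j} = E\{\mathbf{s}\mathbf{x}_j\}$.
To solve this system, we introduce the row vectors
$$\mathbf{X} = [\mathbf{x}_1,\dots,\mathbf{x}_n] \qquad A = [a_1,\dots,a_n] \qquad R_0 = [R_{01},\dots,R_{0n}]$$
and the data correlation matrix $R = E\{\mathbf{X}^t\mathbf{X}\}$ where $\mathbf{X}^t$ is the transpose of $\mathbf{X}$.
This yields
$$AR = R_0 \qquad A = R_0R^{-1} \tag{8-72}$$
Inserting the constants $a_i$ so determined into (8-69), we obtain the LMS
error. The resulting expression can be simplified. Since $\mathbf{s} - \hat{\mathbf{s}} \perp \mathbf{x}_i$ for every $i$,
we conclude that $\mathbf{s} - \hat{\mathbf{s}} \perp \hat{\mathbf{s}}$; hence
$$P = E\{(\mathbf{s} - \hat{\mathbf{s}})\mathbf{s}\} = E\{\mathbf{s}^2\} - AR_0^t \tag{8-73}$$
Note that if the rank of $R$ is $m < n$, then the data are linearly dependent.
In this case, the estimate $\hat{\mathbf{s}}$ can be written as a linear sum involving a subset of
$m$ linearly independent components of the data vector $\mathbf{X}$.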
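The normal equations (8-71) and the error formula (8-73) can be worked through on a small case. The sketch below assumes $n = 2$ and hypothetical second-order moments `R`, `R0`, and `Ess` (standing for $E\{\mathbf{s}^2\}$); `solve` is a plain Gauss-Jordan elimination written for this example, and exact fractions are used so that the orthogonality check is an equality.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve matrix * a = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def linear_ms_estimate(R, R0, Ess):
    """Coefficients A from (8-71) and MS error P from (8-73)."""
    # (8-71) reads sum_i R_ij a_i = R_0j, i.e. the transpose of R times a.
    R_transposed = [list(col) for col in zip(*R)]
    A = solve(R_transposed, R0)
    P = Ess - sum(a * r for a, r in zip(A, R0))
    return A, P

# Hypothetical moments: R_ij = E{x_i x_j}, R0_j = E{s x_j}, Ess = E{s^2}.
R = [[2, 1], [1, 3]]
R0 = [1, 2]
Ess = 2

A, P = linear_ms_estimate(R, R0, Ess)
# Orthogonality principle (8-70): E{(s - s_hat) x_j} = R0_j - sum_i a_i R_ij = 0.
residuals = [R0[j] - sum(A[i] * R[i][j] for i in range(2)) for j in range(2)]
print(A, P, residuals)
```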
Geometric interpretation. In the representation of RVs as vectors in an abstract
space, the sum $\hat{\mathbf{s}} = a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n$ is a vector in the subspace $S_n$ of the data
$\mathbf{x}_i$ and the error $\boldsymbol{\varepsilon} = \mathbf{s} - \hat{\mathbf{s}}$ is the vector from $\hat{\mathbf{s}}$ to $\mathbf{s}$ as in Fig. 8-2a. The
projection theorem states that the length of $\boldsymbol{\varepsilon}$ is minimum if $\boldsymbol{\varepsilon}$ is orthogonal to
$\mathbf{x}_i$, that is, if it is perpendicular to the data subspace $S_n$. The estimate $\hat{\mathbf{s}}$ is thus
the "projection" of $\mathbf{s}$ on $S_n$.
If $\mathbf{s}$ is a vector in $S_n$, then $\hat{\mathbf{s}} = \mathbf{s}$ and $P = 0$. In this case, the $n + 1$ RVs
$\mathbf{s}, \mathbf{x}_1,\dots,\mathbf{x}_n$ are linearly dependent and the determinant $\Delta_{n+1}$ of their
correlation matrix is 0. If $\mathbf{s}$ is perpendicular to $S_n$, then $\hat{\mathbf{s}} = 0$ and $P = E\{|\mathbf{s}|^2\}$. This is
the case if $\mathbf{s}$ is orthogonal to all the data $\mathbf{x}_j$, that is, if $R_{0j} = 0$ for $j \ne 0$.
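Nonhomogeneous estimation. The estimate (8-68) can be improved if a constant
is added to the sum. The problem now is to determine $n + 1$ parameters $\alpha_k$
such that if
$$\hat{\mathbf{s}} = \alpha_0 + \alpha_1\mathbf{x}_1 + \cdots + \alpha_n\mathbf{x}_n \tag{8-74}$$
then the resulting MS error is minimum. This problem can be reduced to the
homogeneous case if we replace the term $\alpha_0$ by the product $\alpha_0\mathbf{x}_0$ where $\mathbf{x}_0 = 1$.
Applying (8-70) to the enlarged data set
$$\mathbf{x}_0, \mathbf{x}_1,\dots,\mathbf{x}_n \quad\text{where}\quad E\{\mathbf{x}_0\mathbf{x}_i\} = \begin{cases} \eta_i & i \ne 0 \\ 1 & i = 0 \end{cases}$$
we obtain
$$\begin{aligned} \alpha_0 + \eta_1\alpha_1 + \cdots + \eta_n\alpha_n &= \eta_s \\ \eta_1\alpha_0 + R_{11}\alpha_1 + \cdots + R_{1n}\alpha_n &= R_{01} \\ \cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\ \eta_n\alpha_0 + R_{n1}\alpha_1 + \cdots + R_{nn}\alpha_n &= R_{0n} \end{aligned} \tag{8-75}$$
Note that, if $\eta_s = \eta_i = 0$, then (8-75) reduces to (8-71). This yields $\alpha_0 = 0$ and
$\alpha_n = a_n$.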
Nonlinear estimation. The nonlinear MS estimation problem involves the
determination of a function $g(\mathbf{x}_1,\dots,\mathbf{x}_n) = g(\mathbf{X})$ of the data $\mathbf{x}_i$ such as to
minimize the MS error
$$P = E\{[\mathbf{s} - g(\mathbf{X})]^2\} \tag{8-76}$$
We maintain that $P$ is minimum if
$$g(X) = E\{\mathbf{s} \mid X\} = \int_{-\infty}^{\infty} s f_s(s \mid X)\,ds \tag{8-77}$$
The function $E\{\mathbf{s} \mid X\}$ is the conditional mean (regression surface) of the RV $\mathbf{s}$
assuming $\mathbf{X} = X$.
Proof. The proof is based on the identity [see (8-42)]
$$P = E\{[\mathbf{s} - g(\mathbf{X})]^2\} = E\{E\{[\mathbf{s} - g(\mathbf{X})]^2 \mid \mathbf{X}\}\} \tag{8-78}$$
Since all quantities are positive, it follows that $P$ is minimum if the conditional
MS error
$$E\{[\mathbf{s} - g(\mathbf{X})]^2 \mid X\} = \int_{-\infty}^{\infty}[s - g(X)]^2 f_s(s \mid X)\,ds \tag{8-79}$$
is minimum. In the above integral, $g(X)$ is constant. Hence the integral is
minimum if $g(X)$ is given by (8-77) [see also (7-71)].
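The general orthogonality principle. From the projection theorem (8-70) it
follows that
$$E\{[\mathbf{s} - \hat{\mathbf{s}}](c_1\mathbf{x}_1 + \cdots + c_n\mathbf{x}_n)\} = 0 \tag{8-80}$$
for any $c_1,\dots,c_n$. This shows that if $\hat{\mathbf{s}}$ is the linear MS estimator of $\mathbf{s}$, the
estimation error $\mathbf{s} - \hat{\mathbf{s}}$ is orthogonal to any linear function $\mathbf{y} = c_1\mathbf{x}_1 + \cdots + c_n\mathbf{x}_n$
of the data $\mathbf{x}_i$.
We shall now show that if $g(\mathbf{X})$ is the nonlinear MS estimator of $\mathbf{s}$, the
estimation error $\mathbf{s} - g(\mathbf{X})$ is orthogonal to any function $w(\mathbf{X})$, linear or
nonlinear, of the data $\mathbf{x}_i$:
$$E\{[\mathbf{s} - g(\mathbf{X})]w(\mathbf{X})\} = 0 \tag{8-81}$$
Proof. We shall use the following generalization of (7-60):
$$E\{[\mathbf{s} - g(\mathbf{X})]w(\mathbf{X})\} = E\{w(\mathbf{X})E\{\mathbf{s} - g(\mathbf{X}) \mid \mathbf{X}\}\} \tag{8-82}$$
From the linearity of expected values and (8-77) it follows that
$$E\{\mathbf{s} - g(\mathbf{X}) \mid X\} = E\{\mathbf{s} \mid X\} - E\{g(\mathbf{X}) \mid X\} = 0$$
and (8-81) results.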
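Normality. Using the above, we shall show that if the RVs $\mathbf{s}, \mathbf{x}_1,\dots,\mathbf{x}_n$ are
jointly normal with zero mean, the linear and nonlinear estimators of $\mathbf{s}$ are
equal:
$$\hat{\mathbf{s}} = a_1\mathbf{x}_1 + \cdots + a_n\mathbf{x}_n = g(\mathbf{X}) = E\{\mathbf{s} \mid \mathbf{X}\} \tag{8-83}$$
Proof. To prove (8-83), it suffices to show that $\hat{\mathbf{s}} = E\{\mathbf{s} \mid \mathbf{X}\}$. The RVs $\mathbf{s} - \hat{\mathbf{s}}$ and
$\mathbf{x}_i$ are jointly normal with zero mean and orthogonal; hence they are
independent. From this it follows that
$$E\{\mathbf{s} - \hat{\mathbf{s}} \mid X\} = E\{\mathbf{s} - \hat{\mathbf{s}}\} = 0 = E\{\mathbf{s} \mid X\} - E\{\hat{\mathbf{s}} \mid X\}$$
and (8-83) results because $E\{\hat{\mathbf{s}} \mid X\} = \hat{s}$.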
Conditional densities of normal RVs. We shall use the preceding result to
simplify the determination of conditional densities involving normal RVs. The
conditional density $f_s(s \mid X)$ of $\mathbf{s}$ assuming $\mathbf{X}$ is the ratio of two exponentials the
exponents of which are quadratics, hence it is normal. To determine it, it
suffices, therefore, to find the conditional mean and variance of $\mathbf{s}$. We maintain
that
$$E\{\mathbf{s} \mid X\} = \hat{s} \qquad E\{(\mathbf{s} - \hat{\mathbf{s}})^2 \mid X\} = E\{(\mathbf{s} - \hat{\mathbf{s}})^2\} = P \tag{8-84}$$
The first follows from (8-83). The second follows from the fact that $\mathbf{s} - \hat{\mathbf{s}}$ is
orthogonal and, therefore, independent of $\mathbf{X}$. We thus conclude that
$$f(s \mid x_1,\dots,x_n) = \frac{1}{\sqrt{2\pi P}}\,e^{-[s - (a_1x_1 + \cdots + a_nx_n)]^2/2P} \tag{8-85}$$
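Example 8-11. The RVs $\mathbf{x}_1$ and $\mathbf{x}_2$ are jointly normal with zero mean. We shall
determine their conditional density $f(x_2 \mid x_1)$. As we know [see (7-78)]
$$E\{\mathbf{x}_2 \mid x_1\} = ax_1 \qquad a = \frac{R_{12}}{R_{11}}$$
$$\sigma_{x_2|x_1}^2 = P = E\{(\mathbf{x}_2 - a\mathbf{x}_1)\mathbf{x}_2\} = R_{22} - aR_{12}$$
Inserting into (8-85), we obtain
$$f(x_2 \mid x_1) = \frac{1}{\sqrt{2\pi P}}\,e^{-(x_2 - ax_1)^2/2P}$$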
Example 8-12. We now wish to find the conditional density $f(x_3 \mid x_1, x_2)$. In this
case,
$$E\{\mathbf{x}_3 \mid x_1, x_2\} = a_1x_1 + a_2x_2$$
where the constants $a_1$ and $a_2$ are determined from the system
$$R_{11}a_1 + R_{12}a_2 = R_{13} \qquad R_{12}a_1 + R_{22}a_2 = R_{23}$$
Furthermore [see (8-84) and (8-73)]
$$\sigma_{x_3|x_1,x_2}^2 = P = R_{33} - (R_{13}a_1 + R_{23}a_2)$$
and (8-85) yields
$$f(x_3 \mid x_1, x_2) = \frac{1}{\sqrt{2\pi P}}\,e^{-(x_3 - a_1x_1 - a_2x_2)^2/2P}$$
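Example 8-13. In this example, we shall find the two-dimensional density
$f(x_2, x_3 \mid x_1)$. This involves the evaluation of five parameters [see (6-15)]: two
conditional means, two conditional variances, and the conditional covariance of
the RVs $\mathbf{x}_2$ and $\mathbf{x}_3$ assuming $\mathbf{x}_1$.
The first four parameters are determined as in Example 8-11:
$$E\{\mathbf{x}_2 \mid x_1\} = \frac{R_{12}}{R_{11}}x_1 \qquad E\{\mathbf{x}_3 \mid x_1\} = \frac{R_{13}}{R_{11}}x_1$$
$$\sigma_{x_2|x_1}^2 = R_{22} - \frac{R_{12}^2}{R_{11}} \qquad \sigma_{x_3|x_1}^2 = R_{33} - \frac{R_{13}^2}{R_{11}}$$
The conditional covariance
$$C_{x_2x_3|x_1} = E\left\{\left(\mathbf{x}_2 - \frac{R_{12}}{R_{11}}\mathbf{x}_1\right)\left(\mathbf{x}_3 - \frac{R_{13}}{R_{11}}\mathbf{x}_1\right) \Big|\, \mathbf{x}_1 = x_1\right\} \tag{8-86}$$
is found as follows: We know that the errors $\mathbf{x}_2 - R_{12}\mathbf{x}_1/R_{11}$ and $\mathbf{x}_3 - R_{13}\mathbf{x}_1/R_{11}$
are independent of $\mathbf{x}_1$. Hence the condition $\mathbf{x}_1 = x_1$ in (8-86) can be removed.
Expanding the product, we obtain
$$C_{x_2x_3|x_1} = R_{23} - \frac{R_{12}R_{13}}{R_{11}}$$
This completes the specification of $f(x_2, x_3 \mid x_1)$.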
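Orthonormal Data Transformation
If the data $\mathbf{x}_i$ are orthogonal, that is, if $R_{ij} = 0$ for $i \ne j$, then $R$ is a diagonal
matrix and (8-71) yields
$$a_i = \frac{R_{0i}}{R_{ii}} = \frac{E\{\mathbf{s}\mathbf{x}_i\}}{E\{\mathbf{x}_i^2\}} \tag{8-87}$$
Thus the determination of the projection $\hat{\mathbf{s}}$ of $\mathbf{s}$ is simplified if the data $\mathbf{x}_i$ are
expressed in terms of an orthonormal set of vectors. This is done as follows. We
wish to find a set $\{\mathbf{i}_k\}$ of $n$ orthonormal RVs $\mathbf{i}_k$ linearly equivalent to the data set
$\{\mathbf{x}_k\}$. By this we mean that each $\mathbf{i}_k$ is a linear function of the elements of the set
$\{\mathbf{x}_k\}$ and each $\mathbf{x}_k$ is a linear function of the elements of the set $\{\mathbf{i}_k\}$. The set $\{\mathbf{i}_k\}$
is not unique. We shall determine it using the Gram-Schmidt method (Fig.
8-2b). In this method, each $\mathbf{i}_k$ depends only on the first $k$ data $\mathbf{x}_1,\dots,\mathbf{x}_k$. Thus
$$\begin{aligned} \mathbf{i}_1 &= \gamma_1^1\mathbf{x}_1 \\ \mathbf{i}_2 &= \gamma_1^2\mathbf{x}_1 + \gamma_2^2\mathbf{x}_2 \\ &\cdots\cdots\cdots \\ \mathbf{i}_n &= \gamma_1^n\mathbf{x}_1 + \gamma_2^n\mathbf{x}_2 + \cdots + \gamma_n^n\mathbf{x}_n \end{aligned} \tag{8-88}$$
In the notation $\gamma_r^k$, $k$ is a superscript identifying the $k$th equation and $r$ is a
subscript taking the values 1 to $k$. The coefficient $\gamma_1^1$ is obtained from the
normalization condition
$$E\{\mathbf{i}_1^2\} = (\gamma_1^1)^2R_{11} = 1$$
To find the coefficients $\gamma_1^2$ and $\gamma_2^2$, we observe that $\mathbf{i}_2 \perp \mathbf{x}_1$ because $\mathbf{i}_2 \perp \mathbf{i}_1$ by
assumption. From this it follows that
$$E\{\mathbf{i}_2\mathbf{x}_1\} = 0 = \gamma_1^2R_{11} + \gamma_2^2R_{21}$$
The condition $E\{\mathbf{i}_2^2\} = 1$ yields a second equation. Similarly, since $\mathbf{i}_k \perp \mathbf{i}_r$ for
$r < k$, we conclude from (8-88) that $\mathbf{i}_k \perp \mathbf{x}_r$ if $r < k$. Multiplying the $k$th
equation in (8-88) by $\mathbf{x}_r$ and using the above, we obtain
$$E\{\mathbf{i}_k\mathbf{x}_r\} = 0 = \gamma_1^kR_{1r} + \cdots + \gamma_k^kR_{kr} \qquad 1 \le r \le k - 1 \tag{8-89}$$
This is a system of $k - 1$ equations for the $k$ unknowns $\gamma_1^k,\dots,\gamma_k^k$. The
condition $E\{\mathbf{i}_k^2\} = 1$ yields one more equation.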
The system (8-88) can be written in a vector form
$$I = X\Gamma$$  (8-90)
where $I$ is a row vector with elements $\mathbf{i}_k$. Solving for $X$, we obtain
$$X = I\Gamma^{-1} = IL$$
$$\mathbf{x}_1 = l_1^1\mathbf{i}_1$$
$$\mathbf{x}_2 = l_1^2\mathbf{i}_1 + l_2^2\mathbf{i}_2$$  (8-91)
$$\cdots$$
$$\mathbf{x}_n = l_1^n\mathbf{i}_1 + l_2^n\mathbf{i}_2 + \cdots + l_n^n\mathbf{i}_n$$
In the above, the matrix $\Gamma$ and its inverse $L$ are upper triangular.
Since $E\{\mathbf{i}_i\mathbf{i}_j\} = \delta[i - j]$ by construction, we conclude that
$$E\{I^tI\} = 1_n = E\{\Gamma^tX^tX\Gamma\} = \Gamma^tE\{X^tX\}\Gamma$$  (8-92)
where $1_n$ is the identity matrix. Hence
$$\Gamma^tR\Gamma = 1_n \qquad R = L^tL \qquad R^{-1} = \Gamma\Gamma^t$$  (8-93)
We have thus expressed the matrix $R$ and its inverse $R^{-1}$ as products of an
upper triangular and a lower triangular matrix [see also Cholesky factorization
(14-79)].
The orthonormal base $\{\mathbf{i}_n\}$ in (8-88) is the finite version of the innovations
process $\mathbf{i}[n]$ introduced in Sec. 12-1. The matrices $\Gamma$ and $L$ correspond to the
whitening filter and to the innovations filter respectively and the factorization
(8-93) corresponds to the spectral factorization (12-3).
From the linear equivalence of the sets $\{\mathbf{i}_k\}$ and $\{\mathbf{x}_k\}$, it follows that the
estimate (8-68) of the RV $\mathbf{s}$ can be expressed in terms of the set $\{\mathbf{i}_k\}$:
$$\hat{\mathbf{s}} = b_1\mathbf{i}_1 + \cdots + b_n\mathbf{i}_n = BI^t$$
where again the coefficients $b_k$ are such that
$$\mathbf{s} - \hat{\mathbf{s}} \perp \mathbf{i}_k \qquad 1 \le k \le n$$
This yields [see (8-92)]
$$E\{(\mathbf{s} - BI^t)I\} = 0 = E\{\mathbf{s}I\} - B$$
from which it follows that
$$B = E\{\mathbf{s}I\} = E\{\mathbf{s}X\Gamma\} = R_0\Gamma$$  (8-94)
Returning to the estimate (8-68) of $\mathbf{s}$, we conclude that
$$\hat{\mathbf{s}} = BI^t = B\Gamma^tX^t = AX^t \qquad A = B\Gamma^t$$  (8-95)
This simplifies the determination of the vector $A$ if the matrix $\Gamma$ is known.
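The factorization (8-93) and the estimate (8-95) can be checked numerically. The following is a minimal sketch, not the text's own procedure: it obtains the upper triangular factor $L$ from a Cholesky-type recursion (which agrees with the Gram-Schmidt result when the diagonal coefficients are taken positive), inverts it to get $\Gamma$, and then forms $B = R_0\Gamma$ and $A = B\Gamma^t$. The matrix `R`, the vector `R0`, and all function names are hypothetical values chosen for illustration.

```python
import math

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def upper_factor(r):
    """Upper triangular L with R = L^t L, as in (8-93); R must be positive definite."""
    n = len(r)
    c = [[0.0] * n for _ in range(n)]          # lower factor: R = C C^t
    for i in range(n):
        for j in range(i + 1):
            s = r[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    return transpose(c)                         # L = C^t is upper triangular

def upper_inverse(l):
    """Inverse of an upper triangular matrix by back substitution."""
    n = len(l)
    g = [[0.0] * n for _ in range(n)]
    for j in range(n):
        g[j][j] = 1.0 / l[j][j]
        for i in range(j - 1, -1, -1):
            g[i][j] = -sum(l[i][k] * g[k][j] for k in range(i + 1, j + 1)) / l[i][i]
    return g

# Hypothetical second moments: R = E{X^t X} and R0 = E{s X}.
R = [[4.0, 2.0, 1.0], [2.0, 3.0, 1.0], [1.0, 1.0, 2.0]]
R0 = [[1.0, 2.0, 3.0]]

gamma = upper_inverse(upper_factor(R))                        # I = X Gamma
identity_check = matmul(matmul(transpose(gamma), R), gamma)   # should equal 1_n, (8-93)
B = matmul(R0, gamma)                                         # (8-94)
A = matmul(B, transpose(gamma))                               # (8-95)
residual = [x - y for x, y in zip(matmul(A, R)[0], R0[0])]    # A R - R0, should vanish
```

The vanishing residual reflects the orthogonality condition $AR = R_0$: once $\Gamma$ is known, $A$ follows from two matrix products, with no further system to solve.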
8-4 STOCHASTIC CONVERGENCE AND
LIMIT THEOREMS
A fundamental problem in the theory of probability is the determination of the
asymptotic properties of random sequences. In this section, we introduce the
subject, concentrating on the clarification of the underlying concepts. We start
with a simple problem.
Suppose that we wish to measure the length $a$ of an object. Due to
measurement inaccuracies, the instrument reading is a sum
$$\mathbf{x} = a + \boldsymbol{\nu}$$
where $\boldsymbol{\nu}$ is the error term. If there are no systematic errors, then $\boldsymbol{\nu}$ is an RV
with zero mean. In this case, if the standard deviation $\sigma$ of $\boldsymbol{\nu}$ is small compared
to $a$, then the observed value $x(\zeta)$ of $\mathbf{x}$ at a single measurement is a satisfactory
estimate of the unknown length $a$. In the context of probability, this conclusion
can be phrased as follows: The mean of the RV $\mathbf{x}$ equals $a$ and its variance
equals $\sigma^2$. Applying Tchebycheff's inequality, we conclude that
$$P\{|\mathbf{x} - a| < \varepsilon\} > 1 - \frac{\sigma^2}{\varepsilon^2}$$  (8-96)
If, therefore, $\sigma \ll \varepsilon$, then the probability that $|\mathbf{x} - a|$ is less than $\varepsilon$ is close
to 1. From this it follows that “almost certainly” the observed $x(\zeta)$ is between
$a - \varepsilon$ and $a + \varepsilon$, or equivalently, that the unknown $a$ is between $x(\zeta) - \varepsilon$ and
$x(\zeta) + \varepsilon$. In other words, the reading $x(\zeta)$ of a single measurement is “almost
certainly” a satisfactory estimate of the length $a$ as long as $\sigma \ll a$. If $\sigma$ is not
small compared to $a$, then a single measurement does not provide an adequate
estimate of $a$. To improve the accuracy, we perform the measurement a large
number of times and we average the resulting readings. The underlying probabilistic
model is now a product space
$$\mathcal{S}^n = \mathcal{S} \times \cdots \times \mathcal{S}$$
formed by repeating n times the experiment $\mathcal{S}$ of a single measurement. If the
measurements are independent, then the ith reading is a sum
$$\mathbf{x}_i = a + \boldsymbol{\nu}_i$$
where the noise components $\boldsymbol{\nu}_i$ are independent RVs with zero mean and
variance $\sigma^2$. This leads to the conclusion that the sample mean
$$\bar{\mathbf{x}} = \frac{\mathbf{x}_1 + \cdots + \mathbf{x}_n}{n}$$  (8-97)
of the measurements is an RV with mean $a$ and variance $\sigma^2/n$. If, therefore, n
is so large that $\sigma^2 \ll na^2$, then the value $\bar{x}(\zeta)$ of the sample mean $\bar{\mathbf{x}}$ in a single
performance of the experiment $\mathcal{S}^n$ (consisting of n independent measurements)
is a satisfactory estimate of the unknown $a$.
To find a bound of the error in the estimate of $a$ by $\bar{\mathbf{x}}$, we apply (8-96). To
be concrete, we assume that n is so large that $\sigma^2/na^2 = 10^{-4}$, and we ask for
the probability that $\bar{\mathbf{x}}$ is between $0.9a$ and $1.1a$. The answer is given by (8-96)
with $\varepsilon = 0.1a$:
$$P\{0.9a < \bar{\mathbf{x}} < 1.1a\} \ge 1 - \frac{100\sigma^2}{na^2} = 0.99$$
Thus, if the experiment is performed $n = 10^4\sigma^2/a^2$ times, then “almost certainly”
in 99 percent of the cases, the estimate $\bar{\mathbf{x}}$ of $a$ will be between $0.9a$ and
$1.1a$. Motivated by the above, we introduce next various convergence modes
involving sequences of random variables.
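The averaging argument can be tried numerically. The sketch below is only an illustration under stated assumptions: the noise is taken to be normal (the text requires only zero mean and variance $\sigma^2$), and the values of `a`, `sigma`, the number of runs, and the function names are hypothetical. It evaluates the Tchebycheff bound for the sample mean and compares it with the fraction of simulated runs in which the mean falls between $0.9a$ and $1.1a$.

```python
import random

def chebyshev_lower_bound(variance, eps):
    """Lower bound on P{|x - a| < eps} for an RV with mean a and the given variance."""
    return 1.0 - variance / eps ** 2

def fraction_within(a, sigma, n, eps, runs, seed=0):
    """Fraction of runs in which the mean of n noisy readings is within eps of a."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        mean = sum(a + rng.gauss(0.0, sigma) for _ in range(n)) / n
        hits += abs(mean - a) < eps
    return hits / runs

a, sigma = 1.0, 0.5                          # assumed length and noise level
n = round(1e4 * sigma ** 2 / a ** 2)         # chosen so that sigma^2 / (n a^2) = 1e-4
bound = chebyshev_lower_bound(sigma ** 2 / n, 0.1 * a)
observed = fraction_within(a, sigma, n, 0.1 * a, runs=200)
print(n, bound, observed)
```

For normal noise the observed fraction is far above the bound, which is expected: Tchebycheff's inequality uses only the variance and is therefore conservative.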
DEFINITION. A random sequence or a discrete-time random process is a sequence
of RVs
$$\mathbf{x}_1, \ldots, \mathbf{x}_n, \ldots$$  (8-98)
For a specific $\zeta$, $x_n(\zeta)$ is a sequence of numbers that might or might not
converge. This suggests that the notion of convergence of a random sequence
might be given several interpretations:
Convergence everywhere (e) As we recall, a sequence of numbers $x_n$
tends to a limit $x$ if, given $\varepsilon > 0$, we can find a number $n_0$ such that
$$|x_n - x| < \varepsilon \quad \text{for every } n > n_0$$  (8-99)
We say that a random sequence $\mathbf{x}_n$ converges everywhere if the sequence
of numbers $x_n(\zeta)$ converges as above for every $\zeta$. The limit is a number that
depends, in general, on $\zeta$. In other words, the limit of the random sequence $\mathbf{x}_n$
is an RV $\mathbf{x}$:
$$\mathbf{x}_n \to \mathbf{x} \quad \text{as } n \to \infty$$
Convergence almost everywhere (a.e.) If the set of outcomes $\zeta$ such that
$$\lim x_n(\zeta) = x(\zeta) \quad \text{as } n \to \infty$$  (8-100)
exists and its probability equals 1, then we say that the sequence $\mathbf{x}_n$ converges
almost everywhere (or with probability 1). This is written in the form
$$P\{\mathbf{x}_n \to \mathbf{x}\} = 1 \quad \text{as } n \to \infty$$  (8-101)
In the above, $\{\mathbf{x}_n \to \mathbf{x}\}$ is an event consisting of all outcomes $\zeta$ such that
$x_n(\zeta) \to x(\zeta)$.
Convergence in the MS sense (MS) The sequence $\mathbf{x}_n$ tends to the RV $\mathbf{x}$ in
the MS sense if
$$E\{|\mathbf{x}_n - \mathbf{x}|^2\} \to 0 \quad \text{as } n \to \infty$$  (8-102)
This is called limit in the mean and it is often written in the form
$$\underset{n \to \infty}{\mathrm{l.i.m.}}\ \mathbf{x}_n = \mathbf{x}$$
Convergence in probability (p) The probability $P\{|\mathbf{x} - \mathbf{x}_n| > \varepsilon\}$ of the
event $\{|\mathbf{x} - \mathbf{x}_n| > \varepsilon\}$ is a sequence of numbers depending on $\varepsilon$. If this sequence
tends to 0:
$$P\{|\mathbf{x} - \mathbf{x}_n| > \varepsilon\} \to 0 \qquad n \to \infty$$  (8-103)
for any $\varepsilon > 0$, then we say that the sequence $\mathbf{x}_n$ tends to the RV $\mathbf{x}$ in probability
(or in measure). This is also called stochastic convergence.
Convergence in distribution (d) We denote by $F_n(x)$ and $F(x)$ respectively
the distribution of the RVs $\mathbf{x}_n$ and $\mathbf{x}$. If
$$F_n(x) \to F(x) \qquad n \to \infty$$  (8-104)
for every point $x$ of continuity of $F(x)$, then we say that the sequence $\mathbf{x}_n$ tends
to the RV $\mathbf{x}$ in distribution. We note that, in this case, the sequence $x_n(\zeta)$ need
not converge for any $\zeta$.
Cauchy criterion As we noted, a deterministic sequence $x_n$ converges if it
satisfies (8-99). This definition involves the limit $x$ of $x_n$. The following theorem,
known as the Cauchy criterion, establishes conditions for the convergence
of $x_n$ that avoid the use of $x$: If
$$|x_{n+m} - x_n| \to 0 \quad \text{as } n \to \infty$$  (8-105)
for any m > 0, then the sequence $x_n$ converges.
The above theorem holds also for random sequences. In this case, the limit
must be interpreted accordingly. For example, if
$$E\{|\mathbf{x}_{n+m} - \mathbf{x}_n|^2\} \to 0 \quad \text{as } n \to \infty$$
for every m > 0, then the random sequence $\mathbf{x}_n$ converges in the MS sense.
Comparison of convergence modes. In Fig. 8-3, we show the relationship
between various convergence modes. Each point in the rectangle represents a
random sequence. The letter on each curve indicates that all sequences in the
interior of the curve converge in the stated mode. The shaded region consists of
all sequences that do not converge in any sense. The letter d on the outer curve
shows that if a sequence converges at all, then it converges also in distribution.
We comment next on the less obvious comparisons:
If a sequence converges in the MS sense, then it also converges in
probability. Indeed, Tchebycheff's inequality yields
$$P\{|\mathbf{x}_n - \mathbf{x}| > \varepsilon\} \le \frac{E\{|\mathbf{x}_n - \mathbf{x}|^2\}}{\varepsilon^2}$$
If $\mathbf{x}_n \to \mathbf{x}$ in the MS sense, then for a fixed $\varepsilon > 0$ the right side tends to 0; hence
the left side also tends to 0 as $n \to \infty$ and (8-103) follows. The converse,
however, is not necessarily true. If $\mathbf{x}_n$ is not bounded, then $P\{|\mathbf{x}_n - \mathbf{x}| > \varepsilon\}$
might tend to 0 but not $E\{|\mathbf{x}_n - \mathbf{x}|^2\}$. If, however, $\mathbf{x}_n$ vanishes outside some
interval $(-c, c)$ for every $n > n_0$, then p convergence and MS convergence are
equivalent.
FIGURE 8-3
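A standard counterexample, not taken from the text, makes the failure of the converse concrete. Assume that $\mathbf{x}_n$ equals $n$ with probability $1/n$ and 0 otherwise, and that the limit is $\mathbf{x} = 0$. Then $P\{|\mathbf{x}_n| > \varepsilon\} = 1/n \to 0$ for $0 < \varepsilon < 1$, whereas $E\{\mathbf{x}_n^2\} = n$ does not tend to 0. The sketch below evaluates both quantities exactly; the function names are hypothetical.

```python
from fractions import Fraction

def prob_exceeds(n, eps):
    """P{|x_n| > eps} when x_n = n with probability 1/n and 0 otherwise."""
    return Fraction(1, n) if n > eps else Fraction(0)

def mean_square(n):
    """E{x_n^2} = n^2 * (1/n)."""
    return Fraction(n * n, n)

for n in (10, 100, 1000):
    print(n, prob_exceeds(n, Fraction(1, 2)), mean_square(n))
```

The sequence is unbounded, which is exactly the situation the text singles out: the probability of a large deviation shrinks, but the size of the deviation grows fast enough to keep the mean-square error from vanishing.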
It is self-evident that a.e. convergence implies p convergence. We shall
show by a heuristic argument that the converse is not true. In Fig. 8-4, we plot
the difference $|x_n - x|$ as a function of n where, for simplicity, sequences are
drawn as curves. Each curve represents, thus, a particular sequence $|x_n(\zeta) - x(\zeta)|$.
Convergence in probability means that for a specific $n > n_0$, only a small
percentage of these curves will have ordinates that exceed $\varepsilon$ (Fig. 8-4a). It is, of
course, possible that not even one of these curves will remain less than $\varepsilon$ for
every $n > n_0$. Convergence a.e., on the other hand, demands that most curves
will be below $\varepsilon$ for every $n > n_0$ (Fig. 8-4b).
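The law of large numbers (Bernoulli). In Sec. 3-3 we showed that if the
probability of an event $\mathcal{A}$ in a given experiment equals p and the number of
successes of $\mathcal{A}$ in n trials equals k, then
$$P\left\{\left|\frac{k}{n} - p\right| < \varepsilon\right\} \to 1 \quad \text{as } n \to \infty$$  (8-106)
We shall reestablish this result as a limit of a sequence of RVs. For this
purpose, we introduce the RVs
$$\mathbf{x}_i = \begin{cases} 1 & \text{if } \mathcal{A} \text{ occurs at the } i\text{th trial} \\ 0 & \text{otherwise} \end{cases}$$
We shall show that the sample mean
$$\bar{\mathbf{x}}_n = \frac{\mathbf{x}_1 + \cdots + \mathbf{x}_n}{n}$$
of these RVs tends to p in probability as $n \to \infty$.
Proof. As we know
$$E\{\mathbf{x}_i\} = E\{\bar{\mathbf{x}}_n\} = p \qquad \sigma^2_{x_i} = pq \qquad \sigma^2_{\bar{x}_n} = \frac{pq}{n}$$
Furthermore, $pq = p(1 - p) \le 1/4$. Hence [see (5-57)]
$$P\{|\bar{\mathbf{x}}_n - p| < \varepsilon\} \ge 1 - \frac{pq}{n\varepsilon^2} \ge 1 - \frac{1}{4n\varepsilon^2} \to 1 \qquad n \to \infty$$
This reestablishes (8-106) because $\bar{x}_n(\zeta) = k/n$ if $\mathcal{A}$ occurs k times.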
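To see how conservative the bound $1 - 1/4n\varepsilon^2$ is, one can compare it with the exact binomial probability. The following sketch does this with exact rational arithmetic; the helper names and the choice p = 1/2, n = 1000, $\varepsilon$ = 0.1 are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def prob_within(n, p, eps):
    """Exact P{|k/n - p| < eps} when k is the number of successes in n trials."""
    q = 1 - p
    return sum(comb(n, k) * p ** k * q ** (n - k)
               for k in range(n + 1) if abs(Fraction(k, n) - p) < eps)

def bernoulli_bound(n, eps):
    """Lower bound 1 - 1/(4 n eps^2), valid for every p."""
    return 1 - 1 / (4 * n * eps ** 2)

n, p, eps = 1000, Fraction(1, 2), Fraction(1, 10)
exact = prob_within(n, p, eps)
bound = bernoulli_bound(n, eps)
print(float(exact), float(bound))
```

The bound evaluates to 39/40, while the exact probability is much closer to 1; the inequality guarantees convergence but is a loose estimate of the actual probability.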
The strong law of large numbers (Borel) It can be shown that $\bar{\mathbf{x}}_n$ tends to
p not only in probability, but also with probability 1 (a.e.). This result, due to
Borel, is known as the strong law of large numbers. The proof will not be given.
We give below only a heuristic explanation of the difference between (8-106)
and the strong law of large numbers in terms of relative frequencies.
Frequency interpretation We wish to estimate p within an error $\varepsilon = 0.1$, using as its
estimate the sample mean $\bar{\mathbf{x}}_n$. If $n \ge 1000$, then
$$P\{|\bar{\mathbf{x}}_n - p| < 0.1\} \ge 1 - \frac{1}{4n\varepsilon^2} \ge \frac{39}{40}$$
Thus, if we repeat the experiment at least 1000 times, then in 39 out of 40 such runs, our
error $|\bar{x}_n - p|$ will be less than 0.1.
Suppose, now, that we perform the experiment 2000 times and we determine the
sample mean $\bar{x}_n$ not for one n but for every n between 1000 and 2000. The Bernoulli
version of the law of large numbers leads to the following conclusion: If our experiment
(the toss of the coin 2000 times) is repeated a large number of times, then, for a specific
n larger than 1000, the error $|\bar{x}_n - p|$ will exceed 0.1 only in one run out of 40. In other
words, 97.5 percent of the runs will be “good.” We cannot draw the conclusion that in
the good runs the error will be less than 0.1 for every n between 1000 and 2000. This
conclusion, however, is correct, but it can be deduced only from the strong law of large
numbers.
Ergodicity. Ergodicity is a topic dealing with the relationship between statistical
averages and sample averages. This topic is treated in Sec. 12-1. In the
following, we discuss certain results phrased in the form of limits of random
sequences.
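Markoff’s theorem. We are given a sequence $\mathbf{x}_i$ of RVs and we form their
sample mean
$$\bar{\mathbf{x}}_n = \frac{\mathbf{x}_1 + \cdots + \mathbf{x}_n}{n}$$
Clearly, $\bar{\mathbf{x}}_n$ is an RV whose values $\bar{x}_n(\zeta)$ depend on the experimental outcome $\zeta$.
We maintain that, if the RVs $\mathbf{x}_i$ are such that the mean $\bar{\eta}_n$ of $\bar{\mathbf{x}}_n$ tends to a limit
$\eta$ and its variance $\bar{\sigma}_n^2$ tends to 0 as $n \to \infty$:
$$E\{\bar{\mathbf{x}}_n\} = \bar{\eta}_n \to \eta \qquad \bar{\sigma}_n^2 = E\{(\bar{\mathbf{x}}_n - \bar{\eta}_n)^2\} \to 0 \qquad n \to \infty$$  (8-107)
then the RV $\bar{\mathbf{x}}_n$ tends to $\eta$ in the MS sense
$$E\{(\bar{\mathbf{x}}_n - \eta)^2\} \to 0 \qquad n \to \infty$$  (8-108)
Proof. The proof is based on the simple inequality
$$|\bar{\mathbf{x}}_n - \eta|^2 \le 2|\bar{\mathbf{x}}_n - \bar{\eta}_n|^2 + 2|\bar{\eta}_n - \eta|^2$$
Indeed, taking expected values of both sides, we obtain
$$E\{(\bar{\mathbf{x}}_n - \eta)^2\} \le 2E\{(\bar{\mathbf{x}}_n - \bar{\eta}_n)^2\} + 2(\bar{\eta}_n - \eta)^2$$
and (8-108) follows from (8-107).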
COROLLARY (Tchebycheff's condition). If the RVs $\mathbf{x}_i$ are uncorrelated and
$$\frac{\sigma_1^2 + \cdots + \sigma_n^2}{n^2} \to 0 \qquad n \to \infty$$  (8-109)
then
$$\bar{\mathbf{x}}_n \to \eta = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} E\{\mathbf{x}_i\}$$
in the MS sense.
Proof. It follows from the theorem because, for uncorrelated RVs, the left side
of (8-109) equals $\bar{\sigma}_n^2$.
We note that Tchebycheff’s condition (8-109) is satisfied if $\sigma_i < K < \infty$ for
every i. This is the case if the RVs $\mathbf{x}_i$ are i.i.d. with finite variance.
Kinchin We mention without proof that if the RVs $\mathbf{x}_i$ are i.i.d., then their
sample mean $\bar{\mathbf{x}}_n$ tends to $\eta$ even if nothing is known about their variance. In
this case, however, $\bar{\mathbf{x}}_n$ tends to $\eta$ in probability only. The following is an
application:
Example 8-14. We wish to determine the distribution $F(x)$ of an RV $\mathbf{x}$ defined in
a certain experiment. For this purpose we repeat the experiment n times and form
the RVs $\mathbf{x}_i$ as in (8-12). As we know, these RVs are i.i.d. and their common
distribution equals $F(x)$. We next form the RVs
$$\mathbf{y}_i(x) = \begin{cases} 1 & \text{if } \mathbf{x}_i \le x \\ 0 & \text{if } \mathbf{x}_i > x \end{cases}$$
where $x$ is a fixed number. The RVs $\mathbf{y}_i(x)$ so formed are also i.i.d. and their mean
equals
$$E\{\mathbf{y}_i(x)\} = 1 \times P\{\mathbf{y}_i = 1\} = P\{\mathbf{x}_i \le x\} = F(x)$$
Applying Kinchin’s theorem to $\mathbf{y}_i(x)$, we conclude that
$$\frac{\mathbf{y}_1(x) + \cdots + \mathbf{y}_n(x)}{n} \to F(x) \qquad n \to \infty$$
in probability. Thus, to determine $F(x)$, we repeat the original experiment n times
and count the number of times the RV $\mathbf{x}$ does not exceed $x$. If this number equals k
and n is sufficiently large, then $F(x) \simeq k/n$. The above is thus a restatement of
the relative frequency interpretation (4-3) of $F(x)$ in the form of a limit theorem.
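The estimate $F(x) \simeq k/n$ translates directly into code. The sketch below is an illustration only: the function name is hypothetical, and the exponential distribution with $F(x) = 1 - e^{-x}$ is an assumed test case, chosen because its distribution is known in closed form.

```python
import math
import random

def empirical_cdf(samples, x):
    """k/n, where k is the number of samples that do not exceed x."""
    return sum(1 for s in samples if s <= x) / len(samples)

rng = random.Random(1)
samples = [rng.expovariate(1.0) for _ in range(20000)]   # assumed test distribution
estimate = empirical_cdf(samples, 1.0)
exact = 1.0 - math.exp(-1.0)
print(estimate, exact)
```

Each term in the count plays the role of $\mathbf{y}_i(x)$, so the returned ratio is the sample mean of zero-one RVs with mean $F(x)$.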
The Central Limit Theorem
Given n independent RVs $\mathbf{x}_i$, we form their sum
$$\mathbf{x} = \mathbf{x}_1 + \cdots + \mathbf{x}_n$$
This is an RV with mean $\eta = \eta_1 + \cdots + \eta_n$ and variance $\sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2$.
The central limit theorem (CLT) states that under certain general conditions,
the distribution $F(x)$ of $\mathbf{x}$ approaches a normal distribution with the same mean
and variance:
$$F(x) \simeq G\left(\frac{x - \eta}{\sigma}\right)$$  (8-110)
as n increases. Furthermore, if the RVs $\mathbf{x}_i$ are of continuous type, the density
$f(x)$ of $\mathbf{x}$ approaches a normal density (Fig. 8-5a):
$$f(x) \simeq \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\eta)^2/2\sigma^2}$$  (8-111)
This important theorem can be stated as a limit: If $\mathbf{z} = (\mathbf{x} - \eta)/\sigma$ then
$$F_z(z) \to G(z) \qquad f_z(z) \to \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \qquad n \to \infty$$
for the general and for the continuous case respectively. The proof is outlined
later.
The CLT can be expressed as a property of convolutions: The convolution
of a large number of positive functions is approximately a normal function [see
(8-51)].
The nature of the CLT approximation and the required value of n for a
specified error bound depend on the form of the densities $f_i(x)$. If the RVs $\mathbf{x}_i$
are i.i.d., the value n = 30 is adequate for most applications. In fact, if the
functions $f_i(x)$ are smooth, values of n as low as 5 can be used. The next
example is an illustration.
FIGURE 8-5
Example 8-15. The RVs $\mathbf{x}_i$ are i.i.d. and uniformly distributed in the interval (0, T).
We shall compare the density $f(x)$ of their sum $\mathbf{x}$ with the normal approximation
(8-111) for n = 2 and n = 3. In this problem,
$$\eta_i = \frac{T}{2} \qquad \sigma_i^2 = \frac{T^2}{12} \qquad \eta = n\frac{T}{2} \qquad \sigma^2 = n\frac{T^2}{12}$$
n = 2: $f(x)$ is a triangle obtained by convolving a pulse with itself (Fig. 8-6)
$$\eta = T \qquad \sigma^2 = \frac{T^2}{6} \qquad f(x) \simeq \frac{1}{T}\sqrt{\frac{3}{\pi}}\, e^{-3(x-T)^2/T^2}$$
n = 3: $f(x)$ consists of three parabolic pieces obtained by convolving a
triangle with a pulse
$$\eta = \frac{3T}{2} \qquad \sigma^2 = \frac{T^2}{4} \qquad f(x) \simeq \frac{1}{T}\sqrt{\frac{2}{\pi}}\, e^{-2(x-1.5T)^2/T^2}$$
As we can see from the figure, the approximation error is small even for such small
values of n.
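The comparison in this example can be reproduced numerically. The sketch below, with hypothetical function names, codes the exact density of the sum of n = 2 or n = 3 uniform RVs on (0, T) in closed form (a triangle and three parabolic pieces, respectively) together with the normal curve of the same mean and variance.

```python
import math

def sum_density(x, n, T=1.0):
    """Exact density of the sum of n = 2 or 3 independent RVs uniform in (0, T)."""
    u = x / T
    if n == 2:
        value = u if 0 <= u <= 1 else (2 - u if 1 < u <= 2 else 0.0)
    elif n == 3:
        if 0 <= u <= 1:
            value = u * u / 2
        elif u <= 2:
            value = (-2 * u * u + 6 * u - 3) / 2
        elif u <= 3:
            value = (3 - u) ** 2 / 2
        else:
            value = 0.0
        if u < 0:
            value = 0.0
    else:
        raise ValueError("only n = 2 and n = 3 are coded in this sketch")
    return value / T

def normal_approx(x, n, T=1.0):
    """Normal density with mean nT/2 and variance nT^2/12."""
    eta, var = n * T / 2, n * T * T / 12
    return math.exp(-(x - eta) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

for n in (2, 3):
    peak = n / 2
    print(n, sum_density(peak, n), normal_approx(peak, n))
```

At the center of the interval the exact values are 1/T and 0.75/T, against roughly 0.977/T and 0.798/T for the normal curve, so the error at the peak is a few percent even for these small n.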
For discrete-type RVs, $F(x)$ is a staircase function approaching a normal
distribution. The probabilities $p_k$, however, that $\mathbf{x}$ equals specific values $x_k$ are,
in general, unrelated to the normal density. Lattice-type RVs are an exception:
FIGURE 8-6
If the RVs $\mathbf{x}_i$ take equidistant values $ak_i$, then $\mathbf{x}$ takes the values $ak$ and for
large n, the discontinuities $p_k = P\{\mathbf{x} = ak\}$ of $F(x)$ at the points $x_k = ak$ equal
the samples of the normal density (Fig. 8-5b):
$$P\{\mathbf{x} = ak\} \simeq \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(ak-\eta)^2/2\sigma^2}$$  (8-112)
We give next an illustration in the context of Bernoulli trials. The RVs $\mathbf{x}_i$ of
Example 8-7 are i.i.d. taking the values 1 and 0 with probabilities p and q
respectively; hence their sum $\mathbf{x}$ is of lattice type taking the values k = 0, ..., n.
In this case,
$$E\{\mathbf{x}\} = nE\{\mathbf{x}_i\} = np \qquad \sigma^2 = n\sigma_i^2 = npq$$
Inserting into (8-112), we obtain the approximation
$$P\{\mathbf{x} = k\} = \binom{n}{k}p^kq^{n-k} \simeq \frac{1}{\sqrt{2\pi npq}}\, e^{-(k-np)^2/2npq}$$  (8-113)
This shows that the DeMoivre-Laplace theorem (3-27) is a special case of the
lattice-type form (8-112) of the central limit theorem.
Example 8-16. A fair coin is tossed six times and $\mathbf{x}_i$ is the zero-one RV associated
with the event {heads at the ith toss}. The probability of k heads in six tosses
equals
$$P\{\mathbf{x} = k\} = \binom{6}{k}\frac{1}{2^6} = p_k \qquad \mathbf{x} = \mathbf{x}_1 + \cdots + \mathbf{x}_6$$
In the following table we show the above probabilities and the samples of
the normal curve $N(\eta, \sigma^2)$ (Fig. 8-7) where
$$\eta = np = 3 \qquad \sigma^2 = npq = 1.5$$
k	0	1	2	3	4	5	6
p_k	0.016	0.094	0.234	0.312	0.234	0.094	0.016
N(η, σ²)	0.016	0.086	0.233	0.326	0.233	0.086	0.016
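The entries of this table follow from (8-113) with n = 6 and p = q = 1/2. A short sketch that recomputes them is given here; the function names are hypothetical.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(k, n, p):
    """Probability of k successes in n Bernoulli trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_sample(k, eta, var):
    """Sample at k of the normal density with mean eta and variance var."""
    return exp(-(k - eta) ** 2 / (2 * var)) / sqrt(2 * pi * var)

n, p = 6, 0.5
rows = [(k, binomial_pmf(k, n, p), normal_sample(k, n * p, n * p * (1 - p)))
        for k in range(n + 1)]
for k, pk, nk in rows:
    print(f"{k}\t{pk:.3f}\t{nk:.3f}")
```

The largest discrepancy occurs at k = 3, where the normal sample exceeds the exact probability by about 0.013.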
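ERROR CORRECTION. In the approximation of $f(x)$ by the normal curve
$N(\eta, \sigma^2)$, the error
$$\varepsilon(x) = f(x) - \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$$
results where we assumed, shifting the origin, that $\eta = 0$. We shall express this
error in terms of the moments
$$m_n = E\{\mathbf{x}^n\}$$
of $\mathbf{x}$ and the Hermite polynomials
$$H_k(x) = (-1)^k e^{x^2/2}\frac{d^k e^{-x^2/2}}{dx^k} = x^k - \binom{k}{2}x^{k-2} + 1 \cdot 3\binom{k}{4}x^{k-4} + \cdots$$  (8-114)
These polynomials form a complete orthogonal set on the real line:
$$\int_{-\infty}^{\infty} e^{-x^2/2}H_n(x)H_m(x)\,dx = \begin{cases} n!\sqrt{2\pi} & n = m \\ 0 & n \neq m \end{cases}$$
Hence $\varepsilon(x)$ can be written as a series
$$\varepsilon(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\sum_{k=3}^{\infty} C_kH_k\left(\frac{x}{\sigma}\right)$$  (8-115)
The series starts with k = 3 because the moments of $\varepsilon(x)$ of order up to 2 are
0. The coefficients $C_n$ can be expressed in terms of the moments $m_n$ of $\mathbf{x}$.
Equating moments of order n = 3 and n = 4, we obtain [see (5-44)]
$$3!\,\sigma^3C_3 = m_3 \qquad 4!\,\sigma^4C_4 = m_4 - 3\sigma^4$$
First-order correction. From (8-114) it follows that
$$H_3(x) = x^3 - 3x \qquad H_4(x) = x^4 - 6x^2 + 3$$
Retaining the first nonzero term of the sum in (8-115), we obtain
$$f(x) \simeq \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\left[1 + \frac{m_3}{6\sigma^3}\left(\frac{x^3}{\sigma^3} - \frac{3x}{\sigma}\right)\right]$$  (8-116)
If $f(x)$ is even, then $m_3 = 0$ and (8-115) yields
$$f(x) \simeq \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\left[1 + \frac{1}{24}\left(\frac{m_4}{\sigma^4} - 3\right)\left(\frac{x^4}{\sigma^4} - \frac{6x^2}{\sigma^2} + 3\right)\right]$$  (8-117)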
Example 8-17. If the RVs $\mathbf{x}_i$ are i.i.d. with density $f_i(x)$ as in Fig. 8-8a, then $f(x)$
consists of three parabolic pieces (see also Example 8-12) and $N(0, 1/4)$ is its
normal approximation. Since $f(x)$ is even and $m_4 = 13/80$ (see Prob. 8-4), (8-117)
yields
$$f(x) \simeq \sqrt{\frac{2}{\pi}}\, e^{-2x^2}\left(1 - \frac{4x^4}{15} + \frac{2x^2}{5} - \frac{1}{20}\right) = \bar{f}(x)$$
FIGURE 8-8
In Fig. 8-8b, we show the error $\varepsilon(x)$ of the normal approximation and the
first-order correction error $f(x) - \bar{f}(x)$.
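The effect of the fourth-moment correction can be checked numerically. The sketch below assumes that the density of Fig. 8-8a is uniform in (-1/2, 1/2) and that n = 3; this is an assumption, made because it reproduces the stated variance 1/4 and fourth moment 13/80. Under it, the exact density of the sum is known in closed form and can be compared with the normal curve and with the corrected curve of the form (8-117). All function names are hypothetical.

```python
from math import exp, pi, sqrt

def exact_density(x):
    """Density of the sum of three independent RVs uniform in (-1/2, 1/2) (assumed f_i)."""
    u = abs(x)
    if u <= 0.5:
        return 0.75 - u * u
    if u <= 1.5:
        return 0.5 * (1.5 - u) ** 2
    return 0.0

def normal_density(x, var=0.25):
    """Zero-mean normal density with the given variance."""
    return exp(-x * x / (2 * var)) / sqrt(2 * pi * var)

def corrected_density(x, var=0.25, m4=13 / 80):
    """Normal density times the first-order correction for an even density."""
    h4 = x ** 4 / var ** 2 - 6 * x ** 2 / var + 3      # H_4(x / sigma)
    return normal_density(x, var) * (1 + (m4 / var ** 2 - 3) * h4 / 24)

for x in (0.0, 0.5, 1.0):
    print(x, exact_density(x), normal_density(x), corrected_density(x))
```

At x = 0 the normal curve gives about 0.798 against the exact value 0.75, while the corrected curve gives about 0.758; at x = 0.5 the corrected value is within about $10^{-4}$ of the exact value 0.5.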
ON THE PROOF OF THE CENTRAL LIMIT THEOREM. We shall justify the
approximation (8-111) using characteristic functions. We assume for simplicity
that $\eta_i = 0$. Denoting by $\Phi_i(\omega)$ and $\Phi(\omega)$, respectively, the characteristic
functions of the RVs $\mathbf{x}_i$ and $\mathbf{x} = \mathbf{x}_1 + \cdots + \mathbf{x}_n$, we conclude from the independence
of $\mathbf{x}_i$ that
$$\Phi(\omega) = \Phi_1(\omega) \cdots \Phi_n(\omega)$$
Near the origin, the functions $\Psi_i(\omega) = \ln\Phi_i(\omega)$ can be approximated by a
parabola:
$$\Psi_i(\omega) \simeq -\tfrac{1}{2}\sigma_i^2\omega^2 \qquad \Phi_i(\omega) \simeq e^{-\sigma_i^2\omega^2/2} \quad \text{for } |\omega| < \varepsilon$$  (8-118)
If the RVs $\mathbf{x}_i$ are of continuous type, then [see (5-61) and Prob. 5-25]
$$\Phi_i(0) = 1 \qquad |\Phi_i(\omega)| < 1 \quad \text{for } |\omega| \neq 0$$  (8-119)
Equation (8-119) suggests that for small $\varepsilon$ and large n, the function $\Phi(\omega)$ is
negligible for $|\omega| > \varepsilon$ (Fig. 8-9a). This holds also for the exponential $e^{-\sigma^2\omega^2/2}$
if $\sigma \to \infty$ as in (8-123). From the above it follows that
$$\Phi(\omega) \simeq e^{-\sigma_1^2\omega^2/2} \cdots e^{-\sigma_n^2\omega^2/2} = e^{-\sigma^2\omega^2/2} \quad \text{for all } \omega$$  (8-120)
in agreement with (8-111).
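The exact form of the theorem states that the normalized RV
$$\mathbf{z} = \frac{\mathbf{x}_1 + \cdots + \mathbf{x}_n}{\sigma} \qquad \sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2$$
tends to an N(0, 1) RV as $n \to \infty$:
$$f_z(z) \to \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \qquad n \to \infty$$  (8-121)
A general proof of the theorem is given below. In the following, we sketch a
proof under the assumption that the RVs $\mathbf{x}_i$ are i.i.d. In this case
$$\Phi_1(\omega) = \cdots = \Phi_n(\omega) \qquad \sigma = \sigma_i\sqrt{n}$$
Hence,
$$\Phi_z(\omega) = \Phi_i^n\left(\frac{\omega}{\sigma_i\sqrt{n}}\right)$$
Expanding the functions $\Psi_i(\omega) = \ln\Phi_i(\omega)$ near the origin, we obtain
$$\Psi_i(\omega) = -\frac{\sigma_i^2\omega^2}{2} + O(\omega^3)$$
Hence,
$$\Psi_z(\omega) = n\Psi_i\left(\frac{\omega}{\sigma_i\sqrt{n}}\right) = -\frac{\omega^2}{2} + O\left(\frac{1}{\sqrt{n}}\right) \to -\frac{\omega^2}{2} \qquad n \to \infty$$  (8-122)
This shows that $\Phi_z(\omega) \to e^{-\omega^2/2}$ as $n \to \infty$ and (8-121) results.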
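The limit in the i.i.d. sketch can be observed numerically. The example below assumes RVs uniform in (-1/2, 1/2), for which $\Phi_i(\omega) = \sin(\omega/2)/(\omega/2)$ and $\sigma_i^2 = 1/12$; this choice and the function names are hypothetical, picked only because the characteristic function is available in closed form.

```python
from math import exp, sin, sqrt

def phi_uniform(w):
    """Characteristic function of an RV uniform in (-1/2, 1/2)."""
    return 1.0 if w == 0 else sin(w / 2) / (w / 2)

def phi_normalized_sum(w, n, sigma_i=sqrt(1 / 12)):
    """Characteristic function of z = (x_1 + ... + x_n) / (sigma_i sqrt(n))."""
    return phi_uniform(w / (sigma_i * sqrt(n))) ** n

w = 1.0
for n in (1, 10, 100):
    print(n, phi_normalized_sum(w, n), exp(-w * w / 2))
```

For this symmetric density the odd-order terms of the expansion vanish, so the observed error decays roughly like 1/n, faster than the general $O(1/\sqrt{n})$ term in (8-122).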
As we noted, the theorem is not always true. The following is a set of
sufficient conditions:
(a) $$\sigma_1^2 + \cdots + \sigma_n^2 \to \infty \qquad n \to \infty$$  (8-123)
(b) There exists a number $\alpha > 2$ and a finite constant K such that
$$\int_{-\infty}^{\infty} x^{\alpha}f_i(x)\,dx < K < \infty \quad \text{for all } i$$  (8-124)
These conditions are not the most general. However, they cover a wide range of
applications. For example, (8-123) is satisfied if there exists a constant $\varepsilon > 0$
such that $\sigma_i > \varepsilon$ for all i. Condition (8-124) is satisfied if all densities $f_i(x)$ are 0
outside a finite interval $(-c, c)$ no matter how large.
Lattice type The preceding reasoning can also be applied to discrete-type
RVs. However, in this case the functions $\Phi_i(\omega)$ are periodic (Fig. 8-9b) and
their product takes significant values only in a small region near the points
$\omega = 2\pi n/a$. Using the approximation (8-112) in each of these regions, we
obtain
$$\Phi(\omega) \simeq \sum_n e^{-\sigma^2(\omega - n\omega_0)^2/2} \qquad \omega_0 = \frac{2\pi}{a}$$  (8-125)
As we can see from (11A-1), the inverse of the above yields (8-112).
The Berry-Esseen theorem† This theorem states that if
$$E\{|\mathbf{x}_i|^3\} \le c\sigma_i^2 \qquad \text{all } i$$  (8-126)
where c is some constant, then the distribution $\bar{F}(x)$ of the normalized sum
$$\bar{\mathbf{x}} = \frac{\mathbf{x}_1 + \cdots + \mathbf{x}_n}{\sigma} \qquad \sigma_1^2 + \cdots + \sigma_n^2 = \sigma^2$$
is close to the normal distribution $G(x)$ in the following sense
$$|\bar{F}(x) - G(x)| < \frac{4c}{\sigma}$$  (8-127)
The central limit theorem is a corollary of (8-127) because (8-127) leads to
the conclusion that
$$\bar{F}(x) \to G(x) \quad \text{as } \sigma \to \infty$$  (8-128)
This proof is based on condition (8-126). This condition, however, is not too
restrictive. It holds, for example, if the RVs $\mathbf{x}_i$ are i.i.d. and their third moment
is finite.
We note, finally, that whereas (8-128) establishes merely the convergence
in distribution of $\bar{\mathbf{x}}$ to a normal RV, (8-127) gives also a bound of the deviation
of $\bar{F}(x)$ from normality.
†A. Papoulis: “Narrow-Band Systems and Gaussianity,” IEEE Transactions on Information Theory,
January 1972.
The central limit theorem for products. Given n independent positive RVs $\mathbf{x}_i$,
we form their product:
$$\mathbf{y} = \mathbf{x}_1\mathbf{x}_2 \cdots \mathbf{x}_n \qquad \mathbf{x}_i > 0$$
THEOREM. For large n, the density of $\mathbf{y}$ is approximately lognormal:
$$f_y(y) \simeq \frac{1}{y\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}(\ln y - \eta)^2\right\}U(y)$$  (8-129)
where
$$\eta = \sum_{i=1}^{n} E\{\ln\mathbf{x}_i\} \qquad \sigma^2 = \sum_{i=1}^{n} \mathrm{Var}(\ln\mathbf{x}_i)$$
Proof. The RV
$$\mathbf{z} = \ln\mathbf{y} = \ln\mathbf{x}_1 + \cdots + \ln\mathbf{x}_n$$
is the sum of the RVs $\ln\mathbf{x}_i$. From the CLT it follows, therefore, that for large n,
this RV is nearly normal with mean $\eta$ and variance $\sigma^2$. And since $\mathbf{y} = e^{\mathbf{z}}$, we
conclude from (5-10) that $\mathbf{y}$ has a lognormal density. The theorem holds if the
RVs $\ln\mathbf{x}_i$ satisfy the conditions for the validity of the CLT.
Example 8-18. Suppose that the RVs $\mathbf{x}_i$ are uniform in the interval (0, 1). In this
case,
$$E\{\ln\mathbf{x}_i\} = \int_0^1 \ln x\,dx = -1 \qquad E\{(\ln\mathbf{x}_i)^2\} = \int_0^1 (\ln x)^2\,dx = 2$$
Hence $\eta = -n$ and $\sigma^2 = n$. Inserting into (8-129), we conclude that the density of
the product $\mathbf{y} = \mathbf{x}_1 \cdots \mathbf{x}_n$ equals
$$f_y(y) \simeq \frac{1}{y\sqrt{2\pi n}}\exp\left\{-\frac{1}{2n}(\ln y + n)^2\right\}U(y)$$
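The parameters $\eta = -n$ and $\sigma^2 = n$ of this example can be checked by simulation. The following sketch draws the logarithm of a product of n uniform numbers many times and compares the sample mean and variance with these values; the function names, the seed, and the run count are assumptions for illustration.

```python
import math
import random

def log_product_samples(n, runs, seed=0):
    """Samples of z = ln(x_1 ... x_n) for x_i uniform in (0, 1]."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], which keeps the logarithm finite.
    return [sum(math.log(1.0 - rng.random()) for _ in range(n)) for _ in range(runs)]

def lognormal_density(y, eta, var):
    """Lognormal density with parameters eta and var, zero for y <= 0."""
    if y <= 0:
        return 0.0
    return math.exp(-(math.log(y) - eta) ** 2 / (2 * var)) / (y * math.sqrt(2 * math.pi * var))

n = 20
z = log_product_samples(n, runs=5000)
mean = sum(z) / len(z)
variance = sum((v - mean) ** 2 for v in z) / (len(z) - 1)
print(mean, variance)
```

The function `lognormal_density` evaluates the approximate density of the product for given parameters; with `eta = -n` and `var = n` it corresponds to the expression obtained in the example.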
8-5 RANDOM NUMBERS: MEANING
AND GENERATION
Random numbers (RNs) are used in a variety of applications involving comput-
ers and statistics. In this section, we explain the underlying ideas concentrating
on the meaning and generation of RNs. We start with a simple illustration of
the role of statistics in the numerical solution of deterministic problems.
MONTE CARLO INTEGRATION. We wish to evaluate the integral
$$I = \int_0^1 g(x)\,dx$$  (8-130)
For this purpose, we introduce an RV $\mathbf{x}$ with uniform distribution in the interval
(0, 1) and we form the RV $\mathbf{y} = g(\mathbf{x})$. As we know,
$$E\{g(\mathbf{x})\} = \int_0^1 g(x)f_x(x)\,dx = \int_0^1 g(x)\,dx$$  (8-131)
hence $\eta_y = I$. We have thus expressed the unknown $I$ as the expected value of
the RV $\mathbf{y}$. This result involves only concepts; it does not yield a numerical
method for evaluating $I$. Suppose, however, that the RV $\mathbf{x}$ models a physical
quantity in a real experiment. We can then estimate $I$ using the relative
frequency interpretation of expected values: We repeat the experiment a large
number of times and observe the values $x_i$ of $\mathbf{x}$; we compute the corresponding
values $y_i = g(x_i)$ of $\mathbf{y}$ and form their average as in (5-26). This yields
$$I = E\{g(\mathbf{x})\} \simeq \frac{1}{n}\sum_{i=1}^{n} g(x_i)$$  (8-132)
The above suggests the following method for determining $I$:
The data $x_i$, no matter how they are obtained, are random numbers; that
is, they are numbers having certain properties. If, therefore, we can numerically
generate such numbers, we have a method for determining $I$. To carry out this
method, we must reexamine the meaning of RNs and develop computer
programs for generating them.
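The averaging formula (8-132) is easy to try with a library generator. The sketch below uses Python's standard `random` module as the source of uniform numbers, which is an assumption and not one of the generators discussed in the text; the integrand $g(x) = x^2$, with $I = 1/3$, is likewise a hypothetical test case.

```python
import random

def monte_carlo_integral(g, n, seed=0):
    """Estimate the integral of g over (0, 1) as the average of g at n uniform points."""
    rng = random.Random(seed)
    return sum(g(rng.random()) for _ in range(n)) / n

estimate = monte_carlo_integral(lambda x: x * x, 100000)
print(estimate)
```

The estimate is itself the value of a sample mean, so its standard deviation decreases like $1/\sqrt{n}$: each additional decimal digit of accuracy costs about 100 times more samples.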
THE DUAL INTERPRETATION OF RNs. “What are RNs? Can they be generated
by a computer? Is it possible to generate truly random number sequences?”
Such questions do not have a generally accepted answer. The reason is simple.
As in the case of probability (see Chap. 1), the term random numbers has two
distinctly different meanings. The first is theoretical: RNs are mental constructs
defined in terms of an abstract model. The second is empirical: RNs are
sequences of real numbers generated either as physical data obtained from a
random experiment or as computer output obtained from a deterministic
program. The duality of interpretation of RNs is apparent in the following
extensively quoted definitions†:
A sequence of numbers is random if it has every property that is shared by all
infinite sequences of independent samples of random variables from the uniform
distribution (J. M. Franklin)
A random sequence is a vague notion embodying the ideas of a sequence in which
each term is unpredictable to the uninitiated and whose digits pass a certain
number of tests, traditional with statisticians and depending somewhat on the uses
to which the sequence is to be put. (D. H. Lehmer)
It is obvious that these definitions cannot have the same meaning. Nevertheless,
both are used to define RN sequences. To avoid this confusing ambiguity,
we shall give two definitions: one theoretical, the other empirical. For these
definitions we shall rely solely on the uses for which RNs are intended: RNs are
used to apply statistical techniques to other fields. It is natural, therefore, that
they are defined in terms of the corresponding probabilistic concepts and their
properties as physically generated numbers are expressed directly in terms of
the properties of real data generated by random experiments.
†D. E. Knuth: The Art of Computer Programming, Addison-Wesley, Reading, MA, 1969.
CONCEPTUAL DEFINITION. A sequence of numbers $x_i$ is called random if it
equals the samples $x_i = \mathbf{x}_i(\zeta)$ of a sequence $\mathbf{x}_i$ of i.i.d. RVs defined in the
space of repeated trials.
It appears that this definition is the same as Franklin’s. There is, however,
a subtle but important difference. Franklin says that the sequence $x_i$ has every
property shared by i.i.d. RVs; we say that $x_i$ equals the samples of the i.i.d. RVs
$\mathbf{x}_i$. In this definition, all theoretical properties of RNs are the same as the
corresponding properties of RVs. There is, therefore, no need for a new theory.
EMPIRICAL DEFINITION. A sequence of numbers $x_i$ is called random if its
statistical properties are the same as the properties of random data obtained
from a random experiment.
Not all experimental data lead to conclusions consistent with the theory of
probability. For this to be the case, the experiments must be so designed that
data obtained by repeated trials satisfy the i.i.d. condition. This condition is
accepted only after the data have been subjected to a variety of tests and in any
case, it can be claimed only as an approximation. The same applies to com-
puter-generated RNs. Such uncertainties, however, cannot be avoided no mat-
ter how we define physically generated sequences. The advantage of the above
definition is that it shifts the problem of establishing the randomness of a
sequence of numbers to an area with which we are already familiar. We can,
therefore, draw directly on our experience with random experiments and apply
the well-established tests of randomness to computer-generated RNs.
Generation of RN Sequences
RNs used in Monte Carlo calculations are generated mainly by computer
programs; however, they can also be generated as observations of random data
obtained from real experiments: The tosses of a fair coin generate a random
sequence of 0’s (heads) and 1’s (tails); the distance between radioactive emissions
generates a random sequence of exponentially distributed samples. We
accept number sequences so generated as random because of our long experience
with such experiments. RN sequences experimentally generated are not,
however, suitable for computer use, for obvious reasons. An efficient source of
RNs is a computer program with small memory, involving simple arithmetic
operations. We outline next the most commonly used programs.
Our objective is to generate RN sequences with arbitrary distributions. In
the present state of the art, however, this cannot be done directly. The available
algorithms only generate sequences consisting of integers $z_i$ uniformly distributed
in an interval (0, m). As we show later, the generation of a sequence $x_i$
with an arbitrary distribution is obtained indirectly by a variety of methods
involving the uniform sequence $z_i$.
The most general algorithm for generating an RN sequence $z_n$ is an
equation of the form
$$z_n = f(z_{n-1}, \ldots, z_{n-r}) \bmod m$$  (8-133)
where $f(z_{n-1}, \ldots, z_{n-r})$ is a function depending on the r most recent past
values of $z_n$. In this notation, $z_n$ is the remainder of the division of the number
$f(z_{n-1}, \ldots, z_{n-r})$ by m. The above is a nonlinear recursion expressing $z_n$ in
terms of the constant m, the function $f$, and the initial conditions $z_0, \ldots, z_{r-1}$.
The quality of the generator depends on the form of the function $f$. It might
appear that good RN sequences result if this function is complicated. Experience
has shown, however, that this is not the case. Most algorithms in use are
linear recursions of order 1. We shall discuss the homogeneous case.
LEHMER’S ALGORITHM. The simplest and one of the oldest RN generators is
the recursion
$$z_n = az_{n-1} \bmod m \qquad z_0 = 1 \qquad n \ge 1$$  (8-134)
where m is a large prime number and a is an integer. Solving, we obtain
$$z_n = a^n \bmod m$$  (8-135)
The sequence $z_n$ takes values between 1 and m - 1; hence at least two of its
first m values are equal. From this it follows that $z_n$ is a periodic sequence for
n > m with period $m_0 \le m - 1$. A periodic sequence is not, of course, random.
However, if for the applications for which it is intended the required number of
samples does not exceed $m_0$, periodicity is irrelevant. For most applications, it is
enough to choose for m a number of the order of $10^9$ and to search for a
constant a such that $m_0 = m - 1$. A value for m suggested by Lehmer in 1951
is the prime number $2^{31} - 1$.
To complete the specification of (8-134), we must assign a value to the
multiplier a. Our first condition is that the period $m_0$ of the resulting sequence
$z_n$ equal m - 1.
DEFINITION. An integer a is called the primitive root of m if the smallest n
such that
$$a^n = 1 \bmod m \quad \text{is } n = m - 1$$  (8-136)
From the definition it follows that the sequence $a^n$ is periodic with period
$m_0 = m - 1$ iff a is a primitive root of m. Most primitive roots do not generate
good RN sequences. For a final selection, we subject specific choices to a variety
of tests based on tests of randomness involving real experiments. Most tests are
carried out not in terms of the integers $z_i$ but in terms of the properties of the
numbers
$$u_i = \frac{z_i}{m}$$  (8-137)
These numbers take essentially all values in the interval (0, 1) and the purpose
of testing is to establish whether they are the values of a sequence $\mathbf{u}_i$ of
continuous-type i.i.d. RVs uniformly distributed in the interval (0, 1). The i.i.d.
condition leads to the following equations:
For every $u_i$ in the interval (0, 1) and for every n,
$$P\{\mathbf{u}_i \le u_i\} = u_i$$  (8-138a)
$$P\{\mathbf{u}_1 \le u_1, \ldots, \mathbf{u}_n \le u_n\} = P\{\mathbf{u}_1 \le u_1\} \cdots P\{\mathbf{u}_n \le u_n\}$$  (8-138b)
To establish the validity of these equations, we need an infinite number of tests.
In real life, however, we can perform only a finite number of tests. Furthermore,
all tests involve approximation based on the empirical interpretation of proba-
bility. We cannot, therefore, claim with certainty that a sequence of real
numbers is truly random. We can claim only that a particular sequence is
reasonably random for certain applications or that one sequence is more
random than another. In practice, a sequence $u_i$ is accepted as random not only
because it passes the standard tests but also because it has been used with
satisfactory results in many problems.
Over the years, several algorithms have been proposed for generating
“good” RN sequences. Not all, however, have withstood the test of time. An
example of a sequence $z_n$ that seems to meet most requirements is obtained
from (8-134) with $a = 7^5$ and $m = 2^{31} - 1$:
$z_n = 16{,}807\, z_{n-1} \bmod 2{,}147{,}483{,}647$    (8-139)
This sequence meets most standard tests of randomness and has been used
effectively in a variety of applications.†
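The recursion (8-139) is short enough to state directly in code. The sketch below is illustrative only: the generator names and the helper `take` are assumptions, and Python's arbitrary-precision integers are relied on to avoid overflow in the product $a z_{n-1}$.

```python
from itertools import islice

M = 2**31 - 1   # modulus m of (8-139), a prime
A = 16807       # multiplier a = 7**5


def lehmer(seed=1):
    """Yield z_1, z_2, ... from z_n = A * z_{n-1} mod M, starting at z_0 = seed."""
    z = seed
    while True:
        z = (A * z) % M
        yield z


def uniform_stream(seed=1):
    """Yield u_n = z_n / M as in (8-137); every value lies strictly inside (0, 1)."""
    for z in lehmer(seed):
        yield z / M


def take(gen, n):
    """Return the next n terms of a generator as a list."""
    return list(islice(gen, n))
```

For example, `take(uniform_stream(), 5)` returns the first five numbers $u_i$ of the sequence started at $z_0 = 1$.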
We conclude with the observation that most tests of randomness are
applications, direct or indirect, of well-known tests of various statistical hy-
potheses. For example, to establish the validity of (8-138a), we apply the
Kolmogoroff-Smirnov test, page 272, or the chi-square test, page 273. These
tests are used to determine whether given experimental data fit a particular
distribution. To establish the validity of (8-138b), we apply the chi-square test,
page 274. This test is used to determine the independence of various events.
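As a rough illustration of a direct test of (8-138a), the sketch below computes Pearson's chi-square statistic for the hypothesis that a set of numbers is uniform in (0,1). It is a simplified stand-in for the tests cited above, not the procedure of those pages; the choice of equal-width bins and the function name are assumptions.

```python
def chi_square_uniform(samples, bins=10):
    """Pearson statistic for the hypothesis that samples are uniform in (0, 1).

    The interval is split into `bins` equal subintervals; the observed count
    in each is compared with the expected count n / bins.
    """
    counts = [0] * bins
    for u in samples:
        k = min(int(u * bins), bins - 1)   # guard against u == 1.0
        counts[k] += 1
    expected = len(samples) / bins
    return sum((c - expected) ** 2 / expected for c in counts)
```

Under the hypothesis, and for a large number of samples, the statistic is approximately chi-square with bins - 1 degrees of freedom, so a value far above the corresponding percentile is evidence against uniformity.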
In addition to direct testing, a variety of special methods have been
proposed for testing indirectly the validity of both equations in (8-138). These
methods are based on well-known properties of RVs and they are designed for
particular applications. The generation of random vector sequences is an
application requiring special tests.
Random vectors. We shall construct a multidimensional sequence of RNs using
the following properties of subsequences. Suppose that x is an RV with
distribution F(x) and $x_i$ is the corresponding RN sequence. It follows from
†S. K. Park and K. W. Miller, “Random Number Generators: Good Ones Are Hard to Find,”
Communications of the ACM, vol. 31, no. 10, October 1988.
(8-138) that every subsequence of $x_i$ is an RN sequence with distribution F(x).
Furthermore, if two subsequences have no common elements, they are the
samples of two independent RVs. From this we conclude that the odd-subscript
and even-subscript sequences
$x_i^o = x_{2i-1} \qquad x_i^e = x_{2i} \qquad i = 1, 2, \ldots$
are the samples of two i.i.d. RVs $\mathbf{x}^o$ and $\mathbf{x}^e$ with distribution F(x). Thus,
starting from a scalar RN sequence $x_i$, we constructed a vector RN sequence
$(x_i^o, x_i^e)$. Proceeding similarly, we can construct RN sequences of any dimen-
sionality. Using superscripts to identify various RVs and their samples, we
conclude that the RN sequences
$x_i^k = x_{mi-m+k} \qquad k = 1, \ldots, m \qquad i = 1, 2, \ldots$    (8-140)
are the samples of m i.i.d. RVs $\mathbf{x}^1, \ldots, \mathbf{x}^m$ with distribution F(x).
Note that a sequence of numbers might be sufficiently random for scalar
but not for vector applications. If, therefore, an RN sequence $x_i$ is to be used
for multidimensional applications, it is desirable to subject it to special tests
involving its subsequences.
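The subsequence construction of (8-140) amounts to de-interleaving a scalar sequence. A minimal sketch follows; the function names are illustrative, and a trailing incomplete vector is simply dropped (an assumption, since the text considers infinite sequences).

```python
def split_subsequences(x, m):
    """Split x_1, x_2, ... into m interleaved subsequences as in (8-140).

    Subsequence k (k = 1..m) holds x_{mi-m+k} for i = 1, 2, ...; with
    0-based Python lists this is the slice x[k-1::m].
    """
    usable = len(x) - len(x) % m      # drop a trailing incomplete vector
    return [x[k:usable:m] for k in range(m)]


def vector_sequence(x, m):
    """Return the list of m-dimensional samples (x_i^1, ..., x_i^m)."""
    return list(zip(*split_subsequences(x, m)))
```

With m = 2 the two lists returned by `split_subsequences` are the odd-subscript and even-subscript sequences of the text.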
RN Sequences with Arbitrary Distributions
In the following, the letter u will identify an RV with uniform distribution in the
interval (0,1); the corresponding RN sequence will be identified by $u_i$. Using
the sequence $u_i$ we shall present a variety of methods for generating sequences
with arbitrary distributions. In this analysis, we shall make frequent use of the
following:
If $x_i$ are the samples of the RV x, then $y_i = g(x_i)$ are the samples of the
RV y = g(x). For example, if $x_i$ is an RN sequence with distribution $F_x(x)$,
then $y_i = a + b x_i$ is an RN sequence with distribution $F_x[(y - a)/b]$ if b > 0,
and $1 - F_x[(y - a)/b]$ if b < 0. From this it follows, for example, that $v_i = 1 - u_i$
is an RN sequence uniform in the interval (0,1).
PERCENTILE TRANSFORMATION METHOD. Consider an RV x with distribu-
tion $F_x(x)$. We have shown in Sec. 5-2 that the RV $u = F_x(x)$ is uniform in the
interval (0,1) no matter what the form of $F_x(x)$ is. Denoting by $F_x^{(-1)}(u)$ the
inverse of $F_x(x)$, we conclude that $x = F_x^{(-1)}(u)$ (see Fig. 8-10). From this it
follows that
$x_i = F_x^{(-1)}(u_i)$    (8-141)
is an RN sequence with distribution $F_x(x)$ [see also (5-19)]. Thus, to find an RN
sequence $x_i$ with distribution a given function $F_x(x)$, it suffices to determine the
inverse of $F_x(x)$ and to compute $F_x^{(-1)}(u_i)$. Note that the numbers $x_i$ are the $u_i$
percentiles of $F_x(x)$.
Example 8-19. We wish to generate an RN sequence $x_i$ with exponential
distribution. In this case,
$F_x(x) = 1 - e^{-x/\lambda} \qquad x = -\lambda \ln(1 - u)$
Since 1 - u is an RV with uniform distribution, we conclude that the sequence
$x_i = -\lambda \ln u_i$    (8-142)
has an exponential distribution.
Example 8-20. We wish to generate an RN sequence $x_i$ with Rayleigh distribution.
In this case,
$F_x(x) = 1 - e^{-x^2/2} \qquad F_x^{(-1)}(u) = \sqrt{-2 \ln(1 - u)}$
Replacing 1 - u by u, we conclude that the sequence
$x_i = \sqrt{-2 \ln u_i}$    (8-143)
has a Rayleigh distribution.
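Examples 8-19 and 8-20 translate directly into code. The sketch below assumes Python's `random.Random` as the source of the uniform sequence $u_i$ (any uniform generator would do) and shifts its output to the interval (0, 1] so that the logarithm is always defined; the function names are illustrative.

```python
import math
import random


def uniform_open(rng):
    """Uniform sample in (0, 1]; rng.random() lies in [0, 1), so log(0) is avoided."""
    return 1.0 - rng.random()


def exponential_rn(lam, u):
    """(8-142): x = -lam * ln(u); the resulting RV has mean lam."""
    return -lam * math.log(u)


def rayleigh_rn(u):
    """(8-143): x = sqrt(-2 ln u), with density x * exp(-x**2 / 2)."""
    return math.sqrt(-2.0 * math.log(u))
```

A sequence is obtained by repeated calls, for example `[exponential_rn(2.0, uniform_open(rng)) for _ in range(n)]` with `rng = random.Random(0)`.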
Suppose now that we wish to generate the samples $x_i$ of a discrete-type
RV x taking the values $a_k$ with probability
$p_k = P\{\mathbf{x} = a_k\} \qquad k = 1, 2, \ldots$
In this case, $F_x(x)$ is a staircase function (Fig. 8-11) with discontinuities at the
points $a_k$, and its inverse is a staircase function with discontinuities at the points
$F_x(a_k) = p_1 + \cdots + p_k$. Applying (8-141), we obtain the following rule for
generating the RN sequence $x_i$:
Set $x_i = a_k$ iff $p_1 + \cdots + p_{k-1} \le u_i < p_1 + \cdots + p_k$    (8-144)
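Rule (8-144) is a search for the interval of cumulative probabilities that contains $u_i$. A minimal sketch using a binary search over the partial sums follows; the function name and argument order are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def discrete_rn(values, probs, u):
    """Return a_k such that p_1+...+p_{k-1} <= u < p_1+...+p_k, as in (8-144)."""
    cumulative = list(accumulate(probs))          # p_1, p_1+p_2, ...
    k = bisect_right(cumulative, u)               # number of partial sums <= u
    return values[min(k, len(values) - 1)]        # guard against rounding in the last sum
```

With `values = [0, 1]` and `probs = [p, 1 - p]` the function produces a binary sequence; with ten equal probabilities it produces decimal digits.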
FIGURE 8-11
Example 8-21. The sequence
$x_i = 0$ if $0 < u_i < p$
$x_i = 1$ if $p < u_i < 1$
takes the values 0 and 1 with probability p and 1 - p respectively. It specifies,
therefore, a binary RN sequence.
The sequence
$x_i = k$ iff $0.1k < u_i < 0.1(k + 1) \qquad k = 0, 1, \ldots, 9$
takes the values 0, 1, ..., 9 with equal probability. It specifies, therefore, a decimal
RN sequence with uniform distribution.
Setting
$a_k = k \qquad p_k = \binom{m}{k} p^k q^{m-k} \qquad k = 0, 1, \ldots, m$
into (8-144), we obtain an RN sequence with binomial distribution.
Setting
$a_k = k \qquad p_k = e^{-\lambda} \frac{\lambda^k}{k!} \qquad k = 0, 1, \ldots$
into (8-144), we obtain an RN sequence with Poisson distribution.
Suppose now that we are given not a uniform sequence, but a sequence $x_i$
with distribution $F_x(x)$. We wish to find a sequence $y_i$ with distribution $F_y(y)$.
As we know, $y_i = F_y^{(-1)}(u_i)$ is an RN sequence with distribution $F_y(y)$. Hence
(see Fig. 8-10) the composite function
$y_i = F_y^{(-1)}(F_x(x_i))$    (8-145)
generates an RN sequence with distribution $F_y(y)$ [see also (5-20)].
Example 8-22. We are given an RN sequence $x_i > 0$ with distribution
$F_x(x) = 1 - e^{-x} - x e^{-x}$ and we wish to generate an RN sequence $y_i > 0$ with distribution
$F_y(y) = 1 - e^{-y}$. In this example $F_y^{(-1)}(u) = -\ln(1 - u)$; hence
$F_y^{(-1)}(F_x(x)) = -\ln[1 - F_x(x)] = -\ln(e^{-x} + x e^{-x})$
Inserting into (8-145), we obtain
$y_i = -\ln(e^{-x_i} + x_i e^{-x_i})$
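The composite transformation (8-145) and its special case in Example 8-22 can be sketched as follows. The helper names are illustrative; the closed form x - ln(1 + x) is only an algebraic rewriting of $-\ln(e^{-x} + x e^{-x})$, used here for numerical convenience.

```python
import math
import random


def transform(x, cdf_x, inv_cdf_y):
    """(8-145): y = F_y^{(-1)}(F_x(x)) for a given distribution and inverse."""
    return inv_cdf_y(cdf_x(x))


def erlang2_to_exponential(x):
    """Example 8-22: y = -ln(e^{-x} + x e^{-x}) = x - ln(1 + x), for x >= 0."""
    return x - math.log1p(x)
```

As a plausibility check, samples with distribution $1 - e^{-x} - x e^{-x}$ can be formed as the sum of two independent unit exponentials (an Erlang RV with m = 2, c = 1); after the transformation their average should be close to 1, the mean of the unit exponential.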
REJECTION METHOD. In the percentile transformation method, we used the
inverse of the function Fx(x). However, inverting a function is not a simple task.
To overcome this difficulty, we develop next a method that avoids inversion. The
problem under consideration is the generation of an RN sequence $y_i$ with
distribution $F_y(y)$ in terms of the RN sequence $x_i$ as in (8-145).
The proposed method is based on the relative frequency interpretation of
the conditional density
$f_x(x|\mathcal{M})\,dx = \frac{P\{x < \mathbf{x} \le x + dx, \mathcal{M}\}}{P(\mathcal{M})}$    (8-146)
of an RV x assuming $\mathcal{M}$ (see page 80). In the following method, the event $\mathcal{M}$ is
expressed in terms of the RV x and another RV u, and it is so chosen that the
resulting function $f_x(x|\mathcal{M})$ equals $f_y(x)$. The sequence $y_i$ is generated by
setting $y_i = x_i$ if $\mathcal{M}$ occurs, rejecting $x_i$ otherwise. The problem has a solution
only if $f_y(x) = 0$ in every interval in which $f_x(x) = 0$. We can assume, there-
fore, without essential loss of generality, that the ratio $f_x(x)/f_y(x)$ is bounded
from below by some positive constant a:
$\frac{f_x(x)}{f_y(x)} \ge a > 0$ for every x
Rejection theorem. If the RVs x and u are independent and
$\mathcal{M} = \{\mathbf{u} \le r(\mathbf{x})\}$ where $r(x) = a \frac{f_y(x)}{f_x(x)} \le 1$    (8-147)
then
$f_x(x|\mathcal{M}) = f_y(x)$    (8-148)
Proof. The joint density of the RVs x and u equals $f_x(x)$ in the strip 0 < u < 1
of the xu plane, and 0 elsewhere. The event $\mathcal{M}$ consists of all outcomes such
that the point (x, u) is in the shaded area of Fig. 8-12 below the curve u = r(x).
Hence
$P(\mathcal{M}) = \int_{-\infty}^{\infty} r(x) f_x(x)\,dx = a \int_{-\infty}^{\infty} f_y(x)\,dx = a$
The event $\{x < \mathbf{x} \le x + dx, \mathcal{M}\}$ consists of all outcomes such that the point
(x, u) is in the strip $x < \mathbf{x} \le x + dx$ below the curve u = r(x). The probability
masses in this strip equal $f_x(x) r(x)\,dx$. Hence
$P\{x < \mathbf{x} \le x + dx, \mathcal{M}\} = f_x(x) r(x)\,dx$
Inserting into (8-146), we obtain (8-148).
FIGURE 8-12
From the rejection theorem it follows that the subsequence of $x_i$ such that
$u_i \le r(x_i)$ forms a sequence of random numbers that are the samples of an RV
y with density $f_x(y|\mathcal{M}) = f_y(y)$. This leads to the following rule for generating
the sequence $y_i$: Form the two-dimensional RN sequence $(x_i, u_i)$.
Set $y_i = x_i$ if $u_i \le a \frac{f_y(x_i)}{f_x(x_i)}$; reject $x_i$ otherwise    (8-149)
Example 8-23. We are given an RN sequence $x_i$ with exponential distribution and
we wish to construct an RN sequence $y_i$ with truncated normal distribution:
$f_x(x) = e^{-x} U(x) \qquad f_y(y) = \frac{2}{\sqrt{2\pi}} e^{-y^2/2} U(y)$
For x > 0,
$\frac{f_y(x)}{f_x(x)} = \sqrt{\frac{2e}{\pi}}\, e^{-(x-1)^2/2}$
Setting $a = \sqrt{\pi/2e}$, we obtain the following rule for generating the sequence $y_i$:
Set $y_i = x_i$ if $u_i < e^{-(x_i - 1)^2/2}$; reject $x_i$ otherwise
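The rule of Example 8-23 can be sketched as a loop that draws candidate pairs $(x_i, u_i)$ until one is accepted. Python's `random.Random` is assumed as the uniform source and the function name is illustrative. On average a fraction $a = \sqrt{\pi/2e} \approx 0.76$ of the candidates is accepted, since $P(\mathcal{M}) = a$.

```python
import math
import random


def truncated_normal_rn(rng):
    """One sample with density (2 / sqrt(2 pi)) exp(-y**2 / 2) for y > 0, by rejection."""
    while True:
        x = -math.log(1.0 - rng.random())          # exponential candidate x_i
        u = rng.random()                           # independent uniform u_i
        if u <= math.exp(-0.5 * (x - 1.0) ** 2):   # accept iff u_i <= r(x_i)
            return x
```

The mean of the truncated normal density equals $\sqrt{2/\pi} \approx 0.798$, which gives a simple check on the output.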
MIXING METHOD. We develop next a method generating an RN sequence $x_i$
with density f(x) under the following assumptions: The function f(x) can be
expressed as the weighted sum of m densities $f_k(x)$:
$f(x) = p_1 f_1(x) + \cdots + p_m f_m(x) \qquad p_k > 0$    (8-150)
Each component $f_k(x)$ is the density of a known RN sequence $x_i^k$.
In the mixing method, we generate the sequence $x_i$ by a mixing process
involving certain subsequences of the m sequences $x_i^k$ selected according to the
following rule:
Set $x_i = x_i^k$ if $p_1 + \cdots + p_{k-1} \le u_i < p_1 + \cdots + p_k$    (8-151)
Mixing theorem. If the sequences $u_i$ and $x_i^1, \ldots, x_i^m$ are mutually independent,
then the density $f_x(x)$ of the sequence $x_i$ specified by (8-151) equals
$f_x(x) = p_1 f_1(x) + \cdots + p_m f_m(x)$    (8-152)
Proof. The sequence $x_i$ is a mixture of m subsequences. The density of the
subsequence of the kth sequence $x_i^k$ equals $f_k(x)$. This subsequence is also a
subsequence of $x_i$ conditioned on the event
$\mathcal{A}_k = \{p_1 + \cdots + p_{k-1} \le \mathbf{u} < p_1 + \cdots + p_k\}$
Hence its density also equals $f_x(x|\mathcal{A}_k)$. This leads to the conclusion that
$f_x(x|\mathcal{A}_k) = f_k(x)$
From the total probability theorem (4-58), it follows that
$f_x(x) = f_x(x|\mathcal{A}_1) P(\mathcal{A}_1) + \cdots + f_x(x|\mathcal{A}_m) P(\mathcal{A}_m)$
And since $P(\mathcal{A}_k) = p_k$, (8-152) results. Comparing with (8-150), we conclude
that the density $f_x(x)$ generated by (8-152) equals the given function f(x).
Example 8-24. The Laplace density $0.5 e^{-|x|}$ can be written as a sum
$f(x) = 0.5 e^{-x} U(x) + 0.5 e^{x} U(-x)$
This is a special case of (8-150) with
$f_1(x) = e^{-x} U(x) \qquad f_2(x) = e^{x} U(-x) \qquad p_1 = p_2 = 0.5$
A sequence $x_i$ with density f(x) can, therefore, be realized in terms of the
samples of two RVs $\mathbf{x}^1$ and $\mathbf{x}^2$ with the above densities. As we have shown in
Example 8-19, if the RV v is uniform in the interval (0,1), then the density of the
RV $\mathbf{x}^1 = -\ln \mathbf{v}$ equals $f_1(x)$; similarly, the density of the RV $\mathbf{x}^2 = \ln \mathbf{v}$ equals
$f_2(x)$. This yields the following rule for generating an RN sequence $x_i$ with
Laplace distribution: Form two independent uniform sequences $u_i$ and $v_i$:
Set $x_i = -\ln v_i$ if $0 < u_i < 0.5$
Set $x_i = \ln v_i$ if $0.5 \le u_i < 1$
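The rule of Example 8-24 uses one uniform number to select the component and a second to generate the sample. A minimal sketch, with Python's `random.Random` assumed as the source of both uniform sequences and an illustrative function name:

```python
import math
import random


def laplace_rn(rng):
    """One sample with density 0.5 * exp(-|x|), by mixing two exponential branches."""
    u = rng.random()              # selects the component: p_1 = p_2 = 0.5
    v = 1.0 - rng.random()        # uniform in (0, 1], so log(v) is defined
    return -math.log(v) if u < 0.5 else math.log(v)
```

The Laplace density above has zero mean and variance 2, which the sample averages should approximate.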
GENERAL TRANSFORMATIONS. We now give various examples for generating
an RN sequence $w_i$ with specified distribution $F_w(w)$ using the transformation
$\mathbf{w} = g(\mathbf{x}^1, \ldots, \mathbf{x}^m)$
where $\mathbf{x}^k$ are m RVs with known distributions. To do so, we determine g such
that the distribution of w equals $F_w(w)$. The desired sequence is given by
$w_i = g(x_i^1, \ldots, x_i^m)$
Binomial RNs. If $\mathbf{x}^k$ are m i.i.d. RVs taking the values 0 and 1 with probabili-
ties p and q respectively, their sum has a binomial distribution. From this it
follows that if $x_i^k$ are m binary sequences, their sum
$w_i = x_i^1 + \cdots + x_i^m$
is an RN sequence with binomial distribution. The m sequences $x_i^k$ can be
realized as subsequences of a single binary sequence $x_i$ as in (8-140).
Erlang RNs. The sum $\mathbf{w} = \mathbf{x}^1 + \cdots + \mathbf{x}^m$ of m i.i.d. RVs $\mathbf{x}^k$ with density
$c e^{-cx} U(x)$ has an Erlang density [see (4-38)]:
$f_w(w) \sim w^{m-1} e^{-cw} U(w)$    (8-153)
From this it follows that the sum $w_i = w_i^1 + \cdots + w_i^m$ of m exponentially
distributed RN sequences $w_i^k$ is an RN sequence with Erlang distribution.
The sequences $w_i^k$ can be generated in terms of m subsequences $u_i^k$ of a
single sequence $u_i$ (see Example 8-19):
$w_i = -\frac{1}{c} (\ln u_i^1 + \cdots + \ln u_i^m)$    (8-154)
Chi-square RNs. We wish to generate an RN sequence $w_i$ with density
$f_w(w) \sim w^{n/2 - 1} e^{-w/2} U(w)$
For n = 2m, this is a special case of (8-153) with c = 1/2. Hence $w_i$ is given by
(8-154).
To find $w_i$ for n = 2m + 1, we observe that if y is $\chi^2(2m)$ and z is N(0,1)
and independent of y, the sum $\mathbf{w} = \mathbf{y} + \mathbf{z}^2$ is $\chi^2(2m + 1)$ [see (8-63)]; hence the
sequence
$w_i = -2(\ln u_i^1 + \cdots + \ln u_i^m) + (z_i)^2$
has a $\chi^2(2m + 1)$ distribution.
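The Erlang and chi-square constructions can be sketched together. Two assumptions are made for illustration: the m uniform subsequences are replaced by successive calls to one generator, and the N(0,1) samples $z_i$ needed for odd n are taken from the standard library's `gauss` rather than from one of the methods developed later in this section.

```python
import math
import random


def erlang_rn(m, c, rng):
    """(8-154): w = -(1/c) * (ln u^1 + ... + ln u^m), with mean m / c."""
    return -sum(math.log(1.0 - rng.random()) for _ in range(m)) / c


def chi_square_rn(n, rng):
    """chi-square(n): Erlang with c = 1/2 for n = 2m, plus z**2 when n = 2m + 1."""
    m, odd = divmod(n, 2)
    w = erlang_rn(m, 0.5, rng) if m > 0 else 0.0
    if odd:
        w += rng.gauss(0.0, 1.0) ** 2   # assumption: library normal generator
    return w
```

A chi-square RV with n degrees of freedom has mean n, which provides a quick check on both the even and the odd case.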
Student-t RNs. Given two independent RVs x and y with distributions N(0,1)
and $\chi^2(n)$ respectively, we form the RV $\mathbf{w} = \mathbf{x}/\sqrt{\mathbf{y}/n}$. As we know, w has a
t(n) distribution (see Example 6-15). From this it follows that, if $x_i$ and $y_i$ are
samples of x and y, the sequence
$w_i = \frac{x_i}{\sqrt{y_i/n}}$
has a t(n) distribution.
Lognormal RNs. If z is N(0,1) and $\mathbf{w} = e^{a + b\mathbf{z}}$, then w has a lognormal distribu-
tion [see (5-10)]:
$f_w(w) = \frac{1}{b w \sqrt{2\pi}} \exp\left\{-\frac{(\ln w - a)^2}{2b^2}\right\}$
Hence, if $z_i$ is an N(0,1) sequence, the sequence
$w_i = e^{a + b z_i}$
has a lognormal distribution.
RN sequences with normal distributions. Several methods are available for
generating normal RVs. We give next various illustrations. The percentile
transformation method is not used because of the difficulty of inverting the
normal distribution. The mixing method is used extensively because the normal
density is a smooth curve; it can, therefore, be approximated by a sum as in
(8-150). The major components (shaded) of this sum are rectangles (Fig. 8-13)
that can be realized by sequences of the form $a u_i + b$. The remaining compo-
nents (shaded) are more complicated, however; since their areas are small, they
need not be realized exactly. Other methods involve known properties of
normal RVs. For example, the central limit theorem leads to the following
method.
Given m independent RVs $\mathbf{u}^k$, we form the sum
$\mathbf{z} = \mathbf{u}^1 + \cdots + \mathbf{u}^m$
If m is large, the RV z is approximately normal [see (8-111)]. From this it
follows that if $u_i^k$ are m independent RN sequences their sum
$z_i = u_i^1 + \cdots + u_i^m$
is approximately a normal RN sequence. This method is not very efficient. The
following three methods are more efficient and are used extensively.
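The central limit method can be sketched as below. The centering and scaling to zero mean and unit variance are an added convenience not stated in the text (the sum of m uniform RVs has mean m/2 and variance m/12), and the default m = 12 is an arbitrary illustrative choice.

```python
import math
import random


def clt_normal_rn(rng, m=12):
    """Approximate N(0, 1) sample: standardized sum of m independent uniforms."""
    s = sum(rng.random() for _ in range(m))
    return (s - m / 2.0) / math.sqrt(m / 12.0)
```

The approximation is poor in the tails: with m = 12 no sample can exceed 6 in magnitude, and each sample costs m uniform numbers, which illustrates why the method is not very efficient.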
Rejection and mixing (G. Marsaglia). In Example 8-23, we used the rejection
method to generate an RN sequence $y_i$ with a truncated normal density
$f_y(y) = \frac{2}{\sqrt{2\pi}} e^{-y^2/2} U(y)$
The normal density can be written as a sum
$f_z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} = \frac{1}{2} f_y(z) + \frac{1}{2} f_y(-z)$    (8-155)
The density $f_y(y)$ is realized by the sequence $y_i$ as in Example 8-23 and the
density $f_y(-y)$ by the sequence $-y_i$. Applying (8-151), we conclude that the
following rule generates an N(0,1) sequence $z_i$:
Set $z_i = y_i$ if $0 < u_i < 0.5$
Set $z_i = -y_i$ if $0.5 \le u_i < 1$    (8-156)
Polar coordinates. We have shown that, if the RVs r and φ are independent, r
has the Rayleigh density $f_r(r) = r e^{-r^2/2}$ and φ is uniform in the interval
(-π, π), then (see Example 6-12) the RVs
$\mathbf{z} = \mathbf{r} \cos\varphi \qquad \mathbf{w} = \mathbf{r} \sin\varphi$    (8-157)
are N(0,1) and independent. Using this, we shall construct two independent
normal RN sequences $z_i$ and $w_i$ as follows: Clearly, $\varphi = \pi(2u - 1)$; hence
$\varphi_i = \pi(2u_i - 1)$. As we know, $r = \sqrt{2x} = \sqrt{-2 \ln v}$ where x is an RV with exponen-
tial distribution and v is uniform in the interval (0,1). Denoting by $x_i$ and $v_i$ the
samples of the RVs x and v, we conclude that $r_i = \sqrt{2x_i} = \sqrt{-2 \ln v_i}$ is an RN
sequence with Rayleigh distribution. From this and (8-157) it follows that if $u_i$
and $v_i$ are two independent RN sequences uniform in the interval (0,1), then
the sequences
$z_i = \sqrt{-2 \ln v_i} \cos \pi(2u_i - 1) \qquad w_i = \sqrt{-2 \ln v_i} \sin \pi(2u_i - 1)$    (8-158)
are N(0,1) and independent.
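Equation (8-158) maps one pair of uniform numbers into one pair of normal numbers. A minimal sketch follows; the function name is illustrative and the caller is assumed to supply v in (0, 1] so that the logarithm is defined.

```python
import math
import random


def polar_pair(u, v):
    """(8-158): map uniforms u in [0, 1) and v in (0, 1] to two N(0, 1) samples."""
    r = math.sqrt(-2.0 * math.log(v))      # Rayleigh radius r_i
    phi = math.pi * (2.0 * u - 1.0)        # angle uniform in (-pi, pi)
    return r * math.cos(phi), r * math.sin(phi)
```

By construction $z_i^2 + w_i^2 = -2 \ln v_i$ for every pair, which is a convenient identity for checking an implementation.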
The Box-Muller method. The rejection method was based on the following: If
$x_i$ is an RN sequence with distribution F(x), its subsequence $y_i$ conditioned on
an event $\mathcal{M}$ is an RN sequence with distribution $F(x|\mathcal{M})$. Using this, we shall
generate two independent N(0,1) sequences $z_i$ and $w_i$ in terms of the samples
$x_i$, $y_i$ of two independent RVs x, y uniformly distributed in the interval (-1,1).
We shall use for $\mathcal{M}$ the event
$\mathcal{M} = \{\mathbf{q} \le 1\} \qquad \mathbf{q} = \sqrt{\mathbf{x}^2 + \mathbf{y}^2}$
The joint density of x and y equals 1/4 in the square |x| < 1, |y| < 1 of
Fig. 8-14 and 0 elsewhere. Hence
$P(\mathcal{M}) = \frac{\pi}{4} \qquad P\{\mathbf{q} \le q\} = \frac{\pi q^2}{4}$ for q < 1
But $\{\mathbf{q} \le q, \mathcal{M}\} = \{\mathbf{q} \le q\}$ for q < 1 because $\{\mathbf{q} \le q\}$ is a subset of $\mathcal{M}$. Hence
$F_q(q|\mathcal{M}) = \frac{P\{\mathbf{q} \le q, \mathcal{M}\}}{P(\mathcal{M})} = q^2 \qquad f_q(q|\mathcal{M}) = 2q \qquad 0 \le q < 1$    (8-159)
Writing the RVs x and y in polar form:
$\mathbf{x} = \mathbf{q} \cos\varphi \qquad \mathbf{y} = \mathbf{q} \sin\varphi \qquad \tan\varphi = \mathbf{y}/\mathbf{x}$    (8-160)
FIGURE 8-14
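we conclude as in (8-159) that the joint density of the RVs q and φ is such that
$P\{q \le \mathbf{q} < q + dq,\ \varphi \le \boldsymbol{\varphi} < \varphi + d\varphi,\ \mathcal{M}\} = \frac{q\,dq\,d\varphi}{4}$
for 0 < q < 1 and |φ| < π. From this it follows that the RVs q and φ are
conditionally independent and
$f_q(q|\mathcal{M}) = 2q \qquad f_\varphi(\varphi|\mathcal{M}) = \frac{1}{2\pi} \qquad 0 \le q \le 1 \qquad -\pi < \varphi < \pi$
THEOREM. If x and y are two independent RVs uniformly distributed in the
interval (-1,1) and $\mathbf{q} = \sqrt{\mathbf{x}^2 + \mathbf{y}^2}$, then the RVs
$\mathbf{z} = \frac{\mathbf{x}}{\mathbf{q}} \sqrt{-4 \ln \mathbf{q}} \qquad \mathbf{w} = \frac{\mathbf{y}}{\mathbf{q}} \sqrt{-4 \ln \mathbf{q}}$    (8-161)
are conditionally N(0,1) and independent:
$f_{zw}(z, w|\mathcal{M}) = f_z(z|\mathcal{M}) f_w(w|\mathcal{M}) = \frac{1}{2\pi} e^{-(z^2 + w^2)/2}$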
Proof. From (8-160) it follows that
$\mathbf{z} = \sqrt{-4 \ln \mathbf{q}} \cos\varphi \qquad \mathbf{w} = \sqrt{-4 \ln \mathbf{q}} \sin\varphi$
This system is similar to the system (8-157). To prove the theorem, it suffices,
therefore, to show that the conditional density of the RV $\mathbf{r} = \sqrt{-4 \ln \mathbf{q}}$
assuming $\mathcal{M}$ equals $r e^{-r^2/2}$. To show this, we apply (5-5). In our case,
$q(r) = e^{-r^2/4} \qquad q'(r) = -\frac{r}{2} e^{-r^2/4} \qquad f_q(q|\mathcal{M}) = 2q$
Hence
$f_r(r|\mathcal{M}) = f_q(q|\mathcal{M}) |q'(r)| = 2 e^{-r^2/4}\, \frac{r}{2} e^{-r^2/4} = r e^{-r^2/2}$
This shows that the conditional density of the RV r is Rayleigh as in (8-157).
The preceding theorem leads to the following rule for generating the
sequences $z_i$ and $w_i$: Form two independent sequences $x_i = 2u_i - 1$, $y_i = 2v_i - 1$.
If $q_i = \sqrt{x_i^2 + y_i^2} < 1$, set $z_i = \frac{x_i}{q_i} \sqrt{-4 \ln q_i}$, $w_i = \frac{y_i}{q_i} \sqrt{-4 \ln q_i}$
Reject $(x_i, y_i)$ otherwise.
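The rule above avoids the sine and cosine of (8-158) at the cost of rejecting the points that fall outside the unit circle. A minimal sketch, with Python's `random.Random` assumed as the uniform source; the extra exclusion of q = 0 guards the division and the logarithm and does not affect the distribution.

```python
import math
import random


def polar_rejection_pair(rng):
    """Two independent N(0, 1) samples from a point uniform in the unit disk."""
    while True:
        x = 2.0 * rng.random() - 1.0           # x_i = 2 u_i - 1
        y = 2.0 * rng.random() - 1.0           # y_i = 2 v_i - 1
        q = math.hypot(x, y)                   # q_i = sqrt(x_i**2 + y_i**2)
        if 0.0 < q < 1.0:                      # accept only points inside the circle
            scale = math.sqrt(-4.0 * math.log(q)) / q
            return x * scale, y * scale
```

Since $P(\mathcal{M}) = \pi/4$, about 79% of the candidate points are accepted.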
COMPUTERS AND STATISTICS. In this section, we analyzed the dual meaning
of random numbers and their computer generation. We conclude with a brief
outline of the general areas of interaction between computers and statistics:
1.	Statistical methods are used to solve numerically a variety of deterministic
problems.
Examples include the following: evaluation of integrals, solution of differen-
tial equations; determination of various mathematical constants. The solu-
tions are based on the availability of RN sequences. Such sequences can be
obtained from random experiments; in most cases, however, they are com-
puter generated. We shall give a simple illustration of the two approaches in
the context of Buffon’s needle. The objective in this problem is the statistical
estimation of the number π. The method proposed in Example 6-4 involves
the performance of a physical experiment. We introduce the event
$\mathcal{A} = \{\mathbf{x} < b \cos\boldsymbol{\theta}\}$ where x (distance from the nearest line) and θ (angle of the
needle) are two independent RVs uniform in the intervals (0, a) and (0, π/2),
respectively. This event occurs if the needle intersects one of the lines and its
probability equals 2b/πa. From this it follows that
$P(\mathcal{A}) = \frac{2b}{\pi a} \approx \frac{n_{\mathcal{A}}}{n} \qquad \pi \approx \frac{2bn}{a n_{\mathcal{A}}}$    (8-162)
where $n_{\mathcal{A}}$ is the number of intersections in n trials. The above estimate can
be obtained without experimentation. We form two independent RN se-
quences $x_i$ and $\theta_i$ with distributions $F_x(x)$ and $F_\theta(\theta)$, respectively, and we
denote by $n_{\mathcal{A}}$ the number of times $x_i < b \cos\theta_i$. With $n_{\mathcal{A}}$ so determined, the
computer generated estimate of π is obtained from (8-162).
2.	Computers are used to solve a variety of deterministic problems originating
in statistics.
Examples include the following: evaluation of the mean, the variance, or
other averages used in parameter estimation and hypothesis testing; classifi-
cation and storage of experimental data; use of computers as instructional
tools. For example, graphical demonstration of the law of large numbers or
the central limit theorem. Such applications involve mostly routine computer
programs unrelated to statistics. There is, however, another class of deter-
ministic problems the solution of which is based on statistical concepts and
RN sequences. A simple illustration follows:
We are given m RVs $x_1, \ldots, x_m$ with known distributions and we wish to
estimate the distribution of the RV $y = g(x_1, \ldots, x_m)$. This problem can, in
principle, be solved analytically; however, its solution is, in general, complex.
See, for example, the problem of determining the exact distribution of the
RV q used in the chi-square test (9-76). As we explain next, the determina-
tion of $F_y(y)$ is simplified if we use Monte Carlo techniques. Assuming for
simplicity that m = 1, we generate an RN sequence $x_i$ of length n with
distribution the known function $F_x(x)$ and we form the RN sequence
$y_i = g(x_i)$. To determine $F_y(y)$ for a specific y, we count the number $n_y$ of
samples $y_i$ such that $y_i \le y$. Inserting into (4-3), we obtain the estimate
$F_y(y) \approx \frac{n_y}{n}$    (8-163)
A similar approach can be used to determine the u percentile $x_u$ of x or to
decide whether $x_u$ is larger or smaller than a given number (see hypothesis
testing, Sec. 9-2).
3.	Computers are used to simulate random experiments or to verify a scientific
theory.
This involves the familiar methods of simulating physical systems where now
all inputs and responses are replaced by appropriate RN sequences.
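The two Monte Carlo illustrations in items 1 and 2 above can be sketched as follows. The sketch assumes that x is uniform in (0, a), that θ is uniform in (0, π/2), and that the intersection event is {x < b cos θ} with b ≤ a, so that its probability is 2b/πa. It uses `math.pi` only to draw the angle, which a physical experiment would not need. The function names and the example g(x) = x² are illustrative.

```python
import math
import random


def buffon_pi(n, a, b, rng):
    """Estimate pi as 2*b*n / (a*n_A), where n_A counts the events x < b*cos(theta)."""
    hits = 0
    for _ in range(n):
        x = a * rng.random()                      # distance to the nearest line
        theta = 0.5 * math.pi * rng.random()      # angle of the needle
        if x < b * math.cos(theta):
            hits += 1
    return 2.0 * b * n / (a * hits)


def empirical_cdf(g, draw_x, y, n, rng):
    """(8-163): estimate F_y(y) for y = g(x) as the fraction of samples g(x_i) <= y."""
    n_y = sum(1 for _ in range(n) if g(draw_x(rng)) <= y)
    return n_y / n
```

For instance, if x is uniform in (0,1) and g(x) = x², then $F_y(0.25) = P\{x \le 0.5\} = 0.5$, and `empirical_cdf(lambda x: x * x, lambda r: r.random(), 0.25, n, rng)` should approach this value as n grows.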
PROBLEMS
8-1. Show that if F(x,y,z) is a joint distribution, then for any $x_1 \le x_2$, $y_1 \le y_2$,
$z_1 \le z_2$:
$F(x_2,y_2,z_2) + F(x_1,y_1,z_2) + F(x_1,y_2,z_1) + F(x_2,y_1,z_1)$
$- F(x_1,y_2,z_2) - F(x_2,y_1,z_2) - F(x_2,y_2,z_1) - F(x_1,y_1,z_1) \ge 0$
8-2. The events $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$ are such that
$P(\mathcal{A}) = P(\mathcal{B}) = P(\mathcal{C}) = 0.5$
$P(\mathcal{A}\mathcal{B}) = P(\mathcal{A}\mathcal{C}) = P(\mathcal{B}\mathcal{C}) = P(\mathcal{A}\mathcal{B}\mathcal{C}) = 0.25$
Show that the zero-one RVs associated with these events are not independent;
they are, however, independent in pairs.
8-3. Show that if the RVs x, y, z are jointly normal and independent in pairs, they are
independent.
8-4. The RVs $x_i$ are i.i.d. and uniform in the interval (-0.5, 0.5). Show that
$E\{(x_1 + x_2 + x_3)^4\} = \frac{13}{80}$
8-5. (a) Reasoning as in (6-34), show that if the RVs x, y, z are independent and their
joint density has spherical symmetry:
$f(x,y,z) = f(\sqrt{x^2 + y^2 + z^2})$
then they are normal with zero mean and equal variance.
(b) The components $v_x, v_y, v_z$ of the velocity $v = \sqrt{v_x^2 + v_y^2 + v_z^2}$ of a particle are
independent RVs with zero mean and variance kT/m. Furthermore, their joint
density has spherical symmetry. Show that v has a Maxwell density and
$E\{v\} = 2\sqrt{\frac{2kT}{\pi m}} \qquad E\{v^2\} = \frac{3kT}{m} \qquad E\{v^4\} = \frac{15k^2T^2}{m^2}$
8-6. Show that if the RVs x, y, z are such that $r_{xy} = r_{yz} = 1$, then $r_{xz} = 1$.
8-7. Show that
$E\{x_1 x_2 | x_3\} = E\{E\{x_1 x_2 | x_2, x_3\} | x_3\} = E\{x_2 E\{x_1 | x_2, x_3\} | x_3\}$
8-8. Show that $\hat{E}\{y | x_1\} = \hat{E}\{\hat{E}\{y | x_1, x_2\} | x_1\}$ where $\hat{E}\{y | x_1, x_2\} = a_1 x_1 + a_2 x_2$ is the
linear MS estimate of y in terms of $x_1$ and $x_2$.
8-9. Show that if
$x_i \ge 0 \qquad E\{x_i^2\} = M$ and $s = \sum_{i=1}^{\mathbf{n}} x_i$
then
$E\{s^2\} \le M E\{\mathbf{n}^2\}$
8-10. We denote by $x_m$ an RV equal to the number of tosses of a coin until heads shows
for the mth time. Show that if P{h} = p, then $E\{x_m\} = m/p$.
Hint: $E\{x_m - x_{m-1}\} = E\{x_1\} = p + 2pq + \cdots + npq^{n-1} + \cdots = 1/p$.
8-11. The number of daily accidents is a Poisson RV n with parameter a. The probability
that a single accident is fatal equals p. Show that the number m of fatal accidents
in one day is a Poisson RV with parameter ap.
Hint:
$E\{e^{j\omega \mathbf{m}} | \mathbf{n} = n\} = \sum_{k=0}^{n} e^{j\omega k} \binom{n}{k} p^k q^{n-k} = (p e^{j\omega} + q)^n$
8-12. The RVs $x_k$ are independent with densities $f_k(x)$ and the RV n is independent of
$x_k$ with $P\{\mathbf{n} = k\} = p_k$. Show that if
$s = \sum_{k=1}^{\mathbf{n}} x_k$ then $f_s(s) = \sum_{k=1}^{\infty} p_k [f_1(s) * \cdots * f_k(s)]$
8-13. The RVs $x_i$ are i.i.d. with moment function $\Phi_x(s) = E\{e^{s x_i}\}$. The RV n takes the
values 0, 1, ... and its moment function equals $\Gamma_n(z) = E\{z^{\mathbf{n}}\}$. Show that if
$y = \sum_{i=1}^{\mathbf{n}} x_i$ then $\Phi_y(s) = E\{e^{sy}\} = \Gamma_n[\Phi_x(s)]$
Hint: $E\{e^{sy} | \mathbf{n} = k\} = E\{e^{s(x_1 + \cdots + x_k)}\} = \Phi_x^k(s)$.
Special case: If n is Poisson with parameter a, then $\Phi_y(s) = e^{a\Phi_x(s) - a}$.
8-14. The RVs $x_i$ are i.i.d. and uniform in the interval (0,1). Show that if $y = \max x_i$,
then $F(y) = y^n$ for $0 \le y \le 1$.
8-15. Given an RV x with distribution $F_x(x)$, we form its order statistics $y_k$ as in
Example 8-2, and their extremes
$z = y_n = x_{\max} \qquad w = y_1 = x_{\min}$
Show that
$f_{zw}(z, w) = n(n-1) f_x(z) f_x(w) [F_x(z) - F_x(w)]^{n-2}$ for z > w, and $f_{zw}(z, w) = 0$ for z < w
8-16. Given n independent $N(\eta_i, 1)$ RVs $z_i$, we form the RV $w = z_1^2 + \cdots + z_n^2$. This
RV is called noncentral chi-square with n degrees of freedom and eccentricity
$e = \eta_1^2 + \cdots + \eta_n^2$. Show that its moment generating function equals
$\Phi_w(s) = \frac{1}{\sqrt{(1 - 2s)^n}} \exp\left\{\frac{es}{1 - 2s}\right\}$
8-17. Show that if the RVs $x_i$ are i.i.d. and normal, then their sample mean $\bar{x}$ and sample
variance $s^2$ are two independent RVs.
8-18. Show that, if $a_0 + a_1 x_1 + a_2 x_2$ is the nonhomogeneous linear MS estimate of s in
terms of $x_1$ and $x_2$, then
$\hat{E}\{s - \eta_s | x_1 - \eta_1, x_2 - \eta_2\} = a_1(x_1 - \eta_1) + a_2(x_2 - \eta_2)$
8-19. Show that
$E\{y | x_1\} = E\{E\{y | x_1, x_2\} | x_1\}$
8-20. We place at random n points in the interval (0,1) and we denote by x and y the
distance from the origin to the first and last point respectively. Find F(x), F(y),
and F(x, y).
8-21. Show that if the RVs $x_i$ are i.i.d. with zero mean, variance $\sigma^2$, and sample variance
v (see Example 8-5), then
$\sigma_v^2 = \frac{1}{n} \left( E\{x_i^4\} - \frac{n-3}{n-1} \sigma^4 \right)$
8-22. The RVs $x_i$ are N(0; σ) and independent. Using Prob. 7-1, show that if
$z = \frac{\sqrt{\pi}}{2n} \sum_{i=1}^{n} |x_{2i} - x_{2i-1}|$ then $E\{z\} = \sigma \qquad \sigma_z^2 = \frac{\pi - 2}{2n} \sigma^2$
8-23. Show that if R is the correlation matrix of the random vector X: $[x_1, \ldots, x_n]$ and
$R^{-1}$ is its inverse, then
$E\{X R^{-1} X^t\} = n$
8-24. Show that if the RVs $x_i$ are of continuous type and independent, then, for
sufficiently large n, the density of $\sin(x_1 + \cdots + x_n)$ is nearly equal to the density
of sin x where x is an RV uniform in the interval (-π, π).
8-25. Show that if $a_n \to a$ and $E\{|x_n - a_n|^2\} \to 0$, then $x_n \to a$ in the MS sense as
$n \to \infty$.
8-26. Using the Cauchy criterion, show that a sequence $x_n$ tends to a limit in the MS
sense iff the limit of $E\{x_n x_m\}$ as $n, m \to \infty$ exists.
8-27. An infinite sum is by definition a limit:
$\sum_{k=1}^{\infty} x_k = \lim_{n \to \infty} y_n \qquad y_n = \sum_{k=1}^{n} x_k$
Show that if the RVs $x_k$ are independent with zero mean and variance $\sigma_k^2$, then
the sum exists in the MS sense iff
$\sum_{k=1}^{\infty} \sigma_k^2 < \infty$
Hint:
$E\{(y_{n+m} - y_n)^2\} = \sum_{k=n+1}^{n+m} \sigma_k^2$
8-28. The RVs $x_i$ are i.i.d. with density $c e^{-cx} U(x)$. Show that, if $x = x_1 + \cdots + x_n$, then
$f_x(x)$ is an Erlang density.
8-29. Using the central limit theorem, show that for large n:
$\frac{c^n}{(n-1)!} x^{n-1} e^{-cx} \approx \frac{c}{\sqrt{2\pi n}} e^{-(cx - n)^2/2n} \qquad x > 0$
8-30. The resistors $r_1, r_2, r_3, r_4$ are independent RVs and each is uniform in the interval
(450; 550). Using the central limit theorem, find $P\{1900 \le r_1 + r_2 + r_3 + r_4 \le 2100\}$.
8-31. Show that the central limit theorem does not hold if the RVs $x_i$ have a Cauchy
density.
8-32. The RVs x and y are uncorrelated with zero mean and $\sigma_x = \sigma_y = \sigma$. Show that if
z = x + jy, then
$f_z(z) = f(x, y) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2} = \frac{1}{\pi\sigma_z^2} e^{-|z|^2/\sigma_z^2}$
$\Phi_z(\Omega) = \exp\left\{-\frac{1}{2}(\sigma^2 u^2 + \sigma^2 v^2)\right\} = \exp\left\{-\frac{1}{4} \sigma_z^2 |\Omega|^2\right\}$
where Ω = u + jv. This is the scalar form of (8-62).
CHAPTER
9
STATISTICS
9-1 INTRODUCTION
Probability is a mathematical discipline developed as an abstract model and its
conclusions are deductions based on the axioms. Statistics deals with the
applications of the theory to real problems and its conclusions are inferences
based on observations. Statistics consists of two parts: analysis and design.
Analysis, or mathematical statistics, is part of probability involving mainly
repeated trials and events the probability of which is close to 0 or to 1. This
leads to inferences that can be accepted as near certainties (see page 12).
Design, or applied statistics, deals with data collection and construction of
experiments that can be adequately described by probabilistic models. In this
chapter, we introduce the basic elements of mathematical statistics.
We start with the observation that the connection between probabilistic
concepts and reality is based on the approximation
$p \approx \frac{n_{\mathcal{A}}}{n}$    (9-1)
relating the probability $p = P(\mathcal{A})$ of an event $\mathcal{A}$ to the number $n_{\mathcal{A}}$ of
successes of $\mathcal{A}$ in n trials of the underlying physical experiment. We used this
empirical formula to give the relative frequency interpretation of all probabilis-
tic concepts. For example, we showed that the mean η of an RV x can be
approximated by the average
$\bar{x} = \frac{1}{n} \sum x_i$    (9-2)
of the observed values $x_i$ of x, and its distribution F(x) by the empirical
distribution
$\hat{F}(x) = \frac{n_x}{n}$    (9-3)
where $n_x$ is the number of $x_i$’s that do not exceed x. These relationships are
empirical point estimates of the parameters η and F(x) and a major objective
of statistics is to give them an exact interpretation.
FIGURE 9-1 (a) Predict x; (b) Estimate θ
In a statistical investigation, we deal with two general classes of problems.
In the first class, we assume that the probabilistic model is known and we wish
to make predictions concerning future observations. For example, we know the
distribution F(x) of an RV x and we wish to predict the average $\bar{x}$ of its n
future samples or we know the probability p of an event $\mathcal{A}$ and we wish to
predict the number $n_{\mathcal{A}}$ of successes of $\mathcal{A}$ in n future trials. In both cases, we
proceed from the model to the observations (Fig. 9-1a). In the second class, one
or more parameters $\theta_i$ of the model are unknown and our objective is either to
estimate their values (parameter estimation) or to decide whether $\theta_i$ is a set of
known constants $\theta_{0i}$ (hypothesis testing). For example, we observe the values $x_i$
of an RV x and we wish to estimate its mean η or to decide whether to accept
the hypothesis that η = 5.3. We toss a coin 1000 times and heads shows 465
times. Using this information, we wish to estimate the probability p of heads or
to decide whether the coin is fair. In both cases, we proceed from the
observations to the model (Fig. 9-1b). In this chapter, we concentrate on
parameter estimation and hypothesis testing. As a preparation, we comment
briefly on the prediction problem.
Prediction. We are given an RV x with known distribution and we wish to
predict its value x at a future trial. A point prediction of x is the determination
of a constant c chosen so as to minimize in some sense the error x - c. At a
specific trial, the RV x can take one of many values. Hence the value that it
actually takes cannot be predicted; it can only be estimated. Thus prediction of
an RV x is the estimation of its next value x by a constant c. If we use as the
criterion for selecting c the minimization of the MS error E{(x - c)2}, then
c = E{x}. This problem was considered in Sec. 7-3 and Sec. 8-3.
An interval prediction of x is the determination of two constants $c_1$ and $c_2$
such that
$P\{c_1 < \mathbf{x} < c_2\} = \gamma = 1 - \delta$    (9-4)
FIGURE 9-2
where γ is a given constant called the confidence coefficient. The above equation
states that if we predict that the value x of x at the next trial will be in the
interval $(c_1, c_2)$, our prediction will be correct in 100γ% of the cases. The
problem in interval prediction is to find $c_1$ and $c_2$ so as to minimize the
difference $c_2 - c_1$ subject to the constraint (9-4). The selection of γ is dictated
by two conflicting requirements. If γ is close to 1, the prediction that x will be
in the interval $(c_1, c_2)$ is reliable but the difference $c_2 - c_1$ is large; if γ is
reduced, $c_2 - c_1$ is reduced but the estimate is less reliable. Typical values of γ
are 0.9, 0.95, and 0.99. For optimum prediction, we assign a value to γ and we
determine $c_1$ and $c_2$ so as to minimize the difference $c_2 - c_1$ subject to the
constraint (9-4). We can show that (see Prob. 9-6) if the density f(x) of x has a
single maximum, $c_2 - c_1$ is minimum if $f(c_1) = f(c_2)$. This yields $c_1$ and $c_2$ by
trial and error. A simpler suboptimal solution is easily found if we determine $c_1$
and $c_2$ such that
$P\{\mathbf{x} < c_1\} = \frac{\delta}{2} \qquad P\{\mathbf{x} > c_2\} = \frac{\delta}{2}$    (9-5)
This yields $c_1 = x_{\delta/2}$ and $c_2 = x_{1-\delta/2}$ where $x_u$ is the u percentile of x (Fig.
9-2a). This solution is optimum if f(x) is symmetrical about its mean η because
then $f(c_1) = f(c_2)$. If x is also normal, then $x_u = \eta + z_u \sigma$ where $z_u$ is the
standard normal percentile (Fig. 9-2b).
Example 9-1. The life expectancy of batteries of a certain brand is modeled by a
normal RV with $\eta = 4$ years and $\sigma = 6$ months. Our car has such a battery. Find
the prediction interval of its life expectancy with $\gamma = 0.95$.
In this example, $\delta = 0.05$ and $z_{1-\delta/2} = z_{0.975} \approx 2 = -z_{\delta/2}$. This yields the
interval $4 \pm 2 \times 0.5$. We can thus expect with confidence coefficient 0.95 that the
life expectancy of our battery will be between 3 and 5 years.
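As a numerical check of Example 9-1, the symmetric prediction interval of (9-5) can be sketched in Python. The function name `prediction_interval` is ours and purely illustrative; the sketch assumes a normal RV and uses the exact percentile $z_{0.975} \approx 1.96$ rather than the rounded value 2 used in the text.

```python
from statistics import NormalDist

def prediction_interval(eta, sigma, gamma):
    """Symmetric gamma prediction interval (c1, c2) for a normal RV N(eta, sigma).

    c1 and c2 are the delta/2 and 1 - delta/2 percentiles, as in (9-5).
    """
    delta = 1.0 - gamma
    dist = NormalDist(mu=eta, sigma=sigma)
    c1 = dist.inv_cdf(delta / 2)        # x_{delta/2}
    c2 = dist.inv_cdf(1 - delta / 2)    # x_{1 - delta/2}
    return c1, c2

# Battery life: eta = 4 years, sigma = 0.5 years, gamma = 0.95
c1, c2 = prediction_interval(eta=4.0, sigma=0.5, gamma=0.95)
print(f"{c1:.2f} to {c2:.2f} years")
```

With the unrounded percentile the interval is slightly narrower than the interval from 3 to 5 years quoted in the example.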
As a second application, we shall estimate the number $n_{\mathcal{A}}$ of successes of
an event $\mathcal{A}$ in $n$ trials. The point estimate of $n_{\mathcal{A}}$ is the product $np$. The
interval estimate $(k_1, k_2)$ is determined so as to minimize the difference $k_2 - k_1$
subject to the constraint
$$P\{k_1 < n_{\mathcal{A}} < k_2\} = \gamma$$
We shall assume that $n$ is large and $\gamma = 0.997$. To find the constants $k_1$ and $k_2$,
we set $k = n_{\mathcal{A}}$ and $\varepsilon = 3\sqrt{pq/n}$ into (3-37). This yields
$$P\{np - 3\sqrt{npq} < n_{\mathcal{A}} < np + 3\sqrt{npq}\} = 0.997 \tag{9-6}$$
because $2G(3) - 1 \approx 0.997$. Hence we predict with confidence coefficient 0.997
that $n_{\mathcal{A}}$ will be in the interval $np \pm 3\sqrt{npq}$.
Example 9-2. We toss a fair coin 100 times and we wish to estimate the number
$n_{\mathcal{A}}$ of heads with $\gamma = 0.997$. In this problem $n = 100$ and $p = 0.5$. Hence
$$k_1 = np - 3\sqrt{npq} = 35 \qquad k_2 = np + 3\sqrt{npq} = 65$$
We predict, therefore, with confidence coefficient 0.997 that the number of heads
will be between 35 and 65.
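The interval of Example 9-2 can be reproduced, and the normal approximation behind (9-6) compared with the exact binomial probability, using the following Python sketch. The helper names are ours and only illustrative; the exact sum is an added check, not part of the derivation above.

```python
import math

def success_count_interval(n, p, z=3.0):
    """Interval np ± z*sqrt(npq) for the number of successes, as in (9-6)."""
    half = z * math.sqrt(n * p * (1 - p))
    return n * p - half, n * p + half

def binomial_prob_between(n, p, k1, k2):
    """Exact P{k1 < successes < k2} (strict inequalities) from the binomial law."""
    lo = math.floor(k1) + 1
    hi = math.ceil(k2) - 1
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

k1, k2 = success_count_interval(100, 0.5)          # fair coin, 100 tosses
prob = binomial_prob_between(100, 0.5, k1, k2)     # exact probability of {35 < n_A < 65}
print(k1, k2, round(prob, 4))
```

The exact probability is close to, but not identical with, the value 0.997 obtained from $2G(3) - 1$, since the latter is a large-$n$ approximation.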
The above example illustrates the role of statistics in the applications of
probability to real problems: The event $\mathcal{A} = \{\text{heads}\}$ is defined in the
experiment $\mathcal{S}$ of the single toss of a coin. The given information that $P(\mathcal{A}) = 0.5$
cannot be used to make a reliable prediction about the occurrence of $\mathcal{A}$ at a
single performance of $\mathcal{S}$. The event
$$\mathcal{B} = \{35 < n_{\mathcal{A}} < 65\}$$
is defined in the experiment $\mathcal{S}_n$ of repeated trials and its probability equals
$P(\mathcal{B}) = 0.997$. Since $P(\mathcal{B}) \approx 1$ we can claim with near certainty that $\mathcal{B}$ will
occur at a single performance of the experiment $\mathcal{S}_n$. We have thus changed the
“subjective” knowledge about $\mathcal{A}$ based on the given information that $P(\mathcal{A}) =
0.5$ to the “objective” conclusion that $\mathcal{B}$ will almost certainly occur, based on
the derived probability that $P(\mathcal{B}) \approx 1$. Note, however, that both conclusions
are inductive inferences; the difference between them is only quantitative.
9-2 PARAMETER ESTIMATION
Suppose that the distribution of an RV $\mathbf{x}$ is a function $F(x, \theta)$ of known form
depending on a parameter $\theta$, scalar or vector. We wish to estimate $\theta$. To do so,
we repeat the underlying physical experiment $n$ times and we denote by $x_i$ the
observed values of $\mathbf{x}$. Using these observations, we shall find a point estimate
and an interval estimate of $\theta$.
A point estimate is a function $\hat{\theta} = g(X)$ of the observation vector $X =
[x_1, \ldots, x_n]$. The corresponding RV $\hat{\boldsymbol{\theta}} = g(\mathbf{X})$ is the point estimator of $\theta$. Any
function of the sample vector $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_n]$ is called a statistic.† Thus a point
estimator is a statistic.
We shall say that $\hat{\boldsymbol{\theta}}$ is an unbiased estimator of the parameter $\theta$ if
$E\{\hat{\boldsymbol{\theta}}\} = \theta$. Otherwise, it is called biased with bias $b = E\{\hat{\boldsymbol{\theta}}\} - \theta$. If the function
$g(X)$ is properly selected, the estimation error $\hat{\boldsymbol{\theta}} - \theta$ decreases as $n$ increases.
If it tends to 0 in probability as $n \to \infty$, then $\hat{\boldsymbol{\theta}}$ is called a consistent estimator.
The sample mean $\bar{\mathbf{x}}$ of $\mathbf{x}$ is an unbiased estimator of its mean $\eta$. Furthermore,
its variance $\sigma^2/n$ tends to 0 as $n \to \infty$. From this it follows that $\bar{\mathbf{x}}$ tends to $\eta$ in
the MS sense, therefore, also in probability. In other words, $\bar{\mathbf{x}}$ is a consistent
estimator of $\eta$. Consistency is a desirable property; however, it is a theoretical
concept. In reality, the number $n$ of trials might be large but it is finite. The
objective of estimation is thus the selection of a function $g(X)$ minimizing in
some sense the estimation error $g(\mathbf{X}) - \theta$. If $g(X)$ is chosen so as to minimize
the MS error
$$e = E\{[g(\mathbf{X}) - \theta]^2\} = \int [g(X) - \theta]^2 f(X, \theta)\, dX \tag{9-7}$$
then the estimator $\hat{\boldsymbol{\theta}} = g(\mathbf{X})$ is called the best estimator. The determination of
best estimators is not, in general, simple because the integrand in (9-7) depends
not only on the function $g(X)$ but also on the unknown parameter $\theta$. The
corresponding prediction problem involves the same integral but it has a simple
solution because in this case, $\theta$ is known (see Sec. 8-3).
In the following, we shall select the function $g(X)$ empirically. In this
choice we are guided by the following: Suppose that $\theta$ is the mean $\theta = E\{q(\mathbf{x})\}$
of some function $q(\mathbf{x})$ of $\mathbf{x}$. As we have noted, the sample mean
$$\hat{\boldsymbol{\theta}} = \frac{1}{n}\sum_i q(\mathbf{x}_i) \tag{9-8}$$
of $q(\mathbf{x})$ is a consistent estimator of $\theta$. If, therefore, we use the sample mean $\hat{\boldsymbol{\theta}}$ of
$q(\mathbf{x})$ as the point estimator of $\theta$, our estimate will be satisfactory at least for
large $n$. In fact, it turns out that in a number of cases it is the best estimate.
INTERVAL ESTIMATES. We measure the length $\theta$ of an object and the results
are the samples $x_i = \theta + \nu_i$ of the RV $\mathbf{x} = \theta + \boldsymbol{\nu}$ where $\boldsymbol{\nu}$ is the measurement
error. Can we draw with near certainty a conclusion about the true value of $\theta$?
We cannot do so if we claim that $\theta$ equals its point estimate $\hat{\theta}$ or any other
constant. We can, however, conclude with near certainty that $\theta$ equals $\hat{\theta}$ within
specified tolerance limits. This leads to the following concept.
†This interpretation of the term statistic applies only for Chap. 9. In all other chapters, statistics
means statistical properties.
An interval estimate of a parameter $\theta$ is an interval $(\theta_1, \theta_2)$, the endpoints
of which are functions $\theta_1 = g_1(X)$ and $\theta_2 = g_2(X)$ of the observation vector $X$.
The corresponding random interval $(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ is the interval estimator of $\theta$. We
shall say that $(\theta_1, \theta_2)$ is a $\gamma$ confidence interval of $\theta$ if
$$P\{\boldsymbol{\theta}_1 < \theta < \boldsymbol{\theta}_2\} = \gamma \tag{9-9}$$
The constant $\gamma$ is the confidence coefficient of the estimate and the difference
$\delta = 1 - \gamma$ is the confidence level. Thus $\gamma$ is a subjective measure of our
confidence that the unknown $\theta$ is in the interval $(\theta_1, \theta_2)$. If $\gamma$ is close to 1 we
can expect with near certainty that this is true. Our estimate is correct in $100\gamma$
percent of the cases. The objective of interval estimation is the determination of
the functions $g_1(X)$ and $g_2(X)$ so as to minimize the length $\theta_2 - \theta_1$ of the
interval $(\theta_1, \theta_2)$ subject to the constraint (9-9). If $\hat{\boldsymbol{\theta}}$ is an unbiased estimator of
the mean $\eta$ of $\mathbf{x}$ and the density of $\mathbf{x}$ is symmetrical about $\eta$, then the optimum
interval is of the form $\eta \pm a$ as in (9-10). In this section, we develop estimates
of the commonly used parameters. In the selection of $\hat{\boldsymbol{\theta}}$ we are guided by (9-8)
and in all cases we assume that $n$ is large. This assumption is necessary for good
estimates and, as we shall see, it simplifies the analysis.
Mean
We wish to estimate the mean $\eta$ of an RV $\mathbf{x}$. We use as the point estimate of $\eta$
the value
$$\bar{x} = \frac{1}{n}\sum_i x_i$$
of the sample mean $\bar{\mathbf{x}}$ of $\mathbf{x}$. To find an interval estimate, we must determine the
distribution of $\bar{\mathbf{x}}$. In general, this is a difficult problem involving multiple
convolutions. To simplify it we shall assume that $\bar{\mathbf{x}}$ is normal. This is true if $\mathbf{x}$ is
normal and it is approximately true for any $\mathbf{x}$ if $n$ is large (CLT).
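Known variance. Suppose first that the variance $\sigma^2$ of $\mathbf{x}$ is known. The
normality assumption leads to the conclusion that the point estimator $\bar{\mathbf{x}}$ of $\eta$ is
$N(\eta, \sigma/\sqrt{n})$. Denoting by $z_u$ the $u$ percentile of the standard normal density,
we conclude that
$$P\left\{\eta - z_{1-\delta/2}\frac{\sigma}{\sqrt{n}} < \bar{\mathbf{x}} < \eta + z_{1-\delta/2}\frac{\sigma}{\sqrt{n}}\right\} = G(z_{1-\delta/2}) - G(-z_{1-\delta/2}) = 1 - \frac{\delta}{2} - \frac{\delta}{2} \tag{9-10}$$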
TABLE 9-1
Standard normal percentiles $z_u$
$u$	0.90	0.925	0.95	0.975	0.99	0.995	0.999	0.9995
$z_u$	1.282	1.440	1.645	1.960	2.326	2.576	3.090	3.291
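because $z_u = -z_{1-u}$ and $G(-z_{1-u}) = G(z_u) = u$. This yields
$$P\left\{\bar{\mathbf{x}} - z_{1-\delta/2}\frac{\sigma}{\sqrt{n}} < \eta < \bar{\mathbf{x}} + z_{1-\delta/2}\frac{\sigma}{\sqrt{n}}\right\} = 1 - \delta = \gamma \tag{9-11}$$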
We can thus state with confidence coefficient $\gamma$ that $\eta$ is in the interval
$\bar{x} \pm z_{1-\delta/2}\,\sigma/\sqrt{n}$. The determination of a confidence interval for $\eta$ thus
proceeds as follows:
Observe the samples $x_i$ of $\mathbf{x}$ and form their average $\bar{x}$. Select a number
$\gamma = 1 - \delta$ and find the standard normal percentile $z_u$ for $u = 1 - \delta/2$. Form
the interval $\bar{x} \pm z_u\sigma/\sqrt{n}$.
This also holds for discrete-type RVs provided that $n$ is large [see (8-110)].
The choice of the confidence coefficient $\gamma$ is dictated by two conflicting
requirements: If $\gamma$ is close to 1, the estimate is reliable but the size $2z_u\sigma/\sqrt{n}$
of the confidence interval is large; if $\gamma$ is reduced, $z_u$ is reduced but the
estimate is less reliable. The final choice is a compromise based on the
applications. In Table 9-1 we list $z_u$ for the commonly used values of $u$. The
listed values are determined from Table 3-1 by interpolation.
Tchebycheff inequality. Suppose now that the distribution of $\mathbf{x}$ is not known. To
find the confidence interval of $\eta$, we shall use (5-57): We replace $\mathbf{x}$ by $\bar{\mathbf{x}}$ and $\sigma$
by $\sigma/\sqrt{n}$, and we set $\varepsilon = \sigma/\sqrt{n\delta}$. This yields
$$P\left\{\bar{\mathbf{x}} - \frac{\sigma}{\sqrt{n\delta}} < \eta < \bar{\mathbf{x}} + \frac{\sigma}{\sqrt{n\delta}}\right\} > 1 - \delta = \gamma \tag{9-12}$$
The above shows that the exact $\gamma$ confidence interval of $\eta$ is contained in
the interval $\bar{x} \pm \sigma/\sqrt{n\delta}$. If, therefore, we claim that $\eta$ is in this interval, the
probability that we are correct is larger than $\gamma$. This result holds regardless of
the form of $F(x)$ and, surprisingly, it is not very different from the estimate
(9-11). Indeed, suppose that $\gamma = 0.95$; in this case, $1/\sqrt{\delta} = 4.47$. Inserting into
(9-12), we obtain the interval $\bar{x} \pm 4.47\sigma/\sqrt{n}$. The corresponding interval (9-11),
obtained under the normality assumption, is $\bar{x} \pm 2\sigma/\sqrt{n}$ because $z_{0.975} \approx 2$.
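The comparison between the distribution-free interval (9-12) and the normal interval (9-11) reduces to comparing two multipliers of $\sigma/\sqrt{n}$. A minimal Python sketch follows; the function name `halfwidth_factors` is ours and illustrative only.

```python
import math
from statistics import NormalDist

def halfwidth_factors(gamma):
    """Return (z, t): multipliers of sigma/sqrt(n) in (9-11) and (9-12).

    z = z_{1-delta/2}   (normality assumed)
    t = 1/sqrt(delta)   (Tchebycheff bound, any distribution)
    """
    delta = 1.0 - gamma
    z = NormalDist().inv_cdf(1 - delta / 2)
    t = 1.0 / math.sqrt(delta)
    return z, t

for gamma in (0.90, 0.95, 0.99):
    z, t = halfwidth_factors(gamma)
    print(f"gamma={gamma}: normal {z:.2f}, Tchebycheff {t:.2f}, ratio {t / z:.2f}")
```

The ratio of the two factors grows as $\gamma$ approaches 1, so the price of dropping the normality assumption depends on the confidence coefficient chosen.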
Unknown variance. If $\sigma$ is unknown, we cannot use (9-11). To estimate $\eta$, we
form the sample variance
$$s^2 = \frac{1}{n-1}\sum_i (x_i - \bar{x})^2 \tag{9-13}$$
This is an unbiased estimate of $\sigma^2$ [see (8-23)] and it tends to $\sigma^2$ as $n \to \infty$.
Hence, for large $n$, we can use the approximation $s \approx \sigma$ in (9-11). This yields
the approximate confidence interval
$$\bar{x} - z_{1-\delta/2}\frac{s}{\sqrt{n}} < \eta < \bar{x} + z_{1-\delta/2}\frac{s}{\sqrt{n}} \tag{9-14}$$
We shall find an exact confidence interval under the assumption that $\mathbf{x}$ is
normal. In this case [see (8-65)] the ratio
$$\frac{\bar{\mathbf{x}} - \eta}{\mathbf{s}/\sqrt{n}} \tag{9-15}$$
has a Student-$t$ distribution with $n - 1$ degrees of freedom. Denoting by $t_u$ its
$u$ percentiles, we conclude that
$$P\left\{-t_u < \frac{\bar{\mathbf{x}} - \eta}{\mathbf{s}/\sqrt{n}} < t_u\right\} = 2u - 1 = \gamma \tag{9-16}$$
This yields the interval
$$\bar{x} - t_{1-\delta/2}\frac{s}{\sqrt{n}} < \eta < \bar{x} + t_{1-\delta/2}\frac{s}{\sqrt{n}} \tag{9-17}$$
In Table 9-2 we list $t_u(n)$ for $n$ from 1 to 20. For $n > 20$, the $t(n)$
distribution is nearly normal with zero mean and variance $n/(n - 2)$ (see Prob.
7-12).
Example 9-3. The voltage $V$ of a voltage source is measured 25 times. The results
of the measurement are the samples $x_i = V + \nu_i$ of the RV $\mathbf{x} = V + \boldsymbol{\nu}$ and their
average equals $\bar{x} = 112$ V. Find the 0.95 confidence interval of $V$.
(a) Suppose that the standard deviation of $\mathbf{x}$ due to the error $\boldsymbol{\nu}$ is $\sigma = 0.4$ V.
With $\delta = 0.05$, Table 9-1 yields $z_{0.975} \approx 2$. Inserting into (9-11), we obtain the
interval
$$\bar{x} \pm z_{0.975}\,\sigma/\sqrt{n} = 112 \pm 2 \times 0.4/\sqrt{25} = 112 \pm 0.16 \text{ V}$$
(b) Suppose now that $\sigma$ is unknown. To estimate it, we compute the sample
variance and we find $s^2 = 0.36$. Inserting into (9-14), we obtain the approximate
estimate
$$\bar{x} \pm z_{0.975}\,s/\sqrt{n} = 112 \pm 2 \times 0.6/\sqrt{25} = 112 \pm 0.24 \text{ V}$$
Since $t_{0.975}(25) = 2.06$, the exact estimate (9-17) yields $112 \pm 0.247$ V.
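The three intervals of Example 9-3 can be recomputed with the short Python sketch below. The function names are ours and hypothetical. Since the standard library has no Student-$t$ percentile function, the value 2.06 is taken from Table 9-2 and passed in by hand; part (a) uses the unrounded $z_{0.975}$, so its half-width comes out slightly below the 0.16 V of the text.

```python
import math
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, gamma):
    """Interval (9-11): xbar ± z_{1-delta/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - gamma) / 2)
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

def mean_ci_with_percentile(xbar, s, n, percentile):
    """Intervals (9-14) or (9-17): xbar ± percentile * s / sqrt(n).

    The caller supplies z_{1-delta/2} or t_{1-delta/2} from a table.
    """
    half = percentile * s / math.sqrt(n)
    return xbar - half, xbar + half

# (a) sigma = 0.4 V known
lo_a, hi_a = mean_ci_known_sigma(112.0, 0.4, 25, 0.95)
# (b) s^2 = 0.36, Student-t percentile 2.06 read from Table 9-2
lo_b, hi_b = mean_ci_with_percentile(112.0, math.sqrt(0.36), 25, 2.06)
print(f"(a) 112 ± {(hi_a - lo_a) / 2:.3f} V")
print(f"(b) 112 ± {(hi_b - lo_b) / 2:.3f} V")
```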
In the following three estimates the distribution of $\mathbf{x}$ is specified in terms
of a single parameter. We cannot, therefore, use (9-11) directly because the
constants $\eta$ and $\sigma$ are related.
†In most examples of this chapter, we shall not list all experimental data. To avoid lengthy tables,
we shall list only the relevant averages.
TABLE 9-2
Student-$t$ percentiles $t_u(n)$
n \ u	.9	.95	.975	.99	.995
1	3.08	6.31	12.7	31.8	63.7
2	1.89	2.92	4.30	6.97	9.93
3	1.64	2.35	3.18	4.54	5.84
4	1.53	2.13	2.78	3.75	4.60
5	1.48	2.02	2.57	3.37	4.03
6	1.44	1.94	2.45	3.14	3.71
7	1.42	1.90	2.37	3.00	3.50
8	1.40	1.86	2.31	2.90	3.36
9	1.38	1.83	2.26	2.82	3.25
10	1.37	1.81	2.23	2.76	3.17
11	1.36	1.80	2.20	2.72	3.11
12	1.36	1.78	2.18	2.68	3.06
13	1.35	1.77	2.16	2.65	3.01
14	1.35	1.76	2.15	2.62	2.98
15	1.34	1.75	2.13	2.60	2.95
16	1.34	1.75	2.12	2.58	2.92
17	1.33	1.74	2.11	2.57	2.90
18	1.33	1.73	2.10	2.55	2.88
19	1.33	1.73	2.09	2.54	2.86
20	1.33	1.73	2.09	2.53	2.85
22	1.32	1.72	2.07	2.51	2.82
24	1.32	1.71	2.06	2.49	2.80
26	1.32	1.71	2.06	2.48	2.78
28	1.31	1.70	2.05	2.47	2.76
30	1.31	1.70	2.05	2.46	2.75
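Exponential distribution. We are given an RV $\mathbf{x}$ with density
$$f(x, \lambda) = \frac{1}{\lambda} e^{-x/\lambda} U(x)$$
and we wish to find the $\gamma$ confidence interval of the parameter $\lambda$. As we know,
$\eta = \lambda$ and $\sigma = \lambda$; hence, for large $n$, the sample mean $\bar{\mathbf{x}}$ of $\mathbf{x}$ is $N(\lambda, \lambda/\sqrt{n})$.
Inserting into (9-11), we obtain
$$P\left\{\lambda - z_u\frac{\lambda}{\sqrt{n}} < \bar{\mathbf{x}} < \lambda + z_u\frac{\lambda}{\sqrt{n}}\right\} = \gamma = 2u - 1$$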
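This yields
$$P\left\{\frac{\bar{\mathbf{x}}}{1 + z_u/\sqrt{n}} < \lambda < \frac{\bar{\mathbf{x}}}{1 - z_u/\sqrt{n}}\right\} = \gamma \tag{9-18}$$
and the interval $\bar{x}/(1 \pm z_u/\sqrt{n})$ results.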
Example 9-4. The time to failure of a light bulb is an RV $\mathbf{x}$ with exponential
distribution. We wish to find the 0.95 confidence interval of $\lambda$. To do so, we
observe the time to failure of 64 bulbs and we find that their average $\bar{x}$ equals 210
hours. Setting $z_u/\sqrt{n} \approx 2/\sqrt{64} = 0.25$ into (9-18), we obtain the interval
$$168 < \lambda < 280$$
We thus expect with confidence coefficient 0.95 that the mean time to failure
$E\{\mathbf{x}\} = \lambda$ of the bulb is between 168 and 280 hours.
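The interval (9-18) of Example 9-4 can be checked with a few lines of Python. The function name `exponential_mean_ci` is ours and illustrative; the sketch assumes $n$ large enough that $z_u/\sqrt{n} < 1$, otherwise the upper endpoint is undefined.

```python
import math

def exponential_mean_ci(xbar, n, z):
    """Large-n interval xbar / (1 ± z/sqrt(n)) for the exponential parameter, as in (9-18)."""
    r = z / math.sqrt(n)
    if r >= 1:
        raise ValueError("n is too small for this approximation (z/sqrt(n) >= 1)")
    return xbar / (1 + r), xbar / (1 - r)

# 64 bulbs, average time to failure 210 hours, z_u rounded to 2 as in the example
lam_lo, lam_hi = exponential_mean_ci(210.0, 64, 2.0)
print(lam_lo, lam_hi)
```

Note that the interval is not symmetric about $\bar{x}$: the upper endpoint lies farther from 210 than the lower one.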
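Poisson distribution. Suppose that the RV $\mathbf{x}$ is Poisson distributed with
parameter $\lambda$:
$$P\{\mathbf{x} = k\} = e^{-\lambda}\frac{\lambda^k}{k!} \qquad k = 0, 1, \ldots$$
In this case, $\eta = \lambda$ and $\sigma^2 = \lambda$; hence, for large $n$, the distribution of $\bar{\mathbf{x}}$ is
approximately $N(\lambda, \sqrt{\lambda/n})$ [see (8-110)]. This yields
$$P\left\{|\bar{\mathbf{x}} - \lambda| < z_u\sqrt{\frac{\lambda}{n}}\right\} = \gamma$$
The points of the $\bar{x}\lambda$ plane that satisfy the inequality $|\bar{x} - \lambda| < z_u\sqrt{\lambda/n}$ are in
the interior of the parabola
$$(\lambda - \bar{x})^2 = \frac{z_u^2}{n}\lambda \tag{9-19}$$
From this it follows that the $\gamma$ confidence interval of $\lambda$ is the vertical segment
$(\lambda_1, \lambda_2)$ of Fig. 9-3 where $\lambda_1$ and $\lambda_2$ are the roots of the quadratic (9-19).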
FIGURE 9-3
Example 9-5. The number of particles emitted from a radioactive substance per
second is a Poisson RV $\mathbf{x}$ with parameter $\lambda$. We observe the emitted particles $x_i$ in
64 consecutive seconds and we find that $\bar{x} = 6$. Find the 0.95 confidence interval of
$\lambda$. With $z_u^2/n = 0.0625$, (9-19) yields the quadratic
$$(\lambda - 6)^2 = 0.0625\lambda$$
Solving, we obtain $\lambda_1 = 5.42$, $\lambda_2 = 6.64$. We can thus claim with confidence
coefficient 0.95 that $5.42 < \lambda < 6.64$.
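The roots of the quadratic (9-19) follow from the standard quadratic formula after expanding it to $\lambda^2 - (2\bar{x} + z_u^2/n)\lambda + \bar{x}^2 = 0$. A small Python sketch, with the hypothetical helper name `poisson_ci`, reproduces the numbers of Example 9-5.

```python
import math

def poisson_ci(xbar, n, z):
    """Roots of (lam - xbar)^2 = (z^2/n) * lam, i.e. the interval of (9-19)."""
    a = z * z / n
    b = 2 * xbar + a                       # lam^2 - b*lam + xbar^2 = 0
    root = math.sqrt(b * b - 4 * xbar * xbar)
    return (b - root) / 2, (b + root) / 2

# 64 one-second counts with average 6, z_u rounded to 2
lam1, lam2 = poisson_ci(6.0, 64, 2.0)
print(f"{lam1:.2f} < lambda < {lam2:.2f}")
```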
Probability. We wish to estimate the probability $p = P(\mathcal{A})$ of an event $\mathcal{A}$. To
do so, we form the zero-one RV $\mathbf{x}$ associated with this event. As we know,
$E\{\mathbf{x}\} = p$ and $\sigma_x^2 = pq$. Thus the estimation of $p$ is equivalent to the estimation
of the mean of the RV $\mathbf{x}$.
We repeat the experiment $n$ times and we denote by $k$ the number of
successes of $\mathcal{A}$. The ratio $\bar{x} = k/n$ is the point estimate of $p$. To find its
interval estimate, we form the sample mean $\bar{\mathbf{x}}$ of $\mathbf{x}$. For large $n$, the distribution
of $\bar{\mathbf{x}}$ is approximately $N(p, \sqrt{pq/n})$. Hence
$$P\left\{|\bar{\mathbf{x}} - p| < z_u\sqrt{\frac{pq}{n}}\right\} = \gamma = 2u - 1$$
The points of the $\bar{x}p$ plane that satisfy the inequality $|\bar{x} - p| < z_u\sqrt{pq/n}$ are in
the interior of the ellipse
$$(p - \bar{x})^2 = z_u^2\,\frac{p(1 - p)}{n} \qquad \bar{x} = \frac{k}{n} \tag{9-20}$$
From this it follows that the $\gamma$ confidence interval of $p$ is the vertical segment
$(p_1, p_2)$ of Fig. 9-4. The endpoints $p_1$ and $p_2$ of this segment are the roots of
(9-20). For $n > 100$ the following approximation can be used:
$$p_{1,2} \approx \bar{x} \pm z_u\sqrt{\frac{\bar{x}(1 - \bar{x})}{n}} \qquad p_1 < p < p_2 \tag{9-21}$$
This follows from (9-20) if we replace on the right side the unknown $p$ by its
point estimate $\bar{x}$.
FIGURE 9-4
Example 9-6. In a preelection poll, 500 persons were questioned and 240 responded
Republican. Find the 0.95 confidence interval of the probability $p = P\{\text{Republican}\}$.
In this example, $z_u \approx 2$, $n = 500$, $\bar{x} = 240/500 = 0.48$, and (9-21) yields the
interval $0.48 \pm 0.045$.
In the usual reporting of the results, the following wording is used: We
estimate that 48 percent of the voters are Republican. The margin of error is ±4.5
percent. This only specifies the point estimate and the confidence interval of the
poll. The confidence coefficient (0.95 in this case) is rarely mentioned.
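To see how close the approximation (9-21) is to the roots of the ellipse (9-20) for the poll of Example 9-6, both can be evaluated in Python. The function names are ours and illustrative; the quadratic is obtained by expanding (9-20) to $(1 + a)p^2 - (2\bar{x} + a)p + \bar{x}^2 = 0$ with $a = z_u^2/n$.

```python
import math

def proportion_ci_approx(k, n, z):
    """Approximation (9-21): xbar ± z * sqrt(xbar * (1 - xbar) / n)."""
    xbar = k / n
    half = z * math.sqrt(xbar * (1 - xbar) / n)
    return xbar - half, xbar + half

def proportion_ci_quadratic(k, n, z):
    """Roots p1, p2 of (9-20): (p - xbar)^2 = z^2 * p * (1 - p) / n."""
    xbar = k / n
    a = z * z / n
    A, B, C = 1 + a, -(2 * xbar + a), xbar * xbar
    root = math.sqrt(B * B - 4 * A * C)
    return (-B - root) / (2 * A), (-B + root) / (2 * A)

approx = proportion_ci_approx(240, 500, 2.0)
exact = proportion_ci_quadratic(240, 500, 2.0)
print("approx (9-21):", [round(v, 4) for v in approx])
print("roots  (9-20):", [round(v, 4) for v in exact])
```

For a sample of this size the two intervals agree to about three decimal places, which is consistent with the remark that (9-21) can be used for $n > 100$.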
Variance
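We wish to estimate the variance $v = \sigma^2$ of a normal RV $\mathbf{x}$ in terms of the $n$
samples $x_i$ of $\mathbf{x}$.
Known mean. We assume first that the mean $\eta$ of $\mathbf{x}$ is known and we use as the
point estimator of $v$ the average
$$\hat{\mathbf{v}} = \frac{1}{n}\sum_{i=1}^{n} (\mathbf{x}_i - \eta)^2 \tag{9-22}$$
As we know,
$$E\{\hat{\mathbf{v}}\} = v \qquad \sigma_{\hat{v}}^2 = \frac{2\sigma^4}{n} \xrightarrow[n \to \infty]{} 0$$
Thus $\hat{\mathbf{v}}$ is a consistent estimator of $\sigma^2$. We shall find an interval estimate. The
RV $n\hat{\mathbf{v}}/\sigma^2$ has a $\chi^2(n)$ density (see page 200). This density is not symmetrical;
hence the interval estimate of $\sigma^2$ is not centered at $\sigma^2$. To determine it, we
introduce two constants $c_1$ and $c_2$ such that (Fig. 9-5a)
$$P\left\{\frac{n\hat{\mathbf{v}}}{\sigma^2} < c_1\right\} = \frac{\delta}{2} \qquad P\left\{\frac{n\hat{\mathbf{v}}}{\sigma^2} > c_2\right\} = \frac{\delta}{2}$$
This yields $c_1 = \chi^2_{\delta/2}(n)$, $c_2 = \chi^2_{1-\delta/2}(n)$, and the interval
$$\frac{n\hat{v}}{\chi^2_{1-\delta/2}(n)} < \sigma^2 < \frac{n\hat{v}}{\chi^2_{\delta/2}(n)} \tag{9-23}$$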
FIGURE 9-5
TABLE 9-3
Chi-square percentiles $\chi^2_u(n)$
n \ u	.005	.01	.025	.05	.1	.9	.95	.975	.99	.995
1	0.00	0.00	0.00	0.00	0.02	2.71	3.84	5.02	6.63	7.88
2	0.01	0.02	0.05	0.10	0.21	4.61	5.99	7.38	9.21	10.60
3	0.07	0.11	0.22	0.35	0.58	6.25	7.81	9.35	11.34	12.84
4	0.21	0.30	0.48	0.71	1.06	7.78	9.49	11.14	13.28	14.86
5	0.41	0.55	0.83	1.15	1.61	9.24	11.07	12.83	15.09	16.75
6	0.68	0.87	1.24	1.64	2.20	10.64	12.59	14.45	16.81	18.55
7	0.99	1.24	1.69	2.17	2.83	12.02	14.07	16.01	18.48	20.28
8	1.34	1.65	2.18	2.73	3.49	13.36	15.51	17.53	20.09	21.96
9	1.73	2.09	2.70	3.33	4.17	14.68	16.92	19.02	21.67	23.59
10	2.16	2.56	3.25	3.94	4.87	15.99	18.31	20.48	23.21	25.19
11	2.60	3.05	3.82	4.57	5.58	17.28	19.68	21.92	24.73	26.76
12	3.07	3.57	4.40	5.23	6.30	18.55	21.03	23.34	26.22	28.30
13	3.57	4.11	5.01	5.89	7.04	19.81	22.36	24.74	27.69	29.82
14	4.07	4.66	5.63	6.57	7.79	21.06	23.68	26.12	29.14	31.32
15	4.60	5.23	6.26	7.26	8.55	22.31	25.00	27.49	30.58	32.80
16	5.14	5.81	6.91	7.96	9.31	23.54	26.30	28.85	32.00	34.27
17	5.70	6.41	7.56	8.67	10.09	24.77	27.59	30.19	33.41	35.72
18	6.26	7.01	8.23	9.39	10.86	25.99	28.87	31.53	34.81	37.16
19	6.84	7.63	8.91	10.12	11.65	27.20	30.14	32.85	36.19	38.58
20	7.43	8.26	9.59	10.85	12.44	28.41	31.41	34.17	37.57	40.00
22	8.6	9.5	11.0	12.3	14.0	30.8	33.9	36.8	40.3	42.8
24	9.9	10.9	12.4	13.8	15.7	33.2	36.4	39.4	43.0	45.6
26	11.2	12.2	13.8	15.4	17.3	35.6	38.9	41.9	45.6	48.3
28	12.5	13.6	15.3	16.9	18.9	37.9	41.3	44.5	48.3	51.0
30	13.8	15.0	16.8	18.5	20.6	40.3	43.8	47.0	50.9	53.7
40	20.7	22.2	24.4	26.5	29.1	51.8	55.8	59.3	63.7	66.8
50	28.0	29.7	32.4	34.8	37.7	63.2	67.5	71.4	76.2	79.5
For $n \ge 50$: $\chi^2_u(n) \approx \frac{1}{2}\left(z_u + \sqrt{2n - 1}\right)^2$
results. This interval does not have minimum length. The minimum interval is
such that $f_\chi(c_1) = f_\chi(c_2)$ (Fig. 9-5b); however, its determination is not simple.
In Table 9-3, we list the percentiles $\chi^2_u(n)$ of the $\chi^2(n)$ distribution.
Unknown mean. If $\eta$ is unknown, we use as the point estimate of $\sigma^2$ the
sample variance $s^2$ [see (9-13)]. The RV $(n - 1)\mathbf{s}^2/\sigma^2$ has a $\chi^2(n - 1)$ distribution.
Hence
$$P\left\{\chi^2_{\delta/2}(n - 1) < \frac{(n - 1)\mathbf{s}^2}{\sigma^2} < \chi^2_{1-\delta/2}(n - 1)\right\} = \gamma$$
This yields the interval
$$\frac{(n - 1)s^2}{\chi^2_{1-\delta/2}(n - 1)} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{\delta/2}(n - 1)} \tag{9-24}$$
Example 9-7. A voltage source $V$ is measured six times. The measurements are
modeled by the RV $\mathbf{x} = V + \boldsymbol{\nu}$. We assume that the error $\boldsymbol{\nu}$ is $N(0, \sigma)$. We wish to
find the 0.95 interval estimate of $\sigma^2$.
(a) Suppose first that the source is a known standard with $V = 110$ V. We
insert the measured values $x_i = 110 + \nu_i$ of $V$ into (9-22) and we find $\hat{v} = 0.25$.
From Table 9-3 we obtain
$$\chi^2_{0.025}(6) = 1.24 \qquad \chi^2_{0.975}(6) = 14.45$$
and (9-23) yields $0.104 < \sigma^2 < 1.2$. The corresponding interval for $\sigma$ is $0.322 < \sigma
< 1.096$ V.
(b) Suppose now that $V$ is unknown. Using the same data, we compute $s^2$
from (9-13) and we find $s^2 = 0.30$. From Table 9-3 we obtain
$$\chi^2_{0.025}(5) = 0.83 \qquad \chi^2_{0.975}(5) = 12.83$$
and (9-24) yields $0.117 < \sigma^2 < 1.8$. The corresponding interval for $\sigma$ is $0.342 < \sigma
< 1.344$ V.
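The arithmetic of Example 9-7 can be retraced with the Python sketch below. The function names are ours and hypothetical. The chi-square percentiles are not computed; they are the Table 9-3 entries passed in as arguments, because the standard library offers no chi-square inverse distribution.

```python
import math

def variance_ci_known_mean(v_hat, n, chi2_lo, chi2_hi):
    """Interval (9-23): n*v_hat/chi2_{1-delta/2}(n) < sigma^2 < n*v_hat/chi2_{delta/2}(n)."""
    return n * v_hat / chi2_hi, n * v_hat / chi2_lo

def variance_ci_unknown_mean(s2, n, chi2_lo, chi2_hi):
    """Interval (9-24), with percentiles taken for n - 1 degrees of freedom."""
    return (n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo

# (a) known mean: v_hat = 0.25, n = 6, percentiles chi2_{.025}(6), chi2_{.975}(6)
var_a = variance_ci_known_mean(0.25, 6, chi2_lo=1.24, chi2_hi=14.45)
# (b) unknown mean: s^2 = 0.30, percentiles chi2_{.025}(5), chi2_{.975}(5)
var_b = variance_ci_unknown_mean(0.30, 6, chi2_lo=0.83, chi2_hi=12.83)

for label, (lo, hi) in (("a", var_a), ("b", var_b)):
    print(f"({label}) {lo:.3f} < sigma^2 < {hi:.3f}; "
          f"{math.sqrt(lo):.3f} < sigma < {math.sqrt(hi):.3f}")
```

Both intervals are strongly asymmetric about the point estimate, as expected from the skewness of the $\chi^2$ density for small $n$.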
PERCENTILES. The $u$ percentile of an RV $\mathbf{x}$ is by definition a number $x_u$ such
that $F(x_u) = u$. Thus $x_u$ is the inverse function $F^{(-1)}(u)$ of the distribution
$F(x)$ of $\mathbf{x}$. We shall estimate $x_u$ in terms of the samples $x_i$ of $\mathbf{x}$. To do so, we
write the $n$ observations $x_i$ in ascending order and we denote by $y_k$ the $k$th
number so obtained. The corresponding RVs $\mathbf{y}_k$ are the order statistics of $\mathbf{x}$ [see
(8-13)].
From the definition it follows that $y_k < x_u$ iff at least $k$ of the samples $x_i$
are less than $x_u$; similarly, $y_{k+r} > x_u$ iff at most $k + r - 1$ of the samples $x_i$ are
less than $x_u$. Finally, $y_k < x_u < y_{k+r}$ iff at least $k$ and at most $k + r - 1$ of
the samples $x_i$ are less than $x_u$. This leads to the conclusion that the event
$\{\mathbf{y}_k < x_u < \mathbf{y}_{k+r}\}$ occurs iff the number of successes of the event $\{\mathbf{x} \le x_u\}$ in $n$
repetitions of the experiment is at least $k$ and at most $k + r - 1$. And since
$P\{\mathbf{x} \le x_u\} = u$, it follows from (3-18) with $p = u$ that
$$P\{\mathbf{y}_k < x_u < \mathbf{y}_{k+r}\} = \sum_{m=k}^{k+r-1} \binom{n}{m} u^m (1 - u)^{n-m} \tag{9-25}$$
Using this basic relationship, we shall find the $\gamma$ confidence interval of $x_u$
for a specific $u$. To do so, we must find an integer $k$ such that the sum in (9-25)
equals $\gamma$ for the smallest possible $r$. This is a complicated task involving trial
and error. A simple solution can be obtained if $n$ is large. Using the normal
approximation (3-33) with $np = nu$, we obtain
$$G\left(\frac{k + r - 0.5 - nu}{\sqrt{nu(1 - u)}}\right) - G\left(\frac{k - 0.5 - nu}{\sqrt{nu(1 - u)}}\right) = \gamma$$
For a specific $\gamma$, $r$ is minimum if $nu$ is
near the center of the interval $(k, k + r)$. This yields
$$k = nu - z_{1-\delta/2}\sqrt{nu(1 - u)} \qquad k + r = nu + z_{1-\delta/2}\sqrt{nu(1 - u)} \tag{9-26}$$
to the nearest integer.
Example 9-8. We observe 100 samples of $\mathbf{x}$ and we wish to find the 0.95 confidence
interval of the median $x_{0.5}$ of $\mathbf{x}$. With $u = 0.5$, $nu = 50$, $z_{0.975} \approx 2$, (9-26) yields
$k = 40$, $k + r = 60$. Thus we can claim with confidence coefficient 0.95 that the
median of $\mathbf{x}$ is between $y_{40}$ and $y_{60}$.
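The ranks given by (9-26) and the exact coverage (9-25) they achieve can be compared with the following Python sketch. The function names are ours and illustrative only; the exact sum is included as a check on the normal approximation.

```python
import math

def percentile_ci_ranks(n, u, z):
    """Order-statistic ranks (k, k + r) from (9-26), rounded to the nearest integer."""
    half = z * math.sqrt(n * u * (1 - u))
    return round(n * u - half), round(n * u + half)

def exact_coverage(n, u, k, k_plus_r):
    """Exact P{y_k < x_u < y_{k+r}} from (9-25): sum over m = k, ..., k + r - 1."""
    return sum(math.comb(n, m) * u**m * (1 - u)**(n - m)
               for m in range(k, k_plus_r))

# Median of 100 samples, z_{0.975} rounded to 2 as in Example 9-8
k, k_plus_r = percentile_ci_ranks(100, 0.5, 2.0)
coverage = exact_coverage(100, 0.5, k, k_plus_r)
print(k, k_plus_r, round(coverage, 3))
```

The exact coverage of the interval $(y_{40}, y_{60})$ is close to the nominal 0.95, which supports the use of the approximation for $n = 100$.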
DISTRIBUTIONS. We wish to estimate the distribution $F(x)$ of an RV $\mathbf{x}$ in
terms of the samples $x_i$ of $\mathbf{x}$. For a specific $x$, $F(x)$ equals the probability of the
event $\{\mathbf{x} \le x\}$; hence its point estimate is the ratio $n_x/n$ where $n_x$ is the number
of $x_i$'s that do not exceed $x$. Repeating this for every $x$, we obtain the empirical
estimate
$$\hat{F}(x) = \frac{n_x}{n}$$
of the distribution $F(x)$ [see also (4-3)]. This estimate is a staircase function
(Fig. 9-6a) with discontinuities at the points $x_i$.
Interval estimates. For a specific $x$, the interval estimate of $F(x)$ is obtained
from (9-20) with $p = F(x)$ and $\bar{x} = \hat{F}(x)$. Inserting into (9-21), we obtain the
interval
$$\hat{F}(x) \pm z_u\sqrt{\frac{\hat{F}(x)[1 - \hat{F}(x)]}{n}}$$
We can thus claim with confidence coefficient $\gamma = 2u - 1$ that the unknown
$F(x)$ is in the above interval. Note that the length of this interval depends on $x$.
We shall now find an interval estimate $\hat{F}(x) \pm c$ of $F(x)$ where $c$ is a
constant. The empirical estimate $\hat{F}(x)$ depends on the samples $x_i$ of $\mathbf{x}$. It
specifies, therefore, a family of staircase functions $\hat{\mathbf{F}}(x)$, one for each set of
samples $x_i$. The constant $c$ is such that
$$P\{|\hat{\mathbf{F}}(x) - F(x)| \le c\} = \gamma \tag{9-27}$$
for every $x$ and the $\gamma$ confidence region of $F(x)$ is the strip $\hat{F}(x) \pm c$. To find $c$,
we form the maximum
$$\mathbf{w} = \max_x |\hat{\mathbf{F}}(x) - F(x)| \tag{9-28}$$
(least upper bound) of the distance between $\hat{\mathbf{F}}(x)$ and $F(x)$. Suppose that
$w = \mathbf{w}(\xi)$ is a specific value of $\mathbf{w}$. From (9-28) it follows that $w < c$ iff
$|\hat{F}(x) - F(x)| < c$ for every $x$. Hence
$$\gamma = P\{\mathbf{w} \le c\} = F_w(c)$$
It suffices, therefore, to find the distribution of $\mathbf{w}$. We shall show first that the
function $F_w(w)$ does not depend on $F(x)$. As we know [see (5-18)], the RV
$\mathbf{y} = F(\mathbf{x})$ is uniform in the interval (0, 1) for any $F(x)$. The function $y = F(x)$
transforms the points $x_i$ to the points $y_i = F(x_i)$ and the RV $\mathbf{w}$ to itself (see
Fig. 9-6b). This shows that $F_w(w)$ does not depend on the form of $F(x)$. For its
determination it suffices, therefore, to assume that $\mathbf{x}$ is uniform. However, even
with this simplification, it is not simple to find $F_w(w)$. We give next an
approximate solution due to Kolmogoroff:
For large $n$:
$$F_w(w) \approx 1 - 2e^{-2nw^2} \tag{9-29}$$
From this it follows that $\gamma = F_w(c) \approx 1 - 2e^{-2nc^2}$. We can thus claim with
confidence coefficient $\gamma$ that the unknown $F(x)$ is between the curves $\hat{F}(x) + c$
and $\hat{F}(x) - c$ where
$$c = \sqrt{-\frac{1}{2n}\ln\frac{1 - \gamma}{2}} \tag{9-30}$$
This approximation is satisfactory if $w > 1/\sqrt{n}$.
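The empirical estimate $\hat{F}(x) = n_x/n$ and the constant-width band (9-30) are easy to evaluate numerically. The Python sketch below uses hypothetical helper names (`ecdf`, `kolmogorov_halfwidth`) and a made-up four-point sample purely for illustration; it relies on the large-$n$ approximation (9-29), so the half-width should not be trusted for very small $n$.

```python
import math
from bisect import bisect_right

def ecdf(samples):
    """Return the staircase function F_hat(x) = (number of samples <= x) / n."""
    xs = sorted(samples)
    n = len(xs)

    def F_hat(x):
        return bisect_right(xs, x) / n

    return F_hat

def kolmogorov_halfwidth(n, gamma):
    """Half-width c of the strip F_hat(x) ± c from (9-30), based on (9-29)."""
    return math.sqrt(-math.log((1 - gamma) / 2) / (2 * n))

F_hat = ecdf([3.0, 1.0, 2.0, 2.0])      # illustrative data only
c = kolmogorov_halfwidth(n=100, gamma=0.95)
print(F_hat(2.0), round(c, 4))
```

Unlike the pointwise interval obtained from (9-21), the half-width $c$ depends only on $n$ and $\gamma$, not on $x$ or on the samples themselves.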
Bayesian Estimation
We return to the problem of estimating the parameter $\theta$ of a distribution
$F(x, \theta)$. In our earlier approach, we viewed $\theta$ as an unknown constant and the
estimate was based solely on the observed values $x_i$ of the RV $\mathbf{x}$. This approach
to estimation is called classical. In certain applications, $\theta$ is not totally unknown.
If, for example, $\theta$ is the probability of six in the die experiment, we expect that
its possible values are close to 1/6 because most dice are reasonably fair. In
bayesian statistics, the available prior information about $\theta$ is used in the
estimation problem. In this approach, the unknown parameter $\theta$ is viewed as
the value of an RV $\boldsymbol{\theta}$ and the distribution of $\mathbf{x}$ is interpreted as the conditional
distribution $F_x(x \mid \theta)$ of $\mathbf{x}$ assuming $\boldsymbol{\theta} = \theta$. The prior information is used to
assign somehow a density $f_\theta(\theta)$ to the RV $\boldsymbol{\theta}$, and the problem is to estimate the
value $\theta$ of $\boldsymbol{\theta}$ in terms of the observed values $x_i$ of $\mathbf{x}$ and the density of $\boldsymbol{\theta}$. The
problem of estimating the unknown parameter $\theta$ is thus changed to the problem
of estimating the value $\theta$ of the RV $\boldsymbol{\theta}$. Thus, in bayesian statistics, estimation is
changed to prediction.
We shall introduce the method in the context of the following problem.
We wish to estimate the inductance $\theta$ of a coil. We measure $\theta$ $n$ times and the
results are the samples $x_i = \theta + \nu_i$ of the RV $\mathbf{x} = \theta + \boldsymbol{\nu}$. If we interpret $\theta$ as an
unknown number, we have a classical estimation problem. Suppose, however,
that the coil is selected from a production line. In this case, its inductance $\theta$ can
be interpreted as the value of an RV $\boldsymbol{\theta}$ modeling the inductances of all coils.
This is a problem in bayesian estimation. To solve it, we assume first that no
observations are available, that is, that the specific coil has not been measured.
The available information is now the prior density $f_\theta(\theta)$ of $\boldsymbol{\theta}$ which we assume
known and our problem is to find a constant $\hat{\theta}$ close in some sense to the
unknown $\theta$, that is, to the true value of the inductance of the particular coil. If
we use the LMS criterion for selecting $\hat{\theta}$, then [see (7-62)]
$$\hat{\theta} = E\{\boldsymbol{\theta}\} = \int_{-\infty}^{\infty} \theta f_\theta(\theta)\, d\theta$$
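To improve the estimate, we measure the coil $n$ times. The problem now
is to estimate $\theta$ in terms of the $n$ samples $x_i$ of $\mathbf{x}$. In the general case, this
involves the estimation of the value $\theta$ of an RV $\boldsymbol{\theta}$ in terms of the $n$ samples $x_i$
of $\mathbf{x}$. Using again the MS criterion, we obtain
$$\hat{\theta} = E\{\boldsymbol{\theta} \mid X\} = \int_{-\infty}^{\infty} \theta f_\theta(\theta \mid X)\, d\theta \tag{9-31}$$
[see (8-77)] where $X = [x_1, \ldots, x_n]$ and
$$f_\theta(\theta \mid X) = \frac{f(X \mid \theta)}{f(X)} f_\theta(\theta) \tag{9-32}$$
In the above, $f(X \mid \theta)$ is the conditional density of the $n$ RVs $\mathbf{x}_i$ assuming $\boldsymbol{\theta} = \theta$.
If these RVs are conditionally independent, then
$$f(X \mid \theta) = f(x_1 \mid \theta) \cdots f(x_n \mid \theta) \tag{9-33}$$
where $f(x \mid \theta)$ is the conditional density of the RV $\mathbf{x}$ assuming $\boldsymbol{\theta} = \theta$. These
results hold in general. In the measurement problem, $f(x \mid \theta) = f_\nu(x - \theta)$.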
We conclude with the clarification of the meaning of the various densities
used in bayesian estimation, and of the underlying model, in the context of the
measurement problem. The density $f_\theta(\theta)$, called prior (prior to the
measurements), models the inductances of all coils. The density $f_\theta(\theta \mid X)$, called posterior
(after the measurements), models the inductances of all coils of measured
inductance $x$. The conditional density $f_x(x \mid \theta) = f_\nu(x - \theta)$ models all
measurements of a particular coil of true inductance $\theta$. This density, considered as a
FIGURE 9-7
function of $\theta$, is called the likelihood function. The unconditional density $f_x(x)$
models all measurements of all coils. Equation (9-33) is based on the reasonable
assumption that the measurements of a given coil are independent.
The bayesian model is a product space $\mathcal{S} = \mathcal{S}_\theta \times \mathcal{S}_x$ where $\mathcal{S}_\theta$ is the
space of the RV $\boldsymbol{\theta}$ and $\mathcal{S}_x$ is the space of the RV $\mathbf{x}$. The space $\mathcal{S}_\theta$ is the space
of all coils and $\mathcal{S}_x$ is the space of all measurements of a particular coil. Finally,
$\mathcal{S}$ is the space of all measurements of all coils. The number $\theta$ has two
meanings: It is the value of the RV $\boldsymbol{\theta}$ in the space $\mathcal{S}_\theta$; it is also a parameter
specifying the density $f(x \mid \theta) = f_\nu(x - \theta)$ of the RV $\mathbf{x}$ in the space $\mathcal{S}_x$.
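Example 9-9. Suppose that $\mathbf{x} = \theta + \boldsymbol{\nu}$ where $\boldsymbol{\nu}$ is an $N(0, \sigma)$ RV and $\theta$ is the
value of an $N(\theta_0, \sigma_0)$ RV $\boldsymbol{\theta}$ (Fig. 9-7). Find the bayesian estimate $\hat{\theta}$ of $\theta$.
The density $f(x \mid \theta)$ of $\mathbf{x}$ is $N(\theta, \sigma)$. Inserting into (9-32), we conclude that
(see Prob. 9-37) the function $f_\theta(\theta \mid X)$ is $N(\theta_1, \sigma_1)$ where
$$\sigma_1^2 = \frac{\sigma^2}{n} \times \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2/n} \qquad \theta_1 = \frac{\sigma_1^2}{\sigma_0^2}\theta_0 + \frac{n\sigma_1^2}{\sigma^2}\bar{x}$$
From the above it follows that $E\{\boldsymbol{\theta} \mid X\} = \theta_1$; in other words, $\hat{\theta} = \theta_1$.
Note that the classical estimate of $\theta$ is the average $\bar{x}$ of $x_i$. Furthermore, its
prior estimate is the constant $\theta_0$. Hence $\hat{\theta}$ is the weighted average of the prior
estimate $\theta_0$ and the classical estimate $\bar{x}$. Note further that as $n$ tends to $\infty$, $\sigma_1 \to 0$
and $n\sigma_1^2/\sigma^2 \to 1$; hence $\hat{\theta}$ tends to $\bar{x}$. Thus, as the number of measurements
increases, the bayesian estimate $\hat{\theta}$ approaches the classical estimate $\bar{x}$; the effect of
the prior becomes negligible.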
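The posterior parameters of Example 9-9 and their limiting behavior can be explored with the Python sketch below. The function name `gaussian_posterior` and all numerical values (a prior centered at 10.0, a sample mean of 10.3, and so on) are hypothetical and chosen only for illustration; the formulas are those stated in the example for a normal prior and normal measurement error.

```python
import math

def gaussian_posterior(theta0, sigma0, sigma, xbar, n):
    """Posterior mean theta1 and standard deviation sigma1 for a normal prior
    N(theta0, sigma0) and n measurements with normal error N(0, sigma)."""
    var1 = (sigma**2 / n) * sigma0**2 / (sigma0**2 + sigma**2 / n)
    theta1 = (var1 / sigma0**2) * theta0 + (n * var1 / sigma**2) * xbar
    return theta1, math.sqrt(var1)

# Illustrative values: prior 10.0 ± 0.5, measurement error 0.2, sample mean 10.3
for n in (1, 4, 100):
    theta1, sigma1 = gaussian_posterior(10.0, 0.5, 0.2, 10.3, n)
    print(f"n={n:3d}: estimate {theta1:.4f}, posterior std {sigma1:.4f}")
```

The two weights $\sigma_1^2/\sigma_0^2$ and $n\sigma_1^2/\sigma^2$ sum to 1, so the estimate always lies between the prior estimate and the sample mean and moves toward the latter as $n$ grows.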
We present next the estimation of the probability $p = P(\mathcal{A})$ of an event
$\mathcal{A}$. To be concrete, we assume that $\mathcal{A}$ is the event “heads” in the coin
experiment. The result is based on Bayes' formula [see (4-67)]
$$f(x \mid \mathcal{M}) = \frac{P(\mathcal{M} \mid x) f(x)}{\int_{-\infty}^{\infty} P(\mathcal{M} \mid x) f(x)\, dx} \tag{9-34}$$
In bayesian statistics, $p$ is the value of an RV $\mathbf{p}$ with prior density $f(p)$. In the
absence of any observations, the LMS estimate $\hat{p}$ is given by
$$\hat{p} = \int_0^1 p f(p)\, dp \tag{9-35}$$
To improve the estimate, we toss the coin at hand $n$ times and we observe that
“heads” shows $k$ times. As we know,
$$P\{\mathcal{M} \mid \mathbf{p} = p\} = p^k q^{n-k} \qquad \mathcal{M} = \{k \text{ heads}\}$$
Inserting into (9-34), we obtain the posterior density
$$f(p \mid \mathcal{M}) = \frac{p^k q^{n-k} f(p)}{\int_0^1 p^k q^{n-k} f(p)\, dp} \tag{9-36}$$
Using this function, we can estimate the probability of heads at the next
toss of the coin. Replacing $f(p)$ by $f(p \mid \mathcal{M})$ in (9-35), we conclude that the
updated estimate $\hat{p}$ of $p$ is the conditional estimate of $\mathbf{p}$ assuming $\mathcal{M}$:
$$\hat{p} = \int_0^1 p f(p \mid \mathcal{M})\, dp \tag{9-37}$$
Note that for large $n$, the factor $\varphi(p) = p^k(1 - p)^{n-k}$ in (9-36) has a
sharp maximum at $p = k/n$. Therefore, if $f(p)$ is smooth, the product $f(p)\varphi(p)$
is concentrated near $k/n$ (Fig. 9-8a). However, if $f(p)$ has a sharp peak at
$p = 0.5$ (this is the case for reasonably fair coins), then for moderate values of
$n$, the product $f(p)\varphi(p)$ has two maxima: one near $k/n$ and the other near 0.5
(Fig. 9-8b). As $n$ increases, the sharpness of $\varphi(p)$ prevails and $f(p \mid \mathcal{M})$ is
maximum near $k/n$ (Fig. 9-8c).
Example 9-10. We toss a coin of unknown quality $n$ times and we observe $k$
heads. Using this information, we wish to find the bayesian estimate $\hat{p}$ of the
probability $p$ that at the next toss heads will show.
In the absence of any prior information, we assume that $p$ is the value of an
RV $\mathbf{p}$ uniformly distributed in the interval (0, 1). Setting $f(p) = 1$ in (9-36) and
using the identity
$$\int_0^1 p^k (1 - p)^{n-k}\, dp = \frac{k!(n - k)!}{(n + 1)!}$$
we obtain
$$f(p \mid \mathcal{M}) = \frac{(n + 1)!}{k!(n - k)!}\, p^k (1 - p)^{n-k} \qquad 0 < p < 1$$
This function is known as the beta density. The updated estimate $\hat{p}$ of $p$ is
obtained from (9-37):
$$\hat{p} = \frac{(n + 1)!}{k!(n - k)!}\int_0^1 p^{k+1}(1 - p)^{n-k}\, dp = \frac{k + 1}{n + 2}$$
This result is known as the law of succession.
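The closed form $(k + 1)/(n + 2)$ can be verified exactly by evaluating the two integrals of Example 9-10 with rational arithmetic. In the Python sketch below, the function names are ours and hypothetical; `beta_integral` simply encodes the identity quoted in the example, so this is a consistency check of the algebra rather than an independent proof.

```python
from fractions import Fraction
from math import factorial

def beta_integral(a, b):
    """Integral of p^a (1-p)^b over (0, 1) for nonnegative integers a, b:
    a! b! / (a + b + 1)!  (the identity used in Example 9-10)."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def succession_estimate(k, n):
    """Posterior mean (9-37) of p under a uniform prior after k heads in n tosses."""
    normalizer = beta_integral(k, n - k)        # denominator of (9-36) with f(p) = 1
    return beta_integral(k + 1, n - k) / normalizer

p_hat = succession_estimate(k=7, n=10)
print(p_hat, float(p_hat), "versus k/n =", 7 / 10)
```

With no tosses at all ($n = k = 0$) the function returns 1/2, the mean of the uniform prior, as (9-35) requires.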
Note. Bayesian estimation is a controversial subject. The controversy has its origin in the
dual interpretation of the physical meaning of probability. In the first interpretation, the
probability $P(\mathcal{A})$ of an event $\mathcal{A}$ is an “objective” measure of the relative frequency of
the occurrence of $\mathcal{A}$ in a large number of trials. In the second interpretation, $P(\mathcal{A})$ is a
“subjective” measure of our state of knowledge concerning the occurrence of $\mathcal{A}$ in a
single trial. This dualism leads to two different interpretations of the meaning of
parameter estimation. In the coin experiment, these interpretations take the following
form:
In the classical (objective) approach, $p$ is an unknown number. To estimate its
value, we toss the coin $n$ times and use as an estimate of $p$ the ratio $\hat{p} = k/n$. In the
bayesian (subjective) approach, $p$ is also an unknown number; however, we interpret it as
the value of an RV $\mathbf{p}$, the density of which we determine using whatever knowledge we
might have about the coin. The resulting estimate of $p$ is determined from (9-37). If we
know nothing about $p$, we set $f(p) = 1$ and we obtain the estimate $\hat{p} = (k + 1)/(n + 2)$.
Conceptually, the two approaches are different. However, practically, they lead in most
estimates of interest to similar results if the size $n$ of the available sample is large. In the
coin problem, for example, if $n$ is large, $k$ is also large with high probability; hence
$(k + 1)/(n + 2) \approx k/n$. If $n$ is not large, the results are different but unreliable for
either method. The mathematics of bayesian estimation are also used in classical
estimation problems if $\theta$ is the value of an RV the density of which can be
determined objectively in terms of averages. This is the case in the problem considered in
Example 9-9.
Method of Maximum Likelihood
Up to now, we considered the estimation of particular parameters, and the
selection of their estimators was based on the relative frequency interpretation
of the mean of some function of x. In the following, we develop a general
method of estimation. This method can be used for most applications but it is
efficient primarily for large values of n. We introduce the method in the context
of the following problem.
FIGURE 9-9
We have an RV $\mathbf{x}$ with density $f(x, \theta)$ and we wish to estimate $\theta$ in terms
of a single observation x of the RV $\mathbf{x}$. To do so, we plot the density $f(x, \theta)$ as a
function of $\theta$, assigning to x the observed value of $\mathbf{x}$, and we determine the
value $\theta = \theta_{\max}$ of $\theta$ that maximizes $f(x, \theta)$. We shall call the curve $f(x, \theta)$ so
plotted the likelihood function of x and the number $\hat\theta = \theta_{\max}$ the maximum likelihood
(ML) estimate of $\theta$. This estimate is the value of $\theta$ for which the probability
$f(x, \theta)\,dx$ that the RV $\mathbf{x}$ is in the interval $(x, x + dx)$ is maximum.
Example 9-11. In Fig. 9-9 we plot the Erlang density
$$f(x, \theta) = \theta^2 x e^{-\theta x} U(x)$$
as a function of x and the corresponding likelihood function. The likelihood
function is maximum for $\theta = 2/x$. Thus the ML estimate of $\theta$ in terms of the
observed value x of $\mathbf{x}$ is $\hat\theta = 2/x$. The mode $x_{\max} = 1/\theta$ of the density is the
predicted value of $\mathbf{x}$ if $\theta$ is known (see page 179).
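The location of the likelihood maximum in Example 9-11 can be confirmed numerically. The sketch below is a brute-force grid search, not a method from the text; the function names, the grid range `theta_max`, and the number of grid points are arbitrary assumptions.

```python
import math


def erlang_density(x, theta):
    """Erlang density theta**2 * x * exp(-theta * x) for x > 0."""
    return theta ** 2 * x * math.exp(-theta * x)


def ml_estimate_grid(x, theta_max=20.0, steps=200_000):
    """Grid point in (0, theta_max] that maximizes the likelihood of one observation x."""
    grid = (theta_max * (i + 1) / steps for i in range(steps))
    return max(grid, key=lambda theta: erlang_density(x, theta))


if __name__ == "__main__":
    x = 0.5
    # The grid maximizer should be close to 2/x.
    print(ml_estimate_grid(x), 2 / x)
```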
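We shall now determine the ML estimate of $\theta$ in terms of n observations
$x_i$ of x. To do so, we form the joint density
$$f(X, \theta) = f(x_1, \theta) \cdots f(x_n, \theta)$$
of the n samples $x_i$ of x. This density, considered as a function of $\theta$, is called the
likelihood function of X. The value $\hat\theta$ of $\theta$ that maximizes $f(X, \theta)$ is the ML
estimate of $\theta$. The logarithm
$$L(X, \theta) = \ln f(X, \theta) = \sum_{i=1}^{n} \ln f(x_i, \theta) \tag{9-38}$$
is the log-likelihood function of X. From the monotonicity of the logarithm, it
follows that $\hat\theta$ also maximizes the function $L(X, \theta)$. If $\hat\theta$ is in the interior of the
domain $\Theta$ of $\theta$, then $\hat\theta$ is a root of the equation
$$\frac{\partial L(X, \theta)}{\partial\theta} = \sum_{i=1}^{n} \frac{1}{f(x_i, \theta)}\frac{\partial f(x_i, \theta)}{\partial\theta} = 0 \tag{9-39}$$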
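Example 9-12. Suppose that $f(x, \theta) = \theta e^{-\theta x} U(x)$. In this case,
$$f(X, \theta) = \theta^n e^{-\theta n \bar{x}} \qquad L(X, \theta) = n \ln\theta - \theta n \bar{x}$$
Hence
$$\frac{\partial L(X, \theta)}{\partial\theta} = \frac{n}{\theta} - n\bar{x} = 0$$
Thus the ML estimator of $\theta$ equals $1/\bar{\mathbf{x}}$. This estimator is biased because
$E\{1/\bar{\mathbf{x}}\} = n\theta/(n - 1)$.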
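The bias of the estimator in Example 9-12 can be made visible by simulation. This is a minimal Monte Carlo sketch under stated assumptions: the function name, the seed, and the trial count are illustrative choices, and the result is a noisy estimate rather than an exact value.

```python
import random


def mean_of_ml_estimates(theta, n, trials, seed=0):
    """Average of the ML estimates 1/(sample mean) over many exponential samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample_mean = sum(rng.expovariate(theta) for _ in range(n)) / n
        total += 1.0 / sample_mean
    return total / trials


if __name__ == "__main__":
    theta, n = 2.0, 5
    # Compare the simulated mean with n * theta / (n - 1), not with theta itself.
    print(mean_of_ml_estimates(theta, n, trials=50_000), n * theta / (n - 1))
```

For small n the gap between $n\theta/(n-1)$ and $\theta$ is substantial; it shrinks as n grows, consistent with the remark that the method is used primarily for large n.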
The ML method can be used to estimate any parameter. However, for
moderate values of n, the estimate is not efficient. The method is used primarily
for large values of n. This is based on the following important result.
Asymptotic properties. For large n, the distribution of the ML estimator $\hat\theta$
approaches a normal curve with mean $\theta$ and variance $1/nI$ where
$$I = E\left\{\left|\frac{\partial L(\mathbf{x}, \theta)}{\partial\theta}\right|^2\right\} = \int_{-\infty}^{\infty} \left|\frac{\partial L(x, \theta)}{\partial\theta}\right|^2 f(x, \theta)\,dx \tag{9-40}$$
Thus
$$f_{\hat\theta}(\hat\theta) \simeq \sqrt{\frac{nI}{2\pi}}\,\exp\left\{-\frac{nI}{2}(\hat\theta - \theta)^2\right\} \tag{9-41}$$
The number I is called the information about $\theta$ contained in $\mathbf{x}$. Using integration
by parts, we can show that (see Prob. 9-24)
$$I = -E\left\{\frac{\partial^2 L(\mathbf{x}, \theta)}{\partial\theta^2}\right\}$$
We show later [see (9-46)] that the variance of any estimator of $\theta$ cannot be
smaller than $1/nI$. From this it follows that the ML estimator is asymptotically
normal, unbiased, with minimum variance. In the next example, we demonstrate
the validity of the above theorem. The proof will not be given.
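Example 9-13. Suppose that the RV $\mathbf{x}$ is $N(\eta, \sigma)$ where $\eta$ is a known constant.
We wish to find the ML estimate $\hat{v}$ of its variance $v = \sigma^2$. In this problem,
$$f(X, v) = \frac{1}{(\sqrt{2\pi v})^n} \exp\left\{-\frac{1}{2v}\sum (x_i - \eta)^2\right\}$$
$$L(X, v) = -\frac{n}{2}\ln(2\pi v) - \frac{1}{2v}\sum (x_i - \eta)^2$$
Inserting into (9-39), we obtain
$$\frac{\partial L(X, v)}{\partial v} = -\frac{n}{2v} + \frac{1}{2v^2}\sum (x_i - \eta)^2 = 0$$
This yields the estimator
$$\hat{\mathbf{v}} = \frac{1}{n}\sum (\mathbf{x}_i - \eta)^2$$
As we know [see (8-67)],
$$E\{\hat{\mathbf{v}}\} = \sigma^2 \qquad \sigma_{\hat{v}}^2 = \frac{2\sigma^4}{n}$$
Furthermore, for large n the RV $\hat{\mathbf{v}}$ is nearly normal (CLT) as in (9-41). To
establish the validity of (9-41), it suffices to show that $nI = 1/\sigma_{\hat{v}}^2$. This
follows from the identity
$$I = -E\left\{\frac{\partial^2 L(\mathbf{x}, v)}{\partial v^2}\right\} = E\left\{\frac{(\mathbf{x} - \eta)^2}{v^3} - \frac{1}{2v^2}\right\} = \frac{1}{2v^2}$$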
The Rao-Cramer Bound
A basic problem in estimation is the determination of the best estimator $\hat\theta$ of a
parameter $\theta$. It is easy to show that if $\hat\theta$ exists, it is unique (see Prob. 9-39).
However, in general, the problem of determining the best estimator of $\theta$, or
even of showing that such an estimator exists, is not simple. In the following, we
determine the greatest lower bound of the variance of most estimators. This
result can be used to establish whether a particular estimator is the best or that
it is close to the best. We shall assume that the density $f(x, \theta)$ of x is
differentiable with respect to $\theta$ and that the boundary of the domain of x does
not depend on $\theta$. Differentiating the area condition $\int f(x, \theta)\,dx = 1$ with respect
to $\theta$, we obtain the identity
$$\int_{-\infty}^{\infty} \frac{\partial f(x, \theta)}{\partial\theta}\,dx = 0 \tag{9-42}$$
A density satisfying the conditions leading to this identity will be called regular.
We show next that
$$E\left\{\frac{\partial L(X, \theta)}{\partial\theta}\right\} = 0 \qquad E\left\{\left|\frac{\partial L(X, \theta)}{\partial\theta}\right|^2\right\} = nI \tag{9-43}$$
where $L(X, \theta) = \ln f(X, \theta)$ is the log-likelihood of X and nI is the information
about $\theta$ contained in X [see also (9-40)].
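Proof. From the identity $L(x, \theta) = \ln f(x, \theta)$ and (9-42), it follows that
$$E\left\{\frac{\partial L(\mathbf{x}, \theta)}{\partial\theta}\right\} = \int_{-\infty}^{\infty} \frac{1}{f(x, \theta)}\frac{\partial f(x, \theta)}{\partial\theta} f(x, \theta)\,dx = 0$$
This shows that the mean of the function $\partial L(\mathbf{x}, \theta)/\partial\theta$ is 0; hence its variance
equals $E\{|\partial L(\mathbf{x}, \theta)/\partial\theta|^2\}$. Inserting into (9-38), we obtain (9-43) because the
RVs $\ln f(\mathbf{x}_i, \theta)$ are independent.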
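We shall use (9-43) to determine the greatest lower bound of the variance
of an arbitrary estimator $\hat\theta$ of $\theta$. Suppose first that $\hat\theta = g(X)$ is an unbiased
estimator of $\theta$:
$$E\{\hat\theta\} = \int_R g(X) f(X, \theta)\,dX = \theta$$
Differentiating with respect to $\theta$, we obtain
$$1 = \int_R g(X)\frac{\partial f(X, \theta)}{\partial\theta}\,dX = \int_R g(X)\frac{\partial L(X, \theta)}{\partial\theta} f(X, \theta)\,dX$$
This yields
$$E\left\{g(X)\frac{\partial L(X, \theta)}{\partial\theta}\right\} = 1 \tag{9-44}$$
Multiplying the first equation of (9-43) by $\theta$ and subtracting from (9-44), we
obtain
$$E\left\{[g(X) - \theta]\frac{\partial L(X, \theta)}{\partial\theta}\right\} = 1 \tag{9-45}$$
We shall use this identity to prove the following important result.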
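THEOREM. The variance $E\{[g(X) - \theta]^2\}$ of any unbiased estimator $\hat\theta$ of $\theta$
cannot be smaller than $1/nI$:
$$\sigma_{\hat\theta}^2 = E\{[g(X) - \theta]^2\} \ge \frac{1}{nI} \tag{9-46}$$
Proof. The proof is based on Schwarz's inequality
$$E^2\{zw\} \le E\{z^2\}E\{w^2\} \tag{9-47}$$
Squaring both sides of (9-45) and applying (9-47) to the RVs $z = g(X) - \theta$ and
$w = \partial L(X, \theta)/\partial\theta$ we obtain
$$1 \le E\{[g(X) - \theta]^2\}\,E\left\{\left|\frac{\partial L(X, \theta)}{\partial\theta}\right|^2\right\} \tag{9-48}$$
and (9-46) results.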
We shall now determine the class of functions for which the estimator $\hat\theta$ is
best, that is, for which (9-46) is an equality. As we know, (9-47) is an equality if
z = cw. Hence (9-48) is an equality if $g(X) - \theta = c\,\partial L(X, \theta)/\partial\theta$. To find c, we
insert into (9-48) and use (9-43). This yields $c = 1/nI$; hence
$$\frac{\partial L(X, \theta)}{\partial\theta} = nI[g(X) - \theta] \tag{9-49}$$
Thus the estimate $\hat\theta = g(X)$ is best if the log-likelihood function $L(X, \theta)$
satisfies (9-49).
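As a concrete check of the bound (9-46), the sketch below evaluates the information I for a single coin toss and compares 1/nI with the variance of the sample fraction k/n. Two assumptions go beyond the text: the bound is applied to a discrete RV, with the integral in (9-40) replaced by a sum over the two outcomes, and all function names are hypothetical.

```python
from fractions import Fraction


def bernoulli_information(p):
    """I = E{(d/dp ln f(x, p))**2} for f(x, p) = p**x * (1 - p)**(1 - x), x in {0, 1}."""
    score_if_one = 1 / p            # derivative of ln p
    score_if_zero = -1 / (1 - p)    # derivative of ln(1 - p)
    return p * score_if_one ** 2 + (1 - p) * score_if_zero ** 2


def cramer_rao_bound(p, n):
    """Lower bound 1/(n I) on the variance of an unbiased estimator of p."""
    return 1 / (n * bernoulli_information(p))


def variance_of_sample_fraction(p, n):
    """Variance of k/n, where k is binomial with parameters n and p."""
    return p * (1 - p) / n


if __name__ == "__main__":
    p, n = Fraction(3, 10), 50
    # The two values coincide, so k/n attains the bound in this case.
    print(cramer_rao_bound(p, n), variance_of_sample_fraction(p, n))
```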
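COROLLARY. If $\hat\theta = g(X)$ is a biased estimator of $\theta$ with mean $E\{\hat\theta\} = \tau(\theta)$,
then
$$\sigma_{\hat\theta}^2 \ge \frac{[\tau'(\theta)]^2}{nI} \tag{9-50}$$
Proof. The statistic $\hat\theta = g(X)$ is an unbiased estimator of the parameter $\tau = \tau(\theta)$.
We can, therefore, apply (9-46) provided that we replace $\theta$ by $\tau(\theta)$ and nI by
the information about $\tau$ contained in X. Since
$$\frac{\partial L[X, \theta(\tau)]}{\partial\tau} = \frac{\partial L(X, \theta)}{\partial\theta}\frac{d\theta}{d\tau}$$
and $\theta'(\tau) = 1/\tau'(\theta)$ we obtain
$$E\left\{\left|\frac{\partial L[X, \theta(\tau)]}{\partial\tau}\right|^2\right\} = \frac{nI}{[\tau'(\theta)]^2}$$
and (9-50) results. Reasoning as in (9-49), we conclude that (9-50) is an equality
iff
$$\frac{\partial L[X, \theta(\tau)]}{\partial\tau} = \frac{nI}{[\tau'(\theta)]^2}[g(X) - \tau(\theta)] \tag{9-51}$$
Note If $f(x, \theta)$ is a density of exponential type, that is, if
$$f(x, \theta) = h(x)\exp\{a(\theta)q(x) + b(\theta)\} \tag{9-52}$$
then the statistic $\hat\theta = (1/n)\sum q(x_i)$ is the best estimator of the parameter $\tau(\theta) =
-b'(\theta)/a'(\theta)$. This follows readily from (9-51).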
9-3 HYPOTHESIS TESTING
A statistical hypothesis is an assumption about the value of one or more
parameters of a statistical model. Hypothesis testing is a process of establishing
the validity of a hypothesis. This topic is fundamental in a variety of applica-
tions: Is Mendel’s theory of heredity valid? Is the number of particles emitted
from a radioactive substance Poisson distributed? Does the value of a parameter
in a scientific investigation equal a specific constant? Are two events
independent? Does the mean of an RV change if certain factors of the
experiment are modified? Does smoking decrease life expectancy? Do voting
patterns depend on sex? Do IQ scores depend on parental education? The list
is endless.
We shall introduce the main concepts of hypothesis testing in the context
of the following problem: The distribution of an RV x is a known function
$F(x, \theta)$ depending on a parameter $\theta$. We wish to test the assumption
$\theta = \theta_0$ against the assumption $\theta \ne \theta_0$. The assumption that $\theta = \theta_0$ is denoted
by $H_0$ and is called the null hypothesis. The assumption that $\theta \ne \theta_0$ is denoted
by $H_1$ and is called the alternative hypothesis. The values that $\theta$ might take
under the alternative hypothesis form a set $\Theta_1$ in the parameter space. If $\Theta_1$
consists of a single point $\theta = \theta_1$, the hypothesis $H_1$ is called simple; otherwise,
it is called composite. The null hypothesis is in most cases simple.
The purpose of hypothesis testing is to establish whether experimental
evidence supports the rejection of the null hypothesis. The decision is based on
the location of the observed sample X of x. Suppose that under hypothesis $H_0$
the density $f(X, \theta_0)$ of the sample vector X is negligible in a certain region $D_c$
of the sample space, taking significant values only in the complement $\bar{D}_c$ of
$D_c$. It is reasonable then to reject $H_0$ if X is in $D_c$ and to accept $H_0$ if X is in
$\bar{D}_c$. The set $D_c$ is called the critical region of the test and the set $\bar{D}_c$ is called the
region of acceptance of $H_0$. The test is thus specified in terms of the set $D_c$.
We should stress that the purpose of hypothesis testing is not to determine
whether $H_0$ or $H_1$ is true. It is to establish whether the evidence supports the
rejection of $H_0$. The terms "accept" and "reject" must, therefore, be interpreted
accordingly. Suppose, for example, that we wish to establish whether the
hypothesis $H_0$ that a coin is fair is true. To do so, we toss the coin 100 times
and observe that heads show k times. If k = 15, we reject $H_0$, that is, we decide
on the basis of the evidence that the fair-coin hypothesis should be rejected. If
k = 49, we accept $H_0$, that is, we decide that the evidence does not support the
rejection of the fair-coin hypothesis. The evidence alone, however, does not lead
to the conclusion that the coin is fair. We could have as well concluded that
p = 0.49.
In hypothesis testing two kinds of errors might occur depending on the
location of X:
1. Suppose first that $H_0$ is true. If $X \in D_c$, we reject $H_0$ even though it is true.
We then say that a Type I error is committed. The probability for such an
error is denoted by $\alpha$ and is called the significance level of the test. Thus
$$\alpha = P\{X \in D_c \mid H_0\} \tag{9-53}$$
The difference $1 - \alpha = P\{X \notin D_c \mid H_0\}$ equals the probability that we accept
$H_0$ when true. In this notation, $P\{\cdots \mid H_0\}$ is not a conditional probability.
The symbol $H_0$ merely indicates that $H_0$ is true.
2. Suppose next that $H_0$ is false. If $X \notin D_c$, we accept $H_0$ even though it is
false. We then say that a Type II error is committed. The probability for such
an error is a function $\beta(\theta)$ of $\theta$ called the operating characteristic (OC) of the
test. Thus
$$\beta(\theta) = P\{X \notin D_c \mid H_1\} \tag{9-54}$$
The difference $1 - \beta(\theta)$ is the probability that we reject $H_0$ when false. This
is denoted by $P(\theta)$ and is called the power of the test. Thus
$$P(\theta) = 1 - \beta(\theta) = P\{X \in D_c \mid H_1\} \tag{9-55}$$
Fundamental note Hypothesis testing is not a part of statistics. It is part of decision
theory based on statistics. Statistical considerations alone cannot lead to a decision. They
merely lead to the following probabilistic statements:
If $H_0$ is true, then $P\{X \in D_c\} = \alpha$
If $H_0$ is false, then $P\{X \notin D_c\} = \beta(\theta)$	(9-56)
Guided by these statements, we "reject" $H_0$ if $X \in D_c$ and we "accept" $H_0$ if $X \notin D_c$.
These decisions are not based on (9-56) alone. They take into consideration other, often
subjective, factors, for example, our prior knowledge concerning the truth of $H_0$, or the
consequences of a wrong decision.
The test of a hypothesis is specified in terms of its critical region. The
region $D_c$ is chosen so as to keep the probabilities of both types of errors small.
However, both probabilities cannot be arbitrarily small because a decrease in $\alpha$
results in an increase in $\beta$. In most applications, it is more important to control
$\alpha$. The selection of the region $D_c$ proceeds thus as follows:
Assign a value to the Type I error probability $\alpha$ and search for a region $D_c$
of the sample space so as to minimize the Type II error probability for a specific
$\theta$. If the resulting $\beta(\theta)$ is too large, increase $\alpha$ to its largest tolerable value; if
$\beta(\theta)$ is still too large, increase the number n of samples.
A test is called most powerful if $\beta(\theta)$ is minimum. In general, the critical
region of a most powerful test depends on $\theta$. If it is the same for every $\theta \in \Theta_1$,
the test is uniformly most powerful. Such a test does not always exist. The
determination of the critical region of a most powerful test involves a search in
the n-dimensional sample space. In the following, we introduce a simpler
approach.
TEST STATISTIC. Prior to any experimentation, we select a function
$$\mathbf{q} = g(\mathbf{X})$$
of the sample vector X. We then find a set $R_c$ of the real line where under
hypothesis $H_0$ the density of q is negligible, and we reject $H_0$ if the value
q = g(X) of q is in $R_c$. The set $R_c$ is the critical region of the test; the RV q is
the test statistic. In the selection of the function g(X) we are guided by the
point estimate of $\theta$.
In a hypothesis test based on a test statistic, the two types of errors are
expressed in terms of the region $R_c$ of the real line and the density $f_q(q, \theta)$ of
the test statistic q:
$$\alpha = P\{q \in R_c \mid H_0\} = \int_{R_c} f_q(q, \theta_0)\,dq \tag{9-57}$$
$$\beta(\theta) = P\{q \notin R_c \mid H_1\} = \int_{\bar{R}_c} f_q(q, \theta)\,dq \tag{9-58}$$
To carry out the test, we determine first the function $f_q(q, \theta)$. We then
assign a value to $\alpha$ and we search for a region $R_c$ minimizing $\beta(\theta)$. The search
is now limited to the real line. We shall assume that the function $f_q(q, \theta)$ has a
single maximum. This is the case for most practical tests.
FIGURE 9-10
Our objective is to test the hypothesis $\theta = \theta_0$ against each of the hypotheses
$\theta \ne \theta_0$, $\theta > \theta_0$, and $\theta < \theta_0$. To be concrete, we shall assume that the
function $f_q(q, \theta)$ is concentrated on the right of $f_q(q, \theta_0)$ for $\theta > \theta_0$ and on its
left for $\theta < \theta_0$ as in Fig. 9-10.
$H_1$: $\theta \ne \theta_0$
Under the stated assumptions, the most likely values of q are on the right of
$f_q(q, \theta_0)$ if $\theta > \theta_0$ and on its left if $\theta < \theta_0$. It is, therefore, desirable to reject $H_0$
if $q < c_1$ or if $q > c_2$. The resulting critical region consists of the half-lines
$q < c_1$ and $q > c_2$. For convenience, we shall select the constants $c_1$ and $c_2$
such that
$$P\{q < c_1 \mid H_0\} = \frac{\alpha}{2} \qquad P\{q > c_2 \mid H_0\} = \frac{\alpha}{2}$$
Denoting by $q_u$ the u percentile of q under hypothesis $H_0$, we conclude that
$c_1 = q_{\alpha/2}$, $c_2 = q_{1-\alpha/2}$. This yields the following test:
$$\text{Accept } H_0 \text{ iff } q_{\alpha/2} < q < q_{1-\alpha/2} \tag{9-59a}$$
The resulting OC function equals
$$\beta(\theta) = \int_{q_{\alpha/2}}^{q_{1-\alpha/2}} f_q(q, \theta)\,dq \tag{9-60a}$$
$H_1$: $\theta > \theta_0$
Under hypothesis $H_1$, the most likely values of q are on the right of $f_q(q, \theta_0)$. It
is, therefore, desirable to reject $H_0$ if q > c. The resulting critical region is now
the half-line q > c where c is such that
$$P\{q > c \mid H_0\} = \alpha \qquad c = q_{1-\alpha}$$
and the following test results:
$$\text{Accept } H_0 \text{ iff } q < q_{1-\alpha} \tag{9-59b}$$
The resulting OC function equals
$$\beta(\theta) = \int_{-\infty}^{q_{1-\alpha}} f_q(q, \theta)\,dq \tag{9-60b}$$
$H_1$: $\theta < \theta_0$
Proceeding similarly, we obtain the critical region q < c where c is such that
$$P\{q < c \mid H_0\} = \alpha \qquad c = q_{\alpha}$$
This yields the following test:
$$\text{Accept } H_0 \text{ iff } q > q_{\alpha} \tag{9-59c}$$
The resulting OC function equals
$$\beta(\theta) = \int_{q_{\alpha}}^{\infty} f_q(q, \theta)\,dq \tag{9-60c}$$
The test of a hypothesis thus involves the following steps: Select a test
statistic q = g(X) and determine its density. Observe the sample X and compute
the function q = g(X). Assign a value to $\alpha$ and determine the critical
region $R_c$. Reject $H_0$ iff $q \in R_c$.
In the following, we give several illustrations of hypothesis testing. The
results are based on (9-59) and (9-60). In certain cases, the density of q is known
for $\theta = \theta_0$ only. This suffices to determine the critical region. The OC function
$\beta(\theta)$, however, cannot be determined.
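MEAN. We shall test the hypothesis $H_0$: $\eta = \eta_0$ that the mean $\eta$ of an RV x
equals a given constant $\eta_0$.
Known variance. We use as the test statistic the RV
$$\mathbf{q} = \frac{\bar{\mathbf{x}} - \eta_0}{\sigma/\sqrt{n}} \tag{9-61}$$
Under the familiar assumptions, $\bar{\mathbf{x}}$ is $N(\eta, \sigma/\sqrt{n})$; hence q is $N(\eta_q, 1)$ where
$$\eta_q = \frac{\eta - \eta_0}{\sigma/\sqrt{n}} \tag{9-62}$$
Under hypothesis $H_0$, q is N(0, 1). Replacing in (9-59) and (9-60) the $q_u$
percentile by the standard normal percentile $z_u$, we obtain the following test:
$H_1$: $\eta \ne \eta_0$	Accept $H_0$ iff $z_{\alpha/2} < q < z_{1-\alpha/2}$	(9-63a)
$\beta(\eta) = P\{|q| < z_{1-\alpha/2} \mid H_1\} = G(z_{1-\alpha/2} - \eta_q) - G(z_{\alpha/2} - \eta_q)$	(9-64a)
$H_1$: $\eta > \eta_0$	Accept $H_0$ iff $q < z_{1-\alpha}$	(9-63b)
$\beta(\eta) = P\{q < z_{1-\alpha} \mid H_1\} = G(z_{1-\alpha} - \eta_q)$	(9-64b)
$H_1$: $\eta < \eta_0$	Accept $H_0$ iff $q > z_{\alpha}$	(9-63c)
$\beta(\eta) = P\{q > z_{\alpha} \mid H_1\} = 1 - G(z_{\alpha} - \eta_q)$	(9-64c)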
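The two-sided test (9-63a) and its OC function (9-64a) can be sketched in a few lines of Python. The function names are hypothetical; `statistics.NormalDist` from the standard library supplies the normal distribution G and its percentiles, so the exact percentile 1.96 is used where a table value might be rounded.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # G(x) is STANDARD_NORMAL.cdf(x)


def mean_test_known_variance(sample_mean, eta0, sigma, n, alpha=0.05):
    """Two-sided test of H0: eta = eta0; returns (q, accept_h0)."""
    q = (sample_mean - eta0) / (sigma / sqrt(n))
    z_hi = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)   # z_{1 - alpha/2} = -z_{alpha/2}
    return q, -z_hi < q < z_hi


def oc_function(eta, eta0, sigma, n, alpha=0.05):
    """Type II error probability beta(eta) of the two-sided test."""
    eta_q = (eta - eta0) / (sigma / sqrt(n))
    z_hi = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    return STANDARD_NORMAL.cdf(z_hi - eta_q) - STANDARD_NORMAL.cdf(-z_hi - eta_q)


if __name__ == "__main__":
    # Sample mean 110.12 from 25 measurements, sigma = 0.4, H0: eta = 110.
    print(mean_test_known_variance(110.12, 110.0, 0.4, 25))
    print(oc_function(110.2, 110.0, 0.4, 25))
```

At $\eta = \eta_0$ the OC function equals $1 - \alpha$, which gives a quick sanity check on any implementation.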
Unknown variance. We assume that x is normal and use as the test statistic the
RV
$$\mathbf{q} = \frac{\bar{\mathbf{x}} - \eta_0}{\mathbf{s}/\sqrt{n}} \tag{9-65}$$
where $s^2$ is the sample variance of x. Under hypothesis $H_0$, the RV q has a
Student-t distribution with n - 1 degrees of freedom. We can, therefore, use
(9-59) where we replace $q_u$ by the tabulated $t_u(n - 1)$ percentile. To find $\beta(\eta)$,
we must find the distribution of q for $\eta \ne \eta_0$.
Example 9-14. We measure the voltage V of a voltage source 25 times and we find
$\bar{x} = 110.12$ V (see also Example 9-3). Test the hypothesis $V = V_0 = 110$ V against
$V \ne 110$ V with $\alpha = 0.05$. Assume that the measurement error $\nu$ is $N(0, \sigma)$.
(a)	Suppose that $\sigma = 0.4$ V. In this problem, $z_{1-\alpha/2} = z_{0.975} \simeq 2$:
$$q = \frac{110.12 - 110}{0.4/\sqrt{25}} = 1.5$$
Since 1.5 is in the interval (-2, 2), we accept $H_0$.
(b)	Suppose that $\sigma$ is unknown. From the measurements we find s = 0.6 V.
Inserting into (9-65), we obtain
$$q = \frac{110.12 - 110}{0.6/\sqrt{25}} = 1$$
Table 9-3 yields $t_{1-\alpha/2}(n - 1) = t_{0.975}(24) = 2.06 = -t_{0.025}(24)$. Since 1 is in the
interval (-2.06, 2.06), we accept $H_0$.
PROBABILITY. We shall test the hypothesis $H_0$: $p = p_0 = 1 - q_0$ that the
probability $p = P(\mathcal{A})$ of an event $\mathcal{A}$ equals a given constant $p_0$, using as data
the number k of successes of $\mathcal{A}$ in n trials. The RV k has a binomial
distribution and for large n it is $N(np, \sqrt{npq})$. We shall assume that n is large.
The test will be based on the test statistic
$$\mathbf{q} = \frac{\mathbf{k} - np_0}{\sqrt{np_0q_0}} \tag{9-66}$$
Under hypothesis $H_0$, q is N(0, 1). The test thus proceeds as in (9-63).
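To find the OC function $\beta(p)$, we must determine the distribution of q
under the alternative hypothesis. Since k is normal, q is also normal with
$$\eta_q = \frac{np - np_0}{\sqrt{np_0q_0}} \qquad \sigma_q^2 = \frac{npq}{np_0q_0}$$
This yields the following test:
$H_1$: $p \ne p_0$	Accept $H_0$ iff $z_{\alpha/2} < q < z_{1-\alpha/2}$	(9-67a)
$$\beta(p) = P\{|q| < z_{1-\alpha/2} \mid H_1\} = G\left(\frac{z_{1-\alpha/2} - \eta_q}{\sqrt{pq/p_0q_0}}\right) - G\left(\frac{z_{\alpha/2} - \eta_q}{\sqrt{pq/p_0q_0}}\right) \tag{9-68a}$$
$H_1$: $p > p_0$	Accept $H_0$ iff $q < z_{1-\alpha}$	(9-67b)
$$\beta(p) = P\{q < z_{1-\alpha} \mid H_1\} = G\left(\frac{z_{1-\alpha} - \eta_q}{\sqrt{pq/p_0q_0}}\right) \tag{9-68b}$$
$H_1$: $p < p_0$	Accept $H_0$ iff $q > z_{\alpha}$	(9-67c)
$$\beta(p) = P\{q > z_{\alpha} \mid H_1\} = 1 - G\left(\frac{z_{\alpha} - \eta_q}{\sqrt{pq/p_0q_0}}\right) \tag{9-68c}$$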
Example 9-15. We wish to test the hypothesis that a coin is fair against the
hypothesis that it is loaded in favor of "heads":
$H_0$: p = 0.5 against $H_1$: p > 0.5
We toss the coin 100 times and "heads" shows 62 times. Does the evidence
support the rejection of the null hypothesis with significance level $\alpha = 0.05$? In
this example, $z_{1-\alpha} = z_{0.95} = 1.645$. Since
$$q = \frac{62 - 50}{\sqrt{25}} = 2.4 > 1.645$$
the fair-coin hypothesis is rejected.
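The coin test of Example 9-15 takes only a few lines to reproduce. This sketch assumes the large-n normal approximation used in the text; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_test_upper(k, n, p0, alpha=0.05):
    """Test H0: p = p0 against H1: p > p0; returns (q, reject_h0)."""
    q = (k - n * p0) / sqrt(n * p0 * (1 - p0))
    z = NormalDist().inv_cdf(1 - alpha)   # z_{1 - alpha}, about 1.645 for alpha = 0.05
    return q, q > z


if __name__ == "__main__":
    # 62 heads in 100 tosses, fair-coin null hypothesis.
    print(proportion_test_upper(62, 100, 0.5))
```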
VARIANCE. The RV x is $N(\eta, \sigma)$. We wish to test the hypothesis $H_0$: $\sigma = \sigma_0$.
Known mean. We use as test statistic the RV
$$\mathbf{q} = \sum_i \left(\frac{\mathbf{x}_i - \eta}{\sigma_0}\right)^2 \tag{9-69}$$
Under hypothesis $H_0$, this RV is $\chi^2(n)$. We can, therefore, use (9-59) where $q_u$
equals the $\chi_u^2(n)$ percentile.
Unknown mean. We use as the test statistic the RV
$$\mathbf{q} = \sum_i \left(\frac{\mathbf{x}_i - \bar{\mathbf{x}}}{\sigma_0}\right)^2 \tag{9-70}$$
Under hypothesis $H_0$, this RV is $\chi^2(n - 1)$. We can, therefore, use (9-59) with
$q_u = \chi_u^2(n - 1)$.
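Example 9-16. Suppose that in Example 9-14, the variance $\sigma^2$ of the measurement
error is unknown. Test the hypothesis $H_0$: $\sigma = 0.4$ against $H_1$: $\sigma > 0.4$ with
$\alpha = 0.05$ using 20 measurements $x_i = V + \nu_i$.
(a) Assume that V = 110 V. Inserting the measurements $x_i$ into (9-69), we
find
$$q = \sum_{i=1}^{20} \left(\frac{x_i - 110}{0.4}\right)^2 = 36.2$$
Since $\chi_{1-\alpha}^2(n) = \chi_{0.95}^2(20) = 31.41 < 36.2$, we reject $H_0$.
(b) If V is unknown, we use (9-70). This yields
$$q = \sum_{i=1}^{20} \left(\frac{x_i - \bar{x}}{0.4}\right)^2 = 22.5$$
Since $\chi_{1-\alpha}^2(n - 1) = \chi_{0.95}^2(19) = 30.14 > 22.5$, we accept $H_0$.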
DISTRIBUTIONS. In this application, $H_0$ does not involve a parameter; it is the
hypothesis that the distribution F(x) of an RV x equals a given function $F_0(x)$.
Thus
$H_0$: $F(x) \equiv F_0(x)$ against $H_1$: $F(x) \not\equiv F_0(x)$
The Kolmogoroff-Smirnov test. We form the random process $\hat{F}(x)$ as in the
estimation problem (see page 256) and use as the test statistic the RV
$$\mathbf{q} = \max_x |\hat{F}(x) - F_0(x)| \tag{9-71}$$
This choice is based on the following observations: For a specific $\zeta$, the function
$\hat{F}(x)$ is the empirical estimate of F(x) [see (4-3)]; it tends, therefore, to F(x) as
$n \to \infty$. From this it follows that
$$E\{\hat{F}(x)\} = F(x) \qquad \hat{F}(x) \to F(x) \quad (n \to \infty)$$
This shows that for large n, q is close to 0 if $H_0$ is true and it is close to
$\max_x |F(x) - F_0(x)|$ if $H_1$ is true. It leads, therefore, to the conclusion that we must
reject $H_0$ if q is larger than some constant c. This constant is determined in
terms of the significance level $\alpha = P\{q > c \mid H_0\}$ and the distribution of q. Under
hypothesis $H_0$, the test statistic q equals the RV w in (9-28). Using the
Kolmogoroff approximation (9-29), we obtain
$$\alpha = P\{q > c \mid H_0\} \simeq 2e^{-2nc^2} \tag{9-72}$$
The test thus proceeds as follows: Form the empirical estimate $\hat{F}(x)$ of F(x)
and determine q from (9-71).
$$\text{Accept } H_0 \text{ iff } q < \sqrt{-\frac{1}{2n}\ln\frac{\alpha}{2}} \tag{9-73}$$
The resulting Type II error probability is reasonably small only if n is large.
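The Kolmogoroff-Smirnov procedure can be sketched as follows. The code assumes that the threshold in (9-73) is $c = \sqrt{-\ln(\alpha/2)/(2n)}$, that is, the solution of $\alpha = 2e^{-2nc^2}$; the function names are hypothetical, and the maximum in (9-71) is evaluated at the jump points of the empirical distribution, where it must occur.

```python
from math import log, sqrt


def ks_statistic(samples, cdf0):
    """q = max over x of |empirical CDF(x) - cdf0(x)|, evaluated at the jump points."""
    xs = sorted(samples)
    n = len(xs)
    q = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf0(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        q = max(q, i / n - f0, f0 - (i - 1) / n)
    return q


def ks_threshold(n, alpha):
    """Constant c solving alpha = 2 * exp(-2 * n * c**2)."""
    return sqrt(-log(alpha / 2) / (2 * n))


def ks_accept(samples, cdf0, alpha=0.05):
    """Accept H0 iff the statistic is below the threshold."""
    return ks_statistic(samples, cdf0) < ks_threshold(len(samples), alpha)


if __name__ == "__main__":
    uniform_cdf = lambda x: min(max(x, 0.0), 1.0)
    evenly_spread = [(i + 0.5) / 100 for i in range(100)]
    print(ks_statistic(evenly_spread, uniform_cdf), ks_threshold(100, 0.05))
```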
Chi-Square Tests
We are given a partition $U = [\mathcal{A}_1, \ldots, \mathcal{A}_m]$ of the space $\mathcal{S}$ and we wish to test
the hypothesis that the probabilities $p_i = P(\mathcal{A}_i)$ of the events $\mathcal{A}_i$ equal m
given constants $p_{0i}$:
$$H_0: p_i = p_{0i},\ \text{all } i \quad \text{against} \quad H_1: p_i \ne p_{0i},\ \text{some } i \tag{9-74}$$
using as data the number of successes $k_i$ of each of the events $\mathcal{A}_i$ in n trials.
For this purpose, we introduce the sum
$$\mathbf{q} = \sum_{i=1}^{m} \frac{(\mathbf{k}_i - np_{0i})^2}{np_{0i}} \tag{9-75}$$
known as Pearson's test statistic. As we know, the RVs $k_i$ have a binomial
distribution with mean $np_i$ and variance $np_iq_i$. Hence the ratio $k_i/n$ tends to $p_i$
as $n \to \infty$. From this it follows that the difference $|k_i - np_{0i}|$ is small if $p_i = p_{0i}$
and it increases as $|p_i - p_{0i}|$ increases. This justifies the use of the RV q as a
test statistic and the set q > c as the critical region of the test.
To find c, we must determine the distribution of q. We shall do so under
the assumption that n is large. For moderate values of n, we use computer
simulation [see (9-85)]. With this assumption, the RVs $k_i$ are nearly normal with
mean $np_i$. Under hypothesis $H_0$, the RV q has a $\chi^2(m - 1)$ distribution. This
follows from the fact that the constants $p_{0i}$ satisfy the constraint $\sum p_{0i} = 1$. The
proof, however, is rather involved.
The above leads to the following test: Observe the numbers $k_i$ and
compute the sum q in (9-75); find $\chi_{1-\alpha}^2(m - 1)$ from Table 9-3.
$$\text{Accept } H_0 \text{ iff } q < \chi_{1-\alpha}^2(m - 1) \tag{9-76}$$
We note that the chi-square test is reduced to the test (9-68) involving the
probability p of an event $\mathcal{A}$. In this case, the partition equals $[\mathcal{A}, \bar{\mathcal{A}}]$ and
the statistic q in (9-75) equals $(k - np_0)^2/np_0q_0$ where $p_0 = p_{01}$, $q_0 = p_{02}$,
$k = k_1$, and $n - k = k_2$ (see Prob. 9-40).
Example 9-17. We roll a die 300 times and we observe that $f_i$ shows $k_i$ = 55 43 44
61 40 57 times. Test the hypothesis that the die is fair with $\alpha = 0.05$. In this
problem, $p_{0i} = 1/6$, m = 6, and $np_{0i} = 50$. Inserting into (9-75), we obtain
$$q = \sum_{i=1}^{6} \frac{(k_i - 50)^2}{50} = 7.6$$
Since $\chi_{0.95}^2(5) = 11.07 > 7.6$, we accept the fair-die hypothesis.
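Pearson's statistic (9-75) is straightforward to compute. In this sketch the critical value is passed in from a table (the standard library has no chi-square percentile function), and the function names are hypothetical.

```python
def pearson_statistic(counts, probabilities):
    """q = sum over i of (k_i - n * p0_i)**2 / (n * p0_i), with n the total count."""
    n = sum(counts)
    return sum((k - n * p) ** 2 / (n * p) for k, p in zip(counts, probabilities))


def chi_square_accept(counts, probabilities, critical_value):
    """Accept H0 iff q is below the tabulated percentile chi2_{1-alpha}(m - 1)."""
    return pearson_statistic(counts, probabilities) < critical_value


if __name__ == "__main__":
    die_counts = [55, 43, 44, 61, 40, 57]        # data of Example 9-17
    fair = [1 / 6] * 6
    print(pearson_statistic(die_counts, fair))
    print(chi_square_accept(die_counts, fair, critical_value=11.07))
```

The same function applies unchanged to the interval counts of a goodness-of-fit test, with `probabilities` set to the hypothesized interval probabilities.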
The chi-square test is used in goodness-of-fit tests involving the agreement
between experimental data and theoretical models. We next give two illustrations.
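TESTS OF INDEPENDENCE. We shall test the hypothesis that two events $\mathcal{B}$ and
$\mathcal{C}$ are independent:
$$H_0: P(\mathcal{B} \cap \mathcal{C}) = P(\mathcal{B})P(\mathcal{C}) \quad \text{against} \quad H_1: P(\mathcal{B} \cap \mathcal{C}) \ne P(\mathcal{B})P(\mathcal{C}) \tag{9-77}$$
under the assumption that the probabilities $b = P(\mathcal{B})$ and $c = P(\mathcal{C})$ of these
events are known. To do so, we apply the chi-square test to the partition
consisting of the four events
$$\mathcal{A}_1 = \mathcal{B} \cap \mathcal{C} \qquad \mathcal{A}_2 = \mathcal{B} \cap \bar{\mathcal{C}} \qquad \mathcal{A}_3 = \bar{\mathcal{B}} \cap \mathcal{C} \qquad \mathcal{A}_4 = \bar{\mathcal{B}} \cap \bar{\mathcal{C}}$$
Under hypothesis $H_0$, the components of each of the events $\mathcal{A}_i$ are independent.
Hence
$$p_{01} = bc \qquad p_{02} = b(1 - c) \qquad p_{03} = (1 - b)c \qquad p_{04} = (1 - b)(1 - c)$$
This yields the following test:
$$\text{Accept } H_0 \text{ iff } \sum_{i=1}^{4} \frac{(k_i - np_{0i})^2}{np_{0i}} < \chi_{1-\alpha}^2(3) \tag{9-78}$$
In the above, $k_i$ is the number of occurrences of the event $\mathcal{A}_i$; for example, $k_2$
is the number of times $\mathcal{B}$ occurs but $\mathcal{C}$ does not occur.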
Example 9-18. In a certain university, 60 percent of all first-year students are male
and 75 percent of all entering students graduate. We select at random the records
of 299 males and 101 females and we find that 168 males and 68 females
graduated. Test the hypothesis that the events $\mathcal{B}$ = {male} and $\mathcal{C}$ = {graduate} are
independent with $\alpha = 0.05$. In this problem, n = 400, $P(\mathcal{B}) = 0.6$, $P(\mathcal{C}) = 0.75$,
$p_{0i}$ = 0.45 0.15 0.3 0.1, $k_i$ = 168 68 131 33, and (9-75) yields
$$q = \sum_{i=1}^{4} \frac{(k_i - 400p_{0i})^2}{400p_{0i}} = 4.1$$
Since $\chi_{0.95}^2(3) = 7.81 > 4.1$, we accept the independence hypothesis.
TESTS OF DISTRIBUTIONS. We introduced earlier the problem of testing the
hypothesis that the distribution F(x) of an RV x equals a given function $F_0(x)$.
The resulting test is reliable only if the number of available samples $x_j$ of x is
very large. In the following, we test the hypothesis that $F(x) = F_0(x)$ not at
every x but only at a set of m - 1 points $a_i$ (Fig. 9-11):
$$H_0: F(a_i) = F_0(a_i),\ 1 \le i \le m - 1 \quad \text{against} \quad H_1: F(a_i) \ne F_0(a_i),\ \text{some } i \tag{9-79}$$
FIGURE 9-11
We introduce the m events
$$\mathcal{A}_i = \{a_{i-1} < x \le a_i\} \qquad i = 1, \ldots, m$$
where $a_0 = -\infty$ and $a_m = \infty$. These events form a partition of $\mathcal{S}$. The number
$k_i$ of successes of $\mathcal{A}_i$ equals the number of samples $x_j$ in the interval $(a_{i-1}, a_i)$.
Under hypothesis $H_0$,
$$P(\mathcal{A}_i) = F_0(a_i) - F_0(a_{i-1}) = p_{0i}$$
Thus, to test the hypothesis (9-79), we form the sum q in (9-75) and apply
(9-76). If $H_0$ is rejected, then the hypothesis that $F(x) = F_0(x)$ is also rejected.
Example 9-19. We have a list of 500 computer-generated decimal numbers and
we wish to test the hypothesis that they are the samples of an RV x uniformly
distributed in the interval (0,1). We divide this interval into 10 subintervals of
length 0.1 and we count the number $k_i$ of samples $x_j$ that are in the ith
subinterval. The results are
$k_i$ = 43 56 42 38 59 61 41 57 46 57
In this problem, n = 500, $p_{0i} = 0.1$, and
$$q = \sum_{i=1}^{10} \frac{(k_i - 50)^2}{50} = 13.8$$
Since $\chi_{0.95}^2(9) = 16.9 > 13.8$ we accept the uniformity hypothesis.
Likelihood Ratio Test
We conclude with a general method for testing any hypothesis, simple or
composite. We are given an RV x with density $f(x, \theta)$, where $\theta$ is an arbitrary
parameter, scalar or vector, and we wish to test the hypothesis $H_0$: $\theta \in \Theta_0$
against $H_1$: $\theta \in \Theta_1$. The sets $\Theta_0$ and $\Theta_1$ are subsets of the parameter space
$\Theta = \Theta_0 \cup \Theta_1$.
The density $f(X, \theta)$, considered as a function of $\theta$, is the likelihood
function of X. We denote by $\theta_m$ the value of $\theta$ for which $f(X, \theta)$ is maximum in
the space $\Theta$. Thus $\theta_m$ is the ML estimate of $\theta$. The value of $\theta$ for which $f(X, \theta)$
is maximum in the set $\Theta_0$ will be denoted by $\theta_{m0}$. If $H_0$ is the simple hypothesis
$\theta = \theta_0$, then $\theta_{m0} = \theta_0$. The maximum likelihood (ML) test is a test based on the
statistic
$$\lambda = \frac{f(X, \theta_{m0})}{f(X, \theta_m)} \tag{9-80}$$
Note that
$$0 \le \lambda \le 1$$
because $f(X, \theta_{m0}) \le f(X, \theta_m)$. We maintain that $\lambda$ is concentrated near 1 if $H_0$
is true. As we know [see (9-41)], the ML estimate $\theta_m$ of $\theta$ tends to its true value
$\theta^*$ as $n \to \infty$. Furthermore, under the null hypothesis, $\theta^*$ is in the set $\Theta_0$;
hence $\lambda \to 1$ as $n \to \infty$. From this it follows that we must reject $H_0$ if $\lambda < c$.
The constant c is determined in terms of the significance level $\alpha$ of the test.
Suppose, first, that $H_0$ is the simple hypothesis $\theta = \theta_0$. In this case,
$$\alpha = P\{\lambda \le c \mid H_0\} = \int_0^c f_\lambda(\lambda, \theta_0)\,d\lambda \tag{9-81}$$
This leads to the following test: Using the samples $x_i$ of x, form the likelihood
function $f(X, \theta)$. Find $\theta_m$ and $\theta_{m0}$ and form the ratio $\lambda = f(X, \theta_{m0})/f(X, \theta_m)$:
$$\text{Reject } H_0 \text{ iff } \lambda < \lambda_\alpha \tag{9-82}$$
where $\lambda_\alpha$ is the $\alpha$ percentile of the test statistic $\lambda$ under hypothesis $H_0$.
If $H_0$ is a composite hypothesis, c is the smallest constant such that
$P\{\lambda \le c\} \le \alpha$ for every $\theta \in \Theta_0$.
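Example 9-20. Suppose that $f(x, \theta) = \theta e^{-\theta x} U(x)$. We shall test the hypothesis
$H_0$: $0 < \theta \le \theta_0$ against $H_1$: $\theta > \theta_0$
In this problem, $\Theta_0$ is the segment $0 < \theta \le \theta_0$ of the real line and $\Theta$ is the
half-line $\theta > 0$. Thus both hypotheses are composite. The likelihood function
$$f(X, \theta) = \theta^n e^{-n\bar{x}\theta}$$
is shown in Fig. 9-12a for $\bar{x} > 1/\theta_0$ and $\bar{x} < 1/\theta_0$. In the half-line $\theta > 0$ this
function is maximum for $\theta = 1/\bar{x}$. In the interval $0 < \theta \le \theta_0$ it is maximum for
$\theta = 1/\bar{x}$ if $\bar{x} > 1/\theta_0$ and for $\theta = \theta_0$ if $\bar{x} < 1/\theta_0$. Hence
$$\theta_m = \frac{1}{\bar{x}} \qquad \theta_{m0} = \begin{cases} 1/\bar{x} & \text{for } \bar{x} > 1/\theta_0 \\ \theta_0 & \text{for } \bar{x} < 1/\theta_0 \end{cases}$$
FIGURE 9-12
The likelihood ratio equals (Fig. 9-12b)
$$\lambda = \begin{cases} 1 & \text{for } \bar{x} > 1/\theta_0 \\ (\bar{x}\theta_0)^n e^{-n\theta_0\bar{x} + n} & \text{for } \bar{x} < 1/\theta_0 \end{cases}$$
We reject $H_0$ if $\lambda < c$ or, equivalently, if $\bar{x} < c_1$, where $c_1$ equals the $\alpha$ percentile
of the RV $\bar{\mathbf{x}}$.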
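The likelihood ratio of Example 9-20 follows directly from the two maximizers. The sketch below computes $f(X, \theta_{m0})/f(X, \theta_m)$ for the exponential density, using $f(X, \theta) = \theta^n e^{-n\bar{x}\theta}$; the function name is hypothetical. It also shows why rejecting for small $\lambda$ is the same as rejecting for a small sample mean: below $1/\theta_0$ the ratio increases with $\bar{x}$.

```python
from math import exp


def likelihood_ratio(sample_mean, n, theta0):
    """lambda = f(X, theta_m0) / f(X, theta_m) for H0: 0 < theta <= theta0."""
    if sample_mean >= 1 / theta0:
        return 1.0   # the unconstrained maximizer 1/sample_mean already lies in the H0 set
    u = theta0 * sample_mean
    # Ratio theta0**n * exp(-n*xbar*theta0) / ((1/xbar)**n * exp(-n)).
    return u ** n * exp(n * (1 - u))


if __name__ == "__main__":
    for xbar in (0.1, 0.25, 0.4, 0.5, 1.0):
        print(xbar, likelihood_ratio(xbar, n=10, theta0=2.0))
```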
To carry out a likelihood ratio test, we must determine the density of $\lambda$.
This is not always a simple task. The following theorem simplifies the problem
for large n.
ASYMPTOTIC PROPERTIES. We denote by m and $m_0$ the number of free
parameters in $\Theta$ and $\Theta_0$ respectively, that is, the number of parameters that
take noncountably many values. It can be shown that if $m > m_0$, then the
distribution of the RV $w = -2\ln\lambda$ approaches a chi-square distribution with
$m - m_0$ degrees of freedom as $n \to \infty$. The function $w = -2\ln\lambda$ is monotone
decreasing; hence $\lambda < c$ iff $w > c_1 = -2\ln c$. From this it follows that
$$\alpha = P\{\lambda < c\} = P\{w > c_1\}$$
where $c_1 = \chi_{1-\alpha}^2(m - m_0)$, and (9-82) yields the following test:
$$\text{Reject } H_0 \text{ iff } -2\ln\lambda > \chi_{1-\alpha}^2(m - m_0) \tag{9-83}$$
We give next an example illustrating the theorem.
Example 9-21. We are given an $N(\eta, 1)$ RV x and we wish to test the simple
hypothesis $\eta = \eta_0$ against $\eta \ne \eta_0$. In this problem $\eta_{m0} = \eta_0$ and
$$f(X, \eta) = \frac{1}{\sqrt{(2\pi)^n}}\exp\left\{-\frac{1}{2}\sum (x_i - \eta)^2\right\}$$
This is maximum if the sum [see (8-66)]
$$\sum (x_i - \eta)^2 = \sum (x_i - \bar{x})^2 + n(\bar{x} - \eta)^2$$
is minimum, that is, if $\eta = \bar{x}$. Hence $\eta_m = \bar{x}$ and
$$\lambda = \frac{\exp\left\{-\frac{1}{2}\sum (x_i - \eta_0)^2\right\}}{\exp\left\{-\frac{1}{2}\sum (x_i - \bar{x})^2\right\}} = \exp\left\{-\frac{n}{2}(\bar{x} - \eta_0)^2\right\}$$
From the above it follows that $\lambda > c$ iff $|\bar{x} - \eta_0| < c_1$. This shows that the
likelihood ratio test of the mean of a normal RV is equivalent to the test (9-63a).
Note that in this problem, m = 1 and $m_0 = 0$. Furthermore,
$$w = -2\ln\lambda = n(\bar{x} - \eta_0)^2$$
But the right side is an RV with $\chi^2(1)$ distribution. Hence the RV w has a
$\chi^2(m - m_0)$ distribution not only asymptotically, but for any n.
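The identity $-2\ln\lambda = n(\bar{x} - \eta_0)^2$ of Example 9-21 can be verified numerically by computing the likelihood ratio from its definition. This sketch uses hypothetical function names and an arbitrary illustrative data set; the constant factor of the normal density cancels in the ratio and is omitted.

```python
def log_likelihood(samples, eta):
    """ln f(X, eta) for N(eta, 1) samples, dropping the constant -(n/2) ln(2 pi)."""
    return -0.5 * sum((x - eta) ** 2 for x in samples)


def w_from_definition(samples, eta0):
    """w = -2 ln(lambda), with lambda = f(X, eta0) / f(X, sample mean)."""
    sample_mean = sum(samples) / len(samples)
    return -2.0 * (log_likelihood(samples, eta0) - log_likelihood(samples, sample_mean))


def w_closed_form(samples, eta0):
    """n * (sample mean - eta0)**2."""
    n = len(samples)
    sample_mean = sum(samples) / n
    return n * (sample_mean - eta0) ** 2


if __name__ == "__main__":
    data = [0.3, -1.2, 2.1, 0.7, 1.4]   # illustrative values, not from the text
    print(w_from_definition(data, 0.0), w_closed_form(data, 0.0))
```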
COMPUTER SIMULATION IN HYPOTHESIS TESTING. As we have seen, the test
of a hypothesis $H_0$ involves the following steps: We determine the value X of
the random vector $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_m]$ in terms of the observations $x_k$ of the m
RVs $\mathbf{x}_k$ and compute the corresponding value q = g(X) of the test statistic
$\mathbf{q} = g(\mathbf{X})$. We accept $H_0$ if q is not in the critical region of the test, for
example, if q is in the interval $(q_a, q_b)$ where $q_a$ and $q_b$ are appropriately
chosen values of the u percentile $q_u$ of q [see (9-59)]. This involves the
determination of the distribution F(q) of q and the inverse $q_u = F^{(-1)}(u)$ of
F(q). The inversion problem can be avoided if we use the following approach.
The function F(q) is monotone increasing. Hence,
$$q_a < q < q_b \quad \text{iff} \quad a = F(q_a) < F(q) < F(q_b) = b$$
This shows that the test $q_a < q < q_b$ is equivalent to the test
$$\text{Accept } H_0 \text{ iff } a < F(q) < b \tag{9-84}$$
involving the determination of the distribution F(q) of q. As we have shown in
Sec. 8-3, the function F(q) can be determined by computer simulation [see
(8-163)]:
To estimate numerically F(q) we construct the RV vector sequence
$$X_i = [x_{1,i}, \ldots, x_{m,i}] \qquad i = 1, \ldots, n$$
where $x_{k,i}$ are the computer generated samples of the m RVs $\mathbf{x}_k$. Using the
sequence $X_i$, we form the RV sequence $q_i = g(X_i)$ and we count the number
$n_q$ of $q_i$'s that are smaller than the computed q. Inserting into (8-163), we
obtain the estimate $F(q) \simeq n_q/n$. With F(q) so determined, (9-84) yields the
test
$$\text{Accept } H_0 \text{ iff } a < \frac{n_q}{n} < b \tag{9-85}$$
In the above, q = g(X) is a number determined in terms of the experimental
data $x_k$. The sequence $q_i$, however, is computer generated.
The above approach is used if it is difficult to determine analytically the
function F(q). This is the case in the determination of Pearson's test statistic
(9-75).
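The simulation test (9-85) can be sketched for Pearson's statistic as follows. The observed q is compared with computer-generated values $q_i$ obtained under $H_0$, and the fraction $n_q/n$ of simulated values below q estimates F(q). The function names, the seed, and the number of runs are illustrative assumptions, and the estimate carries Monte Carlo error.

```python
import random


def pearson_statistic(counts, probabilities):
    """q = sum over i of (k_i - n * p0_i)**2 / (n * p0_i)."""
    n = sum(counts)
    return sum((k - n * p) ** 2 / (n * p) for k, p in zip(counts, probabilities))


def simulated_cdf_at(q_observed, probabilities, n_trials, n_runs, seed=0):
    """Estimate F(q_observed) as n_q / n_runs under H0, with n_q the count of simulated q_i below it."""
    rng = random.Random(seed)
    outcomes = range(len(probabilities))
    n_q = 0
    for _ in range(n_runs):
        draws = rng.choices(outcomes, weights=probabilities, k=n_trials)
        counts = [0] * len(probabilities)
        for outcome in draws:
            counts[outcome] += 1
        if pearson_statistic(counts, probabilities) < q_observed:
            n_q += 1
    return n_q / n_runs


if __name__ == "__main__":
    fair = [1 / 6] * 6
    q = pearson_statistic([55, 43, 44, 61, 40, 57], fair)   # die data of Example 9-17
    f_hat = simulated_cdf_at(q, fair, n_trials=300, n_runs=2000)
    # One-sided critical region q > c: accept H0 iff the estimate of F(q) is below 1 - alpha.
    print(q, f_hat, f_hat < 0.95)
```

For the one-sided region used with Pearson's statistic, a = 0 and b = 1 - $\alpha$ in (9-85); no chi-square table is needed.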
PROBLEMS
9-1. The diameter of cylindrical rods coming out of a production line is a normal RV $x$
with $\sigma = 0.1$ mm. We measure $n = 9$ units and find that the average of the
measurements is $\bar{x} = 91$ mm. (a) Find $c$ such that with a 0.95 confidence coefficient,
the mean $\eta$ of $x$ is in the interval $\bar{x} \pm c$. (b) We claim that $\eta$ is in the
interval (90.95, 91.05). Find the confidence coefficient of our claim.
9-2. The length of a product is an RV $x$ with $\sigma = 1$ mm and unknown mean. We
measure four units and find that $\bar{x} = 203$ mm. (a) Assuming that $x$ is a normal RV,
find the 0.95 confidence interval of $\eta$. (b) The distribution of $x$ is unknown. Using
Tchebycheff's inequality, find $c$ such that with confidence coefficient 0.95, $\eta$ is in
the interval $203 \pm c$.
9-3. We know from past records that the life length of type A tires is an RV $x$ with
$\sigma = 5000$ miles. We test 64 samples and find that their average life length is
$\bar{x} = 25{,}000$ miles. Find the 0.9 confidence interval of the mean of $x$.
9-4. We wish to determine the length $a$ of an object. We use as an estimate of $a$ the
average $\bar{x}$ of $n$ measurements. The measurement error is approximately normal
with zero mean and standard deviation 0.1 mm. Find $n$ such that with 95 percent
confidence, $\bar{x}$ is within $\pm 0.2$ mm of $a$.
9-5. The RV $x$ is uniformly distributed in the interval $\theta - 2 < x < \theta + 2$. We observe
100 samples $x_i$ and find that their average equals $\bar{x} = 30$. Find the 0.95 confidence
interval of $\theta$.
9-6. Consider an RV $x$ with density $f(x) = x e^{-x} U(x)$. Predict with 95 percent confidence
that the next value of $x$ will be in the interval $(a, b)$. Show that the length
$b - a$ of this interval is minimum if $a$ and $b$ are such that
$f(a) = f(b)$  $P\{a < x < b\} = 0.95$
Find $a$ and $b$.
9-7. (Estimation-prediction) The time to failure of electric bulbs of brand A is a normal
RV with $\sigma = 10$ hours and unknown mean. We have used 20 such bulbs and have
observed that the average $\bar{x}$ of their time to failure is 80 hours. We buy a new bulb
of the same brand and wish to predict with 95 percent confidence that its time to
failure will be in the interval $80 \pm c$. Find $c$.
9-8. Suppose that the time between arrivals of patients in a dentist's office constitutes
samples of an RV $x$ with density $\theta e^{-\theta x} U(x)$. The 40th patient arrived 4 hours after
the first. Find the 0.95 confidence interval of the mean arrival time $\eta = 1/\theta$.
9-9. The number of particles emitted from a radioactive substance in 1 second is a
Poisson distributed RV with mean $\lambda$. It was observed that in 200 seconds, 2550
particles were emitted. Find the 0.95 confidence interval of $\lambda$.
9-10. Among 4000 newborns, 2080 are male. Find the 0.99 confidence interval of the
probability $p = P\{\text{male}\}$.
9-11. In an exit poll of 900 voters questioned, 360 responded that they favor a particular
proposition. On this basis, it was reported that 40 percent of the voters favor the
proposition. (a) Find the margin of error if the confidence coefficient of the results
is 0.95. (b) Find the confidence coefficient if the margin of error is $\pm 2$ percent.
9-12. In a market survey, it was reported that 29 percent of respondents favor product A.
The poll was conducted with confidence coefficient 0.95, and the margin of error
was $\pm 4$ percent. Find the number of respondents.
9-13. We plan a poll for the purpose of estimating the probability $p$ of Republicans in a
community. We wish our estimate to be within $\pm 0.02$ of $p$. How large should our
sample be if the confidence coefficient of the estimate is 0.95?
9-14. A coin is tossed once, and heads shows. Assuming that the probability $p$ of heads
is the value of an RV $p$ uniformly distributed in the interval (0.4, 0.6), find its
bayesian estimate.
9-15. The time to failure of a system is an RV $x$ with density $f(x, \theta) = \theta e^{-\theta x} U(x)$. We
wish to find the bayesian estimate $\hat{\theta}$ of $\theta$ in terms of the sample mean $\bar{x}$ of the $n$
samples $x_i$ of $x$. We assume that $\theta$ is the value of an RV $\theta$ with prior density
$f_\theta(\theta) = c e^{-c\theta} U(\theta)$. Show that
$\hat{\theta} = \frac{n + 1}{c + n\bar{x}} \to \frac{1}{\bar{x}}$ as $n \to \infty$
9-16. The RV $x$ has a Poisson distribution with mean $\theta$. We wish to find the bayesian
estimate $\hat{\theta}$ of $\theta$ under the assumption that $\theta$ is the value of an RV $\theta$ with prior
density $f_\theta(\theta) \sim \theta^b e^{-c\theta} U(\theta)$. Show that
$\hat{\theta} = \frac{n\bar{x} + b + 1}{n + c}$
9-17. Suppose that the IQ scores of children in a certain grade are the samples of an
$N(\eta, \sigma)$ RV $x$. We test 10 children and obtain the following averages: $\bar{x} = 90$,
$s = 5$. Find the 0.95 confidence interval of $\eta$ and of $\sigma$.
9-18. The RVs $x_i$ are i.i.d. and $N(0, \sigma)$. We observe that $x_1^2 + \cdots + x_{10}^2 = 4$. Find the
0.95 confidence interval of $\sigma$.
9-19. The readings of a voltmeter introduce an error $\nu$ with mean 0. We wish to
estimate its standard deviation $\sigma$. We measure a calibrated source $V = 3$ V four
times and obtain the values 2.90, 3.15, 3.05, and 2.96. Assuming that $\nu$ is normal,
find the 0.95 confidence interval of $\sigma$.
9-20. The RV $x$ has the Erlang density $f(x) \sim c^4 x^3 e^{-cx} U(x)$. We observe the samples
$x_i = 3.1, 3.4, 3.3$. Find the ML estimate $\hat{c}$ of $c$.
9-21. The RV $x$ has the truncated exponential density $f(x) = c e^{-c(x - x_0)} U(x - x_0)$. Find
the ML estimate $\hat{c}$ of $c$ in terms of the $n$ samples $x_i$ of $x$.
9-22. The time to failure of a bulb is an RV $x$ with density $c e^{-cx} U(x)$. We test 80 bulbs
and find that 200 hours later, 62 of them are still good. Find the ML estimate of $c$.
9-23. The RV $x$ has a Poisson distribution with mean $\theta$. Show that the ML estimate of $\theta$
equals $\bar{x}$.
9-24. Show that if $L(x, \theta) = \ln f(x, \theta)$ is the likelihood function of an RV $x$, then
9-25. We are given an RV $x$ with mean $\eta$ and standard deviation $\sigma = 2$, and we wish to
test the hypothesis $\eta = 8$ against $\eta = 8.7$ with $\alpha = 0.01$ using as the test statistic
the sample mean $\bar{x}$ of $n$ samples. (a) Find the critical region $R_c$ of the test and the
resulting $\beta$ if $n = 64$. (b) Find $n$ and $R_c$ if $\beta = 0.05$.
9-26. A new car is introduced with the claim that its average mileage in highway driving
is at least 28 miles per gallon. Seventeen cars are tested, and the following mileage
is obtained:
19 20 24 25 26 26.8 27.2 27.5
28 28.2 28.4 29 30 31 32 33.3 35
Can we conclude with significance level at most 0.05 that the claim is true?
9-27. The weights of cereal boxes are the values of an RV $x$ with mean $\eta$. We measure
64 boxes and find that $\bar{x} = 7.7$ oz. and $s = 1.5$ oz. Test the hypothesis $H_0$: $\eta = 8$
oz. against $H_1$: $\eta \ne 8$ oz. with $\alpha = 0.1$ and $\alpha = 0.01$.
9-28. Brand A batteries cost more than brand B batteries. Their life lengths are two RVs
$x$ and $y$. We test 16 batteries of brand A and 26 batteries of brand B and find these
values, in hours:
$\bar{x} = 4.6$  $s_x = 1.1$  $\bar{y} = 4.2$  $s_y = 0.9$
Test the hypothesis $\eta_x = \eta_y$ against $\eta_x > \eta_y$ with $\alpha = 0.05$.
9-29. A coin is tossed 64 times, and heads shows 22 times. Test the hypothesis that the
coin is fair with significance level 0.05. We toss a coin 16 times, and heads shows $k$
times. If $k$ is such that $k_1 \le k \le k_2$, we accept the hypothesis that the coin is fair
with significance level $\alpha = 0.05$. Find $k_1$ and $k_2$ and the resulting $\beta$ error.
9-30. In a production process, the number of defective units per hour is a Poisson
distributed RV $x$ with parameter $\lambda = 5$. A new process is introduced, and it is
observed that the hourly defectives in a 22-hour period are
$x_i$ = 3 0 5 4 2 6 4 1 5 3 7 4 0 8 3 2 4 3 6 5 6 9
Test the hypothesis $\lambda = 5$ against $\lambda < 5$ with $\alpha = 0.05$.
9-31. A die is tossed 102 times, and the $i$th face shows $k_i$ = 18, 15, 19, 17, 13, and 20
times. Test the hypothesis that the die is fair with $\alpha = 0.05$ using the chi-square
test.
9-32. A computer prints out 1000 numbers consisting of the 10 integers $j = 0, 1, \ldots, 9$.
The number $n_j$ of times $j$ appears equals
$n_j$ = 85 110 118 91 78 105 122 94 101 96
Test the hypothesis that the numbers $j$ are uniformly distributed between 0 and 9,
with $\alpha = 0.05$.
9-33. The number $x$ of particles emitted from a radioactive substance in 1 second is a
Poisson RV with mean $\theta$. In 50 seconds, 1058 particles are emitted. Test the
hypothesis $\theta_0 = 20$ against $\theta \ne 20$ with $\alpha = 0.05$ using the asymptotic
approximation.
9-34. The RVs $x$ and $y$ are $N(\eta_x, \sigma_x)$ and $N(\eta_y, \sigma_y)$ respectively and independent. Test
the hypothesis $\sigma_x = \sigma_y$ against $\sigma_x \ne \sigma_y$ using as the test statistic the ratio (see
Prob. 6-19)
$q = \frac{1}{m} \sum_{i=1}^{m} (x_i - \eta_x)^2 \Big/ \frac{1}{n} \sum_{i=1}^{n} (y_i - \eta_y)^2$
9-35. Show that the variance of an RV with Student-$t$ distribution $t(n)$ equals $n/(n - 2)$.
9-36. Find the probability $p_5$ that in a men's tennis tournament the final match will last
five games. (a) Assume that the probability $p$ that a player wins a set equals 0.5.
(b) Use bayesian statistic with uniform prior (see law of succession).
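9-37. Show that in the measurement problem of Example 9-9, the bayesian estimate $\hat{\theta}$ of
the parameter $\theta$ equals
$\hat{\theta} = \frac{\sigma_1^2}{\sigma_0^2} \theta_0 + \frac{\sigma_1^2}{\sigma^2/n} \bar{x}$  where  $\sigma_1^2 = \frac{\sigma^2}{n} \times \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2/n}$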
9-38. Using the ML method, find the $\gamma$ confidence interval of the variance $v = \sigma^2$ of an
$N(\eta, \sigma)$ RV with known mean.
9-39. Show that if $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased minimum variance estimators of a
parameter $\theta$, then $\hat{\theta}_1 = \hat{\theta}_2$. Hint: Form the RV $\hat{\theta} = (\hat{\theta}_1 + \hat{\theta}_2)/2$. Show that
$\sigma_{\hat{\theta}}^2 = \sigma^2 (1 + r)/2 \le \sigma^2$ where $\sigma^2$ is the common variance of $\hat{\theta}_1$ and $\hat{\theta}_2$ and $r$ is
their correlation coefficient.
9-40. The number of successes of an event $\mathcal{A}$ in $n$ trials equals $k_1$. Show that
$\frac{(k_1 - np_1)^2}{np_1} + \frac{(k_2 - np_2)^2}{np_2} = \frac{(k_1 - np_1)^2}{np_1 p_2}$
where $k_2 = n - k_1$ and $P(\mathcal{A}) = p_1 = 1 - p_2$.
PART II
STOCHASTIC PROCESSES
CHAPTER 10
GENERAL CONCEPTS
10-1 DEFINITIONS
As we recall, an RV $x$ is a rule for assigning to every outcome $\zeta$ of an
experiment $\mathcal{S}$ a number $x(\zeta)$. A stochastic process $x(t)$ is a rule for assigning to
every $\zeta$ a function $x(t, \zeta)$. Thus a stochastic process is a family of time functions
depending on the parameter $\zeta$ or, equivalently, a function of $t$ and $\zeta$. The
domain of $\zeta$ is the set of all experimental outcomes and the domain of $t$ is a set
$R$ of real numbers.
If $R$ is the real axis, then $x(t)$ is a continuous-time process. If $R$ is the set
of integers, then $x(t)$ is a discrete-time process. A discrete-time process is, thus,
a sequence of random variables. Such a sequence will be denoted by $x_n$ as in
Sec. 8-4, or, to avoid double indices, by $x[n]$.
We shall say that $x(t)$ is a discrete-state process if its values are countable.
Otherwise, it is a continuous-state process.
Most results in this investigation will be phrased in terms of continuous-
time processes. Topics dealing with discrete-time processes will be introduced
either as illustrations of the general theory, or when their discrete-time version
is not self-evident.
We shall use the notation $x(t)$ to represent a stochastic process omitting,
as in the case of random variables, its dependence on $\zeta$. Thus $x(t)$ has the
following interpretations:
1. It is a family (or an ensemble) of functions $x(t, \zeta)$. In this interpretation, $t$
and $\zeta$ are variables.
2. It is a single time function (or a sample of the given process). In this case, $t$ is
a variable and $\zeta$ is fixed.
3. If $t$ is fixed and $\zeta$ is variable, then $x(t)$ is a random variable equal to the state
of the given process at time $t$.
4. If $t$ and $\zeta$ are fixed, then $x(t)$ is a number.
A physical example of a stochastic process is the motion of microscopic
particles in collision with the molecules in a fluid (brownian motion). The
resulting process $x(t)$ consists of the motions of all particles (ensemble). A single
realization $x(t, \zeta_i)$ of this process (Fig. 10-1a) is the motion of a specific particle
(sample). Another example is the voltage
$x(t) = r \cos(\omega t + \varphi)$
of an ac generator with random amplitude $r$ and phase $\varphi$. In this case, the
process $x(t)$ consists of a family of pure sine waves and a single sample is the
function (Fig. 10-1b)
$x(t, \zeta_i) = r(\zeta_i) \cos[\omega t + \varphi(\zeta_i)]$
According to our definition, both examples are stochastic processes. There
is, however, a fundamental difference between them. The first example (regular)
consists of a family of functions that cannot be described in terms of a finite
number of parameters. Furthermore, the future of a sample $x(t, \zeta)$ of $x(t)$
cannot be determined in terms of its past. Finally, under certain conditions, the
statistics† of a regular process $x(t)$ can be determined in terms of a single
sample (see Sec. 13-1). The second example (predictable) consists of a family of
pure sine waves and it is completely specified in terms of the RVs $r$ and $\varphi$.
Furthermore, if $x(t, \zeta)$ is known for $t \le t_0$, then it is determined for $t > t_0$.
Finally, a single sample $x(t, \zeta)$ of $x(t)$ does not specify the properties of the
entire process because it depends only on the particular values $r(\zeta)$ and $\varphi(\zeta)$ of
$r$ and $\varphi$. A formal definition of regular and predictable processes is given in Sec.
12-3.
†Recall that statistics hereafter will mean statistical properties.
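The four interpretations of $x(t)$ can be made concrete with the ac-generator example. The sketch below is illustrative only: the angular frequency `OMEGA`, the amplitude law (uniform on 0.5 to 1.5), and the representation of an outcome as a pair `(r, phi)` are assumptions made for the example.

```python
import math
import random

OMEGA = 2.0 * math.pi  # assumed angular frequency

def draw_outcome(rng):
    """One experimental outcome zeta, represented here by the pair (r, phi)."""
    r = rng.uniform(0.5, 1.5)              # assumed amplitude distribution
    phi = rng.uniform(-math.pi, math.pi)   # assumed phase distribution
    return r, phi

def x(t, zeta):
    """Interpretation 1: a function of both t and zeta."""
    r, phi = zeta
    return r * math.cos(OMEGA * t + phi)

rng = random.Random(1)
ensemble = [draw_outcome(rng) for _ in range(1000)]

# Interpretation 2: zeta fixed, t variable -> one sample (a time function)
path = [x(k * 0.01, ensemble[0]) for k in range(100)]

# Interpretation 3: t fixed, zeta variable -> samples of the RV x(0.3)
state_at_t = [x(0.3, zeta) for zeta in ensemble]

# Interpretation 4: t and zeta fixed -> a number
value = x(0.3, ensemble[0])
```

Because each sample is fixed by only two numbers, observing one path over any interval pins down its entire future, which is the sense in which this process is predictable.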
Equality. We shall say that two stochastic processes $x(t)$ and $y(t)$ are equal
(everywhere) if their respective samples $x(t, \zeta)$ and $y(t, \zeta)$ are identical for every
$\zeta$. Similarly, the equality $z(t) = x(t) + y(t)$ means that $z(t, \zeta) = x(t, \zeta) + y(t, \zeta)$
for every $\zeta$. Derivatives, integrals, or any other operations involving stochastic
processes are defined similarly in terms of the corresponding operations for
each sample.
As in the case of limits, the above definitions can be relaxed. We give
below the meaning of MS equality and in App. 10A we define MS derivatives
and integrals. Two processes $x(t)$ and $y(t)$ are equal in the MS sense iff
$E\{|x(t) - y(t)|^2\} = 0$	(10-1)
for every $t$. Equality in the MS sense leads to the following conclusions: We
denote by $\mathcal{A}_t$ the set of outcomes $\zeta$ such that $x(t, \zeta) = y(t, \zeta)$ for a specific $t$,
and by $\mathcal{A}_\infty$ the set of outcomes $\zeta$ such that $x(t, \zeta) = y(t, \zeta)$ for every $t$. From
(10-1) it follows that $x(t, \zeta) - y(t, \zeta) = 0$ with probability 1; hence $P(\mathcal{A}_t) =
P(\mathcal{S}) = 1$. It does not follow, however, that $P(\mathcal{A}_\infty) = 1$. In fact, since $\mathcal{A}_\infty$ is
the intersection of all sets $\mathcal{A}_t$ as $t$ ranges over the entire axis, $P(\mathcal{A}_\infty)$ might
even equal 0.
Statistics of Stochastic Processes
A stochastic process is a noncountable infinity of random variables, one for each
$t$. For a specific $t$, $x(t)$ is an RV with distribution
$F(x, t) = P\{x(t) \le x\}$	(10-2)
This function depends on $t$, and it equals the probability of the event $\{x(t) \le x\}$
consisting of all outcomes $\zeta$ such that, at the specific time $t$, the samples $x(t, \zeta)$
of the given process do not exceed the number $x$. The function $F(x, t)$ will be
called the first-order distribution of the process $x(t)$. Its derivative with respect
to $x$:
$f(x, t) = \frac{\partial F(x, t)}{\partial x}$	(10-3)
is the first-order density of $x(t)$.
Frequency interpretation If the experiment is performed $n$ times, then $n$ functions
$x(t, \zeta_i)$ are observed, one for each trial (Fig. 10-2). Denoting by $n_t(x)$ the number of
trials such that at time $t$ the ordinates of the observed functions do not exceed $x$ (solid
lines), we conclude as in (4-3) that
$F(x, t) \simeq \frac{n_t(x)}{n}$	(10-4)
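The frequency interpretation of the first-order distribution can be tried numerically. The following sketch assumes a toy process $x(t) = at$ with $a$ uniform in (0, 1), chosen only because its first-order distribution $F(x, t) = x/t$ for $0 \le x \le t$ is easy to check; the process, the function names, and the seed are illustrative assumptions.

```python
import random

def sample_path_value(t, rng):
    """Value at time t of one realization of the toy process x(t) = a*t, a ~ U(0, 1)."""
    return rng.random() * t

def empirical_cdf(x, t, n, seed=0):
    """Estimate F(x, t) as n_t(x) / n over n independent trials."""
    rng = random.Random(seed)
    n_t = sum(1 for _ in range(n) if sample_path_value(t, rng) <= x)
    return n_t / n

if __name__ == "__main__":
    # Exact value for this toy process is x / t = 0.25
    print(empirical_cdf(0.5, 2.0, 20_000))
```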
The second-order distribution of the process $x(t)$ is the joint distribution
$F(x_1, x_2; t_1, t_2) = P\{x(t_1) \le x_1, x(t_2) \le x_2\}$	(10-5)
of the RVs $x(t_1)$ and $x(t_2)$. The corresponding density equals
$f(x_1, x_2; t_1, t_2) = \frac{\partial^2 F(x_1, x_2; t_1, t_2)}{\partial x_1 \, \partial x_2}$	(10-6)
We note that (consistency conditions)
$F(x_1; t_1) = F(x_1, \infty; t_1, t_2)$  $f(x_1, t_1) = \int_{-\infty}^{\infty} f(x_1, x_2; t_1, t_2) \, dx_2$
as in (6-9) and (6-10).
The $n$th-order distribution of $x(t)$ is the joint distribution $F(x_1, \ldots, x_n;
t_1, \ldots, t_n)$ of the RVs $x(t_1), \ldots, x(t_n)$.
SECOND-ORDER PROPERTIES. For the determination of the statistical properties
of a stochastic process, knowledge of the function $F(x_1, \ldots, x_n; t_1, \ldots, t_n)$ is
required for every $x_i$, $t_i$, and $n$. However, for many applications, only certain
averages are used, in particular, the expected value of $x(t)$ and of $x^2(t)$. These
quantities can be expressed in terms of the second-order properties of $x(t)$
defined as follows:
Mean The mean $\eta(t)$ of $x(t)$ is the expected value of the RV $x(t)$:
$\eta(t) = E\{x(t)\} = \int_{-\infty}^{\infty} x f(x, t) \, dx$	(10-7)
Autocorrelation The autocorrelation $R(t_1, t_2)$ of $x(t)$ is the expected
value of the product $x(t_1) x(t_2)$:
$R(t_1, t_2) = E\{x(t_1) x(t_2)\} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1 x_2 f(x_1, x_2; t_1, t_2) \, dx_1 \, dx_2$	(10-8)
The value of $R(t_1, t_2)$ on the diagonal $t_1 = t_2 = t$ is the average power of
$x(t)$:
$E\{x^2(t)\} = R(t, t)$
The autocovariance $C(t_1, t_2)$ of $x(t)$ is the covariance of the RVs $x(t_1)$ and
$x(t_2)$:
$C(t_1, t_2) = R(t_1, t_2) - \eta(t_1) \eta(t_2)$	(10-9)
and its value $C(t, t)$ on the diagonal $t_1 = t_2 = t$ equals the variance of $x(t)$.
Note The following is an explanation of the reason for introducing the function $R(t_1, t_2)$
even in problems dealing only with average power: Suppose that $x(t)$ is the input to a
linear system and $y(t)$ is the resulting output. In Sec. 10-2 we show that the mean of $y(t)$
can be expressed in terms of the mean of $x(t)$. However, the average power of $y(t)$
cannot be found if only $E\{x^2(t)\}$ is given. For the determination of $E\{y^2(t)\}$, knowledge
of the function $R(t_1, t_2)$ is required, not just on the diagonal $t_1 = t_2$, but for every $t_1$ and
$t_2$. The following identity is a simple illustration
$E\{[x(t_1) + x(t_2)]^2\} = R(t_1, t_1) + 2R(t_1, t_2) + R(t_2, t_2)$
This follows from (10-8) if we expand the square and use the linearity of expected values.
Example 10-1. An extreme example of a stochastic process is a deterministic signal
$x(t) = f(t)$. In this case,
$\eta(t) = E\{f(t)\} = f(t)$  $R(t_1, t_2) = E\{f(t_1) f(t_2)\} = f(t_1) f(t_2)$
Example 10-2. Suppose that $x(t)$ is a process with
$\eta(t) = 3$  $R(t_1, t_2) = 9 + 4e^{-0.2|t_1 - t_2|}$
We shall determine the mean, the variance, and the covariance of the RVs
$z = x(5)$ and $w = x(8)$.
Clearly, $E\{z\} = \eta(5) = 3$ and $E\{w\} = \eta(8) = 3$. Furthermore,
$E\{z^2\} = R(5, 5) = 13$  $E\{w^2\} = R(8, 8) = 13$
$E\{zw\} = R(5, 8) = 9 + 4e^{-0.6} = 11.195$
Thus $z$ and $w$ have the same variance $\sigma^2 = 4$ and their covariance equals
$C(5, 8) = 4e^{-0.6} = 2.195$.
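The arithmetic of Example 10-2 can be reproduced directly from (10-9). The sketch below simply encodes the given mean and autocorrelation; the function names are illustrative choices.

```python
import math

def eta(t):
    """Mean of the process in Example 10-2 (constant)."""
    return 3.0

def R(t1, t2):
    """Autocorrelation given in Example 10-2."""
    return 9.0 + 4.0 * math.exp(-0.2 * abs(t1 - t2))

def C(t1, t2):
    """Autocovariance from (10-9): C = R - eta(t1) * eta(t2)."""
    return R(t1, t2) - eta(t1) * eta(t2)

if __name__ == "__main__":
    print("E{zw}    =", round(R(5, 8), 3))
    print("var z    =", round(C(5, 5), 3))
    print("cov(z,w) =", round(C(5, 8), 3))
```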
Example 10-3. The integral
$s = \int_a^b x(t) \, dt$
of a stochastic process $x(t)$ is an RV $s$ and its value $s(\zeta)$ for a specific outcome $\zeta$ is
the area under the curve $x(t, \zeta)$ in the interval $(a, b)$ (see also App. 10A).
Interpreting the above as a Riemann integral, we conclude from the linearity of
expected values that
$\eta_s = E\{s\} = \int_a^b E\{x(t)\} \, dt = \int_a^b \eta(t) \, dt$	(10-10)
Similarly, since
$s^2 = \int_a^b \int_a^b x(t_1) x(t_2) \, dt_1 \, dt_2$
we conclude, using again the linearity of expected values, that
$E\{s^2\} = \int_a^b \int_a^b E\{x(t_1) x(t_2)\} \, dt_1 \, dt_2 = \int_a^b \int_a^b R(t_1, t_2) \, dt_1 \, dt_2$	(10-11)
Example 10-4. We shall determine the autocorrelation $R(t_1, t_2)$ of the process
$x(t) = r \cos(\omega t + \varphi)$
where we assume that the RVs $r$ and $\varphi$ are independent and $\varphi$ is uniform in the
interval $(-\pi, \pi)$.
Using simple trigonometric identities, we find
$E\{x(t_1) x(t_2)\} = \frac{1}{2} E\{r^2\} E\{\cos \omega(t_1 - t_2) + \cos(\omega t_1 + \omega t_2 + 2\varphi)\}$
and since
$E\{\cos(\omega t_1 + \omega t_2 + 2\varphi)\} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \cos(\omega t_1 + \omega t_2 + 2\varphi) \, d\varphi = 0$
we conclude that
$R(t_1, t_2) = \frac{1}{2} E\{r^2\} \cos \omega(t_1 - t_2)$	(10-12)
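Result (10-12) can be checked by a Monte Carlo average over the ensemble. In the sketch below the amplitude law (uniform on 0 to 2, so that $E\{r^2\} = 4/3$), the value of $\omega$, the number of trials, and the seed are all assumptions made for illustration; only the uniform phase comes from the example.

```python
import math
import random

def estimate_autocorrelation(t1, t2, omega=2.0, n=100_000, seed=0):
    """Average of x(t1) * x(t2) over n simulated outcomes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        r = rng.uniform(0.0, 2.0)              # assumed amplitude law, E{r^2} = 4/3
        phi = rng.uniform(-math.pi, math.pi)   # phase uniform in (-pi, pi)
        total += r * math.cos(omega * t1 + phi) * r * math.cos(omega * t2 + phi)
    return total / n

def closed_form(t1, t2, omega=2.0, mean_r2=4.0 / 3.0):
    """Right side of (10-12)."""
    return 0.5 * mean_r2 * math.cos(omega * (t1 - t2))

if __name__ == "__main__":
    print(estimate_autocorrelation(1.0, 0.4), closed_form(1.0, 0.4))
```

The estimate depends on $t_1$ and $t_2$ only through their difference, as (10-12) predicts.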
Example 10-5 Poisson process. In Sec. 3-4 we introduced the concept of Poisson
points and we showed that these points are specified by the following properties:
P1: The number $n(t_1, t_2)$ of the points $t_i$ in an interval $(t_1, t_2)$ of length $t = t_2 - t_1$
is a Poisson RV with parameter $\lambda t$:
$P\{n(t_1, t_2) = k\} = \frac{e^{-\lambda t} (\lambda t)^k}{k!}$	(10-13)
P2: If the intervals $(t_1, t_2)$ and $(t_3, t_4)$ are nonoverlapping, then the RVs $n(t_1, t_2)$
and $n(t_3, t_4)$ are independent.
Using the points $t_i$, we form the stochastic process
$x(t) = n(0, t)$
shown in Fig. 10-3a. This is a discrete-state process consisting of a family of
increasing staircase functions with discontinuities at the points $t_i$.
For a specific $t$, $x(t)$ is a Poisson RV with parameter $\lambda t$; hence
$E\{x(t)\} = \eta(t) = \lambda t$
We shall show that its autocorrelation equals
$R(t_1, t_2) = \lambda t_2 + \lambda^2 t_1 t_2$ for $t_1 \ge t_2$, and $R(t_1, t_2) = \lambda t_1 + \lambda^2 t_1 t_2$ for $t_1 \le t_2$	(10-14)
or equivalently that
$C(t_1, t_2) = \lambda \min(t_1, t_2) = \lambda t_1 U(t_2 - t_1) + \lambda t_2 U(t_1 - t_2)$
Proof. The above is true for $t_1 = t_2$ because [see (5-36)]
$E\{x^2(t)\} = \lambda t + \lambda^2 t^2$	(10-15)
Since $R(t_1, t_2) = R(t_2, t_1)$, it suffices to prove (10-14) for $t_1 < t_2$. The RVs $x(t_1)$
and $x(t_2) - x(t_1)$ are independent because the intervals $(0, t_1)$ and $(t_1, t_2)$ are
nonoverlapping. Furthermore, they are Poisson distributed with parameters $\lambda t_1$
and $\lambda(t_2 - t_1)$ respectively. Hence
$E\{x(t_1)[x(t_2) - x(t_1)]\} = E\{x(t_1)\} E\{x(t_2) - x(t_1)\} = \lambda t_1 \lambda (t_2 - t_1)$
Using the identity
$x(t_1) x(t_2) = x(t_1)[x(t_1) + x(t_2) - x(t_1)]$
we conclude from the above and (10-15) that
$R(t_1, t_2) = \lambda t_1 + \lambda^2 t_1^2 + \lambda t_1 \lambda (t_2 - t_1)$
and (10-14) results.
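The autocorrelation (10-14) can also be checked by simulation. The sketch below generates Poisson points from independent exponential interarrival times with rate $\lambda$, a standard construction that is assumed here rather than derived in this passage; the parameter values, trial count, and seed are illustrative.

```python
import random

def poisson_counts(lam, times, rng):
    """Counts n(0, t) at each t in the sorted list `times`, for one realization."""
    t_max = times[-1]
    points, t = [], rng.expovariate(lam)
    while t <= t_max:
        points.append(t)
        t += rng.expovariate(lam)   # assumed: exponential gaps between points
    return [sum(1 for p in points if p <= ti) for ti in times]

def estimate_R(lam, t1, t2, n=50_000, seed=0):
    """Monte Carlo estimate of E{x(t1) x(t2)} for x(t) = n(0, t)."""
    rng = random.Random(seed)
    lo, hi = sorted((t1, t2))
    total = 0
    for _ in range(n):
        a, b = poisson_counts(lam, [lo, hi], rng)
        total += a * b
    return total / n

def exact_R(lam, t1, t2):
    """Equation (10-14) written as lam * min(t1, t2) + lam^2 * t1 * t2."""
    return lam * min(t1, t2) + lam ** 2 * t1 * t2

if __name__ == "__main__":
    print(estimate_R(2.0, 1.0, 3.0), exact_R(2.0, 1.0, 3.0))
```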
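Nonuniform case If the points $t_i$ have a nonuniform density $\lambda(t)$ as in
(3-54), then the preceding results still hold provided that the product $\lambda(t_2 - t_1)$ is
replaced by the integral of $\lambda(t)$ from $t_1$ to $t_2$.
Thus
$E\{x(t)\} = \int_0^t \lambda(\alpha) \, d\alpha$	(10-16)
and
$R(t_1, t_2) = \int_0^{t_1} \lambda(t) \, dt \left[ 1 + \int_0^{t_2} \lambda(t) \, dt \right]$  $t_1 \le t_2$	(10-17)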
Example 10-6 Telegraph signal. Using the Poisson points $t_i$, we form a process
$x(t)$ such that $x(t) = 1$ if the number of points in the interval $(0, t)$ is even, and
$x(t) = -1$ if this number is odd (Fig. 10-3b).
Denoting by $p(k)$ the probability that the number of points in the interval
$(0, t)$ equals $k$, we conclude that [see (10-13)]
$P\{x(t) = 1\} = p(0) + p(2) + \cdots = e^{-\lambda t} \left[ 1 + \frac{(\lambda t)^2}{2!} + \cdots \right] = e^{-\lambda t} \cosh \lambda t$
$P\{x(t) = -1\} = p(1) + p(3) + \cdots = e^{-\lambda t} \left[ \lambda t + \frac{(\lambda t)^3}{3!} + \cdots \right] = e^{-\lambda t} \sinh \lambda t$
Hence
$E\{x(t)\} = e^{-\lambda t} (\cosh \lambda t - \sinh \lambda t) = e^{-2\lambda t}$	(10-18)
To determine $R(t_1, t_2)$, we note that, if $t_1 > t_2$ and $x(t_2) = 1$, then $x(t_1) = 1$ if the
number of points in the interval $(t_2, t_1)$ is even. Hence
$P\{x(t_1) = 1 \mid x(t_2) = 1\} = e^{-\lambda t} \cosh \lambda t$  $t = |t_1 - t_2|$
Multiplying by $P\{x(t_2) = 1\}$, we obtain
$P\{x(t_1) = 1, x(t_2) = 1\} = e^{-\lambda t} \cosh \lambda t \; e^{-\lambda t_2} \cosh \lambda t_2$
Similarly,
$P\{x(t_1) = -1, x(t_2) = -1\} = e^{-\lambda t} \cosh \lambda t \; e^{-\lambda t_2} \sinh \lambda t_2$
$P\{x(t_1) = 1, x(t_2) = -1\} = e^{-\lambda t} \sinh \lambda t \; e^{-\lambda t_2} \sinh \lambda t_2$
$P\{x(t_1) = -1, x(t_2) = 1\} = e^{-\lambda t} \sinh \lambda t \; e^{-\lambda t_2} \cosh \lambda t_2$
Since the product $x(t_1) x(t_2)$ equals 1 or $-1$, we conclude omitting details that
$R(t_1, t_2) = e^{-2\lambda |t_1 - t_2|}$	(10-19)
The above process is called semirandom telegraph signal because its value
$x(0) = 1$ at $t = 0$ is not random. To remove this certainty, we form the product
$y(t) = a x(t)$
where $a$ is an RV taking the values $+1$ and $-1$ with equal probability and is
independent of $x(t)$. The process $y(t)$ so formed is called random telegraph signal.
Since $E\{a\} = 0$ and $E\{a^2\} = 1$, the mean of $y(t)$ equals $E\{a\} E\{x(t)\} = 0$ and its
autocorrelation is given by
$E\{y(t_1) y(t_2)\} = E\{a^2\} E\{x(t_1) x(t_2)\} = e^{-2\lambda |t_1 - t_2|}$
We note that as $t \to \infty$ the processes $x(t)$ and $y(t)$ have asymptotically equal
statistics.
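The mean (10-18) and autocorrelation (10-19) of the semirandom telegraph signal can be checked by simulation. As a sketch, the Poisson points are generated from exponential interarrival times (an assumed, standard construction); the rate, the two time instants, the trial count, and the seed are illustrative.

```python
import math
import random

def telegraph(lam, times, rng):
    """Values x(t) = (-1)**n(0, t) at the sorted `times`, for one realization."""
    values, count, t_next = [], 0, rng.expovariate(lam)
    for t in times:
        while t_next <= t:          # count the Poisson points up to time t
            count += 1
            t_next += rng.expovariate(lam)
        values.append(1 if count % 2 == 0 else -1)
    return values

def estimate(lam, t1, t2, n=50_000, seed=0):
    """Estimates of E{x(t1)} and E{x(t1) x(t2)}; requires t1 <= t2."""
    rng = random.Random(seed)
    s_mean = s_corr = 0
    for _ in range(n):
        x1, x2 = telegraph(lam, [t1, t2], rng)
        s_mean += x1
        s_corr += x1 * x2
    return s_mean / n, s_corr / n

if __name__ == "__main__":
    mean_hat, corr_hat = estimate(1.0, 0.5, 1.0)
    print(mean_hat, math.exp(-2 * 1.0 * 0.5))          # compare with (10-18)
    print(corr_hat, math.exp(-2 * 1.0 * abs(0.5 - 1.0)))  # compare with (10-19)
```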
General Properties
The statistical properties of a real stochastic process $x(t)$ are completely
determined† in terms of its $n$th-order distribution
$F(x_1, \ldots, x_n; t_1, \ldots, t_n) = P\{x(t_1) \le x_1, \ldots, x(t_n) \le x_n\}$	(10-20)
The joint statistics of two real processes $x(t)$ and $y(t)$ are determined in
terms of the joint distribution of the RVs
$x(t_1), \ldots, x(t_n), y(t_1'), \ldots, y(t_m')$
The complex process $z(t) = x(t) + jy(t)$ is specified in terms of the joint
statistics of the real processes $x(t)$ and $y(t)$.
A vector process ($n$-dimensional process) is a family of $n$ stochastic
processes.
Correlation and covariance. The autocorrelation of a process $x(t)$, real or
complex, is by definition the mean of the product $x(t_1) x^*(t_2)$. This function will
be denoted by $R(t_1, t_2)$ or $R_x(t_1, t_2)$ or $R_{xx}(t_1, t_2)$. Thus
$R_{xx}(t_1, t_2) = E\{x(t_1) x^*(t_2)\}$	(10-21)
where the conjugate term is associated with the second variable in $R_{xx}(t_1, t_2)$.
From this it follows that
$R(t_2, t_1) = E\{x(t_2) x^*(t_1)\} = R^*(t_1, t_2)$	(10-22)
We note, further, that
$R(t, t) = E\{|x(t)|^2\} \ge 0$	(10-23)
The last two equations are special cases of the following: The autocorrelation
$R(t_1, t_2)$ of a stochastic process $x(t)$ is a positive definite (p.d.) function,
that is, for any $a_i$ and $a_j$:
$\sum_{i,j} a_i a_j^* R(t_i, t_j) \ge 0$	(10-24)
This is a consequence of the identity
$0 \le E\left\{ \left| \sum_i a_i x(t_i) \right|^2 \right\} = \sum_{i,j} a_i a_j^* E\{x(t_i) x^*(t_j)\}$
We show later that the converse is also true: Given a p.d. function
$R(t_1, t_2)$, we can find a process $x(t)$ with autocorrelation $R(t_1, t_2)$.
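The positive-definiteness condition (10-24) can be spot-checked numerically for a given autocorrelation. The sketch below uses the telegraph-signal autocorrelation $e^{-2\lambda|t_1 - t_2|}$ of (10-19) with randomly drawn complex coefficients and time instants; the sampling ranges, the number of trials, and the seed are arbitrary illustrative choices, and a finite random search is evidence, not a proof.

```python
import math
import random

def R(t1, t2, lam=1.0):
    """Autocorrelation of the random telegraph signal, see (10-19)."""
    return math.exp(-2.0 * lam * abs(t1 - t2))

def quadratic_form(a, t):
    """sum over i, j of a_i * conj(a_j) * R(t_i, t_j), as in (10-24)."""
    return sum(a[i] * a[j].conjugate() * R(t[i], t[j])
               for i in range(len(a)) for j in range(len(a)))

rng = random.Random(0)
worst = float("inf")
for _ in range(200):
    n = rng.randint(1, 6)
    t = [rng.uniform(0.0, 5.0) for _ in range(n)]
    a = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    worst = min(worst, quadratic_form(a, t).real)

# `worst` is the smallest value of the quadratic form seen in the random search
```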
†There are processes (nonseparable) for which this is not true. However, such processes are mainly
of mathematical interest.
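Example 10-7. (a) If $x(t) = a e^{j\omega t}$ then
$R(t_1, t_2) = E\{a e^{j\omega t_1} a^* e^{-j\omega t_2}\} = E\{|a|^2\} e^{j\omega(t_1 - t_2)}$
(b) Suppose that the RVs $a_i$ are uncorrelated with zero mean and variance
$\sigma_i^2$. If
$x(t) = \sum_i a_i e^{j\omega_i t}$
then (10-21) yields
$R(t_1, t_2) = \sum_i \sigma_i^2 e^{j\omega_i (t_1 - t_2)}$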
The autocovariance $C(t_1, t_2)$ of a process $x(t)$ is the covariance of the RVs
$x(t_1)$ and $x(t_2)$:
$C(t_1, t_2) = R(t_1, t_2) - \eta(t_1) \eta^*(t_2)$	(10-25)
In the above, $\eta(t) = E\{x(t)\}$ is the mean of $x(t)$.
The ratio
$r(t_1, t_2) = \frac{C(t_1, t_2)}{\sqrt{C(t_1, t_1) C(t_2, t_2)}}$	(10-26)
is the correlation coefficient† of the process $x(t)$.
Note The autocovariance $C(t_1, t_2)$ of a process $x(t)$ is the autocorrelation of the centered
process
$\tilde{x}(t) = x(t) - \eta(t)$
Hence it is p.d.
The correlation coefficient $r(t_1, t_2)$ of $x(t)$ is the autocovariance of the normalized
process $x(t)/\sqrt{C(t, t)}$; hence it is also p.d. Furthermore [see (7-9)]
$|r(t_1, t_2)| \le 1$  $r(t, t) = 1$	(10-27)
Example 10-8. If
$s = \int_a^b x(t) \, dt$  then  $s - \eta_s = \int_a^b \tilde{x}(t) \, dt$
where $\tilde{x}(t) = x(t) - \eta_x(t)$. Using (10-11), we conclude from the above note that
$\sigma_s^2 = E\{|s - \eta_s|^2\} = \int_a^b \int_a^b C_x(t_1, t_2) \, dt_1 \, dt_2$	(10-28)
The cross-correlation of two processes $x(t)$ and $y(t)$ is the function
$R_{xy}(t_1, t_2) = E\{x(t_1) y^*(t_2)\} = R_{yx}^*(t_2, t_1)$	(10-29)
Similarly,
$C_{xy}(t_1, t_2) = R_{xy}(t_1, t_2) - \eta_x(t_1) \eta_y^*(t_2)$	(10-30)
is their cross-covariance.
Two processes $x(t)$ and $y(t)$ are called (mutually) orthogonal if
$R_{xy}(t_1, t_2) = 0$ for every $t_1$ and $t_2$	(10-31)
They are called uncorrelated if
$C_{xy}(t_1, t_2) = 0$ for every $t_1$ and $t_2$	(10-32)
†In optics, $C(t_1, t_2)$ is called the coherence function and $r(t_1, t_2)$ is called the complex degree of
coherence (see Papoulis, 1968).
a-dependent processes In general, the values $x(t_1)$ and $x(t_2)$ of a stochastic
process $x(t)$ are statistically dependent for any $t_1$ and $t_2$. However, in most
cases this dependence decreases as $|t_1 - t_2| \to \infty$. This leads to the following
concept: A stochastic process $x(t)$ is called $a$-dependent if all its values $x(t)$ for
$t < t_0$ and for $t > t_0 + a$ are mutually independent. From this it follows that
$C(t_1, t_2) = 0$ for $|t_1 - t_2| > a$	(10-33)
A process $x(t)$ is called correlation $a$-dependent if its autocorrelation
satisfies (10-33). Clearly, if $x(t)$ is correlation $a$-dependent, then any linear
combination of its values for $t < t_0$ is uncorrelated with any linear combination
of its values for $t > t_0 + a$.
White noise We shall say that a process $\nu(t)$ is white noise if its values
$\nu(t_i)$ and $\nu(t_j)$ are uncorrelated for every $t_i$ and $t_j \ne t_i$:
$C(t_i, t_j) = 0$  $t_i \ne t_j$
As we explain later, the autocovariance of a nontrivial white-noise process
must be of the form
$C(t_1, t_2) = q(t_1) \delta(t_1 - t_2)$  $q(t) \ge 0$	(10-34)
If the RVs $\nu(t_i)$ and $\nu(t_j)$ are not only uncorrelated but also independent,
then $\nu(t)$ will be called strictly white noise. Unless otherwise stated, it will be
assumed that the mean of a white-noise process is identically 0.
Example 10-9. Suppose that $\nu(t)$ is white noise and
$x(t) = \int_0^t \nu(\alpha) \, d\alpha$	(10-35)
Applying (10-11) to the integral (10-35) and using (10-34), with $R = C$ because the
mean is zero, we obtain
$E\{x^2(t)\} = \int_0^t \int_0^t q(t_1) \delta(t_1 - t_2) \, dt_2 \, dt_1 = \int_0^t q(t_1) \, dt_1$	(10-36)
because
$\int_0^t \delta(t_1 - t_2) \, dt_2 = 1$ for $0 < t_1 < t$
Uncorrelated and independent increments If the increments $x(t_2) - x(t_1)$
and $x(t_4) - x(t_3)$ of a process $x(t)$ are uncorrelated (independent) for any
$t_1 < t_2 < t_3 < t_4$, then we say that $x(t)$ is a process with uncorrelated (independent)
increments. The Poisson process is a process with independent increments.
The integral (10-35) of white noise is a process with uncorrelated
increments.
Independent processes If two processes $x(t)$ and $y(t)$ are such that the
RVs $x(t_1), \ldots, x(t_n)$ and $y(t_1'), \ldots, y(t_n')$ are mutually independent, then these
processes are called independent.
Normal processes. A process $x(t)$ is called normal if the RVs $x(t_1), \ldots, x(t_n)$
are jointly normal for any $n$ and $t_1, \ldots, t_n$.
The statistics of a normal process are completely determined in terms of
its mean $\eta(t)$ and autocovariance $C(t_1, t_2)$. Indeed, since
$E\{x(t)\} = \eta(t)$  $\sigma_x^2(t) = C(t, t)$
we conclude that the first-order density $f(x, t)$ of $x(t)$ is the normal density
$N[\eta(t); \sqrt{C(t, t)}]$.
Similarly, since the function $r(t_1, t_2)$ in (10-26) is the correlation coefficient
of the RVs $x(t_1)$ and $x(t_2)$, the second-order density $f(x_1, x_2; t_1, t_2)$ of $x(t)$
is the jointly normal density
$N[\eta(t_1), \eta(t_2); \sqrt{C(t_1, t_1)}, \sqrt{C(t_2, t_2)}; r(t_1, t_2)]$
The $n$th-order characteristic function of the process $x(t)$ is given by [see
(8-60)]
$\exp\left\{ j \sum_i \eta(t_i) \omega_i - \frac{1}{2} \sum_{i,k} C(t_i, t_k) \omega_i \omega_k \right\}$	(10-37)
Its inverse $f(x_1, \ldots, x_n; t_1, \ldots, t_n)$ is the $n$th-order density of $x(t)$.
Existence theorem. Given an arbitrary function $\eta(t)$ and a p.d. function $C(t_1, t_2)$,
we can construct a normal process with mean $\eta(t)$ and autocovariance $C(t_1, t_2)$.
This follows if we use in (10-37) the given functions $\eta(t)$ and $C(t_1, t_2)$. The
inverse of the resulting characteristic function is a density because the function
$C(t_1, t_2)$ is p.d. by assumption.
Example 10-10. Suppose that $x(t)$ is a normal process with
$\eta(t) = 3$  $C(t_1, t_2) = 4e^{-0.2|t_1 - t_2|}$
(a) Find the probability that $x(5) \le 2$.
Clearly, $x(5)$ is a normal RV with mean $\eta(5) = 3$ and variance $C(5, 5) = 4$.
Hence
$P\{x(5) \le 2\} = G(-1/2) = 0.309$
(b) Find the probability that $|x(8) - x(5)| \le 1$.
The difference $s = x(8) - x(5)$ is a normal RV with mean $\eta(8) - \eta(5) = 0$
and variance
$C(8, 8) + C(5, 5) - 2C(8, 5) = 8(1 - e^{-0.6}) = 3.610$
Hence
$P\{|x(8) - x(5)| \le 1\} = 2G(1/1.9) - 1 = 0.4$
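The two probabilities of Example 10-10 can be reproduced with the standard normal distribution function $G$, written below in terms of the error function. This is a minimal sketch; the helper names are illustrative.

```python
import math

def G(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def C(t1, t2):
    """Autocovariance of Example 10-10."""
    return 4.0 * math.exp(-0.2 * abs(t1 - t2))

eta = 3.0  # constant mean of the process

# (a) P{x(5) <= 2}: x(5) is normal with mean eta and variance C(5, 5)
p_a = G((2.0 - eta) / math.sqrt(C(5, 5)))

# (b) P{|x(8) - x(5)| <= 1}: the difference is zero-mean normal
var_s = C(8, 8) + C(5, 5) - 2.0 * C(8, 5)
p_b = 2.0 * G(1.0 / math.sqrt(var_s)) - 1.0

if __name__ == "__main__":
    print(round(p_a, 3), round(var_s, 3), round(p_b, 3))
```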
Point and renewal processes. A point process is a set of random points $t_i$ on the
time axis. To every point process we can associate a stochastic process $x(t)$
equal to the number of points $t_i$ in the interval $(0, t)$. An example is the Poisson
process. To every point process $t_i$ we can associate a sequence of RVs $z_n$ such
that
$z_1 = t_1$  $z_2 = t_2 - t_1$  $\cdots$  $z_n = t_n - t_{n-1}$
where $t_1$ is the first random point to the right of the origin. This sequence is
called a renewal process. An example is the life history of light bulbs that are
replaced as soon as they fail. In this case, $z_i$ is the total time the $i$th bulb is in
operation and $t_i$ is the time of its failure.
We have thus established a correspondence between the following three
concepts (Fig. 10-4): (a) a point process $t_i$, (b) a discrete-state stochastic process
$x(t)$ increasing in unit steps at the points $t_i$, (c) a renewal process consisting of
the RVs $z_i$ and such that $t_n = z_1 + \cdots + z_n$. This correspondence is developed
further in Sec. 16-1.
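The correspondence between the three descriptions can be written as three small conversions. In the sketch below the function names are illustrative, and counting a point that falls exactly at $t$ as inside the interval is an assumed boundary convention.

```python
from bisect import bisect_right
from itertools import accumulate

def renewal_from_points(t):
    """(a) -> (c): z_1 = t_1 and z_n = t_n - t_(n-1), for sorted points t."""
    return [b - a for a, b in zip([0.0] + t[:-1], t)]

def points_from_renewal(z):
    """(c) -> (a): t_n = z_1 + ... + z_n."""
    return list(accumulate(z))

def count(t, points):
    """(a) -> (b): x(t), the number of points not exceeding t (assumed convention)."""
    return bisect_right(points, t)

if __name__ == "__main__":
    pts = [0.5, 1.25, 3.0]   # hypothetical failure times of three bulbs
    print(renewal_from_points(pts), count(2.0, pts))
```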
Stationary Processes
A stochastic process $x(t)$ is called strict-sense stationary (abbreviated SSS) if its
statistical properties are invariant to a shift of the origin. This means that the
processes $x(t)$ and $x(t + c)$ have the same statistics for any $c$.
Two processes $x(t)$ and $y(t)$ are called jointly stationary if the joint
statistics of $x(t)$ and $y(t)$ are the same as the joint statistics of $x(t + c)$ and
$y(t + c)$ for any $c$.
A complex process $z(t) = x(t) + jy(t)$ is stationary if the processes $x(t)$
and $y(t)$ are jointly stationary.
From the definition it follows that the $n$th-order density of an SSS process
must be such that
$f(x_1, \ldots, x_n; t_1, \ldots, t_n) = f(x_1, \ldots, x_n; t_1 + c, \ldots, t_n + c)$	(10-38)
for any $c$.
From the above it follows that $f(x; t) = f(x; t + c)$ for any $c$. Hence the
first-order density of $x(t)$ is independent of $t$:
$f(x; t) = f(x)$	(10-39)
Similarly, $f(x_1, x_2; t_1 + c, t_2 + c)$ is independent of $c$ for any $c$. This leads
to the conclusion that
$f(x_1, x_2; t_1, t_2) = f(x_1, x_2; \tau)$  $\tau = t_1 - t_2$	(10-40)
Thus the joint density of the RVs $x(t + \tau)$ and $x(t)$ is independent of $t$ and it
equals $f(x_1, x_2; \tau)$.
WIDE SENSE. A stochastic process $x(t)$ is called wide-sense stationary (abbreviated
WSS) if its mean is constant
$E\{x(t)\} = \eta$	(10-41)
and its autocorrelation depends only on $\tau = t_1 - t_2$:
$E\{x(t + \tau) x^*(t)\} = R(\tau)$	(10-42)
Since $\tau$ is the distance from $t$ to $t + \tau$, the function $R(\tau)$ can be written in the
symmetrical form
$R(\tau) = E\left\{ x\left(t + \frac{\tau}{2}\right) x^*\left(t - \frac{\tau}{2}\right) \right\}$	(10-43)
Note in particular that
$E\{|x(t)|^2\} = R(0)$
Thus the average power of a stationary process is independent of $t$ and it equals
$R(0)$.
Example 10-11. Suppose that $x(t)$ is a WSS process with autocorrelation
$R(\tau) = A e^{-\alpha |\tau|}$
We shall determine the second moment of the RV $x(8) - x(5)$. Clearly,
$E\{[x(8) - x(5)]^2\} = E\{x^2(8)\} + E\{x^2(5)\} - 2E\{x(8) x(5)\}$
$= R(0) + R(0) - 2R(3) = 2A - 2Ae^{-3\alpha}$
Note As the above example suggests, the autocorrelation of a stationary process $x(t)$ can
be defined as average power. Assuming for simplicity that $x(t)$ is real, we conclude from
(10-42) that
$E\{[x(t + \tau) - x(t)]^2\} = 2[R(0) - R(\tau)]$	(10-44)
From (10-42) it follows that the autocovariance of a WSS process depends
only on $\tau = t_1 - t_2$:
$C(\tau) = R(\tau) - |\eta|^2$	(10-45)
and its correlation coefficient [see (10-26)] equals
$r(\tau) = C(\tau)/C(0)$	(10-46)
Thus $C(\tau)$ is the covariance, and $r(\tau)$ the correlation coefficient of the RVs
$x(t + \tau)$ and $x(t)$.
Two processes $x(t)$ and $y(t)$ are called jointly WSS if each is WSS and
their cross-correlation depends only on $\tau = t_1 - t_2$:
$R_{xy}(\tau) = E\{x(t + \tau) y^*(t)\}$  $C_{xy}(\tau) = R_{xy}(\tau) - \eta_x \eta_y^*$	(10-47)
If $x(t)$ is WSS white noise, then [see (10-34)]
$C(\tau) = q \delta(\tau)$	(10-48)
If $x(t)$ is an $a$-dependent process, then $C(\tau) = 0$ for $|\tau| > a$. In this case,
the constant $a$ is called the correlation time of $x(t)$. This term is also used for
arbitrary processes and it is defined as the ratio
$\tau_c = \frac{1}{C(0)} \int_0^\infty C(\tau) \, d\tau$	(10-49)
In general $C(\tau) \ne 0$ for every $\tau$. However, for most regular processes
$C(\tau) \to 0$  $R(\tau) \to |\eta|^2$  as $|\tau| \to \infty$
Example 10-12. If $x(t)$ is WSS and
$$ s = \int_{-T}^{T} x(t)\,dt $$
then [see (10-28)]
$$ \sigma_s^2 = \int_{-T}^{T}\int_{-T}^{T} C(t_1 - t_2)\,dt_1\,dt_2 = \int_{-2T}^{2T} (2T - |\tau|)C(\tau)\,d\tau \tag{10-50} $$
The last equality follows with $\tau = t_1 - t_2$ (see Fig. 10-5); the details, however, are omitted [see also (10-143)].
FIGURE 10-5
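The reduction of the double integral in (10-50) to a single integral can be checked numerically. The sketch below is a hedged illustration: it assumes the autocovariance $C(\tau) = e^{-|\tau|}$ and $T = 1$ (neither is from the text) and evaluates both sides with a midpoint rule.

```python
import math

def cov(tau):
    """Illustrative autocovariance C(tau) = exp(-|tau|) (an assumption)."""
    return math.exp(-abs(tau))

def double_integral(T, n=400):
    """Midpoint rule for the double integral of C(t1 - t2) over [-T, T] x [-T, T]."""
    h = 2.0 * T / n
    pts = [-T + (i + 0.5) * h for i in range(n)]
    return sum(cov(t1 - t2) for t1 in pts for t2 in pts) * h * h

def single_integral(T, n=4000):
    """Midpoint rule for the integral of (2T - |tau|) C(tau) over [-2T, 2T]."""
    h = 4.0 * T / n
    pts = [-2.0 * T + (i + 0.5) * h for i in range(n)]
    return sum((2.0 * T - abs(tau)) * cov(tau) for tau in pts) * h

T = 1.0
lhs = double_integral(T)
rhs = single_integral(T)
print(lhs, rhs)
```

For this particular $C(\tau)$ the single integral also has the closed form $2(2T - 1 + e^{-2T})$, which gives a third value to compare against.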
Special cases. (a) If $C(\tau) = q\delta(\tau)$, then
$$ \sigma_s^2 = q\int_{-2T}^{2T} (2T - |\tau|)\delta(\tau)\,d\tau = 2Tq $$
(b) If the process $x(t)$ is $a$-dependent and $a \ll T$, then (10-50) yields
$$ \sigma_s^2 = \int_{-2T}^{2T} (2T - |\tau|)C(\tau)\,d\tau \simeq 2T\int_{-a}^{a} C(\tau)\,d\tau $$
This shows that, in the evaluation of the variance of $s$, an $a$-dependent process with $a \ll T$ can be replaced by white noise as in (10-48) with
$$ q = \int_{-a}^{a} C(\tau)\,d\tau $$
If a process is SSS, then it is also WSS. This follows readily from (10-39) and (10-40). The converse, however, is not in general true. As we show next, normal processes are an important exception.
Indeed, suppose that $x(t)$ is a normal WSS process with mean $\eta$ and autocovariance $C(\tau)$. As we see from (10-37), its $n$th-order characteristic function equals
$$ \exp\left\{ j\eta\sum_i \omega_i - \frac{1}{2}\sum_{i,k} C(t_i - t_k)\omega_i\omega_k \right\} \tag{10-51} $$
This function is invariant to a shift of the origin. And since it determines completely the statistics of $x(t)$, we conclude that $x(t)$ is SSS.
Example 10-13. We shall establish necessary and sufficient conditions for the stationarity of the process
$$ x(t) = a\cos\omega t + b\sin\omega t \tag{10-52} $$
The mean of this process equals
$$ E\{x(t)\} = E\{a\}\cos\omega t + E\{b\}\sin\omega t $$
This function must be independent of $t$. Hence the condition
$$ E\{a\} = E\{b\} = 0 \tag{10-53} $$
is necessary for both forms of stationarity. We shall assume that it holds.
Wide sense. The process $x(t)$ is WSS iff the RVs $a$ and $b$ are uncorrelated with equal variance:
$$ E\{ab\} = 0 \qquad E\{a^2\} = E\{b^2\} = \sigma^2 \tag{10-54} $$
If this holds, then
$$ R(\tau) = \sigma^2\cos\omega\tau \tag{10-55} $$
Proof. If $x(t)$ is WSS, then
$$ E\{x^2(0)\} = E\{x^2(\pi/2\omega)\} = R(0) $$
But $x(0) = a$ and $x(\pi/2\omega) = b$; hence $E\{a^2\} = E\{b^2\}$. Using the above, we obtain
$$ E\{x(t + \tau)x(t)\} = E\{[a\cos\omega(t + \tau) + b\sin\omega(t + \tau)][a\cos\omega t + b\sin\omega t]\} = \sigma^2\cos\omega\tau + E\{ab\}\sin\omega(2t + \tau) \tag{10-56} $$
This is independent of $t$ only if $E\{ab\} = 0$ and (10-54) results.
Conversely, if (10-54) holds, then, as we see from (10-56), the autocorrelation of $x(t)$ equals $\sigma^2\cos\omega\tau$; hence $x(t)$ is WSS.
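The wide-sense condition of Example 10-13 can be illustrated by evaluating $E\{x(t + \tau)x(t)\}$ directly from the second moments of $a$ and $b$. The following is a sketch under assumed numerical values; the function name and the moments chosen are hypothetical.

```python
import math

def autocorr(t, tau, omega, Ea2, Eb2, Eab):
    """E{x(t+tau) x(t)} for x(t) = a cos(wt) + b sin(wt), from second moments of a and b."""
    c1, s1 = math.cos(omega * (t + tau)), math.sin(omega * (t + tau))
    c2, s2 = math.cos(omega * t), math.sin(omega * t)
    return Ea2 * c1 * c2 + Eb2 * s1 * s2 + Eab * (c1 * s2 + s1 * c2)

omega, tau = 2.0, 0.3
times = [0.0, 0.4, 1.1, 2.5]

# Uncorrelated a, b with equal variance: (10-54) holds, so there is no dependence on t.
wss = [autocorr(t, tau, omega, 1.5, 1.5, 0.0) for t in times]
# Unequal variances: (10-54) fails and the result varies with t.
not_wss = [autocorr(t, tau, omega, 1.5, 0.5, 0.0) for t in times]
print(wss)
print(not_wss)
```

In the first list every entry equals $\sigma^2\cos\omega\tau$ with $\sigma^2 = 1.5$; in the second the entries differ from one value of $t$ to the next.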
Strict sense. The process $x(t)$ is SSS iff the joint density $f(a, b)$ of the RVs $a$ and $b$ has circular symmetry, that is, if
$$ f(a, b) = f(\sqrt{a^2 + b^2}) \tag{10-57} $$
Proof. If $x(t)$ is SSS, then the RVs
$$ x(0) = a \qquad x(\pi/2\omega) = b $$
and
$$ x(t) = a\cos\omega t + b\sin\omega t \qquad x(t + \pi/2\omega) = b\cos\omega t - a\sin\omega t $$
have the same joint density for every $t$. Hence [see (6-70)], $f(a, b)$ must have circular symmetry.
We shall now show that, if $f(a, b)$ has circular symmetry, then $x(t)$ is SSS. With $\tau$ a given number and
$$ a_1 = a\cos\omega\tau + b\sin\omega\tau \qquad b_1 = b\cos\omega\tau - a\sin\omega\tau $$
we form the process
$$ x_1(t) = a_1\cos\omega t + b_1\sin\omega t = x(t + \tau) $$
Clearly, the statistics of $x(t)$ and $x_1(t)$ are determined in terms of the joint densities $f(a, b)$ and $f(a_1, b_1)$ of the RVs $a, b$ and $a_1, b_1$. But [see (6-67)] the RVs $a, b$ and $a_1, b_1$ have the same joint density. Hence the processes $x(t)$ and $x(t + \tau)$ have the same statistics for every $\tau$.
Corollary. If the process $x(t)$ is SSS and the RVs $a$ and $b$ are independent, then they are normal.
Proof. It follows from (10-57) and (6-34).
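Example 10-14. (a) Given an RV $\omega$ with density $f(\omega)$ and an RV $\varphi$ uniform in the interval $(-\pi, \pi)$ and independent of $\omega$, we form the process
$$ x(t) = a\cos(\omega t + \varphi) \tag{10-58} $$
We shall show that $x(t)$ is WSS with zero mean and autocorrelation
$$ R(\tau) = \frac{a^2}{2}E\{\cos\omega\tau\} = \frac{a^2}{2}\operatorname{Re}\Phi_\omega(\tau) \tag{10-59} $$
where
$$ \Phi_\omega(\tau) = E\{e^{j\omega\tau}\} = E\{\cos\omega\tau\} + jE\{\sin\omega\tau\} \tag{10-60} $$
is the characteristic function of $\omega$.
Proof. Clearly [see (7-59)]
$$ E\{\cos(\omega t + \varphi)\} = E\{E\{\cos(\omega t + \varphi) \mid \omega\}\} $$
From the independence of $\omega$ and $\varphi$, it follows that
$$ E\{\cos(\omega t + \varphi) \mid \omega\} = \cos\omega t\,E\{\cos\varphi\} - \sin\omega t\,E\{\sin\varphi\} $$
Hence $E\{x(t)\} = 0$ because
$$ E\{\cos\varphi\} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\cos\varphi\,d\varphi = 0 \qquad E\{\sin\varphi\} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\sin\varphi\,d\varphi = 0 $$
Reasoning similarly, we obtain $E\{\cos(2\omega t + \omega\tau + 2\varphi)\} = 0$. And since
$$ 2\cos[\omega(t + \tau) + \varphi]\cos(\omega t + \varphi) = \cos\omega\tau + \cos(2\omega t + \omega\tau + 2\varphi) $$
we conclude that
$$ R(\tau) = a^2E\{\cos[\omega(t + \tau) + \varphi]\cos(\omega t + \varphi)\} = \frac{a^2}{2}E\{\cos\omega\tau\} $$
(b) With $\omega$ and $\varphi$ as above, the process
$$ z(t) = ae^{j(\omega t + \varphi)} $$
is WSS with zero mean and autocorrelation
$$ E\{z(t + \tau)z^*(t)\} = a^2E\{e^{j\omega\tau}\} = a^2\Phi_\omega(\tau) $$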
Centering. Given a process $x(t)$ with mean $\eta(t)$ and autocovariance $C_x(t_1, t_2)$, we form the difference
$$ \tilde{x}(t) = x(t) - \eta(t) \tag{10-61} $$
This difference is called the centered process associated with the process $x(t)$. Note that
$$ E\{\tilde{x}(t)\} = 0 \qquad R_{\tilde{x}}(t_1, t_2) = C_x(t_1, t_2) $$
From this it follows that if the process $x(t)$ is covariance stationary, that is, if $C_x(t_1, t_2) = C_x(t_1 - t_2)$, then its centered process $\tilde{x}(t)$ is WSS.
Other forms of stationarity. A process $x(t)$ is asymptotically stationary if the statistics of the RVs $x(t_1 + c), \ldots, x(t_n + c)$ do not depend on $c$ if $c$ is large. More precisely, the function
$$ f(x_1, \ldots, x_n; t_1 + c, \ldots, t_n + c) $$
tends to a limit (that does not depend on $c$) as $c \to \infty$. The semirandom telegraph signal is an example.
A process $x(t)$ is $N$th-order stationary if (10-38) holds not for every $n$, but only for $n \le N$.
A process $x(t)$ is stationary in an interval if (10-38) holds for every $t_i$ and $t_i + c$ in this interval.
We say that $x(t)$ is a process with stationary increments if its increments $y(t) = x(t + h) - x(t)$ form a stationary process for every $h$. The Poisson process is an example.
MEAN SQUARE PERIODICITY. A process $x(t)$ is called MS periodic if
$$ E\{|x(t + T) - x(t)|^2\} = 0 \tag{10-62} $$
for every $t$. From this it follows that, for a specific $t$,
$$ x(t + T) = x(t) \tag{10-63} $$
with probability 1. It does not, however, follow that the set of outcomes $\zeta$ such that $x(t + T, \zeta) = x(t, \zeta)$ for all $t$ has probability 1.
As we see from (10-63) the mean of an MS periodic process is periodic. We shall examine the properties of $R(t_1, t_2)$.
THEOREM. A process $x(t)$ is MS periodic iff its autocorrelation is doubly periodic, that is, if
$$ R(t_1 + mT, t_2 + nT) = R(t_1, t_2) \tag{10-64} $$
for every integer $m$ and $n$.
Proof. As we know [see (7-12)]
$$ E^2\{zw\} \le E\{z^2\}E\{w^2\} $$
With $z = x(t_1)$ and $w = x(t_2 + T) - x(t_2)$ the above yields
$$ E^2\{x(t_1)[x(t_2 + T) - x(t_2)]\} \le E\{x^2(t_1)\}E\{[x(t_2 + T) - x(t_2)]^2\} $$
If $x(t)$ is MS periodic, then the last term above is 0. Equating the left side to 0, we obtain
$$ R(t_1, t_2 + T) - R(t_1, t_2) = 0 $$
Repeated application of this yields (10-64).
Conversely, if (10-64) is true, then
$$ R(t + T, t + T) = R(t + T, t) = R(t, t) $$
Hence
$$ E\{[x(t + T) - x(t)]^2\} = R(t + T, t + T) + R(t, t) - 2R(t + T, t) = 0 $$
therefore $x(t)$ is MS periodic.
10-2 SYSTEMS WITH STOCHASTIC INPUTS
Given a stochastic process $x(t)$, we assign according to some rule to each of its samples $x(t, \zeta_i)$ a function $y(t, \zeta_i)$. We have thus created another process
$$ y(t) = T[x(t)] $$
whose samples are the functions $y(t, \zeta_i)$. The process $y(t)$ so formed can be considered as the output of a system (transformation) with input the process $x(t)$. The system is completely specified in terms of the operator $T$, that is, the rule of correspondence between the samples of the input $x(t)$ and the output $y(t)$.
The system is deterministic if it operates only on the variable $t$ treating $\zeta$ as a parameter. This means that if two samples $x(t, \zeta_1)$ and $x(t, \zeta_2)$ of the input are identical in $t$, then the corresponding samples $y(t, \zeta_1)$ and $y(t, \zeta_2)$ of the output are also identical in $t$. The system is called stochastic if $T$ operates on both variables $t$ and $\zeta$. This means that there exist two outcomes $\zeta_1$ and $\zeta_2$ such that $x(t, \zeta_1) = x(t, \zeta_2)$ identically in $t$ but $y(t, \zeta_1) \neq y(t, \zeta_2)$. These classifications are based on the terminal properties of the system. If the system is specified in terms of physical elements or by an equation, then it is deterministic (stochastic) if the elements or the coefficients of the defining equations are deterministic (stochastic). Throughout this book we shall consider only deterministic systems.
In principle, the statistics of the output of a system can be expressed in terms of the statistics of the input. However, in general this is a complicated problem. We consider next two important special cases.
Memoryless Systems
A system is called memoryless if its output is given by
$$ y(t) = g[x(t)] $$
where $g(x)$ is a function of $x$. Thus, at a given time $t = t_1$, the output $y(t_1)$ depends only on $x(t_1)$ and not on any other past or future values of $x(t)$.
From the above it follows that the first-order density $f_y(y; t)$ of $y(t)$ can be expressed in terms of the corresponding density $f_x(x; t)$ of $x(t)$ as in Sec. 5-2. Furthermore,
$$ E\{y(t)\} = \int_{-\infty}^{\infty} g(x)f_x(x; t)\,dx $$
Similarly, since $y(t_1) = g[x(t_1)]$ and $y(t_2) = g[x(t_2)]$, the second-order density $f_y(y_1, y_2; t_1, t_2)$ of $y(t)$ can be determined in terms of the corresponding density $f_x(x_1, x_2; t_1, t_2)$ of $x(t)$ as in Sec. 6-3. Furthermore,
$$ E\{y(t_1)y(t_2)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x_1)g(x_2)f_x(x_1, x_2; t_1, t_2)\,dx_1\,dx_2 $$
The $n$th-order density $f_y(y_1, \ldots, y_n; t_1, \ldots, t_n)$ of $y(t)$ can be determined from the corresponding density of $x(t)$ as in (8-8) where the underlying transformation is the system
$$ y(t_1) = g[x(t_1)], \ldots, y(t_n) = g[x(t_n)] \tag{10-65} $$
STATIONARITY. Suppose that the input to a memoryless system is an SSS process $x(t)$. We shall show that the resulting output $y(t)$ is also SSS.
Proof. To determine the $n$th-order density of $y(t)$, we solve the system
$$ g(x_1) = y_1, \ldots, g(x_n) = y_n \tag{10-66} $$
If this system has a unique solution, then [see (8-8)]
$$ f_y(y_1, \ldots, y_n; t_1, \ldots, t_n) = \frac{f_x(x_1, \ldots, x_n; t_1, \ldots, t_n)}{|g'(x_1) \cdots g'(x_n)|} \tag{10-67} $$
From the stationarity of $x(t)$ it follows that the numerator in (10-67) is invariant to a shift of the time origin. And since the denominator does not depend on $t$, we conclude that the left side does not change if $t_i$ is replaced by $t_i + c$. Hence $y(t)$ is SSS. We can similarly show that this is true even if (10-66) has more than one solution.
Notes 1. If $x(t)$ is stationary of order $N$, then $y(t)$ is stationary of order $N$.
2. If $x(t)$ is stationary in an interval, then $y(t)$ is stationary in the same interval.
3. If $x(t)$ is WSS, then $y(t)$ might not be stationary in any sense.
Square-law detector. A square-law detector is a memoryless system whose output equals
$$ y(t) = x^2(t) $$
We shall determine its first- and second-order densities. If $y > 0$, then the system $y = x^2$ has the two solutions $\pm\sqrt{y}$. Furthermore, $y'(x) = \pm 2\sqrt{y}$; hence
$$ f_y(y; t) = \frac{1}{2\sqrt{y}}\left[f_x(\sqrt{y}; t) + f_x(-\sqrt{y}; t)\right] $$
If $y_1 > 0$ and $y_2 > 0$, then the system
$$ y_1 = x_1^2 \qquad y_2 = x_2^2 $$
has the four solutions $(\pm\sqrt{y_1}, \pm\sqrt{y_2})$. Furthermore, its jacobian equals $\pm 4\sqrt{y_1y_2}$; hence
$$ f_y(y_1, y_2; t_1, t_2) = \frac{1}{4\sqrt{y_1y_2}}\sum f_x(\pm\sqrt{y_1}, \pm\sqrt{y_2}; t_1, t_2) $$
where the summation has four terms.
Note that, if $x(t)$ is SSS, then $f_x(x; t) = f_x(x)$ is independent of $t$ and $f_x(x_1, x_2; t_1, t_2) = f_x(x_1, x_2; \tau)$ depends only on $\tau = t_1 - t_2$. Hence $f_y(y)$ is independent of $t$ and $f_y(y_1, y_2; \tau)$ depends only on $\tau = t_1 - t_2$.
Example 10-15. Suppose that $x(t)$ is a normal stationary process with zero mean and autocorrelation $R_x(\tau)$. In this case, $f_x(x)$ is normal with variance $R_x(0)$.
If $y(t) = x^2(t)$ (Fig. 10-6), then $E\{y(t)\} = R_x(0)$ and [see (5-8)]
$$ f_y(y) = \frac{1}{\sqrt{2\pi R_x(0)y}}\,e^{-y/2R_x(0)}\,U(y) $$
We shall show that
$$ R_y(\tau) = R_x^2(0) + 2R_x^2(\tau) \tag{10-68} $$
Proof. The RVs $x(t + \tau)$ and $x(t)$ are jointly normal with zero mean. Hence [see (7-36)]
$$ E\{x^2(t + \tau)x^2(t)\} = E\{x^2(t + \tau)\}E\{x^2(t)\} + 2E^2\{x(t + \tau)x(t)\} $$
and (10-68) results.
Note in particular that
$$ E\{y^2(t)\} = R_y(0) = 3R_x^2(0) \qquad \sigma_y^2 = 2R_x^2(0) $$
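Relation (10-68) for the square-law detector can be checked by simulation. The sketch below is only an illustration: it draws pairs of zero-mean jointly normal RVs standing in for $x(t + \tau)$ and $x(t)$, with the assumed values $R_x(0) = 1$ and $R_x(\tau) = 0.6$, and compares the sample average of $x^2(t + \tau)x^2(t)$ with $R_x^2(0) + 2R_x^2(\tau)$.

```python
import math
import random

def sample_pair(rng, sigma, rho):
    """One sample of two zero-mean jointly normal RVs with variance sigma^2 and correlation rho."""
    u = rng.gauss(0.0, 1.0)
    w = rng.gauss(0.0, 1.0)
    return sigma * u, sigma * (rho * u + math.sqrt(1.0 - rho * rho) * w)

def estimate_square_law_autocorr(sigma, rho, n=200_000, seed=1):
    """Monte Carlo estimate of E{x^2(t+tau) x^2(t)}."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        a, b = sample_pair(rng, sigma, rho)
        total += a * a * b * b
    return total / n

sigma, rho = 1.0, 0.6          # assumed: Rx(0) = 1, Rx(tau) = 0.6
estimate = estimate_square_law_autocorr(sigma, rho)
predicted = sigma**4 + 2.0 * (rho * sigma**2) ** 2   # right side of (10-68)
print(estimate, predicted)
```

The estimate fluctuates with the seed and the sample size, so agreement is expected only to within the Monte Carlo error.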
Hard limiter. Consider a memoryless system with
$$ g(x) = \begin{cases} 1 & x > 0 \\ -1 & x < 0 \end{cases} \tag{10-69} $$
(Fig. 10-7). Its output $y(t)$ takes the values $\pm 1$ and
$$ P\{y(t) = 1\} = P\{x(t) > 0\} = 1 - F_x(0) $$
$$ P\{y(t) = -1\} = P\{x(t) < 0\} = F_x(0) $$
Hence
$$ E\{y(t)\} = 1 \times P\{y(t) = 1\} - 1 \times P\{y(t) = -1\} = 1 - 2F_x(0) $$
The product $y(t + \tau)y(t)$ equals 1 if $x(t + \tau)x(t) > 0$ and it equals $-1$ otherwise. Hence
$$ R_y(\tau) = P\{x(t + \tau)x(t) > 0\} - P\{x(t + \tau)x(t) < 0\} \tag{10-70} $$
FIGURE 10-7
Thus, in the probability plane of the RVs $x(t + \tau)$ and $x(t)$, $R_y(\tau)$ equals the masses in the first and third quadrants minus the masses in the second and fourth quadrants.
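Example 10-16. We shall show that if $x(t)$ is a normal stationary process, then the autocorrelation of the output of a hard limiter equals
$$ R_y(\tau) = \frac{2}{\pi}\arcsin\frac{R_x(\tau)}{R_x(0)} \tag{10-71} $$
This result is known as the arcsine law.†
Proof. The RVs $x(t + \tau)$ and $x(t)$ are jointly normal with zero mean, variance $R_x(0)$, and correlation coefficient $R_x(\tau)/R_x(0)$. Hence [see (6-47)]
$$ P\{x(t + \tau)x(t) > 0\} = \frac{1}{2} + \frac{\alpha}{\pi} \qquad P\{x(t + \tau)x(t) < 0\} = \frac{1}{2} - \frac{\alpha}{\pi} \qquad \sin\alpha = \frac{R_x(\tau)}{R_x(0)} $$
Inserting in (10-70), we obtain
$$ R_y(\tau) = \frac{1}{2} + \frac{\alpha}{\pi} - \left(\frac{1}{2} - \frac{\alpha}{\pi}\right) = \frac{2\alpha}{\pi} $$
and (10-71) follows.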
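The arcsine law (10-71) lends itself to a quick simulation check. The following sketch assumes unit-variance normal RVs with correlation coefficient $0.5$ (an illustrative value) in place of $x(t + \tau)$ and $x(t)$, and estimates $E\{\operatorname{sgn} x(t + \tau)\operatorname{sgn} x(t)\}$.

```python
import math
import random

def hard_limiter_autocorr(rho, n=200_000, seed=2):
    """Monte Carlo estimate of E{sgn x(t+tau) sgn x(t)} for unit-variance normal RVs with correlation rho."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        w = rng.gauss(0.0, 1.0)
        v = rho * u + math.sqrt(1.0 - rho * rho) * w
        total += 1 if u * v > 0 else -1
    return total / n

rho = 0.5                                       # assumed value of Rx(tau)/Rx(0)
estimate = hard_limiter_autocorr(rho)
arcsine_law = 2.0 / math.pi * math.asin(rho)    # right side of (10-71)
print(estimate, arcsine_law)
```

For a correlation coefficient of $0.5$ the law gives $(2/\pi)(\pi/6) = 1/3$; the simulated value should be close to this, within sampling error.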
Example 10-17 Bussgang's theorem. Using Price's theorem, we shall show that if the input to a memoryless system $y = g(x)$ is a zero-mean normal process $x(t)$, the cross-correlation of $x(t)$ with the resulting output $y(t) = g[x(t)]$ is proportional to $R_{xx}(\tau)$:
$$ R_{xy}(\tau) = KR_{xx}(\tau) \qquad \text{where} \quad K = E\{g'[x(t)]\} \tag{10-72} $$
Proof. For a specific $t$, the RVs $x = x(t)$ and $z = x(t + \tau)$ are jointly normal with zero mean and covariance $\mu = E\{xz\} = R_{xx}(\tau)$. With
$$ I = E\{zg(x)\} = E\{x(t + \tau)y(t)\} = R_{xy}(\tau) $$
it follows from (7-37) that
$$ \frac{\partial I}{\partial\mu} = E\left\{\frac{\partial^2[zg(x)]}{\partial x\,\partial z}\right\} = E\{g'[x(t)]\} = K \tag{10-73} $$
If $\mu = 0$, the RVs $x(t + \tau)$ and $x(t)$ are independent; hence $I = 0$. Integrating (10-73) with respect to $\mu$, we obtain $I = K\mu$ and (10-72) results.
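Bussgang's theorem (10-72) can be illustrated by simulation. The sketch below assumes the nonlinearity $g(x) = x^3$, unit variance, and correlation $0.4$ between $x(t + \tau)$ and $x(t)$; all of these are illustrative choices, not taken from the text. It estimates both $R_{xy}(\tau)$ and $K = E\{g'[x(t)]\}$ from the same samples.

```python
import math
import random

def g(x):
    """Illustrative memoryless nonlinearity (an assumption, not from the text)."""
    return x ** 3

def g_prime(x):
    """Derivative of the illustrative nonlinearity."""
    return 3.0 * x * x

def bussgang_check(sigma, rho, n=200_000, seed=3):
    """Return Monte Carlo estimates of Rxy(tau) = E{x(t+tau) g[x(t)]} and K = E{g'[x(t)]}."""
    rng = random.Random(seed)
    sum_zg = 0.0
    sum_gp = 0.0
    for _ in range(n):
        x = sigma * rng.gauss(0.0, 1.0)
        w = sigma * rng.gauss(0.0, 1.0)
        z = rho * x + math.sqrt(1.0 - rho * rho) * w   # E{xz} = rho * sigma^2
        sum_zg += z * g(x)
        sum_gp += g_prime(x)
    return sum_zg / n, sum_gp / n

sigma, rho = 1.0, 0.4
r_xy, K = bussgang_check(sigma, rho)
r_xx = rho * sigma ** 2
print(r_xy, K * r_xx)   # (10-72) predicts that these two values agree
```

For this choice of $g$ the exact values are $K = 3\sigma^2$ and $R_{xy}(\tau) = 3\sigma^2R_{xx}(\tau)$, so both printed numbers should be near 1.2.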
+J. L Lawson and G. E. Uhlenbeck: Threshold Signals, McGraw-Hill Book Company, New York,
1Й0.
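Special cases† (a) (Hard limiter) Suppose that $g(x) = \operatorname{sgn} x$ as in (10-69). In this case, $g'(x) = 2\delta(x)$; hence
$$ K = E\{2\delta(x)\} = 2\int_{-\infty}^{\infty}\delta(x)f(x)\,dx = 2f(0) $$
where
$$ f(x) = \frac{1}{\sqrt{2\pi R_{xx}(0)}}\exp\left\{-\frac{x^2}{2R_{xx}(0)}\right\} $$
is the first-order density of $x(t)$. Inserting into (10-72), we obtain
$$ R_{xy}(\tau) = R_{xx}(\tau)\sqrt{\frac{2}{\pi R_{xx}(0)}} \qquad y(t) = \operatorname{sgn} x(t) \tag{10-74} $$
(b) (Limiter) Suppose next that $y(t)$ is the output of a limiter
$$ g(x) = \begin{cases} x & |x| < c \\ c & x > c \\ -c & x < -c \end{cases} \qquad g'(x) = \begin{cases} 1 & |x| < c \\ 0 & |x| > c \end{cases} $$
In this case,
$$ K = \int_{-c}^{c} f(x)\,dx = 2G\left(\frac{c}{\sqrt{R_{xx}(0)}}\right) - 1 \tag{10-75} $$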
Linear Systems
The notation
$$ y(t) = L[x(t)] \tag{10-76} $$
will indicate that $y(t)$ is the output of a linear system with input $x(t)$. This means that
$$ L[a_1x_1(t) + a_2x_2(t)] = a_1L[x_1(t)] + a_2L[x_2(t)] \tag{10-77} $$
for any $a_1, a_2, x_1(t), x_2(t)$.
The above is the familiar definition of linearity and it also holds if the coefficients $a_1$ and $a_2$ are random variables because, as we have assumed, the system is deterministic, that is, it operates only on the variable $t$.
Note If a system is specified by its internal structure or by a differential equation, then (10-77) holds only if $y(t)$ is the zero-state response. The response due to the initial conditions (zero-input response) will not be considered.
A system is called time-invariant if its response to $x(t + c)$ equals $y(t + c)$. We shall assume throughout that all linear systems under consideration are time-invariant.
† H. E. Rowe, "Memoryless Nonlinearities with Gaussian Inputs," BSTJ, vol. 61, no. 7, September 1982.
It is well known that the output of a linear system is a convolution
$$ y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(t - \alpha)h(\alpha)\,d\alpha \tag{10-78} $$
where
$$ h(t) = L[\delta(t)] $$
is its impulse response. In the following, most systems will be specified by (10-78). However, we start our investigation using the operational notation (10-76) to stress the fact that various results based on the next theorem also hold for arbitrary linear operators involving one or more variables.
The following observations are immediate consequences of the linearity and time invariance of the system.
If $x(t)$ is a normal process, then $y(t)$ is also a normal process. This is an extension of the familiar property of linear transformations of normal RVs and can be justified if we approximate the integral in (10-78) by a sum:
$$ y(t_i) \simeq \sum_k x(t_i - \alpha_k)h(\alpha_k)\,\Delta\alpha $$
If $x(t)$ is SSS, then $y(t)$ is also SSS. Indeed, since $y(t + c) = L[x(t + c)]$ for every $c$, we conclude that if the processes $x(t)$ and $x(t + c)$ have the same statistical properties, so do the processes $y(t)$ and $y(t + c)$. We show later [see (10-133)] that if $x(t)$ is WSS, the processes $x(t)$ and $y(t)$ are jointly WSS.
Fundamental theorem. For any linear system
$$ E\{L[x(t)]\} = L[E\{x(t)\}] \tag{10-79} $$
In other words, the mean $\eta_y(t)$ of the output $y(t)$ equals the response of the system to the mean $\eta_x(t)$ of the input (Fig. 10-8a)
$$ \eta_y(t) = L[\eta_x(t)] \tag{10-80} $$
The above is a simple extension of the linearity of expected values to arbitrary linear operators. In the context of (10-78) it can be deduced if we write the integral as a limit of a sum. This yields
$$ E\{y(t)\} = \int_{-\infty}^{\infty} E\{x(t - \alpha)\}h(\alpha)\,d\alpha = \eta_x(t) * h(t) \tag{10-81} $$
FIGURE 10-8
Frequency interpretation At the $i$th trial the input to our system is a function $x(t, \zeta_i)$ yielding as output the function $y(t, \zeta_i) = L[x(t, \zeta_i)]$. For large $n$,
$$ E\{y(t)\} \simeq \frac{y(t, \zeta_1) + \cdots + y(t, \zeta_n)}{n} = \frac{L[x(t, \zeta_1)] + \cdots + L[x(t, \zeta_n)]}{n} $$
From the linearity of the system it follows that the last term above equals
$$ L\left[\frac{x(t, \zeta_1) + \cdots + x(t, \zeta_n)}{n}\right] $$
This agrees with (10-79) because the fraction is nearly equal to $E\{x(t)\}$.
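The frequency interpretation can be reproduced in discrete time. The sketch below is an illustration under assumptions: a short FIR filter stands in for the linear system $L$, and the sample functions are a ramp plus zero-mean noise. It confirms that averaging the outputs gives the same result as filtering the averaged input.

```python
import random

def fir_filter(x, h):
    """Discrete-time linear time-invariant system: y[n] = sum_k h[k] x[n-k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def sample_mean(signals):
    """Pointwise average of equal-length sample functions."""
    count = len(signals)
    return [sum(s[n] for s in signals) / count for n in range(len(signals[0]))]

rng = random.Random(4)
h = [0.5, 0.3, 0.2]                                   # illustrative impulse response
# Illustrative sample functions x(t, zeta_i): a ramp plus zero-mean noise
inputs = [[0.1 * n + rng.gauss(0.0, 1.0) for n in range(20)] for _ in range(50)]

mean_of_outputs = sample_mean([fir_filter(x, h) for x in inputs])
output_of_mean = fir_filter(sample_mean(inputs), h)
max_gap = max(abs(a - b) for a, b in zip(mean_of_outputs, output_of_mean))
print(max_gap)   # zero up to floating-point rounding
```

The equality here is exact (apart from rounding) for any number of trials, because it relies only on linearity; the approximation in the text enters when the sample average is identified with the expected value.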
Notes 1. From (10-80) it follows that if
$$ \tilde{x}(t) = x(t) - \eta_x(t) \qquad \tilde{y}(t) = y(t) - \eta_y(t) $$
then
$$ L[\tilde{x}(t)] = L[x(t)] - L[\eta_x(t)] = \tilde{y}(t) \tag{10-82} $$
Thus the response of a linear system to the centered input $\tilde{x}(t)$ equals the centered output $\tilde{y}(t)$.
2. Suppose that
$$ x(t) = f(t) + \nu(t) \qquad E\{\nu(t)\} = 0 $$
In this case, $E\{x(t)\} = f(t)$; hence
$$ \eta_y(t) = f(t) * h(t) $$
Thus, if $x(t)$ is the sum of a deterministic signal $f(t)$ and a random component $\nu(t)$, then for the determination of the mean of the output we can ignore $\nu(t)$ provided that the system is linear and $E\{\nu(t)\} = 0$.
Theorem (10-79) can be used to express the joint moments of any order of the output $y(t)$ of a linear system in terms of the corresponding moments of the input. The following special cases are of fundamental importance in the study of linear systems with stochastic inputs.
OUTPUT AUTOCORRELATION. We wish to express the autocorrelation $R_{yy}(t_1, t_2)$ of the output $y(t)$ of a linear system in terms of the autocorrelation $R_{xx}(t_1, t_2)$ of the input $x(t)$. As we shall presently see, it is easier to find first the cross-correlation $R_{xy}(t_1, t_2)$ between $x(t)$ and $y(t)$.
THEOREM
(a)
$$ R_{xy}(t_1, t_2) = L_2[R_{xx}(t_1, t_2)] \tag{10-83} $$
In the above notation, $L_2$ means that the system operates on the variable $t_2$ treating $t_1$ as a parameter. In the context of (10-78) this means that
$$ R_{xy}(t_1, t_2) = \int_{-\infty}^{\infty} R_{xx}(t_1, t_2 - \alpha)h(\alpha)\,d\alpha \tag{10-84} $$
(b)
$$ R_{yy}(t_1, t_2) = L_1[R_{xy}(t_1, t_2)] \tag{10-85} $$
In this case, the system operates on $t_1$:
$$ R_{yy}(t_1, t_2) = \int_{-\infty}^{\infty} R_{xy}(t_1 - \alpha, t_2)h(\alpha)\,d\alpha \tag{10-86} $$
Proof. Multiplying (10-76) by $x(t_1)$ and using (10-77), we obtain
$$ x(t_1)y(t) = L_t[x(t_1)x(t)] $$
where $L_t$ means that the system operates on $t$. Hence [see (10-79)]
$$ E\{x(t_1)y(t)\} = L_t[E\{x(t_1)x(t)\}] $$
and (10-83) follows with $t = t_2$. The proof of (10-85) is similar: We multiply (10-76) by $y(t_2)$ and use (10-79). This yields
$$ E\{y(t)y(t_2)\} = L_t[E\{x(t)y(t_2)\}] $$
and (10-85) follows with $t = t_1$.
The preceding theorem is illustrated in Fig. 10-8b: If $R_{xx}(t_1, t_2)$ is the input to the given system and the system operates on $t_2$, the output equals $R_{xy}(t_1, t_2)$. If $R_{xy}(t_1, t_2)$ is the input and the system operates on $t_1$, the output equals $R_{yy}(t_1, t_2)$.
Inserting (10-84) into (10-86), we obtain
$$ R_{yy}(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_{xx}(t_1 - \alpha, t_2 - \beta)h(\alpha)h(\beta)\,d\alpha\,d\beta $$
This expresses $R_{yy}(t_1, t_2)$ directly in terms of $R_{xx}(t_1, t_2)$. However, conceptually and operationally, it is preferable to find first $R_{xy}(t_1, t_2)$.
Example 10-18. A stationary process $\nu(t)$ with autocorrelation $R_{\nu\nu}(\tau) = q\delta(\tau)$ (white noise) is applied at $t = 0$ to a linear system with
$$ h(t) = e^{-ct}U(t) $$
We shall show that the autocorrelation of the resulting output $y(t)$ equals
$$ R_{yy}(t_1, t_2) = \frac{q}{2c}(1 - e^{-2ct_1})e^{-c|t_2 - t_1|} \tag{10-87} $$
for $0 < t_1 < t_2$.
Proof. We can use the preceding results if we assume that the input to the system is the process
$$ x(t) = \nu(t)U(t) $$
With this assumption, all correlations are 0 if $t_1 < 0$ or $t_2 < 0$. For $t_1 > 0$ and $t_2 > 0$,
$$ R_{xx}(t_1, t_2) = R_{\nu\nu}(t_1 - t_2) = q\delta(t_1 - t_2) $$
As we see from (10-83), $R_{xy}(t_1, t_2)$ equals the response of the system to $q\delta(t_1 - t_2)$ considered as a function of $t_2$. Since $\delta(t_1 - t_2) = \delta(t_2 - t_1)$ and $L[\delta(t_2 - t_1)] = h(t_2 - t_1)$ (time invariance), we conclude that
$$ R_{xy}(t_1, t_2) = qh(t_2 - t_1) = qe^{-c(t_2 - t_1)}U(t_2 - t_1) $$
In Fig. 10-9, we show $R_{xy}(t_1, t_2)$ as a function of $t_1$ and $t_2$. Inserting into (10-86), we obtain
$$ R_{yy}(t_1, t_2) = q\int_0^{t_1} e^{c(t_1 - \alpha - t_2)}e^{-c\alpha}\,d\alpha \qquad t_1 < t_2 $$
and (10-87) results.
Note that
$$ E\{y^2(t)\} = R_{yy}(t, t) = \frac{q}{2c}(1 - e^{-2ct}) $$
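The closed form (10-87) can be verified by numerical integration. With the change of variable $s = t_1 - \alpha$, the integral obtained from (10-86) becomes $q\int_0^{\min(t_1, t_2)} h(t_1 - s)h(t_2 - s)\,ds$; the sketch below evaluates this with a midpoint rule. The values $q = 2$, $c = 1.5$, $t_1 = 0.7$, $t_2 = 1.2$ and the function names are illustrative assumptions.

```python
import math

def h(t, c):
    """Impulse response h(t) = exp(-c t) U(t) of Example 10-18."""
    return math.exp(-c * t) if t >= 0 else 0.0

def output_autocorr(t1, t2, q, c, n=20_000):
    """Midpoint-rule value of q * integral_0^{min(t1,t2)} h(t1 - s) h(t2 - s) ds."""
    upper = min(t1, t2)
    ds = upper / n
    return q * ds * sum(h(t1 - (i + 0.5) * ds, c) * h(t2 - (i + 0.5) * ds, c)
                        for i in range(n))

def closed_form(t1, t2, q, c):
    """Right side of (10-87), valid for 0 < t1 < t2."""
    return q / (2.0 * c) * (1.0 - math.exp(-2.0 * c * t1)) * math.exp(-c * abs(t2 - t1))

q, c = 2.0, 1.5                 # illustrative values
t1, t2 = 0.7, 1.2
numeric = output_autocorr(t1, t2, q, c)
exact = closed_form(t1, t2, q, c)
print(numeric, exact)
```

Setting the two time arguments equal in the same function gives the average power $E\{y^2(t)\} = (q/2c)(1 - e^{-2ct})$ noted at the end of the example.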
COROLLARY. The autocovariance $C_{yy}(t_1, t_2)$ of $y(t)$ is the autocorrelation of the process $\tilde{y}(t) = y(t) - \eta_y(t)$ and, as we see from (10-82), $\tilde{y}(t)$ equals $L[\tilde{x}(t)]$. Applying (10-84) and (10-86) to the centered processes $\tilde{x}(t)$ and $\tilde{y}(t)$, we obtain
$$ C_{xy}(t_1, t_2) = C_{xx}(t_1, t_2) * h(t_2) \qquad C_{yy}(t_1, t_2) = C_{xy}(t_1, t_2) * h(t_1) \tag{10-88} $$
where the convolutions are in $t_2$ and $t_1$ respectively.
Complex processes The preceding results can be readily extended to complex processes and to systems with complex-valued $h(t)$. Reasoning as in the real case, we obtain
$$ R_{xy}(t_1, t_2) = R_{xx}(t_1, t_2) * h^*(t_2) \qquad R_{yy}(t_1, t_2) = R_{xy}(t_1, t_2) * h(t_1) \tag{10-89} $$
Response to white noise. We shall determine the average intensity $E\{|y(t)|^2\}$ of the output of a system driven by white noise. This is a special case of (10-89); however, because of its importance it is stated as a theorem.
THEOREM. If the input to a linear system is white noise with autocorrelation
$$ R_{xx}(t_1, t_2) = q(t_1)\delta(t_1 - t_2) $$
then
$$ E\{|y(t)|^2\} = q(t) * |h(t)|^2 = \int_{-\infty}^{\infty} q(t - \alpha)|h(\alpha)|^2\,d\alpha \tag{10-90} $$
Proof. From (10-89) it follows that
$$ R_{xy}(t_1, t_2) = q(t_1)\delta(t_2 - t_1) * h^*(t_2) = q(t_1)h^*(t_2 - t_1) $$
$$ R_{yy}(t_1, t_2) = \int_{-\infty}^{\infty} q(t_1 - \alpha)h^*[t_2 - (t_1 - \alpha)]h(\alpha)\,d\alpha $$
and with $t_1 = t_2 = t$, (10-90) results.
Special cases (a) If $x(t)$ is stationary white noise, then $q(t) = q$ and (10-90) yields
$$ E\{y^2(t)\} = qE \qquad \text{where} \quad E = \int_{-\infty}^{\infty}|h(t)|^2\,dt $$
is the energy of $h(t)$.
(b) If $h(t)$ is of short duration relative to the variations of $q(t)$, then
$$ E\{y^2(t)\} \simeq q(t)\int_{-\infty}^{\infty}|h(\alpha)|^2\,d\alpha = Eq(t) \tag{10-91} $$
This relationship justifies the term average intensity used to describe the function $q(t)$.
(c) If $R_{\nu\nu}(\tau) = q\delta(\tau)$ and $\nu(t)$ is applied to the system at $t = 0$, then $q(t) = qU(t)$ and (10-90) yields
$$ E\{y^2(t)\} = q\int_{-\infty}^{t}|h(\alpha)|^2\,d\alpha $$
Example 10-19. The integral
$$ y(t) = \int_0^t \nu(\alpha)\,d\alpha $$
can be considered as the output of a linear system with input $x(t) = \nu(t)U(t)$ and impulse response $h(t) = U(t)$. If, therefore, $\nu(t)$ is white noise with average intensity $q(t)$, then $x(t)$ is white noise with average intensity $q(t)U(t)$ and (10-90) yields
$$ E\{y^2(t)\} = q(t)U(t) * U(t) = \int_0^t q(\alpha)\,d\alpha $$
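Differentiators. A differentiator is a linear system whose output is the derivative of the input
$$ L[x(t)] = x'(t) $$
We can, therefore, use the preceding results to find the mean and the autocorrelation of $x'(t)$.
From (10-80) it follows that
$$ \eta_{x'}(t) = L[\eta_x(t)] = \eta_x'(t) \tag{10-92} $$
Similarly [see (10-83)]
$$ R_{xx'}(t_1, t_2) = L_2[R_{xx}(t_1, t_2)] = \frac{\partial R_{xx}(t_1, t_2)}{\partial t_2} \tag{10-93} $$
because, in this case, $L_2$ means differentiation with respect to $t_2$. Finally,
$$ R_{x'x'}(t_1, t_2) = L_1[R_{xx'}(t_1, t_2)] = \frac{\partial R_{xx'}(t_1, t_2)}{\partial t_1} \tag{10-94} $$
Combining, we obtain
$$ R_{x'x'}(t_1, t_2) = \frac{\partial^2 R_{xx}(t_1, t_2)}{\partial t_1\,\partial t_2} \tag{10-95} $$
Stationary processes If $x(t)$ is WSS, then $\eta_x(t)$ is constant; hence
$$ E\{x'(t)\} = 0 \tag{10-96} $$
Furthermore, since $R_{xx}(t_1, t_2) = R_{xx}(\tau)$, we conclude with $\tau = t_1 - t_2$ that
$$ \frac{\partial R_{xx}(t_1 - t_2)}{\partial t_2} = -\frac{dR_{xx}(\tau)}{d\tau} \qquad \frac{\partial^2 R_{xx}(t_1 - t_2)}{\partial t_1\,\partial t_2} = -\frac{d^2R_{xx}(\tau)}{d\tau^2} $$
Hence
$$ R_{xx'}(\tau) = -R_{xx}'(\tau) \qquad R_{x'x'}(\tau) = -R_{xx}''(\tau) \tag{10-97} $$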
Poisson impulses. If the input $x(t)$ to a differentiator is a Poisson process, the resulting output $z(t)$ is a train of impulses (Fig. 10-10)
$$ z(t) = \sum_i \delta(t - t_i) \tag{10-98} $$
We maintain that $z(t)$ is a stationary process with mean
$$ \eta_z = \lambda \tag{10-99} $$
and autocorrelation
$$ R_{zz}(\tau) = \lambda^2 + \lambda\delta(\tau) \tag{10-100} $$
FIGURE 10-10
Proof. The first equation follows from (10-92) because $\eta_x(t) = \lambda t$. To prove the second, we observe that [see (10-14)]
$$ R_{xx}(t_1, t_2) = \lambda^2t_1t_2 + \lambda\min(t_1, t_2) \tag{10-101} $$
And since $z(t) = x'(t)$, (10-93) yields
$$ R_{xz}(t_1, t_2) = \frac{\partial R_{xx}(t_1, t_2)}{\partial t_2} = \lambda^2t_1 + \lambda U(t_1 - t_2) $$
This function is plotted in Fig. 10-10b where the independent variable is $t_1$. As we see, it is discontinuous for $t_1 = t_2$ and its derivative with respect to $t_1$ contains the impulse $\lambda\delta(t_1 - t_2)$. This yields [see (10-94)]
$$ R_{zz}(t_1, t_2) = \frac{\partial R_{xz}(t_1, t_2)}{\partial t_1} = \lambda^2 + \lambda\delta(t_1 - t_2) $$
DIFFERENTIAL EQUATIONS. A deterministic differential equation with random excitation is an equation of the form
$$ a_ny^{(n)}(t) + \cdots + a_0y(t) = x(t) \tag{10-102} $$
where the coefficients $a_k$ are given numbers and the driver $x(t)$ is a stochastic process. We shall consider its solution $y(t)$ under the assumption that the initial conditions are 0. With this assumption, $y(t)$ is unique (zero-state response) and it satisfies the linearity condition (10-77). We can, therefore, interpret $y(t)$ as the output of a linear system specified by (10-102).
In general, the determination of the complete statistics of $y(t)$ is complicated. In the following, we evaluate only its second-order moments using the preceding results. The above system is an operator $L$ specified as follows: Its output $y(t)$ is a process with zero initial conditions satisfying (10-102).
Mean. As we know [see (10-80)] the mean $\eta_y(t)$ of $y(t)$ is the output of $L$ with input $\eta_x(t)$. Hence it satisfies the equation
$$ a_n\eta_y^{(n)}(t) + \cdots + a_0\eta_y(t) = \eta_x(t) \tag{10-103} $$
and the initial conditions
$$ \eta_y(0) = \cdots = \eta_y^{(n-1)}(0) = 0 \tag{10-104} $$
This result can be established directly: Clearly,
$$ E\{y^{(k)}(t)\} = \eta_y^{(k)}(t) \tag{10-105} $$
Taking expected values of both sides of (10-102) and using the above, we obtain (10-103). Equation (10-104) follows from (10-105) because $y^{(k)}(0) = 0$ by assumption.
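Correlation. To determine $R_{xy}(t_1, t_2)$, we use (10-83)
$$ R_{xy}(t_1, t_2) = L_2[R_{xx}(t_1, t_2)] $$
In this case, $L_2$ means that $R_{xy}(t_1, t_2)$ satisfies the differential equation
$$ a_n\frac{\partial^nR_{xy}(t_1, t_2)}{\partial t_2^n} + \cdots + a_0R_{xy}(t_1, t_2) = R_{xx}(t_1, t_2) \tag{10-106} $$
with the initial conditions
$$ R_{xy}(t_1, 0) = \cdots = \frac{\partial^{n-1}R_{xy}(t_1, 0)}{\partial t_2^{n-1}} = 0 \tag{10-107} $$
Similarly, since [see (10-85)]
$$ R_{yy}(t_1, t_2) = L_1[R_{xy}(t_1, t_2)] $$
we conclude as above that
$$ a_n\frac{\partial^nR_{yy}(t_1, t_2)}{\partial t_1^n} + \cdots + a_0R_{yy}(t_1, t_2) = R_{xy}(t_1, t_2) \tag{10-108} $$
$$ R_{yy}(0, t_2) = \cdots = \frac{\partial^{n-1}R_{yy}(0, t_2)}{\partial t_1^{n-1}} = 0 \tag{10-109} $$
The preceding results can be established directly: From (10-102) it follows that
$$ x(t_1)[a_ny^{(n)}(t_2) + \cdots + a_0y(t_2)] = x(t_1)x(t_2) $$
This yields (10-106) because [see (10-119)]
$$ E\{x(t_1)y^{(k)}(t_2)\} = \partial^kR_{xy}(t_1, t_2)/\partial t_2^k $$
Similarly, (10-108) is a consequence of the identity
$$ [a_ny^{(n)}(t_1) + \cdots + a_0y(t_1)]y(t_2) = x(t_1)y(t_2) $$
because
$$ E\{y^{(k)}(t_1)y(t_2)\} = \partial^kR_{yy}(t_1, t_2)/\partial t_1^k $$
Finally, the expected values of
$$ x(t_1)y^{(k)}(0) = 0 \qquad y^{(k)}(0)y(t_2) = 0 $$
yield (10-107) and (10-109).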
General moments. The moments of any order of the output $y(t)$ of a linear system can be expressed in terms of the corresponding moments of the input $x(t)$. As an illustration, we shall determine the third-order moment
$$ R_{yyy}(t_1, t_2, t_3) = E\{y(t_1)y(t_2)y(t_3)\} $$
of $y(t)$ in terms of the third-order moment $R_{xxx}(t_1, t_2, t_3)$ of $x(t)$. Proceeding as in (10-83), we obtain
$$ E\{x(t_1)x(t_2)y(t_3)\} = L_3[E\{x(t_1)x(t_2)x(t_3)\}] = \int_{-\infty}^{\infty} R_{xxx}(t_1, t_2, t_3 - \gamma)h(\gamma)\,d\gamma \tag{10-110a} $$
$$ E\{x(t_1)y(t_2)y(t_3)\} = L_2[E\{x(t_1)x(t_2)y(t_3)\}] = \int_{-\infty}^{\infty} R_{xxy}(t_1, t_2 - \beta, t_3)h(\beta)\,d\beta \tag{10-110b} $$
$$ E\{y(t_1)y(t_2)y(t_3)\} = L_1[E\{x(t_1)y(t_2)y(t_3)\}] = \int_{-\infty}^{\infty} R_{xyy}(t_1 - \alpha, t_2, t_3)h(\alpha)\,d\alpha \tag{10-110c} $$
Note that for the evaluation of $R_{yyy}(t_1, t_2, t_3)$ for specific times $t_1, t_2, t_3$, the function $R_{xxx}(t_1, t_2, t_3)$ must be known for every $t_1, t_2, t_3$.
Vector Processes and Multiterminal Systems
We consider now systems with $m$ inputs $x_i(t)$ and $r$ outputs $y_j(t)$. As a preparation, we introduce the notion of autocorrelation and cross-correlation for vector processes starting with a review of the standard matrix notation.
The expression $A = [a_{ij}]$ will mean a matrix with elements $a_{ij}$. The notation
$$ A^t = [a_{ji}] \qquad A^* = [a_{ij}^*] \qquad A^\dagger = [a_{ji}^*] $$
will mean the transpose, the conjugate, and the conjugate transpose of $A$.
A column vector will be identified by $A = [a_i]$. Whether $A$ is a vector or a general matrix will be understood from the context. If $A = [a_i]$ and $B = [b_j]$ are two vectors with $m$ elements each, the product $A^tB = a_1b_1 + \cdots + a_mb_m$ is a number, and the product $AB^t = [a_ib_j]$ is an $m \times m$ matrix with elements $a_ib_j$.
A vector process $X(t) = [x_i(t)]$ is a vector, the components of which are stochastic processes. The mean $\eta(t) = E\{X(t)\} = [\eta_i(t)]$ of $X(t)$ is a vector with components $\eta_i(t) = E\{x_i(t)\}$. The autocorrelation $R(t_1, t_2)$ or $R_{xx}(t_1, t_2)$ of a vector process $X(t)$ is an $m \times m$ matrix
$$ R(t_1, t_2) = E\{X(t_1)X^\dagger(t_2)\} \tag{10-111} $$
with elements $E\{x_i(t_1)x_j^*(t_2)\}$. We define similarly the cross-correlation matrix
$$ R_{xy}(t_1, t_2) = E\{X(t_1)Y^\dagger(t_2)\} \tag{10-112} $$
of the vector processes
$$ X(t) = [x_i(t)] \quad i = 1, \ldots, m \qquad Y(t) = [y_j(t)] \quad j = 1, \ldots, r $$
A multiterminal system with $m$ inputs $x_i(t)$ and $r$ outputs $y_j(t)$ is a rule for assigning to an $m$ vector $X(t)$ an $r$ vector $Y(t)$. If the system is linear and time-invariant, it is specified in terms of its impulse response matrix. This is an $r \times m$ matrix
$$ H(t) = [h_{ji}(t)] \qquad i = 1, \ldots, m \quad j = 1, \ldots, r \tag{10-114} $$
defined as follows: Its component $h_{ji}(t)$ is the response of the $j$th output when the $i$th input equals $\delta(t)$ and all other inputs equal 0. From this and the linearity of the system, it follows that the response $y_j(t)$ of the $j$th output to an arbitrary input $X(t) = [x_i(t)]$ equals
$$ y_j(t) = \int_{-\infty}^{\infty} h_{j1}(\alpha)x_1(t - \alpha)\,d\alpha + \cdots + \int_{-\infty}^{\infty} h_{jm}(\alpha)x_m(t - \alpha)\,d\alpha $$
Hence
$$ Y(t) = \int_{-\infty}^{\infty} H(\alpha)X(t - \alpha)\,d\alpha \tag{10-115} $$
In the above, $X(t)$ and $Y(t)$ are column vectors and $H(t)$ is an $r \times m$ matrix.
We shall use this relationship to determine the autocorrelation $R_{yy}(t_1, t_2)$ of $Y(t)$. Premultiplying the conjugate transpose of (10-115) by $X(t_1)$ and setting $t = t_2$, we obtain
$$ X(t_1)Y^\dagger(t_2) = \int_{-\infty}^{\infty} X(t_1)X^\dagger(t_2 - \alpha)H^\dagger(\alpha)\,d\alpha $$
Hence
$$ R_{xy}(t_1, t_2) = \int_{-\infty}^{\infty} R_{xx}(t_1, t_2 - \alpha)H^\dagger(\alpha)\,d\alpha \tag{10-116a} $$
Postmultiplying (10-115) by $Y^\dagger(t_2)$ and setting $t = t_1$, we obtain
$$ R_{yy}(t_1, t_2) = \int_{-\infty}^{\infty} H(\alpha)R_{xy}(t_1 - \alpha, t_2)\,d\alpha \tag{10-116b} $$
as in (10-89). These results can be used to express the cross-correlation of the outputs of several scalar systems in terms of the cross-correlation of their inputs. The next example is an illustration.
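Example 10-20. In Fig. 10-11 we show two systems with inputs $x_1(t), x_2(t)$ and outputs
$$ y_1(t) = \int_{-\infty}^{\infty} h_1(\alpha)x_1(t - \alpha)\,d\alpha \qquad y_2(t) = \int_{-\infty}^{\infty} h_2(\alpha)x_2(t - \alpha)\,d\alpha \tag{10-117} $$
FIGURE 10-11
These signals can be considered as the components of the output vector $Y^t(t) = [y_1(t), y_2(t)]$ of a $2 \times 2$ system with input vector $X^t(t) = [x_1(t), x_2(t)]$ and impulse response matrix
$$ H(t) = \begin{bmatrix} h_1(t) & 0 \\ 0 & h_2(t) \end{bmatrix} $$
Inserting into (10-116), we obtain
$$ R_{x_1y_2}(t_1, t_2) = \int_{-\infty}^{\infty} R_{x_1x_2}(t_1, t_2 - \alpha)h_2^*(\alpha)\,d\alpha $$
$$ R_{y_1y_2}(t_1, t_2) = \int_{-\infty}^{\infty} h_1(\alpha)R_{x_1y_2}(t_1 - \alpha, t_2)\,d\alpha \tag{10-118} $$
Thus, to find $R_{x_1y_2}(t_1, t_2)$, we use $R_{x_1x_2}(t_1, t_2)$ as the input to the conjugate $h_2^*(t)$ of $h_2(t)$, operating on the variable $t_2$. To find $R_{y_1y_2}(t_1, t_2)$, we use $R_{x_1y_2}(t_1, t_2)$ as the input to $h_1(t)$ operating on the variable $t_1$ (Fig. 10-11).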
Example 10-21. The derivatives $y_1(t) = z^{(m)}(t)$ and $y_2(t) = w^{(n)}(t)$ of two processes $z(t)$ and $w(t)$ can be considered as the responses of two differentiators with inputs $x_1(t) = z(t)$ and $x_2(t) = w(t)$. Applying (10-118) suitably interpreted, we conclude that
$$ E\{z^{(m)}(t_1)w^{(n)}(t_2)\} = \frac{\partial^{m+n}R_{zw}(t_1, t_2)}{\partial t_1^m\,\partial t_2^n} \tag{10-119} $$
10-3 THE POWER SPECTRUM
In signal theory, spectra are associated with Fourier transforms. For deterministic signals, they are used to represent a function as a superposition of exponentials. For random signals, the notion of a spectrum has two interpretations. The first involves transforms of averages; it is thus essentially deterministic. The second leads to the representation of the process under consideration as a superposition of exponentials with random coefficients. In this section, we introduce the first interpretation. The second is treated in Sec. 12-4. We shall consider only stationary processes. For nonstationary processes the notion of a spectrum is of limited interest.
DEFINITIONS. The power spectrum (or spectral density) of a WSS process $x(t)$, real or complex, is the Fourier transform $S(\omega)$ of its autocorrelation $R(\tau) = E\{x(t + \tau)x^*(t)\}$:
$$ S(\omega) = \int_{-\infty}^{\infty} R(\tau)e^{-j\omega\tau}\,d\tau \tag{10-120} $$
Since $R(-\tau) = R^*(\tau)$ it follows that $S(\omega)$ is a real function of $\omega$.
From the Fourier inversion formula, it follows that
$$ R(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)e^{j\omega\tau}\,d\omega \tag{10-121} $$
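TABLE 10-1
$$ R(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)e^{j\omega\tau}\,d\omega \quad\leftrightarrow\quad S(\omega) = \int_{-\infty}^{\infty} R(\tau)e^{-j\omega\tau}\,d\tau $$
$\delta(\tau) \leftrightarrow 1$
$1 \leftrightarrow 2\pi\delta(\omega)$
$e^{j\beta\tau} \leftrightarrow 2\pi\delta(\omega - \beta)$
$\cos\beta\tau \leftrightarrow \pi\delta(\omega - \beta) + \pi\delta(\omega + \beta)$
$e^{-\alpha|\tau|} \leftrightarrow \dfrac{2\alpha}{\alpha^2 + \omega^2}$
$e^{-\alpha\tau^2} \leftrightarrow \sqrt{\dfrac{\pi}{\alpha}}\,e^{-\omega^2/4\alpha}$
$e^{-\alpha|\tau|}\cos\beta\tau \leftrightarrow \dfrac{\alpha}{\alpha^2 + (\omega - \beta)^2} + \dfrac{\alpha}{\alpha^2 + (\omega + \beta)^2}$
$2e^{-\alpha\tau^2}\cos\beta\tau \leftrightarrow \sqrt{\dfrac{\pi}{\alpha}}\left[e^{-(\omega - \beta)^2/4\alpha} + e^{-(\omega + \beta)^2/4\alpha}\right]$
$\begin{cases} 1 - |\tau|/T & |\tau| < T \\ 0 & |\tau| > T \end{cases} \leftrightarrow \dfrac{4\sin^2(\omega T/2)}{T\omega^2}$
$\dfrac{\sin\sigma\tau}{\pi\tau} \leftrightarrow \begin{cases} 1 & |\omega| < \sigma \\ 0 & |\omega| > \sigma \end{cases}$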
If x(f) is a real process, then /?(т) is real and even; hence S(<o) is also real
and even. In this case,
/00	00
jR(t)cos шт dr = 2 I /?(t)cos шт dr
° ,	(10-122)
Я(т) = -—/ S(ai)cos шт d<t) = — / 5(ai)cos ыт do)
2тг J-tn	7Г JQ
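The cross-power spectrum of two processes $x(t)$ and $y(t)$ is the Fourier
transform $S_{xy}(\omega)$ of their cross-correlation $R_{xy}(\tau) = E\{x(t+\tau)y^*(t)\}$:
$$S_{xy}(\omega) = \int_{-\infty}^{\infty} R_{xy}(\tau)e^{-j\omega\tau}\,d\tau \qquad R_{xy}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_{xy}(\omega)e^{j\omega\tau}\,d\omega \tag{10-123}$$
The function $S_{xy}(\omega)$ is, in general, complex even when both processes $x(t)$ and
$y(t)$ are real. In all cases,
$$S_{xy}(\omega) = S_{yx}^*(\omega) \tag{10-124}$$
because $R_{xy}(-\tau) = E\{x(t-\tau)y^*(t)\} = R_{yx}^*(\tau)$.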
In Table 10-1 we list a number of frequently used autocorrelations and the
corresponding spectra. Note that in all cases, $S(\omega)$ is positive. As we shall soon
show, this is true for every spectrum.
Example 10-22. A random telegraph signal is a process $x(t)$ taking the values $+1$
and $-1$ as in Example 10-6:
$$x(t) = \begin{cases} 1 & t_{2i} < t < t_{2i+1} \\ -1 & t_{2i-1} < t < t_{2i} \end{cases}$$
where $t_i$ is a set of Poisson points with average density $\lambda$. As we have shown in
(10-19), its autocorrelation equals $e^{-2\lambda|\tau|}$. Hence
$$S(\omega) = \frac{4\lambda}{4\lambda^2 + \omega^2}$$
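The autocorrelation $e^{-2\lambda|\tau|}$ of the random telegraph signal can be checked by a small Monte Carlo sketch. It relies on the observation that $x(t+\tau)x(t) = (-1)^k$, where $k$ is the number of Poisson points in an interval of length $|\tau|$. The density $\lambda = 0.7$, the lags, the number of trials, and the function name are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.7                                  # assumed average density of the Poisson points
taus = np.array([0.0, 0.5, 1.0, 2.0])
trials = 200_000

def telegraph_autocorrelation(tau):
    """Monte Carlo estimate of E{x(t + tau) x(t)} for the random telegraph signal."""
    flips = rng.poisson(lam * abs(tau), size=trials)   # sign changes within the interval
    return np.mean((-1.0) ** flips)

R_est = np.array([telegraph_autocorrelation(t) for t in taus])
R_exact = np.exp(-2 * lam * np.abs(taus))
print(np.round(R_est, 3), np.round(R_exact, 3))
```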
For most processes $R(\tau) \to \eta^2$ as $|\tau| \to \infty$, where $\eta = E\{x(t)\}$ (see Sec. 12-4). If,
therefore, $\eta \neq 0$, then $S(\omega)$ contains an impulse at $\omega = 0$. To avoid this, it is
often convenient to express the spectral properties of $x(t)$ in terms of the
Fourier transform $S^c(\omega)$ of its autocovariance $C(\tau)$. Since $R(\tau) = C(\tau) + \eta^2$, it
follows that
$$S(\omega) = S^c(\omega) + 2\pi\eta^2\delta(\omega) \tag{10-125}$$
The function $S^c(\omega)$ is called the covariance spectrum of $x(t)$.
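Example 10-23. We have shown in (10-100) that the autocorrelation of the Poisson
impulses
$$z(t) = \frac{d}{dt}x(t) = \sum_i \delta(t - t_i)$$
equals $R_z(\tau) = \lambda^2 + \lambda\delta(\tau)$. From this it follows that
$$S_z(\omega) = \lambda + 2\pi\lambda^2\delta(\omega) \qquad S_z^c(\omega) = \lambda$$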
We shall show that given an arbitrary positive function $S(\omega)$, we can find a
process $x(t)$ with power spectrum $S(\omega)$.
(a) Consider the process
$$x(t) = ae^{j(\omega t - \varphi)} \tag{10-126}$$
where $a$ is a real constant, $\omega$ is an RV with density $f_\omega(\omega)$, and $\varphi$ is an RV
independent of $\omega$ and uniform in the interval $(0, 2\pi)$. As we know, this process
is WSS with zero mean and autocorrelation
$$R_x(\tau) = a^2E\{e^{j\omega\tau}\} = a^2\int_{-\infty}^{\infty} f_\omega(\omega)e^{j\omega\tau}\,d\omega$$
From this and the uniqueness property of Fourier transforms, it follows that
[see (10-121)] the power spectrum of $x(t)$ equals
$$S_x(\omega) = 2\pi a^2 f_\omega(\omega) \tag{10-127}$$
If, therefore,
$$f_\omega(\omega) = \frac{S(\omega)}{2\pi a^2} \qquad a^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)\,d\omega$$
then $f_\omega(\omega)$ is a density and $S_x(\omega) = S(\omega)$. To complete the specification of $x(t)$,
it suffices to construct an RV $\omega$ with density $S(\omega)/2\pi a^2$ and insert it into
(10-126).
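The construction in (a) can be tried numerically. The sketch below assumes the target spectrum $S(\omega) = 2\alpha/(\alpha^2 + \omega^2)$, for which $a^2 = 1$ and $S(\omega)/2\pi a^2$ is a Cauchy density with scale $\alpha$, and estimates the autocorrelation of the exponential with random frequency by an ensemble average; it should be close to $e^{-\alpha|\tau|}$. The choice of spectrum, the parameter values, and the names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n_paths = 2.0, 200_000              # assumed spectrum parameter and ensemble size

# Target S(w) = 2*alpha/(alpha^2 + w^2): area/(2 pi) is a^2 = 1, and
# f(w) = S(w)/(2 pi a^2) is a Cauchy density with scale alpha.
a = 1.0
omega = alpha * rng.standard_cauchy(n_paths)
phi = rng.uniform(0.0, 2 * np.pi, n_paths)

def x(t):
    """Value at time t of every realization of x(t) = a exp(j(omega t - phi))."""
    return a * np.exp(1j * (omega * t - phi))

t0 = 3.0
taus = np.array([0.0, 0.25, 0.5, 1.0])
R_est = np.array([np.mean(x(t0 + tau) * np.conj(x(t0))) for tau in taus])
R_exact = np.exp(-alpha * np.abs(taus))    # inverse transform of the target spectrum
mean_est = np.mean(x(t0))                  # should be near zero
print(np.round(R_est.real, 3), np.round(R_exact, 3))
```

The estimate does not depend on the choice of `t0`, in agreement with the claim that the process is WSS.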
FIGURE 10-12
(b) We show next that if $S(-\omega) = S(\omega)$, we can find a real process with
power spectrum $S(\omega)$. To do so, we form the process
$$y(t) = a\cos(\omega t + \varphi) \tag{10-128}$$
In this case (see Example 10-14)
$$R_y(\tau) = \frac{a^2}{2}E\{\cos\omega\tau\} = \frac{a^2}{2}\int_{-\infty}^{\infty} f_\omega(\omega)\cos\omega\tau\,d\omega$$
From this it follows that if $f_\omega(\omega) = S(\omega)/\pi a^2$, then $S_y(\omega) = S(\omega)$.
Example 10-24 Doppler effect. A harmonic oscillator located at point P of the x
axis (Fig. 10-12) moves in the x direction with velocity v. The emitted signal equals
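and the signal received by an observer located at point $O$ equals
$$s(t) = ae^{j\omega_0(t - r/c)}$$
where $c$ is the velocity of propagation and $r = r_0 + vt$. We assume that $v$ is an RV
with density $f_v(v)$. Clearly,
$$s(t) = ae^{j(\omega t - \varphi)} \qquad \omega = \omega_0\left(1 - \frac{v}{c}\right) \qquad \varphi = \frac{r_0\omega_0}{c}$$
hence the spectrum of the received signal is given by (10-127)
$$S(\omega) = 2\pi a^2 f_\omega(\omega) = \frac{2\pi a^2 c}{\omega_0}\,f_v\!\left[c\left(1 - \frac{\omega}{\omega_0}\right)\right] \tag{10-129}$$
Note that if $v = 0$, then
$$s(t) = ae^{j(\omega_0 t - \varphi)} \qquad R(\tau) = a^2e^{j\omega_0\tau} \qquad S(\omega) = 2\pi a^2\delta(\omega - \omega_0)$$
This is the spectrum of the emitted signal. Thus the motion causes broadening of
the spectrum.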
The above holds also if the motion forms an angle with the x axis provided
that $v$ is replaced by its projection $v_x$ on $OP$. The following case is of special
interest. Suppose that the emitter is a particle in a gas of temperature $T$. In this
case, the x component of its velocity is a normal RV with zero mean and variance
$kT/m$ (see Prob. 8-5). Inserting into (10-129), we conclude that
$$S(\omega) = \frac{2\pi a^2 c}{\omega_0\sqrt{2\pi kT/m}}\exp\left\{-\frac{mc^2}{2kT}\left(1 - \frac{\omega}{\omega_0}\right)^2\right\}$$
$$R(\tau) = a^2\exp\left\{-\frac{kT\omega_0^2\tau^2}{2mc^2}\right\}e^{j\omega_0\tau}$$
Line spectra. (a) We have shown in Example 10-7 that the process
$$x(t) = \sum_i c_i e^{j\omega_i t}$$
is WSS if the RVs $c_i$ are uncorrelated with zero mean. From this and Table
10-1 it follows that
$$R(\tau) = \sum_i \sigma_i^2 e^{j\omega_i\tau} \qquad S(\omega) = 2\pi\sum_i \sigma_i^2\delta(\omega - \omega_i) \tag{10-130}$$
where $\sigma_i^2 = E\{|c_i|^2\}$. Thus $S(\omega)$ consists of lines. In Sec. 14-2 we show that such a
process is predictable, that is, its present value is uniquely determined in terms
of its past.
(b) Similarly, the process
$$y(t) = \sum_i (a_i\cos\omega_i t + b_i\sin\omega_i t)$$
is WSS iff the RVs $a_i$ and $b_i$ are uncorrelated with zero mean and $E\{a_i^2\} =
E\{b_i^2\} = \sigma_i^2$. In this case,
$$R(\tau) = \sum_i \sigma_i^2\cos\omega_i\tau \qquad S(\omega) = \pi\sum_i \sigma_i^2[\delta(\omega - \omega_i) + \delta(\omega + \omega_i)] \tag{10-131}$$
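Linear systems. We shall express the autocorrelation $R_{yy}(\tau)$ and power spectrum
$S_{yy}(\omega)$ of the response
$$y(t) = \int_{-\infty}^{\infty} x(t - \alpha)h(\alpha)\,d\alpha \tag{10-132}$$
of a linear system in terms of the autocorrelation $R_{xx}(\tau)$ and power spectrum
$S_{xx}(\omega)$ of the input $x(t)$.
THEOREM
$$R_{xy}(\tau) = R_{xx}(\tau) * h^*(-\tau) \qquad R_{yy}(\tau) = R_{xy}(\tau) * h(\tau) \tag{10-133}$$
$$S_{xy}(\omega) = S_{xx}(\omega)H^*(\omega) \qquad S_{yy}(\omega) = S_{xy}(\omega)H(\omega) \tag{10-134}$$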
Proof. The two equations in (10-133) are special cases of (10-184) and (10-185).
However, because of their importance they will be proved directly. Multiplying
the conjugate of (10-132) by $x(t + \tau)$ and taking expected values, we obtain
$$E\{x(t+\tau)y^*(t)\} = \int_{-\infty}^{\infty} E\{x(t+\tau)x^*(t-\alpha)\}h^*(\alpha)\,d\alpha$$
Since $E\{x(t+\tau)x^*(t-\alpha)\} = R_{xx}(\tau + \alpha)$, this yields
$$R_{xy}(\tau) = \int_{-\infty}^{\infty} R_{xx}(\tau + \alpha)h^*(\alpha)\,d\alpha = \int_{-\infty}^{\infty} R_{xx}(\tau - \beta)h^*(-\beta)\,d\beta$$
Proceeding similarly, we obtain
$$E\{y(t)y^*(t-\tau)\} = \int_{-\infty}^{\infty} E\{x(t-\alpha)y^*(t-\tau)\}h(\alpha)\,d\alpha = \int_{-\infty}^{\infty} R_{xy}(\tau - \alpha)h(\alpha)\,d\alpha$$
Equation (10-134) follows from (10-133) and the convolution theorem.
COROLLARY. Combining the two equations in (10-133) and (10-134), we obtain
$$R_{yy}(\tau) = R_{xx}(\tau) * h(\tau) * h^*(-\tau) = R_{xx}(\tau) * \rho(\tau) \tag{10-135}$$
$$S_{yy}(\omega) = S_{xx}(\omega)H(\omega)H^*(\omega) = S_{xx}(\omega)|H(\omega)|^2 \tag{10-136}$$
where
$$\rho(\tau) = h(\tau) * h^*(-\tau) = \int_{-\infty}^{\infty} h(t + \tau)h^*(t)\,dt \;\leftrightarrow\; |H(\omega)|^2 \tag{10-137}$$
Note, in particular, that if $x(t)$ is white noise with average power $q$, then
$$R_{xx}(\tau) = q\delta(\tau) \qquad S_{xx}(\omega) = q$$
$$S_{yy}(\omega) = q|H(\omega)|^2 \qquad R_{yy}(\tau) = q\rho(\tau) \tag{10-138}$$
From (10-136) and the inversion formula (10-121), it follows that
$$E\{|y(t)|^2\} = R_{yy}(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_{xx}(\omega)|H(\omega)|^2\,d\omega \geq 0 \tag{10-139}$$
This equation describes the filtering properties of a system when the input is a
random process. It shows, for example, that if $H(\omega) = 0$ for $|\omega| > \omega_0$ and
$S_{xx}(\omega) = 0$ for $|\omega| < \omega_0$, then $E\{y^2(t)\} = 0$.
Note The preceding results hold if all correlations are replaced by the corresponding
covariances and all spectra by the corresponding covariance spectra. This follows from
the fact that the response to $x(t) - \eta_x$ equals $y(t) - \eta_y$. For example, (10-136) and
(10-139) yield
$$S_{yy}^c(\omega) = S_{xx}^c(\omega)|H(\omega)|^2 \tag{10-140}$$
$$\operatorname{Var} y(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_{xx}^c(\omega)|H(\omega)|^2\,d\omega \tag{10-141}$$
Example 10-25. (a) (Moving average) The integral
$$y(t) = \frac{1}{2T}\int_{t-T}^{t+T} x(\alpha)\,d\alpha$$
is the average of the process $x(t)$ in the interval $(t - T, t + T)$. Clearly, $y(t)$ is the
output of a system with input $x(t)$ and impulse response a rectangular pulse as in
Fig. 10-13. The corresponding $\rho(\tau)$ is a triangle. In this case,
$$H(\omega) = \frac{1}{2T}\int_{-T}^{T} e^{-j\omega\tau}\,d\tau = \frac{\sin T\omega}{T\omega} \qquad S_{yy}(\omega) = S_{xx}(\omega)\frac{\sin^2 T\omega}{T^2\omega^2}$$
Thus $H(\omega)$ takes significant values only in an interval of the order of $1/T$ centered
at the origin. Hence the moving average suppresses the high-frequency components
of the input. It is thus a simple low-pass filter.
Since $\rho(\tau)$ is a triangle, it follows from (10-135) that
$$R_{yy}(\tau) = \frac{1}{2T}\int_{-2T}^{2T}\left(1 - \frac{|\alpha|}{2T}\right)R_{xx}(\tau - \alpha)\,d\alpha \tag{10-142}$$
We shall use this result to determine the variance of the integral
$$\eta_T = \frac{1}{2T}\int_{-T}^{T} x(t)\,dt$$
Clearly, $\eta_T = y(0)$; hence
$$\operatorname{Var}\eta_T = C_{yy}(0) = \frac{1}{2T}\int_{-2T}^{2T}\left(1 - \frac{|\alpha|}{2T}\right)C_{xx}(\alpha)\,d\alpha \tag{10-143}$$
(b) (High-pass filter) The process $z(t) = x(t) - y(t)$ is the output of a system
with input $x(t)$ and system function
$$H(\omega) = 1 - \frac{\sin T\omega}{T\omega}$$
This function is nearly 0 in an interval of the order of $1/T$ centered at the origin,
and it approaches 1 for large $\omega$. It acts, therefore, as a high-pass filter suppressing
the low frequencies of the input.
Example 10-26 Derivatives. The derivative $x'(t)$ of a process $x(t)$ can be
considered as the output of a linear system with input $x(t)$ and system function $j\omega$.
From this and (10-134), it follows that
$$S_{xx'}(\omega) = -j\omega S_{xx}(\omega) \qquad S_{x'x'}(\omega) = \omega^2 S_{xx}(\omega)$$
Hence
$$R_{xx'}(\tau) = -\frac{dR_{xx}(\tau)}{d\tau} \qquad R_{x'x'}(\tau) = -\frac{d^2R_{xx}(\tau)}{d\tau^2}$$
The nth derivative $y(t) = x^{(n)}(t)$ of $x(t)$ is the output of a system with input
$x(t)$ and system function $(j\omega)^n$. Hence
$$S_{yy}(\omega) = |j\omega|^{2n}S_{xx}(\omega) \qquad R_{yy}(\tau) = (-1)^n R_{xx}^{(2n)}(\tau) \tag{10-144}$$
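Example 10-27. (a) The differential equation
$$y'(t) + cy(t) = x(t) \qquad \text{all } t$$
specifies a linear system with input $x(t)$, output $y(t)$, and system function $1/(j\omega + c)$.
We assume that $x(t)$ is white noise with $R_{xx}(\tau) = q\delta(\tau)$. Applying (10-136), we
obtain
$$S_{yy}(\omega) = \frac{S_{xx}(\omega)}{\omega^2 + c^2} = \frac{q}{\omega^2 + c^2} \qquad R_{yy}(\tau) = \frac{q}{2c}e^{-c|\tau|}$$
Note that $E\{y^2(t)\} = R_{yy}(0) = q/2c$.
(b) Similarly, if
$$y''(t) + by'(t) + cy(t) = x(t) \qquad S_{xx}(\omega) = q$$
then
$$H(\omega) = \frac{1}{-\omega^2 + jb\omega + c} \qquad S_{yy}(\omega) = \frac{q}{(c - \omega^2)^2 + b^2\omega^2}$$
To find $R_{yy}(\tau)$, we shall consider three cases:
$b^2 < 4c$:
$$R_{yy}(\tau) = \frac{q}{2bc}e^{-\alpha|\tau|}\left(\cos\beta\tau + \frac{\alpha}{\beta}\sin\beta|\tau|\right) \qquad \alpha = \frac{b}{2} \qquad \alpha^2 + \beta^2 = c$$
$b^2 = 4c$:
$$R_{yy}(\tau) = \frac{q}{2bc}e^{-\alpha|\tau|}(1 + \alpha|\tau|) \qquad \alpha = \frac{b}{2}$$
$b^2 > 4c$:
$$R_{yy}(\tau) = \frac{q}{4\gamma bc}\left[(\alpha + \gamma)e^{-(\alpha-\gamma)|\tau|} - (\alpha - \gamma)e^{-(\alpha+\gamma)|\tau|}\right] \qquad \alpha = \frac{b}{2} \qquad \alpha^2 - \gamma^2 = c$$
In all cases, $E\{y^2(t)\} = q/2bc$.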
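The average powers $q/2c$ and $q/2bc$ quoted in this example can be checked against (10-139) by numerical integration. The sketch below evaluates $(1/2\pi)\int q|H(\omega)|^2\,d\omega$ with the substitution $\omega = \tan\theta$, which maps the infinite axis to a finite interval; this substitution, the parameter values, and the function name are choices made here for illustration and are not taken from the text.

```python
import numpy as np

def average_power(H, q, n=200_000):
    """(1/2pi) * integral of q |H(w)|^2 dw over the whole axis, using w = tan(theta)."""
    dtheta = np.pi / n
    theta = -np.pi / 2 + (np.arange(n) + 0.5) * dtheta    # midpoints avoid theta = +-pi/2
    w = np.tan(theta)
    integrand = q * np.abs(H(w)) ** 2 / np.cos(theta) ** 2  # dw = dtheta / cos^2(theta)
    return integrand.sum() * dtheta / (2 * np.pi)

q, b, c = 3.0, 0.8, 5.0                                    # assumed values, here b^2 < 4c
first = average_power(lambda w: 1 / (1j * w + c), q)       # y' + c y = x
second = average_power(lambda w: 1 / (-w**2 + 1j * b * w + c), q)   # y'' + b y' + c y = x
print(first, q / (2 * c))
print(second, q / (2 * b * c))
```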
Example 10-28 Hilbert transforms. A system with system function (Fig. 10-14)
$$H(\omega) = -j\operatorname{sgn}\omega = \begin{cases} -j & \omega > 0 \\ j & \omega < 0 \end{cases} \tag{10-145}$$
is called a quadrature filter. The corresponding impulse response equals $1/\pi t$
(Papoulis, 1977). Thus $H(\omega)$ is all-pass with $-90°$ phase shift; hence its response
to $\cos\omega t$ equals $\cos(\omega t - 90°) = \sin\omega t$ and its response to $\sin\omega t$ equals
$\sin(\omega t - 90°) = -\cos\omega t$.
The response of a quadrature filter to a real process $x(t)$ is denoted by $\hat{x}(t)$
and it is called the Hilbert transform of $x(t)$. Thus
$$\hat{x}(t) = x(t) * \frac{1}{\pi t} = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{x(\alpha)}{t - \alpha}\,d\alpha \tag{10-146}$$
From (10-134) and (10-124) it follows that (Fig. 10-14)
$$S_{x\hat{x}}(\omega) = jS_{xx}(\omega)\operatorname{sgn}\omega = -S_{\hat{x}x}(\omega) \qquad S_{\hat{x}\hat{x}}(\omega) = S_{xx}(\omega) \tag{10-147}$$
The complex process
$$z(t) = x(t) + j\hat{x}(t)$$
is called the analytic signal associated with $x(t)$. Clearly, $z(t)$ is the response of the
system
$$1 + j(-j\operatorname{sgn}\omega) = 2U(\omega)$$
with input $x(t)$. Hence [see (10-136)]
$$S_{zz}(\omega) = 4S_{xx}(\omega)U(\omega) = 2S_{xx}(\omega) + 2jS_{\hat{x}x}(\omega) \tag{10-148}$$
$$R_{zz}(\tau) = 2R_{xx}(\tau) + 2jR_{\hat{x}x}(\tau) \tag{10-149}$$
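The action of the quadrature filter on sinusoids can be illustrated with a discrete, periodic approximation: the sketch below multiplies the FFT of a sampled signal by $-j\operatorname{sgn}\omega$ and transforms back. The helper name `hilbert_transform`, the record length, and the test frequency are assumptions made for illustration; the FFT treats the record as one period of a periodic signal, so the check uses a whole number of cycles.

```python
import numpy as np

def hilbert_transform(x):
    """Apply H(w) = -j sgn(w) to a real sequence via the FFT (periodic approximation)."""
    X = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x))             # signed frequency of each FFT bin
    return np.real(np.fft.ifft(-1j * np.sign(freqs) * X))

n = 1024
t = np.arange(n)
w0 = 2 * np.pi * 8 / n                          # exactly 8 cycles in the record

cos_hat = hilbert_transform(np.cos(w0 * t))     # expected: sin(w0 t)
sin_hat = hilbert_transform(np.sin(w0 * t))     # expected: -cos(w0 t)
z = np.cos(w0 * t) + 1j * cos_hat               # analytic signal, close to exp(j w0 t)
print(np.max(np.abs(cos_hat - np.sin(w0 * t))))
```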
THE WIENER-KHINCHIN THEOREM. From (10-121) it follows that
$$E\{x^2(t)\} = R(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)\,d\omega \geq 0 \tag{10-150}$$
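This shows that the area of the power spectrum of any process is positive. We
shall show that
$$S(\omega) \geq 0 \tag{10-151}$$
for every $\omega$.
Proof. We form an ideal bandpass system with system function
$$H(\omega) = \begin{cases} 1 & \omega_1 < \omega < \omega_2 \\ 0 & \text{otherwise} \end{cases}$$
and apply $x(t)$ to its input. From (10-139) it follows that the power spectrum
$S_{yy}(\omega)$ of the resulting output $y(t)$ equals
$$S_{yy}(\omega) = \begin{cases} S(\omega) & \omega_1 < \omega < \omega_2 \\ 0 & \text{otherwise} \end{cases}$$
Hence
$$0 \leq E\{y^2(t)\} = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_{yy}(\omega)\,d\omega = \frac{1}{2\pi}\int_{\omega_1}^{\omega_2} S(\omega)\,d\omega \tag{10-152}$$
Thus the area of $S(\omega)$ in any interval is positive. This is possible only if
$S(\omega) \geq 0$ everywhere.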
We have shown earlier in this section that if $S(\omega)$ is a positive function, then we
can find a process $x(t)$ such that $S_{xx}(\omega) = S(\omega)$. From this it follows that a
function is a power spectrum iff it is positive. In fact, we can find an
exponential with random frequency as in (10-127) with power spectrum an
arbitrary positive function $S(\omega)$.
We shall use (10-152) to express the power spectrum $S(\omega)$ of a process
$x(t)$ as the average power of another process $y(t)$ obtained by filtering $x(t)$.
Setting $\omega_1 = \omega_0 - \delta$ and $\omega_2 = \omega_0 + \delta$, we conclude that if $\delta$ is sufficiently
small,
$$E\{y^2(t)\} \simeq \frac{\delta}{\pi}S(\omega_0) \tag{10-153}$$
This shows the localization of the average power of $x(t)$ on the frequency axis.
Integrated spectrum. In mathematics, the spectral properties of a process $x(t)$
are expressed in terms of the integrated spectrum $F(\omega)$ defined as the integral
of $S(\omega)$:
$$F(\omega) = \int_{-\infty}^{\omega} S(\alpha)\,d\alpha \tag{10-154}$$
From the positivity of $S(\omega)$, it follows that $F(\omega)$ is a nondecreasing function of $\omega$.
Integrating the inversion formula (10-121) by parts, we can express the autocorrelation
$R(\tau)$ of $x(t)$ as a Riemann-Stieltjes integral:
$$R(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{j\omega\tau}\,dF(\omega) \tag{10-155}$$
This approach avoids the use of singularity functions in the spectral representation
of $R(\tau)$ even when $S(\omega)$ contains impulses. If $S(\omega)$ contains the terms
$\beta_i\delta(\omega - \omega_i)$, then $F(\omega)$ is discontinuous at $\omega_i$ and the discontinuity jump
equals $\beta_i$.
The integrated covariance spectrum $F^c(\omega)$ is the integral of the covariance
spectrum. From (10-125) it follows that $F(\omega) = F^c(\omega) + 2\pi\eta^2 U(\omega)$.
Vector spectra. The vector process $X(t) = [x_i(t)]$ is WSS if its components $x_i(t)$
are jointly WSS. In this case, its autocorrelation matrix depends only on
$\tau = t_1 - t_2$. From this it follows that [see (10-116)]
$$R_{xy}(\tau) = \int_{-\infty}^{\infty} R_{xx}(\tau + \alpha)H^\dagger(\alpha)\,d\alpha \qquad R_{yy}(\tau) = \int_{-\infty}^{\infty} H(\alpha)R_{xy}(\tau - \alpha)\,d\alpha \tag{10-156}$$
The power spectrum of a WSS vector process $X(t)$ is a square matrix
$S_{xx}(\omega) = [S_{ij}(\omega)]$, the elements of which are the Fourier transforms $S_{ij}(\omega)$ of
the elements $R_{ij}(\tau)$ of its autocorrelation matrix $R_{xx}(\tau)$. Defining similarly the
matrices $S_{xy}(\omega)$ and $S_{yy}(\omega)$, we conclude from (10-156) that
$$S_{xy}(\omega) = S_{xx}(\omega)\bar{H}^\dagger(\omega) \qquad S_{yy}(\omega) = \bar{H}(\omega)S_{xy}(\omega) \tag{10-157}$$
where $\bar{H}(\omega) = [H_{ji}(\omega)]$ is an $m \times r$ matrix with elements the Fourier transforms
$H_{ji}(\omega)$ of the elements $h_{ji}(t)$ of the impulse response matrix $H(t)$. Thus
$$S_{yy}(\omega) = \bar{H}(\omega)S_{xx}(\omega)\bar{H}^\dagger(\omega) \tag{10-158}$$
This is the extension of (10-136) to a multiterminal system.
Example 10-29. The derivatives
$$y_1(t) = z^{(m)}(t) \qquad y_2(t) = w^{(n)}(t)$$
of two WSS processes $z(t)$ and $w(t)$ can be considered as the responses of two
differentiators with inputs $z(t)$ and $w(t)$ and system functions $H_1(\omega) = (j\omega)^m$ and
$H_2(\omega) = (j\omega)^n$. Proceeding as in (10-119), we conclude that the cross-power
spectrum of $z^{(m)}(t)$ and $w^{(n)}(t)$ equals $(j\omega)^m(-j\omega)^n S_{zw}(\omega)$. Hence
$$E\{z^{(m)}(t + \tau)\,w^{(n)}(t)\} = (-1)^n\frac{d^{m+n}R_{zw}(\tau)}{d\tau^{m+n}} \tag{10-159}$$
PROPERTIES OF CORRELATIONS. If a function $R(\tau)$ is the autocorrelation of a WSS
process $x(t)$, then [see (10-151)] its Fourier transform $S(\omega)$ is positive. Furthermore, if
$R(\tau)$ is a function with positive Fourier transform, we can find a process $x(t)$ as in
(10-126) with autocorrelation $R(\tau)$. Thus a necessary and sufficient condition for a
function $R(\tau)$ to be an autocorrelation is the positivity of its Fourier transform. The
conditions for a function $R(\tau)$ to be an autocorrelation can be expressed directly in
terms of $R(\tau)$. We have shown in (10-84) that the autocorrelation $R(\tau)$ of a process $x(t)$
is p.d., that is,
$$\sum_{i,j} a_i a_j^* R(\tau_i - \tau_j) \geq 0 \tag{10-160}$$
for every $a_i$, $\tau_i$, and $\tau_j$. It can be shown that the converse is also true†: If $R(\tau)$ is a
p.d. function, then its Fourier transform is positive. Thus a function $R(\tau)$ has a positive
Fourier transform iff it is p.d.
A sufficient condition. To establish whether $R(\tau)$ is p.d., we must show either
that it satisfies (10-160) or that its transform is positive. This is not, in general, a
simple task. The following is a simple sufficient condition.
Polya's criterion. It can be shown that a function $R(\tau)$ is p.d. if it is convex
(concave upward) for $\tau > 0$ and it tends to a finite limit as $\tau \to \infty$.
Consider, for example, the function $w(\tau) = e^{-\alpha|\tau|^c}$. If $0 < c < 1$, then
$w(\tau) \to 0$ as $\tau \to \infty$ and $w''(\tau) > 0$ for $\tau > 0$; hence $w(\tau)$ is p.d. because it
satisfies Polya's criterion. Note, however, that it is p.d. also for $1 < c < 2$ even
though it does not satisfy this criterion.
Necessary conditions. The autocorrelation $R(\tau)$ of any process $x(t)$ is maximum
at the origin because [see (10-121)]
$$|R(\tau)| \leq \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)\,d\omega = R(0) \tag{10-161}$$
We show next that if $R(\tau)$ is not periodic, it reaches its maximum only at the
origin.
THEOREM. If $R(\tau_1) = R(0)$ for some $\tau_1 \neq 0$, then $R(\tau)$ is periodic with period
$\tau_1$:
$$R(\tau + \tau_1) = R(\tau) \qquad \text{for all } \tau \tag{10-162}$$
Proof. From Schwarz's inequality
$$E^2\{zw\} \leq E\{z^2\}E\{w^2\} \tag{10-163}$$
it follows that
$$E^2\{[x(t + \tau + \tau_1) - x(t + \tau)]x(t)\} \leq E\{[x(t + \tau + \tau_1) - x(t + \tau)]^2\}E\{x^2(t)\}$$
Hence
$$[R(\tau + \tau_1) - R(\tau)]^2 \leq 2[R(0) - R(\tau_1)]R(0) \tag{10-164}$$
If $R(\tau_1) = R(0)$, then the right side is 0; hence the left side is also 0 for every $\tau$.
This yields (10-162).
†S. Bochner: Lectures on Fourier Integrals, Princeton Univ. Press, Princeton, NJ, 1959.
COROLLARY. If $R(\tau_1) = R(\tau_2) = R(0)$ and the numbers $\tau_1$ and $\tau_2$ are noncommensurate,
that is, their ratio is irrational, then $R(\tau)$ is constant.
Proof. From the theorem it follows that $R(\tau)$ is periodic with periods $\tau_1$ and $\tau_2$.
This is possible only if $R(\tau)$ is constant.
Continuity. If $R(\tau)$ is continuous at the origin, it is continuous for every $\tau$.
Proof. From the continuity of $R(\tau)$ at $\tau = 0$ it follows that $R(\tau_1) \to R(0)$; hence
the left side of (10-164) also tends to 0 for every $\tau$ as $\tau_1 \to 0$.
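Example 10-30. Using the theorem, we shall show that the truncated parabola
$$w(\tau) = \begin{cases} a^2 - \tau^2 & |\tau| < a \\ 0 & |\tau| > a \end{cases}$$
is not an autocorrelation.
If $w(\tau)$ is the autocorrelation of some process $x(t)$, then [see (10-144)] the
function
$$-w''(\tau) = \begin{cases} 2 & |\tau| < a \\ 0 & |\tau| > a \end{cases}$$
is the autocorrelation of $x'(t)$. This is impossible because $-w''(\tau)$ is continuous for
$\tau = 0$ but not for $\tau = a$.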
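A different check of the same conclusion, not the argument used in the example, is to test the positivity condition (10-151) directly. The sketch below assumes the truncated parabola is $w(\tau) = a^2 - \tau^2$ for $|\tau| < a$ and $0$ otherwise, computes its Fourier transform by the trapezoidal rule for $a = 1$, and compares it with the closed form $4(\sin\omega - \omega\cos\omega)/\omega^3$ obtained by integrating by parts. The function name and the grids are illustrative.

```python
import numpy as np

def truncated_parabola_spectrum(omega, a=1.0, n=4001):
    """Trapezoidal Fourier transform of w(tau) = a^2 - tau^2 on |tau| < a (zero elsewhere)."""
    tau = np.linspace(-a, a, n)
    values = (a**2 - tau**2) * np.cos(np.outer(omega, tau))   # w is even, so cos suffices
    dtau = tau[1] - tau[0]
    return (values[:, 1:] + values[:, :-1]).sum(axis=1) * dtau / 2

omega = np.linspace(0.1, 20.0, 400)
W_num = truncated_parabola_spectrum(omega)
W_closed = 4 * (np.sin(omega) - omega * np.cos(omega)) / omega**3   # closed form for a = 1

# A transform that goes negative cannot be a power spectrum.
print(W_num.min(), omega[np.argmin(W_num)])
```

Since the transform takes negative values, $w(\tau)$ is not p.d. and therefore cannot be an autocorrelation, in agreement with the example.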
MS continuity and periodicity. We shall say that the process $x(t)$ is MS
continuous if
$$E\{[x(t + \varepsilon) - x(t)]^2\} \to 0 \qquad \text{as } \varepsilon \to 0 \tag{10-165}$$
Since $E\{[x(t + \varepsilon) - x(t)]^2\} = 2[R(0) - R(\varepsilon)]$, we conclude that if $x(t)$ is MS
continuous, $R(0) - R(\varepsilon) \to 0$ as $\varepsilon \to 0$. Thus a WSS process $x(t)$ is MS continuous
iff its autocorrelation $R(\tau)$ is continuous for all $\tau$.
We shall say that the process $x(t)$ is MS periodic with period $\tau_1$ if
$$E\{[x(t + \tau_1) - x(t)]^2\} = 0 \tag{10-166}$$
Since the left side equals $2[R(0) - R(\tau_1)]$, we conclude that $R(\tau_1) = R(0)$;
hence [see (10-162)] $R(\tau)$ is periodic. This leads to the conclusion that a WSS
process $x(t)$ is MS periodic iff its autocorrelation is periodic.
Cross-correlation. Using (10-163), we shall show that the cross-correlation
$R_{xy}(\tau)$ of two WSS processes $x(t)$ and $y(t)$ satisfies the inequality
$$|R_{xy}(\tau)|^2 \leq R_{xx}(0)R_{yy}(0) \tag{10-167}$$
Proof. From (10-163) it follows that
$$|E\{x(t + \tau)y^*(t)\}|^2 \leq E\{|x(t + \tau)|^2\}E\{|y(t)|^2\} = R_{xx}(0)R_{yy}(0)$$
and (10-167) results.
COROLLARY. For any $a$ and $b$,
$$\left|\int_a^b S_{xy}(\omega)\,d\omega\right|^2 \leq \int_a^b S_{xx}(\omega)\,d\omega\int_a^b S_{yy}(\omega)\,d\omega \tag{10-168}$$
Proof. Suppose that $x(t)$ and $y(t)$ are the inputs to the ideal filters
$$H_1(\omega) = H_2(\omega) = \begin{cases} 1 & a < \omega < b \\ 0 & \text{otherwise} \end{cases}$$
Denoting by $z(t)$ and $w(t)$ respectively the resulting outputs, we conclude that
$$R_{zz}(0) = \frac{1}{2\pi}\int_a^b S_{xx}(\omega)\,d\omega \qquad R_{ww}(0) = \frac{1}{2\pi}\int_a^b S_{yy}(\omega)\,d\omega$$
$$R_{zw}(0) = \frac{1}{2\pi}\int_a^b S_{xy}(\omega)\,d\omega$$
and (10-168) follows because $|R_{zw}(0)|^2 \leq R_{zz}(0)R_{ww}(0)$.
10-4 DIGITAL PROCESSES
A digital (or discrete-time) process is a sequence $x_n$ of RVs. To avoid double
subscripts, we shall use also the notation $x[n]$ where the brackets will indicate
that $n$ is an integer. Most results involving analog (or continuous-time) processes
can be readily extended to digital processes. We outline the main
concepts.
The autocorrelation and autocovariance of $x[n]$ are given by
$$R[n_1, n_2] = E\{x[n_1]x^*[n_2]\} \qquad C[n_1, n_2] = R[n_1, n_2] - \eta[n_1]\eta^*[n_2] \tag{10-169}$$
respectively where $\eta[n] = E\{x[n]\}$ is the mean of $x[n]$.
A process $x[n]$ is SSS if its statistical properties are invariant to a shift of
the origin. It is WSS if $\eta[n] = \eta$ = constant and
$$R[n + m, n] = E\{x[n + m]x^*[n]\} = R[m] \tag{10-170}$$
A process $x[n]$ is strictly white noise if the RVs $x[n_i]$ are independent. It is
white noise if the RVs $x[n_i]$ are uncorrelated. The autocorrelation of a white-noise
process with zero mean is thus given by
$$R[n_1, n_2] = q[n_1]\delta[n_1 - n_2] \qquad \text{where } \delta[n] = \begin{cases} 1 & n = 0 \\ 0 & n \neq 0 \end{cases} \tag{10-171}$$
and $q[n] = E\{x^2[n]\}$. If $x[n]$ is also stationary, then $R[m] = q\delta[m]$. Thus a WSS
white noise is a sequence of i.i.d. RVs with variance $q$.
The delta response $h[n]$ of a linear system is its response to the delta
sequence $\delta[n]$. Its system function is the z transform of $h[n]$:
$$H(z) = \sum_{n=-\infty}^{\infty} h[n]z^{-n} \tag{10-172}$$
If $x[n]$ is the input to a digital system, the resulting output is the digital
convolution of $x[n]$ with $h[n]$:
$$y[n] = \sum_{k=-\infty}^{\infty} x[n - k]h[k] = x[n] * h[n] \tag{10-173}$$
From this it follows that $\eta_y[n] = \eta_x[n] * h[n]$. Furthermore,
$$R_{xy}[n_1, n_2] = \sum_{k=-\infty}^{\infty} R_{xx}[n_1, n_2 - k]h^*[k] \tag{10-174}$$
$$R_{yy}[n_1, n_2] = \sum_{r=-\infty}^{\infty} R_{xy}[n_1 - r, n_2]h[r] \tag{10-175}$$
If $x[n]$ is white noise with average intensity $q[n]$ as in (10-171), then [see
(10-90)],
$$E\{y^2[n]\} = q[n] * |h[n]|^2 \tag{10-176}$$
If $x[n]$ is WSS, then $y[n]$ is also WSS with $\eta_y = \eta_x H(1)$. Furthermore,
$$R_{xy}[m] = R_{xx}[m] * h^*[-m] \qquad R_{yy}[m] = R_{xy}[m] * h[m]$$
$$R_{yy}[m] = R_{xx}[m] * \rho[m] \qquad \rho[m] = \sum_{k=-\infty}^{\infty} h[m + k]h^*[k] \tag{10-177}$$
as in (10-133) and (10-135).
THE POWER SPECTRUM. Given a WSS process $x[n]$, we form the z transform
$S(z)$ of its autocorrelation $R[m]$:
$$S(z) = \sum_{m=-\infty}^{\infty} R[m]z^{-m} \tag{10-178}$$
The power spectrum of $x[n]$ is the function
$$S(\omega) = S(e^{j\omega}) = \sum_{m=-\infty}^{\infty} R[m]e^{-jm\omega} \tag{10-179}$$
Thus $S(e^{j\omega})$ is the DFT of $R[m]$. The function $S(e^{j\omega})$ is periodic with period $2\pi$
and Fourier series coefficients $R[m]$. Hence
$$R[m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} S(e^{j\omega})e^{jm\omega}\,d\omega \tag{10-180}$$
It suffices, therefore, to specify $S(e^{j\omega})$ for $|\omega| < \pi$ only (see Fig. 10-15).
If $x[n]$ is a real process, then $R[-m] = R[m]$ and (10-179) yields
$$S(e^{j\omega}) = R[0] + 2\sum_{m=1}^{\infty} R[m]\cos m\omega \tag{10-181}$$
This shows that the power spectrum of a real process is a function of $\cos\omega$
because $\cos m\omega$ is a function of $\cos\omega$.
FIGURE 10-15
Example 10-31. If $R[m] = a^{|m|}$, then
$$S(z) = \sum_{m=-\infty}^{-1} a^{-m}z^{-m} + \sum_{m=0}^{\infty} a^m z^{-m} = \frac{az}{1 - az} + \frac{z}{z - a} = \frac{a^{-1} - a}{(a^{-1} + a) - (z^{-1} + z)}$$
Hence
$$S(e^{j\omega}) = \frac{a^{-1} - a}{a^{-1} + a - 2\cos\omega}$$
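The closed form of this example is easy to confirm by summing the series (10-179) directly. The sketch below does so for an assumed value $a = 0.6$, truncating the sum at $|m| \leq 200$, where the neglected terms are far below rounding error; the function name and grid are illustrative.

```python
import numpy as np

def spectrum_from_autocorrelation(R, omega, m_max):
    """Truncated version of S(e^{jw}) = sum over m of R[m] exp(-j m w)."""
    m = np.arange(-m_max, m_max + 1)
    return np.real(np.exp(-1j * np.outer(omega, m)) @ R(m))

a = 0.6                                              # assumed, must satisfy |a| < 1
omega = np.linspace(-np.pi, np.pi, 201)
S_sum = spectrum_from_autocorrelation(lambda m: a ** np.abs(m), omega, m_max=200)
S_closed = (1 / a - a) / (1 / a + a - 2 * np.cos(omega))
print(np.max(np.abs(S_sum - S_closed)), S_closed.min())
```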
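Example 10-32. Proceeding as in the analog case, we can show that the process
$$x[n] = \sum_i c_i e^{j\omega_i n}$$
is WSS iff the coefficients $c_i$ are uncorrelated with zero mean. In this case,
$$R[m] = \sum_i \sigma_i^2 e^{j\omega_i m} \qquad S(\omega) = 2\pi\sum_i \sigma_i^2\delta(\omega - \beta_i) \quad |\omega| < \pi \tag{10-182}$$
where $\sigma_i^2 = E\{|c_i|^2\}$, $\omega_i = 2\pi k_i + \beta_i$, and $|\beta_i| < \pi$.
From (10-177) and the convolution theorem, it follows that if $y[n]$ is the
output of a linear system with input $x[n]$, then
$$S_{xy}(e^{j\omega}) = S_{xx}(e^{j\omega})H^*(e^{j\omega})$$
$$S_{yy}(e^{j\omega}) = S_{xy}(e^{j\omega})H(e^{j\omega}) \tag{10-183}$$
$$S_{yy}(e^{j\omega}) = S_{xx}(e^{j\omega})|H(e^{j\omega})|^2$$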
If $h[n]$ is real, $H^*(e^{j\omega}) = H(e^{-j\omega})$. In this case
$$S_{yy}(z) = S_{xx}(z)H(z)H(1/z) \tag{10-184}$$
Example 10-33. The first difference
$$y[n] = x[n] - x[n - 1]$$
of a process $x[n]$ can be considered as the output of a linear system with input $x[n]$
and system function $H(z) = 1 - z^{-1}$. Applying (10-184), we obtain
$$S_{yy}(z) = S_{xx}(z)(1 - z^{-1})(1 - z) = S_{xx}(z)(2 - z - z^{-1})$$
$$R_{yy}[m] = -R_{xx}[m + 1] + 2R_{xx}[m] - R_{xx}[m - 1]$$
If $x[n]$ is white noise with $S_{xx}(z) = q$, then
$$S_{yy}(e^{j\omega}) = q(2 - e^{j\omega} - e^{-j\omega}) = 2q(1 - \cos\omega)$$
Example 10-34. The recursion equation
$$y[n] - ay[n - 1] = x[n]$$
specifies a linear system with input $x[n]$ and system function $H(z) = 1/(1 - az^{-1})$.
If $S_{xx}(z) = q$, then (see Example 10-31)
$$S_{yy}(z) = \frac{q}{(1 - az^{-1})(1 - az)} \qquad R_{yy}[m] = \frac{q}{1 - a^2}a^{|m|}$$
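For the recursion of Example 10-34 driven by white noise, (10-184) and Example 10-31 give $R_{yy}[m] = q\,a^{|m|}/(1 - a^2)$. The sketch below simulates one long realization and estimates the autocorrelation by time averages; replacing ensemble averages by time averages is an additional assumption (ergodicity), and the parameter values and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
a, q, n = 0.8, 2.0, 400_000            # assumed pole, noise power, and record length

x = rng.normal(0.0, np.sqrt(q), n)     # zero-mean white noise with variance q
y = np.zeros(n)
for k in range(1, n):
    y[k] = a * y[k - 1] + x[k]         # y[n] - a y[n-1] = x[n]
y = y[1000:]                           # discard the start-up transient

def sample_autocorrelation(v, m):
    """Time-average estimate of R[m] = E{v[n + m] v[n]} for a real sequence."""
    return np.mean(v[m:] * v[: len(v) - m])

R_est = np.array([sample_autocorrelation(y, m) for m in range(4)])
R_exact = q / (1 - a**2) * a ** np.arange(4)
print(np.round(R_est, 2), np.round(R_exact, 2))
```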
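From (10-183) it follows that
$$E\{|y[n]|^2\} = R_{yy}[0] = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{xx}(e^{j\omega})|H(e^{j\omega})|^2\,d\omega \tag{10-185}$$
Using this identity, we shall show that the power spectrum of a process $x[n]$ real
or complex is a positive function:
$$S_{xx}(e^{j\omega}) \geq 0 \tag{10-186}$$
Proof. We form an ideal bandpass filter with center frequency $\omega_0$ and bandwidth
$2\Delta$ and apply (10-185). For small $\Delta$,
$$E\{|y[n]|^2\} = \frac{1}{2\pi}\int_{\omega_0 - \Delta}^{\omega_0 + \Delta} S_{xx}(e^{j\omega})\,d\omega \simeq \frac{\Delta}{\pi}S_{xx}(e^{j\omega_0})$$
and (10-186) results because $E\{|y[n]|^2\} \geq 0$ and $\omega_0$ is arbitrary.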
SAMPLING. In many applications, the digital processes under consideration are
obtained by sampling various analog processes. We relate next the correspond-
ing correlations and spectra.
Given an analog process $x(t)$, we form the digital process
$$x[n] = x(nT)$$
where $T$ is a given constant. From this it follows that
$$\eta[n] = \eta_a(nT) \qquad R[n_1, n_2] = R_a(n_1T, n_2T) \tag{10-187}$$
where $\eta_a(t)$ is the mean and $R_a(t_1, t_2)$ the autocorrelation of $x(t)$. If $x(t)$ is a
stationary process, then $x[n]$ is also stationary with mean $\eta = \eta_a$ and autocorrelation
$$R[m] = R_a(mT)$$
From this it follows that the power spectrum of $x[n]$ equals (Fig. 10-15)
$$S(e^{j\omega}) = \sum_{m=-\infty}^{\infty} R_a(mT)e^{-jm\omega} = \frac{1}{T}\sum_{n=-\infty}^{\infty} S_a\!\left(\frac{\omega + 2\pi n}{T}\right) \tag{10-188}$$
where $S_a(\omega)$ is the power spectrum of $x(t)$. The above is a consequence of
Poisson's sum formula [see (11A-1)].
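Both sides of (10-188) can be compared numerically. The sketch below assumes the analog autocorrelation $R_a(\tau) = e^{-\alpha|\tau|}$ with spectrum $S_a(\omega) = 2\alpha/(\alpha^2 + \omega^2)$, sums the sampled autocorrelation on the left, and sums shifted copies of $S_a$ on the right. The parameter values and the truncation limits of the two sums are illustrative choices; the right-hand sum converges slowly, hence the large number of terms.

```python
import numpy as np

alpha, T = 1.0, 0.5                        # assumed decay rate and sampling period
omega = np.array([-3.0, -1.0, 0.0, 0.7, 2.5])

# Left side of (10-188): sum of R_a(mT) exp(-j m w) with R_a(tau) = exp(-alpha |tau|)
m = np.arange(-400, 401)
lhs = np.real(np.exp(-alpha * T * np.abs(m)) @ np.exp(-1j * np.outer(m, omega)))

# Right side: (1/T) * sum of S_a((w + 2 pi n)/T) with S_a(w) = 2 alpha/(alpha^2 + w^2)
n = np.arange(-200_000, 200_001)
shifted = (omega[:, None] + 2 * np.pi * n[None, :]) / T
rhs = (2 * alpha / (alpha**2 + shifted**2)).sum(axis=1) / T

print(np.round(lhs, 5))
print(np.round(rhs, 5))
```

The overlap of the shifted copies of $S_a(\omega)$ in the right-hand sum is the aliasing introduced by sampling.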
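Example 10-35. Suppose that $x(t)$ is a WSS process consisting of $M$ exponentials
as in (10-130):
$$x(t) = \sum_{i=1}^{M} c_i e^{j\omega_i t} \qquad S_a(\omega) = 2\pi\sum_{i=1}^{M} \sigma_i^2\delta(\omega - \omega_i)$$
where $\sigma_i^2 = E\{|c_i|^2\}$. We shall determine the power spectrum $S(e^{j\omega})$ of the process
$x[n] = x(nT)$. Since $\delta(\omega/T) = T\delta(\omega)$, it follows from (10-188) that
$$S(e^{j\omega}) = 2\pi\sum_{n=-\infty}^{\infty}\sum_{i=1}^{M} \sigma_i^2\delta(\omega - T\omega_i + 2\pi n)$$
In the interval $(-\pi, \pi)$, this consists of $M$ lines:
$$S(e^{j\omega}) = 2\pi\sum_{i=1}^{M} \sigma_i^2\delta(\omega - \beta_i) \qquad |\omega| < \pi \quad |\beta_i| < \pi$$
where the $\beta_i$ are such that $T\omega_i = 2\pi n_i + \beta_i$.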
APPENDIX 10A
CONTINUITY, DIFFERENTIATION, INTEGRATION
In the earlier discussion, we routinely used various limiting operations involving
stochastic processes, with the tacit assumption that these operations hold for
every sample involved. This assumption is, in many cases, unnecessarily restric-
tive. To give some idea of the notion of limits in a more general case, we discuss
next conditions for the existence of MS limits and we show that these conditions
can be phrased in terms of second-order moments (see also Sec. 8-4).
STOCHASTIC CONTINUITY. A process $x(t)$ is called MS continuous if
$$E\{[x(t + \varepsilon) - x(t)]^2\} \to 0 \qquad \text{as } \varepsilon \to 0 \tag{10A-1}$$
THEOREM. We maintain that $x(t)$ is MS continuous if its autocorrelation is
continuous.
Proof. Clearly,
$$E\{[x(t + \varepsilon) - x(t)]^2\} = R(t + \varepsilon, t + \varepsilon) - 2R(t + \varepsilon, t) + R(t, t)$$
If, therefore, $R(t_1, t_2)$ is continuous, then the right side tends to 0 as $\varepsilon \to 0$ and
(10A-1) results.
Note Suppose that (10A-1) holds for every $t$ in an interval $I$. From this it follows that
[see (10-1)] almost all samples of $x(t)$ will be continuous at a particular point of $I$. It
does not follow, however, that these samples will be continuous for every point in $I$. We
mention as illustrations the Poisson process and the Wiener process. As we see from
(10-14) and (11-5), both processes are MS continuous. However, the samples of the
Poisson process are discontinuous at the points $t_i$, whereas almost all samples of the
Wiener process are continuous.
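COROLLARY. If $x(t)$ is MS continuous, then its mean is continuous
$$\eta(t + \varepsilon) \to \eta(t) \qquad \varepsilon \to 0 \tag{10A-2}$$
Proof. As we know
$$E\{[x(t + \varepsilon) - x(t)]^2\} \geq E^2\{[x(t + \varepsilon) - x(t)]\}$$
Hence (10A-2) follows from (10A-1).
The above shows that
$$\lim_{\varepsilon \to 0} E\{x(t + \varepsilon)\} = E\left\{\lim_{\varepsilon \to 0} x(t + \varepsilon)\right\} \tag{10A-3}$$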
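STOCHASTIC DIFFERENTIATION. A process $x(t)$ is MS differentiable if
$$\frac{x(t + \varepsilon) - x(t)}{\varepsilon} \to x'(t) \qquad \text{as } \varepsilon \to 0 \tag{10A-4}$$
in the MS sense, that is, if
$$E\left\{\left[\frac{x(t + \varepsilon) - x(t)}{\varepsilon} - x'(t)\right]^2\right\} \to 0 \qquad \text{as } \varepsilon \to 0 \tag{10A-5}$$
THEOREM. The process $x(t)$ is MS differentiable if $\partial^2R(t_1, t_2)/\partial t_1\,\partial t_2$ exists.
Proof. It suffices to show that (Cauchy criterion)
$$E\left\{\left[\frac{x(t + \varepsilon_1) - x(t)}{\varepsilon_1} - \frac{x(t + \varepsilon_2) - x(t)}{\varepsilon_2}\right]^2\right\} \to 0 \qquad \text{as } \varepsilon_1, \varepsilon_2 \to 0 \tag{10A-6}$$
We use this criterion because, unlike (10A-5), it does not involve the unknown
$x'(t)$. Clearly,
$$E\{[x(t + \varepsilon_1) - x(t)][x(t + \varepsilon_2) - x(t)]\} = R(t + \varepsilon_1, t + \varepsilon_2) - R(t + \varepsilon_1, t) - R(t, t + \varepsilon_2) + R(t, t)$$
The right side divided by $\varepsilon_1\varepsilon_2$ tends to $\partial^2R(t, t)/\partial t\,\partial t$ which, by assumption,
exists. Expanding the square in (10A-6), we conclude that its left side tends to
$$\frac{\partial^2R(t, t)}{\partial t\,\partial t} - 2\frac{\partial^2R(t, t)}{\partial t\,\partial t} + \frac{\partial^2R(t, t)}{\partial t\,\partial t} = 0$$
COROLLARY. The above yields
$$E\{x'(t)\} = E\left\{\lim_{\varepsilon \to 0}\frac{x(t + \varepsilon) - x(t)}{\varepsilon}\right\} = \lim_{\varepsilon \to 0} E\left\{\frac{x(t + \varepsilon) - x(t)}{\varepsilon}\right\}$$
Note The autocorrelation of a Poisson process $x(t)$ is discontinuous at the points $t_i$;
hence $x'(t)$ does not exist at these points. However, as in the case of deterministic
signals, it is convenient to introduce random impulses and to interpret $x'(t)$ as in (10-98).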
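STOCHASTIC INTEGRALS. A process $x(t)$ is MS integrable if the limit
$$\int_a^b x(t)\,dt = \lim_{\Delta t_i \to 0}\sum_i x(t_i)\,\Delta t_i \tag{10A-7}$$
exists in the MS sense.
THEOREM. The process $x(t)$ is MS integrable if
$$\int_a^b\!\!\int_a^b |R(t_1, t_2)|\,dt_1\,dt_2 < \infty \tag{10A-8}$$
Proof. Using again the Cauchy criterion, we must show that
$$E\left\{\left|\sum_i x(t_i)\,\Delta t_i - \sum_k x(t_k)\,\Delta t_k\right|^2\right\} \to 0 \qquad \text{as } \Delta t_i, \Delta t_k \to 0$$
This follows if we expand the square and use the identity
$$E\left\{\sum_i x(t_i)\,\Delta t_i\sum_k x(t_k)\,\Delta t_k\right\} = \sum_{i,k} R(t_i, t_k)\,\Delta t_i\,\Delta t_k$$
because the right side tends to the integral of $R(t_1, t_2)$ as $\Delta t_i$ and $\Delta t_k$ tend to 0.
COROLLARY. From the above it follows that
$$E\left\{\left|\int_a^b x(t)\,dt\right|^2\right\} = \int_a^b\!\!\int_a^b R(t_1, t_2)\,dt_1\,dt_2 \tag{10A-9}$$
as in (10-11).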
APPENDIX 10B
SHIFT OPERATORS AND STATIONARY PROCESSES
An SSS process can be generated by a succession of shifts $Tx$ of a single RV $x$
where $T$ is a one-to-one measure-preserving transformation (mapping) of the
probability space $\mathcal{S}$ into itself. This difficult topic is of fundamental importance
in mathematics. In the following, we give a brief explanation of the underlying
concept, limiting the discussion to the discrete-time case.
A transformation $T$ of $\mathcal{S}$ into itself is a rule for assigning to each element
$\zeta$ of $\mathcal{S}$ another element of $\mathcal{S}$:
$$\bar{\zeta} = T\zeta \tag{10B-1}$$
called the image of $\zeta$. The images $\bar{\zeta}$ of all elements $\zeta$ of a subset $\mathcal{A}$ of $\mathcal{S}$
form another subset
$$\bar{\mathcal{A}} = T\mathcal{A}$$
of $\mathcal{S}$ called the image of $\mathcal{A}$.
We shall assume that the transformation $T$ has the following properties.
$P_1$: It is one-to-one. This means that
if $\zeta_1 \neq \zeta_2$ then $\bar{\zeta}_1 \neq \bar{\zeta}_2$
$P_2$: It is measure preserving. This means that if $\mathcal{A}$ is an event, then its
image $\bar{\mathcal{A}}$ is also an event and
$$P(\bar{\mathcal{A}}) = P(\mathcal{A}) \tag{10B-2}$$
Suppose that $x$ is an RV and that $T$ is a transformation as above. The
expression $Tx$ will mean another RV
$$y = Tx \quad \text{such that} \quad y(\bar{\zeta}) = x(\zeta) \tag{10B-3}$$
where $\zeta$ is the unique inverse of $\bar{\zeta}$. This specifies $y$ for every element of $\mathcal{S}$
because (see $P_1$) the set of elements $\bar{\zeta}$ equals $\mathcal{S}$.
The expression $z = T^{-1}x$ will mean that $x = Tz$. Thus
$$z = T^{-1}x \quad \text{iff} \quad z(\zeta) = x(\bar{\zeta})$$
We can define similarly $T^2x = T(Tx) = Ty$ and
$$T^nx = T(T^{n-1}x) = T^{-1}(T^{n+1}x)$$
for any $n$ positive or negative.
From (10B-3) it follows that if, for some $\zeta$, $x(\zeta) \leq w$, then $y(\bar{\zeta}) = x(\zeta) \leq w$.
Hence the event $\{y \leq w\}$ is the image of the event $\{x \leq w\}$. This yields [see
(10B-2)]
$$P\{x \leq w\} = P\{y \leq w\} \qquad y = Tx \tag{10B-4}$$
for any $w$. We thus conclude that the RVs $x$ and $Tx$ have the same distribution.
Given an RV $x$ and a transformation $T$ as above, we form the random
process
$$x_0 = x \qquad x_n = T^nx \qquad n = -\infty, \ldots, \infty \tag{10B-5}$$
It follows from (10B-4) that the random variables $x_n$ so formed have the same
distribution. We can similarly show that their joint distributions of any order are
invariant to a shift of the origin. Hence the process $x_n$ so formed is SSS.
It can be shown that the converse is also true: Given an SSS process $x_n$,
we can find an RV $x$ and a one-to-one measure-preserving transformation of
the space $\mathcal{S}$ into itself such that for all essential purposes, $x_n = T^nx$. The
proof of this difficult result will not be given.
PROBLEMS
10-1. In the fair-coin experiment, we define the process $x(t)$ as follows: $x(t) = \sin\pi t$ if
heads shows, $x(t) = 2t$ if tails shows. (a) Find $E\{x(t)\}$. (b) Find $F(x, t)$ for
$t = 0.25$, $t = 0.5$, and $t = 1$.
10-2. The process $x(t) = e^{at}$ is a family of exponentials depending on the RV $a$.
Express the mean $\eta(t)$, the autocorrelation $R(t_1, t_2)$, and the first-order density
$f(x, t)$ of $x(t)$ in terms of the density $f_a(a)$ of $a$.
10-3. Suppose that $x(t)$ is a Poisson process as in Fig. 10-3 such that $E\{x(9)\} = 6$.
(a) Find the mean and the variance of $x(8)$. (b) Find $P\{x(2) \leq 3\}$. (c) Find
$P\{x(4) \leq 5 \mid x(2) \leq 3\}$.
10-4. The RV $c$ is uniform in the interval $(0, T)$. Find $R_x(t_1, t_2)$ if (a) $x(t) = U(t - c)$,
(b) $x(t) = \delta(t - c)$.
10-5. The RVs $a$ and $b$ are independent $N(0; \sigma)$ and $p$ is the probability that the
process $x(t) = a - bt$ crosses the $t$ axis in the interval $(0, T)$. Show that $\pi p = \arctan T$.
Hint: $p = P\{0 \leq a/b \leq T\}$.
10-6. Show that if
$$R_v(t_1, t_2) = q(t_1)\delta(t_1 - t_2)$$
$w''(t) = v(t)U(t)$ and $w(0) = w'(0) = 0$, then
$$E\{w^2(t)\} = \int_0^t (t - \tau)^2 q(\tau)\,d\tau$$
10-7. The process $x(t)$ is real with autocorrelation $R(\tau)$. (a) Show that
$$P\{|x(t + \tau) - x(t)| \geq a\} \leq 2[R(0) - R(\tau)]/a^2$$
(b) Express $P\{|x(t + \tau) - x(t)| \geq a\}$ in terms of the second-order density
$f(x_1, x_2; \tau)$ of $x(t)$.
10-8. The process $x(t)$ is WSS and normal with $E\{x(t)\} = 0$ and $R(\tau) = 4e^{-2|\tau|}$.
(a) Find $P\{x(t) \leq 3\}$. (b) Find $E\{[x(t + 1) - x(t - 1)]^2\}$.
W-9. Show that the process x(z) = cw(z) is WSS iff £(c) - 0 and w(z) =	4
10-10, The process x(z) is WSS and £{x(z)} = 0. Show- that if z(z) •= x2(z), then C..(r)
10-11. Find $E\{y(t)\}$, $E\{y^2(t)\}$, and $R_{yy}(\tau)$ if
$$y''(t) + 4y'(t) + 13y(t) = 26 + v(t) \qquad R_{vv}(\tau) = 10\delta(\tau)$$
Find $P\{y(t) \leq 3\}$ if $v(t)$ is normal.
10-12. Show that: If $x(t)$ is a process with zero mean and autocorrelation
$f(t_1)f(t_2)w(t_1 - t_2)$, then the process $y(t) = x(t)/f(t)$ is WSS with autocorrelation $w(\tau)$. If $x(t)$
is white noise with autocorrelation $q(t_1)\delta(t_1 - t_2)$, then the process
$z(t) = x(t)/\sqrt{q(t)}$ is WSS white noise with autocorrelation $\delta(\tau)$.
10-13. Show that $|R_{xy}(\tau)| \leq \frac{1}{2}[R_{xx}(0) + R_{yy}(0)]$.
10-14. Show that if the processes $x(t)$, $y(t)$ are WSS and $E\{|x(0) - y(0)|^2\} = 0$, then
$R_{xx}(\tau) = R_{xy}(\tau) = R_{yy}(\tau)$.
Hint: Set $z = x(t + \tau)$, $w = x^*(t) - y^*(t)$ in (10-163).
10-15. Show that if $x(t)$ is a complex WSS process, then
$$E\{|x(t + \tau) - x(t)|^2\} = 2\operatorname{Re}[R(0) - R(\tau)]$$
10-16. Show that if $\varphi$ is an RV with $\Phi(\lambda) = E\{e^{j\lambda\varphi}\}$ and $\Phi(1) = \Phi(2) = 0$, then the
process $x(t) = \cos(\omega t + \varphi)$ is WSS. Find $E\{x(t)\}$ and $R_x(\tau)$ if $\varphi$ is uniform in the
interval $(-\pi, \pi)$.
10-17. Given a process $x(t)$ with orthogonal increments and such that $x(0) = 0$, show
that (a) $R(t_1, t_2) = R(t_1, t_1)$ for $t_1 \leq t_2$, and (b) if $E\{[x(t_1) - x(t_2)]^2\} = q|t_1 - t_2|$
then the process $y(t) = [x(t + \varepsilon) - x(t)]/\varepsilon$ is WSS and its autocorrelation is a
triangle with area $q$ and base $2\varepsilon$.
10-18. Show that if $R_{xx}(t_1, t_2) = q(t_1)\delta(t_1 - t_2)$ and $y(t) = x(t) * h(t)$ then
$$E\{x(t)y(t)\} = h(0)q(t)$$
10-19. The process $x(t)$ is normal with $\eta_x = 0$ and $R_x(\tau) = 4e^{-3|\tau|}$. Find a memoryless
system $g(x)$ such that the first-order density of the resulting output
$y(t) = g[x(t)]$ is uniform in the interval $(6, 9)$.
Answer: $g(x) = 3G(x/2) + 6$.
10-20. Show that if $x(t)$ is an SSS process and $\varepsilon$ is an RV independent of $x(t)$, then the
process $y(t) = x(t - \varepsilon)$ is SSS.
10-21. Show that if $x(t)$ is a stationary process with derivative $x'(t)$, then for a given $t$
the RVs $x(t)$ and $x'(t)$ are orthogonal and uncorrelated.
10-22. Given a normal process $x(t)$ with $\eta_x = 0$ and $R_x(\tau) = 4e^{-2|\tau|}$, we form the RVs
$z = x(t + 1)$, $w = x(t - 1)$, (a) find $E\{zw\}$ and $E\{(z + w)^2\}$, (b) find
$$f_z(z) \qquad P\{z < 1\} \qquad f_{zw}(z, w)$$
10-23. Show that if $x(t)$ is normal with autocorrelation $R(\tau)$, then
$$P\{x'(t) \leq a\} = G\left[\frac{a}{\sqrt{-R''(0)}}\right]$$
10-24. Show that if $x(t)$ is a normal process with zero mean and $y(t) = \operatorname{sgn} x(t)$, then
$$R_y(\tau) = \frac{2}{\pi}\sum_{n=1}^{\infty}\frac{1}{n}[J_0(n\pi) - (-1)^n]\sin\left[n\pi\frac{R_x(\tau)}{R_x(0)}\right]$$
where $J_0(x)$ is the Bessel function.
Hint: Expand the arcsine in (10-71) into a Fourier series.
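10-25. Show that if $x(t)$ is a normal process with zero mean and $y(t) = I e^{a x(t)}$, then
$$\eta_y = I\exp\left\{\frac{a^2}{2}R_x(0)\right\} \qquad R_y(\tau) = I^2\exp\{a^2[R_x(0) + R_x(\tau)]\}$$
10-26. Show that (a) if
$$y(t) = a\,x(ct) \quad\text{then}\quad R_y(\tau) = a^2 R_x(c\tau)$$
(b) if $R_x(\tau) \to 0$ as $\tau \to \infty$ and
$$z(t) = \lim_{\varepsilon\to\infty}\sqrt{\varepsilon}\,x(\varepsilon t) \quad\text{then}\quad R_z(\tau) = q\delta(\tau) \qquad q = \int_{-\infty}^{\infty} R_x(\tau)\,d\tau$$
10-27. Show that if $x(t)$ is white noise, $h(t) = 0$ outside the interval $(0, T)$, and
$y(t) = x(t) * h(t)$, then $R_{yy}(t_1, t_2) = 0$ for $|t_1 - t_2| > T$.
10-28. Show that if
$$R_{xx}(t_1, t_2) = q(t_1)\delta(t_1 - t_2) \qquad E\{y^2(t)\} = I(t)$$
and
(a) $y(t) = \int_0^t h(t, \alpha)x(\alpha)\,d\alpha$ then $I(t) = \int_0^t h^2(t, \alpha)q(\alpha)\,d\alpha$
(b) $y'(t) + c(t)y(t) = x(t)$ then $I'(t) + 2c(t)I(t) = q(t)$
10-29. Find $E\{y^2(t)\}$ (a) if $R_{xx}(\tau) = 5\delta(\tau)$ and
$$y'(t) + 2y(t) = x(t) \quad \text{all } t \tag{i}$$
(b) if (i) holds for $t > 0$ only and $y(t) = 0$ for $t < 0$.
Hint: Use (10-90).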
10-30. The input to a linear system with $h(t) = Ae^{-\alpha t}U(t)$ is a process $x(t)$ with
$R_x(\tau) = N\delta(\tau)$ applied at $t = 0$ and disconnected at $t = T$. Find and sketch
$E\{y^2(t)\}$.
Hint: Use (10-90) with $q(t) = N$ for $0 < t < T$ and 0 otherwise.
10-31. Show that if
$$s = \int_0^{10} x(t)\,dt \quad\text{then}\quad E\{s^2\} = \int_{-10}^{10}(10 - |\tau|)R_x(\tau)\,d\tau$$
Find the mean and variance of $s$ if $E\{x(t)\} = 8$, $R_x(\tau) = 64 + 10e^{-2|\tau|}$.
10-32. The process $x(t)$ is WSS with $R_{xx}(\tau) = 5\delta(\tau)$ and
$$y'(t) + 2y(t) = x(t) \tag{i}$$
Find $E\{y^2(t)\}$, $R_{xy}(t_1, t_2)$, $R_{yy}(t_1, t_2)$ (a) if (i) holds for all $t$, (b) if $y(0) = 0$ and
(i) holds for $t \ge 0$.
10-33. Find $S(\omega)$ if (a) $R(\tau) = e^{-\alpha\tau^2}$, (b) $R(\tau) = e^{-\alpha\tau^2}\cos\omega_0\tau$.
10-34. Show that the power spectrum of an SSS process $x(t)$ equals
$$S(\omega) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2\,G(x_1, x_2; \omega)\,dx_1\,dx_2$$
where $G(x_1, x_2; \omega)$ is the Fourier transform in the variable $\tau$ of the second-order
density $f(x_1, x_2; \tau)$ of $x(t)$.
10-35. Show that if $y(t) = x(t + a) - x(t - a)$, then
$$R_y(\tau) = 2R_x(\tau) - R_x(\tau + 2a) - R_x(\tau - 2a) \qquad S_y(\omega) = 4S_x(\omega)\sin^2 a\omega$$
10-36. Using (10-122), show that
$$R(0) - R(\tau) \ge \frac{1}{4^n}[R(0) - R(2^n\tau)]$$
Hint:
$$1 - \cos\theta = 2\sin^2\frac{\theta}{2} \ge 2\sin^2\frac{\theta}{2}\cos^2\frac{\theta}{2} = \frac{1}{4}(1 - \cos 2\theta)$$
10-37. The process $x(t)$ is normal with zero mean and $R_x(\tau) = Ie^{-\alpha|\tau|}\cos\beta\tau$. Show that
if $y(t) = x^2(t)$, then $C_y(\tau) = I^2 e^{-2\alpha|\tau|}(1 + \cos 2\beta\tau)$. Find $S_y(\omega)$.
10-38. Show that if $R(\tau)$ is the inverse Fourier transform of a function $S(\omega)$ and
$S(\omega) \ge 0$, then, for any $a_i$,
$$\sum_{i,k} a_i a_k^* R(\tau_i - \tau_k) \ge 0$$
Hint:
$$\int_{-\infty}^{\infty} S(\omega)\Big|\sum_i a_i e^{j\omega\tau_i}\Big|^2 d\omega \ge 0$$
10-39. Find $R(\tau)$ if (a) $S(\omega) = 1/(1 + \omega^4)$, (b) $S(\omega) = 1/(4 + \omega^2)^2$.
10-40. Show that, for complex systems, (10-136) and (10-181) yield
$$S_{yy}(s) = S_{xx}(s)H(s)H^*(-s^*) \qquad S_{yy}(z) = S_{xx}(z)H(z)H^*(1/z^*)$$
10-41. The process $x(t)$ is normal with zero mean. Show that if $y(t) = x^2(t)$, then
$$S_y(\omega) = 2\pi R_x^2(0)\delta(\omega) + 2S_x(\omega) * S_x(\omega)$$
Plot $S_y(\omega)$ if $S_x(\omega)$ is (a) ideal LP, (b) ideal BP.
10-42. The process $x(t)$ is WSS with $E\{x(t)\} = 5$ and $R_{xx}(\tau) = 25 + 4e^{-2|\tau|}$. If $y(t) =
2x(t) + 3x'(t)$, find $\eta_y$, $R_{yy}(\tau)$, and $S_{yy}(\omega)$.
10-43. The process $x(t)$ is WSS and $R_{xx}(\tau) = 5\delta(\tau)$. (a) Find $E\{y^2(t)\}$ and $S_{yy}(\omega)$ if
$y'(t) + 3y(t) = x(t)$. (b) Find $E\{y^2(t)\}$ and $R_{xy}(t_1, t_2)$ if $y'(t) + 3y(t) = x(t)U(t)$.
Sketch the functions $R_{xy}(2, t_2)$ and $R_{xy}(t_1, 3)$.
10-44. Given a complex process $x(t)$ with autocorrelation $R(\tau)$, show that if $|R(\tau_1)| = |R(0)|$,
then
$$R(\tau) = e^{j\omega_0\tau}w(\tau) \qquad x(t) = e^{j\omega_0 t}y(t)$$
where $w(\tau)$ is a periodic function with period $\tau_1$ and $y(t)$ is an MS periodic
process with the same period.
10-45. Show that (a) $E\{x(t)\hat{x}(t)\} = 0$, (b) $\hat{\hat{x}}(t) = -x(t)$.
10-46. (Stochastic resonance) The input to the system
$$H(s) = \frac{1}{s^2 + 2s + 5}$$
is a WSS process $x(t)$ with $E\{x^2(t)\} = 10$. Find $S_x(\omega)$ such that the average power
$E\{y^2(t)\}$ of the resulting output $y(t)$ is maximum.
Hint: $|H(j\omega)|$ is maximum for $\omega = \sqrt{3}$.
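10-47. Show that if $R_x(\tau) = Ae^{j\omega_0\tau}$, then $R_{xy}(\tau) = Be^{j\omega_0\tau}$ for any $y(t)$.
Hint: Use (10-167).
10-48. Given a system $H(\omega)$ with input $x(t)$ and output $y(t)$, show that (a) if $x(t)$ is WSS
and $R_{xx}(\tau) = e^{j\alpha\tau}$, then
$$R_{yx}(\tau) = e^{j\alpha\tau}H(\alpha) \qquad R_{yy}(\tau) = e^{j\alpha\tau}|H(\alpha)|^2$$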
(6)	if /?„(Z„Z2) =	then
10-49. Show that if $S_{xx}(\omega)S_{yy}(\omega) \equiv 0$, then $S_{xy}(\omega) \equiv 0$.
10-50. Show that if $x[n]$ is WSS and $R_x[1] = R_x[0]$, then $R_x[m] = R_x[0]$ for every $m$.
10-51. Show that if $R[m] = E\{x[n + m]x[n]\}$, then
$$R[0]R[2] \ge 2R^2[1] - R^2[0]$$
10-52. Given an RV $\omega$ with density $f(\omega)$ such that $f(\omega) = 0$ for $|\omega| > \pi$, we form the
process $x[n] = Ae^{jn\omega}$. Show that $S_x(\omega) = 2\pi A^2 f(\omega)$ for $|\omega| < \pi$.
10-53. (a) Find $E\{y^2(t)\}$ if $y(0) = y'(0) = 0$ and
$$y''(t) + 7y'(t) + 10y(t) = x(t) \qquad R_x(\tau) = 5\delta(\tau)$$
(b) Find $E\{y^2[n]\}$ if $y[-1] = y[-2] = 0$ and
$$8y[n] - 6y[n - 1] + y[n - 2] = x[n] \qquad R_x[m] = 5\delta[m]$$
10-54. The process $x[n]$ is WSS with $R_{xx}[m] = 5\delta[m]$ and
$$y[n] - 0.5y[n - 1] = x[n] \tag{i}$$
Find $E\{y^2[n]\}$, $R_{xy}[m_1, m_2]$, $R_{yy}[m_1, m_2]$ (a) if (i) holds for all $n$, (b) if $y[-1] = 0$
and (i) holds for $n \ge 0$.
10-55. Show that (a) if $R_x[m_1, m_2] = q[m_1]\delta[m_1 - m_2]$ and
$$s = \sum_{n=0}^{N} a_n x[n] \quad\text{then}\quad E\{s^2\} = \sum_{n=0}^{N} a_n^2 q[n]$$
(b) If $R_x(t_1, t_2) = q(t_1)\delta(t_1 - t_2)$ and
$$s = \int_0^T a(t)x(t)\,dt \quad\text{then}\quad E\{s^2\} = \int_0^T a^2(t)q(t)\,dt$$
CHAPTER 11
BASIC APPLICATIONS
11-1 RANDOM WALK, BROWNIAN MOTION, AND THERMAL NOISE
We toss a fair coin every $T$ seconds and after each toss we instantly take a step
of length $s$, to the right if heads shows, to the left if tails shows. The process
starts at $t = 0$ and our location at time $t$ is a staircase function with
discontinuities at the points $t = nT$ (Fig. 11-1a). We have thus created a discrete-state
stochastic process $x(t)$ whose samples $x(t, \zeta)$ depend on the particular sequence
of heads and tails. This process is called the random walk.
Suppose that at the first $n$ tosses we observe $k$ heads and $n - k$ tails. In
this case, our walk consists of $k$ steps to the right and $n - k$ steps to the left.
Hence our position at time $t = nT$ is
$$x(nT) = ks - (n - k)s = ms \qquad m = 2k - n$$
Thus $x(nT)$ is an RV taking the values $ms$, where $m$ equals $n$, or $n - 2$, ..., or
$-n$. Furthermore,
$$P\{x(nT) = ms\} = \binom{n}{k}\frac{1}{2^n} \qquad k = \frac{m + n}{2} \tag{11-1}$$
This is the probability of $k$ heads in $n$ tosses.
We note that $x(nT)$ can be written as a sum
$$x(nT) = x_1 + \cdots + x_n$$
where $x_i$ equals the size of the $i$th step. Thus the RVs $x_i$ are independent,
taking the values $\pm s$, and $E\{x_i\} = 0$, $E\{x_i^2\} = s^2$. From this it follows that
$$E\{x(nT)\} = 0 \qquad E\{x^2(nT)\} = ns^2 \tag{11-2}$$
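The moments in (11-2) can be checked by direct simulation. The following is a minimal sketch, not part of the original text; the function names, the seed, and the parameter values are illustrative assumptions.

```python
import random

def random_walk_position(n, s, rng):
    """Position x(nT) after n fair-coin steps of length s (+s heads, -s tails)."""
    return sum(s if rng.random() < 0.5 else -s for _ in range(n))

def sample_moments(n, s, trials, seed=0):
    """Monte Carlo estimates of E{x(nT)} and E{x^2(nT)}."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    xs = [random_walk_position(n, s, rng) for _ in range(trials)]
    mean = sum(xs) / trials
    second = sum(x * x for x in xs) / trials
    return mean, second

if __name__ == "__main__":
    n, s = 100, 0.5
    mean, second = sample_moments(n, s, trials=10000)
    # The estimates should be close to 0 and n * s**2 respectively.
    print(mean, second, n * s * s)
```

The estimate of the mean should be near 0 and the estimate of the second moment near $ns^2$, up to sampling error.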
Large t. As we know, if $n$ is large and $k$ is in the $\sqrt{npq}$ vicinity of $np$, then [see
(3-27)]
$$\binom{n}{k}p^k q^{n-k} \simeq \frac{1}{\sqrt{2\pi npq}}\,e^{-(k - np)^2/2npq}$$
From this and (11-1) it follows with $p = q = 0.5$ and $m = 2k - n$ that
$$P\{x(nT) = ms\} \simeq \frac{1}{\sqrt{n\pi/2}}\,e^{-m^2/2n}$$
for $m$ of the order of $\sqrt{n}$. Hence
$$P\{x(t) \le ms\} \simeq G(m/\sqrt{n}) \qquad nT - T < t \le nT \tag{11-3}$$
where $G(x)$ is the $N(0, 1)$ distribution [see (3-34)].
Note that if $n_1 < n_2 \le n_3 < n_4$ then the increments $x(n_4T) - x(n_3T)$ and
$x(n_2T) - x(n_1T)$ of $x(t)$ are independent.
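To see how good the large-$n$ approximation is, one can compare the exact probability (11-1) with the normal approximation $e^{-m^2/2n}/\sqrt{n\pi/2}$. A small sketch follows; the function names are illustrative assumptions.

```python
import math

def exact_prob(n, m):
    """P{x(nT) = m s} from (11-1); zero unless m + n is even and |m| <= n."""
    if (m + n) % 2 or abs(m) > n:
        return 0.0
    k = (m + n) // 2
    return math.comb(n, k) / 2 ** n

def normal_approx(n, m):
    """Approximation exp(-m^2 / 2n) / sqrt(n pi / 2), for m of the order of sqrt(n)."""
    return math.exp(-m * m / (2 * n)) / math.sqrt(n * math.pi / 2)

if __name__ == "__main__":
    n = 400
    for m in (0, 20, 40):
        print(m, exact_prob(n, m), normal_approx(n, m))
```

For $n = 400$ the two values agree to within about one percent for $|m|$ up to $2\sqrt{n}$. Note that the lattice of attainable $m$ has spacing 2, which is why the approximation carries the factor $1/\sqrt{n\pi/2}$ rather than $1/\sqrt{2\pi n}$.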
The Wiener process. We shall now examine the limiting form of the random
walk as $n \to \infty$ or, equivalently, as $T \to 0$. As we have shown
$$E\{x^2(t)\} = ns^2 = \frac{ts^2}{T} \qquad t = nT$$
Hence, to obtain meaningful results, we shall assume that $s$ tends to 0 as $\sqrt{T}$:
$$s^2 = \alpha T$$
The limit of $x(t)$ as $T \to 0$ is then a continuous-state process (Fig. 11-1b)
$$w(t) = \lim x(t) \qquad T \to 0$$
known as the Wiener process.
We shall show that the first-order density $f(w, t)$ of $w(t)$ is normal with
zero mean and variance $\alpha t$:
$$f(w, t) = \frac{1}{\sqrt{2\pi\alpha t}}\,e^{-w^2/2\alpha t} \tag{11-4}$$
Proof. If $w = ms$ and $t = nT$, then
$$\frac{m}{\sqrt{n}} = \frac{w/s}{\sqrt{t/T}} = \frac{w}{\sqrt{\alpha t}}$$
Inserting into (11-3), we conclude that
$$P\{w(t) \le w\} = G\!\left(\frac{w}{\sqrt{\alpha t}}\right)$$
and (11-4) results.
We show next that the autocorrelation of $w(t)$ equals
$$R(t_1, t_2) = \alpha\min(t_1, t_2) \tag{11-5}$$
Indeed, if $t_1 < t_2$, then the difference $w(t_2) - w(t_1)$ is independent of $w(t_1)$.
Hence
$$E\{[w(t_2) - w(t_1)]w(t_1)\} = E\{w(t_2) - w(t_1)\}E\{w(t_1)\} = 0$$
This yields
$$E\{w(t_1)w(t_2)\} = E\{w^2(t_1)\} = \frac{t_1 s^2}{T} = \alpha t_1$$
as in (11-5). The proof is similar if $t_1 > t_2$.
Note finally that if $t_1 < t_2 < t_3 < t_4$ then the increments $w(t_4) - w(t_3)$
and $w(t_2) - w(t_1)$ of $w(t)$ are independent.
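The relation $R(t_1, t_2) = \alpha\min(t_1, t_2)$ can be illustrated with a fine random walk whose step satisfies $s^2 = \alpha T$. The sketch below is an illustration under assumed parameter values, with hypothetical function names; it estimates $E\{w(t_1)w(t_2)\}$ for $t_1 = n_1T$ and $t_2 = n_2T$.

```python
import math
import random

def walk_pair(alpha, T, n1, n2, rng):
    """Return (x(n1 T), x(n2 T)) for one walk with s^2 = alpha T and 1 <= n1 <= n2."""
    s = math.sqrt(alpha * T)
    x = 0.0
    x1 = 0.0
    for i in range(1, n2 + 1):
        x += s if rng.random() < 0.5 else -s
        if i == n1:
            x1 = x  # remember the position at the earlier time
    return x1, x

def estimate_autocorrelation(alpha, T, n1, n2, trials, seed=1):
    """Monte Carlo estimate of E{x(n1 T) x(n2 T)}."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x1, x2 = walk_pair(alpha, T, n1, n2, rng)
        total += x1 * x2
    return total / trials

if __name__ == "__main__":
    # t1 = 0.5, t2 = 1.2, alpha = 2: the estimate should be near alpha * t1 = 1.
    print(estimate_autocorrelation(2.0, 0.01, 50, 120, trials=10000))
```

The estimate depends only on the smaller of the two times, as (11-5) predicts; changing $n_2$ while keeping $n_1$ fixed should leave it unchanged up to sampling error.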
Generalized random walk. The random walk can be written as a sum
$$x(t) = \sum_{k=1}^{n} c_k U(t - kT) \qquad (n - 1)T < t \le nT \tag{11-6}$$
where $c_k$ is a sequence of i.i.d. RVs taking the values $s$ and $-s$ with equal
probability. In the generalized random walk, the RVs $c_k$ take the values $s$ and
$-s$ with probability $p$ and $q$ respectively. In this case,
$$E\{c_k\} = (p - q)s \qquad E\{c_k^2\} = s^2 \qquad \mathrm{Var}\,c_k = 4pqs^2$$
From this it follows that
$$E\{x(t)\} = n(p - q)s \qquad \mathrm{Var}\,x(t) = 4npqs^2 \tag{11-7}$$
For large $n$, the process $x(t)$ is nearly normal with
$$E\{x(t)\} \simeq \frac{t}{T}(p - q)s \qquad \mathrm{Var}\,x(t) \simeq \frac{4t}{T}pqs^2 \tag{11-8}$$
Brownian Motion
The term brownian motion is used to describe the movement of a particle in a
liquid, subjected to collisions and other forces. Macroscopically, the position
$x(t)$ of the particle can be modeled as a stochastic process satisfying a
second-order differential equation:
$$mx''(t) + fx'(t) + cx(t) = F(t) \qquad c > 0 \tag{11-9}$$
where $F(t)$ is the collision force, $m$ is the mass of the particle, $f$ is the
coefficient of friction, and $cx(t)$ is an external force which we assume
proportional to $x(t)$. On a macroscopic scale, the process $F(t)$ can be viewed as normal
white noise with zero mean and power spectrum
$$S_F(\omega) = 2kTf \tag{11-10}$$
where $T$ is the absolute temperature of the medium and $k = 1.38 \times 10^{-23}$
joules per kelvin is the Boltzmann constant. We shall determine the statistical
properties of $x(t)$ for various cases.
Bound motion. We assume first that the restoring force $cx(t)$ is different from
0. For sufficiently large $t$, the position $x(t)$ of the particle approaches a
stationary state with zero mean and power spectrum
$$S_x(\omega) = \frac{2kTf}{(c - m\omega^2)^2 + f^2\omega^2} \tag{11-11}$$
To determine the statistical properties of $x(t)$, it suffices to find its
autocorrelation. We shall do so under the assumption that the roots of the equation
$ms^2 + fs + c = 0$ are complex:
$$s_{1,2} = -\alpha \pm j\beta \qquad \alpha = \frac{f}{2m} \qquad \alpha^2 + \beta^2 = \frac{c}{m}$$
Replacing $b$, $c$, and $q$ in Example 10-26b by $f/m$, $c/m$, and $2kTf/m^2$
respectively, we obtain
$$R_x(\tau) = \frac{kT}{c}\,e^{-\alpha|\tau|}\left(\cos\beta\tau + \frac{\alpha}{\beta}\sin\beta|\tau|\right) \tag{11-12}$$
Thus, for a specific $t$, $x(t)$ is a normal RV with mean 0 and variance $R_x(0) =
kT/c$. Hence its density equals
$$f_x(x) = \sqrt{\frac{c}{2\pi kT}}\,e^{-cx^2/2kT} \tag{11-13}$$
The conditional density of $x(t)$ assuming $x(t_0) = x_0$ is a normal curve with
mean $ax_0$ and variance $P$ where (see Example 8-11)
$$a = \frac{R_x(\tau)}{R_x(0)} \qquad P = R_x(0)(1 - a^2) \qquad \tau = t - t_0$$
FREE MOTION. We say that a particle is in free motion if the restoring force is
0. In this case, (11-9) yields
$$mx''(t) + fx'(t) = F(t) \tag{11-14}$$
The solution of this equation is not a stationary process. We shall express its
properties in terms of the properties of the velocity $v(t)$ of the particle. With
$v(t) = x'(t)$, it follows from (11-14) that
$$mv'(t) + fv(t) = F(t) \tag{11-15}$$
The steady-state solution of this equation is a stationary process with
$$S_v(\omega) = \frac{2kTf}{m^2\omega^2 + f^2} \qquad R_v(\tau) = \frac{kT}{m}\,e^{-f|\tau|/m} \tag{11-16}$$
From the preceding, it follows that $v(t)$ is a normal process with zero mean,
variance $kT/m$, and density
$$f_v(v) = \sqrt{\frac{m}{2\pi kT}}\,e^{-mv^2/2kT} \tag{11-17}$$
The conditional density of $v(t)$ assuming $v(0) = v_0$ is normal with mean
$av_0$ and variance $P$ (see Example 8-11) where
$$a = \frac{R_v(t)}{R_v(0)} = e^{-ft/m} \qquad P = \frac{kT}{m}(1 - a^2) = \frac{kT}{m}\left(1 - e^{-2ft/m}\right)$$
In physics, (11-15) is called the Langevin equation, its solution the
Ornstein-Uhlenbeck process, and its spectrum Lorentzian.
The position $x(t)$ of the particle is the integral of its velocity:
$$x(t) = \int_0^t v(\alpha)\,d\alpha \tag{11-18}$$
From this and (10-11) it follows that
$$E\{x^2(t)\} = \int_0^t\!\!\int_0^t R_v(\alpha - \beta)\,d\alpha\,d\beta = \frac{kT}{m}\int_0^t\!\!\int_0^t e^{-f|\alpha - \beta|/m}\,d\alpha\,d\beta$$
Hence
$$E\{x^2(t)\} = \frac{2kT}{f}\left(t - \frac{m}{f} + \frac{m}{f}\,e^{-ft/m}\right) \tag{11-19}$$
Thus, the position of a particle in free motion is a nonstationary normal process
with zero mean and variance the right side of (11-19).
For $t \gg m/f$, (11-19) yields
$$E\{x^2(t)\} \simeq \frac{2kT}{f}\,t = 2D^2t \qquad D^2 = \frac{kT}{f} \tag{11-20}$$
The parameter $D$ is the diffusion constant. This result will be presently
rederived.
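As a numerical cross-check of (11-19), the double integral of $R_v(\alpha - \beta) = (kT/m)e^{-f|\alpha - \beta|/m}$ can be evaluated by a simple midpoint rule and compared with the closed form. In this sketch the argument `k_T` stands for the product $kT$; the function names and the dimensionless test values are illustrative assumptions.

```python
import math

def mean_square_position(t, k_T, m, f):
    """Right side of (11-19): (2kT/f) (t - m/f + (m/f) exp(-f t / m))."""
    return (2 * k_T / f) * (t - m / f + (m / f) * math.exp(-f * t / m))

def mean_square_by_quadrature(t, k_T, m, f, steps=400):
    """Midpoint-rule double integral of (kT/m) exp(-f |a - b| / m) over (0, t)^2."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        a = (i + 0.5) * h
        for j in range(steps):
            b = (j + 0.5) * h
            total += math.exp(-f * abs(a - b) / m)
    return (k_T / m) * total * h * h

if __name__ == "__main__":
    print(mean_square_position(3.0, 1.0, 1.0, 2.0))
    print(mean_square_by_quadrature(3.0, 1.0, 1.0, 2.0))
```

For $t \gg m/f$ the closed form approaches $(2kT/f)t$, which is the linear growth stated in (11-20).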
THE WIENER PROCESS. We now assume that the acceleration term $mx''(t)$ of
a particle in free motion is small compared to the friction term $fx'(t)$; this is
the case if $f \gg m/t$. Neglecting the term $mx''(t)$ in (11-14), we conclude that
$$fx'(t) = F(t) \qquad x(t) = \frac{1}{f}\int_0^t F(\alpha)\,d\alpha$$
Because $F(t)$ is white noise with spectrum $2kTf$, it follows from (10-36) with
$v(t) = F(t)/f$ and $q(t) = 2kT/f$ that
$$E\{x^2(t)\} = \frac{2kT}{f}\,t = \alpha t \qquad \alpha = \frac{2kT}{f} = 2D^2$$
Thus, $x(t)$ is a nonstationary normal process with density
$$f_{x(t)}(x) = \frac{1}{\sqrt{2\pi\alpha t}}\,e^{-x^2/2\alpha t}$$
We maintain that it is also a process with independent increments. Because it is
normal, it suffices to show that it is a process with orthogonal increments, that is
$$E\{[x(t_2) - x(t_1)][x(t_4) - x(t_3)]\} = 0 \tag{11-21}$$
for $t_1 < t_2 < t_3 < t_4$. This follows from the fact that $x(t_i) - x(t_j)$ depends only
on the values of $F(t)$ in the interval $(t_i, t_j)$ and $F(t)$ is white noise. Using this, we
shall show that
$$R_x(t_1, t_2) = \alpha\min(t_1, t_2) \tag{11-22}$$
To do so, we observe from (11-21) that if $t_1 < t_2$, then
$$E\{x(t_1)x(t_2)\} = E\{x(t_1)[x(t_2) - x(t_1) + x(t_1)]\} = E\{x^2(t_1)\} = \alpha t_1$$
and (11-22) results. Thus the position of a particle in free motion with negligible
acceleration has the following properties:
It is normal with zero mean, variance $\alpha t$, and autocorrelation $\alpha\min(t_1, t_2)$. It
is a process with independent increments.
A process with these properties is called the Wiener process. As we have
seen, it is the limiting form of the position of a particle in free motion as $t \to \infty$;
it is also the limiting form of the random walk process as $n \to \infty$.
We note finally that the conditional density of $x(t)$ assuming $x(t_0) = x_0$ is
normal with mean $ax_0$ and variance $P$ where (see Example 8-11)
$$a = \frac{R(t, t_0)}{R(t_0, t_0)} = 1 \qquad P = R(t, t) - aR(t, t_0) = \alpha t - \alpha t_0$$
Hence
$$f_{x(t)}(x \mid x(t_0) = x_0) = \frac{1}{\sqrt{2\pi\alpha(t - t_0)}}\,e^{-(x - x_0)^2/2\alpha(t - t_0)} \tag{11-23}$$

FIGURE 11-2
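Diffusion equations. The right side of (11-23) is a function depending on the
four parameters $x$, $x_0$, $t$, and $t_0$. Denoting this function by $\pi(x, x_0; t, t_0)$, we
conclude by repeated differentiation that
$$\frac{\partial\pi}{\partial t} = D^2\frac{\partial^2\pi}{\partial x^2} \qquad \frac{\partial\pi}{\partial t_0} = -D^2\frac{\partial^2\pi}{\partial x_0^2} \tag{11-24}$$
where $D^2 = \alpha/2$. These equations are called diffusion equations. They are
reestablished in Sec. 16-4 in the context of Markoff processes.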
Thermal noise
Thermal noise is the distribution of voltages and currents in a network due to
the thermal electron agitation. In the following, we discuss the statistical
properties of thermal noise ignoring the underlying physics. The analysis is
based on a model consisting of noiseless reactive elements and noisy resistors.
A noisy resistor is modeled by a noiseless resistor $R$ in series with a
voltage source $n_e(t)$ or in parallel with a current source $n_i(t) = n_e(t)/R$ as in
Fig. 11-2. It is assumed that $n_e(t)$ is a normal process with zero mean and flat
spectrum
$$S_{n_e}(\omega) = 2kTR \qquad S_{n_i}(\omega) = \frac{S_{n_e}(\omega)}{R^2} = 2kTG \tag{11-25}$$
where $k$ is the Boltzmann constant, $T$ is the absolute temperature of the
resistor, and $G = 1/R$ is its conductance. Furthermore, the noise sources of the
various network resistors are mutually independent processes. Note the
similarity between the spectrum (11-25) of thermal noise and the spectrum (11-10) of
the collision forces in brownian motion.
Using the above and the properties of linear systems, we shall derive the
spectral properties of general network responses starting with an example.
Example 11-1. The circuit of Fig. 11-3 consists of a resistor $R$ and a capacitor $C$.
We shall determine the spectrum of the voltage $v(t)$ across the capacitor due to
thermal noise.
The voltage $v(t)$ can be considered as the output of a system with input the
noise voltage $n_e(t)$ and system function
$$H(s) = \frac{1}{1 + RCs}$$
FIGURE 11-3
Applying (10-136), we obtain
$$S_v(\omega) = S_{n_e}(\omega)|H(j\omega)|^2 = \frac{2kTR}{1 + \omega^2R^2C^2} \qquad R_v(\tau) = \frac{kT}{C}\,e^{-|\tau|/RC} \tag{11-26}$$
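The following consequences are illustrations of Nyquist's theorem to be
discussed presently: We denote by $Z(s)$ the impedance across the terminals a, b
and by $z(t)$ its inverse transform
$$Z(s) = \frac{R}{1 + RCs} \qquad z(t) = \frac{1}{C}\,e^{-t/RC}U(t)$$
The function $z(t)$ is the voltage across $C$ due to an impulse current $\delta(t)$ (Fig.
11-3). Comparing with (11-26), we obtain
$$S_v(\omega) = 2kT\,\mathrm{Re}\,Z(j\omega) \qquad \mathrm{Re}\,Z(j\omega) = \frac{R}{1 + \omega^2R^2C^2}$$
$$R_v(\tau) = kTz(\tau) \quad \tau > 0 \qquad R_v(0) = kTz(0^+)$$
$$E\{v^2(t)\} = R_v(0) = \frac{kT}{C} \qquad \frac{1}{C} = \lim_{\omega\to\infty} j\omega Z(j\omega)$$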
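The result $E\{v^2(t)\} = kT/C$, independent of $R$, can be confirmed by integrating the spectrum $S_v(\omega) = 2kTR/(1 + \omega^2R^2C^2)$ numerically, using $E\{v^2(t)\} = (1/2\pi)\int S_v(\omega)\,d\omega$. The sketch below truncates the integral at a large multiple of $1/RC$; the component values, the cutoff, and the function names are illustrative assumptions.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K

def capacitor_noise_spectrum(omega, R, C, T):
    """S_v(omega) = 2kTR / (1 + omega^2 R^2 C^2), as in (11-26)."""
    return 2 * BOLTZMANN * T * R / (1 + (omega * R * C) ** 2)

def average_power(R, C, T, cutoff_factor=2000.0, steps=200000):
    """Midpoint-rule value of (1/2pi) * integral of S_v over |omega| < cutoff_factor/(RC)."""
    w_max = cutoff_factor / (R * C)
    h = 2 * w_max / steps
    total = 0.0
    for i in range(steps):
        omega = -w_max + (i + 0.5) * h
        total += capacitor_noise_spectrum(omega, R, C, T)
    return total * h / (2 * math.pi)

if __name__ == "__main__":
    R, C, T = 1e3, 1e-9, 300.0
    print(average_power(R, C, T), BOLTZMANN * T / C)
```

Changing `R` rescales the bandwidth $1/RC$ and the spectral level $2kTR$ in opposite directions, so the integrated power stays at $kT/C$.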
Given a passive, reciprocal network, we denote by $v(t)$ the voltage across
two arbitrary terminals a, b and by $Z(s)$ the impedance from a to b (Fig. 11-4).
NYQUIST THEOREM. The power spectrum of $v(t)$ equals
$$S_v(\omega) = 2kT\,\mathrm{Re}\,Z(j\omega) \tag{11-27}$$
FIGURE 11-4
FIGURE 11-5
Proof. We shall assume that there is only one resistor in the network. The
general case can be established similarly if we use the independence of the
noise sources. The resistor is represented by a noiseless resistor in parallel with
a current source $n_i(t)$ and the remaining network contains only reactive
elements (Fig. 11-5a). Thus $v(t)$ is the output of a system with input $n_i(t)$ and
system function $H(\omega)$. From the reciprocity theorem it follows that $H(\omega) =
V(\omega)/I(\omega)$ where $I(\omega)$ is the amplitude of a sine wave from a to b (Fig. 11-5b)
and $V(\omega)$ is the amplitude of the voltage across $R$. The input power equals
$|I(\omega)|^2\,\mathrm{Re}\,Z(j\omega)$ and the power delivered to the resistance equals $|V(\omega)|^2/R$.
Since the connecting network is lossless by assumption, we conclude that
$$|I(\omega)|^2\,\mathrm{Re}\,Z(j\omega) = \frac{|V(\omega)|^2}{R}$$
Hence
$$|H(\omega)|^2 = \frac{|V(\omega)|^2}{|I(\omega)|^2} = R\,\mathrm{Re}\,Z(j\omega)$$
and (11-27) results because
$$S_v(\omega) = S_{n_i}(\omega)|H(\omega)|^2 \qquad S_{n_i}(\omega) = \frac{2kT}{R}$$
COROLLARY 1. The autocorrelation of $v(t)$ equals
$$R_v(\tau) = kTz(\tau) \qquad \tau > 0 \tag{11-28}$$
where $z(t)$ is the inverse transform of $Z(s)$.
Proof. Since $Z(-j\omega) = Z^*(j\omega)$, it follows from (11-27) that
$$S_v(\omega) = kT[Z(j\omega) + Z(-j\omega)]$$
and (11-28) results because the inverse of $Z(-j\omega)$ equals $z(-t)$ and $z(-t) = 0$
for $t > 0$.
COROLLARY 2. The average power of $v(t)$ equals
$$E\{v^2(t)\} = \frac{kT}{C} \quad\text{where}\quad \frac{1}{C} = \lim_{\omega\to\infty} j\omega Z(j\omega) \tag{11-29}$$
where $C$ is the input capacity.
Proof. As we know (initial value theorem)
$$z(0^+) = \lim sZ(s) \qquad s \to \infty$$
and (11-29) follows from (11-28) because
$$E\{v^2(t)\} = R_v(0) = kTz(0^+)$$
Currents. From Thévenin's theorem it follows that, terminally, a noisy network
is equivalent to a noiseless network with impedance $Z(s)$ in series with a voltage
source $v(t)$. The power spectrum $S_v(\omega)$ of $v(t)$ is the right side of (11-27). This
leads to the following version of Nyquist's theorem:
The power spectrum of the short-circuit current $i(t)$ from a to b due to
thermal noise equals
$$S_i(\omega) = 2kT\,\mathrm{Re}\,Y(j\omega) \qquad Y(s) = \frac{1}{Z(s)} \tag{11-30}$$
Proof. From Thévenin's theorem it follows that
$$S_i(\omega) = S_v(\omega)|Y(j\omega)|^2 = \frac{2kT\,\mathrm{Re}\,Z(j\omega)}{|Z(j\omega)|^2}$$
and (11-30) results.
The current version of the corollaries is left as an exercise.
11-2 POISSON POINTS AND SHOT NOISE
Given a set of Poisson points $t_i$ and a fixed point $t_0$, we form the RV
$z = t_1 - t_0$ where $t_1$ is the first random point to the right of $t_0$ (Fig. 11-6). We
shall show that $z$ has an exponential distribution:
$$f_z(z) = \lambda e^{-\lambda z} \qquad F_z(z) = 1 - e^{-\lambda z} \qquad z > 0 \tag{11-31}$$
Proof. For a given $z > 0$, the function $F_z(z)$ equals the probability of the event
$\{\mathbf{z} \le z\}$. This event occurs if $t_1 < t_0 + z$, that is, if there is at least one random
point in the interval $(t_0, t_0 + z)$. Hence
$$F_z(z) = P\{\mathbf{z} \le z\} = P\{n(t_0, t_0 + z) > 0\} = 1 - P\{n(t_0, t_0 + z) = 0\}$$
and (11-31) results because the probability that there are no points in the
interval $(t_0, t_0 + z)$ equals $e^{-\lambda z}$.
FIGURE 11-6
We can show similarly that if $w = t_0 - t_{-1}$ is the distance from $t_0$ to the
first point $t_{-1}$ to the left of $t_0$, then
$$f_w(w) = \lambda e^{-\lambda w} \qquad F_w(w) = 1 - e^{-\lambda w} \qquad w > 0 \tag{11-32}$$
We shall now show that the distance $x_n = t_n - t_0$ from $t_0$ to the $n$th random
point $t_n$ to the right of $t_0$ (Fig. 11-6) has an Erlang distribution:
$$f_n(x) = \frac{\lambda^n}{(n - 1)!}\,x^{n-1}e^{-\lambda x} \qquad x > 0 \tag{11-33}$$
Proof. The event $\{\mathbf{x}_n \le x\}$ occurs if there are at least $n$ points in the interval
$(t_0, t_0 + x)$. Hence
$$F_n(x) = P\{\mathbf{x}_n \le x\} = 1 - P\{n(t_0, t_0 + x) < n\} = 1 - \sum_{k=0}^{n-1}\frac{(\lambda x)^k}{k!}\,e^{-\lambda x}$$
Differentiating, we obtain (11-33).
Distance between random points. We show next that the distance
$$x = x_n - x_{n-1} = t_n - t_{n-1}$$
between two consecutive points $t_{n-1}$ and $t_n$ has an exponential distribution:
$$f_x(x) = \lambda e^{-\lambda x} \tag{11-34}$$
Proof. From (11-33) and (5-70) it follows that the moment function of $x_n$ equals
$$\Phi_n(s) = \frac{\lambda^n}{(\lambda - s)^n} \tag{11-35}$$
Furthermore, the RVs $x$ and $x_{n-1}$ are independent and $x_n = x + x_{n-1}$. Hence,
if $\Phi_x(s)$ is the moment function of $x$, then
$$\Phi_n(s) = \Phi_x(s)\Phi_{n-1}(s)$$
Comparing with (11-35), we obtain $\Phi_x(s) = \lambda/(\lambda - s)$ and (11-34) results.
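The exponential law for the distance from a fixed point to the next random point can be illustrated without presupposing exponential gaps: the sketch below places many points uniformly at random in a long interval, which is nearly Poisson with density $\lambda = N/L$, and measures the distance from $t_0$ to the first point on its right. The function names and parameter values are illustrative assumptions.

```python
import math
import random

def distance_to_next_point(num_points, length, t0, rng):
    """Place num_points uniformly in (0, length); return the distance from t0 to the
    first point to its right, or None if no point lies to the right of t0."""
    right = [t for t in (rng.uniform(0, length) for _ in range(num_points)) if t > t0]
    return min(right) - t0 if right else None

def sample_distances(num_points, length, t0, trials, seed=2):
    """Collect the distance over many independent placements."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        z = distance_to_next_point(num_points, length, t0, rng)
        if z is not None:
            out.append(z)
    return out

if __name__ == "__main__":
    lam = 200 / 100.0  # density N / L
    zs = sample_distances(200, 100.0, 50.0, trials=5000)
    mean = sum(zs) / len(zs)
    tail = sum(z > 1 / lam for z in zs) / len(zs)
    print(mean, 1 / lam)        # sample mean against 1 / lambda
    print(tail, math.exp(-1))   # P{z > 1/lambda} against e^{-1}
```

With a finite number of points the agreement is only approximate; it improves as $N$ and $L$ grow with $N/L$ fixed.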
An apparent paradox. We should stress that the notion of the "distance $x$
between two consecutive points of a point process" is ambiguous. In Fig. 11-6,
we interpreted $x$ as the distance between $t_{n-1}$ and $t_n$ where $t_n$ was the $n$th
random point to the right of some fixed point $t_0$. This interpretation led to the
conclusion that the density of $x$ is an exponential as in (11-34). The same density
is obtained if we interpret $x$ as the distance between consecutive points to the
left of $t_0$. Suppose, however, that $x$ is interpreted as follows:
Given a fixed point $t_a$, we denote by $t_l$ and $t_r$ the random points nearest to
$t_a$ on its left and right respectively (Fig. 11-7a). We maintain that the density of
the distance $x = t_r - t_l$ between these two points equals
$$f_x(x) = \lambda^2 x e^{-\lambda x} \tag{11-36}$$
FIGURE 11-7
Indeed the RVs
$$x_l = t_a - t_l \quad\text{and}\quad x_r = t_r - t_a$$
are independent with exponential density as in (11-31); furthermore, $x = x_r + x_l$.
This yields (11-36) because the convolution of two exponentials is the density in
(11-36).
Thus, although $x$ is again the "distance between two consecutive points,"
its density is not an exponential. This apparent paradox is a consequence of the
ambiguity in the specification of the identity of random points. Suppose, for
example, that we identify the points $t_i$ by their order $i$, where the count starts
from some fixed point $t_0$, and we observe that in one particular realization of
the point process, the point $t_l$, defined as above, equals $t_n$. In other realizations
of the process, the RV $t_l$ might equal some other point in this identification
(Fig. 11-7b). The same argument shows that the point $t_r$ does not coincide with
the ordered point $t_{n+1}$ for all realizations. Hence we should not expect that the
RV $x = t_r - t_l$ has the same density as the RV $t_{n+1} - t_n$.
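The two interpretations can be compared by simulation. The sketch below generates points from i.i.d. exponential gaps (an assumption made here for convenience), then measures both the interval that covers a fixed point $t_a$ and the gap between the $n$th and $(n+1)$th points counted from the origin. The first should have mean $2/\lambda$, the mean of the density (11-36), and the second mean $1/\lambda$. All names and parameter values are illustrative.

```python
import bisect
import random

def poisson_points(lam, length, rng):
    """Points on (0, length) generated from i.i.d. exponential gaps."""
    points, t = [], rng.expovariate(lam)
    while t < length:
        points.append(t)
        t += rng.expovariate(lam)
    return points

def covering_and_ordered_gap(lam, length, t_a, n, rng):
    """Return (t_r - t_l, t_{n+1} - t_n) for one realization, or None if undefined."""
    pts = poisson_points(lam, length, rng)
    i = bisect.bisect_right(pts, t_a)  # index of the first point to the right of t_a
    if i == 0 or i == len(pts) or n + 1 > len(pts):
        return None
    covering = pts[i] - pts[i - 1]
    ordered = pts[n] - pts[n - 1]  # points are numbered 1, 2, ... from the origin
    return covering, ordered

def mean_gaps(lam, length, t_a, n, trials, seed=3):
    """Average both kinds of gap over many realizations."""
    rng = random.Random(seed)
    pairs = [p for p in (covering_and_ordered_gap(lam, length, t_a, n, rng)
                         for _ in range(trials)) if p is not None]
    cover = sum(p[0] for p in pairs) / len(pairs)
    ordered = sum(p[1] for p in pairs) / len(pairs)
    return cover, ordered

if __name__ == "__main__":
    print(mean_gaps(1.0, 40.0, 20.0, 5, trials=5000))
```

The covering interval is longer on average because a fixed point is more likely to fall inside a long gap than a short one.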
CONSTRUCTIVE DEFINITION. Given a sequence $w_n$ of positive i.i.d.
(independent, identically distributed) RVs with density
$$f(w) = \lambda e^{-\lambda w} \tag{11-37}$$
we form a set of points $t_n$ as in Fig. 11-8a where $t = 0$ is an arbitrary origin and
$$t_n = w_1 + w_2 + \cdots + w_n \tag{11-38}$$
We maintain that the points so formed are Poisson distributed with
parameter $\lambda$.
FIGURE 11-8
Proof. From the independence of the RVs $w_n$, it follows that the RVs $t_n$ and
$w_{n+1}$ are independent, the density $f_n(t)$ of $t_n$ is given by (11-33)
$$f_n(t) = \frac{\lambda^n}{(n - 1)!}\,t^{n-1}e^{-\lambda t} \tag{11-39}$$
and the joint density of $t_n$ and $w_{n+1}$ equals the product $f_n(t)f(w)$. If $t_n < \tau$ and
$t_{n+1} = t_n + w_{n+1} > \tau$ then there are exactly $n$ points in the interval $(0, \tau)$. As
we see from Fig. 11-8b, the probability of this event equals
$$\int_0^\tau\!\!\int_{\tau - t}^{\infty} \lambda e^{-\lambda w}\,\frac{\lambda^n}{(n - 1)!}\,t^{n-1}e^{-\lambda t}\,dw\,dt = e^{-\lambda\tau}\frac{(\lambda\tau)^n}{n!}$$
Thus the points $t_n$ so constructed have property $P_1$. We can show similarly that
they have also property $P_2$.
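The constructive definition translates directly into a simulation: accumulate exponential gaps and count how many points fall in $(0, \tau)$. The sketch below compares the empirical distribution of the count with $e^{-\lambda\tau}(\lambda\tau)^n/n!$. Function names and parameter values are illustrative assumptions.

```python
import math
import random

def count_points(lam, tau, rng):
    """Number of points t_n = w_1 + ... + w_n that fall in (0, tau)."""
    n, t = 0, rng.expovariate(lam)
    while t < tau:
        n += 1
        t += rng.expovariate(lam)
    return n

def empirical_pmf(lam, tau, trials, seed=4):
    """Relative frequency of each observed count."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        n = count_points(lam, tau, rng)
        counts[n] = counts.get(n, 0) + 1
    return {n: c / trials for n, c in counts.items()}

def poisson_pmf(lam, tau, n):
    """exp(-lam tau) (lam tau)^n / n!"""
    return math.exp(-lam * tau) * (lam * tau) ** n / math.factorial(n)

if __name__ == "__main__":
    pmf = empirical_pmf(1.5, 2.0, trials=20000)
    for n in range(7):
        print(n, pmf.get(n, 0.0), poisson_pmf(1.5, 2.0, n))
```

This checks only the distribution of the count in one interval (property $P_1$); independence of counts in nonoverlapping intervals would need a separate test.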
POISSON POINTS REDEFINED. Poisson points are realistic models for a large
class of point processes: photon count, electron emission, telephone calls, data
communication, visits to a doctor, arrivals at a park. The reason is that in these
and other applications, the properties of the underlying points can be derived
from certain general conditions that lead to Poisson distributions. As we show
next, these conditions can be stated in a variety of forms that are equivalent to
the two conditions used in Sec. 3-4 to specify random Poisson points (see page
59).
I. If we place at random $N$ points in an interval of length $T$ where $N \gg 1$,
then the resulting point process is nearly Poisson with parameter $N/T$. This
is exact in the limit as $N$ and $T$ tend to $\infty$ [see (3-47)].
II. If the distances $w_n$ between two consecutive points $t_{n-1}$ and $t_n$ of a point
process are independent and exponentially distributed, as in (11-37), then
this process is Poisson.
The above can be phrased in an equivalent form: If the distance $w$
from an arbitrary point $t_0$ to the next point of a point process is an RV
whose density does not depend on the choice of $t_0$, then the process is
Poisson. The reason for this equivalence is that this assumption leads to the
conclusion that
$$f(w \mid w \ge t_0) = f(w - t_0) \tag{11-40}$$
and the only function satisfying (11-40) is an exponential (see Example
7-10). In queueing theory, the above is called the Markoff or memoryless
property.
III. If the number of points $n(t, t + dt)$ in an interval $(t, t + dt)$ is such that:
(a) $P\{n(t, t + dt) = 1\}$ is of the order of $dt$;
(b) $P\{n(t, t + dt) > 1\}$ is of order higher than $dt$;
(c) the above probabilities do not depend on the state of the point process
outside the interval $(t, t + dt)$;
then the process is Poisson (see Sec. 16-4).
IV. Suppose, finally, that:
(a) $P\{n(a, b) = k\}$ depends only on $k$ and on the length of the interval
$(a, b)$;
(b) if the intervals $(a_i, b_i)$ are nonoverlapping, then the RVs $n(a_i, b_i)$ are
independent;
(c) $P\{n(a, b) = \infty\} = 0$.
These conditions lead again to the conclusion that the probability
$p_k(\tau)$ of having $k$ points in any interval of length $\tau$ equals
$$p_k(\tau) = e^{-\lambda\tau}(\lambda\tau)^k/k! \tag{11-41}$$
The proof is omitted.
Linear interpolation. The process
$$x(t) = t - t_n \qquad t_n \le t < t_{n+1} \tag{11-42}$$
of Fig. 11-9 consists of straight line segments of slope 1 between two consecutive
random points $t_n$ and $t_{n+1}$. For a specific $t$, $x(t)$ equals the distance $w = t - t_n$
from $t$ to the nearest point $t_n$ to the left of $t$; hence the first-order distribution
of $x(t)$ is exponential as in (11-32). From this it follows that
$$E\{x(t)\} = \frac{1}{\lambda} \qquad E\{x^2(t)\} = \frac{2}{\lambda^2} \tag{11-43}$$
FIGURE 11-9
THEOREM. The autocovariance of $x(t)$ equals
$$C(\tau) = \frac{1}{\lambda^2}(1 + \lambda|\tau|)e^{-\lambda|\tau|} \tag{11-44}$$
Proof. We denote by $t_m$ and $t_n$ the random points to the left of the points $t + \tau$
and $t$ respectively. Suppose first that $t_m = t_n$; in this case $x(t + \tau) = t + \tau - t_n$
and $x(t) = t - t_n$. Hence [see (11-42)]
$$E\{(t + \tau - t_n)(t - t_n)\} = E\{(t - t_n)^2\} + \tau E\{t - t_n\} = \frac{2}{\lambda^2} + \frac{\tau}{\lambda}$$
Suppose, next, that $t_m \ne t_n$; in this case
$$E\{(t + \tau - t_m)(t - t_n)\} = E\{t + \tau - t_m\}E\{t - t_n\} = \frac{1}{\lambda^2}$$
Clearly, $t_m = t_n$ if there are no random points in the interval $(t, t + \tau)$; hence
$P\{t_m = t_n\} = e^{-\lambda\tau}$. Similarly, $t_m \ne t_n$ if there is at least one random point in the
interval $(t, t + \tau)$; hence $P\{t_m \ne t_n\} = 1 - e^{-\lambda\tau}$. And since [see (4-48)]
$$R(\tau) = E\{x(t + \tau)x(t) \mid t_m = t_n\}P\{t_m = t_n\} + E\{x(t + \tau)x(t) \mid t_m \ne t_n\}P\{t_m \ne t_n\}$$
we conclude that
$$R(\tau) = \left(\frac{2}{\lambda^2} + \frac{\tau}{\lambda}\right)e^{-\lambda\tau} + \frac{1}{\lambda^2}\left(1 - e^{-\lambda\tau}\right)$$
for $\tau > 0$. Subtracting $1/\lambda^2$, we obtain (11-44).
Shot Noise
Given a set of Poisson points $t_i$ with average density $\lambda$ and a real function $h(t)$,
we form the sum
$$s(t) = \sum_i h(t - t_i) \tag{11-45}$$
This sum is an SSS process known as shot noise. In the following, we discuss its
second-order properties. The general statistics are developed in Sec. 16-3.
From the definition it follows that $s(t)$ can be represented as the output of
a linear system (Fig. 11-10) with impulse response $h(t)$ and input the Poisson
impulses
$$z(t) = \sum_i \delta(t - t_i) \tag{11-46}$$
This representation agrees with the generation of shot noise in physical
problems: The process $s(t)$ is the output of a dynamic system activated by a sequence
of impulses (particle emissions, for example) occurring at the random times $t_i$.
FIGURE 11-10
As we know, $\eta_z = \lambda$; hence
$$E\{s(t)\} = \lambda\int_{-\infty}^{\infty} h(t)\,dt = \lambda H(0) \tag{11-47}$$
Furthermore, since (see Example 10-22)
$$S_{zz}(\omega) = 2\pi\lambda^2\delta(\omega) + \lambda \tag{11-48}$$
it follows from (10-136) that
$$S_{ss}(\omega) = 2\pi\lambda^2H^2(0)\delta(\omega) + \lambda|H(\omega)|^2 \tag{11-49}$$
because $|H(\omega)|^2\delta(\omega) = H^2(0)\delta(\omega)$. The inverse of the above yields
$$R_{ss}(\tau) = \lambda^2H^2(0) + \lambda\rho(\tau) \qquad C_{ss}(\tau) = \lambda\rho(\tau) \tag{11-50}$$
where $\rho(\tau) = h(\tau) * h(-\tau)$.
Campbell's theorem. The mean $\eta_s$ and variance $\sigma_s^2$ of the shot-noise process
$s(t)$ equal
$$\eta_s = \lambda\int_{-\infty}^{\infty} h(t)\,dt \qquad \sigma_s^2 = \lambda\rho(0) = \lambda\int_{-\infty}^{\infty} h^2(t)\,dt \tag{11-51}$$
Proof. It follows from (11-50) because $\sigma_s^2 = C_{ss}(0)$.
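Campbell's theorem can be illustrated numerically. The sketch below takes the exponential response $h(t) = e^{-\alpha t}U(t)$ as an illustrative choice, for which the theorem gives $\eta_s = \lambda/\alpha$ and $\sigma_s^2 = \lambda/2\alpha$. Points are generated from exponential gaps on a finite window; the window length, seed, and function names are assumptions of this sketch.

```python
import math
import random

def shot_noise_sample(lam, alpha, window, rng):
    """s(t) = sum_i h(t - t_i) with h(t) = exp(-alpha t) U(t), observed at t = window.
    Points before time 0 are ignored; each would contribute less than exp(-alpha * window)."""
    s, t = 0.0, rng.expovariate(lam)
    while t < window:
        s += math.exp(-alpha * (window - t))
        t += rng.expovariate(lam)
    return s

def shot_noise_moments(lam, alpha, window, trials, seed=5):
    """Sample mean and sample variance of s(t) over independent realizations."""
    rng = random.Random(seed)
    xs = [shot_noise_sample(lam, alpha, window, rng) for _ in range(trials)]
    mean = sum(xs) / trials
    var = sum((x - mean) ** 2 for x in xs) / (trials - 1)
    return mean, var

if __name__ == "__main__":
    lam, alpha = 5.0, 1.0
    print(shot_noise_moments(lam, alpha, window=15.0, trials=5000))
    print(lam / alpha, lam / (2 * alpha))  # values predicted by Campbell's theorem
```

The window must be long compared with $1/\alpha$ so that the truncated tail of $h$ is negligible.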
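Example 11-2. If
$$h(t) = e^{-\alpha t}U(t)$$
then
$$\eta_s = \frac{\lambda}{\alpha} \qquad \sigma_s^2 = \frac{\lambda}{2\alpha}$$
$$S_{ss}(\omega) = \frac{2\pi\lambda^2}{\alpha^2}\,\delta(\omega) + \frac{\lambda}{\alpha^2 + \omega^2} \qquad C_{ss}(\tau) = \frac{\lambda}{2\alpha}\,e^{-\alpha|\tau|}$$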
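Example 11-3. Electron transit. Suppose that $h(t)$ is a triangle as in Fig. 11-11a,
that is, $h(t) = kt$ for $0 < t < T$ and 0 otherwise. Since
$$\int_0^T kt\,dt = \frac{kT^2}{2} \qquad \int_0^T k^2t^2\,dt = \frac{k^2T^3}{3}$$
it follows from (11-51) that
$$\eta_s = \frac{\lambda kT^2}{2} \qquad \sigma_s^2 = \frac{\lambda k^2T^3}{3}$$
In this case
$$H(\omega) = \int_0^T kt\,e^{-j\omega t}\,dt = e^{-j\omega T/2}\,\frac{2k\sin(\omega T/2)}{j\omega^2} - e^{-j\omega T}\,\frac{kT}{j\omega}$$
Inserting into (11-49), we obtain (Fig. 11-11b)
$$S_{ss}(\omega) = 2\pi\eta_s^2\delta(\omega) + \frac{\lambda k^2}{\omega^4}\left(2 - 2\cos\omega T + \omega^2T^2 - 2\omega T\sin\omega T\right)$$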
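The closed form for $|H(\omega)|^2$ used in this example can be checked against a direct numerical evaluation of $H(\omega) = \int_0^T kt\,e^{-j\omega t}\,dt$, assuming the ramp $h(t) = kt$ on $(0, T)$. The function names and test values in this sketch are illustrative.

```python
import cmath
import math

def transfer_numeric(omega, k, T, steps=20000):
    """Midpoint-rule value of H(omega) = integral_0^T k t exp(-j omega t) dt."""
    h = T / steps
    return sum(k * ((i + 0.5) * h) * cmath.exp(-1j * omega * (i + 0.5) * h)
               for i in range(steps)) * h

def magnitude_squared_closed_form(omega, k, T):
    """|H(omega)|^2 = (k^2 / omega^4)(2 - 2 cos wT + (wT)^2 - 2 wT sin wT), omega != 0."""
    wT = omega * T
    return (k ** 2 / omega ** 4) * (2 - 2 * math.cos(wT) + wT ** 2 - 2 * wT * math.sin(wT))

if __name__ == "__main__":
    for omega in (0.7, 3.0, 10.0):
        numeric = abs(transfer_numeric(omega, 2.0, 1.5)) ** 2
        print(omega, numeric, magnitude_squared_closed_form(omega, 2.0, 1.5))
```

The closed form is not suitable for evaluation at $\omega = 0$; there $H(0) = kT^2/2$ follows directly from the integral.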
Generalized Poisson process and shot noise. Given a set of Poisson points $t_i$
with average density $\lambda$, we form the process
$$x(t) = \sum_i c_i U(t - t_i) = \sum_{i=1}^{n(t)} c_i \tag{11-52}$$
where $c_i$ is a sequence of i.i.d. RVs independent of the points $t_i$ with mean $\eta_c$
and variance $\sigma_c^2$. Thus $x(t)$ is a staircase function as in Fig. 10-3 with jumps at
the points $t_i$ equal to $c_i$. The process $n(t)$ is the number of Poisson points in the
interval $(0, t)$; hence $E\{n(t)\} = \lambda t$ and $E\{n^2(t)\} = \lambda^2t^2 + \lambda t$. For a specific $t$, $x(t)$
is a sum as in (8-46). From this it follows that
$$E\{x(t)\} = \eta_c E\{n(t)\} = \eta_c\lambda t$$
$$E\{x^2(t)\} = \eta_c^2 E\{n^2(t)\} + \sigma_c^2 E\{n(t)\} = \eta_c^2(\lambda t + \lambda^2t^2) + \sigma_c^2\lambda t \tag{11-53}$$
Proceeding as in Example 10-5, we obtain
$$C_{xx}(t_1, t_2) = (\eta_c^2 + \sigma_c^2)\lambda\min(t_1, t_2) \tag{11-54}$$
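Setting $t_1 = t_2 = t$ in the autocovariance gives $\mathrm{Var}\,x(t) = (\eta_c^2 + \sigma_c^2)\lambda t$, which together with $E\{x(t)\} = \eta_c\lambda t$ is easy to check by simulation. The sketch below uses normal jumps as an illustrative choice; the jump law, function names, and parameter values are assumptions.

```python
import random

def compound_poisson(lam, t, draw_jump, rng):
    """x(t) = c_1 + ... + c_n(t), with n(t) the number of Poisson points in (0, t)."""
    x, s = 0.0, rng.expovariate(lam)
    while s < t:
        x += draw_jump(rng)
        s += rng.expovariate(lam)
    return x

def moments(lam, t, draw_jump, trials, seed=6):
    """Sample mean and sample variance of x(t)."""
    rng = random.Random(seed)
    xs = [compound_poisson(lam, t, draw_jump, rng) for _ in range(trials)]
    mean = sum(xs) / trials
    var = sum((x - mean) ** 2 for x in xs) / (trials - 1)
    return mean, var

if __name__ == "__main__":
    eta_c, sigma_c, lam, t = 1.0, 0.5, 2.0, 3.0
    jump = lambda rng: rng.gauss(eta_c, sigma_c)
    print(moments(lam, t, jump, trials=10000))
    print(eta_c * lam * t, (eta_c ** 2 + sigma_c ** 2) * lam * t)
```

Note that the variance involves $\eta_c^2 + \sigma_c^2 = E\{c_i^2\}$, not $\sigma_c^2$ alone: randomness in the number of jumps contributes even when the jumps are constant.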
We next form the impulse train
$$z(t) = x'(t) = \sum_i c_i\delta(t - t_i) \tag{11-55}$$
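From (11-54) it follows as in (10-100) that
$$E\{z(t)\} = \frac{d}{dt}E\{x(t)\} = \eta_c\lambda \tag{11-56}$$
$$C_{zz}(t_1, t_2) = \frac{\partial^2 C_{xx}(t_1, t_2)}{\partial t_1\,\partial t_2} = (\eta_c^2 + \sigma_c^2)\lambda\delta(\tau) \tag{11-57}$$
where $\tau = t_1 - t_2$. Convolving $z(t)$ with a function $h(t)$, we obtain the
generalized shot noise
$$s(t) = \sum_i c_i h(t - t_i) = z(t) * h(t) \tag{11-58}$$
This yields
$$E\{s(t)\} = E\{z(t)\} * h(t) = \eta_c\lambda\int_{-\infty}^{\infty} h(t)\,dt \tag{11-59}$$
$$C_{ss}(\tau) = C_{zz}(\tau) * h(\tau) * h(-\tau) = (\eta_c^2 + \sigma_c^2)\lambda\rho(\tau) \tag{11-60}$$
$$\mathrm{Var}\,s(t) = C_{ss}(0) = (\eta_c^2 + \sigma_c^2)\lambda\int_{-\infty}^{\infty} h^2(t)\,dt \tag{11-61}$$
The above is the extension of Campbell's theorem to a shot-noise process with
random coefficients.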
11-3 MODULATION†
Given two real jointly WSS processes $a(t)$ and $b(t)$ with zero mean and a
constant $\omega_0$, we form the process
$$x(t) = a(t)\cos\omega_0t - b(t)\sin\omega_0t = r(t)\cos[\omega_0t + \varphi(t)] \tag{11-62}$$
where
$$r(t) = \sqrt{a^2(t) + b^2(t)} \qquad \tan\varphi(t) = \frac{b(t)}{a(t)}$$
This process is called modulated with amplitude modulation $r(t)$ and phase
modulation $\varphi(t)$.
We shall show that $x(t)$ is WSS iff the processes $a(t)$ and $b(t)$ are such that
$$R_{aa}(\tau) = R_{bb}(\tau) \qquad R_{ab}(\tau) = -R_{ba}(\tau) \tag{11-63}$$
Proof. Clearly,
$$E\{x(t)\} = E\{a(t)\}\cos\omega_0t - E\{b(t)\}\sin\omega_0t = 0$$
†A. Papoulis: "Random Modulation: A Review," IEEE Transactions on Acoustics, Speech, and
Signal Processing, vol. ASSP-31, 1983.
Furthermore,
$$x(t + \tau)x(t) = [a(t + \tau)\cos\omega_0(t + \tau) - b(t + \tau)\sin\omega_0(t + \tau)] \times [a(t)\cos\omega_0t - b(t)\sin\omega_0t]$$
Multiplying, taking expected values, and using appropriate trigonometric
identities, we obtain
$$2E\{x(t + \tau)x(t)\} = [R_{aa}(\tau) + R_{bb}(\tau)]\cos\omega_0\tau + [R_{ab}(\tau) - R_{ba}(\tau)]\sin\omega_0\tau$$
$$\qquad + [R_{aa}(\tau) - R_{bb}(\tau)]\cos\omega_0(2t + \tau) - [R_{ab}(\tau) + R_{ba}(\tau)]\sin\omega_0(2t + \tau) \tag{11-64}$$
If (11-63) is true, then the above yields
$$R_{xx}(\tau) = R_{aa}(\tau)\cos\omega_0\tau + R_{ab}(\tau)\sin\omega_0\tau \tag{11-65}$$
Conversely, if $x(t)$ is WSS, then the terms in (11-64) that contain $\omega_0(2t + \tau)$ must be
independent of $t$. This is possible only if (11-63) is true.
We introduce the "dual" process
$$y(t) = b(t)\cos\omega_0t + a(t)\sin\omega_0t \tag{11-66}$$
This process is also WSS and
$$R_{yy}(\tau) = R_{xx}(\tau) \qquad R_{xy}(\tau) = -R_{yx}(\tau) \tag{11-67}$$
$$R_{xy}(\tau) = R_{ab}(\tau)\cos\omega_0\tau - R_{aa}(\tau)\sin\omega_0\tau \tag{11-68}$$
The above follows from (11-64) if we change one or both factors of the product
$x(t + \tau)x(t)$ with $y(t + \tau)$ or $y(t)$.
Complex representation. We introduce the processes
$$w(t) = a(t) + jb(t) = r(t)e^{j\varphi(t)} \qquad z(t) = x(t) + jy(t) = w(t)e^{j\omega_0t} \tag{11-69}$$
Thus
$$x(t) = \mathrm{Re}\,z(t) = \mathrm{Re}[w(t)e^{j\omega_0t}] \tag{11-70}$$
and
$$a(t) + jb(t) = w(t) = z(t)e^{-j\omega_0t}$$
This yields
$$a(t) = x(t)\cos\omega_0t + y(t)\sin\omega_0t \qquad b(t) = y(t)\cos\omega_0t - x(t)\sin\omega_0t \tag{11-71}$$
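The complex representation amounts to a rotation by $\omega_0t$ in the plane, so the components $a(t)$ and $b(t)$ can be recovered from $x(t)$ and its dual $y(t)$ by multiplying $z(t) = x(t) + jy(t)$ by $e^{-j\omega_0t}$. The following sketch checks this for single sample values; the function names and numbers are illustrative assumptions.

```python
import cmath
import math

def modulate(a, b, omega0, t):
    """Return (x, y): x = a cos(w0 t) - b sin(w0 t), y = b cos(w0 t) + a sin(w0 t)."""
    c, s = math.cos(omega0 * t), math.sin(omega0 * t)
    return a * c - b * s, b * c + a * s

def demodulate(x, y, omega0, t):
    """Recover (a, b) through w(t) = z(t) exp(-j w0 t) with z = x + j y."""
    w = complex(x, y) * cmath.exp(-1j * omega0 * t)
    return w.real, w.imag

if __name__ == "__main__":
    x, y = modulate(0.3, -1.2, 50.0, 0.37)
    print(demodulate(x, y, 50.0, 0.37))  # approximately (0.3, -1.2)
```

The real and imaginary parts of the product reproduce the two expressions $a = x\cos\omega_0t + y\sin\omega_0t$ and $b = y\cos\omega_0t - x\sin\omega_0t$.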
Correlations and spectra. The autocorrelation of the complex process $w(t)$
equals
$$R_{ww}(\tau) = E\{[a(t + \tau) + jb(t + \tau)][a(t) - jb(t)]\}$$
Expanding and using (11-63), we obtain
$$R_{ww}(\tau) = 2R_{aa}(\tau) - 2jR_{ab}(\tau) \tag{11-72}$$
Similarly,
$$R_{zz}(\tau) = 2R_{xx}(\tau) - 2jR_{xy}(\tau) \tag{11-73}$$
We note, further, that
$$R_{zz}(\tau) = e^{j\omega_0\tau}R_{ww}(\tau) \tag{11-74}$$
FIGURE 11-12
From the above it follows that
$$S_{ww}(\omega) = 2S_{aa}(\omega) - 2jS_{ab}(\omega) \qquad S_{zz}(\omega) = 2S_{xx}(\omega) - 2jS_{xy}(\omega) \tag{11-75}$$
$$S_{zz}(\omega) = S_{ww}(\omega - \omega_0) \tag{11-76}$$
The functions $S_{xx}(\omega)$ and $S_{zz}(\omega)$ are real and positive. Furthermore [see
(11-67)]
$$R_{xy}(-\tau) = -R_{yx}(-\tau) = -R_{xy}(\tau)$$
This leads to the conclusion that the function $-jS_{xy}(\omega) = B_{xy}(\omega)$ is real and
(Fig. 11-12a)
$$|B_{xy}(\omega)| \le S_{xx}(\omega) \qquad B_{xy}(-\omega) = -B_{xy}(\omega) \tag{11-77}$$
And since $S_{xx}(-\omega) = S_{xx}(\omega)$, we conclude from the second equation in (11-75)
that
$$4S_{xx}(\omega) = S_{zz}(\omega) + S_{zz}(-\omega) \qquad 4jS_{xy}(\omega) = S_{zz}(-\omega) - S_{zz}(\omega) \tag{11-78}$$
Single sideband. If $b(t) = \hat{a}(t)$ is the Hilbert transform of $a(t)$, then [see (10-147)] the constraint (11-63) is satisfied and the first equation in (11-75) yields
$$S_{ww}(\omega) = 4S_{aa}(\omega)U(\omega)$$
(Fig. 11-12b) because
$$S_{a\hat{a}}(\omega) = jS_{aa}(\omega)\operatorname{sgn}\omega$$
The resulting spectra are shown in Fig. 11-12b. We note, in particular, that $S_{xx}(\omega) = 0$ for $|\omega| < \omega_0$.
RICE'S REPRESENTATION. In (11-62) we assumed that the carrier frequency $\omega_0$ and the processes $a(t)$ and $b(t)$ were given. We now consider the converse problem: Given a WSS process $x(t)$ with zero mean, find a constant $\omega_0$ and two processes $a(t)$ and $b(t)$ such that $x(t)$ can be written in the form (11-62). To do so, it suffices to find the constant $\omega_0$ and the dual process $y(t)$ [see (11-71)]. This shows that the representation of $x(t)$ in the form (11-62) is not unique because not only is $\omega_0$ arbitrary, but the process $y(t)$ can also be chosen arbitrarily subject only to the constraint (11-67). The question then arises whether, among all possible representations of $x(t)$, there is one that is optimum. The answer depends, of course, on the optimality criterion. As we shall presently explain, if $y(t)$ equals the Hilbert transform $\hat{x}(t)$ of $x(t)$, then (11-62) is optimum in the sense of minimizing the average rate of variation of the envelope of $x(t)$.
Hilbert transforms. As we know [see (10-147)]
$$R_{\hat{x}\hat{x}}(\tau) = R_{xx}(\tau) \qquad R_{x\hat{x}}(\tau) = -R_{\hat{x}x}(\tau) \tag{11-79}$$
We can, therefore, use $\hat{x}(t)$ to form the processes
$$z(t) = x(t) + j\hat{x}(t) = w(t)e^{j\omega_0 t} \qquad w(t) = i(t) + jq(t) = z(t)e^{-j\omega_0 t} \tag{11-80}$$
as in (11-69) where now (Fig. 11-12c)
$$y(t) = \hat{x}(t) \qquad a(t) = i(t) \qquad b(t) = q(t)$$
Inserting into (11-62), we obtain
$$x(t) = i(t)\cos\omega_0 t - q(t)\sin\omega_0 t \tag{11-81}$$
This is known as Rice's representation. The process $i(t)$ is called the inphase component and the process $q(t)$ the quadrature component of $x(t)$. Their realization is shown in Fig. 11-13 [see (11-71)]. These processes depend not only on $x(t)$, but also on the choice of the carrier frequency $\omega_0$.
From (10-136) and (11-75) it follows that
$$S_{zz}(\omega) = 4S_{xx}(\omega)U(\omega) \tag{11-82}$$
Bandpass processes. A process $x(t)$ is called bandpass (Fig. 11-12c) if its spectrum $S_{xx}(\omega)$ is 0 outside an interval $(\omega_1, \omega_2)$. It is called narrowband or quasimonochromatic if its bandwidth $\omega_2 - \omega_1$ is small compared with the center frequency. It is called monochromatic if $S_{xx}(\omega)$ is an impulse function. The process $a\cos\omega_0 t + b\sin\omega_0 t$ is monochromatic.
The representations (11-62) or (11-81) hold for an arbitrary $x(t)$. However, they are useful mainly if $x(t)$ is bandpass. In this case, the complex envelope $w(t)$ and the processes $i(t)$ and $q(t)$ are low-pass because
$$S_{ww}(\omega) = S_{zz}(\omega + \omega_0)$$
We shall show that if the process x(r) is bandpass and <o, + шс < 2o>0, then the
inphase component $i(t)$ and the quadrature component $q(t)$ can be obtained as responses of the system of Fig. 11-14a where the LP filters are ideal with cutoff frequency $\omega_c$ such that
$$\omega_2 - \omega_0 < \omega_c \qquad \omega_1 - \omega_0 > -\omega_c \tag{11-84}$$
FIGURE 11-14
Proof. It suffices to show that (linearity) the response of the system of Fig. 11-14b equals $w(t)$. Clearly,
$$2x(t) = z(t) + z^*(t) \qquad w^*(t) = z^*(t)e^{j\omega_0 t}$$
Hence
$$2x(t)e^{-j\omega_0 t} = w(t) + w^*(t)e^{-j2\omega_0 t}$$
The spectra of the processes $w(t)$ and $w^*(t)e^{-j2\omega_0 t}$ equal $S_{ww}(\omega)$ and $S_{ww}(-\omega - 2\omega_0)$ respectively. Under the stated assumptions, the first is in the band of the LP filter $H(\omega)$ and the second outside the band. Therefore, the response of the filter equals $w(t)$.
We note, finally, that if $\omega_0 \le \omega_1$, then $S_{ww}(\omega) = 0$ for $\omega < 0$. In this case,
q(U is the Hilbert transform of i(r). Since ш2 - s 2<u0, this is possible only
if <у2 Зсор In Fig. ll-14c, we show the corresponding spectra for ю0 =
Optimum envelope. We are given an arbitrary process $x(t)$ and we wish to determine a constant $\omega_0$ and a process $y(t)$ so that, in the resulting representation (11-62), the complex envelope $w(t)$ of $x(t)$ is smooth in the sense of minimizing $E\{|w'(t)|^2\}$. As we know, the power spectrum of $w'(t)$ equals
$$\omega^2 S_{ww}(\omega) = \omega^2 S_{zz}(\omega + \omega_0)$$
Our problem, therefore, is to minimize the integral†
$$M = 2\pi E\{|w'(t)|^2\} = \int_{-\infty}^{\infty} (\omega - \omega_0)^2 S_{zz}(\omega)\,d\omega \tag{11-85}$$
subject to the constraint that $S_{xx}(\omega)$ is specified.
THEOREM. Rice's representation (11-81) is optimum and the optimum carrier frequency $\omega_0$ is the center of gravity $\bar{\omega}_0$ of $S_{xx}(\omega)U(\omega)$.
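Proof. Suppose, first, that $S_{zz}(\omega)$ is specified. In this case, $M$ depends only on $\omega_0$. Differentiating the right side of (11-85) with respect to $\omega_0$, we conclude that $M$ is minimum if $\omega_0$ equals the center of gravity
$$\bar{\omega}_0 = \frac{\int_{-\infty}^{\infty} \omega S_{zz}(\omega)\,d\omega}{\int_{-\infty}^{\infty} S_{zz}(\omega)\,d\omega} = \frac{\int_{-\infty}^{\infty} \omega B_{xy}(\omega)\,d\omega}{\int_{-\infty}^{\infty} S_{xx}(\omega)\,d\omega} \tag{11-86}$$
of $S_{zz}(\omega)$. The second equality above follows from (11-75) and (11-77). Inserting (11-86) into (11-85), we obtain
$$M = \int_{-\infty}^{\infty} (\omega^2 - \bar{\omega}_0^2) S_{zz}(\omega)\,d\omega = 2\int_{-\infty}^{\infty} (\omega^2 - \bar{\omega}_0^2) S_{xx}(\omega)\,d\omega \tag{11-87}$$
We wish now to choose $S_{zz}(\omega)$ so as to minimize $M$. Since $S_{xx}(\omega)$ is given, $M$ is minimum if $\bar{\omega}_0$ is maximum. As we see from (11-86), this is the case if $|B_{xy}(\omega)| = S_{xx}(\omega)$ because $|B_{xy}(\omega)| \le S_{xx}(\omega)$. We thus conclude that $-jS_{xy}(\omega) = S_{xx}(\omega)\operatorname{sgn}\omega$ and (11-75) yields
$$S_{zz}(\omega) = 4S_{xx}(\omega)U(\omega)$$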
Instantaneous frequency. With $\varphi(t)$ as in (11-62), the process
$$\omega_i(t) = \omega_0 + \varphi'(t) \tag{11-88}$$
is called the instantaneous frequency of $x(t)$. Since
$$z = re^{j(\omega_0 t + \varphi)} = x + jy$$
we have
$$z'z^* = rr' + jr^2\omega_i = (x' + jy')(x - jy) \tag{11-89}$$
†L. Mandel: "Complex Representation of Optical Fields in Coherence Theory," Journal of the Optical Society of America, vol. 57, 1967. See also N. M. Blachman: Noise and Its Effect on Communication, Krieger Publishing Company, Malabar, FL, 1982.
This yields $E\{rr'\} = 0$ and
$$E\{r^2\omega_i\} = \frac{1}{2\pi}\int_{-\infty}^{\infty} \omega S_{zz}(\omega)\,d\omega \tag{11-90}$$
because the cross-power spectrum of $z'$ and $z$ equals $j\omega S_{zz}(\omega)$.
The instantaneous frequency of a process $x(t)$ is not a uniquely defined process because the dual process $y(t)$ is not unique. In Rice's representation $y = \hat{x}$, hence
$$\omega_i = \frac{x\hat{x}' - x'\hat{x}}{r^2} \qquad r^2 = x^2 + \hat{x}^2 \tag{11-91}$$
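A quick sanity check of (11-91), offered as a sketch rather than a general algorithm: for the sinusoid $x(t) = \cos\omega t$ with $\omega > 0$, the Hilbert transform is $\sin\omega t$, so the formula should return $\omega$ at every instant. The function name and the test frequency are assumptions.

```python
import math

def instantaneous_frequency(x, xh, dx, dxh):
    """(11-91): w_i = (x * xh' - x' * xh) / (x^2 + xh^2)."""
    return (x * dxh - dx * xh) / (x * x + xh * xh)

w = 3.0  # rad/s, illustrative
values = []
for t in (0.0, 0.4, 1.3):
    x, xh = math.cos(w * t), math.sin(w * t)             # xh is the Hilbert transform of x
    dx, dxh = -w * math.sin(w * t), w * math.cos(w * t)  # their time derivatives
    values.append(instantaneous_frequency(x, xh, dx, dxh))
```

Each entry of `values` equals `w` up to rounding, as expected for a monochromatic signal.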
In this case [see (11-82) and (11-86)] the optimum carrier frequency $\bar{\omega}_0$ equals the weighted average of $\omega_i$:
$$\bar{\omega}_0 = \frac{E\{r^2\omega_i\}}{E\{r^2\}}$$
Frequency Modulation
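The process
$$x(t) = \cos[\omega_0 t + \lambda\varphi(t) + \varphi_0] \qquad \varphi(t) = \int_0^t c(\alpha)\,d\alpha \tag{11-92}$$
is FM with instantaneous frequency $\omega_0 + \lambda c(t)$ and modulation index $\lambda$. The corresponding complex processes equal
$$w(t) = e^{j\lambda\varphi(t)} \qquad z(t) = w(t)e^{j(\omega_0 t + \varphi_0)} \tag{11-93}$$
We shall study their spectral properties.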
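THEOREM. If the process $c(t)$ is SSS and the RV $\varphi_0$ is independent of $c(t)$ and such that
$$E\{e^{j\varphi_0}\} = E\{e^{j2\varphi_0}\} = 0 \tag{11-94}$$
then the process $x(t)$ is WSS with zero mean. Furthermore,
$$R_{xx}(\tau) = \tfrac{1}{2}\operatorname{Re} R_{zz}(\tau) \qquad R_{ww}(\tau) = E\{w(\tau)\} \tag{11-95}$$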
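Proof. From (11-94) it follows that $E\{x(t)\} = 0$ because
$$E\{z(t)\} = E\{e^{j[\omega_0 t + \lambda\varphi(t)]}\}E\{e^{j\varphi_0}\} = 0$$
Furthermore,
$$E\{z(t+\tau)z(t)\} = E\{e^{j[\omega_0(2t+\tau) + \lambda\varphi(t+\tau) + \lambda\varphi(t)]}\}E\{e^{j2\varphi_0}\} = 0$$
$$E\{z(t+\tau)z^*(t)\} = e^{j\omega_0\tau}E\left\{\exp\left[j\lambda\int_t^{t+\tau} c(\alpha)\,d\alpha\right]\right\} = e^{j\omega_0\tau}E\{w(\tau)\}$$
The last equality is a consequence of the stationarity of the process $c(t)$. Since $2x(t) = z(t) + z^*(t)$, we conclude from the above that
$$4E\{x(t+\tau)x(t)\} = R_{zz}(\tau) + R_{zz}(-\tau)$$
and (11-95) results because $R_{zz}(-\tau) = R_{zz}^*(\tau)$.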
Definitions. A process $x(t)$ is phase modulated if the statistics of $\varphi(t)$ are known. In this case, its autocorrelation can simply be found because
$$E\{w(t)\} = E\{e^{j\lambda\varphi(t)}\} = \Phi_\varphi(\lambda, t) \tag{11-96}$$
where $\Phi_\varphi(\lambda, t)$ is the characteristic function of $\varphi(t)$.
A process $x(t)$ is frequency modulated if the statistics of $c(t)$ are known. To determine $\Phi_\varphi(\lambda, t)$, we must now find the statistics of the integral of $c(t)$. However, in general this is not simple. The normal case is an exception because then $\Phi_\varphi(\lambda, t)$ can be expressed in terms of the mean and variance of $\varphi(t)$ and, as we know [see (10-143)]
$$E\{\varphi(t)\} = \int_0^t E\{c(\alpha)\}\,d\alpha = \eta_c t \qquad E\{\varphi^2(t)\} = 2\int_0^t R_c(\alpha)(t - \alpha)\,d\alpha \tag{11-97}$$
For the determination of the power spectrum $S_{xx}(\omega)$ of $x(t)$, we must find the function $\Phi_\varphi(\lambda, t)$ and its Fourier transform. In general, this is difficult. However, as the next theorem shows, if $\lambda$ is large, then $S_{xx}(\omega)$ can be expressed directly in terms of the density $f_c(c)$ of $c(t)$.
WOODWARD'S THEOREM.† If the process $c(t)$ is continuous and its density $f_c(c)$ is bounded, then for large $\lambda$:
$$S_{xx}(\omega) \simeq \frac{\pi}{2\lambda}\left[f_c\!\left(\frac{\omega - \omega_0}{\lambda}\right) + f_c\!\left(\frac{-\omega - \omega_0}{\lambda}\right)\right] \tag{11-98}$$
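Proof. If $\tau_0$ is sufficiently small, then $c(t) \simeq c(0)$, and
$$\varphi(t) = \int_0^t c(\alpha)\,d\alpha \simeq c(0)t \qquad |t| < \tau_0 \tag{11-99}$$
Inserting into (11-96), we obtain
$$E\{w(\tau)\} \simeq E\{e^{j\lambda\tau c(0)}\} = \Phi_c(\lambda\tau) \qquad |\tau| < \tau_0 \tag{11-100}$$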
†P. M. Woodward: "The Spectrum of Random Frequency Modulation," Telecommunications Research, Great Malvern, Worcs., England, Memo 666, 1952.
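where
$$\Phi_c(\mu) = E\{e^{j\mu c(t)}\}$$
is the characteristic function of $c(t)$. From this and (11-95) it follows that
$$R_{zz}(\tau) \simeq \Phi_c(\lambda\tau)e^{j\omega_0\tau} \qquad |\tau| < \tau_0 \tag{11-101}$$
If $\lambda$ is sufficiently large, then $\Phi_c(\lambda\tau) \simeq 0$ for $|\tau| > \tau_0$ because $\Phi_c(\mu) \to 0$ as $\mu \to \infty$. Hence (11-101) is a satisfactory approximation for every $\tau$ in the region where $\Phi_c(\lambda\tau)$ takes significant values. Transforming both sides of (11-101) and using the inversion formula
$$f_c(c) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Phi_c(\mu)e^{-j\mu c}\,d\mu$$
we obtain
$$S_{zz}(\omega) = \int_{-\infty}^{\infty} \Phi_c(\lambda\tau)e^{j\omega_0\tau}e^{-j\omega\tau}\,d\tau = \frac{2\pi}{\lambda}f_c\!\left(\frac{\omega - \omega_0}{\lambda}\right)$$
and (11-98) follows from (11-78).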
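The approximation (11-98) can be sanity-checked through its total power: since $R_{xx}(0) = \tfrac{1}{2}$ for the process (11-92) under (11-94), the area of $S_{xx}(\omega)$ divided by $2\pi$ should be close to 0.5. The sketch below assumes a zero-mean normal density for $c(t)$; the function names, the parameter values, and the integration range are illustrative choices.

```python
import math

def gaussian_density(c, var):
    return math.exp(-c * c / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def woodward_sxx(w, w0, lam, density):
    """Approximation (11-98) to S_xx(w) for large modulation index lam."""
    return (math.pi / (2.0 * lam)) * (density((w - w0) / lam) + density((-w - w0) / lam))

w0, lam, var = 50.0, 20.0, 1.0       # illustrative carrier, index, and variance of c(t)
f_c = lambda c: gaussian_density(c, var)

# Midpoint Riemann sum of S_xx over a range wide enough to hold both spectral lobes.
dw = 0.01
n = int(2 * 400.0 / dw)
area = sum(woodward_sxx(-400.0 + (k + 0.5) * dw, w0, lam, f_c) for k in range(n)) * dw
power = area / (2.0 * math.pi)       # R_xx(0) implied by the approximation
```

The two lobes are scaled copies of the density of $c(t)$ centered at $\pm\omega_0$, which is the content of the theorem.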
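NORMAL PROCESSES. Suppose now that $c(t)$ is normal with zero mean. In this case $\varphi(t)$ is also normal with zero mean. Hence [see (11-97)]
$$\Phi_\varphi(\lambda, \tau) = \exp\{-\tfrac{1}{2}\lambda^2\sigma_\varphi^2(\tau)\} \qquad \sigma_\varphi^2(\tau) = 2\int_0^\tau R_c(\alpha)(\tau - \alpha)\,d\alpha \tag{11-102}$$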
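In general, the Fourier transform of $\Phi_\varphi(\lambda, \tau)$ is found only numerically. However, as we show next, explicit formulas can be obtained if $\lambda$ is large or small. We introduce the "correlation time" $\tau_c$ of $c(t)$:
$$\tau_c = \frac{1}{\rho}\int_0^\infty R_c(\alpha)\,d\alpha \qquad \rho = R_c(0) \tag{11-103}$$
and we select two constants $\tau_0$ and $\tau_1$ such that
$$R_c(\tau) \simeq \begin{cases} \rho & |\tau| < \tau_0 \\ 0 & |\tau| > \tau_1 \end{cases}$$
Inserting into (11-102), we obtain (Fig. 11-15)
$$\sigma_\varphi^2(\tau) \simeq \begin{cases} \rho\tau^2 & |\tau| < \tau_0 \\ 2\rho\tau\tau_c & \tau > \tau_1 \end{cases} \qquad R_{ww}(\tau) \simeq \begin{cases} e^{-\rho\lambda^2\tau^2/2} & |\tau| < \tau_0 \\ e^{-\rho\lambda^2\tau_c|\tau|} & |\tau| > \tau_1 \end{cases} \tag{11-104}$$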
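The two regimes of (11-104) can be seen numerically. The sketch below evaluates the integral in (11-102) for an assumed exponential autocorrelation $R_c(\alpha) = \rho e^{-|\alpha|/\tau_c}$ (chosen only because its correlation time is exactly $\tau_c$) and compares the result with $\rho\tau^2$ for small $\tau$ and with $2\rho\tau\tau_c$ for large $\tau$.

```python
import math

def phase_variance(tau, r_c, steps=200000):
    """Midpoint-rule evaluation of (11-102): 2 * integral_0^tau R_c(a) (tau - a) da."""
    h = tau / steps
    return 2.0 * h * sum(r_c((k + 0.5) * h) * (tau - (k + 0.5) * h) for k in range(steps))

rho, tau_c = 2.0, 1.0                               # R_c(0) and correlation time
r_c = lambda a: rho * math.exp(-abs(a) / tau_c)     # assumed autocorrelation of c(t)

small, large = 0.01, 1000.0
ratio_small = phase_variance(small, r_c) / (rho * small ** 2)
ratio_large = phase_variance(large, r_c) / (2.0 * rho * large * tau_c)
```

Both ratios are close to 1, which is what the upper and lower lines of (11-104) assert.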
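It is known from the asymptotic properties of Fourier transforms that the behavior of $R_{ww}(\tau)$ for small (large) $\tau$ determines the behavior of $S_{ww}(\omega)$ for large (small) $\omega$. Since
$$e^{-\rho\lambda^2\tau^2/2} \leftrightarrow \frac{1}{\lambda}\sqrt{\frac{2\pi}{\rho}}\,e^{-\omega^2/2\rho\lambda^2} \qquad e^{-\rho\lambda^2\tau_c|\tau|} \leftrightarrow \frac{2\rho\tau_c\lambda^2}{\omega^2 + \rho^2\tau_c^2\lambda^4} \tag{11-105}$$
we conclude that $S_{ww}(\omega)$ is Lorentzian near the origin and it is asymptotically normal as $\omega \to \infty$. As we show next, these limiting cases give an adequate description of $S_{ww}(\omega)$ for large or small $\lambda$.
FIGURE 11-15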
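Wideband FM. If $\lambda$ is such that
$$\rho\lambda^2\tau_0^2 \gg 1$$
then $R_{ww}(\tau) \simeq 0$ for $|\tau| > \tau_0$. This shows that we can use the upper approximation in (11-104) for every significant value of $\tau$. The resulting spectrum equals
$$S_{ww}(\omega) \simeq \frac{1}{\lambda}\sqrt{\frac{2\pi}{\rho}}\,e^{-\omega^2/2\rho\lambda^2} = \frac{2\pi}{\lambda}f_c\!\left(\frac{\omega}{\lambda}\right) \tag{11-106}$$
in agreement with Woodward's theorem. The last equality in (11-106) follows because $c(t)$ is normal with variance $E\{c^2(t)\} = \rho$.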
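Narrowband FM. If $\lambda$ is such that
$$\rho\lambda^2\tau_1\tau_c \ll 1$$
then $R_{ww}(\tau) \simeq 1$ for $|\tau| < \tau_1$. This shows that we can use the lower approximation in (11-104) for every significant value of $\tau$. Hence
$$S_{ww}(\omega) \simeq \frac{2\rho\tau_c\lambda^2}{\omega^2 + \rho^2\tau_c^2\lambda^4} \tag{11-107}$$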
11-4 CYCLOSTATIONARY PROCESSES†
A process $x(t)$ is called strict-sense cyclostationary (SSCS) with period $T$ if its statistical properties are invariant to a shift of the origin by integral multiples of $T$, or, equivalently, if
$$F(x_1, \ldots, x_n; t_1 + mT, \ldots, t_n + mT) = F(x_1, \ldots, x_n; t_1, \ldots, t_n) \tag{11-108}$$
for every integer $m$.
A process $x(t)$ is called wide-sense cyclostationary (WSCS) if
$$\eta(t + mT) = \eta(t) \qquad R(t_1 + mT, t_2 + mT) = R(t_1, t_2) \tag{11-109}$$
for every integer $m$.
It follows from the definition that if x(t) is SSCS, it is also WSCS. The
following theorems show the close connection between stationary and cyclosta-
tionary processes.
THEOREM 1. If $x(t)$ is an SSCS process and $\boldsymbol{\theta}$ is an RV uniform in the interval $(0, T)$ and independent of $x(t)$, then the process
$$\bar{x}(t) = x(t - \boldsymbol{\theta}) \tag{11-110}$$
obtained by a random shift of the origin is SSS and its $n$th-order distribution equals
$$\bar{F}(x_1, \ldots, x_n; t_1, \ldots, t_n) = \frac{1}{T}\int_0^T F(x_1, \ldots, x_n; t_1 - \alpha, \ldots, t_n - \alpha)\,d\alpha \tag{11-111}$$
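Proof. To prove the theorem, it suffices to show that the probability of the event
$$\mathcal{A} = \{\bar{x}(t_1 + c) \le x_1, \ldots, \bar{x}(t_n + c) \le x_n\}$$
is independent of $c$ and it equals the right side of (11-111). As we know [see (4-62)]
$$P(\mathcal{A}) = \frac{1}{T}\int_0^T P(\mathcal{A} \mid \boldsymbol{\theta} = \theta)\,d\theta \tag{11-112}$$
Furthermore,
$$P(\mathcal{A} \mid \boldsymbol{\theta} = \theta) = P\{x(t_1 + c - \theta) \le x_1, \ldots, x(t_n + c - \theta) \le x_n \mid \boldsymbol{\theta} = \theta\}$$
And since $\boldsymbol{\theta}$ is independent of $x(t)$, we conclude that
$$P\{\mathcal{A} \mid \boldsymbol{\theta} = \theta\} = F(x_1, \ldots, x_n; t_1 + c - \theta, \ldots, t_n + c - \theta)$$
Inserting into (11-112) and using (11-108), we obtain (11-111).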
†W. A. Gardner and L. E. Franks: "Characteristics of Cyclostationary Random Signal Processes," IEEE Transactions on Information Theory, vol. IT-21, 1975.
THEOREM 2. If $x(t)$ is a WSCS process, then the shifted process $\bar{x}(t)$ is WSS with mean
$$\bar{\eta} = \frac{1}{T}\int_0^T \eta(t)\,dt \tag{11-113}$$
and autocorrelation
$$\bar{R}(\tau) = \frac{1}{T}\int_0^T R(t + \tau, t)\,dt \tag{11-114}$$
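Proof. From (7-59) and the independence of $\boldsymbol{\theta}$ from $x(t)$, it follows that
$$E\{x(t - \boldsymbol{\theta})\} = E\{\eta(t - \boldsymbol{\theta})\} = \frac{1}{T}\int_0^T \eta(t - \theta)\,d\theta$$
and (11-113) results because $\eta(t)$ is periodic. Similarly,
$$E\{x(t + \tau - \boldsymbol{\theta})x(t - \boldsymbol{\theta})\} = E\{R(t + \tau - \boldsymbol{\theta}, t - \boldsymbol{\theta})\} = \frac{1}{T}\int_0^T R(t + \tau - \theta, t - \theta)\,d\theta$$
This yields (11-114) because $R(t + \tau, t)$ is a periodic function of $t$.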
Pulse-Amplitude Modulation (PAM)
An important example of a cyclostationary process is the random signal
$$x(t) = \sum_{n=-\infty}^{\infty} c_n h(t - nT) \tag{11-115}$$
where $h(t)$ is a given function with Fourier transform $H(\omega)$ and $c_n$ is a stationary sequence of RVs with autocorrelation $R_c[m] = E\{c_{n+m}c_n\}$ and power spectrum
$$S_c(e^{j\omega}) = \sum_{m=-\infty}^{\infty} R_c[m]e^{-jm\omega} \tag{11-116}$$
THEOREM. The power spectrum $\bar{S}_x(\omega)$ of the shifted process $\bar{x}(t)$ equals
$$\bar{S}_x(\omega) = \frac{1}{T}S_c(e^{j\omega T})|H(\omega)|^2 \tag{11-117}$$
Proof. We form the impulse train
$$z(t) = \sum_{n=-\infty}^{\infty} c_n\delta(t - nT) \tag{11-118}$$
Clearly, $z(t)$ is the derivative of the process $w(t)$ of Fig. 11-16:
$$w(t) = \sum_{n=-\infty}^{\infty} c_n U(t - nT) \qquad z(t) = w'(t) \tag{11-119}$$
FIGURE 11-16 Pulse-amplitude modulation
The process $w(t)$ is cyclostationary with autocorrelation
$$R_w(t_1, t_2) = \sum_n \sum_r R_c[n - r]U(t_1 - nT)U(t_2 - rT)$$
From (10-94) it follows that
$$R_z(t_1, t_2) = \frac{\partial^2 R_w(t_1, t_2)}{\partial t_1\,\partial t_2} = \sum_n \sum_r R_c[n - r]\delta(t_1 - nT)\delta(t_2 - rT)$$
This yields
$$R_z(t + \tau, t) = \sum_{m=-\infty}^{\infty} R_c[m] \sum_{r=-\infty}^{\infty} \delta[t + \tau - (m + r)T]\,\delta(t - rT) \tag{11-120}$$
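We shall find first the autocorrelation $\bar{R}_z(\tau)$ and the power spectrum $\bar{S}_z(\omega)$ of the shifted process $\bar{z}(t) = z(t - \boldsymbol{\theta})$. Inserting (11-120) into (11-114) and using the identity
$$\int_{-\infty}^{\infty} \delta[t + \tau - (m + r)T]\,\delta(t - rT)\,dt = \delta(\tau - mT)$$
we obtain
$$\bar{R}_z(\tau) = \frac{1}{T}\sum_{m=-\infty}^{\infty} R_c[m]\delta(\tau - mT) \tag{11-121}$$
From this it follows that
$$\bar{S}_z(\omega) = \frac{1}{T}\sum_{m=-\infty}^{\infty} R_c[m]e^{-jm\omega T} = \frac{1}{T}S_c(e^{j\omega T}) \tag{11-122}$$
The process $x(t)$ is the output of a linear system with input $z(t)$. Thus
$$x(t) = z(t) * h(t) \qquad \bar{x}(t) = \bar{z}(t) * h(t)$$
Hence [see (11-122) and (10-136)] the power spectrum of the shifted PAM process $\bar{x}(t)$ is given by (11-117).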
COROLLARY. If the process $c_n$ is white noise with $S_c(\omega) = q$, then
$$\bar{S}_x(\omega) = \frac{q}{T}|H(\omega)|^2 \qquad \bar{R}_x(\tau) = \frac{q}{T}h(\tau) * h(-\tau) \tag{11-123}$$
FIGURE 11-17
Example 11-4. Suppose that $h(t)$ is a pulse and $c_n$ is a white-noise process taking the values $\pm 1$ with equal probability:
$$h(t) = \begin{cases} 1 & 0 \le t < T \\ 0 & \text{otherwise} \end{cases} \qquad c_n = x(nT) \qquad R_c[m] = \delta[m]$$
The resulting process $x(t)$ is called binary transmission. It is SSCS, taking the values $\pm 1$ in every interval $(nT - T, nT)$; the shifted process $\bar{x}(t) = x(t - \boldsymbol{\theta})$ is stationary. From (11-117) it follows that
$$\bar{S}_x(\omega) = \frac{4\sin^2(\omega T/2)}{T\omega^2}$$
because $S_c(z) = 1$. Hence $\bar{R}_x(\tau)$ is a triangle as in Fig. 11-17.
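The triangular autocorrelation can also be obtained directly from the time average (11-114). In the sketch below, the unshifted autocorrelation of binary transmission is taken to be 1 when both instants fall in the same bit interval and 0 otherwise (this follows from $R_c[m] = \delta[m]$ and the rectangular pulse); the function names and the value of $T$ are assumptions.

```python
import math

def r_binary(t1, t2, T):
    """R(t1, t2) for binary transmission: 1 if both instants share a bit interval, else 0."""
    return 1.0 if math.floor(t1 / T) == math.floor(t2 / T) else 0.0

def r_shifted(tau, T, steps=10000):
    """Time average (11-114) of R(t + tau, t) over one period, by the midpoint rule."""
    h = T / steps
    return sum(r_binary((k + 0.5) * h + tau, (k + 0.5) * h, T) for k in range(steps)) / steps

T = 2.0
samples = {tau: r_shifted(tau, T) for tau in (0.0, 0.5, 1.0, 2.0, 3.0)}
```

The values follow $1 - |\tau|/T$ for $|\tau| < T$ and vanish beyond, which is the triangle whose transform is the spectrum given in the example.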
11-5 BANDLIMITED PROCESSES AND
SAMPLING THEORY
A process $x(t)$ is called bandlimited (abbreviated BL) if it has finite power and its spectrum vanishes for $|\omega| > \sigma$:
$$R(0) < \infty \qquad S(\omega) = 0 \qquad |\omega| > \sigma \tag{11-124}$$
In this section we establish various identities involving linear functionals of BL
processes. To do so, we express the two sides of each identity as responses of
linear systems. The underlying reasoning is based on the following:
THEOREM. Suppose that $w_1(t)$ and $w_2(t)$ are the responses of the systems $T_1(\omega)$ and $T_2(\omega)$ to a BL process $x(t)$ (Fig. 11-18). We shall show that if
$$T_1(\omega) = T_2(\omega) \quad \text{for} \quad |\omega| \le \sigma \tag{11-125}$$
then
$$w_1(t) = w_2(t) \tag{11-126}$$
Proof. The difference $w_1(t) - w_2(t)$ is the response of the system $T_1(\omega) - T_2(\omega)$ to the input $x(t)$. Since $S(\omega) = 0$ for $|\omega| > \sigma$, we conclude from (10-139) and (11-125) that
$$E\{|w_1(t) - w_2(t)|^2\} = \frac{1}{2\pi}\int_{-\sigma}^{\sigma} S(\omega)|T_1(\omega) - T_2(\omega)|^2\,d\omega = 0$$
Hence† $w_1(t) = w_2(t)$.
FIGURE 11-18
Taylor series. If $x(t)$ is BL, then [see (10-121)]
$$R(\tau) = \frac{1}{2\pi}\int_{-\sigma}^{\sigma} S(\omega)e^{j\omega\tau}\,d\omega \tag{11-127}$$
In the above, the limits of integration are finite and the area $2\pi R(0)$ of $S(\omega)$ is also finite. We can therefore differentiate under the integral sign
$$R^{(n)}(\tau) = \frac{1}{2\pi}\int_{-\sigma}^{\sigma} (j\omega)^n S(\omega)e^{j\omega\tau}\,d\omega \tag{11-128}$$
This shows that the autocorrelation of a BL process is an entire function; that is, it has derivatives of any order for every $\tau$. From this it follows that $x^{(n)}(t)$ exists for any $n$ (see App. 10A).
We maintain that
$$x(t + \tau) = \sum_{n=0}^{\infty} x^{(n)}(t)\frac{\tau^n}{n!} \tag{11-129}$$
Proof. We shall prove (11-129) using (11-126). As we know
$$e^{j\omega\tau} = \sum_{n=0}^{\infty} (j\omega)^n\frac{\tau^n}{n!} \qquad \text{all } \omega \tag{11-130}$$
The processes $x(t + \tau)$ and $x^{(n)}(t)$ are the responses of the systems $e^{j\omega\tau}$ and $(j\omega)^n$ respectively to the input $x(t)$. If, therefore, we use as systems $T_1(\omega)$ and $T_2(\omega)$ in (11-125) the two sides of (11-130), the resulting responses will equal the two sides of (11-129). And since (11-130) is true for all $\omega$, (11-129) follows from (11-126).
†All identities in this section are interpreted in the MS sense.
Bounds. Bandlimitedness is often associated with slow variation. The following is an analytical formulation of this association.
If $x(t)$ is BL, then
$$E\{[x(t + \tau) - x(t)]^2\} \le \sigma^2\tau^2 R(0) \tag{11-131}$$
or, equivalently,
$$2[R(0) - R(\tau)] \le \sigma^2\tau^2 R(0) \tag{11-132}$$
Proof. The familiar inequality $|\sin\varphi| \le |\varphi|$ yields
$$1 - \cos\omega\tau = 2\sin^2\frac{\omega\tau}{2} \le \frac{\omega^2\tau^2}{2}$$
Since $S(\omega) \ge 0$, it follows from the above and (10-122) that
$$R(0) - R(\tau) = \frac{1}{2\pi}\int_{-\sigma}^{\sigma} S(\omega)(1 - \cos\omega\tau)\,d\omega \le \frac{1}{2\pi}\int_{-\sigma}^{\sigma} S(\omega)\frac{\omega^2\tau^2}{2}\,d\omega \le \frac{\sigma^2\tau^2}{4\pi}\int_{-\sigma}^{\sigma} S(\omega)\,d\omega = \frac{\sigma^2\tau^2}{2}R(0)$$
as in (11-132).
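As a numerical illustration of the bound (11-132), the sketch below uses a BL process with a flat spectrum on $(-\sigma, \sigma)$, for which $R(\tau) = R(0)\sin\sigma\tau/\sigma\tau$. The flat spectrum, the value of $\sigma$, and the grid of lags are assumptions made for the example.

```python
import math

def r_flat(tau, sigma, r0=1.0):
    """Autocorrelation of a BL process with flat spectrum on (-sigma, sigma)."""
    return r0 if tau == 0 else r0 * math.sin(sigma * tau) / (sigma * tau)

sigma, r0 = 3.0, 1.0
taus = [0.01 * k for k in range(0, 500)]
lhs = [2.0 * (r0 - r_flat(tau, sigma, r0)) for tau in taus]      # E{[x(t+tau) - x(t)]^2}
rhs = [sigma ** 2 * tau ** 2 * r0 for tau in taus]               # bound in (11-132)
holds = all(l <= r + 1e-12 for l, r in zip(lhs, rhs))
```

For small $\tau$ the left side behaves like $\sigma^2\tau^2 R(0)/3$ in this case, so the bound holds with room to spare.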
Sampling Expansions
The sampling theorem for deterministic signals states that if $f(t) \leftrightarrow F(\omega)$ and $F(\omega) = 0$ for $|\omega| > \sigma$, then the function $f(t)$ can be expressed in terms of its samples $f(nT)$ where $T = \pi/\sigma$ is the Nyquist interval. The resulting expansion applied to the autocorrelation $R(\tau)$ of a BL process $x(t)$ takes the following form:
$$R(\tau) = \sum_{n=-\infty}^{\infty} R(nT)\frac{\sin\sigma(\tau - nT)}{\sigma(\tau - nT)} \tag{11-133}$$
We shall establish a similar expansion for the process $x(t)$.
THEOREM. If $x(t)$ is a BL process, then
$$x(t + \tau) = \sum_{n=-\infty}^{\infty} x(t + nT)\frac{\sin\sigma(\tau - nT)}{\sigma(\tau - nT)} \qquad T = \frac{\pi}{\sigma} \tag{11-134}$$
for every $t$ and $\tau$. This is a slight extension of (11-133). This extension will permit us to base the proof of the theorem on (11-126).
Proof. We consider the exponential $e^{j\omega\tau}$ as a function of $\omega$, viewing $\tau$ as a parameter, and we expand it into a Fourier series in the interval $(-\sigma \le \omega \le \sigma)$. The coefficients of this expansion equal
$$a_n = \frac{1}{2\sigma}\int_{-\sigma}^{\sigma} e^{j\omega\tau}e^{-jnT\omega}\,d\omega = \frac{\sin\sigma(\tau - nT)}{\sigma(\tau - nT)}$$
Hence
$$e^{j\omega\tau} = \sum_{n=-\infty}^{\infty} e^{jnT\omega}\frac{\sin\sigma(\tau - nT)}{\sigma(\tau - nT)} \qquad |\omega| < \sigma \tag{11-135}$$
We denote by $T_1(\omega)$ and $T_2(\omega)$ the left and right side respectively of (11-135). Clearly, $T_1(\omega)$ is a delay line and its response $w_1(t)$ to $x(t)$ equals $x(t + \tau)$. Similarly, the response $w_2(t)$ of $T_2(\omega)$ to $x(t)$ equals the right side of (11-134). Since $T_1(\omega) = T_2(\omega)$ for $|\omega| < \sigma$, (11-134) follows from (11-126).
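The expansion (11-135), on which the proof rests, can be checked with a truncated sum. This is only a sketch: the series converges slowly (its terms decay like $1/n$), so the number of terms and the tolerance used below are illustrative assumptions, as are the values of $\sigma$, $\omega$, and $\tau$.

```python
import cmath
import math

def sinc_kernel(u):
    return 1.0 if u == 0 else math.sin(u) / u

def truncated_series(w, tau, sigma, terms):
    """Partial sum of the right side of (11-135) over |n| <= terms, with T = pi / sigma."""
    T = math.pi / sigma
    return sum(cmath.exp(1j * n * T * w) * sinc_kernel(sigma * (tau - n * T))
               for n in range(-terms, terms + 1))

sigma, w, tau = 4.0, 1.2, 0.37          # illustrative values with |w| < sigma
approx = truncated_series(w, tau, sigma, terms=2000)
error = abs(approx - cmath.exp(1j * w * tau))
```

The error shrinks as more terms are kept, provided $|\omega|$ stays strictly below $\sigma$.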
Past samples. A deterministic BL signal is determined only if all its samples, past and future, are known. This is not necessary for random signals. We show next that a BL process $x(t)$ can be approximated arbitrarily closely by a sum involving only its past samples $x(nT_0)$ provided that $T_0 < T$. We illustrate first with an example.†
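Example 11-5. Consider the process
$$\hat{x}(t) = nx(t - T_0) - \binom{n}{2}x(t - 2T_0) + \cdots - (-1)^n x(t - nT_0) \tag{11-136}$$
The difference
$$y(t) = x(t) - \hat{x}(t) = \sum_{k=0}^{n} (-1)^k\binom{n}{k}x(t - kT_0)$$
is the response of the system
$$H(\omega) = \sum_{k=0}^{n} (-1)^k\binom{n}{k}e^{-jkT_0\omega} = (1 - e^{-j\omega T_0})^n$$
with input $x(t)$. Since $|H(\omega)| = |2\sin(\omega T_0/2)|^n$, we conclude from (10-36) that
$$E\{y^2(t)\} = \frac{1}{2\pi}\int_{-\sigma}^{\sigma} S(\omega)\left|2\sin\frac{\omega T_0}{2}\right|^{2n} d\omega \tag{11-137}$$
If $T_0 < \pi/3\sigma$, then $|2\sin(\omega T_0/2)| < 2\sin(\pi/6) = 1$ for $|\omega| < \sigma$. From this it follows that the integrand in (11-137) tends to 0 as $n \to \infty$. Therefore, $E\{y^2(t)\} \to 0$ and
$$\hat{x}(t) \to x(t) \quad \text{as} \quad n \to \infty$$
Note that this holds only if $T_0 < T/3$; furthermore, the coefficients $\binom{n}{k}$ of $\hat{x}(t)$ tend to $\infty$ as $n \to \infty$.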
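The predictor (11-136) is easy to try on a single sinusoid inside the band, which stands in here for a BL signal (an assumption made only to keep the sketch deterministic). For such an input the prediction error is bounded by $|H(\omega)| = |2\sin(\omega T_0/2)|^n$; the function name and the parameter values are hypothetical.

```python
import math

def predict_from_past(x, t, T0, n):
    """Estimate (11-136): -sum_{k=1}^{n} (-1)^k C(n, k) x(t - k*T0)."""
    return -sum((-1) ** k * math.comb(n, k) * x(t - k * T0) for k in range(1, n + 1))

sigma = 1.0                      # band limit
T0 = 0.5                         # sample spacing, below pi / (3 * sigma)
w = 0.8                          # a test frequency inside the band
x = lambda t: math.cos(w * t)    # single sinusoid standing in for a BL signal

t = 2.3
errors = {n: abs(x(t) - predict_from_past(x, t, T0, n)) for n in (5, 10, 20)}
gain = {n: abs(2.0 * math.sin(w * T0 / 2.0)) ** n for n in (5, 10, 20)}  # |H(w)|
```

The error falls geometrically with $n$ while the binomial weights grow, which is the trade-off noted at the end of the example.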
†L. A. Wainstein and V. Zubakov: Extraction of Signals in Noise, Prentice-Hall, Englewood Cliffs, NJ, 1962.
FIGURE 11-19
We show next that $x(t)$ can be approximated arbitrarily closely by a sum involving only its past samples $x(t - kT_0)$ where $T_0$ is a number smaller than $T$ but otherwise arbitrary.
THEOREM. Given a number $T_0 < T$ and a constant $\varepsilon > 0$, we can find a set of coefficients $a_k$ such that
$$E\{|x(t) - \hat{x}(t)|^2\} < \varepsilon \qquad \hat{x}(t) = \sum_{k=1}^{n} a_k x(t - kT_0) \tag{11-138}$$
where $n$ is a sufficiently large constant.
Proof. The process $\hat{x}(t)$ is the response of the system
$$P(\omega) = \sum_{k=1}^{n} a_k e^{-jkT_0\omega} \tag{11-139}$$
with input $x(t)$. Hence
$$E\{|x(t) - \hat{x}(t)|^2\} = \frac{1}{2\pi}\int_{-\sigma}^{\sigma} S(\omega)|1 - P(\omega)|^2\,d\omega$$
It suffices, therefore, to find a sum of exponentials with positive exponents only, approximating 1 arbitrarily closely. This cannot be done for every $|\omega| < \sigma_0 = \pi/T_0$ because $P(\omega)$ is periodic with period $2\sigma_0$. We can show, however, that if $\sigma_0 > \sigma$, we can find $P(\omega)$ such that the difference $|1 - P(\omega)|$ can be made arbitrarily small for $|\omega| < \sigma$ as in Fig. 11-19. The proof follows from the Weierstrass approximation theorem and the Fejér-Riesz factorization theorem; the details, however, are not simple.†
Note that, as in Example 11-5, the coefficients $a_k$ tend to $\infty$ as $\varepsilon \to 0$. This is based on the fact that we cannot find a sum $P(\omega)$ of exponentials as in (11-139) such that $|1 - P(\omega)| = 0$ for every $\omega$ in an interval. This would violate the Paley-Wiener condition (12-9).
†A. Papoulis: "A Note on the Predictability of Band-Limited Processes," Proceedings of the IEEE, vol. 73, no. 8, 1985.
FIGURE 11-20
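THE PAPOULIS SAMPLING EXPANSION.† The sampling expansion holds only if $T \le \pi/\sigma$. The following theorem states that if we have access to the samples of the outputs $y_1(t), \ldots, y_N(t)$ of $N$ linear systems $H_1(\omega), \ldots, H_N(\omega)$ driven by $x(t)$ (Fig. 11-20), then we can increase the sampling interval from $\pi/\sigma$ to $N\pi/\sigma$.
We introduce the constants
$$c = \frac{2\sigma}{N} = \frac{2\pi}{\bar{T}} \qquad \bar{T} = NT \tag{11-140}$$
and the $N$ functions
$$P_1(\omega, \tau), \ldots, P_N(\omega, \tau)$$
defined as the solutions of the system
$$\begin{aligned} H_1(\omega)P_1(\omega, \tau) + \cdots + H_N(\omega)P_N(\omega, \tau) &= 1 \\ H_1(\omega + c)P_1(\omega, \tau) + \cdots + H_N(\omega + c)P_N(\omega, \tau) &= e^{jc\tau} \\ &\;\;\vdots \\ H_1(\omega + Nc - c)P_1(\omega, \tau) + \cdots + H_N(\omega + Nc - c)P_N(\omega, \tau) &= e^{j(N-1)c\tau} \end{aligned} \tag{11-141}$$
In the above, $\omega$ takes all values in the interval $(-\sigma, -\sigma + c)$ and $\tau$ is arbitrary.
We next form the $N$ functions
$$p_k(\tau) = \frac{1}{c}\int_{-\sigma}^{-\sigma + c} P_k(\omega, \tau)e^{j\omega\tau}\,d\omega \qquad 1 \le k \le N \tag{11-142}$$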
†A. Papoulis: "New Results in Sampling Theory," Hawaii Intern. Conf. System Sciences, January. (See also Papoulis, 1968, pp. 132-137.)
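THEOREM
$$x(t + \tau) = \sum_{n=-\infty}^{\infty} [y_1(t + n\bar{T})p_1(\tau - n\bar{T}) + \cdots + y_N(t + n\bar{T})p_N(\tau - n\bar{T})] \tag{11-143}$$
where $\bar{T} = NT = 2\pi/c$ as in (11-140).
Proof. The process $y_i(t + n\bar{T})$ is the response of the system $H_i(\omega)e^{jn\bar{T}\omega}$ to the input $x(t)$. Therefore, if we use as systems $T_1(\omega)$ and $T_2(\omega)$ in Fig. 11-18 the two sides of the identity
$$e^{j\omega\tau} = H_1(\omega)\sum_{n=-\infty}^{\infty} p_1(\tau - n\bar{T})e^{jn\omega\bar{T}} + \cdots + H_N(\omega)\sum_{n=-\infty}^{\infty} p_N(\tau - n\bar{T})e^{jn\omega\bar{T}} \tag{11-144}$$
the resulting responses will equal the two sides of (11-143). To prove (11-143), it suffices, therefore, to show that (11-144) is true for every $|\omega| \le \sigma$.
The coefficients $H_k(\omega + kc)$ of the system (11-141) are independent of $\tau$ and the right side consists of periodic functions of $\tau$ with period $\bar{T} = 2\pi/c$ because $e^{jkc\bar{T}} = 1$. Hence the solutions $P_k(\omega, \tau)$ are periodic
$$P_k(\omega, \tau - n\bar{T}) = P_k(\omega, \tau)$$
From the above and (11-142) it follows that
$$p_k(\tau - n\bar{T}) = \frac{1}{c}\int_{-\sigma}^{-\sigma + c} P_k(\omega, \tau)e^{j\omega(\tau - n\bar{T})}\,d\omega$$
This shows that if we expand the function $P_k(\omega, \tau)e^{j\omega\tau}$ into a Fourier series in the interval $(-\sigma, -\sigma + c)$, the coefficient of the expansion will equal $p_k(\tau - n\bar{T})$. Hence
$$P_k(\omega, \tau)e^{j\omega\tau} = \sum_{n=-\infty}^{\infty} p_k(\tau - n\bar{T})e^{jn\omega\bar{T}} \qquad -\sigma < \omega < -\sigma + c \tag{11-145}$$
Multiplying each of the equations in (11-141) by $e^{j\omega\tau}$ and using (11-145) and the identity
$$e^{jn(\omega + kc)\bar{T}} = e^{jn\omega\bar{T}}$$
we conclude that (11-144) is true for every $\omega$ in the interval $(-\sigma, \sigma)$.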
Random Sampling
We wish to estimate the Fourier transform $F(\omega)$ of a deterministic signal $f(t)$ in terms of a sum involving the samples of $f(t)$. If we approximate the integral of $f(t)e^{-j\omega t}$ by its Riemann sum, we obtain the estimate
$$F(\omega) \simeq F_*(\omega) = \sum_{n=-\infty}^{\infty} Tf(nT)e^{-jn\omega T} \tag{11-146}$$
From the Poisson sum formula (11A-1), it follows that $F_*(\omega)$ equals the sum of $F(\omega)$ and its displacements
$$F_*(\omega) = \sum_{n=-\infty}^{\infty} F(\omega + 2n\sigma) \qquad \sigma = \frac{\pi}{T}$$
Hence $F_*(\omega)$ can be used as the estimate of $F(\omega)$ in the interval $(-\sigma, \sigma)$ only if $F(\omega)$ is negligible outside this interval. The difference $F(\omega) - F_*(\omega)$ is called aliasing error. In the following, we replace in (11-146) the equidistant samples $f(nT)$ of $f(t)$ by its samples $f(t_i)$ at a random set of points $t_i$ and we examine the nature of the resulting error.†
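We maintain that if $t_i$ is a Poisson point process with average density $\lambda$, then the sum
$$P(\omega) = \frac{1}{\lambda}\sum_i f(t_i)e^{-j\omega t_i} \tag{11-147}$$
is an unbiased estimate of $F(\omega)$. Furthermore, if the energy
$$E = \int_{-\infty}^{\infty} f^2(t)\,dt$$
of $f(t)$ is finite, then $P(\omega) \to F(\omega)$ as $\lambda \to \infty$. To prove the above, it suffices to show that
$$E\{P(\omega)\} = F(\omega) \qquad \sigma^2_{P(\omega)} = \frac{E}{\lambda} \tag{11-148}$$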
Л
Proof. Clearly,
Г/(/)е-^ЕЗ(/ - t,)dt = ЕЛ‘/)^‘Уш‘'	(11-149)
-«	j	j
Comparing with (11-147), we obtain
Р(ю) = — [ y(/)z(f)e“'"'dt where z(f) = E5(z _ Q (11-150)
A t — a>	l
is as Poisson impulse train as in (10-98) with
E{z(t)} = A C2(q,t2) = АЗ(Г, - G)	(11-151)
†E. Masry: "Poisson Sampling and Spectral Estimation of Continuous-Time Processes," IEEE Transactions on Information Theory, vol. IT-24, 1978. See also F. J. Beutler: "Alias Free Randomly Timed Sampling of Stochastic Processes," IEEE Transactions on Information Theory, vol. IT-16, 1970.
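Hence
$$E\{P(\omega)\} = \frac{1}{\lambda}\int_{-\infty}^{\infty} f(t)E\{z(t)\}e^{-j\omega t}\,dt = F(\omega)$$
$$\sigma^2_{P(\omega)} = \frac{1}{\lambda^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t_1)f(t_2)e^{-j\omega(t_1 - t_2)}\lambda\delta(t_1 - t_2)\,dt_1\,dt_2 = \frac{1}{\lambda}\int_{-\infty}^{\infty} f^2(t_2)\,dt_2$$
and (11-148) results.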
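The estimate (11-147) can be tried on a signal with a known transform. The sketch below is a single seeded simulation, not a proof: it draws Poisson points on a finite interval (an approximation, since the test signal is negligible outside it), uses the assumed signal $f(t) = e^{-t^2/2}$ with $F(0) = \sqrt{2\pi}$ and $E = \sqrt{\pi}$, and compares the estimate at $\omega = 0$ with the standard deviation predicted by (11-148).

```python
import math
import random

def poisson_points(lam, a, rng):
    """Poisson points of average density lam on (-a, a), built from exponential gaps."""
    points, t = [], -a + rng.expovariate(lam)
    while t < a:
        points.append(t)
        t += rng.expovariate(lam)
    return points

def estimate_F(f, w, lam, points):
    """Estimate (11-147): (1/lam) * sum_i f(t_i) exp(-j w t_i)."""
    return sum(f(ti) * complex(math.cos(w * ti), -math.sin(w * ti)) for ti in points) / lam

f = lambda t: math.exp(-t * t / 2.0)   # test signal; F(0) = sqrt(2*pi), energy E = sqrt(pi)
lam = 200.0
rng = random.Random(0)                 # fixed seed so the run is repeatable
P0 = estimate_F(f, 0.0, lam, poisson_points(lam, 8.0, rng))
std = math.sqrt(math.sqrt(math.pi) / lam)   # sqrt(E / lam) from (11-148)
```

With these values `std` is roughly 0.09, so `P0` is expected to land within a few tenths of $\sqrt{2\pi} \approx 2.51$; raising `lam` tightens the estimate.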
From (11-148) it follows that, for a satisfactory estimate of $F(\omega)$, $\lambda$ must be such that
$$|F(\omega)| \gg \sqrt{\frac{E}{\lambda}} \tag{11-152}$$
Example 11-6. Suppose that $f(t)$ is a sum of sine waves in the interval $(-a, a)$:
$$f(t) = \sum_k c_k e^{j\omega_k t} \qquad |t| < a$$
and it equals 0 for $|t| > a$. In this case,
$$F(\omega) = \sum_k 2c_k\frac{\sin a(\omega - \omega_k)}{\omega - \omega_k} \qquad E \simeq 2a\sum_k |c_k|^2 \tag{11-153}$$
where we neglected cross-products in the evaluation of $E$. If $a$ is sufficiently large, then
$$F(\omega_k) \simeq 2ac_k$$
This shows that if
$$\sum_i |c_i|^2 \ll 2a\lambda|c_k|^2 \quad \text{then} \quad P(\omega_k) \simeq F(\omega_k)$$
Thus with random sampling we can detect line spectra of any frequency even if the average rate $\lambda$ is small, provided that the observation interval $2a$ is large.
11-6 DETERMINISTIC SIGNALS IN NOISE
A central problem in the applications of stochastic processes is the estimation of a signal in the presence of noise. This problem has many aspects (see Chap. 14). In the following, we discuss two cases that lead to simple solutions. In both cases the signal is a deterministic function $f(t)$ and the noise is a random process $\nu(t)$ with zero mean.
The Matched Filter Principle
The following problem is typical in radar: A signal of known form is reflected from a distant target. The received signal is a sum
$$x(t) = f(t) + \nu(t) \qquad E\{\nu(t)\} = 0$$
where $f(t)$ is a shifted and scaled version of the transmitted signal and $\nu(t)$ is a WSS process with known power spectrum $S(\omega)$. We assume that $f(t)$ is known and we wish to establish its presence and location. To do so, we apply the process $x(t)$ to a linear filter with impulse response $h(t)$ and system function $H(\omega)$. The resulting output $y(t) = x(t) * h(t)$ is a sum
$$y(t) = \int_{-\infty}^{\infty} x(t - \alpha)h(\alpha)\,d\alpha = y_f(t) + y_\nu(t) \tag{11-154}$$
where
$$y_f(t) = \int_{-\infty}^{\infty} f(t - \alpha)h(\alpha)\,d\alpha = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega)H(\omega)e^{j\omega t}\,d\omega \tag{11-155}$$
is the response due to the signal $f(t)$, and $y_\nu(t)$ is a random component with average power
$$E\{y_\nu^2(t)\} = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)|H(\omega)|^2\,d\omega \tag{11-156}$$
Since $y_\nu(t)$ is due to $\nu(t)$ and $E\{\nu(t)\} = 0$, we conclude that $E\{y_\nu(t)\} = 0$ and $E\{y(t)\} = y_f(t)$. Our objective is to find $H(\omega)$ so as to maximize the signal-to-noise ratio
$$r = \frac{|y_f(t_0)|}{\sqrt{E\{y_\nu^2(t_0)\}}} \tag{11-157}$$
at a specific time $t_0$.
White noise. Suppose, first, that $S(\omega) = S_0$. Applying Schwarz's inequality (11B-1) to the second integral in (11-155), we conclude that
$$r^2 \le \frac{\int |F(\omega)e^{j\omega t_0}|^2\,d\omega \int |H(\omega)|^2\,d\omega}{2\pi S_0\int |H(\omega)|^2\,d\omega} = \frac{E_f}{S_0} \tag{11-158}$$
where $E_f = (1/2\pi)\int |F(\omega)|^2\,d\omega$ is the energy of $f(t)$. The above is an equality if [see (11B-2)]
$$H(\omega) = kF^*(\omega)e^{-j\omega t_0} \qquad h(t) = kf(t_0 - t) \tag{11-159}$$
This determines the optimum $H(\omega)$ within a constant factor $k$. The system so obtained is called the matched filter. The resulting signal-to-noise ratio is maximum and it equals $\sqrt{E_f/S_0}$.
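The same argument carries over to sampled data, which makes it easy to check. The sketch below is a discrete-time analog and an assumption of this example, not the continuous-time setting of the text: sums replace integrals, `s0` is the noise variance per sample, and the signal samples are illustrative.

```python
def snr_squared(h, f, t0, s0):
    """Discrete-time analog of r^2: (sum_k h[k] f[t0 - k])^2 / (s0 * sum_k h[k]^2)."""
    signal = sum(h[k] * f[t0 - k] for k in range(len(h)) if 0 <= t0 - k < len(f))
    noise = s0 * sum(v * v for v in h)
    return signal * signal / noise

f = [0.0, 1.0, 3.0, -2.0, 0.5]          # known signal samples (illustrative)
s0 = 0.25                               # white-noise variance per sample
t0 = len(f) - 1                         # decision instant

matched = [f[t0 - k] for k in range(len(f))]      # h[k] = f[t0 - k], cf. (11-159)
other = [1.0] * len(f)                            # a plain averaging filter, for contrast

best = snr_squared(matched, f, t0, s0)
worse = snr_squared(other, f, t0, s0)
bound = sum(v * v for v in f) / s0                # energy / S0, cf. (11-158)
```

The matched filter attains the bound exactly, while the averaging filter falls well short of it.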
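Colored noise. The solution is not so simple if $S(\omega)$ is not a constant. In this case, we use a trick. We first multiply and divide the integrand of (11-155) by $\sqrt{S(\omega)}$ and then apply Schwarz's inequality. This yields
$$|2\pi y_f(t_0)|^2 = \left|\int \frac{F(\omega)}{\sqrt{S(\omega)}}\sqrt{S(\omega)}\,H(\omega)e^{j\omega t_0}\,d\omega\right|^2 \le \int \frac{|F(\omega)|^2}{S(\omega)}\,d\omega \int S(\omega)|H(\omega)|^2\,d\omega$$
Inserting into (11-157), we obtain
$$r^2 \le \frac{\int \frac{|F(\omega)|^2}{S(\omega)}\,d\omega \int S(\omega)|H(\omega)|^2\,d\omega}{2\pi\int S(\omega)|H(\omega)|^2\,d\omega} = \frac{1}{2\pi}\int \frac{|F(\omega)|^2}{S(\omega)}\,d\omega$$
Equality holds if
$$\sqrt{S(\omega)}\,H(\omega) = k\frac{F^*(\omega)e^{-j\omega t_0}}{\sqrt{S(\omega)}}$$
Thus the signal-to-noise ratio is maximum if
$$H(\omega) = k\frac{F^*(\omega)}{S(\omega)}e^{-j\omega t_0} \tag{11-160}$$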
Tapped delay line. The matched filter is in general noncausal and difficult to realize. A suboptimal but simpler solution results if $H(\omega)$ is a tapped delay line:
$$H(\omega) = a_0 + a_1 e^{-j\omega T} + \cdots + a_m e^{-jm\omega T} \tag{11-161}$$
In this case,
$$y_f(t_0) = \sum_{i=0}^{m} a_i f(t_0 - iT) \qquad y_\nu(t) = \sum_{i=0}^{m} a_i\nu(t - iT) \tag{11-162}$$
and our problem is to find the $m + 1$ constants $a_i$ so as to maximize the resulting signal-to-noise ratio. It can be shown that (see Prob. 11-28) the unknown constants are the solutions of the system
$$\sum_{i=0}^{m} a_i R(nT - iT) = kf(t_0 - nT) \qquad n = 0, \ldots, m \tag{11-163}$$
where $R(\tau)$ is the autocorrelation of $\nu(t)$ and $k$ is an arbitrary constant.
Smoothing
We wish to estimate an unknown signal $f(t)$ in terms of the observed value of the sum $x(t) = f(t) + \nu(t)$. We assume that the noise $\nu(t)$ is white with known autocorrelation $R(t_1, t_2) = q(t_1)\delta(t_1 - t_2)$. Our estimator is again the response $y(t)$ of the filter $h(t)$:
$$y(t) = \int_{-\infty}^{\infty} x(t - \tau)h(\tau)\,d\tau \tag{11-164}$$
The estimator is biased with bias
$$b = y_f(t) - f(t) = \int_{-\infty}^{\infty} f(t - \tau)h(\tau)\,d\tau - f(t) \tag{11-165}$$
and variance [see (10-90)]
$$\sigma^2 = E\{y_\nu^2(t)\} = \int_{-\infty}^{\infty} q(t - \tau)h^2(\tau)\,d\tau \tag{11-166}$$
Our objective is to find $h(t)$ so as to minimize the MS error
$$e = E\{[y(t) - f(t)]^2\} = b^2 + \sigma^2$$
We shall assume that $h(t)$ is an even positive function of unit area and finite duration:
$$h(-t) = h(t) \qquad \int_{-T}^{T} h(t)\,dt = 1 \qquad h(t) > 0 \tag{11-167}$$
where $T$ is a constant to be determined. If $T$ is small, $y_f(t) \simeq f(t)$, hence the bias is small; however, the variance is large. As $T$ increases, the variance decreases but the bias increases. The determination of the optimum shape and duration of $h(t)$ is in general complicated. We shall develop a simple solution under the assumption that the functions $f(t)$ and $q(t)$ are smooth in the sense that $f(t)$ can be approximated by a parabola and $q(t)$ by a constant in any interval of length $2T$. From this assumption it follows that (Taylor expansion)
$$f(t - \tau) \simeq f(t) - \tau f'(t) + \frac{\tau^2}{2}f''(t) \qquad q(t - \tau) \simeq q(t) \tag{11-168}$$
for $|\tau| < T$. And since the interval of integration in (11-165) and (11-166) is $(-T, T)$, we conclude that
$$b \simeq \frac{f''(t)}{2}\int_{-T}^{T} \tau^2 h(\tau)\,d\tau \qquad \sigma^2 \simeq q(t)\int_{-T}^{T} h^2(\tau)\,d\tau \tag{11-169}$$
because the function $h(t)$ is even and its area equals 1. The resulting MS error equals
$$e = \tfrac{1}{4}M^2[f''(t)]^2 + Eq(t) \tag{11-170}$$
where $M = \int_{-T}^{T} t^2 h(t)\,dt$ and $E = \int_{-T}^{T} h^2(t)\,dt$.
To separate the effects of the shape and the size of $h(t)$ on the MS error, we introduce the normalized filter
$$w(t) = Th(Tt) \tag{11-171}$$
FIGURE 11-21
The function $w(t)$ is of unit area and $w(t) = 0$ for $|t| > 1$. With
$$M_w = \int_{-1}^{1} t^2 w(t)\,dt = \frac{M}{T^2} \qquad E_w = \int_{-1}^{1} w^2(t)\,dt = TE$$
it follows from (11-167) and (11-170) that
$$b \simeq \frac{T^2}{2}M_w f''(t) \qquad \sigma^2 = \frac{E_w}{T}q(t) \tag{11-172}$$
$$e = \tfrac{1}{4}T^4 M_w^2[f''(t)]^2 + \frac{E_w}{T}q(t) \tag{11-173}$$
Thus $e$ depends on the shape of $w(t)$ and on the constant $T$.
The two-to-one rule.† We assume first that $w(t)$ is specified. In Fig. 11-21 we plot the bias $b$, the variance $\sigma^2$, and the MS error $e$ as functions of $T$. As $T$ increases, $b$ increases, and $\sigma^2$ decreases. Their sum $e$ is minimum for
$$T = T_m = \left(\frac{E_w q(t)}{M_w^2[f''(t)]^2}\right)^{1/5} \tag{11-174}$$
as we see by setting the derivative of (11-173) with respect to $T$ equal to zero. Inserting into (11-172), we conclude, omitting the simple algebra, that
$$\sigma = 2b \tag{11-175}$$
Thus if $w(t)$ is of specified shape and $T$ is chosen so as to minimize the MS error $e$, then the standard deviation of the estimation error equals twice its bias.
Moving average. A simple estimator of $f(t)$ is the moving average
$y(t) = \frac{1}{2T}\int_{t-T}^{t+T} x(\tau)\,d\tau$
of $x(t)$. This is a special case of (11-164) where the normalized filter $w(t)$ equals
a pulse of width 2. In this case
$M_w = \frac{1}{2}\int_{-1}^{1} t^2\,dt = \frac{1}{3} \qquad E_w = \frac{1}{4}\int_{-1}^{1} dt = \frac{1}{2}$
Inserting into (11-174), we obtain
$T_m = \left[\frac{9 q(t)}{2[f''(t)]^2}\right]^{1/5} \qquad \sigma = 2b = \sqrt{\frac{q(t)}{2T_m}}$	(11-176)
†A. Papoulis, Two-to-One Rule in Data Smoothing, IEEE Trans. Inf. Theory, September, 1977.
The parabolic window. We wish now to determine the shape of $w(t)$ so as to
minimize the sum in (11-173). Since $h(t)$ needs to be determined within a scale
factor, it suffices to assume that $E_w$ has a constant value. Thus our problem is to
find a positive even function $w(t)$ vanishing for $|t| > 1$ and such that its second
moment $M_w$ is minimum. It can be shown that (see footnote, page 388)
$w(t) = \begin{cases} 0.75(1 - t^2) & |t| < 1 \\ 0 & |t| > 1 \end{cases}$	(11-177)
Thus the optimum $w(t)$ is a truncated parabola. With $w(t)$ so determined, the
optimum filter is
$h(t) = \frac{1}{T_m} w\!\left(\frac{t}{T_m}\right)$
where $T_m$ is the constant in (11-174). This filter is, of course, time varying
because the scaling factor $T_m$ depends on $t$.
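A rough numerical comparison of the pulse and the parabolic window is sketched below in Python. It assumes that, once $T$ is set to its minimizing value, the MS error is proportional to $(M_w^2 E_w^4)^{1/5}$; this figure of merit is our own algebraic consequence of $e = \frac{1}{4}T^4 M_w^2[f''(t)]^2 + E_w q(t)/T$ and is not stated in the text. The helper names are hypothetical.

```python
# Sketch: compare the moments of a pulse and of the truncated parabola.

def integrate(g, a, b, n=20000):
    # Midpoint rule on [a, b]
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def pulse(t):
    return 0.5 if abs(t) <= 1 else 0.0

def parabola(t):
    return 0.75 * (1 - t * t) if abs(t) <= 1 else 0.0

def moments(w):
    area = integrate(w, -1.0, 1.0)                       # should be 1
    Mw = integrate(lambda t: t * t * w(t), -1.0, 1.0)    # second moment
    Ew = integrate(lambda t: w(t) ** 2, -1.0, 1.0)       # energy
    return area, Mw, Ew

def merit(w):
    # Assumed figure of merit: minimum MS error is proportional to this
    _, Mw, Ew = moments(w)
    return (Mw ** 2 * Ew ** 4) ** 0.2

for name, w in (("pulse", pulse), ("parabola", parabola)):
    print(name, moments(w), merit(w))
```

Under these assumptions the parabola has the smaller figure of merit, in agreement with the statement that the truncated parabola is the optimum shape.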
11-7 BISPECTRA AND SYSTEM
IDENTIFICATION†
Correlations and spectra are the most extensively used concepts in the
applications of stochastic processes. These concepts involve only second-order
moments. In certain applications, moments of higher order are also used. In the
following, we introduce the transform of the third-order moment
$R_{xxx}(t_1, t_2, t_3) = E\{x(t_1)x(t_2)x(t_3)\}$	(11-178)
of a process $x(t)$ and we apply it to the phase problem in system identification.
We assume that $x(t)$ is a real SSS process with zero mean. From the stationarity
of $x(t)$ it follows that the function $R_{xxx}(t_1, t_2, t_3)$ depends only on the
differences
$t_1 - t_3 = \mu \qquad t_2 - t_3 = \nu$
†D. R. Brillinger: "An Introduction to Polyspectra," Annals of Math Statistics, vol. 36. Also C. L.
Nikias and M. R. Raghuveer (1987): "Bispectrum Estimation: Digital Processing Framework,"
IEEE Proceedings, vol. 75, 1965.
Setting $t_3 = t$ in (11-178) and omitting subscripts, we obtain
$R(t_1, t_2, t_3) = R(\mu, \nu) = E\{x(t + \mu)x(t + \nu)x(t)\}$	(11-179)
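DEFINITION. The bispectrum $S(u, v)$ of the process $x(t)$ is the two-dimensional
Fourier transform of its third-order moment $R(\mu, \nu)$:
$S(u, v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R(\mu, \nu) e^{-j(u\mu + v\nu)}\,d\mu\,d\nu$	(11-180)
The function $R(\mu, \nu)$ is real; hence
$S(-u, -v) = S^*(u, v)$	(11-181)
If $x(t)$ is white noise then
$R(\mu, \nu) = Q\delta(\mu)\delta(\nu) \qquad S(u, v) = Q$	(11-182)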
Notes 1. The third-order moment of a normal process with zero mean is identically zero.
This is a consequence of the fact that the joint density of three jointly normal RVs with
zero mean is symmetrical with respect to the origin.
2. The autocorrelation of a white noise process with third-order moment as in
(11-182) is an impulse $q\delta(\tau)$; in general, however, $q \neq Q$. For example if $x(t)$ is normal
white noise, then $Q = 0$ but $q \neq 0$. Furthermore, whereas $q > 0$ for all nontrivial
processes, $Q$ might be negative.
Symmetries. The function $R(t_1, t_2, t_3)$ is invariant to the six permutations of the
numbers $t_1$, $t_2$, and $t_3$. For stationary processes,
$t_1 - t_3 = \mu \qquad t_2 - t_3 = \nu \qquad t_1 - t_2 = \mu - \nu$

     Permutation      Arguments of R          Arguments of S
1    t1, t2, t3       mu, nu                  u, v
2    t2, t1, t3       nu, mu                  v, u
3    t3, t1, t2       -nu, mu - nu            -u - v, u
4    t3, t2, t1       -mu, -mu + nu           -u - v, v
5    t2, t3, t1       -mu + nu, -mu           v, -u - v
6    t1, t3, t2       mu - nu, -nu            u, -u - v

FIGURE 11-22
This yields the identities
$R(\mu, \nu) = R(\nu, \mu) = R(-\nu, \mu - \nu) = R(-\mu, -\mu + \nu)$
$= R(-\mu + \nu, -\mu) = R(\mu - \nu, -\nu)$	(11-183)
Hence if we know the function $R(\mu, \nu)$ in any one of the six regions of Fig.
11-22, we can determine it everywhere.
From (11-180) and (11-183) it follows that
$S(u, v) = S(v, u) = S(-u - v, u) = S(-u - v, v)$
$= S(v, -u - v) = S(u, -u - v)$	(11-184)
Combining with (11-181), we conclude that if we know $S(u, v)$ in any one of the
12 regions of Fig. 11-22, we can determine it everywhere.
Linear Systems
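We have shown in (10-110) that if $x(t)$ is the input to a linear system, the
third-order moment of the resulting output $y(t)$ equals
$R_{yyy}(t_1, t_2, t_3) = \int\!\!\int\!\!\int_{-\infty}^{\infty} R_{xxx}(t_1 - \alpha, t_2 - \beta, t_3 - \gamma)h(\alpha)h(\beta)h(\gamma)\,d\alpha\,d\beta\,d\gamma$	(11-185)
For stationary processes, $R_{xxx}(t_1 - \alpha, t_2 - \beta, t_3 - \gamma) = R_{xxx}(\mu + \gamma - \alpha, \nu + \gamma - \beta)$; hence
$R_{yyy}(\mu, \nu) = \int\!\!\int\!\!\int_{-\infty}^{\infty} R_{xxx}(\mu + \gamma - \alpha, \nu + \gamma - \beta)h(\alpha)h(\beta)h(\gamma)\,d\alpha\,d\beta\,d\gamma$	(11-186)
Using this relationship, we shall express the bispectrum $S_{yyy}(u, v)$ of $y(t)$ in
terms of the bispectrum $S_{xxx}(u, v)$ of $x(t)$.
THEOREM
$S_{yyy}(u, v) = S_{xxx}(u, v)H(u)H(v)H^*(u + v)$	(11-187)
Proof. Taking transforms of both sides of (11-185) and using the identity
$\int\!\!\int_{-\infty}^{\infty} R_{xxx}(\mu + \gamma - \alpha, \nu + \gamma - \beta)e^{-j(u\mu + v\nu)}\,d\mu\,d\nu = S_{xxx}(u, v)e^{ju(\gamma - \alpha)}e^{jv(\gamma - \beta)}$
we obtain
$S_{yyy}(u, v) = S_{xxx}(u, v)\int\!\!\int\!\!\int_{-\infty}^{\infty} e^{j(u + v)\gamma}e^{-j(u\alpha + v\beta)}h(\alpha)h(\beta)h(\gamma)\,d\alpha\,d\beta\,d\gamma$
Expressing the above integral as a product of three one-dimensional integrals,
we obtain (11-187).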
Example 11-7. Using (11-185), we shall determine the bispectrum of the shot noise
$s(t) = \sum_i h(t - t_i) = z(t) * h(t) \qquad z(t) = \sum_i \delta(t - t_i)$
where $t_i$ is a Poisson point process with average density $\lambda$.
To do so, we form the centered impulse train $\tilde{z}(t) = z(t) - \lambda$ and the
centered shot noise $\tilde{s}(t) = \tilde{z}(t) * h(t)$. As we know (see Prob. 11-28)
$R_{\tilde{z}\tilde{z}\tilde{z}}(\mu, \nu) = \lambda\delta(\mu)\delta(\nu)$ hence $S_{\tilde{z}\tilde{z}\tilde{z}}(u, v) = \lambda$
From this it follows that
$S_{\tilde{s}\tilde{s}\tilde{s}}(u, v) = \lambda H(u)H(v)H^*(u + v)$
and since $S_{\tilde{s}\tilde{s}}(\omega) = \lambda|H(\omega)|^2$, we conclude from Prob. 11-27 with $c = E\{s(t)\} = \lambda H(0)$ that
$S_{sss}(u, v) = \lambda H(u)H(v)H^*(u + v)$
$+ 2\pi\lambda^2 H(0)[|H(u)|^2\delta(v) + |H(v)|^2\delta(u) + |H(u)|^2\delta(u + v)]$
$+ 4\pi^2\lambda^3 H^3(0)\delta(u)\delta(v)$
System Identification
A linear system is specified terminally in terms of its system function
$H(\omega) = A(\omega)e^{j\varphi(\omega)}$
System identification is the problem of determining $H(\omega)$. This problem is
central in system theory and it has been investigated extensively. In the
following, we apply the notion of spectra and polyspectra in the determination
of $A(\omega)$ and $\varphi(\omega)$.
Spectra. Suppose that the input to the system $H(\omega)$ is a WSS process $x(t)$ with
power spectrum $S_{xx}(\omega)$. As we know,
$S_{xy}(\omega) = S_{xx}(\omega)H^*(\omega)$	(11-188)
This relationship expresses $H(\omega)$ in terms of the spectra $S_{xx}(\omega)$ and $S_{xy}(\omega)$ or,
equivalently, in terms of the second-order moments $R_{xx}(\tau)$ and $R_{xy}(\tau)$. The
problem of estimating these functions is considered in Chap. 13. In a number of
applications, we cannot estimate $R_{xy}(\tau)$ either because we do not have access to
the input $x(t)$ of the system or because we cannot form the product $x(t + \tau)y(t)$
in real time. In such cases, an alternative method is used based on the
assumption that $x(t)$ is white noise. With this assumption (10-136) yields
$S_{yy}(\omega) = S_{xx}(\omega)|H(\omega)|^2 = qA^2(\omega)$	(11-189)
This relationship determines the amplitude $A(\omega)$ of $H(\omega)$ in terms of $S_{yy}(\omega)$
within a constant factor. It involves, however, only the estimation of the power
spectrum $S_{yy}(\omega)$ of the output of the system. If the system is minimum phase
(see page 40), then $H(\omega)$ is completely determined from (11-189) because, then,
$\varphi(\omega)$ can be expressed in terms of $A(\omega)$. In general, however, this is not the
case. The phase of an arbitrary system cannot be determined in terms of
second-order moment of its input. It can, however, be determined if the
third-order moment of y(/) is known.
Phase determination. We assume that $x(t)$ is an SSS white-noise process with
$S_{xxx}(u, v) = Q$. Inserting into (11-187), we obtain
$S_{yyy}(u, v) = QH(u)H(v)H^*(u + v)$	(11-190)
The function $S_{yyy}(u, v)$ is, in general, complex:
$S_{yyy}(u, v) = B(u, v)e^{j\theta(u, v)}$	(11-191)
Inserting (11-191) into (11-190) and equating amplitudes and phases, we obtain
$B(u, v) = QA(u)A(v)A(u + v)$	(11-192)
$\theta(u, v) = \varphi(u) + \varphi(v) - \varphi(u + v)$	(11-193)
We shall use these equations to express $A(\omega)$ in terms of $B(u, v)$ and
$\varphi(\omega)$ in terms of $\theta(u, v)$. Setting $v = 0$ in (11-192), we obtain
$A^2(\omega) = \frac{B(\omega, 0)}{QA(0)} \qquad A^3(0) = \frac{1}{Q}B(0, 0)$	(11-194)
Since $Q$ is in general unknown, $A(\omega)$ can be determined only within a constant
factor. The phase $\varphi(\omega)$ can be determined only within a linear term because if it
satisfies (11-193), so does the sum $\varphi(\omega) + c\omega$ for any $c$. We can assume
therefore that $\varphi'(0) = 0$. To find $\varphi(\omega)$, we differentiate (11-193) with respect to
$v$ and we set $v = 0$. This yields
$\theta_v(u, 0) = -\varphi'(u) \qquad \varphi(\omega) = -\int_0^{\omega} \theta_v(u, 0)\,du$	(11-195)
where $\theta_v(u, v) = \partial\theta(u, v)/\partial v$. The above is the solution of (11-193).
In a numerical evaluation of $\varphi(\omega)$, we proceed as follows: Clearly, $\theta(u, 0)
= \varphi(u) + \varphi(0) - \varphi(u) = \varphi(0) = 0$ for every $u$. From this it follows that
$\theta_v(u, 0) = \lim \frac{1}{\Delta}\theta(u, \Delta)$ as $\Delta \to 0$
Hence $\theta_v(u, 0) \simeq \theta(u, \Delta)/\Delta$ for sufficiently small $\Delta$. Inserting into (11-195), we
obtain the approximations
$\varphi(\omega) \simeq -\frac{1}{\Delta}\int_0^{\omega} \theta(u, \Delta)\,du \qquad \varphi(n\Delta) \simeq -\sum_{k=1}^{n-1} \theta(k\Delta, \Delta)$	(11-196)
This is the solution of the digital version
$\theta(k\Delta, r\Delta) = \varphi(k\Delta) + \varphi(r\Delta) - \varphi(k\Delta + r\Delta)$	(11-197)
of (11-193) where $(k\Delta, r\Delta)$ are points in the sector I of Fig. 11-22. As we see
from (11-196), $\varphi(n\Delta)$ is determined in terms of the values $\theta(k\Delta, \Delta)$ of $\theta(u, v)$
on the horizontal line $v = \Delta$. Hence the system (11-197) is overdetermined. This
is used to improve the estimate of $\varphi(\omega)$ if $\theta(u, v)$ is not known exactly but it is
estimated in terms of a single sample of $y(t)$.† The corresponding problem of
spectral estimation is considered in Chap. 13.
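The recursion for the phase can be illustrated with a short Python sketch. The phase function `phi_true` below is a hypothetical example chosen only for the test, and the sum is taken over $k = 1, \ldots, n - 1$ on the line $v = \Delta$ (our reading of the summation limits). Since the phase is determined only within a linear term, the recovered sequence equals $\varphi(n\Delta) - n\varphi(\Delta)$.

```python
import math

def phi_true(w):
    # Hypothetical system phase, used only to generate test data
    return -math.atan(w) - 0.3 * math.atan(2.0 * w)

def theta(u, v, phi):
    # Bispectrum phase: theta(u, v) = phi(u) + phi(v) - phi(u + v)
    return phi(u) + phi(v) - phi(u + v)

def recover_phase(theta_row, n):
    # theta_row[k] holds theta(k*delta, delta);
    # returns -sum_{k=1}^{n-1} theta(k*delta, delta)
    return -sum(theta_row[k] for k in range(1, n))

delta = 0.05
N = 40
theta_row = [theta(k * delta, delta, phi_true) for k in range(N + 1)]
phi_hat = [recover_phase(theta_row, n) for n in range(N + 1)]

# Expected result: the true phase minus the linear term n * phi(delta)
expected = [phi_true(n * delta) - n * phi_true(delta) for n in range(N + 1)]
print(max(abs(a - b) for a, b in zip(phi_hat, expected)))
```

With exact values of $\theta(k\Delta, \Delta)$ the agreement is limited only by rounding; with estimated values, the remaining equations of the overdetermined system could be used for averaging.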
tT. Matsuoka and T. J. Ulrych: “Phase Estimation Using the Bispectrum," IEEE Proceedings, vol.
72,1984.
Note If the bispectrum $S(u, v)$ of a process $x(t)$ equals the right side of (11-190) and
$H(\omega) = 0$ for $|\omega| > \sigma$, then
$S(u, v) = 0$ for $|u| > \sigma$ or $|v| > \sigma$ or $|u + v| > \sigma$
Thus, $S(u, v) = 0$ outside the hexagon of Fig. 11-23a. From this and the symmetries of
Fig. 11-22 it follows that $S(u, v)$ is uniquely determined in terms of its values in the
triangle OAB of Fig. 11-23a.
Digital processes. The preceding concepts can be readily extended to digital
processes. We cite only the definition of bispectra.
Given an SSS digital process $x[n]$, we form its third-order moment
$R[k, r] = E\{x[n + k]x[n + r]x[n]\}$	(11-198)
The bispectrum of $x[n]$ is the two-dimensional DFT of $R[k, r]$:
$S(u, v) = \sum_{k=-\infty}^{\infty}\sum_{r=-\infty}^{\infty} R[k, r]e^{-j(uk + vr)}$	(11-199)
This function is doubly periodic with period $2\pi$:
$S(u + 2\pi m, v + 2\pi n) = S(u, v)$	(11-200)
It is therefore determined in terms of its values in the square $|u| \le \pi$, $|v| \le \pi$
of Fig. 11-23b. Furthermore, it has the 12 symmetries of Fig. 11-22.
Suppose finally that the process $x[n]$ equals the samples $x(nT)$ of an
analog process $x(t)$ with bispectrum $S_a(u, v)$. If $S_a(u, v)$ equals the right side of
(11-190), then, in the square of Fig. 11-23b,
$S(u, v) = 0$ for $|u| > \pi$ or $|v| > \pi$ or $|u + v| > \pi$
From this it follows that in this case, $S(u, v)$ is uniquely determined in terms of
its values in the triangle OAB of Fig. 11-23b.
APPENDIX 10A
THE POISSON SUM FORMULA
If
$F(u) = \int_{-\infty}^{\infty} f(x)e^{-jux}\,dx$
is the Fourier transform of $f(x)$ then for any $c$
$\sum_{n=-\infty}^{\infty} f(x + nc) = \frac{1}{c}\sum_{n=-\infty}^{\infty} F(nu_0)e^{jnu_0 x} \qquad u_0 = \frac{2\pi}{c}$	(11A-1)
Proof. Clearly
$\sum_{n=-\infty}^{\infty} \delta(x + nc) = \frac{1}{c}\sum_{n=-\infty}^{\infty} e^{jnu_0 x}$	(11A-2)
because the left side is periodic and its Fourier series coefficients equal
$\frac{1}{c}\int_{-c/2}^{c/2} \delta(x)e^{-jnu_0 x}\,dx = \frac{1}{c}$
Furthermore, $\delta(x + nc) * f(x) = f(x + nc)$ and
$e^{jnu_0 x} * f(x) = \int_{-\infty}^{\infty} e^{jnu_0(x - \alpha)}f(\alpha)\,d\alpha = e^{jnu_0 x}F(nu_0)$
Convolving both sides of (11A-2) with $f(x)$ and using the above, we obtain
(11A-1).
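The Poisson sum formula is easy to test numerically. The sketch below, in Python, uses the Gaussian pair $f(x) = e^{-x^2/2}$, $F(u) = \sqrt{2\pi}\,e^{-u^2/2}$ as an assumed example (it is not taken from the text) and truncates both sums to a finite number of terms.

```python
import cmath
import math

def f(x):
    return math.exp(-0.5 * x * x)

def F(u):
    # Fourier transform of f under the convention F(u) = integral f(x) e^{-jux} dx
    return math.sqrt(2 * math.pi) * math.exp(-0.5 * u * u)

def lhs(x, c, terms=50):
    # Truncated sum of the shifted copies f(x + n c)
    return sum(f(x + n * c) for n in range(-terms, terms + 1))

def rhs(x, c, terms=50):
    # Truncated Fourier series with coefficients F(n u0) / c
    u0 = 2 * math.pi / c
    total = sum(F(n * u0) * cmath.exp(1j * n * u0 * x)
                for n in range(-terms, terms + 1))
    return (total / c).real

print(lhs(0.3, 1.5), rhs(0.3, 1.5))
```

Both sides converge very quickly for this pair, so a few terms already agree to machine precision.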
APPENDIX 10B
THE SCHWARZ INEQUALITY
We shall show that
$\left|\int_a^b f(x)g(x)\,dx\right|^2 \le \int_a^b |f(x)|^2\,dx \int_a^b |g(x)|^2\,dx$	(11B-1)
with equality iff
$f(x) = kg^*(x)$	(11B-2)
Proof. Clearly
$\left|\int_a^b f(x)g(x)\,dx\right| \le \int_a^b |f(x)||g(x)|\,dx$
Equality holds only if the product $f(x)g(x)$ is real. This is the case if the angles
of $f(x)$ and $g(x)$ are opposite as in (11B-2). It suffices, therefore, to assume
that the functions $f(x)$ and $g(x)$ are real. The quadratic
$I(z) = \int_a^b [f(x) - zg(x)]^2\,dx$
$= z^2\int_a^b g^2(x)\,dx - 2z\int_a^b f(x)g(x)\,dx + \int_a^b f^2(x)\,dx$
is nonnegative for every real $z$. Hence, its discriminant cannot be positive. This
yields (11B-1). If the discriminant of $I(z)$ is zero, then $I(z)$ has a real (double)
root $z = k$. This shows that $I(k) = 0$ and (11B-2) follows.
PROBLEMS
11-1. Find the first-order characteristic function (a) of a Poisson process, and (b) of a
Wiener process.
Answer: (a) $e^{\lambda t(e^{j\omega} - 1)}$; (b) $e^{-\alpha t\omega^2/2}$.
11-2. (Two-dimensional random walk). The coordinates $x(t)$ and $y(t)$ of a moving object
are two independent random-walk processes with the same $s$ and $T$ as in Fig.
11-1a. Show that if $z(t) = \sqrt{x^2(t) + y^2(t)}$ is the distance of the object from the
origin and $t \gg T$, then for $z$ of the order of $\sqrt{\alpha t}$:
a = s
at	T
11-3. In the circuit of Fig. P11-3, $n_e(t)$ is the voltage due to thermal noise. Show that
2kTR	2kTR
S ,,((!>) = --------;---------- S (<o) = —i--------XTj
(1 - a>zLCy + a>2R2C2	R- +
and verify Nyquist’s theorems (11-27) and (11-30).
FIGURE P11-3
11-4. A particle in free motion satisfies the equation
$mx''(t) + fx'(t) = F(t) \qquad S_F(\omega) = 2kTf$
Show that if x(0) = x'(0) = 0, then
2T{x-(r)} = 2D2(z - -1 +
\	4 a Cr	4a J
where D2 = kT/f and a = f /2т.
Hint: Use (10-90) with
1
/»(/) = j(l -	4{t) = 2kTfU(t)
11-5. The position of a particle in underdamped harmonic motion is a normal process
with autocorrelation as in (11-12). Show that its conditional density assuming
x(0) = xn and x'(0) = v(0) = r(l equals
= ...............
/2—Р
Find the constants a. b. and P.
11-6. Given a Wiener process $w(t)$ with parameter $\alpha$, we form the processes
$x(t) = w(t^2) \qquad y(t) = w^2(t) \qquad z(t) = |w(t)|$
Show that $x(t)$ is normal with zero mean. Furthermore, if $t_1 < t_2$, then
^,(/|,G) efltZ?	= O’2/)(2/| + Z2)
2a -----	/7?
t?) = — yt ft2 (cos в + 0 sin 0) sin 0 - у —
11-7. The process $s(t)$ is shot noise with $\lambda = 3$ as in (11-45) where $h(t) = 2$ for
$0 \le t \le 10$ and $h(t) = 0$ otherwise. Find $E\{s(t)\}$, $E\{s^2(t)\}$, and $P\{s(7) = 0\}$.
11-8. The input to a real system $H(\omega)$ is a WSS process $x(t)$ and the output equals $y(t)$.
Show that if
Kx.(r) =/?,.,.(?)	M-T)= v(r)
as in (11-67), then $H(\omega) = jB(\omega)$ where $B(\omega)$ is a function taking only the values
+1 and -1.
Special case: If $y(t) = \hat{x}(t)$, then $B(\omega) = -\operatorname{sgn}\omega$.
11-9. Show that if $\hat{x}(t)$ is the Hilbert transform of $x(t)$ and
$i(t) = x(t)\cos\omega_0 t + \hat{x}(t)\sin\omega_0 t \qquad q(t) = \hat{x}(t)\cos\omega_0 t - x(t)\sin\omega_0 t$
then (Fig. P11-9)
where $S_w(\omega) = 4S_x(\omega + \omega_0)U(\omega + \omega_0)$.
11-10. Show that if $w(t)$ and $w_\tau(t)$ are the complex envelopes of the processes $x(t)$ and
$x(t - \tau)$ respectively, then $w_\tau(t) = w(t - \tau)e^{-j\omega_0\tau}$.
11-11. Show that if w(t) is the optimum complex envelope of x(t) {see (11-85)], then
E{|w'(OI2} = -2[/?:(0) + «^(0)]
11-12. Show that if the process $x(t)\cos\omega t + y(t)\sin\omega t$ is normal and WSS, then its
statistical properties are determined in terms of the variance of the process
$z(t) = x(t) + jy(t)$.
11-13. Show that if $\theta$ is an RV uniform in the interval $(0, T)$ and $f(t)$ is a periodic
function with period $T$, then the process $x(t) = f(t - \theta)$ is stationary and
/ |-'o
11-14. Show that if
sin tr(t-nT) tt
®/v(O - x(') - E	a=T
n--N ff(t-llT)	T
then
1	z00
E(e«W} - у-/ Я“) «*" ~
7 — 00
N sin <r(r - nT)
n~N a(t - nT)
2
e'"“r du
and if S(a>) = 0 for |ш| > <т, then E{ejj(t)} -* 0 as N -* ».
11-15. Show that if x(r) is BL as in (11-124), thent for |t| < ir/tr:
2r2	t2
^5-|Я"(0)| SR(O) - R(r) £ -^-|R"(0)|
£{[x(r + t) - x(/)]2} 2: ^£{[х'(г)]2}
7Г
Hint: If 0 < <p <tt/2 then 2<p/ir < sin <p <
11-16. A WSS process $x(t)$ is BL as in (11-124) and its samples $x(n\pi/\sigma)$ are
uncorrelated. Find $S_x(\omega)$ if $E\{x(t)\} = \eta$ and $E\{x^2(t)\} = I$.
11-17. Find the power spectrum $S(\omega)$ of a process $x(t)$ if $S(\omega) = 0$ for $|\omega| > \pi$ and
$E\{x(n + m)x(n)\} = N\delta[m]$
†A. Papoulis: "An Estimation of the Variation of a Bandlimited process," IEEE, PGIT, 1984.
11-18. Show that if $S(\omega) = 0$ for $|\omega| > \sigma$, then
$R(\tau) \ge R(0)\cos\sigma\tau$ for $|\tau| < \pi/2\sigma$
X
x(r) = 4 Sin-— E
х(лД)	x’(nA)
-------------— -f- —-------------
(al - 2n~}~ <r(at - 2/itt)
Hint: Use (11-143) with /V = 2. Н^ш) = I. H2(w) = juj.
11-20. Find the mean and the variance of P(w„) it t, is a Poisson point process and
11-21. Given a WSS process x(z) and a set of Poisson points t, with average density A.
wc form the sum
X.(w)= £ x(t,)e
I«,l <<•
Show that if £{x(O) = 0 and R,(r) -> 0 as |—I -* x, then for large c.
E{Xr(w)} = 2c5.(<o) + ^R,(0)
Л
11-22. We arc given the data x(r) =»/(/) + n(t) where /?„(т) = HS(r) and E(n(r) = ()}.
We wish to estimate the integral
g(f) =
knowing that g(T) = 0. Show that if we use as the estimate of g(/) the process
w(r) = z(t) - dT)t/T where
z(/) = [‘x(a).da then E{w(t))-g(t)	о’,; = М(1-—I
11-23. (Cauchy inequality) Show that
$\left|\sum_i a_i b_i\right|^2 \le \sum_i |a_i|^2 \sum_i |b_i|^2$
with equality iff $a_i = kb_i^*$.
11-24. The input to a system H(z) is the sum х(л] = f[n] + v[n] where f(n] is a known
sequence with z transform F(z). Wc wish to find H(zJ such that the ratio
у/[0]/Е{уДл]} of the output у[л] « уДл] + у, [n] is maximum. Show that (a) if
v(n] is white noise, then H(z) = AF(z~ and (/>) if H(z) is an FIR filter that is, if
H(z) = a0 + a, z“ * + •  • -t-fl^z-^ then its weights am are the solutions of the
system
N
E R, [л -	- V[-«J « = °..........n
m-0
11-25. If R„(t) = N8M and
1
x(r) = A cos inQt + n(r) Н(ш) = ——
a +
y(r) = Bcos(w(l + t + tp) + y„(r)
where y„(Z) is the component of the output y(t) due to n(r), find the value of a
that maximizes the signal-to-noise ratio
IBI2
ФЛ0)
Answer: a = a>().
11-26. In the detection problem of page 386, we apply the process x(r) = /(/) + v(z) to
the tapped delay line (11-161). Show that: (a) The S/N ratio r is maximum if the
coefficients at satisfy (11-162); (Z>) the maximum r equals ^/>y(r0).
11-27. Given an SSS process $x(t)$ with zero mean, power spectrum $S(\omega)$, and bispectrum
$S(u, v)$, we form the process $y(t) = x(t) + c$. Show that
$S_{yyy}(u, v) = S(u, v) + 2\pi c[S(u)\delta(v) + S(v)\delta(u) + S(u)\delta(u + v)]$
$+ 4\pi^2 c^3\delta(u)\delta(v)$
11-28. Given a Poisson process $x(t)$, we form its centered process $\tilde{x}(t) = x(t) - \lambda t$ and
the centered Poisson impulses
$\tilde{z}(t) = \frac{d\tilde{x}(t)}{dt}$
Show that
$E\{\tilde{x}(t_1)\tilde{x}(t_2)\tilde{x}(t_3)\} = \lambda\min(t_1, t_2, t_3)$
$E\{\tilde{z}(t_1)\tilde{z}(t_2)\tilde{z}(t_3)\} = \lambda\delta(t_1 - t_2)\delta(t_1 - t_3)$
Hint: Use (10-94) and the identity
$\min(t_1, t_2, t_3) = t_1 U(t_2 - t_1)U(t_3 - t_1) + t_2 U(t_1 - t_2)U(t_3 - t_2)$
$+ t_3 U(t_1 - t_3)U(t_2 - t_3)$
CHAPTER
12
SPECTRAL
REPRESENTATION
12-1 FACTORIZATION AND INNOVATIONS
In this section, we consider the problem of representing a real WSS process $x(t)$
as the response of a minimum-phase system $L(s)$ with input a white-noise
process $i(t)$. The term minimum-phase has the following meaning: The system
$L(s)$ is causal and its impulse response $l(t)$ has finite energy; the system
$\Gamma(s) = 1/L(s)$ is causal and its impulse response $\gamma(t)$ has finite energy. Thus a
system $L(s)$ is minimum-phase if the functions $L(s)$ and $1/L(s)$ are analytic in
the right-hand plane Re $s > 0$. A process $x(t)$ that can be so represented will be
called regular. From the definition it follows that $x(t)$ is a regular process if it is
linearly equivalent with a white-noise process $i(t)$ in the sense that (see Fig.
12-1)
$i(t) = \int_0^{\infty} \gamma(\alpha)x(t - \alpha)\,d\alpha \qquad R_{ii}(\tau) = \delta(\tau)$	(12-1)
$x(t) = \int_0^{\infty} l(\alpha)i(t - \alpha)\,d\alpha \qquad E\{x^2(t)\} = \int_0^{\infty} l^2(t)\,dt < \infty$	(12-2)
The last equality follows from (10-91). The above shows that the power
spectrum $S(s)$ of a regular process can be written as a product
$S(s) = L(s)L(-s) \qquad S(\omega) = |L(j\omega)|^2$	(12-3)
where $L(s)$ is a minimum-phase function uniquely determined in terms of $S(\omega)$.
The function $L(s)$ will be called the innovations filter of $x(t)$ and its inverse $\Gamma(s)$
the whitening filter of $x(t)$. The process $i(t)$ will be called the innovations of $x(t)$.
It is the output of the filter $\Gamma(s)$ with input $x(t)$.
FIGURE 12-1
The problem of determining the function $L(s)$ can be phrased as follows:
Given a positive even function $S(\omega)$ of finite area, find a minimum-phase
function $L(s)$ such that $|L(j\omega)|^2 = S(\omega)$. It can be shown† that this problem has
a solution if $S(\omega)$ satisfies the Paley-Wiener condition
$\int_{-\infty}^{\infty} \frac{|\ln S(\omega)|}{1 + \omega^2}\,d\omega < \infty$	(12-4)
This condition is not satisfied if $S(\omega)$ consists of lines, or, more generally, if it is
bandlimited. As we show later, processes with such spectra are predictable. In
general, the problem of factoring $S(\omega)$ as in (12-3) is not simple. In the
following, we discuss an important special case.
Rational spectra. A rational spectrum is the ratio of two polynomials of $\omega^2$
because $S(-\omega) = S(\omega)$:
$S(\omega) = \frac{A(\omega^2)}{B(\omega^2)} \qquad S(s) = \frac{A(-s^2)}{B(-s^2)}$
This shows that if $s_i$ is a root (zero or pole) of $S(s)$, $-s_i$ is also a root.
Furthermore, all roots are either real or complex conjugate. From this it follows
that the roots of $S(s)$ are symmetrical with respect to the $j\omega$ axis (Fig. 12-2a).
Hence they can be separated into two groups: The "left" group consists of all
roots $s_i$ with Re $s_i < 0$, and the "right" group consists of all roots with
Re $s_i > 0$. The minimum-phase factor $L(s)$ of $S(s)$ is a ratio of two polynomials
formed with the left roots of $S(s)$:
$S(s) = \frac{N(s)N(-s)}{D(s)D(-s)} \qquad L(s) = \frac{N(s)}{D(s)}$
Example 12-1. If $S(\omega) = N/(\alpha^2 + \omega^2)$ then
$S(s) = \frac{N}{\alpha^2 - s^2} = \frac{N}{(\alpha + s)(\alpha - s)} \qquad L(s) = \frac{\sqrt{N}}{\alpha + s}$
†N. Wiener, R. E. A. C. Paley: Fourier Transforms in the Complex Domain, American Mathematical
Society College, 1934 (see also Papoulis, 1962).
FIGURE 12-2
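Example 12-2. If $S(\omega) = (49 + 25\omega^2)/(\omega^4 + 10\omega^2 + 9)$ then
$S(s) = \frac{49 - 25s^2}{(1 - s^2)(9 - s^2)} \qquad L(s) = \frac{7 + 5s}{(1 + s)(3 + s)}$
Example 12-3. If $S(\omega) = 25/(\omega^4 + 1)$ then
$S(s) = \frac{25}{s^4 + 1} = \frac{25}{(s^2 + \sqrt{2}s + 1)(s^2 - \sqrt{2}s + 1)} \qquad L(s) = \frac{5}{s^2 + \sqrt{2}s + 1}$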
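The factorizations of Examples 12-1 to 12-3 can be verified by evaluating $|L(j\omega)|^2$ on a grid of frequencies and comparing with $S(\omega)$. The Python sketch below does this; the values $N = 4$ and $\alpha = 2$ used for Example 12-1 are arbitrary illustrative choices.

```python
import math

# Example 12-1 with illustrative constants N = 4, alpha = 2
def S1(w, N=4.0, a=2.0):
    return N / (a * a + w * w)

def L1(s, N=4.0, a=2.0):
    return math.sqrt(N) / (a + s)

# Example 12-2
def S2(w):
    return (49 + 25 * w * w) / (w**4 + 10 * w * w + 9)

def L2(s):
    return (7 + 5 * s) / ((1 + s) * (3 + s))

# Example 12-3
def S3(w):
    return 25 / (w**4 + 1)

def L3(s):
    return 5 / (s * s + math.sqrt(2) * s + 1)

def max_mismatch(S, L, freqs):
    # Largest deviation between |L(jw)|^2 and S(w) on the grid
    return max(abs(abs(L(1j * w)) ** 2 - S(w)) for w in freqs)

freqs = [0.1 * k for k in range(-100, 101)]
for S, L in ((S1, L1), (S2, L2), (S3, L3)):
    print(max_mismatch(S, L, freqs))
```

In each case the poles and zeros of $L(s)$ lie in the left half-plane, so the factor chosen is the minimum-phase one.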
Digital Processes
A digital system is minimum-phase if its system function $L(z)$ and its inverse
$\Gamma(z) = 1/L(z)$ are analytic in the exterior $|z| > 1$ of the unit circle. A real WSS
digital process $x[n]$ is regular if its spectrum $S(z)$ can be written as a product
$S(z) = L(z)L(1/z) \qquad S(e^{j\omega}) = |L(e^{j\omega})|^2$	(12-6)
Denoting by $l[n]$ and $\gamma[n]$ respectively the delta responses of $L(z)$ and $\Gamma(z)$, we
conclude that a regular process $x[n]$ is linearly equivalent with a white-noise
process $i[n]$ (see Fig. 12-3):
$i[n] = \sum_{k=0}^{\infty} \gamma[k]x[n - k] \qquad R_{ii}[m] = \delta[m]$	(12-7)
$x[n] = \sum_{k=0}^{\infty} l[k]i[n - k] \qquad E\{x^2[n]\} = \sum_{k=0}^{\infty} l^2[k] < \infty$	(12-8)
The process $i[n]$ is the innovations of $x[n]$ and the function $L(z)$ its innovations
filter. The whitening filter of $x[n]$ is the function $\Gamma(z) = 1/L(z)$.
It can be shown that the power spectrum $S(e^{j\omega})$ of a process $x[n]$ can be
factored as in (12-6) if it satisfies the Paley-Wiener condition
$\int_{-\pi}^{\pi} |\ln S(\omega)|\,d\omega < \infty$	(12-9)
FIGURE 12-3
Rational spectra. The power spectrum $S(e^{j\omega})$ of a real process is a function of
$\cos\omega = (e^{j\omega} + e^{-j\omega})/2$ [see (10-180)]. From this it follows that $S(z)$ is a
function of $z + 1/z$. If, therefore, $z_i$ is a root of $S(z)$, $1/z_i$ is also a root. We
thus conclude that the roots of $S(z)$ are symmetrical with respect to the unit
circle (Fig. 12-3); hence they can be separated into two groups: The "inside"
group consists of all roots $z_i$ such that $|z_i| < 1$ and the "outside" group consists
of all roots such that $|z_i| > 1$. The minimum-phase factor $L(z)$ of $S(z)$ is a ratio
of two polynomials consisting of the inside roots of $S(z)$:
$S(z) = \frac{N(z)N(1/z)}{D(z)D(1/z)} \qquad L(z) = \frac{N(z)}{D(z)} \qquad L^2(1) = S(1)$
Example 12-4. If $S(\omega) = (5 - 4\cos\omega)/(10 - 6\cos\omega)$ then
$S(z) = \frac{5 - 2(z + z^{-1})}{10 - 3(z + z^{-1})} = \frac{2(z - 1/2)(z - 2)}{3(z - 1/3)(z - 3)} \qquad L(z) = \frac{2z - 1}{3z - 1}$
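The selection of the inside roots in Example 12-4 can be sketched in a few lines of Python. The roots of $S(z)$ are entered by hand (zeros $1/2$ and $2$, poles $1/3$ and $3$), and the gain is fixed, as an assumption of this sketch, by requiring $L^2(1) = S(1)$ with $L(1) > 0$; the helper names are hypothetical.

```python
import cmath
import math

def S(w):
    # Power spectrum of Example 12-4 as a function of omega
    return (5 - 4 * math.cos(w)) / (10 - 6 * math.cos(w))

def inside(roots):
    # Keep only the roots inside the unit circle
    return [r for r in roots if abs(r) < 1]

zeros = inside([0.5, 2.0])
poles = inside([1.0 / 3.0, 3.0])

def L_unscaled(z):
    num = 1.0
    for r in zeros:
        num *= (z - r)
    den = 1.0
    for r in poles:
        den *= (z - r)
    return num / den

# Gain from L(1)^2 = S(1); z = 1 corresponds to omega = 0
K = math.sqrt(S(0.0)) / L_unscaled(1.0)

def L(z):
    return K * L_unscaled(z)

grid = [0.05 * k for k in range(-60, 61)]
print(K, max(abs(abs(L(cmath.exp(1j * w))) ** 2 - S(w)) for w in grid))
```

The gain comes out as $K = 2/3$, so that $L(z) = (2/3)(z - 1/2)/(z - 1/3) = (2z - 1)/(3z - 1)$, as in the example.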
12-2 FINITE-ORDER SYSTEMS AND
STATE VARIABLES
In this section, we consider systems specified in terms of differential equations
or recursion equations. As a preparation, we review briefly the meaning of
finite-order systems and state variables starting with the analog case. The
systems under consideration are multiterminal with $m$ inputs $x_i(t)$ and $r$
outputs $y_j(t)$ forming the column vectors $X(t) = [x_i(t)]$ and $Y(t) = [y_j(t)]$ as in
(10-113).
At a particular time $t = t_0$ the output $Y(t)$ of a system is in general
specified only if the input $X(t)$ is known for every $t$. Thus, to determine $Y(t)$ for
$t > t_0$, we must know $X(t)$ for $t > t_0$ and for $t \le t_0$. For a certain class of
systems, this is not necessary. The values of $Y(t)$ for $t > t_0$ are completely
specified if we know $X(t)$ for $t > t_0$ and, in addition, the values of a finite
number of parameters. These parameters specify the "state" of the system at
time $t = t_0$ in the sense that their values determine the effect of the past $t < t_0$
of $X(t)$ on the future $t > t_0$ of $Y(t)$. The values of these parameters depend on
$t_0$; they are, therefore, functions $z_i(t)$ of $t$. These functions are called state
variables. The number $n$ of state variables is called the order of the system. The
vector
$Z(t) = [z_i(t)] \qquad i = 1, \ldots, n$
is called the state vector; this vector is not unique. We shall say that the system
is in zero state at $t = t_0$ if $Z(t_0) = 0$.
FIGURE 12-4
We shall consider here only linear, time-invariant, real, causal systems.
Such systems are specified in terms of the following equations:
$\frac{dZ(t)}{dt} = AZ(t) + BX(t)$	(12-10a)
$Y(t) = CZ(t) + DX(t)$	(12-10b)
In the above, $A$, $B$, $C$, and $D$ are matrices with real constant elements, of order
$n \times n$, $n \times m$, $r \times n$, and $r \times m$ respectively. In Fig. 12-4 we show a block
diagram of the system $S$ specified terminally in terms of these equations. It
consists of a dynamic system $S_1$ with input $U(t) = BX(t)$ and output $Z(t)$, and of
three memoryless systems (multipliers). If the input $X(t)$ of the system $S$ is
specified for every $t$, or, if $X(t) = 0$ for $t < 0$ and the system is in zero state at
$t = 0$, then the response $Y(t)$ of $S$ for $t > 0$ equals
$Y(t) = \int_0^{\infty} H(\alpha)X(t - \alpha)\,d\alpha$	(12-11)
where $H(t)$ is the impulse response matrix of $S$. This follows from (10-78) and
the fact that $H(t) = 0$ for $t < 0$ (causality assumption).
We shall determine the matrix $H(t)$ starting with the system $S_1$. As we see
from (12-10a), the output $Z(t)$ of this system satisfies the equation
$\frac{dZ(t)}{dt} - AZ(t) = U(t)$	(12-12)
The impulse response of the system $S_1$ is an $n \times n$ matrix $\Phi(t) = [\varphi_{ji}(t)]$ called
the transition matrix of $S$. The function $\varphi_{ji}(t)$ equals the value of the $j$th state
variable $z_j(t)$ when the $i$th element $u_i(t)$ of the input $U(t)$ of $S_1$ equals $\delta(t)$ and
all other elements are 0. From this it follows that [see (10-115)]
$Z(t) = \int_0^{\infty} \Phi(\alpha)U(t - \alpha)\,d\alpha = \int_0^{\infty} \Phi(\alpha)BX(t - \alpha)\,d\alpha$	(12-13)
Inserting into (12-10b), we obtain
$Y(t) = \int_0^{\infty} C\Phi(\alpha)BX(t - \alpha)\,d\alpha + DX(t)$
$= \int_0^{\infty} [C\Phi(\alpha)BX(t - \alpha) + \delta(\alpha)DX(t - \alpha)]\,d\alpha$	(12-14)
where $\delta(t)$ is the (scalar) impulse function. Comparing with (12-11), we
conclude that the impulse response matrix of the system $S$ equals
$H(t) = C\Phi(t)B + \delta(t)D$	(12-15)
From the definition of $\Phi(t)$ it follows that
$\frac{d\Phi(t)}{dt} - A\Phi(t) = \delta(t)1_n$	(12-16)
where $1_n$ is the identity matrix of order $n$. The Laplace transform $\Phi(s)$ of $\Phi(t)$
is the system function of the system $S_1$. Taking transforms of both sides of
(12-16), we obtain
$s\Phi(s) - A\Phi(s) = 1_n \qquad \Phi(s) = (s1_n - A)^{-1}$	(12-17)
Hence
$\Phi(t) = e^{At} \qquad t \ge 0$	(12-18)
This is a direct generalization of the scalar case; however, the determination of
the elements $\varphi_{ji}(t)$ of $\Phi(t)$ is not trivial. Each element is a sum of exponentials
of the form
$\varphi_{ji}(t) = \sum_k p_{ji,k}(t)e^{s_k t} \qquad t > 0$
where $s_k$ are the eigenvalues of the matrix $A$ and $p_{ji,k}(t)$ are polynomials in $t$
of degree equal to the multiplicity of $s_k$. There are several methods for
determining these polynomials. For small $n$, it is simplest to replace (12-16) by
$n$ systems of $n$ scalar equations.
Inserting $\Phi(t)$ into (12-15), we obtain
$H(t) = Ce^{At}B + \delta(t)D$
$H(s) = C(s1_n - A)^{-1}B + D$	(12-19)
Suppose now that the input to the system $S$ is a WSS process $X(t)$. We
shall comment briefly on the spectral properties of the resulting output, limiting
the discussion to the state vector $Z(t)$. The system $S_1$ is a special case of $S$
obtained with $B = C = 1_n$ and $D = 0$. In this case, $Z(t) = Y(t)$ and
$\frac{dY(t)}{dt} - AY(t) = X(t) \qquad H(s) = (s1_n - A)^{-1}$	(12-20)
Inserting into (10-157), we conclude that
S„(s)-S„(s)(-sl„-/l)-'
S,.x(s)-(*1„-/I')''s„,(s)	(12-21)
S„.(5) - (»1, -X')’'S„U)(-*I. -Л)-
Differential equations. The equation
$y^{(n)}(t) + a_1 y^{(n-1)}(t) + \cdots + a_n y(t) = x(t)$	(12-22)
specifies a system $S$ with input $x(t)$ and output $y(t)$. This system is of finite
order because $y(t)$ is determined for $t > 0$ in terms of the values of $x(t)$ for
$t \ge 0$ and the initial conditions
$y(0), y'(0), \ldots, y^{(n-1)}(0)$
It is, in fact, a special case of the system of Fig. 12-4 if we set $m = r = 1$:
$z_1(t) = y(t) \qquad z_2(t) = y'(t) \qquad \cdots \qquad z_n(t) = y^{(n-1)}(t)$
$A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ -a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1 \end{bmatrix} \qquad B = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \qquad C = [1 \;\; 0 \;\; \cdots \;\; 0]$
and $D = 0$. Inserting the above into (12-19), we conclude after some effort that
$H(s) = \frac{1}{s^n + a_1 s^{n-1} + \cdots + a_n}$
This result can be derived simply from (12-22).
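For $n = 2$ the relation $H(s) = C(s1_n - A)^{-1}B + D$ can be checked directly. The Python sketch below assumes the companion form with $z_1 = y$, $z_2 = y'$, that is, $A = [[0, 1], [-a_2, -a_1]]$, $B = [0, 1]^t$, $C = [1, 0]$, $D = 0$, and inverts the $2 \times 2$ matrix by hand; the function name and the test point are hypothetical.

```python
def companion_transfer(a1, a2, s):
    # s*1 - A for A = [[0, 1], [-a2, -a1]]
    m11, m12 = s, -1.0
    m21, m22 = a2, s + a1
    det = m11 * m22 - m12 * m21          # equals s^2 + a1 s + a2
    # Inverse of a 2 x 2 matrix
    inv = [[m22 / det, -m12 / det],
           [-m21 / det, m11 / det]]
    B = [0.0, 1.0]
    C = [1.0, 0.0]
    # v = (s*1 - A)^{-1} B, then H = C v (D = 0)
    v = [inv[0][0] * B[0] + inv[0][1] * B[1],
         inv[1][0] * B[0] + inv[1][1] * B[1]]
    return C[0] * v[0] + C[1] * v[1]

s = 0.7 + 1.3j
H_state = companion_transfer(3.0, 2.0, s)
H_direct = 1.0 / (s * s + 3.0 * s + 2.0)
print(H_state, H_direct)
```

The two values agree, which is the case $y'' + 3y' + 2y = x$ of the general result.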
Multiplying both sides of (12-22) by $x(t - \tau)$ and $y(t + \tau)$, we obtain
$R_{yx}^{(n)}(\tau) + a_1 R_{yx}^{(n-1)}(\tau) + \cdots + a_n R_{yx}(\tau) = R_{xx}(\tau)$	(12-23)
$R_{yy}^{(n)}(\tau) + a_1 R_{yy}^{(n-1)}(\tau) + \cdots + a_n R_{yy}(\tau) = R_{xy}(\tau)$	(12-24)
for all $\tau$. This is a special case of (10-133).
Finite-order processes. We shall say that a process $x(t)$ is of finite order if its
innovations filter $L(s)$ is a rational function of $s$:
$S(s) = L(s)L(-s) \qquad L(s) = \frac{b_0 s^m + b_1 s^{m-1} + \cdots + b_m}{s^n + a_1 s^{n-1} + \cdots + a_n} = \frac{N(s)}{D(s)}$	(12-25)
where $N(s)$ and $D(s)$ are two Hurwitz polynomials. The process $x(t)$ is the
response of the filter $L(s)$ with input the white-noise process $i(t)$:
$x^{(n)}(t) + a_1 x^{(n-1)}(t) + \cdots + a_n x(t) = b_0 i^{(m)}(t) + \cdots + b_m i(t)$	(12-26)
The past $x(t - \tau)$ of $x(t)$ depends only on the past of $i(t)$; hence it is
orthogonal to the right side of (12-26) for every $\tau > 0$. From this it follows as in
(12-24) that
$R^{(n)}(\tau) + a_1 R^{(n-1)}(\tau) + \cdots + a_n R(\tau) = 0 \qquad \tau > 0$	(12-27)
Assuming that the roots $s_i$ of $D(s)$ are simple, we conclude from (12-27) that
$R(\tau) = \sum_{i=1}^{n} \alpha_i e^{s_i\tau} \qquad \tau > 0$
The coefficients $\alpha_i$ can be determined from the initial value theorem.
Alternatively, to find $R(\tau)$, we expand $S(s)$ into partial fractions:
$S(s) = \sum_{i=1}^{n} \frac{\alpha_i}{s - s_i} + \sum_{i=1}^{n} \frac{\alpha_i}{-s - s_i} = S^+(s) + S^-(s)$	(12-28)
The first sum is the transform of the causal part $R^+(\tau) = R(\tau)U(\tau)$ of $R(\tau)$ and
the second of its anticausal part $R^-(\tau) = R(\tau)U(-\tau)$. Since $R(-\tau) = R(\tau)$,
this yields
$R(\tau) = R^+(|\tau|) = \sum_{i=1}^{n} \alpha_i e^{s_i|\tau|}$	(12-29)
Example 12-5. If $L(s) = 1/(s + \alpha)$, then
$S(s) = \frac{1}{(s + \alpha)(-s + \alpha)} = \frac{1/2\alpha}{s + \alpha} + \frac{1/2\alpha}{-s + \alpha}$
Hence $R(\tau) = (1/2\alpha)e^{-\alpha|\tau|}$.
Example 12-6. The differential equation
$x''(t) + 3x'(t) + 2x(t) = i(t) \qquad R_{ii}(\tau) = \delta(\tau)$
specifies a process $x(t)$ with autocorrelation $R(\tau)$. From (12-27) it follows that
$R''(\tau) + 3R'(\tau) + 2R(\tau) = 0$ hence $R(\tau) = c_1 e^{-\tau} + c_2 e^{-2\tau}$
for $\tau > 0$. To find the constants $c_1$ and $c_2$, we shall determine $R(0)$ and $R'(0)$.
Clearly,
$S(s) = \frac{1}{(s^2 + 3s + 2)(s^2 - 3s + 2)} = \frac{s/12 + 1/4}{s^2 + 3s + 2} + \frac{-s/12 + 1/4}{s^2 - 3s + 2}$
The first fraction on the right is the transform of $R^+(\tau)$; hence
$R^+(0^+) = \lim_{s\to\infty} sS^+(s) = \frac{1}{12} = c_1 + c_2 = R(0)$
Similarly,
$R'(0^+) = \lim_{s\to\infty} s\left(sS^+(s) - \frac{1}{12}\right) = 0 = -c_1 - 2c_2$
This yields $R(\tau) = \frac{1}{6}e^{-|\tau|} - \frac{1}{12}e^{-2|\tau|}$.
Note finally that $R(\tau)$ can be expressed in terms of the impulse response $l(t)$
of the innovations filter $L(s)$:
$R(\tau) = l(\tau) * l(-\tau) = \int_0^{\infty} l(|\tau| + \alpha)l(\alpha)\,d\alpha$	(12-30)
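The result of Example 12-6 can be cross-checked against (12-30). The Python sketch below assumes the impulse response $l(t) = e^{-t} - e^{-2t}$ of $L(s) = 1/(s^2 + 3s + 2)$ and evaluates the integral with a simple trapezoidal rule; the integration limit and step count are arbitrary choices of this sketch.

```python
import math

def l(t):
    # Impulse response of L(s) = 1 / ((s + 1)(s + 2)), causal
    return math.exp(-t) - math.exp(-2 * t) if t >= 0 else 0.0

def R_closed(tau):
    # Closed form found in Example 12-6
    return math.exp(-abs(tau)) / 6 - math.exp(-2 * abs(tau)) / 12

def R_numeric(tau, upper=30.0, steps=20000):
    # Trapezoidal approximation of the integral of l(|tau| + a) l(a), a in (0, upper)
    h = upper / steps
    t0 = abs(tau)
    total = 0.5 * (l(t0) * l(0.0) + l(t0 + upper) * l(upper))
    for k in range(1, steps):
        a = k * h
        total += l(t0 + a) * l(a)
    return total * h

for tau in (0.0, 0.5, -1.0, 2.0):
    print(tau, R_closed(tau), R_numeric(tau))
```

The two columns agree to within the accuracy of the quadrature, and $R(0) = 1/12$ as found from the initial value theorem.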
Digital Systems
The digital version of the system of Fig. 12-4 is a finite-order system $S$ specified
by the equations:
$Z[k + 1] = AZ[k] + BX[k]$	(12-31a)
$Y[k] = CZ[k] + DX[k]$	(12-31b)
where $k$ is the discrete time, $X[k]$ the input vector, $Y[k]$ the output vector, and
$Z[k]$ the state vector. The system is stable if the eigenvalues $z_i$ of the $n \times n$
matrix $A$ are such that $|z_i| < 1$. The preceding results can be readily extended
to digital systems. Note, in particular, that the system function of $S$ is the z
transform
$H(z) = C(z1_n - A)^{-1}B + D$	(12-32)
of the delta response matrix
$H[k] = C\Phi[k]B + \delta[k]D \qquad k \ge 0$	(12-33)
We shall discuss in some detail scalar systems driven by white noise. This
material is used in Sec. 13-3.
Finite-order processes. Consider a real digital process $x[n]$ with innovations
filter $L(z)$ and power spectrum $S(z)$:
$S(z) = L(z)L(1/z) \qquad L(z) = \sum_{n=0}^{\infty} l[n]z^{-n}$	(12-34)
where $n$ is now the discrete time. If we know $L(z)$, we can find the
autocorrelation $R[m]$ of $x[n]$ either from the inversion formula (10-179) or from the
convolution theorem
$R[m] = l[m] * l[-m] = \sum_{k=0}^{\infty} l[|m| + k]l[k]$	(12-35)
We shall discuss the properties of $R[m]$ for the class of finite-order processes.
The power spectrum $S(\omega)$ of a finite-order process $x[n]$ is a rational
function of $\cos\omega$; hence its innovations filter is a rational function of $z$:
$L(z) = \frac{N(z)}{D(z)} = \frac{b_0 + b_1 z^{-1} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}}$	(12-36)
To find its autocorrelation, we determine $l[n]$ and insert the result into (12-35).
Assuming that the roots $z_i$ of $D(z)$ are simple and $M < N$, we obtain
$L(z) = \sum_i \frac{\gamma_i}{1 - z_i z^{-1}} \qquad l[n] = \sum_i \gamma_i z_i^n U[n]$
Alternatively, we expand $S(z)$:
$S(z) = \sum_i \frac{\alpha_i}{1 - z_i z^{-1}} + \sum_i \frac{\alpha_i z_i z}{1 - z_i z} \qquad R[m] = \sum_i \alpha_i z_i^{|m|}$	(12-37)
Note that $\alpha_i = \gamma_i L(1/z_i)$.
The process $x[n]$ satisfies the recursion equation
$x[n] + a_1 x[n - 1] + \cdots + a_N x[n - N] = b_0 i[n] + \cdots + b_M i[n - M]$	(12-38)
where $i[n]$ is its innovations. We shall use this equation to relate the coefficients
of $L(z)$ to the sequence $R[m]$ starting with two special cases.
Autoregressive processes. The process $x[n]$ is called autoregressive (AR) if
$$L(z) = \frac{b_0}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}} \tag{12-39}$$
In this case, (12-38) yields
$$x[n] + a_1 x[n-1] + \cdots + a_N x[n-N] = b_0 i[n] \tag{12-40}$$
The past $x[n-m]$ of $x[n]$ depends only on the past of $i[n]$; furthermore,
$E\{i^2[n]\} = 1$. From this it follows that $E\{x[n]i[n]\} = b_0$ and $E\{x[n-m]i[n]\} = 0$
for $m > 0$. Multiplying (12-40) by $x[n-m]$ and setting $m = 0, 1, \ldots$, we obtain the
equations
$$\begin{aligned}
R[0] + a_1 R[1] + \cdots + a_N R[N] &= b_0^2 \\
R[1] + a_1 R[0] + \cdots + a_N R[N-1] &= 0 \\
&\;\;\vdots \\
R[N] + a_1 R[N-1] + \cdots + a_N R[0] &= 0
\end{aligned} \tag{12-41a}$$
and
$$R[m] + a_1 R[m-1] + \cdots + a_N R[m-N] = 0 \tag{12-41b}$$
for $m > N$. The first $N + 1$ of these are called the Yule-Walker equations.
They are used in Sec. 13-3 to express the $N + 1$ parameters $a_k$ and $b_0$ in terms
of the first $N + 1$ values of $R[m]$. Conversely, if $L(z)$ is known, we find $R[m]$
for $|m| \le N$ by solving the system (12-41a), and we determine $R[m]$ recursively
from (12-41b) for $m > N$.
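As an illustration of how the Yule-Walker equations are used, the following sketch solves the $N$ homogeneous equations for $a_1, \ldots, a_N$ and then reads $b_0^2$ from the $m = 0$ equation (the right side of that equation is $b_0 E\{x[n]i[n]\} = b_0^2$). The function name `yule_walker` and the numerical values are assumptions made here for the example, not part of the text.

```python
import numpy as np

def yule_walker(R, N):
    """Return (a, b0_sq) from the autocorrelation values R[0..N].

    Rows m = 1..N of the Yule-Walker system read
        sum_{k=1..N} a_k R[|m - k|] = -R[m],
    and the m = 0 row gives b0^2 = R[0] + sum_k a_k R[k].
    """
    R = np.asarray(R, dtype=float)
    T = np.array([[R[abs(m - k)] for k in range(1, N + 1)]
                  for m in range(1, N + 1)])
    a = np.linalg.solve(T, -R[1:N + 1])
    b0_sq = R[0] + a @ R[1:N + 1]
    return a, b0_sq

# First-order check: x[n] - a x[n-1] = v[n] with noise power b has
# R[m] = b a^|m| / (1 - a^2); in the notation above a_1 = -a and b0^2 = b.
a_true, b = 0.6, 2.0
R = [b * a_true ** m / (1 - a_true ** 2) for m in range(2)]
a_est, b0_sq = yule_walker(R, 1)
```

In practice the exact values $R[m]$ are replaced by estimates, which is the use made of these equations in spectral estimation.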
Example 12-7. Suppose that
$$x[n] - a x[n-1] = v[n] \qquad R_{vv}[m] = b\,\delta[m]$$
This is a special case of (12-40) with $D(z) = 1 - az^{-1}$ and $z_1 = a$. Hence
$$R[0] - aR[1] = b \qquad R[m] = \alpha a^{|m|} \qquad \alpha = \frac{b}{1 - a^2}$$
Line spectra. Suppose that $x[n]$ satisfies the homogeneous equation
$$x[n] + a_1 x[n-1] + \cdots + a_N x[n-N] = 0 \tag{12-42}$$
This is a special case of (12-40) if we set $b_0 = 0$. Solving for $x[n]$, we obtain
$$x[n] = c_1 z_1^n + \cdots + c_N z_N^n \qquad D(z_i) = 0 \tag{12-43}$$
If $x[n]$ is a stationary process, only the terms with $z_i = e^{j\omega_i}$ can appear.
Furthermore, their coefficients $c_i$ must be uncorrelated with zero mean. From
this it follows that if x[n] is a WSS process satisfying (12-42), its autocorrelation
must be a sum of exponentials as in Example 10-31:
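$$R[m] = \sum_i \alpha_i e^{j\omega_i m} \qquad S(\omega) = 2\pi \sum_i \alpha_i \delta(\omega - \beta_i) \qquad |\omega| < \pi \tag{12-44}$$
where $\alpha_i = E\{c_i^2\}$ and $\beta_i = \omega_i - 2\pi k_i$ as in (10-182).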
Moving average processes. A process $x[n]$ is a moving average (MA) if
$$x[n] = b_0 i[n] + \cdots + b_M i[n-M] \tag{12-45}$$
In this case, $L(z)$ is a polynomial and its inverse $l[n]$ has a finite length (FIR
filter):
$$L(z) = b_0 + b_1 z^{-1} + \cdots + b_M z^{-M} \qquad l[n] = b_0\delta[n] + \cdots + b_M\delta[n-M] \tag{12-46}$$
Since $l[n] = 0$ for $n > M$, (12-35) yields
$$R[m] = \sum_{k=0}^{M-m} l[m+k]\, l[k] = \sum_{k=0}^{M-m} b_{k+m} b_k \tag{12-47}$$
for $0 \le m \le M$ and 0 for $m > M$. Explicitly,
$$\begin{aligned}
R[0] &= b_0^2 + b_1^2 + \cdots + b_M^2 \\
R[1] &= b_0 b_1 + b_1 b_2 + \cdots + b_{M-1} b_M \\
&\;\;\vdots \\
R[M] &= b_0 b_M
\end{aligned}$$
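The finite sum for the autocorrelation of an MA process is easy to evaluate directly. The sketch below, with coefficients chosen arbitrarily for illustration, computes $R[m] = \sum_k b_{k+m} b_k$ and compares it with NumPy's full correlation of the coefficient sequence with itself.

```python
import numpy as np

def ma_autocorrelation(b):
    """R[m] = sum_{k=0}^{M-m} b[k+m] b[k] for m = 0..M (zero beyond M)."""
    b = np.asarray(b, dtype=float)
    M = len(b) - 1
    return np.array([np.dot(b[m:], b[:M + 1 - m]) for m in range(M + 1)])

b = [1.0, 0.5, -0.25]                      # hypothetical b_0, b_1, b_2
R = ma_autocorrelation(b)

# np.correlate(..., mode="full") returns lags -M..M; keep lags 0..M.
R_full = np.correlate(b, b, mode="full")
R_from_correlate = R_full[len(b) - 1:]
```

Here `R[0]` is the sum of the squared coefficients and `R[M]` is the single product $b_0 b_M$, matching the explicit expressions above.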
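Example 12-8. Suppose that $x[n]$ is the arithmetic average of the $M$ values of $i[n]$:
$$x[n] = \frac{1}{M}\left(i[n] + i[n-1] + \cdots + i[n-M+1]\right)$$
In this case,
$$L(z) = \frac{1}{M}\left(1 + z^{-1} + \cdots + z^{-M+1}\right) = \frac{1 - z^{-M}}{M(1 - z^{-1})}$$
$$S(z) = L(z)L(1/z) = \frac{2 - z^{-M} - z^{M}}{M^2(2 - z^{-1} - z)} \qquad S(e^{j\omega}) = \frac{\sin^2(M\omega/2)}{M^2 \sin^2(\omega/2)}$$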
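The closed form for the spectrum of the averaging filter can be verified on the unit circle, where $L(1/z)$ is the conjugate of $L(z)$ for real coefficients, so that $S(e^{j\omega}) = |L(e^{j\omega})|^2$. The values of `M` and the frequency grid below are arbitrary choices for this sketch.

```python
import numpy as np

M = 5
w = np.linspace(0.1, 3.0, 50)      # avoid w = 0, where the closed form is 0/0
z = np.exp(1j * w)

# L(z) = (1/M) (1 + z^-1 + ... + z^-(M-1)) evaluated on the unit circle.
L = sum(z ** (-k) for k in range(M)) / M
S_direct = np.abs(L) ** 2

# Closed form sin^2(M w / 2) / (M^2 sin^2(w / 2)).
S_closed = np.sin(M * w / 2) ** 2 / (M ** 2 * np.sin(w / 2) ** 2)
```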
Autoregressive moving average. We shall say that $x[n]$ is an ARMA process if it
satisfies the equation
$$x[n] + a_1 x[n-1] + \cdots + a_N x[n-N] = b_0 i[n] + \cdots + b_M i[n-M] \tag{12-48}$$
Its innovations filter $L(z)$ is the fraction in (12-36). Again, $i[n]$ is white noise and
$x[n-m]$ depends only on $i[n-m]$ and its past; hence
$$E\{x[n-m]\, i[n-r]\} = 0 \qquad \text{for } m > r$$
Multiplying (12-48) by $x[n-m]$ and using the above, we conclude that
$$R[m] + a_1 R[m-1] + \cdots + a_N R[m-N] = 0 \qquad m > M \tag{12-49}$$
Note that, unlike the AR case, this is true only for $m > M$.
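The restriction $m > M$ can be seen numerically. The sketch below builds the delta response $l[n]$ of a hypothetical ARMA filter with $N = 2$, $M = 1$ (coefficients chosen here so that the poles are 0.4 and 0.5), forms $R[m] = \sum_k l[m+k]l[k]$ from a long truncation, and evaluates the left side of the recursion for several lags.

```python
import numpy as np

def innovations_delta_response(b, a, n_terms):
    """l[n] from l[n] = b_n - sum_k a_k l[n-k]; a = [a_1..a_N], b = [b_0..b_M]."""
    l = np.zeros(n_terms)
    for n in range(n_terms):
        acc = b[n] if n < len(b) else 0.0
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * l[n - k]
        l[n] = acc
    return l

def autocorrelation(l, max_lag):
    """R[m] = sum_k l[m+k] l[k], using the truncated delta response."""
    return np.array([np.dot(l[m:], l[:len(l) - m]) for m in range(max_lag + 1)])

a = [-0.9, 0.2]        # D(z) = 1 - 0.9 z^-1 + 0.2 z^-2, poles at 0.4 and 0.5
b = [1.0, 0.5]         # M = 1
M = len(b) - 1

l = innovations_delta_response(b, a, 400)
R = autocorrelation(l, 6)

# Left side of the recursion for m = M+1, ..., 6: should vanish.
residual = [R[m] + a[0] * R[m - 1] + a[1] * R[m - 2] for m in range(M + 1, 7)]

# At m = 1 <= M the same expression is not zero (R[-1] = R[1]).
lhs_m1 = R[1] + a[0] * R[0] + a[1] * R[1]
```

For lags $m \le M$ the right side of the equation picks up terms in $b_m l[0], \ldots$, so the homogeneous recursion fails there; in this example `lhs_m1` equals $b_1 l[0] = 0.5$.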
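12-3 FOURIER SERIES AND
KARHUNEN-LOEVE EXPANSIONS
A process $x(t)$ is MS periodic with period $T$ if $E\{|x(t+T) - x(t)|^2\} = 0$ for all
$t$. A WSS process is MS periodic if its autocorrelation $R(\tau)$ is periodic with
period $T = 2\pi/\omega_0$ [see (10-165)]. Expanding $R(\tau)$ into Fourier series, we
obtain
$$R(\tau) = \sum_{n=-\infty}^{\infty} \gamma_n e^{jn\omega_0\tau} \qquad \gamma_n = \frac{1}{T}\int_0^T R(\tau) e^{-jn\omega_0\tau}\, d\tau \tag{12-50}$$
Given a WSS periodic process $x(t)$ with period $T$, we form the sum
$$\hat{x}(t) = \sum_{n=-\infty}^{\infty} c_n e^{jn\omega_0 t} \qquad c_n = \frac{1}{T}\int_0^T x(t) e^{-jn\omega_0 t}\, dt \tag{12-51}$$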
THEOREM. The above sum equals $x(t)$ in the MS sense:
$$E\{|x(t) - \hat{x}(t)|^2\} = 0 \tag{12-52}$$
Furthermore, the RVs $c_n$ are uncorrelated with zero mean for $n \ne 0$, and their
variance equals $\gamma_n$:
$$E\{c_n\} = \begin{cases} \eta & n = 0 \\ 0 & n \ne 0 \end{cases} \qquad E\{c_n c_m^*\} = \begin{cases} \gamma_n & n = m \\ 0 & n \ne m \end{cases} \tag{12-53}$$
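Proof. We form the products
$$c_n x^*(\alpha) = \frac{1}{T}\int_0^T x(t) x^*(\alpha) e^{-jn\omega_0 t}\, dt \qquad c_n c_m^* = \frac{1}{T}\int_0^T c_n x^*(t) e^{jm\omega_0 t}\, dt$$
and we take expected values. This yields
$$E\{c_n x^*(\alpha)\} = \frac{1}{T}\int_0^T R(t - \alpha) e^{-jn\omega_0 t}\, dt = \gamma_n e^{-jn\omega_0\alpha}$$
$$E\{c_n c_m^*\} = \frac{1}{T}\int_0^T \gamma_n e^{-jn\omega_0 t} e^{jm\omega_0 t}\, dt = \begin{cases} \gamma_n & n = m \\ 0 & n \ne m \end{cases}$$
and (12-53) results.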
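To prove (12-52), we observe, using the above, that
$$E\{|\hat{x}(t)|^2\} = \sum_n E\{|c_n|^2\} = \sum_n \gamma_n = R(0) = E\{|x(t)|^2\}$$
$$E\{\hat{x}(t) x^*(t)\} = \sum_n E\{c_n x^*(t)\} e^{jn\omega_0 t} = \sum_n \gamma_n = E\{\hat{x}^*(t) x(t)\}$$
and (12-52) follows readily.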
Suppose now that the WSS process $x(t)$ is not periodic. Selecting an
arbitrary constant $T$, we form again the sum $\hat{x}(t)$ as in (12-51). It can be shown
that (see Prob. 12-11) $\hat{x}(t)$ equals $x(t)$ not for all $t$, but only in the interval $(0, T)$:
$$E\{|x(t) - \hat{x}(t)|^2\} = 0 \qquad 0 < t < T \tag{12-54}$$
Unlike the periodic case, however, the coefficients $c_n$ of this expansion are not
orthogonal (they are nearly orthogonal for large $n$). In the following, we show
that an arbitrary process $x(t)$, stationary or not, can be expanded into a series
with orthogonal coefficients.
The Karhunen-Loeve Expansion
The Fourier series is a special case of the expansion of a process $x(t)$ into a
series of the form
$$\hat{x}(t) = \sum_{n=1}^{\infty} c_n \varphi_n(t) \qquad 0 < t < T \tag{12-55}$$
where $\varphi_n(t)$ is a set of orthonormal functions in the interval $(0, T)$:
$$\int_0^T \varphi_n(t) \varphi_m^*(t)\, dt = \delta[n - m] \tag{12-56}$$
and the coefficients $c_n$ are RVs given by
$$c_n = \int_0^T x(t) \varphi_n^*(t)\, dt \tag{12-57}$$
In the following, we consider the problem of determining a set of orthonormal
functions $\varphi_n(t)$ such that: (a) the sum in (12-55) equals $x(t)$; (b) the coefficients
$c_n$ are orthogonal.
To solve this problem, we form the integral equation
$$\int_0^T R(t_1, t_2) \varphi(t_2)\, dt_2 = \lambda \varphi(t_1) \qquad 0 < t_1 < T \tag{12-58}$$
where $R(t_1, t_2)$ is the autocorrelation of the process $x(t)$. It is well known from
the theory of integral equations that the eigenfunctions $\varphi_n(t)$ of (12-58) are
orthonormal as in (12-56) and they satisfy the identity
$$R(t, t) = \sum_{n=1}^{\infty} \lambda_n |\varphi_n(t)|^2 \tag{12-59}$$
where $\lambda_n$ are the corresponding eigenvalues. This is a consequence of the p.d.
character of $R(t_1, t_2)$.
Using the above, we shall show that if $\varphi_n(t)$ are the eigenfunctions of
(12-58) then
$$E\{|x(t) - \hat{x}(t)|^2\} = 0 \qquad 0 < t < T \tag{12-60}$$
and
$$E\{c_n c_m^*\} = \lambda_n \delta[n - m] \tag{12-61}$$
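Proof. From (12-57) and (12-58) it follows that
$$E\{c_n x^*(\alpha)\} = \int_0^T R^*(\alpha, t) \varphi_n^*(t)\, dt = \lambda_n \varphi_n^*(\alpha)$$
$$E\{c_n c_m^*\} = \lambda_n \int_0^T \varphi_n^*(t) \varphi_m(t)\, dt = \lambda_n \delta[n - m] \tag{12-62}$$
Hence
$$E\{c_n \hat{x}^*(t)\} = \sum_{m=1}^{\infty} E\{c_n c_m^*\} \varphi_m^*(t) = \lambda_n \varphi_n^*(t)$$
$$E\{\hat{x}(t) x^*(t)\} = \sum_{n=1}^{\infty} \lambda_n \varphi_n^*(t) \varphi_n(t) = R(t, t)$$
$$= E\{\hat{x}^*(t) x(t)\} = E\{|x(t)|^2\} = E\{|\hat{x}(t)|^2\}$$
and (12-60) results.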
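It is of interest to note that the converse of the above is also true: if $\varphi_n(t)$
is an orthonormal set of functions and
$$x(t) = \sum_{n=1}^{\infty} c_n \varphi_n(t) \qquad E\{c_n c_m^*\} = \begin{cases} \sigma_n^2 & n = m \\ 0 & n \ne m \end{cases}$$
then the functions $\varphi_n(t)$ must satisfy (12-58) with $\lambda = \sigma_n^2$.
Proof. From the assumptions it follows that $c_n$ is given by (12-57). Furthermore,
$$E\{x(t) c_m^*\} = \sum_{n=1}^{\infty} E\{c_n c_m^*\} \varphi_n(t) = \sigma_m^2 \varphi_m(t)$$
$$E\{x(t) c_m^*\} = \int_0^T E\{x(t) x^*(\alpha)\} \varphi_m(\alpha)\, d\alpha = \int_0^T R(t, \alpha) \varphi_m(\alpha)\, d\alpha$$
This completes the proof.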
The sum in (12-55) is called the Karhunen-Loeve (K-L) expansion of the
process $x(t)$. In this expansion, $x(t)$ need not be stationary. If it is stationary,
then the origin can be chosen arbitrarily. We shall illustrate with two examples.
Example 12-9. Suppose that the process $x(t)$ is ideal low-pass with autocorrelation
$$R(\tau) = \frac{\sin a\tau}{\pi\tau}$$
We shall find its K-L expansion. Shifting the origin appropriately, we conclude
from (12-58) that the functions $\varphi_n(t)$ must satisfy the integral equation
$$\int_{-T/2}^{T/2} \frac{\sin a(t - \tau)}{\pi(t - \tau)} \varphi_n(\tau)\, d\tau = \lambda_n \varphi_n(t) \tag{12-63}$$
The solutions of this equation are known as prolate-spheroidal functions.†
†D. Slepian, H. J. Landau, and H. O. Pollak: "Prolate Spheroidal Wave Functions," Bell System
Technical Journal, vol. 40, 1961.
Example 12-10. We shall determine the K-L expansion (12-55) of the Wiener
process $w(t)$ introduced in Sec. 11-1. In this case [see (11-5)]
$$R(t_1, t_2) = \alpha \min(t_1, t_2) = \begin{cases} \alpha t_2 & t_2 < t_1 \\ \alpha t_1 & t_2 > t_1 \end{cases}$$
Inserting into (12-58), we obtain
$$\alpha \int_0^{t_1} t_2 \varphi(t_2)\, dt_2 + \alpha t_1 \int_{t_1}^{T} \varphi(t_2)\, dt_2 = \lambda \varphi(t_1) \tag{12-64}$$
To solve the above integral equation, we evaluate the appropriate endpoint
conditions and differentiate twice. This yields
$$\varphi(0) = 0 \qquad \alpha \int_{t_1}^{T} \varphi(t_2)\, dt_2 = \lambda \varphi'(t_1)$$
$$\varphi'(T) = 0 \qquad \lambda \varphi''(t) + \alpha \varphi(t) = 0$$
Solving the last equation, we obtain
$$\varphi_n(t) = \sqrt{\frac{2}{T}} \sin \omega_n t \qquad \omega_n = \sqrt{\frac{\alpha}{\lambda_n}} = \frac{(2n + 1)\pi}{2T}$$
Thus, in the interval $(0, T)$, the Wiener process can be written as a sum of
sine waves
$$w(t) = \sqrt{\frac{2}{T}} \sum_n c_n \sin \omega_n t \qquad c_n = \sqrt{\frac{2}{T}} \int_0^T w(t) \sin \omega_n t\, dt$$
where the coefficients $c_n$ are uncorrelated with variance $E\{c_n^2\} = \lambda_n$.
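The K-L eigenvalues of the Wiener process can be approximated numerically by discretizing the integral equation on a grid and solving a symmetric matrix eigenproblem. The sketch below is one possible approach (a midpoint-rule discretization chosen here, not a method described in the text); the values of `alpha`, `T` and the grid size are arbitrary. It compares the leading eigenvalues with $\lambda_n = \alpha/\omega_n^2$, $\omega_n = (2n+1)\pi/2T$, $n = 0, 1, \ldots$

```python
import numpy as np

alpha, T, n = 2.0, 1.0, 800
dt = T / n
t = (np.arange(n) + 0.5) * dt                 # midpoint grid on (0, T)

# Autocorrelation of the Wiener process: R(t1, t2) = alpha * min(t1, t2).
R = alpha * np.minimum.outer(t, t)

# Midpoint-rule discretization of the integral equation: (R dt) phi = lambda phi.
lam, phi = np.linalg.eigh(R * dt)
lam = lam[::-1]                               # largest eigenvalue first
phi = phi[:, ::-1] / np.sqrt(dt)              # scale so that sum(phi^2) dt = 1

# Analytic eigenvalues and first eigenfunction.
k = np.arange(5)
omega = (2 * k + 1) * np.pi / (2 * T)
lam_exact = alpha / omega ** 2
phi0_exact = np.sqrt(2 / T) * np.sin(omega[0] * t)

# Inner product of numerical and analytic first eigenfunctions (sign-free).
overlap = abs(np.sum(phi[:, 0] * phi0_exact) * dt)
```

The eigenvalues decay like $1/(2n+1)^2$, so a few sine terms already capture most of the average power of $w(t)$ in $(0, T)$.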
12-4 SPECTRAL REPRESENTATION OF
RANDOM PROCESSES
The Fourier transform of a stochastic process $x(t)$ is a stochastic process $X(\omega)$
given by
$$X(\omega) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t}\, dt \tag{12-65}$$
The integral is interpreted as an MS limit. Reasoning as in (12-52), we can show
that (inversion formula)
$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega) e^{j\omega t}\, d\omega \tag{12-66}$$
in the MS sense. The properties of Fourier transforms also hold for random
signals. For example, if $y(t)$ is the output of a linear system with input $x(t)$ and
system function $H(\omega)$, then $Y(\omega) = X(\omega)H(\omega)$.
The mean of $X(\omega)$ equals the Fourier transform of the mean of $x(t)$. We
shall express the autocorrelation of $X(\omega)$ in terms of the two-dimensional
Fourier transform
$$\Gamma(u, v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R(t_1, t_2) e^{-j(ut_1 + vt_2)}\, dt_1\, dt_2 \tag{12-67}$$
of the autocorrelation $R(t_1, t_2)$ of $x(t)$. Multiplying (12-65) by its conjugate and
taking expected values, we obtain
$$E\{X(u)X^*(v)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E\{x(t_1)x^*(t_2)\} e^{-j(ut_1 - vt_2)}\, dt_1\, dt_2$$
Hence
$$E\{X(u)X^*(v)\} = \Gamma(u, -v) \tag{12-68}$$
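Using (12-68), we shall show that, if $x(t)$ is nonstationary white noise with
average power $q(t)$, then $X(\omega)$ is a stationary process and its autocorrelation
equals the Fourier transform $Q(\omega)$ of $q(t)$:
THEOREM. If $R(t_1, t_2) = q(t_1)\delta(t_1 - t_2)$, then
$$E\{X(\omega + \alpha)X^*(\alpha)\} = Q(\omega) = \int_{-\infty}^{\infty} q(t) e^{-j\omega t}\, dt \tag{12-69}$$
Proof. From the identity
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} q(t_1)\delta(t_1 - t_2) e^{-j(ut_1 + vt_2)}\, dt_1\, dt_2 = \int_{-\infty}^{\infty} q(t_2) e^{-j(u+v)t_2}\, dt_2$$
it follows that $\Gamma(u, v) = Q(u + v)$. Hence [see (12-68)]
$$E\{X(\omega + \alpha)X^*(\alpha)\} = \Gamma(\omega + \alpha, -\alpha) = Q(\omega)$$
Note that if the process $x(t)$ is real, then
$$E\{X(u)X(v)\} = \Gamma(u, v) \tag{12-70}$$
Furthermore,
$$X(-\omega) = X^*(\omega) \qquad \Gamma(-u, -v) = \Gamma^*(u, v) \tag{12-71}$$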
Covariance of energy spectrum. To find the autocovariance of $|X(\omega)|^2$, we must
know the fourth-order moments of $X(\omega)$. However, if the process $x(t)$ is normal,
the results can be expressed in terms of the function $\Gamma(u, v)$. We shall assume
that the process $x(t)$ is real with
$$X(\omega) = A(\omega) + jB(\omega) \qquad \Gamma(u, v) = \Gamma_r(u, v) + j\Gamma_i(u, v) \tag{12-72}$$
From (12-68) and (12-70) it follows that
$$\begin{aligned}
2E\{A(u)A(v)\} &= \Gamma_r(u, v) + \Gamma_r(u, -v) \\
2E\{A(v)B(u)\} &= \Gamma_i(u, v) + \Gamma_i(u, -v) \\
2E\{B(u)B(v)\} &= \Gamma_r(u, -v) - \Gamma_r(u, v) \\
2E\{A(u)B(v)\} &= \Gamma_i(u, v) - \Gamma_i(u, -v)
\end{aligned} \tag{12-73}$$
THEOREM. If $x(t)$ is a real normal process with zero mean, then
$$\operatorname{Cov}\{|X(u)|^2, |X(v)|^2\} = |\Gamma(u, -v)|^2 + |\Gamma(u, v)|^2 \tag{12-74}$$
Proof. From the normality of $x(t)$ it follows that the processes $A(\omega)$ and $B(\omega)$
are jointly normal with zero mean. Hence [see (7-36)]
$$\begin{aligned}
E\{|X(u)|^2 |X(v)|^2\} &- E\{|X(u)|^2\} E\{|X(v)|^2\} \\
&= E\{[A^2(u) + B^2(u)][A^2(v) + B^2(v)]\} - E\{A^2(u) + B^2(u)\} E\{A^2(v) + B^2(v)\} \\
&= 2E^2\{A(u)A(v)\} + 2E^2\{B(u)B(v)\} + 2E^2\{A(u)B(v)\} + 2E^2\{A(v)B(u)\}
\end{aligned}$$
Inserting (12-73) into the above, we obtain (12-74).
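STATIONARY PROCESSES. Suppose that $x(t)$ is a stationary process with autocorrelation
$R(t_1, t_2) = R(t_1 - t_2)$ and power spectrum $S(\omega)$. We shall show that
$$\Gamma(u, v) = 2\pi S(u)\delta(u + v) \tag{12-75}$$
Proof. With $t_1 = t_2 + \tau$, it follows from (12-67) that the two-dimensional transform
of $R(t_1 - t_2)$ equals
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R(t_1 - t_2) e^{-j(ut_1 + vt_2)}\, dt_1\, dt_2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R(\tau) e^{-ju\tau} e^{-j(u+v)t_2}\, d\tau\, dt_2$$
Hence
$$\Gamma(u, v) = S(u) \int_{-\infty}^{\infty} e^{-j(u+v)t_2}\, dt_2$$
This yields (12-75) because $\int e^{-j\omega t}\, dt = 2\pi\delta(\omega)$.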
From (12-75) and (12-68) it follows that
$$E\{X(u)X^*(v)\} = 2\pi S(u)\delta(u - v) \tag{12-76}$$
This shows that the Fourier transform of a stationary process is nonstationary
white noise with average power $2\pi S(u)$. It can be shown that the converse is
also true (see Prob. 12-12): The process $x(t)$ in (12-66) is WSS iff $E\{X(\omega)\} = 0$
for $\omega \ne 0$, and
$$E\{X(u)X^*(v)\} = Q(u)\delta(u - v) \tag{12-77}$$
Real processes. If $x(t)$ is real, then $A(-\omega) = A(\omega)$, $B(-\omega) = -B(\omega)$, and
$$x(t) = \frac{1}{\pi} \int_0^{\infty} A(\omega) \cos \omega t\, d\omega - \frac{1}{\pi} \int_0^{\infty} B(\omega) \sin \omega t\, d\omega \tag{12-78}$$
It suffices, therefore, to specify $A(\omega)$ and $B(\omega)$ for $\omega \ge 0$ only. From (12-68) and
(12-70) it follows that
$$E\{[A(u) + jB(u)][A(v) \pm jB(v)]\} = 0 \qquad u \ne \pm v$$
Equating real and imaginary parts, we obtain
$$E\{A(u)A(v)\} = E\{A(u)B(v)\} = E\{B(u)B(v)\} = 0 \qquad \text{for } u \ne v \tag{12-79a}$$
With $u = \omega$ and $v = -\omega$, (12-76) yields $E\{X(\omega)X(\omega)\} = 0$ for $\omega \ne 0$; hence
$$E\{A^2(\omega)\} = E\{B^2(\omega)\} \qquad E\{A(\omega)B(\omega)\} = 0 \tag{12-79b}$$
It can be shown that the converse is also true (see Prob. 12-13). Thus a
real process $x(t)$ is WSS if the coefficients $A(\omega)$ and $B(\omega)$ of its expansion
(12-78) satisfy (12-79) and $E\{A(\omega)\} = E\{B(\omega)\} = 0$ for $\omega \ne 0$.
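Windows. Given a WSS process $x(t)$ and a function $w(t)$ with Fourier transform
$W(\omega)$, we form the process $y(t) = w(t)x(t)$. This process is nonstationary with
autocorrelation
$$R_{yy}(t_1, t_2) = w(t_1) w^*(t_2) R(t_1 - t_2)$$
The Fourier transform of $R_{yy}(t_1, t_2)$ equals
$$\Gamma_{yy}(u, v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w(t_1) w^*(t_2) R(t_1 - t_2) e^{-j(ut_1 + vt_2)}\, dt_1\, dt_2$$
Proceeding as in the proof of (12-75), we obtain
$$\Gamma_{yy}(u, v) = \frac{1}{2\pi} \int_{-\infty}^{\infty} W(u - \beta) W^*(-v - \beta) S(\beta)\, d\beta \tag{12-80}$$
From (12-68) and the above it follows that the autocorrelation of the Fourier
transform
$$Y(\omega) = \int_{-\infty}^{\infty} w(t) x(t) e^{-j\omega t}\, dt \tag{12-81}$$
of $y(t)$ equals
$$E\{Y(u)Y^*(v)\} = \Gamma_{yy}(u, -v) = \frac{1}{2\pi} \int_{-\infty}^{\infty} W(u - \beta) W^*(v - \beta) S(\beta)\, d\beta$$
Hence
$$E\{|Y(\omega)|^2\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} |W(\omega - \beta)|^2 S(\beta)\, d\beta \tag{12-82}$$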
Example 12-11. The integral
$$X_T(\omega) = \int_{-T}^{T} x(t) e^{-j\omega t}\, dt$$
is the transform of the segment $x(t)p_T(t)$ of the process $x(t)$. This is a special case
of (12-81) with $w(t) = p_T(t)$ and $W(\omega) = 2\sin T\omega/\omega$. If, therefore, $x(t)$ is a
stationary process, then [see (12-82)]
$$E\{|X_T(\omega)|^2\} = S(\omega) * \frac{2\sin^2 T\omega}{\pi\omega^2} \tag{12-83}$$
Fourier-Stieltjes Representation of WSS
Processes†
We shall express the spectral representation of a WSS process $x(t)$ in terms of
the integral
$$Z(\omega) = \int_0^{\omega} X(\alpha)\, d\alpha \tag{12-84}$$
We have shown that the Fourier transform $X(\omega)$ of $x(t)$ is nonstationary white
noise with average power $2\pi S(\omega)$. From (12-76) it follows that $Z(\omega)$ is a
process with orthogonal increments:
For any $\omega_1 < \omega_2 < \omega_3 < \omega_4$:
$$E\{[Z(\omega_2) - Z(\omega_1)][Z^*(\omega_4) - Z^*(\omega_3)]\} = 0 \tag{12-85a}$$
$$E\{|Z(\omega_2) - Z(\omega_1)|^2\} = 2\pi \int_{\omega_1}^{\omega_2} S(\omega)\, d\omega \tag{12-85b}$$
Clearly,
$$dZ(\omega) = X(\omega)\, d\omega \tag{12-86}$$
hence the inversion formula (12-66) can be written as a Fourier-Stieltjes
integral:
$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{j\omega t}\, dZ(\omega) \tag{12-87}$$
With $\omega_1 = u$, $\omega_2 = u + du$ and $\omega_3 = v$, $\omega_4 = v + dv$, (12-85) yields
$$E\{dZ(u)\, dZ^*(v)\} = 0 \quad u \ne v \qquad E\{|dZ(u)|^2\} = 2\pi S(u)\, du \tag{12-88}$$
The last equation can be used to define the spectrum $S(\omega)$ of a WSS process $x(t)$
in terms of the process $Z(\omega)$.
†H. Cramér: Mathematical Methods of Statistics, Princeton Univ. Press, Princeton, N.J., 1946.
WOLD'S DECOMPOSITION. Using (12-85), we shall show that an arbitrary WSS
process $x(t)$ can be written as a sum:
$$x(t) = x_r(t) + x_p(t) \tag{12-89}$$
where $x_r(t)$ is a regular process and $x_p(t)$ is a predictable process consisting of
exponentials:
$$x_p(t) = c_0 + \sum_i c_i e^{j\omega_i t} \qquad E\{c_i\} = 0 \tag{12-90}$$
Furthermore, the two processes are orthogonal:
$$E\{x_r(t + \tau) x_p^*(t)\} = 0 \tag{12-91}$$
This expansion is called Wold's decomposition. In Sec. 14-2, we determine the
processes $x_r(t)$ and $x_p(t)$ as the responses of two linear systems with input $x(t)$.
We also show that $x_p(t)$ is predictable in the sense that it is determined in terms
of its past; the process $x_r(t)$ is not predictable.
We shall prove (12-89) using the properties of the integrated transform
$Z(\omega)$ of $x(t)$. The process $Z(\omega)$ is a family of functions. In general, these
functions are discontinuous at a set of points $\omega_i$ for almost every outcome. We
expand $Z(\omega)$ as a sum (Fig. 12-5)
$$Z(\omega) = Z_r(\omega) + Z_p(\omega) \tag{12-92}$$
where $Z_r(\omega)$ is a continuous process for $\omega \ne 0$ and $Z_p(\omega)$ is a staircase function
with discontinuities at $\omega_i$. We denote by $2\pi c_i$ the discontinuity jumps at $\omega_i \ne 0$.
These jumps equal the jumps of $Z_p(\omega)$. We write the jump at $\omega = 0$ as a sum
$2\pi(\eta + c_0)$ where $\eta = E\{x(t)\}$, and we associate the term $2\pi\eta$ with $Z_r(\omega)$.
Thus at $\omega = 0$ the process $Z_r(\omega)$ is discontinuous with jump equal to $2\pi\eta$. The
jump of $Z_p(\omega)$ at $\omega = 0$ equals $2\pi c_0$. Inserting (12-92) into (12-87), we obtain
the decomposition (12-89) of $x(t)$ where $x_r(t)$ and $x_p(t)$ are the components due
to $Z_r(\omega)$ and $Z_p(\omega)$ respectively.
From (12-85) it follows that $Z_r(\omega)$ and $Z_p(\omega)$ are two processes with
orthogonal increments and such that
$$E\{Z_r(u) Z_p^*(v)\} = 0 \qquad E\{c_i c_j^*\} = \begin{cases} k_i & i = j \\ 0 & i \ne j \end{cases} \tag{12-93}$$
The first equation shows that the processes $x_r(t)$ and $x_p(t)$ are orthogonal as in
(12-91); the second shows that the coefficients $c_i$ of $x_p(t)$ are orthogonal. This
also follows from the stationarity of $x_p(t)$.
We denote by $S_r(\omega)$ and $S_p(\omega)$ the spectra and by $F_r(\omega)$ and $F_p(\omega)$ the
integrated spectra of $x_r(t)$ and $x_p(t)$ respectively. From (12-89) and (12-91) it
follows that
$$S(\omega) = S_r(\omega) + S_p(\omega) \qquad F(\omega) = F_r(\omega) + F_p(\omega) \tag{12-94}$$
The term $F_r(\omega)$ is continuous for $\omega \ne 0$; for $\omega = 0$ it is discontinuous with a
jump equal to $2\pi\eta^2$. The term $F_p(\omega)$ is a staircase function, discontinuous at
the points $\omega_i$ with jumps equal to $2\pi k_i$. Hence
$$S_p(\omega) = 2\pi k_0 \delta(\omega) + 2\pi \sum_i k_i \delta(\omega - \omega_i) \tag{12-95}$$
The impulse at the origin of $S(\omega)$ equals $2\pi(k_0 + \eta^2)\delta(\omega)$.
Example 12-12. Consider the process
$$y(t) = a\,x(t) \qquad E\{a\} = 0$$
where $x(t)$ is a regular process independent of $a$. We shall determine its Wold
decomposition.
From the assumptions it follows that
$$E\{y(t)\} = 0 \qquad R_{yy}(\tau) = E\{a^2 x(t + \tau) x(t)\} = \sigma_a^2 R_{xx}(\tau)$$
The spectrum of $x(t)$ equals $S_{xx}^c(\omega) + 2\pi\eta_x^2\delta(\omega)$. Hence
$$S_{yy}(\omega) = \sigma_a^2 S_{xx}^c(\omega) + 2\pi\sigma_a^2\eta_x^2\delta(\omega)$$
From the regularity of $x(t)$ it follows that its covariance spectrum $S_{xx}^c(\omega)$ has no
impulses. Since $\eta_y = 0$, we conclude from (12-95) that $S_p(\omega) = 2\pi k_0\delta(\omega)$ where
$k_0 = \sigma_a^2\eta_x^2$. This yields
$$y_p(t) = \eta_x a \qquad y_r(t) = a[x(t) - \eta_x]$$
DISCRETE-TIME PROCESSES. Given a discrete-time process $x[n]$, we form its
discrete Fourier transform (DFT)
$$X(\omega) = \sum_{n=-\infty}^{\infty} x[n] e^{-jn\omega} \tag{12-96}$$
This yields
$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega) e^{jn\omega}\, d\omega \tag{12-97}$$
From the definition it follows that the process $X(\omega)$ is periodic with period $2\pi$.
It suffices, therefore, to study its properties for $|\omega| < \pi$ only. The preceding
results properly modified also hold for discrete-time processes. We shall discuss
only the digital version of (12-76):
If $x[n]$ is a WSS process with power spectrum $S(\omega)$, then its DFT $X(\omega)$ is
nonstationary white noise with autocovariance
$$E\{X(u)X^*(v)\} = 2\pi S(u)\delta(u - v) \qquad -\pi < u, v < \pi \tag{12-98}$$
Proof. The proof is based on the identity
$$\sum_{n=-\infty}^{\infty} e^{-jn\omega} = 2\pi\delta(\omega) \qquad |\omega| < \pi$$
Clearly,
$$\begin{aligned}
E\{X(u)X^*(v)\} &= \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} E\{x[n + m] x^*[n]\} \exp\{-j[(m + n)u - nv]\} \\
&= \sum_{n=-\infty}^{\infty} e^{-jn(u - v)} \sum_{m=-\infty}^{\infty} R[m] e^{-jmu}
\end{aligned}$$
and (12-98) results.
BISPECTRA AND THIRD-ORDER MOMENTS. Consider a real SSS process $x(t)$
with Fourier transform $X(\omega)$ and third-order moment $R(\mu, \nu)$ [see (11-179)].
Generalizing (12-76), we shall express the third-order moment of $X(\omega)$ in terms
of the bispectrum $S(u, v)$ of $x(t)$.
THEOREM.
$$E\{X(u)X(v)X^*(w)\} = 2\pi S(u, v)\delta(u + v - w) \tag{12-99}$$
Proof. From (12-65) it follows that the left side of (12-99) equals
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E\{x(t_1)x(t_2)x(t_3)\} e^{-j(ut_1 + vt_2 - wt_3)}\, dt_1\, dt_2\, dt_3$$
With $t_1 = t_3 + \mu$ and $t_2 = t_3 + \nu$, the above yields
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R(\mu, \nu) e^{-j(u\mu + v\nu)}\, d\mu\, d\nu \int_{-\infty}^{\infty} e^{-j(u + v - w)t_3}\, dt_3$$
and (12-99) results because the last integral equals $2\pi\delta(u + v - w)$.
We have thus shown that the third-order moment of $X(\omega)$ is 0 everywhere
in the $uvw$ space except on the plane $w = u + v$ where it equals a surface
singularity with density $2\pi S(u, v)$. Using this result, we shall determine the
third-order moment of the increments
$$Z(\omega_i) - Z(\omega_k) = \int_{\omega_k}^{\omega_i} X(\omega)\, d\omega \tag{12-100}$$
of the integrated transforms $Z(\omega)$ of $x(t)$.
THEOREM.
$$E\{[Z(\omega_2) - Z(\omega_1)][Z(\omega_4) - Z(\omega_3)][Z^*(\omega_6) - Z^*(\omega_5)]\} = 2\pi \iint_R S(u, v)\, du\, dv \tag{12-101}$$
where $R$ is the set of points common to the three regions
$$\omega_1 < u < \omega_2 \qquad \omega_3 < v < \omega_4 \qquad \omega_5 < w < \omega_6$$
(shaded in Fig. 12-6a) of the $uv$ plane.
FIGURE 12-6
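Proof. From (12-99) and (12-100) it follows that the left side of (12-101) equals
$$\int_{\omega_5}^{\omega_6}\int_{\omega_3}^{\omega_4}\int_{\omega_1}^{\omega_2} 2\pi S(u, v)\delta(u + v - w)\, du\, dv\, dw = 2\pi \int_{\omega_3}^{\omega_4}\int_{\omega_1}^{\omega_2} S(u, v)\, du\, dv \int_{\omega_5}^{\omega_6} \delta(u + v - w)\, dw$$
The last integral equals one for $\omega_5 < u + v < \omega_6$ and 0 otherwise. Hence the
right side equals the integral of $2\pi S(u, v)$ in the set $R$ as in (12-101).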
COROLLARY. Consider the differentials
$$dZ(u_0) = X(u_0)\, du \qquad dZ(v_0) = X(v_0)\, dv \qquad dZ(w_0) = X(w_0)\, dw$$
We maintain that
$$E\{dZ(u_0)\, dZ(v_0)\, dZ^*(w_0)\} = 2\pi S(u_0, v_0)\, du\, dv \tag{12-102}$$
if $w_0 = u_0 + v_0$ and $dw \ge du + dv$; it is zero if $w_0 \ne u_0 + v_0$.
Proof. Setting
$$\omega_1 = u_0 \qquad \omega_3 = v_0 \qquad \omega_5 = w_0 = u_0 + v_0$$
$$\omega_2 = u_0 + du \qquad \omega_4 = v_0 + dv \qquad \omega_6 \ge w_0 + du + dv$$
into (12-101), we obtain (12-102) because the set $R$ is the shaded rectangle of
Fig. 12-6b.
We conclude with the observation that equation (12-102) can be used to
define the bispectrum of a SSS process $x(t)$ in terms of $Z(\omega)$.
PROBLEMS
12-1. Find $R_x[m]$ and the whitening filter of $x[n]$ if
$$S_x(\omega) = \frac{\cos 2\omega + 1}{12\cos 2\omega - 70\cos\omega + 62}$$
12-2. Find the innovations filter of the process $x(t)$ if
$$S_x(\omega) = \frac{\omega^4 + 64}{\omega^4 + 10\omega^2 + 9}$$
12-3. Show that if $l_s[n]$ is the delta response of the innovations filter of $s[n]$, then
$$R_s[0] = \sum_{n=0}^{\infty} l_s^2[n]$$
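12-4. The process $x(t)$ is WSS and
$$y''(t) + 3y'(t) + 2y(t) = x(t)$$
Show that (a)
$$R''_{yx}(\tau) + 3R'_{yx}(\tau) + 2R_{yx}(\tau) = R_{xx}(\tau)$$
$$R''_{yy}(\tau) + 3R'_{yy}(\tau) + 2R_{yy}(\tau) = R_{xy}(\tau) \qquad \text{all } \tau$$
(b) If $R_{xx}(\tau) = q\delta(\tau)$, then $R_{yx}(\tau) = 0$ for $\tau < 0$ and for $\tau > 0$:
$$R''_{yx}(\tau) + 3R'_{yx}(\tau) + 2R_{yx}(\tau) = 0 \qquad R_{yx}(0) = 0 \qquad R'_{yx}(0^+) = q$$
$$R''_{yy}(\tau) + 3R'_{yy}(\tau) + 2R_{yy}(\tau) = 0 \qquad R_{yy}(0) = \frac{q}{12} \qquad R'_{yy}(0) = 0$$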
12-5. Show that if $s[n]$ is AR and $v[n]$ is white noise orthogonal to $s[n]$, then the
process $x[n] = s[n] + v[n]$ is ARMA. Find $S_x(z)$ if $R_s[m] = 2^{-|m|}$ and $S_v(z) = 5$.
12-6. Show that if $x(t)$ is a WSS process and
$$s = \frac{1}{n} \sum_{k=1}^{n} x(kT) \qquad \text{then} \qquad E\{s^2\} = \frac{1}{2\pi n^2} \int_{-\infty}^{\infty} S_x(\omega) \frac{\sin^2(n\omega T/2)}{\sin^2(\omega T/2)}\, d\omega$$
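12-7. Show that if $R_x(\tau) = e^{-c|\tau|}$, then the Karhunen-Loeve expansion of $x(t)$ in the
interval $(-a, a)$ is the sum
$$\hat{x}(t) = \sum_{n=1}^{\infty} (\beta_n b_n \cos \omega_n t + \beta'_n b'_n \sin \omega'_n t)$$
where
$$\tan a\omega_n = \frac{c}{\omega_n} \qquad \cot a\omega'_n = -\frac{c}{\omega'_n} \qquad \beta_n = (a + c\lambda_n)^{-1/2} \qquad \beta'_n = (a - c\lambda'_n)^{-1/2}$$
$$E\{b_n^2\} = \lambda_n = \frac{2c}{c^2 + \omega_n^2} \qquad E\{b_n'^2\} = \lambda'_n = \frac{2c}{c^2 + \omega_n'^2}$$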
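12-8. Show that if $x(t)$ is WSS and
$$X_T(\omega) = \int_{-T/2}^{T/2} x(t) e^{-j\omega t}\, dt \qquad \text{then} \qquad E\left\{\frac{1}{T}|X_T(\omega)|^2\right\} = \int_{-T}^{T} \left(1 - \frac{|\tau|}{T}\right) R(\tau) e^{-j\omega\tau}\, d\tau$$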
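12-9. Find the mean and the variance of the integral
$$X(\omega) = \int [5\cos 3t + v(t)] e^{-j\omega t}\, dt$$
if $E\{v(t)\} = 0$ and $R_{vv}(\tau) = 2\delta(\tau)$.
12-10. Show that if
$$E\{x_n x_k\} = \sigma_n^2 \delta[n - k] \qquad X(\omega) = \sum_{n=-\infty}^{\infty} x_n e^{-jn\omega T}$$
and $E\{x_n\} = 0$, then $E\{X(\omega)\} = 0$ and
$$E\{X(u)X^*(v)\} = \sum_{n=-\infty}^{\infty} \sigma_n^2 e^{-jn(u - v)T}$$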
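12-11. Given a nonperiodic WSS process $x(t)$, we form the sum $\hat{x}(t) = \sum_n c_n e^{jn\omega_0 t}$ as in
(12-51). Show that (a) $E\{|x(t) - \hat{x}(t)|^2\} = 0$ for $0 < t < T$. (b) $E\{c_n c_m^*\} =
(1/T)\int_0^T p_n(\alpha) e^{jm\omega_0\alpha}\, d\alpha$ where $p_n(\alpha) = (1/T)\int_0^T R(\tau - \alpha) e^{-jn\omega_0\tau}\, d\tau$ are the
coefficients of the Fourier expansion of $R(\tau - \alpha)$ in the interval $(0, T)$. (c) For large
$T$, $E\{c_n c_m^*\} \approx S(n\omega_0)\delta[n - m]/T$.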
12-12. Show that, if the process $X(\omega)$ is white noise with zero mean and autocovariance
$Q(u)\delta(u - v)$, then its inverse Fourier transform $x(t)$ is WSS with power
spectrum $Q(\omega)/2\pi$.
12-13. Given a real process $x(t)$ with Fourier transform $X(\omega) = A(\omega) + jB(\omega)$, show that
if the processes $A(\omega)$ and $B(\omega)$ satisfy (12-79) and $E\{A(\omega)\} = E\{B(\omega)\} = 0$, then
$x(t)$ is WSS.
12-14. We use as an estimate of the Fourier transform $F(\omega)$ of a signal $f(t)$ the integral
$$X_T(\omega) = \int_{-T}^{T} [f(t) + v(t)] e^{-j\omega t}\, dt$$
where $v(t)$ is the measurement noise. Show that if $S_{vv}(\omega) = q$, then
$$E\{X_T(\omega)\} = \int_{-\infty}^{\infty} F(y) \frac{\sin T(\omega - y)}{\pi(\omega - y)}\, dy \qquad \operatorname{Var} X_T(\omega) = 2qT$$
CHAPTER
13
SPECTRAL
ESTIMATION
13-1 ERGODICITY
A central problem in the applications of stochastic processes is the estimation of
various statistical parameters in terms of real data. Most parameters can be
expressed as expected values of some functional of a process $x(t)$. The problem
of estimating the mean of a given process $x(t)$ is, therefore, central in this
investigation. We start with this problem.
For a specific $t$, $x(t)$ is an RV; its mean $\eta(t) = E\{x(t)\}$ can, therefore, be
estimated as in Sec. 9-2: We observe $n$ samples $x(t, \zeta_i)$ of $x(t)$ and use as the
point estimate of $E\{x(t)\}$ the average
$$\hat{\eta}(t) = \frac{1}{n} \sum_i x(t, \zeta_i)$$
As we know, $\hat{\eta}(t)$ is a consistent estimate of $\eta(t)$; however, it can be used
only if a large number of realizations $x(t, \zeta_i)$ of $x(t)$ are available. In many
applications, we know only a single sample of $x(t)$. Can we then estimate $\eta(t)$ in
terms of the time average of the given sample? This is not possible if $E\{x(t)\}$
depends on $t$. However, if $x(t)$ is a regular stationary process, its time average
tends to $E\{x(t)\}$ as the length of the available sample tends to $\infty$. Ergodicity is a
topic dealing with the underlying theory.
Mean-Ergodic Processes
We are given a real stationary process $x(t)$ and we wish to estimate its mean
$\eta = E\{x(t)\}$. For this purpose, we form the time average
$$\eta_T = \frac{1}{2T} \int_{-T}^{T} x(t)\, dt \tag{13-1}$$
Clearly, $\eta_T$ is an RV with mean
$$E\{\eta_T\} = \frac{1}{2T} \int_{-T}^{T} E\{x(t)\}\, dt = \eta$$
Thus $\eta_T$ is an unbiased estimator of $\eta$. If its variance $\sigma_T^2 \to 0$ as $T \to \infty$, then
$\eta_T \to \eta$ in the MS sense. In this case, the time average $\eta_T(\zeta)$ computed from a
single realization of $x(t)$ is close to $\eta$ with probability close to 1. If this is true,
we shall say that the process $x(t)$ is mean-ergodic. Thus a process $x(t)$ is
mean-ergodic if its time average tends to the ensemble average $\eta$ as $T \to \infty$.
To establish the ergodicity of a process, it suffices to find $\sigma_T$ and to
examine the conditions under which $\sigma_T \to 0$ as $T \to \infty$. As the following
examples show, not all processes are mean-ergodic.
Example 13-1. Suppose that $c$ is an RV with mean $\eta_c$ and
$$x(t) = c \qquad \eta = E\{x(t)\} = E\{c\} = \eta_c$$
In this case, $x(t)$ is a family of straight lines and $\eta_T = c$. For a specific sample,
$\eta_T(\zeta) = c(\zeta)$ is a constant different from $\eta$ if $c(\zeta) \ne \eta$. Hence $x(t)$ is not
mean-ergodic.
Example 13-2. Given two mean-ergodic processes $x_1(t)$ and $x_2(t)$ with means $\eta_1$
and $\eta_2$, we form the sum
$$x(t) = x_1(t) + c\,x_2(t)$$
where $c$ is an RV independent of $x_2(t)$ taking the values 0 and 1 with probability
0.5. Clearly,
$$E\{x(t)\} = E\{x_1(t)\} + E\{c\}E\{x_2(t)\} = \eta_1 + 0.5\eta_2$$
If $c(\zeta) = 0$ for a particular $\zeta$, then $x(t) = x_1(t)$ and $\eta_T \to \eta_1$ as $T \to \infty$. If $c(\zeta) = 1$
for another $\zeta$, then $x(t) = x_1(t) + x_2(t)$ and $\eta_T \to \eta_1 + \eta_2$ as $T \to \infty$. Hence $x(t)$
is not mean-ergodic.
VARIANCE. To determine the variance $\sigma_T^2$ of the time average $\eta_T$ of $x(t)$, we
start with the observation that
$$\eta_T = w(0) \qquad \text{where} \qquad w(t) = \frac{1}{2T} \int_{t-T}^{t+T} x(\alpha)\, d\alpha \tag{13-2}$$
is the moving average of $x(t)$. As we know, $w(t)$ is the output of a linear system
with input $x(t)$ and with impulse response a pulse centered at $t = 0$. Hence $w(t)$
is stationary and its autocovariance equals
$$C_{ww}(\tau) = \frac{1}{2T} \int_{-2T}^{2T} C(\tau - \alpha)\left(1 - \frac{|\alpha|}{2T}\right) d\alpha \tag{13-3}$$
where $C(\tau)$ is the autocovariance of $x(t)$ [see (111-142)]. Since $\sigma_T^2 = \operatorname{Var} w(0) =
C_{ww}(0)$ and $C(-\alpha) = C(\alpha)$, this yields
$$\sigma_T^2 = \frac{1}{2T} \int_{-2T}^{2T} C(\alpha)\left(1 - \frac{|\alpha|}{2T}\right) d\alpha = \frac{1}{T} \int_0^{2T} C(\alpha)\left(1 - \frac{\alpha}{2T}\right) d\alpha \tag{13-4}$$
This fundamental result leads to the following conclusion: A process $x(t)$ with
autocovariance $C(\tau)$ is mean-ergodic iff
$$\frac{1}{T} \int_0^{2T} C(\alpha)\left(1 - \frac{\alpha}{2T}\right) d\alpha \xrightarrow[T \to \infty]{} 0 \tag{13-5}$$
the crgodicity of xG) but also in determining a confidence interval for the
estimate 7]t of tj. indeed, from Tchebycheffs inequality it follows that the
probability that the unknown rj is in the interval t|-z ± IDtr, is larger than 0.99
[see (5-57)]. Hence -Tjr is a satisfactory estimate of -q if T is such that a, <k q.
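Example 13-3. Suppose that $C(\tau) = qe^{-c|\tau|}$ as in (11-15). In this case,
$$\sigma_T^2 = \frac{1}{T} \int_0^{2T} q e^{-c\tau}\left(1 - \frac{\tau}{2T}\right) d\tau = \frac{q}{cT}\left(1 - \frac{1 - e^{-2cT}}{2cT}\right)$$
Clearly, $\sigma_T^2 \to 0$ as $T \to \infty$; hence $x(t)$ is mean-ergodic. If $T \gg 1/c$, then $\sigma_T^2 \approx
q/cT$.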
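The variance integral for an exponential autocovariance can be checked by direct numerical integration. The sketch below uses a hand-written trapezoidal rule so that it does not depend on any particular NumPy integration helper; the closed form it compares against, $\sigma_T^2 = (q/cT)\left[1 - (1 - e^{-2cT})/2cT\right]$, follows from integrating $(1/T)\int_0^{2T} q e^{-c\tau}(1 - \tau/2T)\, d\tau$ by parts. The values of `q` and `c` are arbitrary.

```python
import numpy as np

def sigma_T_sq_numeric(C, T, n=200_001):
    """(1/T) * integral_0^{2T} C(tau) (1 - tau/(2T)) dtau by the trapezoidal rule."""
    tau = np.linspace(0.0, 2 * T, n)
    f = C(tau) * (1 - tau / (2 * T))
    dtau = tau[1] - tau[0]
    return (np.sum(f) - 0.5 * (f[0] + f[-1])) * dtau / T

def sigma_T_sq_closed(q, c, T):
    """Closed form for C(tau) = q exp(-c |tau|)."""
    return q / (c * T) * (1 - (1 - np.exp(-2 * c * T)) / (2 * c * T))

q, c = 3.0, 0.5
C = lambda tau: q * np.exp(-c * np.abs(tau))

results = {T: (sigma_T_sq_numeric(C, T), sigma_T_sq_closed(q, c, T))
           for T in (1.0, 10.0, 100.0)}
```

For $T \gg 1/c$ both values approach $q/cT$, which shows the $1/T$ decay of the variance of the time average.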
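Example 13-4. Suppose that $x(t) = \eta + v(t)$ where $v(t)$ is white noise with
$R_{vv}(\tau) = q\delta(\tau)$. In this case, $C(\tau) = R_{vv}(\tau)$ and (13-4) yields
$$\sigma_T^2 = \frac{1}{2T} \int_{-2T}^{2T} q\delta(\tau)\left(1 - \frac{|\tau|}{2T}\right) d\tau = \frac{q}{2T}$$
Hence $x(t)$ is mean-ergodic.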
It is clear from (13-5) that the ergodicity of a process depends on the
behavior of $C(\tau)$ for large $\tau$. If $C(\tau) = 0$ for $\tau > a$, that is, if $x(t)$ is $a$-dependent
and $T \gg a$, then
$$\sigma_T^2 = \frac{1}{T} \int_0^{a} C(\tau)\left(1 - \frac{\tau}{2T}\right) d\tau \approx \frac{1}{T} \int_0^{a} C(\tau)\, d\tau < \frac{a}{T} C(0) \xrightarrow[T \to \infty]{} 0$$
because $|C(\tau)| < C(0)$; hence $x(t)$ is mean-ergodic.
In many applications, the RVs $x(t + \tau)$ and $x(t)$ are nearly uncorrelated
for large $\tau$, that is, $C(\tau) \to 0$ as $\tau \to \infty$. The above suggests that if this is the
case, then $x(t)$ is mean-ergodic and for large $T$ the variance of $\eta_T$ can be
approximated by
$$\sigma_T^2 \approx \frac{1}{T} \int_0^{2T} C(\tau)\, d\tau \approx \frac{1}{T} \int_0^{\infty} C(\tau)\, d\tau = \frac{\tau_c}{T} C(0) \tag{13-6}$$
where $\tau_c$ is the correlation time of $x(t)$ defined in (10-49). This result will be
justified presently.
SLUTSKY'S THEOREM. A process $x(t)$ is mean-ergodic iff
$$\frac{1}{T} \int_0^{T} C(\tau)\, d\tau \xrightarrow[T \to \infty]{} 0 \tag{13-7}$$
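Proof. (a) We show first that if $\sigma_T \to 0$ as $T \to \infty$, then (13-7) is true. The
covariance of the RVs $\eta_T$ and $x(0)$ equals
$$\operatorname{Cov}[\eta_T, x(0)] = E\left\{\frac{1}{2T} \int_{-T}^{T} [x(t) - \eta][x(0) - \eta]\, dt\right\} = \frac{1}{2T} \int_{-T}^{T} C(t)\, dt$$
But [see (7-9)]
$$\operatorname{Cov}^2[\eta_T, x(0)] \le \operatorname{Var} \eta_T \operatorname{Var} x(0) = \sigma_T^2 C(0)$$
Hence (13-7) holds if $\sigma_T \to 0$.
(b) We show next that if (13-7) is true, then $\sigma_T \to 0$ as $T \to \infty$. From
(13-7) it follows that given $\varepsilon > 0$, we can find a constant $c_0$ such that
$$\frac{1}{c} \int_0^{c} C(\tau)\, d\tau < \varepsilon \qquad \text{for every } c > c_0 \tag{13-8}$$
The variance of $\eta_T$ equals [see (13-4)]
$$\sigma_T^2 = \frac{1}{T} \int_0^{2T_0} C(\tau)\left(1 - \frac{\tau}{2T}\right) d\tau + \frac{1}{T} \int_{2T_0}^{2T} C(\tau)\left(1 - \frac{\tau}{2T}\right) d\tau$$
The integral from 0 to $2T_0$ is less than $2T_0 C(0)/T$ because $|C(\tau)| \le C(0)$.
Hence
$$\sigma_T^2 < \frac{2T_0}{T} C(0) + \frac{1}{T} \int_{2T_0}^{2T} C(\tau)\left(1 - \frac{\tau}{2T}\right) d\tau$$
But (see Fig. 13-1)
$$\int_{2T_0}^{2T} C(\tau)(2T - \tau)\, d\tau = \int_{2T_0}^{2T} C(\tau) \int_{\tau}^{2T} dt\, d\tau = \int_{2T_0}^{2T} \int_{2T_0}^{t} C(\tau)\, d\tau\, dt$$
From (13-8) it follows that the inner integral on the right is less than $\varepsilon t$; hence
$$\sigma_T^2 < \frac{2T_0}{T} C(0) + \frac{\varepsilon}{T^2} \int_{2T_0}^{2T} t\, dt \xrightarrow[T \to \infty]{} 2\varepsilon$$
and since $\varepsilon$ is arbitrary, we conclude that $\sigma_T \to 0$ as $T \to \infty$.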
Example 13-5. Consider the process
$$x(t) = a \cos \omega t + b \sin \omega t + c$$
where $a$ and $b$ are two uncorrelated RVs with zero mean and equal variance. As
we know [see (10-55)], the process $x(t)$ is WSS with mean $c$ and autocovariance
$\sigma^2 \cos \omega\tau$. We shall show that it is mean-ergodic. This follows from (13-7) and the
fact that
$$\frac{1}{T} \int_0^{T} C(\tau)\, d\tau = \frac{\sigma^2}{T} \int_0^{T} \cos \omega\tau\, d\tau = \frac{\sigma^2}{\omega T} \sin \omega T \xrightarrow[T \to \infty]{} 0$$
Sufficient conditions. (a) If
$$\int_0^{\infty} C(\tau)\, d\tau < \infty \tag{13-9}$$
then (13-7) holds; hence the process $x(t)$ is mean-ergodic.
(b) If $R(\tau) \to \eta^2$ or, equivalently, if
$$C(\tau) \to 0 \qquad \text{as} \qquad \tau \to \infty \tag{13-10}$$
then $x(t)$ is mean-ergodic.
Proof. If (13-10) is true, then given $\varepsilon > 0$, we can find a constant $T_0$ such that
$|C(\tau)| < \varepsilon$ for $\tau > T_0$; hence
$$\frac{1}{T} \int_0^{T} C(\tau)\, d\tau = \frac{1}{T} \int_0^{T_0} C(\tau)\, d\tau + \frac{1}{T} \int_{T_0}^{T} C(\tau)\, d\tau < \frac{T_0}{T} C(0) + \varepsilon\frac{T - T_0}{T} \xrightarrow[T \to \infty]{} \varepsilon$$
and since $\varepsilon$ is arbitrary, we conclude that (13-7) is true.
Condition (13-10) is satisfied if the RVs $x(t + \tau)$ and $x(t)$ are uncorrelated
for large $\tau$.
Note. The time average $\eta_T$ is an unbiased estimator of $\eta$; however, it is not best. An
estimator with smaller variance results if we use the weighted average
$$\eta_w = \int_{-T}^{T} w(t) x(t)\, dt$$
and select the function $w(t)$ appropriately (see also Example 8-4).
DISCRETE-TIME PROCESSES. We outline next, without elaboration, the discrete-time
version of the preceding results. We are given a real stationary
process $x[n]$ with autocovariance $C[m]$ and we form the time average
$$\eta_M = \frac{1}{N} \sum_{n=-M}^{M} x[n] \qquad N = 2M + 1 \tag{13-11}$$
This is an unbiased estimator of the mean of $x[n]$ and its variance equals
$$\sigma_M^2 = \frac{1}{N} \sum_{m=-2M}^{2M} C[m]\left(1 - \frac{|m|}{N}\right) \tag{13-12}$$
The process $x[n]$ is mean-ergodic if the right side of (13-12) tends to 0 as
$M \to \infty$.
SLUTSKY'S THEOREM. The process $x[n]$ is mean-ergodic iff
$$\frac{1}{M} \sum_{m=0}^{M} C[m] \xrightarrow[M \to \infty]{} 0 \tag{13-13}$$
We can show as in (13-10) that if $C[m] \to 0$ as $m \to \infty$, then $x[n]$ is mean-ergodic.
For large $M$,
$$\sigma_M^2 \approx \frac{1}{N} \sum_{m=-M}^{M} C[m] \tag{13-14}$$
Example 13-6. (a) Suppose that the centered process $\tilde{x}[n] = x[n] - \eta$ is white
noise with autocovariance $P\delta[m]$. In this case,
$$C[m] = P\delta[m] \qquad \sigma_M^2 = \frac{1}{N} \sum_{m=-M}^{M} P\delta[m] = \frac{P}{N}$$
Thus $x[n]$ is mean-ergodic and the variance of $\eta_M$ equals $P/N$. This agrees with
(8-22): The RVs $x[n]$ are i.i.d. with variance $C[0] = P$, and the time average $\eta_M$ is
their sample mean.
(b) Suppose now that $C[m] = Pa^{|m|}$ as in Example 10-31. In this case
(13-14) yields
$$\sigma_M^2 \approx \frac{1}{N} \sum_{m=-M}^{M} P a^{|m|} \approx \frac{P(1 + a)}{N(1 - a)}$$
Note that if we replace $x[n]$ by white noise as in (a) with the same $P$ and use as
estimate of $\eta$ the time average of $N_1$ terms, the variance $P/N_1$ of the resulting
estimator will equal $\sigma_M^2$ if
$$N_1 = N\frac{1 - a}{1 + a}$$
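The exact variance of the discrete time average, $\sigma_M^2 = (1/N)\sum_{m=-2M}^{2M} C[m](1 - |m|/N)$ with $N = 2M + 1$, can be compared with the large-$M$ approximation for a geometric autocovariance $C[m] = Pa^{|m|}$, for which summing the geometric series gives approximately $P(1+a)/N(1-a)$. The sketch below is illustrative; `P`, `a` and `M` are arbitrary, and `N1` is the equivalent number of uncorrelated samples implied by that approximation.

```python
import numpy as np

def var_time_average(C, M):
    """Exact variance of the average of x[-M..M] for autocovariance C[m]."""
    N = 2 * M + 1
    m = np.arange(-2 * M, 2 * M + 1)
    return np.sum(C(m) * (1 - np.abs(m) / N)) / N

P, a, M = 2.0, 0.8, 500
N = 2 * M + 1

# Geometric autocovariance: exact sum versus the large-M approximation.
exact = var_time_average(lambda m: P * a ** np.abs(m), M)
approx = P * (1 + a) / (N * (1 - a))

# Equivalent number of uncorrelated samples with the same variance.
N1 = N * (1 - a) / (1 + a)

# White-noise case: the exact formula reduces to P / N.
white = var_time_average(lambda m: P * (m == 0), M)
```

With `a = 0.8` the correlated average of 1001 samples is only about as accurate as an average of roughly 111 uncorrelated samples.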
Sampling. In a numerical estimate of the mean of a continuous-time process
$x(t)$, the time average is replaced by the average
$$\eta_N = \frac{1}{N} \sum_n x(t_n)$$
of the $N$ samples $x(t_n)$ of $x(t)$. This is an unbiased estimate of $\eta$ and its
variance equals
$$\sigma_N^2 = \frac{1}{N^2} \sum_n \sum_k C(t_n - t_k)$$
where $C(\tau)$ is the autocovariance of $x(t)$. If the samples are equidistant, then
the RVs $x(t_n) = x(nT_0)$ form a discrete-time process with autocovariance $C(mT_0)$.
In this case, the variance $\sigma_N^2$ of $\eta_N$ is given by (13-12) if we replace $C[m]$ by
$C(mT_0)$.
SPECTRAL INTERPRETATION OF ERGODICITY. We shall express the ergodicity
conditions in terms of the properties of the covariance spectrum
$$ S^c(\omega) = S(\omega) - 2\pi\eta^2\delta(\omega) $$
of the process x(t). The variance σ_T² of η_T equals the variance of the moving
average w(t) of x(t) [see (13-2)]. As we know,
$$ S^c_{ww}(\omega) = S^c(\omega)\,\frac{\sin^2 T\omega}{T^2\omega^2} \tag{13-15} $$
hence
$$ \sigma_T^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty} S^c(\omega)\,\frac{\sin^2 T\omega}{T^2\omega^2}\,d\omega \tag{13-16} $$
The fraction in (13-16) takes significant values only in an interval of the order of
1/T centered at the origin. The ergodicity conditions of x(t) depend, therefore,
only on the behavior of S^c(ω) near the origin.
Suppose first that the process x(t) is regular. In this case, S^c(ω) does not
have an impulse at ω = 0. If, therefore, T is sufficiently large, we can use the
approximation S^c(ω) ≈ S^c(0) in (13-16). This yields
$$ \sigma_T^2 \simeq \frac{S^c(0)}{2\pi}\int_{-\infty}^{\infty}\frac{\sin^2 T\omega}{T^2\omega^2}\,d\omega = \frac{S^c(0)}{2T} \xrightarrow[T\to\infty]{} 0 \tag{13-17} $$
Hence x(t) is mean-ergodic.
Suppose now that
$$ S^c(\omega) = S_1^c(\omega) + 2\pi k_0\delta(\omega) \qquad S_1^c(0) < \infty $$
Inserting into (13-16), we conclude as in (13-17) that
$$ \sigma_T^2 \simeq \frac{1}{2T}S_1^c(0) + k_0 \xrightarrow[T\to\infty]{} k_0 \tag{13-18} $$
Hence x(t) is not mean-ergodic. This case arises if in Wold's decomposition
(12-89) the constant term c_0 is different from 0, or, equivalently, if the Fourier
transform X(ω) of x(t) contains the impulse 2πc_0δ(ω).
Example 13-7. Consider the process
$$ y(t) = a\,x(t) \qquad E\{a\} = 0 $$
where x(t) is a mean-ergodic process independent of the RV a. Clearly, E{y(t)} = 0
and
$$ S^c_{yy}(\omega) = \sigma_a^2 S^c_{xx}(\omega) + 2\pi\sigma_a^2\eta_x^2\delta(\omega) $$
as in Example 12-12. This shows that the process y(t) is not mean-ergodic.
The preceding discussion leads to the following equivalent conditions for
mean ergodicity:
1. σ_T must tend to 0 as T → ∞.
2. In Wold's decomposition (12-89) the constant random term c_0 must be 0.
3. The integrated power spectrum F^c(ω) must be continuous at the origin.
4. The integrated Fourier transform Z(ω) must be continuous at the origin.
Analog estimators. The mean η of a process x(t) can be estimated by the
response of a physical system with input x(t). A simple example is a normalized
integrator of finite integration time. This is a linear device whose impulse
response is the rectangular pulse p(t) of Fig. 13-2. For t > T_0 the output of the
integrator equals
$$ y(t) = \frac{1}{T_0}\int_{t-T_0}^{t} x(\alpha)\,d\alpha $$
If T_0 is large compared to the correlation time τ_c of x(t), then the variance of
y(t) equals 2τ_cC(0)/T_0. This follows from (13-6) with T_0 = 2T.
Suppose now that x(t) is the input to a system with impulse response h(t)
of unit area and energy E:
$$ w(t) = \int_0^t x(\alpha)h(t - \alpha)\,d\alpha \qquad E = \int_0^{\infty} h^2(t)\,dt $$
We assume that C(τ) ≈ 0 for τ > T_1 and h(t) ≈ 0 for t > T_0 > T_1 as in Fig.
13-2. From these assumptions it follows that E{w(t)} = η and σ_w² ≈ EC(0)τ_c for
t > T_0. If, therefore, EC(0)τ_c ≪ η², then w(t) ≈ η for t > T_0. The above
conditions are satisfied if the system is low-pass, that is, if H(ω) ≈ 0 for
|ω| > ω_c and ω_c ≪ η²/C(0)τ_c.
Covariance-Ergodic Processes
We shall now determine the conditions that an SSS process x(t) must satisfy
such that its autocovariance C(λ) can be estimated as a time average. The
results are essentially the same for the estimates of the autocorrelation R(λ)
of x(t).
VARIANCE. We start with the estimate of the variance
$$ V = C(0) = E\{|x(t) - \eta|^2\} = E\{x^2(t)\} - \eta^2 \tag{13-19} $$
of x(t).
Known mean. Suppose, first, that η is known. We can then assume, replacing
the process x(t) by its centered process x(t) − η, that
$$ E\{x(t)\} = 0 \qquad V = E\{x^2(t)\} $$
Our problem is thus to estimate the mean V of the process x²(t). Proceeding as
in (13-1), we use as the estimate of V the time average
$$ V_T = \frac{1}{2T}\int_{-T}^{T} x^2(t)\,dt \tag{13-20} $$
This estimate is unbiased and its variance is given by (13-4) where we replace
the function C(τ) by the autocovariance
$$ C_{x^2x^2}(\tau) = E\{x^2(t + \tau)x^2(t)\} - E^2\{x^2(t)\} \tag{13-21} $$
of the process x²(t). Applying (13-7) to this process, we conclude that x(t) is
variance-ergodic iff
$$ \frac{1}{T}\int_0^T E\{x^2(t + \tau)x^2(t)\}\,d\tau \xrightarrow[T\to\infty]{} C^2(0) \tag{13-22} $$
To test the validity of (13-22), we need the fourth-order moments of x(t). If,
however, x(t) is a normal process, then [see (10-68)]
$$ C_{x^2x^2}(\tau) = 2C^2(\tau) \tag{13-23} $$
From this and (13-22) it follows that a normal process is variance-ergodic iff
$$ \frac{1}{T}\int_0^T C^2(\tau)\,d\tau \xrightarrow[T\to\infty]{} 0 \tag{13-24} $$
Using the simple inequality (see Prob. 13-10)
$$ \left|\frac{1}{T}\int_0^T C(\tau)\,d\tau\right|^2 \le \frac{1}{T}\int_0^T C^2(\tau)\,d\tau $$
we conclude with (13-7) and (13-24) that if a normal process is variance-ergodic,
it is also mean-ergodic. The converse, however, is not true. This theorem has
the following spectral interpretation: The process x(t) is mean-ergodic iff S^c(ω)
has no impulses at the origin; it is variance-ergodic iff S^c(ω) has no impulses
anywhere.
Example 13-8. Suppose that the process
$$ x(t) = a\cos\omega t + b\sin\omega t + \eta $$
is normal and stationary. Clearly, x(t) is mean-ergodic because it does not contain
a random constant. However, it is not variance-ergodic because the square
$$ |x(t) - \eta|^2 = \tfrac{1}{2}(a^2 + b^2) + \tfrac{1}{2}(a^2\cos 2\omega t - b^2\cos 2\omega t) + ab\sin 2\omega t $$
of x(t) − η contains the random constant (a² + b²)/2.
Unknown mean. If η is unknown, we evaluate its estimator η_T from (13-1) and
form the average
$$ \hat{V}_T = \frac{1}{2T}\int_{-T}^{T}[x(t) - \eta_T]^2\,dt = \frac{1}{2T}\int_{-T}^{T} x^2(t)\,dt - \eta_T^2 $$
The determination of the statistical properties of V̂_T is difficult. The following
observations, however, simplify the problem. In general, V̂_T is a biased estimator
of the variance V of x(t). However, if T is large, the bias can be neglected in the
determination of the estimation error; furthermore, the variance of V̂_T can be
approximated by the variance of the known-mean estimator V_T. In many cases,
the MS error E{(V̂_T − V)²} is smaller than E{(V_T − V)²} for moderate values of
T. It might thus be preferable to use V̂_T as the estimator of V even when η is
known.
AUTOCOVARIANCE. We shall establish the ergodicity conditions for the
autocovariance C(λ) of the process x(t) under the assumption that E{x(t)} = 0. We
can do so, replacing x(t) by x(t) − η if η is known. If it is unknown, we replace
x(t) by x(t) − η_T. In this case, the results are approximately correct if T is
large.
For a specific λ, the product x(t + λ)x(t) is an SSS process with mean
C(λ). We can, therefore, use as the estimate of C(λ) the time average
$$ C_T(\lambda) = \frac{1}{2T}\int_{-T}^{T} z(t)\,dt \qquad z(t) = x(t + \lambda)x(t) \tag{13-25} $$
This is an unbiased estimator of C(λ) and its variance is given by (13-4) if we
replace the autocovariance of x(t) by the autocovariance
$$ C_{zz}(\tau) = E\{x(t + \lambda + \tau)x(t + \tau)x(t + \lambda)x(t)\} - C^2(\lambda) $$
of the process z(t). Applying Slutsky's theorem, we conclude that the process
x(t) is covariance-ergodic iff
$$ \frac{1}{T}\int_0^T C_{zz}(\tau)\,d\tau \xrightarrow[T\to\infty]{} 0 \tag{13-26} $$
If x(t) is a normal process,
$$ C_{zz}(\tau) = C(\lambda + \tau)C(\lambda - \tau) + C^2(\tau) \tag{13-27} $$
In this case, (13-6) yields
$$ \operatorname{Var} C_T(\lambda) \simeq \frac{1}{T}\int_0^{2T}[C(\lambda + \tau)C(\lambda - \tau) + C^2(\tau)]\,d\tau \tag{13-28} $$
From (13-27) it follows that if C(τ) → 0, then C_zz(τ) → 0 as τ → ∞; hence x(t)
is covariance-ergodic.
Cross-covariance. We comment briefly on the estimate of the cross-covariance
C_xy(τ) of two zero-mean processes x(t) and y(t). As in (13-25), the time average
$$ \hat{C}_{xy}(\tau) = \frac{1}{2T}\int_{-T}^{T} x(t + \tau)y(t)\,dt \tag{13-29} $$
is an unbiased estimate of C_xy(τ) and its variance is given by (13-4) if we replace
C(τ) by C_xy(τ). We note, finally, that if both processes are variance-ergodic,
they are also cross-covariance-ergodic (see Prob. 13-9).
NONLINEAR ESTIMATORS. The numerical evaluation of the estimate C_T(λ) of
C(λ) involves the evaluation of the integral of the product x(t + λ)x(t) for
various values of λ. We show next that the computations can in certain cases be
simplified if we replace one or both factors of this product by some function† of
x(t). We shall assume that the process x(t) is normal with zero mean.
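The arcsine law. We have shown in (10-71) that if y(t) is the output of a hard
limiter with input x(t):
$$ y(t) = \operatorname{sgn} x(t) = \begin{cases} 1 & x(t) > 0 \\ -1 & x(t) < 0 \end{cases} $$
then
$$ C_{yy}(\tau) = \frac{2}{\pi}\arcsin\frac{C_{xx}(\tau)}{C_{xx}(0)} \tag{13-30} $$
The estimate of C_yy(τ) is given by
$$ \hat{C}_{yy}(\tau) = \frac{1}{2T}\int_{-T}^{T} \operatorname{sgn} x(t + \tau)\operatorname{sgn} x(t)\,dt \tag{13-31} $$
This integral is simple to determine because the integrand equals ±1. Thus
$$ \hat{C}_{yy}(\tau) = \frac{T^+}{T} - 1 $$
where T⁺ is the total time that x(t + τ)x(t) > 0. This yields the estimate
$$ \hat{C}_{xx}(\tau) = C_{xx}(0)\sin\left[\frac{\pi}{2}\hat{C}_{yy}(\tau)\right] $$
of C_xx(τ) within a factor.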
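A discrete-time sketch of the arcsine-law estimator may help fix ideas. It assumes NumPy and a simulated zero-mean Gaussian AR(1) sequence with normalized autocovariance a^{|m|}; the process model, the seed, and the name `arcsine_estimate` are illustrative assumptions, not part of the text. Only the signs of the samples are used.

```python
import numpy as np

rng = np.random.default_rng(0)
n, a = 200_000, 0.9

# Zero-mean, unit-variance Gaussian AR(1): x[k] = a x[k-1] + e[k],
# so that C_xx[m] / C_xx[0] = a**|m| (an assumed test process).
e = rng.standard_normal(n) * np.sqrt(1 - a**2)
x = np.empty(n)
x[0] = rng.standard_normal()
for k in range(1, n):
    x[k] = a * x[k - 1] + e[k]

def arcsine_estimate(x, lag):
    """Estimate C_xx[lag] / C_xx[0] from the hard-limited samples only."""
    s = np.sign(x)
    c_yy = np.mean(s[lag:] * s[:-lag])   # time average of sgn x[k+lag] sgn x[k]
    return np.sin(np.pi / 2 * c_yy)      # inverse of the arcsine law

lags = (1, 5, 10)
rho_hat = [arcsine_estimate(x, m) for m in lags]
rho_true = [a**m for m in lags]
```

The estimate recovers the shape of the autocovariance but not its scale, which is the "within a factor" caveat: C_xx(0) must be obtained separately.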
Bussgang's theorem. We have shown in (10-72) that the cross-covariance of the
processes x(t) and y(t) = sgn x(t) is proportional to C_xx(τ):
$$ C_{xy}(\tau) = KC_{xx}(\tau) \qquad K = \sqrt{\frac{2}{\pi C_{xx}(0)}} \tag{13-32} $$
To estimate C_xx(τ), it suffices, therefore, to estimate C_xy(τ). Using (13-29), we
obtain
$$ \hat{C}_{xx}(\tau) = \frac{1}{K}\hat{C}_{xy}(\tau) = \frac{1}{2KT}\int_{-T}^{T} x(t + \tau)\operatorname{sgn} x(t)\,dt \tag{13-33} $$
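The Bussgang estimator can be sketched in the same discrete-time setting. This is a minimal illustration assuming NumPy and a simulated unit-variance Gaussian AR(1) sequence; the process model, the seed, and the name `bussgang_estimate` are hypothetical. One factor of the product is replaced by its sign, so each term of the average is ±x.

```python
import numpy as np

rng = np.random.default_rng(1)
n, a = 200_000, 0.9

# Assumed test process: unit-variance Gaussian AR(1), C_xx[m] = a**|m|
e = rng.standard_normal(n) * np.sqrt(1 - a**2)
x = np.empty(n)
x[0] = rng.standard_normal()
for k in range(1, n):
    x[k] = a * x[k - 1] + e[k]

def bussgang_estimate(x, lag, c0):
    """Estimate C_xx[lag] from the time average of x[k+lag] * sgn x[k]."""
    K = np.sqrt(2 / (np.pi * c0))                  # constant K of (13-32)
    c_xy = np.mean(x[lag:] * np.sign(x[:-lag]))    # estimate of C_xy[lag]
    return c_xy / K

lags = (1, 5, 10)
c_hat = [bussgang_estimate(x, m, c0=1.0) for m in lags]
c_true = [a**m for m in lags]
```

Note that K depends on C_xx(0), so the variance must be known or estimated first; here it is set to 1 by construction.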
CORRELOMETERS AND SPECTROMETERS. A correlometer is a physical device
measuring the autocorrelation R(λ) of a process x(t). In Fig. 13-3 we show two
correlometers. The first consists of a delay element, a multiplier, and a low-pass
(LP) filter. The input to the LP filter is the process x(t − λ)x(t); the output y_1(t)
is the estimate of the mean R(λ) of the input. The second consists of a delay
element, an adder, a square-law detector, and an LP filter. The input to the LP
filter is the process [x(t − λ) + x(t)]²; the output y_2(t) is the estimate of the
mean 2[R(0) + R(λ)] of the input.
†S. Cambanis and E. Masry: "On the Reconstruction of the Covariance of Stationary Gaussian
Processes Through Zero-Memory Nonlinearities," IEEE Transactions on Information Theory, Vol.
IT-24, 1978.
FIGURE 13-3
A spectrometer is a physical device measuring the Fourier transform S(ω)
of R(λ). This device consists of a bandpass filter B(ω) with input x(t) and
output y(t), in series with a square-law detector and an LP filter (Fig. 13-4). The
input to the LP filter is the process y²(t); its output z(t) is the estimate of the
mean E{y²(t)} of the input. Suppose that B(ω) is a narrow-band filter of unit
energy with center frequency ω_0 and bandwidth 2c. If the function S(ω) is
continuous at ω_0 and c is sufficiently small, then S(ω) ≈ S(ω_0) for |ω − ω_0| < c;
hence [see (10-139)]
$$ E\{y^2(t)\} = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)B^2(\omega)\,d\omega \simeq \frac{S(\omega_0)}{2\pi}\int_{\omega_0-c}^{\omega_0+c} B^2(\omega)\,d\omega = S(\omega_0) $$
as in (10-153). This yields
$$ z(t) \simeq E\{y^2(t)\} \simeq S(\omega_0) $$
We give next the optical realization of the correlometer of Fig. 13-3b and
the spectrometer of Fig. 13-4.
FIGURE 13-4
The Michelson interferometer. The device of Fig. 13-5 is an optical correlometer.
It consists of a light source S, a beam-splitting surface B, and two mirrors.
Mirror M_1 is in a fixed position and mirror M_2 is movable. The light from the
source S is a random signal x(t) traveling with velocity c and it reaches a
square-law detector D along paths 1 and 2 as shown. The lengths of these paths
equal l and l + 2d respectively, where d is the displacement of mirror M_2 from
its equilibrium position.
The signal reaching the detector is thus the sum
$$ Ax(t - t_0) + Ax(t - t_0 - \lambda) $$
where A is the attenuation in each path, t_0 = l/c is the delay along path 1, and
λ = 2d/c is the additional delay due to the displacement of mirror M_2. The
detector output is the signal
$$ z(t) = A^2[x(t - t_0 - \lambda) + x(t - t_0)]^2 $$
Clearly,
$$ E\{z(t)\} = 2A^2[R(0) + R(\lambda)] $$
If, therefore, we use z(t) as the input to a low-pass filter, its output y(t) will be
proportional to R(0) + R(λ) provided that the process x(t) is correlation-ergodic
and the band of the filter is sufficiently narrow.
The Fabry-Perot interferometer. The device of Fig. 13-6 is an optical spectrometer.
The bandpass filter consists of two highly reflective plates P_1 and P_2
distance d apart and the input is a light beam x(t) with power spectrum S(ω).
FIGURE 13-6 Fabry-Perot interferometer
The frequency response of the filter is proportional to
$$ B(\omega) = \frac{1}{1 - r^2e^{-j2\omega d/c}} \qquad r \simeq 1 $$
where r is the reflection coefficient of each plate and c is the velocity of light in
the medium M between the plates. The function B(ω) is shown in Fig. 13-6b.
It consists of a sequence of bands centered at
$$ \omega_m = \frac{\pi mc}{d} $$
whose bandwidth tends to 0 as r → 1. If only the mth band of B(ω) overlaps
with S(ω) and r ≈ 1, then the output z(t) of the LP filter is proportional to
S(ω_m). To vary ω_m, we can either vary the distance d between the plates or the
dielectric constant of the medium M.
Distribution-Ergodic Processes
Any parameter of a probabilistic model that can be expressed as the mean of
some function of an SSS process x(t) can be estimated by a time average. For a
specific x, the distribution of x(t) is the mean of the process y(t) = U[x − x(t)]:
$$ y(t) = \begin{cases} 1 & x(t) \le x \\ 0 & x(t) > x \end{cases} \qquad E\{y(t)\} = P\{x(t) \le x\} = F(x) $$
FIGURE 13-7
Hence F(x) can be estimated by the time average of y(t). Inserting into (13-1),
we obtain the estimator
$$ F_T(x) = \frac{1}{2T}\int_{-T}^{T} y(t)\,dt = \frac{\tau_1 + \cdots + \tau_n}{2T} \tag{13-34} $$
where τ_i are the lengths of the time intervals during which x(t) is less than x
(Fig. 13-7a).
To find the variance of F_T(x), we must first find the autocovariance of y(t).
The product y(t + τ)y(t) equals 1 if x(t + τ) ≤ x and x(t) ≤ x; otherwise, it
equals 0. Hence
$$ R_{yy}(\tau) = P\{x(t + \tau) \le x,\; x(t) \le x\} = F(x, x; \tau) $$
where F(x, x; τ) is the second-order distribution of x(t). The variance of F_T(x)
is obtained from (13-4) if we replace C(τ) by the autocovariance F(x, x; τ) −
F²(x) of y(t). From (13-7) it follows that a process x(t) is distribution-ergodic iff
$$ \frac{1}{T}\int_0^T F(x, x; \tau)\,d\tau \xrightarrow[T\to\infty]{} F^2(x) \tag{13-35} $$
A sufficient condition is obtained from (13-10): A process x(t) is
distribution-ergodic if F(x, x; τ) → F²(x) as τ → ∞. This is the case if the RVs x(t) and
x(t + τ) are independent for large τ.
Density. To estimate the density of x(t), we form the time intervals Δτ_i during
which x(t) is between x and x + Δx (Fig. 13-7b). From (13-34) it follows that
$$ f(x)\,\Delta x \simeq F(x + \Delta x) - F(x) \simeq \frac{1}{2T}\sum_i \Delta\tau_i $$
Thus f(x)Δx equals the percentage of time that a single sample of x(t) is
between x and x + Δx. This can be used to design an analog estimator of f(x).
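The time-average estimators of F(x) and f(x) can be sketched for sampled data as follows. The example assumes NumPy, a simulated unit-variance Gaussian AR(1) sequence standing in for a single realization of a distribution-ergodic process, and hypothetical names `time_average_distribution` and `time_average_density`; the fraction of samples below a level plays the role of (τ_1 + ··· + τ_n)/2T.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n, a = 200_000, 0.9

# Assumed test process: unit-variance Gaussian AR(1), so F(x) is the N(0, 1) CDF
e = rng.standard_normal(n) * np.sqrt(1 - a**2)
x = np.empty(n)
x[0] = rng.standard_normal()
for k in range(1, n):
    x[k] = a * x[k - 1] + e[k]

def time_average_distribution(x, level):
    """Fraction of time the realization spends at or below `level`."""
    return float(np.mean(x <= level))

def time_average_density(x, level, dx):
    """Fraction of time spent in (level, level + dx], divided by dx."""
    return float(np.mean((x > level) & (x <= level + dx)) / dx)

def normal_cdf(v):
    return 0.5 * (1 + math.erf(v / math.sqrt(2)))

levels = (-1.0, 0.0, 1.5)
F_hat = [time_average_distribution(x, v) for v in levels]
F_true = [normal_cdf(v) for v in levels]
f_hat_0 = time_average_density(x, 0.0, dx=0.1)   # true density near 0 is about 0.399
```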
13-2 SPECTRAL ESTIMATION
We wish to estimate the power spectrum S(ω) of a real process x(t) in terms of
a single realization of a finite segment
$$ x_T(t) = x(t)p_T(t) \qquad p_T(t) = \begin{cases} 1 & |t| < T \\ 0 & |t| > T \end{cases} \tag{13-36} $$
of x(t). The spectrum S(ω) is not the mean of some function of x(t). It cannot,
therefore, be estimated directly as a time average. It is, however, the Fourier
transform of the autocorrelation
$$ R(\tau) = E\{x(t + \tau/2)\,x(t - \tau/2)\} $$
It will be determined in terms of the estimate of R(τ). This estimate cannot be
computed from (13-25) because the product x(t + τ/2)x(t − τ/2) is available
only for t in the interval (−T + |τ|/2, T − |τ|/2) (Fig. 13-8). Changing 2T to
2T − |τ|, we obtain the estimate
$$ R^T(\tau) = \frac{1}{2T - |\tau|}\int_{-T+|\tau|/2}^{T-|\tau|/2} x\left(t + \frac{\tau}{2}\right)x\left(t - \frac{\tau}{2}\right)dt \tag{13-37} $$
This integral specifies R^T(τ) for |τ| < 2T; for |τ| > 2T we set R^T(τ) = 0. The
above estimate is unbiased; however, its variance increases as |τ| increases
because the length 2T − |τ| of the integration interval decreases. Instead of
R^T(τ), we shall use the product
$$ R_T(\tau) = \left(1 - \frac{|\tau|}{2T}\right)R^T(\tau) \tag{13-38} $$
This estimator is biased; however, its variance is smaller than the variance of
R^T(τ). The main reason we use it is that its transform is proportional to the
energy spectrum of the segment x_T(t) of x(t) [see (13-39)].
FIGURE 13-8
The periodogram
The periodogram of a process x(t) is by definition the process
$$ S_T(\omega) = \frac{1}{2T}\left|\int_{-T}^{T} x(t)e^{-j\omega t}\,dt\right|^2 \tag{13-39} $$
The above integral is the Fourier transform of the known segment x_T(t) of x(t):
$$ S_T(\omega) = \frac{1}{2T}|X_T(\omega)|^2 \qquad X_T(\omega) = \int_{-T}^{T} x(t)e^{-j\omega t}\,dt $$
We shall express S_T(ω) in terms of the estimator R_T(τ) of R(τ).
THEOREM
$$ S_T(\omega) = \int_{-2T}^{2T} R_T(\tau)e^{-j\omega\tau}\,d\tau \tag{13-40} $$
Proof. The integral in (13-37) is the convolution of x_T(t) with x_T(−t) because
x_T(t) = 0 for |t| > T. Hence
$$ R_T(\tau) = \frac{1}{2T}\,x_T(\tau) * x_T(-\tau) \tag{13-41} $$
Since x_T(t) is real, the transform of x_T(−t) equals X_T*(ω). This shows that
(convolution theorem) the transform of R_T(τ) equals the right side of (13-39).
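The discrete-time counterpart of this theorem can be checked numerically. The sketch below assumes NumPy and an arbitrary test sequence of N samples: the periodogram |X(ω)|²/N computed with an FFT is compared with the transform of the biased autocorrelation estimate r[m] = (1/N) Σ x[n + m]x[n], which is the sampled analog of (13-38). The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)          # arbitrary real test sequence

L = 2 * N                           # FFT length >= 2N - 1
periodogram = np.abs(np.fft.fft(x, L)) ** 2 / N

# Biased autocorrelation estimate for lags m = -(N-1), ..., N-1
r = np.correlate(x, x, mode="full") / N
lags = np.arange(-(N - 1), N)

# Direct evaluation of sum_m r[m] exp(-j w m) on the FFT frequency grid
w = 2 * np.pi * np.arange(L) / L
transform = np.array([np.sum(r * np.exp(-1j * wk * lags)) for wk in w])
```

The two arrays agree to rounding error, and the transform is real because r[m] is even.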
In the early years of signal analysis, the spectral properties of random
processes were expressed in terms of their periodogram. This approach yielded
reliable results so long as the integrations were based on analog techniques of
limited accuracy. With the introduction of digital processing, the accuracy was
improved and, paradoxically, the computed spectra exhibited noisy behavior.
This apparent paradox can be readily explained in terms of the properties of the
periodogram: The integral in (13-40) depends on all values of R_T(τ) for τ large
and small. The variance of R_T(τ) is small for small τ only, and it increases as
τ → 2T. As a result, S_T(ω) approaches a white-noise process with mean S(ω) as
T increases [see (13-57)].
To overcome this behavior of S_T(ω), we can do one of two things: (1) We
replace in (13-40) the term R_T(τ) by the product w(τ)R_T(τ) where w(τ) is a
function (window) close to 1 near the origin, approaching 0 as τ → 2T. This
deemphasizes the unreliable parts of R_T(τ), thus reducing the variance of its
transform; (2) We convolve S_T(ω) with a suitable window as in (11-164).
We continue with the determination of the bias and the variance of S_T(ω).
Bias. From (13-38) and (13-40) it follows that
$$ E\{S_T(\omega)\} = \int_{-2T}^{2T}\left(1 - \frac{|\tau|}{2T}\right)R(\tau)e^{-j\omega\tau}\,d\tau $$
Since
$$ \left(1 - \frac{|\tau|}{2T}\right)p_{2T}(\tau) \leftrightarrow \frac{2\sin^2 T\omega}{T\omega^2} $$
we conclude that [see also (12-83)]
$$ E\{S_T(\omega)\} = \int_{-\infty}^{\infty}\frac{\sin^2 T(\omega - y)}{\pi T(\omega - y)^2}\,S(y)\,dy \tag{13-42} $$
The above shows that the mean of the periodogram is a smoothed version of
S(ω); however, the smoothing kernel sin² T(ω − y)/πT(ω − y)² takes
significant values only in an interval of the order of 1/T centered at y = ω. If,
therefore, T is sufficiently large, we can set S(y) ≈ S(ω) in (13-42) for every
point of continuity of S(ω). Hence for large T,
$$ E\{S_T(\omega)\} \simeq S(\omega)\int_{-\infty}^{\infty}\frac{\sin^2 T(\omega - y)}{\pi T(\omega - y)^2}\,dy = S(\omega) \tag{13-43} $$
From this it follows that S_T(ω) is asymptotically an unbiased estimator of S(ω).
Data window. If S(ω) is not nearly constant in an interval of the order of 1/T,
the periodogram is a biased estimate of S(ω). To reduce the bias, we replace in
(13-39) the process x(t) by the product c(t)x(t). This yields the modified
periodogram
$$ S_c(\omega) = \frac{1}{2T}\left|\int_{-T}^{T} c(t)x(t)e^{-j\omega t}\,dt\right|^2 \tag{13-44} $$
The factor c(t) is called the data window. Denoting by C(ω) its Fourier
transform, we conclude that [see (12-82)]
$$ E\{S_c(\omega)\} = \frac{1}{4\pi T}\,S(\omega) * C^2(\omega) \tag{13-45} $$
VARIANCE. For the determination of the variance of S_T(ω), knowledge of the
fourth-order moments of x(t) is required. For normal processes, all moments
can be expressed in terms of R(τ). Furthermore, as T → ∞, the fourth-order
moments of most processes approach the corresponding moments of a normal
process with the same autocorrelation (see Papoulis, 1977). We can assume,
therefore, without essential loss of generality, that x(t) is normal with zero
mean.
THEOREM. For large T:
$$ \operatorname{Var} S_T(\omega) \simeq \begin{cases} 2S^2(0) & \omega = 0 \\ S^2(\omega) & |\omega| \gg 1/T \end{cases} \tag{13-46} $$
at every point of continuity of S(ω).
Proof. The Fourier transform of the autocorrelation R(t_1 − t_2)p_T(t_1)p_T(t_2) of
the process x_T(t) equals
$$ \Gamma(u, v) = \int_{-\infty}^{\infty}\frac{2\sin T\alpha\,\sin T(u + v - \alpha)}{\pi\alpha(u + v - \alpha)}\,S(u - \alpha)\,d\alpha \tag{13-47} $$
This follows from (12-80) with W(ω) = 2 sin Tω/ω. The fraction in (13-47) takes
significant values only if the terms αT and (u + v − α)T are of the order of 1;
hence, the entire fraction is negligible if |u + v| ≫ 1/T. Setting u = v = ω, we
conclude that Γ(ω, ω) ≈ 0 and
$$ \Gamma(\omega, -\omega) = \int_{-\infty}^{\infty}\frac{2\sin^2 T\alpha}{\pi\alpha^2}\,S(\omega - \alpha)\,d\alpha \simeq S(\omega)\int_{-\infty}^{\infty}\frac{2\sin^2 T\alpha}{\pi\alpha^2}\,d\alpha = 2TS(\omega) \tag{13-48} $$
for |ω| ≫ 1/T and since [see (12-74)]
$$ \operatorname{Var} S_T(\omega) = \frac{1}{4T^2}\left[\Gamma^2(\omega, -\omega) + \Gamma^2(\omega, \omega)\right] $$
and Γ(0, 0) = 2TS(0), (13-46) follows.
Note For a specific τ, no matter how large, the estimate R_T(τ) → R(τ) as T → ∞. Its
transform S_T(ω), however, does not tend to S(ω) as T → ∞. The reason is that the
convergence of R_T(τ) to R(τ) is not uniform in τ, that is, given ε > 0, we cannot find a
constant T_0 independent of τ such that |R_T(τ) − R(τ)| < ε for every τ, and every
T > T_0.
Proceeding similarly, we can show that the variance of the spectrum S_c(ω)
obtained with the data window c(t) is essentially equal to the variance of S_T(ω).
This shows that use of data windows does not reduce the variance of the
estimate. To improve the estimation, we must replace in (13-40) the sample
autocorrelation R_T(τ) by the product w(τ)R_T(τ), or, equivalently, we must
smooth the periodogram S_T(ω).
Note Data windows might be useful if we smooth S_T(ω) by an ensemble average:
Suppose that we have access to N independent samples x(t, ζ_i) of x(t), or, we divide a
single long sample into N essentially independent pieces, each of duration 2T. We form
the periodograms S_T(ω, ζ_i) of each sample and their average
$$ \bar{S}_T(\omega) = \frac{1}{N}\sum_i S_T(\omega, \zeta_i) \tag{13-49} $$
As we know,
$$ E\{\bar{S}_T(\omega)\} = S(\omega) * \frac{\sin^2 T\omega}{\pi T\omega^2} \qquad \operatorname{Var}\bar{S}_T(\omega) \simeq \frac{1}{N}S^2(\omega) \tag{13-50} $$
If N is large, the variance of the average is small. However, its bias might be significant. Use
of data windows is in this case desirable.
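The variance behavior described above can be illustrated with a small Monte Carlo experiment. The sketch assumes NumPy and unit-variance white Gaussian noise, for which S(ω) = 1 in discrete time; the segment count, segment length, number of trials, and frequency bin are arbitrary illustrative choices. It compares the spread of a single periodogram value with that of an average over 16 independent segments.

```python
import numpy as np

rng = np.random.default_rng(0)
n_seg, seg_len, trials = 16, 128, 400
k = seg_len // 4                 # a frequency bin away from 0 and pi

single, averaged = [], []
for _ in range(trials):
    # n_seg independent segments of unit-variance white noise (S = 1)
    x = rng.standard_normal((n_seg, seg_len))
    p = np.abs(np.fft.fft(x, axis=1)) ** 2 / seg_len   # one periodogram per segment
    single.append(p[0, k])           # periodogram of one segment
    averaged.append(p[:, k].mean())  # average over the n_seg periodograms

var_single = float(np.var(single))       # expected near S^2 = 1
var_averaged = float(np.var(averaged))   # expected near S^2 / n_seg
```

The variance of the single periodogram stays near S² no matter how long the segment is, whereas averaging N segments reduces it by roughly the factor N.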
Smoothed Spectrum
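We shall assume as before that T is large and x(t) is normal. To improve the
estimate, we form the smoothed spectrum
$$ S_w(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_T(\omega - y)W(y)\,dy = \int_{-2T}^{2T} w(\tau)R_T(\tau)e^{-j\omega\tau}\,d\tau \tag{13-51} $$
where
$$ w(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} W(\omega)e^{j\omega t}\,d\omega $$
The function w(t) is called the lag window and its transform W(ω) the
spectral window. We shall assume that W(−ω) = W(ω) and
$$ w(0) = 1 = \frac{1}{2\pi}\int_{-\infty}^{\infty} W(\omega)\,d\omega \qquad W(\omega) \ge 0 \tag{13-52} $$
Bias. From (13-42) it follows that
$$ E\{S_w(\omega)\} = \frac{1}{2\pi}E\{S_T(\omega)\} * W(\omega) = \frac{1}{2\pi}\,S(\omega) * \frac{\sin^2 T\omega}{\pi T\omega^2} * W(\omega) $$
Assuming that W(ω) is nearly constant in any interval of length 1/T, we obtain
the large T approximation
$$ E\{S_w(\omega)\} \simeq \frac{1}{2\pi}\,S(\omega) * W(\omega) \tag{13-53} $$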
Variance. We shall determine the variance of S_w(ω) using the identity [see
(12-74)]
$$ \operatorname{Cov}[S_T(u), S_T(v)] = \frac{1}{4T^2}\left[\Gamma^2(u, -v) + \Gamma^2(u, v)\right] \tag{13-54} $$
This problem is in general complicated. We shall outline an approximate
solution based on the following assumptions: The constant T is large in the
sense that the functions S(ω) and W(ω) are nearly constant in any interval of
length 1/T. The width of W(ω), that is, the constant σ such that W(ω) ≈ 0 for
|ω| > σ, is small in the sense that S(ω) is nearly constant in any interval of
length 2σ.
Reasoning as in the proof of (13-48), we conclude from (13-47) that
Γ(u, v) ≈ 0 for u + v ≫ 1/T and
$$ \Gamma(u, -v) \simeq S(u)\int_{-\infty}^{\infty}\frac{2\sin T(u - v - \alpha)\sin T\alpha}{\pi(u - v - \alpha)\alpha}\,d\alpha = S(u)\,\frac{2\sin T(u - v)}{u - v} $$
This is the generalization of (13-48). Inserting into (13-54), we obtain
$$ \operatorname{Cov}[S_T(u), S_T(v)] \simeq \frac{\sin^2 T(u - v)}{T^2(u - v)^2}\,S^2(u) \tag{13-55} $$
Equation (13-46) is a special case obtained with u = v = ω.
THEOREM. For |ω| ≫ 1/T
$$ \operatorname{Var} S_w(\omega) \simeq \frac{E_w}{2T}\,S^2(\omega) \tag{13-56} $$
where
$$ E_w = \frac{1}{2\pi}\int_{-\infty}^{\infty} W^2(\omega)\,d\omega = \int_{-\infty}^{\infty} w^2(t)\,dt $$
Proof. The smoothed spectrum S_w(ω) equals the convolution of S_T(ω) with the
spectral window W(ω)/2π. From this and (10-87) it follows mutatis mutandis
that the variance of S_w(ω) is a double convolution involving the covariance
of S_T(ω) and the window W(ω). The fraction in (13-55) is negligible for
|u − v| ≫ 1/T. In any interval of length 1/T, the function W(ω) is nearly
constant by assumption. This leads to the conclusion that in the evaluation of
the variance of S_w(ω), the covariance of S_T(ω) can be approximated by an
impulse of area equal to the area
$$ S^2(u)\int_{-\infty}^{\infty}\frac{\sin^2 T(u - v)}{T^2(u - v)^2}\,dv = \frac{\pi}{T}\,S^2(u) $$
of the right side of (13-55). This yields
$$ \operatorname{Cov}[S_T(u), S_T(v)] \simeq q(u)\delta(u - v) \qquad q(u) = \frac{\pi}{T}\,S^2(u) \tag{13-57} $$
From the above and (10-91) it follows that
$$ \operatorname{Var} S_w(\omega) \simeq \frac{\pi}{T}\int_{-\infty}^{\infty} S^2(\omega - y)\,\frac{W^2(y)}{4\pi^2}\,dy \simeq \frac{S^2(\omega)}{2T}\int_{-\infty}^{\infty}\frac{W^2(y)}{2\pi}\,dy $$
and (13-56) results.
WINDOW SELECTION. The selection of the window pair w(t) ↔ W(ω) depends
on two conflicting requirements: For the variance of S_w(ω) to be small, the
energy E_w of the lag window w(t) must be small compared to T. From this it
follows that w(t) must approach 0 as t → 2T. We can assume, therefore,
without essential loss of generality that w(t) = 0 for |t| > M where M is a
fraction of 2T. Thus
$$ S_w(\omega) = \int_{-M}^{M} w(t)R_T(t)e^{-j\omega t}\,dt \qquad M < 2T $$
The mean of S_w(ω) is a smoothed version of S(ω). To reduce the effect of the
resulting bias, we must use a spectral window W(ω) of short duration. This is in
conflict with the requirement that M be small (uncertainty principle). The final
choice of M is a compromise between bias and variance. The quality of the
estimate depends on M and on the shape of w(t). To separate the shape factor
from the size factor, we express w(t) as a scaled version of a normalized window
w_0(t) of size 2:
$$ w(t) = w_0\left(\frac{t}{M}\right) \leftrightarrow W(\omega) = MW_0(M\omega) \tag{13-58} $$
where
$$ w_0(t) = 0 \quad \text{for } |t| > 1 $$
In the absence of any prior information, we have no way of determining the
optimum size of M. The following considerations, however, are useful: A
reasonable measure of the reliability of the estimation is the ratio
For most windows in use, Ew is between 0.5Af and 0.8Af (see Table 13-1). If we
set a = 0.2 as the largest acceptable a, we must set M < T/2. If nothing is
known about S(<o), we estimate it several times using windows of decreasing
size. We start with M = T/2 and observe the form of the resulting estimate
Sw(w). This estimate might not be very reliable; however, it gives us some idea
of the form of 5(cu). If we see that the estimate is nearly constant in any interval
of the order of 1/M, we conclude that the initial choice M = T/2 is too large.
A reduction of M will not appreciably affect the bias but it will yield a smaller
variance. We repeat this process until we obtain a balance between bias and
variance. As we show later, for optimum balance, the standard deviation of the
estimate must equal twice its bias. The quality of the estimate depends, of
course, on the size of the available sample. If, for the given T, the resulting
Sw(«) is not smooth for M — T/2, we conclude that T is not large enough for a
satisfactory estimate.
To complete the specification of the window, we must select the form of
w_0(t). In this selection, we are guided by the following considerations:
1. The window W(ω) must be positive and its area must equal 2π as in (13-52).
This ensures the positivity and consistency of the estimation.
TABLE 13-1

Window         w(t)                                  W(ω)                           m_2     E_w     n
1. Bartlett    1 − |t|                               4 sin²(ω/2) / ω²               ∞       2/3     2
2. Tukey       (1/2)(1 + cos πt)                     π² sin ω / [ω(π² − ω²)]        π²/2    3/4     3
3. Parzen      [√3(1 − 2|t|)p_{1/2}(t)] *            (3/4)[sin(ω/4) / (ω/4)]⁴       12      0.539   4
               [√3(1 − 2|t|)p_{1/2}(t)]
4. Papoulis†   (1/π)|sin πt| + (1 − |t|) cos πt      8π² cos²(ω/2) / (π² − ω²)²     π²      0.587   4

†A. Papoulis: "Minimum Bias Windows for High Resolution Spectral Estimates," IEEE Transactions
on Information Theory, vol. IT-19, 1973.
2. For small bias, the "duration" of W(ω) must be small. A measure of
duration is the second moment
$$ m_2 = \frac{1}{2\pi}\int_{-\infty}^{\infty}\omega^2W(\omega)\,d\omega \tag{13-60} $$
3. The function W(ω) must go to 0 rapidly as ω increases (small sidelobes). This
reduces the effect of distant peaks in the estimate of S(ω). As we know, the
asymptotic properties of W(ω) depend on the continuity properties of its
inverse w(t). Since w(t) = 0 for |t| > M, the condition that W(ω) → 0 as
A/ωⁿ as ω → ∞ leads to the requirement that the derivatives of w(t) of
order up to n − 1 be zero at the end-points ±M of the lag window w(t):
$$ w(\pm M) = w'(\pm M) = \cdots = w^{(n-1)}(\pm M) = 0 \tag{13-61} $$
4. The energy E_w of w(t) must be small. This reduces the variance of the
estimate.
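The energies E_w quoted in Table 13-1 can be reproduced by direct numerical integration of w²(t) over (−1, 1). The sketch below assumes NumPy; the closed form used for the Parzen window (1 − 6t² + 6|t|³ for |t| ≤ 1/2 and 2(1 − |t|)³ for 1/2 < |t| ≤ 1) is the standard expression for the self-convolution of a triangle and is an assumption here, since the table gives the window only as a convolution. The name `energy` is illustrative.

```python
import numpy as np

t = np.linspace(-1.0, 1.0, 200_001)
dt = t[1] - t[0]
a = np.abs(t)

windows = {
    "Bartlett": 1 - a,
    "Tukey": 0.5 * (1 + np.cos(np.pi * t)),
    # Assumed closed form of the Parzen window (triangle convolved with itself)
    "Parzen": np.where(a <= 0.5, 1 - 6 * a**2 + 6 * a**3, 2 * (1 - a) ** 3),
    "Papoulis": np.abs(np.sin(np.pi * t)) / np.pi + (1 - a) * np.cos(np.pi * t),
}

def energy(w):
    """Trapezoidal approximation of E_w, the integral of w(t)^2 over (-1, 1)."""
    y = w**2
    return float(np.sum((y[1:] + y[:-1]) / 2) * dt)

energies = {name: energy(w) for name, w in windows.items()}
```

All four values fall between 0.5 and 0.8, which is the range used above to relate the reliability ratio to M/T.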
Over the years, a variety of windows have been proposed. They meet more
or less the stated requirements but most of them are selected empirically.
Optimality criteria leading to windows that do not depend on the form of the
unknown S(ω) are difficult to generate. However, as we show next, for
high-resolution estimates (large T) the last example of Table 13-1 minimizes the bias. In
this table and in Fig. 13-9, we list the most common window pairs w(t) ↔ W(ω).
We also show the values of the second moment m_2, the energy E_w, and the
exponent n of the asymptotic attenuation A/ωⁿ of W(ω). In all cases, w(t) = 0
for |t| > 1.
FIGURE 13-9
OPTIMUM WINDOWS. We introduce next three classes of windows. In all cases,
we assume that the data size T and the scaling factor M are large
(high-resolution estimates) in the sense that we can use the parabolic approximation of
S(ω − α) in the evaluation of the bias. This yields [see (11-168)]
$$ \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega - \alpha)W(\alpha)\,d\alpha \simeq S(\omega) + \frac{S''(\omega)}{4\pi}\int_{-\infty}^{\infty}\alpha^2W(\alpha)\,d\alpha \tag{13-62} $$
Note that since W(ω) > 0, the above is an equality if we replace the term S''(ω)
by S''(ω + δ) where δ is a constant in the region where W(ω) takes significant
values.
Minimum bias data window. The modified periodogram S_c(ω) obtained with
the data window c(t) is a biased estimator of S(ω). Inserting (13-62) into
(13-45), we conclude that the bias equals
$$ B_c(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega - \alpha)C^2(\alpha)\,d\alpha - S(\omega) \simeq \frac{S''(\omega)}{4\pi}\int_{-\infty}^{\infty}\alpha^2C^2(\alpha)\,d\alpha \tag{13-63} $$
We have thus expressed the bias as a product where the first factor depends
only on S(ω) and the second depends only on C(ω). This separation permits us
to find C(ω) so as to minimize B_c(ω). To do so, it suffices to minimize the
second moment
$$ M_2 = \frac{1}{2\pi}\int_{-\infty}^{\infty}\omega^2C^2(\omega)\,d\omega = \int_{-T}^{T}|c'(t)|^2\,dt \tag{13-64} $$
of C²(ω) subject to the constraints
$$ \frac{1}{2\pi}\int_{-\infty}^{\infty} C^2(\omega)\,d\omega = 1 \qquad C(-\omega) = C(\omega) $$
It can be shown that† the optimum data window is a truncated cosine (Fig.
13-10):
$$ c(t) = \begin{cases} \dfrac{1}{\sqrt{T}}\cos\dfrac{\pi}{2T}t & |t| < T \\ 0 & |t| > T \end{cases} \;\leftrightarrow\; C(\omega) = 4\pi\sqrt{T}\,\frac{\cos T\omega}{\pi^2 - 4T^2\omega^2} \tag{13-65} $$
FIGURE 13-10
The resulting second moment M_2 equals 1. Note that if no data window is used,
then c(t) = 1 and M_2 = 2. Thus the optimum data window yields a 50 percent
reduction of the bias.
†A. Papoulis: "Apodization for Optimum Imaging of Smooth Objects," J. Opt. Soc. Am., Vol. 62,
December, 1972.
Minimum bias spectral window. From (13-53) and (13-62) it follows that the
bias of S_w(ω) equals
$$ B(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega - \alpha)W(\alpha)\,d\alpha - S(\omega) \simeq \frac{m_2}{2}\,S''(\omega) \tag{13-66} $$
where m_2 is the second moment of W(ω)/2π. To minimize B(ω), it suffices,
therefore, to minimize m_2 subject to the constraints
$$ W(\omega) \ge 0 \qquad W(-\omega) = W(\omega) \qquad \frac{1}{2\pi}\int_{-\infty}^{\infty} W(\omega)\,d\omega = 1 \tag{13-67} $$
This is the same as the problem just considered if we replace 2T by M and we
set
$$ W(\omega) = C^2(\omega) $$
This yields the pair (Fig. 13-11)
$$ w(t) = \begin{cases} \dfrac{1}{\pi}\left|\sin\dfrac{\pi}{M}t\right| + \left(1 - \dfrac{|t|}{M}\right)\cos\dfrac{\pi}{M}t & |t| \le M \\ 0 & |t| > M \end{cases} \tag{13-68} $$
$$ W(\omega) = 8M\pi^2\,\frac{\cos^2(M\omega/2)}{(\pi^2 - M^2\omega^2)^2} \tag{13-69} $$
FIGURE 13-11
Thus the last window in Table 13-1 minimizes the bias in high-resolution
spectral estimates.
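The transform pair for the minimum-bias window can be verified numerically. The sketch below assumes NumPy and an arbitrary scale M = 2; it integrates w(t)cos ωt by the trapezoidal rule and compares the result with the closed form 8Mπ² cos²(Mω/2)/(π² − M²ω²)², which is what (13-69) gives when W(ω) = C²(ω) with 2T replaced by M. The function names and test frequencies are illustrative, and the frequencies avoid the removable singularity at ω = π/M.

```python
import numpy as np

def lag_window(t, M):
    """Minimum-bias lag window: zero outside |t| <= M."""
    u = np.abs(t) / M
    w = np.abs(np.sin(np.pi * u)) / np.pi + (1 - u) * np.cos(np.pi * u)
    return np.where(u <= 1, w, 0.0)

def spectral_window(omega, M):
    """Closed-form transform of the lag window (omega != pi / M)."""
    return 8 * M * np.pi**2 * np.cos(M * omega / 2) ** 2 / (np.pi**2 - (M * omega) ** 2) ** 2

M = 2.0
t = np.linspace(-M, M, 400_001)
dt = t[1] - t[0]

def transform(omega):
    # w(t) is even, so its Fourier transform reduces to a cosine integral
    y = lag_window(t, M) * np.cos(omega * t)
    return float(np.sum((y[1:] + y[:-1]) / 2) * dt)

omegas = (0.0, 0.7, 2.5)
numeric = [transform(w) for w in omegas]
closed = [float(spectral_window(w, M)) for w in omegas]
```

At ω = 0 both sides equal 8M/π², the area of the lag window.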
LMS spectral window. We shall finally select the spectral window W(ω) so as to
minimize the MS estimation error
$$ e = B^2(\omega) + \operatorname{Var} S_w(\omega) \tag{13-70} $$
We have shown that for sufficiently large values of T, the periodogram S_T(ω)
can be written as a sum S(ω) + ν(ω) where ν(ω) is a nonstationary white noise
process with autocorrelation πS²(u)δ(u − v)/T as in (13-57). Thus our
problem is the estimation of a deterministic function S(ω) in the presence of
additive noise ν(ω). This problem was considered in Sec. 11-6. We shall
reestablish the results in the context of spectral estimation.
We start with a rectangular window of size 2Δ and area 1. The resulting
estimate of S(ω) is the moving average
$$ S_\Delta(\omega) = \frac{1}{2\Delta}\int_{-\Delta}^{\Delta} S_T(\omega - \alpha)\,d\alpha \tag{13-71} $$
of S_T(ω). The rectangular window was used first by Daniell† in the early years
of spectral estimation. It is a special case of the spectral window W(ω)/2π.
Note that the corresponding lag window sin Δt/2πΔt is not time-limited.
With the familiar large-T assumption, the periodogram S_T(ω) is an
unbiased estimator of S(ω). Hence the bias of S_Δ(ω) equals
$$ \frac{1}{2\Delta}\int_{-\Delta}^{\Delta} S(\omega - y)\,dy - S(\omega) \simeq \frac{S''(\omega)}{4\Delta}\int_{-\Delta}^{\Delta} y^2\,dy = S''(\omega)\,\frac{\Delta^2}{6} $$
and its variance equals
$$ \frac{\pi S^2(\omega)}{4\Delta^2T}\int_{-\Delta}^{\Delta} d\omega = \frac{\pi S^2(\omega)}{2\Delta T} $$
This follows from (11-172) or directly from (13-56) where we replace the window
W(ω)/2π by a rectangular window with energy π/Δ. This yields
$$ e = \frac{\Delta^4}{36}\,[S''(\omega)]^2 + \frac{\pi S^2(\omega)}{2\Delta T} \tag{13-72} $$
Proceeding as in (11-176), we conclude that e is minimum if
$$ \Delta = \left(\frac{9\pi}{2T}\right)^{0.2}\left[\frac{S(\omega)}{S''(\omega)}\right]^{0.4} $$
Setting de/dΔ = 0 gives Var S_Δ(ω) = 4B²(ω); the resulting standard deviation
of S_Δ(ω) therefore equals twice its bias (see two-to-one
rule).
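The trade-off in (13-72) can be checked numerically. The sketch below assumes NumPy and hypothetical values for S(ω), S''(ω), and T at one fixed frequency; the names `mse` and `optimal_delta` are illustrative. It evaluates the MS error on a grid around the closed-form Δ and compares the standard deviation with the bias at the minimum, where the standard deviation comes out as twice the magnitude of the bias, in line with the balance condition quoted in the window-selection discussion.

```python
import numpy as np

def mse(delta, S, S2, T):
    """MS error (13-72) of the moving-average estimate, with its two parts."""
    bias = S2 * delta**2 / 6                 # bias of S_Delta
    var = np.pi * S**2 / (2 * delta * T)     # variance of S_Delta
    return bias**2 + var, bias, var

def optimal_delta(S, S2, T):
    """Closed-form minimizer of the MS error."""
    return (9 * np.pi / (2 * T)) ** 0.2 * (S / abs(S2)) ** 0.4

# Hypothetical values of S(w), S''(w), and T at one frequency
S, S2, T = 1.0, -4.0, 500.0

d_opt = optimal_delta(S, S2, T)
e_opt, bias, var = mse(d_opt, S, S2, T)

grid = np.linspace(0.5 * d_opt, 1.5 * d_opt, 2001)
e_grid = np.array([mse(d, S, S2, T)[0] for d in grid])
d_grid_best = grid[e_grid.argmin()]
```

In practice S(ω) and S''(ω) are unknown, so a formula of this kind can only be applied iteratively with preliminary estimates.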
Suppose finally that the spectral window is a function of unknown form. We wish to determine its shape so as to minimize the MS error $e$. Proceeding as in (11-177), we can show that $e$ is minimum if the window is a truncated parabola:
$$S_w(\omega) = \frac{3}{4\Delta}\int_{-\Delta}^{\Delta} S_T(\omega - y)\left(1 - \frac{y^2}{\Delta^2}\right)dy \tag{13-73}$$
†P. J. Daniell: Discussion on “Symposium on Autocorrelation in Time Series,” J. Roy. Statist. Soc. Suppl., 8, 1946.
This window was first suggested by Priestley.† Note that unlike the earlier windows, it is frequency dependent and its size is a function of the unknown spectrum $S(\omega)$ and its second derivative. To determine $S_w(\omega + \delta)$ we must therefore estimate first not only $S(\omega)$ but also $S''(\omega)$. Using these estimates we determine $\Delta$ for the next step.
13-3 EXTRAPOLATION AND
SYSTEM IDENTIFICATION
In the preceding discussion, we computed the estimate $R_T(\tau)$ of $R(\tau)$ for $|\tau| < M$ and used as the estimate of $S(\omega)$ the Fourier transform $S_w(\omega)$ of the product $w(\tau)R_T(\tau)$. The portion of $R_T(\tau)$ for $|\tau| > M$ was not used. In this section, we shall assume that $S(\omega)$ belongs to a class of functions that can be specified in terms of certain parameters, and we shall use the estimated part of $R(\tau)$ to determine these parameters. In our development, we shall not consider the variance problem. We shall assume that the portion of $R(\tau)$ for $|\tau| < M$ is known exactly. This is a realistic assumption if $T \gg M$ because $R_T(\tau) \to R(\tau)$ for $|\tau| < M$ as $T \to \infty$. A physical problem leading to the assumption that $R(\tau)$ is known exactly but only for $|\tau| < M$ is the Michelson interferometer. In this example, the time of observation is arbitrarily large; however, $R(\tau)$ can be determined only for $|\tau| < M$ where $M$ is a constant proportional to the maximum displacement of the moving mirror (Fig. 13-5).
Our problem can thus be phrased as follows: We are given a finite segment
$$R_M(\tau) = \begin{cases} R(\tau) & |\tau| < M \\ 0 & |\tau| > M \end{cases}$$
of the autocorrelation $R(\tau)$ of a process $x(t)$ and we wish to estimate its power spectrum $S(\omega)$. This is essentially a deterministic problem: We wish to find the Fourier transform $S(\omega)$ of a function $R(\tau)$ knowing only the segment $R_M(\tau)$ of $R(\tau)$ and the fact that $S(\omega) \ge 0$. This problem does not have a unique solution. Our task then is to find a particular $S(\omega)$ that is close in some sense to the unknown spectrum. In the early years of spectral estimation, the function $S(\omega)$ was estimated with the method of windows (Blackman and Tukey†). In this method, the unknown part of $R(\tau)$ is replaced by 0 and the known or estimated part is tapered by a suitable factor $w(\tau)$. In recent years, a different approach has been used: It is assumed that $S(\omega)$ can be specified in terms of a finite number of parameters (parametric extrapolation) and the problem is reduced to the estimation of these parameters. In this section we concentrate on the extrapolation method, starting with a brief coverage of the method of windows.
†M. B. Priestley: “Basic Considerations in the Estimation of Power Spectra,” Technometrics, 4, 1962.
Method of windows. The continuous-time version of this method is treated in the last section in the context of the bias reduction problem: We use as the estimate of $S(\omega)$ the integral
$$S_w(\omega) = \int_{-M}^{M} w(\tau)R(\tau)e^{-j\omega\tau}\,d\tau = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega - \alpha)W(\alpha)\,d\alpha \tag{13-74}$$
and we select $w(\tau)$ so as to minimize in some sense the estimation error $S_w(\omega) - S(\omega)$. If $M$ is large in the sense that $S(\omega - \alpha) \simeq S(\omega)$ for $|\alpha| < 1/M$, we can use the approximation [see (13-62)]
$$S_w(\omega) - S(\omega) \simeq \frac{S''(\omega)}{4\pi}\int_{-\infty}^{\infty} \alpha^2 W(\alpha)\,d\alpha$$
This is minimum if
$$w(\tau) = \frac{1}{\pi}\left|\sin\frac{\pi}{M}\tau\right| + \left(1 - \frac{|\tau|}{M}\right)\cos\frac{\pi}{M}\tau \qquad |\tau| < M$$
The discrete-time version of this method is similar: We are given a finite segment
$$R_L[m] = \begin{cases} R[m] & |m| \le L \\ 0 & |m| > L \end{cases} \tag{13-75}$$
of the autocorrelation $R[m] = E\{x[n + m]x[n]\}$ of a process $x[n]$ and we wish to estimate its power spectrum
$$S(\omega) = \sum_{m=-\infty}^{\infty} R[m]e^{-jm\omega}$$
We use as the estimate of $S(\omega)$ the DFT
$$S_w(\omega) = \sum_{m=-L}^{L} w[m]R[m]e^{-jm\omega} = \frac{1}{2\pi}\int_{-\pi}^{\pi} S(\omega - \alpha)W(\alpha)\,d\alpha \tag{13-76}$$
of the product $w[m]R[m]$ where $w[m] \leftrightarrow W(\omega)$ is a DFT pair. The criteria for selecting $w[m]$ are the same as in the continuous-time case. In fact, if $M$ is large, we can choose for $w[m]$ the samples
$$w[m] = w(Mm/L) \qquad m = 0, \ldots, L \tag{13-77}$$
of an analog window $w(t)$ where $M$ is the size of $w(t)$.
In a real problem, the data $R_L[m]$ are not known exactly. They are estimated in terms of the $J$ samples of $x[n]$:
$$R_L[m] = \frac{1}{J}\sum_{n} x[n + m]x[n] \tag{13-78}$$
The mean and variance of $R_L[m]$ can be determined as in the analog case. The details, however, will not be given. In the following, we assume that $R_L[m]$ is known exactly. This assumption is satisfactory if $J \gg L$.
†R. B. Blackman and J. W. Tukey: The Measurement of Power Spectra, Dover, New York, 1959.
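The discrete-time method of windows can be sketched in a few lines. This is a minimal illustration, assuming real data and the minimum-bias lag window given earlier, sampled as in (13-77); the function names `sample_autocorr`, `min_bias_window`, and `windowed_spectrum` are hypothetical.

```python
import numpy as np

def sample_autocorr(x, L):
    """R_L[m] = (1/J) * sum_n x[n+m] x[n] for m = 0..L, as in (13-78)."""
    x = np.asarray(x, dtype=float)
    J = len(x)
    return np.array([np.dot(x[m:], x[:J - m]) / J for m in range(L + 1)])

def min_bias_window(L):
    """Samples w[m] = w(M m / L), m = 0..L, of the minimum-bias lag window."""
    u = np.arange(L + 1) / L                      # u = |tau| / M
    return np.abs(np.sin(np.pi * u)) / np.pi + (1.0 - u) * np.cos(np.pi * u)

def windowed_spectrum(R, w, omega):
    """S_w(omega) of (13-76) for real, even sequences R[m] and w[m]."""
    R, w = np.asarray(R, dtype=float), np.asarray(w, dtype=float)
    omega = np.atleast_1d(omega)
    m = np.arange(1, len(R))
    tail = (w[1:] * R[1:])[:, None] * np.cos(np.outer(m, omega))
    return w[0] * R[0] + 2.0 * tail.sum(axis=0)

# Small deterministic example.
R_small = sample_autocorr([1.0, 2.0, 3.0], L=1)
w = min_bias_window(8)
flat = windowed_spectrum([1.0, 0.0, 0.0], min_bias_window(2), np.linspace(0, np.pi, 5))
```

A white sequence with $R[m] = \delta[m]$ gives a flat estimate, which is a quick sanity check of the normalization $w[0] = 1$.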
Extrapolation Method
The spectral estimation problem is essentially numerical. This involves digital data even if the given process is analog. We shall, therefore, carry out the analysis in digital form. In the extrapolation method we assume that $S(z)$ is of known form. We shall assume that it is rational
$$S(z) = L(z)L(1/z) \qquad L(z) = \frac{b_0 + b_1 z^{-1} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}} = \frac{N(z)}{D(z)} \tag{13-79}$$
We select the rational model for the following reasons: The numerical evaluation of its unknown parameters is relatively simple. An arbitrary spectrum can be closely approximated by a rational model of sufficiently large order. Spectra involving responses of dynamic systems are often rational.
System identification. The rational model leads directly to the solution of the identification problem (see also Sec. 11-7): We wish to determine the system function $H(z)$ of a system driven by white noise in terms of the measurements of its output $x[n]$. As we know, the power spectrum of the output is proportional to $H(z)H(1/z)$. If, therefore, the system is of finite order and minimum phase, then $H(z)$ is proportional to $L(z)$. To determine $H(z)$, it suffices, therefore, to determine the $M + N + 1$ parameters of $L(z)$. We shall do so under the assumption that $R_L[m]$ is known exactly for $|m| \le M + N + 1$.
We should stress that the proposed model is only an abstraction. In a real problem, $R[m]$ is not known exactly. Furthermore, $S(z)$ might not be rational; even if it is, the constants $M$ and $N$ might not be known. However, the method leads to reasonable approximations if $R[m]$ is replaced by its time-average estimate $R_L[m]$ and $L$ is large.
Autoregressive process. Our objective is to determine the $M + N + 1$ coefficients $b_i$ and $a_k$ specifying the spectrum $S(z)$ in terms of the first $M + N + 1$ values $R_L[m]$ of $R[m]$. We start with the assumption that
$$L(z) = \frac{\sqrt{P_N}}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}} = \frac{\sqrt{P_N}}{D(z)} \tag{13-80}$$
This is a special case of (12-36) with $M = 0$ and $b_0 = \sqrt{P_N}$. As we know, the process $x[n]$ satisfies the equation
$$x[n] + a_1 x[n - 1] + \cdots + a_N x[n - N] = e[n] \tag{13-81}$$
where $e[n]$ is white noise with average power $P_N$. Our problem is to find the $N + 1$ coefficients $a_k$ and $P_N$. To do so, we multiply (13-81) by $x[n - m]$ and take expected values. With $m = 0, \ldots, N$, this yields the Yule-Walker equations
$$\begin{aligned} R[0] + a_1 R[1] + \cdots + a_N R[N] &= P_N \\ R[1] + a_1 R[0] + \cdots + a_N R[N - 1] &= 0 \\ &\;\;\vdots \\ R[N] + a_1 R[N - 1] + \cdots + a_N R[0] &= 0 \end{aligned} \tag{13-82}$$
This is a system of $N + 1$ equations involving the $N + 1$ unknowns $a_k$ and $P_N$, and it has a unique solution if the determinant $\Delta_N$ of the correlation matrix $D_N$ of $x[n]$ is strictly positive. We note, in particular, that
$$P_N = \frac{\Delta_{N+1}}{\Delta_N} > 0 \tag{13-83}$$
If $\Delta_{N+1} = 0$, then $P_N = 0$ and $e[n] = 0$. In this case, the unknown $S(\omega)$ consists of lines [see (12-44)].
To find $L(z)$, it suffices, therefore, to solve the system (13-82). This involves the inversion of the matrix $D_N$. The problem of inversion can be simplified because the matrix $D_N$ is Toeplitz; that is, its elements $R[i - k]$ depend only on the difference $i - k$, so they are constant along each diagonal. We give later a simple method for determining $a_k$ and $P_N$ based on this property (Levinson’s algorithm).
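The Yule-Walker system (13-82) can also be solved directly with a general linear solver. The following is a minimal sketch under the sign convention of (13-81); the function name `yule_walker` and the test sequence $R[m] = 0.5^{|m|}$ are assumptions made for illustration. It does not exploit the Toeplitz structure.

```python
import numpy as np

def yule_walker(R):
    """Solve (13-82) for a_1..a_N and P_N, given R[0..N]."""
    R = np.asarray(R, dtype=float)
    N = len(R) - 1
    # Correlation matrix with entries R[|i - k|], i, k = 0..N-1.
    D = np.array([[R[abs(i - k)] for k in range(N)] for i in range(N)])
    a = np.linalg.solve(D, -R[1:])      # last N equations of (13-82)
    P = R[0] + np.dot(a, R[1:])         # first equation of (13-82)
    return a, P

# R[m] = 0.5**|m| is the autocorrelation of x[n] - 0.5 x[n-1] = e[n]
# with white-noise power 0.75.
a, P = yule_walker([1.0, 0.5, 0.25])
```

A general solver needs on the order of $N^3$ operations; the recursion developed later in this section reduces this to the order of $N^2$.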
Moving average processes. If $x[n]$ is an MA process, then
$$S(z) = L(z)L(1/z) \qquad L(z) = b_0 + b_1 z^{-1} + \cdots + b_M z^{-M} \tag{13-84}$$
In this case, $S(z)$ can be expressed directly in terms of the first $M + 1$ values of $R[m]$:
$$S(z) = \sum_{m=-M}^{M} R[m]z^{-m} \qquad S(e^{j\omega}) = \left|\sum_{m=0}^{M} b_m e^{-jm\omega}\right|^2 \tag{13-85}$$
In the identification problem, our objective is to find not the function $S(z)$, but the $M + 1$ coefficients $b_m$ of $L(z)$. One method for doing so is the factorization $S(z) = L(z)L(1/z)$ of $S(z)$ as in Sec. 12-1. This method involves the determination of the roots of $S(z)$. We discuss later a method that avoids factorization (see page 470).
FIGURE 13-12
ARMA processes.† We assume now that $x[n]$ is an ARMA process:
$$L(z) = \frac{b_0 + b_1 z^{-1} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}} = \frac{N(z)}{D(z)} \tag{13-86}$$
In this case, $x[n]$ satisfies the equation
$$x[n] + a_1 x[n - 1] + \cdots + a_N x[n - N] = b_0 i[n] + \cdots + b_M i[n - M] \tag{13-87}$$
where $i[n]$ is its innovations. Multiplying both sides of (13-87) by $x[n - m]$ and taking expected values, we conclude as in (12-49) that
$$R[m] + a_1 R[m - 1] + \cdots + a_N R[m - N] = 0 \qquad m > M \tag{13-88}$$
Setting $m = M + 1, M + 2, \ldots, M + N$ into (13-88), we obtain a system of $N$ equations. The solution of this system yields the $N$ unknowns $a_1, \ldots, a_N$.
To complete the specification of $L(z)$, it suffices to find the $M + 1$ constants $b_0, \ldots, b_M$. To do so, we form a filter with input $x[n]$ and system function (Fig. 13-12)
$$D(z) = 1 + a_1 z^{-1} + \cdots + a_N z^{-N}$$
The resulting output $y[n]$ is called the residual sequence. Inserting into (10-183), we obtain
$$S_{yy}(z) = S(z)D(z)D(1/z) = N(z)N(1/z)$$
From this it follows that $y[n]$ is an MA process, and its innovations filter equals
$$L_y(z) = N(z) = b_0 + b_1 z^{-1} + \cdots + b_M z^{-M} \tag{13-89}$$
†M. Kaveh: “High Resolution Spectral Estimation for Noisy Signals,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, 1979. See also J. A. Cadzow: “Spectral Estimation: An Overdetermined Rational Model Equation Approach,” IEEE Proceedings, vol. 70, 1982.
To determine the constants $b_i$, it suffices, therefore, to find the autocorrelation $R_{yy}[m]$ for $|m| \le M$. Since $y[n]$ is the output of the filter $D(z)$ with input $x[n]$, it follows from (12-47) with $a_0 = 1$ that
$$R_{yy}[m] = R[m] * d[m] * d[-m] \qquad d[m] = \sum_{k=0}^{N} a_k\,\delta[m - k]$$
This yields
$$R_{yy}[m] = \sum_{i=-N}^{N} R[m - i]\rho[i] \qquad \rho[m] = \sum_{k=m}^{N} a_{k-m}a_k = \rho[-m] \tag{13-90}$$
for $0 \le m \le M$ and 0 for $m > M$. With $R_{yy}[m]$ so determined, we proceed as in the MA case.
The determination of the ARMA model involves thus the following steps:
Find the constants $a_k$ from (13-88); this yields $D(z)$.
Find $R_{yy}[m]$ from (13-90).
Find the roots of the polynomial
$$S_{yy}(z) = \sum_{m=-M}^{M} R_{yy}[m]z^{-m} = N(z)N(1/z)$$
Form the Hurwitz factor $N(z)$ of $S_{yy}(z)$.
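The first two steps above can be sketched as follows. This is a minimal illustration, assuming exact autocorrelation values and a hypothetical ARMA(1,1) test case; the function names `arma_ar_part` and `residual_autocorr` are not from the text. The factorization steps are omitted.

```python
import numpy as np

def arma_ar_part(R, N, M):
    """Solve (13-88) for a_1..a_N using the equations m = M+1, ..., M+N."""
    rows = range(M + 1, M + N + 1)
    A = np.array([[R[abs(m - k)] for k in range(1, N + 1)] for m in rows])
    rhs = -np.array([R[m] for m in rows])
    return np.linalg.solve(A, rhs)

def residual_autocorr(R, a, M):
    """R_yy[m], m = 0..M, from (13-90) with a_0 = 1."""
    d = np.concatenate(([1.0], a))           # coefficients of D(z)
    N = len(a)
    rho = np.correlate(d, d, mode="full")    # rho[i] is stored at index i + N
    return np.array([sum(R[abs(m - i)] * rho[i + N] for i in range(-N, N + 1))
                     for m in range(M + 1)])

# Hypothetical test case: L(z) = (1 + 0.4 z^-1) / (1 - 0.5 z^-1), unit-power innovations.
h = np.concatenate(([1.0], 0.9 * 0.5 ** np.arange(199)))   # truncated delta response of L(z)
R = np.array([np.dot(h[m:], h[:len(h) - m]) for m in range(3)])   # R[0..M+N]

a = arma_ar_part(R, N=1, M=1)        # expected close to [-0.5]
Ryy = residual_autocorr(R, a, M=1)   # expected close to [1 + 0.4**2, 0.4]
```

The values of `Ryy` are the coefficients of $N(z)N(1/z)$, so the remaining work is the factorization of a polynomial of degree $2M$.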
Lattice filters and Levinson’s algorithm. An MA filter is a polynomial in $z^{-1}$. Such a filter is usually realized by a ladder structure as in Fig. 13-14a. A lattice filter is an alternate realization of an MA filter in the form of Fig. 13-14b. In the context of spectral estimation, lattice filters are used to simplify the solution of the Yule-Walker equations and the factorization of polynomials. Furthermore, as we show later, they are also used to give a convenient description of the properties of extrapolating spectra. Related applications are developed in the next chapter in the solution of the prediction problem.
The polynomial
$$D(z) = 1 - a_1^N z^{-1} - \cdots - a_N^N z^{-N} = 1 - \sum_{k=1}^{N} a_k^N z^{-k}$$
specifies an MA filter with $H(z) = D(z)$. The superscript in $a_k^N$ identifies the order of the filter. If the input to this filter is an AR process $x[n]$ with $L(z)$ as in (13-80) and $a_k^N = -a_k$, then the resulting output
$$e[n] = x[n] - a_1^N x[n - 1] - \cdots - a_N^N x[n - N] \tag{13-91}$$
is white noise as in (13-81). The filter $D(z)$ is usually realized by the ladder structure of Fig. 13-14a. We shall show that the lattice filter of Fig. 13-14b is an equivalent realization. We start with $N = 1$.
In Fig. 13-13a we show the ladder realization of an MA filter of order 1 and its mirror image. The input to both systems is the process $x[n]$; the outputs equal
$$y[n] = x[n] - a_1^1 x[n - 1] \qquad z[n] = -a_1^1 x[n] + x[n - 1]$$
The corresponding system functions equal
$$1 - a_1^1 z^{-1} \qquad -a_1^1 + z^{-1}$$
In Fig. 13-13b we show a lattice filter of order 1. It has a single input $x[n]$ and two outputs
$$\hat{e}_1[n] = x[n] - K_1 x[n - 1] \qquad \check{e}_1[n] = -K_1 x[n] + x[n - 1]$$
The corresponding system functions are
$$\hat{E}_1(z) = 1 - K_1 z^{-1} \qquad \check{E}_1(z) = -K_1 + z^{-1} = z^{-1}\hat{E}_1(1/z)$$
If $K_1 = a_1^1$ then the lattice filter of Fig. 13-13b is equivalent to the two MA filters of Fig. 13-13a.
In Fig. 13-14b we show a lattice filter of order $N$ formed by cascading $N$ first-order filters. The input to this filter is the process $x[n]$. The resulting outputs are denoted by $\hat{e}_N[n]$ and $\check{e}_N[n]$ and are called forward and backward respectively. As we see from the diagram these signals satisfy the equations
$$\hat{e}_N[n] = \hat{e}_{N-1}[n] - K_N\check{e}_{N-1}[n - 1] \tag{13-92a}$$
$$\check{e}_N[n] = \check{e}_{N-1}[n - 1] - K_N\hat{e}_{N-1}[n] \tag{13-92b}$$
Denoting by $\hat{E}_N(z)$ and $\check{E}_N(z)$ the system functions from the input A to the upper output B and lower output C respectively, we conclude that
$$\hat{E}_N(z) = \hat{E}_{N-1}(z) - K_N z^{-1}\check{E}_{N-1}(z) \tag{13-93a}$$
$$\check{E}_N(z) = z^{-1}\check{E}_{N-1}(z) - K_N\hat{E}_{N-1}(z) \tag{13-93b}$$
where $\hat{E}_{N-1}(z)$ and $\check{E}_{N-1}(z)$ are the forward and backward system functions of the lattice of the first $N - 1$ sections. From (13-93) it follows by a simple induction that
$$\check{E}_N(z) = z^{-N}\hat{E}_N(1/z) \tag{13-94}$$
The lattice filter is thus specified in terms of the $N$ constants $K_k$. These constants are called reflection coefficients.
Since $\hat{E}_1(z) = 1 - K_1 z^{-1}$ and $\check{E}_1(z) = -K_1 + z^{-1}$, we conclude from (13-93) that the functions $\hat{E}_N(z)$ and $\check{E}_N(z)$ are polynomials in $z^{-1}$ of the form
$$\hat{E}_N(z) = 1 - a_1^N z^{-1} - \cdots - a_N^N z^{-N} \tag{13-95}$$
$$\check{E}_N(z) = z^{-N} - a_1^N z^{-N+1} - \cdots - a_N^N \tag{13-96}$$
where $a_k^N$ are $N$ constants that are specified in terms of the reflection coefficients $K_k$.
LEVINSON’S ALGORITHM.† We denote by $a_k^{N-1}$ the coefficients of the lattice filter of the first $N - 1$ sections:
$$\hat{E}_{N-1}(z) = 1 - a_1^{N-1}z^{-1} - \cdots - a_{N-1}^{N-1}z^{-(N-1)}$$
From (13-94) it follows that
$$z^{-1}\check{E}_{N-1}(z) = z^{-N}\hat{E}_{N-1}(1/z)$$
Inserting into (13-93a) and equating coefficients of equal powers of $z$, we obtain
$$a_k^N = a_k^{N-1} - K_N a_{N-k}^{N-1} \quad k = 1, \ldots, N - 1 \qquad a_N^N = K_N \tag{13-97}$$
We have thus expressed the coefficients $a_k^N$ of a lattice of order $N$ in terms of the coefficients $a_k^{N-1}$ and the last reflection coefficient $K_N$. Starting with $a_1^1 = K_1$, we can express recursively the $N$ parameters $a_k^N$ in terms of the $N$ reflection coefficients $K_k$.
Conversely, if we know $a_k^N$, we find $K_k$ using inverse recursion: The coefficient $K_N$ equals $a_N^N$. To find $K_{N-1}$, it suffices to find the polynomial $\hat{E}_{N-1}(z)$. Multiplying (13-93b) by $K_N$ and adding to (13-93a), we obtain
$$(1 - K_N^2)\hat{E}_{N-1}(z) = \hat{E}_N(z) + K_N z^{-N}\hat{E}_N(1/z) \tag{13-98}$$
This expresses $\hat{E}_{N-1}(z)$ in terms of $\hat{E}_N(z)$ because $K_N = a_N^N$. With $\hat{E}_{N-1}(z)$ so determined, we set $K_{N-1} = a_{N-1}^{N-1}$. Continuing this process, we find $\hat{E}_{N-k}(z)$ and $K_{N-k}$ for every $k < N$.
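The forward recursion (13-97) and the inverse recursion based on (13-98) can be sketched as follows. This is a minimal illustration in plain Python; the names `step_up` and `step_down` are hypothetical, and the inverse recursion assumes $|K_k| \ne 1$ at every step. Equating coefficients in (13-98) gives $a_k^{N-1} = (a_k^N + K_N a_{N-k}^N)/(1 - K_N^2)$, which is the update used below.

```python
def step_up(K):
    """Reflection coefficients K_1..K_N -> coefficients a_1^N..a_N^N via (13-97)."""
    a = []
    for Kn in K:
        n = len(a) + 1                      # order of the lattice being built
        # a[k] holds a_{k+1}^{n-1}; a[n-2-k] holds a_{n-(k+1)}^{n-1}.
        a = [a[k] - Kn * a[n - 2 - k] for k in range(n - 1)] + [Kn]
    return a

def step_down(a):
    """Coefficients a_1^N..a_N^N -> reflection coefficients K_1..K_N via (13-98)."""
    a = list(a)
    K = []
    while a:
        Kn = a[-1]                          # K_n equals the last coefficient a_n^n
        K.append(Kn)
        n = len(a)
        a = [(a[k] + Kn * a[n - 2 - k]) / (1.0 - Kn * Kn) for k in range(n - 1)]
    return K[::-1]

a3 = step_up([0.5, -0.3, 0.2])
K3 = step_down(a3)
```

The two functions are inverses of each other, so a round trip is a convenient check of an implementation.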
Minimum-phase properties. We shall relate the location of the roots $z_i^N$ of the polynomial $\hat{E}_N(z)$ to the magnitude of the reflection coefficients $K_k$.
THEOREM. If
$$|K_k| < 1 \ \text{ for all } k \le N \quad \text{then} \quad |z_i^N| < 1 \ \text{ for all } i \le N \tag{13-99}$$
Proof. By induction. The theorem is true for $N = 1$ because $\hat{E}_1(z) = 1 - K_1 z^{-1}$; hence $|z_1^1| = |K_1| < 1$. Suppose that $|z_j^{N-1}| < 1$ for all $j \le N - 1$ where $z_j^{N-1}$ are the roots of $\hat{E}_{N-1}(z)$. From this it follows that the function
$$A_{N-1}(z) = \frac{z^{-N}\hat{E}_{N-1}(1/z)}{\hat{E}_{N-1}(z)} \tag{13-100}$$
is all-pass. Since $\hat{E}_N(z_i^N) = 0$ by assumption, we conclude from (13-93a) and (13-94) that
$$\hat{E}_N(z_i^N) = \hat{E}_{N-1}(z_i^N) - K_N (z_i^N)^{-N}\hat{E}_{N-1}(1/z_i^N) = 0$$
Hence
$$|A_{N-1}(z_i^N)| = \frac{1}{|K_N|} > 1$$
This shows that $|z_i^N| < 1$ [see (13B-2)].
†N. Levinson: “The Wiener RMS Error Criterion in Filter Design and Prediction,” Journal of Mathematics and Physics, vol. 25, 1947. See also J. Durbin: “The Fitting of Time Series Models,” Revue de l’Institut International de Statistique, vol. 28, 1960.
CONVERSE THEOREM. If
$$|z_i^N| < 1 \ \text{ for all } i \le N \quad \text{then} \quad |K_k| < 1 \ \text{ for all } k \le N \tag{13-101}$$
Proof. The magnitude of the product of the roots of the polynomial $\hat{E}_N(z)$ equals the magnitude of the last coefficient $a_N^N$. Hence
$$|K_N| = |a_N^N| = |z_1^N \cdots z_N^N| < 1$$
Thus (13-101) is true for $k = N$. To show that it is true for $k = N - 1$, it suffices to show that $|z_j^{N-1}| < 1$ for $j \le N - 1$. To do so, we form the all-pass function
$$A_N(z) = \frac{z^{-N}\hat{E}_N(1/z)}{\hat{E}_N(z)} \tag{13-102}$$
Since $\hat{E}_{N-1}(z_j^{N-1}) = 0$ it follows from (13-98) that
$$|A_N(z_j^{N-1})| = \frac{1}{|K_N|} > 1$$
Hence $|z_j^{N-1}| < 1$ and $|K_{N-1}| = |a_{N-1}^{N-1}| = |z_1^{N-1} \cdots z_{N-1}^{N-1}| < 1$. Proceeding similarly, we conclude that $|K_k| < 1$ for all $k \le N$.
COROLLARY. If $|K_k| < 1$ for $k \le N - 1$ and $|K_N| = 1$, then
$$|z_i^N| = 1 \ \text{ for all } i \le N \tag{13-103}$$
Proof. From the theorem it follows that $|z_j^{N-1}| < 1$ because $|K_k| < 1$ for all $k \le N - 1$. Hence the function $A_{N-1}(z)$ in (13-100) is all-pass and $|A_{N-1}(z_i^N)| = 1/|K_N| = 1$. This leads to the conclusion that $|z_i^N| = 1$ [see (13B-2)].
We have thus established the equivalence between a polynomial $\hat{E}_N(z)$ and a set of $N$ constants $K_k$. We have shown further that the polynomial is strictly Hurwitz iff $|K_k| < 1$ for all $k$.
Inverse lattice realization of AR systems. An inverse lattice is a modification of a lattice as in Fig. 13-15. In this modification, the input is at point B and the outputs are at points A and C. Furthermore, the multipliers from the lower to the upper line are changed from $-K_k$ to $K_k$. Denoting by $\hat{e}_N[n]$ the input at point B and by $\check{e}_N[n]$ the resulting output at C, we observe from the figure that
$$\hat{e}_{N-1}[n] = \hat{e}_N[n] + K_N\check{e}_{N-1}[n - 1] \tag{13-104a}$$
$$\check{e}_N[n] = \check{e}_{N-1}[n - 1] - K_N\hat{e}_{N-1}[n] \tag{13-104b}$$
These equations are identical with the two equations in (13-92). From this it follows that the system function from B to A equals
$$\frac{1}{\hat{E}_N(z)} = \frac{1}{1 - a_1^N z^{-1} - \cdots - a_N^N z^{-N}}$$
We have thus shown that an AR system can be realized by an inverse lattice. The coefficients $a_k^N$ and $K_k$ satisfy Levinson’s algorithm (13-97).
Iterative solution of the Yule-Walker equations. Consider an AR process $x[n]$ with innovations filter $L(z) = \sqrt{P_N}/D(z)$ as in (13-80). We form the lattice equivalent of the MA system $D(z)$ with $a_k^N = -a_k$, and use $x[n]$ as its input. As we know [see (13-95)] the forward and backward responses are given by
$$\begin{aligned} \hat{e}_N[n] &= x[n] - a_1^N x[n - 1] - \cdots - a_N^N x[n - N] \\ \check{e}_N[n] &= x[n - N] - a_1^N x[n - N + 1] - \cdots - a_N^N x[n] \end{aligned} \tag{13-105}$$
Denoting by $\hat{S}_N(z)$ and $\check{S}_N(z)$ the spectra of $\hat{e}_N[n]$ and $\check{e}_N[n]$ respectively, we conclude from (13-105) that
$$\hat{S}_N(z) = S(z)\hat{E}_N(z)\hat{E}_N(1/z) = P_N \qquad \check{S}_N(z) = S(z)\check{E}_N(z)\check{E}_N(1/z) = P_N$$
From this it follows that $\hat{e}_N[n]$ and $\check{e}_N[n]$ are two white-noise processes and
$$E\{\hat{e}_N^2[n]\} = E\{\check{e}_N^2[n]\} = P_N \tag{13-106a}$$
$$E\{x[n - m]\hat{e}_N[n]\} = \begin{cases} P_N & m = 0 \\ 0 & 1 \le m \le N \end{cases} \tag{13-106b}$$
$$E\{x[n - m]\check{e}_N[n]\} = \begin{cases} 0 & 0 \le m \le N - 1 \\ P_N & m = N \end{cases} \tag{13-106c}$$
These equations also hold for all filters of lower order. We shall use them to express recursively the parameters $a_k^N$, $K_N$, and $P_N$ in terms of the $N + 1$ constants $R[0], \ldots, R[N]$.
For $N = 1$ (13-82) yields
$$R[0] - a_1^1 R[1] = P_1 \qquad R[1] - a_1^1 R[0] = 0$$
Setting $P_0 = R[0]$, we obtain
$$a_1^1 = K_1 = \frac{R[1]}{R[0]} \qquad P_1 = (1 - K_1^2)P_0$$
Suppose now that we know the $N + 1$ parameters $a_k^{N-1}$, $K_{N-1}$, and $P_{N-1}$. From Levinson’s algorithm (13-97) it follows that we can determine $a_k^N$ if $K_N$ is known. To complete the iteration, it suffices, therefore, to find $K_N$ and $P_N$. We maintain that
$$P_{N-1}K_N = R[N] - \sum_{k=1}^{N-1} a_k^{N-1}R[N - k] \tag{13-107}$$
$$P_N = (1 - K_N^2)P_{N-1} \tag{13-108}$$
The first equation yields $K_N$ in terms of the known parameters $a_k^{N-1}$, $R[m]$, and $P_{N-1}$. With $K_N$ so determined, $P_N$ is determined from the second equation.
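Proof. Multiplying (13-92a) by $x[n - N]$, taking expected values, and using the identities
$$E\{\hat{e}_{N-1}[n]x[n - N]\} = R[N] - \sum_{k=1}^{N-1} a_k^{N-1}R[N - k]$$
$$E\{\hat{e}_N[n]x[n - N]\} = 0 \qquad E\{\check{e}_{N-1}[n - 1]x[n - N]\} = P_{N-1}$$
which follow from (13-105) and (13-106), we obtain (13-107). From (13-92a) and the identities
$$E\{\hat{e}_N[n]x[n]\} = P_N \qquad E\{\hat{e}_{N-1}[n]x[n]\} = P_{N-1}$$
$$E\{\check{e}_{N-1}[n - 1]x[n]\} = R[N] - \sum_{k=1}^{N-1} a_k^{N-1}R[N - k] = P_{N-1}K_N$$
it follows similarly that $P_N = P_{N-1} - K_N^2 P_{N-1}$ and (13-108) results.
Since $P_k \ge 0$ for every $k$, it follows from (13-108) that
$$|K_k| \le 1 \quad \text{and} \quad P_0 \ge P_1 \ge \cdots \ge P_N \ge 0 \tag{13-109}$$
If $|K_N| = 1$ but $|K_k| < 1$ for all $k < N$, then
$$P_0 > P_1 > \cdots > P_N = 0 \tag{13-110}$$
As we show next this is the case if $S(\omega)$ consists of lines.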
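The complete iteration, combining (13-107), (13-108), and (13-97), can be sketched as follows. This is a minimal illustration in plain Python, assuming exact values $R[0], \ldots, R[N]$ of a strictly p.d. sequence (so that no $P_k$ vanishes); the function name `levinson` and the test data are hypothetical. The coefficients follow the convention of (13-95), that is, $a_k^N = -a_k$ in terms of (13-82).

```python
def levinson(R):
    """Return (a, K, P): a_1^N..a_N^N, K_1..K_N and P_0..P_N from R[0..N]."""
    N = len(R) - 1
    a, K, P = [], [], [R[0]]
    for n in range(1, N + 1):
        # (13-107): P_{n-1} K_n = R[n] - sum_k a_k^{n-1} R[n-k]
        Kn = (R[n] - sum(a[k] * R[n - 1 - k] for k in range(n - 1))) / P[-1]
        # (13-97): order update of the coefficients
        a = [a[k] - Kn * a[n - 2 - k] for k in range(n - 1)] + [Kn]
        # (13-108): power of the forward error
        P.append((1.0 - Kn * Kn) * P[-1])
        K.append(Kn)
    return a, K, P

# R[m] = 0.5**|m| corresponds to x[n] = 0.5 x[n-1] + e[n].
a1, K1, P1 = levinson([1.0, 0.5, 0.25])

# A second, arbitrary p.d. sequence; the result must satisfy
# R[m] - sum_k a_k^N R[|m-k|] = 0 for m = 1..N.
R2 = [2.0, 1.2, 0.3, -0.4]
a2, K2, P2 = levinson(R2)
residuals = [R2[m] - sum(a2[k - 1] * R2[abs(m - k)] for k in range(1, 4))
             for m in range(1, 4)]
```

Each step costs on the order of $n$ operations, so the whole recursion costs on the order of $N^2$ operations.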
Line spectra and hidden periodicities. If $P_N = 0$, then $\hat{e}_N[n] = 0$; hence the process $x[n]$ satisfies the homogeneous recursion equation
$$x[n] = a_1^N x[n - 1] + \cdots + a_N^N x[n - N] \tag{13-111}$$
This shows that $x[n]$ is a predictable process, that is, it can be expressed in terms of its $N$ past values. Furthermore,
$$R[m] - a_1^N R[m - 1] - \cdots - a_N^N R[m - N] = 0 \tag{13-112}$$
As we know [see (13-103)] the roots $z_i^N$ of the characteristic polynomial $\hat{E}_N(z)$ of this equation are on the unit circle: $z_i^N = e^{j\omega_i}$. From this it follows that
$$R[m] = \sum_{i=1}^{N} \alpha_i e^{j\omega_i m} \qquad S(\omega) = 2\pi\sum_{i=1}^{N} \alpha_i\,\delta(\omega - \omega_i) \tag{13-113}$$
And since $S(\omega) \ge 0$, we conclude that $\alpha_i \ge 0$.
Solving (13-111), we obtain
$$x[n] = \sum_{i=1}^{N} c_i e^{j\omega_i n} \qquad E\{c_i\} = 0 \qquad E\{c_i c_k^*\} = \begin{cases} \alpha_i & i = k \\ 0 & i \ne k \end{cases} \tag{13-114}$$
CARATHEODORY’S THEOREM. We show next that if $R[m]$ is a p.d. sequence and its correlation matrix is of rank $N$, that is, if
$$\Delta_N > 0 \qquad \Delta_{N+1} = 0 \tag{13-115}$$
then $R[m]$ is a sum of exponentials with positive coefficients:
$$R[m] = \sum_{i=1}^{N} \alpha_i e^{j\omega_i m} \qquad \alpha_i > 0 \tag{13-116}$$
Proof. Since $R[m]$ is a p.d. sequence, we can construct a process $x[n]$ with autocorrelation $R[m]$. Applying Levinson’s algorithm, we obtain a sequence of constants $K_k$ and $P_k$. The iteration stops at the $N$th step because $P_N = \Delta_{N+1}/\Delta_N = 0$. This shows that the process $x[n]$ satisfies the recursion equation (13-111).
Detection of hidden periodicities.† We shall use the above to solve the following problem: We wish to determine the frequencies $\omega_i$ of a process $x[n]$ consisting of at most $N$ exponentials as in (13-114). The available information is the sum
$$y[n] = x[n] + \nu[n] \qquad E\{\nu^2[n]\} = q \tag{13-117}$$
where $\nu[n]$ is white noise independent of $x[n]$.
Using $J$ samples of $y[n]$, we estimate its autocorrelation
$$R_{yy}[m] = R_{xx}[m] + q\,\delta[m] \tag{13-118}$$
as in (13-78). The correlation matrix $D_{N+1}$ of $x[n]$ is thus given by
$$D_{N+1} = \begin{bmatrix} R_{yy}[0] - q & R_{yy}[1] & \cdots & R_{yy}[N] \\ R_{yy}[1] & R_{yy}[0] - q & \cdots & R_{yy}[N - 1] \\ \vdots & \vdots & & \vdots \\ R_{yy}[N] & R_{yy}[N - 1] & \cdots & R_{yy}[0] - q \end{bmatrix} \tag{13-119}$$
In this expression, $R_{yy}[m]$ is known but $q$ is unknown. We know, however, that $\Delta_{N+1} = 0$ because $x[n]$ consists of $N$ lines. Hence $q$ is an eigenvalue of the correlation matrix of $y[n]$, that is, of the matrix in (13-119) with $q = 0$. It is, in fact, the smallest eigenvalue $q_0$ because $D_{N+1} > 0$ for $q < q_0$. With $R_{xx}[m]$ so determined, we proceed as before: Using Levinson’s algorithm, we find the coefficients $a_k^N$ and the roots $e^{j\omega_i}$ of the resulting polynomial $\hat{E}_N(z)$. If $q_0$ is a simple eigenvalue, then all roots are distinct and $x[n]$ is a sum of $N$ exponentials. If, however, $q_0$ is a multiple root with multiplicity $N_0$ then $x[n]$ consists of $N - N_0 + 1$ exponentials.
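The procedure can be sketched numerically. This is a minimal illustration, assuming exact values of $R_{yy}[m]$ and the convention $R[m] = \sum_i \alpha_i e^{j\omega_i m}$ of (13-113). In place of the Levinson step described above, the sketch reads the frequencies from the eigenvector associated with the smallest eigenvalue, which is a substitute technique giving the same roots when $q_0$ is simple; the function name `hidden_periodicities` and the test values are hypothetical.

```python
import numpy as np

def hidden_periodicities(Ryy):
    """Estimate the noise power q0 and the line frequencies from Ryy[0..N]."""
    Ryy = np.asarray(Ryy, dtype=complex)
    n = len(Ryy)
    # Hermitian Toeplitz correlation matrix of y[n]: entry (i, k) equals R[i - k].
    T = np.array([[Ryy[i - k] if i >= k else np.conj(Ryy[k - i])
                   for k in range(n)] for i in range(n)])
    eigvals, eigvecs = np.linalg.eigh(T)      # eigenvalues in ascending order
    q0 = eigvals[0]                           # smallest eigenvalue
    v = eigvecs[:, 0]                         # spans the null space of T - q0*I
    # sum_k v[k] z^{-k} vanishes at z = exp(j*omega_i); np.roots treats v as
    # the coefficients of v[0] z^N + ... + v[N], which has the same roots.
    freqs = np.sort(np.angle(np.roots(v)))
    return q0, freqs

# Hypothetical test case: two lines plus white noise of power 0.3.
omegas = np.array([0.7, -1.9])
alphas = np.array([1.0, 0.5])
m = np.arange(3)
Ryy = (alphas[None, :] * np.exp(1j * np.outer(m, omegas))).sum(axis=1)
Ryy[0] += 0.3

q0, freqs = hidden_periodicities(Ryy)
```

With estimated rather than exact correlations, the smallest eigenvalue is only approximately equal to the noise power and the roots fall near, not on, the unit circle.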
This analysis leads to the following extension of Caratheodory’s theorem: The $N + 1$ values $R[0], \ldots, R[N]$ of a strictly p.d. sequence $R[m]$ can be expressed in the form
$$R[m] = q_0\,\delta[m] + \sum_{i=1}^{N} \alpha_i e^{j\omega_i m} \tag{13-120}$$
where $q_0$ and $\alpha_i$ are positive constants and $\omega_i$ are real frequencies.
Burg’s iteration.‡ Levinson’s algorithm is used to determine recursively the coefficients $a_k^N$ of the innovations filter $L(z)$ of an AR process $x[n]$ in terms of $R[m]$. In a real problem the data $R[m]$ are not known exactly. They are estimated from the $J$ samples of $x[n]$ and these estimates are inserted into (13-107) and (13-108) yielding the estimates of $K_N$ and $P_N$. The results are then used to estimate $a_k^N$ from (13-97). A more direct approach, suggested by Burg, avoids the estimation of $R[m]$. It is based on the observation that Levinson’s algorithm expresses recursively the coefficients $a_k^N$ in terms of $K_N$ and $P_N$. The estimates of these coefficients can, therefore, be obtained directly in terms of the estimates of $K_N$ and $P_N$. These estimates are based on the following identities [see (13-106)]:
$$P_{N-1}K_N = E\{\hat{e}_{N-1}[n]\check{e}_{N-1}[n - 1]\} \qquad P_N = \tfrac{1}{2}E\{\hat{e}_N^2[n] + \check{e}_N^2[n]\} \tag{13-121}$$
Replacing expected values by time averages, we obtain the following iteration: Start with
$$P_0 = \frac{1}{J}\sum_{n=1}^{J} x^2[n] \qquad \hat{e}_0[n] = \check{e}_0[n] = x[n]$$
Find $K_{N-1}$, $P_{N-1}$, $a_k^{N-1}$, $\hat{e}_{N-1}[n]$, $\check{e}_{N-1}[n]$. Set
$$K_N = \frac{\sum_n \hat{e}_{N-1}[n]\check{e}_{N-1}[n - 1]}{\tfrac{1}{2}\sum_n \left(\hat{e}_{N-1}^2[n] + \check{e}_{N-1}^2[n - 1]\right)} \tag{13-122}$$
$$P_N = (1 - K_N^2)P_{N-1} \tag{13-123}$$
$$a_k^N = a_k^{N-1} - K_N a_{N-k}^{N-1} \quad k = 1, \ldots, N - 1 \qquad a_N^N = K_N \tag{13-124}$$
$$\hat{e}_N[n] = x[n] - \sum_{k=1}^{N} a_k^N x[n - k] \qquad \check{e}_N[n] = x[n - N] - \sum_{k=1}^{N} a_k^N x[n - N + k] \tag{13-125}$$
This completes the $N$th iteration step. Note that
$$|K_N| \le 1 \qquad P_N \ge 0$$
This follows readily if we apply Cauchy’s inequality (see Prob. 11-23) to the numerator of (13-122).
†V. F. Pisarenko: “The Retrieval of Harmonics,” Geophysical Journal of the Royal Astronomical Society, 1973.
‡J. P. Burg: Maximum entropy spectral analysis, presented at the International Meeting of the Society for the Exploration of Geophysics, Orlando, FL, 1967.
Levinson’s algorithm yields the correct spectrum $S(z)$ only if $x[n]$ is an AR process. If it is not, the result is only an approximation. If $R[m]$ is known exactly, the approximation improves as $N$ increases. However, if $R[m]$ is estimated as above, the error might increase because the number of terms in (13-49) equals $J - N - 1$ and it decreases as $N$ increases. The determination of an optimum $N$ is in general difficult.
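Burg’s iteration can be sketched as follows. This is a minimal illustration for real data, assuming that the time averages in (13-122) run over all samples for which both error sequences are available; the errors are updated with the lattice equations (13-92) rather than recomputed from (13-125), which gives the same sequences. The function name `burg`, the simulated AR(1) data, and the seed are hypothetical.

```python
import numpy as np

def burg(x, order):
    """Return (a, K, P): a_1^N..a_N^N, K_1..K_N and P_0..P_N estimated from x."""
    x = np.asarray(x, dtype=float)
    f = x.copy()                     # forward error of order 0
    b = x.copy()                     # backward error of order 0
    a = np.zeros(0)
    K, P = [], [np.mean(x**2)]
    for _ in range(order):
        fn, bn = f[1:], b[:-1]       # pairs forward[n] with backward[n-1]
        Kn = 2.0 * np.dot(fn, bn) / (np.dot(fn, fn) + np.dot(bn, bn))   # (13-122)
        a = np.concatenate((a - Kn * a[::-1], [Kn]))                     # (13-124)
        f, b = fn - Kn * bn, bn - Kn * fn                                # (13-92)
        K.append(Kn)
        P.append((1.0 - Kn**2) * P[-1])                                  # (13-123)
    return a, np.array(K), np.array(P)

# Simulated AR(1) data x[n] = 0.8 x[n-1] + e[n] (illustrative values only).
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
x = np.zeros_like(e)
x[0] = e[0]
for n in range(1, len(e)):
    x[n] = 0.8 * x[n - 1] + e[n]

a_hat, K_hat, P_hat = burg(x, order=3)
```

Because each $K_N$ is a normalized correlation, the estimates satisfy $|K_N| \le 1$ for any data, so the estimated filter can never have roots outside the unit circle.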
FEJER-RIESZ THEOREM AND LEVINSON’S ALGORITHM. Given a positive trigonometric polynomial
$$W(e^{j\omega}) = \sum_{n=-N}^{N} w_n e^{-jn\omega} \ge 0 \tag{13-126}$$
we can find a Hurwitz polynomial
$$Y(z) = \sum_{n=0}^{N} y_n z^{-n} \tag{13-127}$$
such that $W(e^{j\omega}) = |Y(e^{j\omega})|^2$. This theorem has extensive applications. We used it in Sec. 12-1 (spectral factorization) and in the estimation of the spectrum of an MA and an ARMA process. The construction of the polynomial $Y(z)$ involves the determination of the roots of $W(z)$. This is not a simple problem particularly if $W(e^{j\omega})$ is known only as a function of $\omega$. We discuss next a method for determining $Y(z)$ involving Levinson’s algorithm and Fourier series.
We compute, first, the Fourier series coefficients
$$R[m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{1}{W(e^{j\omega})}\,e^{jm\omega}\,d\omega \qquad 0 \le m \le N \tag{13-128}$$
of the function $S(e^{j\omega}) = 1/W(e^{j\omega})$. The numbers $R[m]$ so obtained are the values of a p.d. sequence because $S(e^{j\omega}) > 0$. Applying Levinson’s algorithm to the numbers $R[m]$ so computed, we obtain $N + 1$ constants $a_k^N$ and $P_N$. This yields
$$S(e^{j\omega}) = \frac{1}{W(e^{j\omega})} = \frac{P_N}{\left|1 - \sum_{n=1}^{N} a_n^N e^{-jn\omega}\right|^2}$$
Hence
$$Y(z) = \frac{1}{\sqrt{P_N}}\left(1 - \sum_{n=1}^{N} a_n^N z^{-n}\right)$$
as in (13-127). This method thus avoids the factorization problem.
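The construction can be sketched numerically. This is a minimal illustration, assuming a real, even, strictly positive $W(e^{j\omega})$ supplied as a function of $\omega$; the integral (13-128) is approximated by an average over a uniform grid, which is an assumption of the sketch. The function name `fejer_riesz` and the two test polynomials are hypothetical.

```python
import numpy as np

def fejer_riesz(W, N, grid=4096):
    """Coefficients y_0..y_N of Y(z) with |Y(e^{jw})|^2 = W(w), via Levinson."""
    w = 2.0 * np.pi * np.arange(grid) / grid
    S = 1.0 / W(w)
    # (13-128) for even S: R[m] = (1/2pi) * integral of S(w) cos(m w).
    R = [np.mean(S * np.cos(m * w)) for m in range(N + 1)]
    a, P = [], R[0]
    for n in range(1, N + 1):
        Kn = (R[n] - sum(a[k] * R[n - 1 - k] for k in range(n - 1))) / P    # (13-107)
        a = [a[k] - Kn * a[n - 2 - k] for k in range(n - 1)] + [Kn]        # (13-97)
        P *= 1.0 - Kn * Kn                                                  # (13-108)
    return np.concatenate(([1.0], -np.array(a))) / np.sqrt(P)

# W(w) = |1 - 0.5 e^{-jw}|^2 = 1.25 - cos(w); the Hurwitz factor is 1 - 0.5 z^-1.
y1 = fejer_riesz(lambda w: 1.25 - np.cos(w), N=1)

# A second positive polynomial of order 2, checked on a frequency grid.
W2 = lambda w: 3.0 + 2.0 * np.cos(w) + 0.5 * np.cos(2.0 * w)
y2 = fejer_riesz(W2, N=2)
wt = np.linspace(-np.pi, np.pi, 101)
Y2 = sum(y2[n] * np.exp(-1j * n * wt) for n in range(3))
```

No root finding is involved: the only operations are the Fourier coefficients and an $N$-step recursion.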
The General Class of Extrapolating Spectra†
We consider now the following problem: We are given the $N + 1$ values (data)
$$R[0], \ldots, R[N]$$
of the autocorrelation $R[m]$ of a process $x[n]$ and we wish to find all its p.d. extrapolations, that is, we wish to find the family $C_N$ of spectra $S(e^{j\omega}) \ge 0$ such that the first $N + 1$ coefficients of their Fourier series expansion equal the given data. The sequences $R[m]$ of the class $C_N$ and their spectra will be called admissible.
A member of the class $C_N$ is the AR spectrum
$$S(z) = L(z)L(1/z) \qquad L(z) = \frac{\sqrt{P_N}}{\hat{E}_N(z)}$$
where $\hat{E}_N(z)$ is the forward filter of order $N$ obtained from an $N$-step Levinson algorithm. The continuation of the corresponding $R[m]$ is obtained from (12-41b):
$$R[m] = \sum_{k=1}^{N} a_k^N R[m - k] \qquad m > N$$
To find all members of the class $C_N$ we continue the algorithm assigning arbitrary values
$$|K_k| \le 1 \qquad k = N + 1, N + 2, \ldots$$
to the reflection coefficients. The resulting values of $R[m]$ are determined recursively [see (13-107)]
$$R[m] = \sum_{k=1}^{m-1} a_k^{m-1}R[m - k] + K_m P_{m-1} \tag{13-129}$$
†A. Papoulis: “Levinson’s Algorithm, Wold’s Decomposition, and Spectral Estimation,” SIAM Review, vol. 27, 1985.
FIGURE 13-16 (iteration lattice: data sections followed by extrapolating sections)
This shows that the admissible values of $R[m]$ at the $m$th iteration are in an interval of length $2P_{m-1}$:
$$\sum_{k=1}^{m-1} a_k^{m-1}R[m - k] - P_{m-1} \le R[m] \le \sum_{k=1}^{m-1} a_k^{m-1}R[m - k] + P_{m-1} \tag{13-130}$$
because $|K_m| \le 1$. At the endpoints of the interval, $|K_m| = 1$; in this case, $P_m = 0$ and $\Delta_{m+1} = 0$. As we have shown, the corresponding spectrum $S(\omega)$ consists of lines. If $|K_{m_0}| < 1$ and $K_m = 0$ for $m > m_0$, then $S(z)$ is an AR spectrum of order $m_0$. In Fig. 13-16, we show the iteration lattice. The first $N$ sections are uniquely determined in terms of the data. The remaining sections form a four-terminal lattice specified in terms of the arbitrarily chosen reflection coefficients $K_k$, $k > N$.
Admissible spectra. The DFTs of the sequences generated by the preceding iteration form the class $C_N$ of admissible spectra. We give next a simple characterization of this class starting with regular spectra. Such spectra are the transforms of all admissible sequences obtained with $|K_m| < 1$ for all $m$.
We shall show that all regular spectra can be expressed in terms of the forward and backward functions $\hat{E}_N(z)$ and $z^{-N}\hat{E}_N(1/z)$ of the first $N$ sections, the constant $P_N$, and a reflection coefficient $\rho(z)$ defined as follows:
A function $\rho(z)$ is called a reflection coefficient if
$$\rho(z) = \frac{b(z)}{a(z)} \qquad |\rho(z)| < 1 \ \text{ for } |z| \ge 1 \tag{13-131}$$
where $a(z)$ and $b(z)$ are two power series in $z^{-1}$ analytic for $|z| > 1$ and such that $a(\infty) = 1$, $b(\infty) = 0$.
It can be shown that the functions
$$S(e^{j\omega}) = P_N\,\frac{1 - |\rho(e^{j\omega})|^2}{\left|\hat{E}_N(e^{j\omega}) - e^{-jN\omega}\hat{E}_N(e^{-j\omega})\rho(e^{j\omega})\right|^2} \tag{13-132}$$
generate the class of all regular spectra of the class $C_N$ where $\rho(z)$ is an arbitrary reflection coefficient. The proof of this is based on the properties of four-terminal lattices. The details, however, are involved (see footnote on page 470).
We shall determine the innovations filter $L(z)$ of a process with the above spectrum. To do so, we must factor $S(z)$. The denominator of $S(z)$ is factored readily. To factor the numerator, we observe that
$$1 - \rho(z)\rho(1/z) = \frac{a(z)a(1/z) - b(z)b(1/z)}{a(z)a(1/z)}$$
It suffices, therefore, to factor the numerator of this fraction. To do so, we determine a Hurwitz polynomial $y(z)$ such that
$$y(z)y(1/z) = a(z)a(1/z) - b(z)b(1/z)$$
With $y(z)$ so determined, (13-132) yields
$$L(z) = \frac{\sqrt{P_N}\,y(z)}{\hat{E}_N(z)a(z) - z^{-N}\hat{E}_N(1/z)b(z)} \tag{13-133}$$
If $\rho(z)$ is a rational function, $S(z)$ is an ARMA spectrum. If $\rho(z) = 0$, it is an AR spectrum of order $N$. If $y(z) = $ constant, $S(z)$ is an AR spectrum of order higher than $N$.
Line spectra. If in the preceding algorithm $|K_m| = 1$ for $m = m_0 > N$, then the iteration terminates and the resulting spectrum consists of $m_0$ lines. We can, therefore, fit the $N + 1$ values $R[m]$ of a p.d. sequence with the sum of $m_0$ exponentials where $m_0$ is any number larger than $N$. We can do so with $N$ lines only if $\Delta_{N+1} = 0$. If $\Delta_{N+1} > 0$, we obtain a spectrum consisting of $N$ lines and a constant:
$$S(\omega) = q_0 + 2\pi\sum_{i=1}^{N} \alpha_i\,\delta(\omega - \omega_i) \tag{13-134}$$
where $q_0$ is the smallest eigenvalue of the matrix $D_{N+1}$. This is a consequence of the modified form (13-120) of Caratheodory’s theorem.
Maximum Entropy and Smoothness Conditions
In the preceding discussion we used as the parametric form of $S(z)$ a rational function of $z$. In the following we determine the parametric form of $S(z)$ in terms of certain smoothness conditions leading to the maximization of the integral of some function of $S(\omega)$. The method of maximum entropy is a special case. We repeat the problem: We are given the first $N + 1$ values of the p.d. sequence $R[m]$ and we wish to determine its spectrum
$$S(\omega) = \sum_{m=-\infty}^{\infty} R[m]e^{-jm\omega}$$
To solve this problem, we introduce a nonlinear function $G(S(\omega))$ of $S(\omega)$ and we determine the unknown values of $R[m]$ so as to maximize the integral
$$H = \int_{-\pi}^{\pi} G(S(\omega))\,d\omega \tag{13-135}$$
subject to the constraints
$$R[m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} S(\omega)e^{jm\omega}\,d\omega \qquad |m| \le N \tag{13-136}$$
where $R[m]$ are the given data.
The integral $H$ depends on the unknown values of $R[m]$. It is, therefore, maximum if
$$\frac{\partial H}{\partial R[m]} = \int_{-\pi}^{\pi} \frac{dG}{dS}\,\frac{\partial S(\omega)}{\partial R[m]}\,d\omega = 0 \qquad |m| > N$$
With $F(S(\omega)) = G'(S(\omega))$ this yields
$$\int_{-\pi}^{\pi} F(S(\omega))e^{-jm\omega}\,d\omega = 0 \quad |m| > N \qquad \text{because} \quad \frac{\partial S(\omega)}{\partial R[m]} = e^{-jm\omega}$$
From this it follows that the Fourier series coefficients of the function $F(S(\omega))$ must be 0 for $|m| > N$. In other words
$$F(S(\omega)) = \sum_{k=-N}^{N} c_k e^{-jk\omega} \tag{13-137}$$
The constants $c_k$ can, in principle, be determined in terms of the data $R[m]$. Indeed, from (13-136) and (13-137) it follows that
$$R[m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} F^{(-1)}\!\left(\sum_{k=-N}^{N} c_k e^{-jk\omega}\right)e^{jm\omega}\,d\omega \qquad |m| \le N \tag{13-138}$$
where $F^{(-1)}$ is the inverse of $F(S)$. This is a nonlinear system of $2N + 1$ equations involving the $2N + 1$ unknowns $c_k$. Its solution is in general difficult.
The selection of the function $G(S)$ depends on the applications. It might be selected, for example, to emphasize the high or low values of $S(\omega)$. The following special case is of particular interest. It leads to a system that can be simply solved and the result maximizes the uncertainty about the unknown spectrum.
The method of maximum entropy.† We now assume that
$$G(S(\omega)) = \ln S(\omega)$$
In this case,
$$H = \int_{-\pi}^{\pi} \ln S(\omega)\,d\omega \tag{13-139}$$
If $S(\omega)$ is the power spectrum of a process $x[n]$, then $H$ is the entropy rate of $x[n]$ [see (15-130)].
Since $G(S) = \ln S$, its derivative equals $G'(S) = 1/S$; hence (13-137) yields
$$F(S(\omega)) = \frac{1}{S(\omega)} = \sum_{k=-N}^{N} c_k e^{-jk\omega} > 0 \tag{13-140}$$
This shows that $1/S(\omega)$ is a positive trigonometric polynomial of order $N$, that is, the spectrum $S(\omega)$ is AR. It can, therefore, be written in the form
$$S(\omega) = \frac{P_N}{\left|1 - \sum_{k=1}^{N} a_k^N e^{-jk\omega}\right|^2}$$
Hence its coefficients $a_k^N$ and $P_N$ can be determined recursively from Levinson’s algorithm.
We have thus shown that the estimation of $S(\omega)$ based on the principle of maximum entropy rate is equivalent to the assumption that the unknown $S(\omega)$ is AR.
†A. Papoulis: “Maximum Entropy and Spectral Estimation: A Review,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-29, 1981.
APPENDIX 13A
MINIMUM-PHASE FUNCTIONS
A function
$$H(z) = \sum_{n=0}^{\infty} h_n z^{-n}$$
is called minimum-phase if it is analytic and its inverse $1/H(z)$ is also analytic for $|z| \ge 1$. We shall show that if $H(z)$ is minimum-phase, then
$$\ln h_0^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} \ln|H(e^{j\varphi})|^2\,d\varphi \tag{13A-1}$$
Proof. Using the identity |H(e><p)2 =	wc conclude with e'* = z,
that
£jn |H(e*)|2^ = ^-Hn[H(z)H(z-*)] dz
where the path of integration is the unit circle. We note further, changing z to
1/z, that
r 1	,1
(J>-lnH(z)</z =ф- lnH(z-')</z
To prove (13A-1), it suffices, therefore, to show that
1 f 1
In |Л«»1 = InH(z) dz
2ttj j z
This follows readily because H(z) tends to Ло as z -> « and the function
InH(z) is analytic for |z|	1 by assumption.
APPENDIX 13B
ALL-PASS FUNCTIONS
The unit circle is the locus of points N such that (see Fig. 13-17a)
$$\frac{(NA)}{(NB)} = \frac{|e^{j\varphi} - 1/z_i^*|}{|e^{j\varphi} - z_i|} = \frac{1}{|z_i|} \qquad |z_i| < 1$$
From this it follows that, if
$$F(z) = \frac{z z_i^* - 1}{z - z_i} \qquad |z_i| < 1$$
then $|F(e^{j\varphi})| = 1$. Furthermore, $|F(z)| > 1$ for $|z| < 1$ and $|F(z)| < 1$ for $|z| > 1$ because $F(z)$ is continuous and
$$|F(0)| = \frac{1}{|z_i|} > 1 \qquad |F(\infty)| = |z_i| < 1$$
Multiplying $N$ bilinear fractions of the above form, we conclude that, if
$$H(z) = \prod_{i=1}^{N} \frac{z z_i^* - 1}{z - z_i} \qquad |z_i| < 1 \tag{13B-1}$$
then
$$|H(z)| \;\begin{cases} > 1 & |z| < 1 \\ = 1 & |z| = 1 \\ < 1 & |z| > 1 \end{cases} \tag{13B-2}$$
FIGURE 13-17 (a) Unit-circle geometry with the points N, A, B; (b) all-pass filter: $H(z)$ and $H(z^{-1})$.
A system with system function $H(z)$ as in (13B-1) is called all-pass. Thus an all-pass system is stable, causal, and
$$|H(e^{j\omega T})| = 1$$
Furthermore,
$$\frac{1}{H(z)} = \prod_{i=1}^{N} \frac{z - z_i}{z z_i^* - 1} = \prod_{i=1}^{N} \frac{1 - z_i/z}{z_i^* - 1/z} = H\!\left(\frac{1}{z}\right) \tag{13B-3}$$
because if $z_i$ is a pole of $H(z)$, then $z_i^*$ is also a pole.
From the above it follows that if $h[n]$ is the delta response of an all-pass system, then the delta response of its inverse is $h[-n]$:
$$H(z) = \sum_{n=0}^{\infty} h[n]z^{-n} \qquad \frac{1}{H(z)} = H\!\left(\frac{1}{z}\right) = \sum_{n=0}^{\infty} h[n]z^{n} \tag{13B-4}$$
where both series converge in a ring containing the unit circle.
PROBLEMS
13-1. Find the mean and variance of the RV
$$n_T = \frac{1}{2T}\int_{-T}^{T} x(t)\,dt \qquad \text{where} \qquad x(t) = 10 + \nu(t)$$
for $T = 5$ and for $T = 100$. Assume that $E\{\nu(t)\} = 0$, $R_\nu(\tau) = 2\delta(\tau)$.
13-2. Show that if a process is normal and distribution-ergodic as in (13-35), then it is also mean-ergodic.
13-3. Show that if $x(t)$ is normal with $\eta_x = 0$ and $R_x(\tau) = 0$ for $|\tau| > a$, then it is correlation-ergodic.
13-4. Show that the process $a e^{j(\omega t + \varphi)}$ is not correlation-ergodic.
13-5. Show that
$$R_{xy}(\lambda) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} x(t + \lambda)y(t)\,dt$$
iff
$$\lim_{T\to\infty} \frac{1}{2T}\int_{-2T}^{2T}\left(1 - \frac{|\tau|}{2T}\right)E\{x(t + \lambda + \tau)y(t + \tau)x(t + \lambda)y(t)\}\,d\tau = R_{xy}^2(\lambda)$$
13-6. The process $x(t)$ is cyclostationary with period $T$, mean $\eta(t)$, and correlation $R(t_1, t_2)$. Show that if $R(t + \tau, t) \to \eta^2(t)$ as $|\tau| \to \infty$, then
$$\lim_{c\to\infty} \frac{1}{2c}\int_{-c}^{c} x(t)\,dt = \frac{1}{T}\int_{0}^{T} \eta(t)\,dt$$
Hint: The process $\bar{x}(t) = x(t - \theta)$ is mean-ergodic.
13-7. Show that if
$$C(t + \tau, t) \to 0 \qquad \text{as} \qquad \tau \to \infty$$
uniformly in $t$, then $x(t)$ is mean-ergodic.
13-8. The process $x(t)$ is normal with 0 mean and WSS. (a) Show that (Fig. P13-8a)
$$E\{x(t + \lambda)\,|\,x(t) = x\} = \frac{R(\lambda)}{R(0)}\,x$$
(b) Show that if $D$ is an arbitrary set of real numbers $x_i$ and $\bar{x} = E\{x(t)\,|\,x(t) \in D\}$, then (Fig. P13-8b)
$$E\{x(t + \lambda)\,|\,x(t) \in D\} = \frac{R(\lambda)}{R(0)}\,\bar{x}$$
(c) Using the above, design an analog correlometer for normal processes.
FIGURE P13-8
13-9. The processes $x(t)$ and $y(t)$ are jointly normal with zero mean. Show that: (a) If $w(t) = x(t + \lambda)y(t)$, then
$$C_{ww}(\tau) = C_{xy}(\lambda + \tau)C_{xy}(\lambda - \tau) + C_{xx}(\tau)C_{yy}(\tau)$$
(b) If the processes $x(t)$ and $y(t)$ are variance-ergodic, they are also cross-variance ergodic.
13-10. Using Schwarz's inequality (11B-1), show that
$$\left|\int_a^b f(x)\,dx\right|^2 \le (b - a)\int_a^b |f(x)|^2\,dx$$
13-11. We wish to estimate the mean $\eta$ of a process $x(t) = \eta + \nu(t)$ where $R_{\nu\nu}(\tau) = 5\delta(\tau)$. (a) Using (5-57), find the 0.95 confidence interval of $\eta$. (b) Improve the estimate if $\nu(t)$ is a normal process.
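13-12. (a) Show that if we use as estimate of the power spectrum $S(\omega)$ of a discrete-time process $x[n]$ the function
$$S_w(\omega) = \sum_{m=-N}^{N} w_m R[m]e^{-jm\omega}$$
then
$$S_w(\omega) = \frac{1}{2\pi}\int_{-\pi}^{\pi} S(y)W(\omega - y)\,dy \qquad W(\omega) = \sum_{n=-N}^{N} w_n e^{-jn\omega}$$
(b) Find $W(\omega)$ if $N = 10$ and $w_n = 1 - |n|/11$.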
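13-13. Show that if $x(t)$ is a zero-mean normal process with sample spectrum
$$S_T(\omega) = \frac{1}{2T}\left|\int_{-T}^{T} x(t)e^{-j\omega t}\,dt\right|^2$$
and $S(\omega)$ is sufficiently smooth, then
$$E^2\{S_T(\omega)\} \le \operatorname{Var} S_T(\omega) \le 2E^2\{S_T(\omega)\}$$
The right side is an equality if $\omega = 0$. The left side is an approximate equality if $T \gg 1/\omega$.
Hint: Use (12-74).
13-14. Show that the weighted sample spectrum
$$S_c(\omega) = \frac{1}{2T}\left|\int_{-T}^{T} c(t)x(t)e^{-j\omega t}\,dt\right|^2$$
of a process $x(t)$ is the Fourier transform of the function
$$R_c(\tau) = \frac{1}{2T}\int_{-T+|\tau|/2}^{T-|\tau|/2} c\!\left(t + \frac{\tau}{2}\right)c\!\left(t - \frac{\tau}{2}\right)x\!\left(t + \frac{\tau}{2}\right)x\!\left(t - \frac{\tau}{2}\right)dt$$
13-15. Given a normal process $x(t)$ with zero mean and power spectrum $S(\omega)$, we form its sample autocorrelation $R_T(\tau)$ as in (13-38). Show that for large $T$,
$$\operatorname{Var} R_T(\lambda) \simeq \frac{1}{4\pi T}\int_{-\infty}^{\infty} (1 + e^{j2\lambda\omega})S^2(\omega)\,d\omega$$
13-16. Show that if
$$R_T(\tau) = \frac{1}{2T}\int_{-T+|\tau|/2}^{T-|\tau|/2} x\!\left(t + \frac{\tau}{2}\right)x\!\left(t - \frac{\tau}{2}\right)dt$$
is the estimate of the autocorrelation $R(\tau)$ of a zero-mean normal process, then
$$\sigma_{R_T}^2 = \frac{1}{2T}\int_{-2T+|\tau|}^{2T-|\tau|} \left[R^2(\alpha) + R(\alpha + \tau)R(\alpha - \tau)\right]\left(1 - \frac{|\tau| + |\alpha|}{2T}\right)d\alpha$$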
13-17. Show that in Levinson’s algorithm,
4^* If
ak'1 = Ok . _ vi *	£{ew['’Rv-|[n - П) = 0
1 ля
13-18. Show that if $R[0] = 8$ and $R[1] = 4$, then the MEM estimate of $S(\omega)$ equals
$$S_{\mathrm{MEM}}(\omega) = \frac{6}{|1 - 0.5e^{-j\omega}|^2}$$
13-19. Find the maximum entropy estimate $S_{\mathrm{MEM}}(\omega)$ and the line-spectral estimate (13-111) of a process $x[n]$ if
$$R[0] = 13 \qquad R[1] = 5 \qquad R[2] = 2$$
CHAPTER
14
MEAN
SQUARE
ESTIMATION
14-1 INTRODUCTION†
†N. Wiener: Extrapolation, Interpolation, and Smoothing of Stationary Time Series, MIT Press, 1950. J. Makhoul: "Linear Prediction: A Tutorial Review," Proceedings of the IEEE, vol. 63, 1975. T. Kailath: "A View of Three Decades of Linear Filtering Theory," IEEE Transactions on Information Theory, vol. IT-20, 1974.
In this chapter, we consider the problem of estimating the value of a stochastic process $s(t)$ at a specific time in terms of the values (data) of another process $x(\xi)$ specified for every $\xi$ in an interval $a \le \xi \le b$ of finite or infinite length. In the digital case, the solution of this problem is a direct application of the orthogonality principle (see Sec. 8-4). In the analog case, the linear estimator $\hat{s}(t)$ of $s(t)$ is not a sum. It is an integral
$$\hat{s}(t) = \hat{E}\{s(t)\,|\,x(\xi),\ a \le \xi \le b\} = \int_a^b h(\alpha)x(\alpha)\,d\alpha \tag{14-1}$$
and our objective is to find $h(\alpha)$ so as to minimize the MS error
$$P = E\{[s(t) - \hat{s}(t)]^2\} = E\left\{\left[s(t) - \int_a^b h(\alpha)x(\alpha)\,d\alpha\right]^2\right\} \tag{14-2}$$
The function $h(\alpha)$ involves a noncountable number of unknowns, namely, its values for every $\alpha$ in the interval $(a, b)$. To determine $h(\alpha)$, we shall use the following extension of the orthogonality principle:
THEOREM. The MS error $P$ of the estimation of a process $s(t)$ by the integral in (14-1) is minimum if the data $x(\xi)$ are orthogonal to the error $s(t) - \hat{s}(t)$:
$$E\left\{\left[s(t) - \int_a^b h(\alpha)x(\alpha)\,d\alpha\right]x(\xi)\right\} = 0 \qquad a \le \xi \le b \tag{14-3}$$
or, equivalently, if $h(\alpha)$ is the solution of the integral equation
$$R_{sx}(t, \xi) = \int_a^b h(\alpha)R_{xx}(\alpha, \xi)\,d\alpha \qquad a \le \xi \le b \tag{14-4}$$
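Proof. We shall give a formal proof based on the approximation of the integral in (14-1) by its Riemann sum. Dividing the interval $(a, b)$ into $m$ segments $(\alpha_k, \alpha_k + \Delta\alpha)$, we obtain
$$\hat{s}(t) \simeq \sum_{k=1}^{m} h(\alpha_k)x(\alpha_k)\,\Delta\alpha \qquad \Delta\alpha = \frac{b - a}{m}$$
Applying (8-70) with $a_k = h(\alpha_k)\,\Delta\alpha$, we conclude that the resulting MS error $P$ is minimum if
$$E\left\{\left[s(t) - \sum_{k=1}^{m} h(\alpha_k)x(\alpha_k)\,\Delta\alpha\right]x(\xi_j)\right\} = 0 \qquad 1 \le j \le m$$
where $\xi_j$ is a point in the interval $(\alpha_j, \alpha_j + \Delta\alpha)$. This yields the system
$$R_{sx}(t, \xi_j) = \sum_{k=1}^{m} h(\alpha_k)R_{xx}(\alpha_k, \xi_j)\,\Delta\alpha \qquad j = 1, \ldots, m \tag{14-5}$$
The integral equation (14-4) is the limit of (14-5) as $\Delta\alpha \to 0$.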
From (8-73) it follows that the LMS error of the estimation of $s(t)$ by the integral in (14-1) equals
$$P = E\left\{\left[s(t) - \int_a^b h(\alpha)x(\alpha)\,d\alpha\right]s(t)\right\} = R_{ss}(0) - \int_a^b h(\alpha)R_{sx}(t, \alpha)\,d\alpha \tag{14-6}$$
In general, the integral equation (14-4) can only be solved numerically. In fact, if we assign to the variable $\xi$ the values $\xi_j$ and we approximate the integral by a sum, we obtain the system (14-5). In this chapter, we consider various special cases that lead to explicit solutions. Unless stated otherwise, it will be assumed that all processes are WSS and real.
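The numerical route via (14-5) can be sketched as follows for WSS processes, where $R_{xx}(\alpha, \xi) = R_{xx}(\alpha - \xi)$ and $R_{sx}(t, \xi) = R_{sx}(t - \xi)$. The autocorrelations used here (an exponential signal correlation and an orthogonal noise with a narrower exponential correlation) are assumptions chosen only to make the example concrete, the points $\xi_j$ are taken as the segment midpoints, and `solve_ms_estimator` is a hypothetical helper name.

```python
import numpy as np

def solve_ms_estimator(r_sx, r_xx, a, b, t, m=50):
    """Solve the discretized system (14-5) for h(alpha_k), WSS case."""
    d_alpha = (b - a) / m
    alpha = a + d_alpha * (np.arange(m) + 0.5)            # xi_j = alpha_j (midpoints)
    A = r_xx(alpha[None, :] - alpha[:, None]) * d_alpha    # A[j, k] = R_xx(alpha_k - xi_j) * d_alpha
    rhs = r_sx(t - alpha)                                  # R_sx(t - xi_j)
    h = np.linalg.solve(A, rhs)
    return alpha, h, d_alpha

# Assumed model: x = s + v with s and v orthogonal, so R_sx = R_ss
r_ss = lambda tau: np.exp(-np.abs(tau))
r_vv = lambda tau: 0.5 * np.exp(-5.0 * np.abs(tau))
r_xx = lambda tau: r_ss(tau) + r_vv(tau)

alpha, h, d_alpha = solve_ms_estimator(r_ss, r_xx, a=0.0, b=1.0, t=0.5)
# Discretized form of (14-6)
P = r_ss(0.0) - np.sum(h * r_ss(0.5 - alpha)) * d_alpha
print(P)
```

Since $t = 0.5$ lies inside the data interval $(0, 1)$, this particular run is a smoothing problem in the terminology introduced next.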
We shall use the following terminology:
If the time $t$ in (14-1) is in the interior of the data interval $(a, b)$, then the estimate $\hat{s}(t)$ of $s(t)$ will be called smoothing.
If $t$ is outside this interval and $x(t) = s(t)$ (no noise), then $\hat{s}(t)$ is a predictor of $s(t)$. If $t > b$, then $\hat{s}(t)$ is a "forward predictor"; if $t < a$, it is a "backward predictor."
If $t$ is outside the data interval and $x(t) \ne s(t)$, then the estimate is called filtering and prediction.
Simple Illustrations
In this section, we present a number of simple estimation problems involving a finite number of data and we conclude with the smoothing problem when the data $x(\xi)$ are available from $(-\infty, \infty)$. In this case, the solution of the integral equation (14-4) is readily obtained in terms of Fourier transforms.
Prediction. We wish to estimate the future value $s(t + \lambda)$ of a stationary process $s(t)$ in terms of its present value
$$\hat{s}(t + \lambda) = \hat{E}\{s(t + \lambda)\,|\,s(t)\} = a\,s(t)$$
From (7-71) and (7-72) it follows with $n = 1$ that
$$E\{[s(t + \lambda) - a\,s(t)]s(t)\} = 0 \qquad a = \frac{R(\lambda)}{R(0)}$$
$$P = E\{[s(t + \lambda) - a\,s(t)]s(t + \lambda)\} = R(0) - aR(\lambda)$$
Special case. If
$$R(\tau) = Ae^{-\alpha|\tau|} \qquad \text{then} \qquad a = e^{-\alpha\lambda}$$
In this case, the difference $s(t + \lambda) - a\,s(t)$ is orthogonal to $s(t - \xi)$ for every $\xi \ge 0$:
$$E\{[s(t + \lambda) - a\,s(t)]s(t - \xi)\} = R(\lambda + \xi) - aR(\xi) = Ae^{-\alpha(\lambda + \xi)} - Ae^{-\alpha\lambda}e^{-\alpha\xi} = 0$$
This shows that $a\,s(t)$ is the estimate of $s(t + \lambda)$ in terms of its entire past. Such a process is called wide-sense Markoff of order 1.
We shall now find the estimate of $s(t + \lambda)$ in terms of $s(t)$ and $s'(t)$:
$$\hat{s}(t + \lambda) = a_1 s(t) + a_2 s'(t)$$
The orthogonality condition (8-70) yields
$$s(t + \lambda) - \hat{s}(t + \lambda) \perp s(t),\ s'(t)$$
Using the identities
$$R'(0) = 0 \qquad R_{ss'}(\tau) = -R'(\tau) \qquad R_{s's'}(\tau) = -R''(\tau)$$
we obtain
$$a_1 = R(\lambda)/R(0) \qquad a_2 = R'(\lambda)/R''(0)$$
$$P = E\{[s(t + \lambda) - a_1 s(t) - a_2 s'(t)]s(t + \lambda)\} = R(0) - a_1R(\lambda) + a_2R'(\lambda)$$
FIGURE 14-1
If $\lambda$ is small, then
$$R(\lambda) \simeq R(0) \qquad R'(\lambda) \simeq R'(0) + R''(0)\lambda = R''(0)\lambda$$
$$a_1 \simeq 1 \qquad a_2 \simeq \lambda \qquad \hat{s}(t + \lambda) \simeq s(t) + \lambda s'(t)$$
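The small-$\lambda$ behavior can be checked numerically. The sketch below assumes the Gaussian-shaped autocorrelation $R(\tau) = e^{-\tau^2/2}$ (so that $R''(0) = -1$), which is not taken from the text, and compares the error of the two-term predictor with that of the predictor using $s(t)$ alone. The helper name is hypothetical.

```python
import numpy as np

def predictor_from_value_and_derivative(R0, R_lam, dR_lam, d2R0):
    """a1, a2 and MS error of the estimate a1*s(t) + a2*s'(t) of s(t + lam)."""
    a1 = R_lam / R0
    a2 = dR_lam / d2R0
    P = R0 - a1 * R_lam + a2 * dR_lam
    return a1, a2, P

lam = 0.1
R = lambda tau: np.exp(-tau**2 / 2)            # assumed autocorrelation
dR = lambda tau: -tau * np.exp(-tau**2 / 2)    # its derivative R'(tau)
a1, a2, P2 = predictor_from_value_and_derivative(R(0.0), R(lam), dR(lam), -1.0)
P1 = R(0.0) - R(lam)**2 / R(0.0)               # error when only s(t) is used
print(a1, a2, P2, P1)
```

For this choice, $a_1$ is close to 1, $a_2$ is close to $\lambda$, and adding the derivative reduces the MS error by roughly two orders of magnitude.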
Filtering. We shall estimate the present value of a process $s(t)$ in terms of the present value of another process $x(t)$:
$$\hat{s}(t) = \hat{E}\{s(t)\,|\,x(t)\} = a\,x(t)$$
From (7-71) and (7-72) it follows that
$$E\{[s(t) - a\,x(t)]x(t)\} = 0 \qquad a = R_{sx}(0)/R_{xx}(0)$$
$$P = E\{[s(t) - a\,x(t)]s(t)\} = R_{ss}(0) - aR_{sx}(0)$$
Interpolation. We wish to estimate the value $s(t + \lambda)$ of a process $s(t)$ at a point $t + \lambda$ in the interval $(t, t + T)$, in terms of its $2N + 1$ samples $s(t + kT)$ that are nearest to $t$ (Fig. 14-1)
$$\hat{s}(t + \lambda) = \sum_{k=-N}^{N} a_k s(t + kT) \qquad 0 < \lambda < T \tag{14-7}$$
The orthogonality principle now yields
$$E\left\{\left[s(t + \lambda) - \sum_{k=-N}^{N} a_k s(t + kT)\right]s(t + nT)\right\} = 0 \qquad |n| \le N$$
from which it follows that
$$\sum_{k=-N}^{N} a_k R(kT - nT) = R(\lambda - nT) \qquad -N \le n \le N \tag{14-8}$$
This is a system of $2N + 1$ equations and its solution yields the $2N + 1$ unknowns $a_k$. The MS value $P$ of the estimation error
$$\varepsilon_N(t) = s(t + \lambda) - \sum_{k=-N}^{N} a_k s(t + kT) \tag{14-9}$$
equals
$$P = E\{\varepsilon_N(t)s(t + \lambda)\} = R(0) - \sum_{k=-N}^{N} a_k R(\lambda - kT) \tag{14-10}$$
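A minimal sketch of solving (14-8) numerically is given below. The autocorrelation $R(\tau) = e^{-\alpha|\tau|}$ is an assumption made for the example; for such a wide-sense Markoff process only the two samples adjacent to $t + \lambda$ should receive nonzero weights, which gives a convenient check. The function name is hypothetical.

```python
import numpy as np

def interpolation_coefficients(R, lam, T, N):
    """Solve (14-8) for a_k, k = -N..N, and evaluate the MS error (14-10)."""
    k = np.arange(-N, N + 1)
    A = R((k[None, :] - k[:, None]) * T)     # A[n, k] = R(kT - nT)
    rhs = R(lam - k * T)                     # R(lam - nT)
    a = np.linalg.solve(A, rhs)
    P = R(0.0) - a @ rhs
    return k, a, P

alpha, T, lam, N = 1.0, 1.0, 0.3, 3
R = lambda tau: np.exp(-alpha * np.abs(tau))   # assumed autocorrelation
k, a, P = interpolation_coefficients(R, lam, T, N)
print(dict(zip(k.tolist(), np.round(a, 6).tolist())), P)
```

In this run the weights of $s(t)$ and $s(t + T)$ equal $\sinh\alpha(T - \lambda)/\sinh\alpha T$ and $\sinh\alpha\lambda/\sinh\alpha T$, and the remaining weights vanish to within rounding error.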
Interpolation as deterministic approximation. The error $\varepsilon_N(t)$ can be considered as the output of the system
$$E_N(\omega) = e^{j\omega\lambda} - \sum_{k=-N}^{N} a_k e^{jkT\omega}$$
(error filter) with input $s(t)$. Denoting by $S(\omega)$ the power spectrum of $s(t)$, we conclude from (10-139) that
$$P = E\{\varepsilon_N^2(t)\} = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)\left|e^{j\omega\lambda} - \sum_{k=-N}^{N} a_k e^{jkT\omega}\right|^2 d\omega \tag{14-11}$$
This shows that the minimization of $P$ is equivalent to the deterministic problem of minimizing the weighted mean square error of the approximation of the exponential $e^{j\omega\lambda}$ by a trigonometric polynomial (truncated Fourier series).
Quadrature. We shall estimate the integral
$$z = \int_0^b s(t)\,dt$$
of a process $s(t)$ in terms of its $N + 1$ samples $s(nT)$:
$$\hat{z} = a_0 s(0) + a_1 s(T) + \cdots + a_N s(NT) \qquad T = \frac{b}{N}$$
Applying (8-70), we obtain
$$E\left\{\left[\int_0^b s(t)\,dt - \hat{z}\right]s(kT)\right\} = 0 \qquad 0 \le k \le N$$
Hence
$$\int_0^b R(t - kT)\,dt = a_0R(kT) + \cdots + a_N R(kT - NT) \qquad 0 \le k \le N$$
This is a system of $N + 1$ equations and its solution yields the coefficients $a_k$.
Smoothing
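We wish to estimate the present value of a process $s(t)$ in terms of the values $x(\xi)$ of the sum
$$x(t) = s(t) + \nu(t)$$
available for every $\xi$ from $-\infty$ to $\infty$. The desirable estimate
$$\hat{s}(t) = \hat{E}\{s(t)\,|\,x(\xi),\ -\infty < \xi < \infty\}$$
will be written in the form
$$\hat{s}(t) = \int_{-\infty}^{\infty} h(\alpha)x(t - \alpha)\,d\alpha \tag{14-12}$$
In this notation, $h(\alpha)$ is independent of $t$ and $\hat{s}(t)$ can be considered as the output of a linear time-invariant noncausal system with input $x(t)$ and impulse response $h(t)$. Our problem is to find $h(t)$.
Clearly,
$$s(t) - \hat{s}(t) \perp x(\xi) \qquad \text{all } \xi$$
Setting $\xi = t - \tau$, we obtain
$$E\left\{\left[s(t) - \int_{-\infty}^{\infty} h(\alpha)x(t - \alpha)\,d\alpha\right]x(t - \tau)\right\} = 0 \qquad \text{all } \tau$$
This yields
$$R_{sx}(\tau) = \int_{-\infty}^{\infty} h(\alpha)R_{xx}(\tau - \alpha)\,d\alpha \qquad \text{all } \tau \tag{14-13}$$
Thus, to determine $h(t)$, we must solve the above integral equation. This equation can be solved easily because it holds for all $\tau$ and the integral is a convolution of $h(\tau)$ with $R_{xx}(\tau)$. Taking transforms of both sides, we obtain $S_{sx}(\omega) = H(\omega)S_{xx}(\omega)$. Hence
$$H(\omega) = \frac{S_{sx}(\omega)}{S_{xx}(\omega)} \tag{14-14}$$
The resulting system is called the noncausal Wiener filter.
The MS estimation error $P$ equals
$$P = E\left\{\left[s(t) - \int_{-\infty}^{\infty} h(\alpha)x(t - \alpha)\,d\alpha\right]s(t)\right\} = R_{ss}(0) - \int_{-\infty}^{\infty} h(\alpha)R_{sx}(\alpha)\,d\alpha = \frac{1}{2\pi}\int_{-\infty}^{\infty} [S_{ss}(\omega) - H^*(\omega)S_{sx}(\omega)]\,d\omega \tag{14-15}$$
If the signal $s(t)$ and the noise $\nu(t)$ are orthogonal, then
$$S_{sx}(\omega) = S_{ss}(\omega) \qquad S_{xx}(\omega) = S_{ss}(\omega) + S_{\nu\nu}(\omega)$$
Hence (Fig. 14-2)
$$H(\omega) = \frac{S_{ss}(\omega)}{S_{ss}(\omega) + S_{\nu\nu}(\omega)} \qquad P = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{S_{ss}(\omega)S_{\nu\nu}(\omega)}{S_{ss}(\omega) + S_{\nu\nu}(\omega)}\,d\omega \tag{14-16}$$
If the spectra $S_{ss}(\omega)$ and $S_{\nu\nu}(\omega)$ do not overlap, then $H(\omega) = 1$ in the band of the signal and $H(\omega) = 0$ in the band of the noise. In this case, $P = 0$.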
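FIGURE 14-2
Example 14-1. If
$$S_{ss}(\omega) = \frac{N_0}{\alpha^2 + \omega^2} \qquad S_{\nu\nu}(\omega) = N \qquad S_{s\nu}(\omega) = 0$$
then (14-16) yields
$$H(\omega) = \frac{N_0}{N_0 + N(\alpha^2 + \omega^2)} \qquad h(t) = \frac{N_0}{2\beta N}e^{-\beta|t|}$$
$$P = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{N_0 N}{N_0 + N(\alpha^2 + \omega^2)}\,d\omega = \frac{N_0}{2\beta} \qquad \beta^2 = \alpha^2 + \frac{N_0}{N}$$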
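The closed form $P = N_0/2\beta$ can be checked by evaluating the integral in (14-16) numerically. The sketch below truncates the frequency axis at $\pm W$, so a small truncation error of order $N_0/\pi W$ is expected; the parameter values, the grid, and the function name are assumptions of this example.

```python
import numpy as np

def wiener_smoother_example(N0, N, alpha, W=2000.0, n=400001):
    """Noncausal Wiener filter of Example 14-1 on a truncated frequency grid."""
    beta = np.sqrt(alpha**2 + N0 / N)
    omega = np.linspace(-W, W, n)
    S_ss = N0 / (alpha**2 + omega**2)
    H = S_ss / (S_ss + N)                      # (14-16) with S_vv = N
    integrand = S_ss * N / (S_ss + N)
    d_omega = omega[1] - omega[0]
    P_numeric = np.sum(integrand) * d_omega / (2 * np.pi)
    return beta, H, P_numeric, N0 / (2 * beta)

beta, H, P_numeric, P_closed = wiener_smoother_example(N0=2.0, N=1.0, alpha=1.0)
print(beta, P_numeric, P_closed)
```

With $N_0 = 2$, $N = 1$, $\alpha = 1$ the two values agree to about three decimal places, the difference being the neglected tails of the integrand.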
DISCRETE-TIME PROCESSES. The noncausal estimate $\hat{s}[n]$ of a discrete-time process in terms of the data
$$x[n] = s[n] + \nu[n]$$
is the output
$$\hat{s}[n] = \sum_{k=-\infty}^{\infty} h[k]x[n - k]$$
of a linear time-invariant noncausal system with input $x[n]$ and delta response $h[n]$. The orthogonality principle yields
$$E\left\{\left(s[n] - \sum_{k=-\infty}^{\infty} h[k]x[n - k]\right)x[n - m]\right\} = 0 \qquad \text{all } m$$
Hence
$$R_{sx}[m] = \sum_{k=-\infty}^{\infty} h[k]R_{xx}[m - k] \qquad \text{all } m \tag{14-17}$$
Taking transforms of both sides, we obtain
$$H(z) = \frac{S_{sx}(z)}{S_{xx}(z)} \tag{14-18}$$
The resulting MS error equals
$$P = E\left\{\left(s[n] - \sum_{k=-\infty}^{\infty} h[k]x[n - k]\right)s[n]\right\} = R_{ss}(0) - \sum_{k=-\infty}^{\infty} h[k]R_{sx}[k] = \frac{1}{2\pi}\int_{-\pi}^{\pi} [S_{ss}(\omega) - H(e^{-j\omega})S_{sx}(\omega)]\,d\omega$$
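Example 14-2. Suppose that $s[n]$ is a first-order AR process and $\nu[n]$ is white noise orthogonal to $s[n]$:
$$S_{ss}(z) = \frac{N_0}{(1 - az^{-1})(1 - az)} \qquad S_{\nu\nu}(z) = N \qquad S_{s\nu}(z) = 0$$
In this case,
$$S_{xx}(z) = S_{ss}(z) + N = \frac{aN(1 - bz^{-1})(1 - bz)}{b(1 - az^{-1})(1 - az)}$$
where
$$b + b^{-1} = a + a^{-1} + \frac{N_0}{aN} \qquad 0 < b < a < 1$$
Hence
$$H(z) = \frac{bN_0}{aN(1 - bz^{-1})(1 - bz)} \qquad h[n] = c\,b^{|n|} \qquad c = \frac{bN_0}{aN(1 - b^2)}$$
$$P = \frac{N_0}{1 - a^2}\left[1 - c\sum_{k=-\infty}^{\infty} (ab)^{|k|}\right] = \frac{bN_0}{a(1 - b^2)}$$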
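The constants of this example are easy to evaluate numerically. The sketch below computes $b$ as the root of $b + b^{-1} = a + a^{-1} + N_0/aN$ inside the unit circle and compares the closed-form error with a direct evaluation of the discrete-time counterpart of (14-16), $P = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{ss}S_{\nu\nu}/(S_{ss} + S_{\nu\nu})\,d\omega$. The function names and the parameter values are assumptions of this example.

```python
import numpy as np

def ar1_smoother(a, N0, N):
    """Constants b, c and MS error P of the noncausal smoother of Example 14-2."""
    s = a + 1.0 / a + N0 / (a * N)           # value of b + 1/b
    b = (s - np.sqrt(s**2 - 4.0)) / 2.0      # root with 0 < b < 1
    c = b * N0 / (a * N * (1.0 - b**2))      # h[n] = c * b**|n|
    P = b * N0 / (a * (1.0 - b**2))
    return b, c, P

def error_by_integration(a, N0, N, n=4096):
    """(1/2pi) * integral over one period of S_ss * N / (S_ss + N)."""
    omega = 2 * np.pi * np.arange(n) / n
    S_ss = N0 / np.abs(1.0 - a * np.exp(-1j * omega))**2
    return np.mean(S_ss * N / (S_ss + N))

b, c, P = ar1_smoother(a=0.5, N0=1.0, N=1.0)
print(b, c, P, error_by_integration(0.5, 1.0, 1.0))
```

Note that $P = N\,h[0]$, which follows because the integrand equals $N\,H(e^{j\omega})$.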
14-2 PREDICTION
Prediction is the estimation of the future $s(t + \lambda)$ of a process $s(t)$ in terms of its past $s(t - \tau)$, $\tau > 0$. This problem has three parts: The past (data) is known in the interval $(-\infty, t)$; it is known in the interval $(t - T, t)$ of finite length $T$; it is known in the interval $(0, t)$ of variable length $t$. We shall develop all three parts for digital processes only. The discussion of analog predictors will be limited to the first part. In the digital case, we find it more convenient to predict the present $s[n]$ of the given process in terms of its past $s[n - k]$, $k \ge 1$.
Infinite Past
We start with the estimation of a process $s[n]$ in terms of its entire past $s[n - k]$, $k \ge 1$:
$$\hat{s}[n] = \hat{E}\{s[n]\,|\,s[n - k],\ k \ge 1\} = \sum_{k=1}^{\infty} h[k]s[n - k] \tag{14-19}$$
This estimator will be called the one-step predictor of $s[n]$. Thus $\hat{s}[n]$ is the response of the predictor filter
$$H(z) = h[1]z^{-1} + \cdots + h[k]z^{-k} + \cdots \tag{14-20}$$
to the input $s[n]$ and our objective is to find the constants $h[k]$ so as to minimize the MS estimation error. From the orthogonality principle it follows that the error $\varepsilon[n] = s[n] - \hat{s}[n]$ must be orthogonal to the data $s[n - m]$:
$$E\left\{\left(s[n] - \sum_{k=1}^{\infty} h[k]s[n - k]\right)s[n - m]\right\} = 0 \qquad m \ge 1 \tag{14-21}$$
This yields
$$R[m] - \sum_{k=1}^{\infty} h[k]R[m - k] = 0 \qquad m \ge 1 \tag{14-22}$$
We have thus obtained a system of infinitely many equations expressing the unknowns $h[k]$ in terms of the autocorrelation $R[m]$ of $s[n]$. These equations are called Wiener-Hopf (digital form).
The Wiener-Hopf equations cannot be solved directly with z transforms even though the right side equals the convolution of $h[m]$ with $R[m]$. The reason is that, unlike (14-17), the two sides of (14-22) are not equal for every $m$. A solution based on the analytic properties of the z transforms of causal and anticausal sequences can be found (see Prob. 14-12); however, the underlying theory is not simple. We shall give presently a very simple solution based on the concept of innovations. We comment first on a basic property of the estimation error $\varepsilon[n]$ and of the error filter
$$E(z) = 1 - H(z) = 1 - \sum_{n=1}^{\infty} h[n]z^{-n} \tag{14-23}$$
The error $\varepsilon[n]$ is orthogonal to the data $s[n - m]$ for every $m \ge 1$; furthermore, $\varepsilon[n - m]$ is a linear function of $s[n - m]$ and its past because $\varepsilon[n]$ is the response of the causal system $E(z)$ to the input $s[n]$. From this it follows that $\varepsilon[n]$ is orthogonal to $\varepsilon[n - m]$ for every $m \ge 1$ and every $n$. Hence $\varepsilon[n]$ is white noise:
$$R_{\varepsilon\varepsilon}[m] = E\{\varepsilon[n]\varepsilon[n - m]\} = P\delta[m] \tag{14-24}$$
where
$$P = E\{\varepsilon^2[n]\} = E\{(s[n] - \hat{s}[n])s[n]\} = R[0] - \sum_{k=1}^{\infty} h[k]R[k]$$
is the LMS error. This error can be expressed in terms of the power spectrum $S(\omega)$ of $s[n]$; as we see from (10-139),
$$P = \frac{1}{2\pi}\int_{-\pi}^{\pi} |E(e^{j\omega})|^2 S(\omega)\,d\omega \tag{14-25}$$
Using the above, we shall show that the function $E(z)$ has no 0's outside the unit circle.
THEOREM. If
$$E(z_i) = 0 \qquad \text{then} \qquad |z_i| \le 1 \tag{14-26}$$
Proof. We form the function
$$E_0(z) = E(z)\,\frac{1 - z^{-1}/z_i^*}{1 - z_i z^{-1}}$$
This function is an error filter because it is causal and $E_0(\infty) = E(\infty) = 1$. Furthermore, if $|z_i| > 1$, then [see (13B-2)]
$$|E_0(e^{j\omega})| = \frac{1}{|z_i|}|E(e^{j\omega})| < |E(e^{j\omega})|$$
Inserting into (14-25), we conclude that if we use as the estimator filter the function $1 - E_0(z)$, the resulting MS error will be smaller than $P$. This, however, is impossible because $P$ is minimum; hence $|z_i| \le 1$.
Regular Processes
We shall solve the Wiener-Hopf equations (14-22) under the assumption that the process $s[n]$ is regular. As we have shown in Sec. 12-1, such a process is linearly equivalent to a white-noise process $i[n]$ in the sense that
$$s[n] = \sum_{k=0}^{\infty} l[k]i[n - k] \tag{14-27}$$
$$i[n] = \sum_{k=0}^{\infty} \gamma[k]s[n - k] \tag{14-28}$$
From this it follows that the predictor $\hat{s}[n]$ of $s[n]$ can be written as a linear sum involving the past of $i[n]$:
$$\hat{s}[n] = \sum_{k=1}^{\infty} h_i[k]i[n - k] \tag{14-29}$$
To find $\hat{s}[n]$, it suffices, therefore, to find the constants $h_i[k]$ and to express $i[n]$ in terms of $s[n]$ using (14-28). To do so, we shall determine first the cross-correlation of $s[n]$ and $i[n]$. We maintain that
$$R_{si}[m] = l[m] \tag{14-30}$$
Proof. We multiply (14-27) by $i[n - m]$ and take expected values. This yields
$$E\{s[n]i[n - m]\} = \sum_{k=0}^{\infty} l[k]E\{i[n - k]i[n - m]\} = \sum_{k=0}^{\infty} l[k]\delta[m - k]$$
because $R_{ii}[m] = \delta[m]$, and (14-30) results.
To find $h_i[k]$, we apply the orthogonality principle:
$$E\left\{\left(s[n] - \sum_{k=1}^{\infty} h_i[k]i[n - k]\right)i[n - m]\right\} = 0 \qquad m \ge 1$$
This yields
$$R_{si}[m] - \sum_{k=1}^{\infty} h_i[k]R_{ii}[m - k] = R_{si}[m] - \sum_{k=1}^{\infty} h_i[k]\delta[m - k] = 0$$
and since the last sum equals $h_i[m]$, we conclude that $h_i[m] = R_{si}[m]$. From this and (14-30) it follows that the predictor $\hat{s}[n]$, expressed in terms of its innovations, equals
$$\hat{s}[n] = \sum_{k=1}^{\infty} l[k]i[n - k] \tag{14-31}$$
We shall rederive this important result using (14-27). To do so, it suffices to show that the difference $s[n] - \hat{s}[n]$ is orthogonal to $i[n - m]$ for every $m \ge 1$. This is indeed the case because
$$\varepsilon[n] = \sum_{k=0}^{\infty} l[k]i[n - k] - \sum_{k=1}^{\infty} l[k]i[n - k] = l[0]i[n] \tag{14-32}$$
and $i[n]$ is white noise.
The sum in (14-31) is the response of the filter
$$\sum_{k=1}^{\infty} l[k]z^{-k} = L(z) - l[0]$$
to the input $i[n]$. To complete the specification of $\hat{s}[n]$, we must express $i[n]$ in terms of $s[n]$. Since $i[n]$ is the response of the filter $1/L(z)$ to the input $s[n]$, we conclude, cascading as in Fig. 14-3, that the predictor filter of $s[n]$ is the product
$$H(z) = \frac{1}{L(z)}\bigl(L(z) - l[0]\bigr) = 1 - \frac{l[0]}{L(z)} \tag{14-33}$$
shown in Fig. 14-4. Thus, to obtain $H(z)$, it suffices to factor $S(z)$ as in (12-6). The constant $l[0]$ is determined from the initial value theorem:
$$l[0] = \lim_{z\to\infty} L(z)$$
FIGURE 14-3
FIGURE 14-4 One-step predictor
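Example 14-3. Suppose that
$$S(\omega) = \frac{5 - 4\cos\omega}{10 - 6\cos\omega} \qquad L(z) = \frac{2z - 1}{3z - 1} \qquad l[0] = \frac{2}{3}$$
as in Example 14-4. In this case, (14-33) yields
$$H(z) = 1 - \frac{2}{3}\cdot\frac{3z - 1}{2z - 1} = \frac{-z^{-1}}{6(1 - z^{-1}/2)}$$
Note that $\hat{s}[n]$ can be determined recursively:
$$\hat{s}[n] - \frac{1}{2}\hat{s}[n - 1] = -\frac{1}{6}s[n - 1]$$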
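One way to check a predictor filter obtained from (14-33) is to verify that its error filter $E(z) = 1 - H(z)$ whitens the process, that is, $|E(e^{j\omega})|^2 S(\omega) = l^2[0]$ for every $\omega$. The sketch below does this for the spectrum of this example, assuming the closed form $H(z) = -z^{-1}/[6(1 - z^{-1}/2)]$ that follows from $L(z) = (2z - 1)/(3z - 1)$; the function name is hypothetical.

```python
import numpy as np

def error_spectrum_example(omega):
    """|E(e^{jw})|^2 S(w) for S(w) = (5 - 4cos w)/(10 - 6cos w)."""
    z_inv = np.exp(-1j * omega)
    S = (5 - 4 * np.cos(omega)) / (10 - 6 * np.cos(omega))
    H = -z_inv / (6 * (1 - z_inv / 2))       # one-step predictor filter
    E = 1 - H                                # error filter, see (14-23)
    return np.abs(E)**2 * S

omega = np.linspace(-np.pi, np.pi, 1001)
spectrum = error_spectrum_example(omega)
print(spectrum.min(), spectrum.max())
```

Both printed values equal $4/9 = l^2[0]$ to within rounding, so the prediction error is white with the expected power.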
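The Kolmogoroff-Szegő MS error formula.† As we have seen from (14-32), the MS estimation error equals
$$P = E\{\varepsilon^2[n]\} = l^2[0]$$
Furthermore [see (13A-1)]
$$\ln l^2[0] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \ln|L(e^{j\omega})|^2\,d\omega$$
Since $S(\omega) = |L(e^{j\omega})|^2$, this yields the identity
$$P = \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \ln S(\omega)\,d\omega\right\} \tag{14-34}$$
expressing $P$ directly in terms of $S(\omega)$.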
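The identity (14-34) lends itself to a direct numerical check, since the integrand is periodic and smooth and a uniform-grid average converges rapidly. The sketch below evaluates it for the spectrum $S(\omega) = (5 - 4\cos\omega)/(10 - 6\cos\omega)$, for which $l[0] = 2/3$, and for a first-order AR spectrum with $R[m] = a^{|m|}$; the function name and grid size are assumptions of this example.

```python
import numpy as np

def kolmogorov_szego_error(S, n=4096):
    """One-step MS prediction error exp{(1/2pi) * integral of ln S(w)}."""
    omega = 2 * np.pi * np.arange(n) / n
    return np.exp(np.mean(np.log(S(omega))))

S1 = lambda w: (5 - 4 * np.cos(w)) / (10 - 6 * np.cos(w))
a = 0.8
S2 = lambda w: (1 - a**2) / np.abs(1 - a * np.exp(-1j * w))**2
print(kolmogorov_szego_error(S1), kolmogorov_szego_error(S2))
```

The first value equals $4/9$ and the second equals $1 - a^2$, the one-step prediction error of a process with autocorrelation $a^{|m|}$.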
Autoregressive processes. If $s[n]$ is an AR process as in (12-39), then $l[0] = b_0$ and
$$H(z) = -a_1z^{-1} - \cdots - a_Nz^{-N} \tag{14-35}$$
$$\hat{s}[n] = -a_1s[n - 1] - \cdots - a_Ns[n - N] \qquad P = b_0^2$$
The above shows that the predictor $\hat{s}[n]$ of $s[n]$ in terms of its entire past is the same as the predictor in terms of the $N$ most recent past values. This result can be established directly: From (12-39) and (14-35) it follows that $s[n] - \hat{s}[n] = b_0i[n]$. This is orthogonal to the past of $s[n]$; hence
$$\hat{E}\{s[n]\,|\,s[n - k],\ 1 \le k \le N\} = \hat{E}\{s[n]\,|\,s[n - k],\ k \ge 1\}$$
A process with this property is called wide-sense Markoff of order $N$.
†U. Grenander and G. Szegő: Toeplitz Forms and Their Applications, Berkeley University Press, 1958.
THE r-STEP PREDICTOR. We shall determine the predictor
$$\hat{s}_r[n] = \hat{E}\{s[n]\,|\,s[n - k],\ k \ge r\}$$
of $s[n]$ in terms of $s[n - r]$ and its past using innovations. We maintain that
$$\hat{s}_r[n] = \sum_{k=r}^{\infty} l[k]i[n - k] \tag{14-36}$$
Proof. It suffices to show that the difference
$$\hat{\varepsilon}_r[n] = s[n] - \hat{s}_r[n] = \sum_{k=0}^{r-1} l[k]i[n - k]$$
is orthogonal to the data $s[n - k]$, $k \ge r$. This is a consequence of the fact that $s[n - k]$ is linearly equivalent to $i[n - k]$ and its past for $k \ge r$; hence it is orthogonal to $i[n - m]$ for $0 \le m \le r - 1$.
The prediction error $\hat{\varepsilon}_r[n]$ is the response of the MA filter $l[0] + l[1]z^{-1} + \cdots + l[r - 1]z^{-r+1}$ of Fig. 14-5 to the input $i[n]$. Cascading this filter with $1/L(z)$ as in Fig. 14-5, we conclude that the process $\hat{s}_r[n] = s[n] - \hat{\varepsilon}_r[n]$ is the response of the system
$$H_r(z) = 1 - \frac{1}{L(z)}\sum_{k=0}^{r-1} l[k]z^{-k} \tag{14-37}$$
to the input $s[n]$. This is the r-step predictor filter of $s[n]$. The resulting MS error equals
$$P_r = E\{\hat{\varepsilon}_r^2[n]\} = \sum_{k=0}^{r-1} l^2[k] \tag{14-38}$$
FIGURE 14-5
Example 14-4. We are given a process $s[n]$ with autocorrelation $R[m] = a^{|m|}$ and we wish to determine its r-step predictor. In this case (see Example 10-30)
$$S(z) = \frac{a^{-1} - a}{(a^{-1} + a) - (z^{-1} + z)} = \frac{b^2}{(1 - az^{-1})(1 - az)} \qquad b^2 = 1 - a^2$$
$$L(z) = \frac{b}{1 - az^{-1}} \qquad l[n] = ba^nU[n]$$
Hence
$$H_r(z) = 1 - \frac{1 - az^{-1}}{b}\sum_{k=0}^{r-1} ba^kz^{-k} = a^rz^{-r}$$
$$\hat{s}_r[n] = a^rs[n - r] \qquad P_r = b^2\sum_{k=0}^{r-1} a^{2k} = 1 - a^{2r}$$
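The construction (14-37) and (14-38) can be carried out numerically for any innovations filter given by the first terms of its delta response $l[k]$. The sketch below obtains $1/L(z)$ by power-series inversion and then forms $H_r(z)$; both function names and the truncation length are assumptions of this example, and the check uses the process of Example 14-4.

```python
import numpy as np

def inverse_series(l, n_terms):
    """First n_terms coefficients of 1/L(z), given L(z) = sum_k l[k] z^-k."""
    g = np.zeros(n_terms)
    g[0] = 1.0 / l[0]
    for n in range(1, n_terms):
        # sum_{k=1}^{n} l[k] g[n-k] + l[0] g[n] = 0
        g[n] = -np.dot(l[1:n + 1], g[n - 1::-1]) / l[0]
    return g

def r_step_predictor(l, r, n_terms):
    """Coefficients of H_r(z) in (14-37) and the MS error (14-38)."""
    g = inverse_series(l, n_terms)
    l_trunc = np.zeros(n_terms)
    l_trunc[:r] = l[:r]                              # MA filter l[0..r-1]
    h = -np.convolve(g, l_trunc)[:n_terms]
    h[0] += 1.0
    return h, np.sum(l[:r]**2)

a, r, n_terms = 0.8, 3, 20
l = np.sqrt(1 - a**2) * a**np.arange(n_terms)        # l[n] = b a^n
h, P_r = r_step_predictor(l, r, n_terms)
print(np.round(h[:6], 6), P_r)
```

The only nonzero coefficient is $h[r] = a^r$, and $P_r = 1 - a^{2r}$, as found in the example.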
ANALOG PROCESSES. We consider now the problem of predicting the future value $s(t + \lambda)$ of a process $s(t)$ in terms of its entire past $s(t - \tau)$, $\tau \ge 0$. In this problem, our estimator is an integral:
$$\hat{s}(t + \lambda) = \hat{E}\{s(t + \lambda)\,|\,s(t - \tau),\ \tau \ge 0\} = \int_0^{\infty} h(\alpha)s(t - \alpha)\,d\alpha \tag{14-39}$$
and the problem is to find the function $h(\alpha)$. From the analog form (14-4) of the orthogonality principle, it follows that
$$E\left\{\left[s(t + \lambda) - \int_0^{\infty} h(\alpha)s(t - \alpha)\,d\alpha\right]s(t - \tau)\right\} = 0 \qquad \tau \ge 0$$
This yields the Wiener-Hopf integral equation
$$R(\tau + \lambda) = \int_0^{\infty} h(\alpha)R(\tau - \alpha)\,d\alpha \qquad \tau \ge 0 \tag{14-40}$$
The solution of this equation is the impulse response of the causal Wiener filter
$$H(s) = \int_0^{\infty} h(t)e^{-st}\,dt$$
The corresponding MS error equals
$$P = E\{[s(t + \lambda) - \hat{s}(t + \lambda)]s(t + \lambda)\} = R(0) - \int_0^{\infty} h(\alpha)R(\lambda + \alpha)\,d\alpha \tag{14-41}$$
Equation (14-40) cannot be solved directly with transforms because the two sides are equal for $\tau \ge 0$ only. A solution based on the analytic properties of Laplace transforms is outlined in Prob. 14-11. We give next a solution using innovations.
As we have shown in (12-8), the process $s(t)$ is the response of its innovations filter $L(s)$ to the white-noise process $i(t)$. From this it follows that
$$s(t + \lambda) = \int_0^{\infty} l(\alpha)i(t + \lambda - \alpha)\,d\alpha \tag{14-42}$$
We maintain that $\hat{s}(t + \lambda)$ is the part of the above integral involving only the past of $i(t)$:
$$\hat{s}(t + \lambda) = \int_{\lambda}^{\infty} l(\alpha)i(t + \lambda - \alpha)\,d\alpha = \int_0^{\infty} l(\beta + \lambda)i(t - \beta)\,d\beta \tag{14-43}$$
Proof. The difference
$$s(t + \lambda) - \hat{s}(t + \lambda) = \int_0^{\lambda} l(\alpha)i(t + \lambda - \alpha)\,d\alpha \tag{14-44}$$
depends only on the values of $i(t)$ in the interval $(t, t + \lambda)$; hence it is orthogonal to the past of $i(t)$ and, therefore, it is also orthogonal to the past of $s(t)$.
The predictor $\hat{s}(t + \lambda)$ of $s(t)$ is the response of the system
$$H_i(s) = \int_0^{\infty} h_i(t)e^{-st}\,dt \qquad h_i(t) = l(t + \lambda)U(t) \tag{14-45}$$
(Fig. 14-6) to the input $i(t)$. Cascading with $1/L(s)$, we conclude that $\hat{s}(t + \lambda)$ is the response of the system
$$H(s) = \frac{H_i(s)}{L(s)} \tag{14-46}$$
to the input $s(t)$.
FIGURE 14-6
Thus, to determine the predictor filter $H(s)$ of $s(t)$, proceed as follows:
Factor the spectrum of $s(t)$ as in (12-3): $S(s) = L(s)L(-s)$.
Find the inverse transform $l(t)$ of $L(s)$ and form the function $h_i(t) = l(t + \lambda)U(t)$.
Find the transform $H_i(s)$ of $h_i(t)$ and determine $H(s)$ from (14-46).
The MS estimation error is determined from (14-44):
$$P = E\left\{\left|\int_0^{\lambda} l(\alpha)i(t + \lambda - \alpha)\,d\alpha\right|^2\right\} = \int_0^{\lambda} l^2(\alpha)\,d\alpha \tag{14-47}$$
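Example 14-5. We are given a process $s(t)$ with autocorrelation $R(\tau) = 2\alpha e^{-\alpha|\tau|}$ and we wish to determine its predictor. In this problem,
$$S(s) = \frac{4\alpha^2}{\alpha^2 - s^2} \qquad L(s) = \frac{2\alpha}{\alpha + s} \qquad l(t) = 2\alpha e^{-\alpha t}U(t)$$
$$h_i(t) = 2\alpha e^{-\alpha\lambda}e^{-\alpha t}U(t) \qquad H_i(s) = \frac{2\alpha e^{-\alpha\lambda}}{\alpha + s}$$
$$H(s) = e^{-\alpha\lambda} \qquad \hat{s}(t + \lambda) = e^{-\alpha\lambda}s(t)$$
This shows that the predictor of $s(t + \lambda)$ in terms of its entire past is the same as the predictor in terms of its present $s(t)$. In other words, if $s(t)$ is specified, the past has no effect on the linear prediction of the future.
The determination of $H(s)$ is simple if $s(t)$ has a rational spectrum. Assuming that the poles of $L(s)$ are simple, we obtain
$$L(s) = \frac{N(s)}{D(s)} = \sum_i \frac{c_i}{s - s_i} \qquad l(t) = \sum_i c_ie^{s_it}U(t)$$
$$h_i(t) = \sum_i c_ie^{s_i\lambda}e^{s_it}U(t) \qquad H_i(s) = \sum_i \frac{c_ie^{s_i\lambda}}{s - s_i} = \frac{N_1(s)}{D(s)} \tag{14-48}$$
and (14-46) yields $H(s) = N_1(s)/N(s)$.
If $N(s) = 1$, then $H(s)$ is a polynomial:
$$H(s) = N_1(s) = b_0 + b_1s + \cdots + b_ns^n$$
and $\hat{s}(t + \lambda)$ is a linear sum of $s(t)$ and its first $n$ derivatives:
$$\hat{s}(t + \lambda) = b_0s(t) + b_1s'(t) + \cdots + b_ns^{(n)}(t)$$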
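Example 14-6. We are given a process $s(t)$ with
$$S(s) = \frac{49 - 25s^2}{(1 - s^2)(9 - s^2)} \qquad L(s) = \frac{7 + 5s}{(1 + s)(3 + s)}$$
and we wish to estimate its future $s(t + \lambda)$ for $\lambda = \ln 2$. In this problem, $e^{\lambda} = 2$:
$$L(s) = \frac{1}{s + 1} + \frac{4}{s + 3} \qquad H_i(s) = \frac{e^{-\lambda}}{s + 1} + \frac{4e^{-3\lambda}}{s + 3} = \frac{s + 2}{(s + 1)(s + 3)}$$
$$H(s) = \frac{s + 2}{5s + 7} = \frac{1}{5}\left(1 + \frac{3}{5s + 7}\right) \qquad h(t) = \frac{1}{5}\delta(t) + \frac{3}{25}e^{-1.4t}U(t)$$
Hence
$$\hat{E}\{s(t + \lambda)\,|\,s(t - \tau),\ \tau \ge 0\} = 0.2s(t) + \hat{E}\{s(t + \lambda)\,|\,s(t - \tau),\ \tau > 0\}$$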
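For a rational spectrum with simple poles, the steps leading to (14-48) reduce to scaling each residue of $L(s)$ by $e^{s_i\lambda}$. The sketch below implements this under the assumptions that the residues $c_i$ and poles $s_i$ of $L(s)$ are already known, that all poles lie in the left half plane, and that $l(t)$ is real; the function name is hypothetical, and the MS error is obtained by integrating $l^2(\alpha)$ in (14-47) in closed form. The data are those of Example 14-6.

```python
import numpy as np

def analog_predictor(c, poles, lam):
    """Predictor filter H(s) = H_i(s)/L(s) and MS error (14-47) from a
    partial-fraction expansion L(s) = sum_i c_i / (s - s_i)."""
    c = np.asarray(c, dtype=complex)
    poles = np.asarray(poles, dtype=complex)
    c_lam = c * np.exp(poles * lam)                   # residues of H_i(s), see (14-48)
    L = lambda s: np.sum(c / (s - poles))
    H_i = lambda s: np.sum(c_lam / (s - poles))
    H = lambda s: H_i(s) / L(s)
    # integral_0^lam (sum_i c_i e^{s_i t})^2 dt, term by term
    ssum = poles[:, None] + poles[None, :]
    P = np.sum(np.outer(c, c) * (np.exp(ssum * lam) - 1.0) / ssum).real
    return H, P

H, P = analog_predictor(c=[1.0, 4.0], poles=[-1.0, -3.0], lam=np.log(2.0))
s0 = 1j                                               # test point on the j-omega axis
print(H(s0), (s0 + 2) / (5 * s0 + 7), P)
```

The computed $H(s)$ agrees with $(s + 2)/(5s + 7)$ at the test point, and the resulting MS error is $P = 4.875$.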
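Notes 1. The integral
$$y(\tau) = \int_0^{\infty} h(\alpha)R(\tau - \alpha)\,d\alpha$$
in (14-40) is the response of the Wiener filter $H(s)$ to the input $R(\tau)$. From (14-40) and (14-41) it follows that
$$y(\tau) = R(\tau + \lambda) \quad \text{for } \tau \ge 0 \qquad \text{and} \qquad y(-\lambda) = R(0) - P$$
2. In all MS estimation problems, only second-order moments are used. If, therefore, two processes have the same autocorrelation, then their predictors are identical. This suggests the following derivation of the Wiener-Hopf equation: Suppose that $\omega$ is an RV with density $f(\omega)$ and $z(t) = e^{j\omega t}$. Clearly,
$$R_{zz}(\tau) = E\{e^{j\omega(t + \tau)}e^{-j\omega t}\} = \int_{-\infty}^{\infty} f(\omega)e^{j\omega\tau}\,d\omega$$
From this it follows that the power spectrum of $z(t)$ equals $2\pi f(\omega)$ [see also (10-127)]. If, therefore, $s(t)$ is a process with power spectrum $S(\omega) = 2\pi f(\omega)$, then its predictor $h(t)$ will equal the predictor of $z(t)$:
$$\hat{z}(t + \lambda) = \hat{E}\{z(t + \lambda)\,|\,z(t - \alpha),\ \alpha \ge 0\} = \int_0^{\infty} h(\alpha)e^{j\omega(t - \alpha)}\,d\alpha = e^{j\omega t}\int_0^{\infty} h(\alpha)e^{-j\omega\alpha}\,d\alpha = e^{j\omega t}H(\omega)$$
And since $z(t + \lambda) - \hat{z}(t + \lambda) \perp z(t - \tau)$ for $\tau \ge 0$, we conclude from the above that
$$E\{[e^{j\omega(t + \lambda)} - e^{j\omega t}H(\omega)]e^{-j\omega(t - \tau)}\} = 0 \qquad \tau \ge 0$$
Hence
$$\int_{-\infty}^{\infty} f(\omega)[e^{j\omega(\tau + \lambda)} - e^{j\omega\tau}H(\omega)]\,d\omega = 0 \qquad \tau \ge 0$$
This yields (14-40) because the inverse transform of $f(\omega)e^{j\omega(\tau + \lambda)}$ equals $R(\tau + \lambda)$ and the inverse transform of $f(\omega)e^{j\omega\tau}H(\omega)$ equals the integral in (14-40).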
Predictable processes. We shall say that a process $s[n]$ is predictable if it equals its predictor:
$$s[n] = \sum_{k=1}^{\infty} h[k]s[n - k] \tag{14-49}$$
In this case [see (14-25)]
$$P = \frac{1}{2\pi}\int_{-\pi}^{\pi} |E(e^{j\omega})|^2 S(\omega)\,d\omega = 0 \tag{14-50}$$
Since $S(\omega) \ge 0$, the above integral is 0 if $S(\omega) \ne 0$ only in a region $R$ of the $\omega$ axis where $E(e^{j\omega}) = 0$. It can be shown that this region consists of a countable number of points $\omega_i$; the proof is based on the Paley-Wiener condition (12-9). From this it follows that
$$S(\omega) = 2\pi\sum_{i=1}^{m} \alpha_i\delta(\omega - \omega_i) \qquad E(e^{j\omega_i}) = 0 \tag{14-51}$$
Thus a process $s[n]$ is predictable if it is a sum of exponentials as in (12-9):
$$s[n] = \sum_{i=1}^{m} c_ie^{j\omega_in} \qquad E\{|c_i|^2\} = \alpha_i \tag{14-52}$$
We maintain that the converse is also true: If $s[n]$ is a sum of $m$ exponentials as in (14-52), then it is predictable and its predictor filter equals $1 - D(z)$ where
$$D(z) = (1 - e^{j\omega_1}z^{-1})\cdots(1 - e^{j\omega_m}z^{-1}) \tag{14-53}$$
Proof. In this case, $E(z) = D(z)$ and $E(e^{j\omega_i}) = 0$; hence $E(e^{j\omega})S(\omega) = 0$ because $E(e^{j\omega})\delta(\omega - \omega_i) = E(e^{j\omega_i})\delta(\omega - \omega_i) = 0$. From this it follows that $P = 0$.
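The converse statement can be illustrated on a single realization. The sketch below builds a sum of three complex exponentials with arbitrarily chosen frequencies and amplitudes (assumptions of this example), forms the predictor coefficients $h[k]$ from the expansion of $D(z)$ in (14-53), and checks that each sample is reproduced exactly from the $m$ preceding ones. The function names are hypothetical.

```python
import numpy as np

def exponential_predictor(omegas):
    """Coefficients h[1..m] of the predictor filter 1 - D(z), D(z) as in (14-53)."""
    d = np.poly(np.exp(1j * np.asarray(omegas)))   # D(z) = 1 + d1 z^-1 + ... + dm z^-m
    return -d[1:]

def predict(h, s, n):
    """Sum_{k=1}^{m} h[k] s[n-k], valid for n >= m."""
    m = len(h)
    return np.dot(h, s[n - 1::-1][:m])

omegas = [0.3, 1.1, 2.5]                 # assumed frequencies
c = [1.0 + 0.5j, -0.7j, 2.0]             # assumed amplitudes (one realization)
n = np.arange(50)
s = sum(ci * np.exp(1j * wi * n) for ci, wi in zip(c, omegas))
h = exponential_predictor(omegas)
errors = [abs(predict(h, s, k) - s[k]) for k in range(len(h), len(s))]
print(max(errors))
```

The largest prediction error is at the level of rounding error, consistent with $P = 0$.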
Note. The preceding result seems to be in conflict with the sampling expansion (11-138) of a BL process $s(t)$: This expansion shows that $s(t)$ is predictable in the sense that it can be approximated within an arbitrary error $\varepsilon$ by a linear sum involving only its past samples $s(nT_0)$. From this it follows that the digital process $s[n] = s(nT_0)$ is predictable in the same sense. Such an expansion, however, does not violate (14-50). It is only an approximation and its coefficients tend to $\infty$ as $\varepsilon \to 0$.
GENERAL PROCESSES AND WOLD'S DECOMPOSITION.† We show finally that an arbitrary process $s[n]$ can be written as a sum
$$s[n] = s_1[n] + s_2[n] \tag{14-54}$$
of a regular process $s_1[n]$ and a predictable process $s_2[n]$, that these processes are orthogonal, and that they have the same predictor filter. We thus reestablish constructively Wold's decomposition (12-89) in the context of MS estimation.
†A. Papoulis: "Predictable Processes and Wold's Decomposition: A Review," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, 1985.
FIGURE 14-7
As we know [see (14-24)], the error $\varepsilon[n]$ of the one-step estimate of $s[n]$ is a white-noise process. We form the estimator $s_1[n]$ of $s[n]$ in terms of $\varepsilon[n]$ and its past:
$$s_1[n] = \hat{E}\{s[n]\,|\,\varepsilon[n - k],\ k \ge 0\} = \sum_{k=0}^{\infty} w_k\varepsilon[n - k] \tag{14-55}$$
Thus $s_1[n]$ is the response of the system (Fig. 14-7)
$$W(z) = \sum_{k=0}^{\infty} w_kz^{-k}$$
to the input $\varepsilon[n]$. The difference $s_2[n] = s[n] - s_1[n]$ is the estimation error (Fig. 14-7). Clearly (orthogonality principle)
$$s_2[n] \perp \varepsilon[n - k] \qquad k \ge 0 \tag{14-56}$$
Note that if $s[n]$ is a regular process, then [see (14-32)] $\varepsilon[n] = l[0]i[n]$; in this case, $s_1[n] = s[n]$.
THEOREM. (a) The processes $s_1[n]$ and $s_2[n]$ are orthogonal:
$$s_1[n] \perp s_2[n - k] \qquad \text{all } k \tag{14-57}$$
(b) $s_1[n]$ is a regular process.
(c) $s_2[n]$ is a predictable process and its predictor filter is the sum in (14-19):
$$s_2[n] = \sum_{k=1}^{\infty} h[k]s_2[n - k] \tag{14-58}$$
Proof. (a) The process $\varepsilon[n]$ is orthogonal to $s[n - k]$ for every $k > 0$. Furthermore, $s_2[n - k]$ is a linear function of $s[n - k]$ and its past; hence $s_2[n - k] \perp \varepsilon[n]$ for $k > 0$. Combining with (14-56), we conclude that
$$s_2[n - k] \perp \varepsilon[n] \qquad \text{all } k \tag{14-59}$$
And since $s_1[n]$ depends linearly on $\varepsilon[n]$ and its past, (14-57) follows.
(b) The process $s_1[n]$ is the response of the system $W(z)$ to the white noise $\varepsilon[n]$. To prove that it is regular, it suffices to show that
$$\sum_{k=0}^{\infty} w_k^2 < \infty \tag{14-60}$$
From (14-54) and (14-55) it follows that
$$E\{s^2[n]\} = E\{s_1^2[n]\} + E\{s_2^2[n]\} \ge E\{s_1^2[n]\} = P\sum_{k=0}^{\infty} w_k^2$$
where $P$ is the power of $\varepsilon[n]$ [see (14-24)]. This yields (14-60) because $E\{s^2[n]\} = R(0) < \infty$.
(c) To prove (14-58), it suffices to show that the difference
$$z[n] = s_2[n] - \sum_{k=1}^{\infty} h[k]s_2[n - k]$$
equals 0. From (14-59) it follows that $z[n] \perp \varepsilon[n - k]$ for all $k$. But $z[n]$ is the response of the system $1 - H(z) = E(z)$ to the input $s_2[n] = s[n] - s_1[n]$; hence (see Fig. 14-8)
$$z[n] = \varepsilon[n] - s_1[n] + \sum_{k=1}^{\infty} h[k]s_1[n - k] \tag{14-61}$$
This shows that $z[n]$ is a linear function of $\varepsilon[n]$ and its past. And since it is also orthogonal to $\varepsilon[n]$ and its past, we conclude that $z[n] = 0$.
Note finally that [see (14-61)]
$$s_1[n] - \sum_{k=1}^{\infty} h[k]s_1[n - k] = \varepsilon[n] \perp s_1[n - m] \qquad m \ge 1$$
Hence the above sum is the predictor of $s_1[n]$. We thus conclude that the sum $H(z)$ in (14-20) is the predictor filter of the processes $s[n]$, $s_1[n]$, and $s_2[n]$.
FIR PREDICTORS. We shall find the estimate $\hat{s}_N[n]$ of a process $s[n]$ in terms of its $N$ most recent past values:
$$\hat{s}_N[n] = \hat{E}\{s[n]\,|\,s[n - k],\ 1 \le k \le N\} = \sum_{k=1}^{N} a_k^N s[n - k] \tag{14-62}$$
This estimate will be called the forward predictor of order $N$. The superscript in $a_k^N$ identifies the order. The process $\hat{s}_N[n]$ is the response of the forward predictor filter
$$\hat{H}_N(z) = \sum_{k=1}^{N} a_k^N z^{-k} \tag{14-63}$$
to the input $s[n]$. Our object is to determine the constants $a_k^N$ so as to minimize the MS value
$$P_N = E\{\hat{\varepsilon}_N^2[n]\} = E\{(s[n] - \hat{s}_N[n])s[n]\} \tag{14-64}$$
of the forward prediction error $\hat{\varepsilon}_N[n] = s[n] - \hat{s}_N[n]$.
The Yule-Walker equations. From the orthogonality principle it follows that
$$E\left\{\left(s[n] - \sum_{k=1}^{N} a_k^N s[n - k]\right)s[n - m]\right\} = 0 \qquad 1 \le m \le N$$
This yields the system
$$R[m] - \sum_{k=1}^{N} a_k^N R[m - k] = 0 \qquad 1 \le m \le N \tag{14-65}$$
Solving, we obtain the coefficients $a_k^N$ of the predictor filter $\hat{H}_N(z)$. The resulting MS error equals [see (13-83)]
$$P_N = R[0] - \sum_{k=1}^{N} a_k^N R[k] = \frac{\Delta_{N+1}}{\Delta_N} \tag{14-66}$$
In Fig. 14-8 we show the ladder realization of $\hat{H}_N(z)$ and the forward error filter $\hat{E}_N(z) = 1 - \hat{H}_N(z)$.
As we have shown in Sec. 13-3, the error filter can be realized by the lattice structure of Fig. 14-9. In that figure, the input is $s[n]$ and the upper output $\hat{e}_N[n]$. The lower output $\check{e}_N[n]$ is the backward prediction error defined as follows: The processes $s[n]$ and $s[-n]$ have the same autocorrelation; hence their predictor filters are identical. From this it follows that the backward predictor $\check{s}_N[n]$, that is, the predictor of $s[n]$ in terms of its $N$ most recent future values, equals
$$\check{s}_N[n] = \hat{E}\{s[n] \mid s[n+k],\ 1 \le k \le N\} = \sum_{k=1}^{N} a_k^N s[n+k]$$
The backward error
$$\check{e}_N[n] = s[n-N] - \check{s}_N[n-N]$$
is the response of the filter
$$\check{E}_N(z) = z^{-N}\bigl(1 - a_1^N z - \cdots - a_N^N z^N\bigr) = z^{-N}\hat{E}_N(1/z)$$
with input $s[n]$. From this and (13-94) it follows that the lower output of the lattice of Fig. 14-9 is $\check{e}_N[n]$.
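The equivalence between the ladder and the lattice realizations can be illustrated with a short sketch. It assumes the standard section equations $\hat{e}_N[n] = \hat{e}_{N-1}[n] - K_N\check{e}_{N-1}[n-1]$ and $\check{e}_N[n] = \check{e}_{N-1}[n-1] - K_N\hat{e}_{N-1}[n]$ with all delay elements initially at rest; the helper names `step_up`, `lattice`, and `ladder` are hypothetical.

```python
def step_up(K):
    """Predictor coefficients a_1^N..a_N^N from reflection coefficients K_1..K_N."""
    a = []
    for n, Kn in enumerate(K, start=1):
        # a_k^n = a_k^{n-1} - K_n a_{n-k}^{n-1} for k < n, and a_n^n = K_n
        a = [a[k - 1] - Kn * a[n - k - 1] for k in range(1, n)] + [Kn]
    return a

def lattice(s, K):
    """Return the (forward, backward) error sequences of order len(K).

    The delay elements start at 0, i.e. s[n] is taken as 0 for n < 0.
    """
    fwd = list(s)   # order-0 forward error
    bwd = list(s)   # order-0 backward error
    for Kn in K:
        delayed = [0.0] + bwd[:-1]   # backward error delayed by one step
        new_fwd = [f - Kn * d for f, d in zip(fwd, delayed)]
        new_bwd = [d - Kn * f for f, d in zip(fwd, delayed)]
        fwd, bwd = new_fwd, new_bwd
    return fwd, bwd

def ladder(s, a):
    """Tapped-delay-line forward error s[n] - sum_k a_k s[n-k]."""
    return [s[n] - sum(a[k - 1] * s[n - k]
                       for k in range(1, len(a) + 1) if n - k >= 0)
            for n in range(len(s))]
```

Feeding the same sequence to `lattice(s, K)` and to `ladder(s, step_up(K))` should give the same forward error, sample by sample.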
In Sec. 13-3, we used the ladder-lattice equivalence to simplify the solution of the Yule-Walker equations. We summarize next the main results in the context of the prediction problem. We note that the lattice realization also has the following advantage. Suppose that we have a predictor of order $N$ and we wish to find the predictor of order $N + 1$. In the ladder realization, we must find a new set of $N + 1$ coefficients $a_k^{N+1}$. In the lattice realization, we need only the new reflection coefficient $K_{N+1}$; the first $N$ reflection coefficients $K_k$ do not change.
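Levinson's algorithm. We shall determine the constants $a_k^N$, $K_N$, and $P_N$ recursively. This involves the following steps: Start with
$$a_1^1 = K_1 = R[1]/R[0] \qquad P_1 = (1 - K_1^2)R[0]$$
Assume that the $N + 1$ constants $a_k^{N-1}$, $K_{N-1}$, and $P_{N-1}$ are known.
Find $K_N$ and $P_N$ from (13-107) and (13-108):
$$P_{N-1}K_N = R[N] - \sum_{k=1}^{N-1} a_k^{N-1} R[N-k] \qquad P_N = (1 - K_N^2)P_{N-1} \tag{14-67}$$
Find $a_k^N$ from (13-97):
$$a_N^N = K_N \qquad a_k^N = a_k^{N-1} - K_N a_{N-k}^{N-1} \qquad 1 \le k \le N-1 \tag{14-68}$$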
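The recursion can be sketched in a few lines of code. This is a plain-Python illustration assuming a real process and the update $a_k^N = a_k^{N-1} - K_N a_{N-k}^{N-1}$, $a_N^N = K_N$; the function name `levinson` and its return layout are choices made here for illustration.

```python
def levinson(R, N):
    """Levinson recursion in the form of (14-67)-(14-68).

    R[m], m = 0..N, is the autocorrelation of a real process.
    Returns (a, K, P) with a[k-1] = a_k^N, K[j-1] = K_j, and P[j] = P_j,
    where P[0] = R[0] is the error of the order-0 predictor.
    """
    a = []                 # coefficients of the current order
    K, P = [], [R[0]]
    for n in range(1, N + 1):
        # P_{n-1} K_n = R[n] - sum_{k=1}^{n-1} a_k^{n-1} R[n-k]
        acc = R[n] - sum(a[k - 1] * R[n - k] for k in range(1, n))
        Kn = acc / P[-1]
        # a_k^n = a_k^{n-1} - K_n a_{n-k}^{n-1} for k < n, and a_n^n = K_n
        a = [a[k - 1] - Kn * a[n - k - 1] for k in range(1, n)] + [Kn]
        K.append(Kn)
        P.append((1 - Kn ** 2) * P[-1])
    return a, K, P
```

Each step reuses the coefficients of the previous order, so the total work grows as $N^2$ rather than the $N^3$ of a general linear solve.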
In Levinson's algorithm, the order $N$ of the iteration is finite but it can continue indefinitely. We shall examine the properties of the predictor and of the MS error $P_N$ as $N \to \infty$. It is obvious that $P_N$ is a nonincreasing sequence of positive numbers; hence it tends to a nonnegative limit:
$$P_1 \ge P_2 \ge \cdots \ge P_N \xrightarrow[N \to \infty]{} P \ge 0 \tag{14-69}$$
As we have shown in Sec. 12-3, the zeros $z_i$ of the error filter
$$\hat{E}_N(z) = 1 - \sum_{k=1}^{N} a_k^N z^{-k}$$
are either all inside the unit circle or they are all on the unit circle:
If $P_N > 0$, then $|K_k| < 1$ for every $k \le N$ and $|z_i| < 1$ for every $i$ [see (13-99)].
If $P_{N-1} > 0$ and $P_N = 0$, then $|K_k| < 1$ for every $k \le N - 1$, $|K_N| = 1$, and $|z_i| = 1$ for every $i$ [see (13-101)]. In this case, the process $s[n]$ is predictable and its spectrum consists of lines.
If $P > 0$, then $|z_i| \le 1$ for every $i$ [see (14-26)]. In this case, the predictor $\hat{s}_N[n]$ of $s[n]$ tends to the Wiener predictor $\hat{s}[n]$ as in (14-19). From this and (14-34) it follows that
$$P = \exp\Bigl\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \ln S(\omega)\, d\omega\Bigr\} = l^2[0] = \lim_{N \to \infty} \frac{\Delta_{N+1}}{\Delta_N} \tag{14-70}$$
This shows the connection between the LMS error $P$ of the prediction of $s[n]$ in terms of its entire past, the power spectrum $S(\omega)$ of $s[n]$, the initial value $l[0]$ of the delta response $l[n]$ of its innovations filter, and the correlation determinant $\Delta_N$.
Suppose, finally, that $P_{M-1} > P_M$ and
$$P_M = P_{M+1} = \cdots = P \tag{14-71}$$
In this case, $K_k = 0$ for $|k| > M$; hence the algorithm terminates at the $M$th step. From this it follows that the $M$th order predictor $\hat{s}_M[n]$ of $s[n]$ equals its Wiener predictor:
$$\hat{s}_M[n] = \hat{E}\{s[n] \mid s[n-k],\ 1 \le k \le M\} = \hat{E}\{s[n] \mid s[n-k],\ k \ge 1\}$$
In other words, the process $s[n]$ is wide-sense Markoff of order $M$. This leads to the conclusion that the prediction error $\hat{e}_M[n] = s[n] - \hat{s}_M[n]$ is white noise with average power $P$ [see (14-24)]:
$$s[n] - \sum_{k=1}^{M} a_k^M s[n-k] = \hat{e}_M[n] \qquad E\{\hat{e}_M^2[n]\} = P$$
and it shows that $s[n]$ is an AR process. Conversely, if $s[n]$ is AR, then it is also wide-sense Markoff.
Autoregressive processes and maximum entropy. Suppose that $s[n]$ is an AR process of order $M$ with autocorrelation $R[m]$ and $\bar{s}[n]$ is a general process with autocorrelation $\bar{R}[m]$ such that
$$\bar{R}[m] = R[m] \qquad \text{for } |m| \le M$$
The predictors of these processes of order $M$ are identical because they depend on the values of $R[m]$ for $|m| \le M$ only. From this it follows that the corresponding prediction errors $P_M$ and $\bar{P}_M$ are equal. As we have noted, $P_M = P$ for the AR process $s[n]$ and $\bar{P}_M \ge \bar{P}$ for the general process $\bar{s}[n]$.
Consider now the class $C_M$ of processes with identical autocorrelations (data) for $|m| \le M$. Each $R[m]$ is a p.d. extrapolation of the given data. We have shown in Sec. 13-3 that the extrapolating sequence obtained with the maximum entropy (ME) method is the autocorrelation of an AR process [see (13-141)]. This leads to the following relationship between MS estimation and maximum entropy: The ME extrapolation is the autocorrelation of a process $s[n]$ in the class $C_M$, the predictor of which maximizes the minimum MS error $P$. In this sense, the ME method maximizes our uncertainty about the values of $R[m]$ for $|m| > M$.
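As an illustration of the ME extrapolation, the sketch below continues a given set of values $R[0], \dots, R[M]$ with the AR recursion $R[m] = \sum_{k=1}^{M} a_k^M R[m-k]$ for $m > M$, which is the standard property of the autocorrelation of an AR process of order $M$. The function name `me_extrapolate` and the sample data are assumptions made for this example.

```python
import numpy as np

def me_extrapolate(R, extra):
    """Extend R[0..M] by `extra` lags using the order-M AR recursion."""
    R = [float(r) for r in R]
    M = len(R) - 1
    # order-M predictor coefficients from the Yule-Walker system (14-65)
    lags = np.abs(np.subtract.outer(np.arange(M), np.arange(M)))
    a = np.linalg.solve(np.asarray(R)[lags], np.asarray(R[1:]))
    for _ in range(extra):
        m = len(R)
        # R[m] = sum_{k=1}^{M} a_k R[m-k] for m > M
        R.append(float(sum(a[k - 1] * R[m - k] for k in range(1, M + 1))))
    return R

print(me_extrapolate([1.0, 0.8], 3))
```

With the two data values $R[0] = 1$ and $R[1] = 0.8$, the extrapolated sequence is the geometric sequence $0.8^{m}$.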
Causal Data
We wish to estimate the present value of a regular process $s[n]$ in terms of its finite past, starting from some origin. The data are now available from 0 to $n - 1$ and the desired estimate is given by
$$\hat{s}_n[n] = \hat{E}\{s[n] \mid s[n-k],\ 1 \le k \le n\} = \sum_{k=1}^{n} a_k^n s[n-k] \tag{14-72}$$
Unlike the fixed length $N$ of the FIR predictor $\hat{s}_N[n]$ considered in (14-62), the length $n$ of this estimate is not constant. Furthermore, the values $a_k^n$ of the coefficients of the filter specified by (14-72) depend on $n$. Thus the estimator of the process $s[n]$ in terms of its causal past is a linear time-varying filter. If it is realized by a tapped-delay line as in Fig. 14-8, the number of the taps increases and the values of the weights change as $n$ increases.
The coefficients $a_k^n$ of $\hat{s}_n[n]$ can be determined recursively from Levinson's algorithm where now $N = n$. Introducing the backward estimate $\check{s}_n[0]$ of $s[0]$ in terms of its $n$ most recent future values, we conclude from (13-92) that
$$\hat{s}_n[n] = \hat{s}_{n-1}[n] + K_n\bigl(s[0] - \check{s}_{n-1}[0]\bigr)$$
$$\check{s}_n[0] = \check{s}_{n-1}[0] + K_n\bigl(s[n] - \hat{s}_{n-1}[n]\bigr) \tag{14-73}$$
In Fig. 14-10, we show the normalized lattice realization of the error filter $\hat{E}_n(z)$ where we use as upper output the process
$$i[n] = \frac{1}{\sqrt{P_n}}\,\hat{e}_n[n] \qquad E\{i^2[n]\} = 1 \tag{14-74}$$
The filter is formed by switching "on" successively a new lattice section starting from the left. This filter is again time-varying; however, unlike the tapped-delay line realization, the elements of each section remain unchanged as $n$ increases. We should point out that whereas $\hat{e}_k[n]$ is the value of the upper response of the $k$th section at time $n$, the process $i[n]$ does not appear at a fixed position. It is the output of the last section that is switched "on" and as $n$ increases, the point where $i[n]$ is observed changes.
We conclude with the observation that if the process $s[n]$ is AR of order $M$ [see (13-81)], then the lattice stops increasing for $n > M$, realizing, thus, the time-invariant system $\hat{E}_M(z)/\sqrt{P_M}$. The corresponding inverse lattice (see Fig. 13-15) realizes the all-pole system
$$\frac{\sqrt{P_M}}{\hat{E}_M(z)}$$
We shall now show that the output $i[n]$ of the normalized lattice is white noise
$$R_{ii}[m] = \delta[m] \tag{14-75}$$
Indeed, as we know, $\hat{e}_n[n] \perp s[n-k]$ for $1 \le k \le n$. Furthermore, $\hat{e}_{n-r}[n-r]$ depends linearly only on $s[n-r]$ and its past values. Hence
$$\hat{e}_n[n] \perp \hat{e}_{n-r}[n-r] \tag{14-76}$$
This yields (14-75) because $P_n = E\{\hat{e}_n^2[n]\}$.
Note In a lattice of fixed length, the output $\hat{e}_N[n]$ is not white noise and it is not orthogonal to $\hat{e}_{N-1}[n]$. However for a specific $n$, the random variables $\hat{e}_N[n]$ and $\hat{e}_{N-1}[n-1]$ are orthogonal.
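KALMAN INNOVATIONS†. The output $i[n]$ of the time-varying lattice of Fig. 14-10 is an orthonormal process that depends linearly on $s[n-k]$. Denoting by $\gamma_k^n$ the response of the lattice at time $n$ to the input $s[n] = \delta[n-k]$, we obtain
$$i[0] = \gamma_0^0 s[0]$$
$$i[1] = \gamma_0^1 s[0] + \gamma_1^1 s[1] \tag{14-77}$$
$$i[n] = \gamma_0^n s[0] + \cdots + \gamma_k^n s[k] + \cdots + \gamma_n^n s[n]$$
or in vector form
$$\mathbf{I}_{n+1} = \mathbf{S}_{n+1}\Gamma_{n+1} \qquad \Gamma_{n+1} = \begin{bmatrix} \gamma_0^0 & \gamma_0^1 & \cdots & \gamma_0^n \\ 0 & \gamma_1^1 & \cdots & \gamma_1^n \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \gamma_n^n \end{bmatrix}$$
where $\mathbf{S}_{n+1}$ and $\mathbf{I}_{n+1}$ are row vectors with components
$$s[0], \ldots, s[n] \qquad \text{and} \qquad i[0], \ldots, i[n]$$
respectively.
From the above it follows that if
$$s[n] = \delta[n-k] \qquad \text{then} \qquad i[n] = \gamma_k^n \qquad n \ge k$$
This shows that to determine the delta response of the lattice of Fig. 14-10, we use as input the delta sequence $\delta[n-k]$ and we observe the moving output $i[n]$ for $n \ge k$.
The elements $\gamma_k^n$ of the triangular matrix $\Gamma_{n+1}$ can be expressed in terms of the weights $a_k^n$ of the causal predictor $\hat{s}_n[n]$. Since
$$\hat{e}_n[n] = s[n] - \hat{s}_n[n] = \sqrt{P_n}\, i[n]$$
it follows from (14-72) that
$$\gamma_n^n = \frac{1}{\sqrt{P_n}} \qquad \gamma_{n-k}^n = \frac{-1}{\sqrt{P_n}}\, a_k^n \qquad k \ge 1$$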
The inverse of the lattice of Fig. 14-10 is obtained by reversing the flow direction of the upper line and the sign of the upward weights $-K_n$ as in Fig. 13-15. The turn-on switches close again in succession starting from the left, and the input $i[n]$ is applied at the terminal of the section that is connected last. The output at $A$ is thus given by
$$s[0] = l_0^0\, i[0]$$
$$s[1] = l_0^1\, i[0] + l_1^1\, i[1] \tag{14-78}$$
$$s[n] = l_0^n\, i[0] + \cdots + l_n^n\, i[n] \qquad L_n = \Gamma_n^{-1}$$
From this it follows that if
$$i[n] = \delta[n-k] \qquad \text{then} \qquad s[n] = l_k^n \qquad n \ge k$$
Thus, to determine the delta response $l_k^n$ of the inverse lattice, we use as moving input the delta sequence $\delta[n-k]$ and we observe the left output $s[n]$ for $n \ge k$.
†T. Kailath, A. Vieira, and M. Morf: "Inverses of Toeplitz Operators, Innovations, and Orthogonal Polynomials," SIAM Review, vol. 20, no. 1, 1978.
From the preceding discussion it follows that the random vector $\mathbf{S}_n$ is linearly equivalent to the orthonormal vector $\mathbf{I}_n$. Thus Eqs. (14-77) and (14-78) correspond to the Gram-Schmidt orthonormalization equations (8-88) and (8-91) of Sec. 8-3. Applying the terminology of Sec. 12-1 to causal signals, we shall call the process $i[n]$ the Kalman innovations of $s[n]$ and the lattice filter and its inverse Kalman whitening and Kalman innovations filters respectively. These filters are time-varying and their transition matrices equal $\Gamma_n$ and $L_n$ respectively. Their elements can be expressed in terms of the parameters $K_n$ and $P_n$ of Levinson's algorithm because these parameters specify completely the filters.
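Cholesky factorization We maintain that the correlation matrix $R_n$ and its inverse can be written as products
$$R_n = L_n^t L_n \qquad R_n^{-1} = \Gamma_n\Gamma_n^t \tag{14-79}$$
where $\Gamma_n$ and $L_n$ are the triangular matrices introduced earlier. Indeed, from the orthonormality of $\mathbf{I}_n$ and the definition of $R_n$, it follows that
$$E\{\mathbf{I}_n^t\mathbf{I}_n\} = 1_n \qquad E\{\mathbf{S}_n^t\mathbf{S}_n\} = R_n$$
where $1_n$ is the identity matrix. Since $\mathbf{I}_n = \mathbf{S}_n\Gamma_n$ and $\mathbf{S}_n = \mathbf{I}_n L_n$, the above yields
$$\Gamma_n^t R_n \Gamma_n = 1_n \qquad L_n^t 1_n L_n = R_n$$
and (14-79) results.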
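The factorization can be checked numerically. The sketch below assumes the convention that $L_n$ and $\Gamma_n$ are upper triangular with positive diagonal (as implied by row vectors $\mathbf{S}_n = \mathbf{I}_n L_n$), and uses the illustrative autocorrelation $R[m] = 0.8^{|m|}$; `numpy.linalg.cholesky` returns the lower-triangular factor, so it is transposed here.

```python
import numpy as np

R_seq = 0.8 ** np.arange(4)                     # illustrative R[m]
n = len(R_seq)
Rn = R_seq[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]

# numpy gives lower-triangular C with Rn = C C^t; L_n = C^t is then the
# upper-triangular factor in Rn = L_n^t L_n
C = np.linalg.cholesky(Rn)
L = C.T
Gamma = np.linalg.inv(L)                        # Gamma_n = L_n^{-1}

# diagonal of Gamma is 1/sqrt(P_k), so this recovers R[0], P_1, P_2, ...
P = 1.0 / np.diag(Gamma) ** 2
print(P)
```

The diagonal of $\Gamma_n$ carries the normalization $1/\sqrt{P_k}$ of (14-74), so the MS errors of all orders can be read off the factor.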
Autocorrelation as lattice response. We shall determine the autocorrelation $R[m]$ of the process $s[n]$ in terms of the Levinson parameters $K_N$ and $P_N$. For this purpose, we form a lattice of order $N_0$ and we denote by $\hat{q}_N[m]$ and $\check{q}_N[m]$ respectively its upper and lower responses (Fig. 14-11a) to the input $R[m]$. As we see from the figure
$$\hat{q}_{N-1}[m] = \hat{q}_N[m] + K_N\,\check{q}_{N-1}[m-1] \tag{14-80a}$$
$$\check{q}_N[m] = \check{q}_{N-1}[m-1] - K_N\,\hat{q}_{N-1}[m] \tag{14-80b}$$
$$\hat{q}_0[m] = \check{q}_0[m] = R[m] \tag{14-80c}$$
FIGURE 14-11
Using the above, we shall show that $R[m]$ can be determined as the response of the inverse lattice† of Fig. 14-11b provided that the following boundary and initial conditions are satisfied: The input to the system (point $B$) is identically 0:
$$\hat{q}_{N_0}[m] = 0 \qquad \text{all } m \tag{14-81}$$
The initial conditions of all delay elements except the first are 0:
$$\check{q}_N[0] = 0 \qquad N > 0 \tag{14-82}$$
The first delay element is connected to the system at $m = 0$ and its initial condition equals $R[0]$:
$$\check{q}_0[0] = R[0] \tag{14-83}$$
From the above and (14-81) it follows that
$$\check{q}_N[1] = 0 \qquad N > 1$$
We maintain that under the stated conditions, the left output of the inverse lattice (point $A$) equals $R[m]$ and the right output of the $m$th section equals the MS error $P_m$:
$$\hat{q}_0[m] = R[m] \qquad \check{q}_m[m] = P_m \tag{14-84}$$
†E. A. Robinson and S. Treitel: "Maximum Entropy and the Relationship of the Partial Autocorrelation to the Reflection Coefficients of a Layered System," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-28, no. 2, 1980.
Proof. The proof is based on the fact that the responses of the lattice of Fig. 14-11a satisfy the equations (see Prob. 14-24)
$$\hat{q}_N[m] = \check{q}_N[m] = 0 \qquad 1 \le m \le N - 1 \tag{14-85}$$
$$\hat{q}_N[0] = \check{q}_N[N] = P_N \tag{14-86}$$
From (14-80) it follows that, if we know $\hat{q}_N[m]$ and $\check{q}_{N-1}[m-1]$, then we can find $\hat{q}_{N-1}[m]$ and $\check{q}_N[m]$. By a simple induction, this leads to the conclusion that if $\hat{q}_{N_0}[m]$ is specified for every $m$ (boundary conditions) and $\check{q}_N[1]$ is specified for every $N$ (initial conditions), then all responses of the lattice are determined uniquely. The two systems of Fig. 14-11 satisfy the same equations (14-80) and, as we noted, they have identical initial and boundary conditions. Hence all their responses are identical. This yields (14-84).
14-3 FILTERING AND PREDICTION
In this section, we consider the problem of estimating the future value $s(t + \lambda)$ of a stochastic process $s(t)$ (signal) in terms of the present and past values of a regular process $x(t)$ (signal plus noise)
$$\hat{s}(t + \lambda) = \hat{E}\{s(t + \lambda) \mid x(t - \tau),\ \tau \ge 0\} = \int_0^{\infty} h_x(\alpha)\,x(t - \alpha)\,d\alpha \tag{14-87}$$
Thus $\hat{s}(t + \lambda)$ is the output of a linear time-invariant causal system $H_x(s)$ with input $x(t)$. To determine $H_x(s)$, we use the orthogonality principle
$$E\Bigl\{\Bigl[s(t + \lambda) - \int_0^{\infty} h_x(\alpha)\,x(t - \alpha)\,d\alpha\Bigr]\,x(t - \tau)\Bigr\} = 0 \qquad \tau \ge 0$$
This yields the Wiener-Hopf equation
$$R_{sx}(\tau + \lambda) = \int_0^{\infty} h_x(\alpha)\,R_{xx}(\tau - \alpha)\,d\alpha \qquad \tau \ge 0 \tag{14-88}$$
The solution $h_x(t)$ of (14-88) is the impulse response of the prediction and filtering system known as the Wiener filter. If $x(t) = s(t)$, then $h_x(t)$ is a pure predictor as in (14-39). If $\lambda = 0$, then $h_x(t)$ is a pure filter.
To solve (14-88), we express $x(t)$ in terms of its innovations $i_x(t)$ (Fig. 14-12)
$$x(t) = \int_0^{\infty} l_x(\alpha)\,i_x(t - \alpha)\,d\alpha \qquad R_{ii}(\tau) = \delta(\tau) \tag{14-89}$$
where $l_x(t)$ is the impulse response of the innovations filter $L_x(s)$ obtained by factoring the spectrum of $x(t)$ as in (12-3):
$$S_{xx}(s) = L_x(s)\,L_x(-s) \tag{14-90}$$
As we know, the processes $i_x(t)$ and $x(t)$ are linearly equivalent; hence the estimate $\hat{s}(t + \lambda)$ can be expressed as the output of a causal filter $H_{i_x}(s)$ with input $i_x(t)$:
$$\hat{s}(t + \lambda) = \int_0^{\infty} h_{i_x}(\alpha)\,i_x(t - \alpha)\,d\alpha$$
FIGURE 14-12
To determine $h_{i_x}(t)$, we use the orthogonality principle
$$E\Bigl\{\Bigl[s(t + \lambda) - \int_0^{\infty} h_{i_x}(\alpha)\,i_x(t - \alpha)\,d\alpha\Bigr]\,i_x(t - \tau)\Bigr\} = 0 \qquad \tau \ge 0 \tag{14-91}$$
Since $i_x(t)$ is white noise, the above yields
$$R_{s i_x}(\tau + \lambda) = \int_0^{\infty} h_{i_x}(\alpha)\,\delta(\tau - \alpha)\,d\alpha = h_{i_x}(\tau) \qquad \tau \ge 0 \tag{14-92}$$
This determines $h_{i_x}(\tau)$ for all $\tau$ because $h_{i_x}(\tau) = 0$ for $\tau < 0$:
$$h_{i_x}(\tau) = R_{s i_x}(\tau + \lambda)\,U(\tau) \tag{14-93}$$
In the above, $R_{s i_x}(\tau)$ is the cross-correlation between the signal $s(t)$ and the process $i_x(t)$. The function $R_{s i_x}(\tau)$ can be expressed in terms of the cross-correlation $R_{sx}(\tau)$ between $s(t)$ and $x(t)$. Indeed, since $i_x(t)$ is the output of the whitening filter $\Gamma_x(s)$ with input $x(t)$, we can show as in (10-118) and (10-157) that
$$S_{s i_x}(s) = S_{sx}(s)\,\Gamma_x(-s) \tag{14-94}$$
Thus, since $S_{sx}(s)$ is assumed known, (14-94) yields $R_{s i_x}(\tau)$. Shifting to the left and truncating as in (14-93), we obtain $h_{i_x}(\tau)$.
To complete the specification of $H_x(s)$, we multiply the transform $H_{i_x}(s)$ of the function $h_{i_x}(t)$ so obtained with $\Gamma_x(s)$ (see Fig. 14-12)
$$H_x(s) = H_{i_x}(s)\,\Gamma_x(s) \tag{14-95}$$
The function $H_{i_x}(s)$ can be determined directly from (14-94): As we know (shifting theorem) the transform of $R_{s i_x}(\tau + \lambda)$ equals
$$S_\lambda(s) = S_{s i_x}(s)\,e^{\lambda s} = S_{sx}(s)\,\Gamma_x(-s)\,e^{\lambda s} \tag{14-96}$$
To find $H_{i_x}(s)$, it suffices to write $S_\lambda(s)$ as a sum
$$S_\lambda(s) = S_\lambda^+(s) + S_\lambda^-(s) \tag{14-97}$$
where $S_\lambda^+(s)$ is analytic in the right-hand $s$ plane and $S_\lambda^-(s)$ is analytic in the left-hand $s$ plane. Since the inverse transforms of the functions $S_\lambda^+(s)$ and $S_\lambda^-(s)$ equal $R_{s i_x}(\tau + \lambda)U(\tau)$ and $R_{s i_x}(\tau + \lambda)U(-\tau)$ respectively, we conclude from (14-93) that (see also the Note following Example 14-7)
$$H_{i_x}(s) = S_\lambda^+(s) \tag{14-98}$$
To determine the system function $H_x(s)$ of the Wiener filter, proceed, thus, as follows:
1. Factor $S_{xx}(s)$ as in (14-90) and set $\Gamma_x(s) = 1/L_x(s)$.
2. Evaluate $S_{s i_x}(s)$ from (14-94) and form the function $S_\lambda(s)$ using (14-96).
3. Decompose $S_\lambda(s)$ as in (14-97) and form the function $H_{i_x}(s)$ using (14-98).
4. Determine $H_x(s)$ from (14-95).
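If the function $S_\lambda(s)$ is rational, then the decomposition (14-97) can be accomplished by expanding $S_{s i_x}(s)$ into partial fractions. Assuming that $S_{s i_x}(s)$ is a proper fraction with simple poles, we obtain
$$S_{s i_x}(s) = \sum_i \frac{a_i}{s - s_i} + \sum_k \frac{b_k}{s - z_k} \qquad \operatorname{Re} s_i < 0 \qquad \operatorname{Re} z_k > 0 \tag{14-99}$$
The inverse of the second sum is 0 for $\tau > 0$. If, therefore, it is shifted to the left, it will remain 0 for $\tau > 0$. This shows that only the first sum will contribute to the term $R_{s i_x}(\tau + \lambda)U(\tau)$. In other words,
$$R_{s i_x}(\tau + \lambda)\,U(\tau) = \bigl[a_1 e^{s_1(\tau + \lambda)} + \cdots + a_n e^{s_n(\tau + \lambda)}\bigr]\,U(\tau)$$
The transform of the above yields
$$S_\lambda^+(s) = \frac{a_1 e^{s_1\lambda}}{s - s_1} + \cdots + \frac{a_n e^{s_n\lambda}}{s - s_n} \tag{14-100}$$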
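Example 14-7. Suppose that $x(t) = s(t) + \nu(t)$ and
$$S_{ss}(\omega) = \frac{N_0}{\alpha^2 + \omega^2} \qquad S_{\nu\nu}(\omega) = N \qquad S_{s\nu}(\omega) = 0 \tag{14-101}$$
as in Example 14-1. In this case, $S_{sx}(s) = S_{ss}(s)$ and
$$S_{xx}(s) = \frac{N_0}{\alpha^2 - s^2} + N = N\,\frac{\beta^2 - s^2}{\alpha^2 - s^2} \qquad \beta^2 = \alpha^2 + \frac{N_0}{N}$$
Hence
$$L_x(s) = \sqrt{N}\,\frac{s + \beta}{s + \alpha} \qquad \Gamma_x(-s) = \frac{1}{\sqrt{N}}\,\frac{\alpha - s}{\beta - s} \tag{14-102}$$
Inserting into (14-94) and expanding into partial fractions, we obtain
$$S_{s i_x}(s) = \frac{N_0}{\alpha^2 - s^2}\,\frac{\alpha - s}{\sqrt{N}(\beta - s)} = \frac{A}{s + \alpha} - \frac{A}{s - \beta} \qquad A = \frac{N_0}{(\alpha + \beta)\sqrt{N}}$$
and with $s_1 = -\alpha$, (14-100) yields
$$S_\lambda^+(s) = \frac{A\,e^{-\alpha\lambda}}{s + \alpha}$$
Hence
$$H_x(s) = S_\lambda^+(s)\,\Gamma_x(s) = \frac{\beta - \alpha}{s + \beta}\,e^{-\alpha\lambda} \tag{14-103}$$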
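The result of Example 14-7 can be verified against the Wiener-Hopf equation (14-88). The sketch below assumes $h_x(t) = (\beta - \alpha)e^{-\alpha\lambda}e^{-\beta t}U(t)$ with $\beta^2 = \alpha^2 + N_0/N$ and $R_{ss}(\tau) = (N_0/2\alpha)e^{-\alpha|\tau|}$, and evaluates the integral with a simple trapezoidal sum; the function names, the step size, and the numerical values are illustrative assumptions.

```python
import math

def wiener_example(alpha, N0, N, lam):
    """Return (gain, pole) so that h_x(t) = gain * exp(-pole * t) U(t)."""
    beta = math.sqrt(alpha ** 2 + N0 / N)
    gain = (beta - alpha) * math.exp(-alpha * lam)
    return gain, beta

def wiener_hopf_residual(alpha, N0, N, lam, tau, step=1e-3, upper=40.0):
    """Left side minus right side of (14-88) at one tau > 0."""
    gain, beta = wiener_example(alpha, N0, N, lam)
    Rss = lambda t: N0 / (2 * alpha) * math.exp(-alpha * abs(t))
    h = lambda t: gain * math.exp(-beta * t)
    n = round(upper / step)
    vals = [h(k * step) * Rss(tau - k * step) for k in range(n + 1)]
    integral = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    # R_xx = R_ss + N*delta, so the white-noise part contributes N*h(tau)
    return Rss(tau + lam) - (integral + N * h(tau))

print(wiener_hopf_residual(1.0, 3.0, 1.0, 0.5, tau=1.0))
```

The residual should vanish up to the discretization error of the trapezoidal sum, which is why the comparison is made with a small tolerance rather than exactly.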
Note In the decomposition (14-97) of $S_\lambda(s)$, the functions $S_\lambda^+(s)$ and $S_\lambda^-(s)$ are unique within an additive constant. This causes an ambiguity in the determination of $h_{i_x}(t)$. The ambiguity is removed if we impose the condition that
$$S_\lambda^-(\infty) = 0$$
In the pure filtering case ($\lambda = 0$), the resulting $h_x(t)$ might contain impulses at the origin. This is acceptable because, by assumption, the estimate $\hat{s}(t)$ of $s(t)$ is a functional of the past and the present value of the data $x(t)$.
Filtering white noise. In the pure filtering problem, the determination of the estimator $H_x(s)$ can be simplified if $R_{ss}(0) < \infty$ and $\nu(t)$ is white noise orthogonal to the signal as in (14-101). We maintain, in fact, that in this case
$$H_x(s) = 1 - \sqrt{N}\,\Gamma_x(s) \tag{14-104}$$
where $\Gamma_x(s)$ is the whitening filter of $x(t)$.
Proof. From the above assumptions it follows that $S_{ss}(\infty) = 0$; hence
$$S_{sx}(s) = S_{ss}(s) = S_{xx}(s) - N = L_x(s)L_x(-s) - N$$
$$S_{ss}(\infty) = 0 \qquad S_{xx}(\infty) = N \qquad L_x(\pm\infty) = \sqrt{N}$$
Inserting into (14-94), we obtain
$$S_{s i_x}(s) = L_x(s) - N\Gamma_x(-s) = L_x(s) + K - N\Gamma_x(-s) - K$$
From the preceding note it follows that the constant $K$ must be such that the noncausal component of $S_{s i_x}(s)$ satisfies the infinity condition $-N\Gamma_x(-\infty) - K = 0$. And since $\Gamma_x(-\infty) = 1/L_x(-\infty) = 1/\sqrt{N}$, (14-104) follows from (14-95).
Example 14-8. We shall determine the pure filter of the process in Example 14-7. From (14-102) and (14-104) it follows that
$$H_x(s) = 1 - \frac{s + \alpha}{s + \beta} = \frac{\beta - \alpha}{s + \beta}$$
in agreement with (14-103). Note that the resulting MS error equals
$$P = E\Bigl\{\Bigl[s(t) - \int_0^{\infty} h_x(\tau)\,x(t - \tau)\,d\tau\Bigr]\,s(t)\Bigr\} = \frac{N_0}{\alpha + \beta}$$
Digital Processes
We shall state briefly the discrete-time version of the preceding results. Our problem now is the determination of the future value $s[n + r]$ of a stochastic process in terms of the present and past values of another process $x[n]$:
$$\hat{s}_r[n + r] = \sum_{k=0}^{\infty} h_x^r[k]\,x[n - k] \tag{14-105}$$
In this case,
$$s[n + r] - \hat{s}_r[n + r] \perp x[n - m] \qquad m \ge 0$$
hence
$$R_{sx}[m + r] = \sum_{k=0}^{\infty} h_x^r[k]\,R_{xx}[m - k] \qquad m \ge 0 \tag{14-106}$$
This is the discrete version of the Wiener-Hopf equation (14-88).
To determine $h_x^r[n]$, we proceed as in the analog case: We express $\hat{s}_r[n + r]$ in terms of the innovations $i_x[n]$ of $x[n]$ (Fig. 14-13)
$$\hat{s}_r[n + r] = \sum_{k=0}^{\infty} h_{i_x}^r[k]\,i_x[n - k] \tag{14-107}$$
From this and (8-70) it follows that
$$R_{s i_x}[m + r] = \sum_{k=0}^{\infty} h_{i_x}^r[k]\,\delta[m - k] = h_{i_x}^r[m] \qquad m \ge 0$$
because $R_{i_x i_x}[m] = \delta[m]$. Hence
$$h_{i_x}^r[m] = R_{s i_x}[m + r]\,U[m] \qquad \text{all } m \tag{14-108}$$
The function $S_{s i_x}(z)$ can be expressed in terms of $S_{sx}(z)$ as in (14-94)
$$S_{s i_x}(z) = S_{sx}(z)\,\Gamma_x(z^{-1}) \tag{14-109}$$
Thus the transform of $R_{s i_x}[m + r]$ equals
$$S_r(z) = z^r S_{s i_x}(z) = z^r S_{sx}(z)\,\Gamma_x(z^{-1}) \tag{14-110}$$
FIGURE 14-13
The function $S_r(z)$ is then written as a sum
$$S_r(z) = S_r^+(z) + S_r^-(z) \tag{14-111}$$
where $S_r^+(z)$ is analytic for $|z| > 1$ and $S_r^-(z)$ is analytic for $|z| < 1$. Furthermore, the inverse of $S_r^-(z)$ at the origin is 0. Thus $S_r^+(z)$ is the transform of the causal function $R_{s i_x}[m + r]\,U[m]$. And since $i_x[n]$ is the response of the whitening filter $\Gamma_x(z)$ with input $x[n]$, we conclude from (14-108) that
$$H_x^r(z) = H_{i_x}^r(z)\,\Gamma_x(z) = S_r^+(z)\,\Gamma_x(z) \tag{14-112}$$
Example 14-9. We shall determine the one-step predictor $\hat{s}_1[n + 1]$ of the process $s[n]$ where
$$S_{ss}(z) = \frac{N_0}{(1 - az^{-1})(1 - az)} \qquad S_{\nu\nu}(z) = N \qquad S_{s\nu}(z) = 0$$
In this case (see Example 14-2)
$$L_x(z) = \sqrt{\frac{Na}{b}}\,\frac{z - b}{z - a}$$
From (14-110) it follows with $r = 1$ that
$$S_1(z) = \frac{zN_0\sqrt{b/Na}}{(1 - az^{-1})(1 - bz)} = \frac{Aaz}{z - a} - \frac{Az/b}{z - 1/b} \qquad A = (a - b)\sqrt{\frac{N}{ab}}$$
Since $0 < a < 1$ and $1/b > 1$, we conclude from the above that $S_1^+(z) = Aaz/(z - a)$ and (14-112) yields
$$H_x^1(z) = (a - b)\,\frac{z}{z - b} \qquad h_x^1[n] = (a - b)\,b^n\,U[n]$$
We discuss presently a more direct method for determining $H_x^r(z)$ [see (14-118) below].
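The predictor of Example 14-9 can be checked against the discrete Wiener-Hopf equation (14-106) with $r = 1$. The sketch assumes $R_{ss}[m] = N_0 a^{|m|}/(1 - a^2)$, $R_{xx}[m] = R_{ss}[m] + N\delta[m]$, and that $b$ is the root inside the unit circle of $N_0 b = N(a - b)(1 - ab)$, a relation obtained here by matching the numerator of $S_{xx}(z)$; the function names and the truncation of the infinite sum are illustrative.

```python
def solve_b(a, N0, N):
    """Root b in (0, 1) of N0*b = N*(a - b)*(1 - a*b)."""
    # equivalent quadratic: b**2 - p*b + 1 = 0, whose roots are b and 1/b
    p = (N * (1 + a * a) + N0) / (N * a)
    return (p - (p * p - 4) ** 0.5) / 2

def wiener_hopf_residual(a, N0, N, m, terms=400):
    """R_sx[m+1] - sum_k h[k] R_xx[m-k] for h[k] = (a - b) b**k."""
    b = solve_b(a, N0, N)
    Rss = lambda k: N0 * a ** abs(k) / (1 - a * a)
    Rxx = lambda k: Rss(k) + (N if k == 0 else 0.0)
    h = lambda k: (a - b) * b ** k
    return Rss(m + 1) - sum(h(k) * Rxx(m - k) for k in range(terms))

print(solve_b(0.8, 0.36, 1.0))
```

With $a = 0.8$, $N_0 = 0.36$, and $N = 1$ this gives $b = 0.5$, so the one-step predictor is $0.3 \times 0.5^n U[n]$.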
White noise. We shall examine the nature of the predictor $H_x^r(z)$ of $s[n + r]$ under the assumption that the noise is white and orthogonal to the signal
$$R_{\nu\nu}[m] = N\delta[m] \qquad R_{s\nu}[m] = 0 \tag{14-113}$$
Pure filter Suppose first that $r = 0$. In this case, $H_x^0(z)$ is a pure filter and $\hat{s}_0[n]$ is the estimate of the signal $s[n]$ in terms of $x[n]$ and its past. We maintain that (Fig. 14-14)
$$H_x^0(z) = 1 - \frac{N}{l_x[0]}\,\Gamma_x(z) \tag{14-114}$$
Proof. From (14-113) it follows that
$$S_{sx}(z) = S_{ss}(z) = S_{xx}(z) - N = L_x(z)L_x(z^{-1}) - N$$
Inserting into (14-109), we obtain
$$S_{s i_x}(z) = L_x(z) - N\Gamma_x(z^{-1}) \tag{14-115}$$
We wish to find the causal part of the above, including the value of its inverse at $n = 0$. Since the inverse $z$ transform of $\Gamma_x(1/z)$ is 0 for $n > 0$ and for $n = 0$ it equals $\Gamma_x(\infty)$, we conclude that
$$H_{i_x}^0(z) = L_x(z) - N\Gamma_x(\infty) \tag{14-116}$$
Multiplying by $\Gamma_x(z)$, we obtain (14-114) because $\Gamma_x(\infty) = 1/l_x[0]$.
FIGURE 14-14
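Filtering and prediction We shall now show that the estimate $\hat{s}_r[n + r]$ of $s[n + r]$ equals the pure predictor $\hat{\hat{s}}_0[n + r]$ of the estimate $\hat{s}_0[n]$ of $s[n]$ (Fig. 14-14)
$$\hat{s}_r[n + r] = \hat{\hat{s}}_0[n + r] = \hat{E}\{\hat{s}_0[n + r] \mid \hat{s}_0[n - k],\ k \ge 0\} \tag{14-117}$$
Proof. From (14-110) and (14-115) it follows that
$$S_r(z) = z^r\bigl[L_x(z) - N\Gamma_x(z^{-1})\bigr]$$
But the inverse of $z^r\Gamma_x(1/z)$ is 0 for $n \ge 0$. Hence $S_r^+(z)$ is the causal part of $z^r L_x(z)$. Inserting into (14-112), we obtain
$$H_x^r(z) = z^r\,\Gamma_x(z)\Bigl(L_x(z) - \sum_{k=0}^{r-1} l_x[k]\,z^{-k}\Bigr) = z^r\Bigl(1 - \frac{\sum_{k=0}^{r-1} l_x[k]\,z^{-k}}{L_x(z)}\Bigr) \tag{14-118}$$
As we see from Fig. 14-14, the innovations filter of $\hat{s}_0[n]$ equals $L_x(z)H_x^0(z)$. To determine the pure predictor $\hat{H}^r(z)$ of $\hat{s}_0[n + r]$, it suffices, therefore, to multiply (14-37) by $z^r$ (we are predicting now the future) and to replace the function $L(z)$ by $L_x(z)H_x^0(z) = L_x(z) - D$, where $D = N/l_x[0]$ as in (14-119). This yields
$$\hat{H}^r(z) = z^r\Bigl(1 - \frac{\sum_{k=0}^{r-1} l_x[k]\,z^{-k} - D}{L_x(z) - D}\Bigr)$$
because the inverse of $L_x(z) - D$ equals $l_x[n] - D\delta[n]$. Comparing with (14-118), we conclude that
$$H_x^r(z) = \hat{H}^r(z)\,H_x^0(z)$$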
FIGURE 14-15
The preceding discussion leads to the following important consequences of the white-noise assumption (14-113):
1. The innovations $i_x[n]$ of $x[n]$ are proportional to the difference $x[n] - \hat{s}_0[n]$:
$$x[n] - \hat{s}_0[n] = D\,i_x[n] \qquad D = \frac{N}{l_x[0]} \tag{14-119}$$
Indeed, $x[n] - \hat{s}_0[n]$ is the output of the filter
$$L_x(z) - [L_x(z) - D] = D$$
with input $i_x[n]$ (Fig. 14-15a). Thus the process $i_x[n]$ can be realized simply by a feedback system (Fig. 14-15b) involving merely the filter $H_{i_x}^0(z) = L_x(z) - D$.
2. The $r$-step filtering and prediction estimate $\hat{s}_r[n + r]$ can be obtained by cascading the pure filter $H_x^0(z)$ of $s[n]$ with the pure predictor $\hat{H}^r(z)$ of $\hat{s}_0[n + r]$.
3. If the signal $s[n]$ is an ARMA process, then its estimate $\hat{s}_0[n]$ is also an ARMA process.
Indeed, if $L_x(z) = A(z)/B(z)$ is rational, then [see (14-114)], the filter $H_x^0(z)$ is also rational. Furthermore, the denominator $B(z)$ of $L_x(z)$ is the same as the denominator of the forward component $L_x(z) - D$ of the feedback realization of $H_x^0(z)$ shown in Fig. 14-15b.
As we shall presently see, these results are central in the development of Kalman filters.
14-4 KALMAN FILTERS†
In this section we extend the preceding results to nonstationary processes with causal data and we show that the results can be simplified if the noise is white and the signal is an ARMA process. The estimate $\hat{s}_r[n + r]$ of $s[n + r]$ in terms of the data
$$x[n] = s[n] + \nu[n]$$
takes now the form
$$\hat{s}_r[n + r] = \hat{E}\{s[n + r] \mid x[k],\ 0 \le k \le n\} = \sum_{k=0}^{n} h_x^r[n, k]\,x[k] \tag{14-120}$$
Thus $\hat{s}_r[n + r]$ is the output of a causal, time-varying system with input $x[n]U[n]$, and our problem is to find its delta response $h_x^r[n, k]$.
†R. E. Kalman: "A New Approach to Linear Filtering and Prediction Problems," ASME Transactions, vol. 82D, 1960.
As we know,
$$s[n + r] - \hat{s}_r[n + r] \perp x[m] \qquad 0 \le m \le n$$
This yields
$$R_{sx}[n + r, m] = \sum_{k=0}^{n} h_x^r[n, k]\,R_{xx}[k, m] \qquad 0 \le m \le n \tag{14-121}$$
Thus $h_x^r[n, k]$ must be such that its response to $R_{xx}[n, m]$ (the time variable is $n$) equals $R_{sx}[n + r, m]$ for every $0 \le m \le n$. For a specific $n$, this yields $n + 1$ equations for the $n + 1$ unknowns $h_x^r[n, k]$.
To simplify the determination of $h_x^r[n, k]$, we shall express the desired estimates $\hat{s}_r[n + r]$ in terms of the Kalman innovations [see (14-77)]
$$i_x[n] = \sum_{k=0}^{n} \gamma_x[n, k]\,x[k] \tag{14-122}$$
of the process $x[n]U[n]$ where $\gamma_x[n, k]$ is the Kalman whitening filter. The process $i_x[n]$ is orthonormal and, if the data are linearly independent, then the processes $x[n]$ and $i_x[n]$ are linearly equivalent. This leads to the conclusion that $\hat{s}_r[n + r]$ can be expressed in terms of $i_x[n]$ and its past (Fig. 14-16)
$$\hat{s}_r[n + r] = \sum_{k=0}^{n} h_{i_x}^r[n, k]\,i_x[k] \tag{14-123}$$
FIGURE 14-16
To determine $h_{i_x}^r[n, k]$, we apply the orthogonality principle. Since
$$R_{i_x i_x}[m, n] = \delta[m - n]$$
this yields
$$R_{s i_x}[n + r, m] = \sum_{k=0}^{n} h_{i_x}^r[n, k]\,\delta[k - m]$$
Hence
$$h_{i_x}^r[n, m] = R_{s i_x}[n + r, m] \qquad 0 \le m \le n \tag{14-124}$$
This function can be expressed in terms of the cross-correlation $R_{sx}[m, n]$. Multiplying (14-122) by $s[m]$, we obtain
$$R_{s i_x}[m, n] = \sum_{k=0}^{n} \gamma_x[n, k]\,R_{sx}[m, k] \tag{14-125}$$
Thus, for a specific $m$, $R_{s i_x}[m, n]$ is the response of the Kalman whitening filter of $x[n]$ to the function $R_{sx}[m, n]$ where $n$ is the variable. To complete the specification of $\hat{s}_r[n + r]$, we cascade the filter $h_{i_x}^r[n, m]$ with the whitening filter $\gamma_x[n, k]$ as in Fig. 14-16.
ARMA Signals in White Noise
In the numerical implementation of the above, we are faced with two problems: (1) the realization of the Kalman innovations process $i_x[n]$; (2) the determination of the sum in (14-123). In general, these problems are complex, involving storage capacity and number of computations proportional to $n$. However, as we show next, under certain realistic assumptions the problem can be simplified drastically.
ASSUMPTION 1. The noise is white and orthogonal to the signal:
$$R_{\nu\nu}[m, n] = N_n\,\delta[m - n] \qquad R_{s\nu}[m, n] = 0 \tag{14-126}$$
This leads to the following conclusions.
Property 1 If $\hat{s}_0[n]$ is the estimate of $s[n]$ in terms of $x[n]$ and its past and $D_n^2$ is the MS estimation error, then the difference $x[n] - \hat{s}_0[n]$ is proportional to the Kalman innovations $i_x[n]$ of the data $x[n]$:
$$x[n] - \hat{s}_0[n] = D_n\,i_x[n] \tag{14-127a}$$
$$D_n^2 = E\{|x[n] - \hat{s}_0[n]|^2\} \tag{14-127b}$$
Proof. The difference $x[n] - \hat{s}_0[n]$ depends linearly on $x[n]$ and its past. Furthermore, the processes $\nu[n]$ and $s[n] - \hat{s}_0[n]$ are orthogonal to the past of $x[n]$. Hence
$$x[n] - \hat{s}_0[n] = s[n] - \hat{s}_0[n] + \nu[n] \perp x[k] \qquad k < n$$
From this it follows that the process $x[n] - \hat{s}_0[n]$ is white noise and
$$x[n] - \hat{s}_0[n] \perp i_x[k] \qquad 0 \le k \le n - 1$$
because the processes $x[k]$ and $i_x[k]$ are linearly equivalent. And since $x[n] - \hat{s}_0[n]$ depends linearly on $i_x[k]$ for $0 \le k \le n$, (14-127a) results. Equation (14-127b) is a consequence of the requirement that $E\{i_x^2[n]\} = 1$.
Property 1 shows that the process $i_x[n]$ can be realized simply by the feedback system of Fig. 14-17. This eliminates the need for designing the whitening filter $\gamma_x[n, k]$.
FIGURE 14-17
Property 2 The estimate $\hat{s}_r[n + r]$ of $s[n + r]$ equals the pure predictor $\hat{\hat{s}}_0[n + r]$ of the estimate $\hat{s}_0[n]$ of $s[n]$ (Fig. 14-17)
$$\hat{s}_r[n + r] = \hat{\hat{s}}_0[n + r] = \sum_{k=0}^{n} h_{\hat{s}}^r[n, k]\,\hat{s}_0[k] \tag{14-128}$$
provided that, for every $n \ge 0$,
$$E\{\hat{s}_0[n]\,i_x[n]\} = E\{x[n]\,i_x[n]\} - D_n \ne 0 \tag{14-129}$$
Proof. The process $\hat{s}_0[n]$ is linearly dependent on $x[n]$ and its past. Condition (14-129) means that the component of $\hat{s}_0[n]$ in the $i_x[n]$ direction is not 0. Hence the processes $\hat{s}_0[n]$ and $x[n]$ are linearly equivalent. And since
$$\hat{s}_0[n + r] - \hat{\hat{s}}_0[n + r] \perp \hat{s}_0[k] \qquad 0 \le k \le n$$
we conclude that
$$\hat{s}_0[n + r] - \hat{\hat{s}}_0[n + r] \perp x[k] \qquad 0 \le k \le n$$
Furthermore,
$$s[n + r] - \hat{s}_0[n + r] \perp x[k] \qquad 0 \le k \le n + r$$
because $\hat{s}_0[n + r]$ is the estimate of $s[n + r]$ in terms of $x[k]$ for $0 \le k \le n + r$. Finally,
$$s[n + r] - \hat{\hat{s}}_0[n + r] = \bigl(s[n + r] - \hat{s}_0[n + r]\bigr) + \bigl(\hat{s}_0[n + r] - \hat{\hat{s}}_0[n + r]\bigr)$$
Hence
$$s[n + r] - \hat{\hat{s}}_0[n + r] \perp x[k] \qquad 0 \le k \le n$$
and (14-128) results.
This property shows that filtering and prediction can be reduced to a cascade of a pure filter and a pure predictor.
FIGURE 14-18
ASSUMPTION 2. The signal $s[n]$ is a time-varying ARMA process (Fig. 14-18a)
$$s[n] - a_1^n s[n - 1] - \cdots - a_M^n s[n - M] = \sum_{k=0}^{M-1} b_k^n\,\zeta[n - k] \tag{14-130}$$
$$R_{\zeta\zeta}[m, n] = V_n\,\delta[m - n]$$
Property 3 The estimate $\hat{s}_0[n]$ is also an ARMA process
$$\hat{s}_0[n] - a_1^n \hat{s}_0[n - 1] - \cdots - a_M^n \hat{s}_0[n - M] = \sum_{k=0}^{M-1} c_k^n\,i_x[n - k] \tag{14-131}$$
where the coefficients $a_k^n$ are the same as in (14-130) and the coefficients $c_k^n$ are $M$ constants to be determined.
We assume that the above is true for all past estimates $\hat{s}_0[n - k]$ and we shall prove that if $\hat{s}_0[n]$ is given by (14-131), then it is the estimate of $s[n]$. It suffices to show that if the constants $c_k^n$ are suitably chosen, then the resulting error satisfies the orthogonality principle
$$\varepsilon[n] = s[n] - \hat{s}_0[n] \perp x[r] \qquad 0 \le r \le n \tag{14-132}$$
Subtracting (14-131) from (14-130), we obtain
$$\varepsilon[n] = \sum_{k=1}^{M} a_k^n\,\varepsilon[n - k] + \sum_{k=0}^{M-1} \bigl(b_k^n\,\zeta[n - k] - c_k^n\,i_x[n - k]\bigr) \tag{14-133}$$
But
$$\zeta[n] \perp x[r] \qquad i_x[n] \perp x[r] \qquad \text{for } r < n,$$
and $\varepsilon[n - k] \perp x[r]$ for $r \le n - k$ (induction hypothesis). Hence (14-132) is true for $r \le n - M$. It suffices, therefore, to select the $M$ constants $c_k^n$ such that
$$E\{\varepsilon[n]\,x[r]\} = 0 \qquad n - M + 1 \le r \le n \tag{14-134}$$
We have thus expressed $\hat{s}_0[n]$ in terms of $i_x[n]$. To complete the specification of the filter, we use (14-127a). This yields the feedback system of Fig. 14-18b involving $M + 1$ unknown parameters: the constant $D_n$ and the $M$ coefficients $c_k^n$. These parameters can be determined from (14-127b) and the $M$ equations in (14-134).
The recursion equation (14-131) can be written as a system of $M$ first-order equations (state equations) or, equivalently, as a first-order vector equation (see Sec. 12-2). The unknowns are then the scalar $D_n$ and the coefficients $c_k^n$. To simplify the analysis, we shall carry out the determination of the unknown parameters for the first-order scalar case only. The results hold also for the vector case mutatis mutandis.
FIRST-ORDER. If
$$s[n] - A_n s[n - 1] = \zeta[n] \qquad E\{\zeta^2[n]\} = V_n \tag{14-135}$$
then (14-131) yields
$$\hat{s}_0[n] - A_n \hat{s}_0[n - 1] = K_n\bigl(x[n] - \hat{s}_0[n]\bigr) \tag{14-136}$$
where $K_n = c_0^n/D_n$. This is a first-order system as in Fig. 14-19a. To complete its specification, we must find the constant $K_n$. We maintain that
$$K_n = \frac{P_n}{N_n - P_n} \qquad P_n = E\{\varepsilon^2[n]\} \tag{14-137}$$
FIGURE 14-19
In the above, $N_n$ is the average intensity of $\nu[n]$, which we assume known. The MS error $P_n$ can be determined recursively
$$\frac{P_n}{N_n - P_n} = \frac{A_n^2 P_{n-1} + V_n}{N_n} \tag{14-138}$$
Proof. Multiplying the data $x[n] = s[n] + \nu[n]$ by the error
$$\varepsilon[n] = s[n] - \hat{s}_0[n] = x[n] - \hat{s}_0[n] - \nu[n]$$
and using the orthogonality condition (14-132), we obtain
$$E\{\varepsilon[n]\,x[n]\} = 0 = P_n + E\{\varepsilon[n]\,\nu[n]\}$$
From (14-135) and (14-136) it follows that
$$\varepsilon[n] = A_n\,\varepsilon[n - 1] + \zeta[n] - K_n\bigl(\varepsilon[n] + \nu[n]\bigr)$$
$$(1 + K_n)\,\varepsilon[n] = A_n\,\varepsilon[n - 1] + \zeta[n] - K_n\,\nu[n] \tag{14-139}$$
Hence
$$(1 + K_n)\,E\{\varepsilon[n]\,\nu[n]\} = -K_n\,E\{\nu^2[n]\}$$
and (14-137) results.
To prove (14-138), we multiply each side of (14-139) by each side of the identity $s[n] = A_n s[n - 1] + \zeta[n]$. This yields
$$(1 + K_n)\,P_n = A_n^2 P_{n-1} + V_n$$
Since $1 + K_n = N_n/(N_n - P_n)$, the above yields (14-138).
Note Using (14-135), we can readily show that
$$\hat{s}_0[n] - A_n \hat{s}_0[n - 1] = \bar{K}_n\bigl(x[n] - A_n \hat{s}_0[n - 1]\bigr) \tag{14-140}$$
where $\bar{K}_n = K_n/(1 + K_n)$, that is,
$$\bar{K}_n = \frac{P_n}{N_n} = \frac{A_n^2 P_{n-1} + V_n}{A_n^2 P_{n-1} + V_n + N_n} \tag{14-141}$$
The corresponding system is shown in Fig. 14-19b. In the same diagram, we also show the realization of the one-step predictor
$$\hat{s}_1[n + 1] = \hat{\hat{s}}_0[n + 1] = A_{n+1}\,\hat{s}_0[n]$$
of $s[n + 1]$. This follows readily from (14-128) because the process $\hat{s}_0[n]$ is AR; hence its pure predictor equals $A_{n+1}\hat{s}_0[n]$.
The iteration The estimate $\hat{s}_0[n]$ of $s[n]$ is determined recursively: If $\bar{K}_{n-1}$ and $\hat{s}_0[n - 1]$ are known, then $\bar{K}_n$ is determined from (14-141) and $\hat{s}_0[n]$ from (14-140). To start the iteration, we must specify the initial conditions of (14-135). We shall assume that
$$s[0] = \zeta[0]$$
This leads to the initial estimate
$$\hat{s}_0[0] = \bar{K}_0\,x[0]$$
from which it follows that
$$\bar{K}_0 = \frac{E\{s[0]\,x[0]\}}{E\{x^2[0]\}} = \frac{E\{\zeta^2[0]\}}{E\{\zeta^2[0]\} + E\{\nu^2[0]\}}$$
Hence
$$\bar{K}_0 = \frac{V_0}{V_0 + N_0} \qquad P_0 = \frac{V_0 N_0}{V_0 + N_0} \tag{14-142}$$
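The iteration can be sketched as a short routine. It assumes the gain form $\bar{K}_n = (A^2 P_{n-1} + V)/(A^2 P_{n-1} + V + N)$ with $P_n = N\bar{K}_n$, the update $\hat{s}_0[n] = A\hat{s}_0[n-1] + \bar{K}_n(x[n] - A\hat{s}_0[n-1])$, and the start $\bar{K}_0 = V/(V + N)$; the parameters $A$, $V$, $N$ are taken as constants for brevity, and the function name `kalman_first_order` is hypothetical.

```python
def kalman_first_order(x, A, V, N):
    """Scalar Kalman filter for s[n] = A*s[n-1] + zeta[n], x[n] = s[n] + nu[n].

    x: data x[0..n]; A, V, N: model constants (time-invariant in this sketch).
    Returns (est, P): the estimates of s[n] and the MS errors P_n.
    """
    est, P = [], []
    for n, xn in enumerate(x):
        if n == 0:
            gain = V / (V + N)            # initial gain, s[0] = zeta[0]
            s_prev = 0.0
        else:
            q = A * A * P[-1] + V
            gain = q / (q + N)            # gain from the previous MS error
            s_prev = est[-1]
        pred = A * s_prev                 # one-step prediction of s[n]
        est.append(pred + gain * (xn - pred))
        P.append(N * gain)                # P_n = N * gain
    return est, P

est, P = kalman_first_order([1.0, 0.4, -0.2, 0.9], A=0.8, V=0.36, N=1.0)
print(P)
```

Only the previous estimate and the previous MS error are stored, so the cost per sample is constant regardless of how much data has been observed.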
Linearization Equation (14-138) and its equivalent (14-141) are nonlinear. However, each can be replaced by two linear equations. Indeed, if $F_n$ and $G_n$ are two sequences such that
$$F_n = A_n^2 F_{n-1} + V_n G_{n-1} \qquad F_0 = V_0 N_0$$
$$N_n G_n = A_n^2 F_{n-1} + (V_n + N_n)\,G_{n-1} \qquad G_0 = V_0 + N_0 \tag{14-143}$$
then
$$P_n = \frac{F_n}{G_n}$$
Example 14-10. We shall determine the noncausal, the causal, and the Kalman estimate of a process $s[n]$ in terms of the data $x[k] = s[k] + \nu[k]$, and the corresponding MS error $P$. We assume that the process $s[n]$ satisfies the equation
$$s[n] - 0.8\,s[n - 1] = \zeta[n]$$
and that
$$R_{\zeta\zeta}[m] = 0.36\,\delta[m] \qquad R_{\zeta\nu}[m] = 0 \qquad R_{\nu\nu}[m, n] = \delta[n - m]$$
This is a special case of the process considered in Example 14-2 with
$$a = 0.8 \qquad N = 1 \qquad N_0 = 0.36 \qquad b = 0.5$$
Hence
$$S_{ss}(z) = \frac{0.36}{(1 - 0.8z^{-1})(1 - 0.8z)} \qquad R_{ss}[m] = 0.8^{|m|}$$
$$S_{xx}(z) = L_x(z)\,L_x(z^{-1}) \qquad L_x(z) = \sqrt{1.6}\,\frac{z - 0.5}{z - 0.8}$$
(a) Smoothing: $x[k]$ is available for all $k$. In this case the solution is obtained from Example 14-2 with $b = 0.5$ and $c = 0.375$:
$$h[n] = 0.3 \times 0.5^{|n|} \qquad P = 0.3$$
(b) Causal filter: $x[k]$ is available for $k \le n$. The filter is determined from (14-114) where now $l_x^2[0] = 1.6$:
$$H_x(z) = 1 - \frac{z - 0.8}{1.6(z - 0.5)} = \frac{0.375z}{z - 0.5} \qquad h_x[n] = 0.375 \times 0.5^n\,U[n]$$
This shows that the estimate $\hat{s}[n]$ of $s[n]$ satisfies the recursion equation
$$\hat{s}[n] - 0.5\,\hat{s}[n - 1] = 0.375\,x[n]$$
The resulting MS error equals
$$P = R_{ss}[0] - \sum_{k=0}^{\infty} h_x[k]\,R_{sx}[k] = 0.375$$
(c) Kalman filter: $x[k]$ is available for $0 \le k \le n$. Our process is a special
case of (14-135) with
$A_n = 0.8 \qquad V_n = E\{\zeta^2[n]\} = 0.36 \qquad N_n = E\{v^2[n]\} = 1$
Inserting into (14-143), we obtain
$F_n = 0.64F_{n-1} + 0.36G_{n-1} \qquad F_0 = 0.36$
$G_n = 0.64F_{n-1} + 1.36G_{n-1} \qquad G_0 = 1.36$
This is a system of linear recursion equations and can be readily solved with z
transforms. Since
$K_n = \frac{P_n}{N_n} \qquad P_n = \frac{F_n}{G_n}$
and $N = 1$, the solution yields
$K_n = P_n = \frac{0.48z_1^n - 0.12z_2^n}{1.28z_1^n + 0.08z_2^n} \qquad z_1 = 1.6 \qquad z_2 = 0.4$
In particular,
n =	0	1	2	3	4
Pn =	0.3	0.357	0.371	0.374	0.375
Thus, although the number of the available data increases as $n$ increases, the MS
error $P_n$ also increases. The reason is that $s[n]$ is a nonstationary process with
initial second moment $V_0 = 0.36$ because $s[0] = \zeta[0]$, and, as $n$ increases, $E\{s^2[n]\}$
approaches the value 1.
We note, finally, that
$K_n = P_n \to \frac{0.48}{1.28} = 0.375 \quad \text{as } n \to \infty$
and (14-140) yields
$\hat{s}_0[n] - 0.8\hat{s}_0[n-1] = 0.375x[n] - 0.3\hat{s}_0[n-1]$
The above shows that, if the process $s[n]$ is WSS, then its Kalman filter approaches
the causal Wiener filter as $n \to \infty$. This is the case for any $P_0$ because the limit of
$F_n/G_n$ as $n \to \infty$ equals 0.375 regardless of the initial conditions.
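The limiting behavior of the linear recursion can be checked numerically. The following Python sketch is an illustration only and is not part of the original text: the function names `kalman_pn` and `closed_form` and the second starting pair (5, 1) are assumptions chosen for the demonstration, while the coefficients 0.64, 0.36, 1.36 and the roots 1.6 and 0.4 are those of the example.

```python
def kalman_pn(f0, g0, steps):
    """Iterate F_n, G_n of Example 14-10 and return the ratios P_n = F_n / G_n."""
    f, g = f0, g0
    history = [f / g]
    for _ in range(steps):
        # Tuple assignment evaluates both right sides with the previous (F, G).
        f, g = 0.64 * f + 0.36 * g, 0.64 * f + 1.36 * g
        history.append(f / g)
    return history


def closed_form(n):
    """Closed-form ratio quoted in the example, with z1 = 1.6 and z2 = 0.4."""
    z1, z2 = 1.6, 0.4
    return (0.48 * z1**n - 0.12 * z2**n) / (1.28 * z1**n + 0.08 * z2**n)


example = kalman_pn(0.36, 1.36, 30)  # initial conditions F_0, G_0 of the example
other = kalman_pn(5.0, 1.0, 30)      # arbitrary start (assumed), same limit expected
```

Both sequences settle at 0.375 because the ratio $F_n/G_n$ is eventually governed by the eigenvector belonging to the larger root $z_1 = 1.6$.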
Example 14-11. We wish to estimate the RV $s$ in terms of the sum
$x[n] = s + v[n]$ where $E\{s\,v[n]\} = 0 \qquad R_{vv}[m, n] = N\delta[m - n]$
The estimate $\hat{s}_0[n]$ in terms of the data $x[n]$ can be obtained as the output of a
Kalman filter if we consider the RV $s$ as a stochastic process satisfying trivially
(14-135)
$s[n] = s[n-1] + \zeta[n] \qquad s[-1] = 0$
$\zeta[n] = \begin{cases} s & n = 0 \\ 0 & n > 0 \end{cases} \qquad V_n = \begin{cases} E\{s^2\} = M & n = 0 \\ 0 & n > 0 \end{cases}$
In this case, $A_n = 1$, $N_n = N$, and (14-143) yields
$F_n = F_{n-1} \qquad NG_n = F_{n-1} + NG_{n-1} \qquad F_0 = MN \qquad G_0 = M + N$
Solving, we obtain
$F_n = MN \qquad G_n = M + N + Mn$
Hence
$\hat{s}_0[n] = \frac{N + Mn}{M + N + Mn}\,\hat{s}_0[n-1] + \frac{M}{M + N + Mn}\,x[n]$
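As a check on the recursion above, one can verify that, started from $\hat{s}_0[-1] = 0$, it reproduces the batch expression $M\sum_{k \le n} x[k] / (N + M(n+1))$, which follows from the recursion by induction. The Python sketch below is illustrative: the function names, the values $M = 4$, $N = 1$, and the Gaussian test data are assumptions, not part of the original example.

```python
import random


def kalman_constant(x, M, N):
    """Recursive estimate of Example 14-11, started from s_hat[-1] = 0."""
    s_hat = 0.0
    out = []
    for n, xn in enumerate(x):
        d = M + N + M * n
        s_hat = (N + M * n) / d * s_hat + M / d * xn
        out.append(s_hat)
    return out


def batch_estimate(x, M, N):
    """Equivalent non-recursive form: a shrunk running sum of the data."""
    out, total = [], 0.0
    for n, xn in enumerate(x):
        total += xn
        out.append(M * total / (N + M * (n + 1)))
    return out


random.seed(0)
M, N = 4.0, 1.0                          # assumed second moments E{s^2} and noise level
s = random.gauss(0.0, M ** 0.5)          # one draw of the unknown RV s
x = [s + random.gauss(0.0, N ** 0.5) for _ in range(200)]
recursive = kalman_constant(x, M, N)
batch = batch_estimate(x, M, N)
```

For large $n$ the estimate approaches the sample mean of the data, as one would expect when the prior information about $s$ becomes negligible.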
Continuous-Time Processes
We wish, finally, to determine the estimate
$\hat{s}_0(t) = \hat{E}\{s(t) \mid x(\tau),\ 0 \le \tau \le t\}$    (14-144)
of a continuous-time process $s(t)$ in terms of the data
$x(t) = s(t) + v(t)$    (14-145)
The solution of this problem parallels the discrete-time solution if recursion
equations are replaced by differential equations and sums by integrals. It might
be instructive, however, to rederive the principal results using a different
approach.
To avoid repetition, we start directly with the white-noise assumption
$R_{vv}(t, \tau) = N(t)\delta(t - \tau) \qquad N(t) > 0 \qquad R_{sv}(t, \tau) = 0$    (14-146)
and we show that the process
$w(t) = x(t) - \hat{s}_0(t)$
is white noise with autocorrelation
$R_{ww}(t, \tau) = N(t)\delta(t - \tau)$    (14-147)
Proof. As we know
$\varepsilon(t) = s(t) - \hat{s}_0(t) \perp x(\tau) \qquad v(t) \perp x(\tau)$
for $\tau < t$. Furthermore, $w(\tau)$ depends linearly on $x(\tau)$ and its past. Hence
$w(t) = \varepsilon(t) + v(t) \perp w(\tau) \qquad \tau < t$    (14-148)
To complete the proof of (14-147), we shall assume that $\hat{s}_0(t)$ is continuous from
the left
$\hat{s}_0(t^-) = \hat{s}_0(t)$
This is not true at the origin if $s(0) \ne 0$. However, for sufficiently large $t$ the
effect of the initial condition can be neglected. From the above it follows that
$P(t) = E\{\varepsilon^2(t)\} < \infty$
and since $\varepsilon(t) \perp v(\tau)$ for $\tau > t$, we conclude that
$R_{ww}(t, \tau) = R_{wv}(t, \tau) = R_{vv}(t, \tau) = N(\tau)\delta(t - \tau)$
Using a limit argument, we can show that, as in the discrete-time case, the
normalized process $w(t)/\sqrt{N(t)}$ is the Kalman innovations of $x(t)$. The details,
however, will be omitted. This leads to the conclusion that $\hat{s}_0(t)$ can be
expressed in terms of $w(t)$ [see also (14-123)]
$\hat{s}_0(t) = \int_0^t h_w(t, \alpha)\,w(\alpha)\,d\alpha$    (14-149)
Since $s(t) - \hat{s}_0(t) \perp w(\tau)$ for $\tau < t$, we conclude from the above and (14-147)
that
$R_{sw}(t, \tau) = \int_0^t h_w(t, \alpha)N(\alpha)\delta(\tau - \alpha)\,d\alpha = h_w(t, \tau)N(\tau)$
and (14-149) yields
$\hat{s}_0(t) = \int_0^t \frac{1}{N(\alpha)}R_{sw}(t, \alpha)\,w(\alpha)\,d\alpha$    (14-150)
We note that [see (14-148) and (14-146)]
$R_{sw}(t, t) = E\{s(t)[\varepsilon(t) + v(t)]\} = P(t)$
WIDE-SENSE MARKOFF PROCESSES. Using the above, we shall show that, if
the signal $s(t)$ is WS Markoff, that is, if it satisfies a differential equation driven
by white noise, then its estimate $\hat{s}_0(t)$ satisfies a similar equation. For simplicity,
we consider the first-order case
$s'(t) + A(t)s(t) = \zeta(t) \qquad R_{\zeta\zeta}(t, \tau) = V(t)\delta(t - \tau)$    (14-151)
The Kalman-Bucy equations†. We maintain that
$\hat{s}_0'(t) + A(t)\hat{s}_0(t) = K(t)[x(t) - \hat{s}_0(t)]$    (14-152)
where
$K(t) = \frac{P(t)}{N(t)}$    (14-153)
Furthermore, the MS error $P(t)$ satisfies the Riccati equation
$P'(t) + 2A(t)P(t) = V(t) - \frac{P^2(t)}{N(t)}$    (14-154)
†R. E. Kalman and R. S. Bucy, "New Results in Linear Filtering and Prediction Theory," ASME
Transactions, vol. 83D, 1961.
Proof. Multiplying the differential equation in (14-151) by $w(\tau)$, we obtain
$\frac{\partial}{\partial t}R_{sw}(t, \tau) + A(t)R_{sw}(t, \tau) = 0 \qquad \tau < t$    (14-155)
We next equate the derivatives of both sides of (14-150)
$\hat{s}_0'(t) = \frac{1}{N(t)}R_{sw}(t, t)\,w(t) + \int_0^t \frac{1}{N(\alpha)}\frac{\partial}{\partial t}R_{sw}(t, \alpha)\,w(\alpha)\,d\alpha$
Finally, we multiply (14-150) by $A(t)$ and add with the above. This yields
(14-152) because, as we see from (14-155), the sum of the two integrals is 0.
To prove (14-154), we use the following version of (10-90): If $z(t)$ is a
process with $E\{z^2(t)\} = I(t)$ and such that
$z'(t) + B(t)z(t) = \xi(t) \qquad R_{\xi\xi}(t, \tau) = Q(t)\delta(t - \tau)$    (14-156)
then (see Prob. 10-28)
$I'(t) + 2B(t)I(t) = Q(t)$    (14-157)
Returning to (14-152), we observe, subtracting from (14-151), that the
estimation error $\varepsilon(t)$ satisfies the equation
$\varepsilon'(t) + [A(t) + K(t)]\varepsilon(t) = \zeta(t) - K(t)v(t)$
In the above, the right side $\eta(t) = \zeta(t) - K(t)v(t)$ is white noise as in (14-156)
with
$Q(t) = V(t) + K^2(t)N(t)$
Hence the function $P(t) = E\{\varepsilon^2(t)\}$ satisfies (14-157) where $B(t) = A(t) + K(t)$.
This yields
$P'(t) + 2[A(t) + K(t)]P(t) = V(t) + K^2(t)N(t)$
and (14-154) results.
Linearization We shall now show that the nonlinear equation (14-154) is
equivalent to two linear equations. For this purpose, we introduce the functions
$F(t)$ and $G(t)$ such that
$P(t) = \frac{F(t)}{G(t)}$    (14-158)
Clearly,
$F'(t) = P'(t)G(t) + P(t)G'(t)$
and (14-154) yields
$F'(t) + A(t)F(t) - V(t)G(t) = P(t)\left[G'(t) - A(t)G(t) - \frac{F(t)}{N(t)}\right]$
This is satisfied if
$F'(t) = -A(t)F(t) + V(t)G(t) \qquad G'(t) = A(t)G(t) + \frac{F(t)}{N(t)}$    (14-159)
To solve the above system, we must specify $F(0)$ and $G(0)$. Setting
arbitrarily $G(0) = 1$, we obtain $F(0) = P(0)$ where
$P(0) = E\{s^2(0)\}$
is the initial value of the MS error $P(t)$. The determination of the Kalman filter
thus depends on the second moment of $s(0)$.
Example 14-12. We shall determine the noncausal, the causal, and the Kalman
estimate of a process $s(t)$ in terms of the data $x(t) = s(t) + v(t)$, and the
corresponding MS error $P$. We assume that $s(t)$ satisfies the equation
$s'(t) + 2s(t) = \zeta(t)$
and that
$R_{\zeta\zeta}(\tau) = 12\delta(\tau) \qquad R_{sv}(\tau) = 0 \qquad R_{vv}(\tau) = \delta(\tau)$
This is a special case of the process considered in Example 14-7 with
$\alpha = 2 \qquad N = 1 \qquad N_0 = 12 \qquad \beta = 4$
Hence
$S_{ss}(\omega) = \frac{12}{4 + \omega^2} \qquad R_{ss}(\tau) = 3e^{-2|\tau|}$
$S_{xx}(\omega) = \frac{16 + \omega^2}{4 + \omega^2} \qquad L_x(s) = \frac{s + 4}{s + 2}$
(a) Smoothing: $x(\xi)$ is available for all $\xi$. In this case, (14-16) yields
$H(\omega) = \frac{12}{16 + \omega^2} \qquad h(t) = 1.5e^{-4|t|}$
The MS error is obtained from (14-15)
$P = 3 - \frac{9}{2}\int_{-\infty}^{\infty} e^{-4|\tau|}e^{-2|\tau|}\,d\tau = 1.5$
(b) Causal filter: $x(\xi)$ is available for $\xi \le t$. The unknown filter is specified in
Example 14-8 with
$\alpha = 2 \qquad \beta = 4 \qquad N_0 = 12$
Thus
$H_x(s) = \frac{2}{s + 4} \qquad h_x(t) = 2e^{-4t}U(t) \qquad P = 2$
This shows that the estimate $\hat{s}(t)$ of $s(t)$ satisfies the differential equation
$\hat{s}'(t) + 4\hat{s}(t) = 2x(t)$
(c) Kalman filter: $x(\xi)$ is available for $0 \le \xi \le t$. Our problem is a special
case of (14-151) with
$A(t) = 2 \qquad V(t) = 12 \qquad N(t) = 1$
Hence [see (14-159)]
$F'(t) = -2F(t) + 12G(t) \qquad G'(t) = F(t) + 2G(t)$
To solve this system, we must know $P(0)$.
Case 1 If $s(0) = 0$, then $P(0) = 0$. In this case, $F(0) = 0$, $G(0) = 1$. Inserting
the solution of the above system into (14-153), we obtain
$K(t) = P(t) = \frac{6e^{4t} - 6e^{-4t}}{3e^{4t} + e^{-4t}} \to 2 \quad \text{as } t \to \infty$
Case 2 We now assume that $s(t)$ is the stationary solution of the differential
equation specifying $s(t)$. In this case, $E\{s^2(0)\} = 3$; hence $P(0) = F(0) = 3$ and
$K(t) = P(t) = \frac{18e^{4t} + 6e^{-4t}}{9e^{4t} - e^{-4t}} \to 2 \quad \text{as } t \to \infty$
Thus, in both cases, the solution $\hat{s}_0(t)$ of the Kalman-Bucy equation (14-152) tends
to the solution of the causal Wiener filter
$\hat{s}_0'(t) + 2\hat{s}_0(t) = 2x(t) - 2\hat{s}_0(t)$
as $t \to \infty$.
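The two closed-form solutions can be compared with a direct numerical integration of the Riccati equation (14-154), which for this example reads $P' = 12 - 4P - P^2$. The Python sketch below is an illustration under stated assumptions: the fixed-step fourth-order Runge-Kutta integrator, the step count, and all function names are choices made here and are not taken from the text.

```python
import math


def riccati_rhs(p):
    """Right side of P' = V - 2*A*P - P**2/N with A = 2, V = 12, N = 1."""
    return 12.0 - 4.0 * p - p * p


def integrate(p0, t_end, steps=2000):
    """Classical fixed-step RK4 from t = 0 to t_end, starting at P(0) = p0."""
    h = t_end / steps
    p = p0
    for _ in range(steps):
        k1 = riccati_rhs(p)
        k2 = riccati_rhs(p + 0.5 * h * k1)
        k3 = riccati_rhs(p + 0.5 * h * k2)
        k4 = riccati_rhs(p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


def case1(t):
    """Closed form of the example for P(0) = 0."""
    return (6 * math.exp(4 * t) - 6 * math.exp(-4 * t)) / (3 * math.exp(4 * t) + math.exp(-4 * t))


def case2(t):
    """Closed form of the example for P(0) = 3 (stationary signal)."""
    return (18 * math.exp(4 * t) + 6 * math.exp(-4 * t)) / (9 * math.exp(4 * t) - math.exp(-4 * t))
```

Calling `integrate(0.0, 0.5)` and `integrate(3.0, 0.5)` should agree closely with `case1(0.5)` and `case2(0.5)`, and integrating to a larger time from either starting value should approach the causal Wiener error $P = 2$.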
Example 14-13. We wish to estimate the RV $s$ in terms of the sum
$x(t) = s + v(t) \qquad E\{s\,v(t)\} = 0 \qquad R_{vv}(\tau) = N\delta(\tau)$
This is a special case of (14-151) if
$A(t) = 0 \qquad s(t) = s \qquad \zeta(t) = 0 \qquad N(t) = N$
In this case, $V(t) = 0$, $P(0) = E\{s^2\} = M$, and (14-159) yields
$F'(t) = 0 \qquad G'(t) = \frac{F(t)}{N} \qquad F(0) = M \qquad G(0) = 1$
Hence
$F(t) = M \qquad G(t) = 1 + \frac{Mt}{N}$
Inserting into (14-152), we obtain
$\hat{s}_0'(t) + \frac{M}{N + Mt}\,\hat{s}_0(t) = \frac{M}{N + Mt}\,x(t)$
PROBLEMS
14-1. If $R_s(\tau) = Ie^{-|\tau|/T}$ and
$\hat{E}\{s(t - T/2) \mid s(t), s(t - T)\} = as(t) + bs(t - T)$
find the constants $a$, $b$ and the MS error.
14-2. Show that if г = as(0) + bs(T) is the MS estimate of
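14-3. Show that if $x(t) = s(t) + v(t)$, $R_{sv}(\tau) = 0$ and
$\hat{E}\{s'(t) \mid x(t), x(t - \tau)\} = ax(t) + bx(t - \tau)$
then for small $\tau$, $a = -b \approx R_{ss}''(0)/\tau R_{xx}''(0)$.
14-4. Show that, if $S_x(\omega) = 0$ for $|\omega| > \sigma = \pi/T$, then the linear MS estimate of $x(t)$
in terms of its samples $x(nT)$ equals
$\hat{E}\{x(t) \mid x(nT),\ n = -\infty, \ldots, \infty\} = \sum_{n=-\infty}^{\infty} \frac{\sin(\sigma t - n\pi)}{\sigma t - n\pi}\,x(nT)$
and the MS error equals 0.
14-5. Show that if
$\hat{E}\{s(t + \lambda) \mid s(t), s(t - \tau)\} = \hat{E}\{s(t + \lambda) \mid s(t)\}$
then $R_s(\tau) = Ie^{-\alpha|\tau|}$.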
14-6. A random sequence $x_n$ is called a martingale if $E\{x_n\} = 0$ and
$E\{x_n \mid x_{n-1}, \ldots, x_1\} = x_{n-1}$
Show that if the RVs $y_n$ are independent, then their sum $x_n = y_1 + \cdots + y_n$ is a
martingale.
14-7. A random sequence $x_n$ is called wide-sense martingale if
$\hat{E}\{x_n \mid x_{n-1}, \ldots, x_1\} = x_{n-1}$
(a) Show that a sequence $x_n$ is WS martingale if it can be written as a sum
$x_n = y_1 + \cdots + y_n$ where the RVs $y_n$ are orthogonal.
(b) Show that if the sequence $x_n$ is WS martingale, then
$E\{x_n^2\} \ge E\{x_{n-1}^2\} \ge \cdots \ge E\{x_1^2\}$
Hint: $x_n = x_n - x_{n-1} + x_{n-1}$ and $x_n - x_{n-1} \perp x_{n-1}$.
14-8. Find the noncausal estimators H^ai) and Я,(ы) respectively of a process s(r) and
its derivative s'(f) in terms of the data x(r) = s(z) + v(r) where
•R,(r) - Л Sl” г°Т RM - Я3(т) R,M = 0
14-9. We denote by $H_s(\omega)$ and $H_y(\omega)$ respectively the noncausal estimators of the input
$s(t)$ and the output $y(t)$ of the system $T(\omega)$ in terms of the data $x(t)$ (Fig. P14-9).
Show that $H_y(\omega) = H_s(\omega)T(\omega)$.
FIGURE P14-9
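14-10. Show that if $S(\omega) = 1/(1 + \omega^4)$, then the predictor of $s(t)$ in terms of its entire
past equals $\hat{s}(t + \lambda) = b_0 s(t) + b_1 s'(t)$ where
$b_0 = e^{-\lambda/\sqrt{2}}\left(\cos\frac{\lambda}{\sqrt{2}} + \sin\frac{\lambda}{\sqrt{2}}\right) \qquad b_1 = \sqrt{2}\,e^{-\lambda/\sqrt{2}}\sin\frac{\lambda}{\sqrt{2}}$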
14-11. (a) Find a function Л(г) satisfying the integral equation (Wiener-Hopf)
R(t — a) da = R(r + In 2) t^O R(t) = je"7 +
(b) The function H(s) is rational with poles in the left-hand plane. The
function Y(s) is analytic in the left-hand plane. Find H(.v) and Y(x) if
r	, 49 - 25s2
-V(s>
(c) Discuss the relationship between (a) and (b).
14-12. (a) Find a sequence h„ satisfying the system
“ 1 1
XI ^k^in-k ~ ^m + 1 /И 2: 0	R„, — t +
* = 0	L J
(b) The function H(z) is rational with poles in the unit circle. The function Y(z)
is rational with poles outside the unit circle. Find H(z) and Y(z) if
r ,	70 - 25(z + z-‘)
[H(z) - z]------------j--------------------= Y(z)
6(z + z ') — 35(z + z ’) + 50
(c) Discuss the relationship between (a) and (b).
14-13. Show that if $H(z)$ is a predictor of a process $s[n]$ and $H_a(z)$ is an all-pass function
such that $|H_a(e^{j\omega})| = 1$, then the function $1 - (1 - H(z))H_a(z)$ is also a predictor
with the same MS error $P$.
14-14. We have shown that the one-step predictor $\hat{s}_1[n]$ of an AR process of order $m$ in
terms of its entire past equals [see (14-35)]
$\hat{E}\{s[n] \mid s[n - k],\ k \ge 1\} = -\sum_{k=1}^{m} a_k s[n - k]$
Show that its two-step predictor $\hat{s}_2[n]$ is given by
$\hat{E}\{s[n] \mid s[n - k],\ k \ge 2\} = -a_1\hat{s}_1[n - 1] - \sum_{k=2}^{m} a_k s[n - k]$
14-15. Using (14-70) show that
..	„ In A,v 1
N-*	N 2тт1_„ЬЛш)аы
Hint:
1	A„tl 1	1	д
N , П-Д =	1,1	1 “ vIn “* lim In—*4 *
2	л-1 ил 'v	N	N~<r.
14-16. Find the predictor
$\hat{s}_N[n] = \hat{E}\{s[n] \mid s[n - k],\ 1 \le k \le N\}$
of a process $s[n]$ and realize the error filter $E_N(z)$ as an FIR filter (Fig. 14-8) and
as a lattice filter (Fig. 13-15) for $N = 1, 2,$ and $3$ if
$R[m] = \begin{cases} 5(3 - |m|) & |m| < 3 \\ 0 & |m| \ge 3 \end{cases}$
14-17. The lattice filter of a process $s[n]$ is shown in Fig. P14-17 for $N = 3$. Find the
corresponding FIR filter for $N = 1, 2,$ and $3$ and the values of $R[m]$ for $|m| < 3$ if
$R[0] = 5$.
FIGURE P14-17
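14-18. We wish to find the estimate $\hat{s}(t)$ of the random telegraph signal $s(t)$ in terms of
the sum $x(t) = s(t) + v(t)$ and its past, where
$R_s(\tau) = e^{-2\lambda|\tau|} \qquad R_v(\tau) = N\delta(\tau) \qquad R_{sv}(\tau) = 0$
Show that
$\hat{s}(t) = (c - 2\lambda)\int_0^{\infty} x(t - \alpha)e^{-c\alpha}\,d\alpha \qquad c = 2\lambda\sqrt{1 + \frac{1}{\lambda N}}$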
14-19. Show that if eN[«] is the forward prediction error and с^[л] is the backward
prediction error of a process s[/t], then (o) ew[/i] ± e(V+,„[/i + ml, (b) enW 1
E/v+zJh “ «J, <c) ЕдДл] -L - N -m].
14-20. If $x(t) = s(t) + v(t)$:
Д;(т) =	ЛДт) = 56(т) Л5М(т) - 0
Find the following MS estimates and the corresponding MS errors: (a) the
noncausal filter of $s(t)$; (b) the causal filter of $s(t)$; (c) the estimate of $s(t + 2)$ in
terms of $s(t)$ and its past; (d) the estimate of $s(t + 2)$ in terms of $x(t)$ and its past.
14-21. If $x[n] = s[n] + v[n]$:
$R_s[m] = 5 \times 0.8^{|m|} \qquad R_v[m] = 5\delta[m] \qquad R_{sv}[m] = 0$
Find the following MS estimates and the corresponding MS errors: (a) the
noncausal filter of $s[n]$; (b) the causal filter of $s[n]$; (c) the estimate of $s[n + 1]$ in
terms of $s[n]$ and its past; (d) the estimate of $s[n + 1]$ in terms of $x[n]$ and its
past.
14-22. Find the Kalman estimate
$\hat{s}_0[n] = \hat{E}\{s[n] \mid s[k] + v[k],\ 0 \le k \le n\}$
of $s[n]$ and the MS error $P_n = E\{(s[n] - \hat{s}_0[n])^2\}$ if
$R_s[m] = 5 \times 0.8^{|m|} \qquad R_v[m] = 5\delta[m] \qquad R_{sv}[m] = 0$
14-23. Find the Kalman estimate
$\hat{s}_0(t) = \hat{E}\{s(t) \mid s(\tau) + v(\tau),\ 0 \le \tau \le t\}$
of $s(t)$ and the MS error $P(t) = E\{[s(t) - \hat{s}_0(t)]^2\}$ if
Я,(т)-5С-«’1	Я.(т) = ^(т)	R„(r) = 0
14-24. Show that the sequences $q_N[m]$ and $\check{q}_N[m]$ of the inverse lattice of Fig. 14-11d
satisfy (14-85) and (14-86) (see Note 1 page 496).
CHAPTER
15
ENTROPY
15-1 INTRODUCTION
As we have noted in Chap. 1, the probability $P(\mathcal{A})$ of an event $\mathcal{A}$ can be
interpreted as a measure of our uncertainty about the occurrence or nonoccurrence
of $\mathcal{A}$ in a single performance of the underlying experiment. If
$P(\mathcal{A}) = 0.999$, then we are almost certain that $\mathcal{A}$ will occur; if $P(\mathcal{A}) = 0.1$,
then we are reasonably certain that $\mathcal{A}$ will not occur; our uncertainty is
maximum if $P(\mathcal{A}) = 0.5$. In this chapter, we consider the problem of assigning
a measure of uncertainty to the occurrence or nonoccurrence not of a single
event of $\mathcal{S}$ but of any event $\mathcal{A}_i$ of a partition $\mathfrak{A}$ of $\mathcal{S}$ where, as we recall, a
partition is a collection of mutually exclusive events whose union equals $\mathcal{S}$ (Fig.
15-1). The measure of uncertainty about $\mathfrak{A}$ will be denoted by $H(\mathfrak{A})$ and will be
called the entropy of the partitioning $\mathfrak{A}$.
Historically, the functional $H(\mathfrak{A})$ was derived from a number of postulates
based on our heuristic understanding of uncertainty. The following is a typical
set of such postulates†:
1.	$H(\mathfrak{A})$ is a continuous function of $p_i = P(\mathcal{A}_i)$.
2.	If $p_1 = \cdots = p_N = 1/N$, then $H(\mathfrak{A})$ is an increasing function of $N$.
3.	If a new partition $\mathfrak{B}$ is formed by subdividing one of the sets of $\mathfrak{A}$, then
$H(\mathfrak{B}) \ge H(\mathfrak{A})$.
†C. E. Shannon and W. Weaver: The Mathematical Theory of Communication, University of Illinois
Press, 1949.
FIGURE 15-1
It can be shown that the sum
$H(\mathfrak{A}) = -p_1\log p_1 - \cdots - p_N\log p_N$    (15-1)†
satisfies these postulates and it is unique within a constant factor. The proof of
this assertion is not difficult but we choose not to reproduce it. We propose,
instead, to introduce (15-1) as the definition of entropy and to develop axiomati-
cally all its properties within the framework of probability. It is true that the
introduction of entropy in terms of postulates establishes a link between the
sum in (15-1) and our heuristic understanding of uncertainty. However, for our
purposes, this is only incidental. In the last analysis, the justification of the
concept must ultimately rely on the usefulness of the resulting theory.
The applications of entropy can be divided into two categories. The first
deals with problems involving the determination of unknown distributions (Sec.
15-4). The available information is in the form of known expected values or
other statistical functionals, and the solution is based on the principle of
maximum entropy: We determine the unknown distributions so as to maximize
the entropy $H(\mathfrak{A})$ of some partition $\mathfrak{A}$ subject to the given constraints (statistical
mechanics). In the second category (coding theory), we are given $H(\mathfrak{A})$
(source entropy) and we wish to construct various random variables (code
lengths) so as to minimize their expected values (Sec. 15-5). The solution
involves the construction of optimum mappings (codes) of the random variables
under consideration, into the given probability space.
Uncertainty and information In the heuristic interpretation of entropy,
the number $H(\mathfrak{A})$ is a measure of our uncertainty about the events $\mathcal{A}_i$ of
the partition $\mathfrak{A}$ prior to the performance of the underlying experiment. If the
experiment is performed and the results concerning $\mathcal{A}_i$ become known, the
uncertainty is removed. We can thus say that the experiment provides information
about the events $\mathcal{A}_i$ equal to the entropy of their partition. Thus uncertainty
equals information and both are measured by the sum in (15-1).
tWe shall use as logarithmic base either the number 2 or the number e. In the first case, the unit of
entropy is the bit.
Example 15-1. (a) We shall determine the entropy of the partition $\mathfrak{A}$ = [even, odd]
in the fair-die experiment. Clearly, $P\{\text{even}\} = P\{\text{odd}\} = 1/2$. Hence
$H(\mathfrak{A}) = -\tfrac{1}{2}\log\tfrac{1}{2} - \tfrac{1}{2}\log\tfrac{1}{2} = \log 2$
(b) In the same experiment, $\mathfrak{G}$ is the partition consisting of the elementary
events $\{f_i\}$. In this case, $P\{f_i\} = 1/6$; hence
$H(\mathfrak{G}) = -\tfrac{1}{6}\log\tfrac{1}{6} - \cdots - \tfrac{1}{6}\log\tfrac{1}{6} = \log 6$
If the die is rolled and we are told which face showed, then we gain
information about the partition $\mathfrak{G}$ equal to its entropy $\log 6$. If we are told merely
that "even" or "odd" showed, then we gain information about the partition $\mathfrak{A}$
equal to its entropy $\log 2$. In this case, the information gained about the partition
$\mathfrak{G}$ equals again $\log 2$. As we shall see, the difference $\log 6 - \log 2 = \log 3$ is the
uncertainty about $\mathfrak{G}$ assuming $\mathfrak{A}$ (conditional entropy).
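The numbers in this example are easy to reproduce. The short Python sketch below evaluates the sum (15-1) for the two partitions; the helper name `entropy` and the choice of natural logarithms are assumptions made for illustration (passing `base=2` would give the result in bits).

```python
import math


def entropy(probs, base=math.e):
    """Entropy -sum p*log(p) of a partition with the given event probabilities."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


h_parity = entropy([1 / 2, 1 / 2])   # partition [even, odd]
h_faces = entropy([1 / 6] * 6)       # element partition of the fair die
conditional = h_faces - h_parity     # uncertainty about the face once parity is known
```

The difference `conditional` equals $\log 3$, in agreement with the remark that closes the example.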
Example 15-2. We consider now the coin experiment where $P\{h\} = p$. In this case,
the entropy of $\mathfrak{G}$ equals
$H(\mathfrak{G}) = -p\log p - (1 - p)\log(1 - p) \equiv r(p)$    (15-2)
The function $r(p)$ is shown in Fig. 15-2 for $0 \le p \le 1$. This function is symmetrical,
convex, even about the point $p = 0.5$, and it reaches its maximum at that point.
Furthermore, $r(0) = r(1) = 0$.
Historical note The term entropy as a scientific concept was first used in thermodynam-
ics (Clausius 1850). Its probabilistic interpretation in the context of statistical mechanics
is attributed to Boltzmann (1877). However, the explicit relationship between entropy
and probability was recorded several years later (Planck, 1906). Shannon, in his cele-
brated paper (1948), used the concept to give an economical description of the properties
of long sequences of symbols, and applied the results to a number of basic problems in
coding theory and data transmission. His remarkable contributions form the basis of
modern information theory. Jaynes† (1957) reexamined the method of maximum entropy
and applied it to a variety of problems involving the determination of unknown parame-
ters from incomplete data.
Maximum entropy and classical definition. An important application of entropy
is the determination of the probabilities $p_i$ of the events of a partition $\mathfrak{A}$,
subject to various constraints, with the method of maximum entropy (MEM).
The method states that the unknown $p_i$'s must be so chosen as to maximize the
entropy of $\mathfrak{A}$ subject to the given constraints. This topic is considered in Sec.
15-4. In the following we introduce the main idea and we show the equivalence
between the MEM and the classical definition of probability (principle of
insufficient reason), using as illustration the die experiment.
†E. T. Jaynes: Physical Review, vols. 106-107, 1957.
Example 15-3. (a) We wish to determine the probabilities $p_i$ of the six faces of a
die, having access to no prior information. The MEM states that the $p_i$'s must be
such as to maximize the sum
$H(\mathfrak{G}) = -p_1\log p_1 - \cdots - p_6\log p_6$
Since $p_1 + \cdots + p_6 = 1$, this yields
$p_1 = \cdots = p_6 = \tfrac{1}{6}$
in agreement with the classical definition.
(b) Suppose now that we are given the following information: A player
places a bet of one dollar on "odd" and he wins, on the average, 20 cents per
game. We wish again to determine the $p_i$'s using the MEM; however, now we must
satisfy the constraints
$p_1 + p_3 + p_5 = 0.6 \qquad p_2 + p_4 + p_6 = 0.4$
This is a consequence of the available information because an average gain of 20
cents means that $P\{\text{odd}\} - P\{\text{even}\} = 0.2$. Maximizing $H(\mathfrak{G})$ subject to the above
constraints, we obtain
$p_1 = p_3 = p_5 = 0.2 \qquad p_2 = p_4 = p_6 = 0.133\ldots$
This agrees again with the classical definition if we apply the principle of insufficient
reason to the outcomes of the events {odd} and {even} separately.
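The constrained maximum of part (b) can be checked by brute force: no other assignment satisfying the two constraints should have larger entropy. The Python sketch below is a numerical illustration, not a derivation; the random search, the seed, the sample count, and the helper names `entropy` and `split` are assumptions introduced here.

```python
import math
import random


def entropy(probs):
    """Entropy in nats; terms with p = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def split(total, rng):
    """Random split of `total` into three nonnegative parts."""
    a, b = sorted((rng.uniform(0, total), rng.uniform(0, total)))
    return [a, b - a, total - b]


# Maximum-entropy solution quoted in the example, ordered p1, ..., p6.
me_solution = [0.2, 0.4 / 3, 0.2, 0.4 / 3, 0.2, 0.4 / 3]
h_me = entropy(me_solution)

rng = random.Random(1)
best_random = 0.0
for _ in range(10000):
    odd = split(0.6, rng)    # candidate p1, p3, p5 summing to 0.6
    even = split(0.4, rng)   # candidate p2, p4, p6 summing to 0.4
    candidate = [odd[0], even[0], odd[1], even[1], odd[2], even[2]]
    best_random = max(best_random, entropy(candidate))
```

Every feasible candidate stays at or below `h_me`, consistent with the statement that the entropy is largest when the probabilities are equal within each of the two groups.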
Although conceptually the ME principle is equivalent to the principle of
insufficient reason, operationally the MEM simplifies the analysis drastically
when, as is the case in most applications, the constraints are phrased in terms of
probabilities in the space $\mathcal{S}^n$ of repeated trials. In such cases the equivalence
still holds, although it is less obvious, but the reasoning is involved and rather
forced if we derive the unknown probabilities starting from the classical defini-
tion.
The MEM is thus a valuable tool in the solution of applied problems. It is
used, in fact, even in deterministic problems involving the estimation of unknown
parameters from insufficient data. The ME principle is then accepted as
a smoothness criterion. We should emphasize, however, that as in the case of
the classical definition, the conclusions drawn from the ME principle must be
accepted with skepticism, particularly when they involve elaborate constraints.
This is evident even in the interpretation of the results in Example 15-3: In the
absence of prior constraints, we conclude that all $p_i$'s must be equal. This
conclusion we accept readily because it is not in conflict with our experience
concerning dice. The second conclusion, however, that $p_2 = p_4 = p_6 = 0.133$
and $p_1 = p_3 = p_5 = 0.2$ is not as convincing, we would think, even though we
have no basis for any other conclusion. In our experience, no crooked dice
exhibit such symmetries.
One might argue that this apparent conflict between the MEM and our
experience is due to the fact that we did not make total use of our prior
knowledge. Had we included among the constraints everything we know about
dice, there would be no conflict. This might be true; however, it is not always
clear how such constraints can be phrased analytically and, even if they can,
how complex the required computations might be.
Typical Sequences and Relative Frequency
Suppose that $\mathfrak{A} = [\mathcal{A}_1, \ldots, \mathcal{A}_N]$ is an $N$-element partition of an experiment
$\mathcal{S}$. In the space $\mathcal{S}^n$ of repeated trials, the elements of $\mathfrak{A}$ form $N^n$
sequences of the form
$\{\mathcal{A}_i \text{ occurs } n_i \text{ times in a specific order}\}$    (15-3)
and the probability of each sequence equals
$p_1^{n_1} \cdots p_i^{n_i} \cdots p_N^{n_N}$    (15-4)
where $p_i = P(\mathcal{A}_i)$. The numbers $n_i$ are arbitrary subject only to the constraint
$n_1 + \cdots + n_N = n$. However, according to the relative frequency interpretation
of probability, if $n$ is "sufficiently large," then "almost certainly"
$n_i \approx np_i$    (15-5)
This is, of course, only a heuristic statement; hence the resulting consequences
must be interpreted accordingly. However, as we know, the approximation
(15-5) can be given a precise interpretation in the form of the law of large
numbers. Following a similar approach, we prove at the end of the section the
main consequence [Eq. (15-10)] of (15-5) in the context of entropy.
Guided by (15-5), we shall separate the $N^n$ sequences of the form (15-3)
into two groups: (a) typical and (b) rare. We shall say that a sequence is typical
if $n_i \approx np_i$. All other sequences will be called rare. A typical sequence will be
identified with the letter $t$:
$t = \{\mathcal{A}_i \text{ occurs } n_i \approx np_i \text{ times in a specific order}\}$    (15-6)
From the definition it follows that to each set of numbers $n_1, \ldots, n_N$
"close" to the numbers $np_1, \ldots, np_N$ there corresponds one typical sequence.
The union of all typical sequences will be denoted by $T$. Thus $T$ is the totality of
all sequences of the form (15-3) where $n_i \approx np_i$. As we noted, it is almost certain
that for large $n$, each observed sequence is typical. This leads to the conclusion
that
$P(T) \approx 1$    (15-7)
The complement $\bar{T}$ of $T$ is the union of all rare sequences and its probability is
negligible for large $n$:
$P(\bar{T}) \approx 0$    (15-8)
Since $n_i \approx np_i$ for all typical sequences, (15-4) yields
$p_1^{n_1} \cdots p_N^{n_N} \approx e^{np_1\ln p_1 + \cdots + np_N\ln p_N}$
Hence the probability of each typical sequence equals
$P(t) = e^{-nH(\mathfrak{A})}$    (15-9)
where $H(\mathfrak{A})$ is the entropy of the partition $\mathfrak{A}$. Denoting by $n_T$ the number of
typical sequences, we conclude from (15-7) and the above that
$n_T \approx e^{nH(\mathfrak{A})}$    (15-10)
We have thus expressed the number of typical sequences in terms of the
entropy of $\mathfrak{A}$. If all the events of $\mathfrak{A}$ are equally likely, then $H(\mathfrak{A}) = \ln N$ and
$n_T = N^n$. In all other cases, $H(\mathfrak{A}) < \ln N$ [see (15-38)]. Hence
$n_T \ll N^n \quad \text{for } n \gg 1$    (15-11)
This leads to the important conclusion that, if $n$ is sufficiently large, then most
sequences are rare even though "almost certainly" none will occur.
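To get a feel for (15-10) and (15-11), the heuristic ratio $n_T/N^n = e^{n(H - \ln N)}$ can be evaluated for a specific partition. The Python sketch below uses the biased die of Example 15-3(b) as an assumed illustration; the function name `typical_fraction` and the chosen values of $n$ are not from the text.

```python
import math


def typical_fraction(probs, n):
    """Heuristic ratio n_T / N**n = exp(n * (H - ln N)) from (15-10)."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(n * (h - math.log(len(probs))))


biased = [0.2, 0.4 / 3] * 3                       # p1, ..., p6 of Example 15-3(b)
fractions = [typical_fraction(biased, n) for n in (10, 100, 1000)]
uniform = typical_fraction([1 / 6] * 6, 1000)     # equally likely faces
```

Even for this mildly biased die the fraction of typical sequences shrinks geometrically with $n$, whereas for equally likely faces the ratio stays at 1.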
Notes 1. We should point out that each typical sequence is not more likely than each
rare sequence. In fact, the sequence with the largest probability is the rare sequence
$\{\mathcal{A}_m \text{ occurs } n \text{ times}\}$, where $\mathcal{A}_m$ is the event with the largest probability. As we
presently show, the distinction between typical and rare sequences is best expressed in
terms of the events
$\{\mathcal{A}_i \text{ occurs } n_i \text{ times in any order}\}$
As we know [see (3-38)], the probability of these events equals
$\frac{n!}{n_1! \cdots n_N!}\,p_1^{n_1} \cdots p_N^{n_N}$
and for large $n$, it takes significant values only in a small vicinity of the point
$(n_1 = np_1, \ldots, n_N = np_N)$. This follows by repeating the argument leading to (3-17)
or, from the DeMoivre-Laplace approximation (3-39).
2. On page 1 of Chap. 1 we noted that the theory of probability applied to
averages of mass phenomena leads to useful results only if the ratio $k/n$ approaches a
constant as $n$ increases and this constant is the same for any subsequence. This
apparently mild requirement results in severe restrictions on the properties of the
resulting sequences. It leads to the conclusion that of all possible $N^n$ sequences formed
with the $N$ elements of a partition $\mathfrak{A}$, only the typical sequences are likely to
occur; all other sequences are nearly impossible.
Typical Sequences and the Law of Large Numbers
We show next that the preceding results can be reestablished rigorously as
consequences of the law of large numbers. For simplicity, we consider only
two-element partitions and, to be concrete, we assume that $\mathcal{A}$ and $\bar{\mathcal{A}}$ are the
events "heads" and "tails" respectively in the coin experiment. In the space $\mathcal{S}^n$
the probability of the elementary event $\{\zeta_k\}$ = {$k$ heads in a specific order}
equals
$P\{\zeta_k\} = p^k q^{n-k}$    (15-12)
and the probability of the event†
$\mathcal{A}_k$ = {$k$ heads in any order}
equals
$P(\mathcal{A}_k) = \binom{n}{k}p^k q^{n-k}$    (15-13)
In Fig. 15-3 we plot the probability $P(\mathcal{A}_k)$, the geometric progression
$p^k q^{n-k}$, and the binomial coefficients $\binom{n}{k}$
as functions of $k$.
†The event $\mathcal{A}_k$ is not, of course, an element of the partition $\mathfrak{A}$.
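α-TYPICAL SEQUENCES. Given a number $\alpha$ between 0 and 1, we form the
number $\varepsilon$ such that
$\alpha = 2G\!\left(\varepsilon\sqrt{n/pq}\right) - 1$    (15-14)
where $G(x)$ is the normal distribution. We shall say that the sequence $\zeta_k$ is
α-typical if $k$ is such that
$k_1 < k < k_2$ where $k_1 = n(p - \varepsilon) \qquad k_2 = n(p + \varepsilon)$    (15-15)
The union of all α-typical sequences is a set $T$ consisting of
$n_T = \sum_{k=k_1}^{k_2} \binom{n}{k}$    (15-16)
elements and its probability equals $\alpha$ [see (3-37)]
$P(T) = \sum_{k=k_1}^{k_2} \binom{n}{k}p^k q^{n-k} = 2G\!\left(\frac{n\varepsilon}{\sqrt{npq}}\right) - 1 = \alpha$    (15-17)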
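FUNDAMENTAL THEOREM. For any $\alpha < 1$, the number $n_T$ of α-typical sequences
tends to $e^{nH(\mathfrak{A})}$ in the following sense
$\frac{\ln n_T}{n} \to H(\mathfrak{A}) \quad \text{as } n \to \infty$    (15-18)
Proof. If $p = q = 0.5$, then the DeMoivre-Laplace approximation yields
$\binom{n}{k} \approx \frac{2^n}{\sqrt{\pi n/2}}\,e^{-2(k - n/2)^2/n}$
for $k$ in the $\sqrt{n}$ vicinity of $n/2$. This approximation cannot be used to evaluate
the sum in (15-16) for $p \ne 0.5$ because then the center $np$ of the interval
$(k_1, k_2)$ is not $n/2$. We shall bound $n_T$ using (15-13) and (15-16). Clearly,
$n_T = \sum_{k=k_1}^{k_2} p^{-k}q^{k-n}P(\mathcal{A}_k)$    (15-19)
where we assume that $p < q$. As $k$ increases, the term $p^{-k}q^{k-n}$ increases
monotonically. Hence
$q^{-n}\left(\frac{q}{p}\right)^{k_1}\sum_{k=k_1}^{k_2} P(\mathcal{A}_k) < n_T < q^{-n}\left(\frac{q}{p}\right)^{k_2}\sum_{k=k_1}^{k_2} P(\mathcal{A}_k)$    (15-20)
And since [see (15-17)]
$\sum_{k=k_1}^{k_2} P(\mathcal{A}_k) = P(T) = \alpha$
(15-20) yields
$\alpha\,q^{-n}\left(\frac{q}{p}\right)^{k_1} < n_T < \alpha\,q^{-n}\left(\frac{q}{p}\right)^{k_2}$    (15-21)
Setting $k_1 = np - n\varepsilon$ and $k_2 = np + n\varepsilon$ in the above and using the identity
$p^{-np}q^{-nq} = e^{-n(p\ln p + q\ln q)} = e^{nH(\mathfrak{A})}$
we conclude from (15-21) that
$\alpha\,e^{nH(\mathfrak{A})}\left(\frac{q}{p}\right)^{-n\varepsilon} < n_T < \alpha\,e^{nH(\mathfrak{A})}\left(\frac{q}{p}\right)^{n\varepsilon}$
Hence
$nH(\mathfrak{A}) + \ln\alpha - n\varepsilon\ln\frac{q}{p} < \ln n_T < nH(\mathfrak{A}) + \ln\alpha + n\varepsilon\ln\frac{q}{p}$
Dividing by $n$, we obtain (15-18) because $\alpha$ is constant and, as we see
from (15-14), $\varepsilon \to 0$ as $n \to \infty$.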
Important conclusion Theorem (15-18) holds for any $\alpha < 1$; it will be
assumed, however, that $\alpha \approx 1$ and the corresponding sequences will be called
typical. With this assumption
$P(T) = \alpha \approx 1 \qquad P(\bar{T}) = 1 - \alpha \approx 0$    (15-22)
The probability of an arbitrary event $\mathcal{M}$ equals, therefore, its conditional
probability
$P(\mathcal{M}) = P(\mathcal{M} \mid T)P(T) + P(\mathcal{M} \mid \bar{T})P(\bar{T}) \approx P(\mathcal{M} \mid T)$    (15-23)
In other words, in any conclusions concerning probabilities in the space $\mathcal{S}^n$ it
suffices to consider the subspace of $\mathcal{S}^n$ consisting of typical sequences only.
This is, of course, only approximately true for finite $n$. It is, however, exact in
the limit as $n \to \infty$.
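The limit (15-18) can be observed numerically by counting the α-typical sequences exactly for a few values of $n$. The Python sketch below is an illustration under stated assumptions: $p = 0.3$ and $\alpha = 0.99$ are arbitrary choices, the interval endpoints are rounded to integers (inclusive of the endpoints, a detail that does not affect the limit), and the function names are introduced here.

```python
import math
from statistics import NormalDist


def log_count_typical(n, p, alpha):
    """ln n_T, where n_T sums the binomial coefficients over n(p - eps) .. n(p + eps)."""
    q = 1.0 - p
    z = NormalDist().inv_cdf((1.0 + alpha) / 2.0)   # solves alpha = 2 G(z) - 1
    eps = z * math.sqrt(p * q / n)                  # eps of (15-14)
    k1 = max(0, math.ceil(n * (p - eps)))
    k2 = min(n, math.floor(n * (p + eps)))
    n_t = sum(math.comb(n, k) for k in range(k1, k2 + 1))
    return math.log(n_t)                            # math.log accepts large integers


def entropy_nats(p):
    """Binary entropy r(p) with natural logarithms."""
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


p, alpha = 0.3, 0.99
gaps = [abs(log_count_typical(n, p, alpha) / n - entropy_nats(p))
        for n in (100, 1000, 10000)]
```

The entries of `gaps` shrink as $n$ grows, roughly like $1/\sqrt{n}$, which matches the bound $n\varepsilon\ln(q/p)$ appearing in the proof.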
CONCLUDING REMARKS. In Chap. 1, we presented the following interpretations
of the probability $P(\mathcal{A})$ of an event $\mathcal{A}$.
Axiomatic. $P(\mathcal{A})$ is a number assigned to the event $\mathcal{A}$. This number satisfies
three axioms but is otherwise arbitrary.
Empirical. For large $n$,
$P(\mathcal{A}) \approx \frac{k}{n}$
where $k$ is the number of times $\mathcal{A}$ occurs in $n$ repetitions of the underlying
experiment $\mathcal{S}$.
Subjective. $P(\mathcal{A})$ is a measure of our uncertainty about the occurrence of
$\mathcal{A}$ in a single performance of $\mathcal{S}$.
Principle of insufficient reason. If $\mathcal{A}_i$ are $N$ events of a partition $\mathfrak{A}$ of $\mathcal{S}$ and
nothing is known about their probabilities, then $P(\mathcal{A}_i) = 1/N$.
We give next four related interpretations of the entropy $H(\mathfrak{A})$ of $\mathfrak{A}$.
Axiomatic. $H(\mathfrak{A})$ is a number assigned to each partition $\mathfrak{A}$ of $\mathcal{S}$. This number
equals the sum $-\sum p_i\ln p_i$ where $p_i = P(\mathcal{A}_i)$.
Empirical. This interpretation involves the repeated performance not of the
experiment $\mathcal{S}$ but of the experiment $\mathcal{S}^n$ of repeated trials. In this experiment,
a specific typical sequence $t_j$ is an event with probability $e^{-nH(\mathfrak{A})}$.
Applying the relative frequency interpretation of probability to this event, we
conclude that if the experiment $\mathcal{S}^n$ is repeated $m$ times and the event $t_j$
occurs $m_j$ times, then for sufficiently large $m$,
$P(t_j) = e^{-nH(\mathfrak{A})} \approx \frac{m_j}{m}$ hence $H(\mathfrak{A}) \approx \frac{1}{n}\ln\frac{m}{m_j}$
This relates the theoretical quantity $H(\mathfrak{A})$ to the experimental numbers $m_j$
and $m$.
Subjective. $H(\mathfrak{A})$ is a measure of our uncertainty about the occurrence of the
events $\mathcal{A}_i$ of the partition $\mathfrak{A}$ in a single performance of $\mathcal{S}$.
Principle of maximum entropy. The probabilities $p_i = P(\mathcal{A}_i)$ must be such as
to maximize $H(\mathfrak{A})$ subject to the given constraints. Since $n_T = e^{nH(\mathfrak{A})}$, the ME
principle is equivalent to the principle of maximizing the number of typical
sequences. If there are no constraints, that is, if nothing is known about the
probabilities $p_i$, then the ME principle leads to the estimates $p_i = 1/N$,
$H(\mathfrak{A}) = \ln N$, and $n_T = N^n$.
15-2 BASIC CONCEPTS
In this section, we develop deductively the properties of entropy starting with
various notations and set operations. At the end of the section, we reexamine
the results in terms of the heuristic notion of entropy as a measure of
uncertainty, and we conclude with a typical sequence interpretation of the main
theorems.
DEFINITIONS. The notation
91 — [л/p..	or simply 91 = [.oz']
will mean that 91 is a partition consisting of the events л/. These events will be
called1 elementst of 91.
tit will be clear from lhe context whether the word element means an event of a partition Я or
an clement of the space
15-2 HASH CONCIJ-TS 543
FIGLIRE 15-1
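I.	A partition with only two elements will be called binary. Thus
$\mathfrak{A} = [\mathcal{A}, \bar{\mathcal{A}}]$
is a binary partition consisting of the event $\mathcal{A}$ and its complement $\bar{\mathcal{A}}$.
II.	A partition whose elements are the elementary events $\{\zeta_i\}$ of the space $\mathcal{S}$
will be denoted by $\mathfrak{G}$ and will be called the element partition.
III.	A refinement of a partition $\mathfrak{A}$ is a partition $\mathfrak{B}$ such that each element $\mathcal{B}_j$ of
$\mathfrak{B}$ is a subset of some element $\mathcal{A}_i$ of $\mathfrak{A}$ (Fig. 15-4). We shall use the
notation $\mathfrak{B} \prec \mathfrak{A}$ to indicate that $\mathfrak{B}$ is a refinement of $\mathfrak{A}$ and we shall say
that $\mathfrak{A}$ is larger† than $\mathfrak{B}$. Thus
$\mathfrak{B} \prec \mathfrak{A}$  iff  $\mathcal{B}_j \subset \mathcal{A}_i$	(15-24)
A common refinement of two partitions is a refinement of both.
The partition $\mathfrak{C}$ in Fig. 15-5 is a common refinement of the partitions $\mathfrak{A}$
and $\mathfrak{B}$.
IV.	The product‡ of two partitions $\mathfrak{A} = [\mathcal{A}_i]$ and $\mathfrak{B} = [\mathcal{B}_j]$ is a partition whose
elements are all intersections $\mathcal{A}_i\mathcal{B}_j$ of the elements of $\mathfrak{A}$ and $\mathfrak{B}$. This
partition will be denoted by
$\mathfrak{A} \cdot \mathfrak{B}$
Clearly, $\mathfrak{A} \cdot \mathfrak{B}$ is the largest common refinement of $\mathfrak{A}$ and $\mathfrak{B}$.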
FIGURE 15-5

†The symbol $\prec$ is not an ordering of two arbitrary partitions. It has a meaning only if $\mathfrak{B}$ is a
refinement of $\mathfrak{A}$.
‡We should emphasize that partition product is not a set operation.
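Properties From the definition it follows that
$\mathfrak{G} \prec \mathfrak{A}$ for any $\mathfrak{A}$
$\mathfrak{A} \cdot \mathfrak{B} = \mathfrak{B} \cdot \mathfrak{A}$	$\mathfrak{A} \cdot (\mathfrak{B} \cdot \mathfrak{C}) = (\mathfrak{A} \cdot \mathfrak{B}) \cdot \mathfrak{C}$
If $\mathfrak{A}_1 \prec \mathfrak{A}_2 \prec \mathfrak{A}_3$ then $\mathfrak{A}_1 \prec \mathfrak{A}_3$
If $\mathfrak{B} \prec \mathfrak{A}$ then $\mathfrak{A} \cdot \mathfrak{B} = \mathfrak{B}$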
ENTROPY. The entropy of a partition $\mathfrak{A}$ is by definition the sum
$H(\mathfrak{A}) = -(p_1 \log p_1 + \cdots + p_N \log p_N) = \sum_{i=1}^{N} \varphi(p_i)$	(15-25)
where $p_i = P(\mathcal{A}_i)$ and $\varphi(p) = -p \log p$.
Since $\varphi(p) \ge 0$ for $0 \le p \le 1$, it follows from (15-25) that
$H(\mathfrak{A}) \ge 0$	(15-26)
where $H(\mathfrak{A}) = 0$ iff one of the $p_i$'s equals 1; all others are then equal to 0.
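Binary partitions If $\mathfrak{A} = [\mathcal{A}, \bar{\mathcal{A}}]$ and $P(\mathcal{A}) = p$, then (Fig. 15-2)
$H(\mathfrak{A}) = -p \log p - (1 - p)\log(1 - p) \equiv r(p)$	(15-27)
Equally likely events If
$p_1 = p_2 = \cdots = p_N$
then
$H(\mathfrak{A}) = -\frac{1}{N}\log\frac{1}{N} - \cdots - \frac{1}{N}\log\frac{1}{N} = \log N$	(15-28)
If, in particular, $N = 2^m$ and the logarithm is taken to base 2, then $H(\mathfrak{A}) = m$.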
INEQUALITIES. The function $\varphi(p) = -p \log p$ is concave. Therefore (see Fig.
15-6 and Prob. 15-2)
$\varphi(p_1 + p_2) < \varphi(p_1) + \varphi(p_2) < \varphi(p_1 + \varepsilon) + \varphi(p_2 - \varepsilon)$	(15-29)
where
$p_1 < p_1 + \varepsilon \le p_2 - \varepsilon < p_2$	(15-30)
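This leads to the following properties of entropy:
1.	Given a partition $\mathfrak{A} = [\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_N]$, we form the partition
$\mathfrak{B} = [\mathcal{B}_a, \mathcal{B}_b, \mathcal{A}_2, \ldots, \mathcal{A}_N]$ obtained by splitting $\mathcal{A}_1$ into the elements $\mathcal{B}_a$ and $\mathcal{B}_b$
as in Fig. 15-7. We maintain that
$H(\mathfrak{A}) \le H(\mathfrak{B})$	(15-31)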
Proof. Clearly,
$H(\mathfrak{A}) - \varphi(p_a + p_b) = H(\mathfrak{B}) - \varphi(p_a) - \varphi(p_b)$
because each side equals the contribution to $H(\mathfrak{A})$ and $H(\mathfrak{B})$ respectively due
to the common elements of $\mathfrak{A}$ and $\mathfrak{B}$. Hence (15-31) follows from the first
inequality in (15-29).
FIGURE 15-6
Example 15-4. In the next table we list the probabilities of the events of a partition
$\mathfrak{A}$ and of its refinement $\mathfrak{B}$ obtained as above.
$\mathfrak{A}$:   $p = 0.4$                     0.35    0.25
$\mathfrak{B}$:   $p_a = 0.22$   $p_b = 0.18$   0.35    0.25
In this case (logarithms to base 2),
$H(\mathfrak{A}) = -(0.4 \log 0.4 + 0.35 \log 0.35 + 0.25 \log 0.25) = 1.559$
$H(\mathfrak{B}) = -(0.22 \log 0.22 + 0.18 \log 0.18 + 0.35 \log 0.35 + 0.25 \log 0.25) = 1.956$
FIGURE 15-7
FIGURE 15-8
Thus
$H(\mathfrak{A}) = 1.559 < 1.956 = H(\mathfrak{B})$
in agreement with (15-31).
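The numbers in Example 15-4 can be reproduced with a few lines of code. The following is a minimal sketch, assuming base-2 logarithms (which is what the printed values imply); the helper name `entropy` is illustrative and not part of the text.

```python
import math

def entropy(probs, base=2.0):
    """Entropy -sum(p log p) of a partition; zero-probability elements contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Partition A and its refinement B from Example 15-4
# (the element with p = 0.4 is split into p_a = 0.22 and p_b = 0.18).
A = [0.4, 0.35, 0.25]
B = [0.22, 0.18, 0.35, 0.25]

h_a = entropy(A)
h_b = entropy(B)
print(round(h_a, 3), round(h_b, 3))  # 1.559 1.956
```

Splitting an element can only add uncertainty, so `h_a <= h_b`, as (15-31) states.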
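2.	If
$\mathfrak{B} \prec \mathfrak{A}$    then    $H(\mathfrak{B}) \ge H(\mathfrak{A})$	(15-32)
Proof. Repeating the construction of Fig. 15-7, we form a chain of refinements
$\mathfrak{A} = \mathfrak{A}_1 \succ \cdots \succ \mathfrak{A}_m \succ \cdots \succ \mathfrak{A}_n = \mathfrak{B}$
where $\mathfrak{A}_m$ is obtained by splitting one of the elements of $\mathfrak{A}_{m-1}$ as in Fig. 15-8.
From this and (15-31) it follows that
$H(\mathfrak{A}) = H(\mathfrak{A}_1) \le \cdots \le H(\mathfrak{A}_n) = H(\mathfrak{B})$
and (15-32) results.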
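3.	For any $\mathfrak{A}$:
$H(\mathfrak{A}) \le H(\mathfrak{G})$	(15-33)
where $\mathfrak{G}$ is the element partition.
Proof. It follows from (15-32) because $\mathfrak{G}$ is a refinement of $\mathfrak{A}$.
4.	For any $\mathfrak{A}$ and $\mathfrak{B}$:
$H(\mathfrak{A} \cdot \mathfrak{B}) \ge H(\mathfrak{A})$	$H(\mathfrak{A} \cdot \mathfrak{B}) \ge H(\mathfrak{B})$	(15-34)
Proof. It follows from (15-32) because $\mathfrak{A} \cdot \mathfrak{B}$ is a refinement of $\mathfrak{A}$ and of $\mathfrak{B}$.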
Example 15-5. In the die experiment, the probabilities of the six events
$\{f_i\}$ equal
0.1	0.1	0.15	0.2	0.2	0.25
respectively. The probabilities of the events of the partitions
$\mathfrak{A} = [\text{even}, \text{odd}]$	$\mathfrak{B} = [i \le 3, i > 3]$
are given by
$P\{\text{even}\} = 0.55$	$P\{\text{odd}\} = 0.45$	$P\{i \le 3\} = 0.35$	$P\{i > 3\} = 0.65$
FIGURE 15-9
The product $\mathfrak{A} \cdot \mathfrak{B}$ is a partition consisting of the four elements
$\{f_2\}$	$\{f_1, f_3\}$	$\{f_4, f_6\}$	$\{f_5\}$
with respective probabilities
0.1	0.25	0.45	0.2
From the above it follows that
$H(\mathfrak{A}) = 0.993$	$H(\mathfrak{B}) = 0.934$	$H(\mathfrak{A} \cdot \mathfrak{B}) = 1.815$
in agreement with (15-34).
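The product partition of Example 15-5 can be formed mechanically by intersecting the elements of the two partitions. The sketch below assumes that each event is represented as a Python set of die faces and that logarithms are to base 2; the names `product`, `prob` and `H` are illustrative helpers, not notation from the text.

```python
import math

def entropy(probs, base=2.0):
    """Entropy -sum(p log p); zero-probability elements contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Probabilities of the six faces f_1, ..., f_6 in Example 15-5.
p = {1: 0.1, 2: 0.1, 3: 0.15, 4: 0.2, 5: 0.2, 6: 0.25}

A = [{2, 4, 6}, {1, 3, 5}]   # [even, odd]
B = [{1, 2, 3}, {4, 5, 6}]   # [i <= 3, i > 3]

def product(part_a, part_b):
    """All nonempty intersections of the elements of two partitions."""
    return [a & b for a in part_a for b in part_b if a & b]

def prob(event):
    """Probability of an event given as a set of faces."""
    return sum(p[i] for i in event)

def H(partition):
    """Entropy of a partition given as a list of events."""
    return entropy([prob(e) for e in partition])

AB = product(A, B)
print(round(H(A), 3), round(H(B), 3), round(H(AB), 3))  # 0.993 0.934 1.815
```

Because `AB` refines both `A` and `B`, its entropy is at least as large as either, as in (15-34).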
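5.	Suppose that $\mathfrak{A}$ and $\mathfrak{B}$ are two partitions that have the same elements
except the first two (Fig. 15-9)
$\mathfrak{A} = [\mathcal{A}_1, \mathcal{A}_2, \mathcal{A}_3, \ldots, \mathcal{A}_N]$	$\mathfrak{B} = [\mathcal{B}_1, \mathcal{B}_2, \mathcal{A}_3, \ldots, \mathcal{A}_N]$
We maintain that if
$P(\mathcal{A}_1) = p_1$	$P(\mathcal{A}_2) = p_2$	$P(\mathcal{B}_1) = p_1 + \varepsilon \le p_2 - \varepsilon = P(\mathcal{B}_2)$
as in (15-30), then
$H(\mathfrak{A}) < H(\mathfrak{B})$	(15-35)
Proof. Clearly,
$H(\mathfrak{A}) - \varphi(p_1) - \varphi(p_2) = H(\mathfrak{B}) - \varphi(p_1 + \varepsilon) - \varphi(p_2 - \varepsilon)$
because each side equals the contribution to $H(\mathfrak{A})$ and $H(\mathfrak{B})$ respectively due
to the common elements of $\mathfrak{A}$ and $\mathfrak{B}$. Hence (15-35) follows from the second
inequality in (15-29).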
Example 15-6. In the next table we list the probabilities of the events of the
partitions $\mathfrak{A}$ and $\mathfrak{B}$, with $p_1 = 0.1$, $p_2 = 0.3$, and $\varepsilon = 0.08$.
$\mathfrak{A}$:   0.1     0.3     0.35    0.25
$\mathfrak{B}$:   0.18    0.22    0.35    0.25
In this case,
$H(\mathfrak{A}) = 1.883$	$H(\mathfrak{B}) = 1.956$
in agreement with (15-35).
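The effect of moving probability from a larger element to a smaller one can be traced numerically. The sketch below is illustrative only: it starts from the partition of Example 15-6, shifts an amount `eps` between the first two elements, and prints the base-2 entropy for several values up to the equalizing value $(p_2 - p_1)/2 = 0.1$. The helper `equalize` is a name introduced here.

```python
import math

def entropy(probs, base=2.0):
    """Entropy -sum(p log p); zero-probability elements contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def equalize(probs, i, j, eps):
    """Move eps of probability from element j to element i, leaving the rest unchanged."""
    out = list(probs)
    out[i] += eps
    out[j] -= eps
    return out

A = [0.1, 0.3, 0.35, 0.25]
steps = [0.0, 0.04, 0.08, 0.1]   # 0.08 gives partition B of Example 15-6
values = [entropy(equalize(A, 0, 1, e)) for e in steps]

for e, h in zip(steps, values):
    print(e, round(h, 3))
```

The entropy grows with `eps` over this range and is largest when the two elements are equally likely.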
6.	If we equalize the probabilities of two elements of a partition, leaving all
others unchanged, its entropy increases.
Proof. It follows from the above with $\varepsilon = (p_2 - p_1)/2$.
FIGURE 15-10
7.	The entropy of a partition is maximum if all its elements are equally
likely as in (15-28).
Proof. Suppose that $\mathfrak{A}$ is a partition such that $H(\mathfrak{A}) = H_m$ is maximum and two
of its elements have unequal probabilities. If they are made equal, then
(property 6) $H(\mathfrak{A})$ increases. But this is impossible because $H_m$ is maximum by
assumption.
A useful inequality. If $a_i$ and $b_i$ are $N$ positive numbers such that
$a_1 + \cdots + a_N = 1$	$b_1 + \cdots + b_N \le 1$	(15-36)
then
$-\sum_i a_i \log a_i \le -\sum_i a_i \log b_i$	(15-37)
with equality iff $a_i = b_i$.
Proof. From the inequality $e^y \ge 1 + y$ it follows that $\ln x \le x - 1$ (Fig. 15-10).
With $x = b_i/a_i$, this yields
$\ln b_i - \ln a_i = \ln\frac{b_i}{a_i} \le \frac{b_i}{a_i} - 1$
Multiplying by $a_i$ and adding, we obtain
$\sum_i a_i(\ln b_i - \ln a_i) \le \sum_i a_i\left(\frac{b_i}{a_i} - 1\right) = \sum_i (b_i - a_i) \le 0$
and (15-37) results.
Maximum entropy. Using (15-37), we shall rederive property 7. It suffices to
show that
$-\sum_i p_i \log p_i \le \log N$	(15-38)
Proof. The numbers $a_i = p_i$ and $b_i = 1/N$ satisfy (15-36). Inserting into (15-37),
we conclude that
$-\sum_i p_i \log p_i \le -\sum_i p_i \log\frac{1}{N} = \log N \sum_i p_i = \log N$
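Inequalities (15-37) and (15-38) are easy to spot-check numerically. The sketch below draws random probability vectors (an arbitrary choice made here for illustration, with a fixed seed) and confirms that the gap between the two sides of (15-37) never goes negative and that no entropy exceeds $\ln N$. The function names are hypothetical.

```python
import math
import random

def cross_term(a, b):
    """Return -sum(a_i ln b_i); with b == a this is the entropy of a in nats."""
    return -sum(ai * math.log(bi) for ai, bi in zip(a, b))

def random_distribution(n, rng):
    """A random vector of n positive numbers summing to 1."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
N = 5
uniform = [1.0 / N] * N

worst_gap = float("inf")      # smallest observed value of RHS - LHS in (15-37)
max_entropy = 0.0             # largest observed entropy, to compare with ln N
for _ in range(1000):
    a = random_distribution(N, rng)
    b = random_distribution(N, rng)
    worst_gap = min(worst_gap, cross_term(a, b) - cross_term(a, a))
    max_entropy = max(max_entropy, cross_term(a, a))

print(worst_gap >= 0, max_entropy <= math.log(N))
```

With `b` set to the uniform vector, the right side of (15-37) reduces to $\ln N$, which is the step used in the proof of (15-38).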
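Conditional Entropy and Mutual Information
The conditional entropy of a partition $\mathfrak{A}$ assuming $\mathcal{M}$ is by definition the sum
$H(\mathfrak{A}|\mathcal{M}) = -\sum_{i=1}^{N} P(\mathcal{A}_i|\mathcal{M}) \log P(\mathcal{A}_i|\mathcal{M})$	(15-39)
where $P(\mathcal{M}) \ne 0$, $N$ is the number of elements $\mathcal{A}_i$ of $\mathfrak{A}$, and
$P(\mathcal{A}_i|\mathcal{M}) = \frac{P(\mathcal{A}_i\mathcal{M})}{P(\mathcal{M})}$
As we explain later, $H(\mathfrak{A}|\mathcal{M})$ is the uncertainty about $\mathfrak{A}$ in the subsequence of
trials in which $\mathcal{M}$ occurs.
Suppose now that $\mathfrak{B}$ is a partition consisting of the elements $\mathcal{B}_j$.
Clearly,
$H(\mathfrak{A}|\mathcal{B}_j) = -\sum_{i=1}^{N} P(\mathcal{A}_i|\mathcal{B}_j) \log P(\mathcal{A}_i|\mathcal{B}_j)$	(15-40)
is the conditional entropy of $\mathfrak{A}$ assuming $\mathcal{B}_j$ defined as in (15-39). The
conditional entropy of $\mathfrak{A}$ assuming $\mathfrak{B}$ is the weighted average of $H(\mathfrak{A}|\mathcal{B}_j)$:
$H(\mathfrak{A}|\mathfrak{B}) = \sum_j P(\mathcal{B}_j) H(\mathfrak{A}|\mathcal{B}_j)$	(15-41)
This equals the uncertainty about $\mathfrak{A}$ if at each trial we know which of the events
$\mathcal{B}_j$ of $\mathfrak{B}$ has occurred.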
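Example 15-7. We shall determine the conditional entropy $H(\mathfrak{G}|\mathfrak{B})$ of the element
partition $\mathfrak{G}$ in the fair-die experiment where $\mathfrak{B} = [\text{even}, \text{odd}]$.
Clearly, $P\{f_i|\text{even}\} = \frac{1}{3}$ if $i$ is even and $P\{f_i|\text{even}\} = 0$ if $i$ is odd. Similarly,
$P\{f_i|\text{odd}\} = \frac{1}{3}$ if $i$ is odd and $P\{f_i|\text{odd}\} = 0$ if $i$ is even. Hence
$H(\mathfrak{G}|\text{even}) = -\left(\frac{1}{3}\log\frac{1}{3} + \frac{1}{3}\log\frac{1}{3} + \frac{1}{3}\log\frac{1}{3}\right) = \log 3 = H(\mathfrak{G}|\text{odd})$
And since $P\{\text{even}\} = P\{\text{odd}\} = 0.5$, we conclude from (15-41) that
$H(\mathfrak{G}|\mathfrak{B}) = 0.5 \log 3 + 0.5 \log 3 = \log 3$
Thus, in the absence of any information, our uncertainty about $\mathfrak{G}$ equals $H(\mathfrak{G}) =
\log 6$. If we know, however, whether at each trial "even" or "odd" showed, then
our uncertainty is reduced to $H(\mathfrak{G}|\mathfrak{B}) = \log 3$.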
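Definitions (15-40) and (15-41) translate directly into code. The following sketch recomputes Example 15-7 under the assumption that events are stored as Python sets of die faces and that logarithms are to base 2; the function names are illustrative.

```python
import math

def entropy(probs, base=2.0):
    """Entropy -sum(p log p); zero-probability elements contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = {i: 1.0 / 6 for i in range(1, 7)}      # fair die

def prob(event):
    return sum(p[i] for i in event)

def cond_entropy(part_a, part_b):
    """H(A|B) as the weighted average (15-41) of the entropies H(A|B_j) of (15-40)."""
    total = 0.0
    for b in part_b:
        pb = prob(b)
        if pb == 0:
            continue
        conditional = [prob(a & b) / pb for a in part_a]   # P(A_i | B_j)
        total += pb * entropy(conditional)
    return total

G = [{i} for i in range(1, 7)]             # element partition
B = [{2, 4, 6}, {1, 3, 5}]                 # [even, odd]

h_g = entropy([prob(e) for e in G])
h_g_given_b = cond_entropy(G, B)
print(h_g, h_g_given_b)                    # about log2(6) and log2(3)
```

Observing the even-odd partition reduces the uncertainty about the face from $\log 6$ to $\log 3$.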
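THEOREM 1.	If
$\mathfrak{B} \prec \mathfrak{A}$    then    $H(\mathfrak{A}|\mathfrak{B}) = 0$	(15-42)
Proof. Since $\mathfrak{B}$ is a refinement of $\mathfrak{A}$, each element $\mathcal{B}_j$ of $\mathfrak{B}$ is a subset of some
element $\mathcal{A}_k$ of $\mathfrak{A}$ and, therefore, it is disjoint with all other elements of $\mathfrak{A}$.
Hence $\mathcal{A}_i\mathcal{B}_j = \mathcal{B}_j$ if $i = k$ and $\mathcal{A}_i\mathcal{B}_j = \{\emptyset\}$ otherwise. This leads to the conclusion that
$P(\mathcal{A}_i|\mathcal{B}_j) = \frac{P(\mathcal{A}_i\mathcal{B}_j)}{P(\mathcal{B}_j)} = \begin{cases} 1 & i = k \\ 0 & i \ne k \end{cases}$
And since $p \log p = 0$ for $p = 0$ and $p = 1$, we conclude that all terms in
(15-40) equal 0; hence
$H(\mathfrak{A}|\mathcal{B}_j) = 0$
for every $j$. From this and (15-41) it follows that $H(\mathfrak{A}|\mathfrak{B}) = 0$.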
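Independent partitions Two partitions $\mathfrak{A} = [\mathcal{A}_i]$ and $\mathfrak{B} = [\mathcal{B}_j]$ are called
independent if the events $\mathcal{A}_i$ and $\mathcal{B}_j$ are independent for every $i$ and $j$:
$P(\mathcal{A}_i\mathcal{B}_j) = P(\mathcal{A}_i)P(\mathcal{B}_j)$	(15-43)
THEOREM 2.	If the partitions $\mathfrak{A}$ and $\mathfrak{B}$ are independent, then
$H(\mathfrak{A}|\mathfrak{B}) = H(\mathfrak{A})$	$H(\mathfrak{B}|\mathfrak{A}) = H(\mathfrak{B})$	(15-44)
Proof. Clearly, $P(\mathcal{A}_i|\mathcal{B}_j) = P(\mathcal{A}_i)$; hence [see (15-40)]
$H(\mathfrak{A}|\mathcal{B}_j) = -\sum_i P(\mathcal{A}_i) \log P(\mathcal{A}_i) = H(\mathfrak{A})$
Inserting into (15-41), we obtain
$H(\mathfrak{A}|\mathfrak{B}) = H(\mathfrak{A})\sum_j P(\mathcal{B}_j) = H(\mathfrak{A})$
and the first equation in (15-44) results. We can show similarly that $H(\mathfrak{B}|\mathfrak{A}) = H(\mathfrak{B})$.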
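THEOREM 3.	For any $\mathfrak{A}$ and $\mathfrak{B}$:
$H(\mathfrak{A} \cdot \mathfrak{B}) \le H(\mathfrak{A}) + H(\mathfrak{B})$	(15-45)
Proof. As we know [see (2-36)]
$P(\mathcal{A}_i) = \sum_j P(\mathcal{A}_i\mathcal{B}_j)$
Hence
$H(\mathfrak{A}) = -\sum_i P(\mathcal{A}_i) \log P(\mathcal{A}_i) = -\sum_{i,j} P(\mathcal{A}_i\mathcal{B}_j) \log P(\mathcal{A}_i)$
Writing a similar equation for $H(\mathfrak{B})$ and adding, we obtain
$H(\mathfrak{A}) + H(\mathfrak{B}) = -\sum_{i,j} P(\mathcal{A}_i\mathcal{B}_j) \log[P(\mathcal{A}_i)P(\mathcal{B}_j)]$	(15-46)
Clearly, $\mathfrak{A} \cdot \mathfrak{B}$ is a partition with elements $\mathcal{A}_i\mathcal{B}_j$. Hence
$H(\mathfrak{A} \cdot \mathfrak{B}) = -\sum_{i,j} P(\mathcal{A}_i\mathcal{B}_j) \log P(\mathcal{A}_i\mathcal{B}_j)$	(15-47)
To prove (15-45), we shall apply (15-37) identifying the numbers $a_i$ and $b_i$ with
the numbers $P(\mathcal{A}_i\mathcal{B}_j)$ and $P(\mathcal{A}_i)P(\mathcal{B}_j)$ respectively. We can do so because
$\sum_{i,j} P(\mathcal{A}_i\mathcal{B}_j) = 1$	$\sum_{i,j} P(\mathcal{A}_i)P(\mathcal{B}_j) = 1$
From (15-37) it follows that the sum in (15-47) cannot exceed the sum in (15-46);
hence (15-45) must be true.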
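COROLLARY.
$H(\mathfrak{A} \cdot \mathfrak{B}) = H(\mathfrak{A}) + H(\mathfrak{B})$	(15-48)
iff the partitions $\mathfrak{A}$ and $\mathfrak{B}$ are independent.
Proof. This follows from (15-45) because (15-37) is an equality iff $a_i = b_i$ for
every $i$. Hence (15-45) is an equality iff
$P(\mathcal{A}_i\mathcal{B}_j) = P(\mathcal{A}_i)P(\mathcal{B}_j)$
for every $i$ and $j$.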
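THEOREM 4.	For any $\mathfrak{A}$ and $\mathfrak{B}$:
$H(\mathfrak{A} \cdot \mathfrak{B}) = H(\mathfrak{B}) + H(\mathfrak{A}|\mathfrak{B}) = H(\mathfrak{A}) + H(\mathfrak{B}|\mathfrak{A})$	(15-49)
Proof. Since
$P(\mathcal{A}_i\mathcal{B}_j) = P(\mathcal{B}_j)P(\mathcal{A}_i|\mathcal{B}_j)$
we conclude from (15-40) that
$P(\mathcal{B}_j)H(\mathfrak{A}|\mathcal{B}_j) = -\sum_i P(\mathcal{B}_j)P(\mathcal{A}_i|\mathcal{B}_j) \log P(\mathcal{A}_i|\mathcal{B}_j)$
$= -\sum_i P(\mathcal{A}_i\mathcal{B}_j)[\log P(\mathcal{A}_i\mathcal{B}_j) - \log P(\mathcal{B}_j)]$
$= -\sum_i P(\mathcal{A}_i\mathcal{B}_j) \log P(\mathcal{A}_i\mathcal{B}_j) + P(\mathcal{B}_j) \log P(\mathcal{B}_j)$
Summing over all $j$, we obtain
$\sum_j P(\mathcal{B}_j)H(\mathfrak{A}|\mathcal{B}_j) = -\sum_{i,j} P(\mathcal{A}_i\mathcal{B}_j) \log P(\mathcal{A}_i\mathcal{B}_j) + \sum_j P(\mathcal{B}_j) \log P(\mathcal{B}_j)$
and the first equation in (15-49) follows because the above three sums equal
$H(\mathfrak{A}|\mathfrak{B})$, $H(\mathfrak{A} \cdot \mathfrak{B})$, and $-H(\mathfrak{B})$ respectively. The second equation follows
because $\mathfrak{A} \cdot \mathfrak{B} = \mathfrak{B} \cdot \mathfrak{A}$.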
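COROLLARIES. The following relationships follow readily from the last two
theorems: For any $\mathfrak{A}$ and $\mathfrak{B}$:
$H(\mathfrak{B}) \le H(\mathfrak{A} \cdot \mathfrak{B}) \le H(\mathfrak{A}) + H(\mathfrak{B})$	(15-50)
$H(\mathfrak{A}|\mathfrak{B}) \le H(\mathfrak{A})$	(15-51)
$H(\mathfrak{A}) - H(\mathfrak{A}|\mathfrak{B}) = H(\mathfrak{B}) - H(\mathfrak{B}|\mathfrak{A})$	(15-52)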
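Mutual information. The function
$I(\mathfrak{A}, \mathfrak{B}) = H(\mathfrak{A}) + H(\mathfrak{B}) - H(\mathfrak{A} \cdot \mathfrak{B})$	(15-53)
is called the mutual information of the partitions $\mathfrak{A}$ and $\mathfrak{B}$. From (15-49) it
follows that
$I(\mathfrak{A}, \mathfrak{B}) = H(\mathfrak{A}) - H(\mathfrak{A}|\mathfrak{B}) = H(\mathfrak{B}) - H(\mathfrak{B}|\mathfrak{A})$	(15-54)
Clearly [see (15-51)]
$I(\mathfrak{A}, \mathfrak{B}) \ge 0$	(15-55)
As we shall presently see, $I(\mathfrak{A}, \mathfrak{B})$ can be interpreted as the "information about
$\mathfrak{A}$ contained in $\mathfrak{B}$" and it equals the "information about $\mathfrak{B}$ contained in $\mathfrak{A}$."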
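Example 15-8. In the fair-die experiment of Example 15-7,
$H(\mathfrak{G}) = \log 6$	$H(\mathfrak{G}|\mathfrak{B}) = \log 3$	$H(\mathfrak{B}) = \log 2$	$H(\mathfrak{B}|\mathfrak{G}) = 0$
Hence
$I(\mathfrak{G}, \mathfrak{B}) = \log 2$
Thus the information about the element partition $\mathfrak{G}$ resulting from the observation
of the even-odd partition $\mathfrak{B}$ equals $\log 2$.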
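The three equivalent expressions for the mutual information, (15-53) and the two forms in (15-54), can be compared on the fair-die data of Example 15-8. This is a sketch under the same illustrative conventions as before: events are sets of faces, logarithms are base 2, and all function names are assumptions of this example.

```python
import math

def entropy(probs, base=2.0):
    """Entropy -sum(p log p); zero-probability elements contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = {i: 1.0 / 6 for i in range(1, 7)}      # fair die

def prob(event):
    return sum(p[i] for i in event)

def H(partition):
    return entropy([prob(e) for e in partition])

def product(part_a, part_b):
    """Product partition: all nonempty intersections."""
    return [a & b for a in part_a for b in part_b if a & b]

def cond_entropy(part_a, part_b):
    """H(A|B) computed from (15-41)."""
    return sum(prob(b) * entropy([prob(a & b) / prob(b) for a in part_a])
               for b in part_b if prob(b) > 0)

G = [{i} for i in range(1, 7)]             # element partition
B = [{2, 4, 6}, {1, 3, 5}]                 # [even, odd]

i_def = H(G) + H(B) - H(product(G, B))     # (15-53)
i_ab = H(G) - cond_entropy(G, B)           # first form of (15-54)
i_ba = H(B) - cond_entropy(B, G)           # second form of (15-54)
print(i_def, i_ab, i_ba)                   # all about 1 bit = log2(2)
```

All three agree because of the chain rule (15-49).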
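Generalizations. The preceding results can be readily generalized to an arbitrary
number of partitions. We list below several special cases leaving the simple
proofs as problems:
(a) If
$\mathfrak{B} \prec \mathfrak{C}$    then    $H(\mathfrak{A}|\mathfrak{B}) \le H(\mathfrak{A}|\mathfrak{C})$	(15-56)
(b) If the partitions $\mathfrak{A}$, $\mathfrak{B}$, and $\mathfrak{C}$ are independent, then
$H(\mathfrak{A} \cdot \mathfrak{B} \cdot \mathfrak{C}) = H(\mathfrak{A}) + H(\mathfrak{B} \cdot \mathfrak{C}) = H(\mathfrak{A}) + H(\mathfrak{B}) + H(\mathfrak{C})$	(15-57)
(c) Chain rule For any $\mathfrak{A}$, $\mathfrak{B}$, and $\mathfrak{C}$:
$H(\mathfrak{B} \cdot \mathfrak{C}|\mathfrak{A}) = H(\mathfrak{B}|\mathfrak{A}) + H(\mathfrak{C}|\mathfrak{A} \cdot \mathfrak{B})$	(15-58)
$H(\mathfrak{A} \cdot \mathfrak{B} \cdot \mathfrak{C}) = H(\mathfrak{A}) + H(\mathfrak{B} \cdot \mathfrak{C}|\mathfrak{A}) = H(\mathfrak{A}) + H(\mathfrak{B}|\mathfrak{A}) + H(\mathfrak{C}|\mathfrak{A} \cdot \mathfrak{B})$	(15-59)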
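Repeated trials In the space $\mathcal{S}^n$ of repeated trials all outcomes are sequences
of the form
$\zeta_1 \cdots \zeta_k \cdots \zeta_n$	(15-60)
where each $\zeta_k$ is an element of $\mathcal{S}$. Consider a partition $\mathfrak{A}$ of $\mathcal{S}$ consisting of
$N$ events $\mathcal{A}_j$. At the $k$th trial, one and only one of these events will occur, namely
the event that contains the element $\zeta_k$. The cartesian product
$\mathcal{A}_{j,k} = \mathcal{S} \times \cdots \times \mathcal{A}_j \times \cdots \times \mathcal{S}$	(15-61)
is an event in $\mathcal{S}^n$ with probability
$P(\mathcal{A}_{j,k}) = P(\mathcal{A}_j)$	(15-62)
because it occurs iff the event $\mathcal{A}_j$ occurs at the $k$th trial. For specific $k$, the
events $\mathcal{A}_{j,k}$ form an $N$-element partition of the space $\mathcal{S}^n$. This partition will
be denoted by $\mathfrak{A}_k$. From (15-62) it follows readily that
$H(\mathfrak{A}_k) = H(\mathfrak{A})$	(15-63)
We can define similarly the partition $\mathfrak{B}_k$ of $\mathcal{S}^n$ formed with the
elements $\mathcal{B}_{j,k}$ of another partition $\mathfrak{B}$ of $\mathcal{S}$. Reasoning as in (15-63), we conclude
that $H(\mathfrak{B}_k) = H(\mathfrak{B})$ and
$H(\mathfrak{A}_k|\mathfrak{B}_k) = H(\mathfrak{A}|\mathfrak{B})$	$I(\mathfrak{A}_k, \mathfrak{B}_k) = I(\mathfrak{A}, \mathfrak{B})$	(15-64)
We next form the product of the $n$ partitions $\mathfrak{A}_k$:
$\mathfrak{A}^n = \mathfrak{A}_1 \cdot \mathfrak{A}_2 \cdots \mathfrak{A}_n$	(15-65)
The elements of this partition are cartesian products of the form
$\mathcal{A}_{j_1} \times \cdots \times \mathcal{A}_{j_k} \times \cdots \times \mathcal{A}_{j_n}$	(15-66)
If $\mathfrak{A}$ is the element partition of $\mathcal{S}$, then $\mathfrak{A}^n$ is the element partition of
$\mathcal{S}^n$. In general, however, the elements of $\mathfrak{A}^n$ are events consisting of a large
number of sequences of the form (15-60). If we picture these sequences as wires,
then the elements (15-66) of the partition $\mathfrak{A}^n$ can be viewed as cables and their
union as a collection of such cables (Fig. 15-11).
From the independence of the trials, it follows that the $n$ partitions
$\mathfrak{A}_1, \ldots, \mathfrak{A}_n$ of $\mathcal{S}^n$ are independent. Hence [see (15-57) and (15-63)]
$H(\mathfrak{A}^n) = H(\mathfrak{A}_1) + \cdots + H(\mathfrak{A}_n) = nH(\mathfrak{A})$	(15-67)
Defining similarly the partition $\mathfrak{B}^n$, we conclude as in (15-64) that
$H(\mathfrak{A}^n|\mathfrak{B}^n) = nH(\mathfrak{A}|\mathfrak{B})$	$I(\mathfrak{A}^n, \mathfrak{B}^n) = nI(\mathfrak{A}, \mathfrak{B})$	(15-68)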
FIGURE 15-11
Example 15-9. In the coin experiment, the entropy of the element partition equals
$H(\mathfrak{G}) = -p \log p - q \log q$
In the space $\mathcal{S}^2$, the element partition consists of four events with
$P\{hh\} = p^2$	$P\{ht\} = P\{th\} = pq$	$P\{tt\} = q^2$
Hence
$H(\mathfrak{G}^2) = -p^2 \log p^2 - 2pq \log pq - q^2 \log q^2 = -2p \log p - 2q \log q$
Thus
$H(\mathfrak{G}^2) = 2H(\mathfrak{G})$
in agreement with (15-67).
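Relation (15-67) can be confirmed by brute force for small $n$: enumerate every element of the product partition, multiply the single-trial probabilities (independent trials), and compare the resulting entropy with $n$ times the single-trial entropy. The sketch below does this for a coin with an arbitrarily chosen $p = 0.3$; the function names are illustrative.

```python
import itertools
import math

def entropy(probs):
    """Entropy in nats; zero-probability elements contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def repeated_trials_entropy(probs, n):
    """Entropy of the n-fold product partition for independent trials."""
    joint = [math.prod(combo) for combo in itertools.product(probs, repeat=n)]
    return entropy(joint)

p = 0.3                      # arbitrary value chosen for illustration
coin = [p, 1.0 - p]

for n in (1, 2, 5):
    print(n, repeated_trials_entropy(coin, n), n * entropy(coin))
```

The enumeration grows as $N^n$, so this check is only practical for short sequences.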
CONDITIONAL ENTROPY AND UNCERTAINTY. As we have noted, the entropy
$H(\mathfrak{A})$ of a partition $\mathfrak{A} = [\mathcal{A}_i]$ gives us a measure of uncertainty about the
occurrence of the events $\mathcal{A}_i$ at a given trial. Once the trial is performed and
the events $\mathcal{A}_i$ are observed, the uncertainty is removed. We give next a similar
interpretation to the conditional entropy $H(\mathfrak{A}|\mathcal{M})$ of $\mathfrak{A}$ assuming that the event
$\mathcal{M}$ has been observed, and of the conditional entropy $H(\mathfrak{A}|\mathfrak{B})$ of $\mathfrak{A}$ assuming
that the partition $\mathfrak{B}$ has been observed.†
If in the definition (15-25) of entropy we replace the probabilities $P(\mathcal{A}_i)$
by the conditional probabilities $P(\mathcal{A}_i|\mathcal{M})$, we obtain the conditional entropy
$H(\mathfrak{A}|\mathcal{M})$ of $\mathfrak{A}$ assuming $\mathcal{M}$ [see (15-39)]. The relative frequency interpretation
of $P(\mathcal{A}_i|\mathcal{M})$ is the same as that of $P(\mathcal{A}_i)$ if we consider not the entire sequence
of $n$ trials but only the subsequence of trials in which the event $\mathcal{M}$ occurs. From
this it follows that $H(\mathfrak{A}|\mathcal{M})$ is the uncertainty about $\mathfrak{A}$ per trial in that
subsequence. In other words, if at a given trial we know that $\mathcal{M}$ occurs, then our
uncertainty about $\mathfrak{A}$ equals $H(\mathfrak{A}|\mathcal{M})$; if we know that $\bar{\mathcal{M}}$ occurs, then our
uncertainty equals $H(\mathfrak{A}|\bar{\mathcal{M}})$. The weighted sum
$P(\mathcal{M})H(\mathfrak{A}|\mathcal{M}) + P(\bar{\mathcal{M}})H(\mathfrak{A}|\bar{\mathcal{M}})$
is the uncertainty about $\mathfrak{A}$ assuming that the binary partition $[\mathcal{M}, \bar{\mathcal{M}}]$ is
observed.
Suppose now that at each trial we observe the partition $\mathfrak{B} = [\mathcal{B}_j]$. We
maintain that, under this assumption, the uncertainty per trial about $\mathfrak{A}$ equals
$H(\mathfrak{A}|\mathfrak{B})$. Indeed, in a sequence of $n$ trials, the number of times the event $\mathcal{B}_j$
occurs equals
$n_j \simeq nP(\mathcal{B}_j)$
In this subsequence, the uncertainty about $\mathfrak{A}$ equals $H(\mathfrak{A}|\mathcal{B}_j)$ per trial. Hence
the total uncertainty about $\mathfrak{A}$ equals
$\sum_j n_j H(\mathfrak{A}|\mathcal{B}_j) \simeq \sum_j nP(\mathcal{B}_j)H(\mathfrak{A}|\mathcal{B}_j) = nH(\mathfrak{A}|\mathfrak{B})$
and the uncertainty per trial equals $H(\mathfrak{A}|\mathfrak{B})$.
Thus the observation of $\mathfrak{B}$ reduces the uncertainty about $\mathfrak{A}$ from $H(\mathfrak{A})$ to
$H(\mathfrak{A}|\mathfrak{B})$. The difference
$I(\mathfrak{A}, \mathfrak{B}) = H(\mathfrak{A}) - H(\mathfrak{A}|\mathfrak{B})$
is the reduction of the uncertainty about $\mathfrak{A}$ resulting from the observation of
$\mathfrak{B}$. This justifies the statement that the mutual information $I(\mathfrak{A}, \mathfrak{B})$ equals the
information about $\mathfrak{A}$ contained in $\mathfrak{B}$.
†The expression a partition $\mathfrak{B}$ is observed will mean that we know which of the events of $\mathfrak{B}$ has
occurred.
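We show next the consistency between the properties of entropy developed
earlier and the subjective notion of uncertainty.
1.	If $\mathfrak{B}$ is a refinement of $\mathfrak{A}$ and $\mathfrak{B}$ is observed, then we know which of the
events of $\mathfrak{A}$ occurred. Hence $H(\mathfrak{A}|\mathfrak{B}) = 0$ in agreement with (15-42).
2.	If the partitions $\mathfrak{A}$ and $\mathfrak{B}$ are independent and $\mathfrak{B}$ is observed, no information
about $\mathfrak{A}$ is gained. Hence $H(\mathfrak{A}|\mathfrak{B}) = H(\mathfrak{A})$ in agreement with (15-44).
3.	If we observe $\mathfrak{B}$, our uncertainty about $\mathfrak{A}$ can only decrease. Hence
$H(\mathfrak{A}|\mathfrak{B}) \le H(\mathfrak{A})$ in agreement with (15-51).
4.	To observe $\mathfrak{A} \cdot \mathfrak{B}$, we must observe $\mathfrak{A}$ and $\mathfrak{B}$. If only $\mathfrak{B}$ is observed, the
information gained equals $H(\mathfrak{B})$. The uncertainty about $\mathfrak{A}$ assuming $\mathfrak{B}$
equals, therefore, the remaining uncertainty $H(\mathfrak{A} \cdot \mathfrak{B}) - H(\mathfrak{B})$ about $\mathfrak{A} \cdot \mathfrak{B}$. Hence
$H(\mathfrak{A} \cdot \mathfrak{B}) - H(\mathfrak{B}) = H(\mathfrak{A}|\mathfrak{B})$ in agreement with (15-49).
5.	Combining 3 and 4, we conclude that $H(\mathfrak{A} \cdot \mathfrak{B}) - H(\mathfrak{B}) \le H(\mathfrak{A})$ in agreement
with (15-45).
6.	If $\mathfrak{B}$ is observed, then the information that is gained about $\mathfrak{A}$ equals
$I(\mathfrak{A}, \mathfrak{B})$. If $\mathfrak{B} \prec \mathfrak{C}$ and $\mathfrak{B}$ is observed, then $\mathfrak{C}$ is known. But knowledge of $\mathfrak{C}$
yields information about $\mathfrak{A}$ equal to $I(\mathfrak{A}, \mathfrak{C})$. Hence, if $\mathfrak{B} \prec \mathfrak{C}$, then $I(\mathfrak{A}, \mathfrak{B}) \ge I(\mathfrak{A}, \mathfrak{C})$
or, equivalently, $H(\mathfrak{A}|\mathfrak{B}) \le H(\mathfrak{A}|\mathfrak{C})$ in agreement with (15-56).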
FIGURE 15-12
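CONDITIONAL ENTROPY AND TYPICAL SEQUENCES. We give next a typical
sequence interpretation of the properties of conditional entropy limiting the
discussion to (15-45) and (15-49). The underlying reasoning is used in the proof
of the channel capacity theorem (Sec. 15-6).
We denote by $t^{\mathfrak{A}}$, $t^{\mathfrak{B}}$, and $t^{\mathfrak{A}\cdot\mathfrak{B}}$ the typical sequences of the partitions $\mathfrak{A}$, $\mathfrak{B}$,
and $\mathfrak{A} \cdot \mathfrak{B}$ respectively, and by $T^{\mathfrak{A}}$, $T^{\mathfrak{B}}$, and $T^{\mathfrak{A}\cdot\mathfrak{B}}$ their unions (Fig. 15-12a).
As we know [see (15-7)]
$P(T^{\mathfrak{A}}) \simeq P(T^{\mathfrak{B}}) \simeq P(T^{\mathfrak{A}\cdot\mathfrak{B}}) \simeq 1$
Furthermore, the number of typical sequences in each of the above three sets
equals [see (15-10)]
$n_{T^{\mathfrak{A}}} \simeq e^{nH(\mathfrak{A})}$	$n_{T^{\mathfrak{B}}} \simeq e^{nH(\mathfrak{B})}$	$n_{T^{\mathfrak{A}\cdot\mathfrak{B}}} \simeq e^{nH(\mathfrak{A}\cdot\mathfrak{B})}$	(15-69)
I. We maintain that
$H(\mathfrak{A} \cdot \mathfrak{B}) \le H(\mathfrak{A}) + H(\mathfrak{B})$	(15-70)
Proof. Each $t^{\mathfrak{A}\cdot\mathfrak{B}}$ sequence specifies a pair $(t^{\mathfrak{A}}, t^{\mathfrak{B}})$. The total number of such
pairs formed with all the elements of $T^{\mathfrak{A}}$ and $T^{\mathfrak{B}}$ equals $n_{T^{\mathfrak{A}}} \cdot n_{T^{\mathfrak{B}}}$. However,
not all such pairs generate $t^{\mathfrak{A}\cdot\mathfrak{B}}$ sequences because, if the partitions $\mathfrak{A}$ and $\mathfrak{B}$
are not independent, then not all pairs can occur. For example, if $\mathcal{A}_i \subset \mathcal{B}_j$ for
some $i$ and $j$ and $\mathcal{A}_i$ occurs at the $k$th trial, then $\mathcal{B}_j$ must also occur at this
trial. From the above it follows that
$n_{T^{\mathfrak{A}\cdot\mathfrak{B}}} \le n_{T^{\mathfrak{A}}} \cdot n_{T^{\mathfrak{B}}}$
and (15-70) follows from (15-69).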
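II. We shall show, finally, that
$H(\mathfrak{A} \cdot \mathfrak{B}) = H(\mathfrak{B}) + H(\mathfrak{A}|\mathfrak{B})$	(15-71)
Proof. There are $n_{T^{\mathfrak{A}\cdot\mathfrak{B}}}$ sequences in the set $T^{\mathfrak{A}\cdot\mathfrak{B}}$ and $n_{T^{\mathfrak{B}}}$ sequences in the set
$T^{\mathfrak{B}}$. The ratio
$\frac{n_{T^{\mathfrak{A}\cdot\mathfrak{B}}}}{n_{T^{\mathfrak{B}}}} = e^{n[H(\mathfrak{A}\cdot\mathfrak{B}) - H(\mathfrak{B})]}$
equals, therefore, the number of $t^{\mathfrak{A}\cdot\mathfrak{B}}$ sequences contained in a single $t^{\mathfrak{B}}$
sequence on the average. To prove (15-71), we must prove, therefore, that this
number equals $e^{nH(\mathfrak{A}|\mathfrak{B})}$. We shall prove a stronger statement: The number of
$t^{\mathfrak{A}\cdot\mathfrak{B}}$ sequences contained in a single $t^{\mathfrak{B}}$ sequence (Fig. 15-13) equals $e^{nH(\mathfrak{A}|\mathfrak{B})}$.
As we know [see (1-1)], the number of times the event $\mathcal{B}_j$ occurs in a $t^{\mathfrak{B}}$
sequence "almost certainly" equals
$n_j \simeq nP(\mathcal{B}_j)$	(15-72)
We denote by $t^{\mathcal{B}_j}$ a subsequence (Fig. 15-12b) of $t^{\mathfrak{B}}$ in which the number of
occurrences of $\mathcal{B}_j$ satisfies (15-72). In this subsequence, the relative frequency
of the occurrence of an event $\mathcal{A}_i$ equals $P(\mathcal{A}_i|\mathcal{B}_j)$ [see (2-32)].
We shall use (15-10) to show that the number of typical sequences formed
with the elements of $\mathfrak{A}$ that are included in a $t^{\mathcal{B}_j}$ sequence equals
$e^{n_j H(\mathfrak{A}|\mathcal{B}_j)} = e^{nP(\mathcal{B}_j)H(\mathfrak{A}|\mathcal{B}_j)}$	(15-73)
Indeed, this follows from (15-10) if we introduce the following changes: We
replace $P(\mathcal{A}_i)$ by $P(\mathcal{A}_i|\mathcal{B}_j)$, the length $n$ of the original sequences with the
length $n_j = nP(\mathcal{B}_j)$, and the entropy $H(\mathfrak{A})$ of $\mathfrak{A}$ with the conditional entropy
$H(\mathfrak{A}|\mathcal{B}_j)$.
Returning to the original $t^{\mathfrak{B}}$ sequence, we note that it is formed by
combining the $t^{\mathcal{B}_j}$ sequences that are included in $t^{\mathfrak{B}}$. This shows that the total
number of $t^{\mathfrak{A}}$ sequences that are included in $t^{\mathfrak{B}}$ equals the product
$\prod_j e^{nP(\mathcal{B}_j)H(\mathfrak{A}|\mathcal{B}_j)} = e^{nH(\mathfrak{A}|\mathfrak{B})}$	(15-74)
But each $t^{\mathfrak{A}}$ sequence that is included in $t^{\mathfrak{B}}$ is a $t^{\mathfrak{A}\cdot\mathfrak{B}}$ sequence. Hence the
number of $t^{\mathfrak{A}\cdot\mathfrak{B}}$ sequences that are included in $t^{\mathfrak{B}}$ equals $e^{nH(\mathfrak{A}|\mathfrak{B})}$.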
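The counting estimate $e^{nH}$ that underlies this argument can be illustrated numerically. As a simplifying assumption made only for this sketch, a sequence is treated as typical when each event occurs exactly $np_i$ times; the number of such sequences is a multinomial coefficient, and its logarithm divided by $n$ should approach the entropy in nats as $n$ grows. The function names are hypothetical.

```python
import math

def entropy_nats(probs):
    """Entropy in nats; zero-probability elements contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def log_count_exact_type(counts):
    """ln of the number of sequences in which event i occurs exactly counts[i] times."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)

probs = (0.3, 0.7)
h = entropy_nats(probs)

rates = {}
for n in (100, 1000, 10000, 100000):
    counts = (3 * n // 10, 7 * n // 10)          # exactly n * p_i occurrences
    rates[n] = log_count_exact_type(counts) / n  # (1/n) ln(number of sequences)
    print(n, rates[n], h)
```

The per-trial exponent approaches $H$ from below; the residual gap shrinks roughly like $(\ln n)/n$, which is why the text's estimate holds only for large $n$.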
15-3 RANDOM VARIABLES AND
STOCHASTIC PROCESSES
Entropy is a number assigned to a partition. To define the entropy of an RV we
must, therefore, form a suitable partition. This is simple if the RV is of discrete
type. However, for continuous-type RVs we can do so only indirectly.
Discrete type. Suppose that $\mathbf{x}$ is an RV taking the values $x_i$ with
$P\{\mathbf{x} = x_i\} = p_i$
The events $\{\mathbf{x} = x_i\}$ are mutually exclusive and their union is the certain event;
hence they form a partition. This partition will be denoted by $\mathfrak{A}_x$ and will be
called the partition of $\mathbf{x}$.
Definition The entropy $H(\mathbf{x})$ of a discrete-type RV $\mathbf{x}$ is the entropy
$H(\mathfrak{A}_x)$ of its partition $\mathfrak{A}_x$:
$H(\mathbf{x}) = H(\mathfrak{A}_x) = -\sum_i p_i \ln p_i$	(15-75)
Continuous type. The entropy of a continuous-type RV cannot be so defined
because the events $\{\mathbf{x} = x_i\}$ do not form a partition (they are not countable). To
define $H(\mathbf{x})$, we form, first, the discrete-type RV $\mathbf{x}_\delta$ obtained by rounding off $\mathbf{x}$
as in Fig. 15-14:
$\mathbf{x}_\delta = n\delta$    if    $n\delta - \delta < \mathbf{x} \le n\delta$	(15-76)
FIGURE 15-14
Clearly,
$P\{\mathbf{x}_\delta = n\delta\} = P\{n\delta - \delta < \mathbf{x} \le n\delta\} = \int_{n\delta-\delta}^{n\delta} f(x)\,dx = \delta \bar{f}(n\delta)$
where $\bar{f}(n\delta)$ is a number between the maximum and the minimum of $f(x)$ in
the interval $(n\delta - \delta, n\delta)$. Applying (15-75) to the RV $\mathbf{x}_\delta$, we obtain
$H(\mathbf{x}_\delta) = -\sum_{n=-\infty}^{\infty} \delta \bar{f}(n\delta) \ln[\delta \bar{f}(n\delta)]$
and since
$\sum_{n=-\infty}^{\infty} \delta \bar{f}(n\delta) = \int_{-\infty}^{\infty} f(x)\,dx = 1$
we conclude that
$H(\mathbf{x}_\delta) = -\ln\delta - \sum_{n=-\infty}^{\infty} \delta \bar{f}(n\delta) \ln \bar{f}(n\delta)$	(15-77)
As $\delta \to 0$, the RV $\mathbf{x}_\delta$ tends to $\mathbf{x}$; however, its entropy $H(\mathbf{x}_\delta)$ tends to $\infty$ because
$-\ln\delta \to \infty$. For this reason, we define the entropy $H(\mathbf{x})$ of $\mathbf{x}$ not as the limit of
$H(\mathbf{x}_\delta)$ but as the limit of the sum $H(\mathbf{x}_\delta) + \ln\delta$ as $\delta \to 0$. This yields
$H(\mathbf{x}_\delta) + \ln\delta \xrightarrow[\delta \to 0]{} -\int_{-\infty}^{\infty} f(x) \ln f(x)\,dx$	(15-78)
Definition The entropy of a continuous-type RV $\mathbf{x}$ is by definition the
integral
$H(\mathbf{x}) = -\int_{-\infty}^{\infty} f(x) \ln f(x)\,dx$	(15-79)
The integration extends only over the region where $f(x) \ne 0$ because
$f(x) \ln f(x) = 0$ if $f(x) = 0$.
Example 15-10. If $\mathbf{x}$ is uniform in the interval $(0, a)$, then
$H(\mathbf{x}) = -\frac{1}{a}\int_0^a \ln\frac{1}{a}\,dx = \ln a$	(15-80)
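The limit in (15-78) can be watched numerically by rounding off a continuous RV on finer and finer grids. The sketch below assumes that the bin probabilities are obtained from a closed-form distribution function; it uses the uniform density of Example 15-10 (with an arbitrary $a = 2$) and, as a second illustration, the exponential density $ce^{-cx}U(x)$ whose entropy $1 - \ln c$ is given in (15-83). The function `quantized_entropy` and the chosen constants are assumptions of this example.

```python
import math

def quantized_entropy(cdf, delta, n_min, n_max):
    """Entropy (nats) of the rounded-off RV x_delta, with P{x_delta = n*delta}
    taken as cdf(n*delta) - cdf(n*delta - delta), as in (15-76)."""
    h = 0.0
    for n in range(n_min, n_max + 1):
        p = cdf(n * delta) - cdf((n - 1) * delta)
        if p > 0:
            h -= p * math.log(p)
    return h

a = 2.0      # uniform on (0, a); value chosen for illustration
c = 1.5      # exponential rate; value chosen for illustration

def uniform_cdf(x):
    return min(max(x / a, 0.0), 1.0)

def exp_cdf(x):
    return 1.0 - math.exp(-c * x) if x > 0 else 0.0

results = {}
for delta in (0.5, 0.1, 0.01):
    n_max = int(40 / delta)   # bins beyond x = 40 carry negligible probability here
    hu = quantized_entropy(uniform_cdf, delta, 1, n_max) + math.log(delta)
    he = quantized_entropy(exp_cdf, delta, 1, n_max) + math.log(delta)
    results[delta] = (hu, he)
    print(delta, hu, he)

print(math.log(a), 1 - math.log(c))   # the limits ln a and 1 - ln c
```

For the uniform density the sum $H(\mathbf{x}_\delta) + \ln\delta$ equals $\ln a$ for every grid that divides the interval evenly; for the exponential density it converges to $1 - \ln c$ as $\delta$ decreases.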
Notes 1. The entropy $H(\mathbf{x}_\delta)$ of $\mathbf{x}_\delta$ is a measure of our uncertainty about the RV $\mathbf{x}$
rounded off to the nearest $n\delta$. If $\delta$ is small, the resulting uncertainty is large and it
tends to $\infty$ as $\delta \to 0$. This conclusion is based on the assumption that $\mathbf{x}$ can be
observed perfectly, that is, its various values can be recognized as distinct no matter
how close they are. In a physical experiment, however, this assumption is not
realistic. Values of $\mathbf{x}$ that differ slightly cannot always be treated as distinct (noise
considerations or round-off errors, for example). The presence of the term $\ln\delta$ in
(15-78) is, in a sense, a recognition of this ambiguity.
2. As in the case of arbitrary partitions, the entropy of a discrete-type RV $\mathbf{x}$
is positive and it is used as a measure of uncertainty about $\mathbf{x}$. This is not so,
however, for continuous-type RVs. Their entropy can take any value from $-\infty$ to $\infty$
and it is used to measure only changes in uncertainty. The various properties of
partitions also apply to continuous-type RVs if, as is generally the case, they
involve only differences of entropies.
Entropy as expected value. The integral in (15-79) is the expected value of the
RV $\mathbf{y} = -\ln f(\mathbf{x})$ obtained through the transformation $g(x) = -\ln f(x)$:
$H(\mathbf{x}) = E\{-\ln f(\mathbf{x})\} = -\int_{-\infty}^{\infty} f(x) \ln f(x)\,dx$	(15-81)
Similarly, the sum in (15-75) can be written as the expected value of the
RV $-\ln p(\mathbf{x})$:
$H(\mathbf{x}) = E\{-\ln p(\mathbf{x})\} = -\sum_i p_i \ln p_i$	(15-82)
where now $p(x)$ is a function defined only for $x = x_i$ and such that $p(x_i) = p_i$.
Example 15-11. If
$f(x) = ce^{-cx}U(x)$    then    $E\{-\ln f(\mathbf{x})\} = E\{c\mathbf{x} - \ln c\}$
Since $E\{c\mathbf{x}\} = 1$, this yields
$H(\mathbf{x}) = 1 - \ln c = \ln\frac{e}{c}$	(15-83)
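Example 15-12. If
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\eta)^2/2\sigma^2}$
then
$E\{-\ln f(\mathbf{x})\} = \ln\sigma\sqrt{2\pi} + E\left\{\frac{(\mathbf{x} - \eta)^2}{2\sigma^2}\right\} = \ln\sigma\sqrt{2\pi} + \frac{\sigma^2}{2\sigma^2}$
Hence the entropy of a normal RV equals
$H(\mathbf{x}) = \ln\sigma\sqrt{2\pi e}$	(15-84)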
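The closed forms (15-83) and (15-84) can be cross-checked by evaluating the integral (15-79) numerically. The sketch below uses a plain midpoint rule over a truncated range; the parameter values, the truncation limits and the function names are choices made here for illustration.

```python
import math

def differential_entropy(pdf, lo, hi, steps=100000):
    """Midpoint-rule approximation of -integral of f ln f over (lo, hi), in nats."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        fx = pdf(lo + (k + 0.5) * dx)
        if fx > 0:
            total -= fx * math.log(fx) * dx
    return total

c = 1.5                 # exponential rate (illustrative)
eta, sigma = 1.0, 2.0   # normal mean and standard deviation (illustrative)

def exp_pdf(x):
    return c * math.exp(-c * x) if x >= 0 else 0.0

def normal_pdf(x):
    return math.exp(-(x - eta) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

h_exp = differential_entropy(exp_pdf, 0.0, 40.0)
h_norm = differential_entropy(normal_pdf, eta - 12 * sigma, eta + 12 * sigma)

print(h_exp, 1 - math.log(c))                                   # (15-83)
print(h_norm, math.log(sigma * math.sqrt(2 * math.pi * math.e)))  # (15-84)
```

Note that the normal entropy depends on $\sigma$ only, not on the mean $\eta$.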
Joint entropy. Suppose that $\mathbf{x}$ and $\mathbf{y}$ are two discrete-type RVs taking the values
$x_i$ and $y_j$ respectively with
$P\{\mathbf{x} = x_i, \mathbf{y} = y_j\} = p_{ij}$
Their joint entropy, denoted by $H(\mathbf{x}, \mathbf{y})$, is by definition the entropy of the
product of their respective partitions. Clearly, the elements of $\mathfrak{A}_x \cdot \mathfrak{A}_y$ are the
events $\{\mathbf{x} = x_i, \mathbf{y} = y_j\}$. Hence
$H(\mathbf{x}, \mathbf{y}) = H(\mathfrak{A}_x \cdot \mathfrak{A}_y) = -\sum_{i,j} p_{ij} \ln p_{ij}$
The above can be written as an expected value
$H(\mathbf{x}, \mathbf{y}) = E\{-\ln p(\mathbf{x}, \mathbf{y})\}$
where $p(x, y)$ is a function defined only for $x = x_i$ and $y = y_j$ and it is such
that $p(x_i, y_j) = p_{ij}$.
The joint entropy $H(\mathbf{x}, \mathbf{y})$ of two continuous-type RVs $\mathbf{x}$ and $\mathbf{y}$ is defined as
the limit of the sum
$H(\mathbf{x}_\delta, \mathbf{y}_\delta) + 2\ln\delta$
where $\mathbf{x}_\delta$ and $\mathbf{y}_\delta$ are their staircase approximations. Reasoning as in (15-78), we
obtain
$H(\mathbf{x}, \mathbf{y}) = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y) \ln f(x, y)\,dx\,dy = E\{-\ln f(\mathbf{x}, \mathbf{y})\}$	(15-85)
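Example 15-13. If the RVs $\mathbf{x}$ and $\mathbf{y}$ are jointly normal as in (6-15), then
$\ln f(x, y) = -\frac{1}{2(1 - r^2)}\left[\frac{(x - \eta_1)^2}{\sigma_1^2} - 2r\frac{(x - \eta_1)(y - \eta_2)}{\sigma_1\sigma_2} + \frac{(y - \eta_2)^2}{\sigma_2^2}\right] - \ln 2\pi\sigma_1\sigma_2\sqrt{1 - r^2}$
In this case,
$E\left\{\frac{(\mathbf{x} - \eta_1)^2}{\sigma_1^2} - 2r\frac{(\mathbf{x} - \eta_1)(\mathbf{y} - \eta_2)}{\sigma_1\sigma_2} + \frac{(\mathbf{y} - \eta_2)^2}{\sigma_2^2}\right\} = 1 - 2r^2 + 1$
Hence
$E\{-\ln f(\mathbf{x}, \mathbf{y})\} = 1 + \ln 2\pi\sigma_1\sigma_2\sqrt{1 - r^2}$
From the above and (15-85) it follows that the joint entropy of two jointly
normal RVs equals
$H(\mathbf{x}, \mathbf{y}) = \ln 2\pi e\sqrt{\Delta}$	(15-86)
where
$\Delta = \mu_{11}\mu_{22} - \mu_{12}^2$	$\mu_{11} = \sigma_1^2$	$\mu_{22} = \sigma_2^2$	$\mu_{12} = r\sigma_1\sigma_2$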
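Conditional entropy. Consider two discrete-type RVs $\mathbf{x}$ and $\mathbf{y}$ taking the values
$x_i$ and $y_j$ with
$P\{\mathbf{x} = x_i|\mathbf{y} = y_j\} = \pi_{ij} = p_{ij}/p_j$
where $p_j = P\{\mathbf{y} = y_j\}$.
The conditional entropy $H(\mathbf{x}|y_j)$ of $\mathbf{x}$ assuming $\mathbf{y} = y_j$ is by definition the
conditional entropy of the partition $\mathfrak{A}_x$ of $\mathbf{x}$ assuming $\{\mathbf{y} = y_j\}$. From the above
and (15-39) it follows that
$H(\mathbf{x}|y_j) = -\sum_i \pi_{ij} \ln \pi_{ij}$	(15-87)
The conditional entropy $H(\mathbf{x}|\mathbf{y})$ of $\mathbf{x}$ assuming $\mathbf{y}$ is the conditional entropy of $\mathfrak{A}_x$
assuming $\mathfrak{A}_y$. Thus [see (15-41)]
$H(\mathbf{x}|\mathbf{y}) = \sum_j p_j H(\mathbf{x}|y_j) = -\sum_{i,j} p_{ij} \ln \pi_{ij}$	(15-88)
For continuous-type RVs the corresponding concepts are defined similarly
$H(\mathbf{x}|y) = -\int_{-\infty}^{\infty} f(x|y) \ln f(x|y)\,dx$	(15-89)
$H(\mathbf{x}|\mathbf{y}) = \int_{-\infty}^{\infty} f(y)H(\mathbf{x}|y)\,dy = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y) \ln f(x|y)\,dx\,dy$	(15-90)
The above integrals can be written as expected values [see also (7-66)]
$H(\mathbf{x}|y) = E\{-\ln f(\mathbf{x}|\mathbf{y})\,|\,\mathbf{y} = y\}$	(15-91)
$H(\mathbf{x}|\mathbf{y}) = E\{-\ln f(\mathbf{x}|\mathbf{y})\} = E\{E\{-\ln f(\mathbf{x}|\mathbf{y})\,|\,\mathbf{y}\}\}$	(15-92)
The discrete case leads to similar expressions.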
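Mutual information. Guided by (15-53), we shall call the function
$I(\mathbf{x}, \mathbf{y}) = H(\mathbf{x}) + H(\mathbf{y}) - H(\mathbf{x}, \mathbf{y})$	(15-93)
the mutual information of the RVs $\mathbf{x}$ and $\mathbf{y}$.
From (15-81) and (15-85) it follows that $I(\mathbf{x}, \mathbf{y})$ can be written as an
expected value
$I(\mathbf{x}, \mathbf{y}) = E\left\{\ln\frac{f(\mathbf{x}, \mathbf{y})}{f(\mathbf{x})f(\mathbf{y})}\right\}$	(15-94)
Since $f(x, y) = f(x|y)f(y)$ it follows from the above and (15-92) that
$I(\mathbf{x}, \mathbf{y}) = H(\mathbf{x}) - H(\mathbf{x}|\mathbf{y}) = H(\mathbf{y}) - H(\mathbf{y}|\mathbf{x})$	(15-95)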
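Example 15-14. If two RVs $\mathbf{x}$ and $\mathbf{y}$ are jointly normal with zero mean, then [see
(7-42)] the conditional density $f(x|y)$ is normal with mean $r\sigma_x y/\sigma_y$ and variance
$\sigma_x^2(1 - r^2)$. From this and (15-84) it follows that
$H(\mathbf{x}|y) = E\{-\ln f(\mathbf{x}|\mathbf{y})\,|\,\mathbf{y} = y\} = \ln\sigma_x\sqrt{2\pi e(1 - r^2)}$	(15-96)
Since this is independent of $y$, it follows from (15-92) that
$H(\mathbf{x}|\mathbf{y}) = H(\mathbf{x}|y)$	(15-97)
This yields [see (15-95)]
$I(\mathbf{x}, \mathbf{y}) = H(\mathbf{x}) - H(\mathbf{x}|\mathbf{y}) = -0.5\ln(1 - r^2)$	(15-98)
We note finally that [see (15-86)]
$H(\mathbf{x}|\mathbf{y}) + H(\mathbf{y}) = \ln 2\pi e\sqrt{\Delta} = H(\mathbf{x}, \mathbf{y})$
Special Case. Suppose that $\mathbf{y} = \mathbf{x} + \mathbf{n}$ where $\mathbf{n}$ is independent of $\mathbf{x}$ and $E\{\mathbf{n}^2\} = N$.
In this case,
$E\{\mathbf{x}\mathbf{y}\} = \sigma_x^2$	$E\{\mathbf{y}^2\} = \sigma_x^2 + N$	$r^2 = \frac{\sigma_x^2}{\sigma_x^2 + N}$
Inserting into (15-98), we obtain
$I(\mathbf{x}, \mathbf{y}) = 0.5\ln\left(1 + \frac{\sigma_x^2}{N}\right)$	(15-99)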
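The relations of Example 15-14 can be verified by evaluating the closed forms (15-84) and (15-86) directly. The following sketch assumes zero-mean jointly normal RVs and uses illustrative values $\sigma_x^2 = 4$ and $N = 1$ for the special case; the function names are introduced here and are not from the text.

```python
import math

def h_normal(sigma):
    """Entropy (15-84) of a normal RV with standard deviation sigma, in nats."""
    return math.log(sigma * math.sqrt(2 * math.pi * math.e))

def h_joint_normal(s1, s2, r):
    """Joint entropy (15-86) of two jointly normal RVs with correlation coefficient r."""
    delta = (s1 ** 2) * (s2 ** 2) * (1 - r ** 2)   # mu11*mu22 - mu12^2
    return math.log(2 * math.pi * math.e * math.sqrt(delta))

def mutual_information(s1, s2, r):
    """I(x, y) from the definition (15-93)."""
    return h_normal(s1) + h_normal(s2) - h_joint_normal(s1, s2, r)

# Special case y = x + n with n independent of x (illustrative numbers).
var_x, noise = 4.0, 1.0
var_y = var_x + noise
r = math.sqrt(var_x / var_y)                        # r^2 = var_x / (var_x + N)

i_def = mutual_information(math.sqrt(var_x), math.sqrt(var_y), r)
i_98 = -0.5 * math.log(1 - r ** 2)                  # (15-98)
i_99 = 0.5 * math.log(1 + var_x / noise)            # (15-99)
print(i_def, i_98, i_99)
```

All three values coincide, and they depend on the variances only through the ratio $\sigma_x^2/N$.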
PROPERTIES. The properties of entropy, developed in Sec. 15-2 for arbitrary
partitions, are obviously true for the entropy of discrete-type RVs and can be
simply established as appropriate limits for continuous-type RVs. It might be of
interest, however, to prove directly theorems (15-45) and (15-49) using the
representation of entropy as expected value. The proofs are based on the
following version of inequality (15-37): If $\mathbf{x}$ and $\mathbf{y}$ are two RVs with respective
densities $a(x)$ and $b(y)$, then
$E\{\ln a(\mathbf{x})\} \ge E\{\ln b(\mathbf{x})\}$	(15-100)
Equality holds iff $a(x) = b(x)$.
Proof. Applying the inequality $\ln z \le z - 1$ to the function $z = b(x)/a(x)$, we
obtain
$\ln b(x) - \ln a(x) = \ln\frac{b(x)}{a(x)} \le \frac{b(x)}{a(x)} - 1$
Multiplying by $a(x)$ and integrating, we obtain
$\int_{-\infty}^{\infty} a(x)[\ln b(x) - \ln a(x)]\,dx \le \int_{-\infty}^{\infty} [b(x) - a(x)]\,dx = 0$
and (15-100) results. The right side is 0 because the functions $a(x)$ and $b(x)$ are
densities by assumption.
Inequality (15-100) can be readily extended to $n$-dimensional densities.
For example, if $a(x, y)$ and $b(z, w)$ are the joint densities of the RVs $\mathbf{x}, \mathbf{y}$ and
$\mathbf{z}, \mathbf{w}$ respectively, then
$E\{\ln a(\mathbf{x}, \mathbf{y})\} \ge E\{\ln b(\mathbf{x}, \mathbf{y})\}$	(15-101)
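THEOREM 1.
$H(\mathbf{x}, \mathbf{y}) \le H(\mathbf{x}) + H(\mathbf{y})$	(15-102)
Proof. Suppose that $f_{xy}(x, y)$ is the joint density of the RVs $\mathbf{x}$ and $\mathbf{y}$ and $f_x(x)$
and $f_y(y)$ their marginal densities. Clearly, the product $f_x(z)f_y(w)$ is the joint
density of two independent RVs $\mathbf{z}$ and $\mathbf{w}$. Applying (15-101), we conclude that
$E\{\ln f_{xy}(\mathbf{x}, \mathbf{y})\} \ge E\{\ln[f_x(\mathbf{x})f_y(\mathbf{y})]\} = E\{\ln f_x(\mathbf{x})\} + E\{\ln f_y(\mathbf{y})\}$
and (15-102) results.
THEOREM 2.
$H(\mathbf{x}, \mathbf{y}) = H(\mathbf{x}|\mathbf{y}) + H(\mathbf{y}) = H(\mathbf{y}|\mathbf{x}) + H(\mathbf{x})$	(15-103)
Proof. Inserting the identity $f(x, y) = f(x|y)f(y)$ into (15-85), we obtain
$H(\mathbf{x}, \mathbf{y}) = E\{-\ln f(\mathbf{x}, \mathbf{y})\} = E\{-\ln f(\mathbf{x}|\mathbf{y})\} + E\{-\ln f(\mathbf{y})\}$
and the first equality in (15-103) results. The second follows because $H(\mathbf{x}, \mathbf{y}) = H(\mathbf{y}, \mathbf{x})$.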
COROLLARY. Comparing (15-102) with (15-103), we conclude that
$H(\mathbf{x}|\mathbf{y}) \le H(\mathbf{x})$	(15-104)
Note If the RV $\mathbf{y}$ is of discrete type, then $H(\mathbf{y}|\mathbf{x}) \ge 0$ and (15-103) yields $H(\mathbf{x}) \le H(\mathbf{x}, \mathbf{y})$.
This is not, however, true in general if $\mathbf{y}$ is of continuous type.
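Generalizations. The preceding results can be readily generalized to an arbitrary
number of RVs: Suppose that $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are $n$ RVs with joint density
$f(x_1, \ldots, x_n)$. Extending (15-85), we define their joint entropy as an expected
value
$H(\mathbf{x}_1, \ldots, \mathbf{x}_n) = E\{-\ln f(\mathbf{x}_1, \ldots, \mathbf{x}_n)\}$	(15-105)
If the RVs $\mathbf{x}_i$ are independent, then
$f(x_1, \ldots, x_n) = f(x_1) \cdots f(x_n)$
and (15-105) yields
$H(\mathbf{x}_1, \ldots, \mathbf{x}_n) = H(\mathbf{x}_1) + \cdots + H(\mathbf{x}_n)$	(15-106)
Conditional entropies are defined similarly. For example [see (15-92)]
$H(\mathbf{x}_n|\mathbf{x}_{n-1}, \ldots, \mathbf{x}_1) = E\{-\ln f(\mathbf{x}_n|\mathbf{x}_{n-1}, \ldots, \mathbf{x}_1)\}$	(15-107)
Chain rule From the identity [see (8-37)]
$f(x_1, \ldots, x_n) = f(x_n|x_{n-1}, \ldots, x_1) \cdots f(x_2|x_1)f(x_1)$
and (15-107) it follows that
$H(\mathbf{x}_1, \ldots, \mathbf{x}_n) = H(\mathbf{x}_n|\mathbf{x}_{n-1}, \ldots, \mathbf{x}_1) + \cdots + H(\mathbf{x}_2|\mathbf{x}_1) + H(\mathbf{x}_1)$	(15-108)
The following relationships are simple extensions of (15-102) and (15-103):
$H(\mathbf{x}, \mathbf{y}|\mathbf{z}) \le H(\mathbf{x}|\mathbf{z}) + H(\mathbf{y}|\mathbf{z})$
$H(\mathbf{x}, \mathbf{y}|\mathbf{z}) = H(\mathbf{x}|\mathbf{z}) + H(\mathbf{y}|\mathbf{x}, \mathbf{z})$	(15-109)
$H(\mathbf{x}_1, \ldots, \mathbf{x}_n) \le H(\mathbf{x}_1) + \cdots + H(\mathbf{x}_n)$
Example 15-15. If the RVs $\mathbf{x}_i$ are jointly normal with covariance matrix $C$ as in
(8-58), then
$E\{-\ln f(\mathbf{x}_1, \ldots, \mathbf{x}_n)\} = \ln\sqrt{(2\pi)^n\Delta} + \frac{1}{2}E\{XC^{-1}X^t\}$	(15-110)
This yields (see Prob. 8-23)
$H(\mathbf{x}_1, \ldots, \mathbf{x}_n) = \ln\sqrt{(2\pi e)^n\Delta}$	(15-111)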
Transformations of RVs
We shall compare the entropy of the RVs $\mathbf{x}$ and $\mathbf{y} = g(\mathbf{x})$.
Discrete type. If the RV $\mathbf{x}$ is of discrete type, then
$H(\mathbf{y}) \le H(\mathbf{x})$	(15-112)
with equality iff the transformation $y = g(x)$ has a unique inverse.
Proof. Suppose that $\mathbf{x}$ takes the values $x_i$ with probability $p_i$ and $g(x)$ has a
unique inverse. In this case,
$P\{\mathbf{y} = y_i\} = P\{\mathbf{x} = x_i\} = p_i$	$y_i = g(x_i)$
hence $H(\mathbf{y}) = H(\mathbf{x})$. If the transformation is not one-to-one, then $\mathbf{y} = y_i$ for
more than one value of $\mathbf{x}$. This results in a reduction of $H(\mathbf{x})$ [see (15-31)].
Continuous type. If the RV $\mathbf{x}$ is of continuous type, then
$H(\mathbf{y}) \le H(\mathbf{x}) + E\{\ln|g'(\mathbf{x})|\}$	(15-113)
with equality iff the transformation $y = g(x)$ has a unique inverse.
Proof. As we know [see (5-5)], if $y = g(x)$ has a unique inverse $x = g^{(-1)}(y)$,
then
$f_y(y) = \frac{f_x(x)}{|g'(x)|}$	$f_y(y)\,dy = f_x(x)\,dx$
Hence
$H(\mathbf{y}) = -\int_{-\infty}^{\infty} f_y(y) \ln f_y(y)\,dy = -\int_{-\infty}^{\infty} f_x(x) \ln\frac{f_x(x)}{|g'(x)|}\,dx$
$= -\int_{-\infty}^{\infty} f_x(x) \ln f_x(x)\,dx + \int_{-\infty}^{\infty} f_x(x) \ln|g'(x)|\,dx$
and (15-113) results.
Several RVs. Reasoning as in (15-113), we can similarly show that if
$\mathbf{y}_i = g_i(\mathbf{x}_1, \ldots, \mathbf{x}_n)$
are $n$ functions of the RVs $\mathbf{x}_i$, then
$H(\mathbf{y}_1, \ldots, \mathbf{y}_n) \le H(\mathbf{x}_1, \ldots, \mathbf{x}_n) + E\{\ln|J(\mathbf{x}_1, \ldots, \mathbf{x}_n)|\}$	(15-114)
where $J(x_1, \ldots, x_n)$ is the jacobian of the above transformation [see (8-9)].
Equality holds iff the transformation has a unique inverse.
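Linear transformations Suppose that
$\mathbf{y}_i = a_{i1}\mathbf{x}_1 + \cdots + a_{in}\mathbf{x}_n$
Denoting by $\Delta$ the determinant of the coefficients, we conclude from (15-114)
that if $\Delta \ne 0$ then
$H(\mathbf{y}_1, \ldots, \mathbf{y}_n) = H(\mathbf{x}_1, \ldots, \mathbf{x}_n) + \ln|\Delta|$	(15-115)
because the transformation has a unique inverse and $\Delta$ does not depend on $x_i$.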
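Relation (15-115) can be checked for jointly normal RVs, whose entropy is given in closed form by (15-111). The sketch below assumes a two-dimensional zero-mean normal vector with an illustrative covariance matrix and coefficient matrix; it uses the standard fact that a linear map with matrix $A$ changes the covariance from $C$ to $ACA^t$. All names and numbers are assumptions of this example.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]

def gaussian_entropy2(cov):
    """Entropy (15-111) with n = 2: ln sqrt((2*pi*e)^2 * det(cov))."""
    return 0.5 * math.log((2 * math.pi * math.e) ** 2 * det2(cov))

C_x = [[2.0, 0.5], [0.5, 1.0]]      # covariance of (x1, x2), illustrative
A = [[1.0, 2.0], [3.0, -1.0]]       # coefficients a_ij, determinant -7

C_y = matmul2(matmul2(A, C_x), transpose2(A))   # covariance of (y1, y2)

lhs = gaussian_entropy2(C_y)
rhs = gaussian_entropy2(C_x) + math.log(abs(det2(A)))
print(lhs, rhs)
```

The two sides agree because $\det(ACA^t) = (\det A)^2 \det C$, so the entropy shifts by exactly $\ln|\Delta|$.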
Stochastic Processes and Entropy Rate
As we know, the statistics of most stochastic processes arc determined in terms
of the joint density /(xh..., x,„) of the RVs х(/Д..., x(t,„). The joint entropy
W(x„...,xM)-E{-ln/(xI,...,xw)}	(15-116)
of these RVs is the mth-order entropy of the process x(/). This function equals
the uncertainty about the above RVs and it equals the information gained when
they are observed.
In general, the uncertainty about the values of x(/) on the entire t axis or
even on a finite interval, no matter how small, is infinite. However, if x(r) can be
expressed in terms of its values on a countable set of points, as in the case for
bandlimited processes, then a rate of uncertainty can be introduced. It suffices,
therefore, to consider only discrete-time processes.
The mth-order entropy of a discrete-time process x„ is the joint entropy
H(xh...,xOT) of the m RVs
x« ’	।»• • •, x„ +1	(15-117)
defined as in (15-116). We shall assume throughout that the process x„ is SSS.
In this case, Жх| • • • xOT) is the uncertainty about any m consecutive values of
the process хл. The first-order entropy will be denoted by Жх). Thus Жх)
equals the uncertainty about хл for a specific n.
Clearly [see (15-109)]
H(x_1, ..., x_m) \le H(x_1) + ... + H(x_m)	(15-118)
Special cases (a) If the process x_n is strictly white, that is, if the RVs
x_n, x_{n-1}, ... are independent, then [see (15-106)]
H(x_1, ..., x_m) = m H(x)	(15-119)
(b) If the process x_n is Markoff, then [see (16-99)]
f(x_1, ..., x_m) = f(x_m | x_{m-1}) ... f(x_2 | x_1) f(x_1)	(15-120)
This yields
H(x_1, ..., x_m) = H(x_m | x_{m-1}) + ... + H(x_2 | x_1) + H(x_1)	(15-121)
From (15-103) and the stationarity of x_n it follows, therefore, that
H(x_1, ..., x_m) = (m - 1) H(x_1, x_2) - (m - 2) H(x)	(15-122)
We have thus expressed the mth-order entropy of a Markoff process in terms of
its first- and second-order entropies.
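The relation (15-122) can be checked by brute force on a small example. The sketch below uses a hypothetical two-state stationary Markoff chain (the transition matrix P and its stationary distribution PI are assumptions chosen for illustration) and compares the mth-order entropy, computed by enumerating all sequences, with the right side of (15-122).

```python
import math
from itertools import product

# Hypothetical two-state chain: P[i][j] = P{x_{n+1} = j | x_n = i}.
P = [[0.9, 0.1], [0.4, 0.6]]
PI = [0.8, 0.2]          # stationary distribution, PI P = PI

def joint_entropy(m):
    """mth-order entropy (nats) by enumeration of all 2^m sequences."""
    total = 0.0
    for seq in product(range(2), repeat=m):
        p = PI[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a][b]
        total -= p * math.log(p)
    return total

h1 = joint_entropy(1)    # first-order entropy H(x)
h2 = joint_entropy(2)    # second-order entropy H(x_1, x_2)
for m in range(3, 8):
    print(m, joint_entropy(m), (m - 1) * h2 - (m - 2) * h1)
```

For each m the two printed values should coincide up to rounding, since the chain is started from its stationary distribution.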
CONDITIONAL ENTROPY. The conditional entropy of order m
H(x_n | x_{n-1}, ..., x_{n-m})
of a process x_n is the uncertainty about its present under the assumption that its
m most recent values have been observed. Extending (15-104), we can readily
show that
H(x_n | x_{n-1}, ..., x_{n-m}) \le H(x_n | x_{n-1}, ..., x_{n-m+1})	(15-123)
Thus the above conditional entropy is a decreasing function of m. If, therefore,
it is bounded from below, it tends to a limit. This is certainly the case if the RVs
x_n are of discrete type because then all entropies are positive. The limit will be
denoted by H_c(x) and will be called the conditional entropy of the process x_n:
H_c(x) = \lim_{m \to \infty} H(x_n | x_{n-1}, ..., x_{n-m})	(15-124)
The function H_c(x) is a measure of our uncertainty about the present of x_n
under the assumption that its entire past is observed.
Special cases (a) If x_n is strictly white, then
H_c(x) = H(x)
(b) If x_n is a Markoff process, then
H(x_n | x_{n-1}, ..., x_{n-m}) = H(x_n | x_{n-1})
Since x_n is a stationary process, the above equals H(x_2 | x_1). Hence
H_c(x) = H(x_2 | x_1) = H(x_1, x_2) - H(x)	(15-125)
This shows that if x_{n-1} is observed, then the past has no effect on the
uncertainty of the present.
ENTROPY RATE. The ratio H(x_1, ..., x_m)/m is the average uncertainty per
sample in a block of m consecutive samples. The limit of this average as m \to \infty
will be denoted by \bar{H}(x) and will be called the entropy rate of the process x_n:
\bar{H}(x) = \lim_{m \to \infty} (1/m) H(x_1, ..., x_m)	(15-126)
If x_n is strictly white, then
\bar{H}(x) = H(x) = H_c(x)
If x_n is Markoff, then [see (15-122)]
\bar{H}(x) = H(x_1, x_2) - H(x) = H_c(x)	(15-127)
Thus, in both cases, the limit in (15-126) exists and it equals H_c(x). We show
next that this is true in general.
THEOREM. The entropy rate of a process x_n equals its conditional entropy
\bar{H}(x) = H_c(x)	(15-128)
Proof. This is a consequence of the following simple property of convergent
sequences: If
a_k \to a then (1/m) \sum_{k=1}^{m} a_k \to a	(15-129)
Since x_n is stationary we conclude, as in (15-108), that
H(x_1, ..., x_m) = H(x) + \sum_{k=1}^{m-1} H(x_n | x_{n-1}, ..., x_{n-k})
Dividing by m and using (15-129), we obtain (15-128) because
H(x_n | x_{n-1}, ..., x_{n-k})
tends to H_c(x) as k \to \infty.
Note If x_n equals the samples x(nT) of x(t), then the entropy rate is measured in bits per
T seconds. If we wish to measure it in bits per second, we must divide by T.
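The convergence asserted by the theorem can be observed numerically. The following sketch, again for a hypothetical two-state stationary Markoff chain (P and PI are illustrative assumptions), computes the per-sample block entropy H(x_1, ..., x_m)/m for increasing m and compares it with the conditional entropy H(x_2 | x_1). Entropies are in nats because natural logarithms are used.

```python
import math
from itertools import product

# Hypothetical two-state chain and its stationary distribution.
P = [[0.9, 0.1], [0.4, 0.6]]
PI = [0.8, 0.2]

def block_entropy(m):
    """H(x_1, ..., x_m) in nats, by enumeration of all 2^m sequences."""
    total = 0.0
    for seq in product(range(2), repeat=m):
        p = PI[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a][b]
        total -= p * math.log(p)
    return total

# Conditional entropy H(x_2 | x_1) of the chain, the limit H_c(x) in (15-125).
h_c = -sum(PI[i] * P[i][j] * math.log(P[i][j])
           for i in range(2) for j in range(2))

per_sample = [block_entropy(m) / m for m in range(1, 13)]
print(per_sample[0], per_sample[-1], h_c)
```

The per-sample values decrease toward h_c at a rate of order 1/m, in line with the averaging property (15-129).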
Normal processes. We shall show that if x_n is a normal process with power
spectrum S(\omega), then
\bar{H}(x) = \ln \sqrt{2\pi e} + (1/4\pi) \int_{-\pi}^{\pi} \ln S(\omega) d\omega	(15-130)
Proof. As we know, the function f(x_{m+1} | x_m, ..., x_1) is a one-dimensional
normal density with variance \Delta_{m+1}/\Delta_m [see (8-85) and (14-66)]. Hence
H(x_n | x_{n-1}, ..., x_{n-m}) = \ln \sqrt{2\pi e \Delta_{m+1}/\Delta_m}	(15-131)
as in (15-84). This leads to the conclusion that
H_c(x) = \ln \sqrt{2\pi e} + (1/2) \lim_{m \to \infty} \ln (\Delta_{m+1}/\Delta_m)	(15-132)
and (15-130) follows from (14-70) and Prob. 14-15.
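A quick numerical check of (15-130) is possible for a first-order autoregressive normal process x_n = a x_{n-1} + v_n, where v_n is white with variance var_v. This example, the parameter values, and the choice T = 1 are assumptions made for illustration. Since the process is Markoff and the variance of x_n given its past is var_v, its conditional entropy is ln sqrt(2 pi e var_v), and the integral in (15-130) should reproduce this value.

```python
import math

def entropy_rate_normal(spectrum, k=20000):
    """Evaluate (15-130) with T = 1 by the midpoint rule on (-pi, pi)."""
    dw = 2 * math.pi / k
    integral = sum(math.log(spectrum(-math.pi + (i + 0.5) * dw))
                   for i in range(k)) * dw
    return 0.5 * math.log(2 * math.pi * math.e) + integral / (4 * math.pi)

# Hypothetical AR(1) parameters (assumptions): |a| < 1, innovation variance var_v.
a, var_v = 0.7, 2.0

def s_ar1(w):
    """Power spectrum var_v / |1 - a e^{-jw}|^2 of the AR(1) process."""
    return var_v / (1 - 2 * a * math.cos(w) + a * a)

rate = entropy_rate_normal(s_ar1)
expected = 0.5 * math.log(2 * math.pi * math.e * var_v)
print(rate, expected)
```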
ENTROPY RATE OF SYSTEM RESPONSE. We shall show that the entropy rate
\bar{H}(y) of the output y_n of a linear system L(z) is given by
\bar{H}(y) = \bar{H}(x) + (1/2\pi) \int_{-\pi}^{\pi} \ln |L(e^{j\omega T})| d\omega	(15-133)
where \bar{H}(x) is the entropy rate of the input x_n (Fig. 15-15).
FIGURE 15-15 (linear system L(z) with input x_n and output y_n)
Suppose, first, that x_n is a normal process. In this case y_n is also normal
and its entropy rate is given by (15-130) where
S(\omega) = S_y(\omega) = S_x(\omega) |L(e^{j\omega T})|^2	(15-134)
This yields
\bar{H}(y) = \ln \sqrt{2\pi e} + (1/4\pi) \int_{-\pi}^{\pi} [\ln S_x(\omega) + \ln |L(e^{j\omega T})|^2] d\omega	(15-135)
and (15-133) follows.
The proof for arbitrary processes is involved. We shall sketch a justification
based on (15-115): If the RVs y_1, ..., y_m depend linearly on the RVs
x_1, ..., x_m, then
H(y_1, ..., y_m) = H(x_1, ..., x_m) + K_0	(15-136)
where K_0 = \ln |\Delta| is a constant that depends only on the coefficients of the
transformation. The process y_n depends linearly on x_n:
y_n = \sum_{k=0}^{\infty} l_k x_{n-k}        n = -\infty, ..., \infty	(15-137)
where now the transformation matrix is of infinite order. Extending (15-136) to
infinitely many variables, we conclude with (15-126) that
\bar{H}(y) = \bar{H}(x) + K	(15-138)
where again K is a constant that depends only on the parameters of the system
L(z). As we have seen, if x_n is normal, then K equals the integral in (15-133).
And since K is independent of x_n, it must equal that integral for any x_n.
15-4 THE MAXIMUM ENTROPY METHOD
The MEM is used to determine various parameters of a probability space
subject to given constraints. The resulting problem can be solved, in general,
only numerically and it involves the evaluation of the maximum of a function of
several variables. In a number of important cases, however, the solution can be
found analytically or it can be reduced to a system of algebraic equations. In this
section, we consider certain special cases, concentrating on constraints in the
form of expected values. The results can be obtained with the familiar varia-
tional techniques involving Lagrange multipliers or Euler’s equations. For most
problems under consideration, however, it suffices to use the following form of
(15-100).
If f(x) and \varphi(x) are two arbitrary densities, then
-\int \varphi(x) \ln \varphi(x) dx \le -\int \varphi(x) \ln f(x) dx	(15-139)
Example 15-16. In the coin experiment, the probability of heads is often viewed as
an RV p (see bayesian estimation, Sec. 9-2). We shall show that if no prior
information about p is available, then, according to the ME principle, its density
f(p) is uniform in the interval (0, 1). In this problem we must maximize H(p)
subject to the constraint (dictated by the meaning of p) that f(p) = 0 outside the
interval (0, 1). The corresponding entropy is, therefore, given by
H(p) = -\int_0^1 f(p) \ln f(p) dp
and our problem is to find f(p) such as to maximize the above integral.
We maintain that H(p) is maximum if
f(p) = 1        H(p) = 0
Indeed, if \varphi(p) is any other density such that \varphi(p) = 0 outside the interval (0, 1),
then [see (15-139)]
-\int_0^1 \varphi(p) \ln \varphi(p) dp \le -\int_0^1 \varphi(p) \ln f(p) dp = 0 = H(p)
Example 15-17. Suppose that x is an RV vanishing outside the interval (-\pi, \pi).
Using the MEM, we shall determine the density f(x) of x under the assumption
that the coefficients c_n of its Fourier series expansion
f(x) = \sum_{n=-\infty}^{\infty} c_n e^{jnx}        -\pi \le x \le \pi
are known for |n| \le N. Our problem now is to maximize the integral
H(x) = -\int_{-\pi}^{\pi} f(x) \ln f(x) dx
subject to the constraints
c_n = (1/2\pi) \int_{-\pi}^{\pi} f(x) e^{-jnx} dx        |n| \le N	(15-140)
Clearly, H(x) depends on the unknown coefficients c_n and it is maximum iff
\partial H/\partial c_n = (\partial H/\partial f)(\partial f/\partial c_n) = -\int_{-\pi}^{\pi} [\ln f(x) + 1] e^{jnx} dx = 0        |n| > N
This shows that the coefficients \gamma_n of the Fourier series expansion of the function
\ln f(x) + 1 in the interval (-\pi, \pi) are 0 for |n| > N. Hence
\ln f(x) + 1 = \sum_{k=-N}^{N} \gamma_k e^{jkx}
From the above it follows that
f(x) = \exp{-1 + \sum_{k=-N}^{N} \gamma_k e^{jkx}}        -\pi \le x \le \pi	(15-141)
We have thus shown that the unknown function is given by an exponential
involving the parameters \gamma_k. These parameters can be determined from (15-140).
The resulting system is nonlinear and can only be solved numerically.
Constraints as Expected Values
We shall consider now a class of problems involving constraints in the form of
expected values. Such problems are common in statistical mechanics. We start
with the one-dimensional case.
We wish to determine the density f(x) of an RV x subject to the condition
that the expected values \eta_i of n known functions g_i(x) of x are given
E{g_i(x)} = \int_{-\infty}^{\infty} g_i(x) f(x) dx = \eta_i        i = 1, ..., n	(15-142)
Using (15-139), we shall show that the MEM leads to the conclusion that
f(x) must be an exponential
f(x) = A \exp{-\lambda_1 g_1(x) - ... - \lambda_n g_n(x)}	(15-143)
where \lambda_i are n constants determined from (15-142) and A is such as to satisfy
the density condition
A \int_{-\infty}^{\infty} \exp{-\lambda_1 g_1(x) - ... - \lambda_n g_n(x)} dx = 1	(15-144)
Proof. Suppose that f(x) is given by (15-143). In this case,
\int_{-\infty}^{\infty} f(x) \ln f(x) dx = \int_{-\infty}^{\infty} f(x) [\ln A - \lambda_1 g_1(x) - ... - \lambda_n g_n(x)] dx
Hence
H(x) = \lambda_1 \eta_1 + ... + \lambda_n \eta_n - \ln A	(15-145)
To prove (15-143), it suffices, therefore, to show that, if \varphi(x) is any other
density satisfying the constraints (15-142), then its entropy cannot exceed the
right side of (15-145). This follows readily from (15-139):
-\int_{-\infty}^{\infty} \varphi(x) \ln \varphi(x) dx \le -\int_{-\infty}^{\infty} \varphi(x) \ln f(x) dx
  = \int_{-\infty}^{\infty} \varphi(x) [\lambda_1 g_1(x) + ... + \lambda_n g_n(x) - \ln A] dx
  = \lambda_1 \eta_1 + ... + \lambda_n \eta_n - \ln A
We note that, if f(x) = 0 outside a certain set R, then f(x) is again given
by (15-143) for every x in R and the region of integration in (15-144) is the
set R.
Example 15-18. We shall determine f(x) assuming that x is a positive RV with
known mean \eta. With g(x) = x, it follows from (15-143) that
f(x) = A e^{-\lambda x} for x > 0,        f(x) = 0 for x < 0
We have thus shown that if an RV is positive with specified mean, then its density,
obtained with the MEM, is an exponential.
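To make the conclusion of Example 15-18 concrete, the sketch below compares the entropy of the exponential density with mean eta against two other positive densities having the same mean: a uniform density on (0, 2 eta) and a half-normal density. The choice of these two competitors and the closed-form entropy expressions used (standard results, in nats) are additions for illustration and do not appear in the text.

```python
import math

def entropy_exponential(eta):
    """Entropy of f(x) = (1/eta) exp(-x/eta), x > 0."""
    return 1.0 + math.log(eta)

def entropy_uniform(eta):
    """Entropy of the uniform density on (0, 2 eta), whose mean is eta."""
    return math.log(2.0 * eta)

def entropy_half_normal(eta):
    """Entropy of a half-normal density scaled so that its mean is eta.

    The scale s satisfies eta = s * sqrt(2/pi); the entropy is
    0.5 * ln(pi * s^2 / 2) + 0.5.
    """
    s = eta * math.sqrt(math.pi / 2.0)
    return 0.5 * math.log(math.pi * s * s / 2.0) + 0.5

for eta in (1.0, 2.5):
    print(eta, entropy_exponential(eta),
          entropy_half_normal(eta), entropy_uniform(eta))
```

For every eta the exponential entropy is the largest of the three, as the MEM result requires.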
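THE PARTITION FUNCTION. In certain problems, it is more convenient to
express the given constraints in terms of the partition function (Zustandssumme)
Z(\lambda_1, ..., \lambda_n) = 1/A = \int_{-\infty}^{\infty} \exp{-\lambda_1 g_1(x) - ... - \lambda_n g_n(x)} dx	(15-146)
Indeed, differentiating with respect to \lambda_i, we obtain
-\partial Z/\partial \lambda_i = \int_{-\infty}^{\infty} g_i(x) \exp{-\sum_{k=1}^{n} \lambda_k g_k(x)} dx = Z \int_{-\infty}^{\infty} g_i(x) f(x) dx
This yields
-(1/Z) \partial Z/\partial \lambda_i = -\partial \ln Z/\partial \lambda_i = \eta_i        i = 1, ..., n	(15-147)
The above is a system of n equations equivalent to (15-142) and can be used to
determine the n parameters \lambda_i.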
Example 15-19. In the coin experiment of Example 15-16, we assume that p is an
RV with known mean \eta. Since f(p) = 0 outside the interval (0, 1), (15-143) yields
f(p) = A e^{-\lambda p} for 0 \le p \le 1, and 0 otherwise        Z = \int_0^1 e^{-\lambda p} dp = (1 - e^{-\lambda})/\lambda
The constant \lambda is determined from (15-147):
-(1/Z) \partial Z/\partial \lambda = (1 - e^{-\lambda} - \lambda e^{-\lambda}) / [\lambda (1 - e^{-\lambda})] = \eta
In Fig. 15-16, we plot \lambda and f(p) for various values of \eta. Note that if \eta = 0.5,
then \lambda = 0 and f(p) = 1.
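The equation for the constant in Example 15-19 has no closed-form solution, but it is easily solved numerically. The sketch below uses bisection; the bracket (-50, 50), the iteration count, and the function names are assumptions chosen for illustration. The mean is written in the equivalent form 1/lambda - e^{-lambda}/(1 - e^{-lambda}), with the series value 1/2 - lambda/12 used near lambda = 0 to avoid cancellation.

```python
import math

def truncated_exponential_mean(lam):
    """Mean of the density proportional to exp(-lam * p) on (0, 1)."""
    if abs(lam) < 1e-5:
        return 0.5 - lam / 12.0          # series expansion near lam = 0
    return 1.0 / lam + math.exp(-lam) / math.expm1(-lam)

def solve_lambda(eta, lo=-50.0, hi=50.0, iters=200):
    """Bisection for the lam whose mean equals eta (mean decreases with lam)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if truncated_exponential_mean(mid) > eta:
            lo = mid                     # mean too large: increase lam
        else:
            hi = mid
    return 0.5 * (lo + hi)

for eta in (0.3, 0.5, 0.7):
    lam = solve_lambda(eta)
    print(eta, lam, truncated_exponential_mean(lam))
```

With this bracket the sketch covers means between roughly 0.02 and 0.98. Means below 0.5 give a positive constant and means above 0.5 a negative one, with the uniform density recovered at 0.5.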
FIGURE 15-16 (\lambda and f(p) for various values of \eta)
Example 15-20. A collection of particles moves in a conservative field whose
potential equals V(x). For a specific t, the x component of the position of a
particle is an RV x with density f(x) independent of t (stationary state). Thus the
probability that the particle is between x and x + dx equals f(x) dx and the total
energy per unit mass of the ensemble equals
I = \int_{-\infty}^{\infty} V(x) f(x) dx = E{V(x)}
We shall find f(x) under the assumption that the function g(x) = V(x) and the
mean I of V(x) are given. Inserting into (15-143), we obtain
f(x) = (1/Z) e^{-\lambda V(x)}	(15-148)
where
Z = \int_{-\infty}^{\infty} e^{-\lambda V(x)} dx        (1/Z) \int_{-\infty}^{\infty} V(x) e^{-\lambda V(x)} dx = I
Special Case. In a gravitational field, the potential V(x) = Mgx is proportional to
the distance x from the ground. Since f(x) = 0 for x < 0, it follows from (15-148)
that
f(x) = (Mg/I) e^{-Mgx/I}        x > 0
The resulting atmospheric pressure is proportional to 1 - F(x).
Example 15-21. We shall find f(x) such that E{x^2} = m_2. With g_1(x) = x^2,
(15-143) yields
f(x) = A e^{-\lambda x^2}	(15-149)
Thus, if the second moment m_2 of an RV x is specified, then x is N(0, m_2). We
can show similarly that if the variance \sigma^2 of x is specified, then x is N(\eta, \sigma^2)
where \eta is an arbitrary constant.
Special Case. We consider again a collection of particles in stationary motion and
we denote by v_x the x component of their velocity. We shall determine the density
f(v_x) of v_x under the constraint that the corresponding average kinetic energy
K_x = E{M v_x^2/2} is specified. This is a special case of (15-149) with m_2 = 2K_x/M.
Hence
f(v_x) = \sqrt{M/(4\pi K_x)} \exp{-M v_x^2/(4K_x)}
Discrete type RVs. Suppose that an RV x takes the values x_k with probability
p_k. We shall use the MEM to determine p_k under the assumption that the
expected values
E{g_i(x)} = \sum_k g_i(x_k) p_k = \eta_i        i = 1, ..., n	(15-150)
of the n known functions g_i(x) are given.
Using (15-37), we can show as in (15-143) that the unknown probabilities
equal
p_k = A \exp{-\lambda_1 g_1(x_k) - ... - \lambda_n g_n(x_k)}	(15-151)
where
1/A = Z = \sum_k \exp{-[\lambda_1 g_1(x_k) + ... + \lambda_n g_n(x_k)]}	(15-152)
The n constants \lambda_i are determined either from (15-150) or from the
equivalent system
-(1/Z) \partial Z/\partial \lambda_i = \eta_i        i = 1, ..., n	(15-153)
Example 15-22. A die is rolled a large number of times and the average number of
dots up equals \eta. Assuming that \eta is known, we shall determine the probabilities
p_k of the six faces f_k using the MEM. For this purpose, we form an RV x such
that x(f_k) = k. Clearly,
E{x} = p_1 + 2p_2 + ... + 6p_6 = \eta
With g(x) = x, it follows from (15-151) that
p_k = (1/Z) e^{-\lambda k}        Z = w + w^2 + ... + w^6
where w = e^{-\lambda}. Hence
p_k = w^k / (w + w^2 + ... + w^6)        (w + 2w^2 + ... + 6w^6) / (w + w^2 + ... + w^6) = \eta
as in Fig. 15-17. We note that if \eta = 3.5, then p_k = 1/6.
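The equation for w in Example 15-22 can be solved by bisection. The sketch below works with ln w to avoid overflow; the bracket, the iteration count, and the function name are assumptions made for illustration.

```python
import math

def die_probabilities(eta, iters=200):
    """ME face probabilities p_k = w^k / Z for a die with mean eta.

    Solves (w + 2w^2 + ... + 6w^6) / (w + ... + w^6) = eta by bisection
    on log_w = ln w; the mean increases with w.
    """
    faces = range(1, 7)

    def weights(log_w):
        return [math.exp(k * log_w) for k in faces]

    def mean(log_w):
        wk = weights(log_w)
        return sum(k * v for k, v in zip(faces, wk)) / sum(wk)

    lo, hi = -50.0, 50.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < eta:
            lo = mid
        else:
            hi = mid
    wk = weights(0.5 * (lo + hi))
    z = sum(wk)
    return [v / z for v in wk]

print(die_probabilities(3.5))
print(die_probabilities(4.5))
```

For a mean of 3.5 the probabilities are all 1/6; for a mean above 3.5 they increase geometrically with the face number.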
Joint density. The MEM can be used to determine the density f(X) of the
random vector X: [x_1, ..., x_M] subject to the n constraints
E{g_i(X)} = \eta_i        i = 1, ..., n	(15-154)
Reasoning as in the scalar case, we conclude that
f(X) = A \exp{-\lambda_1 g_1(X) - ... - \lambda_n g_n(X)}	(15-155)
Second-Order Moments and Normality
We are given the correlation matrix
R = E{X^t X}	(15-156)
of the random vector X and we wish to find its density using the MEM. We
maintain that f(X) is normal with zero mean as in (8-58)
f(X) = [1/\sqrt{(2\pi)^M \Delta}] \exp{-(1/2) X R^{-1} X^t}	(15-157)
Proof. The elements R_{jk} = E{x_j x_k} of R are the expected values of the M^2
RVs g_{jk}(X) = x_j x_k. Changing the subscript i in (15-154) to double subscript, we
conclude from (15-155) that
f(X) = A \exp{-\sum_{j,k} \lambda_{jk} x_j x_k}	(15-158)
This shows that f(X) is normal. The M^2 coefficients \lambda_{jk} can be determined
from the M^2 constraints in (15-156). As we know [see (8-58)], these coefficients
equal the elements of the matrix R^{-1}/2 as in (15-157).
The preceding results are acceptable only if the matrix R is p.d. Otherwise,
the function f(X) in (15-157) is not a density. The p.d. condition is, of
course, satisfied if the given R is a true correlation matrix. However, even then
(15-157) might not be acceptable if only a subset of the elements of R is
specified. In such cases, it might be necessary, as we shall presently see, to
introduce the unspecified elements of R as auxiliary constraints.
Suppose, first, that we are given only the diagonal elements of R:
E{x_i^2} = R_{ii}        i = 1, ..., M	(15-159)
Inserting the functions g_{ii}(X) = x_i^2 into (15-155), we obtain
f(X) = A \exp{-\lambda_{11} x_1^2 - ... - \lambda_{MM} x_M^2}	(15-160)
This shows that the RVs x_i are normal, independent, with zero mean and
variance R_{ii} = 1/(2\lambda_{ii}).
The above solution is acceptable because R_{ii} > 0. If, however, we are
given N < M^2 arbitrary joint moments, then the corresponding quadratic in
(15-158) will contain only the terms x_j x_k corresponding to the given moments.
The resulting f(X) might not then be a density. To find the ME solution for
this case, we proceed as follows: We introduce as constraints the M^2 joint
moments R_{jk} where now only N of these moments are given and the other
M^2 - N moments are unknown parameters. Applying the MEM, we obtain
(15-157). The corresponding entropy equals [see (15-111)]
H(x_1, ..., x_M) = \ln \sqrt{(2\pi e)^M \Delta}        \Delta = |R|	(15-161)
This entropy depends on the unspecified parameters of R and it is maximum if
its determinant \Delta is maximum. Thus the RVs x_i are again normal with density
as in (15-157) where the unspecified parameters of R are such as to maximize \Delta.
Note From the above it follows that the determinant \Delta of a correlation matrix R is such
that
\Delta \le R_{11} ... R_{MM}
with equality iff R is diagonal. Indeed, (15-159) is a restricted moment set; hence the ME
solution (15-160) maximizes \Delta.
Stochastic processes. The MEM can be used to determine the statistics of a
stochastic process subject to given constraints. We shall discuss the following
case.
Suppose that x_n is a WSS process with autocorrelation
R[m] = E{x_{n+m} x_n}
We wish to find its various densities assuming that R[m] is specified either for
some or for all values of m. As we know [see (15-158)] the MEM leads to the
conclusion that, in both cases, x_n must be a normal process with zero mean.
This completes the statistical description of x_n if R[m] is known for all m. If,
however, we know R[m] only partially, then we must find its unspecified values.
For finite-order densities, this involves the maximization of the corresponding
entropy with respect to the unknown values of R[m] and it is equivalent to the
maximization of the correlation determinant \Delta [see (15-161)]. An important
special case is the MEM solution to the extrapolation problem considered in
Sec. 13-3. We shall reexamine this problem in the context of the entropy rate.
We start with the simplest case: Given the average power E{x_n^2} = R[0] of
x_n, we wish to find its power spectrum. In this case, the entropy of the RVs
x_1, ..., x_M
is maximum if these RVs are normal and independent for any M [see (15-160)],
that is, if the process x_n is normal white noise with R[m] = R[0] \delta[m].
Suppose now that we are given the N + 1 values (data)
R[0], ..., R[N]
of R[m] and we wish to find the density f(X) of the M + 1 RVs x_n, ..., x_{n+M}.
If M \le N, then the correlation matrix of X is specified in terms of the data and
f(X) is given by (15-157). This is not the case, however, if M > N because then
only the center diagonal and the N upper and lower diagonals of the correlation
matrix
R_{M+1} = \begin{bmatrix} R[0] & R[1] & \cdots & R[M] \\ R[1] & R[0] & \cdots & R[M-1] \\ \vdots & \vdots & & \vdots \\ R[M] & R[M-1] & \cdots & R[0] \end{bmatrix}
are known. To complete the specification of R_{M+1}, we maximize the determinant
\Delta_{M+1} with respect to the unknown values of R[m].
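Example 15-23. Given R[0] and R[1], we shall find R[2] using the maximum
determinant method. In this case,
\Delta = \begin{vmatrix} R[0] & R[1] & R[2] \\ R[1] & R[0] & R[1] \\ R[2] & R[1] & R[0] \end{vmatrix}
Hence
\partial \Delta/\partial R[2] = -2R[0]R[2] + 2R^2[1] = 0        R[2] = R^2[1]/R[0]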
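The result of Example 15-23 can be confirmed numerically by maximizing the 3 by 3 correlation determinant over R[2]. The sketch below does this with a ternary search, which is adequate because the determinant is a concave quadratic in R[2]. The numerical values of R[0] and R[1] and the search routine are assumptions chosen for illustration.

```python
def toeplitz_det3(r0, r1, r2):
    """Determinant of [[r0, r1, r2], [r1, r0, r1], [r2, r1, r0]] by cofactors."""
    return (r0 * (r0 * r0 - r1 * r1)
            - r1 * (r1 * r0 - r1 * r2)
            + r2 * (r1 * r1 - r0 * r2))

def maximize_r2(r0, r1, iters=200):
    """Ternary search for the R[2] in (-r0, r0) that maximizes the determinant."""
    lo, hi = -r0, r0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if toeplitz_det3(r0, r1, m1) < toeplitz_det3(r0, r1, m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# Hypothetical data (assumptions): R[0] = 2, R[1] = 1.
r0, r1 = 2.0, 1.0
r2_numeric = maximize_r2(r0, r1)
r2_formula = r1 * r1 / r0
print(r2_numeric, r2_formula)
```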
THE MEM IN SPECTRAL ESTIMATION. We are given again R[m] for |m| \le N.
The power spectrum
S(\omega) = R[0] + 2 \sum_{m=1}^{\infty} R[m] \cos m\omega T
of x_n involves the values of R[m] for every m. To find its unspecified values, we
maximize the correlation determinant \Delta_M and examine the form of the resulting
R[m] as M \to \infty. This is equivalent to the maximization of the entropy rate
\bar{H}(x) of the process x_n. Using this equivalence, we shall develop a more direct
method for determining S(\omega).
As we know, the MEM leads to the conclusion that under the given
constraints (second-order moments), the process x_n must be normal with zero
mean. From this and (15-130) it follows that
\bar{H}(x) = \ln \sqrt{2\pi e} + (1/4\pi) \int_{-\pi}^{\pi} \ln S(\omega) d\omega
The entropy rate \bar{H}(x) depends on the unspecified values of R[m] and it is
maximum if
\partial \bar{H}/\partial R[m] = (1/4\pi) \int_{-\pi}^{\pi} [1/S(\omega)] e^{-jm\omega T} d\omega = 0        |m| > N	(15-162)
This shows that the coefficients of the Fourier series expansion of 1/S(\omega) are 0
for |m| > N. Hence
1/S(\omega) = \sum_{k=-N}^{N} c_k e^{-jk\omega T}
Factoring the resulting S(z) as in (12-6), we obtain
S(\omega) = 1 / |b_0 + b_1 e^{-j\omega T} + ... + b_N e^{-jN\omega T}|^2	(15-163)
This is the spectrum obtained in Sec. 13-3 [see (13-141)] and it shows that the
MEM leads to an AR model. The coefficients b_k can be obtained either from
the Yule-Walker equations or from Levinson's algorithm.
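As an illustration of the last remark, the sketch below implements the standard Levinson-Durbin recursion (not necessarily in the form used in Sec. 13-3) and evaluates the resulting AR spectrum with T = 1. It returns coefficients a_k with a_0 = 1 and a prediction error p, so that S(w) = p / |sum_k a_k e^{-jkw}|^2; in the notation of (15-163) this corresponds to b_k = a_k / sqrt(p). The data values and all function names are assumptions. The spectrum is then integrated numerically to confirm that it reproduces the given R[0], ..., R[N].

```python
import math

def levinson(r):
    """Levinson-Durbin recursion for data r = [R[0], ..., R[N]].

    Returns (a, p): AR coefficients with a[0] = 1 and the final prediction error.
    """
    a, p = [1.0], r[0]
    for m in range(1, len(r)):
        acc = sum(a[k] * r[m - k] for k in range(m))
        k_m = -acc / p                       # reflection coefficient
        a = [1.0] + [a[k] + k_m * a[m - k] for k in range(1, m)] + [k_m]
        p *= 1.0 - k_m * k_m
    return a, p

def me_spectrum(r):
    """Maximum entropy (AR) spectrum S(w) matching r, with T = 1."""
    a, p = levinson(r)
    def s(w):
        re = sum(ak * math.cos(k * w) for k, ak in enumerate(a))
        im = -sum(ak * math.sin(k * w) for k, ak in enumerate(a))
        return p / (re * re + im * im)
    return s

def autocorrelation(s, m, k=4096):
    """R[m] = (1/2 pi) * integral of S(w) cos(m w) over (-pi, pi), midpoint rule."""
    dw = 2 * math.pi / k
    grid = (-math.pi + (i + 0.5) * dw for i in range(k))
    return sum(s(w) * math.cos(m * w) for w in grid) * dw / (2 * math.pi)

# Hypothetical data (assumptions): a positive definite set R[0], R[1], R[2].
data = [2.0, 1.0, 0.2]
spectrum = me_spectrum(data)
recovered = [autocorrelation(spectrum, m) for m in range(len(data))]
print(recovered)
```

The recovered values should match the data, while the values of R[m] for m > N obtained from the same integral are the maximum entropy extrapolation.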
Note The MEM also has applications in nonprobabilistic problems involving the
determination of unknown parameters from insufficient data. In such cases, probabilistic
models are created where the unknown parameters take the form of statistical variables
that are determined with the MEM. We should point out, however, that the results
obtained are not unique because more than one model can be used in the same problem.
In the following, we illustrate this approach using as an example the one-dimensional
form of an important problem in crystallography.
A deterministic application of the MEM. We wish to find a nonnegative periodic
function f(x) with period 2\pi:
0 \le f(x) = \sum_{n=-\infty}^{\infty} c_n e^{jnx}
having access only to partial information about its Fourier series coefficients
c_n = r_n e^{j\varphi_n}
The truncation problem. We assume that c_n is known only for |n| \le N.
Solution 1. We create the following probabilistic model: In the interval (-\pi, \pi), the
unknown function f(x) is the density of an RV x taking values between -\pi and \pi. We
determine f(x) so as to maximize the entropy
I = -\int_{-\pi}^{\pi} f(x) \ln f(x) dx
of x. This yields [see (15-141)]
f(x) = \exp{-1 + \sum_{n=-N}^{N} \gamma_n e^{jnx}}
The constants \gamma_n are determined in terms of the known values of c_n.
Solution 2. We assume that f(x) is the power spectrum of a stochastic process x_n and
we determine f(x) so as to maximize the entropy rate (we omit incidental constants)
I = \int_{-\pi}^{\pi} \ln f(x) dx
of x_n. In this case, f(x) is given by [see (15-163)]
f(x) = 1 / |d_0 + d_1 e^{-jx} + ... + d_N e^{-jNx}|^2
The constants d_n are again determined in terms of the known values of c_n (Levinson's
algorithm).
The phase problem. We assume that we know only the amplitudes r_n of c_n for |n| \le N.
To solve the problem, we form again the integral I, either as the entropy or as the
entropy rate, and we maximize it with respect to the unknown parameters which are now
the coefficients c_n (amplitudes and phases) for |n| > N, and the phases \varphi_n for |n| \le N.
An equivalent approach involves the determination of f(x) as in the truncation problem,
treating the phases as parameters, and the maximization of the resulting I with
respect to these parameters. In either case, the required computations are not simple.
15-5 CODING
Coding belongs to a class of problems involving the efficient search and
identification of an object ζ_i from a set 𝒮 of N objects. This topic is extensive
and it has many applications. We shall present here merely certain aspects
related to entropy and probability, limiting the discussion to binary instantly
decodable codes. The underlying ideas can be readily generalized.
Binary coding can be also described in terms of the familiar game of 20
questions: A person selects an object ζ_i from a set 𝒮. Another person wants to
identify the object by asking "yes" or "no" questions. The purpose of the game
is to find ζ_i using the smallest possible number of questions.
The various search techniques can be described in three equivalent forms:
(a) as chains of dichotomies of the set 𝒮; (b) in the form of a binary tree; (c)
as binary codes (Fig. 15-18). We start with an explanation of these approaches,
ignoring for the moment optimality considerations. The criteria for selecting the
"best" search method will be developed later.
Set dichotomies. We subdivide the set 𝒮 into two nonempty sets 𝒜_0 and 𝒜_1
(first-generation sets). We subdivide each of the sets 𝒜_0 and 𝒜_1 into two
nonempty sets 𝒜_00, 𝒜_01 and 𝒜_10, 𝒜_11 (second-generation sets). We continue
with such dichotomies until the final sets consist of a single element each.
The indices of the sets of each generation are binary numbers formed by
attaching 0 or 1 to the indices of the preceding generation sets.
In Fig. 15-18, we illustrate the above with a set consisting of nine
elements. We shall use the chain of sets so formed to identify the element ζ_7 by
a sequence of appropriate questions (set dichotomies): Is it in 𝒜_0? No. Is it in
𝒜_10? No. Is it in 𝒜_110? Yes. Is it in 𝒜_1100? Yes. Hence the unknown element is
ζ_7 because 𝒜_1100 = {ζ_7}.
Binary trees. A tree is a simply connected graph consisting of line segments
called branches. In a binary tree, each branch splits into two other branches or
it terminates. The points of termination are the endpoints of the tree and the
starting point R is its root (Fig. 15-18). A path is a part of the tree from R to
an endpoint. The two branches closest to the root are the first-generation
branches. They split into two branches each, forming the second generation.
Since each branch splits into two or it terminates, the number of branches in
each generation is always even. The length of a path is the total number of its
branches.
FIGURE 15-18 (set dichotomies, binary tree, and binary code for a source of nine elements)
There is a one-to-one correspondence between set dichotomies and trees.
The kth-generation sets correspond to the kth-generation branches and each
set dichotomy to the splitting of the corresponding branch. The terminal sets
{ζ_i} correspond to the terminal branches and the elements ζ_i to the endpoints of
the tree. The indices of the sets are also used to identify the corresponding
branches where we use the following convention: When a branch splits, 0 is
assigned to the left new branch and 1 to the right. The index of a terminal
branch is also used to identify the corresponding endpoint ζ_i. Thus each
element ζ_i of 𝒮 is identified by a binary number x_i (Fig. 15-18). The number l_i of
digits of x_i equals the length of the path ending at ζ_i. This number also
equals the number of questions (dichotomies) required to identify ζ_i.
Binary codes. A binary code is a one-to-one correspondence between the
elements ζ_i of a set 𝒮 and the elements x_i of a set X = {x_1, x_2, ...} of binary
numbers. Encoding is the process of constructing such a correspondence.
The set 𝒮 will be called the source and its elements ζ_i the source words.
The corresponding binary numbers x_i will be called the code words. The binary
digits 0 and 1 form the code alphabet. The length l_i of a code word x_i is the
total number of its binary digits.
A message is a sequence of source words
ζ_{i_1} ζ_{i_2} ... ζ_{i_n} ...	(15-164)
The sequence of the corresponding code words
x_{i_1} x_{i_2} ... x_{i_n} ...	(15-165)
is a coded message.
The indices of the terminal elements of a tree, or, equivalently, of a chain
of set dichotomies, specify a code. Codes can, of course, be formed in other
ways; however, other codes will not be considered here. The term code will
mean a binary code specified by a tree as above.
In Fig. 15-18, we show the code words x_i of a source 𝒮 consisting of
N = 9 elements, and the corresponding word lengths l_i.
THEOREM. If a source 𝒮 has N words and the lengths of the corresponding
code words equal l_i, then
\sum_{i=1}^{N} 1/2^{l_i} = 1	(15-166)
Proof. The last-generation branches of the tree are terminal and they form
pairs. The two branches of one such pair are the ends of two paths of length l_r
(Fig. 15-19). If they are removed, the tree contracts into a tree with N - 1
endpoints. In this operation, the two paths are replaced with one path of length
l_r - 1 and the two terms 2^{-l_r} in (15-166) are replaced with the term 2^{-(l_r - 1)}.
FIGURE 15-19 (tree contraction)
FIGURE 15-20
Since
2^{-l_r} + 2^{-l_r} = 2^{-(l_r - 1)}	(15-167)
the sum does not change. Thus the binary length sum in (15-166) is invariant to
a contraction. Repeating the process until we are left with only two first-generation
branches, we obtain (15-166) because 2^{-1} + 2^{-1} = 1.
CONVERSE THEOREM. Given N integers l_i satisfying (15-166), we can construct
a code with lengths l_i.
Proof. It suffices to construct a binary tree with path lengths l_i. From (15-166) it
follows that if l_r is the largest of the integers l_i, then the number n of lengths
that equal l_r is even. Using n = 2m segments, we form the rth (last) generation
branches of our tree. If each of the m pairs of integers l_r is replaced by a single
integer l_r - 1 and all others are not changed, the resulting set of numbers will
satisfy (15-166) [see (15-167)]. We can, therefore, continue this process until we
are left with only two terms. These terms yield the two first-generation branches.
The above is illustrated in Fig. 15-20 for N = 8.
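The converse theorem can be illustrated with a short program. The sketch below does not follow the tree-contraction argument of the proof; it uses instead the familiar canonical construction, in which code words are assigned in order of increasing length by counting in binary. The set of lengths is a hypothetical example satisfying (15-166), and the function names are assumptions.

```python
from fractions import Fraction

def binary_length_sum(lengths):
    """Exact value of the sum of 2^(-l_i) in (15-166)."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

def code_from_lengths(lengths):
    """Assign prefix-free code words with the given lengths (canonical order).

    Assumes positive lengths whose binary length sum does not exceed 1.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    words = [None] * len(lengths)
    value, prev = 0, lengths[order[0]]
    for i in order:
        value <<= lengths[i] - prev          # extend to the new word length
        prev = lengths[i]
        words[i] = format(value, "0{}b".format(lengths[i]))
        value += 1
    return words

def is_prefix_free(words):
    """True if no code word is the beginning of another code word."""
    return not any(a != b and b.startswith(a) for a in words for b in words)

# Hypothetical lengths (assumption): nine integers satisfying (15-166).
lengths = [3, 4, 4, 3, 3, 2, 4, 4, 3]
words = code_from_lengths(lengths)
print(binary_length_sum(lengths), words, is_prefix_free(words))
```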
Decoding. In the earlier discussion, we presented a method for encoding the
words ζ_i of a source 𝒮. Encoding of an entire message of the form (15-164) can
be obtained by encoding each word successively. The result is a coded message
as in (15-165). Decoding is the reverse process: Given a coded message, find the
corresponding source message.
Since word coding is a one-to-one correspondence between ζ_i and x_i, the
decoding of each word of a message is unique. However, an entire message
cannot always be so decoded because there is no space separating the code
words (this would require an additional letter in the code alphabet). The
problem of separation does not exist for codes constructed through dichotomies
(they are, we repeat, the only codes considered here) because such codes have
the following property: No code word is the beginning of another code word. This
property is a consequence of the fact that in any tree, each path terminates at
an endpoint; therefore, it cannot be part of another path. Codes with this
property are called "instantaneous" because they are instantly decodable; that
is, if we start from the beginning of a message, we can identify in real time the
end of each word without any reference to the future.
Example 15-24. We wish to decode the message
10110100001010001011111000000010
formed with the code shown in Fig. 15-18. Starting from the beginning, we identify
the code words by underlining them with the help of the table of Fig. 15-18:
10   1101   000   010   10   0010   111   1100   000   0010
The corresponding source message is the sequence
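Instant decoding is easily expressed in code. In the sketch below, the set CODE_WORDS is an assumption: it contains the seven distinct code words that appear in the example, completed with two further words (0011 and 011) so that the nine lengths satisfy (15-166). The actual table of Fig. 15-18 is not reproduced here. The function scans the message once and emits a word as soon as the accumulated digits form a code word.

```python
# Hypothetical code table (assumption): consistent with the words visible in
# Example 15-24, completed so that the binary length sum equals 1.
CODE_WORDS = {"000", "0010", "0011", "010", "011", "10", "1100", "1101", "111"}

def split_code_words(bits, code_words):
    """Split a coded message into code words, scanning left to right.

    Works for prefix-free codes: a word ends as soon as it is recognized.
    """
    out, word = [], ""
    for b in bits:
        word += b
        if word in code_words:
            out.append(word)
            word = ""
    if word:
        raise ValueError("message ends inside a code word")
    return out

message = "10110100001010001011111000000010"
print(split_code_words(message, CODE_WORDS))
```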
Note We have identified each source word with a single symbol ζ_i. It is possible,
however, that ζ_i might be a grouping of other symbols. For example, the source 𝒮 might
consist of: All the letters of the English alphabet, certain frequently used words (for
instance, the word the) and even a number of common phrases like happy birthday. Such
sources are equivalent to single-symbol sources if each word is viewed as a single
element.
Optimum Codes
In the absence of prior information, the two subsets of each set dichotomy are
so chosen as to have nearly equal numbers of elements. The resulting code lengths
are then nearly equal to log N. If, however, prior information is available, then more
efficient codes can be constructed. The information is usually given in terms of
relative frequencies and it is used to form codes with minimum average length.
Since relative frequencies are best described in terms of probabilities, we shall
assume from now on that the source 𝒮 is a probability space.
DEFINITIONS. A random code is a process of assigning to every source word ζ_i
a binary number x_i.
Since ζ_i is an element of the probability space 𝒮, a random code defines
an RV x such that
x(ζ_i) = x_i
The length of a random code is an RV \mathbf{L} such that
\mathbf{L}(ζ_i) = l_i	(15-168)
where l_i is the length of the code word x_i assigned to the element ζ_i.
The expected value of \mathbf{L} is denoted by L and it is called the average length
of the random code x. Thus
L = E{\mathbf{L}} = \sum_i p_i l_i	(15-169)
where p_i = P{x = x_i} = P{ζ_i}.
Optimum code. An optimum code is a code whose average length does not
exceed the average length of any other code. A basic objective of coding theory
is the determination of such a code. Optimum codes have the following
properties:
1. Suppose that ζ_a and ζ_b are two elements of 𝒮 such that
p_a = P{ζ_a}        p_b = P{ζ_b}
We maintain that if the code is optimum and
p_a > p_b then l_a \le l_b	(15-170)
Proof. Suppose that l_a > l_b. Interchanging the codes assigned to the elements
ζ_a and ζ_b, we obtain a new code with average length
L_1 = L - (p_a l_a + p_b l_b) + (p_a l_b + p_b l_a) = L - (p_a - p_b)(l_a - l_b)
And since (p_a - p_b)(l_a - l_b) > 0, we conclude that L_1 < L. This, however, is
impossible because L is the optimum code length; hence l_a \le l_b.
Repeated application of (15-170) leads to the conclusion that if
p_1 \ge p_2 \ge ... \ge p_N then l_1 \le l_2 \le ... \le l_N	(15-171)
2. The elements (source words) with the two smallest probabilities p_{N-1}
and p_N are in the last generation of the tree; that is, their code lengths are l_{N-1}
and l_N.
Proof. This is a consequence of (15-171) and the fact that the number of
branches in each generation is even.
The following basic theorem shows the relationship between the entropy
H(𝔘) = -\sum_{i=1}^{N} p_i \log p_i
of the source word partition 𝔘 and the average length L of an arbitrary
random code x.
THEOREM.
H(𝔘) \le L	(15-172)
Proof. As we have seen from (15-166), if l_i are the lengths of the code words of
x and q_i = 1/2^{l_i}, then the sum of the q_i's equals 1. With a_i = p_i and b_i = q_i it
follows, therefore, from (15-37) that
-\sum_i p_i \log p_i \le -\sum_i p_i \log q_i = \sum_i p_i l_i = L	(15-173)
and (15-172) results.
In general, $H(\mathfrak{G}) < L$. We maintain, however, that $H(\mathfrak{G}) = L$ iff the
probabilities $p_i$ are binary decimals, that is, iff $p_i = 1/2^{n_i}$.
Proof. If $H(\mathfrak{G}) = L$, then (15-173) is an equality; hence $p_i = q_i = 1/2^{l_i}$ [see
(15-37)] and our assertion is true because the lengths $l_i$ are integers.
Conversely, if $p_i = 1/2^{n_i}$ and $n_i$ are integers, then we can construct a
code with lengths $l_i = n_i$ because the sum of the $p_i$'s equals 1. The length L of
this code equals $H(\mathfrak{G})$. In other words, if all $p_i$'s are binary decimals, then the
code with lengths $l_i = n_i$ is optimum.
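The bound (15-172) and its equality condition can be checked numerically. The following Python sketch is only an illustration: the function names `entropy` and `average_length` are ours, the four-word dyadic source is a made-up example, and the second source uses the probabilities and the arbitrary code lengths listed in Example 15-25.

```python
from math import log2

def entropy(probs):
    """Entropy in bits: H = -sum p_i log2 p_i (terms with p_i = 0 are skipped)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def average_length(probs, lengths):
    """Average code length L = sum p_i l_i."""
    return sum(p * l for p, l in zip(probs, lengths))

# Hypothetical dyadic source: p_i = 1/2^{n_i}, coded with lengths l_i = n_i.
dyadic = [0.5, 0.25, 0.125, 0.125]
dyadic_lengths = [1, 2, 3, 3]

# Probabilities of Example 15-25 with the lengths of its arbitrary code.
example = [0.22, 0.19, 0.15, 0.12, 0.08, 0.07, 0.07, 0.06, 0.04]
arbitrary_lengths = [3, 4, 4, 3, 3, 2, 4, 4, 3]

print(entropy(dyadic), average_length(dyadic, dyadic_lengths))        # equal values
print(entropy(example), average_length(example, arbitrary_lengths))   # H below L
```

For the dyadic source the two printed numbers coincide; for the second source the entropy is strictly smaller than the average length, as the theorem requires.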
Shannon, Fano, and Huffman Codes
The preceding theorem gives us a lower bound for the average code length L but
it does not say how close we can come to this bound. At the end of the section
we show that, if we encode not each word but an entire message, then we can
construct codes with average length per word less than $H(\mathfrak{G}) + \varepsilon$ for any
$\varepsilon > 0$.
In the following, we present three well-known codes including the opti-
mum code (Huffman). The description of these codes is clarified in Example
15-25.
The Shannon code. As we noted, if all probabilities $p_i$ are binary decimals, then
the code with lengths $l_i = -\log p_i$ is optimum. Guided by this, we shall
construct a code for all other cases.
Each $p_i$ specifies an integer $n_i$ such that
    $\frac{1}{2^{n_i}} \le p_i < \frac{1}{2^{n_i - 1}}$	(15-174)
where $p_i > 1/2^{n_i}$ for at least one $p_i$ (assumption). With $n_m$ the largest of the
integers $n_i$, it follows from the above that
    $\sum_{i=1}^{N} \frac{1}{2^{n_i}} \le 1 - \frac{1}{2^{n_m}}$	(15-175)
because the left side is smaller than 1 and is an integer multiple of $1/2^{n_m}$. If, therefore, $n_m$ is
changed to $n_m - 1$, the resulting value of the sum in (15-175) will not exceed 1.
We continue the process of reducing the largest integer by 1 until we reach a set
of integers $l_i$ such that
    $\sum_{i=1}^{N} \frac{1}{2^{l_i}} = 1 \qquad l_i \le n_i$	(15-176)
With this set of integers we construct a code and we denote by $L_a$ its average
length. Thus
    $L_a = \sum_{i=1}^{N} p_i l_i \le \sum_{i=1}^{N} p_i n_i$
We maintain that
    $H(\mathfrak{G}) \le L_a < H(\mathfrak{G}) + 1$	(15-177)
Proof. From (15-174) it follows that $n_i < -\log p_i + 1$. Multiplying by $p_i$ and
adding, we obtain
    $\sum_{i=1}^{N} p_i n_i < \sum_{i=1}^{N} p_i(-\log p_i + 1) = H(\mathfrak{G}) + 1$
and (15-177) results [see (15-172)].
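The Shannon construction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed implementation: the names `shannon_lengths` and `canonical_code` are ours, ties among the longest words are broken in favor of the most probable word (the text does not fix this choice), and the code words are assigned in increasing binary order, which is one of many trees with the required lengths.

```python
from fractions import Fraction
from math import ceil, log2

def shannon_lengths(probs):
    """Code lengths of the Shannon construction; probs must be in descending order."""
    # Initial integers n_i with 1/2^{n_i} <= p_i < 1/2^{n_i - 1}, as in (15-174).
    lengths = [ceil(-log2(p)) for p in probs]
    kraft = sum(Fraction(1, 2 ** n) for n in lengths)   # exact sum of 1/2^{n_i}
    # Reduce a largest integer by 1 until the sum equals 1, as in (15-176).
    while kraft < 1:
        longest = max(lengths)
        i = lengths.index(longest)      # assumption: most probable of the longest words
        lengths[i] -= 1
        kraft += Fraction(1, 2 ** longest)
    return lengths

def canonical_code(lengths):
    """Assign binary code words to nondecreasing lengths whose sum of 1/2^l is <= 1."""
    words, value, previous = [], 0, lengths[0]
    for n in lengths:
        value <<= n - previous          # append zeros whenever the length grows
        words.append(format(value, "0{}b".format(n)))
        value += 1
        previous = n
    return words

probs = [0.22, 0.19, 0.15, 0.12, 0.08, 0.07, 0.07, 0.06, 0.04]   # Example 15-25
lengths = shannon_lengths(probs)
print(lengths)
print(canonical_code(lengths))
print(sum(p * l for p, l in zip(probs, lengths)))   # average length L_a
```

Exact fractions are used for the sum of the $1/2^{n_i}$ so that the stopping test in (15-176) is not affected by rounding.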
The Fano code. We shall describe this code in terms of set dichotomies based
on the following rule of subdivision. We number the probabilities $p_i$ in descending
order
    $p_1 \ge p_2 \ge \cdots \ge p_N$	(15-178)
and we select the sets $\mathcal{A}_0$ and $\mathcal{A}_1$ of the first generation so as to have equal or
nearly equal probabilities. To do so, we determine k such that
    $p_1 + \cdots + p_k \le 0.5 \le p_{k+1} + \cdots + p_N$
and we set $\mathcal{A}_0$ equal to $\{\zeta_1, \ldots, \zeta_k\}$ or to $\{\zeta_1, \ldots, \zeta_{k+1}\}$. The same rule is used
in all subsequent subdivisions. As we see in Example 15-25, the length $L_b$ of the
resulting code is close to the Shannon code length $L_a$.
We note that, since there is an ambiguity in the choice of the subsets in
each dichotomy, the Fano code is not unique.
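The Fano dichotomies lend themselves to a short recursive sketch. The function name `fano_code` and the tie-breaking rule are our assumptions: at each step the split that makes the two subset probabilities most nearly equal is chosen, and the first such split is taken when several are equally good. Since the Fano code is not unique, other rules give other valid codes.

```python
def fano_code(probs):
    """Fano code words for probabilities listed in descending order."""
    codes = [""] * len(probs)

    def split(lo, hi):
        # Subdivide the elements lo, ..., hi - 1 into two nearly equiprobable subsets.
        if hi - lo < 2:
            return
        total = sum(probs[lo:hi])
        best_k, best_diff, running = lo + 1, None, 0.0
        for k in range(lo + 1, hi):
            running += probs[k - 1]
            diff = abs(2 * running - total)   # |P(first subset) - P(second subset)|
            if best_diff is None or diff < best_diff:
                best_k, best_diff = k, diff
        for i in range(lo, best_k):
            codes[i] += "0"                   # first subset: next digit 0
        for i in range(best_k, hi):
            codes[i] += "1"                   # second subset: next digit 1
        split(lo, best_k)
        split(best_k, hi)

    split(0, len(probs))
    return codes

probs = [0.22, 0.19, 0.15, 0.12, 0.08, 0.07, 0.07, 0.06, 0.04]   # Example 15-25
codes = fano_code(probs)
print(codes)
print(sum(p * len(c) for p, c in zip(probs, codes)))   # average length L_b
```

With the probabilities of Example 15-25, the first dichotomy separates the first three elements (probability 0.56) from the remaining six (probability 0.44).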
The Huffman code. We denote by $x_N^o$ the optimum N-element code and by $L_N^o$
its average length. We shall determine $x_N^o$ using the following operation: We
arrange the probabilities $p_i$ of the elements of $\mathcal{S}$ in descending order as in
(15-178) and we number the corresponding elements accordingly. We then
replace the last two elements $\zeta_{N-1}$ and $\zeta_N$ with a new element and we assign to
this element the probability $p_{N-1} + p_N$. A new source results with N - 1
elements. This operation will be called Huffman contraction.
In the table of Example 15-25, the new element is identified by a box in
which the replaced elements are shown.
Rearranging the probabilities of the new source in descending order, we
repeat the above operation until we reach a set with only two elements.
To each element $\zeta_i$ of the source $\mathcal{S}$, we shall assign a code word $x_i$,
starting from the last digit: We assign the numbers 0 and 1 respectively to the
last digits of the code words of the elements $\zeta_{N-1}$ and $\zeta_N$. At each subsequent
contraction, we assign the numbers 0 and 1 to the left of the partially completed
code words of all elements that are included in the last two boxes.
The code so formed (Huffman) will be denoted by $x_N^c$ and its average
length by $L_N^c$. We shall show that this code is optimal.
Proof. The proof of the optimality is based on the following observation. We
can readily see that the last two code words $x_{N-1}$ and $x_N$ have the same length
$l_r$. In Example 15-25,
    $N = 9 \qquad x_8 = 00000 \qquad x_9 = 00001 \qquad l_r = 5$
If we replace these two words with a single word consisting of their common
part, we obtain the Huffman code $x_{N-1}^c$ for the set of N - 1 elements and the
code length of the new element equals $l_r - 1$. This leads to the conclusion that
    $L_N^c - (p_{N-1} + p_N) l_r = L_{N-1}^c - (p_{N-1} + p_N)(l_r - 1)$
Hence
    $L_N^c = L_{N-1}^c + p_{N-1} + p_N$	(15-179)
In the example
    $L_9^c = \sum_{i=1}^{7} p_i l_i + 5p_8 + 5p_9 \qquad L_8^c = \sum_{i=1}^{7} p_i l_i + 4(p_8 + p_9)$
Induction. The Huffman code is optimum for N = 2 because there is only
one code with two words. We assume that it is optimum for every source with
$k \le N - 1$ elements and we shall show that it is optimum for k = N. Suppose
that there is an N-element source for which this is not true, that is, suppose
that
    $L_N^o < L_N^c$	(15-180)
As we know, the two elements $\zeta_{N-1}$ and $\zeta_N$ with the smallest probabilities are
in the last-generation branches of the optimum code tree. If they are removed,
the contracted tree specifies a new code with length $L_{N-1}$. Reasoning as in
(15-179), we conclude with (15-180) that
    $L_{N-1} + p_{N-1} + p_N = L_N^o < L_N^c = L_{N-1}^c + p_{N-1} + p_N$
hence $L_{N-1} < L_{N-1}^c$. But this is impossible because the Huffman code of order
N - 1 is optimum by assumption.
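The repeated Huffman contraction can be sketched with a priority queue. This is a minimal illustration, not the tabular procedure of the text: the name `huffman_code` is ours, Python's `heapq` replaces the explicit reordering of the probabilities, and when probabilities are equal the order of insertion decides which pair is contracted, so the code words may differ from those obtained by hand although the average length is the same.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Huffman code words, built from the last digit as in the contraction rule."""
    order = count()                         # tie-breaker so that lists are never compared
    heap = [(p, next(order), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    codes = [""] * len(probs)
    while len(heap) > 1:
        p_last, _, last = heapq.heappop(heap)   # smallest probability
        p_prev, _, prev = heapq.heappop(heap)   # second smallest probability
        for i in prev:
            codes[i] = "0" + codes[i]           # digit 0 to the left of the partial words
        for i in last:
            codes[i] = "1" + codes[i]           # digit 1 to the left of the partial words
        heapq.heappush(heap, (p_prev + p_last, next(order), prev + last))
    return codes

probs = [0.22, 0.19, 0.15, 0.12, 0.08, 0.07, 0.07, 0.06, 0.04]   # Example 15-25
codes = huffman_code(probs)
print(codes)
print(sum(p * len(c) for p, c in zip(probs, codes)))   # average length
```

Each pass through the loop is one contraction: the two least probable elements are merged and every original element inside the merged boxes receives one more leading digit.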
Example 15-25. We shall describe the above codes using as source a set $\mathcal{S}$ with
nine elements. Their probabilities are shown in the table below:
i	1	2	3	4	5	6	7	8	9
p_i	0.22	0.19	0.15	0.12	0.08	0.07	0.07	0.06	0.04
The resulting entropy equals
    $H(\mathfrak{G}) = -\sum_{i=1}^{9} p_i \log p_i = 2.971$
Arbitrary code. We form a code using a chain of dichotomies chosen
arbitrarily as in Fig. 15-19. In the table below we show the code words and their
lengths.
i	1	2	3	4	5	6	7	8	9
x_i	000	0010	0011	010	011	10	1100	1101	111
l_i	3	4	4	3	3	2	4	4	3
    $L = \sum_{i=1}^{9} p_i l_i = 3.40$
Shannon code. In the table below we show the integers $n_i$ determined from
(15-174) and the required reductions until the final lengths $l_i$ are reached. The
corresponding code tree is shown in Fig. 15-20. The first three probabilities satisfy
$1/8 \le p_i < 1/4$, the next four $1/16 \le p_i < 1/8$, and the last two $1/32 \le p_i < 1/16$.
The last column shows the sum of the terms $1/2^{n_i}$ of each row.
p_i	0.22	0.19	0.15	0.12	0.08	0.07	0.07	0.06	0.04	sum
n_i	3	3	3	4	4	4	4	5	5	11/16
	3	3	3	3	3	4	4	4	4	14/16
l_i	3	3	3	3	3	3	3	4	4	1
x_i	000	001	010	011	100	101	110	1110	1111	
    $L_a = 3.1$
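Fano code. In the table below we show the subsets obtained with the Fano
dichotomies, and their probabilities. The last-generation sets are the elements $\zeta_i$
of $\mathcal{S}$; their probabilities are shown on the first row of the table. The dichotomies
start with
    $P(\mathcal{A}_0) = 0.22 + 0.19 + 0.15 = 0.56$
p_i	0.22	0.19	0.15	0.12	0.08	0.07	0.07	0.06	0.04
First generation:	$\mathcal{A}_0$ (elements 1 to 3) 0.56; $\mathcal{A}_1$ (elements 4 to 9) 0.44
Second generation:	$\mathcal{A}_{00} = \zeta_1$; $\mathcal{A}_{01}$ 0.34; $\mathcal{A}_{10}$ 0.20; $\mathcal{A}_{11}$ 0.24
Third generation:	$\mathcal{A}_{010} = \zeta_2$; $\mathcal{A}_{011} = \zeta_3$; $\mathcal{A}_{100} = \zeta_4$; $\mathcal{A}_{101} = \zeta_5$; $\mathcal{A}_{110}$ 0.14; $\mathcal{A}_{111}$ 0.10
Fourth generation:	$\mathcal{A}_{1100} = \zeta_6$; $\mathcal{A}_{1101} = \zeta_7$; $\mathcal{A}_{1110} = \zeta_8$; $\mathcal{A}_{1111} = \zeta_9$
x_i	00	010	011	100	101	1100	1101	1110	1111
l_i	2	3	3	3	3	4	4	4	4
    $L_b = 3.02$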
Optimum code. In the table below we show the original set $\mathcal{S}_9$ consisting
of nine elements and the sets $\mathcal{S}_k$ obtained with each Huffman contraction. The
elements $\zeta_i$ are identified by their indices and the combined elements by boxes.
Each box contains all elements $\zeta_i$ of the original source involved in each contraction,
and the evolution of their code words $x_i$ starting with the last digit. The rows
below each $\mathcal{S}_k$ line show the probabilities of the various elements of $\mathcal{S}_k$. For
example, the number 0.10 in the line below $\mathcal{S}_7$ is the probability of the box
(element of $\mathcal{S}_7$) that contains the elements $\zeta_8$ and $\zeta_9$.
The column at the extreme right shows the sum of the two smallest
probabilities of the elements in $\mathcal{S}_k$. This number is used to form the next row by
reordering the elements of $\mathcal{S}_k$.
Evolution of Huffman code
(Boxes are written as [elements: partial code words]; the last entry of each probability row is the sum of the two smallest probabilities.)
S_9	1	2	3	4	5	6	7	8	9
p_i,9	0.22	0.19	0.15	0.12	0.08	0.07	0.07	0.06	0.04	sum 0.10
S_8	1	2	3	4	[8 9: 0 1]	5	6	7
p_i,8	0.22	0.19	0.15	0.12	0.10	0.08	0.07	0.07	sum 0.14
S_7	1	2	3	[6 7: 0 1]	4	[8 9: 0 1]	5
p_i,7	0.22	0.19	0.15	0.14	0.12	0.10	0.08	sum 0.18
S_6	1	2	[8 9 5: 00 01 1]	3	[6 7: 0 1]	4
p_i,6	0.22	0.19	0.18	0.15	0.14	0.12	sum 0.26
S_5	[6 7 4: 00 01 1]	1	2	[8 9 5: 00 01 1]	3
p_i,5	0.26	0.22	0.19	0.18	0.15	sum 0.33
S_4	[8 9 5 3: 000 001 01 1]	[6 7 4: 00 01 1]	1	2
p_i,4	0.33	0.26	0.22	0.19	sum 0.41
S_3	[1 2: 0 1]	[8 9 5 3: 000 001 01 1]	[6 7 4: 00 01 1]
p_i,3	0.41	0.33	0.26	sum 0.59
S_2	[8 9 5 3 6 7 4: 0000 0001 001 01 100 101 11]	[1 2: 0 1]
p_i,2	0.59	0.41	sum 1
S_1	[8 9 5 3 6 7 4 1 2: 00000 00001 0001 001 0100 0101 011 10 11]
The completed code words $x_i$ taken from the last line of the table and their code
lengths $l_i$ are listed below.
i	1	2	3	4	5	6	7	8	9
x_i	10	11	001	011	0001	0100	0101	00000	00001
l_i	2	2	3	3	4	4	4	5	5
    $L^o = 3.01$
The Shannon Coding Theorem
In the earlier discussion, we considered only codes of the elements $\zeta_i$ of a set
$\mathcal{S}$ and we showed that the average length L of the optimum code is between $H(\mathfrak{G})$ and $H(\mathfrak{G}) + 1$:
    $H(\mathfrak{G}) \le L < H(\mathfrak{G}) + 1$	(15-181)
This follows from (15-172) and (15-177). We show next that if we encode not
merely single words but entire messages, then the code length per word can be
reduced to less than $H(\mathfrak{G}) + \varepsilon$ for any $\varepsilon > 0$.
A message of length n is any element of the product space $\mathcal{S}^n$. The
number of such messages is $N^n$ and a code of the space $\mathcal{S}^n$ is a correspondence
between its elements and a set of $N^n$ binary numbers. This correspondence
defines the RV $\mathbf{x}_n$ (random code) on the space $\mathcal{S}^n$ and the lengths of the code
words form another RV $\mathbf{L}_n$ (random code length). The expected value $L_n$ of $\mathbf{L}_n$
is the average code length. From the definition it follows that $L_n$ is the average
number of digits required to encode the elements of $\mathcal{S}^n$. The ratio
    $L = \frac{L_n}{n}$	(15-182)
is the average code length per word. The term word means, of course, an
element of $\mathcal{S}$.
We shall assume that $\mathcal{S}^n$ is the space of n independent trials.
THEOREM. We can construct a code of the space $\mathcal{S}^n$ such that
    $H(\mathfrak{G}) \le L < H(\mathfrak{G}) + \frac{1}{n}$	(15-183)
Proof. We shall give two proofs. The first is a direct consequence of (15-181).
The second is based on the concept of typical sequences.
1. Applying the earlier results to the source $\mathcal{S}^n$, we construct a code with
average length $L_n$ such that
    $H(\mathfrak{G}^n) \le L_n < H(\mathfrak{G}^n) + 1$	(15-184)
This yields (15-183) because $L_n = nL$ and $H(\mathfrak{G}^n) = nH(\mathfrak{G})$ [see (15-67)].
2. As we know, the space $\mathcal{S}^n$ can be divided into two sets: the set $T$ of all
typical sequences and the set $\bar{T}$ of all rare sequences. To prove (15-183), we
construct a code tree consisting of $2^{nH(\mathfrak{G})} - 1$ short paths of length $l_t = nH(\mathfrak{G})$
and $2^{l_r}$ paths of length $l_t + l_r$. The short paths are used as the code words of the
typical sequences and the long paths for the rare sequences (Fig. 15-21). Since
$P(T) \approx 1$ and $P(\bar{T}) \approx 0$, we conclude that the average length of the resulting
code equals
    $L_n = l_t P(T) + (l_t + l_r) P(\bar{T}) \approx l_t = nH(\mathfrak{G})$
Thus $L \approx H(\mathfrak{G})$ and (15-183) results.
We note that (15-184) holds even if the trials are not independent. In this
case, the theorem is true if $H(\mathfrak{G})$ is replaced by $H(\mathfrak{G}^n)/n$.
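The effect of encoding entire messages can be observed numerically. The sketch below is an illustration under our own assumptions: the two-word source with probabilities 0.9 and 0.1 is hypothetical, the function names are ours, and the optimum code of the product space is taken to be a Huffman code whose average length is accumulated as the sum of the contracted probabilities, in the spirit of (15-179).

```python
import heapq
from itertools import product
from math import log2, prod

def entropy(probs):
    """Entropy in bits of a set of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def optimum_average_length(probs):
    """Average length of a Huffman code: the sum of all contracted probabilities."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged                 # each contraction adds p_{N-1} + p_N
        heapq.heappush(heap, merged)
    return total

def length_per_word(word_probs, n):
    """L = L_n / n for an optimum code of the space of n independent trials."""
    message_probs = [prod(m) for m in product(word_probs, repeat=n)]
    return optimum_average_length(message_probs) / n

source = [0.9, 0.1]                     # hypothetical two-word source
for n in range(1, 6):
    print(n, round(length_per_word(source, n), 4), round(entropy(source), 4))
```

The printed length per word stays between the entropy and the entropy plus 1/n, so it approaches the entropy as the message length grows.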
FIGURE 15-21
15-6 CHANNEL CAPACITY
We wish to transmit a message from point A to point B by means of a
communications channel (a telephone cable, for example). The message to be
transmitted is a stationary process $x_n$ generating at the receiving end another
process $y_n$. The output $y_n$ depends not only on the input $x_n$ but also on the
nature of the channel. Our objective is to determine the maximum rate of
information that can be transmitted through the channel. To simplify the
discussion, we make the following assumptions:
1.	The channel is binary; that is, the input $x_n$ and the output $y_n$ take only the
values 0 and 1.
2.	The channel is memoryless; that is, the present value of $y_n$ depends only on
the present value of $x_n$.
3.	The input $x_n$ is strictly white noise.
From assumptions 2 and 3 it follows that $y_n$ is also white noise.
4.	The messages are transmitted at the rate of one word per second.
This is a mere normalization stating that the duration T of each transmitted
state equals one second.
Example 15-26. In Fig. 15-22 we show a simple realization of a channel as a system
with input $x_n$ and output $y_n$. The input to the physical channel is a time signal x(t)
taking the values E and -E (binary transmission). These values correspond to the
two states 1 and 0 of $x_n$. The received signal y(t) is a distorted version of x(t)
contaminated possibly by noise. The system output $y_n$ is obtained by some decision
rule (detector) translating the time signal y(t) into a discrete-time signal consisting
of 0's and 1's.
Channel
FIGURE 15-22
Noiseless Channel
We shall say that a channel is noiseless† if there is a one-to-one correspondence
between the input $x_n$ and the output $y_n$. For a binary channel this means that if
$x_n = 0$, then $y_n = 0$; if $x_n = 1$, then $y_n = 1$.
In a given channel, the uncertainty per transmitted word equals the
entropy rate $\bar{H}(x) = H(x)$ of the input $x_n$. If the channel is noiseless, then the
observed output $y_n$ determines $x_n$ uniquely; hence it removes this uncertainty.
Thus the rate of transmitted information equals H(x).
Definition of channel capacity. The maximum value of H(x), as x ranges over
all possible inputs, is denoted by C and is called the channel capacity
    $C = \max_{x_n} H(x)$	(15-185)
It appears that C does not depend on the channel but that is not so
because the channel determines the number of the input states. If it is binary,
then $x_n$ has two possible states with probabilities p and q = 1 - p, respectively;
hence
H(x) = -p log p - (1 - p)log(l -p) =r(p)	(15-186)
where r(p) is the function of Fig. 15-2. Since r(p) is maximum for p = 0.5 and
r(0.5) = 1, we conclude that the capacity of a binary noiseless channel equals
1 bit/s.
Similarly, if the channel accepts N input states, then its capacity equals
log N bit/s.
Rate of information transmission. We repeat: The channel transmits messages
at the rate of 1 word/s. It transmits information at the rate H(x) bits/s. This
rate depends on the source and it is maximum if the two states of the source are
equally likely.
†This definition does not lead to any conclusion about the actual presence of noise in the channel.
FIGURE 15-23
THEOREM. The maximum rate of 1 bit/s can be reached even if the input $x_n$ is
arbitrary, provided that it is properly encoded prior to transmission.
Proof. 1. An m-word message is a binary number with m digits. There are $2^m$
such messages forming the space $\mathcal{S}^m$ and every realization of the input $x_n$ is a
sequence of such messages. We encode optimally the space $\mathcal{S}^m$ into a set of
binary code words using the techniques of the last section (Fig. 15-23). The
number of digits (code length) of each code word is an RV $\mathbf{L}_m$ with mean $L_m = E\{\mathbf{L}_m\}$.
As we know,
    $mH(x) \le L_m < mH(x) + 1$	(15-187)
Hence $L_m \approx mH(x)$ for large m. A code word requires $\mathbf{L}_m$ seconds to be
transmitted because it consists of $\mathbf{L}_m$ binary digits. Hence the average time
required to transmit the m-word messages of $x_n$ in code form equals $L_m \approx mH(x)$
seconds. And since the information contained in each message equals
mH(x) bits, we conclude that the average rate of information transmission
equals mH(x)/mH(x) = 1 bit/s.
Proof. 2. We have $2^m$ messages of length m. In a direct transmission (not
encoded), each message requires the same transmission time: m seconds.
However, of all these messages, only $2^{mH(x)}$ are likely to occur (typical
sequences). To reduce the time of transmission, we encode all typical sequences
into words of length $l_t = mH(x)$ as in Fig. 15-21. The rare sequences require
longer codes; however, the probability of their occurrence is negligible. Hence
the average time of transmission of each message is reduced from m seconds to
mH(x) seconds.
Noisy Channel
Due to a variety of factors, a physical channel establishes not a functional but a
statistical relationship between the input $x_n$ and the output $y_n$. For a binary
channel, this relationship is completely specified in terms of the probabilities
    $P\{x_n = 0\} = p \qquad P\{x_n = 1\} = q$
of the two states of the input, and the conditional probabilities
    $P\{y_n = j \mid x_n = i\} = \pi_{ij} \qquad i, j = 0, 1$	(15-188)
The probabilities of the output states are given by
    $P\{y_n = 0\} = \pi_{00}p + \pi_{10}q \qquad P\{y_n = 1\} = \pi_{01}p + \pi_{11}q$	(15-189)
DEFINITION. A noisy channel is a random system establishing a statistical
relationship between the input $x_n$ and the output $y_n$.
For a memoryless channel, this relationship is completely specified in
terms of the channel matrix $\Pi$ whose elements $\pi_{ij}$ are the conditional probabilities
between the input states and the output states. For a binary channel
    $\Pi = \begin{bmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{bmatrix}$ where $\pi_{00} + \pi_{01} = 1$, $\pi_{10} + \pi_{11} = 1$	(15-190)
The channel is called symmetrical if $\pi_{10} = \pi_{01} = \beta$. In a symmetrical
channel, $\pi_{00} = \pi_{11} = 1 - \beta$ and
    $\Pi = \begin{bmatrix} 1 - \beta & \beta \\ \beta & 1 - \beta \end{bmatrix}$	(15-191)
Example 15-27. To give some idea of the nature of the channel matrix, we show in
Fig. 15-24 a simple version of a symmetrical channel. The input x(t) is a time signal
as in Example 15-26, and the resulting output y(t) is the sum
    $y(t) = x(t) + \nu_n \qquad nT < t < nT + T$	(15-192)
where $\nu_n$ is a sequence of independent RVs with density the even function $f(\nu)$.
The output states are determined as follows:
    $y_n = 1$ if $y(t) \ge 0$, $\qquad y_n = 0$ if $y(t) < 0$
FIGURE 15-24
From this we conclude that the channel is symmetrical and
    $\beta = P\{y_n = 1 \mid x_n = 0\} = \int_0^\infty f(\nu + E)\,d\nu = P\{\nu > E\}$
Channel capacity. Prior to transmission, the uncertainty about the input $x_n$
equals H(x) per word. In a noiseless channel, the observed output $y_n$ reduces
the uncertainty to 0. This is not so, however, for a noisy channel because $y_n$ does
not determine $x_n$ uniquely. Knowledge of $y_n$ reduces the uncertainty about $x_n$
from H(x) to H(x|y) and the difference
    $I(x,y) = H(x) - H(x|y)$	(15-193)
is the rate of information transmission.†
If the channel is noiseless, then H(x|y) = 0; hence I(x,y) = H(x). If the
output $y_n$ is independent of the input, then H(x|y) = H(x); hence I(x,y) = 0. In
other words, such a channel is useless (it does not transmit any information).
DEFINITION. The function I(x,y) depends on the matrix $\Pi$ and on the input
$x_n$. The capacity C of a noisy channel is the maximum value of I(x,y) as $x_n$
ranges over all possible inputs
    $C = \max_{x_n} I(x,y)$	(15-194)
This is consistent with (15-185) because, for noiseless channels, I(x,y) = H(x).
Example 15-28. We shall show that the capacity of a binary symmetrical channel
with channel matrix as in (15-191) (Fig. 15-25) equals
    $C = 1 - r(\beta)$ where $r(p) = -p\log p - q\log q$	(15-195)
Proof. The entropy of a two-state partition equals r(p) where p is the probability
of one of the states. Thus the entropy H(x) of the input to the channel equals r(p)
and the entropy of the output equals
    $H(y) = r(\gamma) \qquad \gamma = (1 - 2\beta)p + \beta$	(15-196)
because [see (15-189)]
    $P\{y_n = 0\} = (1 - \beta)p + \beta(1 - p) = \gamma$
The above holds also for conditional entropies. Thus, since
    $P\{y_n = 0 \mid x_n = 0\} = P\{y_n = 1 \mid x_n = 1\} = 1 - \beta$
we conclude that
    $H(y \mid x_n = 0) = H(y \mid x_n = 1) = r(1 - \beta)$
†The conditional entropy H(x|y) is Shannon's equivocation.
Binary symmetrical channel
FIGURE 15-25
Inserting into (15-41) and using the fact that $r(\beta) = r(1 - \beta)$, we obtain
    $H(y|x) = p\,r(\beta) + q\,r(\beta) = r(\beta)$
From the above it follows that $I(x,y) = H(y) - H(y|x) = r(\gamma) - r(\beta)$. This yields (15-195)
because $r(\beta)$ does not depend on p and $r(\gamma)$ is maximum if $\gamma = 0.5$.
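The result of Example 15-28 can be verified by a direct search over the input probability p. The sketch below is illustrative only: the function names and the grid search are our assumptions, and logarithms are taken to base 2 so that the capacity is in bits.

```python
from math import log2

def r(p):
    """Binary entropy function r(p) = -p log2 p - (1 - p) log2 (1 - p), in bits."""
    return -sum(t * log2(t) for t in (p, 1.0 - p) if t > 0)

def mutual_information(p, beta):
    """I(x, y) = r(gamma) - r(beta) with gamma = (1 - 2 beta) p + beta, as in (15-196)."""
    gamma = (1 - 2 * beta) * p + beta
    return r(gamma) - r(beta)

def capacity_by_search(beta, steps=1000):
    """Maximize I(x, y) over the grid of input probabilities p = k / steps."""
    return max(mutual_information(k / steps, beta) for k in range(steps + 1))

for beta in (0.0, 0.1, 0.25, 0.5):
    print(beta, capacity_by_search(beta), 1 - r(beta))
```

The two printed columns agree: the maximum is reached at p = 0.5, where the output states are equally likely, and it vanishes for β = 0.5, where the output is independent of the input.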
Redundant and random codes. Consider a set $\mathcal{A}$ (source) with N elements
and a set $\mathcal{B}$ (code) with M elements where N < M. A redundant code is
a one-to-one correspondence between the elements of $\mathcal{A}$ and the elements of a
subset $\mathcal{B}_1$ of $\mathcal{B}$.
The subset $\mathcal{B}_1$ consists of N elements that can be selected in many ways.
If the elements of $\mathcal{B}_1$ are chosen at random from the M elements of $\mathcal{B}$, the
resulting code is called random.‡ From the definition it follows that the
probability that a specific element of $\mathcal{B}$ is in the randomly selected set $\mathcal{B}_1$
equals N/M.
In the next example we show that redundant encoding can be used to
reduce the probability of error in transmission.
Example 15-29. In a symmetrical channel, the probability of error equals $\beta$. To
reduce this error, we encode the input set $\mathcal{A} = \{0, 1\}$ into the subset $\mathcal{B}_1 =
\{000, 111\}$ of the set $\mathcal{B}$ of all three-digit binary numbers. In the earlier notation,
N = 2 and M = 8.
The input $x_n$ is thus encoded into a signal consisting of triplets of 0's and
1's, yielding as output a signal of received triplets (Fig. 15-26). The decoding scheme is the majority
rule: If the received triplet consists of at least two 0's, then $y_n = 0$; otherwise $y_n = 1$.
‡This definition of a random code is not the definition given on page 583.
FIGURE 15-26
It can be readily seen (Prob. 15-24) that the probability that a transmitted
word will be detected incorrectly equals $\beta^2(3 - 2\beta)$. This is less than $\beta$ if $\beta < 0.5$.
However, the rate of transmission is also reduced from 1 word per second to 1
word per three seconds.
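The error probability of the majority rule can be confirmed by enumerating the eight possible error patterns of a triplet. In this sketch the function name is ours, and the three digits are assumed to be received in error independently, each with probability β, which is what the memoryless assumption implies.

```python
from itertools import product

def majority_error_probability(beta):
    """Probability that majority decoding of a thrice-repeated digit is wrong."""
    total = 0.0
    # flips[i] = 1 means that the i-th digit of the triplet was received in error.
    for flips in product((0, 1), repeat=3):
        errors = sum(flips)
        if errors >= 2:                 # the majority of the triplet is wrong
            total += beta ** errors * (1 - beta) ** (3 - errors)
    return total

for beta in (0.01, 0.1, 0.3, 0.5):
    print(beta, majority_error_probability(beta), beta ** 2 * (3 - 2 * beta))
```

The patterns with two or three errors contribute $3\beta^2(1 - \beta) + \beta^3 = \beta^2(3 - 2\beta)$, in agreement with the enumeration.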
It appears from the above that reduction of the probability of error by
redundant encoding must result in transmission rates that tend to 0 as the error
tends to 0. This, however, is not so. As the following remarkable theorem shows,
it is possible to achieve arbitrarily small error probabilities while maintaining
the rate of information transmission close to the channel capacity.
The Channel Capacity Theorem
Information can be transmitted through a noisy channel at a rate nearly equal to
the channel capacity C with negligible probability of error.
Proof. Preliminary remarks. From the definition of channel capacity, it follows
that the maximum of H(x) is at least equal to C because
    $H(x) = I(x,y) + H(x|y) \ge I(x,y)$	(15-197)
This shows that we can find a source with entropy rate as close to C as we want.
We shall show that if $x_n$ is a source with entropy rate
    $H(x) < C$	(15-198)
then it can be transmitted at the rate of 1 word per second with probability of
error less than $\alpha$ for any $\alpha > 0$. This will prove the theorem because the
information per word equals H(x).
As in the noiseless case, the proof is based on proper encoding of the
space $\mathcal{S}^m$ consisting of all possible segments of $x_n$ of length m. However, as
the following remarks show, the objectives are different.
FIGURE 15-27 (a) Noiseless channel; (b) noisy channel.
Noiseless channel. The code set consists of two groups of binary numbers
(Fig. 15-27a). The first group has $2^{m_1}$ elements of length $m_1 = mH(x)$ and it is
used to encode the $2^{m_1}$ typical sequences of the input space $\mathcal{S}^m$. The second
group is used to encode the rare sequences of $\mathcal{S}^m$. Since the set of all rare
sequences has negligible probability, the average length of the code equals $m_1$.
Thus, in the noiseless case, the purpose of coding is reduction of the time
of transmission of m-word messages from m seconds to $m_1$ seconds. This
results in an increase of the rate of information transmission from mH(x) bits
per m seconds to mH(x) bits per $m_1 = mH(x)$ seconds.
Noisy channel. Reasoning as in (15-197), we conclude that, given $\varepsilon > 0$,
we can find a process $z_n$ such that
    $H(z) - H(z|y) > C - \varepsilon$	(15-199)
Choosing $\varepsilon < C - H(x)$, we obtain
    $H(z) > H(x) + H(z|y) > H(x)$	(15-200)
because H(z|y) > 0.
All sequences of $z_n$ of length m form a space $\mathcal{Z}^m$ consisting of $2^m$
elements. We can, therefore, encode the input set $\mathcal{S}^m$ into the set $\mathcal{Z}^m$. The
resulting code is one-to-one (Fig. 15-27b). The code can, however, be viewed as
redundant if we consider only the mapping of the subset $T(x_n)$ of all typical
sequences of $\mathcal{S}^m$ into the subset $T(z_n)$ of all typical sequences of $\mathcal{Z}^m$.
Indeed, $T(x_n)$ has $N = 2^{mH(x)}$ elements and $T(z_n)$ has $M = 2^{mH(z)}$ elements
where
    $N \ll M$	(15-201)
because H(x) < H(z) and $m \gg 1$. We denote by $\tilde{z}_n$ the code word of a typical
$x_n$ message and by $T(\tilde{z}_n)$ the set of all such code words. Clearly, $T(\tilde{z}_n)$ is a subset
of the set $T(z_n)$ consisting of $N \ll M$ elements.
The purpose of the coding is to select the set $T(\tilde{z}_n)$ such that its elements
are at a "large distance" from each other in the following sense: Since the
channel is noisy, the output due to a specific element $\tilde{z}_n$ is not unique. We
denote by $\mathcal{Y}(\tilde{z}_n)$ the set of all output sequences due to this element, and we
attempt to design the code such that the probability of the intersection of the
output sets $\mathcal{Y}(\tilde{z}_n)$ as $\tilde{z}_n$ ranges over every element of the set $T(\tilde{z}_n)$ is negligible.
This will ensure the unique determination of $\tilde{z}_n$ in terms of the observed
output $y_n$.
Random code. To complete the proof, we shall show that among all
N-element subsets of the set $T(z_n)$ there exists at least one that meets our
requirements. In fact, we shall prove a stronger statement: If we select at
random N elements $\tilde{z}_n$ from the M elements of $T(z_n)$ and use the resulting set
$T(\tilde{z}_n)$ to encode the set $T(x_n)$ then, almost certainly, the probability of error in
transmission will be negligible.
We note that, once the code set $T(\tilde{z}_n)$ has been selected, the probability
that an element of $T(z_n)$ is in $T(\tilde{z}_n)$ equals N/M. From this it follows that, if $\mathcal{W}$
is a randomly selected subset of $T(z_n)$ consisting of $N_w$ elements, then the
probability $P_w$ that it will intersect the set $T(\tilde{z}_n)$ equals
    $P_w = 1 - \left(1 - \frac{N}{M}\right)^{N_w} \approx \frac{N N_w}{M}$	(15-202)
because $N \ll M$.
Suppose that we transmit the selected m-word message $\tilde{z}_n$ through the
channel and we observe at the output the m-word message $y_n$. Since the channel
is noisy, the same $y_n$ might result from many other input messages. We denote
by $\mathcal{W}(y_n)$ the set consisting of all elements of $T(z_n)$ that will produce the same
output $y_n$, excluding the actually transmitted message $\tilde{z}_n$ (Fig. 15-27b). If the set
$\mathcal{W}(y_n)$ does not intersect the code set $T(\tilde{z}_n)$, there is no error because the
observed signal $y_n$ determines uniquely the transmitted signal $\tilde{z}_n$. The error
probability equals, therefore, the probability $P_w$ that the sets $\mathcal{W}(y_n)$ and $T(\tilde{z}_n)$
intersect. As we know [see (15-74)] the number $N_w$ of typical elements in
$\mathcal{W}(y_n)$ equals $2^{mH(z|y)}$. Neglecting all others, we conclude from (15-202) that
    $P_w \approx \frac{N N_w}{M} = 2^{mH(z|y)}\, 2^{m[H(x) - H(z)]}$
FIGURE P15-3
This shows that
    $P_w \to 0$ as $m \to \infty$
because H(z|y) + H(x) - H(z) < 0, and the proof is complete.
We note, finally, that the maximum rate of information transmission
cannot exceed C bits per second.
Indeed, to achieve a rate higher than C, we would need to transmit a
signal $z_n$ such that H(z) - H(z|y) > C. This, however, is impossible
[see (15-194)].
PROBLEMS
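15-1. Show that $H(\mathfrak{A} \cdot \mathfrak{B} \mid \mathfrak{B}) = H(\mathfrak{A} \mid \mathfrak{B})$.
15-2. Show that if $\varphi(p) = -p \log p$ and $p_1 < p_1 + \varepsilon < p_2 - \varepsilon < p_2$, then
    $\varphi(p_1 + p_2) < \varphi(p_1) + \varphi(p_2) < \varphi(p_1 + \varepsilon) + \varphi(p_2 - \varepsilon)$
15-3. In Fig. P15-3a, we give a schematic representation of the identities
    $H(\mathfrak{A} \cdot \mathfrak{B}) = H(\mathfrak{A}) + H(\mathfrak{B} \mid \mathfrak{A}) = H(\mathfrak{A}) + H(\mathfrak{B}) - I(\mathfrak{A}, \mathfrak{B})$
where each quantity equals the area of the corresponding region. Extending
formally this representation to three partitions (Fig. P15-3b), we obtain the
identities
    $H(\mathfrak{A} \cdot \mathfrak{B} \cdot \mathfrak{C}) = H(\mathfrak{A}) + H(\mathfrak{B} \cdot \mathfrak{C} \mid \mathfrak{A}) = H(\mathfrak{A} \cdot \mathfrak{B}) + H(\mathfrak{C} \mid \mathfrak{A} \cdot \mathfrak{B})$
    $H(\mathfrak{A} \cdot \mathfrak{B} \cdot \mathfrak{C}) = H(\mathfrak{A}) + H(\mathfrak{B} \mid \mathfrak{A}) + H(\mathfrak{C} \mid \mathfrak{A} \cdot \mathfrak{B})$
    $H(\mathfrak{B} \cdot \mathfrak{C} \mid \mathfrak{A}) = H(\mathfrak{B} \mid \mathfrak{A}) + H(\mathfrak{C} \mid \mathfrak{A} \cdot \mathfrak{B})$
Show that these identities are correct.
15-4. Show that
    $I(\mathfrak{A} \cdot \mathfrak{B}, \mathfrak{C}) + I(\mathfrak{A}, \mathfrak{B}) = I(\mathfrak{A} \cdot \mathfrak{C}, \mathfrak{B}) + I(\mathfrak{A}, \mathfrak{C})$
and identify each quantity in the representation of Fig. P15-3b.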
15-5. The conditional mutual information of two partitions $\mathfrak{A}$ and $\mathfrak{B}$ assuming $\mathfrak{C}$ is by
definition
    $I(\mathfrak{A}, \mathfrak{B} \mid \mathfrak{C}) = H(\mathfrak{A} \mid \mathfrak{C}) + H(\mathfrak{B} \mid \mathfrak{C}) - H(\mathfrak{A} \cdot \mathfrak{B} \mid \mathfrak{C})$
(a)	Show that
    $I(\mathfrak{A}, \mathfrak{B} \mid \mathfrak{C}) = I(\mathfrak{A}, \mathfrak{B} \cdot \mathfrak{C}) - I(\mathfrak{A}, \mathfrak{C})$	(i)
and identify each quantity in the representation of Fig. P15-3b.
(b)	From (i) it follows that $I(\mathfrak{A}, \mathfrak{B} \cdot \mathfrak{C}) \ge I(\mathfrak{A}, \mathfrak{C})$. Interpret this inequality in
terms of the subjective notion of mutual information.
15-6. In an experiment $\mathcal{S}$, the entropy of the binary partition $\mathfrak{A} = [\mathcal{A}, \bar{\mathcal{A}}]$ equals
r(p) where $p = P(\mathcal{A})$. Show that in the experiment $\mathcal{S}^3 = \mathcal{S} \times \mathcal{S} \times \mathcal{S}$, the
entropy of the eight-element partition $\mathfrak{A}^3 = \mathfrak{A} \cdot \mathfrak{A} \cdot \mathfrak{A}$ equals 3r(p) as in (15-67).
15-7. Show that
    $H(x + a) = H(x) \qquad H(x + y \mid x) = H(y \mid x)$
In the above, H(x + a) is the entropy of the RV x + a and H(x + y|x) is the
conditional entropy of the RV x + y.
15-8. The RVs x, y are of discrete type and independent. Show that if z = x + y and
the line $x + y = z_i$ contains no more than one mass point, then
    $H(z \mid x) = H(y) \le H(z)$
Hint: Show that $\mathfrak{A}_z = \mathfrak{A}_x \cdot \mathfrak{A}_y$.
15-9. The RV x is uniform in the interval (0, a) and the RV y equals the value of x
rounded off to the nearest multiple of $\delta$. Show that $I(x,y) = \ln (a/\delta)$.
15-10. Show that, if the transformation y = g(x) is one-to-one and x is of discrete type,
then
    $H(x,y) = H(x)$
Hint: рц » P{x =	- Д
15-11. Show that for discrete-type RVs
    $H(x,x) = H(x) \qquad H(x \mid x) = 0 \qquad H(y \mid x) = H(y, x \mid x)$
    $H(y \mid x_1, \ldots, x_n) = H\left(y + \sum_{k=1}^{n} a_k x_k \,\Big|\, x_1, \ldots, x_n\right)$
For continuous-type RVs, the relevant densities are singular. The above holds,
however, if we set H(x,x) = H(x) and use theorem (15-103) and its extensions to
several variables to define recursively all conditional entropies.
15-12. The process x„ is normal white noise with E{x„) = 5, and
    $y_n = \sum_{k=0}^{\infty} 2^{-k} x_{n-k}$
(a) Find the mutual information of the RVs $x_n$ and $y_n$. (b) Find the entropy rate of
the process $y_n$.
15-13. The RVs $x_n$ are independent and each is uniform in the interval (4, 6). Find the
entropy rate of the process
    $y_n = \sum_{k=0}^{\infty} 2^{-k} x_{n-k}$
15-14. Find the ME density of an RV x if f(x) = 0 for |x| > 1 and E{x} = 0.31.
15-15. It is observed that the duration of the telephone calls is a number x between 1
and 5 minutes and its mean is 3 min 37 sec. Find its ME density.
15-16. We are given a die with P{even} = 0.5 and are told that the mean of the number
x of faces up equals 4.44. Find the ME values of $p_i = P\{x = i\}$.
15-17. Suppose that x is an RV with entropy H(x) and y = 3x. Express the entropy H(y)
of y in terms of H(x), (a) if x is of discrete type, (b) if x is of continuous type.
15-18. In the experiment of two fair dice, $\mathfrak{A}$ is a partition consisting of the events
$\mathcal{A}_1$ = {seven}, $\mathcal{A}_2$ = {eleven}, and $\mathcal{A}_3 = \overline{\mathcal{A}_1 \cup \mathcal{A}_2}$. (a) Find its entropy. (b) The
dice were rolled 100 times. Find the number of typical and atypical sequences
formed with the events $\mathcal{A}_1$, $\mathcal{A}_2$, and $\mathcal{A}_3$.
15-19. The process x[n] is SSS with entropy rate $\bar{H}(x)$. Show that, if
- E
Л-0
then
    $\lim_{n \to \infty} \frac{1}{n + 1} H(w_0, \ldots, w_n) = \bar{H}(x) + \ln h_0$
15-20. In the coin experiment, the probability of "heads" is an RV p with E{p} = 0.6.
Using the MEM, find its density f(p).
15-21. (The Brandeis dice problem†) In a die experiment, the average number of dots up
equals 4.5. Using the MEM, find $p_i = P\{x = i\}$.
15-22. Using the MEM, find the joint density $f(x_1, x_2, x_3)$ of the RVs $x_1, x_2, x_3$ if
E{xf) = E{x|) = E{xl) = 4	^{х^п} = E{x,x3) = 1
15-23. A source has seven elements with probabilities
0.3	0.2	0.15	0.15	0.1	0.06	0.04
respectively. Construct a Shannon, a Fano, and a Huffman code and find their
average code lengths.
15-24. Show that in the redundant coding of Example 15-29, the probability of error
equals $\beta^2(3 - 2\beta)$.
Hint: $P\{y_n = 1 \mid x_n = 0\} = \beta^3 + 3\beta^2(1 - \beta)$.
15-25. Find the channel capacity of a symmetrical binary channel if the received
information is always wrong.
†E. T. Jaynes: Brandeis lectures, 1962.
CHAPTER
16
SELECTED
TOPICS
16-1 THE LEVEL-CROSSING PROBLEM
Given a random process x(t) and a constant a, we denote by $\tau_i$ the time
instants when x(t) crosses the line $L_a$ shown in Fig. 16-1. This line is parallel
to the time axis and
    $x(\tau_i) = a$
The level-crossing problem is the determination of the statistical properties of
the point process $\tau_i$ so formed. A special case is the zero-crossing problem
(a = 0) when $L_a$ coincides with the t axis.
The level-crossing problem is, in general, complicated. We discuss next
only certain aspects that lead to simple results.
EXPECTED NUMBER OF CROSSINGS.† We assume that the process x(t) is
stationary and we denote by $n_a(T)$ the number of points $\tau_i$ in an interval of
length T. The following basic theorem expresses the mean of $n_a(T)$ in terms of
the first-order density $f_x(x)$ of x(t) and the conditional mean of its derivative.
THEOREM. If x'(t) exists, then
    $E\{n_a(T)\} = T f_x(a) E\{|x'(t)| \mid x(t) = a\}$	(16-1)
†A. Blanc-Lapierre: Modèles Statistiques pour l'étude de phénomènes de Fluctuations, Masson et Cie,
Paris, 1963.
Proof. We shall prove the theorem using the following property of the impulse
function (see Papoulis, 1968): If $\tau_i$ are all the real zeros of a function $\varphi(t)$, then
(Fig. 16-2)
    $\delta[\varphi(t)] = \sum_i \frac{\delta(t - \tau_i)}{|\varphi'(\tau_i)|}$	(16-2)
The zeros of the function $\varphi(t) = x(t) - a$ are the $L_a$ crossings of x(t). Thus
    $\varphi(\tau_i) = x(\tau_i) - a = 0 \qquad \varphi'(t) = x'(t)$
Inserting into (16-2), we obtain
    $\sum_i \delta(t - \tau_i) = |x'(t)|\,\delta[x(t) - a]$	(16-3)
The sum
    $z(t) = \sum_i \delta(t - \tau_i)$
is a stationary process consisting of a sequence of impulses at the points $\tau_i$. The
area of each impulse equals 1 and the number of impulses in the interval
(t, t + T) equals $n_a(T)$. Hence
    $n_a(T) = \int_t^{t+T} z(\alpha)\,d\alpha \qquad E\{n_a(T)\} = T E\{z(t)\}$
To prove (16-1), it suffices, therefore, to find the mean of z(t). As we see from
(16-3), the RV z(t) is a function of the RVs x(t) and x'(t). Denoting by f(x, x')
the joint density of x(t) and x'(t), we conclude from (16-3) and (7-2) that
    $E\{z(t)\} = \int\!\!\int |x'|\,\delta(x - a) f(x, x')\,dx\,dx' = \int |x'| f(a, x')\,dx'$
This yields (16-1) because $f(a, x') = f_x(a) f(x' \mid a)$.
Note that the conditional mean £{|x'(r)| |x(r) = «) is the average of the
slopes |x'(r)| of all processes that cross the line La at time i.
LEVEL-CROSSING DENSITY. We denote by $p_a(\tau)$ the probability that in an
interval $I_\tau$ of length $\tau$ there is one and only one crossing. If $\tau \to 0$ then
$p_a(\tau) \to 0$. Furthermore, with the exception of unusual cases, the probability
that there is more than one crossing in a small interval $\tau$ is small compared to
$p_a(\tau)$. Assuming that this is the case for the process x(t), we introduce the limit
$$\lambda_a = \lim_{\tau \to 0} \frac{1}{\tau}\,p_a(\tau)$$
If this limit exists, it is the density of the $L_a$ crossings. Thus $p_a(\Delta\tau) \simeq \lambda_a\,\Delta\tau$ for
small $\Delta\tau$.
We maintain that†
$$\lambda_a = \frac{1}{T}\,E\{n_a(T)\} \tag{16-4}$$
Proof. If $\Delta\tau$ is small, then
$$P\{n_a(\Delta\tau) = 1\} = p_a(\Delta\tau) \qquad P\{n_a(\Delta\tau) = 0\} = 1 - p_a(\Delta\tau)$$
Hence
$$E\{n_a(\Delta\tau)\} = 1 \times P\{n_a(\Delta\tau) = 1\} = p_a(\Delta\tau) = \lambda_a\,\Delta\tau$$
From this it follows that (16-4) is true for T small. And since
$$n_a(T_1 + T_2) = n_a(T_1) + n_a(T_2)$$
we conclude that it is true for any T.
We have thus shown that if x(t) is differentiable, then the level-crossing
density $\lambda_a$ exists and it equals
$$\lambda_a = f_x(a)\,E\{|x'(t)| \mid x(t) = a\} \tag{16-5}$$
Density of maxima. The maxima and minima (extrema) of a process x(t) are the
zero crossings of the process y(t) = x'(t). From this it follows that the density
of the extrema of x(t) is obtained from (16-5) if we set a = 0 and replace
x(t) by x'(t). This yields
$$\lambda_m = f_{x'}(0)\,E\{|x''(t)| \mid x'(t) = 0\} \tag{16-6}$$
The density of maxima equals $\lambda_m/2$.
†S. O. Rice: "Mathematical Analysis of Random Noise," in Selected Papers on Noise and Stochastic
Processes, Dover, N.Y., 1954.
Normal Processes
We shall apply the preceding results to normal processes under the assumption
that their mean $\eta_x$ is 0. This assumption is not essential because the a-level
crossings of the process x(t) are identical to the $(a - \eta_x)$-level crossings of the
centered process $x(t) - \eta_x$.
THEOREM. If x(t) is a differentiable process with autocorrelation $R(\tau)$, then
$$\lambda_a = f_x(a)\,E\{|x'(t)|\} = \frac{1}{\pi}\sqrt{\frac{-R''(0)}{R(0)}}\; e^{-a^2/2R(0)} \tag{16-7}$$
Proof. As we know
$$R_{xx'}(\tau) = -R'(\tau) \qquad R_{x'x'}(\tau) = -R''(\tau)$$
From the existence of x'(t) it follows that $R''(\tau)$ exists. Therefore, $R'(\tau)$ also
exists and R'(0) = 0 because $R(\tau)$ is even. Hence
$$E\{x(t)x'(t)\} = -R'(0) = 0$$
This shows that the RVs x(t) and x'(t) are orthogonal. And since they are
normal with zero mean, they are independent. The first equality in (16-7)
follows, therefore, from (16-5).
To prove the second equality, we observe that the variance of x(t) equals
R(0) and the variance of x'(t) equals -R''(0). This yields [see (5-45)]
$$f_x(a) = \frac{1}{\sqrt{2\pi R(0)}}\,e^{-a^2/2R(0)} \qquad E\{|x'(t)|\} = \sqrt{\frac{-2R''(0)}{\pi}}$$
and (16-7) results.
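Zero-crossing density. Denoting by $\lambda_0$ the density of the zeros of x(t), we
conclude from (16-7) with a = 0 that
$$\lambda_0^2 = \frac{-R''(0)}{\pi^2 R(0)} = \frac{\int_{-\infty}^{\infty} \omega^2 S(\omega)\,d\omega}{\pi^2 \int_{-\infty}^{\infty} S(\omega)\,d\omega} \tag{16-8}$$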
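Example 16-1. If $S(\omega) = 0$ for $|\omega| > \sigma$, then (16-8) yields $\lambda_0 \le \sigma/\pi$ because
$$\int_{-\sigma}^{\sigma} \omega^2 S(\omega)\,d\omega \le \sigma^2 \int_{-\sigma}^{\sigma} S(\omega)\,d\omega$$
If also $S(\omega) = S_0$ for $|\omega| < \sigma$ (ideal low-pass), then
$$\lambda_0 = \frac{\sigma}{\pi\sqrt{3}}$$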
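As a numerical check of (16-8), the following Python sketch evaluates the two spectral integrals with a midpoint rule for a band-limited spectrum. The function name `zero_crossing_density`, its parameters, and the triangular test spectrum are illustrative assumptions, not part of the text.

```python
import math

def zero_crossing_density(S, sigma, n=20000):
    """Approximate lambda_0 from (16-8) for a spectrum S(w) that vanishes for |w| > sigma.

    Midpoint rule on (-sigma, sigma); S, sigma and n are illustrative names.
    """
    dw = 2.0 * sigma / n
    num = 0.0  # integral of w^2 S(w)
    den = 0.0  # integral of S(w)
    for k in range(n):
        w = -sigma + (k + 0.5) * dw
        s = S(w)
        num += w * w * s * dw
        den += s * dw
    return math.sqrt(num / den) / math.pi

sigma = 2.0
flat = zero_crossing_density(lambda w: 1.0, sigma)                  # ideal low-pass
tri = zero_crossing_density(lambda w: 1.0 - abs(w) / sigma, sigma)  # triangular spectrum (assumed)
print(flat, sigma / (math.pi * math.sqrt(3)))  # flat spectrum versus sigma / (pi sqrt 3)
print(tri, sigma / math.pi)                    # any band-limited spectrum stays below sigma / pi
```

For the flat spectrum the ratio of the integrals is $\sigma^2/3$; for the triangular one it is $\sigma^2/6$, so both values respect the bound $\lambda_0 \le \sigma/\pi$.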
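Example 16-2. (a) If a and b are two independent RVs with variance $\sigma^2$ and
$$x(t) = a\cos\omega_0 t + b\sin\omega_0 t$$
as in Example 10-13, then
$$R_x(\tau) = \sigma^2\cos\omega_0\tau \qquad R_x(0) = \sigma^2 \qquad R_x''(0) = -\omega_0^2\sigma^2$$
and (16-8) yields
$$\lambda_0 = \frac{\omega_0}{\pi}$$
(b) If the RVs a and b are independent of the process v(t) and
$$x(t) = a\cos\omega_0 t + b\sin\omega_0 t + v(t)$$
then
$$R_x(\tau) = \sigma^2\cos\omega_0\tau + R_v(\tau)$$
and (16-8) yields
$$\lambda_0 = \frac{1}{\pi}\sqrt{\frac{\omega_0^2\sigma^2 - R_v''(0)}{\sigma^2 + R_v(0)}}$$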
Nondifferentiable processes. We have shown that if x'(t) exists, then the
probability $p_0(\tau)$ that there is a zero crossing in a small interval $\tau$ equals $\lambda_0\tau$. If x'(t)
does not exist, then $p_0(\tau)$ is no longer proportional to $\tau$. In the following, we
examine the asymptotic form of $p_0(\tau)$ as $\tau \to 0$ under the assumption that $R(\tau)$
has a corner at the origin as in the Ornstein-Uhlenbeck process [see (11-15)].
Suppose that $R'(\tau)$ is discontinuous at $\tau = 0$ but $R'(0^+)$ exists. In this case
$$R(\tau) = R(0) + R'(0^+)\tau + O(\tau^2) \qquad \tau > 0 \tag{16-9}$$
We shall show that $p_0(\tau)$ is proportional to $\sqrt{\tau}$:
$$p_0(\tau) \simeq \frac{1}{\pi}\sqrt{\frac{-2R'(0^+)\tau}{R(0)}} \tag{16-10}$$
Proof. If $\tau$ is small, then we can neglect more than one zero crossing in the
interval $(t, t + \tau)$. With this assumption, we have one crossing iff $x(t + \tau)x(t) < 0$
(Fig. 16-3). Hence
$$p_0(\tau) = P\{x(t + \tau)x(t) < 0\}$$
The RVs $x(t + \tau)$ and x(t) are jointly normal with correlation coefficient
$r(\tau) = R(\tau)/R(0)$. Applying (6-46), we conclude that
$$p_0(\tau) = \frac{\beta}{\pi} \qquad \cos\beta = r(\tau)$$
FIGURE 16-3
This yields
$$r(\tau) = \cos\beta = \cos\pi p_0(\tau) \simeq 1 - \frac{\pi^2 p_0^2(\tau)}{2}$$
for $p_0(\tau) \ll 1$. Thus, for small $\tau$,
$$\pi p_0(\tau) \simeq \sqrt{2[1 - r(\tau)]} \qquad r(\tau) = \frac{R(\tau)}{R(0)} \tag{16-11}$$
Inserting (16-9) into the above, we obtain (16-10).
Note Using (16-11), we can reestablish (16-8) for a differentiable process. Indeed, in this case, R'(0) = 0 and R''(0)
exists. Hence
$$R(\tau) = R(0) + \tfrac{1}{2}R''(0)\tau^2 + O(\tau^3)$$
Inserting into (16-11), we obtain
$$p_0(\tau) \simeq \frac{\tau}{\pi}\sqrt{\frac{-R''(0)}{R(0)}} \tag{16-12}$$
This yields (16-8) because $p_0(\tau) = \lambda_0\tau$ for small $\tau$.
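To see how quickly the small-interval approximation (16-10) becomes accurate, the sketch below compares it with the sign-change probability $\arccos r(\tau)/\pi$ used in the proof. The exponential autocorrelation $R(\tau) = e^{-\alpha|\tau|}$ and the function names are assumptions chosen for illustration.

```python
import math

def p0_exact(r):
    """P{x(t + tau) x(t) < 0} for zero-mean jointly normal RVs with correlation r."""
    return math.acos(r) / math.pi

def p0_small_tau(tau, alpha):
    """Approximation (16-10) for R(tau) = exp(-alpha |tau|): R(0) = 1, R'(0+) = -alpha."""
    return math.sqrt(2.0 * alpha * tau) / math.pi

alpha = 1.0  # assumed decay rate of the autocorrelation
for tau in (1e-1, 1e-2, 1e-3):
    exact = p0_exact(math.exp(-alpha * tau))
    approx = p0_small_tau(tau, alpha)
    print(tau, exact, approx, approx / exact)
```

The ratio in the last column approaches 1 as $\tau$ decreases, and both columns scale like $\sqrt{\tau}$ rather than $\tau$.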
Example 16-3. If a(t) and b(t) are two normal independent processes and
$$x(t) = a(t)\cos\omega_0 t - b(t)\sin\omega_0 t$$
as in (11-62), then x(t) is normal, stationary, with autocorrelation [see (11-65)]
$$R_x(\tau) = R_a(\tau)\cos\omega_0\tau$$
(a) We shall show that, if x(t) is differentiable, then its zero-crossing density $\lambda_x$
equals
$$\lambda_x = \sqrt{\lambda_a^2 + \frac{\omega_0^2}{\pi^2}} \qquad \text{where} \qquad \lambda_a = \frac{1}{\pi}\sqrt{\frac{-R_a''(0)}{R_a(0)}} \tag{16-13}$$
is the zero-crossing density of a(t) (in this example the subscripts x and a refer to the processes, not to a level). Indeed, in this case, $R_a'(0) = 0$; hence
$$R_x(0) = R_a(0) \qquad R_x'(0) = 0 \qquad R_x''(0) = R_a''(0) - \omega_0^2 R_a(0)$$
and (16-13) follows from (16-8).
(b) In the nondifferentiable case, we have
$$R_x(0) = R_a(0) \qquad R_x'(0^+) = R_a'(0^+) \neq 0$$
This shows that the zero-crossing probabilities $p_0(\tau)$ of the processes
x(t) and a(t) are equal [see (16-10)].
Example 16-4. In the preceding discussion we assumed that all processes are
stationary. The results can be readily extended to nonstationary processes. We
illustrate the nondifferentiable case using as an example the Wiener process w(t).
We shall show that for small $\tau$, the probability $p_0(t, t + \tau)$ that w(t) crosses
the t axis in the interval $(t, t + \tau)$ equals
$$p_0(t, t + \tau) \simeq \frac{1}{\pi}\sqrt{\frac{\tau}{t}} \tag{16-14}$$
Proof. Reasoning as in the stationary case, we conclude that $p_0(t, t + \tau)$ is given
by (16-11) where $r(\tau)$ is the correlation coefficient of the RVs $w(t + \tau)$ and w(t).
Thus [see (11-5)]
$$r^2(\tau) = \frac{R^2(t + \tau, t)}{R(t + \tau, t + \tau)R(t, t)} = \frac{\alpha^2 t^2}{\alpha(t + \tau)\,\alpha t} = \frac{t}{t + \tau}$$
Inserting into (16-11), we obtain (16-14) because $\sqrt{t/(t + \tau)} \simeq 1 - \tau/2t$ for
small $\tau$.
Density of maxima. The extrema of x(t) are the zeros of x'(t). Hence their density $\lambda_m$
is given by (16-8) if we replace the autocorrelation $R(\tau)$ of x(t) by the
autocorrelation $-R''(\tau)$ of x'(t). From this it follows that
$$\lambda_m^2 = \frac{-R^{(4)}(0)}{\pi^2 R''(0)} = \frac{\int_{-\infty}^{\infty}\omega^4 S(\omega)\,d\omega}{\pi^2\int_{-\infty}^{\infty}\omega^2 S(\omega)\,d\omega} \tag{16-15}$$
provided that x''(t) exists.
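Example 16-5. (a) If $S(\omega) = 0$ for $|\omega| > \sigma$, then the above yields $\lambda_m \le \sigma/\pi$
because
$$\int_{-\sigma}^{\sigma}\omega^4 S(\omega)\,d\omega \le \sigma^2\int_{-\sigma}^{\sigma}\omega^2 S(\omega)\,d\omega$$
If also $S(\omega) = S_0$ for $|\omega| < \sigma$, then
$$\lambda_m = \frac{\sigma}{\pi}\sqrt{\frac{3}{5}}$$
(b) If $S(\omega) = S_0$ for $\omega_1 < |\omega| < \omega_2$ and 0 otherwise (ideal bandpass), then
$$\lambda_m = \frac{1}{\pi}\sqrt{\frac{3(\omega_2^5 - \omega_1^5)}{5(\omega_2^3 - \omega_1^3)}}$$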
FIRST-PASSAGE TIME. We denote by $\tau_1$ the first a-level crossing to the right of
the origin (Fig. 16-4a). The first-passage problem is the determination of the
distribution function $F_\tau(t, a)$ of the RV $\tau_1$. We shall solve the problem under
the assumption that x(t) is the Wiener process (11-20). We should note,
however, that in the solution we make use only of the fact that the increment
$x(t_2) - x(t_1)$ is independent of $x(t_1)$ and its density is even:
$$P\{x(t_2) - x(t_1) \le w\} = P\{x(t_2) - x(t_1) \ge -w\} \tag{16-16}$$
FIGURE 16-4
The reflection principle. We shall show that the samples $x(t, \zeta)$ of x(t) that
cross the line $L_a$ continue on symmetrical paths. In other words, if a sample
$x(t, \zeta_1)$ crosses the line $L_a$ at $t = \tau_1$, then there exists another sample $x(t, \zeta_2)$
that coincides with $x(t, \zeta_1)$ for $t < \tau_1$ and for $t > \tau_1$ is the reflection of $x(t, \zeta_1)$
on the line $L_a$ (Fig. 16-4b)
$$x(t, \zeta_1) = x(t, \zeta_2) \qquad t < \tau_1$$
$$a - x(t, \zeta_1) = x(t, \zeta_2) - a \qquad t > \tau_1$$
This result, known as the reflection principle, can be stated as follows:
THEOREM. For any x (less than or greater than a)
$$P\{x(t) \le x \mid \tau_1 \le t\} = P\{x(t) \ge 2a - x \mid \tau_1 \le t\} \tag{16-17}$$
Proof. It suffices to show that (see Prob. 4-13)
$$P\{x(t) \le x \mid \tau_1 = \tau\} = P\{x(t) \ge 2a - x \mid \tau_1 = \tau\} \tag{16-18}$$
for every $\tau \le t$. From (16-16) it follows that
$$P\{x(t) - x(\tau) \le x - a\} = P\{x(t) - x(\tau) \ge a - x\} \tag{16-19}$$
Since $x(t) - x(\tau)$ is independent of $x(\tau)$, we can write (16-19) in the form
$$P\{x(t) - x(\tau) \le x - a \mid x(\tau) = a\} = P\{x(t) - x(\tau) \ge a - x \mid x(\tau) = a\}$$
This is true for any t and $\tau$; hence it is true for $\tau = \tau_1$. In this case
$\{x(\tau) = a\} = \{\tau_1 = \tau\}$ and (16-18) results because $x(\tau_1) = a$.
COROLLARY. If $x \le a$, then
$$P\{x(t) \le x,\ \tau_1 \le t\} = 1 - F_x(2a - x, t) \tag{16-20}$$
$$P\{x(t) \le x,\ \tau_1 > t\} = F_x(x, t) + F_x(2a - x, t) - 1 \tag{16-21}$$
where
$$F_x(x, t) = \frac{1}{\sqrt{2\pi\alpha t}}\int_{-\infty}^{x} e^{-\xi^2/2\alpha t}\,d\xi$$
is the first-order distribution of the Wiener process x(t).
Proof. Multiplying both sides of (16-17) by $P\{\tau_1 \le t\}$, we obtain
$$P\{x(t) \le x,\ \tau_1 \le t\} = P\{x(t) \ge 2a - x,\ \tau_1 \le t\} \tag{16-22}$$
for every x and t. If $x \le a$, then $2a - x \ge a$, so $x(t) \ge 2a - x$ only if $\tau_1 \le t$; hence the right side
of (16-22) equals $P\{x(t) \ge 2a - x\}$ and (16-20) results. The second equation
follows because the sum of the left sides of (16-20) and (16-21) equals
$P\{x(t) \le x\}$.
First-passage distribution. The distribution $F_\tau(t, a)$ of the RV $\tau_1$ equals
$$P\{\tau_1 \le t\} = P\{\tau_1 \le t,\ x(t) \le a\} + P\{\tau_1 \le t,\ x(t) > a\} \tag{16-23}$$
Clearly, if x(t) > a, then there must be a crossing prior to time t; hence $\tau_1 \le t$.
From this it follows that
$$P\{\tau_1 \le t,\ x(t) > a\} = P\{x(t) > a\}$$
Setting x = a in (16-20), we obtain
$$P\{\tau_1 \le t,\ x(t) \le a\} = 1 - F_x(a, t) = P\{x(t) > a\}$$
Thus the two terms on the right of (16-23) equal $P\{x(t) > a\}$. Therefore,
$$P\{\tau_1 \le t\} = F_\tau(t, a) = 2P\{x(t) > a\} = 2 - 2F_x(a, t) \tag{16-24}$$
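The first-passage result (16-24) can be checked against a crude simulation. The sketch below evaluates $2 - 2F_x(a, t)$ in closed form and compares it with the fraction of discretized random-walk paths that reach the level a by time t. The function names, the step count, the number of paths, and the seed are illustrative assumptions, and a Wiener process with variance $\alpha t$ and level a > 0 is assumed.

```python
import math
import random

def wiener_cdf(x, t, alpha=1.0):
    """First-order distribution F_x(x, t) of a Wiener process with variance alpha * t."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * alpha * t)))

def first_passage_cdf(t, a, alpha=1.0):
    """F_tau(t, a) = 2 - 2 F_x(a, t) from (16-24), for a > 0."""
    return 2.0 - 2.0 * wiener_cdf(a, t, alpha)

def simulate_first_passage(t, a, alpha=1.0, steps=400, paths=2000, seed=0):
    """Fraction of discretized paths that reach level a by time t.

    The walk is only inspected at grid points, so it slightly
    underestimates the continuous-time crossing probability.
    """
    rng = random.Random(seed)
    step_std = math.sqrt(alpha * t / steps)
    hits = 0
    for _ in range(paths):
        x = 0.0
        for _ in range(steps):
            x += rng.gauss(0.0, step_std)
            if x >= a:
                hits += 1
                break
    return hits / paths

theory = first_passage_cdf(1.0, 1.0)
estimate = simulate_first_passage(1.0, 1.0)
print(theory, estimate)
```

The simulated fraction is expected to fall a little below the closed-form value because crossings between grid points are missed; refining the grid narrows the gap.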
Absorbing wall We replace the line $L_a$ by an absorbing barrier. This
means that the resulting process y(t) equals a for every $t > \tau_1$ (Fig. 16-5). We
shall show that the distribution function of y(t) equals
$$F_y(y, t) = F_x(y, t) + F_x(2a - y, t) - 1 \qquad y < a \tag{16-25}$$
and, of course, $F_y(y, t) = 1$ for $y \ge a$.
Proof. If y < a, then $y(t, \zeta) \le y$ for some outcome $\zeta$ iff $x(t, \zeta) \le y$ and $x(t, \zeta)$
does not reach the line $L_a$ prior to time t. Hence
$$\{y(t) \le y\} = \{x(t) \le y,\ \tau_1 > t\}$$
and (16-25) follows from (16-21).
Reflecting wall We replace now the line $L_a$ by a reflecting barrier. This
means that the resulting process z(t) equals x(t) if x(t) < a and it equals
2a - x(t) if x(t) > a (Fig. 16-5). We shall show that in this case
$$F_z(z, t) = F_x(z, t) + 1 - F_x(2a - z, t) \qquad z < a \tag{16-26}$$
and $F_z(z, t) = 1$ for $z \ge a$.
Proof. If z < a, then $z(t, \zeta) \le z$ iff either $x(t, \zeta) \le z$ or $x(t, \zeta) \ge 2a - z$.
Hence
$$\{z(t) \le z\} = \{x(t) \le z\} + \{x(t) \ge 2a - z\}$$
and (16-26) results.
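The two wall distributions (16-25) and (16-26) are easy to evaluate once $F_x(x, t)$ is available. The sketch below does so for a Wiener process with variance $\alpha t$ (an assumption consistent with the first-passage discussion) and checks two consequences: the absorbing wall carries a probability mass equal to the first-passage probability (16-24), while the reflecting wall carries none. All function names are hypothetical.

```python
import math

def F_x(x, t, alpha=1.0):
    """Wiener first-order distribution with variance alpha * t."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * alpha * t)))

def F_absorbed(y, t, a, alpha=1.0):
    """Distribution (16-25) of the process with an absorbing wall at level a."""
    if y >= a:
        return 1.0
    return F_x(y, t, alpha) + F_x(2.0 * a - y, t, alpha) - 1.0

def F_reflected(z, t, a, alpha=1.0):
    """Distribution (16-26) of the process with a reflecting wall at level a."""
    if z >= a:
        return 1.0
    return F_x(z, t, alpha) + 1.0 - F_x(2.0 * a - z, t, alpha)

t, a = 1.0, 1.0
just_below = a - 1e-9
# The jump of F_absorbed at y = a is the mass sitting on the wall; it should
# match the first-passage probability 2 - 2 F_x(a, t) of (16-24).
mass_on_wall = 1.0 - F_absorbed(just_below, t, a)
print(mass_on_wall, 2.0 - 2.0 * F_x(a, t))
# F_reflected is continuous at z = a: no mass accumulates on a reflecting wall.
print(F_reflected(just_below, t, a))
```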
16-2 QUEUEING THEORY
Queueing theory deals with point processes (arrivals and departures) and
random intervals (waiting and servicing). This involves the statistical properties
of the number of random points in intervals of random length. As a preparation,
we introduce the underlying concepts in the context of Poisson points and
renewals.
Poisson points. The notation $n(t_1, t_2)$ will mean the number of points of a point
process in the interval $(t_1, t_2)$. As we have shown in Example 4-11, if $t_i$ is a set
of Poisson points, with average density $\lambda$, then the number of points
$n_T = n(t, t + T)$ in an interval of length T is a Poisson RV with parameter $\lambda T$. The
corresponding moment-generating function equals [see (5-79)]
$$\Gamma_{n_T}(z) = E\{z^{n_T}\} = e^{\lambda T(z - 1)} \tag{16-27}$$
Ergodicity As we know,
$$E\{n_T\} = \lambda T \qquad \sigma_{n_T}^2 = \lambda T \tag{16-28}$$
This shows that $\lambda$ equals the ensemble average of the number of points in a unit
interval. We maintain that $\lambda$ can also be interpreted as a time average. For this
purpose, we form the time average $\eta_T = n_T/T$. As we see from (16-28), $\eta_T$ is
an RV such that
$$E\{\eta_T\} = \lambda \qquad \sigma_{\eta_T}^2 = \frac{\lambda}{T}$$
Hence $\sigma_{\eta_T} \to 0$ as $T \to \infty$. From this it follows that
$$\eta_T = \frac{n_T}{T} \xrightarrow[T \to \infty]{} \lambda \tag{16-29}$$
and $n_T \simeq \lambda T$ for sufficiently large T.
Poisson Points in Random Intervals
Suppose now that $\mathbf{c}$ is a positive RV and
$$n_c = n(t, t + \mathbf{c})$$
is the number of points in an interval of length $\mathbf{c}$. We shall determine its
statistics.
If $\mathbf{c} = c$ is a constant, then $n_c$ is a Poisson RV with parameter $\lambda c$. Hence
$$E\{n_c \mid \mathbf{c} = c\} = \lambda c$$
From this and (7-66) it follows that
$$E\{n_c\} = E\{E\{n_c \mid \mathbf{c}\}\} = E\{\lambda\mathbf{c}\} = \lambda\eta_c \tag{16-30}$$
Denoting by $\rho_c$ the average number of points in the random interval $\mathbf{c}$, we
conclude that $\rho_c = \lambda\eta_c$.
Reasoning similarly, we can find all moments of $n_c$. In the following, we
determine directly the moment function
$$\Gamma_n(z) = E\{z^{n_c}\}$$
of the discrete-type RV $n_c$ in terms of the moment function
$$\Phi_c(s) = E\{e^{s\mathbf{c}}\} \tag{16-31}$$
of the continuous-type RV $\mathbf{c}$.
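THEOREM.
$$\Gamma_n(z) = \Phi_c(\lambda z - \lambda) \tag{16-32}$$
Proof. If $\mathbf{c} = c$, then $n_c$ is a Poisson RV with parameter $\lambda c$. Hence [see (16-27)]
$$E\{z^{n_c} \mid \mathbf{c} = c\} = e^{(z - 1)\lambda c}$$
This yields
$$E\{z^{n_c}\} = E\{E\{z^{n_c} \mid \mathbf{c}\}\} = E\{e^{(z - 1)\lambda\mathbf{c}}\}$$
and (16-32) follows from (16-31) with $s = (z - 1)\lambda$.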
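Using (16-32) and the moment theorems (5-67) and (5-77), we can express
the moments of $n_c$ in terms of the moments of $\mathbf{c}$. We note, in particular, that
$$E\{n_c\} = \Gamma_n'(1) = \lambda\Phi_c'(0) = \lambda E\{\mathbf{c}\}$$
in agreement with (16-30). Similarly,
$$E\{n_c(n_c - 1)\} = \Gamma_n''(1) = \lambda^2\Phi_c''(0) = \lambda^2 E\{\mathbf{c}^2\} \tag{16-33}$$
Hence
$$E\{n_c^2\} = \lambda^2 E\{\mathbf{c}^2\} + \rho_c \qquad \sigma_{n_c}^2 = \lambda^2\sigma_c^2 + \rho_c$$
In the above, $\rho_c = \lambda\eta_c$ is the average number of points in the random interval $\mathbf{c}$,
and $\sigma_c^2$ is the variance of $\mathbf{c}$.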
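Example 16-6. If†
$$f_c(c) = \mu e^{-\mu c}$$
then $E\{\mathbf{c}\} = 1/\mu$, $E\{\mathbf{c}^2\} = 2/\mu^2$, and
$$\Phi_c(s) = \frac{\mu}{\mu - s}$$
Hence
$$E\{n_c\} = \frac{\lambda}{\mu} = \rho_c$$
and
$$\sigma_{n_c}^2 = \frac{\lambda^2}{\mu^2} + \frac{\lambda}{\mu} = \rho_c^2 + \rho_c$$
From this and (16-32) it follows that
$$\Gamma_n(z) = \frac{\mu}{\mu + \lambda - \lambda z} \qquad P\{n_c = n\} = \frac{\mu}{\mu + \lambda}\left(\frac{\lambda}{\mu + \lambda}\right)^n \qquad n = 0, 1, \ldots$$
We have thus shown that the number $n_c$ of Poisson points in an exponentially
distributed random interval $\mathbf{c}$ has a geometric distribution with ratio $\lambda/(\mu + \lambda)$.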
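The geometric law of Example 16-6 can be illustrated by simulation. The sketch below draws an exponentially distributed interval, counts the Poisson points falling inside it by accumulating exponential interarrival times, and compares the empirical mean and the first two probabilities with $\lambda/\mu$ and the geometric probabilities with ratio $\lambda/(\mu + \lambda)$. The parameter values, seed, and function name are assumptions made for illustration.

```python
import random

def count_points_in_random_interval(lam, mu, rng):
    """Number of Poisson points (density lam) in an interval of Exp(mu) length."""
    c = rng.expovariate(mu)          # random interval length
    n, t = 0, rng.expovariate(lam)   # first Poisson point after the interval start
    while t < c:
        n += 1
        t += rng.expovariate(lam)
    return n

lam, mu, trials = 2.0, 1.0, 100_000
rng = random.Random(1)
counts = [count_points_in_random_interval(lam, mu, rng) for _ in range(trials)]

mean_est = sum(counts) / trials
p0_est = counts.count(0) / trials
p1_est = counts.count(1) / trials
q = lam / (mu + lam)                 # geometric ratio from Example 16-6
print(mean_est, lam / mu)
print(p0_est, 1 - q)
print(p1_est, (1 - q) * q)
```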
Example 16-7. If $\mathbf{c} = c$ is a constant, then
$$\Phi_c(s) = e^{sc} \qquad \Gamma_n(z) = e^{\lambda c(z - 1)}$$
Thus $n_c$ is a Poisson RV with parameter $\lambda c$, as it should be.
RENEWAL PROCESSES. Consider a stationary point process $t_i$ such that the
RVs
$$c_i = t_i - t_{i-1} \tag{16-34}$$
are i.i.d. with distribution $F_c(c)$. With $t_0$ a fixed point, we form the RV
$$\mathbf{w} = t_1 - t_0$$
where $t_1$ is the first random point to the right of $t_0$ (Fig. 16-6). We shall express
the density $f_w(w)$ of this RV in terms of $F_c(c)$.
FIGURE 16-6
†Since we deal only with positive RVs, we shall assume tacitly that all densities are 0 on the negative axis.
THEOREM. We maintain that
$$f_w(w) = \frac{1}{\eta_c}[1 - F_c(w)] \tag{16-35}$$
where
$$\eta_c = E\{c_i\} = \int_0^\infty [1 - F_c(c)]\,dc \tag{16-36}$$
is the mean of $c_i$ [see (5-27)].
Proof. Given a number w, we define the RV $\mathbf{w}_1$ as the distance from the point
$t_w = t_0 + w$ to the next random point $t_i$ to its right (Fig. 16-6). From the
stationarity of the points $t_i$ it follows that the RVs $\mathbf{w}$ and $\mathbf{w}_1$ have the same
density $f_w(w)$. Suppose that the first random point $t_1$ to the right of $t_0$ is also to
the right of $t_w$. In this case, there is no random point between $t_0$ and $t_w$ and
$$\mathbf{w} = t_1 - t_0 > w \qquad \mathbf{w}_1 = t_1 - t_w \qquad c_1 > \mathbf{w}_1 + w$$
From the above it follows that the events
$$\{\mathbf{w} > w\} \quad \text{and} \quad \{c_1 > \mathbf{w}_1 + w\}$$
are equal provided that in the second set we consider only the outcomes such
that $c_1 > \mathbf{w}_1$. Hence
$$P\{\mathbf{w} > w\} = P\{c_1 > \mathbf{w}_1 + w \mid c_1 > \mathbf{w}_1\}$$
Reasoning similarly, we conclude that
$$P\{w < \mathbf{w} < w + dw\} = P\{0 < \mathbf{w}_1 < dw,\ c_1 > w\}$$
The above yields
$$f_w(w)\,dw = f_w(0)\,dw\,[1 - F_c(w)]$$
because the RVs $\mathbf{w}_1$ and $c_1$ are independent and
$$P\{c_1 > w\} = 1 - F_c(w)$$
This completes the proof because the area of $f_w(w)$ equals one; it also shows
that $f_w(0) = 1/\eta_c$.
The theorem is also true if w is the distance from t0 to the nearest random
point on its left.
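COROLLARY. The moment function $\Phi_w(s)$ of $\mathbf{w}$ equals
$$\Phi_w(s) = \frac{1}{\eta_c s}[\Phi_c(s) - 1] \tag{16-37}$$
Proof. Since $F_c'(w) = f_c(w)$ and $F_c(w) = 0$ for $w \le 0$, we conclude, integrating
by parts, that for Re s < 0:
$$-\int_0^\infty F_c(w)e^{sw}\,dw = \frac{1}{s}\Phi_c(s) \qquad \int_0^\infty e^{sw}\,dw = -\frac{1}{s}$$
and (16-37) follows from (16-35).
Differentiating (16-37) and letting $s \to 0$, we obtain
$$\Phi_w'(0) = \frac{\Phi_c''(0)}{2\eta_c}$$
because $\Phi_c(0) = 1$ and $\Phi_c'(0) = \eta_c$. Hence [see (5-67)]
$$E\{\mathbf{w}\} = \frac{E\{\mathbf{c}^2\}}{2\eta_c} \tag{16-38}$$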
Example 16-8. If $f_c(c) = \lambda e^{-\lambda c}$, then
$$F_c(c) = 1 - e^{-\lambda c} \qquad \eta_c = \frac{1}{\lambda}$$
and (16-35) yields $f_w(w) = \lambda e^{-\lambda w} = f_c(w)$. This is the case iff $t_i$ is a set of Poisson
points.
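For a non-exponential case, (16-35) and (16-38) can be checked numerically. The sketch below builds $f_w(w) = [1 - F_c(w)]/\eta_c$ and integrates $w f_w(w)$ with a midpoint rule. The uniform interarrival law on (0, 1) is an assumed example, not one from the text, and the helper names are hypothetical.

```python
def residual_density(F_c, eta_c):
    """Return f_w(w) = [1 - F_c(w)] / eta_c from (16-35)."""
    return lambda w: (1.0 - F_c(w)) / eta_c

def mean_by_midpoint(f, upper, n=100_000):
    """Midpoint-rule approximation of the integral of w * f(w) over (0, upper)."""
    dw = upper / n
    return sum((k + 0.5) * dw * f((k + 0.5) * dw) for k in range(n)) * dw

# Illustrative case (an assumption): c uniform on (0, 1), so F_c(c) = c there,
# eta_c = 1/2 and E{c^2} = 1/3.
F_uniform = lambda c: min(max(c, 0.0), 1.0)
f_w = residual_density(F_uniform, 0.5)
mean_w = mean_by_midpoint(f_w, 1.0)
print(mean_w, (1.0 / 3.0) / (2.0 * 0.5))   # compare with E{c^2} / (2 eta_c) from (16-38)
```

In this case the mean distance to the next point, 1/3, exceeds half the mean interarrival time, 1/4, because a fixed point $t_0$ is more likely to fall in a long interval than in a short one.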
Note With $c_i$ as in (16-34), we form the point process
$$\tau_i = c_1 + c_2 + \cdots + c_i$$
starting from t = 0. In general, this process does not have the same statistics as the
original process $t_i$ (it is not even stationary). It is, however, asymptotically equivalent to
$t_i$. The process $t_i$ is given by
$$t_i = \mathbf{w} + c_2 + \cdots + c_i$$
It can, therefore, be constructed in terms of the sequence $c_i$ and the RV $\mathbf{w}$ specified by
(16-35).
Arrivals and Departures
The term queueing is used to describe a large class of phenomena involving
arrivals, waiting, servicing, and departures. In Fig. 16-7 we show a typical model
(queueing system) whose inputs are certain objects (units) identified in terms of
their arrival times $t_i$. Each unit stays in the system $a_i$ seconds (total system time)
and it departs at the departure time $\tau_i$:
$$\tau_i = t_i + a_i$$
The number of units in the system (state of the system) at time t will be denoted
by N(t). Thus N(t) is a discrete-state process increasing by 1 at $t_i$ and
decreasing by 1 at $\tau_i$.
Our objective in this section is not to develop this involved topic in detail.
We plan merely to introduce the main ideas in the context of earlier concepts.
We start with a general theorem that does not rely on any special conditions
about the interarrival times, the nature of the system, or the properties of $a_i$. It
assumes merely that all processes are SSS with finite second moments.
LITTLE'S THEOREM. Suppose that the processes $t_i$ and $a_i$ are mean-ergodic:
$$\frac{n_T}{T} \xrightarrow[T \to \infty]{} \lambda \qquad \frac{1}{n}\sum_{i=1}^{n} a_i \xrightarrow[n \to \infty]{} E\{a_i\} \tag{16-39}$$
In the above, $n_T$ is the number of points $t_i$ in the interval (0, T) and $\lambda = E\{n_T\}/T$ is the mean density of these points.
FIGURE 16-7
We maintain that†
$$E\{N(t)\} = \lambda E\{a_i\} \tag{16-40}$$
In fact, we shall establish the stronger statement that N(t) is also mean-ergodic:
$$\lim_{T \to \infty}\frac{1}{T}\int_0^T N(t)\,dt = \lambda E\{a_i\} = E\{N(t)\} \tag{16-41}$$
Equation (16-40) seems reasonable: The mean E{N(t)} of the number of
units in the system equals the mean number $\lambda$ of arrivals per second multiplied
by the mean time $E\{a_i\}$ that each unit remains in the system. It is not, however,
always true, although it holds under general conditions.
Proof. We start with the observation that
$$-\sum_{r=1}^{N(T)} a_r \le \int_0^T N(t)\,dt - \sum_{n=1}^{n_T} a_n \le \sum_{i=1}^{N(0)} a_i \tag{16-42}$$
In the above, the terms $a_n$ of the second sum are due to the $n_T$ units that
arrived in the interval (0, T); the terms $a_i$ of the last sum are due to the N(0)
units that are in the system at t = 0; the terms $a_r$ of the first sum are due to the
N(T) units that are still in the system at t = T. The details of the reasoning that
establishes (16-42) are omitted.
As we know (see Prob. 8-9)
$$E\left\{\left(\sum_{i=1}^{N(t)} a_i\right)^2\right\} \le E\{N^2(t)\}\,E\{a_i^2\} < \infty \tag{16-43}$$
Dividing (16-42) by T, we conclude that, if T is sufficiently large, then
$$\frac{1}{T}\int_0^T N(t)\,dt \simeq \frac{1}{T}\sum_{n=1}^{n_T} a_n \tag{16-44}$$
because the left and right sides of (16-42) tend to 0 after the division by T [see
(16-43)]. Furthermore, assumption (16-39) yields $n_T \simeq \lambda T$ and
$$\frac{1}{T}\sum_{n=1}^{n_T} a_n \simeq \frac{\lambda}{n_T}\sum_{n=1}^{n_T} a_n \simeq \lambda E\{a_i\}$$
Inserting into (16-44), we obtain the first equality in (16-41). The second follows
because the mean of the left side equals E{N(t)}.
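The sample-path argument behind (16-44) can be reproduced in a few lines. The sketch below generates arrivals on (0, T), assigns each unit a system time, and compares the time average of N(t) with the product of the observed arrival rate and the observed mean system time. Poisson arrivals, exponential system times, an empty system at t = 0, and the function name `little_check` are illustrative assumptions; the theorem itself does not require the first two.

```python
import random

def little_check(lam, mean_a, T, seed=0):
    """Compare the time average of N(t) on (0, T) with (n_T / T) * (average a_i).

    Arrivals are Poisson with density lam and system times are exponential with
    mean mean_a (illustrative assumptions). The system is assumed empty at t = 0.
    """
    rng = random.Random(seed)
    t, area, total_a, n_T = 0.0, 0.0, 0.0, 0
    while True:
        t += rng.expovariate(lam)              # next arrival time t_i
        if t >= T:
            break
        a = rng.expovariate(1.0 / mean_a)      # system time a_i
        n_T += 1
        total_a += a
        area += min(t + a, T) - t              # time this unit spends inside (0, T)
    time_avg_N = area / T                      # (1/T) * integral of N(t) dt
    return time_avg_N, (n_T / T) * (total_a / n_T)

time_avg, little_rhs = little_check(lam=2.0, mean_a=1.5, T=5000.0)
print(time_avg, little_rhs)
```

The two numbers differ only through the units still in the system at t = T, which is the left-hand term of (16-42) divided by T.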
†F. J. Beutler: "Mean Sojourn Times...," IEEE Transactions on Information Theory, March 1983.
Immediate Service (M|G|∞)
In general, the system time $a_n$ is a sum
$$a_n = b_n + c_n$$
where $b_n$ is the waiting time (or queueing time) and $c_n$ is the service time of the
nth unit. In many applications, $b_n = 0$ and $a_n = c_n$. This is the case if the
number of servers is infinite or when no servers are involved (visits to a park, for
example). We consider this problem next.
We shall assume that the arrival times $t_n$ are Poisson points with mean
density $\lambda$ and the service times $c_n$ are i.i.d. with distribution an arbitrary
function $F_c(c)$. In queueing theory this is written in the following form (Kendall)
M|G|∞
The first position in this notation refers to arrivals, the second to service times,
and the third to the number of servers in the system. The letter M (for Markoff
or memoryless) in the first position means Poisson arrivals; in the second
position it means exponentially distributed service times. The letter G (general)
indicates that the arrivals or the service times are arbitrary. The letter D
(deterministic) indicates that they are constant.
THEOREM. The state N(t) of an M|G|∞ system is Poisson distributed
$$P\{N(t) = k\} = e^{-\rho}\frac{\rho^k}{k!} \tag{16-45}$$
with parameter
$$\rho = \lambda E\{c_n\} = \lambda\eta_c \tag{16-46}$$
This parameter is called traffic intensity or offered load.
Proof. Using the point t as the origin (Fig. 16-8), we divide the time axis into
consecutive intervals $I_i = (\alpha_i, \alpha_{i+1})$ of length $\Delta\alpha = \alpha_{i+1} - \alpha_i$. Denoting by $\Delta n_i$
the number of arrivals in the interval $I_i$ and by $\Delta N(t, \alpha_i)$ the contribution to the
state N(t) of the system at time t due to these arrivals, we have
$$N(t) = \sum_i \Delta N(t, \alpha_i) \tag{16-47}$$
If $\Delta\alpha$ is small, then, within probabilities of order higher than $\Delta\alpha$, the RV $\Delta n_i$ takes the
values 0 or 1 and
$$P\{\Delta n_i = 1\} = \lambda\,\Delta\alpha \tag{16-48}$$
FIGURE 16-8
If $\Delta n_i = 0$, then $\Delta N(t, \alpha_i) = 0$; if $\Delta n_i = 1$, then
$$\Delta N(t, \alpha_i) = \begin{cases} 1 & \text{if } c_i > \alpha_i \\ 0 & \text{if } c_i < \alpha_i \end{cases}$$
where $c_i$ is the service time of the single unit that enters the system in the
interval $I_i$. Hence
$$P\{\Delta N(t, \alpha_i) = 1 \mid \Delta n_i = 1\} = P\{c_i > \alpha_i\} = 1 - F_c(\alpha_i) \tag{16-49}$$
Multiplying (16-48) and (16-49), we obtain
$$P\{\Delta N(t, \alpha_i) = 1\} = [1 - F_c(\alpha_i)]\lambda\,\Delta\alpha$$
The RVs $\Delta N(t, \alpha_i)$ take the values 0 or 1 and they are independent because the
RVs $\Delta n_i$ are independent (Poisson points in nonoverlapping intervals). Hence
(see Example 8-8b) the sum in (16-47) tends to a Poisson RV with parameter
$$\lambda\int_0^\infty [1 - F_c(\alpha)]\,d\alpha = \lim_{\Delta\alpha \to 0}\sum_i \lambda\,\Delta\alpha\,[1 - F_c(\alpha_i)]$$
and (16-45) results because the sum in (16-47) equals N(t) and the above
integral equals $E\{c_n\}$ [see (16-36)].
COROLLARY. The traffic intensity $\rho$ equals the average number of units in the
system
$$E\{N(t)\} = \rho = \lambda\eta_c \tag{16-50}$$
Proof. It follows from (16-45) and (5-36).
Example 16-9. (a) (M|M|∞) If $\mathbf{c}$ is exponential with parameter $\mu$, then
$\eta_c = 1/\mu$; hence $\rho = \lambda/\mu$.
(b) (M|D|∞) If $\mathbf{c} = c$ is a constant, then $\eta_c = c$; hence $\rho = \lambda c$.
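The Poisson law (16-45) does not depend on the shape of $F_c(c)$, only on its mean. The sketch below samples the state N(t) of an infinite-server system by looking back from the observation time: a unit that arrived $\alpha$ seconds ago is still present iff its service time exceeds $\alpha$. Uniform service times on (0, 2), the parameter values, the seed, and the function name are illustrative assumptions.

```python
import math
import random

def sample_state(lam, max_service, rng):
    """One sample of N(t) for an M|G|infinity system observed at time t.

    Service times are uniform on (0, max_service), an illustrative choice of
    F_c(c). Looking back from t, only arrivals of age below max_service can
    still be in the system.
    """
    n, age = 0, rng.expovariate(lam)              # age of the most recent arrival
    while age < max_service:
        if rng.uniform(0.0, max_service) > age:   # still in service: c_i > alpha_i
            n += 1
        age += rng.expovariate(lam)
    return n

lam, max_service, trials = 3.0, 2.0, 50_000
rho = lam * max_service / 2.0                     # rho = lam * eta_c
rng = random.Random(2)
states = [sample_state(lam, max_service, rng) for _ in range(trials)]
mean_N = sum(states) / trials
var_N = sum((n - mean_N) ** 2 for n in states) / trials
p0_est = states.count(0) / trials
print(mean_N, var_N, rho)                         # a Poisson RV has mean = variance = rho
print(p0_est, math.exp(-rho))
```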
Single-Server Queue (M|G|1)
We conclude the section with a detailed treatment of the M|G|1 system. In this
system, the arrival times $t_i$ are again Poisson points and the service times $c_i$
are i.i.d. However, there is only one server in the system. The model is
described in Fig. 16-9: Suppose that a unit enters the system at $t = t_i$. Just prior
to its arrival, the system contains $N(t_i^-)$ units; one of these units is being served
and the others are waiting in line. The unit entering at $t_i$ occupies the last
position in the queue (shaded area). Its position in line remains unchanged as
other units arrive; the unit advances when a service is completed. It reaches the
server (position 1 in line when service starts) at $t = \tau_{i-1}$ and it leaves the system
at time $t = \tau_i$:
$$\tau_i = \tau_{i-1} + c_i$$
where $c_i$ is the service time. Denoting by $b_i$ the waiting time and by $a_i$ the
system time, we conclude that
$$b_i = \tau_{i-1} - t_i \qquad a_i = \tau_i - t_i = b_i + c_i$$
This completes the description of the M|G|1 model.
FIGURE 16-9
Our objective is to express the statistics of the various parameters of the
system in terms of the distribution $F_c(c)$ of the service times $c_i$ and the average
density $\lambda$ of the arrival times $t_i$ under the assumption that N(t) is stationary. As
we shall see, the system reaches stationarity only if the mean $\eta_c$ of $c_i$ (mean
service time) is less than the average interarrival time $1/\lambda$.
Note If the state N(t) of a system is a stationary M|G|1 process, then N(-t) is a G|M|1
process obtained from N(t) by interchanging arrivals and departures. This shows that the
properties of a G|M|1 process follow from the properties of the corresponding M|G|1
process.
THE IMBEDDED MARKOFF CHAIN. The ith unit arrives at $t = t_i$ and departs at
$t = \tau_i$, remaining in the system $a_i = \tau_i - t_i$ seconds. During the interval $(t_i, \tau_i)$,
$n_{a_i}$ new units arrive and all other units depart. Hence $n_{a_i}$ equals the
state $N(\tau_i^+)$ of the system at $t = \tau_i^+$. This number is denoted by $q_i$ and is called
the Markoff chain imbedded in the process N(t) (see Sec. 16-4). Thus
$$q_i = n_{a_i} = N(\tau_i^+) \tag{16-51}$$
We wish to relate $q_i$ to $q_{i-1}$. Suppose, first, that $q_{i-1} \ge 1$ (Fig. 16-10a). In
this case, at least one unit is waiting at $t = \tau_{i-1}$; hence $c_i = \tau_i - \tau_{i-1}$ is the ith
service interval. During this interval, $n_{c_i}$ units arrive and none depart. And
since at $t = \tau_i$ the ith unit leaves, we conclude that
$$q_i = q_{i-1} + n_{c_i} - 1 \qquad q_{i-1} \ge 1$$
If $q_{i-1} = 0$, then the ith service starts not at $t = \tau_{i-1}$ (the system is then empty)
but at the time of the arrival of the next unit (Fig. 16-10b). Hence
$$q_i = n_{c_i} \qquad q_{i-1} = 0$$
where $n_{c_i}$ equals the arrivals in the interior of the service interval.
FIGURE 16-10
From the above it follows that $q_i$ satisfies the recursion equation
$$q_i = \bar{q}_{i-1} + n_{c_i} \tag{16-52}$$
where
$$\bar{q}_{i-1} = \begin{cases} q_{i-1} - 1 & q_{i-1} \ge 1 \\ 0 & q_{i-1} = 0 \end{cases} \tag{16-53}$$
Using (16-52), we shall determine the state probabilities $p_k = P\{q_i = k\}$ at $\tau_i^+$.
Po = 1 - p P = ki]c	(16-54)
Proof. Clearly» nQ is the number of Poisson points in the random interval c,;
hence [see (16-30)]
£{nc/) = Atjc = p
With
QO	00
= EkPk= £Ф*
ft-0	ft = l
the mean of it follows from (16-53) that
£(Q/I“ £ (*• “ 1)Л = ~	-Po)
£-1	ft-i
16-2 QUEUEING THEORY 623
And since (stationarity assumption) £{q,) =	(16-52) yields
Vq = -Hq - (1 -Po) + At?c
and (16-54) results.
Proceeding similarly, we can find the moments of $q_i$ (see Prob. 16-11). It is
simpler, however, to determine directly the moment function
$$\Gamma_q(z) = E\{z^{q_i}\} = p_0 + \sum_{k=1}^{\infty} p_k z^k \tag{16-55}$$
of the sequence $q_i$. Using (16-52), we shall show that $\Gamma_q(z)$ can be expressed in
terms of the moment function
$$\Gamma_n(z) = \Phi_c(\lambda z - \lambda)$$
In the above, $n_{c_i}$ is the number of arrivals in the random interval $c_i$, $\Gamma_n(z)$ is the
moment function of $n_{c_i}$, and $\Phi_c(s)$ is the moment function of $c_i$ [see (16-32)].
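THEOREM.
$$\Gamma_q(z) = \frac{p_0(1 - z)}{1 - z/\Phi_c(\lambda z - \lambda)} \tag{16-56}$$
Proof. The moment function of $\bar{q}_i$ equals
$$\Gamma_{\bar{q}}(z) = p_0 + \sum_{k=1}^{\infty} p_k z^{k-1} = p_0 + z^{-1}[\Gamma_q(z) - p_0] \tag{16-57}$$
From the independence of $q_{i-1}$ and $c_i$ it follows that $\bar{q}_{i-1}$ and $n_{c_i}$ are also
independent. Hence [see (16-52)]
$$\Gamma_q(z) = \Gamma_{\bar{q}}(z)\,\Gamma_n(z) \tag{16-58}$$
Inserting (16-57) into (16-58), we obtain
$$z\Gamma_q(z) = [\Gamma_q(z) + p_0 z - p_0]\,\Gamma_n(z) \tag{16-59}$$
and (16-56) results.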
Equation (16-59) can be used to obtain all moments of $q_i$. We shall use it
to determine $\eta_q$. As a preparation, however, we reestablish (16-54): Since
$\Gamma_q(1) = \Gamma_n(1) = 1$, we conclude differentiating (16-59) that
$$1 + \Gamma_q'(1) = \Gamma_q'(1) + p_0 + \Gamma_n'(1)$$
This yields (16-54) because $\Gamma_n'(1) = E\{n_c\} = \lambda\eta_c$ [see (16-30)].
Pollaczek-Khinchin formula We maintain that the mean of $q_i$ equals
$$\eta_q = \rho + \frac{\lambda^2 E\{c_i^2\}}{2(1 - \rho)} \tag{16-60}$$
Proof. Differentiating (16-59) twice and setting z = 1, we obtain
$$2\Gamma_q'(1) = 2[\Gamma_q'(1) + p_0]\,\Gamma_n'(1) + \Gamma_n''(1)$$
But [see (16-33)] $\Gamma_n''(1) = \lambda^2 E\{c_i^2\}$; hence
$$2\eta_q = 2(\eta_q + p_0)\lambda\eta_c + \lambda^2 E\{c_i^2\}$$
and (16-60) results because $\rho = 1 - p_0 = \lambda\eta_c$.
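The Pollaczek-Khinchin formula (16-60) needs only the arrival density and the first two moments of the service time. A minimal sketch follows; the function name `pk_mean_queue` and the two test cases (exponential and constant service with unit mean) are assumptions chosen for illustration.

```python
def pk_mean_queue(lam, mean_c, second_moment_c):
    """Mean of q_i from (16-60); requires rho = lam * mean_c < 1."""
    rho = lam * mean_c
    if rho >= 1.0:
        raise ValueError("stationarity requires lam * mean_c < 1")
    return rho + lam ** 2 * second_moment_c / (2.0 * (1.0 - rho))

lam = 0.5
# Exponential service with mean 1: E{c^2} = 2.
mm1 = pk_mean_queue(lam, 1.0, 2.0)
# Constant service time c = 1: E{c^2} = 1.
md1 = pk_mean_queue(lam, 1.0, 1.0)
print(mm1, md1)
```

With the same mean service time, the constant-service queue holds fewer units on average than the exponential-service queue, since only the second moment $E\{c_i^2\}$ differs.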
Mean system time and waiting time The process $q_i$ equals the number of
arrivals $n_{a_i}$ during the system time $a_i$. Hence [see (16-30)]
$$E\{q_i\} = E\{n_{a_i}\} = \lambda E\{a_i\}$$
and (16-60) yields
$$E\{a_i\} = \eta_c + \frac{\lambda E\{c_i^2\}}{2(1 - \rho)} \tag{16-61}$$
This is the mean system time. Since $b_i = a_i - c_i$, the mean waiting time equals
$$E\{b_i\} = \frac{\lambda E\{c_i^2\}}{2(1 - \rho)} \tag{16-62}$$
Other moments can be found using moment functions. Denoting by $\Phi_a(s)$
the moment function of $a_i$, we conclude from (16-32) that $\Gamma_q(z) = \Phi_a(\lambda z - \lambda)$.
Hence
$$\Phi_a(s) = \Gamma_q\!\left(1 + \frac{s}{\lambda}\right) \tag{16-63}$$
where $\Gamma_q(z)$ is given by (16-56).
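The mean waiting time (16-62) can be compared with a direct simulation of a single-server first-come first-served queue. The sketch below uses the standard Lindley recursion for successive waiting times, which is a simulation device swapped in here and not the derivation used in the text. Exponential service times, the parameter values, the seed, and the function name are illustrative assumptions.

```python
import random

def mean_waiting_time(lam, draw_service, customers=200_000, seed=3):
    """Estimate E{b_i} for a single-server FIFO queue with Poisson arrivals.

    Uses the Lindley recursion b_{n+1} = max(0, b_n + c_n - x_{n+1}), where
    x_{n+1} is the interarrival time (a standard device, assumed here).
    """
    rng = random.Random(seed)
    b, total = 0.0, 0.0
    for _ in range(customers):
        total += b
        c = draw_service(rng)                 # service time of the current unit
        x = rng.expovariate(lam)              # time until the next arrival
        b = max(0.0, b + c - x)               # waiting time of the next unit
    return total / customers

lam, mu = 0.5, 1.0
rho = lam / mu
estimate = mean_waiting_time(lam, lambda rng: rng.expovariate(mu))
formula = lam * (2.0 / mu ** 2) / (2.0 * (1.0 - rho))   # (16-62) with E{c^2} = 2 / mu^2
print(estimate, formula)
```

Passing a different `draw_service` callable, for example one that returns a constant, gives the corresponding check for another service-time law.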
Note As we know (Little's theorem) $E\{N(t)\} = \lambda E\{a_i\}$. Comparing with (16-61) and
(16-60), we obtain
$$E\{N(t)\} = E\{N(\tau_i^+)\} = \lambda\eta_c + \frac{\lambda^2(\eta_c^2 + \sigma_c^2)}{2(1 - \lambda\eta_c)} \tag{16-64}$$
Thus the mean of $N(\tau_i^+)$ equals the mean of N(t) for any t. At the end of this section we
shall prove that the processes N(t) and $N(\tau_i^+)$ have not only equal means but also
identical distributions.
Equation (16-64) shows that, if the mean service time $\eta_c$ is specified, then the
mean E{N(t)} of the number of units in the system is minimum if $\sigma_c = 0$. This is so iff
the service time $c_i$ is constant.
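Example 16-10. (M|M|1) If $F_c(c) = 1 - e^{-\mu c}$, then
$$\Phi_c(s) = \frac{\mu}{\mu - s} \qquad \eta_c = \frac{1}{\mu} \qquad E\{c_i^2\} = \frac{2}{\mu^2}$$
Thus $\rho = \lambda/\mu$, $p_0 = 1 - \lambda/\mu$, and (16-56) yields
$$\Gamma_q(z) = \frac{p_0}{1 - \rho z}$$
This shows that $q_i$ has a geometric distribution with ratio the traffic intensity $\rho$:
$$P\{q_i = k\} = p_0\rho^k \qquad E\{q_i\} = \frac{\rho}{1 - \rho} = \frac{\lambda}{\mu - \lambda}$$
We note, finally, that [see (16-63)]
$$\Phi_a(s) = \frac{\mu p_0}{\mu p_0 - s} \qquad F_a(a) = 1 - e^{-\mu p_0 a}$$
Thus the system time $a_i$ is exponential with parameter $\mu p_0$, and the mean system
time equals
$$E\{a_i\} = \frac{1}{\mu p_0} = \frac{1}{\mu - \lambda}$$
Example 16-11. (M|D|1) If $\mathbf{c} = c$ is a constant, then
$$\Phi_c(s) = e^{sc} \qquad E\{\mathbf{c}\} = c \qquad E\{\mathbf{c}^2\} = c^2$$
$\rho = \lambda c$, $p_0 = 1 - \rho$, and (16-60) yields
$$E\{q_i\} = \frac{\rho(2 - \rho)}{2(1 - \rho)}$$
Finally [see (16-56) and (16-63)]
$$\Gamma_q(z) = \frac{p_0(1 - z)}{1 - z e^{\rho(1 - z)}} \qquad \Phi_a(s) = \frac{p_0 s}{(\lambda + s)e^{-cs} - \lambda}$$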
BUSY PERIOD. The state of a queueing system is characterized by a succession of
idle periods, when N(t) = 0, and busy periods, when there is at least one unit in
the system. We denote by x the length of an idle period and by y the length of a
busy period. Clearly, x is exponentially distributed as in (11-31) because there
are no arrivals during an idle period. Our objective is to determine the
properties of y.
The busy period starts with the arrival of a unit at $t = t_0$ into an empty
system. The unit is served immediately and it departs at $t = \tau_0$. The difference
$c = \tau_0 - t_0$ is its service time. Denoting by $n_c$ the number of arrivals in the first
service interval $(t_0, \tau_0)$, we conclude that
$$N(t_0^+) = 1 \qquad N(\tau_0^+) = n_c$$
The busy period y equals, thus, the interval of time from $t = t_0$ when
$N(t_0^+) = 1$ until the moment $t = t_0 + y$ when $N(t) = N(t_0^+) - 1 = 0$ for the first
time. But the variations of N(t) during that interval do not depend on its initial
value $N(t_0^+)$. Hence, a busy period $y_i$ can be characterized statistically as follows:
It is an RV equal to the time interval from $t = t_i$, when a service period begins,
to $t = t_i + y_i$, when $N(t) = N(t_i^+) - 1$ for the first time.
FIGURE 16-11
From the above it follows that the RVs $y_1, y_2, y_3$ in Fig. 16-11 are
independent and each is a busy period. Furthermore, the total number of these
RVs equals $n_c$ because, at the start of $y_1$, the state of the system equals
$N(\tau_0^+) = n_c$ and at the end of the total busy period y, N(t) equals 0. Hence
$$y = c + y_1 + \cdots + y_{n_c} \tag{16-65}$$
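Mean busy period. Using (16-65), we shall show that
$$\eta_y = \frac{\eta_c}{p_0} \qquad p_0 = 1 - \lambda\eta_c \tag{16-66}$$
Proof. Since $E\{n_c\} = \lambda\eta_c$, it follows from (8-47) that
$$E\left\{\sum_{i=1}^{n_c} y_i\right\} = E\{n_c\}\,E\{y_i\} = \lambda\eta_c\eta_y$$
Inserting into (16-65), we obtain $\eta_y = \eta_c + \lambda\eta_c\eta_y$ and (16-66) results.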
Mean number of units served. The number of units served in a busy period
equals the number of arrivals $n_y$ in the random interval y. From (16-30) and
(16-66) it follows that the mean of this number equals
$$E\{n_y\} = \lambda\eta_y = \frac{\rho}{p_0} \tag{16-67}$$
Moment function. We shall show that the moment function $\Phi_y(s)$ of the busy
period y satisfies the functional equation
$$\Phi_y(s) = \Phi_c[s + \lambda\Phi_y(s) - \lambda] \tag{16-68}$$
Proof. If $\mathbf{c} = c$, then $n_c$ is a Poisson RV with parameter $\lambda c$. And since the RVs
y and $y_i$ have the same statistics, we conclude that (see Prob. 8-13)
$$E\{\exp[s(y_1 + \cdots + y_{n_c})]\} = e^{\lambda c\Phi_y(s) - \lambda c}$$
From this and (16-65) it follows that
$$E\{e^{sy}\} = E\{E\{e^{sy} \mid \mathbf{c}\}\} = E\{e^{[s + \lambda\Phi_y(s) - \lambda]\mathbf{c}}\}$$
and (16-68) results.
Note that (16-68) does not give $\Phi_y(s)$ explicitly. It can, however, be used to
determine the moments of y. For example, its derivative at s = 0 yields
$$\Phi_y'(0) = [1 + \lambda\Phi_y'(0)]\,\Phi_c'(0)$$
in agreement with (16-66).
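Although (16-68) is implicit, it can be solved numerically at any given s. The sketch below uses plain fixed-point iteration started from 0, a numerical choice made here for illustration and intended for real $s \le 0$ with $\lambda\eta_c < 1$, and then recovers the mean busy period from a one-sided difference quotient at s = 0. Exponential service times, the step h, and the function name are assumptions.

```python
def busy_period_mgf(s, lam, phi_c, iterations=500):
    """Solve Phi_y(s) = Phi_c(s + lam * Phi_y(s) - lam) of (16-68) by fixed-point iteration.

    Iteration from 0 is an assumed numerical scheme, meant for real s <= 0
    and lam * eta_c < 1.
    """
    phi_y = 0.0
    for _ in range(iterations):
        phi_y = phi_c(s + lam * phi_y - lam)
    return phi_y

lam, mu = 0.5, 1.0
phi_exp = lambda s: mu / (mu - s)             # exponential service, eta_c = 1 / mu
h = 1e-6
# E{y} = Phi_y'(0), approximated by a one-sided difference since Phi_y(0) = 1.
mean_y = (busy_period_mgf(0.0, lam, phi_exp) - busy_period_mgf(-h, lam, phi_exp)) / h
print(mean_y, (1.0 / mu) / (1.0 - lam / mu))  # compare with eta_c / p_0 from (16-66)
```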
Ergodicity. The process N(t) and the sequences $N(t_i^-)$ and $N(\tau_i^+)$ are ergodic.
This means that statistical averages can be expressed as time averages. The
proof is a consequence of the fact that the state of the system is a succession of
idle and busy periods and its behavior in each period does not depend on what
happens elsewhere in time. The details, however, are omitted. We note the
following consequences:
1. The state of the system $N(t_i^-)$ just before a unit arrives is an ergodic
sequence of RVs; hence the probability that $N(t_i^-) = k$ equals the number of
times this occurs in a large interval T divided by the number of points $t_i$ in
this interval. This number also equals the number of times $N(\tau_i^+)$ equals k
because every time N(t) crosses the line N = k increasing, it crosses it also
decreasing. Hence
$$P\{N(t_i^-) = k\} = P\{N(\tau_i^+) = k\}$$
In other words, the processes $N(t_i^-)$ and $N(\tau_i^+)$ have the same statistics.
2. The expected values $\eta_x = E\{x_i\}$, $\eta_y = E\{y_i\}$ of the idle and busy periods, and
their total number n in the interval $T = T_x + T_y$ are such that
$$\eta_x \simeq \frac{T_x}{n} \qquad \eta_y \simeq \frac{T_y}{n}$$
where $T_x$ and $T_y$ are the total times the system is idle or busy. Furthermore,
$P\{N(t) = 0\} \simeq T_x/T$. This leads to the conclusion that
$$P\{N(t) = 0\} = \frac{\eta_x}{\eta_x + \eta_y} = \frac{1/\lambda}{1/\lambda + \eta_c/p_0} = \frac{p_0}{p_0 + \lambda\eta_c}$$
because $\eta_x = 1/\lambda$ and $\eta_y = \eta_c/p_0$ [see (16-66)]. And since $p_0 = 1 - \lambda\eta_c = P\{N(\tau_i^+) = 0\}$, we conclude that
$$P\{N(t) = 0\} = P\{N(\tau_i^+) = 0\} = p_0 \qquad P\{N(t) > 0\} = \rho$$
State statistics. We shall, finally, show that the process N(t) and the sequence
$q_i = N(\tau_i^+)$ have the same distribution
$$P\{N(t) = k\} = P\{q_i = k\} = p_k \qquad k \ge 0 \tag{16-69}$$
Proof. If we eliminate all idle periods and connect all busy periods together,
contracting the t axis, we obtain a renewal process specified in terms of the
service times $c_i = \tau_i - \tau_{i-1}$. With t an arbitrary constant, we denote by w the
distance from t to the nearest point $\tau_{i-1}$ on its left and by $n_w$ the number of
arrivals in the interval $(\tau_{i-1}, t)$. From (16-37) and (16-32) it follows that
$$\Phi_w(s) = \frac{1}{\eta_c s}[\Phi_c(s) - 1] \qquad \Gamma_{n_w}(z) = \Phi_w(\lambda z - \lambda) = \frac{\Phi_c(\lambda z - \lambda) - 1}{\rho(z - 1)} \tag{16-70}$$
This holds for every t in a busy period, that is, when $N = N(t) \ne 0$. Returning
to the original time (Fig. 16-12), we observe that, if $N \ne 0$, then
$$N = q'_{i-1} + n_w \qquad \text{where} \quad q'_i = \begin{cases} q_i & q_i > 0 \\ 1 & q_i = 0 \end{cases}$$
Since $n_w$ is independent of $q_i$, the above yields
$$E\{z^N | N \ne 0\} = \Gamma_{q'}(z)\Gamma_{n_w}(z)$$
where
$$\Gamma_{q'}(z) = p_0 z + \sum_{k=1}^{\infty} p_k z^k = p_0 z + \Gamma_q(z) - p_0 \tag{16-71}$$
As we know, $P\{N = 0\} = p_0$, $P\{N > 0\} = \rho$. Hence
$$\Gamma_N(z) = E\{z^N\} = E\{z^N | N = 0\}p_0 + E\{z^N | N > 0\}\rho$$
From this and (16-59) it follows, after some manipulations, that $\Gamma_N(z) = \Gamma_q(z)$
and (16-69) results.
FIGURE 16-12

FIGURE 16-13
16-3 SHOT NOISE
Shot noise is the process
$$s(t) = \sum_i h(t - t_i) \tag{16-72}$$
where h(t) is a given function and $t_i$ is a set of Poisson points. The process s(t)
is the output of a system with impulse response h(t) and input a Poisson
impulse train z(t) (Fig. 16-13). If h(t) = U(t), then s(t) is a Poisson process
(Fig. 10-3a). If h(t) is a pulse of width c, then s(t) is the queueing process
M|D|∞ of Example 16-9b. In Sec. 11-2, we determined the second-order
properties of the shot noise. In the following, we evaluate its general statistics.
DENSITY FUNCTION. We start with the determination of the density $f_s(x)$ of
s(t):
$$f_s(x)\,dx = P\{x < s(t) < x + dx\}$$
under the assumption that h(t) is of finite duration
$$h(t) = 0 \quad \text{for } t < 0 \text{ and } t > T \tag{16-73}$$
Denoting by $n_T$ the number of Poisson points in the interval (t − T, t), we have
[see (4-48)]
$$f_s(x) = \sum_{k=0}^{\infty} f_s(x | n_T = k)P\{n_T = k\} \tag{16-74}$$
In the above
$$P\{n_T = k\} = e^{-\lambda T}\frac{(\lambda T)^k}{k!}$$
To find $f_s(x)$, it suffices, therefore, to find the conditional density $f_s(x | n_T = k)$
assuming that there are k points in the interval (t − T, t). The evaluation of
this density is based on the following property of Poisson points [see (3-51)]:
If it is known that there are exactly k points in an interval $(t_1, t_2)$, then
these points have the same statistics as k arbitrary points placed at random in
this interval. In other words, the k points can be assumed to be k independent
RVs uniform in the interval $(t_1, t_2)$.
From the above and (16-73) it follows that the conditional density
$f_s(x | n_T = 1)$ equals the density $g_1(x)$ of the process
$$x_1(t) = h(t - t_1) = h(\tau_1) \qquad \tau_1 = t - t_1$$
where $t_1$ is uniform in the interval (t − T, t) or, equivalently, $\tau_1$ is uniform in
the interval (0, T). The function $g_1(x)$ is independent of t and can be found
with the techniques of Sec. 5-2.
Similarly, $f_s(x | n_T = 2)$ equals the density of the process
$$x_2(t) = h(t - t_1) + h(t - t_2) = h(\tau_1) + h(\tau_2)$$
where $\tau_1$ and $\tau_2$ are two independent RVs uniform in the interval (0, T). Hence
the RVs $h(\tau_1)$ and $h(\tau_2)$ have the same density $g_1(x)$ and they are independent
because they are functions of the independent RVs $\tau_1$ and $\tau_2$. From this it
follows that [see (6-39)] the density $g_2(x)$ of $x_2(t)$ is the convolution
$$g_2(x) = g_1(x) * g_1(x)$$
Reasoning similarly, we conclude that
$$f_s(x | n_T = k) = g_k(x) = g_1(x) * \cdots * g_1(x) \tag{16-75}$$
We note, finally, that if $n_T = 0$, then s(t) = 0; therefore,
$$f_s(x | n_T = 0) = g_0(x) = \delta(x)$$
Inserting into (16-74), we obtain
$$f_s(x) = e^{-\lambda T}\sum_{k=0}^{\infty} \frac{(\lambda T)^k}{k!} g_k(x) \tag{16-76}$$
This formula is useful mainly for “low density" shot noise, that is, when AT
is of the order of 1.
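To see why a few terms of (16-76) suffice at low density, one can list the Poisson weights that multiply the densities $g_k(x)$. The sketch below is only an illustration; the function name poisson_weights, the value λT = 1, and the cutoff k ≤ 3 are assumptions.

```python
import math

def poisson_weights(lam_T, k_max):
    """Weights P{n_T = k} = e^{-lam_T} lam_T^k / k! for k = 0..k_max."""
    return [math.exp(-lam_T) * lam_T**k / math.factorial(k) for k in range(k_max + 1)]

weights = poisson_weights(1.0, 3)   # lam*T = 1, terms k = 0, 1, 2, 3
covered = sum(weights)              # probability mass kept by the truncated series
print(weights, covered)
```

The quantity covered measures how much probability the retained terms account for; the remainder bounds the mass of the neglected terms in the series.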
Example 16-12. Suppose that h(t) is a trapezoid as in Fig. 16-14. In this case
$$P\{x_1(t) \le x\} = P\{\tau_1 > (1.5 - x)T\} = x - 0.5$$
for $0.5 \le x \le 1.5$ and 0 otherwise. Hence $g_1(x)$ is uniform in the interval
(0.5, 1.5), $g_2(x)$ is a triangle in the interval (1, 3), and $g_3(x)$ consists of three
parabola pieces in the interval (1.5, 4.5). Assuming that $\lambda = 1/T$, we have
$$P\{n_T = 0\} = e^{-1} \qquad P\{n_T = 1\} = e^{-1}$$
Inserting into (16-76) and neglecting higher-order terms, we obtain the density
$f_s(x)$ shown in Fig. 16-14.
Moment function. The moment function of s(t) equals
$$\Phi_s(s) = \int_{-\infty}^{\infty} e^{sx}f_s(x)\,dx = \exp\left\{\lambda\int_0^T [e^{sh(\alpha)} - 1]\,d\alpha\right\} \tag{16-77}$$
Proof. This is a special case of (16-80) but can be established directly: As we
have seen, s(t) can be written as a sum
$$s(t) = \sum_{i=1}^{n_T} h(\tau_i) \tag{16-78}$$
where $n_T$ is a Poisson RV with parameter $\lambda T$ and $\tau_i$ are independent RVs
uniform in the interval (0, T). Reasoning as in Prob. 8-13, we obtain (16-77)
because
$$E\{e^{sh(\tau_i)}\} = \frac{1}{T}\int_0^T e^{sh(\alpha)}\,d\alpha \tag{16-79}$$
General Properties
Suppose now that the points $t_i$ are nonuniform with mean density $\lambda(t)$ and that
the function h(t) is arbitrary. We shall show that the second moment function
of the resulting nonhomogeneous shot-noise process s(t) equals
$$\Psi(s) = \ln\Phi(s) = \int_{-\infty}^{\infty} \lambda(\alpha)[e^{sh(t - \alpha)} - 1]\,d\alpha \tag{16-80}$$
Proof. We divide the time axis into consecutive intervals $I_i = (\alpha_i, \alpha_{i+1})$ of length
$\Delta\alpha = \alpha_{i+1} - \alpha_i$ as in Fig. 16-15 and we denote by $\Delta n_i$ the number of points $t_i$
in $I_i$. If $\Delta\alpha$ is sufficiently small, then the contribution $\Delta s_i(t)$ to s(t) due to these
points is given by [see (16-72)]
$$\Delta s_i(t) = h(t - \alpha_i)\,\Delta n_i \tag{16-81}$$
As we know, $\Delta n_i$ is a Poisson RV with parameter
$$\int_{\alpha_i}^{\alpha_i + \Delta\alpha} \lambda(\alpha)\,d\alpha \simeq \lambda(\alpha_i)\,\Delta\alpha$$
Hence the moment function of $\Delta s_i(t)$ equals
$$\Delta\Phi_i(s) = E\{e^{sh(t - \alpha_i)\Delta n_i}\} = \exp\{\lambda(\alpha_i)\,\Delta\alpha\,[e^{sh(t - \alpha_i)} - 1]\} \tag{16-82}$$
Furthermore,
$$s(t) = \sum_i \Delta s_i(t) = \sum_i h(t - \alpha_i)\,\Delta n_i$$
And since the RVs $\Delta n_i$ are independent, we conclude that
$$\Psi(s) = \sum_i \ln\Delta\Phi_i(s) = \sum_i \lambda(\alpha_i)\,\Delta\alpha\,[e^{sh(t - \alpha_i)} - 1] \tag{16-83}$$
As $\Delta\alpha \to 0$, the last sum tends to the integral in (16-80).

FIGURE 16-15
Generalization of Campbell's theorem. The mean $\eta_s$ and the variance $\sigma_s^2$ of the
nonhomogeneous shot-noise process s(t) are given by
$$\eta_s = \int_{-\infty}^{\infty} \lambda(\alpha)h(t - \alpha)\,d\alpha \qquad \sigma_s^2 = \int_{-\infty}^{\infty} \lambda(\alpha)h^2(t - \alpha)\,d\alpha \tag{16-84}$$
Proof. Expanding the exponential in (16-80) into a series and integrating
termwise, we obtain (16-84) because [see (5-73)]
$$\eta_s = \Psi'(0) \qquad \sigma_s^2 = \Psi''(0)$$
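JOINT CHARACTERISTIC FUNCTIONS. The joint second moment function $\Psi$ of
the n RVs
$$s(t_1), \ldots, s(t_n)$$
equals
$$\Psi(s_1, \ldots, s_n) = \int_{-\infty}^{\infty} \lambda(\alpha)(e^{\beta} - 1)\,d\alpha \tag{16-85}$$
where
$$\beta = s_1h(t_1 - \alpha) + \cdots + s_nh(t_n - \alpha)$$
Proof. We assume for simplicity that n = 2. Proceeding as in (16-81), we denote
by $\Delta n_i$ the number of impulses in the interval $I_i = (\alpha_i, \alpha_{i+1})$ of Fig. 16-15, and by
$$\Delta s_i(t_1) \simeq h(t_1 - \alpha_i)\,\Delta n_i \qquad \Delta s_i(t_2) \simeq h(t_2 - \alpha_i)\,\Delta n_i$$
the response of the system at $t = t_1$ and $t = t_2$ respectively due to these
impulses. The joint moment function of the RVs $\Delta s_i(t_1)$ and $\Delta s_i(t_2)$ equals
$$E\{e^{s_1\Delta s_i(t_1) + s_2\Delta s_i(t_2)}\} = E\{e^{[s_1h(t_1 - \alpha_i) + s_2h(t_2 - \alpha_i)]\Delta n_i}\}$$
This is of the same form as the second term in (16-82) if we replace the
exponent $sh(t - \alpha_i)$ by the sum $s_1h(t_1 - \alpha_i) + s_2h(t_2 - \alpha_i)$. Hence, as in
(16-83),
$$\Psi(s_1, s_2) \simeq \sum_i \lambda(\alpha_i)[e^{s_1h(t_1 - \alpha_i) + s_2h(t_2 - \alpha_i)} - 1]\,\Delta\alpha$$
and with $\Delta\alpha \to 0$, (16-85) results.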
Covariance. We shall use (16-85) to determine the autocovariance $C(t_1, t_2)$ of
s(t). As we know [see (page 160)], $C(t_1, t_2)$ is the coefficient of $s_1s_2$ in the series
expansion of $\Psi(s_1, s_2)$ about the origin. Expanding the term $e^{\beta}$ in (16-85), we
conclude that
$$C(t_1, t_2) = \int_{-\infty}^{\infty} \lambda(\alpha)h(t_1 - \alpha)h(t_2 - \alpha)\,d\alpha \tag{16-86}$$
If $\lambda(t) = \lambda$ = constant, then, with $t_1 - t_2 = \tau$, the above yields
$$C(\tau) = \lambda\int_{-\infty}^{\infty} h(\tau + \alpha)h(\alpha)\,d\alpha \tag{16-87}$$
in agreement with (11-50).
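The integrals in (16-84) and (16-86) are easy to evaluate numerically. The sketch below does so with a midpoint rule for a constant density and an exponentially decaying impulse response; the impulse response, the numerical values, and the helper names integrate and h are assumptions chosen for illustration. For this h(t) the integrals have the closed forms λ, λ/2, and (λ/2)e^{−τ}, which gives a simple check.

```python
import math

def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over (lo, hi)."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

def h(t):
    """Assumed impulse response: a decaying exponential starting at t = 0."""
    return math.exp(-t) if t >= 0 else 0.0

lam = 3.0            # constant density of the Poisson points (assumed value)
t, tau = 0.0, 0.8    # observation time and lag (assumed values)
lo = t - 40.0        # h(t - a) is negligible for a < t - 40

mean = integrate(lambda a: lam * h(t - a), lo, t)                    # mean in (16-84)
variance = integrate(lambda a: lam * h(t - a) ** 2, lo, t)           # variance in (16-84)
cov = integrate(lambda a: lam * h(t + tau - a) * h(t - a), lo, t)    # (16-86), t1 = t + tau
print(mean, variance, cov)
```

A time-varying density can be tried by replacing lam with a function of a inside the three integrands.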
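High density and normality. We shall show that if the density $\lambda$ of stationary
shot noise is large compared with the time constants of h(t), then s(t) is
approximately normal. For this purpose, we introduce the normalizations
$$\lambda_0 = \frac{\lambda}{k} \qquad h_0(t) = \sqrt{k}\,h(t) \qquad \beta_0 = \sqrt{k}\,\beta$$
and examine the form of $\Psi(s_1, s_2)$ as $k \to \infty$. Clearly,
$$\lambda(e^{\beta} - 1) = k\lambda_0\left(\frac{\beta_0}{\sqrt{k}} + \frac{\beta_0^2}{2k} + \frac{\beta_0^3}{6k\sqrt{k}} + \cdots\right)$$
Neglecting negative powers of k, we conclude from (16-85) that
$$\Psi(s_1, s_2) \simeq \lambda_0\int_{-\infty}^{\infty}\left(\sqrt{k}\,\beta_0 + \frac{\beta_0^2}{2}\right)d\alpha \tag{16-88}$$
This shows that $\Psi$ is a quadratic function of $s_i$ because $\beta_0$ is linear in $s_i$. Hence
s(t) is nearly normal. This result is exact in the limit as $k \to \infty$ if the linear term
is omitted (centering).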
INTENSITY OF SHOT NOISE. The square
$$I(t) = s^2(t)$$
of the shot noise s(t) is a stationary process with mean
$$\eta_I = E\{s^2(t)\} = \lambda\int_{-\infty}^{\infty} h^2(t)\,dt + \lambda^2\left[\int_{-\infty}^{\infty} h(t)\,dt\right]^2 \tag{16-89}$$
We shall determine its autocorrelation $R_I(\tau)$ under the assumption that
$$H(0) = \int_{-\infty}^{\infty} h(t)\,dt = 0 \tag{16-90}$$
With this assumption, the mean and autocorrelation of s(t) are given by
$$\eta_s = 0 \qquad R_s(\tau) = \lambda\int_{-\infty}^{\infty} h(\tau + \alpha)h(\alpha)\,d\alpha \tag{16-91}$$
We maintain that
$$R_I(\tau) = \lambda^2E^2 + 2R_s^2(\tau) + \lambda\int_{-\infty}^{\infty} h^2(\tau + \alpha)h^2(\alpha)\,d\alpha \tag{16-92}$$
where
$$E = \int_{-\infty}^{\infty} h^2(t)\,dt$$
is the energy of h(t).
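Proof. The second moment function $\Psi(s_1, s_2)$ of the RVs $s(t_1)$ and $s(t_2)$ equals
the integral in (16-85) where
$$\beta = s_1h(t_1 - \alpha) + s_2h(t_2 - \alpha) \qquad \lambda(t) = \lambda$$
As we know [see (7-34)] the coefficient of $s_1^2s_2^2$ in the expansion of the moment
function $\Phi(s_1, s_2)$ equals
$$\frac{E\{s^2(t_1)s^2(t_2)\}}{4}$$
We introduce the functions
$$\gamma_n = \frac{\lambda}{n!}\int_{-\infty}^{\infty} \beta^n\,d\alpha$$
Since
$$e^{\beta} - 1 = \beta + \frac{\beta^2}{2} + \cdots$$
and $\gamma_1 = 0$ [see (16-90)], we conclude that
$$\Psi(s_1, s_2) = \gamma_2 + \gamma_3 + \gamma_4 + \cdots$$
$$\Phi(s_1, s_2) = e^{\Psi} = 1 + \gamma_2 + \gamma_3 + \gamma_4 + \cdots + \frac{(\gamma_2 + \gamma_3 + \gamma_4 + \cdots)^2}{2} + \cdots$$
In this expansion, only the sum $\gamma_4 + \gamma_2^2/2$ will have terms in $s_1^2s_2^2$. Furthermore,
$$\gamma_4 = \cdots + \frac{\lambda}{4!}\binom{4}{2}s_1^2s_2^2\int_{-\infty}^{\infty} h^2(t_1 - \alpha)h^2(t_2 - \alpha)\,d\alpha + \cdots$$
$$\gamma_2^2 = \cdots + \lambda^2\frac{s_1^2s_2^2}{2}\int_{-\infty}^{\infty} h^2(t_1 - \alpha)\,d\alpha\int_{-\infty}^{\infty} h^2(t_2 - \alpha)\,d\alpha + \lambda^2s_1^2s_2^2\left[\int_{-\infty}^{\infty} h(t_1 - \alpha)h(t_2 - \alpha)\,d\alpha\right]^2 + \cdots$$
and with $t_1 = t_2 + \tau$, (16-92) results.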
Power spectrum. The power spectrum $S_s(\omega)$ of s(t) equals [see (16-91)]
$$S_s(\omega) = \lambda|H(\omega)|^2 \tag{16-93}$$
Furthermore,
$$\lambda^2E^2 \leftrightarrow 2\pi\lambda^2E^2\delta(\omega) \qquad 2\pi R_s^2(\tau) \leftrightarrow S_s(\omega) * S_s(\omega)$$
The integral in (16-92) equals $h^2(\tau) * h^2(-\tau)$. And since
$$2\pi h^2(t) \leftrightarrow H(\omega) * H(\omega)$$
we conclude from (16-92) that the power spectrum of the intensity I(t) of s(t)
equals
$$S_I(\omega) = 2\pi\lambda^2E^2\delta(\omega) + \frac{\lambda^2}{\pi}|H(\omega)|^2 * |H(\omega)|^2 + \frac{\lambda}{4\pi^2}|H(\omega) * H(\omega)|^2 \tag{16-94}$$
High density. If $\lambda$ is sufficiently large, then the third term above can be
neglected. In this case, s(t) is nearly normal and the power spectrum of $s^2(t)$
equals the sum of the first two terms in (16-94) [see also (10-68)].
Low density. If $\lambda$ is small, then the first two terms in (16-94) can be
neglected. In this case,
$$I(t) = s^2(t) \simeq \sum_i h^2(t - t_i) \tag{16-95}$$
because the probability that the terms $h(t - t_i)$ and $h(t - t_k)$ have a significant
overlap is negligible. This means that the square of s(t) is approximately a
shot-noise process generated by $h^2(t)$.
16-4 MARKOFF PROCESSES
A Markoff process is a stochastic process whose past has no influence on the
future if its present is specified. This means the following:
If $t_{n-1} < t_n$, then
$$P\{x(t_n) \le x_n | x(t), t \le t_{n-1}\} = P\{x(t_n) \le x_n | x(t_{n-1})\} \tag{16-96}$$
From this it follows that if
$$t_1 < t_2 < \cdots < t_n$$
then
$$P\{x(t_n) \le x_n | x(t_{n-1}), \ldots, x(t_1)\} = P\{x(t_n) \le x_n | x(t_{n-1})\} \tag{16-97}$$
The above definition holds also for discrete-time processes if $x(t_n)$ is
replaced by $x_n$.
In this section, we develop various properties of Markoff processes,
concentrating on three classes: (a) discrete-time, discrete-state; (b) continuous-
time, discrete-state; (c) continuous-time, continuous-state. We start with a brief
discussion of certain general properties phrasing the results in terms of
discrete-time, continuous-state processes.
1. From (16-97) it follows that
$$f(x_n | x_{n-1}, \ldots, x_1) = f(x_n | x_{n-1}) \tag{16-98}$$
Applying the chain rule (8-37) to the above, we obtain
$$f(x_1, \ldots, x_n) = f(x_n | x_{n-1})f(x_{n-1} | x_{n-2}) \cdots f(x_2 | x_1)f(x_1) \tag{16-99}$$
Conversely, if (16-99) is true for all n, then the process $x_n$ is Markoff
because, in this case,
$$f(x_n | x_{n-1}, \ldots, x_1) = \frac{f(x_1, \ldots, x_n)}{f(x_1, \ldots, x_{n-1})} = f(x_n | x_{n-1}) \tag{16-100}$$
2. From (16-98) it follows that
$$E\{x_n | x_{n-1}, \ldots, x_1\} = E\{x_n | x_{n-1}\} \tag{16-101}$$
3. A Markoff process is also Markoff if time is reversed:
$$f(x_n | x_{n+1}, \ldots, x_{n+k}) = f(x_n | x_{n+1}) \tag{16-102}$$
Proof. The left side of (16-102) equals
$$\frac{f(x_n, x_{n+1}, \ldots, x_{n+k})}{f(x_{n+1}, \ldots, x_{n+k})} = \frac{f(x_{n+1} | x_n)f(x_n)}{f(x_{n+1})}$$
And since
$$f(x_{n+1} | x_n)f(x_n) = f(x_n, x_{n+1}) = f(x_n | x_{n+1})f(x_{n+1})$$
(16-102) results.
4. If the present is specified, then the past is independent of the future in
the following sense: If k < m < n, then
$$f(x_n, x_k | x_m) = f(x_n | x_m)f(x_k | x_m) \tag{16-103}$$
Proof. From (16-99) it follows that
$$f(x_n, x_k | x_m) = \frac{f(x_n | x_m)f(x_m | x_k)f(x_k)}{f(x_m)}$$
and (16-103) results.
The above relationship can be used to express conditional densities
involving the past and the future in terms of the conditional (transition)
densities $f(x_k | x_{k+1})$. For example, if k < m < n, then
$$f(x_m | x_n, x_k) = \frac{f(x_m | x_k)f(x_n | x_m)}{f(x_n | x_k)} \tag{16-104}$$
Example 16-13. If $x_n$ satisfies the recursion equation
$$x_{n+1} - a(x_n, n) = v_n \tag{16-105}$$
and $v_n$ is strictly white noise, then $x_n$ is Markoff.
Proof. The process $x_{n+1}$ is determined in terms of $x_n$ and $v_n$; hence it is
independent of $x_k$ for k < n, assuming $\mathbf{x}_n = x_n$.
Special case (generalized random walk). If $x_1 = 0$ and $a(x_n, n) = x_n$, then
$$x_n = x_{n-1} + v_{n-1} = v_1 + v_2 + \cdots + v_{n-1}$$
Thus the sum of independent RVs is a Markoff sequence.
Homogeneous processes. From the chain rule (16-99) it follows that the statistics
of any order of a Markoff process can be determined in terms of the
conditional densities $f(x_n | x_{n-1})$ and the first-order density $f(x_n)$. If the process
$x_n$ is stationary, then the functions $f(x_n)$ and $f(x_n | x_{n-1})$ are invariant to a shift
of the origin. In this case, the statistics of $x_n$ are completely determined in terms
of the second-order density
$$f(x_1, x_2) = f(x_2 | x_1)f(x_1)$$
A Markoff process $x_n$ is called homogeneous if the conditional density
$f(x_n | x_{n-1})$ is invariant to a shift of the origin but the first-order density $f(x_n)$
might depend on n. In general, a homogeneous process is not stationary.
However, in many cases, it tends to a stationary process as $n \to \infty$.
The Chapman-Kolmogoroff equation. The conditional density $f(x_n | x_k)$
can be expressed in terms of $f(x_n | x_m)$ and $f(x_m | x_k)$ for any n > m > k:
$$f(x_n | x_k) = \int_{-\infty}^{\infty} f(x_n | x_m)f(x_m | x_k)\,dx_m \tag{16-106}$$
This follows from (8-39) because $f(x_n | x_m, x_k) = f(x_n | x_m)$.
Discrete-Time Markoff Chains
A discrete-time Markoff chain is a Markoff process $x_n$ having a countable
number of states $a_i$. A Markoff chain is specified in terms of its state probabilities
$$p_i[n] = P\{x_n = a_i\} \tag{16-107}$$
and the transition probabilities
$$\pi_{ij}[n_1, n_2] = P\{x_{n_2} = a_j | x_{n_1} = a_i\} \tag{16-108}$$
As we know [see (7-48)]
$$\sum_j \pi_{ij}[n_1, n_2] = 1 \qquad \sum_i p_i[k]\pi_{ij}[k, n] = p_j[n] \tag{16-109}$$
Furthermore, if $n_1 < n_2 < n_3$, then
$$\pi_{ij}[n_1, n_3] = \sum_r \pi_{ir}[n_1, n_2]\pi_{rj}[n_2, n_3] \tag{16-110}$$
This is the discrete form of the Chapman-Kolmogoroff equation and it follows
readily from (8-40).
HOMOGENEOUS CHAINS. If the process $x_n$ is homogeneous, then the transition
probabilities in (16-108) depend only on the difference $m = n_2 - n_1$. Thus
$$\pi_{ij}[m] = P\{x_{n+m} = a_j | x_n = a_i\} \tag{16-111}$$
Setting $n_2 - n_1 = k$, $n_3 - n_2 = n$ in (16-110), we obtain
$$\pi_{ij}[n + k] = \sum_r \pi_{ir}[k]\pi_{rj}[n] \tag{16-112}$$
For a finite-state Markoff chain, the above can be written in vector form:
$$\Pi[n + k] = \Pi[n]\Pi[k] \tag{16-113}$$
where $\Pi[n]$ is a Markoff matrix with elements $\pi_{ij}[n]$. This yields
$$\Pi[n] = \Pi^n \qquad \text{where} \quad \Pi = \Pi[1] \tag{16-114}$$
is the one-step transition matrix with elements $\pi_{ij} = \pi_{ij}[1]$. The above is the
solution of the first-order recursion equation [see (16-113)]
$$\Pi[n + 1] = \Pi[n]\Pi \tag{16-115}$$
The matrix $\Pi$ is shown schematically in Fig. 16-16. The circles in the
diagram represent the states $a_i$ of the process and the number on each segment,
from $a_i$ to $a_j$, the transition probabilities $\pi_{ij}$. The number on the loop from $a_i$
to $a_i$ equals $\pi_{ii}$. This loop (dashed line) can be omitted because the sum of the
row elements of $\Pi$ (segments leaving a state, including the loop) equals 1 [see
(16-109)].
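The relations (16-113) and (16-114) can be checked on a small example. The sketch below uses a hypothetical two-state chain with exact rational arithmetic; the matrix entries and the helper names matmul and matpow are assumptions made for illustration.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(n)] for i in range(n)]

def matpow(P, n):
    """n-step transition matrix P^n (n >= 0) by repeated multiplication."""
    size = len(P)
    result = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, P)
    return result

# Hypothetical one-step transition matrix; each row sums to 1.
P = [[F(9, 10), F(1, 10)],
     [F(1, 2), F(1, 2)]]

lhs = matpow(P, 5)                          # Pi[n + k] with n = 2, k = 3
rhs = matmul(matpow(P, 2), matpow(P, 3))    # Pi[n] Pi[k]
print(lhs == rhs, [sum(row) for row in lhs])
```

Because the entries are fractions, the comparison is exact, and the row sums confirm that every power of a Markoff matrix is again a Markoff matrix.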
FIGURE 16-16

Writing (16-109) in vector form and using (16-114), we conclude that
$$P[n] = \cdots = P[n - k]\Pi^k = \cdots = P[1]\Pi^{n-1} \tag{16-116}$$
where P[n] is a vector whose elements are the state probabilities $p_i[n]$.
In general, P[n] depends on n. However, if the initial state vector
$$P[1] = P = [p_1, \ldots, p_N] \tag{16-117}$$
is such that P[2] = P, then P[n] = P for all n. In this case, the homogeneous
process $x_n$ is also stationary and its state vector P is the solution of the system
$$P\Pi = P \qquad \sum_i p_i = 1 \tag{16-118}$$
The state probability vector P of a stationary Markoff chain is thus an eigenvector
of its transition matrix $\Pi$ and the corresponding eigenvalue equals 1 [see
(16-109)].
If the initial state P[1] of a homogeneous chain $x_n$ does not equal P, then
$x_n$ is not stationary. In this case, certain of its states might never be reached.
The details of the underlying theory will not be discussed. We note only that, if
$\Pi^n$ tends to a limit as $n \to \infty$, then $x_n$ is asymptotically stationary. This is the
case if all the elements $\pi_{ij}$ of $\Pi$ are strictly positive (Prob. 16-19).
Example 16-14. In the random walk experiment (Sec. 10-1), we place two reflecting
walls at x = 2s and x = −2s as in Fig. 16-17. The resulting motion x(t) between
the two walls generates a homogeneous Markoff chain $x_n = x(nT)$ taking the
values
$$-2s \qquad -s \qquad 0 \qquad s \qquad 2s$$
The resulting transition matrix and its diagram are shown in the figure. In this
case, the system (16-118) yields
$$p_1 = \frac{p_2}{2} \qquad p_2 = p_1 + \frac{p_3}{2} \qquad p_3 = \frac{p_2 + p_4}{2}$$
$$p_4 = p_5 + \frac{p_3}{2} \qquad p_1 + p_2 + p_3 + p_4 + p_5 = 1$$
Solving, we obtain
$$p_1 = p_5 = \tfrac{1}{8} \qquad p_2 = p_3 = p_4 = \tfrac{1}{4}$$

FIGURE 16-17
These are the initial state probabilities that generate a stationary process.
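The system (16-118) for this example can also be solved mechanically. The sketch below builds the five-state transition matrix under the assumption that an interior state moves one step left or right with probability 1/2 and that each wall sends the particle back to its neighbor with probability 1; the solver name stationary is hypothetical, and exact fractions are used so the result can be compared directly with the values above.

```python
from fractions import Fraction as F

def stationary(Pi):
    """Solve P Pi = P with sum(P) = 1 by Gauss-Jordan elimination in exact arithmetic."""
    n = len(Pi)
    # Balance equations for the first n - 1 states; the last one is redundant
    # and is replaced by the normalization condition.
    A = [[Pi[i][j] - F(int(i == j)) for i in range(n)] + [F(0)] for j in range(n - 1)]
    A.append([F(1)] * (n + 1))
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [row[n] for row in A]

half = F(1, 2)
# States ordered as -2s, -s, 0, s, 2s (assumed reflecting-wall transition matrix).
Pi = [[0, 1, 0, 0, 0],
      [half, 0, half, 0, 0],
      [0, half, 0, half, 0],
      [0, 0, half, 0, half],
      [0, 0, 0, 1, 0]]
Pi = [[F(v) for v in row] for row in Pi]

P = stationary(Pi)
print(P)
```

Solving the linear system is used here instead of iterating P[n + 1] = P[n]Π because, for this particular chain, the particle alternates between two groups of states and the iteration need not settle.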
Continuous-Time Markoff Chains
A continuous-time Markoff chain is a Markoff process x(t) consisting of a family
of staircase functions (discrete states) with discontinuities at the random points
$t_n$ (Fig. 16-18a). The values
$$q_n = x(t_n^+) \tag{16-119}$$
of x(t) at these points (Fig. 16-18b) form a discrete-state Markoff sequence
called the Markoff chain imbedded in the process x(t).
A discrete-state stochastic process is called semi-Markoff if it is not
Markoff but the imbedded sequence $q_n$ is a Markoff chain. An example is the
queueing process N(t) of Sec. 16-2.
A Markoff chain x(t) is specified in terms of the underlying point process
$t_n$ and the imbedded Markoff chain $q_n$.
We denote by
$$p_i(t) = P\{x(t) = a_i\} \tag{16-120}$$
the state probabilities of x(t) and by
$$\pi_{ij}(t_1, t_2) = P\{x(t_2) = a_j | x(t_1) = a_i\} \tag{16-121}$$
its transition probabilities. These functions are such that
$$\sum_j \pi_{ij}(t_1, t_2) = 1 \qquad \sum_i p_i(t_1)\pi_{ij}(t_1, t_2) = p_j(t_2) \tag{16-122}$$
and they satisfy the Chapman-Kolmogoroff equation
$$\pi_{ij}(t_1, t_3) = \sum_r \pi_{ir}(t_1, t_2)\pi_{rj}(t_2, t_3) \qquad t_1 < t_2 < t_3 \tag{16-123}$$

FIGURE 16-18
In specific problems the functions $\pi_{ij}(t_1, t_2)$ are not given directly. As we
shall presently see, however, they can be determined in terms of the transition
probability rates defined below. For simplicity, we shall consider only
homogeneous processes.
A Markoff process x(t) is homogeneous if its transition probabilities
depend on the difference $\tau = t_2 - t_1$:
$$\pi_{ij}(\tau) = P\{x(t + \tau) = a_j | x(t) = a_i\} \qquad \tau \ge 0 \tag{16-124}$$
From the above and (16-123) it follows with $\alpha = t_3 - t_2$ that
$$\pi_{ij}(\tau + \alpha) = \sum_r \pi_{ir}(\tau)\pi_{rj}(\alpha) \tag{16-125}$$
This is the Chapman-Kolmogoroff equation for continuous-time Markoff chains
and it can be written in vector form:
$$\Pi(\tau + \alpha) = \Pi(\tau)\Pi(\alpha) \qquad \tau, \alpha \ge 0 \tag{16-126}$$
where $\Pi(\tau)$ is a matrix with elements $\pi_{ij}(\tau)$.
Probability rates. In the discrete-time case, we showed that the matrix $\Pi[n]$
satisfies the recursion equation (16-115) and can be determined in terms of the
one-step transition matrix $\Pi$. We show next that the transition matrix $\Pi(\tau)$ of a
continuous-time chain x(t) satisfies a differential equation and can be determined
in terms of the matrix
$$\Lambda = \Pi'(0^+) \tag{16-127}$$
whose elements $\lambda_{ij} = \pi'_{ij}(0^+)$ are the derivatives from the right of the elements
of $\Pi(\tau)$. These derivatives will be called the transition probability rates of
x(t). Clearly,
$$\sum_j \lambda_{ij} = 0 \qquad \text{because} \quad \sum_j \pi_{ij}(\tau) = 1$$
and since
$$\pi_{ij}(0) = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases} \tag{16-128}$$
we conclude with $\mu_i = -\lambda_{ii}$ that
$$\mu_i = {\sum_j}' \lambda_{ij} \ge 0 \qquad \lambda_{ij} \ge 0 \quad i \ne j \tag{16-129}$$
The prime indicates summation for every $j \ne i$.
In the above, we have assumed that $\pi_{ij}(\tau)$ is differentiable at $\tau = 0^+$. This
is so only if the probability that there is one discontinuity point in the interval
$(t, t + \Delta t)$ is of the order of $\Delta t$:
$$P\{x(t + \Delta t) = a_j | x(t) = a_i\} = \begin{cases} 1 - \mu_i\,\Delta t & i = j \\ \lambda_{ij}\,\Delta t & i \ne j \end{cases} \tag{16-130}$$
The Kolmogoroff equations. Differentiating (16-126) with respect to $\alpha$ and
setting $\alpha = 0$, we obtain
$$\Pi'(\tau) = \Pi(\tau)\Lambda \qquad \Pi(0) = I \tag{16-131}$$
This is a system of linear differential equations with constant coefficients and its
initial condition $\Pi(0)$ is the identity matrix [see (16-128)]. Solving, we obtain
$$\Pi(\tau) = e^{\Lambda\tau} \tag{16-132}$$
We have thus expressed $\Pi(\tau)$ in terms of the transition rate matrix $\Lambda$.
The state probabilities $p_i(t)$ satisfy a similar system: Denoting by P(t) a
vector with elements $p_i(t)$, we conclude from (16-122) that
$$P(t + \tau) = P(t)\Pi(\tau) \tag{16-133}$$
Differentiating with respect to $\tau$ and setting $\tau = 0$, we obtain
$$P'(t) = P(t)\Lambda \tag{16-134}$$
This is a system of N equations of the form
$$p_i'(t) = -\mu_ip_i(t) + {\sum_j}' \lambda_{ji}p_j(t) \tag{16-135}$$
Its formal solution is a vector exponential
$$P(t) = P(0)e^{\Lambda t} \tag{16-136}$$
We have thus expressed P(t) in terms of $\Lambda$ and the initial state probabilities
$p_i(0)$.
If x(t) is stationary, then $p_i(t) = p_i$ = constant. Hence [see (16-135)]
$$\mu_ip_i = {\sum_j}' \lambda_{ji}p_j \qquad \sum_i p_i = 1 \tag{16-137}$$
This is a system expressing the state probabilities of a stationary process in
terms of the transition rates $\lambda_{ij}$.
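Example 16-15. Generalized telegraph signal. Suppose that x(t) takes two values
(Fig. 16-19)
$$a_1 = A \qquad a_2 = -A$$
and
$$P\{x(t + \Delta t) = A | x(t) = A\} = 1 - \mu_1\,\Delta t = \pi_{11}(\Delta t)$$
$$P\{x(t + \Delta t) = -A | x(t) = -A\} = 1 - \mu_2\,\Delta t = \pi_{22}(\Delta t) \tag{16-138}$$
In this case, $\lambda_{12} = \mu_1$, $\lambda_{21} = \mu_2$. Inserting into (16-135), we obtain
$$p_1'(t) + \mu_1p_1(t) = \mu_2p_2(t)$$
And since $p_2(t) = 1 - p_1(t)$, we conclude that
$$p_1(t) = \frac{\mu_2}{\mu_1 + \mu_2}[1 - e^{-(\mu_1 + \mu_2)t}] + p_1(0)e^{-(\mu_1 + \mu_2)t}$$
Note that, as $t \to \infty$,
$$p_1(t) \to \frac{\mu_2}{\mu_1 + \mu_2} = p_1 \qquad p_2(t) \to \frac{\mu_1}{\mu_1 + \mu_2} = p_2$$
The transition probabilities are determined from (16-131)
$$\pi_{11}'(\tau) + \mu_1\pi_{11}(\tau) = \mu_2\pi_{12}(\tau) \qquad \pi_{11}(0) = 1$$
$$\pi_{22}'(\tau) + \mu_2\pi_{22}(\tau) = \mu_1\pi_{21}(\tau) \qquad \pi_{22}(0) = 1$$
where
$$\pi_{12}(\tau) = 1 - \pi_{11}(\tau) \qquad \pi_{21}(\tau) = 1 - \pi_{22}(\tau)$$
The above yields
$$\pi_{11}(\tau) = p_1 + p_2e^{-(\mu_1 + \mu_2)\tau}$$
$$\pi_{22}(\tau) = p_2 + p_1e^{-(\mu_1 + \mu_2)\tau} \tag{16-139}$$

FIGURE 16-19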
Mean and autocorrelation. The process x(t) is asymptotically stationary
with
$$P\{x(t) = a_i\} = p_i \qquad a_1 = A \quad a_2 = -A$$
and
$$P\{x(t + \tau) = a_j, x(t) = a_i\} = p_i\pi_{ij}(\tau) \qquad i, j = 1, 2$$
From this it follows that
$$E\{x(t)\} = \eta = p_1A - p_2A$$
$$R(\tau) = \eta^2 + 4A^2p_1p_2e^{-(\mu_1 + \mu_2)|\tau|}$$
If $\mu_1 = \mu_2 = \lambda$, then $\eta = 0$ and $R(\tau) = A^2e^{-2\lambda|\tau|}$ as in (10-19). In this case
only, the discontinuity points $t_i$ of x(t) are Poisson.
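The closed forms for the two-state chain can be compared with the matrix exponential $\Pi(\tau) = e^{\Lambda\tau}$ of (16-132). The sketch below sums the Taylor series of the exponential directly; the rates, the amplitude, the lag, and the helper name expm are assumptions chosen for illustration.

```python
import math

def expm(M, terms=60):
    """Matrix exponential of a small square matrix by its Taylor series."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes M^k / k!
        term = [[sum(term[i][r] * M[r][j] for r in range(n)) / k for j in range(n)]
                for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

mu1, mu2, A, tau = 1.0, 2.0, 1.0, 0.7       # assumed rates, amplitude, and lag
Lam = [[-mu1, mu1], [mu2, -mu2]]            # rate matrix of the two-state chain
Pi = expm([[x * tau for x in row] for row in Lam])

p1, p2 = mu2 / (mu1 + mu2), mu1 / (mu1 + mu2)
decay = math.exp(-(mu1 + mu2) * tau)
pi11_closed = p1 + p2 * decay               # closed form of pi_11(tau)

a, p = [A, -A], [p1, p2]
R = sum(a[i] * a[j] * p[i] * Pi[i][j] for i in range(2) for j in range(2))
eta = (p1 - p2) * A
R_closed = eta ** 2 + 4 * A ** 2 * p1 * p2 * decay
print(Pi[0][0], pi11_closed, R, R_closed)
```

The autocorrelation R is computed here from its definition as a sum over the joint probabilities $p_i\pi_{ij}(\tau)$, so agreement with R_closed checks the exponential form given above.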
SPECTRA OF STOCHASTIC FM SIGNALS†. We shall determine the power spectrum
of the FM signal
$$w(t) = e^{j\varphi(t)} \qquad \varphi(t) = \int_0^t x(\alpha)\,d\alpha \tag{16-140}$$
(see also Sec. 11-3) where we assume that the instantaneous frequency x(t) is a
stationary Markoff chain as in (16-120). Clearly,
$$R(\tau) = E\{w(\tau)\}$$
We introduce the conditional correlations
$$R_{ik}(\tau) = E\{w(\tau) | x(0) = a_i, x(\tau) = a_k\}\pi_{ik}(\tau) \tag{16-141}$$
where $\pi_{ik}(\tau)$ are the transition probabilities defined in (16-124). Clearly,
$$P\{x(0) = a_i, x(\tau) = a_k\} = p_i\pi_{ik}(\tau)$$
hence
$$R(\tau) = \sum_{i,k} p_iR_{ik}(\tau) \tag{16-142}$$
To determine $R(\tau)$ it suffices, therefore, to find $R_{ik}(\tau)$.
THEOREM. For any $\tau > 0$ and $\nu > 0$:
$$R_{ik}(\tau + \nu) = \sum_m R_{im}(\tau)R_{mk}(\nu) \tag{16-143}$$
†R. Kubo: "A Stochastic Theory of Line-Shape and Relaxation," Scottish Universities Summer
School, D. ter Haar, ed., Plenum Press, New York, 1961. See also A. Papoulis: "Spectra of
Stochastic FM Signals" in Proceedings of Transactions of 9th Prague Conference on Information
Theory, 1982.
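Proof. In the following, the conditions
$$x(0) = a_i \qquad x(\tau) = a_m \qquad x(\tau + \nu) = a_k$$
will be abbreviated as $a_i$, $a_m$, and $a_k$ respectively. Reasoning as in (16-104), we
obtain
$$P\{x(\tau) = a_m | a_i, a_k\} = \frac{\pi_{im}(\tau)\pi_{mk}(\nu)}{\pi_{ik}(\tau + \nu)} \tag{16-144}$$
Furthermore [see (8-43)]
$$E\{w(\tau + \nu) | a_i, a_k\} = \sum_m E\{w(\tau + \nu) | a_i, a_m, a_k\}P\{x(\tau) = a_m | a_i, a_k\} \tag{16-145}$$
If $x(\tau) = a_m$ is specified, then the integrals
$$\int_0^{\tau} x(\alpha)\,d\alpha \qquad \int_{\tau}^{\tau + \nu} x(\alpha)\,d\alpha$$
are conditionally independent. Hence
$$E\{w(\tau + \nu) | a_i, a_m, a_k\} = E\left\{\exp\left[j\int_0^{\tau} x(\alpha)\,d\alpha\right] \Big| a_i, a_m\right\}E\left\{\exp\left[j\int_{\tau}^{\tau + \nu} x(\alpha)\,d\alpha\right] \Big| a_m, a_k\right\}$$
From the stationarity of x(t) it follows that the last term equals
$$E\left\{\exp\left[j\int_0^{\nu} x(\alpha)\,d\alpha\right] \Big| x(0) = a_m, x(\nu) = a_k\right\}$$
Inserting into (16-145) and using (16-144), we obtain (16-143).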
The Kolmogoroff equations. Differentiating (16-143) with respect to $\nu$ and
setting $\nu = 0$, we obtain
$$R_{ik}'(\tau) = \sum_m R_{im}(\tau)R_{mk}'(0^+) \qquad \tau > 0 \tag{16-146}$$
Initial conditions. To determine $R_{ik}(\tau)$ it suffices, therefore, to find the
values of $R_{ik}(\tau)$ and its derivative at $\tau = 0^+$. We maintain that
$$R_{ik}(0^+) = \begin{cases} 1 & i = k \\ 0 & i \ne k \end{cases} \qquad R_{ik}'(0^+) = \begin{cases} ja_i - \mu_i & i = k \\ \lambda_{ik} & i \ne k \end{cases} \tag{16-147}$$
Proof. For small $\tau$,
$$\pi_{ik}(\tau) \simeq \begin{cases} 1 - \mu_i\tau & i = k \\ \lambda_{ik}\tau & i \ne k \end{cases}$$
Neglecting terms of the order of $\tau^2$, we conclude that
$$R_{ii}(\tau) \simeq E\{\exp[jx(0)\tau] | a_i\}\pi_{ii}(\tau) = e^{ja_i\tau}(1 - \mu_i\tau) \simeq 1 + ja_i\tau - \mu_i\tau$$
Furthermore, for $i \ne k$:
$$R_{ik}(\tau) \simeq \pi_{ik}(\tau) \simeq \lambda_{ik}\tau$$
and (16-147) follows.
Combining (16-147) and (16-146), we obtain
$$R_{ik}'(\tau) = (ja_k - \mu_k)R_{ik}(\tau) + {\sum_m}' \lambda_{mk}R_{im}(\tau) \tag{16-148}$$
This yields $R_{ik}(\tau)$. The autocorrelation $R(\tau)$ of w(t) is determined from
(16-142). Since the coefficients of (16-148) are constant, we conclude that the
power spectrum of an FM signal, whose instantaneous frequency is a finite-state
Markoff chain, is rational.
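Example 16-16. Suppose that x(t) is a symmetrical telegraph signal (Fig. 16-20a).
In this case,
$$a_1 = -a_2 = A \qquad \mu_1 = \mu_2 = \lambda$$
and (16-148) yields
$$R_{11}'(\tau) = (jA - \lambda)R_{11}(\tau) + \lambda R_{12}(\tau) \qquad R_{11}(0) = 1$$
$$R_{12}'(\tau) = \lambda R_{11}(\tau) - (jA + \lambda)R_{12}(\tau) \qquad R_{12}(0) = 0$$
Denoting by $S_{ik}^+(s)$ the Laplace transform of $R_{ik}(\tau)$, we conclude from the
above that
$$sS_{11}^+(s) - 1 = (jA - \lambda)S_{11}^+(s) + \lambda S_{12}^+(s)$$
$$sS_{12}^+(s) = \lambda S_{11}^+(s) - (jA + \lambda)S_{12}^+(s)$$
Hence
$$S_{11}^+(s) = \frac{s + jA + \lambda}{D(s)} \qquad S_{12}^+(s) = \frac{\lambda}{D(s)}$$
where
$$D(s) = s^2 + 2\lambda s + A^2$$
Reasoning similarly, we find
$$S_{22}^+(s) = \frac{s - jA + \lambda}{D(s)} \qquad S_{21}^+(s) = \frac{\lambda}{D(s)}$$
And since $p_1 = p_2 = 0.5$, (16-142) yields
$$S^+(s) = \frac{s + 2\lambda}{D(s)} \qquad S(\omega) = 2\,\mathrm{Re}\,S^+(j\omega)$$
In Fig. 16-20b, we plot $S(\omega)$ for $\lambda = A$ and $\lambda = 0.1A$. Note that the discontinuity
points (zero crossings) of x(t) are Poisson distributed and their average density
equals $\lambda$.

FIGURE 16-20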
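The spectrum of this example is simple to evaluate numerically. The sketch below computes $S(\omega) = 2\,\mathrm{Re}\,S^+(j\omega)$ with $S^+(s) = (s + 2\lambda)/(s^2 + 2\lambda s + A^2)$ using complex arithmetic, and compares it with the expression $4\lambda A^2/[(A^2 - \omega^2)^2 + 4\lambda^2\omega^2]$ obtained by taking the real part by hand; that closed form, the function names, and the sample frequencies are additions made for illustration.

```python
def s_plus(s, lam, A):
    """One-sided transform S+(s) = (s + 2*lam) / (s^2 + 2*lam*s + A^2)."""
    return (s + 2 * lam) / (s * s + 2 * lam * s + A * A)

def spectrum(w, lam, A):
    """Power spectrum S(w) = 2 Re S+(jw)."""
    return 2 * s_plus(1j * w, lam, A).real

def spectrum_closed(w, lam, A):
    """The same spectrum after taking the real part by hand."""
    return 4 * lam * A**2 / ((A**2 - w**2) ** 2 + 4 * lam**2 * w**2)

A = 1.0
for lam in (A, 0.1 * A):             # the two switching rates considered in the example
    print(lam, [round(spectrum(w, lam, A), 4) for w in (0.0, 0.5, 1.0, 1.5)])
```

For the smaller switching rate the values concentrate near ω = A, the two instantaneous frequencies of the signal, while for the larger rate the peak is much less pronounced.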
BIRTH PROCESSES. A birth process is a Markoff chain x(t) consisting of a family
of increasing staircase functions (Fig. 16-21). The process x(t) (population size)
takes the values 1, 2, 3, ... and increases by 1 at the discontinuity points $t_i$ (birth
times). From the definition it follows that the transition rates $\lambda_{ij}$ are different
from 0 only if i = j or i = j − 1. Thus
$$-\lambda_{ii} = \mu_i \qquad \lambda_{i(i+1)} = \mu_i \qquad \lambda_{ij} = 0 \text{ otherwise}$$
The above shows that a birth process is specified in terms of the parameters $\mu_i$.
Clearly [see (16-130)]
$$P\{x(t + \Delta t) = n | x(t) = n\} = 1 - \mu_n\,\Delta t \qquad n \ge 1$$
$$P\{x(t + \Delta t) = n | x(t) = n - 1\} = \mu_{n-1}\,\Delta t \qquad n > 1 \tag{16-149}$$
Hence
$$p_1(t + \Delta t) = p_1(t)(1 - \mu_1\,\Delta t)$$
$$p_n(t + \Delta t) = p_n(t)(1 - \mu_n\,\Delta t) + p_{n-1}(t)\mu_{n-1}\,\Delta t \qquad n > 1$$
This yields
$$p_1'(t) = -\mu_1p_1(t)$$
$$p_n'(t) = -\mu_np_n(t) + \mu_{n-1}p_{n-1}(t) \qquad n > 1 \tag{16-150}$$
in agreement with (16-135).
Note. The difference $x(t_2) - x(t_1)$ equals the number of discontinuity points $t_i$ in the
interval $(t_1, t_2)$. This shows that a birth process is completely specified in terms of the
point process $t_i$.
Example 16-17. If the birthrate is proportional to the population size n:
$$\mu_n = nc \tag{16-151}$$
then x(t) is called the simple birth process (the constant c is the birthrate per
person). We shall determine $p_n(t)$ under the (unrealistic!) assumption that (16-151)
holds for every $n \ge 1$ and that x(0) = 1. In this case,
$$p_1(0) = 1 \qquad p_n(0) = 0 \quad n > 1 \tag{16-152}$$
Setting $\mu_n = nc$ in (16-150), we obtain
$$p_1'(t) + cp_1(t) = 0$$
$$p_n'(t) + ncp_n(t) = (n - 1)cp_{n-1}(t) \qquad n > 1 \tag{16-153}$$
The above yields
$$p_1(t) = p_1(0)e^{-ct} = e^{-ct}$$
and with a simple recursion,
$$p_n(t) = e^{-ct}(1 - e^{-ct})^{n-1} \tag{16-154}$$
This function is called the Yule-Furry density. Thus, for a specific t, the RV
x(t) − 1 has a geometric distribution with ratio $(1 - e^{-ct})$. Hence (see Prob. 5-35)
$$E\{x(t)\} = e^{ct} \qquad E\{x^2(t)\} = 2e^{2ct} - e^{ct}$$
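The two moments can be confirmed by summing the Yule-Furry probabilities directly. The sketch below truncates the sums at a large population size; the values of c and t, the truncation point, and the function name yule_furry are assumptions made for illustration.

```python
import math

def yule_furry(n, c, t):
    """p_n(t) = e^{-ct} (1 - e^{-ct})^(n-1) for n >= 1, from (16-154)."""
    q = math.exp(-c * t)
    return q * (1.0 - q) ** (n - 1)

c, t, n_max = 1.0, 1.0, 400          # assumed rate, time, and truncation point
probs = [yule_furry(n, c, t) for n in range(1, n_max + 1)]
total = sum(probs)
mean = sum(n * p for n, p in enumerate(probs, start=1))
second = sum(n * n * p for n, p in enumerate(probs, start=1))
print(total, mean, second)
```

With these values the neglected tail is far below floating-point precision, so total, mean, and second can be compared with 1, $e^{ct}$, and $2e^{2ct} - e^{ct}$.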
Example 16-18. We now assume that the rate of increase of x(t) is independent of
its present state
$$\mu_n = \lambda = \text{constant}$$
As we shall see, the resulting x(t) is a Poisson process. We assume again that
x(0) = 1. Setting $\mu_n = \lambda$ in (16-150), we obtain
$$p_1'(t) + \lambda p_1(t) = 0 \qquad p_1(0) = 1$$
$$p_n'(t) + \lambda p_n(t) = \lambda p_{n-1}(t) \qquad p_n(0) = 0 \quad n > 1$$
This yields
$$p_{n+1}(t) = e^{-\lambda t}\frac{(\lambda t)^n}{n!}$$
The above is the probability that x(t) = n + 1 and it equals the probability that the
number of points x(t) − x(0) in the interval (0, t) equals n.
BIRTH-DEATH PROCESSES. Suppose now that a Markoff chain takes the values
0, 1, 2, ... and its discontinuities equal +1 or −1 (Fig. 16-22). We then say that
x(t) is a birth-death process. In this case, $\lambda_{ij}$ is different from 0 only if i = j or
j − 1 or j + 1. Hence x(t) is specified in terms of the two parameters
$$\alpha_i = \lambda_{i(i+1)} \qquad \beta_i = \lambda_{i(i-1)}$$
Thus $-\lambda_{ii} = \mu_i = \alpha_i + \beta_i$ and
$$P\{x(t + \Delta t) = n | x(t) = n - 1\} = \alpha_{n-1}\,\Delta t$$
$$P\{x(t + \Delta t) = n | x(t) = n\} = 1 - (\alpha_n + \beta_n)\,\Delta t$$
$$P\{x(t + \Delta t) = n | x(t) = n + 1\} = \beta_{n+1}\,\Delta t$$
From (16-135) or, directly from the above, it follows that
$$p_0'(t) + \alpha_0p_0(t) = \beta_1p_1(t)$$
$$p_n'(t) + (\alpha_n + \beta_n)p_n(t) = \alpha_{n-1}p_{n-1}(t) + \beta_{n+1}p_{n+1}(t) \qquad n \ge 1 \tag{16-155}$$
Example 16-19. M|M|1 queue. A queueing process N(t) is not, in general,
Markoff. The M|M|1 case, however, is an exception. In this case, the arrival times
$t_i$ are Poisson with average density $\lambda$ and the service time $c_i$ is an RV with density
$\mu e^{-\mu c}$ where $\mu > \lambda$. We maintain that the resulting N(t) is a birth-death process.
The probability that a unit will arrive in the interval $(t, t + \Delta t)$ equals $\lambda\,\Delta t$
(property $P_1$ of Poisson points). We shall show that the probability that a unit will
depart in the interval $(t, t + \Delta t)$ equals $\mu\,\Delta t$. Indeed, denoting by $\tau_i$ the first
departure point to the right of t and by $c_i$ the corresponding service time, we
conclude that (see Example 7-10)
$$P\{t < \tau_i \le t + \Delta t\} = f_c(0)\,\Delta t = \mu\,\Delta t$$
no matter when this service started.
We have thus shown that
$$P\{N(t + \Delta t) = n | N(t) = n - 1\} \simeq \lambda\,\Delta t$$
$$P\{N(t + \Delta t) = n - 1 | N(t) = n\} \simeq \mu\,\Delta t \tag{16-156}$$
Thus N(t) is a birth-death process with $\alpha_n = \lambda$ and $\beta_n = \mu$. We shall
determine its state probabilities $p_n$ for the stationary case.
In this case, $p_n'(t) = 0$ and (16-155) yields
$$\lambda p_0 = \mu p_1 \qquad (\lambda + \mu)p_n = \lambda p_{n-1} + \mu p_{n+1}$$
From this it follows readily that
$$p_n = p_0\left(\frac{\lambda}{\mu}\right)^n \qquad p_0 = 1 - \frac{\lambda}{\mu}$$
as in Example 16-10.
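A short numerical check of the stationary solution: the geometric probabilities $p_n = (1 - \lambda/\mu)(\lambda/\mu)^n$ should satisfy the balance equations $\lambda p_0 = \mu p_1$ and $(\lambda + \mu)p_n = \lambda p_{n-1} + \mu p_{n+1}$ obtained from (16-155) in the stationary case. The rates, the truncation at n = 60, and the function name mm1_stationary below are assumptions made for illustration.

```python
def mm1_stationary(lam, mu, n_max):
    """Stationary probabilities p_n = (1 - lam/mu) (lam/mu)^n for n = 0..n_max."""
    rho = lam / mu
    return [(1.0 - rho) * rho**n for n in range(n_max + 1)]

lam, mu = 2.0, 5.0                   # assumed arrival and service rates, lam < mu
p = mm1_stationary(lam, mu, 60)

# Residuals of the stationary balance equations.
boundary = lam * p[0] - mu * p[1]
interior = max(abs((lam + mu) * p[n] - lam * p[n - 1] - mu * p[n + 1])
               for n in range(1, 60))
# Mean number in the system; for a geometric law this is rho / (1 - rho).
mean_queue = sum(n * pn for n, pn in enumerate(p))
print(boundary, interior, mean_queue)
```

Both residuals should vanish up to rounding, and the truncated mean approaches ρ/(1 − ρ) with ρ = λ/μ.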
Continuous-State Processes
A continuous-state Markoff process x(t) is specified in terms of its first-order
density
$$p(x, t) = \frac{\partial P(x, t)}{\partial x} \qquad P(x, t) = P\{x(t) \le x\}$$
and the conditional (transition) density
$$\pi(x, x_0; t, t_0) = f_{x(t)}(x | x(t_0) = x_0) \qquad t > t_0$$
These functions are such that, if $t_0 < t_1 < t$, then
$$\int_{-\infty}^{\infty} p(x, t)\,dx = 1 \qquad p(x, t) = \int_{-\infty}^{\infty} p(x_0, t_0)\pi(x, x_0; t, t_0)\,dx_0 \tag{16-157}$$
and
$$\int_{-\infty}^{\infty} \pi(x, x_0; t, t_0)\,dx = 1 \qquad \pi(x, x_0; t, t_0) = \int_{-\infty}^{\infty} \pi(x, x_1; t, t_1)\pi(x_1, x_0; t_1, t_0)\,dx_1 \tag{16-158}$$
as in (16-122) and (16-123).
Furthermore,
$$\pi(x, x_0; t, t_0) \to \delta(x - x_0) \qquad \text{as } t \to t_0 \tag{16-159}$$
We shall show that the function $\pi(x, x_0; t, t_0)$ can be determined in terms
of the slopes $\eta$ and $\sigma^2$ of the conditional mean $a(x_0; t, t_0)$ and the conditional
variance $b(x_0; t, t_0)$ of x(t) assuming $x(t_0) = x_0$, defined as follows:
$$a(x_0; t, t_0) = \int_{-\infty}^{\infty} x\,\pi(x, x_0; t, t_0)\,dx$$
$$b(x_0; t, t_0) = \int_{-\infty}^{\infty} (x - a)^2\pi(x, x_0; t, t_0)\,dx \tag{16-160}$$
We assume that these functions are differentiable from the right and we
denote by $\eta(x_0, t_0)$ and $\sigma^2(x_0, t_0)$ respectively their slopes at $t = t_0$ (Fig. 16-23)
$$\eta(x_0, t_0) = \frac{\partial}{\partial t}a(x_0; t, t_0)\Big|_{t = t_0}$$
$$\sigma^2(x_0, t_0) = \frac{\partial}{\partial t}b(x_0; t, t_0)\Big|_{t = t_0} \tag{16-161}$$
Clearly [see (16-159) and (16-160)]
$$a(x_0; t_0, t_0) = x_0 \qquad b(x_0; t_0, t_0) = 0$$
Hence, for $\Delta t > 0$,
$$a(x_0; t_0 + \Delta t, t_0) \simeq x_0 + \eta(x_0, t_0)\,\Delta t$$
$$b(x_0; t_0 + \Delta t, t_0) \simeq \sigma^2(x_0, t_0)\,\Delta t \tag{16-162}$$
If the process x(t) is homogeneous, then the function $\pi(x, x_0; t, t_0)$
depends on $\tau = t - t_0$. In this case, the slopes $\eta(x_0)$ and $\sigma^2(x_0)$ of a and b are
independent of $t_0$.
From (16-162) and the definition of the functions a and b it follows that
$$E\{d\mathbf{x}(t) | \mathbf{x}(t) = x\} = \eta(x, t)\,dt$$
$$E\{[d\mathbf{x}(t) - \eta(x, t)\,dt]^2 | x\} = \sigma^2(x, t)\,dt \tag{16-163}$$
As we show in the next example, these equations can often be used to
determine $\eta(x, t)$ and $\sigma^2(x, t)$ directly in terms of the specifications of the
process x(t).
Example 16-20. Consider the nonlinear stochastic differential equation
$$\frac{dx(t)}{dt} + \beta(x, t) = \frac{dw(x, t)}{dt}$$
where w(x, t) is a process with independent increments and such that
$$E\{dw(x, t)\} = 0 \qquad E\{[dw(x, t)]^2\} = \gamma(x, t)\,dt$$
The solution of the above equation is a Markoff process [see also (16-105)].
Clearly,
$$E\{dx(t) | x\} = -\beta(x, t)\,dt$$
$$E\{[dx(t) + \beta(x, t)\,dt]^2 | x\} = E\{[dw(x, t)]^2 | x\} = \gamma(x, t)\,dt$$
Hence $\eta(x, t) = -\beta(x, t)$, $\sigma^2(x, t) = \gamma(x, t)$.
THE DIFFUSION EQUATIONS. We shall show that the conditional density tt =
тг(х, x0; t, f0) satisfies the diffusion equations
dir d r	,	1 32 r ,	4 ,
- + -h(x,z)H --^[tr (x.zhj-o
dir	dir	1	32ir
TT + v(xn,tQ)-~ + -o- (x0,t0)—у = 0
3tQ	dxa	2	dx ()
(16-164)
The first is called forward (or Fokker-Planck} and the second backward.
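Proof. If $f(x)$ is a density with mean $\eta$ and "small" variance $\sigma^2$, then [see (5-55)]
$$\int_{-\infty}^{\infty} g(\xi) f(\xi)\,d\xi \simeq g(\eta) + \frac{\sigma^2}{2}\, g''(\eta) \tag{16-165}$$
From (16-158) it follows with $x_1 = \xi$ and $t_1 = t_0 + \varepsilon$ that
$$\pi(x, x_0; t, t_0) = \int_{-\infty}^{\infty} \pi(x, \xi; t, t_0 + \varepsilon)\,\pi(\xi, x_0; t_0 + \varepsilon, t_0)\,d\xi$$
In the above, $\pi(\xi, x_0; t_0 + \varepsilon, t_0)$ is a density in the variable $\xi$ and its mean and
variance equal [see (16-160) and (16-162)]
$$a(x_0; t_0 + \varepsilon, t_0) = x_0 + \varepsilon\,\eta(x_0, t_0) \qquad b(x_0; t_0 + \varepsilon, t_0) = \varepsilon\,\sigma^2(x_0, t_0)$$
Therefore, with
$$g(\xi) = \pi(x, \xi; t, t_0 + \varepsilon) \qquad f(\xi) = \pi(\xi, x_0; t_0 + \varepsilon, t_0)$$
(16-165) yields (Fig. 16-24)
$$\pi(x, x_0; t, t_0) \simeq \pi(x, x_0 + \varepsilon\eta; t, t_0 + \varepsilon) + \frac{\varepsilon\sigma^2}{2}\frac{\partial^2}{\partial x_0^2}\,\pi(x, x_0 + \varepsilon\eta; t, t_0 + \varepsilon)$$
within $O(\varepsilon^2)$. Expanding the right side into a power series in $\varepsilon$ and retaining
only linear terms, we obtain the second equation in (16-164). The proof of the
first is similar.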
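The small-variance approximation (16-165) on which the proof rests can be checked on a case with a closed-form answer. The sketch below is an illustration only: it assumes a normal density $f$ with mean $\eta$ and variance $\sigma^2$ and the test function $g(x) = \cos x$, for which $E\{\cos x\} = \cos\eta\; e^{-\sigma^2/2}$ exactly. The function names are hypothetical.

```python
import math


def exact_mean_cos(eta, sigma2):
    """Exact E{cos(x)} for a normal RV with mean eta and variance sigma2."""
    return math.cos(eta) * math.exp(-sigma2 / 2.0)


def small_variance_approx(g, g_second, eta, sigma2):
    """Right side of (16-165): g(eta) + (sigma^2 / 2) g''(eta)."""
    return g(eta) + 0.5 * sigma2 * g_second(eta)


eta, sigma2 = 1.0, 0.01
exact = exact_mean_cos(eta, sigma2)
approx = small_variance_approx(math.cos, lambda x: -math.cos(x), eta, sigma2)
error = abs(exact - approx)
print(exact, approx, error)
```

For this choice the discrepancy is of order $\sigma^4$, which is why only terms linear in $\varepsilon$ survive once $\sigma^2$ is replaced by $\varepsilon\sigma^2(x_0, t_0)$.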
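COROLLARY. The first-order density $p = p(x, t)$ of the process $x(t)$ satisfies the
Fokker-Planck equation
$$\frac{\partial p}{\partial t} + \frac{\partial}{\partial x}[\eta(x, t)\,p] - \frac{1}{2}\frac{\partial^2}{\partial x^2}[\sigma^2(x, t)\,p] = 0 \tag{16-166}$$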
Proof. It follows if we express the function $p(x, t)$ in terms of the integral in
(16-157) and use (16-164).
Example 16-21. The velocity $v(t)$ of a particle in brownian motion satisfies
Langevin's equation
$$v'(t) + \beta v(t) = w'(t)$$
where $w(t)$ is a process with orthogonal increments and such that
$$E\{dw(t)\} = 0 \qquad E\{[dw(t)]^2\} = \gamma\,dt$$
Hence (see Example 16-20) $v(t)$ is a Markoff process with $\eta(v, t) = -\beta v$, $\sigma^2(v, t)
= \gamma$, and (16-166) yields
$$\frac{\partial p}{\partial t} = \beta\frac{\partial (vp)}{\partial v} + \frac{\gamma}{2}\frac{\partial^2 p}{\partial v^2} \tag{16-167}$$
where $p = p(v, t)$ is the density of $v(t)$.
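A zero-mean normal density with variance $\gamma/2\beta$ makes the right side of (16-167) vanish, so it is a stationary solution. The following sketch checks this by simulation; it is an illustration under stated assumptions, not the author's method. It integrates Langevin's equation with a simple Euler step, assumes normal increments $dw$, and uses the hypothetical values $\beta = 1$, $\gamma = 2$, for which the predicted stationary variance is 1.

```python
import math
import random

BETA = 1.0   # hypothetical friction coefficient
GAMMA = 2.0  # hypothetical noise intensity, E{[dw]^2} = GAMMA * dt


def stationary_variance(dt=0.01, steps=200_000, seed=3):
    """Time-average of v^2 along one Euler path of v' + BETA v = w'."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(GAMMA * dt)
    burn_in = steps // 10  # discard the transient from the start at v = 0
    v, acc = 0.0, 0.0
    for k in range(steps):
        # Assumption: dw is normal with zero mean and variance GAMMA * dt.
        v += -BETA * v * dt + noise_scale * rng.gauss(0.0, 1.0)
        if k >= burn_in:
            acc += v * v
    return acc / (steps - burn_in)


estimated_variance = stationary_variance()
predicted_variance = GAMMA / (2.0 * BETA)
print(estimated_variance, predicted_variance)
```

The time average agrees with $\gamma/2\beta$ only up to sampling error and a small bias of order $\beta\,dt$ introduced by the Euler step.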
SOLUTION OF THE FOKKER-PLANCK EQUATION. We shall solve the forward
equation in (16-164) under the assumption that the conditional density
$\pi(x, x_0; t, t_0)$ does not depend explicitly on the state $x(t_0) = x_0$ of $x(t)$ but only
on the increment $u = x - x_0$. In this case [see (16-160)]
$$a(x_0; t, t_0) = \int_{-\infty}^{\infty} (u + x_0)\,\pi(u; t, t_0)\,du = a_1(t, t_0) + x_0$$
Inserting into the second integral in (16-160), we conclude that the
conditional variance $b = b(t, t_0)$ does not depend on $x_0$. From the above it
follows that the slopes $\eta(t_0)$ and $\sigma^2(t_0)$ of $a$ and $b$ are independent of $x_0$. This
simplifies the form of the forward equation
$$\frac{\partial \pi}{\partial t} + \eta(t)\frac{\partial \pi}{\partial u} - \frac{1}{2}\sigma^2(t)\frac{\partial^2 \pi}{\partial u^2} = 0 \tag{16-168}$$
The solution $\pi(u; t, t_0)$ of (16-168) is a density in the variable $u$. In fact, it
is the density of the increment $x(t) - x(t_0)$ of $x(t)$ under the assumption that
$x(t_0) = x_0$. We shall show that this density is normal with
$$\text{Mean: } \int_{t_0}^{t} \eta(\tau)\,d\tau \qquad \text{Variance: } \int_{t_0}^{t} \sigma^2(\tau)\,d\tau \tag{16-169}$$
Proof. With $\Phi(s, t)$ the bilateral Laplace transform in the variable $u$ of the
function $\pi(u; t, t_0)$, it follows from (16-168) that
$$\frac{\partial \Phi}{\partial t} = -\eta(t)\,s\,\Phi + \frac{\sigma^2(t)\,s^2}{2}\,\Phi \tag{16-170}$$
The function $\Phi(s, t)$, evaluated at $t = t_0$, is the transform of the function
$\pi(u; t_0, t_0) = \delta(u)$ [see (16-159)]. Hence $\Phi(s, t_0) = 1$ and (16-170) yields
$$\ln \Phi(s, t) = -s\int_{t_0}^{t} \eta(\tau)\,d\tau + \frac{s^2}{2}\int_{t_0}^{t} \sigma^2(\tau)\,d\tau \tag{16-171}$$
This shows that $\Phi(s, t)$ is the moment function of a normal density with mean
and variance as in (16-169).
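The mean and variance in (16-169) can be compared with a direct simulation of the increment. The sketch below is illustrative only: it assumes the hypothetical slopes $\eta(t) = \cos t$ and $\sigma^2(t) = 1 + t$ on the interval $(0, 2)$, builds each increment as a sum of independent normal steps, and compares the sample moments with $\int \eta\,d\tau = \sin 2$ and $\int \sigma^2\,d\tau = 4$.

```python
import math
import random


def eta(t):
    # Hypothetical state-independent drift slope.
    return math.cos(t)


def sigma2(t):
    # Hypothetical state-independent variance slope.
    return 1.0 + t


def simulate_increment(rng, t0, t1, steps):
    """One sample of x(t1) - x(t0) built from small independent normal steps."""
    dt = (t1 - t0) / steps
    u = 0.0
    for k in range(steps):
        tm = t0 + (k + 0.5) * dt  # midpoint of the k-th subinterval
        u += eta(tm) * dt + math.sqrt(sigma2(tm) * dt) * rng.gauss(0.0, 1.0)
    return u


T0, T1 = 0.0, 2.0
rng = random.Random(2)
samples = [simulate_increment(rng, T0, T1, 100) for _ in range(5000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((u - sample_mean) ** 2 for u in samples) / len(samples)

predicted_mean = math.sin(T1) - math.sin(T0)        # integral of cos
predicted_var = (T1 - T0) + (T1 ** 2 - T0 ** 2) / 2  # integral of 1 + t
print(sample_mean, predicted_mean, sample_var, predicted_var)
```

Because the slopes do not depend on the state, each step is normal and independent of the others, which is the simulation counterpart of the transform argument above.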
PROBLEMS
16-1. Show that the probability that the Wiener process $w(t)$ does not cross the line $L_a$
in the interval $(0, t)$ equals $2G(a/\sqrt{\alpha t}) - 1$.
16-2. We denote by $P_0(\tau)$ the conditional probability that the number of 0's of a normal
process $x(t)$ in the interval $(t, t + \tau)$, assuming $x(t) = 0$, is odd. Show that if
$E\{x(t)\} = 0$, then
$$\cos \pi P_0(\tau) = -R'(\tau)\left[-R''(0)R(0) + \frac{R''(0)R^2(\tau)}{R(0)}\right]^{-1/2}$$
16-3. Show that if $f_w(w, t)$ is the first-order density of the Wiener process $w(t)$ and
$f_\tau(\tau, a)$ is the density of the first-passage time $\tau_1$ (Fig. 16-4a), then†
$$f_\tau(\tau, a) = \frac{a}{\tau}\, f_w(a, \tau)$$
16-4. Passengers arrive at a terminal boarding the next bus. The times of their arrival
are Poisson with density $\lambda = 1$ per minute. The times of departure of each bus
are Poisson with density $\mu = 2$ per hour. (a) Find the mean number of passengers
in each bus. (b) Find the mean number of passengers in the first bus that leaves
after 9 a.m.
Answer: (a) 30; (b) 60.
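The two answers to Problem 16-4 differ because the departure interval that covers a fixed instant is, on average, longer than a typical interval. The following sketch is a simulation check, not a solution method from the text. The inspection instant `T_STAR` (standing in for 9 a.m.), the number of trials, and the function names are all hypothetical choices.

```python
import random

BUS_RATE = 2.0 / 60.0   # departures per minute (2 per hour)
PASSENGER_RATE = 1.0    # arrivals per minute
T_STAR = 300.0          # hypothetical inspection instant, in minutes


def poisson_count(rng, rate, length):
    """Number of Poisson points of the given rate in an interval of this length."""
    count, t = 0, rng.expovariate(rate)
    while t < length:
        count += 1
        t += rng.expovariate(rate)
    return count


def simulate(trials=5000, seed=1):
    rng = random.Random(seed)
    all_loads, first_after_loads = [], []
    for _ in range(trials):
        prev = 0.0
        while True:
            nxt = prev + rng.expovariate(BUS_RATE)
            # Each bus carries everyone who arrived since the previous departure.
            load = poisson_count(rng, PASSENGER_RATE, nxt - prev)
            all_loads.append(load)
            if nxt > T_STAR:
                # This is the first bus that leaves after T_STAR.
                first_after_loads.append(load)
                break
            prev = nxt
    mean_all = sum(all_loads) / len(all_loads)
    mean_first_after = sum(first_after_loads) / len(first_after_loads)
    return mean_all, mean_first_after


mean_all, mean_first_after = simulate()
print(mean_all, mean_first_after)  # near 30 and 60 respectively
```

Averaging over every bus gives a value near 30, while averaging only over the bus that follows the inspection instant gives a value near 60, in agreement with the stated answers up to sampling error.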
16-5. Passengers arrive at a terminal after 9 a.m. The times of their arrival are Poisson
with mean density $\lambda = 1$ per minute. The time interval from 9 a.m. to the
departure of the next bus is an RV $c$. Find the mean number of passengers in this
bus (a) if $c$ has an exponential density with mean $\eta_c = 30$ min, (b) if $c$ is uniform
between 0 and 60 min.
Answer: (a) 30; (b) 30.
†A. A. Borovkov: "On the First Passage Time...," Theory of Probability and Its Applications, vol. X,
no. 2, 1965.
16-6. The point process $t_i$ is stationary and the RVs $c_i = t_i - t_{i-1}$ are uniform in the
interval $(0, a)$. Show that if $t_1$ is the first point to the right of a fixed point $t_0$, then
$E\{t_1 - t_0\} = a/3$.
16-7. (a) The RVs $c_i$ are i.i.d. and $E\{e^{j\omega c_i}\} = \Phi_c(\omega)$. The process $n(t)$ is Poisson with
parameter $\lambda t$ and independent of $c_i$. Show that, if (Fig. P16-7a)
$$x(t) = \sum_{i=1}^{n(t)} c_i \qquad \text{then} \qquad E\{e^{j\omega x(t)}\} = \exp\{\lambda t[\Phi_c(\omega) - 1]\}$$
Special case: If the RV $c_i$ takes the values 1 and 0 (Fig. P16-7b) and
$P\{c_i = 1\} = p$, then $x(t)$ is a Poisson process with parameter $\lambda p t$ (see also Prob.
8-11).
Hint: $E\{e^{j\omega x(t)} \mid n(t) = n\} = \Phi_c^n(\omega)$.
(b) Using the above, show that, if $t_i$ is a Poisson point process with mean density
$\lambda$ and $\tau_i$ is a process obtained by eliminating at random a subset of $t_i$, then $\tau_i$ is a
Poisson point process with mean density $\lambda p$ where $p$ is the probability that a
point of $t_i$ is not eliminated.
FIGURE P16-7
16-8. (a) Show that if $t_n$ is a Poisson point process, then the process $t_{2n}$ consisting of
every other point of $t_n$ is not Poisson. (b) Show that if $\alpha_n$ and $\beta_n$ are two
independent Poisson point processes with densities $\lambda_\alpha$ and $\lambda_\beta$ respectively, then
the process $t_n$ consisting of all the points of $\alpha_n$ and $\beta_n$ is Poisson with density
$\lambda_\alpha + \lambda_\beta$.
16-9. Visitors enter a park at Poisson times with mean density $\lambda = 2$ per minute. Each
visitor stays in the park $c$ minutes where $c$ is an RV uniform between 30 and 90
min. Find the mean and the variance of the number $N(t)$ of visitors in the park.
16-10. In the M|M|1 queue (Example 16-10), $y$ is the busy period and $\Phi_y(s)$ is its
moment function. Show that
$$\lambda\Phi_y^2(s) - (\lambda + \mu - s)\,\Phi_y(s) + \mu = 0$$
16-11. (a) With <j, as in (16-53), show that
£(ч;} = £{q?) - 2n„ +p	(i)
(b) Prove the Pollaczek-Khinchin formula (16-60), using (i) and the identity
E{q?) - Efqf) + £(<) + 2E(4jE(n,r)
16-12. In a single-server queueing system, the arrival times $t_i$ are Poisson with mean
density $\lambda = 9$ per hour. Find the mean of the following: the service time $c$, the
waiting time $b$, the system time $a$, the idle period $x$, the busy period $y$, and the
number $n_y$ of units served during a busy period. Consider two cases: the density
of the service time $c$ is (a) uniform between 4 and 8 min; (b) it equals $\mu^2 c\,e^{-\mu c}$
where $\mu = 1/3$.
16-13. In an M|M|1 queue, the service time density equals $\mu e^{-\mu c}$. Find the density of
the distance from a fixed point $t_0$ to the next departure point.
Hint: The probability that $t_0$ is a point of a busy period equals $\lambda/\mu$.
16-14. Find the probability $P\{s(t) \le 2\}$ of the shot-noise process
$$s(t) = \sum_i h(t - t_i) \qquad h(t) = 4U(t) - 3U(t - 1) - U(t - 2)$$
where $t_i$ are Poisson points with $\lambda = 2$.
16-15. The shot-noise process $s(t)$ is a train of triangles
$$s(t) = \sum_i h(t - t_i) \qquad h(t) = \begin{cases} 5(2 - |t|) & |t| < 2 \\ 0 & |t| > 2 \end{cases}$$
and the points $t_i$ are Poisson with $\lambda = 0.01$. (a) Find its power spectrum $S_s(\omega)$.
(b) Find its first-order density. (Note that $2\lambda \ll 1$.)
16-16. The points $t_i$ are Poisson with density $\lambda$ and
$$s(t) = \sum_i h(t - t_i) \qquad h(t) = e^{-\alpha t}\,U(t)$$
(a) Find the mean, the variance, and the power spectrum of $s(t)$. (b) Find the
power spectrum of the process $y(t) = s^2(t)$ for $\lambda \gg \alpha$ and for $\lambda \ll \alpha$.
16-17. The RVs $x_n$ are i.i.d. taking the values $+1$ and $-1$ with $P\{x_n = 1\} = 0.6$ and
$P\{x_n = -1\} = 0.4$. Show that the process $y_n = x_n + x_{n-1} + \cdots + x_1$ is a Markoff
chain and find its state probabilities $p_i[n]$ and transition probabilities $\pi_{ij}[m]$.
16-18. Show that if $x(0) = 0$ and $x(t)$ is a process with independent increments, then it is
Markoff.
16-19. Given a two-state Markoff chain x„ taking the values 1 and 0 with state
probability vector P[n] and transition matrix П. Show that, if
Find P[2] and P[3] if Xj = 0.
16-20. Show that, if $x(t)$ is a discrete-state Markoff process taking the values $a_i$ and
$$P\{x(t) = a_i\} = p_i(t) \qquad P\{x(t_2) = a_j \mid x(t_1) = a_i\} = \pi_{ij}(t_1, t_2)$$
then its autocorrelation equals
$$R_x(t_1, t_2) = \sum_{i,j} a_i a_j\, p_i(t_1)\,\pi_{ij}(t_1, t_2)$$
16-21. Show that if тг0(Г|, t2) are the transition probabilities of a Markoff chain x(t) and
P{x(r + Ar) = a,|x(r) = a,) = 1 - д(г) Ar
P{x(t + Ar) = a}:|x(l) = a,} = A,; Ar
then
дттц(1,
—----------= -д, (г)тг,(г,г()) + EM'4*(Mi)
dt	к
dlT.-Xt, tn)
"T.	= t	E'77\i(f’ ri»)^/Jt(^o)
к
16-22. The telegraph signal x(r) of Example 16-15 is stationary with p,2 = 3/i| = 6 and
A = 100. (a) Find its mean tj* and autocorrelation Яд(т). (b) Find the power
spectrum Sw(a>) of the FM signal
w(r) = e/‘₽tf)	<p(r) = Гх(а) da
'o
(c) Show that w(r) satisfies the time-varying stochastic differential equation
w*(r) + jx(r)w(r) = 0	w(0) = 1
Find E{w(r)} and Rw(.l},t2).
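16-23. Show that the distribution function
$$F(x, x_0; t, t_0) = P\{x(t) \le x \mid x(t_0) = x_0\} = \int_{-\infty}^{x} \pi(\xi, x_0; t, t_0)\,d\xi$$
of a Markoff process satisfies the backward diffusion equation
$$\frac{\partial F}{\partial t_0} + \eta(x_0, t_0)\frac{\partial F}{\partial x_0} + \frac{1}{2}\sigma^2(x_0, t_0)\frac{\partial^2 F}{\partial x_0^2} = 0$$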
BIBLIOGRAPHY
Abramson, N. M. (1963): Information Theory and Coding, McGraw-Hill, New York.
Antoniou, A. (1979): Digital Filters: Analysis and Design, McGraw-Hill. New York.
Ash, R. (1965): Information Theory, Interscience, New York.
Bharucha-Reid, A. T. (1960): Elements of the Theory of Markov Processes and Their Applications,
McGraw-Hill, New York.
Blackman, R. B., and J. W. Tukey (1959): The Measurement of Power Spectra, Dover, New York.
Blanc-Lapierre, A. and R. Fortet (1953): Theorie des Fonetions Aleatoires, Masson et Cie, Paris.
Childers, D. G., ed. (1978): Modem Spectrum Analysis, Wiley, New York.
Cooper, R. B. (1981): Introduction to Queuing Theory, North-Holland, New York.
Cramer, H. (1946): Mathematical Methods of Statistics, Princeton University Press, Princeton, NJ.
Davenport, W. B., Jr. and W. L. Root (1958): An Introduction to the Theory of Random Signals and
Noise, McGraw-Hill, New York.
Doob, J. L. (1953): Stochastic Processes, Wiley, New York.
Feinstein, A (1958): Foundations of Information Theory, McGraw-Hill. New York.
Feller, W. (1957 and 1967): An Introduction to Probability Theory and Its Applications, Vols. I and 11,
Wiley, New York.
Franks, L E. (1979): Signal Theory, Prentice-Hall, Englewood Cliffs, NJ.
Gardner, W. A. (1987): Statistical Spectral Analysis: A Non-Probabilistic Theory', Prentice-Hall,
Englewood Cliffs, NJ.
Helstrom, C. W. (1968): Statistical Theory of Signal Detection, 2d ed., Pergamon Press, New York.
Jenkins, G. M. and D. G. Watts (1968): Spectral Analysis and Its Applications, Holden-Day, San
Francisco, CA.
Kleinrock, L. (1975-1976): Queuing Systems, 2 vols., Wiley, New York.
Laning, J. H. and R. H. Battin (1956): Random Processes in Automatic Control, McGraw-Hill, New
York.
Lebedev, V. L.: “Random Processes in Electric and Mechanical Systems.” NSF and NASA
Technical Translations, Washington, DC.
Marple, S. L. (1987): Digital Spectral Analysis, Prentice-Hall, Englewood Cliffs, NJ.
Nahi, N. E. (1969): Estimation Theory and Applications, Wiley, New York.
Oppenheim, A. V. and R. W. Schafer (1975): Digital Signal Processing, Prentice-Hall, Englewood
Cliffe/NJ.
Papoulis, A. (1962): The Fourier Integral and Its Applications, McGraw-Hill. New York.
658
HIBI KMtKAI'IIY 659
Papoulis, A. (1968): Systems and Transforms with Applications in Optus, McGraw-Hill. New York.
Reprinted (1981) by Krieger Publishing Company, Melbourne. FL.
Papoulis, A. (1977): Signal Analysis, McGraw-Hill; New York.
papoulis. A. (1980): Circuits and Systems: A Modem Approach, Holl. Rincharl and Winston. New
York.
Papoulis, A. (1990): Probability and Statistics, Prentice-Hall, Englewood Cliffs. NJ.
Parzen, E. (1960): Modem Probability Theory and Its Applications. Wiley. New York
Priestley, M. (1981): Spectral Analysis and Time Series, 2 vols., Academic, London.
Proakis, J. (1983); Introduction to Digital Communications, McGraw-Hill. New York.
Schwartz, M. (1977): Computer-Communication Network Design and Analysis, Prentice-Hall. Engle-
wood Cliffs, NJ.
-Schwartz, M. and L. Shaw (1975): Signal Processing, McGraw-Hill, New York.
Wainstein, L. A. and V. D. Zubakov (1962); Extraction of Signals from Noise (translated from
Russian), Prentice-Hall, Englewood Cliffs, NJ.
Wiener, N. (1949): Extrapolation, Interpolation, and Smoothing of Stationary Time Series, MIT
Press, Cambridge, MA.
Woodward, P. (1953): Probability and Information Theory with Applications to Radar, Pergamon,
New York.
Yaglom, A. M. (1962); Stationary Random Functions (translated from Russian). Prentice-Hall,
Englewood Cliffs, NJ.
Yaglom, A. M. (1987): Correlation Theory of Stationary and Related Random Functions, 2 vols..
Springer, New York.