/
Author: Aigner M. Ziegler G.M.
Tags: mathematics geometry number theory mathematical analysis combinatorics
ISBN: 3-540-67865-4
Year: 1998
Text
+
M. rtin Aigner. Gunter M. Ziegler
Pr I ofs from TH E BOOK
Second Edition
".
.:w Springer
9
DIVINE PUNCH LINES
From the Reviews of the First Edition:
" ... This is a wonderful book that can be recommended to anybody who is in any way connected
to mathematics. Those who have ever experienced the beauty of mathematics will experience
the chill again. For those who have never experienced that, this book is just the right one to start."
Acta Scientiarum Mathematicarum (Szeged), 1999
"...Inside PFTB is indeed a glimpse of mathematical heaven. where clever insights and beautiful
ideas combine in astonishing and glorious ways. There is vast wealth within its pages, one gem
after another. Some of the proofs are classics, but many are new and brilliant proofs of classical
results.... Aigner and Ziegler do not claim to have presented the definitive collection of great
mathematics. In their brief introduction they write: "We have no definition or characterization of
what constitutes a prooffrom THE BOOK: all we offer is the examples that we have selected,
hoping that our readers will share our enthusiasm about brilliant ideas, clever insights and
wonderful observations." I do. ..."
Notices of the American Mathematica/SoCiety, August 1999
ISBN 3-540-67865-4
111111111111111111111111111111
9 773540 678657
1 .1
20 1
---
World Mdlhem<ltlCaJ Year
http://www.springer.de
Martin Aigner
Gunter M. Ziegler
Proofs from THE BOOK
Second Edition
Libr,\!) of Congre Cataloging-in-Puhlic:ltloll Data applied for
Die Dcubche Bibliothek - ClP-Einhcilsautnahme
!nd. i!1. by K,Jr! H. Hofmann. - 2. cd.-
Hong Kong; London; Milan: P,tri,; Singapore:
Mathematics Subject Classification (20GO): 00-0] (General)
ISBN 3-540-67865-4 Springer-Verlag Berlin Heidelberg New York
ISBN 3-540-63698-6 First edition Springer-Verlag Berlin Heidelberg New York
Springl.>r-Verlag Berlin Heidelherg ","ew York
a member or RertejmannSpringer Scicn<.:e+R\1ines Media GmhH
Heidelberg 1998, 21)(n
Prinled
The u,e of general descriptive namc, regilercd rlame" lrademarh, ell' in thi publi<.:alion doc, not
in lhe absence 1)[ a spe<.:ific li.Ilcment. that such name" are exempl frorll lhe r<:lev'I1n1 prolec-
tive free for general u,c
Berli1}
Printed on acid-free paper
SPIN I072J040 ..J-6nI42K()-543 1 ()
Preface
Paul Erdos liked to talk about The Book, in which God maintains the perfect
proofs for mathematical theorems, following the dictum ofG. H. Hardy that
there is no permanent place for ugly mathematics. Erdos also said that you
need not believe in God but, as a mathematician, you should believe in
The Book. A few years ago, we suggested to him to write up a first (and
very modest) approximation to The Book. He was enthusiastic about the
idea and, characteristically, went to work immediately, filling page after
page with his suggestions. Our book was supposed to appear in March
1998 as a present to Erdos' 85th birthday. With Paul's unfortunate death
in the summer of 1996, he is not listed as a co-author. Instead this book is
dedicated to his memory.
We have no definition or characterization of what constitutes a proof from
The Book: all we offer here is the examples that we have selected, hop-
ing that our readers will share our enthusiasm about brilliant ideas, clever
insights and wonderful observations. We also hope that our readers will
enjoy this despite the imperfections of our exposition. The selection is to a
great extent influenced by Paul Erdos himself. A large number of the topics
were suggested by him, and many of the proofs trace directly back to him,
or were initiated by his supreme insight in asking the right question or in
making the right conjecture. So to a large extent this book reflects the views
of Paul Erdos as to what should be considered a proof from The Book.
A limiting factor for our selection of topics was that everything in this book
is supposed to be accessible to readers whose backgrounds include only
a modest amount of technique from undergraduate mathematics. A little
linear algebra, some basic analysis and number theory, and a healthy dollop
of elementary concepts and reasonings from discrete mathematics should
be sufficient to understand and enjoy everything in this book.
We are extremely grateful to the many people who helped and supported
us with this project - among them the students of a seminar where we
discussed a preliminary version, to Benno Artmann, Stephan Brandt, Stefan
Felsner, Eli Goodman, Torsten Heldmann, and Hans Mielke. We thank
Margrit Barrett, Christian Bressler, Ewgenij Gawrilow, Michael Joswig,
Elke Pose, and Jorg Rambau for their technical help in composing this
book. We are in great debt to Tom Trotter who read the manuscript from
first to last page, to Karl H. Hofmann for his wonderful drawings, and
most of all to the late great Paul Erdos himself.
Berlin, March 1998
Martin Aigner . Gunter M. Ziegler
"#
Paul Erdos
"The Book"
Preface to the Second Edition
The first edition of this book got a wonderful reception. Moreover, we re-
ceived an unusual number of letters containing comments and corrections,
some shortcuts, as well as interesting suggestions for alternative proofs and
new topics to treat. (While we are trying to record peifect proofs, our
exposition isn't.)
The second edition gives us the opportunity to present this new version of
our book: It contains three additional chapters, substantial revisions and
new proofs in several others, as well as minor amendments and improve-
ments, many of them based on the suggestions we received. It also misses
one of the old chapters, about the "problem of the thirteen spheres," whose
proof turned out to need details that we couldn't complete in a way that
would make it brief and elegant.
Thanks to all the readers who wrote and thus helped us - among them
Stephan Brandt, Jiirgen Elstroth, Roger Heath-Brown, Lee L. Keener,
Hanfried Lenz, John Scholes, Bernulf WeiBbach, and man} others. Thanks
again for help and support to Ruth Allewelt and Karl-Friedrich Koch at
Springer Heidelberg, to Christoph Eyrich and Torsten Heldmann in Berlin,
and to Karl H. Hofmann for some superb new drawings.
Berlin, September 2000
Martin Aigner . Gunter M. Ziegler
Table of Contents
Number Theory
1
I. Six proofs of the infinity of primes .............................. 3
2. Bertrand's postulate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7
3. Binomial coefficients are (almost) never powers ................. 13
4. Representing numbers as sums of two squares ........ . . . . . . . . . . . 17
5. Every finite division ring is a field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .23
6. Some irrational numbers ...................................... 27
Geometry
37
7. Hilbert's third problem: decomposing polyhedra. . . . . . . . . . . . . . . . .39
8. Lines in the plane and decompositions of graphs. . . . . . . . . . . . . . . . .47
9. The slope problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .53
10. Three applications of Euler's fonnula .......................... .59
II. Cauchy's rigidity theorem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
12. Touching simplices ........................................... 69
13. Every large point set has an obtuse angle. . . . . . . . . . . . . . . . . . . . . . . .73
14. Borsuk's conjecture................... .... ................. ...79
Analysis
85
IS. Sets, functions, and the continuum hypothesis.............. .....87
16. In praise of inequalities ....................................... 99
17. A theorem of P61ya on polynomials ........................... 107
18. On a lemma of Littlewood and Offord................ ......... lIS
19. Cotangent and the Herglotz trick. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
20. Buffon's needle problem ..................................... 125
VIII
Table of Contents
Combinatorics
129
21. Pigeon-hole and double counting ............................. 131
22. Three famous theorems on finite sets. . . . . . . . . . . . . . . . . . . . . . . . . . 143
23. Lattice paths and determinants. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .149
24. Cayley's formula for the number of trees ...................... 155
25. Completing Latin squares .................................... 161
26. The Dinitz problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Graph Theory
173
27. Five-coloring plane graphs ................................... 175
28. How to guard a museum ..................................... 179
29. Turan's graph theorem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
30. Communicating without errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
31. Of friends and politicians .................................... 199
32. Probability makes counting (sometimes) easy .................. 203
About the Illustrations
Index
212
213
Number Theory
1
Six proofs
of the infinity of primes 3
2
Bertrand's postulate 7
3
Binomial coefficients
are (almost) never powers 13
4
Representing numbers
as sums of two squares 17
5
Every finite division ring
is a field 23
6
Some irrational numbers 27
"1 rrationality and If"
--
Six proofs
of the infinity of primes
It is only natural that we start these notes with probably the oldest Book
Proof, usually attributed to Euclid (Elements IX, 20). It shows that the
sequence of primes does not end.
. Euclid's Proof. For any finite set {PI,' .. , Pr} of primes, consider
the number n = PIP2 . . . Pr + 1. This n has a prime divisor p. But P is
not one of the Pi: otherwise P would be a divisor of n and of the product
PIP2 . . . Pr', and thus also of the difference n - P1P2 . . . Pr = 1, which
is impossible. So a finite set {PI' . . . , Pr} cannot be the collection of all
prime numbers. D
Before we continue let us fix some notation. N = {I, 2, 3, . . . } is the set
of natural numbers, Z = {. .. , -2, -1,0,1,2, . . . } the set of integers, and
IF' = {2, 3, 5, 7, . . . } the set of primes.
In the following, we will exhibit various other proofs (out of a much longer
list) which we hope the reader will like as much as we do. Although they
use different view-points, the following basic idea is common to all of them:
The natural numbers grow beyond all bounds, and every natural number
n 2' 2 has a prime divisor. These two facts taken together force IF' to be
infinite. The next three proofs are folklore, the fifth proof was proposed by
Harry Furstenberg, while the last proof is due to Paul Erdos.
The second and the third proof use special well-known number sequences.
. Second Proof. Let us first look at the Fermat numbers Fn = 2 2n + 1 for
n = 0,1,2, . . .. We will show that any two Fermat numbers are relatively
prime; hence there must be infinitely many primes. To this end, we verify
the recursion
n-l
II Fk = Fn - 2
k=O
from which our assertion follows immediately. Indeed, if m is a divisor of,
say, Fk and Fn (k < n), then Tn divides 2, and hence m = lor 2. But
Tn = 2 is impossible since all Fermat numbers are odd.
To prove the recursion we use induction on n. For n = 1 we have Fo = 3
and F 1 - 2 = 3. With induction we now conclude
(n 2' 1),
n n-l
II Fk = (II Fk)Fn = (Fn - 2)Fn =
k=O k=O
= (2 2n _ 1)(2 2n + 1) = 22n+1 - 1
Fn+l - 2.
Chapter 1
Fo 3
FI 5
F 2 17
F3 257
F4 65537
Fs 641.6700417
The first few Fennat numbers
D
4
Six proofs of the infinity of primes
Lagrange's Theorem
IfG is a finite (multiplicative) group
and U is a subgroup, then IUI
divides IGI.
. Proof. Consider the binary rela-
tion
a b: ba- I E U.
It follows from the group axioms
that is an equivalence relation.
The equivalence class containing an
element a is precisely the coset
Ua={xa:xEU}.
Since clearly IUal = IUI, we find
that G decomposes into equivalence
classes, all of size IUI, and hence
that IUI divides IGI. 0
In the special case when U is a cyclic
subgroup {a,a 2 ,... ,am} we find
that m (the smallest positive inte-
ger such that am = 1, called the
order of a) divides the size IGI of
the group.
1
1 2 n n+l
Steps above the function J(t) = t
· Third Proof. Suppose lP' is finite and P is the largest prime. We consider
the so-called Mersenne number 2 P - 1 and show that any prime factor q
of 2 P - 1 is bigger than p, which will yield the desired conclusion. Let q be
a prime dividing 2 P - 1, so we have 2 P == 1 (mod q). Since P is prime, this
means that the element 2 has order P in the multiplicative group Zq \ {O} of
the field Zq. This group has q - 1 elements. By Lagrange's theorem (see
the box) we know that the order of every element divides the size of the
group, that is, we have P I q - 1, and hence p < q. 0
Now let us look at a proof that uses elementary calculus.
· Fourth Proof. Let If (x) : = # {p ::; x : p E lP'} be the number of primes
that are less than or equal to the real number x. We number the primes
lP' = {PI, P2, P3, . . . } in increasing order. Consider the natural logarithm
log x, defined as log x = K tdt.
Now we compare the area below the graph of f(t) = t with an upper step
function. (See also the appendix on page 10 for this method.) Thus for
n ::; x < n + 1 we have
log x <
1 1 1 1
1+-+-+...+_+_
2 3 n-l n
< L , where the sum extends over all Tn E N which have
Tn I . d '. <
on y pnme IVlsors P _ x.
Since every such Tn can be written in a unique way as a product of the form
n p kp , we see that the last sum is equal to
p'Sx
II (L p 1 k )'
pEP kO
p<x
The inner sum is a geometric series with ratio 1, hence
P
log x <
1
II 1_1
pEP P
p'Sx
7f(X)
= II PkP 1 .
k=l
II p1
pEP
p<x
Now clearly Pk 2 k + 1, and thus
Pk
Pk -1
1 1
1+- < 1+-
Pk - 1 k
k+l
k
and therefore
7f(X) k + 1
log x ::; II ----,;- = If(X) + 1.
k=l
Everybody knows that log x is not bounded, so we conclude that If( x) is
unbounded as well, and so there are infinitely many primes. 0
--
Six proofs of the infinity ofprimes
5
. Fifth Proof. After analysis it's topology now! Consider the following
curious topology on the set Z of integers. For a, b E Z, b > 0, we set
Na,b = {a+nb: n E Z}.
Each set Na,b is a two-way infinite arithmetic progression. Now call a set
o <;;; Z open if either 0 is empty, or if to every a E 0 there exists some
b > 0 with Na,b <;;; O. Clearly, the union of open sets is open again. If
0 1 , O 2 are open, and a E 0 1 n O 2 with Na,b, <;;; 0 1 and N a ,b2 <;;; O 2 ,
then a E N a ,b,b2 <;;; 0 1 n O 2 . So we conclude that any finite intersection
of open sets is again open. So, this family of open sets induces a bona fide
topology on Z.
Let us note two facts:
(A) Any non-empty open set is infinite.
(B) Any set Na,b is closed as well.
Indeed, the first fact follows from the definition. For the second we observe
b-l
Na,b = Z\ U Na+i,b,
i=1
which proves that Na,b is the complement of an open set and hence closed.
So far the primes have not yet entered the picture - but here they come.
Since any number n =11, -1 has a prime divisor p, and hence is contained
in No,p, we conclude
Z\{I, -I} = U No,p.
pEII'
Now if IF' were finite, then UPEII' No,p would be a finite union of closed sets
(by (B)), and hence closed. Consequently, {I, -I} would be an open set,
in violation of (A). 0
. Sixth Proof. Our final proof goes a considerable step further and
demonstrates not only that there are infinitely many primes, but also that
the series LpEII' diverges. The first proof of this important result was
given by Euler (and is interesting in its own right), but our proof, devised
by Erdos, is of compelling beauty.
Let Pl,P2,P3,'" be the sequence of primes in increasing order, and
assume that LpEII' converges. Then there must be a natural number k
such that Li>k+l l < . Let us call PI, . .. , Pk the small primes, and
- p,
Pk+l,Pk+2,... the big primes. For an arbitrary natural number N we
therefore find
N
iIPi
<
N
2
/' "
f {.'Y'
!. e; .
I (\S /
... '-
'"" . ......,.... ___ e_-'
:., . =--/'
\ '
<,
" )
----- -\
"Pitching fiat rocks, infinitely"
(1)
6
Six proa.f5 afthe infinity afprimes
Let N b be the number of positive integers n :S N which are divisible by at
least one big prime, and N s the number of positive integers n :S N which
have only small prime divisors. We are going to show that for a suitable N
N b + N s < N,
which will be our desired contradiction, since by definition N b + N s would
have to be equal to N.
To estimate N b note that L If: J counts the positive integers n < N which
are multiples of Pi. Hence by (I) we obtain
N
Nb:S L l J
i:;>k+l p,
N
< 2
(2)
Let us now look at N". We write every n :S N which has only small prime
divisors in the form n = anb;" where an is the square-free part. Every an
is thus a product of different small primes, and we conclude that there are
precisely 2 k different square-free parts. Furthermore, as b n :S vn :S ,jN,
we find that there are at most ,jN different square parts, and so
N s :S 2kVR.
Since (2) holds for any N, it remains to find a number N with 2k,jN :S 't
or 2k+l < ,jN, and for this N = 22k+2 will do. D
References
[I] B. ARTMANN: Euclid - The Creation of Mathematics, Springer-Verlag, New
York 1999.
[2] P. ERDOS: Uberdie Reihe 2: 1, Mathematica, Zutphen B 7 (\938),1-2.
p
[3] L. EULER: Introductio in Ana/ysin Infinitorum, Tomus Primus, Lausanne
1748; Opera Omnia, Ser. I, Vol. 8.
[4] H. FURSTENBERG: On the infinitude of primes, Amer. Math. Monthly 62
(\ 955), 353.
-
Bertrand's postulate
We have seen that the sequence of prime numbers 2,3,5, 7, . .. is infinite.
To see that the size of its gaps is not bounded, let N := 2 . 3 . 5 . . . . . p
denote the product of all prime numbers that are smaller than k + 2, and
note that none of the k numbers
N + 2, N + 3, N + 4,... , N + k, N + (k + 1)
is prime, since for 2 :::; i :::; k + 1 we know that i has a prime factor that is
smaller than k + 2, and this factor also divides N, and hence also N + i.
With this recipe, we find, for example, for k = 10 that none of the ten
numbers
2312,2313,2314, . .. ,2321
IS pnme.
But there are also upper bounds for the gaps in the sequence of prime num-
bers. A famous bound states that "the gap to the next prime cannot be larger
than the number we start our search at." This is known as Bertrand's pos-
tulate, since it was conjectured and verified empirically for 71 < 3000000
by Joseph Bertrand. It was first proved for all 71 by Pafnuty Chebyshev in
1850. A much simpler proof was given by the Indian genius Ramanujan.
Our Book Proof is by Paul Erdos: it is taken from Erdos' first published
paper, which appeared in 1932, when Erdos was 19.
Bertrand's postulate.
For every 71 2 I, there is some prime number p with 71 < p :::; 271.
· Proof. We will estimate the size of the binomial coefficient (2') care-
fully enough to see that if it didn't have any prime factors in the range
71 < p :::; 271, then it would be "too smal!." Our argument is in five steps.
(1) We first prove Bertrand's postulate for 71 < 4000. For this one does not
need to check 4000 cases: it suffices (this is "Landau's trick") to check that
2,3,5,7,13,23,43,83,163,317,631,1259,2503,4001
is a sequence of prime numbers, where each is smaller than twice the previ-
ous one. Hence every interval {y : 71 < y :::; 2n}, with 71 :::; 4000, contains
one of these 14 primes.
Chapter 2
Joseph Bertrand
Beweis eines Satzes von Tschebyschef
\n:\ P Em,ut> ilJ BlJdaptH
Fur tlen 7I1(;nl 'l'l1 Tl:HUWSCmF b:;wJcsencll Salz, J.aUt
dessen e-s lwis.:hen cwer n:UlIr!ither. laid unct Ihrer 2wc!fachcn
stets w,,;mgtcns cine PW1"11ahl glbt, hcgcn III dcr Lltcratur mehrcre
n('!,\('i"f' WIT. Als c)nfaths!tn kilnn m,1n ohne Zweiff'! den R{'('i<;
he-reichnf:'11 In S(:U!(;'1I1 Wet!.. libn
]927), Band I, S 66-68 bib! Herr
einen besonders BeWeJ5 fur clnen Sa,z uber dH: J\nzahJ
dcr PrH1ll<th1en unlt>f eWer aU5 wckhcm un-
mifteJbar dati fur cin tj cintr n,'lIurllchen
7,!111 IJi!d q-facllt'iJ :-.It'h Prilllld!d FOr (I'e ,wg,'p-
blk'kl!chen Zw"CKen dc!>' Hl'rrn LAOAl es nkht auf die
numensdJe lIer !:n Bl!weis l.Jt.ftn.':enden Konstanlen
an, man abcr dUrch emc numenche VerfoJgunR
des Bc\"clses griJJkr als 2 tlUsfl!lIl
In dt'li ,""trdt' H.:h '''Igef!, lid!! lIlan dUllh
tcn Vcr;,th:irfung tier Uelfl LA'IPAvschm BCWe!l licl{<.'n-
den Jdeen Zl! cHlcm Bwel" des obcn crwahn:erl
schen Satzes gelangcn k1nIJ. dcr w:e mlr sheml [In E!Il-
lachkclt nl..:ht hinteJ dcm Br:wcls "!eht (jrieclu!>( f)('
Bl!ch<;tahrn "1,III'n ill1 dlll(hw('g pl""itivl", Id:t'lili,d,!.,
BU<..1r;,t.loen IId!ur])clJe bezt'ichne!1, die BCZ<'1Chl1ul1j:t p JSI
fur Primzahlen \"'rbehai!cn
I. Va H:nr,:11W1kodlment
th In_
oj
8
Bertrand's postulate
(2) Next we prove that
iL
n f L?
)4JJ _ rA
r1fhf
f1tJ!0.
(It '1 -,;" 2ft if
< qJ7t
tlr;!
n ( 2m IrM2"'ft [ff
1 < 1;nf-
fZwi 1 jtf:rrN 1
r \fll7d
Legendre's theorem
The number n! contains the prime
factor p exactly
L l p J
k2: 1
times.
. Proof. Exact! y l J of the factors
ofn! = 1.2.3... ..n are divisible by
p, which accounts for l J p-factors.
Next, l? J of the factors of n! are
even divisible by p2, which accounts
for the next l? J prime factors p
of n!, etc. 0
II p < 4 x - 1
p<x
for aJ] real x 2: 2,
(I)
where our notation - here and in the following - is meant to imply that
the product is taken over all prime numbers p ::::: x. The proof that we
present for this fact uses induction on the number of these primes. It is
not from Erdos' original paper, but it is also due to Erdos (see the margin),
and it is a true Book Proof. First we note that if q is the largest prime with
q ::::: x, then
IIp = IIp
p<x pSq
and
4 q - 1 < 4 x - 1 .
Thus it suffices to check (1) for the case where x = q is a prime number. For
q = 2 we get "2 ::::: 4," so we proceed to consider odd primes q = 2m + 1.
For these we split the product and compute
II p =
p<2m+l
II p' II p ::::: 4 m Cmm+ 1) ::::: 4 m 2 2m = 4 2m .
p:Sm+l m+l<p:S2m+l
All the pieces of this "one-line computation" are easy to see. In fact,
II p < 4 m
p<m+l
holds by induction. The inequality
II p <
m+l<p:S2m+l
Cm m+ 1)
foJ]ows from the observation that emm H ) = l(t:{;! is an integer, where
the primes that we consider all are factors of the numerator (2m + I)!, but
not of the denominator m!(m + 1)1. Finally
( 2m m + 1 )
< 2 2m
holds since
Cmm+ 1) and C: :11)
are two (equal!) summands that appear in
2m+l
L Cm k + 1) = 22m+l.
k=O
( 2n )
(3) From Legendre's theorem (see the box) we get that n
tains the prime factor p exactly
(2n)!
n!n! con-
(l J - 2l p j)
Bertrand's postulate
9
times. Here each summand is at most 1, since it satisfies
l : J - 2l p : J
< 2n _ 2 ( _ 1 ) = 2,
ph ph
and it is an integer. Furthermore the summands vanish whenever ph > 2n.
Thus (2!) contains p exactly
"" (l J - 2 l - J) < max{r: pr :::; 2n}
I
k>l
times. Hence the largest power of p that divides (2r) is not larger than 2n.
In particular, primes p > y'2n appear at most once in e:).
Furthermore - and this, according to Erdos, is the key fact for his proof
- primes p that satisfy tn < p :::; n do not divide e:) at all! Indeed,
3p > 2n implies (for n 3, and hence p 3) that p and 2p are the only
multiples of p that appear as factors in the numerator of (2\! , while we get
n.n.
two p-factors in the denominator.
(4) Now we are ready to estimate (2,). For n 3, using an estimate from
page 12 for the lower bound, we get
4 n ( 2n )
- :::; :::; II 2n
2n n
p < -/2n
II p
-/2n<p<:::*n
II p
n<p<:::2n
and thus, since there are not more than y'2n primes p :::; y'2n,
4 n :::; (2n)1+-/2n
II
II p
for n > 3.
p
-/2n<p-::;n
n<p<:::2n
(5) Assume now that there is no prime p with n < p :::; 2n, so the second
product in (2) is 1. Substituting (1) into (2) we get
4 n :::; (2n)1+-/2n 4n
or
4kn :::; (2n)1+-/2n,
which is false for n large enough! In fact, using a + 1 < 2 a (which holds
for all a 2, by induction) we get
2n = ()6 < (l J + 1)6 < 26l o/:211J :::; 2 6 0/:211, (4)
and thus for n 50 (and hence 18 < 2y'2n) we obtain from (3) and (4)
2 2n :::; (2n)3(1+-/2n) < 2 0/:211 (18+18-/2n) < 2200/:211-/2n = 220(2n)2/3.
This implies (2n)1/3 < 20, and thus n < 4000.
Examples such as
G) = 2 3 . 52 . 7 . 17. 19 . 23
G:) = 2 3 . 3 3 . 52 . 17. 19.23
(7) = 2 4 .3 2 . 5 . 17. 19. 23 . 29
illustrate that "very small" prime factors
p < v'2Ti can appear as higher powers
in e:), "small" primes with v'2Ti <
p :::; n appear at most once, while
factors in the gap with n < p :::; n
don't appear at all.
(2)
(3)
D
10
Bertrand's postulate
1
1 2
One can extract even more from this type of estimates: from (2) one can
derive with the same methods that
II
1
P 2: 2'j(jn
for n 2: 4000,
n<p<:;2"
and thus that there are at least
( ' ) 1 71 1 n
log2n 230" = > --
30 log:z n + 1 30 log:z n
primes in the range between nand 2n.
This is not that bad an estimate: the "true" number of primes in this range
is roughly 71/ log n. This follows from the "prime number theorem," which
says that the limit
. #{p «::: 71: pis prime}
lun
,,-+x 71/ log n
exists, and equals 1. This famous result was first proved by Hadamard and
de la Vallee-Poussin in 1896; Selberg and Erdos found an elementary proof
(without complex analysis tools, but still long and involved) in 1948.
On the prime number theorem itself the final word, it seems, is still not in:
for example a proof of the Riemann hypothesis (see page 34), one of the
major unsolved open problems in mathematics, would also give a substan-
tial improvement for the estimates of the prime number theorem. But also
for Bertrand's postulate, one could expect dramatic improvements. In fact,
the following is an unsolved problem [3, p. 191:
Is there always a prime between n 2 and (71 + 1)2?
Appendix: Some estimates
Estimating via integrals
There is a very simple-but-effective method of estimating sums by integrals
(as already encountered on page 4). For estimating the harmonic numbers
" 1
H" = L k
A'=l
we draw the figure in the margin and derive from it
n 1 i n 1
H" - 1 = L k < t dt = log n
k=2 . 1
by comparing the area below the graph of f(t) = t (1 «::: t «::: n) with the
area of the dark shaded rectangles, and
1
H --
"
71
£ > i " dt
k .1 t
k=l
logn
71
Bertrand's postulate
II
by comparing with the area of the large rectangles (including the lightly
shaded parts). Taken together, this yields
1
log n + - < Hn < log TI + 1.
n
In particular, lim Hn ---t 00, and the order of growth of Hn is given by
n -+ ex:
lim lHn = 1. But much better estimates are known (see [2]), such as
n-+oc, og n
1 1 1 ( 1 )
H" = logn+,+ _ 2 - _ 12 ., + I2() 4 +0 f '
n n- n n'
where! ::::;; 0.5772 is "Euler's constant."
Estimating factorials - Stirling's formula
The same method applied to
n
log(n!) = log2+log3+...+logn
L log k
A-=2
yields
log((n - I)!) < In logtdt < log(n!),
where the integral is easily computed:
,{'logtdt = [tlogt-t];'
Thus we get a lower estimate on n!
n log n - n + 1.
n! > e" log 1/-,,+]
e(r
and at the same time an upper estimate
n! = TI (n -I)! < ne"log,,-n+l = en()n.
Here a more careful analysis is needed to get the asymptotics of n!, as given
by Stirling'sformula
n! '" v27fn (;)"
And again there are more precise versions available, such as
( n ) " ( 1 1 139 ( 1 ) )
n! = y27fn - 1+-+---+0 - .
e I2n 288n 2 5I40n 3 n 4
Estimating binomial coefficients
Just from the definition of the binomial coefficients () as the number of
k-subsets of an n-set, we know that the sequence (), c:), . .. ,C;) of
binomial coefficients
Here 0 ( 11]6 ) denotes a function [(n)
such that [(n) :::; r holds for some
constant c.
Here [(n) g(n) means that
lim [(n) = 1-
n--too g(n)
CT:;
"I
- i'
;:y I 1
'" , G(A/ 1 1
-::ilI'(j,_", "I"''./' 1 2 1
'- . 1 3 3 1
1 464 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
Pascal's triangle
Bertrand's postulate
n
. sums to () = 2"
k=O
. is symmetric:
( " ) = ( " ) .
k ,,-k
From the functional equation G) = "-;'+1 (kl) one easily finds that for
every n the binomial coefficients G) form a sequence that is symmetric
and unimodal: it increases towards the middle, so that the middle binomial
coefficients are the largest ones in the sequence:
1 = () < G) < ... < ([,,/2J) = ([,,'/21) > .., > (,,:1) > (;:) = 1.
Here Lx J resp. 1:1; l denotes the number :r rounded down resp. rounded up
to the nearest integer.
From the asymptotic formulas for the factorials mentioned above one can
obtain very precise estimates for the sizes of binomial coefficients. How-
ever, we will only need very weak and simple estimates in this book, such
as the following: G) ::::: 2" for all k, while for n 2: 2 we have
( n ) 2"
Ln/2J 2:-:;;,
with equality only for n = 2. In particular, for n 2: 1,
( 2n ) 2: 4" .
n 2n
This holds since ([n/2J)' a middle binomial coefficient, is the largest en-
try in the sequence () + (), G), G), . " , (n: 1)' whose sum is 2", and
whose avera g e is thus 2 n .
n
On the other hand, we note the upper bound for binomial coefficients
( n ) n(n-1)...(n-k+1) n k n k
= <-<-
k k! - k! - 2 k - 1 '
which is a reasonably good estimate for the "small" binomial coefficients
at the tails of the sequence, when n is large (compared to k).
References
[I] P. ERDOS: Beweis eines Satzes von Tschebyschej; Acta Sci. Math. (Szeged) 5
(1930-32), 194-198.
[2] R. L. GRAHAM, D. E. KNUTH & O. PATASHNIK: Concrete Mathematics.
A Foundation for Computer Science, Addison-Wesley, Reading MA 1989.
[3] G. H. HARDY & E. M. WRIGHT: An Introduction to the Theory of Numbers,
fifth edition, Oxford University Press 1979.
----
Binomial coefficients
are (almost) never powers
There is an epilogue to Bertrand's postulate which leads to a beautiful re-
sult on binomial coefficients. In 1892 Sylvester strengthened Bertrand's
postulate in the following way:
ffn 2': 2k, then at least one of the numbers 71, 71 - 1, . .. ,71 - k + 1
has a prime divisor p greater than k.
Note that for 71 = 2k we obtain precisely Bertrand's postulate. In 1934,
Erdos gave a short and elementary Book Proof of Sylvester's result, running
along the lines of his proof of Bertrand's postulate. There is an equivalent
way of stating Sylvester's theorem:
The binomial coefficient
G)
71(71 - 1) ... (71 k + 1)
k!
(71 2': 2k)
always has a prime factor p > k.
With this observation in mind, we turn to another one of Erdos' jewels.
When is () equal to a power m£? It is easy to see that there are infinitely
many solutions for k = g = 2, that is, of the equation G) = m 2 . Indeed,
if G) is a square, then so is ((2n;-1)2). To see this, set 71(71 - 1) = 2m 2 .
It follows that
(271 - 1)2((271 - If - 1) = (271 - 1)2471(71 - 1) = 2(2m(2n - 1))2,
and hence
C 271 ; 1)2) = (2m(271 _ 1))2.
Beginning with (;) = 6 2 we thus obtain infinitely many solutions - the
next one is (29) = 204 2 . However, this does not yield all solutions. For
example, e20) = 35 2 starts another series, as does e 6 2 82 ) = 1189 2 . For
k = 3 it is known that G) = m 2 has the unique solution 71 = 50, m = 140.
But now we are at the end of the line. For k 2': 4 and any g 2': 2 no solutions
exist, and this is what Erdos proved by an ingenious argument.
Theorem. The equation G) = me has no integer solutions with
g 2': 2 and 4 ::::; k ::::; 71 - 4.
Chapter 3
() = 140 2
is the only solution for k = 3, f! = 2
14
Binomial coefficients are (almost) never powers
. Proof. Note first that we may assume n ::>: 2k because of G) = (,,k)'
Suppose the theorem is false, and that C) = mE. The proof, by contra-
diction, proceeds in the following four steps.
(1) By Sylvester's theorem, there is a prime factor p of (Z) greater than k,
hence p' divides n(n - 1) . .. (n - k + 1). Clearly, only one of the factors
n - i can be a multiple of p (because of p > k), and we conclude 1/ In - i,
and therefore
n ::>: 1/ > k( ::>: k 2 .
(2) Consider any factor n - j of the numerator and write it in the fonn
n - j = ajTn}, where aj is not divisible by any nontrivial f-th power. We
note by (1) that aj has only prime divisors less than or equal to k. We want
to show next that ai i- aj for i i- j. Assume to the contrary that ai = aj
for some i < j. Then Tn; ::>: Tnj + 1 and
k > (n - i) - (n - j) = aj(Tnf - Tn)) ::>: aj((Tnj + 1)£ - Tn))
> ajbnj-l::>: f(ajm))1/2 ::>: {i(n - k + 1)1/2
> f( + 1)1/2 > nl/2,
which contradicts n > k'2 from above.
(3) Next we prove that the ai's are the integers 1,2, . ., ,k in some order.
(According to Erdos, this is the crux of the proof.) Since we already know
that they are all distinct, it suffices to prove that
aOal ... ak-j divides k!.
Substituting n - j = ajTn) into the equation C) = rr/, we obtain
aoal ... ak-l (mOrnl . .. Tnk-d( = k!m/.
Cancelling the common factors of mo' . . mk-l and Tn yields
aoaj"'ak-l'U( = k'!v'
with gcd(u, v) = 1. It remains to show that v = 1. If not, then v con-
tains a prime divisor p. Since gcd( u, v) = 1, p must be a prime divisor
of aOal . . . ak-1 and hence is less than or equal to k. By the theorem of
Legendre (see page 8) we know that k! contains p to the power i2:1l ff. J.
We now estimate the exponent ofp in n(n - 1) ... (n - k + 1). Let i be a
positive integer, and let b l < b 2 < '" < b s be the multiples of pi among
n, n - 1,... ,n - k + 1. Then b s = b] + (s - l)pi and hence
(s - l)pi = b s - b] ::::: n - (n - k + 1) = k - 1,
which implies
s < l \l J+1 < l : J+1.
--
Binomial coefficients are (almost) never powers
15
So for each i the number of multiples of pi among 71,.. . ,n-k+l, and
hence among the iLj'S, is bounded by L J + 1. This implies that the expo-
nent of pin iLOa] . . . a"-1 is at most
1-]
L(l k; J+1)
i=1 P
with the reasoning that we used for Legendre's theorem in Chapter 2. The
only difference is that this time the sum stops at i = f - 1, since the a/s
contain no f-th powers.
Taking both counts together, we find that the exponent of pin v€ is at most
(-] k k
L (l-;J + 1) - L l-d < f - 1,
i=1 P i:::>1 P
and we have our desired contradiction, since v( is an f-th power.
This suffices already to settle the case f = 2. Indeed, since k 4 one of
the ai's must be equal to 4, but the ai's contain no squares. So let us now
assume that f 3.
(4) Since k 4, we must have ai, = 1, a/ 2 = 2, iLi;j = 4 for some i], i 2 , i 3 ,
that is,
( ' 2 € . 4 f
n-z] =rn 1 , n-z2= rn 2 , n-Z;j= 771 3 ,
We claim that (n --- i2? i- (n -h)(n - i: 3 ). If not, put b = n - i 2 and
71 --- i] = b - J:, 71 - i;; = b +;tj, where 0< 1.1:1, Iyl < k. Hence
b 2 = (b - x)(b + 11) or (y - :1:)b = xy,
where x = 11 is plainly impossible. Now we have by part (1)
I:ryl = bly - :rl b > n - k > (k --- 1)2 1:1:111,
which is absurd.
So we have 711 i- m'17113, where we assume m > 7111Tn3 (the other case
being analogous), and proceed to our last chains of inequalities. We obtain
2(k: --- 1)71 > 71 2 - (71 - k + 1)2 > (n --- i 2 )2 - (71 --- id(n - i 3 )
4[rn( - (711]711;;/] > 4[(mlm3 + 1)1 --- (m]711;3)f]
> 4 1i 1-1 1-]
711] rn;3 .
Since f 3 and 71 > k€ k 3 > 6k, this yields
2(k --- 1)71711] Tn3
4f711i711 = f(n - id(n - i;3)
,) n ') )
f(n - k + 1)- > 3(71 - -)- > 271<
6
>
>
We see that our analysis so far agrees
with C) = 140 2 , as
2
50 = 2 . 5
49 = 1. 7 2
48 = 3.4 2
and 5 . 7 . 4 = 140.
16
Binomial coefficients are (almost) never powers
Now since Tni ::::: nIl! ::::: n I/3 we finally obtain
2 1 3
kn 2: krnITn3 > (k - 1)TnITn3 > n,
or k 3 > n. With this contradiction, the proof is complete.
D
References
[I] P. ERDOS: A theorem of Sylvester and Schur, J. London Math. Soc. 9 (1934),
282-288.
[2] P. ERDOS: On a diuphantine equation, J. London Math. Soc. 26 (1951),
176-178.
[3] J. J. SYLVESTER: On arithmetical series, Messenger of Math. 21 (1892), 1-19,
87-120; Collected Mathematical Papers Vol. 4, 1912,687-731.
Representing numbers
as sums of two squares
Which numbers can be written as sums of two squares?
This question is as old as number theory, and its solution is a classic in the
field. The "hard" part of the solution is to see that every prime number of
the form 4rn + 1 is a sum of two squares. G. H. Hardy writes that this
two square theorem of Fermat "is ranked, very justly, as one of the finest in
arithmetic." Nevertheless, one of our Book Proofs below is quite recent.
Let's start with some "warm-ups." First, we need to distinguish between
the prime p = 2, the primes of the form p = 4rn + 1, and the primes of
the form p = 4rn + 3. Every prime number belongs to exactly one of these
three classes. At this point we may note (using a method "a la Euclid") that
there are infinitely many primes of the form 4rn + 3. In fact, if there were
only finitely many, then we could take p". to be the largest prime of this
form. Setting
NI,; := 2 2 . 3 . 5. . . PI,; - 1
(where PI = 2, P2 = 3, P3 = 5,.... .denotes the sequence of all primes),
we find that N k is congruent to 3 (mod 4), so it must have a prime factor of
the form 4rn + 3, and this prime factor is larger than PI,; - contradiction.
At the end of this chapter we will also derive that there are infinitely many
primes of the other kind, P = 4rn + 1.
Our first lemma is a special case of the famous "law of reciprocity":
It characterizes the primes for which -1 is a square in the field Zp (which
is reviewed in the box on the next page).
Lemma 1. For primes P = 4rn + 1 the equation 8 2 == -1 (mod p) has two
solutions 8 E {1, 2, . . ., p-1}, for p = 2 there is one such solution, while
for primes of the form p = 4rn + 3 there is no solution.
. Proof. For p = 2 take 8 = 1. For odd p, we construct the equivalence
relation on {1, 2, . .. , p - 1} that is generated by identifying every element
with its additive inverse and with its multiplicative inverse in Zp. Thus the
"general" equivalence classes will contain four elements
{x, -x, x, -x}
since such a 4-element set contains both inverses for all its elements. How-
ever, there are smaller equivalence classes if some of the four numbers are
not distinct:
Chapter 4
1
2
3
4
5
6
7
8
9
10
11
1 2 + 0 2
12+12
??
2 2 + 0 2
2 2 + 1 2
??
'!?
Pierre de Fermat
18
Representing numbers as sums of two squares
For p = 11 the partition is
{1,10},{2,9,6,5},{3,8,4,7};
for p = 13 it is
{1,12},{2,11,7,6},{3,10,9,4},
{5, 8}: the pair {5, 8} yields the two
solutions of 8 2 == -1 moc! 13.
+ 0 1 2 3 4
0 0 1 2 3 4
1 1 2 3 4 0
2 2 3 4 0 1
3 3 4 0 1 2
4 4 0 1 2 3
0 1 2 3 4
0 0 0 0 0 0
1 0 1 2 3 4
2 0 2 4 1 3
3 0 3 1 4 2
4 0 4 3 2 1
Addition and multiplication in 2':5
· x == -;]; is impossible for odd p.
· :r == x is equivalent to £2 == 1. This has two solutions, namely :r = 1
and:1: = p - 1, leading to the equivalence class {l,p - I} of size 2.
· :r == -x is equivalent to :r 2 == -1. This equation may have no solution
or two distinct solutions xo.p - .ro: in this case the equivalence class
is {:ro,p - ;]:o}.
The set {l, 2,... ,p - I} has p - 1 elements, and we have partitioned
it into quadruples (equivalence classes of size 4), plus one or two pairs
(equivalence classes of size 2). For p-1 = 4rn+2 we find that there is only
the one pair {l,p - I}, the rest is quadruples, and thus 8 2 == -1 (modp)
has no solution. For p - 1 = 4rn there has to be the second pair, and this
contains the two solutions of 8 2 == -1 that we were looking for. 0
Prime fields
Ifp is a prime, then the set Zp = {O, 1,... ,p - I} with addition and
multiplication defined "modulo p" forms a finite field. We will need
the following simple properties:
· For x E Zp, x :f- 0, the additive inverse (for which we usually
write -x) is given by p - :r E {l, 2, . . . , p - I}. If p > 2, then x
and -x are different elements of Zp.
· Each x E Zp\{O} has aunique multiplicative inverse x E Zp\{O},
with xx == 1 (modp).
The definition of primes implies that the map Zp -t Zp, z f-t xz
is injective for all x :f- O. Thus on the finite set Zp \ {O} it must be
surjective as well, and hence for each x there is a unique x :f- 0
with xx == 1 (modp).
· The squares 0 2 , 1 2 ,2 2 , . .. , h 2 define different elements of Zp, for
h = lJ.
This is since x 2 == y2, or (x + y)(x - y) == 0, implies that x == y
- l E. J 2 2 2
or that x = -yo The 1 + 2 elements 0 ,1 ,... , h are called
the squares in Zp.
At this point, let us note "on the fly" that for all primes there are solutions
for ;];2 + J/ == -1 (mod p). In fact, there are l J + 1 distinct squares
£2 in Zp, and there are lJ + 1 distinct numbers f the form -(1 + y2).
These two sets of numbers are too large to be disjoint, since Zp has only p
elements, and thus there must exist x and y with :r 2 == -(1 + y2) (modp).
Lemma 2. No number 11 = 4rn + 3 is a sum oltwo squares.
. Proof. The square of any even number is (2k)2 = 4k 2 == 0 (mod4),
while squares of odd numbers yield (2k + 1)2 = 4(k 2 +k) + 1 == 1 (mod 4).
Thus any sum of two squares is congruent to 0, lor 2 (mod4). 0
Representing numbers as sums of two squares
19
This is enough evidence for us that the primes p = 4m + 3 are "bad." Thus,
we proceed with "good" properties for primes of the form p = 4m + 1. On
the way to the main theorem, the following is the key step.
Proposition. Every prime oftheform p = 4m + 1 is a sum of two squares,
that is, it can be written as p = x 2 + y2 for some natural numbers ;r, yEN.
We shall present here two proofs of this result - both of them elegant and
surprising. The first proof features a striking application of the "pigeonhole
principle" (which we have already used "on the fly" before Lemma 2; see
Chapter 21 for more), as well as a clever move to arguments "modulo p"
and back. The idea is due to the Norwegian number theorist Axel Thue.
. Proof. Consider the pairs (:1;', y') of integers with a ::; :1;', y' ::; ,jP, that
is, ;];', V' E {a, 1, . .. , l,jPJ}. There are (l,jPJ + 1)2 such pairs. Using the
estimate l x J + 1 > x for :1; = ,jP, we see that we have more than p such
pairs of integers. Thus for any 8 E Z, it is impossible that all the values
x' - sv' produced by the pairs (x', y') are distinct modulo p. That is, for
every s there are two distinct pairs
(;r',v'), (x",y") E {a, 1,... , l,jPJ}2
with
, I -" " ( d )
x - sy = x - sy mo p .
Now we take differences: We have x' - ;r;" == s(y' - V") (modp). Thus if
we define
I , " 1
""= ' f ' '..
oA/. OJ .A./,
I , " 1
y:= y y ,
then we get
(x, y) E {a, 1,... , lv'PJ}2 with x == ::!::sy (modp).
Also we know that not both x and y can be zero, because the pairs (;r;', V')
and (.r", V") are distinct.
Now let 8 be a solution of 8 2 == -1 (modp), which exists by Lemma I.
Then ;r 2 == s2y2 == _y2 (modp), and so we have produced
(x,y) E Z2
o < :r 2 + y2 < 2p and
,) ,)
:1;- + y- == a (modp).
with
But P is the only number between a and 2p that is divisible by p. Thus
£2 + y2 = p: done! 0
Our second proof for the proposition - also clearly a Book Proof -
was discovered by Roger Heath-Brown in 197 I and appeared in 1984.
(A condensed "one-sentence version" was given by Don Zagier.) It is so
elementary that we don't even need to use Lemma I.
Heath-Brown's argument features three linear involutions: a quite obvious
one, a hidden one, and a trivial one that gives "the final blow." The second,
unexpected, involution corresponds to some hidden structure on the set of
integral sol utions of the equation 4xy + Z2 = p.
For p = 13, l yIP J = 3 we consider
x',y' E {O,1,2,3}. Fors = 5, the sum
.r' -sY' (mod 13) assumes the following
values:
,
,y 0 1 2 3
x
o 0 8 3 11
1 1 9 4 12
2 2 10 5 0
3 3 11 6 1
20
Representing numbers as sums of"two squares
. Proof. We study the set
T
U
.
U
S:= {(J:,y,z)E;Z,3:4xy+z2=p, ;];>0, y>O}.
This set is finite. Indeed, x :::: 1 and y :::: 1 implies y ::; and x ::; l' So
there are only finitely many possible values for x and y, and given x and y,
there are at most two values for z.
1. The first linear involution is given by
1: S ----+ S, (x, y, z) f-----t (y, x, -z),
that is, "interchange J; and y, and negate z." This clearly maps S to itself,
and it is an involution: Applied twice, it yields the identity. Also, 1 has
no fixed points, since z = 0 would imply p = 4xy, which is impossible.
Furthermore, 1 maps the solutions in
T:= {(x,y,z)ES:z>O}
to the solutions in S\T, which satisfy z < O. Also, 1 reverses the signs of
J: - y and of z, so it maps the solutions in
U := {(J;,y,z)ES:(x-y)+z>O}
to the solutions in S\U. For this we have to see that there is no solution
with (x - y) + z = 0, but there is none since this would give p = 4xy + z2 =
4xy + (x - y)2 = (x + y)2.
What do we get from the study of I? The main observation is that since
1 maps the sets T and U to their complements, it also interchanges the
elements in T\ U with these in U\T. That is, there is the same number of
solutions in U that are not in T as there are solutions in T that are not in U
- so T and U have the same cardinality.
2. The second involution that we study is an involution on the set U:
9 : U ----+ U, (x, y, z) f-----t (x - y + z, y, 2y - z).
First we check that indeed this is a well-definded map: If (x, y, z) E U,
then x - y + z > 0, y > 0 and 4(x - y + z)y + (2y - Z)2 = 4xy + Z2, so
g(x,y,z) E S. By (x-y+z) -y+(2y-z) = x> o we find that indeed
g(J;, y, z) E U.
Also 9 is an involution: g(x, y, z) = (x - y + z, y, 2y - z) is mapped by 9
to ((x - y + z) - y + (2y - z), y, 2y - (2y - z)) = (J;, y, z).
And finally: 9 has exactly one fixed point:
(x,y,z) = g(x,y,z) = (x-y+z,y,2y-z)
holds exactly if y = z : But then p = 4xy + y2 = (4x + y)y, which holds
only for y = 1 = z, and x = Pl .
But if 9 is an involution on U that has exactly one fixed point, then the
cardinality of U is odd.
...,.....-
Representing numbers as sums of two squares
21
3. The third, trivial, involution that we study is the involution on T that
interchanges x and y:
h: T ---+ T, (x,y,z) f-----t (y,x,z)
This map is clearly well-defined, and an involution. We combine now our
knowledge derived from the other two involutions: The cardinality of T is
equal to the cardinality of U, which is odd. But if h is an involution on
a finite set of odd cardinality, then it has a fixed point: There is a point
(x, y, z) E T with x = y, that is, a solution of
p = 4x 2 + Z2 = (2X)2 + z2.
Note that this proof yields more - the number of representations of pin
the form p = x 2 + (2y)2 is odd for all primes of the form p = 4771 + 1.
(The representation is actually unique, see [2].) Also note that both proofs
are not effective: Try to find x and y for a ten digit prime! Efficient ways
to find such representations as sums of two squares are discussed in [5].
The following theorem completely answers the question which started this
chapter.
Theorem. A natural number n can be represented as a sum of two squares
if and only if every prime factor of the form p = 4771 + 3 appears with an
even exponent in the prime decomposition of n.
. Proof. Call a number n representable if it is a sum of two squares, that
is, if n = x 2 + y2 for some x, y E No. The theorem is a consequence of
the following five facts.
(I) 1 = 1 2 + 0 2 and 2 = 1 2 + 1 2 are representable. Every prime of the
form p = 4771 + 1 is representable.
(2) The product of any two representable numbers nl = xi + yi and n2 =
2 2 . ( ) 2 ( ) ')
X2 + Y2 IS representable: nln2 = X1X2 + Y1Y2 + X1Y2 - X2Yl -.
(3) If n is representable, n = x 2 + y2, then also nz 2 is representable, by
nz 2 = (XZ)2 + (yz)2.
Facts (I), (2) and (3) together yield the "if" part of the theorem.
(4) If p = 4771 + 3 is a prime that divides a representable number n
x 2 + y2, then p divides both x and y, and thus p2 divides n. In fact, if
we had x oj. 0 (modp), then we could find x such that xx == 1 (modp),
multiply the equation x 2 + y2 == 0 by x 2 , and thus obtain 1 + y 2 x 2 =
1 + (xy)2 0 (modp), which is impossible for p = 4771 + 3 by
Lemma I.
(5) If n is representable, and p = 4771 + 3 divides n, then p2 divides n,
and n/p2 is representable. This follows from (4), and completes the
proof. D
.
\:...
T
On a finite set of odd cardinality, every
involution has at least one fixed point.
D
22
Representing numbers as sums of two squares
As a corollary, we obtain that there are infinitely many primes of the form
P = 4rn + 1. For this, we consider
2 .)
Al k = (3.5.7... pd + 2-,
a number that is congruent to 1 (mod 4). All its prime factors are larger
than Pk, and by fact (4) of the previous proof, it has no prime factors of the
form 4rn + 3. Thus !vI.. has a prime factor of the form 4rn + 1 that is larger
than Ph"
Two remarks close our discussion:
· If a and b are two natural numbers that are relatively prime, then there
are infinitely many primes of the form am + b (rn E N) - this is a
famous (and difficult) theorem of Dirichlet.
· However, it is not quite true that the primes for fixed a and varying b
appear at the same rate, not even for a = 4; this is true at first sight, but
then one discovers that there is a subtle tendency towards "much more"
primes of the form 4771 + 3 than of the form 4771 + 1. This effect is known
as "Chebyshev's bias" - see Rubinstein and Sarnak [3].
References
[IJ D. R HEATH-BROWN: Fermat's two squares theorem, Invariant (1984), 2-5.
[2] 1. NIVEN & H. S. ZUCKERMAN: An Introduction to the Theor\" of Numbers,
third edition, Wiley 1972.
l3] M. RUBINSTEIN & P. SARNAK: Chebyshe\,'shias, Experimental Mathematics
3 (1994),173-197.
[4J A. THUE: Et par antydninger tit en taltheoretisk metode, Kra. Vidensk. Selsk.
Forh. 7 (1902), 57-75.
[5] S. WAGON: Editor's corner: The Euclidean algorithm strikes again, Amer.
Math. Monthly 97 (1990),125-129.
[6J D. ZAGIER: A one-sentence proof that e\'ay prime ]J == 1 (mod 4) is a sum (f
two squares, Amer. Math. MonthJy 97 (1990), 144.
.,.....-
Every finite division ring is a field
Rings are important structures in modern algebra. If a ring R has a mul-
tiplicative unit element 1 and every nonzero element has a multiplicative
inverse, then R is called a division ring. So, all that is missing in R from
being a field is the commutativity of multiplication. The best-known exam-
ple of a non-commutative division ring is the ring of quaternions discovered
by Hamilton. But, as the chapter title says, every such division ring must of
necessity be infinite. If R is finite, then the axioms force the multiplication
to be commutative.
This result which is now a classic has caught the imagination of many math-
ematicians, because, as Herstein writes: "It is so unexpectedly interrelating
two seemingly unrelated things, the number of elements in a certain alge-
braic system and the multiplication of that system."
Theorem. Every finite division ring R is commutative.
This beautiful theorem which is usually attributed to MacLagan Wedder-
burn has been proved by many people using a variety of different ideas.
Wedderburn himself gave three proofs in 1905, and another proof was given
by Leonard E. Dickson in the same year. More proofs were later given by
Emil Artin, Hans Zassenhaus, Nicolas Bourbaki, and many others. One
proof stands out for its simplicity and elegance. It was found by Ernst Witt
in 1931 and combines two elementary ideas towards a glorious finish.
. Proof. Our first ingredient comes from linear algebra. For an arbitrary
element ,<; E R, let G s be the set {x E R : xs = sx} of elements which
commute with s; G s is called the centralizer of s. Clearly, G s contains 0
and I and is a sub-division ring of R. The center Z is the set of elements
which commute with all elements of R, thus Z = nsER G s . In particular,
all elements of Z commute, 0 and 1 are in Z, and so Z is afinite field. Let
us set jZI = q.
We can regard Rand G s as vector spaces over the field Z and deduce that
IRI = q", where n is the dimension of the vector space Rover Z, and
similarly IGsl = q'" for suitable integers ns :::: 1.
Now let us assume that R is not a field. This means that for some ,<; E R
the centralizer G s is not all of R, or, what is the same, ns < n.
On the set R* := R\ {O} we consider the relation
,
T rv T
T' = X-I T.r for some x E R*.
:{::::::::}
Chapter 5
Ernst Witt
24
Every' finite division ring is a field
It is easy to check that'" is an equivalence relation. Let
As := {X- 1 8X: x E R*}
be the equivalence class containing s. We note that IAs I = 1 precisely
when 8 is in the center Z. So by our assumption, there are classes As with
IAs I :::: 2. Consider now for 8 E R* the map Is : x f-----t X-I sx from R*
onto As. We find
x- 1 sx = y- 1 sy {::::::::} (yx-i)s = s(y.1:- i )
{::::::::} yx- 1 E e; {::::::::} y E e.:x,
where e;x = {z.1: : z E e;} has size le;l. Hence any element x- 1 sx is
the image of precisely Ie; I = qns - 1 elements in R* under the map Is,
and we deduce I R* I = I As II e; I. In particular, we note that
IR* I qn - 1 ,
le;1 = qns -1 = IAsl is an integerforall 8.
We know that the equivalence classes partition R*. We now group the
central elements Z* together and denote by AI,. .. ,At the equivalence
classes containing more than one element. By our assumption we know
t :::: 1. Since IR*I = IZ*I + 2:::=IIAil, we have proved the so-called
class formula
t n 1
n L q -
q -1 = q -1 + -,
q71 i -I'
;=1
(I)
n 1
where we have 1 < qn- I E N for all i.
q ,-
With (I) we have left abstract algebra and are back to the natural numbers.
Next we claim that qn i -11 qn -1 implies ni I n. Indeed, write n = ani + r
with 0 'S r < ni, then q71i - 11 qani+T - 1 implies
qn i _ 11 (qani+T _ 1) _ (qni _ 1) = qn, (q(a-l)n,+7' - 1),
and thus qn i -11 q(a-l)n,+T -1, since qn i and qn i -1 are relatively prime.
Continuing in this way we find qn i - 11 qT - 1 with 0 'S r < ni, which is
only possible for r = 0, that is, ni I n. In summary, we note
niln foralli.
(2)
Now comes the second ingredient: the complex numbers C. Consider the
polynomial x n - 1. Its roots in C are called the n-th roots of unity. Since
A 71 = 1, all these roots A have I A I = 1 and lie therefore on the unit circle of
. 2kITi
the complex plane. In fact, they are precisely the numbers Ak = e---n =
cos(2k1r/n) + isin(2k1r/n), 0 'S k 'S n -1 (see the box on the next
page). Some of the roots A satisfy Ad = 1 for d < n; for example, the
root A = -1 satisfies A 2 = 1. For a root A, let d be the smallest positive
exponent with Ad = 1, that is, d is the order of A in the group of the roots
Every finite division ring is a field
25
of unity. Then din, by Lagrange's theorem ("the order of every element of
a group divides the order of the group" - see the box in Chapter 1). Note
that there are roots of order n, such as )'1 = e 2i .
Roots of unity
Any complex number z = x + iy may be written in the "polar" form
z = rei'!' = r(coscp+ isincp),
where r = Izi = J x 2 + y2 is the distance of z to the origin, and cp is
the angle measured from the positive x-axis. The n-th roots of unity
are therefore of the form
2k1d
Ak = e---;;:- = cos(2k1r/n) + isin(2k1r/n),
o ::; k ::; n - 1,
since for all k
A = e 2k1ri = cos(2k1r) + i sin(2k1r) = 1.
We obtain these roots geometrically by inscribing a regular n-gon
into the unit circle. Note that Ak = (k for all k, where ( = e 2i . Thus
the n-th roots of unity form a cyclic group { (, (2, . .. , (n-t , (n = I}
of order n.
Now we group all roots of order d together and set
CPd(X) :=
II (x - A).
>. of order d
Note that the definition of CPd (x) is independent of n. Since every root has
some order d, we conclude that
X n - 1 = II CPd(X),
din
Here is the crucial observation: The coefficients of the polynomials CPn (x)
are integers (that is, CPn(x) E Z[x] for all n), where III addition the constant
coefficient is either 1 or -1.
Let us carefully verify this claim. For n = 1 we have 1 as the only root,
and so CPt (x) = x-I. Now we proceed by induction, where we assume
CPd(X) E Z[x] for all d < n, and that the constant coefficient of CPd(X) is 1
or -1. By (3),
X n - 1 = p(x) CPn(x)
t n-t
where p(x) = L PiXi, CPn(x) = L ajx j , with Po = lor Po = -1.
;=0 j=O
z = rei'!'
)y "in
x = r cos cp
-1
1
The roots of unity for n = 6
(3)
(4)
26
Everyfinite division ring is afield
' q
Iq -);1 > Iq - 11
Since -1 = poao, we see ao E {I, -I}. Suppose we already know that
ao, al, . .. , ak-l E Z. Computing the coefficient of xk on both sides of
(4) we find
k
LPiak-i
i=O
k
L Piak-i + POak E Z.
i=l
By assumption, all ao,... , ak-l (and all Pi) are in Z. Thus POak and hence
ak must also be integers, since Po is 1 or -1.
We are ready for the coup de grace. Let nj I n be one of the numbers
appearing in (I). Then
x n - 1 = II CPd(X) = (X"i - l)q>n(x) II CPd(X).
din dl".dt"i.d#"
We conclude that in Z we have the divisibility relations
CPn(q) I qn - 1
qn _ 1
CPn(q) I q'" _ l '
(5)
and
Since (5) holds for all i, we deduce from the class formula (l)
cp,,(q)lq-1,
but this cannot be. Why? We know CPn(:::) = TI(x - A) where A runs
through all roots of x n - 1 of order n. Le A = a + ib be one of those roots.
By n > 1 (because of R :f- Z) we have A :f- 1, which implies that the real
part a is smaller than 1. Now 1>"1 2 = a 2 + b 2 = 1, and hence
Iq - >"1 2
Iq - a - ib/ 2 = (q - a)2 + b 2
q2 _ 2aq + a 2 + b 2 = q2 - 2aq + 1
>
q2 _ 2q + 1
(q _ 1)2,
(because of a < 1)
and so Iq - AI > q - 1 holds for all roots of order n. This implies
ICPn(q)1 = II Iq - AI > q - 1.
A
which means that CPn(q) cannot be a divisor of q - 1, contradiction and end
0
References
[I] L. E. DICKSON: On finite algebras, Nachrichten der Akad. Wissenschaften
Gottingen Math.-Phys. Klasse (1905), 1-36; Collected Mathematical Papers
Va!. III, Chelsea Pub!. Camp, The Bronx, NY 1975,539-574.
[2] J. H. M. WEDDERBURN: A theorem on finite algebras, Trans. Amer. Math.
Soc. 6 (1905), 349-352.
[3] E. WITT: Uba die Kommutativitiit endlicher Schiefkiirper, Abh. Math. Sem.
Univ. Hamburg 8 (1931), 413.
---
Some irrational numbers
"7]' is irrational"
This was already conjectured by Aristotle, when he claimed that diameter
and circumference of a circle are not commensurable. The first proof of
this fundamental fact was given by Johann Heinrich Lambert in 1766. Our
Book Proof is due to Ivan Niven, 1947: an extremely elegant one-page
proof that needs only elementary calculus. Its idea is powerful, and quite
a bit more can be derived from it, as was shown by Iwamoto and Koksma,
respectively:
. 7]'"2 is irrational (this is a stronger result!) and
. e1" is irrational for rational r :f- O.
Niven's method does, however, have its roots and predecessors: it can be
traced back to the classical paper by Charles Hermite from 1873 which
first established that e is transcendental, that is, that e is not a zero of a
polynomial with rational coefficients.
It is easy to see that e = Lk>O -h is irrational. In fact, from e = % (for
integers u, b > 0) we would get that
11 1
N:=n!(e-L k! )
k=O
is an integer for n :::: b, since then n!e and £+ (for 0 ::; J.. ::; 11) are integers.
However, this integer can also be written as
L n! 1 1
N = k! = n + 1 + (n + l)(n + 2) +
k?n+l
Thus N can be compared to a geometric series, yielding
1
1 1
o < N < n + 1 + (n + 1)"2 + ... = 11
which is absurd, since N is an integer.
This trick, however, isn't even good enough to prove that ('"2 is irrational
(which is a stronger statement). For this we use a different method -
essentially due to Charles Hermite - whose key is the following simple
lemma.
Chapter 6
Charles Hermite
28
Some irrational numbers
Lemma. For some fixed n 2: 1, let
The estimate n! > e( )n yields an
explicit n that is "large enough."
f(x) = x 71 (1- x)71
n!
271
1 '" .
(i) Thefunction f(x) is a polynomial of the form f(x) = 'I CiX',
n.
where the coefficients Ci are integers. i=71
(ii) ForO < x < 1 we have 0 < f(x) < rho
(iii) The derivatives f(k) (0) and f(k) (1) are integersfor all k 2: o.
. Proof. Parts (i) and (ii) are clear.
For (iii) note that by (i) the k-th derivative f(k) vanishes at x = 0 unless
n::::: k ::::: 2n, and in this range f(k)(O) = kk is an integer. From f(x) =
f(l-.T)wegetf(k)(x) = (-l)kf(k)(l-:r) for all x, andhencef(k)(l) =
( -l)k f(k) (0), which is an integer. 0
Theorem 1. e T is irrational for every r E Q\{O}.
. Proof. It suffices to show that e S cannot be rational for a positive integer
,<; (if e f were rational, then (ef)t = e S would be rational, too). Assume
that e" = % for integers a, b > 0, and let n be so large that n! > aS 271 + 1 .
Put
F(x) .- s271 f(x) - s2711 J'(x) + s2712 J"(x) 'f ... + f(271)(x),
where f(x) is the function of the lemma. F(x) may also be written as
F(x) s2n f(x) - s2711 J'(x) + s2n2 J"(x) 'f ... ,
since the higher derivatives Jlk) (x), for k > 2n, vanish. From this we see
that the polynomial F (x) satisfies the identity
F'(x) = -s F(x) + s271+l f(x).
Thus differentiation yields
[eSX F(x)] = seSX F(x) + eSX F'(x)
dx
S2n+le sT f(x)
and hence
N := b 1 1 s271+le sI f(x)dx = b [e SI F(x)] = aF(l) - bF(O).
This is an integer, since part (iii) of the lemma implies that F(O) and F(l)
are integers. However, part (ii) of the lemma yields estimates for the size
of N from below and from above,
o < N = b {I s2n+le s .f f(x)dx < bs271+les =
J o n.
which shows that N cannot be an integer: contradiction.
as 2n + 1
n!
< 1,
o
Now that this trick was so successful, we use it once more.
....,......-
Some irrational numbers
29
Theorem 2. 7r 2 is irrational.
. Proof. Assume that 7r 2
polynomial
% for integers a, b > O. We now use the
F(x) := b n (7r 2n f(x) - 7r 2n - 2 f(2) (x) + 7r 2n - 4 f(4) (x) 'f '" ),
which satisfies F"(X) = -7r 2 F(x) + b n 7r 2n + 2 f(x).
From part (iii) of the lemma we get that F(O) and F(l) are integers.
Elementary differentiation rules yield
d .
- [F'(x) sm 7rX - 7r F(x) cos 7rx]
dx
(F" (J;) + 7r 2 F (x)) sin 7rX
b n 7r 2n + 2 f(x) sin 7rX
7r 2 a"f(x) sin 7rX,
and thus we obtain
N:= 7r 1 1 a n f(x)sin7rxdx
[ F' ( x) sin 7r x - F ( x ) COS 7r X ]
F(O) + F(l),
which is an integer. Furthermore N is positive since it is defined as the
integral of a function that is positive (except on the boundary). However,
if we choose n so large that 1fa,n < 1, then from part (ij) of the lemma we
n.
obtain
o < N = 7r 1 1 an f(x) SiIl7rX dx <
7ra n
< 1,
n!
a contradiction.
This last theorem, together with the following classical result by Euler,
proves that the value
1
((2) = "'?
n-
n:;:.l
of Riemann's zeta function is irrational (see the appendix on page 34).
1 7r 2
Theorem 3. L n2 = 6'
n:;:.l
. Proof. The following proof - due to Tom Apostol - consists in two
different evaluations of the double integral
1 1
f ' f ' 1
I . - 1 _ xy dx dy.
o 0
7r is not rational, but it does have "good
approximations" by rationals - some
of these were known since antiquity:
22
7
355
113
104348
33215
o
3.142857142857...
3.141592920353...
3.141592653921...
7r 3.141592653589...
30
Some irrational numbers
Ay
'11.
v
x
----
.v
For the first one, we expand j.!XY as a geometric series, decompose the
summands as products, and integrate effortlessly:
1 1 1 1
[ '/'/L(XY)"dXdY = L ././.r;nY"dXd Y
o 0 "20 "200 0
1 1
L (./ X"dX) (./ Y"dY)
"20 0 0
1 1
= L n+1n+1
"20
1 1
L (n + 1)2 = L n 2 = ((2).
,,20 "21
This evaluation also shows that the double integral (over a positive function
with a pole at x = Y = 1) is finite. Note that the computation is also easy
and straightforward if we read it backwards - thus the evaluation of ((2)
leads one to the double integral [.
The second way to evaluate [ makes a change of coordinates: a rotation
by 45° leads to coordinates
y+;r; y -;r;
'11. = y2 and v = y2 ,
u-v u+v
x = y2 and Y = y2 '
Substitution of these new coordinates yields
'11. 2 - v 2
1 - ;r;y = 1-
2
and thus
1
2
1 - xy 2 - '11. 2 + v 2 .
The new domain of integration, and the function to be integrated, are sym-
metric with respect to the u-axis, so we just need to compute the integral
over the upper half domain, which we split into two parts in the most natural
way:
u
!v0 IL
4 ./ (./ 2 _ :2 V + v 2 ) du
o 0
dv )
, ,) 2 duo
2-11,-+v
v0 v0-u
+4'/('/
!v0 0
-
[
USin g '/ 2 dx 2 = arctan:' + C, this becomes
a +X a a
[
v0
4 ./ h ardan ( ) du
2 - '11. 2 2 -u 2
o
4 j v} h arctan ( ) duo
. 2 - '11. 2 2 - '11. 2
!v0
+
...,....-
Some irrational numbers
31
Two simple trigonometric substitutions now complete the job. For the
first integral, we put u = V2 sin (). The interval 0 ::; u ::; ! V2 corre-
spond s to 0 ::; () ::; . We compute du = V2 cas () d() and V2 - u 2 =
V2 V l- sin 2 () = V2cas(), and thus
V2
J 1 u
4 arctan du
2 - u 2 2 - u 2
o
j ": 1 ( V2 sin () ) r;:;
4 V2 arctan V2 v 2 cos () d()
2 cos () 2 cos ()
o
6
4 J () d()
o
2
= 4. !(-)2 = !.
2 6 3 6
For the second integral we use u = V2 cas 2(). Here ! V2 < u < V2
translates into 2 () 2 O . We get du = -2V2 sin 2() d(), with
V2 - u 2 J2 V l - cos 2 2() = V2 sin 2() = 2V2 cos () sin ()
V2 - u V2(1 - cas 2()) = 2V2 sin 2 ()
and thus
V2
4 J arctan dU
2 - u 2 2 - u 2
V2
o
J 1 ( 2V2sin2 () )
4 V2 arctan V2 (-2V2) sin 2() d()
2 sin 2() 2 2 cos () sin ()
6
6
4 J 2() d()
o
Putting both integrals together, we obtain
1 7r 2 2 7r 2
I = -- + --
3 6 3 6
_ ( 7r ) 2 _ 27r 2
- 4 - ---
6 3 6 .
2
7r
6
o
Wasn't that marvelous? Well, compare it to the following, entirely different
2
proof for 2::n>l ';2 = . It is completely elementary and it even pro-
- 2
vides us with an estimate for the speed of convergence of 2:::=1 ';2 to 1f6 .
Its origin is not quite clear. According to John Scholes the "proof was com-
mon knowledge when 1 was an undergraduate at Cambridge, England in
the late I 960s." It appeared in the journal Eureka of the Cambridge under-
graduate maths society, the Archimedeans, in 1982 [8], there attributed to
John Scholes. But as Scholes writes: "I probably heard the proof, at least in
outline (one was expected to fill in details oneself) from Peter Swinnerton-
Dyer, but he would probably claim to have heard it from someone else."
32
Some irrational numbers
For m = 1,2,3 this yields
cot 2 i =
cot 2 + cot 2 2; = 2
cot 2 + cot 2 2; + cot 2 3; = 5
Comparison of coefficients:
If pet) = e(t - al).'. (t - am),
then the coefficient of t m - 1
is -e(al + . . . + am).
. Proof. The first step is to establish a remarkable relation between values
of the (squared) cotangent function. Namely, for all m 2 1 one has
2 ( 7r ) 2 ( 27r ) 2 ( m7r )
cot 2m+l + cot 2rn+l +... + cot 2m+l
2rn(2m-l)
6
(1)
To establish this, we start with the relation
cos nx + i sin nx = ( cos x + i sin x) n
and take its imaginary part, which is
sin nx
(7) sinxcos n - 1 x - () sin 3 xcosn- 3 x:!:...
(2)
Now we let n = 2m + 1, while for x we will consider the m different
values x = 2:1 ' for r = 1,2,... , m. For each of these values we have
nx = nr, and thus sin nx = 0, while 0 < x < implies that for sin x we
get m distinct positive values.
In particular, we can divide (2) by sinn x, which yields
o (7) cot n - 1 x - C) cot n - 3 x:!: .'. ,
that is,
o
( 2m + 1 ) 2m ( 2m + 1 ) 2rn-2
cot x - cot x :!: . . .
1 3
for each of the m distinct values of x. Thus for the polynomial of degree m
p(t) Cm 1 + 1) t m - Cm 3 + 1) e n - 1 :!:... + (_1)m G:: )
we know m distinct roots
aT cot 2 ( 2':-1 ) for r = 1,2,... ,m.
Hence the polynomial coincides with
( 2m + 1 ) ( 2 ( 7r ) ) ( 2 ( rn7r ) )
p(t) 1 t - cot 2rn+l . .... t - cot 2rn+l .
Comparison of the coefficients of t rn - 1 in p(t) now yields that the sum of
the roots is
al + . . . + aT
(2rn 3 +! )
e+l)
2rn(2rn-l)
6
which proves (1).
..,.....-
Some irrational numbers
33
We also need a second identity, of the the same type,
, 2 ( 7r ) 2 ( 27r ) , 2 ( 1117r ) 2111(2m+2 )
CSC 2111+1 + CSC 2m+1 +... + CSC 2m+1 - 6 '
for the cosecant function csc x = ..L. But
SIn x
.) 1 cos 2 x+sin 2 x 2
csC X = -----,---- = . 2 = cot X + 1,
sin 2 X sm X
so we can derive (3) from (I) by adding m to both sides of the equation.
Now the stage is set, and everything falls into place. We use that in the
range 0 < y < we have
o < sm y < y < tan y,
and thus
o < cot y <
1
y
cscy,
<
which implies
2 1 2
cot Y < 2 < CSC y.
Y
Now we take this double inequality, apply it to each of the m distinct values
of x, and add the results. Using (I) for the left-hand side, and (3) for the
right-hand side, we obtain
2111(2m-1 ) ( 2111+1 ) 2 ( 2111+1 ) 2 ( 2111+1 ) 2 2111(2111+2)
6 < 7r + 27r + . . . + 1117r < 6 ,
that is,
Jr2 217l 2m,-1
6 2111+1 2m+1
1 1 1
< 12 + 2 2 + . . . + rn 2
< 7r 2 2111 2m+2
6 2111+1 2111+1 .
2
Both the left-hand and the right-hand side converge to 7r 6 for Tn ------+ 00:
end of proof. D
Here comes our final irrationality result.
Theorem 4. For every odd integer n 3, the number
A(n) := .!. arccos ( )
7r vn
is irrational.
We will need this result for Hilbert's third problem (see Chapter 7) in the
cases n = 3 and n = 9. For n = 2 and n = 4 we have A(2) = i and
A( 4) = i, so the restriction to odd integers is essential. These values
are easily derived by appealing to the diagram in the margin, in which the
statement "; arccos ( )n ) is irrational" is equivalent to saying that the
polygonal arc constructed from )n , all of whose chords have the same
length, never closes into itself.
We leave it as an exercise for the reader to show that A( n) is rational only
for n E {I, 2,4}. For that, distinguish the cases when n = 2", and when n
is not a power of 2.
(3)
O<a<b<c
implies
O<<t<
1
34
Some irrational numbers
. Proof. We use the addition theorem
cas a + cos (3 = 2 cas 0:1"0 cos 0:i3
from elementary trigonometry, which for ex = (k + l)ip and (J = (k - l)ip
yields
cas (k + l)ip = 2 cos ip cas kip - cos (k - l)ip.
(4)
For the angle ipn = arccos ( In ), which is defined by cas ipn
o :S ipn :S 7r, this yields representations of the form
..L and
vn
Ak
cas kipn = ,
,;n
where Ak is an integer that is not divisible by n, for all k > O. In fact,
we have such a representation for k = 0, 1 with Ao = Al = 1, and by
induction on k using (4) we get for k 2: 1
2 1 Ak Ak-l
cas(k+ l)ipn = - -
,;n,;n ,;n
2Ak - nA/._ 1
r.;-k+l
yn
Thus we obtain Ak+l = 2Ak - nA k - l . If n 2: 3 is odd, and Ak is not
divisible by n, then we find that Ak+l cannot be divisible by n, either.
Now assume that
1 k
A(n) = ;ipn = g
is rational (with integers k, g > 0). Then Ripn = k7r yields
::1::1 = cas k7r =
A£
£ .
,;n
Thus ,;nf = ::I::A f is an integer, with g 2: 2, and hence n I ,;n{. With
,;n£ I A f we find that n divides At, a contradiction. D
Appendix: The Riemann zeta function
The Riemann zeta function (( 8) is defined for real 8 > 1 by
1
((8) := "'-.
ns
nl
Our estimates for Hn (see page 10) imply that the series for ((1) diverges,
but for any real 8 > 1 it does converge. The zeta function has a canonical
continuation to the entire complex plane (with one simple pole at 8 = 1),
which can be constructed using power series expansions. The resulting
complex function is of utmost importance for the theory of prime numbers.
Let us mention three diverse connections:
"'!I"""
Some irrational numbers
35
(1) The remarkable identity
1
((8) = II 1 _ p-S
P
due to Euler encodes that every natural number has a unique (!) decompo-
sition into prime factors; using this basic fact, it is a simple consequence of
the geometric series expansion
1
1 - p-s
1 1 1
= 1+-+- 2 +- 3 +...
po' p' pS
(2) The location of the complex zeros of the zeta function is the subject
of the "Riemann hypothesis": one of the most famous and important unre-
solved conjectures in all of mathematics. It claims that all the non-trivial
zeros 8 E iC of the zeta function satisfy
1
Re(s) = 2
(The zeta function vanishes at all the negative even integers, which are
referred to as the "trivial zeros.")
Very recently, Jeff Lagarias showed that, surprisingly, the Riemann hypo-
thesis is equivalent to the following elementary statement: For all n 2 1,
L d ::; Hn + cxp(H n ) log(Hn),
din
where H n is again the n-th harmonic number, with equality only for n = 1.
(3) It has been known for a long time that ((8) is a rational multiple of Jr s ,
and hence irrational, if 8 is an even integer 8 2 2, see Chapter 19. Here
2
we presented two Book Proofs of ((2) = ' a famous identity of Euler
from 1734. In contrast, the irrationality of ((3) was proved by Roger Apery
only in 1979 (see the marvelous review in [7]). Despite considerable effort
the picture is rather incomplete about (( 8) for the other odd integers, 8 =
2t + 1 2 5. The latest mathematical news about this, a paper by Rivoal,
implies that infinitely many of the values ((2t + 1) are irrational.
References
[I] T. M. ApOSTOL: A proof'that Euler missed: Evaluating «(2) the easy way,
Math. Intelligencer 5 (\ 983),59-60.
[2] C. HERMITE: Sur la fonction exponentielle, Comptes rendus de I' Academie
des Sciences (Paris) 77 (1873), 18-24; <Euvres de Charles Hermite, Vol. III,
Gauthier-Villars, Paris 1912, pp. 150-181.
[31 Y. IWAMOTO: A prool that Jr2 is irrational, J. Osaka Institute of Science and
Technology 1 (\ 949), 147-148.
[4J J. F. KOKSMA: On Niven '.I' proof that Jr is irrational, Nieuw Archiv Wiskunde
(2) 23 (\949), 39.
36
[5]
[6]
[7]
[8]
[9]
Some irrational numbers
J. C. LAGARIAS: An elementary problem equivalent to the Riemann hypo-
thesis, Preprint arXi v: ma th. NT /0008177, August 2000, 4 pages.
1. NIVEN: A simple proof that IT is irrational, Bulletin Amer. Math. Soc. 53
(1947),509.
A. VAN DER PaRTEN: A proof that Euler missed... Apery's proof of the
irrationality of «(3). An informal report, Math. Intelligencer 1 (1979), 195-
203.
T. J. RANSFORD: An elementary proof of 2:;'" 2 = 7r G 2 , Eureka No. 42,
Summer 1982,3-5.
T. RIVOAL: La fonction Zeta de Riemann prend une infinite de valeurs irra-
tionnelles aux entiers impairs, Comptes rendus de I' Academie des Sciences
(Paris), Ser. 1,331 (2000), 267-270.
...,....
Geometry
7
Hilbert's third problem:
decomposing polyhedra 39
8
Lines in the plane
and decompositions of graphs 47
9
The slope problem 53
10
Three applications
of Euler's formula 59
11
Cauchy's rigidity theorem 65
12
Touching simplices 69
13
Every large point set
has an obtuse angle 73
14
Borsuk's conjecture 79
"Platonic solids - child's play!"
.......--
Hilbert's third problem:
decomposing polyhedra
In his legendary address to the International Congress of Mathematicians
at Paris in 1900 David Hilbert asked - as the third of his twenty-three
problems - to specify
"two tetrahedra of equal bases and equal altitudes which can in
no way be split into congruent tetrahedra, and which cannot be
combined with congruent tetrahedra to form two polyhedra which
themselves could be split up into congruent tetrahedra."
This problem can be traced back to two letters of Carl Friedrich Gauss
from 1844 (published in Gauss' collected works in 1900). If tetrahedra of
equal volume could be split into congruent pieces, then this would give one
an "elementary" proof of Euclid's theorem XII.5 that pyramids with the
same base and height have the same volume. It would thus provide an ele-
mentary definition of the volume for polyhedra (that would not depend on
analysis, and hence on continuity arguments). A similar statement is true
in plane geometry: the Bolyai-Gerwien Theorem [I, Sect. 2.71 states that
planar polygons are both equidecomposable (can be dissected into congru-
ent triangles) and equicomplementable (can be made congruent by adding
congruent triangles) if and only if they have the same area.
.. ..
.. ..
Chapter 7
David Hilbert
The cross is equicomplementable with a
square of the same area.
In fact, they are even equidecomposable.
Hilbert's third problem: decomposing polyhedra
40
Hilbert - as we can see from his wording of the problem - did expect that
there is no analogous theorem in dimension 3, and he was right. In fact, the
problem was completely solved by Hilbert's student Max Dehn in two pa-
pers: the first one, exhibiting non-equidecomposable tetrahedra of equal
base and height, appeared already in 1900, the second one, also covering
equicomplementability, appeared in 1902. However, Dehn's papers are not
easy to understand, and it takes effort to see whether Dehn did not fall into a
subtle trap which ensnared others: a very-elegant-but-unfortunately-wrong
proof was found by Bricard (in 1896!), by Meschkowski (1960), and prob-
ably by others. Luckily, Dehn's proof was reworked and redone, and after
combined efforts of V. F. Kagan (1903/1 930), Hugo Hadwiger (1949/54)
and Vladimir G. Boltianskii, we now have a Book Proof - as follows.
(The appendix to this chapter provides some basics about polyhedra.)
(1) A little linear algebra
For every finite set of real numbers M = {ml,... , mk} C JR, we define
V(M) as the set of all linear combinations of numbers in M with rational
coefficients, that is,
k
V(M):= {Lqimi: qi E Q} JR.
i=1
The first (trivial, but important) observation is that V(M) is a finite dimen-
sional vector space over the field Q of rational numbers. In fact, V (M) is
clearly closed under taking sums and under multiplication with rationals,
and the field axioms for JR make V (M) into a vector space. The dimension
of V(M) is the size of any minimal generating set. Since M generates
V (M) by definition, we see that it contains a minimal generating set, and
hence
dimQi V(M) :::: k = IMI.
In the following, we shall need and use Q-linear functions
I: V(M) ---t Q
which we interpret as linear maps of Q-vector spaces. The key property is
that for every rational linear dependence 2:=1 qimi = 0 with qi E Q, we
must have 2:=1 q;J(mJ = 1(0) = O. Here is the simple lemma that gets
things going.
Lemma. For any finite subsets M M' of JR, the Q-vector space V(M)
is a subspace of the Q-vector space V(M'). Thus if 1 : V(M) ---t Q is
a Q-linear function, then 1 can be extended to a Q-linear function l'
V(M') ---t Q so that 1'(m) = I(m) for all mE M.
. Proof. Any Q-linear function V (M) ---t Q is determined as soon as its
values on a Q-basis of V(M) are fixed. Since every basis on'(M) can be
extended to a basis of V(M'), the rest follows. D
T
Hilbert's third problem: decomposing polyhedra
41
(2) Dehn invariants
For a 3-dimensional polyhedron P, let lU p denote the set of all angles
between adjacent facets (dihedral angles), together with the number 7r.
Thus for a cube C we get Me = {, 7r}, while for an orthogonal prism Q
over an equilateral triangle we get M Q = {}-, , 7r}.
Given any finite set AI IE. that contains M p , and any Q-linear function
f : Y(M) ---t Q
that satisfies f (7r) = 0, we define the Dehn invariant of P (with respect
to j) to be the real number
Dj(P) := L R(e) . f(o:(e)),
eEP
where the sum extends over all edges e of the polyhedron, f( e) denotes the
length of e, and 0:( e) is the angle between the two facets that meet in e.
We will calculate various Dehn invariants later. For now just note that
f () = bf(7r) = 0 must hold for an}' such Q-Iinear function f, and
thus
Dj(C) = 0,
that is, the Dehn invariant of a cube is zero with respect to any f.
(3) The Dehn-Hadwiger theorem
As above we call two polyhedra P, Q equidecomposable if they can be
decomposed into finite sets of polyhedra PI, . . . , Pn and QI, . . . , Qn such
that Pi and Qi are congruent for all i (1 S i S n). Two polyhedra are
equicomplementable if there are polyhedra PI, . .. , Pm and Q I, . .. , Q m
so that the interiors of the Pi are disjoint from each other and from P, and
similrly for the Qi and Q, such that Pi icongruent to Qi for all i, and such
that P := PU PI U P 2 U ... U Pm and Q := Q U QI U Q2 U ... U Qrn are
equidecomposable. A theorem of Gerling from 1844 implies that it does not
matter whether we admit reflections when considering congruences, or not.
Clearly equidecomposable polyhedra are equicomplementable, but the con-
verse is far from clear. The following theorem of Hadwiger (in the version
of Boltianskii) provides our tool to find - as Hilbert proposed - tetrahedra
of equal volume that are not equicomplementable, and thus not equidecom-
posable.
Theorem. Let P and Q be polyhedra with dihedral angles 0:1, . .. , o:p resp.
PI, . .. , Pq at their edges, and let AI be afinite set of real numbers with
{O:I, . .. , O:p, PI, . . . , pq, 7r} AI.
If f : Y (M) ---t Q is any Q-linear function with f (7r) = 0 such that
Dj(P) i: Dj(Q),
then P and Q are not equicomplementable.
Me = {f,1f}
MQ = g, %,1f}
42
Hilbert's third problem: decomposing polyhedra
t
D
. Proof. The argument comes in two parts.
(1) If a polyhedron P has a decomposition into finitely many polyhedral
pieces PI, . .. , Pn, and if all the dihedral angles of the pieces PI, . .. , P"
are contained in the set AI, then for every Q-linear f : 1 "(AI) --+ Q, the
Dehn invariants add up:
Dj(P) = Dj(PJ) +... + Dj(Pn).
For this, we associate a mass to any part of an edge of a polyhedron: if
e' e is a part of an edge e of P, then its mass will be
mj(e') := f(e') f(a(e')),
its length times the f -value of its dihedral angle.
Now if P is decomposed into PI, . " , PI/, consider the union of all the
edges of the pieces Pi. Along the edges e' that are contained in edges of P,
we see that the dihedral angles of the pieces add up to the dihedral angle
of P at e', and hence the masses add up.
At any other edge e" of one of the Pi'S which is contained in the interior of
a face of P or in the interior of P, the angles add up to 7r or to 27r, so the
f -values of the angles in the pieces add up to f (7r) = 0 resp. to f(27r) = O.
Thus for the sum of the masses we get the same value that we had attached
to these edges for P in the first place, namely O.
(2) Assuming that P and Q are equicomplementable, we can enlarge AI to a
superset AI' that also includes all the dihedral angles appearing in any of the
pieces involved. !'vJ' is finite, since we only consider finite decompositions.
Then our lemma above allows us to extend f to f' : 1"("U') --+ Q, and
hence part (1) yields an equation of the type
Df' (P)+Df'(Pd+... +Dr(Pm) = Dr' (Q)+Df' (Qd+... +Df' (Qm)
where Dr' (Pi) = Df' (Q;) since Pi and Q i are congruent. Hence we
conclude D j (P) = D j (Q), a contradiction. D
Example 1. For a regular tetrahedron To with edge lengths f, we calculate
the dihedral angle from the sketch. The midpoint lU of the base triangle
divides the height AE of the base triangle by I :2, and since lAB I = IDEI,
we find cas a = t, and thus
a = arccos t.
C Setting AI := {o, 7r} we note that the ratio
a 1 I
-; = -; arccos :3
is irrational, according to Theorem 4 of Chapter 6 (taking n = 9). Thus
the Q-vector space 1"(AI) is 2-dimensional with basis AI, and there is a
Q-Iinear function f : Y(AI) --+ Q with
f(a) := I, f(7r) := O.
-
Hilbert's third problem: decomposing polyhedra
43
For this f we have
Dj'(To) = 6£f(0:) = 6£ ic 0,
and thus a regular tetrahedron cannot be equidecomposable or equicom-
plementable with a cube, since the Dehn invariant of a cube vanishes for
any f.
Example 2. Let Tl be a tetrahedron spanned by three orthogonal edges
A.B, A.C, A.D of length u. This tetrahedron has three dihedral angles that
are right angles, and three more dihedral angles of equal size ip, which we
calculate from the sketch as
IA.EI
cosip = -
, IDEI
V2u
&J3 V2u
1
J3'
It follows that
1
ip = arccos J3 '
For M := {, arccos , 7r}, the Q-vector space l' (M) has dimension 2.
In fact, 7r and are linearly dependent, so F( M) = F ( { arccos , 7r} ),
but there is no rational relation between arccos and 7r - equivalently,
v3
1. arccos is irrational, as we proved in Chapter 6 (take 71 = 3 in Thm. 4).
IT v:)
Thus we may construct a Q-linear map f by setting
f(7r) := 0 and f( arccos ) := 1,
from which we obtain f ( ) = 0 and hence
Dj'(Td = 3uf() +3(V2u)f(arccos ) = 3V2u ic O.
This proves that T] is not equidecomposable or equicomplementable with
a cube C of the same volume, since D j' (C) = 0 holds for any f.
Example 3. Finally, let T 2 be a tetrahedron with three consecutive edges
A.B, BC and CD that are mutually orthogonal (an "orthoscheme") and of
the same length u.
We will not calculate the angles in such a tetrahedron (they are , J,
and ), but rather argue that - using the midpoints of edges and faces,
and the center - a cube of edge length u can be decomposed into 6 tetra-
hedra of this type (3 congruent copies, and 3 mirror images).
All these congruent copies and mirror images have the same Dehn invari-
ants, and hence for every suitable functional f we will obtain
1
Dj'(T 2 ) = (iDj'(C) = 0
so all Dehn invariants of such a tetrahedron vanish! This solves Hilbert's
third problem, since we have before constructed a different tetrahedron, T] ,
with congruent bases and the same height, and with D j' (Td ic O. By
the Dehn-Hadwiger theorem Tl and T 2 are not equidecomposable, and not
even equicomplementable.
D
B
C
D
A.
44
Hilbert's third problem: decomposing polyhedra
Some familiar polytopes: tetrahedron,
cube and permutahedron
Appendix: Polytopes and polyhedra
A convex polytope in JRd is the convex hull of a finite set S = {81, . ., ,8n},
that is, a set of the form
n n
P conv(S)'- {LAi8i:A;?:O, LAi=l}.
i=1 ;=1
Polytopes are certainly familiar objects: prime examples are given by con-
vex polygons (2-dimensional convex polytopes) and by convex polyhedra
(3-dimensional convex polytopes).
There are several types of polyhedra that generalize to higher dimensions
in a natural way. For example, if the set S is affinely independent of
cardinality d + 1, then conv(S) is a d-dimensional simplex (or d-simplex).
For d = 2 this yields a triangle, for d = 3 we obtain a tetrahedron. Simi-
larly, squares and cubes are special cases of d-cubes, such as the unit d-cube
given by Cd = [0, l]d JRd.
General polytopes are defined as finite unions of convex polytopes. In this
book non-convex polyhedra will appear in connection with Cauchy's rigid-
ity theorem in Chapter 11, and non-convex polygons in connection with
Pick's theorem in Chapter 10, and again when we discuss the art gallery
theorem in Chapter 28.
Convex polytopes can, equivalently, be defined as the bounded solution sets
of finite systems oflinear inequalities. Thus every convex polytope P JRd
has a representation of the form
P = {xEJRd:A.x:S;b}
for some matrix A. E JR7nxd and a vector b E JR"'. In other words, Pis
the solution set of a system of Tn linear inequalities aT x :S; bi, where aT is
the i-th row of A.. Conversely, every bounded such solution set is a convex
polytope, and can thus be represented as the convex hull of a finite set of
points.
For polygons and polyhedra, we have the familiar concepts of vertices,
edges, and 2-faces. For higher-dimensional convex polytopes, we can de-
fine their faces as follows: a face of P is a subset F P of the fonn
P n {x E JRd : aTx = b}, where aT x S b is a linear inequality that is
valid for all points x E P.
All the faces of a polytope are themselves polytopes. The set V of vertices
(O-dimensional faces) of a convex polytope is also the inclusion-minimal set
such that conv(V) = P. Assuming that P JRd is a d-dimensional convex
polytope, the facets (the (d-I)-dimensional faces) determine a minimal set
of hyperplanes and thus of halfspaces that contain P, and whose intersec-
tion is P. In particular, this implies the following fact that we will need
later: Let F be a facet of P, denote by H F the hyperplane it determines,
and by Hj; and Hi the two closed half-spaces bounded by H F . Then one
of these two halfspaces contains P (and the other one doesn't).
-
Hilbert's third problem: decomposing polyhedra
45
The graph G(P) of the convex polytope P is given by the set V of ver-
tices, and by the edge set E of I-dimensional faces. If P has dimension 3,
then this graph is planar, and gives rise to the famous "Euler polyhedron
formula" (see Chapter 10).
Two polytopes P, P' <:;; JRd are congruent if there is some length-preserving
affine map that takes P to P'. Such a map may reverse the orientation
of space, as does the reflection of P in a hyperplane, which takes P to
a mirror image of P. They are combinatorially equivalent if there is a
bijection from the faces of P to the faces of P' that preserves dimension
and inclusions between the faces. This notion of combinatorial equivalence
is much weaker than congruence: for example, our figure shows a unit cube
and a "skew" cube that are combinatorially equivalent (and thus we would
call anyone of them "a cube"), but they are certainly not congruent.
A polytope (or a more general subset of JRd) is called centrally symmetric
if there is some point Xo E JRd such that
Xo + x E P {::::::} Xo - x E P.
In this situation we call Xo the center of P.
References
[I] V. G. BOLTIANSKII: Hilbert's Third Problem, V. H. Winston & Sons (Halsted
Press, John Wiley & Sons), Washington DC 1978.
[2] M. DEHN: Ueber raumgleiche Polyeder, Nachrichten von der Konig!.
Gesellschaft der Wissenschaften, Mathematisch-physikalische Klasse (1900),
345-354.
[3] M. DEHN: Ueber den Rauminhalt, Mathematische Annalen 55 (1902),
465-478.
[4] C. F. GAUSS: "Congruenz und Symmetrie": Briefwechsel mit Gerling,
pp. 240-249 in: Werke, Band VIII, Konig!. Gesellschaft der Wissenschaften
zu Gottingen; B. G. Teubner, Leipzig 1900.
[51 D. HILBERT: Mathematical Problems, Lecture delivered at the International
Congress of Mathematicians at Paris in 1900, Bulletin Amer. Math. Soc. 8
(1902),437-479.
[6] G. M. ZIEGLER: Lectures on Pol.vtopes, Graduate Texts in Mathematics 152,
Springer- Verlag New York 1995.
Combinatorially equivalent polytopes
Lines in the plane
and decompositions of graphs
Perhaps the best-known problem on configurations of lines was raised by
Sylvester in 1893 in a column of mathematical problems.
QUESTIONS FOR SOLUTION.
11851. (Professor SYLVESTIm.)-Prove that it is not possible to
arrange any finite number of real points 80 that a right line. th.rough
every two of them shull pass through a third, unless they all he III the
same right line.
Whether Sylvester himself had a proof is in doubt, but a correct proof was
given by Tibor Gallai [Grunwald] some 40 years later. Therefore the fol-
lowing theorem is commonly attributed to Sylvester and Gallai. Subsequent
to Gallai's proof several others appeared, but the following argument due
to L. M. Kelly may be "simply the best."
Theorem 1. In any configuration ofn points in the plane, not all on a line,
there is a line which contains exactly two of the points.
. Proof. Let P be the given set of points and consider the set £: of all lines
which pass through at least two points of P. Among all pairs (P, g) with P
not on f, choose a pair (Po, fa) such that Po has the smallest distance to go,
with Q being the point on fa closest to Po (that is, on the line through Po
vertical to fa).
Claim. This line go does it!
If not, then go contains at least three points of P, and thus two of them, say
PI and P 2 , lie on the same side of Q. Let us assume that PI lies between
Q and P 2 , where PI possibly coincides with Q. The figure on the right
shows the configuration. It follows that the distance of PI to the line gi
determined by Po and P 2 is smaller than the distance of Po to fa, and this
contradicts our choice for fa and Po. 0
In the proof we have used metric axioms (shortest distance) and order
axioms (PI lies between Q and P 2 ) of the real plane. Do we really need
these properties beyond the usual incidence axioms of points and lines?
Well, some additional condition is required, as the famous Fano plane de-
picted in the margin demonstrates. Here P = {I, 2, . ., , 7} and £: consists
of the 7 three-point lines as indicated in the figure, including the "line"
{4, 5, 6}. Any two points determine a unique line, so the incidence axioms
are satisfied, but there is no 2-point line. The Sylvester-Gallai theorem
therefore shows that the Fano configuration cannot be embedded into the
real plane such that the seven collinear triples lie on real liiles: there must
always be a "crooked" line. 1
Chapter 8
J. J. Sylvester
go
Q
PI
3
6
2
48
Lines in the plane, and decompositions of graphs
Q
P P 3
P 4
However, it was shown by Coxeter that the order axioms wiIl suffice for
a proof of the Sylvester-GaIlai theorem. Thus one can devise a proof that
does not use any metric properties - see also the proof that we will give in
Chapter 10, using Euler's formula.
The Sylvester-Gallai theorem directly implies another famous result on
points and lines in the plane, due to Paul Erdos and Nicolaas G. de Bruijn.
But in this case the result holds more generally for arbitrary point-line
systems, as was observed already by Erdos and de Bruijn. We will discuss
the more general result in a moment.
Theorem 2. Let P be a set of n 3 points in the plane, not all on a line.
Then the set £: of lines passing through at least two points contains at least
n lines.
Pn
. Proof. For n = 3 there is nothing to show. Now we proceed by induction
on n. Let IPI = n + 1. By the previous theorem there exists a line eo E £:
containing exactly two points P and Q of P. Consider the set P' = P\Q
and the set £:' of lines determined by P'. If the points of p' do not all lie
on a single line, then by induction 1£:'1 n and hence I£:I n + 1 because
of the additional line eo in £:. If, on the other hand, the points in P' are all
on a single line, then we have the "pencil" which results in precisely n + 1
lines. 0
Now, as promised, here is the general result, which applies to much more
general "incidence geometries."
Theorem 3. Let X be a set of n 3 elements, and let AI," . , Am be
proper subsets of X, such that every pair of elements of X is contained in
precisely one set Ai. Then m n holds.
. Proof. The following proof, variously attributed to Motzkin or Conway,
is almost one-line and truly inspired. For x E X let Tx be the number of
sets Ai containing x. (Note that 2 ::; Tx < m by the assumptions.) Now if
x rf. Ai, then T x IAi I because the IAi I sets containing x and an element
of Ai must be distinct. Suppose m < n, then mlAil < nT x and thus
m(n - IAil) > n(m - Tx) for x Ai, and we find
1-"'.1-'" '" 1 >'" '" 1 ="'.1.=1
- L... n - L... L... n(m-Tx} L... L... m(n-IA,I) L... m '
xEX xEX Ai :xllAi A, x:xllAi A-
which is absurd.
o
There is another very short proof for this theorem that uses linear algebra.
Let B be the incidence matrix of (X; AI,... , Am), that is, the rows in B
are indexed by the elements of X, the columns by AI,... , :-lm, where
Bd := {
if x E A
if x rf. A.
Consider the product BB T . For x i- x' we have (BnT)xx'
1, since x
Lines in the plane, and decompositions (}f graphs
and x' are contained in precisely one set Ai, hence
BB T
['"r
1
)+(:
o
1
1
; )
o
r.1:2 -1
o
r Xn -1
o
where r x is defined as above. Since the first matrix is positive definite (it has
only positive eigenvalues) and the second matrix is positive semi-definite
(it has the eigenvalues nand 0), we deduce that BET is positive definite
and thus, in particular, invertible, implying rank(BB T ) = n. It follows
that the rank of the (n x m)-matrix B is at least n, and we conclude that
indeed n m, since the rank cannot exceed the number of columns.
Let us go a little beyond and turn to graph theory. (We refer to the review of
basic graph concepts in the appendix to this chapter.) A moment's thought
shows that the following statement is really the same as Theorem 3:
If we decompose a complete graph K" into m cliques different
from KIl' such that every edge is in a unique clique, then m :::: n.
Indeed, let X correspond to the vertex set of KIl and the sets Ai to the
vertex sets of the cliques, then the statements are identical.
Our next task is to decompose K 11 into complete bipartite graphs such
that again every edge is in exactly one of these graphs. There is an easy
way to do this. Number the vertices {1, 2, . .. , n}. First take the com-
plete bipartite graph joining I to all other vertices. Thus we obtain the
graph K i . ll - i which is called a star. Next join 2 to 3,... , n, result-
ing in a star Ki,n-2' Going on like this, we decompose KIl into stars
Ki,ll-i,Ki,Il-2"" ,Ki,i' This decomposition uses n -1 complete bi-
partite graphs. Can we do better, that is, use fewer graphs? No, as the
following result of Ron Graham and Henry O. Pollak says:
Theorem 4. If Kn is decomposed into complete bipartite subgraphs
Hi, . . . , Hrn, then m :::: n - 1.
The interesting thing is that, in contrast to the Erdos-de Bruijn theorem, no
combinatorial proof for this result is known! All of them use linear algebra
in one way or another. Of the various more or less equivalent ideas let us
look at the proof due to Tverberg, which may be the most transparent.
. Proof. Let the vertex set of KIl be {1,. .. , n}, and let Lj, Rj be the
defining vertex sets of the complete bipartite graph Hj, j = 1,... , m.
To every vertex i we associate a variable Xi. Since Hi,'" , Hm decom-
pose K Il , we find
rn
"'"" x x .
' J
i<j
L( L Xa' L Xb).
k=i aEL k bERk
Now suppose the theorem is false, m < n - 1. Then the system of linear
49
1JJ
t
1\:
.
-71
.
.
.
\ .
.
(I) A decomposition of J( 5 into 4 complete
bipartite subgraphs
50
Lines in the plane, and decompositions of graphs
A graph G with 7 vertices and 11 edges.
It has one loop, one double edge and one
triple edge.
The complete graphs Kn on n vertices
and () edges
The complete bipartite graphs Km+n
with rn + n vertices and mn edges
equations
.r1 + . . . + X n 0,
L Xa
()
(k = 1,.. . ,m)
aEL k
has fewer equations than variables, hence there exists a non-trivial solution
C1,... ,C n . From (I) we infer
L CiCj 0.
i<j
But this implies
( .)
0= C1+...+Cn)-
n
Lci +2LC;Cj
i=1 i<j
n
L ')
c
I
> 0,
;=1
a contradiction, and the proof is complete.
D
Appendix: Basic graph concepts
Graphs are among the most basic of all mathematical structures. Corre-
spondingly, they have many different versions, representations, and incar-
nations. Abstractly, a graph is a pair G = (F, E), where l' is the set of
vertices, E is the set of ed[?es, and each edge f' E E "connects" two ver-
tices v, tv E F. We consider only finite graphs, where l' and E are finite.
Usually, we deal with simple graphs: Then we do not admit loops, i. e.,
edges for which both ends coincide, and no multiple ed[?es that have the
same set of endvertices. Vertices of a graph are called adjacent or nei[?hhors
if they are the end vertices of an edge. A vertex and an edge are called
incident if the edge has the vertex as an end vertex.
Here is a little picture gallery of important (simple) graphs:
KJKL
. .
K1,1
><
K 2 ,2
K2,1
K3'3
..,.........-
Lines in the plane, and decompositions of [?raphs
51
PI' P/\
cG8
Two graphs G = (1/, E) and G' = (1/', E') are considered isomorphic if
there are bijections ' -+ 1" and E -+ E' that preserve the incidences be-
tween edges and their endvertices. (It is a major unsolved problem whether
there is an efficient test to decide whether two given graphs are isomorphic.)
This notion of isomorphism allows us to talk about the complete graph /\5
on 5 vertices, etc.
G' = (", E') is a suh[?raph of G = (1/, E) if ",T' 1/, E' E, and every
edge C E E' has the same endvertices in G' as in G. G' is an induced
suh[?raph if, additionally, all edges of G that connect vertices of G' are also
edges of G'.
Many notions about graphs are quite intuitive: for example, a graph G
is connected if every two distinct vertices are connected by a path in G,
or equivalently, if G cannot be split into two nonempty subgraphs whose
vertex sets are disjoint.
We end this survey of basic graph concepts with a few more pieces of ter-
minology: A clique in G is a complete subgraph. An independent set in G
is an induced subgraph without edges, that is, a subset of the vertex set such
that no two vertices are connected by an edge of G. A graph is aj(Jrest if it
does not contain any cycles. A tree is a connected forest. Finally, a graph
G = (', E) is hipartite if it is isomorphic to a subgraph of a complete bi-
partite graph, that is, if its vertex set can be written as a union T = 1/1 U V2
of two independent sets.
References
[I] N. G. DE BRUUN & P. ERDOS: On a combinatorial prohlem, Proc. Kon. Ned.
Akad. Wetensch. 51 (1948),1277-1279.
12J H. S. M. COXETER: A problem of collinear points, Amer. Math. Monthly 55
(1948),26-28 (contains Kelly's proof).
[3] P. ERDOS: Problem 4065 Three point collinearity, Amer. Math. Monthly 51
(1944),169-171 (contains Gallai's proof).
[4] R. L. GRAHAM & H. O. POLLAK: On the addressing problemj(Jr loop switch-
ing, Bell System Tech. J. 50 (1971),2495-2519.
[5] J. J. SYLVESTER: Mathematical Question 11851, The Educational Times 46
(1893), 156.
[6] H. TVERBERG: Oil the decompositioll (!{ Kn into complete hipartite graphs,
J. Graph Theory 6 (1982), 493-494.
The paths Pn with n vertices
The cycles C n with n vertices
is a subgraph of
..,......-
The slope problem
Try for yourself - before you read much further - to construct config-
urations of points in the plane that determine "relatively few" slopes. For
this we assume, of course, that the n 2: 3 points do not all lie on one line.
Recall from Chapter 8 on "Lines in the plane" the theorem of Erdos and de
Bruijn: the n points will determine at least n different lines. But of course
many of these lines may be parallel, and thus determine the same slope.
L.
n=3 n=4 n=5 n=6 n=7
3 slopes 4 slopes 4 slopes 6 slopes 6 slopes
or
ff.
n=3
3 slopes
n=4
4 slopes
n=5
4 slopes
n=6
6 slopes
n=7
6 slopes
After some attempts at finding configurations with fewer slopes you might
conjecture - as Scott did in 1970 - the following theorem.
Theorem. If n 2: 3 points in the plane do not lie on one single line,
then they determine at least n - 1 different slopes, where equality is
possible only if n is odd and n 2: 5.
Our examples above - the drawings represent the first few configurations
in two infinite sequences of examples - show that the theorem as stated is
best possible: for any odd n 2: 5 there is a configuration with n points that
determines exactly n - 1 different slopes, and for any other n 2: 3 we have
a configuration with exactly n slopes.
Chapter 9
A little experimentation for small n will
probably lead you to a sequence such as
the two depicted here.
54
. . . . .
. . . .
. . . . . . . .
. . . .
. . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
Three pretty sporadic examples from the
Jamison-Hill catalogue
This configuration of n = 6 points
determines t = 6 different slopes.
I
.
.
3 4 5
. .
6
.
2
.
---0-----0---0-
2 345 6
Here a vertical starting direction yields
7fo = 123456
The slope problem
However, the configurations that we have drawn above are by far not the
only ones. For example, Jamison and Hill described four infinite families
of configurations, each of them consisting of configurations with an odd
number n of points that determine only n - 1 slopes ("slope-critical con-
figurations"). Furthermore, they listed 102 "sporadic" examples that do not
seem to fit into an infinite family, most of them found by extensive com-
puter searches.
Conventional wisdom might say that extremal problems tend to be very
difficult to solve exactly if the extreme configurations are so diverse and
irregular. Indeed, there is a lot that can be said about the structure of slope-
critical configurations (see (21), but a classification seems completely out
of reach. However, the theorem above has a simple proof, which has two
main ingredients: a reduction to an efficient combinatorial model due to
Eli Goodman and Ricky Pollack, and a beautiful argument in this model by
which Peter Ungar completed the proof in 1982.
. Proof. (1) First we notice that it suffices to show that every "even" set
of n = 2m points in the plane (m 2: 2) determines at least n slopes. This
is so since the case n = 3 is trivial, and for any set of n = 2m + 1 2: 5
points (not all on a line) we can find a subset of n - 1 = 2m points, not all
on a line, which already determines n - 1 slopes.
Thus for the following we consider a configuration of n = 2m points in the
plane that determines t 2: 2 different slopes.
(2) The combinatorial model is obtained by constructing a periodic se-
quence of permutations. For this we start with some direction in the plane
that is not one of the configuration's slopes, and we number the points
1, . .. , n in the order in which they appear in the I-dimensional projec-
tion in this direction. Thus the permutation 7fo = 123...n represents the
order of the points for our starting direction.
Next let the direction perform a counterclockwise motion, and watch how
the projection and its permutation change. Changes in the order of the
projected points appear exactly when the direction passes over one of the
configuration's slopes.
But the changes are far from random or arbitrary: by performing a 180 0
rotation of the direction, we obtain a sequence of permutations
7fo --+ 7f1 --+ 7f2 --+ .,. --+ 7ft-1 --+ 7ft
which has the following special properties:
. The sequence starts with 7fo = 123...n and ends with 7ft = n...321.
. The length t of the sequence is the number of slopes of the point con-
figuration.
. In the course of the sequence, every pair i < j is switched exactly
once. This means that on the way from 7fo = 123...n to 7ft = n...321,
only increasinR substrings are reversed.
.,.......
The slope prohlem
..
55
. Every move consists in the reversal of one or more disjoint increasing
substrings (corresponding to the one or more lines that have the direc-
tion which we pass at this point).
7r6 = 654321
7r5 = 652341
7r2 = 213564
7r] = 213546
7ro = 123456
By continuing the circular motion around the configuration, one can view
the sequence as a part of a two-way infinite, periodic sequence of permuta-
tions
--+ 7r-1 --+ 7ro --+ ... --+ trt --+ 7rt+1 --+ ... --+ 7r2t --+ ...
where 7ri+1 is the reverse of 7ri for all i, and thus 7ri+2t = 7ri for all i E Z.
We will show that every sequence with the above properties (and t 2: 2)
must have length t 2: n.
(3) The proof's key is to divide each permutation into a "left half" and a
"right half" of equal size Tn = , and to count the letters that cross the
imaginary harrier between the left half and the right half.
Call 7ri --+ 7ri+ 1 a crossing move if one of the substrings it reverses does
involve letters from both sides of the barrier. The crossing move has order
d if it moves 2d letters across the barrier, that is, if the crossing string has
exactly d letters on one side and at least d letters on the other side. Thus in
our example
7r2 = 2 13:56 4 ---+ 2 65:31 4 = 7r3
is a crossing move of order d
which we mark by":"),
2 (it moves 1,3,5,6 across the barrier,
65 2:34 1 ---+ 654:321
Getting the sequence of permutations
for our small example
A crossing move
265314
213564
56
The slope problem
is crossing of order d 2 = 1, while for example
A touching move
An ordinary move
652341
/ 625314
6
213564
213546
6 25 :3 14 ----+ 6 52 :3 41
is not a crossing move.
In the course of the sequence 7ro --+ 7r1 --+ ... --+ 7ri, each of the letters
1,2, . .. ,n has to cross the barrier at least once. This implies that, if
d 1 , . ., ,de denote the orders of the c crossing moves, then we have
e
L2d i = #{lettersthatcrossthebarrier} 2: n.
;=1
This also implies that we have at least two crossing moves, since a crossing
move with 2d i = n occurs only if all the points are on one line, i. e. for
t = 1. Geometrically, a crossing move corresponds to the direction of a
line of the configuration that has less than m points on each side.
(4) A touching move is a move that reverses some string that is adjacent to
the central barrier, but does not cross it. For example,
7r4 = 6 25 :3 14 ----+ 6 52 :3 41 = 7r5
is a touching move. Geometrically, a touching move corresponds to the
slope of a line of the configuration that has exactly Tn points on one side,
and hence at most Tn - 2 points on the other side.
Moves that are neither touching nor crossing will be called ordinary moves.
For this
7r1 = 213:5 46 ----+ 213:5 64 = 7r2
is an example. So every move is either crossing, or touching, or ordinary,
and we can use the letters T, C, 0 to denote the types of moves. C(d) will
denote a crossing move of order d. Thus for our small example we get
T 0 C'(2) () T C(1)
7r 0 ----+ 7r 1 ----+ 7r 2 ----+ 7r:3 ----+ 7r 4 ----+ 7r 5 ----+ 7r 6 ,
or even shorter we can record this sequence as T, 0, C (2), 0, T, C (1).
(5) To complete the proof, we need the following two facts:
Between any two crossing moves, there is at least one touching
move.
Between any crossing move of order d and the next touching move,
there are at least d - 1 ordinary moves.
In fact, after a crossing move of order d the barrier is contained in a sym-
metric decreasing substring of length 2d, with d letters on each side of the
barrier. For the next crossing move the central barrier must be brought into
an increasing substring of length at least 2. But only touching moves affect
The slope problem
57
whether the barrier is in an increasing substring. This yields the first fact.
For the second fact, note that with each ordinary move (reversing some
increasinR substrings) the decreasing 2d-string can get shortened by only
one letter on each side. And, as long as the decreasing string has at least 4
letters, a touching move is impossible. This yields the second fact.
If we construct the sequence of permutations starting with the same initial
projection but using a clockwise rotation, then we obtain the reversed se-
quence of permutations. Thus the sequence that we do have recorded must
also satisfy the opposite of our second fact, namely
Between a touching move and the next crossing move, of order d,
there are at least d - 1 ordinary moves.
(6) The T -O-C -pattern of the infinite sequence of permutations, as derived
in (2), is obtained by repeating over and over again the T -O-C -pattern of
length t of the sequence 7fo ----+ . " ----+ 7ft. Thus with the facts of (5) we
see that in the infinite sequence of moves, each crossing move of order d is
embedded into a T -O-C -pattern of the type
T, 0,0, . .. , 0, C (d), 0, 0, . .. , (),
> d-1 > d-1
- -
(*)
of length 1 + (d - 1) + 1 + (d - 1) = 2d.
In the infinite sequence, we may consider a finite segment of length t that
starts with a touching move. This segment consists of substrings of the
type (*), plus possibly extra inserted T's. This implies that its length t
satisfies
c
t > L 2d i > n,
i=l
which completes the proof.
D
References
[I] J. E. GOODMAN & R. POLLACK: A combinatorial perspective on some
prohlems in geometry, Congressus Numerantium 32 (1981),383-394.
[2] R. E. JAMISON & D. HILL: A catalogue ()l slope-critical configurations,
Congressus Numerantium 40 (1983),101-125.
[3] p, R, SCOTT: On the sets of directions determined by n points, Amer.
Math. Monthly 77 (1970),502-505.
[4] p, UNGAR: 2N noncollinear points determine at least 2N directions, J,
Combinatorial Theory Ser. A 33 (1982), 343-347.
Three applications
of Euler's formula
A graph is planar if it can be drawn in the plane JE.2 without crossing edges
(or, equivalently, on the 2-dimensional sphere S2). We talk of a plane graph
if such a drawing is already given and fixed. Any such drawing decomposes
the plane or sphere into a finite number of connected regions, including
the outer (unbounded) region, which are referred to as faces. Euler's for-
mula exhibits a beautiful relation between the number of vertices, edges
and faces that is valid for any plane graph. Euler mentioned this result for
the first time in a letter to his friend Goldbach in 1750, but he did not have
a complete proof at the time. Among the many proofs of Euler's formula,
we present a pretty and "self-dual" one that gets by without induction. It
can be traced back to von Staudt's book "Geometrie der Lage" from 1847.
Euler's formula. If G is a connected plane graph with n vertices,
e edges and f faces, then
n - e + f = 2.
. Proof. Let T <;;; E be the edge set of a spanning tree for G, that is, of a
minimal subgraph that connects all the vertices of G. This graph does not
contain a cycle because of the minimality assumption.
We now need the dual [?raph G* of G: to construct it, put a vertex into the
interior of each face of G, and connect two such vertices of G* by edges that
correspond to common boundary edges between the corresponding faces. If
there are several common boundary edges, then we draw several connecting
edges in the dual graph. (Thus G* may have multiple edges even if the
original graph G is simple.)
Consider the collection T* <;;; E* of edges in the dual graph that corre-
sponds to edges in E\T. The edges in T* connect all the faces, since T
does not have a cycle; but also T* does not contain a cycle, since otherwise
it would separate some vertices of G inside the cycle from vertices outside
(and this cannot be, since T is a spanning subgraph, and the edges of T and
of T* do not intersect). Thus T* is a spanning tree for G*.
For every tree the number of vertices is one larger than the number of
edges. To see this, choose one vertex as the root, and direct all edges
"away from the root": this yields a bijection between the non-root ver-
tices and the edges, by matching each edge with the vertex it points at.
Applied to the tree T this yields n = eT + 1, while for the tree T* it yields
f = eT* + 1. Adding both equations we get n+f = (eT+1)+(eT.+1) =
e+ 2. D
Chapter 10
Leonhard Euler
v:::v
A plane graph G: n = 6, e = 10, f = 6
.........
..... -..
'.
..........
. . .
. .
. .
. .
. .
. .
. .
. .
. .
. . .
. . .
'. .
----- u,,_ -,,:b ":-::_._._._ ._-;;..:...,
..........
Dual spanning trees in G and in G*
60
Three applications of Euler'sformula
The five platonic solids
Here the degree is written next to each
vertex. Counting the vertices of given
degree yields 712 = 3, 713 = 0, 714 = 1,
715 = 2.
The number of sides is written into each
region. Counting the faces with a given
number of sides yields h = 1. h = 3,
14 = 1, 19 = 1, and I, = ° otherwise.
Euler's formula thus produces a strong numerical conclusion from a geo-
metric-topolo[?ical situation: the numbers of vertices, edges, and faces of a
finite graph G satisfy 71 - e + 1 = 2 whenever the graph is or can he drawn
in the plane or on a sphere.
Many well-known and classical consequences can be derived from Euler's
formula. Among them are the classification of the regular convex polyhedra
(the platonic solids), the fact that K" and K:u are not planar (see below),
and the five-color theorem that every planar map can be colored with at
most five colors such that no two adjacent countries have the same color.
But for this we have a much better proof, which does not even need Euler's
formula - see Chapter 27.
This chapter collects three other beautiful proofs that have Euler's formula
at their core. The first two - a proof of the Sylvester-Gallai theorem, and
a theorem on two-colored point configurations - use Euler's formula in
clever combination with other arithmetic relationships between basic graph
parameters. Let us first look at these parameters.
2
The degree of a vertex is the number of edges that end in the vertex, where
loops count double. Let 71; denote the number of vertices of degree 'I in G.
Counting the vertices according to their degrees, we obtain
71 = no + 711 + 712 + 713 + . . .
(I)
On the other hand, every edge has two ends, so it contributes 2 to the sum
of all degrees, and we obtain
2e = 771 + 2712 + 3713 + 4714 + . . .
(2)
You may interpret this identity as counting in two ways the ends of the
edges, that is, the edge-vertex incidences. The average degree d of the
vertices is therefore
d
2e
71
Next we count the faces of a plane graph according to their number of sides:
a k-face is a face that is bounded by k edges (where an edge that on both
sides borders the same region has to be counted twice!). Let Ik be the
number of k-faces. Counting all faces we find
1 = h + h + 13 + 14 + . . .
(3)
Counting the edges according to the faces of which they are sides, we get
2e = h+2h+3h+414+'"
(4)
As before, we can interpret this as double-counting of edge-face incidences.
Note that the average number of sides of faces is given by
- 2e
1 - -
- 1"
Three applications of Euler 's formula
61
Let us deduce from this - together with Euler's formula - quickly that the
complete graph K5 and the complete bipartite graph l(J,:; are not planar.
For a hypothetical plane drawing of K5 we calculate n = 5, e = G) = 10,
thus I = c + 2 - n = 7 and 1 = 7- = 27 0 < 3. But if the average number
of sides is smaller than 3, then the embedding would have a face with at
most two sides, which cannot be.
Similarly for K 3 . 3 we get n = 6, e = 9, and I = e + 2 - n = 5, and thus
1 = 7- = 15 R < 4, which cannot be since l(3,:3 is simple and bipartite, so
all its cycles have length at least 4.
It is no coincidence, of course, that the equations (3) and (4) for the Ii'S look
so similar to the equations (1) and (2) for the ni's. They are transformed
into each other by the dual graph construction G --+ G* explained above.
From the double counting identities, we get the following important "local"
consequences of Euler's formula.
Proposition. Let G he any non empty simple plane graph. Then
(A) G has a vertex qfdegree at most 5.
(B) G has at most 3n - 6 edges.
(C) If the edges of G are two-colored, then there is a vertex of G with at
most two color-changes in the cyclic order of the edges around the
vertex.
. Proof. For each of the three statements, we may assume that G is con-
nected.
(A) Every face has at least 3 sides (since G is simple), so (3) and (4) yield
I h + 14 + fe, + . . .
2e 3h + 4/ 1 + 5f5 + . . .
and thus 2e - 3f 2> o.
Now if each vertex has degree at least 6, then (I) and (2) imply
n n6 + n7 + n8 + . . .
2e 6n6 + 7n7 + 8n8 + . ..
and thus 2e - 6n 2> 0,
Taking both inequalities together, we get
6(e-n-.f) = (2e-6n)+2(2e-3.f) > 0
and thus e 2> n + f, contradicting Euler's formula.
(B) As in the first step of part (A), we obtain 2e - 3f 2> 0, and thus
3n - 6 = 3e - 3 f 2> c
from Euler's formula.
K5 drawn with one crossing
K 3.3 drawn with one .crossing
62
Three applications (f'Euler'sformula
Arrows point to the corners with color
changes
A
(:)
(C) Let c be the number of corners where color changes occur. Suppose the
statement is false, then we have r 4n color changes, since at every vertex
there is an even number of changes. Now every face with 2k or 2k + 1 sides
has at most 2k such corners, so we conclude that
4n < c <
<
213 + 414 + 4f.s + 616 + 617 + 8fs + ...
213 + 414 + 6f.s + 816 + 1017 + ...
2(313 + 414 + 5h + 616 + 717 + ...)
-4(13 + 14 + 15 + 16 + 17 + . . . )
4e - 41
using again (3) and (4). So we have e n + 1, again contradicting Euler's
formula. D
1. The Sylvester-Gallai theorem, revisited
It was first noted by Norman Steenrod, it seems, that part (A) of the propo-
sition yields a strikingly simple proof of the Sylvester-Gallai theorem (see
Chapter 8).
The Sylvester-Gallai theorem. Given any set (f' n 3 points in
the plane, not all on one line, there is alwa.vs a line that contains
exactly two of'the points.
. Proof. (Sylvester-Gallai via Euler)
If we embed the plane JR2 in JR3 near the unit sphere 52 as indicated in
our figure, then every point in JR2 corresponds to a pair of antipodal points
on 52, and the lines in JR2 correspond to great circles on 52. Thus the
Sylvester-Gallai theorem amounts to the following:
Given any set ofn 3 pairs of antipodal points on the sphere, not
all on one great circle, there is always a great circle that contains
exactly two of the antipodal pairs.
Now we dualize, replacing each pair of antipodal points by the correspond-
ing great circle on the sphere. That is, instead of points :::I::v E 52 we
consider the orthogonal circles given by C v := {x E 52 : (x, v) = O}.
(This C v is the equator if we consider v as the north pole of the sphere.)
Then the Sylvester-Gallai problem asks us to prove:
Given any collection ofn 3 great circles on 52, not all of them
passing through one point, there is always a point that is on exactly
two of the great circles.
But the arrangement of great circles yields a simple plane graph on 52,
whose vertices are the intersection points of two of the great circles, which
divide the great circles into edges. All the vertex degrees are even, and they
are at least 4 - by construction. Thus part (A) of the proposition yields the
existence of a vertex of degree 4. That's it! D
Three applications of Euler 's formula
63
2. Monochromatic lines
The following "colorful" relative of the Sylvester-Gallai theorem is due to
Don Chakerian.
Theorem. Given any finite configuration of "black" and "white" points
in the plane, not all on one line, there is always a "monochromatic" line:
a line that contains at least two points of" one color and none of the other.
. Proof. As for the Sylvester-Gallai problem, we transfer the problem to
the unit sphere and dualize it there. So we must prove:
Given any finite collection of "black" and "white" great circles on
the unit sphere, not all passing through one point, there is always
an intersection point that lies either only on white great circles, or
only on hlack great circles.
Now the (positive) answer is clear from part (C) of the proposition, since
in every vertex where great circles of different colors intersect, we always
have at least 4 corners with sign changes. D
3. Pick's theorem
Pick's theorem from 1899 is a beautiful and surprising result in itself, but
it is also a "classical" consequence of Euler's formula. For the following,
call a convex lattice polygon P S;; JR2 elementary if its vertices are integral,
but if it does not contain any further lattice points.
Lemma. Every elementary triangle 6. = conv{po, Pl , P2} S;; JR2 has area
:1(6.) = .
. Proof. Both the parallelogram P with corners Po, P1 , P2, P1 + P2 - Po
and the lattice 7/} are symmetric with respect to the map
(J: x f-----t Pi + P2 - x,
which is the reflection with respect to the center of the segment from Pi
to P2' Thus the parallelogram P = 6. U 0'(6.) is elementary as well, and
its integral translates tile the plane. Hence {Pi - Po, P2 - Po} is a basis of
the lattice 7/} , it has determinant 1, P is a of area 1, and 6. has area . (For
an explanation of these terms see the box on the next page.) D
Theorem. The area of any (not necessarily convex) polygon Q S;; JR2 with
integral vertices is given by
A(Q)
1
nint + -'(I'bd - 1,
2
. . .
P2
. .
. .
Po
.
.
.
.
where nint and nbd are the numbers (f" integral points in the interior
respectively on the boundary ofQ. nint = 11, nbd = 8, so
.
64
Three applications of Euler's formula
Lattice bases
A basis of;f} is a pair of linearly independent vectors e 1, e2 such that
.)
Z- = {A1e1 + A2e2 : A1, A2 E Z}.
Let e1 = () and e2 = (), then the area of the parallelogram
spanned by e1 and e2 is given by .4(el,e2) = Idet(el,e2)1 =
1 det ( ) I. If f 1 = G) and f 2 = () is another basis, then
there exists an invertible Z-matrix Q with (;) = ()Q. Since
QQ -1 = (6 n, and the determinants are integers, it follows that
1 det QI = 1, and hence I det(f 1, f 2) 1 = 1 det( e1, e2)1. Therefore
all basis parallelograms have the same area 1, since A ( (6), ()) = 1.
.
. Proof. Every such polygon can be triangulated using all the 7Iin! lattice
points in the interior, and all the nbd lattice points on the boundary of Q.
(This is not quite obvious, in particular if Q is not required to be convex, but
the argument given in Chapter 28 on the art gallery problem proves this.)
Now we interpret the triangulation as a plane graph, which subdivides the
plane into one unbounded face plus .f - 1 triangles of area , so
1
.1(Q) = 2(.f 1).
.
.
.
.
Every triangle has three sides, where each of the eint interior edges bounds
two triangles, while the ('bd boundary edges appear in one single triangle
each. So 3(.f -1) = 2eint +ebd and thus .f = 2(e - .f) -ebr/+3. Also, there
is the same number of boundary edges and vertices, ebd = 7Ibr/. These two
facts together with Euler's formula yield
.f
2(e - .f) - ('.bd + 3
2(n - 2) - nbd + 3
271in! + n/"l - 1,
and thus
A(Q)
(.f - 1)
D
nin! + nbr/ - 1.
References
[IJ G. D. CHAKERIAN: Sylvester's problem on collinear points and a relatil'e,
Amer. Math. Monthly 77 (1970),164-167.
[2J G. PICK: Geometrisches zur Zahlenlehre, Sitzungsberichte Lotos (Prag),
Natur-med. Verein fur Bohmen 19 (1899),311-319.
[3] G. K. C. VON STAUDT: Geometrie der Lage, Verlag der Fr. Korn'schen
Buchhandlung, Nurnberg 1847.
[41 N. E. STEENROD: Solution 40651Editorial Note, Amer. Math. Monthly 51
(1944), 170-171.
............--
Cauchy's rigidity theorem
A famous result that depends on Euler's formula (specifically, on part (C)
of the proposition in the previous chapter) is Cauchy's rigidity theorem for
3-dimensional polyhedra.
For the notions of congruence and of combinatorial equivalence that are
used in the following we refer to the appendix on polytopes and polyhedra
in the chapter on Hilbert's third problem, see page 44.
Theorem. If two 3-dimensional convex polyhedra P and pi are
combinatorially equivalent with corresponding facets being congru-
ent, then also the angles between corresponding pairs (If adjacent
facets are equal (and thus P is congruent to PI).
The illustration in the margin shows two 3-dimensional polyhedra that are
combinatorially equivalent, such that the corresponding faces are congru-
ent. But they are not congruent, and only one of them is convex. Thus the
assumption of convexity is essential for Cauchy's theorem!
. Proof. The following is essentially Cauchy's original proof. Assume
that two convex polyhedra P and pi with congruent faces are given. We
color the edges of P as follows: an edge is black (or "positive") if the
corresponding interior angle between the two adjacent facets is larger in P'
than in P; it is white (or "negative") if the corresponding angle is smaller
in P' than in P.
The black and the white edges of P together form a 2-colored plane graph
on the surface of P, which by radial projection, assuming that 0 is in the in-
terior of P, we may transfer to the surface of S2. If P and pi have unequal
corresponding facet-angles, then the graph is nonempty. With part (C) of
the proposition in the previous chapter we find that there is a vertex p that
is adjacent to at least one black or white edge, such that there are at most
two changes between black and white edges (in cyclic order).
Now we intersect P with a small sphere SE (of radius E) centered at the
vertex p, and we intersect P' with a sphere S; of the same radi us E centered
at the corresponding vertex p'. In SE and S; we find convex spherical
polygons Q and Q' such that corresponding arcs have the same lengths,
because of the congruence of the facets of P and pI, and since we have
chosen the same radius E.
Now we mark by + the angles of Q for which the corresponding angle
Chapter 11
Augustin Cauchy
66
Cauchy's rigidity theorem
Q:
Q':
in Q' is larger, and by - the angles whose corresponding angle of Q' is
smaller. That is, when moving from Q to Q' the + angles are "opened,"
the - angles are closed, while all side lengths and the unmarked angles
stay constant.
From our choice of p we know that some + or - sign occurs, and that in
cyclic order there are at most two +/- changes. If only one type of signs
occurs, then the lemma below directly gives a contradiction, saying that one
edge must change its length. If both types of signs occur, then (since there
are only two sign changes) there is a "separation line" that connects the
midpoints of two edges and separates all the + signs from all the - signs.
Again we get a contradiction from the lemma below, since the separation
line cannot be both longer and shorter in Q' than in Q. D
Cauchy's arm lemma.
If Q and Q' are convex (planar or spherical) n-gons, labeled as In
the figure,
Q 0 ' qnl
003
q2 OOn-l
002
ql qn
Q':
oo)
--
q;
a;
q;,
q
such that qiqi+1 = q;q:+l holds for the lengths of corresponding edges for
1 ::; i ::; n - 1, and OOi ::; a; holds for the sizes of cor responding anglesfor
2 ::;i ::; n - 1, then the "missing" edge length sati.fies
ql qn ::; q q;"
with equality if'and only if a; = a; holds for all i.
It is interesting that Cauchy's original proof of the lemma was false: a con-
tinuous motion that opens angles and keeps side-lengths fixed may destroy
convexity - see the figure! On the other hand, both the lemma and its
proof given here, from a letter by I. J. Schoenberg to S. K. Zaremba, are
valid both for planar and for spherical polygons.
. Proof. We use induction on n. The case n = 3 is easy: If in a triangle
we increase the angle, between two sides of fixed lengths a and b, then the
length c of the opposite side also increases. Analytically, this follows from
the cosine theorem
c 2 = a 2 + b 2 - 2abcos,
in the planar case, and from the analogous result
cas c = cas a cas b + sin a sin b cas,
in spherical trigonometry. Here the lengths a, b, r are measured on the
surface of a sphere of radius 1, and thus have values in the interval [0, 1T].
Cauchy's rigidity theorem
67
Now let n 4. If for any i E {2,... , n - I} we have ai = ai, then the
corresponding vertex can be cut off b y introdu cin g the dia gonal from qi-l
to qi+l resp. from qi-l to q;+1' with qi-l qi+1 = qi-l q;+I' so we are done
by induction. Thus we may assume ai < a; for 2 :'::: i :'::: n - 1.
Now we produce a new polygon Q* from Q by replacing a,,_] by the
largest possible angle a;'_1 :'::: a_1 that keeps Q* convex. For this we
replace q" by q, keeping all the other qi' edge lengths, and angles from Q.
If indeed we can choose n_1 = aj'_1 keeping Q* convex, then we get
- --
q] q" < ql q :'::: q( q, using the case n = 3 for the first step and induction
as above for the second.
Otherwise after a nontrivial move that yields
q1 q > q] qn
we "get stuck" in a situation where q:2, ql and q are collinear, with
q2q] + q] q = q:2q.
Now we compare this Q* with Q' and find
----. < "
q2qn _ q2qn
by induction on n (ignoring the vertex ql resp. q(). Thus we obtain
(* )
q; q:, > qq:, - q( q
(:1)
> q:2q ---0 ql q:2
(2)
(] )
>
q1q
ql (]",
where (*) is just the triangle inequality, and all other relations have already
been derived. D
We have seen an example which shows that Cauchy's theorem is not true
for non-convex polyhedra. The special feature of this example is, of course,
that a non-continuous "flip" takes one polyhedron to the other, keeping the
facets congruent while the dihedral angles "jump." One can ask for more:
Could there be, for some non-convex polyhedron, a continuous
deformation that would keep the facets fiat and congruent?
It was conjectured that no triangulated surface, convex or not, admits such
a motion. So, it was quite a surprise when in 1977 - more than 160 years
after Cauchy's work - Robert Connelly presented counterexamples: closed
triangulated spheres embedded in JR3 (without self-intersections) that are
flexible, with a continuous motion that keeps all the edge lengths constant,
and thus keeps the triangular faces congruent.
(I)
o
Q0
ql t q"
Q*
L / Q"-,.\
ql q
(2)
Q * Cj
. a;'_1
----- -----
q:2 q1 q
(3)
68
Cauchy's rigidity theurem
A beautiful example of a flexible sur-
face constructed by Klaus Steffen: The
dashed I ines represent the non-convex
edges in this "cut-out" paper model.
Fold the normal lines as "mountains"
and the dashed lines as "valleys." The
edges in the model have lengths 5, 10,
11, 12 and 17 units.
',,-
',,-
'''-,
"-
'''-,
The rigidity theory of surfaces has even more surprises in store: only very
recently Connelly, Sabitov and Walz managed to prove that when any such
flexing surface moves, the vulume it encloses must be constant. Their proof
is beautiful also in its use of algebraic machinery (outside the scope of
this book).
References
[I] A. CAUCHY: Sur les polygones et les polvedres, seconde mhnoire, J. Ecole
Poly technique XV Ie Cahier, Tome IX (1813),87; CEuvres Completes, IIe Serie.
Vol. I, Paris 1905,26-38.
[2J R. CONNELLY: A counterexample to the rigidity conjecturef!Jr polyhedra, lnst.
Haut. Etud. Sci., Pub!. Math. 47 (1978), 333-338.
[3] R. CONNELLY: The rigidity o/polyhedral surfaces, Mathematics Magazine 52
(1979),275-283.
[4] R. CONNELLY, L SABITOV & A. WALZ: The hel/ows conjecture, Beitrage
zur Algebra und Geometrie/Contributions to Algebra and Geometry 38 (1997),
1-10.
[5J 1. SCHOENBERG & S.K. ZAREMBA: On Cauchy's lemma concerning convex
polygons, Canadian J. Math. 19 (1967), 1062-1071.
--
Touching simplices
How many d-dimensional simplices can be positioned in Rd such
that they touch pairwise, that is, such that all their pairwise inter-
sections are (d - 1) -dimensional?
This is an old and very natural question. We shall call f(d) the answer to
this problem, and record f (1) = 2, which is trivial. For d = 2 the configu-
ration of four triangles in the margin shows f(2) :::: 4. There is no similar
configuration with five triangles, because from this the dual graph construc-
tion, which for our example with four triangles yields a planar drawing
of K4, would give a planar embedding of K5, which is impossible (see
page 61). Thus we have
f(2) = 4.
In three dimensions, f(3) :::: 8 is quite easy to see. For that we use the con-
figuration of eight triangles depicted on the right. The four shaded triangles
are joined to some point x below the "plane of drawing," which yields four
tetrahedra that touch the plane from below. Similarly, the four white trian-
gles are joined to some point y above the plane of drawing. So we obtain a
configuration of eight touching tetrahedra in R 3 , that is, f (3) :::: 8.
In 1965, Baston wrote a book proving f(3) :S 9, and in 1991 it took Zaks
another book to establish
f(3) = 8.
With f(l) = 2, f(2) = 4 and f(3) = 8, it doesn't take much inspiration to
arrive at the following conjecture, first posed by Bagemihl in 1965.
Conjecture. The maximal number of pairwise touching d-simplices in a
configuration in R d is
f(d) = 2 d .
The lower bound, f(d) :::: 2 d , is easy to verify "if we do it right." This
amounts to a heavy use of affine coordinate tranformations, and to an in-
duction on the dimension that verifies the following stronger result, due to
Zaks [4J.
Theorem 1. For every d :::: 2, there is a family of 2 d pairwise touching
d-simplices in Rd together with a transversal line that hits the interior of
every single one of them.
Chapter 12
f(2) 2: 4
f(3) 2: 8
:',_fJ1
/;7 ..\
,/ ,/
// /
fif ///
.... '"" /_Jj <'iiJ;;_
(0fc( ;;/ r>
\, \ \...J /. - "\ /
t\ '" /
} 1''' " /
.- -< ..
" .
f\\-...J
"Touching simplices"
70
Touching simplices
(-I,I)T
tI X2
--
04--
Xl
(-I,-I)T (O,-I)T
X2
, ItI
,
,
,
- - '
, Xl
,
,
,
.4' ,
. Proof. For d = 2 the family of four triangles that we had considered
does have such a transversal line. Now consider any d-dimensional con-
figuration of touching simplices that has a transversal line £. Any nearby
parallel line f' is a transversal line as well. If we choose f' and £ parallel
and close enough, then each of the simplices contains an orthogonal
(shortest) connecting interval between the two lines. Only a bounded part
of the lines f and f' is contained in the simplices of the configuration, and
we may add two connecting segments outside the configuration, such that
the rectangle spanned by the two outside connecting lines (that is, their con-
vex hull) contains ail the other connecting segments. Thus, we have placed
a "ladder" such that each of the simplices of the configuration has one of
the ladder's steps in its interior, while the four ends of the ladder are outside
the configuration.
Now the main step is that we perform an (affine) coordinate transformation
that maps JRd to JRd, and takes the rectangle spanned by the ladder to the
rectangle (half-square) as shown in the figure below, given by
R 1 = {(x1,x2,0,...,Of:-I:,:::x1:':::0;-I:':::x2:':::I}.
Thus the configuration of touching simplices 1 in JRd which we obtain
has the X 1 -axis as a transversal line, and it is placed such that each of the
simplices contains a segment
51 (a) = {(a,:r:2,0,...,0)T:-I:,:::x2:':::I}
in its interior (for some 0; with -1 < a < 0), while the origin 0 is outside
all simplices.
Now we produce a second copy of this configuration by reflecting the first
one in the hyperplane given by Xl = X2. This second configuration has the
x2- axi s as a transversal line, and each simplex contains a segment
2 T }
5 ((3) = {(Xl,;5',O,... ,0) : -1:,::: XI:'::: 1
in its interior, with -1 < ;5' < O. But each segment 5 1 (0;) intersects each
segment 5 2 (8), and thus the interior of each simplex of 1 intersects each
simplex of 2 in its interior. Thus if we add a new (d + I)-st coordinate
Xd+l, and take to be
1 2
{conv(PiU{-ed+d):PiE} U {conv(PjU{ed+d):PjE}
then we get a configuration of touching (d + I)-simplices in JRd+l. Fur-
thermore, the anti diagonal
.4 = {(x,-x,O,...,Of::rEJR} C JRd
intersects all segments 51 (a) and 52 (;5'). We can "tilt" it a little, and obtain
a line
L E = {(x,-x,O,... ,O,cx): X E JR} S;; JRd+l,
which for all small enough c > 0 intersects ail the simplices of. This
completes our induction step. D
..........-
Touching simplices
7I
In contrast to this exponential lower bound, tight upper bounds are harder
to get. A naive inductive argument (considering all the facet hyperplanes in
a touching configuration separately) yields only
f(d) :'::: (d + I)!,
and this is quite far from the lower bound of Theorem I. However, Micha
Perles found the following "magical" proof for a much better bound.
Theorem 2. Foralld 1, we have f(d) < 2 d +1.
. Proof. Given a configuration of 1 touching d-simplices PI , P 2 , . . . , P"
in JRd, first enumerate the different hyperplanes HI, H2, . .. , Hs spanned
by facets of the Pi, and for each of them arbitrarily choose a positive
side H;, and call the other side H i - .
For example, for the 2-dimensional configuration of r' = 4 triangles depicted
on the right we find s = 6 hyperplanes (which are lines for d = 2).
From these data, we construct the B-matrix, an (I x s)-matrix with entries
in {+I, -I,O}, as follows: we set
llo . {
Pi has a facet in Hj, and Pi S;; Ht,
Pi has a facet in Hj, and Pi S;; H j -,
Pi does not have a facet in Hj.
For example, the 2-dimensional configuration in the margin gives rise to
the matrix
n (
1
-1
-1
o
1
o
o
o
n
o
-1
1
-1
1
1
o
-1
o
o
1
o
Three properties of the B-matrix are worth recording. First, since every
d-simplex has d + 1 facets, we find that every row of B has exactly d + 1
nonzero entries, and thus has exactly 8 - (d + 1) zero entries. Secondly, we
are dealing with a configuration of pairwise touching simplices, and thus
for every pair of rows we find one column in which one row has a + 1 entry,
while the entry in the other row is -1. That is, the rows are different even
if we disregard their zero entries. Thirdly, the rows of B "represent" the
simplices Pi, via
p
,
n
(*)
j:B,j=1
n Hj-'
j:Bij=-1
H+ n
J
Now we derive from B a new matrix C, in which every row of B is replaced
by all the row vectors that one can generate from it by replacing all the zeros
by either + 1 or -1. Since each row of B has s - d - 1 zeros, and B has 1
rows, the matrix C has 2 s - d - 1 1 rows.
HI' A "
, H5
H 2
72 Touching simplices
For our example, this matrix C is a (32 x 6)-matrix that starts
1 1 1 1 1 1
1 1 1 1 1 -1
1 1 1 -1 1 1
1 1 1 -1 1 -1
1 -1 1 1 1 1
C= 1 1 1 1 1 1
1 -1 1 -1 1 1
1 -1 1 -1 1 -1
-1 -1 1 1 1 1
-1 -1 1 1 1 -1
Hl"A
. H5
The first row of the C -matrix represents
the shaded triangle, while the second
row corresponds to an empty intersec-
tion of the halfspaces. The point x leads
to the vector
(1-111-11)
that does not appear in the C-matrix.
where the first eight rows of C are derived from the first row of B, the
second eight rows come from the second row of B, etc.
The point now is that all the rows of C are different: if two rows are derived
from the same row of fl, then they are different since their zeros have been
replaced differently; if they are derived from different rows of B, then they
differ no matter how the zeros have been replaced. But the rows of Care
(::I: I)-vectors oflength 8, and there are only 2 8 different such vectors. Thus
since the rows of C are distinct, C can have at most 2' rows, that is,
2 8 - d - 1 r' < 2 8 .
However, not all possible (::I:1)-vectors appear in C, which yields a strict
inequality 2 8 - d - 1 r < 2 8 , and thus r < 2d+ I. To see this, we note that
every row of C represents an intersection of halfspaces - just as for the
rows of B before, via the formula (*). This intersection is a subset of the
simplex Pi, which was given by the corresponding row of B. Let us take
a point x E IR. d that does not lie on any of the hyperplanes Hj, and not in
any of the simplices Pi. From this x we derive a (::1:1 )-vector that records
for each j whether x E Hj or x E Hj-' This (::I:1)-vector does not occur
in C, because its halfspace intersection according to (*) contains x and thus
is not contained in any simplex Pi. D
References
[I] F. BAGEMIHL: A conjecture concerning neighboring tetrahedra, Amer. Math.
Monthly 63 (1956) 328-329.
[2] V. 1. D. BASTON: Some Properties of Polyhedra in Euclidean Space, Perga-
mon Press, Oxford 1965.
[3] M. A. PERLES: At most 2 d + 1 neighborly simplices in Ed, Annals of Discrete
Math. 20 (1984), 253-254.
[4] 1. ZAKS: Neighborly families of2 d d-simplices in Ed, Geometriae Dedicata
11 (1981), 279-296.
[5] 1. ZAKS: No nine neighborly tetrahedra exist, Memoirs Amer. Math. Soc.
No. 447, Vol. 91,1991.
--
Every large point set
has an obtuse angle
Around 1950, Paul Erdos conjectured that every set of more than 2 d points
in IR. d determines at least one obtuse angle, that is, an angle that is strictly
greater than . In other words, any set of points in IR. d which only has acute
angles (including right angles) has size at most 2 d . This problem was posed
as a "prize question" by the Dutch Mathematical Society - but solutions
were received only for d = 2 and for d = 3.
For d = 2 the problem is easy: The five points may determine a convex
pentagon, which always has an obtuse angle (in fact, at least one angle of
at least 108°). Otherwise we have one point contained in the convex hull
of three others that form a triangle. But this point "sees" the three edges of
the triangle in three angles that sum to 360°, so one of the angles is at least
120°. (The second case also includes situations where we have three points
on a line, and thus a 180 0 angle.)
Unrelated to this, Victor Klee asked a few years later - and Erdos spread
the question - how large a point set in IR. d could be and still have the
following "antipodality property": For any two points in the set there is a
strip (bounded by two parallel hyperplanes) that contains the point set, and
that has the two chosen points on different sides on the boundary.
Then, in 1962, Ludwig Danzer and Branko Griinbaum solved both prob-
lems in one stroke: they sandwiched both maximal sizes into a chain of
inequalities, which starts and ends in 2 d . Thus the answer is 2 d both for
Erdos' and for Klee's problem.
In the following, we consider (finite) sets S IR.d of points, their convex
hulls conv(S), and general convex polytopes Q IR. d . (See the box on
polytopes on page 44 for the basic concepts.) We assume that these sets
have the full dimension d, that is, they are not contained in a hyperplane.
Such sets touch if they have at least one boundary point in common, while
their interiors do not intersect. For any set Q IR. d and any vector s E IR. d
we denote by Q + s the image of Q under the translation that moves 0 to s.
Similarly, Q - s is the translate obtained by the map that moves s to the
anglO.
Don't be intimidated: this chapter is an excursion into d-dimensional
geometry, but the arguments in the following do not require any "high-
dimensional intuition," since they all can be followed, visualized (and thus
understood) in three dimensions, or even in the plane. Hence, our figures
will illustrate the proof for d = 2 (where a "hyperplane" is just a line),
and you could figure out the pictures for d = 3 (where a "hyperplane" is
a plane).
Chapter 13
(7
\ -.......:::>/
\ // ///
.
e
e- --
e
e
e
_-e---
74
Every large point set has an obtuse angle
Theorem 1. For every d, one has the following chain of inequalities:
(1 )
2 d < max # {S IR d 1<1: (8 i, 8 j, 8 k) ::; for any {8 i, 8 j, 8 d S}
(2) { I or any two points {8i, 8j} S there is a strip }
< max # S IR d S( i, j) that contains S, with 8i and 8 j lying in
the parallel boundary hyperplanes ofS(i,j)
(:J) maxi { s C IRd I he translates P8i ofP:= conv(S) intersect }
- In a common pomt, but they only touch
(,2 max # { S C IR d I the translates Q + 8i of some d-.dim.ensional }
- convex polytope Q IRd touch pairwIse
{ ! the translates Q* + 8i of some d-dimenSiOnal }
max # S IR d centrally.syn:metric convex polytope Q* IRd
touch paIrwIse
(6)
< 2 d
. Proof. We have six claims (equalities and inequalities) to verify. Let's
get going.
(1) Take S := {O, l}d to be the vertex set of the standard unit cube in IR d ,
and choose 8 i, 8 j, 8k E S. By symmetry we may assume that 8 j = 0 is
the zero vector. Hence the angle can be computed from
(8i,8d
cos <1:(8i, 8j, 8k) = 1 8 il1 8 kl
H ij + 8j
which is clearly nonnegative. Thus S is a set with ISI
obtuse angles.
(2) If S contains no obtuse angles, then for any 8i, 8j E S we may define
H ij +8i and H ij +8j to be the parallel hyperplanes through 8i resp. 8j that
are orthogonal to the edge [8i, 8j]. Here H ij = {x E IRd : (x, 8i-8j) = O}
is the hyperplane through the origin that is orthogonal to the line through
8i and 8j, and H + 8j = {x + 8j : x E H} is the translate of H that
passes through 8 j, etc. Hence the strip between H ij + 8i and Hi} + 8j
consists, besides 8i and 8 j, exactly of all the points x E IR such that the
angles <1: (8i, 8 j, x) and <1: (8 j, 8i, x) are non-obtuse. Thus the strip contains
all of S.
2 d that has no
.
H ij +8i \'
. \ .
\
\
\
.
(3) P is contained in the halfspace of H ij + 8j that contains 8i if and only
if P - 8 j is contained in the halfspace of Hij that contains 8i - 8 j: A prop-
erty "an object is contained in a halfspace" is not destroyed if we translate
both the object and the halfspace by the same amount (namely by -8j).
Similarly, P is contained in the halfspace of Hij + 8i that contains 8j if
and only if P - 8i is contained in the halfspace of H ij that contains 8 j - 8i.
Putting both statements together, we find that the polytope P is contained
in the strip between H ij + 8i and H ij + 8j if and only if P - 8i and P - 8j
lie in different halfspaces with respect to the hyperplane H ij .
--
Every large point set has an obtuse angle
75
This correspondence is illustrated by the sketch in the margin.
Furthermore, from Si E P = conv(S) we get that the origin 0 is contained
in all the translates P - Si (Sf E S). Thus we see that the sets P - Si
all intersect in 0, but they only touch: their interiors are pairwise disjoint,
since they lie on opposite sides of the corresponding hyperplanes H ij .
(4) This we get for free: "the translates must touch pairwise" is a weaker
condition than "they intersect in a common point, but only touch."
Similarly, we can relax the conditions by letting P be an arbitrary convex
d-polytope in ]Rd. Furthermore, we may replace S by -S.
(5) Here "2:" is trivial, but that is not the interesting direction for us. We
have to start with a configuration S <;:; ]Rd and an arbitrary d-polytope
Q <;:; ]R d such that the translates Q + Si (Si E S) touch pairwise. The
claim is that in this situation we can use
Q*:= U(x -y) E]Rd: x,y E Q}
instead of Q. But this is not hard to see: First, Q* is d-dimensional, convex,
and centrally symmetric. One can check that Q* is a polytope (its vertices
are of the form (q i - q j ), for vertices q i, q j of Q), but this is not important
for us.
Now we will show that Q + Si and Q + Sj touch if and only if Q* + Si and
Q* + Sj touch. For this we note, in the footsteps of Minkowski, that
(Q*+Si) n (Q* + Sj) ¥' 0
--, , " , " Q 1 ( I II ) 1 ( ' II )
{:::::::} :J qi, qi, qj, qj E : 2" qi - qi + Si = 2" qj - qj + Sj
--, , " , " Q 1 ( I II ) 1 ( I II )
{:::::::} :Jqi,qi,qj,qjE :2"qi+qj +Si=2"qj+qi +Sj
{:::::::} :3 qi, qj E Q : qi + Si = qj + Sj
{:::::::} (Q+si)n(Q+sj)¥'0
where in the third (and crucial) equivalence "{:::::::}" we use that every q E Q
can be written as q = (q + q) to get" {:::", and that Q is convex and thus
(q; + q'j), (qj + q;') E Q to see ",*".
Thus the passage from Q to Q* (known as Minkowski symmetrization) pre-
serves the property that two translates Q + Si and Q + S j intersect. That is,
we have shown that for any convex set Q, two translates Q + S i and Q + S j
intersect if and only if the translates Q* + Si and Q* + Sj intersect.
The following characterization shows that Minkowski symmetrization also
preserves the property that two translates touch:
Q + Si and Q + Sj touch if and only if they intersect, while Q + Si
and Q + S j + c(Sj - Si) do not intersect for any E > O.
(6) Assume that Q* + Si and Q* + Sj touch. For every intersection point
x E (Q* + Si) n (Q* + Sj)
H ij + Si
H ij + Sj
Si - Sj
H
'J
76
Every large point set has an obtuse angle
we have
Scaling factor, vol(F]) = vol(F)
x - Si E Q* and x - Sj E Q*,
thus, since Q* is centrally symmetric,
Si - x = -(x Si) E Q*
and hence, since Q* is convex,
(S; - Sj) = ((x - Sj) + (s; - x)) E Q*.
We conclude that (Si +S j) is contained in Q* + Sj for all i. Consequently,
for P := conv(5) we get
Pj := (P + Sj) = conv {(Si + Sj) : Si E 5} c;=; Q* + Sj,
which implies that the sets Pj = (P + S j) can only touch.
Finally, the sets Pj are contained in r, because all the points Si, Sj and
& (Si + Sj) are in P, since P is convex. But the Pj are just smaller, scaled,
translates of P, contained in P. The scaling factor is , which implies that
1
vol(Pj) = 2d vol(P),
since we are dealing with d-dimensional sets. This means that at most 2 d
sets Pj fit into P, and hence 151 ::; 2 d .
This completes our proof: the chain of inequalities is closed. 0
. .. but that's not the end of the story. Danzer and Grunbaum asked the
following natural question:
What happens if one requires all angles to be acute rather than just
non-obtuse, that is, if right angles are forbidden?
They constructed configurations of 2d - 1 points in ]Rd with only acute
angles, conjecturing that this may be best possible. Griinbaum proved that
this is indeed true for d ::; 3. But twenty-one years later, in 1983, Paul
Erdos and Zoltan Furedi showed that the conjecture is false - quite dra-
matically, if the dimension is high! Their proof is a great example for the
power of probabilistic arguments; see Chapter 32 for an introduction to the
"probabilistic method."
Theorem 2. For every d 1. there is a set 5 c;=; {O, l}d of L ( ) d J
points in]Rd (vertices of the unit d-cube) that determine only acute angles.
In particular, in dimension d = 35 there is a set of 76 > 2.35 - 1 points
with only acute angles.
L 1 ( 2 ) d J .
. Proof. Set Tn : = '2 V3 ' and pick 2m vectors
x(1),x(2),... ,x(2m) E {O,l}d
by choosing all their coordinates independently and randomly, to be either
o or 1, with probability & for each alternative. (You may toss a perfect coin
2md times for this; however, if d is large you may get bored by this soon.)
--'
.........-
Every large point set has an obtuse angle
77
Now three vectors x(i), xU), x(k) determine a right angle with apex x(j)
if and only if the scalar product
(x(i) - xU), x(k) - xU))
vanishes, that is, if we have
x(i)£ - xU)£ = 0 or x(k)£ - x(j)£ = 0
for each coordinate g. We call (i, j, k) a bad triple if this happens. (If
x(i) = xU) or xU) = x(k), then the angle is not defined, but also then
the triple (i, j, k) is certainly bad.)
The probability that one specific triple is bad is exactly ()d: Indeed, it
will be good if and only if, for one of the d coordinates R, we get
either
or
x(i)£ = x(k)£ = 0,
x(i)£ = x(k)£ = 1,
x(j)£ = 1,
xU)£ = O.
This leaves us with six bad options out of eight equally likely ones, and a
triple will be bad if and only if one of the bad options (with probability )
happens for each of the d coordinates.
The number of triples we have to consider is 3(2;"), since there are (2;')
sets of three vectors, and for each of them there are three choices for the
apex. Of course the probabilities that the various triples are bad are not
independent: but linearity of expectation (which is what you get by averag-
ing over all possible selections; see the appendix) yields that the expected
number of bad triples is exactly 3 (2;") () d. This means - and this is the
point where the probabilistic method shows its power - that there is some
choice of the 2m vectors such that there are at most 3 (2;n) () d bad triples,
where
3(2;n) ()d < 3 (2,,):3 ()d = m(2m)2 ()d :S m,
by the choice of m.
But if there are not more than m bad triples, then we can remove m of the
2m vectors x(i) in such a way that the remaining m vectors don't contain
a bad triple, that is, they detennine acute angles only. D
Appendix: Three tools from probability
Here we gather three basic tools from discrete probability theory which
will come up several times: random variables, linearity of expectation and
Markov's inequality.
Let (0, p) be a finite probability space, that is, 0 is a finite set and p = Prob
is a map from 0 into the interval [0,1] with I:wEOP(w) = 1. A random
variable X on 0 is a mapping X : n -----+ IR.. We define a probability space
on the image set X (0) by setting p(X = x) := I:X(wJ=x p(w). A simple
example is an unbiased dice (all p(w) = k) with X = "the number on top
when the dice is thrown."
78
Every large point set has an obtuse angle
The expectation EX of X is the average to be expected, that is,
EX = L p(w)X(w).
wEO
Now suppose X and Yare two random variables on 0, then the sum X + y
is again a random variable, and we obtain
E(X + }')
LP(w)(X(w) + Y(w))
w
LP(w)X(w) + LP(w)Y(w)
EX + EY.
w
w
Clearly, this can be extended to any finite linear combination of random
variables - this is what is called the linearity of expectation. Note that it
needs no assumption that the random variables have to be "independent"
. ,
In any sense.
Our third tool concerns random variables X which take only nonnegative
values, shortly denoted X 2: O. Let
Prob(X 2: a) =
L p(w)
w:X(w)2"a
be the probability that X is at least as large as some a > O. Then
EX L p(w)X(w) + L p(w)X(w) > a L p(w),
w:X(w)2"a
w:X(w)<a
w:X(w)2"a
and we have proved Markov's inequality
EX
Prob(X 2: a) ::;
a
References
[11 L. DANZER & B. GRUNBAUM: Uber zwei Probleme beziiglich konvexer
Kbrper von P. Erdos und van V. L. Klee, Math. Zeitschrift 79 (1962),95-99.
[21 P. ERDOS & Z. FUREDI: The greatest angle among n points in the
d-dimensional Euclidean space, Annals of Discrete Mathematics 17 (1983),
275-283.
[31 H. MINKOWSKI: Dichteste gitterfbrmige Lagerung kongruenter Kbrper,
Nachrichten Ges. Wiss. Gottingen, Math.-Phys. Klasse 1904,311-355.
.......-
Borsuk's conjecture
Karol Borsuk's paper "Three theorems on the n-dimensional euclidean
sphere" from 1933 is famous because it contained an important result
(conjectured by Stanislaw Vlam) that is now known as the Borsuk-Ulam
theorem:
Every continuous map f : Sd --+ IR. d maps two antipodal points of
the sphere Sd to the same point in IR.d.
The same paper is famous also because of a problem posed at its end, which
became known as Borsuk's Conjecture:
Can every set S IR.d of bounded diameter diam(S) > 0 be
partitioned into at most d + 1 sets of smaller diameter?
The bound of d + 1 is best possible: if S is a regular d-dimensional simplex,
or just the set of its d + 1 vertices, then no part of a diameter-reducing
partition can contain more than one of the simplex vertices. If f (d) denotes
the smallest number such that every bounded set S IR. d has a diameter-
reducing partition into f (d) parts, then the example of a regular simplex
establishes f (d) 2: d + 1.
Borsuk's conjecture was proved for the case when S is a sphere (by Borsuk
himself), for smooth bodies S (using the Borsuk-Ulam theorem), for d ::; 3,
. .' but the general conjecture remained open. The best available upper
bound for f (d) was established by Oded Schramm, who showed that
f(d) ::; (1.23)d
for all large enough d. This bound looks quite weak compared with the con-
jecture "f(d) = d + 1", but it suddenly seemed reasonable when Jeff Kahn
and Gil Kalai dramatically disproved Borsuk's conjecture in 1993. Sixty
years after Borsuk's paper, Kahn and Kalai proved that f(d) 2: (1.2)Vd
holds for large enough d.
A Book version of the Kahn-Kalai proof was provided by A. Nilli: brief
and self-contained, it yields an explicit counterexample to Borsuk's conjec-
ture in dimension d = 946. We present here a modification of this proof,
due to Andrei M. Raigorodskii and to Bernulf WeiBbach, which reduces the
dimension to d = 561, and even to d = 560. A new "record" of d = 323
was achieved by Aicke Hinrichs in the summer of 2000.
Chapter 14
Karol Borsuk
t
L:]6 D
Any d-simplex can be split into d + 1
pieces, each of smaller diameter.
80
Borsuk's conjecture
A. Nilli
( 1 )
-1
x -1
1
-1
x T ( 1 -1 -1 1 -1 )
( -; -1 -1 1 -; )
1 1 -1
xx T -1 1 1 -1
1 -1 -1 1 -1
-1 1 1 -1 1
Theorem. Let q = pm he a prime power, n := 4q - 2, and d := G) =
(2q - 1) (4q - 3). Then there is a set S {+ 1, _l}d ol2/1-2 points in JR.d
such that every partition of S, whose parts have smaller diameter than S,
has at least
2 n - 2
q-2
2: (nil)
;=0
parts. For q = 9 this implies that the Borsuk conjecture is false in dimen-
sion d = 561. Furthermore, f(d) > (1.2)Vd holds for all large enough d.
. Proof. The construction of the set S proceeds in four steps.
(1) Let q be a prime power, set n = 4q - 2, and let
q := {XE{+l,-l}n: x ] =1, #{i:xi=-l}iseven}.
This q is a set of 2/1-2 vectors in JR.n. We will see that (x, y) == 2 (mod 4)
holds for all vectors x, y E Q. We will call x, y nearl.v-orthogonal if
I(x, y)1 = 2. We will prove that any subset q' Q which contains no
-.) ]
nearly-orthogonal vectors must be "small": I Q'f ::; 2:;=0 (n i ).
(2) From q, we construct the set
R := {xxI': x E Q}
of2n-2 symmetric (n x n)-matrices of rank 1. We interpret them as vectors
with n 2 components, R JR.n 2 . We will show that there are only acute
angles between these vectors: they have positive scalar products, which are
at least 4. Furthermore, if R' C R contains no two vectors with minimal
scalar product 4, then IR'I is "sall": IR'I ::; 2:;Ig (nil
Vectors, matrices, and scalar products
In our notation all vectors x, y,... are column vectors; the trans-
posed vectors xl', yT,... are thus row vectors. The matrix product
xxI' is a matrix of rank 1, with (XXT)ij = XiXj.
If x, yare column vectors, then their scalar product is
(x,y) = LXiYi = xTy.
We will also need scalar products for matrices X, Y E JR.nxn which
can be interpreted as vectors of length n 2 , and thus their scalar
product is
(X, Y) .- L XijYij.
i,j
,..........-
Borsuk's conjecture
81
(3) From R, we obtain the set of points in ]]J) whose coordinates are the
subdiagonal entries of the corresponding matrices:
5 := {(XXT)i>j: xx T E R}.
Again, 5 consists of 2 n - 2 points. The maximal distance between these
points is precisely obtained for the nearly-orthogonal vectors x, y E Q.
We conclude that a subset 5' C 5 of smaller diameter than 5 must be
"small": 15'1 'S 'L,i (nil).
(4) Estimates: From (3) we see that one needs at least
2 4q - 4
g(q) := 'L,q-=-2 (4 q :-3)
1-0 "
parts in every diameter-reducing partition of 5. Thus
f(d) 2 max{g(q), d + I}
for d = (2q - 1)(4q - 3).
Therefore, whenever we have g(q) > (2q - 1)( 4q - 3) + 1, then we have a
counterexample to Borsuk's conjecture in dimension d = (2q - 1) (4q - 3).
We will calculate below that g(9) > 562, which yields the counterexample
in dimension d = 561, and that
e ( 27 ) q
g(q) > 64q2 16
which yields the asymptotic bound f (d) > (1.2) -Id for d large enough.
Details for (1): We start with some harmless divisibility considerations.
Lemma. The function P(z) := (=;) is a polynomial of degree q - 2. It
yields integer valuesforal! integers z. The integer P(z) is divisible by p if
and only ilz is not congruent to 0 or 1 modulo q.
. Proof. For this we write the binomial coefficient as
P(z) =
( z - 2 )
q-2
(z - 2Hz - 3) . '" . (z - q + 1)
(q-2)(q-3). '" '" .2.1
and compare the number of p-factors in the denominator and in the numer-
ator. The denominator has the same number of p-factors as (q - 2)!, or as
(q - I)!, since q - 1 is not divisible by p. Indeed, by the claim in the margin
we get an integer with the same number of p- factors if we take any product
of q - 1 integers, one from each non-zero residue class modulo q.
Now if z is congruent to 0 or 1 (mod q), then the numerator is also of this
type: All factors in the product are from different residue classes, and the
only classes that do not occur are the zero class (the multiples of q), and the
class either of -1 or of + 1, but neither + 1 nor -1 is divisible by p. Thus
denominator and numerator have the same number of p-factors, and hence
the quotient is not divisible by p.
(*)
Claim. If a == b =t 0 (modq), then
a and b have the same number of p-
factors.
. Proof. We have a = b + spm, where
b is not divisible by pm = q. So every
power pk that divides b satisfies k < m,
and thus it also divides a. The statement
is symmetric in a and b. 0
82
Borsuk's conjecture
On the other hand, if z oj 0, 1 (mod q), then the numerator of (*) contains
one factor that is divisible by q = pm. At the same time, the product has no
factors from two adjacent nonzero residue classes: one of them represents
numbers that have no p-factors at all, the other one has fewer p-factors
than q = pm. Hence there are more p- factors in the numerator than in the
denominator, and the quotient is divisible by p. 0
Now we consider an arbitrary subset Q' <; Q that does not contain any
nearly-orthogonal vectors. We want to establish that Q' must be "small."
Claim 1. lfx, yare distinct vectors from Q. then %( (x, y) + 2) is
an integer in the range
-(q-2) %((x,y) +2) q-1.
Both x and y have an even number of (-1 )-components, so the number of
components in which x and y differ is even, too. Thus
(x,y) = (4q - 2) - 2#{i: Xi -::J y;} == -2 (mod4)
for all x, y E Q, that is, % ((x, y) + 2) is an integer.
Fromx,y E {+1,_1}4 q -2 weseethat-(4q-2) (x,y) 4q-2,
that is, -(q - 1) %((x, y) + 2) q. The lower bound never holds with
equality, since Xl = YI = 1 implies that x -::J -Yo The upper bound holds
with equality only if x = y.
Claim 2. For any y E Q', the polynomial in n variables Xl, . .. , X n
of degree q - 2 given by
Fy(x) := P(%((x,y)+2)) = (%((X')_+22)-2)
satisfies that Fy (x) is divisible by p for every x E Q'\ {y}, but
notfor x = y.
The representation by a binomial coefficient shows that Fy (x) is an integer-
valued polynomial. For x = y, we get the value Fy(Y) = 1. For x -::J y,
the Lemma yields that Fy (x) is divisible by p if and only if % ((x, y) + 2) is
congruent to 0 or 1 (modq). By Claim I, this happens only if t((x, y) +2)
is either 0 or 1, that is, if (x, y) E {-2, +2}. So x and y must be nearly
orthogonal for this, which contradicts the definition of Q'.
Claim 3. The same is true for the polynomials F y(x) in the n-1
variables X2, . . . , X n that are obtained as follows: Expand Fy (x)
into monomials and remove the variable Xl, and reduce all higher
powers of other variables, by substituting Xl = 1. and x; = 1 for
i > 1. The polynomials F y(x) have degree at most q - 2.
The vectors x E Q <; {+ 1, -I} n all satisfy Xl = 1 and x; = 1. Thus
the substitutions do not change the values of the polynomials on the set Q.
They also do not increase the degree, so F y (x) has degree at most q - 2.
.....-
Borsuk's conjecture
83
Claim 4. There is no linear relation (with rational coefficients)
between the polynomials F y(x), that is, the polynomials F y(x),
y E Q', are linearly independent over iQ. In particular, they are
distinct.
Assume that there is a relation of the form 2.: yEQ ' Cty F y (x) = 0 such that
not all coefficients Ct y are zero. After multiplication with a suitable scalar
we may assume that all the coefficients are integers, but not all of them are
divisible by p. But then for every y E Q' the evaluation at x := y yields
that Cty F y(Y) is divisible by p, and hence so is ct y , since F y(Y) is not.
Claim 5. I Q'I is bounded by the number of sq uarefree monomials
of degree at most q - 2 in n - 1 variables, which is 2.:i (n1).
By construction the polynomials F yare squarefree: none of their mono-
mials contains a variable with higher degree than 1. Thus each F y (x) is a
linear combination of the squarefree monomials of degree at most q - 2 in
the n - 1 variables X2, . . . , Xn. Since the polynomials F y (x) are linearly
independent, their number (which is IQ'I) cannot be larger than the number
of monomials in question.
Details for (2): The first column of xx T is x. Thus for distinct x E Q
we obtain distinct matrices l'vl (x) := xx T . We interpret these matrices as
vectors of length n 2 with components XiXj. A simple computation
n n
(M(x),M(y))
L L(XiXj)(YiYj)
i=1 j=1
(tXiYi)(tXjyj) = (X,y)2 2 4
i= 1 j = 1
shows that the scalar product of M(x) and M(y) is minimized if and only
if x, y E Q are nearly-orthogonal.
Details for (3): Let U (x) E {+ 1, -I} d denote the vector of all sub-
diagonal entries of M(x). Since M(x) = xx T is symmetric with diagonal
values +1, we see that M(x) -::J M(y) implies U(x) -::J U(y). Further-
more,
4 < (M(x), M(y)) = 2(U(x), U(y)) + n,
that is, Al(x)
(U(x), U(y)) 2 - + 2,
with equality if and only if x and yare nearly-orthogo nal. Since all the vec-
tors U (x) E 5 have the same length V (U (x), U (x)) = Iff), this means
that the maximal distance between points U(x), U(y) E 5 is achieved
exactly when x and yare nearly-orthogonal.
Details for (4): For q = 9 we have g(9) :::::; 758.31, which is greater than
( 34 )
d + 1 = 2 + 1 = 562.
1
,
84
Borsuk '5 conjecture
To obtain a general bound for large d, we use monotonicity and unimodality
ofthe binomial coefficients and the estimates n! > e( )n and n! < cn( )"
(see the appendix to Chapter 2) and derive
2: q - 2 ( 4q-3 ) ( 4q ) _ (4q)! e4q( 4n 4q _4 q2 ( 256 ) q
<q -q-<q , ---.
i=O i q q!(3q)! e (;)q e en 3q e 27
Thus we conclude
f(d) > g(q)
2 4q - 4
e ( 27 ) q
q-2 > 64 q 2 16 .
2.:= (4 q ;3)
i=O
From this, with
d = (2q-l)(4q-3) 5 q 2+(q-3)(3q-l) 25 q 2 forq23,
q = + V + 6 1 4 > Ii, and U )JI > 1.2032,
we get
f(d) > d (1.2032)v'd > (1.2)Vd for all large enough d, 0
13
A counterexample of dimension 560 is obtained by noting that for q = 9
the quotient g(q) 758 is much larger than the dimension d(q) = 561.
Because of this one still obtains a counterexample of dimension 560 by
taking only "three fourths" of the set S, as given by those points of Q
which satisfy (Xl, X2, X3) -::J- (1, 1, 1),
Borsuk's conjecture is known to be true for d :::; 3, but it has not been
verified for any larger dimension. In contrast to this, it is true up to d = 8
if we restrict ourselves to subsets S <;;: {I, -1}d, as constructed above
(see [8 D. In either case it is quite possible that counterexamples can be
found in reasonably small dimensions,
References
[I] K. BORSUK: Drei Siitze iiber die n-dimensionale euklidische Sphare, Funda-
menta Math. 20 (1933),177-190.
[2] A. HINRICHS: Spherical codes and Borsuk's conjecture, Preprint, 2000.
[3] 1. KAHN & G. KALAl: A counterexample to Borsuk's conjecture, Bulletin
Amer. Math. Soc. 29 (1993), 60-62.
l4] A. NILLI: On Borsuk's problem, in: "Jerusalem Combinatorics '93"
(H. Barcelo and G. Kalai, eds.), Contemporary Mathematics 178, Amer. Math.
Soc. 1994, 209-2] O.
[5] A. M. RAIGORODSKII: On the dimension in Borsuk's problem, Russian Math,
Surveys (6) 52 (1997),1324-1325.
[6] O. SCHRAMM: Illuminating sets of constant width, Mathematika 35 (1988),
I 80-1 99.
[7] B. WEISSBACH: Sets with large Borsuk number, Beitrage zur Algebra und
Geometrie/Contributions to Algebra and Geometry, to appear.
[8] G. M. ZIEGLER: Coloring Hamming graphs, optimal binary codes, and the
O/I-Borsuk problem in low dimensions, Preprint, TV Berlin, June 2000.
......-
Analysis
. -----------
.--...........
-
I
1
15
Sets, functions,
and the continuum hypothesis 87
16
In praise of inequalities 99
17
A theorem of P61ya
on polynomials 107
18
On a lemma
of Littlewood and Offord 115
19
Cotangent
and the Herglotz trick 119
20
Buffon's needle problem 125
"Hilbert's seaside resort hotel"
..........-
Sets, functions,
and the continuum hypothesis
Set theory, founded by Georg Cantor in the second half of the 19th cen-
tury, has profoundly transformed mathematics. Modern day mathematics
is unthinkable without the concept of a set, or as David Hilbert put it: "No-
body will drive us from the paradise (of set theory) that Cantor has created
for us."
One of Cantor's basic concepts was the notion of the size or cardinality of
a set lvI, denoted by IlvII. For finite sets, this presents no difficulties: we
just count the number of elements and say that AI is an n-set or has size n,
if AI contains precisely n elements. Thus two finite sets AI and N have
equal size, IAII = IN I, if they contain the same number of elements.
To carry this notion of equal size over to infinite sets, we use the following
suggestive thought experiment for finite sets. Suppose a number of people
board a bus. When will we say that the number of people is the same as the
number of available seats? Simple enough, we let all people sit down. If
everyone finds a seat, and no seat remains empty, then and only then do the
two sets (of the people and of the seats) agree in number. In other words,
the two sizes are the same if there is a bijection of one set onto the other.
This is then our definition: Two arbitrary sets lvI and N (finite or infinite)
are said to be of equal size or cardinality, if and only if there exists a bi-
jection from AI onto N. Clearly, this notion of equal size is an equivalence
relation, and we can thus associate a number, called cardinal number, to
every class of equal-sized sets. For example, we obtain for finite sets the
cardinal numbers 0, 1,2, . .. , n, . " where n stands for the class of n-sets,
and, in particular, 0 for the empty set 0. We further observe the obvious fact
that a proper subset of a finite set AI invariably has smaller size than AI.
The theory becomes very interesting (and highly non-intuitive) when we
turn to infinite sets. Consider the set N = {I, 2, 3, . . . } of natural numbers.
We call a set l'vI countable if it can be put in one-to-one correspondence
with N. In other words, AI is countable if we can list the elements of AI as
m1, rn2, rn3, . . . . But now a strange phenomenon occurs. Suppose we add
to N a new element x. Then N U {x} is still countable, and hence has equal
size with N!
This fact is delightfully illustrated by "Hilbert's hotel." Suppose a hotel
has countably many rooms, numbered 1,2,3,... with guest 9i occupying
room i; so the hotel is fully booked. Now a new guest x arrives asking
for a room, whereupon the hotel manager tells him: Sorry, all rooms are
taken. No problem, says the new arrival, just move guest 91 to room 2,
92 to room 3, 93 to room 4, and so on, and I will then take room 1. To the
Chapter 15
Georg Cantor
91 92
88
Sets, functions, and the continuum hypothesis
X gl
i/!;i/i;t/
f////t
I I I I
./2/3/4
f/!/1
:2. :2.
./2
6
I
managers's surprise (he is not a mathematician) this works; he can still put
up all guests plus the new arrival X!
Now it is clear that he can also put up another guest y, and another one z,
and so on. In particular, we note that, in contrast to finite sets, it may well
happen that a proper subset of an infinite set AI has the same size as 111. In
fact, as we will see, this is a characterization of infinity: A set is infinite if
and only if it has the same size as some proper subset.
Let us leave Hilbert's hotel and look at our familiar number sets. The set
Z of integers is again countable, since we may enumerate Z in the fonn
Z = {O, 1, -1,2, -2, 3, -3,...}. Perhaps more surprisingly, the set iQ of
rational numbers is also countable. By listing the set iQ+ of positive ratio-
nals as suggested in the figure (leaving out numbers already encountered)
we see that iQ+ is countable, and hence so is iQ by listing 0 at the beginning
and - E right after E. With this listing
q q
I I I I 3 3
iQ = {0,1,-1,2,-2'2'-2'3'-3,3,-3,4,-4'2'-2""}'
Another way to interpret the figure is the following statement:
The union of countably many countable sets AI" is aRain countable.
Indeed, set 111" = {ani, a,,2, a n 3,"'} and list
=
U AI" = {all, a21, a12, a13, a22, a;31, a41, a;32, a23. aI4,"'}
,,=1
precisely as before.
What about the real numbers IE.? Are they still countable? No, they are not,
and the means by which this is shown - Cantor's diagonalization method
- is not only of fundamental importance for all of set theory, but certainly
belongs into The Book as a rare stroke of genius.
Theorem 1. The set IE. of real numbers is not countable.
. Proof. Any subset N of a countable set Nl = {ml, m2, m3, . . . } is at
most countable (that is, finite or countable). In fact, just list the elements
of N as they appear in AI. Accordingly, if we can find a subset of IE. which
is not countable, then a fortiori IE. cannot be countable. The subset M
of IE. we want to look at is the interval (0, 1] of all positive real numbers r
with 0 < r :::; 1. Suppose, to the contrary, that Nl is countable, and let
111 = {rl, r2, r3, . . . } be a listing of 111. We write r" as its unique infinite
decimal expansion without an infinite sequence of zeros at the end:
r" = 0.a"la"2a"3...
where a"i E {O, 1,... , 9} for all nand i. For example, 0.7 = 0.6999...
...-
Sets, functions, and the continuum hypothesis
89
Consider now the doubly infinite array
rl 0.(Llla12(L13 ...
r"2 0.a21 a22a23 . . .
r n 0.a n 1 a n2 a n3'"
Foreveryn,chooseb" E {I,... ,8}differentfroma"n;c1earlythiscanbe
done. Then b = 0.b 1 b 2 b 3 ." b n ... is a real number in our set AI and hence
must have an index, say b = r/.:. But this cannot be, since b" is different
from (LJ.:/.:. And this is the whole proof! D
Let us stay with the real numbers for a moment. We note that all four
types of intervals (0,1), (0, 1], [0, 1) and [0,1] have the same size. As an
example, we verify that (0, 1] and (0, 1) have equal cardinality. The map
f : (0, 1] -----+ (0, 1), x f-----7 y defined by
! 3
'i-x
3
4- X
1 '=
j . * x
for
for
for
1 < .,. < 1
2 oA) _ ,
t < :J; ::; ,
k < :J; ::; t,
does the job. Indeed, the map is bijective, since the range of y in the first line
is ::; y < 1, in the second line t ::; y < , in the third line k ::; y < t,
and so on.
Next we find that any two intervals (of finite length> 0) have equal size
by considering the central projection as in the figure. Even more is true:
Every interval (of length> 0) has the same size as the whole real line R
To see this, look at the bent open interval (0,1) and project it onto JR:. from
the center s.
So, in conclusion, any open, half-open, closed (finite or infinite) interval of
length> a has the same size, and we denote this size by c, where c stands
for continuum (a name sometimes used for the interval [0, I D.
That finite and infinite intervals have the same size may come expected on
second thought, but here is a fact that is downright counter-intuitive.
Theorem 2. The set JR:.2 of all ordered pairs of real numbers (that is, the
real plane) has the same size as R
. Proof. To see this, it suffices to prove that the set of all pairs (::c, y),
a < x, y ::; 1, can be mapped bijectively onto (0,1]. The proof is again
from The Book. Consider the pair (x, y) and write x, y in their unique
non-terminating decimal expansion as in the following example:
:J;
0.3 01 2
0.009 2 05
007 08
1 0008
y
".
".
"
°
f: (0,1] ---+ (0,1)
I J:::>
1
5
JR:.
90
Sets, functions, and the continuum hypothesis
Note that we have separated the digits of x and y into groups by always
going to the next nonzero digit, inclusive. Now we associate to (x, y) the
number Z E (0, 1] by writing down the first x-group, after that the first
V-group, then the second :r-group, and so on. Thus, in our example, we
obtain
Z = 0.3 009 01 2 2 05 007 1 08 0008 ...
Since neither x nor y exhibits only zeros from a certain point on, we find
that the expression for z is again a non-terminating decimal expansion.
Conversely, from the expansion of z we can immediately read off the
preimage (x, y), and the map is bijective - end of proof. 0
As (x, y) f-----7 x + iy is a bijection from JR2 onto the complex numbers C,
we conclude that 1«::1 = IJRI = c. Why is the result IJR21 = IJRI so unex-
pected? Because it goes against our intuition of dimension. It says that the
2-dimensional plane JR2 (and, in general, by induction, the n-dimensional
space JRIl) can be mapped bijectively onto the I-dimensional line R Thus
dimension is not generally preserved by bijective maps. If, however, we
require the map and its inverse to be continuous, then the dimension is pre-
served, as was first shown by Brouwer.
Let us go a little further. So far, we have the notion of equal size. When
will we say that AI is at most as large as N? Mappings provide again the
key. We say that the cardinal number m is less than or equal to n, if for
sets M and N with IMI = m, INI = n, there exists an injection from AI
into N. Clearly, the relation m :::; n is independent of the representative
sets AI and N chosen. For finite sets this corresponds again to our intuitive
notion. An m-set is at most as large as an n-set if and only if m :::; n.
Now we are faced with a basic problem. We would certainly like to have
that the usual laws concerning inequalities also hold for cardinal numbers.
But is this true for infinite cardinals? In particular, is it true that m :::; n,
n :::; m imply m = n? The affirmative answer is provided by the famous
Schroeder-Bernstein theorem. The following proof for it, which we learned
from Andrei Zelevinsky, is a veritable jewel.
Theorem 3. If each of two sets lvI and N can be mapped injectively into
the other, then there is a bijection from M to N, that is, IMI = INI.
. Proof. Let f : AI ----+ Nand g : N ----+ M be injections. For each
A M define the set F(A) M by
F(A) := AI \ g(N \ f(A)),
that is, any subset A M is mapped to f(A) N, then we take its
complement N \ f (A) in N, then this complement is mapped back to obtain
g(N \ f(A)) M, and of this set we finally take the complement in M
- see the figure in the margin.
Sets, functions, and the continuum hypothesis
9]
1\'
Suppose now that there is some Ao with F(Ao) = Ao, that is, such that
g(N \ f(Ao)) = M \ Ao. Then, clearly,
rjJ(x) := { f(x) f x E Ao
g-l(X) If x Ao
provides a bijection from AI to N. Hence it remains to find such a "fixed
set" Ao. Our method to find one relies on the following observation:
Claim. Let AI, A 2 , ib, . ., be any family of subsets of" M, then
F(n A;) = n F(A;).
i2:1 ;>1
To see this, note first (i) that f(n A;) = n f(A;) holds, since f is an
injection. We will also use (ii) that g(UB;) = Ug(B;), which is true for
an arbitrary mapping 9 : N ----+ M and arbitrary sets B i <:;; N. Now
F(n Ai)
M \ g(N \ f(n Ai))
M \ g(N \ n f(A;))
M \ g(U(N \ f(A;)))
M \ (Ug(N \ f(A i )))
nOlI \ g(N \ f(A;)))
n F(A;).
by de Morgan's law
by (i)
by de Morgan's law
by (ii)
which proves the claim. Now we consider the chain of subsets
M 2 F(M) 2 F2(M) 2 F:3(M)... ,
and define Ao as the intersection of all the sets in the chain,
Ao := M n F(M) n F2(M) n F: 3 (M)... .
From this the claim yields
F(Ao) = F(M) n F2(M) n F: 3 (M)... ,
and thus F(Ao) = Ao: that's it!
D
"Schroeder and Bernstein painting"
92
Sets, functions, and the continuum hypothesis
"The smallest infinite cardinal"
With this we have also proved a result
announced earlier:
Every infinite set has the same size as
some proper subset.
What about the other relations governing inequalities? As usual, we set
m < n if m 'S n, but m -::j: n. We have just seen that for any two cardinals
m and n at most one of the three possibilities
m < n, m = n, n > m
holds, and it follows from the theory of cardinal numbers that, in fact, pre-
cisely one relation is true (see the appendix to this chapter, Proposition 2).
Furthermore, the Schroeder-Bernstein Theorem tells us that the relation <
is transitive, that is, m < nand n < pimply m < p. Thus the cardinalities
are arranged in linear order starting with the finite cardinals 0,1,2,3,....
Invoking the usual Zermelo-Fraenkel axiom system (in particular, the ax-
iom of choice) we easily find that any infinite set AI contains a countable
subset. In fact, AI contains an element, say mj. The set AI \ {md is not
empty (since it is infinite) and hence contains an element m2. Consider-
ing AI \ {m], m2} we infer the existence of m:" and so on. So, the size
of a countable set is the smallest infinite cardinal, usually denoted by No
(pronounced "'aleph zero").
As a corollary to No 'S m for any infinite cardinal m, we can immediately
prove "'Hilbert's hotel" for any infinite cardinal number m, that is, we have
1M U {:r}1 = IMI for any infinite set },I. Indeed, M contains a subset
N = {mj, m2, m3,"'}' Now map x onto mj, mj onto m2, and so on,
keeping the elements of lVI\N fixed. This gives the desired bijection.
Up to now we know the cardinal numbers 0,1,2,... ,No, and further that
the cardinality r of JR is bigger than No. The passage from iQ with liQl = No
to JR with IJRI = c immediately suggests the next question:
Is (' = I JRI the next infinite cardinal number after No?
Now, of course, we have the problem whether there is a next larger cardinal
number, or in other words, whether N j has a meaning at all. It does - the
proof for this is outlined in the appendix to this chapter.
The statement r = N j became known as the continuum hypothesis. The
question whether the continuum hypothesis is true presented for many
decades one of the supreme challenges in all of mathematics. The answer,
finally given by Kurt Godel and Paul Cohen, takes us to the limit of
logical thought. They showed that the statement c = N] is independent
of the Zermelo-Fraenkel axiom system, in the same way as the parallel
axiom is independent of the other axioms of Euclidian geometry. There are
models where c = N] holds, and there are other models of set theory where
(' -::j: N j holds.
In the light of this fact it is quite interesting to ask whether there are other
conditions (from analysis, say) which are equivalent to the continuum
hypothesis. In the following we want to present one such instance and
its extremely elegant and simple solution by Paul Erdos. In 1962, Wetzel
asked the following question:
.......-
Sets, functions, and the continuum hypothesis
93
Let {f,,} be a family of pairwise distinct analytic functions on the
complex numbers such thatfor each z E CC the set of values {f,,(z)}
is countable; let us call this property (Po).
Does it then follow that the family itself is countable?
Very shortly afterwards Erdos showed that, surprisingly, the answer de-
pends on the continuum hypothesis.
Theorem 4. If c > N I , then every family {fa} sati.lying (Po) is countable.
If, on the other hand, C = N], then there exists some family {fa} with
property (Po) which has size c.
For the proof we need some basic facts on cardinal and ordinal numbers.
For readers who are unfamiliar with these concepts, this chapter has an
appendix where all the necessary results are collected.
. Proof of Theorem 4. Assume first c > N]. We shall show that for any
family {fa} of size N] of analytic functions there exists a complex number
Zo such that all N] values fa (zo) are distinct. Consequently, if a family of
functions satisfies (Po), then it must be countable.
To see this, we make use of our knowledge of ordinal numbers. First, we
well-order the family {fn} according to the initial ordinal number W] of N].
This means by Proposition 1 of the appendix that the index set runs through
all ordinal numbers a which are smaller than WI. Next we show that the
set of pairs (a ,(3), a < (3 < W], has size N 1. Since any (3 < Wj is a
countable ordinal, the set of pairs (a, (-J), a < (3 , is countable for every
fixed (3. Taking the union over all N] -many (3, we find from Proposition 6
of the appendix that the set of all pairs (a, (3), a < (3, has size N].
Consider now for any pair a < (3 the set
5(a,(3) = {z E CC: fa(z) = h(z)}.
We claim that each set 5(00, (3) is countable. To verify this, consider the
circles C k of radius k = 1,2,3, . .. around the origin in the complex plane.
If fa and fe agree on infinitely many points in some Ck, then fa and fp
are identical by a well-known result on analytic functions. Hence fa and f{3
agree only in finitely many points in each Ck, and hence in at most count-
ably many points altogether. Now we set 5 = Ua<p 5(a, (-J). Again by
Proposition 6, we find that 5 has size N], as each set 5(00, (3) is countable.
And here is the punch line: Because, as we know, CC has size c, and c is
larger than N] by assumption, there exists a complex number Zo not in 5,
and for this Zo all N 1 values f" (zo) are distinct.
Next we assume c = N]. Consider the set D CC of complex numbers
p + iq with rational real and imaginary part. Since for each p the set
{p + iq : q E Q} is countable, we find that D is countable. Furthermore,
D is a dense set in CC: every open disk in the complex plane contains some
94
Sets, functions, and the continuum hypothesis
point of D. Let {z" : 0 ::; Q < wd be a well-ordering of cc. We shall
now construct a family {fp : 0 ::; fJ < WI} of N1-many distinct analytic
functions such that
fe(z,,) ED whenever ct < fJ.
(I)
Any such family satisfies the condition (Po). Indeed, each point z E CC has
some index, say z = Z". Now, for all ;3 > ct, the values {Jp(z,,)} lie in
the countable set D. Since ct is a countable ordinal number, the functions
fp with fJ ::; Q will contribute at most countably further values fp(z,,), so
that the set of all values {fi'(Z)} is likewise at most countable. Hence, if
we can construct a family {fd} satisfying (I), then the second part of the
theorem is proved.
The construction of {fe} is by transfinite induction. For fa we may take
any analytic function, for example fa = constant. Suppose f:3 has already
been constructed for all /'J < J. Since I is a countable ordinal, we may
reorder {f,3 : 0 ::; /3 < I} into a sequence gl,g2,g3,.... The same
reordering of {z" : 0 ::; ct < I} yields a sequence WI, W2, W3, . . .. We
shall now construct a function f( satisfying for each n the conditions
f'Y(w n ) ED
and
f'Y(w n ) f gn(w n ).
(2)
The second condition will ensure that all functions f'Y (0 ::; I < wd are
distinct, and the first condition is just (I), implying (Po) by our previous
argument. Notice that the condition f'Y(w n ) f Yn(w n ) is once more a
diagonalization argument.
To construct f'Y' we write
f'Y(z) .- cO+CI(Z- W l)+C2(Z- w d(z-w:z)
+ C3(Z - wd(z - W2)(Z - W3) + ... .
If I is a finite ordinal, then f'Y is a polynomial and hence analytic, and we
can certainly choose numbers Ci such that (2) is satisfied. Now suppose I
is a countable ordinal, then
=
f-, (z) = Len (z - W d . . . (z - W n ).
n=O
(3)
Note that the values of em (rn 2: n) have no influence on the value f'Y (w n ),
hence we may choose the C n step by step. If the sequence (en) converges
to 0 sufficiently fast, then (3) defines an analytic function. Finally, since
D is a dense set, we may choose this sequence (cn) so that f'Y meets the
requirements of (2), and the proof is complete. D
....-
Sets, functions, and the continuum hypothesis
95
Appendix: On cardinal and ordinal numbers
Let us first discuss the question whether to each cardinal number there ex-
ists a next larger one. As a start we show that to every cardinal number m
there always is a cardinal number n larger than m. To do this we employ
again a version of Cantor's diagonalization method.
Let M be a set, then we claim that the set P(M) of all subsets of M has
larger size than M. By letting rn E M correspond to {rn} E P(M),
we see that M can be mapped bijectively onto a subset of P, which im-
plies IMI :S IP(M)I by definition. It remains to show that P(M) can
not be mapped bijectively onto a subset of AI. Suppose, on the contrary,
i.p : N ---+ P(M) is a bijection of N <;;; M onto P(M). Consider the
subset U <;;; N of all elements of N which are not contained in their image
under i.p, that is, U = {rn EN: rn tf- i.p(rn)}. Since i.p is a bijection, there
exists u E N with i.p(u) = U. Now, either 11 E U or 11 tf- U, but both
alternatives are impossible! Indeed, if 11 E U, then u tf- i.p( 11) = U by the
definition of U, and if u tf- U = i.p (u), then 11 E U, contradiction.
Most likely, the reader has seen this argument before. It is the old barber
riddle: "A barber is the man who shaves all men who do not shave them-
selves. Does the barber shave himself?"
To get further in the theory we introduce another great concept of Can-
tor's, ordered sets and ordinal numbers. A set !vI is ordered by < if the
relation < is transitive, and if for any two distinct elements a and b of 1\;1
we either have a < b or b < a. For example, we can order 1':1 in the
usual way according to magnitude, 1':1 = {I, 2, 3, 4,... }, but, of course,
we can also order 1':1 the other way round, 1':1 = {... ,4,3,2,1}, or 1':1 =
{I, 3, 5, . ., ,2,4,6,. . . } by listing first the odd numbers and then the even
numbers.
Here is the seminal concept. An ordered set !vI is called well-ordered if
every nonempty subset of !vI has a first element. Thus the first and third
orderings of 1':1 above are well-orderings, but not the second ordering. The
fundamental well-ordering theorem, implied by the axioms (including the
axiom of choice), now states that every set"U admits a well-ordering. From
now on, we only consider sets endowed with a well-ordering.
Let us say that two well-ordered sets AI and N are similar (or of the same
order-type) if there exists a bijection i.p from AI on N which respects the
ordering, that is, rn <!II n implies i.p(rn) <N i.p(n). Note that any ordered
set which is similar to a well-ordered set is itself well-ordered.
Clearly, similarity is an equivalence relation, and we can thus speak of an
ordinal number a belonging to a class of similar sets. For finite sets, any
two orderings are similar well-orderings, and we use again the ordinal num-
ber n for the class of n-sets. Note that, by definition, two similar sets have
the same cardinality. Hence it makes sense to speak of the cardinality lal
of an ordinal number (t. Note further that any subset of a well-ordered set
is also well-ordered under the induced ordering.
-"--\[
"A legend talks about St. Augustin who,
walking along the seashore and contem-
plating infinity, saw a child trying to
empty the ocean with a small shell. . . ..
The well-ordered sets fir = {I, 2, 3,. . .}
and fir = {1, 3, 5, .. . ,2,4,6, .. . } are
not similar: the first ordering has only
one element without an immediate pre-
decessor, while the second one has two.
96
Sets, functions, and the continuum hypothesis
The ordinal number of {I, 2, 3, . . . }
is smaller than the ordinal number of
{I, 3, 5, . .. ,2,4,6,. . . }.
As we did for cardinal numbers, we now compare ordinal numbers. Let NI
be a well-ordered set, m E AI, then AIm = {x E AI : X < m} is called the
(initial) segment of AI determined by Tn; N is a segment of AI if N = AIm
for some m. Thus, in particular, AIm is the empty set when Tn is the first
element of AI. Now let IL and v be the ordinal numbers of the well-ordered
sets AI and N. We say that IL is smaller than v, IL < v, if AI is similar
to a segment of N. Again, we have the transitive law that IL < v, V < Jr
implies IL < Jr, since under a similarity mapping a segment is mapped onto
a segment.
Clearly, for finite sets, Tn < n corresponds to the usual meaning. Let us de-
note by w the ordinal number of 1':1 = {I, 2, 3, 4, . . . } ordered according to
magnitude. By considering the segment 1':In+1 we find n < w for any finite
n. Next we see that w <:::: a holds for any infinite ordinal number a. Indeed,
if the infinite well-ordered set !vI has ordinal number a, then AI contains a
first element Tn 1, the set AI\ Tnl contains a first element Tn2, !vI \ {Tn 1, Tn2}
contains a first element Tn3. Continuing in this way, we produce the se-
quence Tn 1 < Tn2 < Tn3 < . .. in AI. If AI = {Tn I , Tn2 , Tn3, . . . }, then !vI
is similar to 1':1, and hence a = w. If, on the other hand, lU\ { Tn 1, Tn2, . . . }
is nonempty, then it contains a first element m, and we conclude that 1':1 is
similar to the segment AIm, that is, w < a by definition.
We now state (without the proofs, which are not difficult) three basic re-
sults on ordinal numbers. The first says that any ordinal number IL has a
"standard" representative well-ordered set TY/ 1 .
Proposition 1. Let IL be an ordinal number and denote by vV Il the set of
ordinal numbers smaller than 11. Then the following holds:
(i) The elements of vV Il are pailWise comparable.
(ii) If we order TV" according to magnitude, then TT'" is well-ordered and
has ordinal number IL.
Proposition 2. Any lYvo ordinal numbers IL and v sati.f)' precisely one ()f'
the relations IL < v, 11 = v, or IL > v.
Proposition 3. Every set of' ordinal numbers (ordered according to
magnitude) is well-ordered.
After this excursion to ordinal numbers we come back to cardinal num-
bers. Let m be a cardinal number, and denote by Om the set of all ordinal
numbers IL with I/LI = m. By Proposition 3 there is a smallest ordinal
number W m in Om, which we call the inital ordinal number of Om. As an
example, w is the initial ordinal number of No.
With these preparations we can now prove a basic result for this chapter.
Proposition 4. For every cardinal number m there is a definite next larger
cardinal number.
. Proof. We already know that there is some larger cardinal number n.
Consider now the set K of all cardinal numbers larger than m and at most
Sets, functions, and the continuum hypothesis
97
as large as n. We associate to each.p E K its initial ordinal number wp.
Among these initial numbers there is a smallest (Proposition 3), and the
corresponding cardinal number is then the smallest in K, and thus is the
desired next larger cardinal number to m. D
Proposition 5. Let the infinite set M have cardinality m, and let Ai be
well-ordered according to the initial ordinal number Wm' Then Ai has no
last element.
. Proof. Indeed, if Ai had a last element Tn, then the segment AIm would
have an ordinal number IL < Wm with IILI = m, contradicting the definition
. D
What we finally need is a considerable strenghthening of the result that the
union of countably many countable sets is again countable. In the following
result we consider arbitrary families of countable sets.
Proposition 6. Suppose {Ax} is a family of size m of countable sets AG'
where m is an infinite cardinal. Then the union U AG has size at most m.
G
. Proof. We may assume that the sets AG are pairwise disjoint, since this
can only increase the size of the union. Let M with IMI = m be the index
set, and well-order it according to the initial ordinal number Wm. We now
replace each a E AI by a countable set BG = { b G1 = a, b G2 , b,,3,'" },
ordered according to w, and call the new set lvi. Then fili is again well-
ordered by setting b"i bej for a < f3 and b ai <.!!aj for i < j. Let Ii be
the ordinal number of M. Since M is a subset of lvi , we have IL S Ii by an
earlier argument. If IL = Ii, t hen Ai is similar to M, and if II < Ii, then M
is similar to a segment of lvi. Now, since the ordering Wm of Ai has no last
element (Proposition 5), we see that lvi is in both cases similar to the union
of countable sets B6, and hence of the same cardinality.
The rest is easy. Let i.p : U B;3 ------+ Ivi be a bijection, and suppose that
i.p(Be) = {oo],oo2,oo3,'''}' Replace each a; by Ax, and consider the
union U A exi . Since U Aa, is the union of countably many countable sets
(and hence countable), we see that B{3 has the same size as U Aex,. In
other words, there is a bijection from Be to U A.", for all (-J, and hence
a bijection 4) from U B;3 to U A.a. But now 4Ji.p-] gives the desired
bijection from M to U AG' and thus I U A.a I = m. D
References
[I] L. E. J. BROUWER: Beweis der Invarianz der Dimensionszahl, Math. Annalen
70(1911),161-165.
[2] P. ERDOS: An interpolation problem associated with the continuum hypo-
thesis, Michigan Math. J. 11 (1964), 9-10.
[31 E. KAMKE: Theory (f Sets, Dover Books 1950.
98
Sets, functions, and the continuum hypothesis
"111/initely many more cardinals"
,....-
In praise of inequalities
Chapter 16
Analysis abounds with inequalities, as witnessed for example by the famous
book "Inequalities" by Hardy, Littlewood and P6lya. Let us single out two
of the most basic inequalities with two applications each, and let us listen
in to George P6lya, who was himself a champion of the Book Proof, about
what he considers the most appropriate proofs.
Our first inequality is variously attributed to Cauchy, Schwarz and/or to
Buniakowski:
Theorem I (Cauchy-Schwarz inequality)
Let (a, b) be an inner product on a real vector space V (with the norm
lal 2 := (a, a)). Then
(a, b)2 lal 2 1bl 2
holds j(Jr all vectors a, b E V, with equality if and only if a and bare
linearly dependent.
. Proof. The following (folklore) proof is probably the shortest. Consider
the quadratic function
Ixa + bl 2 = x21al2 + 2x(a, b) + Ibl 2
in the variable x. We may assume a ¥' O. If b = A.a, then clearly
(a, b)2 = lal2lbl2. If, on the other hand, a and b are linearly independent,
then Ixa + bl 2 > 0 for all x, and thus the discriminant (a, b)2 -lal2lbl2 is
less than O. D
Our second example is the inequality of the harmonic, geometric and
arithmetic mean:
Theorem II (Harmonic, geometric and arithmetic mean)
Let a1, . ., , an be positive real numbers, then
n a1 + . . . + an
1 ] a 1 a2 . .. an
a;-+"'+a,:- n
with equality in both cases if and only if all ai's are equal.
. Proof. The following beautiful non-standard induction proof is attributed
to Cauchy (see [7]). Let P(n) be the statement of the second inequality,
written in the form
( a 1 +...+a ll ) 11
a 1 a2 . . . all n .
100
In praise of inequalities
For n = 2, we have al a2 <::: ( a l !a 2 j2 {:::::::} (al - a2)2 2: 0, which is true.
Now we proceed in the following two steps:
(A) P(n) ===} P(n - 1)
(B) P(n) and P(2) ===} P(2n)
which will clearly imply the full result.
n-l
To P rove ( A ) set A : = --".lL then
, D Tl-I'
k=1
n-l
(II ak)A
k=1
P(n)
<
11,-1 n
( k1 a: L + A ) = Cn - l;L A + A r
( nt 1 a k ) n-l
k=1
n-1
An
n-1
and hence II ak
k=1
For (B), we see
< An- 1
IT ak = ( IT a k ) ( IT a k )
k=1 k=1 k=n+l
P(n)
<
(t ; r( f :: r
k=1 k=n+l
P):) ( ,':; r ( t" r
The condition for equality is derived just as easily. The left-hand inequality,
between the harmonic and the geometric mean, follows now by considering
l D
aI' . .. , an'
. Another Proof. Of the many other proofs of the arithmetic-geometric
mean inequality (the monograph [2] lists more than 50), let us single out a
particularly striking one by Alzer which is of recent date. As a matter of
fact, this proof yields the stronger inequality
ai'a2...an <::: Pl a l+p2 a 2+...+p"a n
for any positive numbers aI, . " , an, PI, . " , Pn with L:1 Pi = 1. Let us
denote the expression on the left side by G, and on the right side by A. We
may assume al <::: .., <::: an. Clearly al <::: G <::: an, so there must exist
some k with ak <::: G <::: ak+ 1. It follows that
k f G 1 1 n f a, 1 1
L Pi (-i - G ) dt + L Pi ( G - t) dt
1=1 ai 1=k+l G
> 0
(I)
since all integrands are 2: O. Rewriting (I) we obtain
n ai n ai
LPif dt 2: LPifdt
1=1 G 1=1 G
r
I
In praise o.f inequalities
where the left-hand side equals
101
n a; - G
L P; -------c;-
i=1
A.
- -1
G '
while the right-hand side is
n
11
LPi(logai -logG) = log II afi -logG = O.
;=1 i=1
We conclude B- - 1 2:: 0, which is A. 2:: G. In the case of equality, all
integrals in ( I) must be 0, which implies al = . . . = an = G. D
Our first application is a beautiful result of Laguerre (see [7]) concerning
the location of roots of polynomials.
Theorem 1. Suppose all roots of the polynomial x n +an-lXIl-1 +. . . +ao
are real. Then the roots are contained in the interval with the endpoints
an -1 n - 1 J .) 2n
--::!::- a- --a 'J
n n n-l n _ 1 n-_'
. Proof. Let y be one of the roots and Yl, . . . ,Yn-l the others. Then the
polynomial is (x - y)(x - YI)'" (x - Yn-d. Thus
-a n -l
Y + Yl + . . . + Yn-l,
Y(YI +... + Yn-I) + L YiYj,
i<j
a n -2
and so
n-l
'J ') "'"' 2
o,_1 - 2a n -2 - Y- = Yi .
i=1
By Cauchy's inequality applied to (Yl," . ,Yn-l) and (1,... ,1),
'J
(0,11-1 + y)-
'J
(Yl + Y2 + . . . + Yn-d-
n-l
< (n - 1) L yf = (n - l)(a;'_1 - 2a n -2 - y2),
i=1
or
:2 2o, n -1 2(n - 1) n - 2 2
Y + -Y + o, n -2 - -a n - 1 :S o.
n n n
Thus Y (and hence all Y i) lie between the two roots of the quadratic function,
and these roots are our bounds. D
For our second application we start from a well-known elementary property
of a parabola. Consider the parabola described by f (x) = 1 - x 2 between
x = -1 and x = 1. We associate to f (x) the tangential triangle and the
tangential rectangle as in the figure.
(0,2)
( -1,0)
o
(1,0)
102
In praise of inequalities
(xo, Yo)
v
2
Mathematical Reviews
Erdos, P. and Grunwald, T. On polynomials with only real
roots. Ann. of :.\lath. 40, 537-548 (1939). '1F 93J
Es S(:if(x) ein Polynom mil nur r('cUen Wurzeln,
J(-I;=f(l)=O, o <f(x):2J(p.) fur -l<:x<l,
wohei -1<>1<1. f,<) da&$ p. die Stelle des Maximums \ron
I(-d im Interv::dl (-1, I) bedeutl"t. Dann ist
2f'(1)F(-l)
We find that the shaded area A. = .r 1 (1- x 2 ) dx is eq ual to , and the areas
T and R of the triangle and rectangle are both equal to 2. Thus =
and = 1.
In a beautiful paper, Paul Erdos and Tibor Gallai asked what happens
when f(x) is an arbitrary n-th degree real polynomial with f(x) > 0 for
-1 < x < 1,andf(-1) = f(1) = O. TheareaAisthen.rl f(x)dx. Sup-
pose that f(x) assumes in (-1,1) its maximum value at b, then R = 2f(b).
Computing the tangents at -1 and at 1, it is readily seen (see the box)
that
T = 2f'(1)f'(-1)
1'(1) -1'(-1)'
respectively T = 0 for 1'(1) = 1'( -1) = O.
(2)
The tangential triangle
The area T of the tangential triangle is precisely Yo, where (xo, Yo)
is the point of intersection of the two tangents. The equation of these
tangents are y = 1'( -1)(x + 1) and y = f'(1)(x - 1), hence
1'(1)+1'(-1)
Xo = f'(1)-f'(-1)'
and thus
Yo =
1'(1) ( 1'(1) + 1'(-1) -1 )
f'(1) - f'(-1)
2 1'(1)1'(-1)
f'(1) - f'(-1)"
In general, there are no nontrivial bounds for and . To see this, take
f( ' ) - 1 :Zn Th T - 2 4. - 4n d h . T S . . 1 I
x - - x. en - n," - +1 ' an t us _ 4 > n. Iml ar y,
...,11 1
R = 2 and = 21 , which approaches 1 with n to infinity.
But, as Erdos and Gallai showed, for polynomials which have only real
roots such bounds do indeed exist.
Theorem 2. Let f(x) be a real polynomial of degree n 2: 2 with only real
roots, such that f(x) > Of or -1 < x < 1 and f(-1) = f(1) = O. Then
2 2
-T < A. < -R
3 - - 3 '
and equality holds in both cases onlyfor n = 2.
Erdos and Gallai established this result with an intricate induction proof.
In the review of their paper, which appeared on the first page of the first
issue of the Mathematical Reviews in 1940, George P61ya explained how
the first inequality can also be proved by the inequality of the arithmetic
and geometric mean - a beautiful example of a conscientious review and
a Book Proof at the same time.
/n praise of inequalities
103
. Proof of T:::; A. Since f(x) has only real roots, and none of them in
the open interval (-1,1), it can be written - apart from a constant positive
factor which cancels out in the end - in the form
f(:r) = (1 - x 2 ) II (a; - x) II (;3j + :z:)
j
(3)
with a; 2: 1, ;3j 2: 1. Hence
1
A = /(1-.7: 2 )II(a i -:r)II(;3j+x)d:z:.
-1 ' J
By making the substitution x t-----+ -x, we find that also
1
A = /(1- .7: 2 ) II (a; + :r) II U3j - x)d:z:,
-1 1 J
and hence by the inequality of the arithmetic and the geometric mean (note
that all factors are 2: 0)
1
A = / H (1-x 2 )II(a;-x)II(;3j+X) +
_] I J
(1-:z: 2 )II(a;+x)IIU3 j -x) ]d:r:
i j
1
> /(1 - .7: 2 ) (II (a; - .7: 2 ) II (BJ - x 2 )) 1/2 dx
-1 I J
1
/ 1/2
> (1 - :r 2 ) (II (0:; - 1) II CBJ - 1)) dx
-1 I .I
4 1/2
:3 (II (a; - 1) II CBJ - 1)) .
i j
Let us compute 1'(1) and f'(-l). (We may assume f'(-l),1'(l) f 0,
since otherwise T = 0 and the inequality iT ::; A becomes trivial.) By (3)
we see
1'(1) = -2II(ai- 1 )IICB j +1),
j
and similarly
1'(-1)
2II(ai + 1) IIU:i j -1).
j
Hence we conclude
A > ( - l' (1) l' ( -1)) 1 /2
3 .
104
In praise of inequalities
Applying now the inequality of the harmonic and the geometric mean
to - l' (1) and l' (1), we arrive by (2) at the conclusion
2 2
A. > 1
3 -f(l) + f'(-l)
4 1'(1)1'(-1)
3f'(1) - f'(-l)
T
3 '
which is what we wanted to show. By analyzing the case of equality in
all our inequalities the reader can easily supply the last statement of the
theorem. 0
The reader is invited to search for an equally inspired proof of the second
inequality in Theorem 2.
Well, analysis is inequalities after all, but here is an example from graph
theory where the use of inequalities comes in quite unexpected. In Chap-
ter 29 we will discuss Tunin's theorem. In the simplest case it takes on the
following form:
Theorem 3. Suppose G is a graph on n vertices without triangles. Then G
2
has at most ' edges, and equality holds only when n is even and G is the
complete bipartite graph K n / 2 ,n/2'
. First proof. This proof, using Cauchy's inequality, is due to Mantel.
Let T = {I,... ,n} be the vertex set and E the edge set ofG. By d i we
denote the degree of i, hence LiE V d; = 21EI (see page 135 in the chapter
on double counting). Suppose ij is an edge. Since G has no triangles, we
find d i + d j ::; n since no vertex is a neighbor of both i and j.
It follows that
L (d; + d j ) < niEI.
ijEf,'
Note that d i appears exactly d; times in the sum, so we get
nlEI 2: L (d; + d j )
ijEE
""' d 2
I'
iE\
and hence with Cauchy's inequality applied to the vectors (d l ,. .. , d n ) T
and (1, . .. ,1 f '
( '" d; ) 2
nlEI > L d; >
iEV
41EI2
n
n
and the result follows. In the case of equality we find d i = d j for all
i,j, and further d; = (since d i + d j = n). Since G is triangle-free,
G = Kn/2.11/2 is immediately seen from this. 0
r
In praise of inequalities
105
. Second proof. The following proof of Theorem 3, using the inequality
of the arithmetic and the geometric mean, is a folklore Book Proof. Let a
be the size of a largest independent set .4, and set /3 = n - a. Since G is
triangle-free, the neighbors of a vertex i form an independent set, and we
infer d i :S a for all i.
The set B = V\A of size /3 meets every edge of G. Counting the edges
of G according to their end vertices in B, we obtain lEI :S LiEF d i . The
inequality of the arithmetic and geometric mean now yields
lEI :S L d i :S a/3 :S ( a; /3 f
iEB
n 2
4'
and again the case of equality is easily dealt with.
References
[I] H. ALZER: A proof (fthe arithmetic mean-geometric mean inequality, Amer.
Math. Monthly 103 (1996), 585.
[2] P. S. BULLEN, D. S. MITRINOVICS & P. M. VASIC: Means and their In-
equalities, Reidel, Dordrecht 1988.
[31 P. ERDOS & T. GRUNWALD: On polynomials with only real roots, Annals
Math. 40 (1939), 537-548.
[4] G. H. HARDY, J. E. LITTLEWOOD & G. POLY A: Inequalities, Cambridge
University Press, Cambridge 1952.
[5] W. MANTEL: Problem 28. Wiskundige Opgaven 10 (1906), 60-61.
[61 G. PoLYA: Review of [3J, Mathematical Reviews 1 (1940), I.
[7J G. PoLYA & G. SZEGO: Problems and Theorems in Analysis, Vol. I, Springer-
Verlag. Berlin Heidelberg New York 1972/78; Reprint 1998.
J\.
d i
o
A theorem of Palya on polynomials
Among the many contributions of P6lya to analysis, the following has
always been Erdos' favorite, both for the surprising result and for the beauty
of its proof. Suppose that
j(z) = Z" + b n _ 1 z"- 1 +... + b o
is a complex polynomial of degree n 2: 1 with leading coefficient 1. Asso-
ciate with j (z) the set
C := {z E CC: Ij(z)1 :S 2},
that is, C is the set of points which are mapped under f into the circle of
radius 2 around the origin in the complex plane. So for n = 1 the domain C
is just a circular disk of diameter 4.
By an astoundingly simple argument, P6lya revealed the following beauti-
ful property of this set C:
Take any line L in the complex plane and consider the orthogonal
projection C L of'the set C onto L. Then the totallenf?th of'any such
projection never exceeds 4.
What do we mean by the total length of the projection C f > being at most 4?
We will see that C L is a finite union of disjoint intervals / 1 , . . . ,It, and the
condition means that f(h) + ... + f(Id :S 4, where f(Ij) is the usual
length of an interval.
By rotating the plane we see that it suffices to consider the case when L is
the real axis of the complex plane. With these comments in mind, let us
state P6lya's result.
Theorem 1. Let j(z) be a complex pol.vnomial of'def?ree at least 1 and
leading coefficient 1. Set C = {z E CC : If (z) I :S 2} and let R be the
orthof?onal projection of' C onto the real axis. Then there are intervals
h, . .. , It on the real line which tOf?ether cover R and satisfy
P(II) +... + f(It) :S 4.
Clearly the bound of 4 in the theorem is attained for n = 1. To get more
of a feeling for the problem let us look at the polynomial j(z) = z'2 - 2,
which also attains the bound of 4. If z = x + iy is a complex number, then
x is its orthogonal projection onto the real line. Hence
R = {x E IE.: x + iy E C for some y}.
Chapter 17
George P61ya
108
A theorem of P6lya on polynomials
c)
fez) = z - 2
J>-
X
z = x + iy
y ... c, a, + ib,
bk
x
ak
The reader can easily prove that for j(z) = Z2 - 2 we have x + iy E C if
and only if
(X 2 +y2)2 ::; 4(X 2 _y2).
It follows that x 4 ::; (x 2 + y2)2 ::; 4x 2 , and thus x 2 ::; 4, that is, Ixl ::; 2.
On the other hand, any z = :r E IR with I.TI ::; 2 satisfies IZ2 - 21 ::; 2, and
we find that R is precisely the interval [- 2,2] oflength 4.
Asafirststeptowardstheproofwritej(z) = (z-cd...(z-c,,)withck =
ak + ib k , and consider the real polynomial p(x) = (x - ad... (:r: - an).
Let z = x + i y E C, then by the theorem of Pythagoras
2 2 2
Ix-akl +Iy-bkl = IZ-Ckl
and hence I.T - ak I ::; Iz - Ck I for all k;, that is,
Ip(x)1 = IX-all...lx-anl::; Iz- c 11...lz-c,,1 = Ij(z)I::;2.
Thus we find that R is contained in the set P = {x E IR : Ip(x) I ::; 2},
and if we can show that this latter set is covered by intervals of total length
at most 4, then we are done. Accordingly, our main Theorem I will be a
consequence of the following result.
Theorem 2. Let p(x) be a real polynomial of degree n :2: 1 with leading
coefficient 1, and all roots real. Then the set P = {x E IR : Ip(x)1 ::; 2}
can be covered by intervals of total length at most 4.
As P6lya shows in his paper [2], Theorem 2 is, in turn, a consequence
of the following famous result due to Chebyshev. To make this chapter
self-contained, we have included a proof in the appendix (following the
beautiful exposition by P6lya and Szego).
Chebyshev's Theorem.
Let p(x) be a real polynomial of degree n :2: 1 with leading coefficient 1.
Then
1
max Ip(x)1 >
-1 <:.:X<:.: 1 2 n - 1 .
Let us first note the following immediate consequence.
Corollary. Let p( x) be a real polynomial of deg ree n :2: 1 with leading
coefficient 1, and suppose that Ip( x) I ::; 2 for all x in the interval [a, b].
Then b - a < 4.
. Proof. Consider the substitution y = ba (x - a) - 1. This maps the
x-interval [a, b] onto the y-interval [-1,1]. The corresponding polynomial
q(y) = p( b;a (y+l)+a)
has leading coefficient ( b;a )n and satisfies
max Iq(y)1
-1 <:.: y<:.: 1
max Ip(x)l.
a<x<b
A theorem of P6lya on polynomials
109
By Chebyshev's theorem we deduce
2 max Ip(x)1 ( b;a )n 2}-1
a5c":<b
2( b4: a )/l,
and thus b - a :S 4, as desired.
This corollary brings us already very close to the statement of Theorem 2.
If the set P = {x : Ip(x)1 :S 2} is an interval, then the length of Pis
at most 4. The set P may, however, not be an interval, as in the example
depicted here, where P consists of two intervals.
What can we say about P? Since p(x) is a continuous function, we know
at any rate that P is the union of disjoint closed intervals 1 1 ,1 2 , . . . , and
that p(:E) assumes the value 2 or -2 at each endpoint of an interval Ij. This
implies that there are only finitely many intervals I] , . " , It, since p(x) can
assume any value only finitely often.
P6lya's wonderful idea was to construct another polynomial j5(x) of degree
n, again with leading coefficient 1, such that P = {x : 1j5(x) I :S 2} is an
interval of length at least fJ(Id + ... + fJ(It). The corollary then proves
fJ(IJ) + . . . + fJ(It) :S fJ(P) :S 4, and we are done.
. Proof of Theorem 2. Consider p(x) = (x - ad... (x - all) with
P = {x E IR: Ip(x)1 :S 2} = II U ... U It, where we arrange the intervals
Ij such that I] is the leftmost and It the rightmost interval. First we claim
that any interval Ij contains a root of p( x). We know that p( x) assumes the
values 2 or -2 at the endpoints of Ij. If one value is 2 and the other -2,
then there is certainly a root in Ij. So assume p(x) = 2 at both endpoints
(the case -2 being analogous). Suppose b E Ij is a point where p(x)
assumes its minimum in Ij. Thenp'(b) = 0 andp"(b) O. Ifp"(b) = 0,
then b is a multiple root of p'(x), and hence a root of p(x) by Fact I from
the box on the next page. If, on the other hand, p" (b) > 0, then we deduce
p(b) :S 0 from Fact 2 from the same box. Hence either p(b) = 0, and we
have our root, or p(b) < 0, and we obtain a root in the interval from b to
either endpoint of Ij.
Here is the final idea of the proof. Let h, . .. , It be the intervals as before,
and suppose the rightmost interval It contains m roots of p(:E), counted
with their multiplicities. If m = n, then It is the only interval (by what
we just proved), and we are finished. So assume m < n, and let d be
the distance between It-l and It as in the figure. Let b l ,... , b m be the
roots of p(x) which lie in It and CI, . . . Cn-m the remaining roots. We now
write p(x) = q(x)r(x) where q(:1;) = (x - bJ)... (x - bm) and r(x) =
(x - cd", (x - c n - rn )' and setpI(x) = q(x + d)r'(x). The polynomial
PI (x) is again of degree n with leading coefficient 1. For x E I] U. . . U It-I
we have Ix + d - bil < Ix - bil for all i, and hence Iq(x + d)1 < Iq(x)l. It
follows that
Ip I (:r;) I :S Ip(:E) I :S 2
for :1; E h U . . . U It -I.
If, on the other hand, x E It, then we find Ir(:1; - d) I :S Ir(x)1 and thus
IpI(x - d) I = Iq(x)llr(x - d) I :S Ip(x)1 :S 2,
o
1-vS
1 HvS ;;:::3.20
For the polynomial p(;£) = J;2 (J; - 3)
we get P = [1- vS, 1 ]U[HvS, ;;::: 3.2]
d
-
h
1 2
It-I It
A theorem ()f' P6lya on polynomials
which means that It - d PI = {x: Ilh(J;) I :S 2}.
In summary, we see that PI contains h U. . . U It-l U (It - d) and hence has
total length at least as large as P. Notice now that with the passage from
p( J;) to lh (x) the intervals It-I and It - d merge into a single interval. We
conclude that the intervals J],. ., ,J s of PI (J;) making up PI have total
length at leasU(Id +. . . + {(It ), and that the rightmost interval J s contains
more than rn roots of PI (x). Repeating this Erocedure at most t - 1 times,
we finally arrive at polynomial p(x) with P = {x : I]!(:r) I :S 2} being an
interval of length {(P) 2 £(h)+... ((It), and the proof is complete. 0
Two facts about polynomials with real roots
Let p(:r) be a non-constant polynomial with only real roots.
Fact 1. If b is a multiple root of pi (x), then b is also a root qf' p(x).
. Proof. Let b],... ,b,. be the roots of p(x) with multiplicities
8],... ,8n L=]Sj = n. Fromp(x) = (x -bj)Sjh(x) we infer
that b j is a root of pi (x) if 8 j 2 2, and the multiplicity of b j in pi (x)
is 8j - 1. Furthermore, there is a root of p'(X) between b l and b"2,
another root between b 2 and b 3 , . . . , and one between b r - I and b r ,
and all these roots must be single roots, since L=I (8 j - 1) + (r - 1)
counts already up to the degree n - 1 of pi (x). Consequently, the
multiple roots of pi (x) can only occur among the roots of p( x). 0
Fact 2. We have p'(X)2 2 p(:r)p"(x) for all x E lit
. Proof. If x = ai is a root of p(x), then there is nothing to show.
Assume then x is not a root. The product rule of differentiation yields
pi (x) = t p(:r)
x - ak
k=]
1/ 2: n 2: n p(x) ( ) 2: 1
p (x) = = 2p x
(x - ak)(x - at) (x - ak)(x - at)
k=1 t=1 {k,f}
lick
where {k, £} runs through all pairs {k, £} {I,... ,n}. Hence
I ,)
P (x)- =
n 1 "2
p(X)2(2: -)
k=1 X - ak
> 2p(x)"2 '" 1
(X-ak)(x-at)
{k,£}
p(x)pl/(x).
o
A theorem of P6lya on polynomials
III
Appendix: Chebyshev's theorem
Theorem. Let p(x) be a real polynomial of degree n > 1 with leading
coefficient 1. Then
1
max Ip(;r;)I > 2 n-l'
-1<x<1
Before we start, let us look at some examples where we have equality. The
margin depicts the graphs of polynomials of degrees 1, 2 and 3, where we
have equality in each case. Indeed, we will see that for every degree there
is precisely one polynomial with equality in Chebyshev's theorem.
. Proof. Consider a real polynomial p(x) = x n + a n _IX,,-1 + ... + ao
with leading coefficient 1. Since we are interested in the range -1 :S :r :S 1,
we set x = cas iJ and denote by g( iJ) := p( cos iJ) the resulting polynomial
in cas iJ,
g(19) = (cosiJ)n + an_l(COsiJ)n-l +... + ao. (1)
The proof proceeds now in the following three steps which are all classical
results and are all interesting in their own right.
(A) We express g( iJ) as a so-called cosine polynomial, that is, a polynomial
of the form
g( iJ) = b" cas niJ + b n - I cos(n - l)iJ + . . . + b l cas 19 + b o (2)
with b k E IR, and show that its leading coefficient is b 11 = 2}-1 '
(B) Given any cosine polynomial h( iJ) of order n (meaning that An is the
highest nonvanishing coefficient)
h(iJ) = A"cosniJ+An_1Cos(n-1)iJ+...+Ao, (3)
we show IAnl :S max Ih(iJ)l, which when applied to g(13) will then prove
the theorem.
(C) On the way to proving (B) we show the following remarkable result:
If h( iJ) = An cos n13 + . . . + Ao is a nonnegative cosine polynomial of
order n, then IAn I :S Ao.
Proof of (A). To pass from (1) to the representation (2), we have to ex-
press all powers (cas iJ)k as cosine polynomials. For example, the addition
theorem for the cosine gives
cos 2iJ = cos 2 iJ - sin 2 19 = 2 cos 2 iJ - 1,
so that cos 2 iJ = & cos 213 + t. To do this for an arbitrary power (cosl9)k
we go into the complex numbers, via the relation e ix = cas x + i sin :r.
;
The polynomials p](x) = .1:, P2(X) =
x 2 - andp3(x) = x 3 - x achieve
equality in Chebyshev's theorem.
112
A theorem of P6lya on polynomials
The e ix are the complex numbers of absolute value 1 (see the box on com-
plex unit roots on page 25). In particular, this yields
e im9 = cos nl3 + i sin nl3.
(4)
On the other hand,
e im9 = (e iiJ )" = (cosl3+isinl3)".
(5)
Equating the real parts in (4) and (5) we obtain by i 4f + 2 = -1, i 4f = 1 and
sin 2 e = 1 - cos 2 e
cosnl3
L (:£) (cosl3)n-4£(I- c:os 2 13)2£
£>0
- L C£:2)(COSl3),,-4£-2(1-cos 2 13)2H1.
f;o.O
(6)
We conclude that cos nl3 is a polynomial in cos 13,
cosnl3 = c,,(c:osl3)"+C"_1(COSl3)"-1 +...+co. (7)
From (6) we obtain for the highest coefficient
e" = L (;) + L (4£ : 2)
f;o.O £;0.0
2,,-1.
Now we turn our argument around. Assuming by induction that for k < n,
(cos l3)k can be expressed as a cosine polynomial of order k, we infer from
(7) that (cosl3)" can be written as a cosine polynomial of order n with
leading coefficient b" = 2n1_1 '
Proof of (C). Let h(l3) be a cosine polynomial of order n as in (3). Since
cos( -13) = cos 13 and sin( -13) = - sin 19, we find by (4)
e ikiJ + e- ik19
cos kl3 =
2
Setting z = e iiJ , we can therefore write h( 13) in the form
h(l3) =
z" + z-n zk + z-k
ATI 2 + . . . + Ak 2 + . . . + Ao (8)
Z2" + ZO z,,+k + z"-k
Z-"(A" 2 +...+Ak 2 +...+AoZTl)
z-"H(z).
Let us study H(z) as a complex polynomial in the variable z. H(z) has
degree 2n and does not vanish at z = 0 since An f O. Furthermore, we
have
H(z) = z2"H(!).
z
.....
A theorem of P6lya on polynomials
113
Hence, if a is a root of H(z), then so is . What's more, since H(z)
has real coefficients, we see that the conjugate complex number a and its
inverse l/a are also roots. Let us classify the 2n roots of H (z) as follows.
In the first group, we put all roots a with lal = 1, that is, a = l/a. In the
second group, we have a f l/a, hence one of a or 1/-a has absolute value
less than 1. So we see that H(z) can be written in the form
An II II 1
H(z) = 2' (z - a) (z - 13)(z - =).
101=1 0<1/31<1 13
(9)
It is at this point that we use the assumption that h( 1'J) ? 0 for all 19 . Since
z = e iiJ , we have Izi = 1, and therefore
h(1'J) = Ih(1'J) I = IH(z)l.
Let us look at the roots of H(z) as in (9). If a is a root of the first group,
lal = 1, then we have a = e iif , corresponding to a real root 75 of h(1'J),
o <:::: 75 :S 27L By differentiating the equatior: h(1'J) = e- iniJ H(e i11 ) it is
readily seen that he multiplicity of the root 1'J in h( 1'J) is the same as the
multipliciry of e iiJ as a root of H(z). But since h(1'J) is nonnegative, we
find that 1'J is a minimum of h(1'J) and hence has even multiplicity. Let us
turn to the second group, and consider a product I z - !; II z - 1/131. Since
zz = 1, we find
1 2
Iz jjl
(z -l)(z _! ).= (z:B -1)(z3 - 1)
13 13 13 13
1313 - z:B - z13 + 1 (z - (:1)(z - :B)
3/3 3/3
Iz-131 2
liJl2
and thus
Iz - fJllz -ll
13
Iz - !W
1131
In summary, we obtain
h(1'J) = IH(z)1 = lei II Iz - al 2
101=1
II Iz - 131 2 , for some 0 feE CC,
0<161<1
and have thus proved a famous theorem of Riesz (see [3]):
If h( 1'J) is a nonnegative cosine polynomial of order n, then
h(1'J) = lu(zW for z = e;11,
where u(z) = unz n + Un_1Zn-1 + ... + Uo is a polynomial of
degree n.
114
A theorem of P(5lya on polynomials
What's more, since H(z) has real coefficients, the roots Ct, B of u(z)
occur in conjugate pairs, which means that u(z) is a polynomial with real
coefficients Ui. Thus we obtain with z = c W , Z = c-it!
h(13) (U/l z " + '" + uo) (unz-/I + ... + liD)
z-ll( unz ll + .., + Uo) (Un + ... + '/1oz H ).
Comparing this to the expression (8) we get
AD
Ak
:z
ud + uf + . . . + U;l
unun-k + U n -IU n -1-k +... + ukuO (0 < k::; n)
(10)
and in particular,
An = 2u n u o.
(11)
Taking ( I 0) and (II) together we thus find
I I 21 II I ') 0) ,) ,) ,)
A/I = Un '/10 ::; ui) + u;, ::; ui) + '/11 + . . . + u;, = AO.
( 12)
Proof of (B). Consider first a cosine polynomial h( 13) = An cos rdJ + . . . +
A1 cos 19 of order n 2: 1 with constant coefficient AD = O. We claim
that h( iJ) assumes positive and negative values. Suppose, on the contrary,
h( t'J) 2: O. Then, by (12), I An I ::; 0, and thus An = 0, contradiction. In the
case when h( iJ) ::; 0, we apply the same reasoning to - h( iJ).
Let us, finally, return to general cosine polynomials h(13) of the form (3),
and let us set AI = maxh(19), m = minh(13). Note that AI > AD > m
by what we just proved. Considering the nonnegative cosine polynomials
AI - h( ii) and h( iJ) - m, respectively, we conclude by (12)
IAnl ::; AI - AD and IAn I::; AD - Tn, (13)
and thus
IAn I <
NI-m
2
::; max(AI, -m) = max Ih(iJ)l.
(14)
The proof of (B) and thus of Chebyshev's theorem is complete. D
Using (10)-(14), the reader can easily complete the analysis, showing that
g( iJ) = 2}-1 cos niJ is the only cosine polynomial of order n with leading
coefficient 1 that achieves the equality max Ig( 13) I = .),,1_ 1 ,
The polynomials TH(x) = cosniJ, x = cosiJ, are clled the Chebyshev
polynomials (of the first kind); thus 2 L 1 Tn (.7:) is the unique monic poly-
nomial of degree n where equality holds in Chebyshev's theorem.
References
[I] P. L. CEBYCEV: (Euvres. Vol. I, Acad. Imperiale des Sciences, St. Peters-
burg 1899, pp. 387-469.
[2] G. POLYA: Beitrag zur Verallgemeinerung des Verzermngssatzes aufmehrfach
z,usammenhangenden Gebieten, Sitzungsber. Preuss. Akad. Wiss. Berlin
(1928),228-232; Collected Papers Vol. I, MIT Press 1974,347-351.
[31 G. POLYA & G. SZEGO: Problems and Theorems ill Analysis, Vol. fl, Springer-
Verlag, Berlin Heidelberg New York 1976; Reprint 199L
On a lemma
of Littlewood and Offord
In their work on the distribution of roots of algebraic equations, Littlewood
and Offord proved in 1943 the following result:
Let aI, a2,.', , an be complex numbers with la;1 2:: 1forall 'i, and
consider the 2 n linear combinations L;l Cia; with C; E {I, -I}.
Then the number of sums L;l Cia; which lie in the interior of any
circle of radius 1 is not greater than
2"
c r;; log n for some constant c > O.
yn
Chapter 18
A few years later Paul Erdos improved this bound by removing the log n
term, but what is more interesting, he showed that this is, in fact, a simple
consequence of the theorem of Spemer (see page 143).
To get a feeling for his argument, let us look at the case when all a; are
real. We may assume that all ai are positive (by changing a; to -ai and C;
to -C; whenever ai < 0). Now suppose that a set of combinations L Cia; John E. Littlewood
lies in the interior of an interval of length 2. Let N = {I, 2,. " , n} be the
index set. For every L Cia; we set I := {i EN: Ci = I}. Now if I l'
for two such sets, then we conclude that
Lcai - LCiai = 2 L a; > 2,
;EI'\I
which is a contradiction. Hence the sets I form an antichain, and we
conclude from the theorem of Sperner that there are at most (In';2J) such
combinations. By Stirling's formula (see page II) we have
(Ln r ;2 J)
2"
< c-
- Vii
for some c > O.
For n even and all ai = 1 we obtain ("';J combinations L:l Cia; that
sum to O. Looking at the interval (-1, 1) we thus find that the binomial
number gives the exact bound.
In the same paper Erdos conjectured that (In'/2J) was the right bound for
complex numbers as well (he could only prove c2"n- 1 / 2 for some c) and
indeed that the same bound is valid for vectors aI, . " , an with lai I 2:: 1 in
a real Hilbert space, when the circle of radius 1 is replaced by an open ball
of radius 1.
Sperner's theorem. Any antic/win of
subsets of an n-set has si::e at most
(lll/2J) .
116
On a lemma of Littlewood and Offord
Erdos was right, but it took twenty years until Gyula Katona and Daniel
Kleitman independently came up with a proof for the complex numbers
(or, what is the same, for the plane IE?). Their proofs used explicitly the
2-dimensionality of the plane, and it was not at all clear how they could be
extended to cover finite dimensional real vector spaces.
But then in 1970 Kleitman proved the full conjecture on Hilbert spaces
with an argument of stunning simplicity. In fact, he proved even more. His
argument is a prime example of what you can do when you find the right
induction hypothesis.
A word of comfort for all readers who are not familiar with the notion of
a Hilbert space: We do not really need general Hilbert spaces. Since we
only deal with finitely many vectors ai, it is enough to consider the real
space IR d with the usual scalar product. Here is Kleitman's result.
Theorem. Let al,... , an be vectors in IRd, each of length
at least 1, and let R 1 , . . . , Rk be k open regions of IRd, where
Ix - yl < 2for any x, y that lie in the same region R i .
Then the number of linear combinations I:=l Eiai, Ei E {I, -I},
that can lie in the union U i R i of the regions is at most the sum of
the k largest binomial coefficients (j).
In particular, we get the bound (In/2J) for k = 1.
Before turning to the proof note that the bound is exact for
al = ... = an = a = (1,0,... ,O)T.
Indeed, for even n we obtain C,'/2) sums equal to 0, (n/;'-I) sums equal to
(-2)a, C'/+l) sums equal to 2a, and so on. Choosing balls of radius 1
around
2 rk - 1 l
- I ---:2' a,
(-2)a, 0, 2a,
2l k-l J
---:2' a ,
we obtain
(l n+l J) +... + ( n2 ) + (;) + ( nt 2 ) +... + (I n:;-l J)
sums lying in these k balls, and this is our promised expression, since the
largest binomial coefficients are centered around the middle (see page 12).
A similar reasoning works when n is odd.
. Proof. We may assume, without loss of generality, that the regions R i
are disjoint, and will do so from now on. The key to the proof is the recur-
sion of the binomial coefficients, which tells us how the largest binomial
coefficients of nand n - 1 are related. Set r = l n-+l J, 8 = l n+-l J,
then C), (,'l)' . '. , C) are the k largest binomial coefficients for n. The
recursion (7) = (nl) + C) implies
On a lemma of Littlewood and Offord
117
ten
l=r
tCI)+tGn
t=r l=r
t(nI)+i%lCI)
itl (n 1) + C 1),
and an easy calculation shows that the first sum adds the k + 1 largest
binomial coefficients (n7 1 ), and the second sum the largest k - 1.
Kleitman's proof proceeds by induction on n, the case n = 1 being trivial.
In the light of (1) we need only show for the induction step that the linear
combinations of ai, . . . ,an that lie in k disjoint regions can be mapped
bijectively onto combinations of al, . .. ,an-l that lie in k + 1 or k - 1
regIOns.
Claim. At least one of the translated regions Rj - an is disjoint
from all the translated regions Rl + an, . . . , Rk + an.
To prove this, consider the hyperplane H = {x : (an, x) = c} orthogonal
to an, which contains all translates R i + an on the side that is given by
(an, x) ::::: c, and which touches the closure of some region, say Rj + an.
Such a hyperplane exists since the regions are bounded. Now Ix - yl < 2
holds for any x E Rj and y in the closure of Rj, since Rj is open. We want
to show that Rj - an lies on the other side of H. Suppose, on the contrary,
that (an, x - an) ::::: C for some x E Rj, that is, (an, x) ::::: la n l 2 + c.
Let y + an be a point where H touches Rj + an, then y is in the closure
of Rj, and (an, y + an) = c, that is, (an, -y) = la n l 2 - c. Hence
2
(an, x - y) ::::: 21a n l ,
and we infer from the Cauchy-Schwarz inequality
21a n l 2 ::; (an, x - y) ::; lanllx - yl,
and thus (with lanl ::::: 1) we get 2 ::; 21a n l ::; Ix - yl, a contradiction.
The rest is easy. We classify the combinations L Eiai which come to lie in
Rl U. . . U Rk as follows. Into Class 1 we put all L=l E iai with En = -1
and all Ll Eiai with En = 1 lying in Rj, and into Class 2 we throw in
the remaining combinations L=l Eiai with En = 1, not in Rj. It follows
that the combinations L;ll Eiai corresponding to Class I lie in the k + 1
disjoint regions Rl + an,... , Rk + an and Rj - an, and the combina-
tions L:ll Eiai corresponding to Class 2 lie in the k - 1 disjoint regions
Rl - an, . . . , Rk - an without Rj - an. By induction, Class 1 con-
tains at most L:=r-l (n7 l ) combinations, while Class 2 contains at most
L:: (nil) combinations - and by (1) this is the whole proof, straight
from The Book. 0
(I)
] ]8
On a lemma of Littlewood and Offord
References
[11 P. ERDOS: On a lemma of Littlewood and OtJrml, Bulletin Amer. Math. Soc.
S 1 (1945), 898-902.
[2] G. KATONA: On a conjecture of Erdos and a stronger fbrm of Sperner's
theorem, Studia Sci. Math. Hungar. 1 (1966),59-63.
[3] D. KLEITMAN: On a lemma of Littlewood and Ojjr)rd on the distribution of
certain sums, Math. Zeitschrift 90 (1965),251-259.
[4] D. KLEITMAN: On a lemma (Jj'Littlewood and Offord on the distributions of
linear combinations of !'ector.S', Advances Math. S (1970), 155-157.
[5] J. E. LITTLEWOOD & A. C. OFFORD: On the number {)f real roots {)f a
random algebraic equation III, Mat. USSR Sb. 12 (] 943),277-285.
Cotangent and the Herglotz trick
What is the most interesting formula involving elementary functions? In
his beautiful article [2], whose exposition we closely follow, Jurgen Elstrodt
nominates as a first candidate the partial fraction expansion of the cotangent
function:
lOCI 1
JrCot7fX = - + L (- +-)
x x+n x-n
n=1
(x E JE.\Z).
This elegant formula was proved by Euler in 3178 of his Introductio in
Analysin Infinitorum and certainly counts among his finest achievements.
We can also write it even more elegantly as
N
lim L
N-+oc x+n
11=-N
1
Jr cot Jr:l;
but one has to note that the evaluation of the sum '\' E '" _ + 1 is a bit
61/ IL..X 11
dangerous, since the sum is only conditionally convergent, so its value
depends on the "right" order of summation.
We shall derive (I) by an argument of stunning simplicity which is
attributed to Gustav Herglotz - the "Herglotz trick." To get started, set
N 1
f(:r) := Jr cot Jr.7:, g(x):= lim L -,
N-+oc X + n
n=-1V
and let us try to derive enough common properties of these functions to see
in the end that they must coincide. . .
(A) The functions f(x) and g(x) are defined for all non-integral values
and are continuous there.
For the cotangent function f (x) = Jr cot JrX = Jr cos IT.c , this is clear (see
SIll TLr
the fi g ure ) . For g( x ) ' we first use the identit y _ + 1 +.....!.... - 2 2.' 2 to
. x 11 x-n n -x
rewrite Euler's formula as
Jr cot JrX
1 oc 2.7:
;: - L n 2 - X'2 .
11=1
Thus for (A) we have to prove that for every x Z the series
ex:; 1
L n 2 - x2
11=1
onverges uniformly in a neighborhood of x.
Chapter 19
(I)
Gustav Herglotz
f(x)
(2)
.7:
The function f (x) = 'if cot 'if X
120
Cotangent and the Herglotz trick
Addition theorems:
sin (x + y) = sinxcosy + cosxsiny
cos(x + y) = cosxcosy - sinxsiny
==> sin(x + %) = cosx
cos(x + %) = -sin x
sin x = 2 sin cos
cos x = cos 2 - sin 2 .
For this, we don't get any problem with the first term, for 71 = 1, or with
the terms with 271 - 1 ::; x 2 , since there is only a finite number of them. On
the other hand, for 71 2" 2 and 271 - 1 > x 2 , that is 71 2 - x 2 > (71 - 1)2 > 0,
the summands are bounded by
1
o < 2 .,
n - J;-
<
1
(71-1)2'
and this bound is not only true for x itself, but also for values in a neighbor-
2
hood of x. Finally the fact that L (n!1)2 converges (to ' see page 29)
provides the uniform convergence needed for the proof of (A).
(B) Both f and 9 are periodic of period 1, that is, f(x + 1) = f(x) and
g(x + 1) = g(x) hold for all x E IR\Z.
Since the cotangent has period Jr, we find that f has period 1 (see again the
figure above). For 9 we argue as follows. Let
gN(X) :=
N 1
L x+n '
n=-N
then
gN(x+l)
N 1
L x+l+n
n=-N
N+l 1
L x+n
n=-N+l
1 1
gN-l (x) + + N 1
:r+lV .1:+ +
Hence g(x + 1) = lim gN (x + 1) = lim gN-l (x) = g(:1:).
N --+= N--+=
(C) Both f and 9 are odd functions, that is, we have f ( - x) = - f (x) and
g(-x) = -g(x) for all x E IR\Z.
The function f obviously has this property, and for 9 we just have to
observe that gN(-x) = -gN(x).
The final two facts constitute the Herglotz trick: First we show that f and 9
satisfy the same functional equation, and secondly that h := f - 9 can be
continuously extended to all of IR.
(D) The two functions f and 9 satisfy the same functional equation:
f() + f( X!l ) = 2f(x) and g() + g( X!l ) = 2g(x).
For f (x) this results from the addition theorems for the sine and cosine
functions:
f()+f( X!] )
[ . 7fX . 7f X ]
cos 2 sm 2
1f -
sm i cos T
cos ( 7f i " + ".,J' )
2Jr . 7f x 7f-X
sm(T + 2)
2 f(x).
Cotanf?ent and the Herf?lotz trick
121
The functional equation for g folIows from
( .r ) ( .r+ 1 )
gN:2 + gN -y-
2
2 Y? J\ T (:1:) +
- x + 2N + 1
which in turn follows from
1 1
-+
1 + n ;rt 1 + n
2 ( + 1 ) .
x + 2n :1: + 2n + 1
Now let us look at
( 1 ;x; 2.7: )
h(x) = f(x) - g(:1:) = 7fcotu - - - L 2 2 '
X n -.r
n=l
We know by now that h is a continuous function on JR\Z that satisfies the
properties (B), (C), (D). What happens at the integral values? From the sine
and cosine series expansions, or by applying de I'Hospital's rule twice, we
find
( 1 ) x cos x - sin:r
Jim cot x - - = Jim
.1'-+0 X ,1'-+0 X SIn x
0,
and hence also
lim ( 7f cot 7fX - ) = O.
,r-+O X
But since the last sum I::=1 r/.r2 in (3) converges to 0 with x ---+ 0, we
have in fact Jim h(:r) = 0, and thus by periodicity
,1'-+0
Jim h(x) = 0
x----tn
for all n E Z.
In summary, we have shown the following:
(E) By setting h(:r) := 0 for:r E Z, h becomes a continuous function
on all of JR that shares the properties given in (B), (C) and (D).
We are ready for the coup de gdice. Since h is a periodic continuous func-
tion, it possesses a maximum m. Let Xo be a point in [0, 1] with h(xo) = m.
It follows from (D) that
h(7) + h( .Toil ) = 2m,
and hence that h( 7) = m. Iteration gives h( F) = Tn for all n, and hence
h(O) = m by continuity. But h(O) = 0, and so m = 0, that is, h(:r) ::; 0
for all x E JR. As h(x) is an odd function, h(x) < 0 is impossible, hence
h(x) = 0 for all x E JR, and Euler's theorem is proved. 0
A great many corollaries can be derived from (I), the most famous of which
concerns the values of Riemann's zeta function at even positive integers
(see the Appendix to Chapter 6),
00 1
((2k) = L 2J:
n
n=l
(k E 1':1).
(3)
246
COSX = 1 - ;, + I - ! ::!:.,
. £3.1' 5 :.r 7
SIIl X = X - 31 + 5' - 71 ::!: . . .
(4)
]22
Cotangent and the Herglotz trick
So to finish our story let us see how Eu]er - a few years later, in 1755 -
treated the series (4). We start with formula (2). Mu]tip]ying (2) by x and
setting y = 7fX we find for IYI < 7f:
Y cot Y
= y2
1-2'" 2 2
7f n - y2
n=l
00 y2 1
1- 2 0 7f 2 n21_(..lL)2 0
n-l 1Tn
The last factor is the sum of a geometric series, hence
Y coty
00 00 2k
1 - 2 L L ( JL ) .
7fn
n=lk=l
ex; ( 1 00 1 ) 2k
1 - 2 L 7f2k L n 2k y ,
k=l n=l
and we have proved the remarkable result:
For all k E 1':1, the coefficient of y2k in the power series expansion of y cot y
equals
[y2kJ Y cot Y =
2 = 1
-"'2k L "'2k
7f n
n=l
2
-"'2k((2k).
7f
(5)
There is another, perhaps much more "canonical," way to obtain a series
expansion of y cot y. We know from analysis that e iy = cos y +i sin y, and
thus
e iy + e- iy
2
siny
e iy - e- iy
cosy
2i
which yields
Y cot Y
. e iy + e- iy
l.y . .
. e'y - e- 1Y
. e 2iy + 1
zYe2iY _ 1
We now substitute z = 2iy, and get
z C Z + 1
y cot Y = --
2 e Z - 1
z z
- +
2 e Z - 1
(6)
Thus all we need is a power series expansion of the function c' -=-1 ; note
that this function is defined and continuous on all of IR (for z = 0 use the
power series of the exponential function, or alternatively de I'Hospital's
rule, which yields the value 1). We write
z
e Z -1
zn
-0 LEn,o
n.
n>O
(7)
Cotangent and the Herglotz trick
123
The coefficients Bn are known as the Bernoulli numbers. The left-hand
side of (6) is an even function (that is, f (z) = f ( - z), and thus we see that
Bn = 0 for odd n 2 3, while B 1 = - corresponds to the term of % in (6).
From
( zn ) ( zn ) ( zn )
L Bn n! (e z - 1) = L Bn n! L n!
n?O n?O n?1
z
we obtain by comparing coefficients for zn:
n-1 {
L Bk =
k=O k!(n - k)!
forn = 1,
for n -::J 1.
We may compute the Bernoulli numbers recursively from (8). The value
n = 1 gives Bo = 1, n = 2 yields lff + B 1 = 0, that is B] = -, and
so on.
Now we are almost done: The combination of (6) and (7) yields
CXJ (2 ' ) 2k CXJ ( _ 1) k 2 2k B
" zy " 2k 2k
ycoty = B2k = (2k)! y,
k=O k=O
and out comes, with (5), Euler's formula for ((2k):
CXJ 1
L"2k =
n
n=1
( _ 1) k-1 2 2k-I B
2k 2k
(2k)! Jr
(k E N).
Looking at our table of the Bernoulli numbers, we thus obtain once again
the sum I: ';2 = K6 2 from Chapter 6, and further
oc 1 Jr4
L n 4 = 90 '
n=1
CXJ 1 Jr6
L n 6 = 945 '
n=1
CXJ 1 Jr8
L n 8 = 9450 '
n=1
CXJ 1 Jrl0
L n 10 = 93555 '
n=1
CXJ 1 691Jr 12
L n 12 = 638512875 '
11=1
The Bernoulli number BlO = 56 that gets us ((10) looks innocuous enough,
but the next value B 12 = - 2 6 ,(3£0 ' needed for (( 12), contains the large prime
factor 691 in the numerator. Euler had first computed some values ((2k)
without noticing the connection to the Bernoulli numbers. Only the appear-
ance of the strange prime 691 put him on the right track.
Incidentally, since ((2k) converges to 1 for k ----+ CXJ, equation (9) tells us
that the numbers I B (2k) I grow very fast - something that is not clear from
the first few values.
In contrast to all this, one knows very little about the values of the Riemann
zeta function at the odd integers k 2 3; see page 35.
(8)
n012345678
Bn 1 - i 0 - 3 1 0 0 4 1 2 0 - 3 1 0
The first few Bernoulli numbers
(9)
124
Cotangent and the Herglotz trick
References
[1] S. BOCHNER: Book review of "Gesammelte Schriften" by Gustav Herglotz,
Bulletin Amer. Math. Soc. 1 (1979), 1020-1022.
[2] 1. ELSTRODT: Partialbruchzerlegung des Kotangens, Herglotz-Trick und die
Weierstraj3sche stetige, nirgends differenzierbare Funktion, Math. Semester-
berichte 4S (1998), 207-220.
[3] L. EULER: introductio in Analysin infin ito rum, Tomus Primus, Lausanne
1748; Opera Omnia, Ser. 1, Vol. 8. In English: introduction to Analysis of
the infinite, Book I (translated by J. D. Blanton), Springer-Verlag, New York
1988.
[4] L. EULER: institutiones calculi differentialis cum ejus usu in analysifinitorum
ac doctrina serierum, Petersburg 1755; Opera Omnia, Ser. I, Vol. 10.
Buffon's needle problem
A French nobleman, Georges Louis Leclerc, Comte de Buffon (1707 -1788),
posed the following problem in 1777:
I
Suppose that you drop a short needle on ruled paper - what is then
the probability that the needle comes to lie in a position where it
crosses one of the lines?
Chapter 20
The probability depends on the distance d between the lines of the ruled
paper, and it depends on the length (I of the needle that we drop - or
rather it depends only on the ratio . A short needle for our purpose is one
of length (I ::::; d. In other words, a short needle is one that cannot cross
two lines at the same time (and will come to touch two lines only with
probability zero). The answer to Buffon's problem may come as a surprise:
It involves the number 7'1.
Theorem ("Buffon's needle problem") Le Comte de Buffon
If a short needle, of length (I, is dropped on paper that is ruled with equally
spaced lines of distance d 2: (I, then the probability that the needle comes
to lie in a position where it crosses one of the lines is exactly
2 (I
p = ;'d'
The result means that from an experiment one can get approximate val-
ues for 7'1: If you drop a needle N times, and get a positive answer (an
intersection) in P cases, then should be approximately , that is, 7'1
should be approximated by 2J; . The most extensive (and exhaustive)
test was perhaps done by Lazzarini in 1901, who allegedly even built a
machine in order to drop a stick 3408 times (with = ). He found
that it came to cross a line 1808 times, which yields the approximation
7'1 2 . i = 3.1415929...., which is correct to six digits of 7'1, and
much too good to be true! (The values that Lazzarini chose lead directly
to the well-known approximation 7'1 i ; see page 29. This explains the
more than suspicious choices of 3408 and , where 3408 is a multiple
of 355. See [5] for a discussion of Lazzarini's hoax.)
The needle problem can be solved by evaluating an integral. We will do that
below, and by this method we will also solve the problem for a long needle.
But the Book Proof, presented by E. Barbier in 1860, needs no integrals.
It just drops a different needle. . . .
/ \
\
\ \ t d
{ --- - \ ,
I /
, ...------ /'
"'..- ./
/ . /I
\
\ \ 1(1
126
Bufj'on's needle problem
/
o
() ()()C) Cj
If you drop any needle, short or long, then the expected number of crossings
wi II be
E = PI + 2P2 + 3P3 + . . .
where PI is the probability that the needle will come to lie with exactly one
crossing, P2 is the probability that we get exactly two crossings, P3 is the
probability for three crossings, etc. The probability that we get at least one
crossing, which Buffon's problem asks for, is thus
P = PI + P2 + P3 + . . .
(Events where the needle comes to lie exactly on a line, or with an end-
point on one of the lines, have probability zero - so they can be ignored
throughout our discussion.)
On the other hand, if the needle is short then the probability of more than
one crossing is zero, P2 = P3 = ... = 0, and thus we get E = p: The
probability that we are looking for is just the expected number of crossings.
This reformulation is extremely useful, because now we can use linearity of
expectation (cf. page 78). Indeed, let us write E(R) for the expected number
of crossings that will be produced by dropping a straight needle of length £.
If this length is R = x + y, and we consider the "front part" of length x and
the "back part" of length y of the needle separately, then we get
E(x + y) = E(x) + E(y),
since the crossings produced are always just those produced by the front
part, plus those of the back part.
By induction on n this "functional equation" implies that E(nx) = nE(x)
for all n E N, and then that mE(x) = E(mx) = E(nx) = nE(x),
so that E(rx) = rE(x) holds for all rational r E Q. Furthermore, E(x)
is clearly monotone in x 2: 0, from which we get that E(x) = ex for all
x 2: 0, where c = E(I) is some constant.
But what is the constant?
For that we use needles of different shape. Indeed, let's drop a "polygonal"
needle of total length R, which consists of straight pieces. Then the number
of crossings it produces is (with probability 1) the sum of the numbers of
crossings produced by its straight pieces. Hence, the expected number of
. . .
crossmgs IS agam
E = cR,
by linearity of expectation. (For that it is not even important whether the
straight pieces are joined together in a rigid or in a flexible way!)
The key to Barbier's solution of Buff'on's needle problem is to consider a
needle that is a perfect circle C of diameter d, which has length :1; = d7f.
Such a needle, if dropped onto ruled paper, produces exactly two inter-
sections, always!
Buffon's needle problem
127
The circle can be approximated by polygons. Just imagine that together
with the circular needle C we are dropping an inscribed polygon Pn, as
well as a circumscribed polygon pn. Every line that intersects Pn will also
intersect C, and if a line intersects C then it also hits p n . Thus the expected
numbers of intersections satisfy
E(Pn) < E(C) < E(p n ).
Now both Pn and pn are polygons, so the number of crossings that we may
expect is "c times length" for both of them, while for C it is 2, whence
C£(Pn) < 2 < C£(p n ).
Both P" and P" approximate C for n ---+ 00. In particular,
lim £(Pn) = d7f = lim £(P"),
n --+ ex) n --+ ex)
and thus for n ---+ 00 we infer from (1) that
cd7f < 2 :<::; cd7f,
which gives c = .
But we could also have done it by calculus! The trick to obtain an "easy"
integral is to first consider the slope of the needle; let's say it drops to lie
with an angle of 0: away from horizontal, where 0: will be in the range
o :<::; 0: :<::; . (We will ignore the case where the needle comes to lie with
negative slope, since that case is symmetric to the case of positive slope,
and produces the same probability.) A needle that lies with angle 0: has
height £ sin 0:, and the probability that such a needle doesn't cross any of
the horizontal lines of distance d is £ s c> . Thus we get the probability by
averaging over the possible angles 0:, as
7r/2
2 J £ sin 0: 2 £ [ ] 7r /2
P = - - do: = - - - cos 0:
7f d 7fd 0
o
2 £
7fd
For a long needle, we get the same probability f Sil C> as long as £ sin 0: :<::; d,
that is, in the range 0 :<::; 0: :<::; arcsin . However, for larger angles 0: the
needle must cross a line, so the probability is 1. Hence we compute
7r/2
P (larcsil1(d/f) £ s 0: + J 1 dx)
arcsin(d/f)
2 ( £ [ J arcSin(d/f) ( 7f d ))
:; d - COS 0: 0 + "2 - arcsin e
1 + ( ( 1 - J 1 - d 2 ) - arcsin )
7f d £2' l
for £ > d.
Q n
pn
(1)
o
o
()()
D
o
o
\
\
\
/
/
---
'-
-------
I
I
128
Buffon's needle problem
"Got a problem:'''
So the answer isn't that pretty for a longer needle, but it provides us with a
nice exercise: Show ("just for safety") that the formula yields for £ = d,
that it is strictly increasing in £, and that it tends to 1 for £ ---+ 00.
References
[I] E. BARBIER: Note sur Ie probleme de l'aiguille et Ie jeu du Joint couvert, J.
Mathematiques Pures et Appliquees (2) 5 (1860), 273-286.
[2] L BERGGREN, 1. BORWEIN & P. BORWEIN, EDS.: Pi: A Source Book,
Springer-Verlag, New York 1997.
[3] G. L LECLERC, COMTE DE BUPFON: Essai d'arithmtitique morale, Ap-
pendix to "Histoire naturelle generale et particuliere," Vol. 4, 1777.
14] D. A. KLAIN & G.-c. ROTA: Introduction to Geometric Probability, "Lezioni
Lincee," Cambridge University Press 1997.
[5] T. H. O'BEIRNE: Puzzles and Paradoxes, Oxford University Press, London
1965.
.
/
\
\
I
'\
/
/
Com bi natorics
21
Pigeon-hole and
double counting 131
22
Three famous theorems
on finite sets 143
23
Lattice paths and determinants 149
24
Cayley's formula
for the number of trees 155
25
Completing Latin squares 161
26
The Dinitz problem 167
"A melancholic Latin square"
,..
Pigeon-hole and double counting
Some mathematical principles, such as the two in the title of this chapter,
are so obvious that you might think they would only produce equally
obvious results. To convince you that "It ain't necessarily so" we
illustrate them with examples that were suggested by Paul Erdos to be
included in The Book. We will encounter instances of them also in later
chapters.
Pigeon-hole principle.
If n objects are placed in r boxes, where r < n, then at least one of
the boxes contains more than one object.
Well, this is indeed obvious, there is nothing to prove. In the language of
mappings our principle reads as follows: Let Nand R be two finite sets
with
INI = n > r = IRI,
and let f : N ---+ R be a mapping. Then there exists some a E R with
If- 1 (a) I 2: 2. We may even state a stronger inequality: There exists some
a E R with
Irl(a)1 2: Il. (1)
In fact, otherwise we would have I f-l (a) I < for all a, and hence
n = 2:: If-I (a) I < r = n, which cannot be.
aER
1. Numbers
Claim. Consider the numbers 1,2,3, . . . , 2n, and take any n + 1
of them. Then there are two among these n + 1 numbers which are
relatively prime.
This is again obvious. There must be two numbers which are only I apart,
and hence relatively prime.
But let us now turn the condition around.
Claim. Suppose again A <;;; {I, 2,... , 2n} with IAI = n+ 1. Then
there are always two numbers in A such that one divides the other.
Chapter 21
J--::- f;>
/. . /
,\ i!!
,< .,---"> ,
';;;'<.' -, ,----- - </
"The pigeon-holes from a hird',\'
perspective ..
132
Pigeon-hole and double counting
Both results are no longer true if one
replaces n+ 1 by n: For this consider
the sets {2, 4, 6, . . . , 2n}, respectively
{n+l,n+2,... ,2n}.
The reader may have fun in proving that
for mn numbers the statement remains
no longer true in general.
This is not so clear. As Erdos told us, he put this question to young Lajos
Posa during dinner, and when the meal was over, Lajos had the answer. It
has remained one of Erdos' favorite "initiation" questions to mathematics.
The (affinnative) solution is provided by the pigeon-hole principle. Write
every number a E A in the form a = 2 k m, where m is an odd number
between 1 and 2n - 1. Since there are n + 1 numbers in A, but only n
different odd parts, there must be two numbers in A with the same odd
part. Hence one is a multiple of the other. 0
2. Sequences
Here is another one of Erdos' favorites, contained in a paper of Erdos and
Szekeres on Ramsey problems.
Claim. In any sequence al, a2, . ., , amn+l of mn + 1 distinct real
numbers, there exists an increasing subsequence
ai, < ai2 < ... < aim+l
(i l < i 2 < '" < im+d
of length m + 1, or a decreasing subsequence
aj, > ah > ... > ajn+l
(iI < 12 < . . . < jn+d
of length n + 1, or both.
This time the application of the pigeon-hole principle is not immediate.
Associate to each ai the number ti which is the length of a longest increas-
ing subsequence starting at ai. If ti 2': m + 1 for some i, then we have
an increasing subsequence of length m + 1. Suppose then that ti m for
all i. The function f : ai f-----t ti mapping {al, . . . , amn+d to {I, . .. , m}
tells us by (I) that there is some s E {I,... , m} such that f(ai) = s for
Tn + 1 = n + 1 numbers ai. Let aj" ah, . . . , ajn+l (iI < '" < jn+l)
be these numbers. Now look at two consecutive numbers aji' a ji+l' If
aji < aji+l' then we would obtain an increasing subsequence of length
s starting at aji+l' and consequently an increasing subsequence of length
s + 1 starting at aji' which cannot be since f (aj,) = s. We thus obtain a
decreasing subsequence aj, > ah > . . . > ajn+l of length n + 1. 0
This simple-sounding result on monotone subsequences has a highly non-
obvious consequence on the dimension of graphs. We don't need here the
notion of dimension for general graphs, but only for complete graphs Kn.
It can be phrased in the following way. Let N = {I,... , n}, n 2': 3, and
consider m permutations 7fl, . " , 7f m of N. We say that the permutations
7f; represent Kn if to every three distinct numbers i, j, k there exists a per-
mutation 7f in which k comes after both i and j. The dimension of Kn is
then the smallest m for which a representation 7fl, . .. , 7f m exists.
As an example we have dim(K3) = 3 since anyone of the three numbers
must come last, as in 7fl = (1,2,3), 7f2 = (2,3,1), 7f3 = (3,1,2). What
Pigeon-hole and double counting
133
about K 1 ? Note first dirn(Kn) :<::; dirn(Kn+I): just delete n + 1 in a
representation of Kn+l' So, dirn(K 4 ) 2': 3, and, in fact, dirn(K4) = 3, by
taking
'ifl = (1,2,3,4), 'if2 = (2,4,3,1), 'if:3 = (1,4,3,2).
It is not quite so easy to prove dirn(K5) = 4, but then, surprisingly, the
dimension stays at 4 up to n = 12, while dim(K I3 ) = 5. So dim(Kn)
seems to be a pretty wild function. WelI, it is not! With n going to infinity,
dirn(Kn) is, in fact, a very welI-behaved function - and the key for finding
a lower bound is the pigeon-hole principle. We claim
dim( K n) 2': log2 log2 n.
Since, as we have seen, dim(Kn) is a monotone function in n, it suffices to
verify (2) for n = 2 2P + 1, that is, we have to show
dim(Kn) 2': p + 1 for n = 2 2P + l.
Suppose, on the contrary, dim(Kn) :<::; p, and let 'ifl, . .. , 'if P be representing
permutations of N = {I, 2,... , 2 2P + I}. Now we use our result on mono-
tone subsequences p times. In 'ifl there exists a monotone subsequence Al
of length 2 2p - 1 + 1 (it does not matter whether increasing or decreasing).
Look at this set Al in 'if2. Using our result again, we find a monotone sub-
sequence A 2 of Al in 'if2 of length 2 2p - 2 + 1, and A 2 is, of course, also
monotone in 'ifl. Continuing, we eventualIy find a subsequence Ap of size
22° + 1 = 3 which is monotone in all permutations 'ifi. Let Ap = (a, b, c),
then either a < b < c or a > b > c in all'ifi' But this cannot be, since there
must be a permutation where b comes after a and c. D
The right asymptotic growth was provided by Joel Spencer (upper bound)
and by Erdos, Szemeredi and Trotter (lower bound):
1
dim(Kn) log210g2 n + (2 + 0(1)) log210g210g2 n.
But this is not the whole story: Very recently, Morris and Hoten found
a method which, in principle, establishes the precise value of dim(Kn).
Using their result and a computer one can obtain the values given in the
margin. This is truly astounding! Just consider how many permutations of
size 1422564 there are. How does one decide whether 7 or 8 of them are
reuired to represent K1422564?
3. Sums
Paul Erdos attributes the folIowing nice application of the pigeon-hole
principle to Andrew Vazsonyi and Marta Sved:
Claim. Suppose we are Riven n integers aI, . . . , an, which need
not be distinct. Then there is always a set of consecutive numbers
ak+l, ak+2,... , at whose sum I:;=k+l ai is a multiple ofn.
'if): 1 2 3 5 6 7 8 9 10 11 12 4
'if2: 2 3 4 8 7 6 5 12 11 10 9 1
'if3: 3 4 1 11 12 910 6 5 8 7 2
'if4: 4 1 2 10 9 12 11 7 8 5 6 3
These four permutations represent K 12
(2)
dim(Kn) :<::; 4 n 12
dim(Kn) 5 n 81
dim(Kn) 6 n 2646
dim(Kn) 7 n :<::; 1422564
134
Pigeon-hole and double counting
C 1 2 3 4 5 6 7 8
R
1 1 1 1 1 1 1 1 1
2 1 1 1 1
3 1 1
4 1 1
5 1
6 1
7 1
8 1
For the proof we set N = {O, a1, a1 + a2,... , a1 + a2 + .., + an} and
R = {O, 1,... , n - I}. Consider the map f : N --+ R, where f(m) is
the remainder of m upon division by n. Since INI = n + 1 > n = IRI, it
follows that there are two sums a] +. . . + ak, a] +. . . + at (k < £) with the
same remainder, where the first sum may be the empty sum denoted by O.
It follows that
f
L a; =
t
La;
k
"a
L'
i=A'+]
i=l
i=l
has remainder 0 - end of proof.
o
Let us turn to the second principle: counting in two ways. By this we mean
the following.
Double counting.
Suppose that we are given two finite sets Rand C and a subset
5 R x C. Whenever (p, q) E 5, then we say p and q are incident.
If rp denotes the number of elements that are incident to pER,
and c q denotes the number of elements that are incident to q E C,
then
LTp
pER
LC q ,
qEC
1 5 1
(3)
Again, there is nothing to prove. The first sum classifies the pairs in 5
according to the first entry, while the second sum classifies the same pairs
according to the second entry.
There is a useful way to picture the set 5. Consider the matrix A = (u pq ),
the incidence matrix of 5, where the rows and columns of A are indexed
by the elements of Rand C, respectively, with
U pq = {
if (p,q) E 5
if (p, q) 5.
With this set-up, Tp is the sum of the p-th row of A and c q is the sum of
the q-th column. Hence the first sum in (3) adds the elements of A (that is,
counts the l's in 5) by rows, and the second sum by columns.
The following example should make this correspondence clear. Let R =
C = {I, 2,... , 8}, and set 5 = {(i,j) : i divides j}. We then obtain the
matrix in the margin, which only displays the I's.
4. Numbers again
Look at the table on the left. The number of 1 's in column j is precisely the
number of divisors of j; let us denote this number by t(j). Let us ask how
....-
Pigeon-hole and double counting
135
large this number t(j) is on the average when j ranges from 1 to n. Thus,
we ask for the quantity
n 1 2 3 4 5 6 7 8
_ 1 n 3 ,5 7 16 5
t(n) = - L t(j). l(n) 1 '2 3" 2 2 3" '"7 '2
n j=1 The first few values of [(71)
How large is l( 71) for arbitrary n? At first glance, this seems hopeless. For
prime numbers p we have t(p) = 2, while for 2" we obtain a large number
t(2") = k + 1. So, t(n) is a wildly jumping function, and we surmise that
the same is true for l(n). Wrong guess, the opposite is true! Counting in
two ways provides an unexpected and simple answer.
Consider the matrix A (as above) for the integers I up to n. Counting by
columns we get 2:.7=1 t(j). How many I's are in row i? Easy enough, the
I's correspond to the multiples of i: Ii, 2i, . . . , and the last multiple not
exceeding 71 is lIf J i. Hence we obtain
_ In Inn In 1/ 1
t(n) = - L t(j) = - L l --:- J - L = L-:-,
71 71 7 71 7 7
j=l ;=1 ;=1 ;=1
where the error in each summand, when passing from lIf J to If, is less
than 1. Hence the overall error for the average is less than 1 as well. Now
the last sum is the n-th harmonic number, which (see page II) is roughly
equal to log n, with an error of less than 1. In summary, we have proved
the remarkable result that, while t(n) is totally erratic, the average l(n)
behaves beautifully: l(n) log 71 holds with an error of less than 2.
5. Graphs
Let G be a finite simple graph with vertex set F and edge set E. We have
defined in Chapter 10 the degree d( v) of a vertex v as the number of edges
which have v as an end-vertex. In the example of the figure, the vertices
1,2, . .. , 7 have degrees 3,2,4,3,3,2,3, respectively.
Almost every book in graph theory starts with the following result (that we
have already encountered in Chapters 10 and 16):
L d(v) = 21EI.
vEt'
For the proof consider S C;;; T X E, where S is the set of pairs (v, e) such
that v E F is an end-vertex of e E E. Counting S in two ways gives on the
one hand 2:.vEt' d(v), since every vertex contributes d(v) to the count, and
on the other hand 21EI, since every edge has two ends. D
As simple as the result (4) appears, it has many important consequences,
some of which will be discussed as we go along. We want to single out in
this section the following beautiful application to an extremal problem on
graphs. Here is the problem:
1 6
2 <1>4 <]
3 7
(4)
136
Pifieon-hole and double counting
Suppose G = (V, E) has n vertices and contains no cycle of
length 4 (denoted by C 4 ), that is, no subgraph t!. How many
edges can G have at most?
As an example, the graph in the margin on 5 vertices contains no 4-cycle
and has 6 edges. The reader may easily show that on 5 vertices the maximal
number of edges is 6, and that this graph is indeed the only graph on 5
vertices with 6 edges that has no 4-cycle.
Let us tackle the general problem. Let G be a graph on n vertices without
a 4-cycle. As above we denote by d(u) the degree of u. Now we count
the following set S in two ways: S is the set of pairs (u, {v, 1o}) where
u is adjacent to v and to w, with v i- 1O. In other words, we count all
occurrences of u
v1O
Summing over u, we find ISI = L1LE' (d')). On the other hand,
every pair {v, 1O} has at most one common neighbor (by the C 4 -condition).
Hence ISI :::; G), and we conclude
. (dU)) < G)
or
L d(u)"2 :::; n(n - 1) + L d(u).
1LE' "E'
(5)
Next (and this is quite typical for this sort of extremal problems) we
apply the Cauchy-Schwarz inequality to the vectors (d(ud,... , d(u,,))T
and (1,1,... , I)T, obtaining
(L d(u)f < n L d(u)"2,
"E' "E'
and hence by (5)
"2
(L d(u))
"E'
< n 2 (n - 1) + n L d(u).
uEV
Invoking (4) we find
41EI"2 < n 2 (n - 1) + 2n lEI
or
I E I 2 _ !.!: I E I _ n 2 (n -1) < 0
2 4 -'
Solving the corresponding quadratic equation we thus obtain the following
result of Istvan Reiman.
Pigeon-hole and double counting
137
Theorem. If the graph G on n vertices contains no 4-cycles, then
lEI < l(1+ V4n-3 )J.
For n = 5 this gives lEI :::; 6, and the graph above shows that equality
can hold.
Counting in two ways has thus produced in an easy way an upper bound
on the number of edges. But how good is the bound (6) in general? The
following beautiful example [2] [3] [6] shows that it is almost sharp. As is
often the case in such problems, finite geometry leads the way.
In presenting the example we assume that the reader is familiar with the
finite field Zp of integers modulo a prime p (see page 18). Consider the
3-dimensional vector space X over Zp. We construct from X the fol-
lowing graph G p . The vertices of G p are the one-dimensional subspaces
[v] : = span z {v}, 0 i- v EX, and we connect two such subspaces
p
[v], [w] by an edge if
(v, w) = VI WI + V2'UJ2 + V3'UJ3 = O.
Note that it does not matter which vector i- 0 we take from the subspace.
In the language of geometry, the vertices are the points of the projective
plane over Zp, and [w] is adjacent to [v] if w lies on the polar line of v.
As an example, the graph G 2 has no 4-cycle and contains 9 edges, which
almost reaches the bound 10 given by (6). We want to show that this is true
for any prime p.
Let us first prove that G p satisfies the C 1 -condition. If [u] is a common
neighbor of [v] and [w], then u is a solution of the linear equations
V]X + V2Y + V:jZ = °
WIX + W2Y + W3Z = O.
Since v and ware linearly independent, we infer that the solution space
has dimension 1, and hence that the common neighbor [u] is unique.
Next, we ask how many vertices G p has. It's double counting again. The
space X contains p3 - 1 vectors i- O. Since everyone-dimensional sub-
space contains p - 1 vectors i- 0, we infer that X has II = p2 + P + 1
one-dimensional subspaces, that is, G p has n = p2 + p + 1 vertices. Simi-
larly, any two-dimensional subspace contains p2 -1 vectors i- 0, and hence
2
Pp/ = P + lone-dimensional subspaces.
It remains to determine the number of edges in G p , or, what is the same by
(4), the degrees. By the construction of G p , the vertices adjacent to [u] are
the solutions of the equation
UIX + U2Y + U:3Z = 0.
The solution space of (7) is a two-dimensional subspace, and hence there
are p + 1 vertices adjacent to u. But beware, it may happen that [u] itself
is a solution of (7). In this case there are only p vertices adjacent to [u].
(6)
(0,0,1)
(1,0,0)
(0, 1,0)
(1,1,0)
The graph G 2 : its vertices are all seven
nonzero triples (x, y, z).
(7)
]38
Pigeon-hole and double counting
0 1 1 1 0 0 0
1 0 1 0 1 0 0
1 1 0 0 0 1 0
A= 1 0 0 1 0 0 1
0 1 0 0 1 0 1
0 0 1 0 0 1 1
0 0 0 1 1 1 0
The matrix for G 2
In summary, we obtain the following result: If u lies on the conic given by
x 2 + y2 + Z2 = 0, then d([u]) = p, and, if not, then d([u]) = p + 1. So it
remains to find the number of one-dimensional subspaces on the conic
x 2 + y2 + Z2 = O.
Let us anticipate the result which we shall prove in a moment.
Claim. There are precisely Ii solutions (.r, y, z) of'the equation
x 2 + y2 + Z2 = 0, and hence (excepting the zero solution) preciselv
:-=-/ = p + 1 vertices in G p of defiree p.
With this, we complete our analysis of G p . There are p + 1 vertices of
degree p, hence (p2 + P + 1) - (p + 1) = p2 vertices of degree p + 1.
Using (4), we obtain
lEI
(p + l)p p2(p + 1) (p + 1)2p
2 + 2 = 2
(P:1)P (1+(2P+1)) = p2:P (1+ V 4p2+4P+1).
Setting n = p2 + P + 1, the last equation reads
n-1
lEI = (1+ v4n-3 ),
and we see that this almost agrees with (6).
Now to the proof of the claim. The following argument is a beautiful appli-
cation of linear algebra involving symmetric matrices and their eigenvalues.
We will encounter the same method in Chapter 3 I, which is no coincidence:
both proofs are from the same paper by Erdos, Renyi and S6s.
We represent the one-dimensional subspaces of X as before by vectors
VI, V2, ... , V p 2+ p +l, any two of which are linearly independent. Simi-
]arly, we may represent the two-dimensional subspaces by the same set of
vectors, where the subspace corresponding to u = (Ul, U2, U3) T is the
set of solutions of the equation Ul.T + U2Y + U:3Z = 0 as in (7). (Of
course, this is just the duality principle of linear algebra.) Hence, by (7),
a one-dimensional subspace, represented by Vi, is contained in the two-
dimensional subspace, represented by V j, if and only if (v i. V j) = O.
Consider now the matrix A = (aij) of size (p2+ p + 1) x (p2 +p+ 1), defined
as follows: The rows and columns of A correspond to VI... . , V p 2+ p +l (we
use the same numbering for rows and columns) with
aij := {
if (Vi, Vj) = 0,
otherwise.
A is thus a real symmetric matrix, and we have aii = 1 if (v i, Vi) = 0, that
is, precisely when Vi lies on the conic x 2 + l.P + Z2 = O. Thus, all that
remains to show is that
trace A
P + 1.
........-
Pi[;eon-hole and double counting
]39
From linear algebra we know that the trace equals the sum of the eigenval-
ues. And here comes the trick: While A looks complicated, the matrix A 2
is easy to analyze. We note two facts:
. Any row of .4_ contains precisely p + 1 ] 'so This implies that p + 1 is an
eigenvalue of .4_, since Al = (p+ 1)1, where 1 is the vector consisting
of I's.
. For any two distinct rows Vi, V j there is exactly one column with a I in
both rows (the column corresponding to the unique subspace spanned
byvi,vj).
Using these facts we find
1'[P:'
1
p I)
pI + J,
p+1
where I is the identity matrix and,] is the all-ones-matrix. Now, J has
the eigenvalue p2 + P + 1 (of multiplicity I) and 0 (of multiplicity p2 + p).
Hence .--1 2 has the eigenvalues p2 + 2p+ 1 = (p+ 1)2 of multiplicity I and p
of multiplicity p2 + p. Since A is real and symmetric, hence diagonalizable,
we find that A has the eigenvalue p + 1 or - (p + 1) and p2 + P eigenvalues
::I:yP. From Fact I above, the first eigenvalue must be p + 1. Suppose
that yP has multiplicity 7", and - yP multiplicity 8, then
trace A = (p + 1) + 7"yP - 8yP.
But now we are home: Since the trace is an integer, we must have 7" = 8,
so trace A = p + 1. D
6. Sperner's Lemma
In 1911, Luitzen Brouwer published his famous fixed point theorem:
Every continuous function f : E" ----+ B" of an n-dimensional
ball to itself has a fixed point (a point x E E" with f(x) = x).
For dimension 1, that is for an interval, this follows easily from the inter-
mediate value theorem, but for higher dimensions Brouwer's proof needed
some sophisticated machinery. It was therefore quite a surprise when in
1928 young Emanuel Sperner (he was 23 at the time) produced a simple
combinatorial result from which both Brouwer's fixed point theorem and
the invariance of the dimension under continuous maps could be deduced.
And what's more, Sperner's ingenious lemma is matched by an equally
beautiful proof - it is just double counting.
140
Pigeon-hole and double countin!?
3
3
/ \
. - -\
I 2 2 I 2
The triangles with three different colors
are shaded
3
,
--0----
We discuss Spemer's lemma, and Brouwer's theorem as a consequence, for
the first interesting case, that of dimension n = 2. The reader should have
no difficulty to extend the proofs to higher dimensions (by induction on the
dimension).
Sperner's Lemma.
Suppose that some "bi!?" trian!?le with vertices VI, V 2 , V3 is trian!?ulated
(that is, decomposed into a finite number of "small" trian!?les that fit to-
gether edge-by-edge).
Assume that the vertices in the triangulation get "colors" from the set
{I, 2, 3} such that Vi receives the color i (for each i), and only the col-
ors i and j are used for vertices along the edge from Ii to 1 j (for i =I j),
while the interior vertices are colored arbitrarily with 1, 2 or 3.
Then in the triangulation there must be a small "tricolored" triangle, which
has all three different vertex colors.
. Proof. We will prove a stronger statement: the number of tricolored
triangles is not only nonzero, it is always odd.
Consider the dual graph to the triangulation, but don't take all its edges
- only those which cross an edge that has endvertices with the (different)
colors 1 and 2. Thus we get a "partial dual graph" which has degree 1 at all
vertices that correspond to tricolored triangles, degree 2 for all triangles in
which the two colors 1 and 2 appear, and degree 0 for triangles that do not
have both colors 1 and 2. Thus only the tricolored triangles correspond to
vertices of odd degree (of degree 1).
However, the vertex of the dual graph which corresponds to the outside of
the triangulation has odd degree: in fact, along the big edge from l T 1 to 1'2,
there is an odd number of changes between 1 and 2. Thus an odd number
of edges of the partial dual graph crosses this big edge, while the other big
edges cannot have both 1 and 2 occurring as colors.
Now since the number of odd vertices in any finite graph is even (by equa-
tion (4)), we find that the number of small triangles with three different
colors (corresponding to odd inside vertices of our dual graph) is odd. D
With this lemma, it is easy to derive Brouwer's theorem.
. Proof of Brouwer's fixed point theorem (for n = 2). Let.6. be the tri-
angle in JE.3 with vertices el = (1,0,0), e2 = (0,1,0), and e3 = (0,0,1).
It suffices to prove that every continuous map f : .6. ----t .6. has a fixed
point, since .6. is homeomorphic to the two-dimensional ball B 2 .
We use 6(7) to denote the maximal length of an edge in a triangulation T-
One can easily construct an infinite sequence of triangulations 'TJ., 12, . . .
of .6. such that the sequence of maximal diameters 6 (7,,) converges to O.
Such a sequence can be obtained by explicit construction, or inductively,
for example by taking 7,,+1 to be the barycentric subdivision of 7".
For each of these triangulations, we define a 3-coloring of their vertices v
by setting A( v) := min {i : f (V)i < Vi}, that is, >"( v) is the smallest index i
such that the i-th coordinate of f (v) - v is negative. Assuming that f has
no fixed point, this is well-defined. To see this, note that every v E .6. lies
Pigeon-hole and douhle counting
141
in the plane Xl + ;r;2 + X3 = 1, hence Li Vi = 1. So if f(v) i- v, then at
least one of the coordinates of f( v) - v must be negative (and at least one
must be positive).
Let us check that this coloring satisfies the assumptions of Sperner's lemma.
First, the vertex ei must receive color i, since the only possible negative
component of f(ei) - ei is the i-th component. Moreover, if v lies on the
edge opposite to ei, then Vi = 0, so the i-th component of f( v) - v cannot
be negative, and hence v does not get the color i.
Sperner's lemma now tells us that in each triangulation Tk there is a tri-
colored triangle {vk:l,vk:2,Vk:3} with A(V k : i ) = i. The sequence of
points (V k : l h::O-l need not converge, but since the simplex .6. is compact
some subsequence has a limit point. After replacing the sequence of tri-
angulations 7f.. by the corresponding subsequence (which for simplicity
we also denote by Tk) we can assume that (Vk:l)k converges to a point
v E .6.. Now the distance of V k : 2 and V k : 3 from V k : l is at most the mesh
length b(Tk), which converges to O. Thus the sequences (V k : 2 ) and (v.': 3 )
converge to the same point v.
But where is f(v)? We know that the first coordinate f(V k : l ) is smaller
than that of V k : l for all k. Now since f is continuous, we derive that the
first coordinate of f ( v) is smaller or equal to that of v. The same reasoning
works for the second and third coordinates. Thus none of the coordinates
of f (v) - v is positive - and we have already seen that this contradicts
the assumption f(v) i- v. 0
References
[I] L. E. J. BROUWER: Uber Abbildungen \'on Mannigfaltigkeiten, Math. An-
nalen 71 (1912), 97-115.
[2] W. C. BROWN: On graphs that do not contain a Thomsen graph, Canadian
Math. Bull. 9 (1966),281-285.
[3] P. ERDOS, A. RENYI & V. SOS: On a problem of graph theory, Studia Sci.
Math. Hungar. 1 (1966), 215-235.
[4] P. ERDOS & G. SZEKERES: A combinatorial problem in geometry, Compositio
Math. (1935), 463-470.
[5J S. HOTEN & W. D. MORRIS: The order dimension (d' the complete graph,
Discrete Math. 201 (1999), 133-139.
[6J 1. REIMAN: Uber ein Problem von K. Zarankiewicz, Acta Math. Acad. Sci.
Hungar. 9 (1958), 269-273.
[7] J. SPENCER: Minimal scrambling sets of simple orders, Acta Math. Acad. Sci.
Hungar. 22 ( 1971), 349-353.
[8J E. SPERNER: Neuer Beweis fiir die lnvarianz der Dimensionszahl und des
Gebietes, Abh. Math. Sem. Hamburg 6 (1928), 265-272.
[9] W. T. TROTTER: Combinatorics and Partially Ordered Sets: Dimension
TheorY, John Hopkins University Press, Baltimore and London 1992.
--
Three famous theorems
on finite sets
In this chapter we are concerned with a basic theme of combinatorics:
properties and sizes of special families F of subsets of a finite set N =
{I, 2,... , n}. We start with two results which are classics in the field:
the theorems of Sperner and of Erdos-Ko-Rado. These two results have in
common that they were reproved many times and that each of them initi-
ated a new field of combinatorial set theory. For both theorems, induction
seems to be the natural method, but the arguments we are going to discuss
are quite different and truly inspired.
In 1928 Emanuel Sperner asked and answered the following question: Sup-
pose we are given the set N = {I, 2,. .. , n}. Call a family F of subsets of
N an antichain if no set of F contains another set of the family F. What is
the size of a largest antichain? Clearly, the family Fk of all k-sets satisfies
the antichain property with IFk I = C). Looking at the maximum of the
binomial coefficients (see page 12) we conclude that there is an antichain
of size (l"/2 J) = maxk (Z). Sperner's theorem now asserts that there are
no larger ones.
Theorem 1. The size qf' a largest antichain (f' an n-set is (l,,'hJ)'
. Proof. Of the many proofs the following one, due to Lubell, is probably
the shortest and most elegant. Let F be an arbitrary antichain. Then we
have to show IFI :::: (In/2J)' The key to the proof is that we consider
chains of subsets 0 = Co C C 1 C C 2 C ... C C" = N, where IC;I = i
for i = 0, . .. , n. How many chains are there? Clearly, we obtain a chain
by adding one by one the elements of N, so there are just as many chains as
there are permutations of N, namely n!. Next, for a set A E F we ask how
many of these chains contain A. Again this is easy. To get from 0 to A we
have to add the elements of A one by one, and then to pass from fl to N we
have to add the remaining elements. Thus if .4 contains k elements, then
by considering all these pairs of chains linked together we see that there are
precisely k!(n - k)! such chains. Note that no chain can pass through two
different sets A and B of F, since F is an antichain.
To complete the proof, let mk be the number of k-sets in:F. Thus IFI =
L=o 'Ink, Then it follows from our discussion that the number of chains
passing through some member of F is
"
L ntk k! (71 - k)!,
k=O
and this expression cannot exceed the number n! of all chains. Hence
Chapter 22
Emanuel Spemer
144
Threefamous theorems onfinite sets
we conclude
Check that the family of all -sets for
even n respectively the two families of
all nl -sets and of all nt1 -sets, when
n is odd, are indeed the only antichains
that achieve the maximum size!
point
,
edge
f
A circle C for n = 6. The bold edges
depict an arc of length 3.
L n k!(n - k)!
m",
n!
k=O
or
n
L mk <
"'=0 C) 1.
< 1,
Replacing the denominators by the largest binomial coefficient, we there-
fore obtain
1 "
--------n-- L m", < 1,
(L"/2J) "'=0
"
Lmk <
k=O
(ln r ;2J) ,
that is,
1.1'1
and the proof is complete.
D
Our second result is of an entirely different nature. Again we consider the
set N = {I,... , n}. CaIl a family .1' of subsets an intersecting family if
any two sets in .1' have at least one element in common. It is almost imme-
diate that the size of a largest intersecting family is 2" -1. If A E .1', then
the complement A C = N\A has empty intersection with A and accordingly
cannot be in F. Hence we conclude that an intersecting family contains at
most half the number 2" of all subsets, that is, 1.1'1 :::; 2"-1. On the other
hand, if we consider the family of all sets containing a fixed element, say
the family .1'1 of all sets containing I, then clearly 1.1'11 = 2,,-1, and the
problem is settled.
But now let us ask the foIlowing question: How large can an intersecting
family .1' be if all sets in .1' have the same size, say k ? Let us caIl such fami-
lies intersecting kIamilies. To avoid trivialities, we assume n 2k since
otherwise any two k-sets intersect, and there is nothing to prove. Taking
up the above idea, we certainly obtain such a family .1'1 by considering all
k-sets containing a fixed element, say I. Clearly, we obtain alI sets in .1'1
by adding to I all (k - I)-subsets of {2, 3,... , n}, hence 1.1'11 = G=).
Can we do better? No - and this is the theorem of Erdos-Ko-Rado.
Theorem 2. The largest size of an intersecting kfamily in an n-set is G=)
when n 2k.
Paul Erdos, Chao Ko and Richard Rado found this result in 1938, but it
was not published until 23 years later. Since then multitudes of proofs and
variants have been given, but the foIlowing argument due to Gyula Katona
is particularly elegant.
. Proof. The key to the proof is the foIlowing simple lemma, which at
first sight seems to be totalIy unrelated to our problem. Consider a circle C
divided by n points into n edges. Let an arc of length k consist of k + 1
consecutive points and the k edges between them.
Lemma. Let n 2k, and suppose we are given t distinct arcs AI, . .. ,At
of length k, such that any two arcs have an edge in common. Then t :::; k.
To prove the lemma, note first that any point of C is the endpoint of at most
one arc. Indeed, if Ai, Aj had a common endpoint v, then they would have
------
Three famous theorems on finite sets
145
to start in different direction (since they are distinct). But then they cannot
have an edge in common as n 2 2k. Let us fix AI. Since any Ai (i 2 2)
has an edge in common with AI, one of the endpoints of Ai is an inner
point of A]. Since these endpoints must be distinct as we have just seen,
and since A] contains k - 1 inner points, we conclude that there can be at
most k - 1 further arcs, and thus at most k arcs altogether. D
Now we proceed with the proof of the Erd6s-Ko-Rado theorem. Let F be
an intersecting k-family. Consider a circle C with n points and n edges as
above. We take any cyclic permutation 7r = (ai, a2,... , an) and write the
numbers ai clockwise next to the edges of C. Let us count the number of
sets A E F which appear as k consecutive numbers on C. Since F is an
intersecting family we see by our lemma that we get at most k such sets.
Since this holds for any cyclic permutation, and since there are (n - I)!
cyclic permutations, we produce in this way at most
k(n - I)!
sets of F which appear as consecutive elements of some cyclic permutation.
How often do we count a fixed set A E F? Easy enough: A appears in 7r
if the k elements of A appear consecutively in some order. Hence we have
k! possibilities to write A consecutively, and (n - k)! ways to order the
remaining elements. So we conclude that a fixed set A appears in precisely
k!(n - k)! cyclic permutations, and hence that
k(n - I)! (n - I)!
IFI k!(n - k)! = (k - l)!(n - 1 - (k - I))!
= ( n - 1 ) . D
k - 1
Again we may ask whether the families containing a fixed element are the
only intersecting k-families. This is certainly not true for n = 2k. For
example, for n = 4 and k = 2 the family {I, 2}, {I, 3}, {2, 3} also has
size (D = 3. More generally, for n = 2k we get the maximal intersecting
k-families, of size G) = (=), by arbitrarily including one out of every
pair of sets fonned by a k-set A and its complement N\A. But for n > 2k
the special families containing a fixed element are indeed the only ones.
The reader is invited to try his hand at the proof. An intersecting family for n = 4, k = 2
Finally, we turn to the third result which is arguably the most important
basic theorem in finite set theory, the "marriage theorem" of Philip Hall
proved in 1935. It opened the door to what is today called matching theory,
with a wide variety of applications, some of which we shall see as we
go along.
Consider a finite set X and a collection ih, . . . ,An of subsets of X (which
need not be distinct). Let us call a sequence Xl, . .. , X n a system of distinct
representatives of {AI, . .. , An} if the Xi are distinct elements of X, and
if Xi E Ai for all i. Of course, such a system, abbreviated SDR, need not
exist, for example when one of the sets Ai is empty. The content of the
theorem of Hall is the precise condition under which an SDR exists.
]46
Threefamous theorems on finite sets
,
-::\ ,,,J -::} < i,;i",
>' < fJ. " J ' A; ,'I. .' l .',
., ,: <,jf j":;' , · lvI,' . "
}k, 'f". ........ .... !.... . .. ' .... . . . .. :t. ... . .If1 . y .. ' ..... .. . i' ... ' ... . ,.. I' \ ,ft .... 'Pi L,d
(,,' ',I Ii '\ J \1
, ,...TS') " I i ..... R.' I.(
" I r'" ; ! "", 'f';'}, )
" r, "i: 1 1rrr .. ( . ' .. :I 1f ! II ,.. ", '
I\. i' I . r . ru\
\ ,i - ',I; """_-',
) - I ,
"A mass wedding"
c
{B, C, D} is a critical family
Before giving the result let us state the human interpretation which gave
it the folklore name marriage theorem: Consider a set {I,.. . , n} of girls
and a set X of boys. Whenever:£ E Ai, then girl i and boy J: are inclined
to get married, thus Ai is just the set of possible matches of girl i. An SDR
represents then a mass-wedding where every girl marries a boy she likes.
Back to sets, here is the statement of the result.
Theorem 3. Let AI, . .. , An be a collection of subsets of a finite set X.
Then there exists a system of distinct representatives !f and only if the union
of any Tn sets Ai contains at least Tn elements, for 1 :::; Tn :::; n.
The condition is clearly necessary: If 'Tn sets Ai contain between them
fewer than Tn elements, then these Tn sets can certainly not be represented
by distinct elements. The surprising fact (resu]ting in the universal ap-
p]icability) is that this obvious condition is also sufficient. Hall's original
proof was rather complicated, and subsequently many different proofs were
given, of which the following one (due to Easterfield and rediscovered by
Ha]mos and Vaughan) may be the most natural.
. Proof. We use induction on n. For n = 1 there is nothing to prove. Let
n > 1, and suppose {AI, . .. , An} satisfies the condition of the theorem
which we abbreviate by (H). Call a collection of f. sets Ai with 1 :::; fI < n a
critical family if its union has cardinality f. Now we distinguish two cases.
Case 1: There is no critical family.
Choose any e]ement:£ E An. Delete x from X and consider the collection
A, . " , A';,-1 with A; = Ai \x. Since there is no critical family, we find
that the union of any Tn sets A; contains at least 'Tn elements. Hence by
induction on n there exists an SDR J:1, . " , J: n -1 of {A, . .. ,A;,-d, and
together with J: n = x, this gives an SDR for the original collection.
Case 2: There exists a critical family.
After renumbering the sets we may assume that {A] , . " , A{} is a critical
family. Then U;=I Ai = ,Y with I "XC I = fi. Since F < n, we infer the exis-
tence of an SD for A1' . . . , A.( by induction, that is, there is a numbering
XI,'" , Xi of X such that Xi E Ai for all i :::; fl.
Consider now the remaining collection A{+1,.', , All, and take any 'Tn of
these sets. Since the union of A 1 , . .. , A( and these Tn sets contains at least
f. + m. elements by cndition (H), we infer that the Tn sets contain at least
Tn elements outside X. In other words, condition (H) is satisfied for the
family
A H1 \X,
,An\X.
Induction now gives an SDR for A,+] , . " , An that avoids X. Combin-
ing it with :£1, . . . , X{ we obtain an SDR for all sets Ai. This completes
the proof. 0
.......--
Three famous theorems on finite sets
147
As we mentioned, Hall's theorem was the beginning of the now vast field
of matching theory [6]. Of the many variants and ramifications let us state
one particularly appealing result which the reader is invited to prove for
himself:
Suppose the sets AI, . ., ,All all have size k 2: 1 and suppose
further that no element is contained in more than k sets. Then
there exist k SDR's such that for any i the k representatives of Ai
are distinct and thus togetherform the set Ai.
A beautiful result which should open new horizons on marnage possi-
bilities.
References
[I] T. E. EASTERFIELD: A combinatorial algorithm, J. London Math. Soc. 21
(1946),219-226.
[2] P. ERDOS, C. Ko & R. RADO: Intersection theoremsfor systems offinite sets,
Quart. J. Math. (Oxford), Ser. (2) 12 (1961 ), 313-320.
13] P. HALL: On representatives of subsets, Quart. J. Math. (Oxford) 10 (1935),
26-30.
[4] P. R. HALMOS & H. E. VAUGHAN: The marriage problem, Amer. J. Math.
72 (1950), 214-215.
[5] G. KATONA: A simple proof (if the Erdos-Ko-Rado theorem, J. Combinatorial
Theory, Ser. B 13 (1972), 183-184.
l6] L. Lov ASZ & M. D. PLUMMER: Matching Theory, Akademiai Kiado, Bu-
dapest 1986.
l7] D. LUBELL: A short proqf of Sperner's theorem, J. Combinatorial Theory 1
(1966), 299.
[8] E. SPERNER: Ein Sat;:. iiber Untermengen einer endlichen Menge, Math.
Zeitschrift 27 (1928), 544-548.
......-
Lattice paths and determinants
The essence of mathematics is proving theorems - and so, that is what
mathematicians do: they prove theorems. But to tell the truth, what they
really want to prove, once in their lifetime, is a Lemma, like the one by
Fatou in analysis, the Lemma of Gauss in number theory, or the Burnside-
Frobenius Lemma in combinatorics.
Now what makes a mathematical statement a true Lemma? First, it should
be applicable to a wide variety of instances, even seemingly unrelated prob-
lems. Secondly, the statement should, once you have seen it, be completely
obvious. The reaction of the reader might well be one of faint envy: Why
haven't I noticed this before? And thirdly, on an esthetic level, the Lemma
- including its proof - should be beautiful!
In this chapter we look at one such marvelous piece of mathematical rea-
soning, the Lemma of Ira Gessel and Gerard Viennot, which has become an
instant classic in combinatorial enumeration since its appearance in 1985.
A similar version was previously obtained by Bernt Lindstrom.
The starting point is the usual permutation representation of the detenninant
of a matrix. Let 1\1 = (7nij) be a real n x n-matrix. Then
clet.M = L sign (J 7n1 u(l) 7n2u(2) . . . 7n nu ( n), (1)
u
where (J runs through all permutations of {I, 2,... ,n}, and the sign of (J
is 1 or -1, depending on whether (J is the product of an even or an odd
number of transpositions.
Now we pass to graphs, more precisely to weighted directed bipartite graphs. Al
Let the vertices A l, . .. , An stand for the rows of A1, and B l, . .. , Bn for
the columns. For each pair of i and j draw an arrow from Ai to Bj and give
it the weight 7nij, as in the figure.
In terms of this graph, the formula (I) has the following interpretation:
. The left -hand side is the determinant of the path-matrix 1\1, whose
(i, j)-entry is the weight of the (unique) directed path from Ai to Bj.
. The right-hand side is the weighted (signed) sum over all vertex-disjoint
path systems from A = {AI,'" ,An} to B = {B!,... , Bn}. Such a
system Pu is given by paths
AI --+ BU(l)' ... , An --+ Bu(n),
Chapter 23
Ai
An
BI
Bn
BJ
150
Lattice paths and determinants
and the weight of the path system Pu is the product of the weights of
the individual paths:
w(Pu) = w(A. 1 --+ Bu(l)) ... w(An --+ Bu(n))'
In this interpretation formula (1) reads
detM = L signO'w(Pu).
u
\0
And what is the result of Gessel and Viennot? It is the natural generalization
of (I) from bipartite to arbitrary graphs. It is precisely this step which
makes the Lemma so widely applicable - and what's more, the proof is
stupendously simple and elegant.
Let us first collect the necessary concepts. We are given a finite acyclic
directed graph G = (1', E), where acyclic means that there are no directed
cycles in G. In particular, there are only finitely many directed paths
between any two vertices A and B, where we include all trivial paths
A. --+ A of length O. Every edge c carries a weightw( c). If P is a
directed path from A to B, written shortly P : A --+ B, then we define
the weight of P as
An acyclic directed graph
w(P) := II w(e),
eEP
which is defined to be w(P) = 1 if P is a path of length O.
Now let A = {Al,...,An}andB = {B 1 ,...,B,,}betwosetsofn
vertices, where A and B need not be disjoint. To A and B we associate the
path matrix }\1 = (rn;j) with
Tnij :=
L w(P).
P:A;--+Hj
A path system P from A to B consists of a permutation 0' together with n
paths Pi : Ai --+ BU(i), for i = 1,... , n; we write sign P = sign 0' . The
weight of P is the product of the path weights
"
w(P) = II w(Pi),
;=1
(2)
which is the product of the weights of all the edges of the path system.
Finally, we say that the path system P = (P 1 , . " , P,,) is vertex-disjoint if
the paths of P are pairwise vertex-disjoint.
Lemma. Let G = (1 " E) be a finite acyclic weighted directed graph,
A = {A 1 , . .. , An } and B = {B 1 , . .. , B" } two n-sets of ve rtice s, and "1
the path matrix from A to B. Then
detM
L
sign P w(P).
(3)
P vertex-disjoint
path ...y...tcm
Lattice paths and determinants
151
. Proof. A typical summand of det(NI) IS sign 0' Tn1()"(1)" . Tnn()"(n) ,
which can be written as
sign 0' (
L
w(Fn)).
L
w(Fd) ... (
T't:A[-+Rall)
Pn :./11/ --tEeT( n)
Summing over 0' we immediately find from (2) that
det Al = L sign P w(P),
P
where P runs through all path systems from A to B (vertex-disjoint or not).
Hence to arrive at (3), all we have to show is
L signP w(P) = 0,
PE1V
where N is the set of all path systems that are not vertex-disjoint. And this
is accomplished by an argument of singular beauty. Namely, we exhibit an
involution 7r : N --+ N (without fixed points) such that for P and 7rP
w(7rP) = w(P)
and
sign 7rP = -sign P .
Clearly, this will imply (4) and thus the formula (3) of the Lemma.
The involution 7r is defined in the most natural way. Let PEN with paths
F i : Ai --+ B()"( i)' By definition, some pair of paths will intersect:
. Let io be the minimal index such that Fio shares some vertex with
another path.
. I:et X be the first such common vertex on the path Fia.
. Let jo be the minimal index (jo >io) such that Fj(] has the vertex X
in common with Fia'
Now we construct the new system 7rP = (F{, . .. , F:,) as follows:
. Set PI. = n. for all k: i- io,jo.
. The new path F!o goes from A io to X along F io ' and then continues
to B()"Uo) along Fja' Similarly, F)o goes from Ajo to X along Fjo and
continues to B()"(io) along Fia'
Clearly 7r(7rP) = p, since the index i o , the vertex X, and the index jo are
the same as before. In other words, applying 7r twice we switch back to
the old paths F i . Next, since 7rP and P use precisely the same edges, we
certainly have w(7rP) = w(P). And finally, since the new permutation 0"
is obtained by multiplying 0' with the transposition (in, jo), we find that
sign 7rP = -sign P , and that's it. D
The Gessel- Viennot Lemma can be used to derive all basic properties of
determinants, just by looking at appropriate graphs. Let us consider one
particularly striking example, the formula of Binet-Cauchy, which gives a
very useful generalization of the product rule for determinants.
(4)
A iO
X
B()"Uo)
Ajo
B()"(i o )
152
Lattice paths and determinants
Ai
Ar
.
Bs.
.
C r
C 1
C
)
I
I 1
I 2 I
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 I
1 7 21 35 35 21 7 1
a=3
b=4
Theorem. If P is an (r x s) -matrix and Q an (s x r) -matrix, r ::; s, then
det(PQ) = 2)det Pz)(det Qz),
z
where Pz is the (r x r)-submatrix of P with column-set Z, and Qz the
(r x r')-submatrix ofQ with the corresponding rows Z.
. Proof. Let the bipartite graph on A and B correspond to P as before, and
similarly the bipartite graph on Band C to Q. Consider now the concate-
nated graph as indicated in the figure on the left, and observe that the (i, n-
entry mij of the path matrix M from A to C is precisely mij = Lk Pikqkj,
thus M = PQ.
Since the vertex-disjoint path systems from A to C in the concatenated
graph correspond to pairs of systems from A to Z resp. from Z to C, the
result follows immediately from the Lemma, by noting that sign (In) =
(sign 0') (sign T). 0
The Lemma of Gessel- Viennot is also the source of a great number of re-
sults that relate determinants to enumerative properties. The recipe is al-
ways the same: Interpret the matrix IV! as a path matrix, and try to compute
the right-hand side of (3). As an illustration we will consider the original
problem studied by Gessel and Viennot, which led them to their Lemma:
Suppose that al < a2 < . . . < an and b 1 < b 2 < . . . < b n are two
sets of natural numbers. We wish to compute the determinant of the
matrix M = (mij), where mij is the binomial coefficient (i).
J
In other words, Gessel and Viennot were looking at the determinants of
arbitrary square matrices of Pascal's triangle, such as the matrix
( ()
det (i)
()
m
m
m
m )
(:)
()
dctU
1
4
20
X)
given by the bold entries of Pascal's triangle, as displayed in the margin.
As a preliminary step to the solution of the problem we recall a well-known
result which connects binomial coefficients to lattice paths. Consider an
a x b-lattice as in the margin. Then the number of paths from the lower
left-hand corner to the upper right-hand corner, where the only steps that
are allowed for the paths are up (North) and to the right (East), is (ab).
The proof of this is easy: each path consists of an arbitrary sequence of b
"east" and a "north" steps, and thus it can be encoded by a sequence of the
form NENEEEN, consisting of a+ b letters, a N's and bE's. The number of
such strings is the number of ways to choose a positions of letters N from
a total of a + b positions, which is (ab) = (at b ).
Lattice paths and determinants 153
Now look at the figure to the right, where Ai is placed at the point (0, -(Li)
and Bj at (b j , -b j ).
The number of paths from "4i to Bj in this grid that use only steps to the
north and east is, by what we just proved, (oJ +(; -u J )) = (J In other Al
words, the matrix of binomials AI is precisely the path matrix from A to B
in the directed lattice graph for which all edges have weight 1, and all edges
are directed to go north or east. Hence to compute det AI we may apply
the Gessel- Viennot Lemma. A moment's thought shows that every vertex-
disjoint path system P from A to B must consist of paths Pi : Ai --+ B i for Ai
all i. Thus the only possible permutation is the identity, which has sign = 1,
and we obtain the beautiful result
det ( (;)) = # vertex-disjoint path systems from A to B.
.
.
.
In particular, this implies the far from obvious fact that det AI is always
nonnegative, since the right-hand side of the equality counts somethin/?
In fact, one gets from the Gessel- Viennot Lemma that det AI = 0 if and
only if (Li < b i for some i.
For example,
An
( G) m W ) :3
det '(;) m () # vertex-disjoint
path systems in
() m ()
6
I
\,. G- ,-..
-C;1 ,
-.L } -
- ,,.
".
'"
. - - -----------c
"
"Lattice paths"
154
Lattice paths and determinants
References
r 11 I. M. GESSEL & G. VIENNOT: Binomial determinants, paths, and hook length
formulae, Advances in Math. 58 (1985), 300-321.
[2] B. LINDSTROM: On the vector representation of induced matroids, Bulletin
London Math. Soc. 5 (1973), 85-90.
.....-
Cayley's formula
for the number of trees
One of the most beautiful formulas in enumerative combinatorics concerns
the number of labeled trees. Consider the set N = {I, 2,... , n}. How
many different trees can we form on this vertex set? Let us denote this
number by Tn. Enumeration "by hand" yields Tl = 1, T 2 = 1, T;j = 3,
T4 = 16, with the trees shown in the following table:
1 2 1 2 1 2
\V7
333
1 2 1 2 1 2 1 212 1 2 1 2 1 2
f:/1Ln:JUC
3: 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4
1
.
1 2
----
1 2 1 2 1 2 1 212 121 212
ZNSV1X>1XI><
3 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4
Note that we consider labeled trees, that is, although there is only one tree
of order 3 in the sense of graph isomorphism, there are 3 different labeled
trees by choosing the inner vertex first. For n = 5 there are three non-
isomorphic trees:
-!r
6
5
For the first tree there are clearly 5 different labelings, and for the second
and third there are ¥ = 60 labelings, and we obtain 1',,, = 125. This should
be enough to conjecture Tn = nn-2, and this is precisely Cayley's result.
Theorem. There are nn-2 different labeled trees on n vertices.
This beautiful formula yields to equally beautiful proofs, drawing on a
variety of combinatorial and algebraic techniques. We will outline three
of them before presenting the proof which is to date the most beautiful of
them all.
Chapter 24
Arthur Cayley
156
Cayley's f(mnula for the numher qf'trees
1 I I r
The four trees of 72
7
9
10 __________
1
3 V8
6 1
6
2
9 1 5 8
1 . . 2 0 3 '
10
6
. First proof (Bijection). The classical and most direct method is to find
a bijection from the set of all trees on n vertices onto another set whose
cardinality is known to be nn-2. Naturally, the set of all ordered sequences
(a 1,... ,a"-2) with 1 ::; ai ::; n comes into mind. Thus we want to
uniquely encode every tree T by a sequence (a1 ... ,a 11 -2). Such a code
was found by Priifer and is contained in most books on graph theory.
Here we want to discuss another bijection proof, due to Joyal, which is
less known but of equal elegance and simplicity. For this, we consider not
just trees t on N = {1,... , n} but trees together with two distinguished
vertices, the left end 0 and the right end D, which may coincide. Let
T" = {(t; 0, D)} be this new set; then, clearly, IT" I = n 2 T".
Our goal is thus to prove IT" I = nn. Now there is a set whose size is
known to be n", namely the set NN of all mappings from N into N. Thus
our formula is proved if we can find a bijection from N N onto T".
Let J : N N be any map. We represent J as a directed graph 6 J by
drawing arrows from i to J(i).
For example, the map
J = (
3
9
4
1 )
2
5
4
9
6
2
7
5
5
8
8
5
1
is represented by the directed graph in the margin.
Look at a component of 6 1 , Since there is precisely one edge emanating
from each vertex, the component contains equally many vertices and edges,
and hence precisely one directed cycle. Let AI <;;; N be the union of the
vertex sets of these cycles. A moment's thought shows that "U is the unique
maximal subset of N such that the restriction of J onto AI acts as a bijection
on M. Write JIM = (J(a) Jb) J(z) ) such that the numbers
a, b, . " , z in the first row appear in natural order. This gives us an ordering
J(a), J(b),... , J(z) of M according to the second row. Now J(a) is our
left end and J (z) is our right end.
The tree t corresponding to the map J is now constructed as follows: Draw
J(a),... , J(z) in this order as a path from J(a) to J(z), and fill in the
remaining vertices as in 6 1 (deleting the arrows).
In our example above we obtain AI = {1, 4, 5, 7, 8, 9}
@]
( 1 4
JM = 7 9
5
1
7 8
5 8
)
and thus the tree t depicted in the margin.
It is immediate how to reverse this correspondence: Given a tree t, we look
at the unique path P from the left end to the right end. This gives us the
set M and the mapping JIM. The remaining correspondencesi --+ J( i) are
then filled in according to the unique paths from i to P. D
.....-
Cayley's formula for the number of trees
157
. Second proof (Linear Algebra). We can think of T" as the number of
spanning trees in the complete graph K 11 . Now let us look at an arbitrary
connected simple graph G on 1/ = {I, 2, . . . , n}, denoting by t( G) the
number of spanning trees; thus T" = t(K 11 ). The following celebrated
result is Kirchhoff's matrix-tree theorem (see fI1). Consider the incidence
matrix B = (b ie ) of G, whose rows are labeled by F, the columns by E,
where we write b ie = lor 0 depending on whether i E cor i <t e. Note that
lEI 2' n - 1 since G is connected. In every column we replace one of the
two 1 's by -1 in an arbitrary manner (this amounts to an orientation of G),
and call the new matrix e. !vI = ee T is then a symmetric (n x n)-matrix
with the degrees d l , . ., , d n in the main diagonal.
Proposition. We have t( G) = det!vh for all i = 1,... , n, where Alii
results from Al by deleting the i-th row and the i-th column.
. Proof. The key to the proof is the Binet-Cauchy theorem proved in the
previous chapter: When r is an (r x 8)-matrix and Q an (8 x r)-matrix,
r :S 8, then det(PQ) equals the sum of the products of determinants of
corresponding (r x r )-submatrices, where "corresponding" means that we
take the same indices for the r columns of r and the r rows of Q.
For AI; i this means that
det Alii = '" det lv' . det NT = '" ( det N ) 2.
N N.
where N runs through all (n - 1) x (n - 1) submatrices of e\ {row i}. The
n - 1 columns of N correspond to a subgraph of G with n - 1 edges on n
vertices, and it remains to show that
1 7\ r { ::!:: 1 if these edges span a tree
( ct 1'1 = .
o otherwIse.
Suppose the n - 1 edges do not span a tree. Then there exists a component
which does not contain i. Since the corresponding rows of this component
add to 0, we infer that they are linearly dependent, and hence det N = O.
Assume now that the columns of 1\/ span a tree. Then there is a ver-
tex jj i- i of degree 1; let Cj be the incident edge. Deleting jj, Cj we
obtain a tree with n 2 edges. Again there is a vertex h i-i of de-
gree I with incident edge e2. Continue in this way until jj , h, . .. , j n-j
and Cl, e2,... , e,,_j with ji E ei are determined. Now permute the rows
and columns to bring j" into the k-th row and e" into the k:-th column.
Since by construction jk tI. ef for k < e, we see that the new matrix N' is
lower triangular with all elements on the main diagonal equal to ::!:: 1. Thus
det N = ::!::det N' = ::!::1, and we are done.
For the special case G = Kn we clearly obtain
( n-l
-1
J.I ii = :
-1
n- J
-1
n-l
-1
1
-1
and an easy computation shows det Alii = n"- 2 .
"A nonstandard method of counting
trees: Put a cat into each tree, walk your
dog, and count how otten he harks."
D
158
Cayley's formula for the number of trees
For example, T 4 ,2 = 8 for A = {I, 2}
1 2 3
k:
.
""-.,..--'
z
. Third proof (Recursion). Another classical method in enumerative
combinatorics is to establish a recursion and to prove this recursion by
induction. The following idea is essentially due to Riordan and Renyi.
To find the proper recursion, we consider a more general problem (which
already appears in Cayley's paper). Let A be an arbitrary k-set of the
vertices. By Tn,k we denote the number of (labeled) forests on {I, . .. ,n}
consisting of k trees where the vertices of A appear in different trees.
Clearly, the set A does not matter, only the size k. Note that Tn.l = Tn.
1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
L:IIXL::J/1
3 4 3 434 3 434 3 434 3 4
Consider such a forest F with A = {I, 2, . .. , k}, and suppose I is adjacent
to i vertices, as indicated in the figure in the margin. Deleting I, we see that
these i vertices together with 2, . .. , k yield Tn-l ,k-Hi forests. As we can
choose the i vertices arbitrarily among the n - k vertices different from
1, . . . k, we infer for n 2 k 2 1,
n-k ( n - k )
TIl,k = L i Tn-l. k - 1 + i
1=0
(I)
where we set T o . o = 1, T Il . o = 0 for n > O. Note that To,o = 1 is necessary
to ensure T n ,1l = 1.
Proposition. We have
Tn,k
kn n - k - 1
(2)
and thus, in particular,
Tn,] = Tn = nn-2.
. Proof. By (I), and using induction, we find
n-k
Tn.k L Ck)(k-1+i)(n-1)n-l-k-i (i--+n-k-i)
1=0
I: (71 k) (n - 1 - i)(n _ l)i-l
1=0
C k) (n - l)i _ (n k)i(n _ 1)'-1
n-I.,
nn-k _ (n _ k) (n k) (n _ 1) i-I
, n-l-k ( n_1_k ) .
nn-k - (n - k) L i (n - 1)'
1.=0
n"- k _(n_k)nn-l-k = kn n - 1 - k .
D
Cayley's formula for the number of trees
159
. Fourth proof (Double Counting). The following marvelous idea due
to Jim Pitman gives Cayley's formula and its generalization (2) without
induction or bijection - it is just clever counting in two ways.
A rooted forest on {I, . " , 71 } is a forest together with a choice of a root in
each component tree. Let F".k be the set of all rooted forests that consist
of k rooted trees. Thus F",1 is the set of all rooted trees.
Note that IF".11 = nT", since in every tree there are 71 choices for the root.
We now regard F",k E F",k as a directed graph with all edges directed
away from the roots. Say that a forest F contains another forest F' if F
contains F' as directed graph. Clearly, if F properly contains F', then F
has fewer components than F'. The figure shows two such forests with the
roots on top.
Here is the crucial idea. Call a sequence F] , . ., , FA' of forests a refining
sequence if F i E F".i and F i contains Fi+l' for all i.
Now let Fk be a fixed forest in F".k and denote
. by N (Fd the number of rooted trees containing FA, and
. by N* (Fd the number ofrefining sequences ending in Fl,;.
We count N*(Fd in two ways, first by starting at a tree and secondly by
'?tarting at Fk. Suppose Fl E F",1 contains Fk. Since we may delete
the k - 1 edges of F 1 \FI, in any possible order to get a refining sequence
from F 1 to FA, we find
N*(F k ) = N(Fk) (k - I)!.
Let us now start at the other end. To produce from FA' an Fk-l we have to
add a directed edge, from any vertex a, to any of the k - 1 roots of the trees
that do not contain a (see the figure on the right, where we pass from F: 3
to F 2 by adding the edge 3 e.. e 7). Thus we have n(k 1) choices.
Similarly, for F k - 1 we may produce a directed edge from any vertex b to
any of the k-2 roots of the trees not containing b. For this we have n(k-2)
choices. Continuing this way, we arrive at
N*(F k ) = n k - 1 (k -I)!,
and out comes, with (3), the unexpectedly simple relation
N(FI,;) = nk-l
for any Fk E F".k.
For k = 71, F" consists just of 71 isolated vertices. Hence N(F,,) counts the
number of all rooted trees, and we obtain IF",! I = 71"-1, and thus Cayley's
formula. D
But we get even more out of this proof Formula (4) yields for k = 7/:
#{ refining sequences (Fl, F 2 ,. ,. , F n )} = 71"-1 (71 - I)!. (5)
For FI,;EF".k, let N**(F k ) denote the number of those refining sequences
F] , . ., , Fn whose k-th tenn is Fk. Clearly this is N* (Fk) times the number
F 2 8
2
9
F 2 contains F;j
847
,A3 I" /)9
F;) 6
10
(3)
--
A/ 1 4" -) ' ,?:
2 3 1 9
6
F3 --+ Fz
10
(4)
160
Cayley'sformulafor the number oj" trees
of ways to choose (Fic+ 1 , . . . , Fn). But this latter number is (n - k)! since
we may delete the n - k edges of F" in any possible way, and so
N**(Fd = N*(Fd(n - k)! = nk-l (k - l)!(n - k:)!. (6)
Since this number does not depend on the choice of F"., dividing (5) by (6)
yields the number of rooted forests with k trees:
IF".kl =
n,,-l (n - I)!
nk-l(k - l)!(n - k)!
C)knn-l-k.
As we may choose the k roots in G) possible ways, we have reproved the
formula TrI,k = kn n - k - 1 without recourse to induction.
Let us end with a historical note. Cayley's paper from 1889 was anticipated
by Carl W. Borchardt (1860), and this fact was acknowledged by Cayley
himself. An equivalent result appeared even earlier in a paper of James J.
Sylvester (1857), see [2, Chapter 3]. The novelty in Cayley's paper was
the use of graph theory terms, and the theorem has been associated with his
name ever SInce.
References
[II M. AIGNER: Combinatorial Theon, Springer-Verlag, Berlin Heidelberg New
York 1979; Reprint 1997.
[2] N. L BIGGS, E. K. LLOYD & R. J. WILSON: Graph Theon /736-/936,
Clarendon Press, Oxford 1976.
[3J A. CAYLEY: A theorem on trees, Quart. J. Pure AppL Math. 23 (1889),
376-378; Collected Mathematical Papers VoL 13, Cambridge University Press
1897,26-28.
14] A. JOYAL: Une theorie combinatoire des seriesformel/es, Advances in Math.
42 ( 1(81), 1-82.
15J 1. PITMAN: Coalescent random forests, J. Combinatorial Theory, Ser. A 8S
(1999), 165-193.
r6J H. PRUFER: Neller Beweis eines Satzes iiber PennlllationCl1, Archiv der Math.
u. Physik (3) 27 (1918),142-144.
[7J A. RENYI: Some remarks on the theon' of trees. MTA Mat. Kut. lnst. Kozl.
(Publ. math. Inst. Hungar. Acad. Sci.) 4 (1959), 73-85; Selected Papers Vol. 2,
Akademiai Kiad6, Budapest 1976, 363-374.
181 J. RIORDAN: Forests of labeled trees, J. Combinatorial Theory S (1968),
90-103.
Completing Latin squares
Some of the oldest combinatorial objects, whose study apparently goes
back to ancient times, are the Latin squares. To obtain a Latin square,
one has to fill the n 2 cells of an (n x n)-square array with the numbers
1,2, . " , n so that that every number appears exactly once in every row
and in every column. In other words, the rows and columns each represent
permutations of the set {l, . . . , n}. Let us call n the order of the Latin
square.
Here is the problem we want to discuss. Suppose someone started filling
the cells with the numbers {I, 2, . .. , n}. At some point he stops and asks
us to fill in the remaining cells so that we get a Latin square. When is this
possible? In order to have a chance at all we must, of course, assume that
at tI1e start of our task any element appears at most once in every row and
in every column. Let us give this situation a name. We speak of a partial
Latin square of order n if some cells of an (n x n)-array are filled with
numbers from the set {I, . .. , n} such that every number appears at most
once in every row and column. So the problem is:
When can a partial Latin square be completed to a Latin square of
the same order?
Let us look at a few examples. Suppose the first n - 1 rows are filled and
the last row is empty. Then we can easily fill in the last row. Just note that
every element appears n - 1 times in the partial Latin square and hence is
missing from exactly one column. Hence by writing each element below
the column where it is missing we have completed the square correctly.
Going to the other end, suppose only the first row is filled. Then it is again
easy to complete the square by cyclically rotating the elements one step in
each of the following rows.
So, while in our first example the completion is forced, we have lots of
possibilities in the second example. In general, the fewer cells are pre-
filled, the more freedom we should have in completing the square.
However, the margin displays an example of a partial square with only n
cells filled which clearly cannot be completed, since there is no way to fill
the upper right-hand comer without violating the row or column condition.
If fewer than n cells are filled in an n x n array, can one then always
complete it to obtain a Latin square?
Chapter 25
1 2 3 4
2 1 4 3
4 3 1 2
3 4 2 1
A Latin square of order 4
1 -4 2 5 3
4 2 5 3 1
2 5 3 1 4
5 3 1 4 2
3 1 4 2 5
A cyclic Latin square
1 2 .. . n-1
n
A partial Latin square that cannot be
completed
162
Completing Latin squares
1 3 2
2 1 3
3 2 1
R: 1 I I 2 2 2 3 3 3
C: I 23 I 23 I 23
E: I 3 2 2 I 3 3 2 I
If we permute the lines of the above
example cyclically,
R -----+ C -----+ E -----+ R, the n we
obtain the following line array and
Latin square:
1 2 3
3 1 2
2 3 1
R:1322l3321
C: I I I 222 3 3 3
E: I 2 3 I 2 3 I 2 3
This question was raised by Trevor Evans in 1960, and the assertion that
a completion is always possible quickly became known as the Evans con-
jecture. Of course, one would try induction, and this is what finally led
to success. But Bohdan Smetaniuk's proof from 1981, which answered
the question, is a beautiful example of just how subtle an induction proof
may be needed in order to do such a job. And, what's more, the proof is
constructive, it allows us to complete the Latin square explicitly from any
initial partial configuration.
Before proceeding to the proof let us take a closer look at Latin squares
in general. We can alternatively view a Latin square as a (3 x 712)-array,
called the line array of the Latin square. The figure to the left shows a Latin
square of order 3 and its associated line array, where R, C and E stand for
rows, columns and elements.
The condition on the Latin square is equivalent to saying that in any two
lines of the line array all n 2 ordered pairs appear (and therefore each pair
appears exactly once). Clearly, we may permute the symbols in each line
arbitrarily (corresponding to permutations of rows, columns or elements)
and still obtain a Latin square. But the condition on the (3 x n 2 )-array tells
us more: There is no special role for the elements. We may also permute
the lines of the array (as a whole) and still preserve the conditions on the
line array and hence obtain a Latin square.
Latin squares that are connected by any such permutation are called con-
jugates. Here is the observation which will make the proof transparent:
A partial Latin square obviously corresponds to a partial line array (every
pair appears at most once in any two lines), and any conjugate of a partial
Latin square is again a partial Latin square. In particular, a partial Latin
square can be completed if and only if any conjugate can be completed Uust
complete the conjugate and then reverse the permutation of the three lines).
We will need two results, due to Ryser and to Lindner, that were known
prior to Smetaniuk's theorem. If a partial Latin square is of the form that
the first T" rows are completely filled and the remaining cells are empty, then
we speak of an (T" X n )-Latin rectangle.
Lemma 1. Any (1" X n) - Latin rectangle, T" < n, can be extended to an
(T" + 1) x n Latin rectangle and hence can be completed to a Latin square.
. Proof. We apply Hall's theorem (see Chapter 22). Let Aj be the set
of numbers that do not appear in column j. An admissible (r' + l)-st row
corresponds then precisely to a system of distinct representatives for the
collection AI,... , An. To prove the lemma we therefore have to verify
Hall's condition (H). Every set Aj has size n - T", and every element is in
precisely n - T" sets Aj (since it appears T" times in the rectangle). Any rn
of the sets A-j contain together rn( n - r') elements and therefore at least Tn
different ones, which is just condition (H). 0
Lemma 2. Let P be a partial Latin square of order n with at most n - 1
cells filled and at most ¥ distinct elements, then P can be completed to a
Latin square of order n.
...I
JIll""'"
Completing Latin squares
163
. Proof. We first transform the problem into a more convenient form.
By the conjugacy principle discussed above, we may replace the condi-
tion "at most ¥ distinct elements" by the condition that the entries appear
in at most ¥ rows, and we may further assume that these rows are the top
rows. So let the rows with filled cells be the rows 1,2,... , r, with fi filled
cells in row i, where r :<:; ¥ and L:1 it :<:; n - 1. By permuting the rows,
we may assume h 2 h 2 '" 2 fr" Now we complete rows 1,... , r step
by step until we reach an (r x n)-rectangle which can then be extended to
a Latin square by Lemma I.
Suppose we have already filled rows 1,2, . . . , e - 1. In row fJ there are !£
filled cells which we may assume to be at the end. The current situation is
depicted in the figure, where the shaded part indicates the filled cells.
The completion of row e is performed by another application of Hall's
theorem, but this time it is quite subtle. Let X be the set of elements that
do not appear in rowe, thus IXI = n - !£, and for j = 1,... , n - !£
let A j denote the set of those elements in X which do not appear in
column j (neither above nor below row €). Hence in order to complete
row e we must verify condition (H) for the collection AI, . . . , An- it"
First we claim
n - Ie - e + 1 > f - 1 + !£+1 + .., + fl"
The case e = 1 is clear. Otherwise L=1 fi < n, h 2 ... > fr and
1 < e :<:; r together imply
l'
n > 'Lfi > (e-l)f;-I+fe+,..+f,..
i=1
Now either it-I 2 2 (in which case (1) holds) or it-I = 1. In the latter
case, (I) reduces to n > 2(e - 1) + r - f + 1 = r + e - 1, which is true
because of e :<:; r :<:; ¥.
Let us now take m sets Aj, 1 :<:; m :<:; n - fe, and let B be their union.
We must show IBI 2 m. Consider the number c of cells in the m columns
corresponding to the Aj's which contain elements of X. There are at most
(e - l)m such cells above row e and at most !£+1 +... + fr below row f,
and thus
c :<:; (e-1)m+it+l+.,.+fr.
On the other hand, each element:r E X\B appears in each of the m
columns, hence c 2 m(IXI-IBI), and therefore (with IXI = n -!£)
lEI 2 IXI-,kc 2 n-!t-(f-1)--!n(fe+l+,..+fr)'
It follows that IBI 2 m if
n - fe - (e - 1) - ,k(ff+1 + .,. + fr) > Tn - 1,
that is, if
m(n-ff-£+2-m) > ff+l+,..+f,..
A situation for n = 8, with f = 3, h =
h = h = 2, 14 = 1. The dark squares
represent the pre-filled cells, the lighter
ones show the cells that have been filled
in the completion process.
(I)
(2)
164
Completing Latin squares
8]
2 7
4 5
5
4
82
83
81
v
84
4 .
5 .
.) 4 .
.
.
2 7
.
83
82
81
L
4 1 2 6 3 5
3 6 1 5 4 2
1 5 4 3 2 6
.) 3 6 2 1 4
6 2 3 4 5 1
2 4 5 1 6 3
L'
4 1 2 6 3 5
3 6 1 5 4 .
1 5 4 3 .
5 3 6 .
6 2 .
2 .
.
Inequality (2) is true for m = 1 and for Tn = 11, - fr - f + 1 by (I), and hence
for all values Tn between 1 and 11, - fr - f + 1, since the left-hand side is
a quadratic function in Tn with leading coefficient -1. The remaining case
is Tn > 11, - if - f + 1. Since any element ;z; of X is contained in at most
f - 1 + fr+1 + . . . + ir rows, it can also appear in at most that many
columns. Invoking (I) once more, we find that x is in one of the sets Aj,
and we conclude in this case B = X, lEI = 11, - if 2 Tn, and the proof is
complete. D
Let us finally prove Smetaniuk's theorem.
Theorem. A partial Latin square (f'order 11, with at most 11, - 1 filled cells
can be completed to a Latin square of'the same order.
.
. Proof. We use induction on n, the cases 11, <::: 2 being trivial. Suppose
P is a partial Latin square of order n + 1 with at most n filled cells. By
Lemma 2 we may assume that there are more than distinct elements,
hence there is an element which appears only once: call it 11, + 1. As before
we assume that there are r partially filled rows 81, . .. ,8,. with iI, . .. . i,.
filled cells, L'=1 ii <::: 11,. Assume further that the cell filled with n + 1 is
in row 8).
As a first step we will move all filled cells above the back diagonal except
for the cell filled with 11, + 1 which will end up on the back diagonal. (The
back diagonal consists of the cells (i, j) withi + j = 11, + 2.) This is
accomplished as follows: Move 5] to row position 11,+ 2 - h. By permuting
the columns move the filled cells to the left such that 11, + 1 is last in its row,
and hence on the back diagonal. Next move row 82 to position n+ 1- h - h
and the filled cells as far left as possible. In general, move row 8 i to position
n + 1 - h - h - . . . - ii and the filled cells as far left as possible. This
clearly gives the desired set-up. The top figure in the margin of this page
shows an example with n + 1 = 7.
Now comes the main idea of the proof. Delete the element 11, + 1 and the
last row and column. By induction, the remaining partial Latin square of
order 11, can be completed to a Latin square L of order n. On the left we
have one (of many) completions of the partial Latin square from above.
Next we cut the Latin square L by keeping the part above and including the
back diagonal, emptying the lower part and moving it one step down. Thus
we obtain a half-filled rectangle L', which for our example is also displayed
in the margin.
Notice that all pre-filled cells of L' (except for the element n + 1) appear
in the "upper half" above the back diagonal. Hence if we can fill the lower
half of L' with the numbers 1, . .. , n so as to obtain a partial Latin square
(now of order n + 1), then we may fill the back diagonal with n + 1, add
the unique last column (a trivial case for Lemma I), and the proof will be
complete. We now perform this final step by filling in the rows of the lower
half row by row, using information from the Latin square L for guidance.
Let k 2 3, and assume inductively that we have filled rows 3, . .. . k - 1
subject to the following conditions:
Completing Latin squares
165
. The k - 2 elements in column j of L' different from n + 1 are among
the first k - 1 entries of column j of the original Latin square L, for
n - k + 3 ::; j ::; n. So there is precisely one "missing element" for
each column.
. The missing elements x j of the columns of L' are distinct.
Now we proceed to row k. Let Yn-k+2. . . . , Yn be the elements of row k
to the right (and including) the back diagonal of the Latin square L. Thus
Yn-k+2 is the new missing element in column n - k + 2. As a first try
we fill in row k with the elements Yn-k+3,... , Yn' Note that the whole
row k is admissible and that ;r; j -I- Yj for all j since they come from the
same column j in L. So this will work except when Yn-k+2 = Xj for
some j 2' n - k + 3, because then the condition on the missing elements
is violated. In this case we interchange ;r; j and Y j. The new row k is still
admissible (since Xj = Yn-k'+2 und Yn-k+2 was in row k before). So
we are finished unless Yj = Xl for some f -I- j. Then we interchange Xi
and YR, and so on. Clearly, this process stops with the desired result, at the
very latest when all ;I:i'S have been moved to row k. In this way we fill
in all rows down to row n. The final row n + 1 can then be filled with the
missing elements, and the proof is complete. D
As an illustration, the following figure shows this filling process for our
example.
,-
6
I
2
2
1 4
..
..
missing
elements:
2
226
2 6
2 2
.. 2 4 .. 2 4 ..
4 5 1 4 5 1
5 1 6 3
3 2 1 6 4 3 2 1 6
4 1 2 6 3 5 7
3 6 1 5 4 7 2
1 5 4 3 7 2 6
.) 3 6 7 2 4 1
6 2 7 4 .) 1 3
2 7 5 1 6 3 4
7 4 3 2 1 6 5
166
Completing Latin squares
References
[1] T. EVANS: Embedding incomplete Latin squares, Amer. Math. Monthly 67
(1960),958-961.
[2] C. C. LINDNER: 0/1 completing Latin rectangles, Canadian Math. Bulletin 13
(1970), 65-68.
[31 H. J. RYSER: A combinatorial theorem with an application to Latin rectangles,
Proc. Amer. Math. Soc. 2 (1951), 550-552.
[41 B. SMETANIUK: A new construction on Latin squares I: A proof" of the Evans
conjecture, Ars Combinatoria 11 (1981), 155-172.
j
,.....--
The Di n itz problem
The four-color problem was a main driving force for the development of
graph theory as we know it today, and coloring is still a topic that many
graph theorists like best. Here is a simple-sounding coloring problem,
raised by Jeff Dinitz in 1978, which defied all attacks until its astonishingly
simple solution by Fred Galvin fifteen years later.
Consider n 2 cells arranged in an (n x n) -square, and let (i, j) de-
note the cell in row i and column j. Suppose thatfor every cell (i, j)
we are given a set C(i, j) ofn colors.
Is it then always possible to color the whole array by picking for
each cell (i, j) a color from its set C (i, j) such that the colors in
each row and each column are distinct?
As a start consider the case when all color sets C( i, j) are the same, say
{I, 2, . .. , n}. Then the Dinitz problem reduces to the following task: Fill
the (n x n)-square with the numbers 1,2,. .. , n in such a way that the
numbers in any row and column are distinct. In other words, any such
coloring corresponds to a Latin square, as discussed in the previous chapter.
So, in this case, the answer to our question is "yes."
Since this is so easy, why should it be so much harder in the general case
when the set C := U . C(i,j) contains even more than n colors? The
1"J
difficulty derives from the fact that not every color of C is available at each
cell. For example, whereas in the Latin square case we can clearly choose
an arbitrary permutation of the colors for the first row, this is not so anymore
in the general problem. Already the case n = 2 illustrates this difficulty.
Suppose we are given the color sets that are indicated in the figure. If we
choose the colors I and 2 for the first row, then we are in trouble since we
would then have to pick color 3 for both cells in the second row.
Before we tackle the Dinitz problem, let us rephrase the situation in the
language of graph theory. As usual we only consider graphs G = (17, E)
without loops and multiple edges. Let x( G) denote the chromatic number
of the graph, that is, the smallest number of colors that one can assign to
the vertices such that adjacent vertices receive different colors.
In other words, a coloring calls for a partition of 17 into classes (colored
with the same color) such that there are no edges within a class. Calling
a set A <;;; 17 independent if there are no edges within A, we infer that
the chromatic number is the smallest number of independent sets which
partition the vertex set 17.
Chapter 26
J
-1-
C(i,j)
pY
+-
{1,2} {2,3}
{I, 3} {2, 3}
168
The Dinitz problem
In 1976 Vizing, and three years later Erdos, Rubin, and Taylor, studied the
folIowing coloring variant which leads us straight to the Dinitz problem.
Suppose in the graph G = (1 ',E) we are given a set C(v) of colors for
each vertex v. A list coloring is a coloring c : V --+ U"E\" C(v) where
c(v) E C(v) for each v E F. The definition of the list chromatic number
X{ (G) should now be clear: It is the smalIest number k such for any list
of color sets C(v) with IC(v)1 = k for alI v E I' there always exists a list
coloring. Of course, we have X{ (G) -:; WI (we never run out of colors).
Since ordinary coloring is just the special case oflist coloring when all sets
C(v) are equal, we obtain for any graph G
X(G) -:; Xt(G).
The graph 53
To get back to the Dinitz problem, consider the graph S" which has as
vertex set the n"2 celIs of our (n x n)-array with two cells adjacent if and
only if they are in the same row or column.
Since any n cells in a row are pairwise adjacent we need at least n col-
ors. Furthermore any coloring with n colors corresponds to a Latin square,
with the cells occupied by the same number forming a color class. Since
Latin squares, as we have seen, exist, we infer X(S,,) = n, and the Dinitz
problem can now be succinctly stated as
{1,2}
{1,4}
X{ (S,,) = n'?
One might think that perhaps X(G) = X{(G) holds for any graph G, but
this is a long shot from the truth. Consider the graph G = K 2 ,4. The
chromatic number is 2 since we may use one color for the two left vertices
and the second color for the vertices on the right. But now suppose that we
are given the color sets indicated in the figure.
To color the left vertices we have the four possibilities 113, 114, 213 and 214,
but anyone of these pairs appears as a color set on the right-hand side, so
a list coloring is not possible. Hence X ( (G) 2: 3, and the reader may find
it fun to prove X t (G) = 3 (there is no need to tryout alI possibilities!).
Generalizing this example, it is not hard to find graphs G where X(G) = 2,
but X {( G) is arbitrarily large! So the list coloring problem is not as easy as
it looks at first glance.
Back to the Dinitz problem. A significant step towards the solution was
made by Jeanette Janssen in 1992 when she proved X{(S,,) -:; n + 1, and
the coup de grace was delivered by Fred Galvin by ingeniously combining
two results, both of which had long been known. We are going to discuss
these two results and show then how they imply X{ (S,,) = n.
First we fix some notation. Suppose v is a vertex of the graph G, then we
denote as before by d( v) the degree of v. In our square graph S" every
vertex has degree 2n - 2, accounting for the n - 1 other vertices in the
same row and in the same column. For a subset A <;;; I' we denote by GA
the subgraph which has A as vertex set and which contains alI edges of G
between vertices of A. We calI G Jl the subgraph induced by .4., and say
that H is an induced subgraph of G if H = GA. for some .4..
{1,3}
{3,4}
{2,3}
{2,4}
,....-
The Dinitz problem
169
To state our first result we need directed graphs a = (V, E), that is, graphs
where every edge e has an orientation. The notation e = (11, v) means that
there is an arc e, also denoted by 11V, whose initial vertex is 11 and whose
terminal vertex is v. It then makes sense to speak of the outdegree d+ (v)
resp. the indegree d- (v), where d+ (v) counts the number of edges with v as
initial vertex, and similarly for d- (v); furthermore d+ (v) + d- (v) = d( v).
When we write G, we mean the graph a without the orientations.
The following concept originated in the analysis of games and will playa
crucial role in our discussion.
Definition 1. Let a = (V, E) be a directed graph. A kernel K <;;; V is a
subset of the vertices such that
(i) K is independent in G, and
(ii) for every 11 tf- K there exists a vertex v E K with an edge 11 v.
Let us look at the example in the figure. The set {b, c, f} constitutes a
kernel, but the subgraph induced by {a, c, e} does not have a kernel since
the three edges cycle through the vertices.
With all these preparations we are ready to state the first result.
Lemma 1. Let a = (V, E) be a directed graph, and suppose that for each
vertex vEt' we have a color set C (v) that is larger than the outdegree,
IC(v)1 2: d+(v) + 1. Ifevery induced subgraph of a possesses a kernel,
then there exists a list coloring ofG with a color from C( v) for each v.
. Proof. We proceed by induction on IVI. For IVI = 1 there is nothing to
prove. Choose a color c E C = U vEV C(v) and set
A(c) := {v E V : c E C(v)}.
By hypothesis, the induced subgraph G A(e) possesses a kernel K(c). Now
we color all v E K(c) with the color c (this is possible since K(c) is
independent), and delete K(c) from G and c from C. Let G' be the induced
subgraph of G on V\K(c) with C'(v) = C(v)\c as the new list of color
sets. Notice that for each v E A( c) \K (c), the outdegree d+ (v) is decreased
by at least 1 (due to condition (ii) of a kernel). So d+ (v) + 1 I c' (v) I still
holds in a'. The same condition also holds for the vertices outside A(c),
since in this case the color sets C (v) remain unchanged. The new graph G'
contains fewer vertices than G, and we are done by induction. 0
The method of attack for the Dinitz problem is now obvious: We have to
find an orientation of the graph Sn with outdegrees d+ (v) n - 1 for all v
and which ensures the existence of a kernel for all induced subgraphs. This
is accomplished by our second result.
Again we need a few preparations. Recall (from Chapter 8) that a bipartite
graph G = (X U Y, E) is a graph with the following property: The vertex
set V is split into two parts X and Y such that every edge has one end vertex
in X and the other in Y. In other words, the bipartite graphs are precisely
those which can be colored with two colors (one for X and one for Y).
a
b
c
f
e
170
The Dinitz problem
x
Y
A bipartite graph with a matching
The bold edges constitute a stable mar-
riage. In each priority list, the choice
leading to a stable marriage is printed
bold.
Now we come to an important concept, "stable matchings," with a down-
to-earth interpretation. A matching M in a bipartite graph G = (X U Y, E)
is a set of edges such that no two edges in 111 have a common endvertex. In
the displayed graph the edges drawn in bold lines constitute a matching.
Consider X to be a set of men and Y a set of women and interpret uv E E to
mean that u and v might marry. A matching is then a mass-wedding with no
person committing bigamy. For our purposes we need a more refined (and
more realistic?) version of a matching, suggested by Gale and Shapley.
Clearly, in real life every person has preferences, and this is what we add to
the set-up. In G = (X U Y, E) we assume that for every v E Xu Y there
is a ranking of the set N (v) of vertices adjacent to v, lv- (v) = {Zl > Z2 >
. . . > Zd( 11)}' Thus Zl is the top choice for v, followed by Z2, and so on.
Definition 2. A matching M of G = (X U Y, E) is called stable if the
following condition holds: Whenever uv E E\M, u E X, v E Y, then
either uy E M with y > v in N(u) or xv E M with x > u in N(v),
or both.
In our real life interpretation a set of marriages is stable if it never happens
that u and v are not married but u prefers v to his partner (if he has one at
all) and v prefers u to her mate (if she has one at all), which would clearly
be an unstable situation.
Before proving our second result let us take a look at the following example:
{A> C}
{C > D > E}
{A > D}
{A}
a A
b E
c C
d D
{c > d > a}
{b}
{a>b}
{c>b}
Notice that in this example there is a unique largest matching !'vI with four
edges, M = {aC, bE, cD, dA}, but M is not stable (consider cA).
Lemma 2. A stable matching always exists.
. Proof. Consider the following algorithm. In the first stage all men
u E X propose to their top choice. If a girl receives more than one pro-
posal she picks the one she likes best and keeps him on a string, and if she
receives just one proposal she keeps that one on a string. The remaining
men are rejected and form the reservoir R. In the second stage all men in R
propose to their next choice. The women compare the proposals (together
with the one on the string, if there is one), pick their favorite and put him
on the string. The rest is rejected and forms the new set R. Now the men
in R propose to their next choice, and so on. A man who has proposed to
his last choice and is again rejected drops out from further consideration
(as well as from the reservoir). Clearly, after some time the reservoir R is
empty, and at this point the algorithm stops.
....-
The Dinitz problem
171
Claim. When the algorithm stops, then the men on the strings
together with the corresponding girlsfonn a stable matching.
Notice first that the men on the string of a particular girl move there in
increasing preference (of the girl) since at each stage the girl compares
the new proposals with the present mate and then picks the new favorite.
Hence if nv E E but nv tf- 1\1, then either 11 never proposed to v in
which case he found a better mate before he even got around to v, im-
plying uy E M with y > v in N(u), or n proposed to 11 but was rejected,
implying .TV E 111 with.1: > n in N(v). But this is exactly the condition of
a stable matching. 0
Putting Lemmas and 2 together, we now get Galvin's solution of the
Dinitz problem.
Theorem. We have X e (S 11) = n for all n.
. Proof. As before we denote the vertices of SIl by (-i,j), 1 Si,j S 71.
Thus (i,.i) and (7",8) are adjacent if and only ifi = r or j = 8. Take any
Latin square L with letters from {I, 2,. ., , n} and denote by L(i, j) the
ent'Ty in cell (i,j). Next make Sn into a directed graph S" by orienting
the horizontal edges (i,j) -+ (i,jf) if L(i,j) < L(i,jf) and the vertical
edges (i,j) -+ (if,j) if L(i,j) > L(i',j). Thus, horizontally we orient
from the smaller to the larger element, and vertically the other way around.
(In the margin we have an example for n = 3.)
Notice that we obtain d+(i,j) = n -1 for all (i,j). In fact, if L(i,j) = k,
then n - k cells in rowi contain an entry larger than k, and k - 1 cells in
column j have an entry smaller than k.
By Lemma I it remains to show that every induced subgraph of SIl pos-
sesses a kernel. Consider a subset A <;;; V, and let X be the set of rows
of L, and l' the set of its columns. Associate to A the bipartite graph
G = (X U Y, A), where every (i, j) E A is represented by the edge ij with
i E X, j E Y. In the example in the margin the cells of A are shaded.
The orientation on S" naturally induces a ranking on the neighborhoods in
G = (X U 1', A) by setting jf > j in N(i) if (i,.i) -+ (i, jf) in S" respec-
tively'i' >i in N(j) if (i,j) -+ (i',j). By Lemma 2, G = (X U 1', A)
possesses a stable matching 1\I. This 1\1, viewed as a subset of A, is our
desired kernel! To see why, note first that 111 is independent in A since as
edges in G = (X U Y, A) they do not share an endvertex i or j. Secondly,
if (i, j) E A \111, then by the definition of a stable matching there either
exists (i, j') E 111 with j' > j or (i', j) E 111 with i' > i, which for SIl
means (i,j) -+ (i,j') E AI or (i,j) -+ (if,j) E 111, and the proof
is complete. 0
To end the story let us go a little beyond. The reader may have noticed that
the graph SIl arises from a bipartite graph by a simple construction. Take
the complete bipartite graph, denoted by Kn.n, with IXI = 11'1 = n, and
all edges between X and Y. If we consider the edges of Kn.n as vertices
1 2 3
3 1 2
2 3 1
1 234
1 1
2 2
3 3
4 4
172
The Dinitz problem
G: L(G): a
b7d
r
c
Construction of a line graph
of a new graph, joining two such vertices if and only if as edges in Kn,n
they have a common endvertex, then we clearly obtain the square graph Sn.
Let us say that Sri is the line graph of Kn,n. Now this same construction
can be perfonned on any graph G with the resulting graph called the line
graph L(G) of G.
In general, call H a line graph if H = L(G) for some graph G. Of course,
not every graph is a line graph, an example being the graph K 2 .4 that we
considered earlier, and for this graph we have seen X(K 2 . 1 ) < X£(K 2 . 4 ).
But what if H is a line graph? By adapting the proof of our theorem it can
easily be shown that X(H) = XI (H) holds whenever H is the line graph of
a bipartite graph, and the method may well go some way in verifying the
supreme conjecture in this field:
Does X( H) = XI (H) hold for every line graph H?
Very little is known about this conjecture, and things look hard - but after
all, so did the Dinitz problem twenty years ago.
References
[I] P. ERDOS, A. L. RUBIN & H. TAYLOR: Choosability in graphs, Proc. West
Coast Conference on Combinatorics, Graph Theory and Computing, Congres-
sus Numerantium 26 (1979), 125-157.
[2] D. GALE & L. S. SHAPLEY: College admissions and the stability of marriage,
Amer. Math. Monthly 69 (1962), 9-15.
[3] F. GALVIN: The list chromatic index of a bipartite multigraph, J. Combinato-
rial Theory, Ser. B 63 (1995), 153-158.
[4J 1. C. M. JANSSEN: The Dinitz problem solved for rectangles, Bulletin Amer.
Math. Soc. 29 (1993), 243-249.
[5] V G. VIZING: Coloring the vertices of a graph in prescribed colours (in Rus-
sian), Melody Diskret. Analiz. 101 (1976), 3-10.
.....-
Graph Theory
(
(
27
Five-coloring plane graphs 175
28
How to guard a museum 179
29
Turan's graph theorem 183
30
Communicating
without errors 189
31
Of friends and politicians 199
32
Probability makes counting
(sometimes) easy 203
"
"The four-colorist geographer"
Five-coloring plane graphs Chapter 27
Plane graphs and their colorings have been the subject of intensive research
since the beginnings of graph theory because of their connection to the four-
color problem. As stated originally the four-color problem asked whether it
is always possible to color the regions of a plane map with four colors such
that regions which share a common boundary (and not just a point) receive
different colors. The figure on the right shows that coloring the regions of
a map is really the same task as coloring the points of a plane graph. As in
Chapter 10 (page 59) place a point in the interior of each region (including
the outer region) and connect two such points belonging to neighboring
regions by an edge through the common boundary.
The resulting graph G, the dual graph of the map M, is then a plane graph,
and coloring the vertices of G in the usual sense is the same as coloring
the regions of A1. So we may as well concentrate on vertex-coloring plane
graphs and will do so from now on. Note that we may assume that G has The dual graph of a map
no loops or multiple edges, since these are irrelevant for coloring.
In the long and arduous history of attacks to prove the four-color theorem
many attempts came close, but what finally succeeded in the Appel-Haken
proof of 1976 and also in the recent proof of Robertson, Sanders, Seymour
and Thomas 1997 was a combination of very old ideas (dating back to the
19th century) and the very new calculating powers of modern-day comput-
ers. Twenty-five years after the original proof, the situation is still basically
the same, no proof from The Book is in sight.
So let us be more modest and ask whether there is a neat proof that every
plane graph can be 5-colored. A proof of this five-color theorem had al-
ready been given by Heawood at the turn of the century. The basic tool for
his proof (and indeed also for the four-color theorem) was Euler's formula
(see Chapter 10). Clearly, when coloring a graph G we may assume that G
is connected since we may color the connected pieces separately. A plane
graph divides the plane into a set R of regions (including the exterior re-
gion). Euler's formula states that for plane connected graphs G = (V, E)
we always have
!VI - lEI + IRI = 2.
As a warm-up, let us see how Euler's formula may be applied to prove
that every plane graph G is 6-colorable. We proceed by induction on the
number n of vertices. For small values of n (in particular, for n ::; 6) this
is obvious.
This plane graph has 8 vertices,
13 edges and 7 regions.
176
Five-coloring plane graphs
A near-triangulated plane graph
From part (A) of the proposition on page 61 we know that G has a vertex v
of degree at most 5. Delete v and all edges incident with v. The resulting
graph G' = G\v is a plane graph on n - 1 vertices. By induction, it can be
6-colored. Since v has at most 5 neighbors in G, at most 5 colors are used
for these neighbors in the coloring of G'. So we can extend any 6-coloring
of G' to a 6-coloring of G by assigning a color to v which is not used for
any of its neighbors in the coloring of G'. Thus G is indeed 6-colorable.
Now let us look at the list chromatic number of plane graphs, as discussed in
the previous chapter on the Dinitz problem. Clearly, our 6-coloring method
works for lists of colors as well (again we never run out of colors), so
X£(G) ::; 6 holds for any plane graph G. Erdos, Rubin and Taylor conjec-
tured in 1979 that every plane graph has list chromatic number at most 5,
and further that there are plane graphs G with X£ (G) > 4. They were
right on both counts. Margit Voigt was the first to construct an example
of a plane graph G with X£ (G) = 5 (her example had 238 vertices) and
around the same time Carsten Thomassen gave a truly stunning proof of
the 5-list coloring conjecture. His proof is a telling example of what you
can do when you find the right induction hypothesis. It does not use Euler's
formula at all!
Theorem. All planar graphs G can be 5-list colored:
X£(G)::;5.
. Proof. First note that adding edges can only increase the chromatic num-
ber. In other words, when H is a sub graph of G, then X£(H) ::; Xf(G)
certainly holds. Hence we may assume that G is connected and that all
the bounded faces of an embedding have triangles as boundaries. Let us
call such a graph near-triangulated. The validity of the theorem for near-
triangulated graphs will establish the statement for all plane graphs.
The trick of the proof is to show the following stronger statement (which
allows us to use induction):
Let G = (V, E) be a near-triangulated graph, and let B be the
cycle bounding the outer region. We make the following assump-
tions on the color sets C (v), v E V:
(1) Two adjacent vertices x, y of B are already colored with
(different) colors a and (3.
(2) Ie (v) I 2 3 for all other ve rtices v of B.
(3) Ie (v) I 2 5 for all vertices v in the interior.
Then the coloring of x, y can be extended to a prope r coloring of G
by choosing colors from the lists. In particular, X£ (G) ::; 5.
J
Five-coloring plane graphs
177
For IFI = 3 this is obvious, since for the only uncolored vertex v we have
IC(v)1 2: 3, so there is a color available. Now we proceed by induction.
Case 1: Suppose B has a chord, that is, an edge not in B that joins two
vertices 'U, v E B. The subgraph G 1 which is bounded by Bl U {'Uv}
and contains x, y, 'U and v is near-triangulated and therefore has a 5-list
coloring by induction. Suppose in this coloring the vertices 'U and v receive
the colors 1 and c5. Now we look at the bottom part G 2 bounded by B 2 and
uv. Regarding 'U, v as pre-colored, we see that the induction hypotheses
are also satisfied for G 2 . Hence G 2 can be 5-list colored with the available
colors, and thus the same is true for G.
Case 2: Suppose B has no chord. Let Va be the vertex on the other side of
the a-colored vertex :E on B, and let x, Vl , . ., , Vt, w be the neighbors of Va.
Since G is near-triangulated we have the situation shown in the figure.
Construct the near-triangulated graph G' = G\va by deleting from G the
vertex Va and all edges emanating from Va. This G' has as outer boundary
B' = (B\va) U {VI,..' , vt}. Since IC(va) I 2: 3 by assumption (2) there
exist two colors I, c5 in C(va) different from a. Now we replace every
color set C ( Vi) by C ( Vi ) \ { I, c5}, keeping the origi nal color sets for all other
vertices in G'. Then G' clearly satisfies all assumptions and is thus 5-list
colorable by induction. Choosing 1 or c5 for Va, different from the color
of w, we can extend the list coloring of G' to all of G. 0
So, the 5-list color theorem is proved, but the story is not quite over. A
stronger conjecture claimed that the list-chromatic number of a plane graph
G is at most I more than the ordinary chromatic number:
Is X p (G) X(G) + 1 for every plane graph G ?
Since X(G) 4 by the four-color theorem, we have three cases:
Case 1: X(G) = 2 ===} Xf(G) 3
Case II: X(G) = 3 ===} X£(G) 4
Case III: X(G) = 4 ===} Xf(G) 5.
Thomassen's result settles Case III, and Case I was proved by an ingenious
(and much more sophisticated) argument by Alon and Tarsi. Furthermore,
there are plane graphs G with X( G) = 2 and X£ (G) = 3, for example
the graph K 2 .4 that we considered in the preceding chapter on the Dinitz
problem.
But what about Case II? Here the conjecture fails: this was first shown
by Margit Voigt for a graph that was earlier constructed by Shai Gutner.
His graph on 130 vertices can be obtained as follows. First we look at
the "double octahedron" (see the figure), which is clearly 3-colorable. Let
a E {5, 6, 7, 8} and (3 E {9, 10, 11, 12}, and consider the lists that are given
in the figure. You are invited to check that with these lists a coloring is not
possible. Now take 16 copies of this graph, and identify all top vertices and
all bottom vertices. This yields a graph on 16 . 8 + 2 = 130 vertices which
G 1
v
y((3)
G 2
B 2
Va
a
{a,1,3,4}
j
{a,(3,1,2}
(3
178
Five-coloring plane graphs
is still plane and 3-colorable. We assign {5, 6, 7, 8} to the top vertex and
{9, 10, 11, 12} to the bottom vertex, with the inner lists corresponding to
the 16 pairs (0, {3), 0 E {5, 6, 7, 8}, {3 E {9, 10, 11, 12}. For every choice
of 0 and {3 we thus obtain a subgraph as in the figure, and so a coloring of
the big graph is not possible.
By modifying another one of Gutner's examples, Voigt and Wirth came up
with an even smaller plane graph with 75 vertices and X = 3, Xl = 5, which
in addition uses only the minimal number of 5 colors in the combined lists.
The current record is 63 vertices.
References
[IJ N. ALON & M. TARSI: Colorings and orientations of graphs, Combinatorica
12 (1992), 125-134.
[2] P. ERDOS, A. L. RUBIN & H. TAYLOR: Choosability in graphs, Proc. West
Coast Conference on Combinatorics, Graph Theory and Computing, Congres-
sus Numerantium 26 (1979), 125-157.
[3] S. GUTNER: The complexity of planar graph choosability, Discrete Math. 159
(1996), 119-130.
[4] N. ROBERTSON, D. P. SANDERS, P. SEYMOUR & R. THOMAS: The four-
colour theorem, J. Combinatorial Theory, Ser. B 70 (1997), 2-44.
[5J C. THOMASSEN: Every planar graph is 5-choosable, J. Combinatorial Theory,
Ser. B 62 ( 1994), 180-181.
[6] M. VOIGT: List colorings of planar graphs, Discrete Math. 120 (1993),
215-219.
[7] M. VOIGT & B. WIRTH: On 3-colorable non-4-choosable planar graphs,
J. Graph Theory 24 (1997), 233-235.
..J
,....-
I
How to guard a museum
Here is an appealing problem which was raised by Victor Klee in 1973.
Suppose the manager of a museum wants to make sure that at all times
every point of the museum is watched by a guard. The guards are stationed
at fixed posts, but they are able to turn around. How many guards are
needed?
We picture the walls of the museum as a polygon consisting of n sides.
Of course, if the polygon is convex, then one guard is enough. In fact, the
guard may be stationed at any point of the museum. But, in general, the
walls of the museum may have the shape of any closed polygon.
Consider a comb-shaped museum with n = 3m walls, as depicted on the
right. It is easy to see that this requires at least m = guards. In fact,
there are n walls. Now notice that the point I can only be observed by a
guard stationed in the shaded triangle containing I, and similarly for the
other points 2,3, . .. ,m. Since all these triangles are disjoint we conclude
that at least m guards are needed. But m guards are also enough, since they
can be placed at the top lines of the triangles. By cutting off one or two
walls at the end, we conclude that for any n there is an n-walled museum
which requires l J guards.
Chapter 28
\
\
\ /
--.---
"
/
/
/
/
A convex exhibition hall
'v--v-v-
1
---,
I-
--I
L-
-v
2
3
m,
l
A real life art gallery. . .
180
How to guard a museum
The following result states that this is the worst case.
A museum with n = 12 walls
c'
B
Schonhardt's polyhedron: The interior
dihedral angles at the edges A.B', BC'
and C A.' are greater than 180 0 .
Theorem. For any museum with n walls, l I J guards suffice.
This "art gallery theorem" was first proved by Vasek Chvatal by a clever
argument, but here is a proof due to Steve Fisk that is truly beautiful.
. Proof. First of all, let us draw n - 3 non-crossing diagonals between
corners of the walls until the interior is triangulated. For example, we can
draw 9 diagonals in the museum depicted in the margin to produce a trian-
gulation. It does not matter which triangulation we choose, anyone will do.
Now think of the new figure as a plane graph with the corners as vertices
and the walls and diagonals as edges.
Claim. This graph is 3-colorable.
For n = 3 there is nothing to prove. Now for n > 3 pick any two vertices
11 and v which are connected by a diagonal. This diagonal will split the
graph into two smaller triangulated graphs both containing the edge l1V. By
induction we may color each part with 3 colors where we may choose color
1 for 11 and color 2 for v in each coloring. Pasting the colorings together
yields a 3-coloring of the whole graph.
The rest is easy. Since there are n vertices, at least one of the color classes,
say the vertices colored I, contains at most l I J vertices, and this is where
we place the guards. Since every triangle contains a vertex of color I we in-
fer that every triangle is guarded, and hence so is the whole museum. 0
The astute reader may have noticed a subtle point in our reasoning. Does
a triangulation always exist? Probably everybody's first reaction is: Obvi-
ously, yes! Well, it does exist, but this is not completely obvious, and,
in fact, the natural generalization to three dimensions (partitioning into
tetrahedra) is false! This may be seen from Schonhardt's polyhedron, de-
picted on the left. It is obtained from a triangular prism by rotating the
top triangle, so that each of the quadrilateral faces breaks into two triangles
with a non-convex edge. Try to triangulate this polyhedron! You will notice
that any tetrahedron that contains the bottom triangle must contain one of
the three top vertices: but the resulting tetrahedron will not be contained in
Schonhardt's polyhedron. So there is no triangulation without an additional
vertex.
To prove that a triangulation exists in the case of a planar non-convex
polygon, we proceed by induction on the number n of vertices. For n = 3
the polygon is a triangle, and there is nothing to prove. Let n 4. To
use induction, all we have to produce is one diagonal which will split the
polygon P into two smaller parts which can be pasted together as we did
above.
Call a vertex A convex if the interior angle at the vertex is less than 180 0 .
Since the sum of the interior angles of P is (n - 2) 180 0 , there must be a
How to guard a museum
181
convex vertex A. In fact, there must be at least three of them: In essence
this is an application of the pigeonhole principle! Or you may consider the
convex hull of the polygon, and note that all its vertices are convex also for
the original polygon.
Now look at the two neighboring vertices Band C of A If the segment
BC lies entirely in P, then this is our diagonal. If not, the triangle ABC
contains other vertices. Slide BC towards A until it hits the last vertex Z
in ABC. Now AZ is within P, and we have a diagonal.
There are many variants to the art gallery theorem. For example, we may
only want to guard the walls (which is, after all, where the paintings hang),
or the guards are all stationed at vertices. A particularly nice (unsolved)
variant goes as follows:
',Suppose each guard may patrol one wall of the museum, so he
walks along his wall and sees anything that can be seen from any
point along this wall.
How many "wall guards" do we then need to keep control?
Gottfried Toussaint constructed the example of a museum displayed here
which shows that l J guards may be necessary.
This polygon has 28 sides (and, in general, 4rn sides), and the reader is
invited to check that rn side-guards are needed. It is conjectured that, except
for some small values of n, this number is also sufficient, but a proof, let
alone a Book Proof, is not yet in sight.
References
[I] V CHV A TAL: A combinatorial theorem in plane geometry, J. Combinatorial
Theory, Ser. B 18 (1975), 39-41.
[21 S. FISK: A short pro(f of Chvdtal's watchman theorem, J. Combinatorial
Theory, Ser. B 24 (1978),374.
[3] J. 0' ROURKE: Art Gallery Theorems and Algorithms, Oxford University
Press 1987.
[4] E. SCHONHARDT: Dber die Zerlegung von Dreieckspolyedern in Tetraeder,
Math. Annalen 98 (1928),309-312.
A
182
How to guard a museum
"Museum guards"
(A 3-dimensional art-gallery problem)
\
--
. .::.tZl" r::--
I- ky$1
I
y
, \
. _-----1
I
I f
,
I
- "
\
--
Turan's graph theorem
One of the fundamental results in graph theory is the theorem of Tunin
from 1941, which initiated extremal graph theory. Tunin's theorem was
rediscovered many times with various different proofs. We will discuss five
of them and let the reader decide which one belongs in The Book.
Let us fix some notation. We consider simple graphs G on the vertex set
1/ = {v],... , v n } and edge set E. If Vi and Vj are neighbors, then we
write ViVj E E. A p-clique in G is a complete sub graph of G on p vertices,
denoted by Kp. Paul Tunin posed the following question:
Suppose G is a simple graph that does not contain a p-clique.
What is the largest number of edges that G can have?
Chapter 29
We readily obtain examples of such graphs by dividing V into p-l pairwise
disjoint subsets V = VI U .., U V p - 1 , IViI = n;, n = nl + ... + np-l, Paul Tunin
joining two vertices if and only if they lie in distinct sets Vi, Vj. We denote
the resulting graph by Kn,.... ,np-l; it has Li<j ninj edges. We obtain a
maximal number of edges among such graphs with given n if we divide
the numbers ni as evenly as possible, that is, if Ini - njl ::; 1 for all i,j.
Indeed, suppose n] n2 + 2. By shifting one vertex from V 1 to v'2, we
obtain K n, - I ,n2+ 1 ,... ,np-l which contains (n] - l)(n2 + 1) - nln2 =
nl - n2 - 1 1 more edges than K n" n2,... ,np-l' Let us call the graphs
K n1 ,... ,np-l with Ini - nj I ::; 1 the Turan graphs. In particular, if p - 1
divides n, then we may choose ni = P::l for all i, obtaining The graph K 2 ,2,3
( P-l )( n ) 2 ( 1 ) n 2
2 p-l = I- p _l '2
edges. Tunin's theorem now states that this number is an upper bound for
the edge-number of any graph on n vertices without a p-c1ique.
Theorem. ff a graph G = (V, E) on n vertices has no p-clique, p 2,
then
1 2
lEI < ( 1- - ) .
- p-l 2
For p = 2 this is trivial. In the first interesting case p = 3 the theorem states
2
that a triangle-free graph on n vertices contains at most ' edges. Proofs
of this special case were known prior to Tunin's result. Two elegant proofs
using inequalities are contained in Chapter 16.
(1)
184
Turdn '05 graph theorem
A
B
G
V m
T
v
H
Let us turn to the general case. The first two proofs use induction and are
due to Tunin and to Erdos, respectively.
. First proof. We use induction on n. (1) is trivially true for n = 1.
Let G be a graph on V = {Vj,... , v n } without p-cliques with a maximal
number of edges. G certainly contains (p - I)-cliques, since otherwise we
could add edges. Let A be a (p - I)-clique, and set B := V\A.
A contains (P;I) edges, and we now estimate the edge-number eB in B
and the edge-number e A,B between A and B. By induction, we have eB :::;
(1- pl )(n - p + 1)2. Since G has no p-clique, every Vj E B is adjacent
to at most p - 2 vertices in A, and we obtain eA,R :::; (p - 2)(n - p + 1).
Altogether, this yields
( p - 1 ) 1 ( 1 ) .
lEI:::; + - 1 - - (n - p + 1)2 + (p - 2)(n - p + 1),
2 2 p-I
which is precisely (1 - pl ) 72 .
o
. Second proof. This proof makes use of the structure of the Tunin
graphs. Let V m E V be a vertex of maximal degree d m = maxI"sj"Sn d j .
Denote by S the set of neighbors of V m , ISI = d m , and set T = V\S. As
G contains no p-clique, and V m is adjacent to all vertices of S, we note that
S contains no (p - 1) -clique.
We now construct the following graph H on V (see the figure). H corre-
sponds to G on S and contains all edges between Sand T, but no edges
within T. In other words, T is an independent set in H, and we con-
clude that H has again no p-cliques. Let dj be the degree of Vj in H.
If Vj E S, then we certainly have dj :::: d j by the construction of H, and
for Vj E T, we see dj = ISI = d m :::: d j by the choice of V m . We in-
fer IE(H)I :::: lEI, and find that among all graphs with a maximal number
of edges, there must be one of the form of H. By induction, the graph
induced by S has at most as many edges as a suitable graph Kn, ,... ,n p _2
on S. So lEI:::; IE(H)I :::; E(K n1 ,... ,np-J with np-I = ITI, which
implies (l). 0
The next two proofs are of a totally different nature, using a maximizing
argument and ideas from probability theory. They are due to Motzkin and
Strauss and to Alon and Spencer, respectively.
. Third proof. Consider a probability distribution w = (WI,... , W n )
on the vertices, that is, an assignment of values Wi :::: 0 to the vertices with
L=I Wi = 1. Our goal is to maximize the function
f(w) = L WiWj.
V l Vj EE
Suppose w is any distribution, and let Vi and Vj be a pair of non-adjacent
vertices with positive weights Wi, Wj. Let 8i be the sum of the weights of
Turan '05 graph theorem
185
all vertices adjacent to Vi, and define Sj similarly for Vj, where we may
assume that Si 2: Sj. Now we move the weight from Vj to Vi, that is, the
new weight of viis 1V i + 1V j, while the weight of Vj drops to O. For the new
new distribution w' we find
f(w ' ) = f(w) + 1Vj S i - 1VjSj 2: f(w).
;(\
",r' .''- +;y
,- ':?f,":
//"tf'<o.
, ...J_
"""" '
.,&., ,
-...
Repeating this step (getting fewer vertices with positive weights in each
step) we conclude that there is an optimal distribution where the nonzero
weights are concentrated on a clique, say on a kf-clique. Now if, say, "Moving weights"
W1 > 7112 > 0, then choose E with 0 < E < W] - W2 and change w] to
'j - E and 11:2 to W2 + E. The new distribution w' satisfies f( w') =
f(w) + c(W1 - W2) - E 2 > f(w), and we infer that the maximal value
of f is attained for 11Ii = t on a k-clique and Wi = 0 otherwise. Since a
k-clique contains k(k 2 -1) edges, we obtain
f = k(k - 1) = ( 1 _ ) .
2 k2 2 k
Since this expression is increasing in k, the best we can do is to set k = p-1
(since G has no p-cliques). So we conclude
f(w) ( 1- )
2 p-1
for any distribution w. In particular, this inequality holds for the uniform
distribution given by 11Ii = for all i. Thus we find
lEI
n 2
f ( Wi = ) ( 1- ) ,
11 2 p-1
which is precisely (I).
o
. Fourth proof. This time we use some concepts from probability theory.
Let G be an arbitrary graph on the vertex set i' = {v],... ,VI/}' Denote
the degree of Vi by d i , and write w( G) for the number of vertices in a largest
clique, called the clique number of G.
" 1
Claim. We have w(G) 2: ""'-.
n-d
i=1 I
We choose with equal probability ;b a random permutation 7r17r2 . . . 7r 1/
of i' and construct the following set Crr. We put 7ri into C Tr if and only
if 7ri is adjacent to all 7rj (j < i) preceding 7ri. By definition, C Tr is a
clique in G. Let X = IC Tr I be the corresponding random variable. We have
X = L;l Xi, where Xi is the indicator random variable of the vertex Vi,
that is, Xi = lor Xi = 0 depending on whether Vi E C Tr or vi <t C Tr . Note
that Vi belongs to C Tr with respect to the permutation 7r]7r2 . . .7rf/ if and
only if Vi appears before all n - 1- d i vertices which are not adjacent to Vi,
or in other words, if Vi is thejirst among Vi and its n -1- d i non-neighbors.
The probability that this happens is d ' hence EX i = d '
n '1 n 1
186
Turan's graph theorem
Thus by linearity of expectation (see page 78) we obtain
/W
VJ
.
n n 1
E(IC 7T I) = EX = LEX i = L-'
. . 71. - d i
1.=1 1=1
Consequently, there must be a clique of at least that size, and this was our
claim. To deduce Tunin's theorem from the claim we use the Cauchy-
Schwarz inequality from Chapter 16,
n 2 n n
(La;b;) (LaO (LbO.
;=1 i=l n=l
Set ai = y'n - ri i , b i = v' 1 d ' then aibi = 1, and so
(1- i
11 11 1 11
71. 2 (L(n - di))(L 71. _ d ) w(G) L(n - d;). (2)
i=l i=l 1 i=1
At this point we apply the hypothesis w(G) p - 1 of Tunin's theorem.
Using also L1 d i = 21EI from the chapter on double counting, inequal-
ity (2) leads to
71. 2 (p - 1)(71. 2 - 2IEI),
and this is equivalent to Tunin's inequality.
o
Now we are ready for the last proof, which may be the most beautiful of
them all. Its origin is not clear; we got it from Stephan Brandt, who heard
it in Oberwolfach. It may be "folklore" graph theory. It yields in one stroke
that the Tunin graph is in fact the unique example with a maximal number
of edges. It may be noted that both proofs I and 2 also imply this stronger
result.
. Fifth proof. Let G be a graph on 71. vertices without a p-clique and with
a maximal number of edges.
Claim. G does not contain three vertices 11, v, w such that vw E E,
but 1J,V <I- E, uw <I- E.
.
u
Suppose otherwise, and consider the following cases.
Case 1.' d(u) < d(v) or d(u) < d(w).
We may suppose that d(u) < d(v). Then we duplicate v, that is, we create
a new vertex v' which has exactly the same neighbors as v (but vv' is not
an edge), delete u, and keep the rest unchanged.
The new graph G' has again no p-clique, and for the number of edges we
find
IE(G')I
IE(G)I + d(v) - d(u) > IE(G)I,
a contradiction.
Tunin's graph theorem
187
Case 2: d(ll) :::: d(v) and d(ll) :::: d(w).
Duplicate 1l twice and delete v and 'Ill (as illustrated in the margin). Again,
the new graph G' has no p-c1ique, and we compute (the -1 results from the
edge vw):
IE(G')I = IE(G)I + 2d(ll) - (d(v) + d(w) - 1) > IE(G)I.
So we have a contradiction once more.
A moment's thought shows that the claim we have proved is equivalent to
the statement that
1l rv V : {::::::} llV <I- E (G)
defines an equivalence relation. Thus G is a complete multipartite graph,
G = K '11 "" .'1,,-1' and we are finished. D
References
[11 M. AIGNER: Turdn's graph theorem, Amer. Math. Monthly 102 (1995),
808-816.
[2J N. ALON & J. SPENCER: The Probabilistic Method, Wiley Interscience 1992.
[31 P. ERDOS: On the graph theorem of Turdn (in Hungarian), Math. FiL Lapok
21 (1970), 249-251.
[4] T. S. MOTZKIN & E. G. STRAUSS: MaximajiJr graphs and a new prool(lla
theorem (If Turan, Canad. J. Math. 17 (1965), 533-540.
[51 P. TURAN: On an extremal prohlem in graph theorv, Math. Fiz. Lapok 48
(1941),436-452.
,',-J U)
v
1l ll' 1l"
.'
"Larger weights to move"
Communicating without errors
In 1956, Claude Shannon, the founder of information theory, posed the
following very interesting question:
Suppose we want to transmit messages across a channel (where
some symbols may be distorted) to a receiver. What is the maximum
rate of transmission such that the receiver may recover the original
message without errors?
Let us see what Shannon meant by "channel" and "rate of transmission."
We are given a set F of symbols, and a message is just a string of symbols
from F. We model the channel as a graph G = (F, E), where F is the set
of symbols, and E the set of edges between unreliable pairs of symbols,
that is, symbols which may be confused during transmission. For example,
communicating over a phone in everyday language, we connnect the sym-
bols Band P by an edge since the receiver may not be able to distinguish
them. Let us call G the confusion graph.
The 5-cycle C 5 will playa prominent role in our discussion. In this exam-
ple, I and 2 may be confused, but not I and 3, etc. Ideally we would like
to use all 5 symbols for transmission, but since we want to communicate
error-free we can - if we only send single symbols - use only one let-
ter from each pair that might be confused. Thus for the 5-cycle we can use
only two different letters (any two that are not connected by an edge). In the
language of information theory, this means that for the 5-cycle we achieve
an information rate of logz 2 = 1 (instead of the maximallog z 5 2.32).
It is clear that in this model, for an arbitrary graph G = (V, E), the best
we can do is to transmit symbols from a largest independent set. Thus the
information rate, when sending single symbols, is logz a( G), where a( G)
is the independence number of G.
Let us see whether we can increase the information rate by using larger
strings in place of single symbols. Suppose we want to transmit strings of
length 2. The strings 11] 11Z and v] Vz can only be confused if one of the
following three cases holds:
· 11] = v] and 11Z can be confused with V2
· 11Z = Vz and 11] can be confused with v]
· 11] -I- v] can be confused and 11Z -I- Vz can be confused.
In graph-theoretic terms this amounts to considering the product G] x G z
of two graphs G] = (F], Ed and G z = (Fz, Ez). G] x G z has the vertex
Chapter 30
1
50'
4 3
190
Communicatinfi without errors
The graph C 5 xC,",
set V 1 x 12 = {(Ul,U2) : Ul E V l ,U2 E "2}, with (Ul,U2) -I- (Vl,V2)
connected by an edge if and only if Ui = Vi or U(lJi E E for i = 1,2. The
confusion graph for strings of length 2 is thus G 2 = G x G, the product of
the graph with itself. The information rate of strings of length 2 per symbol
is then given by
log2 a(G 2 )
2
= log2 Ja(G2) .
Now, of course, we may use strings of any length n. The n-th confusion
graph Gn = G x G x . . . x G has vertex set yn = {( U] , . . . , un) : U i E I'}
with (u] , . ., , un) -I- (Vj, . . . v n ) being connected by an edge if Ui = Vi or
UiVi E E for all i. The rate of information per symbol determined by
strings of length n is
log2 n(Gn)
n
= log2 \,Ia(Gn) .
What can we say about a(Gn)? Here is a first observation. Let U V
be a largest independent set in G, IUI = n. The an vertices in G n of the
form (HI, . . . , Un), Ui E U for all i, clearly form an independent set in Gn.
Hence
a(G n ) > a(G)n
and therefore
\la(Gn) 2: n(G),
meaning that we never decrease the information rate by using longer strings
instead of single symbols. This, by the way, is a basic idea of coding theory:
By encoding symbols into longer strings we can make error-free communi-
cation more efficient.
Disregarding the logarithm we thus arrive at Shannon's fundamental
definition: The zero-error capacity of a graph G is given by
8(G) := sup \,Ia(Gn) ,
n>1
and Shannon's problem was to compute 8(G), and in particular 8(C 5 ).
Let us look at C 5 . So far we know a(G.5) = 2 :S 8(C 5 ). Looking at
the 5-cycle as depicted earlier, or at the product C x C 5 as drawn on the
left, we see that the set {(I, 1), (2,3), (3,5), (4,2), (5, -i)} is independent
in C,r Thus we have a(C 5 2 ) 2: 5. Since an independent set can contain
only two vertices from any two consecutive rows we see that a(C 5 2 ) = 5.
Hence, by using strings of length 2 we have increased the lower bound for
the capacity to 8( C 5 ) 2: )5.
So far we have no upper bounds for the capacity. To obtain such bounds
we again follow Shannon's original ideas. First we need the dual defini-
tion of an independent set. We recall that a subset C I' is a clique if
any two vertices of C are joined by an edge. Thus the vertices form trivial
CommunicatinR without errors
191
cliques of size I, the edges are the cliques of size 2, the triangles are cliques
of size 3, and so on. Let C be the set of cliques in G. Consider an arbitrary
probability distribution x = (xv : v E 1/) on the set of vertices, that
is, x,, 2: 0 and 2:VEV Xv = 1. To every distribution x we associate the
"maximal value of a clique"
A(X) = max "" XV,
C'EC
vEC'
and finally we set
A(G) = minA(x) = minmax"" XU'
'" '" C'EC
vEC'
To be precise we should use inf instead of min, but the minimum exists
because A( x) is continuous on the compact set of all distributions.
Consider now an independent set U 1/ of maximal size a( G) = a.
Associated to U we define the distribution Xu = (xv: v E ') by setting
Xu = if v E U and Xv = 0 otherwise. Since any clique contains at most
one vertex from U, we infer A( xu) = , and thus by the definition of A( G)
1
A( G) ::; a( G)
or
a(G) ::; A(G)-l.
What Shannon observed is that A (G) -1 is, in fact, an upper bound for all
\la(G") , and hence also for 8(G). In order to prove this it suffices to
show that for graphs G, H
A(G x H) = A(G)A(H)
holds, since this will imply A(G") = A(G)n and hence
(I)
a(G n ) <
\la(G") ::;
A(G")-l = A(G)-n
A(G) -1.
To prove (I) we make use of the duality theorem of linear programming
(see [I]) and get
A(G) = minma "" Xv = maxrni "" Yc, (2)
'" GEC y vE\
vEG ('3 v
where the right-hand side runs through all probability distributions y =
(Yc : C E C) on C.
Consider G x H, and let x and Xl be distributions which achieve the
minima, A(X) = A(G), A(XI) = A(H). In the vertex set of G x H we
assign the value z(u.v) = xux to the vertex (u,v). Since 2:(u,u) Z(U,I') =
2: u Xu 2:v x;, = 1, we obtain a distribution. Next we observe that the max-
imal cliques in G x Hare of the form ex D = {(u, v) : u E C,v E D}
where C and D are cliques in G and H, respectively. Hence we obtain
A(G x H) ::; A(Z)
1i5 L z(u.u}
(n,v)EC'xD
max "" X II "" x;,
C'xf)
uEC' ('ED
A(G)A(H)
192
CommunicatinR without errors
by the definition of >.( G x H). In the same way the converse inequality
>'(G x H) >'(G)>'(H) is shown by using the dual expression for >'(G)
in (2). In summary we can state:
8(G) :::: >'(G)-l,
for any graph G.
Let us apply our findings to the 5-cycle and, more generally, to the
Tn-cycle C m . By using the uniform distribution ( ..L,... , ..L ) on the
nl 1n
vertices, we obtain >. (C m ) :::: -:'n, since any clique contains at most two
vertices. Similarly, choosing ..L for the edges and 0 for the vertices, we have
? m .
>'(C m ) by the dual expression in (2). We conclude that >'(C m ) =
and therefore
rn
8 ( C ) < -
m - 2
for all Tn. Now, if Tn is even, then clearly a(C m ) = !!J- and thus also
8(C m ) = '!j-. For odd Tn, however, we have a(C m ) = ml . For Tn = 3,
C;, is a clique, and so is every product q,', implying a( C;,) = 8( C 3 ) = 1.
So, the first interesting case is the 5-cycle, where we know up to now
;)
v5 :::: 8(C 5 ) :::: 2'
(3)
Using his linear programming approach (and some other ideas) Shannon
was able to compute the capacity of many graphs and, in particular, of all
graphs with five or fewer vertices - with the single exception of C[), where
he could not go beyond the bounds in (3). This is where things stood for
more than 20 years until Laszlo Lovasz showed by an astonishingly simple
argument that indeed 8 (C 5) = v5. A seemingly very difficult combina-
torial problem was provided with an unexpected and elegant solution.
Lovasz' main new idea was to represent the vertices v of the graph by
real vectors of length 1 such that any two vectors which belong to non-
adjacent vertices in G are orthogonal. Let us call such a set of vectors
an orthonormal representation of G. Clearly, such a representation always
exists: just take the unit vectors (1,0,... ,O)T, (0,1,0,... ,0r T , '" ,
(0,0,. .. ,1)'/ of dimension Tn = WI.
For the graph C 5 we may obtain an orthonormal representation in IR 3 by
considering an "umbrella" with five ribs Vj , . " ,Vr; of unit length. Now
open the umbrella (with tip at the origin) to the point where the angles
between alternate ribs are 90 0 .
Lovasz then went on to show that the height h of the umbrella, that is, the
distance between 0 and S, provides the bound
1
8(Cr;) :::: h 2 '
(4)
The Lovasz umbrella
A simple calculation yields h 2 = ; see the box on the next page. From
this 8(C 5 ) :::: V5 follows, and therefore 8(C 5 ) = )5.
......
Communicatinfi without errors
193
Let us see how Lovasz proceeded to prove the inequality (4). (His results
were, in fact, much more general.) Consider the usual inner product
(x, y) = XIYl + ... + XsYs
of two vectors x (Xl, . . . , X s), y = (Yl,... , Y s) in Il.s. Then I x 1 2 =
(x, x) = :ri + . . . + X; is the square of the length Ixl of x, and the angle,
between x and y is given by
cas,
(x,y)
IxIIYI'
Thus (x, y) = 0 if and only if x and yare orthogonal.
Pentagons and the golden section
Tradition has it that a rectangle was considered aesthetically pleasing
if, after cutting off a square of length a, the remaining rectangle had
the same shape as the original one. The side lengths a, b of such a
rectangle must satisfy % = b(J,a ' Setting T := % for the ratio, we
obtain T = Tl or T 2 - T - 1 = O. Solving the quadratic equation
yields the golden section T = 1+ 2 v'5 1.6180.
Consider now a regular pentagon of side length a, and let d be the
length of its diagonals. It was already known to Euclid (Book XIII,8)
that = T, and that the intersection point of two diagonals divides
the diagonals in the golden section.
Here is Euclid's Book Proof. Since the total angle sum of the pen-
tagon is 37T, the angle at any vertex equals 3: . It follows that
;)
<1:AB E = K' since AB E is an isosceles triangle. This, in turn,
implies <1:AM B = 3 5 7f , and we conclude that the triangles ABC and
AM B are similar. The quadrilateral C M E D is a rhombus since op-
posing sides are parallel (look at the angles), and so IMCj = a and
thus IAMI = d - a. By the similarity of ABC and AM B we con-
clude
d IACI IABI a IMCj
----------T
a - IABI - IAMI - d - a - IMAI - .
There is more to come. For the distance 8 of a vertex to the center
of gravity S, the reader is invited to prove the relation 8 2 = T2
(note that BS cuts the diagonal AC at a right angle and halves it). To
finish our excursion into geometry, consider now the umbrella with
the regular pentagon on top. Since alternate ribs (of length 1) form a
right angle, the theorem of Pythagoras gives us d = )2, and hence
8 2 = T2 = A+5 ' SO, with Pythagoras again, we find for the height
h = 10SI our promised result
h 2 = 1 - 8 2
1+)5
-
)5+5
1
)5'
b
a
A
B
C
b-a
E
D
194
CommunicatinR without errors
N ow we head for an upper bound "8 (G) :S (J T] " for the Shannon capacity
of any graph G that has an especially "nice" orthonormal representation.
For this let T = {v(1),..., v(rn)} be an orthonormal representation
of G in IRs, where V(i) corresponds to the vertex Vi. We assume in
addition that all the vectors v(;) have the same angle (", 900) with the
vector u := (v(1) +... + v(rn»), or equivalently that the inner product
(V(i), u) = (J1'
has the same value (J l' '" 0 for all i. Let us call this value (J l' the constant
of the representation T. For the Lovasz umbrella that represents C,5 the
condition (v(;), u) = (JT certainly holds, for u := OS.
Now we proceed in the following three steps.
(A) Consider a probability distribution x = (:z:],... , xm) on ' and set
( ) .- 1 (1) (rn) 1 2
/lX .- XIV +...+.1: rn V ,
and
/l1'(G) := inf 11(X).
'"
Let U be a largest independent set in G with IUI = rt, and define Xu =
(Xl, . .. , x m ) with .1:i = if Vi E U and Xi = 0 otherwise. Since all
vectors v(i) have unit length and (v(i), v U )) = 0 for any two non-adjacent
vertices, we infer
rn 2 rn,) 11
/l1'(G):S /l(xu) = I LXiV(i)1 = LXi=rt a2 =.
';=1 ;=1
Thus we have /l1' (G) :S a-I, and therefore
1
o(G) :S _ (G)
/l1'
(B) Next we compute /l1' (G). We need the Cauchy-Schwarz inequality
(a, b)2 :S lal 2 IW
for vectors a, b E IRs. Applied to a = Xl V(l) + . . . + xrnv(rn) and b = u,
the inequality yields
( (1) (rn) ) 2 < ( ) 1 1 2
.1:1 V +...+XrnV ,U _ 11 XU.
(5)
By our assumption that (v(i), U) = (J1' for alii, we have
(:Z:lV(l)+...+x m v(rn),u) = (X1+...+X m )(JT = (J1'
for any distribution x. Thus, in particular, this has to hold for the uniform
distribution ( , . . . , ;,), which implies lul 2 = (J 1'" Hence (5) reduces to
,)
(J1' :S 11(X) (J1'
or
/l1'(G) 2: (J1"
.........
IIIIIIIr
Communicating without errors
]95
On the other hand, for x = ( ..L, .. .. l ) we obtain
rn ) In
JLT(G)::; Il(x) = 1(v(l)+...+v(rn)W
lul 2
aT'
and so we have proved
Il T (G) = a'],"
In summary, we have established the inequality
1
a(G) ::; -
aT
for any orthonormal respresentation T with constant aT'
(C) To extend this inequality to 8( G), we proceed as before. Consider
again the product G x H of two graphs. Let G and H have orthonormal
representations Rand S in IRr and IRs, respectively, with constants a R
and as' Let v = (VI,..' ,v r ) be a vector in Rand w = (WI,... ,w s ) be
a vector in S. To the vertex in G x H corresponding to the pair (v, w) we
associate the vector
vw T := (VI WI,... ,VIWs,V2WI,... ,V2Ws,... ,V"WI,... ,vrw s ) E IR rs .
It is immediately checked that R x S := {vw T : v E R, w E S} is an
orthonormal representation of G x H with constant a R as' Hence by (6)
we obtain
JLRXS(G x H) = tLR(G)lls(H).
For Gn
means
G x '" x G and the representation T with constant (JT this
t L T n (G n )
111' ( G) n
an
l'
and by (7) we obtain
a( G n )
yia(Gn) ::; (J;I.
< a- n
- 1"
Taking all things together we have thus completed Lovasz' argument:
Theorem. Whenever T = {v(1),..., v(m)} is an orthonormal
representation ofG with constant aT' then
8( G) ::;
1
(8)
aT
Looking at the Lovasz umbrella, we have u = (0,0, h= 4 )T and hence
v5
a = (v(i), u) = h 2 = Jg , which yields 8(C 5 ) ::; )5. Thus Shannon's
problem is solved.
(6)
(7)
.
... ' 1 .. . .; . 7 ,..: \ .
L.-';k;-
/ 1 ..A . ':Y-: t \;\lf
.. '''Ii: \. 16.;t..-, ,
/ ( \ // ) "i.
\1 ,1..- ( . ..'
J1fJ.. Jtl1r
, yV)
, b j
"Umbrellas with five ribs"
196
CommunicatinR without errors
( ! 1 0 0 J
0 1 0
A= 1 0 1
0 1 0
0 0 1 The adjacency matrix for the 5-cycle C 5
Let us carry our discussion a little further. We see from (8) that the larger (J" T
is for a representation of G, the better a bound for 8(G) we will get. Here
is a method that gives us an orthonormal representation for any graph G.
To G = (V, E) we associate the adjacency matrix A = (aij), which is
defined as follows: Let . = {VI, . . . ,v m }, then we set
aij := {
if ViV) E E
otherwise.
A is a real symmetric matrix with O's in the main diagonal.
Now we need two facts from linear algebra. First, as a symmetric matrix,
A has m real eigenvalues Al 2: A2 2: ... 2: Am (some of which may
be equal), and the sum of the eigenvalues equals the sum of the diagonal
entries of A, that is, O. Hence the smallest eigenvalue must be negative
(except in the trivial case when G has no edges). Letp = IAml = -Am be
the absolute value of the smallest eigenvalue, and consider the matrix
1
AI := I + - A,
P
where I denotes the (m x m)-identity matrix. This AI has the eigenvalues
1 + & > 1 + > . . . > 1 + Am = O. Now we quote the second result (the
p- p- - p
principal axis theorem of linear algebra): If AI = (mi)) is a real symmetric
matrix with all eigenvalues 2: 0, then there are vectors v(l), . .. ,v(m) E IRs
for 8 = rank( AI), such that
Tn- - = ( V (i) . v(j) ) ( 1 < i J . < m ) .
1) , , _
In particular, for AI = I + 1 A we obtain
p
(v(i),v(i)) = mii = 1 for all i
and
(V(i), v(j)) = aij
p
for i oj. j.
Since aij = 0 whenever ViVj tf. E, we see that the vectors v(1),... ,v(m)
form indeed an orthonormal representation of G.
Let us, finally, apply this construction to the m-cycles C m for odd m 2: 5.
Here one easily computes p = IAminl = 2cas (see the box). Every
row of the adjacency matrix contains two 1 's, implying that every row of
the matrix AI sums to 1 + . For the representation {v(1), . . . ,v(m)} this
means
2
(v(i),v(1)+...+v(m)) = 1+-
p
1
1+-
cas JI.
m
and hence
1
(V(i),U) = -(1 + ( cas JI.)-I) = (J"
Tn m
for all i. We can therefore apply our main result (8) and conclude
m
8(C m ) ::; 1 + (cas )-l (for m 2: 5 odd). (9)
I
I
I
J
"
Communicating without errors
197
Notice that because of cos < 1 the bound (9) is better than the bound
8(C I11 ) :::; !!j- we found before. Note further cas f, = , where T = +1
is the golden section. Hence for m = 5 we again obtain
8(C:;) :::; 5 = 5()5 + 1) =)5
1 + v'51 5 + )5 .
The orthonormal representation given by this construction is, of course,
precisely the "Lovasz umbrella."
And what about C 7 , C g , and the other odd cycles? By considering n:(C;;,),
n:(C;,) and other small powers the lower bound 111;-1 :::; 8(C m ) can cer-
tainly be increased, but for no odd m 2: 7 do the best known lower bounds
agree with the upper bound given in (8). So, twenty years after Lovasz'
marvelous proof of 8( C 5 ) = )5, these problems remain open and are
considered very difficult - but after all we had this situation before.
The eigenvalues of C m
Look at the adjacency matrix A of the cycle Cm. To find the eigen-
values (and eigenvectors) we use the m-th roots of unity. These are
given by 1, (, (2, . .. , (111-1 for ( = e 2:;.i - see the box on page 25.
Let ). (k be any of these roots, then we claim that
(1,).,).2, . . . ,).111-1)T is an eigenvector of A to the eigenvalue
). + ). -1. In fact, by the set-up of A we find
/ 1 ). + ).111-1 \ / 1 \
). ).2 + 1 ).
A ).2 ).3 + ). = ().+).-1) ).2
).111-1 , 1 + ). m-2 ).m-1
)
Since the vectors (1, >., . " ,). m-1) are independent (they form a so-
called Vandermonde matrix) we conclude that for odd m
(k + Ck
[( cas(2k1r jm) + i sin(2k1r jm)]
+ [cos(2k1rjm) - isin(2k7Tjm)]
2cos(2k1rjrn) (0:::; k:::; m;-1 )
are all the eigenvalues of A. Now the cosine is a decreasing function,
and so
( (rn - 1)7T )
2 cos
Tn
- 2 cas
rn
is the smallest eigenvalue of A.
For example, for Tn = 7 all we know is
5 7
v343:::; 8(C7) :::; 1 ( ")-1 '
+ cas t
which is 3.2141:::; 8(C 7 ) :::; 3.3177.
198
Communicating without errors
References
[I J V. CHV A TAL: Linear Programming, Freeman, New York 1983.
[2] W. HAEMERS: Eigenvalue methods, in: "Packing and Covering in Combina-
torics" (A. Schrijver, ed.), Math. Centre Tracts 106 (1979), 15-38.
[3] L. Lov ASZ: On the Shannon capacity of" a graph, IEEE Trans. Information
Theory 2S (1979), 1-7.
[4J C. E. SHANNON: The zero-error capacity (f" a noisv channel, IRE Trans.
Information Theory 3 (1956), 3-15.
J
J
IIIIIIIIIIIf'
Of friends and politicians
It is not known who first raised the following problem or who gave it its
human touch. Here it is:
Suppose in a group of people we have the situation that any pair of
persons have precisely one common friend. Then there is always a
person (the "politician") who is everybody'sfriend.
In the mathematical jargon this is called the friendship theorem.
Before tackling the proof let us rephrase the problem in graph-theoretic
terms. We interpret the people as the set of vertices l' and join two vertices
by an edge if the corresponding people are friends. We tacitly assume that
friendship is always two-ways, that is, if u is a friend of v, then v is also
a friend of u, and further that nobody is his or her own friend. Thus the
theorem takes on the following form:
Theorem. Suppose that G is a graph in which any two vertices have
precisely one common neighbor. Then there is a vertex which is adjacent to
all other vertices.
Note that there are graphs with this property as in the figure, where 11 is the
politician; in fact we will show that these "windmill graphs" are the only
graphs with this property.
It should be clear that in the presence of a politician only the windmill
graphs are possible. Several proofs of the friendship theorem exist, but the
first proof, given by Paul Erdos, Alfred Renyi and Vera S6s, is still the most
accomplished.
Chapter 31
- ,\, - :;/T;:,\
. - _., .1.
.' ./;:5,' ' - / ' . .f;
/,'i :z._ '. . c-__' -
(;: / "':i.' " i. .
it'"71 ,
. r JJ
. -!-) /
\-p.- -" /
\1
"A politician '5 smile"
. Proof. Suppose the assertion is false, and G is a counterexample, that is,
no vertex of G is adjacent to all other vertices. To derive a contradiction we A windmill graph
proceed in two steps. The first part is combinatorics, and the second part is
linear algebra.
(1) We claim that G is a regular graph, that is, d( 11) = d(v) for any 11, v E 1'. <)
Note first that the condition of the theorem implies that there are no cycles
oflength 4 in G. Let us call this the C 4 -condition.
We first prove that any two non-adjacent vertices u and v have equal degree
d(u) = d(v). Suppose d(11) = k, where 1V],... , w" are the neighbors ofu.
200
Offriends and politicians
u
Wi
Z"
Exactly one of the Wi, say W2, is adjacent to v, and W2 adjacent to exactly
one of the other Wi'S, say Wi, so that we have the situation of the figure to
the left. The vertex v has with WI the common neighbor W2, and with Wi
(i 2': 2) a common neighbor Zi (i 2': 2). By the C 4 -condition, all these Zi
must be distinct. We conclude d(v) 2': k = d(u), and thus d(u) = d(v) = k
by symmetry.
To finish the proof of (1), observe that any vertex different from W2 is not
adjacent to either 11 or v, and hence has degree k, by what we already
proved. But since W2 also has a non-neighbor, it has degree k as well,
and thus G is k-regular.
Summing over the degrees of the k neighbors of u we get k 2 . Since
every vertex (except u) has exactly one common neighbor with 11, we have
counted every vertex once, except for 11, which was counted k times. So
the total number of vertices of G is
n = k 2 - k + 1.
(I)
(2) The rest of the proof is a beautiful application of some standard results
of linear algebra. Note first that k must be greater than 2, since for k <; 2
only G = K] and G = K;, are possible by (1), both of which are trivial
windmill graphs. Consider the adjacency matrix A = (aij), as defined on
page 196. By part (1), any row has exactly k 1 's, and by the condition of
the theorem, for any two rows there is exactly one column where they both
have a 1. Note further that the main diagonal consists of O's. Hence we
have
A 2
(;
1
1
k
l)
(k-l)I+.J,
where I is the identity matrix, and .J the matrix of aliI's. It is immediately
checked that .J has the eigenvalues n (of multiplicity 1) and 0 (of multi-
plicity n - 1). It follows that A2 has the eigenvalues k - 1 + n = k 2
(of multiplicity 1) and k - 1 (of multiplicity n - 1).
Since A is symmetric and hence diagonalizable, we conclude that A has
the eigenvalues k (of multiplicity 1) and ::I:Jk'='1. Suppose T of the
eigenvalues are equal to Jk'='1 and 8 of them are equal to -Jk'='1, with
T + 8 = n - 1. Now we are almost home. Since the sum of the eigenvalues
of A equals the trace (which is 0), we find
k + T - sJk='1 0,
and, in particular, T # 8, and
x:
8-7'
.J
IIIIIIF"
Offriends and politicians
201
It follows that is an integer h (if vrn is rational, then it is an inte-
ger !), and we obtain
.)
h(s - r) = k = h- + 1.
Since h divides h 2 + 1 and h 2 , we find that h must be equal to 1, and
thus k = 2, which we have already excluded. So we have arrived at a
contradiction, and the proof is complete. 0
However, the story is not quite over. Let us rephrase our theorem in the
following way: Suppose G is a graph with the property that between any
two vertices there is exactly one path of length 2. Clearly, this is an equiv-
alent formulation of the friendship condition. Our theorem then says that
the only such graphs are the windmill graphs. But what if we consider
paths of length more than 2? A conjecture of Anton Kotzig asserted that
the analogous situation is impossible.
Kotzig's Conjecture. Let g > 2. Then there are no graphs with the
property that between any two vertices there is precisely one path of
length £.
Kotzig himself verified his conjecture for g :s; 8. By some very clever
counting, Xing and Hu settled it for all g 2: 12, and the remaining cases
were just recently finished by Yang, Lin, Wang and Li. So Kotzig's
conjecture has become a theorem.
References
[I] P. ERDOS, A. RENYI & V. SOS: On a problem o/graph theory, Studia Sci.
Math. 1 (1966),215-235.
[2J A. KOTZIG: Regularly k-path connected graphs, Congressus Numerantium 40
(1983), 137-141.
[3] K. XING & B. Hu: On Kotzig's conjecturef{Jr graphs with a regular path-
connectedness, Discrete Math. 135 (1994),387-393.
[4J Y. YANG, 1. LIN, C. WANG & V. LI: On Kotzig's conjecture concerning
graphs with a unique regular path-connectivity, Discrete Math. 211 (2000),
287-298.
Probability makes counting
(sometimes) easy
Chapter 32
Just as we started this book with the first papers of Paul Erdos in num-
ber theory, we close it by discussing what will possibly be considered his
most lasting legacy - the introduction, together with Alfred Renyi, of the
probabilistic method. Stated in the simplest way it says:
If, in a given set of objects, the probability that an object does not
have a certain property is less than 1, then there must exist an object
with this property.
Thus we have an existence result. It may be (and often is) very difficult to
find this object, but we know that it exists. We present here three examples
(of increasing sophistication) of this probabilistic method due to Erdos, and
end with a particularly elegant brand-new application.
As a warm-up, consider a family F of subsets Ai, all of size d 2: 2, of a
finite ground-set X. We say that F is 2-colorable if there exists a coloring
of X with two colors such that in every set Ai both colors appear. It is
immediate that not every family can be colored in this way. As an example,
take all subsets of size d of a (2d - I)-set X. Then no matter how we
2-color X, there must be d elements which are colored alike. On the other
hand, it is equally clear that every subfamily of a 2-colorable family of
d-sets is itself 2-colorable. Hence we are interested in the smallest number
m = m(d) for which a family with m sets exists which is not 2-colorable.
Phrased differently, m( d) is the smallest number which guarantees that
every family with less than m( d) sets is 2-colorable.
Theorem 1. Every family of at most 2 d - 1 d-sets is 2-colorable, that is,
m(d) > 2 d - 1 .
. Proof. Suppose F is a family of d-sets with at most 2 d - 1 sets. Color X
randomly with two colors, all colorings being equally likely. For each set
A E F let E A be the event that all elements of A are colored alike. Since
there are precisely two such colorings, we have
1 d-l
Prob(EA) = (:2) ,
and hence with m = IFI 2 d - 1 (note that the events E A are not disjoint)
U ""' 1 d-l
Prob( E A ) < Prob(EA) = m (2") < 1.
AE.:F AE.:F
We conclude that there exists some 2-coloring of X without a unicolored
set, and this is just our condition of 2-colorability. 0
204
3
Probability makes counting (sometimes) easy
1
v
B
A
blue
edges
An upper bound for m(d), roughly equal to d 2 2 d , was also established by
Erdos, again using the probabilistic method, this time taking random sets
and a fixed coloring. As for exact values, only the first two m(2) = 3,
m(3) = 7 are known. Of course, m(2) = 3 is realized by the graph K3,
while the Fano configuration yields m( 3) 7. Here:F consists of the seven
3-sets of the figure (including the circle set {4, 5, 6}). The reader may find
it fun to show that :F needs 3 colors. To prove that all families of six 3-sets
are 2-colorable, and hence m(3) = 7, requires a little more care.
2
Our next example is the classic in the field - Ramsey numbers. Consider
the complete graph KN on 1'v T vertices. We say that KN has property (m, n)
if, no matter how we color the edges of K N red and blue, there is always a
complete subgraph on m vertices with all edges colored red or a complete
subgraph on n vertices with all edges colored blue. It is clear that if KN
has property (m, n), then so does every Ks with.'; 2: N. So, as in the first
example, we ask for the smallest number N (if it exists) with this property
- and this is the Ramsey number R(m, n).
As a start, we certainly have R(m, 2) = m because either all of the edges
of Km are red or there is a blue edge, resulting in a blue K 2 . By symmetry,
we have R(2, n) = n. Now, suppose R(m - 1, n) and R(m, n - 1) exist.
We then prove that R( m, n) exists and that
R(m,n) R(m -1,n) + R(m,n -1).
(I)
Suppose N = R(m - 1, n) + R(m, n - 1), and consider an arbitrary red-
blue coloring of KN. For a vertex v, let A be the set of vertices joined to v
by a red edge, and B the vertices joined by a blue edge.
Since IAI + IBI = N - 1, we find that either IAI 2: R(m - 1, n) or
IBI 2: R(m, n - 1). Suppose IAI 2: R(m - 1, n), the other case being
analogous. Then by the definition of R(m - 1, n), there either exists in A a
subset A R of size m - 1 all of whose edges are colored red which together
with v yields a red K"" or there is a subset A B of size n with all edges
colored blue. We infer that KN satisfies the (m, n )-property and Claim (I)
follows.
Combining (I) with the starting values R(m, 2) = m and R(2, n) = n, we
obtain from the familiar recursion for binomial coefficients
R(rn, n)
( m + n - 2 ) ,
m-l
and, in particular
R(k, k)
( 2k - 2 )
k-l
( 2k - 3 ) + ( 2k - 3 ) < 22k3.
k - 1 k _ 2 - (2)
Now what we are really interested in is a lower bound for R(k, k). This
amounts to proving that for every N < R(k, k) there exists a coloring
of the edges such that no red or blue Kk results. And this is where the
probabilistic method comes into play.
--""IIiI
Probability makes counting (sometimes) easy
205
Theorem 2. For all k 2: 2, the following lower bound holds for the Ramsey
numbe rs:
R(k,k) 2:
k
2 2 .
. Proof. We have R(2, 2) = 2. From (2) we know R(3, 3) :s; 6, and the
pentagon colored as in the figure shows R(3, 3) = 6.
k
Now let us assume k 2: 4. Suppose N < 2 2 , and consider all red-blue
colorings, where we color each edge independently red or blue with proba-
bility . Thus all colorings are equally likely with probability T(). Let A
be a set of vertices of size k. The probability of the event A R that the edges
in A are all colored red is then T m. Hence it follows that the probability
P R for some k-set to be colored all red is bounded by
P R
Prob( U A R )
IAI=k
< L Prob(A R )
IAI=k
() Tm.
Now with N < 2 and k 2: 4, using () :s; 2r:1 for k 2: 2 (see page 12),
we have
( N ) Tm < N k Tm < 2-m-k+l = T+l < .
k - 2 k - 1 - 2
Hence P R < , and by symmetry P B < for the probability of some
k vertices with all edges between them colored blue. We conclude that
PH + P B < 1 for N < 2 , so there must be a coloring with no red or
blue Kk, which means that KN does not have property (k, k). 0
Of course, there is quite a gap between the lower and the upper bound for
R( k, k). Still, as simple as this Book Proof is, no lower bound with a better
exponent has been found for general k in the more than 50 years since
Erdos' result. In fact, no one has been able to prove a lower bound of the
form R(k, k) > 2( +c)k nor an upper bound of the form R(k, k) < 2(2-E)k
for a fixed c > O.
Our third result is another beautiful illustration of the probabilistic method.
Consider a graph G on n vertices and its chromatic number x( G). If x( G)
is high, that is, if we need many colors, then we might suspect that G
contains a large complete subgraph. However, this is far from the truth.
Already in the fourties Blanche Descartes constructed graphs with arbitrar-
ily high chromatic number and no triangles, that is, with every cycle having
length at least 4, and so did several others (see the box on the next page).
However, in these examples there were many cycles of length 4. Can we do
even better? Can we stipulate that there are no cycles of small length and
still have arbitrarily high chromatic number? Yes we can! To make matters
precise, let us call the length of a shortest cycle in G the girth ,( G) of G;
then we have the following theorem, first proved by Paul Erdos.
206
Probability makes counting (sometimes) easy
Triangle-free graphs with high chromatic number
Here is a sequence of triangle-free graphs G 3 , G 4 ,... with
x(G n ) = n.
Constructing the Mycielski graph
Start with G 3 = C 5 , the 5-cycle; thus X(G 3 ) = 3. Suppose we have
already constructed G n on the vertex set V. The new graph G n + 1 has
the vertex set V U V' u {z}, where the vertices v' E V' correspond
bijectively to v E V, and z is a single other vertex. The edges of
G n + 1 fall into 3 classes: First, we take all edges of G n ; secondly
every vertex v'is joined to precisely the neighbors of v in G n ; thirdly
z is joined to all v' E V'. Hence from G 3 = C,s we obtain as G 4 the
so-called Mycielski graph.
Clearly, G n + 1 is again triangle-free. To prove x(Gn+d = n + 1 we
use induction on n. Take any n-coloring of G n and consider a color
class C. There must exist a vertex v E C which is adjacent to at
least one vertex of every other color class; otherwise we could dis-
tribute the vertices of C onto the n - 1 other color classes, resulting
in X( G n ) ::; n - 1. But now it is clear that v' (the vertex in V' cor-
responding to v) must receive the same color as v in this n-coloring.
So, all n colors appear in V', and we need a new color for z.
Theorem 3. For every k 2: 2, there exists a graph G with chromatic
number X(G) > k and girth ,(G) > k.
The strategy is similar to that of the previous proofs: We consider a cer-
tain probability space on graphs and go on to show that the probability for
X( G) ::; k is smaller than , and similarly the probability for ,(G) ::; k
is smaller than &. Consequently, there must exist a graph with the desired
properties.
. Proof. Let / = {v 1, v:z, . . . , v,,} be the vertex set, and p a fixed num-
ber between 0 and 1, to be carefully chosen later. Our probability space
Q (n, p) consists of all graphs on V where the individual edges appear with
probability p, independently of each other. In other words, we are talking
about a Bernoulli experiment where we throw in each edge with proba-
bility p. As an example, the probability Prob(Kn) for the complete graph
is Prob(Kn) = p(). In general, we have Prob(H) = p"'(l - p)(;)-'" if
the graph H on V has precisely Tn edges.
Let us first look at the chromatic number X(G). By ex = ex(G) we denote
the independence number, that is, the size of a largest independent set in G.
Since in a coloring with X = X( G) colors all color classes are independent
(and hence of size::; ex), we infer xn 2: n. Therefore if ex is small as
compared to n, then X must be large, which is what we want.
Suppose 2 ::; r ::; n. The probability that a fixed r-set in F is independent
........
Probability makes counting (sometimes) easy
207
is (1 - p) (;), and we conclude by the same argument as in Theorem 2
Prob(ex r) < G) (1- p)(;)
< nr(1 - p)(;) (n(1 - p) ";1 r < (ne- p (r-l)/2r,
since 1 - p ::; e- P for all p.
k
Given any fixed k > 0 we now choose p = n- HI , and proceed to show
that for n large enough,
( n ) 1
Prob ex > - < -.
- 2k 2
(3)
1 1
Indeed, since n k + 1 grows faster than logn, we have n H1 >
for large enough n, and thus p > 6k lO n . For r = r 21 l
pr 310g n, and thus
6k log n
this gives
ne- p (r-l)/2 = ne-Pf e < ne- log ne
1 1 1
= n-"Ie"I = ()2,
which converges to 0 as n goes to infinity. Hence (3) holds for all n nl.
Now we look at the second parameter, ,(G). For the given k we want to
show that there are not too many cycles of length::; k. Let i be between 3
and k, and it <:;;; V a fixed i-set. The number of possible i-cycles on it is
clearly the number of cyclic pennutations of it divided by 2 (since we may
traverse the cycle in either direction), and thus equal to (i-;;l)! . The total
number of possible i-cycles is therefore CD (i-;l)! , and every such cycle C
appears with probability pi. Let X be the random variable which counts the
number of cycles of length::; k. In order to estimate X we use two simple
but beautiful tools. The first is linearity of expectation, and the second is
Markov's inequality for nonnegative random variables, which says
EX
Prob(X a) ::; -,
a
where EX is the expected value of X. See the appendix to Chapter 13 for
both tools.
Let Xc be the indicator random variable of the cycle C of, say, length i.
That is, we set Xc = 1 or 0 depending on whether C appears in the graph
or not; hence EX c = pi. Since X counts the number of all cycles of
length::; k we have X = L Xc, and hence by linearity
k ( n ) ( i _ 1 ) '. 1 k .. 1
EX = i 2 ' p! ::; 2n!p! ::; 2(k-2)n kpk
1
where the last inequality holds because of np = n m 1. Applying now
Markov's inequality with a = , we obtain
Prob(X ) ::;
EX
n/2
< (k _ 2) (np)k
n
1
(k - 2)n- k+l .
208
Probability makes counting (sometimes) easy
:=>() :=>....
<7<.
'Y-
;6( ;6(
Since the right-hand side goes to 0 with n going to infinity, we infer that
p( X 2: ) < for n 2: n2.
Now we are almost home. Our analysis tells us that for n 2: max(nj, n2)
there exists a graph H on n vertices with a(H) < 2 n " and fewer than
cycles of length::; k. Delete one vertex from each of these cycles, and
let G be the resulting graph. Then,( G) > k holds at any rate. Since G
contains more than vertices and satisfies a(G) ::; a(H) < 2 ' we find
n/2 n
X(G) 2: a(G) > 2a(H)
n
> - / = k.
n k .
and the proof is finished.
o
Explicit constructions of graphs with high girth and chromatic number (of
huge size) are known. (In contrast, one does not know how to construct
red/blue colorings with no large monochromatic cliques, whose existence
is given by Theorem 2.) What remains striking about the Erdos proof is
that it proves the existence of relatively small graphs with high chromatic
number and girth.
To end our excursion into the probabilistic world let us discuss an important
result in geometric graph theory (which again goes back to Paul Erdos)
whose stunning Book Proof is of very recent vintage.
Consider a simple graph G(V, E) with n vertices and Tn edges. We want
to embed G into the plane just as we did for planar graphs. Now, we know
from Chapter 10 - as a consequence of Euler's formula - that a simple
planar graph G has at most 3n - 6 edges. Hence if Tn is greater than
3n - 6, there must be crossings of edges. The crossing number cr( G) is
then naturally defined: It is the smallest number of crossings among all
embeddings of G. Thus cr( G) = 0 if and only if G is planar.
In such a minimal embedding the following three situations are ruled out:
. No edge can cross itself.
. Edges with a common endvertex cannot cross.
. No two edges cross twice.
This is because in either of these cases, we can construct a different draw-
ing of the same graph with fewer crossings, using the operations that are
indicated in our figure. So, from now on we assume that any embedding
observes these rules.
Suppose that G is embedded in ]R2 with cr(G) crossings. We can imme-
diately derive a lower bound on the number of crossings. Consider the
following graph H: The vertices of H are those of G together with all
crossing points, and the edges are all pieces of the original edges as we go
along from crossing point to crossing point.
The new graph H is now plane and simple (this follows from our three
assumptions!). The number of vertices in H is n + cr(G) and the number
j
....J
Probability makes counting (sometimes) easy
209
of edges is m + 2cr(G), since every new vertex has degree 4. Invoking the
bound on the number of edges for plane graphs we thus find
m+2cr(G) ::; 3(n+cr(G))-6,
that is,
cr(G) 2: m-3n+6.
As an example, for the complete graph K6 we compute
cr(K6) 2: 15 - 18 + 6 = 3
and, in fact, there is an embedding with just 3 crossings.
The bound (4) is good enough when m is linear in n, but when m is larger
compared to n, then the picture changes, and this is our theorem.
Theorem 4. Let G be a simple graph with n vertices and m edges, where
m 2: 4n. Then
1 m 3
cr(G) > 64 n2 .
The history of this result, called the crossing lemma, is quite interesting.
It was conjectured by Erdos and Guy in 1973 (with 6 replaced by some
constant c). The first proofs were given by Leighton in 1982 (with 160 in-
stead of l4 ) and independently by Ajtai, Chvatal, Newborn and Szemeredi.
The crossing lemma was hardly known (in fact, many people thought of it
as a conjecture long after the original proofs), until Laszlo Szekely demon-
strated its usefulness in a beautiful paper, applying it to a variety of hitherto
hard geometric extremal problems. The proof which we now present arose
from e-mail conversations between Bernard Chazelle, Micha Sharir and
Emo Welzl, and it belongs without doubt in The Book.
. Proof. Consider a minimal embedding of G, and let p be a number
between 0 and I (to be chosen later). Choose independently every vertex
of G with probability p, and denote by G p the graph induced by the vertices
that are present in G p .
Let n p , m p , X p be the random variables counting the number of vertices,
of edges, and of crossings in G p . Since cr(G) - m + 3n 2: 0 holds by (4)
for any graph, we certainly have for the expectation
E(X p - mp + 3n p ) 2: o.
(4)
Now we proceed to compute the individual expectations E(np), E(mp) and
E(X p ). Clearly, E(n p ) = pn and E(m p ) = p2m, since an edge appears
in G p if and only if both its endvertices do. And finally, E(X p ) = p 4 cr( G),
since a crossing is present in G p if and only if all four (distinct!) vertices
involved are there.
210
Probability makes counting (sometimes) easy
By linearity of expectation we thus find
o ::; E(X p ) - E(mp) + 3E(n p )
p 4 cr(G) - p 2 m + 3pn,
which is
cr(G)
p 2 m - 3pn
> 4
P
m 3n
---
p2 p3'
(5)
Here comes the punch line: Set p = ;;: (which is at most 1 by Our assump-
tion), then (5) becomes
cr( G) > 1 [ 4m 3n ]
- 64 (n/m)2 - (n/m)3
1 m 3
64 ;:2'
and this is it.
o
Paul Erdos would have loved to see this proof.
References
llJ M. AJTAI, V. CHVATAL, M. NEWBORN & E. SZEMEREDI: Crossingjree
subgraphs, Annals of Discrete Mathematics 12 (1982), 9-12.
[2J N. ALON & J. SPENCER: The Probabilistic Method, Wiley-lnterscience 1992.
[3J P. ERDOS: Some remarks on the theory of graphs, Bulletin Amer. Math. Soc.
S3 (1947), 292-294.
[4J P. ERDOS: Graph theory and probability, Canadian J. Math. 11 (1959), 34-38.
[5J P. ERDOS: On a combinatorial problem I, Nordisk Math. Tidskrift 11 (1963),
5-10.
[6J P. ERDOS & R. K. GUY: Crossing number problems, Amer. Math. Monthly
80 (1973),52-58.
[7J P. ERDOS & A. RENYI: On the evolution of random graphs, Magyar Tud.
Akad. Mat. Kut. lnt. Kozl. S (1960), 17-61.
[8J T. LEIGHTON: Complexity Issues in VLSI, MIT Press, Cambridge MA 1983.
[9J L. A. SZEKELY: Crossing numbers and hard Erdos problems in discrete
geometry, Combinatorics, Probability, and Computing 6 (1997), 353-358.
Prooffrom THE BOOK
211
About the Illustrations
We are happy to have the possibility and privilege to illustrate this volume
with wonderful original drawings by Karl Heinrich Hofmann (Dannstadt).
Thank you!
The regular polyhedra on page 60 and the fold-out map of a flexible sphere
on page 68 are by WAF Ruppert (Vienna).
Jurgen Richter-Gebert provided two illustrations on page 62.
Page 179 shows an outside view and a floorplan of the Weisman Art
Museum in Minneapolis. The photo of its west faade is by Chris Faust.
The floorplan is of the Dolly Fiterman Riverview Gallery, which lies behind
the west faade.
The portraits of Bertrand, Cantor, Erdos, Euler, Fermat, Hermite, Herglotz,
Hilbert, P61ya, Littlewood, Tunin, and Sylvester are from the photo archive
of the Mathematisches Forschungsinstitut Oberwolfach, with permission.
(Thanks to Annette Disch!)
The picture of Hermite is from the first volume of his collected works. The
portrait of Buffon is from the MacTutor History of Mathematics archive.
The portrait of Cayley is taken from the "Photoalbum fur WeierstraB"
(edited by Reinhard Bolling, Vieweg 1994), with permission from the Kunst-
bibliothek, Staatliche Museen zu Berlin, Preussischer Kulturbesitz.
The Cauchy portrait is reproduced with permission from the Collections de
l'Ecole Poly technique, Paris.
The picture of Fermat is reproduced from Stefan Hildebrandt and Anthony
Tromba: The Parsimonious Universe. Shape and FornI in the Natural
World, Springer-Verlag, New York 1996.
The portrait of Ernst Witt is from volume 426 (1992) of the Journal fur die
Reine und Angewandte Mathematik, with permission by Walter de Gruyter
Publishers. It was taken around 1941.
The photo of Karol Borsuk was taken in 1967 by Isaac Namioka, and is
reproduced with his kind permission.
We thank Dr. Peter Sperner (Braunschweig) for the portrait of his father.
Thanks to Noga Alon for the portrait of A. Nilli!
-
Index
acyclic directed graph 150
addition theorems 120
adjacency matrix 196
adjacent vertices 50
antichain 143
arithmetic mean 99
art gaIJery theorem 180
average degree 60
average number of divisors 134
Bernoul1i numbers 123
Bertrand's postulate 7
bijection 87
Binet-Cauchy formula 151, 157
binomial coefficient 13
bipartite graph 51,169
Borsuk's conjecture 79
Brouwer's fixed point theorem 139
Buffon's needle problem 125
cardinal number 87
cardinality 87,95
Cauchy's arm lemma 66
Cauchy's rigidity theorem 65
Cauchy-Schwarz inequality 99
Cayley's formula 155
center 23
centralizer 23
centrally symmetric 45
chain 143
channel 189
Chebyshev polynomials 114
Chebyshev's theorem 108
class formula 24
clique 51,183,190
clique number 185
2-colorable set system 203
combinatorial1yequivalent 45
comparison of coefficients 32
complete bipartite graph 50
complete graph 50
complex polynomial 107
confusion graph 189
congruent 45
connected 51
continuum 89
continuum hypothesis 92
convex polytope 44
convex vertex 180
cosine polynomial III
countable 87
critical family 146
crossing lemma 209
crossing number 208
cube 44
cycle 51
C 4 -condition 199
degree 60
Dehn invariant 41
Dehn-Hadwiger theorem 41
dense 93
determinants 149
dihedral angle 41
dimension 90
dimension of a graph 132
Dinitz problem 167
directed graph 169
division ring 23
double counting 134
dual graph 59, 175
edge of a graph 50
edge of a polyhedron 44
elementary polygon 63
equal size 87
equicomplementable polyhedra 39
equidecomposable polyhedra 39
Erd6s-Ko-Rado theorem 144
Euler's polyhedron formula 59
214
Index
even function 123
expectation 78
face 44, 59
facet 44
Fennat number 3
finite field 23
finite set system 143
forest 51
four-color theorem 175
friendship theorem 199
geometric mean 99
Gessel-Viennot lemma 149
girth 205
golden section 193
graph 50
graph coloring 175
graph of a polytope 45
harmonic mean 99
harmonic number 10
Herglotz trick 119
Hilbert's third problem 39
incidence matrix 48, 134
incident 50
indegree 169
independence number 189,206
independent set 51, 167
induced subgraph 51, 168
inequalities 99
inital ordinal number 96
intersecting family 144
involution 20
irrational numbers 27
isomorphic graphs 51
kernel 169
labeled tree 155
Lagrange's theorem 4
Latin rectangle 162
Latin square 161,167
lattice basis 64
lattice paths 149
Legendre's theorem 8
line graph I 72
linearity of expectation 78, 126
list chromatic number 168
list coloring 168, 176
Littlewood-Offord problem 115
loop 50
Lovasz umbrella 192
Lovasz' theorem 195
Markov's inequality 78
marriage theorem 146
matching 170
matrix of rank 1 80
matrix -tree theorem 157
Mersenne number 4
Minkowski symmetrization 75
mirror image 45
monotone subsequences 132
multiple edges 50
museum guards 179
Mycielski graph 206
near-triangulated plane graph 176
nearly-orthogonal vectors 80
needles 125
neighbors 50
obtuse angle 73
odd function 120
order of a group element 4
ordered set 95
ordinal number 95
orthonormal representation 192
outdegree 169
partial Latin square 161
path 5 I
path-matrix 149
periodic function 120
Pick's theorem 63
pigeon-hole principle 131
planar graph 59
plane graph 59, 176
point configuration 53
polygon 44
polyhedron 39,44
polynomial with real roots 101, lID
polytope 73
prime field 18
prime number 3, 7
---.J
Index
prime number theorem 10
probabilistic method 203
probability distribution 184
probability space 77
product of graphs 189
projective plane 137
(Q-linear function 40
Ramsey number 204
random variable 77
rate of transmission 189
refining sequence 159
Riemann zeta function 34
rooted forest 159
roots of unity 25
scalar product 80
Schonhardt's polyhedron 180
Schroeder-Bernstein theorem 90
simple graph 50
simplex 44
size 87
slope problem 53
Sperner's lemma 139
Sperner's theorem 143
squares 18
stable matching 170
star 49
Stirling's formula II
subgraph 51
sums of two squares 17
Sylvester's theorem 13
Sylvester-Gallai theorem 47,62
system of distinct representatives 145
cr
'"'
tangential rectangle 101
tangential triangle !OI
touching simplices 69
tree 51
triangle-free graph 206
Tunin's graph theorem 183
Tunin graph 183
two square theorem 17
umbrella 192
unimodal 12
unit d-cube 44
vertex 44, 50
vertex degree 60, 135, 168
vertex-disjoint path system 149
volume 68
weighted directed graph 149
well-ordered 95
well-ordering theorem 95
windmill graph 199
zero-error capacity 190
Printing and bookbinding: Sturtz AG, Wurzburg